From 69255e746890274e887ba36a403019380cde0b48 Mon Sep 17 00:00:00 2001 From: Cristian Marussi Date: Mon, 20 Dec 2021 19:56:41 +0000 Subject: firmware: arm_scmi: Add support for atomic transports An SCMI transport can be configured as .atomic_enabled in order to signal to the SCMI core that all its TX path is executed in atomic context and that, when requested, polling mode should be used while waiting for command responses. When a specific platform configuration had properly configured such a transport as .atomic_enabled, the SCMI core will also take care not to sleep in the corresponding RX path while waiting for a response if that specific command transaction was requested as atomic using polling mode. Asynchronous commands should not be used in an atomic context and so a warning is emitted if polling was requested for an asynchronous command. Add also a method to check, from the SCMI drivers, if the underlying SCMI transport is currently configured to support atomic transactions: this will be used by upper layers to determine if atomic requests can be supported at all on this SCMI instance. Link: https://lore.kernel.org/r/20211220195646.44498-7-cristian.marussi@arm.com Signed-off-by: Cristian Marussi Signed-off-by: Sudeep Holla --- include/linux/scmi_protocol.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h index 80e781c51ddc..9f895cb81818 100644 --- a/include/linux/scmi_protocol.h +++ b/include/linux/scmi_protocol.h @@ -612,6 +612,13 @@ struct scmi_notify_ops { * @devm_protocol_get: devres managed method to acquire a protocol and get specific * operations and a dedicated protocol handler * @devm_protocol_put: devres managed method to release a protocol + * @is_transport_atomic: method to check if the underlying transport for this + * instance handle is configured to support atomic + * transactions for commands. + * Some users of the SCMI stack in the upper layers could + * be interested to know if they can assume SCMI + * command transactions associated to this handle will + * never sleep and act accordingly. * @notify_ops: pointer to set of notifications related operations */ struct scmi_handle { @@ -622,6 +629,7 @@ struct scmi_handle { (*devm_protocol_get)(struct scmi_device *sdev, u8 proto, struct scmi_protocol_handle **ph); void (*devm_protocol_put)(struct scmi_device *sdev, u8 proto); + bool (*is_transport_atomic)(const struct scmi_handle *handle); const struct scmi_notify_ops *notify_ops; }; -- cgit From e8c1f36157ce0bf8c150059c3f9f573c13a186df Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Mon, 10 Jan 2022 16:33:05 -0800 Subject: dma-buf-map: Fix dot vs comma in example Fix typo: separate arguments with comma rather than dot. Signed-off-by: Lucas De Marchi Signed-off-by: Thomas Zimmermann Link: https://patchwork.freedesktop.org/patch/msgid/20220111003305.1214667-1-lucas.demarchi@intel.com --- include/linux/dma-buf-map.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-buf-map.h b/include/linux/dma-buf-map.h index 278d489e4bdd..19fa0b5ae5ec 100644 --- a/include/linux/dma-buf-map.h +++ b/include/linux/dma-buf-map.h @@ -52,13 +52,13 @@ * * struct dma_buf_map map = DMA_BUF_MAP_INIT_VADDR(0xdeadbeaf); * - * dma_buf_map_set_vaddr(&map. 0xdeadbeaf); + * dma_buf_map_set_vaddr(&map, 0xdeadbeaf); * * To set an address in I/O memory, use dma_buf_map_set_vaddr_iomem(). * * .. code-block:: c * - * dma_buf_map_set_vaddr_iomem(&map. 0xdeadbeaf); + * dma_buf_map_set_vaddr_iomem(&map, 0xdeadbeaf); * * Instances of struct dma_buf_map do not have to be cleaned up, but * can be cleared to NULL with dma_buf_map_clear(). Cleared mappings -- cgit From d72d84aea4d57a735d8cfbade32ed323f47a5941 Mon Sep 17 00:00:00 2001 From: Guchun Chen Date: Fri, 14 Jan 2022 16:37:42 +0800 Subject: locking/rwsem: drop redundant semicolon of down_write_nest_lock MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Otherwise, braces are needed when using it. Signed-off-by: Guchun Chen Acked-by: Christian König Acked-by: Peter Zijlstra (Intel) Link: https://patchwork.freedesktop.org/patch/msgid/20220114083742.6219-1-guchun.chen@amd.com Signed-off-by: Christian König --- include/linux/rwsem.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/rwsem.h b/include/linux/rwsem.h index f9348769e558..efa5c324369a 100644 --- a/include/linux/rwsem.h +++ b/include/linux/rwsem.h @@ -230,7 +230,7 @@ extern void _down_write_nest_lock(struct rw_semaphore *sem, struct lockdep_map * do { \ typecheck(struct lockdep_map *, &(nest_lock)->dep_map); \ _down_write_nest_lock(sem, &(nest_lock)->dep_map); \ -} while (0); +} while (0) /* * Take/release a lock when not the owner will release it. -- cgit From dee872e124e8d5de22b68c58f6f6c3f5e8889160 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Fri, 14 Jan 2022 22:09:45 +0530 Subject: bpf: Populate kfunc BTF ID sets in struct btf This patch prepares the kernel to support putting all kinds of kfunc BTF ID sets in the struct btf itself. The various kernel subsystems will make register_btf_kfunc_id_set call in the initcalls (for built-in code and modules). The 'hook' is one of the many program types, e.g. XDP and TC/SCHED_CLS, STRUCT_OPS, and 'types' are check (allowed or not), acquire, release, and ret_null (with PTR_TO_BTF_ID_OR_NULL return type). A maximum of BTF_KFUNC_SET_MAX_CNT (32) kfunc BTF IDs are permitted in a set of certain hook and type for vmlinux sets, since they are allocated on demand, and otherwise set as NULL. Module sets can only be registered once per hook and type, hence they are directly assigned. A new btf_kfunc_id_set_contains function is exposed for use in verifier, this new method is faster than the existing list searching method, and is also automatic. It also lets other code not care whether the set is unallocated or not. Note that module code can only do single register_btf_kfunc_id_set call per hook. This is why sorting is only done for in-kernel vmlinux sets, because there might be multiple sets for the same hook and type that must be concatenated, hence sorting them is required to ensure bsearch in btf_id_set_contains continues to work correctly. Next commit will update the kernel users to make use of this infrastructure. Finally, add __maybe_unused annotation for BTF ID macros for the !CONFIG_DEBUG_INFO_BTF case, so that they don't produce warnings during build time. The previous patch is also needed to provide synchronization against initialization for module BTF's kfunc_set_tab introduced here, as described below: The kfunc_set_tab pointer in struct btf is write-once (if we consider the registration phase (comprised of multiple register_btf_kfunc_id_set calls) as a single operation). In this sense, once it has been fully prepared, it isn't modified, only used for lookup (from the verifier context). For btf_vmlinux, it is initialized fully during the do_initcalls phase, which happens fairly early in the boot process, before any processes are present. This also eliminates the possibility of bpf_check being called at that point, thus relieving us of ensuring any synchronization between the registration and lookup function (btf_kfunc_id_set_contains). However, the case for module BTF is a bit tricky. The BTF is parsed, prepared, and published from the MODULE_STATE_COMING notifier callback. After this, the module initcalls are invoked, where our registration function will be called to populate the kfunc_set_tab for module BTF. At this point, BTF may be available to userspace while its corresponding module is still intializing. A BTF fd can then be passed to verifier using bpf syscall (e.g. for kfunc call insn). Hence, there is a race window where verifier may concurrently try to lookup the kfunc_set_tab. To prevent this race, we must ensure the operations are serialized, or waiting for the __init functions to complete. In the earlier registration API, this race was alleviated as verifier bpf_check_mod_kfunc_call didn't find the kfunc BTF ID until it was added by the registration function (called usually at the end of module __init function after all module resources have been initialized). If the verifier made the check_kfunc_call before kfunc BTF ID was added to the list, it would fail verification (saying call isn't allowed). The access to list was protected using a mutex. Now, it would still fail verification, but for a different reason (returning ENXIO due to the failed btf_try_get_module call in add_kfunc_call), because if the __init call is in progress the module will be in the middle of MODULE_STATE_COMING -> MODULE_STATE_LIVE transition, and the BTF_MODULE_LIVE flag for btf_module instance will not be set, so the btf_try_get_module call will fail. Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220114163953.1455836-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/btf.h | 39 +++++++++++++++++++++++++++++++++++++++ include/linux/btf_ids.h | 13 +++++++------ 2 files changed, 46 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/btf.h b/include/linux/btf.h index 0c74348cbc9d..c451f8e2612a 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -12,11 +12,33 @@ #define BTF_TYPE_EMIT(type) ((void)(type *)0) #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val) +enum btf_kfunc_type { + BTF_KFUNC_TYPE_CHECK, + BTF_KFUNC_TYPE_ACQUIRE, + BTF_KFUNC_TYPE_RELEASE, + BTF_KFUNC_TYPE_RET_NULL, + BTF_KFUNC_TYPE_MAX, +}; + struct btf; struct btf_member; struct btf_type; union bpf_attr; struct btf_show; +struct btf_id_set; + +struct btf_kfunc_id_set { + struct module *owner; + union { + struct { + struct btf_id_set *check_set; + struct btf_id_set *acquire_set; + struct btf_id_set *release_set; + struct btf_id_set *ret_null_set; + }; + struct btf_id_set *sets[BTF_KFUNC_TYPE_MAX]; + }; +}; extern const struct file_operations btf_fops; @@ -307,6 +329,11 @@ const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id); const char *btf_name_by_offset(const struct btf *btf, u32 offset); struct btf *btf_parse_vmlinux(void); struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog); +bool btf_kfunc_id_set_contains(const struct btf *btf, + enum bpf_prog_type prog_type, + enum btf_kfunc_type type, u32 kfunc_btf_id); +int register_btf_kfunc_id_set(enum bpf_prog_type prog_type, + const struct btf_kfunc_id_set *s); #else static inline const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id) @@ -318,6 +345,18 @@ static inline const char *btf_name_by_offset(const struct btf *btf, { return NULL; } +static inline bool btf_kfunc_id_set_contains(const struct btf *btf, + enum bpf_prog_type prog_type, + enum btf_kfunc_type type, + u32 kfunc_btf_id) +{ + return false; +} +static inline int register_btf_kfunc_id_set(enum bpf_prog_type prog_type, + const struct btf_kfunc_id_set *s) +{ + return 0; +} #endif struct kfunc_btf_id_set { diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h index 919c0fde1c51..bc5d9cc34e4c 100644 --- a/include/linux/btf_ids.h +++ b/include/linux/btf_ids.h @@ -11,6 +11,7 @@ struct btf_id_set { #ifdef CONFIG_DEBUG_INFO_BTF #include /* for __PASTE */ +#include /* for __maybe_unused */ /* * Following macros help to define lists of BTF IDs placed @@ -146,14 +147,14 @@ extern struct btf_id_set name; #else -#define BTF_ID_LIST(name) static u32 name[5]; +#define BTF_ID_LIST(name) static u32 __maybe_unused name[5]; #define BTF_ID(prefix, name) #define BTF_ID_UNUSED -#define BTF_ID_LIST_GLOBAL(name, n) u32 name[n]; -#define BTF_ID_LIST_SINGLE(name, prefix, typename) static u32 name[1]; -#define BTF_ID_LIST_GLOBAL_SINGLE(name, prefix, typename) u32 name[1]; -#define BTF_SET_START(name) static struct btf_id_set name = { 0 }; -#define BTF_SET_START_GLOBAL(name) static struct btf_id_set name = { 0 }; +#define BTF_ID_LIST_GLOBAL(name, n) u32 __maybe_unused name[n]; +#define BTF_ID_LIST_SINGLE(name, prefix, typename) static u32 __maybe_unused name[1]; +#define BTF_ID_LIST_GLOBAL_SINGLE(name, prefix, typename) u32 __maybe_unused name[1]; +#define BTF_SET_START(name) static struct btf_id_set __maybe_unused name = { 0 }; +#define BTF_SET_START_GLOBAL(name) static struct btf_id_set __maybe_unused name = { 0 }; #define BTF_SET_END(name) #endif /* CONFIG_DEBUG_INFO_BTF */ -- cgit From b202d84422223b7222cba5031d182f20b37e146e Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Fri, 14 Jan 2022 22:09:46 +0530 Subject: bpf: Remove check_kfunc_call callback and old kfunc BTF ID API Completely remove the old code for check_kfunc_call to help it work with modules, and also the callback itself. The previous commit adds infrastructure to register all sets and put them in vmlinux or module BTF, and concatenates all related sets organized by the hook and the type. Once populated, these sets remain immutable for the lifetime of the struct btf. Also, since we don't need the 'owner' module anywhere when doing check_kfunc_call, drop the 'btf_modp' module parameter from find_kfunc_desc_btf. Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220114163953.1455836-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 8 -------- include/linux/btf.h | 44 -------------------------------------------- 2 files changed, 52 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 6e947cd91152..6d7346c54d83 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -573,7 +573,6 @@ struct bpf_verifier_ops { const struct btf_type *t, int off, int size, enum bpf_access_type atype, u32 *next_btf_id); - bool (*check_kfunc_call)(u32 kfunc_btf_id, struct module *owner); }; struct bpf_prog_offload_ops { @@ -1719,7 +1718,6 @@ int bpf_prog_test_run_raw_tp(struct bpf_prog *prog, int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr); -bool bpf_prog_test_check_kfunc_call(u32 kfunc_id, struct module *owner); bool btf_ctx_access(int off, int size, enum bpf_access_type type, const struct bpf_prog *prog, struct bpf_insn_access_aux *info); @@ -1971,12 +1969,6 @@ static inline int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, return -ENOTSUPP; } -static inline bool bpf_prog_test_check_kfunc_call(u32 kfunc_id, - struct module *owner) -{ - return false; -} - static inline void bpf_map_put(struct bpf_map *map) { } diff --git a/include/linux/btf.h b/include/linux/btf.h index c451f8e2612a..b12cfe3b12bb 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -359,48 +359,4 @@ static inline int register_btf_kfunc_id_set(enum bpf_prog_type prog_type, } #endif -struct kfunc_btf_id_set { - struct list_head list; - struct btf_id_set *set; - struct module *owner; -}; - -struct kfunc_btf_id_list { - struct list_head list; - struct mutex mutex; -}; - -#ifdef CONFIG_DEBUG_INFO_BTF_MODULES -void register_kfunc_btf_id_set(struct kfunc_btf_id_list *l, - struct kfunc_btf_id_set *s); -void unregister_kfunc_btf_id_set(struct kfunc_btf_id_list *l, - struct kfunc_btf_id_set *s); -bool bpf_check_mod_kfunc_call(struct kfunc_btf_id_list *klist, u32 kfunc_id, - struct module *owner); - -extern struct kfunc_btf_id_list bpf_tcp_ca_kfunc_list; -extern struct kfunc_btf_id_list prog_test_kfunc_list; -#else -static inline void register_kfunc_btf_id_set(struct kfunc_btf_id_list *l, - struct kfunc_btf_id_set *s) -{ -} -static inline void unregister_kfunc_btf_id_set(struct kfunc_btf_id_list *l, - struct kfunc_btf_id_set *s) -{ -} -static inline bool bpf_check_mod_kfunc_call(struct kfunc_btf_id_list *klist, - u32 kfunc_id, struct module *owner) -{ - return false; -} - -static struct kfunc_btf_id_list bpf_tcp_ca_kfunc_list __maybe_unused; -static struct kfunc_btf_id_list prog_test_kfunc_list __maybe_unused; -#endif - -#define DEFINE_KFUNC_BTF_ID_SET(set, name) \ - struct kfunc_btf_id_set name = { LIST_HEAD_INIT(name.list), (set), \ - THIS_MODULE } - #endif -- cgit From d583691c47dc0424ebe926000339a6d6cd590ff7 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Fri, 14 Jan 2022 22:09:47 +0530 Subject: bpf: Introduce mem, size argument pair support for kfunc BPF helpers can associate two adjacent arguments together to pass memory of certain size, using ARG_PTR_TO_MEM and ARG_CONST_SIZE arguments. Since we don't use bpf_func_proto for kfunc, we need to leverage BTF to implement similar support. The ARG_CONST_SIZE processing for helpers is refactored into a common check_mem_size_reg helper that is shared with kfunc as well. kfunc ptr_to_mem support follows logic similar to global functions, where verification is done as if pointer is not null, even when it may be null. This leads to a simple to follow rule for writing kfunc: always check the argument pointer for NULL, except when it is PTR_TO_CTX. Also, the PTR_TO_CTX case is also only safe when the helper expecting pointer to program ctx is not exposed to other programs where same struct is not ctx type. In that case, the type check will fall through to other cases and would permit passing other types of pointers, possibly NULL at runtime. Currently, we require the size argument to be suffixed with "__sz" in the parameter name. This information is then recorded in kernel BTF and verified during function argument checking. In the future we can use BTF tagging instead, and modify the kernel function definitions. This will be a purely kernel-side change. This allows us to have some form of backwards compatibility for structures that are passed in to the kernel function with their size, and allow variable length structures to be passed in if they are accompanied by a size parameter. Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220114163953.1455836-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf_verifier.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 143401d4c9d9..857fd687bdc2 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -521,6 +521,8 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt); int check_ctx_reg(struct bpf_verifier_env *env, const struct bpf_reg_state *reg, int regno); +int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg, + u32 regno); int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg, u32 regno, u32 mem_size); -- cgit From 5c073f26f9dc78a6c8194b23eac7537c9692c7d7 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Fri, 14 Jan 2022 22:09:48 +0530 Subject: bpf: Add reference tracking support to kfunc This patch adds verifier support for PTR_TO_BTF_ID return type of kfunc to be a reference, by reusing acquire_reference_state/release_reference support for existing in-kernel bpf helpers. We make use of the three kfunc types: - BTF_KFUNC_TYPE_ACQUIRE Return true if kfunc_btf_id is an acquire kfunc. This will acquire_reference_state for the returned PTR_TO_BTF_ID (this is the only allow return value). Note that acquire kfunc must always return a PTR_TO_BTF_ID{_OR_NULL}, otherwise the program is rejected. - BTF_KFUNC_TYPE_RELEASE Return true if kfunc_btf_id is a release kfunc. This will release the reference to the passed in PTR_TO_BTF_ID which has a reference state (from earlier acquire kfunc). The btf_check_func_arg_match returns the regno (of argument register, hence > 0) if the kfunc is a release kfunc, and a proper referenced PTR_TO_BTF_ID is being passed to it. This is similar to how helper call check uses bpf_call_arg_meta to store the ref_obj_id that is later used to release the reference. Similar to in-kernel helper, we only allow passing one referenced PTR_TO_BTF_ID as an argument. It can also be passed in to normal kfunc, but in case of release kfunc there must always be one PTR_TO_BTF_ID argument that is referenced. - BTF_KFUNC_TYPE_RET_NULL For kfunc returning PTR_TO_BTF_ID, tells if it can be NULL, hence force caller to mark the pointer not null (using check) before accessing it. Note that taking into account the case fixed by commit 93c230e3f5bd ("bpf: Enforce id generation for all may-be-null register type") we assign a non-zero id for mark_ptr_or_null_reg logic. Later, if more return types are supported by kfunc, which have a _OR_NULL variant, it might be better to move this id generation under a common reg_type_may_be_null check, similar to the case in the commit. Referenced PTR_TO_BTF_ID is currently only limited to kfunc, but can be extended in the future to other BPF helpers as well. For now, we can rely on the btf_struct_ids_match check to ensure we get the pointer to the expected struct type. In the future, care needs to be taken to avoid ambiguity for reference PTR_TO_BTF_ID passed to release function, in case multiple candidates can release same BTF ID. e.g. there might be two release kfuncs (or kfunc and helper): foo(struct abc *p); bar(struct abc *p); ... such that both release a PTR_TO_BTF_ID with btf_id of struct abc. In this case we would need to track the acquire function corresponding to the release function to avoid type confusion, and store this information in the register state so that an incorrect program can be rejected. This is not a problem right now, hence it is left as an exercise for the future patch introducing such a case in the kernel. Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220114163953.1455836-6-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf_verifier.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 857fd687bdc2..ac4797155412 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -566,4 +566,9 @@ static inline u32 type_flag(u32 type) return type & ~BPF_BASE_TYPE_MASK; } +static inline enum bpf_prog_type resolve_prog_type(struct bpf_prog *prog) +{ + return prog->aux->dst_prog ? prog->aux->dst_prog->type : prog->type; +} + #endif /* _LINUX_BPF_VERIFIER_H */ -- cgit From 75ab2b3633ccddd8f7bdf6c76f9ab3f9b2fc5d9d Mon Sep 17 00:00:00 2001 From: Christian König Date: Thu, 28 Oct 2021 13:19:22 +0200 Subject: dma-buf: drop excl_fence parameter from dma_resv_get_fences MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Returning the exclusive fence separately is no longer used. Instead add a write parameter to indicate the use case. Signed-off-by: Christian König Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20211207123411.167006-4-christian.koenig@amd.com --- include/linux/dma-resv.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index eebf04325b34..a715df97b31a 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -458,8 +458,8 @@ void dma_resv_fini(struct dma_resv *obj); int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences); void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence); void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence); -int dma_resv_get_fences(struct dma_resv *obj, struct dma_fence **pfence_excl, - unsigned *pshared_count, struct dma_fence ***pshared); +int dma_resv_get_fences(struct dma_resv *obj, bool write, + unsigned int *num_fences, struct dma_fence ***fences); int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src); long dma_resv_wait_timeout(struct dma_resv *obj, bool wait_all, bool intr, unsigned long timeout); -- cgit From f10d059661968b01ef61a8b516775f95a18ab8ae Mon Sep 17 00:00:00 2001 From: YiFei Zhu Date: Thu, 16 Dec 2021 02:04:25 +0000 Subject: bpf: Make BPF_PROG_RUN_ARRAY return -err instead of allow boolean Right now BPF_PROG_RUN_ARRAY and related macros return 1 or 0 for whether the prog array allows or rejects whatever is being hooked. The caller of these macros then return -EPERM or continue processing based on thw macro's return value. Unforunately this is inflexible, since -EPERM is the only err that can be returned. This patch should be a no-op; it prepares for the next patch. The returning of the -EPERM is moved to inside the macros, so the outer functions are directly returning what the macros returned if they are non-zero. Signed-off-by: YiFei Zhu Reviewed-by: Stanislav Fomichev Link: https://lore.kernel.org/r/788abcdca55886d1f43274c918eaa9f792a9f33b.1639619851.git.zhuyifei@google.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 6d7346c54d83..83da1764fcfe 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1277,7 +1277,7 @@ static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx) typedef u32 (*bpf_prog_run_fn)(const struct bpf_prog *prog, const void *ctx); -static __always_inline u32 +static __always_inline int BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu, const void *ctx, bpf_prog_run_fn run_prog, u32 *ret_flags) @@ -1287,7 +1287,7 @@ BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu, const struct bpf_prog_array *array; struct bpf_run_ctx *old_run_ctx; struct bpf_cg_run_ctx run_ctx; - u32 ret = 1; + int ret = 0; u32 func_ret; migrate_disable(); @@ -1298,7 +1298,8 @@ BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu, while ((prog = READ_ONCE(item->prog))) { run_ctx.prog_item = item; func_ret = run_prog(prog, ctx); - ret &= (func_ret & 1); + if (!(func_ret & 1)) + ret = -EPERM; *(ret_flags) |= (func_ret >> 1); item++; } @@ -1308,7 +1309,7 @@ BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu, return ret; } -static __always_inline u32 +static __always_inline int BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu, const void *ctx, bpf_prog_run_fn run_prog) { @@ -1317,7 +1318,7 @@ BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu, const struct bpf_prog_array *array; struct bpf_run_ctx *old_run_ctx; struct bpf_cg_run_ctx run_ctx; - u32 ret = 1; + int ret = 0; migrate_disable(); rcu_read_lock(); @@ -1326,7 +1327,8 @@ BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu, old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); while ((prog = READ_ONCE(item->prog))) { run_ctx.prog_item = item; - ret &= run_prog(prog, ctx); + if (!run_prog(prog, ctx)) + ret = -EPERM; item++; } bpf_reset_run_ctx(old_run_ctx); @@ -1394,7 +1396,7 @@ out: u32 _ret; \ _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, &_flags); \ _cn = _flags & BPF_RET_SET_CN; \ - if (_ret) \ + if (!_ret) \ _ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS); \ else \ _ret = (_cn ? NET_XMIT_DROP : -EPERM); \ -- cgit From c4dcfdd406aa2167396ac215e351e5e4dfd7efe3 Mon Sep 17 00:00:00 2001 From: YiFei Zhu Date: Thu, 16 Dec 2021 02:04:26 +0000 Subject: bpf: Move getsockopt retval to struct bpf_cg_run_ctx The retval value is moved to struct bpf_cg_run_ctx for ease of access in different prog types with different context structs layouts. The helper implementation (to be added in a later patch in the series) can simply perform a container_of from current->bpf_ctx to retrieve bpf_cg_run_ctx. Unfortunately, there is no easy way to access the current task_struct via the verifier BPF bytecode rewrite, aside from possibly calling a helper, so a pointer to current task is added to struct bpf_sockopt_kern so that the rewritten BPF bytecode can access struct bpf_cg_run_ctx with an indirection. For backward compatibility, if a getsockopt program rejects a syscall by returning 0, an -EPERM will be generated, by having the BPF_PROG_RUN_ARRAY_CG family macros automatically set the retval to -EPERM. Unlike prior to this patch, this -EPERM will be visible to ctx->retval for any other hooks down the line in the prog array. Additionally, the restriction that getsockopt filters can only set the retval to 0 is removed, considering that certain getsockopt implementations may return optlen. Filters are now able to set the value arbitrarily. Signed-off-by: YiFei Zhu Reviewed-by: Stanislav Fomichev Link: https://lore.kernel.org/r/73b0325f5c29912ccea7ea57ec1ed4d388fc1d37.1639619851.git.zhuyifei@google.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 20 +++++++++++--------- include/linux/filter.h | 5 ++++- 2 files changed, 15 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 83da1764fcfe..7b0c11f414d0 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1245,6 +1245,7 @@ struct bpf_run_ctx {}; struct bpf_cg_run_ctx { struct bpf_run_ctx run_ctx; const struct bpf_prog_array_item *prog_item; + int retval; }; struct bpf_trace_run_ctx { @@ -1280,16 +1281,16 @@ typedef u32 (*bpf_prog_run_fn)(const struct bpf_prog *prog, const void *ctx); static __always_inline int BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu, const void *ctx, bpf_prog_run_fn run_prog, - u32 *ret_flags) + int retval, u32 *ret_flags) { const struct bpf_prog_array_item *item; const struct bpf_prog *prog; const struct bpf_prog_array *array; struct bpf_run_ctx *old_run_ctx; struct bpf_cg_run_ctx run_ctx; - int ret = 0; u32 func_ret; + run_ctx.retval = retval; migrate_disable(); rcu_read_lock(); array = rcu_dereference(array_rcu); @@ -1299,27 +1300,28 @@ BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu, run_ctx.prog_item = item; func_ret = run_prog(prog, ctx); if (!(func_ret & 1)) - ret = -EPERM; + run_ctx.retval = -EPERM; *(ret_flags) |= (func_ret >> 1); item++; } bpf_reset_run_ctx(old_run_ctx); rcu_read_unlock(); migrate_enable(); - return ret; + return run_ctx.retval; } static __always_inline int BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu, - const void *ctx, bpf_prog_run_fn run_prog) + const void *ctx, bpf_prog_run_fn run_prog, + int retval) { const struct bpf_prog_array_item *item; const struct bpf_prog *prog; const struct bpf_prog_array *array; struct bpf_run_ctx *old_run_ctx; struct bpf_cg_run_ctx run_ctx; - int ret = 0; + run_ctx.retval = retval; migrate_disable(); rcu_read_lock(); array = rcu_dereference(array_rcu); @@ -1328,13 +1330,13 @@ BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu, while ((prog = READ_ONCE(item->prog))) { run_ctx.prog_item = item; if (!run_prog(prog, ctx)) - ret = -EPERM; + run_ctx.retval = -EPERM; item++; } bpf_reset_run_ctx(old_run_ctx); rcu_read_unlock(); migrate_enable(); - return ret; + return run_ctx.retval; } static __always_inline u32 @@ -1394,7 +1396,7 @@ out: u32 _flags = 0; \ bool _cn; \ u32 _ret; \ - _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, &_flags); \ + _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \ _cn = _flags & BPF_RET_SET_CN; \ if (!_ret) \ _ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS); \ diff --git a/include/linux/filter.h b/include/linux/filter.h index 71fa57b88bfc..d23e999dc032 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1356,7 +1356,10 @@ struct bpf_sockopt_kern { s32 level; s32 optname; s32 optlen; - s32 retval; + /* for retval in struct bpf_cg_run_ctx */ + struct task_struct *current_task; + /* Temporary "register" for indirect stores to ppos. */ + u64 tmp_reg; }; int copy_bpf_fprog_from_user(struct sock_fprog *dst, sockptr_t src, int len); -- cgit From b44123b4a3dcad4664d3a0f72c011ffd4c9c4d93 Mon Sep 17 00:00:00 2001 From: YiFei Zhu Date: Thu, 16 Dec 2021 02:04:27 +0000 Subject: bpf: Add cgroup helpers bpf_{get,set}_retval to get/set syscall return value The helpers continue to use int for retval because all the hooks are int-returning rather than long-returning. The return value of bpf_set_retval is int for future-proofing, in case in the future there may be errors trying to set the retval. After the previous patch, if a program rejects a syscall by returning 0, an -EPERM will be generated no matter if the retval is already set to -err. This patch change it being forced only if retval is not -err. This is because we want to support, for example, invoking bpf_set_retval(-EINVAL) and return 0, and have the syscall return value be -EINVAL not -EPERM. For BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY, the prior behavior is that, if the return value is NET_XMIT_DROP, the packet is silently dropped. We preserve this behavior for backward compatibility reasons, so even if an errno is set, the errno does not return to caller. However, setting a non-err to retval cannot propagate so this is not allowed and we return a -EFAULT in that case. Signed-off-by: YiFei Zhu Reviewed-by: Stanislav Fomichev Link: https://lore.kernel.org/r/b4013fd5d16bed0b01977c1fafdeae12e1de61fb.1639619851.git.zhuyifei@google.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 7b0c11f414d0..dce54eb0aae8 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1299,7 +1299,7 @@ BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu, while ((prog = READ_ONCE(item->prog))) { run_ctx.prog_item = item; func_ret = run_prog(prog, ctx); - if (!(func_ret & 1)) + if (!(func_ret & 1) && !IS_ERR_VALUE((long)run_ctx.retval)) run_ctx.retval = -EPERM; *(ret_flags) |= (func_ret >> 1); item++; @@ -1329,7 +1329,7 @@ BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu, old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); while ((prog = READ_ONCE(item->prog))) { run_ctx.prog_item = item; - if (!run_prog(prog, ctx)) + if (!run_prog(prog, ctx) && !IS_ERR_VALUE((long)run_ctx.retval)) run_ctx.retval = -EPERM; item++; } @@ -1389,7 +1389,7 @@ out: * 0: NET_XMIT_SUCCESS skb should be transmitted * 1: NET_XMIT_DROP skb should be dropped and cn * 2: NET_XMIT_CN skb should be transmitted and cn - * 3: -EPERM skb should be dropped + * 3: -err skb should be dropped */ #define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func) \ ({ \ @@ -1398,10 +1398,12 @@ out: u32 _ret; \ _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \ _cn = _flags & BPF_RET_SET_CN; \ + if (_ret && !IS_ERR_VALUE((long)_ret)) \ + _ret = -EFAULT; \ if (!_ret) \ _ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS); \ else \ - _ret = (_cn ? NET_XMIT_DROP : -EPERM); \ + _ret = (_cn ? NET_XMIT_DROP : _ret); \ _ret; \ }) -- cgit From 5298d4bfe80f6ae6ae2777bcd1357b0022d98573 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 18 Jan 2022 07:56:14 +0100 Subject: unicode: clean up the Kconfig symbol confusion Turn the CONFIG_UNICODE symbol into a tristate that generates some always built in code and remove the confusing CONFIG_UNICODE_UTF8_DATA symbol. Note that a lot of the IS_ENABLED() checks could be turned from cpp statements into normal ifs, but this change is intended to be fairly mechanic, so that should be cleaned up later. Fixes: 2b3d04787012 ("unicode: Add utf8-data module") Reported-by: Linus Torvalds Reviewed-by: Eric Biggers Signed-off-by: Christoph Hellwig Signed-off-by: Gabriel Krisman Bertazi --- include/linux/fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index c8510da6cc6d..fdac22d16c2b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1490,7 +1490,7 @@ struct super_block { #ifdef CONFIG_FS_VERITY const struct fsverity_operations *s_vop; #endif -#ifdef CONFIG_UNICODE +#if IS_ENABLED(CONFIG_UNICODE) struct unicode_map *s_encoding; __u16 s_encoding_flags; #endif -- cgit From 748cd5729ac7421091316e32dcdffb0578563880 Mon Sep 17 00:00:00 2001 From: Di Zhu Date: Wed, 19 Jan 2022 09:40:04 +0800 Subject: bpf: support BPF_PROG_QUERY for progs attached to sockmap Right now there is no way to query whether BPF programs are attached to a sockmap or not. we can use the standard interface in libbpf to query, such as: bpf_prog_query(mapFd, BPF_SK_SKB_STREAM_PARSER, 0, NULL, ...); the mapFd is the fd of sockmap. Signed-off-by: Di Zhu Acked-by: Yonghong Song Reviewed-by: Jakub Sitnicki Link: https://lore.kernel.org/r/20220119014005.1209-1-zhudi2@huawei.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index dce54eb0aae8..80e3387ea3af 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2069,6 +2069,9 @@ int bpf_prog_test_run_syscall(struct bpf_prog *prog, int sock_map_get_from_fd(const union bpf_attr *attr, struct bpf_prog *prog); int sock_map_prog_detach(const union bpf_attr *attr, enum bpf_prog_type ptype); int sock_map_update_elem_sys(struct bpf_map *map, void *key, void *value, u64 flags); +int sock_map_bpf_prog_query(const union bpf_attr *attr, + union bpf_attr __user *uattr); + void sock_map_unhash(struct sock *sk); void sock_map_close(struct sock *sk, long timeout); #else @@ -2122,6 +2125,12 @@ static inline int sock_map_update_elem_sys(struct bpf_map *map, void *key, void { return -EOPNOTSUPP; } + +static inline int sock_map_bpf_prog_query(const union bpf_attr *attr, + union bpf_attr __user *uattr) +{ + return -EINVAL; +} #endif /* CONFIG_BPF_SYSCALL */ #endif /* CONFIG_NET && CONFIG_BPF_SYSCALL */ -- cgit From d16697cb6261d4cc23422e6b1cb2759df8aa76d0 Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Fri, 21 Jan 2022 11:09:44 +0100 Subject: net: skbuff: add size metadata to skb_shared_info for xdp Introduce xdp_frags_size field in skb_shared_info data structure to store xdp_buff/xdp_frame frame paged size (xdp_frags_size will be used in xdp frags support). In order to not increase skb_shared_info size we will use a hole due to skb_shared_info alignment. Acked-by: Toke Hoiland-Jorgensen Acked-by: John Fastabend Acked-by: Jesper Dangaard Brouer Signed-off-by: Lorenzo Bianconi Link: https://lore.kernel.org/r/8a849819a3e0a143d540f78a3a5add76e17e980d.1642758637.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov --- include/linux/skbuff.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index bf11e1fbd69b..8131d0de7559 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -557,6 +557,7 @@ struct skb_shared_info { * Warning : all fields before dataref are cleared in __alloc_skb() */ atomic_t dataref; + unsigned int xdp_frags_size; /* Intermediate layers must ensure that destructor_arg * remains valid until skb destructor */ -- cgit From c2f2cdbeffda7b153c19e0f3d73149c41026c0db Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Fri, 21 Jan 2022 11:09:52 +0100 Subject: bpf: introduce BPF_F_XDP_HAS_FRAGS flag in prog_flags loading the ebpf program Introduce BPF_F_XDP_HAS_FRAGS and the related field in bpf_prog_aux in order to notify the driver the loaded program support xdp frags. Acked-by: Toke Hoiland-Jorgensen Acked-by: John Fastabend Signed-off-by: Lorenzo Bianconi Link: https://lore.kernel.org/r/db2e8075b7032a356003f407d1b0deb99adaa0ed.1642758637.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 80e3387ea3af..e93ed028a030 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -933,6 +933,7 @@ struct bpf_prog_aux { bool func_proto_unreliable; bool sleepable; bool tail_call_reachable; + bool xdp_has_frags; struct hlist_node tramp_hlist; /* BTF_KIND_FUNC_PROTO for valid attach_btf_id */ const struct btf_type *attach_func_proto; -- cgit From f45d5b6ce2e835834c94b8b700787984f02cd662 Mon Sep 17 00:00:00 2001 From: Toke Hoiland-Jorgensen Date: Fri, 21 Jan 2022 11:10:02 +0100 Subject: bpf: generalise tail call map compatibility check The check for tail call map compatibility ensures that tail calls only happen between maps of the same type. To ensure backwards compatibility for XDP frags we need a similar type of check for cpumap and devmap programs, so move the state from bpf_array_aux into bpf_map, add xdp_has_frags to the check, and apply the same check to cpumap and devmap. Acked-by: John Fastabend Co-developed-by: Lorenzo Bianconi Signed-off-by: Lorenzo Bianconi Signed-off-by: Toke Hoiland-Jorgensen Link: https://lore.kernel.org/r/f19fd97c0328a39927f3ad03e1ca6b43fd53cdfd.1642758637.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 30 +++++++++++++++++++----------- 1 file changed, 19 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index e93ed028a030..e8ec8d2f2fe3 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -194,6 +194,17 @@ struct bpf_map { struct work_struct work; struct mutex freeze_mutex; atomic64_t writecnt; + /* 'Ownership' of program-containing map is claimed by the first program + * that is going to use this map or by the first program which FD is + * stored in the map to make sure that all callers and callees have the + * same prog type, JITed flag and xdp_has_frags flag. + */ + struct { + spinlock_t lock; + enum bpf_prog_type type; + bool jited; + bool xdp_has_frags; + } owner; }; static inline bool map_value_has_spin_lock(const struct bpf_map *map) @@ -994,16 +1005,6 @@ struct bpf_prog_aux { }; struct bpf_array_aux { - /* 'Ownership' of prog array is claimed by the first program that - * is going to use this map or by the first program which FD is - * stored in the map to make sure that all callers and callees have - * the same prog type and JITed flag. - */ - struct { - spinlock_t lock; - enum bpf_prog_type type; - bool jited; - } owner; /* Programs with direct jumps into programs part of this array. */ struct list_head poke_progs; struct bpf_map *map; @@ -1178,7 +1179,14 @@ struct bpf_event_entry { struct rcu_head rcu; }; -bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp); +static inline bool map_type_contains_progs(struct bpf_map *map) +{ + return map->map_type == BPF_MAP_TYPE_PROG_ARRAY || + map->map_type == BPF_MAP_TYPE_DEVMAP || + map->map_type == BPF_MAP_TYPE_CPUMAP; +} + +bool bpf_prog_map_compatible(struct bpf_map *map, const struct bpf_prog *fp); int bpf_prog_calc_tag(struct bpf_prog *fp); const struct bpf_func_proto *bpf_get_trace_printk_proto(void); -- cgit From 96489c1c0b53131b0e1ec33e2060538379ad6152 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Thu, 16 Dec 2021 12:16:38 +0100 Subject: mtd: nand: ecc: Add infrastructure to support hardware engines Add the necessary helpers to register/unregister hardware ECC engines that will be called from ECC engine drivers. Also add helpers to get the right engine from the user perspective. Keep a reference of the in use ECC engine in order to prevent modules to be unloaded. Put the reference when the engine gets retired. A static list of hardware (only) ECC engines is setup to keep track of the registered engines. Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20211216111654.238086-13-miquel.raynal@bootlin.com --- include/linux/mtd/nand.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h index 32fc7edf65b3..4ddd20fe9c9e 100644 --- a/include/linux/mtd/nand.h +++ b/include/linux/mtd/nand.h @@ -263,12 +263,36 @@ struct nand_ecc_engine_ops { struct nand_page_io_req *req); }; +/** + * enum nand_ecc_engine_integration - How the NAND ECC engine is integrated + * @NAND_ECC_ENGINE_INTEGRATION_INVALID: Invalid value + * @NAND_ECC_ENGINE_INTEGRATION_PIPELINED: Pipelined engine, performs on-the-fly + * correction, does not need to copy + * data around + * @NAND_ECC_ENGINE_INTEGRATION_EXTERNAL: External engine, needs to bring the + * data into its own area before use + */ +enum nand_ecc_engine_integration { + NAND_ECC_ENGINE_INTEGRATION_INVALID, + NAND_ECC_ENGINE_INTEGRATION_PIPELINED, + NAND_ECC_ENGINE_INTEGRATION_EXTERNAL, +}; + /** * struct nand_ecc_engine - ECC engine abstraction for NAND devices + * @dev: Host device + * @node: Private field for registration time * @ops: ECC engine operations + * @integration: How the engine is integrated with the host + * (only relevant on %NAND_ECC_ENGINE_TYPE_ON_HOST engines) + * @priv: Private data */ struct nand_ecc_engine { + struct device *dev; + struct list_head node; struct nand_ecc_engine_ops *ops; + enum nand_ecc_engine_integration integration; + void *priv; }; void of_get_nand_ecc_user_config(struct nand_device *nand); @@ -279,8 +303,12 @@ int nand_ecc_prepare_io_req(struct nand_device *nand, int nand_ecc_finish_io_req(struct nand_device *nand, struct nand_page_io_req *req); bool nand_ecc_is_strong_enough(struct nand_device *nand); +int nand_ecc_register_on_host_hw_engine(struct nand_ecc_engine *engine); +int nand_ecc_unregister_on_host_hw_engine(struct nand_ecc_engine *engine); struct nand_ecc_engine *nand_ecc_get_sw_engine(struct nand_device *nand); struct nand_ecc_engine *nand_ecc_get_on_die_hw_engine(struct nand_device *nand); +struct nand_ecc_engine *nand_ecc_get_on_host_hw_engine(struct nand_device *nand); +void nand_ecc_put_on_host_hw_engine(struct nand_device *nand); #if IS_ENABLED(CONFIG_MTD_NAND_ECC_SW_HAMMING) struct nand_ecc_engine *nand_ecc_sw_hamming_get_engine(void); -- cgit From cda32a618debd3fad8e42757b198719ae180f8f4 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Thu, 16 Dec 2021 12:16:39 +0100 Subject: mtd: nand: Add a new helper to retrieve the ECC context Introduce nand_to_ecc_ctx() which will allow to easily jump to the private pointer of an ECC context given a NAND device. This is very handy, from the prepare or finish ECC hook, to get the internal context out of the NAND device object. Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20211216111654.238086-14-miquel.raynal@bootlin.com --- include/linux/mtd/nand.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h index 4ddd20fe9c9e..b617efa0a881 100644 --- a/include/linux/mtd/nand.h +++ b/include/linux/mtd/nand.h @@ -990,6 +990,11 @@ int nanddev_markbad(struct nand_device *nand, const struct nand_pos *pos); int nanddev_ecc_engine_init(struct nand_device *nand); void nanddev_ecc_engine_cleanup(struct nand_device *nand); +static inline void *nand_to_ecc_ctx(struct nand_device *nand) +{ + return nand->ecc.ctx.priv; +} + /* BBT related functions */ enum nand_bbt_block_status { NAND_BBT_BLOCK_STATUS_UNKNOWN, -- cgit From 02d1d0e4dfc3fdc5aa05b78e7def00dc1e62257e Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Fri, 7 Jan 2022 10:46:11 -0800 Subject: mtd: rawnand: brcmnand: Add platform data structure for BCMA Update the BCMA's chipcommon nand flash driver to detect which chip-select is used and pass that information via platform data to the brcmnand driver. Make sure that the brcmnand platform data structure is always at the beginning of the platform data of the "nflash" device created by BCMA to allow brcmnand to safely de-reference it. Signed-off-by: Florian Fainelli Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20220107184614.2670254-7-f.fainelli@gmail.com --- include/linux/bcma/bcma_driver_chipcommon.h | 5 +++++ include/linux/platform_data/brcmnand.h | 12 ++++++++++++ 2 files changed, 17 insertions(+) create mode 100644 include/linux/platform_data/brcmnand.h (limited to 'include/linux') diff --git a/include/linux/bcma/bcma_driver_chipcommon.h b/include/linux/bcma/bcma_driver_chipcommon.h index d35b9206096d..e3314f746bfa 100644 --- a/include/linux/bcma/bcma_driver_chipcommon.h +++ b/include/linux/bcma/bcma_driver_chipcommon.h @@ -3,6 +3,7 @@ #define LINUX_BCMA_DRIVER_CC_H_ #include +#include #include /** ChipCommon core registers. **/ @@ -599,6 +600,10 @@ struct bcma_sflash { #ifdef CONFIG_BCMA_NFLASH struct bcma_nflash { + /* Must be the fist member for the brcmnand driver to + * de-reference that structure. + */ + struct brcmnand_platform_data brcmnand_info; bool present; bool boot; /* This is the flash the SoC boots from */ }; diff --git a/include/linux/platform_data/brcmnand.h b/include/linux/platform_data/brcmnand.h new file mode 100644 index 000000000000..8b8777985dce --- /dev/null +++ b/include/linux/platform_data/brcmnand.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef BRCMNAND_PLAT_DATA_H +#define BRCMNAND_PLAT_DATA_H + +struct brcmnand_platform_data { + int chip_select; + const char * const *part_probe_types; + unsigned int ecc_stepsize; + unsigned int ecc_strength; +}; + +#endif /* BRCMNAND_PLAT_DATA_H */ -- cgit From 1e73d7f689c7a8fa13f78fe8d6be908fdceef17a Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Wed, 15 Dec 2021 16:13:35 +0100 Subject: iio: core: Fix the kernel doc regarding the currentmode iio_dev entry This is an internal variable, which should be accessed in a very sporadic way and in no case changed by any device driver. Signed-off-by: Miquel Raynal Reviewed-by: Alexandru Ardelean Link: https://lore.kernel.org/r/20211215151344.163036-2-miquel.raynal@bootlin.com Signed-off-by: Jonathan Cameron --- include/linux/iio/iio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h index 07025d6b3de1..faf00f2c0be6 100644 --- a/include/linux/iio/iio.h +++ b/include/linux/iio/iio.h @@ -489,7 +489,7 @@ struct iio_buffer_setup_ops { /** * struct iio_dev - industrial I/O device * @modes: [DRIVER] operating modes supported by device - * @currentmode: [DRIVER] current operating mode + * @currentmode: [INTERN] current operating mode * @dev: [DRIVER] device structure, should be assigned a parent * and owner * @buffer: [DRIVER] any buffer present -- cgit From da5936770517aef8b28888f1123fa654c78cc2f9 Mon Sep 17 00:00:00 2001 From: Nuno Sá Date: Sat, 22 Jan 2022 14:09:04 +0100 Subject: adis: simplify 'adis_update_bits' macros MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There's no need to use '__builtin_choose_expr' to choose the right call to 'adis_update_bits_base()'. We can change the 'BUILD_BUG_ON()' condition so that it makes sure only the supported sizes are passed in. With that, we can just use 'sizeof(val)' as the size argument of 'adis_update_bits_base()'. Signed-off-by: Nuno Sá Link: https://lore.kernel.org/r/20220122130905.99-2-nuno.sa@analog.com Signed-off-by: Jonathan Cameron --- include/linux/iio/imu/adis.h | 12 ++++-------- 1 file changed, 4 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iio/imu/adis.h b/include/linux/iio/imu/adis.h index 7c02f5292eea..11754f97d8bb 100644 --- a/include/linux/iio/imu/adis.h +++ b/include/linux/iio/imu/adis.h @@ -381,10 +381,8 @@ static inline int adis_update_bits_base(struct adis *adis, unsigned int reg, * @val can lead to undesired behavior if the register to update is 16bit. */ #define adis_update_bits(adis, reg, mask, val) ({ \ - BUILD_BUG_ON(sizeof(val) == 1 || sizeof(val) == 8); \ - __builtin_choose_expr(sizeof(val) == 4, \ - adis_update_bits_base(adis, reg, mask, val, 4), \ - adis_update_bits_base(adis, reg, mask, val, 2)); \ + BUILD_BUG_ON(sizeof(val) != 2 && sizeof(val) != 4); \ + adis_update_bits_base(adis, reg, mask, val, sizeof(val)); \ }) /** @@ -399,10 +397,8 @@ static inline int adis_update_bits_base(struct adis *adis, unsigned int reg, * @val can lead to undesired behavior if the register to update is 16bit. */ #define __adis_update_bits(adis, reg, mask, val) ({ \ - BUILD_BUG_ON(sizeof(val) == 1 || sizeof(val) == 8); \ - __builtin_choose_expr(sizeof(val) == 4, \ - __adis_update_bits_base(adis, reg, mask, val, 4), \ - __adis_update_bits_base(adis, reg, mask, val, 2)); \ + BUILD_BUG_ON(sizeof(val) != 2 && sizeof(val) != 4); \ + __adis_update_bits_base(adis, reg, mask, val, sizeof(val)); \ }) int adis_enable_irq(struct adis *adis, bool enable); -- cgit From c39010ea6ba13bdf0003bd353e1d4c663aaac0a8 Mon Sep 17 00:00:00 2001 From: Nuno Sá Date: Sat, 22 Jan 2022 14:09:05 +0100 Subject: iio: adis: stylistic changes MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Minor stylistic changes to address checkptach complains when called with '--strict'. Signed-off-by: Nuno Sá Link: https://lore.kernel.org/r/20220122130905.99-3-nuno.sa@analog.com Signed-off-by: Jonathan Cameron --- include/linux/iio/imu/adis.h | 48 +++++++++++++++++++++++--------------------- 1 file changed, 25 insertions(+), 23 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iio/imu/adis.h b/include/linux/iio/imu/adis.h index 11754f97d8bb..515ca09764fe 100644 --- a/include/linux/iio/imu/adis.h +++ b/include/linux/iio/imu/adis.h @@ -32,6 +32,7 @@ struct adis_timeout { u16 sw_reset_ms; u16 self_test_ms; }; + /** * struct adis_data - ADIS chip variant specific data * @read_delay: SPI delay for read operations in us @@ -45,7 +46,7 @@ struct adis_timeout { * @self_test_mask: Bitmask of supported self-test operations * @self_test_reg: Register address to request self test command * @self_test_no_autoclear: True if device's self-test needs clear of ctrl reg - * @status_error_msgs: Array of error messgaes + * @status_error_msgs: Array of error messages * @status_error_mask: Bitmask of errors supported by the device * @timeouts: Chip specific delays * @enable_irq: Hook for ADIS devices that have a special IRQ enable/disable @@ -130,12 +131,12 @@ struct adis { unsigned long irq_flag; void *buffer; - uint8_t tx[10] ____cacheline_aligned; - uint8_t rx[4]; + u8 tx[10] ____cacheline_aligned; + u8 rx[4]; }; int adis_init(struct adis *adis, struct iio_dev *indio_dev, - struct spi_device *spi, const struct adis_data *data); + struct spi_device *spi, const struct adis_data *data); int __adis_reset(struct adis *adis); /** @@ -156,9 +157,9 @@ static inline int adis_reset(struct adis *adis) } int __adis_write_reg(struct adis *adis, unsigned int reg, - unsigned int val, unsigned int size); + unsigned int val, unsigned int size); int __adis_read_reg(struct adis *adis, unsigned int reg, - unsigned int *val, unsigned int size); + unsigned int *val, unsigned int size); /** * __adis_write_reg_8() - Write single byte to a register (unlocked) @@ -167,7 +168,7 @@ int __adis_read_reg(struct adis *adis, unsigned int reg, * @value: The value to write */ static inline int __adis_write_reg_8(struct adis *adis, unsigned int reg, - uint8_t val) + u8 val) { return __adis_write_reg(adis, reg, val, 1); } @@ -179,7 +180,7 @@ static inline int __adis_write_reg_8(struct adis *adis, unsigned int reg, * @value: Value to be written */ static inline int __adis_write_reg_16(struct adis *adis, unsigned int reg, - uint16_t val) + u16 val) { return __adis_write_reg(adis, reg, val, 2); } @@ -191,7 +192,7 @@ static inline int __adis_write_reg_16(struct adis *adis, unsigned int reg, * @value: Value to be written */ static inline int __adis_write_reg_32(struct adis *adis, unsigned int reg, - uint32_t val) + u32 val) { return __adis_write_reg(adis, reg, val, 4); } @@ -203,7 +204,7 @@ static inline int __adis_write_reg_32(struct adis *adis, unsigned int reg, * @val: The value read back from the device */ static inline int __adis_read_reg_16(struct adis *adis, unsigned int reg, - uint16_t *val) + u16 *val) { unsigned int tmp; int ret; @@ -222,7 +223,7 @@ static inline int __adis_read_reg_16(struct adis *adis, unsigned int reg, * @val: The value read back from the device */ static inline int __adis_read_reg_32(struct adis *adis, unsigned int reg, - uint32_t *val) + u32 *val) { unsigned int tmp; int ret; @@ -242,7 +243,7 @@ static inline int __adis_read_reg_32(struct adis *adis, unsigned int reg, * @size: The size of the @value (in bytes) */ static inline int adis_write_reg(struct adis *adis, unsigned int reg, - unsigned int val, unsigned int size) + unsigned int val, unsigned int size) { int ret; @@ -261,7 +262,7 @@ static inline int adis_write_reg(struct adis *adis, unsigned int reg, * @size: The size of the @val buffer */ static int adis_read_reg(struct adis *adis, unsigned int reg, - unsigned int *val, unsigned int size) + unsigned int *val, unsigned int size) { int ret; @@ -279,7 +280,7 @@ static int adis_read_reg(struct adis *adis, unsigned int reg, * @value: The value to write */ static inline int adis_write_reg_8(struct adis *adis, unsigned int reg, - uint8_t val) + u8 val) { return adis_write_reg(adis, reg, val, 1); } @@ -291,7 +292,7 @@ static inline int adis_write_reg_8(struct adis *adis, unsigned int reg, * @value: Value to be written */ static inline int adis_write_reg_16(struct adis *adis, unsigned int reg, - uint16_t val) + u16 val) { return adis_write_reg(adis, reg, val, 2); } @@ -303,7 +304,7 @@ static inline int adis_write_reg_16(struct adis *adis, unsigned int reg, * @value: Value to be written */ static inline int adis_write_reg_32(struct adis *adis, unsigned int reg, - uint32_t val) + u32 val) { return adis_write_reg(adis, reg, val, 4); } @@ -315,7 +316,7 @@ static inline int adis_write_reg_32(struct adis *adis, unsigned int reg, * @val: The value read back from the device */ static inline int adis_read_reg_16(struct adis *adis, unsigned int reg, - uint16_t *val) + u16 *val) { unsigned int tmp; int ret; @@ -334,7 +335,7 @@ static inline int adis_read_reg_16(struct adis *adis, unsigned int reg, * @val: The value read back from the device */ static inline int adis_read_reg_32(struct adis *adis, unsigned int reg, - uint32_t *val) + u32 *val) { unsigned int tmp; int ret; @@ -439,8 +440,8 @@ static inline void adis_dev_unlock(struct adis *adis) } int adis_single_conversion(struct iio_dev *indio_dev, - const struct iio_chan_spec *chan, unsigned int error_mask, - int *val); + const struct iio_chan_spec *chan, + unsigned int error_mask, int *val); #define ADIS_VOLTAGE_CHAN(addr, si, chan, name, info_all, bits) { \ .type = IIO_VOLTAGE, \ @@ -489,7 +490,7 @@ int adis_single_conversion(struct iio_dev *indio_dev, .modified = 1, \ .channel2 = IIO_MOD_ ## mod, \ .info_mask_separate = BIT(IIO_CHAN_INFO_RAW) | \ - info_sep, \ + (info_sep), \ .info_mask_shared_by_type = BIT(IIO_CHAN_INFO_SCALE), \ .info_mask_shared_by_all = info_all, \ .address = (addr), \ @@ -523,7 +524,7 @@ devm_adis_setup_buffer_and_trigger(struct adis *adis, struct iio_dev *indio_dev, int devm_adis_probe_trigger(struct adis *adis, struct iio_dev *indio_dev); int adis_update_scan_mode(struct iio_dev *indio_dev, - const unsigned long *scan_mask); + const unsigned long *scan_mask); #else /* CONFIG_IIO_BUFFER */ @@ -547,7 +548,8 @@ static inline int devm_adis_probe_trigger(struct adis *adis, #ifdef CONFIG_DEBUG_FS int adis_debugfs_reg_access(struct iio_dev *indio_dev, - unsigned int reg, unsigned int writeval, unsigned int *readval); + unsigned int reg, unsigned int writeval, + unsigned int *readval); #else -- cgit From f1ba938e4f98941dc2b77795062e49444ec1fee1 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Wed, 19 Jan 2022 00:09:13 +0100 Subject: spi: s3c64xx: Delete unused boardfile helpers The helpers to use SPI host 1 and 2 are unused in the kernel and taking up space and maintenance hours. New systems should use device tree and not this, so delete the code. Cc: linux-samsung-soc@vger.kernel.org Cc: Krzysztof Kozlowski Cc: Sylwester Nawrocki Signed-off-by: Linus Walleij Reviewed-by: Sam Protsenko Reviewed-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20220118230915.157797-1-linus.walleij@linaro.org Signed-off-by: Mark Brown --- include/linux/platform_data/spi-s3c64xx.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/spi-s3c64xx.h b/include/linux/platform_data/spi-s3c64xx.h index 773daf7915a3..19d690f34670 100644 --- a/include/linux/platform_data/spi-s3c64xx.h +++ b/include/linux/platform_data/spi-s3c64xx.h @@ -52,17 +52,9 @@ struct s3c64xx_spi_info { */ extern void s3c64xx_spi0_set_platdata(int (*cfg_gpio)(void), int src_clk_nr, int num_cs); -extern void s3c64xx_spi1_set_platdata(int (*cfg_gpio)(void), int src_clk_nr, - int num_cs); -extern void s3c64xx_spi2_set_platdata(int (*cfg_gpio)(void), int src_clk_nr, - int num_cs); /* defined by architecture to configure gpio */ extern int s3c64xx_spi0_cfg_gpio(void); -extern int s3c64xx_spi1_cfg_gpio(void); -extern int s3c64xx_spi2_cfg_gpio(void); extern struct s3c64xx_spi_info s3c64xx_spi0_pdata; -extern struct s3c64xx_spi_info s3c64xx_spi1_pdata; -extern struct s3c64xx_spi_info s3c64xx_spi2_pdata; #endif /*__SPI_S3C64XX_H */ -- cgit From 3b5529ae7f3578da633e8ae2ec0715a55a248f9f Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Wed, 19 Jan 2022 00:09:14 +0100 Subject: spi: s3c64xx: Drop custom gpio setup argument The SPI0 platform population function was taking a custom gpio setup callback but the only user pass NULL as argument so drop this argument. Cc: linux-samsung-soc@vger.kernel.org Cc: Krzysztof Kozlowski Cc: Sylwester Nawrocki Signed-off-by: Linus Walleij Reviewed-by: Krzysztof Kozlowski Reviewed-by: Sam Protsenko Link: https://lore.kernel.org/r/20220118230915.157797-2-linus.walleij@linaro.org Signed-off-by: Mark Brown --- include/linux/platform_data/spi-s3c64xx.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/spi-s3c64xx.h b/include/linux/platform_data/spi-s3c64xx.h index 19d690f34670..10890a4b55b9 100644 --- a/include/linux/platform_data/spi-s3c64xx.h +++ b/include/linux/platform_data/spi-s3c64xx.h @@ -43,15 +43,13 @@ struct s3c64xx_spi_info { /** * s3c64xx_spi_set_platdata - SPI Controller configure callback by the board * initialization code. - * @cfg_gpio: Pointer to gpio setup function. * @src_clk_nr: Clock the SPI controller is to use to generate SPI clocks. * @num_cs: Number of elements in the 'cs' array. * * Call this from machine init code for each SPI Controller that * has some chips attached to it. */ -extern void s3c64xx_spi0_set_platdata(int (*cfg_gpio)(void), int src_clk_nr, - int num_cs); +extern void s3c64xx_spi0_set_platdata(int src_clk_nr, int num_cs); /* defined by architecture to configure gpio */ extern int s3c64xx_spi0_cfg_gpio(void); -- cgit From a45cf3cc72dd9cfde9db8af32cdf9c431f53f9bc Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Wed, 19 Jan 2022 00:09:15 +0100 Subject: spi: s3c64xx: Convert to use GPIO descriptors Convert the S3C64xx SPI host to use GPIO descriptors. Provide GPIO descriptor tables for the one user with CS 0 and 1. Cc: linux-samsung-soc@vger.kernel.org Cc: Sylwester Nawrocki Reviewed-by: Krzysztof Kozlowski Reviewed-by: Sam Protsenko Signed-off-by: Linus Walleij Link: https://lore.kernel.org/r/20220118230915.157797-3-linus.walleij@linaro.org Signed-off-by: Mark Brown --- include/linux/platform_data/spi-s3c64xx.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/spi-s3c64xx.h b/include/linux/platform_data/spi-s3c64xx.h index 10890a4b55b9..5df1ace6d2c9 100644 --- a/include/linux/platform_data/spi-s3c64xx.h +++ b/include/linux/platform_data/spi-s3c64xx.h @@ -16,7 +16,6 @@ struct platform_device; * struct s3c64xx_spi_csinfo - ChipSelect description * @fb_delay: Slave specific feedback delay. * Refer to FB_CLK_SEL register definition in SPI chapter. - * @line: Custom 'identity' of the CS line. * * This is per SPI-Slave Chipselect information. * Allocate and initialize one in machine init code and make the @@ -24,7 +23,6 @@ struct platform_device; */ struct s3c64xx_spi_csinfo { u8 fb_delay; - unsigned line; }; /** -- cgit From 7f2a3cf4e6077a1525092f114be7819e505773a1 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Wed, 19 Jan 2022 01:09:14 +0100 Subject: spi: s3c24xx: Convert to GPIO descriptors This driver has a bunch of custom oldstyle GPIO number-passing fields and a custom set-up callback. The good thing is: nothing in the kernel is using it. Convert the driver to use GPIO descriptors with a SPI_MASTER_GPIO_SS flag so that the local CS callback also get invoked as the hardware needs this. New users of this driver can provide GPIO descriptor tables like the other converted drivers. Cc: linux-samsung-soc@vger.kernel.org Cc: Krzysztof Kozlowski Cc: Sylwester Nawrocki Signed-off-by: Linus Walleij Reviewed-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20220119000914.192553-1-linus.walleij@linaro.org Signed-off-by: Mark Brown --- include/linux/spi/s3c24xx.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/s3c24xx.h b/include/linux/spi/s3c24xx.h index 440a71593162..9b8bb22d5b0c 100644 --- a/include/linux/spi/s3c24xx.h +++ b/include/linux/spi/s3c24xx.h @@ -10,14 +10,9 @@ #define __LINUX_SPI_S3C24XX_H __FILE__ struct s3c2410_spi_info { - int pin_cs; /* simple gpio cs */ unsigned int num_cs; /* total chipselects */ int bus_num; /* bus number to use. */ - unsigned int use_fiq:1; /* use fiq */ - - void (*gpio_setup)(struct s3c2410_spi_info *spi, int enable); - void (*set_cs)(struct s3c2410_spi_info *spi, int cs, int pol); }; extern int s3c24xx_set_fiq(unsigned int irq, u32 *ack_ptr, bool on); -- cgit From 376040e47334c6dc6a939a32197acceb00fe4acf Mon Sep 17 00:00:00 2001 From: Kenny Yu Date: Mon, 24 Jan 2022 10:54:01 -0800 Subject: bpf: Add bpf_copy_from_user_task() helper This adds a helper for bpf programs to read the memory of other tasks. As an example use case at Meta, we are using a bpf task iterator program and this new helper to print C++ async stack traces for all threads of a given process. Signed-off-by: Kenny Yu Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/r/20220124185403.468466-3-kennyyu@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 8c92c974bd12..394305a5e02f 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2243,6 +2243,7 @@ extern const struct bpf_func_proto bpf_kallsyms_lookup_name_proto; extern const struct bpf_func_proto bpf_find_vma_proto; extern const struct bpf_func_proto bpf_loop_proto; extern const struct bpf_func_proto bpf_strncmp_proto; +extern const struct bpf_func_proto bpf_copy_from_user_task_proto; const struct bpf_func_proto *tracing_prog_func_proto( enum bpf_func_id func_id, const struct bpf_prog *prog); -- cgit From 1dc01abad6544cb9d884071b626b706e37aa9601 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Thu, 13 Jan 2022 16:53:57 +0100 Subject: cpumask: Always inline helpers which use bit manipulation functions Former are always inlined so do that for the latter too, for consistency. Size impact is a whopping 5 bytes increase! :-) text data bss dec hex filename 22350551 8213184 1917164 32480899 1ef9e83 vmlinux.x86-64.defconfig.before 22350556 8213152 1917164 32480872 1ef9e68 vmlinux.x86-64.defconfig.after Signed-off-by: Borislav Petkov Signed-off-by: Peter Zijlstra (Intel) Acked-by: Marco Elver Link: https://lore.kernel.org/r/20220113155357.4706-3-bp@alien8.de --- include/linux/cpumask.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 64dae70d31f5..6b06c698cd2a 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -341,12 +341,12 @@ extern int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool * @cpu: cpu number (< nr_cpu_ids) * @dstp: the cpumask pointer */ -static inline void cpumask_set_cpu(unsigned int cpu, struct cpumask *dstp) +static __always_inline void cpumask_set_cpu(unsigned int cpu, struct cpumask *dstp) { set_bit(cpumask_check(cpu), cpumask_bits(dstp)); } -static inline void __cpumask_set_cpu(unsigned int cpu, struct cpumask *dstp) +static __always_inline void __cpumask_set_cpu(unsigned int cpu, struct cpumask *dstp) { __set_bit(cpumask_check(cpu), cpumask_bits(dstp)); } @@ -357,12 +357,12 @@ static inline void __cpumask_set_cpu(unsigned int cpu, struct cpumask *dstp) * @cpu: cpu number (< nr_cpu_ids) * @dstp: the cpumask pointer */ -static inline void cpumask_clear_cpu(int cpu, struct cpumask *dstp) +static __always_inline void cpumask_clear_cpu(int cpu, struct cpumask *dstp) { clear_bit(cpumask_check(cpu), cpumask_bits(dstp)); } -static inline void __cpumask_clear_cpu(int cpu, struct cpumask *dstp) +static __always_inline void __cpumask_clear_cpu(int cpu, struct cpumask *dstp) { __clear_bit(cpumask_check(cpu), cpumask_bits(dstp)); } @@ -374,7 +374,7 @@ static inline void __cpumask_clear_cpu(int cpu, struct cpumask *dstp) * * Returns 1 if @cpu is set in @cpumask, else returns 0 */ -static inline int cpumask_test_cpu(int cpu, const struct cpumask *cpumask) +static __always_inline int cpumask_test_cpu(int cpu, const struct cpumask *cpumask) { return test_bit(cpumask_check(cpu), cpumask_bits((cpumask))); } @@ -388,7 +388,7 @@ static inline int cpumask_test_cpu(int cpu, const struct cpumask *cpumask) * * test_and_set_bit wrapper for cpumasks. */ -static inline int cpumask_test_and_set_cpu(int cpu, struct cpumask *cpumask) +static __always_inline int cpumask_test_and_set_cpu(int cpu, struct cpumask *cpumask) { return test_and_set_bit(cpumask_check(cpu), cpumask_bits(cpumask)); } @@ -402,7 +402,7 @@ static inline int cpumask_test_and_set_cpu(int cpu, struct cpumask *cpumask) * * test_and_clear_bit wrapper for cpumasks. */ -static inline int cpumask_test_and_clear_cpu(int cpu, struct cpumask *cpumask) +static __always_inline int cpumask_test_and_clear_cpu(int cpu, struct cpumask *cpumask) { return test_and_clear_bit(cpumask_check(cpu), cpumask_bits(cpumask)); } -- cgit From be6ec5b7026620b931e0fa9287d24ad2cd2ab9b6 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Wed, 26 Jan 2022 10:25:55 +0000 Subject: net: xpcs: add support for retrieving supported interface modes Add a function to the xpcs driver to retrieve the supported PHY interface modes, which can be used by drivers to fill in phylink's supported_interfaces mask. We validate the interface bit index to ensure that it fits within the bitmap as xpcs lists PHY_INTERFACE_MODE_MAX in an entry. Tested-by: Wong Vee Khee # Intel EHL Signed-off-by: Russell King (Oracle) Signed-off-by: David S. Miller --- include/linux/pcs/pcs-xpcs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/pcs/pcs-xpcs.h b/include/linux/pcs/pcs-xpcs.h index add077a81b21..3126a4924d92 100644 --- a/include/linux/pcs/pcs-xpcs.h +++ b/include/linux/pcs/pcs-xpcs.h @@ -33,6 +33,7 @@ int xpcs_do_config(struct dw_xpcs *xpcs, phy_interface_t interface, unsigned int mode); void xpcs_validate(struct dw_xpcs *xpcs, unsigned long *supported, struct phylink_link_state *state); +void xpcs_get_interfaces(struct dw_xpcs *xpcs, unsigned long *interfaces); int xpcs_config_eee(struct dw_xpcs *xpcs, int mult_fact_100ns, int enable); struct dw_xpcs *xpcs_create(struct mdio_device *mdiodev, -- cgit From fe70fb74b56407c1a5be347258082f8abbe7956d Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Wed, 26 Jan 2022 10:26:10 +0000 Subject: net: stmmac/xpcs: convert to pcs_validate() stmmac explicitly calls the xpcs driver to validate the ethtool linkmodes. This is no longer necessary as phylink now supports validation through a PCS method. Convert both drivers to use this new mechanism. Tested-by: Wong Vee Khee # Intel EHL Signed-off-by: Russell King (Oracle) Signed-off-by: David S. Miller --- include/linux/pcs/pcs-xpcs.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pcs/pcs-xpcs.h b/include/linux/pcs/pcs-xpcs.h index 3126a4924d92..266eb26fb029 100644 --- a/include/linux/pcs/pcs-xpcs.h +++ b/include/linux/pcs/pcs-xpcs.h @@ -31,8 +31,6 @@ void xpcs_link_up(struct phylink_pcs *pcs, unsigned int mode, phy_interface_t interface, int speed, int duplex); int xpcs_do_config(struct dw_xpcs *xpcs, phy_interface_t interface, unsigned int mode); -void xpcs_validate(struct dw_xpcs *xpcs, unsigned long *supported, - struct phylink_link_state *state); void xpcs_get_interfaces(struct dw_xpcs *xpcs, unsigned long *interfaces); int xpcs_config_eee(struct dw_xpcs *xpcs, int mult_fact_100ns, int enable); -- cgit From 4e2a44c1408b6a6a46122704511234f68cf012b8 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Mon, 24 Jan 2022 08:14:22 +0100 Subject: tty: add kfifo to tty_port Define a kfifo inside struct tty_port. We use DECLARE_KFIFO_PTR and let the preexisting tty_port::xmit_buf be also the buffer for the kfifo. And handle the initialization/decomissioning along with xmit_buf, i.e. in tty_port_alloc_xmit_buf() and tty_port_free_xmit_buf(), respectively. This allows for kfifo use in drivers which opt-in, while others still may use the old xmit_buf. mxser will be the first user in the next few patches. Signed-off-by: Jiri Slaby Link: https://lore.kernel.org/r/20220124071430.14907-4-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_port.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/tty_port.h b/include/linux/tty_port.h index d3ea9ed0b98e..58e9619116b7 100644 --- a/include/linux/tty_port.h +++ b/include/linux/tty_port.h @@ -2,6 +2,7 @@ #ifndef _LINUX_TTY_PORT_H #define _LINUX_TTY_PORT_H +#include #include #include #include @@ -67,6 +68,7 @@ extern const struct tty_port_client_operations tty_port_default_client_ops; * @mutex: locking, for open, shutdown and other port operations * @buf_mutex: @xmit_buf alloc lock * @xmit_buf: optional xmit buffer used by some drivers + * @xmit_fifo: optional xmit buffer used by some drivers * @close_delay: delay in jiffies to wait when closing the port * @closing_wait: delay in jiffies for output to be sent before closing * @drain_delay: set to zero if no pure time based drain is needed else set to @@ -110,6 +112,7 @@ struct tty_port { struct mutex mutex; struct mutex buf_mutex; unsigned char *xmit_buf; + DECLARE_KFIFO_PTR(xmit_fifo, unsigned char); unsigned int close_delay; unsigned int closing_wait; int drain_delay; -- cgit From ebb7fb1557b1d03b906b668aa2164b51e6b7d19a Mon Sep 17 00:00:00 2001 From: Dave Chinner Date: Wed, 26 Jan 2022 09:19:20 -0800 Subject: xfs, iomap: limit individual ioend chain lengths in writeback Trond Myklebust reported soft lockups in XFS IO completion such as this: watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [kworker/12:1:3106] CPU: 12 PID: 3106 Comm: kworker/12:1 Not tainted 4.18.0-305.10.2.el8_4.x86_64 #1 Workqueue: xfs-conv/md127 xfs_end_io [xfs] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20 Call Trace: wake_up_page_bit+0x8a/0x110 iomap_finish_ioend+0xd7/0x1c0 iomap_finish_ioends+0x7f/0xb0 xfs_end_ioend+0x6b/0x100 [xfs] xfs_end_io+0xb9/0xe0 [xfs] process_one_work+0x1a7/0x360 worker_thread+0x1fa/0x390 kthread+0x116/0x130 ret_from_fork+0x35/0x40 Ioends are processed as an atomic completion unit when all the chained bios in the ioend have completed their IO. Logically contiguous ioends can also be merged and completed as a single, larger unit. Both of these things can be problematic as both the bio chains per ioend and the size of the merged ioends processed as a single completion are both unbound. If we have a large sequential dirty region in the page cache, write_cache_pages() will keep feeding us sequential pages and we will keep mapping them into ioends and bios until we get a dirty page at a non-sequential file offset. These large sequential runs can will result in bio and ioend chaining to optimise the io patterns. The pages iunder writeback are pinned within these chains until the submission chaining is broken, allowing the entire chain to be completed. This can result in huge chains being processed in IO completion context. We get deep bio chaining if we have large contiguous physical extents. We will keep adding pages to the current bio until it is full, then we'll chain a new bio to keep adding pages for writeback. Hence we can build bio chains that map millions of pages and tens of gigabytes of RAM if the page cache contains big enough contiguous dirty file regions. This long bio chain pins those pages until the final bio in the chain completes and the ioend can iterate all the chained bios and complete them. OTOH, if we have a physically fragmented file, we end up submitting one ioend per physical fragment that each have a small bio or bio chain attached to them. We do not chain these at IO submission time, but instead we chain them at completion time based on file offset via iomap_ioend_try_merge(). Hence we can end up with unbound ioend chains being built via completion merging. XFS can then do COW remapping or unwritten extent conversion on that merged chain, which involves walking an extent fragment at a time and running a transaction to modify the physical extent information. IOWs, we merge all the discontiguous ioends together into a contiguous file range, only to then process them individually as discontiguous extents. This extent manipulation is computationally expensive and can run in a tight loop, so merging logically contiguous but physically discontigous ioends gains us nothing except for hiding the fact the fact we broke the ioends up into individual physical extents at submission and then need to loop over those individual physical extents at completion. Hence we need to have mechanisms to limit ioend sizes and to break up completion processing of large merged ioend chains: 1. bio chains per ioend need to be bound in length. Pure overwrites go straight to iomap_finish_ioend() in softirq context with the exact bio chain attached to the ioend by submission. Hence the only way to prevent long holdoffs here is to bound ioend submission sizes because we can't reschedule in softirq context. 2. iomap_finish_ioends() has to handle unbound merged ioend chains correctly. This relies on any one call to iomap_finish_ioend() being bound in runtime so that cond_resched() can be issued regularly as the long ioend chain is processed. i.e. this relies on mechanism #1 to limit individual ioend sizes to work correctly. 3. filesystems have to loop over the merged ioends to process physical extent manipulations. This means they can loop internally, and so we break merging at physical extent boundaries so the filesystem can easily insert reschedule points between individual extent manipulations. Signed-off-by: Dave Chinner Reported-and-tested-by: Trond Myklebust Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong --- include/linux/iomap.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iomap.h b/include/linux/iomap.h index b55bd49e55f5..97a3a2edb585 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -263,9 +263,11 @@ struct iomap_ioend { struct list_head io_list; /* next ioend in chain */ u16 io_type; u16 io_flags; /* IOMAP_F_* */ + u32 io_folios; /* folios added to ioend */ struct inode *io_inode; /* file being written to */ size_t io_size; /* size of the extent */ loff_t io_offset; /* offset in the file */ + sector_t io_sector; /* start sector of ioend */ struct bio *io_bio; /* bio being built */ struct bio io_inline_bio; /* MUST BE LAST! */ }; -- cgit From 597568e8df046ebf349c706b281a711297ab20fb Mon Sep 17 00:00:00 2001 From: Kai-Heng Feng Date: Tue, 25 Jan 2022 13:50:07 +0800 Subject: misc: rtsx: Rework runtime power management flow Commit 5b4258f6721f ("misc: rtsx: rts5249 support runtime PM") uses "rtd3_work" and "idle_work" to manage it's own runtime PM state machine. When its child device, rtsx_pci_sdmmc, uses runtime PM refcount correctly, all the additional works can be managed by generic runtime PM helpers. So consolidate "idle_work" and "rtd3_work" into generic runtime idle callback and runtime suspend callback, respectively. Fixes: 5b4258f6721f ("misc: rtsx: rts5249 support runtime PM") Cc: Ricky WU Tested-by: Ricky WU Signed-off-by: Kai-Heng Feng Link: https://lore.kernel.org/r/20220125055010.1866563-2-kai.heng.feng@canonical.com Signed-off-by: Greg Kroah-Hartman --- include/linux/rtsx_pci.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rtsx_pci.h b/include/linux/rtsx_pci.h index 4ab7bfc675f1..89b7d34e25b6 100644 --- a/include/linux/rtsx_pci.h +++ b/include/linux/rtsx_pci.h @@ -1201,8 +1201,6 @@ struct rtsx_pcr { unsigned int card_exist; struct delayed_work carddet_work; - struct delayed_work idle_work; - struct delayed_work rtd3_work; spinlock_t lock; struct mutex pcr_mutex; @@ -1212,7 +1210,6 @@ struct rtsx_pcr { unsigned int cur_clock; bool remove_pci; bool msi_en; - bool is_runtime_suspended; #define EXTRA_CAPS_SD_SDR50 (1 << 0) #define EXTRA_CAPS_SD_SDR104 (1 << 1) -- cgit From 71732e24609b5a7af96efc89aebde55f76c1de3e Mon Sep 17 00:00:00 2001 From: Kai-Heng Feng Date: Tue, 25 Jan 2022 13:50:09 +0800 Subject: misc: rtsx: Quiesce rts5249 on system suspend Set more registers in force_power_down callback to avoid S3 wakeup from hotplugging cards. This is originally written by Ricky WU. Link: https://lore.kernel.org/lkml/c4525b4738f94483b9b8f8571fc80646@realtek.com/ Cc: Ricky WU Tested-by: Ricky WU Signed-off-by: Kai-Heng Feng Link: https://lore.kernel.org/r/20220125055010.1866563-4-kai.heng.feng@canonical.com Signed-off-by: Greg Kroah-Hartman --- include/linux/rtsx_pci.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/rtsx_pci.h b/include/linux/rtsx_pci.h index 89b7d34e25b6..3d780b44e678 100644 --- a/include/linux/rtsx_pci.h +++ b/include/linux/rtsx_pci.h @@ -1095,7 +1095,7 @@ struct pcr_ops { unsigned int (*cd_deglitch)(struct rtsx_pcr *pcr); int (*conv_clk_and_div_n)(int clk, int dir); void (*fetch_vendor_settings)(struct rtsx_pcr *pcr); - void (*force_power_down)(struct rtsx_pcr *pcr, u8 pm_state); + void (*force_power_down)(struct rtsx_pcr *pcr, u8 pm_state, bool runtime); void (*stop_cmd)(struct rtsx_pcr *pcr); void (*set_aspm)(struct rtsx_pcr *pcr, bool enable); -- cgit From 8033c6c2fed235b3d571b5a5ede302b752bc5c7d Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 26 Jan 2022 10:54:12 -0800 Subject: bpf: remove unused static inlines Remove two dead stubs, sk_msg_clear_meta() was never used, use of xskq_cons_is_full() got replaced by xsk_tx_writeable() in v5.10. Signed-off-by: Jakub Kicinski Link: https://lore.kernel.org/r/20220126185412.2776254-1-kuba@kernel.org Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 10 ---------- include/linux/skmsg.h | 5 ----- 2 files changed, 15 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 394305a5e02f..2344f793c4dc 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1875,11 +1875,6 @@ static inline int bpf_obj_get_user(const char __user *pathname, int flags) return -EOPNOTSUPP; } -static inline bool dev_map_can_have_prog(struct bpf_map *map) -{ - return false; -} - static inline void __dev_flush(void) { } @@ -1943,11 +1938,6 @@ static inline int cpu_map_generic_redirect(struct bpf_cpu_map_entry *rcpu, return -EOPNOTSUPP; } -static inline bool cpu_map_prog_allowed(struct bpf_map *map) -{ - return false; -} - static inline struct bpf_prog *bpf_prog_get_type_path(const char *name, enum bpf_prog_type type) { diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 18a717fe62eb..ddde5f620901 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -171,11 +171,6 @@ static inline u32 sk_msg_iter_dist(u32 start, u32 end) #define sk_msg_iter_next(msg, which) \ sk_msg_iter_var_next(msg->sg.which) -static inline void sk_msg_clear_meta(struct sk_msg *msg) -{ - memset(&msg->sg, 0, offsetofend(struct sk_msg_sg, copy)); -} - static inline void sk_msg_init(struct sk_msg *msg) { BUILD_BUG_ON(ARRAY_SIZE(msg->sg.data) - 1 != NR_MSG_FRAG_IDS); -- cgit From 27599aacbaefcbf2af7b06b0029459bbf682000d Mon Sep 17 00:00:00 2001 From: Thomas Zimmermann Date: Tue, 25 Jan 2022 10:12:18 +0100 Subject: fbdev: Hot-unplug firmware fb devices on forced removal Hot-unplug all firmware-framebuffer devices as part of removing them via remove_conflicting_framebuffers() et al. Releases all memory regions to be acquired by native drivers. Firmware, such as EFI, install a framebuffer while posting the computer. After removing the firmware-framebuffer device from fbdev, a native driver takes over the hardware and the firmware framebuffer becomes invalid. Firmware-framebuffer drivers, specifically simplefb, don't release their device from Linux' device hierarchy. It still owns the firmware framebuffer and blocks the native drivers from loading. This has been observed in the vmwgfx driver. [1] Initiating a device removal (i.e., hot unplug) as part of remove_conflicting_framebuffers() removes the underlying device and returns the memory range to the system. [1] https://lore.kernel.org/dri-devel/20220117180359.18114-1-zack@kde.org/ v2: * rename variable 'dev' to 'device' (Javier) Signed-off-by: Thomas Zimmermann Reported-by: Zack Rusin Reviewed-by: Javier Martinez Canillas Reviewed-by: Zack Rusin Reviewed-by: Hans de Goede CC: stable@vger.kernel.org # v5.11+ Link: https://patchwork.freedesktop.org/patch/msgid/20220125091222.21457-2-tzimmermann@suse.de --- include/linux/fb.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fb.h b/include/linux/fb.h index 6f3db99ab990..ac294fc4c07b 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -502,6 +502,7 @@ struct fb_info { } *apertures; bool skip_vt_switch; /* no VT switch on suspend/resume required */ + bool forced_out; /* set when being removed by another driver */ }; static inline struct apertures_struct *alloc_apertures(unsigned int max_num) { -- cgit From ec2444530612a886b406e2830d7f314d1a07d4bb Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Wed, 19 Jan 2022 14:39:39 -0800 Subject: psi: Fix "no previous prototype" warnings when CONFIG_CGROUPS=n When CONFIG_CGROUPS is disabled psi code generates the following warnings: kernel/sched/psi.c:1112:21: warning: no previous prototype for 'psi_trigger_create' [-Wmissing-prototypes] 1112 | struct psi_trigger *psi_trigger_create(struct psi_group *group, | ^~~~~~~~~~~~~~~~~~ kernel/sched/psi.c:1182:6: warning: no previous prototype for 'psi_trigger_destroy' [-Wmissing-prototypes] 1182 | void psi_trigger_destroy(struct psi_trigger *t) | ^~~~~~~~~~~~~~~~~~~ kernel/sched/psi.c:1249:10: warning: no previous prototype for 'psi_trigger_poll' [-Wmissing-prototypes] 1249 | __poll_t psi_trigger_poll(void **trigger_ptr, | ^~~~~~~~~~~~~~~~ Change declarations of these functions in the header to provide the prototypes even when they are unused. Fixes: 0e94682b73bf ("psi: introduce psi monitor") Reported-by: kernel test robot Signed-off-by: Suren Baghdasaryan Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20220119223940.787748-2-surenb@google.com --- include/linux/psi.h | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/psi.h b/include/linux/psi.h index a70ca833c6d7..827970278d62 100644 --- a/include/linux/psi.h +++ b/include/linux/psi.h @@ -25,18 +25,17 @@ void psi_memstall_enter(unsigned long *flags); void psi_memstall_leave(unsigned long *flags); int psi_show(struct seq_file *s, struct psi_group *group, enum psi_res res); - -#ifdef CONFIG_CGROUPS -int psi_cgroup_alloc(struct cgroup *cgrp); -void psi_cgroup_free(struct cgroup *cgrp); -void cgroup_move_task(struct task_struct *p, struct css_set *to); - struct psi_trigger *psi_trigger_create(struct psi_group *group, char *buf, size_t nbytes, enum psi_res res); void psi_trigger_replace(void **trigger_ptr, struct psi_trigger *t); __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file, poll_table *wait); + +#ifdef CONFIG_CGROUPS +int psi_cgroup_alloc(struct cgroup *cgrp); +void psi_cgroup_free(struct cgroup *cgrp); +void cgroup_move_task(struct task_struct *p, struct css_set *to); #endif #else /* CONFIG_PSI */ -- cgit From bd5daba2d02479729526d35de85807aadf6fba20 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 26 Jan 2022 11:10:55 -0800 Subject: mii: remove mii_lpa_to_linkmode_lpa_sgmii() The only caller of mii_lpa_to_linkmode_lpa_sgmii() disappeared in v5.10. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/mii.h | 17 ----------------- 1 file changed, 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mii.h b/include/linux/mii.h index 12ea29e04293..b8a1a17a87dd 100644 --- a/include/linux/mii.h +++ b/include/linux/mii.h @@ -387,23 +387,6 @@ mii_lpa_mod_linkmode_lpa_sgmii(unsigned long *lp_advertising, u32 lpa) speed_duplex == LPA_SGMII_10FULL); } -/** - * mii_lpa_to_linkmode_adv_sgmii - * @advertising: pointer to destination link mode. - * @lpa: value of the MII_LPA register - * - * A small helper function that translates MII_ADVERTISE bits - * to linkmode advertisement settings when in SGMII mode. - * Clears the old value of advertising. - */ -static inline void mii_lpa_to_linkmode_lpa_sgmii(unsigned long *lp_advertising, - u32 lpa) -{ - linkmode_zero(lp_advertising); - - mii_lpa_mod_linkmode_lpa_sgmii(lp_advertising, lpa); -} - /** * mii_adv_mod_linkmode_adv_t * @advertising:pointer to destination link mode. -- cgit From b1755400b4be33dbd286272b153579631be2e2ca Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 26 Jan 2022 11:10:57 -0800 Subject: net: remove net_invalid_timestamp() No callers since v3.15. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/skbuff.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 8131d0de7559..60bd2347708c 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3899,11 +3899,6 @@ static inline ktime_t net_timedelta(ktime_t t) return ktime_sub(ktime_get_real(), t); } -static inline ktime_t net_invalid_timestamp(void) -{ - return 0; -} - static inline u8 skb_metadata_len(const struct sk_buff *skb) { return skb_shinfo(skb)->meta_len; -- cgit From 08dfa5a19e1f4344ce5d3a5eed4c5529adafe0dc Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 26 Jan 2022 11:10:58 -0800 Subject: net: remove linkmode_change_bit() No callers since v5.7, the initial use case seems pretty esoteric so removing this should not harm the completeness of the API. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/linkmode.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/linkmode.h b/include/linux/linkmode.h index f8397f300fcd..15e0e0209da4 100644 --- a/include/linux/linkmode.h +++ b/include/linux/linkmode.h @@ -66,11 +66,6 @@ static inline void linkmode_mod_bit(int nr, volatile unsigned long *addr, linkmode_clear_bit(nr, addr); } -static inline void linkmode_change_bit(int nr, volatile unsigned long *addr) -{ - __change_bit(nr, addr); -} - static inline int linkmode_test_bit(int nr, const volatile unsigned long *addr) { return test_bit(nr, addr); -- cgit From 8b2d546e23bb3588f897089368b5f09e49f7762d Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 26 Jan 2022 11:11:02 -0800 Subject: ipv6: remove inet6_rsk() and tcp_twsk_ipv6only() The stubs under !CONFIG_IPV6 were missed when real functions got deleted ca. v3.13. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/ipv6.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index a59d25f19385..1e0f8a31f3de 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -371,19 +371,12 @@ static inline struct ipv6_pinfo * inet6_sk(const struct sock *__sk) return NULL; } -static inline struct inet6_request_sock * - inet6_rsk(const struct request_sock *rsk) -{ - return NULL; -} - static inline struct raw6_sock *raw6_sk(const struct sock *sk) { return NULL; } #define inet6_rcv_saddr(__sk) NULL -#define tcp_twsk_ipv6only(__sk) 0 #define inet_v6_ipv6only(__sk) 0 #endif /* IS_ENABLED(CONFIG_IPV6) */ #endif /* _IPV6_H */ -- cgit From cc81df835c25b108248bad24021a21e77cbb84ac Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 26 Jan 2022 11:11:04 -0800 Subject: udp: remove inner_udp_hdr() Not used since added in v3.8. Signed-off-by: Jakub Kicinski Acked-by: Willem de Bruijn Signed-off-by: David S. Miller --- include/linux/udp.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/udp.h b/include/linux/udp.h index ae66dadd8543..254a2654400f 100644 --- a/include/linux/udp.h +++ b/include/linux/udp.h @@ -23,11 +23,6 @@ static inline struct udphdr *udp_hdr(const struct sk_buff *skb) return (struct udphdr *)skb_transport_header(skb); } -static inline struct udphdr *inner_udp_hdr(const struct sk_buff *skb) -{ - return (struct udphdr *)skb_inner_transport_header(skb); -} - #define UDP_HTABLE_SIZE_MIN (CONFIG_BASE_SMALL ? 128 : 256) static inline u32 udp_hashfn(const struct net *net, u32 num, u32 mask) -- cgit From d59a67f2f3f39012271ed3d11c338706a011c5c2 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 26 Jan 2022 11:11:06 -0800 Subject: netlink: remove nl_set_extack_cookie_u32() Not used since v5.10. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/netlink.h | 9 --------- 1 file changed, 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netlink.h b/include/linux/netlink.h index 1ec631838af9..bda1c385cffb 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -135,15 +135,6 @@ static inline void nl_set_extack_cookie_u64(struct netlink_ext_ack *extack, extack->cookie_len = sizeof(cookie); } -static inline void nl_set_extack_cookie_u32(struct netlink_ext_ack *extack, - u32 cookie) -{ - if (!extack) - return; - memcpy(extack->cookie, &cookie, sizeof(cookie)); - extack->cookie_len = sizeof(cookie); -} - void netlink_kernel_release(struct sock *sk); int __netlink_change_ngroups(struct sock *sk, unsigned int groups); int netlink_change_ngroups(struct sock *sk, unsigned int groups); -- cgit From 46531a30364bd483bfa1b041c15d42a196e77e93 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Thu, 27 Jan 2022 14:09:13 +0000 Subject: cgroup/bpf: fast path skb BPF filtering Even though there is a static key protecting from overhead from cgroup-bpf skb filtering when there is nothing attached, in many cases it's not enough as registering a filter for one type will ruin the fast path for all others. It's observed in production servers I've looked at but also in laptops, where registration is done during init by systemd or something else. Add a per-socket fast path check guarding from such overhead. This affects both receive and transmit paths of TCP, UDP and other protocols. It showed ~1% tx/s improvement in small payload UDP send benchmarks using a real NIC and in a server environment and the number jumps to 2-3% for preemtible kernels. Reviewed-by: Stanislav Fomichev Signed-off-by: Pavel Begunkov Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/r/d8c58857113185a764927a46f4b5a058d36d3ec3.1643292455.git.asml.silence@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf-cgroup.h | 24 ++++++++++++++++++++---- include/linux/bpf.h | 13 +++++++++++++ 2 files changed, 33 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index b525d8cdc25b..88a51b242adc 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -8,6 +8,7 @@ #include #include #include +#include #include struct sock; @@ -165,11 +166,23 @@ int bpf_percpu_cgroup_storage_copy(struct bpf_map *map, void *key, void *value); int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key, void *value, u64 flags); +/* Opportunistic check to see whether we have any BPF program attached*/ +static inline bool cgroup_bpf_sock_enabled(struct sock *sk, + enum cgroup_bpf_attach_type type) +{ + struct cgroup *cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data); + struct bpf_prog_array *array; + + array = rcu_access_pointer(cgrp->bpf.effective[type]); + return array != &bpf_empty_prog_array.hdr; +} + /* Wrappers for __cgroup_bpf_run_filter_skb() guarded by cgroup_bpf_enabled. */ #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk, skb) \ ({ \ int __ret = 0; \ - if (cgroup_bpf_enabled(CGROUP_INET_INGRESS)) \ + if (cgroup_bpf_enabled(CGROUP_INET_INGRESS) && \ + cgroup_bpf_sock_enabled(sk, CGROUP_INET_INGRESS)) \ __ret = __cgroup_bpf_run_filter_skb(sk, skb, \ CGROUP_INET_INGRESS); \ \ @@ -181,7 +194,8 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key, int __ret = 0; \ if (cgroup_bpf_enabled(CGROUP_INET_EGRESS) && sk && sk == skb->sk) { \ typeof(sk) __sk = sk_to_full_sk(sk); \ - if (sk_fullsock(__sk)) \ + if (sk_fullsock(__sk) && \ + cgroup_bpf_sock_enabled(__sk, CGROUP_INET_EGRESS)) \ __ret = __cgroup_bpf_run_filter_skb(__sk, skb, \ CGROUP_INET_EGRESS); \ } \ @@ -347,7 +361,8 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key, kernel_optval) \ ({ \ int __ret = 0; \ - if (cgroup_bpf_enabled(CGROUP_SETSOCKOPT)) \ + if (cgroup_bpf_enabled(CGROUP_SETSOCKOPT) && \ + cgroup_bpf_sock_enabled(sock, CGROUP_SETSOCKOPT)) \ __ret = __cgroup_bpf_run_filter_setsockopt(sock, level, \ optname, optval, \ optlen, \ @@ -367,7 +382,8 @@ int bpf_percpu_cgroup_storage_update(struct bpf_map *map, void *key, max_optlen, retval) \ ({ \ int __ret = retval; \ - if (cgroup_bpf_enabled(CGROUP_GETSOCKOPT)) \ + if (cgroup_bpf_enabled(CGROUP_GETSOCKOPT) && \ + cgroup_bpf_sock_enabled(sock, CGROUP_GETSOCKOPT)) \ if (!(sock)->sk_prot->bpf_bypass_getsockopt || \ !INDIRECT_CALL_INET_1((sock)->sk_prot->bpf_bypass_getsockopt, \ tcp_bpf_bypass_getsockopt, \ diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 2344f793c4dc..e3b82ce51445 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1233,6 +1233,19 @@ struct bpf_prog_array { struct bpf_prog_array_item items[]; }; +struct bpf_empty_prog_array { + struct bpf_prog_array hdr; + struct bpf_prog *null_prog; +}; + +/* to avoid allocating empty bpf_prog_array for cgroups that + * don't have bpf program attached use one global 'bpf_empty_prog_array' + * It will not be modified the caller of bpf_prog_array_alloc() + * (since caller requested prog_cnt == 0) + * that pointer should be 'freed' by bpf_prog_array_free() + */ +extern struct bpf_empty_prog_array bpf_empty_prog_array; + struct bpf_prog_array *bpf_prog_array_alloc(u32 prog_cnt, gfp_t flags); void bpf_prog_array_free(struct bpf_prog_array *progs); int bpf_prog_array_length(struct bpf_prog_array *progs); -- cgit From 7472d5a642c94a0ee1882ff3038de72ffe803a01 Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Thu, 27 Jan 2022 07:46:00 -0800 Subject: compiler_types: define __user as __attribute__((btf_type_tag("user"))) The __user attribute is currently mainly used by sparse for type checking. The attribute indicates whether a memory access is in user memory address space or not. Such information is important during tracing kernel internal functions or data structures as accessing user memory often has different mechanisms compared to accessing kernel memory. For example, the perf-probe needs explicit command line specification to indicate a particular argument or string in user-space memory ([1], [2], [3]). Currently, vmlinux BTF is available in kernel with many distributions. If __user attribute information is available in vmlinux BTF, the explicit user memory access information from users will not be necessary as the kernel can figure it out by itself with vmlinux BTF. Besides the above possible use for perf/probe, another use case is for bpf verifier. Currently, for bpf BPF_PROG_TYPE_TRACING type of bpf programs, users can write direct code like p->m1->m2 and "p" could be a function parameter. Without __user information in BTF, the verifier will assume p->m1 accessing kernel memory and will generate normal loads. Let us say "p" actually tagged with __user in the source code. In such cases, p->m1 is actually accessing user memory and direct load is not right and may produce incorrect result. For such cases, bpf_probe_read_user() will be the correct way to read p->m1. To support encoding __user information in BTF, a new attribute __attribute__((btf_type_tag(""))) is implemented in clang ([4]). For example, if we have #define __user __attribute__((btf_type_tag("user"))) during kernel compilation, the attribute "user" information will be preserved in dwarf. After pahole converting dwarf to BTF, __user information will be available in vmlinux BTF. The following is an example with latest upstream clang (clang14) and pahole 1.23: [$ ~] cat test.c #define __user __attribute__((btf_type_tag("user"))) int foo(int __user *arg) { return *arg; } [$ ~] clang -O2 -g -c test.c [$ ~] pahole -JV test.o ... [1] INT int size=4 nr_bits=32 encoding=SIGNED [2] TYPE_TAG user type_id=1 [3] PTR (anon) type_id=2 [4] FUNC_PROTO (anon) return=1 args=(3 arg) [5] FUNC foo type_id=4 [$ ~] You can see for the function argument "int __user *arg", its type is described as PTR -> TYPE_TAG(user) -> INT The kernel can use this information for bpf verification or other use cases. Current btf_type_tag is only supported in clang (>= clang14) and pahole (>= 1.23). gcc support is also proposed and under development ([5]). [1] http://lkml.kernel.org/r/155789874562.26965.10836126971405890891.stgit@devnote2 [2] http://lkml.kernel.org/r/155789872187.26965.4468456816590888687.stgit@devnote2 [3] http://lkml.kernel.org/r/155789871009.26965.14167558859557329331.stgit@devnote2 [4] https://reviews.llvm.org/D111199 [5] https://lore.kernel.org/bpf/0cbeb2fb-1a18-f690-e360-24b1c90c2a91@fb.com/ Signed-off-by: Yonghong Song Link: https://lore.kernel.org/r/20220127154600.652613-1-yhs@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/compiler_types.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h index 3c1795fdb568..3f31ff400432 100644 --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -31,6 +31,9 @@ static inline void __chk_io_ptr(const volatile void __iomem *ptr) { } # define __kernel # ifdef STRUCTLEAK_PLUGIN # define __user __attribute__((user)) +# elif defined(CONFIG_DEBUG_INFO_BTF) && defined(CONFIG_PAHOLE_HAS_BTF_TAG) && \ + __has_attribute(btf_type_tag) +# define __user __attribute__((btf_type_tag("user"))) # else # define __user # endif -- cgit From c6f1bfe89ac95dc829dcb4ed54780da134ac5fce Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Thu, 27 Jan 2022 07:46:06 -0800 Subject: bpf: reject program if a __user tagged memory accessed in kernel way BPF verifier supports direct memory access for BPF_PROG_TYPE_TRACING type of bpf programs, e.g., a->b. If "a" is a pointer pointing to kernel memory, bpf verifier will allow user to write code in C like a->b and the verifier will translate it to a kernel load properly. If "a" is a pointer to user memory, it is expected that bpf developer should be bpf_probe_read_user() helper to get the value a->b. Without utilizing BTF __user tagging information, current verifier will assume that a->b is a kernel memory access and this may generate incorrect result. Now BTF contains __user information, it can check whether the pointer points to a user memory or not. If it is, the verifier can reject the program and force users to use bpf_probe_read_user() helper explicitly. In the future, we can easily extend btf_add_space for other address space tagging, for example, rcu/percpu etc. Signed-off-by: Yonghong Song Link: https://lore.kernel.org/r/20220127154606.654961-1-yhs@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 9 ++++++--- include/linux/btf.h | 5 +++++ 2 files changed, 11 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index e3b82ce51445..6eb0b180d33b 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -332,7 +332,10 @@ enum bpf_type_flag { */ MEM_ALLOC = BIT(2 + BPF_BASE_TYPE_BITS), - __BPF_TYPE_LAST_FLAG = MEM_ALLOC, + /* MEM is in user address space. */ + MEM_USER = BIT(3 + BPF_BASE_TYPE_BITS), + + __BPF_TYPE_LAST_FLAG = MEM_USER, }; /* Max number of base types. */ @@ -588,7 +591,7 @@ struct bpf_verifier_ops { const struct btf *btf, const struct btf_type *t, int off, int size, enum bpf_access_type atype, - u32 *next_btf_id); + u32 *next_btf_id, enum bpf_type_flag *flag); }; struct bpf_prog_offload_ops { @@ -1780,7 +1783,7 @@ static inline bool bpf_tracing_btf_ctx_access(int off, int size, int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf, const struct btf_type *t, int off, int size, enum bpf_access_type atype, - u32 *next_btf_id); + u32 *next_btf_id, enum bpf_type_flag *flag); bool btf_struct_ids_match(struct bpf_verifier_log *log, const struct btf *btf, u32 id, int off, const struct btf *need_btf, u32 need_type_id); diff --git a/include/linux/btf.h b/include/linux/btf.h index b12cfe3b12bb..f6c43dd513fa 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -238,6 +238,11 @@ static inline bool btf_type_is_var(const struct btf_type *t) return BTF_INFO_KIND(t->info) == BTF_KIND_VAR; } +static inline bool btf_type_is_type_tag(const struct btf_type *t) +{ + return BTF_INFO_KIND(t->info) == BTF_KIND_TYPE_TAG; +} + /* union is only a special case of struct: * all its offsetof(member) == 0 */ -- cgit From 9059b04b4108be71397941f4665d5aa79783125a Mon Sep 17 00:00:00 2001 From: Tariq Toukan Date: Sun, 9 Jan 2022 21:46:34 +0200 Subject: net/mlx5: Remove unused TIR modify bitmask enums struct mlx5_ifc_modify_tir_bitmask_bits is used for the bitmask of MODIFY_TIR operations. Remove the unused bitmask enums. Signed-off-by: Tariq Toukan Reviewed-by: Gal Pressman Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 598ac3bcc901..27145c4d6820 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -63,13 +63,6 @@ enum { MLX5_EVENT_TYPE_CODING_FPGA_QP_ERROR = 0x21 }; -enum { - MLX5_MODIFY_TIR_BITMASK_LRO = 0x0, - MLX5_MODIFY_TIR_BITMASK_INDIRECT_TABLE = 0x1, - MLX5_MODIFY_TIR_BITMASK_HASH = 0x2, - MLX5_MODIFY_TIR_BITMASK_TUNNELED_OFFLOAD_EN = 0x3 -}; - enum { MLX5_SET_HCA_CAP_OP_MOD_GENERAL_DEVICE = 0x0, MLX5_SET_HCA_CAP_OP_MOD_ODP = 0x2, -- cgit From b5b3d10ef638fcadc56609961875bf157a92aaa4 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 27 Jan 2022 08:33:49 -0800 Subject: net: mii: remove mii_lpa_mod_linkmode_lpa_sgmii() Vladimir points out that since we removed mii_lpa_to_linkmode_lpa_sgmii(), mii_lpa_mod_linkmode_lpa_sgmii() is also no longer called. Suggested-by: Vladimir Oltean Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/mii.h | 33 --------------------------------- 1 file changed, 33 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mii.h b/include/linux/mii.h index b8a1a17a87dd..5ee13083cec7 100644 --- a/include/linux/mii.h +++ b/include/linux/mii.h @@ -354,39 +354,6 @@ static inline u32 mii_adv_to_ethtool_adv_x(u32 adv) return result; } -/** - * mii_lpa_mod_linkmode_adv_sgmii - * @lp_advertising: pointer to destination link mode. - * @lpa: value of the MII_LPA register - * - * A small helper function that translates MII_LPA bits to - * linkmode advertisement settings for SGMII. - * Leaves other bits unchanged. - */ -static inline void -mii_lpa_mod_linkmode_lpa_sgmii(unsigned long *lp_advertising, u32 lpa) -{ - u32 speed_duplex = lpa & LPA_SGMII_DPX_SPD_MASK; - - linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseT_Half_BIT, lp_advertising, - speed_duplex == LPA_SGMII_1000HALF); - - linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT, lp_advertising, - speed_duplex == LPA_SGMII_1000FULL); - - linkmode_mod_bit(ETHTOOL_LINK_MODE_100baseT_Half_BIT, lp_advertising, - speed_duplex == LPA_SGMII_100HALF); - - linkmode_mod_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT, lp_advertising, - speed_duplex == LPA_SGMII_100FULL); - - linkmode_mod_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT, lp_advertising, - speed_duplex == LPA_SGMII_10HALF); - - linkmode_mod_bit(ETHTOOL_LINK_MODE_10baseT_Full_BIT, lp_advertising, - speed_duplex == LPA_SGMII_10FULL); -} - /** * mii_adv_mod_linkmode_adv_t * @advertising:pointer to destination link mode. -- cgit From 9690ae60429020f38e4aa2540c306f27eb021bc0 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 27 Jan 2022 10:42:59 -0800 Subject: ethtool: add header/data split indication For applications running on a mix of platforms it's useful to have a clear indication whether host's NIC supports the geometry requirements of TCP zero-copy. TCP zero-copy Rx requires data to be neatly placed into memory pages. Most NICs can't do that. This patch is adding GET support only, since the NICs I work with either always have the feature enabled or enable it whenever MTU is set to jumbo. In other words I don't need SET. But adding set should be trivial. (The only note on SET is that we will likely want the setting to be "sticky" and use 0 / `unknown` to reset it back to driver default.) Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/ethtool.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index 11efc45de66a..e0853f48b75e 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -70,9 +70,11 @@ enum { /** * struct kernel_ethtool_ringparam - RX/TX ring configuration * @rx_buf_len: Current length of buffers on the rx ring. + * @tcp_data_split: Scatter packet headers and data to separate buffers */ struct kernel_ethtool_ringparam { u32 rx_buf_len; + u8 tcp_data_split; }; /** -- cgit From 6cdef8a6ee748ec522aabee049633b16b4115f73 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 27 Jan 2022 12:09:35 -0800 Subject: SUNRPC: add netns refcount tracker to struct svc_xprt struct svc_xprt holds a long lived reference to a netns, it is worth tracking it. Signed-off-by: Eric Dumazet Acked-by: Chuck Lever Signed-off-by: David S. Miller --- include/linux/sunrpc/svc_xprt.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index 571f605bc91e..382af90320ac 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -88,6 +88,7 @@ struct svc_xprt { struct list_head xpt_users; /* callbacks on free */ struct net *xpt_net; + netns_tracker ns_tracker; const struct cred *xpt_cred; struct rpc_xprt *xpt_bc_xprt; /* NFSv4.1 backchannel */ struct rpc_xprt_switch *xpt_bc_xps; /* NFSv4.1 backchannel */ -- cgit From b9a0d6d143ec63132beff04802578c499083f87c Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 27 Jan 2022 12:09:37 -0800 Subject: SUNRPC: add netns refcount tracker to struct rpc_xprt Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/sunrpc/xprt.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index 955ea4d7af0b..3cdc8d878d81 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -284,6 +284,7 @@ struct rpc_xprt { } stat; struct net *xprt_net; + netns_tracker ns_tracker; const char *servername; const char *address_strings[RPC_DISPLAY_MAX]; #if IS_ENABLED(CONFIG_SUNRPC_DEBUG) -- cgit From ca0acb511c21738b32386ce0f85c284b351d919e Mon Sep 17 00:00:00 2001 From: Akhil R Date: Fri, 28 Jan 2022 17:14:25 +0530 Subject: device property: Add fwnode_irq_get_byname Add fwnode_irq_get_byname() to get an interrupt by name from either ACPI table or Device Tree, whichever is used for enumeration. In the ACPI case, this allow us to use 'interrupt-names' in _DSD which can be mapped to Interrupt() resource by index. The implementation is similar to 'interrupt-names' in the Device Tree. Signed-off-by: Akhil R Reviewed-by: Andy Shevchenko Acked-by: Rafael J. Wysocki Signed-off-by: Wolfram Sang --- include/linux/property.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/property.h b/include/linux/property.h index 7399a0b45f98..95d56a562b6a 100644 --- a/include/linux/property.h +++ b/include/linux/property.h @@ -121,6 +121,7 @@ struct fwnode_handle *fwnode_handle_get(struct fwnode_handle *fwnode); void fwnode_handle_put(struct fwnode_handle *fwnode); int fwnode_irq_get(const struct fwnode_handle *fwnode, unsigned int index); +int fwnode_irq_get_byname(const struct fwnode_handle *fwnode, const char *name); void __iomem *fwnode_iomap(struct fwnode_handle *fwnode, int index); -- cgit From a263a84088f689bf0c1552a510b25d0bcc45fcae Mon Sep 17 00:00:00 2001 From: Akhil R Date: Fri, 28 Jan 2022 17:14:27 +0530 Subject: i2c: smbus: Use device_*() functions instead of of_*() Change of_*() functions to device_*() for firmware agnostic usage. This allows to have the smbus_alert interrupt without any changes in the controller drivers using the ACPI table. Signed-off-by: Akhil R Reviewed-by: Andy Shevchenko Signed-off-by: Wolfram Sang --- include/linux/i2c-smbus.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/i2c-smbus.h b/include/linux/i2c-smbus.h index 1ef421818d3a..95cf902e0bda 100644 --- a/include/linux/i2c-smbus.h +++ b/include/linux/i2c-smbus.h @@ -30,10 +30,10 @@ struct i2c_client *i2c_new_smbus_alert_device(struct i2c_adapter *adapter, struct i2c_smbus_alert_setup *setup); int i2c_handle_smbus_alert(struct i2c_client *ara); -#if IS_ENABLED(CONFIG_I2C_SMBUS) && IS_ENABLED(CONFIG_OF) -int of_i2c_setup_smbus_alert(struct i2c_adapter *adap); +#if IS_ENABLED(CONFIG_I2C_SMBUS) +int i2c_setup_smbus_alert(struct i2c_adapter *adap); #else -static inline int of_i2c_setup_smbus_alert(struct i2c_adapter *adap) +static inline int i2c_setup_smbus_alert(struct i2c_adapter *adap) { return 0; } -- cgit From e820a33748b5e22cecafddf919a7d8679949deb1 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 26 Jan 2022 15:53:50 +0200 Subject: math.h: Introduce data types for fractional numbers Introduce a macro to produce data types like struct TYPE_fract { __TYPE numerator; __TYPE denominator; }; to be used in the code wherever it's needed. In the following changes convert some users to it. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220126135353.24007-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jonathan Cameron --- include/linux/math.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/math.h b/include/linux/math.h index 53674a327e39..439b8f0b9ebd 100644 --- a/include/linux/math.h +++ b/include/linux/math.h @@ -2,6 +2,7 @@ #ifndef _LINUX_MATH_H #define _LINUX_MATH_H +#include #include #include @@ -106,6 +107,17 @@ } \ ) +#define __STRUCT_FRACT(type) \ +struct type##_fract { \ + __##type numerator; \ + __##type denominator; \ +}; +__STRUCT_FRACT(s16) +__STRUCT_FRACT(u16) +__STRUCT_FRACT(s32) +__STRUCT_FRACT(u32) +#undef __STRUCT_FRACT + /* * Multiplies an integer by a fraction, while avoiding unnecessary * overflow or loss of precision. -- cgit From a5e9b2ddbbc789f34be2333263232435d8edf57c Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 26 Jan 2022 15:53:53 +0200 Subject: iio: adc: qcom-vadc-common: Re-use generic struct u32_fract Instead of custom data type re-use generic struct u32_fract. No changes intended. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220126135353.24007-4-andriy.shevchenko@linux.intel.com Signed-off-by: Jonathan Cameron --- include/linux/iio/adc/qcom-vadc-common.h | 15 +++------------ 1 file changed, 3 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iio/adc/qcom-vadc-common.h b/include/linux/iio/adc/qcom-vadc-common.h index 33f60f43e1aa..ce78d4804994 100644 --- a/include/linux/iio/adc/qcom-vadc-common.h +++ b/include/linux/iio/adc/qcom-vadc-common.h @@ -6,6 +6,7 @@ #ifndef QCOM_VADC_COMMON_H #define QCOM_VADC_COMMON_H +#include #include #define VADC_CONV_TIME_MIN_US 2000 @@ -79,16 +80,6 @@ struct vadc_linear_graph { s32 gnd; }; -/** - * struct vadc_prescale_ratio - Represent scaling ratio for ADC input. - * @num: the inverse numerator of the gain applied to the input channel. - * @den: the inverse denominator of the gain applied to the input channel. - */ -struct vadc_prescale_ratio { - u32 num; - u32 den; -}; - /** * enum vadc_scale_fn_type - Scaling function to convert ADC code to * physical scaled units for the channel. @@ -144,12 +135,12 @@ struct adc5_data { int qcom_vadc_scale(enum vadc_scale_fn_type scaletype, const struct vadc_linear_graph *calib_graph, - const struct vadc_prescale_ratio *prescale, + const struct u32_fract *prescale, bool absolute, u16 adc_code, int *result_mdec); struct qcom_adc5_scale_type { - int (*scale_fn)(const struct vadc_prescale_ratio *prescale, + int (*scale_fn)(const struct u32_fract *prescale, const struct adc5_data *data, u16 adc_code, int *result); }; -- cgit From 73c105ad2a3e142d81fc59761ce8353d0b211b8f Mon Sep 17 00:00:00 2001 From: Sergey Shtylyov Date: Fri, 28 Jan 2022 21:32:40 +0300 Subject: phy: make phy_set_max_speed() *void* After following the call tree of phy_set_max_speed(), it became clear that this function never returns anything but 0, so we can change its result type to *void* and drop the result checks from the three drivers that actually bothered to do it... Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool. Signed-off-by: Sergey Shtylyov Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- include/linux/phy.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 6de8d7a90d78..cd08cf1a8b0d 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1661,7 +1661,7 @@ int phy_disable_interrupts(struct phy_device *phydev); void phy_request_interrupt(struct phy_device *phydev); void phy_free_interrupt(struct phy_device *phydev); void phy_print_status(struct phy_device *phydev); -int phy_set_max_speed(struct phy_device *phydev, u32 max_speed); +void phy_set_max_speed(struct phy_device *phydev, u32 max_speed); void phy_remove_link_mode(struct phy_device *phydev, u32 link_mode); void phy_advertise_supported(struct phy_device *phydev); void phy_support_sym_pause(struct phy_device *phydev); -- cgit From 13e906e50a8cf6033f22c03c4d772e36a9e02c6b Mon Sep 17 00:00:00 2001 From: Stephen Boyd Date: Thu, 27 Jan 2022 12:01:07 -0800 Subject: component: Replace most references to 'master' with 'aggregate device' Remove most references to 'master' in the code and replace them with some form of 'aggregate device'. This better reflects the reality of what this code does, i.e. an aggregate device that represents a device like a GPU card once some set of devices that make up the aggregate device probe and register with the component framework. Cc: Daniel Vetter Cc: Greg Kroah-Hartman Cc: Laurent Pinchart Cc: "Rafael J. Wysocki" Cc: Rob Clark Cc: Russell King Cc: Saravana Kannan Signed-off-by: Stephen Boyd Link: https://lore.kernel.org/r/20220127200141.1295328-2-swboyd@chromium.org Signed-off-by: Greg Kroah-Hartman --- include/linux/component.h | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/component.h b/include/linux/component.h index 16de18f473d7..7012569c6546 100644 --- a/include/linux/component.h +++ b/include/linux/component.h @@ -38,10 +38,10 @@ int component_add_typed(struct device *dev, const struct component_ops *ops, int subcomponent); void component_del(struct device *, const struct component_ops *); -int component_bind_all(struct device *master, void *master_data); -void component_unbind_all(struct device *master, void *master_data); +int component_bind_all(struct device *parent, void *data); +void component_unbind_all(struct device *parent, void *data); -struct master; +struct aggregate_device; /** * struct component_master_ops - callback for the aggregate driver @@ -89,22 +89,22 @@ struct component_match; int component_master_add_with_match(struct device *, const struct component_master_ops *, struct component_match *); -void component_match_add_release(struct device *master, +void component_match_add_release(struct device *parent, struct component_match **matchptr, void (*release)(struct device *, void *), int (*compare)(struct device *, void *), void *compare_data); -void component_match_add_typed(struct device *master, +void component_match_add_typed(struct device *parent, struct component_match **matchptr, int (*compare_typed)(struct device *, int, void *), void *compare_data); /** * component_match_add - add a component match entry - * @master: device with the aggregate driver + * @parent: device with the aggregate driver * @matchptr: pointer to the list of component matches * @compare: compare function to match against all components * @compare_data: opaque pointer passed to the @compare function * - * Adds a new component match to the list stored in @matchptr, which the @master + * Adds a new component match to the list stored in @matchptr, which the @parent * aggregate driver needs to function. The list of component matches pointed to * by @matchptr must be initialized to NULL before adding the first match. This * only matches against components added with component_add(). @@ -114,11 +114,11 @@ void component_match_add_typed(struct device *master, * * See also component_match_add_release() and component_match_add_typed(). */ -static inline void component_match_add(struct device *master, +static inline void component_match_add(struct device *parent, struct component_match **matchptr, int (*compare)(struct device *, void *), void *compare_data) { - component_match_add_release(master, matchptr, NULL, compare, + component_match_add_release(parent, matchptr, NULL, compare, compare_data); } -- cgit From 31455bbda2081af83f72bb4636348b12b82c37c1 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Tue, 25 Jan 2022 01:58:36 +0100 Subject: spi: pxa2xx_spi: Convert to use GPIO descriptors This converts the PXA2xx SPI driver to use GPIO descriptors exclusively to retrieve GPIO chip select lines. The device tree and ACPI paths of the driver already use descriptors, hence ->use_gpio_descriptors is already set and this codepath is well tested. Convert all the PXA boards providing chip select GPIOs as platform data and drop the old GPIO chipselect handling in favor of the core managing it exclusively. Cc: Marek Vasut Cc: Daniel Mack Cc: Haojian Zhuang Cc: Robert Jarzmik Cc: linux-arm-kernel@lists.infradead.org Acked-by: Jonathan Cameron Signed-off-by: Linus Walleij Link: https://lore.kernel.org/r/20220125005836.494807-1-linus.walleij@linaro.org Signed-off-by: Mark Brown --- include/linux/spi/pxa2xx_spi.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/spi/pxa2xx_spi.h b/include/linux/spi/pxa2xx_spi.h index ca74dce36706..4658e7801b42 100644 --- a/include/linux/spi/pxa2xx_spi.h +++ b/include/linux/spi/pxa2xx_spi.h @@ -42,7 +42,6 @@ struct pxa2xx_spi_chip { u8 rx_threshold; u8 dma_burst_size; u32 timeout; - int gpio_cs; }; #if defined(CONFIG_ARCH_PXA) || defined(CONFIG_ARCH_MMP) -- cgit From d80976d9ffd9d7f89a26134a299b236910477f3b Mon Sep 17 00:00:00 2001 From: Daniel Vetter Date: Tue, 30 Nov 2021 16:27:55 +0100 Subject: dma-resv: some doc polish for iterators MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Hammer it a bit more in that iterators can be restarted and when that matters, plus suggest to prefer the locked version whenver. Also delete the two leftover kerneldoc for static functions plus sprinkle some more links while at it. v2: Keep some comments (Christian) Reviewed-by: Christian König Signed-off-by: Daniel Vetter Cc: Sumit Semwal Cc: "Christian König" Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20211130152756.1388106-1-daniel.vetter@ffwll.ch --- include/linux/dma-resv.h | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index a715df97b31a..afdfdfac729f 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -153,6 +153,13 @@ struct dma_resv { * struct dma_resv_iter - current position into the dma_resv fences * * Don't touch this directly in the driver, use the accessor function instead. + * + * IMPORTANT + * + * When using the lockless iterators like dma_resv_iter_next_unlocked() or + * dma_resv_for_each_fence_unlocked() beware that the iterator can be restarted. + * Code which accumulates statistics or similar needs to check for this with + * dma_resv_iter_is_restarted(). */ struct dma_resv_iter { /** @obj: The dma_resv object we iterate over */ @@ -243,7 +250,11 @@ static inline bool dma_resv_iter_is_restarted(struct dma_resv_iter *cursor) * &dma_resv.lock and using RCU instead. The cursor needs to be initialized * with dma_resv_iter_begin() and cleaned up with dma_resv_iter_end(). Inside * the iterator a reference to the dma_fence is held and the RCU lock dropped. - * When the dma_resv is modified the iteration starts over again. + * + * Beware that the iterator can be restarted when the struct dma_resv for + * @cursor is modified. Code which accumulates statistics or similar needs to + * check for this with dma_resv_iter_is_restarted(). For this reason prefer the + * lock iterator dma_resv_for_each_fence() whenever possible. */ #define dma_resv_for_each_fence_unlocked(cursor, fence) \ for (fence = dma_resv_iter_first_unlocked(cursor); \ -- cgit From 943515090ec67f81f6f93febfddb8c9118357e97 Mon Sep 17 00:00:00 2001 From: AngeloGioacchino Del Regno Date: Wed, 8 Dec 2021 09:34:22 +0100 Subject: firmware: qcom: scm: Add function to set the maximum IOMMU pool size This is not necessary for basic functionality of the IOMMU, but it's an optimization that tells to the TZ what's the maximum mappable size for the secure IOMMUs, so that it can optimize the data structures in the TZ itself. Signed-off-by: AngeloGioacchino Del Regno [Marijn: ported from 5.3 to the unified architecture in 5.11] Signed-off-by: Marijn Suijten Reviewed-by: Konrad Dybcio Signed-off-by: Bjorn Andersson Link: https://lore.kernel.org/r/20211208083423.22037-3-marijn.suijten@somainline.org --- include/linux/qcom_scm.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/qcom_scm.h b/include/linux/qcom_scm.h index 81cad9e1e412..8a065f8660c1 100644 --- a/include/linux/qcom_scm.h +++ b/include/linux/qcom_scm.h @@ -83,6 +83,7 @@ extern bool qcom_scm_restore_sec_cfg_available(void); extern int qcom_scm_restore_sec_cfg(u32 device_id, u32 spare); extern int qcom_scm_iommu_secure_ptbl_size(u32 spare, size_t *size); extern int qcom_scm_iommu_secure_ptbl_init(u64 addr, u32 size, u32 spare); +extern int qcom_scm_iommu_set_cp_pool_size(u32 spare, u32 size); extern int qcom_scm_mem_protect_video_var(u32 cp_start, u32 cp_size, u32 cp_nonpixel_start, u32 cp_nonpixel_size); -- cgit From 071a13332de894cb3c38b17c82350f1e4167c023 Mon Sep 17 00:00:00 2001 From: AngeloGioacchino Del Regno Date: Wed, 8 Dec 2021 09:34:23 +0100 Subject: firmware: qcom: scm: Add function to set IOMMU pagetable addressing Add a function to change the IOMMU pagetable addressing to AArch32 LPAE or AArch64. If doing that, then this must be done for each IOMMU context (not necessarily at the same time). Signed-off-by: AngeloGioacchino Del Regno [Marijn: ported from 5.3 to the unified architecture in 5.11] Signed-off-by: Marijn Suijten Reviewed-by: Konrad Dybcio Signed-off-by: Bjorn Andersson Link: https://lore.kernel.org/r/20211208083423.22037-4-marijn.suijten@somainline.org --- include/linux/qcom_scm.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/qcom_scm.h b/include/linux/qcom_scm.h index 8a065f8660c1..ca4a88d7cbdc 100644 --- a/include/linux/qcom_scm.h +++ b/include/linux/qcom_scm.h @@ -108,6 +108,7 @@ extern bool qcom_scm_hdcp_available(void); extern int qcom_scm_hdcp_req(struct qcom_scm_hdcp_req *req, u32 req_cnt, u32 *resp); +extern int qcom_scm_iommu_set_pt_format(u32 sec_id, u32 ctx_num, u32 pt_fmt); extern int qcom_scm_qsmmu500_wait_safe_toggle(bool en); extern int qcom_scm_lmh_dcvsh(u32 payload_fn, u32 payload_reg, u32 payload_val, -- cgit From 6d3ac94bae21de6b6a1d81596752428960012816 Mon Sep 17 00:00:00 2001 From: Yang Guang Date: Fri, 14 Jan 2022 08:11:02 +0800 Subject: ssb: fix boolreturn.cocci warning The coccinelle report ./include/linux/ssb/ssb_driver_gige.h:98:8-9: WARNING: return of 0/1 in function 'ssb_gige_must_flush_posted_writes' with return type bool Return statements in functions returning bool should use true/false instead of 1/0. Reported-by: Zeal Robot Signed-off-by: Yang Guang Signed-off-by: David Yang Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/fa4f1fa737e715eb62a85229ac5f12bae21145cf.1642065490.git.davidcomponentone@gmail.com --- include/linux/ssb/ssb_driver_gige.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ssb/ssb_driver_gige.h b/include/linux/ssb/ssb_driver_gige.h index 15ba0df1ee0d..28c145a51e57 100644 --- a/include/linux/ssb/ssb_driver_gige.h +++ b/include/linux/ssb/ssb_driver_gige.h @@ -95,7 +95,7 @@ static inline bool ssb_gige_must_flush_posted_writes(struct pci_dev *pdev) struct ssb_gige *dev = pdev_to_ssb_gige(pdev); if (dev) return (dev->dev->bus->chip_id == 0x4785); - return 0; + return false; } /* Get the device MAC address */ -- cgit From ef9989afda73332df566852d6e9ca695c05f10ce Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Tue, 1 Feb 2022 13:29:22 +0000 Subject: kvm: add guest_state_{enter,exit}_irqoff() When transitioning to/from guest mode, it is necessary to inform lockdep, tracing, and RCU in a specific order, similar to the requirements for transitions to/from user mode. Additionally, it is necessary to perform vtime accounting for a window around running the guest, with RCU enabled, such that timer interrupts taken from the guest can be accounted as guest time. Most architectures don't handle all the necessary pieces, and a have a number of common bugs, including unsafe usage of RCU during the window between guest_enter() and guest_exit(). On x86, this was dealt with across commits: 87fa7f3e98a1310e ("x86/kvm: Move context tracking where it belongs") 0642391e2139a2c1 ("x86/kvm/vmx: Add hardirq tracing to guest enter/exit") 9fc975e9efd03e57 ("x86/kvm/svm: Add hardirq tracing on guest enter/exit") 3ebccdf373c21d86 ("x86/kvm/vmx: Move guest enter/exit into .noinstr.text") 135961e0a7d555fc ("x86/kvm/svm: Move guest enter/exit into .noinstr.text") 160457140187c5fb ("KVM: x86: Defer vtime accounting 'til after IRQ handling") bc908e091b326467 ("KVM: x86: Consolidate guest enter/exit logic to common helpers") ... but those fixes are specific to x86, and as the resulting logic (while correct) is split across generic helper functions and x86-specific helper functions, it is difficult to see that the entry/exit accounting is balanced. This patch adds generic helpers which architectures can use to handle guest entry/exit consistently and correctly. The guest_{enter,exit}() helpers are split into guest_timing_{enter,exit}() to perform vtime accounting, and guest_context_{enter,exit}() to perform the necessary context tracking and RCU management. The existing guest_{enter,exit}() heleprs are left as wrappers of these. Atop this, new guest_state_enter_irqoff() and guest_state_exit_irqoff() helpers are added to handle the ordering of lockdep, tracing, and RCU manageent. These are inteneded to mirror exit_to_user_mode() and enter_from_user_mode(). Subsequent patches will migrate architectures over to the new helpers, following a sequence: guest_timing_enter_irqoff(); guest_state_enter_irqoff(); < run the vcpu > guest_state_exit_irqoff(); < take any pending IRQs > guest_timing_exit_irqoff(); This sequences handles all of the above correctly, and more clearly balances the entry and exit portions, making it easier to understand. The existing helpers are marked as deprecated, and will be removed once all architectures have been converted. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland Reviewed-by: Marc Zyngier Reviewed-by: Paolo Bonzini Reviewed-by: Nicolas Saenz Julienne Message-Id: <20220201132926.3301912-2-mark.rutland@arm.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 112 +++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 109 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index f079820f52b5..b3810976a27f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -29,7 +29,9 @@ #include #include #include +#include #include +#include #include #include #include @@ -368,8 +370,11 @@ struct kvm_vcpu { u64 last_used_slot_gen; }; -/* must be called with irqs disabled */ -static __always_inline void guest_enter_irqoff(void) +/* + * Start accounting time towards a guest. + * Must be called before entering guest context. + */ +static __always_inline void guest_timing_enter_irqoff(void) { /* * This is running in ioctl context so its safe to assume that it's the @@ -378,7 +383,18 @@ static __always_inline void guest_enter_irqoff(void) instrumentation_begin(); vtime_account_guest_enter(); instrumentation_end(); +} +/* + * Enter guest context and enter an RCU extended quiescent state. + * + * Between guest_context_enter_irqoff() and guest_context_exit_irqoff() it is + * unsafe to use any code which may directly or indirectly use RCU, tracing + * (including IRQ flag tracing), or lockdep. All code in this period must be + * non-instrumentable. + */ +static __always_inline void guest_context_enter_irqoff(void) +{ /* * KVM does not hold any references to rcu protected data when it * switches CPU into a guest mode. In fact switching to a guest mode @@ -394,16 +410,79 @@ static __always_inline void guest_enter_irqoff(void) } } -static __always_inline void guest_exit_irqoff(void) +/* + * Deprecated. Architectures should move to guest_timing_enter_irqoff() and + * guest_state_enter_irqoff(). + */ +static __always_inline void guest_enter_irqoff(void) +{ + guest_timing_enter_irqoff(); + guest_context_enter_irqoff(); +} + +/** + * guest_state_enter_irqoff - Fixup state when entering a guest + * + * Entry to a guest will enable interrupts, but the kernel state is interrupts + * disabled when this is invoked. Also tell RCU about it. + * + * 1) Trace interrupts on state + * 2) Invoke context tracking if enabled to adjust RCU state + * 3) Tell lockdep that interrupts are enabled + * + * Invoked from architecture specific code before entering a guest. + * Must be called with interrupts disabled and the caller must be + * non-instrumentable. + * The caller has to invoke guest_timing_enter_irqoff() before this. + * + * Note: this is analogous to exit_to_user_mode(). + */ +static __always_inline void guest_state_enter_irqoff(void) +{ + instrumentation_begin(); + trace_hardirqs_on_prepare(); + lockdep_hardirqs_on_prepare(CALLER_ADDR0); + instrumentation_end(); + + guest_context_enter_irqoff(); + lockdep_hardirqs_on(CALLER_ADDR0); +} + +/* + * Exit guest context and exit an RCU extended quiescent state. + * + * Between guest_context_enter_irqoff() and guest_context_exit_irqoff() it is + * unsafe to use any code which may directly or indirectly use RCU, tracing + * (including IRQ flag tracing), or lockdep. All code in this period must be + * non-instrumentable. + */ +static __always_inline void guest_context_exit_irqoff(void) { context_tracking_guest_exit(); +} +/* + * Stop accounting time towards a guest. + * Must be called after exiting guest context. + */ +static __always_inline void guest_timing_exit_irqoff(void) +{ instrumentation_begin(); /* Flush the guest cputime we spent on the guest */ vtime_account_guest_exit(); instrumentation_end(); } +/* + * Deprecated. Architectures should move to guest_state_exit_irqoff() and + * guest_timing_exit_irqoff(). + */ +static __always_inline void guest_exit_irqoff(void) +{ + guest_context_exit_irqoff(); + guest_timing_exit_irqoff(); +} + static inline void guest_exit(void) { unsigned long flags; @@ -413,6 +492,33 @@ static inline void guest_exit(void) local_irq_restore(flags); } +/** + * guest_state_exit_irqoff - Establish state when returning from guest mode + * + * Entry from a guest disables interrupts, but guest mode is traced as + * interrupts enabled. Also with NO_HZ_FULL RCU might be idle. + * + * 1) Tell lockdep that interrupts are disabled + * 2) Invoke context tracking if enabled to reactivate RCU + * 3) Trace interrupts off state + * + * Invoked from architecture specific code after exiting a guest. + * Must be invoked with interrupts disabled and the caller must be + * non-instrumentable. + * The caller has to invoke guest_timing_exit_irqoff() after this. + * + * Note: this is analogous to enter_from_user_mode(). + */ +static __always_inline void guest_state_exit_irqoff(void) +{ + lockdep_hardirqs_off(CALLER_ADDR0); + guest_context_exit_irqoff(); + + instrumentation_begin(); + trace_hardirqs_off_finish(); + instrumentation_end(); +} + static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu) { /* -- cgit From 2220af8ca61ae67de4ec3deec1c6395a2f65b9fd Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Tue, 1 Feb 2022 14:06:47 +0100 Subject: power: supply: core: Refactor power_supply_set_input_current_limit_from_supplier() Some (USB) charger ICs have variants with USB D+ and D- pins to do their own builtin charger-type detection, like e.g. the bq24190 and bq25890 and also variants which lack this functionality, e.g. the bq24192 and bq25892. In case the charger-type; and thus the input-current-limit detection is done outside the charger IC then we need some way to communicate this to the charger IC. In the past extcon was used for this, but if the external detection does e.g. full USB PD negotiation then the extcon cable-types do not convey enough information. For these setups it was decided to model the external charging "brick" and the parameters negotiated with it as a power_supply class-device itself; and power_supply_set_input_current_limit_from_supplier() was introduced to allow drivers to get the input-current-limit this way. But in some cases psy drivers may want to know other properties, e.g. the bq25892 can do "quick-charge" negotiation by pulsing its current draw, but this should only be done if the usb_type psy-property of its supplier is set to DCP (and device-properties indicate the board allows higher voltages). Instead of adding extra helper functions for each property which a psy-driver wants to query from its supplier, refactor power_supply_set_input_current_limit_from_supplier() into a more generic power_supply_get_property_from_supplier() function. Reviewed-by: Andy Shevchenko Signed-off-by: Hans de Goede Signed-off-by: Sebastian Reichel --- include/linux/power_supply.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/power_supply.h b/include/linux/power_supply.h index e218041cc000..006111917d1a 100644 --- a/include/linux/power_supply.h +++ b/include/linux/power_supply.h @@ -597,8 +597,9 @@ power_supply_temp2resist_simple(struct power_supply_resistance_temp_table *table int table_len, int temp); extern void power_supply_changed(struct power_supply *psy); extern int power_supply_am_i_supplied(struct power_supply *psy); -extern int power_supply_set_input_current_limit_from_supplier( - struct power_supply *psy); +int power_supply_get_property_from_supplier(struct power_supply *psy, + enum power_supply_property psp, + union power_supply_propval *val); extern int power_supply_set_battery_charged(struct power_supply *psy); #ifdef CONFIG_POWER_SUPPLY -- cgit From 79d35365a5858466ff7b37aaf1fcf11b683b9442 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Tue, 1 Feb 2022 14:06:56 +0100 Subject: power: supply: bq25890: Add support for registering the Vbus boost converter as a regulator The bq25890_charger code supports enabling/disabling the boost converter based on usb-phy notifications. But the usb-phy framework is not used on all boards/platforms. At support for registering the Vbus boost converter as a standard regulator when there is no usb-phy on the board. Also add support for providing regulator_init_data through platform_data for use on boards where device-tree is not used and the platform code must thus provide the regulator_init_data. Reviewed-by: Andy Shevchenko Signed-off-by: Hans de Goede Signed-off-by: Sebastian Reichel --- include/linux/power/bq25890_charger.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) create mode 100644 include/linux/power/bq25890_charger.h (limited to 'include/linux') diff --git a/include/linux/power/bq25890_charger.h b/include/linux/power/bq25890_charger.h new file mode 100644 index 000000000000..c706ddb77a08 --- /dev/null +++ b/include/linux/power/bq25890_charger.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Platform data for the TI bq25890 battery charger driver. + */ + +#ifndef _BQ25890_CHARGER_H_ +#define _BQ25890_CHARGER_H_ + +struct regulator_init_data; + +struct bq25890_platform_data { + const struct regulator_init_data *regulator_init_data; +}; + +#endif -- cgit From 3afcbe09470091ca8a8048ef7c96701839a70961 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Tue, 1 Feb 2022 14:07:00 +0100 Subject: mfd: intel_soc_pmic_chtwc: Add cht_wc_model data to struct intel_soc_pmic Tablet / laptop designs using an Intel Cherry Trail x86 main SoC with an Intel Whiskey Cove PMIC do not use a single standard setup for the charger, fuel-gauge and other chips surrounding the PMIC / charging+data USB port. Unlike what is normal on x86 this diversity in designs is not handled by the ACPI tables. On 2 of the 3 known designs there are no standard (PNP0C0A) ACPI battery devices and on the 3th design the ACPI battery device does not work under Linux due to it requiring non-standard and undocumented ACPI behavior. So to make things work under Linux we use native charger and fuel-gauge drivers on these devices, re-using the native drivers used on ARM boards with the same charger / fuel-gauge ICs. This requires various MFD-cell drivers for the CHT-WC PMIC cells to know which model they are exactly running on so that they can e.g. instantiate an I2C-client for the right model charger-IC (the charger is connected to an I2C-controller which is part of the PMIC). Rather then duplicating DMI-id matching to check which model we are running on in each MFD-cell driver, add a check for this to the shared drivers/mfd/intel_soc_pmic_chtwc.c code by using a DMI table for all 3 known models: 1. The GPD Win and GPD Pocket mini-laptops, these are really 2 models but the Pocket re-uses the GPD Win's design in a different housing: The WC PMIC is connected to a TI BQ24292i charger, paired with a Maxim MAX17047 fuelgauge + a FUSB302 USB Type-C Controller + a PI3USB30532 USB switch, for a fully functional Type-C port. 2. The Xiaomi Mi Pad 2: The WC PMIC is connected to a TI BQ25890 charger, paired with a TI BQ27520 fuelgauge, using the TI BQ25890 for BC1.2 charger type detection, for a USB-2 only Type-C port without PD. 3. The Lenovo Yoga Book YB1-X90 / Lenovo Yoga Book YB1-X91 series: The WC PMIC is connected to a TI BQ25892 charger, paired with a TI BQ27542 fuelgauge, using the WC PMIC for BC1.2 charger type detection and using the BQ25892's Mediatek Pump Express+ (1.0) support to enable charging with up to 12V through a micro-USB port. Reviewed-by: Andy Shevchenko Acked-by: Lee Jones Signed-off-by: Hans de Goede Signed-off-by: Sebastian Reichel --- include/linux/mfd/intel_soc_pmic.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mfd/intel_soc_pmic.h b/include/linux/mfd/intel_soc_pmic.h index 6a88e34cb955..945bde1fe55c 100644 --- a/include/linux/mfd/intel_soc_pmic.h +++ b/include/linux/mfd/intel_soc_pmic.h @@ -13,6 +13,13 @@ #include +enum intel_cht_wc_models { + INTEL_CHT_WC_UNKNOWN, + INTEL_CHT_WC_GPD_WIN_POCKET, + INTEL_CHT_WC_XIAOMI_MIPAD2, + INTEL_CHT_WC_LENOVO_YOGABOOK1, +}; + /** * struct intel_soc_pmic - Intel SoC PMIC data * @irq: Master interrupt number of the parent PMIC device @@ -39,6 +46,7 @@ struct intel_soc_pmic { struct regmap_irq_chip_data *irq_chip_data_crit; struct device *dev; struct intel_scu_ipc_dev *scu; + enum intel_cht_wc_models cht_wc_model; }; int intel_soc_pmic_exec_mipi_pmic_seq_element(u16 i2c_address, u32 reg_address, -- cgit From ab28e944197fa78e6af7c4a0ffd6bba9a5bbacf0 Mon Sep 17 00:00:00 2001 From: Tony Luck Date: Mon, 31 Jan 2022 15:01:11 -0800 Subject: topology/sysfs: Add PPIN in sysfs under cpu topology PPIN is the Protected Processor Identification Number. This is used to identify the socket as a Field Replaceable Unit (FRU). Existing code only displays this when reporting errors. But this makes it inconvenient for large clusters to use it for its intended purpose of inventory control. Add ppin to /sys/devices/system/cpu/cpu*/topology to make what is already available using RDMSR more easily accessible. Make the file read only for root in case there are still people concerned about making a unique system "serial number" available. Signed-off-by: Tony Luck Signed-off-by: Borislav Petkov Acked-by: Greg Kroah-Hartman Link: https://lore.kernel.org/r/20220131230111.2004669-6-tony.luck@intel.com --- include/linux/topology.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/topology.h b/include/linux/topology.h index a6e201758ae9..f19bc3626297 100644 --- a/include/linux/topology.h +++ b/include/linux/topology.h @@ -211,6 +211,9 @@ static inline int cpu_to_mem(int cpu) #ifndef topology_drawer_id #define topology_drawer_id(cpu) ((void)(cpu), -1) #endif +#ifndef topology_ppin +#define topology_ppin(cpu) ((void)(cpu), 0ull) +#endif #ifndef topology_sibling_cpumask #define topology_sibling_cpumask(cpu) cpumask_of(cpu) #endif -- cgit From bee9f65523218e3baeeecde9295c8fbe9bc08e0a Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 27 Jan 2022 16:02:50 +0000 Subject: netfs, cachefiles: Add a method to query presence of data in the cache Add a netfs_cache_ops method by which a network filesystem can ask the cache about what data it has available and where so that it can make a multipage read more efficient. Signed-off-by: David Howells cc: linux-cachefs@redhat.com Acked-by: Jeff Layton Reviewed-by: Rohith Surabattula Signed-off-by: Steve French --- include/linux/netfs.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index b46c39d98bbd..614f22213e21 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -244,6 +244,13 @@ struct netfs_cache_ops { int (*prepare_write)(struct netfs_cache_resources *cres, loff_t *_start, size_t *_len, loff_t i_size, bool no_space_allocated_yet); + + /* Query the occupancy of the cache in a region, returning where the + * next chunk of data starts and how long it is. + */ + int (*query_occupancy)(struct netfs_cache_resources *cres, + loff_t start, size_t len, size_t granularity, + loff_t *_data_start, size_t *_data_len); }; struct readahead_control; -- cgit From 90b2433edb6d995bd23d6adde753095b4ab26104 Mon Sep 17 00:00:00 2001 From: Maíra Canal Date: Mon, 31 Jan 2022 15:22:42 -0300 Subject: seq_file: fix NULL pointer arithmetic warning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Implement conditional logic in order to replace NULL pointer arithmetic. The use of NULL pointer arithmetic was pointed out by clang with the following warning: fs/kernfs/file.c:128:15: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] return NULL + !*ppos; ~~~~ ^ fs/seq_file.c:559:14: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] return NULL + (*pos == 0); Signed-off-by: Maíra Canal Signed-off-by: Al Viro --- include/linux/seq_file.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h index 88cc16444b43..60820ab511d2 100644 --- a/include/linux/seq_file.h +++ b/include/linux/seq_file.h @@ -162,6 +162,7 @@ int seq_dentry(struct seq_file *, struct dentry *, const char *); int seq_path_root(struct seq_file *m, const struct path *path, const struct path *root, const char *esc); +void *single_start(struct seq_file *, loff_t *); int single_open(struct file *, int (*)(struct seq_file *, void *), void *); int single_open_size(struct file *, int (*)(struct seq_file *, void *), void *, size_t); int single_release(struct inode *, struct file *); -- cgit From e3dc1399506f894110667ee5c66a6a70f06f3348 Mon Sep 17 00:00:00 2001 From: Stefan Binding Date: Fri, 21 Jan 2022 17:24:23 +0000 Subject: spi: Make spi_alloc_device and spi_add_device public again This functions were previously made private since they were not used. However, these functions will be needed again. Partial revert of commit da21fde0fdb3 ("spi: Make several public functions private to spi.c") Signed-off-by: Stefan Binding Reviewed-by: Hans de Goede Link: https://lore.kernel.org/r/20220121172431.6876-2-sbinding@opensource.cirrus.com Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 7ab3fed7b804..0346a3ff27fd 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -1452,7 +1452,19 @@ spi_register_board_info(struct spi_board_info const *info, unsigned n) * use spi_new_device() to describe each device. You can also call * spi_unregister_device() to start making that device vanish, but * normally that would be handled by spi_unregister_controller(). + * + * You can also use spi_alloc_device() and spi_add_device() to use a two + * stage registration sequence for each spi_device. This gives the caller + * some more control over the spi_device structure before it is registered, + * but requires that caller to initialize fields that would otherwise + * be defined using the board info. */ +extern struct spi_device * +spi_alloc_device(struct spi_controller *ctlr); + +extern int +spi_add_device(struct spi_device *spi); + extern struct spi_device * spi_new_device(struct spi_controller *, struct spi_board_info *); -- cgit From 000bee0ed70af79e610444096fb453430220960f Mon Sep 17 00:00:00 2001 From: Stefan Binding Date: Fri, 21 Jan 2022 17:24:24 +0000 Subject: spi: Create helper API to lookup ACPI info for spi device This can then be used to find a spi resource inside an ACPI node, and allocate a spi device. Signed-off-by: Stefan Binding Reviewed-by: Hans de Goede Link: https://lore.kernel.org/r/20220121172431.6876-3-sbinding@opensource.cirrus.com Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 0346a3ff27fd..d159cef12f1a 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -16,6 +16,7 @@ #include #include +#include struct dma_chan; struct software_node; @@ -759,6 +760,11 @@ extern int devm_spi_register_controller(struct device *dev, struct spi_controller *ctlr); extern void spi_unregister_controller(struct spi_controller *ctlr); +#if IS_ENABLED(CONFIG_ACPI) +extern struct spi_device *acpi_spi_device_alloc(struct spi_controller *ctlr, + struct acpi_device *adev); +#endif + /* * SPI resource management while processing a SPI message */ -- cgit From 87e59b36e5e26122efd55d77adb9fac827987db0 Mon Sep 17 00:00:00 2001 From: Stefan Binding Date: Fri, 21 Jan 2022 17:24:25 +0000 Subject: spi: Support selection of the index of the ACPI Spi Resource before alloc If a node contains more than one SPI resource it may be necessary to use an index to select which one you want to allocate a spi device for. Signed-off-by: Stefan Binding Reviewed-by: Hans de Goede Link: https://lore.kernel.org/r/20220121172431.6876-4-sbinding@opensource.cirrus.com Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index d159cef12f1a..e5bbb9cbd3d7 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -762,7 +762,8 @@ extern void spi_unregister_controller(struct spi_controller *ctlr); #if IS_ENABLED(CONFIG_ACPI) extern struct spi_device *acpi_spi_device_alloc(struct spi_controller *ctlr, - struct acpi_device *adev); + struct acpi_device *adev, + int index); #endif /* -- cgit From e612af7acef2459f1afd885f4107748995a05963 Mon Sep 17 00:00:00 2001 From: Stefan Binding Date: Fri, 21 Jan 2022 17:24:26 +0000 Subject: spi: Add API to count spi acpi resources Some ACPI nodes may have more than one Spi Resource. To be able to handle these case, its necessary to have a way of counting these resources. Signed-off-by: Stefan Binding Reviewed-by: Hans de Goede Link: https://lore.kernel.org/r/20220121172431.6876-5-sbinding@opensource.cirrus.com Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index e5bbb9cbd3d7..394b4241d989 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -764,6 +764,7 @@ extern void spi_unregister_controller(struct spi_controller *ctlr); extern struct spi_device *acpi_spi_device_alloc(struct spi_controller *ctlr, struct acpi_device *adev, int index); +int acpi_spi_count_resources(struct acpi_device *adev); #endif /* -- cgit From 6d5c900eb64107001e91e1f46bddc254dded8a59 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 24 Jan 2022 09:22:41 -0800 Subject: net/mlx5e: Use struct_group() for memcpy() region In preparation for FORTIFY_SOURCE performing compile-time and run-time field bounds checking for memcpy(), memmove(), and memset(), avoid intentionally writing across neighboring fields. Use struct_group() in struct vlan_ethhdr around members h_dest and h_source, so they can be referenced together. This will allow memcpy() and sizeof() to more easily reason about sizes, improve readability, and avoid future warnings about writing beyond the end of h_dest. "pahole" shows no size nor member offset changes to struct vlan_ethhdr. "objdump -d" shows no object code changes. Fixes: 34802a42b352 ("net/mlx5e: Do not modify the TX SKB") Signed-off-by: Kees Cook Signed-off-by: Saeed Mahameed --- include/linux/if_vlan.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h index 8420fe504927..2be4dd7e90a9 100644 --- a/include/linux/if_vlan.h +++ b/include/linux/if_vlan.h @@ -46,8 +46,10 @@ struct vlan_hdr { * @h_vlan_encapsulated_proto: packet type ID or len */ struct vlan_ethhdr { - unsigned char h_dest[ETH_ALEN]; - unsigned char h_source[ETH_ALEN]; + struct_group(addrs, + unsigned char h_dest[ETH_ALEN]; + unsigned char h_source[ETH_ALEN]; + ); __be16 h_vlan_proto; __be16 h_vlan_TCI; __be16 h_vlan_encapsulated_proto; -- cgit From c8eaf6ac76f40f6c59fc7d056e2e08c4a57ea9c7 Mon Sep 17 00:00:00 2001 From: Zhen Ni Date: Fri, 28 Jan 2022 17:50:25 +0800 Subject: sched: move autogroup sysctls into its own file move autogroup sysctls to autogroup.c and use the new register_sysctl_init() to register the sysctl interface. Signed-off-by: Zhen Ni Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20220128095025.8745-1-nizhen@uniontech.com --- include/linux/sched/sysctl.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index c19dd5a2c05c..3f2b70f8d32c 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -45,10 +45,6 @@ extern unsigned int sysctl_sched_uclamp_util_min_rt_default; extern unsigned int sysctl_sched_cfs_bandwidth_slice; #endif -#ifdef CONFIG_SCHED_AUTOGROUP -extern unsigned int sysctl_sched_autogroup_enabled; -#endif - extern int sysctl_sched_rr_timeslice; extern int sched_rr_timeslice; -- cgit From 1148836fd3226c20de841084aba24184d4fbbe77 Mon Sep 17 00:00:00 2001 From: Helge Deller Date: Wed, 2 Feb 2022 14:55:29 +0100 Subject: Revert "fbdev: Garbage collect fbdev scrolling acceleration, part 1 (from TODO list)" This reverts commit b3ec8cdf457e5e63d396fe1346cc788cf7c1b578. Revert the second (of 2) commits which disabled scrolling acceleration in fbcon/fbdev. It introduced a regression for fbdev-supported graphic cards because of the performance penalty by doing screen scrolling by software instead of using the existing graphic card 2D hardware acceleration. Console scrolling acceleration was disabled by dropping code which checked at runtime the driver hardware capabilities for the BINFO_HWACCEL_COPYAREA or FBINFO_HWACCEL_FILLRECT flags and if set, it enabled scrollmode SCROLL_MOVE which uses hardware acceleration to move screen contents. After dropping those checks scrollmode was hard-wired to SCROLL_REDRAW instead, which forces all graphic cards to redraw every character at the new screen position when scrolling. This change effectively disabled all hardware-based scrolling acceleration for ALL drivers, because now all kind of 2D hardware acceleration (bitblt, fillrect) in the drivers isn't used any longer. The original commit message mentions that only 3 DRM drivers (nouveau, omapdrm and gma500) used hardware acceleration in the past and thus code for checking and using scrolling acceleration is obsolete. This statement is NOT TRUE, because beside the DRM drivers there are around 35 other fbdev drivers which depend on fbdev/fbcon and still provide hardware acceleration for fbdev/fbcon. The original commit message also states that syzbot found lots of bugs in fbcon and thus it's "often the solution to just delete code and remove features". This is true, and the bugs - which actually affected all users of fbcon, including DRM - were fixed, or code was dropped like e.g. the support for software scrollback in vgacon (commit 973c096f6a85). So to further analyze which bugs were found by syzbot, I've looked through all patches in drivers/video which were tagged with syzbot or syzkaller back to year 2005. The vast majority fixed the reported issues on a higher level, e.g. when screen is to be resized, or when font size is to be changed. The few ones which touched driver code fixed a real driver bug, e.g. by adding a check. But NONE of those patches touched code of either the SCROLL_MOVE or the SCROLL_REDRAW case. That means, there was no real reason why SCROLL_MOVE had to be ripped-out and just SCROLL_REDRAW had to be used instead. The only reason I can imagine so far was that SCROLL_MOVE wasn't used by DRM and as such it was assumed that it could go away. That argument completely missed the fact that SCROLL_MOVE is still heavily used by fbdev (non-DRM) drivers. Some people mention that using memcpy() instead of the hardware acceleration is pretty much the same speed. But that's not true, at least not for older graphic cards and machines where we see speed decreases by factor 10 and more and thus this change leads to console responsiveness way worse than before. That's why the original commit is to be reverted. By reverting we reintroduce hardware-based scrolling acceleration and fix the performance regression for fbdev drivers. There isn't any impact on DRM when reverting those patches. Signed-off-by: Helge Deller Acked-by: Geert Uytterhoeven Acked-by: Sven Schnelle Cc: stable@vger.kernel.org # v5.16+ Signed-off-by: Helge Deller Signed-off-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220202135531.92183-2-deller@gmx.de --- include/linux/fb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fb.h b/include/linux/fb.h index 3da95842b207..02f362c661c8 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -262,7 +262,7 @@ struct fb_ops { /* Draws a rectangle */ void (*fb_fillrect) (struct fb_info *info, const struct fb_fillrect *rect); - /* Copy data from area to another. Obsolete. */ + /* Copy data from area to another */ void (*fb_copyarea) (struct fb_info *info, const struct fb_copyarea *region); /* Draws a image to the display */ void (*fb_imageblit) (struct fb_info *info, const struct fb_image *image); -- cgit From 3ec762fb13c7e7273800b94c80db1c2cc37590d1 Mon Sep 17 00:00:00 2001 From: Ansuel Smith Date: Wed, 2 Feb 2022 01:03:23 +0100 Subject: net: dsa: tag_qca: move define to include linux/dsa Move tag_qca define to include dir linux/dsa as the qca8k require access to the tagger define to support in-band mdio read/write using ethernet packet. Signed-off-by: Ansuel Smith Reviewed-by: Vladimir Oltean Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/linux/dsa/tag_qca.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) create mode 100644 include/linux/dsa/tag_qca.h (limited to 'include/linux') diff --git a/include/linux/dsa/tag_qca.h b/include/linux/dsa/tag_qca.h new file mode 100644 index 000000000000..c02d2d39ff4a --- /dev/null +++ b/include/linux/dsa/tag_qca.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __TAG_QCA_H +#define __TAG_QCA_H + +#define QCA_HDR_LEN 2 +#define QCA_HDR_VERSION 0x2 + +#define QCA_HDR_RECV_VERSION GENMASK(15, 14) +#define QCA_HDR_RECV_PRIORITY GENMASK(13, 11) +#define QCA_HDR_RECV_TYPE GENMASK(10, 6) +#define QCA_HDR_RECV_FRAME_IS_TAGGED BIT(3) +#define QCA_HDR_RECV_SOURCE_PORT GENMASK(2, 0) + +#define QCA_HDR_XMIT_VERSION GENMASK(15, 14) +#define QCA_HDR_XMIT_PRIORITY GENMASK(13, 11) +#define QCA_HDR_XMIT_CONTROL GENMASK(10, 8) +#define QCA_HDR_XMIT_FROM_CPU BIT(7) +#define QCA_HDR_XMIT_DP_BIT GENMASK(6, 0) + +#endif /* __TAG_QCA_H */ -- cgit From c2ee8181fddb293d296477f60b3eb4fa3ce4e1a6 Mon Sep 17 00:00:00 2001 From: Ansuel Smith Date: Wed, 2 Feb 2022 01:03:25 +0100 Subject: net: dsa: tag_qca: add define for handling mgmt Ethernet packet Add all the required define to prepare support for mgmt read/write in Ethernet packet. Any packet of this type has to be dropped as the only use of these special packet is receive ack for an mgmt write request or receive data for an mgmt read request. A struct is used that emulates the Ethernet header but is used for a different purpose. Signed-off-by: Ansuel Smith Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/linux/dsa/tag_qca.h | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dsa/tag_qca.h b/include/linux/dsa/tag_qca.h index c02d2d39ff4a..f366422ab7a0 100644 --- a/include/linux/dsa/tag_qca.h +++ b/include/linux/dsa/tag_qca.h @@ -12,10 +12,54 @@ #define QCA_HDR_RECV_FRAME_IS_TAGGED BIT(3) #define QCA_HDR_RECV_SOURCE_PORT GENMASK(2, 0) +/* Packet type for recv */ +#define QCA_HDR_RECV_TYPE_NORMAL 0x0 +#define QCA_HDR_RECV_TYPE_MIB 0x1 +#define QCA_HDR_RECV_TYPE_RW_REG_ACK 0x2 + #define QCA_HDR_XMIT_VERSION GENMASK(15, 14) #define QCA_HDR_XMIT_PRIORITY GENMASK(13, 11) #define QCA_HDR_XMIT_CONTROL GENMASK(10, 8) #define QCA_HDR_XMIT_FROM_CPU BIT(7) #define QCA_HDR_XMIT_DP_BIT GENMASK(6, 0) +/* Packet type for xmit */ +#define QCA_HDR_XMIT_TYPE_NORMAL 0x0 +#define QCA_HDR_XMIT_TYPE_RW_REG 0x1 + +/* Check code for a valid mgmt packet. Switch will ignore the packet + * with this wrong. + */ +#define QCA_HDR_MGMT_CHECK_CODE_VAL 0x5 + +/* Specific define for in-band MDIO read/write with Ethernet packet */ +#define QCA_HDR_MGMT_SEQ_LEN 4 /* 4 byte for the seq */ +#define QCA_HDR_MGMT_COMMAND_LEN 4 /* 4 byte for the command */ +#define QCA_HDR_MGMT_DATA1_LEN 4 /* First 4 byte for the mdio data */ +#define QCA_HDR_MGMT_HEADER_LEN (QCA_HDR_MGMT_SEQ_LEN + \ + QCA_HDR_MGMT_COMMAND_LEN + \ + QCA_HDR_MGMT_DATA1_LEN) + +#define QCA_HDR_MGMT_DATA2_LEN 12 /* Other 12 byte for the mdio data */ +#define QCA_HDR_MGMT_PADDING_LEN 34 /* Padding to reach the min Ethernet packet */ + +#define QCA_HDR_MGMT_PKT_LEN (QCA_HDR_MGMT_HEADER_LEN + \ + QCA_HDR_LEN + \ + QCA_HDR_MGMT_DATA2_LEN + \ + QCA_HDR_MGMT_PADDING_LEN) + +#define QCA_HDR_MGMT_SEQ_NUM GENMASK(31, 0) /* 63, 32 */ +#define QCA_HDR_MGMT_CHECK_CODE GENMASK(31, 29) /* 31, 29 */ +#define QCA_HDR_MGMT_CMD BIT(28) /* 28 */ +#define QCA_HDR_MGMT_LENGTH GENMASK(23, 20) /* 23, 20 */ +#define QCA_HDR_MGMT_ADDR GENMASK(18, 0) /* 18, 0 */ + +/* Special struct emulating a Ethernet header */ +struct qca_mgmt_ethhdr { + u32 command; /* command bit 31:0 */ + u32 seq; /* seq 63:32 */ + u32 mdio_data; /* first 4byte mdio */ + __be16 hdr; /* qca hdr */ +} __packed; + #endif /* __TAG_QCA_H */ -- cgit From 18be654a4345f7d937b4bfbad74bea8093e3a93c Mon Sep 17 00:00:00 2001 From: Ansuel Smith Date: Wed, 2 Feb 2022 01:03:26 +0100 Subject: net: dsa: tag_qca: add define for handling MIB packet Add struct to correctly parse a mib Ethernet packet. Signed-off-by: Ansuel Smith Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/linux/dsa/tag_qca.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dsa/tag_qca.h b/include/linux/dsa/tag_qca.h index f366422ab7a0..1fff57f2937b 100644 --- a/include/linux/dsa/tag_qca.h +++ b/include/linux/dsa/tag_qca.h @@ -62,4 +62,14 @@ struct qca_mgmt_ethhdr { __be16 hdr; /* qca hdr */ } __packed; +enum mdio_cmd { + MDIO_WRITE = 0x0, + MDIO_READ +}; + +struct mib_ethhdr { + u32 data[3]; /* first 3 mib counter */ + __be16 hdr; /* qca hdr */ +} __packed; + #endif /* __TAG_QCA_H */ -- cgit From 31eb6b4386ad91930417e3f5c8157a4b5e31cbd5 Mon Sep 17 00:00:00 2001 From: Ansuel Smith Date: Wed, 2 Feb 2022 01:03:27 +0100 Subject: net: dsa: tag_qca: add support for handling mgmt and MIB Ethernet packet Add connect/disconnect helper to assign private struct to the DSA switch. Add support for Ethernet mgmt and MIB if the DSA driver provide an handler to correctly parse and elaborate the data. Signed-off-by: Ansuel Smith Reviewed-by: Vladimir Oltean Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/linux/dsa/tag_qca.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dsa/tag_qca.h b/include/linux/dsa/tag_qca.h index 1fff57f2937b..4359fb0221cf 100644 --- a/include/linux/dsa/tag_qca.h +++ b/include/linux/dsa/tag_qca.h @@ -72,4 +72,11 @@ struct mib_ethhdr { __be16 hdr; /* qca hdr */ } __packed; +struct qca_tagger_data { + void (*rw_reg_ack_handler)(struct dsa_switch *ds, + struct sk_buff *skb); + void (*mib_autocast_handler)(struct dsa_switch *ds, + struct sk_buff *skb); +}; + #endif /* __TAG_QCA_H */ -- cgit From 926597ffce0e3e2f785475df18e1636194209910 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 24 Jan 2022 10:39:11 +0100 Subject: block: move disk_{block,unblock,flush}_events to blk.h No need to have these declarations in a public header. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Reviewed-by: Martin K. Petersen Link: https://lore.kernel.org/r/20220124093913.742411-2-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/genhd.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/genhd.h b/include/linux/genhd.h index 6906a45bc761..504f9a6674ac 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -185,9 +185,6 @@ static inline int bdev_read_only(struct block_device *bdev) return bdev->bd_read_only || get_disk_ro(bdev->bd_disk); } -extern void disk_block_events(struct gendisk *disk); -extern void disk_unblock_events(struct gendisk *disk); -extern void disk_flush_events(struct gendisk *disk, unsigned int mask); bool set_capacity_and_notify(struct gendisk *disk, sector_t size); bool disk_force_media_change(struct gendisk *disk, unsigned int events); -- cgit From e7243285c0fc87054990fcde630583586ff8ed5f Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 24 Jan 2022 10:39:12 +0100 Subject: block: move blk_drop_partitions to blk.h No need to have this declaration in a public header. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Reviewed-by: Martin K. Petersen Link: https://lore.kernel.org/r/20220124093913.742411-3-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/genhd.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/genhd.h b/include/linux/genhd.h index 504f9a6674ac..aa4bd985dbe5 100644 --- a/include/linux/genhd.h +++ b/include/linux/genhd.h @@ -219,7 +219,6 @@ static inline u64 sb_bdev_nr_blocks(struct super_block *sb) } int bdev_disk_changed(struct gendisk *disk, bool invalidate); -void blk_drop_partitions(struct gendisk *disk); struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id, struct lock_class_key *lkclass); -- cgit From 322cbb50de711814c42fb088f6d31901502c711a Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 24 Jan 2022 10:39:13 +0100 Subject: block: remove genhd.h There is no good reason to keep genhd.h separate from the main blkdev.h header that includes it. So fold the contents of genhd.h into blkdev.h and remove genhd.h entirely. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Reviewed-by: Martin K. Petersen Link: https://lore.kernel.org/r/20220124093913.742411-4-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 273 ++++++++++++++++++++++++++++++++++++++++++- include/linux/genhd.h | 287 ---------------------------------------------- include/linux/part_stat.h | 2 +- 3 files changed, 271 insertions(+), 291 deletions(-) delete mode 100644 include/linux/genhd.h (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f35aea98bc35..99a4384bb8a5 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1,9 +1,13 @@ /* SPDX-License-Identifier: GPL-2.0 */ +/* + * Portions Copyright (C) 1992 Drew Eckhardt + */ #ifndef _LINUX_BLKDEV_H #define _LINUX_BLKDEV_H -#include -#include +#include +#include +#include #include #include #include @@ -12,11 +16,15 @@ #include #include #include +#include #include #include #include +#include #include #include +#include +#include struct module; struct request_queue; @@ -33,6 +41,10 @@ struct blk_queue_stats; struct blk_stat_callback; struct blk_crypto_profile; +extern const struct device_type disk_type; +extern struct device_type part_type; +extern struct class block_class; + /* Must be consistent with blk_mq_poll_stats_bkt() */ #define BLK_MQ_POLL_STATS_BKTS 16 @@ -45,6 +57,144 @@ struct blk_crypto_profile; */ #define BLKCG_MAX_POLS 6 +#define DISK_MAX_PARTS 256 +#define DISK_NAME_LEN 32 + +#define PARTITION_META_INFO_VOLNAMELTH 64 +/* + * Enough for the string representation of any kind of UUID plus NULL. + * EFI UUID is 36 characters. MSDOS UUID is 11 characters. + */ +#define PARTITION_META_INFO_UUIDLTH (UUID_STRING_LEN + 1) + +struct partition_meta_info { + char uuid[PARTITION_META_INFO_UUIDLTH]; + u8 volname[PARTITION_META_INFO_VOLNAMELTH]; +}; + +/** + * DOC: genhd capability flags + * + * ``GENHD_FL_REMOVABLE``: indicates that the block device gives access to + * removable media. When set, the device remains present even when media is not + * inserted. Shall not be set for devices which are removed entirely when the + * media is removed. + * + * ``GENHD_FL_HIDDEN``: the block device is hidden; it doesn't produce events, + * doesn't appear in sysfs, and can't be opened from userspace or using + * blkdev_get*. Used for the underlying components of multipath devices. + * + * ``GENHD_FL_NO_PART``: partition support is disabled. The kernel will not + * scan for partitions from add_disk, and users can't add partitions manually. + * + */ +enum { + GENHD_FL_REMOVABLE = 1 << 0, + GENHD_FL_HIDDEN = 1 << 1, + GENHD_FL_NO_PART = 1 << 2, +}; + +enum { + DISK_EVENT_MEDIA_CHANGE = 1 << 0, /* media changed */ + DISK_EVENT_EJECT_REQUEST = 1 << 1, /* eject requested */ +}; + +enum { + /* Poll even if events_poll_msecs is unset */ + DISK_EVENT_FLAG_POLL = 1 << 0, + /* Forward events to udev */ + DISK_EVENT_FLAG_UEVENT = 1 << 1, + /* Block event polling when open for exclusive write */ + DISK_EVENT_FLAG_BLOCK_ON_EXCL_WRITE = 1 << 2, +}; + +struct disk_events; +struct badblocks; + +struct blk_integrity { + const struct blk_integrity_profile *profile; + unsigned char flags; + unsigned char tuple_size; + unsigned char interval_exp; + unsigned char tag_size; +}; + +struct gendisk { + /* + * major/first_minor/minors should not be set by any new driver, the + * block core will take care of allocating them automatically. + */ + int major; + int first_minor; + int minors; + + char disk_name[DISK_NAME_LEN]; /* name of major driver */ + + unsigned short events; /* supported events */ + unsigned short event_flags; /* flags related to event processing */ + + struct xarray part_tbl; + struct block_device *part0; + + const struct block_device_operations *fops; + struct request_queue *queue; + void *private_data; + + int flags; + unsigned long state; +#define GD_NEED_PART_SCAN 0 +#define GD_READ_ONLY 1 +#define GD_DEAD 2 +#define GD_NATIVE_CAPACITY 3 + + struct mutex open_mutex; /* open/close mutex */ + unsigned open_partitions; /* number of open partitions */ + + struct backing_dev_info *bdi; + struct kobject *slave_dir; +#ifdef CONFIG_BLOCK_HOLDER_DEPRECATED + struct list_head slave_bdevs; +#endif + struct timer_rand_state *random; + atomic_t sync_io; /* RAID */ + struct disk_events *ev; +#ifdef CONFIG_BLK_DEV_INTEGRITY + struct kobject integrity_kobj; +#endif /* CONFIG_BLK_DEV_INTEGRITY */ +#if IS_ENABLED(CONFIG_CDROM) + struct cdrom_device_info *cdi; +#endif + int node_id; + struct badblocks *bb; + struct lockdep_map lockdep_map; + u64 diskseq; +}; + +static inline bool disk_live(struct gendisk *disk) +{ + return !inode_unhashed(disk->part0->bd_inode); +} + +/* + * The gendisk is refcounted by the part0 block_device, and the bd_device + * therein is also used for device model presentation in sysfs. + */ +#define dev_to_disk(device) \ + (dev_to_bdev(device)->bd_disk) +#define disk_to_dev(disk) \ + (&((disk)->part0->bd_device)) + +#if IS_REACHABLE(CONFIG_CDROM) +#define disk_to_cdi(disk) ((disk)->cdi) +#else +#define disk_to_cdi(disk) NULL +#endif + +static inline dev_t disk_devt(struct gendisk *disk) +{ + return MKDEV(disk->major, disk->first_minor); +} + static inline int blk_validate_block_size(unsigned long bsize) { if (bsize < 512 || bsize > PAGE_SIZE || !is_power_of_2(bsize)) @@ -596,6 +746,118 @@ static inline unsigned int blk_queue_depth(struct request_queue *q) #define for_each_bio(_bio) \ for (; _bio; _bio = _bio->bi_next) +int __must_check device_add_disk(struct device *parent, struct gendisk *disk, + const struct attribute_group **groups); +static inline int __must_check add_disk(struct gendisk *disk) +{ + return device_add_disk(NULL, disk, NULL); +} +void del_gendisk(struct gendisk *gp); +void invalidate_disk(struct gendisk *disk); +void set_disk_ro(struct gendisk *disk, bool read_only); +void disk_uevent(struct gendisk *disk, enum kobject_action action); + +static inline int get_disk_ro(struct gendisk *disk) +{ + return disk->part0->bd_read_only || + test_bit(GD_READ_ONLY, &disk->state); +} + +static inline int bdev_read_only(struct block_device *bdev) +{ + return bdev->bd_read_only || get_disk_ro(bdev->bd_disk); +} + +bool set_capacity_and_notify(struct gendisk *disk, sector_t size); +bool disk_force_media_change(struct gendisk *disk, unsigned int events); + +void add_disk_randomness(struct gendisk *disk) __latent_entropy; +void rand_initialize_disk(struct gendisk *disk); + +static inline sector_t get_start_sect(struct block_device *bdev) +{ + return bdev->bd_start_sect; +} + +static inline sector_t bdev_nr_sectors(struct block_device *bdev) +{ + return bdev->bd_nr_sectors; +} + +static inline loff_t bdev_nr_bytes(struct block_device *bdev) +{ + return (loff_t)bdev_nr_sectors(bdev) << SECTOR_SHIFT; +} + +static inline sector_t get_capacity(struct gendisk *disk) +{ + return bdev_nr_sectors(disk->part0); +} + +static inline u64 sb_bdev_nr_blocks(struct super_block *sb) +{ + return bdev_nr_sectors(sb->s_bdev) >> + (sb->s_blocksize_bits - SECTOR_SHIFT); +} + +int bdev_disk_changed(struct gendisk *disk, bool invalidate); + +struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id, + struct lock_class_key *lkclass); +void put_disk(struct gendisk *disk); +struct gendisk *__blk_alloc_disk(int node, struct lock_class_key *lkclass); + +/** + * blk_alloc_disk - allocate a gendisk structure + * @node_id: numa node to allocate on + * + * Allocate and pre-initialize a gendisk structure for use with BIO based + * drivers. + * + * Context: can sleep + */ +#define blk_alloc_disk(node_id) \ +({ \ + static struct lock_class_key __key; \ + \ + __blk_alloc_disk(node_id, &__key); \ +}) +void blk_cleanup_disk(struct gendisk *disk); + +int __register_blkdev(unsigned int major, const char *name, + void (*probe)(dev_t devt)); +#define register_blkdev(major, name) \ + __register_blkdev(major, name, NULL) +void unregister_blkdev(unsigned int major, const char *name); + +bool bdev_check_media_change(struct block_device *bdev); +int __invalidate_device(struct block_device *bdev, bool kill_dirty); +void set_capacity(struct gendisk *disk, sector_t size); + +#ifdef CONFIG_BLOCK_HOLDER_DEPRECATED +int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk); +void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk); +int bd_register_pending_holders(struct gendisk *disk); +#else +static inline int bd_link_disk_holder(struct block_device *bdev, + struct gendisk *disk) +{ + return 0; +} +static inline void bd_unlink_disk_holder(struct block_device *bdev, + struct gendisk *disk) +{ +} +static inline int bd_register_pending_holders(struct gendisk *disk) +{ + return 0; +} +#endif /* CONFIG_BLOCK_HOLDER_DEPRECATED */ + +dev_t part_devt(struct gendisk *disk, u8 partno); +void inc_diskseq(struct gendisk *disk); +dev_t blk_lookup_devt(const char *name, int partno); +void blk_request_module(dev_t devt); extern int blk_register_queue(struct gendisk *disk); extern void blk_unregister_queue(struct gendisk *disk); @@ -1311,6 +1573,7 @@ void invalidate_bdev(struct block_device *bdev); int sync_blockdev(struct block_device *bdev); int sync_blockdev_nowait(struct block_device *bdev); void sync_bdevs(bool wait); +void printk_all_partitions(void); #else static inline void invalidate_bdev(struct block_device *bdev) { @@ -1326,7 +1589,11 @@ static inline int sync_blockdev_nowait(struct block_device *bdev) static inline void sync_bdevs(bool wait) { } -#endif +static inline void printk_all_partitions(void) +{ +} +#endif /* CONFIG_BLOCK */ + int fsync_bdev(struct block_device *bdev); int freeze_bdev(struct block_device *bdev); diff --git a/include/linux/genhd.h b/include/linux/genhd.h deleted file mode 100644 index aa4bd985dbe5..000000000000 --- a/include/linux/genhd.h +++ /dev/null @@ -1,287 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _LINUX_GENHD_H -#define _LINUX_GENHD_H - -/* - * genhd.h Copyright (C) 1992 Drew Eckhardt - * Generic hard disk header file by - * Drew Eckhardt - * - * - */ - -#include -#include -#include -#include -#include -#include - -extern const struct device_type disk_type; -extern struct device_type part_type; -extern struct class block_class; - -#define DISK_MAX_PARTS 256 -#define DISK_NAME_LEN 32 - -#define PARTITION_META_INFO_VOLNAMELTH 64 -/* - * Enough for the string representation of any kind of UUID plus NULL. - * EFI UUID is 36 characters. MSDOS UUID is 11 characters. - */ -#define PARTITION_META_INFO_UUIDLTH (UUID_STRING_LEN + 1) - -struct partition_meta_info { - char uuid[PARTITION_META_INFO_UUIDLTH]; - u8 volname[PARTITION_META_INFO_VOLNAMELTH]; -}; - -/** - * DOC: genhd capability flags - * - * ``GENHD_FL_REMOVABLE``: indicates that the block device gives access to - * removable media. When set, the device remains present even when media is not - * inserted. Shall not be set for devices which are removed entirely when the - * media is removed. - * - * ``GENHD_FL_HIDDEN``: the block device is hidden; it doesn't produce events, - * doesn't appear in sysfs, and can't be opened from userspace or using - * blkdev_get*. Used for the underlying components of multipath devices. - * - * ``GENHD_FL_NO_PART``: partition support is disabled. The kernel will not - * scan for partitions from add_disk, and users can't add partitions manually. - * - */ -enum { - GENHD_FL_REMOVABLE = 1 << 0, - GENHD_FL_HIDDEN = 1 << 1, - GENHD_FL_NO_PART = 1 << 2, -}; - -enum { - DISK_EVENT_MEDIA_CHANGE = 1 << 0, /* media changed */ - DISK_EVENT_EJECT_REQUEST = 1 << 1, /* eject requested */ -}; - -enum { - /* Poll even if events_poll_msecs is unset */ - DISK_EVENT_FLAG_POLL = 1 << 0, - /* Forward events to udev */ - DISK_EVENT_FLAG_UEVENT = 1 << 1, - /* Block event polling when open for exclusive write */ - DISK_EVENT_FLAG_BLOCK_ON_EXCL_WRITE = 1 << 2, -}; - -struct disk_events; -struct badblocks; - -struct blk_integrity { - const struct blk_integrity_profile *profile; - unsigned char flags; - unsigned char tuple_size; - unsigned char interval_exp; - unsigned char tag_size; -}; - -struct gendisk { - /* - * major/first_minor/minors should not be set by any new driver, the - * block core will take care of allocating them automatically. - */ - int major; - int first_minor; - int minors; - - char disk_name[DISK_NAME_LEN]; /* name of major driver */ - - unsigned short events; /* supported events */ - unsigned short event_flags; /* flags related to event processing */ - - struct xarray part_tbl; - struct block_device *part0; - - const struct block_device_operations *fops; - struct request_queue *queue; - void *private_data; - - int flags; - unsigned long state; -#define GD_NEED_PART_SCAN 0 -#define GD_READ_ONLY 1 -#define GD_DEAD 2 -#define GD_NATIVE_CAPACITY 3 - - struct mutex open_mutex; /* open/close mutex */ - unsigned open_partitions; /* number of open partitions */ - - struct backing_dev_info *bdi; - struct kobject *slave_dir; -#ifdef CONFIG_BLOCK_HOLDER_DEPRECATED - struct list_head slave_bdevs; -#endif - struct timer_rand_state *random; - atomic_t sync_io; /* RAID */ - struct disk_events *ev; -#ifdef CONFIG_BLK_DEV_INTEGRITY - struct kobject integrity_kobj; -#endif /* CONFIG_BLK_DEV_INTEGRITY */ -#if IS_ENABLED(CONFIG_CDROM) - struct cdrom_device_info *cdi; -#endif - int node_id; - struct badblocks *bb; - struct lockdep_map lockdep_map; - u64 diskseq; -}; - -static inline bool disk_live(struct gendisk *disk) -{ - return !inode_unhashed(disk->part0->bd_inode); -} - -/* - * The gendisk is refcounted by the part0 block_device, and the bd_device - * therein is also used for device model presentation in sysfs. - */ -#define dev_to_disk(device) \ - (dev_to_bdev(device)->bd_disk) -#define disk_to_dev(disk) \ - (&((disk)->part0->bd_device)) - -#if IS_REACHABLE(CONFIG_CDROM) -#define disk_to_cdi(disk) ((disk)->cdi) -#else -#define disk_to_cdi(disk) NULL -#endif - -static inline dev_t disk_devt(struct gendisk *disk) -{ - return MKDEV(disk->major, disk->first_minor); -} - -void disk_uevent(struct gendisk *disk, enum kobject_action action); - -/* block/genhd.c */ -int __must_check device_add_disk(struct device *parent, struct gendisk *disk, - const struct attribute_group **groups); -static inline int __must_check add_disk(struct gendisk *disk) -{ - return device_add_disk(NULL, disk, NULL); -} -extern void del_gendisk(struct gendisk *gp); - -void invalidate_disk(struct gendisk *disk); - -void set_disk_ro(struct gendisk *disk, bool read_only); - -static inline int get_disk_ro(struct gendisk *disk) -{ - return disk->part0->bd_read_only || - test_bit(GD_READ_ONLY, &disk->state); -} - -static inline int bdev_read_only(struct block_device *bdev) -{ - return bdev->bd_read_only || get_disk_ro(bdev->bd_disk); -} - -bool set_capacity_and_notify(struct gendisk *disk, sector_t size); -bool disk_force_media_change(struct gendisk *disk, unsigned int events); - -/* drivers/char/random.c */ -extern void add_disk_randomness(struct gendisk *disk) __latent_entropy; -extern void rand_initialize_disk(struct gendisk *disk); - -static inline sector_t get_start_sect(struct block_device *bdev) -{ - return bdev->bd_start_sect; -} - -static inline sector_t bdev_nr_sectors(struct block_device *bdev) -{ - return bdev->bd_nr_sectors; -} - -static inline loff_t bdev_nr_bytes(struct block_device *bdev) -{ - return (loff_t)bdev_nr_sectors(bdev) << SECTOR_SHIFT; -} - -static inline sector_t get_capacity(struct gendisk *disk) -{ - return bdev_nr_sectors(disk->part0); -} - -static inline u64 sb_bdev_nr_blocks(struct super_block *sb) -{ - return bdev_nr_sectors(sb->s_bdev) >> - (sb->s_blocksize_bits - SECTOR_SHIFT); -} - -int bdev_disk_changed(struct gendisk *disk, bool invalidate); - -struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id, - struct lock_class_key *lkclass); -extern void put_disk(struct gendisk *disk); -struct gendisk *__blk_alloc_disk(int node, struct lock_class_key *lkclass); - -/** - * blk_alloc_disk - allocate a gendisk structure - * @node_id: numa node to allocate on - * - * Allocate and pre-initialize a gendisk structure for use with BIO based - * drivers. - * - * Context: can sleep - */ -#define blk_alloc_disk(node_id) \ -({ \ - static struct lock_class_key __key; \ - \ - __blk_alloc_disk(node_id, &__key); \ -}) -void blk_cleanup_disk(struct gendisk *disk); - -int __register_blkdev(unsigned int major, const char *name, - void (*probe)(dev_t devt)); -#define register_blkdev(major, name) \ - __register_blkdev(major, name, NULL) -void unregister_blkdev(unsigned int major, const char *name); - -bool bdev_check_media_change(struct block_device *bdev); -int __invalidate_device(struct block_device *bdev, bool kill_dirty); -void set_capacity(struct gendisk *disk, sector_t size); - -#ifdef CONFIG_BLOCK_HOLDER_DEPRECATED -int bd_link_disk_holder(struct block_device *bdev, struct gendisk *disk); -void bd_unlink_disk_holder(struct block_device *bdev, struct gendisk *disk); -int bd_register_pending_holders(struct gendisk *disk); -#else -static inline int bd_link_disk_holder(struct block_device *bdev, - struct gendisk *disk) -{ - return 0; -} -static inline void bd_unlink_disk_holder(struct block_device *bdev, - struct gendisk *disk) -{ -} -static inline int bd_register_pending_holders(struct gendisk *disk) -{ - return 0; -} -#endif /* CONFIG_BLOCK_HOLDER_DEPRECATED */ - -dev_t part_devt(struct gendisk *disk, u8 partno); -void inc_diskseq(struct gendisk *disk); -dev_t blk_lookup_devt(const char *name, int partno); -void blk_request_module(dev_t devt); -#ifdef CONFIG_BLOCK -void printk_all_partitions(void); -#else /* CONFIG_BLOCK */ -static inline void printk_all_partitions(void) -{ -} -#endif /* CONFIG_BLOCK */ - -#endif /* _LINUX_GENHD_H */ diff --git a/include/linux/part_stat.h b/include/linux/part_stat.h index 6f7949b2fd8d..abeba356bc3f 100644 --- a/include/linux/part_stat.h +++ b/include/linux/part_stat.h @@ -2,7 +2,7 @@ #ifndef _LINUX_PART_STAT_H #define _LINUX_PART_STAT_H -#include +#include #include struct disk_stats { -- cgit From 0a3140ea0fae377c9eaa031b7db1670ae422ed47 Mon Sep 17 00:00:00 2001 From: Chaitanya Kulkarni Date: Mon, 24 Jan 2022 10:11:02 +0100 Subject: block: pass a block_device and opf to blk_next_bio All callers need to set the block_device and operation, so lift that into the common code. Signed-off-by: Chaitanya Kulkarni Signed-off-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220124091107.642561-15-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/bio.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index 117d7f248ac9..edeae54074ed 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -790,6 +790,7 @@ static inline void bio_set_polled(struct bio *bio, struct kiocb *kiocb) bio->bi_opf |= REQ_NOWAIT; } -struct bio *blk_next_bio(struct bio *bio, unsigned int nr_pages, gfp_t gfp); +struct bio *blk_next_bio(struct bio *bio, struct block_device *bdev, + unsigned int nr_pages, unsigned int opf, gfp_t gfp); #endif /* __LINUX_BIO_H */ -- cgit From 609be1066731fea86436f5f91022f82e592ab456 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 24 Jan 2022 10:11:03 +0100 Subject: block: pass a block_device and opf to bio_alloc_bioset Pass the block_device and operation that we plan to use this bio for to bio_alloc_bioset to optimize the assigment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220124091107.642561-16-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/bio.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index edeae54074ed..2f63ae9a71e1 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -405,8 +405,9 @@ extern void bioset_exit(struct bio_set *); extern int biovec_init_pool(mempool_t *pool, int pool_entries); extern int bioset_init_from_src(struct bio_set *bs, struct bio_set *src); -struct bio *bio_alloc_bioset(gfp_t gfp, unsigned short nr_iovecs, - struct bio_set *bs); +struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs, + unsigned int opf, gfp_t gfp_mask, + struct bio_set *bs); struct bio *bio_alloc_kiocb(struct kiocb *kiocb, unsigned short nr_vecs, struct bio_set *bs); struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned short nr_iovecs); @@ -419,7 +420,7 @@ extern struct bio_set fs_bio_set; static inline struct bio *bio_alloc(gfp_t gfp_mask, unsigned short nr_iovecs) { - return bio_alloc_bioset(gfp_mask, nr_iovecs, &fs_bio_set); + return bio_alloc_bioset(NULL, nr_iovecs, 0, gfp_mask, &fs_bio_set); } void submit_bio(struct bio *bio); -- cgit From b77c88c2100ce6a5ec8126c13599b5a7f6663e32 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 24 Jan 2022 10:11:04 +0100 Subject: block: pass a block_device and opf to bio_alloc_kiocb Pass the block_device and operation that we plan to use this bio for to bio_alloc_kiocb to optimize the assigment. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220124091107.642561-17-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/bio.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index 2f63ae9a71e1..5c5ada2ebb27 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -408,8 +408,8 @@ extern int bioset_init_from_src(struct bio_set *bs, struct bio_set *src); struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs, unsigned int opf, gfp_t gfp_mask, struct bio_set *bs); -struct bio *bio_alloc_kiocb(struct kiocb *kiocb, unsigned short nr_vecs, - struct bio_set *bs); +struct bio *bio_alloc_kiocb(struct kiocb *kiocb, struct block_device *bdev, + unsigned short nr_vecs, unsigned int opf, struct bio_set *bs); struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned short nr_iovecs); extern void bio_put(struct bio *); -- cgit From 07888c665b405b1cd3577ddebfeb74f4717a84c4 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 24 Jan 2022 10:11:05 +0100 Subject: block: pass a block_device and opf to bio_alloc Pass the block_device and operation that we plan to use this bio for to bio_alloc to optimize the assignment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/bio.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index 5c5ada2ebb27..be6ac92913d4 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -418,9 +418,10 @@ extern struct bio *bio_clone_fast(struct bio *, gfp_t, struct bio_set *); extern struct bio_set fs_bio_set; -static inline struct bio *bio_alloc(gfp_t gfp_mask, unsigned short nr_iovecs) +static inline struct bio *bio_alloc(struct block_device *bdev, + unsigned short nr_vecs, unsigned int opf, gfp_t gfp_mask) { - return bio_alloc_bioset(NULL, nr_iovecs, 0, gfp_mask, &fs_bio_set); + return bio_alloc_bioset(bdev, nr_vecs, opf, gfp_mask, &fs_bio_set); } void submit_bio(struct bio *bio); -- cgit From 49add4966d79244013fce35f95c6833fae82b8b1 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 24 Jan 2022 10:11:06 +0100 Subject: block: pass a block_device and opf to bio_init Pass the block_device that we plan to use this bio for and the operation to bio_init to optimize the assignment. A NULL block_device can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/bio.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index be6ac92913d4..41bedf727f59 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -456,8 +456,8 @@ static inline int bio_iov_vecs_to_alloc(struct iov_iter *iter, int max_segs) struct request_queue; extern int submit_bio_wait(struct bio *bio); -extern void bio_init(struct bio *bio, struct bio_vec *table, - unsigned short max_vecs); +void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table, + unsigned short max_vecs, unsigned int opf); extern void bio_uninit(struct bio *); extern void bio_reset(struct bio *); void bio_chain(struct bio *, struct bio *); -- cgit From a7c50c940477bae89fb2b4f51bd969a2d95d7512 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 24 Jan 2022 10:11:07 +0100 Subject: block: pass a block_device and opf to bio_reset Pass the block_device that we plan to use this bio for and the operation to bio_reset to optimize the assigment. A NULL block_device can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220124091107.642561-20-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/bio.h | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index 41bedf727f59..18cfe5bb41ea 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -459,7 +459,7 @@ extern int submit_bio_wait(struct bio *bio); void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table, unsigned short max_vecs, unsigned int opf); extern void bio_uninit(struct bio *); -extern void bio_reset(struct bio *); +void bio_reset(struct bio *bio, struct block_device *bdev, unsigned int opf); void bio_chain(struct bio *, struct bio *); int bio_add_page(struct bio *, struct page *, unsigned len, unsigned off); @@ -517,13 +517,6 @@ static inline void bio_set_dev(struct bio *bio, struct block_device *bdev) bio_associate_blkg(bio); } -static inline void bio_copy_dev(struct bio *dst, struct bio *src) -{ - bio_clear_flag(dst, BIO_REMAPPED); - dst->bi_bdev = src->bi_bdev; - bio_clone_blkg_association(dst, src); -} - /* * BIO list management for use by remapping drivers (e.g. DM or MD) and loop. * -- cgit From b1f866b013e6e5583f2f0bf4a61d13eddb9a1799 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 27 Jan 2022 08:05:48 +0100 Subject: block: remove blk_needs_flush_plug blk_needs_flush_plug fails to account for the cb_list, which needs flushing as well. Remove it and just check if there is a plug instead of poking into the internals of the plug structure. Signed-off-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220127070549.1377856-1-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 13 ------------- 1 file changed, 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 99a4384bb8a5..f902a1c2fac0 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1055,14 +1055,6 @@ extern void blk_finish_plug(struct blk_plug *); void blk_flush_plug(struct blk_plug *plug, bool from_schedule); -static inline bool blk_needs_flush_plug(struct task_struct *tsk) -{ - struct blk_plug *plug = tsk->plug; - - return plug && - (plug->mq_list || !list_empty(&plug->cb_list)); -} - int blkdev_issue_flush(struct block_device *bdev); long nr_blockdev_pages(void); #else /* CONFIG_BLOCK */ @@ -1086,11 +1078,6 @@ static inline void blk_flush_plug(struct blk_plug *plug, bool async) { } -static inline bool blk_needs_flush_plug(struct task_struct *tsk) -{ - return false; -} - static inline int blkdev_issue_flush(struct block_device *bdev) { return 0; -- cgit From aa8dcccaf32bfdc09f2aff089d5d60c37da5b7b5 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 27 Jan 2022 08:05:49 +0100 Subject: block: check that there is a plug in blk_flush_plug Rename blk_flush_plug to __blk_flush_plug and add a wrapper that includes the NULL check instead of open coding that check everywhere. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220127070549.1377856-2-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f902a1c2fac0..654163d3b903 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1053,7 +1053,12 @@ extern void blk_start_plug(struct blk_plug *); extern void blk_start_plug_nr_ios(struct blk_plug *, unsigned short); extern void blk_finish_plug(struct blk_plug *); -void blk_flush_plug(struct blk_plug *plug, bool from_schedule); +void __blk_flush_plug(struct blk_plug *plug, bool from_schedule); +static inline void blk_flush_plug(struct blk_plug *plug, bool async) +{ + if (plug) + __blk_flush_plug(plug, async); +} int blkdev_issue_flush(struct block_device *bdev); long nr_blockdev_pages(void); -- cgit From b42c1fc3d55e077d36718ad9800d89100b2aff81 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 27 Jan 2022 07:41:25 +0100 Subject: block: fix the kerneldoc for bio_end_io_acct Document the actually existing parameter name. Reported-by: Stephen Rothwell Signed-off-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220127064125.1314347-1-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 654163d3b903..3bfc75a2a450 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1520,7 +1520,7 @@ void bio_end_io_acct_remapped(struct bio *bio, unsigned long start_time, /** * bio_end_io_acct - end I/O accounting for bio based drivers * @bio: bio to end account for - * @start: start time returned by bio_start_io_acct() + * @start_time: start time returned by bio_start_io_acct() */ static inline void bio_end_io_acct(struct bio *bio, unsigned long start_time) { -- cgit From e1d2699b96793d19388e302fa095e0da2c145701 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Tue, 18 Jan 2022 22:10:52 -0500 Subject: NFS: Avoid duplicate uncached readdir calls on eof If we've reached the end of the directory, then cache that information in the context so that we don't need to do an uncached readdir in order to rediscover that fact. Fixes: 794092c57f89 ("NFS: Do uncached readdir when we're seeking a cookie in an empty page cache") Signed-off-by: Trond Myklebust Signed-off-by: Anna Schumaker --- include/linux/nfs_fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 02aa49323d1d..68f81d8d36de 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -107,6 +107,7 @@ struct nfs_open_dir_context { __u64 dup_cookie; pgoff_t page_index; signed char duped; + bool eof; }; /* -- cgit From 2ea88716369ac9a7486a8cb309d6bf1239ea156c Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Sun, 23 Jan 2022 17:27:47 +0100 Subject: libceph: make recv path in secure mode work the same as send path The recv path of secure mode is intertwined with that of crc mode. While it's slightly more efficient that way (the ciphertext is read into the destination buffer and decrypted in place, thus avoiding two potentially heavy memory allocations for the bounce buffer and the corresponding sg array), it isn't really amenable to changes. Sacrifice that edge and align with the send path which always uses a full-sized bounce buffer (currently there is no other way -- if the kernel crypto API ever grows support for streaming (piecewise) en/decryption for GCM [1], we would be able to easily take advantage of that on both sides). [1] https://lore.kernel.org/all/20141225202830.GA18794@gondor.apana.org.au/ Signed-off-by: Ilya Dryomov Reviewed-by: Jeff Layton --- include/linux/ceph/messenger.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h index ff99ce094cfa..6c6b6ea52bb8 100644 --- a/include/linux/ceph/messenger.h +++ b/include/linux/ceph/messenger.h @@ -383,6 +383,10 @@ struct ceph_connection_v2_info { struct ceph_gcm_nonce in_gcm_nonce; struct ceph_gcm_nonce out_gcm_nonce; + struct page **in_enc_pages; + int in_enc_page_cnt; + int in_enc_resid; + int in_enc_i; struct page **out_enc_pages; int out_enc_page_cnt; int out_enc_resid; -- cgit From 038b8d1d1ab1cce11a158d30bf080ff41a2cfd15 Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Thu, 30 Dec 2021 15:13:32 +0100 Subject: libceph: optionally use bounce buffer on recv path in crc mode Both msgr1 and msgr2 in crc mode are zero copy in the sense that message data is read from the socket directly into the destination buffer. We assume that the destination buffer is stable (i.e. remains unchanged while it is being read to) though. Otherwise, CRC errors ensue: libceph: read_partial_message 0000000048edf8ad data crc 1063286393 != exp. 228122706 libceph: osd1 (1)192.168.122.1:6843 bad crc/signature libceph: bad data crc, calculated 57958023, expected 1805382778 libceph: osd2 (2)192.168.122.1:6876 integrity error, bad crc Introduce rxbounce option to enable use of a bounce buffer when receiving message data. In particular this is needed if a mapped image is a Windows VM disk, passed to QEMU. Windows has a system-wide "dummy" page that may be mapped into the destination buffer (potentially more than once into the same buffer) by the Windows Memory Manager in an effort to generate a single large I/O [1][2]. QEMU makes a point of preserving overlap relationships when cloning I/O vectors, so krbd gets exposed to this behaviour. [1] "What Is Really in That MDL?" https://docs.microsoft.com/en-us/previous-versions/windows/hardware/design/dn614012(v=vs.85) [2] https://blogs.msmvps.com/kernelmustard/2005/05/04/dummy-pages/ URL: https://bugzilla.redhat.com/show_bug.cgi?id=1973317 Signed-off-by: Ilya Dryomov Reviewed-by: Jeff Layton --- include/linux/ceph/libceph.h | 1 + include/linux/ceph/messenger.h | 1 + 2 files changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ceph/libceph.h b/include/linux/ceph/libceph.h index 6a89ea410e43..edf62eaa6285 100644 --- a/include/linux/ceph/libceph.h +++ b/include/linux/ceph/libceph.h @@ -35,6 +35,7 @@ #define CEPH_OPT_TCP_NODELAY (1<<4) /* TCP_NODELAY on TCP sockets */ #define CEPH_OPT_NOMSGSIGN (1<<5) /* don't sign msgs (msgr1) */ #define CEPH_OPT_ABORT_ON_FULL (1<<6) /* abort w/ ENOSPC when full */ +#define CEPH_OPT_RXBOUNCE (1<<7) /* double-buffer read data */ #define CEPH_OPT_DEFAULT (CEPH_OPT_TCP_NODELAY) diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h index 6c6b6ea52bb8..e7f2fb2fc207 100644 --- a/include/linux/ceph/messenger.h +++ b/include/linux/ceph/messenger.h @@ -461,6 +461,7 @@ struct ceph_connection { struct ceph_msg *out_msg; /* sending message (== tail of out_sent) */ + struct page *bounce_page; u32 in_front_crc, in_middle_crc, in_data_crc; /* calculated crc */ struct timespec64 last_keepalive_ack; /* keepalive2 ack stamp */ -- cgit From 043cfff99a18933fda2fb2e163daee73cc07910b Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Thu, 23 Dec 2021 17:23:00 +0100 Subject: firmware: ti_sci: Fix compilation failure when CONFIG_TI_SCI_PROTOCOL is not defined Remove an extra ";" which breaks compilation. Fixes: 53bf2b0e4e4c ("firmware: ti_sci: Add support for getting resource with subtype") Signed-off-by: Christophe JAILLET Signed-off-by: Nishanth Menon Link: https://lore.kernel.org/r/e6c3cb793e1a6a2a0ae2528d5a5650dfe6a4b6ff.1640276505.git.christophe.jaillet@wanadoo.fr --- include/linux/soc/ti/ti_sci_protocol.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/soc/ti/ti_sci_protocol.h b/include/linux/soc/ti/ti_sci_protocol.h index 0aad7009b50e..bd0d11af76c5 100644 --- a/include/linux/soc/ti/ti_sci_protocol.h +++ b/include/linux/soc/ti/ti_sci_protocol.h @@ -645,7 +645,7 @@ devm_ti_sci_get_of_resource(const struct ti_sci_handle *handle, static inline struct ti_sci_resource * devm_ti_sci_get_resource(const struct ti_sci_handle *handle, struct device *dev, - u32 dev_id, u32 sub_type); + u32 dev_id, u32 sub_type) { return ERR_PTR(-EINVAL); } -- cgit From bfb1a7c91fb7758273b4a8d735313d9cc388b502 Mon Sep 17 00:00:00 2001 From: Nick Desaulniers Date: Wed, 2 Feb 2022 12:55:53 -0800 Subject: x86/bug: Merge annotate_reachable() into _BUG_FLAGS() asm In __WARN_FLAGS(), we had two asm statements (abbreviated): asm volatile("ud2"); asm volatile(".pushsection .discard.reachable"); These pair of statements are used to trigger an exception, but then help objtool understand that for warnings, control flow will be restored immediately afterwards. The problem is that volatile is not a compiler barrier. GCC explicitly documents this: > Note that the compiler can move even volatile asm instructions > relative to other code, including across jump instructions. Also, no clobbers are specified to prevent instructions from subsequent statements from being scheduled by compiler before the second asm statement. This can lead to instructions from subsequent statements being emitted by the compiler before the second asm statement. Providing a scheduling model such as via -march= options enables the compiler to better schedule instructions with known latencies to hide latencies from data hazards compared to inline asm statements in which latencies are not estimated. If an instruction gets scheduled by the compiler between the two asm statements, then objtool will think that it is not reachable, producing a warning. To prevent instructions from being scheduled in between the two asm statements, merge them. Also remove an unnecessary unreachable() asm annotation from BUG() in favor of __builtin_unreachable(). objtool is able to track that the ud2 from BUG() terminates control flow within the function. Link: https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile Link: https://github.com/ClangBuiltLinux/linux/issues/1483 Signed-off-by: Nick Desaulniers Signed-off-by: Josh Poimboeuf Link: https://lore.kernel.org/r/20220202205557.2260694-1-ndesaulniers@google.com --- include/linux/compiler.h | 21 +++++---------------- 1 file changed, 5 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 429dcebe2b99..0f7fd205ab7e 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -117,14 +117,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, */ #define __stringify_label(n) #n -#define __annotate_reachable(c) ({ \ - asm volatile(__stringify_label(c) ":\n\t" \ - ".pushsection .discard.reachable\n\t" \ - ".long " __stringify_label(c) "b - .\n\t" \ - ".popsection\n\t" : : "i" (c)); \ -}) -#define annotate_reachable() __annotate_reachable(__COUNTER__) - #define __annotate_unreachable(c) ({ \ asm volatile(__stringify_label(c) ":\n\t" \ ".pushsection .discard.unreachable\n\t" \ @@ -133,24 +125,21 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, }) #define annotate_unreachable() __annotate_unreachable(__COUNTER__) -#define ASM_UNREACHABLE \ - "999:\n\t" \ - ".pushsection .discard.unreachable\n\t" \ - ".long 999b - .\n\t" \ +#define ASM_REACHABLE \ + "998:\n\t" \ + ".pushsection .discard.reachable\n\t" \ + ".long 998b - .\n\t" \ ".popsection\n\t" /* Annotate a C jump table to allow objtool to follow the code flow */ #define __annotate_jump_table __section(".rodata..c_jump_table") #else -#define annotate_reachable() #define annotate_unreachable() +# define ASM_REACHABLE #define __annotate_jump_table #endif -#ifndef ASM_UNREACHABLE -# define ASM_UNREACHABLE -#endif #ifndef unreachable # define unreachable() do { \ annotate_unreachable(); \ -- cgit From e85c81ba8859a4c839bcd69c5d83b32954133a5b Mon Sep 17 00:00:00 2001 From: Xin Yin Date: Mon, 17 Jan 2022 17:36:54 +0800 Subject: ext4: fast commit may not fallback for ineligible commit For the follow scenario: 1. jbd start commit transaction n 2. task A get new handle for transaction n+1 3. task A do some ineligible actions and mark FC_INELIGIBLE 4. jbd complete transaction n and clean FC_INELIGIBLE 5. task A call fsync In this case fast commit will not fallback to full commit and transaction n+1 also not handled by jbd. Make ext4_fc_mark_ineligible() also record transaction tid for latest ineligible case, when call ext4_fc_cleanup() check current transaction tid, if small than latest ineligible tid do not clear the EXT4_MF_FC_INELIGIBLE. Reported-by: kernel test robot Reported-by: Dan Carpenter Reported-by: Ritesh Harjani Suggested-by: Harshad Shirwadkar Signed-off-by: Xin Yin Link: https://lore.kernel.org/r/20220117093655.35160-2-yinxin.x@bytedance.com Signed-off-by: Theodore Ts'o Cc: stable@kernel.org --- include/linux/jbd2.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index fd933c45281a..d63b8106796e 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1295,7 +1295,7 @@ struct journal_s * Clean-up after fast commit or full commit. JBD2 calls this function * after every commit operation. */ - void (*j_fc_cleanup_callback)(struct journal_s *journal, int); + void (*j_fc_cleanup_callback)(struct journal_s *journal, int full, tid_t tid); /** * @j_fc_replay_callback: -- cgit From 3ca40c0d329113a9f76f6aa01abe73d9f16ace9d Mon Sep 17 00:00:00 2001 From: Ritesh Harjani Date: Mon, 17 Jan 2022 17:41:50 +0530 Subject: jbd2: cleanup unused functions declarations from jbd2.h During code review found no references of few of these below function declarations. This patch cleans those up from jbd2.h Signed-off-by: Ritesh Harjani Reviewed-by: Jan Kara Link: https://lore.kernel.org/r/30d1fc327becda197a4136cf9cdc73d9baa3b7b9.1642416995.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o --- include/linux/jbd2.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index d63b8106796e..afc5572e7b8a 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1419,9 +1419,7 @@ extern void jbd2_journal_unfile_buffer(journal_t *, struct journal_head *); extern bool __jbd2_journal_refile_buffer(struct journal_head *); extern void jbd2_journal_refile_buffer(journal_t *, struct journal_head *); extern void __jbd2_journal_file_buffer(struct journal_head *, transaction_t *, int); -extern void __journal_free_buffer(struct journal_head *bh); extern void jbd2_journal_file_buffer(struct journal_head *, transaction_t *, int); -extern void __journal_clean_data_list(transaction_t *transaction); static inline void jbd2_file_log_bh(struct list_head *head, struct buffer_head *bh) { list_add_tail(&bh->b_assoc_buffers, head); @@ -1486,9 +1484,6 @@ extern int jbd2_journal_write_metadata_buffer(transaction_t *transaction, struct buffer_head **bh_out, sector_t blocknr); -/* Transaction locking */ -extern void __wait_on_journal (journal_t *); - /* Transaction cache support */ extern void jbd2_journal_destroy_transaction_cache(void); extern int __init jbd2_journal_init_transaction_cache(void); @@ -1774,8 +1769,6 @@ static inline unsigned long jbd2_log_space_left(journal_t *journal) #define BJ_Reserved 4 /* Buffer is reserved for access by journal */ #define BJ_Types 5 -extern int jbd_blocks_per_page(struct inode *inode); - /* JBD uses a CRC32 checksum */ #define JBD_MAX_CHECKSUM_SIZE 4 -- cgit From 4f98186848707f530669238d90e0562d92a78aab Mon Sep 17 00:00:00 2001 From: Ritesh Harjani Date: Mon, 17 Jan 2022 17:41:51 +0530 Subject: jbd2: refactor wait logic for transaction updates into a common function No functionality change as such in this patch. This only refactors the common piece of code which waits for t_updates to finish into a common function named as jbd2_journal_wait_updates(journal_t *) Signed-off-by: Ritesh Harjani Reviewed-by: Jan Kara Link: https://lore.kernel.org/r/8c564f70f4b2591171677a2a74fccb22a7b6c3a4.1642416995.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o --- include/linux/jbd2.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index afc5572e7b8a..9c3ada74ffb1 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -594,7 +594,7 @@ struct transaction_s */ unsigned long t_log_start; - /* + /* * Number of buffers on the t_buffers list [j_list_lock, no locks * needed for jbd2 thread] */ @@ -1538,6 +1538,8 @@ extern int jbd2_journal_flush(journal_t *journal, unsigned int flags); extern void jbd2_journal_lock_updates (journal_t *); extern void jbd2_journal_unlock_updates (journal_t *); +void jbd2_journal_wait_updates(journal_t *); + extern journal_t * jbd2_journal_init_dev(struct block_device *bdev, struct block_device *fs_dev, unsigned long long start, int len, int bsize); -- cgit From 67d6212afda218d564890d1674bab28e8612170f Mon Sep 17 00:00:00 2001 From: Igor Pylypiv Date: Thu, 27 Jan 2022 15:39:53 -0800 Subject: Revert "module, async: async_synchronize_full() on module init iff async is used" This reverts commit 774a1221e862b343388347bac9b318767336b20b. We need to finish all async code before the module init sequence is done. In the reverted commit the PF_USED_ASYNC flag was added to mark a thread that called async_schedule(). Then the PF_USED_ASYNC flag was used to determine whether or not async_synchronize_full() needs to be invoked. This works when modprobe thread is calling async_schedule(), but it does not work if module dispatches init code to a worker thread which then calls async_schedule(). For example, PCI driver probing is invoked from a worker thread based on a node where device is attached: if (cpu < nr_cpu_ids) error = work_on_cpu(cpu, local_pci_probe, &ddi); else error = local_pci_probe(&ddi); We end up in a situation where a worker thread gets the PF_USED_ASYNC flag set instead of the modprobe thread. As a result, async_synchronize_full() is not invoked and modprobe completes without waiting for the async code to finish. The issue was discovered while loading the pm80xx driver: (scsi_mod.scan=async) modprobe pm80xx worker ... do_init_module() ... pci_call_probe() work_on_cpu(local_pci_probe) local_pci_probe() pm8001_pci_probe() scsi_scan_host() async_schedule() worker->flags |= PF_USED_ASYNC; ... < return from worker > ... if (current->flags & PF_USED_ASYNC) <--- false async_synchronize_full(); Commit 21c3c5d28007 ("block: don't request module during elevator init") fixed the deadlock issue which the reverted commit 774a1221e862 ("module, async: async_synchronize_full() on module init iff async is used") tried to fix. Since commit 0fdff3ec6d87 ("async, kmod: warn on synchronous request_module() from async workers") synchronous module loading from async is not allowed. Given that the original deadlock issue is fixed and it is no longer allowed to call synchronous request_module() from async we can remove PF_USED_ASYNC flag to make module init consistently invoke async_synchronize_full() unless async module probe is requested. Signed-off-by: Igor Pylypiv Reviewed-by: Changyuan Lyu Reviewed-by: Luis Chamberlain Acked-by: Tejun Heo Signed-off-by: Linus Torvalds --- include/linux/sched.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index f5b2be39a78c..75ba8aa60248 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1680,7 +1680,6 @@ extern struct pid *cad_pid; #define PF_MEMALLOC 0x00000800 /* Allocating memory */ #define PF_NPROC_EXCEEDED 0x00001000 /* set_user() noticed that RLIMIT_NPROC was exceeded */ #define PF_USED_MATH 0x00002000 /* If unset the fpu must be initialized before use */ -#define PF_USED_ASYNC 0x00004000 /* Used async_schedule*(), used by module init */ #define PF_NOFREEZE 0x00008000 /* This thread should not be frozen */ #define PF_FROZEN 0x00010000 /* Frozen for system suspend */ #define PF_KSWAPD 0x00020000 /* I am kswapd */ -- cgit From 22f56b8e890d4e2835951b437bb6eeebfd1cb18b Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 3 Feb 2022 16:01:39 -0500 Subject: XArray: Include bitmap.h from xarray.h xas_find_chunk() calls find_next_bit(), which is defined in find.h, included from bitmap.h. Inside the kernel, this isn't a problem because bitmap.h is included from cpumask.h which is dragged in (eventually) by gfp.h. When building the test-suite, that doesn't happen, so we need to include bitmap.h explicitly. Fixes: 4ade0818cf04 ("tools: sync tools/bitmap with mother linux") Reported-by: Liam Howlett Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/xarray.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/xarray.h b/include/linux/xarray.h index d6d5da6ed735..66e28bc1a023 100644 --- a/include/linux/xarray.h +++ b/include/linux/xarray.h @@ -9,6 +9,7 @@ * See Documentation/core-api/xarray.rst for how to use the XArray. */ +#include #include #include #include -- cgit From 00edb2bac29f2e9c1674e85479fa5e3aa8cc648f Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 26 Jan 2022 10:55:44 -0800 Subject: i40e: remove enum i40e_client_state It's not used. Signed-off-by: Jakub Kicinski Reviewed-by: Jesse Brandeburg Tested-by: Gurucharan G Signed-off-by: Tony Nguyen --- include/linux/net/intel/i40e_client.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/net/intel/i40e_client.h b/include/linux/net/intel/i40e_client.h index 6b3267b49755..ed42bd5f639f 100644 --- a/include/linux/net/intel/i40e_client.h +++ b/include/linux/net/intel/i40e_client.h @@ -26,11 +26,6 @@ struct i40e_client_version { u8 rsvd; }; -enum i40e_client_state { - __I40E_CLIENT_NULL, - __I40E_CLIENT_REGISTERED -}; - enum i40e_client_instance_state { __I40E_CLIENT_INSTANCE_NONE, __I40E_CLIENT_INSTANCE_OPENED, @@ -190,11 +185,6 @@ struct i40e_client { const struct i40e_client_ops *ops; /* client ops provided by the client */ }; -static inline bool i40e_client_is_registered(struct i40e_client *client) -{ - return test_bit(__I40E_CLIENT_REGISTERED, &client->state); -} - void i40e_client_device_register(struct i40e_info *ldev, struct i40e_client *client); void i40e_client_device_unregister(struct i40e_info *ldev); -- cgit From 3a99f121fe0bfa4b65ff74d9e980018caf54c2d4 Mon Sep 17 00:00:00 2001 From: Bjorn Andersson Date: Thu, 27 Jan 2022 18:55:01 -0800 Subject: firmware: qcom: scm: Introduce pas_metadata context Starting with Qualcomm SM8450, some new security enhancements has been done in the secure world, which results in the requirement to keep the metadata segment accessible by the secure world from init_image() until auth_and_reset(). Introduce a "PAS metadata context" object that can be passed to init_image() for tracking the mapped memory and a related release function for client drivers to release the mapping once either auth_and_reset() has been invoked or in error handling paths on the way there. Signed-off-by: Bjorn Andersson Reviewed-by: Dmitry Baryshkov Link: https://lore.kernel.org/r/20220128025513.97188-2-bjorn.andersson@linaro.org --- include/linux/qcom_scm.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/qcom_scm.h b/include/linux/qcom_scm.h index ca4a88d7cbdc..681748619890 100644 --- a/include/linux/qcom_scm.h +++ b/include/linux/qcom_scm.h @@ -68,8 +68,16 @@ extern int qcom_scm_set_warm_boot_addr(void *entry, const cpumask_t *cpus); extern void qcom_scm_cpu_power_down(u32 flags); extern int qcom_scm_set_remote_state(u32 state, u32 id); +struct qcom_scm_pas_metadata { + void *ptr; + dma_addr_t phys; + ssize_t size; +}; + extern int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, - size_t size); + size_t size, + struct qcom_scm_pas_metadata *ctx); +void qcom_scm_pas_metadata_release(struct qcom_scm_pas_metadata *ctx); extern int qcom_scm_pas_mem_setup(u32 peripheral, phys_addr_t addr, phys_addr_t size); extern int qcom_scm_pas_auth_and_reset(u32 peripheral); -- cgit From 8bd42e2341a7857010001f08ee1729ced3b0e394 Mon Sep 17 00:00:00 2001 From: Bjorn Andersson Date: Thu, 27 Jan 2022 18:55:03 -0800 Subject: soc: qcom: mdt_loader: Allow hash segment to be split out It's been observed that some firmware found in a Qualcomm SM8450 device has the hash table in a separate .bNN file. Use the newly extracted helper function to load this segment from the separate file, if it's determined that the hashes are not part of the already loaded firmware. In order to do this, the function needs access to the firmware basename and to provide more useful error messages a struct device to associate the errors with. Signed-off-by: Bjorn Andersson Reviewed-by: Dmitry Baryshkov Link: https://lore.kernel.org/r/20220128025513.97188-4-bjorn.andersson@linaro.org --- include/linux/soc/qcom/mdt_loader.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soc/qcom/mdt_loader.h b/include/linux/soc/qcom/mdt_loader.h index afd47217996b..46bdb7bace9a 100644 --- a/include/linux/soc/qcom/mdt_loader.h +++ b/include/linux/soc/qcom/mdt_loader.h @@ -23,7 +23,8 @@ int qcom_mdt_load_no_init(struct device *dev, const struct firmware *fw, const char *fw_name, int pas_id, void *mem_region, phys_addr_t mem_phys, size_t mem_size, phys_addr_t *reloc_base); -void *qcom_mdt_read_metadata(const struct firmware *fw, size_t *data_len); +void *qcom_mdt_read_metadata(const struct firmware *fw, size_t *data_len, + const char *fw_name, struct device *dev); #else /* !IS_ENABLED(CONFIG_QCOM_MDT_LOADER) */ @@ -51,7 +52,8 @@ static inline int qcom_mdt_load_no_init(struct device *dev, } static inline void *qcom_mdt_read_metadata(const struct firmware *fw, - size_t *data_len) + size_t *data_len, const char *fw_name, + struct device *dev) { return ERR_PTR(-ENODEV); } -- cgit From b794eecb2af77145d36b9c28f056e242add9c4b2 Mon Sep 17 00:00:00 2001 From: Dave Ertman Date: Tue, 23 Nov 2021 10:25:36 -0800 Subject: ice: add support for DSCP QoS for IDC The ice driver provides QoS information to auxiliary drivers through the exported function ice_get_qos_params. This function doesn't currently support L3 DSCP QoS. Add the necessary defines, structure elements and code to support DSCP QoS through the IIDC functions. Signed-off-by: Dave Ertman Signed-off-by: Shiraz Saleem Signed-off-by: Tony Nguyen --- include/linux/net/intel/iidc.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/net/intel/iidc.h b/include/linux/net/intel/iidc.h index 1289593411d3..1c1332e4df26 100644 --- a/include/linux/net/intel/iidc.h +++ b/include/linux/net/intel/iidc.h @@ -32,6 +32,8 @@ enum iidc_rdma_protocol { }; #define IIDC_MAX_USER_PRIORITY 8 +#define IIDC_MAX_DSCP_MAPPING 64 +#define IIDC_DSCP_PFC_MODE 0x1 /* Struct to hold per RDMA Qset info */ struct iidc_rdma_qset_params { @@ -60,6 +62,8 @@ struct iidc_qos_params { u8 vport_relative_bw; u8 vport_priority_type; u8 num_tc; + u8 pfc_mode; + u8 dscp_map[IIDC_MAX_DSCP_MAPPING]; }; struct iidc_event { -- cgit From f4e526ff7e38e27bb87d53131d227a6fd6f73ab5 Mon Sep 17 00:00:00 2001 From: Bjorn Andersson Date: Thu, 27 Jan 2022 18:55:08 -0800 Subject: soc: qcom: mdt_loader: Extract PAS operations Rather than passing a boolean to indicate if the PAS operations should be performed from within __mdt_load(), extract them to their own helper function. This will allow clients to invoke this directly, with some qcom_scm_pas_metadata context that they later needs to release, without further having to complicate the prototype of qcom_mdt_load(). Signed-off-by: Bjorn Andersson Reviewed-by: Dmitry Baryshkov Link: https://lore.kernel.org/r/20220128025513.97188-9-bjorn.andersson@linaro.org --- include/linux/soc/qcom/mdt_loader.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/qcom/mdt_loader.h b/include/linux/soc/qcom/mdt_loader.h index 46bdb7bace9a..9e8e60421192 100644 --- a/include/linux/soc/qcom/mdt_loader.h +++ b/include/linux/soc/qcom/mdt_loader.h @@ -10,10 +10,14 @@ struct device; struct firmware; +struct qcom_scm_pas_metadata; #if IS_ENABLED(CONFIG_QCOM_MDT_LOADER) ssize_t qcom_mdt_get_size(const struct firmware *fw); +int qcom_mdt_pas_init(struct device *dev, const struct firmware *fw, + const char *fw_name, int pas_id, phys_addr_t mem_phys, + struct qcom_scm_pas_metadata *pas_metadata_ctx); int qcom_mdt_load(struct device *dev, const struct firmware *fw, const char *fw_name, int pas_id, void *mem_region, phys_addr_t mem_phys, size_t mem_size, @@ -33,6 +37,13 @@ static inline ssize_t qcom_mdt_get_size(const struct firmware *fw) return -ENODEV; } +static inline int qcom_mdt_pas_init(struct device *dev, const struct firmware *fw, + const char *fw_name, int pas_id, phys_addr_t mem_phys, + struct qcom_scm_pas_metadata *pas_metadata_ctx) +{ + return -ENODEV; +} + static inline int qcom_mdt_load(struct device *dev, const struct firmware *fw, const char *fw_name, int pas_id, void *mem_region, phys_addr_t mem_phys, -- cgit From 52beb1fc237d67cdc64277dc90047767a6fc52d7 Mon Sep 17 00:00:00 2001 From: Stephan Gerhold Date: Wed, 1 Dec 2021 14:05:04 +0100 Subject: firmware: qcom: scm: Drop cpumask parameter from set_boot_addr() qcom_scm_set_cold/warm_boot_addr() currently take a cpumask parameter, but it's not very useful because at the end we always set the same entry address for all CPUs. This also allows speeding up probe of cpuidle-qcom-spm a bit because only one SCM call needs to be made to the TrustZone firmware, instead of one per CPU. The main reason for this change is that it allows implementing the "multi-cluster" variant of the set_boot_addr() call more easily without having to rely on functions that break in certain build configurations or that are not exported to modules. Signed-off-by: Stephan Gerhold Acked-by: Daniel Lezcano Signed-off-by: Bjorn Andersson Link: https://lore.kernel.org/r/20211201130505.257379-4-stephan@gerhold.net --- include/linux/qcom_scm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/qcom_scm.h b/include/linux/qcom_scm.h index 681748619890..f8335644a01a 100644 --- a/include/linux/qcom_scm.h +++ b/include/linux/qcom_scm.h @@ -63,8 +63,8 @@ enum qcom_scm_ice_cipher { extern bool qcom_scm_is_available(void); -extern int qcom_scm_set_cold_boot_addr(void *entry, const cpumask_t *cpus); -extern int qcom_scm_set_warm_boot_addr(void *entry, const cpumask_t *cpus); +extern int qcom_scm_set_cold_boot_addr(void *entry); +extern int qcom_scm_set_warm_boot_addr(void *entry); extern void qcom_scm_cpu_power_down(u32 flags); extern int qcom_scm_set_remote_state(u32 state, u32 id); -- cgit From 2651bf680bc2ad9a078b7222b0873145ab4ece07 Mon Sep 17 00:00:00 2001 From: Song Liu Date: Thu, 3 Feb 2022 11:28:25 -0800 Subject: block: introduce BLK_STS_OFFLINE Currently, drivers reports BLK_STS_IOERR for devices that are not full online or being removed. This behavior could cause confusion for users, as they are not really I/O errors from the device. Solve this issue with a new state BLK_STS_OFFLINE, which reports "device offline error" in dmesg instead of "I/O error". EIO is intentionally kept to not change user visible return value. Signed-off-by: Song Liu Reviewed-by: Martin K. Petersen Link: https://lore.kernel.org/r/20220203192827.1370270-2-song@kernel.org Signed-off-by: Jens Axboe --- include/linux/blk_types.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index fe065c394fff..5561e58d158a 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -153,6 +153,13 @@ typedef u8 __bitwise blk_status_t; */ #define BLK_STS_ZONE_ACTIVE_RESOURCE ((__force blk_status_t)16) +/* + * BLK_STS_OFFLINE is returned from the driver when the target device is offline + * or is being taken offline. This could help differentiate the case where a + * device is intentionally being shut down from a real I/O error. + */ +#define BLK_STS_OFFLINE ((__force blk_status_t)17) + /** * blk_path_error - returns true if error may be path related * @error: status the request was completed with -- cgit From 1bc91a5ddf3eaea0e0ea957cccf3abdcfcace00e Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Thu, 20 Jan 2022 13:07:01 +0100 Subject: netfilter: conntrack: handle ->destroy hook via nat_ops instead The nat module already exposes a few functions to the conntrack core. Move the nat extension destroy hook to it. After this, no conntrack extension needs a destroy hook. 'struct nf_ct_ext_type' and the register/unregister api can be removed in a followup patch. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- include/linux/netfilter.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index 15e71bfff726..c2c6f332fb90 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -379,6 +379,7 @@ struct nf_nat_hook { unsigned int (*manip_pkt)(struct sk_buff *skb, struct nf_conn *ct, enum nf_nat_manip_type mtype, enum ip_conntrack_dir dir); + void (*remove_nat_bysrc)(struct nf_conn *ct); }; extern const struct nf_nat_hook __rcu *nf_nat_hook; -- cgit From 20ff32024624102596f2b4083a17a97ca71d6cd8 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Thu, 20 Jan 2022 16:09:13 +0100 Subject: netfilter: conntrack: pptp: use single option structure Instead of exposing the four hooks individually use a sinle hook ops structure. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- include/linux/netfilter/nf_conntrack_pptp.h | 38 +++++++++++++---------------- 1 file changed, 17 insertions(+), 21 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netfilter/nf_conntrack_pptp.h b/include/linux/netfilter/nf_conntrack_pptp.h index a28aa289afdc..c3bdb4370938 100644 --- a/include/linux/netfilter/nf_conntrack_pptp.h +++ b/include/linux/netfilter/nf_conntrack_pptp.h @@ -300,26 +300,22 @@ union pptp_ctrl_union { struct PptpSetLinkInfo setlink; }; -extern int -(*nf_nat_pptp_hook_outbound)(struct sk_buff *skb, - struct nf_conn *ct, enum ip_conntrack_info ctinfo, - unsigned int protoff, - struct PptpControlHeader *ctlh, - union pptp_ctrl_union *pptpReq); - -extern int -(*nf_nat_pptp_hook_inbound)(struct sk_buff *skb, - struct nf_conn *ct, enum ip_conntrack_info ctinfo, - unsigned int protoff, - struct PptpControlHeader *ctlh, - union pptp_ctrl_union *pptpReq); - -extern void -(*nf_nat_pptp_hook_exp_gre)(struct nf_conntrack_expect *exp_orig, - struct nf_conntrack_expect *exp_reply); - -extern void -(*nf_nat_pptp_hook_expectfn)(struct nf_conn *ct, - struct nf_conntrack_expect *exp); +struct nf_nat_pptp_hook { + int (*outbound)(struct sk_buff *skb, + struct nf_conn *ct, enum ip_conntrack_info ctinfo, + unsigned int protoff, + struct PptpControlHeader *ctlh, + union pptp_ctrl_union *pptpReq); + int (*inbound)(struct sk_buff *skb, + struct nf_conn *ct, enum ip_conntrack_info ctinfo, + unsigned int protoff, + struct PptpControlHeader *ctlh, + union pptp_ctrl_union *pptpReq); + void (*exp_gre)(struct nf_conntrack_expect *exp_orig, + struct nf_conntrack_expect *exp_reply); + void (*expectfn)(struct nf_conn *ct, + struct nf_conntrack_expect *exp); +}; +extern const struct nf_nat_pptp_hook __rcu *nf_nat_pptp_hook; #endif /* _NF_CONNTRACK_PPTP_H */ -- cgit From ac9f0c810684a1b161c18eb4b91ce84cbc13c91d Mon Sep 17 00:00:00 2001 From: Anton Lundin Date: Thu, 3 Feb 2022 10:41:35 +0100 Subject: ata: libata-core: Introduce ATA_HORKAGE_NO_LOG_DIR horkage 06f6c4c6c3e8 ("ata: libata: add missing ata_identify_page_supported() calls") introduced additional calls to ata_identify_page_supported(), thus also adding indirectly accesses to the device log directory log page through ata_log_supported(). Reading this log page causes SATADOM-ML 3ME devices to lock up. Introduce the horkage flag ATA_HORKAGE_NO_LOG_DIR to prevent accesses to the log directory in ata_log_supported() and add a blacklist entry with this flag for "SATADOM-ML 3ME" devices. Fixes: 636f6e2af4fb ("libata: add horkage for missing Identify Device log") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Anton Lundin Signed-off-by: Damien Le Moal --- include/linux/libata.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index 605756f645be..7f99b4d78822 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -380,6 +380,7 @@ enum { ATA_HORKAGE_MAX_TRIM_128M = (1 << 26), /* Limit max trim size to 128M */ ATA_HORKAGE_NO_NCQ_ON_ATI = (1 << 27), /* Disable NCQ on ATI chipset */ ATA_HORKAGE_NO_ID_DEV_LOG = (1 << 28), /* Identify device log missing */ + ATA_HORKAGE_NO_LOG_DIR = (1 << 29), /* Do not read log directory */ /* DMA mask for user DMA control: User visible values; DO NOT renumber */ -- cgit From b93235e68921b9acd38ee309953a3a9808105289 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 2 Feb 2022 14:20:31 -0800 Subject: tls: cap the output scatter list to something reasonable TLS recvmsg() passes user pages as destination for decrypt. The decrypt operation is repeated record by record, each record being 16kB, max. TLS allocates an sg_table and uses iov_iter_get_pages() to populate it with enough pages to fit the decrypted record. Even though we decrypt a single message at a time we size the sg_table based on the entire length of the iovec. This leads to unnecessarily large allocations, risking triggering OOM conditions. Use iov_iter_truncate() / iov_iter_reexpand() to construct a "capped" version of iov_iter_npages(). Alternatively we could parametrize iov_iter_npages() to take the size as arg instead of using i->count, or do something else.. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/uio.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'include/linux') diff --git a/include/linux/uio.h b/include/linux/uio.h index 1198a2bfc9bf..739285fe5a2f 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -273,6 +273,23 @@ static inline void iov_iter_reexpand(struct iov_iter *i, size_t count) i->count = count; } +static inline int +iov_iter_npages_cap(struct iov_iter *i, int maxpages, size_t max_bytes) +{ + size_t shorted = 0; + int npages; + + if (iov_iter_count(i) > max_bytes) { + shorted = iov_iter_count(i) - max_bytes; + iov_iter_truncate(i, max_bytes); + } + npages = iov_iter_npages(i, INT_MAX); + if (shorted) + iov_iter_reexpand(i, iov_iter_count(i) + shorted); + + return npages; +} + struct csum_state { __wsum csum; size_t off; -- cgit From 56b4b5abcdab6daf71c5536fca2772f178590e06 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 2 Feb 2022 17:01:06 +0100 Subject: block: clone crypto and integrity data in __bio_clone_fast __bio_clone_fast should also clone integrity and crypto data, as a clone without those is incomplete. Right now the only caller that can actually support crypto and integrity data (dm) does it manually for the one callchain that supports these, but we better do it properly in the core. Note that all callers except for the above mentioned one also don't need to handle failure at all, given that the integrity and crypto clones are based on mempool allocations that won't fail for sleeping allocations. Signed-off-by: Christoph Hellwig Reviewed-by: Mike Snitzer Link: https://lore.kernel.org/r/20220202160109.108149-11-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/bio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index 18cfe5bb41ea..b814361c957b 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -413,7 +413,7 @@ struct bio *bio_alloc_kiocb(struct kiocb *kiocb, struct block_device *bdev, struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned short nr_iovecs); extern void bio_put(struct bio *); -extern void __bio_clone_fast(struct bio *, struct bio *); +int __bio_clone_fast(struct bio *bio, struct bio *bio_src, gfp_t gfp); extern struct bio *bio_clone_fast(struct bio *, gfp_t, struct bio_set *); extern struct bio_set fs_bio_set; -- cgit From abfc426d1b2fb2176df59851a64223b58ddae7e7 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 2 Feb 2022 17:01:09 +0100 Subject: block: pass a block_device to bio_clone_fast Pass a block_device to bio_clone_fast and __bio_clone_fast and give the functions more suitable names. Signed-off-by: Christoph Hellwig Reviewed-by: Mike Snitzer Link: https://lore.kernel.org/r/20220202160109.108149-14-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/bio.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index b814361c957b..7523aba4ddf7 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -413,8 +413,10 @@ struct bio *bio_alloc_kiocb(struct kiocb *kiocb, struct block_device *bdev, struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned short nr_iovecs); extern void bio_put(struct bio *); -int __bio_clone_fast(struct bio *bio, struct bio *bio_src, gfp_t gfp); -extern struct bio *bio_clone_fast(struct bio *, gfp_t, struct bio_set *); +struct bio *bio_alloc_clone(struct block_device *bdev, struct bio *bio_src, + gfp_t gfp, struct bio_set *bs); +int bio_init_clone(struct block_device *bdev, struct bio *bio, + struct bio *bio_src, gfp_t gfp); extern struct bio_set fs_bio_set; -- cgit From 916acbf6b4b9262df7de1d2b6208a4efa209a88f Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Thu, 3 Feb 2022 16:45:21 +0200 Subject: serial: core: Fix the definition name in the comment of UPF_* flags From day 1 the UPF_LAST_USER wasn't defined, a specific number of the last bit for userspace. Instead the code always relies on ASYNCB_LAST_USER. Fix comment accordingly. Fixes: 904326ecac02 ("tty,serial: Unify UPF_* and ASYNC_* flag definitions") Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220203144521.16457-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index c58cc142d23f..31f7fe527395 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -171,7 +171,7 @@ struct uart_port { * assigned from the serial_struct flags in uart_set_info() * [for bit definitions in the UPF_CHANGE_MASK] * - * Bits [0..UPF_LAST_USER] are userspace defined/visible/changeable + * Bits [0..ASYNCB_LAST_USER] are userspace defined/visible/changeable * The remaining bits are serial-core specific and not modifiable by * userspace. */ -- cgit From 84564481bc4520c47e7fe9c594c0523d81e6a97a Mon Sep 17 00:00:00 2001 From: Aswath Govindraju Date: Fri, 7 Jan 2022 08:44:25 +0100 Subject: mux: Add support for reading mux state from consumer DT node In some cases, we might need to provide the state of the mux to be set for the operation of a given peripheral. Therefore, pass this information using mux-states property. Link: https://lore.kernel.org/lkml/20211123081222.27979-1-a-govindraju@ti.com Signed-off-by: Aswath Govindraju Signed-off-by: Peter Rosin (minor edits) Link: https://lore.kernel.org/r/aac25be8-9515-a980-f7cb-709938c84822@axentia.se Signed-off-by: Greg Kroah-Hartman --- include/linux/mux/consumer.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mux/consumer.h b/include/linux/mux/consumer.h index 7a09b040ac39..2e25c838f831 100644 --- a/include/linux/mux/consumer.h +++ b/include/linux/mux/consumer.h @@ -14,14 +14,19 @@ struct device; struct mux_control; +struct mux_state; unsigned int mux_control_states(struct mux_control *mux); int __must_check mux_control_select_delay(struct mux_control *mux, unsigned int state, unsigned int delay_us); +int __must_check mux_state_select_delay(struct mux_state *mstate, + unsigned int delay_us); int __must_check mux_control_try_select_delay(struct mux_control *mux, unsigned int state, unsigned int delay_us); +int __must_check mux_state_try_select_delay(struct mux_state *mstate, + unsigned int delay_us); static inline int __must_check mux_control_select(struct mux_control *mux, unsigned int state) @@ -29,18 +34,31 @@ static inline int __must_check mux_control_select(struct mux_control *mux, return mux_control_select_delay(mux, state, 0); } +static inline int __must_check mux_state_select(struct mux_state *mstate) +{ + return mux_state_select_delay(mstate, 0); +} + static inline int __must_check mux_control_try_select(struct mux_control *mux, unsigned int state) { return mux_control_try_select_delay(mux, state, 0); } +static inline int __must_check mux_state_try_select(struct mux_state *mstate) +{ + return mux_state_try_select_delay(mstate, 0); +} + int mux_control_deselect(struct mux_control *mux); +int mux_state_deselect(struct mux_state *mstate); struct mux_control *mux_control_get(struct device *dev, const char *mux_name); void mux_control_put(struct mux_control *mux); struct mux_control *devm_mux_control_get(struct device *dev, const char *mux_name); +struct mux_state *devm_mux_state_get(struct device *dev, + const char *mux_name); #endif /* _LINUX_MUX_CONSUMER_H */ -- cgit From bb6e8c28414335a551da5973d44cc537f7abe65a Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Wed, 12 Jan 2022 08:00:53 -0800 Subject: firmware_loader: simplfy builtin or module check The existing check is outdated and confuses developers. Use the already existing IS_REACHABLE() defined on kconfig.h which makes the intention much clearer. Cc: Randy Dunlap Cc: Masahiro Yamada Reported-by: Borislav Petkov Reported-by: Greg Kroah-Hartman Suggested-by: Masahiro Yamada Signed-off-by: Luis Chamberlain Ackd-by: Randy Dunlap Link: https://lore.kernel.org/r/20220112160053.723795-1-mcgrof@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/firmware.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/firmware.h b/include/linux/firmware.h index 3b057dfc8284..ec2ccfebef65 100644 --- a/include/linux/firmware.h +++ b/include/linux/firmware.h @@ -34,7 +34,7 @@ static inline bool firmware_request_builtin(struct firmware *fw, } #endif -#if defined(CONFIG_FW_LOADER) || (defined(CONFIG_FW_LOADER_MODULE) && defined(MODULE)) +#if IS_REACHABLE(CONFIG_FW_LOADER) int request_firmware(const struct firmware **fw, const char *name, struct device *device); int firmware_request_nowarn(const struct firmware **fw, const char *name, -- cgit From bed89478934a9803d1a4d97574dcc30e509aa972 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Wed, 2 Feb 2022 10:49:38 +0200 Subject: ieee80211: fix -Wcast-qual warnings When enabling -Wcast-qual e.g. via W=3, we get a lot of warnings from this file, whenever it's included. Since the fixes are simple, just do that. Signed-off-by: Johannes Berg Signed-off-by: Luca Coelho Link: https://lore.kernel.org/r/iwlwifi.20220202104617.79ec4a4bab29.I8177a0c79d656c552e22c88931d8da06f2977896@changeid Signed-off-by: Johannes Berg --- include/linux/ieee80211.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index 559b6c644938..60ee7b3f58e7 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -2427,7 +2427,7 @@ struct ieee80211_tx_pwr_env { static inline u8 ieee80211_he_oper_size(const u8 *he_oper_ie) { - struct ieee80211_he_operation *he_oper = (void *)he_oper_ie; + const struct ieee80211_he_operation *he_oper = (const void *)he_oper_ie; u8 oper_len = sizeof(struct ieee80211_he_operation); u32 he_oper_params; @@ -2460,7 +2460,7 @@ ieee80211_he_oper_size(const u8 *he_oper_ie) static inline const struct ieee80211_he_6ghz_oper * ieee80211_he_6ghz_oper(const struct ieee80211_he_operation *he_oper) { - const u8 *ret = (void *)&he_oper->optional; + const u8 *ret = (const void *)&he_oper->optional; u32 he_oper_params; if (!he_oper) @@ -2475,7 +2475,7 @@ ieee80211_he_6ghz_oper(const struct ieee80211_he_operation *he_oper) if (he_oper_params & IEEE80211_HE_OPERATION_CO_HOSTED_BSS) ret++; - return (void *)ret; + return (const void *)ret; } /* HE Spatial Reuse defines */ @@ -2496,7 +2496,7 @@ ieee80211_he_6ghz_oper(const struct ieee80211_he_operation *he_oper) static inline u8 ieee80211_he_spr_size(const u8 *he_spr_ie) { - struct ieee80211_he_spr *he_spr = (void *)he_spr_ie; + const struct ieee80211_he_spr *he_spr = (const void *)he_spr_ie; u8 spr_len = sizeof(struct ieee80211_he_spr); u8 he_spr_params; -- cgit From b9794a822281944ef3de5b1812a94cbdb8134320 Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Fri, 28 Jan 2022 17:35:33 +0100 Subject: powercap/drivers/dtpm: Convert the init table section to a simple array The init table section is freed after the system booted. However the next changes will make per module the DTPM description, so the table won't be accessible when the module is loaded. In order to fix that, we should move the table to the data section where there are very few entries and that makes strange to add it there. The main goal of the table was to keep self-encapsulated code and we can keep it almost as it by using an array instead. Suggested-by: Ulf Hansson Reviewed-by: Ulf Hansson Signed-off-by: Daniel Lezcano Link: https://lore.kernel.org/r/20220128163537.212248-2-daniel.lezcano@linaro.org --- include/linux/dtpm.h | 24 +++--------------------- 1 file changed, 3 insertions(+), 21 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h index d37e5d06a357..506048158a50 100644 --- a/include/linux/dtpm.h +++ b/include/linux/dtpm.h @@ -32,29 +32,11 @@ struct dtpm_ops { void (*release)(struct dtpm *); }; -typedef int (*dtpm_init_t)(void); - -struct dtpm_descr { - dtpm_init_t init; +struct dtpm_subsys_ops { + const char *name; + int (*init)(void); }; -/* Init section thermal table */ -extern struct dtpm_descr __dtpm_table[]; -extern struct dtpm_descr __dtpm_table_end[]; - -#define DTPM_TABLE_ENTRY(name, __init) \ - static struct dtpm_descr __dtpm_table_entry_##name \ - __used __section("__dtpm_table") = { \ - .init = __init, \ - } - -#define DTPM_DECLARE(name, init) DTPM_TABLE_ENTRY(name, init) - -#define for_each_dtpm_table(__dtpm) \ - for (__dtpm = __dtpm_table; \ - __dtpm < __dtpm_table_end; \ - __dtpm++) - static inline struct dtpm *to_dtpm(struct powercap_zone *zone) { return container_of(zone, struct dtpm, zone); -- cgit From 3759ec678e8944dc2ea70cab77a300408f78ae27 Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Fri, 28 Jan 2022 17:35:34 +0100 Subject: powercap/drivers/dtpm: Add hierarchy creation The DTPM framework is available but without a way to configure it. This change provides a way to create a hierarchy of DTPM node where the power consumption reflects the sum of the children's power consumption. It is up to the platform to specify an array of dtpm nodes where each element has a pointer to its parent, except the top most one. The type of the node gives the indication of which initialization callback to call. At this time, we can create a virtual node, where its purpose is to be a parent in the hierarchy, and a DT node where the name describes its path. In order to ensure a nice self-encapsulation, the DTPM subsys array contains a couple of initialization functions, one to setup the DTPM backend and one to initialize it up. With this approach, the DTPM framework has a very few material to export. Signed-off-by: Daniel Lezcano Reviewed-by: Ulf Hansson Link: https://lore.kernel.org/r/20220128163537.212248-3-daniel.lezcano@linaro.org --- include/linux/dtpm.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h index 506048158a50..f7a25c70dd4c 100644 --- a/include/linux/dtpm.h +++ b/include/linux/dtpm.h @@ -32,9 +32,23 @@ struct dtpm_ops { void (*release)(struct dtpm *); }; +struct device_node; + struct dtpm_subsys_ops { const char *name; int (*init)(void); + int (*setup)(struct dtpm *, struct device_node *); +}; + +enum DTPM_NODE_TYPE { + DTPM_NODE_VIRTUAL = 0, + DTPM_NODE_DT, +}; + +struct dtpm_node { + enum DTPM_NODE_TYPE type; + const char *name; + struct dtpm_node *parent; }; static inline struct dtpm *to_dtpm(struct powercap_zone *zone) @@ -52,4 +66,5 @@ void dtpm_unregister(struct dtpm *dtpm); int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent); +int dtpm_create_hierarchy(struct of_device_id *dtpm_match_table); #endif -- cgit From 80110bbfbba6f0078d5a1cbc8df004506db8ffe5 Mon Sep 17 00:00:00 2001 From: Pasha Tatashin Date: Thu, 3 Feb 2022 20:49:24 -0800 Subject: mm/page_table_check: check entries at pmd levels syzbot detected a case where the page table counters were not properly updated. syzkaller login: ------------[ cut here ]------------ kernel BUG at mm/page_table_check.c:162! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 3099 Comm: pasha Not tainted 5.16.0+ #48 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIO4 RIP: 0010:__page_table_check_zero+0x159/0x1a0 Call Trace: free_pcp_prepare+0x3be/0xaa0 free_unref_page+0x1c/0x650 free_compound_page+0xec/0x130 free_transhuge_page+0x1be/0x260 __put_compound_page+0x90/0xd0 release_pages+0x54c/0x1060 __pagevec_release+0x7c/0x110 shmem_undo_range+0x85e/0x1250 ... The repro involved having a huge page that is split due to uprobe event temporarily replacing one of the pages in the huge page. Later the huge page was combined again, but the counters were off, as the PTE level was not properly updated. Make sure that when PMD is cleared and prior to freeing the level the PTEs are updated. Link: https://lkml.kernel.org/r/20220131203249.2832273-5-pasha.tatashin@soleen.com Fixes: df4e817b7108 ("mm: page table check") Signed-off-by: Pasha Tatashin Acked-by: David Rientjes Cc: Aneesh Kumar K.V Cc: Anshuman Khandual Cc: Dave Hansen Cc: Greg Thelen Cc: H. Peter Anvin Cc: Hugh Dickins Cc: Ingo Molnar Cc: Jiri Slaby Cc: Mike Rapoport Cc: Muchun Song Cc: Paul Turner Cc: Wei Xu Cc: Will Deacon Cc: Zi Yan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/page_table_check.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) (limited to 'include/linux') diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h index 38cace1da7b6..01e16c7696ec 100644 --- a/include/linux/page_table_check.h +++ b/include/linux/page_table_check.h @@ -26,6 +26,9 @@ void __page_table_check_pmd_set(struct mm_struct *mm, unsigned long addr, pmd_t *pmdp, pmd_t pmd); void __page_table_check_pud_set(struct mm_struct *mm, unsigned long addr, pud_t *pudp, pud_t pud); +void __page_table_check_pte_clear_range(struct mm_struct *mm, + unsigned long addr, + pmd_t pmd); static inline void page_table_check_alloc(struct page *page, unsigned int order) { @@ -100,6 +103,16 @@ static inline void page_table_check_pud_set(struct mm_struct *mm, __page_table_check_pud_set(mm, addr, pudp, pud); } +static inline void page_table_check_pte_clear_range(struct mm_struct *mm, + unsigned long addr, + pmd_t pmd) +{ + if (static_branch_likely(&page_table_check_disabled)) + return; + + __page_table_check_pte_clear_range(mm, addr, pmd); +} + #else static inline void page_table_check_alloc(struct page *page, unsigned int order) @@ -143,5 +156,11 @@ static inline void page_table_check_pud_set(struct mm_struct *mm, { } +static inline void page_table_check_pte_clear_range(struct mm_struct *mm, + unsigned long addr, + pmd_t pmd) +{ +} + #endif /* CONFIG_PAGE_TABLE_CHECK */ #endif /* __LINUX_PAGE_TABLE_CHECK_H */ -- cgit From 314c459a6fe0957b5885fbc65c53d51444092880 Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Thu, 3 Feb 2022 20:49:29 -0800 Subject: mm/pgtable: define pte_index so that preprocessor could recognize it Since commit 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions") pte_index is a static inline and there is no define for it that can be recognized by the preprocessor. As a result, vm_insert_pages() uses slower loop over vm_insert_page() instead of insert_pages() that amortizes the cost of spinlock operations when inserting multiple pages. Link: https://lkml.kernel.org/r/20220111145457.20748-1-rppt@kernel.org Fixes: 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions") Signed-off-by: Mike Rapoport Reported-by: Christian Dietrich Reviewed-by: Khalid Aziz Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/pgtable.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index bc8713a76e03..f4f4077b97aa 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -62,6 +62,7 @@ static inline unsigned long pte_index(unsigned long address) { return (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1); } +#define pte_index pte_index #ifndef pmd_index static inline unsigned long pmd_index(unsigned long address) -- cgit From ae26508651272695a3ab353f75ab9a8daf3da324 Mon Sep 17 00:00:00 2001 From: Kevin Hao Date: Sun, 23 Jan 2022 20:45:06 +0800 Subject: cpufreq: Move to_gov_attr_set() to cpufreq.h So it can be reused by other codes. Signed-off-by: Kevin Hao Signed-off-by: Rafael J. Wysocki --- include/linux/cpufreq.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 1ab29e61b078..f0dfc0b260ec 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -658,6 +658,11 @@ struct gov_attr_set { /* sysfs ops for cpufreq governors */ extern const struct sysfs_ops governor_sysfs_ops; +static inline struct gov_attr_set *to_gov_attr_set(struct kobject *kobj) +{ + return container_of(kobj, struct gov_attr_set, kobj); +} + void gov_attr_set_init(struct gov_attr_set *attr_set, struct list_head *list_node); void gov_attr_set_get(struct gov_attr_set *attr_set, struct list_head *list_node); unsigned int gov_attr_set_put(struct gov_attr_set *attr_set, struct list_head *list_node); -- cgit From e70e13e7d4ab8f932f49db1c9500b30a34a6d420 Mon Sep 17 00:00:00 2001 From: Matteo Croce Date: Fri, 4 Feb 2022 01:55:18 +0100 Subject: bpf: Implement bpf_core_types_are_compat(). Adopt libbpf's bpf_core_types_are_compat() for kernel duty by adding explicit recursion limit of 2 which is enough to handle 2 levels of function prototypes. Signed-off-by: Matteo Croce Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220204005519.60361-2-mcroce@linux.microsoft.com --- include/linux/btf.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/btf.h b/include/linux/btf.h index f6c43dd513fa..36bc09b8e890 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -327,6 +327,11 @@ static inline const struct btf_var_secinfo *btf_type_var_secinfo( return (const struct btf_var_secinfo *)(t + 1); } +static inline struct btf_param *btf_params(const struct btf_type *t) +{ + return (struct btf_param *)(t + 1); +} + #ifdef CONFIG_BPF_SYSCALL struct bpf_prog; -- cgit From 0463e320421b313db320bbcf2928ebf49f556b67 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Fri, 4 Feb 2022 11:42:10 +0000 Subject: net: phylink: remove phylink_set_10g_modes() phylink_set_10g_modes() is no longer used with the conversion of drivers to phylink_generic_validate(), so we can remove it. Signed-off-by: Russell King (Oracle) Signed-off-by: David S. Miller --- include/linux/phylink.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 713a0c928b7c..cca149f78d35 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -582,7 +582,6 @@ int phylink_speed_up(struct phylink *pl); #define phylink_test(bm, mode) __phylink_do_bit(test_bit, bm, mode) void phylink_set_port_modes(unsigned long *bits); -void phylink_set_10g_modes(unsigned long *mask); void phylink_helper_basex_speed(struct phylink_link_state *state); void phylink_mii_c22_pcs_decode_state(struct phylink_link_state *state, -- cgit From 145c7a793838add5e004e7d49a67654dc7eba147 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 4 Feb 2022 12:15:45 -0800 Subject: ipv6: make mc_forwarding atomic This fixes minor data-races in ip6_mc_input() and batadv_mcast_mla_rtr_flags_softif_get_ipv6() Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/ipv6.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index 1e0f8a31f3de..16870f86c74d 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -51,7 +51,7 @@ struct ipv6_devconf { __s32 use_optimistic; #endif #ifdef CONFIG_IPV6_MROUTE - __s32 mc_forwarding; + atomic_t mc_forwarding; #endif __s32 disable_ipv6; __s32 drop_unicast_in_l2_multicast; -- cgit From e3ececfe668facd87d920b608349a32607060e66 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 4 Feb 2022 14:42:35 -0800 Subject: ref_tracker: implement use-after-free detection Whenever ref_tracker_dir_init() is called, mark the struct ref_tracker_dir as dead. Test the dead status from ref_tracker_alloc() and ref_tracker_free() This should detect buggy dev_put()/dev_hold() happening too late in netdevice dismantle process. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/ref_tracker.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ref_tracker.h b/include/linux/ref_tracker.h index 60f3453be23e..a443abda937d 100644 --- a/include/linux/ref_tracker.h +++ b/include/linux/ref_tracker.h @@ -13,6 +13,7 @@ struct ref_tracker_dir { spinlock_t lock; unsigned int quarantine_avail; refcount_t untracked; + bool dead; struct list_head list; /* List of active trackers */ struct list_head quarantine; /* List of dead trackers */ #endif @@ -26,6 +27,7 @@ static inline void ref_tracker_dir_init(struct ref_tracker_dir *dir, INIT_LIST_HEAD(&dir->quarantine); spin_lock_init(&dir->lock); dir->quarantine_avail = quarantine_count; + dir->dead = false; refcount_set(&dir->untracked, 1); stack_depot_init(); } -- cgit From 8fd5522f44dcd7f05454ddc4f16d0f821b676cd9 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 4 Feb 2022 14:42:36 -0800 Subject: ref_tracker: add a count of untracked references We are still chasing a netdev refcount imbalance, and we suspect we have one rogue dev_put() that is consuming a reference taken from a dev_hold_track() To detect this case, allow ref_tracker_alloc() and ref_tracker_free() to be called with a NULL @trackerp parameter, and use a dedicated refcount_t just for them. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/ref_tracker.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ref_tracker.h b/include/linux/ref_tracker.h index a443abda937d..9ca353ab712b 100644 --- a/include/linux/ref_tracker.h +++ b/include/linux/ref_tracker.h @@ -13,6 +13,7 @@ struct ref_tracker_dir { spinlock_t lock; unsigned int quarantine_avail; refcount_t untracked; + refcount_t no_tracker; bool dead; struct list_head list; /* List of active trackers */ struct list_head quarantine; /* List of dead trackers */ @@ -29,6 +30,7 @@ static inline void ref_tracker_dir_init(struct ref_tracker_dir *dir, dir->quarantine_avail = quarantine_count; dir->dead = false; refcount_set(&dir->untracked, 1); + refcount_set(&dir->no_tracker, 1); stack_depot_init(); } -- cgit From 4c6c11ea0f7b00a1894803efe980dfaf3b074886 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 4 Feb 2022 14:42:37 -0800 Subject: net: refine dev_put()/dev_hold() debugging We are still chasing some syzbot reports where we think a rogue dev_put() is called with no corresponding prior dev_hold(). Unfortunately it eats a reference on dev->dev_refcnt taken by innocent dev_hold_track(), meaning that the refcount saturation splat comes too late to be useful. Make sure that 'not tracked' dev_put() and dev_hold() better use CONFIG_NET_DEV_REFCNT_TRACKER=y debug infrastructure: Prior patch in the series allowed ref_tracker_alloc() and ref_tracker_free() to be called with a NULL @trackerp parameter, and to use a separate refcount only to detect too many put() even in the following case: dev_hold_track(dev, tracker_1, GFP_ATOMIC); dev_hold(dev); dev_put(dev); dev_put(dev); // Should complain loudly here. dev_put_track(dev, tracker_1); // instead of here Add clarification about netdev_tracker_alloc() role. v2: I replaced the dev_put() in linkwatch_do_dev() with __dev_put() because callers called netdev_tracker_free(). Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/netdevice.h | 69 +++++++++++++++++++++++++++++++---------------- 1 file changed, 46 insertions(+), 23 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index e490b84732d1..3fb6fb67ed77 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3817,14 +3817,7 @@ extern unsigned int netdev_budget_usecs; /* Called by rtnetlink.c:rtnl_unlock() */ void netdev_run_todo(void); -/** - * dev_put - release reference to device - * @dev: network device - * - * Release reference to device to allow it to be freed. - * Try using dev_put_track() instead. - */ -static inline void dev_put(struct net_device *dev) +static inline void __dev_put(struct net_device *dev) { if (dev) { #ifdef CONFIG_PCPU_DEV_REFCNT @@ -3835,14 +3828,7 @@ static inline void dev_put(struct net_device *dev) } } -/** - * dev_hold - get reference to device - * @dev: network device - * - * Hold reference to device to keep it from being freed. - * Try using dev_hold_track() instead. - */ -static inline void dev_hold(struct net_device *dev) +static inline void __dev_hold(struct net_device *dev) { if (dev) { #ifdef CONFIG_PCPU_DEV_REFCNT @@ -3853,11 +3839,24 @@ static inline void dev_hold(struct net_device *dev) } } +static inline void __netdev_tracker_alloc(struct net_device *dev, + netdevice_tracker *tracker, + gfp_t gfp) +{ +#ifdef CONFIG_NET_DEV_REFCNT_TRACKER + ref_tracker_alloc(&dev->refcnt_tracker, tracker, gfp); +#endif +} + +/* netdev_tracker_alloc() can upgrade a prior untracked reference + * taken by dev_get_by_name()/dev_get_by_index() to a tracked one. + */ static inline void netdev_tracker_alloc(struct net_device *dev, netdevice_tracker *tracker, gfp_t gfp) { #ifdef CONFIG_NET_DEV_REFCNT_TRACKER - ref_tracker_alloc(&dev->refcnt_tracker, tracker, gfp); + refcount_dec(&dev->refcnt_tracker.no_tracker); + __netdev_tracker_alloc(dev, tracker, gfp); #endif } @@ -3873,8 +3872,8 @@ static inline void dev_hold_track(struct net_device *dev, netdevice_tracker *tracker, gfp_t gfp) { if (dev) { - dev_hold(dev); - netdev_tracker_alloc(dev, tracker, gfp); + __dev_hold(dev); + __netdev_tracker_alloc(dev, tracker, gfp); } } @@ -3883,10 +3882,34 @@ static inline void dev_put_track(struct net_device *dev, { if (dev) { netdev_tracker_free(dev, tracker); - dev_put(dev); + __dev_put(dev); } } +/** + * dev_hold - get reference to device + * @dev: network device + * + * Hold reference to device to keep it from being freed. + * Try using dev_hold_track() instead. + */ +static inline void dev_hold(struct net_device *dev) +{ + dev_hold_track(dev, NULL, GFP_ATOMIC); +} + +/** + * dev_put - release reference to device + * @dev: network device + * + * Release reference to device to allow it to be freed. + * Try using dev_put_track() instead. + */ +static inline void dev_put(struct net_device *dev) +{ + dev_put_track(dev, NULL); +} + static inline void dev_replace_track(struct net_device *odev, struct net_device *ndev, netdevice_tracker *tracker, @@ -3895,11 +3918,11 @@ static inline void dev_replace_track(struct net_device *odev, if (odev) netdev_tracker_free(odev, tracker); - dev_hold(ndev); - dev_put(odev); + __dev_hold(ndev); + __dev_put(odev); if (ndev) - netdev_tracker_alloc(ndev, tracker, gfp); + __netdev_tracker_alloc(ndev, tracker, gfp); } /* Carrier loss detection, dial on demand. The functions netif_carrier_on -- cgit From 5a8fb33e530512ee67a11b30f3451a4f030f4b01 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 4 Feb 2022 20:56:14 -0800 Subject: skmsg: convert struct sk_msg_sg::copy to a bitmap We have plans for increasing MAX_SKB_FRAGS, but sk_msg_sg::copy is currently an unsigned long, limiting MAX_SKB_FRAGS to 30 on 32bit arches. Convert it to a bitmap, as Jakub suggested. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/skmsg.h | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 18a717fe62eb..1ff68a88c58d 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -29,7 +29,7 @@ struct sk_msg_sg { u32 end; u32 size; u32 copybreak; - unsigned long copy; + DECLARE_BITMAP(copy, MAX_MSG_FRAGS + 2); /* The extra two elements: * 1) used for chaining the front and sections when the list becomes * partitioned (e.g. end < start). The crypto APIs require the @@ -38,7 +38,6 @@ struct sk_msg_sg { */ struct scatterlist data[MAX_MSG_FRAGS + 2]; }; -static_assert(BITS_PER_LONG >= NR_MSG_FRAG_IDS); /* UAPI in filter.c depends on struct sk_msg_sg being first element. */ struct sk_msg { @@ -234,7 +233,7 @@ static inline void sk_msg_compute_data_pointers(struct sk_msg *msg) { struct scatterlist *sge = sk_msg_elem(msg, msg->sg.start); - if (test_bit(msg->sg.start, &msg->sg.copy)) { + if (test_bit(msg->sg.start, msg->sg.copy)) { msg->data = NULL; msg->data_end = NULL; } else { @@ -253,7 +252,7 @@ static inline void sk_msg_page_add(struct sk_msg *msg, struct page *page, sg_set_page(sge, page, len, offset); sg_unmark_end(sge); - __set_bit(msg->sg.end, &msg->sg.copy); + __set_bit(msg->sg.end, msg->sg.copy); msg->sg.size += len; sk_msg_iter_next(msg, end); } @@ -262,9 +261,9 @@ static inline void sk_msg_sg_copy(struct sk_msg *msg, u32 i, bool copy_state) { do { if (copy_state) - __set_bit(i, &msg->sg.copy); + __set_bit(i, msg->sg.copy); else - __clear_bit(i, &msg->sg.copy); + __clear_bit(i, msg->sg.copy); sk_msg_iter_var_next(i); if (i == msg->sg.end) break; -- cgit From 88590b369354092183bcba04e2368010c462557f Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Sat, 5 Feb 2022 15:47:33 +0800 Subject: net: skb_drop_reason: add document for drop reasons Add document for following existing drop reasons: SKB_DROP_REASON_NOT_SPECIFIED SKB_DROP_REASON_NO_SOCKET SKB_DROP_REASON_PKT_TOO_SMALL SKB_DROP_REASON_TCP_CSUM SKB_DROP_REASON_SOCKET_FILTER SKB_DROP_REASON_UDP_CSUM Signed-off-by: Menglong Dong Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/skbuff.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a27bcc4f7e9a..f04e3a1f4455 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -314,12 +314,12 @@ struct sk_buff; * used to translate the reason to string. */ enum skb_drop_reason { - SKB_DROP_REASON_NOT_SPECIFIED, - SKB_DROP_REASON_NO_SOCKET, - SKB_DROP_REASON_PKT_TOO_SMALL, - SKB_DROP_REASON_TCP_CSUM, - SKB_DROP_REASON_SOCKET_FILTER, - SKB_DROP_REASON_UDP_CSUM, + SKB_DROP_REASON_NOT_SPECIFIED, /* drop reason is not specified */ + SKB_DROP_REASON_NO_SOCKET, /* socket not found */ + SKB_DROP_REASON_PKT_TOO_SMALL, /* packet size is too small */ + SKB_DROP_REASON_TCP_CSUM, /* TCP checksum error */ + SKB_DROP_REASON_SOCKET_FILTER, /* dropped by socket filter */ + SKB_DROP_REASON_UDP_CSUM, /* UDP checksum error */ SKB_DROP_REASON_MAX, }; -- cgit From 2df3041ba3be950376e8c25a8f6da22f7fcc765c Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Sat, 5 Feb 2022 15:47:34 +0800 Subject: net: netfilter: use kfree_drop_reason() for NF_DROP Replace kfree_skb() with kfree_skb_reason() in nf_hook_slow() when skb is dropped by reason of NF_DROP. Following new drop reasons are introduced: SKB_DROP_REASON_NETFILTER_DROP Signed-off-by: Menglong Dong Signed-off-by: David S. Miller --- include/linux/skbuff.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index f04e3a1f4455..9060159b4375 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -320,6 +320,7 @@ enum skb_drop_reason { SKB_DROP_REASON_TCP_CSUM, /* TCP checksum error */ SKB_DROP_REASON_SOCKET_FILTER, /* dropped by socket filter */ SKB_DROP_REASON_UDP_CSUM, /* UDP checksum error */ + SKB_DROP_REASON_NETFILTER_DROP, /* dropped by netfilter */ SKB_DROP_REASON_MAX, }; -- cgit From 33cba42985c8144eef78d618fc1e51aaa074b169 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Sat, 5 Feb 2022 15:47:35 +0800 Subject: net: ipv4: use kfree_skb_reason() in ip_rcv_core() Replace kfree_skb() with kfree_skb_reason() in ip_rcv_core(). Three new drop reasons are introduced: SKB_DROP_REASON_OTHERHOST SKB_DROP_REASON_IP_CSUM SKB_DROP_REASON_IP_INHDR Signed-off-by: Menglong Dong Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/skbuff.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 9060159b4375..8e82130b3c52 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -321,6 +321,15 @@ enum skb_drop_reason { SKB_DROP_REASON_SOCKET_FILTER, /* dropped by socket filter */ SKB_DROP_REASON_UDP_CSUM, /* UDP checksum error */ SKB_DROP_REASON_NETFILTER_DROP, /* dropped by netfilter */ + SKB_DROP_REASON_OTHERHOST, /* packet don't belong to current + * host (interface is in promisc + * mode) + */ + SKB_DROP_REASON_IP_CSUM, /* IP checksum error */ + SKB_DROP_REASON_IP_INHDR, /* there is something wrong with + * IP header (see + * IPSTATS_MIB_INHDRERRORS) + */ SKB_DROP_REASON_MAX, }; -- cgit From c1f166d1f7eef212096a98b22f5acf92f9af353d Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Sat, 5 Feb 2022 15:47:36 +0800 Subject: net: ipv4: use kfree_skb_reason() in ip_rcv_finish_core() Replace kfree_skb() with kfree_skb_reason() in ip_rcv_finish_core(), following drop reasons are introduced: SKB_DROP_REASON_IP_RPFILTER SKB_DROP_REASON_UNICAST_IN_L2_MULTICAST Signed-off-by: Menglong Dong Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/skbuff.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 8e82130b3c52..4baba45f223d 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -330,6 +330,15 @@ enum skb_drop_reason { * IP header (see * IPSTATS_MIB_INHDRERRORS) */ + SKB_DROP_REASON_IP_RPFILTER, /* IP rpfilter validate failed. + * see the document for rp_filter + * in ip-sysctl.rst for more + * information + */ + SKB_DROP_REASON_UNICAST_IN_L2_MULTICAST, /* destination address of L2 + * is multicast, but L3 is + * unicast. + */ SKB_DROP_REASON_MAX, }; -- cgit From 10580c4791902571777603cb980414892dd5f149 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Sat, 5 Feb 2022 15:47:37 +0800 Subject: net: ipv4: use kfree_skb_reason() in ip_protocol_deliver_rcu() Replace kfree_skb() with kfree_skb_reason() in ip_protocol_deliver_rcu(). Following new drop reasons are introduced: SKB_DROP_REASON_XFRM_POLICY SKB_DROP_REASON_IP_NOPROTO Signed-off-by: Menglong Dong Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/skbuff.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4baba45f223d..2a64afa97910 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -339,6 +339,8 @@ enum skb_drop_reason { * is multicast, but L3 is * unicast. */ + SKB_DROP_REASON_XFRM_POLICY, /* xfrm policy check failed */ + SKB_DROP_REASON_IP_NOPROTO, /* no support for IP protocol */ SKB_DROP_REASON_MAX, }; -- cgit From 08d4c0370c400fa6ef2194f9ee2e8dccc4a7ab39 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Sat, 5 Feb 2022 15:47:39 +0800 Subject: net: udp: use kfree_skb_reason() in __udp_queue_rcv_skb() Replace kfree_skb() with kfree_skb_reason() in __udp_queue_rcv_skb(). Following new drop reasons are introduced: SKB_DROP_REASON_SOCKET_RCVBUFF SKB_DROP_REASON_PROTO_MEM Signed-off-by: Menglong Dong Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/skbuff.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2a64afa97910..a5adbf6b51e8 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -341,6 +341,11 @@ enum skb_drop_reason { */ SKB_DROP_REASON_XFRM_POLICY, /* xfrm policy check failed */ SKB_DROP_REASON_IP_NOPROTO, /* no support for IP protocol */ + SKB_DROP_REASON_SOCKET_RCVBUFF, /* socket receive buff is full */ + SKB_DROP_REASON_PROTO_MEM, /* proto memory limition, such as + * udp packet drop out of + * udp_memory_allocated. + */ SKB_DROP_REASON_MAX, }; -- cgit From fda17afc6166e975bec1197bd94cd2a3317bce3f Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Mon, 7 Feb 2022 11:27:53 +0900 Subject: ata: libata-core: Fix ata_dev_config_cpr() The concurrent positioning ranges log page 47h is a general purpose log page and not a subpage of the indentify device log. Using ata_identify_page_supported() to test for concurrent positioning ranges support is thus wrong. ata_log_supported() must be used. Furthermore, unlike other advanced ATA features (e.g. NCQ priority), accesses to the concurrent positioning ranges log page are not gated by a feature bit from the device IDENTIFY data. Since many older drives react badly to the READ LOG EXT and/or READ LOG DMA EXT commands isued to read device log pages, avoid problems with older drives by limiting the concurrent positioning ranges support detection to drives implementing at least the ACS-4 ATA standard (major version 11). This additional condition effectively turns ata_dev_config_cpr() into a nop for older drives, avoiding problems in the field. Fixes: fe22e1c2f705 ("libata: support concurrent positioning ranges log") BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215519 Cc: stable@vger.kernel.org Reviewed-by: Hannes Reinecke Tested-by: Abderraouf Adjal Signed-off-by: Damien Le Moal --- include/linux/ata.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ata.h b/include/linux/ata.h index 199e47e97d64..21292b5bbb55 100644 --- a/include/linux/ata.h +++ b/include/linux/ata.h @@ -324,12 +324,12 @@ enum { ATA_LOG_NCQ_NON_DATA = 0x12, ATA_LOG_NCQ_SEND_RECV = 0x13, ATA_LOG_IDENTIFY_DEVICE = 0x30, + ATA_LOG_CONCURRENT_POSITIONING_RANGES = 0x47, /* Identify device log pages: */ ATA_LOG_SECURITY = 0x06, ATA_LOG_SATA_SETTINGS = 0x08, ATA_LOG_ZONED_INFORMATION = 0x09, - ATA_LOG_CONCURRENT_POSITIONING_RANGES = 0x47, /* Identify device SATA settings log:*/ ATA_LOG_DEVSLP_OFFSET = 0x30, -- cgit From ad5e35f58384179bbd3cdb349904e150f364c4f3 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Fri, 28 Jan 2022 12:34:14 +0100 Subject: mtd: Replace the expert mode symbols with a single helper Reduce the number of exported symbols by replacing: - mtd_expert_analysis_warning (the error string) - mtd_expert_analysis_mode (the boolean) with a single helper: - mtd_check_expert_analysis_mode Calling this helper will both check/return the content of the internal boolean -which is not exported anymore- and as well conditionally WARN_ONCE() the user, like it was done before. While on this function, make the error string local to the helper and set it const. Only export this helper when CONFIG_DEBUG_FS is defined to limit the growth of the Linux kernel size only for a debug feature on production kernels. Mechanically update all the consumers. Suggested-by: Geert Uytterhoeven Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20220128113414.1121924-1-miquel.raynal@bootlin.com --- include/linux/mtd/mtd.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h index 1ffa933121f6..1b3fc8c71ab3 100644 --- a/include/linux/mtd/mtd.h +++ b/include/linux/mtd/mtd.h @@ -711,7 +711,11 @@ static inline int mtd_is_bitflip_or_eccerr(int err) { unsigned mtd_mmap_capabilities(struct mtd_info *mtd); -extern char *mtd_expert_analysis_warning; -extern bool mtd_expert_analysis_mode; +#ifdef CONFIG_DEBUG_FS +bool mtd_check_expert_analysis_mode(void); +#else +static inline bool mtd_check_expert_analysis_mode(void) { return false; } +#endif + #endif /* __MTD_MTD_H__ */ -- cgit From 6bf625a4140f24b490766043b307f8252519578b Mon Sep 17 00:00:00 2001 From: Michael Kelley Date: Sun, 6 Feb 2022 11:36:56 -0800 Subject: Drivers: hv: vmbus: Rework use of DMA_BIT_MASK(64) Using DMA_BIT_MASK(64) as an initializer for a global variable causes problems with Clang 12.0.1. The compiler doesn't understand that value 64 is excluded from the shift at compile time, resulting in a build error. While this is a compiler problem, avoid the issue by setting up the dma_mask memory as part of struct hv_device, and initialize it using dma_set_mask(). Reported-by: Nathan Chancellor Reported-by: Vitaly Chikunov Reported-by: Jakub Kicinski Fixes: 743b237c3a7b ("scsi: storvsc: Add Isolation VM support for storvsc driver") Signed-off-by: Michael Kelley Reviewed-by: Nathan Chancellor Tested-by: Nathan Chancellor Link: https://lore.kernel.org/r/1644176216-12531-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu --- include/linux/hyperv.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index f565a8938836..fe2e0179ed51 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1262,6 +1262,7 @@ struct hv_device { struct vmbus_channel *channel; struct kset *channels_kset; struct device_dma_parameters dma_parms; + u64 dma_mask; /* place holder to keep track of the dir for hv device in debugfs */ struct dentry *debug_dir; -- cgit From cb1f65c1e1424a4b5e4a86da8aa3b8fd8459c8ec Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Fri, 4 Feb 2022 18:35:22 +0100 Subject: PM: s2idle: ACPI: Fix wakeup interrupts handling After commit e3728b50cd9b ("ACPI: PM: s2idle: Avoid possible race related to the EC GPE") wakeup interrupts occurring immediately after the one discarded by acpi_s2idle_wake() may be missed. Moreover, if the SCI triggers again immediately after the rearming in acpi_s2idle_wake(), that wakeup may be missed too. The problem is that pm_system_irq_wakeup() only calls pm_system_wakeup() when pm_wakeup_irq is 0, but that's not the case any more after the interrupt causing acpi_s2idle_wake() to run until pm_wakeup_irq is cleared by the pm_wakeup_clear() call in s2idle_loop(). However, there may be wakeup interrupts occurring in that time frame and if that happens, they will be missed. To address that issue first move the clearing of pm_wakeup_irq to the point at which it is known that the interrupt causing acpi_s2idle_wake() to tun will be discarded, before rearming the SCI for wakeup. Moreover, because that only reduces the size of the time window in which the issue may manifest itself, allow pm_system_irq_wakeup() to register two second wakeup interrupts in a row and, when discarding the first one, replace it with the second one. [Of course, this assumes that only one wakeup interrupt can be discarded in one go, but currently that is the case and I am not aware of any plans to change that.] Fixes: e3728b50cd9b ("ACPI: PM: s2idle: Avoid possible race related to the EC GPE") Cc: 5.4+ # 5.4+ Signed-off-by: Rafael J. Wysocki --- include/linux/suspend.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/suspend.h b/include/linux/suspend.h index 3e8ecdebe601..300273ff40cc 100644 --- a/include/linux/suspend.h +++ b/include/linux/suspend.h @@ -497,14 +497,14 @@ extern void ksys_sync_helper(void); /* drivers/base/power/wakeup.c */ extern bool events_check_enabled; -extern unsigned int pm_wakeup_irq; extern suspend_state_t pm_suspend_target_state; extern bool pm_wakeup_pending(void); extern void pm_system_wakeup(void); extern void pm_system_cancel_wakeup(void); -extern void pm_wakeup_clear(bool reset); +extern void pm_wakeup_clear(unsigned int irq_number); extern void pm_system_irq_wakeup(unsigned int irq_number); +extern unsigned int pm_wakeup_irq(void); extern bool pm_get_wakeup_count(unsigned int *count, bool block); extern bool pm_save_wakeup_count(unsigned int count); extern void pm_wakep_autosleep_enabled(bool set); -- cgit From ea4692c75e1c63926e4fb0728f5775ef0d733888 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Wed, 26 Jan 2022 01:39:41 -0800 Subject: lib/string_helpers: Consolidate string helpers implementation There are a few implementations of string helpers in the tree like yesno() that just returns "yes" or "no" depending on a boolean argument. Those are helpful to output strings to the user or log. In order to consolidate them, prefix all of them str_ prefix to make it clear what they are about and avoid symbol clashes. Taking the commoon `val ? "yes" : "no"` implementation, quite a few users of open coded yesno() could later be converted to the new function: $ git grep '?\s*"yes"\s*' | wc -l 286 $ git grep '?\s*"no"\s*' | wc -l 20 The inlined function should keep the const strings local to each compilation unit, the same way it's now, thus not changing the current behavior. Signed-off-by: Lucas De Marchi Reviewed-by: Andy Shevchenko Acked-by: Jani Nikula Acked-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220126093951.1470898-2-lucas.demarchi@intel.com --- include/linux/string_helpers.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'include/linux') diff --git a/include/linux/string_helpers.h b/include/linux/string_helpers.h index 7a22921c9db7..4d72258d42fd 100644 --- a/include/linux/string_helpers.h +++ b/include/linux/string_helpers.h @@ -106,4 +106,24 @@ void kfree_strarray(char **array, size_t n); char **devm_kasprintf_strarray(struct device *dev, const char *prefix, size_t n); +static inline const char *str_yes_no(bool v) +{ + return v ? "yes" : "no"; +} + +static inline const char *str_on_off(bool v) +{ + return v ? "on" : "off"; +} + +static inline const char *str_enable_disable(bool v) +{ + return v ? "enable" : "disable"; +} + +static inline const char *str_enabled_disabled(bool v) +{ + return v ? "enabled" : "disabled"; +} + #endif -- cgit From 7938f4218168ae9fc4bdddb15976f9ebbae41999 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Fri, 4 Feb 2022 09:05:41 -0800 Subject: dma-buf-map: Rename to iosys-map MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Rename struct dma_buf_map to struct iosys_map and corresponding APIs. Over time dma-buf-map grew up to more functionality than the one used by dma-buf: in fact it's just a shim layer to abstract system memory, that can be accessed via regular load and store, from IO memory that needs to be acessed via arch helpers. The idea is to extend this API so it can fulfill other needs, internal to a single driver. Example: in the i915 driver it's desired to share the implementation for integrated graphics, which uses mostly system memory, with discrete graphics, which may need to access IO memory. The conversion was mostly done with the following semantic patch: @r1@ @@ - struct dma_buf_map + struct iosys_map @r2@ @@ ( - DMA_BUF_MAP_INIT_VADDR + IOSYS_MAP_INIT_VADDR | - dma_buf_map_set_vaddr + iosys_map_set_vaddr | - dma_buf_map_set_vaddr_iomem + iosys_map_set_vaddr_iomem | - dma_buf_map_is_equal + iosys_map_is_equal | - dma_buf_map_is_null + iosys_map_is_null | - dma_buf_map_is_set + iosys_map_is_set | - dma_buf_map_clear + iosys_map_clear | - dma_buf_map_memcpy_to + iosys_map_memcpy_to | - dma_buf_map_incr + iosys_map_incr ) @@ @@ - #include + #include Then some files had their includes adjusted and some comments were update to remove mentions to dma-buf-map. Since this is not specific to dma-buf anymore, move the documentation to the "Bus-Independent Device Accesses" section. v2: - Squash patches v3: - Fix wrong removal of dma-buf.h from MAINTAINERS - Move documentation from dma-buf.rst to device-io.rst v4: - Change documentation title and level Signed-off-by: Lucas De Marchi Acked-by: Christian König Acked-by: Sumit Semwal Acked-by: Thomas Zimmermann Link: https://patchwork.freedesktop.org/patch/msgid/20220204170541.829227-1-lucas.demarchi@intel.com --- include/linux/dma-buf-map.h | 266 -------------------------------------------- include/linux/dma-buf.h | 12 +- include/linux/iosys-map.h | 257 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 263 insertions(+), 272 deletions(-) delete mode 100644 include/linux/dma-buf-map.h create mode 100644 include/linux/iosys-map.h (limited to 'include/linux') diff --git a/include/linux/dma-buf-map.h b/include/linux/dma-buf-map.h deleted file mode 100644 index 278d489e4bdd..000000000000 --- a/include/linux/dma-buf-map.h +++ /dev/null @@ -1,266 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Pointer to dma-buf-mapped memory, plus helpers. - */ - -#ifndef __DMA_BUF_MAP_H__ -#define __DMA_BUF_MAP_H__ - -#include -#include - -/** - * DOC: overview - * - * Calling dma-buf's vmap operation returns a pointer to the buffer's memory. - * Depending on the location of the buffer, users may have to access it with - * I/O operations or memory load/store operations. For example, copying to - * system memory could be done with memcpy(), copying to I/O memory would be - * done with memcpy_toio(). - * - * .. code-block:: c - * - * void *vaddr = ...; // pointer to system memory - * memcpy(vaddr, src, len); - * - * void *vaddr_iomem = ...; // pointer to I/O memory - * memcpy_toio(vaddr, _iomem, src, len); - * - * When using dma-buf's vmap operation, the returned pointer is encoded as - * :c:type:`struct dma_buf_map `. - * :c:type:`struct dma_buf_map ` stores the buffer's address in - * system or I/O memory and a flag that signals the required method of - * accessing the buffer. Use the returned instance and the helper functions - * to access the buffer's memory in the correct way. - * - * The type :c:type:`struct dma_buf_map ` and its helpers are - * actually independent from the dma-buf infrastructure. When sharing buffers - * among devices, drivers have to know the location of the memory to access - * the buffers in a safe way. :c:type:`struct dma_buf_map ` - * solves this problem for dma-buf and its users. If other drivers or - * sub-systems require similar functionality, the type could be generalized - * and moved to a more prominent header file. - * - * Open-coding access to :c:type:`struct dma_buf_map ` is - * considered bad style. Rather then accessing its fields directly, use one - * of the provided helper functions, or implement your own. For example, - * instances of :c:type:`struct dma_buf_map ` can be initialized - * statically with DMA_BUF_MAP_INIT_VADDR(), or at runtime with - * dma_buf_map_set_vaddr(). These helpers will set an address in system memory. - * - * .. code-block:: c - * - * struct dma_buf_map map = DMA_BUF_MAP_INIT_VADDR(0xdeadbeaf); - * - * dma_buf_map_set_vaddr(&map. 0xdeadbeaf); - * - * To set an address in I/O memory, use dma_buf_map_set_vaddr_iomem(). - * - * .. code-block:: c - * - * dma_buf_map_set_vaddr_iomem(&map. 0xdeadbeaf); - * - * Instances of struct dma_buf_map do not have to be cleaned up, but - * can be cleared to NULL with dma_buf_map_clear(). Cleared mappings - * always refer to system memory. - * - * .. code-block:: c - * - * dma_buf_map_clear(&map); - * - * Test if a mapping is valid with either dma_buf_map_is_set() or - * dma_buf_map_is_null(). - * - * .. code-block:: c - * - * if (dma_buf_map_is_set(&map) != dma_buf_map_is_null(&map)) - * // always true - * - * Instances of :c:type:`struct dma_buf_map ` can be compared - * for equality with dma_buf_map_is_equal(). Mappings the point to different - * memory spaces, system or I/O, are never equal. That's even true if both - * spaces are located in the same address space, both mappings contain the - * same address value, or both mappings refer to NULL. - * - * .. code-block:: c - * - * struct dma_buf_map sys_map; // refers to system memory - * struct dma_buf_map io_map; // refers to I/O memory - * - * if (dma_buf_map_is_equal(&sys_map, &io_map)) - * // always false - * - * A set up instance of struct dma_buf_map can be used to access or manipulate - * the buffer memory. Depending on the location of the memory, the provided - * helpers will pick the correct operations. Data can be copied into the memory - * with dma_buf_map_memcpy_to(). The address can be manipulated with - * dma_buf_map_incr(). - * - * .. code-block:: c - * - * const void *src = ...; // source buffer - * size_t len = ...; // length of src - * - * dma_buf_map_memcpy_to(&map, src, len); - * dma_buf_map_incr(&map, len); // go to first byte after the memcpy - */ - -/** - * struct dma_buf_map - Pointer to vmap'ed dma-buf memory. - * @vaddr_iomem: The buffer's address if in I/O memory - * @vaddr: The buffer's address if in system memory - * @is_iomem: True if the dma-buf memory is located in I/O - * memory, or false otherwise. - */ -struct dma_buf_map { - union { - void __iomem *vaddr_iomem; - void *vaddr; - }; - bool is_iomem; -}; - -/** - * DMA_BUF_MAP_INIT_VADDR - Initializes struct dma_buf_map to an address in system memory - * @vaddr_: A system-memory address - */ -#define DMA_BUF_MAP_INIT_VADDR(vaddr_) \ - { \ - .vaddr = (vaddr_), \ - .is_iomem = false, \ - } - -/** - * dma_buf_map_set_vaddr - Sets a dma-buf mapping structure to an address in system memory - * @map: The dma-buf mapping structure - * @vaddr: A system-memory address - * - * Sets the address and clears the I/O-memory flag. - */ -static inline void dma_buf_map_set_vaddr(struct dma_buf_map *map, void *vaddr) -{ - map->vaddr = vaddr; - map->is_iomem = false; -} - -/** - * dma_buf_map_set_vaddr_iomem - Sets a dma-buf mapping structure to an address in I/O memory - * @map: The dma-buf mapping structure - * @vaddr_iomem: An I/O-memory address - * - * Sets the address and the I/O-memory flag. - */ -static inline void dma_buf_map_set_vaddr_iomem(struct dma_buf_map *map, - void __iomem *vaddr_iomem) -{ - map->vaddr_iomem = vaddr_iomem; - map->is_iomem = true; -} - -/** - * dma_buf_map_is_equal - Compares two dma-buf mapping structures for equality - * @lhs: The dma-buf mapping structure - * @rhs: A dma-buf mapping structure to compare with - * - * Two dma-buf mapping structures are equal if they both refer to the same type of memory - * and to the same address within that memory. - * - * Returns: - * True is both structures are equal, or false otherwise. - */ -static inline bool dma_buf_map_is_equal(const struct dma_buf_map *lhs, - const struct dma_buf_map *rhs) -{ - if (lhs->is_iomem != rhs->is_iomem) - return false; - else if (lhs->is_iomem) - return lhs->vaddr_iomem == rhs->vaddr_iomem; - else - return lhs->vaddr == rhs->vaddr; -} - -/** - * dma_buf_map_is_null - Tests for a dma-buf mapping to be NULL - * @map: The dma-buf mapping structure - * - * Depending on the state of struct dma_buf_map.is_iomem, tests if the - * mapping is NULL. - * - * Returns: - * True if the mapping is NULL, or false otherwise. - */ -static inline bool dma_buf_map_is_null(const struct dma_buf_map *map) -{ - if (map->is_iomem) - return !map->vaddr_iomem; - return !map->vaddr; -} - -/** - * dma_buf_map_is_set - Tests is the dma-buf mapping has been set - * @map: The dma-buf mapping structure - * - * Depending on the state of struct dma_buf_map.is_iomem, tests if the - * mapping has been set. - * - * Returns: - * True if the mapping is been set, or false otherwise. - */ -static inline bool dma_buf_map_is_set(const struct dma_buf_map *map) -{ - return !dma_buf_map_is_null(map); -} - -/** - * dma_buf_map_clear - Clears a dma-buf mapping structure - * @map: The dma-buf mapping structure - * - * Clears all fields to zero; including struct dma_buf_map.is_iomem. So - * mapping structures that were set to point to I/O memory are reset for - * system memory. Pointers are cleared to NULL. This is the default. - */ -static inline void dma_buf_map_clear(struct dma_buf_map *map) -{ - if (map->is_iomem) { - map->vaddr_iomem = NULL; - map->is_iomem = false; - } else { - map->vaddr = NULL; - } -} - -/** - * dma_buf_map_memcpy_to - Memcpy into dma-buf mapping - * @dst: The dma-buf mapping structure - * @src: The source buffer - * @len: The number of byte in src - * - * Copies data into a dma-buf mapping. The source buffer is in system - * memory. Depending on the buffer's location, the helper picks the correct - * method of accessing the memory. - */ -static inline void dma_buf_map_memcpy_to(struct dma_buf_map *dst, const void *src, size_t len) -{ - if (dst->is_iomem) - memcpy_toio(dst->vaddr_iomem, src, len); - else - memcpy(dst->vaddr, src, len); -} - -/** - * dma_buf_map_incr - Increments the address stored in a dma-buf mapping - * @map: The dma-buf mapping structure - * @incr: The number of bytes to increment - * - * Increments the address stored in a dma-buf mapping. Depending on the - * buffer's location, the correct value will be updated. - */ -static inline void dma_buf_map_incr(struct dma_buf_map *map, size_t incr) -{ - if (map->is_iomem) - map->vaddr_iomem += incr; - else - map->vaddr += incr; -} - -#endif /* __DMA_BUF_MAP_H__ */ diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 7ab50076e7a6..2097760e8e95 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -13,7 +13,7 @@ #ifndef __DMA_BUF_H__ #define __DMA_BUF_H__ -#include +#include #include #include #include @@ -283,8 +283,8 @@ struct dma_buf_ops { */ int (*mmap)(struct dma_buf *, struct vm_area_struct *vma); - int (*vmap)(struct dma_buf *dmabuf, struct dma_buf_map *map); - void (*vunmap)(struct dma_buf *dmabuf, struct dma_buf_map *map); + int (*vmap)(struct dma_buf *dmabuf, struct iosys_map *map); + void (*vunmap)(struct dma_buf *dmabuf, struct iosys_map *map); }; /** @@ -347,7 +347,7 @@ struct dma_buf { * @vmap_ptr: * The current vmap ptr if @vmapping_counter > 0. Protected by @lock. */ - struct dma_buf_map vmap_ptr; + struct iosys_map vmap_ptr; /** * @exp_name: @@ -628,6 +628,6 @@ int dma_buf_end_cpu_access(struct dma_buf *dma_buf, int dma_buf_mmap(struct dma_buf *, struct vm_area_struct *, unsigned long); -int dma_buf_vmap(struct dma_buf *dmabuf, struct dma_buf_map *map); -void dma_buf_vunmap(struct dma_buf *dmabuf, struct dma_buf_map *map); +int dma_buf_vmap(struct dma_buf *dmabuf, struct iosys_map *map); +void dma_buf_vunmap(struct dma_buf *dmabuf, struct iosys_map *map); #endif /* __DMA_BUF_H__ */ diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h new file mode 100644 index 000000000000..f4186f91caa6 --- /dev/null +++ b/include/linux/iosys-map.h @@ -0,0 +1,257 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Pointer abstraction for IO/system memory + */ + +#ifndef __IOSYS_MAP_H__ +#define __IOSYS_MAP_H__ + +#include +#include + +/** + * DOC: overview + * + * When accessing a memory region, depending on its location, users may have to + * access it with I/O operations or memory load/store operations. For example, + * copying to system memory could be done with memcpy(), copying to I/O memory + * would be done with memcpy_toio(). + * + * .. code-block:: c + * + * void *vaddr = ...; // pointer to system memory + * memcpy(vaddr, src, len); + * + * void *vaddr_iomem = ...; // pointer to I/O memory + * memcpy_toio(vaddr, _iomem, src, len); + * + * The user of such pointer may not have information about the mapping of that + * region or may want to have a single code path to handle operations on that + * buffer, regardless if it's located in system or IO memory. The type + * :c:type:`struct iosys_map ` and its helpers abstract that so the + * buffer can be passed around to other drivers or have separate duties inside + * the same driver for allocation, read and write operations. + * + * Open-coding access to :c:type:`struct iosys_map ` is considered + * bad style. Rather then accessing its fields directly, use one of the provided + * helper functions, or implement your own. For example, instances of + * :c:type:`struct iosys_map ` can be initialized statically with + * IOSYS_MAP_INIT_VADDR(), or at runtime with iosys_map_set_vaddr(). These + * helpers will set an address in system memory. + * + * .. code-block:: c + * + * struct iosys_map map = IOSYS_MAP_INIT_VADDR(0xdeadbeaf); + * + * iosys_map_set_vaddr(&map, 0xdeadbeaf); + * + * To set an address in I/O memory, use iosys_map_set_vaddr_iomem(). + * + * .. code-block:: c + * + * iosys_map_set_vaddr_iomem(&map, 0xdeadbeaf); + * + * Instances of struct iosys_map do not have to be cleaned up, but + * can be cleared to NULL with iosys_map_clear(). Cleared mappings + * always refer to system memory. + * + * .. code-block:: c + * + * iosys_map_clear(&map); + * + * Test if a mapping is valid with either iosys_map_is_set() or + * iosys_map_is_null(). + * + * .. code-block:: c + * + * if (iosys_map_is_set(&map) != iosys_map_is_null(&map)) + * // always true + * + * Instances of :c:type:`struct iosys_map ` can be compared for + * equality with iosys_map_is_equal(). Mappings that point to different memory + * spaces, system or I/O, are never equal. That's even true if both spaces are + * located in the same address space, both mappings contain the same address + * value, or both mappings refer to NULL. + * + * .. code-block:: c + * + * struct iosys_map sys_map; // refers to system memory + * struct iosys_map io_map; // refers to I/O memory + * + * if (iosys_map_is_equal(&sys_map, &io_map)) + * // always false + * + * A set up instance of struct iosys_map can be used to access or manipulate the + * buffer memory. Depending on the location of the memory, the provided helpers + * will pick the correct operations. Data can be copied into the memory with + * iosys_map_memcpy_to(). The address can be manipulated with iosys_map_incr(). + * + * .. code-block:: c + * + * const void *src = ...; // source buffer + * size_t len = ...; // length of src + * + * iosys_map_memcpy_to(&map, src, len); + * iosys_map_incr(&map, len); // go to first byte after the memcpy + */ + +/** + * struct iosys_map - Pointer to IO/system memory + * @vaddr_iomem: The buffer's address if in I/O memory + * @vaddr: The buffer's address if in system memory + * @is_iomem: True if the buffer is located in I/O memory, or false + * otherwise. + */ +struct iosys_map { + union { + void __iomem *vaddr_iomem; + void *vaddr; + }; + bool is_iomem; +}; + +/** + * IOSYS_MAP_INIT_VADDR - Initializes struct iosys_map to an address in system memory + * @vaddr_: A system-memory address + */ +#define IOSYS_MAP_INIT_VADDR(vaddr_) \ + { \ + .vaddr = (vaddr_), \ + .is_iomem = false, \ + } + +/** + * iosys_map_set_vaddr - Sets a iosys mapping structure to an address in system memory + * @map: The iosys_map structure + * @vaddr: A system-memory address + * + * Sets the address and clears the I/O-memory flag. + */ +static inline void iosys_map_set_vaddr(struct iosys_map *map, void *vaddr) +{ + map->vaddr = vaddr; + map->is_iomem = false; +} + +/** + * iosys_map_set_vaddr_iomem - Sets a iosys mapping structure to an address in I/O memory + * @map: The iosys_map structure + * @vaddr_iomem: An I/O-memory address + * + * Sets the address and the I/O-memory flag. + */ +static inline void iosys_map_set_vaddr_iomem(struct iosys_map *map, + void __iomem *vaddr_iomem) +{ + map->vaddr_iomem = vaddr_iomem; + map->is_iomem = true; +} + +/** + * iosys_map_is_equal - Compares two iosys mapping structures for equality + * @lhs: The iosys_map structure + * @rhs: A iosys_map structure to compare with + * + * Two iosys mapping structures are equal if they both refer to the same type of memory + * and to the same address within that memory. + * + * Returns: + * True is both structures are equal, or false otherwise. + */ +static inline bool iosys_map_is_equal(const struct iosys_map *lhs, + const struct iosys_map *rhs) +{ + if (lhs->is_iomem != rhs->is_iomem) + return false; + else if (lhs->is_iomem) + return lhs->vaddr_iomem == rhs->vaddr_iomem; + else + return lhs->vaddr == rhs->vaddr; +} + +/** + * iosys_map_is_null - Tests for a iosys mapping to be NULL + * @map: The iosys_map structure + * + * Depending on the state of struct iosys_map.is_iomem, tests if the + * mapping is NULL. + * + * Returns: + * True if the mapping is NULL, or false otherwise. + */ +static inline bool iosys_map_is_null(const struct iosys_map *map) +{ + if (map->is_iomem) + return !map->vaddr_iomem; + return !map->vaddr; +} + +/** + * iosys_map_is_set - Tests if the iosys mapping has been set + * @map: The iosys_map structure + * + * Depending on the state of struct iosys_map.is_iomem, tests if the + * mapping has been set. + * + * Returns: + * True if the mapping is been set, or false otherwise. + */ +static inline bool iosys_map_is_set(const struct iosys_map *map) +{ + return !iosys_map_is_null(map); +} + +/** + * iosys_map_clear - Clears a iosys mapping structure + * @map: The iosys_map structure + * + * Clears all fields to zero, including struct iosys_map.is_iomem, so + * mapping structures that were set to point to I/O memory are reset for + * system memory. Pointers are cleared to NULL. This is the default. + */ +static inline void iosys_map_clear(struct iosys_map *map) +{ + if (map->is_iomem) { + map->vaddr_iomem = NULL; + map->is_iomem = false; + } else { + map->vaddr = NULL; + } +} + +/** + * iosys_map_memcpy_to - Memcpy into iosys mapping + * @dst: The iosys_map structure + * @src: The source buffer + * @len: The number of byte in src + * + * Copies data into a iosys mapping. The source buffer is in system + * memory. Depending on the buffer's location, the helper picks the correct + * method of accessing the memory. + */ +static inline void iosys_map_memcpy_to(struct iosys_map *dst, const void *src, + size_t len) +{ + if (dst->is_iomem) + memcpy_toio(dst->vaddr_iomem, src, len); + else + memcpy(dst->vaddr, src, len); +} + +/** + * iosys_map_incr - Increments the address stored in a iosys mapping + * @map: The iosys_map structure + * @incr: The number of bytes to increment + * + * Increments the address stored in a iosys mapping. Depending on the + * buffer's location, the correct value will be updated. + */ +static inline void iosys_map_incr(struct iosys_map *map, size_t incr) +{ + if (map->is_iomem) + map->vaddr_iomem += incr; + else + map->vaddr += incr; +} + +#endif /* __IOSYS_MAP_H__ */ -- cgit From 3486bedd99196ecdfe99c0ab5b67ad3c47e8a8fa Mon Sep 17 00:00:00 2001 From: Song Liu Date: Fri, 4 Feb 2022 10:57:35 -0800 Subject: bpf: Use bytes instead of pages for bpf_jit_[charge|uncharge]_modmem This enables sub-page memory charge and allocation. Signed-off-by: Song Liu Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220204185742.271030-3-song@kernel.org --- include/linux/bpf.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 6eb0b180d33b..366f88afd56b 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -846,8 +846,8 @@ void bpf_image_ksym_add(void *data, struct bpf_ksym *ksym); void bpf_image_ksym_del(struct bpf_ksym *ksym); void bpf_ksym_add(struct bpf_ksym *ksym); void bpf_ksym_del(struct bpf_ksym *ksym); -int bpf_jit_charge_modmem(u32 pages); -void bpf_jit_uncharge_modmem(u32 pages); +int bpf_jit_charge_modmem(u32 size); +void bpf_jit_uncharge_modmem(u32 size); bool bpf_prog_has_trampoline(const struct bpf_prog *prog); #else static inline int bpf_trampoline_link_prog(struct bpf_prog *prog, -- cgit From ed2d9e1a26cca963ff5ed3b76326d70f7d8201a9 Mon Sep 17 00:00:00 2001 From: Song Liu Date: Fri, 4 Feb 2022 10:57:36 -0800 Subject: bpf: Use size instead of pages in bpf_binary_header This is necessary to charge sub page memory for the BPF program. Signed-off-by: Song Liu Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220204185742.271030-4-song@kernel.org --- include/linux/filter.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/filter.h b/include/linux/filter.h index d23e999dc032..5855eb474c62 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -548,7 +548,7 @@ struct sock_fprog_kern { #define BPF_IMAGE_ALIGNMENT 8 struct bpf_binary_header { - u32 pages; + u32 size; u8 image[] __aligned(BPF_IMAGE_ALIGNMENT); }; @@ -886,8 +886,8 @@ static inline void bpf_prog_lock_ro(struct bpf_prog *fp) static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr) { set_vm_flush_reset_perms(hdr); - set_memory_ro((unsigned long)hdr, hdr->pages); - set_memory_x((unsigned long)hdr, hdr->pages); + set_memory_ro((unsigned long)hdr, hdr->size >> PAGE_SHIFT); + set_memory_x((unsigned long)hdr, hdr->size >> PAGE_SHIFT); } static inline struct bpf_binary_header * -- cgit From ebc1415d9b4f043cef5a1fb002ec316e32167e7a Mon Sep 17 00:00:00 2001 From: Song Liu Date: Fri, 4 Feb 2022 10:57:39 -0800 Subject: bpf: Introduce bpf_arch_text_copy This will be used to copy JITed text to RO protected module memory. On x86, bpf_arch_text_copy is implemented with text_poke_copy. bpf_arch_text_copy returns pointer to dst on success, and ERR_PTR(errno) on errors. Signed-off-by: Song Liu Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220204185742.271030-7-song@kernel.org --- include/linux/bpf.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 366f88afd56b..ea0d7fd4a410 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2362,6 +2362,8 @@ enum bpf_text_poke_type { int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, void *addr1, void *addr2); +void *bpf_arch_text_copy(void *dst, void *src, size_t len); + struct btf_id_set; bool btf_id_set_contains(const struct btf_id_set *set, u32 id); -- cgit From 33c9805860e584b194199cab1a1e81f4e6395408 Mon Sep 17 00:00:00 2001 From: Song Liu Date: Fri, 4 Feb 2022 10:57:41 -0800 Subject: bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free] This is the jit binary allocator built on top of bpf_prog_pack. bpf_prog_pack allocates RO memory, which cannot be used directly by the JIT engine. Therefore, a temporary rw buffer is allocated for the JIT engine. Once JIT is done, bpf_jit_binary_pack_finalize is used to copy the program to the RO memory. bpf_jit_binary_pack_alloc reserves 16 bytes of extra space for illegal instructions, which is small than the 128 bytes space reserved by bpf_jit_binary_alloc. This change is necessary for bpf_jit_binary_hdr to find the correct header. Also, flag use_bpf_prog_pack is added to differentiate a program allocated by bpf_jit_binary_pack_alloc. Signed-off-by: Song Liu Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220204185742.271030-9-song@kernel.org --- include/linux/bpf.h | 1 + include/linux/filter.h | 21 ++++++++++++--------- 2 files changed, 13 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index ea0d7fd4a410..2fc7e5c5ef41 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -953,6 +953,7 @@ struct bpf_prog_aux { bool sleepable; bool tail_call_reachable; bool xdp_has_frags; + bool use_bpf_prog_pack; struct hlist_node tramp_hlist; /* BTF_KIND_FUNC_PROTO for valid attach_btf_id */ const struct btf_type *attach_func_proto; diff --git a/include/linux/filter.h b/include/linux/filter.h index 5855eb474c62..1cb1af917617 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -890,15 +890,6 @@ static inline void bpf_jit_binary_lock_ro(struct bpf_binary_header *hdr) set_memory_x((unsigned long)hdr, hdr->size >> PAGE_SHIFT); } -static inline struct bpf_binary_header * -bpf_jit_binary_hdr(const struct bpf_prog *fp) -{ - unsigned long real_start = (unsigned long)fp->bpf_func; - unsigned long addr = real_start & PAGE_MASK; - - return (void *)addr; -} - int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap); static inline int sk_filter(struct sock *sk, struct sk_buff *skb) { @@ -1068,6 +1059,18 @@ void *bpf_jit_alloc_exec(unsigned long size); void bpf_jit_free_exec(void *addr); void bpf_jit_free(struct bpf_prog *fp); +struct bpf_binary_header * +bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **ro_image, + unsigned int alignment, + struct bpf_binary_header **rw_hdr, + u8 **rw_image, + bpf_jit_fill_hole_t bpf_fill_ill_insns); +int bpf_jit_binary_pack_finalize(struct bpf_prog *prog, + struct bpf_binary_header *ro_header, + struct bpf_binary_header *rw_header); +void bpf_jit_binary_pack_free(struct bpf_binary_header *ro_header, + struct bpf_binary_header *rw_header); + int bpf_jit_add_poke_descriptor(struct bpf_prog *prog, struct bpf_jit_poke_descriptor *poke); -- cgit From 976b6d97c62347df3e686f60a5f455bb8ed6ea23 Mon Sep 17 00:00:00 2001 From: Christian König Date: Wed, 19 Jan 2022 11:17:32 +0100 Subject: dma-buf: consolidate dma_fence subclass checking MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Consolidate the wrapper functions to check for dma_fence subclasses in the dma_fence header. This makes it easier to document and also check the different requirements for fence containers in the subclasses. Signed-off-by: Christian König Reviewed-by: Thomas Hellström Link: https://patchwork.freedesktop.org/patch/msgid/20220204100429.2049-2-christian.koenig@amd.com --- include/linux/dma-fence-array.h | 15 +-------------- include/linux/dma-fence-chain.h | 3 +-- include/linux/dma-fence.h | 38 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 40 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h index 303dd712220f..fec374f69e12 100644 --- a/include/linux/dma-fence-array.h +++ b/include/linux/dma-fence-array.h @@ -45,19 +45,6 @@ struct dma_fence_array { struct irq_work work; }; -extern const struct dma_fence_ops dma_fence_array_ops; - -/** - * dma_fence_is_array - check if a fence is from the array subsclass - * @fence: fence to test - * - * Return true if it is a dma_fence_array and false otherwise. - */ -static inline bool dma_fence_is_array(struct dma_fence *fence) -{ - return fence->ops == &dma_fence_array_ops; -} - /** * to_dma_fence_array - cast a fence to a dma_fence_array * @fence: fence to cast to a dma_fence_array @@ -68,7 +55,7 @@ static inline bool dma_fence_is_array(struct dma_fence *fence) static inline struct dma_fence_array * to_dma_fence_array(struct dma_fence *fence) { - if (fence->ops != &dma_fence_array_ops) + if (!fence || !dma_fence_is_array(fence)) return NULL; return container_of(fence, struct dma_fence_array, base); diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h index 54fe3443fd2c..ee906b659694 100644 --- a/include/linux/dma-fence-chain.h +++ b/include/linux/dma-fence-chain.h @@ -49,7 +49,6 @@ struct dma_fence_chain { spinlock_t lock; }; -extern const struct dma_fence_ops dma_fence_chain_ops; /** * to_dma_fence_chain - cast a fence to a dma_fence_chain @@ -61,7 +60,7 @@ extern const struct dma_fence_ops dma_fence_chain_ops; static inline struct dma_fence_chain * to_dma_fence_chain(struct dma_fence *fence) { - if (!fence || fence->ops != &dma_fence_chain_ops) + if (!fence || !dma_fence_is_chain(fence)) return NULL; return container_of(fence, struct dma_fence_chain, base); diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h index 1ea691753bd3..775cdc0b4f24 100644 --- a/include/linux/dma-fence.h +++ b/include/linux/dma-fence.h @@ -587,4 +587,42 @@ struct dma_fence *dma_fence_get_stub(void); struct dma_fence *dma_fence_allocate_private_stub(void); u64 dma_fence_context_alloc(unsigned num); +extern const struct dma_fence_ops dma_fence_array_ops; +extern const struct dma_fence_ops dma_fence_chain_ops; + +/** + * dma_fence_is_array - check if a fence is from the array subclass + * @fence: the fence to test + * + * Return true if it is a dma_fence_array and false otherwise. + */ +static inline bool dma_fence_is_array(struct dma_fence *fence) +{ + return fence->ops == &dma_fence_array_ops; +} + +/** + * dma_fence_is_chain - check if a fence is from the chain subclass + * @fence: the fence to test + * + * Return true if it is a dma_fence_chain and false otherwise. + */ +static inline bool dma_fence_is_chain(struct dma_fence *fence) +{ + return fence->ops == &dma_fence_chain_ops; +} + +/** + * dma_fence_is_container - check if a fence is a container for other fences + * @fence: the fence to test + * + * Return true if this fence is a container for other fences, false otherwise. + * This is important since we can't build up large fence structure or otherwise + * we run into recursion during operation on those fences. + */ +static inline bool dma_fence_is_container(struct dma_fence *fence) +{ + return dma_fence_is_array(fence) || dma_fence_is_chain(fence); +} + #endif /* __LINUX_DMA_FENCE_H */ -- cgit From 18f5fad275efef015226ee4f90eae34d8f44aa5e Mon Sep 17 00:00:00 2001 From: Christian König Date: Thu, 20 Jan 2022 11:42:40 +0100 Subject: dma-buf: add dma_fence_chain_contained helper MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit It's a reoccurring pattern that we need to extract the fence from a dma_fence_chain object. Add a helper for this. Signed-off-by: Christian König Reviewed-by: Thomas Hellström Link: https://patchwork.freedesktop.org/patch/msgid/20220204100429.2049-6-christian.koenig@amd.com --- include/linux/dma-fence-chain.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h index ee906b659694..10d51bcdf7b7 100644 --- a/include/linux/dma-fence-chain.h +++ b/include/linux/dma-fence-chain.h @@ -66,6 +66,21 @@ to_dma_fence_chain(struct dma_fence *fence) return container_of(fence, struct dma_fence_chain, base); } +/** + * dma_fence_chain_contained - return the contained fence + * @fence: the fence to test + * + * If the fence is a dma_fence_chain the function returns the fence contained + * inside the chain object, otherwise it returns the fence itself. + */ +static inline struct dma_fence * +dma_fence_chain_contained(struct dma_fence *fence) +{ + struct dma_fence_chain *chain = to_dma_fence_chain(fence); + + return chain ? chain->fence : fence; +} + /** * dma_fence_chain_alloc * -- cgit From 5913eb45d036288babb67e97b210233537ef7b90 Mon Sep 17 00:00:00 2001 From: Alistair Francis Date: Mon, 24 Jan 2022 22:10:04 +1000 Subject: mfd: simple-mfd-i2c: Enable support for the silergy,sy7636a Signed-off-by: Alistair Francis Signed-off-by: Lee Jones --- include/linux/mfd/sy7636a.h | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) create mode 100644 include/linux/mfd/sy7636a.h (limited to 'include/linux') diff --git a/include/linux/mfd/sy7636a.h b/include/linux/mfd/sy7636a.h new file mode 100644 index 000000000000..22f03b2f851e --- /dev/null +++ b/include/linux/mfd/sy7636a.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Functions to access SY3686A power management chip. + * + * Copyright (C) 2021 reMarkable AS - http://www.remarkable.com/ + */ + +#ifndef __MFD_SY7636A_H +#define __MFD_SY7636A_H + +#define SY7636A_REG_OPERATION_MODE_CRL 0x00 +/* It is set if a gpio is used to control the regulator */ +#define SY7636A_OPERATION_MODE_CRL_VCOMCTL BIT(6) +#define SY7636A_OPERATION_MODE_CRL_ONOFF BIT(7) +#define SY7636A_REG_VCOM_ADJUST_CTRL_L 0x01 +#define SY7636A_REG_VCOM_ADJUST_CTRL_H 0x02 +#define SY7636A_REG_VCOM_ADJUST_CTRL_MASK 0x01ff +#define SY7636A_REG_VLDO_VOLTAGE_ADJULST_CTRL 0x03 +#define SY7636A_REG_POWER_ON_DELAY_TIME 0x06 +#define SY7636A_REG_FAULT_FLAG 0x07 +#define SY7636A_FAULT_FLAG_PG BIT(0) +#define SY7636A_REG_TERMISTOR_READOUT 0x08 + +#define SY7636A_REG_MAX 0x08 + +#define VCOM_ADJUST_CTRL_MASK 0x1ff +// Used to shift the high byte +#define VCOM_ADJUST_CTRL_SHIFT 8 +// Used to scale from VCOM_ADJUST_CTRL to mv +#define VCOM_ADJUST_CTRL_SCAL 10000 + +#define FAULT_FLAG_SHIFT 1 + +#endif /* __LINUX_MFD_SY7636A_H */ -- cgit From fac608138c6136126faadafa5554cc0bbabf3c44 Mon Sep 17 00:00:00 2001 From: Jorgen Hansen Date: Mon, 7 Feb 2022 02:27:18 -0800 Subject: VMCI: dma dg: whitespace formatting change for vmci register defines Update formatting of existing register defines in preparation for adding additional register definitions for the VMCI device. Reviewed-by: Vishnu Dasa Signed-off-by: Jorgen Hansen Link: https://lore.kernel.org/r/20220207102725.2742-2-jhansen@vmware.com Signed-off-by: Greg Kroah-Hartman --- include/linux/vmw_vmci_defs.h | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vmw_vmci_defs.h b/include/linux/vmw_vmci_defs.h index e36cb114c188..9911ecfc18ba 100644 --- a/include/linux/vmw_vmci_defs.h +++ b/include/linux/vmw_vmci_defs.h @@ -12,15 +12,15 @@ #include /* Register offsets. */ -#define VMCI_STATUS_ADDR 0x00 -#define VMCI_CONTROL_ADDR 0x04 -#define VMCI_ICR_ADDR 0x08 -#define VMCI_IMR_ADDR 0x0c -#define VMCI_DATA_OUT_ADDR 0x10 -#define VMCI_DATA_IN_ADDR 0x14 -#define VMCI_CAPS_ADDR 0x18 -#define VMCI_RESULT_LOW_ADDR 0x1c -#define VMCI_RESULT_HIGH_ADDR 0x20 +#define VMCI_STATUS_ADDR 0x00 +#define VMCI_CONTROL_ADDR 0x04 +#define VMCI_ICR_ADDR 0x08 +#define VMCI_IMR_ADDR 0x0c +#define VMCI_DATA_OUT_ADDR 0x10 +#define VMCI_DATA_IN_ADDR 0x14 +#define VMCI_CAPS_ADDR 0x18 +#define VMCI_RESULT_LOW_ADDR 0x1c +#define VMCI_RESULT_HIGH_ADDR 0x20 /* Max number of devices. */ #define VMCI_MAX_DEVICES 1 -- cgit From e283a0e8b7ea83915e988ed059384af166b444c0 Mon Sep 17 00:00:00 2001 From: Jorgen Hansen Date: Mon, 7 Feb 2022 02:27:19 -0800 Subject: VMCI: dma dg: add MMIO access to registers Detect the support for MMIO access through examination of the length of the region requested in BAR1. If it is 256KB, the VMCI device supports MMIO access to registers. If MMIO access is supported, map the area of the region used for MMIO access (64KB size at offset 128KB). Add wrapper functions for accessing 32 bit register accesses through either MMIO or IO ports based on device configuration. Sending and receiving datagrams through iowrite8_rep/ioread8_rep is left unchanged for now, and will be addressed in a later change. Reviewed-by: Vishnu Dasa Signed-off-by: Jorgen Hansen Link: https://lore.kernel.org/r/20220207102725.2742-3-jhansen@vmware.com Signed-off-by: Greg Kroah-Hartman --- include/linux/vmw_vmci_defs.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vmw_vmci_defs.h b/include/linux/vmw_vmci_defs.h index 9911ecfc18ba..8fc00e2685cf 100644 --- a/include/linux/vmw_vmci_defs.h +++ b/include/linux/vmw_vmci_defs.h @@ -82,6 +82,18 @@ enum { */ #define VMCI_MAX_PINNED_QP_MEMORY ((size_t)(32 * 1024)) +/* + * The version of the VMCI device that supports MMIO access to registers + * requests 256KB for BAR1 whereas the version of VMCI that supports + * MSI/MSI-X only requests 8KB. The layout of the larger 256KB region is: + * - the first 128KB are used for MSI/MSI-X. + * - the following 64KB are used for MMIO register access. + * - the remaining 64KB are unused. + */ +#define VMCI_WITH_MMIO_ACCESS_BAR_SIZE ((size_t)(256 * 1024)) +#define VMCI_MMIO_ACCESS_OFFSET ((size_t)(128 * 1024)) +#define VMCI_MMIO_ACCESS_SIZE ((size_t)(64 * 1024)) + /* * We have a fixed set of resource IDs available in the VMX. * This allows us to have a very simple implementation since we statically -- cgit From eed2298d936087a1c85e0fa6f7170028e4f4fded Mon Sep 17 00:00:00 2001 From: Jorgen Hansen Date: Mon, 7 Feb 2022 02:27:20 -0800 Subject: VMCI: dma dg: detect DMA datagram capability Detect the VMCI DMA datagram capability, and if present, ack it to the device. Reviewed-by: Vishnu Dasa Signed-off-by: Jorgen Hansen Link: https://lore.kernel.org/r/20220207102725.2742-4-jhansen@vmware.com Signed-off-by: Greg Kroah-Hartman --- include/linux/vmw_vmci_defs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/vmw_vmci_defs.h b/include/linux/vmw_vmci_defs.h index 8fc00e2685cf..1ce2cffdc3ae 100644 --- a/include/linux/vmw_vmci_defs.h +++ b/include/linux/vmw_vmci_defs.h @@ -39,6 +39,7 @@ #define VMCI_CAPS_DATAGRAM BIT(2) #define VMCI_CAPS_NOTIFICATIONS BIT(3) #define VMCI_CAPS_PPN64 BIT(4) +#define VMCI_CAPS_DMA_DATAGRAM BIT(5) /* Interrupt Cause register bits. */ #define VMCI_ICR_DATAGRAM BIT(0) -- cgit From 8cb520bea1470ca205980fbf030ed1f472f4af2f Mon Sep 17 00:00:00 2001 From: Jorgen Hansen Date: Mon, 7 Feb 2022 02:27:21 -0800 Subject: VMCI: dma dg: set OS page size Tell the device the page size used by the OS. Reviewed-by: Vishnu Dasa Signed-off-by: Jorgen Hansen Link: https://lore.kernel.org/r/20220207102725.2742-5-jhansen@vmware.com Signed-off-by: Greg Kroah-Hartman --- include/linux/vmw_vmci_defs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/vmw_vmci_defs.h b/include/linux/vmw_vmci_defs.h index 1ce2cffdc3ae..4167779469fd 100644 --- a/include/linux/vmw_vmci_defs.h +++ b/include/linux/vmw_vmci_defs.h @@ -21,6 +21,7 @@ #define VMCI_CAPS_ADDR 0x18 #define VMCI_RESULT_LOW_ADDR 0x1c #define VMCI_RESULT_HIGH_ADDR 0x20 +#define VMCI_GUEST_PAGE_SHIFT 0x34 /* Max number of devices. */ #define VMCI_MAX_DEVICES 1 -- cgit From cc68f2177fcbfe2dbe5e9514789b96ba5995ec1e Mon Sep 17 00:00:00 2001 From: Jorgen Hansen Date: Mon, 7 Feb 2022 02:27:22 -0800 Subject: VMCI: dma dg: register dummy IRQ handlers for DMA datagrams Register dummy interrupt handlers for DMA datagrams in preparation for DMA datagram receive operations. Reviewed-by: Vishnu Dasa Signed-off-by: Jorgen Hansen Link: https://lore.kernel.org/r/20220207102725.2742-6-jhansen@vmware.com Signed-off-by: Greg Kroah-Hartman --- include/linux/vmw_vmci_defs.h | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vmw_vmci_defs.h b/include/linux/vmw_vmci_defs.h index 4167779469fd..2b70c024dacb 100644 --- a/include/linux/vmw_vmci_defs.h +++ b/include/linux/vmw_vmci_defs.h @@ -45,13 +45,22 @@ /* Interrupt Cause register bits. */ #define VMCI_ICR_DATAGRAM BIT(0) #define VMCI_ICR_NOTIFICATION BIT(1) +#define VMCI_ICR_DMA_DATAGRAM BIT(2) /* Interrupt Mask register bits. */ #define VMCI_IMR_DATAGRAM BIT(0) #define VMCI_IMR_NOTIFICATION BIT(1) +#define VMCI_IMR_DMA_DATAGRAM BIT(2) -/* Maximum MSI/MSI-X interrupt vectors in the device. */ -#define VMCI_MAX_INTRS 2 +/* + * Maximum MSI/MSI-X interrupt vectors in the device. + * If VMCI_CAPS_DMA_DATAGRAM is supported by the device, + * VMCI_MAX_INTRS_DMA_DATAGRAM vectors are available, + * otherwise only VMCI_MAX_INTRS_NOTIFICATION. + */ +#define VMCI_MAX_INTRS_NOTIFICATION 2 +#define VMCI_MAX_INTRS_DMA_DATAGRAM 3 +#define VMCI_MAX_INTRS VMCI_MAX_INTRS_DMA_DATAGRAM /* * Supported interrupt vectors. There is one for each ICR value above, @@ -60,6 +69,7 @@ enum { VMCI_INTR_DATAGRAM = 0, VMCI_INTR_NOTIFICATION = 1, + VMCI_INTR_DMA_DATAGRAM = 2, }; /* -- cgit From 5ee109828e73bbe4213c373988608d8f33e03d78 Mon Sep 17 00:00:00 2001 From: Jorgen Hansen Date: Mon, 7 Feb 2022 02:27:23 -0800 Subject: VMCI: dma dg: allocate send and receive buffers for DMA datagrams If DMA datagrams are used, allocate send and receive buffers in coherent DMA memory. This is done in preparation for the send and receive datagram operations, where the buffers are used for the exchange of data between driver and device. Reviewed-by: Vishnu Dasa Signed-off-by: Jorgen Hansen Link: https://lore.kernel.org/r/20220207102725.2742-7-jhansen@vmware.com Signed-off-by: Greg Kroah-Hartman --- include/linux/vmw_vmci_defs.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vmw_vmci_defs.h b/include/linux/vmw_vmci_defs.h index 2b70c024dacb..8bc37d8244a8 100644 --- a/include/linux/vmw_vmci_defs.h +++ b/include/linux/vmw_vmci_defs.h @@ -21,6 +21,10 @@ #define VMCI_CAPS_ADDR 0x18 #define VMCI_RESULT_LOW_ADDR 0x1c #define VMCI_RESULT_HIGH_ADDR 0x20 +#define VMCI_DATA_OUT_LOW_ADDR 0x24 +#define VMCI_DATA_OUT_HIGH_ADDR 0x28 +#define VMCI_DATA_IN_LOW_ADDR 0x2c +#define VMCI_DATA_IN_HIGH_ADDR 0x30 #define VMCI_GUEST_PAGE_SHIFT 0x34 /* Max number of devices. */ -- cgit From 22aa5c7f323022477b70e044eb00e6bfea9498e8 Mon Sep 17 00:00:00 2001 From: Jorgen Hansen Date: Mon, 7 Feb 2022 02:27:24 -0800 Subject: VMCI: dma dg: add support for DMA datagrams sends Use DMA based send operation from the transmit buffer instead of the iowrite8_rep based datagram send when DMA datagrams are supported. The outgoing datagram is sent as inline data in the VMCI transmit buffer. Once the header has been configured, the send is initiated by writing the lower 32 bit of the buffer base address to the VMCI_DATA_OUT_LOW_ADDR register. Only then will the device process the header and the datagram itself. Following that, the driver busy waits (it isn't possible to sleep on the send path) for the header busy flag to change - indicating that the send is complete. Reviewed-by: Vishnu Dasa Signed-off-by: Jorgen Hansen Link: https://lore.kernel.org/r/20220207102725.2742-8-jhansen@vmware.com Signed-off-by: Greg Kroah-Hartman --- include/linux/vmw_vmci_defs.h | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vmw_vmci_defs.h b/include/linux/vmw_vmci_defs.h index 8bc37d8244a8..6fb663b36f72 100644 --- a/include/linux/vmw_vmci_defs.h +++ b/include/linux/vmw_vmci_defs.h @@ -110,6 +110,40 @@ enum { #define VMCI_MMIO_ACCESS_OFFSET ((size_t)(128 * 1024)) #define VMCI_MMIO_ACCESS_SIZE ((size_t)(64 * 1024)) +/* + * For VMCI devices supporting the VMCI_CAPS_DMA_DATAGRAM capability, the + * sending and receiving of datagrams can be performed using DMA to/from + * a driver allocated buffer. + * Sending and receiving will be handled as follows: + * - when sending datagrams, the driver initializes the buffer where the + * data part will refer to the outgoing VMCI datagram, sets the busy flag + * to 1 and writes the address of the buffer to VMCI_DATA_OUT_HIGH_ADDR + * and VMCI_DATA_OUT_LOW_ADDR. Writing to VMCI_DATA_OUT_LOW_ADDR triggers + * the device processing of the buffer. When the device has processed the + * buffer, it will write the result value to the buffer and then clear the + * busy flag. + * - when receiving datagrams, the driver initializes the buffer where the + * data part will describe the receive buffer, clears the busy flag and + * writes the address of the buffer to VMCI_DATA_IN_HIGH_ADDR and + * VMCI_DATA_IN_LOW_ADDR. Writing to VMCI_DATA_IN_LOW_ADDR triggers the + * device processing of the buffer. The device will copy as many available + * datagrams into the buffer as possible, and then sets the busy flag. + * When the busy flag is set, the driver will process the datagrams in the + * buffer. + */ +struct vmci_data_in_out_header { + uint32_t busy; + uint32_t opcode; + uint32_t size; + uint32_t rsvd; + uint64_t result; +}; + +struct vmci_sg_elem { + uint64_t addr; + uint64_t size; +}; + /* * We have a fixed set of resource IDs available in the VMX. * This allows us to have a very simple implementation since we statically -- cgit From 3301bc53358a0eb0a0db65fd7a513cd4cb50c83a Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Mon, 10 Jan 2022 15:29:45 +0800 Subject: lib/sbitmap: kill 'depth' from sbitmap_word Only the last sbitmap_word can have different depth, and all the others must have same depth of 1U << sb->shift, so not necessary to store it in sbitmap_word, and it can be retrieved easily and efficiently by adding one internal helper of __map_depth(sb, index). Remove 'depth' field from sbitmap_word, then the annotation of ____cacheline_aligned_in_smp for 'word' isn't needed any more. Not see performance effect when running high parallel IOPS test on null_blk. This way saves us one cacheline(usually 64 words) per each sbitmap_word. Cc: Martin Wilck Signed-off-by: Ming Lei Reviewed-by: Martin Wilck Reviewed-by: John Garry Link: https://lore.kernel.org/r/20220110072945.347535-1-ming.lei@redhat.com Signed-off-by: Jens Axboe --- include/linux/sbitmap.h | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h index 95df357ec009..df3b584b0f0c 100644 --- a/include/linux/sbitmap.h +++ b/include/linux/sbitmap.h @@ -27,15 +27,10 @@ struct seq_file; * struct sbitmap_word - Word in a &struct sbitmap. */ struct sbitmap_word { - /** - * @depth: Number of bits being used in @word/@cleared - */ - unsigned long depth; - /** * @word: word holding free bits */ - unsigned long word ____cacheline_aligned_in_smp; + unsigned long word; /** * @cleared: word holding cleared bits @@ -164,6 +159,14 @@ struct sbitmap_queue { int sbitmap_init_node(struct sbitmap *sb, unsigned int depth, int shift, gfp_t flags, int node, bool round_robin, bool alloc_hint); +/* sbitmap internal helper */ +static inline unsigned int __map_depth(const struct sbitmap *sb, int index) +{ + if (index == sb->map_nr - 1) + return sb->depth - (index << sb->shift); + return 1U << sb->shift; +} + /** * sbitmap_free() - Free memory used by a &struct sbitmap. * @sb: Bitmap to free. @@ -251,7 +254,7 @@ static inline void __sbitmap_for_each_set(struct sbitmap *sb, while (scanned < sb->depth) { unsigned long word; unsigned int depth = min_t(unsigned int, - sb->map[index].depth - nr, + __map_depth(sb, index) - nr, sb->depth - scanned); scanned += depth; -- cgit From 3f607293b74d6acb06571a774a500143c1f0ed2c Mon Sep 17 00:00:00 2001 From: John Garry Date: Tue, 8 Feb 2022 20:07:04 +0800 Subject: sbitmap: Delete old sbitmap_queue_get_shallow() Since __sbitmap_queue_get_shallow() was introduced in commit c05e66733788 ("sbitmap: add sbitmap_get_shallow() operation"), it has not been used. Delete __sbitmap_queue_get_shallow() and rename public __sbitmap_queue_get_shallow() -> sbitmap_queue_get_shallow() as it is odd to have public __foo but no foo at all. Signed-off-by: John Garry Link: https://lore.kernel.org/r/1644322024-105340-1-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe --- include/linux/sbitmap.h | 34 ++++------------------------------ 1 file changed, 4 insertions(+), 30 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h index df3b584b0f0c..dffeb8281c2d 100644 --- a/include/linux/sbitmap.h +++ b/include/linux/sbitmap.h @@ -135,7 +135,7 @@ struct sbitmap_queue { /** * @min_shallow_depth: The minimum shallow depth which may be passed to - * sbitmap_queue_get_shallow() or __sbitmap_queue_get_shallow(). + * sbitmap_queue_get_shallow() */ unsigned int min_shallow_depth; }; @@ -463,7 +463,7 @@ unsigned long __sbitmap_queue_get_batch(struct sbitmap_queue *sbq, int nr_tags, unsigned int *offset); /** - * __sbitmap_queue_get_shallow() - Try to allocate a free bit from a &struct + * sbitmap_queue_get_shallow() - Try to allocate a free bit from a &struct * sbitmap_queue, limiting the depth used from each word, with preemption * already disabled. * @sbq: Bitmap queue to allocate from. @@ -475,8 +475,8 @@ unsigned long __sbitmap_queue_get_batch(struct sbitmap_queue *sbq, int nr_tags, * * Return: Non-negative allocated bit number if successful, -1 otherwise. */ -int __sbitmap_queue_get_shallow(struct sbitmap_queue *sbq, - unsigned int shallow_depth); +int sbitmap_queue_get_shallow(struct sbitmap_queue *sbq, + unsigned int shallow_depth); /** * sbitmap_queue_get() - Try to allocate a free bit from a &struct @@ -498,32 +498,6 @@ static inline int sbitmap_queue_get(struct sbitmap_queue *sbq, return nr; } -/** - * sbitmap_queue_get_shallow() - Try to allocate a free bit from a &struct - * sbitmap_queue, limiting the depth used from each word. - * @sbq: Bitmap queue to allocate from. - * @cpu: Output parameter; will contain the CPU we ran on (e.g., to be passed to - * sbitmap_queue_clear()). - * @shallow_depth: The maximum number of bits to allocate from a single word. - * See sbitmap_get_shallow(). - * - * If you call this, make sure to call sbitmap_queue_min_shallow_depth() after - * initializing @sbq. - * - * Return: Non-negative allocated bit number if successful, -1 otherwise. - */ -static inline int sbitmap_queue_get_shallow(struct sbitmap_queue *sbq, - unsigned int *cpu, - unsigned int shallow_depth) -{ - int nr; - - *cpu = get_cpu(); - nr = __sbitmap_queue_get_shallow(sbq, shallow_depth); - put_cpu(); - return nr; -} - /** * sbitmap_queue_min_shallow_depth() - Inform a &struct sbitmap_queue of the * minimum shallow depth that will be used. -- cgit From 2093057ab879da1070c851b9e07126eaa86d0dfc Mon Sep 17 00:00:00 2001 From: Alexandru Elisei Date: Thu, 27 Jan 2022 16:17:55 +0000 Subject: perf: Fix wrong name in comment for struct perf_cpu_context Commit 0793a61d4df8 ("performance counters: core code") added the perf subsystem (then called Performance Counters) to Linux, creating the struct perf_cpu_context. The comment for the struct referred to it as a "struct perf_counter_cpu_context". Commit cdd6c482c9ff ("perf: Do the big rename: Performance Counters -> Performance Events") changed the comment to refer to a "struct perf_event_cpu_context", which was still the wrong name for the struct. Change the comment to say "struct perf_cpu_context". CC: Thomas Gleixner CC: Ingo Molnar Signed-off-by: Alexandru Elisei Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20220127161759.53553-3-alexandru.elisei@arm.com --- include/linux/perf_event.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 733649184b27..af97dd427501 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -864,7 +864,7 @@ struct perf_event_context { #define PERF_NR_CONTEXTS 4 /** - * struct perf_event_cpu_context - per cpu event context structure + * struct perf_cpu_context - per cpu event context structure */ struct perf_cpu_context { struct perf_event_context ctx; -- cgit From c6c89783eba05a5e159b07cfd8c68d841cc5de42 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Fri, 28 Jan 2022 15:39:36 -0800 Subject: fscrypt: add functions for direct I/O support Encrypted files traditionally haven't supported DIO, due to the need to encrypt/decrypt the data. However, when the encryption is implemented using inline encryption (blk-crypto) instead of the traditional filesystem-layer encryption, it is straightforward to support DIO. In preparation for supporting this, add the following functions: - fscrypt_dio_supported() checks whether a DIO request is supported as far as encryption is concerned. Encrypted files will only support DIO when inline encryption is used and the I/O request is properly aligned; this function checks these preconditions. - fscrypt_limit_io_blocks() limits the length of a bio to avoid crossing a place in the file that a bio with an encryption context cannot cross due to a DUN discontiguity. This function is needed by filesystems that use the iomap DIO implementation (which operates directly on logical ranges, so it won't use fscrypt_mergeable_bio()) and that support FSCRYPT_POLICY_FLAG_IV_INO_LBLK_32. Co-developed-by: Satya Tangirala Signed-off-by: Satya Tangirala Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220128233940.79464-2-ebiggers@kernel.org Signed-off-by: Eric Biggers --- include/linux/fscrypt.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h index 91ea9477e9bd..50d92d805bd8 100644 --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -714,6 +714,10 @@ bool fscrypt_mergeable_bio(struct bio *bio, const struct inode *inode, bool fscrypt_mergeable_bio_bh(struct bio *bio, const struct buffer_head *next_bh); +bool fscrypt_dio_supported(struct kiocb *iocb, struct iov_iter *iter); + +u64 fscrypt_limit_io_blocks(const struct inode *inode, u64 lblk, u64 nr_blocks); + #else /* CONFIG_FS_ENCRYPTION_INLINE_CRYPT */ static inline bool __fscrypt_inode_uses_inline_crypto(const struct inode *inode) @@ -742,6 +746,20 @@ static inline bool fscrypt_mergeable_bio_bh(struct bio *bio, { return true; } + +static inline bool fscrypt_dio_supported(struct kiocb *iocb, + struct iov_iter *iter) +{ + const struct inode *inode = file_inode(iocb->ki_filp); + + return !fscrypt_needs_contents_encryption(inode); +} + +static inline u64 fscrypt_limit_io_blocks(const struct inode *inode, u64 lblk, + u64 nr_blocks) +{ + return nr_blocks; +} #endif /* !CONFIG_FS_ENCRYPTION_INLINE_CRYPT */ /** -- cgit From b2309a71c1f2fc841feb184195b2e46b2e139bf4 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 7 Feb 2022 10:41:07 -0800 Subject: net: add dev->dev_registered_tracker Convert one dev_hold()/dev_put() pair in register_netdevice() and unregister_netdevice_many() to dev_hold_track() and dev_put_track(). This would allow to detect a rogue dev_put() a bit earlier. Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20220207184107.1401096-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 3fb6fb67ed77..5f6e2c0b0c90 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1948,6 +1948,8 @@ enum netdev_ml_priv_type { * @dev_addr_shadow: Copy of @dev_addr to catch direct writes. * @linkwatch_dev_tracker: refcount tracker used by linkwatch. * @watchdog_dev_tracker: refcount tracker used by watchdog. + * @dev_registered_tracker: tracker for reference held while + * registered * * FIXME: cleanup struct net_device such that network protocol info * moves out. @@ -2282,6 +2284,7 @@ struct net_device { u8 dev_addr_shadow[MAX_ADDR_LEN]; netdevice_tracker linkwatch_dev_tracker; netdevice_tracker watchdog_dev_tracker; + netdevice_tracker dev_registered_tracker; }; #define to_net_dev(d) container_of(d, struct net_device, dev) -- cgit From 6523d3b2ffa238ac033c34a726617061d6a744aa Mon Sep 17 00:00:00 2001 From: Iwona Winiarska Date: Tue, 8 Feb 2022 16:36:30 +0100 Subject: peci: Add core infrastructure Intel processors provide access for various services designed to support processor and DRAM thermal management, platform manageability and processor interface tuning and diagnostics. Those services are available via the Platform Environment Control Interface (PECI) that provides a communication channel between the processor and the Baseboard Management Controller (BMC) or other platform management device. This change introduces PECI subsystem by adding the initial core module and API for controller drivers. Co-developed-by: Jason M Bills Co-developed-by: Jae Hyun Yoo Reviewed-by: Pierre-Louis Bossart Acked-by: Joel Stanley Signed-off-by: Jason M Bills Signed-off-by: Jae Hyun Yoo Signed-off-by: Iwona Winiarska Link: https://lore.kernel.org/r/20220208153639.255278-5-iwona.winiarska@intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/peci.h | 99 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 99 insertions(+) create mode 100644 include/linux/peci.h (limited to 'include/linux') diff --git a/include/linux/peci.h b/include/linux/peci.h new file mode 100644 index 000000000000..26e0a4e73b50 --- /dev/null +++ b/include/linux/peci.h @@ -0,0 +1,99 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright (c) 2018-2021 Intel Corporation */ + +#ifndef __LINUX_PECI_H +#define __LINUX_PECI_H + +#include +#include +#include +#include + +/* + * Currently we don't support any PECI command over 32 bytes. + */ +#define PECI_REQUEST_MAX_BUF_SIZE 32 + +struct peci_controller; +struct peci_request; + +/** + * struct peci_controller_ops - PECI controller specific methods + * @xfer: PECI transfer function + * + * PECI controllers may have different hardware interfaces - the drivers + * implementing PECI controllers can use this structure to abstract away those + * differences by exposing a common interface for PECI core. + */ +struct peci_controller_ops { + int (*xfer)(struct peci_controller *controller, u8 addr, struct peci_request *req); +}; + +/** + * struct peci_controller - PECI controller + * @dev: device object to register PECI controller to the device model + * @ops: pointer to device specific controller operations + * @bus_lock: lock used to protect multiple callers + * @id: PECI controller ID + * + * PECI controllers usually connect to their drivers using non-PECI bus, + * such as the platform bus. + * Each PECI controller can communicate with one or more PECI devices. + */ +struct peci_controller { + struct device dev; + struct peci_controller_ops *ops; + struct mutex bus_lock; /* held for the duration of xfer */ + u8 id; +}; + +struct peci_controller *devm_peci_controller_add(struct device *parent, + struct peci_controller_ops *ops); + +static inline struct peci_controller *to_peci_controller(void *d) +{ + return container_of(d, struct peci_controller, dev); +} + +/** + * struct peci_device - PECI device + * @dev: device object to register PECI device to the device model + * @controller: manages the bus segment hosting this PECI device + * @addr: address used on the PECI bus connected to the parent controller + * + * A peci_device identifies a single device (i.e. CPU) connected to a PECI bus. + * The behaviour exposed to the rest of the system is defined by the PECI driver + * managing the device. + */ +struct peci_device { + struct device dev; + u8 addr; +}; + +static inline struct peci_device *to_peci_device(struct device *d) +{ + return container_of(d, struct peci_device, dev); +} + +/** + * struct peci_request - PECI request + * @device: PECI device to which the request is sent + * @tx: TX buffer specific data + * @tx.buf: TX buffer + * @tx.len: transfer data length in bytes + * @rx: RX buffer specific data + * @rx.buf: RX buffer + * @rx.len: received data length in bytes + * + * A peci_request represents a request issued by PECI originator (TX) and + * a response received from PECI responder (RX). + */ +struct peci_request { + struct peci_device *device; + struct { + u8 buf[PECI_REQUEST_MAX_BUF_SIZE]; + u8 len; + } rx, tx; +}; + +#endif /* __LINUX_PECI_H */ -- cgit From 52857e6828e260b16ac569578705f83cf2a71ac1 Mon Sep 17 00:00:00 2001 From: Iwona Winiarska Date: Tue, 8 Feb 2022 16:36:32 +0100 Subject: peci: Add device detection Since PECI devices are discoverable, we can dynamically detect devices that are actually available in the system. This change complements the earlier implementation by rescanning PECI bus to detect available devices. For this purpose, it also introduces the minimal API for PECI requests. Reviewed-by: Pierre-Louis Bossart Acked-by: Joel Stanley Signed-off-by: Iwona Winiarska Link: https://lore.kernel.org/r/20220208153639.255278-7-iwona.winiarska@intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/peci.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/peci.h b/include/linux/peci.h index 26e0a4e73b50..7e35673f3786 100644 --- a/include/linux/peci.h +++ b/include/linux/peci.h @@ -60,6 +60,7 @@ static inline struct peci_controller *to_peci_controller(void *d) * @dev: device object to register PECI device to the device model * @controller: manages the bus segment hosting this PECI device * @addr: address used on the PECI bus connected to the parent controller + * @deleted: indicates that PECI device was already deleted * * A peci_device identifies a single device (i.e. CPU) connected to a PECI bus. * The behaviour exposed to the rest of the system is defined by the PECI driver @@ -68,6 +69,7 @@ static inline struct peci_controller *to_peci_controller(void *d) struct peci_device { struct device dev; u8 addr; + bool deleted; }; static inline struct peci_device *to_peci_device(struct device *d) -- cgit From 6b8145b054b27319dddaad4abbb5184e343375da Mon Sep 17 00:00:00 2001 From: Iwona Winiarska Date: Tue, 8 Feb 2022 16:36:34 +0100 Subject: peci: Add support for PECI device drivers Add support for PECI device drivers, which unlike PECI controller drivers are actually able to provide functionalities to userspace. Also, extend peci_request API to allow querying more details about PECI device (e.g. model/family), that's going to be used to find a compatible peci_driver. Reviewed-by: Pierre-Louis Bossart Acked-by: Joel Stanley Signed-off-by: Iwona Winiarska Link: https://lore.kernel.org/r/20220208153639.255278-9-iwona.winiarska@intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/peci.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) (limited to 'include/linux') diff --git a/include/linux/peci.h b/include/linux/peci.h index 7e35673f3786..4eda423ba10c 100644 --- a/include/linux/peci.h +++ b/include/linux/peci.h @@ -14,6 +14,14 @@ */ #define PECI_REQUEST_MAX_BUF_SIZE 32 +#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */ +#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */ +#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */ +#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */ +#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */ +#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */ +#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */ + struct peci_controller; struct peci_request; @@ -59,6 +67,11 @@ static inline struct peci_controller *to_peci_controller(void *d) * struct peci_device - PECI device * @dev: device object to register PECI device to the device model * @controller: manages the bus segment hosting this PECI device + * @info: PECI device characteristics + * @info.family: device family + * @info.model: device model + * @info.peci_revision: PECI revision supported by the PECI device + * @info.socket_id: the socket ID represented by the PECI device * @addr: address used on the PECI bus connected to the parent controller * @deleted: indicates that PECI device was already deleted * @@ -68,6 +81,12 @@ static inline struct peci_controller *to_peci_controller(void *d) */ struct peci_device { struct device dev; + struct { + u16 family; + u8 model; + u8 peci_revision; + u8 socket_id; + } info; u8 addr; bool deleted; }; -- cgit From 93e1821c80f9460c8931dc4bc090ede794f966cd Mon Sep 17 00:00:00 2001 From: Iwona Winiarska Date: Tue, 8 Feb 2022 16:36:35 +0100 Subject: peci: Add peci-cpu driver PECI is an interface that may be used by different types of devices. Add a peci-cpu driver compatible with Intel processors. The driver is responsible for handling auxiliary devices that can subsequently be used by other drivers (e.g. hwmons). Reviewed-by: Pierre-Louis Bossart Acked-by: Joel Stanley Signed-off-by: Iwona Winiarska Link: https://lore.kernel.org/r/20220208153639.255278-10-iwona.winiarska@intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/peci-cpu.h | 40 ++++++++++++++++++++++++++++++++++++++++ include/linux/peci.h | 8 -------- 2 files changed, 40 insertions(+), 8 deletions(-) create mode 100644 include/linux/peci-cpu.h (limited to 'include/linux') diff --git a/include/linux/peci-cpu.h b/include/linux/peci-cpu.h new file mode 100644 index 000000000000..ff8ae9c26c80 --- /dev/null +++ b/include/linux/peci-cpu.h @@ -0,0 +1,40 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright (c) 2021 Intel Corporation */ + +#ifndef __LINUX_PECI_CPU_H +#define __LINUX_PECI_CPU_H + +#include + +#include "../../arch/x86/include/asm/intel-family.h" + +#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */ +#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */ +#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */ +#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */ +#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */ +#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */ +#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */ +#define PECI_PCS_MODULE_TEMP 9 /* Per Core DTS Temperature Read */ +#define PECI_PCS_THERMAL_MARGIN 10 /* DTS thermal margin */ +#define PECI_PCS_DDR_DIMM_TEMP 14 /* DDR DIMM Temperature */ +#define PECI_PCS_TEMP_TARGET 16 /* Temperature Target Read */ +#define PECI_PCS_TDP_UNITS 30 /* Units for power/energy registers */ + +struct peci_device; + +int peci_temp_read(struct peci_device *device, s16 *temp_raw); + +int peci_pcs_read(struct peci_device *device, u8 index, + u16 param, u32 *data); + +int peci_pci_local_read(struct peci_device *device, u8 bus, u8 dev, + u8 func, u16 reg, u32 *data); + +int peci_ep_pci_local_read(struct peci_device *device, u8 seg, + u8 bus, u8 dev, u8 func, u16 reg, u32 *data); + +int peci_mmio_read(struct peci_device *device, u8 bar, u8 seg, + u8 bus, u8 dev, u8 func, u64 address, u32 *data); + +#endif /* __LINUX_PECI_CPU_H */ diff --git a/include/linux/peci.h b/include/linux/peci.h index 4eda423ba10c..06e6ef935297 100644 --- a/include/linux/peci.h +++ b/include/linux/peci.h @@ -14,14 +14,6 @@ */ #define PECI_REQUEST_MAX_BUF_SIZE 32 -#define PECI_PCS_PKG_ID 0 /* Package Identifier Read */ -#define PECI_PKG_ID_CPU_ID 0x0000 /* CPUID Info */ -#define PECI_PKG_ID_PLATFORM_ID 0x0001 /* Platform ID */ -#define PECI_PKG_ID_DEVICE_ID 0x0002 /* Uncore Device ID */ -#define PECI_PKG_ID_MAX_THREAD_ID 0x0003 /* Max Thread ID */ -#define PECI_PKG_ID_MICROCODE_REV 0x0004 /* CPU Microcode Update Revision */ -#define PECI_PKG_ID_MCA_ERROR_LOG 0x0005 /* Machine Check Status */ - struct peci_controller; struct peci_request; -- cgit From 4f774c4a65bf3987d1a95c966e884f38c8a942af Mon Sep 17 00:00:00 2001 From: Bjorn Andersson Date: Thu, 27 Jan 2022 19:25:53 -0800 Subject: cpufreq: Reintroduce ready() callback This effectively revert '4bf8e582119e ("cpufreq: Remove ready() callback")', in order to reintroduce the ready callback. This is needed in order to be able to leave the thermal pressure interrupts in the Qualcomm CPUfreq driver disabled during initialization, so that it doesn't fire while related_cpus are still 0. Signed-off-by: Bjorn Andersson [ Viresh: Added the Chinese translation as well and updated commit msg ] Signed-off-by: Viresh Kumar --- include/linux/cpufreq.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 1ab29e61b078..3522a272b74d 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -382,6 +382,9 @@ struct cpufreq_driver { int (*suspend)(struct cpufreq_policy *policy); int (*resume)(struct cpufreq_policy *policy); + /* Will be called after the driver is fully initialized */ + void (*ready)(struct cpufreq_policy *policy); + struct freq_attr **attr; /* platform specific boost support code */ -- cgit From 8cba323437a49a45756d661f500b324fc2d486fe Mon Sep 17 00:00:00 2001 From: Sean Nyekjaer Date: Tue, 8 Feb 2022 09:52:13 +0100 Subject: mtd: rawnand: protect access to rawnand devices while in suspend Prevent rawnand access while in a suspended state. Commit 013e6292aaf5 ("mtd: rawnand: Simplify the locking") allows the rawnand layer to return errors rather than waiting in a blocking wait. Tested on a iMX6ULL. Fixes: 013e6292aaf5 ("mtd: rawnand: Simplify the locking") Signed-off-by: Sean Nyekjaer Reviewed-by: Boris Brezillon Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20220208085213.1838273-1-sean@geanix.com --- include/linux/mtd/rawnand.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mtd/rawnand.h b/include/linux/mtd/rawnand.h index 5b88cd51fadb..dcf90144d70b 100644 --- a/include/linux/mtd/rawnand.h +++ b/include/linux/mtd/rawnand.h @@ -1240,6 +1240,7 @@ struct nand_secure_region { * @lock: Lock protecting the suspended field. Also used to serialize accesses * to the NAND device * @suspended: Set to 1 when the device is suspended, 0 when it's not + * @resume_wq: wait queue to sleep if rawnand is in suspended state. * @cur_cs: Currently selected target. -1 means no target selected, otherwise we * should always have cur_cs >= 0 && cur_cs < nanddev_ntargets(). * NAND Controller drivers should not modify this value, but they're @@ -1294,6 +1295,7 @@ struct nand_chip { /* Internals */ struct mutex lock; unsigned int suspended : 1; + wait_queue_head_t resume_wq; int cur_cs; int read_retries; struct nand_secure_region *secure_regions; -- cgit From 5145abeb0649acf810a32e63bd762e617a9b3309 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Thu, 16 Dec 2021 12:16:41 +0100 Subject: mtd: nand: ecc: Provide a helper to retrieve a pilelined engine device In a pipelined engine situation, we might either have the host which internally has support for error correction, or have it using an external hardware block for this purpose. In the former case, the host is also the ECC engine. In the latter case, it is not. In order to get the right pointers on the right devices (for example: in order to devm_* allocate variables), let's introduce this helper which can safely be called by pipelined ECC engines in order to retrieve the right device structure. Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20211216111654.238086-16-miquel.raynal@bootlin.com --- include/linux/mtd/nand.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h index b617efa0a881..615b3e3a3920 100644 --- a/include/linux/mtd/nand.h +++ b/include/linux/mtd/nand.h @@ -309,6 +309,7 @@ struct nand_ecc_engine *nand_ecc_get_sw_engine(struct nand_device *nand); struct nand_ecc_engine *nand_ecc_get_on_die_hw_engine(struct nand_device *nand); struct nand_ecc_engine *nand_ecc_get_on_host_hw_engine(struct nand_device *nand); void nand_ecc_put_on_host_hw_engine(struct nand_device *nand); +struct device *nand_ecc_get_engine_dev(struct device *host); #if IS_ENABLED(CONFIG_MTD_NAND_ECC_SW_HAMMING) struct nand_ecc_engine *nand_ecc_sw_hamming_get_engine(void); -- cgit From 4398693a9e24bcab0b99ea219073917991d0792b Mon Sep 17 00:00:00 2001 From: Bartosz Golaszewski Date: Tue, 8 Feb 2022 11:48:31 +0100 Subject: gpiolib: make struct comments into real kernel docs We have several comments that start with '/**' but don't conform to the kernel doc standard. Add proper detailed descriptions for the affected definitions and move the docs from the forward declarations to the struct definitions where applicable. Reported-by: Randy Dunlap Signed-off-by: Bartosz Golaszewski Reviewed-by: Andy Shevchenko Tested-by: Randy Dunlap --- include/linux/gpio/consumer.h | 35 ++++++++++++++++------------------- 1 file changed, 16 insertions(+), 19 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gpio/consumer.h b/include/linux/gpio/consumer.h index 3ad67b4a72be..c3aa8b330e1c 100644 --- a/include/linux/gpio/consumer.h +++ b/include/linux/gpio/consumer.h @@ -8,27 +8,16 @@ #include struct device; - -/** - * Opaque descriptor for a GPIO. These are obtained using gpiod_get() and are - * preferable to the old integer-based handles. - * - * Contrary to integers, a pointer to a gpio_desc is guaranteed to be valid - * until the GPIO is released. - */ struct gpio_desc; - -/** - * Opaque descriptor for a structure of GPIO array attributes. This structure - * is attached to struct gpiod_descs obtained from gpiod_get_array() and can be - * passed back to get/set array functions in order to activate fast processing - * path if applicable. - */ struct gpio_array; /** - * Struct containing an array of descriptors that can be obtained using - * gpiod_get_array(). + * struct gpio_descs - Struct containing an array of descriptors that can be + * obtained using gpiod_get_array() + * + * @info: Pointer to the opaque gpio_array structure + * @ndescs: Number of held descriptors + * @desc: Array of pointers to GPIO descriptors */ struct gpio_descs { struct gpio_array *info; @@ -43,8 +32,16 @@ struct gpio_descs { #define GPIOD_FLAGS_BIT_NONEXCLUSIVE BIT(4) /** - * Optional flags that can be passed to one of gpiod_* to configure direction - * and output value. These values cannot be OR'd. + * enum gpiod_flags - Optional flags that can be passed to one of gpiod_* to + * configure direction and output value. These values + * cannot be OR'd. + * + * @GPIOD_ASIS: Don't change anything + * @GPIOD_IN: Set lines to input mode + * @GPIOD_OUT_LOW: Set lines to output and drive them low + * @GPIOD_OUT_HIGH: Set lines to output and drive them high + * @GPIOD_OUT_LOW_OPEN_DRAIN: Set lines to open-drain output and drive them low + * @GPIOD_OUT_HIGH_OPEN_DRAIN: Set lines to open-drain output and drive them high */ enum gpiod_flags { GPIOD_ASIS = 0, -- cgit From a0386bba70934d42f586eaf68b21d5eeaffa7bd0 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Sun, 23 Jan 2022 18:52:01 +0100 Subject: spi: make remove callback a void function MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The value returned by an spi driver's remove function is mostly ignored. (Only an error message is printed if the value is non-zero that the error is ignored.) So change the prototype of the remove function to return no value. This way driver authors are not tempted to assume that passing an error to the upper layer is a good idea. All drivers are adapted accordingly. There is no intended change of behaviour, all callbacks were prepared to return 0 before. Signed-off-by: Uwe Kleine-König Acked-by: Marc Kleine-Budde Acked-by: Andy Shevchenko Reviewed-by: Geert Uytterhoeven Acked-by: Jérôme Pouiller Acked-by: Miquel Raynal Acked-by: Jonathan Cameron Acked-by: Claudius Heine Acked-by: Stefan Schmidt Acked-by: Alexandre Belloni Acked-by: Ulf Hansson # For MMC Acked-by: Marcus Folkesson Acked-by: Łukasz Stelmach Acked-by: Lee Jones Link: https://lore.kernel.org/r/20220123175201.34839-6-u.kleine-koenig@pengutronix.de Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 7ab3fed7b804..c84e61b99c7b 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -280,7 +280,7 @@ struct spi_message; struct spi_driver { const struct spi_device_id *id_table; int (*probe)(struct spi_device *spi); - int (*remove)(struct spi_device *spi); + void (*remove)(struct spi_device *spi); void (*shutdown)(struct spi_device *spi); struct device_driver driver; }; -- cgit From 1f8863bfb5ca500ea1c7669b16b1931ba27fce20 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 1 Feb 2022 12:02:59 +0000 Subject: genirq: Allow the PM device to originate from irq domain As a preparation to moving the reference to the device used for runtime power management, add a new 'dev' field to the irqdomain structure for that exact purpose. The irq_chip_pm_{get,put}() helpers are made aware of the dual location via a new private helper. No functional change intended. Signed-off-by: Marc Zyngier Reviewed-by: Geert Uytterhoeven Tested-by: Geert Uytterhoeven Tested-by: Tony Lindgren Acked-by: Bartosz Golaszewski Link: https://lore.kernel.org/r/20220201120310.878267-2-maz@kernel.org --- include/linux/irqdomain.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h index d476405802e9..be25a33293e5 100644 --- a/include/linux/irqdomain.h +++ b/include/linux/irqdomain.h @@ -151,6 +151,8 @@ struct irq_domain_chip_generic; * @gc: Pointer to a list of generic chips. There is a helper function for * setting up one or more generic chips for interrupt controllers * drivers using the generic chip library which uses this pointer. + * @dev: Pointer to a device that the domain represent, and that will be + * used for power management purposes. * @parent: Pointer to parent irq_domain to support hierarchy irq_domains * * Revmap data, used internally by irq_domain @@ -171,6 +173,7 @@ struct irq_domain { struct fwnode_handle *fwnode; enum irq_domain_bus_token bus_token; struct irq_domain_chip_generic *gc; + struct device *dev; #ifdef CONFIG_IRQ_DOMAIN_HIERARCHY struct irq_domain *parent; #endif @@ -226,6 +229,13 @@ static inline struct device_node *irq_domain_get_of_node(struct irq_domain *d) return to_of_node(d->fwnode); } +static inline void irq_domain_set_pm_device(struct irq_domain *d, + struct device *dev) +{ + if (d) + d->dev = dev; +} + #ifdef CONFIG_IRQ_DOMAIN struct fwnode_handle *__irq_domain_alloc_fwnode(unsigned int type, int id, const char *name, phys_addr_t *pa); -- cgit From c306d737691ef84305d4ed0d302c63db2932f0bb Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 25 Jan 2022 15:57:45 -0500 Subject: NFSD: Deprecate NFS_OFFSET_MAX NFS_OFFSET_MAX was introduced way back in Linux v2.3.y before there was a kernel-wide OFFSET_MAX value. As a clean up, replace the last few uses of it with its generic equivalent, and get rid of it. Signed-off-by: Chuck Lever --- include/linux/nfs.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nfs.h b/include/linux/nfs.h index 0dc7ad38a0da..b06375e88e58 100644 --- a/include/linux/nfs.h +++ b/include/linux/nfs.h @@ -36,14 +36,6 @@ static inline void nfs_copy_fh(struct nfs_fh *target, const struct nfs_fh *sourc memcpy(target->data, source->data, source->size); } - -/* - * This is really a general kernel constant, but since nothing like - * this is defined in the kernel headers, I have to do it here. - */ -#define NFS_OFFSET_MAX ((__s64)((~(__u64)0) >> 1)) - - enum nfs3_stable_how { NFS_UNSTABLE = 0, NFS_DATA_SYNC = 1, -- cgit From 70e038f89b467995708207fb57bbf46aec32dc2c Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Thu, 16 Dec 2021 12:16:42 +0100 Subject: mtd: nand: mxic-ecc: Support SPI pipelined mode Introduce the support for another possible configuration: the ECC engine may work as DMA master (pipelined) and move itself the data to/from the NAND chip into the buffer, applying the necessary corrections/computations on the fly. This driver offers an ECC engine implementation that must be instatiated from a SPI controller driver. Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20211216111654.238086-17-miquel.raynal@bootlin.com --- include/linux/mtd/nand-ecc-mxic.h | 49 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 49 insertions(+) create mode 100644 include/linux/mtd/nand-ecc-mxic.h (limited to 'include/linux') diff --git a/include/linux/mtd/nand-ecc-mxic.h b/include/linux/mtd/nand-ecc-mxic.h new file mode 100644 index 000000000000..f3aa1ac82aed --- /dev/null +++ b/include/linux/mtd/nand-ecc-mxic.h @@ -0,0 +1,49 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright © 2019 Macronix + * Author: Miquèl Raynal + * + * Header for the Macronix external ECC engine. + */ + +#ifndef __MTD_NAND_ECC_MXIC_H__ +#define __MTD_NAND_ECC_MXIC_H__ + +#include +#include + +struct mxic_ecc_engine; + +#if IS_ENABLED(CONFIG_MTD_NAND_ECC_MXIC) + +struct nand_ecc_engine_ops *mxic_ecc_get_pipelined_ops(void); +struct nand_ecc_engine *mxic_ecc_get_pipelined_engine(struct platform_device *spi_pdev); +void mxic_ecc_put_pipelined_engine(struct nand_ecc_engine *eng); +int mxic_ecc_process_data_pipelined(struct nand_ecc_engine *eng, + unsigned int direction, dma_addr_t dirmap); + +#else /* !CONFIG_MTD_NAND_ECC_MXIC */ + +static inline struct nand_ecc_engine_ops *mxic_ecc_get_pipelined_ops(void) +{ + return NULL; +} + +static inline struct nand_ecc_engine * +mxic_ecc_get_pipelined_engine(struct platform_device *spi_pdev) +{ + return ERR_PTR(-EOPNOTSUPP); +} + +static inline void mxic_ecc_put_pipelined_engine(struct nand_ecc_engine *eng) {} + +static inline int mxic_ecc_process_data_pipelined(struct nand_ecc_engine *eng, + unsigned int direction, + dma_addr_t dirmap) +{ + return -EOPNOTSUPP; +} + +#endif /* CONFIG_MTD_NAND_ECC_MXIC */ + +#endif /* __MTD_NAND_ECC_MXIC_H__ */ -- cgit From 4a3cc7fb6e63bcfdedec25364738f1493345bd20 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Thu, 27 Jan 2022 10:17:56 +0100 Subject: spi: spi-mem: Introduce a capability structure Create a spi_controller_mem_caps structure and put it within the spi_controller structure close to the spi_controller_mem_ops strucure. So far the only field in this structure is the support for dtr operations, but soon we will add another parameter. Also create a helper to parse the capabilities and check if the requested capability has been set or not. Signed-off-by: Miquel Raynal Reviewed-by: Pratyush Yadav Reviewed-by: Boris Brezillon Reviewed-by: Tudor Ambarus Reviewed-by: Mark Brown Link: https://lore.kernel.org/linux-mtd/20220127091808.1043392-2-miquel.raynal@bootlin.com --- include/linux/spi/spi-mem.h | 11 +++++++++++ include/linux/spi/spi.h | 3 +++ 2 files changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/spi/spi-mem.h b/include/linux/spi/spi-mem.h index 85e2ff7b840d..38e5d45c9842 100644 --- a/include/linux/spi/spi-mem.h +++ b/include/linux/spi/spi-mem.h @@ -285,6 +285,17 @@ struct spi_controller_mem_ops { unsigned long timeout_ms); }; +/** + * struct spi_controller_mem_caps - SPI memory controller capabilities + * @dtr: Supports DTR operations + */ +struct spi_controller_mem_caps { + bool dtr; +}; + +#define spi_mem_controller_is_capable(ctlr, cap) \ + ((ctlr)->mem_caps && (ctlr)->mem_caps->cap) + /** * struct spi_mem_driver - SPI memory driver * @spidrv: inherit from a SPI driver diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 7ab3fed7b804..cf99a1ee0e74 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -23,6 +23,7 @@ struct ptp_system_timestamp; struct spi_controller; struct spi_transfer; struct spi_controller_mem_ops; +struct spi_controller_mem_caps; /* * INTERFACES between SPI master-side drivers and SPI slave protocol handlers, @@ -415,6 +416,7 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch * @mem_ops: optimized/dedicated operations for interactions with SPI memory. * This field is optional and should only be implemented if the * controller has native support for memory like operations. + * @mem_caps: controller capabilities for the handling of memory operations. * @unprepare_message: undo any work done by prepare_message(). * @slave_abort: abort the ongoing transfer request on an SPI slave controller * @cs_gpios: LEGACY: array of GPIO descs to use as chip select lines; one per @@ -639,6 +641,7 @@ struct spi_controller { /* Optimized handlers for SPI memory-like operations. */ const struct spi_controller_mem_ops *mem_ops; + const struct spi_controller_mem_caps *mem_caps; /* gpio chip select */ int *cs_gpios; -- cgit From 9a15efc5d5e6b5beaed0883e5bdcd0b1384c1b20 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Thu, 27 Jan 2022 10:18:00 +0100 Subject: spi: spi-mem: Kill the spi_mem_dtr_supports_op() helper Now that spi_mem_default_supports_op() has access to the static controller capabilities (relating to memory operations), and now that these capabilities have been filled by the relevant controllers, there is no need for a specific helper checking only DTR operations, so let's just kill spi_mem_dtr_supports_op() and simply use spi_mem_default_supports_op() instead. Signed-off-by: Miquel Raynal Reviewed-by: Pratyush Yadav Reviewed-by: Boris Brezillon Reviewed-by: Tudor Ambarus Link: https://lore.kernel.org/linux-mtd/20220127091808.1043392-6-miquel.raynal@bootlin.com --- include/linux/spi/spi-mem.h | 11 ----------- 1 file changed, 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi-mem.h b/include/linux/spi/spi-mem.h index 38e5d45c9842..4a1bfe689872 100644 --- a/include/linux/spi/spi-mem.h +++ b/include/linux/spi/spi-mem.h @@ -330,10 +330,6 @@ void spi_controller_dma_unmap_mem_op_data(struct spi_controller *ctlr, bool spi_mem_default_supports_op(struct spi_mem *mem, const struct spi_mem_op *op); - -bool spi_mem_dtr_supports_op(struct spi_mem *mem, - const struct spi_mem_op *op); - #else static inline int spi_controller_dma_map_mem_op_data(struct spi_controller *ctlr, @@ -356,13 +352,6 @@ bool spi_mem_default_supports_op(struct spi_mem *mem, { return false; } - -static inline -bool spi_mem_dtr_supports_op(struct spi_mem *mem, - const struct spi_mem_op *op) -{ - return false; -} #endif /* CONFIG_SPI_MEM */ int spi_mem_adjust_op_size(struct spi_mem *mem, struct spi_mem_op *op); -- cgit From a433c2cbd75ab76f277364f44e76f32c7df306e7 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Thu, 27 Jan 2022 10:18:01 +0100 Subject: spi: spi-mem: Add an ecc parameter to the spi_mem_op structure Soon the SPI-NAND core will need a way to request a SPI controller to enable ECC support for a given operation. This is because of the pipelined integration of certain ECC engines, which are directly managed by the SPI controller itself. Introduce a spi_mem_op additional field for this purpose: ecc. So far this field is left unset and checked to be false by all the SPI controller drivers in their ->supports_op() hook, as they all call spi_mem_default_supports_op(). Signed-off-by: Miquel Raynal Acked-by: Pratyush Yadav Reviewed-by: Boris Brezillon Reviewed-by: Tudor Ambarus Link: https://lore.kernel.org/linux-mtd/20220127091808.1043392-7-miquel.raynal@bootlin.com --- include/linux/spi/spi-mem.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/spi/spi-mem.h b/include/linux/spi/spi-mem.h index 4a1bfe689872..2ba044d0d5e5 100644 --- a/include/linux/spi/spi-mem.h +++ b/include/linux/spi/spi-mem.h @@ -89,6 +89,7 @@ enum spi_mem_data_dir { * @dummy.dtr: whether the dummy bytes should be sent in DTR mode or not * @data.buswidth: number of IO lanes used to send/receive the data * @data.dtr: whether the data should be sent in DTR mode or not + * @data.ecc: whether error correction is required or not * @data.dir: direction of the transfer * @data.nbytes: number of data bytes to send/receive. Can be zero if the * operation does not involve transferring data @@ -119,6 +120,7 @@ struct spi_mem_op { struct { u8 buswidth; u8 dtr : 1; + u8 ecc : 1; enum spi_mem_data_dir dir; unsigned int nbytes; union { @@ -288,9 +290,11 @@ struct spi_controller_mem_ops { /** * struct spi_controller_mem_caps - SPI memory controller capabilities * @dtr: Supports DTR operations + * @ecc: Supports operations with error correction */ struct spi_controller_mem_caps { bool dtr; + bool ecc; }; #define spi_mem_controller_is_capable(ctlr, cap) \ -- cgit From f9d7c7265bcff7d9a17425a8cddf702e8fe159c2 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Thu, 27 Jan 2022 10:18:03 +0100 Subject: mtd: spinand: Create direct mapping descriptors for ECC operations In order for pipelined ECC engines to be able to enable/disable the ECC engine only when needed and avoid races when future parallel-operations will be supported, we need to provide the information about the use of the ECC engine in the direct mapping hooks. As direct mapping configurations are meant to be static, it is best to create two new mappings: one for regular 'raw' accesses and one for accesses involving correction. It is up to the driver to use or not the new ECC enable boolean contained in the spi-mem operation. As dirmaps are not free (they consume a few pages of MMIO address space) and because these extra entries are only meant to be used by pipelined engines, let's limit their use to this specific type of engine and save a bit of memory with all the other setups. Signed-off-by: Miquel Raynal Reviewed-by: Boris Brezillon Link: https://lore.kernel.org/linux-mtd/20220127091808.1043392-9-miquel.raynal@bootlin.com --- include/linux/mtd/spinand.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mtd/spinand.h b/include/linux/mtd/spinand.h index 6988956b8492..3aa28240a77f 100644 --- a/include/linux/mtd/spinand.h +++ b/include/linux/mtd/spinand.h @@ -389,6 +389,8 @@ struct spinand_info { struct spinand_dirmap { struct spi_mem_dirmap_desc *wdesc; struct spi_mem_dirmap_desc *rdesc; + struct spi_mem_dirmap_desc *wdesc_ecc; + struct spi_mem_dirmap_desc *rdesc_ecc; }; /** -- cgit From 00360ebae483e603d55ec9a7231b787cb80ffe13 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Wed, 2 Feb 2022 15:45:36 +0100 Subject: spi: mxic: Add support for pipelined ECC operations Some SPI-NAND chips do not have a proper on-die ECC engine providing error correction/detection. This is particularly an issue on embedded devices with limited resources because all the computations must happen in software, unless an external hardware engine is provided. These external engines are new and can be of two categories: external or pipelined. Macronix is providing both, the former being already supported. The second, however, is very SoC implementation dependent and must be instantiated by the SPI host controller directly. An entire subsystem has been contributed to support these engines which makes the insertion into another subsystem such as SPI quite straightforward without the need for a lot of specific functions. Signed-off-by: Miquel Raynal Reviewed-by: Mark Brown Link: https://lore.kernel.org/linux-mtd/20220202144536.393792-1-miquel.raynal@bootlin.com --- include/linux/mtd/nand-ecc-mxic.h | 2 +- include/linux/mtd/nand.h | 15 +++++++++++++++ 2 files changed, 16 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mtd/nand-ecc-mxic.h b/include/linux/mtd/nand-ecc-mxic.h index f3aa1ac82aed..b125926e458c 100644 --- a/include/linux/mtd/nand-ecc-mxic.h +++ b/include/linux/mtd/nand-ecc-mxic.h @@ -14,7 +14,7 @@ struct mxic_ecc_engine; -#if IS_ENABLED(CONFIG_MTD_NAND_ECC_MXIC) +#if IS_ENABLED(CONFIG_MTD_NAND_ECC_MXIC) && IS_REACHABLE(CONFIG_MTD_NAND_CORE) struct nand_ecc_engine_ops *mxic_ecc_get_pipelined_ops(void); struct nand_ecc_engine *mxic_ecc_get_pipelined_engine(struct platform_device *spi_pdev); diff --git a/include/linux/mtd/nand.h b/include/linux/mtd/nand.h index 615b3e3a3920..c3693bb87b4c 100644 --- a/include/linux/mtd/nand.h +++ b/include/linux/mtd/nand.h @@ -303,8 +303,23 @@ int nand_ecc_prepare_io_req(struct nand_device *nand, int nand_ecc_finish_io_req(struct nand_device *nand, struct nand_page_io_req *req); bool nand_ecc_is_strong_enough(struct nand_device *nand); + +#if IS_REACHABLE(CONFIG_MTD_NAND_CORE) int nand_ecc_register_on_host_hw_engine(struct nand_ecc_engine *engine); int nand_ecc_unregister_on_host_hw_engine(struct nand_ecc_engine *engine); +#else +static inline int +nand_ecc_register_on_host_hw_engine(struct nand_ecc_engine *engine) +{ + return -ENOTSUPP; +} +static inline int +nand_ecc_unregister_on_host_hw_engine(struct nand_ecc_engine *engine) +{ + return -ENOTSUPP; +} +#endif + struct nand_ecc_engine *nand_ecc_get_sw_engine(struct nand_device *nand); struct nand_ecc_engine *nand_ecc_get_on_die_hw_engine(struct nand_device *nand); struct nand_ecc_engine *nand_ecc_get_on_host_hw_engine(struct nand_device *nand); -- cgit From beb0622138cd2848dec06b0651a988c39d099574 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 1 Feb 2022 12:03:10 +0000 Subject: genirq: Kill irq_chip::parent_device Now that noone is using irq_chip::parent_device in the tree, get rid of it. Signed-off-by: Marc Zyngier Acked-by: Bartosz Golaszewski Link: https://lore.kernel.org/r/20220201120310.878267-13-maz@kernel.org --- include/linux/irq.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/irq.h b/include/linux/irq.h index 848e1e12c5c6..2cb2e2ac2703 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -456,7 +456,6 @@ static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d) /** * struct irq_chip - hardware interrupt chip descriptor * - * @parent_device: pointer to parent device for irqchip * @name: name for /proc/interrupts * @irq_startup: start up the interrupt (defaults to ->enable if NULL) * @irq_shutdown: shut down the interrupt (defaults to ->disable if NULL) @@ -503,7 +502,6 @@ static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d) * @flags: chip specific flags */ struct irq_chip { - struct device *parent_device; const char *name; unsigned int (*irq_startup)(struct irq_data *data); void (*irq_shutdown)(struct irq_data *data); -- cgit From 27c196c7b73cb70bbed3a9df46563bab60e63415 Mon Sep 17 00:00:00 2001 From: Terry Bowman Date: Wed, 9 Feb 2022 11:27:09 -0600 Subject: kernel/resource: Introduce request_mem_region_muxed() Support for requesting muxed memory region is implemented but not currently callable as a macro. Add the request muxed memory region macro. MMIO memory accesses can be synchronized using request_mem_region() which is already available. This call will return failure if the resource is busy. The 'muxed' version of this macro will handle a busy resource by using a wait queue to retry until the resource is available. Signed-off-by: Terry Bowman Reviewed-by: Andy Shevchenko Signed-off-by: Wolfram Sang --- include/linux/ioport.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 8359c50f9988..ec5f71f7135b 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -262,6 +262,8 @@ resource_union(struct resource *r1, struct resource *r2, struct resource *r) #define request_muxed_region(start,n,name) __request_region(&ioport_resource, (start), (n), (name), IORESOURCE_MUXED) #define __request_mem_region(start,n,name, excl) __request_region(&iomem_resource, (start), (n), (name), excl) #define request_mem_region(start,n,name) __request_region(&iomem_resource, (start), (n), (name), 0) +#define request_mem_region_muxed(start, n, name) \ + __request_region(&iomem_resource, (start), (n), (name), IORESOURCE_MUXED) #define request_mem_region_exclusive(start,n,name) \ __request_region(&iomem_resource, (start), (n), (name), IORESOURCE_EXCLUSIVE) #define rename_region(region, newname) do { (region)->name = (newname); } while (0) -- cgit From 8008e7902f28eb9e5459b21d375b3e5b4090efff Mon Sep 17 00:00:00 2001 From: Sai Prakash Ranjan Date: Fri, 28 Jan 2022 13:17:09 +0530 Subject: soc: qcom: llcc: Update the logic for version info extraction LLCC HW version info is made up of major, branch, minor and echo version bits each of which are 8bits. Several features in newer LLCC HW are based on the full version rather than just major or minor versions such as write-subcache enable which is applicable for versions v2.0.0.0 and later, also upcoming write-subcache cacheable for SM8450 SoC which is only present in versions v2.1.0.0 and later, so it makes it easier and cleaner to just directly compare with the full version than adding additional major/branch/ minor/echo version checks. So remove the earlier major version check and add full version check for those features. Signed-off-by: Sai Prakash Ranjan Tested-by: Vinod Koul Signed-off-by: Bjorn Andersson Link: https://lore.kernel.org/r/a82d7c32348c51fcc2b63e220d91b318bf706c83.1643355594.git.quic_saipraka@quicinc.com --- include/linux/soc/qcom/llcc-qcom.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soc/qcom/llcc-qcom.h b/include/linux/soc/qcom/llcc-qcom.h index 9e8fd92c96b7..beecf00b707d 100644 --- a/include/linux/soc/qcom/llcc-qcom.h +++ b/include/linux/soc/qcom/llcc-qcom.h @@ -83,7 +83,7 @@ struct llcc_edac_reg_data { * @bitmap: Bit map to track the active slice ids * @offsets: Pointer to the bank offsets array * @ecc_irq: interrupt for llcc cache error detection and reporting - * @major_version: Indicates the LLCC major version + * @version: Indicates the LLCC version */ struct llcc_drv_data { struct regmap *regmap; @@ -96,7 +96,7 @@ struct llcc_drv_data { unsigned long *bitmap; u32 *offsets; int ecc_irq; - u32 major_version; + u32 version; }; #if IS_ENABLED(CONFIG_QCOM_LLCC) -- cgit From a6e9d7ef252c44a4f33b4403cd367430697dd9be Mon Sep 17 00:00:00 2001 From: Sai Prakash Ranjan Date: Fri, 28 Jan 2022 13:17:13 +0530 Subject: soc: qcom: llcc: Add configuration data for SM8450 SoC Add LLCC configuration data for SM8450 SoC. Signed-off-by: Sai Prakash Ranjan Tested-by: Vinod Koul Signed-off-by: Bjorn Andersson Link: https://lore.kernel.org/r/fec944cb8f2a4a70785903c6bfec629c6f31b6a4.1643355594.git.quic_saipraka@quicinc.com --- include/linux/soc/qcom/llcc-qcom.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/qcom/llcc-qcom.h b/include/linux/soc/qcom/llcc-qcom.h index beecf00b707d..0bc21ee58fac 100644 --- a/include/linux/soc/qcom/llcc-qcom.h +++ b/include/linux/soc/qcom/llcc-qcom.h @@ -35,7 +35,12 @@ #define LLCC_WRCACHE 31 #define LLCC_CVPFW 32 #define LLCC_CPUSS1 33 +#define LLCC_CAMEXP0 34 +#define LLCC_CPUMTE 35 #define LLCC_CPUHWT 36 +#define LLCC_MDMCLAD2 37 +#define LLCC_CAMEXP1 38 +#define LLCC_AENPU 45 /** * struct llcc_slice_desc - Cache slice descriptor -- cgit From 297565aa22cfa80ab0f88c3569693aea0b6afb6d Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Sat, 5 Feb 2022 16:23:45 +0100 Subject: lib/xor: make xor prototypes more friendly to compiler vectorization Modern compilers are perfectly capable of extracting parallelism from the XOR routines, provided that the prototypes reflect the nature of the input accurately, in particular, the fact that the input vectors are expected not to overlap. This is not documented explicitly, but is implied by the interchangeability of the various C routines, some of which use temporary variables while others don't: this means that these routines only behave identically for non-overlapping inputs. So let's decorate these input vectors with the __restrict modifier, which informs the compiler that there is no overlap. While at it, make the input-only vectors pointer-to-const as well. Tested-by: Nathan Chancellor Signed-off-by: Ard Biesheuvel Reviewed-by: Nick Desaulniers Link: https://github.com/ClangBuiltLinux/linux/issues/563 Signed-off-by: Herbert Xu --- include/linux/raid/xor.h | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/raid/xor.h b/include/linux/raid/xor.h index 2a9fee8ddae3..51b811b62322 100644 --- a/include/linux/raid/xor.h +++ b/include/linux/raid/xor.h @@ -11,13 +11,20 @@ struct xor_block_template { struct xor_block_template *next; const char *name; int speed; - void (*do_2)(unsigned long, unsigned long *, unsigned long *); - void (*do_3)(unsigned long, unsigned long *, unsigned long *, - unsigned long *); - void (*do_4)(unsigned long, unsigned long *, unsigned long *, - unsigned long *, unsigned long *); - void (*do_5)(unsigned long, unsigned long *, unsigned long *, - unsigned long *, unsigned long *, unsigned long *); + void (*do_2)(unsigned long, unsigned long * __restrict, + const unsigned long * __restrict); + void (*do_3)(unsigned long, unsigned long * __restrict, + const unsigned long * __restrict, + const unsigned long * __restrict); + void (*do_4)(unsigned long, unsigned long * __restrict, + const unsigned long * __restrict, + const unsigned long * __restrict, + const unsigned long * __restrict); + void (*do_5)(unsigned long, unsigned long * __restrict, + const unsigned long * __restrict, + const unsigned long * __restrict, + const unsigned long * __restrict, + const unsigned long * __restrict); }; #endif -- cgit From dc1b4df09acdca7a89806b28f235cd6d8dcd3d24 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Mon, 7 Feb 2022 10:19:43 +0000 Subject: atomics: Fix atomic64_{read_acquire,set_release} fallbacks Arnd reports that on 32-bit architectures, the fallbacks for atomic64_read_acquire() and atomic64_set_release() are broken as they use smp_load_acquire() and smp_store_release() respectively, which do not work on types larger than the native word size. Since those contain compiletime_assert_atomic_type(), any attempt to use those fallbacks will result in a build-time error. e.g. with the following added to arch/arm/kernel/setup.c: | void test_atomic64(atomic64_t *v) | { | atomic64_set_release(v, 5); | atomic64_read_acquire(v); | } The compiler will complain as follows: | In file included from : | In function 'arch_atomic64_set_release', | inlined from 'test_atomic64' at ./include/linux/atomic/atomic-instrumented.h:669:2: | ././include/linux/compiler_types.h:346:38: error: call to '__compiletime_assert_9' declared with attribute error: Need native word sized stores/loads for atomicity. | 346 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | | ^ | ././include/linux/compiler_types.h:327:4: note: in definition of macro '__compiletime_assert' | 327 | prefix ## suffix(); \ | | ^~~~~~ | ././include/linux/compiler_types.h:346:2: note: in expansion of macro '_compiletime_assert' | 346 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__) | | ^~~~~~~~~~~~~~~~~~~ | ././include/linux/compiler_types.h:349:2: note: in expansion of macro 'compiletime_assert' | 349 | compiletime_assert(__native_word(t), \ | | ^~~~~~~~~~~~~~~~~~ | ./include/asm-generic/barrier.h:133:2: note: in expansion of macro 'compiletime_assert_atomic_type' | 133 | compiletime_assert_atomic_type(*p); \ | | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ./include/asm-generic/barrier.h:164:55: note: in expansion of macro '__smp_store_release' | 164 | #define smp_store_release(p, v) do { kcsan_release(); __smp_store_release(p, v); } while (0) | | ^~~~~~~~~~~~~~~~~~~ | ./include/linux/atomic/atomic-arch-fallback.h:1270:2: note: in expansion of macro 'smp_store_release' | 1270 | smp_store_release(&(v)->counter, i); | | ^~~~~~~~~~~~~~~~~ | make[2]: *** [scripts/Makefile.build:288: arch/arm/kernel/setup.o] Error 1 | make[1]: *** [scripts/Makefile.build:550: arch/arm/kernel] Error 2 | make: *** [Makefile:1831: arch/arm] Error 2 Fix this by only using smp_load_acquire() and smp_store_release() for native atomic types, and otherwise falling back to the regular barriers necessary for acquire/release semantics, as we do in the more generic acquire and release fallbacks. Since the fallback templates are used to generate the atomic64_*() and atomic_*() operations, the __native_word() check is added to both. For the atomic_*() operations, which are always 32-bit, the __native_word() check is redundant but not harmful, as it is always true. For the example above this works as expected on 32-bit, e.g. for arm multi_v7_defconfig: | : | push {r4, r5} | dmb ish | pldw [r0] | mov r2, #5 | mov r3, #0 | ldrexd r4, [r0] | strexd r4, r2, [r0] | teq r4, #0 | bne 484 | ldrexd r2, [r0] | dmb ish | pop {r4, r5} | bx lr ... and also on 64-bit, e.g. for arm64 defconfig: | : | bti c | paciasp | mov x1, #0x5 | stlr x1, [x0] | ldar x0, [x0] | autiasp | ret Reported-by: Arnd Bergmann Signed-off-by: Mark Rutland Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Ard Biesheuvel Reviewed-by: Boqun Feng Link: https://lore.kernel.org/r/20220207101943.439825-1-mark.rutland@arm.com --- include/linux/atomic/atomic-arch-fallback.h | 38 +++++++++++++++++++++++++---- 1 file changed, 33 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h index a3dba31df01e..6db58d180866 100644 --- a/include/linux/atomic/atomic-arch-fallback.h +++ b/include/linux/atomic/atomic-arch-fallback.h @@ -151,7 +151,16 @@ static __always_inline int arch_atomic_read_acquire(const atomic_t *v) { - return smp_load_acquire(&(v)->counter); + int ret; + + if (__native_word(atomic_t)) { + ret = smp_load_acquire(&(v)->counter); + } else { + ret = arch_atomic_read(v); + __atomic_acquire_fence(); + } + + return ret; } #define arch_atomic_read_acquire arch_atomic_read_acquire #endif @@ -160,7 +169,12 @@ arch_atomic_read_acquire(const atomic_t *v) static __always_inline void arch_atomic_set_release(atomic_t *v, int i) { - smp_store_release(&(v)->counter, i); + if (__native_word(atomic_t)) { + smp_store_release(&(v)->counter, i); + } else { + __atomic_release_fence(); + arch_atomic_set(v, i); + } } #define arch_atomic_set_release arch_atomic_set_release #endif @@ -1258,7 +1272,16 @@ arch_atomic_dec_if_positive(atomic_t *v) static __always_inline s64 arch_atomic64_read_acquire(const atomic64_t *v) { - return smp_load_acquire(&(v)->counter); + s64 ret; + + if (__native_word(atomic64_t)) { + ret = smp_load_acquire(&(v)->counter); + } else { + ret = arch_atomic64_read(v); + __atomic_acquire_fence(); + } + + return ret; } #define arch_atomic64_read_acquire arch_atomic64_read_acquire #endif @@ -1267,7 +1290,12 @@ arch_atomic64_read_acquire(const atomic64_t *v) static __always_inline void arch_atomic64_set_release(atomic64_t *v, s64 i) { - smp_store_release(&(v)->counter, i); + if (__native_word(atomic64_t)) { + smp_store_release(&(v)->counter, i); + } else { + __atomic_release_fence(); + arch_atomic64_set(v, i); + } } #define arch_atomic64_set_release arch_atomic64_set_release #endif @@ -2358,4 +2386,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v) #endif #endif /* _LINUX_ATOMIC_FALLBACK_H */ -// cca554917d7ea73d5e3e7397dd70c484cad9b2c4 +// 8e2cc06bc0d2c0967d2f8424762bd48555ee40ae -- cgit From 9983a9d577db415c41099a20a5637ab25dd3c240 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Tue, 8 Feb 2022 18:08:02 +0100 Subject: locking/local_lock: Make the empty local_lock_*() function a macro. It has been said that local_lock() does not add any overhead compared to preempt_disable() in a !LOCKDEP configuration. A micro benchmark showed an unexpected result which can be reduced to the fact that local_lock() was not entirely optimized away. In the !LOCKDEP configuration local_lock_acquire() is an empty static inline function. On x86 the this_cpu_ptr() argument of that function is fully evaluated leading to an additional mov+add instructions which are not needed and not used. Replace the static inline function with a macro. The typecheck() macro ensures that the argument is of proper type while the resulting disassembly shows no traces of this_cpu_ptr(). Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Waiman Long Link: https://lkml.kernel.org/r/YgKjciR60fZft2l4@linutronix.de --- include/linux/local_lock_internal.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h index 975e33b793a7..6d635e8306d6 100644 --- a/include/linux/local_lock_internal.h +++ b/include/linux/local_lock_internal.h @@ -44,9 +44,9 @@ static inline void local_lock_debug_init(local_lock_t *l) } #else /* CONFIG_DEBUG_LOCK_ALLOC */ # define LOCAL_LOCK_DEBUG_INIT(lockname) -static inline void local_lock_acquire(local_lock_t *l) { } -static inline void local_lock_release(local_lock_t *l) { } -static inline void local_lock_debug_init(local_lock_t *l) { } +# define local_lock_acquire(__ll) do { typecheck(local_lock_t *, __ll); } while (0) +# define local_lock_release(__ll) do { typecheck(local_lock_t *, __ll); } while (0) +# define local_lock_debug_init(__ll) do { typecheck(local_lock_t *, __ll); } while (0) #endif /* !CONFIG_DEBUG_LOCK_ALLOC */ #define INIT_LOCAL_LOCK(lockname) { LOCAL_LOCK_DEBUG_INIT(lockname) } -- cgit From 48b6190a00425a1bebac9f7ae4b338a1e20f50f3 Mon Sep 17 00:00:00 2001 From: "D. Wythe" Date: Thu, 10 Feb 2022 17:11:36 +0800 Subject: net/smc: Limit SMC visits when handshake workqueue congested This patch intends to provide a mechanism to put constraint on SMC connections visit according to the pressure of SMC handshake process. At present, frequent visits will cause the incoming connections to be backlogged in SMC handshake queue, raise the connections established time. Which is quite unacceptable for those applications who base on short lived connections. There are two ways to implement this mechanism: 1. Put limitation after TCP established. 2. Put limitation before TCP established. In the first way, we need to wait and receive CLC messages that the client will potentially send, and then actively reply with a decline message, in a sense, which is also a sort of SMC handshake, affect the connections established time on its way. In the second way, the only problem is that we need to inject SMC logic into TCP when it is about to reply the incoming SYN, since we already do that, it's seems not a problem anymore. And advantage is obvious, few additional processes are required to complete the constraint. This patch use the second way. After this patch, connections who beyond constraint will not informed any SMC indication, and SMC will not be involved in any of its subsequent processes. Link: https://lore.kernel.org/all/1641301961-59331-1-git-send-email-alibuda@linux.alibaba.com/ Signed-off-by: D. Wythe Signed-off-by: David S. Miller --- include/linux/tcp.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 78b91bb92f0d..1168302b7927 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -394,6 +394,7 @@ struct tcp_sock { bool is_mptcp; #endif #if IS_ENABLED(CONFIG_SMC) + bool (*smc_hs_congested)(const struct sock *sk); bool syn_smc; /* SYN includes SMC */ #endif -- cgit From a6a6fe27bab48f0d09a64b051e7bde432fcae081 Mon Sep 17 00:00:00 2001 From: "D. Wythe" Date: Thu, 10 Feb 2022 17:11:37 +0800 Subject: net/smc: Dynamic control handshake limitation by socket options This patch aims to add dynamic control for SMC handshake limitation for every smc sockets, in production environment, it is possible for the same applications to handle different service types, and may have different opinion on SMC handshake limitation. This patch try socket options to complete it, since we don't have socket option level for SMC yet, which requires us to implement it at the same time. This patch does the following: - add new socket option level: SOL_SMC. - add new SMC socket option: SMC_LIMIT_HS. - provide getter/setter for SMC socket options. Link: https://lore.kernel.org/all/20f504f961e1a803f85d64229ad84260434203bd.1644323503.git.alibuda@linux.alibaba.com/ Signed-off-by: D. Wythe Signed-off-by: David S. Miller --- include/linux/socket.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/socket.h b/include/linux/socket.h index 8ef26d89ef49..6f85f5d957ef 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -366,6 +366,7 @@ struct ucred { #define SOL_XDP 283 #define SOL_MPTCP 284 #define SOL_MCTP 285 +#define SOL_SMC 286 /* IPX options */ #define IPX_TYPE 1 -- cgit From 0e51e2ab49a99bc5077760aa083dfa1c3bf9899b Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Fri, 11 Feb 2022 18:11:47 +0800 Subject: block: remove THROTL_IOPS_MAX No one uses THROTL_IOPS_MAX any more, so remove it. Signed-off-by: Ming Lei Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220211101149.2368042-2-ming.lei@redhat.com Signed-off-by: Jens Axboe --- include/linux/blk-cgroup.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index b4de2010fba5..bdc49bd4eef0 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -28,8 +28,6 @@ /* percpu_counter batch for blkg_[rw]stats, per-cpu drift doesn't matter */ #define BLKG_STAT_CPU_BATCH (INT_MAX / 2) -/* Max limits for throttle policy */ -#define THROTL_IOPS_MAX UINT_MAX #define FC_APPID_LEN 129 -- cgit From 672fdcf0e7de3b1e39416ac85abf178f023271f1 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Fri, 11 Feb 2022 18:11:49 +0800 Subject: block: partition include/linux/blk-cgroup.h Partition include/linux/blk-cgroup.h into two parts: one is public part, the other is block layer private part. Suggested by Christoph Hellwig. Signed-off-by: Ming Lei Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220211101149.2368042-4-ming.lei@redhat.com Signed-off-by: Jens Axboe --- include/linux/blk-cgroup.h | 459 +-------------------------------------------- 1 file changed, 5 insertions(+), 454 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index bdc49bd4eef0..f2ad8ed8f777 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -25,12 +25,8 @@ #include #include -/* percpu_counter batch for blkg_[rw]stats, per-cpu drift doesn't matter */ -#define BLKG_STAT_CPU_BATCH (INT_MAX / 2) - #define FC_APPID_LEN 129 - #ifdef CONFIG_BLK_CGROUP enum blkg_iostat_type { @@ -42,6 +38,7 @@ enum blkg_iostat_type { }; struct blkcg_gq; +struct blkg_policy_data; struct blkcg { struct cgroup_subsys_state css; @@ -74,36 +71,6 @@ struct blkg_iostat_set { struct blkg_iostat last; }; -/* - * A blkcg_gq (blkg) is association between a block cgroup (blkcg) and a - * request_queue (q). This is used by blkcg policies which need to track - * information per blkcg - q pair. - * - * There can be multiple active blkcg policies and each blkg:policy pair is - * represented by a blkg_policy_data which is allocated and freed by each - * policy's pd_alloc/free_fn() methods. A policy can allocate private data - * area by allocating larger data structure which embeds blkg_policy_data - * at the beginning. - */ -struct blkg_policy_data { - /* the blkg and policy id this per-policy data belongs to */ - struct blkcg_gq *blkg; - int plid; -}; - -/* - * Policies that need to keep per-blkcg data which is independent from any - * request_queue associated to it should implement cpd_alloc/free_fn() - * methods. A policy can allocate private data area by allocating larger - * data structure which embeds blkcg_policy_data at the beginning. - * cpd_init() is invoked to let each policy handle per-blkcg data. - */ -struct blkcg_policy_data { - /* the blkcg and policy id this per-policy data belongs to */ - struct blkcg *blkcg; - int plid; -}; - /* association between a blk cgroup and a request queue */ struct blkcg_gq { /* Pointer to the associated request_queue */ @@ -139,120 +106,17 @@ struct blkcg_gq { struct rcu_head rcu_head; }; -typedef struct blkcg_policy_data *(blkcg_pol_alloc_cpd_fn)(gfp_t gfp); -typedef void (blkcg_pol_init_cpd_fn)(struct blkcg_policy_data *cpd); -typedef void (blkcg_pol_free_cpd_fn)(struct blkcg_policy_data *cpd); -typedef void (blkcg_pol_bind_cpd_fn)(struct blkcg_policy_data *cpd); -typedef struct blkg_policy_data *(blkcg_pol_alloc_pd_fn)(gfp_t gfp, - struct request_queue *q, struct blkcg *blkcg); -typedef void (blkcg_pol_init_pd_fn)(struct blkg_policy_data *pd); -typedef void (blkcg_pol_online_pd_fn)(struct blkg_policy_data *pd); -typedef void (blkcg_pol_offline_pd_fn)(struct blkg_policy_data *pd); -typedef void (blkcg_pol_free_pd_fn)(struct blkg_policy_data *pd); -typedef void (blkcg_pol_reset_pd_stats_fn)(struct blkg_policy_data *pd); -typedef bool (blkcg_pol_stat_pd_fn)(struct blkg_policy_data *pd, - struct seq_file *s); - -struct blkcg_policy { - int plid; - /* cgroup files for the policy */ - struct cftype *dfl_cftypes; - struct cftype *legacy_cftypes; - - /* operations */ - blkcg_pol_alloc_cpd_fn *cpd_alloc_fn; - blkcg_pol_init_cpd_fn *cpd_init_fn; - blkcg_pol_free_cpd_fn *cpd_free_fn; - blkcg_pol_bind_cpd_fn *cpd_bind_fn; - - blkcg_pol_alloc_pd_fn *pd_alloc_fn; - blkcg_pol_init_pd_fn *pd_init_fn; - blkcg_pol_online_pd_fn *pd_online_fn; - blkcg_pol_offline_pd_fn *pd_offline_fn; - blkcg_pol_free_pd_fn *pd_free_fn; - blkcg_pol_reset_pd_stats_fn *pd_reset_stats_fn; - blkcg_pol_stat_pd_fn *pd_stat_fn; -}; - -extern struct blkcg blkcg_root; extern struct cgroup_subsys_state * const blkcg_root_css; -extern bool blkcg_debug_stats; - -struct blkcg_gq *blkg_lookup_slowpath(struct blkcg *blkcg, - struct request_queue *q, bool update_hint); -int blkcg_init_queue(struct request_queue *q); -void blkcg_exit_queue(struct request_queue *q); - -/* Blkio controller policy registration */ -int blkcg_policy_register(struct blkcg_policy *pol); -void blkcg_policy_unregister(struct blkcg_policy *pol); -int blkcg_activate_policy(struct request_queue *q, - const struct blkcg_policy *pol); -void blkcg_deactivate_policy(struct request_queue *q, - const struct blkcg_policy *pol); - -const char *blkg_dev_name(struct blkcg_gq *blkg); -void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg, - u64 (*prfill)(struct seq_file *, - struct blkg_policy_data *, int), - const struct blkcg_policy *pol, int data, - bool show_total); -u64 __blkg_prfill_u64(struct seq_file *sf, struct blkg_policy_data *pd, u64 v); - -struct blkg_conf_ctx { - struct block_device *bdev; - struct blkcg_gq *blkg; - char *body; -}; - -struct block_device *blkcg_conf_open_bdev(char **inputp); -int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol, - char *input, struct blkg_conf_ctx *ctx); -void blkg_conf_finish(struct blkg_conf_ctx *ctx); -/** - * blkcg_css - find the current css - * - * Find the css associated with either the kthread or the current task. - * This may return a dying css, so it is up to the caller to use tryget logic - * to confirm it is alive and well. - */ -static inline struct cgroup_subsys_state *blkcg_css(void) -{ - struct cgroup_subsys_state *css; - - css = kthread_blkcg(); - if (css) - return css; - return task_css(current, io_cgrp_id); -} +void blkcg_destroy_blkgs(struct blkcg *blkcg); +void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay); +void blkcg_maybe_throttle_current(void); static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css) { return css ? container_of(css, struct blkcg, css) : NULL; } -/** - * __bio_blkcg - internal, inconsistent version to get blkcg - * - * DO NOT USE. - * This function is inconsistent and consequently is dangerous to use. The - * first part of the function returns a blkcg where a reference is owned by the - * bio. This means it does not need to be rcu protected as it cannot go away - * with the bio owning a reference to it. However, the latter potentially gets - * it from task_css(). This can race against task migration and the cgroup - * dying. It is also semantically different as it must be called rcu protected - * and is susceptible to failure when trying to get a reference to it. - * Therefore, it is not ok to assume that *_get() will always succeed on the - * blkcg returned here. - */ -static inline struct blkcg *__bio_blkcg(struct bio *bio) -{ - if (bio && bio->bi_blkg) - return bio->bi_blkg->blkcg; - return css_to_blkcg(blkcg_css()); -} - /** * bio_blkcg - grab the blkcg associated with a bio * @bio: target bio @@ -288,22 +152,6 @@ static inline bool blk_cgroup_congested(void) return ret; } -/** - * bio_issue_as_root_blkg - see if this bio needs to be issued as root blkg - * @return: true if this bio needs to be submitted with the root blkg context. - * - * In order to avoid priority inversions we sometimes need to issue a bio as if - * it were attached to the root blkg, and then backcharge to the actual owning - * blkg. The idea is we do bio_blkcg() to look up the actual context for the - * bio and attach the appropriate blkg to the bio. Then we call this helper and - * if it is true run with the root blkg for that queue and then do any - * backcharging to the originating cgroup once the io is complete. - */ -static inline bool bio_issue_as_root_blkg(struct bio *bio) -{ - return (bio->bi_opf & (REQ_META | REQ_SWAP)) != 0; -} - /** * blkcg_parent - get the parent of a blkcg * @blkcg: blkcg of interest @@ -315,96 +163,6 @@ static inline struct blkcg *blkcg_parent(struct blkcg *blkcg) return css_to_blkcg(blkcg->css.parent); } -/** - * __blkg_lookup - internal version of blkg_lookup() - * @blkcg: blkcg of interest - * @q: request_queue of interest - * @update_hint: whether to update lookup hint with the result or not - * - * This is internal version and shouldn't be used by policy - * implementations. Looks up blkgs for the @blkcg - @q pair regardless of - * @q's bypass state. If @update_hint is %true, the caller should be - * holding @q->queue_lock and lookup hint is updated on success. - */ -static inline struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg, - struct request_queue *q, - bool update_hint) -{ - struct blkcg_gq *blkg; - - if (blkcg == &blkcg_root) - return q->root_blkg; - - blkg = rcu_dereference(blkcg->blkg_hint); - if (blkg && blkg->q == q) - return blkg; - - return blkg_lookup_slowpath(blkcg, q, update_hint); -} - -/** - * blkg_lookup - lookup blkg for the specified blkcg - q pair - * @blkcg: blkcg of interest - * @q: request_queue of interest - * - * Lookup blkg for the @blkcg - @q pair. This function should be called - * under RCU read lock. - */ -static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, - struct request_queue *q) -{ - WARN_ON_ONCE(!rcu_read_lock_held()); - return __blkg_lookup(blkcg, q, false); -} - -/** - * blk_queue_root_blkg - return blkg for the (blkcg_root, @q) pair - * @q: request_queue of interest - * - * Lookup blkg for @q at the root level. See also blkg_lookup(). - */ -static inline struct blkcg_gq *blk_queue_root_blkg(struct request_queue *q) -{ - return q->root_blkg; -} - -/** - * blkg_to_pdata - get policy private data - * @blkg: blkg of interest - * @pol: policy of interest - * - * Return pointer to private data associated with the @blkg-@pol pair. - */ -static inline struct blkg_policy_data *blkg_to_pd(struct blkcg_gq *blkg, - struct blkcg_policy *pol) -{ - return blkg ? blkg->pd[pol->plid] : NULL; -} - -static inline struct blkcg_policy_data *blkcg_to_cpd(struct blkcg *blkcg, - struct blkcg_policy *pol) -{ - return blkcg ? blkcg->cpd[pol->plid] : NULL; -} - -/** - * pdata_to_blkg - get blkg associated with policy private data - * @pd: policy private data of interest - * - * @pd is policy private data. Determine the blkg it's associated with. - */ -static inline struct blkcg_gq *pd_to_blkg(struct blkg_policy_data *pd) -{ - return pd ? pd->blkg : NULL; -} - -static inline struct blkcg *cpd_to_blkcg(struct blkcg_policy_data *cpd) -{ - return cpd ? cpd->blkcg : NULL; -} - -extern void blkcg_destroy_blkgs(struct blkcg *blkcg); - /** * blkcg_pin_online - pin online state * @blkcg: blkcg of interest @@ -437,231 +195,24 @@ static inline void blkcg_unpin_online(struct blkcg *blkcg) } while (blkcg); } -/** - * blkg_path - format cgroup path of blkg - * @blkg: blkg of interest - * @buf: target buffer - * @buflen: target buffer length - * - * Format the path of the cgroup of @blkg into @buf. - */ -static inline int blkg_path(struct blkcg_gq *blkg, char *buf, int buflen) -{ - return cgroup_path(blkg->blkcg->css.cgroup, buf, buflen); -} - -/** - * blkg_get - get a blkg reference - * @blkg: blkg to get - * - * The caller should be holding an existing reference. - */ -static inline void blkg_get(struct blkcg_gq *blkg) -{ - percpu_ref_get(&blkg->refcnt); -} - -/** - * blkg_tryget - try and get a blkg reference - * @blkg: blkg to get - * - * This is for use when doing an RCU lookup of the blkg. We may be in the midst - * of freeing this blkg, so we can only use it if the refcnt is not zero. - */ -static inline bool blkg_tryget(struct blkcg_gq *blkg) -{ - return blkg && percpu_ref_tryget(&blkg->refcnt); -} - -/** - * blkg_put - put a blkg reference - * @blkg: blkg to put - */ -static inline void blkg_put(struct blkcg_gq *blkg) -{ - percpu_ref_put(&blkg->refcnt); -} - -/** - * blkg_for_each_descendant_pre - pre-order walk of a blkg's descendants - * @d_blkg: loop cursor pointing to the current descendant - * @pos_css: used for iteration - * @p_blkg: target blkg to walk descendants of - * - * Walk @c_blkg through the descendants of @p_blkg. Must be used with RCU - * read locked. If called under either blkcg or queue lock, the iteration - * is guaranteed to include all and only online blkgs. The caller may - * update @pos_css by calling css_rightmost_descendant() to skip subtree. - * @p_blkg is included in the iteration and the first node to be visited. - */ -#define blkg_for_each_descendant_pre(d_blkg, pos_css, p_blkg) \ - css_for_each_descendant_pre((pos_css), &(p_blkg)->blkcg->css) \ - if (((d_blkg) = __blkg_lookup(css_to_blkcg(pos_css), \ - (p_blkg)->q, false))) - -/** - * blkg_for_each_descendant_post - post-order walk of a blkg's descendants - * @d_blkg: loop cursor pointing to the current descendant - * @pos_css: used for iteration - * @p_blkg: target blkg to walk descendants of - * - * Similar to blkg_for_each_descendant_pre() but performs post-order - * traversal instead. Synchronization rules are the same. @p_blkg is - * included in the iteration and the last node to be visited. - */ -#define blkg_for_each_descendant_post(d_blkg, pos_css, p_blkg) \ - css_for_each_descendant_post((pos_css), &(p_blkg)->blkcg->css) \ - if (((d_blkg) = __blkg_lookup(css_to_blkcg(pos_css), \ - (p_blkg)->q, false))) - -bool __blkcg_punt_bio_submit(struct bio *bio); - -static inline bool blkcg_punt_bio_submit(struct bio *bio) -{ - if (bio->bi_opf & REQ_CGROUP_PUNT) - return __blkcg_punt_bio_submit(bio); - else - return false; -} - -static inline void blkcg_bio_issue_init(struct bio *bio) -{ - bio_issue_init(&bio->bi_issue, bio_sectors(bio)); -} - -static inline void blkcg_use_delay(struct blkcg_gq *blkg) -{ - if (WARN_ON_ONCE(atomic_read(&blkg->use_delay) < 0)) - return; - if (atomic_add_return(1, &blkg->use_delay) == 1) - atomic_inc(&blkg->blkcg->css.cgroup->congestion_count); -} - -static inline int blkcg_unuse_delay(struct blkcg_gq *blkg) -{ - int old = atomic_read(&blkg->use_delay); - - if (WARN_ON_ONCE(old < 0)) - return 0; - if (old == 0) - return 0; - - /* - * We do this song and dance because we can race with somebody else - * adding or removing delay. If we just did an atomic_dec we'd end up - * negative and we'd already be in trouble. We need to subtract 1 and - * then check to see if we were the last delay so we can drop the - * congestion count on the cgroup. - */ - while (old) { - int cur = atomic_cmpxchg(&blkg->use_delay, old, old - 1); - if (cur == old) - break; - old = cur; - } - - if (old == 0) - return 0; - if (old == 1) - atomic_dec(&blkg->blkcg->css.cgroup->congestion_count); - return 1; -} - -/** - * blkcg_set_delay - Enable allocator delay mechanism with the specified delay amount - * @blkg: target blkg - * @delay: delay duration in nsecs - * - * When enabled with this function, the delay is not decayed and must be - * explicitly cleared with blkcg_clear_delay(). Must not be mixed with - * blkcg_[un]use_delay() and blkcg_add_delay() usages. - */ -static inline void blkcg_set_delay(struct blkcg_gq *blkg, u64 delay) -{ - int old = atomic_read(&blkg->use_delay); - - /* We only want 1 person setting the congestion count for this blkg. */ - if (!old && atomic_cmpxchg(&blkg->use_delay, old, -1) == old) - atomic_inc(&blkg->blkcg->css.cgroup->congestion_count); - - atomic64_set(&blkg->delay_nsec, delay); -} - -/** - * blkcg_clear_delay - Disable allocator delay mechanism - * @blkg: target blkg - * - * Disable use_delay mechanism. See blkcg_set_delay(). - */ -static inline void blkcg_clear_delay(struct blkcg_gq *blkg) -{ - int old = atomic_read(&blkg->use_delay); - - /* We only want 1 person clearing the congestion count for this blkg. */ - if (old && atomic_cmpxchg(&blkg->use_delay, old, 0) == old) - atomic_dec(&blkg->blkcg->css.cgroup->congestion_count); -} - -void blk_cgroup_bio_start(struct bio *bio); -void blkcg_add_delay(struct blkcg_gq *blkg, u64 now, u64 delta); -void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay); -void blkcg_maybe_throttle_current(void); #else /* CONFIG_BLK_CGROUP */ struct blkcg { }; -struct blkg_policy_data { -}; - -struct blkcg_policy_data { -}; - struct blkcg_gq { }; -struct blkcg_policy { -}; - #define blkcg_root_css ((struct cgroup_subsys_state *)ERR_PTR(-EINVAL)) static inline void blkcg_maybe_throttle_current(void) { } static inline bool blk_cgroup_congested(void) { return false; } #ifdef CONFIG_BLOCK - static inline void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay) { } - -static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, void *key) { return NULL; } -static inline struct blkcg_gq *blk_queue_root_blkg(struct request_queue *q) -{ return NULL; } -static inline int blkcg_init_queue(struct request_queue *q) { return 0; } -static inline void blkcg_exit_queue(struct request_queue *q) { } -static inline int blkcg_policy_register(struct blkcg_policy *pol) { return 0; } -static inline void blkcg_policy_unregister(struct blkcg_policy *pol) { } -static inline int blkcg_activate_policy(struct request_queue *q, - const struct blkcg_policy *pol) { return 0; } -static inline void blkcg_deactivate_policy(struct request_queue *q, - const struct blkcg_policy *pol) { } - -static inline struct blkcg *__bio_blkcg(struct bio *bio) { return NULL; } static inline struct blkcg *bio_blkcg(struct bio *bio) { return NULL; } +#endif /* CONFIG_BLOCK */ -static inline struct blkg_policy_data *blkg_to_pd(struct blkcg_gq *blkg, - struct blkcg_policy *pol) { return NULL; } -static inline struct blkcg_gq *pd_to_blkg(struct blkg_policy_data *pd) { return NULL; } -static inline char *blkg_path(struct blkcg_gq *blkg) { return NULL; } -static inline void blkg_get(struct blkcg_gq *blkg) { } -static inline void blkg_put(struct blkcg_gq *blkg) { } - -static inline bool blkcg_punt_bio_submit(struct bio *bio) { return false; } -static inline void blkcg_bio_issue_init(struct bio *bio) { } -static inline void blk_cgroup_bio_start(struct bio *bio) { } - -#define blk_queue_for_each_rl(rl, q) \ - for ((rl) = &(q)->root_rl; (rl); (rl) = NULL) - -#endif /* CONFIG_BLOCK */ #endif /* CONFIG_BLK_CGROUP */ #ifdef CONFIG_BLK_CGROUP_FC_APPID -- cgit From a8abb0c3dc1e28454851a00f8b7333d9695d566c Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Wed, 9 Feb 2022 12:33:23 +0530 Subject: bpf: Fix crash due to incorrect copy_map_value When both bpf_spin_lock and bpf_timer are present in a BPF map value, copy_map_value needs to skirt both objects when copying a value into and out of the map. However, the current code does not set both s_off and t_off in copy_map_value, which leads to a crash when e.g. bpf_spin_lock is placed in map value with bpf_timer, as bpf_map_update_elem call will be able to overwrite the other timer object. When the issue is not fixed, an overwriting can produce the following splat: [root@(none) bpf]# ./test_progs -t timer_crash [ 15.930339] bpf_testmod: loading out-of-tree module taints kernel. [ 16.037849] ================================================================== [ 16.038458] BUG: KASAN: user-memory-access in __pv_queued_spin_lock_slowpath+0x32b/0x520 [ 16.038944] Write of size 8 at addr 0000000000043ec0 by task test_progs/325 [ 16.039399] [ 16.039514] CPU: 0 PID: 325 Comm: test_progs Tainted: G OE 5.16.0+ #278 [ 16.039983] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.15.0-1 04/01/2014 [ 16.040485] Call Trace: [ 16.040645] [ 16.040805] dump_stack_lvl+0x59/0x73 [ 16.041069] ? __pv_queued_spin_lock_slowpath+0x32b/0x520 [ 16.041427] kasan_report.cold+0x116/0x11b [ 16.041673] ? __pv_queued_spin_lock_slowpath+0x32b/0x520 [ 16.042040] __pv_queued_spin_lock_slowpath+0x32b/0x520 [ 16.042328] ? memcpy+0x39/0x60 [ 16.042552] ? pv_hash+0xd0/0xd0 [ 16.042785] ? lockdep_hardirqs_off+0x95/0xd0 [ 16.043079] __bpf_spin_lock_irqsave+0xdf/0xf0 [ 16.043366] ? bpf_get_current_comm+0x50/0x50 [ 16.043608] ? jhash+0x11a/0x270 [ 16.043848] bpf_timer_cancel+0x34/0xe0 [ 16.044119] bpf_prog_c4ea1c0f7449940d_sys_enter+0x7c/0x81 [ 16.044500] bpf_trampoline_6442477838_0+0x36/0x1000 [ 16.044836] __x64_sys_nanosleep+0x5/0x140 [ 16.045119] do_syscall_64+0x59/0x80 [ 16.045377] ? lock_is_held_type+0xe4/0x140 [ 16.045670] ? irqentry_exit_to_user_mode+0xa/0x40 [ 16.046001] ? mark_held_locks+0x24/0x90 [ 16.046287] ? asm_exc_page_fault+0x1e/0x30 [ 16.046569] ? asm_exc_page_fault+0x8/0x30 [ 16.046851] ? lockdep_hardirqs_on+0x7e/0x100 [ 16.047137] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 16.047405] RIP: 0033:0x7f9e4831718d [ 16.047602] Code: b4 0c 00 0f 05 eb a9 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b3 6c 0c 00 f7 d8 64 89 01 48 [ 16.048764] RSP: 002b:00007fff488086b8 EFLAGS: 00000206 ORIG_RAX: 0000000000000023 [ 16.049275] RAX: ffffffffffffffda RBX: 00007f9e48683740 RCX: 00007f9e4831718d [ 16.049747] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007fff488086d0 [ 16.050225] RBP: 00007fff488086f0 R08: 00007fff488085d7 R09: 00007f9e4cb594a0 [ 16.050648] R10: 0000000000000000 R11: 0000000000000206 R12: 00007f9e484cde30 [ 16.051124] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 16.051608] [ 16.051762] ================================================================== Fixes: 68134668c17f ("bpf: Add map side support for bpf timers.") Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com --- include/linux/bpf.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index fa517ae604ad..31a83449808b 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -224,7 +224,8 @@ static inline void copy_map_value(struct bpf_map *map, void *dst, void *src) if (unlikely(map_value_has_spin_lock(map))) { s_off = map->spin_lock_off; s_sz = sizeof(struct bpf_spin_lock); - } else if (unlikely(map_value_has_timer(map))) { + } + if (unlikely(map_value_has_timer(map))) { t_off = map->timer_off; t_sz = sizeof(struct bpf_timer); } -- cgit From 5eaed6eedbe9612f642ad2b880f961d1c6c8ec2b Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Fri, 11 Feb 2022 11:49:53 -0800 Subject: bpf: Fix a bpf_timer initialization issue MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The patch in [1] intends to fix a bpf_timer related issue, but the fix caused existing 'timer' selftest to fail with hang or some random errors. After some debug, I found an issue with check_and_init_map_value() in the hashtab.c. More specifically, in hashtab.c, we have code l_new = bpf_map_kmalloc_node(&htab->map, ...) check_and_init_map_value(&htab->map, l_new...) Note that bpf_map_kmalloc_node() does not do initialization so l_new contains random value. The function check_and_init_map_value() intends to zero the bpf_spin_lock and bpf_timer if they exist in the map. But I found bpf_spin_lock is zero'ed but bpf_timer is not zero'ed. With [1], later copy_map_value() skips copying of bpf_spin_lock and bpf_timer. The non-zero bpf_timer caused random failures for 'timer' selftest. Without [1], for both bpf_spin_lock and bpf_timer case, bpf_timer will be zero'ed, so 'timer' self test is okay. For check_and_init_map_value(), why bpf_spin_lock is zero'ed properly while bpf_timer not. In bpf uapi header, we have struct bpf_spin_lock { __u32 val; }; struct bpf_timer { __u64 :64; __u64 :64; } __attribute__((aligned(8))); The initialization code: *(struct bpf_spin_lock *)(dst + map->spin_lock_off) = (struct bpf_spin_lock){}; *(struct bpf_timer *)(dst + map->timer_off) = (struct bpf_timer){}; It appears the compiler has no obligation to initialize anonymous fields. For example, let us use clang with bpf target as below: $ cat t.c struct bpf_timer { unsigned long long :64; }; struct bpf_timer2 { unsigned long long a; }; void test(struct bpf_timer *t) { *t = (struct bpf_timer){}; } void test2(struct bpf_timer2 *t) { *t = (struct bpf_timer2){}; } $ clang -target bpf -O2 -c -g t.c $ llvm-objdump -d t.o ... 0000000000000000 : 0: 95 00 00 00 00 00 00 00 exit 0000000000000008 : 1: b7 02 00 00 00 00 00 00 r2 = 0 2: 7b 21 00 00 00 00 00 00 *(u64 *)(r1 + 0) = r2 3: 95 00 00 00 00 00 00 00 exit gcc11.2 does not have the above issue. But from INTERNATIONAL STANDARD ©ISO/IEC ISO/IEC 9899:201x Programming languages — C http://www.open-std.org/Jtc1/sc22/wg14/www/docs/n1547.pdf page 157: Except where explicitly stated otherwise, for the purposes of this subclause unnamed members of objects of structure and union type do not participate in initialization. Unnamed members of structure objects have indeterminate value even after initialization. To fix the problem, let use memset for bpf_timer case in check_and_init_map_value(). For consistency, memset is also used for bpf_spin_lock case. [1] https://lore.kernel.org/bpf/20220209070324.1093182-2-memxor@gmail.com/ Fixes: 68134668c17f3 ("bpf: Add map side support for bpf timers.") Signed-off-by: Yonghong Song Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220211194953.3142152-1-yhs@fb.com --- include/linux/bpf.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 31a83449808b..d0ad379d1e62 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -209,11 +209,9 @@ static inline bool map_value_has_timer(const struct bpf_map *map) static inline void check_and_init_map_value(struct bpf_map *map, void *dst) { if (unlikely(map_value_has_spin_lock(map))) - *(struct bpf_spin_lock *)(dst + map->spin_lock_off) = - (struct bpf_spin_lock){}; + memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock)); if (unlikely(map_value_has_timer(map))) - *(struct bpf_timer *)(dst + map->timer_off) = - (struct bpf_timer){}; + memset(dst + map->timer_off, 0, sizeof(struct bpf_timer)); } /* copy everything but bpf_spin_lock and bpf_timer. There could be one of each. */ -- cgit From e496132ebedd870b67f1f6d2428f9bb9d7ae27fd Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Tue, 8 Feb 2022 09:43:34 +0000 Subject: sched/fair: Adjust the allowed NUMA imbalance when SD_NUMA spans multiple LLCs Commit 7d2b5dd0bcc4 ("sched/numa: Allow a floating imbalance between NUMA nodes") allowed an imbalance between NUMA nodes such that communicating tasks would not be pulled apart by the load balancer. This works fine when there is a 1:1 relationship between LLC and node but can be suboptimal for multiple LLCs if independent tasks prematurely use CPUs sharing cache. Zen* has multiple LLCs per node with local memory channels and due to the allowed imbalance, it's far harder to tune some workloads to run optimally than it is on hardware that has 1 LLC per node. This patch allows an imbalance to exist up to the point where LLCs should be balanced between nodes. On a Zen3 machine running STREAM parallelised with OMP to have on instance per LLC the results and without binding, the results are 5.17.0-rc0 5.17.0-rc0 vanilla sched-numaimb-v6 MB/sec copy-16 162596.94 ( 0.00%) 580559.74 ( 257.05%) MB/sec scale-16 136901.28 ( 0.00%) 374450.52 ( 173.52%) MB/sec add-16 157300.70 ( 0.00%) 564113.76 ( 258.62%) MB/sec triad-16 151446.88 ( 0.00%) 564304.24 ( 272.61%) STREAM can use directives to force the spread if the OpenMP is new enough but that doesn't help if an application uses threads and it's not known in advance how many threads will be created. Coremark is a CPU and cache intensive benchmark parallelised with threads. When running with 1 thread per core, the vanilla kernel allows threads to contend on cache. With the patch; 5.17.0-rc0 5.17.0-rc0 vanilla sched-numaimb-v5 Min Score-16 368239.36 ( 0.00%) 389816.06 ( 5.86%) Hmean Score-16 388607.33 ( 0.00%) 427877.08 * 10.11%* Max Score-16 408945.69 ( 0.00%) 481022.17 ( 17.62%) Stddev Score-16 15247.04 ( 0.00%) 24966.82 ( -63.75%) CoeffVar Score-16 3.92 ( 0.00%) 5.82 ( -48.48%) It can also make a big difference for semi-realistic workloads like specjbb which can execute arbitrary numbers of threads without advance knowledge of how they should be placed. Even in cases where the average performance is neutral, the results are more stable. 5.17.0-rc0 5.17.0-rc0 vanilla sched-numaimb-v6 Hmean tput-1 71631.55 ( 0.00%) 73065.57 ( 2.00%) Hmean tput-8 582758.78 ( 0.00%) 556777.23 ( -4.46%) Hmean tput-16 1020372.75 ( 0.00%) 1009995.26 ( -1.02%) Hmean tput-24 1416430.67 ( 0.00%) 1398700.11 ( -1.25%) Hmean tput-32 1687702.72 ( 0.00%) 1671357.04 ( -0.97%) Hmean tput-40 1798094.90 ( 0.00%) 2015616.46 * 12.10%* Hmean tput-48 1972731.77 ( 0.00%) 2333233.72 ( 18.27%) Hmean tput-56 2386872.38 ( 0.00%) 2759483.38 ( 15.61%) Hmean tput-64 2909475.33 ( 0.00%) 2925074.69 ( 0.54%) Hmean tput-72 2585071.36 ( 0.00%) 2962443.97 ( 14.60%) Hmean tput-80 2994387.24 ( 0.00%) 3015980.59 ( 0.72%) Hmean tput-88 3061408.57 ( 0.00%) 3010296.16 ( -1.67%) Hmean tput-96 3052394.82 ( 0.00%) 2784743.41 ( -8.77%) Hmean tput-104 2997814.76 ( 0.00%) 2758184.50 ( -7.99%) Hmean tput-112 2955353.29 ( 0.00%) 2859705.09 ( -3.24%) Hmean tput-120 2889770.71 ( 0.00%) 2764478.46 ( -4.34%) Hmean tput-128 2871713.84 ( 0.00%) 2750136.73 ( -4.23%) Stddev tput-1 5325.93 ( 0.00%) 2002.53 ( 62.40%) Stddev tput-8 6630.54 ( 0.00%) 10905.00 ( -64.47%) Stddev tput-16 25608.58 ( 0.00%) 6851.16 ( 73.25%) Stddev tput-24 12117.69 ( 0.00%) 4227.79 ( 65.11%) Stddev tput-32 27577.16 ( 0.00%) 8761.05 ( 68.23%) Stddev tput-40 59505.86 ( 0.00%) 2048.49 ( 96.56%) Stddev tput-48 168330.30 ( 0.00%) 93058.08 ( 44.72%) Stddev tput-56 219540.39 ( 0.00%) 30687.02 ( 86.02%) Stddev tput-64 121750.35 ( 0.00%) 9617.36 ( 92.10%) Stddev tput-72 223387.05 ( 0.00%) 34081.13 ( 84.74%) Stddev tput-80 128198.46 ( 0.00%) 22565.19 ( 82.40%) Stddev tput-88 136665.36 ( 0.00%) 27905.97 ( 79.58%) Stddev tput-96 111925.81 ( 0.00%) 99615.79 ( 11.00%) Stddev tput-104 146455.96 ( 0.00%) 28861.98 ( 80.29%) Stddev tput-112 88740.49 ( 0.00%) 58288.23 ( 34.32%) Stddev tput-120 186384.86 ( 0.00%) 45812.03 ( 75.42%) Stddev tput-128 78761.09 ( 0.00%) 57418.48 ( 27.10%) Similarly, for embarassingly parallel problems like NPB-ep, there are improvements due to better spreading across LLC when the machine is not fully utilised. vanilla sched-numaimb-v6 Min ep.D 31.79 ( 0.00%) 26.11 ( 17.87%) Amean ep.D 31.86 ( 0.00%) 26.17 * 17.86%* Stddev ep.D 0.07 ( 0.00%) 0.05 ( 24.41%) CoeffVar ep.D 0.22 ( 0.00%) 0.20 ( 7.97%) Max ep.D 31.93 ( 0.00%) 26.21 ( 17.91%) Signed-off-by: Mel Gorman Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Gautham R. Shenoy Tested-by: K Prateek Nayak Link: https://lore.kernel.org/r/20220208094334.16379-3-mgorman@techsingularity.net --- include/linux/sched/topology.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h index 8054641c0a7b..56cffe42abbc 100644 --- a/include/linux/sched/topology.h +++ b/include/linux/sched/topology.h @@ -93,6 +93,7 @@ struct sched_domain { unsigned int busy_factor; /* less balancing by factor if busy */ unsigned int imbalance_pct; /* No balance until over watermark */ unsigned int cache_nice_tries; /* Leave cache hot tasks for # tries */ + unsigned int imb_numa_nr; /* Nr running tasks that allows a NUMA imbalance */ int nohz_idle; /* NOHZ IDLE status */ int flags; /* See SD_* */ -- cgit From 0764db9b49c932b89ee4d9e3236dff4bb07b4a66 Mon Sep 17 00:00:00 2001 From: Roman Gushchin Date: Fri, 11 Feb 2022 16:32:32 -0800 Subject: mm: memcg: synchronize objcg lists with a dedicated spinlock Alexander reported a circular lock dependency revealed by the mmap1 ltp test: LOCKDEP_CIRCULAR (suite: ltp, case: mtest06 (mmap1)) WARNING: possible circular locking dependency detected 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Not tainted ------------------------------------------------------ mmap1/202299 is trying to acquire lock: 00000001892c0188 (css_set_lock){..-.}-{2:2}, at: obj_cgroup_release+0x4a/0xe0 but task is already holding lock: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&sighand->siglock){-.-.}-{2:2}: __lock_acquire+0x604/0xbd8 lock_acquire.part.0+0xe2/0x238 lock_acquire+0xb0/0x200 _raw_spin_lock_irqsave+0x6a/0xd8 __lock_task_sighand+0x90/0x190 cgroup_freeze_task+0x2e/0x90 cgroup_migrate_execute+0x11c/0x608 cgroup_update_dfl_csses+0x246/0x270 cgroup_subtree_control_write+0x238/0x518 kernfs_fop_write_iter+0x13e/0x1e0 new_sync_write+0x100/0x190 vfs_write+0x22c/0x2d8 ksys_write+0x6c/0xf8 __do_syscall+0x1da/0x208 system_call+0x82/0xb0 -> #0 (css_set_lock){..-.}-{2:2}: check_prev_add+0xe0/0xed8 validate_chain+0x736/0xb20 __lock_acquire+0x604/0xbd8 lock_acquire.part.0+0xe2/0x238 lock_acquire+0xb0/0x200 _raw_spin_lock_irqsave+0x6a/0xd8 obj_cgroup_release+0x4a/0xe0 percpu_ref_put_many.constprop.0+0x150/0x168 drain_obj_stock+0x94/0xe8 refill_obj_stock+0x94/0x278 obj_cgroup_charge+0x164/0x1d8 kmem_cache_alloc+0xac/0x528 __sigqueue_alloc+0x150/0x308 __send_signal+0x260/0x550 send_signal+0x7e/0x348 force_sig_info_to_task+0x104/0x180 force_sig_fault+0x48/0x58 __do_pgm_check+0x120/0x1f0 pgm_check_handler+0x11e/0x180 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sighand->siglock); lock(css_set_lock); lock(&sighand->siglock); lock(css_set_lock); *** DEADLOCK *** 2 locks held by mmap1/202299: #0: 00000000ca3b3818 (&sighand->siglock){-.-.}-{2:2}, at: force_sig_info_to_task+0x38/0x180 #1: 00000001892ad560 (rcu_read_lock){....}-{1:2}, at: percpu_ref_put_many.constprop.0+0x0/0x168 stack backtrace: CPU: 15 PID: 202299 Comm: mmap1 Not tainted 5.17.0-20220113.rc0.git0.f2211f194038.300.fc35.s390x+debug #1 Hardware name: IBM 3906 M04 704 (LPAR) Call Trace: dump_stack_lvl+0x76/0x98 check_noncircular+0x136/0x158 check_prev_add+0xe0/0xed8 validate_chain+0x736/0xb20 __lock_acquire+0x604/0xbd8 lock_acquire.part.0+0xe2/0x238 lock_acquire+0xb0/0x200 _raw_spin_lock_irqsave+0x6a/0xd8 obj_cgroup_release+0x4a/0xe0 percpu_ref_put_many.constprop.0+0x150/0x168 drain_obj_stock+0x94/0xe8 refill_obj_stock+0x94/0x278 obj_cgroup_charge+0x164/0x1d8 kmem_cache_alloc+0xac/0x528 __sigqueue_alloc+0x150/0x308 __send_signal+0x260/0x550 send_signal+0x7e/0x348 force_sig_info_to_task+0x104/0x180 force_sig_fault+0x48/0x58 __do_pgm_check+0x120/0x1f0 pgm_check_handler+0x11e/0x180 INFO: lockdep is turned off. In this example a slab allocation from __send_signal() caused a refilling and draining of a percpu objcg stock, resulted in a releasing of another non-related objcg. Objcg release path requires taking the css_set_lock, which is used to synchronize objcg lists. This can create a circular dependency with the sighandler lock, which is taken with the locked css_set_lock by the freezer code (to freeze a task). In general it seems that using css_set_lock to synchronize objcg lists makes any slab allocations and deallocation with the locked css_set_lock and any intervened locks risky. To fix the problem and make the code more robust let's stop using css_set_lock to synchronize objcg lists and use a new dedicated spinlock instead. Link: https://lkml.kernel.org/r/Yfm1IHmoGdyUR81T@carbon.dhcp.thefacebook.com Fixes: bf4f059954dc ("mm: memcg/slab: obj_cgroup API") Signed-off-by: Roman Gushchin Reported-by: Alexander Egorenkov Tested-by: Alexander Egorenkov Reviewed-by: Waiman Long Acked-by: Tejun Heo Reviewed-by: Shakeel Butt Reviewed-by: Jeremy Linton Tested-by: Jeremy Linton Cc: Johannes Weiner Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memcontrol.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index b72d75141e12..0abbd685703b 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -219,7 +219,7 @@ struct obj_cgroup { struct mem_cgroup *memcg; atomic_t nr_charged_bytes; union { - struct list_head list; + struct list_head list; /* protected by objcg_lock */ struct rcu_head rcu; }; }; @@ -315,7 +315,8 @@ struct mem_cgroup { #ifdef CONFIG_MEMCG_KMEM int kmemcg_id; struct obj_cgroup __rcu *objcg; - struct list_head objcg_list; /* list of inherited objcgs */ + /* list of inherited objcgs, protected by objcg_lock */ + struct list_head objcg_list; #endif MEMCG_PADDING(_pad2_); -- cgit From 8913c61001482378d4ed8cc577b17c1ba3e847e4 Mon Sep 17 00:00:00 2001 From: Peng Liu Date: Fri, 11 Feb 2022 16:32:35 -0800 Subject: kfence: make test case compatible with run time set sample interval The parameter kfence_sample_interval can be set via boot parameter and late shell command, which is convenient for automated tests and KFENCE parameter optimization. However, KFENCE test case just uses compile-time CONFIG_KFENCE_SAMPLE_INTERVAL, which will make KFENCE test case not run as users desired. Export kfence_sample_interval, so that KFENCE test case can use run-time-set sample interval. Link: https://lkml.kernel.org/r/20220207034432.185532-1-liupeng256@huawei.com Signed-off-by: Peng Liu Reviewed-by: Marco Elver Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: Jonathan Corbet Cc: Sumit Semwal Cc: Christian Knig Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kfence.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kfence.h b/include/linux/kfence.h index 4b5e3679a72c..f49e64222628 100644 --- a/include/linux/kfence.h +++ b/include/linux/kfence.h @@ -17,6 +17,8 @@ #include #include +extern unsigned long kfence_sample_interval; + /* * We allocate an even number of pages, as it simplifies calculations to map * address to metadata indices; effectively, the very first page serves as an -- cgit From 6d240170811aad7330e6d0b3857fb0d4d9c82b56 Mon Sep 17 00:00:00 2001 From: Peng Fan Date: Mon, 7 Feb 2022 10:05:40 +0800 Subject: firmware: imx: add get resource owner api Add resource owner management API, this API could be used to check whether M4 is under control of Linux. Signed-off-by: Peng Fan Signed-off-by: Shawn Guo --- include/linux/firmware/imx/svc/rm.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firmware/imx/svc/rm.h b/include/linux/firmware/imx/svc/rm.h index 456b6a59d29b..31456f897aa9 100644 --- a/include/linux/firmware/imx/svc/rm.h +++ b/include/linux/firmware/imx/svc/rm.h @@ -59,11 +59,16 @@ enum imx_sc_rm_func { #if IS_ENABLED(CONFIG_IMX_SCU) bool imx_sc_rm_is_resource_owned(struct imx_sc_ipc *ipc, u16 resource); +int imx_sc_rm_get_resource_owner(struct imx_sc_ipc *ipc, u16 resource, u8 *pt); #else static inline bool imx_sc_rm_is_resource_owned(struct imx_sc_ipc *ipc, u16 resource) { return true; } +static inline int imx_sc_rm_get_resource_owner(struct imx_sc_ipc *ipc, u16 resource, u8 *pt) +{ + return -EOPNOTSUPP; +} #endif #endif -- cgit From 3e96dcfb96e80d2f7f1edb6a1ac81b12de996fa8 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Fri, 11 Feb 2022 23:32:27 +0100 Subject: ARM: ixp4xx: Delete the Goramo MLR boardfile MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This board is replaced with the corresponding device tree. Also delete dangling platform data file only used by this boardfile and nothing else. Cc: Krzysztof Hałasa Signed-off-by: Linus Walleij Link: https://lore.kernel.org/r/20220211223238.648934-3-linus.walleij@linaro.org Signed-off-by: Linus Walleij --- include/linux/platform_data/wan_ixp4xx_hss.h | 17 ----------------- 1 file changed, 17 deletions(-) delete mode 100644 include/linux/platform_data/wan_ixp4xx_hss.h (limited to 'include/linux') diff --git a/include/linux/platform_data/wan_ixp4xx_hss.h b/include/linux/platform_data/wan_ixp4xx_hss.h deleted file mode 100644 index d525a0feb9e1..000000000000 --- a/include/linux/platform_data/wan_ixp4xx_hss.h +++ /dev/null @@ -1,17 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __PLATFORM_DATA_WAN_IXP4XX_HSS_H -#define __PLATFORM_DATA_WAN_IXP4XX_HSS_H - -#include - -/* Information about built-in HSS (synchronous serial) interfaces */ -struct hss_plat_info { - int (*set_clock)(int port, unsigned int clock_type); - int (*open)(int port, void *pdev, - void (*set_carrier_cb)(void *pdev, int carrier)); - void (*close)(int port, void *pdev); - u8 txreadyq; - u32 timer_freq; -}; - -#endif -- cgit From b50113cbdd1340c31e85e4cfc5f5e81ce9cbb2aa Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Fri, 11 Feb 2022 23:32:31 +0100 Subject: soc: ixp4xx: Add features from regmap helper If we want to read the CFG2 register on the expansion bus and apply the inversion and check for some hardcoded versions this helper comes in handy. Signed-off-by: Linus Walleij Link: https://lore.kernel.org/r/20220211223238.648934-7-linus.walleij@linaro.org Signed-off-by: Linus Walleij --- include/linux/soc/ixp4xx/cpu.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/ixp4xx/cpu.h b/include/linux/soc/ixp4xx/cpu.h index 88bd8de0e803..48c2e241ac83 100644 --- a/include/linux/soc/ixp4xx/cpu.h +++ b/include/linux/soc/ixp4xx/cpu.h @@ -9,6 +9,7 @@ #define __SOC_IXP4XX_CPU_H__ #include +#include #ifdef CONFIG_ARM #include #endif @@ -23,6 +24,9 @@ #define IXP46X_PROCESSOR_ID_VALUE 0x69054200 /* including IXP455 */ #define IXP46X_PROCESSOR_ID_MASK 0xfffffff0 +/* Feature register in the expansion bus controller */ +#define IXP4XX_EXP_CNFG2 0x2c + /* "fuse" bits of IXP_EXP_CFG2 */ /* All IXP4xx CPUs */ #define IXP4XX_FEATURE_RCOMP (1 << 0) @@ -89,6 +93,22 @@ u32 ixp4xx_read_feature_bits(void); void ixp4xx_write_feature_bits(u32 value); +static inline u32 cpu_ixp4xx_features(struct regmap *rmap) +{ + u32 val; + + regmap_read(rmap, IXP4XX_EXP_CNFG2, &val); + /* For some reason this register is inverted */ + val = ~val; + if (cpu_is_ixp42x_rev_a0()) + return IXP42X_FEATURE_MASK & ~(IXP4XX_FEATURE_RCOMP | + IXP4XX_FEATURE_AES); + if (cpu_is_ixp42x()) + return val & IXP42X_FEATURE_MASK; + if (cpu_is_ixp43x()) + return val & IXP43X_FEATURE_MASK; + return val & IXP46X_FEATURE_MASK; +} #else #define cpu_is_ixp42x_rev_a0() 0 #define cpu_is_ixp42x() 0 @@ -101,6 +121,10 @@ static inline u32 ixp4xx_read_feature_bits(void) static inline void ixp4xx_write_feature_bits(u32 value) { } +static inline u32 cpu_ixp4xx_features(struct regmap *rmap) +{ + return 0; +} #endif #endif /* _ASM_ARCH_CPU_H */ -- cgit From 8754a7e61c766fbc533c627b56ff181550dca00e Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Fri, 11 Feb 2022 23:32:32 +0100 Subject: soc: ixp4xx-npe: Access syscon regs using regmap If we access the syscon (expansion bus config registers) using the syscon regmap instead of relying on direct accessor functions, we do not need to call this static code in the machine (arch/arm/mach-ixp4xx/common.c) which makes things less dependent on custom machine-dependent code. Look up the syscon regmap and handle the error: this will make deferred probe work with relation to the syscon. Signed-off-by: Linus Walleij Link: https://lore.kernel.org/r/20220211223238.648934-8-linus.walleij@linaro.org Signed-off-by: Linus Walleij --- include/linux/soc/ixp4xx/npe.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/ixp4xx/npe.h b/include/linux/soc/ixp4xx/npe.h index 2a91f465d456..9efeac777da1 100644 --- a/include/linux/soc/ixp4xx/npe.h +++ b/include/linux/soc/ixp4xx/npe.h @@ -3,6 +3,7 @@ #define __IXP4XX_NPE_H #include +#include extern const char *npe_names[]; @@ -17,6 +18,7 @@ struct npe_regs { struct npe { struct npe_regs __iomem *regs; + struct regmap *rmap; int id; int valid; }; -- cgit From c8200f4e7267545a384fb86a4630f76958ab9df6 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Fri, 11 Feb 2022 23:32:33 +0100 Subject: net: ixp4xx_eth: Drop platform data support All IXP4xx platforms are converted to device tree, the platform data path is no longer used. Drop the code and custom include, confine the driver in its own file. Depend on OF and remove ifdefs around this, as we are all probing from OF now. Cc: David S. Miller Cc: Jakub Kicinski Cc: netdev@vger.kernel.org Signed-off-by: Linus Walleij Acked-by: Jakub Kicinski Link: https://lore.kernel.org/r/20220211223238.648934-9-linus.walleij@linaro.org Signed-off-by: Linus Walleij --- include/linux/platform_data/eth_ixp4xx.h | 21 --------------------- 1 file changed, 21 deletions(-) delete mode 100644 include/linux/platform_data/eth_ixp4xx.h (limited to 'include/linux') diff --git a/include/linux/platform_data/eth_ixp4xx.h b/include/linux/platform_data/eth_ixp4xx.h deleted file mode 100644 index 114b0940729f..000000000000 --- a/include/linux/platform_data/eth_ixp4xx.h +++ /dev/null @@ -1,21 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __PLATFORM_DATA_ETH_IXP4XX -#define __PLATFORM_DATA_ETH_IXP4XX - -#include - -#define IXP4XX_ETH_NPEA 0x00 -#define IXP4XX_ETH_NPEB 0x10 -#define IXP4XX_ETH_NPEC 0x20 - -/* Information about built-in Ethernet MAC interfaces */ -struct eth_plat_info { - u8 phy; /* MII PHY ID, 0 - 31 */ - u8 rxq; /* configurable, currently 0 - 31 only */ - u8 txreadyq; - u8 hwaddr[6]; - u8 npe; /* NPE instance used by this interface */ - bool has_mdio; /* If this instance has an MDIO bus */ -}; - -#endif -- cgit From 3059dfa52c07a9b6d770e87dc21b68f4295239c5 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Fri, 11 Feb 2022 23:32:35 +0100 Subject: ARM: ixp4xx: Remove feature bit accessors We switched users of the accessors over to using syscon to inspect the bits, or removed the need for checking them. Delete these accessors. Signed-off-by: Linus Walleij Link: https://lore.kernel.org/r/20220211223238.648934-11-linus.walleij@linaro.org Signed-off-by: Linus Walleij --- include/linux/soc/ixp4xx/cpu.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soc/ixp4xx/cpu.h b/include/linux/soc/ixp4xx/cpu.h index 48c2e241ac83..f526ac33afea 100644 --- a/include/linux/soc/ixp4xx/cpu.h +++ b/include/linux/soc/ixp4xx/cpu.h @@ -90,9 +90,6 @@ IXP43X_PROCESSOR_ID_VALUE) #define cpu_is_ixp46x() ((read_cpuid_id() & IXP46X_PROCESSOR_ID_MASK) == \ IXP46X_PROCESSOR_ID_VALUE) - -u32 ixp4xx_read_feature_bits(void); -void ixp4xx_write_feature_bits(u32 value); static inline u32 cpu_ixp4xx_features(struct regmap *rmap) { u32 val; @@ -114,13 +111,6 @@ static inline u32 cpu_ixp4xx_features(struct regmap *rmap) #define cpu_is_ixp42x() 0 #define cpu_is_ixp43x() 0 #define cpu_is_ixp46x() 0 -static inline u32 ixp4xx_read_feature_bits(void) -{ - return 0; -} -static inline void ixp4xx_write_feature_bits(u32 value) -{ -} static inline u32 cpu_ixp4xx_features(struct regmap *rmap) { return 0; -- cgit From f5c54f77b07b278cfde4a654e111c39996ac8b5b Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Fri, 4 Feb 2022 09:30:13 +0100 Subject: cpumask: Add a x86-specific cpumask_clear_cpu() helper Add a x86-specific cpumask_clear_cpu() helper which will be used in places where the explicit KASAN-instrumentation in the *_bit() helpers is unwanted. Also, always inline two more cpumask generic helpers. allyesconfig: text data bss dec hex filename 190553143 159425889 32076404 382055436 16c5b40c vmlinux.before 190551812 159424945 32076404 382053161 16c5ab29 vmlinux.after Signed-off-by: Borislav Petkov Acked-by: Marco Elver Link: https://lore.kernel.org/r/20220204083015.17317-2-bp@alien8.de --- include/linux/cpumask.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 6b06c698cd2a..fe29ac7cc469 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -102,7 +102,7 @@ extern atomic_t __num_online_cpus; extern cpumask_t cpus_booted_once_mask; -static inline void cpu_max_bits_warn(unsigned int cpu, unsigned int bits) +static __always_inline void cpu_max_bits_warn(unsigned int cpu, unsigned int bits) { #ifdef CONFIG_DEBUG_PER_CPU_MAPS WARN_ON_ONCE(cpu >= bits); @@ -110,7 +110,7 @@ static inline void cpu_max_bits_warn(unsigned int cpu, unsigned int bits) } /* verify cpu argument to cpumask_* operators */ -static inline unsigned int cpumask_check(unsigned int cpu) +static __always_inline unsigned int cpumask_check(unsigned int cpu) { cpu_max_bits_warn(cpu, nr_cpumask_bits); return cpu; -- cgit From 2618a0dae09ef37728dab89ff60418cbe25ae6bd Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Sat, 12 Feb 2022 09:14:49 -0800 Subject: etherdevice: Adjust ether_addr* prototypes to silence -Wstringop-overead MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit With GCC 12, -Wstringop-overread was warning about an implicit cast from char[6] to char[8]. However, the extra 2 bytes are always thrown away, alignment doesn't matter, and the risk of hitting the edge of unallocated memory has been accepted, so this prototype can just be converted to a regular char *. Silences: net/core/dev.c: In function ‘bpf_prog_run_generic_xdp’: net/core/dev.c:4618:21: warning: ‘ether_addr_equal_64bits’ reading 8 bytes from a region of size 6 [-Wstringop-overread] 4618 | orig_host = ether_addr_equal_64bits(eth->h_dest, > skb->dev->dev_addr); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ net/core/dev.c:4618:21: note: referencing argument 1 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’} net/core/dev.c:4618:21: note: referencing argument 2 of type ‘const u8[8]’ {aka ‘const unsigned char[8]’} In file included from net/core/dev.c:91: include/linux/etherdevice.h:375:20: note: in a call to function ‘ether_addr_equal_64bits’ 375 | static inline bool ether_addr_equal_64bits(const u8 addr1[6+2], | ^~~~~~~~~~~~~~~~~~~~~~~ Reported-by: Marc Kleine-Budde Tested-by: Marc Kleine-Budde Link: https://lore.kernel.org/netdev/20220212090811.uuzk6d76agw2vv73@pengutronix.de Cc: Jakub Kicinski Cc: "David S. Miller" Cc: netdev@vger.kernel.org Signed-off-by: Kees Cook Signed-off-by: David S. Miller --- include/linux/etherdevice.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h index 2ad71cc90b37..92b10e67d5f8 100644 --- a/include/linux/etherdevice.h +++ b/include/linux/etherdevice.h @@ -134,7 +134,7 @@ static inline bool is_multicast_ether_addr(const u8 *addr) #endif } -static inline bool is_multicast_ether_addr_64bits(const u8 addr[6+2]) +static inline bool is_multicast_ether_addr_64bits(const u8 *addr) { #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64 #ifdef __BIG_ENDIAN @@ -372,8 +372,7 @@ static inline bool ether_addr_equal(const u8 *addr1, const u8 *addr2) * Please note that alignment of addr1 & addr2 are only guaranteed to be 16 bits. */ -static inline bool ether_addr_equal_64bits(const u8 addr1[6+2], - const u8 addr2[6+2]) +static inline bool ether_addr_equal_64bits(const u8 *addr1, const u8 *addr2) { #if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS) && BITS_PER_LONG == 64 u64 fold = (*(const u64 *)addr1) ^ (*(const u64 *)addr2); -- cgit From 845301001308aab8fb7902548f6c3256d28b8c48 Mon Sep 17 00:00:00 2001 From: Daisuke Nojiri Date: Wed, 26 Jan 2022 10:04:10 -0800 Subject: power: supply: PCHG: Use MKBP for device event handling This change makes the PCHG driver receive device events through MKBP protocol since CrOS EC switched to deliver all peripheral charge events to the MKBP protocol. This will unify PCHG event handling on X86 and ARM. Signed-off-by: Daisuke Nojiri Signed-off-by: Sebastian Reichel --- include/linux/platform_data/cros_ec_commands.h | 64 ++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/cros_ec_commands.h b/include/linux/platform_data/cros_ec_commands.h index 271bd87bff0a..95e7e5667291 100644 --- a/include/linux/platform_data/cros_ec_commands.h +++ b/include/linux/platform_data/cros_ec_commands.h @@ -3386,6 +3386,9 @@ enum ec_mkbp_event { /* Send an incoming CEC message to the AP */ EC_MKBP_EVENT_CEC_MESSAGE = 9, + /* Peripheral device charger event */ + EC_MKBP_EVENT_PCHG = 12, + /* Number of MKBP events */ EC_MKBP_EVENT_COUNT, }; @@ -5527,6 +5530,67 @@ enum pchg_state { [PCHG_STATE_CONNECTED] = "CONNECTED", \ } +/* + * Update firmware of peripheral chip + */ +#define EC_CMD_PCHG_UPDATE 0x0136 + +/* Port number is encoded in bit[28:31]. */ +#define EC_MKBP_PCHG_PORT_SHIFT 28 +/* Utility macro for converting MKBP event to port number. */ +#define EC_MKBP_PCHG_EVENT_TO_PORT(e) (((e) >> EC_MKBP_PCHG_PORT_SHIFT) & 0xf) +/* Utility macro for extracting event bits. */ +#define EC_MKBP_PCHG_EVENT_MASK(e) ((e) \ + & GENMASK(EC_MKBP_PCHG_PORT_SHIFT-1, 0)) + +#define EC_MKBP_PCHG_UPDATE_OPENED BIT(0) +#define EC_MKBP_PCHG_WRITE_COMPLETE BIT(1) +#define EC_MKBP_PCHG_UPDATE_CLOSED BIT(2) +#define EC_MKBP_PCHG_UPDATE_ERROR BIT(3) +#define EC_MKBP_PCHG_DEVICE_EVENT BIT(4) + +enum ec_pchg_update_cmd { + /* Reset chip to normal mode. */ + EC_PCHG_UPDATE_CMD_RESET_TO_NORMAL = 0, + /* Reset and put a chip in update (a.k.a. download) mode. */ + EC_PCHG_UPDATE_CMD_OPEN, + /* Write a block of data containing FW image. */ + EC_PCHG_UPDATE_CMD_WRITE, + /* Close update session. */ + EC_PCHG_UPDATE_CMD_CLOSE, + /* End of commands */ + EC_PCHG_UPDATE_CMD_COUNT, +}; + +struct ec_params_pchg_update { + /* PCHG port number */ + uint8_t port; + /* enum ec_pchg_update_cmd */ + uint8_t cmd; + /* Padding */ + uint8_t reserved0; + uint8_t reserved1; + /* Version of new firmware */ + uint32_t version; + /* CRC32 of new firmware */ + uint32_t crc32; + /* Address in chip memory where is written to */ + uint32_t addr; + /* Size of */ + uint32_t size; + /* Partial data of new firmware */ + uint8_t data[]; +} __ec_align4; + +BUILD_ASSERT(EC_PCHG_UPDATE_CMD_COUNT + < BIT(sizeof(((struct ec_params_pchg_update *)0)->cmd)*8)); + +struct ec_response_pchg_update { + /* Block size */ + uint32_t block_size; +} __ec_align4; + + /*****************************************************************************/ /* Voltage regulator controls */ -- cgit From f68f2ff91512c199ec24883001245912afc17873 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 20 Apr 2021 23:22:52 -0700 Subject: fortify: Detect struct member overflows in memcpy() at compile-time memcpy() is dead; long live memcpy() tl;dr: In order to eliminate a large class of common buffer overflow flaws that continue to persist in the kernel, have memcpy() (under CONFIG_FORTIFY_SOURCE) perform bounds checking of the destination struct member when they have a known size. This would have caught all of the memcpy()-related buffer write overflow flaws identified in at least the last three years. Background and analysis: While stack-based buffer overflow flaws are largely mitigated by stack canaries (and similar) features, heap-based buffer overflow flaws continue to regularly appear in the kernel. Many classes of heap buffer overflows are mitigated by FORTIFY_SOURCE when using the strcpy() family of functions, but a significant number remain exposed through the memcpy() family of functions. At its core, FORTIFY_SOURCE uses the compiler's __builtin_object_size() internal[0] to determine the available size at a target address based on the compile-time known structure layout details. It operates in two modes: outer bounds (0) and inner bounds (1). In mode 0, the size of the enclosing structure is used. In mode 1, the size of the specific field is used. For example: struct object { u16 scalar1; /* 2 bytes */ char array[6]; /* 6 bytes */ u64 scalar2; /* 8 bytes */ u32 scalar3; /* 4 bytes */ u32 scalar4; /* 4 bytes */ } instance; __builtin_object_size(instance.array, 0) == 22, since the remaining size of the enclosing structure starting from "array" is 22 bytes (6 + 8 + 4 + 4). __builtin_object_size(instance.array, 1) == 6, since the remaining size of the specific field "array" is 6 bytes. The initial implementation of FORTIFY_SOURCE used mode 0 because there were many cases of both strcpy() and memcpy() functions being used to write (or read) across multiple fields in a structure. For example, it would catch this, which is writing 2 bytes beyond the end of "instance": memcpy(&instance.array, data, 25); While this didn't protect against overwriting adjacent fields in a given structure, it would at least stop overflows from reaching beyond the end of the structure into neighboring memory, and provided a meaningful mitigation of a subset of buffer overflow flaws. However, many desirable targets remain within the enclosing structure (for example function pointers). As it happened, there were very few cases of strcpy() family functions intentionally writing beyond the end of a string buffer. Once all known cases were removed from the kernel, the strcpy() family was tightened[1] to use mode 1, providing greater mitigation coverage. What remains is switching memcpy() to mode 1 as well, but making the switch is much more difficult because of how frustrating it can be to find existing "normal" uses of memcpy() that expect to write (or read) across multiple fields. The root cause of the problem is that the C language lacks a common pattern to indicate the intent of an author's use of memcpy(), and is further complicated by the available compile-time and run-time mitigation behaviors. The FORTIFY_SOURCE mitigation comes in two halves: the compile-time half, when both the buffer size _and_ the length of the copy is known, and the run-time half, when only the buffer size is known. If neither size is known, there is no bounds checking possible. At compile-time when the compiler sees that a length will always exceed a known buffer size, a warning can be deterministically emitted. For the run-time half, the length is tested against the known size of the buffer, and the overflowing operation is detected. (The performance overhead for these tests is virtually zero.) It is relatively easy to find compile-time false-positives since a warning is always generated. Fixing the false positives, however, can be very time-consuming as there are hundreds of instances. While it's possible some over-read conditions could lead to kernel memory exposures, the bulk of the risk comes from the run-time flaws where the length of a write may end up being attacker-controlled and lead to an overflow. Many of the compile-time false-positives take a form similar to this: memcpy(&instance.scalar2, data, sizeof(instance.scalar2) + sizeof(instance.scalar3)); and the run-time ones are similar, but lack a constant expression for the size of the copy: memcpy(instance.array, data, length); The former is meant to cover multiple fields (though its style has been frowned upon more recently), but has been technically legal. Both lack any expressivity in the C language about the author's _intent_ in a way that a compiler can check when the length isn't known at compile time. A comment doesn't work well because what's needed is something a compiler can directly reason about. Is a given memcpy() call expected to overflow into neighbors? Is it not? By using the new struct_group() macro, this intent can be much more easily encoded. It is not as easy to find the run-time false-positives since the code path to exercise a seemingly out-of-bounds condition that is actually expected may not be trivially reachable. Tightening the restrictions to block an operation for a false positive will either potentially create a greater flaw (if a copy is truncated by the mitigation), or destabilize the kernel (e.g. with a BUG()), making things completely useless for the end user. As a result, tightening the memcpy() restriction (when there is a reasonable level of uncertainty of the number of false positives), needs to first WARN() with no truncation. (Though any sufficiently paranoid end-user can always opt to set the panic_on_warn=1 sysctl.) Once enough development time has passed, the mitigation can be further intensified. (Note that this patch is only the compile-time checking step, which is a prerequisite to doing run-time checking, which will come in future patches.) Given the potential frustrations of weeding out all the false positives when tightening the run-time checks, it is reasonable to wonder if these changes would actually add meaningful protection. Looking at just the last three years, there are 23 identified flaws with a CVE that mention "buffer overflow", and 11 are memcpy()-related buffer overflows. (For the remaining 12: 7 are array index overflows that would be mitigated by systems built with CONFIG_UBSAN_BOUNDS=y: CVE-2019-0145, CVE-2019-14835, CVE-2019-14896, CVE-2019-14897, CVE-2019-14901, CVE-2019-17666, CVE-2021-28952. 2 are miscalculated allocation sizes which could be mitigated with memory tagging: CVE-2019-16746, CVE-2019-2181. 1 is an iovec buffer bug maybe mitigated by memory tagging: CVE-2020-10742. 1 is a type confusion bug mitigated by stack canaries: CVE-2020-10942. 1 is a string handling logic bug with no mitigation I'm aware of: CVE-2021-28972.) At my last count on an x86_64 allmodconfig build, there are 35,294 calls to memcpy(). With callers instrumented to report all places where the buffer size is known but the length remains unknown (i.e. a run-time bounds check is added), we can count how many new run-time bounds checks are added when the destination and source arguments of memcpy() are changed to use "mode 1" bounds checking: 1,276. This means for the future run-time checking, there is a worst-case upper bounds of 3.6% false positives to fix. In addition, there were around 150 new compile-time warnings to evaluate and fix (which have now been fixed). With this instrumentation it's also possible to compare the places where the known 11 memcpy() flaw overflows manifested against the resulting list of potential new run-time bounds checks, as a measure of potential efficacy of the tightened mitigation. Much to my surprise, horror, and delight, all 11 flaws would have been detected by the newly added run-time bounds checks, making this a distinctly clear mitigation improvement: 100% coverage for known memcpy() flaws, with a possible 2 orders of magnitude gain in coverage over existing but undiscovered run-time dynamic length flaws (i.e. 1265 newly covered sites in addition to the 11 known), against only <4% of all memcpy() callers maybe gaining a false positive run-time check, with only about 150 new compile-time instances needing evaluation. Specifically these would have been mitigated: CVE-2020-24490 https://git.kernel.org/linus/a2ec905d1e160a33b2e210e45ad30445ef26ce0e CVE-2020-12654 https://git.kernel.org/linus/3a9b153c5591548612c3955c9600a98150c81875 CVE-2020-12653 https://git.kernel.org/linus/b70261a288ea4d2f4ac7cd04be08a9f0f2de4f4d CVE-2019-14895 https://git.kernel.org/linus/3d94a4a8373bf5f45cf5f939e88b8354dbf2311b CVE-2019-14816 https://git.kernel.org/linus/7caac62ed598a196d6ddf8d9c121e12e082cac3a CVE-2019-14815 https://git.kernel.org/linus/7caac62ed598a196d6ddf8d9c121e12e082cac3a CVE-2019-14814 https://git.kernel.org/linus/7caac62ed598a196d6ddf8d9c121e12e082cac3a CVE-2019-10126 https://git.kernel.org/linus/69ae4f6aac1578575126319d3f55550e7e440449 CVE-2019-9500 https://git.kernel.org/linus/1b5e2423164b3670e8bc9174e4762d297990deff no-CVE-yet https://git.kernel.org/linus/130f634da1af649205f4a3dd86cbe5c126b57914 no-CVE-yet https://git.kernel.org/linus/d10a87a3535cce2b890897914f5d0d83df669c63 To accelerate the review of potential run-time false positives, it's also worth noting that it is possible to partially automate checking by examining the memcpy() buffer argument to check for the destination struct member having a neighboring array member. It is reasonable to expect that the vast majority of run-time false positives would look like the already evaluated and fixed compile-time false positives, where the most common pattern is neighboring arrays. (And, FWIW, many of the compile-time fixes were actual bugs, so it is reasonable to assume we'll have similar cases of actual bugs getting fixed for run-time checks.) Implementation: Tighten the memcpy() destination buffer size checking to use the actual ("mode 1") target buffer size as the bounds check instead of their enclosing structure's ("mode 0") size. Use a common inline for memcpy() (and memmove() in a following patch), since all the tests are the same. All new cross-field memcpy() uses must use the struct_group() macro or similar to target a specific range of fields, so that FORTIFY_SOURCE can reason about the size and safety of the copy. For now, cross-member "mode 1" _read_ detection at compile-time will be limited to W=1 builds, since it is, unfortunately, very common. As the priority is solving write overflows, read overflows will be part of a future phase (and can be fixed in parallel, for anyone wanting to look at W=1 build output). For run-time, the "mode 0" size checking and mitigation is left unchanged, with "mode 1" to be added in stages. In this patch, no new run-time checks are added. Future patches will first bounds-check writes, and only perform a WARN() for now. This way any missed run-time false positives can be flushed out over the coming several development cycles, but system builders who have tested their workloads to be WARN()-free can enable the panic_on_warn=1 sysctl to immediately gain a mitigation against this class of buffer overflows. Once that is under way, run-time bounds-checking of reads can be similarly enabled. Related classes of flaws that will remain unmitigated: - memcpy() with flexible array structures, as the compiler does not currently have visibility into the size of the trailing flexible array. These can be fixed in the future by refactoring such cases to use a new set of flexible array structure helpers to perform the common serialization/deserialization code patterns doing allocation and/or copying. - memcpy() with raw pointers (e.g. void *, char *, etc), or otherwise having their buffer size unknown at compile time, have no good mitigation beyond memory tagging (and even that would only protect against inter-object overflow, not intra-object neighboring field overflows), or refactoring. Some kind of "fat pointer" solution is likely needed to gain proper size-of-buffer awareness. (e.g. see struct membuf) - type confusion where a higher level type's allocation size does not match the resulting cast type eventually passed to a deeper memcpy() call where the compiler cannot see the true type. In theory, greater static analysis could catch these, and the use of -Warray-bounds will help find some of these. [0] https://gcc.gnu.org/onlinedocs/gcc/Object-Size-Checking.html [1] https://git.kernel.org/linus/6a39e62abbafd1d58d1722f40c7d26ef379c6a2f Signed-off-by: Kees Cook --- include/linux/fortify-string.h | 109 ++++++++++++++++++++++++++++++++++++----- 1 file changed, 97 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index a6cd6815f249..f578d00403ad 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -8,7 +8,9 @@ void fortify_panic(const char *name) __noreturn __cold; void __read_overflow(void) __compiletime_error("detected read beyond size of object (1st parameter)"); void __read_overflow2(void) __compiletime_error("detected read beyond size of object (2nd parameter)"); +void __read_overflow2_field(size_t avail, size_t wanted) __compiletime_warning("detected read beyond size of field (2nd parameter); maybe use struct_group()?"); void __write_overflow(void) __compiletime_error("detected write beyond size of object (1st parameter)"); +void __write_overflow_field(size_t avail, size_t wanted) __compiletime_warning("detected write beyond size of field (1st parameter); maybe use struct_group()?"); #define __compiletime_strlen(p) \ ({ \ @@ -209,22 +211,105 @@ __FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size) return __underlying_memset(p, c, size); } -__FORTIFY_INLINE void *memcpy(void *p, const void *q, __kernel_size_t size) +/* + * To make sure the compiler can enforce protection against buffer overflows, + * memcpy(), memmove(), and memset() must not be used beyond individual + * struct members. If you need to copy across multiple members, please use + * struct_group() to create a named mirror of an anonymous struct union. + * (e.g. see struct sk_buff.) Read overflow checking is currently only + * done when a write overflow is also present, or when building with W=1. + * + * Mitigation coverage matrix + * Bounds checking at: + * +-------+-------+-------+-------+ + * | Compile time | Run time | + * memcpy() argument sizes: | write | read | write | read | + * dest source length +-------+-------+-------+-------+ + * memcpy(known, known, constant) | y | y | n/a | n/a | + * memcpy(known, unknown, constant) | y | n | n/a | V | + * memcpy(known, known, dynamic) | n | n | B | B | + * memcpy(known, unknown, dynamic) | n | n | B | V | + * memcpy(unknown, known, constant) | n | y | V | n/a | + * memcpy(unknown, unknown, constant) | n | n | V | V | + * memcpy(unknown, known, dynamic) | n | n | V | B | + * memcpy(unknown, unknown, dynamic) | n | n | V | V | + * +-------+-------+-------+-------+ + * + * y = perform deterministic compile-time bounds checking + * n = cannot perform deterministic compile-time bounds checking + * n/a = no run-time bounds checking needed since compile-time deterministic + * B = can perform run-time bounds checking (currently unimplemented) + * V = vulnerable to run-time overflow (will need refactoring to solve) + * + */ +__FORTIFY_INLINE void fortify_memcpy_chk(__kernel_size_t size, + const size_t p_size, + const size_t q_size, + const size_t p_size_field, + const size_t q_size_field, + const char *func) { - size_t p_size = __builtin_object_size(p, 0); - size_t q_size = __builtin_object_size(q, 0); - if (__builtin_constant_p(size)) { - if (p_size < size) + /* + * Length argument is a constant expression, so we + * can perform compile-time bounds checking where + * buffer sizes are known. + */ + + /* Error when size is larger than enclosing struct. */ + if (p_size > p_size_field && p_size < size) __write_overflow(); - if (q_size < size) + if (q_size > q_size_field && q_size < size) __read_overflow2(); + + /* Warn when write size argument larger than dest field. */ + if (p_size_field < size) + __write_overflow_field(p_size_field, size); + /* + * Warn for source field over-read when building with W=1 + * or when an over-write happened, so both can be fixed at + * the same time. + */ + if ((IS_ENABLED(KBUILD_EXTRA_WARN1) || p_size_field < size) && + q_size_field < size) + __read_overflow2_field(q_size_field, size); } - if (p_size < size || q_size < size) - fortify_panic(__func__); - return __underlying_memcpy(p, q, size); + /* + * At this point, length argument may not be a constant expression, + * so run-time bounds checking can be done where buffer sizes are + * known. (This is not an "else" because the above checks may only + * be compile-time warnings, and we want to still warn for run-time + * overflows.) + */ + + /* + * Always stop accesses beyond the struct that contains the + * field, when the buffer's remaining size is known. + * (The -1 test is to optimize away checks where the buffer + * lengths are unknown.) + */ + if ((p_size != (size_t)(-1) && p_size < size) || + (q_size != (size_t)(-1) && q_size < size)) + fortify_panic(func); } +#define __fortify_memcpy_chk(p, q, size, p_size, q_size, \ + p_size_field, q_size_field, op) ({ \ + size_t __fortify_size = (size_t)(size); \ + fortify_memcpy_chk(__fortify_size, p_size, q_size, \ + p_size_field, q_size_field, #op); \ + __underlying_##op(p, q, __fortify_size); \ +}) + +/* + * __builtin_object_size() must be captured here to avoid evaluating argument + * side-effects further into the macro layers. + */ +#define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ + __builtin_object_size(p, 0), __builtin_object_size(q, 0), \ + __builtin_object_size(p, 1), __builtin_object_size(q, 1), \ + memcpy) + __FORTIFY_INLINE void *memmove(void *p, const void *q, __kernel_size_t size) { size_t p_size = __builtin_object_size(p, 0); @@ -304,13 +389,14 @@ __FORTIFY_INLINE void *kmemdup(const void *p, size_t size, gfp_t gfp) return __real_kmemdup(p, size, gfp); } -/* defined after fortified strlen and memcpy to reuse them */ +/* Defined after fortified strlen to reuse it. */ __FORTIFY_INLINE char *strcpy(char *p, const char *q) { size_t p_size = __builtin_object_size(p, 1); size_t q_size = __builtin_object_size(q, 1); size_t size; + /* If neither buffer size is known, immediately give up. */ if (p_size == (size_t)-1 && q_size == (size_t)-1) return __underlying_strcpy(p, q); size = strlen(q) + 1; @@ -320,14 +406,13 @@ __FORTIFY_INLINE char *strcpy(char *p, const char *q) /* Run-time check for dynamic size overflow. */ if (p_size < size) fortify_panic(__func__); - memcpy(p, q, size); + __underlying_memcpy(p, q, size); return p; } /* Don't use these outside the FORITFY_SOURCE implementation */ #undef __underlying_memchr #undef __underlying_memcmp -#undef __underlying_memcpy #undef __underlying_memmove #undef __underlying_memset #undef __underlying_strcat -- cgit From 938a000e3f9bead24ea753286b3e4d2423275c9e Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 16 Jun 2021 14:48:19 -0700 Subject: fortify: Detect struct member overflows in memmove() at compile-time As done for memcpy(), also update memmove() to use the same tightened compile-time checks under CONFIG_FORTIFY_SOURCE. Signed-off-by: Kees Cook --- include/linux/fortify-string.h | 21 ++++----------------- 1 file changed, 4 insertions(+), 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index f578d00403ad..098d8a322a7a 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -309,22 +309,10 @@ __FORTIFY_INLINE void fortify_memcpy_chk(__kernel_size_t size, __builtin_object_size(p, 0), __builtin_object_size(q, 0), \ __builtin_object_size(p, 1), __builtin_object_size(q, 1), \ memcpy) - -__FORTIFY_INLINE void *memmove(void *p, const void *q, __kernel_size_t size) -{ - size_t p_size = __builtin_object_size(p, 0); - size_t q_size = __builtin_object_size(q, 0); - - if (__builtin_constant_p(size)) { - if (p_size < size) - __write_overflow(); - if (q_size < size) - __read_overflow2(); - } - if (p_size < size || q_size < size) - fortify_panic(__func__); - return __underlying_memmove(p, q, size); -} +#define memmove(p, q, s) __fortify_memcpy_chk(p, q, s, \ + __builtin_object_size(p, 0), __builtin_object_size(q, 0), \ + __builtin_object_size(p, 1), __builtin_object_size(q, 1), \ + memmove) extern void *__real_memscan(void *, int, __kernel_size_t) __RENAME(memscan); __FORTIFY_INLINE void *memscan(void *p, int c, __kernel_size_t size) @@ -413,7 +401,6 @@ __FORTIFY_INLINE char *strcpy(char *p, const char *q) /* Don't use these outside the FORITFY_SOURCE implementation */ #undef __underlying_memchr #undef __underlying_memcmp -#undef __underlying_memmove #undef __underlying_memset #undef __underlying_strcat #undef __underlying_strcpy -- cgit From 28e77cc1c0686621a4d416f599cee5ab369daa0a Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 16 Jun 2021 14:42:23 -0700 Subject: fortify: Detect struct member overflows in memset() at compile-time As done for memcpy(), also update memset() to use the same tightened compile-time bounds checking under CONFIG_FORTIFY_SOURCE. Signed-off-by: Kees Cook --- include/linux/fortify-string.h | 54 +++++++++++++++++++++++++++++++++++------- 1 file changed, 46 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index 098d8a322a7a..53123712bb3b 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -200,17 +200,56 @@ __FORTIFY_INLINE char *strncat(char *p, const char *q, __kernel_size_t count) return p; } -__FORTIFY_INLINE void *memset(void *p, int c, __kernel_size_t size) +__FORTIFY_INLINE void fortify_memset_chk(__kernel_size_t size, + const size_t p_size, + const size_t p_size_field) { - size_t p_size = __builtin_object_size(p, 0); + if (__builtin_constant_p(size)) { + /* + * Length argument is a constant expression, so we + * can perform compile-time bounds checking where + * buffer sizes are known. + */ - if (__builtin_constant_p(size) && p_size < size) - __write_overflow(); - if (p_size < size) - fortify_panic(__func__); - return __underlying_memset(p, c, size); + /* Error when size is larger than enclosing struct. */ + if (p_size > p_size_field && p_size < size) + __write_overflow(); + + /* Warn when write size is larger than dest field. */ + if (p_size_field < size) + __write_overflow_field(p_size_field, size); + } + /* + * At this point, length argument may not be a constant expression, + * so run-time bounds checking can be done where buffer sizes are + * known. (This is not an "else" because the above checks may only + * be compile-time warnings, and we want to still warn for run-time + * overflows.) + */ + + /* + * Always stop accesses beyond the struct that contains the + * field, when the buffer's remaining size is known. + * (The -1 test is to optimize away checks where the buffer + * lengths are unknown.) + */ + if (p_size != (size_t)(-1) && p_size < size) + fortify_panic("memset"); } +#define __fortify_memset_chk(p, c, size, p_size, p_size_field) ({ \ + size_t __fortify_size = (size_t)(size); \ + fortify_memset_chk(__fortify_size, p_size, p_size_field), \ + __underlying_memset(p, c, __fortify_size); \ +}) + +/* + * __builtin_object_size() must be captured here to avoid evaluating argument + * side-effects further into the macro layers. + */ +#define memset(p, c, s) __fortify_memset_chk(p, c, s, \ + __builtin_object_size(p, 0), __builtin_object_size(p, 1)) + /* * To make sure the compiler can enforce protection against buffer overflows, * memcpy(), memmove(), and memset() must not be used beyond individual @@ -401,7 +440,6 @@ __FORTIFY_INLINE char *strcpy(char *p, const char *q) /* Don't use these outside the FORITFY_SOURCE implementation */ #undef __underlying_memchr #undef __underlying_memcmp -#undef __underlying_memset #undef __underlying_strcat #undef __underlying_strcpy #undef __underlying_strlen -- cgit From f361143141362485b39eb40bb4910e3f4219180f Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 8 Feb 2022 14:53:43 -0800 Subject: fortify: Replace open-coded __gnu_inline attribute Replace open-coded gnu_inline attribute with the normal kernel convention for attributes: __gnu_inline Signed-off-by: Kees Cook Reviewed-by: Nick Desaulniers Link: https://lore.kernel.org/r/20220208225350.1331628-2-keescook@chromium.org --- include/linux/fortify-string.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index 53123712bb3b..439aad24ab3b 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -2,7 +2,7 @@ #ifndef _LINUX_FORTIFY_STRING_H_ #define _LINUX_FORTIFY_STRING_H_ -#define __FORTIFY_INLINE extern __always_inline __attribute__((gnu_inline)) +#define __FORTIFY_INLINE extern __always_inline __gnu_inline #define __RENAME(x) __asm__(#x) void fortify_panic(const char *name) __noreturn __cold; -- cgit From f0202b8ca48cef152d4cdf775f39be6d3b372e1e Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 8 Feb 2022 14:53:44 -0800 Subject: Compiler Attributes: Add __pass_object_size for Clang In order to gain greater visibility to type information when using __builtin_object_size(), Clang has a function attribute "pass_object_size" that will make size information available for marked arguments in a function by way of implicit additional function arguments that are then wired up the __builtin_object_size(). This is needed to implement FORTIFY_SOURCE in Clang, as a workaround to Clang's __builtin_object_size() having limited visibility[1] into types across function calls (even inlines). This attribute has an additional benefit that it can be used even on non-inline functions to gain argument size information. [1] https://github.com/llvm/llvm-project/issues/53516 Cc: Nick Desaulniers Cc: Nathan Chancellor Cc: llvm@lists.linux.dev Reviewed-by: Miguel Ojeda Signed-off-by: Kees Cook Reviewed-by: Nick Desaulniers Link: https://lore.kernel.org/r/20220208225350.1331628-3-keescook@chromium.org --- include/linux/compiler_attributes.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h index 37e260020221..d0c503772061 100644 --- a/include/linux/compiler_attributes.h +++ b/include/linux/compiler_attributes.h @@ -263,6 +263,20 @@ */ #define __packed __attribute__((__packed__)) +/* + * Note: the "type" argument should match any __builtin_object_size(p, type) usage. + * + * Optional: not supported by gcc. + * Optional: not supported by icc. + * + * clang: https://clang.llvm.org/docs/AttributeReference.html#pass-object-size-pass-dynamic-object-size + */ +#if __has_attribute(__pass_object_size__) +# define __pass_object_size(type) __attribute__((__pass_object_size__(type))) +#else +# define __pass_object_size(type) +#endif + /* * gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-pure-function-attribute */ -- cgit From d694dbaefd6fa6bdde84b704779eab53942753a5 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 8 Feb 2022 14:53:45 -0800 Subject: Compiler Attributes: Add __overloadable for Clang In order for FORTIFY_SOURCE to use __pass_object_size on an "extern inline" function, as all the fortified string functions are, the functions must be marked as being overloadable (i.e. different prototypes due to the implicitly injected object size arguments). This allows the __pass_object_size versions to take precedence. Cc: Nathan Chancellor Cc: llvm@lists.linux.dev Reviewed-by: Miguel Ojeda Reviewed-by: Nick Desaulniers Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220208225350.1331628-4-keescook@chromium.org --- include/linux/compiler_attributes.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h index d0c503772061..dcaf55f5d1ae 100644 --- a/include/linux/compiler_attributes.h +++ b/include/linux/compiler_attributes.h @@ -257,6 +257,18 @@ */ #define __noreturn __attribute__((__noreturn__)) +/* + * Optional: not supported by gcc. + * Optional: not supported by icc. + * + * clang: https://clang.llvm.org/docs/AttributeReference.html#overloadable + */ +#if __has_attribute(__overloadable__) +# define __overloadable __attribute__((__overloadable__)) +#else +# define __overloadable +#endif + /* * gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Type-Attributes.html#index-packed-type-attribute * clang: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#index-packed-variable-attribute -- cgit From 1c7f4e5c1b6c9d9b508007c141dca77cab9434b4 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 8 Feb 2022 14:53:46 -0800 Subject: Compiler Attributes: Add __diagnose_as for Clang Clang will perform various compile-time diagnostics on uses of various functions (e.g. simple bounds-checking on strcpy(), etc). These diagnostics can be assigned to other functions (for example, new implementations of the string functions under CONFIG_FORTIFY_SOURCE) using the "diagnose_as_builtin" attribute. This allows those functions to retain their compile-time diagnostic warnings. Cc: Nathan Chancellor Cc: llvm@lists.linux.dev Reviewed-by: Miguel Ojeda Reviewed-by: Nick Desaulniers Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220208225350.1331628-5-keescook@chromium.org --- include/linux/compiler_attributes.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h index dcaf55f5d1ae..445e80517cab 100644 --- a/include/linux/compiler_attributes.h +++ b/include/linux/compiler_attributes.h @@ -100,6 +100,19 @@ # define __copy(symbol) #endif +/* + * Optional: not supported by gcc + * Optional: only supported since clang >= 14.0 + * Optional: not supported by icc + * + * clang: https://clang.llvm.org/docs/AttributeReference.html#diagnose_as_builtin + */ +#if __has_attribute(__diagnose_as_builtin__) +# define __diagnose_as(builtin...) __attribute__((__diagnose_as_builtin__(builtin))) +#else +# define __diagnose_as(builtin...) +#endif + /* * Don't. Just don't. See commit 771c035372a0 ("deprecate the '__deprecated' * attribute warnings entirely and for good") for more information. -- cgit From 0a2b782a00f33e7d06dc43d099fa071ae97bee77 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 8 Feb 2022 14:53:47 -0800 Subject: fortify: Make pointer arguments const In preparation for using Clang's __pass_object_size attribute, make all the pointer arguments to the fortified string functions const. Nothing was changing their values anyway, so this added requirement (needed by __pass_object_size) requires no code changes and has no impact on the binary instruction output. Signed-off-by: Kees Cook Reviewed-by: Nick Desaulniers Link: https://lore.kernel.org/r/20220208225350.1331628-6-keescook@chromium.org --- include/linux/fortify-string.h | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index 439aad24ab3b..f874ada4b9af 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -50,7 +50,7 @@ extern char *__underlying_strncpy(char *p, const char *q, __kernel_size_t size) #define __underlying_strncpy __builtin_strncpy #endif -__FORTIFY_INLINE char *strncpy(char *p, const char *q, __kernel_size_t size) +__FORTIFY_INLINE char *strncpy(char * const p, const char *q, __kernel_size_t size) { size_t p_size = __builtin_object_size(p, 1); @@ -61,7 +61,7 @@ __FORTIFY_INLINE char *strncpy(char *p, const char *q, __kernel_size_t size) return __underlying_strncpy(p, q, size); } -__FORTIFY_INLINE char *strcat(char *p, const char *q) +__FORTIFY_INLINE char *strcat(char * const p, const char *q) { size_t p_size = __builtin_object_size(p, 1); @@ -73,7 +73,7 @@ __FORTIFY_INLINE char *strcat(char *p, const char *q) } extern __kernel_size_t __real_strnlen(const char *, __kernel_size_t) __RENAME(strnlen); -__FORTIFY_INLINE __kernel_size_t strnlen(const char *p, __kernel_size_t maxlen) +__FORTIFY_INLINE __kernel_size_t strnlen(const char * const p, __kernel_size_t maxlen) { size_t p_size = __builtin_object_size(p, 1); size_t p_len = __compiletime_strlen(p); @@ -94,7 +94,7 @@ __FORTIFY_INLINE __kernel_size_t strnlen(const char *p, __kernel_size_t maxlen) } /* defined after fortified strnlen to reuse it. */ -__FORTIFY_INLINE __kernel_size_t strlen(const char *p) +__FORTIFY_INLINE __kernel_size_t strlen(const char * const p) { __kernel_size_t ret; size_t p_size = __builtin_object_size(p, 1); @@ -110,7 +110,7 @@ __FORTIFY_INLINE __kernel_size_t strlen(const char *p) /* defined after fortified strlen to reuse it */ extern size_t __real_strlcpy(char *, const char *, size_t) __RENAME(strlcpy); -__FORTIFY_INLINE size_t strlcpy(char *p, const char *q, size_t size) +__FORTIFY_INLINE size_t strlcpy(char * const p, const char * const q, size_t size) { size_t p_size = __builtin_object_size(p, 1); size_t q_size = __builtin_object_size(q, 1); @@ -137,7 +137,7 @@ __FORTIFY_INLINE size_t strlcpy(char *p, const char *q, size_t size) /* defined after fortified strnlen to reuse it */ extern ssize_t __real_strscpy(char *, const char *, size_t) __RENAME(strscpy); -__FORTIFY_INLINE ssize_t strscpy(char *p, const char *q, size_t size) +__FORTIFY_INLINE ssize_t strscpy(char * const p, const char * const q, size_t size) { size_t len; /* Use string size rather than possible enclosing struct size. */ @@ -183,7 +183,7 @@ __FORTIFY_INLINE ssize_t strscpy(char *p, const char *q, size_t size) } /* defined after fortified strlen and strnlen to reuse them */ -__FORTIFY_INLINE char *strncat(char *p, const char *q, __kernel_size_t count) +__FORTIFY_INLINE char *strncat(char * const p, const char * const q, __kernel_size_t count) { size_t p_len, copy_len; size_t p_size = __builtin_object_size(p, 1); @@ -354,7 +354,7 @@ __FORTIFY_INLINE void fortify_memcpy_chk(__kernel_size_t size, memmove) extern void *__real_memscan(void *, int, __kernel_size_t) __RENAME(memscan); -__FORTIFY_INLINE void *memscan(void *p, int c, __kernel_size_t size) +__FORTIFY_INLINE void *memscan(void * const p, int c, __kernel_size_t size) { size_t p_size = __builtin_object_size(p, 0); @@ -365,7 +365,7 @@ __FORTIFY_INLINE void *memscan(void *p, int c, __kernel_size_t size) return __real_memscan(p, c, size); } -__FORTIFY_INLINE int memcmp(const void *p, const void *q, __kernel_size_t size) +__FORTIFY_INLINE int memcmp(const void * const p, const void * const q, __kernel_size_t size) { size_t p_size = __builtin_object_size(p, 0); size_t q_size = __builtin_object_size(q, 0); @@ -381,7 +381,7 @@ __FORTIFY_INLINE int memcmp(const void *p, const void *q, __kernel_size_t size) return __underlying_memcmp(p, q, size); } -__FORTIFY_INLINE void *memchr(const void *p, int c, __kernel_size_t size) +__FORTIFY_INLINE void *memchr(const void * const p, int c, __kernel_size_t size) { size_t p_size = __builtin_object_size(p, 0); @@ -393,7 +393,7 @@ __FORTIFY_INLINE void *memchr(const void *p, int c, __kernel_size_t size) } void *__real_memchr_inv(const void *s, int c, size_t n) __RENAME(memchr_inv); -__FORTIFY_INLINE void *memchr_inv(const void *p, int c, size_t size) +__FORTIFY_INLINE void *memchr_inv(const void * const p, int c, size_t size) { size_t p_size = __builtin_object_size(p, 0); @@ -405,7 +405,7 @@ __FORTIFY_INLINE void *memchr_inv(const void *p, int c, size_t size) } extern void *__real_kmemdup(const void *src, size_t len, gfp_t gfp) __RENAME(kmemdup); -__FORTIFY_INLINE void *kmemdup(const void *p, size_t size, gfp_t gfp) +__FORTIFY_INLINE void *kmemdup(const void * const p, size_t size, gfp_t gfp) { size_t p_size = __builtin_object_size(p, 0); @@ -417,7 +417,7 @@ __FORTIFY_INLINE void *kmemdup(const void *p, size_t size, gfp_t gfp) } /* Defined after fortified strlen to reuse it. */ -__FORTIFY_INLINE char *strcpy(char *p, const char *q) +__FORTIFY_INLINE char *strcpy(char * const p, const char * const q) { size_t p_size = __builtin_object_size(p, 1); size_t q_size = __builtin_object_size(q, 1); -- cgit From 92df138a8d663cefebc3124041253677a53c92cf Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 8 Feb 2022 14:53:48 -0800 Subject: fortify: Use __diagnose_as() for better diagnostic coverage In preparation for using Clang's __pass_object_size, add __diagnose_as() attributes to mark the functions as being the same as the indicated builtins. When __daignose_as() is available, Clang will have a more complete ability to apply its own diagnostic analysis to callers of these functions, as if they were the builtins themselves. Without __diagnose_as, Clang's compile time diagnostic messages won't be as precise as they could be, but at least users of older toolchains will still benefit from having fortified routines. Signed-off-by: Kees Cook Reviewed-by: Nick Desaulniers Link: https://lore.kernel.org/r/20220208225350.1331628-7-keescook@chromium.org --- include/linux/fortify-string.h | 21 ++++++++++++++------- 1 file changed, 14 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index f874ada4b9af..db1ad1c1c79a 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -50,7 +50,8 @@ extern char *__underlying_strncpy(char *p, const char *q, __kernel_size_t size) #define __underlying_strncpy __builtin_strncpy #endif -__FORTIFY_INLINE char *strncpy(char * const p, const char *q, __kernel_size_t size) +__FORTIFY_INLINE __diagnose_as(__builtin_strncpy, 1, 2, 3) +char *strncpy(char * const p, const char *q, __kernel_size_t size) { size_t p_size = __builtin_object_size(p, 1); @@ -61,7 +62,8 @@ __FORTIFY_INLINE char *strncpy(char * const p, const char *q, __kernel_size_t si return __underlying_strncpy(p, q, size); } -__FORTIFY_INLINE char *strcat(char * const p, const char *q) +__FORTIFY_INLINE __diagnose_as(__builtin_strcat, 1, 2) +char *strcat(char * const p, const char *q) { size_t p_size = __builtin_object_size(p, 1); @@ -94,7 +96,8 @@ __FORTIFY_INLINE __kernel_size_t strnlen(const char * const p, __kernel_size_t m } /* defined after fortified strnlen to reuse it. */ -__FORTIFY_INLINE __kernel_size_t strlen(const char * const p) +__FORTIFY_INLINE __diagnose_as(__builtin_strlen, 1) +__kernel_size_t strlen(const char * const p) { __kernel_size_t ret; size_t p_size = __builtin_object_size(p, 1); @@ -183,7 +186,8 @@ __FORTIFY_INLINE ssize_t strscpy(char * const p, const char * const q, size_t si } /* defined after fortified strlen and strnlen to reuse them */ -__FORTIFY_INLINE char *strncat(char * const p, const char * const q, __kernel_size_t count) +__FORTIFY_INLINE __diagnose_as(__builtin_strncat, 1, 2, 3) +char *strncat(char * const p, const char * const q, __kernel_size_t count) { size_t p_len, copy_len; size_t p_size = __builtin_object_size(p, 1); @@ -365,7 +369,8 @@ __FORTIFY_INLINE void *memscan(void * const p, int c, __kernel_size_t size) return __real_memscan(p, c, size); } -__FORTIFY_INLINE int memcmp(const void * const p, const void * const q, __kernel_size_t size) +__FORTIFY_INLINE __diagnose_as(__builtin_memcmp, 1, 2, 3) +int memcmp(const void * const p, const void * const q, __kernel_size_t size) { size_t p_size = __builtin_object_size(p, 0); size_t q_size = __builtin_object_size(q, 0); @@ -381,7 +386,8 @@ __FORTIFY_INLINE int memcmp(const void * const p, const void * const q, __kernel return __underlying_memcmp(p, q, size); } -__FORTIFY_INLINE void *memchr(const void * const p, int c, __kernel_size_t size) +__FORTIFY_INLINE __diagnose_as(__builtin_memchr, 1, 2, 3) +void *memchr(const void * const p, int c, __kernel_size_t size) { size_t p_size = __builtin_object_size(p, 0); @@ -417,7 +423,8 @@ __FORTIFY_INLINE void *kmemdup(const void * const p, size_t size, gfp_t gfp) } /* Defined after fortified strlen to reuse it. */ -__FORTIFY_INLINE char *strcpy(char * const p, const char * const q) +__FORTIFY_INLINE __diagnose_as(__builtin_strcpy, 1, 2) +char *strcpy(char * const p, const char * const q) { size_t p_size = __builtin_object_size(p, 1); size_t q_size = __builtin_object_size(q, 1); -- cgit From 67ebc3ab446230c77fe3b545a9d8a11cac1cfb6e Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 8 Feb 2022 14:53:49 -0800 Subject: fortify: Make sure strlen() may still be used as a constant expression In preparation for enabling Clang FORTIFY_SOURCE support, redefine strlen() as a macro that tests for being a constant expression so that strlen() can still be used in static initializers, which is lost when adding __pass_object_size and __overloadable. An example of this usage can be seen here: https://lore.kernel.org/all/202201252321.dRmWZ8wW-lkp@intel.com/ Notably, this constant expression feature of strlen() is not available for architectures that build with -ffreestanding. This means the kernel currently does not universally expect strlen() to be used this way, but since there _are_ some build configurations that depend on it, retain the characteristic for Clang FORTIFY_SOURCE builds too. Signed-off-by: Kees Cook Reviewed-by: Nick Desaulniers Link: https://lore.kernel.org/r/20220208225350.1331628-8-keescook@chromium.org --- include/linux/fortify-string.h | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index db1ad1c1c79a..f77cf22e2d60 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -2,6 +2,8 @@ #ifndef _LINUX_FORTIFY_STRING_H_ #define _LINUX_FORTIFY_STRING_H_ +#include + #define __FORTIFY_INLINE extern __always_inline __gnu_inline #define __RENAME(x) __asm__(#x) @@ -95,9 +97,16 @@ __FORTIFY_INLINE __kernel_size_t strnlen(const char * const p, __kernel_size_t m return ret; } -/* defined after fortified strnlen to reuse it. */ +/* + * Defined after fortified strnlen to reuse it. However, it must still be + * possible for strlen() to be used on compile-time strings for use in + * static initializers (i.e. as a constant expression). + */ +#define strlen(p) \ + __builtin_choose_expr(__is_constexpr(__builtin_strlen(p)), \ + __builtin_strlen(p), __fortify_strlen(p)) __FORTIFY_INLINE __diagnose_as(__builtin_strlen, 1) -__kernel_size_t strlen(const char * const p) +__kernel_size_t __fortify_strlen(const char * const p) { __kernel_size_t ret; size_t p_size = __builtin_object_size(p, 1); -- cgit From 281d0c962752fb40866dd8d4cade68656f34bd1f Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 8 Feb 2022 14:53:50 -0800 Subject: fortify: Add Clang support Enable FORTIFY_SOURCE support for Clang: Use the new __pass_object_size and __overloadable attributes so that Clang will have appropriate visibility into argument sizes such that __builtin_object_size(p, 1) will behave correctly. Additional details available here: https://github.com/llvm/llvm-project/issues/53516 https://github.com/ClangBuiltLinux/linux/issues/1401 A bug with __builtin_constant_p() of globally defined variables was fixed in Clang 13 (and backported to 12.0.1), so FORTIFY support must depend on that version or later. Additional details here: https://bugs.llvm.org/show_bug.cgi?id=41459 commit a52f8a59aef4 ("fortify: Explicitly disable Clang support") A bug with Clang's -mregparm=3 and -m32 makes some builtins unusable, so removing -ffreestanding (to gain the needed libcall optimizations with Clang) cannot be done. Without the libcall optimizations, Clang cannot provide appropriate FORTIFY coverage, so it must be disabled for CONFIG_X86_32. Additional details here; https://github.com/llvm/llvm-project/issues/53645 Cc: Miguel Ojeda Cc: Nick Desaulniers Cc: Nathan Chancellor Cc: George Burgess IV Cc: llvm@lists.linux.dev Signed-off-by: Kees Cook Reviewed-by: Nick Desaulniers Link: https://lore.kernel.org/r/20220208225350.1331628-9-keescook@chromium.org --- include/linux/fortify-string.h | 40 ++++++++++++++++++++++++++-------------- 1 file changed, 26 insertions(+), 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index f77cf22e2d60..295637a66c46 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -4,7 +4,7 @@ #include -#define __FORTIFY_INLINE extern __always_inline __gnu_inline +#define __FORTIFY_INLINE extern __always_inline __gnu_inline __overloadable #define __RENAME(x) __asm__(#x) void fortify_panic(const char *name) __noreturn __cold; @@ -52,8 +52,17 @@ extern char *__underlying_strncpy(char *p, const char *q, __kernel_size_t size) #define __underlying_strncpy __builtin_strncpy #endif +/* + * Clang's use of __builtin_object_size() within inlines needs hinting via + * __pass_object_size(). The preference is to only ever use type 1 (member + * size, rather than struct size), but there remain some stragglers using + * type 0 that will be converted in the future. + */ +#define POS __pass_object_size(1) +#define POS0 __pass_object_size(0) + __FORTIFY_INLINE __diagnose_as(__builtin_strncpy, 1, 2, 3) -char *strncpy(char * const p, const char *q, __kernel_size_t size) +char *strncpy(char * const POS p, const char *q, __kernel_size_t size) { size_t p_size = __builtin_object_size(p, 1); @@ -65,7 +74,7 @@ char *strncpy(char * const p, const char *q, __kernel_size_t size) } __FORTIFY_INLINE __diagnose_as(__builtin_strcat, 1, 2) -char *strcat(char * const p, const char *q) +char *strcat(char * const POS p, const char *q) { size_t p_size = __builtin_object_size(p, 1); @@ -77,7 +86,7 @@ char *strcat(char * const p, const char *q) } extern __kernel_size_t __real_strnlen(const char *, __kernel_size_t) __RENAME(strnlen); -__FORTIFY_INLINE __kernel_size_t strnlen(const char * const p, __kernel_size_t maxlen) +__FORTIFY_INLINE __kernel_size_t strnlen(const char * const POS p, __kernel_size_t maxlen) { size_t p_size = __builtin_object_size(p, 1); size_t p_len = __compiletime_strlen(p); @@ -106,7 +115,7 @@ __FORTIFY_INLINE __kernel_size_t strnlen(const char * const p, __kernel_size_t m __builtin_choose_expr(__is_constexpr(__builtin_strlen(p)), \ __builtin_strlen(p), __fortify_strlen(p)) __FORTIFY_INLINE __diagnose_as(__builtin_strlen, 1) -__kernel_size_t __fortify_strlen(const char * const p) +__kernel_size_t __fortify_strlen(const char * const POS p) { __kernel_size_t ret; size_t p_size = __builtin_object_size(p, 1); @@ -122,7 +131,7 @@ __kernel_size_t __fortify_strlen(const char * const p) /* defined after fortified strlen to reuse it */ extern size_t __real_strlcpy(char *, const char *, size_t) __RENAME(strlcpy); -__FORTIFY_INLINE size_t strlcpy(char * const p, const char * const q, size_t size) +__FORTIFY_INLINE size_t strlcpy(char * const POS p, const char * const POS q, size_t size) { size_t p_size = __builtin_object_size(p, 1); size_t q_size = __builtin_object_size(q, 1); @@ -149,7 +158,7 @@ __FORTIFY_INLINE size_t strlcpy(char * const p, const char * const q, size_t siz /* defined after fortified strnlen to reuse it */ extern ssize_t __real_strscpy(char *, const char *, size_t) __RENAME(strscpy); -__FORTIFY_INLINE ssize_t strscpy(char * const p, const char * const q, size_t size) +__FORTIFY_INLINE ssize_t strscpy(char * const POS p, const char * const POS q, size_t size) { size_t len; /* Use string size rather than possible enclosing struct size. */ @@ -196,7 +205,7 @@ __FORTIFY_INLINE ssize_t strscpy(char * const p, const char * const q, size_t si /* defined after fortified strlen and strnlen to reuse them */ __FORTIFY_INLINE __diagnose_as(__builtin_strncat, 1, 2, 3) -char *strncat(char * const p, const char * const q, __kernel_size_t count) +char *strncat(char * const POS p, const char * const POS q, __kernel_size_t count) { size_t p_len, copy_len; size_t p_size = __builtin_object_size(p, 1); @@ -367,7 +376,7 @@ __FORTIFY_INLINE void fortify_memcpy_chk(__kernel_size_t size, memmove) extern void *__real_memscan(void *, int, __kernel_size_t) __RENAME(memscan); -__FORTIFY_INLINE void *memscan(void * const p, int c, __kernel_size_t size) +__FORTIFY_INLINE void *memscan(void * const POS0 p, int c, __kernel_size_t size) { size_t p_size = __builtin_object_size(p, 0); @@ -379,7 +388,7 @@ __FORTIFY_INLINE void *memscan(void * const p, int c, __kernel_size_t size) } __FORTIFY_INLINE __diagnose_as(__builtin_memcmp, 1, 2, 3) -int memcmp(const void * const p, const void * const q, __kernel_size_t size) +int memcmp(const void * const POS0 p, const void * const POS0 q, __kernel_size_t size) { size_t p_size = __builtin_object_size(p, 0); size_t q_size = __builtin_object_size(q, 0); @@ -396,7 +405,7 @@ int memcmp(const void * const p, const void * const q, __kernel_size_t size) } __FORTIFY_INLINE __diagnose_as(__builtin_memchr, 1, 2, 3) -void *memchr(const void * const p, int c, __kernel_size_t size) +void *memchr(const void * const POS0 p, int c, __kernel_size_t size) { size_t p_size = __builtin_object_size(p, 0); @@ -408,7 +417,7 @@ void *memchr(const void * const p, int c, __kernel_size_t size) } void *__real_memchr_inv(const void *s, int c, size_t n) __RENAME(memchr_inv); -__FORTIFY_INLINE void *memchr_inv(const void * const p, int c, size_t size) +__FORTIFY_INLINE void *memchr_inv(const void * const POS0 p, int c, size_t size) { size_t p_size = __builtin_object_size(p, 0); @@ -420,7 +429,7 @@ __FORTIFY_INLINE void *memchr_inv(const void * const p, int c, size_t size) } extern void *__real_kmemdup(const void *src, size_t len, gfp_t gfp) __RENAME(kmemdup); -__FORTIFY_INLINE void *kmemdup(const void * const p, size_t size, gfp_t gfp) +__FORTIFY_INLINE void *kmemdup(const void * const POS0 p, size_t size, gfp_t gfp) { size_t p_size = __builtin_object_size(p, 0); @@ -433,7 +442,7 @@ __FORTIFY_INLINE void *kmemdup(const void * const p, size_t size, gfp_t gfp) /* Defined after fortified strlen to reuse it. */ __FORTIFY_INLINE __diagnose_as(__builtin_strcpy, 1, 2) -char *strcpy(char * const p, const char * const q) +char *strcpy(char * const POS p, const char * const POS q) { size_t p_size = __builtin_object_size(p, 1); size_t q_size = __builtin_object_size(q, 1); @@ -462,4 +471,7 @@ char *strcpy(char * const p, const char * const q) #undef __underlying_strncat #undef __underlying_strncpy +#undef POS +#undef POS0 + #endif /* _LINUX_FORTIFY_STRING_H_ */ -- cgit From ddbd89deb7d32b1fbb879f48d68fda1a8ac58e8e Mon Sep 17 00:00:00 2001 From: Halil Pasic Date: Fri, 11 Feb 2022 02:12:52 +0100 Subject: swiotlb: fix info leak with DMA_FROM_DEVICE The problem I'm addressing was discovered by the LTP test covering cve-2018-1000204. A short description of what happens follows: 1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO interface with: dxfer_len == 524288, dxdfer_dir == SG_DXFER_FROM_DEV and a corresponding dxferp. The peculiar thing about this is that TUR is not reading from the device. 2) In sg_start_req() the invocation of blk_rq_map_user() effectively bounces the user-space buffer. As if the device was to transfer into it. Since commit a45b599ad808 ("scsi: sg: allocate with __GFP_ZERO in sg_build_indirect()") we make sure this first bounce buffer is allocated with GFP_ZERO. 3) For the rest of the story we keep ignoring that we have a TUR, so the device won't touch the buffer we prepare as if the we had a DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device and the buffer allocated by SG is mapped by the function virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here scatter-gather and not scsi generics). This mapping involves bouncing via the swiotlb (we need swiotlb to do virtio in protected guest like s390 Secure Execution, or AMD SEV). 4) When the SCSI TUR is done, we first copy back the content of the second (that is swiotlb) bounce buffer (which most likely contains some previous IO data), to the first bounce buffer, which contains all zeros. Then we copy back the content of the first bounce buffer to the user-space buffer. 5) The test case detects that the buffer, which it zero-initialized, ain't all zeros and fails. One can argue that this is an swiotlb problem, because without swiotlb we leak all zeros, and the swiotlb should be transparent in a sense that it does not affect the outcome (if all other participants are well behaved). Copying the content of the original buffer into the swiotlb buffer is the only way I can think of to make swiotlb transparent in such scenarios. So let's do just that if in doubt, but allow the driver to tell us that the whole mapped buffer is going to be overwritten, in which case we can preserve the old behavior and avoid the performance impact of the extra bounce. Signed-off-by: Halil Pasic Signed-off-by: Christoph Hellwig --- include/linux/dma-mapping.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index dca2b1355bb1..6150d11a607e 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -61,6 +61,14 @@ */ #define DMA_ATTR_PRIVILEGED (1UL << 9) +/* + * This is a hint to the DMA-mapping subsystem that the device is expected + * to overwrite the entire mapped size, thus the caller does not require any + * of the previous buffer contents to be preserved. This allows + * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers. + */ +#define DMA_ATTR_OVERWRITE (1UL << 10) + /* * A dma_addr_t can hold any valid DMA or bus address for the platform. It can * be given to a device to use as a DMA source or target. It is specific to a -- cgit From cd149eff8d2201a63c074a6d9d03e52926aa535d Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Wed, 9 Feb 2022 15:27:04 +0300 Subject: mtd: spi-nor: intel-spi: Disable write protection only if asked Currently the driver tries to disable the BIOS write protection automatically even if this is not what the user wants. For this reason modify the driver so that by default it does not touch the write protection. Only if specifically asked by the user (setting writeable=1 command line parameter) the driver tries to disable the BIOS write protection. Signed-off-by: Mika Westerberg Reviewed-by: Andy Shevchenko Reviewed-by: Mauro Lima Reviewed-by: Tudor Ambarus Acked-by: Lee Jones Link: https://lore.kernel.org/r/20220209122706.42439-2-mika.westerberg@linux.intel.com Signed-off-by: Mark Brown --- include/linux/platform_data/x86/intel-spi.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/intel-spi.h b/include/linux/platform_data/x86/intel-spi.h index 7f53a5c6f35e..7dda3f690465 100644 --- a/include/linux/platform_data/x86/intel-spi.h +++ b/include/linux/platform_data/x86/intel-spi.h @@ -19,11 +19,13 @@ enum intel_spi_type { /** * struct intel_spi_boardinfo - Board specific data for Intel SPI driver * @type: Type which this controller is compatible with - * @writeable: The chip is writeable + * @set_writeable: Try to make the chip writeable (optional) + * @data: Data to be passed to @set_writeable can be %NULL */ struct intel_spi_boardinfo { enum intel_spi_type type; - bool writeable; + bool (*set_writeable)(void __iomem *base, void *data); + void *data; }; #endif /* INTEL_SPI_PDATA_H */ -- cgit From e23e5a05d1fd9479586c40ffbcc056b3e34ef816 Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Wed, 9 Feb 2022 15:27:05 +0300 Subject: mtd: spi-nor: intel-spi: Convert to SPI MEM The preferred way to implement SPI-NOR controller drivers is through SPI subsubsystem utilizing the SPI MEM core functions. This converts the Intel SPI flash controller driver over the SPI MEM by moving the driver from SPI-NOR subsystem to SPI subsystem and in one go make it use the SPI MEM functions. The driver name will be changed from intel-spi to spi-intel to match the convention used in the SPI subsystem. Signed-off-by: Mika Westerberg Reviewed-by: Andy Shevchenko Reviewed-by: Mauro Lima Reviewed-by: Boris Brezillon Acked-by: Lee Jones Acked-by: Pratyush Yadav Reviewed-by: Tudor Ambarus Link: https://lore.kernel.org/r/20220209122706.42439-3-mika.westerberg@linux.intel.com Signed-off-by: Mark Brown --- include/linux/mfd/lpc_ich.h | 2 +- include/linux/platform_data/x86/intel-spi.h | 31 ----------------------------- include/linux/platform_data/x86/spi-intel.h | 31 +++++++++++++++++++++++++++++ 3 files changed, 32 insertions(+), 32 deletions(-) delete mode 100644 include/linux/platform_data/x86/intel-spi.h create mode 100644 include/linux/platform_data/x86/spi-intel.h (limited to 'include/linux') diff --git a/include/linux/mfd/lpc_ich.h b/include/linux/mfd/lpc_ich.h index 39967a5eca6d..ea4a4b1b246a 100644 --- a/include/linux/mfd/lpc_ich.h +++ b/include/linux/mfd/lpc_ich.h @@ -8,7 +8,7 @@ #ifndef LPC_ICH_H #define LPC_ICH_H -#include +#include /* GPIO resources */ #define ICH_RES_GPIO 0 diff --git a/include/linux/platform_data/x86/intel-spi.h b/include/linux/platform_data/x86/intel-spi.h deleted file mode 100644 index 7dda3f690465..000000000000 --- a/include/linux/platform_data/x86/intel-spi.h +++ /dev/null @@ -1,31 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Intel PCH/PCU SPI flash driver. - * - * Copyright (C) 2016, Intel Corporation - * Author: Mika Westerberg - */ - -#ifndef INTEL_SPI_PDATA_H -#define INTEL_SPI_PDATA_H - -enum intel_spi_type { - INTEL_SPI_BYT = 1, - INTEL_SPI_LPT, - INTEL_SPI_BXT, - INTEL_SPI_CNL, -}; - -/** - * struct intel_spi_boardinfo - Board specific data for Intel SPI driver - * @type: Type which this controller is compatible with - * @set_writeable: Try to make the chip writeable (optional) - * @data: Data to be passed to @set_writeable can be %NULL - */ -struct intel_spi_boardinfo { - enum intel_spi_type type; - bool (*set_writeable)(void __iomem *base, void *data); - void *data; -}; - -#endif /* INTEL_SPI_PDATA_H */ diff --git a/include/linux/platform_data/x86/spi-intel.h b/include/linux/platform_data/x86/spi-intel.h new file mode 100644 index 000000000000..a512ec37abbb --- /dev/null +++ b/include/linux/platform_data/x86/spi-intel.h @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Intel PCH/PCU SPI flash driver. + * + * Copyright (C) 2016, Intel Corporation + * Author: Mika Westerberg + */ + +#ifndef SPI_INTEL_PDATA_H +#define SPI_INTEL_PDATA_H + +enum intel_spi_type { + INTEL_SPI_BYT = 1, + INTEL_SPI_LPT, + INTEL_SPI_BXT, + INTEL_SPI_CNL, +}; + +/** + * struct intel_spi_boardinfo - Board specific data for Intel SPI driver + * @type: Type which this controller is compatible with + * @set_writeable: Try to make the chip writeable (optional) + * @data: Data to be passed to @set_writeable can be %NULL + */ +struct intel_spi_boardinfo { + enum intel_spi_type type; + bool (*set_writeable)(void __iomem *base, void *data); + void *data; +}; + +#endif /* SPI_INTEL_PDATA_H */ -- cgit From f48dc6b9664963107e500aecfc2f4df27dc5afb6 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Fri, 11 Feb 2022 00:19:54 +0100 Subject: spi: Retire legacy GPIO handling All drivers using GPIOs as chip select have been rewritten to use GPIO descriptors passing the ->use_gpio_descriptors flag. Retire the code and fields used by the legacy GPIO API. Do not drop the ->use_gpio_descriptors flag: it now only indicates that we want to use GPIOs in addition to native chip selects. Signed-off-by: Linus Walleij Link: https://lore.kernel.org/r/20220210231954.807904-1-linus.walleij@linaro.org Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index c3746ff35691..579d71cdf6fa 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -137,9 +137,6 @@ extern int spi_delay_exec(struct spi_delay *_delay, struct spi_transfer *xfer); * for driver coldplugging, and in uevents used for hotplugging * @driver_override: If the name of a driver is written to this attribute, then * the device will bind to the named driver and only the named driver. - * @cs_gpio: LEGACY: gpio number of the chipselect line (optional, -ENOENT when - * not using a GPIO line) use cs_gpiod in new drivers by opting in on - * the spi_master. * @cs_gpiod: gpio descriptor of the chipselect line (optional, NULL when * not using a GPIO line) * @word_delay: delay to be inserted between consecutive @@ -186,7 +183,6 @@ struct spi_device { void *controller_data; char modalias[SPI_NAME_SIZE]; const char *driver_override; - int cs_gpio; /* LEGACY: chip select gpio */ struct gpio_desc *cs_gpiod; /* chip select gpio desc */ struct spi_delay word_delay; /* inter-word delay */ /* CS delays */ @@ -418,17 +414,12 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch * controller has native support for memory like operations. * @unprepare_message: undo any work done by prepare_message(). * @slave_abort: abort the ongoing transfer request on an SPI slave controller - * @cs_gpios: LEGACY: array of GPIO descs to use as chip select lines; one per - * CS number. Any individual value may be -ENOENT for CS lines that - * are not GPIOs (driven by the SPI controller itself). Use the cs_gpiods - * in new drivers. * @cs_gpiods: Array of GPIO descs to use as chip select lines; one per CS * number. Any individual value may be NULL for CS lines that * are not GPIOs (driven by the SPI controller itself). * @use_gpio_descriptors: Turns on the code in the SPI core to parse and grab - * GPIO descriptors rather than using global GPIO numbers grabbed by the - * driver. This will fill in @cs_gpiods and @cs_gpios should not be used, - * and SPI devices will have the cs_gpiod assigned rather than cs_gpio. + * GPIO descriptors. This will fill in @cs_gpiods and SPI devices will have + * the cs_gpiod assigned if a GPIO line is found for the chipselect. * @unused_native_cs: When cs_gpiods is used, spi_register_controller() will * fill in this field with the first unused native CS, to be used by SPI * controller drivers that need to drive a native CS when using GPIO CS. @@ -642,7 +633,6 @@ struct spi_controller { const struct spi_controller_mem_ops *mem_ops; /* gpio chip select */ - int *cs_gpios; struct gpio_desc **cs_gpiods; bool use_gpio_descriptors; s8 unused_native_cs; -- cgit From 1de785a58035031f81a43c42f355fa1db510dc36 Mon Sep 17 00:00:00 2001 From: Jeff LaBundy Date: Sun, 23 Jan 2022 13:01:05 -0600 Subject: mfd: iqs62x: Provide device revision to sub-devices Individual sub-devices may elect to make decisions based on the specific revision of silicon encountered at probe. This data is already read from the device, but is not retained. Pass this data on to the sub-devices by adding the software and hardware numbers (registers 0x01 and 0x02, respectively) to the iqs62x_core struct. Signed-off-by: Jeff LaBundy Signed-off-by: Lee Jones --- include/linux/mfd/iqs62x.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mfd/iqs62x.h b/include/linux/mfd/iqs62x.h index 5ced55eae11b..ffc86010af74 100644 --- a/include/linux/mfd/iqs62x.h +++ b/include/linux/mfd/iqs62x.h @@ -14,6 +14,11 @@ #define IQS624_PROD_NUM 0x43 #define IQS625_PROD_NUM 0x4E +#define IQS620_HW_NUM_V0 0x82 +#define IQS620_HW_NUM_V1 IQS620_HW_NUM_V0 +#define IQS620_HW_NUM_V2 IQS620_HW_NUM_V0 +#define IQS620_HW_NUM_V3 0x92 + #define IQS621_ALS_FLAGS 0x16 #define IQS622_ALS_FLAGS 0x14 @@ -129,6 +134,8 @@ struct iqs62x_core { struct completion fw_done; enum iqs62x_ui_sel ui_sel; unsigned long event_cache; + u8 sw_num; + u8 hw_num; }; extern const struct iqs62x_event_desc iqs62x_events[IQS62X_NUM_EVENTS]; -- cgit From 5891cd5ec46c2c2eb6427cb54d214b149635dd0e Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 11 Feb 2022 12:06:23 -0800 Subject: net_sched: add __rcu annotation to netdev->qdisc syzbot found a data-race [1] which lead me to add __rcu annotations to netdev->qdisc, and proper accessors to get LOCKDEP support. [1] BUG: KCSAN: data-race in dev_activate / qdisc_lookup_rcu write to 0xffff888168ad6410 of 8 bytes by task 13559 on cpu 1: attach_default_qdiscs net/sched/sch_generic.c:1167 [inline] dev_activate+0x2ed/0x8f0 net/sched/sch_generic.c:1221 __dev_open+0x2e9/0x3a0 net/core/dev.c:1416 __dev_change_flags+0x167/0x3f0 net/core/dev.c:8139 rtnl_configure_link+0xc2/0x150 net/core/rtnetlink.c:3150 __rtnl_newlink net/core/rtnetlink.c:3489 [inline] rtnl_newlink+0xf4d/0x13e0 net/core/rtnetlink.c:3529 rtnetlink_rcv_msg+0x745/0x7e0 net/core/rtnetlink.c:5594 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmsg+0x195/0x230 net/socket.c:2496 __do_sys_sendmsg net/socket.c:2505 [inline] __se_sys_sendmsg net/socket.c:2503 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff888168ad6410 of 8 bytes by task 13560 on cpu 0: qdisc_lookup_rcu+0x30/0x2e0 net/sched/sch_api.c:323 __tcf_qdisc_find+0x74/0x3a0 net/sched/cls_api.c:1050 tc_del_tfilter+0x1c7/0x1350 net/sched/cls_api.c:2211 rtnetlink_rcv_msg+0x5ba/0x7e0 net/core/rtnetlink.c:5585 netlink_rcv_skb+0x14e/0x250 net/netlink/af_netlink.c:2494 rtnetlink_rcv+0x18/0x20 net/core/rtnetlink.c:5612 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline] netlink_unicast+0x602/0x6d0 net/netlink/af_netlink.c:1343 netlink_sendmsg+0x728/0x850 net/netlink/af_netlink.c:1919 sock_sendmsg_nosec net/socket.c:705 [inline] sock_sendmsg net/socket.c:725 [inline] ____sys_sendmsg+0x39a/0x510 net/socket.c:2413 ___sys_sendmsg net/socket.c:2467 [inline] __sys_sendmsg+0x195/0x230 net/socket.c:2496 __do_sys_sendmsg net/socket.c:2505 [inline] __se_sys_sendmsg net/socket.c:2503 [inline] __x64_sys_sendmsg+0x42/0x50 net/socket.c:2503 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xffffffff85dee080 -> 0xffff88815d96ec00 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 13560 Comm: syz-executor.2 Not tainted 5.17.0-rc3-syzkaller-00116-gf1baf68e1383-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Fixes: 470502de5bdb ("net: sched: unlock rules update API") Signed-off-by: Eric Dumazet Cc: Vlad Buslov Reported-by: syzbot Cc: Jamal Hadi Salim Cc: Cong Wang Cc: Jiri Pirko Signed-off-by: David S. Miller --- include/linux/netdevice.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index e490b84732d1..8b5a314db167 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2158,7 +2158,7 @@ struct net_device { struct netdev_queue *_tx ____cacheline_aligned_in_smp; unsigned int num_tx_queues; unsigned int real_num_tx_queues; - struct Qdisc *qdisc; + struct Qdisc __rcu *qdisc; unsigned int tx_queue_len; spinlock_t tx_global_lock; -- cgit From baebdf48c360080710f80699eea3affbb13d6c65 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Sat, 12 Feb 2022 00:38:38 +0100 Subject: net: dev: Makes sure netif_rx() can be invoked in any context. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Dave suggested a while ago (eleven years by now) "Let's make netif_rx() work in all contexts and get rid of netif_rx_ni()". Eric agreed and pointed out that modern devices should use netif_receive_skb() to avoid the overhead. In the meantime someone added another variant, netif_rx_any_context(), which behaves as suggested. netif_rx() must be invoked with disabled bottom halves to ensure that pending softirqs, which were raised within the function, are handled. netif_rx_ni() can be invoked only from process context (bottom halves must be enabled) because the function handles pending softirqs without checking if bottom halves were disabled or not. netif_rx_any_context() invokes on the former functions by checking in_interrupts(). netif_rx() could be taught to handle both cases (disabled and enabled bottom halves) by simply disabling bottom halves while invoking netif_rx_internal(). The local_bh_enable() invocation will then invoke pending softirqs only if the BH-disable counter drops to zero. Eric is concerned about the overhead of BH-disable+enable especially in regard to the loopback driver. As critical as this driver is, it will receive a shortcut to avoid the additional overhead which is not needed. Add a local_bh_disable() section in netif_rx() to ensure softirqs are handled if needed. Provide __netif_rx() which does not disable BH and has a lockdep assert to ensure that interrupts are disabled. Use this shortcut in the loopback driver and in drivers/net/*.c. Make netif_rx_ni() and netif_rx_any_context() invoke netif_rx() so they can be removed once they are no more users left. Link: https://lkml.kernel.org/r/20100415.020246.218622820.davem@davemloft.net Signed-off-by: Sebastian Andrzej Siewior Reviewed-by: Eric Dumazet Reviewed-by: Toke Høiland-Jørgensen Signed-off-by: David S. Miller --- include/linux/netdevice.h | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 5f6e2c0b0c90..e99db8341da5 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3672,8 +3672,18 @@ u32 bpf_prog_run_generic_xdp(struct sk_buff *skb, struct xdp_buff *xdp, void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog); int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb); int netif_rx(struct sk_buff *skb); -int netif_rx_ni(struct sk_buff *skb); -int netif_rx_any_context(struct sk_buff *skb); +int __netif_rx(struct sk_buff *skb); + +static inline int netif_rx_ni(struct sk_buff *skb) +{ + return netif_rx(skb); +} + +static inline int netif_rx_any_context(struct sk_buff *skb) +{ + return netif_rx(skb); +} + int netif_receive_skb(struct sk_buff *skb); int netif_receive_skb_core(struct sk_buff *skb); void netif_receive_skb_list_internal(struct list_head *head); -- cgit From 76f05d88623e559eb9fc41db9fb911e67fab0e7a Mon Sep 17 00:00:00 2001 From: M Chetan Kumar Date: Mon, 14 Feb 2022 12:46:52 +0530 Subject: net: wwan: debugfs obtained dev reference not dropped WWAN driver call's wwan_get_debugfs_dir() to obtain WWAN debugfs dir entry. As part of this procedure it returns a reference to a found device. Since there is no debugfs interface available at WWAN subsystem, it is not possible to drop dev reference post debugfs use. This leads to side effects like post wwan driver load and reload the wwan instance gets increment from wwanX to wwanX+1. A new debugfs interface is added in wwan subsystem so that wwan driver can drop the obtained dev reference post debugfs use. void wwan_put_debugfs_dir(struct dentry *dir) Signed-off-by: M Chetan Kumar Signed-off-by: David S. Miller --- include/linux/wwan.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/wwan.h b/include/linux/wwan.h index afb3334ec8c5..5ce2acf444fb 100644 --- a/include/linux/wwan.h +++ b/include/linux/wwan.h @@ -174,11 +174,13 @@ void wwan_unregister_ops(struct device *parent); #ifdef CONFIG_WWAN_DEBUGFS struct dentry *wwan_get_debugfs_dir(struct device *parent); +void wwan_put_debugfs_dir(struct dentry *dir); #else static inline struct dentry *wwan_get_debugfs_dir(struct device *parent) { return ERR_PTR(-ENODEV); } +static inline void wwan_put_debugfs_dir(struct dentry *dir) {} #endif #endif /* __WWAN_H */ -- cgit From 32e92d9f6f876721cc4f565f8369d542b6266380 Mon Sep 17 00:00:00 2001 From: John Garry Date: Thu, 3 Feb 2022 17:59:20 +0800 Subject: iommu/iova: Separate out rcache init Currently the rcache structures are allocated for all IOVA domains, even if they do not use "fast" alloc+free interface. This is wasteful of memory. In addition, fails in init_iova_rcaches() are not handled safely, which is less than ideal. Make "fast" users call a separate rcache init explicitly, which includes error checking. Signed-off-by: John Garry Reviewed-by: Robin Murphy Acked-by: Michael S. Tsirkin Link: https://lore.kernel.org/r/1643882360-241739-1-git-send-email-john.garry@huawei.com Signed-off-by: Joerg Roedel --- include/linux/iova.h | 15 +++------------ 1 file changed, 3 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iova.h b/include/linux/iova.h index cea79cb9f26c..320a70e40233 100644 --- a/include/linux/iova.h +++ b/include/linux/iova.h @@ -21,18 +21,8 @@ struct iova { unsigned long pfn_lo; /* Lowest allocated pfn */ }; -struct iova_magazine; -struct iova_cpu_rcache; -#define IOVA_RANGE_CACHE_MAX_SIZE 6 /* log of max cached IOVA range size (in pages) */ -#define MAX_GLOBAL_MAGS 32 /* magazines per bin */ - -struct iova_rcache { - spinlock_t lock; - unsigned long depot_size; - struct iova_magazine *depot[MAX_GLOBAL_MAGS]; - struct iova_cpu_rcache __percpu *cpu_rcaches; -}; +struct iova_rcache; /* holds all the iova translations for a domain */ struct iova_domain { @@ -46,7 +36,7 @@ struct iova_domain { unsigned long max32_alloc_size; /* Size of last failed allocation */ struct iova anchor; /* rbtree lookup anchor */ - struct iova_rcache rcaches[IOVA_RANGE_CACHE_MAX_SIZE]; /* IOVA range caches */ + struct iova_rcache *rcaches; struct hlist_node cpuhp_dead; }; @@ -102,6 +92,7 @@ struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo, unsigned long pfn_hi); void init_iova_domain(struct iova_domain *iovad, unsigned long granule, unsigned long start_pfn); +int iova_domain_init_rcaches(struct iova_domain *iovad); struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn); void put_iova_domain(struct iova_domain *iovad); #else -- cgit From b2638e56c2ced2ca258d22f939c47327b189e00c Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Thu, 3 Feb 2022 14:56:13 +0200 Subject: device property: Don't split fwnode_get_irq*() APIs in the code New fwnode_get_irq_byname() landed after an unrelated function by ordering. Move fwnode_iomap(), so fwnode_get_irq*() APIs will go together. No functional change intended. Signed-off-by: Andy Shevchenko Reviewed-by: Sakari Ailus Signed-off-by: Rafael J. Wysocki --- include/linux/property.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/property.h b/include/linux/property.h index 95d56a562b6a..4cd4b326941f 100644 --- a/include/linux/property.h +++ b/include/linux/property.h @@ -123,8 +123,6 @@ void fwnode_handle_put(struct fwnode_handle *fwnode); int fwnode_irq_get(const struct fwnode_handle *fwnode, unsigned int index); int fwnode_irq_get_byname(const struct fwnode_handle *fwnode, const char *name); -void __iomem *fwnode_iomap(struct fwnode_handle *fwnode, int index); - unsigned int device_get_child_node_count(struct device *dev); static inline bool device_property_read_bool(struct device *dev, @@ -388,8 +386,10 @@ enum dev_dma_attr device_get_dma_attr(struct device *dev); const void *device_get_match_data(struct device *dev); int device_get_phy_mode(struct device *dev); - int fwnode_get_phy_mode(struct fwnode_handle *fwnode); + +void __iomem *fwnode_iomap(struct fwnode_handle *fwnode, int index); + struct fwnode_handle *fwnode_graph_get_next_endpoint( const struct fwnode_handle *fwnode, struct fwnode_handle *prev); struct fwnode_handle * -- cgit From 7a853c2d5951419fdf3c1c9d2b6f5a38f6a6857d Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Mon, 7 Feb 2022 15:02:45 -0800 Subject: mm: Change CONFIG option for mm->pasid field This currently depends on CONFIG_IOMMU_SUPPORT. But it is only needed when CONFIG_IOMMU_SVA option is enabled. Change the CONFIG guards around definition and initialization of mm->pasid field. Suggested-by: Jacob Pan Signed-off-by: Fenghua Yu Signed-off-by: Borislav Petkov Reviewed-by: Tony Luck Reviewed-by: Thomas Gleixner Reviewed-by: Lu Baolu Link: https://lore.kernel.org/r/20220207230254.3342514-3-fenghua.yu@intel.com --- include/linux/mm_types.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5140e5feb486..c5cbfd7915ad 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -631,7 +631,7 @@ struct mm_struct { #endif struct work_struct async_put_work; -#ifdef CONFIG_IOMMU_SUPPORT +#ifdef CONFIG_IOMMU_SVA u32 pasid; #endif } __randomize_layout; -- cgit From 150154aae4311e7e6458903baecdc8fffe981ed3 Mon Sep 17 00:00:00 2001 From: "Uladzislau Rezki (Sony)" Date: Wed, 1 Dec 2021 10:20:53 +0100 Subject: rcu: Fix description of kvfree_rcu() The kvfree_rcu() header comment's description of the "ptr" parameter is unclear, therefore rephrase it to make it clear that it is a pointer to the memory to eventually be passed to kvfree(). Reported-by: Steven Rostedt Signed-off-by: Uladzislau Rezki (Sony) Signed-off-by: Paul E. McKenney --- include/linux/rcupdate.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 88b42eb46406..9d7df8d36af0 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -924,7 +924,7 @@ static inline notrace void rcu_read_unlock_sched_notrace(void) * * kvfree_rcu(ptr); * - * where @ptr is a pointer to kvfree(). + * where @ptr is the pointer to be freed by kvfree(). * * Please note, head-less way of freeing is permitted to * use from a context that has to follow might_sleep() -- cgit From 58d4292bd037b01fbb940a5170817f7d40caa9d5 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Fri, 14 Jan 2022 16:07:28 -0800 Subject: rcu: Uninline multi-use function: finish_rcuwait() This is a rarely used function, so uninlining its 3 instructions is probably a win or a wash - but the main motivation is to make independent of task_struct details. Signed-off-by: Ingo Molnar Signed-off-by: Paul E. McKenney --- include/linux/rcuwait.h | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rcuwait.h b/include/linux/rcuwait.h index 61c56cca95c4..8052d34da782 100644 --- a/include/linux/rcuwait.h +++ b/include/linux/rcuwait.h @@ -47,11 +47,7 @@ static inline void prepare_to_rcuwait(struct rcuwait *w) rcu_assign_pointer(w->task, current); } -static inline void finish_rcuwait(struct rcuwait *w) -{ - rcu_assign_pointer(w->task, NULL); - __set_current_state(TASK_RUNNING); -} +extern void finish_rcuwait(struct rcuwait *w); #define rcuwait_wait_event(w, condition, state) \ ({ \ -- cgit From e6339d3b443c436c3b8f45eefec2212a8c07065d Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Fri, 14 Jan 2022 16:16:55 -0800 Subject: rcu: Remove __read_mostly annotations from rcu_scheduler_active externs Remove the __read_mostly attributes from the rcu_scheduler_active extern declarations, because these attributes are ignored for prototypes and we'd have to include the full header to gain this functionally pointless attribute defined. Signed-off-by: Ingo Molnar Signed-off-by: Paul E. McKenney --- include/linux/rcupdate.h | 2 +- include/linux/rcutree.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 9d7df8d36af0..e7c39c200e2b 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -84,7 +84,7 @@ static inline int rcu_preempt_depth(void) /* Internal to kernel */ void rcu_init(void); -extern int rcu_scheduler_active __read_mostly; +extern int rcu_scheduler_active; void rcu_sched_clock_irq(int user); void rcu_report_dead(unsigned int cpu); void rcutree_migrate_callbacks(int cpu); diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 53209d669400..76665db179fa 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -62,7 +62,7 @@ static inline void rcu_irq_exit_check_preempt(void) { } void exit_rcu(void); void rcu_scheduler_starting(void); -extern int rcu_scheduler_active __read_mostly; +extern int rcu_scheduler_active; void rcu_end_inkernel_boot(void); bool rcu_inkernel_boot_has_ended(void); bool rcu_is_watching(void); -- cgit From 7a5fbc9bcba5325a45297a4ba00091f39a63a1ed Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Mon, 7 Feb 2022 15:02:46 -0800 Subject: iommu/ioasid: Introduce a helper to check for valid PASIDs Define a pasid_valid() helper to check if a given PASID is valid. [ bp: Massage commit message. ] Suggested-by: Ashok Raj Suggested-by: Jacob Pan Signed-off-by: Fenghua Yu Signed-off-by: Borislav Petkov Reviewed-by: Tony Luck Reviewed-by: Thomas Gleixner Reviewed-by: Lu Baolu Link: https://lore.kernel.org/r/20220207230254.3342514-4-fenghua.yu@intel.com --- include/linux/ioasid.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h index e9dacd4b9f6b..2237f64dbaae 100644 --- a/include/linux/ioasid.h +++ b/include/linux/ioasid.h @@ -41,6 +41,10 @@ void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid, int ioasid_register_allocator(struct ioasid_allocator_ops *allocator); void ioasid_unregister_allocator(struct ioasid_allocator_ops *allocator); int ioasid_set_data(ioasid_t ioasid, void *data); +static inline bool pasid_valid(ioasid_t ioasid) +{ + return ioasid != INVALID_IOASID; +} #else /* !CONFIG_IOASID */ static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, @@ -78,5 +82,10 @@ static inline int ioasid_set_data(ioasid_t ioasid, void *data) return -ENOTSUPP; } +static inline bool pasid_valid(ioasid_t ioasid) +{ + return false; +} + #endif /* CONFIG_IOASID */ #endif /* __LINUX_IOASID_H */ -- cgit From a6cbd44093ef305b02ad5f80ed54abf0148a696c Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Mon, 7 Feb 2022 15:02:47 -0800 Subject: kernel/fork: Initialize mm's PASID A new mm doesn't have a PASID yet when it's created. Initialize the mm's PASID on fork() or for init_mm to INVALID_IOASID (-1). INIT_PASID (0) is reserved for kernel legacy DMA PASID. It cannot be allocated to a user process. Initializing the process's PASID to 0 may cause confusion that's why the process uses the reserved kernel legacy DMA PASID. Initializing the PASID to INVALID_IOASID (-1) explicitly tells the process doesn't have a valid PASID yet. Even though the only user of mm_pasid_init() is in fork.c, define it in as the first of three mm/pasid life cycle functions (init/set/drop) to keep these all together. Suggested-by: Dave Hansen Signed-off-by: Fenghua Yu Signed-off-by: Borislav Petkov Reviewed-by: Tony Luck Reviewed-by: Thomas Gleixner Link: https://lore.kernel.org/r/20220207230254.3342514-5-fenghua.yu@intel.com --- include/linux/sched/mm.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index aa5f09ca5bcf..c74d1edbac2f 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -8,6 +8,7 @@ #include #include #include +#include /* * Routines for handling mm_structs @@ -433,4 +434,13 @@ static inline void membarrier_update_current_mm(struct mm_struct *next_mm) } #endif +#ifdef CONFIG_IOMMU_SVA +static inline void mm_pasid_init(struct mm_struct *mm) +{ + mm->pasid = INVALID_IOASID; +} +#else +static inline void mm_pasid_init(struct mm_struct *mm) {} +#endif + #endif /* _LINUX_SCHED_MM_H */ -- cgit From 8cb37a5974a48569aab8a1736d21399fddbdbdb2 Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Mon, 31 Jan 2022 10:05:20 +0100 Subject: stack: Introduce CONFIG_RANDOMIZE_KSTACK_OFFSET The randomize_kstack_offset feature is unconditionally compiled in when the architecture supports it. To add constraints on compiler versions, we require a dedicated Kconfig variable. Therefore, introduce RANDOMIZE_KSTACK_OFFSET. Furthermore, this option is now also configurable by EXPERT kernels: while the feature is supposed to have zero performance overhead when disabled, due to its use of static branches, there are few cases where giving a distribution the option to disable the feature entirely makes sense. For example, in very resource constrained environments, which would never enable the feature to begin with, in which case the additional kernel code size increase would be redundant. Signed-off-by: Marco Elver Reviewed-by: Nathan Chancellor Acked-by: Peter Zijlstra (Intel) Acked-by: Kees Cook Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220131090521.1947110-1-elver@google.com --- include/linux/randomize_kstack.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h index bebc911161b6..91f1b990a3c3 100644 --- a/include/linux/randomize_kstack.h +++ b/include/linux/randomize_kstack.h @@ -2,6 +2,7 @@ #ifndef _LINUX_RANDOMIZE_KSTACK_H #define _LINUX_RANDOMIZE_KSTACK_H +#ifdef CONFIG_RANDOMIZE_KSTACK_OFFSET #include #include #include @@ -50,5 +51,9 @@ void *__builtin_alloca(size_t size); raw_cpu_write(kstack_offset, offset); \ } \ } while (0) +#else /* CONFIG_RANDOMIZE_KSTACK_OFFSET */ +#define add_random_kstack_offset() do { } while (0) +#define choose_random_kstack_offset(rand) do { } while (0) +#endif /* CONFIG_RANDOMIZE_KSTACK_OFFSET */ #endif -- cgit From efa90c11f62e6b7252fb75efe2787056872a627c Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Mon, 31 Jan 2022 10:05:21 +0100 Subject: stack: Constrain and fix stack offset randomization with Clang builds All supported versions of Clang perform auto-init of __builtin_alloca() when stack auto-init is on (CONFIG_INIT_STACK_ALL_{ZERO,PATTERN}). add_random_kstack_offset() uses __builtin_alloca() to add a stack offset. This means, when CONFIG_INIT_STACK_ALL_{ZERO,PATTERN} is enabled, add_random_kstack_offset() will auto-init that unused portion of the stack used to add an offset. There are several problems with this: 1. These offsets can be as large as 1023 bytes. Performing memset() on them isn't exactly cheap, and this is done on every syscall entry. 2. Architectures adding add_random_kstack_offset() to syscall entry implemented in C require them to be 'noinstr' (e.g. see x86 and s390). The potential problem here is that a call to memset may occur, which is not noinstr. A x86_64 defconfig kernel with Clang 11 and CONFIG_VMLINUX_VALIDATION shows: | vmlinux.o: warning: objtool: do_syscall_64()+0x9d: call to memset() leaves .noinstr.text section | vmlinux.o: warning: objtool: do_int80_syscall_32()+0xab: call to memset() leaves .noinstr.text section | vmlinux.o: warning: objtool: __do_fast_syscall_32()+0xe2: call to memset() leaves .noinstr.text section | vmlinux.o: warning: objtool: fixup_bad_iret()+0x2f: call to memset() leaves .noinstr.text section Clang 14 (unreleased) will introduce a way to skip alloca initialization via __builtin_alloca_uninitialized() (https://reviews.llvm.org/D115440). Constrain RANDOMIZE_KSTACK_OFFSET to only be enabled if no stack auto-init is enabled, the compiler is GCC, or Clang is version 14+. Use __builtin_alloca_uninitialized() if the compiler provides it, as is done by Clang 14. Link: https://lkml.kernel.org/r/YbHTKUjEejZCLyhX@elver.google.com Fixes: 39218ff4c625 ("stack: Optionally randomize kernel stack offset each syscall") Signed-off-by: Marco Elver Reviewed-by: Nathan Chancellor Acked-by: Kees Cook Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220131090521.1947110-2-elver@google.com --- include/linux/randomize_kstack.h | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h index 91f1b990a3c3..1468caf001c0 100644 --- a/include/linux/randomize_kstack.h +++ b/include/linux/randomize_kstack.h @@ -17,8 +17,20 @@ DECLARE_PER_CPU(u32, kstack_offset); * alignment. Also, since this use is being explicitly masked to a max of * 10 bits, stack-clash style attacks are unlikely. For more details see * "VLAs" in Documentation/process/deprecated.rst + * + * The normal __builtin_alloca() is initialized with INIT_STACK_ALL (currently + * only with Clang and not GCC). Initializing the unused area on each syscall + * entry is expensive, and generating an implicit call to memset() may also be + * problematic (such as in noinstr functions). Therefore, if the compiler + * supports it (which it should if it initializes allocas), always use the + * "uninitialized" variant of the builtin. */ -void *__builtin_alloca(size_t size); +#if __has_builtin(__builtin_alloca_uninitialized) +#define __kstack_alloca __builtin_alloca_uninitialized +#else +#define __kstack_alloca __builtin_alloca +#endif + /* * Use, at most, 10 bits of entropy. We explicitly cap this to keep the * "VLA" from being unbounded (see above). 10 bits leaves enough room for @@ -37,7 +49,7 @@ void *__builtin_alloca(size_t size); if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \ &randomize_kstack_offset)) { \ u32 offset = raw_cpu_read(kstack_offset); \ - u8 *ptr = __builtin_alloca(KSTACK_OFFSET_MAX(offset)); \ + u8 *ptr = __kstack_alloca(KSTACK_OFFSET_MAX(offset)); \ /* Keep allocation even after "ptr" loses scope. */ \ asm volatile("" :: "r"(ptr) : "memory"); \ } \ -- cgit From 481153991c410e322383490527c20b0c41dbf652 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Thu, 10 Feb 2022 22:33:41 +0100 Subject: i2c: don't expose function which is only used internally MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit i2c_setup_smbus_alert() is only needed within the I2C core, so no need to expose it to other modules. Signed-off-by: Wolfram Sang Reviewed-by: Niklas Söderlund Signed-off-by: Wolfram Sang --- include/linux/i2c-smbus.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/i2c-smbus.h b/include/linux/i2c-smbus.h index 95cf902e0bda..ced1c6ead52a 100644 --- a/include/linux/i2c-smbus.h +++ b/include/linux/i2c-smbus.h @@ -30,14 +30,6 @@ struct i2c_client *i2c_new_smbus_alert_device(struct i2c_adapter *adapter, struct i2c_smbus_alert_setup *setup); int i2c_handle_smbus_alert(struct i2c_client *ara); -#if IS_ENABLED(CONFIG_I2C_SMBUS) -int i2c_setup_smbus_alert(struct i2c_adapter *adap); -#else -static inline int i2c_setup_smbus_alert(struct i2c_adapter *adap) -{ - return 0; -} -#endif #if IS_ENABLED(CONFIG_I2C_SMBUS) && IS_ENABLED(CONFIG_I2C_SLAVE) struct i2c_client *i2c_new_slave_host_notify_device(struct i2c_adapter *adapter); void i2c_free_slave_host_notify_device(struct i2c_client *client); -- cgit From 701fac40384f07197b106136012804c3cae0b3de Mon Sep 17 00:00:00 2001 From: Fenghua Yu Date: Mon, 7 Feb 2022 15:02:48 -0800 Subject: iommu/sva: Assign a PASID to mm on PASID allocation and free it on mm exit PASIDs are process-wide. It was attempted to use refcounted PASIDs to free them when the last thread drops the refcount. This turned out to be complex and error prone. Given the fact that the PASID space is 20 bits, which allows up to 1M processes to have a PASID associated concurrently, PASID resource exhaustion is not a realistic concern. Therefore, it was decided to simplify the approach and stick with lazy on demand PASID allocation, but drop the eager free approach and make an allocated PASID's lifetime bound to the lifetime of the process. Get rid of the refcounting mechanisms and replace/rename the interfaces to reflect this new approach. [ bp: Massage commit message. ] Suggested-by: Dave Hansen Signed-off-by: Fenghua Yu Signed-off-by: Borislav Petkov Reviewed-by: Tony Luck Reviewed-by: Lu Baolu Reviewed-by: Jacob Pan Reviewed-by: Thomas Gleixner Acked-by: Joerg Roedel Link: https://lore.kernel.org/r/20220207230254.3342514-6-fenghua.yu@intel.com --- include/linux/ioasid.h | 12 ++---------- include/linux/sched/mm.h | 16 ++++++++++++++++ 2 files changed, 18 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ioasid.h b/include/linux/ioasid.h index 2237f64dbaae..af1c9d62e642 100644 --- a/include/linux/ioasid.h +++ b/include/linux/ioasid.h @@ -34,8 +34,7 @@ struct ioasid_allocator_ops { #if IS_ENABLED(CONFIG_IOASID) ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max, void *private); -void ioasid_get(ioasid_t ioasid); -bool ioasid_put(ioasid_t ioasid); +void ioasid_free(ioasid_t ioasid); void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid, bool (*getter)(void *)); int ioasid_register_allocator(struct ioasid_allocator_ops *allocator); @@ -53,14 +52,7 @@ static inline ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, return INVALID_IOASID; } -static inline void ioasid_get(ioasid_t ioasid) -{ -} - -static inline bool ioasid_put(ioasid_t ioasid) -{ - return false; -} +static inline void ioasid_free(ioasid_t ioasid) { } static inline void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid, bool (*getter)(void *)) diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index c74d1edbac2f..a80356e9dc69 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -439,8 +439,24 @@ static inline void mm_pasid_init(struct mm_struct *mm) { mm->pasid = INVALID_IOASID; } + +/* Associate a PASID with an mm_struct: */ +static inline void mm_pasid_set(struct mm_struct *mm, u32 pasid) +{ + mm->pasid = pasid; +} + +static inline void mm_pasid_drop(struct mm_struct *mm) +{ + if (pasid_valid(mm->pasid)) { + ioasid_free(mm->pasid); + mm->pasid = INVALID_IOASID; + } +} #else static inline void mm_pasid_init(struct mm_struct *mm) {} +static inline void mm_pasid_set(struct mm_struct *mm, u32 pasid) {} +static inline void mm_pasid_drop(struct mm_struct *mm) {} #endif #endif /* _LINUX_SCHED_MM_H */ -- cgit From a3d29e8291b622780eb6e4e3eeaf2b24ec78fd43 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 7 Feb 2022 15:02:50 -0800 Subject: sched: Define and initialize a flag to identify valid PASID in the task Add a new single bit field to the task structure to track whether this task has initialized the IA32_PASID MSR to the mm's PASID. Initialize the field to zero when creating a new task with fork/clone. Signed-off-by: Peter Zijlstra Co-developed-by: Fenghua Yu Signed-off-by: Fenghua Yu Signed-off-by: Borislav Petkov Reviewed-by: Tony Luck Reviewed-by: Thomas Gleixner Link: https://lore.kernel.org/r/20220207230254.3342514-8-fenghua.yu@intel.com --- include/linux/sched.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 75ba8aa60248..4e5de3aed410 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -938,6 +938,9 @@ struct task_struct { /* Recursion prevention for eventfd_signal() */ unsigned in_eventfd_signal:1; #endif +#ifdef CONFIG_IOMMU_SVA + unsigned pasid_activated:1; +#endif unsigned long atomic_flags; /* Flags requiring atomic access. */ -- cgit From 45ec846c1cd11835a29c85645065115dd791aa45 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Wed, 9 Feb 2022 16:25:58 +0000 Subject: irqdomain: Let irq_domain_set_{info,hwirq_and_chip} take a const irq_chip In order to let a const irqchip be fed to the irqchip layer, adjust the various prototypes. An extra cast in irq_domain_set_hwirq_and_chip() is required to avoid a warning. Signed-off-by: Marc Zyngier Acked-by: Linus Walleij Link: https://lore.kernel.org/r/20220209162607.1118325-2-maz@kernel.org --- include/linux/irqdomain.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h index be25a33293e5..00d577f90883 100644 --- a/include/linux/irqdomain.h +++ b/include/linux/irqdomain.h @@ -479,7 +479,8 @@ int irq_destroy_ipi(unsigned int irq, const struct cpumask *dest); extern struct irq_data *irq_domain_get_irq_data(struct irq_domain *domain, unsigned int virq); extern void irq_domain_set_info(struct irq_domain *domain, unsigned int virq, - irq_hw_number_t hwirq, struct irq_chip *chip, + irq_hw_number_t hwirq, + const struct irq_chip *chip, void *chip_data, irq_flow_handler_t handler, void *handler_data, const char *handler_name); extern void irq_domain_reset_irq_data(struct irq_data *irq_data); @@ -522,7 +523,7 @@ extern int irq_domain_alloc_irqs_hierarchy(struct irq_domain *domain, extern int irq_domain_set_hwirq_and_chip(struct irq_domain *domain, unsigned int virq, irq_hw_number_t hwirq, - struct irq_chip *chip, + const struct irq_chip *chip, void *chip_data); extern void irq_domain_free_irqs_common(struct irq_domain *domain, unsigned int virq, -- cgit From 393e1280f765661cf39785e967676a4e57324126 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Wed, 9 Feb 2022 16:25:59 +0000 Subject: genirq: Allow irq_chip registration functions to take a const irq_chip In order to let a const irqchip be fed to the irqchip layer, adjust the various prototypes. An extra cast in irq_set_chip()() is required to avoid a warning. Signed-off-by: Marc Zyngier Acked-by: Linus Walleij Link: https://lore.kernel.org/r/20220209162607.1118325-3-maz@kernel.org --- include/linux/irq.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/irq.h b/include/linux/irq.h index 2cb2e2ac2703..f92788ccdba2 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -710,10 +710,11 @@ extern struct irq_chip no_irq_chip; extern struct irq_chip dummy_irq_chip; extern void -irq_set_chip_and_handler_name(unsigned int irq, struct irq_chip *chip, +irq_set_chip_and_handler_name(unsigned int irq, const struct irq_chip *chip, irq_flow_handler_t handle, const char *name); -static inline void irq_set_chip_and_handler(unsigned int irq, struct irq_chip *chip, +static inline void irq_set_chip_and_handler(unsigned int irq, + const struct irq_chip *chip, irq_flow_handler_t handle) { irq_set_chip_and_handler_name(irq, chip, handle, NULL); @@ -803,7 +804,7 @@ static inline void irq_set_percpu_devid_flags(unsigned int irq) } /* Set/get chip/data for an IRQ: */ -extern int irq_set_chip(unsigned int irq, struct irq_chip *chip); +extern int irq_set_chip(unsigned int irq, const struct irq_chip *chip); extern int irq_set_handler_data(unsigned int irq, void *data); extern int irq_set_chip_data(unsigned int irq, void *data); extern int irq_set_irq_type(unsigned int irq, unsigned int type); -- cgit From 3fb212a042fbd8eccbb2af1852e03ed7757b9600 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Wed, 9 Feb 2022 16:26:04 +0000 Subject: irqchip/versatile-fpga: Switch to dynamic chip name output Move the name output to the relevant callback, which allows us some nice cleanups (mostly owing to the fact that the driver is now DT only. We also drop a random include directive from the ftintc010 driver. Signed-off-by: Marc Zyngier Reviewed-by: Linus Walleij Acked-by: Linus Walleij Link: https://lore.kernel.org/r/20220209162607.1118325-8-maz@kernel.org --- include/linux/irqchip/versatile-fpga.h | 14 -------------- 1 file changed, 14 deletions(-) delete mode 100644 include/linux/irqchip/versatile-fpga.h (limited to 'include/linux') diff --git a/include/linux/irqchip/versatile-fpga.h b/include/linux/irqchip/versatile-fpga.h deleted file mode 100644 index a978fc8c7996..000000000000 --- a/include/linux/irqchip/versatile-fpga.h +++ /dev/null @@ -1,14 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef PLAT_FPGA_IRQ_H -#define PLAT_FPGA_IRQ_H - -struct device_node; -struct pt_regs; - -void fpga_handle_irq(struct pt_regs *regs); -void fpga_irq_init(void __iomem *, const char *, int, int, u32, - struct device_node *node); -int fpga_irq_of_init(struct device_node *node, - struct device_node *parent); - -#endif -- cgit From 5e50f5d4ff31e95599d695df1f0a4e7d2d6fef99 Mon Sep 17 00:00:00 2001 From: Ondrej Mosnacek Date: Sat, 12 Feb 2022 18:59:21 +0100 Subject: security: add sctp_assoc_established hook security_sctp_assoc_established() is added to replace security_inet_conn_established() called in sctp_sf_do_5_1E_ca(), so that asoc can be accessed in security subsystem and save the peer secid to asoc->peer_secid. Fixes: 72e89f50084c ("security: Add support for SCTP security hooks") Reported-by: Prashanth Prahlad Based-on-patch-by: Xin Long Reviewed-by: Xin Long Tested-by: Richard Haines Signed-off-by: Ondrej Mosnacek Signed-off-by: Paul Moore --- include/linux/lsm_hook_defs.h | 2 ++ include/linux/lsm_hooks.h | 5 +++++ include/linux/security.h | 8 ++++++++ 3 files changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index a5a724c308d8..45931d81ccc3 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -332,6 +332,8 @@ LSM_HOOK(int, 0, sctp_bind_connect, struct sock *sk, int optname, struct sockaddr *address, int addrlen) LSM_HOOK(void, LSM_RET_VOID, sctp_sk_clone, struct sctp_association *asoc, struct sock *sk, struct sock *newsk) +LSM_HOOK(int, 0, sctp_assoc_established, struct sctp_association *asoc, + struct sk_buff *skb) #endif /* CONFIG_SECURITY_NETWORK */ #ifdef CONFIG_SECURITY_INFINIBAND diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 3bf5c658bc44..419b5febc3ca 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -1046,6 +1046,11 @@ * @asoc pointer to current sctp association structure. * @sk pointer to current sock structure. * @newsk pointer to new sock structure. + * @sctp_assoc_established: + * Passes the @asoc and @chunk->skb of the association COOKIE_ACK packet + * to the security module. + * @asoc pointer to sctp association structure. + * @skb pointer to skbuff of association packet. * * Security hooks for Infiniband * diff --git a/include/linux/security.h b/include/linux/security.h index 6d72772182c8..25b3ef71f495 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -1422,6 +1422,8 @@ int security_sctp_bind_connect(struct sock *sk, int optname, struct sockaddr *address, int addrlen); void security_sctp_sk_clone(struct sctp_association *asoc, struct sock *sk, struct sock *newsk); +int security_sctp_assoc_established(struct sctp_association *asoc, + struct sk_buff *skb); #else /* CONFIG_SECURITY_NETWORK */ static inline int security_unix_stream_connect(struct sock *sock, @@ -1641,6 +1643,12 @@ static inline void security_sctp_sk_clone(struct sctp_association *asoc, struct sock *newsk) { } + +static inline int security_sctp_assoc_established(struct sctp_association *asoc, + struct sk_buff *skb) +{ + return 0; +} #endif /* CONFIG_SECURITY_NETWORK */ #ifdef CONFIG_SECURITY_INFINIBAND -- cgit From b62a8486de3ab1d7c2353ec422b9cca3abfcfbcd Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Mon, 31 Jan 2022 16:54:52 +0000 Subject: elfcore: Replace CONFIG_{IA64, UML} checks with a new option As arm64 is about to introduce MTE-specific phdrs in the core dump, add a common CONFIG_ARCH_BINFMT_ELF_EXTRA_PHDRS option currently selectable by UML_X86 and IA64. Signed-off-by: Catalin Marinas Cc: Eric Biederman Link: https://lore.kernel.org/r/20220131165456.2160675-2-catalin.marinas@arm.com Signed-off-by: Will Deacon --- include/linux/elfcore.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/elfcore.h b/include/linux/elfcore.h index 746e081879a5..f8e206e82476 100644 --- a/include/linux/elfcore.h +++ b/include/linux/elfcore.h @@ -114,7 +114,7 @@ static inline int elf_core_copy_task_fpregs(struct task_struct *t, struct pt_reg #endif } -#if (defined(CONFIG_UML) && defined(CONFIG_X86_32)) || defined(CONFIG_IA64) +#ifdef CONFIG_ARCH_BINFMT_ELF_EXTRA_PHDRS /* * These functions parameterize elf_core_dump in fs/binfmt_elf.c to write out * extra segments containing the gate DSO contents. Dumping its @@ -149,6 +149,6 @@ static inline size_t elf_core_extra_data_size(void) { return 0; } -#endif +#endif /* CONFIG_ARCH_BINFMT_ELF_EXTRA_PHDRS */ #endif /* _LINUX_ELFCORE_H */ -- cgit From f41b6be1ebdae452819551ed35a46e6fd32bf467 Mon Sep 17 00:00:00 2001 From: Jens Wiklander Date: Fri, 4 Feb 2022 10:33:51 +0100 Subject: tee: remove unused tee_shm_pool_alloc_res_mem() None of the drivers in the TEE subsystem uses tee_shm_pool_alloc_res_mem() so remove the function. Reviewed-by: Sumit Garg Signed-off-by: Jens Wiklander --- include/linux/tee_drv.h | 30 ------------------------------ 1 file changed, 30 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h index 5e1533ee3785..6b0f0d01ebdf 100644 --- a/include/linux/tee_drv.h +++ b/include/linux/tee_drv.h @@ -278,36 +278,6 @@ static inline void tee_shm_pool_mgr_destroy(struct tee_shm_pool_mgr *poolm) poolm->ops->destroy_poolmgr(poolm); } -/** - * struct tee_shm_pool_mem_info - holds information needed to create a shared - * memory pool - * @vaddr: Virtual address of start of pool - * @paddr: Physical address of start of pool - * @size: Size in bytes of the pool - */ -struct tee_shm_pool_mem_info { - unsigned long vaddr; - phys_addr_t paddr; - size_t size; -}; - -/** - * tee_shm_pool_alloc_res_mem() - Create a shared memory pool from reserved - * memory range - * @priv_info: Information for driver private shared memory pool - * @dmabuf_info: Information for dma-buf shared memory pool - * - * Start and end of pools will must be page aligned. - * - * Allocation with the flag TEE_SHM_DMA_BUF set will use the range supplied - * in @dmabuf, others will use the range provided by @priv. - * - * @returns pointer to a 'struct tee_shm_pool' or an ERR_PTR on failure. - */ -struct tee_shm_pool * -tee_shm_pool_alloc_res_mem(struct tee_shm_pool_mem_info *priv_info, - struct tee_shm_pool_mem_info *dmabuf_info); - /** * tee_shm_pool_free() - Free a shared memory pool * @pool: The shared memory pool to free -- cgit From 71cc47d4cc1f7a333584e0f2f7c863c71a6d3ced Mon Sep 17 00:00:00 2001 From: Jens Wiklander Date: Fri, 4 Feb 2022 10:33:52 +0100 Subject: tee: add tee_shm_alloc_user_buf() Adds a new function tee_shm_alloc_user_buf() for user mode allocations, replacing passing the flags TEE_SHM_MAPPED | TEE_SHM_DMA_BUF to tee_shm_alloc(). Reviewed-by: Sumit Garg Signed-off-by: Jens Wiklander --- include/linux/tee_drv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h index 6b0f0d01ebdf..a4393c8c38f3 100644 --- a/include/linux/tee_drv.h +++ b/include/linux/tee_drv.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0-only */ /* - * Copyright (c) 2015-2016, Linaro Limited + * Copyright (c) 2015-2016 Linaro Limited */ #ifndef __TEE_DRV_H -- cgit From d88e0493a054c9fe72ade41a42d42e958ee6503d Mon Sep 17 00:00:00 2001 From: Jens Wiklander Date: Fri, 4 Feb 2022 10:33:53 +0100 Subject: tee: simplify shm pool handling Replaces the shared memory pool based on two pools with a single pool. The alloc() function pointer in struct tee_shm_pool_ops gets another parameter, align. This makes it possible to make less than page aligned allocations from the optional reserved shared memory pool while still making user space allocations page aligned. With in practice unchanged behaviour using only a single pool for bookkeeping. The allocation algorithm in the static OP-TEE shared memory pool is changed from best-fit to first-fit since only the latter supports an alignment parameter. The best-fit algorithm was previously the default choice and not a conscious one. The optee and amdtee drivers are updated as needed to work with this changed pool handling. This also removes OPTEE_SHM_NUM_PRIV_PAGES which becomes obsolete with this change as the private pages can be mixed with the payload pages. The OP-TEE driver changes minimum alignment for argument struct from 8 bytes to 512 bytes. A typical OP-TEE private shm allocation is 224 bytes (argument struct with 6 parameters, needed for open session). So with an alignment of 512 well waste a bit more than 50%. Before this we had a single page reserved for this so worst case usage compared to that would be 3 pages instead of 1 page. However, this worst case only occurs if there is a high pressure from multiple threads on secure world. All in all this should scale up and down better than fixed boundaries. Reviewed-by: Sumit Garg Signed-off-by: Jens Wiklander --- include/linux/tee_drv.h | 60 +++++++++++++++++-------------------------------- 1 file changed, 20 insertions(+), 40 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h index a4393c8c38f3..ed641dc314bd 100644 --- a/include/linux/tee_drv.h +++ b/include/linux/tee_drv.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0-only */ /* - * Copyright (c) 2015-2016 Linaro Limited + * Copyright (c) 2015-2022 Linaro Limited */ #ifndef __TEE_DRV_H @@ -221,62 +221,39 @@ struct tee_shm { }; /** - * struct tee_shm_pool_mgr - shared memory manager + * struct tee_shm_pool - shared memory pool * @ops: operations * @private_data: private data for the shared memory manager */ -struct tee_shm_pool_mgr { - const struct tee_shm_pool_mgr_ops *ops; +struct tee_shm_pool { + const struct tee_shm_pool_ops *ops; void *private_data; }; /** - * struct tee_shm_pool_mgr_ops - shared memory pool manager operations + * struct tee_shm_pool_ops - shared memory pool operations * @alloc: called when allocating shared memory * @free: called when freeing shared memory - * @destroy_poolmgr: called when destroying the pool manager + * @destroy_pool: called when destroying the pool */ -struct tee_shm_pool_mgr_ops { - int (*alloc)(struct tee_shm_pool_mgr *poolmgr, struct tee_shm *shm, - size_t size); - void (*free)(struct tee_shm_pool_mgr *poolmgr, struct tee_shm *shm); - void (*destroy_poolmgr)(struct tee_shm_pool_mgr *poolmgr); +struct tee_shm_pool_ops { + int (*alloc)(struct tee_shm_pool *pool, struct tee_shm *shm, + size_t size, size_t align); + void (*free)(struct tee_shm_pool *pool, struct tee_shm *shm); + void (*destroy_pool)(struct tee_shm_pool *pool); }; -/** - * tee_shm_pool_alloc() - Create a shared memory pool from shm managers - * @priv_mgr: manager for driver private shared memory allocations - * @dmabuf_mgr: manager for dma-buf shared memory allocations - * - * Allocation with the flag TEE_SHM_DMA_BUF set will use the range supplied - * in @dmabuf, others will use the range provided by @priv. - * - * @returns pointer to a 'struct tee_shm_pool' or an ERR_PTR on failure. - */ -struct tee_shm_pool *tee_shm_pool_alloc(struct tee_shm_pool_mgr *priv_mgr, - struct tee_shm_pool_mgr *dmabuf_mgr); - /* - * tee_shm_pool_mgr_alloc_res_mem() - Create a shm manager for reserved - * memory + * tee_shm_pool_alloc_res_mem() - Create a shm manager for reserved memory * @vaddr: Virtual address of start of pool * @paddr: Physical address of start of pool * @size: Size in bytes of the pool * - * @returns pointer to a 'struct tee_shm_pool_mgr' or an ERR_PTR on failure. - */ -struct tee_shm_pool_mgr *tee_shm_pool_mgr_alloc_res_mem(unsigned long vaddr, - phys_addr_t paddr, - size_t size, - int min_alloc_order); - -/** - * tee_shm_pool_mgr_destroy() - Free a shared memory manager + * @returns pointer to a 'struct tee_shm_pool' or an ERR_PTR on failure. */ -static inline void tee_shm_pool_mgr_destroy(struct tee_shm_pool_mgr *poolm) -{ - poolm->ops->destroy_poolmgr(poolm); -} +struct tee_shm_pool *tee_shm_pool_alloc_res_mem(unsigned long vaddr, + phys_addr_t paddr, size_t size, + int min_alloc_order); /** * tee_shm_pool_free() - Free a shared memory pool @@ -285,7 +262,10 @@ static inline void tee_shm_pool_mgr_destroy(struct tee_shm_pool_mgr *poolm) * The must be no remaining shared memory allocated from this pool when * this function is called. */ -void tee_shm_pool_free(struct tee_shm_pool *pool); +static inline void tee_shm_pool_free(struct tee_shm_pool *pool) +{ + pool->ops->destroy_pool(pool); +} /** * tee_get_drvdata() - Return driver_data pointer -- cgit From 5d41f1b3e3282909b6bbceacb9aebe1d3c849a49 Mon Sep 17 00:00:00 2001 From: Jens Wiklander Date: Fri, 4 Feb 2022 10:33:54 +0100 Subject: tee: replace tee_shm_alloc() tee_shm_alloc() is replaced by three new functions, tee_shm_alloc_user_buf() - for user mode allocations, replacing passing the flags TEE_SHM_MAPPED | TEE_SHM_DMA_BUF tee_shm_alloc_kernel_buf() - for kernel mode allocations, slightly optimized compared to using the flags TEE_SHM_MAPPED | TEE_SHM_DMA_BUF. tee_shm_alloc_priv_buf() - primarily for TEE driver internal use. This also makes the interface easier to use as we can get rid of the somewhat hard to use flags parameter. The TEE subsystem and the TEE drivers are updated to use the new functions instead. Reviewed-by: Sumit Garg Signed-off-by: Jens Wiklander --- include/linux/tee_drv.h | 16 +--------------- 1 file changed, 1 insertion(+), 15 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h index ed641dc314bd..7f038f8787c7 100644 --- a/include/linux/tee_drv.h +++ b/include/linux/tee_drv.h @@ -273,21 +273,7 @@ static inline void tee_shm_pool_free(struct tee_shm_pool *pool) */ void *tee_get_drvdata(struct tee_device *teedev); -/** - * tee_shm_alloc() - Allocate shared memory - * @ctx: Context that allocates the shared memory - * @size: Requested size of shared memory - * @flags: Flags setting properties for the requested shared memory. - * - * Memory allocated as global shared memory is automatically freed when the - * TEE file pointer is closed. The @flags field uses the bits defined by - * TEE_SHM_* above. TEE_SHM_MAPPED must currently always be set. If - * TEE_SHM_DMA_BUF global shared memory will be allocated and associated - * with a dma-buf handle, else driver private memory. - * - * @returns a pointer to 'struct tee_shm' - */ -struct tee_shm *tee_shm_alloc(struct tee_context *ctx, size_t size, u32 flags); +struct tee_shm *tee_shm_alloc_priv_buf(struct tee_context *ctx, size_t size); struct tee_shm *tee_shm_alloc_kernel_buf(struct tee_context *ctx, size_t size); /** -- cgit From 056d3fed3d1ff3f5d699be337f048f9eed2befaf Mon Sep 17 00:00:00 2001 From: Jens Wiklander Date: Fri, 4 Feb 2022 10:33:56 +0100 Subject: tee: add tee_shm_register_{user,kernel}_buf() Adds the two new functions tee_shm_register_user_buf() and tee_shm_register_kernel_buf() which should be used instead of the old tee_shm_register(). This avoids having the caller supplying the flags parameter which exposes a bit more than desired of the internals of the TEE subsystem. Reviewed-by: Sumit Garg Signed-off-by: Jens Wiklander --- include/linux/tee_drv.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h index 7f038f8787c7..c9d2cc32a5ed 100644 --- a/include/linux/tee_drv.h +++ b/include/linux/tee_drv.h @@ -287,6 +287,8 @@ struct tee_shm *tee_shm_alloc_kernel_buf(struct tee_context *ctx, size_t size); */ struct tee_shm *tee_shm_register(struct tee_context *ctx, unsigned long addr, size_t length, u32 flags); +struct tee_shm *tee_shm_register_kernel_buf(struct tee_context *ctx, + void *addr, size_t length); /** * tee_shm_is_registered() - Check if shared memory object in registered in TEE -- cgit From 53e16519c2eccdb2e1b123405466a29aaea1132e Mon Sep 17 00:00:00 2001 From: Jens Wiklander Date: Fri, 4 Feb 2022 10:33:58 +0100 Subject: tee: replace tee_shm_register() tee_shm_register() is replaced by the previously introduced functions tee_shm_register_user_buf() and tee_shm_register_kernel_buf(). Since there are not external callers left we can remove tee_shm_register() and refactor the remains. Reviewed-by: Sumit Garg Signed-off-by: Jens Wiklander --- include/linux/tee_drv.h | 11 ----------- 1 file changed, 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h index c9d2cc32a5ed..a3b663ef0694 100644 --- a/include/linux/tee_drv.h +++ b/include/linux/tee_drv.h @@ -276,17 +276,6 @@ void *tee_get_drvdata(struct tee_device *teedev); struct tee_shm *tee_shm_alloc_priv_buf(struct tee_context *ctx, size_t size); struct tee_shm *tee_shm_alloc_kernel_buf(struct tee_context *ctx, size_t size); -/** - * tee_shm_register() - Register shared memory buffer - * @ctx: Context that registers the shared memory - * @addr: Address is userspace of the shared buffer - * @length: Length of the shared buffer - * @flags: Flags setting properties for the requested shared memory. - * - * @returns a pointer to 'struct tee_shm' - */ -struct tee_shm *tee_shm_register(struct tee_context *ctx, unsigned long addr, - size_t length, u32 flags); struct tee_shm *tee_shm_register_kernel_buf(struct tee_context *ctx, void *addr, size_t length); -- cgit From a45ea4efa358577c623d7353a6ba9af3c17f6ca0 Mon Sep 17 00:00:00 2001 From: Jens Wiklander Date: Fri, 4 Feb 2022 10:33:59 +0100 Subject: tee: refactor TEE_SHM_* flags Removes the redundant TEE_SHM_DMA_BUF, TEE_SHM_EXT_DMA_BUF, TEE_SHM_MAPPED and TEE_SHM_KERNEL_MAPPED flags. TEE_SHM_REGISTER is renamed to TEE_SHM_DYNAMIC in order to better match its usage. Assigns new values to the remaining flags to void gaps. Reviewed-by: Sumit Garg Signed-off-by: Jens Wiklander --- include/linux/tee_drv.h | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h index a3b663ef0694..911cad324acc 100644 --- a/include/linux/tee_drv.h +++ b/include/linux/tee_drv.h @@ -20,14 +20,11 @@ * specific TEE driver. */ -#define TEE_SHM_MAPPED BIT(0) /* Memory mapped by the kernel */ -#define TEE_SHM_DMA_BUF BIT(1) /* Memory with dma-buf handle */ -#define TEE_SHM_EXT_DMA_BUF BIT(2) /* Memory with dma-buf handle */ -#define TEE_SHM_REGISTER BIT(3) /* Memory registered in secure world */ -#define TEE_SHM_USER_MAPPED BIT(4) /* Memory mapped in user space */ -#define TEE_SHM_POOL BIT(5) /* Memory allocated from pool */ -#define TEE_SHM_KERNEL_MAPPED BIT(6) /* Memory mapped in kernel space */ -#define TEE_SHM_PRIV BIT(7) /* Memory private to TEE driver */ +#define TEE_SHM_DYNAMIC BIT(0) /* Dynamic shared memory registered */ + /* in secure world */ +#define TEE_SHM_USER_MAPPED BIT(1) /* Memory mapped in user space */ +#define TEE_SHM_POOL BIT(2) /* Memory allocated from pool */ +#define TEE_SHM_PRIV BIT(3) /* Memory private to TEE driver */ struct device; struct tee_device; @@ -280,13 +277,13 @@ struct tee_shm *tee_shm_register_kernel_buf(struct tee_context *ctx, void *addr, size_t length); /** - * tee_shm_is_registered() - Check if shared memory object in registered in TEE + * tee_shm_is_dynamic() - Check if shared memory object is of the dynamic kind * @shm: Shared memory handle - * @returns true if object is registered in TEE + * @returns true if object is dynamic shared memory */ -static inline bool tee_shm_is_registered(struct tee_shm *shm) +static inline bool tee_shm_is_dynamic(struct tee_shm *shm) { - return shm && (shm->flags & TEE_SHM_REGISTER); + return shm && (shm->flags & TEE_SHM_DYNAMIC); } /** -- cgit From a257cacc38718c83cee003487e03197f237f5c3f Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Tue, 15 Feb 2022 13:41:02 +0100 Subject: asm-generic: Define CONFIG_HAVE_FUNCTION_DESCRIPTORS Replace HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR by a config option named CONFIG_HAVE_FUNCTION_DESCRIPTORS and use it instead of 'dereference_function_descriptor' macro to know whether an arch has function descriptors. To limit churn in one of the following patches, use an #ifdef/#else construct with empty first part instead of an #ifndef in asm-generic/sections.h On powerpc, make sure the config option matches the ABI used by the compiler with a BUILD_BUG_ON() and add missing _CALL_ELF=2 when calling 'sparse' so that sparse sees the same piece of code as GCC. And include a helper to check whether an arch has function descriptors or not : have_function_descriptors() Signed-off-by: Christophe Leroy Reviewed-by: Kees Cook Reviewed-by: Nicholas Piggin Acked-by: Helge Deller Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/4a0f11fb0ea74a3197bc44dd7ba25e53a24fd03d.1644928018.git.christophe.leroy@csgroup.eu --- include/linux/kallsyms.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index 4176c7eca7b5..ce1bd2fbf23e 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -48,7 +48,7 @@ static inline int is_ksym_addr(unsigned long addr) static inline void *dereference_symbol_descriptor(void *ptr) { -#ifdef HAVE_DEREFERENCE_FUNCTION_DESCRIPTOR +#ifdef CONFIG_HAVE_FUNCTION_DESCRIPTORS struct module *mod; ptr = dereference_kernel_function_descriptor(ptr); -- cgit From ba2689234be92024e5635d30fe744f4853ad97db Mon Sep 17 00:00:00 2001 From: James Morse Date: Thu, 18 Nov 2021 13:59:46 +0000 Subject: arm64: entry: Add vectors that have the bhb mitigation sequences Some CPUs affected by Spectre-BHB need a sequence of branches, or a firmware call to be run before any indirect branch. This needs to go in the vectors. No CPU needs both. While this can be patched in, it would run on all CPUs as there is a single set of vectors. If only one part of a big/little combination is affected, the unaffected CPUs have to run the mitigation too. Create extra vectors that include the sequence. Subsequent patches will allow affected CPUs to select this set of vectors. Later patches will modify the loop count to match what the CPU requires. Reviewed-by: Catalin Marinas Signed-off-by: James Morse --- include/linux/arm-smccc.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/arm-smccc.h b/include/linux/arm-smccc.h index 63ccb5252190..220c8c60e021 100644 --- a/include/linux/arm-smccc.h +++ b/include/linux/arm-smccc.h @@ -92,6 +92,11 @@ ARM_SMCCC_SMC_32, \ 0, 0x7fff) +#define ARM_SMCCC_ARCH_WORKAROUND_3 \ + ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ + ARM_SMCCC_SMC_32, \ + 0, 0x3fff) + #define ARM_SMCCC_VENDOR_HYP_CALL_UID_FUNC_ID \ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \ ARM_SMCCC_SMC_32, \ -- cgit From 08bc13d8efe30258850ecdc5d4f38c0bc07689c0 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Thu, 10 Feb 2022 20:12:43 +0100 Subject: ieee80211: use tab to indent struct ieee80211_neighbor_ap_info Somehow spaces were used here, use tab instead. Link: https://lore.kernel.org/r/20220210201242.da8fa2e5ae8d.Ia452db01876e52e815f6337fef437049df0d8bd9@changeid Signed-off-by: Johannes Berg --- include/linux/ieee80211.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index 60ee7b3f58e7..a2940a853783 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -4005,10 +4005,10 @@ static inline bool for_each_element_completed(const struct element *element, #define IEEE80211_RNR_TBTT_PARAMS_COLOC_AP 0x40 struct ieee80211_neighbor_ap_info { - u8 tbtt_info_hdr; - u8 tbtt_info_len; - u8 op_class; - u8 channel; + u8 tbtt_info_hdr; + u8 tbtt_info_len; + u8 op_class; + u8 channel; } __packed; enum ieee80211_range_params_max_total_ltf { -- cgit From d61f4274daa41565b5f9ce589b4ba1239781c139 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Mon, 14 Feb 2022 17:29:21 +0100 Subject: ieee80211: add helper to check HE capability element size This element has a very dynamic structure, create a small helper function to validate its size. We're currently checking it in mac80211 in a conversion function, but that's actually slightly buggy. Link: https://lore.kernel.org/r/20220214172920.750bee9eaf37.Ie18359bd38143b7dc949078f10752413e6d36854@changeid Signed-off-by: Johannes Berg --- include/linux/ieee80211.h | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index a2940a853783..dfc84680af82 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -9,7 +9,7 @@ * Copyright (c) 2006, Michael Wu * Copyright (c) 2013 - 2014 Intel Mobile Communications GmbH * Copyright (c) 2016 - 2017 Intel Deutschland GmbH - * Copyright (c) 2018 - 2021 Intel Corporation + * Copyright (c) 2018 - 2022 Intel Corporation */ #ifndef LINUX_IEEE80211_H @@ -2338,6 +2338,29 @@ ieee80211_he_ppe_size(u8 ppe_thres_hdr, const u8 *phy_cap_info) return n; } +static inline bool ieee80211_he_capa_size_ok(const u8 *data, u8 len) +{ + const struct ieee80211_he_cap_elem *he_cap_ie_elem = (const void *)data; + u8 needed = sizeof(*he_cap_ie_elem); + + if (len < needed) + return false; + + needed += ieee80211_he_mcs_nss_size(he_cap_ie_elem); + if (len < needed) + return false; + + if (he_cap_ie_elem->phy_cap_info[6] & + IEEE80211_HE_PHY_CAP6_PPE_THRESHOLD_PRESENT) { + if (len < needed + 1) + return false; + needed += ieee80211_he_ppe_size(data[needed], + he_cap_ie_elem->phy_cap_info); + } + + return len >= needed; +} + /* HE Operation defines */ #define IEEE80211_HE_OPERATION_DFLT_PE_DURATION_MASK 0x00000007 #define IEEE80211_HE_OPERATION_TWT_REQUIRED 0x00000008 -- cgit From cbc1ca0a9d0a54cb8aa7c54c16e71599551e3f74 Mon Sep 17 00:00:00 2001 From: Ilan Peer Date: Mon, 14 Feb 2022 17:29:51 +0100 Subject: ieee80211: Add EHT (802.11be) definitions Based on Draft P802.11be_D1.4. Signed-off-by: Ilan Peer Link: https://lore.kernel.org/r/20220214173004.928e23cacb2b.Id30a3ef2844b296efbd5486fe1da9ca36a95c5cf@changeid Signed-off-by: Johannes Berg --- include/linux/ieee80211.h | 299 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 299 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index dfc84680af82..7204bc28ca31 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -1925,6 +1926,111 @@ struct ieee80211_mu_edca_param_set { struct ieee80211_he_mu_edca_param_ac_rec ac_vo; } __packed; +#define IEEE80211_EHT_MCS_NSS_RX 0x0f +#define IEEE80211_EHT_MCS_NSS_TX 0xf0 + +/** + * struct ieee80211_eht_mcs_nss_supp_20mhz_only - EHT 20MHz only station max + * supported NSS for per MCS. + * + * For each field below, bits 0 - 3 indicate the maximal number of spatial + * streams for Rx, and bits 4 - 7 indicate the maximal number of spatial streams + * for Tx. + * + * @rx_tx_mcs7_max_nss: indicates the maximum number of spatial streams + * supported for reception and the maximum number of spatial streams + * supported for transmission for MCS 0 - 7. + * @rx_tx_mcs9_max_nss: indicates the maximum number of spatial streams + * supported for reception and the maximum number of spatial streams + * supported for transmission for MCS 8 - 9. + * @rx_tx_mcs11_max_nss: indicates the maximum number of spatial streams + * supported for reception and the maximum number of spatial streams + * supported for transmission for MCS 10 - 11. + * @rx_tx_mcs13_max_nss: indicates the maximum number of spatial streams + * supported for reception and the maximum number of spatial streams + * supported for transmission for MCS 12 - 13. + */ +struct ieee80211_eht_mcs_nss_supp_20mhz_only { + u8 rx_tx_mcs7_max_nss; + u8 rx_tx_mcs9_max_nss; + u8 rx_tx_mcs11_max_nss; + u8 rx_tx_mcs13_max_nss; +}; + +/** + * struct ieee80211_eht_mcs_nss_supp_bw - EHT max supported NSS per MCS (except + * 20MHz only stations). + * + * For each field below, bits 0 - 3 indicate the maximal number of spatial + * streams for Rx, and bits 4 - 7 indicate the maximal number of spatial streams + * for Tx. + * + * @rx_tx_mcs9_max_nss: indicates the maximum number of spatial streams + * supported for reception and the maximum number of spatial streams + * supported for transmission for MCS 0 - 9. + * @rx_tx_mcs11_max_nss: indicates the maximum number of spatial streams + * supported for reception and the maximum number of spatial streams + * supported for transmission for MCS 10 - 11. + * @rx_tx_mcs13_max_nss: indicates the maximum number of spatial streams + * supported for reception and the maximum number of spatial streams + * supported for transmission for MCS 12 - 13. + */ +struct ieee80211_eht_mcs_nss_supp_bw { + u8 rx_tx_mcs9_max_nss; + u8 rx_tx_mcs11_max_nss; + u8 rx_tx_mcs13_max_nss; +}; + +/** + * struct ieee80211_eht_cap_elem_fixed - EHT capabilities fixed data + * + * This structure is the "EHT Capabilities element" fixed fields as + * described in P802.11be_D1.4 section 9.4.2.313. + * + * @mac_cap_info: MAC capabilities, see IEEE80211_EHT_MAC_CAP* + * @phy_cap_info: PHY capabilities, see IEEE80211_EHT_PHY_CAP* + */ +struct ieee80211_eht_cap_elem_fixed { + u8 mac_cap_info[2]; + u8 phy_cap_info[9]; +} __packed; + +/** + * struct ieee80211_eht_cap_elem - EHT capabilities element + * @fixed: fixed parts, see &ieee80211_eht_cap_elem_fixed + * @optional: optional parts + */ +struct ieee80211_eht_cap_elem { + struct ieee80211_eht_cap_elem_fixed fixed; + + /* + * Followed by: + * Supported EHT-MCS And NSS Set field: 4, 3, 6 or 9 octets. + * EHT PPE Thresholds field: variable length. + */ + u8 optional[]; +} __packed; + +/** + * struct ieee80211_eht_operation - eht operation element + * + * This structure is the "EHT Operation Element" fields as + * described in P802.11be_D1.4 section 9.4.2.311 + * + * FIXME: The spec is unclear how big the fields are, and doesn't + * indicate the "Disabled Subchannel Bitmap Present" in the + * structure (Figure 9-1002a) at all ... + */ +struct ieee80211_eht_operation { + u8 chan_width; + u8 ccfs; + u8 present_bm; + + u8 disable_subchannel_bitmap[]; +} __packed; + +#define IEEE80211_EHT_OPER_DISABLED_SUBCHANNEL_BITMAP_PRESENT 0x1 + /* 802.11ac VHT Capabilities */ #define IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_3895 0x00000000 #define IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_7991 0x00000001 @@ -2129,6 +2235,8 @@ enum ieee80211_client_reg_power { #define IEEE80211_HE_PHY_CAP0_CHANNEL_WIDTH_SET_40MHZ_80MHZ_IN_5G 0x04 #define IEEE80211_HE_PHY_CAP0_CHANNEL_WIDTH_SET_160MHZ_IN_5G 0x08 #define IEEE80211_HE_PHY_CAP0_CHANNEL_WIDTH_SET_80PLUS80_MHZ_IN_5G 0x10 +#define IEEE80211_HE_PHY_CAP0_CHANNEL_WIDTH_SET_MASK_ALL 0x1e + #define IEEE80211_HE_PHY_CAP0_CHANNEL_WIDTH_SET_RU_MAPPING_IN_2G 0x20 #define IEEE80211_HE_PHY_CAP0_CHANNEL_WIDTH_SET_RU_MAPPING_IN_5G 0x40 #define IEEE80211_HE_PHY_CAP0_CHANNEL_WIDTH_SET_MASK 0xfe @@ -2622,6 +2730,194 @@ ieee80211_he_spr_size(const u8 *he_spr_ie) #define S1G_OPER_CH_WIDTH_PRIMARY_1MHZ BIT(0) #define S1G_OPER_CH_WIDTH_OPER GENMASK(4, 1) +/* EHT MAC capabilities as defined in P802.11be_D1.4 section 9.4.2.313.2 */ +#define IEEE80211_EHT_MAC_CAP0_NSEP_PRIO_ACCESS 0x01 +#define IEEE80211_EHT_MAC_CAP0_OM_CONTROL 0x02 +#define IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE1 0x04 +#define IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE2 0x08 +#define IEEE80211_EHT_MAC_CAP0_RESTRICTED_TWT 0x10 +#define IEEE80211_EHT_MAC_CAP0_SCS_TRAFFIC_DESC 0x20 +#define IEEE80211_EHT_MAC_CAP0_MAX_AMPDU_LEN_MASK 0xc0 +#define IEEE80211_EHT_MAC_CAP0_MAX_AMPDU_LEN_3895 0 +#define IEEE80211_EHT_MAC_CAP0_MAX_AMPDU_LEN_7991 1 +#define IEEE80211_EHT_MAC_CAP0_MAX_AMPDU_LEN_11454 2 + +/* EHT PHY capabilities as defined in P802.11be_D1.4 section 9.4.2.313.3 */ +#define IEEE80211_EHT_PHY_CAP0_320MHZ_IN_6GHZ 0x02 +#define IEEE80211_EHT_PHY_CAP0_242_TONE_RU_GT20MHZ 0x04 +#define IEEE80211_EHT_PHY_CAP0_NDP_4_EHT_LFT_32_GI 0x08 +#define IEEE80211_EHT_PHY_CAP0_PARTIAL_BW_UL_MU_MIMO 0x10 +#define IEEE80211_EHT_PHY_CAP0_SU_BEAMFORMER 0x20 +#define IEEE80211_EHT_PHY_CAP0_SU_BEAMFORMEE 0x40 + +/* EHT beamformee number of spatial streams <= 80MHz is split */ +#define IEEE80211_EHT_PHY_CAP0_BEAMFORMEE_SS_80MHZ_MASK 0x80 +#define IEEE80211_EHT_PHY_CAP1_BEAMFORMEE_SS_80MHZ_MASK 0x03 + +#define IEEE80211_EHT_PHY_CAP1_BEAMFORMEE_SS_160MHZ_MASK 0x1c +#define IEEE80211_EHT_PHY_CAP1_BEAMFORMEE_SS_320MHZ_MASK 0xe0 + +#define IEEE80211_EHT_PHY_CAP2_SOUNDING_DIM_80MHZ_MASK 0x07 +#define IEEE80211_EHT_PHY_CAP2_SOUNDING_DIM_160MHZ_MASK 0x38 + +/* EHT number of sounding dimensions for 320MHz is split */ +#define IEEE80211_EHT_PHY_CAP2_SOUNDING_DIM_320MHZ_MASK 0xc0 +#define IEEE80211_EHT_PHY_CAP3_SOUNDING_DIM_320MHZ_MASK 0x01 +#define IEEE80211_EHT_PHY_CAP3_NG_16_SU_FEEDBACK 0x02 +#define IEEE80211_EHT_PHY_CAP3_NG_16_MU_FEEDBACK 0x04 +#define IEEE80211_EHT_PHY_CAP3_CODEBOOK_4_2_SU_FDBK 0x08 +#define IEEE80211_EHT_PHY_CAP3_CODEBOOK_7_5_MU_FDBK 0x10 +#define IEEE80211_EHT_PHY_CAP3_TRIG_SU_BF_FDBK 0x20 +#define IEEE80211_EHT_PHY_CAP3_TRIG_MU_BF_PART_BW_FDBK 0x40 +#define IEEE80211_EHT_PHY_CAP3_TRIG_CQI_FDBK 0x80 + +#define IEEE80211_EHT_PHY_CAP4_PART_BW_DL_MU_MIMO 0x01 +#define IEEE80211_EHT_PHY_CAP4_PSR_SR_SUPP 0x02 +#define IEEE80211_EHT_PHY_CAP4_POWER_BOOST_FACT_SUPP 0x04 +#define IEEE80211_EHT_PHY_CAP4_EHT_MU_PPDU_4_EHT_LTF_08_GI 0x08 +#define IEEE80211_EHT_PHY_CAP4_MAX_NC_MASK 0xf0 + +#define IEEE80211_EHT_PHY_CAP5_NON_TRIG_CQI_FEEDBACK 0x01 +#define IEEE80211_EHT_PHY_CAP5_TX_LESS_242_TONE_RU_SUPP 0x02 +#define IEEE80211_EHT_PHY_CAP5_RX_LESS_242_TONE_RU_SUPP 0x04 +#define IEEE80211_EHT_PHY_CAP5_PPE_THRESHOLD_PRESENT 0x08 +#define IEEE80211_EHT_PHY_CAP5_COMMON_NOMINAL_PKT_PAD_MASK 0x30 +#define IEEE80211_EHT_PHY_CAP5_COMMON_NOMINAL_PKT_PAD_0US 0 +#define IEEE80211_EHT_PHY_CAP5_COMMON_NOMINAL_PKT_PAD_8US 1 +#define IEEE80211_EHT_PHY_CAP5_COMMON_NOMINAL_PKT_PAD_16US 2 +#define IEEE80211_EHT_PHY_CAP5_COMMON_NOMINAL_PKT_PAD_20US 3 + +/* Maximum number of supported EHT LTF is split */ +#define IEEE80211_EHT_PHY_CAP5_MAX_NUM_SUPP_EHT_LTF_MASK 0xc0 +#define IEEE80211_EHT_PHY_CAP6_MAX_NUM_SUPP_EHT_LTF_MASK 0x07 + +#define IEEE80211_EHT_PHY_CAP6_MCS15_SUPP_MASK 0x78 +#define IEEE80211_EHT_PHY_CAP6_EHT_DUP_6GHZ_SUPP 0x80 + +#define IEEE80211_EHT_PHY_CAP7_20MHZ_STA_RX_NDP_WIDER_BW 0x01 +#define IEEE80211_EHT_PHY_CAP7_NON_OFDMA_UL_MU_MIMO_80MHZ 0x02 +#define IEEE80211_EHT_PHY_CAP7_NON_OFDMA_UL_MU_MIMO_160MHZ 0x04 +#define IEEE80211_EHT_PHY_CAP7_NON_OFDMA_UL_MU_MIMO_320MHZ 0x08 +#define IEEE80211_EHT_PHY_CAP7_MU_BEAMFORMER_80MHZ 0x10 +#define IEEE80211_EHT_PHY_CAP7_MU_BEAMFORMER_160MHZ 0x20 +#define IEEE80211_EHT_PHY_CAP7_MU_BEAMFORMER_320MHZ 0x40 +#define IEEE80211_EHT_PHY_CAP7_TB_SOUNDING_FDBK_RATE_LIMIT 0x80 + +#define IEEE80211_EHT_PHY_CAP8_RX_1024QAM_WIDER_BW_DL_OFDMA 0x01 +#define IEEE80211_EHT_PHY_CAP8_RX_4096QAM_WIDER_BW_DL_OFDMA 0x02 + +/* + * EHT operation channel width as defined in P802.11be_D1.4 section 9.4.2.311 + */ +#define IEEE80211_EHT_OPER_CHAN_WIDTH 0x7 +#define IEEE80211_EHT_OPER_CHAN_WIDTH_20MHZ 0 +#define IEEE80211_EHT_OPER_CHAN_WIDTH_40MHZ 1 +#define IEEE80211_EHT_OPER_CHAN_WIDTH_80MHZ 2 +#define IEEE80211_EHT_OPER_CHAN_WIDTH_160MHZ 3 +#define IEEE80211_EHT_OPER_CHAN_WIDTH_320MHZ 4 + +/* Calculate 802.11be EHT capabilities IE Tx/Rx EHT MCS NSS Support Field size */ +static inline u8 +ieee80211_eht_mcs_nss_size(const struct ieee80211_he_cap_elem *he_cap, + const struct ieee80211_eht_cap_elem_fixed *eht_cap) +{ + u8 count = 0; + + /* on 2.4 GHz, if it supports 40 MHz, the result is 3 */ + if (he_cap->phy_cap_info[0] & + IEEE80211_HE_PHY_CAP0_CHANNEL_WIDTH_SET_40MHZ_IN_2G) + return 3; + + /* on 2.4 GHz, these three bits are reserved, so should be 0 */ + if (he_cap->phy_cap_info[0] & + IEEE80211_HE_PHY_CAP0_CHANNEL_WIDTH_SET_40MHZ_80MHZ_IN_5G) + count += 3; + + if (he_cap->phy_cap_info[0] & + IEEE80211_HE_PHY_CAP0_CHANNEL_WIDTH_SET_160MHZ_IN_5G) + count += 3; + + if (eht_cap->phy_cap_info[0] & IEEE80211_EHT_PHY_CAP0_320MHZ_IN_6GHZ) + count += 3; + + return count ? count : 4; +} + +/* 802.11be EHT PPE Thresholds */ +#define IEEE80211_EHT_PPE_THRES_NSS_POS 0 +#define IEEE80211_EHT_PPE_THRES_NSS_MASK 0xf +#define IEEE80211_EHT_PPE_THRES_RU_INDEX_BITMASK_MASK 0x1f0 +#define IEEE80211_EHT_PPE_THRES_INFO_PPET_SIZE 3 +#define IEEE80211_EHT_PPE_THRES_INFO_HEADER_SIZE 9 + +/* + * Calculate 802.11be EHT capabilities IE EHT field size + */ +static inline u8 +ieee80211_eht_ppe_size(u16 ppe_thres_hdr, const u8 *phy_cap_info) +{ + u32 n; + + if (!(phy_cap_info[5] & + IEEE80211_EHT_PHY_CAP5_PPE_THRESHOLD_PRESENT)) + return 0; + + n = hweight16(ppe_thres_hdr & + IEEE80211_EHT_PPE_THRES_RU_INDEX_BITMASK_MASK); + n *= 1 + u16_get_bits(ppe_thres_hdr, IEEE80211_EHT_PPE_THRES_NSS_MASK); + + /* + * Each pair is 6 bits, and we need to add the 9 "header" bits to the + * total size. + */ + n = n * IEEE80211_EHT_PPE_THRES_INFO_PPET_SIZE * 2 + + IEEE80211_EHT_PPE_THRES_INFO_HEADER_SIZE; + return DIV_ROUND_UP(n, 8); +} + +static inline bool +ieee80211_eht_capa_size_ok(const u8 *he_capa, const u8 *data, u8 len) +{ + const struct ieee80211_eht_cap_elem_fixed *elem = (const void *)data; + u8 needed = sizeof(struct ieee80211_eht_cap_elem_fixed); + + if (len < needed || !he_capa) + return false; + + needed += ieee80211_eht_mcs_nss_size((const void *)he_capa, + (const void *)data); + if (len < needed) + return false; + + if (elem->phy_cap_info[5] & + IEEE80211_EHT_PHY_CAP5_PPE_THRESHOLD_PRESENT) { + u16 ppe_thres_hdr; + + if (len < needed + sizeof(ppe_thres_hdr)) + return false; + + ppe_thres_hdr = get_unaligned_le16(data + needed); + needed += ieee80211_eht_ppe_size(ppe_thres_hdr, + elem->phy_cap_info); + } + + return len >= needed; +} + +static inline bool +ieee80211_eht_oper_size_ok(const u8 *data, u8 len) +{ + const struct ieee80211_eht_operation *elem = (const void *)data; + u8 needed = sizeof(*elem); + + if (len < needed) + return false; + + if (elem->present_bm & IEEE80211_EHT_OPER_DISABLED_SUBCHANNEL_BITMAP_PRESENT) + needed += 2; + + return len >= needed; +} #define LISTEN_INT_USF GENMASK(15, 14) #define LISTEN_INT_UI GENMASK(13, 0) @@ -3077,6 +3373,9 @@ enum ieee80211_eid_ext { WLAN_EID_EXT_SHORT_SSID_LIST = 58, WLAN_EID_EXT_HE_6GHZ_CAPA = 59, WLAN_EID_EXT_UL_MU_POWER_CAPA = 60, + WLAN_EID_EXT_EHT_OPERATION = 106, + WLAN_EID_EXT_EHT_MULTI_LINK = 107, + WLAN_EID_EXT_EHT_CAPABILITY = 108, }; /* Action category code */ -- cgit From 2a2c86f15e17c5013b9897b67d895e64a25ae3cb Mon Sep 17 00:00:00 2001 From: Mordechay Goodstein Date: Mon, 14 Feb 2022 17:29:52 +0100 Subject: ieee80211: add EHT 1K aggregation definitions We add the fields for parsing extended ADDBA request/respond, and new max 1K aggregation for limit ADDBA request/respond. Adjust drivers to use the proper macro, IEEE80211_MAX_AMPDU_BUF -> IEEE80211_MAX_AMPDU_BUF_HE. Signed-off-by: Mordechay Goodstein Link: https://lore.kernel.org/r/20220214173004.b8b447ce95b7.I0ee2554c94e89abc7a752b0f7cc7fd79c273efea@changeid Signed-off-by: Johannes Berg --- include/linux/ieee80211.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index 7204bc28ca31..a4143466cb32 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -1024,6 +1024,8 @@ struct ieee80211_tpc_report_ie { #define IEEE80211_ADDBA_EXT_FRAG_LEVEL_MASK GENMASK(2, 1) #define IEEE80211_ADDBA_EXT_FRAG_LEVEL_SHIFT 1 #define IEEE80211_ADDBA_EXT_NO_FRAG BIT(0) +#define IEEE80211_ADDBA_EXT_BUF_SIZE_MASK GENMASK(7, 5) +#define IEEE80211_ADDBA_EXT_BUF_SIZE_SHIFT 10 struct ieee80211_addba_ext_ie { u8 data; @@ -1698,10 +1700,12 @@ struct ieee80211_ht_operation { * A-MPDU buffer sizes * According to HT size varies from 8 to 64 frames * HE adds the ability to have up to 256 frames. + * EHT adds the ability to have up to 1K frames. */ #define IEEE80211_MIN_AMPDU_BUF 0x8 #define IEEE80211_MAX_AMPDU_BUF_HT 0x40 -#define IEEE80211_MAX_AMPDU_BUF 0x100 +#define IEEE80211_MAX_AMPDU_BUF_HE 0x100 +#define IEEE80211_MAX_AMPDU_BUF_EHT 0x400 /* Spatial Multiplexing Power Save Modes (for capability) */ -- cgit From e6df4ead85d9da1b07dd40bd4c6d2182f3e210c4 Mon Sep 17 00:00:00 2001 From: Zhaoyang Huang Date: Tue, 25 Jan 2022 14:56:58 +0800 Subject: psi: fix possible trigger missing in the window When a new threshold breaching stall happens after a psi event was generated and within the window duration, the new event is not generated because the events are rate-limited to one per window. If after that no new stall is recorded then the event will not be generated even after rate-limiting duration has passed. This is happening because with no new stall, window_update will not be called even though threshold was previously breached. To fix this, record threshold breaching occurrence and generate the event once window duration is passed. Suggested-by: Suren Baghdasaryan Signed-off-by: Zhaoyang Huang Signed-off-by: Peter Zijlstra (Intel) Acked-by: Johannes Weiner Acked-by: Suren Baghdasaryan Link: https://lore.kernel.org/r/1643093818-19835-1-git-send-email-huangzhaoyang@gmail.com --- include/linux/psi_types.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h index 516c0fe836fd..dc3ec5e4b9ee 100644 --- a/include/linux/psi_types.h +++ b/include/linux/psi_types.h @@ -144,6 +144,9 @@ struct psi_trigger { /* Refcounting to prevent premature destruction */ struct kref refcount; + + /* Deferred event(s) from previous ratelimit window */ + bool pending_event; }; struct psi_group { -- cgit From 04d4e665a60902cf36e7ad39af1179cb5df542ad Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Mon, 7 Feb 2022 16:59:06 +0100 Subject: sched/isolation: Use single feature type while referring to housekeeping cpumask Refer to housekeeping APIs using single feature types instead of flags. This prevents from passing multiple isolation features at once to housekeeping interfaces, which soon won't be possible anymore as each isolation features will have their own cpumask. Signed-off-by: Frederic Weisbecker Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Juri Lelli Reviewed-by: Phil Auld Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org --- include/linux/sched/isolation.h | 43 +++++++++++++++++++++-------------------- 1 file changed, 22 insertions(+), 21 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h index cc9f393e2a70..8c15abd67aed 100644 --- a/include/linux/sched/isolation.h +++ b/include/linux/sched/isolation.h @@ -5,54 +5,55 @@ #include #include -enum hk_flags { - HK_FLAG_TIMER = 1, - HK_FLAG_RCU = (1 << 1), - HK_FLAG_MISC = (1 << 2), - HK_FLAG_SCHED = (1 << 3), - HK_FLAG_TICK = (1 << 4), - HK_FLAG_DOMAIN = (1 << 5), - HK_FLAG_WQ = (1 << 6), - HK_FLAG_MANAGED_IRQ = (1 << 7), - HK_FLAG_KTHREAD = (1 << 8), +enum hk_type { + HK_TYPE_TIMER, + HK_TYPE_RCU, + HK_TYPE_MISC, + HK_TYPE_SCHED, + HK_TYPE_TICK, + HK_TYPE_DOMAIN, + HK_TYPE_WQ, + HK_TYPE_MANAGED_IRQ, + HK_TYPE_KTHREAD, + HK_TYPE_MAX }; #ifdef CONFIG_CPU_ISOLATION DECLARE_STATIC_KEY_FALSE(housekeeping_overridden); -extern int housekeeping_any_cpu(enum hk_flags flags); -extern const struct cpumask *housekeeping_cpumask(enum hk_flags flags); -extern bool housekeeping_enabled(enum hk_flags flags); -extern void housekeeping_affine(struct task_struct *t, enum hk_flags flags); -extern bool housekeeping_test_cpu(int cpu, enum hk_flags flags); +extern int housekeeping_any_cpu(enum hk_type type); +extern const struct cpumask *housekeeping_cpumask(enum hk_type type); +extern bool housekeeping_enabled(enum hk_type type); +extern void housekeeping_affine(struct task_struct *t, enum hk_type type); +extern bool housekeeping_test_cpu(int cpu, enum hk_type type); extern void __init housekeeping_init(void); #else -static inline int housekeeping_any_cpu(enum hk_flags flags) +static inline int housekeeping_any_cpu(enum hk_type type) { return smp_processor_id(); } -static inline const struct cpumask *housekeeping_cpumask(enum hk_flags flags) +static inline const struct cpumask *housekeeping_cpumask(enum hk_type type) { return cpu_possible_mask; } -static inline bool housekeeping_enabled(enum hk_flags flags) +static inline bool housekeeping_enabled(enum hk_type type) { return false; } static inline void housekeeping_affine(struct task_struct *t, - enum hk_flags flags) { } + enum hk_type type) { } static inline void housekeeping_init(void) { } #endif /* CONFIG_CPU_ISOLATION */ -static inline bool housekeeping_cpu(int cpu, enum hk_flags flags) +static inline bool housekeeping_cpu(int cpu, enum hk_type type) { #ifdef CONFIG_CPU_ISOLATION if (static_branch_unlikely(&housekeeping_overridden)) - return housekeeping_test_cpu(cpu, flags); + return housekeeping_test_cpu(cpu, type); #endif return true; } -- cgit From fe65deb56e552a8c9bf7f27860dbdeac12a36116 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Mon, 14 Feb 2022 01:57:16 +0900 Subject: jump_label: Avoid unneeded casts in STATIC_KEY_INIT_{TRUE,FALSE} Commit 3821fd35b58d ("jump_label: Reduce the size of struct static_key") introduced the union to struct static_key. It is more natual to set JUMP_TYPE_* to the .type field without casting. Signed-off-by: Masahiro Yamada Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20220213165717.2354046-1-masahiroy@kernel.org --- include/linux/jump_label.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index 48b9b2a82767..6924e6837e6d 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -251,10 +251,10 @@ extern void static_key_disable_cpuslocked(struct static_key *key); */ #define STATIC_KEY_INIT_TRUE \ { .enabled = { 1 }, \ - { .entries = (void *)JUMP_TYPE_TRUE } } + { .type = JUMP_TYPE_TRUE } } #define STATIC_KEY_INIT_FALSE \ { .enabled = { 0 }, \ - { .entries = (void *)JUMP_TYPE_FALSE } } + { .type = JUMP_TYPE_FALSE } } #else /* !CONFIG_JUMP_LABEL */ -- cgit From cd27ccfc727e99352321c0c75012ab9c5a90321e Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Mon, 14 Feb 2022 01:57:17 +0900 Subject: jump_label: Refactor #ifdef of struct static_key Move #ifdef CONFIG_JUMP_LABEL inside the struct static_key. Signed-off-by: Masahiro Yamada Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20220213165717.2354046-2-masahiroy@kernel.org --- include/linux/jump_label.h | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index 6924e6837e6d..107751cc047b 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -82,10 +82,9 @@ extern bool static_key_initialized; "%s(): static key '%pS' used before call to jump_label_init()", \ __func__, (key)) -#ifdef CONFIG_JUMP_LABEL - struct static_key { atomic_t enabled; +#ifdef CONFIG_JUMP_LABEL /* * Note: * To make anonymous unions work with old compilers, the static @@ -104,13 +103,9 @@ struct static_key { struct jump_entry *entries; struct static_key_mod *next; }; +#endif /* CONFIG_JUMP_LABEL */ }; -#else -struct static_key { - atomic_t enabled; -}; -#endif /* CONFIG_JUMP_LABEL */ #endif /* __ASSEMBLY__ */ #ifdef CONFIG_JUMP_LABEL -- cgit From 8c30e2d81bfddc5ab9f6b04db1c0f7d6ca7bdf46 Mon Sep 17 00:00:00 2001 From: Thomas Zimmermann Date: Fri, 11 Feb 2022 10:46:40 +0100 Subject: fbdev: Don't sort deferred-I/O pages by default Fbdev's deferred I/O sorts all dirty pages by default, which incurs a significant overhead. Make the sorting step optional and update the few drivers that require it. Use a FIFO list by default. Most fbdev drivers with deferred I/O build a bounding rectangle around the dirty pages or simply flush the whole screen. The only two affected DRM drivers, generic fbdev and vmwgfx, both use a bounding rectangle. In those cases, the exact order of the pages doesn't matter. The other drivers look at the page index or handle pages one-by-one. The patch sets the sort_pagelist flag for those, even though some of them would probably work correctly without sorting. Driver maintainers should update their driver accordingly. Sorting pages by memory offset for deferred I/O performs an implicit bubble-sort step on the list of dirty pages. The algorithm goes through the list of dirty pages and inserts each new page according to its index field. Even worse, list traversal always starts at the first entry. As video memory is most likely updated scanline by scanline, the algorithm traverses through the complete list for each updated page. For example, with 1024x768x32bpp each page covers exactly one scanline. Writing a single screen update from top to bottom requires updating 768 pages. With an average list length of 384 entries, a screen update creates (768 * 384 =) 294912 compare operation. Fix this by making the sorting step opt-in and update the few drivers that require it. All other drivers work with unsorted page lists. Pages are appended to the list. Therefore, in the common case of writing the framebuffer top to bottom, pages are still sorted by offset, which may have a positive effect on performance. Playing a video [1] in mplayer's benchmark mode shows the difference (i7-4790, FullHD, simpledrm, kernel with debugging). mplayer -benchmark -nosound -vo fbdev ./big_buck_bunny_720p_stereo.ogg With sorted page lists: BENCHMARKs: VC: 32.960s VO: 73.068s A: 0.000s Sys: 2.413s = 108.441s BENCHMARK%: VC: 30.3947% VO: 67.3802% A: 0.0000% Sys: 2.2251% = 100.0000% With unsorted page lists: BENCHMARKs: VC: 31.005s VO: 42.889s A: 0.000s Sys: 2.256s = 76.150s BENCHMARK%: VC: 40.7156% VO: 56.3219% A: 0.0000% Sys: 2.9625% = 100.0000% VC shows the overhead of video decoding, VO shows the overhead of the video output. Using unsorted page lists reduces the benchmark's run time by ~32s/~25%. v2: * Make sorted pagelists the special case (Sam) * Comment on drivers' use of pagelist (Sam) * Warn about the overhead in comment Signed-off-by: Thomas Zimmermann Acked-by: Sam Ravnborg Acked-by: Andy Shevchenko Link: https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_720p_stereo.ogg # [1] Link: https://patchwork.freedesktop.org/patch/msgid/20220211094640.21632-3-tzimmermann@suse.de --- include/linux/fb.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fb.h b/include/linux/fb.h index 9a14f3f8a329..39baa9a70779 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -204,6 +204,7 @@ struct fb_pixmap { struct fb_deferred_io { /* delay between mkwrite and deferred handler */ unsigned long delay; + bool sort_pagelist; /* sort pagelist by offset */ struct mutex lock; /* mutex that protects the page list */ struct list_head pagelist; /* list of touched pages */ /* callback */ -- cgit From e1be43d9b5d0d1310dbd90185a8e5c7145dde40f Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Sat, 18 Sep 2021 15:17:53 -0700 Subject: overflow: Implement size_t saturating arithmetic helpers In order to perform more open-coded replacements of common allocation size arithmetic, the kernel needs saturating (SIZE_MAX) helpers for multiplication, addition, and subtraction. For example, it is common in allocators, especially on realloc, to add to an existing size: p = krealloc(map->patch, sizeof(struct reg_sequence) * (map->patch_regs + num_regs), GFP_KERNEL); There is no existing saturating replacement for this calculation, and just leaving the addition open coded inside array_size() could potentially overflow as well. For example, an overflow in an expression for a size_t argument might wrap to zero: array_size(anything, something_at_size_max + 1) == 0 Introduce size_mul(), size_add(), and size_sub() helpers that implicitly promote arguments to size_t and saturated calculations for use in allocations. With these helpers it is also possible to redefine array_size(), array3_size(), flex_array_size(), and struct_size() in terms of the new helpers. As with the check_*_overflow() helpers, the new helpers use __must_check, though what is really desired is a way to make sure that assignment is only to a size_t lvalue. Without this, it's still possible to introduce overflow/underflow via type conversion (i.e. from size_t to int). Enforcing this will currently need to be left to static analysis or future use of -Wconversion. Additionally update the overflow unit tests to force runtime evaluation for the pathological cases. Cc: Rasmus Villemoes Cc: Gustavo A. R. Silva Cc: Nathan Chancellor Cc: Jason Gunthorpe Cc: Nick Desaulniers Cc: Leon Romanovsky Cc: Keith Busch Cc: Len Baker Signed-off-by: Kees Cook --- include/linux/overflow.h | 110 +++++++++++++++++++++++++++++------------------ 1 file changed, 69 insertions(+), 41 deletions(-) (limited to 'include/linux') diff --git a/include/linux/overflow.h b/include/linux/overflow.h index 4669632bd72b..59d7228104d0 100644 --- a/include/linux/overflow.h +++ b/include/linux/overflow.h @@ -118,81 +118,94 @@ static inline bool __must_check __must_check_overflow(bool overflow) })) /** - * array_size() - Calculate size of 2-dimensional array. - * - * @a: dimension one - * @b: dimension two + * size_mul() - Calculate size_t multiplication with saturation at SIZE_MAX * - * Calculates size of 2-dimensional array: @a * @b. + * @factor1: first factor + * @factor2: second factor * - * Returns: number of bytes needed to represent the array or SIZE_MAX on - * overflow. + * Returns: calculate @factor1 * @factor2, both promoted to size_t, + * with any overflow causing the return value to be SIZE_MAX. The + * lvalue must be size_t to avoid implicit type conversion. */ -static inline __must_check size_t array_size(size_t a, size_t b) +static inline size_t __must_check size_mul(size_t factor1, size_t factor2) { size_t bytes; - if (check_mul_overflow(a, b, &bytes)) + if (check_mul_overflow(factor1, factor2, &bytes)) return SIZE_MAX; return bytes; } /** - * array3_size() - Calculate size of 3-dimensional array. + * size_add() - Calculate size_t addition with saturation at SIZE_MAX * - * @a: dimension one - * @b: dimension two - * @c: dimension three - * - * Calculates size of 3-dimensional array: @a * @b * @c. + * @addend1: first addend + * @addend2: second addend * - * Returns: number of bytes needed to represent the array or SIZE_MAX on - * overflow. + * Returns: calculate @addend1 + @addend2, both promoted to size_t, + * with any overflow causing the return value to be SIZE_MAX. The + * lvalue must be size_t to avoid implicit type conversion. */ -static inline __must_check size_t array3_size(size_t a, size_t b, size_t c) +static inline size_t __must_check size_add(size_t addend1, size_t addend2) { size_t bytes; - if (check_mul_overflow(a, b, &bytes)) - return SIZE_MAX; - if (check_mul_overflow(bytes, c, &bytes)) + if (check_add_overflow(addend1, addend2, &bytes)) return SIZE_MAX; return bytes; } -/* - * Compute a*b+c, returning SIZE_MAX on overflow. Internal helper for - * struct_size() below. +/** + * size_sub() - Calculate size_t subtraction with saturation at SIZE_MAX + * + * @minuend: value to subtract from + * @subtrahend: value to subtract from @minuend + * + * Returns: calculate @minuend - @subtrahend, both promoted to size_t, + * with any overflow causing the return value to be SIZE_MAX. For + * composition with the size_add() and size_mul() helpers, neither + * argument may be SIZE_MAX (or the result with be forced to SIZE_MAX). + * The lvalue must be size_t to avoid implicit type conversion. */ -static inline __must_check size_t __ab_c_size(size_t a, size_t b, size_t c) +static inline size_t __must_check size_sub(size_t minuend, size_t subtrahend) { size_t bytes; - if (check_mul_overflow(a, b, &bytes)) - return SIZE_MAX; - if (check_add_overflow(bytes, c, &bytes)) + if (minuend == SIZE_MAX || subtrahend == SIZE_MAX || + check_sub_overflow(minuend, subtrahend, &bytes)) return SIZE_MAX; return bytes; } /** - * struct_size() - Calculate size of structure with trailing array. - * @p: Pointer to the structure. - * @member: Name of the array member. - * @count: Number of elements in the array. + * array_size() - Calculate size of 2-dimensional array. * - * Calculates size of memory needed for structure @p followed by an - * array of @count number of @member elements. + * @a: dimension one + * @b: dimension two * - * Return: number of bytes needed or SIZE_MAX on overflow. + * Calculates size of 2-dimensional array: @a * @b. + * + * Returns: number of bytes needed to represent the array or SIZE_MAX on + * overflow. */ -#define struct_size(p, member, count) \ - __ab_c_size(count, \ - sizeof(*(p)->member) + __must_be_array((p)->member),\ - sizeof(*(p))) +#define array_size(a, b) size_mul(a, b) + +/** + * array3_size() - Calculate size of 3-dimensional array. + * + * @a: dimension one + * @b: dimension two + * @c: dimension three + * + * Calculates size of 3-dimensional array: @a * @b * @c. + * + * Returns: number of bytes needed to represent the array or SIZE_MAX on + * overflow. + */ +#define array3_size(a, b, c) size_mul(size_mul(a, b), c) /** * flex_array_size() - Calculate size of a flexible array member @@ -208,7 +221,22 @@ static inline __must_check size_t __ab_c_size(size_t a, size_t b, size_t c) * Return: number of bytes needed or SIZE_MAX on overflow. */ #define flex_array_size(p, member, count) \ - array_size(count, \ - sizeof(*(p)->member) + __must_be_array((p)->member)) + size_mul(count, \ + sizeof(*(p)->member) + __must_be_array((p)->member)) + +/** + * struct_size() - Calculate size of structure with trailing flexible array. + * + * @p: Pointer to the structure. + * @member: Name of the array member. + * @count: Number of elements in the array. + * + * Calculates size of memory needed for structure @p followed by an + * array of @count number of @member elements. + * + * Return: number of bytes needed or SIZE_MAX on overflow. + */ +#define struct_size(p, member, count) \ + size_add(sizeof(*(p)), flex_array_size(p, member, count)) #endif /* __LINUX_OVERFLOW_H */ -- cgit From 230f6fa2c1db6a3f3e668cfe95995ac8e6eee212 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 9 Feb 2022 16:40:41 -0800 Subject: overflow: Provide constant expression struct_size There have been cases where struct_size() (or flex_array_size()) needs to be calculated for an initializer, which requires it be a constant expression. This is possible when the "count" argument is a constant expression, so provide this ability for the helpers. Cc: Gustavo A. R. Silva Cc: Nathan Chancellor Cc: Nick Desaulniers Cc: Rasmus Villemoes Signed-off-by: Kees Cook Reviewed-by: Gustavo A. R. Silva Tested-by: Gustavo A. R. Silva Link: https://lore.kernel.org/lkml/20220210010407.GA701603@embeddedor --- include/linux/overflow.h | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/overflow.h b/include/linux/overflow.h index 59d7228104d0..f1221d11f8e5 100644 --- a/include/linux/overflow.h +++ b/include/linux/overflow.h @@ -4,6 +4,7 @@ #include #include +#include /* * We need to compute the minimum and maximum values representable in a given @@ -221,8 +222,9 @@ static inline size_t __must_check size_sub(size_t minuend, size_t subtrahend) * Return: number of bytes needed or SIZE_MAX on overflow. */ #define flex_array_size(p, member, count) \ - size_mul(count, \ - sizeof(*(p)->member) + __must_be_array((p)->member)) + __builtin_choose_expr(__is_constexpr(count), \ + (count) * sizeof(*(p)->member) + __must_be_array((p)->member), \ + size_mul(count, sizeof(*(p)->member) + __must_be_array((p)->member))) /** * struct_size() - Calculate size of structure with trailing flexible array. @@ -237,6 +239,8 @@ static inline size_t __must_check size_sub(size_t minuend, size_t subtrahend) * Return: number of bytes needed or SIZE_MAX on overflow. */ #define struct_size(p, member, count) \ - size_add(sizeof(*(p)), flex_array_size(p, member, count)) + __builtin_choose_expr(__is_constexpr(count), \ + sizeof(*(p)) + flex_array_size(p, member, count), \ + size_add(sizeof(*(p)), flex_array_size(p, member, count))) #endif /* __LINUX_OVERFLOW_H */ -- cgit From 28db4711bf48303814dcfd8d41a41106e90bc374 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 15 Feb 2022 11:05:38 +0100 Subject: blk-mq: remove the request_queue argument to blk_insert_cloned_request The request must be submitted to the queue it was allocated for, so remove the extra request_queue argument. Signed-off-by: Christoph Hellwig Reviewed-by: Mike Snitzer Link: https://lore.kernel.org/r/20220215100540.3892965-4-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index d319ffa59354..3a41d50b85d3 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -952,8 +952,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src, struct bio_set *bs, gfp_t gfp_mask, int (*bio_ctr)(struct bio *, struct bio *, void *), void *data); void blk_rq_unprep_clone(struct request *rq); -blk_status_t blk_insert_cloned_request(struct request_queue *q, - struct request *rq); +blk_status_t blk_insert_cloned_request(struct request *rq); struct rq_map_data { struct page **pages; -- cgit From 76792055c4c8b2472ca1ae48e0ddaf8497529f08 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 15 Feb 2022 10:45:10 +0100 Subject: block: add a ->free_disk method Add a method to notify the driver that the gendisk is about to be freed. This allows drivers to tie the lifetime of their private data to that of the gendisk and thus deal with device removal races without expensive synchronization and boilerplate code. A new flag is added so that ->free_disk is only called after a successful call to add_disk, which significantly simplifies the error handling path during probing. Signed-off-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220215094514.3828912-2-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 3bfc75a2a450..f757f9c2871f 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -146,6 +146,7 @@ struct gendisk { #define GD_READ_ONLY 1 #define GD_DEAD 2 #define GD_NATIVE_CAPACITY 3 +#define GD_ADDED 4 struct mutex open_mutex; /* open/close mutex */ unsigned open_partitions; /* number of open partitions */ @@ -1464,6 +1465,7 @@ struct block_device_operations { void (*unlock_native_capacity) (struct gendisk *); int (*getgeo)(struct block_device *, struct hd_geometry *); int (*set_read_only)(struct block_device *bdev, bool ro); + void (*free_disk)(struct gendisk *disk); /* this callback is with swap_lock and sometimes page table lock held */ void (*swap_slot_free_notify) (struct block_device *, unsigned long); int (*report_zones)(struct gendisk *, sector_t sector, -- cgit From 5224f79096170bf7b92cc8fe42a12f44b91e5f62 Mon Sep 17 00:00:00 2001 From: "Gustavo A. R. Silva" Date: Mon, 14 Feb 2022 19:11:44 -0600 Subject: treewide: Replace zero-length arrays with flexible-array members MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. This code was transformed with the help of Coccinelle: (next-20220214$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch) @@ identifier S, member, array; type T1, T2; @@ struct S { ... T1 member; T2 array[ - 0 ]; }; UAPI and wireless changes were intentionally excluded from this patch and will be sent out separately. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/78 Reviewed-by: Kees Cook Signed-off-by: Gustavo A. R. Silva --- include/linux/greybus/greybus_manifest.h | 4 ++-- include/linux/greybus/hd.h | 2 +- include/linux/greybus/module.h | 2 +- include/linux/i3c/ccc.h | 6 +++--- include/linux/platform_data/brcmfmac.h | 2 +- include/linux/platform_data/cros_ec_commands.h | 2 +- 6 files changed, 9 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/greybus/greybus_manifest.h b/include/linux/greybus/greybus_manifest.h index 6e62fe478712..bef9eb2093e9 100644 --- a/include/linux/greybus/greybus_manifest.h +++ b/include/linux/greybus/greybus_manifest.h @@ -100,7 +100,7 @@ enum { struct greybus_descriptor_string { __u8 length; __u8 id; - __u8 string[0]; + __u8 string[]; } __packed; /* @@ -175,7 +175,7 @@ struct greybus_manifest_header { struct greybus_manifest { struct greybus_manifest_header header; - struct greybus_descriptor descriptors[0]; + struct greybus_descriptor descriptors[]; } __packed; #endif /* __GREYBUS_MANIFEST_H */ diff --git a/include/linux/greybus/hd.h b/include/linux/greybus/hd.h index d3faf0c1a569..718e2857054e 100644 --- a/include/linux/greybus/hd.h +++ b/include/linux/greybus/hd.h @@ -58,7 +58,7 @@ struct gb_host_device { struct gb_svc *svc; /* Private data for the host driver */ - unsigned long hd_priv[0] __aligned(sizeof(s64)); + unsigned long hd_priv[] __aligned(sizeof(s64)); }; #define to_gb_host_device(d) container_of(d, struct gb_host_device, dev) diff --git a/include/linux/greybus/module.h b/include/linux/greybus/module.h index 47b839af145d..3efe2133acfd 100644 --- a/include/linux/greybus/module.h +++ b/include/linux/greybus/module.h @@ -23,7 +23,7 @@ struct gb_module { bool disconnected; - struct gb_interface *interfaces[0]; + struct gb_interface *interfaces[]; }; #define to_gb_module(d) container_of(d, struct gb_module, dev) diff --git a/include/linux/i3c/ccc.h b/include/linux/i3c/ccc.h index 73b0982cc519..ad59a4ae60d1 100644 --- a/include/linux/i3c/ccc.h +++ b/include/linux/i3c/ccc.h @@ -132,7 +132,7 @@ struct i3c_ccc_dev_desc { struct i3c_ccc_defslvs { u8 count; struct i3c_ccc_dev_desc master; - struct i3c_ccc_dev_desc slaves[0]; + struct i3c_ccc_dev_desc slaves[]; } __packed; /** @@ -240,7 +240,7 @@ struct i3c_ccc_bridged_slave_desc { */ struct i3c_ccc_setbrgtgt { u8 count; - struct i3c_ccc_bridged_slave_desc bslaves[0]; + struct i3c_ccc_bridged_slave_desc bslaves[]; } __packed; /** @@ -318,7 +318,7 @@ enum i3c_ccc_setxtime_subcmd { */ struct i3c_ccc_setxtime { u8 subcmd; - u8 data[0]; + u8 data[]; } __packed; #define I3C_CCC_GETXTIME_SYNC_MODE BIT(0) diff --git a/include/linux/platform_data/brcmfmac.h b/include/linux/platform_data/brcmfmac.h index 2b5676ff35be..f922a192fe58 100644 --- a/include/linux/platform_data/brcmfmac.h +++ b/include/linux/platform_data/brcmfmac.h @@ -178,7 +178,7 @@ struct brcmfmac_platform_data { void (*power_off)(void); char *fw_alternative_path; int device_count; - struct brcmfmac_pd_device devices[0]; + struct brcmfmac_pd_device devices[]; }; diff --git a/include/linux/platform_data/cros_ec_commands.h b/include/linux/platform_data/cros_ec_commands.h index 271bd87bff0a..728735aed980 100644 --- a/include/linux/platform_data/cros_ec_commands.h +++ b/include/linux/platform_data/cros_ec_commands.h @@ -5644,7 +5644,7 @@ struct ec_response_typec_discovery { uint8_t svid_count; /* Number of SVIDs partner sent */ uint16_t reserved; uint32_t discovery_vdo[6]; /* Max VDOs allowed after VDM header is 6 */ - struct svid_mode_info svids[0]; + struct svid_mode_info svids[]; } __ec_align1; /* USB Type-C commands for AP-controlled device policy. */ -- cgit From 7a5428dcb7902700b830e912feee4e845df7c019 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 17 Feb 2022 08:52:31 +0100 Subject: block: fix surprise removal for drivers calling blk_set_queue_dying MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Various block drivers call blk_set_queue_dying to mark a disk as dead due to surprise removal events, but since commit 8e141f9eb803 that doesn't work given that the GD_DEAD flag needs to be set to stop I/O. Replace the driver calls to blk_set_queue_dying with a new (and properly documented) blk_mark_disk_dead API, and fold blk_set_queue_dying into the only remaining caller. Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk") Reported-by: Markus Blöchl Signed-off-by: Christoph Hellwig Reviewed-by: Sagi Grimberg Link: https://lore.kernel.org/r/20220217075231.1140-1-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f35aea98bc35..16b47035e4b0 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -748,7 +748,8 @@ extern bool blk_queue_can_use_dma_map_merging(struct request_queue *q, bool __must_check blk_get_queue(struct request_queue *); extern void blk_put_queue(struct request_queue *); -extern void blk_set_queue_dying(struct request_queue *); + +void blk_mark_disk_dead(struct gendisk *disk); #ifdef CONFIG_BLOCK /* -- cgit From 2e7dfb0e9cacad0f1adbc4b97f0b96ba35027f24 Mon Sep 17 00:00:00 2001 From: Samuel Holland Date: Sun, 13 Feb 2022 23:01:16 -0600 Subject: usb: typec: Factor out non-PD fwnode properties Basic programmable non-PD Type-C port controllers do not need the full TCPM library, but they share the same devicetree binding and the same typec_capability structure. Factor out a helper for parsing those properties which map to fields in struct typec_capability, so the code can be shared between TCPM and basic non-TCPM drivers. Reviewed-by: Heikki Krogerus Signed-off-by: Samuel Holland Link: https://lore.kernel.org/r/20220214050118.61015-4-samuel@sholland.org Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/typec.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/usb/typec.h b/include/linux/usb/typec.h index 7ba45a97eeae..fdf737d48b3b 100644 --- a/include/linux/usb/typec.h +++ b/include/linux/usb/typec.h @@ -295,6 +295,9 @@ int typec_set_mode(struct typec_port *port, int mode); void *typec_get_drvdata(struct typec_port *port); +int typec_get_fw_cap(struct typec_capability *cap, + struct fwnode_handle *fwnode); + int typec_find_pwr_opmode(const char *name); int typec_find_orientation(const char *name); int typec_find_port_power_role(const char *name); -- cgit From ebcbc6ea7d8a604ad8504dae70a6ac1b1e64a0b7 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Mon, 14 Feb 2022 18:20:24 -0800 Subject: mm/munlock: delete page_mlock() and all its works We have recommended some applications to mlock their userspace, but that turns out to be counter-productive: when many processes mlock the same file, contention on rmap's i_mmap_rwsem can become intolerable at exit: it is needed for write, to remove any vma mapping that file from rmap's tree; but hogged for read by those with mlocks calling page_mlock() (formerly known as try_to_munlock()) on *each* page mapped from the file (the purpose being to find out whether another process has the page mlocked, so therefore it should not be unmlocked yet). Several optimizations have been made in the past: one is to skip page_mlock() when mapcount tells that nothing else has this page mapped; but that doesn't help at all when others do have it mapped. This time around, I initially intended to add a preliminary search of the rmap tree for overlapping VM_LOCKED ranges; but that gets messy with locking order, when in doubt whether a page is actually present; and risks adding even more contention on the i_mmap_rwsem. A solution would be much easier, if only there were space in struct page for an mlock_count... but actually, most of the time, there is space for it - an mlocked page spends most of its life on an unevictable LRU, but since 3.18 removed the scan_unevictable_pages sysctl, that "LRU" has been redundant. Let's try to reuse its page->lru. But leave that until a later patch: in this patch, clear the ground by removing page_mlock(), and all the infrastructure that has gathered around it - which mostly hinders understanding, and will make reviewing new additions harder. Don't mind those old comments about THPs, they date from before 4.5's refcounting rework: splitting is not a risk here. Just keep a minimal version of munlock_vma_page(), as reminder of what it should attend to (in particular, the odd way PGSTRANDED is counted out of PGMUNLOCKED), and likewise a stub for munlock_vma_pages_range(). Move unchanged __mlock_posix_error_return() out of the way, down to above its caller: this series then makes no further change after mlock_fixup(). After this and each following commit, the kernel builds, boots and runs; but with deficiencies which may show up in testing of mlock and munlock. The system calls succeed or fail as before, and mlock remains effective in preventing page reclaim; but meminfo's Unevictable and Mlocked amounts may be shown too low after mlock, grow, then stay too high after munlock: with previously mlocked pages remaining unevictable for too long, until finally unmapped and freed and counts corrected. Normal service will be resumed in "mm/munlock: mlock_pte_range() when mlocking or munlocking". Signed-off-by: Hugh Dickins Acked-by: Vlastimil Babka Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/rmap.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index e704b1a4c06c..dc48aa8c2c94 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -237,12 +237,6 @@ unsigned long page_address_in_vma(struct page *, struct vm_area_struct *); */ int folio_mkclean(struct folio *); -/* - * called in munlock()/munmap() path to check for other vmas holding - * the page mlocked. - */ -void page_mlock(struct page *page); - void remove_migration_ptes(struct page *old, struct page *new, bool locked); /* -- cgit From b67bf49ce7aae72f63739abee6ac25f64bf20081 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Mon, 14 Feb 2022 18:21:52 -0800 Subject: mm/munlock: delete FOLL_MLOCK and FOLL_POPULATE If counting page mlocks, we must not double-count: follow_page_pte() can tell if a page has already been Mlocked or not, but cannot tell if a pte has already been counted or not: that will have to be done when the pte is mapped in (which lru_cache_add_inactive_or_unevictable() already tracks for new anon pages, but there's no such tracking yet for others). Delete all the FOLL_MLOCK code - faulting in the missing pages will do all that is necessary, without special mlock_vma_page() calls from here. But then FOLL_POPULATE turns out to serve no purpose - it was there so that its absence would tell faultin_page() not to faultin page when setting up VM_LOCKONFAULT areas; but if there's no special work needed here for mlock, then there's no work at all here for VM_LOCKONFAULT. Have I got that right? I've not looked into the history, but see that FOLL_POPULATE goes back before VM_LOCKONFAULT: did it serve a different purpose before? Ah, yes, it was used to skip the old stack guard page. And is it intentional that COW is not broken on existing pages when setting up a VM_LOCKONFAULT area? I can see that being argued either way, and have no reason to disagree with current behaviour. Signed-off-by: Hugh Dickins Acked-by: Vlastimil Babka Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/mm.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 213cc569b192..74ee50c2033b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2925,13 +2925,11 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, #define FOLL_FORCE 0x10 /* get_user_pages read/write w/o permission */ #define FOLL_NOWAIT 0x20 /* if a disk transfer is needed, start the IO * and return without waiting upon it */ -#define FOLL_POPULATE 0x40 /* fault in pages (with FOLL_MLOCK) */ #define FOLL_NOFAULT 0x80 /* do not fault in pages */ #define FOLL_HWPOISON 0x100 /* check page is hwpoisoned */ #define FOLL_NUMA 0x200 /* force NUMA hinting page fault */ #define FOLL_MIGRATION 0x400 /* wait for page to replace migration entry */ #define FOLL_TRIED 0x800 /* a retry, previous pass started an IO */ -#define FOLL_MLOCK 0x1000 /* lock present pages */ #define FOLL_REMOTE 0x2000 /* we are working on non-current tsk/mm */ #define FOLL_COW 0x4000 /* internal GUP flag */ #define FOLL_ANON 0x8000 /* don't do file mappings */ -- cgit From cea86fe246b694a191804b47378eb9d77aefabec Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Mon, 14 Feb 2022 18:26:39 -0800 Subject: mm/munlock: rmap call mlock_vma_page() munlock_vma_page() Add vma argument to mlock_vma_page() and munlock_vma_page(), make them inline functions which check (vma->vm_flags & VM_LOCKED) before calling mlock_page() and munlock_page() in mm/mlock.c. Add bool compound to mlock_vma_page() and munlock_vma_page(): this is because we have understandable difficulty in accounting pte maps of THPs, and if passed a PageHead page, mlock_page() and munlock_page() cannot tell whether it's a pmd map to be counted or a pte map to be ignored. Add vma arg to page_add_file_rmap() and page_remove_rmap(), like the others, and use that to call mlock_vma_page() at the end of the page adds, and munlock_vma_page() at the end of page_remove_rmap() (end or beginning? unimportant, but end was easier for assertions in testing). No page lock is required (although almost all adds happen to hold it): delete the "Serialize with page migration" BUG_ON(!PageLocked(page))s. Certainly page lock did serialize with page migration, but I'm having difficulty explaining why that was ever important. Mlock accounting on THPs has been hard to define, differed between anon and file, involved PageDoubleMap in some places and not others, required clear_page_mlock() at some points. Keep it simple now: just count the pmds and ignore the ptes, there is no reason for ptes to undo pmd mlocks. page_add_new_anon_rmap() callers unchanged: they have long been calling lru_cache_add_inactive_or_unevictable(), which does its own VM_LOCKED handling (it also checks for not VM_SPECIAL: I think that's overcautious, and inconsistent with other checks, that mmap_region() already prevents VM_LOCKED on VM_SPECIAL; but haven't quite convinced myself to change it). Signed-off-by: Hugh Dickins Acked-by: Vlastimil Babka Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/rmap.h | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index dc48aa8c2c94..ac29b076082b 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -167,18 +167,19 @@ struct anon_vma *page_get_anon_vma(struct page *page); */ void page_move_anon_rmap(struct page *, struct vm_area_struct *); void page_add_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long, bool); + unsigned long address, bool compound); void do_page_add_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long, int); + unsigned long address, int flags); void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long, bool); -void page_add_file_rmap(struct page *, bool); -void page_remove_rmap(struct page *, bool); - + unsigned long address, bool compound); +void page_add_file_rmap(struct page *, struct vm_area_struct *, + bool compound); +void page_remove_rmap(struct page *, struct vm_area_struct *, + bool compound); void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long); + unsigned long address); void hugepage_add_new_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long); + unsigned long address); static inline void page_dup_rmap(struct page *page, bool compound) { -- cgit From 07ca760673088f262da57ff42c15558688565aa2 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Mon, 14 Feb 2022 18:29:54 -0800 Subject: mm/munlock: maintain page->mlock_count while unevictable Previous patches have been preparatory: now implement page->mlock_count. The ordering of the "Unevictable LRU" is of no significance, and there is no point holding unevictable pages on a list: place page->mlock_count to overlay page->lru.prev (since page->lru.next is overlaid by compound_head, which needs to be even so as not to satisfy PageTail - though 2 could be added instead of 1 for each mlock, if that's ever an improvement). But it's only safe to rely on or modify page->mlock_count while lruvec lock is held and page is on unevictable "LRU" - we can save lots of edits by continuing to pretend that there's an imaginary LRU here (there is an unevictable count which still needs to be maintained, but not a list). The mlock_count technique suffers from an unreliability much like with page_mlock(): while someone else has the page off LRU, not much can be done. As before, err on the safe side (behave as if mlock_count 0), and let try_to_unlock_one() move the page to unevictable if reclaim finds out later on - a few misplaced pages don't matter, what we want to avoid is imbalancing reclaim by flooding evictable lists with unevictable pages. I am not a fan of "if (!isolate_lru_page(page)) putback_lru_page(page);": if we have taken lruvec lock to get the page off its present list, then we save everyone trouble (and however many extra atomic ops) by putting it on its destination list immediately. Signed-off-by: Hugh Dickins Acked-by: Vlastimil Babka Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/mm_inline.h | 11 ++++++++--- include/linux/mm_types.h | 19 +++++++++++++++++-- 2 files changed, 25 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index b725839dfe71..884d6f6af05b 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -99,7 +99,8 @@ void lruvec_add_folio(struct lruvec *lruvec, struct folio *folio) update_lru_size(lruvec, lru, folio_zonenum(folio), folio_nr_pages(folio)); - list_add(&folio->lru, &lruvec->lists[lru]); + if (lru != LRU_UNEVICTABLE) + list_add(&folio->lru, &lruvec->lists[lru]); } static __always_inline void add_page_to_lru_list(struct page *page, @@ -115,6 +116,7 @@ void lruvec_add_folio_tail(struct lruvec *lruvec, struct folio *folio) update_lru_size(lruvec, lru, folio_zonenum(folio), folio_nr_pages(folio)); + /* This is not expected to be used on LRU_UNEVICTABLE */ list_add_tail(&folio->lru, &lruvec->lists[lru]); } @@ -127,8 +129,11 @@ static __always_inline void add_page_to_lru_list_tail(struct page *page, static __always_inline void lruvec_del_folio(struct lruvec *lruvec, struct folio *folio) { - list_del(&folio->lru); - update_lru_size(lruvec, folio_lru_list(folio), folio_zonenum(folio), + enum lru_list lru = folio_lru_list(folio); + + if (lru != LRU_UNEVICTABLE) + list_del(&folio->lru); + update_lru_size(lruvec, lru, folio_zonenum(folio), -folio_nr_pages(folio)); } diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5140e5feb486..475bdb282769 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -85,7 +85,16 @@ struct page { * lruvec->lru_lock. Sometimes used as a generic list * by the page owner. */ - struct list_head lru; + union { + struct list_head lru; + /* Or, for the Unevictable "LRU list" slot */ + struct { + /* Always even, to negate PageTail */ + void *__filler; + /* Count page's or folio's mlocks */ + unsigned int mlock_count; + }; + }; /* See page-flags.h for PAGE_MAPPING_FLAGS */ struct address_space *mapping; pgoff_t index; /* Our offset within mapping. */ @@ -241,7 +250,13 @@ struct folio { struct { /* public: */ unsigned long flags; - struct list_head lru; + union { + struct list_head lru; + struct { + void *__filler; + unsigned int mlock_count; + }; + }; struct address_space *mapping; pgoff_t index; void *private; -- cgit From 904b10fb189cc15376e9bfce1ef0282e68b0b004 Mon Sep 17 00:00:00 2001 From: Pali Rohár Date: Mon, 14 Feb 2022 12:41:08 +0100 Subject: PCI: Add defines for normal and subtractive PCI bridges MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add these PCI class codes to pci_ids.h: PCI_CLASS_BRIDGE_PCI_NORMAL PCI_CLASS_BRIDGE_PCI_SUBTRACTIVE Use these defines in all kernel code for describing PCI class codes for normal and subtractive PCI bridges. [bhelgaas: similar change in pci-mvebu.c] Link: https://lore.kernel.org/r/20220214114109.26809-1-pali@kernel.org Signed-off-by: Pali Rohár Signed-off-by: Bjorn Helgaas --- include/linux/pci_ids.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index aad54c666407..130949c3b486 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -60,6 +60,8 @@ #define PCI_CLASS_BRIDGE_EISA 0x0602 #define PCI_CLASS_BRIDGE_MC 0x0603 #define PCI_CLASS_BRIDGE_PCI 0x0604 +#define PCI_CLASS_BRIDGE_PCI_NORMAL 0x060400 +#define PCI_CLASS_BRIDGE_PCI_SUBTRACTIVE 0x060401 #define PCI_CLASS_BRIDGE_PCMCIA 0x0605 #define PCI_CLASS_BRIDGE_NUBUS 0x0606 #define PCI_CLASS_BRIDGE_CARDBUS 0x0607 -- cgit From 091296d30917f6e63a9402974f546412dfb2a8e6 Mon Sep 17 00:00:00 2001 From: Miri Korenblit Date: Sat, 5 Feb 2022 11:21:36 +0200 Subject: iwlwifi: mvm: refactor setting PPE thresholds in STA_HE_CTXT_CMD We are setting the PPE Thresholds in STA_HE_CTXT_CMD according to HE PHY Capabilities IE. As EHT is introduced, we will have to set this thresholds according to EHT PHY Capabilities IE if we're in an EHT connection. Some parts of the code can be used for both HE and EHT. Put this parts in functions which will be used in the patch which adds support for EHT PPE. Signed-off-by: Miri Korenblit Signed-off-by: Luca Coelho Link: https://lore.kernel.org/r/iwlwifi.20220205112029.48a508dfffef.If392e44d88f96ebed7fadf827e327194d4bd97b1@changeid Signed-off-by: Luca Coelho --- include/linux/ieee80211.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index a4143466cb32..75d40acb60c1 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -2421,6 +2421,7 @@ ieee80211_he_mcs_nss_size(const struct ieee80211_he_cap_elem *he_cap) #define IEEE80211_PPE_THRES_RU_INDEX_BITMASK_MASK 0x78 #define IEEE80211_PPE_THRES_RU_INDEX_BITMASK_POS (3) #define IEEE80211_PPE_THRES_INFO_PPET_SIZE (3) +#define IEEE80211_HE_PPE_THRES_INFO_HEADER_SIZE (7) /* * Calculate 802.11ax HE capabilities IE PPE field size -- cgit From ec87cf3782f7b05ae15e061d36c763c35488f292 Mon Sep 17 00:00:00 2001 From: Sergey Shtylyov Date: Wed, 2 Feb 2022 23:58:21 +0300 Subject: ata: libata: make ata_host_suspend() *void* MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ata_host_suspend() always returns 0, so the result checks in many drivers look pointless. Let's make this function return *void* instead of *int*. Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool. Signed-off-by: Sergey Shtylyov Acked-by: Uwe Kleine-König Reviewed-by: Hannes Reinecke Signed-off-by: Damien Le Moal --- include/linux/libata.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index 605756f645be..a49e75c4206d 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -1080,7 +1080,7 @@ extern int ata_sas_scsi_ioctl(struct ata_port *ap, struct scsi_device *dev, extern bool ata_link_online(struct ata_link *link); extern bool ata_link_offline(struct ata_link *link); #ifdef CONFIG_PM -extern int ata_host_suspend(struct ata_host *host, pm_message_t mesg); +extern void ata_host_suspend(struct ata_host *host, pm_message_t mesg); extern void ata_host_resume(struct ata_host *host); extern void ata_sas_port_suspend(struct ata_port *ap); extern void ata_sas_port_resume(struct ata_port *ap); -- cgit From 47f0bd5032106469827cf56c8b45bb9101112105 Mon Sep 17 00:00:00 2001 From: Jacques de Laval Date: Thu, 17 Feb 2022 16:02:02 +0100 Subject: net: Add new protocol attribute to IP addresses This patch adds a new protocol attribute to IPv4 and IPv6 addresses. Inspiration was taken from the protocol attribute of routes. User space applications like iproute2 can set/get the protocol with the Netlink API. The attribute is stored as an 8-bit unsigned integer. The protocol attribute is set by kernel for these categories: - IPv4 and IPv6 loopback addresses - IPv6 addresses generated from router announcements - IPv6 link local addresses User space may pass custom protocols, not defined by the kernel. Grouping addresses on their origin is useful in scenarios where you want to distinguish between addresses based on who added them, e.g. kernel vs. user space. Tagging addresses with a string label is an existing feature that could be used as a solution. Unfortunately the max length of a label is 15 characters, and for compatibility reasons the label must be prefixed with the name of the device followed by a colon. Since device names also have a max length of 15 characters, only -1 characters is guaranteed to be available for any origin tag, which is not that much. A reference implementation of user space setting and getting protocols is available for iproute2: https://github.com/westermo/iproute2/commit/9a6ea18bd79f47f293e5edc7780f315ea42ff540 Signed-off-by: Jacques de Laval Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20220217150202.80802-1-Jacques.De.Laval@westermo.com Signed-off-by: Jakub Kicinski --- include/linux/inetdevice.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h index 674aeead6260..ead323243e7b 100644 --- a/include/linux/inetdevice.h +++ b/include/linux/inetdevice.h @@ -150,6 +150,7 @@ struct in_ifaddr { __be32 ifa_broadcast; unsigned char ifa_scope; unsigned char ifa_prefixlen; + unsigned char ifa_proto; __u32 ifa_flags; char ifa_label[IFNAMSIZ]; -- cgit From b1e8206582f9d680cff7d04828708c8b6ab32957 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 14 Feb 2022 10:16:57 +0100 Subject: sched: Fix yet more sched_fork() races Where commit 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group") fixed a fork race vs cgroup, it opened up a race vs syscalls by not placing the task on the runqueue before it gets exposed through the pidhash. Commit 13765de8148f ("sched/fair: Fix fault in reweight_entity") is trying to fix a single instance of this, instead fix the whole class of issues, effectively reverting this commit. Fixes: 4ef0c5c6b5ba ("kernel/sched: Fix sched_fork() access an invalid sched_task_group") Reported-by: Linus Torvalds Signed-off-by: Peter Zijlstra (Intel) Tested-by: Tadeusz Struk Tested-by: Zhang Qiao Tested-by: Dietmar Eggemann Link: https://lkml.kernel.org/r/YgoeCbwj5mbCR0qA@hirez.programming.kicks-ass.net --- include/linux/sched/task.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index b9198a1b3a84..e84e54d1b490 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -54,8 +54,8 @@ extern asmlinkage void schedule_tail(struct task_struct *prev); extern void init_idle(struct task_struct *idle, int cpu); extern int sched_fork(unsigned long clone_flags, struct task_struct *p); -extern void sched_post_fork(struct task_struct *p, - struct kernel_clone_args *kargs); +extern void sched_cgroup_fork(struct task_struct *p, struct kernel_clone_args *kargs); +extern void sched_post_fork(struct task_struct *p); extern void sched_dead(struct task_struct *p); void __noreturn do_task_dead(void); -- cgit From 8a69fe0be143b0a1af829f85f0e9a1ae7d6a04db Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Mon, 14 Feb 2022 16:52:11 +0000 Subject: sched/preempt: Refactor sched_dynamic_update() Currently sched_dynamic_update needs to open-code the enabled/disabled function names for each preemption model it supports, when in practice this is a boolean enabled/disabled state for each function. Make this clearer and avoid repetition by defining the enabled/disabled states at the function definition, and using helper macros to perform the static_call_update(). Where x86 currently overrides the enabled function, it is made to provide both the enabled and disabled states for consistency, with defaults provided by the core code otherwise. In subsequent patches this will allow us to support PREEMPT_DYNAMIC without static calls. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland Signed-off-by: Peter Zijlstra (Intel) Acked-by: Ard Biesheuvel Acked-by: Frederic Weisbecker Link: https://lore.kernel.org/r/20220214165216.2231574-3-mark.rutland@arm.com --- include/linux/entry-common.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index 2e2b8d6140ed..a01ac1a0a292 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -456,6 +456,8 @@ irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs); */ void irqentry_exit_cond_resched(void); #ifdef CONFIG_PREEMPT_DYNAMIC +#define irqentry_exit_cond_resched_dynamic_enabled irqentry_exit_cond_resched +#define irqentry_exit_cond_resched_dynamic_disabled NULL DECLARE_STATIC_CALL(irqentry_exit_cond_resched, irqentry_exit_cond_resched); #endif -- cgit From 4624a14f4daa8ab4578d274555fd8847254ce339 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Mon, 14 Feb 2022 16:52:12 +0000 Subject: sched/preempt: Simplify irqentry_exit_cond_resched() callers Currently callers of irqentry_exit_cond_resched() need to be aware of whether the function should be indirected via a static call, leading to ugly ifdeffery in callers. Save them the hassle with a static inline wrapper that does the right thing. The raw_irqentry_exit_cond_resched() will also be useful in subsequent patches which will add conditional wrappers for preemption functions. Note: in arch/x86/entry/common.c, xen_pv_evtchn_do_upcall() always calls irqentry_exit_cond_resched() directly, even when PREEMPT_DYNAMIC is in use. I believe this is a latent bug (which this patch corrects), but I'm not entirely certain this wasn't deliberate. Signed-off-by: Mark Rutland Signed-off-by: Peter Zijlstra (Intel) Acked-by: Ard Biesheuvel Acked-by: Frederic Weisbecker Link: https://lore.kernel.org/r/20220214165216.2231574-4-mark.rutland@arm.com --- include/linux/entry-common.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index a01ac1a0a292..dfd84c59b144 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -454,11 +454,14 @@ irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs); * * Conditional reschedule with additional sanity checks. */ -void irqentry_exit_cond_resched(void); +void raw_irqentry_exit_cond_resched(void); #ifdef CONFIG_PREEMPT_DYNAMIC -#define irqentry_exit_cond_resched_dynamic_enabled irqentry_exit_cond_resched +#define irqentry_exit_cond_resched_dynamic_enabled raw_irqentry_exit_cond_resched #define irqentry_exit_cond_resched_dynamic_disabled NULL -DECLARE_STATIC_CALL(irqentry_exit_cond_resched, irqentry_exit_cond_resched); +DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched); +#define irqentry_exit_cond_resched() static_call(irqentry_exit_cond_resched)() +#else +#define irqentry_exit_cond_resched() raw_irqentry_exit_cond_resched() #endif /** -- cgit From 99cf983cc8bca4adb461b519664c939a565cfd4d Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Mon, 14 Feb 2022 16:52:14 +0000 Subject: sched/preempt: Add PREEMPT_DYNAMIC using static keys Where an architecture selects HAVE_STATIC_CALL but not HAVE_STATIC_CALL_INLINE, each static call has an out-of-line trampoline which will either branch to a callee or return to the caller. On such architectures, a number of constraints can conspire to make those trampolines more complicated and potentially less useful than we'd like. For example: * Hardware and software control flow integrity schemes can require the addition of "landing pad" instructions (e.g. `BTI` for arm64), which will also be present at the "real" callee. * Limited branch ranges can require that trampolines generate or load an address into a register and perform an indirect branch (or at least have a slow path that does so). This loses some of the benefits of having a direct branch. * Interaction with SW CFI schemes can be complicated and fragile, e.g. requiring that we can recognise idiomatic codegen and remove indirections understand, at least until clang proves more helpful mechanisms for dealing with this. For PREEMPT_DYNAMIC, we don't need the full power of static calls, as we really only need to enable/disable specific preemption functions. We can achieve the same effect without a number of the pain points above by using static keys to fold early returns into the preemption functions themselves rather than in an out-of-line trampoline, effectively inlining the trampoline into the start of the function. For arm64, this results in good code generation. For example, the dynamic_cond_resched() wrapper looks as follows when enabled. When disabled, the first `B` is replaced with a `NOP`, resulting in an early return. | : | bti c | b // or `nop` | mov w0, #0x0 | ret | mrs x0, sp_el0 | ldr x0, [x0, #8] | cbnz x0, | paciasp | stp x29, x30, [sp, #-16]! | mov x29, sp | bl | mov w0, #0x1 | ldp x29, x30, [sp], #16 | autiasp | ret ... compared to the regular form of the function: | <__cond_resched>: | bti c | mrs x0, sp_el0 | ldr x1, [x0, #8] | cbz x1, <__cond_resched+0x18> | mov w0, #0x0 | ret | paciasp | stp x29, x30, [sp, #-16]! | mov x29, sp | bl | mov w0, #0x1 | ldp x29, x30, [sp], #16 | autiasp | ret Any architecture which implements static keys should be able to use this to implement PREEMPT_DYNAMIC with similar cost to non-inlined static calls. Since this is likely to have greater overhead than (inlined) static calls, PREEMPT_DYNAMIC is only defaulted to enabled when HAVE_PREEMPT_DYNAMIC_CALL is selected. Signed-off-by: Mark Rutland Signed-off-by: Peter Zijlstra (Intel) Acked-by: Ard Biesheuvel Acked-by: Frederic Weisbecker Link: https://lore.kernel.org/r/20220214165216.2231574-6-mark.rutland@arm.com --- include/linux/entry-common.h | 10 ++++++++-- include/linux/kernel.h | 7 ++++++- include/linux/sched.h | 10 +++++++++- 3 files changed, 23 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index dfd84c59b144..141952f4fee8 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -456,13 +456,19 @@ irqentry_state_t noinstr irqentry_enter(struct pt_regs *regs); */ void raw_irqentry_exit_cond_resched(void); #ifdef CONFIG_PREEMPT_DYNAMIC +#if defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL) #define irqentry_exit_cond_resched_dynamic_enabled raw_irqentry_exit_cond_resched #define irqentry_exit_cond_resched_dynamic_disabled NULL DECLARE_STATIC_CALL(irqentry_exit_cond_resched, raw_irqentry_exit_cond_resched); #define irqentry_exit_cond_resched() static_call(irqentry_exit_cond_resched)() -#else -#define irqentry_exit_cond_resched() raw_irqentry_exit_cond_resched() +#elif defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY) +DECLARE_STATIC_KEY_TRUE(sk_dynamic_irqentry_exit_cond_resched); +void dynamic_irqentry_exit_cond_resched(void); +#define irqentry_exit_cond_resched() dynamic_irqentry_exit_cond_resched() #endif +#else /* CONFIG_PREEMPT_DYNAMIC */ +#define irqentry_exit_cond_resched() raw_irqentry_exit_cond_resched() +#endif /* CONFIG_PREEMPT_DYNAMIC */ /** * irqentry_exit - Handle return from exception that used irqentry_enter() diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 33f47a996513..a890428bcc1a 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -99,7 +99,7 @@ struct user; extern int __cond_resched(void); # define might_resched() __cond_resched() -#elif defined(CONFIG_PREEMPT_DYNAMIC) +#elif defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL) extern int __cond_resched(void); @@ -110,6 +110,11 @@ static __always_inline void might_resched(void) static_call_mod(might_resched)(); } +#elif defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY) + +extern int dynamic_might_resched(void); +# define might_resched() dynamic_might_resched() + #else # define might_resched() do { } while (0) diff --git a/include/linux/sched.h b/include/linux/sched.h index 508b91d57470..de03ddeb064b 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2020,7 +2020,7 @@ static inline int test_tsk_need_resched(struct task_struct *tsk) #if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC) extern int __cond_resched(void); -#ifdef CONFIG_PREEMPT_DYNAMIC +#if defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL) DECLARE_STATIC_CALL(cond_resched, __cond_resched); @@ -2029,6 +2029,14 @@ static __always_inline int _cond_resched(void) return static_call_mod(cond_resched)(); } +#elif defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY) +extern int dynamic_cond_resched(void); + +static __always_inline int _cond_resched(void) +{ + return dynamic_cond_resched(); +} + #else static inline int _cond_resched(void) -- cgit From 64b4a0f8b51b20e0c9dbff7748365994364d5f01 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Sat, 19 Feb 2022 11:47:22 +0000 Subject: net: phylink: remove phylink_config's pcs_poll phylink_config's pcs_poll is no longer used, let's get rid of it. Signed-off-by: Russell King (Oracle) Signed-off-by: David S. Miller --- include/linux/phylink.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index cca149f78d35..9ef9b7047f19 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -86,7 +86,6 @@ enum phylink_op_type { * @type: operation type of PHYLINK instance * @legacy_pre_march2020: driver has not been updated for March 2020 updates * (See commit 7cceb599d15d ("net: phylink: avoid mac_config calls") - * @pcs_poll: MAC PCS cannot provide link change interrupt * @poll_fixed_state: if true, starts link_poll, * if MAC link is at %MLO_AN_FIXED mode. * @ovr_an_inband: if true, override PCS to MLO_AN_INBAND @@ -100,7 +99,6 @@ struct phylink_config { struct device *dev; enum phylink_op_type type; bool legacy_pre_march2020; - bool pcs_poll; bool poll_fixed_state; bool ovr_an_inband; void (*get_fixed_state)(struct phylink_config *config, -- cgit From efcef265fd83d9a68a68926abecb3e1dd3e260a8 Mon Sep 17 00:00:00 2001 From: Sergey Shtylyov Date: Tue, 15 Feb 2022 21:49:26 +0300 Subject: ata: add/use ata_taskfile::{error|status} fields Add the explicit error and status register fields to 'struct ata_taskfile' using the anonymous *union*s ('struct ide_taskfile' had that for ages!) and update the libata taskfile code accordingly. There should be no object code changes resulting from that... Signed-off-by: Sergey Shtylyov Signed-off-by: Damien Le Moal --- include/linux/libata.h | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index a49e75c4206d..0619ae462ecd 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -518,7 +518,10 @@ struct ata_taskfile { u8 hob_lbam; u8 hob_lbah; - u8 feature; + union { + u8 error; + u8 feature; + }; u8 nsect; u8 lbal; u8 lbam; @@ -526,7 +529,10 @@ struct ata_taskfile { u8 device; - u8 command; /* IO operation */ + union { + u8 status; + u8 command; + }; u32 auxiliary; /* auxiliary field */ /* from SATA 3.1 and */ -- cgit From cccd73d607fee52f35b4b030408fa5f6c21ef503 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Wed, 16 Feb 2022 09:41:32 -0800 Subject: iosys-map: Add offset to iosys_map_memcpy_to() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In certain situations it's useful to be able to write to an offset of the mapping. Add a dst_offset to iosys_map_memcpy_to(). Cc: Sumit Semwal Cc: Christian König Cc: Thomas Zimmermann Cc: dri-devel@lists.freedesktop.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Lucas De Marchi Reviewed-by: Christian König Reviewed-by: Thomas Zimmermann Link: https://patchwork.freedesktop.org/patch/msgid/20220216174147.3073235-2-lucas.demarchi@intel.com --- include/linux/iosys-map.h | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h index f4186f91caa6..edd730b1e899 100644 --- a/include/linux/iosys-map.h +++ b/include/linux/iosys-map.h @@ -220,22 +220,23 @@ static inline void iosys_map_clear(struct iosys_map *map) } /** - * iosys_map_memcpy_to - Memcpy into iosys mapping + * iosys_map_memcpy_to - Memcpy into offset of iosys_map * @dst: The iosys_map structure + * @dst_offset: The offset from which to copy * @src: The source buffer * @len: The number of byte in src * - * Copies data into a iosys mapping. The source buffer is in system - * memory. Depending on the buffer's location, the helper picks the correct - * method of accessing the memory. + * Copies data into a iosys_map with an offset. The source buffer is in + * system memory. Depending on the buffer's location, the helper picks the + * correct method of accessing the memory. */ -static inline void iosys_map_memcpy_to(struct iosys_map *dst, const void *src, - size_t len) +static inline void iosys_map_memcpy_to(struct iosys_map *dst, size_t dst_offset, + const void *src, size_t len) { if (dst->is_iomem) - memcpy_toio(dst->vaddr_iomem, src, len); + memcpy_toio(dst->vaddr_iomem + dst_offset, src, len); else - memcpy(dst->vaddr, src, len); + memcpy(dst->vaddr + dst_offset, src, len); } /** -- cgit From e62f25e8b3cdd29224c27938addba817aedd4b54 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Wed, 16 Feb 2022 09:41:33 -0800 Subject: iosys-map: Add a few more helpers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit First the simplest ones: - iosys_map_memset(): when abstracting system and I/O memory, just like the memcpy() use case, memset() also has dedicated functions to be called for using IO memory. - iosys_map_memcpy_from(): we may need to copy data from I/O memory, not only to. In certain situations it's useful to be able to read or write to an offset that is calculated by having the memory layout given by a struct declaration. Usually we are going to read/write a u8, u16, u32 or u64. As a pre-requisite for the implementation, add iosys_map_memcpy_from() to be the equivalent of iosys_map_memcpy_to(), but in the other direction. Then add 2 pairs of macros: - iosys_map_rd() / iosys_map_wr() - iosys_map_rd_field() / iosys_map_wr_field() The first pair takes the C-type and offset to read/write. The second pair uses a struct describing the layout of the mapping in order to calculate the offset and size being read/written. We could use readb, readw, readl, readq and the write* counterparts, however due to alignment issues this may not work on all architectures. If alignment needs to be checked to call the right function, it's not possible to decide at compile-time which function to call: so just leave the decision to the memcpy function that will do exactly that. Finally, in order to use the above macros with a map derived from another, add another initializer: IOSYS_MAP_INIT_OFFSET(). v2: - Rework IOSYS_MAP_INIT_OFFSET() so it doesn't rely on aliasing rules within the union - Add offset to both iosys_map_rd_field() and iosys_map_wr_field() to allow the struct itself to be at an offset from the mapping - Add documentation to iosys_map_rd_field() with example and expected memory layout v3: - Drop kernel.h include as it's not needed anymore Cc: Sumit Semwal Cc: Christian König Cc: Thomas Zimmermann Cc: Mauro Carvalho Chehab Cc: dri-devel@lists.freedesktop.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Lucas De Marchi Reviewed-by: Mauro Carvalho Chehab Reviewed-by: Matt Atwood Reviewed-by: Thomas Zimmermann Link: https://patchwork.freedesktop.org/patch/msgid/20220216174147.3073235-3-lucas.demarchi@intel.com --- include/linux/iosys-map.h | 201 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 201 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h index edd730b1e899..e69a002d5aa4 100644 --- a/include/linux/iosys-map.h +++ b/include/linux/iosys-map.h @@ -120,6 +120,45 @@ struct iosys_map { .is_iomem = false, \ } +/** + * IOSYS_MAP_INIT_OFFSET - Initializes struct iosys_map from another iosys_map + * @map_: The dma-buf mapping structure to copy from + * @offset_: Offset to add to the other mapping + * + * Initializes a new iosys_map struct based on another passed as argument. It + * does a shallow copy of the struct so it's possible to update the back storage + * without changing where the original map points to. It is the equivalent of + * doing: + * + * .. code-block:: c + * + * iosys_map map = other_map; + * iosys_map_incr(&map, &offset); + * + * Example usage: + * + * .. code-block:: c + * + * void foo(struct device *dev, struct iosys_map *base_map) + * { + * ... + * struct iosys_map map = IOSYS_MAP_INIT_OFFSET(base_map, FIELD_OFFSET); + * ... + * } + * + * The advantage of using the initializer over just increasing the offset with + * iosys_map_incr() like above is that the new map will always point to the + * right place of the buffer during its scope. It reduces the risk of updating + * the wrong part of the buffer and having no compiler warning about that. If + * the assignment to IOSYS_MAP_INIT_OFFSET() is forgotten, the compiler can warn + * about the use of uninitialized variable. + */ +#define IOSYS_MAP_INIT_OFFSET(map_, offset_) ({ \ + struct iosys_map copy = *map_; \ + iosys_map_incr(©, offset_); \ + copy; \ +}) + /** * iosys_map_set_vaddr - Sets a iosys mapping structure to an address in system memory * @map: The iosys_map structure @@ -239,6 +278,26 @@ static inline void iosys_map_memcpy_to(struct iosys_map *dst, size_t dst_offset, memcpy(dst->vaddr + dst_offset, src, len); } +/** + * iosys_map_memcpy_from - Memcpy from iosys_map into system memory + * @dst: Destination in system memory + * @src: The iosys_map structure + * @src_offset: The offset from which to copy + * @len: The number of byte in src + * + * Copies data from a iosys_map with an offset. The dest buffer is in + * system memory. Depending on the mapping location, the helper picks the + * correct method of accessing the memory. + */ +static inline void iosys_map_memcpy_from(void *dst, const struct iosys_map *src, + size_t src_offset, size_t len) +{ + if (src->is_iomem) + memcpy_fromio(dst, src->vaddr_iomem + src_offset, len); + else + memcpy(dst, src->vaddr + src_offset, len); +} + /** * iosys_map_incr - Increments the address stored in a iosys mapping * @map: The iosys_map structure @@ -255,4 +314,146 @@ static inline void iosys_map_incr(struct iosys_map *map, size_t incr) map->vaddr += incr; } +/** + * iosys_map_memset - Memset iosys_map + * @dst: The iosys_map structure + * @offset: Offset from dst where to start setting value + * @value: The value to set + * @len: The number of bytes to set in dst + * + * Set value in iosys_map. Depending on the buffer's location, the helper + * picks the correct method of accessing the memory. + */ +static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, + int value, size_t len) +{ + if (dst->is_iomem) + memset_io(dst->vaddr_iomem + offset, value, len); + else + memset(dst->vaddr + offset, value, len); +} + +/** + * iosys_map_rd - Read a C-type value from the iosys_map + * + * @map__: The iosys_map structure + * @offset__: The offset from which to read + * @type__: Type of the value being read + * + * Read a C type value from iosys_map, handling possible un-aligned accesses to + * the mapping. + * + * Returns: + * The value read from the mapping. + */ +#define iosys_map_rd(map__, offset__, type__) ({ \ + type__ val; \ + iosys_map_memcpy_from(&val, map__, offset__, sizeof(val)); \ + val; \ +}) + +/** + * iosys_map_wr - Write a C-type value to the iosys_map + * + * @map__: The iosys_map structure + * @offset__: The offset from the mapping to write to + * @type__: Type of the value being written + * @val__: Value to write + * + * Write a C-type value to the iosys_map, handling possible un-aligned accesses + * to the mapping. + */ +#define iosys_map_wr(map__, offset__, type__, val__) ({ \ + type__ val = (val__); \ + iosys_map_memcpy_to(map__, offset__, &val, sizeof(val)); \ +}) + +/** + * iosys_map_rd_field - Read a member from a struct in the iosys_map + * + * @map__: The iosys_map structure + * @struct_offset__: Offset from the beggining of the map, where the struct + * is located + * @struct_type__: The struct describing the layout of the mapping + * @field__: Member of the struct to read + * + * Read a value from iosys_map considering its layout is described by a C struct + * starting at @struct_offset__. The field offset and size is calculated and its + * value read handling possible un-aligned memory accesses. For example: suppose + * there is a @struct foo defined as below and the value ``foo.field2.inner2`` + * needs to be read from the iosys_map: + * + * .. code-block:: c + * + * struct foo { + * int field1; + * struct { + * int inner1; + * int inner2; + * } field2; + * int field3; + * } __packed; + * + * This is the expected memory layout of a buffer using iosys_map_rd_field(): + * + * +------------------------------+--------------------------+ + * | Address | Content | + * +==============================+==========================+ + * | buffer + 0000 | start of mmapped buffer | + * | | pointed by iosys_map | + * +------------------------------+--------------------------+ + * | ... | ... | + * +------------------------------+--------------------------+ + * | buffer + ``struct_offset__`` | start of ``struct foo`` | + * +------------------------------+--------------------------+ + * | ... | ... | + * +------------------------------+--------------------------+ + * | buffer + wwww | ``foo.field2.inner2`` | + * +------------------------------+--------------------------+ + * | ... | ... | + * +------------------------------+--------------------------+ + * | buffer + yyyy | end of ``struct foo`` | + * +------------------------------+--------------------------+ + * | ... | ... | + * +------------------------------+--------------------------+ + * | buffer + zzzz | end of mmaped buffer | + * +------------------------------+--------------------------+ + * + * Values automatically calculated by this macro or not needed are denoted by + * wwww, yyyy and zzzz. This is the code to read that value: + * + * .. code-block:: c + * + * x = iosys_map_rd_field(&map, offset, struct foo, field2.inner2); + * + * Returns: + * The value read from the mapping. + */ +#define iosys_map_rd_field(map__, struct_offset__, struct_type__, field__) ({ \ + struct_type__ *s; \ + iosys_map_rd(map__, struct_offset__ + offsetof(struct_type__, field__), \ + typeof(s->field__)); \ +}) + +/** + * iosys_map_wr_field - Write to a member of a struct in the iosys_map + * + * @map__: The iosys_map structure + * @struct_offset__: Offset from the beggining of the map, where the struct + * is located + * @struct_type__: The struct describing the layout of the mapping + * @field__: Member of the struct to read + * @val__: Value to write + * + * Write a value to the iosys_map considering its layout is described by a C struct + * starting at @struct_offset__. The field offset and size is calculated and the + * @val__ is written handling possible un-aligned memory accesses. Refer to + * iosys_map_rd_field() for expected usage and memory layout. + */ +#define iosys_map_wr_field(map__, struct_offset__, struct_type__, field__, val__) ({ \ + struct_type__ *s; \ + iosys_map_wr(map__, struct_offset__ + offsetof(struct_type__, field__), \ + typeof(s->field__), val__); \ +}) + #endif /* __IOSYS_MAP_H__ */ -- cgit From 643b622b51f1f0015e0a80f90b4ef9032e6ddb1b Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Sun, 20 Feb 2022 15:06:32 +0800 Subject: net: tcp: add skb drop reasons to tcp_v{4,6}_inbound_md5_hash() Pass the address of drop reason to tcp_v4_inbound_md5_hash() and tcp_v6_inbound_md5_hash() to store the reasons for skb drops when this function fails. Therefore, the drop reason can be passed to kfree_skb_reason() when the skb needs to be freed. Following drop reasons are added: SKB_DROP_REASON_TCP_MD5NOTFOUND SKB_DROP_REASON_TCP_MD5UNEXPECTED SKB_DROP_REASON_TCP_MD5FAILURE SKB_DROP_REASON_TCP_MD5* above correspond to LINUX_MIB_TCPMD5* Reviewed-by: Mengen Sun Reviewed-by: Hao Peng Signed-off-by: Menglong Dong Reviewed-by: Eric Dumazet Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/skbuff.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a5adbf6b51e8..46678eb587ff 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -346,6 +346,18 @@ enum skb_drop_reason { * udp packet drop out of * udp_memory_allocated. */ + SKB_DROP_REASON_TCP_MD5NOTFOUND, /* no MD5 hash and one + * expected, corresponding + * to LINUX_MIB_TCPMD5NOTFOUND + */ + SKB_DROP_REASON_TCP_MD5UNEXPECTED, /* MD5 hash and we're not + * expecting one, corresponding + * to LINUX_MIB_TCPMD5UNEXPECTED + */ + SKB_DROP_REASON_TCP_MD5FAILURE, /* MD5 hash and its wrong, + * corresponding to + * LINUX_MIB_TCPMD5FAILURE + */ SKB_DROP_REASON_MAX, }; -- cgit From 7a26dc9e7b43f5a24c4b843713e728582adf1c38 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Sun, 20 Feb 2022 15:06:33 +0800 Subject: net: tcp: add skb drop reasons to tcp_add_backlog() Pass the address of drop_reason to tcp_add_backlog() to store the reasons for skb drops when fails. Following drop reasons are introduced: SKB_DROP_REASON_SOCKET_BACKLOG Reviewed-by: Mengen Sun Reviewed-by: Hao Peng Signed-off-by: Menglong Dong Reviewed-by: Eric Dumazet Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/skbuff.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 46678eb587ff..f7f33c79945b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -358,6 +358,10 @@ enum skb_drop_reason { * corresponding to * LINUX_MIB_TCPMD5FAILURE */ + SKB_DROP_REASON_SOCKET_BACKLOG, /* failed to add skb to socket + * backlog (see + * LINUX_MIB_TCPBACKLOGDROP) + */ SKB_DROP_REASON_MAX, }; -- cgit From 2a968ef60e1fac4e694d9f60ce19a3b66b40e8c3 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Sun, 20 Feb 2022 15:06:35 +0800 Subject: net: tcp: use tcp_drop_reason() for tcp_rcv_established() Replace tcp_drop() used in tcp_rcv_established() with tcp_drop_reason(). Following drop reasons are added: SKB_DROP_REASON_TCP_FLAGS Reviewed-by: Mengen Sun Reviewed-by: Hao Peng Signed-off-by: Menglong Dong Reviewed-by: Eric Dumazet Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/skbuff.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index f7f33c79945b..671db9f49efe 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -362,6 +362,7 @@ enum skb_drop_reason { * backlog (see * LINUX_MIB_TCPBACKLOGDROP) */ + SKB_DROP_REASON_TCP_FLAGS, /* TCP flags invalid */ SKB_DROP_REASON_MAX, }; -- cgit From a7ec381049c0d1f03e342063d75f5c3b314d0ec2 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Sun, 20 Feb 2022 15:06:36 +0800 Subject: net: tcp: use tcp_drop_reason() for tcp_data_queue() Replace tcp_drop() used in tcp_data_queue() with tcp_drop_reason(). Following drop reasons are introduced: SKB_DROP_REASON_TCP_ZEROWINDOW SKB_DROP_REASON_TCP_OLD_DATA SKB_DROP_REASON_TCP_OVERWINDOW SKB_DROP_REASON_TCP_OLD_DATA is used for the case that end_seq of skb less than the left edges of receive window. (Maybe there is a better name?) Reviewed-by: Mengen Sun Reviewed-by: Hao Peng Signed-off-by: Menglong Dong Reviewed-by: Eric Dumazet Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/skbuff.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 671db9f49efe..554ef2c848ee 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -363,6 +363,19 @@ enum skb_drop_reason { * LINUX_MIB_TCPBACKLOGDROP) */ SKB_DROP_REASON_TCP_FLAGS, /* TCP flags invalid */ + SKB_DROP_REASON_TCP_ZEROWINDOW, /* TCP receive window size is zero, + * see LINUX_MIB_TCPZEROWINDOWDROP + */ + SKB_DROP_REASON_TCP_OLD_DATA, /* the TCP data reveived is already + * received before (spurious retrans + * may happened), see + * LINUX_MIB_DELAYEDACKLOST + */ + SKB_DROP_REASON_TCP_OVERWINDOW, /* the TCP data is out of window, + * the seq of the first byte exceed + * the right edges of receive + * window + */ SKB_DROP_REASON_MAX, }; -- cgit From d25e481be0c519d1a458b14191dc8c2a8bb3e24a Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Sun, 20 Feb 2022 15:06:37 +0800 Subject: net: tcp: use tcp_drop_reason() for tcp_data_queue_ofo() Replace tcp_drop() used in tcp_data_queue_ofo with tcp_drop_reason(). Following drop reasons are introduced: SKB_DROP_REASON_TCP_OFOMERGE Reviewed-by: Mengen Sun Reviewed-by: Hao Peng Signed-off-by: Menglong Dong Reviewed-by: Eric Dumazet Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/skbuff.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 554ef2c848ee..a3e90efe6586 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -376,6 +376,10 @@ enum skb_drop_reason { * the right edges of receive * window */ + SKB_DROP_REASON_TCP_OFOMERGE, /* the data of skb is already in + * the ofo queue, corresponding to + * LINUX_MIB_TCPOFOMERGE + */ SKB_DROP_REASON_MAX, }; -- cgit From 44a3918c8245ab10c6c9719dd12e7a8d291980d8 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Fri, 18 Feb 2022 11:49:08 -0800 Subject: x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting With unprivileged eBPF enabled, eIBRS (without retpoline) is vulnerable to Spectre v2 BHB-based attacks. When both are enabled, print a warning message and report it in the 'spectre_v2' sysfs vulnerabilities file. Signed-off-by: Josh Poimboeuf Signed-off-by: Borislav Petkov Reviewed-by: Thomas Gleixner --- include/linux/bpf.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index fa517ae604ad..1f56806d8eb9 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1793,6 +1793,11 @@ struct bpf_core_ctx { int bpf_core_apply(struct bpf_core_ctx *ctx, const struct bpf_core_relo *relo, int relo_idx, void *insn); +static inline bool unprivileged_ebpf_enabled(void) +{ + return !sysctl_unprivileged_bpf_disabled; +} + #else /* !CONFIG_BPF_SYSCALL */ static inline struct bpf_prog *bpf_prog_get(u32 ufd) { @@ -2012,6 +2017,12 @@ bpf_jit_find_kfunc_model(const struct bpf_prog *prog, { return NULL; } + +static inline bool unprivileged_ebpf_enabled(void) +{ + return false; +} + #endif /* CONFIG_BPF_SYSCALL */ void __bpf_free_used_btfs(struct bpf_prog_aux *aux, -- cgit From 509853f9e1e7b1490dc79f735a5dbafc9298f40d Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Fri, 11 Feb 2022 19:14:54 +0100 Subject: genirq: Provide generic_handle_irq_safe() Provide generic_handle_irq_safe() which can used from any context. Suggested-by: Thomas Gleixner Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Reviewed-by: Hans de Goede Reviewed-by: Oleksandr Natalenko Reviewed-by: Wolfram Sang Link: https://lore.kernel.org/r/20220211181500.1856198-2-bigeasy@linutronix.de --- include/linux/irqdesc.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h index 93d270ca0c56..a77584593f7d 100644 --- a/include/linux/irqdesc.h +++ b/include/linux/irqdesc.h @@ -160,6 +160,7 @@ static inline void generic_handle_irq_desc(struct irq_desc *desc) int handle_irq_desc(struct irq_desc *desc); int generic_handle_irq(unsigned int irq); +int generic_handle_irq_safe(unsigned int irq); #ifdef CONFIG_IRQ_DOMAIN /* -- cgit From 93dd04ab0b2b32ae6e70284afc764c577156658e Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Fri, 18 Feb 2022 14:13:58 +0100 Subject: slab: remove __alloc_size attribute from __kmalloc_track_caller Commit c37495d6254c ("slab: add __alloc_size attributes for better bounds checking") added __alloc_size attributes to a bunch of kmalloc function prototypes. Unfortunately the change to __kmalloc_track_caller seems to cause clang to generate broken code and the first time this is called when booting, the box will crash. While the compiler problems are being reworked and attempted to be solved [1], let's just drop the attribute to solve the issue now. Once it is resolved it can be added back. [1] https://github.com/ClangBuiltLinux/linux/issues/1599 Fixes: c37495d6254c ("slab: add __alloc_size attributes for better bounds checking") Cc: stable Cc: Kees Cook Cc: Daniel Micay Cc: Nick Desaulniers Cc: Christoph Lameter Cc: Pekka Enberg Cc: Joonsoo Kim Cc: Andrew Morton Cc: Vlastimil Babka Cc: Nathan Chancellor Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Cc: llvm@lists.linux.dev Signed-off-by: Greg Kroah-Hartman Acked-by: Nick Desaulniers Acked-by: David Rientjes Acked-by: Kees Cook Signed-off-by: Vlastimil Babka Link: https://lore.kernel.org/r/20220218131358.3032912-1-gregkh@linuxfoundation.org --- include/linux/slab.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 37bde99b74af..5b6193fd8bd9 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -660,8 +660,7 @@ static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flag * allocator where we care about the real place the memory allocation * request comes from. */ -extern void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller) - __alloc_size(1); +extern void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller); #define kmalloc_track_caller(size, flags) \ __kmalloc_track_caller(size, flags, _RET_IP_) -- cgit From 05976c5f3bff8f8b5230da4b39f7cd6dfba9943e Mon Sep 17 00:00:00 2001 From: Cristian Marussi Date: Thu, 17 Feb 2022 13:12:31 +0000 Subject: firmware: arm_scmi: Support optional system wide atomic-threshold-us An SCMI agent can be configured system-wide with a well-defined atomic threshold: only SCMI synchronous command whose latency has been advertised by the SCMI platform to be lower or equal to this configured threshold will be considered for atomic operations, when requested and if supported by the underlying transport at all. Link: https://lore.kernel.org/r/20220217131234.50328-6-cristian.marussi@arm.com Signed-off-by: Cristian Marussi Signed-off-by: Sudeep Holla --- include/linux/scmi_protocol.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h index 9f895cb81818..fdf6bd83cc59 100644 --- a/include/linux/scmi_protocol.h +++ b/include/linux/scmi_protocol.h @@ -619,6 +619,8 @@ struct scmi_notify_ops { * be interested to know if they can assume SCMI * command transactions associated to this handle will * never sleep and act accordingly. + * An optional atomic threshold value could be returned + * where configured. * @notify_ops: pointer to set of notifications related operations */ struct scmi_handle { @@ -629,7 +631,8 @@ struct scmi_handle { (*devm_protocol_get)(struct scmi_device *sdev, u8 proto, struct scmi_protocol_handle **ph); void (*devm_protocol_put)(struct scmi_device *sdev, u8 proto); - bool (*is_transport_atomic)(const struct scmi_handle *handle); + bool (*is_transport_atomic)(const struct scmi_handle *handle, + unsigned int *atomic_threshold); const struct scmi_notify_ops *notify_ops; }; -- cgit From b7bd36f2e9430a58aefdc326f8e6653e9b000243 Mon Sep 17 00:00:00 2001 From: Cristian Marussi Date: Thu, 17 Feb 2022 13:12:32 +0000 Subject: firmware: arm_scmi: Add atomic support to clock protocol Introduce new _atomic variant for SCMI clock protocol operations related to enable disable operations: when an atomic operation is required the xfer poll_completion flag is set for that transaction. Link: https://lore.kernel.org/r/20220217131234.50328-7-cristian.marussi@arm.com Signed-off-by: Cristian Marussi Signed-off-by: Sudeep Holla --- include/linux/scmi_protocol.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h index fdf6bd83cc59..306e576835f8 100644 --- a/include/linux/scmi_protocol.h +++ b/include/linux/scmi_protocol.h @@ -82,6 +82,9 @@ struct scmi_clk_proto_ops { u64 rate); int (*enable)(const struct scmi_protocol_handle *ph, u32 clk_id); int (*disable)(const struct scmi_protocol_handle *ph, u32 clk_id); + int (*enable_atomic)(const struct scmi_protocol_handle *ph, u32 clk_id); + int (*disable_atomic)(const struct scmi_protocol_handle *ph, + u32 clk_id); }; /** -- cgit From 18f295b758b227857133f680a397a42e83c62f3f Mon Sep 17 00:00:00 2001 From: Cristian Marussi Date: Thu, 17 Feb 2022 13:12:33 +0000 Subject: firmware: arm_scmi: Add support for clock_enable_latency An SCMI platform can optionally advertise an enable latency typically associated with a specific clock resource: add support for parsing such optional message field and export such information in the usual publicly accessible clock descriptor. Link: https://lore.kernel.org/r/20220217131234.50328-8-cristian.marussi@arm.com Signed-off-by: Cristian Marussi Signed-off-by: Sudeep Holla --- include/linux/scmi_protocol.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h index 306e576835f8..b87551f41f9f 100644 --- a/include/linux/scmi_protocol.h +++ b/include/linux/scmi_protocol.h @@ -42,6 +42,7 @@ struct scmi_revision_info { struct scmi_clock_info { char name[SCMI_MAX_STR_SIZE]; + unsigned int enable_latency; bool rate_discrete; union { struct { -- cgit From f6c052afe6f802d87c74153b7a57c43b2e9faf07 Mon Sep 17 00:00:00 2001 From: Christophe Kerello Date: Sun, 20 Feb 2022 15:14:31 +0000 Subject: nvmem: core: Fix a conflict between MTD and NVMEM on wp-gpios property Wp-gpios property can be used on NVMEM nodes and the same property can be also used on MTD NAND nodes. In case of the wp-gpios property is defined at NAND level node, the GPIO management is done at NAND driver level. Write protect is disabled when the driver is probed or resumed and is enabled when the driver is released or suspended. When no partitions are defined in the NAND DT node, then the NAND DT node will be passed to NVMEM framework. If wp-gpios property is defined in this node, the GPIO resource is taken twice and the NAND controller driver fails to probe. It would be possible to set config->wp_gpio at MTD level before calling nvmem_register function but NVMEM framework will toggle this GPIO on each write when this GPIO should only be controlled at NAND level driver to ensure that the Write Protect has not been enabled. A way to fix this conflict is to add a new boolean flag in nvmem_config named ignore_wp. In case ignore_wp is set, the GPIO resource will be managed by the provider. Fixes: 2a127da461a9 ("nvmem: add support for the write-protect pin") Cc: stable@vger.kernel.org Signed-off-by: Christophe Kerello Signed-off-by: Srinivas Kandagatla Link: https://lore.kernel.org/r/20220220151432.16605-2-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/nvmem-provider.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/nvmem-provider.h b/include/linux/nvmem-provider.h index 98efb7b5660d..c9a3ac9efeaa 100644 --- a/include/linux/nvmem-provider.h +++ b/include/linux/nvmem-provider.h @@ -70,7 +70,8 @@ struct nvmem_keepout { * @word_size: Minimum read/write access granularity. * @stride: Minimum read/write access stride. * @priv: User context passed to read/write callbacks. - * @wp-gpio: Write protect pin + * @wp-gpio: Write protect pin + * @ignore_wp: Write Protect pin is managed by the provider. * * Note: A default "nvmem" name will be assigned to the device if * no name is specified in its configuration. In such case "" is @@ -92,6 +93,7 @@ struct nvmem_config { enum nvmem_type type; bool read_only; bool root_only; + bool ignore_wp; struct device_node *of_node; bool no_of_node; nvmem_reg_read_t reg_read; -- cgit From 190fae468592bc2f0efc8b928920f8f712b5831e Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Sun, 20 Feb 2022 15:15:15 +0000 Subject: nvmem: core: Remove unused devm_nvmem_unregister() There are no users and seems no will come of the devm_nvmem_unregister(). Remove the function and remove the unused devm_nvmem_match() along with it. Signed-off-by: Andy Shevchenko Signed-off-by: Srinivas Kandagatla Link: https://lore.kernel.org/r/20220220151527.17216-2-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/nvmem-provider.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nvmem-provider.h b/include/linux/nvmem-provider.h index 98efb7b5660d..99c01c43d7a8 100644 --- a/include/linux/nvmem-provider.h +++ b/include/linux/nvmem-provider.h @@ -133,8 +133,6 @@ void nvmem_unregister(struct nvmem_device *nvmem); struct nvmem_device *devm_nvmem_register(struct device *dev, const struct nvmem_config *cfg); -int devm_nvmem_unregister(struct device *dev, struct nvmem_device *nvmem); - void nvmem_add_cell_table(struct nvmem_cell_table *table); void nvmem_del_cell_table(struct nvmem_cell_table *table); @@ -153,12 +151,6 @@ devm_nvmem_register(struct device *dev, const struct nvmem_config *c) return nvmem_register(c); } -static inline int -devm_nvmem_unregister(struct device *dev, struct nvmem_device *nvmem) -{ - return -EOPNOTSUPP; -} - static inline void nvmem_add_cell_table(struct nvmem_cell_table *table) {} static inline void nvmem_del_cell_table(struct nvmem_cell_table *table) {} -- cgit From 04ec96b768c9dd43946b047c3da60dcc66431370 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Wed, 9 Feb 2022 14:43:25 +0100 Subject: random: make more consistent use of integer types We've been using a flurry of int, unsigned int, size_t, and ssize_t. Let's unify all of this into size_t where it makes sense, as it does in most places, and leave ssize_t for return values with possible errors. In addition, keeping with the convention of other functions in this file, functions that are dealing with raw bytes now take void * consistently instead of a mix of that and u8 *, because much of the time we're actually passing some other structure that is then interpreted as bytes by the function. We also take the opportunity to fix the outdated and incorrect comment in get_random_bytes_arch(). Cc: Theodore Ts'o Reviewed-by: Dominik Brodowski Reviewed-by: Jann Horn Reviewed-by: Eric Biggers Signed-off-by: Jason A. Donenfeld --- include/linux/hw_random.h | 2 +- include/linux/random.h | 10 +++++----- 2 files changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h index 8e6dd908da21..1a9fc38f8938 100644 --- a/include/linux/hw_random.h +++ b/include/linux/hw_random.h @@ -61,6 +61,6 @@ extern int devm_hwrng_register(struct device *dev, struct hwrng *rng); extern void hwrng_unregister(struct hwrng *rng); extern void devm_hwrng_unregister(struct device *dve, struct hwrng *rng); /** Feed random bits into the pool. */ -extern void add_hwgenerator_randomness(const char *buffer, size_t count, size_t entropy); +extern void add_hwgenerator_randomness(const void *buffer, size_t count, size_t entropy); #endif /* LINUX_HWRANDOM_H_ */ diff --git a/include/linux/random.h b/include/linux/random.h index c45b2693e51f..e92efb39779c 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -20,8 +20,8 @@ struct random_ready_callback { struct module *owner; }; -extern void add_device_randomness(const void *, unsigned int); -extern void add_bootloader_randomness(const void *, unsigned int); +extern void add_device_randomness(const void *, size_t); +extern void add_bootloader_randomness(const void *, size_t); #if defined(LATENT_ENTROPY_PLUGIN) && !defined(__CHECKER__) static inline void add_latent_entropy(void) @@ -37,13 +37,13 @@ extern void add_input_randomness(unsigned int type, unsigned int code, unsigned int value) __latent_entropy; extern void add_interrupt_randomness(int irq) __latent_entropy; -extern void get_random_bytes(void *buf, int nbytes); +extern void get_random_bytes(void *buf, size_t nbytes); extern int wait_for_random_bytes(void); extern int __init rand_initialize(void); extern bool rng_is_initialized(void); extern int add_random_ready_callback(struct random_ready_callback *rdy); extern void del_random_ready_callback(struct random_ready_callback *rdy); -extern int __must_check get_random_bytes_arch(void *buf, int nbytes); +extern size_t __must_check get_random_bytes_arch(void *buf, size_t nbytes); #ifndef MODULE extern const struct file_operations random_fops, urandom_fops; @@ -87,7 +87,7 @@ static inline unsigned long get_random_canary(void) /* Calls wait_for_random_bytes() and then calls get_random_bytes(buf, nbytes). * Returns the result of the call to wait_for_random_bytes. */ -static inline int get_random_bytes_wait(void *buf, int nbytes) +static inline int get_random_bytes_wait(void *buf, size_t nbytes) { int ret = wait_for_random_bytes(); get_random_bytes(buf, nbytes); -- cgit From 6071a6c0fba2d747742cadcbb3ba26ed756ed73b Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Fri, 11 Feb 2022 12:28:33 +0100 Subject: random: remove useless header comment This really adds nothing at all useful. Cc: Theodore Ts'o Reviewed-by: Dominik Brodowski Reviewed-by: Eric Biggers Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index e92efb39779c..37e1e8c43d7e 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -1,9 +1,5 @@ /* SPDX-License-Identifier: GPL-2.0 */ -/* - * include/linux/random.h - * - * Include file for the random number generator. - */ + #ifndef _LINUX_RANDOM_H #define _LINUX_RANDOM_H -- cgit From b777c38239fec5a528e59f55b379e31b1a187524 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Sun, 13 Feb 2022 16:17:01 +0100 Subject: random: pull add_hwgenerator_randomness() declaration into random.h add_hwgenerator_randomness() is a function implemented and documented inside of random.c. It is the way that hardware RNGs push data into it. Therefore, it should be declared in random.h. Otherwise sparse complains with: random.c:1137:6: warning: symbol 'add_hwgenerator_randomness' was not declared. Should it be static? The alternative would be to include hw_random.h into random.c, but that wouldn't really be good for anything except slowing down compile time. Cc: Matt Mackall Cc: Theodore Ts'o Acked-by: Herbert Xu Reviewed-by: Eric Biggers Reviewed-by: Dominik Brodowski Signed-off-by: Jason A. Donenfeld --- include/linux/hw_random.h | 2 -- include/linux/random.h | 2 ++ 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h index 1a9fc38f8938..aa1d4da03538 100644 --- a/include/linux/hw_random.h +++ b/include/linux/hw_random.h @@ -60,7 +60,5 @@ extern int devm_hwrng_register(struct device *dev, struct hwrng *rng); /** Unregister a Hardware Random Number Generator driver. */ extern void hwrng_unregister(struct hwrng *rng); extern void devm_hwrng_unregister(struct device *dve, struct hwrng *rng); -/** Feed random bits into the pool. */ -extern void add_hwgenerator_randomness(const void *buffer, size_t count, size_t entropy); #endif /* LINUX_HWRANDOM_H_ */ diff --git a/include/linux/random.h b/include/linux/random.h index 37e1e8c43d7e..d7354de9351e 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -32,6 +32,8 @@ static inline void add_latent_entropy(void) {} extern void add_input_randomness(unsigned int type, unsigned int code, unsigned int value) __latent_entropy; extern void add_interrupt_randomness(int irq) __latent_entropy; +extern void add_hwgenerator_randomness(const void *buffer, size_t count, + size_t entropy); extern void get_random_bytes(void *buf, size_t nbytes); extern int wait_for_random_bytes(void); -- cgit From 3191dd5a1179ef0fad5a050a1702ae98b6251e8f Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Sun, 13 Feb 2022 22:48:04 +0100 Subject: random: clear fast pool, crng, and batches in cpuhp bring up For the irq randomness fast pool, rather than having to use expensive atomics, which were visibly the most expensive thing in the entire irq handler, simply take care of the extreme edge case of resetting count to zero in the cpuhp online handler, just after workqueues have been reenabled. This simplifies the code a bit and lets us use vanilla variables rather than atomics, and performance should be improved. As well, very early on when the CPU comes up, while interrupts are still disabled, we clear out the per-cpu crng and its batches, so that it always starts with fresh randomness. Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Theodore Ts'o Cc: Sultan Alsawaf Cc: Dominik Brodowski Acked-by: Sebastian Andrzej Siewior Signed-off-by: Jason A. Donenfeld --- include/linux/cpuhotplug.h | 2 ++ include/linux/random.h | 5 +++++ 2 files changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index 411a428ace4d..481e565cc5c4 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -100,6 +100,7 @@ enum cpuhp_state { CPUHP_AP_ARM_CACHE_B15_RAC_DEAD, CPUHP_PADATA_DEAD, CPUHP_AP_DTPM_CPU_DEAD, + CPUHP_RANDOM_PREPARE, CPUHP_WORKQUEUE_PREP, CPUHP_POWER_NUMA_PREPARE, CPUHP_HRTIMERS_PREPARE, @@ -240,6 +241,7 @@ enum cpuhp_state { CPUHP_AP_PERF_CSKY_ONLINE, CPUHP_AP_WATCHDOG_ONLINE, CPUHP_AP_WORKQUEUE_ONLINE, + CPUHP_AP_RANDOM_ONLINE, CPUHP_AP_RCUTREE_ONLINE, CPUHP_AP_BASE_CACHEINFO_ONLINE, CPUHP_AP_ONLINE_DYN, diff --git a/include/linux/random.h b/include/linux/random.h index d7354de9351e..6148b8d1ccf3 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -156,4 +156,9 @@ static inline bool __init arch_get_random_long_early(unsigned long *v) } #endif +#ifdef CONFIG_SMP +extern int random_prepare_cpu(unsigned int cpu); +extern int random_online_cpu(unsigned int cpu); +#endif + #endif /* _LINUX_RANDOM_H */ -- cgit From 0fbb4d93b38bce1f8235aacfa37e90ad8f011473 Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Thu, 17 Feb 2022 23:40:32 -0500 Subject: dm: add dm_submit_bio_remap interface Where possible, switch from early bio-based IO accounting (at the time DM clones each incoming bio) to late IO accounting just before each remapped bio is issued to underlying device via submit_bio_noacct(). Allows more precise bio-based IO accounting for DM targets that use their own workqueues to perform additional processing of each bio in conjunction with their DM_MAPIO_SUBMITTED return from their map function. When a target is updated to use dm_submit_bio_remap() they must also set ti->accounts_remapped_io to true. Use xchg() in start_io_acct(), as suggested by Mikulas, to ensure each IO is only started once. The xchg race only happens if __send_duplicate_bios() sends multiple bios -- that case is reflected via tio->is_duplicate_bio. Given the niche nature of this race, it is best to avoid any xchg performance penalty for normal IO. For IO that was never submitted with dm_bio_submit_remap(), but the target completes the clone with bio_endio, accounting is started then ended and pending_io counter decremented. Reviewed-by: Mikulas Patocka Signed-off-by: Mike Snitzer --- include/linux/device-mapper.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index b26fecf6c8e8..7752d14d13f8 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -362,6 +362,12 @@ struct dm_target { * zone append operations using regular writes. */ bool emulate_zone_append:1; + + /* + * Set if the target will submit IO using dm_submit_bio_remap() + * after returning DM_MAPIO_SUBMITTED from its map function. + */ + bool accounts_remapped_io:1; }; void *dm_per_bio_data(struct bio *bio, size_t data_size); @@ -465,6 +471,7 @@ int dm_suspended(struct dm_target *ti); int dm_post_suspending(struct dm_target *ti); int dm_noflush_suspending(struct dm_target *ti); void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors); +void dm_submit_bio_remap(struct bio *clone, struct bio *tgt_clone, bool from_wq); union map_info *dm_get_rq_mapinfo(struct request *rq); #ifdef CONFIG_BLK_DEV_ZONED -- cgit From a8b9d116cda047f38ba29ce26df49c57479ffaa2 Mon Sep 17 00:00:00 2001 From: Tom Rix Date: Wed, 26 Jan 2022 05:18:26 -0800 Subject: dm: cleanup double word in comment Remove the second 'a'. Signed-off-by: Tom Rix Signed-off-by: Mike Snitzer --- include/linux/device-mapper.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index 7752d14d13f8..70cd9449275a 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -358,7 +358,7 @@ struct dm_target { bool limit_swap_bios:1; /* - * Set if this target implements a a zoned device and needs emulation of + * Set if this target implements a zoned device and needs emulation of * zone append operations using regular writes. */ bool emulate_zone_append:1; -- cgit From e0891269a8c25715bd9510dc355326b00ab42db2 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Wed, 16 Feb 2022 16:22:26 +0000 Subject: linkage: add SYM_FUNC_ALIAS{,_LOCAL,_WEAK}() Currently aliasing an asm function requires adding START and END annotations for each name, as per Documentation/asm-annotations.rst: SYM_FUNC_START_ALIAS(__memset) SYM_FUNC_START(memset) ... asm insns ... SYM_FUNC_END(memset) SYM_FUNC_END_ALIAS(__memset) This is more painful than necessary to maintain, especially where a function has many aliases, some of which we may wish to define conditionally. For example, arm64's memcpy/memmove implementation (which uses some arch-specific SYM_*() helpers) has: SYM_FUNC_START_ALIAS(__memmove) SYM_FUNC_START_ALIAS_WEAK_PI(memmove) SYM_FUNC_START_ALIAS(__memcpy) SYM_FUNC_START_WEAK_PI(memcpy) ... asm insns ... SYM_FUNC_END_PI(memcpy) EXPORT_SYMBOL(memcpy) SYM_FUNC_END_ALIAS(__memcpy) EXPORT_SYMBOL(__memcpy) SYM_FUNC_END_ALIAS_PI(memmove) EXPORT_SYMBOL(memmove) SYM_FUNC_END_ALIAS(__memmove) EXPORT_SYMBOL(__memmove) SYM_FUNC_START(name) It would be much nicer if we could define the aliases *after* the standard function definition. This would avoid the need to specify each symbol name twice, and would make it easier to spot the canonical function definition. This patch adds new macros to allow us to do so, which allows the above example to be rewritten more succinctly as: SYM_FUNC_START(__pi_memcpy) ... asm insns ... SYM_FUNC_END(__pi_memcpy) SYM_FUNC_ALIAS(__memcpy, __pi_memcpy) EXPORT_SYMBOL(__memcpy) SYM_FUNC_ALIAS_WEAK(memcpy, __memcpy) EXPORT_SYMBOL(memcpy) SYM_FUNC_ALIAS(__pi_memmove, __pi_memcpy) SYM_FUNC_ALIAS(__memmove, __pi_memmove) EXPORT_SYMBOL(__memmove) SYM_FUNC_ALIAS_WEAK(memmove, __memmove) EXPORT_SYMBOL(memmove) The reduction in duplication will also make it possible to replace some uses of WEAK with more accurate Kconfig guards, e.g. #ifndef CONFIG_KASAN SYM_FUNC_ALIAS(memmove, __memmove) EXPORT_SYMBOL(memmove) #endif ... which should make it easier to ensure that symbols are neither used nor overidden unexpectedly. The existing SYM_FUNC_START_ALIAS() and SYM_FUNC_START_LOCAL_ALIAS() are marked as deprecated, and will be removed once existing users are moved over to the new scheme. The tools/perf/ copy of linkage.h is updated to match. A subsequent patch will depend upon this when updating the x86 asm annotations. Signed-off-by: Mark Rutland Acked-by: Ard Biesheuvel Acked-by: Josh Poimboeuf Acked-by: Mark Brown Cc: Arnaldo Carvalho de Melo Cc: Borislav Petkov Cc: Jiri Slaby Cc: Peter Zijlstra Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220216162229.1076788-2-mark.rutland@arm.com Signed-off-by: Will Deacon --- include/linux/linkage.h | 37 ++++++++++++++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/linkage.h b/include/linux/linkage.h index dbf8506decca..e574a84d8b11 100644 --- a/include/linux/linkage.h +++ b/include/linux/linkage.h @@ -165,7 +165,18 @@ #ifndef SYM_END #define SYM_END(name, sym_type) \ .type name sym_type ASM_NL \ - .size name, .-name + .set .L__sym_size_##name, .-name ASM_NL \ + .size name, .L__sym_size_##name +#endif + +/* SYM_ALIAS -- use only if you have to */ +#ifndef SYM_ALIAS +#define SYM_ALIAS(alias, name, sym_type, linkage) \ + linkage(alias) ASM_NL \ + .set alias, name ASM_NL \ + .type alias sym_type ASM_NL \ + .set .L__sym_size_##alias, .L__sym_size_##name ASM_NL \ + .size alias, .L__sym_size_##alias #endif /* === code annotations === */ @@ -275,6 +286,30 @@ SYM_END(name, SYM_T_FUNC) #endif +/* + * SYM_FUNC_ALIAS -- define a global alias for an existing function + */ +#ifndef SYM_FUNC_ALIAS +#define SYM_FUNC_ALIAS(alias, name) \ + SYM_ALIAS(alias, name, SYM_T_FUNC, SYM_L_GLOBAL) +#endif + +/* + * SYM_FUNC_ALIAS_LOCAL -- define a local alias for an existing function + */ +#ifndef SYM_FUNC_ALIAS_LOCAL +#define SYM_FUNC_ALIAS_LOCAL(alias, name) \ + SYM_ALIAS(alias, name, SYM_T_FUNC, SYM_L_LOCAL) +#endif + +/* + * SYM_FUNC_ALIAS_WEAK -- define a weak global alias for an existing function + */ +#ifndef SYM_FUNC_ALIAS_WEAK +#define SYM_FUNC_ALIAS_WEAK(alias, name) \ + SYM_ALIAS(alias, name, SYM_T_FUNC, SYM_L_WEAK) +#endif + /* SYM_CODE_START -- use for non-C (special) functions */ #ifndef SYM_CODE_START #define SYM_CODE_START(name) \ -- cgit From be9aea74400433e03c2a8b0260fc9ffe2495f698 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Wed, 16 Feb 2022 16:22:29 +0000 Subject: linkage: remove SYM_FUNC_{START,END}_ALIAS() Now that all aliases are defined using SYM_FUNC_ALIAS(), remove the old SYM_FUNC_{START,END}_ALIAS() macros. There should be no functional change as a result of this patch. Signed-off-by: Mark Rutland Acked-by: Ard Biesheuvel Acked-by: Josh Poimboeuf Acked-by: Mark Brown Cc: Arnaldo Carvalho de Melo Cc: Borislav Petkov Cc: Jiri Slaby Cc: Peter Zijlstra Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220216162229.1076788-5-mark.rutland@arm.com Signed-off-by: Will Deacon --- include/linux/linkage.h | 30 ------------------------------ 1 file changed, 30 deletions(-) (limited to 'include/linux') diff --git a/include/linux/linkage.h b/include/linux/linkage.h index e574a84d8b11..acb1ad2356f1 100644 --- a/include/linux/linkage.h +++ b/include/linux/linkage.h @@ -211,30 +211,8 @@ SYM_ENTRY(name, linkage, SYM_A_NONE) #endif -/* - * SYM_FUNC_START_LOCAL_ALIAS -- use where there are two local names for one - * function - */ -#ifndef SYM_FUNC_START_LOCAL_ALIAS -#define SYM_FUNC_START_LOCAL_ALIAS(name) \ - SYM_START(name, SYM_L_LOCAL, SYM_A_ALIGN) -#endif - -/* - * SYM_FUNC_START_ALIAS -- use where there are two global names for one - * function - */ -#ifndef SYM_FUNC_START_ALIAS -#define SYM_FUNC_START_ALIAS(name) \ - SYM_START(name, SYM_L_GLOBAL, SYM_A_ALIGN) -#endif - /* SYM_FUNC_START -- use for global functions */ #ifndef SYM_FUNC_START -/* - * The same as SYM_FUNC_START_ALIAS, but we will need to distinguish these two - * later. - */ #define SYM_FUNC_START(name) \ SYM_START(name, SYM_L_GLOBAL, SYM_A_ALIGN) #endif @@ -247,7 +225,6 @@ /* SYM_FUNC_START_LOCAL -- use for local functions */ #ifndef SYM_FUNC_START_LOCAL -/* the same as SYM_FUNC_START_LOCAL_ALIAS, see comment near SYM_FUNC_START */ #define SYM_FUNC_START_LOCAL(name) \ SYM_START(name, SYM_L_LOCAL, SYM_A_ALIGN) #endif @@ -270,18 +247,11 @@ SYM_START(name, SYM_L_WEAK, SYM_A_NONE) #endif -/* SYM_FUNC_END_ALIAS -- the end of LOCAL_ALIASed or ALIASed function */ -#ifndef SYM_FUNC_END_ALIAS -#define SYM_FUNC_END_ALIAS(name) \ - SYM_END(name, SYM_T_FUNC) -#endif - /* * SYM_FUNC_END -- the end of SYM_FUNC_START_LOCAL, SYM_FUNC_START, * SYM_FUNC_START_WEAK, ... */ #ifndef SYM_FUNC_END -/* the same as SYM_FUNC_END_ALIAS, see comment near SYM_FUNC_START */ #define SYM_FUNC_END(name) \ SYM_END(name, SYM_T_FUNC) #endif -- cgit From 6e031ec0e5a2dda53e12e0d2a7e9b15b47a3c502 Mon Sep 17 00:00:00 2001 From: Alex Williamson Date: Mon, 24 Jan 2022 16:11:37 -0700 Subject: vfio/pci: Stub vfio_pci_vga_rw when !CONFIG_VFIO_PCI_VGA Resolve build errors reported against UML build for undefined ioport_map() and ioport_unmap() functions. Without this config option a device cannot have vfio_pci_core_device.has_vga set, so the existing function would always return -EINVAL anyway. Reported-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/20220123125737.2658758-1-geert@linux-m68k.org Link: https://lore.kernel.org/r/164306582968.3758255.15192949639574660648.stgit@omen Signed-off-by: Alex Williamson --- include/linux/vfio_pci_core.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index ef9a44b6cf5d..ae6f4838ab75 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -159,8 +159,17 @@ extern ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, extern ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, size_t count, loff_t *ppos, bool iswrite); +#ifdef CONFIG_VFIO_PCI_VGA extern ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf, size_t count, loff_t *ppos, bool iswrite); +#else +static inline ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, + char __user *buf, size_t count, + loff_t *ppos, bool iswrite) +{ + return -EINVAL; +} +#endif extern long vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset, uint64_t data, int count, int fd); -- cgit From 1a03d3f13ffe5dd24142d6db629e72c11b704d99 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Thu, 17 Feb 2022 11:24:04 +0100 Subject: fork: Move task stack accounting to do_exit() There is no need to perform the stack accounting of the outgoing task in its final schedule() invocation which happens with preemption disabled. The task is leaving, the resources will be freed and the accounting can happen in do_exit() before the actual schedule invocation which frees the stack memory. Move the accounting of the stack memory from release_task_stack() to exit_task_stack_account() which then can be invoked from do_exit(). Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Acked-by: Andy Lutomirski Link: https://lore.kernel.org/r/20220217102406.3697941-7-bigeasy@linutronix.de --- include/linux/sched/task_stack.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_stack.h index d10150587d81..892562ebbd3a 100644 --- a/include/linux/sched/task_stack.h +++ b/include/linux/sched/task_stack.h @@ -79,6 +79,8 @@ static inline void *try_get_task_stack(struct task_struct *tsk) static inline void put_task_stack(struct task_struct *tsk) {} #endif +void exit_task_stack_account(struct task_struct *tsk); + #define task_stack_end_corrupted(task) \ (*(end_of_stack(task)) != STACK_END_MAGIC) -- cgit From f9b5e46f4097eb298f68e5b02f70697a90a44739 Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Fri, 18 Feb 2022 17:29:44 -0800 Subject: kasan: split kasan_*enabled() functions into a separate header In an upcoming commit we are going to need to call kasan_hw_tags_enabled() from arch/arm64/include/asm/mte.h. This would create a circular dependency between headers if KASAN_GENERIC or KASAN_SW_TAGS is enabled: linux/kasan.h -> linux/pgtable.h -> asm/pgtable.h -> asm/mte.h -> linux/kasan.h. Break the cycle by introducing a new header linux/kasan-enabled.h with the kasan_*enabled() functions that can be included from asm/mte.h. Link: https://linux-review.googlesource.com/id/I5b0d96c6ed0026fc790899e14d42b2fac6ab568e Signed-off-by: Peter Collingbourne Reviewed-by: Andrey Konovalov Link: https://lore.kernel.org/r/20220219012945.894950-1-pcc@google.com Signed-off-by: Will Deacon --- include/linux/kasan-enabled.h | 33 +++++++++++++++++++++++++++++++++ include/linux/kasan.h | 23 +---------------------- 2 files changed, 34 insertions(+), 22 deletions(-) create mode 100644 include/linux/kasan-enabled.h (limited to 'include/linux') diff --git a/include/linux/kasan-enabled.h b/include/linux/kasan-enabled.h new file mode 100644 index 000000000000..4b6615375022 --- /dev/null +++ b/include/linux/kasan-enabled.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_KASAN_ENABLED_H +#define _LINUX_KASAN_ENABLED_H + +#ifdef CONFIG_KASAN_HW_TAGS + +DECLARE_STATIC_KEY_FALSE(kasan_flag_enabled); + +static __always_inline bool kasan_enabled(void) +{ + return static_branch_likely(&kasan_flag_enabled); +} + +static inline bool kasan_hw_tags_enabled(void) +{ + return kasan_enabled(); +} + +#else /* CONFIG_KASAN_HW_TAGS */ + +static inline bool kasan_enabled(void) +{ + return IS_ENABLED(CONFIG_KASAN); +} + +static inline bool kasan_hw_tags_enabled(void) +{ + return false; +} + +#endif /* CONFIG_KASAN_HW_TAGS */ + +#endif /* LINUX_KASAN_ENABLED_H */ diff --git a/include/linux/kasan.h b/include/linux/kasan.h index 4a45562d8893..b6a93261c92a 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -3,6 +3,7 @@ #define _LINUX_KASAN_H #include +#include #include #include #include @@ -83,33 +84,11 @@ static inline void kasan_disable_current(void) {} #ifdef CONFIG_KASAN_HW_TAGS -DECLARE_STATIC_KEY_FALSE(kasan_flag_enabled); - -static __always_inline bool kasan_enabled(void) -{ - return static_branch_likely(&kasan_flag_enabled); -} - -static inline bool kasan_hw_tags_enabled(void) -{ - return kasan_enabled(); -} - void kasan_alloc_pages(struct page *page, unsigned int order, gfp_t flags); void kasan_free_pages(struct page *page, unsigned int order); #else /* CONFIG_KASAN_HW_TAGS */ -static inline bool kasan_enabled(void) -{ - return IS_ENABLED(CONFIG_KASAN); -} - -static inline bool kasan_hw_tags_enabled(void) -{ - return false; -} - static __always_inline void kasan_alloc_pages(struct page *page, unsigned int order, gfp_t flags) { -- cgit From a773187e37fa5c3bcd92ffda9085707df34f903a Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 9 Feb 2022 09:28:27 +0100 Subject: scsi: dm: Remove WRITE_SAME support There are no more end-users of REQ_OP_WRITE_SAME left, so we can start deleting it. Link: https://lore.kernel.org/r/20220209082828.2629273-7-hch@lst.de Reviewed-by: Mike Snitzer Signed-off-by: Christoph Hellwig Signed-off-by: Martin K. Petersen --- include/linux/device-mapper.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index b26fecf6c8e8..721db99f4f63 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -316,12 +316,6 @@ struct dm_target { */ unsigned num_secure_erase_bios; - /* - * The number of WRITE SAME bios that will be submitted to the target. - * The bio number can be accessed with dm_bio_get_target_bio_nr. - */ - unsigned num_write_same_bios; - /* * The number of WRITE ZEROES bios that will be submitted to the target. * The bio number can be accessed with dm_bio_get_target_bio_nr. -- cgit From 73bd66d9c834220579c881a3eb020fd8917075d8 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 9 Feb 2022 09:28:28 +0100 Subject: scsi: block: Remove REQ_OP_WRITE_SAME support No more users of REQ_OP_WRITE_SAME or drivers implementing it are left, so remove the infrastructure. [mkp: fold in and tweak sysfs reporting fix] Link: https://lore.kernel.org/r/20220209082828.2629273-8-hch@lst.de Reviewed-by: Chaitanya Kulkarni Signed-off-by: Christoph Hellwig Signed-off-by: Martin K. Petersen --- include/linux/bio.h | 3 --- include/linux/blk_types.h | 2 -- include/linux/blkdev.h | 19 ------------------- 3 files changed, 24 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index 117d7f248ac9..eb402afa370a 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -65,7 +65,6 @@ static inline bool bio_no_advance_iter(const struct bio *bio) { return bio_op(bio) == REQ_OP_DISCARD || bio_op(bio) == REQ_OP_SECURE_ERASE || - bio_op(bio) == REQ_OP_WRITE_SAME || bio_op(bio) == REQ_OP_WRITE_ZEROES; } @@ -186,8 +185,6 @@ static inline unsigned bio_segments(struct bio *bio) case REQ_OP_SECURE_ERASE: case REQ_OP_WRITE_ZEROES: return 0; - case REQ_OP_WRITE_SAME: - return 1; default: break; } diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index fe065c394fff..077adf4b6e73 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -354,8 +354,6 @@ enum req_opf { REQ_OP_DISCARD = 3, /* securely erase sectors */ REQ_OP_SECURE_ERASE = 5, - /* write the same sector many times */ - REQ_OP_WRITE_SAME = 7, /* write the zero filled sector many times */ REQ_OP_WRITE_ZEROES = 9, /* Open a zone */ diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 9c95df26fc26..b10470d5e986 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -97,7 +97,6 @@ struct queue_limits { unsigned int io_opt; unsigned int max_discard_sectors; unsigned int max_hw_discard_sectors; - unsigned int max_write_same_sectors; unsigned int max_write_zeroes_sectors; unsigned int max_zone_append_sectors; unsigned int discard_granularity; @@ -651,9 +650,6 @@ static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q, return min(q->limits.max_discard_sectors, UINT_MAX >> SECTOR_SHIFT); - if (unlikely(op == REQ_OP_WRITE_SAME)) - return q->limits.max_write_same_sectors; - if (unlikely(op == REQ_OP_WRITE_ZEROES)) return q->limits.max_write_zeroes_sectors; @@ -696,8 +692,6 @@ extern void blk_queue_max_discard_segments(struct request_queue *, extern void blk_queue_max_segment_size(struct request_queue *, unsigned int); extern void blk_queue_max_discard_sectors(struct request_queue *q, unsigned int max_discard_sectors); -extern void blk_queue_max_write_same_sectors(struct request_queue *q, - unsigned int max_write_same_sectors); extern void blk_queue_max_write_zeroes_sectors(struct request_queue *q, unsigned int max_write_same_sectors); extern void blk_queue_logical_block_size(struct request_queue *, unsigned int); @@ -842,9 +836,6 @@ static inline long nr_blockdev_pages(void) extern void blk_io_schedule(void); -extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector, - sector_t nr_sects, gfp_t gfp_mask, struct page *page); - #define BLKDEV_DISCARD_SECURE (1 << 0) /* issue a secure erase */ extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector, @@ -1071,16 +1062,6 @@ static inline int bdev_discard_alignment(struct block_device *bdev) return q->limits.discard_alignment; } -static inline unsigned int bdev_write_same(struct block_device *bdev) -{ - struct request_queue *q = bdev_get_queue(bdev); - - if (q) - return q->limits.max_write_same_sectors; - - return 0; -} - static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); -- cgit From 763087dab97547230a6807c865a6a5ae53a59247 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 21 Feb 2022 19:21:12 -0800 Subject: net: add skb_set_end_offset() helper We have multiple places where this helper is convenient, and plan using it in the following patch. Signed-off-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a3e90efe6586..115be7f73487 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1536,6 +1536,11 @@ static inline unsigned int skb_end_offset(const struct sk_buff *skb) { return skb->end; } + +static inline void skb_set_end_offset(struct sk_buff *skb, unsigned int offset) +{ + skb->end = offset; +} #else static inline unsigned char *skb_end_pointer(const struct sk_buff *skb) { @@ -1546,6 +1551,11 @@ static inline unsigned int skb_end_offset(const struct sk_buff *skb) { return skb->end - skb->head; } + +static inline void skb_set_end_offset(struct sk_buff *skb, unsigned int offset) +{ + skb->end = skb->head + offset; +} #endif /* Internal */ -- cgit From 2b88cba55883eaafbc9b7cbff0b2c7cdba71ed01 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 21 Feb 2022 19:21:13 -0800 Subject: net: preserve skb_end_offset() in skb_unclone_keeptruesize() syzbot found another way to trigger the infamous WARN_ON_ONCE(delta < len) in skb_try_coalesce() [1] I was able to root cause the issue to kfence. When kfence is in action, the following assertion is no longer true: int size = xxxx; void *ptr1 = kmalloc(size, gfp); void *ptr2 = kmalloc(size, gfp); if (ptr1 && ptr2) ASSERT(ksize(ptr1) == ksize(ptr2)); We attempted to fix these issues in the blamed commits, but forgot that TCP was possibly shifting data after skb_unclone_keeptruesize() has been used, notably from tcp_retrans_try_collapse(). So we not only need to keep same skb->truesize value, we also need to make sure TCP wont fill new tailroom that pskb_expand_head() was able to get from a addr = kmalloc(...) followed by ksize(addr) Split skb_unclone_keeptruesize() into two parts: 1) Inline skb_unclone_keeptruesize() for the common case, when skb is not cloned. 2) Out of line __skb_unclone_keeptruesize() for the 'slow path'. WARNING: CPU: 1 PID: 6490 at net/core/skbuff.c:5295 skb_try_coalesce+0x1235/0x1560 net/core/skbuff.c:5295 Modules linked in: CPU: 1 PID: 6490 Comm: syz-executor161 Not tainted 5.17.0-rc4-syzkaller-00229-g4f12b742eb2b #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:skb_try_coalesce+0x1235/0x1560 net/core/skbuff.c:5295 Code: bf 01 00 00 00 0f b7 c0 89 c6 89 44 24 20 e8 62 24 4e fa 8b 44 24 20 83 e8 01 0f 85 e5 f0 ff ff e9 87 f4 ff ff e8 cb 20 4e fa <0f> 0b e9 06 f9 ff ff e8 af b2 95 fa e9 69 f0 ff ff e8 95 b2 95 fa RSP: 0018:ffffc900063af268 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 00000000ffffffd5 RCX: 0000000000000000 RDX: ffff88806fc05700 RSI: ffffffff872abd55 RDI: 0000000000000003 RBP: ffff88806e675500 R08: 00000000ffffffd5 R09: 0000000000000000 R10: ffffffff872ab659 R11: 0000000000000000 R12: ffff88806dd554e8 R13: ffff88806dd9bac0 R14: ffff88806dd9a2c0 R15: 0000000000000155 FS: 00007f18014f9700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020002000 CR3: 000000006be7a000 CR4: 00000000003506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tcp_try_coalesce net/ipv4/tcp_input.c:4651 [inline] tcp_try_coalesce+0x393/0x920 net/ipv4/tcp_input.c:4630 tcp_queue_rcv+0x8a/0x6e0 net/ipv4/tcp_input.c:4914 tcp_data_queue+0x11fd/0x4bb0 net/ipv4/tcp_input.c:5025 tcp_rcv_established+0x81e/0x1ff0 net/ipv4/tcp_input.c:5947 tcp_v4_do_rcv+0x65e/0x980 net/ipv4/tcp_ipv4.c:1719 sk_backlog_rcv include/net/sock.h:1037 [inline] __release_sock+0x134/0x3b0 net/core/sock.c:2779 release_sock+0x54/0x1b0 net/core/sock.c:3311 sk_wait_data+0x177/0x450 net/core/sock.c:2821 tcp_recvmsg_locked+0xe28/0x1fd0 net/ipv4/tcp.c:2457 tcp_recvmsg+0x137/0x610 net/ipv4/tcp.c:2572 inet_recvmsg+0x11b/0x5e0 net/ipv4/af_inet.c:850 sock_recvmsg_nosec net/socket.c:948 [inline] sock_recvmsg net/socket.c:966 [inline] sock_recvmsg net/socket.c:962 [inline] ____sys_recvmsg+0x2c4/0x600 net/socket.c:2632 ___sys_recvmsg+0x127/0x200 net/socket.c:2674 __sys_recvmsg+0xe2/0x1a0 net/socket.c:2704 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae Fixes: c4777efa751d ("net: add and use skb_unclone_keeptruesize() helper") Fixes: 097b9146c0e2 ("net: fix up truesize of cloned skb in skb_prepare_for_shift()") Reported-by: syzbot Signed-off-by: Eric Dumazet Cc: Marco Elver Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 115be7f73487..31be38078918 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1795,19 +1795,19 @@ static inline int skb_unclone(struct sk_buff *skb, gfp_t pri) return 0; } -/* This variant of skb_unclone() makes sure skb->truesize is not changed */ +/* This variant of skb_unclone() makes sure skb->truesize + * and skb_end_offset() are not changed, whenever a new skb->head is needed. + * + * Indeed there is no guarantee that ksize(kmalloc(X)) == ksize(kmalloc(X)) + * when various debugging features are in place. + */ +int __skb_unclone_keeptruesize(struct sk_buff *skb, gfp_t pri); static inline int skb_unclone_keeptruesize(struct sk_buff *skb, gfp_t pri) { might_sleep_if(gfpflags_allow_blocking(pri)); - if (skb_cloned(skb)) { - unsigned int save = skb->truesize; - int res; - - res = pskb_expand_head(skb, 0, 0, pri); - skb->truesize = save; - return res; - } + if (skb_cloned(skb)) + return __skb_unclone_keeptruesize(skb, pri); return 0; } -- cgit From d0b9d6dcaa5ac480c272683919f387cc6d82b638 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Thu, 23 Sep 2021 20:42:44 +0200 Subject: sched/headers: Fix header to build standalone: Uses various kernel types that don't build standalone. Signed-off-by: Ingo Molnar Reviewed-by: Peter Zijlstra --- include/linux/sched_clock.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched_clock.h b/include/linux/sched_clock.h index 835ee87ed792..cb41c5edb4d4 100644 --- a/include/linux/sched_clock.h +++ b/include/linux/sched_clock.h @@ -5,6 +5,8 @@ #ifndef LINUX_SCHED_CLOCK #define LINUX_SCHED_CLOCK +#include + #ifdef CONFIG_GENERIC_SCHED_CLOCK /** * struct clock_read_data - data required to read from sched_clock() -- cgit From 669f45f19cf7feea7066dce86a0ce89dfc8a2065 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Tue, 22 Feb 2022 13:23:59 +0100 Subject: sched/headers: Add initial new headers as identity mappings This allows code sharing between fast-headers tree and the vanilla scheduler tree. Signed-off-by: Ingo Molnar Reviewed-by: Peter Zijlstra --- include/linux/cgroup_api.h | 1 + include/linux/cpumask_api.h | 1 + include/linux/fs_api.h | 1 + include/linux/gfp_api.h | 1 + include/linux/hashtable_api.h | 1 + include/linux/hrtimer_api.h | 1 + include/linux/kobject_api.h | 1 + include/linux/kref_api.h | 1 + include/linux/ktime_api.h | 1 + include/linux/llist_api.h | 1 + include/linux/lockdep_api.h | 1 + include/linux/mm_api.h | 1 + include/linux/mutex_api.h | 1 + include/linux/perf_event_api.h | 1 + include/linux/pgtable_api.h | 1 + include/linux/ptrace_api.h | 1 + include/linux/rcuwait_api.h | 1 + include/linux/refcount_api.h | 1 + include/linux/sched/affinity.h | 1 + include/linux/sched/cond_resched.h | 1 + include/linux/sched/posix-timers.h | 1 + include/linux/sched/rseq_api.h | 1 + include/linux/sched/task_flags.h | 1 + include/linux/sched/thread_info_api.h | 1 + include/linux/seqlock_api.h | 1 + include/linux/softirq.h | 1 + include/linux/spinlock_api.h | 1 + include/linux/swait_api.h | 1 + include/linux/syscalls_api.h | 1 + include/linux/u64_stats_sync_api.h | 1 + include/linux/wait_api.h | 1 + include/linux/workqueue_api.h | 1 + 32 files changed, 32 insertions(+) create mode 100644 include/linux/cgroup_api.h create mode 100644 include/linux/cpumask_api.h create mode 100644 include/linux/fs_api.h create mode 100644 include/linux/gfp_api.h create mode 100644 include/linux/hashtable_api.h create mode 100644 include/linux/hrtimer_api.h create mode 100644 include/linux/kobject_api.h create mode 100644 include/linux/kref_api.h create mode 100644 include/linux/ktime_api.h create mode 100644 include/linux/llist_api.h create mode 100644 include/linux/lockdep_api.h create mode 100644 include/linux/mm_api.h create mode 100644 include/linux/mutex_api.h create mode 100644 include/linux/perf_event_api.h create mode 100644 include/linux/pgtable_api.h create mode 100644 include/linux/ptrace_api.h create mode 100644 include/linux/rcuwait_api.h create mode 100644 include/linux/refcount_api.h create mode 100644 include/linux/sched/affinity.h create mode 100644 include/linux/sched/cond_resched.h create mode 100644 include/linux/sched/posix-timers.h create mode 100644 include/linux/sched/rseq_api.h create mode 100644 include/linux/sched/task_flags.h create mode 100644 include/linux/sched/thread_info_api.h create mode 100644 include/linux/seqlock_api.h create mode 100644 include/linux/softirq.h create mode 100644 include/linux/spinlock_api.h create mode 100644 include/linux/swait_api.h create mode 100644 include/linux/syscalls_api.h create mode 100644 include/linux/u64_stats_sync_api.h create mode 100644 include/linux/wait_api.h create mode 100644 include/linux/workqueue_api.h (limited to 'include/linux') diff --git a/include/linux/cgroup_api.h b/include/linux/cgroup_api.h new file mode 100644 index 000000000000..d0cfe8025111 --- /dev/null +++ b/include/linux/cgroup_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/cpumask_api.h b/include/linux/cpumask_api.h new file mode 100644 index 000000000000..83bd3ebe82b0 --- /dev/null +++ b/include/linux/cpumask_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/fs_api.h b/include/linux/fs_api.h new file mode 100644 index 000000000000..83be38d6d413 --- /dev/null +++ b/include/linux/fs_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/gfp_api.h b/include/linux/gfp_api.h new file mode 100644 index 000000000000..5a05a2764a86 --- /dev/null +++ b/include/linux/gfp_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/hashtable_api.h b/include/linux/hashtable_api.h new file mode 100644 index 000000000000..c268ac2c5c0e --- /dev/null +++ b/include/linux/hashtable_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/hrtimer_api.h b/include/linux/hrtimer_api.h new file mode 100644 index 000000000000..8d9700894468 --- /dev/null +++ b/include/linux/hrtimer_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/kobject_api.h b/include/linux/kobject_api.h new file mode 100644 index 000000000000..6e36a054c2d6 --- /dev/null +++ b/include/linux/kobject_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/kref_api.h b/include/linux/kref_api.h new file mode 100644 index 000000000000..d67e554721d2 --- /dev/null +++ b/include/linux/kref_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/ktime_api.h b/include/linux/ktime_api.h new file mode 100644 index 000000000000..f697d493960f --- /dev/null +++ b/include/linux/ktime_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/llist_api.h b/include/linux/llist_api.h new file mode 100644 index 000000000000..625bec0393a1 --- /dev/null +++ b/include/linux/llist_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/lockdep_api.h b/include/linux/lockdep_api.h new file mode 100644 index 000000000000..907e66979ab2 --- /dev/null +++ b/include/linux/lockdep_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/mm_api.h b/include/linux/mm_api.h new file mode 100644 index 000000000000..a5ace2b198b8 --- /dev/null +++ b/include/linux/mm_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/mutex_api.h b/include/linux/mutex_api.h new file mode 100644 index 000000000000..85ab9491e13e --- /dev/null +++ b/include/linux/mutex_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/perf_event_api.h b/include/linux/perf_event_api.h new file mode 100644 index 000000000000..c2fd6048b790 --- /dev/null +++ b/include/linux/perf_event_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/pgtable_api.h b/include/linux/pgtable_api.h new file mode 100644 index 000000000000..ff367a4ba8c4 --- /dev/null +++ b/include/linux/pgtable_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/ptrace_api.h b/include/linux/ptrace_api.h new file mode 100644 index 000000000000..26e7d275ad8d --- /dev/null +++ b/include/linux/ptrace_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/rcuwait_api.h b/include/linux/rcuwait_api.h new file mode 100644 index 000000000000..f962e28544dd --- /dev/null +++ b/include/linux/rcuwait_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/refcount_api.h b/include/linux/refcount_api.h new file mode 100644 index 000000000000..5f032589f568 --- /dev/null +++ b/include/linux/refcount_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/sched/affinity.h b/include/linux/sched/affinity.h new file mode 100644 index 000000000000..227f5be81bcd --- /dev/null +++ b/include/linux/sched/affinity.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/sched/cond_resched.h b/include/linux/sched/cond_resched.h new file mode 100644 index 000000000000..227f5be81bcd --- /dev/null +++ b/include/linux/sched/cond_resched.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/sched/posix-timers.h b/include/linux/sched/posix-timers.h new file mode 100644 index 000000000000..523a381d6c88 --- /dev/null +++ b/include/linux/sched/posix-timers.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/sched/rseq_api.h b/include/linux/sched/rseq_api.h new file mode 100644 index 000000000000..cf2af72693e1 --- /dev/null +++ b/include/linux/sched/rseq_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/sched/task_flags.h b/include/linux/sched/task_flags.h new file mode 100644 index 000000000000..227f5be81bcd --- /dev/null +++ b/include/linux/sched/task_flags.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/sched/thread_info_api.h b/include/linux/sched/thread_info_api.h new file mode 100644 index 000000000000..2c60fbc16c08 --- /dev/null +++ b/include/linux/sched/thread_info_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/seqlock_api.h b/include/linux/seqlock_api.h new file mode 100644 index 000000000000..be91e7d3b826 --- /dev/null +++ b/include/linux/seqlock_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/softirq.h b/include/linux/softirq.h new file mode 100644 index 000000000000..c73d7dcb4cb5 --- /dev/null +++ b/include/linux/softirq.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/spinlock_api.h b/include/linux/spinlock_api.h new file mode 100644 index 000000000000..6338b27f98df --- /dev/null +++ b/include/linux/spinlock_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/swait_api.h b/include/linux/swait_api.h new file mode 100644 index 000000000000..1eeaaaaa5ea7 --- /dev/null +++ b/include/linux/swait_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/syscalls_api.h b/include/linux/syscalls_api.h new file mode 100644 index 000000000000..23e012b04db4 --- /dev/null +++ b/include/linux/syscalls_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/u64_stats_sync_api.h b/include/linux/u64_stats_sync_api.h new file mode 100644 index 000000000000..c72ca63da44b --- /dev/null +++ b/include/linux/u64_stats_sync_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/wait_api.h b/include/linux/wait_api.h new file mode 100644 index 000000000000..4e930548935a --- /dev/null +++ b/include/linux/wait_api.h @@ -0,0 +1 @@ +#include diff --git a/include/linux/workqueue_api.h b/include/linux/workqueue_api.h new file mode 100644 index 000000000000..77debb5d2760 --- /dev/null +++ b/include/linux/workqueue_api.h @@ -0,0 +1 @@ +#include -- cgit From fbed5664b73830dcb1a19af4f7f1f1b424f54609 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Tue, 22 Feb 2022 13:28:20 +0100 Subject: sched/headers: Make the header build standalone This header depends on various scheduler definitions. Signed-off-by: Ingo Molnar Reviewed-by: Peter Zijlstra --- include/linux/sched/deadline.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched/deadline.h b/include/linux/sched/deadline.h index 1aff00b65f3c..7c83d4d5a971 100644 --- a/include/linux/sched/deadline.h +++ b/include/linux/sched/deadline.h @@ -6,6 +6,8 @@ * NORMAL/BATCH tasks. */ +#include + #define MAX_DL_PRIO 0 static inline int dl_prio(int prio) -- cgit From b26ef81c46ed15d11ddddba9ba1cd52c749385ad Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Tue, 22 Feb 2022 14:04:50 -0800 Subject: drop_monitor: remove quadratic behavior drop_monitor is using an unique list on which all netdevices in the host have an element, regardless of their netns. This scales poorly, not only at device unregister time (what I caught during my netns dismantle stress tests), but also at packet processing time whenever trace_napi_poll_hit() is called. If the intent was to avoid adding one pointer in 'struct net_device' then surely we prefer O(1) behavior. Signed-off-by: Eric Dumazet Cc: Neil Horman Signed-off-by: David S. Miller --- include/linux/netdevice.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 93fc680b658f..c79ee2296296 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2236,7 +2236,9 @@ struct net_device { #if IS_ENABLED(CONFIG_MRP) struct mrp_port __rcu *mrp_port; #endif - +#if IS_ENABLED(CONFIG_NET_DROP_MONITOR) + struct dm_hw_stat_delta __rcu *dm_private; +#endif struct device dev; const struct attribute_group *sysfs_groups[4]; const struct attribute_group *sysfs_rx_queue_group; -- cgit From a21d9a670d81103db7f788de1a4a4a6e4b891a0b Mon Sep 17 00:00:00 2001 From: Hans Schultz Date: Wed, 23 Feb 2022 11:16:46 +0100 Subject: net: bridge: Add support for bridge port in locked mode In a 802.1X scenario, clients connected to a bridge port shall not be allowed to have traffic forwarded until fully authenticated. A static fdb entry of the clients MAC address for the bridge port unlocks the client and allows bidirectional communication. This scenario is facilitated with setting the bridge port in locked mode, which is also supported by various switchcore chipsets. Signed-off-by: Hans Schultz Acked-by: Nikolay Aleksandrov Reviewed-by: Ido Schimmel Signed-off-by: David S. Miller --- include/linux/if_bridge.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h index 509e18c7e740..3aae023a9353 100644 --- a/include/linux/if_bridge.h +++ b/include/linux/if_bridge.h @@ -58,6 +58,7 @@ struct br_ip_list { #define BR_MRP_LOST_CONT BIT(18) #define BR_MRP_LOST_IN_CONT BIT(19) #define BR_TX_FWD_OFFLOAD BIT(20) +#define BR_PORT_LOCKED BIT(21) #define BR_DEFAULT_AGEING_TIME (300 * HZ) -- cgit From c2700d2886a87f83f31e0a301de1d2350b52c79b Mon Sep 17 00:00:00 2001 From: Varun Prakash Date: Sat, 22 Jan 2022 22:27:44 +0530 Subject: nvme-tcp: send H2CData PDUs based on MAXH2CDATA As per NVMe/TCP specification (revision 1.0a, section 3.6.2.3) Maximum Host to Controller Data length (MAXH2CDATA): Specifies the maximum number of PDU-Data bytes per H2CData PDU in bytes. This value is a multiple of dwords and should be no less than 4,096. Current code sets H2CData PDU data_length to r2t_length, it does not check MAXH2CDATA value. Fix this by setting H2CData PDU data_length to min(req->h2cdata_left, queue->maxh2cdata). Also validate MAXH2CDATA value returned by target in ICResp PDU, if it is not a multiple of dword or if it is less than 4096 return -EINVAL from nvme_tcp_init_connection(). Signed-off-by: Varun Prakash Reviewed-by: Sagi Grimberg Signed-off-by: Christoph Hellwig --- include/linux/nvme-tcp.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/nvme-tcp.h b/include/linux/nvme-tcp.h index 959e0bd9a913..75470159a194 100644 --- a/include/linux/nvme-tcp.h +++ b/include/linux/nvme-tcp.h @@ -12,6 +12,7 @@ #define NVME_TCP_DISC_PORT 8009 #define NVME_TCP_ADMIN_CCSZ SZ_8K #define NVME_TCP_DIGEST_LENGTH 4 +#define NVME_TCP_MIN_MAXH2CDATA 4096 enum nvme_tcp_pfv { NVME_TCP_PFV_1_0 = 0x0, -- cgit From f2eb478f2f322217aa642e11c1cc011f99c797e6 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 22 Feb 2022 08:07:13 +0100 Subject: kernfs: move struct kernfs_root out of the public view. There is no need to have struct kernfs_root be part of kernfs.h for the whole kernel to see and poke around it. Move it internal to kernfs code and provide a helper function, kernfs_root_to_node(), to handle the one field that kernfs users were directly accessing from the structure. Cc: Imran Khan Acked-by: Tejun Heo Link: https://lore.kernel.org/r/20220222070713.3517679-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman --- include/linux/kernfs.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 861c4f0f8a29..62aff082dc3f 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -185,6 +185,7 @@ struct kernfs_syscall_ops { struct kernfs_root *root); }; +#if 0 struct kernfs_root { /* published fields */ struct kernfs_node *kn; @@ -202,6 +203,9 @@ struct kernfs_root { wait_queue_head_t deactivate_waitq; struct rw_semaphore kernfs_rwsem; }; +#endif + +struct kernfs_node *kernfs_root_to_node(struct kernfs_root *root); struct kernfs_open_file { /* published fields */ -- cgit From c404c64d64bc31bebe8a2015103671f7cd282731 Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Sun, 30 Jan 2022 22:02:06 +0100 Subject: powercap/dtpm: Destroy hierarchy function The hierarchy creation function exits but without a destroy hierarchy function. Due to that, the modules creating the hierarchy can not be unloaded properly because they don't have an exit callback. Provide the dtpm_destroy_hierarchy() function to remove the previously created hierarchy. The function relies on all the release mechanisms implemented by the underlying powercap framework. Signed-off-by: Daniel Lezcano Reviewed-by: Ulf Hansson Link: https://lore.kernel.org/r/20220130210210.549877-4-daniel.lezcano@linaro.org --- include/linux/dtpm.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dtpm.h b/include/linux/dtpm.h index f7a25c70dd4c..a4a13514b730 100644 --- a/include/linux/dtpm.h +++ b/include/linux/dtpm.h @@ -37,6 +37,7 @@ struct device_node; struct dtpm_subsys_ops { const char *name; int (*init)(void); + void (*exit)(void); int (*setup)(struct dtpm *, struct device_node *); }; @@ -67,4 +68,6 @@ void dtpm_unregister(struct dtpm *dtpm); int dtpm_register(const char *name, struct dtpm *dtpm, struct dtpm *parent); int dtpm_create_hierarchy(struct of_device_id *dtpm_match_table); + +void dtpm_destroy_hierarchy(void); #endif -- cgit From 43c075959de3c45608636d9d80ff9e61d166fb21 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 26 Jan 2022 10:57:07 -0800 Subject: mlx5: remove unused static inlines mlx5 has some unused static inline helpers in include/ while at it also clean static inlines in the driver itself. Signed-off-by: Jakub Kicinski Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 78655d8d13a7..1b398c9e17b9 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -863,20 +863,10 @@ struct mlx5_hca_vport_context { bool grh_required; }; -static inline void *mlx5_buf_offset(struct mlx5_frag_buf *buf, int offset) -{ - return buf->frags->buf + offset; -} - #define STRUCT_FIELD(header, field) \ .struct_offset_bytes = offsetof(struct ib_unpacked_ ## header, field), \ .struct_size_bytes = sizeof((struct ib_unpacked_ ## header *)0)->field -static inline struct mlx5_core_dev *pci2mlx5_core_dev(struct pci_dev *pdev) -{ - return pci_get_drvdata(pdev); -} - extern struct dentry *mlx5_debugfs_root; static inline u16 fw_rev_maj(struct mlx5_core_dev *dev) -- cgit From c2c922dae77f36e24d246c6e310cee0c61afc6fb Mon Sep 17 00:00:00 2001 From: Mark Bloch Date: Mon, 29 Nov 2021 16:24:28 +0200 Subject: net/mlx5: Add ability to insert to specific flow group If the flow table isn't an autogroup the upper driver has to create the flow groups explicitly. This information can't later be used when creating rules to insert into a specific flow group. Allow such use case. Signed-off-by: Mark Bloch Reviewed-by: Maor Gottlieb Signed-off-by: Saeed Mahameed --- include/linux/mlx5/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index b1aad14689e3..e3bfed68b08a 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -224,6 +224,7 @@ struct mlx5_flow_act { u32 flags; struct mlx5_fs_vlan vlan[MLX5_FS_VLAN_DEPTH]; struct ib_counters *counters; + struct mlx5_flow_group *fg; }; #define MLX5_DECLARE_FLOW_ACT(name) \ -- cgit From 605bef0015b163867127202b821dce79804d603d Mon Sep 17 00:00:00 2001 From: Saeed Mahameed Date: Sun, 5 Apr 2020 17:50:25 -0700 Subject: net/mlx5: cmdif, cmd_check refactoring Do not mangle the command outbox in the internal low level cmd_exec and cmd_invoke functions. Instead return a proper unique error code and move the driver error checking to be at a higher level in mlx5_cmd_exec(). Signed-off-by: Saeed Mahameed Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 1b398c9e17b9..8a8408708e6c 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -981,7 +981,6 @@ int mlx5_cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out, int mlx5_cmd_exec_polling(struct mlx5_core_dev *dev, void *in, int in_size, void *out, int out_size); -void mlx5_cmd_mbox_status(void *out, u8 *status, u32 *syndrome); bool mlx5_cmd_is_down(struct mlx5_core_dev *dev); int mlx5_core_get_caps(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type); -- cgit From f23519e542e51c19ab3081deb089bb3f8fec7bb9 Mon Sep 17 00:00:00 2001 From: Saeed Mahameed Date: Sat, 17 Aug 2019 03:05:10 -0700 Subject: net/mlx5: cmdif, Add new api for command execution Add mlx5_cmd_do. Unlike mlx5_cmd_exec, this function will not modify or translate outbox.status. The function will return: return = 0: Command was executed, outbox.status == MLX5_CMD_STAT_OK. return = -EREMOTEIO: Executed, outbox.status != MLX5_CMD_STAT_OK. return < 0: Command execution couldn't be performed by FW or driver. And document other mlx5_cmd_exec functions. Signed-off-by: Saeed Mahameed Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 8a8408708e6c..1b9bec8fa870 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -964,6 +964,8 @@ int mlx5_cmd_exec_cb(struct mlx5_async_ctx *ctx, void *in, int in_size, void *out, int out_size, mlx5_async_cbk_t callback, struct mlx5_async_work *work); +int mlx5_cmd_do(struct mlx5_core_dev *dev, void *in, int in_size, void *out, int out_size); +int mlx5_cmd_check(struct mlx5_core_dev *dev, int err, void *in, void *out); int mlx5_cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out, int out_size); -- cgit From 31803e59233efc838b9dcb26edea28a4b2389e97 Mon Sep 17 00:00:00 2001 From: Saeed Mahameed Date: Mon, 30 Mar 2020 21:12:58 -0700 Subject: net/mlx5: Use mlx5_cmd_do() in core create_{cq,dct} mlx5_core_create_{cq/dct} functions are non-trivial mlx5 commands functions. They check command execution status themselves and hide valuable FW failure information. For mlx5_core/eth kernel user this is what we actually want, but for a devx/rdma user the hidden information is essential and should be propagated up to the caller, thus we convert these commands to use mlx5_cmd_do to return the FW/driver and command outbox status as is, and let the caller decide what to do with it. For kernel callers of mlx5_core_create_{cq/dct} or those who only care about the binary status (FAIL/SUCCESS) they must check status themselves via mlx5_cmd_check() to restore the current behavior. err = mlx5_create_cq(in, out) err = mlx5_cmd_check(err, in, out) if (err) // handle err For DEVX users and those who care about full visibility, They will just propagate the error to user space, and app can check if err == -EREMOTEIO, then outbox.{status,syndrome} are valid. API Note: mlx5_cmd_check() must be used by kernel users since it allows the driver to intercept the command execution status and return a driver simulated status in case of driver induced error handling or reset/recovery flows. Signed-off-by: Saeed Mahameed Signed-off-by: Saeed Mahameed --- include/linux/mlx5/cq.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/cq.h b/include/linux/mlx5/cq.h index 7bfb67363434..cb15308b5cb0 100644 --- a/include/linux/mlx5/cq.h +++ b/include/linux/mlx5/cq.h @@ -183,6 +183,8 @@ static inline void mlx5_cq_put(struct mlx5_core_cq *cq) complete(&cq->free); } +int mlx5_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq, + u32 *in, int inlen, u32 *out, int outlen); int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq, u32 *in, int inlen, u32 *out, int outlen); int mlx5_core_destroy_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq); -- cgit From 0a41527608e7f3da61e76564f5a8749a1fddc7f1 Mon Sep 17 00:00:00 2001 From: Saeed Mahameed Date: Tue, 31 Mar 2020 22:03:00 -0700 Subject: net/mlx5: cmdif, Refactor error handling and reporting of async commands Same as the new mlx5_cmd_do API, report all information to callers and let them handle the error values and outbox parsing. The user callback status "work->user_callback(status)" is now similar to the error rc code returned from the blocking mlx5_cmd_do() version, and now is defined as follows: -EREMOTEIO : Command executed by FW, outbox.status != MLX5_CMD_STAT_OK. Caller must check FW outbox status. 0 : Command execution successful, outbox.status == MLX5_CMD_STAT_OK. < 0 : Command couldn't execute, FW or driver induced error. Signed-off-by: Saeed Mahameed Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 1b9bec8fa870..432151d7d0bf 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -955,6 +955,7 @@ typedef void (*mlx5_async_cbk_t)(int status, struct mlx5_async_work *context); struct mlx5_async_work { struct mlx5_async_ctx *ctx; mlx5_async_cbk_t user_callback; + void *out; /* pointer to the cmd output buffer */ }; void mlx5_cmd_init_async_ctx(struct mlx5_core_dev *dev, @@ -963,7 +964,7 @@ void mlx5_cmd_cleanup_async_ctx(struct mlx5_async_ctx *ctx); int mlx5_cmd_exec_cb(struct mlx5_async_ctx *ctx, void *in, int in_size, void *out, int out_size, mlx5_async_cbk_t callback, struct mlx5_async_work *work); - +void mlx5_cmd_out_err(struct mlx5_core_dev *dev, u16 opcode, u16 op_mod, void *out); int mlx5_cmd_do(struct mlx5_core_dev *dev, void *in, int in_size, void *out, int out_size); int mlx5_cmd_check(struct mlx5_core_dev *dev, int err, void *in, void *out); int mlx5_cmd_exec(struct mlx5_core_dev *dev, void *in, int in_size, void *out, -- cgit From 72fb3b60a3114a1154a8ae5629ea3b43a88a7a4d Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Sat, 14 Aug 2021 17:57:51 +0300 Subject: net/mlx5: Add reset_state field to MFRL register Add new field reset_state to MFRL register. This field expose current state of sync reset for fw update. This field enables sharing with the user more details on why fw activate failed in case it failed the sync reset stage. Signed-off-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 598ac3bcc901..8ca2d65ff789 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -9694,7 +9694,8 @@ struct mlx5_ifc_pcam_reg_bits { }; struct mlx5_ifc_mcam_enhanced_features_bits { - u8 reserved_at_0[0x6b]; + u8 reserved_at_0[0x6a]; + u8 reset_state[0x1]; u8 ptpcyc2realtime_modify[0x1]; u8 reserved_at_6c[0x2]; u8 pci_status_and_power[0x1]; @@ -10375,6 +10376,14 @@ struct mlx5_ifc_mcda_reg_bits { u8 data[][0x20]; }; +enum { + MLX5_MFRL_REG_RESET_STATE_IDLE = 0, + MLX5_MFRL_REG_RESET_STATE_IN_NEGOTIATION = 1, + MLX5_MFRL_REG_RESET_STATE_RESET_IN_PROGRESS = 2, + MLX5_MFRL_REG_RESET_STATE_TIMEOUT = 3, + MLX5_MFRL_REG_RESET_STATE_NACK = 4, +}; + enum { MLX5_MFRL_REG_RESET_TYPE_FULL_CHIP = BIT(0), MLX5_MFRL_REG_RESET_TYPE_NET_PORT_ALIVE = BIT(1), @@ -10393,7 +10402,8 @@ struct mlx5_ifc_mfrl_reg_bits { u8 pci_sync_for_fw_update_start[0x1]; u8 pci_sync_for_fw_update_resp[0x2]; u8 rst_type_sel[0x3]; - u8 reserved_at_28[0x8]; + u8 reserved_at_28[0x4]; + u8 reset_state[0x4]; u8 reset_type[0x8]; u8 reset_level[0x8]; }; -- cgit From 45fee8edb4b333af79efad7a99de51718ebda94b Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Mon, 6 Sep 2021 11:02:44 +0300 Subject: net/mlx5: Add clarification on sync reset failure In case devlink reload action fw_activate failed in sync reset stage, use the new MFRL field reset_state to find why it failed and share this clarification with the user. Signed-off-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 432151d7d0bf..d3b1a6a1f8d2 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1031,6 +1031,9 @@ int mlx5_core_detach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, u32 qpn) void mlx5_qp_debugfs_init(struct mlx5_core_dev *dev); void mlx5_qp_debugfs_cleanup(struct mlx5_core_dev *dev); +int mlx5_access_reg(struct mlx5_core_dev *dev, void *data_in, int size_in, + void *data_out, int size_out, u16 reg_id, int arg, + int write, bool verbose); int mlx5_core_access_reg(struct mlx5_core_dev *dev, void *data_in, int size_in, void *data_out, int size_out, u16 reg_num, int arg, int write); -- cgit From 1241e329ce2e1f5b1039fd356b75867b29721ad2 Mon Sep 17 00:00:00 2001 From: Subbaraya Sundeep Date: Wed, 23 Feb 2022 00:09:12 +0530 Subject: ethtool: add support to set/get completion queue event size Add support to set completion queue event size via ethtool -G parameter and get it via ethtool -g parameter. ~ # ./ethtool -G eth0 cqe-size 512 ~ # ./ethtool -g eth0 Ring parameters for eth0: Pre-set maximums: RX: 1048576 RX Mini: n/a RX Jumbo: n/a TX: 1048576 Current hardware settings: RX: 256 RX Mini: n/a RX Jumbo: n/a TX: 4096 RX Buf Len: 2048 CQE Size: 128 Signed-off-by: Subbaraya Sundeep Signed-off-by: Sunil Goutham Signed-off-by: Jakub Kicinski --- include/linux/ethtool.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index e0853f48b75e..4af58459a1e7 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -71,18 +71,22 @@ enum { * struct kernel_ethtool_ringparam - RX/TX ring configuration * @rx_buf_len: Current length of buffers on the rx ring. * @tcp_data_split: Scatter packet headers and data to separate buffers + * @cqe_size: Size of TX/RX completion queue event */ struct kernel_ethtool_ringparam { u32 rx_buf_len; u8 tcp_data_split; + u32 cqe_size; }; /** * enum ethtool_supported_ring_param - indicator caps for setting ring params * @ETHTOOL_RING_USE_RX_BUF_LEN: capture for setting rx_buf_len + * @ETHTOOL_RING_USE_CQE_SIZE: capture for setting cqe_size */ enum ethtool_supported_ring_param { ETHTOOL_RING_USE_RX_BUF_LEN = BIT(0), + ETHTOOL_RING_USE_CQE_SIZE = BIT(1), }; #define __ETH_RSS_HASH_BIT(bit) ((u32)1 << (bit)) -- cgit From 5597f082fcaf17e2d9adee7a1f6fa02f5872f9d5 Mon Sep 17 00:00:00 2001 From: Marc Kleine-Budde Date: Thu, 13 Jan 2022 15:34:07 +0100 Subject: can: bittiming: mark function arguments and local variables as const This patch marks the arguments of some functions as well as some local variables as constant. Link: https://lore.kernel.org/all/20220124215642.3474154-7-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde --- include/linux/can/bittiming.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/can/bittiming.h b/include/linux/can/bittiming.h index a81652d1c6f3..7ae21c0f7f23 100644 --- a/include/linux/can/bittiming.h +++ b/include/linux/can/bittiming.h @@ -113,7 +113,7 @@ struct can_tdc_const { }; #ifdef CONFIG_CAN_CALC_BITTIMING -int can_calc_bittiming(struct net_device *dev, struct can_bittiming *bt, +int can_calc_bittiming(const struct net_device *dev, struct can_bittiming *bt, const struct can_bittiming_const *btc); void can_calc_tdco(struct can_tdc *tdc, const struct can_tdc_const *tdc_const, @@ -121,7 +121,7 @@ void can_calc_tdco(struct can_tdc *tdc, const struct can_tdc_const *tdc_const, u32 *ctrlmode, u32 ctrlmode_supported); #else /* !CONFIG_CAN_CALC_BITTIMING */ static inline int -can_calc_bittiming(struct net_device *dev, struct can_bittiming *bt, +can_calc_bittiming(const struct net_device *dev, struct can_bittiming *bt, const struct can_bittiming_const *btc) { netdev_err(dev, "bit-timing calculation not available\n"); @@ -136,7 +136,7 @@ can_calc_tdco(struct can_tdc *tdc, const struct can_tdc_const *tdc_const, } #endif /* CONFIG_CAN_CALC_BITTIMING */ -int can_get_bittiming(struct net_device *dev, struct can_bittiming *bt, +int can_get_bittiming(const struct net_device *dev, struct can_bittiming *bt, const struct can_bittiming_const *btc, const u32 *bitrate_const, const unsigned int bitrate_const_cnt); -- cgit From 05f2281b4192320a20d746df6146b3dd82f96e39 Mon Sep 17 00:00:00 2001 From: Ricardo Rivera-Matos Date: Mon, 14 Feb 2022 18:07:56 -0600 Subject: power: supply: Introduces bypass charging property Adds a POWER_SUPPLY_CHARGE_TYPE_BYPASS option to the POWER_SUPPLY_PROP_CHARGE_TYPE property to facilitate bypass charging operation. In bypass charging operation, the charger bypasses the charging path around the integrated converter allowing for a "smart" wall adaptor to perform the power conversion externally. This operational mode is critical for the USB PPS standard of power adaptors and is becoming a common feature in modern charging ICs such as: - BQ25980 - BQ25975 - BQ25960 - LN8000 - LN8410 Signed-off-by: Ricardo Rivera-Matos Signed-off-by: Sebastian Reichel --- include/linux/power_supply.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/power_supply.h b/include/linux/power_supply.h index 006111917d1a..c135196aa9d1 100644 --- a/include/linux/power_supply.h +++ b/include/linux/power_supply.h @@ -49,6 +49,7 @@ enum { POWER_SUPPLY_CHARGE_TYPE_ADAPTIVE, /* dynamically adjusted speed */ POWER_SUPPLY_CHARGE_TYPE_CUSTOM, /* use CHARGE_CONTROL_* props */ POWER_SUPPLY_CHARGE_TYPE_LONGLIFE, /* slow speed, longer life */ + POWER_SUPPLY_CHARGE_TYPE_BYPASS, /* bypassing the charger */ }; enum { -- cgit From 4f0b903ded728c505850daf2914bfc08841f0ae6 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 23 Feb 2022 17:14:37 +0200 Subject: fsnotify: fix merge with parent's ignored mask fsnotify_parent() does not consider the parent's mark at all unless the parent inode shows interest in events on children and in the specific event. So unless parent added an event to both its mark mask and ignored mask, the event will not be ignored. Fix this by declaring the interest of an object in an event when the event is in either a mark mask or ignored mask. Link: https://lore.kernel.org/r/20220223151438.790268-2-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- include/linux/fsnotify_backend.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 790c31844db5..5f9c960049b0 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -601,6 +601,21 @@ extern void fsnotify_remove_queued_event(struct fsnotify_group *group, /* functions used to manipulate the marks attached to inodes */ +/* Get mask for calculating object interest taking ignored mask into account */ +static inline __u32 fsnotify_calc_mask(struct fsnotify_mark *mark) +{ + __u32 mask = mark->mask; + + if (!mark->ignored_mask) + return mask; + + /* + * If mark is interested in ignoring events on children, the object must + * show interest in those events for fsnotify_parent() to notice it. + */ + return mask | (mark->ignored_mask & ALL_FSNOTIFY_EVENTS); +} + /* Get mask of events for a list of marks */ extern __u32 fsnotify_conn_mask(struct fsnotify_mark_connector *conn); /* Calculate mask of events for a list of marks */ -- cgit From 04e317ba72d07901b03399b3d1525e83424df5b3 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 23 Feb 2022 17:14:38 +0200 Subject: fsnotify: optimize FS_MODIFY events with no ignored masks fsnotify() treats FS_MODIFY events specially - it does not skip them even if the FS_MODIFY event does not apear in the object's fsnotify mask. This is because send_to_group() checks if FS_MODIFY needs to clear ignored mask of marks. The common case is that an object does not have any mark with ignored mask and in particular, that it does not have a mark with ignored mask and without the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag. Set FS_MODIFY in object's fsnotify mask during fsnotify_recalc_mask() if object has a mark with an ignored mask and without the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag and remove the special treatment of FS_MODIFY in fsnotify(), so that FS_MODIFY events could be optimized in the common case. Call fsnotify_recalc_mask() from fanotify after adding or removing an ignored mask from a mark without FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY or when adding the FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY flag to a mark with ignored mask (the flag cannot be removed by fanotify uapi). Performance results for doing 10000000 write(2)s to tmpfs: vanilla patched without notification mark 25.486+-1.054 24.965+-0.244 with notification mark 30.111+-0.139 26.891+-1.355 So we can see the overhead of notification subsystem has been drastically reduced. Link: https://lore.kernel.org/r/20220223151438.790268-3-amir73il@gmail.com Suggested-by: Jan Kara Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- include/linux/fsnotify_backend.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 5f9c960049b0..0805b74cae44 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -609,6 +609,10 @@ static inline __u32 fsnotify_calc_mask(struct fsnotify_mark *mark) if (!mark->ignored_mask) return mask; + /* Interest in FS_MODIFY may be needed for clearing ignored mask */ + if (!(mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) + mask |= FS_MODIFY; + /* * If mark is interested in ignoring events on children, the object must * show interest in those events for fsnotify_parent() to notice it. -- cgit From 0cc62aed370d91ec3d5cf258e53028c736e88a09 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 21 Jan 2022 08:42:21 +0000 Subject: sizes.h: Add SZ_1T macro MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Today drivers/pci/controller/pci-xgene.c defines SZ_1T Move it into linux/sizes.h so that it can be re-used elsewhere. Link: https://lore.kernel.org/r/575cb7164cf124c75df7cb9242ea7374733942bf.1642752946.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy Signed-off-by: Lorenzo Pieralisi Reviewed-by: Krzysztof Wilczyński Acked-by: Bjorn Helgaas Cc: Toan Le Cc: linux-pci@vger.kernel.org --- include/linux/sizes.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sizes.h b/include/linux/sizes.h index 1ac79bcee2bb..84aa448d8bb3 100644 --- a/include/linux/sizes.h +++ b/include/linux/sizes.h @@ -47,6 +47,8 @@ #define SZ_8G _AC(0x200000000, ULL) #define SZ_16G _AC(0x400000000, ULL) #define SZ_32G _AC(0x800000000, ULL) + +#define SZ_1T _AC(0x10000000000, ULL) #define SZ_64T _AC(0x400000000000, ULL) #endif /* __LINUX_SIZES_H__ */ -- cgit From 34737e26980341519d00e84711fe619f9f47e79c Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 11 Feb 2022 08:50:00 +0100 Subject: uaccess: add generic __{get,put}_kernel_nofault Nine architectures are still missing __{get,put}_kernel_nofault: alpha, ia64, microblaze, nds32, nios2, openrisc, sh, sparc32, xtensa. Add a generic version that lets everything use the normal copy_{from,to}_kernel_nofault() code based on these, removing the last use of get_fs()/set_fs() from architecture-independent code. Reviewed-by: Christoph Hellwig Acked-by: Geert Uytterhoeven Signed-off-by: Arnd Bergmann --- include/linux/uaccess.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) (limited to 'include/linux') diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h index ac0394087f7d..67e9bc94dc40 100644 --- a/include/linux/uaccess.h +++ b/include/linux/uaccess.h @@ -368,6 +368,25 @@ long strncpy_from_user_nofault(char *dst, const void __user *unsafe_addr, long count); long strnlen_user_nofault(const void __user *unsafe_addr, long count); +#ifndef __get_kernel_nofault +#define __get_kernel_nofault(dst, src, type, label) \ +do { \ + type __user *p = (type __force __user *)(src); \ + type data; \ + if (__get_user(data, p)) \ + goto label; \ + *(type *)dst = data; \ +} while (0) + +#define __put_kernel_nofault(dst, src, type, label) \ +do { \ + type __user *p = (type __force __user *)(dst); \ + type data = *(type *)src; \ + if (__put_user(data, p)) \ + goto label; \ +} while (0) +#endif + /** * get_kernel_nofault(): safely attempt to read from a location * @val: read into this variable -- cgit From 12700c17fc286149324f92d6d380bc48e43f253d Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 15 Feb 2022 17:55:04 +0100 Subject: uaccess: generalize access_ok() There are many different ways that access_ok() is defined across architectures, but in the end, they all just compare against the user_addr_max() value or they accept anything. Provide one definition that works for most architectures, checking against TASK_SIZE_MAX for user processes or skipping the check inside of uaccess_kernel() sections. For architectures without CONFIG_SET_FS(), this should be the fastest check, as it comes down to a single comparison of a pointer against a compile-time constant, while the architecture specific versions tend to do something more complex for historic reasons or get something wrong. Type checking for __user annotations is handled inconsistently across architectures, but this is easily simplified as well by using an inline function that takes a 'const void __user *' argument. A handful of callers need an extra __user annotation for this. Some architectures had trick to use 33-bit or 65-bit arithmetic on the addresses to calculate the overflow, however this simpler version uses fewer registers, which means it can produce better object code in the end despite needing a second (statically predicted) branch. Reviewed-by: Christoph Hellwig Acked-by: Mark Rutland [arm64, asm-generic] Acked-by: Geert Uytterhoeven Acked-by: Stafford Horne Acked-by: Dinh Nguyen Signed-off-by: Arnd Bergmann --- include/linux/uaccess.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h index 67e9bc94dc40..2c31667e62e0 100644 --- a/include/linux/uaccess.h +++ b/include/linux/uaccess.h @@ -33,13 +33,6 @@ typedef struct { /* empty dummy */ } mm_segment_t; -#ifndef TASK_SIZE_MAX -#define TASK_SIZE_MAX TASK_SIZE -#endif - -#define uaccess_kernel() (false) -#define user_addr_max() (TASK_SIZE_MAX) - static inline mm_segment_t force_uaccess_begin(void) { return (mm_segment_t) { }; -- cgit From 967747bbc084b93b54e66f9047d342232314cd25 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 11 Feb 2022 21:42:45 +0100 Subject: uaccess: remove CONFIG_SET_FS There are no remaining callers of set_fs(), so CONFIG_SET_FS can be removed globally, along with the thread_info field and any references to it. This turns access_ok() into a cheaper check against TASK_SIZE_MAX. As CONFIG_SET_FS is now gone, drop all remaining references to set_fs()/get_fs(), mm_segment_t, user_addr_max() and uaccess_kernel(). Acked-by: Sam Ravnborg # for sparc32 changes Acked-by: "Eric W. Biederman" Tested-by: Sergey Matyukevich # for arc changes Acked-by: Stafford Horne # [openrisc, asm-generic] Acked-by: Dinh Nguyen Signed-off-by: Arnd Bergmann --- include/linux/syscalls.h | 4 ---- include/linux/uaccess.h | 33 --------------------------------- 2 files changed, 37 deletions(-) (limited to 'include/linux') diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index 819c0cb00b6d..a34b0f9a9972 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -290,10 +290,6 @@ static inline void addr_limit_user_check(void) return; #endif - if (CHECK_DATA_CORRUPTION(uaccess_kernel(), - "Invalid address limit on user-mode return")) - force_sig(SIGKILL); - #ifdef TIF_FSCHECK clear_thread_flag(TIF_FSCHECK); #endif diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h index 2c31667e62e0..2421a41f3a8e 100644 --- a/include/linux/uaccess.h +++ b/include/linux/uaccess.h @@ -10,39 +10,6 @@ #include -#ifdef CONFIG_SET_FS -/* - * Force the uaccess routines to be wired up for actual userspace access, - * overriding any possible set_fs(KERNEL_DS) still lingering around. Undone - * using force_uaccess_end below. - */ -static inline mm_segment_t force_uaccess_begin(void) -{ - mm_segment_t fs = get_fs(); - - set_fs(USER_DS); - return fs; -} - -static inline void force_uaccess_end(mm_segment_t oldfs) -{ - set_fs(oldfs); -} -#else /* CONFIG_SET_FS */ -typedef struct { - /* empty dummy */ -} mm_segment_t; - -static inline mm_segment_t force_uaccess_begin(void) -{ - return (mm_segment_t) { }; -} - -static inline void force_uaccess_end(mm_segment_t oldfs) -{ -} -#endif /* CONFIG_SET_FS */ - /* * Architectures should provide two primitives (raw_copy_{to,from}_user()) * and get rid of their private instances of copy_{to,from}_user() and -- cgit From 2c861b73a23b1237bdd054b6023bb27f6747c2b9 Mon Sep 17 00:00:00 2001 From: Pali Rohár Date: Sat, 19 Feb 2022 16:28:13 +0100 Subject: math64: New DIV_U64_ROUND_CLOSEST helper MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Provide DIV_U64_ROUND_CLOSEST helper which uses div_u64 to perform division rounded to the closest integer using unsigned 64bit dividend and unsigned 32bit divisor. Reviewed-by: Marek Behún Signed-off-by: Pali Rohár Signed-off-by: Marek Behún Link: https://lore.kernel.org/r/20220219152818.4319-2-kabel@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/math64.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/math64.h b/include/linux/math64.h index 2928f03d6d46..a14f40de1dca 100644 --- a/include/linux/math64.h +++ b/include/linux/math64.h @@ -300,6 +300,19 @@ u64 mul_u64_u64_div_u64(u64 a, u64 mul, u64 div); #define DIV64_U64_ROUND_CLOSEST(dividend, divisor) \ ({ u64 _tmp = (divisor); div64_u64((dividend) + _tmp / 2, _tmp); }) +/* + * DIV_U64_ROUND_CLOSEST - unsigned 64bit divide with 32bit divisor rounded to nearest integer + * @dividend: unsigned 64bit dividend + * @divisor: unsigned 32bit divisor + * + * Divide unsigned 64bit dividend by unsigned 32bit divisor + * and round to closest integer. + * + * Return: dividend / divisor rounded to nearest integer + */ +#define DIV_U64_ROUND_CLOSEST(dividend, divisor) \ + ({ u32 _tmp = (divisor); div_u64((u64)(dividend) + _tmp / 2, _tmp); }) + /* * DIV_S64_ROUND_CLOSEST - signed 64bit divide with 32bit divisor rounded to nearest integer * @dividend: signed 64bit dividend -- cgit From 085a884434f3e3b08349a0ba0904f9f561739d57 Mon Sep 17 00:00:00 2001 From: Richard Gong Date: Wed, 23 Feb 2022 08:49:08 -0600 Subject: firmware: stratix10-svc: extend SVC driver to get the firmware version Extend Intel service layer driver to get the firmware version running at FPGA device. Therefore FPGA manager driver, one of Intel service layer driver's client, can decide whether to handle the newly added bitstream authentication function based on the retrieved firmware version. Link: https://lore.kernel.org/lkml/1617114785-22211-2-git-send-email-richard.gong@linux.intel.com Acked-by: Moritz Fischr Signed-off-by: Richard Gong Signed-off-by: Dinh Nguyen Link: https://lore.kernel.org/r/20220223144908.399522-2-dinguyen@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/firmware/intel/stratix10-smc.h | 21 +++++++++++++++++++-- include/linux/firmware/intel/stratix10-svc-client.h | 4 ++++ 2 files changed, 23 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/firmware/intel/stratix10-smc.h b/include/linux/firmware/intel/stratix10-smc.h index c3e5ab014caf..aad497a9ad8b 100644 --- a/include/linux/firmware/intel/stratix10-smc.h +++ b/include/linux/firmware/intel/stratix10-smc.h @@ -321,8 +321,6 @@ INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_FPGA_CONFIG_COMPLETED_WRITE) #define INTEL_SIP_SMC_ECC_DBE \ INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_ECC_DBE) -#endif - /** * Request INTEL_SIP_SMC_RSU_NOTIFY * @@ -404,3 +402,22 @@ INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_FPGA_CONFIG_COMPLETED_WRITE) #define INTEL_SIP_SMC_FUNCID_RSU_MAX_RETRY 18 #define INTEL_SIP_SMC_RSU_MAX_RETRY \ INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_RSU_MAX_RETRY) + +/** + * Request INTEL_SIP_SMC_FIRMWARE_VERSION + * + * Sync call used to query the version of running firmware + * + * Call register usage: + * a0 INTEL_SIP_SMC_FIRMWARE_VERSION + * a1-a7 not used + * + * Return status: + * a0 INTEL_SIP_SMC_STATUS_OK or INTEL_SIP_SMC_STATUS_ERROR + * a1 running firmware version + */ +#define INTEL_SIP_SMC_FUNCID_FIRMWARE_VERSION 31 +#define INTEL_SIP_SMC_FIRMWARE_VERSION \ + INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_FIRMWARE_VERSION) + +#endif diff --git a/include/linux/firmware/intel/stratix10-svc-client.h b/include/linux/firmware/intel/stratix10-svc-client.h index 19781b0f6429..18c1841fdb1f 100644 --- a/include/linux/firmware/intel/stratix10-svc-client.h +++ b/include/linux/firmware/intel/stratix10-svc-client.h @@ -104,6 +104,9 @@ struct stratix10_svc_chan; * * @COMMAND_RSU_DCMF_VERSION: query firmware for the DCMF version, return status * is SVC_STATUS_OK or SVC_STATUS_ERROR + * + * @COMMAND_FIRMWARE_VERSION: query running firmware version, return status + * is SVC_STATUS_OK or SVC_STATUS_ERROR */ enum stratix10_svc_command_code { COMMAND_NOOP = 0, @@ -117,6 +120,7 @@ enum stratix10_svc_command_code { COMMAND_RSU_RETRY, COMMAND_RSU_MAX_RETRY, COMMAND_RSU_DCMF_VERSION, + COMMAND_FIRMWARE_VERSION, }; /** -- cgit From f1d0821bf37ba3cecee0fd1e9ae72a943a69d01d Mon Sep 17 00:00:00 2001 From: Ronak Jain Date: Wed, 9 Feb 2022 00:27:07 -0800 Subject: firmware: xilinx: Add support for runtime features Add support for runtime features by using an IOCTL call. The features can be enabled or disabled from the firmware as well as the features can be configured at runtime by querying IOCTL_SET_FEATURE_CONFIG id. Similarly, the user can get the configured values of features by querying IOCTL_GET_FEATURE_CONFIG id. Acked-by: Michal Simek Signed-off-by: Ronak Jain Link: https://lore.kernel.org/r/20220209082709.32378-2-ronak.jain@xilinx.com Signed-off-by: Greg Kroah-Hartman --- include/linux/firmware/xlnx-zynqmp.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firmware/xlnx-zynqmp.h b/include/linux/firmware/xlnx-zynqmp.h index 907cb01890cf..cf557fbeb8c7 100644 --- a/include/linux/firmware/xlnx-zynqmp.h +++ b/include/linux/firmware/xlnx-zynqmp.h @@ -143,6 +143,9 @@ enum pm_ioctl_id { IOCTL_OSPI_MUX_SELECT = 21, /* Register SGI to ATF */ IOCTL_REGISTER_SGI = 25, + /* Runtime feature configuration */ + IOCTL_SET_FEATURE_CONFIG = 26, + IOCTL_GET_FEATURE_CONFIG = 27, }; enum pm_query_id { @@ -376,6 +379,14 @@ enum ospi_mux_select_type { PM_OSPI_MUX_SEL_LINEAR = 1, }; +enum pm_feature_config_id { + PM_FEATURE_INVALID = 0, + PM_FEATURE_OVERTEMP_STATUS = 1, + PM_FEATURE_OVERTEMP_VALUE = 2, + PM_FEATURE_EXTWDT_STATUS = 3, + PM_FEATURE_EXTWDT_VALUE = 4, +}; + /** * struct zynqmp_pm_query_data - PM query data * @qid: query ID @@ -447,6 +458,8 @@ int zynqmp_pm_load_pdi(const u32 src, const u64 address); int zynqmp_pm_register_notifier(const u32 node, const u32 event, const u32 wake, const u32 enable); int zynqmp_pm_feature(const u32 api_id); +int zynqmp_pm_set_feature_config(enum pm_feature_config_id id, u32 value); +int zynqmp_pm_get_feature_config(enum pm_feature_config_id id, u32 *payload); #else static inline int zynqmp_pm_get_api_version(u32 *version) { @@ -689,6 +702,18 @@ static inline int zynqmp_pm_feature(const u32 api_id) { return -ENODEV; } + +static inline int zynqmp_pm_set_feature_config(enum pm_feature_config_id id, + u32 value) +{ + return -ENODEV; +} + +static inline int zynqmp_pm_get_feature_config(enum pm_feature_config_id id, + u32 *payload) +{ + return -ENODEV; +} #endif #endif /* __FIRMWARE_ZYNQMP_H__ */ -- cgit From 2502960fba7e94e090112069694365295c32ccc5 Mon Sep 17 00:00:00 2001 From: Yong Wu Date: Mon, 14 Feb 2022 14:07:57 +0800 Subject: component: Add common helper for compare/release functions The component requires the compare/release functions, there are so many copies in current kernel. Just define four common helpers for them. Cc: Greg Kroah-Hartman Cc: "Rafael J. Wysocki" Signed-off-by: Yong Wu Link: https://lore.kernel.org/r/20220214060819.7334-2-yong.wu@mediatek.com Signed-off-by: Greg Kroah-Hartman --- include/linux/component.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/component.h b/include/linux/component.h index 7012569c6546..df4aa75c9e7c 100644 --- a/include/linux/component.h +++ b/include/linux/component.h @@ -82,6 +82,12 @@ struct component_master_ops { void (*unbind)(struct device *master); }; +/* A set helper functions for component compare/release */ +int component_compare_of(struct device *dev, void *data); +void component_release_of(struct device *dev, void *data); +int component_compare_dev(struct device *dev, void *data); +int component_compare_dev_name(struct device *dev, void *data); + void component_master_del(struct device *, const struct component_master_ops *); -- cgit From 9584e7263e9ebcd94b184dc3efc847355a624220 Mon Sep 17 00:00:00 2001 From: Claudiu Beznea Date: Thu, 13 Jan 2022 16:48:54 +0200 Subject: ARM: at91: PM: add cpu idle support for sama7g5 Add CPU idle support for SAMA7G5. Support will make use of PMC_CPU_RATIO register to divide the CPU clock by 16 before switching it to idle and use automatic self-refresh option of DDR controller. Signed-off-by: Claudiu Beznea Acked-by: Stephen Boyd Signed-off-by: Nicolas Ferre Link: https://lore.kernel.org/r/20220113144900.906370-5-claudiu.beznea@microchip.com --- include/linux/clk/at91_pmc.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/clk/at91_pmc.h b/include/linux/clk/at91_pmc.h index ccb3f034bfa9..3484309b59bf 100644 --- a/include/linux/clk/at91_pmc.h +++ b/include/linux/clk/at91_pmc.h @@ -78,6 +78,10 @@ #define AT91_PMC_MAINRDY (1 << 16) /* Main Clock Ready */ #define AT91_CKGR_PLLAR 0x28 /* PLL A Register */ + +#define AT91_PMC_RATIO 0x2c /* Processor clock ratio register [SAMA7G5 only] */ +#define AT91_PMC_RATIO_RATIO (0xf) /* CPU clock ratio. */ + #define AT91_CKGR_PLLBR 0x2c /* PLL B Register */ #define AT91_PMC_DIV (0xff << 0) /* Divider */ #define AT91_PMC_PLLCOUNT (0x3f << 8) /* PLL Counter */ -- cgit From 8b4195cd6dc3f1f0ab457d23d21e9f72fde0760a Mon Sep 17 00:00:00 2001 From: Michael Walle Date: Wed, 23 Feb 2022 14:43:47 +0100 Subject: mtd: spi-nor: move all xilinx specifics into xilinx.c Mechanically move all the xilinx functions to its own module. Then register the new flash specific ready() function. Signed-off-by: Michael Walle Signed-off-by: Tudor Ambarus Reviewed-by: Pratyush Yadav Link: https://lore.kernel.org/r/20220223134358.1914798-22-michael@walle.cc --- include/linux/mtd/spi-nor.h | 9 --------- 1 file changed, 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mtd/spi-nor.h b/include/linux/mtd/spi-nor.h index fc90fce26e33..b44b05a6f934 100644 --- a/include/linux/mtd/spi-nor.h +++ b/include/linux/mtd/spi-nor.h @@ -86,15 +86,6 @@ #define SPINOR_OP_BP 0x02 /* Byte program */ #define SPINOR_OP_AAI_WP 0xad /* Auto address increment word program */ -/* Used for S3AN flashes only */ -#define SPINOR_OP_XSE 0x50 /* Sector erase */ -#define SPINOR_OP_XPP 0x82 /* Page program */ -#define SPINOR_OP_XRDSR 0xd7 /* Read status register */ - -#define XSR_PAGESIZE BIT(0) /* Page size in Po2 or Linear */ -#define XSR_RDY BIT(7) /* Ready */ - - /* Used for Macronix and Winbond flashes. */ #define SPINOR_OP_EN4B 0xb7 /* Enter 4-byte mode */ #define SPINOR_OP_EX4B 0xe9 /* Exit 4-byte mode */ -- cgit From c770abe52d81089a8b8ecd1fe42722e29bbab5f5 Mon Sep 17 00:00:00 2001 From: Michael Walle Date: Wed, 23 Feb 2022 14:43:50 +0100 Subject: mtd: spi-nor: move all micron-st specifics into micron-st.c The flag status register is only available on micron flashes. Move all the functions around that into the micron module. This is almost a mechanical move except for the spi_nor_fsr_ready() which now also checks the normal status register. Previously, this was done in spi_nor_ready(). Signed-off-by: Michael Walle Signed-off-by: Tudor Ambarus Tested-by: Pratyush Yadav # on mt35xu512aba, s28hs512t Reviewed-by: Pratyush Yadav Link: https://lore.kernel.org/r/20220223134358.1914798-25-michael@walle.cc --- include/linux/mtd/spi-nor.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mtd/spi-nor.h b/include/linux/mtd/spi-nor.h index b44b05a6f934..4622251a79ff 100644 --- a/include/linux/mtd/spi-nor.h +++ b/include/linux/mtd/spi-nor.h @@ -47,8 +47,6 @@ #define SPINOR_OP_RDID 0x9f /* Read JEDEC ID */ #define SPINOR_OP_RDSFDP 0x5a /* Read SFDP */ #define SPINOR_OP_RDCR 0x35 /* Read configuration register */ -#define SPINOR_OP_RDFSR 0x70 /* Read flag status register */ -#define SPINOR_OP_CLFSR 0x50 /* Clear flag status register */ #define SPINOR_OP_RDEAR 0xc8 /* Read Extended Address Register */ #define SPINOR_OP_WREAR 0xc5 /* Write Extended Address Register */ #define SPINOR_OP_SRSTEN 0x66 /* Software Reset Enable */ @@ -126,12 +124,6 @@ /* Enhanced Volatile Configuration Register bits */ #define EVCR_QUAD_EN_MICRON BIT(7) /* Micron Quad I/O */ -/* Flag Status Register bits */ -#define FSR_READY BIT(7) /* Device status, 0 = Busy, 1 = Ready */ -#define FSR_E_ERR BIT(5) /* Erase operation status */ -#define FSR_P_ERR BIT(4) /* Program operation status */ -#define FSR_PT_ERR BIT(1) /* Protection error bit */ - /* Status Register 2 bits. */ #define SR2_QUAD_EN_BIT1 BIT(1) #define SR2_LB1 BIT(3) /* Security Register Lock Bit 1 */ -- cgit From 837d5181beef068c16bb8424c2c1571a7d5d7966 Mon Sep 17 00:00:00 2001 From: Michael Walle Date: Wed, 23 Feb 2022 14:43:54 +0100 Subject: mtd: spi-nor: move all spansion specifics into spansion.c The clear status register flags is only available on spansion flashes. Move all the functions around that into the spanion module. Signed-off-by: Michael Walle Signed-off-by: Tudor Ambarus Tested-by: Pratyush Yadav # on mt35xu512aba, s28hs512t Reviewed-by: Pratyush Yadav Link: https://lore.kernel.org/r/20220223134358.1914798-29-michael@walle.cc --- include/linux/mtd/spi-nor.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mtd/spi-nor.h b/include/linux/mtd/spi-nor.h index 4622251a79ff..5e25a7b75ae2 100644 --- a/include/linux/mtd/spi-nor.h +++ b/include/linux/mtd/spi-nor.h @@ -90,7 +90,6 @@ /* Used for Spansion flashes only. */ #define SPINOR_OP_BRWR 0x17 /* Bank register write */ -#define SPINOR_OP_CLSR 0x30 /* Clear status register 1 */ /* Used for Micron flashes only. */ #define SPINOR_OP_RD_EVCR 0x65 /* Read EVCR register */ -- cgit From bc82c38a6933aab308387d4aca47e0a05de7b553 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Fri, 11 Feb 2022 08:10:18 +0100 Subject: tracing: Uninline trace_trigger_soft_disabled() partly On a powerpc32 build with CONFIG_CC_OPTIMISE_FOR_SIZE, the inline keyword is not honored and trace_trigger_soft_disabled() appears approx 50 times in vmlinux. Adding -Winline to the build, the following message appears: ./include/linux/trace_events.h:712:1: error: inlining failed in call to 'trace_trigger_soft_disabled': call is unlikely and code size would grow [-Werror=inline] That function is rather big for an inlined function: c003df60 : c003df60: 94 21 ff f0 stwu r1,-16(r1) c003df64: 7c 08 02 a6 mflr r0 c003df68: 90 01 00 14 stw r0,20(r1) c003df6c: bf c1 00 08 stmw r30,8(r1) c003df70: 83 e3 00 24 lwz r31,36(r3) c003df74: 73 e9 01 00 andi. r9,r31,256 c003df78: 41 82 00 10 beq c003df88 c003df7c: 38 60 00 00 li r3,0 c003df80: 39 61 00 10 addi r11,r1,16 c003df84: 4b fd 60 ac b c0014030 <_rest32gpr_30_x> c003df88: 73 e9 00 80 andi. r9,r31,128 c003df8c: 7c 7e 1b 78 mr r30,r3 c003df90: 41 a2 00 14 beq c003dfa4 c003df94: 38 c0 00 00 li r6,0 c003df98: 38 a0 00 00 li r5,0 c003df9c: 38 80 00 00 li r4,0 c003dfa0: 48 05 c5 f1 bl c009a590 c003dfa4: 73 e9 00 40 andi. r9,r31,64 c003dfa8: 40 82 00 28 bne c003dfd0 c003dfac: 73 ff 02 00 andi. r31,r31,512 c003dfb0: 41 82 ff cc beq c003df7c c003dfb4: 80 01 00 14 lwz r0,20(r1) c003dfb8: 83 e1 00 0c lwz r31,12(r1) c003dfbc: 7f c3 f3 78 mr r3,r30 c003dfc0: 83 c1 00 08 lwz r30,8(r1) c003dfc4: 7c 08 03 a6 mtlr r0 c003dfc8: 38 21 00 10 addi r1,r1,16 c003dfcc: 48 05 6f 6c b c0094f38 c003dfd0: 38 60 00 01 li r3,1 c003dfd4: 4b ff ff ac b c003df80 However it is located in a hot path so inlining it is important. But forcing inlining of the entire function by using __always_inline leads to increasing the text size by approx 20 kbytes. Instead, split the fonction in two parts, one part with the likely fast path, flagged __always_inline, and a second part out of line. With this change, on a powerpc32 with CONFIG_CC_OPTIMISE_FOR_SIZE vmlinux text increases by only 1,4 kbytes, which is partly compensated by a decrease of vmlinux data by 7 kbytes. On ppc64_defconfig which has CONFIG_CC_OPTIMISE_FOR_SPEED, this change reduces vmlinux text by more than 30 kbytes. Link: https://lkml.kernel.org/r/69ce0986a52d026d381d612801d978aa4f977460.1644563295.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy Signed-off-by: Steven Rostedt (Google) --- include/linux/trace_events.h | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 70c069aef02c..dcea51fb60e2 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -699,6 +699,8 @@ event_triggers_post_call(struct trace_event_file *file, bool trace_event_ignore_this_pid(struct trace_event_file *trace_file); +bool __trace_trigger_soft_disabled(struct trace_event_file *file); + /** * trace_trigger_soft_disabled - do triggers and test if soft disabled * @file: The file pointer of the event to test @@ -708,20 +710,20 @@ bool trace_event_ignore_this_pid(struct trace_event_file *trace_file); * triggers that require testing the fields, it will return true, * otherwise false. */ -static inline bool +static __always_inline bool trace_trigger_soft_disabled(struct trace_event_file *file) { unsigned long eflags = file->flags; - if (!(eflags & EVENT_FILE_FL_TRIGGER_COND)) { - if (eflags & EVENT_FILE_FL_TRIGGER_MODE) - event_triggers_call(file, NULL, NULL, NULL); - if (eflags & EVENT_FILE_FL_SOFT_DISABLED) - return true; - if (eflags & EVENT_FILE_FL_PID_FILTER) - return trace_event_ignore_this_pid(file); - } - return false; + if (likely(!(eflags & (EVENT_FILE_FL_TRIGGER_MODE | + EVENT_FILE_FL_SOFT_DISABLED | + EVENT_FILE_FL_PID_FILTER)))) + return false; + + if (likely(eflags & EVENT_FILE_FL_TRIGGER_COND)) + return false; + + return __trace_trigger_soft_disabled(file); } #ifdef CONFIG_BPF_EVENTS -- cgit From 8786fde8421ce755a842051f9528674a1b1f0b9a Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 22 Jan 2022 20:54:52 +0000 Subject: Convert NFS from readpages to readahead NFS is one of the last two users of the deprecated ->readpages aop. This conversion looks straightforward, but I have only compile-tested it. Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Trond Myklebust --- include/linux/nfs_fs.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 68f81d8d36de..333ea05e2531 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -601,8 +601,7 @@ nfs_have_writebacks(struct inode *inode) * linux/fs/nfs/read.c */ extern int nfs_readpage(struct file *, struct page *); -extern int nfs_readpages(struct file *, struct address_space *, - struct list_head *, unsigned); +void nfs_readahead(struct readahead_control *); /* * inline functions -- cgit From 43245eca6e670ebf65908b549641c1460a9cc944 Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia Date: Wed, 2 Feb 2022 17:55:02 -0500 Subject: NFSv4.1 support for NFS4_RESULT_PRESERVER_UNLINKED In 4.1+, the server is allowed to set a flag NFS4_RESULT_PRESERVE_UNLINKED in reply to the OPEN, that tells the client that it does not need to do a silly rename of an opened file when it's being removed. Signed-off-by: Olga Kornievskaia Signed-off-by: Trond Myklebust --- include/linux/nfs_fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 333ea05e2531..0e79dbbc759a 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -277,6 +277,7 @@ struct nfs4_copy_state { #define NFS_INO_STALE (1) /* possible stale inode */ #define NFS_INO_ACL_LRU_SET (2) /* Inode is on the LRU list */ #define NFS_INO_INVALIDATING (3) /* inode is being invalidated */ +#define NFS_INO_PRESERVE_UNLINKED (4) /* preserve file if removed while open */ #define NFS_INO_FSCACHE (5) /* inode can be cached by FS-Cache */ #define NFS_INO_FORCE_READDIR (7) /* force readdirplus */ #define NFS_INO_LAYOUTCOMMIT (9) /* layoutcommit required */ -- cgit From 88a6099fc3274a27814d26dd688fdc5cd7a480ee Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Wed, 9 Feb 2022 13:22:48 -0500 Subject: NFS: Replace last uses of NFS_INO_REVAL_PAGECACHE Now that we have more fine grained attribute revalidation, let's just get rid of NFS_INO_REVAL_PAGECACHE. Signed-off-by: Trond Myklebust --- include/linux/nfs_fs.h | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 0e79dbbc759a..ce3128e4bffa 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -356,11 +356,9 @@ static inline void nfs_mark_for_revalidate(struct inode *inode) struct nfs_inode *nfsi = NFS_I(inode); spin_lock(&inode->i_lock); - nfsi->cache_validity |= NFS_INO_REVAL_PAGECACHE - | NFS_INO_INVALID_ACCESS - | NFS_INO_INVALID_ACL - | NFS_INO_INVALID_CHANGE - | NFS_INO_INVALID_CTIME; + nfsi->cache_validity |= NFS_INO_INVALID_ACCESS | NFS_INO_INVALID_ACL | + NFS_INO_INVALID_CHANGE | NFS_INO_INVALID_CTIME | + NFS_INO_INVALID_SIZE; if (S_ISDIR(inode->i_mode)) nfsi->cache_validity |= NFS_INO_INVALID_DATA; spin_unlock(&inode->i_lock); -- cgit From 41e97b7f8a15d15da03ca15e6ff7b9b7ab7f588c Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Wed, 9 Feb 2022 13:26:19 -0500 Subject: NFS: Remove unused flag NFS_INO_REVAL_PAGECACHE Signed-off-by: Trond Myklebust --- include/linux/nfs_fs.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index ce3128e4bffa..72a732a5103c 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -247,7 +247,6 @@ struct nfs4_copy_state { #define NFS_INO_INVALID_ATIME BIT(2) /* cached atime is invalid */ #define NFS_INO_INVALID_ACCESS BIT(3) /* cached access cred invalid */ #define NFS_INO_INVALID_ACL BIT(4) /* cached acls are invalid */ -#define NFS_INO_REVAL_PAGECACHE BIT(5) /* must revalidate pagecache */ #define NFS_INO_REVAL_FORCED BIT(6) /* force revalidation ignoring a delegation */ #define NFS_INO_INVALID_LABEL BIT(7) /* cached label is invalid */ #define NFS_INO_INVALID_CHANGE BIT(8) /* cached change is invalid */ -- cgit From 891b7023010cc0d62d814619b1893745d7613f7f Mon Sep 17 00:00:00 2001 From: Jonathan Neuschäfer Date: Sat, 5 Feb 2022 11:36:09 +0100 Subject: clk: mux: Declare u32 *table parameter as const MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The elements of the table are never modified in clk-mux.c. To make this clear to clock drivers, declare the parameter as const u32 *table. Signed-off-by: Jonathan Neuschäfer Link: https://lore.kernel.org/r/20220205103613.1216218-4-j.neuschaefer@gmx.net Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index 2faa6f7aa8a8..27be57528874 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -888,7 +888,7 @@ void clk_hw_unregister_divider(struct clk_hw *hw); struct clk_mux { struct clk_hw hw; void __iomem *reg; - u32 *table; + const u32 *table; u32 mask; u8 shift; u8 flags; @@ -913,18 +913,18 @@ struct clk_hw *__clk_hw_register_mux(struct device *dev, struct device_node *np, const struct clk_hw **parent_hws, const struct clk_parent_data *parent_data, unsigned long flags, void __iomem *reg, u8 shift, u32 mask, - u8 clk_mux_flags, u32 *table, spinlock_t *lock); + u8 clk_mux_flags, const u32 *table, spinlock_t *lock); struct clk_hw *__devm_clk_hw_register_mux(struct device *dev, struct device_node *np, const char *name, u8 num_parents, const char * const *parent_names, const struct clk_hw **parent_hws, const struct clk_parent_data *parent_data, unsigned long flags, void __iomem *reg, u8 shift, u32 mask, - u8 clk_mux_flags, u32 *table, spinlock_t *lock); + u8 clk_mux_flags, const u32 *table, spinlock_t *lock); struct clk *clk_register_mux_table(struct device *dev, const char *name, const char * const *parent_names, u8 num_parents, unsigned long flags, void __iomem *reg, u8 shift, u32 mask, - u8 clk_mux_flags, u32 *table, spinlock_t *lock); + u8 clk_mux_flags, const u32 *table, spinlock_t *lock); #define clk_register_mux(dev, name, parent_names, num_parents, flags, reg, \ shift, width, clk_mux_flags, lock) \ @@ -962,9 +962,9 @@ struct clk *clk_register_mux_table(struct device *dev, const char *name, (shift), BIT((width)) - 1, (clk_mux_flags), \ NULL, (lock)) -int clk_mux_val_to_index(struct clk_hw *hw, u32 *table, unsigned int flags, +int clk_mux_val_to_index(struct clk_hw *hw, const u32 *table, unsigned int flags, unsigned int val); -unsigned int clk_mux_index_to_val(u32 *table, unsigned int flags, u8 index); +unsigned int clk_mux_index_to_val(const u32 *table, unsigned int flags, u8 index); void clk_unregister_mux(struct clk *clk); void clk_hw_unregister_mux(struct clk_hw *hw); -- cgit From f7f497cb702462e8505ff3d8d4e7722ad95626a1 Mon Sep 17 00:00:00 2001 From: Ritesh Harjani Date: Wed, 16 Feb 2022 12:30:35 +0530 Subject: jbd2: kill t_handle_lock transaction spinlock This patch kills t_handle_lock transaction spinlock completely from jbd2. To explain the reasoning, currently there were three sites at which this spinlock was used. 1. jbd2_journal_wait_updates() a. Based on careful code review it can be seen that, we don't need this lock here. This is since we wait for any currently ongoing updates based on a atomic variable t_updates. And we anyway don't take any t_handle_lock while in stop_this_handle(). i.e. write_lock(&journal->j_state_lock() jbd2_journal_wait_updates() stop_this_handle() while (atomic_read(txn->t_updates) { | DEFINE_WAIT(wait); | prepare_to_wait(); | if (atomic_read(txn->t_updates) if (atomic_dec_and_test(txn->t_updates)) write_unlock(&journal->j_state_lock); schedule(); wake_up() write_lock(&journal->j_state_lock); finish_wait(); } txn->t_state = T_COMMIT write_unlock(&journal->j_state_lock); b. Also note that between atomic_inc(&txn->t_updates) in start_this_handle() and jbd2_journal_wait_updates(), the synchronization happens via read_lock(journal->j_state_lock) in start_this_handle(); 2. jbd2_journal_extend() a. jbd2_journal_extend() is called with the handle of each process from task_struct. So no lock required in updating member fields of handle_t b. For member fields of h_transaction, all updates happens only via atomic APIs (which is also within read_lock()). So, no need of this transaction spinlock. 3. update_t_max_wait() Based on Jan suggestion, this can be carefully removed using atomic cmpxchg API. Note that there can be several processes which are waiting for a new transaction to be allocated and started. For doing this only one process will succeed in taking write_lock() and allocating a new txn. After that all of the process will be updating the t_max_wait (max transaction wait time). This can be done via below method w/o taking any locks using atomic cmpxchg. For more details refer [1] new = get_new_val(); old = READ_ONCE(ptr->max_val); while (old < new) old = cmpxchg(&ptr->max_val, old, new); [1]: https://lwn.net/Articles/849237/ Suggested-by: Jan Kara Signed-off-by: Ritesh Harjani Reviewed-by: Jan Kara Link: https://lore.kernel.org/r/d89e599658b4a1f3893a48c6feded200073037fc.1644992076.git.riteshh@linux.ibm.com Signed-off-by: Theodore Ts'o --- include/linux/jbd2.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 9c3ada74ffb1..a787872e1e86 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -554,9 +554,6 @@ struct transaction_chp_stats_s { * ->j_list_lock * * j_state_lock - * ->t_handle_lock - * - * j_state_lock * ->j_list_lock (journal_unmap_buffer) * */ -- cgit From 5e187189ec324f78035d33a4bc123a9c4ca6f3e3 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Sat, 26 Feb 2022 12:18:29 +0800 Subject: net: ip: add skb drop reasons for ip egress path Replace kfree_skb() which is used in the packet egress path of IP layer with kfree_skb_reason(). Functions that are involved include: __ip_queue_xmit() ip_finish_output() ip_mc_finish_output() ip6_output() ip6_finish_output() ip6_finish_output2() Following new drop reasons are introduced: SKB_DROP_REASON_IP_OUTNOROUTES SKB_DROP_REASON_BPF_CGROUP_EGRESS SKB_DROP_REASON_IPV6DISABLED SKB_DROP_REASON_NEIGH_CREATEFAIL Reviewed-by: Mengen Sun Reviewed-by: Hao Peng Signed-off-by: Menglong Dong Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/skbuff.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 31be38078918..62b4bed1b7bc 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -380,6 +380,15 @@ enum skb_drop_reason { * the ofo queue, corresponding to * LINUX_MIB_TCPOFOMERGE */ + SKB_DROP_REASON_IP_OUTNOROUTES, /* route lookup failed */ + SKB_DROP_REASON_BPF_CGROUP_EGRESS, /* dropped by + * BPF_PROG_TYPE_CGROUP_SKB + * eBPF program + */ + SKB_DROP_REASON_IPV6DISABLED, /* IPv6 is disabled on the device */ + SKB_DROP_REASON_NEIGH_CREATEFAIL, /* failed to create neigh + * entry + */ SKB_DROP_REASON_MAX, }; -- cgit From a5736edda10ca2ba075606baad1dda3f2426766d Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Sat, 26 Feb 2022 12:18:30 +0800 Subject: net: neigh: use kfree_skb_reason() for __neigh_event_send() Replace kfree_skb() used in __neigh_event_send() with kfree_skb_reason(). Following drop reasons are added: SKB_DROP_REASON_NEIGH_FAILED SKB_DROP_REASON_NEIGH_QUEUEFULL SKB_DROP_REASON_NEIGH_DEAD The first two reasons above should be the hot path that skb drops in neighbour subsystem. Reviewed-by: Mengen Sun Reviewed-by: Hao Peng Signed-off-by: Menglong Dong Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/skbuff.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 62b4bed1b7bc..d67941f78b92 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -389,6 +389,11 @@ enum skb_drop_reason { SKB_DROP_REASON_NEIGH_CREATEFAIL, /* failed to create neigh * entry */ + SKB_DROP_REASON_NEIGH_FAILED, /* neigh entry in failed state */ + SKB_DROP_REASON_NEIGH_QUEUEFULL, /* arp_queue for neigh + * entry is full + */ + SKB_DROP_REASON_NEIGH_DEAD, /* neigh entry is dead */ SKB_DROP_REASON_MAX, }; -- cgit From 21ca9fb62d4688da41825e0f05d8e7e26afc69d6 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Thu, 24 Feb 2022 16:20:10 +0200 Subject: PCI/IOV: Add pci_iov_vf_id() to get VF index The PCI core uses the VF index internally, often called the vf_id, during the setup of the VF, eg pci_iov_add_virtfn(). This index is needed for device drivers that implement live migration for their internal operations that configure/control their VFs. Specifically, mlx5_vfio_pci driver that is introduced in coming patches from this series needs it and not the bus/device/function which is exposed today. Add pci_iov_vf_id() which computes the vf_id by reversing the math that was used to create the bus/device/function. Link: https://lore.kernel.org/all/20220224142024.147653-2-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe Acked-by: Bjorn Helgaas Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- include/linux/pci.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pci.h b/include/linux/pci.h index 8253a5413d7c..3d4ff7b35ad1 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2166,7 +2166,7 @@ void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar); #ifdef CONFIG_PCI_IOV int pci_iov_virtfn_bus(struct pci_dev *dev, int id); int pci_iov_virtfn_devfn(struct pci_dev *dev, int id); - +int pci_iov_vf_id(struct pci_dev *dev); int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn); void pci_disable_sriov(struct pci_dev *dev); @@ -2194,6 +2194,12 @@ static inline int pci_iov_virtfn_devfn(struct pci_dev *dev, int id) { return -ENOSYS; } + +static inline int pci_iov_vf_id(struct pci_dev *dev) +{ + return -ENOSYS; +} + static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn) { return -ENODEV; } -- cgit From a7e9f240c0da4fb73a353c603daf4beba04c6ecf Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Thu, 24 Feb 2022 16:20:13 +0200 Subject: PCI/IOV: Add pci_iov_get_pf_drvdata() to allow VF reaching the drvdata of a PF There are some cases where a SR-IOV VF driver will need to reach into and interact with the PF driver. This requires accessing the drvdata of the PF. Provide a function pci_iov_get_pf_drvdata() to return this PF drvdata in a safe way. Normally accessing a drvdata of a foreign struct device would be done using the device_lock() to protect against device driver probe()/remove() races. However, due to the design of pci_enable_sriov() this will result in a ABBA deadlock on the device_lock as the PF's device_lock is held during PF sriov_configure() while calling pci_enable_sriov() which in turn holds the VF's device_lock while calling VF probe(), and similarly for remove. This means the VF driver can never obtain the PF's device_lock. Instead use the implicit locking created by pci_enable/disable_sriov(). A VF driver can access its PF drvdata only while its own driver is attached, and the PF driver can control access to its own drvdata based on when it calls pci_enable/disable_sriov(). To use this API the PF driver will setup the PF drvdata in the probe() function. pci_enable_sriov() is only called from sriov_configure() which cannot happen until probe() completes, ensuring no VF races with drvdata setup. For removal, the PF driver must call pci_disable_sriov() in its remove function before destroying any of the drvdata. This ensures that all VF drivers are unbound before returning, fencing concurrent access to the drvdata. The introduction of a new function to do this access makes clear the special locking scheme and the documents the requirements on the PF/VF drivers using this. Link: https://lore.kernel.org/all/20220224142024.147653-5-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe Acked-by: Bjorn Helgaas Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- include/linux/pci.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pci.h b/include/linux/pci.h index 3d4ff7b35ad1..60d423d8f0c4 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2167,6 +2167,7 @@ void __iomem *pci_ioremap_wc_bar(struct pci_dev *pdev, int bar); int pci_iov_virtfn_bus(struct pci_dev *dev, int id); int pci_iov_virtfn_devfn(struct pci_dev *dev, int id); int pci_iov_vf_id(struct pci_dev *dev); +void *pci_iov_get_pf_drvdata(struct pci_dev *dev, struct pci_driver *pf_driver); int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn); void pci_disable_sriov(struct pci_dev *dev); @@ -2200,6 +2201,12 @@ static inline int pci_iov_vf_id(struct pci_dev *dev) return -ENOSYS; } +static inline void *pci_iov_get_pf_drvdata(struct pci_dev *dev, + struct pci_driver *pf_driver) +{ + return ERR_PTR(-EINVAL); +} + static inline int pci_enable_sriov(struct pci_dev *dev, int nr_virtfn) { return -ENODEV; } -- cgit From 1695b97b291e79295bf5c26cba5ecc4b443d8ac7 Mon Sep 17 00:00:00 2001 From: Yishai Hadas Date: Thu, 24 Feb 2022 16:20:14 +0200 Subject: net/mlx5: Expose APIs to get/put the mlx5 core device Expose an API to get the mlx5 core device from a given VF PCI device if mlx5_core is its driver. Upon the get API we stay with the intf_state_mutex locked to make sure that the device can't be gone/unloaded till the caller will complete its job over the device, this expects to be for a short period of time for any flow that the lock is taken. Upon the put API we unlock the intf_state_mutex. The use case for those APIs is the migration flow of a VF over VFIO PCI. In that case the VF doesn't ride on mlx5_core, because the device is driving *two* different PCI devices, the PF owned by mlx5_core and the VF owned by the vfio driver. The mlx5_core of the PF is accessed only during the narrow window of the VF's ioctl that requires its services. This allows the PF driver to be more independent of the VF driver, so long as it doesn't reset the FW. Link: https://lore.kernel.org/all/20220224142024.147653-6-yishaih@nvidia.com Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- include/linux/mlx5/driver.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 78655d8d13a7..319322a8ff94 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1143,6 +1143,9 @@ int mlx5_dm_sw_icm_alloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type, int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type, u64 length, u16 uid, phys_addr_t addr, u32 obj_id); +struct mlx5_core_dev *mlx5_vf_get_core_dev(struct pci_dev *pdev); +void mlx5_vf_put_core_dev(struct mlx5_core_dev *mdev); + #ifdef CONFIG_MLX5_CORE_IPOIB struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev, struct ib_device *ibdev, -- cgit From adfdaff3d14fe819a0420d81788f7ebbcd954940 Mon Sep 17 00:00:00 2001 From: Yishai Hadas Date: Thu, 24 Feb 2022 16:20:15 +0200 Subject: net/mlx5: Introduce migration bits and structures Introduce migration IFC related stuff to enable migration commands. Link: https://lore.kernel.org/all/20220224142024.147653-7-yishaih@nvidia.com Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- include/linux/mlx5/mlx5_ifc.h | 147 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 146 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 598ac3bcc901..b1c27409c997 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -127,6 +127,11 @@ enum { MLX5_CMD_OP_QUERY_SF_PARTITION = 0x111, MLX5_CMD_OP_ALLOC_SF = 0x113, MLX5_CMD_OP_DEALLOC_SF = 0x114, + MLX5_CMD_OP_SUSPEND_VHCA = 0x115, + MLX5_CMD_OP_RESUME_VHCA = 0x116, + MLX5_CMD_OP_QUERY_VHCA_MIGRATION_STATE = 0x117, + MLX5_CMD_OP_SAVE_VHCA_STATE = 0x118, + MLX5_CMD_OP_LOAD_VHCA_STATE = 0x119, MLX5_CMD_OP_CREATE_MKEY = 0x200, MLX5_CMD_OP_QUERY_MKEY = 0x201, MLX5_CMD_OP_DESTROY_MKEY = 0x202, @@ -1757,7 +1762,9 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 reserved_at_682[0x1]; u8 log_max_sf[0x5]; u8 apu[0x1]; - u8 reserved_at_689[0x7]; + u8 reserved_at_689[0x4]; + u8 migration[0x1]; + u8 reserved_at_68e[0x2]; u8 log_min_sf_size[0x8]; u8 max_num_sf_partitions[0x8]; @@ -11519,4 +11526,142 @@ enum { MLX5_MTT_PERM_RW = MLX5_MTT_PERM_READ | MLX5_MTT_PERM_WRITE, }; +enum { + MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_INITIATOR = 0x0, + MLX5_SUSPEND_VHCA_IN_OP_MOD_SUSPEND_RESPONDER = 0x1, +}; + +struct mlx5_ifc_suspend_vhca_in_bits { + u8 opcode[0x10]; + u8 uid[0x10]; + + u8 reserved_at_20[0x10]; + u8 op_mod[0x10]; + + u8 reserved_at_40[0x10]; + u8 vhca_id[0x10]; + + u8 reserved_at_60[0x20]; +}; + +struct mlx5_ifc_suspend_vhca_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; + + u8 syndrome[0x20]; + + u8 reserved_at_40[0x40]; +}; + +enum { + MLX5_RESUME_VHCA_IN_OP_MOD_RESUME_RESPONDER = 0x0, + MLX5_RESUME_VHCA_IN_OP_MOD_RESUME_INITIATOR = 0x1, +}; + +struct mlx5_ifc_resume_vhca_in_bits { + u8 opcode[0x10]; + u8 uid[0x10]; + + u8 reserved_at_20[0x10]; + u8 op_mod[0x10]; + + u8 reserved_at_40[0x10]; + u8 vhca_id[0x10]; + + u8 reserved_at_60[0x20]; +}; + +struct mlx5_ifc_resume_vhca_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; + + u8 syndrome[0x20]; + + u8 reserved_at_40[0x40]; +}; + +struct mlx5_ifc_query_vhca_migration_state_in_bits { + u8 opcode[0x10]; + u8 uid[0x10]; + + u8 reserved_at_20[0x10]; + u8 op_mod[0x10]; + + u8 reserved_at_40[0x10]; + u8 vhca_id[0x10]; + + u8 reserved_at_60[0x20]; +}; + +struct mlx5_ifc_query_vhca_migration_state_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; + + u8 syndrome[0x20]; + + u8 reserved_at_40[0x40]; + + u8 required_umem_size[0x20]; + + u8 reserved_at_a0[0x160]; +}; + +struct mlx5_ifc_save_vhca_state_in_bits { + u8 opcode[0x10]; + u8 uid[0x10]; + + u8 reserved_at_20[0x10]; + u8 op_mod[0x10]; + + u8 reserved_at_40[0x10]; + u8 vhca_id[0x10]; + + u8 reserved_at_60[0x20]; + + u8 va[0x40]; + + u8 mkey[0x20]; + + u8 size[0x20]; +}; + +struct mlx5_ifc_save_vhca_state_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; + + u8 syndrome[0x20]; + + u8 actual_image_size[0x20]; + + u8 reserved_at_60[0x20]; +}; + +struct mlx5_ifc_load_vhca_state_in_bits { + u8 opcode[0x10]; + u8 uid[0x10]; + + u8 reserved_at_20[0x10]; + u8 op_mod[0x10]; + + u8 reserved_at_40[0x10]; + u8 vhca_id[0x10]; + + u8 reserved_at_60[0x20]; + + u8 va[0x40]; + + u8 mkey[0x20]; + + u8 size[0x20]; +}; + +struct mlx5_ifc_load_vhca_state_out_bits { + u8 status[0x8]; + u8 reserved_at_8[0x18]; + + u8 syndrome[0x20]; + + u8 reserved_at_40[0x40]; +}; + #endif /* MLX5_IFC_H */ -- cgit From e8eb9e32999dc5995b19d5141f8ebb38f69696fc Mon Sep 17 00:00:00 2001 From: Dimitris Michailidis Date: Thu, 24 Feb 2022 18:58:55 -0800 Subject: PCI: Add Fungible Vendor ID to pci_ids.h Cc: Bjorn Helgaas Cc: linux-pci@vger.kernel.org Signed-off-by: Dimitris Michailidis Acked-by: Bjorn Helgaas Signed-off-by: David S. Miller --- include/linux/pci_ids.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index aad54c666407..c7e6f2043c7d 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -2561,6 +2561,8 @@ #define PCI_VENDOR_ID_HYGON 0x1d94 +#define PCI_VENDOR_ID_FUNGIBLE 0x1dad + #define PCI_VENDOR_ID_HXT 0x1dbf #define PCI_VENDOR_ID_TEKRAM 0x1de1 -- cgit From 91495f21fcec3a889d6a9c21e6df16c4c15f2184 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Fri, 25 Feb 2022 11:22:16 +0200 Subject: net: dsa: tag_8021q: replace the SVL bridging with VLAN-unaware IVL bridging For VLAN-unaware bridging, tag_8021q uses something perhaps a bit too tied with the sja1105 switch: each port uses the same pvid which is also used for standalone operation (a unique one from which the source port and device ID can be retrieved when packets from that port are forwarded to the CPU). Since each port has a unique pvid when performing autonomous forwarding, the switch must be configured for Shared VLAN Learning (SVL) such that the VLAN ID itself is ignored when performing FDB lookups. Without SVL, packets would always be flooded, since FDB lookup in the source port's VLAN would never find any entry. First of all, to make tag_8021q more palatable to switches which might not support Shared VLAN Learning, let's just use a common VLAN for all ports that are under the same bridge. Secondly, using Shared VLAN Learning means that FDB isolation can never be enforced. But if all ports under the same VLAN-unaware bridge share the same VLAN ID, it can. The disadvantage is that the CPU port can no longer perform precise source port identification for these packets. But at least we have a mechanism which has proven to be adequate for that situation: imprecise RX (dsa_find_designated_bridge_port_by_vid), which is what we use for termination on VLAN-aware bridges. The VLAN ID that VLAN-unaware bridges will use with tag_8021q is the same one as we were previously using for imprecise TX (bridge TX forwarding offload). It is already allocated, it is just a matter of using it. Note that because now all ports under the same bridge share the same VLAN, the complexity of performing a tag_8021q bridge join decreases dramatically. We no longer have to install the RX VLAN of a newly joining port into the port membership of the existing bridge ports. The newly joining port just becomes a member of the VLAN corresponding to that bridge, and the other ports are already members of it from when they joined the bridge themselves. So forwarding works properly. This means that we can unhook dsa_tag_8021q_bridge_{join,leave} from the cross-chip notifier level dsa_switch_bridge_{join,leave}. We can put these calls directly into the sja1105 driver. With this new mode of operation, a port controlled by tag_8021q can have two pvids whereas before it could only have one. The pvid for standalone operation is different from the pvid used for VLAN-unaware bridging. This is done, again, so that FDB isolation can be enforced. Let tag_8021q manage this by deleting the standalone pvid when a port joins a bridge, and restoring it when it leaves it. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/linux/dsa/8021q.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dsa/8021q.h b/include/linux/dsa/8021q.h index 939a1beaddf7..f47f227baa27 100644 --- a/include/linux/dsa/8021q.h +++ b/include/linux/dsa/8021q.h @@ -32,17 +32,17 @@ int dsa_tag_8021q_register(struct dsa_switch *ds, __be16 proto); void dsa_tag_8021q_unregister(struct dsa_switch *ds); +int dsa_tag_8021q_bridge_join(struct dsa_switch *ds, int port, + struct dsa_bridge bridge); + +void dsa_tag_8021q_bridge_leave(struct dsa_switch *ds, int port, + struct dsa_bridge bridge); + struct sk_buff *dsa_8021q_xmit(struct sk_buff *skb, struct net_device *netdev, u16 tpid, u16 tci); void dsa_8021q_rcv(struct sk_buff *skb, int *source_port, int *switch_id); -int dsa_tag_8021q_bridge_tx_fwd_offload(struct dsa_switch *ds, int port, - struct dsa_bridge bridge); - -void dsa_tag_8021q_bridge_tx_fwd_unoffload(struct dsa_switch *ds, int port, - struct dsa_bridge bridge); - u16 dsa_8021q_bridge_tx_fwd_offload_vid(unsigned int bridge_num); u16 dsa_tag_8021q_tx_vid(const struct dsa_port *dp); -- cgit From d7f9787a763f35225287aedb9364c972ae128d18 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Fri, 25 Feb 2022 11:22:17 +0200 Subject: net: dsa: tag_8021q: add support for imprecise RX based on the VBID The sja1105 switch can't populate the PORT field of the tag_8021q header when sending a frame to the CPU with a non-zero VBID. Similar to dsa_find_designated_bridge_port_by_vid() which performs imprecise RX for VLAN-aware bridges, let's introduce a helper in tag_8021q for performing imprecise RX based on the VLAN that it has allocated for a VLAN-unaware bridge. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/linux/dsa/8021q.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dsa/8021q.h b/include/linux/dsa/8021q.h index f47f227baa27..92f5243b841e 100644 --- a/include/linux/dsa/8021q.h +++ b/include/linux/dsa/8021q.h @@ -41,7 +41,11 @@ void dsa_tag_8021q_bridge_leave(struct dsa_switch *ds, int port, struct sk_buff *dsa_8021q_xmit(struct sk_buff *skb, struct net_device *netdev, u16 tpid, u16 tci); -void dsa_8021q_rcv(struct sk_buff *skb, int *source_port, int *switch_id); +void dsa_8021q_rcv(struct sk_buff *skb, int *source_port, int *switch_id, + int *vbid); + +struct net_device *dsa_tag_8021q_find_port_by_vbid(struct net_device *master, + int vbid); u16 dsa_8021q_bridge_tx_fwd_offload_vid(unsigned int bridge_num); -- cgit From 04b67e18ce5b29785578397f6785f28f512d64aa Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Fri, 25 Feb 2022 11:22:20 +0200 Subject: net: dsa: tag_8021q: merge RX and TX VLANs In the old Shared VLAN Learning mode of operation that tag_8021q previously used for forwarding, we needed to have distinct concepts for an RX and a TX VLAN. An RX VLAN could be installed on all ports that were members of a given bridge, so that autonomous forwarding could still work, while a TX VLAN was dedicated for precise packet steering, so it just contained the CPU port and one egress port. Now that tag_8021q uses Independent VLAN Learning and imprecise RX/TX all over, those lines have been blurred and we no longer have the need to do precise TX towards a port that is in a bridge. As for standalone ports, it is fine to use the same VLAN ID for both RX and TX. This patch changes the tag_8021q format by shifting the VLAN range it reserves, and halving it. Previously, our DIR bits were encoding the VLAN direction (RX/TX) and were set to either 1 or 2. This meant that tag_8021q reserved 2K VLANs, or 50% of the available range. Change the DIR bits to a hardcoded value of 3 now, which makes tag_8021q reserve only 1K VLANs, and a different range now (the last 1K). This is done so that we leave the old format in place in case we need to return to it. In terms of code, the vid_is_dsa_8021q_rxvlan and vid_is_dsa_8021q_txvlan functions go away. Any vid_is_dsa_8021q is both a TX and an RX VLAN, and they are no longer distinct. For example, felix which did different things for different VLAN types, now needs to handle the RX and the TX logic for the same VLAN. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/linux/dsa/8021q.h | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dsa/8021q.h b/include/linux/dsa/8021q.h index 92f5243b841e..b4e2862633f6 100644 --- a/include/linux/dsa/8021q.h +++ b/include/linux/dsa/8021q.h @@ -49,18 +49,12 @@ struct net_device *dsa_tag_8021q_find_port_by_vbid(struct net_device *master, u16 dsa_8021q_bridge_tx_fwd_offload_vid(unsigned int bridge_num); -u16 dsa_tag_8021q_tx_vid(const struct dsa_port *dp); - -u16 dsa_tag_8021q_rx_vid(const struct dsa_port *dp); +u16 dsa_tag_8021q_standalone_vid(const struct dsa_port *dp); int dsa_8021q_rx_switch_id(u16 vid); int dsa_8021q_rx_source_port(u16 vid); -bool vid_is_dsa_8021q_rxvlan(u16 vid); - -bool vid_is_dsa_8021q_txvlan(u16 vid); - bool vid_is_dsa_8021q(u16 vid); #endif /* _NET_DSA_8021Q_H */ -- cgit From b6362bdf750b4ba266d4a10156174ec52460e73d Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Fri, 25 Feb 2022 11:22:21 +0200 Subject: net: dsa: tag_8021q: rename dsa_8021q_bridge_tx_fwd_offload_vid The dsa_8021q_bridge_tx_fwd_offload_vid is no longer used just for bridge TX forwarding offload, it is the private VLAN reserved for VLAN-unaware bridging in a way that is compatible with FDB isolation. So just rename it dsa_tag_8021q_bridge_vid. Signed-off-by: Vladimir Oltean Signed-off-by: David S. Miller --- include/linux/dsa/8021q.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dsa/8021q.h b/include/linux/dsa/8021q.h index b4e2862633f6..3ed117e299ec 100644 --- a/include/linux/dsa/8021q.h +++ b/include/linux/dsa/8021q.h @@ -47,7 +47,7 @@ void dsa_8021q_rcv(struct sk_buff *skb, int *source_port, int *switch_id, struct net_device *dsa_tag_8021q_find_port_by_vbid(struct net_device *master, int vbid); -u16 dsa_8021q_bridge_tx_fwd_offload_vid(unsigned int bridge_num); +u16 dsa_tag_8021q_bridge_vid(unsigned int bridge_num); u16 dsa_tag_8021q_standalone_vid(const struct dsa_port *dp); -- cgit From bc437f7515f5e14aec9f2801412d9ea48116a97d Mon Sep 17 00:00:00 2001 From: Liam Beguin Date: Sat, 12 Feb 2022 21:57:30 -0500 Subject: iio: afe: rescale: expose scale processing function In preparation for the addition of kunit tests, expose the logic responsible for combining channel scales. Signed-off-by: Liam Beguin Reviewed-by: Peter Rosin Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220213025739.2561834-2-liambeguin@gmail.com Signed-off-by: Jonathan Cameron --- include/linux/iio/afe/rescale.h | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) create mode 100644 include/linux/iio/afe/rescale.h (limited to 'include/linux') diff --git a/include/linux/iio/afe/rescale.h b/include/linux/iio/afe/rescale.h new file mode 100644 index 000000000000..8a2eb34af327 --- /dev/null +++ b/include/linux/iio/afe/rescale.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2018 Axentia Technologies AB + */ + +#ifndef __IIO_RESCALE_H__ +#define __IIO_RESCALE_H__ + +#include +#include + +struct device; +struct rescale; + +struct rescale_cfg { + enum iio_chan_type type; + int (*props)(struct device *dev, struct rescale *rescale); +}; + +struct rescale { + const struct rescale_cfg *cfg; + struct iio_channel *source; + struct iio_chan_spec chan; + struct iio_chan_spec_ext_info *ext_info; + bool chan_processed; + s32 numerator; + s32 denominator; +}; + +int rescale_process_scale(struct rescale *rescale, int scale_type, + int *val, int *val2); +#endif /* __IIO_RESCALE_H__ */ -- cgit From a29c3283653b80b916c5ca5292c5d36415e38e92 Mon Sep 17 00:00:00 2001 From: Liam Beguin Date: Sat, 12 Feb 2022 21:57:32 -0500 Subject: iio: afe: rescale: add offset support This is a preparatory change required for the addition of temperature sensing front ends. Signed-off-by: Liam Beguin Reviewed-by: Peter Rosin Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220213025739.2561834-4-liambeguin@gmail.com Signed-off-by: Jonathan Cameron --- include/linux/iio/afe/rescale.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iio/afe/rescale.h b/include/linux/iio/afe/rescale.h index 8a2eb34af327..6eecb435488f 100644 --- a/include/linux/iio/afe/rescale.h +++ b/include/linux/iio/afe/rescale.h @@ -25,8 +25,12 @@ struct rescale { bool chan_processed; s32 numerator; s32 denominator; + s32 offset; }; int rescale_process_scale(struct rescale *rescale, int scale_type, int *val, int *val2); +int rescale_process_offset(struct rescale *rescale, int scale_type, + int scale, int scale2, int schan_off, + int *val, int *val2); #endif /* __IIO_RESCALE_H__ */ -- cgit From e75d16e58467c5703821e12536c7dc438f3c425d Mon Sep 17 00:00:00 2001 From: Armin Wolf Date: Thu, 24 Feb 2022 07:12:09 +0100 Subject: hwmon: (core) Add support for pwm auto channels attribute pwm[1-*]_auto_channels_temp is documented as an official hwmon sysfs attribute, yet there is no support for it in the new with_info-API. Fix that. Signed-off-by: Armin Wolf Link: https://lore.kernel.org/r/20220224061210.16452-2-W_Armin@gmx.de Signed-off-by: Guenter Roeck --- include/linux/hwmon.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hwmon.h b/include/linux/hwmon.h index fad1f1df26df..eba380b76d15 100644 --- a/include/linux/hwmon.h +++ b/include/linux/hwmon.h @@ -332,12 +332,14 @@ enum hwmon_pwm_attributes { hwmon_pwm_enable, hwmon_pwm_mode, hwmon_pwm_freq, + hwmon_pwm_auto_channels_temp, }; #define HWMON_PWM_INPUT BIT(hwmon_pwm_input) #define HWMON_PWM_ENABLE BIT(hwmon_pwm_enable) #define HWMON_PWM_MODE BIT(hwmon_pwm_mode) #define HWMON_PWM_FREQ BIT(hwmon_pwm_freq) +#define HWMON_PWM_AUTO_CHANNELS_TEMP BIT(hwmon_pwm_auto_channels_temp) enum hwmon_intrusion_attributes { hwmon_intrusion_alarm, -- cgit From d72ce7d324786257410bec6b36e3756647dd76fd Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Sat, 26 Feb 2022 00:27:55 +0100 Subject: power: supply: ab8500: Standardize maintenance charging Maintenance charging is the phase of keeping up the charge after the battery has charged fully using CC/CV charging. This can be done in many successive phases and is usually done with a slightly lower constant voltage than CV, and a slightly lower allowed current. Add an array of maintenance charging points each with a current, voltage and safety timer, and add helper functions to use these. Migrate the AB8500 code over. This is used in several Samsung products using the AB8500 and these batteries and their complete parameters will be added later as full examples, but the default battery in the AB8500 code serves as a reasonable example so far. Reviewed-by: Matti Vaittinen Signed-off-by: Linus Walleij Signed-off-by: Sebastian Reichel --- include/linux/power_supply.h | 64 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) (limited to 'include/linux') diff --git a/include/linux/power_supply.h b/include/linux/power_supply.h index c135196aa9d1..8ced6550caa7 100644 --- a/include/linux/power_supply.h +++ b/include/linux/power_supply.h @@ -349,6 +349,52 @@ struct power_supply_resistance_temp_table { int resistance; /* internal resistance percent */ }; +/** + * struct power_supply_maintenance_charge_table - setting for maintenace charging + * @charge_current_max_ua: maintenance charging current that is used to keep + * the charge of the battery full as current is consumed after full charging. + * The corresponding charge_voltage_max_uv is used as a safeguard: when we + * reach this voltage the maintenance charging current is turned off. It is + * turned back on if we fall below this voltage. + * @charge_voltage_max_uv: maintenance charging voltage that is usually a bit + * lower than the constant_charge_voltage_max_uv. We can apply this settings + * charge_current_max_ua until we get back up to this voltage. + * @safety_timer_minutes: maintenance charging safety timer, with an expiry + * time in minutes. We will only use maintenance charging in this setting + * for a certain amount of time, then we will first move to the next + * maintenance charge current and voltage pair in respective array and wait + * for the next safety timer timeout, or, if we reached the last maintencance + * charging setting, disable charging until we reach + * charge_restart_voltage_uv and restart ordinary CC/CV charging from there. + * These timers should be chosen to align with the typical discharge curve + * for the battery. + * + * When the main CC/CV charging is complete the battery can optionally be + * maintenance charged at the voltages from this table: a table of settings is + * traversed using a slightly lower current and voltage than what is used for + * CC/CV charging. The maintenance charging will for safety reasons not go on + * indefinately: we lower the current and voltage with successive maintenance + * settings, then disable charging completely after we reach the last one, + * and after that we do not restart charging until we reach + * charge_restart_voltage_uv (see struct power_supply_battery_info) and restart + * ordinary CC/CV charging from there. + * + * As an example, a Samsung EB425161LA Lithium-Ion battery is CC/CV charged + * at 900mA to 4340mV, then maintenance charged at 600mA and 4150mV for + * 60 hours, then maintenance charged at 600mA and 4100mV for 200 hours. + * After this the charge cycle is restarted waiting for + * charge_restart_voltage_uv. + * + * For most mobile electronics this type of maintenance charging is enough for + * the user to disconnect the device and make use of it before both maintenance + * charging cycles are complete. + */ +struct power_supply_maintenance_charge_table { + int charge_current_max_ua; + int charge_voltage_max_uv; + int charge_safety_timer_minutes; +}; + #define POWER_SUPPLY_OCV_TEMP_MAX 20 /** @@ -394,6 +440,10 @@ struct power_supply_resistance_temp_table { * @constant_charge_voltage_max_uv: voltage in microvolts signifying the end of * the CC (constant current) charging phase and the beginning of the CV * (constant voltage) charging phase. + * @maintenance_charge: an array of maintenance charging settings to be used + * after the main CC/CV charging phase is complete. + * @maintenance_charge_size: the number of maintenance charging settings in + * maintenance_charge. * @factory_internal_resistance_uohm: the internal resistance of the battery * at fabrication time, expressed in microohms. This resistance will vary * depending on the lifetime and charge of the battery, so this is just a @@ -543,6 +593,8 @@ struct power_supply_battery_info { int overvoltage_limit_uv; int constant_charge_current_max_ua; int constant_charge_voltage_max_uv; + struct power_supply_maintenance_charge_table *maintenance_charge; + int maintenance_charge_size; int factory_internal_resistance_uohm; int ocv_temp[POWER_SUPPLY_OCV_TEMP_MAX]; int temp_ambient_alert_min; @@ -596,6 +648,8 @@ extern int power_supply_batinfo_ocv2cap(struct power_supply_battery_info *info, extern int power_supply_temp2resist_simple(struct power_supply_resistance_temp_table *table, int table_len, int temp); +extern struct power_supply_maintenance_charge_table * +power_supply_get_maintenance_charging_setting(struct power_supply_battery_info *info, int index); extern void power_supply_changed(struct power_supply *psy); extern int power_supply_am_i_supplied(struct power_supply *psy); int power_supply_get_property_from_supplier(struct power_supply *psy, @@ -603,6 +657,16 @@ int power_supply_get_property_from_supplier(struct power_supply *psy, union power_supply_propval *val); extern int power_supply_set_battery_charged(struct power_supply *psy); +static inline bool +power_supply_supports_maintenance_charging(struct power_supply_battery_info *info) +{ + struct power_supply_maintenance_charge_table *mt; + + mt = power_supply_get_maintenance_charging_setting(info, 0); + + return (mt != NULL); +} + #ifdef CONFIG_POWER_SUPPLY extern int power_supply_is_system_supplied(void); #else -- cgit From 0e8b903b522b5a3cb473035cea085d396dd7150a Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Sat, 26 Feb 2022 00:27:56 +0100 Subject: power: supply: ab8500: Standardize alert mode charging The AB8500 code is using a special current and voltage setting when the battery is in "alert mode", i.e. when it is starting to go outside normal operating conditions so it is too cold or too hot. This makes sense as a way for the charging algorithm to deal with hostile environments. Add the needed members to the struct power_supply_battery_info, and switch the AB8500 charging code over to using this. Reviewed-by: Matti Vaittineen Signed-off-by: Linus Walleij Signed-off-by: Sebastian Reichel --- include/linux/power_supply.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'include/linux') diff --git a/include/linux/power_supply.h b/include/linux/power_supply.h index 8ced6550caa7..f8601598d3d3 100644 --- a/include/linux/power_supply.h +++ b/include/linux/power_supply.h @@ -444,6 +444,19 @@ struct power_supply_maintenance_charge_table { * after the main CC/CV charging phase is complete. * @maintenance_charge_size: the number of maintenance charging settings in * maintenance_charge. + * @alert_low_temp_charge_current_ua: The charging current to use if the battery + * enters low alert temperature, i.e. if the internal temperature is between + * temp_alert_min and temp_min. No matter the charging phase, this + * and alert_high_temp_charge_voltage_uv will be applied. + * @alert_low_temp_charge_voltage_uv: Same as alert_low_temp_charge_current_ua, + * but for the charging voltage. + * @alert_high_temp_charge_current_ua: The charging current to use if the + * battery enters high alert temperature, i.e. if the internal temperature is + * between temp_alert_max and temp_max. No matter the charging phase, this + * and alert_high_temp_charge_voltage_uv will be applied, usually lowering + * the charging current as an evasive manouver. + * @alert_high_temp_charge_voltage_uv: Same as + * alert_high_temp_charge_current_ua, but for the charging voltage. * @factory_internal_resistance_uohm: the internal resistance of the battery * at fabrication time, expressed in microohms. This resistance will vary * depending on the lifetime and charge of the battery, so this is just a @@ -595,6 +608,10 @@ struct power_supply_battery_info { int constant_charge_voltage_max_uv; struct power_supply_maintenance_charge_table *maintenance_charge; int maintenance_charge_size; + int alert_low_temp_charge_current_ua; + int alert_low_temp_charge_voltage_uv; + int alert_high_temp_charge_current_ua; + int alert_high_temp_charge_voltage_uv; int factory_internal_resistance_uohm; int ocv_temp[POWER_SUPPLY_OCV_TEMP_MAX]; int temp_ambient_alert_min; -- cgit From 1f918e0fe43ec41d906e2cf96b80b15451fed7ba Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Sat, 26 Feb 2022 00:27:57 +0100 Subject: power: supply: ab8500: Standardize BTI resistance The Battery Type Indicator (BTI) resistor is a resistor mounted between a special terminal on the battery and ground. By sending a fixed current (such as 7mA) through this resistor and measuring the voltage over it, the resistance can be determined, and this verifies the battery type. Typical side view of the battery: o o o GND BTI +3.8V Typical example of the electrical layout: +3.8 V BTI | | | + | _______ [ ] 7kOhm ___ | | | | | GND GND By verifying this resistance before attempting to charge the battery we add an additional level of security. In some systems this is used for plug-and-play of batteries with different capacity. In other cases, this is merely used to verify that the right type of battery is connected, if several batteries have the same physical shape and can be plugged into the same slot. Sometimes this is just a surplus security mechanism. Nokia and Samsung among many other vendors are known to use these BTI resistors. Add the BTI properties to struct power_supply_battery_info and switch the AB8500 charger code over to using it. Signed-off-by: Linus Walleij Signed-off-by: Sebastian Reichel --- include/linux/power_supply.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/power_supply.h b/include/linux/power_supply.h index f8601598d3d3..7fdc03cf2285 100644 --- a/include/linux/power_supply.h +++ b/include/linux/power_supply.h @@ -498,6 +498,14 @@ struct power_supply_maintenance_charge_table { * by temperature: highest temperature with lowest resistance first, lowest * temperature with highest resistance last. * @resist_table_size: the number of items in the resist_table. + * @bti_resistance_ohm: The Battery Type Indicator (BIT) nominal resistance + * in ohms for this battery, if an identification resistor is mounted + * between a third battery terminal and ground. This scheme is used by a lot + * of mobile device batteries. + * @bti_resistance_tolerance: The tolerance in percent of the BTI resistance, + * for example 10 for +/- 10%, if the bti_resistance is set to 7000 and the + * tolerance is 10% we will detect a proper battery if the BTI resistance + * is between 6300 and 7700 Ohm. * * This is the recommended struct to manage static battery parameters, * populated by power_supply_get_battery_info(). Most platform drivers should @@ -624,6 +632,8 @@ struct power_supply_battery_info { int ocv_table_size[POWER_SUPPLY_OCV_TEMP_MAX]; struct power_supply_resistance_temp_table *resist_table; int resist_table_size; + int bti_resistance_ohm; + int bti_resistance_tolerance; }; extern struct atomic_notifier_head power_supply_notifier; @@ -667,6 +677,8 @@ power_supply_temp2resist_simple(struct power_supply_resistance_temp_table *table int table_len, int temp); extern struct power_supply_maintenance_charge_table * power_supply_get_maintenance_charging_setting(struct power_supply_battery_info *info, int index); +extern bool power_supply_battery_bti_in_range(struct power_supply_battery_info *info, + int resistance); extern void power_supply_changed(struct power_supply *psy); extern int power_supply_am_i_supplied(struct power_supply *psy); int power_supply_get_property_from_supplier(struct power_supply *psy, @@ -684,6 +696,7 @@ power_supply_supports_maintenance_charging(struct power_supply_battery_info *inf return (mt != NULL); } + #ifdef CONFIG_POWER_SUPPLY extern int power_supply_is_system_supplied(void); #else -- cgit From e9e7d165b4b0413c5b7db74515e4981a226b78a0 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Sat, 26 Feb 2022 00:27:58 +0100 Subject: power: supply: Support VBAT-to-Ri lookup tables In Samsung devices, the method used to compensate for temperature, age, load etc is by way of VBAT to Ri tables, which correlates the battery voltage under load (VBAT) to an internal resistance (Ri). Using this Ri and a measurement of the current out of the battery (IBAT) the open circuit voltage (OCV) can be calculated as: OCV = VBAT - (Ri * IBAT) The details are described in comments to struct power_supply_battery_info in the commit. Since not all batteries supply this VBAT-to-Ri data, the fallback method to use the temperature-to-Ri lookup table can also be used as a fallback. Add two helper functions to check if we have the tables needed for using power_supply_vbat2ri() or power_supply_temp2resist_simple() respectively, so capacity estimation code can choose which one to employ. Signed-off-by: Linus Walleij Signed-off-by: Sebastian Reichel --- include/linux/power_supply.h | 113 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 111 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/power_supply.h b/include/linux/power_supply.h index 7fdc03cf2285..cb380c1d9459 100644 --- a/include/linux/power_supply.h +++ b/include/linux/power_supply.h @@ -349,6 +349,11 @@ struct power_supply_resistance_temp_table { int resistance; /* internal resistance percent */ }; +struct power_supply_vbat_ri_table { + int vbat_uv; /* Battery voltage in microvolt */ + int ri_uohm; /* Internal resistance in microohm */ +}; + /** * struct power_supply_maintenance_charge_table - setting for maintenace charging * @charge_current_max_ua: maintenance charging current that is used to keep @@ -460,7 +465,14 @@ struct power_supply_maintenance_charge_table { * @factory_internal_resistance_uohm: the internal resistance of the battery * at fabrication time, expressed in microohms. This resistance will vary * depending on the lifetime and charge of the battery, so this is just a - * nominal ballpark figure. + * nominal ballpark figure. This internal resistance is given for the state + * when the battery is discharging. + * @factory_internal_resistance_charging_uohm: the internal resistance of the + * battery at fabrication time while charging, expressed in microohms. + * The charging process will affect the internal resistance of the battery + * so this value provides a better resistance under these circumstances. + * This resistance will vary depending on the lifetime and charge of the + * battery, so this is just a nominal ballpark figure. * @ocv_temp: array indicating the open circuit voltage (OCV) capacity * temperature indices. This is an array of temperatures in degrees Celsius * indicating which capacity table to use for a certain temperature, since @@ -498,6 +510,21 @@ struct power_supply_maintenance_charge_table { * by temperature: highest temperature with lowest resistance first, lowest * temperature with highest resistance last. * @resist_table_size: the number of items in the resist_table. + * @vbat2ri_discharging: this is a table that correlates Battery voltage (VBAT) + * to internal resistance (Ri). The resistance is given in microohm for the + * corresponding voltage in microvolts. The internal resistance is used to + * determine the open circuit voltage so that we can determine the capacity + * of the battery. These voltages to resistance tables apply when the battery + * is discharging. The table must be ordered descending by voltage: highest + * voltage first. + * @vbat2ri_discharging_size: the number of items in the vbat2ri_discharging + * table. + * @vbat2ri_charging: same function as vbat2ri_discharging but for the state + * when the battery is charging. Being under charge changes the battery's + * internal resistance characteristics so a separate table is needed.* + * The table must be ordered descending by voltage: highest voltage first. + * @vbat2ri_charging_size: the number of items in the vbat2ri_charging + * table. * @bti_resistance_ohm: The Battery Type Indicator (BIT) nominal resistance * in ohms for this battery, if an identification resistor is mounted * between a third battery terminal and ground. This scheme is used by a lot @@ -512,7 +539,9 @@ struct power_supply_maintenance_charge_table { * use these for consistency. * * Its field names must correspond to elements in enum power_supply_property. - * The default field value is -EINVAL. + * The default field value is -EINVAL or NULL for pointers. + * + * CC/CV CHARGING: * * The charging parameters here assume a CC/CV charging scheme. This method * is most common with Lithium Ion batteries (other methods are possible) and @@ -597,6 +626,66 @@ struct power_supply_maintenance_charge_table { * Overcharging Lithium Ion cells can be DANGEROUS and lead to fire or * explosions. * + * DETERMINING BATTERY CAPACITY: + * + * Several members of the struct deal with trying to determine the remaining + * capacity in the battery, usually as a percentage of charge. In practice + * many chargers uses a so-called fuel gauge or coloumb counter that measure + * how much charge goes into the battery and how much goes out (+/- leak + * consumption). This does not help if we do not know how much capacity the + * battery has to begin with, such as when it is first used or was taken out + * and charged in a separate charger. Therefore many capacity algorithms use + * the open circuit voltage with a look-up table to determine the rough + * capacity of the battery. The open circuit voltage can be conceptualized + * with an ideal voltage source (V) in series with an internal resistance (Ri) + * like this: + * + * +-------> IBAT >----------------+ + * | ^ | + * [ ] Ri | | + * | | VBAT | + * o <---------- | | + * +| ^ | [ ] Rload + * .---. | | | + * | V | | OCV | | + * '---' | | | + * | | | | + * GND +-------------------------------+ + * + * If we disconnect the load (here simplified as a fixed resistance Rload) + * and measure VBAT with a infinite impedance voltage meter we will get + * VBAT = OCV and this assumption is sometimes made even under load, assuming + * Rload is insignificant. However this will be of dubious quality because the + * load is rarely that small and Ri is strongly nonlinear depending on + * temperature and how much capacity is left in the battery due to the + * chemistry involved. + * + * In many practical applications we cannot just disconnect the battery from + * the load, so instead we often try to measure the instantaneous IBAT (the + * current out from the battery), estimate the Ri and thus calculate the + * voltage drop over Ri and compensate like this: + * + * OCV = VBAT - (IBAT * Ri) + * + * The tables vbat2ri_discharging and vbat2ri_charging are used to determine + * (by interpolation) the Ri from the VBAT under load. These curves are highly + * nonlinear and may need many datapoints but can be found in datasheets for + * some batteries. This gives the compensated open circuit voltage (OCV) for + * the battery even under load. Using this method will also compensate for + * temperature changes in the environment: this will also make the internal + * resistance change, and it will affect the VBAT under load, so correlating + * VBAT to Ri takes both remaining capacity and temperature into consideration. + * + * Alternatively a manufacturer can specify how the capacity of the battery + * is dependent on the battery temperature which is the main factor affecting + * Ri. As we know all checmical reactions are faster when it is warm and slower + * when it is cold. You can put in 1500mAh and only get 800mAh out before the + * voltage drops too low for example. This effect is also highly nonlinear and + * the purpose of the table resist_table: this will take a temperature and + * tell us how big percentage of Ri the specified temperature correlates to. + * Usually we have 100% of the factory_internal_resistance_uohm at 25 degrees + * Celsius. + * * The power supply class itself doesn't use this struct as of now. */ @@ -621,6 +710,7 @@ struct power_supply_battery_info { int alert_high_temp_charge_current_ua; int alert_high_temp_charge_voltage_uv; int factory_internal_resistance_uohm; + int factory_internal_resistance_charging_uohm; int ocv_temp[POWER_SUPPLY_OCV_TEMP_MAX]; int temp_ambient_alert_min; int temp_ambient_alert_max; @@ -632,6 +722,10 @@ struct power_supply_battery_info { int ocv_table_size[POWER_SUPPLY_OCV_TEMP_MAX]; struct power_supply_resistance_temp_table *resist_table; int resist_table_size; + struct power_supply_vbat_ri_table *vbat2ri_discharging; + int vbat2ri_discharging_size; + struct power_supply_vbat_ri_table *vbat2ri_charging; + int vbat2ri_charging_size; int bti_resistance_ohm; int bti_resistance_tolerance; }; @@ -675,6 +769,8 @@ extern int power_supply_batinfo_ocv2cap(struct power_supply_battery_info *info, extern int power_supply_temp2resist_simple(struct power_supply_resistance_temp_table *table, int table_len, int temp); +extern int power_supply_vbat2ri(struct power_supply_battery_info *info, + int vbat_uv, bool charging); extern struct power_supply_maintenance_charge_table * power_supply_get_maintenance_charging_setting(struct power_supply_battery_info *info, int index); extern bool power_supply_battery_bti_in_range(struct power_supply_battery_info *info, @@ -696,6 +792,19 @@ power_supply_supports_maintenance_charging(struct power_supply_battery_info *inf return (mt != NULL); } +static inline bool +power_supply_supports_vbat2ri(struct power_supply_battery_info *info) +{ + return ((info->vbat2ri_discharging != NULL) && + info->vbat2ri_discharging_size > 0); +} + +static inline bool +power_supply_supports_temp2ri(struct power_supply_battery_info *info) +{ + return ((info->resist_table != NULL) && + info->resist_table_size > 0); +} #ifdef CONFIG_POWER_SUPPLY extern int power_supply_is_system_supplied(void); -- cgit From 342479c86d3e8f9e946a07ff0cafbd36511ae30a Mon Sep 17 00:00:00 2001 From: Chun-Jie Chen Date: Sun, 30 Jan 2022 09:21:04 +0800 Subject: soc: mediatek: pm-domains: Add support for mt8195 Add domain control data including bus protection data size change due to more protection steps in mt8195. Signed-off-by: Chun-Jie Chen Reviewed-by: Chen-Yu Tsai Reviewed-by: AngeloGioacchino Del Regno Link: https://lore.kernel.org/r/20220130012104.5292-6-chun-jie.chen@mediatek.com Signed-off-by: Matthias Brugger --- include/linux/soc/mediatek/infracfg.h | 82 +++++++++++++++++++++++++++++++++++ 1 file changed, 82 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/mediatek/infracfg.h b/include/linux/soc/mediatek/infracfg.h index 4615a228da51..d858e0bab7a2 100644 --- a/include/linux/soc/mediatek/infracfg.h +++ b/include/linux/soc/mediatek/infracfg.h @@ -2,6 +2,88 @@ #ifndef __SOC_MEDIATEK_INFRACFG_H #define __SOC_MEDIATEK_INFRACFG_H +#define MT8195_TOP_AXI_PROT_EN_STA1 0x228 +#define MT8195_TOP_AXI_PROT_EN_1_STA1 0x258 +#define MT8195_TOP_AXI_PROT_EN_SET 0x2a0 +#define MT8195_TOP_AXI_PROT_EN_CLR 0x2a4 +#define MT8195_TOP_AXI_PROT_EN_1_SET 0x2a8 +#define MT8195_TOP_AXI_PROT_EN_1_CLR 0x2ac +#define MT8195_TOP_AXI_PROT_EN_MM_SET 0x2d4 +#define MT8195_TOP_AXI_PROT_EN_MM_CLR 0x2d8 +#define MT8195_TOP_AXI_PROT_EN_MM_STA1 0x2ec +#define MT8195_TOP_AXI_PROT_EN_2_SET 0x714 +#define MT8195_TOP_AXI_PROT_EN_2_CLR 0x718 +#define MT8195_TOP_AXI_PROT_EN_2_STA1 0x724 +#define MT8195_TOP_AXI_PROT_EN_VDNR_SET 0xb84 +#define MT8195_TOP_AXI_PROT_EN_VDNR_CLR 0xb88 +#define MT8195_TOP_AXI_PROT_EN_VDNR_STA1 0xb90 +#define MT8195_TOP_AXI_PROT_EN_VDNR_1_SET 0xba4 +#define MT8195_TOP_AXI_PROT_EN_VDNR_1_CLR 0xba8 +#define MT8195_TOP_AXI_PROT_EN_VDNR_1_STA1 0xbb0 +#define MT8195_TOP_AXI_PROT_EN_VDNR_2_SET 0xbb8 +#define MT8195_TOP_AXI_PROT_EN_VDNR_2_CLR 0xbbc +#define MT8195_TOP_AXI_PROT_EN_VDNR_2_STA1 0xbc4 +#define MT8195_TOP_AXI_PROT_EN_SUB_INFRA_VDNR_SET 0xbcc +#define MT8195_TOP_AXI_PROT_EN_SUB_INFRA_VDNR_CLR 0xbd0 +#define MT8195_TOP_AXI_PROT_EN_SUB_INFRA_VDNR_STA1 0xbd8 +#define MT8195_TOP_AXI_PROT_EN_MM_2_SET 0xdcc +#define MT8195_TOP_AXI_PROT_EN_MM_2_CLR 0xdd0 +#define MT8195_TOP_AXI_PROT_EN_MM_2_STA1 0xdd8 + +#define MT8195_TOP_AXI_PROT_EN_VDOSYS0 BIT(6) +#define MT8195_TOP_AXI_PROT_EN_VPPSYS0 BIT(10) +#define MT8195_TOP_AXI_PROT_EN_MFG1 BIT(11) +#define MT8195_TOP_AXI_PROT_EN_MFG1_2ND GENMASK(22, 21) +#define MT8195_TOP_AXI_PROT_EN_VPPSYS0_2ND BIT(23) +#define MT8195_TOP_AXI_PROT_EN_1_MFG1 GENMASK(20, 19) +#define MT8195_TOP_AXI_PROT_EN_1_CAM BIT(22) +#define MT8195_TOP_AXI_PROT_EN_2_CAM BIT(0) +#define MT8195_TOP_AXI_PROT_EN_2_MFG1_2ND GENMASK(6, 5) +#define MT8195_TOP_AXI_PROT_EN_2_MFG1 BIT(7) +#define MT8195_TOP_AXI_PROT_EN_2_AUDIO (BIT(9) | BIT(11)) +#define MT8195_TOP_AXI_PROT_EN_2_ADSP (BIT(12) | GENMASK(16, 14)) +#define MT8195_TOP_AXI_PROT_EN_MM_CAM (BIT(0) | BIT(2) | BIT(4)) +#define MT8195_TOP_AXI_PROT_EN_MM_IPE BIT(1) +#define MT8195_TOP_AXI_PROT_EN_MM_IMG BIT(3) +#define MT8195_TOP_AXI_PROT_EN_MM_VDOSYS0 GENMASK(21, 17) +#define MT8195_TOP_AXI_PROT_EN_MM_VPPSYS1 GENMASK(8, 5) +#define MT8195_TOP_AXI_PROT_EN_MM_VENC (BIT(9) | BIT(11)) +#define MT8195_TOP_AXI_PROT_EN_MM_VENC_CORE1 (BIT(10) | BIT(12)) +#define MT8195_TOP_AXI_PROT_EN_MM_VDEC0 BIT(13) +#define MT8195_TOP_AXI_PROT_EN_MM_VDEC1 BIT(14) +#define MT8195_TOP_AXI_PROT_EN_MM_VDOSYS1_2ND BIT(22) +#define MT8195_TOP_AXI_PROT_EN_MM_VPPSYS1_2ND BIT(23) +#define MT8195_TOP_AXI_PROT_EN_MM_CAM_2ND BIT(24) +#define MT8195_TOP_AXI_PROT_EN_MM_IMG_2ND BIT(25) +#define MT8195_TOP_AXI_PROT_EN_MM_VENC_2ND BIT(26) +#define MT8195_TOP_AXI_PROT_EN_MM_WPESYS BIT(27) +#define MT8195_TOP_AXI_PROT_EN_MM_VDEC0_2ND BIT(28) +#define MT8195_TOP_AXI_PROT_EN_MM_VDEC1_2ND BIT(29) +#define MT8195_TOP_AXI_PROT_EN_MM_VDOSYS1 GENMASK(31, 30) +#define MT8195_TOP_AXI_PROT_EN_MM_2_VPPSYS0_2ND (GENMASK(1, 0) | BIT(4) | BIT(11)) +#define MT8195_TOP_AXI_PROT_EN_MM_2_VENC BIT(2) +#define MT8195_TOP_AXI_PROT_EN_MM_2_VENC_CORE1 (BIT(3) | BIT(15)) +#define MT8195_TOP_AXI_PROT_EN_MM_2_CAM (BIT(5) | BIT(17)) +#define MT8195_TOP_AXI_PROT_EN_MM_2_VPPSYS1 (GENMASK(7, 6) | BIT(18)) +#define MT8195_TOP_AXI_PROT_EN_MM_2_VPPSYS0 GENMASK(9, 8) +#define MT8195_TOP_AXI_PROT_EN_MM_2_VDOSYS1 BIT(10) +#define MT8195_TOP_AXI_PROT_EN_MM_2_VDEC2_2ND BIT(12) +#define MT8195_TOP_AXI_PROT_EN_MM_2_VDEC0_2ND BIT(13) +#define MT8195_TOP_AXI_PROT_EN_MM_2_WPESYS_2ND BIT(14) +#define MT8195_TOP_AXI_PROT_EN_MM_2_IPE BIT(16) +#define MT8195_TOP_AXI_PROT_EN_MM_2_VDEC2 BIT(21) +#define MT8195_TOP_AXI_PROT_EN_MM_2_VDEC0 BIT(22) +#define MT8195_TOP_AXI_PROT_EN_MM_2_WPESYS GENMASK(24, 23) +#define MT8195_TOP_AXI_PROT_EN_VDNR_1_EPD_TX BIT(1) +#define MT8195_TOP_AXI_PROT_EN_VDNR_1_DP_TX BIT(2) +#define MT8195_TOP_AXI_PROT_EN_VDNR_PCIE_MAC_P0 (BIT(11) | BIT(28)) +#define MT8195_TOP_AXI_PROT_EN_VDNR_PCIE_MAC_P1 (BIT(12) | BIT(29)) +#define MT8195_TOP_AXI_PROT_EN_VDNR_1_PCIE_MAC_P0 BIT(13) +#define MT8195_TOP_AXI_PROT_EN_VDNR_1_PCIE_MAC_P1 BIT(14) +#define MT8195_TOP_AXI_PROT_EN_SUB_INFRA_VDNR_MFG1 (BIT(17) | BIT(19)) +#define MT8195_TOP_AXI_PROT_EN_SUB_INFRA_VDNR_VPPSYS0 BIT(20) +#define MT8195_TOP_AXI_PROT_EN_SUB_INFRA_VDNR_VDOSYS0 BIT(21) + #define MT8192_TOP_AXI_PROT_EN_STA1 0x228 #define MT8192_TOP_AXI_PROT_EN_1_STA1 0x258 #define MT8192_TOP_AXI_PROT_EN_SET 0x2a0 -- cgit From 88590cbc17033c86c8591d9f22401325961a8a59 Mon Sep 17 00:00:00 2001 From: Chun-Jie Chen Date: Tue, 15 Feb 2022 18:49:17 +0800 Subject: soc: mediatek: pm-domains: Add support for mt8186 Add power domain control data in mt8186. Signed-off-by: Chun-Jie Chen Link: https://lore.kernel.org/r/20220215104917.5726-3-chun-jie.chen@mediatek.com Signed-off-by: Matthias Brugger --- include/linux/soc/mediatek/infracfg.h | 48 +++++++++++++++++++++++++++++++++++ 1 file changed, 48 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/mediatek/infracfg.h b/include/linux/soc/mediatek/infracfg.h index d858e0bab7a2..8a1c2040a28e 100644 --- a/include/linux/soc/mediatek/infracfg.h +++ b/include/linux/soc/mediatek/infracfg.h @@ -140,6 +140,54 @@ #define MT8192_TOP_AXI_PROT_EN_MM_2_MDP_2ND BIT(13) #define MT8192_TOP_AXI_PROT_EN_VDNR_CAM BIT(21) +#define MT8186_TOP_AXI_PROT_EN_SET (0x2A0) +#define MT8186_TOP_AXI_PROT_EN_CLR (0x2A4) +#define MT8186_TOP_AXI_PROT_EN_STA (0x228) +#define MT8186_TOP_AXI_PROT_EN_1_SET (0x2A8) +#define MT8186_TOP_AXI_PROT_EN_1_CLR (0x2AC) +#define MT8186_TOP_AXI_PROT_EN_1_STA (0x258) +#define MT8186_TOP_AXI_PROT_EN_2_SET (0x2B0) +#define MT8186_TOP_AXI_PROT_EN_2_CLR (0x2B4) +#define MT8186_TOP_AXI_PROT_EN_2_STA (0x26C) +#define MT8186_TOP_AXI_PROT_EN_3_SET (0x2B8) +#define MT8186_TOP_AXI_PROT_EN_3_CLR (0x2BC) +#define MT8186_TOP_AXI_PROT_EN_3_STA (0x2C8) + +/* MFG1 */ +#define MT8186_TOP_AXI_PROT_EN_1_MFG1_STEP1 (GENMASK(28, 27)) +#define MT8186_TOP_AXI_PROT_EN_MFG1_STEP2 (GENMASK(22, 21)) +#define MT8186_TOP_AXI_PROT_EN_MFG1_STEP3 (BIT(25)) +#define MT8186_TOP_AXI_PROT_EN_1_MFG1_STEP4 (BIT(29)) +/* DIS */ +#define MT8186_TOP_AXI_PROT_EN_1_DIS_STEP1 (GENMASK(12, 11)) +#define MT8186_TOP_AXI_PROT_EN_DIS_STEP2 (GENMASK(2, 1) | GENMASK(11, 10)) +/* IMG */ +#define MT8186_TOP_AXI_PROT_EN_1_IMG_STEP1 (BIT(23)) +#define MT8186_TOP_AXI_PROT_EN_1_IMG_STEP2 (BIT(15)) +/* IPE */ +#define MT8186_TOP_AXI_PROT_EN_1_IPE_STEP1 (BIT(24)) +#define MT8186_TOP_AXI_PROT_EN_1_IPE_STEP2 (BIT(16)) +/* CAM */ +#define MT8186_TOP_AXI_PROT_EN_1_CAM_STEP1 (GENMASK(22, 21)) +#define MT8186_TOP_AXI_PROT_EN_1_CAM_STEP2 (GENMASK(14, 13)) +/* VENC */ +#define MT8186_TOP_AXI_PROT_EN_1_VENC_STEP1 (BIT(31)) +#define MT8186_TOP_AXI_PROT_EN_1_VENC_STEP2 (BIT(19)) +/* VDEC */ +#define MT8186_TOP_AXI_PROT_EN_1_VDEC_STEP1 (BIT(30)) +#define MT8186_TOP_AXI_PROT_EN_1_VDEC_STEP2 (BIT(17)) +/* WPE */ +#define MT8186_TOP_AXI_PROT_EN_2_WPE_STEP1 (BIT(17)) +#define MT8186_TOP_AXI_PROT_EN_2_WPE_STEP2 (BIT(16)) +/* CONN_ON */ +#define MT8186_TOP_AXI_PROT_EN_1_CONN_ON_STEP1 (BIT(18)) +#define MT8186_TOP_AXI_PROT_EN_CONN_ON_STEP2 (BIT(14)) +#define MT8186_TOP_AXI_PROT_EN_CONN_ON_STEP3 (BIT(13)) +#define MT8186_TOP_AXI_PROT_EN_CONN_ON_STEP4 (BIT(16)) +/* ADSP_TOP */ +#define MT8186_TOP_AXI_PROT_EN_3_ADSP_TOP_STEP1 (GENMASK(12, 11)) +#define MT8186_TOP_AXI_PROT_EN_3_ADSP_TOP_STEP2 (GENMASK(1, 0)) + #define MT8183_TOP_AXI_PROT_EN_STA1 0x228 #define MT8183_TOP_AXI_PROT_EN_STA1_1 0x258 #define MT8183_TOP_AXI_PROT_EN_SET 0x2a0 -- cgit From e65b831a1e191caff3fc0d06bc7019cdaf8f868e Mon Sep 17 00:00:00 2001 From: Qinghua Jin Date: Fri, 7 Jan 2022 10:22:58 +0800 Subject: nvme-fc: fix a typo subsytem -> subsystem Signed-off-by: Qinghua Jin Signed-off-by: Christoph Hellwig --- include/linux/nvme-fc-driver.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/nvme-fc-driver.h b/include/linux/nvme-fc-driver.h index cb909edb76c4..5358a5facdee 100644 --- a/include/linux/nvme-fc-driver.h +++ b/include/linux/nvme-fc-driver.h @@ -721,7 +721,7 @@ enum { * * Fields with static values for the port. Initialized by the * port_info struct supplied to the registration call. - * @port_num: NVME-FC transport subsytem port number + * @port_num: NVME-FC transport subsystem port number * @node_name: FC WWNN for the port * @port_name: FC WWPN for the port * @private: pointer to memory allocated alongside the local port -- cgit From bd83fe6f2cd2133beaac7c423fd36c3515048fc8 Mon Sep 17 00:00:00 2001 From: Alan Adamson Date: Thu, 3 Feb 2022 00:11:53 -0800 Subject: nvme: add verbose error logging Improves logging of NVMe errors. If NVME_VERBOSE_ERRORS is configured, a verbose description of the error is logged, otherwise only status codes/bits is logged. Signed-off-by: Chaitanya Kulkarni [kch]: fix several nits, cosmetics, and trim down code. Signed-off-by: Martin K. Petersen Signed-off-by: Alan Adamson Reviewed-by: Himanshu Madhani Reviewed-by: Keith Busch Signed-off-by: Christoph Hellwig --- include/linux/nvme.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/nvme.h b/include/linux/nvme.h index 855dd9b3e84b..1f946e5bf7c1 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -1636,6 +1636,7 @@ enum { NVME_SC_HOST_ABORTED_CMD = 0x371, NVME_SC_CRD = 0x1800, + NVME_SC_MORE = 0x2000, NVME_SC_DNR = 0x4000, }; -- cgit From 86c2457a8e8112f16af8fd10a3e1dd7a302c3c3e Mon Sep 17 00:00:00 2001 From: Martin Belanger Date: Tue, 8 Feb 2022 14:33:46 -0500 Subject: nvme: expose cntrltype and dctype through sysfs TP8010 introduces the Discovery Controller Type attribute (dctype). The dctype is returned in the response to the Identify command. This patch exposes the dctype through the sysfs. Since the dctype depends on the Controller Type (cntrltype), another attribute of the Identify response, the patch also exposes the cntrltype as well. The dctype will only be displayed for discovery controllers. A note about the naming of this attribute: Although TP8010 calls this attribute the Discovery Controller Type, note that the dctype is now part of the response to the Identify command for all controller types. I/O, Discovery, and Admin controllers all share the same Identify response PDU structure. Non-discovery controllers as well as pre-TP8010 discovery controllers will continue to set this field to 0 (which has always been the default for reserved bytes). Per TP8010, the value 0 now means "Discovery controller type is not reported" instead of "Reserved". One could argue that this definition is correct even for non-discovery controllers, and by extension, exposing it in the sysfs for non-discovery controllers is appropriate. Signed-off-by: Martin Belanger Reviewed-by: Chaitanya Kulkarni Reviewed-by: John Meneghini Reviewed-by: Hannes Reinecke Signed-off-by: Christoph Hellwig --- include/linux/nvme.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/nvme.h b/include/linux/nvme.h index 1f946e5bf7c1..9dbc3ef4daf7 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -43,6 +43,12 @@ enum nvme_ctrl_type { NVME_CTRL_ADMIN = 3, /* Administrative controller */ }; +enum nvme_dctype { + NVME_DCTYPE_NOT_REPORTED = 0, + NVME_DCTYPE_DDC = 1, /* Direct Discovery Controller */ + NVME_DCTYPE_CDC = 2, /* Central Discovery Controller */ +}; + /* Address Family codes for Discovery Log Page entry ADRFAM field */ enum { NVMF_ADDR_FAMILY_PCI = 0, /* PCIe */ @@ -320,7 +326,9 @@ struct nvme_id_ctrl { __le16 icdoff; __u8 ctrattr; __u8 msdbd; - __u8 rsvd1804[244]; + __u8 rsvd1804[2]; + __u8 dctype; + __u8 rsvd1807[241]; struct nvme_id_power_state psd[32]; __u8 vs[1024]; }; -- cgit From a5081bad2eac5108593ac36b6551201f0df9e897 Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Sat, 26 Feb 2022 14:56:22 +0000 Subject: net: phylink: remove phylink_set_pcs() As all users of phylink_set_pcs() have now been updated to use the mac_select_pcs() method, it can be removed from the phylink kernel API and its functionality moved into phylink_major_config(). Removing phylink_set_pcs() gives us a single approach for attaching a PCS within phylink. Signed-off-by: Russell King (Oracle) Signed-off-by: David S. Miller --- include/linux/phylink.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 9ef9b7047f19..223781622b33 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -532,7 +532,6 @@ void phylink_generic_validate(struct phylink_config *config, struct phylink *phylink_create(struct phylink_config *, struct fwnode_handle *, phy_interface_t iface, const struct phylink_mac_ops *mac_ops); -void phylink_set_pcs(struct phylink *, struct phylink_pcs *pcs); void phylink_destroy(struct phylink *); int phylink_connect_phy(struct phylink *, struct phy_device *); -- cgit From 989192ac6ad54acccd5dd1c058055dacd6b94b30 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Wed, 16 Feb 2022 10:52:41 +0800 Subject: iommu/vt-d: Remove guest pasid related callbacks The guest pasid related callbacks are not called in the tree. Remove them to avoid dead code. Signed-off-by: Lu Baolu Reviewed-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20220216025249.3459465-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/intel-iommu.h | 10 ---------- include/linux/intel-svm.h | 12 ------------ 2 files changed, 22 deletions(-) (limited to 'include/linux') diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 69230fd695ea..beaacb589abd 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -525,12 +525,6 @@ struct context_entry { */ #define DOMAIN_FLAG_USE_FIRST_LEVEL BIT(1) -/* - * Domain represents a virtual machine which demands iommu nested - * translation mode support. - */ -#define DOMAIN_FLAG_NESTING_MODE BIT(2) - struct dmar_domain { int nid; /* node id */ @@ -765,9 +759,6 @@ struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn); extern void intel_svm_check(struct intel_iommu *iommu); extern int intel_svm_enable_prq(struct intel_iommu *iommu); extern int intel_svm_finish_prq(struct intel_iommu *iommu); -int intel_svm_bind_gpasid(struct iommu_domain *domain, struct device *dev, - struct iommu_gpasid_bind_data *data); -int intel_svm_unbind_gpasid(struct device *dev, u32 pasid); struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm, void *drvdata); void intel_svm_unbind(struct iommu_sva *handle); @@ -795,7 +786,6 @@ struct intel_svm { unsigned int flags; u32 pasid; - int gpasid; /* In case that guest PASID is different from host PASID */ struct list_head devs; }; #else diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h index 1b73bab7eeff..b3b125b332aa 100644 --- a/include/linux/intel-svm.h +++ b/include/linux/intel-svm.h @@ -25,17 +25,5 @@ * do such IOTLB flushes automatically. */ #define SVM_FLAG_SUPERVISOR_MODE BIT(0) -/* - * The SVM_FLAG_GUEST_MODE flag is used when a PASID bind is for guest - * processes. Compared to the host bind, the primary differences are: - * 1. mm life cycle management - * 2. fault reporting - */ -#define SVM_FLAG_GUEST_MODE BIT(1) -/* - * The SVM_FLAG_GUEST_PASID flag is used when a guest has its own PASID space, - * which requires guest and host PASID translation at both directions. - */ -#define SVM_FLAG_GUEST_PASID BIT(2) #endif /* __INTEL_SVM_H__ */ -- cgit From 0c9f178778919bb500443f340647b44227778fb2 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Wed, 16 Feb 2022 10:52:42 +0800 Subject: iommu: Remove guest pasid related interfaces and definitions The guest pasid related uapi interfaces and definitions are not referenced anywhere in the tree. We've also reached a consensus to replace them with a new iommufd design. Remove them to avoid dead code. Signed-off-by: Lu Baolu Reviewed-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20220216025249.3459465-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 44 -------------------------------------------- 1 file changed, 44 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index de0c57a567c8..cde1d0aad60f 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -229,9 +229,6 @@ struct iommu_iotlb_gather { * @sva_unbind: Unbind process address space from device * @sva_get_pasid: Get PASID associated to a SVA handle * @page_response: handle page request response - * @cache_invalidate: invalidate translation caches - * @sva_bind_gpasid: bind guest pasid and mm - * @sva_unbind_gpasid: unbind guest pasid and mm * @def_domain_type: device default domain type, return value: * - IOMMU_DOMAIN_IDENTITY: must use an identity domain * - IOMMU_DOMAIN_DMA: must use a dma domain @@ -301,12 +298,6 @@ struct iommu_ops { int (*page_response)(struct device *dev, struct iommu_fault_event *evt, struct iommu_page_response *msg); - int (*cache_invalidate)(struct iommu_domain *domain, struct device *dev, - struct iommu_cache_invalidate_info *inv_info); - int (*sva_bind_gpasid)(struct iommu_domain *domain, - struct device *dev, struct iommu_gpasid_bind_data *data); - - int (*sva_unbind_gpasid)(struct device *dev, u32 pasid); int (*def_domain_type)(struct device *dev); @@ -421,14 +412,6 @@ extern int iommu_attach_device(struct iommu_domain *domain, struct device *dev); extern void iommu_detach_device(struct iommu_domain *domain, struct device *dev); -extern int iommu_uapi_cache_invalidate(struct iommu_domain *domain, - struct device *dev, - void __user *uinfo); - -extern int iommu_uapi_sva_bind_gpasid(struct iommu_domain *domain, - struct device *dev, void __user *udata); -extern int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain, - struct device *dev, void __user *udata); extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev, ioasid_t pasid); extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev); @@ -1051,33 +1034,6 @@ static inline u32 iommu_sva_get_pasid(struct iommu_sva *handle) return IOMMU_PASID_INVALID; } -static inline int -iommu_uapi_cache_invalidate(struct iommu_domain *domain, - struct device *dev, - struct iommu_cache_invalidate_info *inv_info) -{ - return -ENODEV; -} - -static inline int iommu_uapi_sva_bind_gpasid(struct iommu_domain *domain, - struct device *dev, void __user *udata) -{ - return -ENODEV; -} - -static inline int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain, - struct device *dev, void __user *udata) -{ - return -ENODEV; -} - -static inline int iommu_sva_unbind_gpasid(struct iommu_domain *domain, - struct device *dev, - ioasid_t pasid) -{ - return -ENODEV; -} - static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev) { return NULL; -- cgit From 241469685d8d54075f12a6f3eb0628f85e13ff37 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Wed, 16 Feb 2022 10:52:43 +0800 Subject: iommu/vt-d: Remove aux-domain related callbacks The aux-domain related callbacks are not called in the tree. Remove them to avoid dead code. Signed-off-by: Lu Baolu Reviewed-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20220216025249.3459465-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/intel-iommu.h | 17 ----------------- 1 file changed, 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index beaacb589abd..5cfda90b2cca 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -542,7 +542,6 @@ struct dmar_domain { u8 iommu_snooping: 1; /* indicate snooping control feature */ struct list_head devices; /* all devices' list */ - struct list_head subdevices; /* all subdevices' list */ struct iova_domain iovad; /* iova's that belong to this domain */ struct dma_pte *pgd; /* virtual address */ @@ -557,11 +556,6 @@ struct dmar_domain { 2 == 1GiB, 3 == 512GiB, 4 == 1TiB */ u64 max_addr; /* maximum mapped address */ - u32 default_pasid; /* - * The default pasid used for non-SVM - * traffic on mediated devices. - */ - struct iommu_domain domain; /* generic domain data structure for iommu core */ }; @@ -614,21 +608,11 @@ struct intel_iommu { void *perf_statistic; }; -/* Per subdevice private data */ -struct subdev_domain_info { - struct list_head link_phys; /* link to phys device siblings */ - struct list_head link_domain; /* link to domain siblings */ - struct device *pdev; /* physical device derived from */ - struct dmar_domain *domain; /* aux-domain */ - int users; /* user count */ -}; - /* PCI domain-device relationship */ struct device_domain_info { struct list_head link; /* link to domain siblings */ struct list_head global; /* link to global list */ struct list_head table; /* link to pasid table */ - struct list_head subdevices; /* subdevices sibling */ u32 segment; /* PCI segment number */ u8 bus; /* PCI bus number */ u8 devfn; /* PCI devfn number */ @@ -639,7 +623,6 @@ struct device_domain_info { u8 pri_enabled:1; u8 ats_supported:1; u8 ats_enabled:1; - u8 auxd_enabled:1; /* Multiple domains per device */ u8 ats_qdep; struct device *dev; /* it's NULL for PCIe-to-PCI bridge */ struct intel_iommu *iommu; /* IOMMU used by this device */ -- cgit From 8652d875939b6a1fe6d5c93cf4fb91df61cda5a7 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Wed, 16 Feb 2022 10:52:44 +0800 Subject: iommu: Remove aux-domain related interfaces and iommu_ops The aux-domain related interfaces and iommu_ops are not referenced anywhere in the tree. We've also reached a consensus to redesign it based the new iommufd framework. Remove them to avoid dead code. Signed-off-by: Lu Baolu Reviewed-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20220216025249.3459465-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 29 ----------------------------- 1 file changed, 29 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index cde1d0aad60f..9983a01373b2 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -144,7 +144,6 @@ struct iommu_resv_region { /** * enum iommu_dev_features - Per device IOMMU features - * @IOMMU_DEV_FEAT_AUX: Auxiliary domain feature * @IOMMU_DEV_FEAT_SVA: Shared Virtual Addresses * @IOMMU_DEV_FEAT_IOPF: I/O Page Faults such as PRI or Stall. Generally * enabling %IOMMU_DEV_FEAT_SVA requires @@ -157,7 +156,6 @@ struct iommu_resv_region { * iommu_dev_has_feature(), and enable it using iommu_dev_enable_feature(). */ enum iommu_dev_features { - IOMMU_DEV_FEAT_AUX, IOMMU_DEV_FEAT_SVA, IOMMU_DEV_FEAT_IOPF, }; @@ -223,8 +221,6 @@ struct iommu_iotlb_gather { * @dev_has/enable/disable_feat: per device entries to check/enable/disable * iommu specific features. * @dev_feat_enabled: check enabled feature - * @aux_attach/detach_dev: aux-domain specific attach/detach entries. - * @aux_get_pasid: get the pasid given an aux-domain * @sva_bind: Bind process address space to device * @sva_unbind: Unbind process address space from device * @sva_get_pasid: Get PASID associated to a SVA handle @@ -285,11 +281,6 @@ struct iommu_ops { int (*dev_enable_feat)(struct device *dev, enum iommu_dev_features f); int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f); - /* Aux-domain specific attach/detach entries */ - int (*aux_attach_dev)(struct iommu_domain *domain, struct device *dev); - void (*aux_detach_dev)(struct iommu_domain *domain, struct device *dev); - int (*aux_get_pasid)(struct iommu_domain *domain, struct device *dev); - struct iommu_sva *(*sva_bind)(struct device *dev, struct mm_struct *mm, void *drvdata); void (*sva_unbind)(struct iommu_sva *handle); @@ -655,9 +646,6 @@ void iommu_release_device(struct device *dev); int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features f); int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f); bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features f); -int iommu_aux_attach_device(struct iommu_domain *domain, struct device *dev); -void iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev); -int iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev); struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, @@ -1002,23 +990,6 @@ iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features feat) return -ENODEV; } -static inline int -iommu_aux_attach_device(struct iommu_domain *domain, struct device *dev) -{ - return -ENODEV; -} - -static inline void -iommu_aux_detach_device(struct iommu_domain *domain, struct device *dev) -{ -} - -static inline int -iommu_aux_get_pasid(struct iommu_domain *domain, struct device *dev) -{ - return -ENODEV; -} - static inline struct iommu_sva * iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, void *drvdata) { -- cgit From 71fe30698dc3be6fd277c49c0c7b220c7ae92ffd Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Wed, 16 Feb 2022 10:52:45 +0800 Subject: iommu: Remove apply_resv_region The apply_resv_region callback in iommu_ops was introduced to reserve an IOVA range in the given DMA domain when the IOMMU driver manages the IOVA by itself. As all drivers converted to use dma-iommu in the core, there's no driver using this anymore. Remove it to avoid dead code. Suggested-by: Robin Murphy Signed-off-by: Lu Baolu Reviewed-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20220216025249.3459465-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 9983a01373b2..9ffa2e88f3d0 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -214,7 +214,6 @@ struct iommu_iotlb_gather { * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*) * @get_resv_regions: Request list of reserved regions for a device * @put_resv_regions: Free list of reserved regions for a device - * @apply_resv_region: Temporary helper call-back for iova reserved ranges * @of_xlate: add OF master IDs to iommu grouping * @is_attach_deferred: Check if domain attach should be deferred from iommu * driver init to device driver init (default no) @@ -268,9 +267,6 @@ struct iommu_ops { /* Request/Free a list of reserved regions for a device */ void (*get_resv_regions)(struct device *dev, struct list_head *list); void (*put_resv_regions)(struct device *dev, struct list_head *list); - void (*apply_resv_region)(struct device *dev, - struct iommu_domain *domain, - struct iommu_resv_region *region); int (*of_xlate)(struct device *dev, struct of_phandle_args *args); bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev); -- cgit From 3f6634d997dba8140b3a7cba01776b9638d70dff Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Wed, 16 Feb 2022 10:52:47 +0800 Subject: iommu: Use right way to retrieve iommu_ops The common iommu_ops is hooked to both device and domain. When a helper has both device and domain pointer, the way to get the iommu_ops looks messy in iommu core. This sorts out the way to get iommu_ops. The device related helpers go through device pointer, while the domain related ones go through domain pointer. Signed-off-by: Lu Baolu Reviewed-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20220216025249.3459465-8-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 9ffa2e88f3d0..90f1b5e3809d 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -381,6 +381,17 @@ static inline void iommu_iotlb_gather_init(struct iommu_iotlb_gather *gather) }; } +static inline const struct iommu_ops *dev_iommu_ops(struct device *dev) +{ + /* + * Assume that valid ops must be installed if iommu_probe_device() + * has succeeded. The device ops are essentially for internal use + * within the IOMMU subsystem itself, so we should be able to trust + * ourselves not to misuse the helper. + */ + return dev->iommu->iommu_dev->ops; +} + #define IOMMU_GROUP_NOTIFY_ADD_DEVICE 1 /* Device added */ #define IOMMU_GROUP_NOTIFY_DEL_DEVICE 2 /* Pre Device removed */ #define IOMMU_GROUP_NOTIFY_BIND_DRIVER 3 /* Pre Driver bind */ -- cgit From 41bb23e70b50b89b6137cbecd37009d782454860 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Wed, 16 Feb 2022 10:52:48 +0800 Subject: iommu: Remove unused argument in is_attach_deferred The is_attach_deferred iommu_ops callback is a device op. The domain argument is unnecessary and never used. Remove it to make code clean. Suggested-by: Robin Murphy Signed-off-by: Lu Baolu Reviewed-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20220216025249.3459465-9-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 90f1b5e3809d..1ded6076a75c 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -269,7 +269,7 @@ struct iommu_ops { void (*put_resv_regions)(struct device *dev, struct list_head *list); int (*of_xlate)(struct device *dev, struct of_phandle_args *args); - bool (*is_attach_deferred)(struct iommu_domain *domain, struct device *dev); + bool (*is_attach_deferred)(struct device *dev); /* Per device IOMMU features */ bool (*dev_has_feat)(struct device *dev, enum iommu_dev_features f); -- cgit From 9a630a4b41a2639b65d024a5d2d88ed3ecca130a Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Wed, 16 Feb 2022 10:52:49 +0800 Subject: iommu: Split struct iommu_ops Move the domain specific operations out of struct iommu_ops into a new structure that only has domain specific operations. This solves the problem of needing to know if the method vector for a given operation needs to be retrieved from the device or the domain. Logically the domain ops are the ones that make sense for external subsystems and endpoint drivers to use, while device ops, with the sole exception of domain_alloc, are IOMMU API internals. Signed-off-by: Lu Baolu Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20220216025249.3459465-10-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 91 ++++++++++++++++++++++++++++++--------------------- 1 file changed, 53 insertions(+), 38 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 1ded6076a75c..9208eca4b0d1 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -37,6 +37,7 @@ struct iommu_group; struct bus_type; struct device; struct iommu_domain; +struct iommu_domain_ops; struct notifier_block; struct iommu_sva; struct iommu_fault_event; @@ -88,7 +89,7 @@ struct iommu_domain_geometry { struct iommu_domain { unsigned type; - const struct iommu_ops *ops; + const struct iommu_domain_ops *ops; unsigned long pgsize_bitmap; /* Bitmap of page sizes in use */ iommu_fault_handler_t handler; void *handler_token; @@ -192,26 +193,11 @@ struct iommu_iotlb_gather { * struct iommu_ops - iommu ops and capabilities * @capable: check capability * @domain_alloc: allocate iommu domain - * @domain_free: free iommu domain - * @attach_dev: attach device to an iommu domain - * @detach_dev: detach device from an iommu domain - * @map: map a physically contiguous memory region to an iommu domain - * @map_pages: map a physically contiguous set of pages of the same size to - * an iommu domain. - * @unmap: unmap a physically contiguous memory region from an iommu domain - * @unmap_pages: unmap a number of pages of the same size from an iommu domain - * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain - * @iotlb_sync_map: Sync mappings created recently using @map to the hardware - * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush - * queue - * @iova_to_phys: translate iova to physical address * @probe_device: Add device to iommu driver handling * @release_device: Remove device from iommu driver handling * @probe_finalize: Do final setup work after the device is added to an IOMMU * group and attached to the groups domain * @device_group: find iommu group for a particular device - * @enable_nesting: Enable nesting - * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*) * @get_resv_regions: Request list of reserved regions for a device * @put_resv_regions: Free list of reserved regions for a device * @of_xlate: add OF master IDs to iommu grouping @@ -228,6 +214,7 @@ struct iommu_iotlb_gather { * - IOMMU_DOMAIN_IDENTITY: must use an identity domain * - IOMMU_DOMAIN_DMA: must use a dma domain * - 0: use the default setting + * @default_domain_ops: the default ops for domains * @pgsize_bitmap: bitmap of all possible supported page sizes * @owner: Driver module providing these ops */ @@ -236,33 +223,11 @@ struct iommu_ops { /* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type); - void (*domain_free)(struct iommu_domain *); - int (*attach_dev)(struct iommu_domain *domain, struct device *dev); - void (*detach_dev)(struct iommu_domain *domain, struct device *dev); - int (*map)(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t size, int prot, gfp_t gfp); - int (*map_pages)(struct iommu_domain *domain, unsigned long iova, - phys_addr_t paddr, size_t pgsize, size_t pgcount, - int prot, gfp_t gfp, size_t *mapped); - size_t (*unmap)(struct iommu_domain *domain, unsigned long iova, - size_t size, struct iommu_iotlb_gather *iotlb_gather); - size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova, - size_t pgsize, size_t pgcount, - struct iommu_iotlb_gather *iotlb_gather); - void (*flush_iotlb_all)(struct iommu_domain *domain); - void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova, - size_t size); - void (*iotlb_sync)(struct iommu_domain *domain, - struct iommu_iotlb_gather *iotlb_gather); - phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova); struct iommu_device *(*probe_device)(struct device *dev); void (*release_device)(struct device *dev); void (*probe_finalize)(struct device *dev); struct iommu_group *(*device_group)(struct device *dev); - int (*enable_nesting)(struct iommu_domain *domain); - int (*set_pgtable_quirks)(struct iommu_domain *domain, - unsigned long quirks); /* Request/Free a list of reserved regions for a device */ void (*get_resv_regions)(struct device *dev, struct list_head *list); @@ -288,10 +253,60 @@ struct iommu_ops { int (*def_domain_type)(struct device *dev); + const struct iommu_domain_ops *default_domain_ops; unsigned long pgsize_bitmap; struct module *owner; }; +/** + * struct iommu_domain_ops - domain specific operations + * @attach_dev: attach an iommu domain to a device + * @detach_dev: detach an iommu domain from a device + * @map: map a physically contiguous memory region to an iommu domain + * @map_pages: map a physically contiguous set of pages of the same size to + * an iommu domain. + * @unmap: unmap a physically contiguous memory region from an iommu domain + * @unmap_pages: unmap a number of pages of the same size from an iommu domain + * @flush_iotlb_all: Synchronously flush all hardware TLBs for this domain + * @iotlb_sync_map: Sync mappings created recently using @map to the hardware + * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush + * queue + * @iova_to_phys: translate iova to physical address + * @enable_nesting: Enable nesting + * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*) + * @free: Release the domain after use. + */ +struct iommu_domain_ops { + int (*attach_dev)(struct iommu_domain *domain, struct device *dev); + void (*detach_dev)(struct iommu_domain *domain, struct device *dev); + + int (*map)(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t size, int prot, gfp_t gfp); + int (*map_pages)(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t pgsize, size_t pgcount, + int prot, gfp_t gfp, size_t *mapped); + size_t (*unmap)(struct iommu_domain *domain, unsigned long iova, + size_t size, struct iommu_iotlb_gather *iotlb_gather); + size_t (*unmap_pages)(struct iommu_domain *domain, unsigned long iova, + size_t pgsize, size_t pgcount, + struct iommu_iotlb_gather *iotlb_gather); + + void (*flush_iotlb_all)(struct iommu_domain *domain); + void (*iotlb_sync_map)(struct iommu_domain *domain, unsigned long iova, + size_t size); + void (*iotlb_sync)(struct iommu_domain *domain, + struct iommu_iotlb_gather *iotlb_gather); + + phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, + dma_addr_t iova); + + int (*enable_nesting)(struct iommu_domain *domain); + int (*set_pgtable_quirks)(struct iommu_domain *domain, + unsigned long quirks); + + void (*free)(struct iommu_domain *domain); +}; + /** * struct iommu_device - IOMMU core representation of one IOMMU hardware * instance -- cgit From 6bb477df04366e0f69dd2f49e1ae1099069326bc Mon Sep 17 00:00:00 2001 From: Yun Zhou Date: Thu, 17 Feb 2022 22:12:34 +0800 Subject: spi: use specific last_cs instead of last_cs_enable Commit d40f0b6f2e21 instroduced last_cs_enable to avoid setting chipselect if it's not necessary, but it also introduces a bug. The chipselect may not be set correctly on multi-device SPI busses. The reason is that we can't judge the chipselect by bool last_cs_enable, since chipselect may be modified after other devices were accessed. So we should record the specific state of chipselect in case of confusion. Signed-off-by: Yun Zhou Link: https://lore.kernel.org/r/20220217141234.72737-1-yun.zhou@windriver.com Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 579d71cdf6fa..7d005fa4631c 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -370,7 +370,8 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch * @cur_msg_prepared: spi_prepare_message was called for the currently * in-flight message * @cur_msg_mapped: message has been mapped for DMA - * @last_cs_enable: was enable true on the last call to set_cs. + * @last_cs: the last chip_select that is recorded by set_cs, -1 on non chip + * selected * @last_cs_mode_high: was (mode & SPI_CS_HIGH) true on the last call to set_cs. * @xfer_completion: used by core transfer_one_message() * @busy: message pump is busy @@ -603,7 +604,7 @@ struct spi_controller { bool auto_runtime_pm; bool cur_msg_prepared; bool cur_msg_mapped; - bool last_cs_enable; + char last_cs; bool last_cs_mode_high; bool fallback; struct completion xfer_completion; -- cgit From 20f01f163203666010ee1560852590a0c0572726 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 24 Jan 2022 13:59:38 -0800 Subject: blk-crypto: show crypto capabilities in sysfs Add sysfs files that expose the inline encryption capabilities of request queues: /sys/block/$disk/queue/crypto/max_dun_bits /sys/block/$disk/queue/crypto/modes/$mode /sys/block/$disk/queue/crypto/num_keyslots Userspace can use these new files to decide what encryption settings to use, or whether to use inline encryption at all. This also brings the crypto capabilities in line with the other queue properties, which are already discoverable via the queue directory in sysfs. Design notes: - Place the new files in a new subdirectory "crypto" to group them together and to avoid complicating the main "queue" directory. This also makes it possible to replace "crypto" with a symlink later if we ever make the blk_crypto_profiles into real kobjects (see below). - It was necessary to define a new kobject that corresponds to the crypto subdirectory. For now, this kobject just contains a pointer to the blk_crypto_profile. Note that multiple queues (and hence multiple such kobjects) may refer to the same blk_crypto_profile. An alternative design would more closely match the current kernel data structures: the blk_crypto_profile could be a kobject itself, located directly under the host controller device's kobject, while /sys/block/$disk/queue/crypto would be a symlink to it. I decided not to do that for now because it would require a lot more changes, such as no longer embedding blk_crypto_profile in other structures, and also because I'm not sure we can rule out moving the crypto capabilities into 'struct queue_limits' in the future. (Even if multiple queues share the same crypto engine, maybe the supported data unit sizes could differ due to other queue properties.) It would also still be possible to switch to that design later without breaking userspace, by replacing the directory with a symlink. - Use "max_dun_bits" instead of "max_dun_bytes". Currently, the kernel internally stores this value in bytes, but that's an implementation detail. It probably makes more sense to talk about this value in bits, and choosing bits is more future-proof. - "modes" is a sub-subdirectory, since there may be multiple supported crypto modes, sysfs is supposed to have one value per file, and it makes sense to group all the mode files together. - Each mode had to be named. The crypto API names like "xts(aes)" are not appropriate because they don't specify the key size. Therefore, I assigned new names. The exact names chosen are arbitrary, but they happen to match the names used in log messages in fs/crypto/. - The "num_keyslots" file is a bit different from the others in that it is only useful to know for performance reasons. However, it's included as it can still be useful. For example, a user might not want to use inline encryption if there aren't very many keyslots. Reviewed-by: Hannes Reinecke Signed-off-by: Eric Biggers Link: https://lore.kernel.org/r/20220124215938.2769-4-ebiggers@kernel.org Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f757f9c2871f..e19947d84f12 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -413,6 +413,7 @@ struct request_queue { #ifdef CONFIG_BLK_INLINE_ENCRYPTION struct blk_crypto_profile *crypto_profile; + struct kobject *crypto_kobject; #endif unsigned int rq_timeout; -- cgit From 25d490eb46486e88c16e64d9eb7cfd33a642d596 Mon Sep 17 00:00:00 2001 From: Wang Kefeng Date: Sat, 18 Dec 2021 09:30:40 +0100 Subject: ARM: 9172/1: amba: Cleanup amba pclk operation There is no user about amba_pclk_[un]prepare() besides pl330.c, directly use clk_[un]prepare(). After this, all the function about amba pclk operation, enable, disable, [un]prepare could be killed. Acked-by: Vinod Koul Signed-off-by: Kefeng Wang Signed-off-by: Russell King (Oracle) --- include/linux/amba/bus.h | 20 -------------------- 1 file changed, 20 deletions(-) (limited to 'include/linux') diff --git a/include/linux/amba/bus.h b/include/linux/amba/bus.h index 6c7f47846971..09174970b855 100644 --- a/include/linux/amba/bus.h +++ b/include/linux/amba/bus.h @@ -121,26 +121,6 @@ struct amba_device *amba_find_device(const char *, struct device *, unsigned int int amba_request_regions(struct amba_device *, const char *); void amba_release_regions(struct amba_device *); -static inline int amba_pclk_enable(struct amba_device *dev) -{ - return clk_enable(dev->pclk); -} - -static inline void amba_pclk_disable(struct amba_device *dev) -{ - clk_disable(dev->pclk); -} - -static inline int amba_pclk_prepare(struct amba_device *dev) -{ - return clk_prepare(dev->pclk); -} - -static inline void amba_pclk_unprepare(struct amba_device *dev) -{ - clk_unprepare(dev->pclk); -} - /* Some drivers don't use the struct amba_device */ #define AMBA_CONFIG_BITS(a) (((a) >> 24) & 0xff) #define AMBA_REV_BITS(a) (((a) >> 20) & 0x0f) -- cgit From dacf3ca134d0dc105caee77651a349a86bd77456 Mon Sep 17 00:00:00 2001 From: Wang Kefeng Date: Sat, 18 Dec 2021 09:30:41 +0100 Subject: ARM: 9173/1: amba: kill amba_find_match() There is no one use amba_find_match(), kill it. Signed-off-by: Kefeng Wang Signed-off-by: Russell King (Oracle) --- include/linux/amba/bus.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/amba/bus.h b/include/linux/amba/bus.h index 09174970b855..6562f543c3e0 100644 --- a/include/linux/amba/bus.h +++ b/include/linux/amba/bus.h @@ -117,7 +117,6 @@ void amba_device_put(struct amba_device *); int amba_device_add(struct amba_device *, struct resource *); int amba_device_register(struct amba_device *, struct resource *); void amba_device_unregister(struct amba_device *); -struct amba_device *amba_find_device(const char *, struct device *, unsigned int, unsigned int); int amba_request_regions(struct amba_device *, const char *); void amba_release_regions(struct amba_device *); -- cgit From 1a93b82c59ab45f2d9f14c47f09d2e341ff02381 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Fri, 18 Feb 2022 07:07:08 -0500 Subject: NFS: constify nfs_server_capable() and nfs_have_writebacks() Signed-off-by: Trond Myklebust --- include/linux/nfs_fs.h | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 72a732a5103c..6e10725887d1 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -363,7 +363,7 @@ static inline void nfs_mark_for_revalidate(struct inode *inode) spin_unlock(&inode->i_lock); } -static inline int nfs_server_capable(struct inode *inode, int cap) +static inline int nfs_server_capable(const struct inode *inode, int cap) { return NFS_SERVER(inode)->caps & cap; } @@ -587,12 +587,11 @@ extern struct nfs_commit_data *nfs_commitdata_alloc(bool never_fail); extern void nfs_commit_free(struct nfs_commit_data *data); bool nfs_commit_end(struct nfs_mds_commit_info *cinfo); -static inline int -nfs_have_writebacks(struct inode *inode) +static inline bool nfs_have_writebacks(const struct inode *inode) { if (S_ISREG(inode->i_mode)) return atomic_long_read(&NFS_I(inode)->nrequests) != 0; - return 0; + return false; } /* -- cgit From a9ff2e99e9fa501ec965da03c18a5422b37a2f44 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 25 Jan 2022 10:17:59 -0500 Subject: SUNRPC: Remove the .svo_enqueue_xprt method We have never been able to track down and address the underlying cause of the performance issues with workqueue-based service support. svo_enqueue_xprt is called multiple times per RPC, so it adds instruction path length, but always ends up at the same function: svc_xprt_do_enqueue(). We do not anticipate needing this flexibility for dynamic nfsd thread management support. As a micro-optimization, remove .svo_enqueue_xprt because Spectre/Meltdown makes virtual function calls more costly. This change essentially reverts commit b9e13cdfac70 ("nfsd/sunrpc: turn enqueueing a svc_xprt into a svc_serv operation"). Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 3 --- include/linux/sunrpc/svc_xprt.h | 1 - 2 files changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index f35c22b3355f..6ef9c1cafd0b 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -61,9 +61,6 @@ struct svc_serv_ops { /* function for service threads to run */ int (*svo_function)(void *); - /* queue up a transport for servicing */ - void (*svo_enqueue_xprt)(struct svc_xprt *); - /* optional module to count when adding threads. * Thread function must call module_put_and_kthread_exit() to exit. */ diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index 571f605bc91e..a3ba027fb4ba 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -131,7 +131,6 @@ int svc_create_xprt(struct svc_serv *, const char *, struct net *, const int, const unsigned short, int, const struct cred *); void svc_xprt_received(struct svc_xprt *xprt); -void svc_xprt_do_enqueue(struct svc_xprt *xprt); void svc_xprt_enqueue(struct svc_xprt *xprt); void svc_xprt_put(struct svc_xprt *xprt); void svc_xprt_copy_addrs(struct svc_rqst *rqstp, struct svc_xprt *xprt); -- cgit From 87cdd8641c8a1ec6afd2468265e20840a57fd888 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 25 Jan 2022 13:49:29 -0500 Subject: SUNRPC: Remove svo_shutdown method Clean up. Neil observed that "any code that calls svc_shutdown_net() knows what the shutdown function should be, and so can call it directly." Signed-off-by: Chuck Lever Reviewed-by: NeilBrown --- include/linux/sunrpc/svc.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 6ef9c1cafd0b..63794d772eb3 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -55,9 +55,6 @@ struct svc_pool { struct svc_serv; struct svc_serv_ops { - /* Callback to use when last thread exits. */ - void (*svo_shutdown)(struct svc_serv *, struct net *); - /* function for service threads to run */ int (*svo_function)(void *); -- cgit From 352ad31448fecc78a2e9b78da64eea5d63b8d0ce Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 26 Jan 2022 11:42:08 -0500 Subject: SUNRPC: Rename svc_create_xprt() Clean up: Use the "svc_xprt_" function naming convention as is used for other external APIs. Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc_xprt.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index a3ba027fb4ba..a7f6f17c3dc5 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -127,9 +127,10 @@ int svc_reg_xprt_class(struct svc_xprt_class *); void svc_unreg_xprt_class(struct svc_xprt_class *); void svc_xprt_init(struct net *, struct svc_xprt_class *, struct svc_xprt *, struct svc_serv *); -int svc_create_xprt(struct svc_serv *, const char *, struct net *, - const int, const unsigned short, int, - const struct cred *); +int svc_xprt_create(struct svc_serv *serv, const char *xprt_name, + struct net *net, const int family, + const unsigned short port, int flags, + const struct cred *cred); void svc_xprt_received(struct svc_xprt *xprt); void svc_xprt_enqueue(struct svc_xprt *xprt); void svc_xprt_put(struct svc_xprt *xprt); -- cgit From 4355d767a21b9445958fc11bce9a9701f76529d3 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 31 Jan 2022 13:34:29 -0500 Subject: SUNRPC: Rename svc_close_xprt() Clean up: Use the "svc_xprt_" function naming convention as is used for other external APIs. Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc_xprt.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index a7f6f17c3dc5..bf7d029fb48c 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -135,7 +135,7 @@ void svc_xprt_received(struct svc_xprt *xprt); void svc_xprt_enqueue(struct svc_xprt *xprt); void svc_xprt_put(struct svc_xprt *xprt); void svc_xprt_copy_addrs(struct svc_rqst *rqstp, struct svc_xprt *xprt); -void svc_close_xprt(struct svc_xprt *xprt); +void svc_xprt_close(struct svc_xprt *xprt); int svc_port_is_privileged(struct sockaddr *sin); int svc_print_xprts(char *buf, int maxlen); struct svc_xprt *svc_find_xprt(struct svc_serv *serv, const char *xcl_name, -- cgit From c7d7ec8f043e53ad16e30f5ebb8b9df415ec0f2b Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 26 Jan 2022 11:30:55 -0500 Subject: SUNRPC: Remove svc_shutdown_net() Clean up: svc_shutdown_net() now does nothing but call svc_close_net(). Replace all external call sites. svc_close_net() is renamed to be the inverse of svc_xprt_create(). Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 1 - include/linux/sunrpc/svc_xprt.h | 1 + 2 files changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 63794d772eb3..5603158b2aa7 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -508,7 +508,6 @@ struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int, const struct svc_serv_ops *); int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int); int svc_pool_stats_open(struct svc_serv *serv, struct file *file); -void svc_shutdown_net(struct svc_serv *, struct net *); int svc_process(struct svc_rqst *); int bc_svc_process(struct svc_serv *, struct rpc_rqst *, struct svc_rqst *); diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index bf7d029fb48c..42e113742429 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -131,6 +131,7 @@ int svc_xprt_create(struct svc_serv *serv, const char *xprt_name, struct net *net, const int family, const unsigned short port, int flags, const struct cred *cred); +void svc_xprt_destroy_all(struct svc_serv *serv, struct net *net); void svc_xprt_received(struct svc_xprt *xprt); void svc_xprt_enqueue(struct svc_xprt *xprt); void svc_xprt_put(struct svc_xprt *xprt); -- cgit From f49169c97fceb21ad6a0aaf671c50b0f520f15a5 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 16 Feb 2022 12:31:09 -0500 Subject: NFSD: Remove svc_serv_ops::svo_module struct svc_serv_ops is about to be removed. Neil Brown says: > I suspect svo_module can go as well - I don't think the thread is > ever the thing that primarily keeps a module active. A random sample of kthread_create() callers shows sunrpc is the only one that manages module reference count in this way. Suggested-by: Neil Brown Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 5603158b2aa7..dfc9283f412f 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -57,11 +57,6 @@ struct svc_serv; struct svc_serv_ops { /* function for service threads to run */ int (*svo_function)(void *); - - /* optional module to count when adding threads. - * Thread function must call module_put_and_kthread_exit() to exit. - */ - struct module *svo_module; }; /* -- cgit From 37902c6313090235c847af89c5515591261ee338 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 16 Feb 2022 12:16:27 -0500 Subject: NFSD: Move svc_serv_ops::svo_function into struct svc_serv Hoist svo_function back into svc_serv and remove struct svc_serv_ops, since the struct is now devoid of fields. Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index dfc9283f412f..a5dda4987e8b 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -52,13 +52,6 @@ struct svc_pool { unsigned long sp_flags; } ____cacheline_aligned_in_smp; -struct svc_serv; - -struct svc_serv_ops { - /* function for service threads to run */ - int (*svo_function)(void *); -}; - /* * RPC service. * @@ -91,7 +84,8 @@ struct svc_serv { unsigned int sv_nrpools; /* number of thread pools */ struct svc_pool * sv_pools; /* array of thread pools */ - const struct svc_serv_ops *sv_ops; /* server operations */ + int (*sv_threadfn)(void *data); + #if defined(CONFIG_SUNRPC_BACKCHANNEL) struct list_head sv_cb_list; /* queue for callback requests * that arrive over the same @@ -492,7 +486,7 @@ int svc_rpcb_setup(struct svc_serv *serv, struct net *net); void svc_rpcb_cleanup(struct svc_serv *serv, struct net *net); int svc_bind(struct svc_serv *serv, struct net *net); struct svc_serv *svc_create(struct svc_program *, unsigned int, - const struct svc_serv_ops *); + int (*threadfn)(void *data)); struct svc_rqst *svc_rqst_alloc(struct svc_serv *serv, struct svc_pool *pool, int node); void svc_rqst_replace_page(struct svc_rqst *rqstp, @@ -500,7 +494,7 @@ void svc_rqst_replace_page(struct svc_rqst *rqstp, void svc_rqst_free(struct svc_rqst *); void svc_exit_thread(struct svc_rqst *); struct svc_serv * svc_create_pooled(struct svc_program *, unsigned int, - const struct svc_serv_ops *); + int (*threadfn)(void *data)); int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int); int svc_pool_stats_open(struct svc_serv *serv, struct file *file); int svc_process(struct svc_rqst *); -- cgit From 74aaf96feaca80285912cc6f19575b3e97177918 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 22 Feb 2022 13:10:52 -0500 Subject: SUNRPC: Teach server to recognize RPC_AUTH_TLS Initial support for the RPC_AUTH_TLS authentication flavor enables NFSD to eventually accept an RPC_AUTH_TLS probe from clients. This patch simply prevents NFSD from rejecting these probes completely. In the meantime, graft this support in now so that RPC_AUTH_TLS support keeps up with generic code and API changes in the RPC server. Down the road, server-side transport implementations will populate xpo_start_tls when they can support RPC-with-TLS. For example, TCP will eventually populate it, but RDMA won't. Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc_xprt.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc_xprt.h b/include/linux/sunrpc/svc_xprt.h index 42e113742429..20068ccfd0cc 100644 --- a/include/linux/sunrpc/svc_xprt.h +++ b/include/linux/sunrpc/svc_xprt.h @@ -28,6 +28,7 @@ struct svc_xprt_ops { void (*xpo_free)(struct svc_xprt *); void (*xpo_secure_port)(struct svc_rqst *rqstp); void (*xpo_kill_temp_xprt)(struct svc_xprt *); + void (*xpo_start_tls)(struct svc_xprt *); }; struct svc_xprt_class { -- cgit From 797bd4d41c8b41afc10fad39146ac9558cb39e94 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Thu, 24 Feb 2022 10:55:54 +0100 Subject: tty: serial: define UART_LCR_WLEN() macro Define a generic UART_LCR_WLEN() macro with a size argument. It can be used to encode byte size into an LCR value. Therefore we can use it to simplify the drivers using tty_get_char_size() in the next patches. Signed-off-by: Jiri Slaby Link: https://lore.kernel.org/r/20220224095558.30929-1-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman --- include/linux/serial.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/serial.h b/include/linux/serial.h index 0916107c77f9..0b8b7d7c8f33 100644 --- a/include/linux/serial.h +++ b/include/linux/serial.h @@ -12,6 +12,8 @@ #include #include +/* Helper for dealing with UART_LCR_WLEN* defines */ +#define UART_LCR_WLEN(x) ((x) - 5) /* * Counters of the input lines (CTS, DSR, RI, CD) interrupts -- cgit From 17a8f31bba7bac8cce4bd12bab50697da96e7710 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Mon, 28 Feb 2022 04:18:05 +0100 Subject: netfilter: egress: silence egress hook lockdep splats Netfilter assumes its called with rcu_read_lock held, but in egress hook case it may be called with BH readlock. This triggers lockdep splat. In order to avoid to change all rcu_dereference() to rcu_dereference_check(..., rcu_read_lock_bh_held()), wrap nf_hook_slow with read lock/unlock pair. Reported-by: Eric Dumazet Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- include/linux/netfilter_netdev.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netfilter_netdev.h b/include/linux/netfilter_netdev.h index b4dd96e4dc8d..e6487a691136 100644 --- a/include/linux/netfilter_netdev.h +++ b/include/linux/netfilter_netdev.h @@ -101,7 +101,11 @@ static inline struct sk_buff *nf_hook_egress(struct sk_buff *skb, int *rc, nf_hook_state_init(&state, NF_NETDEV_EGRESS, NFPROTO_NETDEV, dev, NULL, NULL, dev_net(dev), NULL); + + /* nf assumes rcu_read_lock, not just read_lock_bh */ + rcu_read_lock(); ret = nf_hook_slow(skb, &state, e, 0); + rcu_read_unlock(); if (ret == 1) { return skb; -- cgit From dcfd5192563909219f6304b4e3e10db071158eef Mon Sep 17 00:00:00 2001 From: Alyssa Rosenzweig Date: Tue, 15 Feb 2022 13:46:51 -0500 Subject: soc: mediatek: mtk-infracfg: Disable ACP on MT8192 MT8192 contains an experimental Accelerator Coherency Port implementation, which does not work correctly but was unintentionally enabled by default. For correct operation of the GPU, we must set a chicken bit disabling ACP on MT8192. Adapted from the following downstream change to the out-of-tree, legacy Mali GPU driver: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/2781271/5 Note this change is required for both Panfrost and the legacy kernel driver. Co-developed-by: Robin Murphy Signed-off-by: Robin Murphy Signed-off-by: Alyssa Rosenzweig Cc: Nick Fan Cc: Nicolas Boichat Cc: Chen-Yu Tsai Cc: Stephen Boyd Cc: AngeloGioacchino Del Regno Tested-by: AngeloGioacchino Del Regno Link: https://lore.kernel.org/r/20220215184651.12168-1-alyssa.rosenzweig@collabora.com Signed-off-by: Matthias Brugger --- include/linux/soc/mediatek/infracfg.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/mediatek/infracfg.h b/include/linux/soc/mediatek/infracfg.h index 8a1c2040a28e..50804ac748bd 100644 --- a/include/linux/soc/mediatek/infracfg.h +++ b/include/linux/soc/mediatek/infracfg.h @@ -277,6 +277,9 @@ #define INFRA_TOPAXI_PROTECTEN_SET 0x0260 #define INFRA_TOPAXI_PROTECTEN_CLR 0x0264 +#define MT8192_INFRA_CTRL 0x290 +#define MT8192_INFRA_CTRL_DISABLE_MFG2ACP BIT(9) + #define REG_INFRA_MISC 0xf00 #define F_DDR_4GB_SUPPORT_EN BIT(13) -- cgit From cb66b9ba5cdacab5592bce31a4083fc4ac172ea5 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Mon, 28 Feb 2022 22:58:28 -0800 Subject: Input: add input_copy_abs() function Add a new helper function to copy absinfo from one input_dev to another input_dev. This is useful to e.g. setup a pen/stylus input-device for combined touchscreen/pen hardware where the pen uses the same coordinates as the touchscreen. Suggested-by: Dmitry Torokhov Signed-off-by: Hans de Goede Link: https://lore.kernel.org/r/20220131143539.109142-2-hdegoede@redhat.com Signed-off-by: Dmitry Torokhov --- include/linux/input.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/input.h b/include/linux/input.h index 0354b298d874..49790c1bd2c4 100644 --- a/include/linux/input.h +++ b/include/linux/input.h @@ -475,6 +475,8 @@ static inline void input_set_events_per_packet(struct input_dev *dev, int n_even void input_alloc_absinfo(struct input_dev *dev); void input_set_abs_params(struct input_dev *dev, unsigned int axis, int min, int max, int fuzz, int flat); +void input_copy_abs(struct input_dev *dst, unsigned int dst_axis, + const struct input_dev *src, unsigned int src_axis); #define INPUT_GENERATE_ABS_ACCESSORS(_suffix, _item) \ static inline int input_abs_get_##_suffix(struct input_dev *dev, \ -- cgit From 50bb467c9e76743fbc8441d29113cdad62dbc4fe Mon Sep 17 00:00:00 2001 From: Ben Dooks Date: Fri, 18 Feb 2022 09:38:58 +0000 Subject: rfkill: define rfill_soft_blocked() if !RFKILL If CONFIG_RFKILL is not set, the Intel WiFi driver will not build the iw_mvm driver part due to the missing rfill_soft_blocked() call. Adding a inline declaration of rfill_soft_blocked() if CONFIG_RFKILL=n fixes the following error: drivers/net/wireless/intel/iwlwifi/mvm/mvm.h: In function 'iwl_mvm_mei_set_sw_rfkill_state': drivers/net/wireless/intel/iwlwifi/mvm/mvm.h:2215:38: error: implicit declaration of function 'rfkill_soft_blocked'; did you mean 'rfkill_blocked'? [-Werror=implicit-function-declaration] 2215 | mvm->hw_registered ? rfkill_soft_blocked(mvm->hw->wiphy->rfkill) : false; | ^~~~~~~~~~~~~~~~~~~ | rfkill_blocked Signed-off-by: Ben Dooks Reported-by: Neill Whillans Fixes: 5bc9a9dd7535 ("rfkill: allow to get the software rfkill state") Link: https://lore.kernel.org/r/20220218093858.1245677-1-ben.dooks@codethink.co.uk Signed-off-by: Johannes Berg --- include/linux/rfkill.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rfkill.h b/include/linux/rfkill.h index c35f3962dc4f..373003ace639 100644 --- a/include/linux/rfkill.h +++ b/include/linux/rfkill.h @@ -308,6 +308,11 @@ static inline bool rfkill_blocked(struct rfkill *rfkill) return false; } +static inline bool rfkill_soft_blocked(struct rfkill *rfkill) +{ + return false; +} + static inline enum rfkill_type rfkill_find_type(const char *name) { return RFKILL_TYPE_ALL; -- cgit From 2f6f66ccd21e854cd3743bc8def68f8b4e7d91fc Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 25 Feb 2022 18:22:44 +0000 Subject: KVM: Drop kvm_reload_remote_mmus(), open code request in x86 users Remove the generic kvm_reload_remote_mmus() and open code its functionality into the two x86 callers. x86 is (obviously) the only architecture that uses the hook, and is also the only architecture that uses KVM_REQ_MMU_RELOAD in a way that's consistent with the name. That will change in a future patch, as x86's usage when zapping a single shadow page x86 doesn't actually _need_ to reload all vCPUs' MMUs, only MMUs whose root is being zapped actually need to be reloaded. s390 also uses KVM_REQ_MMU_RELOAD, but for a slightly different purpose. Drop the generic code in anticipation of implementing s390 and x86 arch specific requests, which will allow dropping KVM_REQ_MMU_RELOAD entirely. Opportunistically reword the x86 TDP MMU comment to avoid making references to functions (and requests!) when possible, and to remove the rather ambiguous "this". No functional change intended. Cc: Ben Gardon Signed-off-by: Sean Christopherson Reviewed-by: Ben Gardon Message-Id: <20220225182248.3812651-4-seanjc@google.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index f11039944c08..0aeb47cffd43 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1325,7 +1325,6 @@ int kvm_vcpu_yield_to(struct kvm_vcpu *target); void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool usermode_vcpu_not_eligible); void kvm_flush_remote_tlbs(struct kvm *kvm); -void kvm_reload_remote_mmus(struct kvm *kvm); #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min); -- cgit From e65a3b46b5b1fab92c3273bcf71e028a4d307400 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 25 Feb 2022 18:22:47 +0000 Subject: KVM: Drop KVM_REQ_MMU_RELOAD and update vcpu-requests.rst documentation Remove the now unused KVM_REQ_MMU_RELOAD, shift KVM_REQ_VM_DEAD into the unoccupied space, and update vcpu-requests.rst, which was missing an entry for KVM_REQ_VM_DEAD. Switching KVM_REQ_VM_DEAD to entry '1' also fixes the stale comment about bits 4-7 being reserved. Reviewed-by: Claudio Imbrenda Signed-off-by: Sean Christopherson Reviewed-by: Ben Gardon Message-Id: <20220225182248.3812651-7-seanjc@google.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 0aeb47cffd43..9536ffa0473b 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -153,10 +153,9 @@ static inline bool is_error_page(struct page *page) * Bits 4-7 are reserved for more arch-independent bits. */ #define KVM_REQ_TLB_FLUSH (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) -#define KVM_REQ_MMU_RELOAD (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) +#define KVM_REQ_VM_DEAD (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQ_UNBLOCK 2 #define KVM_REQ_UNHALT 3 -#define KVM_REQ_VM_DEAD (4 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQ_GPC_INVALIDATE (5 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQUEST_ARCH_BASE 8 -- cgit From e45f1c1d70cae0d7a28ad60f9c6391e210354f0b Mon Sep 17 00:00:00 2001 From: Georgi Djakov Date: Tue, 1 Mar 2022 11:07:35 +0200 Subject: interconnect: Add stubs for the bulk API Add stub functions for the bulk API to allow compile testing. Reviewed-by: Alex Elder Link: https://lore.kernel.org/r/20220301090735.26599-1-djakov@kernel.org Signed-off-by: Georgi Djakov --- include/linux/interconnect.h | 36 +++++++++++++++++++++++++++++------- 1 file changed, 29 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h index f2dd2fc8d3cd..f685777b875e 100644 --- a/include/linux/interconnect.h +++ b/include/linux/interconnect.h @@ -38,13 +38,6 @@ struct icc_bulk_data { u32 peak_bw; }; -int __must_check of_icc_bulk_get(struct device *dev, int num_paths, - struct icc_bulk_data *paths); -void icc_bulk_put(int num_paths, struct icc_bulk_data *paths); -int icc_bulk_set_bw(int num_paths, const struct icc_bulk_data *paths); -int icc_bulk_enable(int num_paths, const struct icc_bulk_data *paths); -void icc_bulk_disable(int num_paths, const struct icc_bulk_data *paths); - #if IS_ENABLED(CONFIG_INTERCONNECT) struct icc_path *icc_get(struct device *dev, const int src_id, @@ -58,6 +51,12 @@ int icc_disable(struct icc_path *path); int icc_set_bw(struct icc_path *path, u32 avg_bw, u32 peak_bw); void icc_set_tag(struct icc_path *path, u32 tag); const char *icc_get_name(struct icc_path *path); +int __must_check of_icc_bulk_get(struct device *dev, int num_paths, + struct icc_bulk_data *paths); +void icc_bulk_put(int num_paths, struct icc_bulk_data *paths); +int icc_bulk_set_bw(int num_paths, const struct icc_bulk_data *paths); +int icc_bulk_enable(int num_paths, const struct icc_bulk_data *paths); +void icc_bulk_disable(int num_paths, const struct icc_bulk_data *paths); #else @@ -112,6 +111,29 @@ static inline const char *icc_get_name(struct icc_path *path) return NULL; } +static inline int of_icc_bulk_get(struct device *dev, int num_paths, struct icc_bulk_data *paths) +{ + return 0; +} + +static inline void icc_bulk_put(int num_paths, struct icc_bulk_data *paths) +{ +} + +static inline int icc_bulk_set_bw(int num_paths, const struct icc_bulk_data *paths) +{ + return 0; +} + +static inline int icc_bulk_enable(int num_paths, const struct icc_bulk_data *paths) +{ + return 0; +} + +static inline void icc_bulk_disable(int num_paths, const struct icc_bulk_data *paths) +{ +} + #endif /* CONFIG_INTERCONNECT */ #endif /* __LINUX_INTERCONNECT_H */ -- cgit From 1c1813a743fe84d0dcf53743baa4edc1f74c44c8 Mon Sep 17 00:00:00 2001 From: Benjamin Tissoires Date: Thu, 3 Feb 2022 15:32:15 +0100 Subject: HID: core: statically allocate read buffers This is a preparation patch for rethinking the generic processing of HID reports. We can actually pre-allocate all of our memory instead of dynamically allocating/freeing it whenever we parse a report. Signed-off-by: Benjamin Tissoires Reviewed-by: Ping Cheng Acked-by: Peter Hutterer Signed-off-by: Jiri Kosina --- include/linux/hid.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/hid.h b/include/linux/hid.h index 7487b0586fe6..3fbfe0986659 100644 --- a/include/linux/hid.h +++ b/include/linux/hid.h @@ -476,6 +476,7 @@ struct hid_field { unsigned report_count; /* number of this field in the report */ unsigned report_type; /* (input,output,feature) */ __s32 *value; /* last known value(s) */ + __s32 *new_value; /* newly read value(s) */ __s32 logical_minimum; __s32 logical_maximum; __s32 physical_minimum; -- cgit From b79c1abae5e19726c5060749e4e7c9e426b045c8 Mon Sep 17 00:00:00 2001 From: Benjamin Tissoires Date: Thu, 3 Feb 2022 15:32:17 +0100 Subject: HID: core: split data fetching from processing in hid_input_field() This is a preparatory patch for being able to process the usages out of order. We split the retrieval of the data in a separate function and also split out the processing of the usages depending if the field is an array or a variable. No functional changes from this patch. Signed-off-by: Benjamin Tissoires Reviewed-by: Ping Cheng Acked-by: Peter Hutterer Signed-off-by: Jiri Kosina --- include/linux/hid.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hid.h b/include/linux/hid.h index 3fbfe0986659..cf79eb3da465 100644 --- a/include/linux/hid.h +++ b/include/linux/hid.h @@ -342,7 +342,7 @@ struct hid_item { * HID device quirks. */ -/* +/* * Increase this if you need to configure more HID quirks at module load time */ #define MAX_USBHID_BOOT_QUIRKS 4 @@ -483,6 +483,7 @@ struct hid_field { __s32 physical_maximum; __s32 unit_exponent; unsigned unit; + bool ignored; /* this field is ignored in this event */ struct hid_report *report; /* associated report */ unsigned index; /* index into report->field[] */ /* hidinput data */ -- cgit From 22f4b026c3ddd4b26c5baa202bd3ee38feaa2e9a Mon Sep 17 00:00:00 2001 From: Benjamin Tissoires Date: Thu, 3 Feb 2022 15:32:21 +0100 Subject: HID: compute an ordered list of input fields to process This will be used in a later commit: we build a list of input fields (and usage_index) that is ordered based on a usage priority. Changing the usage priority allows to re-order the processed list, meaning that we can enforce some usages to be process before others. For instance, before processing InRange in the HID tablets, we need to know if we are using the eraser (side or button). Enforcing a higher (lower number) priority for Invert allows to force the input stack to process that field before. Signed-off-by: Benjamin Tissoires Reviewed-by: Ping Cheng Acked-by: Peter Hutterer Signed-off-by: Jiri Kosina --- include/linux/hid.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hid.h b/include/linux/hid.h index cf79eb3da465..f25020c0d6b8 100644 --- a/include/linux/hid.h +++ b/include/linux/hid.h @@ -477,6 +477,7 @@ struct hid_field { unsigned report_type; /* (input,output,feature) */ __s32 *value; /* last known value(s) */ __s32 *new_value; /* newly read value(s) */ + __s32 *usages_priorities; /* priority of each usage when reading the report */ __s32 logical_minimum; __s32 logical_maximum; __s32 physical_minimum; @@ -493,13 +494,22 @@ struct hid_field { #define HID_MAX_FIELDS 256 +struct hid_field_entry { + struct list_head list; + struct hid_field *field; + unsigned int index; + __s32 priority; +}; + struct hid_report { struct list_head list; struct list_head hidinput_list; + struct list_head field_entry_list; /* ordered list of input fields */ unsigned int id; /* id of this report */ unsigned int type; /* report type */ unsigned int application; /* application usage for this report */ struct hid_field *field[HID_MAX_FIELDS]; /* fields of the report */ + struct hid_field_entry *field_entries; /* allocated memory of input field_entry */ unsigned maxfield; /* maximum valid field index */ unsigned size; /* size of the report (bits) */ struct hid_device *device; /* associated device */ -- cgit From 048cddfd440583a07530774fe20c7d26d7378155 Mon Sep 17 00:00:00 2001 From: Benjamin Tissoires Date: Thu, 3 Feb 2022 15:32:23 +0100 Subject: HID: input: enforce Invert usage to be processed before InRange When a device exposes both Invert and InRange, Invert must be processed before InRange. If we keep the order of the device and we process them out of order, InRange will first set BTN_TOOL_PEN, and then Invert will set BTN_TOOL_RUBBER. Userspace knows how to deal with that situation, but fixing it in the kernel is now easier. Signed-off-by: Benjamin Tissoires Reviewed-by: Ping Cheng Acked-by: Peter Hutterer Signed-off-by: Jiri Kosina --- include/linux/hid.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hid.h b/include/linux/hid.h index f25020c0d6b8..eaad0655b05c 100644 --- a/include/linux/hid.h +++ b/include/linux/hid.h @@ -477,7 +477,9 @@ struct hid_field { unsigned report_type; /* (input,output,feature) */ __s32 *value; /* last known value(s) */ __s32 *new_value; /* newly read value(s) */ - __s32 *usages_priorities; /* priority of each usage when reading the report */ + __s32 *usages_priorities; /* priority of each usage when reading the report + * bits 8-16 are reserved for hid-input usage + */ __s32 logical_minimum; __s32 logical_maximum; __s32 physical_minimum; -- cgit From 87562fcd134214a68e58d0714b820f2f2da75b1f Mon Sep 17 00:00:00 2001 From: Benjamin Tissoires Date: Thu, 3 Feb 2022 15:32:24 +0100 Subject: HID: input: remove the need for HID_QUIRK_INVERT HID_QUIRK_INVERT is kind of complex to deal with and was bogus. Furthermore, it didn't make sense to use a global per struct hid_device quirk for something dynamic as the current state. Store the current tool information in the report itself, and re-order the processing of the fields to enforce having all the tablet "state" fields before getting to In Range and other input fields. This way, we now have all the information whether a tool is present or not while processing In Range. This new behavior enforces that only one tool gets forwarded to userspace at the same time, and that if either eraser or invert is set, we enforce BTN_TOOL_RUBBER. Note that the release of the previous tool now happens in its own EV_SYN report so userspace doesn't get confused by having 2 tools. These changes are tested in the following hid-tools regression tests: https://gitlab.freedesktop.org/libevdev/hid-tools/-/merge_requests/127 Signed-off-by: Benjamin Tissoires Reviewed-by: Ping Cheng Acked-by: Peter Hutterer Signed-off-by: Jiri Kosina --- include/linux/hid.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hid.h b/include/linux/hid.h index eaad0655b05c..feb8df61168f 100644 --- a/include/linux/hid.h +++ b/include/linux/hid.h @@ -347,7 +347,7 @@ struct hid_item { */ #define MAX_USBHID_BOOT_QUIRKS 4 -#define HID_QUIRK_INVERT BIT(0) +/* BIT(0) reserved for backward compatibility, was HID_QUIRK_INVERT */ #define HID_QUIRK_NOTOUCH BIT(1) #define HID_QUIRK_IGNORE BIT(2) #define HID_QUIRK_NOGET BIT(3) @@ -515,6 +515,10 @@ struct hid_report { unsigned maxfield; /* maximum valid field index */ unsigned size; /* size of the report (bits) */ struct hid_device *device; /* associated device */ + + /* tool related state */ + bool tool_active; /* whether the current tool is active */ + unsigned int tool; /* BTN_TOOL_* */ }; #define HID_MAX_IDS 256 -- cgit From 5c20000a4756f57c824e3045c978ef19136a676d Mon Sep 17 00:00:00 2001 From: Benjamin Tissoires Date: Thu, 3 Feb 2022 15:32:25 +0100 Subject: HID: input: accommodate priorities for slotted devices Multitouch devices in hybrid mode are reporting multiple times the same collection. We should accommodate for this in our handling of priorities by defining the slots they belong to. Signed-off-by: Benjamin Tissoires Reviewed-by: Ping Cheng Acked-by: Peter Hutterer Signed-off-by: Jiri Kosina --- include/linux/hid.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/hid.h b/include/linux/hid.h index feb8df61168f..4363a63b9775 100644 --- a/include/linux/hid.h +++ b/include/linux/hid.h @@ -492,6 +492,7 @@ struct hid_field { /* hidinput data */ struct hid_input *hidinput; /* associated input structure */ __u16 dpad; /* dpad input code */ + unsigned int slot_idx; /* slot index in a report */ }; #define HID_MAX_FIELDS 256 -- cgit From dc6e0818bc9a0336d9accf3ea35d146d72aa7a18 Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Sun, 20 Feb 2022 13:14:25 +0800 Subject: sched/cpuacct: Optimize away RCU read lock Since cpuacct_charge() is called from the scheduler update_curr(), we must already have rq lock held, then the RCU read lock can be optimized away. And do the same thing in it's wrapper cgroup_account_cputime(), but we can't use lockdep_assert_rq_held() there, which defined in kernel/sched/sched.h. Suggested-by: Peter Zijlstra (Intel) Signed-off-by: Chengming Zhou Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220220051426.5274-2-zhouchengming@bytedance.com --- include/linux/cgroup.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 75c151413fda..9a109c6ac0e0 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -791,11 +791,9 @@ static inline void cgroup_account_cputime(struct task_struct *task, cpuacct_charge(task, delta_exec); - rcu_read_lock(); cgrp = task_dfl_cgroup(task); if (cgroup_parent(cgrp)) __cgroup_account_cputime(cgrp, delta_exec); - rcu_read_unlock(); } static inline void cgroup_account_cputime_field(struct task_struct *task, -- cgit From 3eba0505d03a9c1eb30d40c2330c0880b22d1b3a Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Sun, 20 Feb 2022 13:14:26 +0800 Subject: sched/cpuacct: Remove redundant RCU read lock The cpuacct_account_field() and it's cgroup v2 wrapper cgroup_account_cputime_field() is only called from cputime in task_group_account_field(), which is already in RCU read-side critical section. So remove these redundant RCU read lock. Suggested-by: Tejun Heo Signed-off-by: Chengming Zhou Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220220051426.5274-3-zhouchengming@bytedance.com --- include/linux/cgroup.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 9a109c6ac0e0..1e356c222756 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -804,11 +804,9 @@ static inline void cgroup_account_cputime_field(struct task_struct *task, cpuacct_account_field(task, index, delta_exec); - rcu_read_lock(); cgrp = task_dfl_cgroup(task); if (cgroup_parent(cgrp)) __cgroup_account_cputime_field(cgrp, index, delta_exec); - rcu_read_unlock(); } #else /* CONFIG_CGROUPS */ -- cgit From fa2c3254d7cfff5f7a916ab928a562d1165f17bb Mon Sep 17 00:00:00 2001 From: Valentin Schneider Date: Thu, 20 Jan 2022 16:25:19 +0000 Subject: sched/tracing: Don't re-read p->state when emitting sched_switch event As of commit c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu") the following sequence becomes possible: p->__state = TASK_INTERRUPTIBLE; __schedule() deactivate_task(p); ttwu() READ !p->on_rq p->__state=TASK_WAKING trace_sched_switch() __trace_sched_switch_state() task_state_index() return 0; TASK_WAKING isn't in TASK_REPORT, so the task appears as TASK_RUNNING in the trace event. Prevent this by pushing the value read from __schedule() down the trace event. Reported-by: Abhijeet Dharmapurikar Signed-off-by: Valentin Schneider Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Steven Rostedt (Google) Link: https://lore.kernel.org/r/20220120162520.570782-2-valentin.schneider@arm.com --- include/linux/sched.h | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index f00132e7ef6e..457c8a058b77 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1620,10 +1620,10 @@ static inline pid_t task_pgrp_nr(struct task_struct *tsk) #define TASK_REPORT_IDLE (TASK_REPORT + 1) #define TASK_REPORT_MAX (TASK_REPORT_IDLE << 1) -static inline unsigned int task_state_index(struct task_struct *tsk) +static inline unsigned int __task_state_index(unsigned int tsk_state, + unsigned int tsk_exit_state) { - unsigned int tsk_state = READ_ONCE(tsk->__state); - unsigned int state = (tsk_state | tsk->exit_state) & TASK_REPORT; + unsigned int state = (tsk_state | tsk_exit_state) & TASK_REPORT; BUILD_BUG_ON_NOT_POWER_OF_2(TASK_REPORT_MAX); @@ -1633,6 +1633,11 @@ static inline unsigned int task_state_index(struct task_struct *tsk) return fls(state); } +static inline unsigned int task_state_index(struct task_struct *tsk) +{ + return __task_state_index(READ_ONCE(tsk->__state), tsk->exit_state); +} + static inline char task_index_to_char(unsigned int state) { static const char state_char[] = "RSDTtXZPI"; -- cgit From 25795ef6299f07ce3838f3253a9cb34f64efcfae Mon Sep 17 00:00:00 2001 From: Valentin Schneider Date: Thu, 20 Jan 2022 16:25:20 +0000 Subject: sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit TASK_RTLOCK_WAIT currently isn't part of TASK_REPORT, thus a task blocking on an rtlock will appear as having a task state == 0, IOW TASK_RUNNING. The actual state is saved in p->saved_state, but reading it after reading p->__state has a few issues: o that could still be TASK_RUNNING in the case of e.g. rt_spin_lock o ttwu_state_match() might have changed that to TASK_RUNNING As pointed out by Eric, adding TASK_RTLOCK_WAIT to TASK_REPORT implies exposing a new state to userspace tools which way not know what to do with them. The only information that needs to be conveyed here is that a task is waiting on an rt_mutex, which matches TASK_UNINTERRUPTIBLE - there's no need for a new state. Reported-by: Uwe Kleine-König Signed-off-by: Valentin Schneider Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Steven Rostedt (Google) Link: https://lore.kernel.org/r/20220120162520.570782-3-valentin.schneider@arm.com --- include/linux/sched.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 457c8a058b77..78b606c9ab38 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1630,6 +1630,14 @@ static inline unsigned int __task_state_index(unsigned int tsk_state, if (tsk_state == TASK_IDLE) state = TASK_REPORT_IDLE; + /* + * We're lying here, but rather than expose a completely new task state + * to userspace, we can make this appear as if the task has gone through + * a regular rt_mutex_lock() call. + */ + if (tsk_state == TASK_RTLOCK_WAIT) + state = TASK_UNINTERRUPTIBLE; + return fls(state); } -- cgit From 6ddf5f165f13ab623d04aee2a473d35818255199 Mon Sep 17 00:00:00 2001 From: Milind Changire Date: Mon, 14 Feb 2022 05:01:01 +0000 Subject: ceph: add getvxattr op Problem: Some directory vxattrs (e.g. ceph.dir.pin.random) are governed by information that isn't necessarily shared with the client. Add support for the new GETVXATTR operation, which allows the client to query the MDS directly for vxattrs. When the client is queried for a vxattr that doesn't have a special handler, have it issue a GETVXATTR to the MDS directly. Solution: Adds new getvxattr op to fetch ceph.dir.pin*, ceph.dir.layout* and ceph.file.layout* vxattrs. If the entire layout for a dir or a file is being set, then it is expected that the layout be set in standard JSON format. Individual field value retrieval is not wrapped in JSON. The JSON format also applies while setting the vxattr if the entire layout is being set in one go. As a temporary measure, setting a vxattr can also be done in the old format. The old format will be deprecated in the future. URL: https://tracker.ceph.com/issues/51062 Signed-off-by: Milind Changire Reviewed-by: Jeff Layton Signed-off-by: Ilya Dryomov --- include/linux/ceph/ceph_fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h index 7ad6c3d0db7d..66db21ac5f0c 100644 --- a/include/linux/ceph/ceph_fs.h +++ b/include/linux/ceph/ceph_fs.h @@ -328,6 +328,7 @@ enum { CEPH_MDS_OP_LOOKUPPARENT = 0x00103, CEPH_MDS_OP_LOOKUPINO = 0x00104, CEPH_MDS_OP_LOOKUPNAME = 0x00105, + CEPH_MDS_OP_GETVXATTR = 0x00106, CEPH_MDS_OP_SETXATTR = 0x01105, CEPH_MDS_OP_RMXATTR = 0x01106, -- cgit From ab58a5a1c0487b67f7409f39d3c8593d416d4e7f Mon Sep 17 00:00:00 2001 From: Xiubo Li Date: Tue, 15 Feb 2022 20:23:14 +0800 Subject: ceph: move to a dedicated slabcache for ceph_cap_snap There could be huge number of capsnaps around at any given time. On x86_64 the structure is 248 bytes, which will be rounded up to 256 bytes by kzalloc. Move this to a dedicated slabcache to save 8 bytes for each. [ jlayton: use kmem_cache_zalloc ] Signed-off-by: Xiubo Li Signed-off-by: Jeff Layton Signed-off-by: Ilya Dryomov --- include/linux/ceph/libceph.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/ceph/libceph.h b/include/linux/ceph/libceph.h index edf62eaa6285..00af2c98da75 100644 --- a/include/linux/ceph/libceph.h +++ b/include/linux/ceph/libceph.h @@ -284,6 +284,7 @@ DEFINE_RB_LOOKUP_FUNC(name, type, keyfld, nodefld) extern struct kmem_cache *ceph_inode_cachep; extern struct kmem_cache *ceph_cap_cachep; +extern struct kmem_cache *ceph_cap_snap_cachep; extern struct kmem_cache *ceph_cap_flush_cachep; extern struct kmem_cache *ceph_dentry_cachep; extern struct kmem_cache *ceph_file_cachep; -- cgit From 1753629ea0f34900467185b7d8b0db11a64f4728 Mon Sep 17 00:00:00 2001 From: Xiubo Li Date: Wed, 23 Feb 2022 09:04:55 +0800 Subject: ceph: remove incorrect and unused CEPH_INO_DOTDOT macro Ceph have removed this macro and the 0x3 will be use for global dummy snaprealm. Signed-off-by: Xiubo Li Reviewed-by: Jeff Layton Signed-off-by: Ilya Dryomov --- include/linux/ceph/ceph_fs.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h index 66db21ac5f0c..f14f9bc290e6 100644 --- a/include/linux/ceph/ceph_fs.h +++ b/include/linux/ceph/ceph_fs.h @@ -29,7 +29,6 @@ #define CEPH_INO_ROOT 1 #define CEPH_INO_CEPH 2 /* hidden .ceph dir */ -#define CEPH_INO_DOTDOT 3 /* used by ceph fuse for parent (..) */ /* arbitrary limit on max # of monitors (cluster of 3 is typical) */ #define CEPH_MAX_MON 31 -- cgit From 5ed91587e201c77b35a5555c8c082655bb5834fe Mon Sep 17 00:00:00 2001 From: Xiubo Li Date: Wed, 23 Feb 2022 09:04:56 +0800 Subject: ceph: do not release the global snaprealm until unmounting The global snaprealm would be created and then destroyed immediately every time when updating it. URL: https://tracker.ceph.com/issues/54362 Signed-off-by: Xiubo Li Reviewed-by: Jeff Layton Signed-off-by: Ilya Dryomov --- include/linux/ceph/ceph_fs.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h index f14f9bc290e6..86bf82dbd8b8 100644 --- a/include/linux/ceph/ceph_fs.h +++ b/include/linux/ceph/ceph_fs.h @@ -28,7 +28,8 @@ #define CEPH_INO_ROOT 1 -#define CEPH_INO_CEPH 2 /* hidden .ceph dir */ +#define CEPH_INO_CEPH 2 /* hidden .ceph dir */ +#define CEPH_INO_GLOBAL_SNAPREALM 3 /* global dummy snaprealm */ /* arbitrary limit on max # of monitors (cluster of 3 is typical) */ #define CEPH_MAX_MON 31 -- cgit From 2cbfae0f50f7a0f8fc9d552bd856f6cd7b7608b6 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Thu, 24 Feb 2022 01:56:20 +0200 Subject: ACPI: platform: Constify properties parameter in acpi_create_platform_device() Properties are not and should not be changed in the callee, hence constify properties parameter in acpi_create_platform_device(). Signed-off-by: Andy Shevchenko Signed-off-by: Rafael J. Wysocki --- include/linux/acpi.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 6274758648e3..9ac545379447 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -691,7 +691,7 @@ int acpi_device_uevent_modalias(struct device *, struct kobj_uevent_env *); int acpi_device_modalias(struct device *, char *, int); struct platform_device *acpi_create_platform_device(struct acpi_device *, - struct property_entry *); + const struct property_entry *); #define ACPI_PTR(_ptr) (_ptr) static inline void acpi_device_set_enumerated(struct acpi_device *adev) @@ -930,7 +930,7 @@ static inline int acpi_device_modalias(struct device *dev, static inline struct platform_device * acpi_create_platform_device(struct acpi_device *adev, - struct property_entry *properties) + const struct property_entry *properties) { return NULL; } -- cgit From d65bc29be0ae4ca2368df25dc6f6247aefb57f07 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Sun, 13 Feb 2022 22:25:20 +0300 Subject: binfmt: move more stuff undef CONFIG_COREDUMP struct linux_binfmt::core_dump and struct min_coredump::min_coredump are used under CONFIG_COREDUMP only. Shrink those embedded configs a bit. Signed-off-by: Alexey Dobriyan Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/YglbIFyN+OtwVyjW@localhost.localdomain --- include/linux/binfmts.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index 049cf9421d83..5d651c219c99 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -98,8 +98,10 @@ struct linux_binfmt { struct module *module; int (*load_binary)(struct linux_binprm *); int (*load_shlib)(struct file *); +#ifdef CONFIG_COREDUMP int (*core_dump)(struct coredump_params *cprm); unsigned long min_coredump; /* minimal dump size */ +#endif } __randomize_layout; extern void __register_binfmt(struct linux_binfmt *fmt, int insert); -- cgit From 26440303310591e29121964ede0048583cb3126d Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 24 Feb 2022 18:55:52 +0100 Subject: scsi: core: Remove This header is empty now except for an include of , so remove it. Link: https://lore.kernel.org/r/20220224175552.988286-9-hch@lst.de Reviewed-by: Bart Van Assche Reviewed-by: John Garry Signed-off-by: Christoph Hellwig Signed-off-by: Martin K. Petersen --- include/linux/bsg-lib.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bsg-lib.h b/include/linux/bsg-lib.h index 6b211323a489..9e97ced2896d 100644 --- a/include/linux/bsg-lib.h +++ b/include/linux/bsg-lib.h @@ -10,7 +10,6 @@ #define _BLK_BSG_ #include -#include struct bsg_job; struct request; -- cgit From 728dd0ab37421396927749fc8cec9c2009c526c8 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Tue, 22 Feb 2022 08:59:33 -0500 Subject: NFS: Don't re-read the entire page cache to find the next cookie If the page cache entry that was last read gets invalidated for some reason, then make sure we can re-create it on the next call to readdir. This, combined with the cache page validation, allows us to reuse the cached value of page-index on successive calls to nfs_readdir. Credit is due to Benjamin Coddington for showing that the concept works, and that it allows for improved cache sharing between processes even in the case where pages are lost due to LRU or active invalidation. Suggested-by: Benjamin Coddington Signed-off-by: Trond Myklebust --- include/linux/nfs_fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 6e10725887d1..1c533f2c1f36 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -105,6 +105,7 @@ struct nfs_open_dir_context { __be32 verf[NFS_DIR_VERIFIER_SIZE]; __u64 dir_cookie; __u64 dup_cookie; + __u64 last_cookie; pgoff_t page_index; signed char duped; bool eof; -- cgit From 580f236737d13ee25d5b0b1d124f50014fe6833b Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 7 Feb 2022 13:37:00 -0500 Subject: NFS: Adjust the amount of readahead performed by NFS readdir The current NFS readdir code will always try to maximise the amount of readahead it performs on the assumption that we can cache anything that isn't immediately read by the process. There are several cases where this assumption breaks down, including when the 'ls -l' heuristic kicks in to try to force use of readdirplus as a batch replacement for lookup/getattr. This patch therefore tries to tone down the amount of readahead we perform, and adjust it to try to match the amount of data being requested by user space. Signed-off-by: Trond Myklebust --- include/linux/nfs_fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 1c533f2c1f36..691a27936849 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -107,6 +107,7 @@ struct nfs_open_dir_context { __u64 dup_cookie; __u64 last_cookie; pgoff_t page_index; + unsigned int dtsize; signed char duped; bool eof; }; -- cgit From 230bc98f7a2a49eb472d184bdec91fd3096384b3 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Thu, 17 Feb 2022 11:08:24 -0500 Subject: NFS: Improve heuristic for readdirplus The heuristic for readdirplus is designed to try to detect 'ls -l' and similar patterns. It does so by looking for cache hit/miss patterns in both the attribute cache and in the dcache of the files in a given directory, and then sets a flag for the readdirplus code to interpret. The problem with this approach is that a single attribute or dcache miss can cause the NFS code to force a refresh of the attributes for the entire set of files contained in the directory. To be able to make a more nuanced decision, let's sample the number of hits and misses in the set of open directory descriptors. That allows us to set thresholds at which we start preferring READDIRPLUS over regular READDIR, or at which we start to force a re-read of the remaining readdir cache using READDIRPLUS. Signed-off-by: Trond Myklebust --- include/linux/nfs_fs.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 691a27936849..20a4cf0acad2 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -101,6 +101,8 @@ struct nfs_open_context { struct nfs_open_dir_context { struct list_head list; + atomic_t cache_hits; + atomic_t cache_misses; unsigned long attr_gencount; __be32 verf[NFS_DIR_VERIFIER_SIZE]; __u64 dir_cookie; @@ -110,6 +112,7 @@ struct nfs_open_dir_context { unsigned int dtsize; signed char duped; bool eof; + struct rcu_head rcu_head; }; /* @@ -274,13 +277,11 @@ struct nfs4_copy_state { /* * Bit offsets in flags field */ -#define NFS_INO_ADVISE_RDPLUS (0) /* advise readdirplus */ #define NFS_INO_STALE (1) /* possible stale inode */ #define NFS_INO_ACL_LRU_SET (2) /* Inode is on the LRU list */ #define NFS_INO_INVALIDATING (3) /* inode is being invalidated */ #define NFS_INO_PRESERVE_UNLINKED (4) /* preserve file if removed while open */ #define NFS_INO_FSCACHE (5) /* inode can be cached by FS-Cache */ -#define NFS_INO_FORCE_READDIR (7) /* force readdirplus */ #define NFS_INO_LAYOUTCOMMIT (9) /* layoutcommit required */ #define NFS_INO_LAYOUTCOMMITTING (10) /* layoutcommit inflight */ #define NFS_INO_LAYOUTSTATS (11) /* layoutstats inflight */ -- cgit From f648022faa68ef76058aa121d1aa3a967d59cae8 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Wed, 23 Feb 2022 11:31:51 -0500 Subject: NFS: Convert readdir page cache to use a cookie based index Instead of using a linear index to address the pages, use the cookie of the first entry, since that is what we use to match the page anyway. This allows us to avoid re-reading the entire cache on a seekdir() type of operation. The latter is very common when re-exporting NFS, and is a major performance drain. The change does affect our duplicate cookie detection, since we can no longer rely on the page index as a linear offset for detecting whether we looped backwards. However since we no longer do a linear search through all the pages on each call to nfs_readdir(), this is less of a concern than it was previously. The other downside is that invalidate_mapping_pages() no longer can use the page index to avoid clearing pages that have been read. A subsequent patch will restore the functionality this provides to the 'ls -l' heuristic. Signed-off-by: Trond Myklebust --- include/linux/nfs_fs.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 20a4cf0acad2..42aad886d3c0 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -106,11 +106,9 @@ struct nfs_open_dir_context { unsigned long attr_gencount; __be32 verf[NFS_DIR_VERIFIER_SIZE]; __u64 dir_cookie; - __u64 dup_cookie; __u64 last_cookie; pgoff_t page_index; unsigned int dtsize; - signed char duped; bool eof; struct rcu_head rcu_head; }; -- cgit From b0365ccb0712efacf99936e94e92eb7ae63de4d5 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Wed, 23 Feb 2022 13:29:59 -0500 Subject: NFS: Fix up forced readdirplus Avoid clearing the entire readdir page cache if we're just doing forced readdirplus for the 'ls -l' heuristic. Signed-off-by: Trond Myklebust --- include/linux/nfs_fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 42aad886d3c0..3893386ceaed 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -109,6 +109,7 @@ struct nfs_open_dir_context { __u64 last_cookie; pgoff_t page_index; unsigned int dtsize; + bool force_clear; bool eof; struct rcu_head rcu_head; }; -- cgit From 0adf85b445c7fbc5d2df1f8c1bc54d62c4340237 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Sun, 27 Feb 2022 12:46:24 -0500 Subject: NFS: Optimise away the previous cookie field Replace the 'previous cookie' field in struct nfs_entry with the array->last_cookie. Signed-off-by: Trond Myklebust --- include/linux/nfs_xdr.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index 728cb0c1f0b6..82f7c2730b9a 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -745,8 +745,7 @@ struct nfs_auth_info { */ struct nfs_entry { __u64 ino; - __u64 cookie, - prev_cookie; + __u64 cookie; const char * name; unsigned int len; int eof; -- cgit From d6097b8d5d55f26cd2244e7e7f00a5a077772a91 Mon Sep 17 00:00:00 2001 From: Nicolai Stange Date: Mon, 21 Feb 2022 13:10:58 +0100 Subject: crypto: api - allow algs only in specific constructions in FIPS mode Currently we do not distinguish between algorithms that fail on the self-test vs. those which are disabled in FIPS mode (not allowed). Both are marked as having failed the self-test. Recently the need arose to allow the usage of certain algorithms only as arguments to specific template instantiations in FIPS mode. For example, standalone "dh" must be blocked, but e.g. "ffdhe2048(dh)" is allowed. Other potential use cases include "cbcmac(aes)", which must only be used with ccm(), or "ghash", which must be used only for gcm(). This patch allows this scenario by adding a new flag FIPS_INTERNAL to indicate those algorithms that are not FIPS-allowed. They can then be used as template arguments only, i.e. when looked up via crypto_grab_spawn() to be more specific. The FIPS_INTERNAL bit gets propagated upwards recursively into the surrounding template instances, until the construction eventually matches an explicit testmgr entry with ->fips_allowed being set, if any. The behaviour to skip !->fips_allowed self-test executions in FIPS mode will be retained. Note that this effectively means that FIPS_INTERNAL algorithms are handled very similarly to the INTERNAL ones in this regard. It is expected that the FIPS_INTERNAL algorithms will receive sufficient testing when the larger constructions they're a part of, if any, get exercised by testmgr. Note that as a side-effect of this patch algorithms which are not FIPS-allowed will now return ENOENT instead of ELIBBAD. Hopefully this is not an issue as some people were relying on this already. Link: https://lore.kernel.org/r/YeEVSaMEVJb3cQkq@gondor.apana.org.au Originally-by: Herbert Xu Signed-off-by: Nicolai Stange Reviewed-by: Hannes Reinecke Signed-off-by: Herbert Xu --- include/linux/crypto.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/crypto.h b/include/linux/crypto.h index 855869e1fd32..2324ab6f1846 100644 --- a/include/linux/crypto.h +++ b/include/linux/crypto.h @@ -132,6 +132,15 @@ */ #define CRYPTO_ALG_ALLOCATES_MEMORY 0x00010000 +/* + * Mark an algorithm as a service implementation only usable by a + * template and never by a normal user of the kernel crypto API. + * This is intended to be used by algorithms that are themselves + * not FIPS-approved but may instead be used to implement parts of + * a FIPS-approved algorithm (e.g., dh vs. ffdhe2048(dh)). + */ +#define CRYPTO_ALG_FIPS_INTERNAL 0x00020000 + /* * Transform masks and values (for crt_flags). */ -- cgit From 80f940ef527efdd8e36c69f7b5ee8e07ac8891d9 Mon Sep 17 00:00:00 2001 From: Harsha Date: Wed, 23 Feb 2022 16:05:02 +0530 Subject: firmware: xilinx: Add ZynqMP SHA API for SHA3 functionality This patch adds zynqmp_pm_sha_hash API in the ZynqMP firmware to compute SHA3 hash of given data. Signed-off-by: Harsha Signed-off-by: Kalyani Akula Acked-by: Michal Simek Signed-off-by: Herbert Xu --- include/linux/firmware/xlnx-zynqmp.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firmware/xlnx-zynqmp.h b/include/linux/firmware/xlnx-zynqmp.h index 907cb01890cf..f6783f58c64a 100644 --- a/include/linux/firmware/xlnx-zynqmp.h +++ b/include/linux/firmware/xlnx-zynqmp.h @@ -93,6 +93,7 @@ enum pm_api_id { PM_FPGA_LOAD = 22, PM_FPGA_GET_STATUS = 23, PM_GET_CHIPID = 24, + PM_SECURE_SHA = 26, PM_PINCTRL_REQUEST = 28, PM_PINCTRL_RELEASE = 29, PM_PINCTRL_GET_FUNCTION = 30, @@ -427,6 +428,7 @@ int zynqmp_pm_set_requirement(const u32 node, const u32 capabilities, const u32 qos, const enum zynqmp_pm_request_ack ack); int zynqmp_pm_aes_engine(const u64 address, u32 *out); +int zynqmp_pm_sha_hash(const u64 address, const u32 size, const u32 flags); int zynqmp_pm_fpga_load(const u64 address, const u32 size, const u32 flags); int zynqmp_pm_fpga_get_status(u32 *value); int zynqmp_pm_write_ggs(u32 index, u32 value); @@ -601,6 +603,12 @@ static inline int zynqmp_pm_aes_engine(const u64 address, u32 *out) return -ENODEV; } +static inline int zynqmp_pm_sha_hash(const u64 address, const u32 size, + const u32 flags) +{ + return -ENODEV; +} + static inline int zynqmp_pm_fpga_load(const u64 address, const u32 size, const u32 flags) { -- cgit From 4f9a7a1dc2a294c5c5c4b0246e2281e6ec88fb91 Mon Sep 17 00:00:00 2001 From: Lukasz Luba Date: Wed, 2 Mar 2022 11:29:14 +0000 Subject: OPP: Add "opp-microwatt" supporting code Add new property to the OPP: power value. The OPP entry in the DT can have "opp-microwatt". Add the needed code to handle this new property in the existing infrastructure. Signed-off-by: Lukasz Luba Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 879c138c7b8e..0d85a63a1f78 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -32,14 +32,17 @@ enum dev_pm_opp_event { * @u_volt_min: Minimum voltage in microvolts corresponding to this OPP * @u_volt_max: Maximum voltage in microvolts corresponding to this OPP * @u_amp: Maximum current drawn by the device in microamperes + * @u_watt: Power used by the device in microwatts * - * This structure stores the voltage/current values for a single power supply. + * This structure stores the voltage/current/power values for a single power + * supply. */ struct dev_pm_opp_supply { unsigned long u_volt; unsigned long u_volt_min; unsigned long u_volt_max; unsigned long u_amp; + unsigned long u_watt; }; /** @@ -94,6 +97,8 @@ void dev_pm_opp_put_opp_table(struct opp_table *opp_table); unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp); +unsigned long dev_pm_opp_get_power(struct dev_pm_opp *opp); + unsigned long dev_pm_opp_get_freq(struct dev_pm_opp *opp); unsigned int dev_pm_opp_get_level(struct dev_pm_opp *opp); @@ -186,6 +191,11 @@ static inline unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp) return 0; } +static inline unsigned long dev_pm_opp_get_power(struct dev_pm_opp *opp) +{ + return 0; +} + static inline unsigned long dev_pm_opp_get_freq(struct dev_pm_opp *opp) { return 0; -- cgit From caeea9e6671984c3865918459d756b961a24bb49 Mon Sep 17 00:00:00 2001 From: Lukasz Luba Date: Wed, 2 Mar 2022 11:29:15 +0000 Subject: PM: EM: add macro to set .active_power() callback conditionally The Energy Model is able to use new power values coming from DT. Add a new macro which is helpful in setting the .active_power() callback conditionally in setup time. The dual-macro implementation handles both kernel configurations: w/ EM and w/o EM built-in. Reported-by: kernel test robot Signed-off-by: Lukasz Luba Signed-off-by: Viresh Kumar --- include/linux/energy_model.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 6377adc3b78d..9f3c400bc52d 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -116,6 +116,7 @@ struct em_data_callback { struct device *dev); }; #define EM_DATA_CB(_active_power_cb) { .active_power = &_active_power_cb } +#define EM_SET_ACTIVE_POWER_CB(em_cb, cb) ((em_cb).active_power = cb) struct em_perf_domain *em_cpu_get(int cpu); struct em_perf_domain *em_pd_get(struct device *dev); @@ -264,6 +265,7 @@ static inline int em_pd_nr_perf_states(struct em_perf_domain *pd) #else struct em_data_callback {}; #define EM_DATA_CB(_active_power_cb) { } +#define EM_SET_ACTIVE_POWER_CB(em_cb, cb) do { } while (0) static inline int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, -- cgit From bf08824a0f4776fc0626b82b6924fa1a5643eacb Mon Sep 17 00:00:00 2001 From: Kurt Kanzenbach Date: Mon, 28 Feb 2022 20:58:56 +0100 Subject: flow_dissector: Add support for HSR Network drivers such as igb or igc call eth_get_headlen() to determine the header length for their to be constructed skbs in receive path. When running HSR on top of these drivers, it results in triggering BUG_ON() in skb_pull(). The reason is the skb headlen is not sufficient for HSR to work correctly. skb_pull() notices that. For instance, eth_get_headlen() returns 14 bytes for TCP traffic over HSR which is not correct. The problem is, the flow dissection code does not take HSR into account. Therefore, add support for it. Reported-by: Anthony Harivel Signed-off-by: Kurt Kanzenbach Link: https://lore.kernel.org/r/20220228195856.88187-1-kurt@linutronix.de Signed-off-by: Jakub Kicinski --- include/linux/if_hsr.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/if_hsr.h b/include/linux/if_hsr.h index 38bbc537d4e4..408539d5ea5f 100644 --- a/include/linux/if_hsr.h +++ b/include/linux/if_hsr.h @@ -9,6 +9,22 @@ enum hsr_version { PRP_V1, }; +/* HSR Tag. + * As defined in IEC-62439-3:2010, the HSR tag is really { ethertype = 0x88FB, + * path, LSDU_size, sequence Nr }. But we let eth_header() create { h_dest, + * h_source, h_proto = 0x88FB }, and add { path, LSDU_size, sequence Nr, + * encapsulated protocol } instead. + * + * Field names as defined in the IEC:2010 standard for HSR. + */ +struct hsr_tag { + __be16 path_and_LSDU_size; + __be16 sequence_nr; + __be16 encap_proto; +} __packed; + +#define HSR_HLEN 6 + #if IS_ENABLED(CONFIG_HSR) extern bool is_hsr_master(struct net_device *dev); extern int hsr_get_version(struct net_device *dev, enum hsr_version *ver); -- cgit From 9309f97aef6d8250bb484dabeac925c3a7c57716 Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Wed, 2 Mar 2022 18:31:20 +0200 Subject: net: dev: Add hardware stats support Offloading switch device drivers may be able to collect statistics of the traffic taking place in the HW datapath that pertains to a certain soft netdevice, such as VLAN. Add the necessary infrastructure to allow exposing these statistics to the offloaded netdevice in question. The API was shaped by the following considerations: - Collection of HW statistics is not free: there may be a finite number of counters, and the act of counting may have a performance impact. It is therefore necessary to allow toggling whether HW counting should be done for any particular SW netdevice. - As the drivers are loaded and removed, a particular device may get offloaded and unoffloaded again. At the same time, the statistics values need to stay monotonic (modulo the eventual 64-bit wraparound), increasing only to reflect traffic measured in the device. To that end, the netdevice keeps around a lazily-allocated copy of struct rtnl_link_stats64. Device drivers then contribute to the values kept therein at various points. Even as the driver goes away, the struct stays around to maintain the statistics values. - Different HW devices may be able to count different things. The motivation behind this patch in particular is exposure of HW counters on Nvidia Spectrum switches, where the only practical approach to counting traffic on offloaded soft netdevices currently is to use router interface counters, and count L3 traffic. Correspondingly that is the statistics suite added in this patch. Other devices may be able to measure different kinds of traffic, and for that reason, the APIs are built to allow uniform access to different statistics suites. - Because soft netdevices and offloading drivers are only loosely bound, a netdevice uses a notifier chain to communicate with the drivers. Several new notifiers, NETDEV_OFFLOAD_XSTATS_*, have been added to carry messages to the offloading drivers. - Devices can have various conditions for when a particular counter is available. As the device is configured and reconfigured, the device offload may become or cease being suitable for counter binding. A netdevice can use a notifier type NETDEV_OFFLOAD_XSTATS_REPORT_USED to ping offloading drivers and determine whether anyone currently implements a given statistics suite. This information can then be propagated to user space. When the driver decides to unoffload a netdevice, it can use a newly-added function, netdev_offload_xstats_report_delta(), to record outstanding collected statistics, before destroying the HW counter. This patch adds a helper, call_netdevice_notifiers_info_robust(), for dispatching a notifier with the possibility of unwind when one of the consumers bails. Given the wish to eventually get rid of the global notifier block altogether, this helper only invokes the per-netns notifier block. Signed-off-by: Petr Machata Signed-off-by: Ido Schimmel Signed-off-by: David S. Miller --- include/linux/netdevice.h | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index c79ee2296296..19a27ac361ef 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1950,6 +1950,7 @@ enum netdev_ml_priv_type { * @watchdog_dev_tracker: refcount tracker used by watchdog. * @dev_registered_tracker: tracker for reference held while * registered + * @offload_xstats_l3: L3 HW stats for this netdevice. * * FIXME: cleanup struct net_device such that network protocol info * moves out. @@ -2287,6 +2288,7 @@ struct net_device { netdevice_tracker linkwatch_dev_tracker; netdevice_tracker watchdog_dev_tracker; netdevice_tracker dev_registered_tracker; + struct rtnl_hw_stats64 *offload_xstats_l3; }; #define to_net_dev(d) container_of(d, struct net_device, dev) @@ -2727,6 +2729,10 @@ enum netdev_cmd { NETDEV_CVLAN_FILTER_DROP_INFO, NETDEV_SVLAN_FILTER_PUSH_INFO, NETDEV_SVLAN_FILTER_DROP_INFO, + NETDEV_OFFLOAD_XSTATS_ENABLE, + NETDEV_OFFLOAD_XSTATS_DISABLE, + NETDEV_OFFLOAD_XSTATS_REPORT_USED, + NETDEV_OFFLOAD_XSTATS_REPORT_DELTA, }; const char *netdev_cmd_to_name(enum netdev_cmd cmd); @@ -2777,6 +2783,42 @@ struct netdev_notifier_pre_changeaddr_info { const unsigned char *dev_addr; }; +enum netdev_offload_xstats_type { + NETDEV_OFFLOAD_XSTATS_TYPE_L3 = 1, +}; + +struct netdev_notifier_offload_xstats_info { + struct netdev_notifier_info info; /* must be first */ + enum netdev_offload_xstats_type type; + + union { + /* NETDEV_OFFLOAD_XSTATS_REPORT_DELTA */ + struct netdev_notifier_offload_xstats_rd *report_delta; + /* NETDEV_OFFLOAD_XSTATS_REPORT_USED */ + struct netdev_notifier_offload_xstats_ru *report_used; + }; +}; + +int netdev_offload_xstats_enable(struct net_device *dev, + enum netdev_offload_xstats_type type, + struct netlink_ext_ack *extack); +int netdev_offload_xstats_disable(struct net_device *dev, + enum netdev_offload_xstats_type type); +bool netdev_offload_xstats_enabled(const struct net_device *dev, + enum netdev_offload_xstats_type type); +int netdev_offload_xstats_get(struct net_device *dev, + enum netdev_offload_xstats_type type, + struct rtnl_hw_stats64 *stats, bool *used, + struct netlink_ext_ack *extack); +void +netdev_offload_xstats_report_delta(struct netdev_notifier_offload_xstats_rd *rd, + const struct rtnl_hw_stats64 *stats); +void +netdev_offload_xstats_report_used(struct netdev_notifier_offload_xstats_ru *ru); +void netdev_offload_xstats_push_delta(struct net_device *dev, + enum netdev_offload_xstats_type type, + const struct rtnl_hw_stats64 *stats); + static inline void netdev_notifier_info_init(struct netdev_notifier_info *info, struct net_device *dev) { -- cgit From 5fd0b838efac16046509f7fb100455d0463b9687 Mon Sep 17 00:00:00 2001 From: Petr Machata Date: Wed, 2 Mar 2022 18:31:23 +0200 Subject: net: rtnetlink: Add UAPI toggle for IFLA_OFFLOAD_XSTATS_L3_STATS The offloaded HW stats are designed to allow per-netdevice enablement and disablement. Add an attribute, IFLA_STATS_SET_OFFLOAD_XSTATS_L3_STATS, which should be carried by the RTM_SETSTATS message, and expresses a desire to toggle L3 offload xstats on or off. As part of the above, add an exported function rtnl_offload_xstats_notify() that drivers can use when they have installed or deinstalled the counters backing the HW stats. At this point, it is possible to enable, disable and query L3 offload xstats on netdevices. (However there is no driver actually implementing these.) Signed-off-by: Petr Machata Signed-off-by: Ido Schimmel Signed-off-by: David S. Miller --- include/linux/rtnetlink.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index bb9cb84114c1..7f970b16da3a 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -134,4 +134,7 @@ extern int ndo_dflt_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq, int (*vlan_fill)(struct sk_buff *skb, struct net_device *dev, u32 filter_mask)); + +extern void rtnl_offload_xstats_notify(struct net_device *dev); + #endif /* __LINUX_RTNETLINK_H */ -- cgit From 445ad495f0ff71553693e6a2a9d7a3bc6917ca36 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Thu, 24 Feb 2022 16:20:17 +0200 Subject: vfio: Have the core code decode the VFIO_DEVICE_FEATURE ioctl Invoke a new device op 'device_feature' to handle just the data array portion of the command. This lifts the ioctl validation to the core code and makes it simpler for either the core code, or layered drivers, to implement their own feature values. Provide vfio_check_feature() to consolidate checking the flags/etc against what the driver supports. Link: https://lore.kernel.org/all/20220224142024.147653-9-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe Tested-by: Shameer Kolothum Reviewed-by: Alex Williamson Reviewed-by: Cornelia Huck Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- include/linux/vfio.h | 32 ++++++++++++++++++++++++++++++++ include/linux/vfio_pci_core.h | 2 ++ 2 files changed, 34 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 76191d7abed1..550c28f2ef60 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -55,6 +55,7 @@ struct vfio_device { * @match: Optional device name match callback (return: 0 for no-match, >0 for * match, -errno for abort (ex. match with insufficient or incorrect * additional args) + * @device_feature: Optional, fill in the VFIO_DEVICE_FEATURE ioctl */ struct vfio_device_ops { char *name; @@ -69,8 +70,39 @@ struct vfio_device_ops { int (*mmap)(struct vfio_device *vdev, struct vm_area_struct *vma); void (*request)(struct vfio_device *vdev, unsigned int count); int (*match)(struct vfio_device *vdev, char *buf); + int (*device_feature)(struct vfio_device *device, u32 flags, + void __user *arg, size_t argsz); }; +/** + * vfio_check_feature - Validate user input for the VFIO_DEVICE_FEATURE ioctl + * @flags: Arg from the device_feature op + * @argsz: Arg from the device_feature op + * @supported_ops: Combination of VFIO_DEVICE_FEATURE_GET and SET the driver + * supports + * @minsz: Minimum data size the driver accepts + * + * For use in a driver's device_feature op. Checks that the inputs to the + * VFIO_DEVICE_FEATURE ioctl are correct for the driver's feature. Returns 1 if + * the driver should execute the get or set, otherwise the relevant + * value should be returned. + */ +static inline int vfio_check_feature(u32 flags, size_t argsz, u32 supported_ops, + size_t minsz) +{ + if ((flags & (VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_SET)) & + ~supported_ops) + return -EINVAL; + if (flags & VFIO_DEVICE_FEATURE_PROBE) + return 0; + /* Without PROBE one of GET or SET must be requested */ + if (!(flags & (VFIO_DEVICE_FEATURE_GET | VFIO_DEVICE_FEATURE_SET))) + return -EINVAL; + if (argsz < minsz) + return -EINVAL; + return 1; +} + void vfio_init_group_dev(struct vfio_device *device, struct device *dev, const struct vfio_device_ops *ops); void vfio_uninit_group_dev(struct vfio_device *device); diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index ef9a44b6cf5d..beba0b2ed87d 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -220,6 +220,8 @@ int vfio_pci_core_sriov_configure(struct pci_dev *pdev, int nr_virtfn); extern const struct pci_error_handlers vfio_pci_core_err_handlers; long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd, unsigned long arg); +int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags, + void __user *arg, size_t argsz); ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf, size_t count, loff_t *ppos); ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *buf, -- cgit From 115dcec65f61d53e25e1bed5e380468b30f98b14 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Thu, 24 Feb 2022 16:20:18 +0200 Subject: vfio: Define device migration protocol v2 Replace the existing region based migration protocol with an ioctl based protocol. The two protocols have the same general semantic behaviors, but the way the data is transported is changed. This is the STOP_COPY portion of the new protocol, it defines the 5 states for basic stop and copy migration and the protocol to move the migration data in/out of the kernel. Compared to the clarification of the v1 protocol Alex proposed: https://lore.kernel.org/r/163909282574.728533.7460416142511440919.stgit@omen This has a few deliberate functional differences: - ERROR arcs allow the device function to remain unchanged. - The protocol is not required to return to the original state on transition failure. Instead userspace can execute an unwind back to the original state, reset, or do something else without needing kernel support. This simplifies the kernel design and should userspace choose a policy like always reset, avoids doing useless work in the kernel on error handling paths. - PRE_COPY is made optional, userspace must discover it before using it. This reflects the fact that the majority of drivers we are aware of right now will not implement PRE_COPY. - segmentation is not part of the data stream protocol, the receiver does not have to reproduce the framing boundaries. The hybrid FSM for the device_state is described as a Mealy machine by documenting each of the arcs the driver is required to implement. Defining the remaining set of old/new device_state transitions as 'combination transitions' which are naturally defined as taking multiple FSM arcs along the shortest path within the FSM's digraph allows a complete matrix of transitions. A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE is defined to replace writing to the device_state field in the region. This allows returning a brand new FD whenever the requested transition opens a data transfer session. The VFIO core code implements the new feature and provides a helper function to the driver. Using the helper the driver only has to implement 6 of the FSM arcs and the other combination transitions are elaborated consistently from those arcs. A new VFIO_DEVICE_FEATURE of VFIO_DEVICE_FEATURE_MIGRATION is defined to report the capability for migration and indicate which set of states and arcs are supported by the device. The FSM provides a lot of flexibility to make backwards compatible extensions but the VFIO_DEVICE_FEATURE also allows for future breaking extensions for scenarios that cannot support even the basic STOP_COPY requirements. The VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE with the GET option (i.e. VFIO_DEVICE_FEATURE_GET) can be used to read the current migration state of the VFIO device. Data transfer sessions are now carried over a file descriptor, instead of the region. The FD functions for the lifetime of the data transfer session. read() and write() transfer the data with normal Linux stream FD semantics. This design allows future expansion to support poll(), io_uring, and other performance optimizations. The complicated mmap mode for data transfer is discarded as current qemu doesn't take meaningful advantage of it, and the new qemu implementation avoids substantially all the performance penalty of using a read() on the region. Link: https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe Tested-by: Shameer Kolothum Reviewed-by: Kevin Tian Reviewed-by: Alex Williamson Reviewed-by: Cornelia Huck Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- include/linux/vfio.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 550c28f2ef60..c44e80bbbd3b 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -56,6 +56,16 @@ struct vfio_device { * match, -errno for abort (ex. match with insufficient or incorrect * additional args) * @device_feature: Optional, fill in the VFIO_DEVICE_FEATURE ioctl + * @migration_set_state: Optional callback to change the migration state for + * devices that support migration. It's mandatory for + * VFIO_DEVICE_FEATURE_MIGRATION migration support. + * The returned FD is used for data transfer according to the FSM + * definition. The driver is responsible to ensure that FD reaches end + * of stream or error whenever the migration FSM leaves a data transfer + * state or before close_device() returns. + * @migration_get_state: Optional callback to get the migration state for + * devices that support migration. It's mandatory for + * VFIO_DEVICE_FEATURE_MIGRATION migration support. */ struct vfio_device_ops { char *name; @@ -72,6 +82,11 @@ struct vfio_device_ops { int (*match)(struct vfio_device *vdev, char *buf); int (*device_feature)(struct vfio_device *device, u32 flags, void __user *arg, size_t argsz); + struct file *(*migration_set_state)( + struct vfio_device *device, + enum vfio_device_mig_state new_state); + int (*migration_get_state)(struct vfio_device *device, + enum vfio_device_mig_state *curr_state); }; /** @@ -114,6 +129,11 @@ extern void vfio_device_put(struct vfio_device *device); int vfio_assign_device_set(struct vfio_device *device, void *set_id); +int vfio_mig_get_next_state(struct vfio_device *device, + enum vfio_device_mig_state cur_fsm, + enum vfio_device_mig_state new_fsm, + enum vfio_device_mig_state *next_fsm); + /* * External user API */ -- cgit From 8cb3d83b959be0631cd719b995c40c3cda21cd47 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Thu, 24 Feb 2022 16:20:19 +0200 Subject: vfio: Extend the device migration protocol with RUNNING_P2P The RUNNING_P2P state is designed to support multiple devices in the same VM that are doing P2P transactions between themselves. When in RUNNING_P2P the device must be able to accept incoming P2P transactions but should not generate outgoing P2P transactions. As an optional extension to the mandatory states it is defined as in between STOP and RUNNING: STOP -> RUNNING_P2P -> RUNNING -> RUNNING_P2P -> STOP For drivers that are unable to support RUNNING_P2P the core code silently merges RUNNING_P2P and RUNNING together. Unless driver support is present, the new state cannot be used in SET_STATE. Drivers that support this will be required to implement 4 FSM arcs beyond the basic FSM. 2 of the basic FSM arcs become combination transitions. Compared to the v1 clarification, NDMA is redefined into FSM states and is described in terms of the desired P2P quiescent behavior, noting that halting all DMA is an acceptable implementation. Link: https://lore.kernel.org/all/20220224142024.147653-11-yishaih@nvidia.com Signed-off-by: Jason Gunthorpe Tested-by: Shameer Kolothum Reviewed-by: Kevin Tian Reviewed-by: Alex Williamson Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- include/linux/vfio.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index c44e80bbbd3b..66dda06ec42d 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -33,6 +33,7 @@ struct vfio_device { struct vfio_group *group; struct vfio_device_set *dev_set; struct list_head dev_set_list; + unsigned int migration_flags; /* Members below here are private, not for driver use */ refcount_t refcount; -- cgit From 915076f70efad04bf6fc843970a3c7763487011a Mon Sep 17 00:00:00 2001 From: Yishai Hadas Date: Thu, 24 Feb 2022 16:20:23 +0200 Subject: vfio/pci: Expose vfio_pci_core_aer_err_detected() Expose vfio_pci_core_aer_err_detected() to be used by drivers as part of their pci_error_handlers structure. Next patch for mlx5 driver will use it. Link: https://lore.kernel.org/all/20220224142024.147653-15-yishaih@nvidia.com Reviewed-by: Alex Williamson Signed-off-by: Yishai Hadas Signed-off-by: Leon Romanovsky --- include/linux/vfio_pci_core.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index beba0b2ed87d..9f1bf8e49d43 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -232,6 +232,8 @@ int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf); int vfio_pci_core_enable(struct vfio_pci_core_device *vdev); void vfio_pci_core_disable(struct vfio_pci_core_device *vdev); void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev); +pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev, + pci_channel_state_t state); static inline bool vfio_pci_is_vga(struct pci_dev *pdev) { -- cgit From 3f8bab174cb26aa5a8053c4457cc733881e3ad88 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Thu, 3 Mar 2022 09:08:31 +0100 Subject: serial: make uart_console_write->putchar()'s character an unsigned char MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Currently, uart_console_write->putchar's second parameter (the character) is of type int. It makes little sense, provided uart_console_write() accepts the input string as "const char *s" and passes its content -- the characters -- to putchar(). So switch the character's type to unsigned char. We don't use char as that is signed on some platforms. That would cause troubles for drivers which (implicitly) cast the char to u16 when writing to the device. Sign extension would happen in that case and the value written would be completely different to the provided char. DZ is an example of such a driver -- on MIPS, it uses u16 for dz_out in dz_console_putchar(). Note we do the char -> uchar conversion implicitly in uart_console_write(). Provided we do not change size of the data type, sign extension does not happen there, so the problem is void. This makes the types consistent and unified with the rest of the uart layer, which uses unsigned char in most places already. One exception is xmit_buf, but that is going to be converted later. Cc: Paul Cercueil Cc: Tobias Klauser Cc: Russell King Cc: Vineet Gupta Cc: Nicolas Ferre Cc: Alexandre Belloni Cc: Ludovic Desroches Cc: Florian Fainelli Cc: bcm-kernel-feedback-list@broadcom.com Cc: Alexander Shiyan Cc: Baruch Siach Cc: "Maciej W. Rozycki" Cc: Paul Walmsley Cc: Palmer Dabbelt Cc: Albert Ou Cc: Shawn Guo Cc: Sascha Hauer Cc: Pengutronix Kernel Team Cc: Fabio Estevam Cc: NXP Linux Team Cc: Karol Gugala Cc: Mateusz Holenko Cc: Vladimir Zapolskiy Cc: Neil Armstrong Cc: Kevin Hilman Cc: Jerome Brunet Cc: Martin Blumenstingl Cc: Taichi Sugaya Cc: Takao Orito Cc: Liviu Dudau Cc: Sudeep Holla Cc: Lorenzo Pieralisi Cc: "Andreas Färber" Cc: Manivannan Sadhasivam Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Andy Gross Cc: Bjorn Andersson Cc: Krzysztof Kozlowski Cc: Orson Zhai Cc: Baolin Wang Cc: Chunyan Zhang Cc: Patrice Chotard Cc: Maxime Coquelin Cc: Alexandre Torgue Cc: "David S. Miller" Cc: Peter Korsgaard Cc: Michal Simek Acked-by: Richard Genoud [atmel_serial] Acked-by: Uwe Kleine-König Acked-by: Paul Cercueil Acked-by: Neil Armstrong # meson_serial Signed-off-by: Jiri Slaby Link: https://lore.kernel.org/r/20220303080831.21783-1-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index 31f7fe527395..14ae35f68abb 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -399,7 +399,7 @@ int uart_set_options(struct uart_port *port, struct console *co, int baud, struct tty_driver *uart_console_device(struct console *co, int *index); void uart_console_write(struct uart_port *port, const char *s, unsigned int count, - void (*putchar)(struct uart_port *, int)); + void (*putchar)(struct uart_port *, unsigned char)); /* * Port/driver registration/removal -- cgit From a1ac9c8acec1605c6b43af418f79facafdced680 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Wed, 2 Mar 2022 11:55:25 -0800 Subject: net: Add skb->mono_delivery_time to distinguish mono delivery_time from (rcv) timestamp skb->tstamp was first used as the (rcv) timestamp. The major usage is to report it to the user (e.g. SO_TIMESTAMP). Later, skb->tstamp is also set as the (future) delivery_time (e.g. EDT in TCP) during egress and used by the qdisc (e.g. sch_fq) to make decision on when the skb can be passed to the dev. Currently, there is no way to tell skb->tstamp having the (rcv) timestamp or the delivery_time, so it is always reset to 0 whenever forwarded between egress and ingress. While it makes sense to always clear the (rcv) timestamp in skb->tstamp to avoid confusing sch_fq that expects the delivery_time, it is a performance issue [0] to clear the delivery_time if the skb finally egress to a fq@phy-dev. For example, when forwarding from egress to ingress and then finally back to egress: tcp-sender => veth@netns => veth@hostns => fq@eth0@hostns ^ ^ reset rest This patch adds one bit skb->mono_delivery_time to flag the skb->tstamp is storing the mono delivery_time (EDT) instead of the (rcv) timestamp. The current use case is to keep the TCP mono delivery_time (EDT) and to be used with sch_fq. A latter patch will also allow tc-bpf@ingress to read and change the mono delivery_time. In the future, another bit (e.g. skb->user_delivery_time) can be added for the SCM_TXTIME where the clock base is tracked by sk->sk_clockid. [ This patch is a prep work. The following patches will get the other parts of the stack ready first. Then another patch after that will finally set the skb->mono_delivery_time. ] skb_set_delivery_time() function is added. It is used by the tcp_output.c and during ip[6] fragmentation to assign the delivery_time to the skb->tstamp and also set the skb->mono_delivery_time. A note on the change in ip_send_unicast_reply() in ip_output.c. It is only used by TCP to send reset/ack out of a ctl_sk. Like the new skb_set_delivery_time(), this patch sets the skb->mono_delivery_time to 0 for now as a place holder. It will be enabled in a latter patch. A similar case in tcp_ipv6 can be done with skb_set_delivery_time() in tcp_v6_send_response(). [0] (slide 22): https://linuxplumbersconf.org/event/11/contributions/953/attachments/867/1658/LPC_2021_BPF_Datapath_Extensions.pdf Signed-off-by: Martin KaFai Lau Signed-off-by: David S. Miller --- include/linux/skbuff.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index d67941f78b92..803ffa63dea6 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -795,6 +795,10 @@ typedef unsigned char *sk_buff_data_t; * @dst_pending_confirm: need to confirm neighbour * @decrypted: Decrypted SKB * @slow_gro: state present at GRO time, slower prepare step required + * @mono_delivery_time: When set, skb->tstamp has the + * delivery_time in mono clock base (i.e. EDT). Otherwise, the + * skb->tstamp has the (rcv) timestamp at ingress and + * delivery_time at egress. * @napi_id: id of the NAPI struct this skb came from * @sender_cpu: (aka @napi_id) source CPU in XPS * @secmark: security marking @@ -965,6 +969,7 @@ struct sk_buff { __u8 decrypted:1; #endif __u8 slow_gro:1; + __u8 mono_delivery_time:1; #ifdef CONFIG_NET_SCHED __u16 tc_index; /* traffic control index */ @@ -3983,6 +3988,14 @@ static inline ktime_t net_timedelta(ktime_t t) return ktime_sub(ktime_get_real(), t); } +static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, + bool mono) +{ + skb->tstamp = kt; + /* Setting mono_delivery_time will be enabled later */ + skb->mono_delivery_time = 0; +} + static inline u8 skb_metadata_len(const struct sk_buff *skb) { return skb_shinfo(skb)->meta_len; -- cgit From de799101519aad23c6096041ba2744d7b5517e6a Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Wed, 2 Mar 2022 11:55:31 -0800 Subject: net: Add skb_clear_tstamp() to keep the mono delivery_time Right now, skb->tstamp is reset to 0 whenever the skb is forwarded. If skb->tstamp has the mono delivery_time, clearing it can hurt the performance when it finally transmits out to fq@phy-dev. The earlier patch added a skb->mono_delivery_time bit to flag the skb->tstamp carrying the mono delivery_time. This patch adds skb_clear_tstamp() helper which keeps the mono delivery_time and clears everything else. The delivery_time clearing will be postponed until the stack knows the skb will be delivered locally. It will be done in a latter patch. Signed-off-by: Martin KaFai Lau Signed-off-by: David S. Miller --- include/linux/skbuff.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 803ffa63dea6..27a28920e7b3 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3996,6 +3996,14 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, skb->mono_delivery_time = 0; } +static inline void skb_clear_tstamp(struct sk_buff *skb) +{ + if (skb->mono_delivery_time) + return; + + skb->tstamp = 0; +} + static inline u8 skb_metadata_len(const struct sk_buff *skb) { return skb_shinfo(skb)->meta_len; @@ -4852,7 +4860,7 @@ static inline void skb_set_redirected(struct sk_buff *skb, bool from_ingress) #ifdef CONFIG_NET_REDIRECT skb->from_ingress = from_ingress; if (skb->from_ingress) - skb->tstamp = 0; + skb_clear_tstamp(skb); #endif } -- cgit From 27942a15209f564ed8ee2a9e126cb7b105181355 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Wed, 2 Mar 2022 11:55:38 -0800 Subject: net: Handle delivery_time in skb->tstamp during network tapping with af_packet A latter patch will set the skb->mono_delivery_time to flag the skb->tstamp is used as the mono delivery_time (EDT) instead of the (rcv) timestamp. skb_clear_tstamp() will then keep this delivery_time during forwarding. This patch is to make the network tapping (with af_packet) to handle the delivery_time stored in skb->tstamp. Regardless of tapping at the ingress or egress, the tapped skb is received by the af_packet socket, so it is ingress to the af_packet socket and it expects the (rcv) timestamp. When tapping at egress, dev_queue_xmit_nit() is used. It has already expected skb->tstamp may have delivery_time, so it does skb_clone()+net_timestamp_set() to ensure the cloned skb has the (rcv) timestamp before passing to the af_packet sk. This patch only adds to clear the skb->mono_delivery_time bit in net_timestamp_set(). When tapping at ingress, it currently expects the skb->tstamp is either 0 or the (rcv) timestamp. Meaning, the tapping at ingress path has already expected the skb->tstamp could be 0 and it will get the (rcv) timestamp by ktime_get_real() when needed. There are two cases for tapping at ingress: One case is af_packet queues the skb to its sk_receive_queue. The skb is either not shared or new clone created. The newly added skb_clear_delivery_time() is called to clear the delivery_time (if any) and set the (rcv) timestamp if needed before the skb is queued to the sk_receive_queue. Another case, the ingress skb is directly copied to the rx_ring and tpacket_get_timestamp() is used to get the (rcv) timestamp. The newly added skb_tstamp() is used in tpacket_get_timestamp() to check the skb->mono_delivery_time bit before returning skb->tstamp. As mentioned earlier, the tapping@ingress has already expected the skb may not have the (rcv) timestamp (because no sk has asked for it) and has handled this case by directly calling ktime_get_real(). Signed-off-by: Martin KaFai Lau Signed-off-by: David S. Miller --- include/linux/skbuff.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 27a28920e7b3..7e2d796ece80 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3996,6 +3996,22 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, skb->mono_delivery_time = 0; } +DECLARE_STATIC_KEY_FALSE(netstamp_needed_key); + +/* It is used in the ingress path to clear the delivery_time. + * If needed, set the skb->tstamp to the (rcv) timestamp. + */ +static inline void skb_clear_delivery_time(struct sk_buff *skb) +{ + if (skb->mono_delivery_time) { + skb->mono_delivery_time = 0; + if (static_branch_unlikely(&netstamp_needed_key)) + skb->tstamp = ktime_get_real(); + else + skb->tstamp = 0; + } +} + static inline void skb_clear_tstamp(struct sk_buff *skb) { if (skb->mono_delivery_time) @@ -4004,6 +4020,14 @@ static inline void skb_clear_tstamp(struct sk_buff *skb) skb->tstamp = 0; } +static inline ktime_t skb_tstamp(const struct sk_buff *skb) +{ + if (skb->mono_delivery_time) + return 0; + + return skb->tstamp; +} + static inline u8 skb_metadata_len(const struct sk_buff *skb) { return skb_shinfo(skb)->meta_len; -- cgit From d93376f503c7a586707925957592c0f16f4db0b1 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Wed, 2 Mar 2022 11:55:44 -0800 Subject: net: Clear mono_delivery_time bit in __skb_tstamp_tx() In __skb_tstamp_tx(), it may clone the egress skb and queues the clone to the sk_error_queue. The outgoing skb may have the mono delivery_time while the (rcv) timestamp is expected for the clone, so the skb->mono_delivery_time bit needs to be cleared from the clone. This patch adds the skb->mono_delivery_time clearing to the existing __net_timestamp() and use it in __skb_tstamp_tx(). The __net_timestamp() fast path usage in dev.c is changed to directly call ktime_get_real() since the mono_delivery_time bit is not set at that point. Signed-off-by: Martin KaFai Lau Signed-off-by: David S. Miller --- include/linux/skbuff.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 7e2d796ece80..8e8a4af4f9e2 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3981,6 +3981,7 @@ static inline void skb_get_new_timestampns(const struct sk_buff *skb, static inline void __net_timestamp(struct sk_buff *skb) { skb->tstamp = ktime_get_real(); + skb->mono_delivery_time = 0; } static inline ktime_t net_timedelta(ktime_t t) -- cgit From d98d58a002619b5c165f1eedcd731e2fe2c19088 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Wed, 2 Mar 2022 11:55:50 -0800 Subject: net: Set skb->mono_delivery_time and clear it after sch_handle_ingress() The previous patches handled the delivery_time before sch_handle_ingress(). This patch can now set the skb->mono_delivery_time to flag the skb->tstamp is used as the mono delivery_time (EDT) instead of the (rcv) timestamp and also clear it with skb_clear_delivery_time() after sch_handle_ingress(). This will make the bpf_redirect_*() to keep the mono delivery_time and used by a qdisc (fq) of the egress-ing interface. A latter patch will postpone the skb_clear_delivery_time() until the stack learns that the skb is being delivered locally and that will make other kernel forwarding paths (ip[6]_forward) able to keep the delivery_time also. Thus, like the previous patches on using the skb->mono_delivery_time bit, calling skb_clear_delivery_time() is not limited within the CONFIG_NET_INGRESS to avoid too many code churns among this set. Signed-off-by: Martin KaFai Lau Signed-off-by: David S. Miller --- include/linux/skbuff.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 8e8a4af4f9e2..0f5fd53059cd 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3993,8 +3993,7 @@ static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, bool mono) { skb->tstamp = kt; - /* Setting mono_delivery_time will be enabled later */ - skb->mono_delivery_time = 0; + skb->mono_delivery_time = kt && mono; } DECLARE_STATIC_KEY_FALSE(netstamp_needed_key); -- cgit From b6561f8491ca899e5a08311796085c9738d631ae Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Wed, 2 Mar 2022 11:56:09 -0800 Subject: net: ipv6: Get rcv timestamp if needed when handling hop-by-hop IOAM option IOAM is a hop-by-hop option with a temporary iana allocation (49). Since it is hop-by-hop, it is done before the input routing decision. One of the traced data field is the (rcv) timestamp. When the locally generated skb is looping from egress to ingress over a virtual interface (e.g. veth, loopback...), skb->tstamp may have the delivery time before it is known that it will be delivered locally and received by another sk. Like handling the network tapping (tcpdump) in the earlier patch, this patch gets the timestamp if needed without over-writing the delivery_time in the skb->tstamp. skb_tstamp_cond() is added to do the ktime_get_real() with an extra cond arg to check on top of the netstamp_needed_key static key. skb_tstamp_cond() will also be used in a latter patch and it needs the netstamp_needed_key check. Signed-off-by: Martin KaFai Lau Signed-off-by: David S. Miller --- include/linux/skbuff.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 0f5fd53059cd..4b5b926a81f2 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -4028,6 +4028,17 @@ static inline ktime_t skb_tstamp(const struct sk_buff *skb) return skb->tstamp; } +static inline ktime_t skb_tstamp_cond(const struct sk_buff *skb, bool cond) +{ + if (!skb->mono_delivery_time && skb->tstamp) + return skb->tstamp; + + if (static_branch_unlikely(&netstamp_needed_key) || cond) + return ktime_get_real(); + + return 0; +} + static inline u8 skb_metadata_len(const struct sk_buff *skb) { return skb_shinfo(skb)->meta_len; -- cgit From 7449197d600d30d038b1ce6285ab4747096eac7f Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Wed, 2 Mar 2022 11:56:28 -0800 Subject: bpf: Keep the (rcv) timestamp behavior for the existing tc-bpf@ingress The current tc-bpf@ingress reads and writes the __sk_buff->tstamp as a (rcv) timestamp which currently could either be 0 (not available) or ktime_get_real(). This patch is to backward compatible with the (rcv) timestamp expectation at ingress. If the skb->tstamp has the delivery_time, the bpf insn rewrite will read 0 for tc-bpf running at ingress as it is not available. When writing at ingress, it will also clear the skb->mono_delivery_time bit. /* BPF_READ: a = __sk_buff->tstamp */ if (!skb->tc_at_ingress || !skb->mono_delivery_time) a = skb->tstamp; else a = 0 /* BPF_WRITE: __sk_buff->tstamp = a */ if (skb->tc_at_ingress) skb->mono_delivery_time = 0; skb->tstamp = a; [ A note on the BPF_CGROUP_INET_INGRESS which can also access skb->tstamp. At that point, the skb is delivered locally and skb_clear_delivery_time() has already been done, so the skb->tstamp will only have the (rcv) timestamp. ] If the tc-bpf@egress writes 0 to skb->tstamp, the skb->mono_delivery_time has to be cleared also. It could be done together during convert_ctx_access(). However, the latter patch will also expose the skb->mono_delivery_time bit as __sk_buff->delivery_time_type. Changing the delivery_time_type in the background may surprise the user, e.g. the 2nd read on __sk_buff->delivery_time_type may need a READ_ONCE() to avoid compiler optimization. Thus, in expecting the needs in the latter patch, this patch does a check on !skb->tstamp after running the tc-bpf and clears the skb->mono_delivery_time bit if needed. The earlier discussion on v4 [0]. The bpf insn rewrite requires the skb's mono_delivery_time bit and tc_at_ingress bit. They are moved up in sk_buff so that bpf rewrite can be done at a fixed offset. tc_skip_classify is moved together with tc_at_ingress. To get one bit for mono_delivery_time, csum_not_inet is moved down and this bit is currently used by sctp. [0]: https://lore.kernel.org/bpf/20220217015043.khqwqklx45c4m4se@kafai-mbp.dhcp.thefacebook.com/ Signed-off-by: Martin KaFai Lau Signed-off-by: David S. Miller --- include/linux/skbuff.h | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4b5b926a81f2..5445860e1ba6 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -941,8 +941,12 @@ struct sk_buff { __u8 vlan_present:1; /* See PKT_VLAN_PRESENT_BIT */ __u8 csum_complete_sw:1; __u8 csum_level:2; - __u8 csum_not_inet:1; __u8 dst_pending_confirm:1; + __u8 mono_delivery_time:1; +#ifdef CONFIG_NET_CLS_ACT + __u8 tc_skip_classify:1; + __u8 tc_at_ingress:1; +#endif #ifdef CONFIG_IPV6_NDISC_NODETYPE __u8 ndisc_nodetype:2; #endif @@ -953,10 +957,6 @@ struct sk_buff { #ifdef CONFIG_NET_SWITCHDEV __u8 offload_fwd_mark:1; __u8 offload_l3_fwd_mark:1; -#endif -#ifdef CONFIG_NET_CLS_ACT - __u8 tc_skip_classify:1; - __u8 tc_at_ingress:1; #endif __u8 redirected:1; #ifdef CONFIG_NET_REDIRECT @@ -969,7 +969,7 @@ struct sk_buff { __u8 decrypted:1; #endif __u8 slow_gro:1; - __u8 mono_delivery_time:1; + __u8 csum_not_inet:1; #ifdef CONFIG_NET_SCHED __u16 tc_index; /* traffic control index */ @@ -1047,10 +1047,16 @@ struct sk_buff { /* if you move pkt_vlan_present around you also must adapt these constants */ #ifdef __BIG_ENDIAN_BITFIELD #define PKT_VLAN_PRESENT_BIT 7 +#define TC_AT_INGRESS_MASK (1 << 0) +#define SKB_MONO_DELIVERY_TIME_MASK (1 << 2) #else #define PKT_VLAN_PRESENT_BIT 0 +#define TC_AT_INGRESS_MASK (1 << 7) +#define SKB_MONO_DELIVERY_TIME_MASK (1 << 5) #endif #define PKT_VLAN_PRESENT_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset) +#define TC_AT_INGRESS_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset) +#define SKB_MONO_DELIVERY_TIME_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset) #ifdef __KERNEL__ /* -- cgit From 8d21ec0e46ed6e39994accff8eb4f2be3d2e76b5 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Wed, 2 Mar 2022 11:56:34 -0800 Subject: bpf: Add __sk_buff->delivery_time_type and bpf_skb_set_skb_delivery_time() * __sk_buff->delivery_time_type: This patch adds __sk_buff->delivery_time_type. It tells if the delivery_time is stored in __sk_buff->tstamp or not. It will be most useful for ingress to tell if the __sk_buff->tstamp has the (rcv) timestamp or delivery_time. If delivery_time_type is 0 (BPF_SKB_DELIVERY_TIME_NONE), it has the (rcv) timestamp. Two non-zero types are defined for the delivery_time_type, BPF_SKB_DELIVERY_TIME_MONO and BPF_SKB_DELIVERY_TIME_UNSPEC. For UNSPEC, it can only happen in egress because only mono delivery_time can be forwarded to ingress now. The clock of UNSPEC delivery_time can be deduced from the skb->sk->sk_clockid which is how the sch_etf doing it also. * Provide forwarded delivery_time to tc-bpf@ingress: With the help of the new delivery_time_type, the tc-bpf has a way to tell if the __sk_buff->tstamp has the (rcv) timestamp or the delivery_time. During bpf load time, the verifier will learn if the bpf prog has accessed the new __sk_buff->delivery_time_type. If it does, it means the tc-bpf@ingress is expecting the skb->tstamp could have the delivery_time. The kernel will then read the skb->tstamp as-is during bpf insn rewrite without checking the skb->mono_delivery_time. This is done by adding a new prog->delivery_time_access bit. The same goes for writing skb->tstamp. * bpf_skb_set_delivery_time(): The bpf_skb_set_delivery_time() helper is added to allow setting both delivery_time and the delivery_time_type at the same time. If the tc-bpf does not need to change the delivery_time_type, it can directly write to the __sk_buff->tstamp as the existing tc-bpf has already been doing. It will be most useful at ingress to change the __sk_buff->tstamp from the (rcv) timestamp to a mono delivery_time and then bpf_redirect_*(). bpf only has mono clock helper (bpf_ktime_get_ns), and the current known use case is the mono EDT for fq, and only mono delivery time can be kept during forward now, so bpf_skb_set_delivery_time() only supports setting BPF_SKB_DELIVERY_TIME_MONO. It can be extended later when use cases come up and the forwarding path also supports other clock bases. Signed-off-by: Martin KaFai Lau Signed-off-by: David S. Miller --- include/linux/filter.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/filter.h b/include/linux/filter.h index 1cb1af917617..9bf26307247f 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -572,7 +572,8 @@ struct bpf_prog { has_callchain_buf:1, /* callchain buffer allocated? */ enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */ call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */ - call_get_func_ip:1; /* Do we call get_func_ip() */ + call_get_func_ip:1, /* Do we call get_func_ip() */ + delivery_time_access:1; /* Accessed __sk_buff->delivery_time_type */ enum bpf_prog_type type; /* Type of BPF program */ enum bpf_attach_type expected_attach_type; /* For some prog types */ u32 len; /* Number of filter blocks */ -- cgit From 5c3f1f9cc4cbbf491233982b5975ae2d284de5df Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 16 Feb 2022 15:31:35 +1100 Subject: mm: remove the __KERNEL__ guard from __KERNEL__ ifdefs don't make sense outside of include/uapi/. Link: https://lkml.kernel.org/r/20220210072828.2930359-3-hch@lst.de Signed-off-by: Christoph Hellwig Reviewed-by: Logan Gunthorpe Reviewed-by: Jason Gunthorpe Reviewed-by: Chaitanya Kulkarni Reviewed-by: Muchun Song Reviewed-by: Dan Williams Tested-by: "Sierra Guiza, Alejandro (Alex)" Cc: Alex Deucher Cc: Alistair Popple Cc: Ben Skeggs Cc: Christian Knig Cc: Felix Kuehling Cc: Karol Herbst Cc: Lyude Paul Cc: Miaohe Lin Cc: "Pan, Xinhui" Cc: Ralph Campbell Signed-off-by: Andrew Morton Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/mm.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 74ee50c2033b..2cca8cd30186 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3,9 +3,6 @@ #define _LINUX_MM_H #include - -#ifdef __KERNEL__ - #include #include #include @@ -3379,5 +3376,4 @@ madvise_set_anon_name(struct mm_struct *mm, unsigned long start, } #endif -#endif /* __KERNEL__ */ #endif /* _LINUX_MM_H */ -- cgit From 730ff52194cdb324b7680e5054c546f7b52de8a2 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 16 Feb 2022 15:31:35 +1100 Subject: mm: remove pointless includes from hmm.h pulls in the world for no good reason at all. Remove the includes and push a few ones into the users instead. Link: https://lkml.kernel.org/r/20220210072828.2930359-4-hch@lst.de Signed-off-by: Christoph Hellwig Reviewed-by: Logan Gunthorpe Reviewed-by: Jason Gunthorpe Reviewed-by: Chaitanya Kulkarni Reviewed-by: Muchun Song Tested-by: "Sierra Guiza, Alejandro (Alex)" Cc: Alex Deucher Cc: Alistair Popple Cc: Ben Skeggs Cc: Christian Knig Cc: Dan Williams Cc: Felix Kuehling Cc: Karol Herbst Cc: Lyude Paul Cc: Miaohe Lin Cc: "Pan, Xinhui" Cc: Ralph Campbell Signed-off-by: Andrew Morton Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/hmm.h | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hmm.h b/include/linux/hmm.h index 2fd2e91d5107..d5a6f101f843 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -9,14 +9,9 @@ #ifndef LINUX_HMM_H #define LINUX_HMM_H -#include -#include +#include -#include -#include -#include -#include -#include +struct mmu_interval_notifier; /* * On output: -- cgit From 75e55d8a107edb2fd6e02b1fa8a81531209cda04 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 16 Feb 2022 15:31:35 +1100 Subject: mm: move free_devmap_managed_page to memremap.c free_devmap_managed_page has nothing to do with the code in swap.c, move it to live with the rest of the code for devmap handling. Link: https://lkml.kernel.org/r/20220210072828.2930359-5-hch@lst.de Signed-off-by: Christoph Hellwig Reviewed-by: Logan Gunthorpe Reviewed-by: Jason Gunthorpe Reviewed-by: Chaitanya Kulkarni Reviewed-by: Muchun Song Reviewed-by: Dan Williams Tested-by: "Sierra Guiza, Alejandro (Alex)" Cc: Alex Deucher Cc: Alistair Popple Cc: Ben Skeggs Cc: Christian Knig Cc: Felix Kuehling Cc: Karol Herbst Cc: Lyude Paul Cc: Miaohe Lin Cc: "Pan, Xinhui" Cc: Ralph Campbell Signed-off-by: Andrew Morton Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/mm.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 2cca8cd30186..a9d6473fc045 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1092,7 +1092,6 @@ static inline bool is_zone_movable_page(const struct page *page) } #ifdef CONFIG_DEV_PAGEMAP_OPS -void free_devmap_managed_page(struct page *page); DECLARE_STATIC_KEY_FALSE(devmap_managed_key); static inline bool page_is_devmap_managed(struct page *page) -- cgit From 895749455f6054e0c7b40a6ec449a3ab6db51bdd Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 16 Feb 2022 15:31:35 +1100 Subject: mm: simplify freeing of devmap managed pages Make put_devmap_managed_page return if it took charge of the page or not and remove the separate page_is_devmap_managed helper. Link: https://lkml.kernel.org/r/20220210072828.2930359-6-hch@lst.de Signed-off-by: Christoph Hellwig Reviewed-by: Logan Gunthorpe Reviewed-by: Jason Gunthorpe Reviewed-by: Chaitanya Kulkarni Reviewed-by: Dan Williams Tested-by: "Sierra Guiza, Alejandro (Alex)" Cc: Alex Deucher Cc: Alistair Popple Cc: Ben Skeggs Cc: Christian Knig Cc: Felix Kuehling Cc: Karol Herbst Cc: Lyude Paul Cc: Miaohe Lin Cc: Muchun Song Cc: "Pan, Xinhui" Cc: Ralph Campbell Signed-off-by: Andrew Morton Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/mm.h | 34 ++++++++++------------------------ 1 file changed, 10 insertions(+), 24 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index a9d6473fc045..8a59f0456149 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1094,33 +1094,24 @@ static inline bool is_zone_movable_page(const struct page *page) #ifdef CONFIG_DEV_PAGEMAP_OPS DECLARE_STATIC_KEY_FALSE(devmap_managed_key); -static inline bool page_is_devmap_managed(struct page *page) +bool __put_devmap_managed_page(struct page *page); +static inline bool put_devmap_managed_page(struct page *page) { if (!static_branch_unlikely(&devmap_managed_key)) return false; if (!is_zone_device_page(page)) return false; - switch (page->pgmap->type) { - case MEMORY_DEVICE_PRIVATE: - case MEMORY_DEVICE_FS_DAX: - return true; - default: - break; - } - return false; + if (page->pgmap->type != MEMORY_DEVICE_PRIVATE && + page->pgmap->type != MEMORY_DEVICE_FS_DAX) + return false; + return __put_devmap_managed_page(page); } -void put_devmap_managed_page(struct page *page); - #else /* CONFIG_DEV_PAGEMAP_OPS */ -static inline bool page_is_devmap_managed(struct page *page) +static inline bool put_devmap_managed_page(struct page *page) { return false; } - -static inline void put_devmap_managed_page(struct page *page) -{ -} #endif /* CONFIG_DEV_PAGEMAP_OPS */ static inline bool is_device_private_page(const struct page *page) @@ -1220,16 +1211,11 @@ static inline void put_page(struct page *page) struct folio *folio = page_folio(page); /* - * For devmap managed pages we need to catch refcount transition from - * 2 to 1, when refcount reach one it means the page is free and we - * need to inform the device driver through callback. See - * include/linux/memremap.h and HMM for details. + * For some devmap managed pages we need to catch refcount transition + * from 2 to 1: */ - if (page_is_devmap_managed(&folio->page)) { - put_devmap_managed_page(&folio->page); + if (put_devmap_managed_page(&folio->page)) return; - } - folio_put(folio); } -- cgit From dc90f0846df4870b6cc8528c31e5c60f18fb68be Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 16 Feb 2022 15:31:36 +1100 Subject: mm: don't include in Move the check for the actual pgmap types that need the free at refcount one behavior into the out of line helper, and thus avoid the need to pull memremap.h into mm.h. Link: https://lkml.kernel.org/r/20220210072828.2930359-7-hch@lst.de Signed-off-by: Christoph Hellwig Reviewed-by: Logan Gunthorpe Reviewed-by: Jason Gunthorpe Reviewed-by: Dan Williams Acked-by: Felix Kuehling Tested-by: "Sierra Guiza, Alejandro (Alex)" Cc: Alex Deucher Cc: Alistair Popple Cc: Ben Skeggs Cc: Chaitanya Kulkarni Cc: Karol Herbst Cc: Lyude Paul Cc: Miaohe Lin Cc: Muchun Song Cc: "Pan, Xinhui" Cc: Ralph Campbell Signed-off-by: Andrew Morton Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/memremap.h | 18 ++++++++++++++++++ include/linux/mm.h | 20 -------------------- 2 files changed, 18 insertions(+), 20 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 1fafcc38acba..514ab46f597e 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -1,6 +1,8 @@ /* SPDX-License-Identifier: GPL-2.0 */ #ifndef _LINUX_MEMREMAP_H_ #define _LINUX_MEMREMAP_H_ + +#include #include #include #include @@ -129,6 +131,22 @@ static inline unsigned long pgmap_vmemmap_nr(struct dev_pagemap *pgmap) return 1 << pgmap->vmemmap_shift; } +static inline bool is_device_private_page(const struct page *page) +{ + return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) && + IS_ENABLED(CONFIG_DEVICE_PRIVATE) && + is_zone_device_page(page) && + page->pgmap->type == MEMORY_DEVICE_PRIVATE; +} + +static inline bool is_pci_p2pdma_page(const struct page *page) +{ + return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) && + IS_ENABLED(CONFIG_PCI_P2PDMA) && + is_zone_device_page(page) && + page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA; +} + #ifdef CONFIG_ZONE_DEVICE void *memremap_pages(struct dev_pagemap *pgmap, int nid); void memunmap_pages(struct dev_pagemap *pgmap); diff --git a/include/linux/mm.h b/include/linux/mm.h index 8a59f0456149..cb8bee88e70c 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -23,7 +23,6 @@ #include #include #include -#include #include #include #include @@ -1101,9 +1100,6 @@ static inline bool put_devmap_managed_page(struct page *page) return false; if (!is_zone_device_page(page)) return false; - if (page->pgmap->type != MEMORY_DEVICE_PRIVATE && - page->pgmap->type != MEMORY_DEVICE_FS_DAX) - return false; return __put_devmap_managed_page(page); } @@ -1114,22 +1110,6 @@ static inline bool put_devmap_managed_page(struct page *page) } #endif /* CONFIG_DEV_PAGEMAP_OPS */ -static inline bool is_device_private_page(const struct page *page) -{ - return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) && - IS_ENABLED(CONFIG_DEVICE_PRIVATE) && - is_zone_device_page(page) && - page->pgmap->type == MEMORY_DEVICE_PRIVATE; -} - -static inline bool is_pci_p2pdma_page(const struct page *page) -{ - return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) && - IS_ENABLED(CONFIG_PCI_P2PDMA) && - is_zone_device_page(page) && - page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA; -} - /* 127: arbitrary random number, small enough to assemble well */ #define folio_ref_zero_or_close_to_overflow(folio) \ ((unsigned int) folio_ref_count(folio) + 127u <= 127u) -- cgit From 27674ef6c73f0c9096a9827dc5d6ba9fc7808422 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 16 Feb 2022 15:31:36 +1100 Subject: mm: remove the extra ZONE_DEVICE struct page refcount ZONE_DEVICE struct pages have an extra reference count that complicates the code for put_page() and several places in the kernel that need to check the reference count to see that a page is not being used (gup, compaction, migration, etc.). Clean up the code so the reference count doesn't need to be treated specially for ZONE_DEVICE pages. Note that this excludes the special idle page wakeup for fsdax pages, which still happens at refcount 1. This is a separate issue and will be sorted out later. Given that only fsdax pages require the notifiacation when the refcount hits 1 now, the PAGEMAP_OPS Kconfig symbol can go away and be replaced with a FS_DAX check for this hook in the put_page fastpath. Based on an earlier patch from Ralph Campbell . Link: https://lkml.kernel.org/r/20220210072828.2930359-8-hch@lst.de Signed-off-by: Christoph Hellwig Reviewed-by: Logan Gunthorpe Reviewed-by: Ralph Campbell Reviewed-by: Jason Gunthorpe Reviewed-by: Dan Williams Acked-by: Felix Kuehling Tested-by: "Sierra Guiza, Alejandro (Alex)" Cc: Alex Deucher Cc: Alistair Popple Cc: Ben Skeggs Cc: Chaitanya Kulkarni Cc: Christian Knig Cc: Karol Herbst Cc: Lyude Paul Cc: Miaohe Lin Cc: Muchun Song Cc: "Pan, Xinhui" Signed-off-by: Andrew Morton Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/memremap.h | 12 +++++------- include/linux/mm.h | 6 +++--- 2 files changed, 8 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 514ab46f597e..d6a114dd5ea8 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -68,9 +68,9 @@ enum memory_type { struct dev_pagemap_ops { /* - * Called once the page refcount reaches 1. (ZONE_DEVICE pages never - * reach 0 refcount unless there is a refcount bug. This allows the - * device driver to implement its own memory management.) + * Called once the page refcount reaches 0. The reference count will be + * reset to one by the core code after the method is called to prepare + * for handing out the page again. */ void (*page_free)(struct page *page); @@ -133,16 +133,14 @@ static inline unsigned long pgmap_vmemmap_nr(struct dev_pagemap *pgmap) static inline bool is_device_private_page(const struct page *page) { - return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) && - IS_ENABLED(CONFIG_DEVICE_PRIVATE) && + return IS_ENABLED(CONFIG_DEVICE_PRIVATE) && is_zone_device_page(page) && page->pgmap->type == MEMORY_DEVICE_PRIVATE; } static inline bool is_pci_p2pdma_page(const struct page *page) { - return IS_ENABLED(CONFIG_DEV_PAGEMAP_OPS) && - IS_ENABLED(CONFIG_PCI_P2PDMA) && + return IS_ENABLED(CONFIG_PCI_P2PDMA) && is_zone_device_page(page) && page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA; } diff --git a/include/linux/mm.h b/include/linux/mm.h index cb8bee88e70c..0201d258c646 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1090,7 +1090,7 @@ static inline bool is_zone_movable_page(const struct page *page) return page_zonenum(page) == ZONE_MOVABLE; } -#ifdef CONFIG_DEV_PAGEMAP_OPS +#if defined(CONFIG_ZONE_DEVICE) && defined(CONFIG_FS_DAX) DECLARE_STATIC_KEY_FALSE(devmap_managed_key); bool __put_devmap_managed_page(struct page *page); @@ -1103,12 +1103,12 @@ static inline bool put_devmap_managed_page(struct page *page) return __put_devmap_managed_page(page); } -#else /* CONFIG_DEV_PAGEMAP_OPS */ +#else /* CONFIG_ZONE_DEVICE && CONFIG_FS_DAX */ static inline bool put_devmap_managed_page(struct page *page) { return false; } -#endif /* CONFIG_DEV_PAGEMAP_OPS */ +#endif /* CONFIG_ZONE_DEVICE && CONFIG_FS_DAX */ /* 127: arbitrary random number, small enough to assemble well */ #define folio_ref_zero_or_close_to_overflow(folio) \ -- cgit From dc4e8c07e9e2f69387579c49caca26ba239f7270 Mon Sep 17 00:00:00 2001 From: Shuai Xue Date: Sun, 27 Feb 2022 20:25:45 +0800 Subject: ACPI: APEI: explicit init of HEST and GHES in apci_init() From commit e147133a42cb ("ACPI / APEI: Make hest.c manage the estatus memory pool") was merged, ghes_init() relies on acpi_hest_init() to manage the estatus memory pool. On the other hand, ghes_init() relies on sdei_init() to detect the SDEI version and (un)register events. The dependencies are as follows: ghes_init() => acpi_hest_init() => acpi_bus_init() => acpi_init() ghes_init() => sdei_init() HEST is not PCI-specific and initcall ordering is implicit and not well-defined within a level. Based on above, remove acpi_hest_init() from acpi_pci_root_init() and convert ghes_init() and sdei_init() from initcalls to explicit calls in the following order: acpi_hest_init() ghes_init() sdei_init() Signed-off-by: Shuai Xue Signed-off-by: Rafael J. Wysocki --- include/linux/arm_sdei.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/arm_sdei.h b/include/linux/arm_sdei.h index 0a241c5c911d..14dc461b0e82 100644 --- a/include/linux/arm_sdei.h +++ b/include/linux/arm_sdei.h @@ -46,9 +46,11 @@ int sdei_unregister_ghes(struct ghes *ghes); /* For use by arch code when CPU hotplug notifiers are not appropriate. */ int sdei_mask_local_cpu(void); int sdei_unmask_local_cpu(void); +void __init sdei_init(void); #else static inline int sdei_mask_local_cpu(void) { return 0; } static inline int sdei_unmask_local_cpu(void) { return 0; } +static inline void sdei_init(void) { } #endif /* CONFIG_ARM_SDE_INTERFACE */ -- cgit From 31b9887c7258ca47d9c665a80f19f006c86756b1 Mon Sep 17 00:00:00 2001 From: Jamie Iles Date: Mon, 17 Jan 2022 17:48:15 +0000 Subject: i3c: remove i2c board info from i2c_dev_desc I2C board info is only required during adapter setup so there is no requirement to keeping a pointer to it once running. To support dynamic device addition we can't rely on board info - user-space creation through sysfs won't have a boardinfo. Cc: Alexandre Belloni Signed-off-by: Jamie Iles Signed-off-by: Alexandre Belloni Link: https://lore.kernel.org/r/20220117174816.1963463-2-quic_jiles@quicinc.com --- include/linux/i3c/master.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/i3c/master.h b/include/linux/i3c/master.h index 9cb39d901cd5..604a126b78c8 100644 --- a/include/linux/i3c/master.h +++ b/include/linux/i3c/master.h @@ -85,7 +85,6 @@ struct i2c_dev_boardinfo { */ struct i2c_dev_desc { struct i3c_i2c_dev_desc common; - const struct i2c_dev_boardinfo *boardinfo; struct i2c_client *dev; u16 addr; u8 lvr; -- cgit From 98b4d7a4e7374a44c4afd9f08330e72f6ad0d644 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Fri, 4 Mar 2022 14:00:40 +0800 Subject: net: dev: use kfree_skb_reason() for sch_handle_egress() Replace kfree_skb() used in sch_handle_egress() with kfree_skb_reason(). The drop reason SKB_DROP_REASON_TC_EGRESS is introduced. Considering the code path of tc egerss, we make it distinct with the drop reason of SKB_DROP_REASON_QDISC_DROP in the next commit. Signed-off-by: Menglong Dong Signed-off-by: David S. Miller --- include/linux/skbuff.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 5445860e1ba6..1ffe64616741 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -394,6 +394,7 @@ enum skb_drop_reason { * entry is full */ SKB_DROP_REASON_NEIGH_DEAD, /* neigh entry is dead */ + SKB_DROP_REASON_TC_EGRESS, /* dropped in TC egress HOOK */ SKB_DROP_REASON_MAX, }; -- cgit From 215b0f1963d4e34fccac6992b3debe26f78a6eb8 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Fri, 4 Mar 2022 14:00:41 +0800 Subject: net: skb: introduce the function kfree_skb_list_reason() To report reasons of skb drops, introduce the function kfree_skb_list_reason() and make kfree_skb_list() an inline call to it. This function will be used in the next commit in __dev_xmit_skb(). Signed-off-by: Menglong Dong Signed-off-by: David S. Miller --- include/linux/skbuff.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 1ffe64616741..1f2de153755c 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1202,10 +1202,16 @@ static inline void kfree_skb(struct sk_buff *skb) } void skb_release_head_state(struct sk_buff *skb); -void kfree_skb_list(struct sk_buff *segs); +void kfree_skb_list_reason(struct sk_buff *segs, + enum skb_drop_reason reason); void skb_dump(const char *level, const struct sk_buff *skb, bool full_pkt); void skb_tx_error(struct sk_buff *skb); +static inline void kfree_skb_list(struct sk_buff *segs) +{ + kfree_skb_list_reason(segs, SKB_DROP_REASON_NOT_SPECIFIED); +} + #ifdef CONFIG_TRACEPOINTS void consume_skb(struct sk_buff *skb); #else -- cgit From 7faef0547f4c29031a68d058918b031a8e520d49 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Fri, 4 Mar 2022 14:00:42 +0800 Subject: net: dev: add skb drop reasons to __dev_xmit_skb() Add reasons for skb drops to __dev_xmit_skb() by replacing kfree_skb_list() with kfree_skb_list_reason(). The drop reason of SKB_DROP_REASON_QDISC_DROP is introduced for qdisc enqueue fails. Signed-off-by: Menglong Dong Signed-off-by: David S. Miller --- include/linux/skbuff.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 1f2de153755c..769d54ba2f3b 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -395,6 +395,10 @@ enum skb_drop_reason { */ SKB_DROP_REASON_NEIGH_DEAD, /* neigh entry is dead */ SKB_DROP_REASON_TC_EGRESS, /* dropped in TC egress HOOK */ + SKB_DROP_REASON_QDISC_DROP, /* dropped by qdisc when packet + * outputting (failed to enqueue to + * current qdisc) + */ SKB_DROP_REASON_MAX, }; -- cgit From 44f0bd40803c0e04f1c8cd59df3c7acce783ae9c Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Fri, 4 Mar 2022 14:00:43 +0800 Subject: net: dev: use kfree_skb_reason() for enqueue_to_backlog() Replace kfree_skb() used in enqueue_to_backlog() with kfree_skb_reason(). The skb rop reason SKB_DROP_REASON_CPU_BACKLOG is introduced for the case of failing to enqueue the skb to the per CPU backlog queue. The further reason can be backlog queue full or RPS flow limition, and I think we needn't to make further distinctions. Signed-off-by: Menglong Dong Signed-off-by: David S. Miller --- include/linux/skbuff.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 769d54ba2f3b..44ecbfff2b69 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -399,6 +399,12 @@ enum skb_drop_reason { * outputting (failed to enqueue to * current qdisc) */ + SKB_DROP_REASON_CPU_BACKLOG, /* failed to enqueue the skb to + * the per CPU backlog queue. This + * can be caused by backlog queue + * full (see netdev_max_backlog in + * net.rst) or RPS flow limit + */ SKB_DROP_REASON_MAX, }; -- cgit From 7e726ed81e1ddd5fdc431e02b94fcfe2a9876d42 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Fri, 4 Mar 2022 14:00:44 +0800 Subject: net: dev: use kfree_skb_reason() for do_xdp_generic() Replace kfree_skb() used in do_xdp_generic() with kfree_skb_reason(). The drop reason SKB_DROP_REASON_XDP is introduced for this case. Signed-off-by: Menglong Dong Signed-off-by: David S. Miller --- include/linux/skbuff.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 44ecbfff2b69..7a38d0f90cef 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -405,6 +405,7 @@ enum skb_drop_reason { * full (see netdev_max_backlog in * net.rst) or RPS flow limit */ + SKB_DROP_REASON_XDP, /* dropped by XDP in input path */ SKB_DROP_REASON_MAX, }; -- cgit From a568aff26ac03ee9eb1482683514914a5ec3b4c3 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Fri, 4 Mar 2022 14:00:45 +0800 Subject: net: dev: use kfree_skb_reason() for sch_handle_ingress() Replace kfree_skb() used in sch_handle_ingress() with kfree_skb_reason(). Following drop reasons are introduced: SKB_DROP_REASON_TC_INGRESS Signed-off-by: Menglong Dong Signed-off-by: David S. Miller --- include/linux/skbuff.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 7a38d0f90cef..6b14b2d297d2 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -406,6 +406,7 @@ enum skb_drop_reason { * net.rst) or RPS flow limit */ SKB_DROP_REASON_XDP, /* dropped by XDP in input path */ + SKB_DROP_REASON_TC_INGRESS, /* dropped in TC ingress HOOK */ SKB_DROP_REASON_MAX, }; -- cgit From 6c2728b7c14164928cb7cb9c847dead101b2d503 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Fri, 4 Mar 2022 14:00:46 +0800 Subject: net: dev: use kfree_skb_reason() for __netif_receive_skb_core() Add reason for skb drops to __netif_receive_skb_core() when packet_type not found to handle the skb. For this purpose, the drop reason SKB_DROP_REASON_PTYPE_ABSENT is introduced. Take ether packets for example, this case mainly happens when L3 protocol is not supported. Signed-off-by: Menglong Dong Signed-off-by: David S. Miller --- include/linux/skbuff.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 6b14b2d297d2..2be263184d1e 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -407,6 +407,11 @@ enum skb_drop_reason { */ SKB_DROP_REASON_XDP, /* dropped by XDP in input path */ SKB_DROP_REASON_TC_INGRESS, /* dropped in TC ingress HOOK */ + SKB_DROP_REASON_PTYPE_ABSENT, /* not packet_type found to handle + * the skb. For an etner packet, + * this means that L3 protocol is + * not supported + */ SKB_DROP_REASON_MAX, }; -- cgit From 838d6d3461db0fdbf33fc5f8a69c27b50b4a46da Mon Sep 17 00:00:00 2001 From: "Michael S. Tsirkin" Date: Fri, 14 Jan 2022 14:56:15 -0500 Subject: virtio: unexport virtio_finalize_features virtio_finalize_features is only used internally within virtio. No reason to export it. Signed-off-by: Michael S. Tsirkin Reviewed-by: Cornelia Huck Acked-by: Jason Wang --- include/linux/virtio.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 72292a62cd90..5464f398912a 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -133,7 +133,6 @@ bool is_virtio_device(struct device *dev); void virtio_break_device(struct virtio_device *dev); void virtio_config_changed(struct virtio_device *dev); -int virtio_finalize_features(struct virtio_device *dev); #ifdef CONFIG_PM_SLEEP int virtio_device_freeze(struct virtio_device *dev); int virtio_device_restore(struct virtio_device *dev); -- cgit From 4fa59ede95195f267101a1b8916992cf3f245cdb Mon Sep 17 00:00:00 2001 From: "Michael S. Tsirkin" Date: Fri, 14 Jan 2022 14:58:41 -0500 Subject: virtio: acknowledge all features before access The feature negotiation was designed in a way that makes it possible for devices to know which config fields will be accessed by drivers. This is broken since commit 404123c2db79 ("virtio: allow drivers to validate features") with fallout in at least block and net. We have a partial work-around in commit 2f9a174f918e ("virtio: write back F_VERSION_1 before validate") which at least lets devices find out which format should config space have, but this is a partial fix: guests should not access config space without acknowledging features since otherwise we'll never be able to change the config space format. To fix, split finalize_features from virtio_finalize_features and call finalize_features with all feature bits before validation, and then - if validation changed any bits - once again after. Since virtio_finalize_features no longer writes out features rename it to virtio_features_ok - since that is what it does: checks that features are ok with the device. As a side effect, this also reduces the amount of hypervisor accesses - we now only acknowledge features once unless we are clearing any features when validating (which is uncommon). IRC I think that this was more or less always the intent in the spec but unfortunately the way the spec is worded does not say this explicitly, I plan to address this at the spec level, too. Acked-by: Jason Wang Cc: stable@vger.kernel.org Fixes: 404123c2db79 ("virtio: allow drivers to validate features") Fixes: 2f9a174f918e ("virtio: write back F_VERSION_1 before validate") Cc: "Halil Pasic" Signed-off-by: Michael S. Tsirkin --- include/linux/virtio_config.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index 4d107ad31149..dafdc7f48c01 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -64,8 +64,9 @@ struct virtio_shm_region { * Returns the first 64 feature bits (all we currently need). * @finalize_features: confirm what device features we'll be using. * vdev: the virtio_device - * This gives the final feature bits for the device: it can change + * This sends the driver feature bits to the device: it can change * the dev->feature bits if it wants. + * Note: despite the name this can be called any number of times. * Returns 0 on success or error status * @bus_name: return the bus name associated with the device (optional) * vdev: the virtio_device -- cgit From bf9ad37dc8a30cce22ae95d6c2ca6abf8731d305 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Tue, 14 Jul 2015 14:26:34 +0200 Subject: signal, x86: Delay calling signals in atomic on RT enabled kernels On x86_64 we must disable preemption before we enable interrupts for stack faults, int3 and debugging, because the current task is using a per CPU debug stack defined by the IST. If we schedule out, another task can come in and use the same stack and cause the stack to be corrupted and crash the kernel on return. When CONFIG_PREEMPT_RT is enabled, spinlock_t locks become sleeping, and one of these is the spin lock used in signal handling. Some of the debug code (int3) causes do_trap() to send a signal. This function calls a spinlock_t lock that has been converted to a sleeping lock. If this happens, the above issues with the corrupted stack is possible. Instead of calling the signal right away, for PREEMPT_RT and x86, the signal information is stored on the stacks task_struct and TIF_NOTIFY_RESUME is set. Then on exit of the trap, the signal resume code will send the signal when preemption is enabled. [ rostedt: Switched from #ifdef CONFIG_PREEMPT_RT to ARCH_RT_DELAYS_SIGNAL_SEND and added comments to the code. ] [bigeasy: Add on 32bit as per Yang Shi, minor rewording. ] [ tglx: Use a config option ] Signed-off-by: Oleg Nesterov Signed-off-by: Steven Rostedt Signed-off-by: Thomas Gleixner Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/Ygq5aBB/qMQw6aP5@linutronix.de --- include/linux/sched.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 75ba8aa60248..098e37fd770a 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1087,6 +1087,9 @@ struct task_struct { /* Restored if set_restore_sigmask() was used: */ sigset_t saved_sigmask; struct sigpending pending; +#ifdef CONFIG_RT_DELAYED_SIGNALS + struct kernel_siginfo forced_info; +#endif unsigned long sas_ss_sp; size_t sas_ss_size; unsigned int sas_ss_flags; -- cgit From 402e6688a7df482cc57be0b0dd5b443d995073fb Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Tue, 1 Mar 2022 10:01:48 +0800 Subject: iommu/vt-d: Remove intel_iommu::domains The "domains" field of the intel_iommu structure keeps the mapping of domain_id to dmar_domain. This information is not used anywhere. Remove and cleanup it to avoid unnecessary memory consumption. Signed-off-by: Lu Baolu Reviewed-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20220214025704.3184654-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20220301020159.633356-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/intel-iommu.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 5cfda90b2cca..8c7591b5f3e2 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -578,7 +578,6 @@ struct intel_iommu { #ifdef CONFIG_INTEL_IOMMU unsigned long *domain_ids; /* bitmap of domains */ - struct dmar_domain ***domains; /* ptr to domains */ spinlock_t lock; /* protect context, domain ids */ struct root_entry *root_entry; /* virtual address */ -- cgit From 586081d3f6b13ec9dfdfdf3d7842a688b376fa5e Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Tue, 1 Mar 2022 10:01:52 +0800 Subject: iommu/vt-d: Remove DEFER_DEVICE_DOMAIN_INFO Allocate and set the per-device iommu private data during iommu device probe. This makes the per-device iommu private data always available during iommu_probe_device() and iommu_release_device(). With this changed, the dummy DEFER_DEVICE_DOMAIN_INFO pointer could be removed. The wrappers for getting the private data and domain are also cleaned. Signed-off-by: Lu Baolu Reviewed-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20220214025704.3184654-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20220301020159.633356-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/intel-iommu.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 8c7591b5f3e2..03f1134fc2fe 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -733,8 +733,6 @@ int for_each_device_domain(int (*fn)(struct device_domain_info *info, void *data), void *data); void iommu_flush_write_buffer(struct intel_iommu *iommu); int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev); -struct dmar_domain *find_domain(struct device *dev); -struct device_domain_info *get_domain_info(struct device *dev); struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn); #ifdef CONFIG_INTEL_IOMMU_SVM -- cgit From 2852631d96a643e0b21a9b14ad38caf465d7225f Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Tue, 1 Mar 2022 10:01:56 +0800 Subject: iommu/vt-d: Move intel_iommu_ops to header file Compiler is not happy about hidden declaration of intel_iommu_ops. .../iommu.c:414:24: warning: symbol 'intel_iommu_ops' was not declared. Should it be static? Move declaration to header file to make compiler happy. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220207141240.8253-1-andriy.shevchenko@linux.intel.com Signed-off-by: Lu Baolu Link: https://lore.kernel.org/r/20220301020159.633356-10-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/intel-iommu.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 03f1134fc2fe..4909d6c9ac21 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -783,6 +783,8 @@ bool context_present(struct context_entry *context); struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus, u8 devfn, int alloc); +extern const struct iommu_ops intel_iommu_ops; + #ifdef CONFIG_INTEL_IOMMU extern int iommu_calculate_agaw(struct intel_iommu *iommu); extern int iommu_calculate_max_sagaw(struct intel_iommu *iommu); -- cgit From 97f2f2c5317f55ae3440733a090a96a251da222b Mon Sep 17 00:00:00 2001 From: Yian Chen Date: Tue, 1 Mar 2022 10:01:59 +0800 Subject: iommu/vt-d: Enable ATS for the devices in SATC table Starting from Intel VT-d v3.2, Intel platform BIOS can provide additional SATC table structure. SATC table includes a list of SoC integrated devices that support ATC (Address translation cache). Enabling ATC (via ATS capability) can be a functional requirement for SATC device operation or optional to enhance device performance/functionality. This is determined by the bit of ATC_REQUIRED in SATC table. When IOMMU is working in scalable mode, software chooses to always enable ATS for every device in SATC table because Intel SoC devices in SATC table are trusted to use ATS. On the other hand, if IOMMU is in legacy mode, ATS of SATC capable devices can work transparently to software and be automatically enabled by IOMMU hardware. As the result, there is no need for software to enable ATS on these devices. This also removes dmar_find_matched_atsr_unit() helper as it becomes dead code now. Signed-off-by: Yian Chen Link: https://lore.kernel.org/r/20220222185416.1722611-1-yian.chen@intel.com Signed-off-by: Lu Baolu Link: https://lore.kernel.org/r/20220301020159.633356-13-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/intel-iommu.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 4909d6c9ac21..2f9891cb3d00 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -693,7 +693,6 @@ static inline int nr_pte_to_next_page(struct dma_pte *pte) } extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev); -extern int dmar_find_matched_atsr_unit(struct pci_dev *dev); extern int dmar_enable_qi(struct intel_iommu *iommu); extern void dmar_disable_qi(struct intel_iommu *iommu); -- cgit From 26c9da51949916b73d995a2c89412c346903273d Mon Sep 17 00:00:00 2001 From: Puranjay Mohan Date: Wed, 16 Feb 2022 13:42:23 +0530 Subject: remoteproc: Introduce sysfs_read_only flag The remoteproc framework provides sysfs interfaces for changing the firmware name and for starting/stopping a remote processor through the sysfs files 'state' and 'firmware'. The 'coredump' file is used to set the coredump configuration. The 'recovery' sysfs file can also be used similarly to control the error recovery state machine of a remoteproc. These interfaces are currently allowed irrespective of how the remoteprocs were booted (like remoteproc self auto-boot, remoteproc client-driven boot etc). These interfaces can adversely affect a remoteproc and its clients especially when a remoteproc is being controlled by a remoteproc client driver(s). Also, not all remoteproc drivers may want to support the sysfs interfaces by default. Add support to make the remoteproc sysfs files read only by introducing a state flag 'sysfs_read_only' that the individual remoteproc drivers can set based on their usage needs. The default behavior is to allow the sysfs operations as before. Implement attribute_group->is_visible() to make the sysfs entries read only when 'sysfs_read_only' flag is set. Signed-off-by: Puranjay Mohan Link: https://lore.kernel.org/r/20220216081224.9956-2-p-mohan@ti.com Signed-off-by: Mathieu Poirier --- include/linux/remoteproc.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h index e0600e1e5c17..93a1d0050fbc 100644 --- a/include/linux/remoteproc.h +++ b/include/linux/remoteproc.h @@ -523,6 +523,7 @@ struct rproc_dump_segment { * @table_sz: size of @cached_table * @has_iommu: flag to indicate if remote processor is behind an MMU * @auto_boot: flag to indicate if remote processor should be auto-started + * @sysfs_read_only: flag to make remoteproc sysfs files read only * @dump_segments: list of segments in the firmware * @nb_vdev: number of vdev currently handled by rproc * @elf_class: firmware ELF class @@ -562,6 +563,7 @@ struct rproc { size_t table_sz; bool has_iommu; bool auto_boot; + bool sysfs_read_only; struct list_head dump_segments; int nb_vdev; u8 elf_class; -- cgit From e0077cc13b831f8fad5557442f73bf7728683713 Mon Sep 17 00:00:00 2001 From: Si-Wei Liu Date: Fri, 14 Jan 2022 19:27:59 -0500 Subject: vdpa: factor out vdpa_set_features_unlocked for vdpa internal use No functional change introduced. vdpa bus driver such as virtio_vdpa or vhost_vdpa is not supposed to take care of the locking for core by its own. The locked API vdpa_set_features should suffice the bus driver's need. Signed-off-by: Si-Wei Liu Reviewed-by: Eli Cohen Link: https://lore.kernel.org/r/1642206481-30721-2-git-send-email-si-wei.liu@oracle.com Signed-off-by: Michael S. Tsirkin Acked-by: Jason Wang --- include/linux/vdpa.h | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 2de442ececae..721089bb4c84 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -401,18 +401,24 @@ static inline int vdpa_reset(struct vdpa_device *vdev) return ret; } -static inline int vdpa_set_features(struct vdpa_device *vdev, u64 features, bool locked) +static inline int vdpa_set_features_unlocked(struct vdpa_device *vdev, u64 features) { const struct vdpa_config_ops *ops = vdev->config; int ret; - if (!locked) - mutex_lock(&vdev->cf_mutex); - vdev->features_valid = true; ret = ops->set_driver_features(vdev, features); - if (!locked) - mutex_unlock(&vdev->cf_mutex); + + return ret; +} + +static inline int vdpa_set_features(struct vdpa_device *vdev, u64 features) +{ + int ret; + + mutex_lock(&vdev->cf_mutex); + ret = vdpa_set_features_unlocked(vdev, features); + mutex_unlock(&vdev->cf_mutex); return ret; } -- cgit From b4060db9251f919506e4d672737c6b8ab9a84701 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Wed, 23 Feb 2022 08:34:48 -0800 Subject: PM: runtime: Have devm_pm_runtime_enable() handle pm_runtime_dont_use_autosuspend() The PM Runtime docs say: Drivers in ->remove() callback should undo the runtime PM changes done in ->probe(). Usually this means calling pm_runtime_disable(), pm_runtime_dont_use_autosuspend() etc. From grepping code, it's clear that many people aren't aware of the need to call pm_runtime_dont_use_autosuspend(). When brainstorming solutions, one idea that came up was to leverage the new-ish devm_pm_runtime_enable() function. The idea here is that: * When the devm action is called we know that the driver is being removed. It's the perfect time to undo the use_autosuspend. * The code of pm_runtime_dont_use_autosuspend() already handles the case of being called when autosuspend wasn't enabled. Suggested-by: Laurent Pinchart Signed-off-by: Douglas Anderson Reviewed-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- include/linux/pm_runtime.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h index 9f09601c465a..2bff6a10095d 100644 --- a/include/linux/pm_runtime.h +++ b/include/linux/pm_runtime.h @@ -567,6 +567,10 @@ static inline void pm_runtime_disable(struct device *dev) * Allow the runtime PM autosuspend mechanism to be used for @dev whenever * requested (or "autosuspend" will be handled as direct runtime-suspend for * it). + * + * NOTE: It's important to undo this with pm_runtime_dont_use_autosuspend() + * at driver exit time unless your driver initially enabled pm_runtime + * with devm_pm_runtime_enable() (which handles it for you). */ static inline void pm_runtime_use_autosuspend(struct device *dev) { -- cgit From fd8be27e50e04f6e80af0f3e327cced525558256 Mon Sep 17 00:00:00 2001 From: Michal Suchanek Date: Fri, 25 Feb 2022 21:51:35 +0100 Subject: efifb: Remove redundant efifb_setup_from_dmi stub efifb is the only user of efifb_setup_from_dmi which is provided by sysfb which is selected by efifb. That makes the stub redundant. Signed-off-by: Michal Suchanek Reviewed-by: Javier Martinez Canillas Signed-off-by: Thomas Zimmermann Link: https://patchwork.freedesktop.org/patch/msgid/7416c439d68e9e96068ea5c77e05c99c7df41750.1645822213.git.msuchanek@suse.de --- include/linux/efi.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index ccd4d3f91c98..0cbbc4103632 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1329,10 +1329,6 @@ static inline struct efi_mokvar_table_entry *efi_mokvar_entry_find( } #endif -#ifdef CONFIG_SYSFB extern void efifb_setup_from_dmi(struct screen_info *si, const char *opt); -#else -static inline void efifb_setup_from_dmi(struct screen_info *si, const char *opt) { } -#endif #endif /* _LINUX_EFI_H */ -- cgit From 92c45b63ce22c8898aa41806e8d6692bcd577510 Mon Sep 17 00:00:00 2001 From: Mark Tomlinson Date: Thu, 6 Aug 2020 16:14:55 +1200 Subject: PCI: Reduce warnings on possible RW1C corruption For hardware that only supports 32-bit writes to PCI there is the possibility of clearing RW1C (write-one-to-clear) bits. A rate-limited messages was introduced by fb2659230120, but rate-limiting is not the best choice here. Some devices may not show the warnings they should if another device has just produced a bunch of warnings. Also, the number of messages can be a nuisance on devices which are otherwise working fine. Change the ratelimit to a single warning per bus. This ensures no bus is 'starved' of emitting a warning and also that there isn't a continuous stream of warnings. It would be preferable to have a warning per device, but the pci_dev structure is not available here, and a lookup from devfn would be far too slow. Suggested-by: Bjorn Helgaas Fixes: fb2659230120 ("PCI: Warn on possible RW1C corruption for sub-32 bit config writes") Link: https://lore.kernel.org/r/20200806041455.11070-1-mark.tomlinson@alliedtelesis.co.nz Signed-off-by: Mark Tomlinson Signed-off-by: Bjorn Helgaas Reviewed-by: Florian Fainelli Reviewed-by: Rob Herring Acked-by: Scott Branden --- include/linux/pci.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/pci.h b/include/linux/pci.h index 8253a5413d7c..678fecdf6b81 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -668,6 +668,7 @@ struct pci_bus { struct bin_attribute *legacy_io; /* Legacy I/O for this bus */ struct bin_attribute *legacy_mem; /* Legacy mem */ unsigned int is_added:1; + unsigned int unsafe_warn:1; /* warned about RW1C config write */ }; #define to_pci_bus(n) container_of(n, struct pci_bus, dev) -- cgit From 5c26f6ac9416b63d093e29c30e79b3297e425472 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Fri, 4 Mar 2022 20:28:51 -0800 Subject: mm: refactor vm_area_struct::anon_vma_name usage code Avoid mixing strings and their anon_vma_name referenced pointers by using struct anon_vma_name whenever possible. This simplifies the code and allows easier sharing of anon_vma_name structures when they represent the same name. [surenb@google.com: fix comment] Link: https://lkml.kernel.org/r/20220223153613.835563-1-surenb@google.com Link: https://lkml.kernel.org/r/20220224231834.1481408-1-surenb@google.com Signed-off-by: Suren Baghdasaryan Suggested-by: Matthew Wilcox Suggested-by: Michal Hocko Acked-by: Michal Hocko Cc: Colin Cross Cc: Sumit Semwal Cc: Dave Hansen Cc: Kees Cook Cc: "Kirill A. Shutemov" Cc: Vlastimil Babka Cc: Johannes Weiner Cc: "Eric W. Biederman" Cc: Christian Brauner Cc: Alexey Gladkov Cc: Sasha Levin Cc: Chris Hyser Cc: Davidlohr Bueso Cc: Peter Collingbourne Cc: Xiaofeng Cao Cc: David Hildenbrand Cc: Cyrill Gorcunov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 7 ++-- include/linux/mm_inline.h | 87 ++++++++++++++++++++++++++++++++--------------- include/linux/mm_types.h | 5 ++- 3 files changed, 67 insertions(+), 32 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 213cc569b192..5744a3fc4716 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2626,7 +2626,7 @@ static inline int vma_adjust(struct vm_area_struct *vma, unsigned long start, extern struct vm_area_struct *vma_merge(struct mm_struct *, struct vm_area_struct *prev, unsigned long addr, unsigned long end, unsigned long vm_flags, struct anon_vma *, struct file *, pgoff_t, - struct mempolicy *, struct vm_userfaultfd_ctx, const char *); + struct mempolicy *, struct vm_userfaultfd_ctx, struct anon_vma_name *); extern struct anon_vma *find_mergeable_anon_vma(struct vm_area_struct *); extern int __split_vma(struct mm_struct *, struct vm_area_struct *, unsigned long addr, int new_below); @@ -3372,11 +3372,12 @@ static inline int seal_check_future_write(int seals, struct vm_area_struct *vma) #ifdef CONFIG_ANON_VMA_NAME int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, - unsigned long len_in, const char *name); + unsigned long len_in, + struct anon_vma_name *anon_name); #else static inline int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, - unsigned long len_in, const char *name) { + unsigned long len_in, struct anon_vma_name *anon_name) { return 0; } #endif diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index b725839dfe71..dd3accaa4e6d 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -140,50 +140,81 @@ static __always_inline void del_page_from_lru_list(struct page *page, #ifdef CONFIG_ANON_VMA_NAME /* - * mmap_lock should be read-locked when calling vma_anon_name() and while using - * the returned pointer. + * mmap_lock should be read-locked when calling anon_vma_name(). Caller should + * either keep holding the lock while using the returned pointer or it should + * raise anon_vma_name refcount before releasing the lock. */ -extern const char *vma_anon_name(struct vm_area_struct *vma); +extern struct anon_vma_name *anon_vma_name(struct vm_area_struct *vma); +extern struct anon_vma_name *anon_vma_name_alloc(const char *name); +extern void anon_vma_name_free(struct kref *kref); -/* - * mmap_lock should be read-locked for orig_vma->vm_mm. - * mmap_lock should be write-locked for new_vma->vm_mm or new_vma should be - * isolated. - */ -extern void dup_vma_anon_name(struct vm_area_struct *orig_vma, - struct vm_area_struct *new_vma); +/* mmap_lock should be read-locked */ +static inline void anon_vma_name_get(struct anon_vma_name *anon_name) +{ + if (anon_name) + kref_get(&anon_name->kref); +} -/* - * mmap_lock should be write-locked or vma should have been isolated under - * write-locked mmap_lock protection. - */ -extern void free_vma_anon_name(struct vm_area_struct *vma); +static inline void anon_vma_name_put(struct anon_vma_name *anon_name) +{ + if (anon_name) + kref_put(&anon_name->kref, anon_vma_name_free); +} -/* mmap_lock should be read-locked */ -static inline bool is_same_vma_anon_name(struct vm_area_struct *vma, - const char *name) +static inline void dup_anon_vma_name(struct vm_area_struct *orig_vma, + struct vm_area_struct *new_vma) +{ + struct anon_vma_name *anon_name = anon_vma_name(orig_vma); + + if (anon_name) { + anon_vma_name_get(anon_name); + new_vma->anon_name = anon_name; + } +} + +static inline void free_anon_vma_name(struct vm_area_struct *vma) { - const char *vma_name = vma_anon_name(vma); + /* + * Not using anon_vma_name because it generates a warning if mmap_lock + * is not held, which might be the case here. + */ + if (!vma->vm_file) + anon_vma_name_put(vma->anon_name); +} - /* either both NULL, or pointers to same string */ - if (vma_name == name) +static inline bool anon_vma_name_eq(struct anon_vma_name *anon_name1, + struct anon_vma_name *anon_name2) +{ + if (anon_name1 == anon_name2) return true; - return name && vma_name && !strcmp(name, vma_name); + return anon_name1 && anon_name2 && + !strcmp(anon_name1->name, anon_name2->name); } + #else /* CONFIG_ANON_VMA_NAME */ -static inline const char *vma_anon_name(struct vm_area_struct *vma) +static inline struct anon_vma_name *anon_vma_name(struct vm_area_struct *vma) { return NULL; } -static inline void dup_vma_anon_name(struct vm_area_struct *orig_vma, - struct vm_area_struct *new_vma) {} -static inline void free_vma_anon_name(struct vm_area_struct *vma) {} -static inline bool is_same_vma_anon_name(struct vm_area_struct *vma, - const char *name) + +static inline struct anon_vma_name *anon_vma_name_alloc(const char *name) +{ + return NULL; +} + +static inline void anon_vma_name_get(struct anon_vma_name *anon_name) {} +static inline void anon_vma_name_put(struct anon_vma_name *anon_name) {} +static inline void dup_anon_vma_name(struct vm_area_struct *orig_vma, + struct vm_area_struct *new_vma) {} +static inline void free_anon_vma_name(struct vm_area_struct *vma) {} + +static inline bool anon_vma_name_eq(struct anon_vma_name *anon_name1, + struct anon_vma_name *anon_name2) { return true; } + #endif /* CONFIG_ANON_VMA_NAME */ static inline void init_tlb_flush_pending(struct mm_struct *mm) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5140e5feb486..0f549870da6a 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -416,7 +416,10 @@ struct vm_area_struct { struct rb_node rb; unsigned long rb_subtree_last; } shared; - /* Serialized by mmap_sem. */ + /* + * Serialized by mmap_sem. Never use directly because it is + * valid only when vm_file is NULL. Use anon_vma_name instead. + */ struct anon_vma_name *anon_name; }; -- cgit From 96403e11283def1d1c465c8279514c9a504d8630 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Fri, 4 Mar 2022 20:28:55 -0800 Subject: mm: prevent vm_area_struct::anon_name refcount saturation A deep process chain with many vmas could grow really high. With default sysctl_max_map_count (64k) and default pid_max (32k) the max number of vmas in the system is 2147450880 and the refcounter has headroom of 1073774592 before it reaches REFCOUNT_SATURATED (3221225472). Therefore it's unlikely that an anonymous name refcounter will overflow with these defaults. Currently the max for pid_max is PID_MAX_LIMIT (4194304) and for sysctl_max_map_count it's INT_MAX (2147483647). In this configuration anon_vma_name refcount overflow becomes theoretically possible (that still require heavy sharing of that anon_vma_name between processes). kref refcounting interface used in anon_vma_name structure will detect a counter overflow when it reaches REFCOUNT_SATURATED value but will only generate a warning and freeze the ref counter. This would lead to the refcounted object never being freed. A determined attacker could leak memory like that but it would be rather expensive and inefficient way to do so. To ensure anon_vma_name refcount does not overflow, stop anon_vma_name sharing when the refcount reaches REFCOUNT_MAX (2147483647), which still leaves INT_MAX/2 (1073741823) values before the counter reaches REFCOUNT_SATURATED. This should provide enough headroom for raising the refcounts temporarily. Link: https://lkml.kernel.org/r/20220223153613.835563-2-surenb@google.com Signed-off-by: Suren Baghdasaryan Suggested-by: Michal Hocko Acked-by: Michal Hocko Cc: Alexey Gladkov Cc: Chris Hyser Cc: Christian Brauner Cc: Colin Cross Cc: Cyrill Gorcunov Cc: Dave Hansen Cc: David Hildenbrand Cc: Davidlohr Bueso Cc: "Eric W. Biederman" Cc: Johannes Weiner Cc: Kees Cook Cc: "Kirill A. Shutemov" Cc: Matthew Wilcox Cc: Peter Collingbourne Cc: Sasha Levin Cc: Sumit Semwal Cc: Vlastimil Babka Cc: Xiaofeng Cao Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm_inline.h | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index dd3accaa4e6d..cf90b1fa2c60 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -161,15 +161,25 @@ static inline void anon_vma_name_put(struct anon_vma_name *anon_name) kref_put(&anon_name->kref, anon_vma_name_free); } +static inline +struct anon_vma_name *anon_vma_name_reuse(struct anon_vma_name *anon_name) +{ + /* Prevent anon_name refcount saturation early on */ + if (kref_read(&anon_name->kref) < REFCOUNT_MAX) { + anon_vma_name_get(anon_name); + return anon_name; + + } + return anon_vma_name_alloc(anon_name->name); +} + static inline void dup_anon_vma_name(struct vm_area_struct *orig_vma, struct vm_area_struct *new_vma) { struct anon_vma_name *anon_name = anon_vma_name(orig_vma); - if (anon_name) { - anon_vma_name_get(anon_name); - new_vma->anon_name = anon_name; - } + if (anon_name) + new_vma->anon_name = anon_vma_name_reuse(anon_name); } static inline void free_anon_vma_name(struct vm_area_struct *vma) -- cgit From 25b35dd28138f61f9a0fb8b76c0483761fd228bd Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Sat, 5 Mar 2022 04:16:38 +0530 Subject: bpf: Add check_func_arg_reg_off function Lift the list of register types allowed for having fixed and variable offsets when passed as helper function arguments into a common helper, so that they can be reused for kfunc checks in later commits. Keeping a common helper aids maintainability and allows us to follow the same consistent rules across helpers and kfuncs. Also, convert check_func_arg to use this function. Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220304224645.3677453-2-memxor@gmail.com --- include/linux/bpf_verifier.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 7a7be8c057f2..38b24ee8d8c2 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -521,6 +521,9 @@ bpf_prog_offload_remove_insns(struct bpf_verifier_env *env, u32 off, u32 cnt); int check_ptr_off_reg(struct bpf_verifier_env *env, const struct bpf_reg_state *reg, int regno); +int check_func_arg_reg_off(struct bpf_verifier_env *env, + const struct bpf_reg_state *reg, int regno, + enum bpf_arg_type arg_type); int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg, u32 regno); int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg, -- cgit From 24d5bb806c7e2c0b9972564fd493069f612d90dd Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Sat, 5 Mar 2022 04:16:41 +0530 Subject: bpf: Harden register offset checks for release helpers and kfuncs Let's ensure that the PTR_TO_BTF_ID reg being passed in to release BPF helpers and kfuncs always has its offset set to 0. While not a real problem now, there's a very real possibility this will become a problem when more and more kfuncs are exposed, and more BPF helpers are added which can release PTR_TO_BTF_ID. Previous commits already protected against non-zero var_off. One of the case we are concerned about now is when we have a type that can be returned by e.g. an acquire kfunc: struct foo { int a; int b; struct bar b; }; ... and struct bar is also a type that can be returned by another acquire kfunc. Then, doing the following sequence: struct foo *f = bpf_get_foo(); // acquire kfunc if (!f) return 0; bpf_put_bar(&f->b); // release kfunc ... would work with the current code, since the btf_struct_ids_match takes reg->off into account for matching pointer type with release kfunc argument type, but would obviously be incorrect, and most likely lead to a kernel crash. A test has been included later to prevent regressions in this area. Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220304224645.3677453-5-memxor@gmail.com --- include/linux/bpf_verifier.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 38b24ee8d8c2..c1fc4af47f69 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -523,7 +523,8 @@ int check_ptr_off_reg(struct bpf_verifier_env *env, const struct bpf_reg_state *reg, int regno); int check_func_arg_reg_off(struct bpf_verifier_env *env, const struct bpf_reg_state *reg, int regno, - enum bpf_arg_type arg_type); + enum bpf_arg_type arg_type, + bool is_release_func); int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg, u32 regno); int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg, -- cgit From f014a00bbeb09cea16017b82448d32a468a6b96f Mon Sep 17 00:00:00 2001 From: Nathan Chancellor Date: Sat, 5 Mar 2022 04:16:42 +0530 Subject: compiler-clang.h: Add __diag infrastructure for clang Add __diag macros similar to those in compiler-gcc.h, so that warnings that need to be adjusted for specific cases but not globally can be ignored when building with clang. Signed-off-by: Nathan Chancellor Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220304224645.3677453-6-memxor@gmail.com [ Kartikeya: wrote commit message ] --- include/linux/compiler-clang.h | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) (limited to 'include/linux') diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index 3c4de9b6c6e3..f1aa41d520bd 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -68,3 +68,25 @@ #define __nocfi __attribute__((__no_sanitize__("cfi"))) #define __cficanonical __attribute__((__cfi_canonical_jump_table__)) + +/* + * Turn individual warnings and errors on and off locally, depending + * on version. + */ +#define __diag_clang(version, severity, s) \ + __diag_clang_ ## version(__diag_clang_ ## severity s) + +/* Severity used in pragma directives */ +#define __diag_clang_ignore ignored +#define __diag_clang_warn warning +#define __diag_clang_error error + +#define __diag_str1(s) #s +#define __diag_str(s) __diag_str1(s) +#define __diag(s) _Pragma(__diag_str(clang diagnostic s)) + +#if CONFIG_CLANG_VERSION >= 110000 +#define __diag_clang_11(s) __diag(s) +#else +#define __diag_clang_11(s) +#endif -- cgit From 4d1ea705d797e66edd70ffa708b83888a210a437 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Sat, 5 Mar 2022 04:16:43 +0530 Subject: compiler_types.h: Add unified __diag_ignore_all for GCC/LLVM Add a __diag_ignore_all macro, to ignore warnings for both GCC and LLVM, without having to specify the compiler type and version. By default, GCC 8 and clang 11 are used. This will be used by bpf subsystem to ignore -Wmissing-prototypes warning for functions that are meant to be global functions so that they are in vmlinux BTF, but don't have a prototype. Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220304224645.3677453-7-memxor@gmail.com --- include/linux/compiler-clang.h | 3 +++ include/linux/compiler-gcc.h | 3 +++ include/linux/compiler_types.h | 4 ++++ 3 files changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index f1aa41d520bd..babb1347148c 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -90,3 +90,6 @@ #else #define __diag_clang_11(s) #endif + +#define __diag_ignore_all(option, comment) \ + __diag_clang(11, ignore, option) diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index ccbbd31b3aae..d364c98a4a80 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -151,6 +151,9 @@ #define __diag_GCC_8(s) #endif +#define __diag_ignore_all(option, comment) \ + __diag_GCC(8, ignore, option) + /* * Prior to 9.1, -Wno-alloc-size-larger-than (and therefore the "alloc_size" * attribute) do not work, and must be disabled. diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h index 3f31ff400432..8e5d2f50f951 100644 --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -371,4 +371,8 @@ struct ftrace_likely_data { #define __diag_error(compiler, version, option, comment) \ __diag_ ## compiler(version, error, option) +#ifndef __diag_ignore_all +#define __diag_ignore_all(option, comment) +#endif + #endif /* __LINUX_COMPILER_TYPES_H */ -- cgit From 9216c916237805c93d054ed022afb172ddbc3ed1 Mon Sep 17 00:00:00 2001 From: Hao Luo Date: Fri, 4 Mar 2022 11:16:55 -0800 Subject: compiler_types: Define __percpu as __attribute__((btf_type_tag("percpu"))) This is similar to commit 7472d5a642c9 ("compiler_types: define __user as __attribute__((btf_type_tag("user")))"), where a type tag "user" was introduced to identify the pointers that point to user memory. With that change, the newest compile toolchain can encode __user information into vmlinux BTF, which can be used by the BPF verifier to enforce safe program behaviors. Similarly, we have __percpu attribute, which is mainly used to indicate memory is allocated in percpu region. The __percpu pointers in kernel are supposed to be used together with functions like per_cpu_ptr() and this_cpu_ptr(), which perform necessary calculation on the pointer's base address. Without the btf_type_tag introduced in this patch, __percpu pointers will be treated as regular memory pointers in vmlinux BTF and BPF programs are allowed to directly dereference them, generating incorrect behaviors. Now with "percpu" btf_type_tag, the BPF verifier is able to differentiate __percpu pointers from regular pointers and forbids unexpected behaviors like direct load. The following is an example similar to the one given in commit 7472d5a642c9: [$ ~] cat test.c #define __percpu __attribute__((btf_type_tag("percpu"))) int foo(int __percpu *arg) { return *arg; } [$ ~] clang -O2 -g -c test.c [$ ~] pahole -JV test.o ... File test.o: [1] INT int size=4 nr_bits=32 encoding=SIGNED [2] TYPE_TAG percpu type_id=1 [3] PTR (anon) type_id=2 [4] FUNC_PROTO (anon) return=1 args=(3 arg) [5] FUNC foo type_id=4 [$ ~] for the function argument "int __percpu *arg", its type is described as PTR -> TYPE_TAG(percpu) -> INT The kernel can use this information for bpf verification or other use cases. Like commit 7472d5a642c9, this feature requires clang (>= clang14) and pahole (>= 1.23). Signed-off-by: Hao Luo Signed-off-by: Alexei Starovoitov Acked-by: Yonghong Song Link: https://lore.kernel.org/bpf/20220304191657.981240-3-haoluo@google.com --- include/linux/compiler_types.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h index 8e5d2f50f951..b9a8ae9440c7 100644 --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -38,7 +38,12 @@ static inline void __chk_io_ptr(const volatile void __iomem *ptr) { } # define __user # endif # define __iomem -# define __percpu +# if defined(CONFIG_DEBUG_INFO_BTF) && defined(CONFIG_PAHOLE_HAS_BTF_TAG) && \ + __has_attribute(btf_type_tag) +# define __percpu __attribute__((btf_type_tag("percpu"))) +# else +# define __percpu +# endif # define __rcu # define __chk_user_ptr(x) (void)0 # define __chk_io_ptr(x) (void)0 -- cgit From 5844101a1be9b8636024cb31c865ef13c7cc6db3 Mon Sep 17 00:00:00 2001 From: Hao Luo Date: Fri, 4 Mar 2022 11:16:56 -0800 Subject: bpf: Reject programs that try to load __percpu memory. With the introduction of the btf_type_tag "percpu", we can add a MEM_PERCPU to identify those pointers that point to percpu memory. The ability of differetiating percpu pointers from regular memory pointers have two benefits: 1. It forbids unexpected use of percpu pointers, such as direct loads. In kernel, there are special functions used for accessing percpu memory. Directly loading percpu memory is meaningless. We already have BPF helpers like bpf_per_cpu_ptr() and bpf_this_cpu_ptr() that wrap the kernel percpu functions. So we can now convert percpu pointers into regular pointers in a safe way. 2. Previously, bpf_per_cpu_ptr() and bpf_this_cpu_ptr() only work on PTR_TO_PERCPU_BTF_ID, a special reg_type which describes static percpu variables in kernel (we rely on pahole to encode them into vmlinux BTF). Now, since we can identify __percpu tagged pointers, we can also identify dynamically allocated percpu memory as well. It means we can use bpf_xxx_cpu_ptr() on dynamic percpu memory. This would be very convenient when accessing fields like "cgroup->rstat_cpu". Signed-off-by: Hao Luo Signed-off-by: Alexei Starovoitov Acked-by: Yonghong Song Link: https://lore.kernel.org/bpf/20220304191657.981240-4-haoluo@google.com --- include/linux/bpf.h | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index f19abc59b6cd..88449fbbe063 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -334,7 +334,15 @@ enum bpf_type_flag { /* MEM is in user address space. */ MEM_USER = BIT(3 + BPF_BASE_TYPE_BITS), - __BPF_TYPE_LAST_FLAG = MEM_USER, + /* MEM is a percpu memory. MEM_PERCPU tags PTR_TO_BTF_ID. When tagged + * with MEM_PERCPU, PTR_TO_BTF_ID _cannot_ be directly accessed. In + * order to drop this tag, it must be passed into bpf_per_cpu_ptr() + * or bpf_this_cpu_ptr(), which will return the pointer corresponding + * to the specified cpu. + */ + MEM_PERCPU = BIT(4 + BPF_BASE_TYPE_BITS), + + __BPF_TYPE_LAST_FLAG = MEM_PERCPU, }; /* Max number of base types. */ @@ -516,7 +524,6 @@ enum bpf_reg_type { */ PTR_TO_MEM, /* reg points to valid memory region */ PTR_TO_BUF, /* reg points to a read/write buffer */ - PTR_TO_PERCPU_BTF_ID, /* reg points to a percpu kernel variable */ PTR_TO_FUNC, /* reg points to a bpf program function */ __BPF_REG_TYPE_MAX, -- cgit From 736f16de75f9bb32d76f652cb66f04d1bc685057 Mon Sep 17 00:00:00 2001 From: Dongli Zhang Date: Fri, 4 Mar 2022 06:55:05 -0800 Subject: net: tap: track dropped skb via kfree_skb_reason() The TAP can be used as vhost-net backend. E.g., the tap_handle_frame() is the interface to forward the skb from TAP to vhost-net/virtio-net. However, there are many "goto drop" in the TAP driver. Therefore, the kfree_skb_reason() is involved at each "goto drop" to help userspace ftrace/ebpf to track the reason for the loss of packets. The below reasons are introduced: - SKB_DROP_REASON_SKB_CSUM - SKB_DROP_REASON_SKB_GSO_SEG - SKB_DROP_REASON_SKB_UCOPY_FAULT - SKB_DROP_REASON_DEV_HDR - SKB_DROP_REASON_FULL_RING Cc: Joao Martins Cc: Joe Jin Signed-off-by: Dongli Zhang Signed-off-by: David S. Miller --- include/linux/skbuff.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2be263184d1e..67cfff4065b6 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -412,6 +412,19 @@ enum skb_drop_reason { * this means that L3 protocol is * not supported */ + SKB_DROP_REASON_SKB_CSUM, /* sk_buff checksum computation + * error + */ + SKB_DROP_REASON_SKB_GSO_SEG, /* gso segmentation error */ + SKB_DROP_REASON_SKB_UCOPY_FAULT, /* failed to copy data from + * user space, e.g., via + * zerocopy_sg_from_iter() + * or skb_orphan_frags_rx() + */ + SKB_DROP_REASON_DEV_HDR, /* device driver specific + * header/metadata is invalid + */ + SKB_DROP_REASON_FULL_RING, /* ring buffer is full */ SKB_DROP_REASON_MAX, }; -- cgit From 4b4f052e2d89c2eb7e13ee28ba9e85f8097aef3d Mon Sep 17 00:00:00 2001 From: Dongli Zhang Date: Fri, 4 Mar 2022 06:55:07 -0800 Subject: net: tun: track dropped skb via kfree_skb_reason() The TUN can be used as vhost-net backend. E.g, the tun_net_xmit() is the interface to forward the skb from TUN to vhost-net/virtio-net. However, there are many "goto drop" in the TUN driver. Therefore, the kfree_skb_reason() is involved at each "goto drop" to help userspace ftrace/ebpf to track the reason for the loss of packets. The below reasons are introduced: - SKB_DROP_REASON_DEV_READY - SKB_DROP_REASON_NOMEM - SKB_DROP_REASON_HDR_TRUNC - SKB_DROP_REASON_TAP_FILTER - SKB_DROP_REASON_TAP_TXFILTER Cc: Joao Martins Cc: Joe Jin Signed-off-by: Dongli Zhang Signed-off-by: David S. Miller --- include/linux/skbuff.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 67cfff4065b6..34f572271c0c 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -424,7 +424,25 @@ enum skb_drop_reason { SKB_DROP_REASON_DEV_HDR, /* device driver specific * header/metadata is invalid */ + /* the device is not ready to xmit/recv due to any of its data + * structure that is not up/ready/initialized, e.g., the IFF_UP is + * not set, or driver specific tun->tfiles[txq] is not initialized + */ + SKB_DROP_REASON_DEV_READY, SKB_DROP_REASON_FULL_RING, /* ring buffer is full */ + SKB_DROP_REASON_NOMEM, /* error due to OOM */ + SKB_DROP_REASON_HDR_TRUNC, /* failed to trunc/extract the header + * from networking data, e.g., failed + * to pull the protocol header from + * frags via pskb_may_pull() + */ + SKB_DROP_REASON_TAP_FILTER, /* dropped by (ebpf) filter directly + * attached to tun/tap, e.g., via + * TUNSETFILTEREBPF + */ + SKB_DROP_REASON_TAP_TXFILTER, /* dropped by tx filter implemented + * at tun/tap, e.g., check_filter() + */ SKB_DROP_REASON_MAX, }; -- cgit From f72de02ebece2e962462bc0c1e9efd29eaa029b2 Mon Sep 17 00:00:00 2001 From: Kurt Kanzenbach Date: Sat, 5 Mar 2022 12:21:25 +0100 Subject: ptp: Add generic PTP is_sync() function PHY drivers such as micrel or dp83640 need to analyze whether a given skb is a PTP sync message for one step functionality. In order to avoid code duplication introduce a generic function and move it to ptp classify. Signed-off-by: Kurt Kanzenbach Signed-off-by: David S. Miller --- include/linux/ptp_classify.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ptp_classify.h b/include/linux/ptp_classify.h index 9afd34a2d36c..fefa7790dc46 100644 --- a/include/linux/ptp_classify.h +++ b/include/linux/ptp_classify.h @@ -126,6 +126,17 @@ static inline u8 ptp_get_msgtype(const struct ptp_header *hdr, return msgtype; } +/** + * ptp_msg_is_sync - Evaluates whether the given skb is a PTP Sync message + * @skb: packet buffer + * @type: type of the packet (see ptp_classify_raw()) + * + * This function evaluates whether the given skb is a PTP Sync message. + * + * Return: true if sync message, false otherwise + */ +bool ptp_msg_is_sync(struct sk_buff *skb, unsigned int type); + void __init ptp_classifier_init(void); #else static inline void ptp_classifier_init(void) @@ -148,5 +159,9 @@ static inline u8 ptp_get_msgtype(const struct ptp_header *hdr, */ return PTP_MSGTYPE_SYNC; } +static inline bool ptp_msg_is_sync(struct sk_buff *skb, unsigned int type) +{ + return false; +} #endif #endif /* _PTP_CLASSIFY_H_ */ -- cgit From 2655926aea9beea62c9ba80c032485456fd848f0 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Sun, 6 Mar 2022 22:57:52 +0100 Subject: net: Remove netif_rx_any_context() and netif_rx_ni(). Remove netif_rx_any_context and netif_rx_ni() because there are no more users in tree. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: David S. Miller --- include/linux/netdevice.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 19a27ac361ef..29a850a8d460 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3718,16 +3718,6 @@ int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb); int netif_rx(struct sk_buff *skb); int __netif_rx(struct sk_buff *skb); -static inline int netif_rx_ni(struct sk_buff *skb) -{ - return netif_rx(skb); -} - -static inline int netif_rx_any_context(struct sk_buff *skb) -{ - return netif_rx(skb); -} - int netif_receive_skb(struct sk_buff *skb); int netif_receive_skb_core(struct sk_buff *skb); void netif_receive_skb_list_internal(struct list_head *head); -- cgit From 23c7f8d7989e1646aac82f75761b7648c355cb8a Mon Sep 17 00:00:00 2001 From: Steffen Klassert Date: Mon, 7 Mar 2022 13:11:41 +0100 Subject: net: Fix esp GSO on inter address family tunnels. The esp tunnel GSO handlers use skb_mac_gso_segment to push the inner packet to the segmentation handlers. However, skb_mac_gso_segment takes the Ethernet Protocol ID from 'skb->protocol' which is wrong for inter address family tunnels. We fix this by introducing a new skb_eth_gso_segment function. This function can be used if it is necessary to pass the Ethernet Protocol ID directly to the segmentation handler. First users of this function will be the esp4 and esp6 tunnel segmentation handlers. Fixes: c35fe4106b92 ("xfrm: Add mode handlers for IPsec on layer 2") Signed-off-by: Steffen Klassert --- include/linux/netdevice.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 8b5a314db167..f53ea7038441 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4602,6 +4602,8 @@ int skb_csum_hwoffload_help(struct sk_buff *skb, struct sk_buff *__skb_gso_segment(struct sk_buff *skb, netdev_features_t features, bool tx_path); +struct sk_buff *skb_eth_gso_segment(struct sk_buff *skb, + netdev_features_t features, __be16 type); struct sk_buff *skb_mac_gso_segment(struct sk_buff *skb, netdev_features_t features); -- cgit From 97939610b893de068c82c347d06319cd231a4602 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 4 Mar 2022 19:01:05 +0100 Subject: block: remove bio_devname All callers are gone, so remove this wrapper. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Reviewed-by: Johannes Thumshirn Link: https://lore.kernel.org/r/20220304180105.409765-11-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/bio.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index 7523aba4ddf7..4c21f6e69e18 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -491,8 +491,6 @@ static inline void bio_release_pages(struct bio *bio, bool mark_dirty) __bio_release_pages(bio, mark_dirty); } -extern const char *bio_devname(struct bio *bio, char *buffer); - #define bio_dev(bio) \ disk_devt((bio)->bi_bdev->bd_disk) -- cgit From a26d84633c2ba3c41fd7692986fb2231c9228c4e Mon Sep 17 00:00:00 2001 From: Luca Ceresoli Date: Wed, 23 Feb 2022 18:59:02 +0100 Subject: rtc: max77686: Rename day-of-month defines RTC_DATE and REG_RTC_DATE are used for the registers holding the day of month. Rename these constants to mean what they mean. Signed-off-by: Luca Ceresoli Reviewed-by: Krzysztof Kozlowski Acked-by: Alexandre Belloni Signed-off-by: Lee Jones --- include/linux/mfd/max77686-private.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mfd/max77686-private.h b/include/linux/mfd/max77686-private.h index b1482b3cf353..3acceeedbaba 100644 --- a/include/linux/mfd/max77686-private.h +++ b/include/linux/mfd/max77686-private.h @@ -152,7 +152,7 @@ enum max77686_rtc_reg { MAX77686_RTC_WEEKDAY = 0x0A, MAX77686_RTC_MONTH = 0x0B, MAX77686_RTC_YEAR = 0x0C, - MAX77686_RTC_DATE = 0x0D, + MAX77686_RTC_MONTHDAY = 0x0D, MAX77686_ALARM1_SEC = 0x0E, MAX77686_ALARM1_MIN = 0x0F, MAX77686_ALARM1_HOUR = 0x10, @@ -352,7 +352,7 @@ enum max77802_rtc_reg { MAX77802_RTC_WEEKDAY = 0xCA, MAX77802_RTC_MONTH = 0xCB, MAX77802_RTC_YEAR = 0xCC, - MAX77802_RTC_DATE = 0xCD, + MAX77802_RTC_MONTHDAY = 0xCD, MAX77802_RTC_AE1 = 0xCE, MAX77802_ALARM1_SEC = 0xCF, MAX77802_ALARM1_MIN = 0xD0, -- cgit From 60b050ff3a60273d56b4017d0f45d5ca71aae5ad Mon Sep 17 00:00:00 2001 From: Luca Ceresoli Date: Wed, 23 Feb 2022 18:59:05 +0100 Subject: mfd: max77714: Add driver for Maxim MAX77714 PMIC Add a simple driver for the Maxim MAX77714 PMIC, supporting RTC and watchdog only. Signed-off-by: Luca Ceresoli Reviewed-by: Krzysztof Kozlowski Signed-off-by: Lee Jones --- include/linux/mfd/max77714.h | 60 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 60 insertions(+) create mode 100644 include/linux/mfd/max77714.h (limited to 'include/linux') diff --git a/include/linux/mfd/max77714.h b/include/linux/mfd/max77714.h new file mode 100644 index 000000000000..a970dc455426 --- /dev/null +++ b/include/linux/mfd/max77714.h @@ -0,0 +1,60 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Maxim MAX77714 Register and data structures definition. + * + * Copyright (C) 2022 Luca Ceresoli + * Author: Luca Ceresoli + */ + +#ifndef __LINUX_MFD_MAX77714_H_ +#define __LINUX_MFD_MAX77714_H_ + +#include + +#define MAX77714_INT_TOP 0x00 +#define MAX77714_INT_TOPM 0x07 /* Datasheet says "read only", but it is RW */ + +#define MAX77714_INT_TOP_ONOFF BIT(1) +#define MAX77714_INT_TOP_RTC BIT(3) +#define MAX77714_INT_TOP_GPIO BIT(4) +#define MAX77714_INT_TOP_LDO BIT(5) +#define MAX77714_INT_TOP_SD BIT(6) +#define MAX77714_INT_TOP_GLBL BIT(7) + +#define MAX77714_32K_STATUS 0x30 +#define MAX77714_32K_STATUS_SIOSCOK BIT(5) +#define MAX77714_32K_STATUS_XOSCOK BIT(4) +#define MAX77714_32K_STATUS_32KSOURCE BIT(3) +#define MAX77714_32K_STATUS_32KLOAD_MSK 0x3 +#define MAX77714_32K_STATUS_32KLOAD_SHF 1 +#define MAX77714_32K_STATUS_CRYSTAL_CFG BIT(0) + +#define MAX77714_32K_CONFIG 0x31 +#define MAX77714_32K_CONFIG_XOSC_RETRY BIT(4) + +#define MAX77714_CNFG_GLBL2 0x91 +#define MAX77714_WDTEN BIT(2) +#define MAX77714_WDTSLPC BIT(3) +#define MAX77714_TWD_MASK 0x3 +#define MAX77714_TWD_2s 0x0 +#define MAX77714_TWD_16s 0x1 +#define MAX77714_TWD_64s 0x2 +#define MAX77714_TWD_128s 0x3 + +#define MAX77714_CNFG_GLBL3 0x92 +#define MAX77714_WDTC BIT(0) + +#define MAX77714_CNFG2_ONOFF 0x94 +#define MAX77714_WD_RST_WK BIT(5) + +/* Interrupts */ +enum { + MAX77714_IRQ_TOP_ONOFF, + MAX77714_IRQ_TOP_RTC, /* Real-time clock */ + MAX77714_IRQ_TOP_GPIO, /* GPIOs */ + MAX77714_IRQ_TOP_LDO, /* Low-dropout regulators */ + MAX77714_IRQ_TOP_SD, /* Step-down regulators */ + MAX77714_IRQ_TOP_GLBL, /* "Global resources": Low-Battery, overtemp... */ +}; + +#endif /* __LINUX_MFD_MAX77714_H_ */ -- cgit From c47383f849097c2b3547e28365578cd9e5811378 Mon Sep 17 00:00:00 2001 From: Johnson Wang Date: Thu, 6 Jan 2022 14:54:04 +0800 Subject: mfd: Add support for the MediaTek MT6366 PMIC This adds support for the MediaTek MT6366 PMIC. This is a multifunction device with the following sub modules: - Regulator - RTC - Codec - Interrupt It is interfaced to the host controller using SPI interface by a proprietary hardware called PMIC wrapper or pwrap. MT6366 MFD is a child device of the pwrap. Signed-off-by: Johnson Wang Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220106065407.16036-2-johnson.wang@mediatek.com --- include/linux/mfd/mt6358/registers.h | 7 +++++++ include/linux/mfd/mt6397/core.h | 1 + 2 files changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mfd/mt6358/registers.h b/include/linux/mfd/mt6358/registers.h index 201139b12140..3d33517f178c 100644 --- a/include/linux/mfd/mt6358/registers.h +++ b/include/linux/mfd/mt6358/registers.h @@ -94,6 +94,10 @@ #define MT6358_BUCK_VCORE_CON0 0x1488 #define MT6358_BUCK_VCORE_DBG0 0x149e #define MT6358_BUCK_VCORE_DBG1 0x14a0 +#define MT6358_BUCK_VCORE_SSHUB_CON0 0x14a4 +#define MT6358_BUCK_VCORE_SSHUB_CON1 0x14a6 +#define MT6358_BUCK_VCORE_SSHUB_ELR0 MT6358_BUCK_VCORE_SSHUB_CON1 +#define MT6358_BUCK_VCORE_SSHUB_DBG1 MT6358_BUCK_VCORE_DBG1 #define MT6358_BUCK_VCORE_ELR0 0x14aa #define MT6358_BUCK_VGPU_CON0 0x1508 #define MT6358_BUCK_VGPU_DBG0 0x151e @@ -169,6 +173,9 @@ #define MT6358_LDO_VSRAM_OTHERS_CON0 0x1ba6 #define MT6358_LDO_VSRAM_OTHERS_DBG0 0x1bc0 #define MT6358_LDO_VSRAM_OTHERS_DBG1 0x1bc2 +#define MT6358_LDO_VSRAM_OTHERS_SSHUB_CON0 0x1bc4 +#define MT6358_LDO_VSRAM_OTHERS_SSHUB_CON1 0x1bc6 +#define MT6358_LDO_VSRAM_OTHERS_SSHUB_DBG1 MT6358_LDO_VSRAM_OTHERS_DBG1 #define MT6358_LDO_VSRAM_GPU_CON0 0x1bc8 #define MT6358_LDO_VSRAM_GPU_DBG0 0x1be2 #define MT6358_LDO_VSRAM_GPU_DBG1 0x1be4 diff --git a/include/linux/mfd/mt6397/core.h b/include/linux/mfd/mt6397/core.h index 56f210eebc54..1cf78726503b 100644 --- a/include/linux/mfd/mt6397/core.h +++ b/include/linux/mfd/mt6397/core.h @@ -14,6 +14,7 @@ enum chip_id { MT6323_CHIP_ID = 0x23, MT6358_CHIP_ID = 0x58, MT6359_CHIP_ID = 0x59, + MT6366_CHIP_ID = 0x66, MT6391_CHIP_ID = 0x91, MT6397_CHIP_ID = 0x97, }; -- cgit From aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13 Mon Sep 17 00:00:00 2001 From: Halil Pasic Date: Sat, 5 Mar 2022 18:07:14 +0100 Subject: swiotlb: rework "fix info leak with DMA_FROM_DEVICE" Unfortunately, we ended up merging an old version of the patch "fix info leak with DMA_FROM_DEVICE" instead of merging the latest one. Christoph (the swiotlb maintainer), he asked me to create an incremental fix (after I have pointed this out the mix up, and asked him for guidance). So here we go. The main differences between what we got and what was agreed are: * swiotlb_sync_single_for_device is also required to do an extra bounce * We decided not to introduce DMA_ATTR_OVERWRITE until we have exploiters * The implantation of DMA_ATTR_OVERWRITE is flawed: DMA_ATTR_OVERWRITE must take precedence over DMA_ATTR_SKIP_CPU_SYNC Thus this patch removes DMA_ATTR_OVERWRITE, and makes swiotlb_sync_single_for_device() bounce unconditionally (that is, also when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale data from the swiotlb buffer. Let me note, that if the size used with dma_sync_* API is less than the size used with dma_[un]map_*, under certain circumstances we may still end up with swiotlb not being transparent. In that sense, this is no perfect fix either. To get this bullet proof, we would have to bounce the entire mapping/bounce buffer. For that we would have to figure out the starting address, and the size of the mapping in swiotlb_sync_single_for_device(). While this does seem possible, there seems to be no firm consensus on how things are supposed to work. Signed-off-by: Halil Pasic Fixes: ddbd89deb7d3 ("swiotlb: fix info leak with DMA_FROM_DEVICE") Cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig Signed-off-by: Linus Torvalds --- include/linux/dma-mapping.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 6150d11a607e..dca2b1355bb1 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -61,14 +61,6 @@ */ #define DMA_ATTR_PRIVILEGED (1UL << 9) -/* - * This is a hint to the DMA-mapping subsystem that the device is expected - * to overwrite the entire mapped size, thus the caller does not require any - * of the previous buffer contents to be preserved. This allows - * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers. - */ -#define DMA_ATTR_OVERWRITE (1UL << 10) - /* * A dma_addr_t can hold any valid DMA or bus address for the platform. It can * be given to a device to use as a DMA source or target. It is specific to a -- cgit From c75e707fe1aab32f1dc8e09845533b6542d9aaa9 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 4 Mar 2022 18:55:56 +0100 Subject: block: remove the per-bio/request write hint With the NVMe support for this gone, there are no consumers of these hints left, so remove them. Signed-off-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220304175556.407719-2-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blk_types.h | 1 - include/linux/blkdev.h | 3 --- 2 files changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 5561e58d158a..ba8cfa57255f 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -250,7 +250,6 @@ struct bio { */ unsigned short bi_flags; /* BIO_* below */ unsigned short bi_ioprio; - unsigned short bi_write_hint; blk_status_t bi_status; atomic_t __bi_remaining; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index a12c031af887..226836ccd060 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -518,9 +518,6 @@ struct request_queue { bool mq_sysfs_init_done; -#define BLK_MAX_WRITE_HINTS 5 - u64 write_hints[BLK_MAX_WRITE_HINTS]; - /* * Independent sector access ranges. This is always NULL for * devices that do not have multiple independent access ranges. -- cgit From c340b990d58c856c1636e0c10abb9e4351ad852a Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Thu, 3 Mar 2022 12:13:05 -0800 Subject: block: support pi with extended metadata The nvme spec allows protection information formats with metadata extending beyond the pi field. Use the actual size of the metadata field for incrementing the buffer. Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Reviewed-by: Martin K. Petersen Signed-off-by: Keith Busch Link: https://lore.kernel.org/r/20220303201312.3255347-2-kbusch@kernel.org Signed-off-by: Jens Axboe --- include/linux/blk-integrity.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/blk-integrity.h b/include/linux/blk-integrity.h index 8a038ea0717e..378b2459efe2 100644 --- a/include/linux/blk-integrity.h +++ b/include/linux/blk-integrity.h @@ -19,6 +19,7 @@ struct blk_integrity_iter { sector_t seed; unsigned int data_size; unsigned short interval; + unsigned char tuple_size; const char *disk_name; }; -- cgit From 7ee8809df990d1de379002973baee1681e8d7dd3 Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Thu, 3 Mar 2022 12:13:08 -0800 Subject: linux/kernel: introduce lower_48_bits function Recent data integrity field enhancements allow reference tags to be up to 48 bits. Introduce an inline helper function since this will be a repeated operation. Suggested-by: Bart Van Assche Signed-off-by: Keith Busch Reviewed-by: Bart Van Assche Link: https://lore.kernel.org/r/20220303201312.3255347-5-kbusch@kernel.org Signed-off-by: Jens Axboe --- include/linux/kernel.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 33f47a996513..3877478681b9 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -63,6 +63,15 @@ } \ ) +/** + * lower_48_bits() - return bits 0-47 of a number + * @n: the number we're accessing + */ +static inline u64 lower_48_bits(u64 n) +{ + return n & ((1ull << 48) - 1); +} + /** * upper_32_bits - return bits 32-63 of a number * @n: the number we're accessing -- cgit From cbc0a40e17da361a2ada8d669413ccfbd2028f2d Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Thu, 3 Mar 2022 12:13:09 -0800 Subject: lib: add rocksoft model crc64 The NVM Express specification extended data integrity fields to 64 bits using the Rocksoft parameters. Add the poly to the crc64 table generation, and provide a generic library routine implementing the algorithm. The Rocksoft 64-bit CRC model parameters are as follows: Poly: 0xAD93D23594C93659 Initial value: 0xFFFFFFFFFFFFFFFF Reflected Input: True Reflected Output: True Xor Final: 0xFFFFFFFFFFFFFFFF Since this model used reflected bits, the implementation generates the reflected table so the result is ordered consistently. Cc: Christoph Hellwig Cc: Hannes Reinecke Cc: Martin K. Petersen Signed-off-by: Keith Busch Reviewed-by: Martin K. Petersen Link: https://lore.kernel.org/r/20220303201312.3255347-6-kbusch@kernel.org Signed-off-by: Jens Axboe --- include/linux/crc64.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/crc64.h b/include/linux/crc64.h index c756e65a1b58..9480f38cc7cf 100644 --- a/include/linux/crc64.h +++ b/include/linux/crc64.h @@ -8,4 +8,6 @@ #include u64 __pure crc64_be(u64 crc, const void *p, size_t len); +u64 __pure crc64_rocksoft_generic(u64 crc, const void *p, size_t len); + #endif /* _LINUX_CRC64_H */ -- cgit From f3813f4b287e480b1fcd62ca798d8556644b8278 Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Thu, 3 Mar 2022 12:13:10 -0800 Subject: crypto: add rocksoft 64b crc guard tag framework Hardware specific features may be able to calculate a crc64, so provide a framework for drivers to register their implementation. If nothing is registered, fallback to the generic table lookup implementation. The implementation is modeled after the crct10dif equivalent. Signed-off-by: Keith Busch Link: https://lore.kernel.org/r/20220303201312.3255347-7-kbusch@kernel.org Signed-off-by: Jens Axboe --- include/linux/crc64.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/crc64.h b/include/linux/crc64.h index 9480f38cc7cf..e044c60d1e61 100644 --- a/include/linux/crc64.h +++ b/include/linux/crc64.h @@ -7,7 +7,12 @@ #include +#define CRC64_ROCKSOFT_STRING "crc64-rocksoft" + u64 __pure crc64_be(u64 crc, const void *p, size_t len); u64 __pure crc64_rocksoft_generic(u64 crc, const void *p, size_t len); +u64 crc64_rocksoft(const unsigned char *buffer, size_t len); +u64 crc64_rocksoft_update(u64 crc, const unsigned char *buffer, size_t len); + #endif /* _LINUX_CRC64_H */ -- cgit From a7d4383f17e10f338ea757a849f02298790d24fb Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Thu, 3 Mar 2022 12:13:11 -0800 Subject: block: add pi for extended integrity The NVMe specification defines new data integrity formats beyond the t10 tuple. Add support for the specification defined CRC64 formats, assuming the reference tag does not need to be split with the "storage tag". Cc: Hannes Reinecke Cc: "Martin K. Petersen" Signed-off-by: Keith Busch Reviewed-by: Martin K. Petersen Link: https://lore.kernel.org/r/20220303201312.3255347-8-kbusch@kernel.org Signed-off-by: Jens Axboe --- include/linux/t10-pi.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'include/linux') diff --git a/include/linux/t10-pi.h b/include/linux/t10-pi.h index c635c2e014e3..a4b1af581f69 100644 --- a/include/linux/t10-pi.h +++ b/include/linux/t10-pi.h @@ -53,4 +53,24 @@ extern const struct blk_integrity_profile t10_pi_type1_ip; extern const struct blk_integrity_profile t10_pi_type3_crc; extern const struct blk_integrity_profile t10_pi_type3_ip; +struct crc64_pi_tuple { + __be64 guard_tag; + __be16 app_tag; + __u8 ref_tag[6]; +}; + +static inline u64 ext_pi_ref_tag(struct request *rq) +{ + unsigned int shift = ilog2(queue_logical_block_size(rq->q)); + +#ifdef CONFIG_BLK_DEV_INTEGRITY + if (rq->q->integrity.interval_exp) + shift = rq->q->integrity.interval_exp; +#endif + return lower_48_bits(blk_rq_pos(rq) >> (shift - SECTOR_SHIFT)); +} + +extern const struct blk_integrity_profile ext_pi_type1_crc64; +extern const struct blk_integrity_profile ext_pi_type3_crc64; + #endif -- cgit From 4020aad85c6785ddac8d51f345ff9e3328ce773a Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Thu, 3 Mar 2022 12:13:12 -0800 Subject: nvme: add support for enhanced metadata NVM Express ratified TP 4068 defines new protection information formats. Implement support for the CRC64 guard tags. Since the block layer doesn't support variable length reference tags, driver support for the Storage Tag space is not supported at this time. Cc: Hannes Reinecke Cc: "Martin K. Petersen" Cc: Klaus Jensen Signed-off-by: Keith Busch Reviewed-by: Martin K. Petersen Link: https://lore.kernel.org/r/20220303201312.3255347-9-kbusch@kernel.org Signed-off-by: Jens Axboe --- include/linux/nvme.h | 53 ++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 47 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nvme.h b/include/linux/nvme.h index 9dbc3ef4daf7..4f44f83817a9 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -244,6 +244,7 @@ enum { enum nvme_ctrl_attr { NVME_CTRL_ATTR_HID_128_BIT = (1 << 0), NVME_CTRL_ATTR_TBKAS = (1 << 6), + NVME_CTRL_ATTR_ELBAS = (1 << 15), }; struct nvme_id_ctrl { @@ -399,8 +400,7 @@ struct nvme_id_ns { __le16 endgid; __u8 nguid[16]; __u8 eui64[8]; - struct nvme_lbaf lbaf[16]; - __u8 rsvd192[192]; + struct nvme_lbaf lbaf[64]; __u8 vs[3712]; }; @@ -418,8 +418,7 @@ struct nvme_id_ns_zns { __le32 rrl; __le32 frl; __u8 rsvd20[2796]; - struct nvme_zns_lbafe lbafe[16]; - __u8 rsvd3072[768]; + struct nvme_zns_lbafe lbafe[64]; __u8 vs[256]; }; @@ -428,6 +427,30 @@ struct nvme_id_ctrl_zns { __u8 rsvd1[4095]; }; +struct nvme_id_ns_nvm { + __le64 lbstm; + __u8 pic; + __u8 rsvd9[3]; + __le32 elbaf[64]; + __u8 rsvd268[3828]; +}; + +enum { + NVME_ID_NS_NVM_STS_MASK = 0x3f, + NVME_ID_NS_NVM_GUARD_SHIFT = 7, + NVME_ID_NS_NVM_GUARD_MASK = 0x3, +}; + +static inline __u8 nvme_elbaf_sts(__u32 elbaf) +{ + return elbaf & NVME_ID_NS_NVM_STS_MASK; +} + +static inline __u8 nvme_elbaf_guard_type(__u32 elbaf) +{ + return (elbaf >> NVME_ID_NS_NVM_GUARD_SHIFT) & NVME_ID_NS_NVM_GUARD_MASK; +} + struct nvme_id_ctrl_nvm { __u8 vsl; __u8 wzsl; @@ -478,6 +501,8 @@ enum { NVME_NS_FEAT_IO_OPT = 1 << 4, NVME_NS_ATTR_RO = 1 << 0, NVME_NS_FLBAS_LBA_MASK = 0xf, + NVME_NS_FLBAS_LBA_UMASK = 0x60, + NVME_NS_FLBAS_LBA_SHIFT = 1, NVME_NS_FLBAS_META_EXT = 0x10, NVME_NS_NMIC_SHARED = 1 << 0, NVME_LBAF_RP_BEST = 0, @@ -496,6 +521,18 @@ enum { NVME_NS_DPS_PI_TYPE3 = 3, }; +enum { + NVME_NVM_NS_16B_GUARD = 0, + NVME_NVM_NS_32B_GUARD = 1, + NVME_NVM_NS_64B_GUARD = 2, +}; + +static inline __u8 nvme_lbaf_index(__u8 flbas) +{ + return (flbas & NVME_NS_FLBAS_LBA_MASK) | + ((flbas & NVME_NS_FLBAS_LBA_UMASK) >> NVME_NS_FLBAS_LBA_SHIFT); +} + /* Identify Namespace Metadata Capabilities (MC): */ enum { NVME_MC_EXTENDED_LBA = (1 << 0), @@ -842,7 +879,8 @@ struct nvme_rw_command { __u8 flags; __u16 command_id; __le32 nsid; - __u64 rsvd2; + __le32 cdw2; + __le32 cdw3; __le64 metadata; union nvme_data_ptr dptr; __le64 slba; @@ -996,11 +1034,14 @@ enum { struct nvme_feat_host_behavior { __u8 acre; - __u8 resv1[511]; + __u8 etdas; + __u8 lbafee; + __u8 resv1[509]; }; enum { NVME_ENABLE_ACRE = 1, + NVME_ENABLE_LBAFEE = 1, }; /* Admin commands */ -- cgit From 2984539959dbaf4e65e19bf90c2419304a81a985 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Tue, 8 Feb 2022 17:16:33 +0100 Subject: tick/rcu: Remove obsolete rcu_needs_cpu() parameters With the removal of CONFIG_RCU_FAST_NO_HZ, the parameters in rcu_needs_cpu() are not necessary anymore. Simply remove them. Signed-off-by: Frederic Weisbecker Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Paul E. McKenney Cc: Paul Menzel --- include/linux/rcutiny.h | 3 +-- include/linux/rcutree.h | 2 +- 2 files changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index 858f4d429946..5fed476f977f 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -64,9 +64,8 @@ static inline void rcu_softirq_qs(void) rcu_tasks_qs(current, (preempt)); \ } while (0) -static inline int rcu_needs_cpu(u64 basemono, u64 *nextevt) +static inline int rcu_needs_cpu(void) { - *nextevt = KTIME_MAX; return 0; } diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 53209d669400..6cc91291d078 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -19,7 +19,7 @@ void rcu_softirq_qs(void); void rcu_note_context_switch(bool preempt); -int rcu_needs_cpu(u64 basem, u64 *nextevt); +int rcu_needs_cpu(void); void rcu_cpu_stall_reset(void); /* -- cgit From 0345691b24c076655ce8f0f4bfd24cba3467ccbd Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Tue, 8 Feb 2022 17:16:34 +0100 Subject: tick/rcu: Stop allowing RCU_SOFTIRQ in idle RCU_SOFTIRQ used to be special in that it could be raised on purpose within the idle path to prevent from stopping the tick. Some code still prevents from unnecessary warnings related to this specific behaviour while entering in dynticks-idle mode. However the nohz layout has changed quite a bit in ten years, and the removal of CONFIG_RCU_FAST_NO_HZ has been the final straw to this safe-conduct. Now the RCU_SOFTIRQ vector is expected to be raised from sane places. A remaining corner case is admitted though when the vector is invoked in fragile hotplug path. Signed-off-by: Frederic Weisbecker Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Paul E. McKenney Cc: Paul Menzel --- include/linux/interrupt.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 9367f1cb2e3c..9613326d2f8a 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -579,7 +579,13 @@ enum NR_SOFTIRQS }; -#define SOFTIRQ_STOP_IDLE_MASK (~(1 << RCU_SOFTIRQ)) +/* + * Ignoring the RCU vector after ksoftirqd is parked is fine + * because: + * 1) rcutree_migrate_callbacks() takes care of the queue. + * 2) rcu_report_dead() reports the final quiescent states. + */ +#define SOFTIRQ_HOTPLUG_SAFE_MASK (BIT(RCU_SOFTIRQ)) /* map softirq index to softirq name. update 'softirq_to_name' in * kernel/softirq.c when adding a new softirq. -- cgit From f96272a90d9eaea9933aaab704ddbd258feb3841 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Tue, 8 Feb 2022 17:16:35 +0100 Subject: lib/irq_poll: Declare IRQ_POLL softirq vector as ksoftirqd-parking safe The following warning may appear while setting a CPU down: NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!! The IRQ_POLL_SOFTIRQ vector can be raised during the hotplug cpu_down() path after ksoftirqd is parked and before the CPU actually dies. However this is handled afterward at the CPUHP_IRQ_POLL_DEAD stage where the queue gets migrated. Hence this warning can be considered spurious and the vector can join the "hotplug-safe" list. Reported-and-tested-by: Paul Menzel Signed-off-by: Frederic Weisbecker Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Paul E. McKenney Cc: Paul Menzel --- include/linux/interrupt.h | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 9613326d2f8a..f40754caaefa 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -580,12 +580,15 @@ enum }; /* - * Ignoring the RCU vector after ksoftirqd is parked is fine - * because: - * 1) rcutree_migrate_callbacks() takes care of the queue. + * The following vectors can be safely ignored after ksoftirqd is parked: + * + * _ RCU: + * 1) rcutree_migrate_callbacks() migrates the queue. * 2) rcu_report_dead() reports the final quiescent states. + * + * _ IRQ_POLL: irq_poll_cpu_dead() migrates the queue */ -#define SOFTIRQ_HOTPLUG_SAFE_MASK (BIT(RCU_SOFTIRQ)) +#define SOFTIRQ_HOTPLUG_SAFE_MASK (BIT(RCU_SOFTIRQ) | BIT(IRQ_POLL_SOFTIRQ)) /* map softirq index to softirq name. update 'softirq_to_name' in * kernel/softirq.c when adding a new softirq. -- cgit From b0e846248de5bd57e4b785791f359ecb8f2d7311 Mon Sep 17 00:00:00 2001 From: Lukas Bulwahn Date: Mon, 27 Dec 2021 07:48:39 +0100 Subject: mfd: db8500-prcmu: Remove dead code for a non-existing config The config DBX500_PRCMU_QOS_POWER was never introduced in the kernel repository. So, the ifdef in ./include/linux/mfd/dbx500-prcmu.h was never effective. Remove these dead function prototypes. Signed-off-by: Lukas Bulwahn Reviewed-by: Linus Walleij Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20211227064839.21405-1-lukas.bulwahn@gmail.com --- include/linux/mfd/dbx500-prcmu.h | 18 ------------------ 1 file changed, 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mfd/dbx500-prcmu.h b/include/linux/mfd/dbx500-prcmu.h index cbf9d7619493..2a255362b5ac 100644 --- a/include/linux/mfd/dbx500-prcmu.h +++ b/include/linux/mfd/dbx500-prcmu.h @@ -556,22 +556,6 @@ static inline void prcmu_clear(unsigned int reg, u32 bits) #define PRCMU_QOS_ARM_OPP 3 #define PRCMU_QOS_DEFAULT_VALUE -1 -#ifdef CONFIG_DBX500_PRCMU_QOS_POWER - -unsigned long prcmu_qos_get_cpufreq_opp_delay(void); -void prcmu_qos_set_cpufreq_opp_delay(unsigned long); -void prcmu_qos_force_opp(int, s32); -int prcmu_qos_requirement(int pm_qos_class); -int prcmu_qos_add_requirement(int pm_qos_class, char *name, s32 value); -int prcmu_qos_update_requirement(int pm_qos_class, char *name, s32 new_value); -void prcmu_qos_remove_requirement(int pm_qos_class, char *name); -int prcmu_qos_add_notifier(int prcmu_qos_class, - struct notifier_block *notifier); -int prcmu_qos_remove_notifier(int prcmu_qos_class, - struct notifier_block *notifier); - -#else - static inline unsigned long prcmu_qos_get_cpufreq_opp_delay(void) { return 0; @@ -613,6 +597,4 @@ static inline int prcmu_qos_remove_notifier(int prcmu_qos_class, return 0; } -#endif - #endif /* __MACH_PRCMU_H */ -- cgit From 56f216d8efbc1212bf5ff8a6ff5e29927965e8db Mon Sep 17 00:00:00 2001 From: Peter Geis Date: Tue, 8 Feb 2022 14:40:23 -0500 Subject: mfd: rk808: Add reboot support to rk808.c This adds reboot support to the rk808 pmic driver and enables it for the rk809 and rk817 devices. This only enables if the rockchip,system-power-controller flag is set. Signed-off-by: Peter Geis Signed-off-by: Frank Wunderlich Reviewed-by: Dmitry Osipenko Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220208194023.929720-1-pgwipeout@gmail.com --- include/linux/mfd/rk808.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mfd/rk808.h b/include/linux/mfd/rk808.h index a96e6d43ca06..58602032e642 100644 --- a/include/linux/mfd/rk808.h +++ b/include/linux/mfd/rk808.h @@ -373,6 +373,7 @@ enum rk805_reg { #define SWITCH2_EN BIT(6) #define SWITCH1_EN BIT(5) #define DEV_OFF_RST BIT(3) +#define DEV_RST BIT(2) #define DEV_OFF BIT(0) #define RTC_STOP BIT(0) -- cgit From 68fa55f0e05ce371c4b5de7932d9f570d61bf791 Mon Sep 17 00:00:00 2001 From: Bharat Bhushan Date: Fri, 11 Feb 2022 10:23:46 +0530 Subject: perf/marvell: cn10k DDR perf event core ownership As DDR perf event counters are not per core, so they should be accessed only by one core at a time. Select new core when previously owning core is going offline. Signed-off-by: Bharat Bhushan Reviewed-by: Bhaskara Budiredla Link: https://lore.kernel.org/r/20220211045346.17894-5-bbhushan2@marvell.com Signed-off-by: Will Deacon --- include/linux/cpuhotplug.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index 411a428ace4d..2bc550ac8dc7 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -231,6 +231,7 @@ enum cpuhp_state { CPUHP_AP_PERF_ARM_QCOM_L3_ONLINE, CPUHP_AP_PERF_ARM_APM_XGENE_ONLINE, CPUHP_AP_PERF_ARM_CAVIUM_TX2_UNCORE_ONLINE, + CPUHP_AP_PERF_ARM_MARVELL_CN10K_DDR_ONLINE, CPUHP_AP_PERF_POWERPC_NEST_IMC_ONLINE, CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE, CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE, -- cgit From 64807c2321512f67959e51f09e302c145c336184 Mon Sep 17 00:00:00 2001 From: Arun Ramadoss Date: Mon, 7 Mar 2022 21:45:14 +0530 Subject: net: phy: exported the genphy_read_master_slave function genphy_read_master_slave function allows to configure the master/slave for gigabit phys only. In order to use this function irrespective of speed, moved the speed check to the genphy_read_status call. Signed-off-by: Arun Ramadoss Reviewed-by: Andrew Lunn Signed-off-by: Paolo Abeni --- include/linux/phy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index cd08cf1a8b0d..20beeaa7443b 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1578,6 +1578,7 @@ int genphy_update_link(struct phy_device *phydev); int genphy_read_lpa(struct phy_device *phydev); int genphy_read_status_fixed(struct phy_device *phydev); int genphy_read_status(struct phy_device *phydev); +int genphy_read_master_slave(struct phy_device *phydev); int genphy_suspend(struct phy_device *phydev); int genphy_resume(struct phy_device *phydev); int genphy_loopback(struct phy_device *phydev, bool enable); -- cgit From 1280f12f56a15abde23503ba876343e5f201c9c2 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 8 Feb 2022 18:56:03 +0000 Subject: drivers/perf: arm_pmu: Handle 47 bit counters The current ARM PMU framework can only deal with 32 or 64bit counters. Teach it about a 47bit flavour. Yes, this is odd. Reviewed-by: Hector Martin Signed-off-by: Marc Zyngier Signed-off-by: Will Deacon --- include/linux/perf/arm_pmu.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 2512e2f9cd4e..0407a38b470a 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -26,6 +26,8 @@ */ /* Event uses a 64bit counter */ #define ARMPMU_EVT_64BIT 1 +/* Event uses a 47bit counter */ +#define ARMPMU_EVT_47BIT 2 #define HW_OP_UNSUPPORTED 0xFFFF #define C(_x) PERF_COUNT_HW_CACHE_##_x -- cgit From a8749a35c39903120ec421ef2525acc8e0daa55c Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Tue, 8 Mar 2022 04:47:22 -0500 Subject: mm: vmalloc: introduce array allocation functions Linux has dozens of occurrences of vmalloc(array_size()) and vzalloc(array_size()). Allow to simplify the code by providing vmalloc_array and vcalloc, as well as the underscored variants that let the caller specify the GFP flags. Acked-by: Michal Hocko Signed-off-by: Paolo Bonzini --- include/linux/vmalloc.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 880227b9f044..d1bbd4fd50c5 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -159,6 +159,11 @@ void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask, int node, const void *caller) __alloc_size(1); void *vmalloc_no_huge(unsigned long size) __alloc_size(1); +extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2); +extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2); +extern void *__vcalloc(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2); +extern void *vcalloc(size_t n, size_t size) __alloc_size(1, 2); + extern void vfree(const void *addr); extern void vfree_atomic(const void *addr); -- cgit From a99a3e2efaf1f4454eb5c9176f47e66de075b134 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Mon, 31 Jan 2022 11:50:46 -0600 Subject: coredump: Move definition of struct coredump_params into coredump.h Move the definition of struct coredump_params into coredump.h where it belongs. Remove the slightly errorneous comment explaining why struct coredump_params was declared in binfmts.h. Signed-off-by: "Eric W. Biederman" --- include/linux/binfmts.h | 13 +------------ include/linux/coredump.h | 12 +++++++++++- 2 files changed, 12 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index 5d651c219c99..3dc20c4f394c 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -8,6 +8,7 @@ #include struct filename; +struct coredump_params; #define CORENAME_MAX_SIZE 128 @@ -77,18 +78,6 @@ struct linux_binprm { #define BINPRM_FLAGS_PRESERVE_ARGV0_BIT 3 #define BINPRM_FLAGS_PRESERVE_ARGV0 (1 << BINPRM_FLAGS_PRESERVE_ARGV0_BIT) -/* Function parameter for binfmt->coredump */ -struct coredump_params { - const kernel_siginfo_t *siginfo; - struct pt_regs *regs; - struct file *file; - unsigned long limit; - unsigned long mm_flags; - loff_t written; - loff_t pos; - loff_t to_skip; -}; - /* * This structure defines the functions that are used to load the binary formats that * linux accepts. diff --git a/include/linux/coredump.h b/include/linux/coredump.h index 248a68c668b4..2ee1460a1d66 100644 --- a/include/linux/coredump.h +++ b/include/linux/coredump.h @@ -14,11 +14,21 @@ struct core_vma_metadata { unsigned long dump_size; }; +struct coredump_params { + const kernel_siginfo_t *siginfo; + struct pt_regs *regs; + struct file *file; + unsigned long limit; + unsigned long mm_flags; + loff_t written; + loff_t pos; + loff_t to_skip; +}; + /* * These are the only things you should do on a core-file: use only these * functions to write out all the necessary info. */ -struct coredump_params; extern void dump_skip_to(struct coredump_params *cprm, unsigned long to); extern void dump_skip(struct coredump_params *cprm, size_t nr); extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr); -- cgit From 95c5436a4883841588dae86fb0b9325f47ba5ad3 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Tue, 8 Mar 2022 12:55:29 -0600 Subject: coredump: Snapshot the vmas in do_coredump Move the call of dump_vma_snapshot and kvfree(vma_meta) out of the individual coredump routines into do_coredump itself. This makes the code less error prone and easier to maintain. Make the vma snapshot available to the coredump routines in struct coredump_params. This makes it easier to change and update what is captures in the vma snapshot and will be needed for fixing fill_file_notes. Reviewed-by: Jann Horn Reviewed-by: Kees Cook Signed-off-by: "Eric W. Biederman" --- include/linux/coredump.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/coredump.h b/include/linux/coredump.h index 2ee1460a1d66..7d05370e555e 100644 --- a/include/linux/coredump.h +++ b/include/linux/coredump.h @@ -23,6 +23,9 @@ struct coredump_params { loff_t written; loff_t pos; loff_t to_skip; + int vma_count; + size_t vma_data_size; + struct core_vma_metadata *vma_meta; }; /* @@ -35,9 +38,6 @@ extern int dump_emit(struct coredump_params *cprm, const void *addr, int nr); extern int dump_align(struct coredump_params *cprm, int align); int dump_user_range(struct coredump_params *cprm, unsigned long start, unsigned long len); -int dump_vma_snapshot(struct coredump_params *cprm, int *vma_count, - struct core_vma_metadata **vma_meta, - size_t *vma_data_size_ptr); extern void do_coredump(const kernel_siginfo_t *siginfo); #else static inline void do_coredump(const kernel_siginfo_t *siginfo) {} -- cgit From a759de6991b35ad437adba32b5f0cb2fd9e75929 Mon Sep 17 00:00:00 2001 From: Youngjin Jang Date: Tue, 8 Mar 2022 04:07:39 +0900 Subject: PM: sleep: Add device name to suspend_report_result() Currently, suspend_report_result() prints only function information. If any driver uses a common PM function, nobody knows who exactly called the failing function. A device pinter is needed to recognize the failing device. For example: PM: dpm_run_callback(): pnp_bus_suspend+0x0/0x10 returns 0 PM: dpm_run_callback(): pci_pm_suspend+0x0/0x150 returns 0 become after the change: serial 00:05: PM: dpm_run_callback(): pnp_bus_suspend+0x0/0x10 returns 0 pci 0000:00:01.3: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x150 returns 0 Signed-off-by: Youngjin Jang [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki --- include/linux/pm.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm.h b/include/linux/pm.h index f7d2be686359..e65b3ab28377 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -770,11 +770,11 @@ extern int dpm_suspend_late(pm_message_t state); extern int dpm_suspend(pm_message_t state); extern int dpm_prepare(pm_message_t state); -extern void __suspend_report_result(const char *function, void *fn, int ret); +extern void __suspend_report_result(const char *function, struct device *dev, void *fn, int ret); -#define suspend_report_result(fn, ret) \ +#define suspend_report_result(dev, fn, ret) \ do { \ - __suspend_report_result(__func__, fn, ret); \ + __suspend_report_result(__func__, dev, fn, ret); \ } while (0) extern int device_pm_wait_for_dev(struct device *sub, struct device *dev); @@ -814,7 +814,7 @@ static inline int dpm_suspend_start(pm_message_t state) return 0; } -#define suspend_report_result(fn, ret) do {} while (0) +#define suspend_report_result(dev, fn, ret) do {} while (0) static inline int device_pm_wait_for_dev(struct device *a, struct device *b) { -- cgit From 390031c942116d4733310f0684beb8db19885fe6 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Tue, 8 Mar 2022 13:04:19 -0600 Subject: coredump: Use the vma snapshot in fill_files_note Matthew Wilcox reported that there is a missing mmap_lock in file_files_note that could possibly lead to a user after free. Solve this by using the existing vma snapshot for consistency and to avoid the need to take the mmap_lock anywhere in the coredump code except for dump_vma_snapshot. Update the dump_vma_snapshot to capture vm_pgoff and vm_file that are neeeded by fill_files_note. Add free_vma_snapshot to free the captured values of vm_file. Reported-by: Matthew Wilcox Link: https://lkml.kernel.org/r/20220131153740.2396974-1-willy@infradead.org Cc: stable@vger.kernel.org Fixes: a07279c9a8cd ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot") Fixes: 2aa362c49c31 ("coredump: extend core dump note section to contain file names of mapped files") Reviewed-by: Kees Cook Signed-off-by: "Eric W. Biederman" --- include/linux/coredump.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/coredump.h b/include/linux/coredump.h index 7d05370e555e..08a1d3e7e46d 100644 --- a/include/linux/coredump.h +++ b/include/linux/coredump.h @@ -12,6 +12,8 @@ struct core_vma_metadata { unsigned long start, end; unsigned long flags; unsigned long dump_size; + unsigned long pgoff; + struct file *file; }; struct coredump_params { -- cgit From dc55e35f9e810f23dd69cfdc91a3d636023f57a2 Mon Sep 17 00:00:00 2001 From: Alexey Gladkov Date: Mon, 14 Feb 2022 19:18:14 +0100 Subject: ipc: Store mqueue sysctls in the ipc namespace Right now, the mqueue sysctls take ipc namespaces into account in a rather hacky way. This works in most cases, but does not respect the user namespace. Within the user namespace, the user cannot change the /proc/sys/fs/mqueue/* parametres. This poses a problem in the rootless containers. To solve this I changed the implementation of the mqueue sysctls just like some other sysctls. So far, the changes do not provide additional access to files. This will be done in a future patch. v3: * Don't implemenet set_permissions to keep the current behavior. v2: * Fixed compilation problem if CONFIG_POSIX_MQUEUE_SYSCTL is not specified. Reported-by: kernel test robot Signed-off-by: Alexey Gladkov Link: https://lkml.kernel.org/r/b0ccbb2489119f1f20c737cf1930c3a9c4e4243a.1644862280.git.legion@kernel.org Signed-off-by: Eric W. Biederman --- include/linux/ipc_namespace.h | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h index b75395ec8d52..fa787d97d60a 100644 --- a/include/linux/ipc_namespace.h +++ b/include/linux/ipc_namespace.h @@ -10,6 +10,7 @@ #include #include #include +#include struct user_namespace; @@ -63,6 +64,9 @@ struct ipc_namespace { unsigned int mq_msg_default; unsigned int mq_msgsize_default; + struct ctl_table_set mq_set; + struct ctl_table_header *mq_sysctls; + /* user_ns which owns the ipc ns */ struct user_namespace *user_ns; struct ucounts *ucounts; @@ -169,14 +173,18 @@ static inline void put_ipc_ns(struct ipc_namespace *ns) #ifdef CONFIG_POSIX_MQUEUE_SYSCTL -struct ctl_table_header; -extern struct ctl_table_header *mq_register_sysctl_table(void); +void retire_mq_sysctls(struct ipc_namespace *ns); +bool setup_mq_sysctls(struct ipc_namespace *ns); #else /* CONFIG_POSIX_MQUEUE_SYSCTL */ -static inline struct ctl_table_header *mq_register_sysctl_table(void) +static inline void retire_mq_sysctls(struct ipc_namespace *ns) { - return NULL; +} + +static inline bool setup_mq_sysctls(struct ipc_namespace *ns) +{ + return true; } #endif /* CONFIG_POSIX_MQUEUE_SYSCTL */ -- cgit From 1f5c135ee509e89e0cc274333a65f73c62cb16e5 Mon Sep 17 00:00:00 2001 From: Alexey Gladkov Date: Mon, 14 Feb 2022 19:18:15 +0100 Subject: ipc: Store ipc sysctls in the ipc namespace The ipc sysctls are not available for modification inside the user namespace. Following the mqueue sysctls, we changed the implementation to be more userns friendly. So far, the changes do not provide additional access to files. This will be done in a future patch. Signed-off-by: Alexey Gladkov Link: https://lkml.kernel.org/r/be6f9d014276f4dddd0c3aa05a86052856c1c555.1644862280.git.legion@kernel.org Signed-off-by: Eric W. Biederman --- include/linux/ipc_namespace.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h index fa787d97d60a..e3e8c8662b49 100644 --- a/include/linux/ipc_namespace.h +++ b/include/linux/ipc_namespace.h @@ -67,6 +67,9 @@ struct ipc_namespace { struct ctl_table_set mq_set; struct ctl_table_header *mq_sysctls; + struct ctl_table_set ipc_set; + struct ctl_table_header *ipc_sysctls; + /* user_ns which owns the ipc ns */ struct user_namespace *user_ns; struct ucounts *ucounts; @@ -188,4 +191,22 @@ static inline bool setup_mq_sysctls(struct ipc_namespace *ns) } #endif /* CONFIG_POSIX_MQUEUE_SYSCTL */ + +#ifdef CONFIG_SYSVIPC_SYSCTL + +bool setup_ipc_sysctls(struct ipc_namespace *ns); +void retire_ipc_sysctls(struct ipc_namespace *ns); + +#else /* CONFIG_SYSVIPC_SYSCTL */ + +static inline void retire_ipc_sysctls(struct ipc_namespace *ns) +{ +} + +static inline bool setup_ipc_sysctls(struct ipc_namespace *ns) +{ + return true; +} + +#endif /* CONFIG_SYSVIPC_SYSCTL */ #endif -- cgit From c57bef0287dd5deeabaea5727950559fb9037cd9 Mon Sep 17 00:00:00 2001 From: Barret Rhoden Date: Thu, 6 Jan 2022 12:20:40 -0500 Subject: prlimit: make do_prlimit() static There are no other callers in the kernel. Fixed up a comment format and whitespace issue when moving do_prlimit() higher in sys.c. Signed-off-by: Barret Rhoden Link: https://lkml.kernel.org/r/20220106172041.522167-3-brho@google.com Signed-off-by: Eric W. Biederman --- include/linux/resource.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/resource.h b/include/linux/resource.h index bdf491cbcab7..4fdbc0c3f315 100644 --- a/include/linux/resource.h +++ b/include/linux/resource.h @@ -8,7 +8,5 @@ struct task_struct; void getrusage(struct task_struct *p, int who, struct rusage *ru); -int do_prlimit(struct task_struct *tsk, unsigned int resource, - struct rlimit *new_rlim, struct rlimit *old_rlim); #endif -- cgit From 18c91bb2d87268d23868bf13508f5bc9cf04e89a Mon Sep 17 00:00:00 2001 From: Barret Rhoden Date: Thu, 6 Jan 2022 12:20:41 -0500 Subject: prlimit: do not grab the tasklist_lock Unnecessarily grabbing the tasklist_lock can be a scalability bottleneck for workloads that also must grab the tasklist_lock for waiting, killing, and cloning. The tasklist_lock was grabbed to protect tsk->sighand from disappearing (becoming NULL). tsk->signal was already protected by holding a reference to tsk. update_rlimit_cpu() assumed tsk->sighand != NULL. With this commit, it attempts to lock_task_sighand(). However, this means that update_rlimit_cpu() can fail. This only happens when a task is exiting. Note that during exec, sighand may *change*, but it will not be NULL. Prior to this commit, the do_prlimit() ensured that update_rlimit_cpu() would not fail by read locking the tasklist_lock and checking tsk->sighand != NULL. If update_rlimit_cpu() fails, there may be other tasks that are not exiting that share tsk->signal. However, the group_leader is the last task to be released, so if we cannot update_rlimit_cpu(group_leader), then the entire process is exiting. The only other caller of update_rlimit_cpu() is selinux_bprm_committing_creds(). It has tsk == current, so update_rlimit_cpu() cannot fail (current->sighand cannot disappear until current exits). This change resulted in a 14% speedup on a microbenchmark where parents kill and wait on their children, and children getpriority, setpriority, and getrlimit. Signed-off-by: Barret Rhoden Link: https://lkml.kernel.org/r/20220106172041.522167-4-brho@google.com Signed-off-by: Eric W. Biederman --- include/linux/posix-timers.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h index 5bbcd280bfd2..9cf126c3b27f 100644 --- a/include/linux/posix-timers.h +++ b/include/linux/posix-timers.h @@ -253,7 +253,7 @@ void posix_cpu_timers_exit_group(struct task_struct *task); void set_process_cpu_timer(struct task_struct *task, unsigned int clock_idx, u64 *newval, u64 *oldval); -void update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new); +int update_rlimit_cpu(struct task_struct *task, unsigned long rlim_new); void posixtimer_rearm(struct kernel_siginfo *info); #endif -- cgit From 41d36a9f3e5336f5b48c3adba0777b8e217020d7 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 8 Mar 2022 07:05:28 +0100 Subject: fs: remove kiocb.ki_hint This field is entirely unused now except for a tracepoint in f2fs, so remove it. Signed-off-by: Christoph Hellwig Reviewed-by: Dave Chinner Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220308060529.736277-2-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/fs.h | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index e2d892b201b0..d5658ac5d8c6 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -327,7 +327,6 @@ struct kiocb { void (*ki_complete)(struct kiocb *iocb, long ret); void *private; int ki_flags; - u16 ki_hint; u16 ki_ioprio; /* See linux/ioprio.h */ struct wait_page_queue *ki_waitq; /* for async buffered IO */ randomized_struct_fields_end @@ -2225,21 +2224,11 @@ static inline enum rw_hint file_write_hint(struct file *file) static inline int iocb_flags(struct file *file); -static inline u16 ki_hint_validate(enum rw_hint hint) -{ - typeof(((struct kiocb *)0)->ki_hint) max_hint = -1; - - if (hint <= max_hint) - return hint; - return 0; -} - static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp) { *kiocb = (struct kiocb) { .ki_filp = filp, .ki_flags = iocb_flags(filp), - .ki_hint = ki_hint_validate(file_write_hint(filp)), .ki_ioprio = get_current_ioprio(), }; } @@ -2250,7 +2239,6 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src, *kiocb = (struct kiocb) { .ki_filp = filp, .ki_flags = kiocb_src->ki_flags, - .ki_hint = kiocb_src->ki_hint, .ki_ioprio = kiocb_src->ki_ioprio, .ki_pos = kiocb_src->ki_pos, }; -- cgit From 7b12e49669c99f63bc12351c57e581f1f14d4adf Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 8 Mar 2022 07:05:29 +0100 Subject: fs: remove fs.f_write_hint The value is now completely unused except for reporting it back through the F_GET_FILE_RW_HINT ioctl, so remove the value and the two ioctls for it. Trying to use the F_SET_FILE_RW_HINT and F_GET_FILE_RW_HINT fcntls will now return EINVAL, just like it would on a kernel that never supported this functionality in the first place. Signed-off-by: Christoph Hellwig Reviewed-by: Dave Chinner Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220308060529.736277-3-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/fs.h | 9 --------- 1 file changed, 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index d5658ac5d8c6..a1fc3b41cd82 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -966,7 +966,6 @@ struct file { * Must not be taken from IRQ context. */ spinlock_t f_lock; - enum rw_hint f_write_hint; atomic_long_t f_count; unsigned int f_flags; fmode_t f_mode; @@ -2214,14 +2213,6 @@ static inline bool HAS_UNMAPPED_ID(struct user_namespace *mnt_userns, !gid_valid(i_gid_into_mnt(mnt_userns, inode)); } -static inline enum rw_hint file_write_hint(struct file *file) -{ - if (file->f_write_hint != WRITE_LIFE_NOT_SET) - return file->f_write_hint; - - return file_inode(file)->i_write_hint; -} - static inline int iocb_flags(struct file *file); static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp) -- cgit From 4e5cc99e1e485954a9c09872e0eeea570fb2b5a5 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Tue, 8 Mar 2022 15:32:19 +0800 Subject: blk-mq: manage hctx map via xarray First code becomes more clean by switching to xarray from plain array. Second use-after-free on q->queue_hw_ctx can be fixed because queue_for_each_hw_ctx() may be run when updating nr_hw_queues is in-progress. With this patch, q->hctx_table is defined as xarray, and this structure will share same lifetime with request queue, so queue_for_each_hw_ctx() can use q->hctx_table to lookup hctx reliably. Reported-by: Yu Kuai Signed-off-by: Ming Lei Reviewed-by: Hannes Reinecke Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220308073219.91173-7-ming.lei@redhat.com [axboe: fix blk_mq_hw_ctx forward declaration] Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 3 +-- include/linux/blkdev.h | 2 +- 2 files changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 3a41d50b85d3..7aa5c54901a9 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -917,8 +917,7 @@ static inline void *blk_mq_rq_to_pdu(struct request *rq) } #define queue_for_each_hw_ctx(q, hctx, i) \ - for ((i) = 0; (i) < (q)->nr_hw_queues && \ - ({ hctx = (q)->queue_hw_ctx[i]; 1; }); (i)++) + xa_for_each(&(q)->hctx_table, (i), (hctx)) #define hctx_for_each_ctx(hctx, ctx, i) \ for ((i) = 0; (i) < (hctx)->nr_ctx && \ diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index e19947d84f12..d982171469bc 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -355,7 +355,7 @@ struct request_queue { unsigned int queue_depth; /* hw dispatch queues */ - struct blk_mq_hw_ctx **queue_hw_ctx; + struct xarray hctx_table; unsigned int nr_hw_queues; /* -- cgit From 1330b6ef3313fcec577d2b020c290dc8b9f11f1a Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 7 Mar 2022 16:44:21 -0800 Subject: skb: make drop reason booleanable We have a number of cases where function returns drop/no drop decision as a boolean. Now that we want to report the reason code as well we have to pass extra output arguments. We can make the reason code evaluate correctly as bool. I believe we're good to reorder the reasons as they are reported to user space as strings. Signed-off-by: Jakub Kicinski Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/skbuff.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 34f572271c0c..26538ceb4b01 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -314,6 +314,7 @@ struct sk_buff; * used to translate the reason to string. */ enum skb_drop_reason { + SKB_NOT_DROPPED_YET = 0, SKB_DROP_REASON_NOT_SPECIFIED, /* drop reason is not specified */ SKB_DROP_REASON_NO_SOCKET, /* socket not found */ SKB_DROP_REASON_PKT_TOO_SMALL, /* packet size is too small */ -- cgit From d8fd5a1e78db375f2246d43df7833fec07a221cd Mon Sep 17 00:00:00 2001 From: Joey Gouly Date: Tue, 1 Mar 2022 15:45:18 +0000 Subject: kasan: fix a missing header include of static_keys.h The kasan-enabled.h header relies on static keys, so make sure to include the header to avoid compilation errors (with JUMP_LABEL=n). It fixes the following: ./include/linux/kasan-enabled.h:9:1: warning: data definition has no type or storage class 9 | DECLARE_STATIC_KEY_FALSE(kasan_flag_enabled); | ^~~~~~~~~~~~~~~~~~~~~~~~ error: type defaults to 'int' in declaration of 'DECLARE_STATIC_KEY_FALSE' [-Werror=implicit-int] Fixes: f9b5e46f4097eb29 ("kasan: split kasan_*enabled() functions into a separate header") Cc: Peter Collingbourne Cc: Mark Rutland Cc: Catalin Marinas Cc: Will Deacon Acked-by: Andrey Konovalov Signed-off-by: Joey Gouly Link: https://lore.kernel.org/r/20220301154518.19456-1-joey.gouly@arm.com Signed-off-by: Will Deacon --- include/linux/kasan-enabled.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kasan-enabled.h b/include/linux/kasan-enabled.h index 4b6615375022..6f612d69ea0c 100644 --- a/include/linux/kasan-enabled.h +++ b/include/linux/kasan-enabled.h @@ -2,6 +2,8 @@ #ifndef _LINUX_KASAN_ENABLED_H #define _LINUX_KASAN_ENABLED_H +#include + #ifdef CONFIG_KASAN_HW_TAGS DECLARE_STATIC_KEY_FALSE(kasan_flag_enabled); -- cgit From f804360bb3a50decbed6e2761247964dca72c080 Mon Sep 17 00:00:00 2001 From: Konrad Dybcio Date: Sat, 26 Feb 2022 22:41:25 +0100 Subject: clk: qcom: smd: Add missing RPM clocks for msm8992/4 XO and MSS_CFG were omitted when first adding the clocks for these SoCs. Add them, and while at it, move the XO clock to the top of the definition list, as ideally everyone should start using it sooner or later.. Fixes: b4297844995f ("clk: qcom: smd: Add support for MSM8992/4 rpm clocks") Signed-off-by: Konrad Dybcio Signed-off-by: Bjorn Andersson Link: https://lore.kernel.org/r/20220226214126.21209-2-konrad.dybcio@somainline.org --- include/linux/soc/qcom/smd-rpm.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/soc/qcom/smd-rpm.h b/include/linux/soc/qcom/smd-rpm.h index 860dd8cdf9f3..82c9d489833a 100644 --- a/include/linux/soc/qcom/smd-rpm.h +++ b/include/linux/soc/qcom/smd-rpm.h @@ -40,6 +40,7 @@ struct qcom_smd_rpm; #define QCOM_SMD_RPM_AGGR_CLK 0x72676761 #define QCOM_SMD_RPM_HWKM_CLK 0x6d6b7768 #define QCOM_SMD_RPM_PKA_CLK 0x616b70 +#define QCOM_SMD_RPM_MCFG_CLK 0x6766636d int qcom_rpm_smd_write(struct qcom_smd_rpm *rpm, int state, -- cgit From 69fe0f29892077f14b56e2a479b6bcf533209d53 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Fri, 4 Mar 2022 21:08:03 -0500 Subject: block: add ->poll_bio to block_device_operations Prepare for supporting IO polling for bio-based driver. Add ->poll_bio callback so that bio-based driver can provide their own logic for polling bio. Also fix ->submit_bio_bio typo in comment block above __submit_bio_noacct. Reviewed-by: Christoph Hellwig Reviewed-by: Jens Axboe Signed-off-by: Ming Lei Signed-off-by: Mike Snitzer --- include/linux/blkdev.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f757f9c2871f..51f1b1ddbed2 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1455,6 +1455,8 @@ enum blk_unique_id { struct block_device_operations { void (*submit_bio)(struct bio *bio); + int (*poll_bio)(struct bio *bio, struct io_comp_batch *iob, + unsigned int flags); int (*open) (struct block_device *, fmode_t); void (*release) (struct gendisk *, fmode_t); int (*rw_page)(struct block_device *, sector_t, struct page *, unsigned int); -- cgit From ac77998b7ac3044f0509b097da9637184598980d Mon Sep 17 00:00:00 2001 From: Mohammad Kabat Date: Thu, 25 Mar 2021 14:38:55 +0200 Subject: net/mlx5: Fix size field in bufferx_reg struct According to HW spec the field "size" should be 16 bits in bufferx register. Fixes: e281682bf294 ("net/mlx5_core: HW data structs/types definitions cleanup") Signed-off-by: Mohammad Kabat Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 598ac3bcc901..5743f5b3414b 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -9900,8 +9900,8 @@ struct mlx5_ifc_bufferx_reg_bits { u8 reserved_at_0[0x6]; u8 lossy[0x1]; u8 epsb[0x1]; - u8 reserved_at_8[0xc]; - u8 size[0xc]; + u8 reserved_at_8[0x8]; + u8 size[0x10]; u8 xoff_threshold[0x10]; u8 xon_threshold[0x10]; -- cgit From 99a2b9be077ae3a5d97fbf5f7782e0f2e9812978 Mon Sep 17 00:00:00 2001 From: Ben Ben-Ishay Date: Wed, 2 Mar 2022 17:07:08 +0200 Subject: net/mlx5e: SHAMPO, reduce TIR indication SHAMPO is an RQ / WQ feature, an indication was added to the TIR in the first place to enforce suitability between connected TIR and RQ, this enforcement does not exist in current the Firmware implementation and was redundant in the first place. Fixes: 83439f3c37aa ("net/mlx5e: Add HW-GRO offload") Signed-off-by: Ben Ben-Ishay Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 5743f5b3414b..49a48d7709ac 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -3434,7 +3434,6 @@ enum { enum { MLX5_TIRC_PACKET_MERGE_MASK_IPV4_LRO = BIT(0), MLX5_TIRC_PACKET_MERGE_MASK_IPV6_LRO = BIT(1), - MLX5_TIRC_PACKET_MERGE_MASK_SHAMPO = BIT(2), }; enum { -- cgit From 34f46ae0d4b38e83cfb26fb6f06b5b5efea47fdc Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Thu, 27 Jan 2022 15:22:21 +0200 Subject: net/mlx5: Add command failures data to debugfs Add new counters to command interface debugfs to count command failures. The following counters added: total_failed - number of times command failed (any kind of failure). failed_mbox_status - number of times command failed on bad status returned by FW. In addition, add data about last command failure to command interface debugfs: last_failed_errno - last command failed returned errno. last_failed_mbox_status - last bad status returned by FW. Signed-off-by: Moshe Shemesh Reviewed-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index d3b1a6a1f8d2..f18c1e15a12c 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -264,6 +264,14 @@ enum { struct mlx5_cmd_stats { u64 sum; u64 n; + /* number of times command failed */ + u64 failed; + /* number of times command failed on bad status returned by FW */ + u64 failed_mbox_status; + /* last command failed returned errno */ + u32 last_failed_errno; + /* last bad status returned by FW */ + u8 last_failed_mbox_status; struct dentry *root; /* protect command average calculations */ spinlock_t lock; @@ -955,6 +963,7 @@ typedef void (*mlx5_async_cbk_t)(int status, struct mlx5_async_work *context); struct mlx5_async_work { struct mlx5_async_ctx *ctx; mlx5_async_cbk_t user_callback; + u16 opcode; /* cmd opcode */ void *out; /* pointer to the cmd output buffer */ }; -- cgit From d2cb8dda214fb75f946ba554a1ecd25da7004b2b Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Wed, 26 Jan 2022 07:23:53 +0200 Subject: net/mlx5: Change release_all_pages cap bit location mlx5 FW has changed release_all_pages cap bit by one bit offset to reflect a fix in the FW flow for release_all_pages. The driver should use the new bit to ensure it calls release_all_pages only if the FW fix is there. Signed-off-by: Moshe Shemesh Reviewed-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index ea65131835ab..69985e4d8dfe 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1419,8 +1419,9 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 reserved_at_130[0xa]; u8 log_max_ra_res_dc[0x6]; - u8 reserved_at_140[0x6]; + u8 reserved_at_140[0x5]; u8 release_all_pages[0x1]; + u8 must_not_use[0x1]; u8 reserved_at_147[0x2]; u8 roce_accl[0x1]; u8 log_max_ra_req_qp[0x6]; -- cgit From 66771a1c729e816dc84e29ba555013b6e722fb22 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Fri, 18 Feb 2022 09:36:20 +0200 Subject: net/mlx5: Move debugfs entries to separate struct Move the debugfs entry pointers under priv to their own struct. Add get function for device debugfs root. Signed-off-by: Moshe Shemesh Reviewed-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index f18c1e15a12c..aa539d245a12 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -551,6 +551,14 @@ struct mlx5_adev { int idx; }; +struct mlx5_debugfs_entries { + struct dentry *dbg_root; + struct dentry *qp_debugfs; + struct dentry *eq_debugfs; + struct dentry *cq_debugfs; + struct dentry *cmdif_debugfs; +}; + struct mlx5_ft_pool; struct mlx5_priv { /* IRQ table valid only for real pci devices PF or VF */ @@ -570,12 +578,7 @@ struct mlx5_priv { struct mlx5_core_health health; struct list_head traps; - /* start: qp staff */ - struct dentry *qp_debugfs; - struct dentry *eq_debugfs; - struct dentry *cq_debugfs; - struct dentry *cmdif_debugfs; - /* end: qp staff */ + struct mlx5_debugfs_entries dbg; /* start: alloc staff */ /* protect buffer allocation according to numa node */ @@ -585,7 +588,6 @@ struct mlx5_priv { struct mutex pgdir_mutex; struct list_head pgdir_list; /* end: alloc staff */ - struct dentry *dbg_root; struct list_head ctx_list; spinlock_t ctx_lock; @@ -1038,6 +1040,7 @@ int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn); int mlx5_core_attach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, u32 qpn); int mlx5_core_detach_mcg(struct mlx5_core_dev *dev, union ib_gid *mgid, u32 qpn); +struct dentry *mlx5_debugfs_get_dev_root(struct mlx5_core_dev *dev); void mlx5_qp_debugfs_init(struct mlx5_core_dev *dev); void mlx5_qp_debugfs_cleanup(struct mlx5_core_dev *dev); int mlx5_access_reg(struct mlx5_core_dev *dev, void *data_in, int size_in, -- cgit From 4e05cbf05c66ca6931ee4f2a5aff0d84f1e65f40 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Thu, 27 Jan 2022 07:03:33 +0200 Subject: net/mlx5: Add pages debugfs Add pages debugfs to expose the following counters for debuggability: fw_pages_total - How many pages were given to FW and not returned yet. vfs_pages - For SRIOV, how many pages were given to FW for virtual functions usage. host_pf_pages - For ECPF, how many pages were given to FW for external hosts physical functions usage. Signed-off-by: Moshe Shemesh Reviewed-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index aa539d245a12..c5f93b5910ed 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -557,6 +557,7 @@ struct mlx5_debugfs_entries { struct dentry *eq_debugfs; struct dentry *cq_debugfs; struct dentry *cmdif_debugfs; + struct dentry *pages_debugfs; }; struct mlx5_ft_pool; @@ -569,11 +570,11 @@ struct mlx5_priv { struct mlx5_nb pg_nb; struct workqueue_struct *pg_wq; struct xarray page_root_xa; - int fw_pages; + u32 fw_pages; atomic_t reg_pages; struct list_head free_list; - int vfs_pages; - int host_pf_pages; + u32 vfs_pages; + u32 host_pf_pages; struct mlx5_core_health health; struct list_head traps; @@ -1026,6 +1027,8 @@ int mlx5_pagealloc_init(struct mlx5_core_dev *dev); void mlx5_pagealloc_cleanup(struct mlx5_core_dev *dev); void mlx5_pagealloc_start(struct mlx5_core_dev *dev); void mlx5_pagealloc_stop(struct mlx5_core_dev *dev); +void mlx5_pages_debugfs_init(struct mlx5_core_dev *dev); +void mlx5_pages_debugfs_cleanup(struct mlx5_core_dev *dev); void mlx5_core_req_pages_handler(struct mlx5_core_dev *dev, u16 func_id, s32 npages, bool ec_function); int mlx5_satisfy_startup_pages(struct mlx5_core_dev *dev, int boot); -- cgit From 32071187e9fb18da62f5be569bd2ea0d7a981ee8 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Thu, 27 Jan 2022 07:51:14 +0200 Subject: net/mlx5: Add debugfs counters for page commands failures Add the following new debugfs counters for debug and verbosity: fw_pages_alloc_failed - number of pages FW requested but driver failed to allocate. give_pages_dropped - number of pages given to FW, but command give pages failed by FW. reclaim_pages_discard - number of pages which were about to reclaim back and FW failed the command. Signed-off-by: Moshe Shemesh Reviewed-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index c5f93b5910ed..00a914b0716e 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -575,6 +575,9 @@ struct mlx5_priv { struct list_head free_list; u32 vfs_pages; u32 host_pf_pages; + u32 fw_pages_alloc_failed; + u32 give_pages_dropped; + u32 reclaim_pages_discard; struct mlx5_core_health health; struct list_head traps; -- cgit From 5c422bfad2fbab96381273e50c7084465199501d Mon Sep 17 00:00:00 2001 From: Yevgeny Kliteynik Date: Thu, 24 Feb 2022 00:58:58 +0200 Subject: net/mlx5: DR, Add support for matching on Internet Header Length (IHL) Add support for matching on new field - Internet Header Length (IHL). Signed-off-by: Muhammad Sammar Signed-off-by: Yevgeny Kliteynik Reviewed-by: Alex Vesker Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 69985e4d8dfe..1f0c35162b7b 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -493,7 +493,10 @@ struct mlx5_ifc_fte_match_set_lyr_2_4_bits { u8 tcp_sport[0x10]; u8 tcp_dport[0x10]; - u8 reserved_at_c0[0x18]; + u8 reserved_at_c0[0x10]; + u8 ipv4_ihl[0x4]; + u8 reserved_at_c4[0x4]; + u8 ttl_hoplimit[0x8]; u8 udp_sport[0x10]; -- cgit From 6862c787c7e88df490675ed781dc9052dba88a56 Mon Sep 17 00:00:00 2001 From: Yevgeny Kliteynik Date: Wed, 23 Feb 2022 18:40:59 +0200 Subject: net/mlx5: DR, Add support for ConnectX-7 steering Add support for a new SW format version that is implemented by ConnectX-7. Except for several differences, the STEv2 is identical to STEv1, so for most callbacks the STEv2 context struct will call STEv1 functions. Signed-off-by: Yevgeny Kliteynik Reviewed-by: Alex Vesker Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 1f0c35162b7b..bcf60ede6fcc 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1346,6 +1346,7 @@ enum mlx5_fc_bulk_alloc_bitmask { enum { MLX5_STEERING_FORMAT_CONNECTX_5 = 0, MLX5_STEERING_FORMAT_CONNECTX_6DX = 1, + MLX5_STEERING_FORMAT_CONNECTX_7 = 2, }; struct mlx5_ifc_cmd_hca_cap_bits { -- cgit From 9a61d0838cd0a81529badfba7bfa39e81d5529d3 Mon Sep 17 00:00:00 2001 From: Kajol Jain Date: Fri, 25 Feb 2022 20:00:21 +0530 Subject: drivers/nvdimm: Add nvdimm pmu structure A structure is added called nvdimm_pmu, for performance stats reporting support of nvdimm devices. It can be used to add device pmu data such as pmu data structure for performance stats, nvdimm device pointer along with cpumask attributes. Acked-by: Peter Zijlstra (Intel) Tested-by: Nageswara R Sastry Signed-off-by: Kajol Jain Reviewed-by: Madhavan Srinivasan Link: https://lore.kernel.org/r/20220225143024.47947-2-kjain@linux.ibm.com Signed-off-by: Dan Williams --- include/linux/nd.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'include/linux') diff --git a/include/linux/nd.h b/include/linux/nd.h index 8a8c63edb1b2..ad186e828263 100644 --- a/include/linux/nd.h +++ b/include/linux/nd.h @@ -8,6 +8,7 @@ #include #include #include +#include enum nvdimm_event { NVDIMM_REVALIDATE_POISON, @@ -23,6 +24,25 @@ enum nvdimm_claim_class { NVDIMM_CCLASS_UNKNOWN, }; +/** + * struct nvdimm_pmu - data structure for nvdimm perf driver + * @pmu: pmu data structure for nvdimm performance stats. + * @dev: nvdimm device pointer. + * @cpu: designated cpu for counter access. + * @node: node for cpu hotplug notifier link. + * @cpuhp_state: state for cpu hotplug notification. + * @arch_cpumask: cpumask to get designated cpu for counter access. + */ +struct nvdimm_pmu { + struct pmu pmu; + struct device *dev; + int cpu; + struct hlist_node node; + enum cpuhp_state cpuhp_state; + /* cpumask provided by arch/platform specific code */ + struct cpumask arch_cpumask; +}; + struct nd_device_driver { struct device_driver drv; unsigned long type; -- cgit From 0fab1ba6ad6ba1f76380f92ead95c6e861ef8116 Mon Sep 17 00:00:00 2001 From: Kajol Jain Date: Fri, 25 Feb 2022 20:00:22 +0530 Subject: drivers/nvdimm: Add perf interface to expose nvdimm performance stats A common interface is added to get performance stats reporting support for nvdimm devices. Added interface defines supported event list, config fields for the event attributes and their corresponding bit values which are exported via sysfs. Interface also added support for pmu register/unregister functions, cpu hotplug feature along with macros for handling events addition via sysfs. It adds attribute groups for format, cpumask and events to the pmu structure. User could use the standard perf tool to access perf events exposed via nvdimm pmu. [Declare pmu functions in nd.h file to resolve implicit-function-declaration warning and make hotplug function static as reported by kernel test robot] Acked-by: Peter Zijlstra (Intel) Tested-by: Nageswara R Sastry Signed-off-by: Kajol Jain Link: https://lore.kernel.org/all/202202241242.zqzGkguy-lkp@intel.com/ Reported-by: kernel test robot Reviewed-by: Madhavan Srinivasan Link: https://lore.kernel.org/r/20220225143024.47947-3-kjain@linux.ibm.com Signed-off-by: Dan Williams --- include/linux/nd.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) (limited to 'include/linux') diff --git a/include/linux/nd.h b/include/linux/nd.h index ad186e828263..4813c7089e5c 100644 --- a/include/linux/nd.h +++ b/include/linux/nd.h @@ -9,6 +9,7 @@ #include #include #include +#include enum nvdimm_event { NVDIMM_REVALIDATE_POISON, @@ -24,6 +25,19 @@ enum nvdimm_claim_class { NVDIMM_CCLASS_UNKNOWN, }; +#define NVDIMM_EVENT_VAR(_id) event_attr_##_id +#define NVDIMM_EVENT_PTR(_id) (&event_attr_##_id.attr.attr) + +#define NVDIMM_EVENT_ATTR(_name, _id) \ + PMU_EVENT_ATTR(_name, NVDIMM_EVENT_VAR(_id), _id, \ + nvdimm_events_sysfs_show) + +/* Event attribute array index */ +#define NVDIMM_PMU_FORMAT_ATTR 0 +#define NVDIMM_PMU_EVENT_ATTR 1 +#define NVDIMM_PMU_CPUMASK_ATTR 2 +#define NVDIMM_PMU_NULL_ATTR 3 + /** * struct nvdimm_pmu - data structure for nvdimm perf driver * @pmu: pmu data structure for nvdimm performance stats. @@ -43,6 +57,16 @@ struct nvdimm_pmu { struct cpumask arch_cpumask; }; +extern ssize_t nvdimm_events_sysfs_show(struct device *dev, + struct device_attribute *attr, + char *page); + +int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm, struct platform_device *pdev); +void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu); +void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu); +int perf_pmu_register(struct pmu *pmu, const char *name, int type); +void perf_pmu_unregister(struct pmu *pmu); + struct nd_device_driver { struct device_driver drv; unsigned long type; -- cgit From 013a3e7c79ac51e6cdd84e3580863a562c7597c7 Mon Sep 17 00:00:00 2001 From: Min Li Date: Tue, 8 Mar 2022 09:10:51 -0500 Subject: ptp: idt82p33: use rsmu driver to access i2c/spi bus rsmu (Renesas Synchronization Management Unit ) driver is located in drivers/mfd and responsible for creating multiple devices including idt82p33 phc, which will then use the exposed regmap and mutex handle to access i2c/spi bus. Signed-off-by: Min Li Acked-by: Richard Cochran Link: https://lore.kernel.org/r/1646748651-16811-1-git-send-email-min.li.xe@renesas.com Signed-off-by: Jakub Kicinski --- include/linux/mfd/idt82p33_reg.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mfd/idt82p33_reg.h b/include/linux/mfd/idt82p33_reg.h index 129a6c078221..1db532feeb91 100644 --- a/include/linux/mfd/idt82p33_reg.h +++ b/include/linux/mfd/idt82p33_reg.h @@ -7,6 +7,8 @@ #ifndef HAVE_IDT82P33_REG #define HAVE_IDT82P33_REG +#define REG_ADDR(page, offset) (((page) << 0x7) | ((offset) & 0x7f)) + /* Register address */ #define DPLL1_TOD_CNFG 0x134 #define DPLL2_TOD_CNFG 0x1B4 @@ -41,6 +43,7 @@ #define REG_SOFT_RESET 0X381 #define OUT_MUX_CNFG(outn) REG_ADDR(0x6, (0xC * (outn))) +#define TOD_TRIGGER(wr_trig, rd_trig) ((wr_trig & 0xf) << 4 | (rd_trig & 0xf)) /* Register bit definitions */ #define SYNC_TOD BIT(1) -- cgit From 8ddde07a3d285a0f3cec14924446608320fdc013 Mon Sep 17 00:00:00 2001 From: Tian Tao Date: Tue, 8 Mar 2022 16:59:10 +0800 Subject: dma-mapping: benchmark: extract a common header file for map_benchmark definition kernel/dma/map_benchmark.c and selftests/dma/dma_map_benchmark.c have duplicate map_benchmark definitions, which tends to lead to inconsistent changes to map_benchmark on both sides, extract a common header file to avoid this problem. Signed-off-by: Tian Tao Acked-by: Barry Song Reviewed-by: Shuah Khan Signed-off-by: Christoph Hellwig --- include/linux/map_benchmark.h | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) create mode 100644 include/linux/map_benchmark.h (limited to 'include/linux') diff --git a/include/linux/map_benchmark.h b/include/linux/map_benchmark.h new file mode 100644 index 000000000000..62674c83bde4 --- /dev/null +++ b/include/linux/map_benchmark.h @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2022 HiSilicon Limited. + */ + +#ifndef _KERNEL_DMA_BENCHMARK_H +#define _KERNEL_DMA_BENCHMARK_H + +#define DMA_MAP_BENCHMARK _IOWR('d', 1, struct map_benchmark) +#define DMA_MAP_MAX_THREADS 1024 +#define DMA_MAP_MAX_SECONDS 300 +#define DMA_MAP_MAX_TRANS_DELAY (10 * NSEC_PER_MSEC) + +#define DMA_MAP_BIDIRECTIONAL 0 +#define DMA_MAP_TO_DEVICE 1 +#define DMA_MAP_FROM_DEVICE 2 + +struct map_benchmark { + __u64 avg_map_100ns; /* average map latency in 100ns */ + __u64 map_stddev; /* standard deviation of map latency */ + __u64 avg_unmap_100ns; /* as above */ + __u64 unmap_stddev; + __u32 threads; /* how many threads will do map/unmap in parallel */ + __u32 seconds; /* how long the test will last */ + __s32 node; /* which numa node this benchmark will run on */ + __u32 dma_bits; /* DMA addressing capability */ + __u32 dma_dir; /* DMA data direction */ + __u32 dma_trans_ns; /* time for DMA transmission in ns */ + __u32 granule; /* how many PAGE_SIZE will do map/unmap once a time */ +}; +#endif /* _KERNEL_DMA_BENCHMARK_H */ -- cgit From e7a6c00dc77aedf27a601738ea509f1caea6d673 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Fri, 4 Mar 2022 08:22:22 -0700 Subject: io_uring: add support for registering ring file descriptors Lots of workloads use multiple threads, in which case the file table is shared between them. This makes getting and putting the ring file descriptor for each io_uring_enter(2) system call more expensive, as it involves an atomic get and put for each call. Similarly to how we allow registering normal file descriptors to avoid this overhead, add support for an io_uring_register(2) API that allows to register the ring fds themselves: 1) IORING_REGISTER_RING_FDS - takes an array of io_uring_rsrc_update structs, and registers them with the task. 2) IORING_UNREGISTER_RING_FDS - takes an array of io_uring_src_update structs, and unregisters them. When a ring fd is registered, it is internally represented by an offset. This offset is returned to the application, and the application then uses this offset and sets IORING_ENTER_REGISTERED_RING for the io_uring_enter(2) system call. This works just like using a registered file descriptor, rather than a real one, in an SQE, where IOSQE_FIXED_FILE gets set to tell io_uring that we're using an internal offset/descriptor rather than a real file descriptor. In initial testing, this provides a nice bump in performance for threaded applications in real world cases where the batch count (eg number of requests submitted per io_uring_enter(2) invocation) is low. In a microbenchmark, submitting NOP requests, we see the following increases in performance: Requests per syscall Baseline Registered Increase ---------------------------------------------------------------- 1 ~7030K ~8080K +15% 2 ~13120K ~14800K +13% 4 ~22740K ~25300K +11% Co-developed-by: Xiaoguang Wang Signed-off-by: Jens Axboe --- include/linux/io_uring.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index 649a4d7c241b..1814e698d861 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -9,11 +9,14 @@ struct sock *io_uring_get_socket(struct file *file); void __io_uring_cancel(bool cancel_all); void __io_uring_free(struct task_struct *tsk); +void io_uring_unreg_ringfd(void); static inline void io_uring_files_cancel(void) { - if (current->io_uring) + if (current->io_uring) { + io_uring_unreg_ringfd(); __io_uring_cancel(false); + } } static inline void io_uring_task_cancel(void) { -- cgit From 382627824afb09cd86192f4b649fca8c5bfe5f40 Mon Sep 17 00:00:00 2001 From: Xiongwei Song Date: Thu, 10 Mar 2022 22:07:00 +0800 Subject: mm: slab: Delete unused SLAB_DEACTIVATED flag Since commit 9855609bde03 ("mm: memcg/slab: use a single set of kmem_caches for all accounted allocations") deleted all SLAB_DEACTIVATED users, therefore this flag is not needed any more, let's delete it. Signed-off-by: Xiongwei Song Acked-by: David Rientjes Reviewed-by: Roman Gushchin Acked-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka Link: https://lore.kernel.org/r/20220310140701.87908-2-sxwjean@me.com --- include/linux/slab.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 37bde99b74af..b6b3eed6c7c4 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -117,9 +117,6 @@ #define SLAB_RECLAIM_ACCOUNT ((slab_flags_t __force)0x00020000U) #define SLAB_TEMPORARY SLAB_RECLAIM_ACCOUNT /* Objects are short-lived */ -/* Slab deactivation flag */ -#define SLAB_DEACTIVATED ((slab_flags_t __force)0x10000000U) - /* * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests. * -- cgit From afcf5441b9ff22ac57244cd45ff102ebc2e32d1a Mon Sep 17 00:00:00 2001 From: Dan Li Date: Wed, 2 Mar 2022 23:43:23 -0800 Subject: arm64: Add gcc Shadow Call Stack support Shadow call stacks will be available in GCC >= 12, this patch makes the corresponding kernel configuration available when compiling the kernel with the gcc. Note that the implementation in GCC is slightly different from Clang. With SCS enabled, functions will only pop x30 once in the epilogue, like: str x30, [x18], #8 stp x29, x30, [sp, #-16]! ...... - ldp x29, x30, [sp], #16 //clang + ldr x29, [sp], #16 //GCC ldr x30, [x18, #-8]! Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ce09ab17ddd21f73ff2caf6eec3b0ee9b0e1a11e Reviewed-by: Nathan Chancellor Reviewed-by: Kees Cook Reviewed-by: Nick Desaulniers Signed-off-by: Dan Li Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220303074323.86282-1-ashimida@linux.alibaba.com --- include/linux/compiler-gcc.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index ccbbd31b3aae..deff5b308470 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -97,6 +97,10 @@ #define KASAN_ABI_VERSION 4 #endif +#ifdef CONFIG_SHADOW_CALL_STACK +#define __noscs __attribute__((__no_sanitize__("shadow-call-stack"))) +#endif + #if __has_attribute(__no_sanitize_address__) #define __no_sanitize_address __attribute__((no_sanitize_address)) #else -- cgit From b7f8dff09827c96032c34a945ee7757e394b5952 Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Thu, 10 Mar 2022 11:45:58 -0500 Subject: dm: simplify dm_sumbit_bio_remap interface Remove the from_wq argument from dm_sumbit_bio_remap(). Eliminates the need for dm_sumbit_bio_remap() callers to know whether they are calling for a workqueue or from the original dm_submit_bio(). Add map_task to dm_io struct, record the map_task in alloc_io and clear it after all target ->map() calls have completed. Update dm_sumbit_bio_remap to check if 'current' matches io->map_task rather than rely on passed 'from_rq' argument. This change really simplifies the chore of porting each DM target to using dm_sumbit_bio_remap() because there is no longer the risk of programming error by not completely knowing all the different contexts a particular method that calls dm_sumbit_bio_remap() might be used in. Signed-off-by: Mike Snitzer --- include/linux/device-mapper.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index 70cd9449275a..901ec191250c 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -471,7 +471,7 @@ int dm_suspended(struct dm_target *ti); int dm_post_suspending(struct dm_target *ti); int dm_noflush_suspending(struct dm_target *ti); void dm_accept_partial_bio(struct bio *bio, unsigned n_sectors); -void dm_submit_bio_remap(struct bio *clone, struct bio *tgt_clone, bool from_wq); +void dm_submit_bio_remap(struct bio *clone, struct bio *tgt_clone); union map_info *dm_get_rq_mapinfo(struct request *rq); #ifdef CONFIG_BLK_DEV_ZONED -- cgit From a2a591fb76e6f5461dfd04715b69c317e50c43a5 Mon Sep 17 00:00:00 2001 From: Ilkka Koskinen Date: Tue, 8 Mar 2022 18:07:50 -0800 Subject: ACPI: AGDI: Add driver for Arm Generic Diagnostic Dump and Reset device ACPI for Arm Components 1.1 Platform Design Document v1.1 [0] specifices Arm Generic Diagnostic Device Interface (AGDI). It allows an admin to issue diagnostic dump and reset via an SDEI event or an interrupt. This patch implements SDEI path. [0] https://developer.arm.com/documentation/den0093/latest/ Signed-off-by: Ilkka Koskinen Reviewed-by: Russell King (Oracle) Acked-by: Lorenzo Pieralisi Signed-off-by: Rafael J. Wysocki --- include/linux/acpi_agdi.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) create mode 100644 include/linux/acpi_agdi.h (limited to 'include/linux') diff --git a/include/linux/acpi_agdi.h b/include/linux/acpi_agdi.h new file mode 100644 index 000000000000..f477f0b452fa --- /dev/null +++ b/include/linux/acpi_agdi.h @@ -0,0 +1,13 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#ifndef __ACPI_AGDI_H__ +#define __ACPI_AGDI_H__ + +#include + +#ifdef CONFIG_ACPI_AGDI +void __init acpi_agdi_init(void); +#else +static inline void acpi_agdi_init(void) {} +#endif +#endif /* __ACPI_AGDI_H__ */ -- cgit From 9924fbb51e0ae30b8d7eec7c1c839d74da9678b3 Mon Sep 17 00:00:00 2001 From: Ionela Voinescu Date: Thu, 10 Mar 2022 14:54:50 +0000 Subject: arch_topology: obtain cpu capacity using information from CPPC Define topology_init_cpu_capacity_cppc() to use highest performance values from _CPC objects to obtain and set maximum capacity information for each CPU. acpi_cppc_processor_probe() is a good point at which to trigger the initialization of CPU (u-arch) capacity values, as at this point the highest performance values can be obtained from each CPU's _CPC objects. Architectures can therefore use this functionality through arch_init_invariance_cppc(). The performance scale used by CPPC is a unified scale for all CPUs in the system. Therefore, by obtaining the raw highest performance values from the _CPC objects, and normalizing them on the [0, 1024] capacity scale, used by the task scheduler, we obtain the CPU capacity of each CPU. While an ACPI Notify(0x85) could alert about a change in the highest performance value, which should in turn retrigger the CPU capacity computations, this notification is not currently handled by the ACPI processor driver. When supported, a call to arch_init_invariance_cppc() would perform the update. Signed-off-by: Ionela Voinescu Acked-by: Sudeep Holla Tested-by: Valentin Schneider Tested-by: Yicong Yang Signed-off-by: Rafael J. Wysocki --- include/linux/arch_topology.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h index cce6136b300a..58cbe18d825c 100644 --- a/include/linux/arch_topology.h +++ b/include/linux/arch_topology.h @@ -11,6 +11,10 @@ void topology_normalize_cpu_scale(void); int topology_update_cpu_topology(void); +#ifdef CONFIG_ACPI_CPPC_LIB +void topology_init_cpu_capacity_cppc(void); +#endif + struct device_node; bool topology_parse_cpu_capacity(struct device_node *cpu_node, int cpu); -- cgit From 19397e8b546d20226153dafe5dce34c4393752c4 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Thu, 27 Jan 2022 11:17:32 -0600 Subject: ptrace: Move ptrace_report_syscall into ptrace.h Move ptrace_report_syscall from tracehook.h into ptrace.h where it belongs. Reviewed-by: Kees Cook Link: https://lkml.kernel.org/r/20220309162454.123006-1-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/ptrace.h | 27 +++++++++++++++++++++++++++ include/linux/tracehook.h | 26 -------------------------- 2 files changed, 27 insertions(+), 26 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index 8aee2945ff08..91b1074edb4c 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -413,4 +413,31 @@ static inline void user_single_step_report(struct pt_regs *regs) extern int task_current_syscall(struct task_struct *target, struct syscall_info *info); extern void sigaction_compat_abi(struct k_sigaction *act, struct k_sigaction *oact); + +/* + * ptrace report for syscall entry and exit looks identical. + */ +static inline int ptrace_report_syscall(unsigned long message) +{ + int ptrace = current->ptrace; + + if (!(ptrace & PT_PTRACED)) + return 0; + + current->ptrace_message = message; + ptrace_notify(SIGTRAP | ((ptrace & PT_TRACESYSGOOD) ? 0x80 : 0)); + + /* + * this isn't the same as continuing with a signal, but it will do + * for normal use. strace only continues with a signal if the + * stopping signal is not SIGTRAP. -brl + */ + if (current->exit_code) { + send_sig(current->exit_code, current, 1); + current->exit_code = 0; + } + + current->ptrace_message = 0; + return fatal_signal_pending(current); +} #endif diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index 88c007ab5ebc..998bc3863559 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -51,32 +51,6 @@ #include struct linux_binprm; -/* - * ptrace report for syscall entry and exit looks identical. - */ -static inline int ptrace_report_syscall(unsigned long message) -{ - int ptrace = current->ptrace; - - if (!(ptrace & PT_PTRACED)) - return 0; - - current->ptrace_message = message; - ptrace_notify(SIGTRAP | ((ptrace & PT_TRACESYSGOOD) ? 0x80 : 0)); - - /* - * this isn't the same as continuing with a signal, but it will do - * for normal use. strace only continues with a signal if the - * stopping signal is not SIGTRAP. -brl - */ - if (current->exit_code) { - send_sig(current->exit_code, current, 1); - current->exit_code = 0; - } - - current->ptrace_message = 0; - return fatal_signal_pending(current); -} /** * tracehook_report_syscall_entry - task is about to attempt a system call -- cgit From 153474ba1a4aed0a7b797b4c2be8c35c7a4e57bd Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Thu, 27 Jan 2022 11:46:37 -0600 Subject: ptrace: Create ptrace_report_syscall_{entry,exit} in ptrace.h Rename tracehook_report_syscall_{entry,exit} to ptrace_report_syscall_{entry,exit} and place them in ptrace.h There is no longer any generic tracehook infractructure so make these ptrace specific functions ptrace specific. Reviewed-by: Kees Cook Link: https://lkml.kernel.org/r/20220309162454.123006-3-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/entry-common.h | 6 +++--- include/linux/ptrace.h | 51 ++++++++++++++++++++++++++++++++++++++++++++ include/linux/tracehook.h | 51 -------------------------------------------- 3 files changed, 54 insertions(+), 54 deletions(-) (limited to 'include/linux') diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index 2e2b8d6140ed..a670e9fba7a9 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -3,7 +3,7 @@ #define __LINUX_ENTRYCOMMON_H #include -#include +#include #include #include #include @@ -95,7 +95,7 @@ static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs #ifndef arch_syscall_enter_tracehook static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs) { - return tracehook_report_syscall_entry(regs); + return ptrace_report_syscall_entry(regs); } #endif @@ -294,7 +294,7 @@ static inline void arch_syscall_exit_tracehook(struct pt_regs *regs, bool step); #ifndef arch_syscall_exit_tracehook static inline void arch_syscall_exit_tracehook(struct pt_regs *regs, bool step) { - tracehook_report_syscall_exit(regs, step); + ptrace_report_syscall_exit(regs, step); } #endif diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index 91b1074edb4c..5310f43e4762 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -440,4 +440,55 @@ static inline int ptrace_report_syscall(unsigned long message) current->ptrace_message = 0; return fatal_signal_pending(current); } + +/** + * ptrace_report_syscall_entry - task is about to attempt a system call + * @regs: user register state of current task + * + * This will be called if %SYSCALL_WORK_SYSCALL_TRACE or + * %SYSCALL_WORK_SYSCALL_EMU have been set, when the current task has just + * entered the kernel for a system call. Full user register state is + * available here. Changing the values in @regs can affect the system + * call number and arguments to be tried. It is safe to block here, + * preventing the system call from beginning. + * + * Returns zero normally, or nonzero if the calling arch code should abort + * the system call. That must prevent normal entry so no system call is + * made. If @task ever returns to user mode after this, its register state + * is unspecified, but should be something harmless like an %ENOSYS error + * return. It should preserve enough information so that syscall_rollback() + * can work (see asm-generic/syscall.h). + * + * Called without locks, just after entering kernel mode. + */ +static inline __must_check int ptrace_report_syscall_entry( + struct pt_regs *regs) +{ + return ptrace_report_syscall(PTRACE_EVENTMSG_SYSCALL_ENTRY); +} + +/** + * ptrace_report_syscall_exit - task has just finished a system call + * @regs: user register state of current task + * @step: nonzero if simulating single-step or block-step + * + * This will be called if %SYSCALL_WORK_SYSCALL_TRACE has been set, when + * the current task has just finished an attempted system call. Full + * user register state is available here. It is safe to block here, + * preventing signals from being processed. + * + * If @step is nonzero, this report is also in lieu of the normal + * trap that would follow the system call instruction because + * user_enable_block_step() or user_enable_single_step() was used. + * In this case, %SYSCALL_WORK_SYSCALL_TRACE might not be set. + * + * Called without locks, just before checking for pending signals. + */ +static inline void ptrace_report_syscall_exit(struct pt_regs *regs, int step) +{ + if (step) + user_single_step_report(regs); + else + ptrace_report_syscall(PTRACE_EVENTMSG_SYSCALL_EXIT); +} #endif diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index 998bc3863559..819e82ac09bd 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -52,57 +52,6 @@ struct linux_binprm; -/** - * tracehook_report_syscall_entry - task is about to attempt a system call - * @regs: user register state of current task - * - * This will be called if %SYSCALL_WORK_SYSCALL_TRACE or - * %SYSCALL_WORK_SYSCALL_EMU have been set, when the current task has just - * entered the kernel for a system call. Full user register state is - * available here. Changing the values in @regs can affect the system - * call number and arguments to be tried. It is safe to block here, - * preventing the system call from beginning. - * - * Returns zero normally, or nonzero if the calling arch code should abort - * the system call. That must prevent normal entry so no system call is - * made. If @task ever returns to user mode after this, its register state - * is unspecified, but should be something harmless like an %ENOSYS error - * return. It should preserve enough information so that syscall_rollback() - * can work (see asm-generic/syscall.h). - * - * Called without locks, just after entering kernel mode. - */ -static inline __must_check int tracehook_report_syscall_entry( - struct pt_regs *regs) -{ - return ptrace_report_syscall(PTRACE_EVENTMSG_SYSCALL_ENTRY); -} - -/** - * tracehook_report_syscall_exit - task has just finished a system call - * @regs: user register state of current task - * @step: nonzero if simulating single-step or block-step - * - * This will be called if %SYSCALL_WORK_SYSCALL_TRACE has been set, when - * the current task has just finished an attempted system call. Full - * user register state is available here. It is safe to block here, - * preventing signals from being processed. - * - * If @step is nonzero, this report is also in lieu of the normal - * trap that would follow the system call instruction because - * user_enable_block_step() or user_enable_single_step() was used. - * In this case, %SYSCALL_WORK_SYSCALL_TRACE might not be set. - * - * Called without locks, just before checking for pending signals. - */ -static inline void tracehook_report_syscall_exit(struct pt_regs *regs, int step) -{ - if (step) - user_single_step_report(regs); - else - ptrace_report_syscall(PTRACE_EVENTMSG_SYSCALL_EXIT); -} - /** * tracehook_signal_handler - signal handler setup is complete * @stepping: nonzero if debugger single-step or block-step in use -- cgit From 0cfcb2b9ef48bbcaf5d43b9f1893f63a938e8176 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Thu, 27 Jan 2022 12:00:55 -0600 Subject: ptrace: Remove arch_syscall_{enter,exit}_tracehook These functions are alwasy one-to-one wrappers around ptrace_report_syscall_entry and ptrace_report_syscall_exit. So directly call the functions they are wrapping instead. Reviewed-by: Kees Cook Link: https://lkml.kernel.org/r/20220309162454.123006-4-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/entry-common.h | 43 ++----------------------------------------- 1 file changed, 2 insertions(+), 41 deletions(-) (limited to 'include/linux') diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index a670e9fba7a9..9efbdda61f7a 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -79,26 +79,6 @@ static __always_inline void arch_check_user_regs(struct pt_regs *regs); static __always_inline void arch_check_user_regs(struct pt_regs *regs) {} #endif -/** - * arch_syscall_enter_tracehook - Wrapper around tracehook_report_syscall_entry() - * @regs: Pointer to currents pt_regs - * - * Returns: 0 on success or an error code to skip the syscall. - * - * Defaults to tracehook_report_syscall_entry(). Can be replaced by - * architecture specific code. - * - * Invoked from syscall_enter_from_user_mode() - */ -static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs); - -#ifndef arch_syscall_enter_tracehook -static inline __must_check int arch_syscall_enter_tracehook(struct pt_regs *regs) -{ - return ptrace_report_syscall_entry(regs); -} -#endif - /** * enter_from_user_mode - Establish state when coming from user mode * @@ -157,7 +137,7 @@ void syscall_enter_from_user_mode_prepare(struct pt_regs *regs); * It handles the following work items: * * 1) syscall_work flag dependent invocations of - * arch_syscall_enter_tracehook(), __secure_computing(), trace_sys_enter() + * ptrace_report_syscall_entry(), __secure_computing(), trace_sys_enter() * 2) Invocation of audit_syscall_entry() */ long syscall_enter_from_user_mode_work(struct pt_regs *regs, long syscall); @@ -279,25 +259,6 @@ static __always_inline void arch_exit_to_user_mode(void) { } */ void arch_do_signal_or_restart(struct pt_regs *regs, bool has_signal); -/** - * arch_syscall_exit_tracehook - Wrapper around tracehook_report_syscall_exit() - * @regs: Pointer to currents pt_regs - * @step: Indicator for single step - * - * Defaults to tracehook_report_syscall_exit(). Can be replaced by - * architecture specific code. - * - * Invoked from syscall_exit_to_user_mode() - */ -static inline void arch_syscall_exit_tracehook(struct pt_regs *regs, bool step); - -#ifndef arch_syscall_exit_tracehook -static inline void arch_syscall_exit_tracehook(struct pt_regs *regs, bool step) -{ - ptrace_report_syscall_exit(regs, step); -} -#endif - /** * exit_to_user_mode - Fixup state when exiting to user mode * @@ -347,7 +308,7 @@ void syscall_exit_to_user_mode_work(struct pt_regs *regs); * - rseq syscall exit * - audit * - syscall tracing - * - tracehook (single stepping) + * - ptrace (single stepping) * * 2) Preparatory work * - Exit to user mode loop (common TIF handling). Invokes -- cgit From c145137dc990fd67b52fbc52faae5ba46f168cca Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Thu, 27 Jan 2022 12:04:27 -0600 Subject: ptrace: Remove tracehook_signal_handler The two line function tracehook_signal_handler is only called from signal_delivered. Expand it inline in signal_delivered and remove it. Just to make it easier to understand what is going on. Reviewed-by: Kees Cook Link: https://lkml.kernel.org/r/20220309162454.123006-5-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/tracehook.h | 17 ----------------- 1 file changed, 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index 819e82ac09bd..b77bf4917196 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -52,23 +52,6 @@ struct linux_binprm; -/** - * tracehook_signal_handler - signal handler setup is complete - * @stepping: nonzero if debugger single-step or block-step in use - * - * Called by the arch code after a signal handler has been set up. - * Register and stack state reflects the user handler about to run. - * Signal mask changes have already been made. - * - * Called without locks, shortly before returning to user mode - * (or handling more signals). - */ -static inline void tracehook_signal_handler(int stepping) -{ - if (stepping) - ptrace_notify(SIGTRAP); -} - /** * set_notify_resume - cause tracehook_notify_resume() to be called * @task: task that will call tracehook_notify_resume() -- cgit From 8ca07e17c9dd4c4afcb4a3f2ea8f0a0d41c0f982 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 28 Jan 2022 13:55:47 -0600 Subject: task_work: Remove unnecessary include from posix_timers.h Break a header file circular dependency by removing the unnecessary include of task_work.h from posix_timers.h. sched.h -> posix-timers.h posix-timers.h -> task_work.h task_work.h -> sched.h Add missing includes of task_work.h to: arch/x86/mm/tlb.c kernel/time/posix-cpu-timers.c Reviewed-by: Kees Cook Link: https://lkml.kernel.org/r/20220309162454.123006-6-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/posix-timers.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/posix-timers.h b/include/linux/posix-timers.h index 5bbcd280bfd2..83539bb2f023 100644 --- a/include/linux/posix-timers.h +++ b/include/linux/posix-timers.h @@ -6,7 +6,6 @@ #include #include #include -#include struct kernel_siginfo; struct task_struct; -- cgit From 7f62d40d9cb50fd146fe8ff071f98fa3c1855083 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Wed, 9 Feb 2022 08:52:41 -0600 Subject: task_work: Introduce task_work_pending Wrap the test of task->task_works in a helper function to make it clear what is being tested. All of the other readers of task->task_work use READ_ONCE and this is even necessary on current as other processes can update task->task_work. So for consistency I have added READ_ONCE into task_work_pending. Reviewed-by: Kees Cook Link: https://lkml.kernel.org/r/20220309162454.123006-7-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/task_work.h | 5 +++++ include/linux/tracehook.h | 4 ++-- 2 files changed, 7 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/task_work.h b/include/linux/task_work.h index 5b8a93f288bb..897494b597ba 100644 --- a/include/linux/task_work.h +++ b/include/linux/task_work.h @@ -19,6 +19,11 @@ enum task_work_notify_mode { TWA_SIGNAL, }; +static inline bool task_work_pending(struct task_struct *task) +{ + return READ_ONCE(task->task_works); +} + int task_work_add(struct task_struct *task, struct callback_head *twork, enum task_work_notify_mode mode); diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index b77bf4917196..fa834a22e86e 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -90,7 +90,7 @@ static inline void tracehook_notify_resume(struct pt_regs *regs) * hlist_add_head(task->task_works); */ smp_mb__after_atomic(); - if (unlikely(current->task_works)) + if (unlikely(task_work_pending(current))) task_work_run(); #ifdef CONFIG_KEYS_REQUEST_CACHE @@ -115,7 +115,7 @@ static inline void tracehook_notify_signal(void) { clear_thread_flag(TIF_NOTIFY_SIGNAL); smp_mb__after_atomic(); - if (current->task_works) + if (task_work_pending(current)) task_work_run(); } -- cgit From 3b5d4ddf8fe1f60082513f94bae586ac80188a03 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Wed, 9 Mar 2022 01:04:50 -0800 Subject: bpf: net: Remove TC_AT_INGRESS_OFFSET and SKB_MONO_DELIVERY_TIME_OFFSET macro This patch removes the TC_AT_INGRESS_OFFSET and SKB_MONO_DELIVERY_TIME_OFFSET macros. Instead, PKT_VLAN_PRESENT_OFFSET is used because all of them are at the same offset. Comment is added to make it clear that changing the position of tc_at_ingress or mono_delivery_time will require to adjust the defined macros. The earlier discussion can be found here: https://lore.kernel.org/bpf/419d994e-ff61-7c11-0ec7-11fefcb0186e@iogearbox.net/ Signed-off-by: Martin KaFai Lau Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20220309090450.3710955-1-kafai@fb.com --- include/linux/skbuff.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2be263184d1e..7b0cb10e70cb 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -960,10 +960,10 @@ struct sk_buff { __u8 csum_complete_sw:1; __u8 csum_level:2; __u8 dst_pending_confirm:1; - __u8 mono_delivery_time:1; + __u8 mono_delivery_time:1; /* See SKB_MONO_DELIVERY_TIME_MASK */ #ifdef CONFIG_NET_CLS_ACT __u8 tc_skip_classify:1; - __u8 tc_at_ingress:1; + __u8 tc_at_ingress:1; /* See TC_AT_INGRESS_MASK */ #endif #ifdef CONFIG_IPV6_NDISC_NODETYPE __u8 ndisc_nodetype:2; @@ -1062,7 +1062,9 @@ struct sk_buff { #endif #define PKT_TYPE_OFFSET offsetof(struct sk_buff, __pkt_type_offset) -/* if you move pkt_vlan_present around you also must adapt these constants */ +/* if you move pkt_vlan_present, tc_at_ingress, or mono_delivery_time + * around, you also must adapt these constants. + */ #ifdef __BIG_ENDIAN_BITFIELD #define PKT_VLAN_PRESENT_BIT 7 #define TC_AT_INGRESS_MASK (1 << 0) @@ -1073,8 +1075,6 @@ struct sk_buff { #define SKB_MONO_DELIVERY_TIME_MASK (1 << 5) #endif #define PKT_VLAN_PRESENT_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset) -#define TC_AT_INGRESS_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset) -#define SKB_MONO_DELIVERY_TIME_OFFSET offsetof(struct sk_buff, __pkt_vlan_present_offset) #ifdef __KERNEL__ /* -- cgit From 9bb984f28d5bcb917d35d930fcfb89f90f9449fd Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Wed, 9 Mar 2022 01:05:09 -0800 Subject: bpf: Remove BPF_SKB_DELIVERY_TIME_NONE and rename s/delivery_time_/tstamp_/ This patch is to simplify the uapi bpf.h regarding to the tstamp type and use a similar way as the kernel to describe the value stored in __sk_buff->tstamp. My earlier thought was to avoid describing the semantic and clock base for the rcv timestamp until there is more clarity on the use case, so the __sk_buff->delivery_time_type naming instead of __sk_buff->tstamp_type. With some thoughts, it can reuse the UNSPEC naming. This patch first removes BPF_SKB_DELIVERY_TIME_NONE and also rename BPF_SKB_DELIVERY_TIME_UNSPEC to BPF_SKB_TSTAMP_UNSPEC and BPF_SKB_DELIVERY_TIME_MONO to BPF_SKB_TSTAMP_DELIVERY_MONO. The semantic of BPF_SKB_TSTAMP_DELIVERY_MONO is the same: __sk_buff->tstamp has delivery time in mono clock base. BPF_SKB_TSTAMP_UNSPEC means __sk_buff->tstamp has the (rcv) tstamp at ingress and the delivery time at egress. At egress, the clock base could be found from skb->sk->sk_clockid. __sk_buff->tstamp == 0 naturally means NONE, so NONE is not needed. With BPF_SKB_TSTAMP_UNSPEC for the rcv tstamp at ingress, the __sk_buff->delivery_time_type is also renamed to __sk_buff->tstamp_type which was also suggested in the earlier discussion: https://lore.kernel.org/bpf/b181acbe-caf8-502d-4b7b-7d96b9fc5d55@iogearbox.net/ The above will then make __sk_buff->tstamp and __sk_buff->tstamp_type the same as its kernel skb->tstamp and skb->mono_delivery_time counter part. The internal kernel function bpf_skb_convert_dtime_type_read() is then renamed to bpf_skb_convert_tstamp_type_read() and it can be simplified with the BPF_SKB_DELIVERY_TIME_NONE gone. A BPF_ALU32_IMM(BPF_AND) insn is also saved by using BPF_JMP32_IMM(BPF_JSET). The bpf helper bpf_skb_set_delivery_time() is also renamed to bpf_skb_set_tstamp(). The arg name is changed from dtime to tstamp also. It only allows setting tstamp 0 for BPF_SKB_TSTAMP_UNSPEC and it could be relaxed later if there is use case to change mono delivery time to non mono. prog->delivery_time_access is also renamed to prog->tstamp_type_access. Signed-off-by: Martin KaFai Lau Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20220309090509.3712315-1-kafai@fb.com --- include/linux/filter.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/filter.h b/include/linux/filter.h index 9bf26307247f..05ed9bd31b45 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -573,7 +573,7 @@ struct bpf_prog { enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */ call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */ call_get_func_ip:1, /* Do we call get_func_ip() */ - delivery_time_access:1; /* Accessed __sk_buff->delivery_time_type */ + tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */ enum bpf_prog_type type; /* Type of BPF program */ enum bpf_attach_type expected_attach_type; /* For some prog types */ u32 len; /* Number of filter blocks */ -- cgit From 26183cfe478c1d1d5cd1e3920a4b2c5b1980849d Mon Sep 17 00:00:00 2001 From: Colin Foster Date: Tue, 8 Mar 2022 22:25:44 -0800 Subject: net: phy: correct spelling error of media in documentation The header file incorrectly referenced "median-independant interface" instead of media. Correct this typo. Signed-off-by: Colin Foster Fixes: 4069a572d423 ("net: phy: Document core PHY structures") Reviewed-by: Russell King (Oracle) Link: https://lore.kernel.org/r/20220309062544.3073-1-colin.foster@in-advantage.com Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 6de8d7a90d78..8fa70ba063a5 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -87,8 +87,8 @@ extern const int phy_10gbit_features_array[1]; * * @PHY_INTERFACE_MODE_NA: Not Applicable - don't touch * @PHY_INTERFACE_MODE_INTERNAL: No interface, MAC and PHY combined - * @PHY_INTERFACE_MODE_MII: Median-independent interface - * @PHY_INTERFACE_MODE_GMII: Gigabit median-independent interface + * @PHY_INTERFACE_MODE_MII: Media-independent interface + * @PHY_INTERFACE_MODE_GMII: Gigabit media-independent interface * @PHY_INTERFACE_MODE_SGMII: Serial gigabit media-independent interface * @PHY_INTERFACE_MODE_TBI: Ten Bit Interface * @PHY_INTERFACE_MODE_REVMII: Reverse Media Independent Interface -- cgit From 8ba62d37949e248c698c26e0d82d72fda5d33ebf Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Wed, 9 Feb 2022 09:51:14 -0600 Subject: task_work: Call tracehook_notify_signal from get_signal on all architectures Always handle TIF_NOTIFY_SIGNAL in get_signal. With commit 35d0b389f3b2 ("task_work: unconditionally run task_work from get_signal()") always calling task_work_run all of the work of tracehook_notify_signal is already happening except clearing TIF_NOTIFY_SIGNAL. Factor clear_notify_signal out of tracehook_notify_signal and use it in get_signal so that get_signal only needs one call of task_work_run. To keep the semantics in sync update xfer_to_guest_mode_work (which does not call get_signal) to call tracehook_notify_signal if either _TIF_SIGPENDING or _TIF_NOTIFY_SIGNAL. Reviewed-by: Kees Cook Link: https://lkml.kernel.org/r/20220309162454.123006-8-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/entry-common.h | 2 +- include/linux/tracehook.h | 9 +++++++-- 2 files changed, 8 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index 9efbdda61f7a..3537fd25f14e 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -257,7 +257,7 @@ static __always_inline void arch_exit_to_user_mode(void) { } * * Invoked from exit_to_user_mode_loop(). */ -void arch_do_signal_or_restart(struct pt_regs *regs, bool has_signal); +void arch_do_signal_or_restart(struct pt_regs *regs); /** * exit_to_user_mode - Fixup state when exiting to user mode diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index fa834a22e86e..b44a7820c468 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -106,6 +106,12 @@ static inline void tracehook_notify_resume(struct pt_regs *regs) rseq_handle_notify_resume(NULL, regs); } +static inline void clear_notify_signal(void) +{ + clear_thread_flag(TIF_NOTIFY_SIGNAL); + smp_mb__after_atomic(); +} + /* * called by exit_to_user_mode_loop() if ti_work & _TIF_NOTIFY_SIGNAL. This * is currently used by TWA_SIGNAL based task_work, which requires breaking @@ -113,8 +119,7 @@ static inline void tracehook_notify_resume(struct pt_regs *regs) */ static inline void tracehook_notify_signal(void) { - clear_thread_flag(TIF_NOTIFY_SIGNAL); - smp_mb__after_atomic(); + clear_notify_signal(); if (task_work_pending(current)) task_work_run(); } -- cgit From 7c5d8fa6fbb12a3f0eefe8762bfede508e147cb3 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Wed, 9 Feb 2022 11:18:54 -0600 Subject: task_work: Decouple TIF_NOTIFY_SIGNAL and task_work There are a small handful of reasons besides pending signals that the kernel might want to break out of interruptible sleeps. The flag TIF_NOTIFY_SIGNAL and the helpers that set and clear TIF_NOTIFY_SIGNAL provide that the infrastructure for breaking out of interruptible sleeps and entering the return to user space slow path for those cases. Expand tracehook_notify_signal inline in it's callers and remove it, which makes clear that TIF_NOTIFY_SIGNAL and task_work are separate concepts. Update the comment on set_notify_signal to more accurately describe it's purpose. Reviewed-by: Kees Cook Link: https://lkml.kernel.org/r/20220309162454.123006-9-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/tracehook.h | 15 ++------------- 1 file changed, 2 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index b44a7820c468..e5d676e841e3 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -113,19 +113,8 @@ static inline void clear_notify_signal(void) } /* - * called by exit_to_user_mode_loop() if ti_work & _TIF_NOTIFY_SIGNAL. This - * is currently used by TWA_SIGNAL based task_work, which requires breaking - * wait loops to ensure that task_work is noticed and run. - */ -static inline void tracehook_notify_signal(void) -{ - clear_notify_signal(); - if (task_work_pending(current)) - task_work_run(); -} - -/* - * Called when we have work to process from exit_to_user_mode_loop() + * Called to break out of interruptible wait loops, and enter the + * exit_to_user_mode_loop(). */ static inline void set_notify_signal(struct task_struct *task) { -- cgit From 593febb143d17aa2096cd123c9d62b6981eb1d97 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Wed, 9 Feb 2022 11:46:54 -0600 Subject: signal: Move set_notify_signal and clear_notify_signal into sched/signal.h The header tracehook.h is no place for code to live. The functions set_notify_signal and clear_notify_signal are not about signals. They are about interruptions that act like signals. The fundamental signal primitives wind up calling set_notify_signal and clear_notify_signal. Which means they need to be maintained with the signal code. Since set_notify_signal and clear_notify_signal must be maintained with the signal subsystem move them into sched/signal.h and claim them as part of the signal subsystem. Reviewed-by: Kees Cook Link: https://lkml.kernel.org/r/20220309162454.123006-10-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/sched/signal.h | 17 +++++++++++++++++ include/linux/tracehook.h | 17 ----------------- 2 files changed, 17 insertions(+), 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index b6ecb9fc4cd2..3c8b34876744 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -349,6 +349,23 @@ extern void sigqueue_free(struct sigqueue *); extern int send_sigqueue(struct sigqueue *, struct pid *, enum pid_type); extern int do_sigaction(int, struct k_sigaction *, struct k_sigaction *); +static inline void clear_notify_signal(void) +{ + clear_thread_flag(TIF_NOTIFY_SIGNAL); + smp_mb__after_atomic(); +} + +/* + * Called to break out of interruptible wait loops, and enter the + * exit_to_user_mode_loop(). + */ +static inline void set_notify_signal(struct task_struct *task) +{ + if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) && + !wake_up_state(task, TASK_INTERRUPTIBLE)) + kick_process(task); +} + static inline int restart_syscall(void) { set_tsk_thread_flag(current, TIF_SIGPENDING); diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index e5d676e841e3..1b7365aef8da 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -106,21 +106,4 @@ static inline void tracehook_notify_resume(struct pt_regs *regs) rseq_handle_notify_resume(NULL, regs); } -static inline void clear_notify_signal(void) -{ - clear_thread_flag(TIF_NOTIFY_SIGNAL); - smp_mb__after_atomic(); -} - -/* - * Called to break out of interruptible wait loops, and enter the - * exit_to_user_mode_loop(). - */ -static inline void set_notify_signal(struct task_struct *task) -{ - if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) && - !wake_up_state(task, TASK_INTERRUPTIBLE)) - kick_process(task); -} - #endif /* */ -- cgit From d3c51a0c8944e3a5bba458358b4c1f9ae2de0133 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 28 Jan 2022 14:36:58 -0600 Subject: resume_user_mode: Remove #ifdef TIF_NOTIFY_RESUME in set_notify_resume Every architecture defines TIF_NOTIFY_RESUME so remove the unnecessary ifdef. Reviewed-by: Kees Cook Link: https://lkml.kernel.org/r/20220309162454.123006-11-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/tracehook.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index 1b7365aef8da..946404ebe10b 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -63,10 +63,8 @@ struct linux_binprm; */ static inline void set_notify_resume(struct task_struct *task) { -#ifdef TIF_NOTIFY_RESUME if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_RESUME)) kick_process(task); -#endif } /** -- cgit From 03248addadf1a5ef0a03cbcd5ec905b49adb9658 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Wed, 9 Feb 2022 12:20:45 -0600 Subject: resume_user_mode: Move to resume_user_mode.h Move set_notify_resume and tracehook_notify_resume into resume_user_mode.h. While doing that rename tracehook_notify_resume to resume_user_mode_work. Update all of the places that included tracehook.h for these functions to include resume_user_mode.h instead. Update all of the callers of tracehook_notify_resume to call resume_user_mode_work. Reviewed-by: Kees Cook Link: https://lkml.kernel.org/r/20220309162454.123006-12-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/entry-kvm.h | 2 +- include/linux/resume_user_mode.h | 64 ++++++++++++++++++++++++++++++++++++++++ include/linux/tracehook.h | 51 -------------------------------- 3 files changed, 65 insertions(+), 52 deletions(-) create mode 100644 include/linux/resume_user_mode.h (limited to 'include/linux') diff --git a/include/linux/entry-kvm.h b/include/linux/entry-kvm.h index 07c878d6e323..6813171afccb 100644 --- a/include/linux/entry-kvm.h +++ b/include/linux/entry-kvm.h @@ -3,7 +3,7 @@ #define __LINUX_ENTRYKVM_H #include -#include +#include #include #include #include diff --git a/include/linux/resume_user_mode.h b/include/linux/resume_user_mode.h new file mode 100644 index 000000000000..285189454449 --- /dev/null +++ b/include/linux/resume_user_mode.h @@ -0,0 +1,64 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#ifndef LINUX_RESUME_USER_MODE_H +#define LINUX_RESUME_USER_MODE_H + +#include +#include +#include +#include + +/** + * set_notify_resume - cause resume_user_mode_work() to be called + * @task: task that will call resume_user_mode_work() + * + * Calling this arranges that @task will call resume_user_mode_work() + * before returning to user mode. If it's already running in user mode, + * it will enter the kernel and call resume_user_mode_work() soon. + * If it's blocked, it will not be woken. + */ +static inline void set_notify_resume(struct task_struct *task) +{ + if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_RESUME)) + kick_process(task); +} + + +/** + * resume_user_mode_work - Perform work before returning to user mode + * @regs: user-mode registers of @current task + * + * This is called when %TIF_NOTIFY_RESUME has been set. Now we are + * about to return to user mode, and the user state in @regs can be + * inspected or adjusted. The caller in arch code has cleared + * %TIF_NOTIFY_RESUME before the call. If the flag gets set again + * asynchronously, this will be called again before we return to + * user mode. + * + * Called without locks. + */ +static inline void resume_user_mode_work(struct pt_regs *regs) +{ + clear_thread_flag(TIF_NOTIFY_RESUME); + /* + * This barrier pairs with task_work_add()->set_notify_resume() after + * hlist_add_head(task->task_works); + */ + smp_mb__after_atomic(); + if (unlikely(task_work_pending(current))) + task_work_run(); + +#ifdef CONFIG_KEYS_REQUEST_CACHE + if (unlikely(current->cached_requested_key)) { + key_put(current->cached_requested_key); + current->cached_requested_key = NULL; + } +#endif + + mem_cgroup_handle_over_high(); + blkcg_maybe_throttle_current(); + + rseq_handle_notify_resume(NULL, regs); +} + +#endif /* LINUX_RESUME_USER_MODE_H */ diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h index 946404ebe10b..9f6b3fd1880a 100644 --- a/include/linux/tracehook.h +++ b/include/linux/tracehook.h @@ -52,56 +52,5 @@ struct linux_binprm; -/** - * set_notify_resume - cause tracehook_notify_resume() to be called - * @task: task that will call tracehook_notify_resume() - * - * Calling this arranges that @task will call tracehook_notify_resume() - * before returning to user mode. If it's already running in user mode, - * it will enter the kernel and call tracehook_notify_resume() soon. - * If it's blocked, it will not be woken. - */ -static inline void set_notify_resume(struct task_struct *task) -{ - if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_RESUME)) - kick_process(task); -} - -/** - * tracehook_notify_resume - report when about to return to user mode - * @regs: user-mode registers of @current task - * - * This is called when %TIF_NOTIFY_RESUME has been set. Now we are - * about to return to user mode, and the user state in @regs can be - * inspected or adjusted. The caller in arch code has cleared - * %TIF_NOTIFY_RESUME before the call. If the flag gets set again - * asynchronously, this will be called again before we return to - * user mode. - * - * Called without locks. - */ -static inline void tracehook_notify_resume(struct pt_regs *regs) -{ - clear_thread_flag(TIF_NOTIFY_RESUME); - /* - * This barrier pairs with task_work_add()->set_notify_resume() after - * hlist_add_head(task->task_works); - */ - smp_mb__after_atomic(); - if (unlikely(task_work_pending(current))) - task_work_run(); - -#ifdef CONFIG_KEYS_REQUEST_CACHE - if (unlikely(current->cached_requested_key)) { - key_put(current->cached_requested_key); - current->cached_requested_key = NULL; - } -#endif - - mem_cgroup_handle_over_high(); - blkcg_maybe_throttle_current(); - - rseq_handle_notify_resume(NULL, regs); -} #endif /* */ -- cgit From 355f841a3f8ca980c9682937a5257d3a1f6fc09d Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Wed, 9 Feb 2022 12:47:08 -0600 Subject: tracehook: Remove tracehook.h Now that all of the definitions have moved out of tracehook.h into ptrace.h, sched/signal.h, resume_user_mode.h there is nothing left in tracehook.h so remove it. Update the few files that were depending upon tracehook.h to bring in definitions to use the headers they need directly. Reviewed-by: Kees Cook Link: https://lkml.kernel.org/r/20220309162454.123006-13-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/tracehook.h | 56 ----------------------------------------------- 1 file changed, 56 deletions(-) delete mode 100644 include/linux/tracehook.h (limited to 'include/linux') diff --git a/include/linux/tracehook.h b/include/linux/tracehook.h deleted file mode 100644 index 9f6b3fd1880a..000000000000 --- a/include/linux/tracehook.h +++ /dev/null @@ -1,56 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Tracing hooks - * - * Copyright (C) 2008-2009 Red Hat, Inc. All rights reserved. - * - * This file defines hook entry points called by core code where - * user tracing/debugging support might need to do something. These - * entry points are called tracehook_*(). Each hook declared below - * has a detailed kerneldoc comment giving the context (locking et - * al) from which it is called, and the meaning of its return value. - * - * Each function here typically has only one call site, so it is ok - * to have some nontrivial tracehook_*() inlines. In all cases, the - * fast path when no tracing is enabled should be very short. - * - * The purpose of this file and the tracehook_* layer is to consolidate - * the interface that the kernel core and arch code uses to enable any - * user debugging or tracing facility (such as ptrace). The interfaces - * here are carefully documented so that maintainers of core and arch - * code do not need to think about the implementation details of the - * tracing facilities. Likewise, maintainers of the tracing code do not - * need to understand all the calling core or arch code in detail, just - * documented circumstances of each call, such as locking conditions. - * - * If the calling core code changes so that locking is different, then - * it is ok to change the interface documented here. The maintainer of - * core code changing should notify the maintainers of the tracing code - * that they need to work out the change. - * - * Some tracehook_*() inlines take arguments that the current tracing - * implementations might not necessarily use. These function signatures - * are chosen to pass in all the information that is on hand in the - * caller and might conceivably be relevant to a tracer, so that the - * core code won't have to be updated when tracing adds more features. - * If a call site changes so that some of those parameters are no longer - * already on hand without extra work, then the tracehook_* interface - * can change so there is no make-work burden on the core code. The - * maintainer of core code changing should notify the maintainers of the - * tracing code that they need to work out the change. - */ - -#ifndef _LINUX_TRACEHOOK_H -#define _LINUX_TRACEHOOK_H 1 - -#include -#include -#include -#include -#include -#include -struct linux_binprm; - - - -#endif /* */ -- cgit From 6789ab9668d920d7be636cb994d85be9a816f9d5 Mon Sep 17 00:00:00 2001 From: Hao Luo Date: Thu, 10 Mar 2022 13:16:55 -0800 Subject: compiler_types: Refactor the use of btf_type_tag attribute. Previous patches have introduced the compiler attribute btf_type_tag for __user and __percpu. The availability of this attribute depends on some CONFIGs and compiler support. This patch refactors the use of btf_type_tag by introducing BTF_TYPE_TAG, which hides all the dependencies. No functional change. Suggested-by: Andrii Nakryiko Signed-off-by: Hao Luo Signed-off-by: Alexei Starovoitov Acked-by: Yonghong Song Link: https://lore.kernel.org/bpf/20220310211655.3173786-1-haoluo@google.com --- include/linux/compiler_types.h | 19 +++++++++---------- 1 file changed, 9 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h index b9a8ae9440c7..1bc760ba400c 100644 --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -4,6 +4,13 @@ #ifndef __ASSEMBLY__ +#if defined(CONFIG_DEBUG_INFO_BTF) && defined(CONFIG_PAHOLE_HAS_BTF_TAG) && \ + __has_attribute(btf_type_tag) +# define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value))) +#else +# define BTF_TYPE_TAG(value) /* nothing */ +#endif + #ifdef __CHECKER__ /* address spaces */ # define __kernel __attribute__((address_space(0))) @@ -31,19 +38,11 @@ static inline void __chk_io_ptr(const volatile void __iomem *ptr) { } # define __kernel # ifdef STRUCTLEAK_PLUGIN # define __user __attribute__((user)) -# elif defined(CONFIG_DEBUG_INFO_BTF) && defined(CONFIG_PAHOLE_HAS_BTF_TAG) && \ - __has_attribute(btf_type_tag) -# define __user __attribute__((btf_type_tag("user"))) # else -# define __user +# define __user BTF_TYPE_TAG(user) # endif # define __iomem -# if defined(CONFIG_DEBUG_INFO_BTF) && defined(CONFIG_PAHOLE_HAS_BTF_TAG) && \ - __has_attribute(btf_type_tag) -# define __percpu __attribute__((btf_type_tag("percpu"))) -# else -# define __percpu -# endif +# define __percpu BTF_TYPE_TAG(percpu) # define __rcu # define __chk_user_ptr(x) (void)0 # define __chk_io_ptr(x) (void)0 -- cgit From 271907ee2f29cd1078fd219f0778fd824fb1971c Mon Sep 17 00:00:00 2001 From: Gal Pressman Date: Mon, 17 Jan 2022 15:14:44 +0200 Subject: net/mlx5: Query the maximum MCIA register read size from firmware The MCIA register supports either 12 or 32 dwords, use the correct value by querying the capability from the MCAM register. Signed-off-by: Gal Pressman Reviewed-by: Maxim Mikityanskiy Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 4 +++- include/linux/mlx5/port.h | 1 - 2 files changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 318fae4b3560..745107ff681d 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -9691,7 +9691,9 @@ struct mlx5_ifc_pcam_reg_bits { }; struct mlx5_ifc_mcam_enhanced_features_bits { - u8 reserved_at_0[0x6a]; + u8 reserved_at_0[0x5d]; + u8 mcia_32dwords[0x1]; + u8 reserved_at_5e[0xc]; u8 reset_state[0x1]; u8 ptpcyc2realtime_modify[0x1]; u8 reserved_at_6c[0x2]; diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h index 77ea4f9c5265..402413b3e914 100644 --- a/include/linux/mlx5/port.h +++ b/include/linux/mlx5/port.h @@ -56,7 +56,6 @@ enum mlx5_an_status { MLX5_AN_LINK_DOWN = 4, }; -#define MLX5_EEPROM_MAX_BYTES 32 #define MLX5_EEPROM_IDENTIFIER_BYTE_MASK 0x000000ff #define MLX5_I2C_ADDR_LOW 0x50 #define MLX5_I2C_ADDR_HIGH 0x51 -- cgit From fcb610a86c53dfcfbb2aa62e704481112752f367 Mon Sep 17 00:00:00 2001 From: Gal Pressman Date: Mon, 17 Jan 2022 15:53:06 +0200 Subject: net/mlx5: Parse module mapping using mlx5_ifc The assumption that the first byte in the module mapping dword is the module number shouldn't be hard-coded in the driver, but come from mlx5_ifc structs. While at it, fix the incorrect width for the 'rx_lane' and 'tx_lane' fields. Signed-off-by: Gal Pressman Reviewed-by: Maxim Mikityanskiy Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 8 ++++---- include/linux/mlx5/port.h | 1 - 2 files changed, 4 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 745107ff681d..91b7f730ed91 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -9888,10 +9888,10 @@ struct mlx5_ifc_pcmr_reg_bits { }; struct mlx5_ifc_lane_2_module_mapping_bits { - u8 reserved_at_0[0x6]; - u8 rx_lane[0x2]; - u8 reserved_at_8[0x6]; - u8 tx_lane[0x2]; + u8 reserved_at_0[0x4]; + u8 rx_lane[0x4]; + u8 reserved_at_8[0x4]; + u8 tx_lane[0x4]; u8 reserved_at_10[0x8]; u8 module[0x8]; }; diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h index 402413b3e914..28a928b0684b 100644 --- a/include/linux/mlx5/port.h +++ b/include/linux/mlx5/port.h @@ -56,7 +56,6 @@ enum mlx5_an_status { MLX5_AN_LINK_DOWN = 4, }; -#define MLX5_EEPROM_IDENTIFIER_BYTE_MASK 0x000000ff #define MLX5_I2C_ADDR_LOW 0x50 #define MLX5_I2C_ADDR_HIGH 0x51 #define MLX5_EEPROM_PAGE_LENGTH 256 -- cgit From 286f950545e0d9c8aa802cbfc9676860bbc49179 Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Wed, 16 Feb 2022 15:21:58 +0530 Subject: coresight: Drop unused 'none' enum value for each component CORESIGHT_DEV_TYPE_NONE/CORESIGHT_DEV_SUBTYPE_XXXX_NONE values are not used any where. Actual enumeration can start from 0. Just drop these unused enum values. Cc: Mathieu Poirier Cc: Suzuki K Poulose Cc: Mike Leach Cc: Leo Yan Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Anshuman Khandual Link: https://lore.kernel.org/r/1645005118-10561-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Mathieu Poirier Signed-off-by: Suzuki K Poulose --- include/linux/coresight.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/coresight.h b/include/linux/coresight.h index 93a2922b7653..9f445f09fcfe 100644 --- a/include/linux/coresight.h +++ b/include/linux/coresight.h @@ -36,7 +36,6 @@ extern struct bus_type coresight_bustype; enum coresight_dev_type { - CORESIGHT_DEV_TYPE_NONE, CORESIGHT_DEV_TYPE_SINK, CORESIGHT_DEV_TYPE_LINK, CORESIGHT_DEV_TYPE_LINKSINK, @@ -46,7 +45,6 @@ enum coresight_dev_type { }; enum coresight_dev_subtype_sink { - CORESIGHT_DEV_SUBTYPE_SINK_NONE, CORESIGHT_DEV_SUBTYPE_SINK_PORT, CORESIGHT_DEV_SUBTYPE_SINK_BUFFER, CORESIGHT_DEV_SUBTYPE_SINK_SYSMEM, @@ -54,21 +52,18 @@ enum coresight_dev_subtype_sink { }; enum coresight_dev_subtype_link { - CORESIGHT_DEV_SUBTYPE_LINK_NONE, CORESIGHT_DEV_SUBTYPE_LINK_MERG, CORESIGHT_DEV_SUBTYPE_LINK_SPLIT, CORESIGHT_DEV_SUBTYPE_LINK_FIFO, }; enum coresight_dev_subtype_source { - CORESIGHT_DEV_SUBTYPE_SOURCE_NONE, CORESIGHT_DEV_SUBTYPE_SOURCE_PROC, CORESIGHT_DEV_SUBTYPE_SOURCE_BUS, CORESIGHT_DEV_SUBTYPE_SOURCE_SOFTWARE, }; enum coresight_dev_subtype_helper { - CORESIGHT_DEV_SUBTYPE_HELPER_NONE, CORESIGHT_DEV_SUBTYPE_HELPER_CATU, }; -- cgit From 3a73333fb370f7b65de9d94c53df503642bda789 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Google)" Date: Thu, 3 Mar 2022 17:05:34 -0500 Subject: tracing: Add TRACE_CUSTOM_EVENT() macro To make it really easy to add custom events from modules, add a TRACE_CUSTOM_EVENT() macro that acts just like the TRACE_EVENT() macro, but creates a custom event to an already existing tracepoint. The trace_custom_sched.[ch] has been updated to use this new macro to show how simple it is. Link: https://lkml.kernel.org/r/20220303220625.738622494@goodmis.org Cc: Ingo Molnar Cc: Andrew Morton Cc: Joel Fernandes Cc: Peter Zijlstra Cc: Masami Hiramatsu Cc: Tom Zanussi Signed-off-by: Steven Rostedt (Google) --- include/linux/trace_events.h | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 70c069aef02c..9b09fd633d48 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -315,6 +315,7 @@ enum { TRACE_EVENT_FL_KPROBE_BIT, TRACE_EVENT_FL_UPROBE_BIT, TRACE_EVENT_FL_EPROBE_BIT, + TRACE_EVENT_FL_CUSTOM_BIT, }; /* @@ -328,6 +329,9 @@ enum { * KPROBE - Event is a kprobe * UPROBE - Event is a uprobe * EPROBE - Event is an event probe + * CUSTOM - Event is a custom event (to be attached to an exsiting tracepoint) + * This is set when the custom event has not been attached + * to a tracepoint yet, then it is cleared when it is. */ enum { TRACE_EVENT_FL_FILTERED = (1 << TRACE_EVENT_FL_FILTERED_BIT), @@ -339,6 +343,7 @@ enum { TRACE_EVENT_FL_KPROBE = (1 << TRACE_EVENT_FL_KPROBE_BIT), TRACE_EVENT_FL_UPROBE = (1 << TRACE_EVENT_FL_UPROBE_BIT), TRACE_EVENT_FL_EPROBE = (1 << TRACE_EVENT_FL_EPROBE_BIT), + TRACE_EVENT_FL_CUSTOM = (1 << TRACE_EVENT_FL_CUSTOM_BIT), }; #define TRACE_EVENT_FL_UKPROBE (TRACE_EVENT_FL_KPROBE | TRACE_EVENT_FL_UPROBE) @@ -440,7 +445,9 @@ static inline bool bpf_prog_array_valid(struct trace_event_call *call) static inline const char * trace_event_name(struct trace_event_call *call) { - if (call->flags & TRACE_EVENT_FL_TRACEPOINT) + if (call->flags & TRACE_EVENT_FL_CUSTOM) + return call->name; + else if (call->flags & TRACE_EVENT_FL_TRACEPOINT) return call->tp ? call->tp->name : NULL; else return call->name; @@ -901,3 +908,18 @@ perf_trace_buf_submit(void *raw_data, int size, int rctx, u16 type, #endif #endif /* _LINUX_TRACE_EVENT_H */ + +/* + * Note: we keep the TRACE_CUSTOM_EVENT outside the include file ifdef protection. + * This is due to the way trace custom events work. If a file includes two + * trace event headers under one "CREATE_CUSTOM_TRACE_EVENTS" the first include + * will override the TRACE_CUSTOM_EVENT and break the second include. + */ + +#ifndef TRACE_CUSTOM_EVENT + +#define DECLARE_CUSTOM_EVENT_CLASS(name, proto, args, tstruct, assign, print) +#define DEFINE_CUSTOM_EVENT(template, name, proto, args) +#define TRACE_CUSTOM_EVENT(name, proto, args, struct, assign, print) + +#endif /* ifdef TRACE_CUSTOM_EVENT (see note above) */ -- cgit From 380af29b8d7670c445965bd573ab219aff0c4c11 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Google)" Date: Thu, 10 Mar 2022 21:37:09 -0500 Subject: tracing: Add snapshot at end of kernel boot up Add ftrace_boot_snapshot kernel parameter that will take a snapshot at the end of boot up just before switching over to user space (it happens during the kernel freeing of init memory). This is useful when there's interesting data that can be collected from kernel start up, but gets overridden by user space start up code. With this option, the ring buffer content from the boot up traces gets saved in the snapshot at the end of boot up. This trace can be read from: /sys/kernel/tracing/snapshot Signed-off-by: Steven Rostedt (Google) --- include/linux/ftrace.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 9999e29187de..37b619185ec9 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -30,6 +30,12 @@ #define ARCH_SUPPORTS_FTRACE_OPS 0 #endif +#ifdef CONFIG_TRACING +extern void ftrace_boot_snapshot(void); +#else +static inline void ftrace_boot_snapshot(void) { } +#endif + #ifdef CONFIG_FUNCTION_TRACER struct ftrace_ops; struct ftrace_regs; @@ -215,7 +221,10 @@ struct ftrace_ops_hash { void ftrace_free_init_mem(void); void ftrace_free_mem(struct module *mod, void *start, void *end); #else -static inline void ftrace_free_init_mem(void) { } +static inline void ftrace_free_init_mem(void) +{ + ftrace_boot_snapshot(); +} static inline void ftrace_free_mem(struct module *mod, void *start, void *end) { } #endif -- cgit From b65700d046a60d0a29f6289e13c62d0c93689f11 Mon Sep 17 00:00:00 2001 From: Suman Anna Date: Tue, 8 Mar 2022 18:25:15 +0100 Subject: remoteproc: move rproc_da_to_va declaration to remoteproc.h The rproc_da_to_va() API is an exported function, so move its declaration from the remoteproc local remoteproc_internal.h to the public remoteproc.h file. This will allow drivers outside of the remoteproc folder to be able to use this API. Signed-off-by: Suman Anna Signed-off-by: Dave Gerlach [adjusted line numbers to apply] Signed-off-by: Drew Fustini Signed-off-by: Bjorn Andersson Link: https://lore.kernel.org/r/20220308172515.29556-1-dfustini@baylibre.com --- include/linux/remoteproc.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h index 93a1d0050fbc..b2ee325e0af1 100644 --- a/include/linux/remoteproc.h +++ b/include/linux/remoteproc.h @@ -675,6 +675,7 @@ void rproc_shutdown(struct rproc *rproc); int rproc_detach(struct rproc *rproc); int rproc_set_firmware(struct rproc *rproc, const char *fw_name); void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type); +void *rproc_da_to_va(struct rproc *rproc, u64 da, size_t len, bool *is_iomem); void rproc_coredump_using_sections(struct rproc *rproc); int rproc_coredump_add_segment(struct rproc *rproc, dma_addr_t da, size_t size); int rproc_coredump_add_custom_segment(struct rproc *rproc, -- cgit From c993ee0f9f81caf5767a50d1faeba39a0dc82af2 Mon Sep 17 00:00:00 2001 From: David Howells Date: Fri, 11 Mar 2022 13:23:31 +0000 Subject: watch_queue: Fix filter limit check In watch_queue_set_filter(), there are a couple of places where we check that the filter type value does not exceed what the type_filter bitmap can hold. One place calculates the number of bits by: if (tf[i].type >= sizeof(wfilter->type_filter) * 8) which is fine, but the second does: if (tf[i].type >= sizeof(wfilter->type_filter) * BITS_PER_LONG) which is not. This can lead to a couple of out-of-bounds writes due to a too-large type: (1) __set_bit() on wfilter->type_filter (2) Writing more elements in wfilter->filters[] than we allocated. Fix this by just using the proper WATCH_TYPE__NR instead, which is the number of types we actually know about. The bug may cause an oops looking something like: BUG: KASAN: slab-out-of-bounds in watch_queue_set_filter+0x659/0x740 Write of size 4 at addr ffff88800d2c66bc by task watch_queue_oob/611 ... Call Trace: dump_stack_lvl+0x45/0x59 print_address_description.constprop.0+0x1f/0x150 ... kasan_report.cold+0x7f/0x11b ... watch_queue_set_filter+0x659/0x740 ... __x64_sys_ioctl+0x127/0x190 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae Allocated by task 611: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 watch_queue_set_filter+0x23a/0x740 __x64_sys_ioctl+0x127/0x190 do_syscall_64+0x43/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae The buggy address belongs to the object at ffff88800d2c66a0 which belongs to the cache kmalloc-32 of size 32 The buggy address is located 28 bytes inside of 32-byte region [ffff88800d2c66a0, ffff88800d2c66c0) Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: Jann Horn Signed-off-by: David Howells Signed-off-by: Linus Torvalds --- include/linux/watch_queue.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/watch_queue.h b/include/linux/watch_queue.h index c994d1b2cdba..3b9a40ae8bdb 100644 --- a/include/linux/watch_queue.h +++ b/include/linux/watch_queue.h @@ -28,7 +28,8 @@ struct watch_type_filter { struct watch_filter { union { struct rcu_head rcu; - unsigned long type_filter[2]; /* Bitmask of accepted types */ + /* Bitmask of accepted types */ + DECLARE_BITMAP(type_filter, WATCH_TYPE__NR); }; u32 nr_filters; /* Number of filters */ struct watch_type_filter filters[]; -- cgit From c13b780c4597e1e6cee3154053a196aa329b1367 Mon Sep 17 00:00:00 2001 From: Suman Anna Date: Sun, 13 Feb 2022 14:12:42 -0600 Subject: remoteproc: Change rproc_shutdown() to return a status The rproc_shutdown() function is currently not returning any error code, and any failures within rproc_stop() are not passed back to the users. Change the signature to return a success value back to the callers. The remoteproc sysfs and cdev interfaces are also updated to return back this status to userspace. Signed-off-by: Suman Anna Signed-off-by: Bjorn Andersson Link: https://lore.kernel.org/r/20220213201246.25952-2-s-anna@ti.com --- include/linux/remoteproc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h index b2ee325e0af1..7c943f0a2fc4 100644 --- a/include/linux/remoteproc.h +++ b/include/linux/remoteproc.h @@ -671,7 +671,7 @@ rproc_of_resm_mem_entry_init(struct device *dev, u32 of_resm_idx, size_t len, u32 da, const char *name, ...); int rproc_boot(struct rproc *rproc); -void rproc_shutdown(struct rproc *rproc); +int rproc_shutdown(struct rproc *rproc); int rproc_detach(struct rproc *rproc); int rproc_set_firmware(struct rproc *rproc, const char *fw_name); void rproc_report_crash(struct rproc *rproc, enum rproc_crash_type type); -- cgit From 84bd3690bf54a2f2f3b8449afa022aac1957ba17 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 9 Mar 2022 19:49:37 -0800 Subject: nvdimm/namespace: Delete nd_namespace_blk Now that none of the configuration paths consider BLK namespaces, delete the BLK namespace data and supporting code. Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/164688417727.2879318.11691110761800109662.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams --- include/linux/nd.h | 26 -------------------------- 1 file changed, 26 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nd.h b/include/linux/nd.h index 4813c7089e5c..7b2ccbdc1cbc 100644 --- a/include/linux/nd.h +++ b/include/linux/nd.h @@ -136,27 +136,6 @@ struct nd_namespace_pmem { int id; }; -/** - * struct nd_namespace_blk - namespace for dimm-bounded persistent memory - * @alt_name: namespace name supplied in the dimm label - * @uuid: namespace name supplied in the dimm label - * @id: ida allocated id - * @lbasize: blk namespaces have a native sector size when btt not present - * @size: sum of all the resource ranges allocated to this namespace - * @num_resources: number of dpa extents to claim - * @res: discontiguous dpa extents for given dimm - */ -struct nd_namespace_blk { - struct nd_namespace_common common; - char *alt_name; - uuid_t *uuid; - int id; - unsigned long lbasize; - resource_size_t size; - int num_resources; - struct resource **res; -}; - static inline struct nd_namespace_io *to_nd_namespace_io(const struct device *dev) { return container_of(dev, struct nd_namespace_io, common.dev); @@ -169,11 +148,6 @@ static inline struct nd_namespace_pmem *to_nd_namespace_pmem(const struct device return container_of(nsio, struct nd_namespace_pmem, nsio); } -static inline struct nd_namespace_blk *to_nd_namespace_blk(const struct device *dev) -{ - return container_of(dev, struct nd_namespace_blk, common.dev); -} - /** * nvdimm_read_bytes() - synchronously read bytes from an nvdimm namespace * @ndns: device to read -- cgit From 3b6c6c039707f6bb7c64af2aa82a437fabb93aee Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 9 Mar 2022 19:49:48 -0800 Subject: nvdimm/region: Delete nd_blk_region infrastructure Now that the nd_namespace_blk infrastructure is removed, delete all the region machinery to coordinate provisioning aliased capacity between PMEM and BLK. Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/164688418803.2879318.1302315202397235855.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams --- include/linux/libnvdimm.h | 24 ------------------------ 1 file changed, 24 deletions(-) (limited to 'include/linux') diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h index 7074aa9af525..0d61e07b6827 100644 --- a/include/linux/libnvdimm.h +++ b/include/linux/libnvdimm.h @@ -25,8 +25,6 @@ struct badrange { }; enum { - /* when a dimm supports both PMEM and BLK access a label is required */ - NDD_ALIASING = 0, /* unarmed memory devices may not persist writes */ NDD_UNARMED = 1, /* locked memory devices should not be accessed */ @@ -35,8 +33,6 @@ enum { NDD_SECURITY_OVERWRITE = 3, /* tracking whether or not there is a pending device reference */ NDD_WORK_PENDING = 4, - /* ignore / filter NSLABEL_FLAG_LOCAL for this DIMM, i.e. no aliasing */ - NDD_NOBLK = 5, /* dimm supports namespace labels */ NDD_LABELING = 6, @@ -140,21 +136,6 @@ static inline void __iomem *devm_nvdimm_ioremap(struct device *dev, } struct nvdimm_bus; -struct module; -struct nd_blk_region; -struct nd_blk_region_desc { - int (*enable)(struct nvdimm_bus *nvdimm_bus, struct device *dev); - int (*do_io)(struct nd_blk_region *ndbr, resource_size_t dpa, - void *iobuf, u64 len, int rw); - struct nd_region_desc ndr_desc; -}; - -static inline struct nd_blk_region_desc *to_blk_region_desc( - struct nd_region_desc *ndr_desc) -{ - return container_of(ndr_desc, struct nd_blk_region_desc, ndr_desc); - -} /* * Note that separate bits for locked + unlocked are defined so that @@ -257,7 +238,6 @@ struct nvdimm_bus *nvdimm_to_bus(struct nvdimm *nvdimm); struct nvdimm *to_nvdimm(struct device *dev); struct nd_region *to_nd_region(struct device *dev); struct device *nd_region_dev(struct nd_region *nd_region); -struct nd_blk_region *to_nd_blk_region(struct device *dev); struct nvdimm_bus_descriptor *to_nd_desc(struct nvdimm_bus *nvdimm_bus); struct device *to_nvdimm_bus_dev(struct nvdimm_bus *nvdimm_bus); const char *nvdimm_name(struct nvdimm *nvdimm); @@ -295,10 +275,6 @@ struct nd_region *nvdimm_blk_region_create(struct nvdimm_bus *nvdimm_bus, struct nd_region *nvdimm_volatile_region_create(struct nvdimm_bus *nvdimm_bus, struct nd_region_desc *ndr_desc); void *nd_region_provider_data(struct nd_region *nd_region); -void *nd_blk_region_provider_data(struct nd_blk_region *ndbr); -void nd_blk_region_set_provider_data(struct nd_blk_region *ndbr, void *data); -struct nvdimm *nd_blk_region_to_dimm(struct nd_blk_region *ndbr); -unsigned long nd_blk_memremap_flags(struct nd_blk_region *ndbr); unsigned int nd_region_acquire_lane(struct nd_region *nd_region); void nd_region_release_lane(struct nd_region *nd_region, unsigned int lane); u64 nd_fletcher64(void *addr, size_t len, bool le); -- cgit From c97448437847bd76116b3a077e44808e946bb1ae Mon Sep 17 00:00:00 2001 From: Maxime Ripard Date: Fri, 25 Feb 2022 15:35:29 +0100 Subject: clk: Add clk_drop_range In order to reset the range on a clock, we need to call clk_set_rate_range with a minimum of 0 and a maximum of ULONG_MAX. Since it's fairly inconvenient, let's introduce a clk_drop_range() function that will do just this. Suggested-by: Stephen Boyd Signed-off-by: Maxime Ripard Link: https://lore.kernel.org/r/20220225143534.405820-8-maxime@cerno.tech Signed-off-by: Stephen Boyd --- include/linux/clk.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/clk.h b/include/linux/clk.h index 266e8de3cb51..39faa54efe88 100644 --- a/include/linux/clk.h +++ b/include/linux/clk.h @@ -986,6 +986,17 @@ static inline void clk_bulk_disable_unprepare(int num_clks, clk_bulk_unprepare(num_clks, clks); } +/** + * clk_drop_range - Reset any range set on that clock + * @clk: clock source + * + * Returns success (0) or negative errno. + */ +static inline int clk_drop_range(struct clk *clk) +{ + return clk_set_rate_range(clk, 0, ULONG_MAX); +} + /** * clk_get_optional - lookup and obtain a reference to an optional clock * producer. -- cgit From d3826a95222c44a527f76b011bb5af8c924632e9 Mon Sep 17 00:00:00 2001 From: Dirk van der Merwe Date: Fri, 11 Mar 2022 11:43:06 +0100 Subject: nfp: add support for NFP3800/NFP3803 PCIe devices Enable binding the nfp driver to NFP3800 and NFP3803 devices. The PCIE_SRAM offset is different for the NFP3800 device, which also only supports a single explicit group. Changes to Dirk's work: * 48-bit dma addressing is not ready yet. Keep 40-bit dma addressing for NFP3800. Signed-off-by: Dirk van der Merwe Signed-off-by: Jakub Kicinski Signed-off-by: Fei Qin Signed-off-by: Simon Horman Signed-off-by: Jakub Kicinski --- include/linux/pci_ids.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index c7e6f2043c7d..5462c29f0538 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -2531,9 +2531,11 @@ #define PCI_VENDOR_ID_HUAWEI 0x19e5 #define PCI_VENDOR_ID_NETRONOME 0x19ee +#define PCI_DEVICE_ID_NETRONOME_NFP3800 0x3800 #define PCI_DEVICE_ID_NETRONOME_NFP4000 0x4000 #define PCI_DEVICE_ID_NETRONOME_NFP5000 0x5000 #define PCI_DEVICE_ID_NETRONOME_NFP6000 0x6000 +#define PCI_DEVICE_ID_NETRONOME_NFP3800_VF 0x3803 #define PCI_DEVICE_ID_NETRONOME_NFP6000_VF 0x6003 #define PCI_VENDOR_ID_QMI 0x1a32 -- cgit From 625788b5844511cf4c30cffa7fa0bc3a69cebc82 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 10 Mar 2022 21:14:20 -0800 Subject: net: add per-cpu storage and net->core_stats Before adding yet another possibly contended atomic_long_t, it is time to add per-cpu storage for existing ones: dev->tx_dropped, dev->rx_dropped, and dev->rx_nohandler Because many devices do not have to increment such counters, allocate the per-cpu storage on demand, so that dev_get_stats() does not have to spend considerable time folding zero counters. Note that some drivers have abused these counters which were supposed to be only used by core networking stack. v4: should use per_cpu_ptr() in dev_get_stats() (Jakub) v3: added a READ_ONCE() in netdev_core_stats_alloc() (Paolo) v2: add a missing include (reported by kernel test robot ) Change in netdev_core_stats_alloc() (Jakub) Signed-off-by: Eric Dumazet Cc: jeffreyji Reviewed-by: Brian Vazquez Reviewed-by: Jakub Kicinski Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20220311051420.2608812-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 46 +++++++++++++++++++++++++++++++++++++--------- 1 file changed, 37 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index acd3cf69b61f..0d994710b335 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -28,6 +28,7 @@ #include #include #include +#include #include #include @@ -194,6 +195,14 @@ struct net_device_stats { unsigned long tx_compressed; }; +/* per-cpu stats, allocated on demand. + * Try to fit them in a single cache line, for dev_get_stats() sake. + */ +struct net_device_core_stats { + local_t rx_dropped; + local_t tx_dropped; + local_t rx_nohandler; +} __aligned(4 * sizeof(local_t)); #include #include @@ -1735,12 +1744,8 @@ enum netdev_ml_priv_type { * @stats: Statistics struct, which was left as a legacy, use * rtnl_link_stats64 instead * - * @rx_dropped: Dropped packets by core network, - * do not use this in drivers - * @tx_dropped: Dropped packets by core network, + * @core_stats: core networking counters, * do not use this in drivers - * @rx_nohandler: nohandler dropped packets by core network on - * inactive devices, do not use this in drivers * @carrier_up_count: Number of times the carrier has been up * @carrier_down_count: Number of times the carrier has been down * @@ -2023,9 +2028,7 @@ struct net_device { struct net_device_stats stats; /* not used by modern drivers */ - atomic_long_t rx_dropped; - atomic_long_t tx_dropped; - atomic_long_t rx_nohandler; + struct net_device_core_stats __percpu *core_stats; /* Stats to monitor link on/off, flapping */ atomic_t carrier_up_count; @@ -3839,13 +3842,38 @@ static __always_inline bool __is_skb_forwardable(const struct net_device *dev, return false; } +struct net_device_core_stats *netdev_core_stats_alloc(struct net_device *dev); + +static inline struct net_device_core_stats *dev_core_stats(struct net_device *dev) +{ + /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */ + struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats); + + if (likely(p)) + return this_cpu_ptr(p); + + return netdev_core_stats_alloc(dev); +} + +#define DEV_CORE_STATS_INC(FIELD) \ +static inline void dev_core_stats_##FIELD##_inc(struct net_device *dev) \ +{ \ + struct net_device_core_stats *p = dev_core_stats(dev); \ + \ + if (p) \ + local_inc(&p->FIELD); \ +} +DEV_CORE_STATS_INC(rx_dropped) +DEV_CORE_STATS_INC(tx_dropped) +DEV_CORE_STATS_INC(rx_nohandler) + static __always_inline int ____dev_forward_skb(struct net_device *dev, struct sk_buff *skb, const bool check_mtu) { if (skb_orphan_frags(skb, GFP_ATOMIC) || unlikely(!__is_skb_forwardable(dev, skb, check_mtu))) { - atomic_long_inc(&dev->rx_dropped); + dev_core_stats_rx_dropped_inc(dev); kfree_skb(skb); return NET_RX_DROP; } -- cgit From f2aa197e4794bf4c2c0c9570684f86e6fa103e8b Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Sat, 5 Mar 2022 11:41:03 +0800 Subject: cgroup: Fix suspicious rcu_dereference_check() usage warning task_css_set_check() will use rcu_dereference_check() to check for rcu_read_lock_held() on the read-side, which is not true after commit dc6e0818bc9a ("sched/cpuacct: Optimize away RCU read lock"). This commit drop explicit rcu_read_lock(), change to RCU-sched read-side critical section. So fix the RCU warning by adding check for rcu_read_lock_sched_held(). Fixes: dc6e0818bc9a ("sched/cpuacct: Optimize away RCU read lock") Reported-by: Linux Kernel Functional Testing Reported-by: syzbot+16e3f2c77e7c5a0113f9@syzkaller.appspotmail.com Signed-off-by: Chengming Zhou Signed-off-by: Peter Zijlstra (Intel) Acked-by: Tejun Heo Tested-by: Zhouyi Zhou Tested-by: Marek Szyprowski Link: https://lore.kernel.org/r/20220305034103.57123-1-zhouchengming@bytedance.com --- include/linux/cgroup.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 1e356c222756..0d1ada8968d7 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -450,6 +450,7 @@ extern struct mutex cgroup_mutex; extern spinlock_t css_set_lock; #define task_css_set_check(task, __c) \ rcu_dereference_check((task)->cgroups, \ + rcu_read_lock_sched_held() || \ lockdep_is_held(&cgroup_mutex) || \ lockdep_is_held(&css_set_lock) || \ ((task)->flags & PF_EXITING) || (__c)) -- cgit From 6f98a4bfee72c22f50aedb39fb761567969865fe Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Mon, 7 Feb 2022 17:19:24 +0100 Subject: random: block in /dev/urandom This topic has come up countless times, and usually doesn't go anywhere. This time I thought I'd bring it up with a slightly narrower focus, updated for some developments over the last three years: we finally can make /dev/urandom always secure, in light of the fact that our RNG is now always seeded. Ever since Linus' 50ee7529ec45 ("random: try to actively add entropy rather than passively wait for it"), the RNG does a haveged-style jitter dance around the scheduler, in order to produce entropy (and credit it) for the case when we're stuck in wait_for_random_bytes(). How ever you feel about the Linus Jitter Dance is beside the point: it's been there for three years and usually gets the RNG initialized in a second or so. As a matter of fact, this is what happens currently when people use getrandom(). It's already there and working, and most people have been using it for years without realizing. So, given that the kernel has grown this mechanism for seeding itself from nothing, and that this procedure happens pretty fast, maybe there's no point any longer in having /dev/urandom give insecure bytes. In the past we didn't want the boot process to deadlock, which was understandable. But now, in the worst case, a second goes by, and the problem is resolved. It seems like maybe we're finally at a point when we can get rid of the infamous "urandom read hole". The one slight drawback is that the Linus Jitter Dance relies on random_ get_entropy() being implemented. The first lines of try_to_generate_ entropy() are: stack.now = random_get_entropy(); if (stack.now == random_get_entropy()) return; On most platforms, random_get_entropy() is simply aliased to get_cycles(). The number of machines without a cycle counter or some other implementation of random_get_entropy() in 2022, which can also run a mainline kernel, and at the same time have a both broken and out of date userspace that relies on /dev/urandom never blocking at boot is thought to be exceedingly low. And to be clear: those museum pieces without cycle counters will continue to run Linux just fine, and even /dev/urandom will be operable just like before; the RNG just needs to be seeded first through the usual means, which should already be the case now. On systems that really do want unseeded randomness, we already offer getrandom(GRND_INSECURE), which is in use by, e.g., systemd for seeding their hash tables at boot. Nothing in this commit would affect GRND_INSECURE, and it remains the means of getting those types of random numbers. This patch goes a long way toward eliminating a long overdue userspace crypto footgun. After several decades of endless user confusion, we will finally be able to say, "use any single one of our random interfaces and you'll be fine. They're all the same. It doesn't matter." And that, I think, is really something. Finally all of those blog posts and disagreeing forums and contradictory articles will all become correct about whatever they happened to recommend, and along with it, a whole class of vulnerabilities eliminated. With very minimal downside, we're finally in a position where we can make this change. Cc: Dinh Nguyen Cc: Nick Hu Cc: Max Filippov Cc: Palmer Dabbelt Cc: David S. Miller Cc: Yoshinori Sato Cc: Michal Simek Cc: Borislav Petkov Cc: Guo Ren Cc: Geert Uytterhoeven Cc: Joshua Kinard Cc: David Laight Cc: Dominik Brodowski Cc: Eric Biggers Cc: Ard Biesheuvel Cc: Arnd Bergmann Cc: Thomas Gleixner Cc: Andy Lutomirski Cc: Kees Cook Cc: Lennart Poettering Cc: Konstantin Ryabitsev Cc: Linus Torvalds Cc: Greg Kroah-Hartman Cc: Theodore Ts'o Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index 6148b8d1ccf3..725a4d08c0a0 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -44,7 +44,7 @@ extern void del_random_ready_callback(struct random_ready_callback *rdy); extern size_t __must_check get_random_bytes_arch(void *buf, size_t nbytes); #ifndef MODULE -extern const struct file_operations random_fops, urandom_fops; +extern const struct file_operations random_fops; #endif u32 get_random_u32(void); -- cgit From ae099e8e98fb01395228628be5a4661e3bd86fe4 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Wed, 23 Feb 2022 13:43:44 +0100 Subject: random: add mechanism for VM forks to reinitialize crng When a VM forks, we must immediately mix in additional information to the stream of random output so that two forks or a rollback don't produce the same stream of random numbers, which could have catastrophic cryptographic consequences. This commit adds a simple API, add_vmfork_ randomness(), for that, by force reseeding the crng. This has the added benefit of also draining the entropy pool and setting its timer back, so that any old entropy that was there prior -- which could have already been used by a different fork, or generally gone stale -- does not contribute to the accounting of the next 256 bits. Cc: Dominik Brodowski Cc: Theodore Ts'o Cc: Jann Horn Cc: Eric Biggers Reviewed-by: Ard Biesheuvel Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index 725a4d08c0a0..117468f3a92e 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -34,6 +34,7 @@ extern void add_input_randomness(unsigned int type, unsigned int code, extern void add_interrupt_randomness(int irq) __latent_entropy; extern void add_hwgenerator_randomness(const void *buffer, size_t count, size_t entropy); +extern void add_vmfork_randomness(const void *unique_vm_id, size_t size); extern void get_random_bytes(void *buf, size_t nbytes); extern int wait_for_random_bytes(void); -- cgit From d273845ecb0e0626842782a4497f0c5876139ec3 Mon Sep 17 00:00:00 2001 From: Alexander Graf Date: Fri, 25 Feb 2022 16:55:52 +0100 Subject: ACPI: allow longer device IDs We create a list of ACPI "PNP" IDs which contains _HID, _CID, and CLS entries of the respective devices. However, when making structs for matching, we squeeze those IDs into acpi_device_id, which only has 9 bytes space to store the identifier. The subsystem actually captures the full length of the IDs, and the modalias has the full length, but this struct we use for matching is limited. It originally had 16 bytes, but was changed to only have 9 in 6543becf26ff ("mod/file2alias: make modalias generation safe for cross compiling"), presumably on the theory that it would match the ACPI spec so it didn't matter. Unfortunately, while most people adhere to the ACPI specs, Microsoft decided that its VM Generation Counter device [1] should only be identifiable by _CID with a value of "VM_Gen_Counter", which is longer than 9 characters. To allow device drivers to match identifiers that exceed the 9 byte limit, this simply ups the length to 16, just like it was before the aforementioned commit. Empirical testing indicates that this doesn't actually increase vmlinux size on 64-bit, because the ulong in the same struct caused there to be 7 bytes of padding anyway, and when doing a s/M/Y/g i386_defconfig build, the bzImage only increased by 0.0055%, so negligible. This patch is a prerequisite to add support for VMGenID in Linux, the subsequent patch in this series. It has been confirmed to also work on the udev/modalias side in userspace. [1] https://download.microsoft.com/download/3/1/C/31CFC307-98CA-4CA5-914C-D9772691E214/VirtualMachineGenerationID.docx Signed-off-by: Alexander Graf Co-developed-by: Jason A. Donenfeld [Jason: reworked commit message a bit, went with len=16 approach.] Cc: Mika Westerberg Cc: Andy Shevchenko Cc: Len Brown Cc: Greg Kroah-Hartman Reviewed-by: Ard Biesheuvel Acked-by: Hans de Goede Acked-by: Rafael J. Wysocki Signed-off-by: Jason A. Donenfeld --- include/linux/mod_devicetable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h index 4bb71979a8fd..5da5d990ff58 100644 --- a/include/linux/mod_devicetable.h +++ b/include/linux/mod_devicetable.h @@ -211,7 +211,7 @@ struct css_device_id { kernel_ulong_t driver_data; }; -#define ACPI_ID_LEN 9 +#define ACPI_ID_LEN 16 struct acpi_device_id { __u8 id[ACPI_ID_LEN]; -- cgit From a4107d34f960df99ca07fa8eb022425a804f59f3 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Tue, 1 Mar 2022 15:14:04 +0100 Subject: random: do not export add_vmfork_randomness() unless needed Since add_vmfork_randomness() is only called from vmgenid.o, we can guard it in CONFIG_VMGENID, similarly to how we do with add_disk_randomness() and CONFIG_BLOCK. If we ever have multiple things calling into add_vmfork_randomness(), we can add another shared Kconfig symbol for that, but for now, this is good enough. Even though add_vmfork_randomess() is a pretty small function, removing it means that there are only calls to crng_reseed(false) and none to crng_reseed(true), which means the compiler can constant propagate the false, removing branches from crng_reseed() and its descendants. Additionally, we don't even need the symbol to be exported if CONFIG_VMGENID is not a module, so conditionalize that too. Cc: Dominik Brodowski Cc: Theodore Ts'o Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index 117468f3a92e..f209f1a78899 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -34,7 +34,9 @@ extern void add_input_randomness(unsigned int type, unsigned int code, extern void add_interrupt_randomness(int irq) __latent_entropy; extern void add_hwgenerator_randomness(const void *buffer, size_t count, size_t entropy); +#if IS_ENABLED(CONFIG_VMGENID) extern void add_vmfork_randomness(const void *unique_vm_id, size_t size); +#endif extern void get_random_bytes(void *buf, size_t nbytes); extern int wait_for_random_bytes(void); -- cgit From 5acd35487dc911541672b3ffc322851769c32a56 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Tue, 1 Mar 2022 20:03:49 +0100 Subject: random: replace custom notifier chain with standard one We previously rolled our own randomness readiness notifier, which only has two users in the whole kernel. Replace this with a more standard atomic notifier block that serves the same purpose with less code. Also unexport the symbols, because no modules use it, only unconditional builtins. The only drawback is that it's possible for a notification handler returning the "stop" code to prevent further processing, but given that there are only two users, and that we're unexporting this anyway, that doesn't seem like a significant drawback for the simplification we receive here. Cc: Greg Kroah-Hartman Cc: Theodore Ts'o Reviewed-by: Dominik Brodowski Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index f209f1a78899..fab1ab0563b4 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -10,11 +10,7 @@ #include -struct random_ready_callback { - struct list_head list; - void (*func)(struct random_ready_callback *rdy); - struct module *owner; -}; +struct notifier_block; extern void add_device_randomness(const void *, size_t); extern void add_bootloader_randomness(const void *, size_t); @@ -42,8 +38,8 @@ extern void get_random_bytes(void *buf, size_t nbytes); extern int wait_for_random_bytes(void); extern int __init rand_initialize(void); extern bool rng_is_initialized(void); -extern int add_random_ready_callback(struct random_ready_callback *rdy); -extern void del_random_ready_callback(struct random_ready_callback *rdy); +extern int register_random_ready_notifier(struct notifier_block *nb); +extern int unregister_random_ready_notifier(struct notifier_block *nb); extern size_t __must_check get_random_bytes_arch(void *buf, size_t nbytes); #ifndef MODULE -- cgit From f3c2682bad7bc6033c837e9c66e5af881fe8d465 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Tue, 1 Mar 2022 20:22:39 +0100 Subject: random: provide notifier for VM fork Drivers such as WireGuard need to learn when VMs fork in order to clear sessions. This commit provides a simple notifier_block for that, with a register and unregister function. When no VM fork detection is compiled in, this turns into a no-op, similar to how the power notifier works. Cc: Dominik Brodowski Cc: Theodore Ts'o Reviewed-by: Greg Kroah-Hartman Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index fab1ab0563b4..c0baffe7afb1 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -32,6 +32,11 @@ extern void add_hwgenerator_randomness(const void *buffer, size_t count, size_t entropy); #if IS_ENABLED(CONFIG_VMGENID) extern void add_vmfork_randomness(const void *unique_vm_id, size_t size); +extern int register_random_vmfork_notifier(struct notifier_block *nb); +extern int unregister_random_vmfork_notifier(struct notifier_block *nb); +#else +static inline int register_random_vmfork_notifier(struct notifier_block *nb) { return 0; } +static inline int unregister_random_vmfork_notifier(struct notifier_block *nb) { return 0; } #endif extern void get_random_bytes(void *buf, size_t nbytes); -- cgit From df227dc8a68d81b7fe3416eca16d1f21d0b537f0 Mon Sep 17 00:00:00 2001 From: Dave Gerlach Date: Wed, 9 Feb 2022 22:16:31 -0600 Subject: mailbox: ti-msgmgr: Operate mailbox in polled mode during system suspend During the system suspend path we must set all queues to operate in polled mode as it is possible for any protocol built using this mailbox, such as TISCI, to require communication during the no irq phase of suspend, and we cannot rely on interrupts there. Polled mode is implemented by allowing the mailbox user to define an RX channel as part of the message that is sent which is what gets polled for a response. If polled mode is enabled, this will immediately be polled for a response at the end of the mailbox send_data op before returning success for the data send or timing out if no response is received. Finally, to ensure polled mode is always enabled during system suspend, iterate through all queues to set RX queues to polled mode during system suspend and disable polled mode for all in the resume handler. Signed-off-by: Dave Gerlach Signed-off-by: Jassi Brar --- include/linux/soc/ti/ti-msgmgr.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/soc/ti/ti-msgmgr.h b/include/linux/soc/ti/ti-msgmgr.h index 1f6e76d423cf..69a8d7682c4b 100644 --- a/include/linux/soc/ti/ti-msgmgr.h +++ b/include/linux/soc/ti/ti-msgmgr.h @@ -1,7 +1,7 @@ /* * Texas Instruments' Message Manager * - * Copyright (C) 2015-2016 Texas Instruments Incorporated - https://www.ti.com/ + * Copyright (C) 2015-2022 Texas Instruments Incorporated - https://www.ti.com/ * Nishanth Menon * * This program is free software; you can redistribute it and/or modify @@ -17,10 +17,14 @@ #ifndef TI_MSGMGR_H #define TI_MSGMGR_H +struct mbox_chan; + /** * struct ti_msgmgr_message - Message Manager structure * @len: Length of data in the Buffer * @buf: Buffer pointer + * @chan_rx: Expected channel for response, must be provided to use polled rx + * @timeout_rx_ms: Timeout value to use if polling for response * * This is the structure for data used in mbox_send_message * the length of data buffer used depends on the SoC integration @@ -30,6 +34,8 @@ struct ti_msgmgr_message { size_t len; u8 *buf; + struct mbox_chan *chan_rx; + int timeout_rx_ms; }; #endif /* TI_MSGMGR_H */ -- cgit From a41b05edfedb939440e83666f23de3ef9af33acf Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 7 Mar 2022 10:41:44 +1100 Subject: SUNRPC/auth: async tasks mustn't block waiting for memory When memory is short, new worker threads cannot be created and we depend on the minimum one rpciod thread to be able to handle everything. So it must not block waiting for memory. mempools are particularly a problem as memory can only be released back to the mempool by an async rpc task running. If all available workqueue threads are waiting on the mempool, no thread is available to return anything. lookup_cred() can block on a mempool or kmalloc - and this can cause deadlocks. So add a new RPCAUTH_LOOKUP flag for async lookups and don't block on memory. If the -ENOMEM gets back to call_refreshresult(), wait a short while and try again. HZ>>4 is chosen as it is used elsewhere for -ENOMEM retries. Signed-off-by: NeilBrown Signed-off-by: Trond Myklebust --- include/linux/sunrpc/auth.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h index 98da816b5fc2..3e6ce288a7fc 100644 --- a/include/linux/sunrpc/auth.h +++ b/include/linux/sunrpc/auth.h @@ -99,6 +99,7 @@ struct rpc_auth_create_args { /* Flags for rpcauth_lookupcred() */ #define RPCAUTH_LOOKUP_NEW 0x01 /* Accept an uninitialised cred */ +#define RPCAUTH_LOOKUP_ASYNC 0x02 /* Don't block waiting for memory */ /* * Client authentication ops -- cgit From 89c2be8a951654758dffeaaa6272328d9c8f29be Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 7 Mar 2022 10:41:44 +1100 Subject: NFS: discard NFS_RPC_SWAPFLAGS and RPC_TASK_ROOTCREDS NFS_RPC_SWAPFLAGS is only used for READ requests. It sets RPC_TASK_SWAPPER which gives some memory-allocation priority to requests. This is not needed for swap READ - though it is for writes where it is set via a different mechanism. RPC_TASK_ROOTCREDS causes the 'machine' credential to be used. This is not needed as the root credential is saved when the swap file is opened, and this is used for all IO. So NFS_RPC_SWAPFLAGS isn't needed, and as it is the only user of RPC_TASK_ROOTCREDS, that isn't needed either. Remove both. Signed-off-by: NeilBrown Signed-off-by: Trond Myklebust --- include/linux/nfs_fs.h | 5 ----- include/linux/sunrpc/sched.h | 1 - 2 files changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 3893386ceaed..9074ed0b65aa 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -45,11 +45,6 @@ */ #define NFS_MAX_TRANSPORTS 16 -/* - * These are the default flags for swap requests - */ -#define NFS_RPC_SWAPFLAGS (RPC_TASK_SWAPPER|RPC_TASK_ROOTCREDS) - /* * Size of the NFS directory verifier */ diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index db964bb63912..56710f8056d3 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -124,7 +124,6 @@ struct rpc_task_setup { #define RPC_TASK_MOVEABLE 0x0004 /* nfs4.1+ rpc tasks */ #define RPC_TASK_NULLCREDS 0x0010 /* Use AUTH_NULL credential */ #define RPC_CALL_MAJORSEEN 0x0020 /* major timeout seen */ -#define RPC_TASK_ROOTCREDS 0x0040 /* force root creds */ #define RPC_TASK_DYNAMIC 0x0080 /* task was kmalloc'ed */ #define RPC_TASK_NO_ROUND_ROBIN 0x0100 /* send requests on "main" xprt */ #define RPC_TASK_SOFT 0x0200 /* Use soft timeouts */ -- cgit From 4dc73c679114a2f408567e2e44770ed934190db2 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 7 Mar 2022 10:41:44 +1100 Subject: NFSv4: keep state manager thread active if swap is enabled If we are swapping over NFSv4, we may not be able to allocate memory to start the state-manager thread at the time when we need it. So keep it always running when swap is enabled, and just signal it to start. This requires updating and testing the cl_swapper count on the root rpc_clnt after following all ->cl_parent links. Signed-off-by: NeilBrown Signed-off-by: Trond Myklebust --- include/linux/nfs_xdr.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index 82f7c2730b9a..49ba486aea5f 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -1797,6 +1797,8 @@ struct nfs_rpc_ops { struct nfs_server *(*clone_server)(struct nfs_server *, struct nfs_fh *, struct nfs_fattr *, rpc_authflavor_t); int (*discover_trunking)(struct nfs_server *, struct nfs_fh *); + void (*enable_swap)(struct inode *inode); + void (*disable_swap)(struct inode *inode); }; /* -- cgit From 64158668ac8b31626a8ce48db4cad08496eb8340 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 7 Mar 2022 10:41:44 +1100 Subject: NFS: swap IO handling is slightly different for O_DIRECT IO 1/ Taking the i_rwsem for swap IO triggers lockdep warnings regarding possible deadlocks with "fs_reclaim". These deadlocks could, I believe, eventuate if a buffered read on the swapfile was attempted. We don't need coherence with the page cache for a swap file, and buffered writes are forbidden anyway. There is no other need for i_rwsem during direct IO. So never take it for swap_rw() 2/ generic_write_checks() explicitly forbids writes to swap, and performs checks that are not needed for swap. So bypass it for swap_rw(). Signed-off-by: NeilBrown Signed-off-by: Trond Myklebust --- include/linux/nfs_fs.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 9074ed0b65aa..c47c448befc8 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -508,10 +508,10 @@ static inline const struct cred *nfs_file_cred(struct file *file) * linux/fs/nfs/direct.c */ extern ssize_t nfs_direct_IO(struct kiocb *, struct iov_iter *); -extern ssize_t nfs_file_direct_read(struct kiocb *iocb, - struct iov_iter *iter); -extern ssize_t nfs_file_direct_write(struct kiocb *iocb, - struct iov_iter *iter); +ssize_t nfs_file_direct_read(struct kiocb *iocb, + struct iov_iter *iter, bool swap); +ssize_t nfs_file_direct_write(struct kiocb *iocb, + struct iov_iter *iter, bool swap); /* * linux/fs/nfs/dir.c -- cgit From 1f4a5983d623d6dbda4cc7587a2d9d798e0d4035 Mon Sep 17 00:00:00 2001 From: Ziyang Xuan Date: Fri, 11 Mar 2022 17:04:03 +0800 Subject: net: macvlan: add net device refcount tracker Add net device refcount tracker to macvlan. Signed-off-by: Ziyang Xuan Signed-off-by: David S. Miller --- include/linux/if_macvlan.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h index 10c94a3936ca..b42294739063 100644 --- a/include/linux/if_macvlan.h +++ b/include/linux/if_macvlan.h @@ -21,6 +21,7 @@ struct macvlan_dev { struct hlist_node hlist; struct macvlan_port *port; struct net_device *lowerdev; + netdevice_tracker dev_tracker; void *accel_priv; struct vlan_pcpu_stats __percpu *pcpu_stats; -- cgit From fbd9a2ceba5c74bbfa19cf257ae4b4b2c820860d Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Fri, 11 Mar 2022 16:03:42 +0100 Subject: net: Add lockdep asserts to ____napi_schedule(). ____napi_schedule() needs to be invoked with disabled interrupts due to __raise_softirq_irqoff (in order not to corrupt the per-CPU list). ____napi_schedule() needs also to be invoked from an interrupt context so that the raised-softirq is processed while the interrupt context is left. Add lockdep asserts for both conditions. While this is the second time the irq/softirq check is needed, provide a generic lockdep_assert_softirq_will_run() which is used by both caller. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: David S. Miller --- include/linux/lockdep.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 467b94257105..0cc65d216701 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -329,6 +329,12 @@ extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie); #define lockdep_assert_none_held_once() \ lockdep_assert_once(!current->lockdep_depth) +/* + * Ensure that softirq is handled within the callchain and not delayed and + * handled by chance. + */ +#define lockdep_assert_softirq_will_run() \ + lockdep_assert_once(hardirq_count() | softirq_count()) #define lockdep_recursing(tsk) ((tsk)->lockdep_recursion) @@ -414,6 +420,7 @@ extern int lockdep_is_held(const void *); #define lockdep_assert_held_read(l) do { (void)(l); } while (0) #define lockdep_assert_held_once(l) do { (void)(l); } while (0) #define lockdep_assert_none_held_once() do { } while (0) +#define lockdep_assert_softirq_will_run() do { } while (0) #define lockdep_recursing(tsk) (0) -- cgit From 871129332d74c9e94bd110932ac4445833995639 Mon Sep 17 00:00:00 2001 From: Omar Sandoval Date: Wed, 4 Sep 2019 12:13:25 -0700 Subject: fs: export rw_verify_area() I'm adding btrfs ioctls to read and write compressed data, and rather than duplicating the checks in rw_verify_area(), let's just export it. Reviewed-by: Josef Bacik Signed-off-by: Omar Sandoval Reviewed-by: David Sterba Signed-off-by: David Sterba --- include/linux/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index e2d892b201b0..0ebfc2519212 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3173,6 +3173,7 @@ extern loff_t fixed_size_llseek(struct file *file, loff_t offset, int whence, loff_t size); extern loff_t no_seek_end_llseek_size(struct file *, loff_t, int, loff_t); extern loff_t no_seek_end_llseek(struct file *, loff_t, int); +int rw_verify_area(int, struct file *, const loff_t *, size_t); extern int generic_file_open(struct inode * inode, struct file * filp); extern int nonseekable_open(struct inode * inode, struct file * filp); extern int stream_open(struct inode * inode, struct file * filp); -- cgit From f6f7a25a650818c61defa97d60a79b216618315f Mon Sep 17 00:00:00 2001 From: Omar Sandoval Date: Thu, 12 Aug 2021 15:34:57 -0700 Subject: fs: export variant of generic_write_checks without iov_iter Encoded I/O in Btrfs needs to check a write with a given logical size without an iov_iter that matches that size (because the iov_iter we have is for the compressed data). So, factor out the parts of generic_write_check() that don't need an iov_iter into a new generic_write_checks_count() function and export that. Reviewed-by: Nikolay Borisov Signed-off-by: Omar Sandoval Reviewed-by: David Sterba Signed-off-by: David Sterba --- include/linux/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 0ebfc2519212..27746a3da8fd 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3130,6 +3130,7 @@ extern int sb_min_blocksize(struct super_block *, int); extern int generic_file_mmap(struct file *, struct vm_area_struct *); extern int generic_file_readonly_mmap(struct file *, struct vm_area_struct *); extern ssize_t generic_write_checks(struct kiocb *, struct iov_iter *); +int generic_write_checks_count(struct kiocb *iocb, loff_t *count); extern int generic_write_check_limits(struct file *file, loff_t pos, loff_t *count); extern int generic_file_rw_checks(struct file *file_in, struct file *file_out); -- cgit From ec090a0392ff634e9304c19a8578d476a9e0c830 Mon Sep 17 00:00:00 2001 From: Tudor Ambarus Date: Fri, 25 Feb 2022 16:46:56 +0200 Subject: mtd: core: Remove partid and partname debugfs files partid and partname debugfs files were used just by SPI NOR, but they were replaced by sysfs entries. Since these debugfs files are no longer used in mtd, remove dead code. The directory is kept as it is used by nandsim, mtdswap and docg3. Signed-off-by: Tudor Ambarus Reviewed-by: Pratyush Yadav Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20220225144656.634682-1-tudor.ambarus@microchip.com --- include/linux/mtd/mtd.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h index 1b3fc8c71ab3..151607e9d64a 100644 --- a/include/linux/mtd/mtd.h +++ b/include/linux/mtd/mtd.h @@ -188,9 +188,6 @@ struct module; /* only needed for owner field in mtd_info */ */ struct mtd_debug_info { struct dentry *dfs_dir; - - const char *partname; - const char *partid; }; /** -- cgit From fc93db153b0187a9047f00bfd5cce75108530593 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 12 Mar 2022 13:45:05 -0800 Subject: net: disable preemption in dev_core_stats_XXX_inc() helpers syzbot was kind enough to remind us that dev->{tx_dropped|rx_dropped} could be increased in process context. BUG: using smp_processor_id() in preemptible [00000000] code: syz-executor413/3593 caller is netdev_core_stats_alloc+0x98/0x110 net/core/dev.c:10298 CPU: 1 PID: 3593 Comm: syz-executor413 Not tainted 5.17.0-rc7-syzkaller-02426-g97aeb877de7f #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 check_preemption_disabled+0x16b/0x170 lib/smp_processor_id.c:49 netdev_core_stats_alloc+0x98/0x110 net/core/dev.c:10298 dev_core_stats include/linux/netdevice.h:3855 [inline] dev_core_stats_rx_dropped_inc include/linux/netdevice.h:3866 [inline] tun_get_user+0x3455/0x3ab0 drivers/net/tun.c:1800 tun_chr_write_iter+0xe1/0x200 drivers/net/tun.c:2015 call_write_iter include/linux/fs.h:2074 [inline] new_sync_write+0x431/0x660 fs/read_write.c:503 vfs_write+0x7cd/0xae0 fs/read_write.c:590 ksys_write+0x12d/0x250 fs/read_write.c:643 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f2cf4f887e3 Code: 5d 41 5c 41 5d 41 5e e9 9b fd ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 28 48 89 54 24 18 RSP: 002b:00007ffd50dd5fd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007ffd50dd6000 RCX: 00007f2cf4f887e3 RDX: 000000000000002a RSI: 0000000000000000 RDI: 00000000000000c8 RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffd50dd5ff0 R14: 00007ffd50dd5fe8 R15: 00007ffd50dd5fe4 Fixes: 625788b58445 ("net: add per-cpu storage and net->core_stats") Signed-off-by: Eric Dumazet Cc: jeffreyji Cc: Brian Vazquez Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20220312214505.3294762-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 0d994710b335..8cbe96ce0a2c 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3858,10 +3858,14 @@ static inline struct net_device_core_stats *dev_core_stats(struct net_device *de #define DEV_CORE_STATS_INC(FIELD) \ static inline void dev_core_stats_##FIELD##_inc(struct net_device *dev) \ { \ - struct net_device_core_stats *p = dev_core_stats(dev); \ + struct net_device_core_stats *p; \ + \ + preempt_disable(); \ + p = dev_core_stats(dev); \ \ if (p) \ local_inc(&p->FIELD); \ + preempt_enable(); \ } DEV_CORE_STATS_INC(rx_dropped) DEV_CORE_STATS_INC(tx_dropped) -- cgit From c14c6843aeb8cdc8f6b0e49411d230e6f6dfda62 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:21:23 +0000 Subject: fs: read_mapping_page() should take a struct file argument While read_cache_page() takes a void *, because you can pass a pointer to anything as the first argument of filler_t, if we are calling read_mapping_page(), it will be passed as the first argument of ->readpage, so we know this must be a struct file pointer, and we should let the compiler enforce that for us. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/pagemap.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 270bf5136c34..55a80d8f0e9c 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -636,15 +636,15 @@ extern int read_cache_pages(struct address_space *mapping, struct list_head *pages, filler_t *filler, void *data); static inline struct page *read_mapping_page(struct address_space *mapping, - pgoff_t index, void *data) + pgoff_t index, struct file *file) { - return read_cache_page(mapping, index, NULL, data); + return read_cache_page(mapping, index, NULL, file); } static inline struct folio *read_mapping_folio(struct address_space *mapping, - pgoff_t index, void *data) + pgoff_t index, struct file *file) { - return read_cache_folio(mapping, index, NULL, data); + return read_cache_folio(mapping, index, NULL, file); } /* -- cgit From cd1067beeebfe23fc8cab071790fefb273962d51 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:21:26 +0000 Subject: buffer: Add folio_buffers() While there is no intent to use large folios in filesystems using buffer heads, converting the filesystems to use single-page folios is still worth doing to remove legacy infrastructure and hidden calls to compound_head(). These helper functions are needed for that conversion to take place. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/buffer_head.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 36f33685c8c0..3451f1fcda12 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -144,6 +144,7 @@ BUFFER_FNS(Defer_Completion, defer_completion) ((struct buffer_head *)page_private(page)); \ }) #define page_has_buffers(page) PagePrivate(page) +#define folio_buffers(folio) folio_get_private(folio) void buffer_check_dirty_writeback(struct page *page, bool *dirty, bool *writeback); -- cgit From 2e7e80f7e7e9dbbb3c2a85ee923ca32826052816 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:21:27 +0000 Subject: fs: Convert is_partially_uptodate to folios Since the uptodate property is maintained on a per-folio basis, the is_partially_uptodate method should also take a folio. Fix the types at the same time so it's clear that it returns true/false and takes the count in bytes, not blocks. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/buffer_head.h | 3 +-- include/linux/fs.h | 4 ++-- include/linux/iomap.h | 3 +-- 3 files changed, 4 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 3451f1fcda12..79d465057889 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -225,8 +225,7 @@ int __block_write_full_page(struct inode *inode, struct page *page, get_block_t *get_block, struct writeback_control *wbc, bh_end_io_t *handler); int block_read_full_page(struct page*, get_block_t*); -int block_is_partially_uptodate(struct page *page, unsigned long from, - unsigned long count); +bool block_is_partially_uptodate(struct folio *, size_t from, size_t count); int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len, unsigned flags, struct page **pagep, get_block_t *get_block); int __block_write_begin(struct page *page, loff_t pos, unsigned len, diff --git a/include/linux/fs.h b/include/linux/fs.h index e2d892b201b0..5939e6694ada 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -400,8 +400,8 @@ struct address_space_operations { bool (*isolate_page)(struct page *, isolate_mode_t); void (*putback_page)(struct page *); int (*launder_page) (struct page *); - int (*is_partially_uptodate) (struct page *, unsigned long, - unsigned long); + bool (*is_partially_uptodate) (struct folio *, size_t from, + size_t count); void (*is_dirty_writeback) (struct page *, bool *, bool *); int (*error_remove_page)(struct address_space *, struct page *); diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 97a3a2edb585..3bcbb264f83f 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -227,8 +227,7 @@ ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from, const struct iomap_ops *ops); int iomap_readpage(struct page *page, const struct iomap_ops *ops); void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops); -int iomap_is_partially_uptodate(struct page *page, unsigned long from, - unsigned long count); +bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count); int iomap_releasepage(struct page *page, gfp_t gfp_mask); void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len); void iomap_invalidatepage(struct page *page, unsigned int offset, -- cgit From aa1b46dcdc7baaf5fec0be25782ef24b26aa209e Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Sun, 13 Mar 2022 21:15:02 -1000 Subject: block: fix rq-qos breakage from skipping rq_qos_done_bio() a647a524a467 ("block: don't call rq_qos_ops->done_bio if the bio isn't tracked") made bio_endio() skip rq_qos_done_bio() if BIO_TRACKED is not set. While this fixed a potential oops, it also broke blk-iocost by skipping the done_bio callback for merged bios. Before, whether a bio goes through rq_qos_throttle() or rq_qos_merge(), rq_qos_done_bio() would be called on the bio on completion with BIO_TRACKED distinguishing the former from the latter. rq_qos_done_bio() is not called for bios which wenth through rq_qos_merge(). This royally confuses blk-iocost as the merged bios never finish and are considered perpetually in-flight. One reliably reproducible failure mode is an intermediate cgroup geting stuck active preventing its children from being activated due to the leaf-only rule, leading to loss of control. The following is from resctl-bench protection scenario which emulates isolating a web server like workload from a memory bomb run on an iocost configuration which should yield a reasonable level of protection. # cat /sys/block/nvme2n1/device/model Samsung SSD 970 PRO 512GB # cat /sys/fs/cgroup/io.cost.model 259:0 ctrl=user model=linear rbps=834913556 rseqiops=93622 rrandiops=102913 wbps=618985353 wseqiops=72325 wrandiops=71025 # cat /sys/fs/cgroup/io.cost.qos 259:0 enable=1 ctrl=user rpct=95.00 rlat=18776 wpct=95.00 wlat=8897 min=60.00 max=100.00 # resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1 ... Memory Hog Summary ================== IO Latency: R p50=242u:336u/2.5m p90=794u:1.4m/7.5m p99=2.7m:8.0m/62.5m max=8.0m:36.4m/350m W p50=221u:323u/1.5m p90=709u:1.2m/5.5m p99=1.5m:2.5m/9.5m max=6.9m:35.9m/350m Isolation and Request Latency Impact Distributions: min p01 p05 p10 p25 p50 p75 p90 p95 p99 max mean stdev isol% 15.90 15.90 15.90 40.05 57.24 59.07 60.01 74.63 74.63 90.35 90.35 58.12 15.82 lat-imp% 0 0 0 0 0 4.55 14.68 15.54 233.5 548.1 548.1 53.88 143.6 Result: isol=58.12:15.82% lat_imp=53.88%:143.6 work_csv=100.0% missing=3.96% The isolation result of 58.12% is close to what this device would show without any IO control. Fix it by introducing a new flag BIO_QOS_MERGED to mark merged bios and calling rq_qos_done_bio() on them too. For consistency and clarity, rename BIO_TRACKED to BIO_QOS_THROTTLED. The flag checks are moved into rq_qos_done_bio() so that it's next to the code paths that set the flags. With the patch applied, the above same benchmark shows: # resctl-bench -m 29.6G -r out.json run protection::scenario=mem-hog,loops=1 ... Memory Hog Summary ================== IO Latency: R p50=123u:84.4u/985u p90=322u:256u/2.5m p99=1.6m:1.4m/9.5m max=11.1m:36.0m/350m W p50=429u:274u/995u p90=1.7m:1.3m/4.5m p99=3.4m:2.7m/11.5m max=7.9m:5.9m/26.5m Isolation and Request Latency Impact Distributions: min p01 p05 p10 p25 p50 p75 p90 p95 p99 max mean stdev isol% 84.91 84.91 89.51 90.73 92.31 94.49 96.36 98.04 98.71 100.0 100.0 94.42 2.81 lat-imp% 0 0 0 0 0 2.81 5.73 11.11 13.92 17.53 22.61 4.10 4.68 Result: isol=94.42:2.81% lat_imp=4.10%:4.68 work_csv=58.34% missing=0% Signed-off-by: Tejun Heo Fixes: a647a524a467 ("block: don't call rq_qos_ops->done_bio if the bio isn't tracked") Cc: stable@vger.kernel.org # v5.15+ Cc: Ming Lei Cc: Yu Kuai Reviewed-by: Ming Lei Link: https://lore.kernel.org/r/Yi7rdrzQEHjJLGKB@slm.duckdns.org Signed-off-by: Jens Axboe --- include/linux/blk_types.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 5561e58d158a..0c3563b45fe9 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -324,7 +324,8 @@ enum { BIO_TRACE_COMPLETION, /* bio_endio() should trace the final completion * of this bio. */ BIO_CGROUP_ACCT, /* has been accounted to a cgroup */ - BIO_TRACKED, /* set if bio goes through the rq_qos path */ + BIO_QOS_THROTTLED, /* bio went through rq_qos throttle path */ + BIO_QOS_MERGED, /* but went through rq_qos merge path */ BIO_REMAPPED, BIO_ZONE_WRITE_LOCKED, /* Owns a zoned device zone write lock */ BIO_PERCPU_CACHE, /* can participate in per-cpu alloc cache */ -- cgit From 45ceaf14d53a123e5955477da501bc6f26b99039 Mon Sep 17 00:00:00 2001 From: Stephen Boyd Date: Mon, 14 Mar 2022 19:45:37 -0700 Subject: Input: extract ChromeOS vivaldi physmap show function Let's introduce a common library file for the physmap show function duplicated between three different keyboard drivers. This largely copies the code from cros_ec_keyb.c which has the most recent version of the show function, while using the vivaldi_data struct from the hid-vivaldi driver. This saves a small amount of space in an allyesconfig build. $ ./scripts/bloat-o-meter vmlinux.before vmlinux.after add/remove: 3/0 grow/shrink: 2/3 up/down: 412/-720 (-308) Function old new delta vivaldi_function_row_physmap_show - 292 +292 _sub_I_65535_1 1057564 1057616 +52 _sub_D_65535_0 1057564 1057616 +52 e843419@49f2_00062737_9b04 - 8 +8 e843419@20f6_0002a34d_35bc - 8 +8 atkbd_parse_fwnode_data 480 472 -8 atkbd_do_show_function_row_physmap 316 76 -240 function_row_physmap_show 620 148 -472 Total: Before=285581925, After=285581617, chg -0.00% Signed-off-by: Stephen Boyd Tested-by: Stephen Boyd # coachz, wormdingler Link: https://lore.kernel.org/r/20220228075446.466016-3-dmitry.torokhov@gmail.com Signed-off-by: Dmitry Torokhov --- include/linux/input/vivaldi-fmap.h | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) create mode 100644 include/linux/input/vivaldi-fmap.h (limited to 'include/linux') diff --git a/include/linux/input/vivaldi-fmap.h b/include/linux/input/vivaldi-fmap.h new file mode 100644 index 000000000000..7e4b7023bf04 --- /dev/null +++ b/include/linux/input/vivaldi-fmap.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _VIVALDI_FMAP_H +#define _VIVALDI_FMAP_H + +#include + +#define VIVALDI_MAX_FUNCTION_ROW_KEYS 24 + +/** + * struct vivaldi_data - Function row map data for ChromeOS Vivaldi keyboards + * @function_row_physmap: An array of scancodes or their equivalent (HID usage + * codes, encoded rows/columns, etc) for the top + * row function keys, in an order from left to right + * @num_function_row_keys: The number of top row keys in a custom keyboard + * + * This structure is supposed to be used by ChromeOS keyboards using + * the Vivaldi keyboard function row design. + */ +struct vivaldi_data { + u32 function_row_physmap[VIVALDI_MAX_FUNCTION_ROW_KEYS]; + unsigned int num_function_row_keys; +}; + +ssize_t vivaldi_function_row_physmap_show(const struct vivaldi_data *data, + char *buf); + +#endif /* _VIVALDI_FMAP_H */ -- cgit From c8c301abeae58ec756b8fcb2178a632bd3c9e284 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 8 Mar 2022 16:30:18 +0100 Subject: x86/ibt: Add ANNOTATE_NOENDBR In order to have objtool warn about code references to !ENDBR instruction, we need an annotation to allow this for non-control-flow instances -- consider text range checks, text patching, or return trampolines etc. Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Kees Cook Acked-by: Josh Poimboeuf Link: https://lore.kernel.org/r/20220308154317.578968224@infradead.org --- include/linux/objtool.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/objtool.h b/include/linux/objtool.h index aca52db2f3f3..f797368820c8 100644 --- a/include/linux/objtool.h +++ b/include/linux/objtool.h @@ -77,6 +77,12 @@ struct unwind_hint { #define STACK_FRAME_NON_STANDARD_FP(func) #endif +#define ANNOTATE_NOENDBR \ + "986: \n\t" \ + ".pushsection .discard.noendbr\n\t" \ + _ASM_PTR " 986b\n\t" \ + ".popsection\n\t" + #else /* __ASSEMBLY__ */ /* @@ -129,6 +135,13 @@ struct unwind_hint { .popsection .endm +.macro ANNOTATE_NOENDBR +.Lhere_\@: + .pushsection .discard.noendbr + .quad .Lhere_\@ + .popsection +.endm + #endif /* __ASSEMBLY__ */ #else /* !CONFIG_STACK_VALIDATION */ @@ -139,12 +152,15 @@ struct unwind_hint { "\n\t" #define STACK_FRAME_NON_STANDARD(func) #define STACK_FRAME_NON_STANDARD_FP(func) +#define ANNOTATE_NOENDBR #else #define ANNOTATE_INTRA_FUNCTION_CALL .macro UNWIND_HINT sp_reg:req sp_offset=0 type:req end=0 .endm .macro STACK_FRAME_NON_STANDARD func:req .endm +.macro ANNOTATE_NOENDBR +.endm #endif #endif /* CONFIG_STACK_VALIDATION */ -- cgit From cc66bb91457827f62e2b6cb2518666820f0a6c48 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 8 Mar 2022 16:30:32 +0100 Subject: x86/ibt,kprobes: Cure sym+0 equals fentry woes In order to allow kprobes to skip the ENDBR instructions at sym+0 for X86_KERNEL_IBT builds, change _kprobe_addr() to take an architecture callback to inspect the function at hand and modify the offset if needed. This streamlines the existing interface to cover more cases and require less hooks. Once PowerPC gets fully converted there will only be the one arch hook. Signed-off-by: Peter Zijlstra (Intel) Acked-by: Masami Hiramatsu Acked-by: Josh Poimboeuf Link: https://lore.kernel.org/r/20220308154318.405947704@infradead.org --- include/linux/kprobes.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index 19b884353b15..9c28f7a0ef42 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -265,7 +265,6 @@ extern int arch_init_kprobes(void); extern void kprobes_inc_nmissed_count(struct kprobe *p); extern bool arch_within_kprobe_blacklist(unsigned long addr); extern int arch_populate_kprobe_blacklist(void); -extern bool arch_kprobe_on_func_entry(unsigned long offset); extern int kprobe_on_func_entry(kprobe_opcode_t *addr, const char *sym, unsigned long offset); extern bool within_kprobe_blacklist(unsigned long addr); @@ -384,6 +383,8 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void) } kprobe_opcode_t *kprobe_lookup_name(const char *name, unsigned int offset); +kprobe_opcode_t *arch_adjust_kprobe_addr(unsigned long addr, unsigned long offset, bool *on_func_entry); + int register_kprobe(struct kprobe *p); void unregister_kprobe(struct kprobe *p); int register_kprobes(struct kprobe **kps, int num); -- cgit From cb9010f87dcbcdbb51cc96b922c6260848cecbd1 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 8 Mar 2022 16:30:44 +0100 Subject: x86/ibt: Ensure module init/exit points have references Since the references to the module init/exit points only have external references, a module LTO run will consider them 'unused' and seal them, leading to an immediate fail on module load. Signed-off-by: Peter Zijlstra (Intel) Acked-by: Josh Poimboeuf Link: https://lore.kernel.org/r/20220308154319.113767246@infradead.org --- include/linux/cfi.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/cfi.h b/include/linux/cfi.h index 879744aaa6e0..c6dfc1ed0626 100644 --- a/include/linux/cfi.h +++ b/include/linux/cfi.h @@ -34,8 +34,17 @@ static inline void cfi_module_remove(struct module *mod, unsigned long base_addr #else /* !CONFIG_CFI_CLANG */ -#define __CFI_ADDRESSABLE(fn, __attr) +#ifdef CONFIG_X86_KERNEL_IBT + +#define __CFI_ADDRESSABLE(fn, __attr) \ + const void *__cfi_jt_ ## fn __visible __attr = (void *)&fn + +#endif /* CONFIG_X86_KERNEL_IBT */ #endif /* CONFIG_CFI_CLANG */ +#ifndef __CFI_ADDRESSABLE +#define __CFI_ADDRESSABLE(fn, __attr) +#endif + #endif /* _LINUX_CFI_H */ -- cgit From eae654f1c21216daa9fbb92591c0d9f5ae46cfc5 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 8 Mar 2022 16:30:48 +0100 Subject: exit: Mark do_group_exit() __noreturn vmlinux.o: warning: objtool: get_signal()+0x108: unreachable instruction 0000 000000000007f930 : ... 0103 7fa33: e8 00 00 00 00 call 7fa38 7fa34: R_X86_64_PLT32 do_group_exit-0x4 0108 7fa38: 41 8b 45 74 mov 0x74(%r13),%eax Signed-off-by: Peter Zijlstra (Intel) Acked-by: Josh Poimboeuf Link: https://lore.kernel.org/r/20220308154319.351270711@infradead.org --- include/linux/sched/task.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index e84e54d1b490..719c9a6cac8d 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -79,7 +79,7 @@ static inline void exit_thread(struct task_struct *tsk) { } #endif -extern void do_group_exit(int); +extern __noreturn void do_group_exit(int); extern void exit_files(struct task_struct *); extern void exit_itimers(struct signal_struct *); -- cgit From 105cd68596392cfe15056a891b0723609dcad247 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 14 Mar 2022 17:58:35 +0100 Subject: x86: Mark __invalid_creds() __noreturn vmlinux.o: warning: objtool: ksys_unshare()+0x36c: unreachable instruction 0000 0000000000067040 : ... 0364 673a4: 4c 89 ef mov %r13,%rdi 0367 673a7: e8 00 00 00 00 call 673ac 673a8: R_X86_64_PLT32 __invalid_creds-0x4 036c 673ac: e9 28 ff ff ff jmp 672d9 0371 673b1: 41 bc f4 ff ff ff mov $0xfffffff4,%r12d 0377 673b7: e9 80 fd ff ff jmp 6713c Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/Yi9gOW9f1GGwwUD6@hirez.programming.kicks-ass.net --- include/linux/cred.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/cred.h b/include/linux/cred.h index fcbc6885cc09..9ed9232af934 100644 --- a/include/linux/cred.h +++ b/include/linux/cred.h @@ -176,7 +176,7 @@ extern int set_cred_ucounts(struct cred *); * check for validity of credentials */ #ifdef CONFIG_DEBUG_CREDENTIALS -extern void __invalid_creds(const struct cred *, const char *, unsigned); +extern void __noreturn __invalid_creds(const struct cred *, const char *, unsigned); extern void __validate_process_creds(struct task_struct *, const char *, unsigned); -- cgit From dca5da2abe406168b85f97e22109710ebe0bda08 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 14 Mar 2022 18:05:52 +0100 Subject: x86,objtool: Move the ASM_REACHABLE annotation to objtool.h Because we need a variant for .S files too. Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/Yi9gOW9f1GGwwUD6@hirez.programming.kicks-ass.net --- include/linux/compiler.h | 7 ------- include/linux/objtool.h | 16 ++++++++++++++++ 2 files changed, 16 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 0f7fd205ab7e..219aa5ddbc73 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -125,18 +125,11 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, }) #define annotate_unreachable() __annotate_unreachable(__COUNTER__) -#define ASM_REACHABLE \ - "998:\n\t" \ - ".pushsection .discard.reachable\n\t" \ - ".long 998b - .\n\t" \ - ".popsection\n\t" - /* Annotate a C jump table to allow objtool to follow the code flow */ #define __annotate_jump_table __section(".rodata..c_jump_table") #else #define annotate_unreachable() -# define ASM_REACHABLE #define __annotate_jump_table #endif diff --git a/include/linux/objtool.h b/include/linux/objtool.h index f797368820c8..586d35720f13 100644 --- a/include/linux/objtool.h +++ b/include/linux/objtool.h @@ -83,6 +83,12 @@ struct unwind_hint { _ASM_PTR " 986b\n\t" \ ".popsection\n\t" +#define ASM_REACHABLE \ + "998:\n\t" \ + ".pushsection .discard.reachable\n\t" \ + ".long 998b - .\n\t" \ + ".popsection\n\t" + #else /* __ASSEMBLY__ */ /* @@ -142,6 +148,13 @@ struct unwind_hint { .popsection .endm +.macro REACHABLE +.Lhere_\@: + .pushsection .discard.reachable + .long .Lhere_\@ - . + .popsection +.endm + #endif /* __ASSEMBLY__ */ #else /* !CONFIG_STACK_VALIDATION */ @@ -153,6 +166,7 @@ struct unwind_hint { #define STACK_FRAME_NON_STANDARD(func) #define STACK_FRAME_NON_STANDARD_FP(func) #define ANNOTATE_NOENDBR +#define ASM_REACHABLE #else #define ANNOTATE_INTRA_FUNCTION_CALL .macro UNWIND_HINT sp_reg:req sp_offset=0 type:req end=0 @@ -161,6 +175,8 @@ struct unwind_hint { .endm .macro ANNOTATE_NOENDBR .endm +.macro REACHABLE +.endm #endif #endif /* CONFIG_STACK_VALIDATION */ -- cgit From 5ad6b2bdaaea712486145fa5a78ec24d25289071 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:21:28 +0000 Subject: fs: Turn do_invalidatepage() into folio_invalidate() Take a folio instead of a page, fix the types of the offset & length, and export it to filesystems. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/mm.h | 3 --- include/linux/pagemap.h | 1 + 2 files changed, 1 insertion(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 213cc569b192..7808a7959066 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1939,9 +1939,6 @@ int get_kernel_pages(const struct kvec *iov, int nr_pages, int write, struct page **pages); struct page *get_dump_page(unsigned long addr); -extern void do_invalidatepage(struct page *page, unsigned int offset, - unsigned int length); - bool folio_mark_dirty(struct folio *folio); bool set_page_dirty(struct page *page); int set_page_dirty_lock(struct page *page); diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 55a80d8f0e9c..4503d5baa252 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -893,6 +893,7 @@ static inline void cancel_dirty_page(struct page *page) } bool folio_clear_dirty_for_io(struct folio *folio); bool clear_page_dirty_for_io(struct page *page); +void folio_invalidate(struct folio *folio, size_t offset, size_t length); int __must_check folio_write_one(struct folio *folio); static inline int __must_check write_one_page(struct page *page) { -- cgit From 128d1f8241d62ab014eef6dd4ef9bb977dbeadb2 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:21:32 +0000 Subject: fs: Add invalidate_folio() aops method This is used in preference to invalidatepage, if defined. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 5939e6694ada..bcdb613cd652 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -387,6 +387,7 @@ struct address_space_operations { /* Unfortunately this kludge is needed for FIBMAP. Don't use it */ sector_t (*bmap)(struct address_space *, sector_t); + void (*invalidate_folio) (struct folio *, size_t offset, size_t len); void (*invalidatepage) (struct page *, unsigned int, unsigned int); int (*releasepage) (struct page *, gfp_t); void (*freepage)(struct page *); -- cgit From d82354f6b05fc3b35029b3f75ddbf41b82af3bc8 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:21:33 +0000 Subject: iomap: Remove iomap_invalidatepage() Use iomap_invalidate_folio() in all the iomap-based filesystems and rename the iomap_invalidatepage tracepoint. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/iomap.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 3bcbb264f83f..b76f0dd149fb 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -230,8 +230,6 @@ void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops); bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count); int iomap_releasepage(struct page *page, gfp_t gfp_mask); void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len); -void iomap_invalidatepage(struct page *page, unsigned int offset, - unsigned int len); #ifdef CONFIG_MIGRATION int iomap_migrate_page(struct address_space *mapping, struct page *newpage, struct page *page, enum migrate_mode mode); -- cgit From 7ba13abbd31ee9265e88d7dc029c0f786e665192 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:21:34 +0000 Subject: fs: Turn block_invalidatepage into block_invalidate_folio Remove special-casing of a NULL invalidatepage, since there is no more block_invalidatepage. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/buffer_head.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 79d465057889..9ee9d003d736 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -217,8 +217,7 @@ extern int buffer_heads_over_limit; * Generic address_space_operations implementations for buffer_head-backed * address_spaces. */ -void block_invalidatepage(struct page *page, unsigned int offset, - unsigned int length); +void block_invalidate_folio(struct folio *folio, size_t offset, size_t length); int block_write_full_page(struct page *page, get_block_t *get_block, struct writeback_control *wbc); int __block_write_full_page(struct inode *inode, struct page *page, -- cgit From 5660a8630dab61a28e07ec00c42bf605b182d725 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:21:35 +0000 Subject: fs: Remove noop_invalidatepage() We used to have to use noop_invalidatepage() to prevent block_invalidatepage() from being called, but that behaviour is now gone. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/fs.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index bcdb613cd652..a40ea82248da 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3323,8 +3323,6 @@ extern int simple_rename(struct user_namespace *, struct inode *, extern void simple_recursive_removal(struct dentry *, void (*callback)(struct dentry *)); extern int noop_fsync(struct file *, loff_t, loff_t, int); -extern void noop_invalidatepage(struct page *page, unsigned int offset, - unsigned int length); extern ssize_t noop_direct_IO(struct kiocb *iocb, struct iov_iter *iter); extern int simple_empty(struct dentry *); extern int simple_write_begin(struct file *file, struct address_space *mapping, -- cgit From ccd16945dba091fdf1036d7711b9f6cbd287ae28 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:21:43 +0000 Subject: ext4: Convert invalidatepage to invalidate_folio Extensive changes, but fairly mechanical. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/jbd2.h | 4 ++-- include/linux/pagemap.h | 18 ++++++++++++++++++ 2 files changed, 20 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 9c3ada74ffb1..1b9d1e205a2f 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1530,8 +1530,8 @@ void jbd2_journal_set_triggers(struct buffer_head *, struct jbd2_buffer_trigger_type *type); extern int jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *); extern int jbd2_journal_forget (handle_t *, struct buffer_head *); -extern int jbd2_journal_invalidatepage(journal_t *, - struct page *, unsigned int, unsigned int); +int jbd2_journal_invalidate_folio(journal_t *, struct folio *, + size_t offset, size_t length); extern int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page); extern int jbd2_journal_stop(handle_t *); extern int jbd2_journal_flush(journal_t *journal, unsigned int flags); diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 4503d5baa252..6a9617e9c6bc 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -422,6 +422,24 @@ static inline struct folio *filemap_get_folio(struct address_space *mapping, return __filemap_get_folio(mapping, index, 0, 0); } +/** + * filemap_lock_folio - Find and lock a folio. + * @mapping: The address_space to search. + * @index: The page index. + * + * Looks up the page cache entry at @mapping & @index. If a folio is + * present, it is returned locked with an increased refcount. + * + * Context: May sleep. + * Return: A folio or %NULL if there is no folio in the cache for this + * index. Will not return a shadow, swap or DAX entry. + */ +static inline struct folio *filemap_lock_folio(struct address_space *mapping, + pgoff_t index) +{ + return __filemap_get_folio(mapping, index, FGP_LOCK, 0); +} + /** * find_get_page - find and get a page reference * @mapping: the address_space to search -- cgit From 6d740c76ea86053208b20c41fb5ec1de07acb996 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:21:47 +0000 Subject: nfs: Convert from invalidatepage to invalidate_folio Print the folio index instead of the pointer, since this is more useful. We also don't need to use page_file_mapping() as we do not invalidate swapcache pages. Since this is the only caller of nfs_wb_page_cancel(), convert it to nfs_wb_folio_cancel(). Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/nfs_fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index 68f81d8d36de..784120cc217e 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -583,7 +583,7 @@ extern int nfs_updatepage(struct file *, struct page *, unsigned int, unsigned extern int nfs_sync_inode(struct inode *inode); extern int nfs_wb_all(struct inode *inode); extern int nfs_wb_page(struct inode *inode, struct page *page); -extern int nfs_wb_page_cancel(struct inode *inode, struct page* page); +int nfs_wb_folio_cancel(struct inode *inode, struct folio *folio); extern int nfs_commit_inode(struct inode *, int); extern struct nfs_commit_data *nfs_commitdata_alloc(bool never_fail); extern void nfs_commit_free(struct nfs_commit_data *data); -- cgit From f50015a596fa106bf642bd85fbf6e6b52cc913d0 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:21:51 +0000 Subject: fs: Remove aops->invalidatepage With all users migrated to ->invalidate_folio, remove the old operation. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/fs.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index a40ea82248da..af9ae091bd82 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -388,7 +388,6 @@ struct address_space_operations { /* Unfortunately this kludge is needed for FIBMAP. Don't use it */ sector_t (*bmap)(struct address_space *, sector_t); void (*invalidate_folio) (struct folio *, size_t offset, size_t len); - void (*invalidatepage) (struct page *, unsigned int, unsigned int); int (*releasepage) (struct page *, gfp_t); void (*freepage)(struct page *); ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter); -- cgit From affa80e8c6a1df473694c2087259901872309cc4 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:21:52 +0000 Subject: fs: Add aops->launder_folio Since the only difference between ->launder_page and ->launder_folio is the type of the pointer, these can safely use a union without affecting bisectability. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/fs.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index af9ae091bd82..0af3075cdff2 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -399,7 +399,10 @@ struct address_space_operations { struct page *, struct page *, enum migrate_mode); bool (*isolate_page)(struct page *, isolate_mode_t); void (*putback_page)(struct page *); - int (*launder_page) (struct page *); + union { + int (*launder_page) (struct page *); + int (*launder_folio) (struct folio *); + }; bool (*is_partially_uptodate) (struct folio *, size_t from, size_t count); void (*is_dirty_writeback) (struct page *, bool *, bool *); -- cgit From 072acba6d08730beba5bad293c7ce6d0c4b0624c Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:21:59 +0000 Subject: fs: Remove aops->launder_page With all users converted to ->launder_folio, remove ->launder_page. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/fs.h | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 0af3075cdff2..055be40084f1 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -399,10 +399,7 @@ struct address_space_operations { struct page *, struct page *, enum migrate_mode); bool (*isolate_page)(struct page *, isolate_mode_t); void (*putback_page)(struct page *); - union { - int (*launder_page) (struct page *); - int (*launder_folio) (struct folio *); - }; + int (*launder_folio)(struct folio *); bool (*is_partially_uptodate) (struct folio *, size_t from, size_t count); void (*is_dirty_writeback) (struct page *, bool *, bool *); -- cgit From 6f31a5a261dbbe7bf7f585dfe81f8acd4b25ec3b Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:22:00 +0000 Subject: fs: Add aops->dirty_folio This replaces ->set_page_dirty(). It returns a bool instead of an int and takes the address_space as a parameter instead of expecting the implementations to retrieve the address_space from the page. This is particularly important for filesystems which use FS_OPS for swap. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 055be40084f1..c3d5db8851ae 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -369,6 +369,7 @@ struct address_space_operations { /* Set a page dirty. Return true if this dirtied it */ int (*set_page_dirty)(struct page *page); + bool (*dirty_folio)(struct address_space *, struct folio *); /* * Reads in the requested pages. Unlike ->readpage(), this is -- cgit From 8fb72b4a76933ae6f86725cc8e4a8190ba84d755 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:22:01 +0000 Subject: fscache: Convert fscache_set_page_dirty() to fscache_dirty_folio() Convert all users of fscache_set_page_dirty to use fscache_dirty_folio. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/fscache.h | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fscache.h b/include/linux/fscache.h index 296c5f1d9f35..d44ff747a657 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -616,9 +616,11 @@ static inline void fscache_write_to_cache(struct fscache_cookie *cookie, } #if __fscache_available -extern int fscache_set_page_dirty(struct page *page, struct fscache_cookie *cookie); +bool fscache_dirty_folio(struct address_space *mapping, struct folio *folio, + struct fscache_cookie *cookie); #else -#define fscache_set_page_dirty(PAGE, COOKIE) (__set_page_dirty_nobuffers((PAGE))) +#define fscache_dirty_folio(MAPPING, FOLIO, COOKIE) \ + filemap_dirty_folio(MAPPING, FOLIO) #endif /** @@ -626,7 +628,7 @@ extern int fscache_set_page_dirty(struct page *page, struct fscache_cookie *cook * @wbc: The writeback control * @cookie: The cookie referring to the cache object * - * Unpin the writeback resources pinned by fscache_set_page_dirty(). This is + * Unpin the writeback resources pinned by fscache_dirty_folio(). This is * intended to be called by the netfs's ->write_inode() method. */ static inline void fscache_unpin_writeback(struct writeback_control *wbc, -- cgit From 7e63df00cf5e609ebbee5ffbc3df1900d8a4443c Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:22:10 +0000 Subject: mm: Convert swap_set_page_dirty() to swap_dirty_folio() Straightforward conversion. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/swap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 1d38d9475c4d..65a37e555124 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -427,7 +427,7 @@ extern int swap_writepage(struct page *page, struct writeback_control *wbc); extern void end_swap_bio_write(struct bio *bio); extern int __swap_writepage(struct page *page, struct writeback_control *wbc, bio_end_io_t end_write_func); -extern int swap_set_page_dirty(struct page *page); +bool swap_dirty_folio(struct address_space *mapping, struct folio *folio); int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page, unsigned long nr_pages, sector_t start_block); -- cgit From 938d3480b92fa5e454b7734294f12a7b75126f09 Mon Sep 17 00:00:00 2001 From: Wang Yufen Date: Fri, 4 Mar 2022 16:11:42 +0800 Subject: bpf, sockmap: Fix memleak in sk_psock_queue_msg If tcp_bpf_sendmsg is running during a tear down operation we may enqueue data on the ingress msg queue while tear down is trying to free it. sk1 (redirect sk2) sk2 ------------------- --------------- tcp_bpf_sendmsg() tcp_bpf_send_verdict() tcp_bpf_sendmsg_redir() bpf_tcp_ingress() sock_map_close() lock_sock() lock_sock() ... blocking sk_psock_stop sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED); release_sock(sk); lock_sock() sk_mem_charge() get_page() sk_psock_queue_msg() sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED); drop_sk_msg() release_sock() While drop_sk_msg(), the msg has charged memory form sk by sk_mem_charge and has sg pages need to put. To fix we use sk_msg_free() and then kfee() msg. This issue can cause the following info: WARNING: CPU: 0 PID: 9202 at net/core/stream.c:205 sk_stream_kill_queues+0xc8/0xe0 Call Trace: inet_csk_destroy_sock+0x55/0x110 tcp_rcv_state_process+0xe5f/0xe90 ? sk_filter_trim_cap+0x10d/0x230 ? tcp_v4_do_rcv+0x161/0x250 tcp_v4_do_rcv+0x161/0x250 tcp_v4_rcv+0xc3a/0xce0 ip_protocol_deliver_rcu+0x3d/0x230 ip_local_deliver_finish+0x54/0x60 ip_local_deliver+0xfd/0x110 ? ip_protocol_deliver_rcu+0x230/0x230 ip_rcv+0xd6/0x100 ? ip_local_deliver+0x110/0x110 __netif_receive_skb_one_core+0x85/0xa0 process_backlog+0xa4/0x160 __napi_poll+0x29/0x1b0 net_rx_action+0x287/0x300 __do_softirq+0xff/0x2fc do_softirq+0x79/0x90 WARNING: CPU: 0 PID: 531 at net/ipv4/af_inet.c:154 inet_sock_destruct+0x175/0x1b0 Call Trace: __sk_destruct+0x24/0x1f0 sk_psock_destroy+0x19b/0x1c0 process_one_work+0x1b3/0x3c0 ? process_one_work+0x3c0/0x3c0 worker_thread+0x30/0x350 ? process_one_work+0x3c0/0x3c0 kthread+0xe6/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x22/0x30 Fixes: 9635720b7c88 ("bpf, sockmap: Fix memleak on ingress msg enqueue") Signed-off-by: Wang Yufen Signed-off-by: Daniel Borkmann Acked-by: John Fastabend Link: https://lore.kernel.org/bpf/20220304081145.2037182-2-wangyufen@huawei.com --- include/linux/skmsg.h | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index fdb5375f0562..c5a2d6f50f25 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -304,21 +304,16 @@ static inline void sock_drop(struct sock *sk, struct sk_buff *skb) kfree_skb(skb); } -static inline void drop_sk_msg(struct sk_psock *psock, struct sk_msg *msg) -{ - if (msg->skb) - sock_drop(psock->sk, msg->skb); - kfree(msg); -} - static inline void sk_psock_queue_msg(struct sk_psock *psock, struct sk_msg *msg) { spin_lock_bh(&psock->ingress_lock); if (sk_psock_test_state(psock, SK_PSOCK_TX_ENABLED)) list_add_tail(&msg->list, &psock->ingress_msg); - else - drop_sk_msg(psock, msg); + else { + sk_msg_free(psock->sk, msg); + kfree(msg); + } spin_unlock_bh(&psock->ingress_lock); } -- cgit From 23a9dbbe0faf124fc4c139615633b9d12a3a89ef Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Tue, 15 Mar 2022 18:34:06 +0300 Subject: NFSD: prevent integer overflow on 32 bit systems On a 32 bit system, the "len * sizeof(*p)" operation can have an integer overflow. Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter Signed-off-by: Chuck Lever --- include/linux/sunrpc/xdr.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index b519609af1d0..4417f667c757 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -731,6 +731,8 @@ xdr_stream_decode_uint32_array(struct xdr_stream *xdr, if (unlikely(xdr_stream_decode_u32(xdr, &len) < 0)) return -EBADMSG; + if (len > SIZE_MAX / sizeof(*p)) + return -EBADMSG; p = xdr_inline_decode(xdr, len * sizeof(*p)); if (unlikely(!p)) return -EBADMSG; -- cgit From b0ae33a2d2fb6c55117b377ec4ae3f2c84eab6a2 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 4 Mar 2022 16:19:55 +0100 Subject: usb: early: xhci-dbc: Remove duplicate keep parsing The generic earlyprintk= parsing already parses the optional ",keep", no need to duplicate that in the xdbc driver. Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220304152135.975568860@infradead.org Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/xhci-dbgp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/usb/xhci-dbgp.h b/include/linux/usb/xhci-dbgp.h index 0a37f1283bf0..01fe768873f9 100644 --- a/include/linux/usb/xhci-dbgp.h +++ b/include/linux/usb/xhci-dbgp.h @@ -15,7 +15,7 @@ #define __LINUX_XHCI_DBGP_H #ifdef CONFIG_EARLY_PRINTK_USB_XDBC -int __init early_xdbc_parse_parameter(char *s); +int __init early_xdbc_parse_parameter(char *s, int keep_early); int __init early_xdbc_setup_hardware(void); void __init early_xdbc_register_console(void); #else -- cgit From ff5812e00d5e7bdeae68784d99686eff67896931 Mon Sep 17 00:00:00 2001 From: Shameer Kolothum Date: Tue, 8 Mar 2022 18:48:54 +0000 Subject: crypto: hisilicon/qm: Move the QM header to include/linux Since we are going to introduce VFIO PCI HiSilicon ACC driver for live migration in subsequent patches, move the ACC QM header file to a common include dir. Acked-by: Zhou Wang Acked-by: Longfang Liu Acked-by: Kai Ye Signed-off-by: Shameer Kolothum Link: https://lore.kernel.org/r/20220308184902.2242-2-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson --- include/linux/hisi_acc_qm.h | 441 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 441 insertions(+) create mode 100644 include/linux/hisi_acc_qm.h (limited to 'include/linux') diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h new file mode 100644 index 000000000000..3068093229a5 --- /dev/null +++ b/include/linux/hisi_acc_qm.h @@ -0,0 +1,441 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Copyright (c) 2019 HiSilicon Limited. */ +#ifndef HISI_ACC_QM_H +#define HISI_ACC_QM_H + +#include +#include +#include +#include +#include + +#define QM_QNUM_V1 4096 +#define QM_QNUM_V2 1024 +#define QM_MAX_VFS_NUM_V2 63 + +/* qm user domain */ +#define QM_ARUSER_M_CFG_1 0x100088 +#define AXUSER_SNOOP_ENABLE BIT(30) +#define AXUSER_CMD_TYPE GENMASK(14, 12) +#define AXUSER_CMD_SMMU_NORMAL 1 +#define AXUSER_NS BIT(6) +#define AXUSER_NO BIT(5) +#define AXUSER_FP BIT(4) +#define AXUSER_SSV BIT(0) +#define AXUSER_BASE (AXUSER_SNOOP_ENABLE | \ + FIELD_PREP(AXUSER_CMD_TYPE, \ + AXUSER_CMD_SMMU_NORMAL) | \ + AXUSER_NS | AXUSER_NO | AXUSER_FP) +#define QM_ARUSER_M_CFG_ENABLE 0x100090 +#define ARUSER_M_CFG_ENABLE 0xfffffffe +#define QM_AWUSER_M_CFG_1 0x100098 +#define QM_AWUSER_M_CFG_ENABLE 0x1000a0 +#define AWUSER_M_CFG_ENABLE 0xfffffffe +#define QM_WUSER_M_CFG_ENABLE 0x1000a8 +#define WUSER_M_CFG_ENABLE 0xffffffff + +/* qm cache */ +#define QM_CACHE_CTL 0x100050 +#define SQC_CACHE_ENABLE BIT(0) +#define CQC_CACHE_ENABLE BIT(1) +#define SQC_CACHE_WB_ENABLE BIT(4) +#define SQC_CACHE_WB_THRD GENMASK(10, 5) +#define CQC_CACHE_WB_ENABLE BIT(11) +#define CQC_CACHE_WB_THRD GENMASK(17, 12) +#define QM_AXI_M_CFG 0x1000ac +#define AXI_M_CFG 0xffff +#define QM_AXI_M_CFG_ENABLE 0x1000b0 +#define AM_CFG_SINGLE_PORT_MAX_TRANS 0x300014 +#define AXI_M_CFG_ENABLE 0xffffffff +#define QM_PEH_AXUSER_CFG 0x1000cc +#define QM_PEH_AXUSER_CFG_ENABLE 0x1000d0 +#define PEH_AXUSER_CFG 0x401001 +#define PEH_AXUSER_CFG_ENABLE 0xffffffff + +#define QM_AXI_RRESP BIT(0) +#define QM_AXI_BRESP BIT(1) +#define QM_ECC_MBIT BIT(2) +#define QM_ECC_1BIT BIT(3) +#define QM_ACC_GET_TASK_TIMEOUT BIT(4) +#define QM_ACC_DO_TASK_TIMEOUT BIT(5) +#define QM_ACC_WB_NOT_READY_TIMEOUT BIT(6) +#define QM_SQ_CQ_VF_INVALID BIT(7) +#define QM_CQ_VF_INVALID BIT(8) +#define QM_SQ_VF_INVALID BIT(9) +#define QM_DB_TIMEOUT BIT(10) +#define QM_OF_FIFO_OF BIT(11) +#define QM_DB_RANDOM_INVALID BIT(12) +#define QM_MAILBOX_TIMEOUT BIT(13) +#define QM_FLR_TIMEOUT BIT(14) + +#define QM_BASE_NFE (QM_AXI_RRESP | QM_AXI_BRESP | QM_ECC_MBIT | \ + QM_ACC_GET_TASK_TIMEOUT | QM_DB_TIMEOUT | \ + QM_OF_FIFO_OF | QM_DB_RANDOM_INVALID | \ + QM_MAILBOX_TIMEOUT | QM_FLR_TIMEOUT) +#define QM_BASE_CE QM_ECC_1BIT + +#define QM_Q_DEPTH 1024 +#define QM_MIN_QNUM 2 +#define HISI_ACC_SGL_SGE_NR_MAX 255 +#define QM_SHAPER_CFG 0x100164 +#define QM_SHAPER_ENABLE BIT(30) +#define QM_SHAPER_TYPE1_OFFSET 10 + +/* page number for queue file region */ +#define QM_DOORBELL_PAGE_NR 1 + +/* uacce mode of the driver */ +#define UACCE_MODE_NOUACCE 0 /* don't use uacce */ +#define UACCE_MODE_SVA 1 /* use uacce sva mode */ +#define UACCE_MODE_DESC "0(default) means only register to crypto, 1 means both register to crypto and uacce" + +enum qm_stop_reason { + QM_NORMAL, + QM_SOFT_RESET, + QM_FLR, +}; + +enum qm_state { + QM_INIT = 0, + QM_START, + QM_CLOSE, + QM_STOP, +}; + +enum qp_state { + QP_INIT = 1, + QP_START, + QP_STOP, + QP_CLOSE, +}; + +enum qm_hw_ver { + QM_HW_UNKNOWN = -1, + QM_HW_V1 = 0x20, + QM_HW_V2 = 0x21, + QM_HW_V3 = 0x30, +}; + +enum qm_fun_type { + QM_HW_PF, + QM_HW_VF, +}; + +enum qm_debug_file { + CURRENT_QM, + CURRENT_Q, + CLEAR_ENABLE, + DEBUG_FILE_NUM, +}; + +struct qm_dfx { + atomic64_t err_irq_cnt; + atomic64_t aeq_irq_cnt; + atomic64_t abnormal_irq_cnt; + atomic64_t create_qp_err_cnt; + atomic64_t mb_err_cnt; +}; + +struct debugfs_file { + enum qm_debug_file index; + struct mutex lock; + struct qm_debug *debug; +}; + +struct qm_debug { + u32 curr_qm_qp_num; + u32 sqe_mask_offset; + u32 sqe_mask_len; + struct qm_dfx dfx; + struct dentry *debug_root; + struct dentry *qm_d; + struct debugfs_file files[DEBUG_FILE_NUM]; +}; + +struct qm_shaper_factor { + u32 func_qos; + u64 cir_b; + u64 cir_u; + u64 cir_s; + u64 cbs_s; +}; + +struct qm_dma { + void *va; + dma_addr_t dma; + size_t size; +}; + +struct hisi_qm_status { + u32 eq_head; + bool eqc_phase; + u32 aeq_head; + bool aeqc_phase; + atomic_t flags; + int stop_reason; +}; + +struct hisi_qm; + +struct hisi_qm_err_info { + char *acpi_rst; + u32 msi_wr_port; + u32 ecc_2bits_mask; + u32 dev_ce_mask; + u32 ce; + u32 nfe; + u32 fe; +}; + +struct hisi_qm_err_status { + u32 is_qm_ecc_mbit; + u32 is_dev_ecc_mbit; +}; + +struct hisi_qm_err_ini { + int (*hw_init)(struct hisi_qm *qm); + void (*hw_err_enable)(struct hisi_qm *qm); + void (*hw_err_disable)(struct hisi_qm *qm); + u32 (*get_dev_hw_err_status)(struct hisi_qm *qm); + void (*clear_dev_hw_err_status)(struct hisi_qm *qm, u32 err_sts); + void (*open_axi_master_ooo)(struct hisi_qm *qm); + void (*close_axi_master_ooo)(struct hisi_qm *qm); + void (*open_sva_prefetch)(struct hisi_qm *qm); + void (*close_sva_prefetch)(struct hisi_qm *qm); + void (*log_dev_hw_err)(struct hisi_qm *qm, u32 err_sts); + void (*err_info_init)(struct hisi_qm *qm); +}; + +struct hisi_qm_list { + struct mutex lock; + struct list_head list; + int (*register_to_crypto)(struct hisi_qm *qm); + void (*unregister_from_crypto)(struct hisi_qm *qm); +}; + +struct hisi_qm { + enum qm_hw_ver ver; + enum qm_fun_type fun_type; + const char *dev_name; + struct pci_dev *pdev; + void __iomem *io_base; + void __iomem *db_io_base; + u32 sqe_size; + u32 qp_base; + u32 qp_num; + u32 qp_in_used; + u32 ctrl_qp_num; + u32 max_qp_num; + u32 vfs_num; + u32 db_interval; + struct list_head list; + struct hisi_qm_list *qm_list; + + struct qm_dma qdma; + struct qm_sqc *sqc; + struct qm_cqc *cqc; + struct qm_eqe *eqe; + struct qm_aeqe *aeqe; + dma_addr_t sqc_dma; + dma_addr_t cqc_dma; + dma_addr_t eqe_dma; + dma_addr_t aeqe_dma; + + struct hisi_qm_status status; + const struct hisi_qm_err_ini *err_ini; + struct hisi_qm_err_info err_info; + struct hisi_qm_err_status err_status; + unsigned long misc_ctl; /* driver removing and reset sched */ + + struct rw_semaphore qps_lock; + struct idr qp_idr; + struct hisi_qp *qp_array; + + struct mutex mailbox_lock; + + const struct hisi_qm_hw_ops *ops; + + struct qm_debug debug; + + u32 error_mask; + + struct workqueue_struct *wq; + struct work_struct work; + struct work_struct rst_work; + struct work_struct cmd_process; + + const char *algs; + bool use_sva; + bool is_frozen; + + /* doorbell isolation enable */ + bool use_db_isolation; + resource_size_t phys_base; + resource_size_t db_phys_base; + struct uacce_device *uacce; + int mode; + struct qm_shaper_factor *factor; + u32 mb_qos; + u32 type_rate; +}; + +struct hisi_qp_status { + atomic_t used; + u16 sq_tail; + u16 cq_head; + bool cqc_phase; + atomic_t flags; +}; + +struct hisi_qp_ops { + int (*fill_sqe)(void *sqe, void *q_parm, void *d_parm); +}; + +struct hisi_qp { + u32 qp_id; + u8 alg_type; + u8 req_type; + + struct qm_dma qdma; + void *sqe; + struct qm_cqe *cqe; + dma_addr_t sqe_dma; + dma_addr_t cqe_dma; + + struct hisi_qp_status qp_status; + struct hisi_qp_ops *hw_ops; + void *qp_ctx; + void (*req_cb)(struct hisi_qp *qp, void *data); + void (*event_cb)(struct hisi_qp *qp); + + struct hisi_qm *qm; + bool is_resetting; + bool is_in_kernel; + u16 pasid; + struct uacce_queue *uacce_q; +}; + +static inline int q_num_set(const char *val, const struct kernel_param *kp, + unsigned int device) +{ + struct pci_dev *pdev = pci_get_device(PCI_VENDOR_ID_HUAWEI, + device, NULL); + u32 n, q_num; + int ret; + + if (!val) + return -EINVAL; + + if (!pdev) { + q_num = min_t(u32, QM_QNUM_V1, QM_QNUM_V2); + pr_info("No device found currently, suppose queue number is %u\n", + q_num); + } else { + if (pdev->revision == QM_HW_V1) + q_num = QM_QNUM_V1; + else + q_num = QM_QNUM_V2; + } + + ret = kstrtou32(val, 10, &n); + if (ret || n < QM_MIN_QNUM || n > q_num) + return -EINVAL; + + return param_set_int(val, kp); +} + +static inline int vfs_num_set(const char *val, const struct kernel_param *kp) +{ + u32 n; + int ret; + + if (!val) + return -EINVAL; + + ret = kstrtou32(val, 10, &n); + if (ret < 0) + return ret; + + if (n > QM_MAX_VFS_NUM_V2) + return -EINVAL; + + return param_set_int(val, kp); +} + +static inline int mode_set(const char *val, const struct kernel_param *kp) +{ + u32 n; + int ret; + + if (!val) + return -EINVAL; + + ret = kstrtou32(val, 10, &n); + if (ret != 0 || (n != UACCE_MODE_SVA && + n != UACCE_MODE_NOUACCE)) + return -EINVAL; + + return param_set_int(val, kp); +} + +static inline int uacce_mode_set(const char *val, const struct kernel_param *kp) +{ + return mode_set(val, kp); +} + +static inline void hisi_qm_init_list(struct hisi_qm_list *qm_list) +{ + INIT_LIST_HEAD(&qm_list->list); + mutex_init(&qm_list->lock); +} + +int hisi_qm_init(struct hisi_qm *qm); +void hisi_qm_uninit(struct hisi_qm *qm); +int hisi_qm_start(struct hisi_qm *qm); +int hisi_qm_stop(struct hisi_qm *qm, enum qm_stop_reason r); +struct hisi_qp *hisi_qm_create_qp(struct hisi_qm *qm, u8 alg_type); +int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg); +int hisi_qm_stop_qp(struct hisi_qp *qp); +void hisi_qm_release_qp(struct hisi_qp *qp); +int hisi_qp_send(struct hisi_qp *qp, const void *msg); +int hisi_qm_get_free_qp_num(struct hisi_qm *qm); +int hisi_qm_get_vft(struct hisi_qm *qm, u32 *base, u32 *number); +void hisi_qm_debug_init(struct hisi_qm *qm); +enum qm_hw_ver hisi_qm_get_hw_version(struct pci_dev *pdev); +void hisi_qm_debug_regs_clear(struct hisi_qm *qm); +int hisi_qm_sriov_enable(struct pci_dev *pdev, int max_vfs); +int hisi_qm_sriov_disable(struct pci_dev *pdev, bool is_frozen); +int hisi_qm_sriov_configure(struct pci_dev *pdev, int num_vfs); +void hisi_qm_dev_err_init(struct hisi_qm *qm); +void hisi_qm_dev_err_uninit(struct hisi_qm *qm); +pci_ers_result_t hisi_qm_dev_err_detected(struct pci_dev *pdev, + pci_channel_state_t state); +pci_ers_result_t hisi_qm_dev_slot_reset(struct pci_dev *pdev); +void hisi_qm_reset_prepare(struct pci_dev *pdev); +void hisi_qm_reset_done(struct pci_dev *pdev); + +struct hisi_acc_sgl_pool; +struct hisi_acc_hw_sgl *hisi_acc_sg_buf_map_to_hw_sgl(struct device *dev, + struct scatterlist *sgl, struct hisi_acc_sgl_pool *pool, + u32 index, dma_addr_t *hw_sgl_dma); +void hisi_acc_sg_buf_unmap(struct device *dev, struct scatterlist *sgl, + struct hisi_acc_hw_sgl *hw_sgl); +struct hisi_acc_sgl_pool *hisi_acc_create_sgl_pool(struct device *dev, + u32 count, u32 sge_nr); +void hisi_acc_free_sgl_pool(struct device *dev, + struct hisi_acc_sgl_pool *pool); +int hisi_qm_alloc_qps_node(struct hisi_qm_list *qm_list, int qp_num, + u8 alg_type, int node, struct hisi_qp **qps); +void hisi_qm_free_qps(struct hisi_qp **qps, int qp_num); +void hisi_qm_dev_shutdown(struct pci_dev *pdev); +void hisi_qm_wait_task_finish(struct hisi_qm *qm, struct hisi_qm_list *qm_list); +int hisi_qm_alg_register(struct hisi_qm *qm, struct hisi_qm_list *qm_list); +void hisi_qm_alg_unregister(struct hisi_qm *qm, struct hisi_qm_list *qm_list); +int hisi_qm_resume(struct device *dev); +int hisi_qm_suspend(struct device *dev); +void hisi_qm_pm_uninit(struct hisi_qm *qm); +void hisi_qm_pm_init(struct hisi_qm *qm); +int hisi_qm_get_dfx_access(struct hisi_qm *qm); +void hisi_qm_put_dfx_access(struct hisi_qm *qm); +void hisi_qm_regs_dump(struct seq_file *s, struct debugfs_regset32 *regset); +#endif -- cgit From b4b084d7133284e29777ca445e001186259602d4 Mon Sep 17 00:00:00 2001 From: Longfang Liu Date: Tue, 8 Mar 2022 18:48:55 +0000 Subject: crypto: hisilicon/qm: Move few definitions to common header Move Doorbell and Mailbox definitions to common header file. Also export QM mailbox functions. This will be useful when we introduce VFIO PCI HiSilicon ACC live migration driver. Signed-off-by: Longfang Liu Acked-by: Zhou Wang Signed-off-by: Shameer Kolothum Link: https://lore.kernel.org/r/20220308184902.2242-3-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson --- include/linux/hisi_acc_qm.h | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h index 3068093229a5..6a6477c34666 100644 --- a/include/linux/hisi_acc_qm.h +++ b/include/linux/hisi_acc_qm.h @@ -34,6 +34,40 @@ #define QM_WUSER_M_CFG_ENABLE 0x1000a8 #define WUSER_M_CFG_ENABLE 0xffffffff +/* mailbox */ +#define QM_MB_CMD_SQC 0x0 +#define QM_MB_CMD_CQC 0x1 +#define QM_MB_CMD_EQC 0x2 +#define QM_MB_CMD_AEQC 0x3 +#define QM_MB_CMD_SQC_BT 0x4 +#define QM_MB_CMD_CQC_BT 0x5 +#define QM_MB_CMD_SQC_VFT_V2 0x6 +#define QM_MB_CMD_STOP_QP 0x8 +#define QM_MB_CMD_SRC 0xc +#define QM_MB_CMD_DST 0xd + +#define QM_MB_CMD_SEND_BASE 0x300 +#define QM_MB_EVENT_SHIFT 8 +#define QM_MB_BUSY_SHIFT 13 +#define QM_MB_OP_SHIFT 14 +#define QM_MB_CMD_DATA_ADDR_L 0x304 +#define QM_MB_CMD_DATA_ADDR_H 0x308 +#define QM_MB_MAX_WAIT_CNT 6000 + +/* doorbell */ +#define QM_DOORBELL_CMD_SQ 0 +#define QM_DOORBELL_CMD_CQ 1 +#define QM_DOORBELL_CMD_EQ 2 +#define QM_DOORBELL_CMD_AEQ 3 + +#define QM_DOORBELL_SQ_CQ_BASE_V2 0x1000 +#define QM_DOORBELL_EQ_AEQ_BASE_V2 0x2000 +#define QM_QP_MAX_NUM_SHIFT 11 +#define QM_DB_CMD_SHIFT_V2 12 +#define QM_DB_RAND_SHIFT_V2 16 +#define QM_DB_INDEX_SHIFT_V2 32 +#define QM_DB_PRIORITY_SHIFT_V2 48 + /* qm cache */ #define QM_CACHE_CTL 0x100050 #define SQC_CACHE_ENABLE BIT(0) @@ -414,6 +448,10 @@ pci_ers_result_t hisi_qm_dev_slot_reset(struct pci_dev *pdev); void hisi_qm_reset_prepare(struct pci_dev *pdev); void hisi_qm_reset_done(struct pci_dev *pdev); +int hisi_qm_wait_mb_ready(struct hisi_qm *qm); +int hisi_qm_mb(struct hisi_qm *qm, u8 cmd, dma_addr_t dma_addr, u16 queue, + bool op); + struct hisi_acc_sgl_pool; struct hisi_acc_hw_sgl *hisi_acc_sg_buf_map_to_hw_sgl(struct device *dev, struct scatterlist *sgl, struct hisi_acc_sgl_pool *pool, -- cgit From fae74feacd2dfb6172c4862fc760f969a991c06f Mon Sep 17 00:00:00 2001 From: Shameer Kolothum Date: Tue, 8 Mar 2022 18:48:56 +0000 Subject: hisi_acc_qm: Move VF PCI device IDs to common header Move the PCI Device IDs of HiSilicon ACC VF devices to a common header and also use a uniform naming convention. This will be useful when we introduce the vfio PCI HiSilicon ACC live migration driver in subsequent patches. Cc: Bjorn Helgaas Acked-by: Zhou Wang Acked-by: Longfang Liu Acked-by: Kai Ye Signed-off-by: Shameer Kolothum Acked-by: Bjorn Helgaas # pci_ids.h Link: https://lore.kernel.org/r/20220308184902.2242-4-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson --- include/linux/pci_ids.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index aad54c666407..31dee2b65a62 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -2529,6 +2529,9 @@ #define PCI_DEVICE_ID_KORENIX_JETCARDF3 0x17ff #define PCI_VENDOR_ID_HUAWEI 0x19e5 +#define PCI_DEVICE_ID_HUAWEI_ZIP_VF 0xa251 +#define PCI_DEVICE_ID_HUAWEI_SEC_VF 0xa256 +#define PCI_DEVICE_ID_HUAWEI_HPRE_VF 0xa259 #define PCI_VENDOR_ID_NETRONOME 0x19ee #define PCI_DEVICE_ID_NETRONOME_NFP4000 0x4000 -- cgit From 442fbc099b839551f8576723da22c1269cc695ce Mon Sep 17 00:00:00 2001 From: Shameer Kolothum Date: Tue, 8 Mar 2022 18:48:59 +0000 Subject: hisi_acc_vfio_pci: Add helper to retrieve the struct pci_driver struct pci_driver pointer is an input into the pci_iov_get_pf_drvdata(). Introduce helpers to retrieve the ACC PF dev struct pci_driver pointers as we use this in ACC vfio migration driver. Acked-by: Zhou Wang Acked-by: Kai Ye Acked-by: Longfang Liu Signed-off-by: Shameer Kolothum Link: https://lore.kernel.org/r/20220308184902.2242-7-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson --- include/linux/hisi_acc_qm.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h index 6a6477c34666..00f2a4db8723 100644 --- a/include/linux/hisi_acc_qm.h +++ b/include/linux/hisi_acc_qm.h @@ -476,4 +476,9 @@ void hisi_qm_pm_init(struct hisi_qm *qm); int hisi_qm_get_dfx_access(struct hisi_qm *qm); void hisi_qm_put_dfx_access(struct hisi_qm *qm); void hisi_qm_regs_dump(struct seq_file *s, struct debugfs_regset32 *regset); + +/* Used by VFIO ACC live migration driver */ +struct pci_driver *hisi_sec_get_pf_driver(void); +struct pci_driver *hisi_hpre_get_pf_driver(void); +struct pci_driver *hisi_zip_get_pf_driver(void); #endif -- cgit From 1e459b25081d4f939b8a1fb4c71dab0cec8f974a Mon Sep 17 00:00:00 2001 From: Longfang Liu Date: Tue, 8 Mar 2022 18:49:00 +0000 Subject: crypto: hisilicon/qm: Set the VF QM state register We use VF QM state register to record the status of the QM configuration state. This will be used in the ACC migration driver to determine whether we can safely save and restore the QM data. Signed-off-by: Longfang Liu Acked-by: Zhou Wang Signed-off-by: Shameer Kolothum Link: https://lore.kernel.org/r/20220308184902.2242-8-shameerali.kolothum.thodi@huawei.com Signed-off-by: Alex Williamson --- include/linux/hisi_acc_qm.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h index 00f2a4db8723..177f7b7cd414 100644 --- a/include/linux/hisi_acc_qm.h +++ b/include/linux/hisi_acc_qm.h @@ -67,6 +67,7 @@ #define QM_DB_RAND_SHIFT_V2 16 #define QM_DB_INDEX_SHIFT_V2 32 #define QM_DB_PRIORITY_SHIFT_V2 48 +#define QM_VF_STATE 0x60 /* qm cache */ #define QM_CACHE_CTL 0x100050 @@ -162,6 +163,11 @@ enum qm_debug_file { DEBUG_FILE_NUM, }; +enum qm_vf_state { + QM_READY = 0, + QM_NOT_READY, +}; + struct qm_dfx { atomic64_t err_irq_cnt; atomic64_t aeq_irq_cnt; -- cgit From 8126b1c73108bc691f5643df19071a59a69d0bc6 Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Mon, 14 Mar 2022 19:59:53 +0100 Subject: pstore: Don't use semaphores in always-atomic-context code pstore_dump() is *always* invoked in atomic context (nowadays in an RCU read-side critical section, before that under a spinlock). It doesn't make sense to try to use semaphores here. This is mostly a revert of commit ea84b580b955 ("pstore: Convert buf_lock to semaphore"), except that two parts aren't restored back exactly as they were: - keep the lock initialization in pstore_register - in efi_pstore_write(), always set the "block" flag to false - omit "is_locked", that was unnecessary since commit 959217c84c27 ("pstore: Actually give up during locking failure") - fix the bailout message The actual problem that the buggy commit was trying to address may have been that the use of preemptible() in efi_pstore_write() was wrong - it only looks at preempt_count() and the state of IRQs, but __rcu_read_lock() doesn't touch either of those under CONFIG_PREEMPT_RCU. (Sidenote: CONFIG_PREEMPT_RCU means that the scheduler can preempt tasks in RCU read-side critical sections, but you're not allowed to actively block/reschedule.) Lockdep probably never caught the problem because it's very rare that you actually hit the contended case, so lockdep always just sees the down_trylock(), not the down_interruptible(), and so it can't tell that there's a problem. Fixes: ea84b580b955 ("pstore: Convert buf_lock to semaphore") Cc: stable@vger.kernel.org Acked-by: Sebastian Andrzej Siewior Signed-off-by: Jann Horn Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220314185953.2068993-1-jannh@google.com --- include/linux/pstore.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pstore.h b/include/linux/pstore.h index eb93a54cff31..e97a8188f0fd 100644 --- a/include/linux/pstore.h +++ b/include/linux/pstore.h @@ -14,7 +14,7 @@ #include #include #include -#include +#include #include #include @@ -87,7 +87,7 @@ struct pstore_record { * @owner: module which is responsible for this backend driver * @name: name of the backend driver * - * @buf_lock: semaphore to serialize access to @buf + * @buf_lock: spinlock to serialize access to @buf * @buf: preallocated crash dump buffer * @bufsize: size of @buf available for crash dump bytes (must match * smallest number of bytes available for writing to a @@ -178,7 +178,7 @@ struct pstore_info { struct module *owner; const char *name; - struct semaphore buf_lock; + spinlock_t buf_lock; char *buf; size_t bufsize; -- cgit From e621900ad28b748e058b81d6078a5d5eb37b3973 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:22:12 +0000 Subject: fs: Convert __set_page_dirty_buffers to block_dirty_folio Convert all callers; mostly this is just changing the aops to point at it, but a few implementations need a little more work. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/buffer_head.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 9ee9d003d736..bcb4fe9b8575 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -397,7 +397,7 @@ __bread(struct block_device *bdev, sector_t block, unsigned size) return __bread_gfp(bdev, block, size, __GFP_MOVABLE); } -extern int __set_page_dirty_buffers(struct page *page); +bool block_dirty_folio(struct address_space *mapping, struct folio *folio); #else /* CONFIG_BLOCK */ -- cgit From 46de8b979492e1377947700ecb1e3169088668b2 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:22:13 +0000 Subject: fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio This is a mechanical change. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/pagemap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 6a9617e9c6bc..ab85d2856a37 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -919,7 +919,7 @@ static inline int __must_check write_one_page(struct page *page) } int __set_page_dirty_nobuffers(struct page *page); -int __set_page_dirty_no_writeback(struct page *page); +bool noop_dirty_folio(struct address_space *mapping, struct folio *folio); void page_endio(struct page *page, bool is_write, int err); -- cgit From 3a3bae50af5d73fab5da20484029de77ca67bb2e Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 9 Feb 2022 20:22:15 +0000 Subject: fs: Remove aops ->set_page_dirty With all implementations converted to ->dirty_folio, we can stop calling this fallback method and remove it entirely. Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Damien Le Moal Acked-by: Damien Le Moal Tested-by: Mike Marshall # orangefs Tested-by: David Howells # afs --- include/linux/fs.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index c3d5db8851ae..b472d78f00b0 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -367,8 +367,7 @@ struct address_space_operations { /* Write back some dirty pages from this mapping. */ int (*writepages)(struct address_space *, struct writeback_control *); - /* Set a page dirty. Return true if this dirtied it */ - int (*set_page_dirty)(struct page *page); + /* Mark a folio dirty. Return true if this dirtied it */ bool (*dirty_folio)(struct address_space *, struct folio *); /* -- cgit From f6c46b1d62f8ffbf2cf6eb43ab0d277bd3f7e948 Mon Sep 17 00:00:00 2001 From: David Woodhouse Date: Fri, 11 Mar 2022 19:20:17 +0000 Subject: PM: hibernate: Honour ACPI hardware signature by default for virtual guests The ACPI specification says that OSPM should refuse to restore from hibernate if the hardware signature changes, and should boot from scratch. However, real BIOSes often vary the hardware signature in cases where we *do* want to resume from hibernate, so Linux doesn't follow the spec by default. However, in a virtual environment there's no reason for the VMM to vary the hardware signature *unless* it wants to trigger a clean reboot as defined by the ACPI spec. So enable the check by default if a hypervisor is detected. Signed-off-by: David Woodhouse Signed-off-by: Rafael J. Wysocki --- include/linux/acpi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 6274758648e3..766dbcb82df1 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -526,7 +526,7 @@ acpi_status acpi_release_memory(acpi_handle handle, struct resource *res, int acpi_resources_are_enforced(void); #ifdef CONFIG_HIBERNATION -void __init acpi_check_s4_hw_signature(int check); +extern int acpi_check_s4_hw_signature; #endif #ifdef CONFIG_PM_SLEEP -- cgit From d2a3b7c5becc3992f8e7d2b9bf5eacceeedb9a48 Mon Sep 17 00:00:00 2001 From: Hou Tao Date: Wed, 9 Mar 2022 20:33:20 +0800 Subject: bpf: Fix net.core.bpf_jit_harden race It is the bpf_jit_harden counterpart to commit 60b58afc96c9 ("bpf: fix net.core.bpf_jit_enable race"). bpf_jit_harden will be tested twice for each subprog if there are subprogs in bpf program and constant blinding may increase the length of program, so when running "./test_progs -t subprogs" and toggling bpf_jit_harden between 0 and 2, jit_subprogs may fail because constant blinding increases the length of subprog instructions during extra passs. So cache the value of bpf_jit_blinding_enabled() during program allocation, and use the cached value during constant blinding, subprog JITing and args tracking of tail call. Signed-off-by: Hou Tao Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220309123321.2400262-4-houtao1@huawei.com --- include/linux/filter.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/filter.h b/include/linux/filter.h index 05ed9bd31b45..ed0c0ff42ad5 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -566,6 +566,7 @@ struct bpf_prog { gpl_compatible:1, /* Is filter GPL compatible? */ cb_access:1, /* Is control block accessed? */ dst_needed:1, /* Do we need dst entry? */ + blinding_requested:1, /* needs constant blinding */ blinded:1, /* Was blinded */ is_func:1, /* program is a bpf function */ kprobe_override:1, /* Do we override a kprobe? */ -- cgit From 4ee06de7729d795773145692e246a06448b1eb7a Mon Sep 17 00:00:00 2001 From: Nicolas Dichtel Date: Tue, 15 Mar 2022 10:20:08 +0100 Subject: net: handle ARPHRD_PIMREG in dev_is_mac_header_xmit() This kind of interface doesn't have a mac header. This patch fixes bpf_redirect() to a PIM interface. Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper") Signed-off-by: Nicolas Dichtel Link: https://lore.kernel.org/r/20220315092008.31423-1-nicolas.dichtel@6wind.com Signed-off-by: Jakub Kicinski --- include/linux/if_arp.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/if_arp.h b/include/linux/if_arp.h index b712217f7030..1ed52441972f 100644 --- a/include/linux/if_arp.h +++ b/include/linux/if_arp.h @@ -52,6 +52,7 @@ static inline bool dev_is_mac_header_xmit(const struct net_device *dev) case ARPHRD_VOID: case ARPHRD_NONE: case ARPHRD_RAWIP: + case ARPHRD_PIMREG: return false; default: return true; -- cgit From c42fa24b44751c62c86e98430ef915c0609a2ab8 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Wed, 16 Mar 2022 13:39:03 +0100 Subject: ACPI: bus: Avoid using CPPC if not supported by firmware If the platform firmware indicates that it does not support CPPC by clearing the OSC_SB_CPC_SUPPORT and OSC_SB_CPCV2_SUPPORT bits in the platform _OSC capabilities mask, avoid attempting to evaluate _CPC which may fail in that case. Because the OSC_SB_CPC_SUPPORT and OSC_SB_CPCV2_SUPPORT bits are only added to the supported platform capabilities mask on x86, when X86_FEATURE_HWP is supported, allow _CPC to be evaluated regardless in the other cases. Link: https://lore.kernel.org/linux-acpi/CAJZ5v0i=ecAksq0TV+iLVObm-=fUfdqPABzzkgm9K6KxO1ZCcg@mail.gmail.com Signed-off-by: Rafael J. Wysocki Tested-by: Mario Limonciello Acked-by: Huang Rui Reviewed-by: Mika Westerberg --- include/linux/acpi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 6274758648e3..690b0533104a 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -580,6 +580,7 @@ acpi_status acpi_run_osc(acpi_handle handle, struct acpi_osc_context *context); extern bool osc_sb_apei_support_acked; extern bool osc_pc_lpi_support_confirmed; extern bool osc_sb_native_usb4_support_confirmed; +extern bool osc_sb_cppc_not_supported; /* USB4 Capabilities */ #define OSC_USB_USB3_TUNNELING 0x00000001 -- cgit From 20e1d6402a71dba7ad2b81f332a3c14c7d3b939b Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Thu, 17 Mar 2022 09:14:42 -0500 Subject: ACPI / x86: Add support for LPS0 callback handler Currenty the latest thing run during a suspend to idle attempt is the LPS0 `prepare_late` callback and the earliest thing is the `resume_early` callback. There is a desire for the `amd-pmc` driver to suspend later in the suspend process (ideally the very last thing), so create a callback that it or any other driver can hook into to do this. Signed-off-by: Mario Limonciello Acked-by: Rafael J. Wysocki Link: https://lore.kernel.org/r/20220317141445.6498-1-mario.limonciello@amd.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/acpi.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 6274758648e3..47c16cdc8e0e 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -1023,7 +1023,15 @@ void acpi_os_set_prepare_extended_sleep(int (*func)(u8 sleep_state, acpi_status acpi_os_prepare_extended_sleep(u8 sleep_state, u32 val_a, u32 val_b); - +#ifdef CONFIG_X86 +struct acpi_s2idle_dev_ops { + struct list_head list_node; + void (*prepare)(void); + void (*restore)(void); +}; +int acpi_register_lps0_dev(struct acpi_s2idle_dev_ops *arg); +void acpi_unregister_lps0_dev(struct acpi_s2idle_dev_ops *arg); +#endif /* CONFIG_X86 */ #ifndef CONFIG_IA64 void arch_reserve_mem_area(acpi_physical_address addr, size_t size); #else -- cgit From 4206fe40b2c01fb5988980cff6ea958b16578026 Mon Sep 17 00:00:00 2001 From: Tariq Toukan Date: Tue, 27 Apr 2021 16:30:32 +0300 Subject: net/mlx5: Remove unused exported contiguous coherent buffer allocation API All WQ types moved to using the fragmented allocation API for coherent memory. Contiguous API is not used anymore. Remove it, reduce the number of exported functions. Signed-off-by: Tariq Toukan Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 00a914b0716e..a386aec1eb65 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1009,9 +1009,6 @@ void mlx5_start_health_poll(struct mlx5_core_dev *dev); void mlx5_stop_health_poll(struct mlx5_core_dev *dev, bool disable_health); void mlx5_drain_health_wq(struct mlx5_core_dev *dev); void mlx5_trigger_health_work(struct mlx5_core_dev *dev); -int mlx5_buf_alloc(struct mlx5_core_dev *dev, - int size, struct mlx5_frag_buf *buf); -void mlx5_buf_free(struct mlx5_core_dev *dev, struct mlx5_frag_buf *buf); int mlx5_frag_buf_alloc_node(struct mlx5_core_dev *dev, int size, struct mlx5_frag_buf *buf, int node); void mlx5_frag_buf_free(struct mlx5_core_dev *dev, struct mlx5_frag_buf *buf); -- cgit From 770c9a3a01af178a90368a78c75eb91707c7233c Mon Sep 17 00:00:00 2001 From: Tariq Toukan Date: Thu, 20 May 2021 15:34:57 +0300 Subject: net/mlx5: Remove unused fill page array API function mlx5_fill_page_array API function is not used. Remove it, reduce the number of exported functions. Signed-off-by: Tariq Toukan Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index a386aec1eb65..96cd740d94a3 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1036,7 +1036,6 @@ int mlx5_reclaim_startup_pages(struct mlx5_core_dev *dev); void mlx5_register_debugfs(void); void mlx5_unregister_debugfs(void); -void mlx5_fill_page_array(struct mlx5_frag_buf *buf, __be64 *pas); void mlx5_fill_page_frag_array_perm(struct mlx5_frag_buf *buf, __be64 *pas, u8 perm); void mlx5_fill_page_frag_array(struct mlx5_frag_buf *frag_buf, __be64 *pas); int mlx5_vector2eqn(struct mlx5_core_dev *dev, int vector, int *eqn); -- cgit From cceac97afa090284b3ceecd93ea6b7b527116767 Mon Sep 17 00:00:00 2001 From: Tobias Waldekranz Date: Wed, 16 Mar 2022 16:08:49 +0100 Subject: net: bridge: mst: Add helper to map an MSTI to a VID set br_mst_get_info answers the question: "On this bridge, which VIDs are mapped to the given MSTI?" This is useful in switchdev drivers, which might have to fan-out operations, relating to an MSTI, per VLAN. An example: When a port's MST state changes from forwarding to blocking, a driver may choose to flush the dynamic FDB entries on that port to get faster reconvergence of the network, but this should only be done in the VLANs that are managed by the MSTI in question. Signed-off-by: Tobias Waldekranz Reviewed-by: Vladimir Oltean Acked-by: Nikolay Aleksandrov Signed-off-by: Jakub Kicinski --- include/linux/if_bridge.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h index 3aae023a9353..1cf0cc46d90d 100644 --- a/include/linux/if_bridge.h +++ b/include/linux/if_bridge.h @@ -119,6 +119,7 @@ int br_vlan_get_info(const struct net_device *dev, u16 vid, struct bridge_vlan_info *p_vinfo); int br_vlan_get_info_rcu(const struct net_device *dev, u16 vid, struct bridge_vlan_info *p_vinfo); +int br_mst_get_info(const struct net_device *dev, u16 msti, unsigned long *vids); #else static inline bool br_vlan_enabled(const struct net_device *dev) { @@ -151,6 +152,12 @@ static inline int br_vlan_get_info_rcu(const struct net_device *dev, u16 vid, { return -EINVAL; } + +static inline int br_mst_get_info(const struct net_device *dev, u16 msti, + unsigned long *vids) +{ + return -EINVAL; +} #endif #if IS_ENABLED(CONFIG_BRIDGE) -- cgit From 48d57b2e5f439246c4e12ed7705a5e3241294b03 Mon Sep 17 00:00:00 2001 From: Tobias Waldekranz Date: Wed, 16 Mar 2022 16:08:50 +0100 Subject: net: bridge: mst: Add helper to check if MST is enabled This is useful for switchdev drivers that might want to refuse to join a bridge where MST is enabled, if the hardware can't support it. Signed-off-by: Tobias Waldekranz Reviewed-by: Vladimir Oltean Acked-by: Nikolay Aleksandrov Signed-off-by: Jakub Kicinski --- include/linux/if_bridge.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h index 1cf0cc46d90d..4efd5540279a 100644 --- a/include/linux/if_bridge.h +++ b/include/linux/if_bridge.h @@ -119,6 +119,7 @@ int br_vlan_get_info(const struct net_device *dev, u16 vid, struct bridge_vlan_info *p_vinfo); int br_vlan_get_info_rcu(const struct net_device *dev, u16 vid, struct bridge_vlan_info *p_vinfo); +bool br_mst_enabled(const struct net_device *dev); int br_mst_get_info(const struct net_device *dev, u16 msti, unsigned long *vids); #else static inline bool br_vlan_enabled(const struct net_device *dev) @@ -153,6 +154,11 @@ static inline int br_vlan_get_info_rcu(const struct net_device *dev, u16 vid, return -EINVAL; } +static inline bool br_mst_enabled(const struct net_device *dev) +{ + return false; +} + static inline int br_mst_get_info(const struct net_device *dev, u16 msti, unsigned long *vids) { -- cgit From f54fd0e163068ced4d0d27db64cd3cbda95b74e4 Mon Sep 17 00:00:00 2001 From: Tobias Waldekranz Date: Wed, 16 Mar 2022 16:08:51 +0100 Subject: net: bridge: mst: Add helper to query a port's MST state This is useful for switchdev drivers who are offloading MST states into hardware. As an example, a driver may wish to flush the FDB for a port when it transitions from forwarding to blocking - which means that the previous state must be discoverable. Signed-off-by: Tobias Waldekranz Reviewed-by: Vladimir Oltean Acked-by: Nikolay Aleksandrov Signed-off-by: Jakub Kicinski --- include/linux/if_bridge.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h index 4efd5540279a..d62ef428e3aa 100644 --- a/include/linux/if_bridge.h +++ b/include/linux/if_bridge.h @@ -121,6 +121,7 @@ int br_vlan_get_info_rcu(const struct net_device *dev, u16 vid, struct bridge_vlan_info *p_vinfo); bool br_mst_enabled(const struct net_device *dev); int br_mst_get_info(const struct net_device *dev, u16 msti, unsigned long *vids); +int br_mst_get_state(const struct net_device *dev, u16 msti, u8 *state); #else static inline bool br_vlan_enabled(const struct net_device *dev) { @@ -164,6 +165,11 @@ static inline int br_mst_get_info(const struct net_device *dev, u16 msti, { return -EINVAL; } +static inline int br_mst_get_state(const struct net_device *dev, u16 msti, + u8 *state) +{ + return -EINVAL; +} #endif #if IS_ENABLED(CONFIG_BRIDGE) -- cgit From 4f554e955614f19425cee86de4669351741a6280 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Tue, 15 Mar 2022 23:00:26 +0900 Subject: ftrace: Add ftrace_set_filter_ips function Adding ftrace_set_filter_ips function to be able to set filter on multiple ip addresses at once. With the kprobe multi attach interface we have cases where we need to initialize ftrace_ops object with thousands of functions, so having single function diving into ftrace_hash_move_and_update_ops with ftrace_lock is faster. The functions ips are passed as unsigned long array with count. Signed-off-by: Jiri Olsa Signed-off-by: Steven Rostedt (Google) Tested-by: Steven Rostedt (Google) Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/164735282673.1084943.18310504594134769804.stgit@devnote2 --- include/linux/ftrace.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 9999e29187de..60847cbce0da 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -512,6 +512,8 @@ struct dyn_ftrace { int ftrace_set_filter_ip(struct ftrace_ops *ops, unsigned long ip, int remove, int reset); +int ftrace_set_filter_ips(struct ftrace_ops *ops, unsigned long *ips, + unsigned int cnt, int remove, int reset); int ftrace_set_filter(struct ftrace_ops *ops, unsigned char *buf, int len, int reset); int ftrace_set_notrace(struct ftrace_ops *ops, unsigned char *buf, @@ -802,6 +804,7 @@ static inline unsigned long ftrace_location(unsigned long ip) #define ftrace_regex_open(ops, flag, inod, file) ({ -ENODEV; }) #define ftrace_set_early_filter(ops, buf, enable) do { } while (0) #define ftrace_set_filter_ip(ops, ip, remove, reset) ({ -ENODEV; }) +#define ftrace_set_filter_ips(ops, ips, cnt, remove, reset) ({ -ENODEV; }) #define ftrace_set_filter(ops, buf, len, reset) ({ -ENODEV; }) #define ftrace_set_notrace(ops, buf, len, reset) ({ -ENODEV; }) #define ftrace_free_filter(ops) do { } while (0) -- cgit From cad9931f64dc7f5dbdec12cae9f30063360f9855 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Tue, 15 Mar 2022 23:00:38 +0900 Subject: fprobe: Add ftrace based probe APIs The fprobe is a wrapper API for ftrace function tracer. Unlike kprobes, this probes only supports the function entry, but this can probe multiple functions by one fprobe. The usage is similar, user will set their callback to fprobe::entry_handler and call register_fprobe*() with probed functions. There are 3 registration interfaces, - register_fprobe() takes filtering patterns of the functin names. - register_fprobe_ips() takes an array of ftrace-location addresses. - register_fprobe_syms() takes an array of function names. The registered fprobes can be unregistered with unregister_fprobe(). e.g. struct fprobe fp = { .entry_handler = user_handler }; const char *targets[] = { "func1", "func2", "func3"}; ... ret = register_fprobe_syms(&fp, targets, ARRAY_SIZE(targets)); ... unregister_fprobe(&fp); Signed-off-by: Masami Hiramatsu Signed-off-by: Steven Rostedt (Google) Tested-by: Steven Rostedt (Google) Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/164735283857.1084943.1154436951479395551.stgit@devnote2 --- include/linux/fprobe.h | 87 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 87 insertions(+) create mode 100644 include/linux/fprobe.h (limited to 'include/linux') diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h new file mode 100644 index 000000000000..2ba099aff041 --- /dev/null +++ b/include/linux/fprobe.h @@ -0,0 +1,87 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* Simple ftrace probe wrapper */ +#ifndef _LINUX_FPROBE_H +#define _LINUX_FPROBE_H + +#include +#include + +/** + * struct fprobe - ftrace based probe. + * @ops: The ftrace_ops. + * @nmissed: The counter for missing events. + * @flags: The status flag. + * @entry_handler: The callback function for function entry. + */ +struct fprobe { +#ifdef CONFIG_FUNCTION_TRACER + /* + * If CONFIG_FUNCTION_TRACER is not set, CONFIG_FPROBE is disabled too. + * But user of fprobe may keep embedding the struct fprobe on their own + * code. To avoid build error, this will keep the fprobe data structure + * defined here, but remove ftrace_ops data structure. + */ + struct ftrace_ops ops; +#endif + unsigned long nmissed; + unsigned int flags; + void (*entry_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs); +}; + +#define FPROBE_FL_DISABLED 1 + +static inline bool fprobe_disabled(struct fprobe *fp) +{ + return (fp) ? fp->flags & FPROBE_FL_DISABLED : false; +} + +#ifdef CONFIG_FPROBE +int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter); +int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num); +int register_fprobe_syms(struct fprobe *fp, const char **syms, int num); +int unregister_fprobe(struct fprobe *fp); +#else +static inline int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter) +{ + return -EOPNOTSUPP; +} +static inline int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num) +{ + return -EOPNOTSUPP; +} +static inline int register_fprobe_syms(struct fprobe *fp, const char **syms, int num) +{ + return -EOPNOTSUPP; +} +static inline int unregister_fprobe(struct fprobe *fp) +{ + return -EOPNOTSUPP; +} +#endif + +/** + * disable_fprobe() - Disable fprobe + * @fp: The fprobe to be disabled. + * + * This will soft-disable @fp. Note that this doesn't remove the ftrace + * hooks from the function entry. + */ +static inline void disable_fprobe(struct fprobe *fp) +{ + if (fp) + fp->flags |= FPROBE_FL_DISABLED; +} + +/** + * enable_fprobe() - Enable fprobe + * @fp: The fprobe to be enabled. + * + * This will soft-enable @fp. + */ +static inline void enable_fprobe(struct fprobe *fp) +{ + if (fp) + fp->flags &= ~FPROBE_FL_DISABLED; +} + +#endif -- cgit From 54ecbe6f1ed5138c895bdff55608cf502755b20e Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Tue, 15 Mar 2022 23:00:50 +0900 Subject: rethook: Add a generic return hook Add a return hook framework which hooks the function return. Most of the logic came from the kretprobe, but this is independent from kretprobe. Note that this is expected to be used with other function entry hooking feature, like ftrace, fprobe, adn kprobes. Eventually this will replace the kretprobe (e.g. kprobe + rethook = kretprobe), but at this moment, this is just an additional hook. Signed-off-by: Masami Hiramatsu Signed-off-by: Steven Rostedt (Google) Tested-by: Steven Rostedt (Google) Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/164735285066.1084943.9259661137330166643.stgit@devnote2 --- include/linux/rethook.h | 100 ++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/sched.h | 3 ++ 2 files changed, 103 insertions(+) create mode 100644 include/linux/rethook.h (limited to 'include/linux') diff --git a/include/linux/rethook.h b/include/linux/rethook.h new file mode 100644 index 000000000000..c8ac1e5afcd1 --- /dev/null +++ b/include/linux/rethook.h @@ -0,0 +1,100 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Return hooking with list-based shadow stack. + */ +#ifndef _LINUX_RETHOOK_H +#define _LINUX_RETHOOK_H + +#include +#include +#include +#include +#include +#include + +struct rethook_node; + +typedef void (*rethook_handler_t) (struct rethook_node *, void *, struct pt_regs *); + +/** + * struct rethook - The rethook management data structure. + * @data: The user-defined data storage. + * @handler: The user-defined return hook handler. + * @pool: The pool of struct rethook_node. + * @ref: The reference counter. + * @rcu: The rcu_head for deferred freeing. + * + * Don't embed to another data structure, because this is a self-destructive + * data structure when all rethook_node are freed. + */ +struct rethook { + void *data; + rethook_handler_t handler; + struct freelist_head pool; + refcount_t ref; + struct rcu_head rcu; +}; + +/** + * struct rethook_node - The rethook shadow-stack entry node. + * @freelist: The freelist, linked to struct rethook::pool. + * @rcu: The rcu_head for deferred freeing. + * @llist: The llist, linked to a struct task_struct::rethooks. + * @rethook: The pointer to the struct rethook. + * @ret_addr: The storage for the real return address. + * @frame: The storage for the frame pointer. + * + * You can embed this to your extended data structure to store any data + * on each entry of the shadow stack. + */ +struct rethook_node { + union { + struct freelist_node freelist; + struct rcu_head rcu; + }; + struct llist_node llist; + struct rethook *rethook; + unsigned long ret_addr; + unsigned long frame; +}; + +struct rethook *rethook_alloc(void *data, rethook_handler_t handler); +void rethook_free(struct rethook *rh); +void rethook_add_node(struct rethook *rh, struct rethook_node *node); +struct rethook_node *rethook_try_get(struct rethook *rh); +void rethook_recycle(struct rethook_node *node); +void rethook_hook(struct rethook_node *node, struct pt_regs *regs, bool mcount); +unsigned long rethook_find_ret_addr(struct task_struct *tsk, unsigned long frame, + struct llist_node **cur); + +/* Arch dependent code must implement arch_* and trampoline code */ +void arch_rethook_prepare(struct rethook_node *node, struct pt_regs *regs, bool mcount); +void arch_rethook_trampoline(void); + +/** + * is_rethook_trampoline() - Check whether the address is rethook trampoline + * @addr: The address to be checked + * + * Return true if the @addr is the rethook trampoline address. + */ +static inline bool is_rethook_trampoline(unsigned long addr) +{ + return addr == (unsigned long)dereference_symbol_descriptor(arch_rethook_trampoline); +} + +/* If the architecture needs to fixup the return address, implement it. */ +void arch_rethook_fixup_return(struct pt_regs *regs, + unsigned long correct_ret_addr); + +/* Generic trampoline handler, arch code must prepare asm stub */ +unsigned long rethook_trampoline_handler(struct pt_regs *regs, + unsigned long frame); + +#ifdef CONFIG_RETHOOK +void rethook_flush_task(struct task_struct *tk); +#else +#define rethook_flush_task(tsk) do { } while (0) +#endif + +#endif + diff --git a/include/linux/sched.h b/include/linux/sched.h index 75ba8aa60248..7034f53404e3 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1481,6 +1481,9 @@ struct task_struct { #ifdef CONFIG_KRETPROBES struct llist_head kretprobe_instances; #endif +#ifdef CONFIG_RETHOOK + struct llist_head rethooks; +#endif #ifdef CONFIG_ARCH_HAS_PARANOID_L1D_FLUSH /* -- cgit From 5b0ab78998e32564a011b14c4c7f9c81e2d42b9d Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Tue, 15 Mar 2022 23:01:48 +0900 Subject: fprobe: Add exit_handler support Add exit_handler to fprobe. fprobe + rethook allows us to hook the kernel function return. The rethook will be enabled only if the fprobe::exit_handler is set. Signed-off-by: Masami Hiramatsu Signed-off-by: Steven Rostedt (Google) Tested-by: Steven Rostedt (Google) Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/164735290790.1084943.10601965782208052202.stgit@devnote2 --- include/linux/fprobe.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h index 2ba099aff041..8eefec2b485e 100644 --- a/include/linux/fprobe.h +++ b/include/linux/fprobe.h @@ -5,13 +5,16 @@ #include #include +#include /** * struct fprobe - ftrace based probe. * @ops: The ftrace_ops. * @nmissed: The counter for missing events. * @flags: The status flag. + * @rethook: The rethook data structure. (internal data) * @entry_handler: The callback function for function entry. + * @exit_handler: The callback function for function exit. */ struct fprobe { #ifdef CONFIG_FUNCTION_TRACER @@ -25,7 +28,10 @@ struct fprobe { #endif unsigned long nmissed; unsigned int flags; + struct rethook *rethook; + void (*entry_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs); + void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs); }; #define FPROBE_FL_DISABLED 1 -- cgit From ab51e15d535e07be9839e0df056a4ebe9c5bac83 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Tue, 15 Mar 2022 23:02:11 +0900 Subject: fprobe: Introduce FPROBE_FL_KPROBE_SHARED flag for fprobe Introduce FPROBE_FL_KPROBE_SHARED flag for sharing fprobe callback with kprobes safely from the viewpoint of recursion. Since the recursion safety of the fprobe (and ftrace) is a bit different from the kprobes, this may cause an issue if user wants to run the same code from the fprobe and the kprobes. The kprobes has per-cpu 'current_kprobe' variable which protects the kprobe handler from recursion in any case. On the other hand, the fprobe uses only ftrace_test_recursion_trylock(), which will allow interrupt context calls another (or same) fprobe during the fprobe user handler is running. This is not a matter in cases if the common callback shared among the kprobes and the fprobe has its own recursion detection, or it can handle the recursion in the different contexts (normal/interrupt/NMI.) But if it relies on the 'current_kprobe' recursion lock, it has to check kprobe_running() and use kprobe_busy_*() APIs. Fprobe has FPROBE_FL_KPROBE_SHARED flag to do this. If your common callback code will be shared with kprobes, please set FPROBE_FL_KPROBE_SHARED *before* registering the fprobe, like; fprobe.flags = FPROBE_FL_KPROBE_SHARED; register_fprobe(&fprobe, "func*", NULL); This will protect your common callback from the nested call. Signed-off-by: Masami Hiramatsu Signed-off-by: Steven Rostedt (Google) Tested-by: Steven Rostedt (Google) Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/164735293127.1084943.15687374237275817599.stgit@devnote2 --- include/linux/fprobe.h | 12 ++++++++++++ include/linux/kprobes.h | 3 +++ 2 files changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fprobe.h b/include/linux/fprobe.h index 8eefec2b485e..1c2bde0ead73 100644 --- a/include/linux/fprobe.h +++ b/include/linux/fprobe.h @@ -34,13 +34,25 @@ struct fprobe { void (*exit_handler)(struct fprobe *fp, unsigned long entry_ip, struct pt_regs *regs); }; +/* This fprobe is soft-disabled. */ #define FPROBE_FL_DISABLED 1 +/* + * This fprobe handler will be shared with kprobes. + * This flag must be set before registering. + */ +#define FPROBE_FL_KPROBE_SHARED 2 + static inline bool fprobe_disabled(struct fprobe *fp) { return (fp) ? fp->flags & FPROBE_FL_DISABLED : false; } +static inline bool fprobe_shared_with_kprobes(struct fprobe *fp) +{ + return (fp) ? fp->flags & FPROBE_FL_KPROBE_SHARED : false; +} + #ifdef CONFIG_FPROBE int register_fprobe(struct fprobe *fp, const char *filter, const char *notfilter); int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num); diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index 19b884353b15..5f1859836deb 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -427,6 +427,9 @@ static inline struct kprobe *kprobe_running(void) { return NULL; } +#define kprobe_busy_begin() do {} while (0) +#define kprobe_busy_end() do {} while (0) + static inline int register_kprobe(struct kprobe *p) { return -EOPNOTSUPP; -- cgit From a0019cd7d41a191253859349535d76ee28ab7d96 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Wed, 16 Mar 2022 13:24:07 +0100 Subject: lib/sort: Add priv pointer to swap function Adding support to have priv pointer in swap callback function. Following the initial change on cmp callback functions [1] and adding SWAP_WRAPPER macro to identify sort call of sort_r. Signed-off-by: Jiri Olsa Signed-off-by: Alexei Starovoitov Reviewed-by: Masami Hiramatsu Link: https://lore.kernel.org/bpf/20220316122419.933957-2-jolsa@kernel.org [1] 4333fb96ca10 ("media: lib/sort.c: implement sort() variant taking context argument") --- include/linux/sort.h | 2 +- include/linux/types.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sort.h b/include/linux/sort.h index b5898725fe9d..e163287ac6c1 100644 --- a/include/linux/sort.h +++ b/include/linux/sort.h @@ -6,7 +6,7 @@ void sort_r(void *base, size_t num, size_t size, cmp_r_func_t cmp_func, - swap_func_t swap_func, + swap_r_func_t swap_func, const void *priv); void sort(void *base, size_t num, size_t size, diff --git a/include/linux/types.h b/include/linux/types.h index ac825ad90e44..ea8cf60a8a79 100644 --- a/include/linux/types.h +++ b/include/linux/types.h @@ -226,6 +226,7 @@ struct callback_head { typedef void (*rcu_callback_t)(struct rcu_head *head); typedef void (*call_rcu_func_t)(struct rcu_head *head, rcu_callback_t func); +typedef void (*swap_r_func_t)(void *a, void *b, int size, const void *priv); typedef void (*swap_func_t)(void *a, void *b, int size); typedef int (*cmp_r_func_t)(const void *a, const void *b, const void *priv); -- cgit From 0dcac272540613d41c05e89679e4ddb978b612f1 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Wed, 16 Mar 2022 13:24:09 +0100 Subject: bpf: Add multi kprobe link Adding new link type BPF_LINK_TYPE_KPROBE_MULTI that attaches kprobe program through fprobe API. The fprobe API allows to attach probe on multiple functions at once very fast, because it works on top of ftrace. On the other hand this limits the probe point to the function entry or return. The kprobe program gets the same pt_regs input ctx as when it's attached through the perf API. Adding new attach type BPF_TRACE_KPROBE_MULTI that allows attachment kprobe to multiple function with new link. User provides array of addresses or symbols with count to attach the kprobe program to. The new link_create uapi interface looks like: struct { __u32 flags; __u32 cnt; __aligned_u64 syms; __aligned_u64 addrs; } kprobe_multi; The flags field allows single BPF_TRACE_KPROBE_MULTI bit to create return multi kprobe. Signed-off-by: Masami Hiramatsu Signed-off-by: Jiri Olsa Signed-off-by: Alexei Starovoitov Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20220316122419.933957-4-jolsa@kernel.org --- include/linux/bpf_types.h | 1 + include/linux/trace_events.h | 7 +++++++ 2 files changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index 48a91c51c015..3e24ad0c4b3c 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -140,3 +140,4 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp) #ifdef CONFIG_PERF_EVENTS BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf) #endif +BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE_MULTI, kprobe_multi) diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index dcea51fb60e2..8f0e9e7cb493 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -15,6 +15,7 @@ struct array_buffer; struct tracer; struct dentry; struct bpf_prog; +union bpf_attr; const char *trace_print_flags_seq(struct trace_seq *p, const char *delim, unsigned long flags, @@ -738,6 +739,7 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp); int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id, u32 *fd_type, const char **buf, u64 *probe_offset, u64 *probe_addr); +int bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); #else static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx) { @@ -779,6 +781,11 @@ static inline int bpf_get_perf_event_info(const struct perf_event *event, { return -EOPNOTSUPP; } +static inline int +bpf_kprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *prog) +{ + return -EOPNOTSUPP; +} #endif enum { -- cgit From 7a19006b60b129ce2cbe787f4f910dc0ec5a1ec6 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Fri, 18 Mar 2022 08:34:52 +0100 Subject: kernfs: remove unneeded #if 0 guard Commit f2eb478f2f32 ("kernfs: move struct kernfs_root out of the public view.") moved kernfs_root out of kernfs.h, but my debugging code of a #if 0 was left in accidentally. Fix that up by removing the guards. Fixes: f2eb478f2f32 ("kernfs: move struct kernfs_root out of the public view.") Cc: Tejun Heo Reported-by: Al Viro Link: https://lore.kernel.org/r/20220318073452.1486568-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman --- include/linux/kernfs.h | 20 -------------------- 1 file changed, 20 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 62aff082dc3f..e2ae15a6225e 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -185,26 +185,6 @@ struct kernfs_syscall_ops { struct kernfs_root *root); }; -#if 0 -struct kernfs_root { - /* published fields */ - struct kernfs_node *kn; - unsigned int flags; /* KERNFS_ROOT_* flags */ - - /* private fields, do not use outside kernfs proper */ - struct idr ino_idr; - u32 last_id_lowbits; - u32 id_highbits; - struct kernfs_syscall_ops *syscall_ops; - - /* list of kernfs_super_info of this root, protected by kernfs_rwsem */ - struct list_head supers; - - wait_queue_head_t deactivate_waitq; - struct rw_semaphore kernfs_rwsem; -}; -#endif - struct kernfs_node *kernfs_root_to_node(struct kernfs_root *root); struct kernfs_open_file { -- cgit From e9b57aaae605eb869a628fdc21fe7d2c77a2205d Mon Sep 17 00:00:00 2001 From: Jeffle Xu Date: Wed, 9 Feb 2022 14:00:47 +0800 Subject: fscache: export fscache_end_operation() Export fscache_end_operation() to avoid code duplication. Besides, considering the paired fscache_begin_read_operation() is already exported, it shall make sense to also export fscache_end_operation(). Signed-off-by: Jeffle Xu Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/20220302125134.131039-2-jefflexu@linux.alibaba.com/ # Jeffle's v4 Link: https://lore.kernel.org/r/164622971432.3564931.12184135678781328146.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678190346.1200972.7453733431978569479.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692888334.2099075.5166283293894267365.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/20220316131723.111553-2-jefflexu@linux.alibaba.com/ # v5 --- include/linux/fscache.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fscache.h b/include/linux/fscache.h index 296c5f1d9f35..d2430da8aa67 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -456,6 +456,20 @@ int fscache_begin_read_operation(struct netfs_cache_resources *cres, return -ENOBUFS; } +/** + * fscache_end_operation - End the read operation for the netfs lib + * @cres: The cache resources for the read operation + * + * Clean up the resources at the end of the read request. + */ +static inline void fscache_end_operation(struct netfs_cache_resources *cres) +{ + const struct netfs_cache_ops *ops = fscache_operation_valid(cres); + + if (ops) + ops->end_operation(cres); +} + /** * fscache_read - Start a read from the cache. * @cres: The cache resources to use -- cgit From 6a19114b8e7f1e24d80b0812e26d78d7ae1ec6dd Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 17 Feb 2022 10:01:23 +0000 Subject: netfs: Rename netfs_read_*request to netfs_io_*request Rename netfs_read_*request to netfs_io_*request so that the same structures can be used for the write helpers too. perl -p -i -e 's/netfs_read_(request|subrequest)/netfs_io_$1/g' \ `git grep -l 'netfs_read_\(sub\|\)request'` perl -p -i -e 's/nr_rd_ops/nr_outstanding/g' \ `git grep -l nr_rd_ops` perl -p -i -e 's/nr_wr_ops/nr_copy_ops/g' \ `git grep -l nr_wr_ops` perl -p -i -e 's/netfs_read_source/netfs_io_source/g' \ `git grep -l 'netfs_read_source'` perl -p -i -e 's/netfs_io_request_ops/netfs_request_ops/g' \ `git grep -l 'netfs_io_request_ops'` perl -p -i -e 's/init_rreq/init_request/g' \ `git grep -l 'init_rreq'` Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164622988070.3564931.7089670190434315183.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678195157.1200972.366609966927368090.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692891535.2099075.18435198075367420588.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/netfs.h | 42 +++++++++++++++++++++--------------------- 1 file changed, 21 insertions(+), 21 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 614f22213e21..a2ca91cb7a68 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -106,7 +106,7 @@ static inline int wait_on_page_fscache_killable(struct page *page) return folio_wait_private_2_killable(page_folio(page)); } -enum netfs_read_source { +enum netfs_io_source { NETFS_FILL_WITH_ZEROES, NETFS_DOWNLOAD_FROM_SERVER, NETFS_READ_FROM_CACHE, @@ -130,8 +130,8 @@ struct netfs_cache_resources { /* * Descriptor for a single component subrequest. */ -struct netfs_read_subrequest { - struct netfs_read_request *rreq; /* Supervising read request */ +struct netfs_io_subrequest { + struct netfs_io_request *rreq; /* Supervising read request */ struct list_head rreq_link; /* Link in rreq->subrequests */ loff_t start; /* Where to start the I/O */ size_t len; /* Size of the I/O */ @@ -139,7 +139,7 @@ struct netfs_read_subrequest { refcount_t usage; short error; /* 0 or error that occurred */ unsigned short debug_index; /* Index in list (for debugging output) */ - enum netfs_read_source source; /* Where to read from */ + enum netfs_io_source source; /* Where to read from */ unsigned long flags; #define NETFS_SREQ_WRITE_TO_CACHE 0 /* Set if should write to cache */ #define NETFS_SREQ_CLEAR_TAIL 1 /* Set if the rest of the read should be cleared */ @@ -152,7 +152,7 @@ struct netfs_read_subrequest { * Descriptor for a read helper request. This is used to make multiple I/O * requests on a variety of sources and then stitch the result together. */ -struct netfs_read_request { +struct netfs_io_request { struct work_struct work; struct inode *inode; /* The file being accessed */ struct address_space *mapping; /* The mapping being accessed */ @@ -160,8 +160,8 @@ struct netfs_read_request { struct list_head subrequests; /* Requests to fetch I/O from disk or net */ void *netfs_priv; /* Private data for the netfs */ unsigned int debug_id; - atomic_t nr_rd_ops; /* Number of read ops in progress */ - atomic_t nr_wr_ops; /* Number of write ops in progress */ + atomic_t nr_outstanding; /* Number of read ops in progress */ + atomic_t nr_copy_ops; /* Number of write ops in progress */ size_t submitted; /* Amount submitted for I/O so far */ size_t len; /* Length of the request */ short error; /* 0 or error that occurred */ @@ -176,23 +176,23 @@ struct netfs_read_request { #define NETFS_RREQ_DONT_UNLOCK_FOLIOS 3 /* Don't unlock the folios on completion */ #define NETFS_RREQ_FAILED 4 /* The request failed */ #define NETFS_RREQ_IN_PROGRESS 5 /* Unlocked when the request completes */ - const struct netfs_read_request_ops *netfs_ops; + const struct netfs_request_ops *netfs_ops; }; /* * Operations the network filesystem can/must provide to the helpers. */ -struct netfs_read_request_ops { +struct netfs_request_ops { bool (*is_cache_enabled)(struct inode *inode); - void (*init_rreq)(struct netfs_read_request *rreq, struct file *file); - int (*begin_cache_operation)(struct netfs_read_request *rreq); - void (*expand_readahead)(struct netfs_read_request *rreq); - bool (*clamp_length)(struct netfs_read_subrequest *subreq); - void (*issue_op)(struct netfs_read_subrequest *subreq); - bool (*is_still_valid)(struct netfs_read_request *rreq); + void (*init_request)(struct netfs_io_request *rreq, struct file *file); + int (*begin_cache_operation)(struct netfs_io_request *rreq); + void (*expand_readahead)(struct netfs_io_request *rreq); + bool (*clamp_length)(struct netfs_io_subrequest *subreq); + void (*issue_op)(struct netfs_io_subrequest *subreq); + bool (*is_still_valid)(struct netfs_io_request *rreq); int (*check_write_begin)(struct file *file, loff_t pos, unsigned len, struct folio *folio, void **_fsdata); - void (*done)(struct netfs_read_request *rreq); + void (*done)(struct netfs_io_request *rreq); void (*cleanup)(struct address_space *mapping, void *netfs_priv); }; @@ -235,7 +235,7 @@ struct netfs_cache_ops { /* Prepare a read operation, shortening it to a cached/uncached * boundary as appropriate. */ - enum netfs_read_source (*prepare_read)(struct netfs_read_subrequest *subreq, + enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq, loff_t i_size); /* Prepare a write operation, working out what part of the write we can @@ -255,19 +255,19 @@ struct netfs_cache_ops { struct readahead_control; extern void netfs_readahead(struct readahead_control *, - const struct netfs_read_request_ops *, + const struct netfs_request_ops *, void *); extern int netfs_readpage(struct file *, struct folio *, - const struct netfs_read_request_ops *, + const struct netfs_request_ops *, void *); extern int netfs_write_begin(struct file *, struct address_space *, loff_t, unsigned int, unsigned int, struct folio **, void **, - const struct netfs_read_request_ops *, + const struct netfs_request_ops *, void *); -extern void netfs_subreq_terminated(struct netfs_read_subrequest *, ssize_t, bool); +extern void netfs_subreq_terminated(struct netfs_io_subrequest *, ssize_t, bool); extern void netfs_stats_show(struct seq_file *); #endif /* _LINUX_NETFS_H */ -- cgit From f18a378580a761c8559b7d90afaa157269559c05 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 17 Feb 2022 10:14:32 +0000 Subject: netfs: Finish off rename of netfs_read_request to netfs_io_request Adjust helper function names and comments after mass rename of struct netfs_read_*request to struct netfs_io_*request. Changes ======= ver #2) - Make the changes in the docs also. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164622992433.3564931.6684311087845150271.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678196111.1200972.5001114956865989528.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692892567.2099075.13895804222087028813.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/netfs.h | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index a2ca91cb7a68..f63de27d6f29 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -131,7 +131,7 @@ struct netfs_cache_resources { * Descriptor for a single component subrequest. */ struct netfs_io_subrequest { - struct netfs_io_request *rreq; /* Supervising read request */ + struct netfs_io_request *rreq; /* Supervising I/O request */ struct list_head rreq_link; /* Link in rreq->subrequests */ loff_t start; /* Where to start the I/O */ size_t len; /* Size of the I/O */ @@ -139,29 +139,29 @@ struct netfs_io_subrequest { refcount_t usage; short error; /* 0 or error that occurred */ unsigned short debug_index; /* Index in list (for debugging output) */ - enum netfs_io_source source; /* Where to read from */ + enum netfs_io_source source; /* Where to read from/write to */ unsigned long flags; -#define NETFS_SREQ_WRITE_TO_CACHE 0 /* Set if should write to cache */ +#define NETFS_SREQ_COPY_TO_CACHE 0 /* Set if should copy the data to the cache */ #define NETFS_SREQ_CLEAR_TAIL 1 /* Set if the rest of the read should be cleared */ -#define NETFS_SREQ_SHORT_READ 2 /* Set if there was a short read from the cache */ +#define NETFS_SREQ_SHORT_IO 2 /* Set if the I/O was short */ #define NETFS_SREQ_SEEK_DATA_READ 3 /* Set if ->read() should SEEK_DATA first */ #define NETFS_SREQ_NO_PROGRESS 4 /* Set if we didn't manage to read any data */ }; /* - * Descriptor for a read helper request. This is used to make multiple I/O - * requests on a variety of sources and then stitch the result together. + * Descriptor for an I/O helper request. This is used to make multiple I/O + * operations to a variety of data stores and then stitch the result together. */ struct netfs_io_request { struct work_struct work; struct inode *inode; /* The file being accessed */ struct address_space *mapping; /* The mapping being accessed */ struct netfs_cache_resources cache_resources; - struct list_head subrequests; /* Requests to fetch I/O from disk or net */ + struct list_head subrequests; /* Contributory I/O operations */ void *netfs_priv; /* Private data for the netfs */ unsigned int debug_id; - atomic_t nr_outstanding; /* Number of read ops in progress */ - atomic_t nr_copy_ops; /* Number of write ops in progress */ + atomic_t nr_outstanding; /* Number of ops in progress */ + atomic_t nr_copy_ops; /* Number of copy-to-cache ops in progress */ size_t submitted; /* Amount submitted for I/O so far */ size_t len; /* Length of the request */ short error; /* 0 or error that occurred */ @@ -171,7 +171,7 @@ struct netfs_io_request { refcount_t usage; unsigned long flags; #define NETFS_RREQ_INCOMPLETE_IO 0 /* Some ioreqs terminated short or with error */ -#define NETFS_RREQ_WRITE_TO_CACHE 1 /* Need to write to the cache */ +#define NETFS_RREQ_COPY_TO_CACHE 1 /* Need to write to the cache */ #define NETFS_RREQ_NO_UNLOCK_FOLIO 2 /* Don't unlock no_unlock_folio on completion */ #define NETFS_RREQ_DONT_UNLOCK_FOLIOS 3 /* Don't unlock the folios on completion */ #define NETFS_RREQ_FAILED 4 /* The request failed */ @@ -188,7 +188,7 @@ struct netfs_request_ops { int (*begin_cache_operation)(struct netfs_io_request *rreq); void (*expand_readahead)(struct netfs_io_request *rreq); bool (*clamp_length)(struct netfs_io_subrequest *subreq); - void (*issue_op)(struct netfs_io_subrequest *subreq); + void (*issue_read)(struct netfs_io_subrequest *subreq); bool (*is_still_valid)(struct netfs_io_request *rreq); int (*check_write_begin)(struct file *file, loff_t pos, unsigned len, struct folio *folio, void **_fsdata); -- cgit From de74023befa1876f64bc5871a2a4a51850517118 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 17 Feb 2022 21:13:05 +0000 Subject: netfs: Trace refcounting on the netfs_io_request struct Add refcount tracing for the netfs_io_request structure. Changes ======= ver #3) - Switch 'W=' to 'R=' in the traceline to match other request debug IDs. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164622997668.3564931.14456171619219324968.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678200943.1200972.7241495532327787765.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692900920.2099075.11847712419940675791.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/netfs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index f63de27d6f29..541aebe828f3 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -168,7 +168,7 @@ struct netfs_io_request { loff_t i_size; /* Size of the file */ loff_t start; /* Start position */ pgoff_t no_unlock_folio; /* Don't unlock this folio after read */ - refcount_t usage; + refcount_t ref; unsigned long flags; #define NETFS_RREQ_INCOMPLETE_IO 0 /* Some ioreqs terminated short or with error */ #define NETFS_RREQ_COPY_TO_CACHE 1 /* Need to write to the cache */ -- cgit From 6cd3d6fd1fe2feae5ff9f6a821081569bb140bf4 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 17 Feb 2022 15:01:24 +0000 Subject: netfs: Trace refcounting on the netfs_io_subrequest struct Add refcount tracing for the netfs_io_subrequest structure. Changes ======= ver #3) - Switch 'W=' to 'R=' in the traceline to match other request debug IDs. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164622998584.3564931.5052255990645723639.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678202603.1200972.14726007419792315578.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692901860.2099075.4845820886851239935.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/netfs.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 541aebe828f3..c702bd8ea8da 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -18,6 +18,8 @@ #include #include +enum netfs_sreq_ref_trace; + /* * Overload PG_private_2 to give us PG_fscache - this is used to indicate that * a page is currently backed by a local disk cache @@ -136,7 +138,7 @@ struct netfs_io_subrequest { loff_t start; /* Where to start the I/O */ size_t len; /* Size of the I/O */ size_t transferred; /* Amount of data transferred */ - refcount_t usage; + refcount_t ref; short error; /* 0 or error that occurred */ unsigned short debug_index; /* Index in list (for debugging output) */ enum netfs_io_source source; /* Where to read from/write to */ @@ -268,6 +270,10 @@ extern int netfs_write_begin(struct file *, struct address_space *, void *); extern void netfs_subreq_terminated(struct netfs_io_subrequest *, ssize_t, bool); +extern void netfs_get_subrequest(struct netfs_io_subrequest *subreq, + enum netfs_sreq_ref_trace what); +extern void netfs_put_subrequest(struct netfs_io_subrequest *subreq, + bool was_async, enum netfs_sreq_ref_trace what); extern void netfs_stats_show(struct seq_file *); #endif /* _LINUX_NETFS_H */ -- cgit From 663dfb65c3b3ea4b8e1944680352992d58f3aa22 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 26 Aug 2021 09:24:42 -0400 Subject: netfs: Refactor arguments for netfs_alloc_read_request Pass start and len to the rreq allocator. This should ensure that the fields are set so that ->init_request() can use them. Also add a parameter to indicates the origin of the request. Ceph can use this to tell whether to get caps. Changes ======= ver #3) - Change the author to me as Jeff feels that most of the patch is my changes now. ver #2) - Show the request origin in the netfs_rreq tracepoint. Signed-off-by: Jeff Layton Co-developed-by: David Howells Signed-off-by: David Howells cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164622989020.3564931.17517006047854958747.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678208569.1200972.12153682697842916557.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692904155.2099075.14717645623034355995.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/netfs.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index c702bd8ea8da..7dc741d9b21b 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -150,6 +150,12 @@ struct netfs_io_subrequest { #define NETFS_SREQ_NO_PROGRESS 4 /* Set if we didn't manage to read any data */ }; +enum netfs_io_origin { + NETFS_READAHEAD, /* This read was triggered by readahead */ + NETFS_READPAGE, /* This read is a synchronous read */ + NETFS_READ_FOR_WRITE, /* This read is to prepare a write */ +} __mode(byte); + /* * Descriptor for an I/O helper request. This is used to make multiple I/O * operations to a variety of data stores and then stitch the result together. @@ -167,6 +173,7 @@ struct netfs_io_request { size_t submitted; /* Amount submitted for I/O so far */ size_t len; /* Length of the request */ short error; /* 0 or error that occurred */ + enum netfs_io_origin origin; /* Origin of the request */ loff_t i_size; /* Size of the file */ loff_t start; /* Start position */ pgoff_t no_unlock_folio; /* Don't unlock this folio after read */ -- cgit From 2de160417315b8d64455fe03e9bb7d3308ac3281 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 20 Jan 2022 21:55:46 +0000 Subject: netfs: Change ->init_request() to return an error code Change the request initialisation function to return an error code so that the network filesystem can return a failure (ENOMEM, for example). This will also allow ceph to abort a ->readahead() op if the server refuses to give it a cap allowing local caching from within the netfslib framework (errors aren't passed back through ->readahead(), so returning, say, -ENOBUFS will cause the op to be aborted). Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164678212401.1200972.16537041523832944934.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692905398.2099075.5238033621684646524.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/netfs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 7dc741d9b21b..4b99e38f73d9 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -193,7 +193,7 @@ struct netfs_io_request { */ struct netfs_request_ops { bool (*is_cache_enabled)(struct inode *inode); - void (*init_request)(struct netfs_io_request *rreq, struct file *file); + int (*init_request)(struct netfs_io_request *rreq, struct file *file); int (*begin_cache_operation)(struct netfs_io_request *rreq); void (*expand_readahead)(struct netfs_io_request *rreq); bool (*clamp_length)(struct netfs_io_subrequest *subreq); -- cgit From bc899ee1c898e520574ff4d99356eb2e724a9265 Mon Sep 17 00:00:00 2001 From: David Howells Date: Tue, 29 Jun 2021 22:37:05 +0100 Subject: netfs: Add a netfs inode context Add a netfs_i_context struct that should be included in the network filesystem's own inode struct wrapper, directly after the VFS's inode struct, e.g.: struct my_inode { struct { /* These must be contiguous */ struct inode vfs_inode; struct netfs_i_context netfs_ctx; }; }; The netfs_i_context struct so far contains a single field for the network filesystem to use - the cache cookie: struct netfs_i_context { ... struct fscache_cookie *cache; }; Three functions are provided to help with this: (1) void netfs_i_context_init(struct inode *inode, const struct netfs_request_ops *ops); Initialise the netfs context and set the operations. (2) struct netfs_i_context *netfs_i_context(struct inode *inode); Find the netfs context from the VFS inode. (3) struct inode *netfs_inode(struct netfs_i_context *ctx); Find the VFS inode from the netfs context. Changes ======= ver #4) - Fix netfs_is_cache_enabled() to check cookie->cache_priv to see if a cache is present[3]. - Fix netfs_skip_folio_read() to zero out all of the page, not just some of it[3]. ver #3) - Split out the bit to move ceph cap-getting on readahead into ceph_init_request()[1]. - Stick in a comment to the netfs inode structs indicating the contiguity requirements[2]. ver #2) - Adjust documentation to match. - Use "#if IS_ENABLED()" in netfs_i_cookie(), not "#ifdef". - Move the cap check from ceph_readahead() to ceph_init_request() to be called from netfslib. - Remove ceph_readahead() and use netfs_readahead() directly instead. Signed-off-by: David Howells Acked-by: Jeff Layton cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/8af0d47f17d89c06bbf602496dd845f2b0bf25b3.camel@kernel.org/ [1] Link: https://lore.kernel.org/r/beaf4f6a6c2575ed489adb14b257253c868f9a5c.camel@kernel.org/ [2] Link: https://lore.kernel.org/r/3536452.1647421585@warthog.procyon.org.uk/ [3] Link: https://lore.kernel.org/r/164622984545.3564931.15691742939278418580.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678213320.1200972.16807551936267647470.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692909854.2099075.9535537286264248057.stgit@warthog.procyon.org.uk/ # v3 Link: https://lore.kernel.org/r/306388.1647595110@warthog.procyon.org.uk/ # v4 --- include/linux/netfs.h | 81 ++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 70 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 4b99e38f73d9..8458b30172a5 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -118,6 +118,16 @@ enum netfs_io_source { typedef void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error, bool was_async); +/* + * Per-inode description. This must be directly after the inode struct. + */ +struct netfs_i_context { + const struct netfs_request_ops *ops; +#if IS_ENABLED(CONFIG_FSCACHE) + struct fscache_cookie *cache; +#endif +}; + /* * Resources required to do operations on a cache. */ @@ -192,7 +202,6 @@ struct netfs_io_request { * Operations the network filesystem can/must provide to the helpers. */ struct netfs_request_ops { - bool (*is_cache_enabled)(struct inode *inode); int (*init_request)(struct netfs_io_request *rreq, struct file *file); int (*begin_cache_operation)(struct netfs_io_request *rreq); void (*expand_readahead)(struct netfs_io_request *rreq); @@ -263,18 +272,11 @@ struct netfs_cache_ops { }; struct readahead_control; -extern void netfs_readahead(struct readahead_control *, - const struct netfs_request_ops *, - void *); -extern int netfs_readpage(struct file *, - struct folio *, - const struct netfs_request_ops *, - void *); +extern void netfs_readahead(struct readahead_control *); +extern int netfs_readpage(struct file *, struct page *); extern int netfs_write_begin(struct file *, struct address_space *, loff_t, unsigned int, unsigned int, struct folio **, - void **, - const struct netfs_request_ops *, - void *); + void **); extern void netfs_subreq_terminated(struct netfs_io_subrequest *, ssize_t, bool); extern void netfs_get_subrequest(struct netfs_io_subrequest *subreq, @@ -283,4 +285,61 @@ extern void netfs_put_subrequest(struct netfs_io_subrequest *subreq, bool was_async, enum netfs_sreq_ref_trace what); extern void netfs_stats_show(struct seq_file *); +/** + * netfs_i_context - Get the netfs inode context from the inode + * @inode: The inode to query + * + * Get the netfs lib inode context from the network filesystem's inode. The + * context struct is expected to directly follow on from the VFS inode struct. + */ +static inline struct netfs_i_context *netfs_i_context(struct inode *inode) +{ + return (struct netfs_i_context *)(inode + 1); +} + +/** + * netfs_inode - Get the netfs inode from the inode context + * @ctx: The context to query + * + * Get the netfs inode from the netfs library's inode context. The VFS inode + * is expected to directly precede the context struct. + */ +static inline struct inode *netfs_inode(struct netfs_i_context *ctx) +{ + return ((struct inode *)ctx) - 1; +} + +/** + * netfs_i_context_init - Initialise a netfs lib context + * @inode: The inode with which the context is associated + * @ops: The netfs's operations list + * + * Initialise the netfs library context struct. This is expected to follow on + * directly from the VFS inode struct. + */ +static inline void netfs_i_context_init(struct inode *inode, + const struct netfs_request_ops *ops) +{ + struct netfs_i_context *ctx = netfs_i_context(inode); + + memset(ctx, 0, sizeof(*ctx)); + ctx->ops = ops; +} + +/** + * netfs_i_cookie - Get the cache cookie from the inode + * @inode: The inode to query + * + * Get the caching cookie (if enabled) from the network filesystem's inode. + */ +static inline struct fscache_cookie *netfs_i_cookie(struct inode *inode) +{ +#if IS_ENABLED(CONFIG_FSCACHE) + struct netfs_i_context *ctx = netfs_i_context(inode); + return ctx->cache; +#else + return NULL; +#endif +} + #endif /* _LINUX_NETFS_H */ -- cgit From 4058f742105ecfcbdf99e1139e6c1f74fb8e6db9 Mon Sep 17 00:00:00 2001 From: David Howells Date: Fri, 5 Nov 2021 13:55:38 +0000 Subject: netfs: Keep track of the actual remote file size Provide a place in which to keep track of the actual remote file size in the netfs context. This is needed because inode->i_size will be updated as we buffer writes in the pagecache, but the server file size won't get updated until we flush them back. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/164623013727.3564931.17659955636985232717.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/164678219305.1200972.6459431995188365134.stgit@warthog.procyon.org.uk/ # v2 Link: https://lore.kernel.org/r/164692921865.2099075.5310757978508056134.stgit@warthog.procyon.org.uk/ # v3 --- include/linux/netfs.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 8458b30172a5..c7bf1eaf51d5 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -126,6 +126,7 @@ struct netfs_i_context { #if IS_ENABLED(CONFIG_FSCACHE) struct fscache_cookie *cache; #endif + loff_t remote_i_size; /* Size of the remote file */ }; /* @@ -324,6 +325,21 @@ static inline void netfs_i_context_init(struct inode *inode, memset(ctx, 0, sizeof(*ctx)); ctx->ops = ops; + ctx->remote_i_size = i_size_read(inode); +} + +/** + * netfs_resize_file - Note that a file got resized + * @inode: The inode being resized + * @new_i_size: The new file size + * + * Inform the netfs lib that a file got resized so that it can adjust its state. + */ +static inline void netfs_resize_file(struct inode *inode, loff_t new_i_size) +{ + struct netfs_i_context *ctx = netfs_i_context(inode); + + ctx->remote_i_size = new_i_size; } /** -- cgit From f58c252e30cf74f68b0054293adc03b5923b9f0e Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Mon, 14 Mar 2022 11:14:32 +0200 Subject: serial: 8250: fix XOFF/XON sending when DMA is used MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When 8250 UART is using DMA, x_char (XON/XOFF) is never sent to the wire. After this change, x_char is injected correctly. Create uart_xchar_out() helper for sending the x_char out and accounting related to it. It seems that almost every driver does these same steps with x_char. Except for 8250, however, almost all currently lack .serial_out so they cannot immediately take advantage of this new helper. The downside of this patch is that it might reintroduce the problems some devices faced with mixed DMA/non-DMA transfer which caused revert f967fc8f165f (Revert "serial: 8250_dma: don't bother DMA with small transfers"). However, the impact should be limited to cases with XON/XOFF (that didn't work with DMA capable devices to begin with so this problem is not very likely to cause a major issue, if any at all). Fixes: 9ee4b83e51f74 ("serial: 8250: Add support for dmaengine") Reported-by: Gilles Buloz Tested-by: Gilles Buloz Reviewed-by: Andy Shevchenko Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220314091432.4288-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index 14ae35f68abb..d4828e69087a 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -458,6 +458,8 @@ extern void uart_handle_cts_change(struct uart_port *uport, extern void uart_insert_char(struct uart_port *port, unsigned int status, unsigned int overrun, unsigned int ch, unsigned int flag); +void uart_xchar_out(struct uart_port *uport, int offset); + #ifdef CONFIG_MAGIC_SYSRQ_SERIAL #define SYSRQ_TIMEOUT (HZ * 5) -- cgit From 336d4b814bf078fa698488632c19beca47308896 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Thu, 27 Jan 2022 12:15:32 -0600 Subject: ptrace: Move setting/clearing ptrace_message into ptrace_stop Today ptrace_message is easy to overlook as it not a core part of ptrace_stop. It has been overlooked so much that there are places that set ptrace_message and don't clear it, and places that never set it. So if you get an unlucky sequence of events the ptracer may be able to read a ptrace_message that does not apply to the current ptrace stop. Move setting of ptrace_message into ptrace_stop so that it always gets set before the stop, and always gets cleared after the stop. This prevents non-sense from being reported to userspace and makes ptrace_message more visible in the ptrace helper functions so that kernel developers can see it. Link: https://lkml.kernel.org/r/87bky67qfv.fsf_-_@email.froward.int.ebiederm.org Acked-by: Oleg Nesterov Reviewed-by: Kees Cook Signed-off-by: "Eric W. Biederman" --- include/linux/ptrace.h | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index 5310f43e4762..3e6b46e2b7be 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -60,7 +60,7 @@ extern int ptrace_writedata(struct task_struct *tsk, char __user *src, unsigned extern void ptrace_disable(struct task_struct *); extern int ptrace_request(struct task_struct *child, long request, unsigned long addr, unsigned long data); -extern void ptrace_notify(int exit_code); +extern void ptrace_notify(int exit_code, unsigned long message); extern void __ptrace_link(struct task_struct *child, struct task_struct *new_parent, const struct cred *ptracer_cred); @@ -155,8 +155,7 @@ static inline bool ptrace_event_enabled(struct task_struct *task, int event) static inline void ptrace_event(int event, unsigned long message) { if (unlikely(ptrace_event_enabled(current, event))) { - current->ptrace_message = message; - ptrace_notify((event << 8) | SIGTRAP); + ptrace_notify((event << 8) | SIGTRAP, message); } else if (event == PTRACE_EVENT_EXEC) { /* legacy EXEC report via SIGTRAP */ if ((current->ptrace & (PT_PTRACED|PT_SEIZED)) == PT_PTRACED) @@ -424,8 +423,7 @@ static inline int ptrace_report_syscall(unsigned long message) if (!(ptrace & PT_PTRACED)) return 0; - current->ptrace_message = message; - ptrace_notify(SIGTRAP | ((ptrace & PT_TRACESYSGOOD) ? 0x80 : 0)); + ptrace_notify(SIGTRAP | ((ptrace & PT_TRACESYSGOOD) ? 0x80 : 0), message); /* * this isn't the same as continuing with a signal, but it will do @@ -437,7 +435,6 @@ static inline int ptrace_report_syscall(unsigned long message) current->exit_code = 0; } - current->ptrace_message = 0; return fatal_signal_pending(current); } -- cgit From 6487d1dab837214ec2fd3f0ddd5f787e63be7c20 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Thu, 27 Jan 2022 12:19:13 -0600 Subject: ptrace: Return the signal to continue with from ptrace_stop The signal a task should continue with after a ptrace stop is inconsistently read, cleared, and sent. Solve this by reading and clearing the signal to be sent in ptrace_stop. In an ideal world everything except ptrace_signal would share a common implementation of continuing with the signal, so ptracers could count on the signal they ask to continue with actually being delivered. For now retain bug compatibility and just return with the signal number the ptracer requested the code continue with. Link: https://lkml.kernel.org/r/875yoe7qdp.fsf_-_@email.froward.int.ebiederm.org Reviewed-by: Kees Cook Signed-off-by: "Eric W. Biederman" --- include/linux/ptrace.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index 3e6b46e2b7be..15b3d176b6b4 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -60,7 +60,7 @@ extern int ptrace_writedata(struct task_struct *tsk, char __user *src, unsigned extern void ptrace_disable(struct task_struct *); extern int ptrace_request(struct task_struct *child, long request, unsigned long addr, unsigned long data); -extern void ptrace_notify(int exit_code, unsigned long message); +extern int ptrace_notify(int exit_code, unsigned long message); extern void __ptrace_link(struct task_struct *child, struct task_struct *new_parent, const struct cred *ptracer_cred); @@ -419,21 +419,21 @@ extern void sigaction_compat_abi(struct k_sigaction *act, struct k_sigaction *oa static inline int ptrace_report_syscall(unsigned long message) { int ptrace = current->ptrace; + int signr; if (!(ptrace & PT_PTRACED)) return 0; - ptrace_notify(SIGTRAP | ((ptrace & PT_TRACESYSGOOD) ? 0x80 : 0), message); + signr = ptrace_notify(SIGTRAP | ((ptrace & PT_TRACESYSGOOD) ? 0x80 : 0), + message); /* * this isn't the same as continuing with a signal, but it will do * for normal use. strace only continues with a signal if the * stopping signal is not SIGTRAP. -brl */ - if (current->exit_code) { - send_sig(current->exit_code, current, 1); - current->exit_code = 0; - } + if (signr) + send_sig(signr, current, 1); return fatal_signal_pending(current); } -- cgit From 86fc59ef818beb0e1945d17f8e734898baba7e4e Mon Sep 17 00:00:00 2001 From: Colin Foster Date: Sun, 13 Mar 2022 15:45:23 -0700 Subject: regmap: add configurable downshift for addresses Add an additional reg_downshift to be applied to register addresses before any register accesses. An example of a device that uses this is a VSC7514 chip, which require each register address to be downshifted by two if the access is performed over a SPI bus. Signed-off-by: Colin Foster Link: https://lore.kernel.org/r/20220313224524.399947-2-colin.foster@in-advantage.com Signed-off-by: Mark Brown --- include/linux/regmap.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/regmap.h b/include/linux/regmap.h index 22652e5fbc38..40fb9399add6 100644 --- a/include/linux/regmap.h +++ b/include/linux/regmap.h @@ -237,6 +237,8 @@ typedef void (*regmap_unlock)(void *); * @reg_stride: The register address stride. Valid register addresses are a * multiple of this value. If set to 0, a value of 1 will be * used. + * @reg_downshift: The number of bits to downshift the register before + * performing any operations. * @pad_bits: Number of bits of padding between register and value. * @val_bits: Number of bits in a register value, mandatory. * @@ -360,6 +362,7 @@ struct regmap_config { int reg_bits; int reg_stride; + int reg_downshift; int pad_bits; int val_bits; -- cgit From 0074f3f2b1e43d3cedd97e47fb6980db6d2ba79e Mon Sep 17 00:00:00 2001 From: Colin Foster Date: Sun, 13 Mar 2022 15:45:24 -0700 Subject: regmap: allow a defined reg_base to be added to every address There's an inconsistency that arises when a register set can be accessed internally via MMIO, or externally via SPI. The VSC7514 chip allows both modes of operation. When internally accessed, the system utilizes __iomem, devm_ioremap_resource, and devm_regmap_init_mmio. For SPI it isn't possible to utilize memory-mapped IO. To properly operate, the resource base must be added to the register before every operation. Signed-off-by: Colin Foster Link: https://lore.kernel.org/r/20220313224524.399947-3-colin.foster@in-advantage.com Signed-off-by: Mark Brown --- include/linux/regmap.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/regmap.h b/include/linux/regmap.h index 40fb9399add6..de81a94d7b30 100644 --- a/include/linux/regmap.h +++ b/include/linux/regmap.h @@ -239,6 +239,8 @@ typedef void (*regmap_unlock)(void *); * used. * @reg_downshift: The number of bits to downshift the register before * performing any operations. + * @reg_base: Value to be added to every register address before performing any + * operation. * @pad_bits: Number of bits of padding between register and value. * @val_bits: Number of bits in a register value, mandatory. * @@ -363,6 +365,7 @@ struct regmap_config { int reg_bits; int reg_stride; int reg_downshift; + unsigned int reg_base; int pad_bits; int val_bits; -- cgit From 046e1537a3cf0adc68fe865b5dc9a7e731cc63b3 Mon Sep 17 00:00:00 2001 From: Íñigo Huguet Date: Tue, 15 Mar 2022 10:18:32 +0100 Subject: net: set default rss queues num to physical cores / 2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Network drivers can call to netif_get_num_default_rss_queues to get the default number of receive queues to use. Right now, this default number is min(8, num_online_cpus()). Instead, as suggested by Jakub, use the number of physical cores divided by 2 as a way to avoid wasting CPU resources and to avoid using both CPU threads, but still allowing to scale for high-end processors with many cores. As an exception, select 2 queues for processors with 2 cores, because otherwise it won't take any advantage of RSS despite being SMP capable. Tested: Processor Intel Xeon E5-2620 (2 sockets, 6 cores/socket, 2 threads/core). NIC Broadcom NetXtreme II BCM57810 (10GBps). Ran some tests with `perf stat iperf3 -R`, with parallelisms of 1, 8 and 24, getting the following results: - Number of queues: 6 (instead of 8) - Network throughput: not affected - CPU usage: utilized 0.05-0.12 CPUs more than before (having 24 CPUs this is only 0.2-0.5% higher) - Reduced the number of context switches by 7-50%, being more noticeable when using a higher number of parallel threads. Suggested-by: Jakub Kicinski Signed-off-by: Íñigo Huguet Link: https://lore.kernel.org/r/20220315091832.13873-1-ihuguet@redhat.com Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 8cbe96ce0a2c..e01a8ce7181f 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3664,7 +3664,6 @@ static inline unsigned int get_netdev_rx_queue_index( } #endif -#define DEFAULT_MAX_NUM_RSS_QUEUES (8) int netif_get_num_default_rss_queues(void); enum skb_free_reason { -- cgit From 0c125f87a84097c182c481be7497af9f816e5db5 Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Sat, 26 Feb 2022 05:07:22 +0100 Subject: clk: fixed-factor: Introduce devm_clk_hw_register_fixed_factor_index() Add an API for a fixed factor clk that uses an index for the parent instead of a string name. This allows us to move drivers away from the string based method of describing parents and use the DT/firmware based method instead. Signed-off-by: Marek Vasut Cc: Michael Turquette Cc: Rob Herring Cc: Stephen Boyd Cc: devicetree@vger.kernel.org Link: https://lore.kernel.org/r/20220226040723.143705-2-marex@denx.de [sboyd@kernel.org: Expose a new API instead of internal function] Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index 2faa6f7aa8a8..b7a7923f6bbb 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -1003,6 +1003,9 @@ void clk_hw_unregister_fixed_factor(struct clk_hw *hw); struct clk_hw *devm_clk_hw_register_fixed_factor(struct device *dev, const char *name, const char *parent_name, unsigned long flags, unsigned int mult, unsigned int div); +struct clk_hw *devm_clk_hw_register_fixed_factor_index(struct device *dev, + const char *name, unsigned int index, unsigned long flags, + unsigned int mult, unsigned int div); /** * struct clk_fractional_divider - adjustable fractional divider clock * -- cgit From d714fb25e755ad96b699993fac47f48c4d6cebe9 Mon Sep 17 00:00:00 2001 From: Jae Hyun Yoo Date: Fri, 18 Mar 2022 13:41:33 -0700 Subject: i2c: add tracepoints for I2C slave events I2C slave events tracepoints can be enabled by: echo 1 > /sys/kernel/tracing/events/i2c_slave/enable and logs in /sys/kernel/tracing/trace will look like: ... i2c_slave: i2c-0 a=010 ret=0 WR_REQ [] ... i2c_slave: i2c-0 a=010 ret=0 WR_RCV [02] ... i2c_slave: i2c-0 a=010 ret=0 WR_RCV [0c] ... i2c_slave: i2c-0 a=010 ret=0 STOP [] ... i2c_slave: i2c-0 a=010 ret=0 RD_REQ [04] ... i2c_slave: i2c-0 a=010 ret=0 RD_PRO [b4] ... i2c_slave: i2c-0 a=010 ret=0 STOP [] formatted as: i2c- a= ret= <- callback return value [] trace printings can be selected by adding a filter like: echo adapter_nr==1 >/sys/kernel/tracing/events/i2c_slave/filter Signed-off-by: Jae Hyun Yoo Signed-off-by: Wolfram Sang --- include/linux/i2c.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/i2c.h b/include/linux/i2c.h index 7d4f52ceb7b5..fbda5ada2afc 100644 --- a/include/linux/i2c.h +++ b/include/linux/i2c.h @@ -392,12 +392,8 @@ enum i2c_slave_event { int i2c_slave_register(struct i2c_client *client, i2c_slave_cb_t slave_cb); int i2c_slave_unregister(struct i2c_client *client); bool i2c_detect_slave_mode(struct device *dev); - -static inline int i2c_slave_event(struct i2c_client *client, - enum i2c_slave_event event, u8 *val) -{ - return client->slave_cb(client, event, val); -} +int i2c_slave_event(struct i2c_client *client, + enum i2c_slave_event event, u8 *val); #else static inline bool i2c_detect_slave_mode(struct device *dev) { return false; } #endif -- cgit From b00fa38a9c1cba044a32a601b49a55a18ed719d1 Mon Sep 17 00:00:00 2001 From: Joanne Koong Date: Thu, 17 Mar 2022 21:55:52 -0700 Subject: bpf: Enable non-atomic allocations in local storage Currently, local storage memory can only be allocated atomically (GFP_ATOMIC). This restriction is too strict for sleepable bpf programs. In this patch, the verifier detects whether the program is sleepable, and passes the corresponding GFP_KERNEL or GFP_ATOMIC flag as a 5th argument to bpf_task/sk/inode_storage_get. This flag will propagate down to the local storage functions that allocate memory. Please note that bpf_task/sk/inode_storage_update_elem functions are invoked by userspace applications through syscalls. Preemption is disabled before bpf_task/sk/inode_storage_update_elem is called, which means they will always have to allocate memory atomically. Signed-off-by: Joanne Koong Signed-off-by: Alexei Starovoitov Acked-by: KP Singh Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/bpf/20220318045553.3091807-2-joannekoong@fb.com --- include/linux/bpf_local_storage.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h index 37b3906af8b1..493e63258497 100644 --- a/include/linux/bpf_local_storage.h +++ b/include/linux/bpf_local_storage.h @@ -154,16 +154,17 @@ void bpf_selem_unlink_map(struct bpf_local_storage_elem *selem); struct bpf_local_storage_elem * bpf_selem_alloc(struct bpf_local_storage_map *smap, void *owner, void *value, - bool charge_mem); + bool charge_mem, gfp_t gfp_flags); int bpf_local_storage_alloc(void *owner, struct bpf_local_storage_map *smap, - struct bpf_local_storage_elem *first_selem); + struct bpf_local_storage_elem *first_selem, + gfp_t gfp_flags); struct bpf_local_storage_data * bpf_local_storage_update(void *owner, struct bpf_local_storage_map *smap, - void *value, u64 map_flags); + void *value, u64 map_flags, gfp_t gfp_flags); void bpf_local_storage_free_rcu(struct rcu_head *rcu); -- cgit From 3387ce4d8a5f2956fab827edf499fe6780e83faa Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Mon, 21 Mar 2022 11:05:50 +0100 Subject: headers/prep: Fix header to build standalone: Add the dependency to , because cgroup_move_task() will dereference 'struct css_set'. ( Only older toolchains are affected, due to variations in the implementation of rcu_assign_pointer() et al. ) Cc: Peter Zijlstra Cc: Linus Torvalds Reported-by: Sachin Sant Reported-by: Andrew Morton Reported-by: Borislav Petkov Signed-off-by: Ingo Molnar --- include/linux/psi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/psi.h b/include/linux/psi.h index 7f7d1d88c3bb..89784763d19e 100644 --- a/include/linux/psi.h +++ b/include/linux/psi.h @@ -6,6 +6,7 @@ #include #include #include +#include struct seq_file; struct css_set; -- cgit From a43bf604446414103b7535f38e739b65601c4fb2 Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia Date: Wed, 16 Mar 2022 18:24:26 -0400 Subject: NFSv4.1 provide mount option to toggle trunking discovery Introduce a new mount option -- trunkdiscovery,notrunkdiscovery -- to toggle whether or not the client will engage in actively discovery of trunking locations. v2 make notrunkdiscovery default Signed-off-by: Olga Kornievskaia Fixes: 1976b2b31462 ("NFSv4.1 query for fs_location attr on a new file system") Signed-off-by: Trond Myklebust --- include/linux/nfs_fs_sb.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h index ca0959e51e81..b0e3fd550122 100644 --- a/include/linux/nfs_fs_sb.h +++ b/include/linux/nfs_fs_sb.h @@ -151,6 +151,7 @@ struct nfs_server { #define NFS_MOUNT_SOFTREVAL 0x800000 #define NFS_MOUNT_WRITE_EAGER 0x01000000 #define NFS_MOUNT_WRITE_WAIT 0x02000000 +#define NFS_MOUNT_TRUNK_DISCOVERY 0x04000000 unsigned int fattr_valid; /* Valid attributes */ unsigned int caps; /* server capabilities */ -- cgit From 028152260c5782504995e1fe3910ad39d4e4f34a Mon Sep 17 00:00:00 2001 From: Rob Herring Date: Mon, 21 Mar 2022 11:35:29 -0500 Subject: Revert "of: base: Introduce of_alias_get_alias_list() to check alias IDs" This reverts commit b1078c355d76769b5ddefc67d143fbd9b6e52c05. The single user of of_alias_get_alias_list(), drivers/tty/serial/xilinx_uartps.c, has since been refactored and no longer needs this function. It also contained a Smatch checker warning: drivers/of/base.c:2038 of_alias_get_alias_list() warn: passing negative bit value 's32min-(-2),0-s32max' to 'set_bit()' Reported-by: Dan Carpenter Signed-off-by: Rob Herring --- include/linux/of.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/of.h b/include/linux/of.h index 2dc77430a91a..04971e85fbc9 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -388,9 +388,6 @@ extern int of_phandle_iterator_args(struct of_phandle_iterator *it, extern void of_alias_scan(void * (*dt_alloc)(u64 size, u64 align)); extern int of_alias_get_id(struct device_node *np, const char *stem); extern int of_alias_get_highest_id(const char *stem); -extern int of_alias_get_alias_list(const struct of_device_id *matches, - const char *stem, unsigned long *bitmap, - unsigned int nbits); extern int of_machine_is_compatible(const char *compat); @@ -766,13 +763,6 @@ static inline int of_alias_get_highest_id(const char *stem) return -ENOSYS; } -static inline int of_alias_get_alias_list(const struct of_device_id *matches, - const char *stem, unsigned long *bitmap, - unsigned int nbits) -{ - return -ENOSYS; -} - static inline int of_machine_is_compatible(const char *compat) { return 0; -- cgit From 4c65422901154766e5cee17875ed680366a4a141 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 7 Jan 2022 13:45:25 -0500 Subject: mm/gup: Remove an assumption of a contiguous memmap This assumption needs the inverse of nth_page(), which is temporarily named page_nth() until it's renamed later in this series. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: John Hubbard Reviewed-by: Jason Gunthorpe Reviewed-by: William Kucharski --- include/linux/mm.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 0201d258c646..e3f8755f65ed 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -212,8 +212,10 @@ int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *, #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP) #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n)) +#define page_nth(head, tail) (page_to_pfn(tail) - page_to_pfn(head)) #else #define nth_page(page,n) ((page) + (n)) +#define page_nth(head, tail) ((tail) - (head)) #endif /* to align the pointer to the (next) page boundary */ -- cgit From 5232c63f46fdd779303527ec36c518cc1e9c6b4e Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 6 Jan 2022 16:46:43 -0500 Subject: mm: Make compound_pincount always available Move compound_pincount from the third page to the second page, which means it's available for all compound pages. That lets us delete hpage_pincount_available(). On 32-bit systems, there isn't enough space for both compound_pincount and compound_nr in the second page (it would collide with page->private, which is in use for pages in the swap cache), so revert the optimisation of storing both compound_order and compound_nr on 32-bit systems. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: John Hubbard Reviewed-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: William Kucharski --- include/linux/mm.h | 21 ++++++++------------- include/linux/mm_types.h | 7 +++++-- 2 files changed, 13 insertions(+), 15 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index e3f8755f65ed..c64bd0b67d75 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -887,17 +887,6 @@ static inline void destroy_compound_page(struct page *page) compound_page_dtors[page[1].compound_dtor](page); } -static inline bool hpage_pincount_available(struct page *page) -{ - /* - * Can the page->hpage_pinned_refcount field be used? That field is in - * the 3rd page of the compound page, so the smallest (2-page) compound - * pages cannot support it. - */ - page = compound_head(page); - return PageCompound(page) && compound_order(page) > 1; -} - static inline int head_compound_pincount(struct page *head) { return atomic_read(compound_pincount_ptr(head)); @@ -905,7 +894,7 @@ static inline int head_compound_pincount(struct page *head) static inline int compound_pincount(struct page *page) { - VM_BUG_ON_PAGE(!hpage_pincount_available(page), page); + VM_BUG_ON_PAGE(!PageCompound(page), page); page = compound_head(page); return head_compound_pincount(page); } @@ -913,7 +902,9 @@ static inline int compound_pincount(struct page *page) static inline void set_compound_order(struct page *page, unsigned int order) { page[1].compound_order = order; +#ifdef CONFIG_64BIT page[1].compound_nr = 1U << order; +#endif } /* Returns the number of pages in this potentially compound page. */ @@ -921,7 +912,11 @@ static inline unsigned long compound_nr(struct page *page) { if (!PageHead(page)) return 1; +#ifdef CONFIG_64BIT return page[1].compound_nr; +#else + return 1UL << compound_order(page); +#endif } /* Returns the number of bytes in this potentially compound page. */ @@ -1269,7 +1264,7 @@ void unpin_user_pages(struct page **pages, unsigned long npages); */ static inline bool page_maybe_dma_pinned(struct page *page) { - if (hpage_pincount_available(page)) + if (PageCompound(page)) return compound_pincount(page) > 0; /* diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 475bdb282769..0e274c9b934e 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -135,11 +135,14 @@ struct page { unsigned char compound_dtor; unsigned char compound_order; atomic_t compound_mapcount; + atomic_t compound_pincount; +#ifdef CONFIG_64BIT unsigned int compound_nr; /* 1 << compound_order */ +#endif }; struct { /* Second tail page of compound page */ unsigned long _compound_pad_1; /* compound_head */ - atomic_t hpage_pinned_refcount; + unsigned long _compound_pad_2; /* For both global and memcg */ struct list_head deferred_list; }; @@ -300,7 +303,7 @@ static inline atomic_t *compound_mapcount_ptr(struct page *page) static inline atomic_t *compound_pincount_ptr(struct page *page) { - return &page[2].hpage_pinned_refcount; + return &page[1].compound_pincount; } /* -- cgit From 3d11b225aeb184bc3dc9b4b27b302815a7c531aa Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 27 Dec 2021 18:28:58 -0500 Subject: mm: Add folio_pincount_ptr() This is the folio equivalent of compound_pincount_ptr(). Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: John Hubbard Reviewed-by: Jason Gunthorpe Reviewed-by: William Kucharski --- include/linux/mm.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index c64bd0b67d75..c45739dfdd04 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1544,6 +1544,11 @@ static inline unsigned long folio_pfn(struct folio *folio) return page_to_pfn(&folio->page); } +static inline atomic_t *folio_pincount_ptr(struct folio *folio) +{ + return &folio_page(folio, 1)->compound_pincount; +} + /* MIGRATE_CMA and ZONE_MOVABLE do not allow pin pages */ #ifdef CONFIG_MIGRATION static inline bool is_pinnable_page(struct page *page) -- cgit From 0b90ddae13441c43a30d2e2689b8193a81891c92 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 27 Dec 2021 18:40:41 -0500 Subject: mm: Turn page_maybe_dma_pinned() into folio_maybe_dma_pinned() Replace three calls to compound_head() with one. This removes the last user of compound_pincount(), so remove that helper too. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: John Hubbard Reviewed-by: Jason Gunthorpe Reviewed-by: William Kucharski --- include/linux/mm.h | 129 ++++++++++++++++++++++++++--------------------------- 1 file changed, 63 insertions(+), 66 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index c45739dfdd04..35e453ac5c0f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -892,13 +892,6 @@ static inline int head_compound_pincount(struct page *head) return atomic_read(compound_pincount_ptr(head)); } -static inline int compound_pincount(struct page *page) -{ - VM_BUG_ON_PAGE(!PageCompound(page), page); - page = compound_head(page); - return head_compound_pincount(page); -} - static inline void set_compound_order(struct page *page, unsigned int order) { page[1].compound_order = order; @@ -1236,70 +1229,11 @@ void unpin_user_page_range_dirty_lock(struct page *page, unsigned long npages, bool make_dirty); void unpin_user_pages(struct page **pages, unsigned long npages); -/** - * page_maybe_dma_pinned - Report if a page is pinned for DMA. - * @page: The page. - * - * This function checks if a page has been pinned via a call to - * a function in the pin_user_pages() family. - * - * For non-huge pages, the return value is partially fuzzy: false is not fuzzy, - * because it means "definitely not pinned for DMA", but true means "probably - * pinned for DMA, but possibly a false positive due to having at least - * GUP_PIN_COUNTING_BIAS worth of normal page references". - * - * False positives are OK, because: a) it's unlikely for a page to get that many - * refcounts, and b) all the callers of this routine are expected to be able to - * deal gracefully with a false positive. - * - * For huge pages, the result will be exactly correct. That's because we have - * more tracking data available: the 3rd struct page in the compound page is - * used to track the pincount (instead using of the GUP_PIN_COUNTING_BIAS - * scheme). - * - * For more information, please see Documentation/core-api/pin_user_pages.rst. - * - * Return: True, if it is likely that the page has been "dma-pinned". - * False, if the page is definitely not dma-pinned. - */ -static inline bool page_maybe_dma_pinned(struct page *page) -{ - if (PageCompound(page)) - return compound_pincount(page) > 0; - - /* - * page_ref_count() is signed. If that refcount overflows, then - * page_ref_count() returns a negative value, and callers will avoid - * further incrementing the refcount. - * - * Here, for that overflow case, use the signed bit to count a little - * bit higher via unsigned math, and thus still get an accurate result. - */ - return ((unsigned int)page_ref_count(compound_head(page))) >= - GUP_PIN_COUNTING_BIAS; -} - static inline bool is_cow_mapping(vm_flags_t flags) { return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE; } -/* - * This should most likely only be called during fork() to see whether we - * should break the cow immediately for a page on the src mm. - */ -static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma, - struct page *page) -{ - if (!is_cow_mapping(vma->vm_flags)) - return false; - - if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags)) - return false; - - return page_maybe_dma_pinned(page); -} - #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP) #define SECTION_IN_PAGE_FLAGS #endif @@ -1549,6 +1483,69 @@ static inline atomic_t *folio_pincount_ptr(struct folio *folio) return &folio_page(folio, 1)->compound_pincount; } +/** + * folio_maybe_dma_pinned - Report if a folio may be pinned for DMA. + * @folio: The folio. + * + * This function checks if a folio has been pinned via a call to + * a function in the pin_user_pages() family. + * + * For small folios, the return value is partially fuzzy: false is not fuzzy, + * because it means "definitely not pinned for DMA", but true means "probably + * pinned for DMA, but possibly a false positive due to having at least + * GUP_PIN_COUNTING_BIAS worth of normal folio references". + * + * False positives are OK, because: a) it's unlikely for a folio to + * get that many refcounts, and b) all the callers of this routine are + * expected to be able to deal gracefully with a false positive. + * + * For large folios, the result will be exactly correct. That's because + * we have more tracking data available: the compound_pincount is used + * instead of the GUP_PIN_COUNTING_BIAS scheme. + * + * For more information, please see Documentation/core-api/pin_user_pages.rst. + * + * Return: True, if it is likely that the page has been "dma-pinned". + * False, if the page is definitely not dma-pinned. + */ +static inline bool folio_maybe_dma_pinned(struct folio *folio) +{ + if (folio_test_large(folio)) + return atomic_read(folio_pincount_ptr(folio)) > 0; + + /* + * folio_ref_count() is signed. If that refcount overflows, then + * folio_ref_count() returns a negative value, and callers will avoid + * further incrementing the refcount. + * + * Here, for that overflow case, use the sign bit to count a little + * bit higher via unsigned math, and thus still get an accurate result. + */ + return ((unsigned int)folio_ref_count(folio)) >= + GUP_PIN_COUNTING_BIAS; +} + +static inline bool page_maybe_dma_pinned(struct page *page) +{ + return folio_maybe_dma_pinned(page_folio(page)); +} + +/* + * This should most likely only be called during fork() to see whether we + * should break the cow immediately for a page on the src mm. + */ +static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma, + struct page *page) +{ + if (!is_cow_mapping(vma->vm_flags)) + return false; + + if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags)) + return false; + + return page_maybe_dma_pinned(page); +} + /* MIGRATE_CMA and ZONE_MOVABLE do not allow pin pages */ #ifdef CONFIG_MIGRATION static inline bool is_pinnable_page(struct page *page) -- cgit From 40fcc7fc2c3838f3afe07a3a72709b45566e6cdb Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 29 Dec 2021 12:23:55 -0500 Subject: mm: Remove page_cache_add_speculative() and page_cache_get_speculative() These wrappers have no more callers, so delete them. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: John Hubbard Reviewed-by: Jason Gunthorpe Reviewed-by: William Kucharski --- include/linux/mm.h | 7 +++---- include/linux/pagemap.h | 10 ---------- 2 files changed, 3 insertions(+), 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 35e453ac5c0f..b764057022c8 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1215,10 +1215,9 @@ static inline void put_page(struct page *page) * applications that don't have huge page reference counts, this won't be an * issue. * - * Locking: the lockless algorithm described in page_cache_get_speculative() - * and page_cache_gup_pin_speculative() provides safe operation for - * get_user_pages and page_mkclean and other calls that race to set up page - * table entries. + * Locking: the lockless algorithm described in folio_try_get_rcu() + * provides safe operation for get_user_pages(), page_mkclean() and + * other calls that race to set up page table entries. */ #define GUP_PIN_COUNTING_BIAS (1U << 10) diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 270bf5136c34..cdb3f118603a 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -283,16 +283,6 @@ static inline struct inode *folio_inode(struct folio *folio) return folio->mapping->host; } -static inline bool page_cache_add_speculative(struct page *page, int count) -{ - return folio_ref_try_add_rcu((struct folio *)page, count); -} - -static inline bool page_cache_get_speculative(struct page *page) -{ - return page_cache_add_speculative(page, 1); -} - /** * folio_attach_private - Attach private data to a folio. * @folio: Folio to attach data to. -- cgit From 822951d84684d7a0c4f45e7231c960e7fe786d8f Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 8 Jan 2022 00:15:04 -0500 Subject: mm/hugetlb: Use try_grab_folio() instead of try_grab_compound_head() follow_hugetlb_page() only cares about success or failure, so it doesn't need to know the type of the returned pointer, only whether it's NULL or not. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: John Hubbard Reviewed-by: Jason Gunthorpe Reviewed-by: William Kucharski --- include/linux/mm.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index b764057022c8..dca5c99395c9 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1124,9 +1124,6 @@ static inline void get_page(struct page *page) } bool __must_check try_grab_page(struct page *page, unsigned int flags); -struct page *try_grab_compound_head(struct page *page, int refs, - unsigned int flags); - static inline __must_check bool try_get_page(struct page *page) { -- cgit From 659508f9c936aa6e3aaf6e9cf6a4a8836b8f8355 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 23 Dec 2021 10:20:12 -0500 Subject: mm/gup: Turn compound_range_next() into gup_folio_range_next() Convert the only caller to work on folios instead of pages. This removes the last caller of put_compound_head(), so delete it. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: John Hubbard Reviewed-by: Jason Gunthorpe Reviewed-by: William Kucharski --- include/linux/mm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index dca5c99395c9..0d3f9057a807 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -212,10 +212,10 @@ int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *, #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP) #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n)) -#define page_nth(head, tail) (page_to_pfn(tail) - page_to_pfn(head)) +#define folio_page_idx(folio, p) (page_to_pfn(p) - folio_pfn(folio)) #else #define nth_page(page,n) ((page) + (n)) -#define page_nth(head, tail) ((tail) - (head)) +#define folio_page_idx(folio, p) ((p) - &(folio)->page) #endif /* to align the pointer to the (next) page boundary */ -- cgit From 536939ff516382b391a0039262e27fc80c7b3924 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 21 Mar 2022 12:57:38 -0400 Subject: mm: Add three folio wrappers folio_is_zone_device() is equivalent to is_zone_device_page(), folio_is_device_private() is equivalent to is_device_private_page(), and folio_is_pinnable() is equivalent to is_pinnable_page(). All of these tests return the same result for every page in the folio, so we can just pass the head page of the folio to the page variant of the function. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/memremap.h | 5 +++++ include/linux/mm.h | 10 ++++++++++ 2 files changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/memremap.h b/include/linux/memremap.h index d6a114dd5ea8..8af304f6b504 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -138,6 +138,11 @@ static inline bool is_device_private_page(const struct page *page) page->pgmap->type == MEMORY_DEVICE_PRIVATE; } +static inline bool folio_is_device_private(const struct folio *folio) +{ + return is_device_private_page(&folio->page); +} + static inline bool is_pci_p2pdma_page(const struct page *page) { return IS_ENABLED(CONFIG_PCI_P2PDMA) && diff --git a/include/linux/mm.h b/include/linux/mm.h index 0d3f9057a807..2ca10c167f35 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1075,6 +1075,11 @@ static inline bool is_zone_device_page(const struct page *page) } #endif +static inline bool folio_is_zone_device(const struct folio *folio) +{ + return is_zone_device_page(&folio->page); +} + static inline bool is_zone_movable_page(const struct page *page) { return page_zonenum(page) == ZONE_MOVABLE; @@ -1556,6 +1561,11 @@ static inline bool is_pinnable_page(struct page *page) } #endif +static inline bool folio_is_pinnable(struct folio *folio) +{ + return is_pinnable_page(&folio->page); +} + static inline void set_page_zone(struct page *page, enum zone_type zone) { page->flags &= ~(ZONES_MASK << ZONES_PGSHIFT); -- cgit From 8927f6473e56e32e328ae8ed43736412f7f76a4e Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 23 Dec 2021 16:39:05 -0500 Subject: mm/workingset: Convert workingset_eviction() to take a folio This removes an assumption that THPs are the only kind of compound pages and removes a few hidden calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/swap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 1d38d9475c4d..de36f140227e 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -328,7 +328,7 @@ static inline swp_entry_t folio_swap_entry(struct folio *folio) /* linux/mm/workingset.c */ void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages); -void *workingset_eviction(struct page *page, struct mem_cgroup *target_memcg); +void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg); void workingset_refault(struct folio *folio, void *shadow); void workingset_activation(struct folio *folio); -- cgit From 3ecb0087ecee6213544a1e0b838826a0f4831ce5 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 27 Dec 2021 21:11:34 -0500 Subject: mm/memcg: Convert mem_cgroup_swapout() to take a folio This removes an assumption that THPs are the only kind of compound pages and removes a couple of hidden calls to compound_head. It also documents that you can't pass a tail page to mem_cgroup_swapout(). Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/swap.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index de36f140227e..e7cb7a1e6ceb 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -741,7 +741,7 @@ static inline void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask) #endif #ifdef CONFIG_MEMCG_SWAP -extern void mem_cgroup_swapout(struct page *page, swp_entry_t entry); +void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry); extern int __mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry); static inline int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry) { @@ -761,7 +761,7 @@ static inline void mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_p extern long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg); extern bool mem_cgroup_swap_full(struct page *page); #else -static inline void mem_cgroup_swapout(struct page *page, swp_entry_t entry) +static inline void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry) { } -- cgit From 06d20bdb986815a75fb1addf34655756ba922e3a Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 17 Jan 2022 14:40:12 -0500 Subject: mm: Add lru_to_folio() Since page->lru occupies the same bytes as compound_head, any page on the LRU list must be a folio. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/mm.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 2ca10c167f35..a583b7375445 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -225,6 +225,10 @@ int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *, #define PAGE_ALIGNED(addr) IS_ALIGNED((unsigned long)(addr), PAGE_SIZE) #define lru_to_page(head) (list_entry((head)->prev, struct page, lru)) +static inline struct folio *lru_to_folio(struct list_head *head) +{ + return list_entry((head)->prev, struct folio, lru); +} void setup_initial_init_mm(void *start_code, void *end_code, void *end_data, void *brk); -- cgit From 5100da38ef3c33d9ad8b60b29c2b671249bf7e1d Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 12 Feb 2022 22:48:55 -0500 Subject: mm: Convert remove_mapping() to take a folio Add kernel-doc and return the number of pages removed in order to get the statistics right in __invalidate_mapping_pages(). Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Miaohe Lin --- include/linux/swap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index e7cb7a1e6ceb..304f174b4d31 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -395,7 +395,7 @@ extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem, unsigned long *nr_scanned); extern unsigned long shrink_all_memory(unsigned long nr_pages); extern int vm_swappiness; -extern int remove_mapping(struct address_space *mapping, struct page *page); +long remove_mapping(struct address_space *mapping, struct folio *folio); extern unsigned long reclaim_pages(struct list_head *page_list); #ifdef CONFIG_NUMA -- cgit From d6c75dc22c755c567838f12f12a16f2a323ebd4e Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sun, 13 Feb 2022 15:22:28 -0500 Subject: mm/truncate: Split invalidate_inode_page() into mapping_evict_folio() Some of the callers already have the address_space and can avoid calling folio_mapping() and checking if the folio was already truncated. Also add kernel-doc and fix the return type (in case we ever support folios larger than 4TB). Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Miaohe Lin --- include/linux/mm.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index a583b7375445..dede2eda4d7f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1825,7 +1825,6 @@ extern void truncate_setsize(struct inode *inode, loff_t newsize); void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to); void truncate_pagecache_range(struct inode *inode, loff_t offset, loff_t end); int generic_error_remove_page(struct address_space *mapping, struct page *page); -int invalidate_inode_page(struct page *page); #ifdef CONFIG_MMU extern vm_fault_t handle_mm_fault(struct vm_area_struct *vma, -- cgit From 261b6840ed10419ac2f554e515592d59dd5c82cf Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sun, 13 Feb 2022 16:40:24 -0500 Subject: mm: Turn deactivate_file_page() into deactivate_file_folio() This function has one caller which already has a reference to the page, so we don't need to use get_page_unless_zero(). Also move the prototype to mm/internal.h. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Miaohe Lin --- include/linux/swap.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 304f174b4d31..064e60e9f63f 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -372,7 +372,6 @@ extern void lru_add_drain(void); extern void lru_add_drain_cpu(int cpu); extern void lru_add_drain_cpu_zone(struct zone *zone); extern void lru_add_drain_all(void); -extern void deactivate_file_page(struct page *page); extern void deactivate_page(struct page *page); extern void mark_page_lazyfree(struct page *page); extern void swap_setup(void); -- cgit From c56109dd35c9204cd6c49d2116ef36e5044ef867 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sun, 13 Feb 2022 17:22:10 -0500 Subject: mm/truncate: Combine invalidate_mapping_pagevec() and __invalidate_mapping_pages() We can save a function call by combining these two functions, which are identical except for the return value. Also move the prototype to mm/internal.h. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Miaohe Lin --- include/linux/fs.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index e2d892b201b0..85c584c5c623 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2749,10 +2749,6 @@ extern bool is_bad_inode(struct inode *); unsigned long invalidate_mapping_pages(struct address_space *mapping, pgoff_t start, pgoff_t end); -void invalidate_mapping_pagevec(struct address_space *mapping, - pgoff_t start, pgoff_t end, - unsigned long *nr_pagevec); - static inline void invalidate_remote_inode(struct inode *inode) { if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || -- cgit From cbcc268bb1ce5b539e7652d398e08e9b83dc4cef Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sun, 13 Feb 2022 17:23:58 -0500 Subject: fs: Move many prototypes to pagemap.h These functions are page cache functionality and don't need to be declared in fs.h. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Miaohe Lin --- include/linux/fs.h | 116 ------------------------------------------------ include/linux/pagemap.h | 114 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 114 insertions(+), 116 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 85c584c5c623..0961c979e949 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2746,50 +2746,6 @@ extern void init_special_inode(struct inode *, umode_t, dev_t); extern void make_bad_inode(struct inode *); extern bool is_bad_inode(struct inode *); -unsigned long invalidate_mapping_pages(struct address_space *mapping, - pgoff_t start, pgoff_t end); - -static inline void invalidate_remote_inode(struct inode *inode) -{ - if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || - S_ISLNK(inode->i_mode)) - invalidate_mapping_pages(inode->i_mapping, 0, -1); -} -extern int invalidate_inode_pages2(struct address_space *mapping); -extern int invalidate_inode_pages2_range(struct address_space *mapping, - pgoff_t start, pgoff_t end); -extern int write_inode_now(struct inode *, int); -extern int filemap_fdatawrite(struct address_space *); -extern int filemap_flush(struct address_space *); -extern int filemap_fdatawait_keep_errors(struct address_space *mapping); -extern int filemap_fdatawait_range(struct address_space *, loff_t lstart, - loff_t lend); -extern int filemap_fdatawait_range_keep_errors(struct address_space *mapping, - loff_t start_byte, loff_t end_byte); - -static inline int filemap_fdatawait(struct address_space *mapping) -{ - return filemap_fdatawait_range(mapping, 0, LLONG_MAX); -} - -extern bool filemap_range_has_page(struct address_space *, loff_t lstart, - loff_t lend); -extern int filemap_write_and_wait_range(struct address_space *mapping, - loff_t lstart, loff_t lend); -extern int __filemap_fdatawrite_range(struct address_space *mapping, - loff_t start, loff_t end, int sync_mode); -extern int filemap_fdatawrite_range(struct address_space *mapping, - loff_t start, loff_t end); -extern int filemap_check_errors(struct address_space *mapping); -extern void __filemap_set_wb_err(struct address_space *mapping, int err); -int filemap_fdatawrite_wbc(struct address_space *mapping, - struct writeback_control *wbc); - -static inline int filemap_write_and_wait(struct address_space *mapping) -{ - return filemap_write_and_wait_range(mapping, 0, LLONG_MAX); -} - extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart, loff_t lend); extern int __must_check file_check_and_advance_wb_err(struct file *file); @@ -2801,67 +2757,6 @@ static inline int file_write_and_wait(struct file *file) return file_write_and_wait_range(file, 0, LLONG_MAX); } -/** - * filemap_set_wb_err - set a writeback error on an address_space - * @mapping: mapping in which to set writeback error - * @err: error to be set in mapping - * - * When writeback fails in some way, we must record that error so that - * userspace can be informed when fsync and the like are called. We endeavor - * to report errors on any file that was open at the time of the error. Some - * internal callers also need to know when writeback errors have occurred. - * - * When a writeback error occurs, most filesystems will want to call - * filemap_set_wb_err to record the error in the mapping so that it will be - * automatically reported whenever fsync is called on the file. - */ -static inline void filemap_set_wb_err(struct address_space *mapping, int err) -{ - /* Fastpath for common case of no error */ - if (unlikely(err)) - __filemap_set_wb_err(mapping, err); -} - -/** - * filemap_check_wb_err - has an error occurred since the mark was sampled? - * @mapping: mapping to check for writeback errors - * @since: previously-sampled errseq_t - * - * Grab the errseq_t value from the mapping, and see if it has changed "since" - * the given value was sampled. - * - * If it has then report the latest error set, otherwise return 0. - */ -static inline int filemap_check_wb_err(struct address_space *mapping, - errseq_t since) -{ - return errseq_check(&mapping->wb_err, since); -} - -/** - * filemap_sample_wb_err - sample the current errseq_t to test for later errors - * @mapping: mapping to be sampled - * - * Writeback errors are always reported relative to a particular sample point - * in the past. This function provides those sample points. - */ -static inline errseq_t filemap_sample_wb_err(struct address_space *mapping) -{ - return errseq_sample(&mapping->wb_err); -} - -/** - * file_sample_sb_err - sample the current errseq_t to test for later errors - * @file: file pointer to be sampled - * - * Grab the most current superblock-level errseq_t value for the given - * struct file. - */ -static inline errseq_t file_sample_sb_err(struct file *file) -{ - return errseq_sample(&file->f_path.dentry->d_sb->s_wb_err); -} - extern int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync); extern int vfs_fsync(struct file *file, int datasync); @@ -3604,15 +3499,4 @@ extern int vfs_fadvise(struct file *file, loff_t offset, loff_t len, extern int generic_fadvise(struct file *file, loff_t offset, loff_t len, int advice); -/* - * Flush file data before changing attributes. Caller must hold any locks - * required to prevent further writes to this file until we're done setting - * flags. - */ -static inline int inode_drain_writes(struct inode *inode) -{ - inode_dio_wait(inode); - return filemap_write_and_wait(inode->i_mapping); -} - #endif /* _LINUX_FS_H */ diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index cdb3f118603a..f968b36ad771 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -18,6 +18,120 @@ struct folio_batch; +unsigned long invalidate_mapping_pages(struct address_space *mapping, + pgoff_t start, pgoff_t end); + +static inline void invalidate_remote_inode(struct inode *inode) +{ + if (S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode) || + S_ISLNK(inode->i_mode)) + invalidate_mapping_pages(inode->i_mapping, 0, -1); +} +int invalidate_inode_pages2(struct address_space *mapping); +int invalidate_inode_pages2_range(struct address_space *mapping, + pgoff_t start, pgoff_t end); +int write_inode_now(struct inode *, int sync); +int filemap_fdatawrite(struct address_space *); +int filemap_flush(struct address_space *); +int filemap_fdatawait_keep_errors(struct address_space *mapping); +int filemap_fdatawait_range(struct address_space *, loff_t lstart, loff_t lend); +int filemap_fdatawait_range_keep_errors(struct address_space *mapping, + loff_t start_byte, loff_t end_byte); + +static inline int filemap_fdatawait(struct address_space *mapping) +{ + return filemap_fdatawait_range(mapping, 0, LLONG_MAX); +} + +bool filemap_range_has_page(struct address_space *, loff_t lstart, loff_t lend); +int filemap_write_and_wait_range(struct address_space *mapping, + loff_t lstart, loff_t lend); +int __filemap_fdatawrite_range(struct address_space *mapping, + loff_t start, loff_t end, int sync_mode); +int filemap_fdatawrite_range(struct address_space *mapping, + loff_t start, loff_t end); +int filemap_check_errors(struct address_space *mapping); +void __filemap_set_wb_err(struct address_space *mapping, int err); +int filemap_fdatawrite_wbc(struct address_space *mapping, + struct writeback_control *wbc); + +static inline int filemap_write_and_wait(struct address_space *mapping) +{ + return filemap_write_and_wait_range(mapping, 0, LLONG_MAX); +} + +/** + * filemap_set_wb_err - set a writeback error on an address_space + * @mapping: mapping in which to set writeback error + * @err: error to be set in mapping + * + * When writeback fails in some way, we must record that error so that + * userspace can be informed when fsync and the like are called. We endeavor + * to report errors on any file that was open at the time of the error. Some + * internal callers also need to know when writeback errors have occurred. + * + * When a writeback error occurs, most filesystems will want to call + * filemap_set_wb_err to record the error in the mapping so that it will be + * automatically reported whenever fsync is called on the file. + */ +static inline void filemap_set_wb_err(struct address_space *mapping, int err) +{ + /* Fastpath for common case of no error */ + if (unlikely(err)) + __filemap_set_wb_err(mapping, err); +} + +/** + * filemap_check_wb_err - has an error occurred since the mark was sampled? + * @mapping: mapping to check for writeback errors + * @since: previously-sampled errseq_t + * + * Grab the errseq_t value from the mapping, and see if it has changed "since" + * the given value was sampled. + * + * If it has then report the latest error set, otherwise return 0. + */ +static inline int filemap_check_wb_err(struct address_space *mapping, + errseq_t since) +{ + return errseq_check(&mapping->wb_err, since); +} + +/** + * filemap_sample_wb_err - sample the current errseq_t to test for later errors + * @mapping: mapping to be sampled + * + * Writeback errors are always reported relative to a particular sample point + * in the past. This function provides those sample points. + */ +static inline errseq_t filemap_sample_wb_err(struct address_space *mapping) +{ + return errseq_sample(&mapping->wb_err); +} + +/** + * file_sample_sb_err - sample the current errseq_t to test for later errors + * @file: file pointer to be sampled + * + * Grab the most current superblock-level errseq_t value for the given + * struct file. + */ +static inline errseq_t file_sample_sb_err(struct file *file) +{ + return errseq_sample(&file->f_path.dentry->d_sb->s_wb_err); +} + +/* + * Flush file data before changing attributes. Caller must hold any locks + * required to prevent further writes to this file until we're done setting + * flags. + */ +static inline int inode_drain_writes(struct inode *inode) +{ + inode_dio_wait(inode); + return filemap_write_and_wait(inode->i_mapping); +} + static inline bool mapping_empty(struct address_space *mapping) { return xa_empty(&mapping->i_pages); -- cgit From 74e8ee4708a8edabbbc7ab8c12ec24d7a561bb41 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 18 Jan 2022 10:50:48 -0500 Subject: mm: Turn head_compound_mapcount() into folio_entire_mapcount() Adjust documentation to be more clear. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/mm.h | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index dede2eda4d7f..70f0ca217962 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -776,21 +776,26 @@ static inline int is_vmalloc_or_module_addr(const void *x) } #endif -static inline int head_compound_mapcount(struct page *head) +/* + * How many times the entire folio is mapped as a single unit (eg by a + * PMD or PUD entry). This is probably not what you want, except for + * debugging purposes; look at folio_mapcount() or page_mapcount() + * instead. + */ +static inline int folio_entire_mapcount(struct folio *folio) { - return atomic_read(compound_mapcount_ptr(head)) + 1; + VM_BUG_ON_FOLIO(!folio_test_large(folio), folio); + return atomic_read(folio_mapcount_ptr(folio)) + 1; } /* * Mapcount of compound page as a whole, does not include mapped sub-pages. * - * Must be called only for compound pages or any their tail sub-pages. + * Must be called only for compound pages. */ static inline int compound_mapcount(struct page *page) { - VM_BUG_ON_PAGE(!PageCompound(page), page); - page = compound_head(page); - return head_compound_mapcount(page); + return folio_entire_mapcount(page_folio(page)); } /* -- cgit From 4ba1119cd53166d853050ff1a9d76079cd8f8e06 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 17 Jan 2022 16:33:26 -0500 Subject: mm: Add folio_mapcount() This implements the same algorithm as total_mapcount(), which is transformed into a wrapper function. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/mm.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 70f0ca217962..0d380dc26847 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -825,8 +825,14 @@ static inline int page_mapcount(struct page *page) return atomic_read(&page->_mapcount) + 1; } +int folio_mapcount(struct folio *folio); + #ifdef CONFIG_TRANSPARENT_HUGEPAGE -int total_mapcount(struct page *page); +static inline int total_mapcount(struct page *page) +{ + return folio_mapcount(page_folio(page)); +} + int page_trans_huge_mapcount(struct page *page); #else static inline int total_mapcount(struct page *page) -- cgit From 346cf61311f6e348337e95aa2febe29e86137f42 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 18 Jan 2022 08:56:47 -0500 Subject: mm: Add split_folio_to_list() This is a convenience function; split_huge_page_to_list() can take any page in a folio (and does so on purpose because that page will be the one which keeps the refcount). But it's convenient for the callers to pass the folio instead of the first page in the folio. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/huge_mm.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index e4c18ba8d3bf..71c073d411ac 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -483,6 +483,12 @@ static inline bool thp_migration_supported(void) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +static inline int split_folio_to_list(struct folio *folio, + struct list_head *list) +{ + return split_huge_page_to_list(&folio->page, list); +} + /** * thp_size - Size of a transparent huge page. * @page: Head page of a transparent huge page. -- cgit From f087b903fc2e4975bff9742a66ee7a837a2f545b Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 2 Feb 2022 23:29:45 -0500 Subject: mm: Add folio_pgoff() This is the folio equivalent of page_to_pgoff(). Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/pagemap.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index f968b36ad771..a73c928e1d74 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -817,6 +817,17 @@ static inline loff_t folio_file_pos(struct folio *folio) return page_file_offset(&folio->page); } +/* + * Get the offset in PAGE_SIZE (even for hugetlb folios). + * (TODO: hugetlb folios should have ->index in PAGE_SIZE) + */ +static inline pgoff_t folio_pgoff(struct folio *folio) +{ + if (unlikely(folio_test_hugetlb(folio))) + return hugetlb_basepage_index(&folio->page); + return folio->index; +} + extern pgoff_t linear_hugepage_index(struct vm_area_struct *vma, unsigned long address); -- cgit From eed05e54d275b3cfc5d8c79843c5276a5878e94a Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 3 Feb 2022 09:06:08 -0500 Subject: mm: Add DEFINE_PAGE_VMA_WALK and DEFINE_FOLIO_VMA_WALK Instead of declaring a struct page_vma_mapped_walk directly, use these helpers to allow us to transition to a PFN approach in the following patches. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/rmap.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index ac29b076082b..0d894a2bfaa1 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -214,6 +214,22 @@ struct page_vma_mapped_walk { unsigned int flags; }; +#define DEFINE_PAGE_VMA_WALK(name, _page, _vma, _address, _flags) \ + struct page_vma_mapped_walk name = { \ + .page = _page, \ + .vma = _vma, \ + .address = _address, \ + .flags = _flags, \ + } + +#define DEFINE_FOLIO_VMA_WALK(name, _folio, _vma, _address, _flags) \ + struct page_vma_mapped_walk name = { \ + .page = &_folio->page, \ + .vma = _vma, \ + .address = _address, \ + .flags = _flags, \ + } + static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw) { /* HugeTLB pte is set to the relevant page table entry without pte_mapped. */ -- cgit From 2aff7a4755bed2870ee23b75bc88cdc8d76cdd03 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 3 Feb 2022 11:40:17 -0500 Subject: mm: Convert page_vma_mapped_walk to work on PFNs page_mapped_in_vma() really just wants to walk one page, but as the code stands, if passed the head page of a compound page, it will walk every page in the compound page. Extract pfn/nr_pages/pgoff from the struct page early, so they can be overridden by page_mapped_in_vma(). Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/hugetlb.h | 5 +++++ include/linux/rmap.h | 17 ++++++++++++----- 2 files changed, 17 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index d1897a69c540..6ba2f8e74fbb 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -970,6 +970,11 @@ static inline struct hstate *page_hstate(struct page *page) return NULL; } +static inline struct hstate *size_to_hstate(unsigned long size) +{ + return NULL; +} + static inline unsigned long huge_page_size(struct hstate *h) { return PAGE_SIZE; diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 0d894a2bfaa1..0c838ba1a8ee 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -11,6 +11,7 @@ #include #include #include +#include /* * The anon_vma heads a list of private "related" vmas, to scan if @@ -201,11 +202,13 @@ int make_device_exclusive_range(struct mm_struct *mm, unsigned long start, /* Avoid racy checks */ #define PVMW_SYNC (1 << 0) -/* Look for migarion entries rather than present PTEs */ +/* Look for migration entries rather than present PTEs */ #define PVMW_MIGRATION (1 << 1) struct page_vma_mapped_walk { - struct page *page; + unsigned long pfn; + unsigned long nr_pages; + pgoff_t pgoff; struct vm_area_struct *vma; unsigned long address; pmd_t *pmd; @@ -216,7 +219,9 @@ struct page_vma_mapped_walk { #define DEFINE_PAGE_VMA_WALK(name, _page, _vma, _address, _flags) \ struct page_vma_mapped_walk name = { \ - .page = _page, \ + .pfn = page_to_pfn(_page), \ + .nr_pages = compound_nr(page), \ + .pgoff = page_to_pgoff(page), \ .vma = _vma, \ .address = _address, \ .flags = _flags, \ @@ -224,7 +229,9 @@ struct page_vma_mapped_walk { #define DEFINE_FOLIO_VMA_WALK(name, _folio, _vma, _address, _flags) \ struct page_vma_mapped_walk name = { \ - .page = &_folio->page, \ + .pfn = folio_pfn(_folio), \ + .nr_pages = folio_nr_pages(_folio), \ + .pgoff = folio_pgoff(_folio), \ .vma = _vma, \ .address = _address, \ .flags = _flags, \ @@ -233,7 +240,7 @@ struct page_vma_mapped_walk { static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw) { /* HugeTLB pte is set to the relevant page table entry without pte_mapped. */ - if (pvmw->pte && !PageHuge(pvmw->page)) + if (pvmw->pte && !is_vm_hugetlb_page(pvmw->vma)) pte_unmap(pvmw->pte); if (pvmw->ptl) spin_unlock(pvmw->ptl); -- cgit From b3ac04132c4b9bc5c9c14608424d410e7ca3b400 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 21 Jan 2022 11:27:31 -0500 Subject: mm/rmap: Turn page_referenced() into folio_referenced() Both its callers pass a page which was previously on an LRU list, so were passing a folio by definition. Use the type system to enforce that and remove a few calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/rmap.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 0c838ba1a8ee..69a1664216de 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -190,7 +190,7 @@ static inline void page_dup_rmap(struct page *page, bool compound) /* * Called from mm/vmscan.c to handle paging out */ -int page_referenced(struct page *, int is_locked, +int folio_referenced(struct folio *, int is_locked, struct mem_cgroup *memcg, unsigned long *vm_flags); void try_to_migrate(struct page *page, enum ttu_flags flags); @@ -301,7 +301,7 @@ void rmap_walk_locked(struct page *page, struct rmap_walk_control *rwc); #define anon_vma_prepare(vma) (0) #define anon_vma_link(vma) do {} while (0) -static inline int page_referenced(struct page *page, int is_locked, +static inline int folio_referenced(struct folio *folio, int is_locked, struct mem_cgroup *memcg, unsigned long *vm_flags) { -- cgit From af28a988b313a601c12c410a42e485ca46adcfee Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 21 Jan 2022 10:44:52 -0500 Subject: mm/huge_memory: Convert __split_huge_pmd() to take a folio Convert split_huge_pmd_address() at the same time since it only passes the folio through, and its two callers already have a folio on hand. Removes numerous calls to compound_head() and removes an assumption that a page cannot be larger than a PMD. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/huge_mm.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 71c073d411ac..4368b314d9c8 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -194,7 +194,7 @@ static inline int split_huge_page(struct page *page) void deferred_split_huge_page(struct page *page); void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, bool freeze, struct page *page); + unsigned long address, bool freeze, struct folio *folio); #define split_huge_pmd(__vma, __pmd, __address) \ do { \ @@ -207,7 +207,7 @@ void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, void split_huge_pmd_address(struct vm_area_struct *vma, unsigned long address, - bool freeze, struct page *page); + bool freeze, struct folio *folio); void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, unsigned long address); @@ -406,9 +406,9 @@ static inline void deferred_split_huge_page(struct page *page) {} do { } while (0) static inline void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, - unsigned long address, bool freeze, struct page *page) {} + unsigned long address, bool freeze, struct folio *folio) {} static inline void split_huge_pmd_address(struct vm_area_struct *vma, - unsigned long address, bool freeze, struct page *page) {} + unsigned long address, bool freeze, struct folio *folio) {} #define split_huge_pud(__vma, __pmd, __address) \ do { } while (0) -- cgit From 869f7ee6f6477341f859c8b0949ae81caf9ca7f3 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 15 Feb 2022 09:28:49 -0500 Subject: mm/rmap: Convert try_to_unmap() to take a folio Change all three callers and the worker function try_to_unmap_one(). Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/rmap.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 69a1664216de..a0c5c38c733f 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -194,7 +194,7 @@ int folio_referenced(struct folio *, int is_locked, struct mem_cgroup *memcg, unsigned long *vm_flags); void try_to_migrate(struct page *page, enum ttu_flags flags); -void try_to_unmap(struct page *, enum ttu_flags flags); +void try_to_unmap(struct folio *, enum ttu_flags flags); int make_device_exclusive_range(struct mm_struct *mm, unsigned long start, unsigned long end, struct page **pages, @@ -309,7 +309,7 @@ static inline int folio_referenced(struct folio *folio, int is_locked, return 0; } -static inline void try_to_unmap(struct page *page, enum ttu_flags flags) +static inline void try_to_unmap(struct folio *folio, enum ttu_flags flags) { } -- cgit From 4b8554c527f3cfa183f6c06d231a9387873205a0 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 28 Jan 2022 14:29:43 -0500 Subject: mm/rmap: Convert try_to_migrate() to folios Convert the callers to pass a folio and the try_to_migrate_one() worker to use a folio throughout. Fixes an assumption that a folio must be <= PMD size. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/rmap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index a0c5c38c733f..e6e935c81414 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -193,7 +193,7 @@ static inline void page_dup_rmap(struct page *page, bool compound) int folio_referenced(struct folio *, int is_locked, struct mem_cgroup *memcg, unsigned long *vm_flags); -void try_to_migrate(struct page *page, enum ttu_flags flags); +void try_to_migrate(struct folio *folio, enum ttu_flags flags); void try_to_unmap(struct folio *, enum ttu_flags flags); int make_device_exclusive_range(struct mm_struct *mm, unsigned long start, -- cgit From 4eecb8b9163df82c87c91764a02fff228ef25f6d Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 28 Jan 2022 23:32:59 -0500 Subject: mm/migrate: Convert remove_migration_ptes() to folios Convert the implementation and all callers. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/rmap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index e6e935c81414..21af80d5b711 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -261,7 +261,7 @@ unsigned long page_address_in_vma(struct page *, struct vm_area_struct *); */ int folio_mkclean(struct folio *); -void remove_migration_ptes(struct page *old, struct page *new, bool locked); +void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked); /* * Called by memory-failure.c to kill processes. -- cgit From 9595d76942b8714627d670a7e7ae543812c731ae Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 1 Feb 2022 23:33:08 -0500 Subject: mm/rmap: Turn page_lock_anon_vma_read() into folio_lock_anon_vma_read() Add back page_lock_anon_vma_read() as a wrapper. This saves a few calls to compound_head(). If any callers were passing a tail page before, this would have failed to lock the anon VMA as page->mapping is not valid for tail pages. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/rmap.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 21af80d5b711..be020d38b0a5 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -267,6 +267,7 @@ void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked); * Called by memory-failure.c to kill processes. */ struct anon_vma *page_lock_anon_vma_read(struct page *page); +struct anon_vma *folio_lock_anon_vma_read(struct folio *folio); void page_unlock_anon_vma_read(struct anon_vma *anon_vma); int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma); -- cgit From e05b34539d008ab819388f699b25eae962ba24ac Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 29 Jan 2022 11:52:52 -0500 Subject: mm: Turn page_anon_vma() into folio_anon_vma() Move the prototype from mm.h to mm/internal.h and convert all callers to pass a folio. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/mm.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 0d380dc26847..a879c583f665 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1730,7 +1730,6 @@ static inline void *folio_address(const struct folio *folio) } extern void *page_rmapping(struct page *page); -extern struct anon_vma *page_anon_vma(struct page *page); extern pgoff_t __page_file_index(struct page *page); /* -- cgit From 2f031c6f042cb8a9b221a8b6b80e69de5170f830 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 29 Jan 2022 16:06:53 -0500 Subject: mm/rmap: Convert rmap_walk() to take a folio This ripples all the way through to every calling and called function from rmap. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/ksm.h | 4 ++-- include/linux/rmap.h | 11 +++++------ 2 files changed, 7 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ksm.h b/include/linux/ksm.h index a38a5bca1ba5..0b4f17418f64 100644 --- a/include/linux/ksm.h +++ b/include/linux/ksm.h @@ -51,7 +51,7 @@ static inline void ksm_exit(struct mm_struct *mm) struct page *ksm_might_need_to_copy(struct page *page, struct vm_area_struct *vma, unsigned long address); -void rmap_walk_ksm(struct page *page, struct rmap_walk_control *rwc); +void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc); void folio_migrate_ksm(struct folio *newfolio, struct folio *folio); #else /* !CONFIG_KSM */ @@ -78,7 +78,7 @@ static inline struct page *ksm_might_need_to_copy(struct page *page, return page; } -static inline void rmap_walk_ksm(struct page *page, +static inline void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc) { } diff --git a/include/linux/rmap.h b/include/linux/rmap.h index be020d38b0a5..a74d811c5b3f 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -266,7 +266,6 @@ void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked); /* * Called by memory-failure.c to kill processes. */ -struct anon_vma *page_lock_anon_vma_read(struct page *page); struct anon_vma *folio_lock_anon_vma_read(struct folio *folio); void page_unlock_anon_vma_read(struct anon_vma *anon_vma); int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma); @@ -286,15 +285,15 @@ struct rmap_walk_control { * Return false if page table scanning in rmap_walk should be stopped. * Otherwise, return true. */ - bool (*rmap_one)(struct page *page, struct vm_area_struct *vma, + bool (*rmap_one)(struct folio *folio, struct vm_area_struct *vma, unsigned long addr, void *arg); - int (*done)(struct page *page); - struct anon_vma *(*anon_lock)(struct page *page); + int (*done)(struct folio *folio); + struct anon_vma *(*anon_lock)(struct folio *folio); bool (*invalid_vma)(struct vm_area_struct *vma, void *arg); }; -void rmap_walk(struct page *page, struct rmap_walk_control *rwc); -void rmap_walk_locked(struct page *page, struct rmap_walk_control *rwc); +void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc); +void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc); #else /* !CONFIG_MMU */ -- cgit From 84fbbe21894bb9be8e16df408cbfbb9466fe396d Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 29 Jan 2022 16:16:54 -0500 Subject: mm/rmap: Constify the rmap_walk_control argument The rmap walking functions do not modify the rmap_walk_control, and page_idle_clear_pte_refs() takes advantage of that to move construction of the rmap_walk_control to compile time. This lets us remove an unclean cast. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/ksm.h | 4 ++-- include/linux/rmap.h | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ksm.h b/include/linux/ksm.h index 0b4f17418f64..0630e545f4cb 100644 --- a/include/linux/ksm.h +++ b/include/linux/ksm.h @@ -51,7 +51,7 @@ static inline void ksm_exit(struct mm_struct *mm) struct page *ksm_might_need_to_copy(struct page *page, struct vm_area_struct *vma, unsigned long address); -void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc); +void rmap_walk_ksm(struct folio *folio, const struct rmap_walk_control *rwc); void folio_migrate_ksm(struct folio *newfolio, struct folio *folio); #else /* !CONFIG_KSM */ @@ -79,7 +79,7 @@ static inline struct page *ksm_might_need_to_copy(struct page *page, } static inline void rmap_walk_ksm(struct folio *folio, - struct rmap_walk_control *rwc) + const struct rmap_walk_control *rwc) { } diff --git a/include/linux/rmap.h b/include/linux/rmap.h index a74d811c5b3f..17230c458341 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -292,8 +292,8 @@ struct rmap_walk_control { bool (*invalid_vma)(struct vm_area_struct *vma, void *arg); }; -void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc); -void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc); +void rmap_walk(struct folio *folio, const struct rmap_walk_control *rwc); +void rmap_walk_locked(struct folio *folio, const struct rmap_walk_control *rwc); #else /* !CONFIG_MMU */ -- cgit From d4b4084ac3154c51ff5fa71f669264cc44429be2 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 4 Feb 2022 14:13:31 -0500 Subject: mm: Turn can_split_huge_page() into can_split_folio() This function already required a head page to be passed, so this just adds type-safety and removes a few implicit calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/huge_mm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 4368b314d9c8..e0348bca3d66 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -185,7 +185,7 @@ void prep_transhuge_page(struct page *page); void free_transhuge_page(struct page *page); bool is_transparent_hugepage(struct page *page); -bool can_split_huge_page(struct page *page, int *pextra_pins); +bool can_split_folio(struct folio *folio, int *pextra_pins); int split_huge_page_to_list(struct page *page, struct list_head *list); static inline int split_huge_page(struct page *page) { @@ -387,7 +387,7 @@ static inline bool is_transparent_hugepage(struct page *page) #define thp_get_unmapped_area NULL static inline bool -can_split_huge_page(struct page *page, int *pextra_pins) +can_split_folio(struct folio *folio, int *pextra_pins) { BUILD_BUG(); return false; -- cgit From 06d44142d49dc2e02d255ea9d72dc4c20f20388f Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 10 Oct 2020 11:47:55 -0400 Subject: mm: Fix READ_ONLY_THP warning These counters only exist if CONFIG_READ_ONLY_THP_FOR_FS is defined, but we do not need to warn if the filesystem natively supports large folios. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/pagemap.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index a73c928e1d74..0a2417fc531c 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -326,7 +326,7 @@ static inline void filemap_nr_thps_inc(struct address_space *mapping) if (!mapping_large_folio_support(mapping)) atomic_inc(&mapping->nr_thps); #else - WARN_ON_ONCE(1); + WARN_ON_ONCE(mapping_large_folio_support(mapping) == 0); #endif } @@ -336,7 +336,7 @@ static inline void filemap_nr_thps_dec(struct address_space *mapping) if (!mapping_large_folio_support(mapping)) atomic_dec(&mapping->nr_thps); #else - WARN_ON_ONCE(1); + WARN_ON_ONCE(mapping_large_folio_support(mapping) == 0); #endif } -- cgit From 421f1ab48452af48b64e205de1caca3d1ba415f4 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 15 Jan 2022 23:27:08 -0500 Subject: mm: Make large folios depend on THP Some parts of the VM still depend on THP to handle large folios correctly. Until those are fixed, prevent creating large folios if THP are disabled. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/pagemap.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 0a2417fc531c..20d7cbabf654 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -306,9 +306,14 @@ static inline void mapping_set_large_folios(struct address_space *mapping) __set_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags); } +/* + * Large folio support currently depends on THP. These dependencies are + * being worked on but are not yet fixed. + */ static inline bool mapping_large_folio_support(struct address_space *mapping) { - return test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags); + return IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && + test_bit(AS_LARGE_FOLIO_SUPPORT, &mapping->flags); } static inline int filemap_nr_thps(struct address_space *mapping) -- cgit From 18788cfa236967741b83db1035ab24539e2a21bb Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 29 May 2020 20:54:38 -0400 Subject: mm: Support arbitrary THP sizes For code which has not yet been converted from THP to folios, use the compound size of the page instead of assuming PTE or PMD size. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/huge_mm.h | 47 ----------------------------------------------- include/linux/mm.h | 31 +++++++++++++++++++++++++++++++ 2 files changed, 31 insertions(+), 47 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index e0348bca3d66..0734aff8fa19 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -250,30 +250,6 @@ static inline spinlock_t *pud_trans_huge_lock(pud_t *pud, return NULL; } -/** - * thp_order - Order of a transparent huge page. - * @page: Head page of a transparent huge page. - */ -static inline unsigned int thp_order(struct page *page) -{ - VM_BUG_ON_PGFLAGS(PageTail(page), page); - if (PageHead(page)) - return HPAGE_PMD_ORDER; - return 0; -} - -/** - * thp_nr_pages - The number of regular pages in this huge page. - * @page: The head page of a huge page. - */ -static inline int thp_nr_pages(struct page *page) -{ - VM_BUG_ON_PGFLAGS(PageTail(page), page); - if (PageHead(page)) - return HPAGE_PMD_NR; - return 1; -} - /** * folio_test_pmd_mappable - Can we map this folio with a PMD? * @folio: The folio to test @@ -336,18 +312,6 @@ static inline struct list_head *page_deferred_list(struct page *page) #define HPAGE_PUD_MASK ({ BUILD_BUG(); 0; }) #define HPAGE_PUD_SIZE ({ BUILD_BUG(); 0; }) -static inline unsigned int thp_order(struct page *page) -{ - VM_BUG_ON_PGFLAGS(PageTail(page), page); - return 0; -} - -static inline int thp_nr_pages(struct page *page) -{ - VM_BUG_ON_PGFLAGS(PageTail(page), page); - return 1; -} - static inline bool folio_test_pmd_mappable(struct folio *folio) { return false; @@ -489,15 +453,4 @@ static inline int split_folio_to_list(struct folio *folio, return split_huge_page_to_list(&folio->page, list); } -/** - * thp_size - Size of a transparent huge page. - * @page: Head page of a transparent huge page. - * - * Return: Number of bytes in this page. - */ -static inline unsigned long thp_size(struct page *page) -{ - return PAGE_SIZE << thp_order(page); -} - #endif /* _LINUX_HUGE_MM_H */ diff --git a/include/linux/mm.h b/include/linux/mm.h index a879c583f665..c1966ad34142 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -939,6 +939,37 @@ static inline unsigned int page_shift(struct page *page) return PAGE_SHIFT + compound_order(page); } +/** + * thp_order - Order of a transparent huge page. + * @page: Head page of a transparent huge page. + */ +static inline unsigned int thp_order(struct page *page) +{ + VM_BUG_ON_PGFLAGS(PageTail(page), page); + return compound_order(page); +} + +/** + * thp_nr_pages - The number of regular pages in this huge page. + * @page: The head page of a huge page. + */ +static inline int thp_nr_pages(struct page *page) +{ + VM_BUG_ON_PGFLAGS(PageTail(page), page); + return compound_nr(page); +} + +/** + * thp_size - Size of a transparent huge page. + * @page: Head page of a transparent huge page. + * + * Return: Number of bytes in this page. + */ +static inline unsigned long thp_size(struct page *page) +{ + return PAGE_SIZE << thp_order(page); +} + void free_compound_page(struct page *page); #ifdef CONFIG_MMU -- cgit From 351bdbb6419ca988802882badadc321d384d0254 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Mon, 21 Mar 2022 10:22:37 +0100 Subject: net: Revert the softirq will run annotation in ____napi_schedule(). The lockdep annotation lockdep_assert_softirq_will_run() expects that either hard or soft interrupts are disabled because both guaranty that the "raised" soft-interrupts will be processed once the context is left. This triggers in flush_smp_call_function_from_idle() but it this case it explicitly calls do_softirq() in case of pending softirqs. Revert the "softirq will run" annotation in ____napi_schedule() and move the check back to __netif_rx() as it was. Keep the IRQ-off assert in ____napi_schedule() because this is always required. Fixes: fbd9a2ceba5c7 ("net: Add lockdep asserts to ____napi_schedule().") Signed-off-by: Sebastian Andrzej Siewior Reviewed-by: Jason A. Donenfeld Link: https://lore.kernel.org/r/YjhD3ZKWysyw8rc6@linutronix.de Signed-off-by: Jakub Kicinski --- include/linux/lockdep.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 0cc65d216701..467b94257105 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -329,12 +329,6 @@ extern void lock_unpin_lock(struct lockdep_map *lock, struct pin_cookie); #define lockdep_assert_none_held_once() \ lockdep_assert_once(!current->lockdep_depth) -/* - * Ensure that softirq is handled within the callchain and not delayed and - * handled by chance. - */ -#define lockdep_assert_softirq_will_run() \ - lockdep_assert_once(hardirq_count() | softirq_count()) #define lockdep_recursing(tsk) ((tsk)->lockdep_recursion) @@ -420,7 +414,6 @@ extern int lockdep_is_held(const void *); #define lockdep_assert_held_read(l) do { (void)(l); } while (0) #define lockdep_assert_held_once(l) do { (void)(l); } while (0) #define lockdep_assert_none_held_once() do { } while (0) -#define lockdep_assert_softirq_will_run() do { } while (0) #define lockdep_recursing(tsk) (0) -- cgit From f5bfa23f576fdcc8e0b7fbff44cf70bd69ff9bdb Mon Sep 17 00:00:00 2001 From: Atish Patra Date: Fri, 18 Feb 2022 16:46:54 -0800 Subject: RISC-V: Add a perf core library for pmu drivers Implement a perf core library that can support all the essential perf features in future. It can also accommodate any type of PMU implementation in future. Currently, both SBI based perf driver and legacy driver implemented uses the library. Most of the common perf functionalities are kept in this core library wile PMU specific driver can implement PMU specific features. For example, the SBI specific functionality will be implemented in the SBI specific driver. Reviewed-by: Anup Patel Signed-off-by: Atish Patra Signed-off-by: Atish Patra Signed-off-by: Palmer Dabbelt --- include/linux/perf/riscv_pmu.h | 65 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 65 insertions(+) create mode 100644 include/linux/perf/riscv_pmu.h (limited to 'include/linux') diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h new file mode 100644 index 000000000000..0d8979765d79 --- /dev/null +++ b/include/linux/perf/riscv_pmu.h @@ -0,0 +1,65 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2018 SiFive + * Copyright (C) 2018 Andes Technology Corporation + * Copyright (C) 2021 Western Digital Corporation or its affiliates. + * + */ + +#ifndef _ASM_RISCV_PERF_EVENT_H +#define _ASM_RISCV_PERF_EVENT_H + +#include +#include +#include + +#ifdef CONFIG_RISCV_PMU + +/* + * The RISCV_MAX_COUNTERS parameter should be specified. + */ + +#define RISCV_MAX_COUNTERS 64 +#define RISCV_OP_UNSUPP (-EOPNOTSUPP) +#define RISCV_PMU_PDEV_NAME "riscv-pmu" + +#define RISCV_PMU_STOP_FLAG_RESET 1 + +struct cpu_hw_events { + /* currently enabled events */ + int n_events; + /* currently enabled events */ + struct perf_event *events[RISCV_MAX_COUNTERS]; + /* currently enabled counters */ + DECLARE_BITMAP(used_event_ctrs, RISCV_MAX_COUNTERS); +}; + +struct riscv_pmu { + struct pmu pmu; + char *name; + + irqreturn_t (*handle_irq)(int irq_num, void *dev); + + int num_counters; + u64 (*ctr_read)(struct perf_event *event); + int (*ctr_get_idx)(struct perf_event *event); + int (*ctr_get_width)(int idx); + void (*ctr_clear_idx)(struct perf_event *event); + void (*ctr_start)(struct perf_event *event, u64 init_val); + void (*ctr_stop)(struct perf_event *event, unsigned long flag); + int (*event_map)(struct perf_event *event, u64 *config); + + struct cpu_hw_events __percpu *hw_events; + struct hlist_node node; +}; + +#define to_riscv_pmu(p) (container_of(p, struct riscv_pmu, pmu)) +unsigned long riscv_pmu_ctr_read_csr(unsigned long csr); +int riscv_pmu_event_set_period(struct perf_event *event); +uint64_t riscv_pmu_ctr_get_width_mask(struct perf_event *event); +u64 riscv_pmu_event_update(struct perf_event *event); +struct riscv_pmu *riscv_pmu_alloc(void); + +#endif /* CONFIG_RISCV_PMU */ + +#endif /* _ASM_RISCV_PERF_EVENT_H */ -- cgit From 9b3e150e310ee71d7bae1e31c38a300cfa5e951b Mon Sep 17 00:00:00 2001 From: Atish Patra Date: Fri, 18 Feb 2022 16:46:55 -0800 Subject: RISC-V: Add a simple platform driver for RISC-V legacy perf The old RISC-V perf implementation allowed counting of only cycle/instruction counters using perf. Restore that feature by implementing a simple platform driver under a separate config to provide backward compatibility. Any existing software stack will continue to work as it is. However, it provides an easy way out in future where we can remove the legacy driver. Reviewed-by: Anup Patel Signed-off-by: Atish Patra Signed-off-by: Atish Patra Signed-off-by: Palmer Dabbelt --- include/linux/perf/riscv_pmu.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h index 0d8979765d79..9140c491fc54 100644 --- a/include/linux/perf/riscv_pmu.h +++ b/include/linux/perf/riscv_pmu.h @@ -22,6 +22,7 @@ #define RISCV_MAX_COUNTERS 64 #define RISCV_OP_UNSUPP (-EOPNOTSUPP) #define RISCV_PMU_PDEV_NAME "riscv-pmu" +#define RISCV_PMU_LEGACY_PDEV_NAME "riscv-pmu-legacy" #define RISCV_PMU_STOP_FLAG_RESET 1 @@ -58,6 +59,11 @@ unsigned long riscv_pmu_ctr_read_csr(unsigned long csr); int riscv_pmu_event_set_period(struct perf_event *event); uint64_t riscv_pmu_ctr_get_width_mask(struct perf_event *event); u64 riscv_pmu_event_update(struct perf_event *event); +#ifdef CONFIG_RISCV_PMU_LEGACY +void riscv_pmu_legacy_skip_init(void); +#else +static inline void riscv_pmu_legacy_skip_init(void) {}; +#endif struct riscv_pmu *riscv_pmu_alloc(void); #endif /* CONFIG_RISCV_PMU */ -- cgit From e9991434596f5373dfd75857b445eb92a9253c56 Mon Sep 17 00:00:00 2001 From: Atish Patra Date: Fri, 18 Feb 2022 16:46:57 -0800 Subject: RISC-V: Add perf platform driver based on SBI PMU extension RISC-V SBI specification added a PMU extension that allows to configure start/stop any pmu counter. The RISC-V perf can use most of the generic perf features except interrupt overflow and event filtering based on privilege mode which will be added in future. It also allows to monitor a handful of firmware counters that can provide insights into firmware activity during a performance analysis. Signed-off-by: Atish Patra Signed-off-by: Atish Patra Signed-off-by: Palmer Dabbelt --- include/linux/cpuhotplug.h | 1 + include/linux/perf/riscv_pmu.h | 6 ++++-- 2 files changed, 5 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index 411a428ace4d..51c7e4929df7 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -165,6 +165,7 @@ enum cpuhp_state { CPUHP_AP_PERF_ARM_HW_BREAKPOINT_STARTING, CPUHP_AP_PERF_ARM_ACPI_STARTING, CPUHP_AP_PERF_ARM_STARTING, + CPUHP_AP_PERF_RISCV_STARTING, CPUHP_AP_ARM_L2X0_STARTING, CPUHP_AP_EXYNOS4_MCT_TIMER_STARTING, CPUHP_AP_ARM_ARCH_TIMER_STARTING, diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h index 9140c491fc54..0f226948c0ca 100644 --- a/include/linux/perf/riscv_pmu.h +++ b/include/linux/perf/riscv_pmu.h @@ -31,8 +31,10 @@ struct cpu_hw_events { int n_events; /* currently enabled events */ struct perf_event *events[RISCV_MAX_COUNTERS]; - /* currently enabled counters */ - DECLARE_BITMAP(used_event_ctrs, RISCV_MAX_COUNTERS); + /* currently enabled hardware counters */ + DECLARE_BITMAP(used_hw_ctrs, RISCV_MAX_COUNTERS); + /* currently enabled firmware counters */ + DECLARE_BITMAP(used_fw_ctrs, RISCV_MAX_COUNTERS); }; struct riscv_pmu { -- cgit From 4905ec2fb7e6421c14c9fb7276f5aa92f60f2b98 Mon Sep 17 00:00:00 2001 From: Atish Patra Date: Fri, 18 Feb 2022 16:46:58 -0800 Subject: RISC-V: Add sscofpmf extension support The sscofpmf extension allows counter overflow and filtering for programmable counters. Enable the perf driver to handle the overflow interrupt. The overflow interrupt is a hart local interrupt. Thus, per cpu overflow interrupts are setup as a child under the root INTC irq domain. Signed-off-by: Atish Patra Signed-off-by: Atish Patra Signed-off-by: Palmer Dabbelt --- include/linux/perf/riscv_pmu.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h index 0f226948c0ca..46f9b6fe306e 100644 --- a/include/linux/perf/riscv_pmu.h +++ b/include/linux/perf/riscv_pmu.h @@ -29,6 +29,8 @@ struct cpu_hw_events { /* currently enabled events */ int n_events; + /* Counter overflow interrupt */ + int irq; /* currently enabled events */ struct perf_event *events[RISCV_MAX_COUNTERS]; /* currently enabled hardware counters */ -- cgit From 863a66cdb4df25fd146d9851c3289072298566d5 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Wed, 16 Mar 2022 09:27:08 +0800 Subject: lib/sbitmap: allocate sb->map via kvzalloc_node sbitmap has been used in scsi for replacing atomic operations on sdev->device_busy, so IOPS on some fast scsi storage can be improved. However, sdev->device_busy can be changed in fast path, so we have to allocate the sb->map statically. sdev->device_busy has been capped to 1024, but some drivers may configure the default depth as < 8, then cause each sbitmap word to hold only one bit. Finally 1024 * 128( sizeof(sbitmap_word)) bytes is needed for sb->map, given it is order 5 allocation, sometimes it may fail. Avoid the issue by using kvzalloc_node() for allocating sb->map. Cc: Ewan D. Milne Signed-off-by: Ming Lei Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220316012708.354668-1-ming.lei@redhat.com Signed-off-by: Jens Axboe --- include/linux/sbitmap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h index dffeb8281c2d..8f5a86e210b9 100644 --- a/include/linux/sbitmap.h +++ b/include/linux/sbitmap.h @@ -174,7 +174,7 @@ static inline unsigned int __map_depth(const struct sbitmap *sb, int index) static inline void sbitmap_free(struct sbitmap *sb) { free_percpu(sb->alloc_hint); - kfree(sb->map); + kvfree(sb->map); sb->map = NULL; } -- cgit From 4a0cb83ba6e0cd73a50fa4f84736846bf0029f2b Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 21 Mar 2022 22:10:53 -0700 Subject: netdevice: add missing dm_private kdoc Building htmldocs complains: include/linux/netdevice.h:2295: warning: Function parameter or member 'dm_private' not described in 'net_device' Fixes: b26ef81c46ed ("drop_monitor: remove quadratic behavior") Signed-off-by: Jakub Kicinski Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/20220322051053.1883186-1-kuba@kernel.org Signed-off-by: Paolo Abeni --- include/linux/netdevice.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index e01a8ce7181f..cd7a597c55b1 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1899,6 +1899,8 @@ enum netdev_ml_priv_type { * @garp_port: GARP * @mrp_port: MRP * + * @dm_private: Drop monitor private + * * @dev: Class/net/name entry * @sysfs_groups: Space for optional device, statistics and wireless * sysfs groups -- cgit From 0313bc278dac7cd9ce83a8d384581dc043156965 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Tue, 22 Mar 2022 09:17:20 -0700 Subject: Revert "random: block in /dev/urandom" This reverts commit 6f98a4bfee72c22f50aedb39fb761567969865fe. It turns out we still can't do this. Way too many platforms that don't have any real source of randomness at boot and no jitter entropy because they don't even have a cycle counter. As reported by Guenter Roeck: "This causes a large number of qemu boot test failures for various architectures (arm, m68k, microblaze, sparc32, xtensa are the ones I observed). Common denominator is that boot hangs at 'Saving random seed:'" This isn't hugely unexpected - we tried it, it failed, so now we'll revert it. Link: https://lore.kernel.org/all/20220322155820.GA1745955@roeck-us.net/ Reported-and-bisected-by: Guenter Roeck Cc: Jason Donenfeld Signed-off-by: Linus Torvalds --- include/linux/random.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index c0baffe7afb1..f673fbb838b3 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -48,7 +48,7 @@ extern int unregister_random_ready_notifier(struct notifier_block *nb); extern size_t __must_check get_random_bytes_arch(void *buf, size_t nbytes); #ifndef MODULE -extern const struct file_operations random_fops; +extern const struct file_operations random_fops, urandom_fops; #endif u32 get_random_u32(void); -- cgit From 89f42494f92f448747bd8a7ab1ae8b5d5520577d Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Wed, 16 Mar 2022 19:10:43 -0400 Subject: SUNRPC: Don't call connect() more than once on a TCP socket Avoid socket state races due to repeated calls to ->connect() using the same socket. If connect() returns 0 due to the connection having completed, but we are in fact in a closing state, then we may leave the XPRT_CONNECTING flag set on the transport. Reported-by: Enrico Scholz Fixes: 3be232f11a3c ("SUNRPC: Prevent immediate close+reconnect") Signed-off-by: Trond Myklebust --- include/linux/sunrpc/xprtsock.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h index 8c2a712cb242..689062afdd61 100644 --- a/include/linux/sunrpc/xprtsock.h +++ b/include/linux/sunrpc/xprtsock.h @@ -89,5 +89,6 @@ struct sock_xprt { #define XPRT_SOCK_WAKE_WRITE (5) #define XPRT_SOCK_WAKE_PENDING (6) #define XPRT_SOCK_WAKE_DISCONNECT (7) +#define XPRT_SOCK_CONNECT_SENT (8) #endif /* _LINUX_SUNRPC_XPRTSOCK_H */ -- cgit From 2790a624d43084de590884934969e19c7a82316a Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Tue, 15 Mar 2022 08:12:40 -0400 Subject: SUNRPC: Replace internal use of SOCKWQ_ASYNC_NOSPACE The socket's SOCKWQ_ASYNC_NOSPACE can be cleared by various actors in the socket layer, so replace it with our own flag in the transport sock_state field. Reported-by: Chuck Lever Signed-off-by: Trond Myklebust --- include/linux/sunrpc/xprtsock.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h index 689062afdd61..3eb0079669c5 100644 --- a/include/linux/sunrpc/xprtsock.h +++ b/include/linux/sunrpc/xprtsock.h @@ -90,5 +90,6 @@ struct sock_xprt { #define XPRT_SOCK_WAKE_PENDING (6) #define XPRT_SOCK_WAKE_DISCONNECT (7) #define XPRT_SOCK_CONNECT_SENT (8) +#define XPRT_SOCK_NOSPACE (9) #endif /* _LINUX_SUNRPC_XPRTSOCK_H */ -- cgit From 33e5c765bc1ea5e06ea7603637f14d727e6fcdf3 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 14 Mar 2022 22:02:22 -0400 Subject: NFS: Fix memory allocation in rpc_malloc() When in a low memory situation, we do want rpciod to kick off direct reclaim in the case where that helps, however we don't want it looping forever in mempool_alloc(). So first try allocating from the slab using GFP_KERNEL | __GFP_NORETRY, and then fall back to a GFP_NOWAIT allocation from the mempool. Ditto for rpc_alloc_task() Signed-off-by: Trond Myklebust --- include/linux/sunrpc/sched.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index 56710f8056d3..1d7a3e51b795 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -262,6 +262,7 @@ void rpc_destroy_mempool(void); extern struct workqueue_struct *rpciod_workqueue; extern struct workqueue_struct *xprtiod_workqueue; void rpc_prepare_task(struct rpc_task *task); +gfp_t rpc_task_gfp_mask(void); static inline int rpc_wait_for_completion_task(struct rpc_task *task) { -- cgit From 515dcdcd48736576c6f5c197814da6f81c60a21e Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Mon, 21 Mar 2022 12:34:19 -0400 Subject: NFS: nfsiod should not block forever in mempool_alloc() The concern is that since nfsiod is sometimes required to kick off a commit, it can get locked up waiting forever in mempool_alloc() instead of failing gracefully and leaving the commit until later. Try to allocate from the slab first, with GFP_KERNEL | __GFP_NORETRY, then fall back to a non-blocking attempt to allocate from the memory pool. Signed-off-by: Trond Myklebust --- include/linux/nfs_fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index c47c448befc8..db305abafc9e 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -580,7 +580,7 @@ extern int nfs_wb_all(struct inode *inode); extern int nfs_wb_page(struct inode *inode, struct page *page); extern int nfs_wb_page_cancel(struct inode *inode, struct page* page); extern int nfs_commit_inode(struct inode *, int); -extern struct nfs_commit_data *nfs_commitdata_alloc(bool never_fail); +extern struct nfs_commit_data *nfs_commitdata_alloc(void); extern void nfs_commit_free(struct nfs_commit_data *data); bool nfs_commit_end(struct nfs_mds_commit_info *cinfo); -- cgit From 62eb29526b48d20704668a2fdf97a49d01bf52ce Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Tue, 22 Mar 2022 14:38:33 -0700 Subject: linux/kthread.h: remove unused macros Ever since these macros were introduced in commit b56c0d8937e6 ("kthread: implement kthread_worker"), there has been precisely one user (commit 4d115420707a, "NVMe: Async IO queue deletion"), and that user went away in 2016 with db3cbfff5bcc ("NVMe: IO queue deletion re-write"). Apart from being unused, these macros are also awkward to use (which may contribute to them not being used): Having a way to statically (or on-stack) allocating the storage for the struct kthread_worker itself doesn't help much, since obviously one needs to have some code for actually _spawning_ the worker thread, which must have error checking. And these days we have the kthread_create_worker() interface which both allocates the struct kthread_worker and spawns the kthread. Link: https://lkml.kernel.org/r/20220314145343.494694-1-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes Acked-by: Tejun Heo Cc: "Eric W. Biederman" Cc: Petr Mladek Cc: David Hildenbrand Cc: Yafang Shao Cc: Cai Huoqing Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kthread.h | 22 ---------------------- 1 file changed, 22 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kthread.h b/include/linux/kthread.h index 3df4ea04716f..de5d75bafd66 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -141,12 +141,6 @@ struct kthread_delayed_work { struct timer_list timer; }; -#define KTHREAD_WORKER_INIT(worker) { \ - .lock = __RAW_SPIN_LOCK_UNLOCKED((worker).lock), \ - .work_list = LIST_HEAD_INIT((worker).work_list), \ - .delayed_work_list = LIST_HEAD_INIT((worker).delayed_work_list),\ - } - #define KTHREAD_WORK_INIT(work, fn) { \ .node = LIST_HEAD_INIT((work).node), \ .func = (fn), \ @@ -158,9 +152,6 @@ struct kthread_delayed_work { TIMER_IRQSAFE), \ } -#define DEFINE_KTHREAD_WORKER(worker) \ - struct kthread_worker worker = KTHREAD_WORKER_INIT(worker) - #define DEFINE_KTHREAD_WORK(work, fn) \ struct kthread_work work = KTHREAD_WORK_INIT(work, fn) @@ -168,19 +159,6 @@ struct kthread_delayed_work { struct kthread_delayed_work dwork = \ KTHREAD_DELAYED_WORK_INIT(dwork, fn) -/* - * kthread_worker.lock needs its own lockdep class key when defined on - * stack with lockdep enabled. Use the following macros in such cases. - */ -#ifdef CONFIG_LOCKDEP -# define KTHREAD_WORKER_INIT_ONSTACK(worker) \ - ({ kthread_init_worker(&worker); worker; }) -# define DEFINE_KTHREAD_WORKER_ONSTACK(worker) \ - struct kthread_worker worker = KTHREAD_WORKER_INIT_ONSTACK(worker) -#else -# define DEFINE_KTHREAD_WORKER_ONSTACK(worker) DEFINE_KTHREAD_WORKER(worker) -#endif - extern void __kthread_init_worker(struct kthread_worker *worker, const char *name, struct lock_class_key *key); -- cgit From bf507030f312c68fdbb17c2d33f317cda109a484 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 22 Mar 2022 14:38:48 -0700 Subject: doc: convert 'subsection' to 'section' in gfp.h Patch series "Remove remaining parts of congestion tracking code", v2. This patch (of 11): Various DOC: sections in gfp.h have subsection headers (~~~) but the place where they are included in mm-api.rst does not have section, only chapters. So convert to section headers (---) to avoid confusion. Specifically if sections are added later in mm-api.rst, an error results. Link: https://lkml.kernel.org/r/164549971112.9187.16871723439770288255.stgit@noble.brown Link: https://lkml.kernel.org/r/164549983733.9187.17894407453436115822.stgit@noble.brown Signed-off-by: NeilBrown Cc: Jan Kara Cc: Wu Fengguang Cc: Jaegeuk Kim Cc: Chao Yu Cc: Jeff Layton Cc: Ilya Dryomov Cc: Miklos Szeredi Cc: Trond Myklebust Cc: Anna Schumaker Cc: Ryusuke Konishi Cc: Darrick J. Wong Cc: Philipp Reisner Cc: Lars Ellenberg Cc: Paolo Valente Cc: Jens Axboe Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/gfp.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 80f63c862be5..20f6fbe12993 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -79,7 +79,7 @@ struct vm_area_struct; * DOC: Page mobility and placement hints * * Page mobility and placement hints - * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + * --------------------------------- * * These flags provide hints about how mobile the page is. Pages with similar * mobility are placed within the same pageblocks to minimise problems due @@ -112,7 +112,7 @@ struct vm_area_struct; * DOC: Watermark modifiers * * Watermark modifiers -- controls access to emergency reserves - * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + * ------------------------------------------------------------ * * %__GFP_HIGH indicates that the caller is high-priority and that granting * the request is necessary before the system can make forward progress. @@ -144,7 +144,7 @@ struct vm_area_struct; * DOC: Reclaim modifiers * * Reclaim modifiers - * ~~~~~~~~~~~~~~~~~ + * ----------------- * Please note that all the following flags are only applicable to sleepable * allocations (e.g. %GFP_NOWAIT and %GFP_ATOMIC will ignore them). * @@ -224,7 +224,7 @@ struct vm_area_struct; * DOC: Action modifiers * * Action modifiers - * ~~~~~~~~~~~~~~~~ + * ---------------- * * %__GFP_NOWARN suppresses allocation failure reports. * @@ -256,7 +256,7 @@ struct vm_area_struct; * DOC: Useful GFP flag combinations * * Useful GFP flag combinations - * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + * ---------------------------- * * Useful GFP flag combinations that are commonly used. It is recommended * that subsystems start with one of these combinations and then set/clear -- cgit From 84dacdbd5352bfef82423760fa2e8bffaeef9e05 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 22 Mar 2022 14:38:51 -0700 Subject: mm: document and polish read-ahead code Add some "big-picture" documentation for read-ahead and polish the code to make it fit this documentation. The meaning of ->async_size is clarified to match its name. i.e. Any request to ->readahead() has a sync part and an async part. The caller will wait for the sync pages to complete, but will not wait for the async pages. The first async page is still marked PG_readahead Note that the current function names page_cache_sync_ra() and page_cache_async_ra() are misleading. All ra request are partly sync and partly async, so either part can be empty. A page_cache_sync_ra() request will usually set ->async_size non-zero, implying it is not all synchronous. When a non-zero req_count is passed to page_cache_async_ra(), the implication is that some prefix of the request is synchronous, though the calculation made there is incorrect - I haven't tried to fix it. Link: https://lkml.kernel.org/r/164549983734.9187.11586890887006601405.stgit@noble.brown Signed-off-by: NeilBrown Cc: Anna Schumaker Cc: Chao Yu Cc: Darrick J. Wong Cc: Ilya Dryomov Cc: Jaegeuk Kim Cc: Jan Kara Cc: Jeff Layton Cc: Jens Axboe Cc: Lars Ellenberg Cc: Miklos Szeredi Cc: Paolo Valente Cc: Philipp Reisner Cc: Ryusuke Konishi Cc: Trond Myklebust Cc: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/fs.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index e2d892b201b0..8b5c486bd4a2 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -930,10 +930,15 @@ struct fown_struct { * struct file_ra_state - Track a file's readahead state. * @start: Where the most recent readahead started. * @size: Number of pages read in the most recent readahead. - * @async_size: Start next readahead when this many pages are left. - * @ra_pages: Maximum size of a readahead request. + * @async_size: Numer of pages that were/are not needed immediately + * and so were/are genuinely "ahead". Start next readahead when + * the first of these pages is accessed. + * @ra_pages: Maximum size of a readahead request, copied from the bdi. * @mmap_miss: How many mmap accesses missed in the page cache. * @prev_pos: The last byte in the most recent read request. + * + * When this structure is passed to ->readahead(), the "most recent" + * readahead means the current readahead. */ struct file_ra_state { pgoff_t start; -- cgit From 6df25e58532be7a4cd6fb15bcd85805947402d91 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 22 Mar 2022 14:39:01 -0700 Subject: nfs: remove reliance on bdi congestion The bdi congestion tracking in not widely used and will be removed. NFS is one of a small number of filesystems that uses it, setting just the async (write) congestion flag at what it determines are appropriate times. The only remaining effect of the async flag is to cause (some) WB_SYNC_NONE writes to be skipped. So instead of setting the flag, set an internal flag and change: - .writepages to do nothing if WB_SYNC_NONE and the flag is set - .writepage to return AOP_WRITEPAGE_ACTIVATE if WB_SYNC_NONE and the flag is set. The writepages change causes a behavioural change in that pageout() can now return PAGE_ACTIVATE instead of PAGE_KEEP, so SetPageActive() will be called on the page which (I think) wil further delay the next attempt at writeout. This might be a good thing. Link: https://lkml.kernel.org/r/164549983738.9187.3972219847989393182.stgit@noble.brown Signed-off-by: NeilBrown Cc: Anna Schumaker Cc: Chao Yu Cc: Darrick J. Wong Cc: Ilya Dryomov Cc: Jaegeuk Kim Cc: Jan Kara Cc: Jeff Layton Cc: Jens Axboe Cc: Lars Ellenberg Cc: Miklos Szeredi Cc: Paolo Valente Cc: Philipp Reisner Cc: Ryusuke Konishi Cc: Trond Myklebust Cc: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/nfs_fs_sb.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h index ca0959e51e81..6aa2a200676a 100644 --- a/include/linux/nfs_fs_sb.h +++ b/include/linux/nfs_fs_sb.h @@ -138,6 +138,7 @@ struct nfs_server { struct nlm_host *nlm_host; /* NLM client handle */ struct nfs_iostats __percpu *io_stats; /* I/O statistics */ atomic_long_t writeback; /* number of writeback pages */ + unsigned int write_congested;/* flag set when writeback gets too high */ unsigned int flags; /* various flags */ /* The following are for internal use only. Also see uapi/linux/nfs_mount.h */ -- cgit From fe55d563d4174f13839a9b7ef7309da5031b5d93 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 22 Mar 2022 14:39:07 -0700 Subject: remove inode_congested() inode_congested() reports if the backing-device for the inode is congested. No bdi reports congestion any more, so this always returns 'false'. So remove inode_congested() and related functions, and remove the call sites, assuming that inode_congested() always returns 'false'. Link: https://lkml.kernel.org/r/164549983741.9187.2174285592262191311.stgit@noble.brown Signed-off-by: NeilBrown Cc: Anna Schumaker Cc: Chao Yu Cc: Darrick J. Wong Cc: Ilya Dryomov Cc: Jaegeuk Kim Cc: Jan Kara Cc: Jeff Layton Cc: Jens Axboe Cc: Lars Ellenberg Cc: Miklos Szeredi Cc: Paolo Valente Cc: Philipp Reisner Cc: Ryusuke Konishi Cc: Trond Myklebust Cc: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/backing-dev.h | 22 ---------------------- 1 file changed, 22 deletions(-) (limited to 'include/linux') diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index 483979c1b9f4..860b675c2929 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -162,7 +162,6 @@ struct bdi_writeback *wb_get_create(struct backing_dev_info *bdi, gfp_t gfp); void wb_memcg_offline(struct mem_cgroup *memcg); void wb_blkcg_offline(struct blkcg *blkcg); -int inode_congested(struct inode *inode, int cong_bits); /** * inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode @@ -390,29 +389,8 @@ static inline void wb_blkcg_offline(struct blkcg *blkcg) { } -static inline int inode_congested(struct inode *inode, int cong_bits) -{ - return wb_congested(&inode_to_bdi(inode)->wb, cong_bits); -} - #endif /* CONFIG_CGROUP_WRITEBACK */ -static inline int inode_read_congested(struct inode *inode) -{ - return inode_congested(inode, 1 << WB_sync_congested); -} - -static inline int inode_write_congested(struct inode *inode) -{ - return inode_congested(inode, 1 << WB_async_congested); -} - -static inline int inode_rw_congested(struct inode *inode) -{ - return inode_congested(inode, (1 << WB_sync_congested) | - (1 << WB_async_congested)); -} - static inline int bdi_congested(struct backing_dev_info *bdi, int cong_bits) { return wb_congested(&bdi->wb, cong_bits); -- cgit From b9b1335e640308acc1b8f26c739b804c80a6c147 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 22 Mar 2022 14:39:10 -0700 Subject: remove bdi_congested() and wb_congested() and related functions These functions are no longer useful as no BDIs report congestions any more. Removing the test on bdi_write_contested() in current_may_throttle() could cause a small change in behaviour, but only when PF_LOCAL_THROTTLE is set. So replace the calls by 'false' and simplify the code - and remove the functions. [akpm@linux-foundation.org: fix build] Link: https://lkml.kernel.org/r/164549983742.9187.2570198746005819592.stgit@noble.brown Signed-off-by: NeilBrown Acked-by: Ryusuke Konishi [nilfs] Cc: Anna Schumaker Cc: Chao Yu Cc: Darrick J. Wong Cc: Ilya Dryomov Cc: Jaegeuk Kim Cc: Jan Kara Cc: Jeff Layton Cc: Jens Axboe Cc: Lars Ellenberg Cc: Miklos Szeredi Cc: Paolo Valente Cc: Philipp Reisner Cc: Trond Myklebust Cc: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/backing-dev.h | 26 -------------------------- 1 file changed, 26 deletions(-) (limited to 'include/linux') diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index 860b675c2929..2d764566280c 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -135,11 +135,6 @@ static inline bool writeback_in_progress(struct bdi_writeback *wb) struct backing_dev_info *inode_to_bdi(struct inode *inode); -static inline int wb_congested(struct bdi_writeback *wb, int cong_bits) -{ - return wb->congested & cong_bits; -} - long congestion_wait(int sync, long timeout); static inline bool mapping_can_writeback(struct address_space *mapping) @@ -391,27 +386,6 @@ static inline void wb_blkcg_offline(struct blkcg *blkcg) #endif /* CONFIG_CGROUP_WRITEBACK */ -static inline int bdi_congested(struct backing_dev_info *bdi, int cong_bits) -{ - return wb_congested(&bdi->wb, cong_bits); -} - -static inline int bdi_read_congested(struct backing_dev_info *bdi) -{ - return bdi_congested(bdi, 1 << WB_sync_congested); -} - -static inline int bdi_write_congested(struct backing_dev_info *bdi) -{ - return bdi_congested(bdi, 1 << WB_async_congested); -} - -static inline int bdi_rw_congested(struct backing_dev_info *bdi) -{ - return bdi_congested(bdi, (1 << WB_sync_congested) | - (1 << WB_async_congested)); -} - const char *bdi_dev_name(struct backing_dev_info *bdi); #endif /* _LINUX_BACKING_DEV_H */ -- cgit From a88f2096d5a2d91179db5dd9aa8f60dc3df9bb3e Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Tue, 22 Mar 2022 14:39:19 -0700 Subject: remove congestion tracking framework This framework is no longer used - so discard it. Link: https://lkml.kernel.org/r/164549983747.9187.6171768583526866601.stgit@noble.brown Signed-off-by: NeilBrown Cc: Anna Schumaker Cc: Chao Yu Cc: Darrick J. Wong Cc: Ilya Dryomov Cc: Jaegeuk Kim Cc: Jan Kara Cc: Jeff Layton Cc: Jens Axboe Cc: Lars Ellenberg Cc: Miklos Szeredi Cc: Paolo Valente Cc: Philipp Reisner Cc: Ryusuke Konishi Cc: Trond Myklebust Cc: Wu Fengguang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/backing-dev-defs.h | 8 -------- include/linux/backing-dev.h | 2 -- 2 files changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h index 993c5628a726..e863c88df95f 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -207,14 +207,6 @@ struct backing_dev_info { #endif }; -enum { - BLK_RW_ASYNC = 0, - BLK_RW_SYNC = 1, -}; - -void clear_bdi_congested(struct backing_dev_info *bdi, int sync); -void set_bdi_congested(struct backing_dev_info *bdi, int sync); - struct wb_lock_cookie { bool locked; unsigned long flags; diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index 2d764566280c..87ce24d238f3 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -135,8 +135,6 @@ static inline bool writeback_in_progress(struct bdi_writeback *wb) struct backing_dev_info *inode_to_bdi(struct inode *inode); -long congestion_wait(int sync, long timeout); - static inline bool mapping_can_writeback(struct address_space *mapping) { return inode_to_bdi(mapping->host)->capabilities & BDI_CAP_WRITEBACK; -- cgit From a128b054ce029554a4a52fc3abb8c1df8bafcaef Mon Sep 17 00:00:00 2001 From: Anthony Iliopoulos Date: Tue, 22 Mar 2022 14:39:22 -0700 Subject: mount: warn only once about timestamp range expiration Commit f8b92ba67c5d ("mount: Add mount warning for impending timestamp expiry") introduced a mount warning regarding filesystem timestamp limits, that is printed upon each writable mount or remount. This can result in a lot of unnecessary messages in the kernel log in setups where filesystems are being frequently remounted (or mounted multiple times). Avoid this by setting a superblock flag which indicates that the warning has been emitted at least once for any particular mount, as suggested in [1]. Link: https://lore.kernel.org/CAHk-=wim6VGnxQmjfK_tDg6fbHYKL4EFkmnTjVr9QnRqjDBAeA@mail.gmail.com/ [1] Link: https://lkml.kernel.org/r/20220119202934.26495-1-ailiop@suse.com Signed-off-by: Anthony Iliopoulos Reviewed-by: Christoph Hellwig Acked-by: Christian Brauner Reviewed-by: Darrick J. Wong Cc: Alexander Viro Cc: Deepa Dinamani Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 8b5c486bd4a2..ca9445f6cf3d 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1440,6 +1440,7 @@ extern int send_sigurg(struct fown_struct *fown); #define SB_I_SKIP_SYNC 0x00000100 /* Skip superblock at global sync */ #define SB_I_PERSB_BDI 0x00000200 /* has a per-sb bdi */ +#define SB_I_TS_EXPIRY_WARNED 0x00000400 /* warned about timestamp range expiry */ /* Possible states of 'frozen' field */ enum { -- cgit From eb5279fb7e41804ecc15ed3cf716a9e2b419af57 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Tue, 22 Mar 2022 14:39:28 -0700 Subject: filemap: remove find_get_pages() It's unused now. Remove it and clean up the relevant comment. Link: https://lkml.kernel.org/r/20220208134149.47299-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: Christoph Hellwig Cc: Matthew Wilcox (Oracle) Cc: David Howells Cc: William Kucharski Cc: Vlastimil Babka Cc: Kirill A. Shutemov Cc: Johannes Weiner Cc: Andreas Gruenbacher Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/pagemap.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 270bf5136c34..dc31eb981ea2 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -594,13 +594,6 @@ static inline struct page *find_subpage(struct page *head, pgoff_t index) unsigned find_get_pages_range(struct address_space *mapping, pgoff_t *start, pgoff_t end, unsigned int nr_pages, struct page **pages); -static inline unsigned find_get_pages(struct address_space *mapping, - pgoff_t *start, unsigned int nr_pages, - struct page **pages) -{ - return find_get_pages_range(mapping, start, (pgoff_t)-1, nr_pages, - pages); -} unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t start, unsigned int nr_pages, struct page **pages); unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index, -- cgit From ad6c441266dcd50be080a47e1178a1b15369923c Mon Sep 17 00:00:00 2001 From: John Hubbard Date: Tue, 22 Mar 2022 14:39:43 -0700 Subject: mm/gup: remove unused pin_user_pages_locked() This routine was used for a short while, but then the calling code was refactored and the only caller was removed. Link: https://lkml.kernel.org/r/20220204020010.68930-4-jhubbard@nvidia.com Signed-off-by: John Hubbard Reviewed-by: David Hildenbrand Reviewed-by: Jason Gunthorpe Reviewed-by: Jan Kara Reviewed-by: Christoph Hellwig Reviewed-by: Claudio Imbrenda Cc: Alex Williamson Cc: Andrea Arcangeli Cc: Jason Gunthorpe Cc: Kirill A. Shutemov Cc: Lukas Bulwahn Cc: Matthew Wilcox (Oracle) Cc: Peter Xu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 5744a3fc4716..d4a2b40066fa 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1918,8 +1918,6 @@ long pin_user_pages(unsigned long start, unsigned long nr_pages, struct vm_area_struct **vmas); long get_user_pages_locked(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, int *locked); -long pin_user_pages_locked(unsigned long start, unsigned long nr_pages, - unsigned int gup_flags, struct page **pages, int *locked); long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, struct page **pages, unsigned int gup_flags); long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages, -- cgit From 73fd16d8080f7b1537ba7aa29917f64d6fffa664 Mon Sep 17 00:00:00 2001 From: John Hubbard Date: Tue, 22 Mar 2022 14:39:50 -0700 Subject: mm/gup: remove unused get_user_pages_locked() Now that the last caller of get_user_pages_locked() is gone, remove it. Link: https://lkml.kernel.org/r/20220204020010.68930-6-jhubbard@nvidia.com Signed-off-by: John Hubbard Reviewed-by: Jan Kara Reviewed-by: Jason Gunthorpe Reviewed-by: Claudio Imbrenda Reviewed-by: Christoph Hellwig Cc: Alex Williamson Cc: Andrea Arcangeli Cc: David Hildenbrand Cc: Jason Gunthorpe Cc: Kirill A. Shutemov Cc: Lukas Bulwahn Cc: Matthew Wilcox (Oracle) Cc: Peter Xu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index d4a2b40066fa..c02a8cc16e4f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1916,8 +1916,6 @@ long get_user_pages(unsigned long start, unsigned long nr_pages, long pin_user_pages(unsigned long start, unsigned long nr_pages, unsigned int gup_flags, struct page **pages, struct vm_area_struct **vmas); -long get_user_pages_locked(unsigned long start, unsigned long nr_pages, - unsigned int gup_flags, struct page **pages, int *locked); long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, struct page **pages, unsigned int gup_flags); long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages, -- cgit From f7cd16a55837f37b4c3835a2c646023e4d0f0e04 Mon Sep 17 00:00:00 2001 From: Xavier Roche Date: Tue, 22 Mar 2022 14:39:55 -0700 Subject: tmpfs: support for file creation time Various filesystems (including ext4) now support file creation time. This adds such support for tmpfs-based filesystems. Note that using shmem_getattr() on other file types than regular requires that shmem_is_huge() check type, to stop incorrect HPAGE_PMD_SIZE blksize. [hughd@google.com: three tweaks to creation time patch] Link: https://lkml.kernel.org/r/b954973a-b8d1-cab8-63bd-6ea8063de3@google.com Link: https://lkml.kernel.org/r/20220314211150.GA123458@xavier-xps Link: https://lkml.kernel.org/r/b954973a-b8d1-cab8-63bd-6ea8063de3@google.com Link: https://lkml.kernel.org/r/20220211213628.GA1919658@xavier-xps Signed-off-by: Xavier Roche Signed-off-by: Hugh Dickins Tested-by: Jean Delvare Tested-by: Sylvain Bellone Reported-by: Xavier Grand Reviewed-by: Jean Delvare Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/shmem_fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index e65b80ed09e7..ab51d3cd39bd 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -24,6 +24,7 @@ struct shmem_inode_info { struct shared_policy policy; /* NUMA memory alloc policy */ struct simple_xattrs xattrs; /* list of xattrs */ atomic_t stop_eviction; /* hold when working on inode */ + struct timespec64 i_crtime; /* file creation time */ struct inode vfs_inode; }; -- cgit From a8c49af3be5f0b4e105ef678bcf14ef102c270be Mon Sep 17 00:00:00 2001 From: Yosry Ahmed Date: Tue, 22 Mar 2022 14:40:10 -0700 Subject: memcg: add per-memcg total kernel memory stat Currently memcg stats show several types of kernel memory: kernel stack, page tables, sock, vmalloc, and slab. However, there are other allocations with __GFP_ACCOUNT (or supersets such as GFP_KERNEL_ACCOUNT) that are not accounted in any of those stats, a few examples are: - various kvm allocations (e.g. allocated pages to create vcpus) - io_uring - tmp_page in pipes during pipe_write() - bpf ringbuffers - unix sockets Keeping track of the total kernel memory is essential for the ease of migration from cgroup v1 to v2 as there are large discrepancies between v1's kmem.usage_in_bytes and the sum of the available kernel memory stats in v2. Adding separate memcg stats for all __GFP_ACCOUNT kernel allocations is an impractical maintenance burden as there a lot of those all over the kernel code, with more use cases likely to show up in the future. Therefore, add a "kernel" memcg stat that is analogous to kmem page counter, with added benefits such as using rstat infrastructure which aggregates stats more efficiently. Additionally, this provides a lighter alternative in case the legacy kmem is deprecated in the future [yosryahmed@google.com: v2] Link: https://lkml.kernel.org/r/20220203193856.972500-1-yosryahmed@google.com Link: https://lkml.kernel.org/r/20220201200823.3283171-1-yosryahmed@google.com Signed-off-by: Yosry Ahmed Acked-by: Shakeel Butt Acked-by: Johannes Weiner Cc: Michal Hocko Cc: Muchun Song Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memcontrol.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 0abbd685703b..8612d7dd0859 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -34,6 +34,7 @@ enum memcg_stat_item { MEMCG_SOCK, MEMCG_PERCPU_B, MEMCG_VMALLOC, + MEMCG_KMEM, MEMCG_NR_STAT, }; -- cgit From 486bc7060cb510fa60cb85a013d5ed51ce0fe456 Mon Sep 17 00:00:00 2001 From: Wei Yang Date: Tue, 22 Mar 2022 14:40:16 -0700 Subject: mm/memcg: retrieve parent memcg from css.parent The parent we get from page_counter is correct, while this is two different hierarchy. Let's retrieve the parent memcg from css.parent just like parent_cs(), blkcg_parent(), etc. Link: https://lkml.kernel.org/r/20220201004643.8391-2-richard.weiyang@gmail.com Signed-off-by: Wei Yang Reviewed-by: Muchun Song Acked-by: Michal Hocko Reviewed-by: Roman Gushchin Reviewed-by: Shakeel Butt Cc: Roman Gushchin Cc: Johannes Weiner Cc: Vladimir Davydov Cc: Yang Shi Cc: Suren Baghdasaryan Cc: Vlastimil Babka Cc: Mike Rapoport Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memcontrol.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 8612d7dd0859..ef4b445392a9 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -842,9 +842,7 @@ static inline struct mem_cgroup *lruvec_memcg(struct lruvec *lruvec) */ static inline struct mem_cgroup *parent_mem_cgroup(struct mem_cgroup *memcg) { - if (!memcg->memory.parent) - return NULL; - return mem_cgroup_from_counter(memcg->memory.parent, memory); + return mem_cgroup_from_css(memcg->css.parent); } static inline bool mem_cgroup_is_descendant(struct mem_cgroup *memcg, -- cgit From 6a6b7b77cc0fdc13f50c66c219c8c05500a8dfce Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 22 Mar 2022 14:40:53 -0700 Subject: mm: list_lru: transpose the array of per-node per-memcg lru lists Patch series "Optimize list lru memory consumption", v6. In our server, we found a suspected memory leak problem. The kmalloc-32 consumes more than 6GB of memory. Other kmem_caches consume less than 2GB memory. After our in-depth analysis, the memory consumption of kmalloc-32 slab cache is the cause of list_lru_one allocation. crash> p memcg_nr_cache_ids memcg_nr_cache_ids = $2 = 24574 memcg_nr_cache_ids is very large and memory consumption of each list_lru can be calculated with the following formula. num_numa_node * memcg_nr_cache_ids * 32 (kmalloc-32) There are 4 numa nodes in our system, so each list_lru consumes ~3MB. crash> list super_blocks | wc -l 952 Every mount will register 2 list lrus, one is for inode, another is for dentry. There are 952 super_blocks. So the total memory is 952 * 2 * 3 MB (~5.6GB). But now the number of memory cgroups is less than 500. So I guess more than 12286 memory cgroups have been created on this machine (I do not know why there are so many cgroups, it may be a user's bug or the user really want to do that). Because memcg_nr_cache_ids has not been reduced to a suitable value. It leads to waste a lot of memory. If we want to reduce memcg_nr_cache_ids, we have to *reboot* the server. This is not what we want. In order to reduce memcg_nr_cache_ids, I had posted a patchset [1] to do this. But this did not fundamentally solve the problem. We currently allocate scope for every memcg to be able to tracked on every superblock instantiated in the system, regardless of whether that superblock is even accessible to that memcg. These huge memcg counts come from container hosts where memcgs are confined to just a small subset of the total number of superblocks that instantiated at any given point in time. For these systems with huge container counts, list_lru does not need the capability of tracking every memcg on every superblock. What it comes down to is that the list_lru is only needed for a given memcg if that memcg is instatiating and freeing objects on a given list_lru. As Dave said, "Which makes me think we should be moving more towards 'add the memcg to the list_lru at the first insert' model rather than 'instantiate all at memcg init time just in case'." This patchset aims to optimize the list lru memory consumption from different aspects. I had done a easy test to show the optimization. I create 10k memory cgroups and mount 10k filesystems in the systems. We use free command to show how many memory does the systems comsumes after this operation (There are 2 numa nodes in the system). +-----------------------+------------------------+ | condition | memory consumption | +-----------------------+------------------------+ | without this patchset | 24464 MB | +-----------------------+------------------------+ | after patch 1 | 21957 MB | <--------+ +-----------------------+------------------------+ | | after patch 10 | 6895 MB | | +-----------------------+------------------------+ | | after patch 12 | 4367 MB | | +-----------------------+------------------------+ | | The more the number of nodes, the more obvious the effect---+ BTW, there was a recent discussion [2] on the same issue. [1] https://lore.kernel.org/all/20210428094949.43579-1-songmuchun@bytedance.com/ [2] https://lore.kernel.org/all/20210405054848.GA1077931@in.ibm.com/ This series not only optimizes the memory usage of list_lru but also simplifies the code. This patch (of 16): The current scheme of maintaining per-node per-memcg lru lists looks like: struct list_lru { struct list_lru_node *node; (for each node) struct list_lru_memcg *memcg_lrus; struct list_lru_one *lru[]; (for each memcg) } By effectively transposing the two-dimension array of list_lru_one's structures (per-node per-memcg => per-memcg per-node) it's possible to save some memory and simplify alloc/dealloc paths. The new scheme looks like: struct list_lru { struct list_lru_memcg *mlrus; struct list_lru_per_memcg *mlru[]; (for each memcg) struct list_lru_one node[0]; (for each node) } Memory savings are coming from not only 'struct rcu_head' but also some pointer arrays used to store the pointer to 'struct list_lru_one'. The array is per node and its size is 8 (a pointer) * num_memcgs. So the total size of the arrays is 8 * num_nodes * memcg_nr_cache_ids. After this patch, the size becomes 8 * memcg_nr_cache_ids. Link: https://lkml.kernel.org/r/20220228122126.37293-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220228122126.37293-2-songmuchun@bytedance.com Signed-off-by: Muchun Song Acked-by: Johannes Weiner Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Vladimir Davydov Cc: Shakeel Butt Cc: Yang Shi Cc: Alex Shi Cc: Wei Yang Cc: Dave Chinner Cc: Trond Myklebust Cc: Anna Schumaker Cc: Jaegeuk Kim Cc: Chao Yu Cc: Kari Argillander Cc: Vlastimil Babka Cc: Qi Zheng Cc: Xiongchun Duan Cc: Fam Zheng Cc: Roman Gushchin Cc: Theodore Ts'o Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/list_lru.h | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index 1b5fceb565df..729a27b6ff53 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -31,10 +31,15 @@ struct list_lru_one { long nr_items; }; +struct list_lru_per_memcg { + /* array of per cgroup per node lists, indexed by node id */ + struct list_lru_one node[0]; +}; + struct list_lru_memcg { - struct rcu_head rcu; + struct rcu_head rcu; /* array of per cgroup lists, indexed by memcg_cache_id */ - struct list_lru_one *lru[]; + struct list_lru_per_memcg *mlru[]; }; struct list_lru_node { @@ -42,11 +47,7 @@ struct list_lru_node { spinlock_t lock; /* global list, used for the root cgroup in cgroup aware lrus */ struct list_lru_one lru; -#ifdef CONFIG_MEMCG_KMEM - /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */ - struct list_lru_memcg __rcu *memcg_lrus; -#endif - long nr_items; + long nr_items; } ____cacheline_aligned_in_smp; struct list_lru { @@ -55,6 +56,8 @@ struct list_lru { struct list_head list; int shrinker_id; bool memcg_aware; + /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */ + struct list_lru_memcg __rcu *mlrus; #endif }; -- cgit From 88f2ef73fd66491a2f9a82373d22ca6540f23c62 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 22 Mar 2022 14:40:56 -0700 Subject: mm: introduce kmem_cache_alloc_lru We currently allocate scope for every memcg to be able to tracked on every superblock instantiated in the system, regardless of whether that superblock is even accessible to that memcg. These huge memcg counts come from container hosts where memcgs are confined to just a small subset of the total number of superblocks that instantiated at any given point in time. For these systems with huge container counts, list_lru does not need the capability of tracking every memcg on every superblock. What it comes down to is that adding the memcg to the list_lru at the first insert. So introduce kmem_cache_alloc_lru to allocate objects and its list_lru. In the later patch, we will convert all inode and dentry allocation from kmem_cache_alloc to kmem_cache_alloc_lru. Link: https://lkml.kernel.org/r/20220228122126.37293-3-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: Alex Shi Cc: Anna Schumaker Cc: Chao Yu Cc: Dave Chinner Cc: Fam Zheng Cc: Jaegeuk Kim Cc: Johannes Weiner Cc: Kari Argillander Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Qi Zheng Cc: Roman Gushchin Cc: Shakeel Butt Cc: Theodore Ts'o Cc: Trond Myklebust Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Xiongchun Duan Cc: Yang Shi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/list_lru.h | 4 ++++ include/linux/memcontrol.h | 14 ++++++++++++++ include/linux/slab.h | 3 +++ 3 files changed, 21 insertions(+) (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index 729a27b6ff53..ab912c49334f 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -56,6 +56,8 @@ struct list_lru { struct list_head list; int shrinker_id; bool memcg_aware; + /* protects ->mlrus->mlru[i] */ + spinlock_t lock; /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */ struct list_lru_memcg __rcu *mlrus; #endif @@ -72,6 +74,8 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware, #define list_lru_init_memcg(lru, shrinker) \ __list_lru_init((lru), true, NULL, shrinker) +int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru, + gfp_t gfp); int memcg_update_all_list_lrus(int num_memcgs); void memcg_drain_all_list_lrus(int src_idx, struct mem_cgroup *dst_memcg); diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index ef4b445392a9..b636732f0c14 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -524,6 +524,20 @@ static inline struct mem_cgroup *page_memcg_check(struct page *page) return (struct mem_cgroup *)(memcg_data & ~MEMCG_DATA_FLAGS_MASK); } +static inline struct mem_cgroup *get_mem_cgroup_from_objcg(struct obj_cgroup *objcg) +{ + struct mem_cgroup *memcg; + + rcu_read_lock(); +retry: + memcg = obj_cgroup_memcg(objcg); + if (unlikely(!css_tryget(&memcg->css))) + goto retry; + rcu_read_unlock(); + + return memcg; +} + #ifdef CONFIG_MEMCG_KMEM /* * folio_memcg_kmem - Check if the folio has the memcg_kmem flag set. diff --git a/include/linux/slab.h b/include/linux/slab.h index 5b6193fd8bd9..e6addaf91afd 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -135,6 +135,7 @@ #include +struct list_lru; struct mem_cgroup; /* * struct kmem_cache related prototypes @@ -416,6 +417,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size, void *__kmalloc(size_t size, gfp_t flags) __assume_kmalloc_alignment __alloc_size(1); void *kmem_cache_alloc(struct kmem_cache *s, gfp_t flags) __assume_slab_alignment __malloc; +void *kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru, + gfp_t gfpflags) __assume_slab_alignment __malloc; void kmem_cache_free(struct kmem_cache *s, void *objp); /* -- cgit From 8b9f3ac5b01db85c6cf74c2c3a71280cc3045c9c Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 22 Mar 2022 14:41:00 -0700 Subject: fs: introduce alloc_inode_sb() to allocate filesystems specific inode The allocated inode cache is supposed to be added to its memcg list_lru which should be allocated as well in advance. That can be done by kmem_cache_alloc_lru() which allocates object and list_lru. The file systems is main user of it. So introduce alloc_inode_sb() to allocate file system specific inodes and set up the inode reclaim context properly. The file system is supposed to use alloc_inode_sb() to allocate inodes. In later patches, we will convert all users to the new API. Link: https://lkml.kernel.org/r/20220228122126.37293-4-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: Roman Gushchin Cc: Alex Shi Cc: Anna Schumaker Cc: Chao Yu Cc: Dave Chinner Cc: Fam Zheng Cc: Jaegeuk Kim Cc: Johannes Weiner Cc: Kari Argillander Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Qi Zheng Cc: Shakeel Butt Cc: Theodore Ts'o Cc: Trond Myklebust Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Xiongchun Duan Cc: Yang Shi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/fs.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index ca9445f6cf3d..58a73e59e4c0 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -42,6 +42,7 @@ #include #include #include +#include #include #include @@ -3114,6 +3115,16 @@ extern void free_inode_nonrcu(struct inode *inode); extern int should_remove_suid(struct dentry *); extern int file_remove_privs(struct file *); +/* + * This must be used for allocating filesystems specific inodes to set + * up the inode reclaim context correctly. + */ +static inline void * +alloc_inode_sb(struct super_block *sb, struct kmem_cache *cache, gfp_t gfp) +{ + return kmem_cache_alloc_lru(cache, &sb->s_inode_lru, gfp); +} + extern void __insert_inode_hash(struct inode *, unsigned long hashval); static inline void insert_inode_hash(struct inode *inode) { -- cgit From 9bbdc0f324097f72b2354c2f8be4cdffd32679b6 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 22 Mar 2022 14:41:12 -0700 Subject: xarray: use kmem_cache_alloc_lru to allocate xa_node The workingset will add the xa_node to the shadow_nodes list. So the allocation of xa_node should be done by kmem_cache_alloc_lru(). Using xas_set_lru() to pass the list_lru which we want to insert xa_node into to set up the xa_node reclaim context correctly. Link: https://lkml.kernel.org/r/20220228122126.37293-9-songmuchun@bytedance.com Signed-off-by: Muchun Song Acked-by: Johannes Weiner Acked-by: Roman Gushchin Cc: Alex Shi Cc: Anna Schumaker Cc: Chao Yu Cc: Dave Chinner Cc: Fam Zheng Cc: Jaegeuk Kim Cc: Kari Argillander Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Qi Zheng Cc: Shakeel Butt Cc: Theodore Ts'o Cc: Trond Myklebust Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Xiongchun Duan Cc: Yang Shi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/swap.h | 5 ++++- include/linux/xarray.h | 9 ++++++++- 2 files changed, 12 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 1d38d9475c4d..3db431276d82 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -334,9 +334,12 @@ void workingset_activation(struct folio *folio); /* Only track the nodes of mappings with shadow entries */ void workingset_update_node(struct xa_node *node); +extern struct list_lru shadow_nodes; #define mapping_set_update(xas, mapping) do { \ - if (!dax_mapping(mapping) && !shmem_mapping(mapping)) \ + if (!dax_mapping(mapping) && !shmem_mapping(mapping)) { \ xas_set_update(xas, workingset_update_node); \ + xas_set_lru(xas, &shadow_nodes); \ + } \ } while (0) /* linux/mm/page_alloc.c */ diff --git a/include/linux/xarray.h b/include/linux/xarray.h index d6d5da6ed735..bb52b786be1b 100644 --- a/include/linux/xarray.h +++ b/include/linux/xarray.h @@ -1317,6 +1317,7 @@ struct xa_state { struct xa_node *xa_node; struct xa_node *xa_alloc; xa_update_node_t xa_update; + struct list_lru *xa_lru; }; /* @@ -1336,7 +1337,8 @@ struct xa_state { .xa_pad = 0, \ .xa_node = XAS_RESTART, \ .xa_alloc = NULL, \ - .xa_update = NULL \ + .xa_update = NULL, \ + .xa_lru = NULL, \ } /** @@ -1631,6 +1633,11 @@ static inline void xas_set_update(struct xa_state *xas, xa_update_node_t update) xas->xa_update = update; } +static inline void xas_set_lru(struct xa_state *xas, struct list_lru *lru) +{ + xas->xa_lru = lru; +} + /** * xas_next_entry() - Advance iterator to next present entry. * @xas: XArray operation state. -- cgit From 5abc1e37afa0335c52608d640fd30910b2eeda21 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 22 Mar 2022 14:41:19 -0700 Subject: mm: list_lru: allocate list_lru_one only when needed In our server, we found a suspected memory leak problem. The kmalloc-32 consumes more than 6GB of memory. Other kmem_caches consume less than 2GB memory. After our in-depth analysis, the memory consumption of kmalloc-32 slab cache is the cause of list_lru_one allocation. crash> p memcg_nr_cache_ids memcg_nr_cache_ids = $2 = 24574 memcg_nr_cache_ids is very large and memory consumption of each list_lru can be calculated with the following formula. num_numa_node * memcg_nr_cache_ids * 32 (kmalloc-32) There are 4 numa nodes in our system, so each list_lru consumes ~3MB. crash> list super_blocks | wc -l 952 Every mount will register 2 list lrus, one is for inode, another is for dentry. There are 952 super_blocks. So the total memory is 952 * 2 * 3 MB (~5.6GB). But the number of memory cgroup is less than 500. So I guess more than 12286 containers have been deployed on this machine (I do not know why there are so many containers, it may be a user's bug or the user really want to do that). And memcg_nr_cache_ids has not been reduced to a suitable value. This can waste a lot of memory. Now the infrastructure for dynamic list_lru_one allocation is ready, so remove statically allocated memory code to save memory. Link: https://lkml.kernel.org/r/20220228122126.37293-11-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: Alex Shi Cc: Anna Schumaker Cc: Chao Yu Cc: Dave Chinner Cc: Fam Zheng Cc: Jaegeuk Kim Cc: Johannes Weiner Cc: Kari Argillander Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Qi Zheng Cc: Roman Gushchin Cc: Shakeel Butt Cc: Theodore Ts'o Cc: Trond Myklebust Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Xiongchun Duan Cc: Yang Shi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/list_lru.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index ab912c49334f..c36db6dc2a65 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -32,14 +32,15 @@ struct list_lru_one { }; struct list_lru_per_memcg { + struct rcu_head rcu; /* array of per cgroup per node lists, indexed by node id */ - struct list_lru_one node[0]; + struct list_lru_one node[]; }; struct list_lru_memcg { struct rcu_head rcu; /* array of per cgroup lists, indexed by memcg_cache_id */ - struct list_lru_per_memcg *mlru[]; + struct list_lru_per_memcg __rcu *mlru[]; }; struct list_lru_node { @@ -77,7 +78,7 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware, int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru, gfp_t gfp); int memcg_update_all_list_lrus(int num_memcgs); -void memcg_drain_all_list_lrus(int src_idx, struct mem_cgroup *dst_memcg); +void memcg_drain_all_list_lrus(struct mem_cgroup *src, struct mem_cgroup *dst); /** * list_lru_add: add an element to the lru list's tail -- cgit From 1f391eb270791359ee79031945dbe3afeaec6ce3 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 22 Mar 2022 14:41:22 -0700 Subject: mm: list_lru: rename memcg_drain_all_list_lrus to memcg_reparent_list_lrus The purpose of the memcg_drain_all_list_lrus() is list_lrus reparenting. It is very similar to memcg_reparent_objcgs(). Rename it to memcg_reparent_list_lrus() so that the name can more consistent with memcg_reparent_objcgs(). Link: https://lkml.kernel.org/r/20220228122126.37293-12-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: Alex Shi Cc: Anna Schumaker Cc: Chao Yu Cc: Dave Chinner Cc: Fam Zheng Cc: Jaegeuk Kim Cc: Johannes Weiner Cc: Kari Argillander Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Qi Zheng Cc: Roman Gushchin Cc: Shakeel Butt Cc: Theodore Ts'o Cc: Trond Myklebust Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Xiongchun Duan Cc: Yang Shi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/list_lru.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index c36db6dc2a65..4b00fd8cb373 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -78,7 +78,7 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware, int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru, gfp_t gfp); int memcg_update_all_list_lrus(int num_memcgs); -void memcg_drain_all_list_lrus(struct mem_cgroup *src, struct mem_cgroup *dst); +void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent); /** * list_lru_add: add an element to the lru list's tail -- cgit From bbca91cca9a902de2e9907370e9c1e0a3d1aab0f Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 22 Mar 2022 14:41:25 -0700 Subject: mm: list_lru: replace linear array with xarray If we run 10k containers in the system, the size of the list_lru_memcg->lrus can be ~96KB per list_lru. When we decrease the number containers, the size of the array will not be shrinked. It is not scalable. The xarray is a good choice for this case. We can save a lot of memory when there are tens of thousands continers in the system. If we use xarray, we also can remove the logic code of resizing array, which can simplify the code. [akpm@linux-foundation.org: remove unused local] Link: https://lkml.kernel.org/r/20220228122126.37293-13-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: Alex Shi Cc: Anna Schumaker Cc: Chao Yu Cc: Dave Chinner Cc: Fam Zheng Cc: Jaegeuk Kim Cc: Johannes Weiner Cc: Kari Argillander Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Qi Zheng Cc: Roman Gushchin Cc: Shakeel Butt Cc: Theodore Ts'o Cc: Trond Myklebust Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Xiongchun Duan Cc: Yang Shi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/list_lru.h | 13 ++----------- include/linux/memcontrol.h | 23 ----------------------- 2 files changed, 2 insertions(+), 34 deletions(-) (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index 4b00fd8cb373..572c263561ac 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -11,6 +11,7 @@ #include #include #include +#include struct mem_cgroup; @@ -37,12 +38,6 @@ struct list_lru_per_memcg { struct list_lru_one node[]; }; -struct list_lru_memcg { - struct rcu_head rcu; - /* array of per cgroup lists, indexed by memcg_cache_id */ - struct list_lru_per_memcg __rcu *mlru[]; -}; - struct list_lru_node { /* protects all lists on the node, including per cgroup */ spinlock_t lock; @@ -57,10 +52,7 @@ struct list_lru { struct list_head list; int shrinker_id; bool memcg_aware; - /* protects ->mlrus->mlru[i] */ - spinlock_t lock; - /* for cgroup aware lrus points to per cgroup lists, otherwise NULL */ - struct list_lru_memcg __rcu *mlrus; + struct xarray xa; #endif }; @@ -77,7 +69,6 @@ int __list_lru_init(struct list_lru *lru, bool memcg_aware, int memcg_list_lru_alloc(struct mem_cgroup *memcg, struct list_lru *lru, gfp_t gfp); -int memcg_update_all_list_lrus(int num_memcgs); void memcg_reparent_list_lrus(struct mem_cgroup *memcg, struct mem_cgroup *parent); /** diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index b636732f0c14..066b7a3b8a9e 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1685,18 +1685,6 @@ void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size); extern struct static_key_false memcg_kmem_enabled_key; -extern int memcg_nr_cache_ids; -void memcg_get_cache_ids(void); -void memcg_put_cache_ids(void); - -/* - * Helper macro to loop through all memcg-specific caches. Callers must still - * check if the cache is valid (it is either valid or NULL). - * the slab_mutex must be held when looping through those caches - */ -#define for_each_memcg_cache_index(_idx) \ - for ((_idx) = 0; (_idx) < memcg_nr_cache_ids; (_idx)++) - static inline bool memcg_kmem_enabled(void) { return static_branch_likely(&memcg_kmem_enabled_key); @@ -1753,9 +1741,6 @@ static inline void __memcg_kmem_uncharge_page(struct page *page, int order) { } -#define for_each_memcg_cache_index(_idx) \ - for (; NULL; ) - static inline bool memcg_kmem_enabled(void) { return false; @@ -1766,14 +1751,6 @@ static inline int memcg_cache_id(struct mem_cgroup *memcg) return -1; } -static inline void memcg_get_cache_ids(void) -{ -} - -static inline void memcg_put_cache_ids(void) -{ -} - static inline struct mem_cgroup *mem_cgroup_from_obj(void *p) { return NULL; -- cgit From d70110704d2d52a65a9fe43be409d8c3acce79fe Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 22 Mar 2022 14:41:35 -0700 Subject: mm: list_lru: rename list_lru_per_memcg to list_lru_memcg The name of list_lru_memcg was occupied before and became free since last commit. Rename list_lru_per_memcg to list_lru_memcg since the name is brief. Link: https://lkml.kernel.org/r/20220228122126.37293-16-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: Alex Shi Cc: Anna Schumaker Cc: Chao Yu Cc: Dave Chinner Cc: Fam Zheng Cc: Jaegeuk Kim Cc: Johannes Weiner Cc: Kari Argillander Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Qi Zheng Cc: Roman Gushchin Cc: Shakeel Butt Cc: Theodore Ts'o Cc: Trond Myklebust Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Xiongchun Duan Cc: Yang Shi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/list_lru.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/list_lru.h b/include/linux/list_lru.h index 572c263561ac..b35968ee9fb5 100644 --- a/include/linux/list_lru.h +++ b/include/linux/list_lru.h @@ -32,7 +32,7 @@ struct list_lru_one { long nr_items; }; -struct list_lru_per_memcg { +struct list_lru_memcg { struct rcu_head rcu; /* array of per cgroup per node lists, indexed by node id */ struct list_lru_one node[]; -- cgit From 7c52f65de40ff0e44f821559eb61931d368a4c48 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 22 Mar 2022 14:41:38 -0700 Subject: mm: memcontrol: rename memcg_cache_id to memcg_kmem_id The memcg_cache_id() introduced by commit 2633d7a02823 ("slab/slub: consider a memcg parameter in kmem_create_cache") is used to index in the kmem_cache->memcg_params->memcg_caches array. Since kmem_cache->memcg_params.memcg_caches has been removed by commit 9855609bde03 ("mm: memcg/slab: use a single set of kmem_caches for all accounted allocations"). So the name does not need to reflect cache related. Just rename it to memcg_kmem_id. And it can reflect kmem related. Link: https://lkml.kernel.org/r/20220228122126.37293-17-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: Alex Shi Cc: Anna Schumaker Cc: Chao Yu Cc: Dave Chinner Cc: Fam Zheng Cc: Jaegeuk Kim Cc: Johannes Weiner Cc: Kari Argillander Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Qi Zheng Cc: Roman Gushchin Cc: Shakeel Butt Cc: Theodore Ts'o Cc: Trond Myklebust Cc: Vladimir Davydov Cc: Vlastimil Babka Cc: Wei Yang Cc: Xiongchun Duan Cc: Yang Shi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memcontrol.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 066b7a3b8a9e..a68dce3873fc 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1708,7 +1708,7 @@ static inline void memcg_kmem_uncharge_page(struct page *page, int order) * A helper for accessing memcg's kmem_id, used for getting * corresponding LRU lists. */ -static inline int memcg_cache_id(struct mem_cgroup *memcg) +static inline int memcg_kmem_id(struct mem_cgroup *memcg) { return memcg ? memcg->kmemcg_id : -1; } @@ -1746,7 +1746,7 @@ static inline bool memcg_kmem_enabled(void) return false; } -static inline int memcg_cache_id(struct mem_cgroup *memcg) +static inline int memcg_kmem_id(struct mem_cgroup *memcg) { return -1; } -- cgit From 16785bd7743104d57257a455001172b75afa7614 Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Tue, 22 Mar 2022 14:41:47 -0700 Subject: mm: merge pte_mkhuge() call into arch_make_huge_pte() Each call into pte_mkhuge() is invariably followed by arch_make_huge_pte(). Instead arch_make_huge_pte() can accommodate pte_mkhuge() at the beginning. This updates generic fallback stub for arch_make_huge_pte() and available platforms definitions. This makes huge pte creation much cleaner and easier to follow. Link: https://lkml.kernel.org/r/1643860669-26307-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual Reviewed-by: Christophe Leroy Acked-by: Mike Kravetz Acked-by: Catalin Marinas Cc: Will Deacon Cc: Michael Ellerman Cc: Paul Mackerras Cc: "David S. Miller" Cc: Mike Kravetz Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/hugetlb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index d1897a69c540..52c462390aee 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -754,7 +754,7 @@ static inline void arch_clear_hugepage_flags(struct page *page) { } static inline pte_t arch_make_huge_pte(pte_t entry, unsigned int shift, vm_flags_t flags) { - return entry; + return pte_mkhuge(entry); } #endif -- cgit From ff11a7ce1f0f8c1e7870de26860024b4ddbf5755 Mon Sep 17 00:00:00 2001 From: Bang Li Date: Tue, 22 Mar 2022 14:43:02 -0700 Subject: mm/vmalloc: fix comments about vmap_area struct The vmap_area_root should be in the "busy" tree and the free_vmap_area_root should be in the "free" tree. Link: https://lkml.kernel.org/r/20220305011510.33596-1-libang.linuxer@gmail.com Fixes: 688fcbfc06e4 ("mm/vmalloc: modify struct vmap_area to reduce its size") Signed-off-by: Bang Li Reviewed-by: Uladzislau Rezki (Sony) Cc: Pengfei Li Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/vmalloc.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 880227b9f044..05065915edd7 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -80,8 +80,8 @@ struct vmap_area { /* * The following two variables can be packed, because * a vmap_area object can be either: - * 1) in "free" tree (root is vmap_area_root) - * 2) or "busy" tree (root is free_vmap_area_root) + * 1) in "free" tree (root is free_vmap_area_root) + * 2) or "busy" tree (root is vmap_area_root) */ union { unsigned long subtree_max_size; /* in "free" tree */ -- cgit From 1dd214b8f21ca46d5431be9b2db8513c59e07a26 Mon Sep 17 00:00:00 2001 From: Zi Yan Date: Tue, 22 Mar 2022 14:43:05 -0700 Subject: mm: page_alloc: avoid merging non-fallbackable pageblocks with others This is done in addition to MIGRATE_ISOLATE pageblock merge avoidance. It prepares for the upcoming removal of the MAX_ORDER-1 alignment requirement for CMA and alloc_contig_range(). MIGRATE_HIGHATOMIC should not merge with other migratetypes like MIGRATE_ISOLATE and MIGRARTE_CMA[1], so this commit prevents that too. Remove MIGRATE_CMA and MIGRATE_ISOLATE from fallbacks list, since they are never used. [1] https://lore.kernel.org/linux-mm/20211130100853.GP3366@techsingularity.net/ Link: https://lkml.kernel.org/r/20220124175957.1261961-1-zi.yan@sent.com Signed-off-by: Zi Yan Acked-by: Mel Gorman Acked-by: David Hildenbrand Acked-by: Vlastimil Babka Acked-by: Mike Rapoport Reviewed-by: Oscar Salvador Cc: Mike Rapoport Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mmzone.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index aed44e9b5d89..71b77aab748d 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -83,6 +83,17 @@ static inline bool is_migrate_movable(int mt) return is_migrate_cma(mt) || mt == MIGRATE_MOVABLE; } +/* + * Check whether a migratetype can be merged with another migratetype. + * + * It is only mergeable when it can fall back to other migratetypes for + * allocation. See fallbacks[MIGRATE_TYPES][3] in page_alloc.c. + */ +static inline bool migratetype_is_mergeable(int mt) +{ + return mt < MIGRATE_PCPTYPES; +} + #define for_each_migratetype_order(order, type) \ for (order = 0; order < MAX_ORDER; order++) \ for (type = 0; type < MIGRATE_TYPES; type++) -- cgit From 7f37e49cbd60ef71b82d25cd55b039a65d06387c Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Tue, 22 Mar 2022 14:43:11 -0700 Subject: mm/mmzone.h: remove unused macros Remove pgdat_page_nr, nid_page_nr and NODE_MEM_MAP. They are unused now. Link: https://lkml.kernel.org/r/20220127093210.62293-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: David Hildenbrand Reviewed-by: Mike Rapoport Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mmzone.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 71b77aab748d..c9e6a50109b9 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -931,12 +931,6 @@ typedef struct pglist_data { #define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages) #define node_spanned_pages(nid) (NODE_DATA(nid)->node_spanned_pages) -#ifdef CONFIG_FLATMEM -#define pgdat_page_nr(pgdat, pagenr) ((pgdat)->node_mem_map + (pagenr)) -#else -#define pgdat_page_nr(pgdat, pagenr) pfn_to_page((pgdat)->node_start_pfn + (pagenr)) -#endif -#define nid_page_nr(nid, pagenr) pgdat_page_nr(NODE_DATA(nid),(pagenr)) #define node_start_pfn(nid) (NODE_DATA(nid)->node_start_pfn) #define node_end_pfn(nid) pgdat_end_pfn(NODE_DATA(nid)) @@ -1112,7 +1106,6 @@ static inline struct pglist_data *NODE_DATA(int nid) { return &contig_page_data; } -#define NODE_MEM_MAP(nid) mem_map #else /* CONFIG_NUMA */ -- cgit From e16faf26780fc0c8dd693ea9ee8420a7706cb2f5 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Tue, 22 Mar 2022 14:43:17 -0700 Subject: cma: factor out minimum alignment requirement Patch series "mm: enforce pageblock_order < MAX_ORDER". Having pageblock_order >= MAX_ORDER seems to be able to happen in corner cases and some parts of the kernel are not prepared for it. For example, Aneesh has shown [1] that such kernels can be compiled on ppc64 with 64k base pages by setting FORCE_MAX_ZONEORDER=8, which will run into a WARN_ON_ONCE(order >= MAX_ORDER) in comapction code right during boot. We can get pageblock_order >= MAX_ORDER when the default hugetlb size is bigger than the maximum allocation granularity of the buddy, in which case we are no longer talking about huge pages but instead gigantic pages. Having pageblock_order >= MAX_ORDER can only make alloc_contig_range() of such gigantic pages more likely to succeed. Reliable use of gigantic pages either requires boot time allcoation or CMA, no need to overcomplicate some places in the kernel to optimize for corner cases that are broken in other areas of the kernel. This patch (of 2): Let's enforce pageblock_order < MAX_ORDER and simplify. Especially patch #1 can be regarded a cleanup before: [PATCH v5 0/6] Use pageblock_order for cma and alloc_contig_range alignment. [2] [1] https://lkml.kernel.org/r/87r189a2ks.fsf@linux.ibm.com [2] https://lkml.kernel.org/r/20220211164135.1803616-1-zi.yan@sent.com Link: https://lkml.kernel.org/r/20220214174132.219303-2-david@redhat.com Signed-off-by: David Hildenbrand Reviewed-by: Zi Yan Acked-by: Rob Herring Cc: Aneesh Kumar K.V Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Frank Rowand Cc: Michael S. Tsirkin Cc: Christoph Hellwig Cc: Marek Szyprowski Cc: Robin Murphy Cc: Minchan Kim Cc: Vlastimil Babka Cc: John Garry via iommu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/cma.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cma.h b/include/linux/cma.h index bd801023504b..75fe188ec4a1 100644 --- a/include/linux/cma.h +++ b/include/linux/cma.h @@ -20,6 +20,15 @@ #define CMA_MAX_NAME 64 +/* + * TODO: once the buddy -- especially pageblock merging and alloc_contig_range() + * -- can deal with only some pageblocks of a higher-order page being + * MIGRATE_CMA, we can use pageblock_nr_pages. + */ +#define CMA_MIN_ALIGNMENT_PAGES max_t(phys_addr_t, MAX_ORDER_NR_PAGES, \ + pageblock_nr_pages) +#define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES) + struct cma; extern unsigned long totalcma_pages; -- cgit From b3d40a2b6d10c9d0424d2b398bf962fb6adad87e Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Tue, 22 Mar 2022 14:43:20 -0700 Subject: mm: enforce pageblock_order < MAX_ORDER Some places in the kernel don't really expect pageblock_order >= MAX_ORDER, and it looks like this is only possible in corner cases: 1) CONFIG_DEFERRED_STRUCT_PAGE_INIT we'll end up freeing pageblock_order pages via __free_pages_core(), which cannot possibly work. 2) find_zone_movable_pfns_for_nodes() will roundup the ZONE_MOVABLE start PFN to MAX_ORDER_NR_PAGES. Consequently with a bigger pageblock_order, we could have a single pageblock partially managed by two zones. 3) compaction code runs into __fragmentation_index() with order >= MAX_ORDER, when checking WARN_ON_ONCE(order >= MAX_ORDER). [1] 4) mm/page_reporting.c won't be reporting any pages with default page_reporting_order == pageblock_order, as we'll be skipping the reporting loop inside page_reporting_process_zone(). 5) __rmqueue_fallback() will never be able to steal with ALLOC_NOFRAGMENT. pageblock_order >= MAX_ORDER is weird either way: it's a pure optimization for making alloc_contig_range(), as used for allcoation of gigantic pages, a little more reliable to succeed. However, if there is demand for somewhat reliable allocation of gigantic pages, affected setups should be using CMA or boottime allocations instead. So let's make sure that pageblock_order < MAX_ORDER and simplify. [1] https://lkml.kernel.org/r/87r189a2ks.fsf@linux.ibm.com Link: https://lkml.kernel.org/r/20220214174132.219303-3-david@redhat.com Signed-off-by: David Hildenbrand Reviewed-by: Zi Yan Cc: Aneesh Kumar K.V Cc: Benjamin Herrenschmidt Cc: Christoph Hellwig Cc: Frank Rowand Cc: John Garry via iommu Cc: Marek Szyprowski Cc: Michael Ellerman Cc: Michael S. Tsirkin Cc: Minchan Kim Cc: Paul Mackerras Cc: Rob Herring Cc: Robin Murphy Cc: Vlastimil Babka Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/cma.h | 3 +-- include/linux/pageblock-flags.h | 7 +++++-- 2 files changed, 6 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cma.h b/include/linux/cma.h index 75fe188ec4a1..b1ba94f1cc9c 100644 --- a/include/linux/cma.h +++ b/include/linux/cma.h @@ -25,8 +25,7 @@ * -- can deal with only some pageblocks of a higher-order page being * MIGRATE_CMA, we can use pageblock_nr_pages. */ -#define CMA_MIN_ALIGNMENT_PAGES max_t(phys_addr_t, MAX_ORDER_NR_PAGES, \ - pageblock_nr_pages) +#define CMA_MIN_ALIGNMENT_PAGES MAX_ORDER_NR_PAGES #define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES) struct cma; diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h index 973fd731a520..83c7248053a1 100644 --- a/include/linux/pageblock-flags.h +++ b/include/linux/pageblock-flags.h @@ -37,8 +37,11 @@ extern unsigned int pageblock_order; #else /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */ -/* Huge pages are a constant size */ -#define pageblock_order HUGETLB_PAGE_ORDER +/* + * Huge pages are a constant size, but don't exceed the maximum allocation + * granularity. + */ +#define pageblock_order min_t(unsigned int, HUGETLB_PAGE_ORDER, MAX_ORDER - 1) #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */ -- cgit From 1ca75fa7f19d694c58af681fa023295072b03120 Mon Sep 17 00:00:00 2001 From: Oscar Salvador Date: Tue, 22 Mar 2022 14:43:51 -0700 Subject: arch/x86/mm/numa: Do not initialize nodes twice On x86, prior to ("mm: handle uninitialized numa nodes gracecully"), NUMA nodes could be allocated at three different places. - numa_register_memblks - init_cpu_to_node - init_gi_nodes All these calls happen at setup_arch, and have the following order: setup_arch ... x86_numa_init numa_init numa_register_memblks ... init_cpu_to_node init_memory_less_node alloc_node_data free_area_init_memoryless_node init_gi_nodes init_memory_less_node alloc_node_data free_area_init_memoryless_node numa_register_memblks() is only interested in those nodes which have memory, so it skips over any memoryless node it founds. Later on, when we have read ACPI's SRAT table, we call init_cpu_to_node() and init_gi_nodes(), which initialize any memoryless node we might have that have either CPU or Initiator affinity, meaning we allocate pg_data_t struct for them and we mark them as ONLINE. So far so good, but the thing is that after ("mm: handle uninitialized numa nodes gracefully"), we allocate all possible NUMA nodes in free_area_init(), meaning we have a picture like the following: setup_arch x86_numa_init numa_init numa_register_memblks <-- allocate non-memoryless node x86_init.paging.pagetable_init ... free_area_init free_area_init_memoryless <-- allocate memoryless node init_cpu_to_node alloc_node_data <-- allocate memoryless node with CPU free_area_init_memoryless_node init_gi_nodes alloc_node_data <-- allocate memoryless node with Initiator free_area_init_memoryless_node free_area_init() already allocates all possible NUMA nodes, but init_cpu_to_node() and init_gi_nodes() are clueless about that, so they go ahead and allocate a new pg_data_t struct without checking anything, meaning we end up allocating twice. It should be mad clear that this only happens in the case where memoryless NUMA node happens to have a CPU/Initiator affinity. So get rid of init_memory_less_node() and just set the node online. Note that setting the node online is needed, otherwise we choke down the chain when bringup_nonboot_cpus() ends up calling __try_online_node()->register_one_node()->... and we blow up in bus_add_device(). As can be seen here: BUG: kernel NULL pointer dereference, address: 0000000000000060 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC PTI CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc4-1-default+ #45 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/4 RIP: 0010:bus_add_device+0x5a/0x140 Code: 8b 74 24 20 48 89 df e8 84 96 ff ff 85 c0 89 c5 75 38 48 8b 53 50 48 85 d2 0f 84 bb 00 004 RSP: 0000:ffffc9000022bd10 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888100987400 RCX: ffff8881003e4e19 RDX: ffff8881009a5e00 RSI: ffff888100987400 RDI: ffff888100987400 RBP: 0000000000000000 R08: ffff8881003e4e18 R09: ffff8881003e4c98 R10: 0000000000000000 R11: ffff888100402bc0 R12: ffffffff822ceba0 R13: 0000000000000000 R14: ffff888100987400 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88853fc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000060 CR3: 000000000200a001 CR4: 00000000001706b0 Call Trace: device_add+0x4c0/0x910 __register_one_node+0x97/0x2d0 __try_online_node+0x85/0xc0 try_online_node+0x25/0x40 cpu_up+0x4f/0x100 bringup_nonboot_cpus+0x4f/0x60 smp_init+0x26/0x79 kernel_init_freeable+0x130/0x2f1 kernel_init+0x17/0x150 ret_from_fork+0x22/0x30 The reason is simple, by the time bringup_nonboot_cpus() gets called, we did not register the node_subsys bus yet, so we crash when bus_add_device() tries to dereference bus()->p. The following shows the order of the calls: kernel_init_freeable smp_init bringup_nonboot_cpus ... bus_add_device() <- we did not register node_subsys yet do_basic_setup do_initcalls postcore_initcall(register_node_type); register_node_type subsys_system_register subsys_register bus_register <- register node_subsys bus Why setting the node online saves us then? Well, simply because __try_online_node() backs off when the node is online, meaning we do not end up calling register_one_node() in the first place. This is subtle, broken and deserves a deep analysis and thought about how to put this into shape, but for now let us have this easy fix for the leaking memory issue. [osalvador@suse.de: add comments] Link: https://lkml.kernel.org/r/20220221142649.3457-1-osalvador@suse.de Link: https://lkml.kernel.org/r/20220218224302.5282-2-osalvador@suse.de Fixes: da4490c958ad ("mm: handle uninitialized numa nodes gracefully") Signed-off-by: Oscar Salvador Acked-by: Michal Hocko Cc: David Hildenbrand Cc: Rafael Aquini Cc: Dave Hansen Cc: Wei Yang Cc: Dennis Zhou Cc: Alexey Makhalov Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index c02a8cc16e4f..d0978235775f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2449,7 +2449,6 @@ static inline spinlock_t *pud_lock(struct mm_struct *mm, pud_t *pud) } extern void __init pagecache_init(void); -extern void __init free_area_init_memoryless_node(int nid); extern void free_initmem(void); /* -- cgit From 888af2701db79b9b27c7e37f9ede528a5ca53b76 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Tue, 22 Mar 2022 14:44:44 -0700 Subject: mm/memory-failure.c: fix race with changing page compound again Patch series "A few fixup patches for memory failure", v2. This series contains a few patches to fix the race with changing page compound page, make non-LRU movable pages unhandlable and so on. More details can be found in the respective changelogs. There is a race window where we got the compound_head, the hugetlb page could be freed to buddy, or even changed to another compound page just before we try to get hwpoison page. Think about the below race window: CPU 1 CPU 2 memory_failure_hugetlb struct page *head = compound_head(p); hugetlb page might be freed to buddy, or even changed to another compound page. get_hwpoison_page -- page is not what we want now... If this race happens, just bail out. Also MF_MSG_DIFFERENT_PAGE_SIZE is introduced to record this event. [akpm@linux-foundation.org: s@/**@/*@, per Naoya Horiguchi] Link: https://lkml.kernel.org/r/20220312074613.4798-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220312074613.4798-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Acked-by: Naoya Horiguchi Cc: Tony Luck Cc: Borislav Petkov Cc: Mike Kravetz Cc: Yang Shi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index d0978235775f..45a449e8c209 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3239,6 +3239,7 @@ enum mf_action_page_type { MF_MSG_BUDDY, MF_MSG_DAX, MF_MSG_UNSPLIT_THP, + MF_MSG_DIFFERENT_PAGE_SIZE, MF_MSG_UNKNOWN, }; -- cgit From 1e7a8181640a620300e98e22223ca3445b349840 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Tue, 22 Mar 2022 14:44:53 -0700 Subject: mm, fault-injection: declare should_fail_alloc_page() The mm/ directory can almost fully be built with W=1, which would help in local development. One remaining issue is missing prototype for should_fail_alloc_page(). Thus add it next to the should_failslab() prototype. Note the previous attempt by commit f7173090033c ("mm/page_alloc: make should_fail_alloc_page() static") had to be reverted by commit 54aa386661fe as it caused an unresolved symbol error with CONFIG_DEBUG_INFO_BTF=y Link: https://lkml.kernel.org/r/20220314165724.16071-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka Cc: Mel Gorman Cc: Matthew Wilcox Cc: David Hildenbrand Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/fault-inject.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fault-inject.h b/include/linux/fault-inject.h index e525f6957c49..2d04f6448cde 100644 --- a/include/linux/fault-inject.h +++ b/include/linux/fault-inject.h @@ -64,6 +64,8 @@ static inline struct dentry *fault_create_debugfs_attr(const char *name, struct kmem_cache; +bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order); + int should_failslab(struct kmem_cache *s, gfp_t gfpflags); #ifdef CONFIG_FAILSLAB extern bool __should_failslab(struct kmem_cache *s, gfp_t gfpflags); -- cgit From e7d324850bfcb30df563d144c0363cc44595277d Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 22 Mar 2022 14:45:00 -0700 Subject: mm: hugetlb: free the 2nd vmemmap page associated with each HugeTLB page Patch series "Free the 2nd vmemmap page associated with each HugeTLB page", v7. This series can minimize the overhead of struct page for 2MB HugeTLB pages significantly. It further reduces the overhead of struct page by 12.5% for a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB HugeTLB. It is a nice gain. Comments and reviews are welcome. Thanks. The main implementation and details can refer to the commit log of patch 1. In this series, I have changed the following four helpers, the following table shows the impact of the overhead of those helpers. +------------------+-----------------------+ | APIs | head page | tail page | +------------------+-----------+-----------+ | PageHead() | Y | N | +------------------+-----------+-----------+ | PageTail() | Y | N | +------------------+-----------+-----------+ | PageCompound() | N | N | +------------------+-----------+-----------+ | compound_head() | Y | N | +------------------+-----------+-----------+ Y: Overhead is increased. N: Overhead is _NOT_ increased. It shows that the overhead of those helpers on a tail page don't change between "hugetlb_free_vmemmap=on" and "hugetlb_free_vmemmap=off". But the overhead on a head page will be increased when "hugetlb_free_vmemmap=on" (except PageCompound()). So I believe that Matthew Wilcox's folio series will help with this. The users of PageHead() and PageTail() are much less than compound_head() and most users of PageTail() are VM_BUG_ON(), so I have done some tests about the overhead of compound_head() on head pages. I have tested the overhead of calling compound_head() on a head page, which is 2.11ns (Measure the call time of 10 million times compound_head(), and then average). For a head page whose address is not aligned with PAGE_SIZE or a non-compound page, the overhead of compound_head() is 2.54ns which is increased by 20%. For a head page whose address is aligned with PAGE_SIZE, the overhead of compound_head() is 2.97ns which is increased by 40%. Most pages are the former. I do not think the overhead is significant since the overhead of compound_head() itself is low. This patch (of 5): This patch minimizes the overhead of struct page for 2MB HugeTLB pages significantly. It further reduces the overhead of struct page by 12.5% for a 2MB HugeTLB compared to the previous approach, which means 2GB per 1TB HugeTLB (2MB type). After the feature of "Free sonme vmemmap pages of HugeTLB page" is enabled, the mapping of the vmemmap addresses associated with a 2MB HugeTLB page becomes the figure below. HugeTLB struct pages(8 pages) page frame(8 pages) +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+---> PG_head | | | 0 | -------------> | 0 | | | +-----------+ +-----------+ | | | 1 | -------------> | 1 | | | +-----------+ +-----------+ | | | 2 | ----------------^ ^ ^ ^ ^ ^ | | +-----------+ | | | | | | | | 3 | ------------------+ | | | | | | +-----------+ | | | | | | | 4 | --------------------+ | | | | 2MB | +-----------+ | | | | | | 5 | ----------------------+ | | | | +-----------+ | | | | | 6 | ------------------------+ | | | +-----------+ | | | | 7 | --------------------------+ | | +-----------+ | | | | | | +-----------+ As we can see, the 2nd vmemmap page frame (indexed by 1) is reused and remaped. However, the 2nd vmemmap page frame is also can be freed to the buddy allocator, then we can change the mapping from the figure above to the figure below. HugeTLB struct pages(8 pages) page frame(8 pages) +-----------+ ---virt_to_page---> +-----------+ mapping to +-----------+---> PG_head | | | 0 | -------------> | 0 | | | +-----------+ +-----------+ | | | 1 | ---------------^ ^ ^ ^ ^ ^ ^ | | +-----------+ | | | | | | | | | 2 | -----------------+ | | | | | | | +-----------+ | | | | | | | | 3 | -------------------+ | | | | | | +-----------+ | | | | | | | 4 | ---------------------+ | | | | 2MB | +-----------+ | | | | | | 5 | -----------------------+ | | | | +-----------+ | | | | | 6 | -------------------------+ | | | +-----------+ | | | | 7 | ---------------------------+ | | +-----------+ | | | | | | +-----------+ After we do this, all tail vmemmap pages (1-7) are mapped to the head vmemmap page frame (0). In other words, there are more than one page struct with PG_head associated with each HugeTLB page. We __know__ that there is only one head page struct, the tail page structs with PG_head are fake head page structs. We need an approach to distinguish between those two different types of page structs so that compound_head(), PageHead() and PageTail() can work properly if the parameter is the tail page struct but with PG_head. The following code snippet describes how to distinguish between real and fake head page struct. if (test_bit(PG_head, &page->flags)) { unsigned long head = READ_ONCE(page[1].compound_head); if (head & 1) { if (head == (unsigned long)page + 1) ==> head page struct else ==> tail page struct } else ==> head page struct } We can safely access the field of the @page[1] with PG_head because the @page is a compound page composed with at least two contiguous pages. [songmuchun@bytedance.com: restore lost comment changes] Link: https://lkml.kernel.org/r/20211101031651.75851-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20211101031651.75851-2-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: Barry Song Cc: Mike Kravetz Cc: Oscar Salvador Cc: Michal Hocko Cc: David Hildenbrand Cc: Chen Huang Cc: Bodeddula Balasubramaniam Cc: Jonathan Corbet Cc: Matthew Wilcox Cc: Xiongchun Duan Cc: Fam Zheng Cc: Qi Zheng Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/page-flags.h | 78 +++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 74 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 1c3b6e5c8bfd..111e453f23d2 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -190,13 +190,69 @@ enum pageflags { #ifndef __GENERATING_BOUNDS_H +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP +extern bool hugetlb_free_vmemmap_enabled; + +/* + * If the feature of freeing some vmemmap pages associated with each HugeTLB + * page is enabled, the head vmemmap page frame is reused and all of the tail + * vmemmap addresses map to the head vmemmap page frame (furture details can + * refer to the figure at the head of the mm/hugetlb_vmemmap.c). In other + * words, there are more than one page struct with PG_head associated with each + * HugeTLB page. We __know__ that there is only one head page struct, the tail + * page structs with PG_head are fake head page structs. We need an approach + * to distinguish between those two different types of page structs so that + * compound_head() can return the real head page struct when the parameter is + * the tail page struct but with PG_head. + * + * The page_fixed_fake_head() returns the real head page struct if the @page is + * fake page head, otherwise, returns @page which can either be a true page + * head or tail. + */ +static __always_inline const struct page *page_fixed_fake_head(const struct page *page) +{ + if (!hugetlb_free_vmemmap_enabled) + return page; + + /* + * Only addresses aligned with PAGE_SIZE of struct page may be fake head + * struct page. The alignment check aims to avoid access the fields ( + * e.g. compound_head) of the @page[1]. It can avoid touch a (possibly) + * cold cacheline in some cases. + */ + if (IS_ALIGNED((unsigned long)page, PAGE_SIZE) && + test_bit(PG_head, &page->flags)) { + /* + * We can safely access the field of the @page[1] with PG_head + * because the @page is a compound page composed with at least + * two contiguous pages. + */ + unsigned long head = READ_ONCE(page[1].compound_head); + + if (likely(head & 1)) + return (const struct page *)(head - 1); + } + return page; +} +#else +static inline const struct page *page_fixed_fake_head(const struct page *page) +{ + return page; +} +#endif + +static __always_inline int page_is_fake_head(struct page *page) +{ + return page_fixed_fake_head(page) != page; +} + static inline unsigned long _compound_head(const struct page *page) { unsigned long head = READ_ONCE(page->compound_head); if (unlikely(head & 1)) return head - 1; - return (unsigned long)page; + return (unsigned long)page_fixed_fake_head(page); } #define compound_head(page) ((typeof(page))_compound_head(page)) @@ -231,12 +287,13 @@ static inline unsigned long _compound_head(const struct page *page) static __always_inline int PageTail(struct page *page) { - return READ_ONCE(page->compound_head) & 1; + return READ_ONCE(page->compound_head) & 1 || page_is_fake_head(page); } static __always_inline int PageCompound(struct page *page) { - return test_bit(PG_head, &page->flags) || PageTail(page); + return test_bit(PG_head, &page->flags) || + READ_ONCE(page->compound_head) & 1; } #define PAGE_POISON_PATTERN -1l @@ -695,7 +752,20 @@ static inline bool test_set_page_writeback(struct page *page) return set_page_writeback(page); } -__PAGEFLAG(Head, head, PF_ANY) CLEARPAGEFLAG(Head, head, PF_ANY) +static __always_inline bool folio_test_head(struct folio *folio) +{ + return test_bit(PG_head, folio_flags(folio, FOLIO_PF_ANY)); +} + +static __always_inline int PageHead(struct page *page) +{ + PF_POISONED_CHECK(page); + return test_bit(PG_head, &page->flags) && !page_is_fake_head(page); +} + +__SETPAGEFLAG(Head, head, PF_ANY) +__CLEARPAGEFLAG(Head, head, PF_ANY) +CLEARPAGEFLAG(Head, head, PF_ANY) /** * folio_test_large() - Does this folio contain more than one page? -- cgit From a6b40850c442bf996e729e1d441d3dbc37cea171 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 22 Mar 2022 14:45:03 -0700 Subject: mm: hugetlb: replace hugetlb_free_vmemmap_enabled with a static_key The page_fixed_fake_head() is used throughout memory management and the conditional check requires checking a global variable, although the overhead of this check may be small, it increases when the memory cache comes under pressure. Also, the global variable will not be modified after system boot, so it is very appropriate to use static key machanism. Link: https://lkml.kernel.org/r/20211101031651.75851-3-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: Barry Song Cc: Bodeddula Balasubramaniam Cc: Chen Huang Cc: David Hildenbrand Cc: Fam Zheng Cc: Jonathan Corbet Cc: Matthew Wilcox Cc: Michal Hocko Cc: Mike Kravetz Cc: Oscar Salvador Cc: Qi Zheng Cc: Xiongchun Duan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/hugetlb.h | 6 ------ include/linux/page-flags.h | 16 ++++++++++++++-- 2 files changed, 14 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 52c462390aee..08357b4c7be7 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1075,12 +1075,6 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr } #endif /* CONFIG_HUGETLB_PAGE */ -#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP -extern bool hugetlb_free_vmemmap_enabled; -#else -#define hugetlb_free_vmemmap_enabled false -#endif - static inline spinlock_t *huge_pte_lock(struct hstate *h, struct mm_struct *mm, pte_t *pte) { diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 111e453f23d2..340cb8156568 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -191,7 +191,14 @@ enum pageflags { #ifndef __GENERATING_BOUNDS_H #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP -extern bool hugetlb_free_vmemmap_enabled; +DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON, + hugetlb_free_vmemmap_enabled_key); + +static __always_inline bool hugetlb_free_vmemmap_enabled(void) +{ + return static_branch_maybe(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON, + &hugetlb_free_vmemmap_enabled_key); +} /* * If the feature of freeing some vmemmap pages associated with each HugeTLB @@ -211,7 +218,7 @@ extern bool hugetlb_free_vmemmap_enabled; */ static __always_inline const struct page *page_fixed_fake_head(const struct page *page) { - if (!hugetlb_free_vmemmap_enabled) + if (!hugetlb_free_vmemmap_enabled()) return page; /* @@ -239,6 +246,11 @@ static inline const struct page *page_fixed_fake_head(const struct page *page) { return page; } + +static inline bool hugetlb_free_vmemmap_enabled(void) +{ + return false; +} #endif static __always_inline int page_is_fake_head(struct page *page) -- cgit From e54084173487804f5e2f23facf107fd9336e637e Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 22 Mar 2022 14:45:12 -0700 Subject: mm: sparsemem: move vmemmap related to HugeTLB to CONFIG_HUGETLB_PAGE_FREE_VMEMMAP The vmemmap_remap_free/alloc are relevant to HugeTLB, so move those functiongs to the scope of CONFIG_HUGETLB_PAGE_FREE_VMEMMAP. Link: https://lkml.kernel.org/r/20211101031651.75851-6-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: Barry Song Cc: Bodeddula Balasubramaniam Cc: Chen Huang Cc: David Hildenbrand Cc: Fam Zheng Cc: Jonathan Corbet Cc: Matthew Wilcox Cc: Michal Hocko Cc: Mike Kravetz Cc: Oscar Salvador Cc: Qi Zheng Cc: Xiongchun Duan Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 45a449e8c209..9d58321386cc 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3146,10 +3146,12 @@ static inline void print_vma_addr(char *prefix, unsigned long rip) } #endif +#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP int vmemmap_remap_free(unsigned long start, unsigned long end, unsigned long reuse); int vmemmap_remap_alloc(unsigned long start, unsigned long end, unsigned long reuse, gfp_t gfp_mask); +#endif void *sparse_buffer_alloc(unsigned long size); struct page * __populate_section_memmap(unsigned long pfn, -- cgit From 824ddc601adc2cc48efb7f58b57997986c1c1276 Mon Sep 17 00:00:00 2001 From: Nadav Amit Date: Tue, 22 Mar 2022 14:45:32 -0700 Subject: userfaultfd: provide unmasked address on page-fault Userfaultfd is supposed to provide the full address (i.e., unmasked) of the faulting access back to userspace. However, that is not the case for quite some time. Even running "userfaultfd_demo" from the userfaultfd man page provides the wrong output (and contradicts the man page). Notice that "UFFD_EVENT_PAGEFAULT event" shows the masked address (7fc5e30b3000) and not the first read address (0x7fc5e30b300f). Address returned by mmap() = 0x7fc5e30b3000 fault_handler_thread(): poll() returns: nready = 1; POLLIN = 1; POLLERR = 0 UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000 (uffdio_copy.copy returned 4096) Read address 0x7fc5e30b300f in main(): A Read address 0x7fc5e30b340f in main(): A Read address 0x7fc5e30b380f in main(): A Read address 0x7fc5e30b3c0f in main(): A The exact address is useful for various reasons and specifically for prefetching decisions. If it is known that the memory is populated by certain objects whose size is not page-aligned, then based on the faulting address, the uffd-monitor can decide whether to prefetch and prefault the adjacent page. This bug has been for quite some time in the kernel: since commit 1a29d85eb0f1 ("mm: use vmf->address instead of of vmf->virtual_address") vmf->virtual_address"), which dates back to 2016. A concern has been raised that existing userspace application might rely on the old/wrong behavior in which the address is masked. Therefore, it was suggested to provide the masked address unless the user explicitly asks for the exact address. Add a new userfaultfd feature UFFD_FEATURE_EXACT_ADDRESS to direct userfaultfd to provide the exact address. Add a new "real_address" field to vmf to hold the unmasked address. Provide the address to userspace accordingly. Initialize real_address in various code-paths to be consistent with address, even when it is not used, to be on the safe side. [namit@vmware.com: initialize real_address on all code paths, per Jan] Link: https://lkml.kernel.org/r/20220226022655.350562-1-namit@vmware.com [akpm@linux-foundation.org: fix typo in comment, per Jan] Link: https://lkml.kernel.org/r/20220218041003.3508-1-namit@vmware.com Signed-off-by: Nadav Amit Acked-by: Peter Xu Reviewed-by: David Hildenbrand Acked-by: Mike Rapoport Reviewed-by: Jan Kara Cc: Andrea Arcangeli Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 9d58321386cc..0e4fd101616e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -478,7 +478,8 @@ struct vm_fault { struct vm_area_struct *vma; /* Target VMA */ gfp_t gfp_mask; /* gfp mask to be used for allocations */ pgoff_t pgoff; /* Logical page offset based on vma */ - unsigned long address; /* Faulting virtual address */ + unsigned long address; /* Faulting virtual address - masked */ + unsigned long real_address; /* Faulting virtual address - unmasked */ }; enum fault_flag flags; /* FAULT_FLAG_xxx flags * XXX: should really be 'const' */ -- cgit From b698f0a1773f7df73f2bb4bfe0e597ea1bb3881f Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Tue, 22 Mar 2022 14:45:38 -0700 Subject: mm/fs: delete PF_SWAPWRITE PF_SWAPWRITE has been redundant since v3.2 commit ee72886d8ed5 ("mm: vmscan: do not writeback filesystem pages in direct reclaim"). Coincidentally, NeilBrown's current patch "remove inode_congested()" deletes may_write_to_inode(), which appeared to be the one function which took notice of PF_SWAPWRITE. But if you study the old logic, and the conditions under which may_write_to_inode() was called, you discover that flag and function have been pointless for a decade. Link: https://lkml.kernel.org/r/75e80e7-742d-e3bd-531-614db8961e4@google.com Signed-off-by: Hugh Dickins Cc: NeilBrown Cc: Jan Kara Cc: "Darrick J. Wong" Cc: Dave Chinner Cc: Matthew Wilcox Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/sched.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 75ba8aa60248..c3c841a02a00 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1689,7 +1689,6 @@ extern struct pid *cad_pid; * I am cleaning dirty pages from some other bdi. */ #define PF_KTHREAD 0x00200000 /* I am a kernel thread */ #define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */ -#define PF_SWAPWRITE 0x00800000 /* Allowed to write to swap */ #define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */ #define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */ #define PF_MEMALLOC_PIN 0x10000000 /* Allocation context constrained to zones which allow long term pinning. */ -- cgit From 89f6c88a6ab4a11deb14c270f7f1454cda4f73d6 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Tue, 22 Mar 2022 14:45:41 -0700 Subject: mm: __isolate_lru_page_prepare() in isolate_migratepages_block() __isolate_lru_page_prepare() conflates two unrelated functions, with the flags to one disjoint from the flags to the other; and hides some of the important checks outside of isolate_migratepages_block(), where the sequence is better to be visible. It comes from the days of lumpy reclaim, before compaction, when the combination made more sense. Move what's needed by mm/compaction.c isolate_migratepages_block() inline there, and what's needed by mm/vmscan.c isolate_lru_pages() inline there. Shorten "isolate_mode" to "mode", so the sequence of conditions is easier to read. Declare a "mapping" variable, to save one call to page_mapping() (but not another: calling again after page is locked is necessary). Simplify isolate_lru_pages() with a "move_to" list pointer. Link: https://lkml.kernel.org/r/879d62a8-91cc-d3c6-fb3b-69768236df68@google.com Signed-off-by: Hugh Dickins Acked-by: David Rientjes Reviewed-by: Alex Shi Cc: Alexander Duyck Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/swap.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 3db431276d82..a246c137678e 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -387,7 +387,6 @@ extern void lru_cache_add_inactive_or_unevictable(struct page *page, extern unsigned long zone_reclaimable_pages(struct zone *zone); extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order, gfp_t gfp_mask, nodemask_t *mask); -extern bool __isolate_lru_page_prepare(struct page *page, isolate_mode_t mode); extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, unsigned long nr_pages, gfp_t gfp_mask, -- cgit From 356ea3865687926e5da7579d1f3351d3f0a322a1 Mon Sep 17 00:00:00 2001 From: "andrew.yang" Date: Tue, 22 Mar 2022 14:46:08 -0700 Subject: mm/migrate: fix race between lock page and clear PG_Isolated When memory is tight, system may start to compact memory for large continuous memory demands. If one process tries to lock a memory page that is being locked and isolated for compaction, it may wait a long time or even forever. This is because compaction will perform non-atomic PG_Isolated clear while holding page lock, this may overwrite PG_waiters set by the process that can't obtain the page lock and add itself to the waiting queue to wait for the lock to be unlocked. CPU1 CPU2 lock_page(page); (successful) lock_page(); (failed) __ClearPageIsolated(page); SetPageWaiters(page) (may be overwritten) unlock_page(page); The solution is to not perform non-atomic operation on page flags while holding page lock. Link: https://lkml.kernel.org/r/20220315030515.20263-1-andrew.yang@mediatek.com Signed-off-by: andrew.yang Cc: Matthias Brugger Cc: Matthew Wilcox Cc: "Vlastimil Babka" Cc: David Howells Cc: "William Kucharski" Cc: David Hildenbrand Cc: Yang Shi Cc: Marc Zyngier Cc: Nicholas Tang Cc: Kuan-Ying Lee Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/page-flags.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 340cb8156568..88fe1d759cdd 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -1000,7 +1000,7 @@ PAGE_TYPE_OPS(Guard, guard) extern bool is_free_buddy_page(struct page *page); -__PAGEFLAG(Isolated, isolated, PF_ANY); +PAGEFLAG(Isolated, isolated, PF_ANY); #ifdef CONFIG_MMU #define __PG_MLOCKED (1UL << PG_mlocked) -- cgit From 27d121d0ec6d604d0147c5b579e4181b688a2d64 Mon Sep 17 00:00:00 2001 From: Hari Bathini Date: Tue, 22 Mar 2022 14:46:14 -0700 Subject: mm/cma: provide option to opt out from exposing pages on activation failure Patch series "powerpc/fadump: handle CMA activation failure appropriately", v3. Commit 072355c1cf2d ("mm/cma: expose all pages to the buddy if activation of an area fails") started exposing all pages to buddy allocator on CMA activation failure. But there can be CMA users that want to handle the reserved memory differently on CMA allocation failure. Provide an option to opt out from exposing pages to buddy for such cases. Link: https://lkml.kernel.org/r/20220117075246.36072-1-hbathini@linux.ibm.com Link: https://lkml.kernel.org/r/20220117075246.36072-2-hbathini@linux.ibm.com Signed-off-by: Hari Bathini Reviewed-by: David Hildenbrand Cc: Oscar Salvador Cc: Mike Kravetz Cc: Mahesh Salgaonkar Cc: Sourabh Jain Cc: Michael Ellerman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/cma.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cma.h b/include/linux/cma.h index b1ba94f1cc9c..90fd742fd1ef 100644 --- a/include/linux/cma.h +++ b/include/linux/cma.h @@ -58,4 +58,6 @@ extern bool cma_pages_valid(struct cma *cma, const struct page *pages, unsigned extern bool cma_release(struct cma *cma, const struct page *pages, unsigned long count); extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data); + +extern void cma_reserve_pages_on_error(struct cma *cma); #endif -- cgit From e39bb6be9f2b39a6dbaeff484361de76021b175d Mon Sep 17 00:00:00 2001 From: Huang Ying Date: Tue, 22 Mar 2022 14:46:20 -0700 Subject: NUMA Balancing: add page promotion counter Patch series "NUMA balancing: optimize memory placement for memory tiering system", v13 With the advent of various new memory types, some machines will have multiple types of memory, e.g. DRAM and PMEM (persistent memory). The memory subsystem of these machines can be called memory tiering system, because the performance of the different types of memory are different. After commit c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like normal RAM"), the PMEM could be used as the cost-effective volatile memory in separate NUMA nodes. In a typical memory tiering system, there are CPUs, DRAM and PMEM in each physical NUMA node. The CPUs and the DRAM will be put in one logical node, while the PMEM will be put in another (faked) logical node. To optimize the system overall performance, the hot pages should be placed in DRAM node. To do that, we need to identify the hot pages in the PMEM node and migrate them to DRAM node via NUMA migration. In the original NUMA balancing, there are already a set of existing mechanisms to identify the pages recently accessed by the CPUs in a node and migrate the pages to the node. So we can reuse these mechanisms to build the mechanisms to optimize the page placement in the memory tiering system. This is implemented in this patchset. At the other hand, the cold pages should be placed in PMEM node. So, we also need to identify the cold pages in the DRAM node and migrate them to PMEM node. In commit 26aa2d199d6f ("mm/migrate: demote pages during reclaim"), a mechanism to demote the cold DRAM pages to PMEM node under memory pressure is implemented. Based on that, the cold DRAM pages can be demoted to PMEM node proactively to free some memory space on DRAM node to accommodate the promoted hot PMEM pages. This is implemented in this patchset too. We have tested the solution with the pmbench memory accessing benchmark with the 80:20 read/write ratio and the Gauss access address distribution on a 2 socket Intel server with Optane DC Persistent Memory Model. The test results shows that the pmbench score can improve up to 95.9%. This patch (of 3): In a system with multiple memory types, e.g. DRAM and PMEM, the CPU and DRAM in one socket will be put in one NUMA node as before, while the PMEM will be put in another NUMA node as described in the description of the commit c221c0b0308f ("device-dax: "Hotplug" persistent memory for use like normal RAM"). So, the NUMA balancing mechanism will identify all PMEM accesses as remote access and try to promote the PMEM pages to DRAM. To distinguish the number of the inter-type promoted pages from that of the inter-socket migrated pages. A new vmstat count is added. The counter is per-node (count in the target node). So this can be used to identify promotion imbalance among the NUMA nodes. Link: https://lkml.kernel.org/r/20220301085329.3210428-1-ying.huang@intel.com Link: https://lkml.kernel.org/r/20220221084529.1052339-1-ying.huang@intel.com Link: https://lkml.kernel.org/r/20220221084529.1052339-2-ying.huang@intel.com Signed-off-by: "Huang, Ying" Reviewed-by: Yang Shi Tested-by: Baolin Wang Reviewed-by: Baolin Wang Acked-by: Johannes Weiner Reviewed-by: Oscar Salvador Cc: Michal Hocko Cc: Rik van Riel Cc: Mel Gorman Cc: Peter Zijlstra Cc: Dave Hansen Cc: Zi Yan Cc: Wei Xu Cc: Shakeel Butt Cc: zhongjiang-ali Cc: Feng Tang Cc: Randy Dunlap Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mmzone.h | 3 +++ include/linux/node.h | 5 +++++ 2 files changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index c9e6a50109b9..310b6e7ce58a 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -221,6 +221,9 @@ enum node_stat_item { NR_PAGETABLE, /* used for pagetables */ #ifdef CONFIG_SWAP NR_SWAPCACHE, +#endif +#ifdef CONFIG_NUMA_BALANCING + PGPROMOTE_SUCCESS, /* promote successfully */ #endif NR_VM_NODE_STAT_ITEMS }; diff --git a/include/linux/node.h b/include/linux/node.h index bb21fd631b16..81bbf1c0afd3 100644 --- a/include/linux/node.h +++ b/include/linux/node.h @@ -181,4 +181,9 @@ static inline void register_hugetlbfs_with_node(node_registration_func_t reg, #define to_node(device) container_of(device, struct node, dev) +static inline bool node_is_toptier(int node) +{ + return node_state(node, N_CPU); +} + #endif /* _LINUX_NODE_H_ */ -- cgit From c574bbe917036c8968b984c82c7b13194fe5ce98 Mon Sep 17 00:00:00 2001 From: Huang Ying Date: Tue, 22 Mar 2022 14:46:23 -0700 Subject: NUMA balancing: optimize page placement for memory tiering system With the advent of various new memory types, some machines will have multiple types of memory, e.g. DRAM and PMEM (persistent memory). The memory subsystem of these machines can be called memory tiering system, because the performance of the different types of memory are usually different. In such system, because of the memory accessing pattern changing etc, some pages in the slow memory may become hot globally. So in this patch, the NUMA balancing mechanism is enhanced to optimize the page placement among the different memory types according to hot/cold dynamically. In a typical memory tiering system, there are CPUs, fast memory and slow memory in each physical NUMA node. The CPUs and the fast memory will be put in one logical node (called fast memory node), while the slow memory will be put in another (faked) logical node (called slow memory node). That is, the fast memory is regarded as local while the slow memory is regarded as remote. So it's possible for the recently accessed pages in the slow memory node to be promoted to the fast memory node via the existing NUMA balancing mechanism. The original NUMA balancing mechanism will stop to migrate pages if the free memory of the target node becomes below the high watermark. This is a reasonable policy if there's only one memory type. But this makes the original NUMA balancing mechanism almost do not work to optimize page placement among different memory types. Details are as follows. It's the common cases that the working-set size of the workload is larger than the size of the fast memory nodes. Otherwise, it's unnecessary to use the slow memory at all. So, there are almost always no enough free pages in the fast memory nodes, so that the globally hot pages in the slow memory node cannot be promoted to the fast memory node. To solve the issue, we have 2 choices as follows, a. Ignore the free pages watermark checking when promoting hot pages from the slow memory node to the fast memory node. This will create some memory pressure in the fast memory node, thus trigger the memory reclaiming. So that, the cold pages in the fast memory node will be demoted to the slow memory node. b. Define a new watermark called wmark_promo which is higher than wmark_high, and have kswapd reclaiming pages until free pages reach such watermark. The scenario is as follows: when we want to promote hot-pages from a slow memory to a fast memory, but fast memory's free pages would go lower than high watermark with such promotion, we wake up kswapd with wmark_promo watermark in order to demote cold pages and free us up some space. So, next time we want to promote hot-pages we might have a chance of doing so. The choice "a" may create high memory pressure in the fast memory node. If the memory pressure of the workload is high, the memory pressure may become so high that the memory allocation latency of the workload is influenced, e.g. the direct reclaiming may be triggered. The choice "b" works much better at this aspect. If the memory pressure of the workload is high, the hot pages promotion will stop earlier because its allocation watermark is higher than that of the normal memory allocation. So in this patch, choice "b" is implemented. A new zone watermark (WMARK_PROMO) is added. Which is larger than the high watermark and can be controlled via watermark_scale_factor. In addition to the original page placement optimization among sockets, the NUMA balancing mechanism is extended to be used to optimize page placement according to hot/cold among different memory types. So the sysctl user space interface (numa_balancing) is extended in a backward compatible way as follow, so that the users can enable/disable these functionality individually. The sysctl is converted from a Boolean value to a bits field. The definition of the flags is, - 0: NUMA_BALANCING_DISABLED - 1: NUMA_BALANCING_NORMAL - 2: NUMA_BALANCING_MEMORY_TIERING We have tested the patch with the pmbench memory accessing benchmark with the 80:20 read/write ratio and the Gauss access address distribution on a 2 socket Intel server with Optane DC Persistent Memory Model. The test results shows that the pmbench score can improve up to 95.9%. Thanks Andrew Morton to help fix the document format error. Link: https://lkml.kernel.org/r/20220221084529.1052339-3-ying.huang@intel.com Signed-off-by: "Huang, Ying" Tested-by: Baolin Wang Reviewed-by: Baolin Wang Acked-by: Johannes Weiner Reviewed-by: Oscar Salvador Reviewed-by: Yang Shi Cc: Michal Hocko Cc: Rik van Riel Cc: Mel Gorman Cc: Peter Zijlstra Cc: Dave Hansen Cc: Zi Yan Cc: Wei Xu Cc: Shakeel Butt Cc: zhongjiang-ali Cc: Randy Dunlap Cc: Feng Tang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mmzone.h | 1 + include/linux/sched/sysctl.h | 10 ++++++++++ 2 files changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 310b6e7ce58a..962b14d403e8 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -353,6 +353,7 @@ enum zone_watermarks { WMARK_MIN, WMARK_LOW, WMARK_HIGH, + WMARK_PROMO, NR_WMARK }; diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index c19dd5a2c05c..b5eec8854c5a 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -23,6 +23,16 @@ enum sched_tunable_scaling { SCHED_TUNABLESCALING_END, }; +#define NUMA_BALANCING_DISABLED 0x0 +#define NUMA_BALANCING_NORMAL 0x1 +#define NUMA_BALANCING_MEMORY_TIERING 0x2 + +#ifdef CONFIG_NUMA_BALANCING +extern int sysctl_numa_balancing_mode; +#else +#define sysctl_numa_balancing_mode 0 +#endif + /* * control realtime throttling: * -- cgit From 4d45c3aff5ebf80d329eba0f90544d20224f612d Mon Sep 17 00:00:00 2001 From: Yang Yang Date: Tue, 22 Mar 2022 14:46:33 -0700 Subject: mm/vmstat: add event for ksm swapping in copy When faults in from swap what used to be a KSM page and that page had been swapped in before, system has to make a copy, and leaves remerging the pages to a later pass of ksmd. That is not good for performace, we'd better to reduce this kind of copy. There are some ways to reduce it, for example lessen swappiness or madvise(, , MADV_MERGEABLE) range. So add this event to support doing this tuning. Just like this patch: "mm, THP, swap: add THP swapping out fallback counting". Link: https://lkml.kernel.org/r/20220113023839.758845-1-yang.yang29@zte.com.cn Signed-off-by: Yang Yang Reviewed-by: Ran Xiaokai Cc: Hugh Dickins Cc: Yang Shi Cc: Dave Hansen Cc: Saravanan D Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/vm_event_item.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 7b2363388bfa..16a0a4fd000b 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -129,6 +129,9 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, #ifdef CONFIG_SWAP SWAP_RA, SWAP_RA_HIT, +#ifdef CONFIG_KSM + KSM_SWPIN_COPY, +#endif #endif #ifdef CONFIG_X86 DIRECT_MAP_LEVEL2_SPLIT, -- cgit From e930d999715073a70d306fb59a394ea8b84d0b45 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Tue, 22 Mar 2022 14:46:51 -0700 Subject: mm, memory_hotplug: make arch_alloc_nodedata independent on CONFIG_MEMORY_HOTPLUG Patch series "mm, memory_hotplug: handle unitialized numa node gracefully". The core of the fix is patch 2 which also links existing bug reports. The high level goal is to have all possible numa nodes have their pgdat allocated and initialized so for_each_possible_node(nid) NODE_DATA(nid) will never return garbage. This has proven to be problem in several places when an offline numa node is used for an allocation just to realize that node_data and therefore allocation fallback zonelists are not initialized and such an allocation request blows up. There were attempts to address that by checking node_online in several places including the page allocator. This patchset approaches the problem from a different perspective and instead of special casing, which just adds a runtime overhead, it allocates pglist_data for each possible node. This can add some memory overhead for platforms with high number of possible nodes if they do not contain any memory. This should be a rather rare configuration though. How to test this? David has provided and excellent howto: http://lkml.kernel.org/r/6e5ebc19-890c-b6dd-1924-9f25c441010d@redhat.com Patches 1 and 3-6 are mostly cleanups. The patchset has been reviewed by Rafael (thanks!) and the core fix tested by Rafael and Alexey (thanks to both). David has tested as per instructions above and hasn't found any fallouts in the memory hotplug scenarios. This patch (of 6): This is a preparatory patch and it doesn't introduce any functional change. It merely pulls out arch_alloc_nodedata (and co) outside of CONFIG_MEMORY_HOTPLUG because the following patch will need to call this from the generic MM code. Link: https://lkml.kernel.org/r/20220127085305.20890-1-mhocko@kernel.org Link: https://lkml.kernel.org/r/20220127085305.20890-2-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: Rafael Aquini Acked-by: David Hildenbrand Acked-by: Mike Rapoport Reviewed-by: Oscar Salvador Reviewed-by: Wei Yang Cc: Alexey Makhalov Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Dumazet Cc: Nico Pache Cc: Tejun Heo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memory_hotplug.h | 119 ++++++++++++++++++++--------------------- 1 file changed, 59 insertions(+), 60 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index be48e003a518..4355983b364d 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -16,6 +16,65 @@ struct memory_group; struct resource; struct vmem_altmap; +#ifdef CONFIG_HAVE_ARCH_NODEDATA_EXTENSION +/* + * For supporting node-hotadd, we have to allocate a new pgdat. + * + * If an arch has generic style NODE_DATA(), + * node_data[nid] = kzalloc() works well. But it depends on the architecture. + * + * In general, generic_alloc_nodedata() is used. + * Now, arch_free_nodedata() is just defined for error path of node_hot_add. + * + */ +extern pg_data_t *arch_alloc_nodedata(int nid); +extern void arch_free_nodedata(pg_data_t *pgdat); +extern void arch_refresh_nodedata(int nid, pg_data_t *pgdat); + +#else /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */ + +#define arch_alloc_nodedata(nid) generic_alloc_nodedata(nid) +#define arch_free_nodedata(pgdat) generic_free_nodedata(pgdat) + +#ifdef CONFIG_NUMA +/* + * XXX: node aware allocation can't work well to get new node's memory at this time. + * Because, pgdat for the new node is not allocated/initialized yet itself. + * To use new node's memory, more consideration will be necessary. + */ +#define generic_alloc_nodedata(nid) \ +({ \ + kzalloc(sizeof(pg_data_t), GFP_KERNEL); \ +}) +/* + * This definition is just for error path in node hotadd. + * For node hotremove, we have to replace this. + */ +#define generic_free_nodedata(pgdat) kfree(pgdat) + +extern pg_data_t *node_data[]; +static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat) +{ + node_data[nid] = pgdat; +} + +#else /* !CONFIG_NUMA */ + +/* never called */ +static inline pg_data_t *generic_alloc_nodedata(int nid) +{ + BUG(); + return NULL; +} +static inline void generic_free_nodedata(pg_data_t *pgdat) +{ +} +static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat) +{ +} +#endif /* CONFIG_NUMA */ +#endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */ + #ifdef CONFIG_MEMORY_HOTPLUG struct page *pfn_to_online_page(unsigned long pfn); @@ -154,66 +213,6 @@ int add_pages(int nid, unsigned long start_pfn, unsigned long nr_pages, struct mhp_params *params); #endif /* ARCH_HAS_ADD_PAGES */ -#ifdef CONFIG_HAVE_ARCH_NODEDATA_EXTENSION -/* - * For supporting node-hotadd, we have to allocate a new pgdat. - * - * If an arch has generic style NODE_DATA(), - * node_data[nid] = kzalloc() works well. But it depends on the architecture. - * - * In general, generic_alloc_nodedata() is used. - * Now, arch_free_nodedata() is just defined for error path of node_hot_add. - * - */ -extern pg_data_t *arch_alloc_nodedata(int nid); -extern void arch_free_nodedata(pg_data_t *pgdat); -extern void arch_refresh_nodedata(int nid, pg_data_t *pgdat); - -#else /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */ - -#define arch_alloc_nodedata(nid) generic_alloc_nodedata(nid) -#define arch_free_nodedata(pgdat) generic_free_nodedata(pgdat) - -#ifdef CONFIG_NUMA -/* - * If ARCH_HAS_NODEDATA_EXTENSION=n, this func is used to allocate pgdat. - * XXX: kmalloc_node() can't work well to get new node's memory at this time. - * Because, pgdat for the new node is not allocated/initialized yet itself. - * To use new node's memory, more consideration will be necessary. - */ -#define generic_alloc_nodedata(nid) \ -({ \ - kzalloc(sizeof(pg_data_t), GFP_KERNEL); \ -}) -/* - * This definition is just for error path in node hotadd. - * For node hotremove, we have to replace this. - */ -#define generic_free_nodedata(pgdat) kfree(pgdat) - -extern pg_data_t *node_data[]; -static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat) -{ - node_data[nid] = pgdat; -} - -#else /* !CONFIG_NUMA */ - -/* never called */ -static inline pg_data_t *generic_alloc_nodedata(int nid) -{ - BUG(); - return NULL; -} -static inline void generic_free_nodedata(pg_data_t *pgdat) -{ -} -static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat) -{ -} -#endif /* CONFIG_NUMA */ -#endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */ - void get_online_mems(void); void put_online_mems(void); -- cgit From 09f49dca570a917a8c6bccd7e8c61f5141534e3a Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Tue, 22 Mar 2022 14:46:54 -0700 Subject: mm: handle uninitialized numa nodes gracefully We have had several reports [1][2][3] that page allocator blows up when an allocation from a possible node is requested. The underlying reason is that NODE_DATA for the specific node is not allocated. NUMA specific initialization is arch specific and it can vary a lot. E.g. x86 tries to initialize all nodes that have some cpu affinity (see init_cpu_to_node) but this can be insufficient because the node might be cpuless for example. One way to address this problem would be to check for !node_online nodes when trying to get a zonelist and silently fall back to another node. That is unfortunately adding a branch into allocator hot path and it doesn't handle any other potential NODE_DATA users. This patch takes a different approach (following a lead of [3]) and it pre allocates pgdat for all possible nodes in an arch indipendent code - free_area_init. All uninitialized nodes are treated as memoryless nodes. node_state of the node is not changed because that would lead to other side effects - e.g. sysfs representation of such a node and from past discussions [4] it is known that some tools might have problems digesting that. Newly allocated pgdat only gets a minimal initialization and the rest of the work is expected to be done by the memory hotplug - hotadd_new_pgdat (renamed to hotadd_init_pgdat). generic_alloc_nodedata is changed to use the memblock allocator because neither page nor slab allocators are available at the stage when all pgdats are allocated. Hotplug doesn't allocate pgdat anymore so we can use the early boot allocator. The only arch specific implementation is ia64 and that is changed to use the early allocator as well. [1] http://lkml.kernel.org/r/20211101201312.11589-1-amakhalov@vmware.com [2] http://lkml.kernel.org/r/20211207224013.880775-1-npache@redhat.com [3] http://lkml.kernel.org/r/20190114082416.30939-1-mhocko@kernel.org [4] http://lkml.kernel.org/r/20200428093836.27190-1-srikar@linux.vnet.ibm.com [akpm@linux-foundation.org: replace comment, per Mike] Link: https://lkml.kernel.org/r/Yfe7RBeLCijnWBON@dhcp22.suse.cz Reported-by: Alexey Makhalov Tested-by: Alexey Makhalov Reported-by: Nico Pache Acked-by: Rafael Aquini Tested-by: Rafael Aquini Acked-by: David Hildenbrand Reviewed-by: Oscar Salvador Acked-by: Mike Rapoport Signed-off-by: Michal Hocko Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Dumazet Cc: Tejun Heo Cc: Wei Yang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memory_hotplug.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 4355983b364d..cdd66bfdf855 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -44,7 +44,7 @@ extern void arch_refresh_nodedata(int nid, pg_data_t *pgdat); */ #define generic_alloc_nodedata(nid) \ ({ \ - kzalloc(sizeof(pg_data_t), GFP_KERNEL); \ + memblock_alloc(sizeof(*pgdat), SMP_CACHE_BYTES); \ }) /* * This definition is just for error path in node hotadd. -- cgit From 390511e1476eb1cc41d420a7661b33f4d8584c3f Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Tue, 22 Mar 2022 14:46:57 -0700 Subject: mm, memory_hotplug: drop arch_free_nodedata Prior to "mm: handle uninitialized numa nodes gracefully" memory hotplug used to allocate pgdat when memory has been added to a node (hotadd_init_pgdat) arch_free_nodedata has been only used in the failure path because once the pgdat is exported (to be visible by NODA_DATA(nid)) it cannot really be freed because there is no synchronization available for that. pgdat is allocated for each possible nodes now so the memory hotplug doesn't need to do the ever use arch_free_nodedata so drop it. This patch doesn't introduce any functional change. Link: https://lkml.kernel.org/r/20220127085305.20890-4-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: Rafael Aquini Acked-by: David Hildenbrand Acked-by: Mike Rapoport Reviewed-by: Oscar Salvador Cc: Alexey Makhalov Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Dumazet Cc: Nico Pache Cc: Tejun Heo Cc: Wei Yang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memory_hotplug.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index cdd66bfdf855..60f09d3ebb3d 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -24,17 +24,14 @@ struct vmem_altmap; * node_data[nid] = kzalloc() works well. But it depends on the architecture. * * In general, generic_alloc_nodedata() is used. - * Now, arch_free_nodedata() is just defined for error path of node_hot_add. * */ extern pg_data_t *arch_alloc_nodedata(int nid); -extern void arch_free_nodedata(pg_data_t *pgdat); extern void arch_refresh_nodedata(int nid, pg_data_t *pgdat); #else /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */ #define arch_alloc_nodedata(nid) generic_alloc_nodedata(nid) -#define arch_free_nodedata(pgdat) generic_free_nodedata(pgdat) #ifdef CONFIG_NUMA /* -- cgit From 70b5b46a754245d383811b4d2f2c76c34bb7e145 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Tue, 22 Mar 2022 14:47:00 -0700 Subject: mm, memory_hotplug: reorganize new pgdat initialization When a !node_online node is brought up it needs a hotplug specific initialization because the node could be either uninitialized yet or it could have been recycled after previous hotremove. hotadd_init_pgdat is responsible for that. Internal pgdat state is initialized at two places currently - hotadd_init_pgdat - free_area_init_core_hotplug There is no real clear cut what should go where but this patch's chosen to move the whole internal state initialization into free_area_init_core_hotplug. hotadd_init_pgdat is still responsible to pull all the parts together - most notably to initialize zonelists because those depend on the overall topology. This patch doesn't introduce any functional change. Link: https://lkml.kernel.org/r/20220127085305.20890-5-mhocko@kernel.org Signed-off-by: Michal Hocko Acked-by: Rafael Aquini Acked-by: David Hildenbrand Reviewed-by: Oscar Salvador Cc: Alexey Makhalov Cc: Christoph Lameter Cc: Dennis Zhou Cc: Eric Dumazet Cc: Mike Rapoport Cc: Nico Pache Cc: Tejun Heo Cc: Wei Yang Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memory_hotplug.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 60f09d3ebb3d..76bf2de86def 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -319,7 +319,7 @@ extern void set_zone_contiguous(struct zone *zone); extern void clear_zone_contiguous(struct zone *zone); #ifdef CONFIG_MEMORY_HOTPLUG -extern void __ref free_area_init_core_hotplug(int nid); +extern void __ref free_area_init_core_hotplug(struct pglist_data *pgdat); extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags); extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags); extern int add_memory_resource(int nid, struct resource *resource, -- cgit From 2848a28b0a6052a4c8450397d2647d7d8e3f6f06 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Tue, 22 Mar 2022 14:47:13 -0700 Subject: drivers/base/node: consolidate node device subsystem initialization in node_dev_init() ... and call node_dev_init() after memory_dev_init() from driver_init(), so before any of the existing arch/subsys calls. All online nodes should be known at that point: early during boot, arch code determines node and zone ranges and sets the relevant nodes online; usually this happens in setup_arch(). This is in line with memory_dev_init(), which initializes the memory device subsystem and creates all memory block devices. Similar to memory_dev_init(), panic() if anything goes wrong, we don't want to continue with such basic initialization errors. The important part is that node_dev_init() gets called after memory_dev_init() and after cpu_dev_init(), but before any of the relevant archs call register_cpu() to register the new cpu device under the node device. The latter should be the case for the current users of topology_init(). Link: https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com Signed-off-by: David Hildenbrand Reviewed-by: Oscar Salvador Tested-by: Anatoly Pugachev (sparc64) Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: Oscar Salvador Cc: Mike Rapoport Cc: Catalin Marinas Cc: Will Deacon Cc: Thomas Bogendoerfer Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Paul Walmsley Cc: Palmer Dabbelt Cc: Albert Ou Cc: Heiko Carstens Cc: Vasily Gorbik Cc: Yoshinori Sato Cc: Rich Felker Cc: "David S. Miller" Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: Dave Hansen Cc: "Rafael J. Wysocki" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/node.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/node.h b/include/linux/node.h index 81bbf1c0afd3..7f876d48af11 100644 --- a/include/linux/node.h +++ b/include/linux/node.h @@ -112,6 +112,7 @@ static inline void link_mem_sections(int nid, unsigned long start_pfn, extern void unregister_node(struct node *node); #ifdef CONFIG_NUMA +extern void node_dev_init(void); /* Core of the node registration - only memory hotplug should use this */ extern int __register_one_node(int nid); @@ -149,6 +150,9 @@ extern void register_hugetlbfs_with_node(node_registration_func_t doregister, node_registration_func_t unregister); #endif #else +static inline void node_dev_init(void) +{ +} static inline int __register_one_node(int nid) { return 0; -- cgit From cc6515591b25f08ce199e9379844a964f52a27f2 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Tue, 22 Mar 2022 14:47:28 -0700 Subject: drivers/base/node: rename link_mem_sections() to register_memory_block_under_node() Patch series "drivers/base/memory: determine and store zone for single-zone memory blocks", v2. I remember talking to Michal in the past about removing test_pages_in_a_zone(), which we use for: * verifying that a memory block we intend to offline is really only managed by a single zone. We don't support offlining of memory blocks that are managed by multiple zones (e.g., multiple nodes, DMA and DMA32) * exposing that zone to user space via /sys/devices/system/memory/memory*/valid_zones Now that I identified some more cases where test_pages_in_a_zone() might go wrong, and we received an UBSAN report (see patch #3), let's get rid of this PFN walker. So instead of detecting the zone at runtime with test_pages_in_a_zone() by scanning the memmap, let's determine and remember for each memory block if it's managed by a single zone. The stored zone can then be used for the above two cases, avoiding a manual lookup using test_pages_in_a_zone(). This avoids eventually stumbling over uninitialized memmaps in corner cases, especially when ZONE_DEVICE ranges partly fall into memory block (that are responsible for managing System RAM). Handling memory onlining is easy, because we online to exactly one zone. Handling boot memory is more tricky, because we want to avoid scanning all zones of all nodes to detect possible zones that overlap with the physical memory region of interest. Fortunately, we already have code that determines the applicable nodes for a memory block, to create sysfs links -- we'll hook into that. Patch #1 is a simple cleanup I had laying around for a longer time. Patch #2 contains the main logic to remove test_pages_in_a_zone() and further details. [1] https://lkml.kernel.org/r/20220128144540.153902-1-david@redhat.com [2] https://lkml.kernel.org/r/20220203105212.30385-1-david@redhat.com This patch (of 2): Let's adjust the stale terminology, making it match unregister_memory_block_under_nodes() and do_register_memory_block_under_node(). We're dealing with memory block devices, which span 1..X memory sections. Link: https://lkml.kernel.org/r/20220210184359.235565-1-david@redhat.com Link: https://lkml.kernel.org/r/20220210184359.235565-2-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Oscar Salvador Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: "Rafael J. Wysocki" Cc: Rafael Parra Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/node.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/node.h b/include/linux/node.h index 7f876d48af11..40d641a8bfb0 100644 --- a/include/linux/node.h +++ b/include/linux/node.h @@ -99,13 +99,13 @@ extern struct node *node_devices[]; typedef void (*node_registration_func_t)(struct node *); #if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA) -void link_mem_sections(int nid, unsigned long start_pfn, - unsigned long end_pfn, - enum meminit_context context); +void register_memory_blocks_under_node(int nid, unsigned long start_pfn, + unsigned long end_pfn, + enum meminit_context context); #else -static inline void link_mem_sections(int nid, unsigned long start_pfn, - unsigned long end_pfn, - enum meminit_context context) +static inline void register_memory_blocks_under_node(int nid, unsigned long start_pfn, + unsigned long end_pfn, + enum meminit_context context) { } #endif @@ -129,8 +129,8 @@ static inline int register_one_node(int nid) error = __register_one_node(nid); if (error) return error; - /* link memory sections under this node */ - link_mem_sections(nid, start_pfn, end_pfn, MEMINIT_EARLY); + register_memory_blocks_under_node(nid, start_pfn, end_pfn, + MEMINIT_EARLY); } return error; -- cgit From 395f6081bad49f9c54abafebab49ee23aa985bbd Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Tue, 22 Mar 2022 14:47:31 -0700 Subject: drivers/base/memory: determine and store zone for single-zone memory blocks test_pages_in_a_zone() is just another nasty PFN walker that can easily stumble over ZONE_DEVICE memory ranges falling into the same memory block as ordinary system RAM: the memmap of parts of these ranges might possibly be uninitialized. In fact, we observed (on an older kernel) with UBSAN: UBSAN: Undefined behaviour in ./include/linux/mm.h:1133:50 index 7 is out of range for type 'zone [5]' CPU: 121 PID: 35603 Comm: read_all Kdump: loaded Tainted: [...] Hardware name: Dell Inc. PowerEdge R7425/08V001, BIOS 1.12.2 11/15/2019 Call Trace: dump_stack+0x9a/0xf0 ubsan_epilogue+0x9/0x7a __ubsan_handle_out_of_bounds+0x13a/0x181 test_pages_in_a_zone+0x3c4/0x500 show_valid_zones+0x1fa/0x380 dev_attr_show+0x43/0xb0 sysfs_kf_seq_show+0x1c5/0x440 seq_read+0x49d/0x1190 vfs_read+0xff/0x300 ksys_read+0xb8/0x170 do_syscall_64+0xa5/0x4b0 entry_SYSCALL_64_after_hwframe+0x6a/0xdf RIP: 0033:0x7f01f4439b52 We seem to stumble over a memmap that contains a garbage zone id. While we could try inserting pfn_to_online_page() calls, it will just make memory offlining slower, because we use test_pages_in_a_zone() to make sure we're offlining pages that all belong to the same zone. Let's just get rid of this PFN walker and determine the single zone of a memory block -- if any -- for early memory blocks during boot. For memory onlining, we know the single zone already. Let's avoid any additional memmap scanning and just rely on the zone information available during boot. For memory hot(un)plug, we only really care about memory blocks that: * span a single zone (and, thereby, a single node) * are completely System RAM (IOW, no holes, no ZONE_DEVICE) If one of these conditions is not met, we reject memory offlining. Hotplugged memory blocks (starting out offline), always meet both conditions. There are three scenarios to handle: (1) Memory hot(un)plug A memory block with zone == NULL cannot be offlined, corresponding to our previous test_pages_in_a_zone() check. After successful memory onlining/offlining, we simply set the zone accordingly. * Memory onlining: set the zone we just used for onlining * Memory offlining: set zone = NULL So a hotplugged memory block starts with zone = NULL. Once memory onlining is done, we set the proper zone. (2) Boot memory with !CONFIG_NUMA We know that there is just a single pgdat, so we simply scan all zones of that pgdat for an intersection with our memory block PFN range when adding the memory block. If more than one zone intersects (e.g., DMA and DMA32 on x86 for the first memory block) we set zone = NULL and consequently mimic what test_pages_in_a_zone() used to do. (3) Boot memory with CONFIG_NUMA At the point in time we create the memory block devices during boot, we don't know yet which nodes *actually* span a memory block. While we could scan all zones of all nodes for intersections, overlapping nodes complicate the situation and scanning all nodes is possibly expensive. But that problem has already been solved by the code that sets the node of a memory block and creates the link in the sysfs -- do_register_memory_block_under_node(). So, we hook into the code that sets the node id for a memory block. If we already have a different node id set for the memory block, we know that multiple nodes *actually* have PFNs falling into our memory block: we set zone = NULL and consequently mimic what test_pages_in_a_zone() used to do. If there is no node id set, we do the same as (2) for the given node. Note that the call order in driver_init() is: -> memory_dev_init(): create memory block devices -> node_dev_init(): link memory block devices to the node and set the node id So in summary, we detect if there is a single zone responsible for this memory block and we consequently store the zone in that case in the memory block, updating it during memory onlining/offlining. Link: https://lkml.kernel.org/r/20220210184359.235565-3-david@redhat.com Signed-off-by: David Hildenbrand Reported-by: Rafael Parra Reviewed-by: Oscar Salvador Cc: "Rafael J. Wysocki" Cc: Greg Kroah-Hartman Cc: Michal Hocko Cc: Rafael Parra Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memory.h | 12 ++++++++++++ include/linux/memory_hotplug.h | 6 ++---- 2 files changed, 14 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memory.h b/include/linux/memory.h index 88eb587b5143..aa619464a1df 100644 --- a/include/linux/memory.h +++ b/include/linux/memory.h @@ -70,6 +70,13 @@ struct memory_block { unsigned long state; /* serialized by the dev->lock */ int online_type; /* for passing data to online routine */ int nid; /* NID for this memory block */ + /* + * The single zone of this memory block if all PFNs of this memory block + * that are System RAM (not a memory hole, not ZONE_DEVICE ranges) are + * managed by a single zone. NULL if multiple zones (including nodes) + * apply. + */ + struct zone *zone; struct device dev; /* * Number of vmemmap pages. These pages @@ -161,6 +168,11 @@ int walk_dynamic_memory_groups(int nid, walk_memory_groups_func_t func, }) #define register_hotmemory_notifier(nb) register_memory_notifier(nb) #define unregister_hotmemory_notifier(nb) unregister_memory_notifier(nb) + +#ifdef CONFIG_NUMA +void memory_block_add_nid(struct memory_block *mem, int nid, + enum meminit_context context); +#endif /* CONFIG_NUMA */ #endif /* CONFIG_MEMORY_HOTPLUG */ /* diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 76bf2de86def..1ce6f8044f1e 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -163,8 +163,6 @@ extern int mhp_init_memmap_on_memory(unsigned long pfn, unsigned long nr_pages, extern void mhp_deinit_memmap_on_memory(unsigned long pfn, unsigned long nr_pages); extern int online_pages(unsigned long pfn, unsigned long nr_pages, struct zone *zone, struct memory_group *group); -extern struct zone *test_pages_in_a_zone(unsigned long start_pfn, - unsigned long end_pfn); extern void __offline_isolated_pages(unsigned long start_pfn, unsigned long end_pfn); @@ -293,7 +291,7 @@ static inline void pgdat_resize_init(struct pglist_data *pgdat) {} extern void try_offline_node(int nid); extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages, - struct memory_group *group); + struct zone *zone, struct memory_group *group); extern int remove_memory(u64 start, u64 size); extern void __remove_memory(u64 start, u64 size); extern int offline_and_remove_memory(u64 start, u64 size); @@ -302,7 +300,7 @@ extern int offline_and_remove_memory(u64 start, u64 size); static inline void try_offline_node(int nid) {} static inline int offline_pages(unsigned long start_pfn, unsigned long nr_pages, - struct memory_group *group) + struct zone *zone, struct memory_group *group) { return -EINVAL; } -- cgit From 734c15700cdf9062ae98d8b131c6fe873dfad26d Mon Sep 17 00:00:00 2001 From: Oscar Salvador Date: Tue, 22 Mar 2022 14:47:37 -0700 Subject: mm: only re-generate demotion targets when a numa node changes its N_CPU state Abhishek reported that after patch [1], hotplug operations are taking roughly double the expected time. [2] The reason behind is that the CPU callbacks that migrate_on_reclaim_init() sets always call set_migration_target_nodes() whenever a CPU is brought up/down. But we only care about numa nodes going from having cpus to become cpuless, and vice versa, as that influences the demotion_target order. We do already have two CPU callbacks (vmstat_cpu_online() and vmstat_cpu_dead()) that check exactly that, so get rid of the CPU callbacks in migrate_on_reclaim_init() and only call set_migration_target_nodes() from vmstat_cpu_{dead,online}() whenever a numa node change its N_CPU state. [1] https://lore.kernel.org/linux-mm/20210721063926.3024591-2-ying.huang@intel.com/ [2] https://lore.kernel.org/linux-mm/eb438ddd-2919-73d4-bd9f-b7eecdd9577a@linux.vnet.ibm.com/ [osalvador@suse.de: add feedback from Huang Ying] Link: https://lkml.kernel.org/r/20220314150945.12694-1-osalvador@suse.de Link: https://lkml.kernel.org/r/20220310120749.23077-1-osalvador@suse.de Fixes: 884a6e5d1f93b ("mm/migrate: update node demotion order on hotplug events") Signed-off-by: Oscar Salvador Reviewed-by: Baolin Wang Tested-by: Baolin Wang Reported-by: Abhishek Goel Cc: Dave Hansen Cc: "Huang, Ying" Cc: Abhishek Goel Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/migrate.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/migrate.h b/include/linux/migrate.h index db96e10eb8da..90e75d5a54d6 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -48,7 +48,15 @@ int folio_migrate_mapping(struct address_space *mapping, struct folio *newfolio, struct folio *folio, int extra_count); extern bool numa_demotion_enabled; +extern void migrate_on_reclaim_init(void); +#ifdef CONFIG_HOTPLUG_CPU +extern void set_migration_target_nodes(void); #else +static inline void set_migration_target_nodes(void) {} +#endif +#else + +static inline void set_migration_target_nodes(void) {} static inline void putback_movable_pages(struct list_head *l) {} static inline int migrate_pages(struct list_head *l, new_page_t new, -- cgit From 6eada26ffc80bfe1f2db088be0c44ec82b5cd3dc Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Tue, 22 Mar 2022 14:47:46 -0700 Subject: mm: remove usercopy_warn() Users of usercopy_warn() were removed by commit 53944f171a89 ("mm: remove HARDENED_USERCOPY_FALLBACK") Remove it. Link: https://lkml.kernel.org/r/5f26643fc70b05f8455b60b99c30c17d635fa640.1644231910.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy Reviewed-by: Miaohe Lin Reviewed-by: Stephen Kitt Reviewed-by: Muchun Song Cc: Kees Cook Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/uaccess.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h index ac0394087f7d..bca27b4e5eb2 100644 --- a/include/linux/uaccess.h +++ b/include/linux/uaccess.h @@ -401,8 +401,6 @@ static inline void user_access_restore(unsigned long flags) { } #endif #ifdef CONFIG_HARDENED_USERCOPY -void usercopy_warn(const char *name, const char *detail, bool to_user, - unsigned long offset, unsigned long len); void __noreturn usercopy_abort(const char *name, const char *detail, bool to_user, unsigned long offset, unsigned long len); -- cgit From ad7489d5262d2aa775b5e5a1782793925fa90065 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Tue, 22 Mar 2022 14:47:49 -0700 Subject: mm: uninline copy_overflow() While building a small config with CONFIG_CC_OPTIMISE_FOR_SIZE, I ended up with more than 50 times the following function in vmlinux because GCC doesn't honor the 'inline' keyword: c00243bc : c00243bc: 94 21 ff f0 stwu r1,-16(r1) c00243c0: 7c 85 23 78 mr r5,r4 c00243c4: 7c 64 1b 78 mr r4,r3 c00243c8: 3c 60 c0 62 lis r3,-16286 c00243cc: 7c 08 02 a6 mflr r0 c00243d0: 38 63 5e e5 addi r3,r3,24293 c00243d4: 90 01 00 14 stw r0,20(r1) c00243d8: 4b ff 82 45 bl c001c61c <__warn_printk> c00243dc: 0f e0 00 00 twui r0,0 c00243e0: 80 01 00 14 lwz r0,20(r1) c00243e4: 38 21 00 10 addi r1,r1,16 c00243e8: 7c 08 03 a6 mtlr r0 c00243ec: 4e 80 00 20 blr With -Winline, GCC tells: /include/linux/thread_info.h:212:20: warning: inlining failed in call to 'copy_overflow': call is unlikely and code size would grow [-Winline] copy_overflow() is a non conditional warning called by check_copy_size() on an error path. check_copy_size() have to remain inlined in order to benefit from constant folding, but copy_overflow() is not worth inlining. Uninline the warning when CONFIG_BUG is selected. When CONFIG_BUG is not selected, WARN() does nothing so skip it. This reduces the size of vmlinux by almost 4kbytes. Link: https://lkml.kernel.org/r/e1723b9cfa924bcefcd41f69d0025b38e4c9364e.1644819985.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy Cc: David Laight Cc: Anshuman Khandual Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/thread_info.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/thread_info.h b/include/linux/thread_info.h index 73a6f34b3847..9f392ec76f2b 100644 --- a/include/linux/thread_info.h +++ b/include/linux/thread_info.h @@ -209,9 +209,12 @@ __bad_copy_from(void); extern void __compiletime_error("copy destination size is too small") __bad_copy_to(void); +void __copy_overflow(int size, unsigned long count); + static inline void copy_overflow(int size, unsigned long count) { - WARN(1, "Buffer overflow detected (%d < %lu)!\n", size, count); + if (IS_ENABLED(CONFIG_BUG)) + __copy_overflow(size, count); } static __always_inline __must_check bool -- cgit From d7ca25c53e25a9a628aaa19b5a031f115d8c353d Mon Sep 17 00:00:00 2001 From: Ira Weiny Date: Tue, 22 Mar 2022 14:47:58 -0700 Subject: highmem: document kunmap_local() Some users of kmap() add an offset to the kmap() address to be used during the mapping. When converting to kmap_local_page() the base address does not need to be stored because any address within the page can be used in kunmap_local(). However, this was not clear from the documentation and cause some questions.[1] Document that any address in the page can be used in kunmap_local() to clarify this for future users. [1] https://lore.kernel.org/lkml/20211213154543.GM3538886@iweiny-DESK2.sc.intel.com/ [ira.weiny@intel.com: updates per Christoph] Link: https://lkml.kernel.org/r/20220124182138.816693-1-ira.weiny@intel.com Link: https://lkml.kernel.org/r/20220124013045.806718-1-ira.weiny@intel.com Signed-off-by: Ira Weiny Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/highmem-internal.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/highmem-internal.h b/include/linux/highmem-internal.h index 0a0b2b09b1b8..a77be5630209 100644 --- a/include/linux/highmem-internal.h +++ b/include/linux/highmem-internal.h @@ -246,6 +246,16 @@ do { \ __kunmap_atomic(__addr); \ } while (0) +/** + * kunmap_local - Unmap a page mapped via kmap_local_page(). + * @__addr: An address within the page mapped + * + * @__addr can be any address within the mapped page. Commonly it is the + * address return from kmap_local_page(), but it can also include offsets. + * + * Unmapping should be done in the reverse order of the mapping. See + * kmap_local_page() for details. + */ #define kunmap_local(__addr) \ do { \ BUILD_BUG_ON(__same_type((__addr), struct page *)); \ -- cgit From 436428255d5981e49ff015fc8e398ecf2ba10c24 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Tue, 22 Mar 2022 14:48:37 -0700 Subject: mm/damon/core: move damon_set_targets() into dbgfs damon_set_targets() function is defined in the core for general use cases, but called from only dbgfs. Also, because the function is for general use cases, dbgfs does additional handling of pid type target id case. To make the situation simpler, this commit moves the function into dbgfs and makes it to do the pid type case handling on its own. Link: https://lkml.kernel.org/r/20211230100723.2238-4-sj@kernel.org Signed-off-by: SeongJae Park Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/damon.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 5e1e3a128b77..bd021af5db3d 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -484,8 +484,6 @@ unsigned int damon_nr_regions(struct damon_target *t); struct damon_ctx *damon_new_ctx(void); void damon_destroy_ctx(struct damon_ctx *ctx); -int damon_set_targets(struct damon_ctx *ctx, - unsigned long *ids, ssize_t nr_ids); int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, unsigned long aggr_int, unsigned long primitive_upd_int, unsigned long min_nr_reg, unsigned long max_nr_reg); -- cgit From 1971bd630452e943380429336a851c55b027eed1 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Tue, 22 Mar 2022 14:48:40 -0700 Subject: mm/damon: remove the target id concept DAMON asks each monitoring target ('struct damon_target') to have one 'unsigned long' integer called 'id', which should be unique among the targets of same monitoring context. Meaning of it is, however, totally up to the monitoring primitives that registered to the monitoring context. For example, the virtual address spaces monitoring primitives treats the id as a 'struct pid' pointer. This makes the code flexible, but ugly, not well-documented, and type-unsafe[1]. Also, identification of each target can be done via its index. For the reason, this commit removes the concept and uses clear type definition. For now, only 'struct pid' pointer is used for the virtual address spaces monitoring. If DAMON is extended in future so that we need to put another identifier field in the struct, we will use a union for such primitives-dependent fields and document which primitives are using which type. [1] https://lore.kernel.org/linux-mm/20211013154535.4aaeaaf9d0182922e405dd1e@linux-foundation.org/ Link: https://lkml.kernel.org/r/20211230100723.2238-5-sj@kernel.org Signed-off-by: SeongJae Park Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/damon.h | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index bd021af5db3d..7c1d915b3587 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -60,19 +60,18 @@ struct damon_region { /** * struct damon_target - Represents a monitoring target. - * @id: Unique identifier for this target. + * @pid: The PID of the virtual address space to monitor. * @nr_regions: Number of monitoring target regions of this target. * @regions_list: Head of the monitoring target regions of this target. * @list: List head for siblings. * * Each monitoring context could have multiple targets. For example, a context * for virtual memory address spaces could have multiple target processes. The - * @id of each target should be unique among the targets of the context. For - * example, in the virtual address monitoring context, it could be a pidfd or - * an address of an mm_struct. + * @pid should be set for appropriate address space monitoring primitives + * including the virtual address spaces monitoring primitives. */ struct damon_target { - unsigned long id; + struct pid *pid; unsigned int nr_regions; struct list_head regions_list; struct list_head list; @@ -475,7 +474,7 @@ struct damos *damon_new_scheme( void damon_add_scheme(struct damon_ctx *ctx, struct damos *s); void damon_destroy_scheme(struct damos *s); -struct damon_target *damon_new_target(unsigned long id); +struct damon_target *damon_new_target(void); void damon_add_target(struct damon_ctx *ctx, struct damon_target *t); bool damon_targets_empty(struct damon_ctx *ctx); void damon_free_target(struct damon_target *t); -- cgit From f7d911c39cbbb88d625216a0cfd0517a3047c46e Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Tue, 22 Mar 2022 14:48:46 -0700 Subject: mm/damon: rename damon_primitives to damon_operations Patch series "Allow DAMON user code independent of monitoring primitives". In-kernel DAMON user code is required to configure the monitoring context (struct damon_ctx) with proper monitoring primitives (struct damon_primitive). This makes the user code dependent to all supporting monitoring primitives. For example, DAMON debugfs interface depends on both DAMON_VADDR and DAMON_PADDR, though some users have interest in only one use case. As more monitoring primitives are introduced, the problem will be bigger. To minimize such unnecessary dependency, this patchset makes monitoring primitives can be registered by the implemnting code and later dynamically searched and selected by the user code. In addition to that, this patchset renames monitoring primitives to monitoring operations, which is more easy to intuitively understand what it means and how it would be structed. This patch (of 8): DAMON has a set of callback functions called monitoring primitives and let it can be configured with various implementations for easy extension for different address spaces and usages. However, the word 'primitive' is not so explicit. Meanwhile, many other structs resembles similar purpose calls themselves 'operations'. To make the code easier to be understood, this commit renames 'damon_primitives' to 'damon_operations' before it is too late to rename. Link: https://lkml.kernel.org/r/20220215184603.1479-1-sj@kernel.org Link: https://lkml.kernel.org/r/20220215184603.1479-2-sj@kernel.org Signed-off-by: SeongJae Park Cc: Xin Hao Cc: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/damon.h | 48 ++++++++++++++++++++++++------------------------ 1 file changed, 24 insertions(+), 24 deletions(-) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 7c1d915b3587..00baeb42c18e 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -67,8 +67,8 @@ struct damon_region { * * Each monitoring context could have multiple targets. For example, a context * for virtual memory address spaces could have multiple target processes. The - * @pid should be set for appropriate address space monitoring primitives - * including the virtual address spaces monitoring primitives. + * @pid should be set for appropriate &struct damon_operations including the + * virtual address spaces monitoring operations. */ struct damon_target { struct pid *pid; @@ -120,9 +120,9 @@ enum damos_action { * uses smaller one as the effective quota. * * For selecting regions within the quota, DAMON prioritizes current scheme's - * target memory regions using the &struct damon_primitive->get_scheme_score. + * target memory regions using the &struct damon_operations->get_scheme_score. * You could customize the prioritization logic by setting &weight_sz, - * &weight_nr_accesses, and &weight_age, because monitoring primitives are + * &weight_nr_accesses, and &weight_age, because monitoring operations are * encouraged to respect those. */ struct damos_quota { @@ -256,10 +256,10 @@ struct damos { struct damon_ctx; /** - * struct damon_primitive - Monitoring primitives for given use cases. + * struct damon_operations - Monitoring operations for given use cases. * - * @init: Initialize primitive-internal data structures. - * @update: Update primitive-internal data structures. + * @init: Initialize operations-related data structures. + * @update: Update operations-related data structures. * @prepare_access_checks: Prepare next access check of target regions. * @check_accesses: Check the accesses to target regions. * @reset_aggregated: Reset aggregated accesses monitoring results. @@ -269,18 +269,18 @@ struct damon_ctx; * @cleanup: Clean up the context. * * DAMON can be extended for various address spaces and usages. For this, - * users should register the low level primitives for their target address - * space and usecase via the &damon_ctx.primitive. Then, the monitoring thread + * users should register the low level operations for their target address + * space and usecase via the &damon_ctx.ops. Then, the monitoring thread * (&damon_ctx.kdamond) calls @init and @prepare_access_checks before starting - * the monitoring, @update after each &damon_ctx.primitive_update_interval, and + * the monitoring, @update after each &damon_ctx.ops_update_interval, and * @check_accesses, @target_valid and @prepare_access_checks after each * &damon_ctx.sample_interval. Finally, @reset_aggregated is called after each * &damon_ctx.aggr_interval. * - * @init should initialize primitive-internal data structures. For example, + * @init should initialize operations-related data structures. For example, * this could be used to construct proper monitoring target regions and link * those to @damon_ctx.adaptive_targets. - * @update should update the primitive-internal data structures. For example, + * @update should update the operations-related data structures. For example, * this could be used to update monitoring target regions for current status. * @prepare_access_checks should manipulate the monitoring regions to be * prepared for the next access check. @@ -300,7 +300,7 @@ struct damon_ctx; * monitoring. * @cleanup is called from @kdamond just before its termination. */ -struct damon_primitive { +struct damon_operations { void (*init)(struct damon_ctx *context); void (*update)(struct damon_ctx *context); void (*prepare_access_checks)(struct damon_ctx *context); @@ -354,15 +354,15 @@ struct damon_callback { * * @sample_interval: The time between access samplings. * @aggr_interval: The time between monitor results aggregations. - * @primitive_update_interval: The time between monitoring primitive updates. + * @ops_update_interval: The time between monitoring operations updates. * * For each @sample_interval, DAMON checks whether each region is accessed or * not. It aggregates and keeps the access information (number of accesses to * each region) for @aggr_interval time. DAMON also checks whether the target * memory regions need update (e.g., by ``mmap()`` calls from the application, * in case of virtual memory monitoring) and applies the changes for each - * @primitive_update_interval. All time intervals are in micro-seconds. - * Please refer to &struct damon_primitive and &struct damon_callback for more + * @ops_update_interval. All time intervals are in micro-seconds. + * Please refer to &struct damon_operations and &struct damon_callback for more * detail. * * @kdamond: Kernel thread who does the monitoring. @@ -374,7 +374,7 @@ struct damon_callback { * * Once started, the monitoring thread runs until explicitly required to be * terminated or every monitoring target is invalid. The validity of the - * targets is checked via the &damon_primitive.target_valid of @primitive. The + * targets is checked via the &damon_operations.target_valid of @ops. The * termination can also be explicitly requested by writing non-zero to * @kdamond_stop. The thread sets @kdamond to NULL when it terminates. * Therefore, users can know whether the monitoring is ongoing or terminated by @@ -384,7 +384,7 @@ struct damon_callback { * Note that the monitoring thread protects only @kdamond and @kdamond_stop via * @kdamond_lock. Accesses to other fields must be protected by themselves. * - * @primitive: Set of monitoring primitives for given use cases. + * @ops: Set of monitoring operations for given use cases. * @callback: Set of callbacks for monitoring events notifications. * * @min_nr_regions: The minimum number of adaptive monitoring regions. @@ -395,17 +395,17 @@ struct damon_callback { struct damon_ctx { unsigned long sample_interval; unsigned long aggr_interval; - unsigned long primitive_update_interval; + unsigned long ops_update_interval; /* private: internal use only */ struct timespec64 last_aggregation; - struct timespec64 last_primitive_update; + struct timespec64 last_ops_update; /* public: */ struct task_struct *kdamond; struct mutex kdamond_lock; - struct damon_primitive primitive; + struct damon_operations ops; struct damon_callback callback; unsigned long min_nr_regions; @@ -484,7 +484,7 @@ unsigned int damon_nr_regions(struct damon_target *t); struct damon_ctx *damon_new_ctx(void); void damon_destroy_ctx(struct damon_ctx *ctx); int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, - unsigned long aggr_int, unsigned long primitive_upd_int, + unsigned long aggr_int, unsigned long ops_upd_int, unsigned long min_nr_reg, unsigned long max_nr_reg); int damon_set_schemes(struct damon_ctx *ctx, struct damos **schemes, ssize_t nr_schemes); @@ -497,12 +497,12 @@ int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); #ifdef CONFIG_DAMON_VADDR bool damon_va_target_valid(void *t); -void damon_va_set_primitives(struct damon_ctx *ctx); +void damon_va_set_operations(struct damon_ctx *ctx); #endif /* CONFIG_DAMON_VADDR */ #ifdef CONFIG_DAMON_PADDR bool damon_pa_target_valid(void *t); -void damon_pa_set_primitives(struct damon_ctx *ctx); +void damon_pa_set_operations(struct damon_ctx *ctx); #endif /* CONFIG_DAMON_PADDR */ #endif /* _DAMON_H */ -- cgit From 9f7b053a0f6121f89e00d1688bfca0bf278caa25 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Tue, 22 Mar 2022 14:48:49 -0700 Subject: mm/damon: let monitoring operations can be registered and selected In-kernel DAMON user code like DAMON debugfs interface should set 'struct damon_operations' of its 'struct damon_ctx' on its own. Therefore, the client code should depend on all supporting monitoring operations implementations that it could use. For example, DAMON debugfs interface depends on both vaddr and paddr, while some of the users are not always interested in both. To minimize such unnecessary dependencies, this commit makes the monitoring operations can be registered by implementing code and then dynamically selected by the user code without build-time dependency. Link: https://lkml.kernel.org/r/20220215184603.1479-3-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Xin Hao Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/damon.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 00baeb42c18e..076da277b249 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -253,11 +253,24 @@ struct damos { struct list_head list; }; +/** + * enum damon_ops_id - Identifier for each monitoring operations implementation + * + * @DAMON_OPS_VADDR: Monitoring operations for virtual address spaces + * @DAMON_OPS_PADDR: Monitoring operations for the physical address space + */ +enum damon_ops_id { + DAMON_OPS_VADDR, + DAMON_OPS_PADDR, + NR_DAMON_OPS, +}; + struct damon_ctx; /** * struct damon_operations - Monitoring operations for given use cases. * + * @id: Identifier of this operations set. * @init: Initialize operations-related data structures. * @update: Update operations-related data structures. * @prepare_access_checks: Prepare next access check of target regions. @@ -277,6 +290,8 @@ struct damon_ctx; * &damon_ctx.sample_interval. Finally, @reset_aggregated is called after each * &damon_ctx.aggr_interval. * + * Each &struct damon_operations instance having valid @id can be registered + * via damon_register_ops() and selected by damon_select_ops() later. * @init should initialize operations-related data structures. For example, * this could be used to construct proper monitoring target regions and link * those to @damon_ctx.adaptive_targets. @@ -301,6 +316,7 @@ struct damon_ctx; * @cleanup is called from @kdamond just before its termination. */ struct damon_operations { + enum damon_ops_id id; void (*init)(struct damon_ctx *context); void (*update)(struct damon_ctx *context); void (*prepare_access_checks)(struct damon_ctx *context); @@ -489,6 +505,8 @@ int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, int damon_set_schemes(struct damon_ctx *ctx, struct damos **schemes, ssize_t nr_schemes); int damon_nr_running_ctxs(void); +int damon_register_ops(struct damon_operations *ops); +int damon_select_ops(struct damon_ctx *ctx, enum damon_ops_id id); int damon_start(struct damon_ctx **ctxs, int nr_ctxs); int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); -- cgit From 851040566a008f7248cb754d5bb9a3e34f2effe5 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Tue, 22 Mar 2022 14:49:07 -0700 Subject: mm/damon/paddr,vaddr: remove damon_{p,v}a_{target_valid,set_operations}() Because DAMON debugfs interface and DAMON-based proactive reclaim are now using monitoring operations via registration mechanism, damon_{p,v}a_{target_valid,set_operations}() functions have no user. This commit clean them up. Link: https://lkml.kernel.org/r/20220215184603.1479-9-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Xin Hao Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/damon.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 076da277b249..49c4a11ecf20 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -513,14 +513,4 @@ int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); #endif /* CONFIG_DAMON */ -#ifdef CONFIG_DAMON_VADDR -bool damon_va_target_valid(void *t); -void damon_va_set_operations(struct damon_ctx *ctx); -#endif /* CONFIG_DAMON_VADDR */ - -#ifdef CONFIG_DAMON_PADDR -bool damon_pa_target_valid(void *t); -void damon_pa_set_operations(struct damon_ctx *ctx); -#endif /* CONFIG_DAMON_PADDR */ - #endif /* _DAMON_H */ -- cgit From 8b9b0d335a345cbc590baa5aa8aa02c467c1e4e8 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Tue, 22 Mar 2022 14:49:21 -0700 Subject: mm/damon/core: allow non-exclusive DAMON start/stop MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Patch series "Introduce DAMON sysfs interface", v3. Introduction ============ DAMON's debugfs-based user interface (DAMON_DBGFS) served very well, so far. However, it unnecessarily depends on debugfs, while DAMON is not aimed to be used for only debugging. Also, the interface receives multiple values via one file. For example, schemes file receives 18 values. As a result, it is inefficient, hard to be used, and difficult to be extended. Especially, keeping backward compatibility of user space tools is getting only challenging. It would be better to implement another reliable and flexible interface and deprecate DAMON_DBGFS in long term. For the reason, this patchset introduces a sysfs-based new user interface of DAMON. The idea of the new interface is, using directory hierarchies and having one dedicated file for each value. For a short example, users can do the virtual address monitoring via the interface as below: # cd /sys/kernel/mm/damon/admin/ # echo 1 > kdamonds/nr_kdamonds # echo 1 > kdamonds/0/contexts/nr_contexts # echo vaddr > kdamonds/0/contexts/0/operations # echo 1 > kdamonds/0/contexts/0/targets/nr_targets # echo $(pidof ) > kdamonds/0/contexts/0/targets/0/pid_target # echo on > kdamonds/0/state A brief representation of the files hierarchy of DAMON sysfs interface is as below. Childs are represented with indentation, directories are having '/' suffix, and files in each directory are separated by comma. /sys/kernel/mm/damon/admin │ kdamonds/nr_kdamonds │ │ 0/state,pid │ │ │ contexts/nr_contexts │ │ │ │ 0/operations │ │ │ │ │ monitoring_attrs/ │ │ │ │ │ │ intervals/sample_us,aggr_us,update_us │ │ │ │ │ │ nr_regions/min,max │ │ │ │ │ targets/nr_targets │ │ │ │ │ │ 0/pid_target │ │ │ │ │ │ │ regions/nr_regions │ │ │ │ │ │ │ │ 0/start,end │ │ │ │ │ │ │ │ ... │ │ │ │ │ │ ... │ │ │ │ │ schemes/nr_schemes │ │ │ │ │ │ 0/action │ │ │ │ │ │ │ access_pattern/ │ │ │ │ │ │ │ │ sz/min,max │ │ │ │ │ │ │ │ nr_accesses/min,max │ │ │ │ │ │ │ │ age/min,max │ │ │ │ │ │ │ quotas/ms,bytes,reset_interval_ms │ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil │ │ │ │ │ │ │ watermarks/metric,interval_us,high,mid,low │ │ │ │ │ │ │ stats/nr_tried,sz_tried,nr_applied,sz_applied,qt_exceeds │ │ │ │ │ │ ... │ │ │ │ ... │ │ ... Detailed usage of the files will be described in the final Documentation patch of this patchset. Main Difference Between DAMON_DBGFS and DAMON_SYSFS --------------------------------------------------- At the moment, DAMON_DBGFS and DAMON_SYSFS provides same features. One important difference between them is their exclusiveness. DAMON_DBGFS works in an exclusive manner, so that no DAMON worker thread (kdamond) in the system can run concurrently and interfere somehow. For the reason, DAMON_DBGFS asks users to construct all monitoring contexts and start them at once. It's not a big problem but makes the operation a little bit complex and unflexible. For more flexible usage, DAMON_SYSFS moves the responsibility of preventing any possible interference to the admins and work in a non-exclusive manner. That is, users can configure and start contexts one by one. Note that DAMON respects both exclusive groups and non-exclusive groups of contexts, in a manner similar to that of reader-writer locks. That is, if any exclusive monitoring contexts (e.g., contexts that started via DAMON_DBGFS) are running, DAMON_SYSFS does not start new contexts, and vice versa. Future Plan of DAMON_DBGFS Deprecation ====================================== Once this patchset is merged, DAMON_DBGFS development will be frozen. That is, we will maintain it to work as is now so that no users will be break. But, it will not be extended to provide any new feature of DAMON. The support will be continued only until next LTS release. After that, we will drop DAMON_DBGFS. User-space Tooling Compatibility -------------------------------- As DAMON_SYSFS provides all features of DAMON_DBGFS, all user space tooling can move to DAMON_SYSFS. As we will continue supporting DAMON_DBGFS until next LTS kernel release, user space tools would have enough time to move to DAMON_SYSFS. The official user space tool, damo[1], is already supporting both DAMON_SYSFS and DAMON_DBGFS. Both correctness tests[2] and performance tests[3] of DAMON using DAMON_SYSFS also passed. [1] https://github.com/awslabs/damo [2] https://github.com/awslabs/damon-tests/tree/master/corr [3] https://github.com/awslabs/damon-tests/tree/master/perf Sequence of Patches =================== First two patches (patches 1-2) make core changes for DAMON_SYSFS. The first one (patch 1) allows non-exclusive DAMON contexts so that DAMON_SYSFS can work in non-exclusive mode, while the second one (patch 2) adds size of DAMON enum types so that DAMON API users can safely iterate the enums. Third patch (patch 3) implements basic sysfs stub for virtual address spaces monitoring. Note that this implements only sysfs files and DAMON is not linked. Fourth patch (patch 4) links the DAMON_SYSFS to DAMON so that users can control DAMON using the sysfs files. Following six patches (patches 5-10) implements other DAMON features that DAMON_DBGFS supports one by one (physical address space monitoring, DAMON-based operation schemes, schemes quotas, schemes prioritization weights, schemes watermarks, and schemes stats). Following patch (patch 11) adds a simple selftest for DAMON_SYSFS, and the final one (patch 12) documents DAMON_SYSFS. This patch (of 13): To avoid interference between DAMON contexts monitoring overlapping memory regions, damon_start() works in an exclusive manner. That is, damon_start() does nothing bug fails if any context that started by another instance of the function is still running. This makes its usage a little bit restrictive. However, admins could aware each DAMON usage and address such interferences on their own in some cases. This commit hence implements non-exclusive mode of the function and allows the callers to select the mode. Note that the exclusive groups and non-exclusive groups of contexts will respect each other in a manner similar to that of reader-writer locks. Therefore, this commit will not cause any behavioral change to the exclusive groups. Link: https://lkml.kernel.org/r/20220228081314.5770-1-sj@kernel.org Link: https://lkml.kernel.org/r/20220228081314.5770-2-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Cc: Shuah Khan Cc: David Rientjes Cc: Xin Hao Cc: Greg Kroah-Hartman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/damon.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 49c4a11ecf20..f8e99e47d747 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -508,7 +508,7 @@ int damon_nr_running_ctxs(void); int damon_register_ops(struct damon_operations *ops); int damon_select_ops(struct damon_ctx *ctx, enum damon_ops_id id); -int damon_start(struct damon_ctx **ctxs, int nr_ctxs); +int damon_start(struct damon_ctx **ctxs, int nr_ctxs, bool exclusive); int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); #endif /* CONFIG_DAMON */ -- cgit From 5257f36ec289d544532f2889cbed11abbb06cf0c Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Tue, 22 Mar 2022 14:49:24 -0700 Subject: mm/damon/core: add number of each enum type values This commit declares the number of legal values for each DAMON enum types to make traversals of such DAMON enum types easy and safe. Link: https://lkml.kernel.org/r/20220228081314.5770-3-sj@kernel.org Signed-off-by: SeongJae Park Cc: David Rientjes Cc: Greg Kroah-Hartman Cc: Jonathan Corbet Cc: Shuah Khan Cc: Xin Hao Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/damon.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index f8e99e47d747..f23cbfa4248d 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -87,6 +87,7 @@ struct damon_target { * @DAMOS_HUGEPAGE: Call ``madvise()`` for the region with MADV_HUGEPAGE. * @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE. * @DAMOS_STAT: Do nothing but count the stat. + * @NR_DAMOS_ACTIONS: Total number of DAMOS actions */ enum damos_action { DAMOS_WILLNEED, @@ -95,6 +96,7 @@ enum damos_action { DAMOS_HUGEPAGE, DAMOS_NOHUGEPAGE, DAMOS_STAT, /* Do nothing but only record the stat */ + NR_DAMOS_ACTIONS, }; /** @@ -157,10 +159,12 @@ struct damos_quota { * * @DAMOS_WMARK_NONE: Ignore the watermarks of the given scheme. * @DAMOS_WMARK_FREE_MEM_RATE: Free memory rate of the system in [0,1000]. + * @NR_DAMOS_WMARK_METRICS: Total number of DAMOS watermark metrics */ enum damos_wmark_metric { DAMOS_WMARK_NONE, DAMOS_WMARK_FREE_MEM_RATE, + NR_DAMOS_WMARK_METRICS, }; /** -- cgit From 2af7e566a8616c278e1d7287ce86cd3900bed943 Mon Sep 17 00:00:00 2001 From: Saeed Mahameed Date: Tue, 22 Mar 2022 10:22:24 -0700 Subject: net/mlx5e: Fix build warning, detected write beyond size of field When merged with Linus tree, the cited patch below will cause the following build warning: In function 'fortify_memset_chk', inlined from 'mlx5e_xmit_xdp_frame' at drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c:438:3: include/linux/fortify-string.h:242:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning] 242 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix that by grouping the fields to memeset in struct_group() to avoid the false alarm. Fixes: 9ded70fa1d81 ("net/mlx5e: Don't prefill WQEs in XDP SQ in the multi buffer mode") Reported-by: Stephen Rothwell Suggested-by: Stephen Rothwell Signed-off-by: Saeed Mahameed Link: https://lore.kernel.org/r/20220322172224.31849-1-saeed@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/mlx5/qp.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h index 61e48d459b23..8bda3ba5b109 100644 --- a/include/linux/mlx5/qp.h +++ b/include/linux/mlx5/qp.h @@ -202,6 +202,9 @@ struct mlx5_wqe_fmr_seg { struct mlx5_wqe_ctrl_seg { __be32 opmod_idx_opcode; __be32 qpn_ds; + + struct_group(trailer, + u8 signature; u8 rsvd[2]; u8 fm_ce_se; @@ -211,6 +214,8 @@ struct mlx5_wqe_ctrl_seg { __be32 umr_mkey; __be32 tis_tir_num; }; + + ); /* end of trailer group */ }; #define MLX5_WQE_CTRL_DS_MASK 0x3f -- cgit From d578c770c85233af592e54537f93f3831bde7e9a Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Wed, 23 Mar 2022 09:13:08 +0800 Subject: block: avoid calling blkg_free() in atomic context blkg_free() can currently be called in atomic context, either spin lock is held, or run in rcu callback. Meantime either request queue's release handler or ->pd_free_fn can sleep. Fix the issue by scheduling a work function for freeing blkcg_gq the instance. [ 148.553894] BUG: sleeping function called from invalid context at block/blk-sysfs.c:767 [ 148.557381] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 0, name: swapper/13 [ 148.560741] preempt_count: 101, expected: 0 [ 148.562577] RCU nest depth: 0, expected: 0 [ 148.564379] 1 lock held by swapper/13/0: [ 148.566127] #0: ffffffff82615f80 (rcu_callback){....}-{0:0}, at: rcu_lock_acquire+0x0/0x1b [ 148.569640] Preemption disabled at: [ 148.569642] [] ___slab_alloc+0x554/0x661 [ 148.573559] CPU: 13 PID: 0 Comm: swapper/13 Kdump: loaded Not tainted 5.17.0_up+ #110 [ 148.576834] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-1.fc33 04/01/2014 [ 148.579768] Call Trace: [ 148.580567] [ 148.581262] dump_stack_lvl+0x56/0x7c [ 148.582367] ? ___slab_alloc+0x554/0x661 [ 148.583526] __might_resched+0x1af/0x1c8 [ 148.584678] blk_release_queue+0x24/0x109 [ 148.585861] kobject_cleanup+0xc9/0xfe [ 148.586979] blkg_free+0x46/0x63 [ 148.587962] rcu_do_batch+0x1c5/0x3db [ 148.589057] rcu_core+0x14a/0x184 [ 148.590065] __do_softirq+0x14d/0x2c7 [ 148.591167] __irq_exit_rcu+0x7a/0xd4 [ 148.592264] sysvec_apic_timer_interrupt+0x82/0xa5 [ 148.593649] [ 148.594354] [ 148.595058] asm_sysvec_apic_timer_interrupt+0x12/0x20 Cc: Tejun Heo Fixes: 0a9a25ca7843 ("block: let blkcg_gq grab request queue's refcnt") Reported-by: Christoph Hellwig Link: https://lore.kernel.org/linux-block/20220322093322.GA27283@lst.de/ Signed-off-by: Ming Lei Link: https://lore.kernel.org/r/20220323011308.2010380-1-ming.lei@redhat.com Signed-off-by: Jens Axboe --- include/linux/blk-cgroup.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index f2ad8ed8f777..652cd05b0924 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -95,7 +95,10 @@ struct blkcg_gq { spinlock_t async_bio_lock; struct bio_list async_bios; - struct work_struct async_bio_work; + union { + struct work_struct async_bio_work; + struct work_struct free_work; + }; atomic_t use_delay; atomic64_t delay_nsec; -- cgit From 553f685ebf9689e701ec6b95c0ca3595cf1a743f Mon Sep 17 00:00:00 2001 From: YueHaibing Date: Fri, 11 Mar 2022 20:55:18 +0800 Subject: mfd: db8500-prcmu: Remove unused inline function commit b0e846248de5 ("mfd: db8500-prcmu: Remove dead code for a non-existing config") left behind this, remove it. Signed-off-by: YueHaibing Reviewed-by: Linus Walleij Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220311125518.31064-1-yuehaibing@huawei.com --- include/linux/mfd/dbx500-prcmu.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mfd/dbx500-prcmu.h b/include/linux/mfd/dbx500-prcmu.h index 2a255362b5ac..e7a7e70fdb38 100644 --- a/include/linux/mfd/dbx500-prcmu.h +++ b/include/linux/mfd/dbx500-prcmu.h @@ -561,10 +561,6 @@ static inline unsigned long prcmu_qos_get_cpufreq_opp_delay(void) return 0; } -static inline void prcmu_qos_set_cpufreq_opp_delay(unsigned long n) {} - -static inline void prcmu_qos_force_opp(int prcmu_qos_class, s32 i) {} - static inline int prcmu_qos_requirement(int prcmu_qos_class) { return 0; -- cgit From 30d024b5058e0433914022f87d917a97a9527632 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Wed, 23 Mar 2022 15:35:10 +1200 Subject: cacheflush.h: Add forward declaration for struct folio The struct folio is not declared in cacheflush.h so we need to provide a forward declaration as otherwise users of this header file may get warnings. Reported-by: Guenter Roeck Fixes: 522a0032af00 ("Add linux/cacheflush.h") Signed-off-by: Herbert Xu Reviewed-by: Matthew Wilcox (Oracle) Signed-off-by: Linus Torvalds --- include/linux/cacheflush.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cacheflush.h b/include/linux/cacheflush.h index fef8b607f97e..a6189d21f2ba 100644 --- a/include/linux/cacheflush.h +++ b/include/linux/cacheflush.h @@ -4,6 +4,8 @@ #include +struct folio; + #if ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE #ifndef ARCH_IMPLEMENTS_FLUSH_DCACHE_FOLIO void flush_dcache_folio(struct folio *folio); -- cgit From d91612d7f01aca454469976d25db761c5085ae4d Mon Sep 17 00:00:00 2001 From: Samuel Holland Date: Wed, 2 Feb 2022 20:17:35 -0600 Subject: clk: sunxi-ng: Add support for the sun6i RTC clocks The RTC power domain in sun6i and newer SoCs manages the 16 MHz RC oscillator (called "IOSC" or "osc16M") and the optional 32 kHz crystal oscillator (called "LOSC" or "osc32k"). Starting with the H6, this power domain also handles the 24 MHz DCXO (called variously "HOSC", "dcxo24M", or "osc24M") as well. The H6 also adds a calibration circuit for IOSC. Later SoCs introduce further variations on the design: - H616 adds an additional mux for the 32 kHz fanout source. - R329 adds an additional mux for the RTC timekeeping clock, a clock for the SPI bus between power domains inside the RTC, and removes the IOSC calibration functionality. Take advantage of the CCU framework to handle this increased complexity. This driver is intended to be a drop-in replacement for the existing RTC clock provider. So some runtime adjustment of the clock parents is needed, both to handle hardware differences, and to support the old binding which omitted some of the input clocks. Signed-off-by: Samuel Holland Acked-by: Stephen Boyd Signed-off-by: Alexandre Belloni Link: https://lore.kernel.org/r/20220203021736.13434-6-samuel@sholland.org --- include/linux/clk/sunxi-ng.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/clk/sunxi-ng.h b/include/linux/clk/sunxi-ng.h index cf32123b39f5..57c8ec44ab4e 100644 --- a/include/linux/clk/sunxi-ng.h +++ b/include/linux/clk/sunxi-ng.h @@ -9,4 +9,6 @@ int sunxi_ccu_set_mmc_timing_mode(struct clk *clk, bool new_mode); int sunxi_ccu_get_mmc_timing_mode(struct clk *clk); +int sun6i_rtc_ccu_probe(struct device *dev, void __iomem *reg); + #endif -- cgit From 5c0a04a663019dd14cb000d370cff0ced6df7ef1 Mon Sep 17 00:00:00 2001 From: Alexandre Belloni Date: Wed, 9 Mar 2022 17:22:33 +0100 Subject: rtc: ds1685: drop no_irq No platforms are currently setting no_irq. Anyway, letting platform_get_irq fail is fine as this means that there is no IRQ. In that case, clear RTC_FEATURE_ALARM so the core knows there are no alarms. Signed-off-by: Alexandre Belloni Link: https://lore.kernel.org/r/20220309162301.61679-2-alexandre.belloni@bootlin.com --- include/linux/rtc/ds1685.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/rtc/ds1685.h b/include/linux/rtc/ds1685.h index 67ee9d20cc5a..5a41c3bbcbe3 100644 --- a/include/linux/rtc/ds1685.h +++ b/include/linux/rtc/ds1685.h @@ -46,7 +46,6 @@ struct ds1685_priv { u32 regstep; int irq_num; bool bcd_mode; - bool no_irq; u8 (*read)(struct ds1685_priv *, int); void (*write)(struct ds1685_priv *, int, u8); void (*prepare_poweroff)(void); -- cgit From 1a31d63632553a54af6c0c3c5b5930e931a94ee4 Mon Sep 17 00:00:00 2001 From: Alexandre Belloni Date: Wed, 9 Mar 2022 17:23:00 +0100 Subject: rtc: remove uie_unsupported uie_unsupported is not used by any drivers anymore, remove it. Signed-off-by: Alexandre Belloni Link: https://lore.kernel.org/r/20220309162301.61679-29-alexandre.belloni@bootlin.com --- include/linux/rtc.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rtc.h b/include/linux/rtc.h index 47fd1c2d3a57..1fd9c6a21ebe 100644 --- a/include/linux/rtc.h +++ b/include/linux/rtc.h @@ -110,8 +110,6 @@ struct rtc_device { struct hrtimer pie_timer; /* sub second exp, so needs hrtimer */ int pie_enabled; struct work_struct irqwork; - /* Some hardware can't support UIE mode */ - int uie_unsupported; /* * This offset specifies the update timing of the RTC. -- cgit From de7a9e949f4f094741f708cd05572f932d009d02 Mon Sep 17 00:00:00 2001 From: Kajol Jain Date: Wed, 23 Mar 2022 22:15:49 +0530 Subject: drivers/nvdimm: Fix build failure when CONFIG_PERF_EVENTS is not set The following build failure occurs when CONFIG_PERF_EVENTS is not set as generic pmu functions are not visible in that scenario. |-- s390-randconfig-r044-20220313 | |-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_migrate_context | |-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_register | `-- nd_perf.c:(.text):undefined-reference-to-perf_pmu_unregister Similar build failure in nds32 architecture: nd_perf.c:(.text+0x21e): undefined reference to `perf_pmu_migrate_context' nd_perf.c:(.text+0x434): undefined reference to `perf_pmu_register' nd_perf.c:(.text+0x57c): undefined reference to `perf_pmu_unregister' Fix this issue by adding check for CONFIG_PERF_EVENTS config option and disabling the nvdimm perf interface incase this config is not set. Also remove function declaration of perf_pmu_migrate_context, perf_pmu_register, perf_pmu_unregister functions from nd.h as these are common pmu functions which are part of perf_event.h and since we are disabling nvdimm perf interface incase CONFIG_PERF_EVENTS option is not set, we not need to declare them in nd.h Also move the platform_device header file addition part from nd.h to nd_perf.c and add stub functions for register_nvdimm_pmu and unregister_nvdimm_pmu functions to handle CONFIG_PERF_EVENTS=n case. Fixes: 0fab1ba6ad6b ("drivers/nvdimm: Add perf interface to expose nvdimm performance stats") (Commit id based on libnvdimm-for-next tree) Signed-off-by: Kajol Jain Link: https://lore.kernel.org/all/62317124.YBQFU33+s%2FwdvWGj%25lkp@intel.com/ Reported-by: kernel test robot Link: https://lore.kernel.org/r/20220323164550.109768-1-kjain@linux.ibm.com Signed-off-by: Dan Williams --- include/linux/nd.h | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nd.h b/include/linux/nd.h index 7b2ccbdc1cbc..b9771ba1ef87 100644 --- a/include/linux/nd.h +++ b/include/linux/nd.h @@ -9,7 +9,6 @@ #include #include #include -#include enum nvdimm_event { NVDIMM_REVALIDATE_POISON, @@ -57,15 +56,24 @@ struct nvdimm_pmu { struct cpumask arch_cpumask; }; +struct platform_device; + +#ifdef CONFIG_PERF_EVENTS extern ssize_t nvdimm_events_sysfs_show(struct device *dev, struct device_attribute *attr, char *page); int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm, struct platform_device *pdev); void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu); -void perf_pmu_migrate_context(struct pmu *pmu, int src_cpu, int dst_cpu); -int perf_pmu_register(struct pmu *pmu, const char *name, int type); -void perf_pmu_unregister(struct pmu *pmu); + +#else +static inline int register_nvdimm_pmu(struct nvdimm_pmu *nvdimm, struct platform_device *pdev) +{ + return -ENXIO; +} + +static inline void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu) { } +#endif struct nd_device_driver { struct device_driver drv; -- cgit From 179fd6ba3bacbf7b19cbdf9b14be109d54318394 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Wed, 23 Mar 2022 16:05:32 -0700 Subject: Documentation/sparse: add hints about __CHECKER__ Several attributes depend on __CHECKER__, but previously there was no clue in the tree about when __CHECKER__ might be defined. Add hints at the most common places (__kernel, __user, __iomem, __bitwise) and in the sparse documentation. Link: https://lkml.kernel.org/r/20220310220927.245704-3-helgaas@kernel.org Signed-off-by: Bjorn Helgaas Cc: Jonathan Corbet Cc: Nathan Chancellor Cc: Nick Desaulniers Cc: "Michael S . Tsirkin" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/compiler_types.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h index 3c1795fdb568..4b3915ce38a4 100644 --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -4,6 +4,7 @@ #ifndef __ASSEMBLY__ +/* sparse defines __CHECKER__; see Documentation/dev-tools/sparse.rst */ #ifdef __CHECKER__ /* address spaces */ # define __kernel __attribute__((address_space(0))) -- cgit From 14e83077d55ff4b88fe39f5e98fb8230c2ccb4fb Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Wed, 23 Mar 2022 16:05:41 -0700 Subject: include: drop pointless __compiler_offsetof indirection (1) compiler_types.h is unconditionally included via an -include flag (see scripts/Makefile.lib), and it defines __compiler_offsetof unconditionally. So testing for definedness of __compiler_offsetof is mostly pointless. (2) Every relevant compiler provides __builtin_offsetof (even sparse has had that for 14 years), and if for whatever reason one would end up picking up the poor man's fallback definition (C file compiler with completely custom CFLAGS?), newer clang versions won't treat the result as an Integer Constant Expression, so if used in place where such is required (static initializer or static_assert), one would get errors like t.c:11:16: error: static_assert expression is not an integral constant expression t.c:11:16: note: cast that performs the conversions of a reinterpret_cast is not allowed in a constant expression t.c:4:33: note: expanded from macro 'offsetof' #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER) So just define offsetof unconditionally and directly in terms of __builtin_offsetof. Link: https://lkml.kernel.org/r/20220202102147.326672-1-linux@rasmusvillemoes.dk Signed-off-by: Rasmus Villemoes Reviewed-by: Miguel Ojeda Reviewed-by: Nathan Chancellor Reviewed-by: Kees Cook Acked-by: Nick Desaulniers Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/compiler_types.h | 2 -- include/linux/stddef.h | 6 +----- 2 files changed, 1 insertion(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h index 4b3915ce38a4..232dbd97f8b1 100644 --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -138,8 +138,6 @@ struct ftrace_likely_data { */ #define __naked __attribute__((__naked__)) notrace -#define __compiler_offsetof(a, b) __builtin_offsetof(a, b) - /* * Prefer gnu_inline, so that extern inline functions do not emit an * externally visible function. This makes extern inline behave as per gnu89 diff --git a/include/linux/stddef.h b/include/linux/stddef.h index ca507bd5f808..929d67710cc5 100644 --- a/include/linux/stddef.h +++ b/include/linux/stddef.h @@ -13,11 +13,7 @@ enum { }; #undef offsetof -#ifdef __compiler_offsetof -#define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER) -#else -#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER) -#endif +#define offsetof(TYPE, MEMBER) __builtin_offsetof(TYPE, MEMBER) /** * sizeof_field() - Report the size of a struct field in bytes -- cgit From f334f5668bedf7307f6df1d98b14f55902931926 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Wed, 23 Mar 2022 16:05:44 -0700 Subject: ilog2: force inlining of __ilog2_u32() and __ilog2_u64() Building a kernel with CONFIG_CC_OPTIMISE_FOR_SIZE leads to __ilog2_u32() being duplicated 50 times and __ilog2_u64() 3 times in vmlinux on a tiny powerpc32 config. __ilog2_u32() being 2 instructions it is not worth being kept out of line, so force inlining. Allthough the u64 version is a bit bigger, there is still a small benefit in keeping it inlined. On a 64 bits config there's a real benefit. With this change the size of vmlinux text is reduced by 1 kbytes, which is approx 50% more than the size of the removed functions. Before the patch there is for instance: c00d2a94 <__ilog2_u32>: c00d2a94: 7c 63 00 34 cntlzw r3,r3 c00d2a98: 20 63 00 1f subfic r3,r3,31 c00d2a9c: 4e 80 00 20 blr c00d36d8 <__order_base_2>: c00d36d8: 28 03 00 01 cmplwi r3,1 c00d36dc: 40 81 00 2c ble c00d3708 <__order_base_2+0x30> c00d36e0: 94 21 ff f0 stwu r1,-16(r1) c00d36e4: 7c 08 02 a6 mflr r0 c00d36e8: 38 63 ff ff addi r3,r3,-1 c00d36ec: 90 01 00 14 stw r0,20(r1) c00d36f0: 4b ff f3 a5 bl c00d2a94 <__ilog2_u32> c00d36f4: 80 01 00 14 lwz r0,20(r1) c00d36f8: 38 63 00 01 addi r3,r3,1 c00d36fc: 7c 08 03 a6 mtlr r0 c00d3700: 38 21 00 10 addi r1,r1,16 c00d3704: 4e 80 00 20 blr c00d3708: 38 60 00 00 li r3,0 c00d370c: 4e 80 00 20 blr With the patch it has become: c00d356c <__order_base_2>: c00d356c: 28 03 00 01 cmplwi r3,1 c00d3570: 40 81 00 14 ble c00d3584 <__order_base_2+0x18> c00d3574: 38 63 ff ff addi r3,r3,-1 c00d3578: 7c 63 00 34 cntlzw r3,r3 c00d357c: 20 63 00 20 subfic r3,r3,32 c00d3580: 4e 80 00 20 blr c00d3584: 38 60 00 00 li r3,0 c00d3588: 4e 80 00 20 blr No more need for __order_base_2() to setup a stack frame and save/restore caller address. And the following 'add 1' is merged in the subtract. Another typical use of it: c080ff28 : ... c080fff8: 7f c3 f3 78 mr r3,r30 c080fffc: 4b 8f 81 f1 bl c01081ec <__ilog2_u32> c0810000: 38 63 ff f2 addi r3,r3,-14 ... Becomes c080ff1c : ... c080ffec: 7f c3 00 34 cntlzw r3,r30 c080fff0: 20 63 00 11 subfic r3,r3,17 ... Here no need to move r30 argument to r3 then substract 14 to result. Just work on r30 and merge the 'sub 14' with the 'sub from 31'. Link: https://lkml.kernel.org/r/803a2ac3d923ebcfd0dd40f5886b05cae7bb0aba.1644243860.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/log2.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/log2.h b/include/linux/log2.h index df0b155c2141..9f30d087a128 100644 --- a/include/linux/log2.h +++ b/include/linux/log2.h @@ -18,7 +18,7 @@ * - the arch is not required to handle n==0 if implementing the fallback */ #ifndef CONFIG_ARCH_HAS_ILOG2_U32 -static inline __attribute__((const)) +static __always_inline __attribute__((const)) int __ilog2_u32(u32 n) { return fls(n) - 1; @@ -26,7 +26,7 @@ int __ilog2_u32(u32 n) #endif #ifndef CONFIG_ARCH_HAS_ILOG2_U64 -static inline __attribute__((const)) +static __always_inline __attribute__((const)) int __ilog2_u64(u64 n) { return fls64(n) - 1; -- cgit From 25cb5b7ac6a7f7943e22a6831b74a6aa964d2a0c Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 23 Mar 2022 16:05:47 -0700 Subject: bitfield: add explicit inclusions to the example MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit It's not obvious that bitfield.h doesn't guarantee the bits.h inclusion and the example in the former is confusing. Some developers think that it's okay to just include bitfield.h to get it working. Change example to explicitly include necessary headers in order to avoid confusion. Link: https://lkml.kernel.org/r/20220207123341.47533-1-andriy.shevchenko@linux.intel.com Fixes: 3e9b3112ec74 ("add basic register-field manipulation macros") Depends-on: 8bd9cb51daac ("locking/atomics, asm-generic: Move some macros from to a new file") Signed-off-by: Andy Shevchenko Reported-by: Jan Dąbroś Cc: Peter Zijlstra Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/bitfield.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bitfield.h b/include/linux/bitfield.h index 6093fa6db260..c9be1657f03d 100644 --- a/include/linux/bitfield.h +++ b/include/linux/bitfield.h @@ -19,6 +19,9 @@ * * Example: * + * #include + * #include + * * #define REG_FIELD_A GENMASK(6, 0) * #define REG_FIELD_B BIT(7) * #define REG_FIELD_C GENMASK(15, 8) -- cgit From abc7da58c4b388ab9f6a756c0f297c29d3af665f Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Wed, 23 Mar 2022 16:06:11 -0700 Subject: init.h: improve __setup and early_param documentation Igor noted in [1] that there are quite a few __setup() handling functions that return incorrect values. Doing this can be harmless, but it can also cause strings to be added to init's argument or environment list, polluting them. Since __setup() handling and return values are not documented, first add documentation for that. Also add more documentation for early_param() handling and return values. For __setup() functions, returning 0 (not handled) has questionable value if it is just a malformed option value, as in rodata=junk since returning 0 would just cause "rodata=junk" to be added to init's environment unnecessarily: Run /sbin/init as init process with arguments: /sbin/init with environment: HOME=/ TERM=linux splash=native rodata=junk Also, there are no recommendations on whether to print a warning when an unknown parameter value is seen. I am not addressing that here. [1] lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru Link: https://lkml.kernel.org/r/20220221050852.1147-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap Reported-by: Igor Zhbanov Cc: Ingo Molnar Cc: Greg Kroah-Hartman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/init.h | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/init.h b/include/linux/init.h index d82b4b2e1d25..baf0b29a7010 100644 --- a/include/linux/init.h +++ b/include/linux/init.h @@ -320,12 +320,19 @@ struct obs_kernel_param { __aligned(__alignof__(struct obs_kernel_param)) \ = { __setup_str_##unique_id, fn, early } +/* + * NOTE: __setup functions return values: + * @fn returns 1 (or non-zero) if the option argument is "handled" + * and returns 0 if the option argument is "not handled". + */ #define __setup(str, fn) \ __setup_param(str, fn, fn, 0) /* - * NOTE: fn is as per module_param, not __setup! - * Emits warning if fn returns non-zero. + * NOTE: @fn is as per module_param, not __setup! + * I.e., @fn returns 0 for no error or non-zero for error + * (possibly @fn returns a -errno value, but it does not matter). + * Emits warning if @fn returns non-zero. */ #define early_param(str, fn) \ __setup_param(str, fn, fn, 1) -- cgit From f05fa10901aa4b11a84257ecb7ac7f8fe7c4ee1b Mon Sep 17 00:00:00 2001 From: Jisheng Zhang Date: Wed, 23 Mar 2022 16:06:33 -0700 Subject: kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible Patch series "kexec: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef", v2. Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE" by a check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code and increase compile coverage. I only modified x86, arm, arm64 and riscv, other architectures such as sh, powerpc and s390 are better to be kept kexec code as-is so they are not touched. This patch (of 5): Make the forward declarations of crashk_res, crashk_low_res and crash_notes always visible. Code referring to these symbols can then just check for IS_ENABLED(CONFIG_KEXEC_CORE), instead of requiring conditional compilation using an #ifdef, thus preparing to increase compile coverage and simplify the code. Link: https://lkml.kernel.org/r/20211206160514.2000-1-jszhang@kernel.org Link: https://lkml.kernel.org/r/20211206160514.2000-2-jszhang@kernel.org Signed-off-by: Jisheng Zhang Acked-by: Baoquan He Cc: Russell King Cc: Catalin Marinas Cc: Will Deacon Cc: Paul Walmsley Cc: Palmer Dabbelt Cc: Albert Ou Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: Dave Hansen Cc: H. Peter Anvin Cc: Eric W. Biederman Cc: Alexandre Ghiti Cc: Palmer Dabbelt Cc: Russell King (Oracle) Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kexec.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 0c994ae37729..58d1b58a971e 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -20,6 +20,12 @@ #include +/* Location of a reserved region to hold the crash kernel. + */ +extern struct resource crashk_res; +extern struct resource crashk_low_res; +extern note_buf_t __percpu *crash_notes; + #ifdef CONFIG_KEXEC_CORE #include #include @@ -350,12 +356,6 @@ extern int kexec_load_disabled; #define KEXEC_FILE_FLAGS (KEXEC_FILE_UNLOAD | KEXEC_FILE_ON_CRASH | \ KEXEC_FILE_NO_INITRAMFS) -/* Location of a reserved region to hold the crash kernel. - */ -extern struct resource crashk_res; -extern struct resource crashk_low_res; -extern note_buf_t __percpu *crash_notes; - /* flag to track if kexec reboot is in progress */ extern bool kexec_in_progress; -- cgit From 548e7432dc2da475a18077b612e8d55b8ff51891 Mon Sep 17 00:00:00 2001 From: Christian König Date: Fri, 24 Sep 2021 10:55:45 +0200 Subject: dma-buf: add dma_resv_replace_fences v2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This function allows to replace fences from the shared fence list when we can gurantee that the operation represented by the original fence has finished or no accesses to the resources protected by the dma_resv object any more when the new fence finishes. Then use this function in the amdkfd code when BOs are unmapped from the process. v2: add an example when this is usefull. Signed-off-by: Christian König Reviewed-by: Felix Kuehling Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220321135856.1331-1-christian.koenig@amd.com --- include/linux/dma-resv.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index afdfdfac729f..3f53177bdb46 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -468,6 +468,8 @@ void dma_resv_init(struct dma_resv *obj); void dma_resv_fini(struct dma_resv *obj); int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences); void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence); +void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context, + struct dma_fence *fence); void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence); int dma_resv_get_fences(struct dma_resv *obj, bool write, unsigned int *num_fences, struct dma_fence ***fences); -- cgit From 8938d48451f5d7cb565dfa68aa0bd0e81985da09 Mon Sep 17 00:00:00 2001 From: Christian König Date: Fri, 24 Sep 2021 14:19:22 +0200 Subject: dma-buf: finally make the dma_resv_list private v2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Drivers should never touch this directly. v2: drop kerneldoc for now internal handling Signed-off-by: Christian König Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220321135856.1331-2-christian.koenig@amd.com --- include/linux/dma-resv.h | 26 +------------------------- 1 file changed, 1 insertion(+), 25 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 3f53177bdb46..202cc65d0621 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -47,18 +47,7 @@ extern struct ww_class reservation_ww_class; -/** - * struct dma_resv_list - a list of shared fences - * @rcu: for internal use - * @shared_count: table of shared fences - * @shared_max: for growing shared fence table - * @shared: shared fence table - */ -struct dma_resv_list { - struct rcu_head rcu; - u32 shared_count, shared_max; - struct dma_fence __rcu *shared[]; -}; +struct dma_resv_list; /** * struct dma_resv - a reservation object manages fences for a buffer @@ -451,19 +440,6 @@ dma_resv_excl_fence(struct dma_resv *obj) return rcu_dereference_check(obj->fence_excl, dma_resv_held(obj)); } -/** - * dma_resv_shared_list - get the reservation object's shared fence list - * @obj: the reservation object - * - * Returns the shared fence list. Caller must either hold the objects - * through dma_resv_lock() or the RCU read side lock through rcu_read_lock(), - * or one of the variants of each - */ -static inline struct dma_resv_list *dma_resv_shared_list(struct dma_resv *obj) -{ - return rcu_dereference_check(obj->fence, dma_resv_held(obj)); -} - void dma_resv_init(struct dma_resv *obj); void dma_resv_fini(struct dma_resv *obj); int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences); -- cgit From d645552e9bd96671079b27015294ec7f9748fa2b Mon Sep 17 00:00:00 2001 From: Phil Sutter Date: Thu, 24 Mar 2022 15:02:40 +0100 Subject: netfilter: egress: Report interface as outgoing Otherwise packets in egress chains seem like they are being received by the interface, not sent out via it. Fixes: 42df6e1d221dd ("netfilter: Introduce egress hook") Signed-off-by: Phil Sutter Signed-off-by: Florian Westphal --- include/linux/netfilter_netdev.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netfilter_netdev.h b/include/linux/netfilter_netdev.h index e6487a691136..8676316547cc 100644 --- a/include/linux/netfilter_netdev.h +++ b/include/linux/netfilter_netdev.h @@ -99,7 +99,7 @@ static inline struct sk_buff *nf_hook_egress(struct sk_buff *skb, int *rc, return skb; nf_hook_state_init(&state, NF_NETDEV_EGRESS, - NFPROTO_NETDEV, dev, NULL, NULL, + NFPROTO_NETDEV, NULL, dev, NULL, dev_net(dev), NULL); /* nf assumes rcu_read_lock, not just read_lock_bh */ -- cgit From bb43b14b576228c580bdc7e1aeeded54d540b5ef Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Thu, 24 Mar 2022 18:09:49 -0700 Subject: mm: delete __ClearPageWaiters() The PG_waiters bit is not included in PAGE_FLAGS_CHECK_AT_FREE, and vmscan.c's free_unref_page_list() callers rely on that not to generate bad_page() alerts. So __page_cache_release(), put_pages_list() and release_pages() (and presumably copy-and-pasted free_zone_device_page()) are redundant and misleading to make a special point of clearing it (as the "__" implies, it could only safely be used on the freeing path). Delete __ClearPageWaiters(). Remark on this in one of the "possible" comments in folio_wake_bit(), and delete the superfluous comments. Link: https://lkml.kernel.org/r/3eafa969-5b1a-accf-88fe-318784c791a@google.com Signed-off-by: Hugh Dickins Tested-by: Yu Zhao Reviewed-by: Yang Shi Reviewed-by: David Hildenbrand Cc: Matthew Wilcox Cc: Nicholas Piggin Cc: Yu Zhao Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/page-flags.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 88fe1d759cdd..9d8eeaa67d05 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -481,7 +481,7 @@ static inline int TestClearPage##uname(struct page *page) { return 0; } TESTSETFLAG_FALSE(uname, lname) TESTCLEARFLAG_FALSE(uname, lname) __PAGEFLAG(Locked, locked, PF_NO_TAIL) -PAGEFLAG(Waiters, waiters, PF_ONLY_HEAD) __CLEARPAGEFLAG(Waiters, waiters, PF_ONLY_HEAD) +PAGEFLAG(Waiters, waiters, PF_ONLY_HEAD) PAGEFLAG(Error, error, PF_NO_TAIL) TESTCLEARFLAG(Error, error, PF_NO_TAIL) PAGEFLAG(Referenced, referenced, PF_HEAD) TESTCLEARFLAG(Referenced, referenced, PF_HEAD) -- cgit From 7c13c163e036c646b77753deacfe2f5478b654bc Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 24 Mar 2022 18:10:10 -0700 Subject: kasan, page_alloc: merge kasan_free_pages into free_pages_prepare Currently, the code responsible for initializing and poisoning memory in free_pages_prepare() is scattered across two locations: kasan_free_pages() for HW_TAGS KASAN and free_pages_prepare() itself. This is confusing. This and a few following patches combine the code from these two locations. Along the way, these patches also simplify the performed checks to make them easier to follow. Replaces the only caller of kasan_free_pages() with its implementation. As kasan_has_integrated_init() is only true when CONFIG_KASAN_HW_TAGS is enabled, moving the code does no functional changes. This patch is not useful by itself but makes the simplifications in the following patches easier to follow. Link: https://lkml.kernel.org/r/303498d15840bb71905852955c6e2390ecc87139.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Reviewed-by: Alexander Potapenko Acked-by: Marco Elver Cc: Andrey Ryabinin Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Mark Rutland Cc: Peter Collingbourne Cc: Vincenzo Frascino Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kasan.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index b6a93261c92a..dd2161af84e9 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -85,7 +85,6 @@ static inline void kasan_disable_current(void) {} #ifdef CONFIG_KASAN_HW_TAGS void kasan_alloc_pages(struct page *page, unsigned int order, gfp_t flags); -void kasan_free_pages(struct page *page, unsigned int order); #else /* CONFIG_KASAN_HW_TAGS */ @@ -96,13 +95,6 @@ static __always_inline void kasan_alloc_pages(struct page *page, BUILD_BUG(); } -static __always_inline void kasan_free_pages(struct page *page, - unsigned int order) -{ - /* Only available for integrated init. */ - BUILD_BUG(); -} - #endif /* CONFIG_KASAN_HW_TAGS */ static inline bool kasan_has_integrated_init(void) -- cgit From c82ce3195fd17e57a370e4dc8f36c195b8807b95 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 24 Mar 2022 18:10:22 -0700 Subject: mm: clarify __GFP_ZEROTAGS comment __GFP_ZEROTAGS is intended as an optimization: if memory is zeroed during allocation, it's possible to set memory tags at the same time with little performance impact. Clarify this intention of __GFP_ZEROTAGS in the comment. Link: https://lkml.kernel.org/r/cdffde013973c5634a447513e10ec0d21e8eee29.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Acked-by: Marco Elver Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Mark Rutland Cc: Peter Collingbourne Cc: Vincenzo Frascino Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/gfp.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 20f6fbe12993..72dc4ca75cf3 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -232,8 +232,10 @@ struct vm_area_struct; * * %__GFP_ZERO returns a zeroed page on success. * - * %__GFP_ZEROTAGS returns a page with zeroed memory tags on success, if - * __GFP_ZERO is set. + * %__GFP_ZEROTAGS zeroes memory tags at allocation time if the memory itself + * is being zeroed (either via __GFP_ZERO or via init_on_alloc). This flag is + * intended for optimization: setting memory tags at the same time as zeroing + * memory has minimal additional performace impact. * * %__GFP_SKIP_KASAN_POISON returns a page which does not need to be poisoned * on deallocation. Typically used for userspace pages. Currently only has an -- cgit From b42090ae6f3aa07b0a39403545d688489548a6a8 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 24 Mar 2022 18:10:31 -0700 Subject: kasan, page_alloc: merge kasan_alloc_pages into post_alloc_hook Currently, the code responsible for initializing and poisoning memory in post_alloc_hook() is scattered across two locations: kasan_alloc_pages() hook for HW_TAGS KASAN and post_alloc_hook() itself. This is confusing. This and a few following patches combine the code from these two locations. Along the way, these patches do a step-by-step restructure the many performed checks to make them easier to follow. Replace the only caller of kasan_alloc_pages() with its implementation. As kasan_has_integrated_init() is only true when CONFIG_KASAN_HW_TAGS is enabled, moving the code does no functional changes. Also move init and init_tags variables definitions out of kasan_has_integrated_init() clause in post_alloc_hook(), as they have the same values regardless of what the if condition evaluates to. This patch is not useful by itself but makes the simplifications in the following patches easier to follow. Link: https://lkml.kernel.org/r/5ac7e0b30f5cbb177ec363ddd7878a3141289592.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Acked-by: Marco Elver Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Mark Rutland Cc: Peter Collingbourne Cc: Vincenzo Frascino Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kasan.h | 9 --------- 1 file changed, 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index dd2161af84e9..4ed94616c8f0 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -84,17 +84,8 @@ static inline void kasan_disable_current(void) {} #ifdef CONFIG_KASAN_HW_TAGS -void kasan_alloc_pages(struct page *page, unsigned int order, gfp_t flags); - #else /* CONFIG_KASAN_HW_TAGS */ -static __always_inline void kasan_alloc_pages(struct page *page, - unsigned int order, gfp_t flags) -{ - /* Only available for integrated init. */ - BUILD_BUG(); -} - #endif /* CONFIG_KASAN_HW_TAGS */ static inline bool kasan_has_integrated_init(void) -- cgit From 63840de296472f3914bb933b11ba2b764590755e Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 24 Mar 2022 18:10:52 -0700 Subject: kasan, x86, arm64, s390: rename functions for modules shadow Rename kasan_free_shadow to kasan_free_module_shadow and kasan_module_alloc to kasan_alloc_module_shadow. These functions are used to allocate/free shadow memory for kernel modules when KASAN_VMALLOC is not enabled. The new names better reflect their purpose. Also reword the comment next to their declaration to improve clarity. Link: https://lkml.kernel.org/r/36db32bde765d5d0b856f77d2d806e838513fe84.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Acked-by: Catalin Marinas Acked-by: Marco Elver Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Mark Rutland Cc: Peter Collingbourne Cc: Vincenzo Frascino Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kasan.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index 4ed94616c8f0..b450c7653137 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -433,17 +433,17 @@ static inline void kasan_populate_early_vm_area_shadow(void *start, !defined(CONFIG_KASAN_VMALLOC) /* - * These functions provide a special case to support backing module - * allocations with real shadow memory. With KASAN vmalloc, the special - * case is unnecessary, as the work is handled in the generic case. + * These functions allocate and free shadow memory for kernel modules. + * They are only required when KASAN_VMALLOC is not supported, as otherwise + * shadow memory is allocated by the generic vmalloc handlers. */ -int kasan_module_alloc(void *addr, size_t size, gfp_t gfp_mask); -void kasan_free_shadow(const struct vm_struct *vm); +int kasan_alloc_module_shadow(void *addr, size_t size, gfp_t gfp_mask); +void kasan_free_module_shadow(const struct vm_struct *vm); #else /* (CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS) && !CONFIG_KASAN_VMALLOC */ -static inline int kasan_module_alloc(void *addr, size_t size, gfp_t gfp_mask) { return 0; } -static inline void kasan_free_shadow(const struct vm_struct *vm) {} +static inline int kasan_alloc_module_shadow(void *addr, size_t size, gfp_t gfp_mask) { return 0; } +static inline void kasan_free_module_shadow(const struct vm_struct *vm) {} #endif /* (CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS) && !CONFIG_KASAN_VMALLOC */ -- cgit From 0b7ccc70ee1d5b499d1626c9d28f729507b1c036 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 24 Mar 2022 18:10:55 -0700 Subject: kasan, vmalloc: drop outdated VM_KASAN comment The comment about VM_KASAN in include/linux/vmalloc.c is outdated. VM_KASAN is currently only used to mark vm_areas allocated for kernel modules when CONFIG_KASAN_VMALLOC is disabled. Drop the comment. Link: https://lkml.kernel.org/r/780395afea83a147b3b5acc36cf2e38f7f8479f9.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Reviewed-by: Alexander Potapenko Acked-by: Marco Elver Cc: Andrey Ryabinin Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Mark Rutland Cc: Peter Collingbourne Cc: Vincenzo Frascino Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/vmalloc.h | 11 ----------- 1 file changed, 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 5a0c3b556848..2ca95c7db463 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -35,17 +35,6 @@ struct notifier_block; /* in notifier.h */ #define VM_DEFER_KMEMLEAK 0 #endif -/* - * VM_KASAN is used slightly differently depending on CONFIG_KASAN_VMALLOC. - * - * If IS_ENABLED(CONFIG_KASAN_VMALLOC), VM_KASAN is set on a vm_struct after - * shadow memory has been mapped. It's used to handle allocation errors so that - * we don't try to poison shadow on free if it was never allocated. - * - * Otherwise, VM_KASAN is set for kasan_module_alloc() allocations and used to - * determine which allocations need the module shadow freed. - */ - /* bits [20..32] reserved for arch specific ioremap internals */ /* -- cgit From 5bd9bae22a455321327982d0b595a7e76d425fd0 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 24 Mar 2022 18:10:58 -0700 Subject: kasan: reorder vmalloc hooks Group functions that [de]populate shadow memory for vmalloc. Group functions that [un]poison memory for vmalloc. This patch does no functional changes but prepares KASAN code for adding vmalloc support to HW_TAGS KASAN. Link: https://lkml.kernel.org/r/aeef49eb249c206c4c9acce2437728068da74c28.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Reviewed-by: Alexander Potapenko Acked-by: Marco Elver Cc: Andrey Ryabinin Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Mark Rutland Cc: Peter Collingbourne Cc: Vincenzo Frascino Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kasan.h | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index b450c7653137..5f9ed1372eef 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -397,34 +397,32 @@ static inline void kasan_init_hw_tags(void) { } #ifdef CONFIG_KASAN_VMALLOC +void kasan_populate_early_vm_area_shadow(void *start, unsigned long size); int kasan_populate_vmalloc(unsigned long addr, unsigned long size); -void kasan_poison_vmalloc(const void *start, unsigned long size); -void kasan_unpoison_vmalloc(const void *start, unsigned long size); void kasan_release_vmalloc(unsigned long start, unsigned long end, unsigned long free_region_start, unsigned long free_region_end); -void kasan_populate_early_vm_area_shadow(void *start, unsigned long size); +void kasan_unpoison_vmalloc(const void *start, unsigned long size); +void kasan_poison_vmalloc(const void *start, unsigned long size); #else /* CONFIG_KASAN_VMALLOC */ +static inline void kasan_populate_early_vm_area_shadow(void *start, + unsigned long size) { } static inline int kasan_populate_vmalloc(unsigned long start, unsigned long size) { return 0; } - -static inline void kasan_poison_vmalloc(const void *start, unsigned long size) -{ } -static inline void kasan_unpoison_vmalloc(const void *start, unsigned long size) -{ } static inline void kasan_release_vmalloc(unsigned long start, unsigned long end, unsigned long free_region_start, - unsigned long free_region_end) {} + unsigned long free_region_end) { } -static inline void kasan_populate_early_vm_area_shadow(void *start, - unsigned long size) +static inline void kasan_unpoison_vmalloc(const void *start, unsigned long size) +{ } +static inline void kasan_poison_vmalloc(const void *start, unsigned long size) { } #endif /* CONFIG_KASAN_VMALLOC */ -- cgit From 579fb0ac085be2015529a5f7bd1061899a6374de Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 24 Mar 2022 18:11:01 -0700 Subject: kasan: add wrappers for vmalloc hooks Add wrappers around functions that [un]poison memory for vmalloc allocations. These functions will be used by HW_TAGS KASAN and therefore need to be disabled when kasan=off command line argument is provided. This patch does no functional changes for software KASAN modes. Link: https://lkml.kernel.org/r/3b8728eac438c55389fb0f9a8a2145d71dd77487.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Reviewed-by: Alexander Potapenko Acked-by: Marco Elver Cc: Andrey Ryabinin Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Mark Rutland Cc: Peter Collingbourne Cc: Vincenzo Frascino Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kasan.h | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index 5f9ed1372eef..a6d6fdefb278 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -403,8 +403,21 @@ void kasan_release_vmalloc(unsigned long start, unsigned long end, unsigned long free_region_start, unsigned long free_region_end); -void kasan_unpoison_vmalloc(const void *start, unsigned long size); -void kasan_poison_vmalloc(const void *start, unsigned long size); +void __kasan_unpoison_vmalloc(const void *start, unsigned long size); +static __always_inline void kasan_unpoison_vmalloc(const void *start, + unsigned long size) +{ + if (kasan_enabled()) + __kasan_unpoison_vmalloc(start, size); +} + +void __kasan_poison_vmalloc(const void *start, unsigned long size); +static __always_inline void kasan_poison_vmalloc(const void *start, + unsigned long size) +{ + if (kasan_enabled()) + __kasan_poison_vmalloc(start, size); +} #else /* CONFIG_KASAN_VMALLOC */ -- cgit From 1d96320f8d5320423737da61e6e6937f6b475b5c Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 24 Mar 2022 18:11:13 -0700 Subject: kasan, vmalloc: add vmalloc tagging for SW_TAGS Add vmalloc tagging support to SW_TAGS KASAN. - __kasan_unpoison_vmalloc() now assigns a random pointer tag, poisons the virtual mapping accordingly, and embeds the tag into the returned pointer. - __get_vm_area_node() (used by vmalloc() and vmap()) and pcpu_get_vm_areas() save the tagged pointer into vm_struct->addr (note: not into vmap_area->addr). This requires putting kasan_unpoison_vmalloc() after setup_vmalloc_vm[_locked](); otherwise the latter will overwrite the tagged pointer. The tagged pointer then is naturally propagateed to vmalloc() and vmap(). - vm_map_ram() returns the tagged pointer directly. As a result of this change, vm_struct->addr is now tagged. Enabling KASAN_VMALLOC with SW_TAGS is not yet allowed. Link: https://lkml.kernel.org/r/4a78f3c064ce905e9070c29733aca1dd254a74f1.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Acked-by: Marco Elver Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Mark Rutland Cc: Peter Collingbourne Cc: Vincenzo Frascino Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kasan.h | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index a6d6fdefb278..d3efb6cbd0f5 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -403,12 +403,13 @@ void kasan_release_vmalloc(unsigned long start, unsigned long end, unsigned long free_region_start, unsigned long free_region_end); -void __kasan_unpoison_vmalloc(const void *start, unsigned long size); -static __always_inline void kasan_unpoison_vmalloc(const void *start, - unsigned long size) +void *__kasan_unpoison_vmalloc(const void *start, unsigned long size); +static __always_inline void *kasan_unpoison_vmalloc(const void *start, + unsigned long size) { if (kasan_enabled()) - __kasan_unpoison_vmalloc(start, size); + return __kasan_unpoison_vmalloc(start, size); + return (void *)start; } void __kasan_poison_vmalloc(const void *start, unsigned long size); @@ -433,8 +434,11 @@ static inline void kasan_release_vmalloc(unsigned long start, unsigned long free_region_start, unsigned long free_region_end) { } -static inline void kasan_unpoison_vmalloc(const void *start, unsigned long size) -{ } +static inline void *kasan_unpoison_vmalloc(const void *start, + unsigned long size) +{ + return (void *)start; +} static inline void kasan_poison_vmalloc(const void *start, unsigned long size) { } -- cgit From 01d92c7f358ce892279ca830cf6ccf2862a17d1c Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 24 Mar 2022 18:11:16 -0700 Subject: kasan, vmalloc, arm64: mark vmalloc mappings as pgprot_tagged HW_TAGS KASAN relies on ARM Memory Tagging Extension (MTE). With MTE, a memory region must be mapped as MT_NORMAL_TAGGED to allow setting memory tags via MTE-specific instructions. Add proper protection bits to vmalloc() allocations. These allocations are always backed by page_alloc pages, so the tags will actually be getting set on the corresponding physical memory. Link: https://lkml.kernel.org/r/983fc33542db2f6b1e77b34ca23448d4640bbb9e.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Co-developed-by: Vincenzo Frascino Signed-off-by: Vincenzo Frascino Acked-by: Marco Elver Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Mark Rutland Cc: Peter Collingbourne Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/vmalloc.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 2ca95c7db463..3b1df7da402d 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -115,6 +115,13 @@ static inline int arch_vmap_pte_supported_shift(unsigned long size) } #endif +#ifndef arch_vmap_pgprot_tagged +static inline pgprot_t arch_vmap_pgprot_tagged(pgprot_t prot) +{ + return prot; +} +#endif + /* * Highlevel APIs for driver use */ -- cgit From f49d9c5bb15cc5c470097078ec1f26ff605ae1a2 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 24 Mar 2022 18:11:23 -0700 Subject: kasan, mm: only define ___GFP_SKIP_KASAN_POISON with HW_TAGS Only define the ___GFP_SKIP_KASAN_POISON flag when CONFIG_KASAN_HW_TAGS is enabled. This patch it not useful by itself, but it prepares the code for additions of new KASAN-specific GFP patches. Link: https://lkml.kernel.org/r/44e5738a584c11801b2b8f1231898918efc8634a.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Acked-by: Marco Elver Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Mark Rutland Cc: Peter Collingbourne Cc: Vincenzo Frascino Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/gfp.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 72dc4ca75cf3..75abf8d5bc9b 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -54,7 +54,11 @@ struct vm_area_struct; #define ___GFP_THISNODE 0x200000u #define ___GFP_ACCOUNT 0x400000u #define ___GFP_ZEROTAGS 0x800000u +#ifdef CONFIG_KASAN_HW_TAGS #define ___GFP_SKIP_KASAN_POISON 0x1000000u +#else +#define ___GFP_SKIP_KASAN_POISON 0 +#endif #ifdef CONFIG_LOCKDEP #define ___GFP_NOLOCKDEP 0x2000000u #else @@ -251,7 +255,9 @@ struct vm_area_struct; #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP) /* Room for N __GFP_FOO bits */ -#define __GFP_BITS_SHIFT (25 + IS_ENABLED(CONFIG_LOCKDEP)) +#define __GFP_BITS_SHIFT (24 + \ + IS_ENABLED(CONFIG_KASAN_HW_TAGS) + \ + IS_ENABLED(CONFIG_LOCKDEP)) #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) /** -- cgit From 53ae233c30a623ff44ff2f83854e92530c5d9fc2 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 24 Mar 2022 18:11:26 -0700 Subject: kasan, page_alloc: allow skipping unpoisoning for HW_TAGS Add a new GFP flag __GFP_SKIP_KASAN_UNPOISON that allows skipping KASAN poisoning for page_alloc allocations. The flag is only effective with HW_TAGS KASAN. This flag will be used by vmalloc code for page_alloc allocations backing vmalloc() mappings in a following patch. The reason to skip KASAN poisoning for these pages in page_alloc is because vmalloc code will be poisoning them instead. Also reword the comment for __GFP_SKIP_KASAN_POISON. Link: https://lkml.kernel.org/r/35c97d77a704f6ff971dd3bfe4be95855744108e.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Acked-by: Marco Elver Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Mark Rutland Cc: Peter Collingbourne Cc: Vincenzo Frascino Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/gfp.h | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 75abf8d5bc9b..688a39fde43e 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -55,12 +55,14 @@ struct vm_area_struct; #define ___GFP_ACCOUNT 0x400000u #define ___GFP_ZEROTAGS 0x800000u #ifdef CONFIG_KASAN_HW_TAGS -#define ___GFP_SKIP_KASAN_POISON 0x1000000u +#define ___GFP_SKIP_KASAN_UNPOISON 0x1000000u +#define ___GFP_SKIP_KASAN_POISON 0x2000000u #else +#define ___GFP_SKIP_KASAN_UNPOISON 0 #define ___GFP_SKIP_KASAN_POISON 0 #endif #ifdef CONFIG_LOCKDEP -#define ___GFP_NOLOCKDEP 0x2000000u +#define ___GFP_NOLOCKDEP 0x4000000u #else #define ___GFP_NOLOCKDEP 0 #endif @@ -241,22 +243,25 @@ struct vm_area_struct; * intended for optimization: setting memory tags at the same time as zeroing * memory has minimal additional performace impact. * - * %__GFP_SKIP_KASAN_POISON returns a page which does not need to be poisoned - * on deallocation. Typically used for userspace pages. Currently only has an - * effect in HW tags mode. + * %__GFP_SKIP_KASAN_UNPOISON makes KASAN skip unpoisoning on page allocation. + * Only effective in HW_TAGS mode. + * + * %__GFP_SKIP_KASAN_POISON makes KASAN skip poisoning on page deallocation. + * Typically, used for userspace pages. Only effective in HW_TAGS mode. */ #define __GFP_NOWARN ((__force gfp_t)___GFP_NOWARN) #define __GFP_COMP ((__force gfp_t)___GFP_COMP) #define __GFP_ZERO ((__force gfp_t)___GFP_ZERO) #define __GFP_ZEROTAGS ((__force gfp_t)___GFP_ZEROTAGS) -#define __GFP_SKIP_KASAN_POISON ((__force gfp_t)___GFP_SKIP_KASAN_POISON) +#define __GFP_SKIP_KASAN_UNPOISON ((__force gfp_t)___GFP_SKIP_KASAN_UNPOISON) +#define __GFP_SKIP_KASAN_POISON ((__force gfp_t)___GFP_SKIP_KASAN_POISON) /* Disable lockdep for GFP context tracking */ #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP) /* Room for N __GFP_FOO bits */ -#define __GFP_BITS_SHIFT (24 + \ - IS_ENABLED(CONFIG_KASAN_HW_TAGS) + \ +#define __GFP_BITS_SHIFT (24 + \ + 2 * IS_ENABLED(CONFIG_KASAN_HW_TAGS) + \ IS_ENABLED(CONFIG_LOCKDEP)) #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) -- cgit From 9353ffa6e9e90d2b6348209cf2b95a8ffee18711 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 24 Mar 2022 18:11:29 -0700 Subject: kasan, page_alloc: allow skipping memory init for HW_TAGS Add a new GFP flag __GFP_SKIP_ZERO that allows to skip memory initialization. The flag is only effective with HW_TAGS KASAN. This flag will be used by vmalloc code for page_alloc allocations backing vmalloc() mappings in a following patch. The reason to skip memory initialization for these pages in page_alloc is because vmalloc code will be initializing them instead. With the current implementation, when __GFP_SKIP_ZERO is provided, __GFP_ZEROTAGS is ignored. This doesn't matter, as these two flags are never provided at the same time. However, if this is changed in the future, this particular implementation detail can be changed as well. Link: https://lkml.kernel.org/r/0d53efeff345de7d708e0baa0d8829167772521e.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Acked-by: Marco Elver Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Mark Rutland Cc: Peter Collingbourne Cc: Vincenzo Frascino Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/gfp.h | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 688a39fde43e..0fa17fb85de5 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -55,14 +55,16 @@ struct vm_area_struct; #define ___GFP_ACCOUNT 0x400000u #define ___GFP_ZEROTAGS 0x800000u #ifdef CONFIG_KASAN_HW_TAGS -#define ___GFP_SKIP_KASAN_UNPOISON 0x1000000u -#define ___GFP_SKIP_KASAN_POISON 0x2000000u +#define ___GFP_SKIP_ZERO 0x1000000u +#define ___GFP_SKIP_KASAN_UNPOISON 0x2000000u +#define ___GFP_SKIP_KASAN_POISON 0x4000000u #else +#define ___GFP_SKIP_ZERO 0 #define ___GFP_SKIP_KASAN_UNPOISON 0 #define ___GFP_SKIP_KASAN_POISON 0 #endif #ifdef CONFIG_LOCKDEP -#define ___GFP_NOLOCKDEP 0x4000000u +#define ___GFP_NOLOCKDEP 0x8000000u #else #define ___GFP_NOLOCKDEP 0 #endif @@ -239,9 +241,10 @@ struct vm_area_struct; * %__GFP_ZERO returns a zeroed page on success. * * %__GFP_ZEROTAGS zeroes memory tags at allocation time if the memory itself - * is being zeroed (either via __GFP_ZERO or via init_on_alloc). This flag is - * intended for optimization: setting memory tags at the same time as zeroing - * memory has minimal additional performace impact. + * is being zeroed (either via __GFP_ZERO or via init_on_alloc, provided that + * __GFP_SKIP_ZERO is not set). This flag is intended for optimization: setting + * memory tags at the same time as zeroing memory has minimal additional + * performace impact. * * %__GFP_SKIP_KASAN_UNPOISON makes KASAN skip unpoisoning on page allocation. * Only effective in HW_TAGS mode. @@ -253,6 +256,7 @@ struct vm_area_struct; #define __GFP_COMP ((__force gfp_t)___GFP_COMP) #define __GFP_ZERO ((__force gfp_t)___GFP_ZERO) #define __GFP_ZEROTAGS ((__force gfp_t)___GFP_ZEROTAGS) +#define __GFP_SKIP_ZERO ((__force gfp_t)___GFP_SKIP_ZERO) #define __GFP_SKIP_KASAN_UNPOISON ((__force gfp_t)___GFP_SKIP_KASAN_UNPOISON) #define __GFP_SKIP_KASAN_POISON ((__force gfp_t)___GFP_SKIP_KASAN_POISON) @@ -261,7 +265,7 @@ struct vm_area_struct; /* Room for N __GFP_FOO bits */ #define __GFP_BITS_SHIFT (24 + \ - 2 * IS_ENABLED(CONFIG_KASAN_HW_TAGS) + \ + 3 * IS_ENABLED(CONFIG_KASAN_HW_TAGS) + \ IS_ENABLED(CONFIG_LOCKDEP)) #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) -- cgit From 23689e91fb22c15b84ac6c22ad9942039792f3af Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 24 Mar 2022 18:11:32 -0700 Subject: kasan, vmalloc: add vmalloc tagging for HW_TAGS Add vmalloc tagging support to HW_TAGS KASAN. The key difference between HW_TAGS and the other two KASAN modes when it comes to vmalloc: HW_TAGS KASAN can only assign tags to physical memory. The other two modes have shadow memory covering every mapped virtual memory region. Make __kasan_unpoison_vmalloc() for HW_TAGS KASAN: - Skip non-VM_ALLOC mappings as HW_TAGS KASAN can only tag a single mapping of normal physical memory; see the comment in the function. - Generate a random tag, tag the returned pointer and the allocation, and initialize the allocation at the same time. - Propagate the tag into the page stucts to allow accesses through page_address(vmalloc_to_page()). The rest of vmalloc-related KASAN hooks are not needed: - The shadow-related ones are fully skipped. - __kasan_poison_vmalloc() is kept as a no-op with a comment. Poisoning and zeroing of physical pages that are backing vmalloc() allocations are skipped via __GFP_SKIP_KASAN_UNPOISON and __GFP_SKIP_ZERO: __kasan_unpoison_vmalloc() does that instead. Enabling CONFIG_KASAN_VMALLOC with HW_TAGS is not yet allowed. Link: https://lkml.kernel.org/r/d19b2e9e59a9abc59d05b72dea8429dcaea739c6.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Co-developed-by: Vincenzo Frascino Signed-off-by: Vincenzo Frascino Acked-by: Marco Elver Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Mark Rutland Cc: Peter Collingbourne Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kasan.h | 36 ++++++++++++++++++++++++++++++++---- 1 file changed, 32 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index d3efb6cbd0f5..65fc5285bb59 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -26,6 +26,12 @@ struct kunit_kasan_expectation { #endif +typedef unsigned int __bitwise kasan_vmalloc_flags_t; + +#define KASAN_VMALLOC_NONE 0x00u +#define KASAN_VMALLOC_INIT 0x01u +#define KASAN_VMALLOC_VM_ALLOC 0x02u + #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) #include @@ -397,18 +403,39 @@ static inline void kasan_init_hw_tags(void) { } #ifdef CONFIG_KASAN_VMALLOC +#if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) + void kasan_populate_early_vm_area_shadow(void *start, unsigned long size); int kasan_populate_vmalloc(unsigned long addr, unsigned long size); void kasan_release_vmalloc(unsigned long start, unsigned long end, unsigned long free_region_start, unsigned long free_region_end); -void *__kasan_unpoison_vmalloc(const void *start, unsigned long size); +#else /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */ + +static inline void kasan_populate_early_vm_area_shadow(void *start, + unsigned long size) +{ } +static inline int kasan_populate_vmalloc(unsigned long start, + unsigned long size) +{ + return 0; +} +static inline void kasan_release_vmalloc(unsigned long start, + unsigned long end, + unsigned long free_region_start, + unsigned long free_region_end) { } + +#endif /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */ + +void *__kasan_unpoison_vmalloc(const void *start, unsigned long size, + kasan_vmalloc_flags_t flags); static __always_inline void *kasan_unpoison_vmalloc(const void *start, - unsigned long size) + unsigned long size, + kasan_vmalloc_flags_t flags) { if (kasan_enabled()) - return __kasan_unpoison_vmalloc(start, size); + return __kasan_unpoison_vmalloc(start, size, flags); return (void *)start; } @@ -435,7 +462,8 @@ static inline void kasan_release_vmalloc(unsigned long start, unsigned long free_region_end) { } static inline void *kasan_unpoison_vmalloc(const void *start, - unsigned long size) + unsigned long size, + kasan_vmalloc_flags_t flags) { return (void *)start; } -- cgit From f6e39794f4b6da7ca9b77f2f9ad11fd6f0ac83e5 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 24 Mar 2022 18:11:35 -0700 Subject: kasan, vmalloc: only tag normal vmalloc allocations The kernel can use to allocate executable memory. The only supported way to do that is via __vmalloc_node_range() with the executable bit set in the prot argument. (vmap() resets the bit via pgprot_nx()). Once tag-based KASAN modes start tagging vmalloc allocations, executing code from such allocations will lead to the PC register getting a tag, which is not tolerated by the kernel. Only tag the allocations for normal kernel pages. [andreyknvl@google.com: pass KASAN_VMALLOC_PROT_NORMAL to kasan_unpoison_vmalloc()] Link: https://lkml.kernel.org/r/9230ca3d3e40ffca041c133a524191fd71969a8d.1646233925.git.andreyknvl@google.com [andreyknvl@google.com: support tagged vmalloc mappings] Link: https://lkml.kernel.org/r/2f6605e3a358cf64d73a05710cb3da356886ad29.1646233925.git.andreyknvl@google.com [andreyknvl@google.com: don't unintentionally disabled poisoning] Link: https://lkml.kernel.org/r/de4587d6a719232e83c760113e46ed2d4d8da61e.1646757322.git.andreyknvl@google.com Link: https://lkml.kernel.org/r/fbfd9939a4dc375923c9a5c6b9e7ab05c26b8c6b.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Acked-by: Marco Elver Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Catalin Marinas Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Mark Rutland Cc: Peter Collingbourne Cc: Vincenzo Frascino Cc: Will Deacon Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kasan.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index 65fc5285bb59..903b9453931d 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -28,9 +28,10 @@ struct kunit_kasan_expectation { typedef unsigned int __bitwise kasan_vmalloc_flags_t; -#define KASAN_VMALLOC_NONE 0x00u -#define KASAN_VMALLOC_INIT 0x01u -#define KASAN_VMALLOC_VM_ALLOC 0x02u +#define KASAN_VMALLOC_NONE 0x00u +#define KASAN_VMALLOC_INIT 0x01u +#define KASAN_VMALLOC_VM_ALLOC 0x02u +#define KASAN_VMALLOC_PROT_NORMAL 0x04u #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) -- cgit From ed6d74446cbfb88c747d4b32477a9d46132694db Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 24 Mar 2022 18:12:02 -0700 Subject: kasan: test: support async (again) and asymm modes for HW_TAGS Async mode support has already been implemented in commit e80a76aa1a91 ("kasan, arm64: tests supports for HW_TAGS async mode") but then got accidentally broken in commit 99734b535d9b ("kasan: detect false-positives in tests"). Restore the changes removed by the latter patch and adapt them for asymm mode: add a sync_fault flag to kunit_kasan_expectation that only get set if the MTE fault was synchronous, and reenable MTE on such faults in tests. Also rename kunit_kasan_expectation to kunit_kasan_status and move its definition to mm/kasan/kasan.h from include/linux/kasan.h, as this structure is only internally used by KASAN. Also put the structure definition under IS_ENABLED(CONFIG_KUNIT). Link: https://lkml.kernel.org/r/133970562ccacc93ba19d754012c562351d4a8c8.1645033139.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Cc: Marco Elver Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: Andrey Ryabinin Cc: Vincenzo Frascino Cc: Catalin Marinas Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kasan.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index 903b9453931d..fe36215807f7 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -19,11 +19,6 @@ struct task_struct; #include #include -/* kasan_data struct is used in KUnit tests for KASAN expected failures */ -struct kunit_kasan_expectation { - bool report_found; -}; - #endif typedef unsigned int __bitwise kasan_vmalloc_flags_t; -- cgit From 80207910cd71b4e0e87140d165d82b5d3ff69e53 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 24 Mar 2022 18:13:12 -0700 Subject: kasan: move and hide kasan_save_enable/restore_multi_shot - Move kasan_save_enable/restore_multi_shot() declarations to mm/kasan/kasan.h, as there is no need for them to be visible outside of KASAN implementation. - Only define and export these functions when KASAN tests are enabled. - Move their definitions closer to other test-related code in report.c. Link: https://lkml.kernel.org/r/6ba637333b78447f027d775f2d55ab1a40f63c99.1646237226.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Dmitry Vyukov Cc: Marco Elver Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kasan.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index fe36215807f7..ceebcb9de7bf 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -267,10 +267,6 @@ static __always_inline bool kasan_check_byte(const void *addr) return true; } - -bool kasan_save_enable_multi_shot(void); -void kasan_restore_multi_shot(bool enabled); - #else /* CONFIG_KASAN */ static inline slab_flags_t kasan_never_merge(void) -- cgit From 562beb7235abfebdd8366e0664a5c3d1e597b990 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Thu, 24 Mar 2022 18:13:27 -0700 Subject: mm/huge_memory: make is_transparent_hugepage() static It's only used inside the huge_memory.c now. Don't export it and make it static. We can thus reduce the size of huge_memory.o a bit. Without this patch: text data bss dec hex filename 32319 2965 4 35288 89d8 mm/huge_memory.o With this patch: text data bss dec hex filename 32042 2957 4 35003 88bb mm/huge_memory.o Link: https://lkml.kernel.org/r/20220302082145.12028-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: Muchun Song Reviewed-by: Yang Shi Cc: Matthew Wilcox Cc: William Kucharski Cc: Hugh Dickins Cc: Peter Xu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/huge_mm.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 0734aff8fa19..2999190adc22 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -183,7 +183,6 @@ unsigned long thp_get_unmapped_area(struct file *filp, unsigned long addr, void prep_transhuge_page(struct page *page); void free_transhuge_page(struct page *page); -bool is_transparent_hugepage(struct page *page); bool can_split_folio(struct folio *folio, int *pextra_pins); int split_huge_page_to_list(struct page *page, struct list_head *list); @@ -341,11 +340,6 @@ static inline bool transhuge_vma_enabled(struct vm_area_struct *vma, static inline void prep_transhuge_page(struct page *page) {} -static inline bool is_transparent_hugepage(struct page *page) -{ - return false; -} - #define transparent_hugepage_flags 0UL #define thp_get_unmapped_area NULL -- cgit From 03104c2c5db8918030788e607e4af980b2f42bb3 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Thu, 24 Mar 2022 18:13:50 -0700 Subject: mm/swapfile: remove stale reuse_swap_page() All users are gone, let's remove it. We'll let SWP_STABLE_WRITES stick around for now, as it might come in handy in the near future. Link: https://lkml.kernel.org/r/20220131162940.210846-8-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Vlastimil Babka Cc: Andrea Arcangeli Cc: Christoph Hellwig Cc: David Rientjes Cc: Don Dutile Cc: Hugh Dickins Cc: Jan Kara Cc: Jann Horn Cc: Jason Gunthorpe Cc: John Hubbard Cc: Kirill A. Shutemov Cc: Liang Zhang Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Cc: Oleg Nesterov Cc: Peter Xu Cc: Rik van Riel Cc: Roman Gushchin Cc: Shakeel Butt Cc: Yang Shi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/swap.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index f37837c614c5..27093b477c5f 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -515,7 +515,6 @@ extern int __swp_swapcount(swp_entry_t entry); extern int swp_swapcount(swp_entry_t entry); extern struct swap_info_struct *page_swap_info(struct page *); extern struct swap_info_struct *swp_swap_info(swp_entry_t entry); -extern bool reuse_swap_page(struct page *); extern int try_to_free_swap(struct page *); struct backing_dev_info; extern int init_swap_address_space(unsigned int type, unsigned long nr_pages); @@ -681,9 +680,6 @@ static inline int swp_swapcount(swp_entry_t entry) return 0; } -#define reuse_swap_page(page) \ - (page_trans_huge_mapcount(page) == 1) - static inline int try_to_free_swap(struct page *page) { return 0; -- cgit From 55c62fa7c53368a9011cd1276c001a1469078c6a Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Thu, 24 Mar 2022 18:13:53 -0700 Subject: mm/huge_memory: remove stale page_trans_huge_mapcount() All users are gone, let's remove it. Link: https://lkml.kernel.org/r/20220131162940.210846-9-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Vlastimil Babka Cc: Andrea Arcangeli Cc: Christoph Hellwig Cc: David Rientjes Cc: Don Dutile Cc: Hugh Dickins Cc: Jan Kara Cc: Jann Horn Cc: Jason Gunthorpe Cc: John Hubbard Cc: Kirill A. Shutemov Cc: Liang Zhang Cc: Matthew Wilcox (Oracle) Cc: Michal Hocko Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Cc: Oleg Nesterov Cc: Peter Xu Cc: Rik van Riel Cc: Roman Gushchin Cc: Shakeel Butt Cc: Yang Shi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 7a3dd7e617e4..e34edb775334 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -834,16 +834,11 @@ static inline int total_mapcount(struct page *page) return folio_mapcount(page_folio(page)); } -int page_trans_huge_mapcount(struct page *page); #else static inline int total_mapcount(struct page *page) { return page_mapcount(page); } -static inline int page_trans_huge_mapcount(struct page *page) -{ - return page_mapcount(page); -} #endif static inline struct page *virt_to_head_page(const void *x) -- cgit From 566d3362885aab04d6b0f885f12db3176ca3a032 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Thu, 24 Mar 2022 18:13:59 -0700 Subject: mm: warn on deleting redirtied only if accounted filemap_unaccount_folio() has a WARN_ON_ONCE(folio_test_dirty(folio)). It is good to warn of late dirtying on a persistent filesystem, but late dirtying on tmpfs can only lose data which is expected to be thrown away; and it's a pity if that warning comes ONCE on tmpfs, then hides others which really matter. Make it conditional on mapping_cap_writeback(). Cleanup: then folio_account_cleaned() no longer needs to check that for itself, and so no longer needs to know the mapping. Link: https://lkml.kernel.org/r/b5a1106c-7226-a5c6-ad41-ad4832cae1f@google.com Signed-off-by: Hugh Dickins Cc: Matthew Wilcox Cc: Jan Kara Cc: Christoph Hellwig Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/pagemap.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 58f395f3febe..a8d0b327b066 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -1009,8 +1009,7 @@ static inline void __set_page_dirty(struct page *page, { __folio_mark_dirty(page_folio(page), mapping, warn); } -void folio_account_cleaned(struct folio *folio, struct address_space *mapping, - struct bdi_writeback *wb); +void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb); void __folio_cancel_dirty(struct folio *folio); static inline void folio_cancel_dirty(struct folio *folio) { -- cgit From caaf2ae712b7cc3c7717898fe267dbf882a502ef Mon Sep 17 00:00:00 2001 From: Christian König Date: Mon, 24 Jan 2022 14:03:24 +0100 Subject: dma-buf: Add dma_fence_array_for_each (v2) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a helper to iterate over all fences in a dma_fence_array object. v2 (Jason Ekstrand) - Return NULL from dma_fence_array_first if head == NULL. This matches the iterator behavior of dma_fence_chain_for_each in that it iterates zero times if head == NULL. - Return NULL from dma_fence_array_next if index > array->num_fences. Signed-off-by: Jason Ekstrand Reviewed-by: Jason Ekstrand Reviewed-by: Christian König Cc: Daniel Vetter Cc: Maarten Lankhorst Link: https://patchwork.freedesktop.org/patch/msgid/20210610210925.642582-2-jason@jlekstrand.net Signed-off-by: Christian König --- include/linux/dma-fence-array.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h index fec374f69e12..e34dcb0bb462 100644 --- a/include/linux/dma-fence-array.h +++ b/include/linux/dma-fence-array.h @@ -61,6 +61,19 @@ to_dma_fence_array(struct dma_fence *fence) return container_of(fence, struct dma_fence_array, base); } +/** + * dma_fence_array_for_each - iterate over all fences in array + * @fence: current fence + * @index: index into the array + * @head: potential dma_fence_array object + * + * Test if @array is a dma_fence_array object and if yes iterate over all fences + * in the array. If not just iterate over the fence in @array itself. + */ +#define dma_fence_array_for_each(fence, index, head) \ + for (index = 0, fence = dma_fence_array_first(head); fence; \ + ++(index), fence = dma_fence_array_next(head, index)) + struct dma_fence_array *dma_fence_array_create(int num_fences, struct dma_fence **fences, u64 context, unsigned seqno, @@ -68,4 +81,8 @@ struct dma_fence_array *dma_fence_array_create(int num_fences, bool dma_fence_match_context(struct dma_fence *fence, u64 context); +struct dma_fence *dma_fence_array_first(struct dma_fence *head); +struct dma_fence *dma_fence_array_next(struct dma_fence *head, + unsigned int index); + #endif /* __LINUX_DMA_FENCE_ARRAY_H */ -- cgit From 64a8f92fd783e750cdb81af75942dcd53bbf61bd Mon Sep 17 00:00:00 2001 From: Christian König Date: Fri, 11 Mar 2022 10:27:53 +0100 Subject: dma-buf: add dma_fence_unwrap v2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a general purpose helper to deep dive into dma_fence_chain/dma_fence_array structures and iterate over all the fences in them. This is useful when we need to flatten out all fences in those structures. v2: some selftests cleanup, improved function naming and documentation Signed-off-by: Christian König Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220311110244.1245-1-christian.koenig@amd.com --- include/linux/dma-fence-array.h | 2 + include/linux/dma-fence-chain.h | 2 + include/linux/dma-fence-unwrap.h | 95 ++++++++++++++++++++++++++++++++++++++++ 3 files changed, 99 insertions(+) create mode 100644 include/linux/dma-fence-unwrap.h (limited to 'include/linux') diff --git a/include/linux/dma-fence-array.h b/include/linux/dma-fence-array.h index e34dcb0bb462..ec7f25def392 100644 --- a/include/linux/dma-fence-array.h +++ b/include/linux/dma-fence-array.h @@ -69,6 +69,8 @@ to_dma_fence_array(struct dma_fence *fence) * * Test if @array is a dma_fence_array object and if yes iterate over all fences * in the array. If not just iterate over the fence in @array itself. + * + * For a deep dive iterator see dma_fence_unwrap_for_each(). */ #define dma_fence_array_for_each(fence, index, head) \ for (index = 0, fence = dma_fence_array_first(head); fence; \ diff --git a/include/linux/dma-fence-chain.h b/include/linux/dma-fence-chain.h index 10d51bcdf7b7..4bdf0b96da28 100644 --- a/include/linux/dma-fence-chain.h +++ b/include/linux/dma-fence-chain.h @@ -112,6 +112,8 @@ static inline void dma_fence_chain_free(struct dma_fence_chain *chain) * * Iterate over all fences in the chain. We keep a reference to the current * fence while inside the loop which must be dropped when breaking out. + * + * For a deep dive iterator see dma_fence_unwrap_for_each(). */ #define dma_fence_chain_for_each(iter, head) \ for (iter = dma_fence_get(head); iter; \ diff --git a/include/linux/dma-fence-unwrap.h b/include/linux/dma-fence-unwrap.h new file mode 100644 index 000000000000..77e335a1bcac --- /dev/null +++ b/include/linux/dma-fence-unwrap.h @@ -0,0 +1,95 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * fence-chain: chain fences together in a timeline + * + * Copyright (C) 2022 Advanced Micro Devices, Inc. + * Authors: + * Christian König + */ + +#ifndef __LINUX_DMA_FENCE_UNWRAP_H +#define __LINUX_DMA_FENCE_UNWRAP_H + +#include +#include + +/** + * struct dma_fence_unwrap - cursor into the container structure + * + * Should be used with dma_fence_unwrap_for_each() iterator macro. + */ +struct dma_fence_unwrap { + /** + * @chain: potential dma_fence_chain, but can be other fence as well + */ + struct dma_fence *chain; + /** + * @array: potential dma_fence_array, but can be other fence as well + */ + struct dma_fence *array; + /** + * @index: last returned index if @array is really a dma_fence_array + */ + unsigned int index; +}; + +/* Internal helper to start new array iteration, don't use directly */ +static inline struct dma_fence * +__dma_fence_unwrap_array(struct dma_fence_unwrap * cursor) +{ + cursor->array = dma_fence_chain_contained(cursor->chain); + cursor->index = 0; + return dma_fence_array_first(cursor->array); +} + +/** + * dma_fence_unwrap_first - return the first fence from fence containers + * @head: the entrypoint into the containers + * @cursor: current position inside the containers + * + * Unwraps potential dma_fence_chain/dma_fence_array containers and return the + * first fence. + */ +static inline struct dma_fence * +dma_fence_unwrap_first(struct dma_fence *head, struct dma_fence_unwrap *cursor) +{ + cursor->chain = dma_fence_get(head); + return __dma_fence_unwrap_array(cursor); +} + +/** + * dma_fence_unwrap_next - return the next fence from a fence containers + * @cursor: current position inside the containers + * + * Continue unwrapping the dma_fence_chain/dma_fence_array containers and return + * the next fence from them. + */ +static inline struct dma_fence * +dma_fence_unwrap_next(struct dma_fence_unwrap *cursor) +{ + struct dma_fence *tmp; + + ++cursor->index; + tmp = dma_fence_array_next(cursor->array, cursor->index); + if (tmp) + return tmp; + + cursor->chain = dma_fence_chain_walk(cursor->chain); + return __dma_fence_unwrap_array(cursor); +} + +/** + * dma_fence_unwrap_for_each - iterate over all fences in containers + * @fence: current fence + * @cursor: current position inside the containers + * @head: starting point for the iterator + * + * Unwrap dma_fence_chain and dma_fence_array containers and deep dive into all + * potential fences in them. If @head is just a normal fence only that one is + * returned. + */ +#define dma_fence_unwrap_for_each(fence, cursor, head) \ + for (fence = dma_fence_unwrap_first(head, cursor); fence; \ + fence = dma_fence_unwrap_next(cursor)) + +#endif -- cgit From 421ab1be43bd015ffe744f4ea25df4f19d1ce6fe Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Fri, 25 Mar 2022 10:37:31 -0400 Subject: SUNRPC: Do not dereference non-socket transports in sysfs Do not cast the struct xprt to a sock_xprt unless we know it is a UDP or TCP transport. Otherwise the call to lock the mutex will scribble over whatever structure is actually there. This has been seen to cause hard system lockups when the underlying transport was RDMA. Fixes: b49ea673e119 ("SUNRPC: lock against ->sock changing during sysfs read") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust --- include/linux/sunrpc/xprt.h | 3 +++ include/linux/sunrpc/xprtsock.h | 1 - 2 files changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index 955ea4d7af0b..eef5e87c03b4 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -139,6 +139,9 @@ struct rpc_xprt_ops { void (*rpcbind)(struct rpc_task *task); void (*set_port)(struct rpc_xprt *xprt, unsigned short port); void (*connect)(struct rpc_xprt *xprt, struct rpc_task *task); + int (*get_srcaddr)(struct rpc_xprt *xprt, char *buf, + size_t buflen); + unsigned short (*get_srcport)(struct rpc_xprt *xprt); int (*buf_alloc)(struct rpc_task *task); void (*buf_free)(struct rpc_task *task); void (*prepare_request)(struct rpc_rqst *req); diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h index 3eb0079669c5..38284f25eddf 100644 --- a/include/linux/sunrpc/xprtsock.h +++ b/include/linux/sunrpc/xprtsock.h @@ -10,7 +10,6 @@ int init_socket_xprt(void); void cleanup_socket_xprt(void); -unsigned short get_srcport(struct rpc_xprt *); #define RPC_MIN_RESVPORT (1U) #define RPC_MAX_RESVPORT (65535U) -- cgit From bddac7c1e02ba47f0570e494c9289acea3062cc1 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sat, 26 Mar 2022 10:42:04 -0700 Subject: Revert "swiotlb: rework "fix info leak with DMA_FROM_DEVICE"" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This reverts commit aa6f8dcbab473f3a3c7454b74caa46d36cdc5d13. It turns out this breaks at least the ath9k wireless driver, and possibly others. What the ath9k driver does on packet receive is to set up the DMA transfer with: int ath_rx_init(..) .. bf->bf_buf_addr = dma_map_single(sc->dev, skb->data, common->rx_bufsize, DMA_FROM_DEVICE); and then the receive logic (through ath_rx_tasklet()) will fetch incoming packets static bool ath_edma_get_buffers(..) .. dma_sync_single_for_cpu(sc->dev, bf->bf_buf_addr, common->rx_bufsize, DMA_FROM_DEVICE); ret = ath9k_hw_process_rxdesc_edma(ah, rs, skb->data); if (ret == -EINPROGRESS) { /*let device gain the buffer again*/ dma_sync_single_for_device(sc->dev, bf->bf_buf_addr, common->rx_bufsize, DMA_FROM_DEVICE); return false; } and it's worth noting how that first DMA sync: dma_sync_single_for_cpu(..DMA_FROM_DEVICE); is there to make sure the CPU can read the DMA buffer (possibly by copying it from the bounce buffer area, or by doing some cache flush). The iommu correctly turns that into a "copy from bounce bufer" so that the driver can look at the state of the packets. In the meantime, the device may continue to write to the DMA buffer, but we at least have a snapshot of the state due to that first DMA sync. But that _second_ DMA sync: dma_sync_single_for_device(..DMA_FROM_DEVICE); is telling the DMA mapping that the CPU wasn't interested in the area because the packet wasn't there. In the case of a DMA bounce buffer, that is a no-op. Note how it's not a sync for the CPU (the "for_device()" part), and it's not a sync for data written by the CPU (the "DMA_FROM_DEVICE" part). Or rather, it _should_ be a no-op. That's what commit aa6f8dcbab47 broke: it made the code bounce the buffer unconditionally, and changed the DMA_FROM_DEVICE to just unconditionally and illogically be DMA_TO_DEVICE. [ Side note: purely within the confines of the swiotlb driver it wasn't entirely illogical: The reason it did that odd DMA_FROM_DEVICE -> DMA_TO_DEVICE conversion thing is because inside the swiotlb driver, it uses just a swiotlb_bounce() helper that doesn't care about the whole distinction of who the sync is for - only which direction to bounce. So it took the "sync for device" to mean that the CPU must have been the one writing, and thought it meant DMA_TO_DEVICE. ] Also note how the commentary in that commit was wrong, probably due to that whole confusion, claiming that the commit makes the swiotlb code "bounce unconditionally (that is, also when dir == DMA_TO_DEVICE) in order do avoid synchronising back stale data from the swiotlb buffer" which is nonsensical for two reasons: - that "also when dir == DMA_TO_DEVICE" is nonsensical, as that was exactly when it always did - and should do - the bounce. - since this is a sync for the device (not for the CPU), we're clearly fundamentally not coping back stale data from the bounce buffers at all, because we'd be copying *to* the bounce buffers. So that commit was just very confused. It confused the direction of the synchronization (to the device, not the cpu) with the direction of the DMA (from the device). Reported-and-bisected-by: Oleksandr Natalenko Reported-by: Olha Cherevyk Cc: Halil Pasic Cc: Christoph Hellwig Cc: Kalle Valo Cc: Robin Murphy Cc: Toke Høiland-Jørgensen Cc: Maxime Bizon Cc: Johannes Berg Signed-off-by: Linus Torvalds --- include/linux/dma-mapping.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index dca2b1355bb1..6150d11a607e 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -61,6 +61,14 @@ */ #define DMA_ATTR_PRIVILEGED (1UL << 9) +/* + * This is a hint to the DMA-mapping subsystem that the device is expected + * to overwrite the entire mapped size, thus the caller does not require any + * of the previous buffer contents to be preserved. This allows + * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers. + */ +#define DMA_ATTR_OVERWRITE (1UL << 10) + /* * A dma_addr_t can hold any valid DMA or bus address for the platform. It can * be given to a device to use as a DMA source or target. It is specific to a -- cgit From 901c7280ca0d5e2b4a8929fbe0bfb007ac2a6544 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Mon, 28 Mar 2022 11:37:05 -0700 Subject: Reinstate some of "swiotlb: rework "fix info leak with DMA_FROM_DEVICE"" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Halil Pasic points out [1] that the full revert of that commit (revert in bddac7c1e02b), and that a partial revert that only reverts the problematic case, but still keeps some of the cleanups is probably better.  And that partial revert [2] had already been verified by Oleksandr Natalenko to also fix the issue, I had just missed that in the long discussion. So let's reinstate the cleanups from commit aa6f8dcbab47 ("swiotlb: rework "fix info leak with DMA_FROM_DEVICE""), and effectively only revert the part that caused problems. Link: https://lore.kernel.org/all/20220328013731.017ae3e3.pasic@linux.ibm.com/ [1] Link: https://lore.kernel.org/all/20220324055732.GB12078@lst.de/ [2] Link: https://lore.kernel.org/all/4386660.LvFx2qVVIh@natalenko.name/ [3] Suggested-by: Halil Pasic Tested-by: Oleksandr Natalenko Cc: Christoph Hellwig" Signed-off-by: Linus Torvalds --- include/linux/dma-mapping.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 6150d11a607e..dca2b1355bb1 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -61,14 +61,6 @@ */ #define DMA_ATTR_PRIVILEGED (1UL << 9) -/* - * This is a hint to the DMA-mapping subsystem that the device is expected - * to overwrite the entire mapped size, thus the caller does not require any - * of the previous buffer contents to be preserved. This allows - * bounce-buffering implementations to optimise DMA_FROM_DEVICE transfers. - */ -#define DMA_ATTR_OVERWRITE (1UL << 10) - /* * A dma_addr_t can hold any valid DMA or bus address for the platform. It can * be given to a device to use as a DMA source or target. It is specific to a -- cgit From 504c1cabe325df65c18ef38365ddd1a41c6b591b Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Tue, 25 Jan 2022 21:22:21 +0800 Subject: mm/balloon_compaction: make balloon page compaction callbacks static Since commit b1123ea6d3b3 ("mm: balloon: use general non-lru movable page feature"), these functions are called via balloon_aops callbacks. They're not called directly outside this file. So make them static and clean up the relevant code. Signed-off-by: Miaohe Lin Link: https://lore.kernel.org/r/20220125132221.2220-1-linmiaohe@huawei.com Signed-off-by: Michael S. Tsirkin Reviewed-by: Muchun Song --- include/linux/balloon_compaction.h | 22 ---------------------- 1 file changed, 22 deletions(-) (limited to 'include/linux') diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h index 338aa27e4773..edb7f6d41faa 100644 --- a/include/linux/balloon_compaction.h +++ b/include/linux/balloon_compaction.h @@ -80,12 +80,6 @@ static inline void balloon_devinfo_init(struct balloon_dev_info *balloon) #ifdef CONFIG_BALLOON_COMPACTION extern const struct address_space_operations balloon_aops; -extern bool balloon_page_isolate(struct page *page, - isolate_mode_t mode); -extern void balloon_page_putback(struct page *page); -extern int balloon_page_migrate(struct address_space *mapping, - struct page *newpage, - struct page *page, enum migrate_mode mode); /* * balloon_page_insert - insert a page into the balloon's page list and make @@ -155,22 +149,6 @@ static inline void balloon_page_delete(struct page *page) list_del(&page->lru); } -static inline bool balloon_page_isolate(struct page *page) -{ - return false; -} - -static inline void balloon_page_putback(struct page *page) -{ - return; -} - -static inline int balloon_page_migrate(struct page *newpage, - struct page *page, enum migrate_mode mode) -{ - return 0; -} - static inline gfp_t balloon_mapping_gfp_mask(void) { return GFP_HIGHUSER; -- cgit From a61280ddddaa45f95b60dd54c05f8e0e5b6810b7 Mon Sep 17 00:00:00 2001 From: Longpeng Date: Tue, 15 Mar 2022 11:25:51 +0800 Subject: vdpa: support exposing the config size to userspace - GET_CONFIG_SIZE: return the size of the virtio config space. The size contains the fields which are conditional on feature bits. Acked-by: Jason Wang Signed-off-by: Longpeng Link: https://lore.kernel.org/r/20220315032553.455-2-longpeng2@huawei.com Signed-off-by: Michael S. Tsirkin Reviewed-by: Stefano Garzarella --- include/linux/vdpa.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 721089bb4c84..a5269191edda 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -207,7 +207,8 @@ struct vdpa_map_file { * @reset: Reset device * @vdev: vdpa device * Returns integer: success (0) or error (< 0) - * @get_config_size: Get the size of the configuration space + * @get_config_size: Get the size of the configuration space includes + * fields that are conditional on feature bits. * @vdev: vdpa device * Returns size_t: configuration size * @get_config: Read from device specific configuration space -- cgit From 81d46d693173a5c86a9b0c648eca1817ad5c0ae5 Mon Sep 17 00:00:00 2001 From: Longpeng Date: Tue, 15 Mar 2022 11:25:52 +0800 Subject: vdpa: change the type of nvqs to u32 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Change vdpa_device.nvqs and vhost_vdpa.nvqs to use u32 Signed-off-by: Longpeng Link: https://lore.kernel.org/r/20220315032553.455-3-longpeng2@huawei.com Signed-off-by: Michael S. Tsirkin Acked-by: Jason Wang Signed-off-by: Longpeng <longpeng2@huawei.com>

Acked-by: Jason Wang <jasowang@redhat.com>
 
Reviewed-by: Stefano Garzarella --- include/linux/vdpa.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index a5269191edda..8943a209202e 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -83,7 +83,7 @@ struct vdpa_device { unsigned int index; bool features_valid; bool use_va; - int nvqs; + u32 nvqs; struct vdpa_mgmt_dev *mdev; }; @@ -338,10 +338,10 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent, dev_struct, member)), name, use_va), \ dev_struct, member) -int vdpa_register_device(struct vdpa_device *vdev, int nvqs); +int vdpa_register_device(struct vdpa_device *vdev, u32 nvqs); void vdpa_unregister_device(struct vdpa_device *vdev); -int _vdpa_register_device(struct vdpa_device *vdev, int nvqs); +int _vdpa_register_device(struct vdpa_device *vdev, u32 nvqs); void _vdpa_unregister_device(struct vdpa_device *vdev); /** -- cgit From f32404ae1bb9a7428a3c77419672a28895d185bf Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Fri, 25 Mar 2022 22:50:23 +0100 Subject: net: move net_unlink_todo() out of the header There's no reason for this to be in netdevice.h, it's all just used in dev.c. Also make it no longer inline and let the compiler decide to do that by itself. Signed-off-by: Johannes Berg Link: https://lore.kernel.org/r/20220325225023.f49b9056fe1c.I6b901a2df00000837a9bd251a8dd259bd23f5ded@changeid Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index cd7a597c55b1..59e27a2b7bf0 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4601,16 +4601,6 @@ bool netdev_has_upper_dev(struct net_device *dev, struct net_device *upper_dev); struct net_device *netdev_upper_get_next_dev_rcu(struct net_device *dev, struct list_head **iter); -#ifdef CONFIG_LOCKDEP -static LIST_HEAD(net_unlink_list); - -static inline void net_unlink_todo(struct net_device *dev) -{ - if (list_empty(&dev->unlink_list)) - list_add_tail(&dev->unlink_list, &net_unlink_list); -} -#endif - /* iterate through upper list, must be called under RCU read lock */ #define netdev_for_each_upper_dev_rcu(dev, updev, iter) \ for (iter = &(dev)->adj_list.upper, \ -- cgit From 73f9b911faa74ac5107879de05c9489c419f41bb Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Sat, 26 Mar 2022 11:27:05 +0900 Subject: kprobes: Use rethook for kretprobe if possible Use rethook for kretprobe function return hooking if the arch sets CONFIG_HAVE_RETHOOK=y. In this case, CONFIG_KRETPROBE_ON_RETHOOK is set to 'y' automatically, and the kretprobe internal data fields switches to use rethook. If not, it continues to use kretprobe specific function return hooks. Suggested-by: Peter Zijlstra Signed-off-by: Masami Hiramatsu Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/164826162556.2455864.12255833167233452047.stgit@devnote2 --- include/linux/kprobes.h | 51 +++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 49 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index 312ff997c743..157168769fc2 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -28,6 +28,7 @@ #include #include #include +#include #include #ifdef CONFIG_KPROBES @@ -149,13 +150,20 @@ struct kretprobe { int maxactive; int nmissed; size_t data_size; +#ifdef CONFIG_KRETPROBE_ON_RETHOOK + struct rethook *rh; +#else struct freelist_head freelist; struct kretprobe_holder *rph; +#endif }; #define KRETPROBE_MAX_DATA_SIZE 4096 struct kretprobe_instance { +#ifdef CONFIG_KRETPROBE_ON_RETHOOK + struct rethook_node node; +#else union { struct freelist_node freelist; struct rcu_head rcu; @@ -164,6 +172,7 @@ struct kretprobe_instance { struct kretprobe_holder *rph; kprobe_opcode_t *ret_addr; void *fp; +#endif char data[]; }; @@ -186,10 +195,24 @@ extern void kprobe_busy_begin(void); extern void kprobe_busy_end(void); #ifdef CONFIG_KRETPROBES -extern void arch_prepare_kretprobe(struct kretprobe_instance *ri, - struct pt_regs *regs); +/* Check whether @p is used for implementing a trampoline. */ extern int arch_trampoline_kprobe(struct kprobe *p); +#ifdef CONFIG_KRETPROBE_ON_RETHOOK +static nokprobe_inline struct kretprobe *get_kretprobe(struct kretprobe_instance *ri) +{ + RCU_LOCKDEP_WARN(!rcu_read_lock_any_held(), + "Kretprobe is accessed from instance under preemptive context"); + + return (struct kretprobe *)READ_ONCE(ri->node.rethook->data); +} +static nokprobe_inline unsigned long get_kretprobe_retaddr(struct kretprobe_instance *ri) +{ + return ri->node.ret_addr; +} +#else +extern void arch_prepare_kretprobe(struct kretprobe_instance *ri, + struct pt_regs *regs); void arch_kretprobe_fixup_return(struct pt_regs *regs, kprobe_opcode_t *correct_ret_addr); @@ -232,6 +255,12 @@ static nokprobe_inline struct kretprobe *get_kretprobe(struct kretprobe_instance return READ_ONCE(ri->rph->rp); } +static nokprobe_inline unsigned long get_kretprobe_retaddr(struct kretprobe_instance *ri) +{ + return (unsigned long)ri->ret_addr; +} +#endif /* CONFIG_KRETPROBE_ON_RETHOOK */ + #else /* !CONFIG_KRETPROBES */ static inline void arch_prepare_kretprobe(struct kretprobe *rp, struct pt_regs *regs) @@ -395,7 +424,11 @@ void unregister_kretprobe(struct kretprobe *rp); int register_kretprobes(struct kretprobe **rps, int num); void unregister_kretprobes(struct kretprobe **rps, int num); +#ifdef CONFIG_KRETPROBE_ON_RETHOOK +#define kprobe_flush_task(tk) do {} while (0) +#else void kprobe_flush_task(struct task_struct *tk); +#endif void kprobe_free_init_mem(void); @@ -509,6 +542,19 @@ static inline bool is_kprobe_optinsn_slot(unsigned long addr) #endif /* !CONFIG_OPTPROBES */ #ifdef CONFIG_KRETPROBES +#ifdef CONFIG_KRETPROBE_ON_RETHOOK +static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr) +{ + return is_rethook_trampoline(addr); +} + +static nokprobe_inline +unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp, + struct llist_node **cur) +{ + return rethook_find_ret_addr(tsk, (unsigned long)fp, cur); +} +#else static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr) { return (void *)addr == kretprobe_trampoline_addr(); @@ -516,6 +562,7 @@ static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr) unsigned long kretprobe_find_ret_addr(struct task_struct *tsk, void *fp, struct llist_node **cur); +#endif #else static nokprobe_inline bool is_kretprobe_trampoline(unsigned long addr) { -- cgit From 5974ea7ce0f9a5987fc8cf5e08ad6e3e70bb542e Mon Sep 17 00:00:00 2001 From: Sungup Moon Date: Mon, 14 Mar 2022 20:05:45 +0900 Subject: nvme: allow duplicate NSIDs for private namespaces A NVMe subsystem with multiple controller can have private namespaces that use the same NSID under some conditions: "If Namespace Management, ANA Reporting, or NVM Sets are supported, the NSIDs shall be unique within the NVM subsystem. If the Namespace Management, ANA Reporting, and NVM Sets are not supported, then NSIDs: a) for shared namespace shall be unique; and b) for private namespace are not required to be unique." Reference: Section 6.1.6 NSID and Namespace Usage; NVM Express 1.4c spec. Make sure this specific setup is supported in Linux. Fixes: 9ad1927a3bc2 ("nvme: always search for namespace head") Signed-off-by: Sungup Moon [hch: refactored and fixed the controller vs subsystem based naming conflict] Signed-off-by: Christoph Hellwig Reviewed-by: Sagi Grimberg --- include/linux/nvme.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/nvme.h b/include/linux/nvme.h index 9dbc3ef4daf7..2dcee34d467d 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -345,6 +345,7 @@ enum { NVME_CTRL_ONCS_TIMESTAMP = 1 << 6, NVME_CTRL_VWC_PRESENT = 1 << 0, NVME_CTRL_OACS_SEC_SUPP = 1 << 0, + NVME_CTRL_OACS_NS_MNGT_SUPP = 1 << 3, NVME_CTRL_OACS_DIRECTIVES = 1 << 5, NVME_CTRL_OACS_DBBUF_SUPP = 1 << 8, NVME_CTRL_LPA_CMD_EFFECTS_LOG = 1 << 1, -- cgit From 3ae8fd41573af4fb3a490c9ed947fc936ba87190 Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Tue, 11 Jan 2022 16:57:50 -0600 Subject: rtc: mc146818-lib: Fix the AltCentury for AMD platforms Setting the century forward has been failing on AMD platforms. There was a previous attempt at fixing this for family 0x17 as part of commit 7ad295d5196a ("rtc: Fix the AltCentury value on AMD/Hygon platform") but this was later reverted due to some problems reported that appeared to stem from an FW bug on a family 0x17 desktop system. The same comments mentioned in the previous commit continue to apply to the newer platforms as well. ``` MC146818 driver use function mc146818_set_time() to set register RTC_FREQ_SELECT(RTC_REG_A)'s bit4-bit6 field which means divider stage reset value on Intel platform to 0x7. While AMD/Hygon RTC_REG_A(0Ah)'s bit4 is defined as DV0 [Reference]: DV0 = 0 selects Bank 0, DV0 = 1 selects Bank 1. Bit5-bit6 is defined as reserved. DV0 is set to 1, it will select Bank 1, which will disable AltCentury register(0x32) access. As UEFI pass acpi_gbl_FADT.century 0x32 (AltCentury), the CMOS write will be failed on code: CMOS_WRITE(century, acpi_gbl_FADT.century). Correct RTC_REG_A bank select bit(DV0) to 0 on AMD/Hygon CPUs, it will enable AltCentury(0x32) register writing and finally setup century as expected. ``` However in closer examination the change previously submitted was also modifying bits 5 & 6 which are declared reserved in the AMD documentation. So instead modify just the DV0 bank selection bit. Being cognizant that there was a failure reported before, split the code change out to a static function that can also be used for exclusions if any regressions such as Mikhail's pop up again. Cc: Jinke Fan Cc: Mikhail Gavrilov Link: https://lore.kernel.org/all/CABXGCsMLob0DC25JS8wwAYydnDoHBSoMh2_YLPfqm3TTvDE-Zw@mail.gmail.com/ Link: https://www.amd.com/system/files/TechDocs/51192_Bolton_FCH_RRG.pdf Signed-off-by: Raul E Rangel Signed-off-by: Mario Limonciello Signed-off-by: Alexandre Belloni Link: https://lore.kernel.org/r/20220111225750.1699-1-mario.limonciello@amd.com --- include/linux/mc146818rtc.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mc146818rtc.h b/include/linux/mc146818rtc.h index 808bb4cee230..b0da04fe087b 100644 --- a/include/linux/mc146818rtc.h +++ b/include/linux/mc146818rtc.h @@ -86,6 +86,8 @@ struct cmos_rtc_board_info { /* 2 values for divider stage reset, others for "testing purposes only" */ # define RTC_DIV_RESET1 0x60 # define RTC_DIV_RESET2 0x70 + /* In AMD BKDG bit 5 and 6 are reserved, bit 4 is for select dv0 bank */ +# define RTC_AMD_BANK_SELECT 0x10 /* Periodic intr. / Square wave rate select. 0=none, 1=32.8kHz,... 15=2Hz */ # define RTC_RATE_SELECT 0x0F -- cgit From eb07d5a4da041fd2e30e386e5fd12d23bb31cf9e Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Wed, 30 Mar 2022 11:48:37 +1100 Subject: SUNRPC: handle malloc failure in ->request_prepare If ->request_prepare() detects an error, it sets ->rq_task->tk_status. This is easy for callers to ignore. The only caller is xprt_request_enqueue_receive() and it does ignore the error, as does call_encode() which calls it. This can result in a request being queued to receive a reply without an allocated receive buffer. So instead of setting rq_task->tk_status, return an error, and store in ->tk_status only in call_encode(); The call to xprt_request_enqueue_receive() is now earlier in call_encode(), where the error can still be handled. Signed-off-by: NeilBrown Signed-off-by: Trond Myklebust --- include/linux/sunrpc/xprt.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index eef5e87c03b4..f171f8c09e13 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -144,7 +144,7 @@ struct rpc_xprt_ops { unsigned short (*get_srcport)(struct rpc_xprt *xprt); int (*buf_alloc)(struct rpc_task *task); void (*buf_free)(struct rpc_task *task); - void (*prepare_request)(struct rpc_rqst *req); + int (*prepare_request)(struct rpc_rqst *req); int (*send_request)(struct rpc_rqst *req); void (*wait_for_reply_request)(struct rpc_task *task); void (*timer)(struct rpc_xprt *xprt, struct rpc_task *task); @@ -357,10 +357,9 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task); void xprt_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task); void xprt_free_slot(struct rpc_xprt *xprt, struct rpc_rqst *req); -void xprt_request_prepare(struct rpc_rqst *req); bool xprt_prepare_transmit(struct rpc_task *task); void xprt_request_enqueue_transmit(struct rpc_task *task); -void xprt_request_enqueue_receive(struct rpc_task *task); +int xprt_request_enqueue_receive(struct rpc_task *task); void xprt_request_wait_receive(struct rpc_task *task); void xprt_request_dequeue_xprt(struct rpc_task *task); bool xprt_request_need_retransmit(struct rpc_task *task); -- cgit From 7968778914e53788a01c2dee2692cab157de9ac0 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Wed, 9 Mar 2022 20:50:39 +0100 Subject: PCI: Remove the deprecated "pci-dma-compat.h" API Now that all usages of the functions defined in "pci-dma-compat.h" have been removed, it is time to remove this file as well. In order not to break builds, move the "#include " that was in "pci-dma-compat.h" into "include/linux/pci.h" Signed-off-by: Christophe JAILLET Acked-by: Bjorn Helgaas Signed-off-by: Christoph Hellwig --- include/linux/pci-dma-compat.h | 129 ----------------------------------------- include/linux/pci.h | 3 +- 2 files changed, 1 insertion(+), 131 deletions(-) delete mode 100644 include/linux/pci-dma-compat.h (limited to 'include/linux') diff --git a/include/linux/pci-dma-compat.h b/include/linux/pci-dma-compat.h deleted file mode 100644 index 249d4d7fbf18..000000000000 --- a/include/linux/pci-dma-compat.h +++ /dev/null @@ -1,129 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* include this file if the platform implements the dma_ DMA Mapping API - * and wants to provide the pci_ DMA Mapping API in terms of it */ - -#ifndef _ASM_GENERIC_PCI_DMA_COMPAT_H -#define _ASM_GENERIC_PCI_DMA_COMPAT_H - -#include - -/* This defines the direction arg to the DMA mapping routines. */ -#define PCI_DMA_BIDIRECTIONAL DMA_BIDIRECTIONAL -#define PCI_DMA_TODEVICE DMA_TO_DEVICE -#define PCI_DMA_FROMDEVICE DMA_FROM_DEVICE -#define PCI_DMA_NONE DMA_NONE - -static inline void * -pci_alloc_consistent(struct pci_dev *hwdev, size_t size, - dma_addr_t *dma_handle) -{ - return dma_alloc_coherent(&hwdev->dev, size, dma_handle, GFP_ATOMIC); -} - -static inline void * -pci_zalloc_consistent(struct pci_dev *hwdev, size_t size, - dma_addr_t *dma_handle) -{ - return dma_alloc_coherent(&hwdev->dev, size, dma_handle, GFP_ATOMIC); -} - -static inline void -pci_free_consistent(struct pci_dev *hwdev, size_t size, - void *vaddr, dma_addr_t dma_handle) -{ - dma_free_coherent(&hwdev->dev, size, vaddr, dma_handle); -} - -static inline dma_addr_t -pci_map_single(struct pci_dev *hwdev, void *ptr, size_t size, int direction) -{ - return dma_map_single(&hwdev->dev, ptr, size, (enum dma_data_direction)direction); -} - -static inline void -pci_unmap_single(struct pci_dev *hwdev, dma_addr_t dma_addr, - size_t size, int direction) -{ - dma_unmap_single(&hwdev->dev, dma_addr, size, (enum dma_data_direction)direction); -} - -static inline dma_addr_t -pci_map_page(struct pci_dev *hwdev, struct page *page, - unsigned long offset, size_t size, int direction) -{ - return dma_map_page(&hwdev->dev, page, offset, size, (enum dma_data_direction)direction); -} - -static inline void -pci_unmap_page(struct pci_dev *hwdev, dma_addr_t dma_address, - size_t size, int direction) -{ - dma_unmap_page(&hwdev->dev, dma_address, size, (enum dma_data_direction)direction); -} - -static inline int -pci_map_sg(struct pci_dev *hwdev, struct scatterlist *sg, - int nents, int direction) -{ - return dma_map_sg(&hwdev->dev, sg, nents, (enum dma_data_direction)direction); -} - -static inline void -pci_unmap_sg(struct pci_dev *hwdev, struct scatterlist *sg, - int nents, int direction) -{ - dma_unmap_sg(&hwdev->dev, sg, nents, (enum dma_data_direction)direction); -} - -static inline void -pci_dma_sync_single_for_cpu(struct pci_dev *hwdev, dma_addr_t dma_handle, - size_t size, int direction) -{ - dma_sync_single_for_cpu(&hwdev->dev, dma_handle, size, (enum dma_data_direction)direction); -} - -static inline void -pci_dma_sync_single_for_device(struct pci_dev *hwdev, dma_addr_t dma_handle, - size_t size, int direction) -{ - dma_sync_single_for_device(&hwdev->dev, dma_handle, size, (enum dma_data_direction)direction); -} - -static inline void -pci_dma_sync_sg_for_cpu(struct pci_dev *hwdev, struct scatterlist *sg, - int nelems, int direction) -{ - dma_sync_sg_for_cpu(&hwdev->dev, sg, nelems, (enum dma_data_direction)direction); -} - -static inline void -pci_dma_sync_sg_for_device(struct pci_dev *hwdev, struct scatterlist *sg, - int nelems, int direction) -{ - dma_sync_sg_for_device(&hwdev->dev, sg, nelems, (enum dma_data_direction)direction); -} - -static inline int -pci_dma_mapping_error(struct pci_dev *pdev, dma_addr_t dma_addr) -{ - return dma_mapping_error(&pdev->dev, dma_addr); -} - -#ifdef CONFIG_PCI -static inline int pci_set_dma_mask(struct pci_dev *dev, u64 mask) -{ - return dma_set_mask(&dev->dev, mask); -} - -static inline int pci_set_consistent_dma_mask(struct pci_dev *dev, u64 mask) -{ - return dma_set_coherent_mask(&dev->dev, mask); -} -#else -static inline int pci_set_dma_mask(struct pci_dev *dev, u64 mask) -{ return -EIO; } -static inline int pci_set_consistent_dma_mask(struct pci_dev *dev, u64 mask) -{ return -EIO; } -#endif - -#endif diff --git a/include/linux/pci.h b/include/linux/pci.h index b957eeb89c7a..60adf42460ab 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2473,8 +2473,7 @@ static inline bool pci_is_thunderbolt_attached(struct pci_dev *pdev) void pci_uevent_ers(struct pci_dev *pdev, enum pci_ers_result err_type); #endif -/* Provide the legacy pci_dma_* API */ -#include +#include #define pci_printk(level, pdev, fmt, arg...) \ dev_printk(level, &(pdev)->dev, fmt, ##arg) -- cgit From c18c86808b78c4c2dc69f27f37c57abab14ee387 Mon Sep 17 00:00:00 2001 From: "Michael S. Tsirkin" Date: Wed, 30 Mar 2022 02:22:17 -0400 Subject: Revert "virtio_config: introduce a new .enable_cbs method" This reverts commit d50497eb4e554e1f0351e1836ee7241c059592e6. The new callback ended up not being used, and it's asymmetrical: just enable, no disable. Signed-off-by: Michael S. Tsirkin Acked-by: Jason Wang --- include/linux/virtio_config.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index dafdc7f48c01..b341dd62aa4d 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -23,8 +23,6 @@ struct virtio_shm_region { * any of @get/@set, @get_status/@set_status, or @get_features/ * @finalize_features are NOT safe to be called from an atomic * context. - * @enable_cbs: enable the callbacks - * vdev: the virtio_device * @get: read the value of a configuration field * vdev: the virtio_device * offset: the offset of the configuration field @@ -78,7 +76,6 @@ struct virtio_shm_region { */ typedef void vq_callback_t(struct virtqueue *); struct virtio_config_ops { - void (*enable_cbs)(struct virtio_device *vdev); void (*get)(struct virtio_device *vdev, unsigned offset, void *buf, unsigned len); void (*set)(struct virtio_device *vdev, unsigned offset, @@ -233,9 +230,6 @@ void virtio_device_ready(struct virtio_device *dev) { unsigned status = dev->config->get_status(dev); - if (dev->config->enable_cbs) - dev->config->enable_cbs(dev); - BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK); dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK); } -- cgit From 4a9c7bbe2ed4d2b240674b1fb606c41d3940c412 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Tue, 29 Mar 2022 18:14:56 -0700 Subject: bpf: Resolve to prog->aux->dst_prog->type only for BPF_PROG_TYPE_EXT The commit 7e40781cc8b7 ("bpf: verifier: Use target program's type for access verifications") fixes the verifier checking for BPF_PROG_TYPE_EXT (extension) prog such that the verifier looks for things based on the target prog type that it is extending instead of the BPF_PROG_TYPE_EXT itself. The current resolve_prog_type() returns the target prog type. It checks for nullness on prog->aux->dst_prog. However, when loading a BPF_PROG_TYPE_TRACING prog and it is tracing another bpf prog instead of a kernel function, prog->aux->dst_prog is not NULL also. In this case, the verifier should still verify as the BPF_PROG_TYPE_TRACING type instead of the traced prog type in prog->aux->dst_prog->type. An oops has been reported when tracing a struct_ops prog. A NULL dereference happened in check_return_code() when accessing the prog->aux->attach_func_proto->type and prog->aux->attach_func_proto is NULL here because the traced struct_ops prog has the "unreliable" set. This patch is to change the resolve_prog_type() to only return the target prog type if the prog being verified is BPF_PROG_TYPE_EXT. Fixes: 7e40781cc8b7 ("bpf: verifier: Use target program's type for access verifications") Signed-off-by: Martin KaFai Lau Signed-off-by: Alexei Starovoitov Acked-by: Yonghong Song Link: https://lore.kernel.org/bpf/20220330011456.2984509-1-kafai@fb.com --- include/linux/bpf_verifier.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index c1fc4af47f69..3a9d2d7cc6b7 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -570,9 +570,11 @@ static inline u32 type_flag(u32 type) return type & ~BPF_BASE_TYPE_MASK; } +/* only use after check_attach_btf_id() */ static inline enum bpf_prog_type resolve_prog_type(struct bpf_prog *prog) { - return prog->aux->dst_prog ? prog->aux->dst_prog->type : prog->type; + return prog->type == BPF_PROG_TYPE_EXT ? + prog->aux->dst_prog->type : prog->type; } #endif /* _LINUX_BPF_VERIFIER_H */ -- cgit From 7dd5ad2d3e82fb55229e3fe18e09160878e77e20 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Thu, 31 Mar 2022 10:36:55 +0200 Subject: Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels" Revert commit bf9ad37dc8a. It needs to be better encapsulated and generalized. Signed-off-by: Thomas Gleixner Cc: "Eric W. Biederman" Cc: Oleg Nesterov Cc: Sebastian Andrzej Siewior --- include/linux/sched.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 4a6fdd2a679f..d5e3c00b74e1 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1090,9 +1090,6 @@ struct task_struct { /* Restored if set_restore_sigmask() was used: */ sigset_t saved_sigmask; struct sigpending pending; -#ifdef CONFIG_RT_DELAYED_SIGNALS - struct kernel_siginfo forced_info; -#endif unsigned long sas_ss_sp; size_t sas_ss_size; unsigned int sas_ss_flags; -- cgit From 48ec13d36d3ff716a8a08e6583a925def7a2564d Mon Sep 17 00:00:00 2001 From: Joey Gouly Date: Fri, 18 Mar 2022 12:12:33 +0000 Subject: gpio: Properly document parent data union Suppress a warning in the html docs by documenting these fields separately. Signed-off-by: Joey Gouly Link: https://lore.kernel.org/lkml/20211027220118.71a229ab@canb.auug.org.au/ Cc: Linus Walleij Cc: Bartosz Golaszewski Cc: Marc Zyngier Cc: Stephen Rothwell Reviewed-by: Linus Walleij Signed-off-by: Bartosz Golaszewski --- include/linux/gpio/driver.h | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h index b0728c8ad90c..98c93510640e 100644 --- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -168,13 +168,16 @@ struct gpio_irq_chip { /** * @parent_handler_data: + * + * If @per_parent_data is false, @parent_handler_data is a single + * pointer used as the data associated with every parent interrupt. + * * @parent_handler_data_array: * - * Data associated, and passed to, the handler for the parent - * interrupt. Can either be a single pointer if @per_parent_data - * is false, or an array of @num_parents pointers otherwise. If - * @per_parent_data is true, @parent_handler_data_array cannot be - * NULL. + * If @per_parent_data is true, @parent_handler_data_array is + * an array of @num_parents pointers, and is used to associate + * different data for each parent. This cannot be NULL if + * @per_parent_data is true. */ union { void *parent_handler_data; -- cgit From 229a08a4f4e4f9949801cc39b6480ddc9c487183 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 9 Mar 2022 09:37:31 -0800 Subject: ARM/dma-mapping: Remove CMA code when not built with CMA The MAX_CMA_AREAS could be set to 0, which would result in code that would attempt to operate beyond the end of a zero-sized array. If CONFIG_CMA is disabled, just remove this code entirely. Found when building arm on GCC 10.x for several defconfigs (e.g. axm55xx_defconfig) under -Warray-bounds: arch/arm/mm/dma-mapping.c:396:22: warning: array subscript is outside array bounds of 'struct dma_contig_early_reserve[0]' [-Warray-bounds] 396 | dma_mmu_remap[dma_mmu_remap_num].size = size; | ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~ arch/arm/mm/dma-mapping.c:389:40: note: while referencing 'dma_mmu_remap' 389 | static struct dma_contig_early_reserve dma_mmu_remap[MAX_CMA_AREAS] __initdata; | ^~~~~~~~~~~~~ Cc: Russell King Cc: Logan Gunthorpe Cc: Martin Oliveira Cc: David Hildenbrand Cc: Andrew Morton Cc: Stephen Rothwell Cc: Zi Yan Cc: Hari Bathini Cc: Minchan Kim Cc: Mike Kravetz Cc: linux-arm-kernel@lists.infradead.org Link: https://lore.kernel.org/all/6243ee60.1c69fb81.16de6.7dbf@mx.google.com/ Signed-off-by: Kees Cook Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/lkml/20220310070041.GA24874@lst.de Reviewed-by: David Hildenbrand Link: https://lore.kernel.org/lkml/9059fa71-330f-f04f-b155-2850abb72a71@redhat.com --- include/linux/cma.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cma.h b/include/linux/cma.h index bd801023504b..2c2ede7f0724 100644 --- a/include/linux/cma.h +++ b/include/linux/cma.h @@ -12,10 +12,6 @@ */ #ifdef CONFIG_CMA_AREAS #define MAX_CMA_AREAS (1 + CONFIG_CMA_AREAS) - -#else -#define MAX_CMA_AREAS (0) - #endif #define CMA_MAX_NAME 64 -- cgit From 15325e3c1013035c2e3e266ba79a0c3bef905f25 Mon Sep 17 00:00:00 2001 From: Christian König Date: Thu, 11 Nov 2021 15:18:34 +0100 Subject: dma-buf: drop the DAG approach for the dma_resv object v3 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit So far we had the approach of using a directed acyclic graph with the dma_resv obj. This turned out to have many downsides, especially it means that every single driver and user of this interface needs to be aware of this restriction when adding fences. If the rules for the DAG are not followed then we end up with potential hard to debug memory corruption, information leaks or even elephant big security holes because we allow userspace to access freed up memory. Since we already took a step back from that by always looking at all fences we now go a step further and stop dropping the shared fences when a new exclusive one is added. v2: Drop some now superflous documentation v3: Add some more documentation for the new handling. Signed-off-by: Christian König Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220321135856.1331-11-christian.koenig@amd.com --- include/linux/dma-buf.h | 4 +--- include/linux/dma-resv.h | 22 +++++----------------- 2 files changed, 6 insertions(+), 20 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 7ab50076e7a6..85ab5554425e 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -424,9 +424,7 @@ struct dma_buf { * IMPORTANT: * * All drivers must obey the struct dma_resv rules, specifically the - * rules for updating fences, see &dma_resv.fence_excl and - * &dma_resv.fence. If these dependency rules are broken access tracking - * can be lost resulting in use after free issues. + * rules for updating and obeying fences. */ struct dma_resv *resv; diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 202cc65d0621..dccaf7b1663e 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -93,23 +93,11 @@ struct dma_resv { * * The exclusive fence, if there is one currently. * - * There are two ways to update this fence: - * - * - First by calling dma_resv_add_excl_fence(), which replaces all - * fences attached to the reservation object. To guarantee that no - * fences are lost, this new fence must signal only after all previous - * fences, both shared and exclusive, have signalled. In some cases it - * is convenient to achieve that by attaching a struct dma_fence_array - * with all the new and old fences. - * - * - Alternatively the fence can be set directly, which leaves the - * shared fences unchanged. To guarantee that no fences are lost, this - * new fence must signal only after the previous exclusive fence has - * signalled. Since the shared fences are staying intact, it is not - * necessary to maintain any ordering against those. If semantically - * only a new access is added without actually treating the previous - * one as a dependency the exclusive fences can be strung together - * using struct dma_fence_chain. + * To guarantee that no fences are lost, this new fence must signal + * only after the previous exclusive fence has signalled. If + * semantically only a new access is added without actually treating the + * previous one as a dependency the exclusive fences can be strung + * together using struct dma_fence_chain. * * Note that actual semantics of what an exclusive or shared fence mean * is defined by the user, for reservation objects shared across drivers -- cgit From aad5b23ebf21573a32b6f07644f028d64492a5d6 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Mon, 28 Mar 2022 12:34:31 -0400 Subject: dm: fix dm_io and dm_target_io flags race condition on Alpha Early alpha processors cannot write a single byte or short; they read 8 bytes, modify the value in registers and write back 8 bytes. This could cause race condition in the structure dm_io - if the fields flags and io_count are modified simultaneously. Fix this bug by using 32-bit flags if we are on Alpha and if we are compiling for a processor that doesn't have the byte-word-extension. Signed-off-by: Mikulas Patocka Fixes: bd4a6dd241ae ("dm: reduce size of dm_io and dm_target_io structs") [snitzer: Jens allowed this change since Mikulas owns a relevant Alpha!] Acked-by: Jens Axboe Signed-off-by: Mike Snitzer --- include/linux/blk_types.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index dd0763a1c674..1973ef9bd40f 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -85,8 +85,10 @@ struct block_device { */ #if defined(CONFIG_ALPHA) && !defined(__alpha_bwx__) typedef u32 __bitwise blk_status_t; +typedef u32 blk_short_t; #else typedef u8 __bitwise blk_status_t; +typedef u16 blk_short_t; #endif #define BLK_STS_OK 0 #define BLK_STS_NOTSUPP ((__force blk_status_t)1) -- cgit From ebf921a9fac38560e0fc3a4381e163a6969efd5a Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 22 Jan 2022 15:46:22 -0500 Subject: readahead: Remove read_cache_pages() With no remaining users, remove this function and the related infrastructure. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Al Viro Acked-by: Al Viro --- include/linux/pagemap.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index a8d0b327b066..993994cd943a 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -752,8 +752,6 @@ struct page *read_cache_page(struct address_space *, pgoff_t index, filler_t *filler, void *data); extern struct page * read_cache_page_gfp(struct address_space *mapping, pgoff_t index, gfp_t gfp_mask); -extern int read_cache_pages(struct address_space *mapping, - struct list_head *pages, filler_t *filler, void *data); static inline struct page *read_mapping_page(struct address_space *mapping, pgoff_t index, struct file *file) -- cgit From 704528d895dd3e7b173e672116b4eb2b0a0fceb0 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 23 Mar 2022 21:29:04 -0400 Subject: fs: Remove ->readpages address space operation All filesystems have now been converted to use ->readahead, so remove the ->readpages operation and fix all the comments that used to refer to it. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Al Viro Acked-by: Al Viro --- include/linux/fs.h | 6 ------ include/linux/fsverity.h | 2 +- 2 files changed, 1 insertion(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 183160872133..7c81887cc7e8 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -370,12 +370,6 @@ struct address_space_operations { /* Mark a folio dirty. Return true if this dirtied it */ bool (*dirty_folio)(struct address_space *, struct folio *); - /* - * Reads in the requested pages. Unlike ->readpage(), this is - * PURELY used for read-ahead!. - */ - int (*readpages)(struct file *filp, struct address_space *mapping, - struct list_head *pages, unsigned nr_pages); void (*readahead)(struct readahead_control *); int (*write_begin)(struct file *, struct address_space *mapping, diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h index b568b3c7d095..a7afc800bd8d 100644 --- a/include/linux/fsverity.h +++ b/include/linux/fsverity.h @@ -221,7 +221,7 @@ static inline void fsverity_enqueue_verify_work(struct work_struct *work) * * This checks whether ->i_verity_info has been set. * - * Filesystems call this from ->readpages() to check whether the pages need to + * Filesystems call this from ->readahead() to check whether the pages need to * be verified or not. Don't use IS_VERITY() for this purpose; it's subject to * a race condition where the file is being read concurrently with * FS_IOC_ENABLE_VERITY completing. (S_VERITY is set before ->i_verity_info.) -- cgit From a9fcd89d67bb8c4ad613b54ab691fc603c94a03a Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 14 Feb 2022 09:13:43 -0500 Subject: fs: Remove read_actor_t This typedef is not used any more. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Al Viro Acked-by: Al Viro --- include/linux/fs.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 7c81887cc7e8..7588d3a0ced8 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -357,9 +357,6 @@ typedef struct { int error; } read_descriptor_t; -typedef int (*read_actor_t)(read_descriptor_t *, struct page *, - unsigned long, unsigned long); - struct address_space_operations { int (*writepage)(struct page *page, struct writeback_control *wbc); int (*readpage)(struct file *, struct page *); -- cgit From b2403a61308533c576c9dd783fcb73a9186e0b37 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 14 Feb 2022 09:15:34 -0500 Subject: fs, net: Move read_descriptor_t to net.h fs.h has no more need for this typedef; networking is now the sole user of the read_descriptor_t. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Al Viro Acked-by: Al Viro --- include/linux/fs.h | 19 ------------------- include/linux/net.h | 19 +++++++++++++++++++ 2 files changed, 19 insertions(+), 19 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 7588d3a0ced8..8ff28939de60 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -338,25 +338,6 @@ static inline bool is_sync_kiocb(struct kiocb *kiocb) return kiocb->ki_complete == NULL; } -/* - * "descriptor" for what we're up to with a read. - * This allows us to use the same read code yet - * have multiple different users of the data that - * we read from a file. - * - * The simplest case just copies the data to user - * mode. - */ -typedef struct { - size_t written; - size_t count; - union { - char __user *buf; - void *data; - } arg; - int error; -} read_descriptor_t; - struct address_space_operations { int (*writepage)(struct page *page, struct writeback_control *wbc); int (*readpage)(struct file *, struct page *); diff --git a/include/linux/net.h b/include/linux/net.h index ba736b457a06..12093f4db50c 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -125,6 +125,25 @@ struct socket { struct socket_wq wq; }; +/* + * "descriptor" for what we're up to with a read. + * This allows us to use the same read code yet + * have multiple different users of the data that + * we read from a file. + * + * The simplest case just copies the data to user + * mode. + */ +typedef struct { + size_t written; + size_t count; + union { + char __user *buf; + void *data; + } arg; + int error; +} read_descriptor_t; + struct vm_area_struct; struct page; struct sockaddr; -- cgit From 800ba29547e16d5fbe67ca764ba660e049e9f1bf Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 19 Feb 2022 23:19:49 -0500 Subject: fs: Pass an iocb to generic_perform_write() We can extract both the file pointer and the pos from the iocb. This simplifies each caller as well as allowing generic_perform_write() to see more of the iocb contents in the future. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Christian Brauner Reviewed-by: Al Viro Acked-by: Al Viro --- include/linux/fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 8ff28939de60..468dc7ec821f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2999,7 +2999,7 @@ extern ssize_t generic_file_read_iter(struct kiocb *, struct iov_iter *); extern ssize_t __generic_file_write_iter(struct kiocb *, struct iov_iter *); extern ssize_t generic_file_write_iter(struct kiocb *, struct iov_iter *); extern ssize_t generic_file_direct_write(struct kiocb *, struct iov_iter *); -extern ssize_t generic_perform_write(struct file *, struct iov_iter *, loff_t); +ssize_t generic_perform_write(struct kiocb *, struct iov_iter *); ssize_t vfs_iter_read(struct file *file, struct iov_iter *iter, loff_t *ppos, rwf_t flags); -- cgit From d7414ba14a3a67f81321069219dc7dbc095022c3 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sun, 20 Feb 2022 22:28:03 -0500 Subject: filemap: Remove AOP_FLAG_CONT_EXPAND This flag is no longer used, so remove it. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Al Viro Acked-by: Al Viro --- include/linux/fs.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 468dc7ec821f..bbde95387a23 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -275,7 +275,6 @@ enum positive_aop_returns { AOP_TRUNCATED_PAGE = 0x80001, }; -#define AOP_FLAG_CONT_EXPAND 0x0001 /* called from cont_expand */ #define AOP_FLAG_NOFS 0x0002 /* used by filesystem to direct * helper code (eg buffer layer) * to clear GFP_FS from alloc */ -- cgit From ada543af3bfe3d953986eca118601b9612382c13 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Fri, 1 Apr 2022 11:28:45 -0700 Subject: mm, kasan: fix __GFP_BITS_SHIFT definition breaking LOCKDEP KASAN changes that added new GFP flags mistakenly updated __GFP_BITS_SHIFT as the total number of GFP bits instead of as a shift used to define __GFP_BITS_MASK. This broke LOCKDEP, as __GFP_BITS_MASK now gets the 25th bit enabled instead of the 28th for __GFP_NOLOCKDEP. Update __GFP_BITS_SHIFT to always count KASAN GFP bits. In the future, we could handle all combinations of KASAN and LOCKDEP to occupy as few bits as possible. For now, we have enough GFP bits to be inefficient in this quick fix. Link: https://lkml.kernel.org/r/462ff52742a1fcc95a69778685737f723ee4dfb3.1648400273.git.andreyknvl@google.com Fixes: 9353ffa6e9e9 ("kasan, page_alloc: allow skipping memory init for HW_TAGS") Fixes: 53ae233c30a6 ("kasan, page_alloc: allow skipping unpoisoning for HW_TAGS") Fixes: f49d9c5bb15c ("kasan, mm: only define ___GFP_SKIP_KASAN_POISON with HW_TAGS") Signed-off-by: Andrey Konovalov Reported-by: Sebastian Andrzej Siewior Tested-by: Sebastian Andrzej Siewior Acked-by: Vlastimil Babka Cc: Marco Elver Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: Andrey Ryabinin Cc: Matthew Wilcox Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/gfp.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 0fa17fb85de5..761f8f1885c7 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -264,9 +264,7 @@ struct vm_area_struct; #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP) /* Room for N __GFP_FOO bits */ -#define __GFP_BITS_SHIFT (24 + \ - 3 * IS_ENABLED(CONFIG_KASAN_HW_TAGS) + \ - IS_ENABLED(CONFIG_LOCKDEP)) +#define __GFP_BITS_SHIFT (27 + IS_ENABLED(CONFIG_LOCKDEP)) #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) /** -- cgit From df06dae3f2a8bfb83683abf88d3dcde23fc8093d Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Wed, 23 Feb 2022 16:53:02 +0000 Subject: KVM: Don't actually set a request when evicting vCPUs for GFN cache invd Don't actually set a request bit in vcpu->requests when making a request purely to force a vCPU to exit the guest. Logging a request but not actually consuming it would cause the vCPU to get stuck in an infinite loop during KVM_RUN because KVM would see the pending request and bail from VM-Enter to service the request. Note, it's currently impossible for KVM to set KVM_REQ_GPC_INVALIDATE as nothing in KVM is wired up to set guest_uses_pa=true. But, it'd be all too easy for arch code to introduce use of kvm_gfn_to_pfn_cache_init() without implementing handling of the request, especially since getting test coverage of MMU notifier interaction with specific KVM features usually requires a directed test. Opportunistically rename gfn_to_pfn_cache_invalidate_start()'s wake_vcpus to evict_vcpus. The purpose of the request is to get vCPUs out of guest mode, it's supposed to _avoid_ waking vCPUs that are blocking. Opportunistically rename KVM_REQ_GPC_INVALIDATE to be more specific as to what it wants to accomplish, and to genericize the name so that it can used for similar but unrelated scenarios, should they arise in the future. Add a comment and documentation to explain why the "no action" request exists. Add compile-time assertions to help detect improper usage. Use the inner assertless helper in the one s390 path that makes requests without a hardcoded request. Cc: David Woodhouse Signed-off-by: Sean Christopherson Message-Id: <20220223165302.3205276-1-seanjc@google.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 38 +++++++++++++++++++++++++++++++------- 1 file changed, 31 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 9536ffa0473b..678fd7914521 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -148,6 +148,7 @@ static inline bool is_error_page(struct page *page) #define KVM_REQUEST_MASK GENMASK(7,0) #define KVM_REQUEST_NO_WAKEUP BIT(8) #define KVM_REQUEST_WAIT BIT(9) +#define KVM_REQUEST_NO_ACTION BIT(10) /* * Architecture-independent vcpu->requests bit members * Bits 4-7 are reserved for more arch-independent bits. @@ -156,9 +157,18 @@ static inline bool is_error_page(struct page *page) #define KVM_REQ_VM_DEAD (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQ_UNBLOCK 2 #define KVM_REQ_UNHALT 3 -#define KVM_REQ_GPC_INVALIDATE (5 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQUEST_ARCH_BASE 8 +/* + * KVM_REQ_OUTSIDE_GUEST_MODE exists is purely as way to force the vCPU to + * OUTSIDE_GUEST_MODE. KVM_REQ_OUTSIDE_GUEST_MODE differs from a vCPU "kick" + * in that it ensures the vCPU has reached OUTSIDE_GUEST_MODE before continuing + * on. A kick only guarantees that the vCPU is on its way out, e.g. a previous + * kick may have set vcpu->mode to EXITING_GUEST_MODE, and so there's no + * guarantee the vCPU received an IPI and has actually exited guest mode. + */ +#define KVM_REQ_OUTSIDE_GUEST_MODE (KVM_REQUEST_NO_ACTION | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) + #define KVM_ARCH_REQ_FLAGS(nr, flags) ({ \ BUILD_BUG_ON((unsigned)(nr) >= (sizeof_field(struct kvm_vcpu, requests) * 8) - KVM_REQUEST_ARCH_BASE); \ (unsigned)(((nr) + KVM_REQUEST_ARCH_BASE) | (flags)); \ @@ -1222,7 +1232,9 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn); * @vcpu: vCPU to be used for marking pages dirty and to be woken on * invalidation. * @guest_uses_pa: indicates that the resulting host physical PFN is used while - * @vcpu is IN_GUEST_MODE so invalidations should wake it. + * @vcpu is IN_GUEST_MODE; invalidations of the cache from MMU + * notifiers (but not for KVM memslot changes!) will also force + * @vcpu to exit the guest to refresh the cache. * @kernel_map: requests a kernel virtual mapping (kmap / memremap). * @gpa: guest physical address to map. * @len: sanity check; the range being access must fit a single page. @@ -1233,10 +1245,9 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn); * -EFAULT for an untranslatable guest physical address. * * This primes a gfn_to_pfn_cache and links it into the @kvm's list for - * invalidations to be processed. Invalidation callbacks to @vcpu using - * %KVM_REQ_GPC_INVALIDATE will occur only for MMU notifiers, not for KVM - * memslot changes. Callers are required to use kvm_gfn_to_pfn_cache_check() - * to ensure that the cache is valid before accessing the target page. + * invalidations to be processed. Callers are required to use + * kvm_gfn_to_pfn_cache_check() to ensure that the cache is valid before + * accessing the target page. */ int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, struct kvm_vcpu *vcpu, bool guest_uses_pa, @@ -1984,7 +1995,7 @@ static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args) void kvm_arch_irq_routing_update(struct kvm *kvm); -static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu) +static inline void __kvm_make_request(int req, struct kvm_vcpu *vcpu) { /* * Ensure the rest of the request is published to kvm_check_request's @@ -1994,6 +2005,19 @@ static inline void kvm_make_request(int req, struct kvm_vcpu *vcpu) set_bit(req & KVM_REQUEST_MASK, (void *)&vcpu->requests); } +static __always_inline void kvm_make_request(int req, struct kvm_vcpu *vcpu) +{ + /* + * Request that don't require vCPU action should never be logged in + * vcpu->requests. The vCPU won't clear the request, so it will stay + * logged indefinitely and prevent the vCPU from entering the guest. + */ + BUILD_BUG_ON(!__builtin_constant_p(req) || + (req & KVM_REQUEST_NO_ACTION)); + + __kvm_make_request(req, vcpu); +} + static inline bool kvm_request_pending(struct kvm_vcpu *vcpu) { return READ_ONCE(vcpu->requests); -- cgit From d0d96121d03d6d9cf608d948247a9f24f5a02da9 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Thu, 3 Mar 2022 15:41:11 +0000 Subject: KVM: Use enum to track if cached PFN will be used in guest and/or host Replace the guest_uses_pa and kernel_map booleans in the PFN cache code with a unified enum/bitmask. Using explicit names makes it easier to review and audit call sites. Opportunistically add a WARN to prevent passing garbage; instantating a cache without declaring its usage is either buggy or pointless. Signed-off-by: Sean Christopherson Signed-off-by: David Woodhouse Signed-off-by: Paolo Bonzini Message-Id: <20220303154127.202856-2-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 16 ++++++++-------- include/linux/kvm_types.h | 10 ++++++++-- 2 files changed, 16 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 678fd7914521..be9bbc0c6200 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1231,11 +1231,12 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn); * @gpc: struct gfn_to_pfn_cache object. * @vcpu: vCPU to be used for marking pages dirty and to be woken on * invalidation. - * @guest_uses_pa: indicates that the resulting host physical PFN is used while - * @vcpu is IN_GUEST_MODE; invalidations of the cache from MMU - * notifiers (but not for KVM memslot changes!) will also force - * @vcpu to exit the guest to refresh the cache. - * @kernel_map: requests a kernel virtual mapping (kmap / memremap). + * @usage: indicates if the resulting host physical PFN is used while + * the @vcpu is IN_GUEST_MODE (in which case invalidation of + * the cache from MMU notifiers---but not for KVM memslot + * changes!---will also force @vcpu to exit the guest and + * refresh the cache); and/or if the PFN used directly + * by KVM (and thus needs a kernel virtual mapping). * @gpa: guest physical address to map. * @len: sanity check; the range being access must fit a single page. * @dirty: mark the cache dirty immediately. @@ -1250,9 +1251,8 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn); * accessing the target page. */ int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, - struct kvm_vcpu *vcpu, bool guest_uses_pa, - bool kernel_map, gpa_t gpa, unsigned long len, - bool dirty); + struct kvm_vcpu *vcpu, enum pfn_cache_usage usage, + gpa_t gpa, unsigned long len, bool dirty); /** * kvm_gfn_to_pfn_cache_check - check validity of a gfn_to_pfn_cache. diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index dceac12c1ce5..784f37cbf33e 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ -18,6 +18,7 @@ struct kvm_memslots; enum kvm_mr_change; +#include #include #include @@ -46,6 +47,12 @@ typedef u64 hfn_t; typedef hfn_t kvm_pfn_t; +enum pfn_cache_usage { + KVM_GUEST_USES_PFN = BIT(0), + KVM_HOST_USES_PFN = BIT(1), + KVM_GUEST_AND_HOST_USE_PFN = KVM_GUEST_USES_PFN | KVM_HOST_USES_PFN, +}; + struct gfn_to_hva_cache { u64 generation; gpa_t gpa; @@ -64,11 +71,10 @@ struct gfn_to_pfn_cache { rwlock_t lock; void *khva; kvm_pfn_t pfn; + enum pfn_cache_usage usage; bool active; bool valid; bool dirty; - bool kernel_map; - bool guest_uses_pa; }; #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE -- cgit From cf1d88b36ba7e83bdaa50bccc4c47864e8f08cbe Mon Sep 17 00:00:00 2001 From: David Woodhouse Date: Thu, 3 Mar 2022 15:41:12 +0000 Subject: KVM: Remove dirty handling from gfn_to_pfn_cache completely It isn't OK to cache the dirty status of a page in internal structures for an indefinite period of time. Any time a vCPU exits the run loop to userspace might be its last; the VMM might do its final check of the dirty log, flush the last remaining dirty pages to the destination and complete a live migration. If we have internal 'dirty' state which doesn't get flushed until the vCPU is finally destroyed on the source after migration is complete, then we have lost data because that will escape the final copy. This problem already exists with the use of kvm_vcpu_unmap() to mark pages dirty in e.g. VMX nesting. Note that the actual Linux MM already considers the page to be dirty since we have a writeable mapping of it. This is just about the KVM dirty logging. For the nesting-style use cases (KVM_GUEST_USES_PFN) we will need to track which gfn_to_pfn_caches have been used and explicitly mark the corresponding pages dirty before returning to userspace. But we would have needed external tracking of that anyway, rather than walking the full list of GPCs to find those belonging to this vCPU which are dirty. So let's rely *solely* on that external tracking, and keep it simple rather than laying a tempting trap for callers to fall into. Signed-off-by: David Woodhouse Signed-off-by: Paolo Bonzini Message-Id: <20220303154127.202856-3-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 14 +++++--------- include/linux/kvm_types.h | 1 - 2 files changed, 5 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index be9bbc0c6200..3f9b22c4983a 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1239,7 +1239,6 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn); * by KVM (and thus needs a kernel virtual mapping). * @gpa: guest physical address to map. * @len: sanity check; the range being access must fit a single page. - * @dirty: mark the cache dirty immediately. * * @return: 0 for success. * -EINVAL for a mapping which would cross a page boundary. @@ -1252,7 +1251,7 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn); */ int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, struct kvm_vcpu *vcpu, enum pfn_cache_usage usage, - gpa_t gpa, unsigned long len, bool dirty); + gpa_t gpa, unsigned long len); /** * kvm_gfn_to_pfn_cache_check - check validity of a gfn_to_pfn_cache. @@ -1261,7 +1260,6 @@ int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, * @gpc: struct gfn_to_pfn_cache object. * @gpa: current guest physical address to map. * @len: sanity check; the range being access must fit a single page. - * @dirty: mark the cache dirty immediately. * * @return: %true if the cache is still valid and the address matches. * %false if the cache is not valid. @@ -1283,7 +1281,6 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, * @gpc: struct gfn_to_pfn_cache object. * @gpa: updated guest physical address to map. * @len: sanity check; the range being access must fit a single page. - * @dirty: mark the cache dirty immediately. * * @return: 0 for success. * -EINVAL for a mapping which would cross a page boundary. @@ -1296,7 +1293,7 @@ bool kvm_gfn_to_pfn_cache_check(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, * with the lock still held to permit access. */ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, - gpa_t gpa, unsigned long len, bool dirty); + gpa_t gpa, unsigned long len); /** * kvm_gfn_to_pfn_cache_unmap - temporarily unmap a gfn_to_pfn_cache. @@ -1304,10 +1301,9 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, * @kvm: pointer to kvm instance. * @gpc: struct gfn_to_pfn_cache object. * - * This unmaps the referenced page and marks it dirty, if appropriate. The - * cache is left in the invalid state but at least the mapping from GPA to - * userspace HVA will remain cached and can be reused on a subsequent - * refresh. + * This unmaps the referenced page. The cache is left in the invalid state + * but at least the mapping from GPA to userspace HVA will remain cached + * and can be reused on a subsequent refresh. */ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc); diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index 784f37cbf33e..ac1ebb37a0ff 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ -74,7 +74,6 @@ struct gfn_to_pfn_cache { enum pfn_cache_usage usage; bool active; bool valid; - bool dirty; }; #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE -- cgit From 8733068b9bdbc7a54f02dcc59eb0e4789cd60942 Mon Sep 17 00:00:00 2001 From: David Woodhouse Date: Thu, 3 Mar 2022 15:41:17 +0000 Subject: KVM: x86/xen: Make kvm_xen_set_evtchn() reusable from other places Clean it up to return -errno on error consistently, while still being compatible with the return conventions for kvm_arch_set_irq_inatomic() and the kvm_set_irq() callback. We use -ENOTCONN to indicate when the port is masked. No existing users care, except that it's negative. Also allow it to optimise the vCPU lookup. Unless we abuse the lapic map, there is no quick lookup from APIC ID to a vCPU; the logic in kvm_get_vcpu_by_id() will just iterate over all vCPUs till it finds the one it wants. So do that just once and stash the result in the struct kvm_xen_evtchn for next time. Signed-off-by: David Woodhouse Signed-off-by: Paolo Bonzini Message-Id: <20220303154127.202856-8-dwmw2@infradead.org> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 3f9b22c4983a..252ee4a61b58 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -611,7 +611,8 @@ struct kvm_hv_sint { struct kvm_xen_evtchn { u32 port; - u32 vcpu; + u32 vcpu_id; + int vcpu_idx; u32 priority; }; -- cgit From 18bfee3216fa6f28d55ebf88d824a539d2bec3c7 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Wed, 30 Mar 2022 09:00:19 +0200 Subject: ftrace: Make ftrace_graph_is_dead() a static branch ftrace_graph_is_dead() is used on hot paths, it just reads a variable in memory and is not worth suffering function call constraints. For instance, at entry of prepare_ftrace_return(), inlining it avoids saving prepare_ftrace_return() parameters to stack and restoring them after calling ftrace_graph_is_dead(). While at it using a static branch is even more performant and is rather well adapted considering that the returned value will almost never change. Inline ftrace_graph_is_dead() and replace 'kill_ftrace_graph' bool by a static branch. The performance improvement is noticeable. Link: https://lkml.kernel.org/r/e0411a6a0ed3eafff0ad2bc9cd4b0e202b4617df.1648623570.git.christophe.leroy@csgroup.eu Signed-off-by: Christophe Leroy Signed-off-by: Steven Rostedt (Google) --- include/linux/ftrace.h | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 37b619185ec9..f15a4b76cbfc 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -9,6 +9,7 @@ #include #include +#include #include #include #include @@ -1015,7 +1016,20 @@ unsigned long ftrace_graph_ret_addr(struct task_struct *task, int *idx, extern int register_ftrace_graph(struct fgraph_ops *ops); extern void unregister_ftrace_graph(struct fgraph_ops *ops); -extern bool ftrace_graph_is_dead(void); +/** + * ftrace_graph_is_dead - returns true if ftrace_graph_stop() was called + * + * ftrace_graph_stop() is called when a severe error is detected in + * the function graph tracing. This function is called by the critical + * paths of function graph to keep those paths from doing any more harm. + */ +DECLARE_STATIC_KEY_FALSE(kill_ftrace_graph); + +static inline bool ftrace_graph_is_dead(void) +{ + return static_branch_unlikely(&kill_ftrace_graph); +} + extern void ftrace_graph_stop(void); /* The current handlers in use */ -- cgit From 5cfff569cab8bf544bab62c911c5d6efd5af5e05 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Google)" Date: Fri, 1 Apr 2022 14:39:03 -0400 Subject: tracing: Move user_events.h temporarily out of include/uapi While user_events API is under development and has been marked for broken to not let the API become fixed, move the header file out of the uapi directory. This is to prevent it from being installed, then later changed, and then have an old distro user space update with a new kernel, where applications see the user_events being available, but the old header is in place, and then they get compiled incorrectly. Also, surround the include with CONFIG_COMPILE_TEST to the current location, but when the BROKEN tag is taken off, it will use the uapi directory, and fail to compile. This is a good way to remind us to move the header back. Link: https://lore.kernel.org/all/20220330155835.5e1f6669@gandalf.local.home Link: https://lkml.kernel.org/r/20220330201755.29319-1-mathieu.desnoyers@efficios.com Link: https://lkml.kernel.org/r/20220401143903.188384f3@gandalf.local.home Suggested-by: Mathieu Desnoyers Signed-off-by: Steven Rostedt (Google) --- include/linux/user_events.h | 63 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 63 insertions(+) create mode 100644 include/linux/user_events.h (limited to 'include/linux') diff --git a/include/linux/user_events.h b/include/linux/user_events.h new file mode 100644 index 000000000000..736e05603463 --- /dev/null +++ b/include/linux/user_events.h @@ -0,0 +1,63 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * Copyright (c) 2021, Microsoft Corporation. + * + * Authors: + * Beau Belgrave + */ +#ifndef _UAPI_LINUX_USER_EVENTS_H +#define _UAPI_LINUX_USER_EVENTS_H + +#include +#include + +#ifdef __KERNEL__ +#include +#else +#include +#endif + +#define USER_EVENTS_SYSTEM "user_events" +#define USER_EVENTS_PREFIX "u:" + +/* Bits 0-6 are for known probe types, Bit 7 is for unknown probes */ +#define EVENT_BIT_FTRACE 0 +#define EVENT_BIT_PERF 1 +#define EVENT_BIT_OTHER 7 + +#define EVENT_STATUS_FTRACE (1 << EVENT_BIT_FTRACE) +#define EVENT_STATUS_PERF (1 << EVENT_BIT_PERF) +#define EVENT_STATUS_OTHER (1 << EVENT_BIT_OTHER) + +/* Create dynamic location entry within a 32-bit value */ +#define DYN_LOC(offset, size) ((size) << 16 | (offset)) + +/* + * Describes an event registration and stores the results of the registration. + * This structure is passed to the DIAG_IOCSREG ioctl, callers at a minimum + * must set the size and name_args before invocation. + */ +struct user_reg { + + /* Input: Size of the user_reg structure being used */ + __u32 size; + + /* Input: Pointer to string with event name, description and flags */ + __u64 name_args; + + /* Output: Byte index of the event within the status page */ + __u32 status_index; + + /* Output: Index of the event to use when writing data */ + __u32 write_index; +}; + +#define DIAG_IOC_MAGIC '*' + +/* Requests to register a user_event */ +#define DIAG_IOCSREG _IOWR(DIAG_IOC_MAGIC, 0, struct user_reg*) + +/* Requests to delete a user_event */ +#define DIAG_IOCSDEL _IOW(DIAG_IOC_MAGIC, 1, char*) + +#endif /* _UAPI_LINUX_USER_EVENTS_H */ -- cgit From 1cd927ad6f62f27d8908498dcbf61395c5dd5fe2 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Google)" Date: Fri, 1 Apr 2022 14:39:03 -0400 Subject: tracing: mark user_events as BROKEN After being merged, user_events become more visible to a wider audience that have concerns with the current API. It is too late to fix this for this release, but instead of a full revert, just mark it as BROKEN (which prevents it from being selected in make config). Then we can work finding a better API. If that fails, then it will need to be completely reverted. To not have the code silently bitrot, still allow building it with COMPILE_TEST. And to prevent the uapi header from being installed, then later changed, and then have an old distro user space see the old version, move the header file out of the uapi directory. Surround the include with CONFIG_COMPILE_TEST to the current location, but when the BROKEN tag is taken off, it will use the uapi directory, and fail to compile. This is a good way to remind us to move the header back. Link: https://lore.kernel.org/all/20220330155835.5e1f6669@gandalf.local.home Link: https://lkml.kernel.org/r/20220330201755.29319-1-mathieu.desnoyers@efficios.com Suggested-by: Mathieu Desnoyers Signed-off-by: Steven Rostedt (Google) Signed-off-by: Linus Torvalds --- include/linux/user_events.h | 116 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 116 insertions(+) create mode 100644 include/linux/user_events.h (limited to 'include/linux') diff --git a/include/linux/user_events.h b/include/linux/user_events.h new file mode 100644 index 000000000000..e570840571e1 --- /dev/null +++ b/include/linux/user_events.h @@ -0,0 +1,116 @@ +/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ +/* + * Copyright (c) 2021, Microsoft Corporation. + * + * Authors: + * Beau Belgrave + */ +#ifndef _UAPI_LINUX_USER_EVENTS_H +#define _UAPI_LINUX_USER_EVENTS_H + +#include +#include + +#ifdef __KERNEL__ +#include +#else +#include +#endif + +#define USER_EVENTS_SYSTEM "user_events" +#define USER_EVENTS_PREFIX "u:" + +/* Bits 0-6 are for known probe types, Bit 7 is for unknown probes */ +#define EVENT_BIT_FTRACE 0 +#define EVENT_BIT_PERF 1 +#define EVENT_BIT_OTHER 7 + +#define EVENT_STATUS_FTRACE (1 << EVENT_BIT_FTRACE) +#define EVENT_STATUS_PERF (1 << EVENT_BIT_PERF) +#define EVENT_STATUS_OTHER (1 << EVENT_BIT_OTHER) + +/* Create dynamic location entry within a 32-bit value */ +#define DYN_LOC(offset, size) ((size) << 16 | (offset)) + +/* Use raw iterator for attached BPF program(s), no affect on ftrace/perf */ +#define FLAG_BPF_ITER (1 << 0) + +/* + * Describes an event registration and stores the results of the registration. + * This structure is passed to the DIAG_IOCSREG ioctl, callers at a minimum + * must set the size and name_args before invocation. + */ +struct user_reg { + + /* Input: Size of the user_reg structure being used */ + __u32 size; + + /* Input: Pointer to string with event name, description and flags */ + __u64 name_args; + + /* Output: Byte index of the event within the status page */ + __u32 status_index; + + /* Output: Index of the event to use when writing data */ + __u32 write_index; +}; + +#define DIAG_IOC_MAGIC '*' + +/* Requests to register a user_event */ +#define DIAG_IOCSREG _IOWR(DIAG_IOC_MAGIC, 0, struct user_reg*) + +/* Requests to delete a user_event */ +#define DIAG_IOCSDEL _IOW(DIAG_IOC_MAGIC, 1, char*) + +/* Data type that was passed to the BPF program */ +enum { + /* Data resides in kernel space */ + USER_BPF_DATA_KERNEL, + + /* Data resides in user space */ + USER_BPF_DATA_USER, + + /* Data is a pointer to a user_bpf_iter structure */ + USER_BPF_DATA_ITER, +}; + +/* + * Describes an iovec iterator that BPF programs can use to access data for + * a given user_event write() / writev() call. + */ +struct user_bpf_iter { + + /* Offset of the data within the first iovec */ + __u32 iov_offset; + + /* Number of iovec structures */ + __u32 nr_segs; + + /* Pointer to iovec structures */ + const struct iovec *iov; +}; + +/* Context that BPF programs receive when attached to a user_event */ +struct user_bpf_context { + + /* Data type being passed (see union below) */ + __u32 data_type; + + /* Length of the data */ + __u32 data_len; + + /* Pointer to data, varies by data type */ + union { + /* Kernel data (data_type == USER_BPF_DATA_KERNEL) */ + void *kdata; + + /* User data (data_type == USER_BPF_DATA_USER) */ + void *udata; + + /* Direct iovec (data_type == USER_BPF_DATA_ITER) */ + struct user_bpf_iter *iter; + }; +}; + +#endif /* _UAPI_LINUX_USER_EVENTS_H */ -- cgit From 92cedee6a6a3e6fcc3ffc0e3866baae5f6f76ac1 Mon Sep 17 00:00:00 2001 From: Christian König Date: Wed, 3 Nov 2021 10:02:08 +0100 Subject: dma-buf: add dma_resv_get_singleton v2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a function to simplify getting a single fence for all the fences in the dma_resv object. v2: fix ref leak in error handling Signed-off-by: Christian König Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220321135856.1331-3-christian.koenig@amd.com --- include/linux/dma-resv.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index dccaf7b1663e..233ed4f14d9e 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -437,6 +437,8 @@ void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context, void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence); int dma_resv_get_fences(struct dma_resv *obj, bool write, unsigned int *num_fences, struct dma_fence ***fences); +int dma_resv_get_singleton(struct dma_resv *obj, bool write, + struct dma_fence **fence); int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src); long dma_resv_wait_timeout(struct dma_resv *obj, bool wait_all, bool intr, unsigned long timeout); -- cgit From 8bea9af887de4c99a95f93f2ce400ef63e8b4e9b Mon Sep 17 00:00:00 2001 From: Lars-Peter Clausen Date: Tue, 22 Mar 2022 12:50:27 +0200 Subject: iio: adc: ad_sigma_delta: Add sequencer support Some sigma-delta chips support sampling of multiple channels in continuous mode. When the operating with more than one channel enabled, the channel sequencer cycles through the enabled channels in sequential order, from first channel to the last one. If a channel is disabled, it is skipped by the sequencer. If more than one channel is used in continuous mode, instruct the device to append the status to the SPI transfer (1 extra byte) every time we receive a sample. All sigma-delta chips possessing a sampling sequencer have this ability. Inside the status register there will be the number of the converted channel. In this way, even if the CPU won't keep up with the sampling rate, it won't send to userspace wrong channel samples. When multiple channels are enabled in continuous mode, the device needs to perform a measurement on all slots before we can push to userspace the sample. If, during sequencing and data reading, a channel measurement is lost, a desync occurred. In this case, ad_sigma_delta drops the incomplete sample and waits for the device to send the measurement on the first active slot. Co-developed-by: Alexandru Tachici Signed-off-by: Alexandru Tachici Signed-off-by: Lars-Peter Clausen Link: https://lore.kernel.org/r/20220322105029.86389-5-alexandru.tachici@analog.com Signed-off-by: Jonathan Cameron --- include/linux/iio/adc/ad_sigma_delta.h | 38 ++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iio/adc/ad_sigma_delta.h b/include/linux/iio/adc/ad_sigma_delta.h index c525fd51652f..7852f6c9a714 100644 --- a/include/linux/iio/adc/ad_sigma_delta.h +++ b/include/linux/iio/adc/ad_sigma_delta.h @@ -32,26 +32,34 @@ struct iio_dev; /** * struct ad_sigma_delta_info - Sigma Delta driver specific callbacks and options * @set_channel: Will be called to select the current channel, may be NULL. + * @append_status: Will be called to enable status append at the end of the sample, may be NULL. * @set_mode: Will be called to select the current mode, may be NULL. + * @disable_all: Will be called to disable all channels, may be NULL. * @postprocess_sample: Is called for each sampled data word, can be used to * modify or drop the sample data, it, may be NULL. * @has_registers: true if the device has writable and readable registers, false * if there is just one read-only sample data shift register. * @addr_shift: Shift of the register address in the communications register. * @read_mask: Mask for the communications register having the read bit set. + * @status_ch_mask: Mask for the channel number stored in status register. * @data_reg: Address of the data register, if 0 the default address of 0x3 will * be used. * @irq_flags: flags for the interrupt used by the triggered buffer + * @num_slots: Number of sequencer slots */ struct ad_sigma_delta_info { int (*set_channel)(struct ad_sigma_delta *, unsigned int channel); + int (*append_status)(struct ad_sigma_delta *, bool append); int (*set_mode)(struct ad_sigma_delta *, enum ad_sigma_delta_mode mode); + int (*disable_all)(struct ad_sigma_delta *); int (*postprocess_sample)(struct ad_sigma_delta *, unsigned int raw_sample); bool has_registers; unsigned int addr_shift; unsigned int read_mask; + unsigned int status_ch_mask; unsigned int data_reg; unsigned long irq_flags; + unsigned int num_slots; }; /** @@ -76,6 +84,13 @@ struct ad_sigma_delta { uint8_t comm; const struct ad_sigma_delta_info *info; + unsigned int active_slots; + unsigned int current_slot; + unsigned int num_slots; + bool status_appended; + /* map slots to channels in order to know what to expect from devices */ + unsigned int *slots; + uint8_t *samples_buf; /* * DMA (thus cache coherency maintenance) requires the @@ -97,6 +112,29 @@ static inline int ad_sigma_delta_set_channel(struct ad_sigma_delta *sd, return 0; } +static inline int ad_sigma_delta_append_status(struct ad_sigma_delta *sd, bool append) +{ + int ret; + + if (sd->info->append_status) { + ret = sd->info->append_status(sd, append); + if (ret < 0) + return ret; + + sd->status_appended = append; + } + + return 0; +} + +static inline int ad_sigma_delta_disable_all(struct ad_sigma_delta *sd) +{ + if (sd->info->disable_all) + return sd->info->disable_all(sd); + + return 0; +} + static inline int ad_sigma_delta_set_mode(struct ad_sigma_delta *sd, unsigned int mode) { -- cgit From e87f4152e542610d0b4c6c8548964a68a59d2040 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Wed, 23 Mar 2022 20:02:41 +0100 Subject: task_stack, x86/cea: Force-inline stack helpers Force-inline two stack helpers to fix the following objtool warnings: vmlinux.o: warning: objtool: in_task_stack()+0xc: call to task_stack_page() leaves .noinstr.text section vmlinux.o: warning: objtool: in_entry_stack()+0x10: call to cpu_entry_stack() leaves .noinstr.text section Signed-off-by: Borislav Petkov Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220324183607.31717-2-bp@alien8.de --- include/linux/sched/task_stack.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sched/task_stack.h b/include/linux/sched/task_stack.h index 892562ebbd3a..5e799a47431e 100644 --- a/include/linux/sched/task_stack.h +++ b/include/linux/sched/task_stack.h @@ -16,7 +16,7 @@ * try_get_task_stack() instead. task_stack_page will return a pointer * that could get freed out from under you. */ -static inline void *task_stack_page(const struct task_struct *task) +static __always_inline void *task_stack_page(const struct task_struct *task) { return task->stack; } -- cgit From 5467801f1fcbdc46bc7298a84dbf3ca1ff2a7320 Mon Sep 17 00:00:00 2001 From: Shreeya Patel Date: Mon, 21 Mar 2022 19:02:41 +0530 Subject: gpio: Restrict usage of GPIO chip irq members before initialization GPIO chip irq members are exposed before they could be completely initialized and this leads to race conditions. One such issue was observed for the gc->irq.domain variable which was accessed through the I2C interface in gpiochip_to_irq() before it could be initialized by gpiochip_add_irqchip(). This resulted in Kernel NULL pointer dereference. Following are the logs for reference :- kernel: Call Trace: kernel: gpiod_to_irq+0x53/0x70 kernel: acpi_dev_gpio_irq_get_by+0x113/0x1f0 kernel: i2c_acpi_get_irq+0xc0/0xd0 kernel: i2c_device_probe+0x28a/0x2a0 kernel: really_probe+0xf2/0x460 kernel: RIP: 0010:gpiochip_to_irq+0x47/0xc0 To avoid such scenarios, restrict usage of GPIO chip irq members before they are completely initialized. Signed-off-by: Shreeya Patel Cc: stable@vger.kernel.org Reviewed-by: Andy Shevchenko Reviewed-by: Linus Walleij Signed-off-by: Bartosz Golaszewski --- include/linux/gpio/driver.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h index 98c93510640e..874aabd270c9 100644 --- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -221,6 +221,15 @@ struct gpio_irq_chip { */ bool per_parent_data; + /** + * @initialized: + * + * Flag to track GPIO chip irq member's initialization. + * This flag will make sure GPIO chip irq members are not used + * before they are initialized. + */ + bool initialized; + /** * @init_hw: optional routine to initialize hardware before * an IRQ chip will be added. This is quite useful when -- cgit From f0e3c6261af183f0c2246cfe691abec78377622c Mon Sep 17 00:00:00 2001 From: Johnson Wang Date: Fri, 1 Apr 2022 16:02:11 +0800 Subject: regulator: mt6366: Add support for MT6366 regulator The MT6366 is a regulator found on boards based on MediaTek MT8186 and probably other SoCs. It is a so called pmic and connects as a slave to SoC using SPI, wrapped inside the pmic-wrapper. Reviewed-by: AngeloGioacchino Del Regno Signed-off-by: Johnson Wang Link: https://lore.kernel.org/r/20220401080212.27383-2-johnson.wang@mediatek.com Signed-off-by: Mark Brown --- include/linux/regulator/mt6358-regulator.h | 45 ++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) (limited to 'include/linux') diff --git a/include/linux/regulator/mt6358-regulator.h b/include/linux/regulator/mt6358-regulator.h index 1cc304946d09..bdcf83cd719e 100644 --- a/include/linux/regulator/mt6358-regulator.h +++ b/include/linux/regulator/mt6358-regulator.h @@ -48,9 +48,54 @@ enum { MT6358_ID_VLDO28, MT6358_ID_VAUD28, MT6358_ID_VSIM2, + MT6358_ID_VCORE_SSHUB, + MT6358_ID_VSRAM_OTHERS_SSHUB, MT6358_ID_RG_MAX, }; +enum { + MT6366_ID_VDRAM1 = 0, + MT6366_ID_VCORE, + MT6366_ID_VPA, + MT6366_ID_VPROC11, + MT6366_ID_VPROC12, + MT6366_ID_VGPU, + MT6366_ID_VS2, + MT6366_ID_VMODEM, + MT6366_ID_VS1, + MT6366_ID_VDRAM2, + MT6366_ID_VSIM1, + MT6366_ID_VIBR, + MT6366_ID_VRF12, + MT6366_ID_VIO18, + MT6366_ID_VUSB, + MT6366_ID_VCN18, + MT6366_ID_VFE28, + MT6366_ID_VSRAM_PROC11, + MT6366_ID_VCN28, + MT6366_ID_VSRAM_OTHERS, + MT6366_ID_VSRAM_GPU, + MT6366_ID_VXO22, + MT6366_ID_VEFUSE, + MT6366_ID_VAUX18, + MT6366_ID_VMCH, + MT6366_ID_VBIF28, + MT6366_ID_VSRAM_PROC12, + MT6366_ID_VEMC, + MT6366_ID_VIO28, + MT6366_ID_VA12, + MT6366_ID_VRF18, + MT6366_ID_VCN33_BT, + MT6366_ID_VCN33_WIFI, + MT6366_ID_VMC, + MT6366_ID_VAUD28, + MT6366_ID_VSIM2, + MT6366_ID_VCORE_SSHUB, + MT6366_ID_VSRAM_OTHERS_SSHUB, + MT6366_ID_RG_MAX, +}; + #define MT6358_MAX_REGULATOR MT6358_ID_RG_MAX +#define MT6366_MAX_REGULATOR MT6366_ID_RG_MAX #endif /* __LINUX_REGULATOR_MT6358_H */ -- cgit From 71d637823cac7748079a912e0373476c7cf6f985 Mon Sep 17 00:00:00 2001 From: Christian König Date: Wed, 3 Nov 2021 13:35:14 +0100 Subject: dma-buf: finally make dma_resv_excl_fence private v2 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Drivers should never touch this directly. v2: fix rebase clash Signed-off-by: Christian König Link: https://patchwork.freedesktop.org/patch/msgid/20220321135856.1331-10-christian.koenig@amd.com Reviewed-by: Daniel Vetter --- include/linux/dma-resv.h | 17 ----------------- 1 file changed, 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 233ed4f14d9e..ecb697d4d861 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -411,23 +411,6 @@ static inline void dma_resv_unlock(struct dma_resv *obj) ww_mutex_unlock(&obj->lock); } -/** - * dma_resv_excl_fence - return the object's exclusive fence - * @obj: the reservation object - * - * Returns the exclusive fence (if any). Caller must either hold the objects - * through dma_resv_lock() or the RCU read side lock through rcu_read_lock(), - * or one of the variants of each - * - * RETURNS - * The exclusive fence or NULL - */ -static inline struct dma_fence * -dma_resv_excl_fence(struct dma_resv *obj) -{ - return rcu_dereference_check(obj->fence_excl, dma_resv_held(obj)); -} - void dma_resv_init(struct dma_resv *obj); void dma_resv_fini(struct dma_resv *obj); int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences); -- cgit From 8fd4ddda2f49a66bf5dd3d0c01966c4b1971308b Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Mon, 14 Mar 2022 12:49:36 +0100 Subject: static_call: Don't make __static_call_return0 static System.map shows that vmlinux contains several instances of __static_call_return0(): c0004fc0 t __static_call_return0 c0011518 t __static_call_return0 c00d8160 t __static_call_return0 arch_static_call_transform() uses the middle one to check whether we are setting a call to __static_call_return0 or not: c0011520 : c0011520: 3d 20 c0 01 lis r9,-16383 <== r9 = 0xc001 << 16 c0011524: 39 29 15 18 addi r9,r9,5400 <== r9 += 0x1518 c0011528: 7c 05 48 00 cmpw r5,r9 <== r9 has value 0xc0011518 here So if static_call_update() is called with one of the other instances of __static_call_return0(), arch_static_call_transform() won't recognise it. In order to work properly, global single instance of __static_call_return0() is required. Fixes: 3f2a8fc4b15d ("static_call/x86: Add __static_call_return0()") Signed-off-by: Christophe Leroy Signed-off-by: Peter Zijlstra (Intel) Acked-by: Josh Poimboeuf Link: https://lkml.kernel.org/r/30821468a0e7d28251954b578e5051dc09300d04.1647258493.git.christophe.leroy@csgroup.eu --- include/linux/static_call.h | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/static_call.h b/include/linux/static_call.h index 3e56a9751c06..fcc5b48989b3 100644 --- a/include/linux/static_call.h +++ b/include/linux/static_call.h @@ -248,10 +248,7 @@ static inline int static_call_text_reserved(void *start, void *end) return 0; } -static inline long __static_call_return0(void) -{ - return 0; -} +extern long __static_call_return0(void); #define EXPORT_STATIC_CALL(name) \ EXPORT_SYMBOL(STATIC_CALL_KEY(name)); \ -- cgit From 5517d500829c683a358a8de04ecb2e28af629ae5 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Mon, 14 Mar 2022 11:27:35 +0100 Subject: static_call: Properly initialise DEFINE_STATIC_CALL_RET0() When a static call is updated with __static_call_return0() as target, arch_static_call_transform() set it to use an optimised set of instructions which are meant to lay in the same cacheline. But when initialising a static call with DEFINE_STATIC_CALL_RET0(), we get a branch to the real __static_call_return0() function instead of getting the optimised setup: c00d8120 <__SCT__perf_snapshot_branch_stack>: c00d8120: 4b ff ff f4 b c00d8114 <__static_call_return0> c00d8124: 3d 80 c0 0e lis r12,-16370 c00d8128: 81 8c 81 3c lwz r12,-32452(r12) c00d812c: 7d 89 03 a6 mtctr r12 c00d8130: 4e 80 04 20 bctr c00d8134: 38 60 00 00 li r3,0 c00d8138: 4e 80 00 20 blr c00d813c: 00 00 00 00 .long 0x0 Add ARCH_DEFINE_STATIC_CALL_RET0_TRAMP() defined by each architecture to setup the optimised configuration, and rework DEFINE_STATIC_CALL_RET0() to call it: c00d8120 <__SCT__perf_snapshot_branch_stack>: c00d8120: 48 00 00 14 b c00d8134 <__SCT__perf_snapshot_branch_stack+0x14> c00d8124: 3d 80 c0 0e lis r12,-16370 c00d8128: 81 8c 81 3c lwz r12,-32452(r12) c00d812c: 7d 89 03 a6 mtctr r12 c00d8130: 4e 80 04 20 bctr c00d8134: 38 60 00 00 li r3,0 c00d8138: 4e 80 00 20 blr c00d813c: 00 00 00 00 .long 0x0 Signed-off-by: Christophe Leroy Signed-off-by: Peter Zijlstra (Intel) Acked-by: Josh Poimboeuf Link: https://lore.kernel.org/r/1e0a61a88f52a460f62a58ffc2a5f847d1f7d9d8.1647253456.git.christophe.leroy@csgroup.eu --- include/linux/static_call.h | 20 +++++++++++++++++--- 1 file changed, 17 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/static_call.h b/include/linux/static_call.h index fcc5b48989b3..3c50b0fdda16 100644 --- a/include/linux/static_call.h +++ b/include/linux/static_call.h @@ -196,6 +196,14 @@ extern long __static_call_return0(void); }; \ ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name) +#define DEFINE_STATIC_CALL_RET0(name, _func) \ + DECLARE_STATIC_CALL(name, _func); \ + struct static_call_key STATIC_CALL_KEY(name) = { \ + .func = __static_call_return0, \ + .type = 1, \ + }; \ + ARCH_DEFINE_STATIC_CALL_RET0_TRAMP(name) + #define static_call_cond(name) (void)__static_call(name) #define EXPORT_STATIC_CALL(name) \ @@ -231,6 +239,12 @@ static inline int static_call_init(void) { return 0; } }; \ ARCH_DEFINE_STATIC_CALL_NULL_TRAMP(name) +#define DEFINE_STATIC_CALL_RET0(name, _func) \ + DECLARE_STATIC_CALL(name, _func); \ + struct static_call_key STATIC_CALL_KEY(name) = { \ + .func = __static_call_return0, \ + }; \ + ARCH_DEFINE_STATIC_CALL_RET0_TRAMP(name) #define static_call_cond(name) (void)__static_call(name) @@ -284,6 +298,9 @@ static inline long __static_call_return0(void) .func = NULL, \ } +#define DEFINE_STATIC_CALL_RET0(name, _func) \ + __DEFINE_STATIC_CALL(name, _func, __static_call_return0) + static inline void __static_call_nop(void) { } /* @@ -327,7 +344,4 @@ static inline int static_call_text_reserved(void *start, void *end) #define DEFINE_STATIC_CALL(name, _func) \ __DEFINE_STATIC_CALL(name, _func, _func) -#define DEFINE_STATIC_CALL_RET0(name, _func) \ - __DEFINE_STATIC_CALL(name, _func, __static_call_return0) - #endif /* _LINUX_STATIC_CALL_H */ -- cgit From df21c0d7a94db64a4e1a0d070e26fb02e60fefab Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Mon, 14 Mar 2022 11:27:36 +0100 Subject: static_call: Remove __DEFINE_STATIC_CALL macro Only DEFINE_STATIC_CALL use __DEFINE_STATIC_CALL macro now when CONFIG_HAVE_STATIC_CALL is selected. Only keep __DEFINE_STATIC_CALL() for the generic fallback, and also use it to implement DEFINE_STATIC_CALL_NULL() in that case. Signed-off-by: Christophe Leroy Signed-off-by: Peter Zijlstra (Intel) Acked-by: Josh Poimboeuf Link: https://lore.kernel.org/r/329074f92d96e3220ebe15da7bbe2779beee31eb.1647253456.git.christophe.leroy@csgroup.eu --- include/linux/static_call.h | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/static_call.h b/include/linux/static_call.h index 3c50b0fdda16..df53bed9d71f 100644 --- a/include/linux/static_call.h +++ b/include/linux/static_call.h @@ -180,13 +180,13 @@ extern int static_call_text_reserved(void *start, void *end); extern long __static_call_return0(void); -#define __DEFINE_STATIC_CALL(name, _func, _func_init) \ +#define DEFINE_STATIC_CALL(name, _func) \ DECLARE_STATIC_CALL(name, _func); \ struct static_call_key STATIC_CALL_KEY(name) = { \ - .func = _func_init, \ + .func = _func, \ .type = 1, \ }; \ - ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func_init) + ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func) #define DEFINE_STATIC_CALL_NULL(name, _func) \ DECLARE_STATIC_CALL(name, _func); \ @@ -225,12 +225,12 @@ extern long __static_call_return0(void); static inline int static_call_init(void) { return 0; } -#define __DEFINE_STATIC_CALL(name, _func, _func_init) \ +#define DEFINE_STATIC_CALL(name, _func) \ DECLARE_STATIC_CALL(name, _func); \ struct static_call_key STATIC_CALL_KEY(name) = { \ - .func = _func_init, \ + .func = _func, \ }; \ - ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func_init) + ARCH_DEFINE_STATIC_CALL_TRAMP(name, _func) #define DEFINE_STATIC_CALL_NULL(name, _func) \ DECLARE_STATIC_CALL(name, _func); \ @@ -292,11 +292,11 @@ static inline long __static_call_return0(void) .func = _func_init, \ } +#define DEFINE_STATIC_CALL(name, _func) \ + __DEFINE_STATIC_CALL(name, _func, _func) + #define DEFINE_STATIC_CALL_NULL(name, _func) \ - DECLARE_STATIC_CALL(name, _func); \ - struct static_call_key STATIC_CALL_KEY(name) = { \ - .func = NULL, \ - } + __DEFINE_STATIC_CALL(name, _func, NULL) #define DEFINE_STATIC_CALL_RET0(name, _func) \ __DEFINE_STATIC_CALL(name, _func, __static_call_return0) @@ -341,7 +341,4 @@ static inline int static_call_text_reserved(void *start, void *end) #endif /* CONFIG_HAVE_STATIC_CALL */ -#define DEFINE_STATIC_CALL(name, _func) \ - __DEFINE_STATIC_CALL(name, _func, _func) - #endif /* _LINUX_STATIC_CALL_H */ -- cgit From 2d2f8f083ef29e9b7adfe5cb421368331543473f Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Mon, 28 Mar 2022 16:58:09 +0200 Subject: Revert "locking/local_lock: Make the empty local_lock_*() function a macro." With volatile removed from arch_raw_cpu_ptr() the compiler no longer creates the per-CPU reference. The usage of the macro can be reverted now. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220328145810.86783-3-bigeasy@linutronix.de --- include/linux/local_lock_internal.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/local_lock_internal.h b/include/linux/local_lock_internal.h index 6d635e8306d6..975e33b793a7 100644 --- a/include/linux/local_lock_internal.h +++ b/include/linux/local_lock_internal.h @@ -44,9 +44,9 @@ static inline void local_lock_debug_init(local_lock_t *l) } #else /* CONFIG_DEBUG_LOCK_ALLOC */ # define LOCAL_LOCK_DEBUG_INIT(lockname) -# define local_lock_acquire(__ll) do { typecheck(local_lock_t *, __ll); } while (0) -# define local_lock_release(__ll) do { typecheck(local_lock_t *, __ll); } while (0) -# define local_lock_debug_init(__ll) do { typecheck(local_lock_t *, __ll); } while (0) +static inline void local_lock_acquire(local_lock_t *l) { } +static inline void local_lock_release(local_lock_t *l) { } +static inline void local_lock_debug_init(local_lock_t *l) { } #endif /* !CONFIG_DEBUG_LOCK_ALLOC */ #define INIT_LOCAL_LOCK(lockname) { LOCAL_LOCK_DEBUG_INIT(lockname) } -- cgit From 8b023accc8df70e72f7704d29fead7ca914d6837 Mon Sep 17 00:00:00 2001 From: Nick Desaulniers Date: Mon, 14 Mar 2022 15:19:03 -0700 Subject: lockdep: Fix -Wunused-parameter for _THIS_IP_ While looking into a bug related to the compiler's handling of addresses of labels, I noticed some uses of _THIS_IP_ seemed unused in lockdep. Drive by cleanup. -Wunused-parameter: kernel/locking/lockdep.c:1383:22: warning: unused parameter 'ip' kernel/locking/lockdep.c:4246:48: warning: unused parameter 'ip' kernel/locking/lockdep.c:4844:19: warning: unused parameter 'ip' Signed-off-by: Nick Desaulniers Signed-off-by: Peter Zijlstra (Intel) Acked-by: Waiman Long Link: https://lore.kernel.org/r/20220314221909.2027027-1-ndesaulniers@google.com --- include/linux/irqflags.h | 4 ++-- include/linux/kvm_host.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 4b140938b03e..5ec0fa71399e 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -20,13 +20,13 @@ #ifdef CONFIG_PROVE_LOCKING extern void lockdep_softirqs_on(unsigned long ip); extern void lockdep_softirqs_off(unsigned long ip); - extern void lockdep_hardirqs_on_prepare(unsigned long ip); + extern void lockdep_hardirqs_on_prepare(void); extern void lockdep_hardirqs_on(unsigned long ip); extern void lockdep_hardirqs_off(unsigned long ip); #else static inline void lockdep_softirqs_on(unsigned long ip) { } static inline void lockdep_softirqs_off(unsigned long ip) { } - static inline void lockdep_hardirqs_on_prepare(unsigned long ip) { } + static inline void lockdep_hardirqs_on_prepare(void) { } static inline void lockdep_hardirqs_on(unsigned long ip) { } static inline void lockdep_hardirqs_off(unsigned long ip) { } #endif diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 3f9b22c4983a..e1b2413750c0 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -450,7 +450,7 @@ static __always_inline void guest_state_enter_irqoff(void) { instrumentation_begin(); trace_hardirqs_on_prepare(); - lockdep_hardirqs_on_prepare(CALLER_ADDR0); + lockdep_hardirqs_on_prepare(); instrumentation_end(); guest_context_enter_irqoff(); -- cgit From bfe4daf850f45d92dcd3da477f0b0456620294c3 Mon Sep 17 00:00:00 2001 From: Stephane Eranian Date: Tue, 22 Mar 2022 15:15:05 -0700 Subject: perf/core: Add perf_clear_branch_entry_bitfields() helper Make it simpler to reset all the info fields on the perf_branch_entry by adding a helper inline function. The goal is to centralize the initialization to avoid missing a field in case more are added. Signed-off-by: Stephane Eranian Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220322221517.2510440-2-eranian@google.com --- include/linux/perf_event.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index af97dd427501..a411080d5169 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1063,6 +1063,22 @@ static inline void perf_sample_data_init(struct perf_sample_data *data, data->txn = 0; } +/* + * Clear all bitfields in the perf_branch_entry. + * The to and from fields are not cleared because they are + * systematically modified by caller. + */ +static inline void perf_clear_branch_entry_bitfields(struct perf_branch_entry *br) +{ + br->mispred = 0; + br->predicted = 0; + br->in_tx = 0; + br->abort = 0; + br->cycles = 0; + br->type = 0; + br->reserved = 0; +} + extern void perf_output_sample(struct perf_output_handle *handle, struct perf_event_header *header, struct perf_sample_data *data, -- cgit From 2a606a18cd672a16343d146a126721b34cc6adbd Mon Sep 17 00:00:00 2001 From: Stephane Eranian Date: Tue, 22 Mar 2022 15:15:12 -0700 Subject: ACPI: Add perf low power callback Add an optional callback needed by some PMU features, e.g., AMD BRS, to give a chance to the perf_events code to change its state before a CPU goes to low power and after it comes back. The callback is void when the PERF_NEEDS_LOPWR_CB flag is not set. This flag must be set in arch specific perf_event.h header whenever needed. When not set, there is no impact on the ACPI code. Signed-off-by: Stephane Eranian [peterz: build fix] Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220322221517.2510440-9-eranian@google.com --- include/linux/perf_event.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index a411080d5169..da759560eec5 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1676,4 +1676,10 @@ typedef int (perf_snapshot_branch_stack_t)(struct perf_branch_entry *entries, unsigned int cnt); DECLARE_STATIC_CALL(perf_snapshot_branch_stack, perf_snapshot_branch_stack_t); +#ifndef PERF_NEEDS_LOPWR_CB +static inline void perf_lopwr_cb(bool mode) +{ +} +#endif + #endif /* _LINUX_PERF_EVENT_H */ -- cgit From cfe43f478b79ba45573ca22d52d0d8823be068fa Mon Sep 17 00:00:00 2001 From: Valentin Schneider Date: Fri, 12 Nov 2021 18:52:01 +0000 Subject: preempt/dynamic: Introduce preemption model accessors CONFIG_PREEMPT{_NONE, _VOLUNTARY} designate either: o The build-time preemption model when !PREEMPT_DYNAMIC o The default boot-time preemption model when PREEMPT_DYNAMIC IOW, using those on PREEMPT_DYNAMIC kernels is meaningless - the actual model could have been set to something else by the "preempt=foo" cmdline parameter. Same problem applies to CONFIG_PREEMPTION. Introduce a set of helpers to determine the actual preemption model used by the live kernel. Suggested-by: Marco Elver Signed-off-by: Valentin Schneider Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Marco Elver Acked-by: Frederic Weisbecker Link: https://lore.kernel.org/r/20211112185203.280040-3-valentin.schneider@arm.com --- include/linux/sched.h | 41 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index d5e3c00b74e1..67f06f72c50e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2117,6 +2117,47 @@ static inline void cond_resched_rcu(void) #endif } +#ifdef CONFIG_PREEMPT_DYNAMIC + +extern bool preempt_model_none(void); +extern bool preempt_model_voluntary(void); +extern bool preempt_model_full(void); + +#else + +static inline bool preempt_model_none(void) +{ + return IS_ENABLED(CONFIG_PREEMPT_NONE); +} +static inline bool preempt_model_voluntary(void) +{ + return IS_ENABLED(CONFIG_PREEMPT_VOLUNTARY); +} +static inline bool preempt_model_full(void) +{ + return IS_ENABLED(CONFIG_PREEMPT); +} + +#endif + +static inline bool preempt_model_rt(void) +{ + return IS_ENABLED(CONFIG_PREEMPT_RT); +} + +/* + * Does the preemption model allow non-cooperative preemption? + * + * For !CONFIG_PREEMPT_DYNAMIC kernels this is an exact match with + * CONFIG_PREEMPTION; for CONFIG_PREEMPT_DYNAMIC this doesn't work as the + * kernel is *built* with CONFIG_PREEMPTION=y but may run with e.g. the + * PREEMPT_NONE model. + */ +static inline bool preempt_model_preemptible(void) +{ + return preempt_model_full() || preempt_model_rt(); +} + /* * Does a critical section need to be broken due to another * task waiting?: (technically does not depend on CONFIG_PREEMPTION, -- cgit From 8c756a0a2de17f1535ef885ac7e556e016735eb2 Mon Sep 17 00:00:00 2001 From: Sakari Ailus Date: Thu, 31 Mar 2022 15:54:47 +0300 Subject: device property: Convert device_{dma_supported,get_dma_attr} to fwnode Make the device_dma_supported and device_get_dma_attr functions to use the fwnode ops, and move the implementation to ACPI and OF frameworks. Signed-off-by: Sakari Ailus Acked-by: Rob Herring Reviewed-by: Andy Shevchenko Reviewed-by: Heikki Krogerus Signed-off-by: Rafael J. Wysocki --- include/linux/fwnode.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h index 3a532ba66f6c..6f307f21fc65 100644 --- a/include/linux/fwnode.h +++ b/include/linux/fwnode.h @@ -113,6 +113,9 @@ struct fwnode_operations { bool (*device_is_available)(const struct fwnode_handle *fwnode); const void *(*device_get_match_data)(const struct fwnode_handle *fwnode, const struct device *dev); + bool (*device_dma_supported)(const struct fwnode_handle *fwnode); + enum dev_dma_attr + (*device_get_dma_attr)(const struct fwnode_handle *fwnode); bool (*property_present)(const struct fwnode_handle *fwnode, const char *propname); int (*property_read_int_array)(const struct fwnode_handle *fwnode, -- cgit From 68b979d068d3d0dceb14c446f664433d96f20a7e Mon Sep 17 00:00:00 2001 From: Sakari Ailus Date: Thu, 31 Mar 2022 15:54:49 +0300 Subject: device property: Add iomap to fwnode operations Add iomap() fwnode operation to implement fwnode_iomap() through fwnode operations, moving the code in fwnode_iomap() to OF framework. Note that the IS_ENABLED(CONFIG_OF_ADDRESS) && is_of_node(fwnode) check is needed for Sparc that has its own implementation of of_iomap anyway. Let the pre-compiler to handle that check. Signed-off-by: Sakari Ailus Reviewed-by: Andy Shevchenko Reviewed-by: Heikki Krogerus Signed-off-by: Rafael J. Wysocki --- include/linux/fwnode.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h index 6f307f21fc65..ebbc3bf03f95 100644 --- a/include/linux/fwnode.h +++ b/include/linux/fwnode.h @@ -148,6 +148,7 @@ struct fwnode_operations { (*graph_get_port_parent)(struct fwnode_handle *fwnode); int (*graph_parse_endpoint)(const struct fwnode_handle *fwnode, struct fwnode_endpoint *endpoint); + void __iomem *(*iomap)(struct fwnode_handle *fwnode, int index); int (*add_links)(struct fwnode_handle *fwnode); }; -- cgit From 99c63707bafd15bcf97fbd6bef1c92d5bfa01d28 Mon Sep 17 00:00:00 2001 From: Sakari Ailus Date: Thu, 31 Mar 2022 15:54:50 +0300 Subject: device property: Add irq_get to fwnode operation Add irq_get() fwnode operation to implement fwnode_irq_get() through fwnode operations, moving the code in fwnode_irq_get() to OF and ACPI frameworks. Signed-off-by: Sakari Ailus Acked-by: Rob Herring Reviewed-by: Andy Shevchenko Reviewed-by: Heikki Krogerus Signed-off-by: Rafael J. Wysocki --- include/linux/fwnode.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h index ebbc3bf03f95..6ab69871b06d 100644 --- a/include/linux/fwnode.h +++ b/include/linux/fwnode.h @@ -149,6 +149,7 @@ struct fwnode_operations { int (*graph_parse_endpoint)(const struct fwnode_handle *fwnode, struct fwnode_endpoint *endpoint); void __iomem *(*iomap)(struct fwnode_handle *fwnode, int index); + int (*irq_get)(const struct fwnode_handle *fwnode, unsigned int index); int (*add_links)(struct fwnode_handle *fwnode); }; -- cgit From cdb4f26a63c391317e335e6e683a614358e70aeb Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Thu, 6 Jan 2022 14:31:51 +0100 Subject: kobject: kobj_type: remove default_attrs Now that all in-kernel users of default_attrs for the kobj_type are gone and converted to properly use the default_groups pointer instead, it can be safely removed. There is one standard way to create sysfs files in a kobj_type, and not two like before, causing confusion as to which should be used. Cc: "Rafael J. Wysocki" Link: https://lore.kernel.org/r/20220106133151.607703-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman --- include/linux/kobject.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kobject.h b/include/linux/kobject.h index c7b47399b36a..57fb972fea05 100644 --- a/include/linux/kobject.h +++ b/include/linux/kobject.h @@ -120,7 +120,6 @@ extern char *kobject_get_path(struct kobject *kobj, gfp_t flag); struct kobj_type { void (*release)(struct kobject *kobj); const struct sysfs_ops *sysfs_ops; - struct attribute **default_attrs; /* use default_groups instead */ const struct attribute_group **default_groups; const struct kobj_ns_type_operations *(*child_ns_type)(struct kobject *kobj); const void *(*namespace)(struct kobject *kobj); -- cgit From 1be9473e31ab87ad1b6ecf9fd11df461930ede85 Mon Sep 17 00:00:00 2001 From: Aaron Tomlin Date: Tue, 22 Mar 2022 14:03:34 +0000 Subject: module: Move livepatch support to a separate file No functional change. This patch migrates livepatch support (i.e. used during module add/or load and remove/or deletion) from core module code into kernel/module/livepatch.c. At the moment it contains code to persist Elf information about a given livepatch module, only. The new file was added to MAINTAINERS. Reviewed-by: Petr Mladek Tested-by: Petr Mladek Signed-off-by: Aaron Tomlin Signed-off-by: Luis Chamberlain --- include/linux/module.h | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/module.h b/include/linux/module.h index 1e135fd5c076..7ec9715de7dc 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -663,17 +663,14 @@ static inline bool module_requested_async_probing(struct module *module) return module && module->async_probe_requested; } -#ifdef CONFIG_LIVEPATCH static inline bool is_livepatch_module(struct module *mod) { +#ifdef CONFIG_LIVEPATCH return mod->klp; -} -#else /* !CONFIG_LIVEPATCH */ -static inline bool is_livepatch_module(struct module *mod) -{ +#else return false; +#endif } -#endif /* CONFIG_LIVEPATCH */ bool is_module_sig_enforced(void); void set_module_sig_enforced(void); -- cgit From 0c1e42805c25c87eb7a6f3b18bdbf3b3b7840aff Mon Sep 17 00:00:00 2001 From: Aaron Tomlin Date: Tue, 22 Mar 2022 14:03:37 +0000 Subject: module: Move extra signature support out of core code No functional change. This patch migrates additional module signature check code from core module code into kernel/module/signing.c. Reviewed-by: Christophe Leroy Signed-off-by: Aaron Tomlin Signed-off-by: Luis Chamberlain --- include/linux/module.h | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/module.h b/include/linux/module.h index 7ec9715de7dc..5e2059f3afc7 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -672,7 +672,6 @@ static inline bool is_livepatch_module(struct module *mod) #endif } -bool is_module_sig_enforced(void); void set_module_sig_enforced(void); #else /* !CONFIG_MODULES... */ @@ -799,10 +798,6 @@ static inline bool module_requested_async_probing(struct module *module) return false; } -static inline bool is_module_sig_enforced(void) -{ - return false; -} static inline void set_module_sig_enforced(void) { @@ -854,11 +849,18 @@ static inline bool retpoline_module_ok(bool has_retpoline) #endif #ifdef CONFIG_MODULE_SIG +bool is_module_sig_enforced(void); + static inline bool module_sig_ok(struct module *module) { return module->sig_ok; } #else /* !CONFIG_MODULE_SIG */ +static inline bool is_module_sig_enforced(void) +{ + return false; +} + static inline bool module_sig_ok(struct module *module) { return true; -- cgit From f64205a42046d3802c423fa2059e7fca39af127c Mon Sep 17 00:00:00 2001 From: Aaron Tomlin Date: Tue, 22 Mar 2022 14:03:43 +0000 Subject: module: Move kdb module related code out of main kdb code No functional change. This patch migrates the kdb 'lsmod' command support out of main kdb code into its own file under kernel/module. In addition to the above, a minor style warning i.e. missing a blank line after declarations, was resolved too. The new file was added to MAINTAINERS. Finally we remove linux/module.h as it is entirely redundant. Reviewed-by: Daniel Thompson Acked-by: Daniel Thompson Signed-off-by: Aaron Tomlin Signed-off-by: Luis Chamberlain --- include/linux/kdb.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/kdb.h b/include/linux/kdb.h index ea0f5e580fac..07dfb6a20a1c 100644 --- a/include/linux/kdb.h +++ b/include/linux/kdb.h @@ -222,5 +222,6 @@ enum { extern int kdbgetintenv(const char *, int *); extern int kdb_set(int, const char **); +int kdb_lsmod(int argc, const char **argv); #endif /* !_KDB_H */ -- cgit From 01dc0386efb769056257410ba5754558384006a7 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Wed, 23 Feb 2022 13:02:14 +0100 Subject: module: Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC Add CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC to allow architectures to request having modules data in vmalloc area instead of module area. This is required on powerpc book3s/32 in order to set data non executable, because it is not possible to set executability on page basis, this is done per 256 Mbytes segments. The module area has exec right, vmalloc area has noexec. This can also be useful on other powerpc/32 in order to maximize the chance of code being close enough to kernel core to avoid branch trampolines. Cc: Jason Wessel Acked-by: Daniel Thompson Cc: Douglas Anderson Signed-off-by: Christophe Leroy [mcgrof: rebased in light of kernel/module/kdb.c move] Signed-off-by: Luis Chamberlain --- include/linux/module.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/module.h b/include/linux/module.h index 5e2059f3afc7..46d4d5f2516e 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -422,6 +422,9 @@ struct module { /* Core layout: rbtree is accessed frequently, so keep together. */ struct module_layout core_layout __module_layout_align; struct module_layout init_layout; +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC + struct module_layout data_layout; +#endif /* Arch-specific module values */ struct mod_arch_specific arch; @@ -569,6 +572,11 @@ bool is_module_text_address(unsigned long addr); static inline bool within_module_core(unsigned long addr, const struct module *mod) { +#ifdef CONFIG_ARCH_WANTS_MODULES_DATA_IN_VMALLOC + if ((unsigned long)mod->data_layout.base <= addr && + addr < (unsigned long)mod->data_layout.base + mod->data_layout.size) + return true; +#endif return (unsigned long)mod->core_layout.base <= addr && addr < (unsigned long)mod->core_layout.base + mod->core_layout.size; } -- cgit From baa914cd81f51f4e4f3bae5bb59764b32ad8c353 Mon Sep 17 00:00:00 2001 From: Takashi Sakamoto Date: Tue, 5 Apr 2022 16:22:20 +0900 Subject: firewire: add kernel API to access CYCLE_TIME register 1394 OHCI specification defined Isochronous Cycle Timer Register to get value of CYCLE_TIME register defined by IEEE 1394 for CSR architecture defined by ISO/IEC 13213. Unit driver can calculate packet time by compute with the value of CYCLE_TIME and timeStamp field in descriptor of each isochronous and asynchronous context. The resolution of CYCLE_TIME is 49.576 MHz, while the one of timeStamp is 8,000 Hz. Current implementation of Linux FireWire subsystem allows the driver to get the value of CYCLE_TIMER CSR register by transaction service. The transaction service has overhead in regard of access to MMIO register. This commit adds kernel API for unit driver to access the register directly. Signed-off-by: Takashi Sakamoto Link: https://lore.kernel.org/r/20220405072221.226217-3-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai --- include/linux/firewire.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firewire.h b/include/linux/firewire.h index 07967a450eaa..2f467c52bdec 100644 --- a/include/linux/firewire.h +++ b/include/linux/firewire.h @@ -150,6 +150,8 @@ static inline void fw_card_put(struct fw_card *card) kref_put(&card->kref, fw_card_release); } +int fw_card_read_cycle_time(struct fw_card *card, u32 *cycle_time); + struct fw_attribute_group { struct attribute_group *groups[2]; struct attribute_group group; -- cgit From b2405aa948b95afc5246fa56fc05c3512cd6185c Mon Sep 17 00:00:00 2001 From: Takashi Sakamoto Date: Tue, 5 Apr 2022 16:22:21 +0900 Subject: firewire: add kernel API to access packet structure in request structure for AR context In 1394 OHCI specification, descriptor of Asynchronous Receive DMA context has timeStamp field in its trailer quadlet. The field is written by the host controller for the time to receive asynchronous request subaction in isochronous cycle time. In Linux FireWire subsystem, the value of field is stored to fw_packet structure and copied to fw_request structure as the part. The fw_request structure is hidden from unit driver and passed as opaque pointer when calling registered handler. It's inconvenient to the unit driver which needs timestamp of packet. This commit adds kernel API to pick up timestamp from opaque pointer to fw_request structure. Signed-off-by: Takashi Sakamoto Link: https://lore.kernel.org/r/20220405072221.226217-4-o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai --- include/linux/firewire.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/firewire.h b/include/linux/firewire.h index 2f467c52bdec..980019053e54 100644 --- a/include/linux/firewire.h +++ b/include/linux/firewire.h @@ -354,6 +354,7 @@ void fw_core_remove_address_handler(struct fw_address_handler *handler); void fw_send_response(struct fw_card *card, struct fw_request *request, int rcode); int fw_get_request_speed(struct fw_request *request); +u32 fw_request_get_timestamp(const struct fw_request *request); void fw_send_request(struct fw_card *card, struct fw_transaction *t, int tcode, int destination_id, int generation, int speed, unsigned long long offset, void *payload, size_t length, -- cgit From a8e2512efc65892a1cbf608d9c03c8bcbe5a623a Mon Sep 17 00:00:00 2001 From: Jonathan Cameron Date: Fri, 1 Apr 2022 15:06:04 +0100 Subject: PM: core: Add NS varients of EXPORT[_GPL]_SIMPLE_DEV_PM_OPS and runtime pm equiv As more drivers start to use namespaces, we need to have varients of these useful macros that allow the export to be in a particular namespace. Signed-off-by: Jonathan Cameron Cc: Paul Cercueil Signed-off-by: Rafael J. Wysocki --- include/linux/pm.h | 14 +++++++++----- include/linux/pm_runtime.h | 10 ++++++++-- 2 files changed, 17 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm.h b/include/linux/pm.h index e65b3ab28377..ffe941958501 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -368,13 +368,13 @@ const struct dev_pm_ops name = { \ #ifdef CONFIG_PM #define _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, runtime_suspend_fn, \ - runtime_resume_fn, idle_fn, sec) \ + runtime_resume_fn, idle_fn, sec, ns) \ _DEFINE_DEV_PM_OPS(name, suspend_fn, resume_fn, runtime_suspend_fn, \ runtime_resume_fn, idle_fn); \ - _EXPORT_SYMBOL(name, sec) + __EXPORT_SYMBOL(name, sec, ns) #else #define _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, runtime_suspend_fn, \ - runtime_resume_fn, idle_fn, sec) \ + runtime_resume_fn, idle_fn, sec, ns) \ static __maybe_unused _DEFINE_DEV_PM_OPS(__static_##name, suspend_fn, \ resume_fn, runtime_suspend_fn, \ runtime_resume_fn, idle_fn) @@ -391,9 +391,13 @@ static __maybe_unused _DEFINE_DEV_PM_OPS(__static_##name, suspend_fn, \ _DEFINE_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL) #define EXPORT_SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \ - _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL, "") + _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL, "", "") #define EXPORT_GPL_SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \ - _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL, "_gpl") + _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL, "_gpl", "") +#define EXPORT_NS_SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn, ns) \ + _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL, "", #ns) +#define EXPORT_NS_GPL_SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn, ns) \ + _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL, "_gpl", #ns) /* Deprecated. Use DEFINE_SIMPLE_DEV_PM_OPS() instead. */ #define SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \ diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h index 2bff6a10095d..9e4d056967c6 100644 --- a/include/linux/pm_runtime.h +++ b/include/linux/pm_runtime.h @@ -41,10 +41,16 @@ #define EXPORT_RUNTIME_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn) \ _EXPORT_DEV_PM_OPS(name, pm_runtime_force_suspend, pm_runtime_force_resume, \ - suspend_fn, resume_fn, idle_fn, "") + suspend_fn, resume_fn, idle_fn, "", "") #define EXPORT_GPL_RUNTIME_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn) \ _EXPORT_DEV_PM_OPS(name, pm_runtime_force_suspend, pm_runtime_force_resume, \ - suspend_fn, resume_fn, idle_fn, "_gpl") + suspend_fn, resume_fn, idle_fn, "_gpl", "") +#define EXPORT_NS_RUNTIME_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn, ns) \ + _EXPORT_DEV_PM_OPS(name, pm_runtime_force_suspend, pm_runtime_force_resume, \ + suspend_fn, resume_fn, idle_fn, "", #ns) +#define EXPORT_NS_GPL_RUNTIME_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn, ns) \ + _EXPORT_DEV_PM_OPS(name, pm_runtime_force_suspend, pm_runtime_force_resume, \ + suspend_fn, resume_fn, idle_fn, "_gpl", #ns) #ifdef CONFIG_PM extern struct workqueue_struct *pm_wq; -- cgit From 0b5c21bbc01e92745ca1ca4f6fd87d878fa3ea5e Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Mon, 4 Apr 2022 11:38:47 +0200 Subject: net: ensure net_todo_list is processed quickly In [1], Will raised a potential issue that the cfg80211 code, which does (from a locking perspective) rtnl_lock() wiphy_lock() rtnl_unlock() might be suspectible to ABBA deadlocks, because rtnl_unlock() calls netdev_run_todo(), which might end up calling rtnl_lock() again, which could then deadlock (see the comment in the code added here for the scenario). Some back and forth and thinking ensued, but clearly this can't happen if the net_todo_list is empty at the rtnl_unlock() here. Clearly, the code here cannot actually put an entry on it, and all other users of rtnl_unlock() will empty it since that will always go through netdev_run_todo(), emptying the list. So the only other way to get there would be to add to the list and then unlock the RTNL without going through rtnl_unlock(), which is only possible through __rtnl_unlock(). However, this isn't exported and not used in many places, and none of them seem to be able to unregister before using it. Therefore, add a WARN_ON() in the code to ensure this invariant won't be broken, so that the cfg80211 (or any similar) code stays safe. [1] https://lore.kernel.org/r/Yjzpo3TfZxtKPMAG@google.com Signed-off-by: Johannes Berg Link: https://lore.kernel.org/r/20220404113847.0ee02e4a70da.Ic73d206e217db20fd22dcec14fe5442ca732804b@changeid Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 59e27a2b7bf0..b6a1e7f643da 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3894,7 +3894,8 @@ void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev); extern int netdev_budget; extern unsigned int netdev_budget_usecs; -/* Called by rtnetlink.c:rtnl_unlock() */ +/* Used by rtnetlink.c:__rtnl_unlock()/rtnl_unlock() */ +extern struct list_head net_todo_list; void netdev_run_todo(void); static inline void __dev_put(struct net_device *dev) -- cgit From 40379a0084c2f65eb62c102f5bbf5cdc14a50410 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Mon, 4 Apr 2022 15:08:15 +0300 Subject: net/mlx5_fpga: Drop INNOVA TLS support Mellanox INNOVA TLS cards are EOL in May, 2018 [1]. As such, the code is unmaintained, untested and not in-use by any upstream/distro oriented customers. In order to reduce code complexity, drop the kernel code. [1] https://network.nvidia.com/related-docs/eol/LCR-000286.pdf Link: https://lore.kernel.org/r/b88add368def721ea9d054cb69def72d9e3f67aa.1649073691.git.leonro@nvidia.com Reviewed-by: Tariq Toukan Reviewed-by: Saeed Mahameed Signed-off-by: Leon Romanovsky --- include/linux/mlx5/mlx5_ifc_fpga.h | 63 -------------------------------------- 1 file changed, 63 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc_fpga.h b/include/linux/mlx5/mlx5_ifc_fpga.h index 07d77323f78a..e3d824f6a309 100644 --- a/include/linux/mlx5/mlx5_ifc_fpga.h +++ b/include/linux/mlx5/mlx5_ifc_fpga.h @@ -54,7 +54,6 @@ enum { enum { MLX5_FPGA_CAP_SANDBOX_PRODUCT_ID_IPSEC = 0x2, - MLX5_FPGA_CAP_SANDBOX_PRODUCT_ID_TLS = 0x3, }; struct mlx5_ifc_fpga_shell_caps_bits { @@ -387,27 +386,6 @@ struct mlx5_ifc_fpga_destroy_qp_out_bits { u8 reserved_at_40[0x40]; }; -struct mlx5_ifc_tls_extended_cap_bits { - u8 aes_gcm_128[0x1]; - u8 aes_gcm_256[0x1]; - u8 reserved_at_2[0x1e]; - u8 reserved_at_20[0x20]; - u8 context_capacity_total[0x20]; - u8 context_capacity_rx[0x20]; - u8 context_capacity_tx[0x20]; - u8 reserved_at_a0[0x10]; - u8 tls_counter_size[0x10]; - u8 tls_counters_addr_low[0x20]; - u8 tls_counters_addr_high[0x20]; - u8 rx[0x1]; - u8 tx[0x1]; - u8 tls_v12[0x1]; - u8 tls_v13[0x1]; - u8 lro[0x1]; - u8 ipv6[0x1]; - u8 reserved_at_106[0x1a]; -}; - struct mlx5_ifc_ipsec_extended_cap_bits { u8 encapsulation[0x20]; @@ -572,45 +550,4 @@ struct mlx5_ifc_fpga_ipsec_sa { __be16 vid; /* only 12 bits, rest is reserved */ __be16 reserved2; } __packed; - -enum fpga_tls_cmds { - CMD_SETUP_STREAM = 0x1001, - CMD_TEARDOWN_STREAM = 0x1002, - CMD_RESYNC_RX = 0x1003, -}; - -#define MLX5_TLS_1_2 (0) - -#define MLX5_TLS_ALG_AES_GCM_128 (0) -#define MLX5_TLS_ALG_AES_GCM_256 (1) - -struct mlx5_ifc_tls_cmd_bits { - u8 command_type[0x20]; - u8 ipv6[0x1]; - u8 direction_sx[0x1]; - u8 tls_version[0x2]; - u8 reserved[0x1c]; - u8 swid[0x20]; - u8 src_port[0x10]; - u8 dst_port[0x10]; - union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits src_ipv4_src_ipv6; - union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits dst_ipv4_dst_ipv6; - u8 tls_rcd_sn[0x40]; - u8 tcp_sn[0x20]; - u8 tls_implicit_iv[0x20]; - u8 tls_xor_iv[0x40]; - u8 encryption_key[0x100]; - u8 alg[4]; - u8 reserved2[0x1c]; - u8 reserved3[0x4a0]; -}; - -struct mlx5_ifc_tls_resp_bits { - u8 syndrome[0x20]; - u8 stream_id[0x20]; - u8 reserved[0x40]; -}; - -#define MLX5_TLS_COMMAND_SIZE (0x100) - #endif /* MLX5_IFC_FPGA_H */ -- cgit From 0276bd3a94c072de3f69b5afe6224e488cc76635 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Tue, 5 Apr 2022 17:15:16 +0200 Subject: IB/mlx5: Fix undefined behavior due to shift overflowing the constant MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix: drivers/infiniband/hw/mlx5/main.c: In function ‘translate_eth_legacy_proto_oper’: drivers/infiniband/hw/mlx5/main.c:370:2: error: case label does not reduce to an integer constant case MLX5E_PROT_MASK(MLX5E_50GBASE_KR2): ^~~~ See https://lore.kernel.org/r/YkwQ6%2BtIH8GQpuct@zn.tnic for the gory details as to why it triggers with older gccs only. Link: https://lore.kernel.org/all/20220405151517.29753-11-bp@alien8.de Signed-off-by: Borislav Petkov Cc: Leon Romanovsky Cc: Saeed Mahameed Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Leon Romanovsky --- include/linux/mlx5/port.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/port.h b/include/linux/mlx5/port.h index 28a928b0684b..e96ee1e348cb 100644 --- a/include/linux/mlx5/port.h +++ b/include/linux/mlx5/port.h @@ -141,7 +141,7 @@ enum mlx5_ptys_width { MLX5_PTYS_WIDTH_12X = 1 << 4, }; -#define MLX5E_PROT_MASK(link_mode) (1 << link_mode) +#define MLX5E_PROT_MASK(link_mode) (1U << link_mode) #define MLX5_GET_ETH_PROTO(reg, out, ext, field) \ (ext ? MLX5_GET(reg, out, ext_##field) : \ MLX5_GET(reg, out, field)) -- cgit From a285909f471d6703a04b2b3942c352e27131c92b Mon Sep 17 00:00:00 2001 From: Hyeonggon Yoo <42.hyeyoo@gmail.com> Date: Wed, 6 Apr 2022 15:00:03 +0900 Subject: mm/slub, kunit: Make slub_kunit unaffected by user specified flags slub_kunit does not expect other debugging flags to be set when running tests. When SLAB_RED_ZONE flag is set globally, test fails because the flag affects number of errors reported. To make slub_kunit unaffected by user specified debugging flags, introduce SLAB_NO_USER_FLAGS to ignore them. With this flag, only flags specified in the code are used and others are ignored. Suggested-by: Vlastimil Babka Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka Link: https://lore.kernel.org/r/Yk0sY9yoJhFEXWOg@hyeyoo --- include/linux/slab.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 373b3ef99f4e..11ceddcae9f4 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -112,6 +112,13 @@ #define SLAB_KASAN 0 #endif +/* + * Ignore user specified debugging flags. + * Intended for caches created for self-tests so they have only flags + * specified in the code and other flags are ignored. + */ +#define SLAB_NO_USER_FLAGS ((slab_flags_t __force)0x10000000U) + /* The following flags affect the page allocator grouping pages by mobility */ /* Objects are reclaimable */ #define SLAB_RECLAIM_ACCOUNT ((slab_flags_t __force)0x00020000U) -- cgit From a5f1783be29adae15666fd803efd7d2979130869 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Wed, 2 Mar 2022 12:02:22 +0100 Subject: lib/stackdepot: allow requesting early initialization dynamically In a later patch we want to add stackdepot support for object owner tracking in slub caches, which is enabled by slub_debug boot parameter. This creates a bootstrap problem as some caches are created early in boot when slab_is_available() is false and thus stack_depot_init() tries to use memblock. But, as reported by Hyeonggon Yoo [1] we are already beyond memblock_free_all(). Ideally memblock allocation should fail, yet it succeeds, but later the system crashes, which is a separately handled issue. To resolve this boostrap issue in a robust way, this patch adds another way to request stack_depot_early_init(), which happens at a well-defined point of time. In addition to build-time CONFIG_STACKDEPOT_ALWAYS_INIT, code that's e.g. processing boot parameters (which happens early enough) can call a new function stack_depot_want_early_init(), which sets a flag that stack_depot_early_init() will check. In this patch we also convert page_owner to this approach. While it doesn't have the bootstrap issue as slub, it's also a functionality enabled by a boot param and can thus request stack_depot_early_init() with memblock allocation instead of later initialization with kvmalloc(). As suggested by Mike, make stack_depot_early_init() only attempt memblock allocation and stack_depot_init() only attempt kvmalloc(). Also change the latter to kvcalloc(). In both cases we can lose the explicit array zeroing, which the allocations do already. As suggested by Marco, provide empty implementations of the init functions for !CONFIG_STACKDEPOT builds to simplify the callers. [1] https://lore.kernel.org/all/YhnUcqyeMgCrWZbd@ip-172-31-19-208.ap-northeast-1.compute.internal/ Reported-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Suggested-by: Mike Rapoport Suggested-by: Marco Elver Signed-off-by: Vlastimil Babka Reviewed-by: Marco Elver Reviewed-and-tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Mike Rapoport Acked-by: David Rientjes --- include/linux/stackdepot.h | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h index 17f992fe6355..bc2797955de9 100644 --- a/include/linux/stackdepot.h +++ b/include/linux/stackdepot.h @@ -20,18 +20,36 @@ depot_stack_handle_t __stack_depot_save(unsigned long *entries, gfp_t gfp_flags, bool can_alloc); /* - * Every user of stack depot has to call this during its own init when it's - * decided that it will be calling stack_depot_save() later. + * Every user of stack depot has to call stack_depot_init() during its own init + * when it's decided that it will be calling stack_depot_save() later. This is + * recommended for e.g. modules initialized later in the boot process, when + * slab_is_available() is true. * * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot * enabled as part of mm_init(), for subsystems where it's known at compile time * that stack depot will be used. + * + * Another alternative is to call stack_depot_want_early_init(), when the + * decision to use stack depot is taken e.g. when evaluating kernel boot + * parameters, which precedes the enablement point in mm_init(). + * + * stack_depot_init() and stack_depot_want_early_init() can be called regardless + * of CONFIG_STACKDEPOT and are no-op when disabled. The actual save/fetch/print + * functions should only be called from code that makes sure CONFIG_STACKDEPOT + * is enabled. */ +#ifdef CONFIG_STACKDEPOT int stack_depot_init(void); -#ifdef CONFIG_STACKDEPOT_ALWAYS_INIT -static inline int stack_depot_early_init(void) { return stack_depot_init(); } +void __init stack_depot_want_early_init(void); + +/* This is supposed to be called only from mm_init() */ +int __init stack_depot_early_init(void); #else +static inline int stack_depot_init(void) { return 0; } + +static inline void stack_depot_want_early_init(void) { } + static inline int stack_depot_early_init(void) { return 0; } #endif -- cgit From f742b90e61bb53b27771f64bdae05db03a6ab1f2 Mon Sep 17 00:00:00 2001 From: Brijesh Singh Date: Thu, 24 Feb 2022 10:55:49 -0600 Subject: x86/mm: Extend cc_attr to include AMD SEV-SNP The CC_ATTR_GUEST_SEV_SNP can be used by the guest to query whether the SNP (Secure Nested Paging) feature is active. Signed-off-by: Brijesh Singh Signed-off-by: Borislav Petkov Link: https://lore.kernel.org/r/20220307213356.2797205-10-brijesh.singh@amd.com --- include/linux/cc_platform.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h index efd8205282da..d08dd65b5c43 100644 --- a/include/linux/cc_platform.h +++ b/include/linux/cc_platform.h @@ -72,6 +72,14 @@ enum cc_attr { * Examples include TDX guest & SEV. */ CC_ATTR_GUEST_UNROLL_STRING_IO, + + /** + * @CC_ATTR_SEV_SNP: Guest SNP is active. + * + * The platform/OS is running as a guest/virtual machine and actively + * using AMD SEV-SNP features. + */ + CC_ATTR_GUEST_SEV_SNP, }; #ifdef CONFIG_ARCH_HAS_CC_PLATFORM -- cgit From f4b41f062c424209e3939a81e6da022e049a45f2 Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp Date: Mon, 4 Apr 2022 18:30:22 +0200 Subject: net: remove noblock parameter from skb_recv_datagram() skb_recv_datagram() has two parameters 'flags' and 'noblock' that are merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)' As 'flags' may contain MSG_DONTWAIT as value most callers split the 'flags' into 'flags' and 'noblock' with finally obsolete bit operations like this: skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc); And this is not even done consistently with the 'flags' parameter. This patch removes the obsolete and costly splitting into two parameters and only performs bit operations when really needed on the caller side. One missing conversion thankfully reported by kernel test robot. I missed to enable kunit tests to build the mctp code. Reported-by: kernel test robot Signed-off-by: Oliver Hartkopp Signed-off-by: David S. Miller --- include/linux/skbuff.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 3a30cae8b0a5..2394441fa3dd 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3836,8 +3836,7 @@ struct sk_buff *__skb_try_recv_datagram(struct sock *sk, struct sk_buff *__skb_recv_datagram(struct sock *sk, struct sk_buff_head *sk_queue, unsigned int flags, int *off, int *err); -struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned flags, int noblock, - int *err); +struct sk_buff *skb_recv_datagram(struct sock *sk, unsigned int flags, int *err); __poll_t datagram_poll(struct file *file, struct socket *sock, struct poll_table_struct *wait); int skb_copy_datagram_iter(const struct sk_buff *from, int offset, -- cgit From fe696ccb277d332dc4e625b5b20b988b04d16c04 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sun, 3 Apr 2022 15:53:54 -0700 Subject: gpu: host1x: Fix a kernel-doc warning Add @cache description to eliminate a kernel-doc warning. include/linux/host1x.h:104: warning: Function parameter or member 'cache' not described in 'host1x_client' Fixes: 1f39b1dfa53c ("drm/tegra: Implement buffer object cache") Signed-off-by: Randy Dunlap Cc: Thierry Reding Cc: linux-tegra@vger.kernel.org Cc: David Airlie Cc: Daniel Vetter Cc: dri-devel@lists.freedesktop.org Signed-off-by: Thierry Reding --- include/linux/host1x.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/host1x.h b/include/linux/host1x.h index e8dc5bc41f79..00278853eadf 100644 --- a/include/linux/host1x.h +++ b/include/linux/host1x.h @@ -81,6 +81,7 @@ struct host1x_client_ops { * @parent: pointer to parent structure * @usecount: reference count for this structure * @lock: mutex for mutually exclusive concurrency + * @cache: host1x buffer object cache */ struct host1x_client { struct list_head list; -- cgit From 804775dfc2885e93a0a4b35db1914c2cc25172b5 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Tue, 5 Apr 2022 21:57:47 +0200 Subject: net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED) The Wireless Ethernet Dispatch subsystem on the MT7622 SoC can be configured to intercept and handle access to the DMA queues and PCIe interrupts for a MT7615/MT7915 wireless card. It can manage the internal WDMA (Wireless DMA) controller, which allows ethernet packets to be passed from the packet switch engine (PSE) to the wireless card, bypassing the CPU entirely. This can be used to implement hardware flow offloading from ethernet to WLAN. Signed-off-by: Felix Fietkau Signed-off-by: David S. Miller --- include/linux/soc/mediatek/mtk_wed.h | 131 +++++++++++++++++++++++++++++++++++ 1 file changed, 131 insertions(+) create mode 100644 include/linux/soc/mediatek/mtk_wed.h (limited to 'include/linux') diff --git a/include/linux/soc/mediatek/mtk_wed.h b/include/linux/soc/mediatek/mtk_wed.h new file mode 100644 index 000000000000..7e00cca06709 --- /dev/null +++ b/include/linux/soc/mediatek/mtk_wed.h @@ -0,0 +1,131 @@ +#ifndef __MTK_WED_H +#define __MTK_WED_H + +#include +#include +#include +#include + +#define MTK_WED_TX_QUEUES 2 + +struct mtk_wed_hw; +struct mtk_wdma_desc; + +struct mtk_wed_ring { + struct mtk_wdma_desc *desc; + dma_addr_t desc_phys; + int size; + + u32 reg_base; + void __iomem *wpdma; +}; + +struct mtk_wed_device { +#ifdef CONFIG_NET_MEDIATEK_SOC_WED + const struct mtk_wed_ops *ops; + struct device *dev; + struct mtk_wed_hw *hw; + bool init_done, running; + int wdma_idx; + int irq; + + struct mtk_wed_ring tx_ring[MTK_WED_TX_QUEUES]; + struct mtk_wed_ring txfree_ring; + struct mtk_wed_ring tx_wdma[MTK_WED_TX_QUEUES]; + + struct { + int size; + void **pages; + struct mtk_wdma_desc *desc; + dma_addr_t desc_phys; + } buf_ring; + + /* filled by driver: */ + struct { + struct pci_dev *pci_dev; + + u32 wpdma_phys; + + u16 token_start; + unsigned int nbuf; + + u32 (*init_buf)(void *ptr, dma_addr_t phys, int token_id); + int (*offload_enable)(struct mtk_wed_device *wed); + void (*offload_disable)(struct mtk_wed_device *wed); + } wlan; +#endif +}; + +struct mtk_wed_ops { + int (*attach)(struct mtk_wed_device *dev); + int (*tx_ring_setup)(struct mtk_wed_device *dev, int ring, + void __iomem *regs); + int (*txfree_ring_setup)(struct mtk_wed_device *dev, + void __iomem *regs); + void (*detach)(struct mtk_wed_device *dev); + + void (*stop)(struct mtk_wed_device *dev); + void (*start)(struct mtk_wed_device *dev, u32 irq_mask); + void (*reset_dma)(struct mtk_wed_device *dev); + + u32 (*reg_read)(struct mtk_wed_device *dev, u32 reg); + void (*reg_write)(struct mtk_wed_device *dev, u32 reg, u32 val); + + u32 (*irq_get)(struct mtk_wed_device *dev, u32 mask); + void (*irq_set_mask)(struct mtk_wed_device *dev, u32 mask); +}; + +extern const struct mtk_wed_ops __rcu *mtk_soc_wed_ops; + +static inline int +mtk_wed_device_attach(struct mtk_wed_device *dev) +{ + int ret = -ENODEV; + +#ifdef CONFIG_NET_MEDIATEK_SOC_WED + rcu_read_lock(); + dev->ops = rcu_dereference(mtk_soc_wed_ops); + if (dev->ops) + ret = dev->ops->attach(dev); + else + rcu_read_unlock(); + + if (ret) + dev->ops = NULL; +#endif + + return ret; +} + +#ifdef CONFIG_NET_MEDIATEK_SOC_WED +#define mtk_wed_device_active(_dev) !!(_dev)->ops +#define mtk_wed_device_detach(_dev) (_dev)->ops->detach(_dev) +#define mtk_wed_device_start(_dev, _mask) (_dev)->ops->start(_dev, _mask) +#define mtk_wed_device_tx_ring_setup(_dev, _ring, _regs) \ + (_dev)->ops->tx_ring_setup(_dev, _ring, _regs) +#define mtk_wed_device_txfree_ring_setup(_dev, _regs) \ + (_dev)->ops->txfree_ring_setup(_dev, _regs) +#define mtk_wed_device_reg_read(_dev, _reg) \ + (_dev)->ops->reg_read(_dev, _reg) +#define mtk_wed_device_reg_write(_dev, _reg, _val) \ + (_dev)->ops->reg_write(_dev, _reg, _val) +#define mtk_wed_device_irq_get(_dev, _mask) \ + (_dev)->ops->irq_get(_dev, _mask) +#define mtk_wed_device_irq_set_mask(_dev, _mask) \ + (_dev)->ops->irq_set_mask(_dev, _mask) +#else +static inline bool mtk_wed_device_active(struct mtk_wed_device *dev) +{ + return false; +} +#define mtk_wed_device_detach(_dev) do {} while (0) +#define mtk_wed_device_start(_dev, _mask) do {} while (0) +#define mtk_wed_device_tx_ring_setup(_dev, _ring, _regs) -ENODEV +#define mtk_wed_device_txfree_ring_setup(_dev, _ring, _regs) -ENODEV +#define mtk_wed_device_reg_read(_dev, _reg) 0 +#define mtk_wed_device_reg_write(_dev, _reg, _val) do {} while (0) +#define mtk_wed_device_irq_get(_dev, _mask) 0 +#define mtk_wed_device_irq_set_mask(_dev, _mask) do {} while (0) +#endif + +#endif -- cgit From a333215e10cb5d3b1e0685ca117f0e9452215485 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Tue, 5 Apr 2022 21:57:48 +0200 Subject: net: ethernet: mtk_eth_soc: implement flow offloading to WED devices This allows hardware flow offloading from Ethernet to WLAN on MT7622 SoC Co-developed-by: Lorenzo Bianconi Signed-off-by: Lorenzo Bianconi Signed-off-by: Felix Fietkau Signed-off-by: David S. Miller --- include/linux/netdevice.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b6a1e7f643da..7b2a0b739684 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -862,6 +862,7 @@ enum net_device_path_type { DEV_PATH_BRIDGE, DEV_PATH_PPPOE, DEV_PATH_DSA, + DEV_PATH_MTK_WDMA, }; struct net_device_path { @@ -887,6 +888,12 @@ struct net_device_path { int port; u16 proto; } dsa; + struct { + u8 wdma_idx; + u8 queue; + u16 wcid; + u8 bss; + } mtk_wdma; }; }; -- cgit From 3e9c4584336149146fe15cb5703fc10a2ca2d2a0 Mon Sep 17 00:00:00 2001 From: Thierry Reding Date: Thu, 24 Mar 2022 11:30:25 +0100 Subject: gpu: host1x: Do not use mapping cache for job submissions Buffer mappings used in job submissions are usually small and not rapidly reused as opposed to framebuffers (which are usually large and rapidly reused, for example when page-flipping between double-buffered framebuffers). Avoid going through the mapping cache for these buffers since the cache would also lead to leaks if nobody is ever releasing the cache's last reference. For DRM/KMS these last references are dropped when the framebuffers are removed and therefore no longer needed. While at it, also add a note about the need to explicitly remove the final reference to the mapping in the cache. Reviewed-by: Jon Hunter Tested-by: Jon Hunter Signed-off-by: Thierry Reding --- include/linux/host1x.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/host1x.h b/include/linux/host1x.h index 00278853eadf..c0bf4e581fe9 100644 --- a/include/linux/host1x.h +++ b/include/linux/host1x.h @@ -31,6 +31,11 @@ u64 host1x_get_dma_mask(struct host1x *host1x); * struct host1x_bo_cache - host1x buffer object cache * @mappings: list of mappings * @lock: synchronizes accesses to the list of mappings + * + * Note that entries are not periodically evicted from this cache and instead need to be + * explicitly released. This is used primarily for DRM/KMS where the cache's reference is + * released when the last reference to a buffer object represented by a mapping in this + * cache is dropped. */ struct host1x_bo_cache { struct list_head mappings; -- cgit From c8d4c18bfbc4ab467188dbe45cc8155759f49d9e Mon Sep 17 00:00:00 2001 From: Christian König Date: Tue, 16 Nov 2021 15:20:45 +0100 Subject: dma-buf/drivers: make reserving a shared slot mandatory v4 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Audit all the users of dma_resv_add_excl_fence() and make sure they reserve a shared slot also when only trying to add an exclusive fence. This is the next step towards handling the exclusive fence like a shared one. v2: fix missed case in amdgpu v3: and two more radeon, rename function v4: add one more case to TTM, fix i915 after rebase Signed-off-by: Christian König Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220406075132.3263-2-christian.koenig@amd.com --- include/linux/dma-resv.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index ecb697d4d861..5fa04d0fccad 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -117,7 +117,7 @@ struct dma_resv { * A new fence is added by calling dma_resv_add_shared_fence(). Since * this often needs to be done past the point of no return in command * submission it cannot fail, and therefore sufficient slots need to be - * reserved by calling dma_resv_reserve_shared(). + * reserved by calling dma_resv_reserve_fences(). * * Note that actual semantics of what an exclusive or shared fence mean * is defined by the user, for reservation objects shared across drivers @@ -413,7 +413,7 @@ static inline void dma_resv_unlock(struct dma_resv *obj) void dma_resv_init(struct dma_resv *obj); void dma_resv_fini(struct dma_resv *obj); -int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences); +int dma_resv_reserve_fences(struct dma_resv *obj, unsigned int num_fences); void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence); void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context, struct dma_fence *fence); -- cgit From 773f91b2cf3f52df0d7508fdbf60f37567cdaee4 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 1 Apr 2022 17:08:21 -0400 Subject: SUNRPC: Fix NFSD's request deferral on RDMA transports Trond Myklebust reports an NFSD crash in svc_rdma_sendto(). Further investigation shows that the crash occurred while NFSD was handling a deferred request. This patch addresses two inter-related issues that prevent request deferral from working correctly for RPC/RDMA requests: 1. Prevent the crash by ensuring that the original svc_rqst::rq_xprt_ctxt value is available when the request is revisited. Otherwise svc_rdma_sendto() does not have a Receive context available with which to construct its reply. 2. Possibly since before commit 71641d99ce03 ("svcrdma: Properly compute .len and .buflen for received RPC Calls"), svc_rdma_recvfrom() did not include the transport header in the returned xdr_buf. There should have been no need for svc_defer() and friends to save and restore that header, as of that commit. This issue is addressed in a backport-friendly way by simply having svc_rdma_recvfrom() set rq_xprt_hlen to zero unconditionally, just as svc_tcp_recvfrom() does. This enables svc_deferred_recv() to correctly reconstruct an RPC message received via RPC/RDMA. Reported-by: Trond Myklebust Link: https://lore.kernel.org/linux-nfs/82662b7190f26fb304eb0ab1bb04279072439d4e.camel@hammerspace.com/ Signed-off-by: Chuck Lever Cc: --- include/linux/sunrpc/svc.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index a5dda4987e8b..217711fc9cac 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -395,6 +395,7 @@ struct svc_deferred_req { size_t addrlen; struct sockaddr_storage daddr; /* where reply must come from */ size_t daddrlen; + void *xprt_ctxt; struct cache_deferred_req handle; size_t xprt_hlen; int argslen; -- cgit From a60707d74bd1d119cf7bcc5101cda912fc46d5e3 Mon Sep 17 00:00:00 2001 From: Zhen Ni Date: Tue, 15 Feb 2022 19:45:57 +0800 Subject: sched: Move child_runs_first sysctls to fair.c move child_runs_first sysctls to fair.c and use the new register_sysctl_init() to register the sysctl interface. Signed-off-by: Zhen Ni Signed-off-by: Luis Chamberlain --- include/linux/sched/sysctl.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index c1076b5e17fb..1d2ff3cd1728 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -14,8 +14,6 @@ extern unsigned long sysctl_hung_task_timeout_secs; enum { sysctl_hung_task_timeout_secs = 0 }; #endif -extern unsigned int sysctl_sched_child_runs_first; - enum sched_tunable_scaling { SCHED_TUNABLESCALING_NONE, SCHED_TUNABLESCALING_LOG, -- cgit From f5ef06d58be8311a9425e6a54a053ecb350952f3 Mon Sep 17 00:00:00 2001 From: Zhen Ni Date: Tue, 15 Feb 2022 19:45:58 +0800 Subject: sched: Move schedstats sysctls to core.c move schedstats sysctls to core.c and use the new register_sysctl_init() to register the sysctl interface. Signed-off-by: Zhen Ni Signed-off-by: Luis Chamberlain --- include/linux/sched/sysctl.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index 1d2ff3cd1728..6c7a6850559b 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -64,8 +64,6 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos); int sysctl_numa_balancing(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos); -int sysctl_schedstats(struct ctl_table *table, int write, void *buffer, - size_t *lenp, loff_t *ppos); #if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) extern unsigned int sysctl_sched_energy_aware; -- cgit From d9ab0e63fa7f8405fbb19e28c5191e0880a7f2db Mon Sep 17 00:00:00 2001 From: Zhen Ni Date: Tue, 15 Feb 2022 19:45:59 +0800 Subject: sched: Move rt_period/runtime sysctls to rt.c move rt_period/runtime sysctls to rt.c and use the new register_sysctl_init() to register the sysctl interface. Signed-off-by: Zhen Ni Signed-off-by: Luis Chamberlain --- include/linux/sched/sysctl.h | 11 ----------- 1 file changed, 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index 6c7a6850559b..4391c1307945 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -31,15 +31,6 @@ extern int sysctl_numa_balancing_mode; #define sysctl_numa_balancing_mode 0 #endif -/* - * control realtime throttling: - * - * /proc/sys/kernel/sched_rt_period_us - * /proc/sys/kernel/sched_rt_runtime_us - */ -extern unsigned int sysctl_sched_rt_period; -extern int sysctl_sched_rt_runtime; - extern unsigned int sysctl_sched_dl_period_max; extern unsigned int sysctl_sched_dl_period_min; @@ -58,8 +49,6 @@ extern int sched_rr_timeslice; int sched_rr_handler(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos); -int sched_rt_handler(struct ctl_table *table, int write, void *buffer, - size_t *lenp, loff_t *ppos); int sysctl_sched_uclamp_handler(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos); int sysctl_numa_balancing(struct ctl_table *table, int write, void *buffer, -- cgit From 84227c12888b1105725cd2de14705b029bcbb4b2 Mon Sep 17 00:00:00 2001 From: Zhen Ni Date: Tue, 15 Feb 2022 19:46:00 +0800 Subject: sched: Move deadline_period sysctls to deadline.c move deadline_period sysctls to deadline.c and use the new register_sysctl_init() to register the sysctl interface. Signed-off-by: Zhen Ni Signed-off-by: Luis Chamberlain --- include/linux/sched/sysctl.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index 4391c1307945..7da9b94c5e1c 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -31,9 +31,6 @@ extern int sysctl_numa_balancing_mode; #define sysctl_numa_balancing_mode 0 #endif -extern unsigned int sysctl_sched_dl_period_max; -extern unsigned int sysctl_sched_dl_period_min; - #ifdef CONFIG_UCLAMP_TASK extern unsigned int sysctl_sched_uclamp_util_min; extern unsigned int sysctl_sched_uclamp_util_max; -- cgit From dafd7a9dad22fadcb290b24dff54e2eae3b89776 Mon Sep 17 00:00:00 2001 From: Zhen Ni Date: Tue, 15 Feb 2022 19:46:01 +0800 Subject: sched: Move rr_timeslice sysctls to rt.c move rr_timeslice sysctls to rt.c and use the new register_sysctl_init() to register the sysctl interface. Signed-off-by: Zhen Ni Signed-off-by: Luis Chamberlain --- include/linux/sched/sysctl.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index 7da9b94c5e1c..3d307e512d1f 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -41,11 +41,6 @@ extern unsigned int sysctl_sched_uclamp_util_min_rt_default; extern unsigned int sysctl_sched_cfs_bandwidth_slice; #endif -extern int sysctl_sched_rr_timeslice; -extern int sched_rr_timeslice; - -int sched_rr_handler(struct ctl_table *table, int write, void *buffer, - size_t *lenp, loff_t *ppos); int sysctl_sched_uclamp_handler(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos); int sysctl_numa_balancing(struct ctl_table *table, int write, void *buffer, -- cgit From 3267e0156c3341ac25b37a0f60551cdae1634b60 Mon Sep 17 00:00:00 2001 From: Zhen Ni Date: Tue, 15 Feb 2022 19:46:02 +0800 Subject: sched: Move uclamp_util sysctls to core.c move uclamp_util sysctls to core.c and use the new register_sysctl_init() to register the sysctl interface. Signed-off-by: Zhen Ni Signed-off-by: Luis Chamberlain --- include/linux/sched/sysctl.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index 3d307e512d1f..0934b21a57a4 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -31,18 +31,10 @@ extern int sysctl_numa_balancing_mode; #define sysctl_numa_balancing_mode 0 #endif -#ifdef CONFIG_UCLAMP_TASK -extern unsigned int sysctl_sched_uclamp_util_min; -extern unsigned int sysctl_sched_uclamp_util_max; -extern unsigned int sysctl_sched_uclamp_util_min_rt_default; -#endif - #ifdef CONFIG_CFS_BANDWIDTH extern unsigned int sysctl_sched_cfs_bandwidth_slice; #endif -int sysctl_sched_uclamp_handler(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos); int sysctl_numa_balancing(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos); -- cgit From d4ae80ffa64f87b9c355692b680b603add084e96 Mon Sep 17 00:00:00 2001 From: Zhen Ni Date: Tue, 15 Feb 2022 19:46:03 +0800 Subject: sched: Move cfs_bandwidth_slice sysctls to fair.c move cfs_bandwidth_slice sysctls to fair.c and use the new register_sysctl_init() to register the sysctl interface. Signed-off-by: Zhen Ni Signed-off-by: Luis Chamberlain --- include/linux/sched/sysctl.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index 0934b21a57a4..198f77c8a873 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -31,10 +31,6 @@ extern int sysctl_numa_balancing_mode; #define sysctl_numa_balancing_mode 0 #endif -#ifdef CONFIG_CFS_BANDWIDTH -extern unsigned int sysctl_sched_cfs_bandwidth_slice; -#endif - int sysctl_numa_balancing(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos); -- cgit From 8a0441415b3f9b9a920a6a5086580ea3daa7b884 Mon Sep 17 00:00:00 2001 From: Zhen Ni Date: Tue, 15 Feb 2022 19:46:04 +0800 Subject: sched: Move energy_aware sysctls to topology.c move energy_aware sysctls to topology.c and use the new register_sysctl_init() to register the sysctl interface. Signed-off-by: Zhen Ni Signed-off-by: Luis Chamberlain --- include/linux/sched/sysctl.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index 198f77c8a873..e650946816d0 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -34,10 +34,4 @@ extern int sysctl_numa_balancing_mode; int sysctl_numa_balancing(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos); -#if defined(CONFIG_ENERGY_MODEL) && defined(CONFIG_CPU_FREQ_GOV_SCHEDUTIL) -extern unsigned int sysctl_sched_energy_aware; -int sched_energy_aware_handler(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos); -#endif - #endif /* _LINUX_SCHED_SYSCTL_H */ -- cgit From 06d177662fb86b80c7fc2290667b9a14cb0bd925 Mon Sep 17 00:00:00 2001 From: tangmeng Date: Thu, 17 Feb 2022 12:23:21 +0800 Subject: kernel/reboot: move reboot sysctls to its own file kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. All filesystem syctls now get reviewed by fs folks. This commit follows the commit of fs, move the poweroff_cmd and ctrl-alt-del sysctls to its own file, kernel/reboot.c. Signed-off-by: tangmeng Signed-off-by: Luis Chamberlain --- include/linux/reboot.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/reboot.h b/include/linux/reboot.h index af907a3d68d1..a2429648d831 100644 --- a/include/linux/reboot.h +++ b/include/linux/reboot.h @@ -71,12 +71,8 @@ extern void kernel_restart(char *cmd); extern void kernel_halt(void); extern void kernel_power_off(void); -extern int C_A_D; /* for sysctl */ void ctrl_alt_del(void); -#define POWEROFF_CMD_PATH_LEN 256 -extern char poweroff_cmd[POWEROFF_CMD_PATH_LEN]; - extern void orderly_poweroff(bool force); extern void orderly_reboot(void); void hw_protection_shutdown(const char *reason, int ms_until_forced); -- cgit From 43fe219aa56a2fdd8f0623c9470a32b14b0617a5 Mon Sep 17 00:00:00 2001 From: sujiaxun Date: Thu, 17 Feb 2022 18:51:48 -0800 Subject: mm: move oom_kill sysctls to their own file kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the oom_kill sysctls to their own file, mm/oom_kill.c [sfr@canb.auug.org.au: null-terminate the array] Link: https://lkml.kernel.org/r/20220216193202.28838626@canb.auug.org.au Link: https://lkml.kernel.org/r/20220215093203.31032-1-sujiaxun@uniontech.com Signed-off-by: sujiaxun Signed-off-by: Stephen Rothwell Cc: Kees Cook Cc: Iurii Zaikin Cc: Michal Hocko Signed-off-by: Andrew Morton Signed-off-by: Luis Chamberlain --- include/linux/oom.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/oom.h b/include/linux/oom.h index 2db9a1432511..02d1e7bbd8cd 100644 --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -123,8 +123,4 @@ extern void oom_killer_enable(void); extern struct task_struct *find_lock_task_mm(struct task_struct *p); -/* sysctls */ -extern int sysctl_oom_dump_tasks; -extern int sysctl_oom_kill_allocating_task; -extern int sysctl_panic_on_oom; #endif /* _INCLUDE_LINUX_OOM_H */ -- cgit From aa779e5102195e1d9ade95dcbc0bfbd8f916eb59 Mon Sep 17 00:00:00 2001 From: zhanglianjie Date: Thu, 17 Feb 2022 18:51:51 -0800 Subject: mm: move page-writeback sysctls to their own file kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. So move the page-writeback sysctls to its own file. [akpm@linux-foundation.org: coding-style cleanups] akpm@linux-foundation.org: fix CONFIG_SYSCTL=n warnings] Link: https://lkml.kernel.org/r/20220129012955.26594-1-zhanglianjie@uniontech.com Signed-off-by: zhanglianjie Cc: Kees Cook Cc: Iurii Zaikin Cc: Luis Chamberlain Signed-off-by: Andrew Morton Signed-off-by: Luis Chamberlain --- include/linux/writeback.h | 15 --------------- 1 file changed, 15 deletions(-) (limited to 'include/linux') diff --git a/include/linux/writeback.h b/include/linux/writeback.h index fec248ab1fec..dc2b94e6a94f 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -345,28 +345,13 @@ void wb_domain_exit(struct wb_domain *dom); extern struct wb_domain global_wb_domain; /* These are exported to sysctl. */ -extern int dirty_background_ratio; -extern unsigned long dirty_background_bytes; -extern int vm_dirty_ratio; -extern unsigned long vm_dirty_bytes; extern unsigned int dirty_writeback_interval; extern unsigned int dirty_expire_interval; extern unsigned int dirtytime_expire_interval; -extern int vm_highmem_is_dirtyable; extern int laptop_mode; -int dirty_background_ratio_handler(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos); -int dirty_background_bytes_handler(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos); -int dirty_ratio_handler(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos); -int dirty_bytes_handler(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos); int dirtytime_interval_handler(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos); -int dirty_writeback_centisecs_handler(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos); void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty); unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh); -- cgit From f79c9b8ae8bde10126586c1bb55b5fd027276d8e Mon Sep 17 00:00:00 2001 From: tangmeng Date: Fri, 18 Feb 2022 18:58:57 +0800 Subject: kernel/lockdep: move lockdep sysctls to its own file kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. All filesystem syctls now get reviewed by fs folks. This commit follows the commit of fs, move the prove_locking and lock_stat sysctls to its own file, kernel/lockdep.c. Signed-off-by: tangmeng Signed-off-by: Luis Chamberlain --- include/linux/lockdep.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 467b94257105..37951c17908e 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -16,10 +16,6 @@ struct task_struct; -/* for sysctl */ -extern int prove_locking; -extern int lock_stat; - #ifdef CONFIG_LOCKDEP #include -- cgit From 9df918698408fd914493aba0b7858fef50eba63a Mon Sep 17 00:00:00 2001 From: tangmeng Date: Fri, 18 Feb 2022 18:59:12 +0800 Subject: kernel/panic: move panic sysctls to its own file kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. All filesystem syctls now get reviewed by fs folks. This commit follows the commit of fs, move the oops_all_cpu_backtrace sysctl to its own file, kernel/panic.c. Signed-off-by: tangmeng Signed-off-by: Luis Chamberlain --- include/linux/panic.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/panic.h b/include/linux/panic.h index f5844908a089..e71161da69c4 100644 --- a/include/linux/panic.h +++ b/include/linux/panic.h @@ -15,12 +15,6 @@ extern void oops_enter(void); extern void oops_exit(void); extern bool oops_may_print(void); -#ifdef CONFIG_SMP -extern unsigned int sysctl_oops_all_cpu_backtrace; -#else -#define sysctl_oops_all_cpu_backtrace 0 -#endif /* CONFIG_SMP */ - extern int panic_timeout; extern unsigned long panic_print; extern int panic_on_oops; -- cgit From 801b501439d1b366d524dee4fc1e6b3473a95b9a Mon Sep 17 00:00:00 2001 From: tangmeng Date: Fri, 18 Feb 2022 18:59:23 +0800 Subject: kernel/acct: move acct sysctls to its own file kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. All filesystem syctls now get reviewed by fs folks. This commit follows the commit of fs, move the acct sysctl to its own file, kernel/acct.c. Signed-off-by: tangmeng Signed-off-by: Luis Chamberlain --- include/linux/acct.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/acct.h b/include/linux/acct.h index bc70e81895c0..2718c4854815 100644 --- a/include/linux/acct.h +++ b/include/linux/acct.h @@ -21,7 +21,6 @@ #ifdef CONFIG_BSD_PROCESS_ACCT struct pid_namespace; -extern int acct_parm[]; /* for sysctl */ extern void acct_collect(long exitcode, int group_dead); extern void acct_process(void); extern void acct_exit_ns(struct pid_namespace *); -- cgit From 1186618a6a35d43a865448472a261184b608d13c Mon Sep 17 00:00:00 2001 From: tangmeng Date: Fri, 18 Feb 2022 18:59:36 +0800 Subject: kernel/delayacct: move delayacct sysctls to its own file kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. All filesystem syctls now get reviewed by fs folks. This commit follows the commit of fs, move the delayacct sysctl to its own file, kernel/delayacct.c. Signed-off-by: tangmeng Signed-off-by: Luis Chamberlain --- include/linux/delayacct.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/delayacct.h b/include/linux/delayacct.h index 3e03d010bd2e..6b16a6930a19 100644 --- a/include/linux/delayacct.h +++ b/include/linux/delayacct.h @@ -61,9 +61,6 @@ extern int delayacct_on; /* Delay accounting turned on/off */ extern struct kmem_cache *delayacct_cache; extern void delayacct_init(void); -extern int sysctl_delayacct(struct ctl_table *table, int write, void *buffer, - size_t *lenp, loff_t *ppos); - extern void __delayacct_tsk_init(struct task_struct *); extern void __delayacct_tsk_exit(struct task_struct *); extern void __delayacct_blkio_start(void); -- cgit From d772cc2c321900f3f463a124eebeb7218e66dda6 Mon Sep 17 00:00:00 2001 From: tangmeng Date: Fri, 18 Feb 2022 18:59:49 +0800 Subject: kernel/do_mount_initrd: move real_root_dev sysctls to its own file kernel/sysctl.c is a kitchen sink where everyone leaves their dirty dishes, this makes it very difficult to maintain. To help with this maintenance let's start by moving sysctls to places where they actually belong. The proc sysctl maintainers do not want to know what sysctl knobs you wish to add for your own piece of code, we just care about the core logic. All filesystem syctls now get reviewed by fs folks. This commit follows the commit of fs, move the real_root_dev sysctl to its own file, kernel/do_mount_initrd.c. Signed-off-by: tangmeng Signed-off-by: Luis Chamberlain --- include/linux/initrd.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/initrd.h b/include/linux/initrd.h index 1bbe9af48dc3..f1a1f4c92ded 100644 --- a/include/linux/initrd.h +++ b/include/linux/initrd.h @@ -29,8 +29,6 @@ static inline void wait_for_initramfs(void) {} extern phys_addr_t phys_initrd_start; extern unsigned long phys_initrd_size; -extern unsigned int real_root_dev; - extern char __initramfs_start[]; extern unsigned long __initramfs_size; -- cgit From 8e4e83b2278bdfb55cb2b13de07cf0a721ce8af7 Mon Sep 17 00:00:00 2001 From: Wei Xiao Date: Wed, 23 Feb 2022 19:11:53 +0800 Subject: ftrace: move sysctl_ftrace_enabled to ftrace.c This moves ftrace_enabled to trace/ftrace.c. We move sysctls to places where features actually belong to improve the readability of the code and reduce the risk of code merge conflicts. At the same time, the proc-sysctl maintainers do not want to know what sysctl knobs you wish to add for your owner piece of code, we just care about the core logic. Signed-off-by: Wei Xiao Acked-by: Steven Rostedt (Google) Signed-off-by: Luis Chamberlain --- include/linux/ftrace.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 4816b7e11047..088b915853dd 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -101,9 +101,6 @@ static inline int ftrace_mod_get_kallsym(unsigned int symnum, unsigned long *val #ifdef CONFIG_FUNCTION_TRACER extern int ftrace_enabled; -extern int -ftrace_enable_sysctl(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos); #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_ARGS -- cgit From 7bc80a5462c37eab58a9ea386064307c0f447fd1 Mon Sep 17 00:00:00 2001 From: Christian König Date: Tue, 9 Nov 2021 11:08:18 +0100 Subject: dma-buf: add enum dma_resv_usage v4 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This change adds the dma_resv_usage enum and allows us to specify why a dma_resv object is queried for its containing fences. Additional to that a dma_resv_usage_rw() helper function is added to aid retrieving the fences for a read or write userspace submission. This is then deployed to the different query functions of the dma_resv object and all of their users. When the write paratermer was previously true we now use DMA_RESV_USAGE_WRITE and DMA_RESV_USAGE_READ otherwise. v2: add KERNEL/OTHER in separate patch v3: some kerneldoc suggestions by Daniel v4: some more kerneldoc suggestions by Daniel, fix missing cases lost in the rebase pointed out by Bas. Signed-off-by: Christian König Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-2-christian.koenig@amd.com --- include/linux/dma-buf.h | 8 ++++-- include/linux/dma-resv.h | 73 +++++++++++++++++++++++++++++++++++++++--------- 2 files changed, 66 insertions(+), 15 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 6fb91956ab8d..a297397743a2 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -408,6 +408,9 @@ struct dma_buf { * pipelining across drivers. These do not set any fences for their * access. An example here is v4l. * + * - Driver should use dma_resv_usage_rw() when retrieving fences as + * dependency for implicit synchronization. + * * DYNAMIC IMPORTER RULES: * * Dynamic importers, see dma_buf_attachment_is_dynamic(), have @@ -423,8 +426,9 @@ struct dma_buf { * * IMPORTANT: * - * All drivers must obey the struct dma_resv rules, specifically the - * rules for updating and obeying fences. + * All drivers and memory management related functions must obey the + * struct dma_resv rules, specifically the rules for updating and + * obeying fences. See enum dma_resv_usage for further descriptions. */ struct dma_resv *resv; diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 5fa04d0fccad..92cd8023980f 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -49,6 +49,53 @@ extern struct ww_class reservation_ww_class; struct dma_resv_list; +/** + * enum dma_resv_usage - how the fences from a dma_resv obj are used + * + * This enum describes the different use cases for a dma_resv object and + * controls which fences are returned when queried. + * + * An important fact is that there is the order WRITEobj = obj; - cursor->all_fences = all_fences; + cursor->usage = usage; cursor->fence = NULL; } @@ -241,7 +288,7 @@ static inline bool dma_resv_iter_is_restarted(struct dma_resv_iter *cursor) * dma_resv_for_each_fence - fence iterator * @cursor: a struct dma_resv_iter pointer * @obj: a dma_resv object pointer - * @all_fences: true if all fences should be returned + * @usage: controls which fences to return * @fence: the current fence * * Iterate over the fences in a struct dma_resv object while holding the @@ -250,8 +297,8 @@ static inline bool dma_resv_iter_is_restarted(struct dma_resv_iter *cursor) * valid as long as the lock is held and so no extra reference to the fence is * taken. */ -#define dma_resv_for_each_fence(cursor, obj, all_fences, fence) \ - for (dma_resv_iter_begin(cursor, obj, all_fences), \ +#define dma_resv_for_each_fence(cursor, obj, usage, fence) \ + for (dma_resv_iter_begin(cursor, obj, usage), \ fence = dma_resv_iter_first(cursor); fence; \ fence = dma_resv_iter_next(cursor)) @@ -418,14 +465,14 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence); void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context, struct dma_fence *fence); void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence); -int dma_resv_get_fences(struct dma_resv *obj, bool write, +int dma_resv_get_fences(struct dma_resv *obj, enum dma_resv_usage usage, unsigned int *num_fences, struct dma_fence ***fences); -int dma_resv_get_singleton(struct dma_resv *obj, bool write, +int dma_resv_get_singleton(struct dma_resv *obj, enum dma_resv_usage usage, struct dma_fence **fence); int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src); -long dma_resv_wait_timeout(struct dma_resv *obj, bool wait_all, bool intr, - unsigned long timeout); -bool dma_resv_test_signaled(struct dma_resv *obj, bool test_all); +long dma_resv_wait_timeout(struct dma_resv *obj, enum dma_resv_usage usage, + bool intr, unsigned long timeout); +bool dma_resv_test_signaled(struct dma_resv *obj, enum dma_resv_usage usage); void dma_resv_describe(struct dma_resv *obj, struct seq_file *seq); #endif /* _LINUX_RESERVATION_H */ -- cgit From 73511edf8b196e6f1ccda0fdf294ff57aa2dc9db Mon Sep 17 00:00:00 2001 From: Christian König Date: Tue, 9 Nov 2021 11:08:18 +0100 Subject: dma-buf: specify usage while adding fences to dma_resv obj v7 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Instead of distingting between shared and exclusive fences specify the fence usage while adding fences. Rework all drivers to use this interface instead and deprecate the old one. v2: some kerneldoc comments suggested by Daniel v3: fix a missing case in radeon v4: rebase on nouveau changes, fix lockdep and temporary disable warning v5: more documentation updates v6: separate internal dma_resv changes from this patch, avoids to disable warning temporary, rebase on upstream changes v7: fix missed case in lima driver, minimize changes to i915_gem_busy_ioctl Signed-off-by: Christian König Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-3-christian.koenig@amd.com --- include/linux/dma-buf.h | 16 ++++++++-------- include/linux/dma-resv.h | 25 +++++++++++++++---------- 2 files changed, 23 insertions(+), 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index a297397743a2..71731796c8c3 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -393,15 +393,15 @@ struct dma_buf { * e.g. exposed in `Implicit Fence Poll Support`_ must follow the * below rules. * - * - Drivers must add a shared fence through dma_resv_add_shared_fence() - * for anything the userspace API considers a read access. This highly - * depends upon the API and window system. + * - Drivers must add a read fence through dma_resv_add_fence() with the + * DMA_RESV_USAGE_READ flag for anything the userspace API considers a + * read access. This highly depends upon the API and window system. * - * - Similarly drivers must set the exclusive fence through - * dma_resv_add_excl_fence() for anything the userspace API considers - * write access. + * - Similarly drivers must add a write fence through + * dma_resv_add_fence() with the DMA_RESV_USAGE_WRITE flag for + * anything the userspace API considers write access. * - * - Drivers may just always set the exclusive fence, since that only + * - Drivers may just always add a write fence, since that only * causes unecessarily synchronization, but no correctness issues. * * - Some drivers only expose a synchronous userspace API with no @@ -416,7 +416,7 @@ struct dma_buf { * Dynamic importers, see dma_buf_attachment_is_dynamic(), have * additional constraints on how they set up fences: * - * - Dynamic importers must obey the exclusive fence and wait for it to + * - Dynamic importers must obey the write fences and wait for them to * signal before allowing access to the buffer's underlying storage * through the device. * diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 92cd8023980f..98dc5234b487 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -195,6 +195,9 @@ struct dma_resv_iter { /** @fence: the currently handled fence */ struct dma_fence *fence; + /** @fence_usage: the usage of the current fence */ + enum dma_resv_usage fence_usage; + /** @seq: sequence number to check for modifications */ unsigned int seq; @@ -244,14 +247,15 @@ static inline void dma_resv_iter_end(struct dma_resv_iter *cursor) } /** - * dma_resv_iter_is_exclusive - test if the current fence is the exclusive one + * dma_resv_iter_usage - Return the usage of the current fence * @cursor: the cursor of the current position * - * Returns true if the currently returned fence is the exclusive one. + * Returns the usage of the currently processed fence. */ -static inline bool dma_resv_iter_is_exclusive(struct dma_resv_iter *cursor) +static inline enum dma_resv_usage +dma_resv_iter_usage(struct dma_resv_iter *cursor) { - return cursor->index == 0; + return cursor->fence_usage; } /** @@ -306,9 +310,9 @@ static inline bool dma_resv_iter_is_restarted(struct dma_resv_iter *cursor) #define dma_resv_assert_held(obj) lockdep_assert_held(&(obj)->lock.base) #ifdef CONFIG_DEBUG_MUTEXES -void dma_resv_reset_shared_max(struct dma_resv *obj); +void dma_resv_reset_max_fences(struct dma_resv *obj); #else -static inline void dma_resv_reset_shared_max(struct dma_resv *obj) {} +static inline void dma_resv_reset_max_fences(struct dma_resv *obj) {} #endif /** @@ -454,17 +458,18 @@ static inline struct ww_acquire_ctx *dma_resv_locking_ctx(struct dma_resv *obj) */ static inline void dma_resv_unlock(struct dma_resv *obj) { - dma_resv_reset_shared_max(obj); + dma_resv_reset_max_fences(obj); ww_mutex_unlock(&obj->lock); } void dma_resv_init(struct dma_resv *obj); void dma_resv_fini(struct dma_resv *obj); int dma_resv_reserve_fences(struct dma_resv *obj, unsigned int num_fences); -void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence); +void dma_resv_add_fence(struct dma_resv *obj, struct dma_fence *fence, + enum dma_resv_usage usage); void dma_resv_replace_fences(struct dma_resv *obj, uint64_t context, - struct dma_fence *fence); -void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence); + struct dma_fence *fence, + enum dma_resv_usage usage); int dma_resv_get_fences(struct dma_resv *obj, enum dma_resv_usage usage, unsigned int *num_fences, struct dma_fence ***fences); int dma_resv_get_singleton(struct dma_resv *obj, enum dma_resv_usage usage, -- cgit From 047a1b877ed48098bed71fcfb1d4891e1b54441d Mon Sep 17 00:00:00 2001 From: Christian König Date: Tue, 23 Nov 2021 09:33:07 +0100 Subject: dma-buf & drm/amdgpu: remove dma_resv workaround MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Rework the internals of the dma_resv object to allow adding more than one write fence and remember for each fence what purpose it had. This allows removing the workaround from amdgpu which used a container for this instead. Signed-off-by: Christian König Reviewed-by: Daniel Vetter Cc: amd-gfx@lists.freedesktop.org Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-4-christian.koenig@amd.com --- include/linux/dma-resv.h | 47 +++++++++++------------------------------------ 1 file changed, 11 insertions(+), 36 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 98dc5234b487..7bb7e7edbb6f 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -99,8 +99,8 @@ static inline enum dma_resv_usage dma_resv_usage_rw(bool write) /** * struct dma_resv - a reservation object manages fences for a buffer * - * There are multiple uses for this, with sometimes slightly different rules in - * how the fence slots are used. + * This is a container for dma_fence objects which needs to handle multiple use + * cases. * * One use is to synchronize cross-driver access to a struct dma_buf, either for * dynamic buffer management or just to handle implicit synchronization between @@ -130,47 +130,22 @@ struct dma_resv { * @seq: * * Sequence count for managing RCU read-side synchronization, allows - * read-only access to @fence_excl and @fence while ensuring we take a - * consistent snapshot. + * read-only access to @fences while ensuring we take a consistent + * snapshot. */ seqcount_ww_mutex_t seq; /** - * @fence_excl: + * @fences: * - * The exclusive fence, if there is one currently. + * Array of fences which where added to the dma_resv object * - * To guarantee that no fences are lost, this new fence must signal - * only after the previous exclusive fence has signalled. If - * semantically only a new access is added without actually treating the - * previous one as a dependency the exclusive fences can be strung - * together using struct dma_fence_chain. - * - * Note that actual semantics of what an exclusive or shared fence mean - * is defined by the user, for reservation objects shared across drivers - * see &dma_buf.resv. - */ - struct dma_fence __rcu *fence_excl; - - /** - * @fence: - * - * List of current shared fences. - * - * There are no ordering constraints of shared fences against the - * exclusive fence slot. If a waiter needs to wait for all access, it - * has to wait for both sets of fences to signal. - * - * A new fence is added by calling dma_resv_add_shared_fence(). Since - * this often needs to be done past the point of no return in command + * A new fence is added by calling dma_resv_add_fence(). Since this + * often needs to be done past the point of no return in command * submission it cannot fail, and therefore sufficient slots need to be * reserved by calling dma_resv_reserve_fences(). - * - * Note that actual semantics of what an exclusive or shared fence mean - * is defined by the user, for reservation objects shared across drivers - * see &dma_buf.resv. */ - struct dma_resv_list __rcu *fence; + struct dma_resv_list __rcu *fences; }; /** @@ -207,8 +182,8 @@ struct dma_resv_iter { /** @fences: the shared fences; private, *MUST* not dereference */ struct dma_resv_list *fences; - /** @shared_count: number of shared fences */ - unsigned int shared_count; + /** @num_fences: number of fences */ + unsigned int num_fences; /** @is_restarted: true if this is the first returned fence */ bool is_restarted; -- cgit From b29895e18304feb7e8afc6388db7ece60327b23c Mon Sep 17 00:00:00 2001 From: Christian König Date: Fri, 26 Nov 2021 14:12:42 +0100 Subject: dma-buf: add DMA_RESV_USAGE_KERNEL v3 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add an usage for kernel submissions. Waiting for those are mandatory for dynamic DMA-bufs. As a precaution this patch also changes all occurrences where fences are added as part of memory management in TTM, VMWGFX and i915 to use the new value because it now becomes possible for drivers to ignore fences with the WRITE usage. v2: use "must" in documentation, fix whitespaces v3: separate out some driver changes and better document why some changes should still be part of this patch. Signed-off-by: Christian König Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-5-christian.koenig@amd.com --- include/linux/dma-resv.h | 24 +++++++++++++++++++++--- 1 file changed, 21 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 7bb7e7edbb6f..a749f229ae91 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -55,11 +55,29 @@ struct dma_resv_list; * This enum describes the different use cases for a dma_resv object and * controls which fences are returned when queried. * - * An important fact is that there is the order WRITE Date: Tue, 9 Nov 2021 11:08:18 +0100 Subject: dma-buf: add DMA_RESV_USAGE_BOOKKEEP v3 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add an usage for submissions independent of implicit sync but still interesting for memory management. v2: cleanup the kerneldoc a bit v3: separate amdgpu changes from this Signed-off-by: Christian König Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-10-christian.koenig@amd.com --- include/linux/dma-resv.h | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index a749f229ae91..1db759eacc98 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -55,7 +55,7 @@ struct dma_resv_list; * This enum describes the different use cases for a dma_resv object and * controls which fences are returned when queried. * - * An important fact is that there is the order KERNEL Date: Mon, 4 Apr 2022 14:58:37 +0200 Subject: dma-buf: drop seq count based update MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This should be possible now since we don't have the distinction between exclusive and shared fences any more. The only possible pitfall is that a dma_fence would be reused during the RCU grace period, but even that could be handled with a single extra check. Signed-off-by: Christian König Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-15-christian.koenig@amd.com --- include/linux/dma-resv.h | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index 1db759eacc98..c8ccbc94d5d2 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -155,15 +155,6 @@ struct dma_resv { */ struct ww_mutex lock; - /** - * @seq: - * - * Sequence count for managing RCU read-side synchronization, allows - * read-only access to @fences while ensuring we take a consistent - * snapshot. - */ - seqcount_ww_mutex_t seq; - /** * @fences: * @@ -202,9 +193,6 @@ struct dma_resv_iter { /** @fence_usage: the usage of the current fence */ enum dma_resv_usage fence_usage; - /** @seq: sequence number to check for modifications */ - unsigned int seq; - /** @index: index into the shared fences */ unsigned int index; -- cgit From e84815cbbc767617221e6891e77f2486c9199dfa Mon Sep 17 00:00:00 2001 From: Christian König Date: Thu, 7 Apr 2022 10:20:55 +0200 Subject: seqlock: drop seqcount_ww_mutex_t MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Daniel pointed out that this series removes the last user of seqcount_ww_mutex_t, so let's drop this. Signed-off-by: Christian König Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: linux-kernel@vger.kernel.org Acked-by: Peter Zijlstra (Intel) Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220407085946.744568-16-christian.koenig@amd.com --- include/linux/seqlock.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h index 37ded6b8fee6..3926e9027947 100644 --- a/include/linux/seqlock.h +++ b/include/linux/seqlock.h @@ -17,7 +17,6 @@ #include #include #include -#include #include #include @@ -164,7 +163,7 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s) * static initializer or init function. This enables lockdep to validate * that the write side critical section is properly serialized. * - * LOCKNAME: raw_spinlock, spinlock, rwlock, mutex, or ww_mutex. + * LOCKNAME: raw_spinlock, spinlock, rwlock or mutex */ /* @@ -184,7 +183,6 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s) #define seqcount_spinlock_init(s, lock) seqcount_LOCKNAME_init(s, lock, spinlock) #define seqcount_rwlock_init(s, lock) seqcount_LOCKNAME_init(s, lock, rwlock) #define seqcount_mutex_init(s, lock) seqcount_LOCKNAME_init(s, lock, mutex) -#define seqcount_ww_mutex_init(s, lock) seqcount_LOCKNAME_init(s, lock, ww_mutex) /* * SEQCOUNT_LOCKNAME() - Instantiate seqcount_LOCKNAME_t and helpers @@ -277,7 +275,6 @@ SEQCOUNT_LOCKNAME(raw_spinlock, raw_spinlock_t, false, s->lock, raw_s SEQCOUNT_LOCKNAME(spinlock, spinlock_t, __SEQ_RT, s->lock, spin, spin_lock(s->lock)) SEQCOUNT_LOCKNAME(rwlock, rwlock_t, __SEQ_RT, s->lock, read, read_lock(s->lock)) SEQCOUNT_LOCKNAME(mutex, struct mutex, true, s->lock, mutex, mutex_lock(s->lock)) -SEQCOUNT_LOCKNAME(ww_mutex, struct ww_mutex, true, &s->lock->base, ww_mutex, ww_mutex_lock(s->lock, NULL)) /* * SEQCNT_LOCKNAME_ZERO - static initializer for seqcount_LOCKNAME_t @@ -304,8 +301,7 @@ SEQCOUNT_LOCKNAME(ww_mutex, struct ww_mutex, true, &s->lock->base, ww_mu __seqprop_case((s), raw_spinlock, prop), \ __seqprop_case((s), spinlock, prop), \ __seqprop_case((s), rwlock, prop), \ - __seqprop_case((s), mutex, prop), \ - __seqprop_case((s), ww_mutex, prop)) + __seqprop_case((s), mutex, prop)) #define seqprop_ptr(s) __seqprop(s, ptr) #define seqprop_sequence(s) __seqprop(s, sequence) -- cgit From f584b68005ac782097d63a691740cb0dfed072ed Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 4 Apr 2022 15:11:04 -0400 Subject: mm: Add vma_alloc_folio() This wrapper around alloc_pages_vma() calls prep_transhuge_page(), removing the obligation from the caller. This is in the same spirit as __folio_alloc(). Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Zi Yan Reviewed-by: William Kucharski --- include/linux/gfp.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 761f8f1885c7..3e3d36fc2109 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -613,9 +613,11 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask, #ifdef CONFIG_NUMA struct page *alloc_pages(gfp_t gfp, unsigned int order); struct folio *folio_alloc(gfp_t gfp, unsigned order); -extern struct page *alloc_pages_vma(gfp_t gfp_mask, int order, +struct page *alloc_pages_vma(gfp_t gfp_mask, int order, struct vm_area_struct *vma, unsigned long addr, bool hugepage); +struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma, + unsigned long addr, bool hugepage); #define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ alloc_pages_vma(gfp_mask, order, vma, addr, true) #else @@ -627,8 +629,10 @@ static inline struct folio *folio_alloc(gfp_t gfp, unsigned int order) { return __folio_alloc_node(gfp, order, numa_node_id()); } -#define alloc_pages_vma(gfp_mask, order, vma, addr, false)\ +#define alloc_pages_vma(gfp_mask, order, vma, addr, hugepage) \ alloc_pages(gfp_mask, order) +#define vma_alloc_folio(gfp, order, vma, addr, hugepage) \ + folio_alloc(gfp, order) #define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ alloc_pages(gfp_mask, order) #endif -- cgit From 5ea98e01ab524cbc53dad8aebd27b434ebe5d074 Mon Sep 17 00:00:00 2001 From: Brijesh Singh Date: Mon, 7 Mar 2022 15:33:39 -0600 Subject: x86/boot: Add Confidential Computing type to setup_data While launching encrypted guests, the hypervisor may need to provide some additional information during the guest boot. When booting under an EFI-based BIOS, the EFI configuration table contains an entry for the confidential computing blob that contains the required information. To support booting encrypted guests on non-EFI VMs, the hypervisor needs to pass this additional information to the guest kernel using a different method. For this purpose, introduce SETUP_CC_BLOB type in setup_data to hold the physical address of the confidential computing blob location. The boot loader or hypervisor may choose to use this method instead of an EFI configuration table. The CC blob location scanning should give preference to a setup_data blob over an EFI configuration table. In AMD SEV-SNP, the CC blob contains the address of the secrets and CPUID pages. The secrets page includes information such as a VM to PSP communication key and the CPUID page contains PSP-filtered CPUID values. Define the AMD SEV confidential computing blob structure. While at it, define the EFI GUID for the confidential computing blob. [ bp: Massage commit message, mark struct __packed. ] Signed-off-by: Brijesh Singh Signed-off-by: Borislav Petkov Acked-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20220307213356.2797205-30-brijesh.singh@amd.com --- include/linux/efi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index ccd4d3f91c98..984aa688997a 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -390,6 +390,7 @@ void efi_native_runtime_setup(void); #define EFI_CERT_SHA256_GUID EFI_GUID(0xc1c41626, 0x504c, 0x4092, 0xac, 0xa9, 0x41, 0xf9, 0x36, 0x93, 0x43, 0x28) #define EFI_CERT_X509_GUID EFI_GUID(0xa5c059a1, 0x94e4, 0x4aa7, 0x87, 0xb5, 0xab, 0x15, 0x5c, 0x2b, 0xf0, 0x72) #define EFI_CERT_X509_SHA256_GUID EFI_GUID(0x3bd2a492, 0x96c0, 0x4079, 0xb4, 0x20, 0xfc, 0xf9, 0x8e, 0xf1, 0x03, 0xed) +#define EFI_CC_BLOB_GUID EFI_GUID(0x067b1f5f, 0xcf26, 0x44c5, 0x85, 0x54, 0x93, 0xd7, 0x77, 0x91, 0x2d, 0x42) /* * This GUID is used to pass to the kernel proper the struct screen_info -- cgit From 6b2060cf9138a2cd5f3468a949d3869abed049ef Mon Sep 17 00:00:00 2001 From: Daniel Vetter Date: Tue, 5 Apr 2022 23:03:26 +0200 Subject: fb: Delete fb_info->queue It was only used by fbcon, and that now switched to its own, private work. Acked-by: Sam Ravnborg Acked-by: Thomas Zimmermann Signed-off-by: Daniel Vetter Cc: Helge Deller Cc: linux-fbdev@vger.kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20220405210335.3434130-9-daniel.vetter@ffwll.ch --- include/linux/fb.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fb.h b/include/linux/fb.h index 9a77ab615c36..f95da1af9ff6 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -450,7 +450,6 @@ struct fb_info { struct fb_var_screeninfo var; /* Current var */ struct fb_fix_screeninfo fix; /* Current fix */ struct fb_monspecs monspecs; /* Current Monitor specs */ - struct work_struct queue; /* Framebuffer event queue */ struct fb_pixmap pixmap; /* Image hardware mapper */ struct fb_pixmap sprite; /* Cursor hardware mapper */ struct fb_cmap cmap; /* Current cmap */ -- cgit From bae1a962ac2c5e6be08319ff3f7d6df542584fce Mon Sep 17 00:00:00 2001 From: Kuppuswamy Sathyanarayanan Date: Wed, 6 Apr 2022 02:29:33 +0300 Subject: x86/topology: Disable CPU online/offline control for TDX guests Unlike regular VMs, TDX guests use the firmware hand-off wakeup method to wake up the APs during the boot process. This wakeup model uses a mailbox to communicate with firmware to bring up the APs. As per the design, this mailbox can only be used once for the given AP, which means after the APs are booted, the same mailbox cannot be used to offline/online the given AP. More details about this requirement can be found in Intel TDX Virtual Firmware Design Guide, sec titled "AP initialization in OS" and in sec titled "Hotplug Device". Since the architecture does not support any method of offlining the CPUs, disable CPU hotplug support in the kernel. Since this hotplug disable feature can be re-used by other VM guests, add a new CC attribute CC_ATTR_HOTPLUG_DISABLED and use it to disable the hotplug support. Attempt to offline CPU will fail with -EOPNOTSUPP. Signed-off-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kirill A. Shutemov Signed-off-by: Dave Hansen Reviewed-by: Andi Kleen Reviewed-by: Tony Luck Reviewed-by: Thomas Gleixner Link: https://lkml.kernel.org/r/20220405232939.73860-25-kirill.shutemov@linux.intel.com --- include/linux/cc_platform.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cc_platform.h b/include/linux/cc_platform.h index efd8205282da..691494bbaf5a 100644 --- a/include/linux/cc_platform.h +++ b/include/linux/cc_platform.h @@ -72,6 +72,16 @@ enum cc_attr { * Examples include TDX guest & SEV. */ CC_ATTR_GUEST_UNROLL_STRING_IO, + + /** + * @CC_ATTR_HOTPLUG_DISABLED: Hotplug is not supported or disabled. + * + * The platform/OS is running as a guest/virtual machine does not + * support CPU hotplug feature. + * + * Examples include TDX Guest. + */ + CC_ATTR_HOTPLUG_DISABLED, }; #ifdef CONFIG_ARCH_HAS_CC_PLATFORM -- cgit From 88dee0cc93adcd83db9d089c1163dc88edafd1c1 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Wed, 6 Apr 2022 22:34:35 -0400 Subject: NFS: Ensure rpc_run_task() cannot fail in nfs_async_rename() Ensure the call to rpc_run_task() cannot fail by preallocating the rpc_task. Fixes: 910ad38697d9 ("NFS: Fix memory allocation in rpc_alloc_task()") Signed-off-by: Trond Myklebust --- include/linux/nfs_xdr.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index 49ba486aea5f..2863e5a69c6a 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -1694,6 +1694,7 @@ struct nfs_unlinkdata { struct nfs_renamedata { struct nfs_renameargs args; struct nfs_renameres res; + struct rpc_task task; const struct cred *cred; struct inode *old_dir; struct dentry *old_dentry; -- cgit From 6264f58ca0e54e41d63c2d00334a48bac28fbf30 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 6 Apr 2022 14:37:54 -0700 Subject: net: extract a few internals from netdevice.h There's a number of functions and static variables used under net/core/ but not from the outside. We currently dump most of them into netdevice.h. That bad for many reasons: - netdevice.h is very cluttered, hard to figure out what the APIs are; - netdevice.h is very long; - we have to touch netdevice.h more which causes expensive incremental builds. Create a header under net/core/ and move some declarations. The new header is also a bit of a catch-all but that's fine, if we create more specific headers people will likely over-think where their declaration fit best. And end up putting them in netdevice.h, again. More work should be done on splitting netdevice.h into more targeted headers, but that'd be more time consuming so small steps. Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 72 ++--------------------------------------------- 1 file changed, 2 insertions(+), 70 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 7b2a0b739684..7e7b2a72e473 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -59,7 +59,8 @@ struct dsa_port; struct ip_tunnel_parm; struct macsec_context; struct macsec_ops; - +struct netdev_name_node; +struct sd_flow_limit; struct sfp_bus; /* 802.11 specific */ struct wireless_dev; @@ -1020,16 +1021,6 @@ struct dev_ifalias { struct devlink; struct tlsdev_ops; -struct netdev_name_node { - struct hlist_node hlist; - struct list_head list; - struct net_device *dev; - const char *name; -}; - -int netdev_name_node_alt_create(struct net_device *dev, const char *name); -int netdev_name_node_alt_destroy(struct net_device *dev, const char *name); - struct netdev_net_notifier { struct list_head list; struct notifier_block *nb; @@ -2975,7 +2966,6 @@ struct net_device *dev_get_by_index(struct net *net, int ifindex); struct net_device *__dev_get_by_index(struct net *net, int ifindex); struct net_device *dev_get_by_index_rcu(struct net *net, int ifindex); struct net_device *dev_get_by_napi_id(unsigned int napi_id); -int netdev_get_name(struct net *net, char *name, int ifindex); int dev_restart(struct net_device *dev); @@ -3034,19 +3024,6 @@ static inline bool dev_has_header(const struct net_device *dev) return dev->header_ops && dev->header_ops->create; } -#ifdef CONFIG_NET_FLOW_LIMIT -#define FLOW_LIMIT_HISTORY (1 << 7) /* must be ^2 and !overflow buckets */ -struct sd_flow_limit { - u64 count; - unsigned int num_buckets; - unsigned int history_head; - u16 history[FLOW_LIMIT_HISTORY]; - u8 buckets[]; -}; - -extern int netdev_flow_limit_table_len; -#endif /* CONFIG_NET_FLOW_LIMIT */ - /* * Incoming packets are placed on per-CPU queues */ @@ -3770,7 +3747,6 @@ int dev_change_flags(struct net_device *dev, unsigned int flags, struct netlink_ext_ack *extack); void __dev_notify_flags(struct net_device *, unsigned int old_flags, unsigned int gchanges); -int dev_change_name(struct net_device *, const char *); int dev_set_alias(struct net_device *, const char *, size_t); int dev_get_alias(const struct net_device *, char *, size_t); int __dev_change_net_namespace(struct net_device *dev, struct net *net, @@ -3782,13 +3758,7 @@ int dev_change_net_namespace(struct net_device *dev, struct net *net, return __dev_change_net_namespace(dev, net, pat, 0); } int __dev_set_mtu(struct net_device *, int); -int dev_validate_mtu(struct net_device *dev, int mtu, - struct netlink_ext_ack *extack); -int dev_set_mtu_ext(struct net_device *dev, int mtu, - struct netlink_ext_ack *extack); int dev_set_mtu(struct net_device *, int); -int dev_change_tx_queue_len(struct net_device *, unsigned long); -void dev_set_group(struct net_device *, int); int dev_pre_changeaddr_notify(struct net_device *dev, const char *addr, struct netlink_ext_ack *extack); int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa, @@ -3796,24 +3766,13 @@ int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa, int dev_set_mac_address_user(struct net_device *dev, struct sockaddr *sa, struct netlink_ext_ack *extack); int dev_get_mac_address(struct sockaddr *sa, struct net *net, char *dev_name); -int dev_change_carrier(struct net_device *, bool new_carrier); -int dev_get_phys_port_id(struct net_device *dev, - struct netdev_phys_item_id *ppid); -int dev_get_phys_port_name(struct net_device *dev, - char *name, size_t len); int dev_get_port_parent_id(struct net_device *dev, struct netdev_phys_item_id *ppid, bool recurse); bool netdev_port_same_parent_id(struct net_device *a, struct net_device *b); -int dev_change_proto_down(struct net_device *dev, bool proto_down); -void dev_change_proto_down_reason(struct net_device *dev, unsigned long mask, - u32 value); struct sk_buff *validate_xmit_skb_list(struct sk_buff *skb, struct net_device *dev, bool *again); struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev, struct netdev_queue *txq, int *ret); -typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf); -int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack, - int fd, int expected_fd, u32 flags); int bpf_xdp_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); u8 dev_xdp_prog_count(struct net_device *dev); u32 dev_xdp_prog_id(struct net_device *dev, enum bpf_xdp_mode mode); @@ -3898,13 +3857,6 @@ static __always_inline int ____dev_forward_skb(struct net_device *dev, bool dev_nit_active(struct net_device *dev); void dev_queue_xmit_nit(struct sk_buff *skb, struct net_device *dev); -extern int netdev_budget; -extern unsigned int netdev_budget_usecs; - -/* Used by rtnetlink.c:__rtnl_unlock()/rtnl_unlock() */ -extern struct list_head net_todo_list; -void netdev_run_todo(void); - static inline void __dev_put(struct net_device *dev) { if (dev) { @@ -4021,10 +3973,7 @@ static inline void dev_replace_track(struct net_device *odev, * called netif_lowerlayer_*() because they represent the state of any * kind of lower layer not just hardware media. */ - -void linkwatch_init_dev(struct net_device *dev); void linkwatch_fire_event(struct net_device *dev); -void linkwatch_forget_dev(struct net_device *dev); /** * netif_carrier_ok - test if carrier present @@ -4470,9 +4419,6 @@ int dev_addr_add(struct net_device *dev, const unsigned char *addr, unsigned char addr_type); int dev_addr_del(struct net_device *dev, const unsigned char *addr, unsigned char addr_type); -void dev_addr_flush(struct net_device *dev); -int dev_addr_init(struct net_device *dev); -void dev_addr_check(struct net_device *dev); /* Functions used for unicast addresses handling */ int dev_uc_add(struct net_device *dev, const unsigned char *addr); @@ -4562,7 +4508,6 @@ static inline void __dev_mc_unsync(struct net_device *dev, /* Functions used for secondary unicast and multicast support */ void dev_set_rx_mode(struct net_device *dev); -void __dev_set_rx_mode(struct net_device *dev); int dev_set_promiscuity(struct net_device *dev, int inc); int dev_set_allmulti(struct net_device *dev, int inc); void netdev_state_change(struct net_device *dev); @@ -4580,11 +4525,6 @@ void dev_fetch_sw_netstats(struct rtnl_link_stats64 *s, void dev_get_tstats64(struct net_device *dev, struct rtnl_link_stats64 *s); extern int netdev_max_backlog; -extern int netdev_tstamp_prequeue; -extern int netdev_unregister_timeout_secs; -extern int weight_p; -extern int dev_weight_rx_bias; -extern int dev_weight_tx_bias; extern int dev_rx_weight; extern int dev_tx_weight; extern int gro_normal_batch; @@ -4772,12 +4712,6 @@ static inline void netdev_rx_csum_fault(struct net_device *dev, void net_enable_timestamp(void); void net_disable_timestamp(void); -#ifdef CONFIG_PROC_FS -int __init dev_proc_init(void); -#else -#define dev_proc_init() 0 -#endif - static inline netdev_tx_t __netdev_start_xmit(const struct net_device_ops *ops, struct sk_buff *skb, struct net_device *dev, bool more) @@ -4813,8 +4747,6 @@ extern const struct kobj_ns_type_operations net_ns_type_operations; const char *netdev_drivername(const struct net_device *dev); -void linkwatch_run_queue(void); - static inline netdev_features_t netdev_intersect_features(netdev_features_t f1, netdev_features_t f2) { -- cgit From 794c24e9921f32ded4422833a990ccf11dc3c00e Mon Sep 17 00:00:00 2001 From: Jeffrey Ji Date: Wed, 6 Apr 2022 17:26:00 +0000 Subject: net-core: rx_otherhost_dropped to core_stats Increment rx_otherhost_dropped counter when packet dropped due to mismatched dest MAC addr. An example when this drop can occur is when manually crafting raw packets that will be consumed by a user space application via a tap device. For testing purposes local traffic was generated using trafgen for the client and netcat to start a server Tested: Created 2 netns, sent 1 packet using trafgen from 1 to the other with "{eth(daddr=$INCORRECT_MAC...}", verified that iproute2 showed the counter was incremented. (Also had to modify iproute2 to show the stat, additional patch for that coming next.) Signed-off-by: Jeffrey Ji Reviewed-by: Brian Vazquez Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/20220406172600.1141083-1-jeffreyjilinux@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 7e7b2a72e473..28ea4f8269d4 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -203,6 +203,7 @@ struct net_device_core_stats { local_t rx_dropped; local_t tx_dropped; local_t rx_nohandler; + local_t rx_otherhost_dropped; } __aligned(4 * sizeof(local_t)); #include @@ -3837,6 +3838,7 @@ static inline void dev_core_stats_##FIELD##_inc(struct net_device *dev) \ DEV_CORE_STATS_INC(rx_dropped) DEV_CORE_STATS_INC(tx_dropped) DEV_CORE_STATS_INC(rx_nohandler) +DEV_CORE_STATS_INC(rx_otherhost_dropped) static __always_inline int ____dev_forward_skb(struct net_device *dev, struct sk_buff *skb, -- cgit From b71597edfaade119157ded98991bac7160be80c2 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Fri, 8 Apr 2022 10:00:42 +0200 Subject: mmc: core: improve API to make clear mmc_hw_reset is for cards To make it unambiguous that mmc_hw_reset() is for cards and not for controllers, we make the function argument mmc_card instead of mmc_host. Also, all users are converted. Suggested-by: Ulf Hansson Signed-off-by: Wolfram Sang Acked-by: Kalle Valo Link: https://lore.kernel.org/r/20220408080045.6497-2-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson --- include/linux/mmc/core.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mmc/core.h b/include/linux/mmc/core.h index 71101d1ec825..de5c64bbdb72 100644 --- a/include/linux/mmc/core.h +++ b/include/linux/mmc/core.h @@ -175,7 +175,7 @@ void mmc_wait_for_req(struct mmc_host *host, struct mmc_request *mrq); int mmc_wait_for_cmd(struct mmc_host *host, struct mmc_command *cmd, int retries); -int mmc_hw_reset(struct mmc_host *host); +int mmc_hw_reset(struct mmc_card *card); int mmc_sw_reset(struct mmc_host *host); void mmc_set_data_timeout(struct mmc_data *data, const struct mmc_card *card); -- cgit From bdae79651453df0bca20963fc2ab970146ef2a37 Mon Sep 17 00:00:00 2001 From: Shuai Xue Date: Tue, 8 Mar 2022 22:40:51 +0800 Subject: efi/cper: Add a cper_mem_err_status_str() to decode error description Introduce a new helper function cper_mem_err_status_str() to decode the error status value into a human readable string. [ bp: Massage. ] Signed-off-by: Shuai Xue Signed-off-by: Borislav Petkov Acked-by: Ard Biesheuvel Link: https://lore.kernel.org/r/20220308144053.49090-2-xueshuai@linux.alibaba.com --- include/linux/cper.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/cper.h b/include/linux/cper.h index 6a511a1078ca..5b1dd27b317d 100644 --- a/include/linux/cper.h +++ b/include/linux/cper.h @@ -558,6 +558,7 @@ extern const char *const cper_proc_error_type_strs[4]; u64 cper_next_record_id(void); const char *cper_severity_str(unsigned int); const char *cper_mem_err_type_str(unsigned int); +const char *cper_mem_err_status_str(u64 status); void cper_print_bits(const char *prefix, unsigned int bits, const char * const strs[], unsigned int strs_size); void cper_mem_err_pack(const struct cper_sec_mem_err *, -- cgit From ed27b5df3877458eb24615fd9c202178660db009 Mon Sep 17 00:00:00 2001 From: Shuai Xue Date: Tue, 8 Mar 2022 22:40:52 +0800 Subject: EDAC/ghes: Unify CPER memory error location reporting Switch the GHES EDAC memory error reporting functions to use the common CPER ones and get rid of code duplication. [ bp: - rewrite commit message, remove useless text - rip out useless reformatting - align function params on the opening brace - rename function to a more descriptive name - drop useless function exports - handle buffer lengths properly when printing other detail - remove useless casting ] Signed-off-by: Shuai Xue Signed-off-by: Borislav Petkov Link: https://lore.kernel.org/r/20220308144053.49090-3-xueshuai@linux.alibaba.com --- include/linux/cper.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cper.h b/include/linux/cper.h index 5b1dd27b317d..eacb7dd7b3af 100644 --- a/include/linux/cper.h +++ b/include/linux/cper.h @@ -569,5 +569,7 @@ void cper_print_proc_arm(const char *pfx, const struct cper_sec_proc_arm *proc); void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc); +int cper_mem_err_location(struct cper_mem_err_compact *mem, char *msg); +int cper_dimm_err_location(struct cper_mem_err_compact *mem, char *msg); #endif -- cgit From 0c2cae09a765b1c1d842eb9328982976ec735926 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Thu, 17 Mar 2022 11:33:11 +0200 Subject: gpiolib: acpi: Convert type for pin to be unsigned A pin that comes from ACPI tables is of unsigned type. This also applies to the internal APIs which use unsigned int to store the pin. Convert type for pin to be unsigned in the places where it's not yet true. While at it, add a stub for acpi_get_and_request_gpiod() for the sake of consistency in the APIs. Signed-off-by: Andy Shevchenko --- include/linux/gpio/consumer.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/gpio/consumer.h b/include/linux/gpio/consumer.h index c3aa8b330e1c..e71f6e1bfafe 100644 --- a/include/linux/gpio/consumer.h +++ b/include/linux/gpio/consumer.h @@ -688,7 +688,7 @@ void acpi_dev_remove_driver_gpios(struct acpi_device *adev); int devm_acpi_dev_add_driver_gpios(struct device *dev, const struct acpi_gpio_mapping *gpios); -struct gpio_desc *acpi_get_and_request_gpiod(char *path, int pin, char *label); +struct gpio_desc *acpi_get_and_request_gpiod(char *path, unsigned int pin, char *label); #else /* CONFIG_GPIOLIB && CONFIG_ACPI */ @@ -705,6 +705,12 @@ static inline int devm_acpi_dev_add_driver_gpios(struct device *dev, return -ENXIO; } +static inline struct gpio_desc *acpi_get_and_request_gpiod(char *path, unsigned int pin, + char *label) +{ + return ERR_PTR(-ENOSYS); +} + #endif /* CONFIG_GPIOLIB && CONFIG_ACPI */ -- cgit From 85ebb1a6bd62147ebcfa70500d513331a8daf9e0 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Fri, 1 Apr 2022 13:35:52 +0300 Subject: gpiolib: Introduce for_each_gpiochip_node() loop helper Introduce for_each_gpiochip_node() loop helper which iterates over the GPIO controller child nodes of a given device. Signed-off-by: Andy Shevchenko Reviewed-by: Geert Uytterhoeven Tested-by: Geert Uytterhoeven Acked-by: Bartosz Golaszewski --- include/linux/gpio/driver.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h index 98c93510640e..bfc91f122d5f 100644 --- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -3,13 +3,14 @@ #define __LINUX_GPIO_DRIVER_H #include -#include #include #include #include #include #include #include +#include +#include struct gpio_desc; struct of_phandle_args; @@ -750,4 +751,8 @@ static inline void gpiochip_unlock_as_irq(struct gpio_chip *gc, } #endif /* CONFIG_GPIOLIB */ +#define for_each_gpiochip_node(dev, child) \ + device_for_each_child_node(dev, child) \ + if (!fwnode_property_present(child, "gpio-controller")) {} else + #endif /* __LINUX_GPIO_DRIVER_H */ -- cgit From 0b19dde90ad004592792a928c75e80612be3e2e8 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Fri, 1 Apr 2022 13:35:53 +0300 Subject: gpiolib: Introduce gpiochip_node_count() helper The gpiochip_node_count() helper iterates over the device child nodes that have the "gpio-controller" property set. It returns the number of such nodes under a given device. Signed-off-by: Andy Shevchenko Reviewed-by: Geert Uytterhoeven Tested-by: Geert Uytterhoeven Acked-by: Bartosz Golaszewski --- include/linux/gpio/driver.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h index bfc91f122d5f..12de0b22b4ef 100644 --- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -755,4 +755,15 @@ static inline void gpiochip_unlock_as_irq(struct gpio_chip *gc, device_for_each_child_node(dev, child) \ if (!fwnode_property_present(child, "gpio-controller")) {} else +static inline unsigned int gpiochip_node_count(struct device *dev) +{ + struct fwnode_handle *child; + unsigned int count = 0; + + for_each_gpiochip_node(dev, child) + count++; + + return count; +} + #endif /* __LINUX_GPIO_DRIVER_H */ -- cgit From 2c547f299827c12244d613eb2ee3616d88f56088 Mon Sep 17 00:00:00 2001 From: Yue Hu Date: Wed, 6 Apr 2022 11:50:17 +0800 Subject: fscache: Remove the cookie parameter from fscache_clear_page_bits() The cookie is not used at all, remove it and update the usage in io.c and afs/write.c (which is the only user outside of fscache currently) at the same time. [DH: Amended the documentation also] Signed-off-by: Yue Hu Signed-off-by: David Howells cc: linux-cachefs@redhat.com Link: https://listman.redhat.com/archives/linux-cachefs/2022-April/006659.html --- include/linux/fscache.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fscache.h b/include/linux/fscache.h index 6727fb0db619..e25539072463 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -573,7 +573,6 @@ int fscache_write(struct netfs_cache_resources *cres, /** * fscache_clear_page_bits - Clear the PG_fscache bits from a set of pages - * @cookie: The cookie representing the cache object * @mapping: The netfs inode to use as the source * @start: The start position in @mapping * @len: The amount of data to unlock @@ -582,8 +581,7 @@ int fscache_write(struct netfs_cache_resources *cres, * Clear the PG_fscache flag from a sequence of pages and wake up anyone who's * waiting. */ -static inline void fscache_clear_page_bits(struct fscache_cookie *cookie, - struct address_space *mapping, +static inline void fscache_clear_page_bits(struct address_space *mapping, loff_t start, size_t len, bool caching) { -- cgit From a431dbbc540532b7465eae4fc8b56a85a9fc7d17 Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Fri, 8 Apr 2022 13:09:01 -0700 Subject: mm/sparsemem: fix 'mem_section' will never be NULL gcc 12 warning The gcc 12 compiler reports a "'mem_section' will never be NULL" warning on the following code: static inline struct mem_section *__nr_to_section(unsigned long nr) { #ifdef CONFIG_SPARSEMEM_EXTREME if (!mem_section) return NULL; #endif if (!mem_section[SECTION_NR_TO_ROOT(nr)]) return NULL; : It happens with CONFIG_SPARSEMEM_EXTREME off. The mem_section definition is #ifdef CONFIG_SPARSEMEM_EXTREME extern struct mem_section **mem_section; #else extern struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]; #endif In the !CONFIG_SPARSEMEM_EXTREME case, mem_section is a static 2-dimensional array and so the check "!mem_section[SECTION_NR_TO_ROOT(nr)]" doesn't make sense. Fix this warning by moving the "!mem_section[SECTION_NR_TO_ROOT(nr)]" check up inside the CONFIG_SPARSEMEM_EXTREME block and adding an explicit NR_SECTION_ROOTS check to make sure that there is no out-of-bound array access. Link: https://lkml.kernel.org/r/20220331180246.2746210-1-longman@redhat.com Fixes: 3e347261a80b ("sparsemem extreme implementation") Signed-off-by: Waiman Long Reported-by: Justin Forbes Cc: "Kirill A . Shutemov" Cc: Ingo Molnar Cc: Rafael Aquini Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mmzone.h | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 962b14d403e8..46ffab808f03 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1397,13 +1397,16 @@ static inline unsigned long *section_to_usemap(struct mem_section *ms) static inline struct mem_section *__nr_to_section(unsigned long nr) { + unsigned long root = SECTION_NR_TO_ROOT(nr); + + if (unlikely(root >= NR_SECTION_ROOTS)) + return NULL; + #ifdef CONFIG_SPARSEMEM_EXTREME - if (!mem_section) + if (!mem_section || !mem_section[root]) return NULL; #endif - if (!mem_section[SECTION_NR_TO_ROOT(nr)]) - return NULL; - return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK]; + return &mem_section[root][nr & SECTION_ROOT_MASK]; } extern size_t mem_section_usage_size(void); -- cgit From 2fa33b3518a8da0a5345b7ae0064223b5e4e156f Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Wed, 6 Apr 2022 11:25:36 +0300 Subject: net/mlx5_fpga: Drop INNOVA IPsec support Mellanox INNOVA IPsec cards are EOL in Nov, 2019 [1]. As such, the code is unmaintained, untested and not in-use by any upstream/distro oriented customers. In order to reduce code complexity, drop the kernel code. [1] https://network.nvidia.com/related-docs/eol/LCR-000535.pdf Link: https://lore.kernel.org/r/2afe88ec5020a491079eacf6fe3c89b64d65195c.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem Signed-off-by: Leon Romanovsky --- include/linux/mlx5/mlx5_ifc_fpga.h | 148 ------------------------------------- 1 file changed, 148 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc_fpga.h b/include/linux/mlx5/mlx5_ifc_fpga.h index e3d824f6a309..45c7c0d67635 100644 --- a/include/linux/mlx5/mlx5_ifc_fpga.h +++ b/include/linux/mlx5/mlx5_ifc_fpga.h @@ -386,68 +386,6 @@ struct mlx5_ifc_fpga_destroy_qp_out_bits { u8 reserved_at_40[0x40]; }; -struct mlx5_ifc_ipsec_extended_cap_bits { - u8 encapsulation[0x20]; - - u8 reserved_0[0x12]; - u8 v2_command[0x1]; - u8 udp_encap[0x1]; - u8 rx_no_trailer[0x1]; - u8 ipv4_fragment[0x1]; - u8 ipv6[0x1]; - u8 esn[0x1]; - u8 lso[0x1]; - u8 transport_and_tunnel_mode[0x1]; - u8 tunnel_mode[0x1]; - u8 transport_mode[0x1]; - u8 ah_esp[0x1]; - u8 esp[0x1]; - u8 ah[0x1]; - u8 ipv4_options[0x1]; - - u8 auth_alg[0x20]; - - u8 enc_alg[0x20]; - - u8 sa_cap[0x20]; - - u8 reserved_1[0x10]; - u8 number_of_ipsec_counters[0x10]; - - u8 ipsec_counters_addr_low[0x20]; - u8 ipsec_counters_addr_high[0x20]; -}; - -struct mlx5_ifc_ipsec_counters_bits { - u8 dec_in_packets[0x40]; - - u8 dec_out_packets[0x40]; - - u8 dec_bypass_packets[0x40]; - - u8 enc_in_packets[0x40]; - - u8 enc_out_packets[0x40]; - - u8 enc_bypass_packets[0x40]; - - u8 drop_dec_packets[0x40]; - - u8 failed_auth_dec_packets[0x40]; - - u8 drop_enc_packets[0x40]; - - u8 success_add_sa[0x40]; - - u8 fail_add_sa[0x40]; - - u8 success_delete_sa[0x40]; - - u8 fail_delete_sa[0x40]; - - u8 dropped_cmd[0x40]; -}; - enum { MLX5_FPGA_QP_ERROR_EVENT_SYNDROME_RETRY_COUNTER_EXPIRED = 0x1, MLX5_FPGA_QP_ERROR_EVENT_SYNDROME_RNR_EXPIRED = 0x2, @@ -464,90 +402,4 @@ struct mlx5_ifc_fpga_qp_error_event_bits { u8 reserved_at_c0[0x8]; u8 fpga_qpn[0x18]; }; -enum mlx5_ifc_fpga_ipsec_response_syndrome { - MLX5_FPGA_IPSEC_RESPONSE_SUCCESS = 0, - MLX5_FPGA_IPSEC_RESPONSE_ILLEGAL_REQUEST = 1, - MLX5_FPGA_IPSEC_RESPONSE_SADB_ISSUE = 2, - MLX5_FPGA_IPSEC_RESPONSE_WRITE_RESPONSE_ISSUE = 3, -}; - -struct mlx5_ifc_fpga_ipsec_cmd_resp { - __be32 syndrome; - union { - __be32 sw_sa_handle; - __be32 flags; - }; - u8 reserved[24]; -} __packed; - -enum mlx5_ifc_fpga_ipsec_cmd_opcode { - MLX5_FPGA_IPSEC_CMD_OP_ADD_SA = 0, - MLX5_FPGA_IPSEC_CMD_OP_DEL_SA = 1, - MLX5_FPGA_IPSEC_CMD_OP_ADD_SA_V2 = 2, - MLX5_FPGA_IPSEC_CMD_OP_DEL_SA_V2 = 3, - MLX5_FPGA_IPSEC_CMD_OP_MOD_SA_V2 = 4, - MLX5_FPGA_IPSEC_CMD_OP_SET_CAP = 5, -}; - -enum mlx5_ifc_fpga_ipsec_cap { - MLX5_FPGA_IPSEC_CAP_NO_TRAILER = BIT(0), -}; - -struct mlx5_ifc_fpga_ipsec_cmd_cap { - __be32 cmd; - __be32 flags; - u8 reserved[24]; -} __packed; - -enum mlx5_ifc_fpga_ipsec_sa_flags { - MLX5_FPGA_IPSEC_SA_ESN_EN = BIT(0), - MLX5_FPGA_IPSEC_SA_ESN_OVERLAP = BIT(1), - MLX5_FPGA_IPSEC_SA_IPV6 = BIT(2), - MLX5_FPGA_IPSEC_SA_DIR_SX = BIT(3), - MLX5_FPGA_IPSEC_SA_SPI_EN = BIT(4), - MLX5_FPGA_IPSEC_SA_SA_VALID = BIT(5), - MLX5_FPGA_IPSEC_SA_IP_ESP = BIT(6), - MLX5_FPGA_IPSEC_SA_IP_AH = BIT(7), -}; - -enum mlx5_ifc_fpga_ipsec_sa_enc_mode { - MLX5_FPGA_IPSEC_SA_ENC_MODE_NONE = 0, - MLX5_FPGA_IPSEC_SA_ENC_MODE_AES_GCM_128_AUTH_128 = 1, - MLX5_FPGA_IPSEC_SA_ENC_MODE_AES_GCM_256_AUTH_128 = 3, -}; - -struct mlx5_ifc_fpga_ipsec_sa_v1 { - __be32 cmd; - u8 key_enc[32]; - u8 key_auth[32]; - __be32 sip[4]; - __be32 dip[4]; - union { - struct { - __be32 reserved; - u8 salt_iv[8]; - __be32 salt; - } __packed gcm; - struct { - u8 salt[16]; - } __packed cbc; - }; - __be32 spi; - __be32 sw_sa_handle; - __be16 tfclen; - u8 enc_mode; - u8 reserved1[2]; - u8 flags; - u8 reserved2[2]; -}; - -struct mlx5_ifc_fpga_ipsec_sa { - struct mlx5_ifc_fpga_ipsec_sa_v1 ipsec_sa_v1; - __be16 udp_sp; - __be16 udp_dp; - u8 reserved1[4]; - __be32 esn; - __be16 vid; /* only 12 bits, rest is reserved */ - __be16 reserved2; -} __packed; #endif /* MLX5_IFC_FPGA_H */ -- cgit From de8bdb476908e64805df4bfbad20618cbb1f9ffa Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Wed, 6 Apr 2022 11:25:42 +0300 Subject: RDMA/mlx5: Drop crypto flow steering API The mlx5 flow steering crypto API was intended to be used in FPGA devices, which is not supported for years already. The removal of mlx5 crypto FPGA code together with inability to configure encryption keys makes the low steering API completely unusable. So delete the code, so any ESP flow steering requests will fail with not supported error, as it is happening now anyway as no device support this type of API. Link: https://lore.kernel.org/r/634a5face7734381463d809bfb89850f6998deac.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem Signed-off-by: Leon Romanovsky --- include/linux/mlx5/accel.h | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/accel.h b/include/linux/mlx5/accel.h index dacf69516002..af67d51308cf 100644 --- a/include/linux/mlx5/accel.h +++ b/include/linux/mlx5/accel.h @@ -111,10 +111,6 @@ struct mlx5_accel_esp_xfrm { struct mlx5_accel_esp_xfrm_attrs attrs; }; -enum { - MLX5_ACCEL_XFRM_FLAG_REQUIRE_METADATA = 1UL << 0, -}; - enum mlx5_accel_ipsec_cap { MLX5_ACCEL_IPSEC_CAP_DEVICE = 1 << 0, MLX5_ACCEL_IPSEC_CAP_REQUIRED_METADATA = 1 << 1, @@ -132,8 +128,7 @@ u32 mlx5_accel_ipsec_device_caps(struct mlx5_core_dev *mdev); struct mlx5_accel_esp_xfrm * mlx5_accel_esp_create_xfrm(struct mlx5_core_dev *mdev, - const struct mlx5_accel_esp_xfrm_attrs *attrs, - u32 flags); + const struct mlx5_accel_esp_xfrm_attrs *attrs); void mlx5_accel_esp_destroy_xfrm(struct mlx5_accel_esp_xfrm *xfrm); int mlx5_accel_esp_modify_xfrm(struct mlx5_accel_esp_xfrm *xfrm, const struct mlx5_accel_esp_xfrm_attrs *attrs); @@ -144,8 +139,10 @@ static inline u32 mlx5_accel_ipsec_device_caps(struct mlx5_core_dev *mdev) { ret static inline struct mlx5_accel_esp_xfrm * mlx5_accel_esp_create_xfrm(struct mlx5_core_dev *mdev, - const struct mlx5_accel_esp_xfrm_attrs *attrs, - u32 flags) { return ERR_PTR(-EOPNOTSUPP); } + const struct mlx5_accel_esp_xfrm_attrs *attrs) +{ + return ERR_PTR(-EOPNOTSUPP); +} static inline void mlx5_accel_esp_destroy_xfrm(struct mlx5_accel_esp_xfrm *xfrm) {} static inline int -- cgit From 2451da081a343e079d9f5a7b063fcdf0bc439aa8 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Wed, 6 Apr 2022 11:25:46 +0300 Subject: net/mlx5: Unify device IPsec capabilities check Merge two different function to one in order to provide coherent picture if the device is IPsec capable or not. Link: https://lore.kernel.org/r/8f10ea06ad19c6f651e9fb33921009658f01e1d5.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem Signed-off-by: Leon Romanovsky --- include/linux/mlx5/accel.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/accel.h b/include/linux/mlx5/accel.h index af67d51308cf..9145e2d37c0e 100644 --- a/include/linux/mlx5/accel.h +++ b/include/linux/mlx5/accel.h @@ -124,7 +124,7 @@ enum mlx5_accel_ipsec_cap { #ifdef CONFIG_MLX5_ACCEL -u32 mlx5_accel_ipsec_device_caps(struct mlx5_core_dev *mdev); +u32 mlx5_ipsec_device_caps(struct mlx5_core_dev *mdev); struct mlx5_accel_esp_xfrm * mlx5_accel_esp_create_xfrm(struct mlx5_core_dev *mdev, @@ -135,7 +135,10 @@ int mlx5_accel_esp_modify_xfrm(struct mlx5_accel_esp_xfrm *xfrm, #else -static inline u32 mlx5_accel_ipsec_device_caps(struct mlx5_core_dev *mdev) { return 0; } +static inline u32 mlx5_ipsec_device_caps(struct mlx5_core_dev *mdev) +{ + return 0; +} static inline struct mlx5_accel_esp_xfrm * mlx5_accel_esp_create_xfrm(struct mlx5_core_dev *mdev, -- cgit From 54deb0e77561973f4ca4515e18ab972c281eea1d Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Wed, 6 Apr 2022 11:25:48 +0300 Subject: net/mlx5: Remove not-needed IPsec config In current code, the CONFIG_MLX5_IPSEC and CONFIG_MLX5_EN_IPSEC are the same. So remove useless indirection. Link: https://lore.kernel.org/r/fd14492cbc01a0d51a5bfedde02bcd2154123fde.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem Signed-off-by: Leon Romanovsky --- include/linux/mlx5/accel.h | 4 ++-- include/linux/mlx5/driver.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/accel.h b/include/linux/mlx5/accel.h index 9145e2d37c0e..73e4d50a9f02 100644 --- a/include/linux/mlx5/accel.h +++ b/include/linux/mlx5/accel.h @@ -122,7 +122,7 @@ enum mlx5_accel_ipsec_cap { MLX5_ACCEL_IPSEC_CAP_TX_IV_IS_ESN = 1 << 7, }; -#ifdef CONFIG_MLX5_ACCEL +#ifdef CONFIG_MLX5_EN_IPSEC u32 mlx5_ipsec_device_caps(struct mlx5_core_dev *mdev); @@ -152,5 +152,5 @@ static inline int mlx5_accel_esp_modify_xfrm(struct mlx5_accel_esp_xfrm *xfrm, const struct mlx5_accel_esp_xfrm_attrs *attrs) { return -EOPNOTSUPP; } -#endif /* CONFIG_MLX5_ACCEL */ +#endif /* CONFIG_MLX5_EN_IPSEC */ #endif /* __MLX5_ACCEL_H__ */ diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 9424503eb8d3..5af53c035949 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -778,7 +778,7 @@ struct mlx5_core_dev { #ifdef CONFIG_MLX5_FPGA struct mlx5_fpga_device *fpga; #endif -#ifdef CONFIG_MLX5_ACCEL +#ifdef CONFIG_MLX5_EN_IPSEC const struct mlx5_accel_ipsec_ops *ipsec_ops; #endif struct mlx5_clock clock; -- cgit From f2b41b32cde8453a0a26875261f0e26809c2805a Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Wed, 6 Apr 2022 11:25:51 +0300 Subject: net/mlx5: Remove ipsec_ops function table There is only one IPsec implementation and ipsec_ops is not needed at all in this situation. Together with removal of ipsec_ops, we can drop the entry checks as these functions are called for IPsec devices only. Link: https://lore.kernel.org/r/bc8dd1c8a77b65dbf5e2cf92c813ffaca2505c5f.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem Signed-off-by: Leon Romanovsky --- include/linux/mlx5/driver.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 5af53c035949..ff47d49d8be4 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -777,9 +777,6 @@ struct mlx5_core_dev { } roce; #ifdef CONFIG_MLX5_FPGA struct mlx5_fpga_device *fpga; -#endif -#ifdef CONFIG_MLX5_EN_IPSEC - const struct mlx5_accel_ipsec_ops *ipsec_ops; #endif struct mlx5_clock clock; struct mlx5_ib_clock_info *clock_info; -- cgit From 2984287c4c19949d7eb451dcad0bd5c54a2a376f Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Wed, 6 Apr 2022 11:25:52 +0300 Subject: net/mlx5: Remove not-implemented IPsec capabilities Clean a capabilities enum to remove not-implemented bits. Link: https://lore.kernel.org/r/1044bb7b779107ff38e48e3f6553421104f3f819.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem Signed-off-by: Leon Romanovsky --- include/linux/mlx5/accel.h | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/accel.h b/include/linux/mlx5/accel.h index 73e4d50a9f02..0f2596297f6a 100644 --- a/include/linux/mlx5/accel.h +++ b/include/linux/mlx5/accel.h @@ -113,13 +113,10 @@ struct mlx5_accel_esp_xfrm { enum mlx5_accel_ipsec_cap { MLX5_ACCEL_IPSEC_CAP_DEVICE = 1 << 0, - MLX5_ACCEL_IPSEC_CAP_REQUIRED_METADATA = 1 << 1, - MLX5_ACCEL_IPSEC_CAP_ESP = 1 << 2, - MLX5_ACCEL_IPSEC_CAP_IPV6 = 1 << 3, - MLX5_ACCEL_IPSEC_CAP_LSO = 1 << 4, - MLX5_ACCEL_IPSEC_CAP_RX_NO_TRAILER = 1 << 5, - MLX5_ACCEL_IPSEC_CAP_ESN = 1 << 6, - MLX5_ACCEL_IPSEC_CAP_TX_IV_IS_ESN = 1 << 7, + MLX5_ACCEL_IPSEC_CAP_ESP = 1 << 1, + MLX5_ACCEL_IPSEC_CAP_IPV6 = 1 << 2, + MLX5_ACCEL_IPSEC_CAP_LSO = 1 << 3, + MLX5_ACCEL_IPSEC_CAP_ESN = 1 << 4, }; #ifdef CONFIG_MLX5_EN_IPSEC -- cgit From efaa0227f6c6a5073951b20cf2f8c63c4155306c Mon Sep 17 00:00:00 2001 From: tangmeng Date: Tue, 15 Feb 2022 14:50:19 +0800 Subject: timers: Move timer sysctl into the timer code This is part of the effort to reduce kernel/sysctl.c to only contain the core logic. Signed-off-by: tangmeng Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/20220215065019.7520-1-tangmeng@uniontech.com --- include/linux/timer.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/timer.h b/include/linux/timer.h index fda13c9d1256..648f00105f58 100644 --- a/include/linux/timer.h +++ b/include/linux/timer.h @@ -196,14 +196,6 @@ extern void init_timers(void); struct hrtimer; extern enum hrtimer_restart it_real_fn(struct hrtimer *); -#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ_COMMON) -struct ctl_table; - -extern unsigned int sysctl_timer_migration; -int timer_migration_handler(struct ctl_table *table, int write, - void *buffer, size_t *lenp, loff_t *ppos); -#endif - unsigned long __round_jiffies(unsigned long j, int cpu); unsigned long __round_jiffies_relative(unsigned long j, int cpu); unsigned long round_jiffies(unsigned long j); -- cgit From a8b6d6708bb682108d8c899bc0cb7873240daf8a Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Mon, 7 Feb 2022 15:38:28 +0100 Subject: iio: core: Enhance the kernel doc of modes and currentmodes iio_dev entries Let's provide more details about these two variables because their understanding may not be straightforward for someone not used to the IIO subsystem internal logic. The different modes will soon be also be more documented for the same reason. Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/r/20220207143840.707510-2-miquel.raynal@bootlin.com Signed-off-by: Jonathan Cameron --- include/linux/iio/iio.h | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h index faf00f2c0be6..f191b80466cd 100644 --- a/include/linux/iio/iio.h +++ b/include/linux/iio/iio.h @@ -488,8 +488,15 @@ struct iio_buffer_setup_ops { /** * struct iio_dev - industrial I/O device - * @modes: [DRIVER] operating modes supported by device - * @currentmode: [INTERN] current operating mode + * @modes: [DRIVER] bitmask listing all the operating modes + * supported by the IIO device. This list should be + * initialized before registering the IIO device. It can + * also be filed up by the IIO core, as a result of + * enabling particular features in the driver + * (see iio_triggered_event_setup()). + * @currentmode: [INTERN] operating mode currently in use, may be + * eventually checked by device drivers but should be + * considered read-only as this is a core internal bit * @dev: [DRIVER] device structure, should be assigned a parent * and owner * @buffer: [DRIVER] any buffer present -- cgit From 474010127e2505fc463236470908e1ff5ddb3578 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Mon, 7 Feb 2022 15:38:33 +0100 Subject: iio: st_sensors: Add a local lock for protecting odr Right now the (framework) mlock lock is (ab)used for multiple purposes: 1- protecting concurrent accesses over the odr local cache 2- avoid changing samplig frequency whilst buffer is running Let's start by handling situation #1 with a local lock. Suggested-by: Jonathan Cameron Cc: Denis Ciocca Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/r/20220207143840.707510-7-miquel.raynal@bootlin.com Signed-off-by: Jonathan Cameron --- include/linux/iio/common/st_sensors.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iio/common/st_sensors.h b/include/linux/iio/common/st_sensors.h index 22f67845cdd3..db4a1b260348 100644 --- a/include/linux/iio/common/st_sensors.h +++ b/include/linux/iio/common/st_sensors.h @@ -237,6 +237,7 @@ struct st_sensor_settings { * @hw_irq_trigger: if we're using the hardware interrupt on the sensor. * @hw_timestamp: Latest timestamp from the interrupt handler, when in use. * @buffer_data: Data used by buffer part. + * @odr_lock: Local lock for preventing concurrent ODR accesses/changes */ struct st_sensor_data { struct iio_trigger *trig; @@ -261,6 +262,8 @@ struct st_sensor_data { s64 hw_timestamp; char buffer_data[ST_SENSORS_MAX_BUFFER_SIZE] ____cacheline_aligned; + + struct mutex odr_lock; }; #ifdef CONFIG_IIO_BUFFER -- cgit From 2f53b4adfede66f1bc1c8bb7efd7ced2bad1191a Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Mon, 7 Feb 2022 15:38:36 +0100 Subject: iio: Un-inline iio_buffer_enabled() As we are going to hide the currentmode inside the opaque structure, this helper would soon need to call a non-inline function which would simply drop the benefit of having the helper defined inline in a header. One alternative is to move this helper in the core as there is no more interest in defining it inline in a header. We will pay the minor cost either way. Let's do like the iio_device_id() helper which also refers to the opaque structure and gets defined in the core. Suggested-by: Jonathan Cameron Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/r/20220207143840.707510-10-miquel.raynal@bootlin.com Signed-off-by: Jonathan Cameron --- include/linux/iio/iio.h | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h index f191b80466cd..faabb852128a 100644 --- a/include/linux/iio/iio.h +++ b/include/linux/iio/iio.h @@ -550,6 +550,7 @@ struct iio_dev { }; int iio_device_id(struct iio_dev *indio_dev); +bool iio_buffer_enabled(struct iio_dev *indio_dev); const struct iio_chan_spec *iio_find_channel_from_si(struct iio_dev *indio_dev, int si); @@ -679,16 +680,6 @@ struct iio_dev *devm_iio_device_alloc(struct device *parent, int sizeof_priv); __printf(2, 3) struct iio_trigger *devm_iio_trigger_alloc(struct device *parent, const char *fmt, ...); -/** - * iio_buffer_enabled() - helper function to test if the buffer is enabled - * @indio_dev: IIO device structure for device - **/ -static inline bool iio_buffer_enabled(struct iio_dev *indio_dev) -{ - return indio_dev->currentmode - & (INDIO_BUFFER_TRIGGERED | INDIO_BUFFER_HARDWARE | - INDIO_BUFFER_SOFTWARE); -} /** * iio_get_debugfs_dentry() - helper function to get the debugfs_dentry -- cgit From 8c576f87ad7eb639b8bd4472a9bb830e0696dda5 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Mon, 7 Feb 2022 15:38:37 +0100 Subject: iio: core: Hide read accesses to iio_dev->currentmode In order to later move this variable within the opaque structure, let's create a helper for accessing it in read-only mode. This helper will be exposed to device drivers and kept accessible for the few that could need it. The write access to this variable however should be fully reserved to the core so in a second step we will hide this variable into the opaque structure. Cc: Eugen Hristev Cc: Nicolas Ferre Cc: Alexandre Belloni Cc: Ludovic Desroches Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/r/20220207143840.707510-11-miquel.raynal@bootlin.com Signed-off-by: Jonathan Cameron --- include/linux/iio/iio.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h index faabb852128a..31098ffa7dc9 100644 --- a/include/linux/iio/iio.h +++ b/include/linux/iio/iio.h @@ -550,6 +550,7 @@ struct iio_dev { }; int iio_device_id(struct iio_dev *indio_dev); +int iio_device_get_current_mode(struct iio_dev *indio_dev); bool iio_buffer_enabled(struct iio_dev *indio_dev); const struct iio_chan_spec -- cgit From 51570c9d4b3a678f77a50ac139f67290e946ec86 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Mon, 7 Feb 2022 15:38:38 +0100 Subject: iio: core: Move the currentmode entry to the opaque structure This entry should, under no situation, be modified by device drivers. Now that we have limited its read access to device drivers really needing it and did so through a dedicated helper, we can easily move this variable to the opaque structure in order to prevent any further modification from non-authorized code (out of the core, basically). Signed-off-by: Miquel Raynal Reviewed-by: Alexandru Ardelean Link: https://lore.kernel.org/r/20220207143840.707510-12-miquel.raynal@bootlin.com Signed-off-by: Jonathan Cameron --- include/linux/iio/iio-opaque.h | 4 ++++ include/linux/iio/iio.h | 4 ---- 2 files changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iio/iio-opaque.h b/include/linux/iio/iio-opaque.h index 2be12b7b5dc5..6b3586b3f952 100644 --- a/include/linux/iio/iio-opaque.h +++ b/include/linux/iio/iio-opaque.h @@ -7,6 +7,9 @@ * struct iio_dev_opaque - industrial I/O device opaque information * @indio_dev: public industrial I/O device information * @id: used to identify device internally + * @currentmode: operating mode currently in use, may be eventually + * checked by device drivers but should be considered + * read-only as this is a core internal bit * @driver_module: used to make it harder to undercut users * @info_exist_lock: lock to prevent use during removal * @trig_readonly: mark the current trigger immutable @@ -36,6 +39,7 @@ */ struct iio_dev_opaque { struct iio_dev indio_dev; + int currentmode; int id; struct module *driver_module; struct mutex info_exist_lock; diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h index 31098ffa7dc9..85cb924debd9 100644 --- a/include/linux/iio/iio.h +++ b/include/linux/iio/iio.h @@ -494,9 +494,6 @@ struct iio_buffer_setup_ops { * also be filed up by the IIO core, as a result of * enabling particular features in the driver * (see iio_triggered_event_setup()). - * @currentmode: [INTERN] operating mode currently in use, may be - * eventually checked by device drivers but should be - * considered read-only as this is a core internal bit * @dev: [DRIVER] device structure, should be assigned a parent * and owner * @buffer: [DRIVER] any buffer present @@ -523,7 +520,6 @@ struct iio_buffer_setup_ops { */ struct iio_dev { int modes; - int currentmode; struct device dev; struct iio_buffer *buffer; -- cgit From f67c6c73cb07a4778425a2064640333ef7bfa42b Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Mon, 7 Feb 2022 15:38:39 +0100 Subject: iio: core: Simplify the registration of kfifo buffers Among all the users of the kfifo buffers, no one uses the INDIO_BUFFER_HARDWARE mode. So let's take this as a general rule and simplify a little bit the internals - overall the documentation - by eliminating unused specific cases. Use the INDIO_BUFFER_SOFTWARE mode by default with kfifo buffers, which will basically mimic what all the "non direct" modes do. Cc: Benson Leung Cc: Guenter Roeck Cc: Jyoti Bhayana Cc: Jean-Baptiste Maneyrol Cc: Lorenzo Bianconi Cc: Michael Hennerich Cc: Greg Kroah-Hartman Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/r/20220207143840.707510-13-miquel.raynal@bootlin.com Signed-off-by: Jonathan Cameron --- include/linux/iio/kfifo_buf.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iio/kfifo_buf.h b/include/linux/iio/kfifo_buf.h index ccd2ceae7b25..8a83fb58232d 100644 --- a/include/linux/iio/kfifo_buf.h +++ b/include/linux/iio/kfifo_buf.h @@ -12,11 +12,10 @@ void iio_kfifo_free(struct iio_buffer *r); int devm_iio_kfifo_buffer_setup_ext(struct device *dev, struct iio_dev *indio_dev, - int mode_flags, const struct iio_buffer_setup_ops *setup_ops, const struct attribute **buffer_attrs); -#define devm_iio_kfifo_buffer_setup(dev, indio_dev, mode_flags, setup_ops) \ - devm_iio_kfifo_buffer_setup_ext((dev), (indio_dev), (mode_flags), (setup_ops), NULL) +#define devm_iio_kfifo_buffer_setup(dev, indio_dev, setup_ops) \ + devm_iio_kfifo_buffer_setup_ext((dev), (indio_dev), (setup_ops), NULL) #endif -- cgit From 4f1a22ee7b576a38dc5705837c9b0de0c7b5b064 Mon Sep 17 00:00:00 2001 From: John Garry Date: Fri, 8 Apr 2022 17:04:12 +0800 Subject: libata: Improve ATA queued command allocation Improve ATA queued command allocation as follows: - For attaining a qc tag for a SAS host we need to allocate a bit in ata_port.sas_tag_allocated bitmap. However we already have a unique tag per device in range [0, ATA_MAX_QUEUE -1] in the scsi cmnd budget token, so just use that instead. - It is a bit pointless to have ata_qc_new_init() in libata-core.c since it pokes scsi internals, so inline it in ata_scsi_qc_new() (in libata-scsi.c). Also update Doc accordingly. - Use standard SCSI helpers set_host_byte() and set_status_byte() in ata_scsi_qc_new(). Christoph Hellwig originally contributed the change to inline ata_qc_new_init(). Signed-off-by: John Garry Reviewed-by: Christoph Hellwig Signed-off-by: Damien Le Moal --- include/linux/libata.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index 9b1d3d8b1252..16107122e587 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -820,7 +820,6 @@ struct ata_port { unsigned int cbl; /* cable type; ATA_CBL_xxx */ struct ata_queued_cmd qcmd[ATA_MAX_QUEUE + 1]; - unsigned long sas_tag_allocated; /* for sas tag allocation only */ u64 qc_active; int nr_active_links; /* #links with active qcs */ unsigned int sas_last_tag; /* track next tag hw expects */ -- cgit From 9f8ed577c28813410614b418bad42285840c1a00 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Thu, 7 Apr 2022 14:20:50 +0800 Subject: net: skb: rename SKB_DROP_REASON_PTYPE_ABSENT As David Ahern suggested, the reasons for skb drops should be more general and not be code based. Therefore, rename SKB_DROP_REASON_PTYPE_ABSENT to SKB_DROP_REASON_UNHANDLED_PROTO, which is used for the cases of no L3 protocol handler, no L4 protocol handler, version extensions, etc. From previous discussion, now we have the aim to make these reasons more abstract and users based, avoiding code based. Signed-off-by: Menglong Dong Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/skbuff.h | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2394441fa3dd..173bc35a10a3 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -408,11 +408,9 @@ enum skb_drop_reason { */ SKB_DROP_REASON_XDP, /* dropped by XDP in input path */ SKB_DROP_REASON_TC_INGRESS, /* dropped in TC ingress HOOK */ - SKB_DROP_REASON_PTYPE_ABSENT, /* not packet_type found to handle - * the skb. For an etner packet, - * this means that L3 protocol is - * not supported - */ + SKB_DROP_REASON_UNHANDLED_PROTO, /* protocol not implemented + * or not supported + */ SKB_DROP_REASON_SKB_CSUM, /* sk_buff checksum computation * error */ -- cgit From b384c95a861eebf47e88695cf6a29f34e0b10b0f Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Thu, 7 Apr 2022 14:20:52 +0800 Subject: net: icmp: add skb drop reasons to icmp protocol Replace kfree_skb() used in icmp_rcv() and icmpv6_rcv() with kfree_skb_reason(). In order to get the reasons of the skb drops after icmp message handle, we change the return type of 'handler()' in 'struct icmp_control' from 'bool' to 'enum skb_drop_reason'. This may change its original intention, as 'false' means failure, but 'SKB_NOT_DROPPED_YET' means success now. Therefore, all 'handler' and the call of them need to be handled. Following 'handler' functions are involved: icmp_unreach() icmp_redirect() icmp_echo() icmp_timestamp() icmp_discard() And following new drop reasons are added: SKB_DROP_REASON_ICMP_CSUM SKB_DROP_REASON_INVALID_PROTO The reason 'INVALID_PROTO' is introduced for the case that the packet doesn't follow rfc 1122 and is dropped. This is not a common case, and I believe we can locate the problem from the data in the packet. For now, this 'INVALID_PROTO' is used for the icmp broadcasts with wrong types. Maybe there should be a document file for these reasons. For example, list all the case that causes the 'UNHANDLED_PROTO' and 'INVALID_PROTO' drop reason. Therefore, users can locate their problems according to the document. Reviewed-by: Hao Peng Reviewed-by: Jiang Biao Signed-off-by: Menglong Dong Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/skbuff.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 173bc35a10a3..9b81ba497665 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -442,6 +442,11 @@ enum skb_drop_reason { SKB_DROP_REASON_TAP_TXFILTER, /* dropped by tx filter implemented * at tun/tap, e.g., check_filter() */ + SKB_DROP_REASON_ICMP_CSUM, /* ICMP checksum error */ + SKB_DROP_REASON_INVALID_PROTO, /* the packet doesn't follow RFC + * 2211, such as a broadcasts + * ICMP_TIMESTAMP + */ SKB_DROP_REASON_MAX, }; -- cgit From 2e2ac4a3327479f7e2744cdd88a5c823f2057bad Mon Sep 17 00:00:00 2001 From: Laurent Vivier Date: Wed, 6 Apr 2022 22:15:20 +0200 Subject: tty: goldfish: Introduce gf_ioread32()/gf_iowrite32() The goldfish TTY device was clearly defined as having little-endian registers, but the switch to __raw_{read,write}l(() broke its driver when running on big-endian kernels (if anyone ever tried this). The m68k qemu implementation got this wrong, and assumed native-endian registers. While this is a bug in qemu, it is probably impossible to fix that since there is no way of knowing which other operating systems have started relying on that bug over the years. Hence revert commit da31de35cd2f ("tty: goldfish: use __raw_writel()/__raw_readl()", and define gf_ioread32()/gf_iowrite32() to be able to use accessors defined by the architecture. Cc: stable@vger.kernel.org # v5.11+ Fixes: da31de35cd2fb78f ("tty: goldfish: use __raw_writel()/__raw_readl()") Signed-off-by: Laurent Vivier Link: https://lore.kernel.org/r/20220406201523.243733-2-laurent@vivier.eu [geert: Add rationale based on Arnd's comments] Signed-off-by: Geert Uytterhoeven --- include/linux/goldfish.h | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/goldfish.h b/include/linux/goldfish.h index 12be1601fd84..bcc17f95b906 100644 --- a/include/linux/goldfish.h +++ b/include/linux/goldfish.h @@ -8,14 +8,21 @@ /* Helpers for Goldfish virtual platform */ +#ifndef gf_ioread32 +#define gf_ioread32 ioread32 +#endif +#ifndef gf_iowrite32 +#define gf_iowrite32 iowrite32 +#endif + static inline void gf_write_ptr(const void *ptr, void __iomem *portl, void __iomem *porth) { const unsigned long addr = (unsigned long)ptr; - __raw_writel(lower_32_bits(addr), portl); + gf_iowrite32(lower_32_bits(addr), portl); #ifdef CONFIG_64BIT - __raw_writel(upper_32_bits(addr), porth); + gf_iowrite32(upper_32_bits(addr), porth); #endif } @@ -23,9 +30,9 @@ static inline void gf_write_dma_addr(const dma_addr_t addr, void __iomem *portl, void __iomem *porth) { - __raw_writel(lower_32_bits(addr), portl); + gf_iowrite32(lower_32_bits(addr), portl); #ifdef CONFIG_ARCH_DMA_ADDR_T_64BIT - __raw_writel(upper_32_bits(addr), porth); + gf_iowrite32(upper_32_bits(addr), porth); #endif } -- cgit From 52126d4c03798cc55aa927fea4c776ab26b5a5f0 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sun, 6 Feb 2022 10:47:45 +0100 Subject: dmaengine: Remove a useless mutex According to lib/idr.c, The IDA handles its own locking. It is safe to call any of the IDA functions without synchronisation in your code. so the 'chan_mutex' mutex can just be removed. It is here only to protect some ida_alloc()/ida_free() calls. Signed-off-by: Christophe JAILLET Link: https://lore.kernel.org/r/7180452c1d77b039e27b6f9418e0e7d9dd33c431.1644140845.git.christophe.jaillet@wanadoo.fr Signed-off-by: Vinod Koul --- include/linux/dmaengine.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h index 842d4f7ca752..6db9e03afd0b 100644 --- a/include/linux/dmaengine.h +++ b/include/linux/dmaengine.h @@ -870,7 +870,6 @@ struct dma_device { struct device *dev; struct module *owner; struct ida chan_ida; - struct mutex chan_mutex; /* to protect chan_ida */ u32 src_addr_widths; u32 dst_addr_widths; -- cgit From 95ebe80d99de3cb849c522a1f768e5e8befa0b7c Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 20 Jan 2022 13:16:18 -0800 Subject: srcu: Fix s/is/if/ typo in srcu_node comment This commit fixed a typo in the srcu_node structure's ->srcu_have_cbs comment. While in the area, redo a couple of comments to take advantage of 100-character line lengths. Signed-off-by: Paul E. McKenney --- include/linux/srcutree.h | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h index cb1f4351e8ba..4025840ba9a3 100644 --- a/include/linux/srcutree.h +++ b/include/linux/srcutree.h @@ -47,11 +47,9 @@ struct srcu_data { */ struct srcu_node { spinlock_t __private lock; - unsigned long srcu_have_cbs[4]; /* GP seq for children */ - /* having CBs, but only */ - /* is > ->srcu_gq_seq. */ - unsigned long srcu_data_have_cbs[4]; /* Which srcu_data structs */ - /* have CBs for given GP? */ + unsigned long srcu_have_cbs[4]; /* GP seq for children having CBs, but only */ + /* if greater than ->srcu_gq_seq. */ + unsigned long srcu_data_have_cbs[4]; /* Which srcu_data structs have CBs for given GP? */ unsigned long srcu_gp_seq_needed_exp; /* Furthest future exp GP. */ struct srcu_node *srcu_parent; /* Next up in tree. */ int grplo; /* Least CPU for node. */ -- cgit From 994f706872e6ce080506bd795ecf783d5b617de6 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 24 Jan 2022 09:46:57 -0800 Subject: srcu: Make Tree SRCU able to operate without snp_node array This commit makes Tree SRCU able to operate without an snp_node array, that is, when the srcu_data structures' ->mynode pointers are NULL. This can result in high contention on the srcu_struct structure's ->lock, but only when there are lots of call_srcu(), synchronize_srcu(), and synchronize_srcu_expedited() calls. Note that when there is no snp_node array, all SRCU callbacks use CPU 0's callback queue. This is optimal in the common case of low update-side load because it removes the need to search each CPU for the single callback that made the grace period happen. Co-developed-by: Neeraj Upadhyay Signed-off-by: Neeraj Upadhyay Signed-off-by: Paul E. McKenney --- include/linux/srcutree.h | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h index 4025840ba9a3..8d1da136a93a 100644 --- a/include/linux/srcutree.h +++ b/include/linux/srcutree.h @@ -63,8 +63,9 @@ struct srcu_struct { struct srcu_node node[NUM_RCU_NODES]; /* Combining tree. */ struct srcu_node *level[RCU_NUM_LVLS + 1]; /* First node at each level. */ + int srcu_size_state; /* Small-to-big transition state. */ struct mutex srcu_cb_mutex; /* Serialize CB preparation. */ - spinlock_t __private lock; /* Protect counters */ + spinlock_t __private lock; /* Protect counters and size state. */ struct mutex srcu_gp_mutex; /* Serialize GP work. */ unsigned int srcu_idx; /* Current rdr array element. */ unsigned long srcu_gp_seq; /* Grace-period seq #. */ @@ -83,6 +84,17 @@ struct srcu_struct { struct lockdep_map dep_map; }; +/* Values for size state variable (->srcu_size_state). */ +#define SRCU_SIZE_SMALL 0 +#define SRCU_SIZE_ALLOC 1 +#define SRCU_SIZE_WAIT_BARRIER 2 +#define SRCU_SIZE_WAIT_CALL 3 +#define SRCU_SIZE_WAIT_CBS1 4 +#define SRCU_SIZE_WAIT_CBS2 5 +#define SRCU_SIZE_WAIT_CBS3 6 +#define SRCU_SIZE_WAIT_CBS4 7 +#define SRCU_SIZE_BIG 8 + /* Values for state variable (bottom bits of ->srcu_gp_seq). */ #define SRCU_STATE_IDLE 0 #define SRCU_STATE_SCAN1 1 -- cgit From 2ec303113d978931ef368886c4c6bc854493e8bf Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 21 Jan 2022 16:13:52 -0800 Subject: srcu: Dynamically allocate srcu_node array This commit shrinks the srcu_struct structure by converting its ->node field from a fixed-size compile-time array to a pointer to a dynamically allocated array. In kernels built with large values of NR_CPUS that boot on systems with smaller numbers of CPUs, this can save significant memory. [ paulmck: Apply kernel test robot feedback. ] Reported-by: A cast of thousands Co-developed-by: Neeraj Upadhyay Signed-off-by: Neeraj Upadhyay Signed-off-by: Paul E. McKenney --- include/linux/srcutree.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h index 8d1da136a93a..8501b6b45941 100644 --- a/include/linux/srcutree.h +++ b/include/linux/srcutree.h @@ -60,7 +60,7 @@ struct srcu_node { * Per-SRCU-domain structure, similar in function to rcu_state. */ struct srcu_struct { - struct srcu_node node[NUM_RCU_NODES]; /* Combining tree. */ + struct srcu_node *node; /* Combining tree. */ struct srcu_node *level[RCU_NUM_LVLS + 1]; /* First node at each level. */ int srcu_size_state; /* Small-to-big transition state. */ -- cgit From db8f1471c61336477e2bf74dcb00e67d650e6dea Mon Sep 17 00:00:00 2001 From: Alexander Aring Date: Wed, 26 Jan 2022 10:03:54 -0500 Subject: srcu: Use export for srcu_struct defined by DEFINE_STATIC_SRCU() If an srcu_struct structure defined by tree SRCU's DEFINE_STATIC_SRCU() is used by a module, sparse will give the following diagnostic: sparse: symbol '__srcu_struct_nodes_srcu' was not declared. Should it be static? The problem is that a within-module DEFINE_STATIC_SRCU() must define a non-static srcu_struct because it is exported by referencing it in a special '__section("___srcu_struct_ptrs")'. This reference is needed so that module load and unloading can invoke init_srcu_struct() and cleanup_srcu_struct(), respectively. Unfortunately, sparse is unaware of '__section("___srcu_struct_ptrs")', resulting in the above false-positive diagnostic. To avoid this false positive, this commit therefore creates a prototype of the srcu_struct with an "extern" keyword. Signed-off-by: Alexander Aring Signed-off-by: Paul E. McKenney --- include/linux/srcutree.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h index 8501b6b45941..44e998643f48 100644 --- a/include/linux/srcutree.h +++ b/include/linux/srcutree.h @@ -131,6 +131,7 @@ struct srcu_struct { #ifdef MODULE # define __DEFINE_SRCU(name, is_static) \ is_static struct srcu_struct name; \ + extern struct srcu_struct * const __srcu_struct_##name; \ struct srcu_struct * const __srcu_struct_##name \ __section("___srcu_struct_ptrs") = &name #else -- cgit From 46470cf85d2b61abd37c6f66c4dacc1bc510d10f Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 27 Jan 2022 13:20:49 -0800 Subject: srcu: Prevent cleanup_srcu_struct() from freeing non-dynamic ->sda When an srcu_struct structure is created (but not in a kernel module) by DEFINE_SRCU() and friends, the per-CPU srcu_data structure is statically allocated. In all other cases, that structure is obtained from alloc_percpu(), in which case cleanup_srcu_struct() must invoke free_percpu() on the resulting ->sda pointer in the srcu_struct pointer. Which it does. Except that it also invokes free_percpu() on the ->sda pointer referencing the statically allocated per-CPU srcu_data structures. Which free_percpu() is surprisingly OK with. This commit nevertheless stops cleanup_srcu_struct() from freeing statically allocated per-CPU srcu_data structures. Co-developed-by: Neeraj Upadhyay Signed-off-by: Neeraj Upadhyay Signed-off-by: Paul E. McKenney --- include/linux/srcutree.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h index 44e998643f48..44bd204498a1 100644 --- a/include/linux/srcutree.h +++ b/include/linux/srcutree.h @@ -73,6 +73,7 @@ struct srcu_struct { unsigned long srcu_gp_seq_needed_exp; /* Furthest future exp GP. */ unsigned long srcu_last_gp_end; /* Last GP end timestamp (ns) */ struct srcu_data __percpu *sda; /* Per-CPU srcu_data array. */ + bool sda_is_static; /* May ->sda be passed to free_percpu()? */ unsigned long srcu_barrier_seq; /* srcu_barrier seq #. */ struct mutex srcu_barrier_mutex; /* Serialize barrier ops. */ struct completion srcu_barrier_completion; -- cgit From 9f2e91d94c91558e3764fe4e01c5e6281a90f239 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 27 Jan 2022 20:32:05 -0800 Subject: srcu: Add contention-triggered addition of srcu_node tree This commit instruments the acquisitions of the srcu_struct structure's ->lock, enabling the initiation of a transition from SRCU_SIZE_SMALL to SRCU_SIZE_BIG when sufficient contention is experienced. The instrumentation counts the number of trylock failures within the confines of a single jiffy. If that number exceeds the value specified by the srcutree.small_contention_lim kernel boot parameter (which defaults to 100), and if the value specified by the srcutree.convert_to_big kernel boot parameter has the 0x10 bit set (defaults to 0), then a transition will be automatically initiated. By default, there will never be any transitions, so that none of the srcu_struct structures ever gains an srcu_node array. The useful values for srcutree.convert_to_big are: 0x00: Never convert. 0x01: Always convert at init_srcu_struct() time. 0x02: Convert when rcutorture prints its first round of statistics. 0x03: Decide conversion approach at boot given system size. 0x10: Convert if contention is encountered. 0x12: Convert if contention is encountered or when rcutorture prints its first round of statistics, whichever comes first. The value 0x11 acts the same as 0x01 because the conversion happens before there is any chance of contention. [ paulmck: Apply "static" feedback from kernel test robot. ] Co-developed-by: Neeraj Upadhyay Signed-off-by: Neeraj Upadhyay Signed-off-by: Paul E. McKenney --- include/linux/srcutree.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h index 44bd204498a1..1b9ff4ed37e4 100644 --- a/include/linux/srcutree.h +++ b/include/linux/srcutree.h @@ -72,6 +72,8 @@ struct srcu_struct { unsigned long srcu_gp_seq_needed; /* Latest gp_seq needed. */ unsigned long srcu_gp_seq_needed_exp; /* Furthest future exp GP. */ unsigned long srcu_last_gp_end; /* Last GP end timestamp (ns) */ + unsigned long srcu_size_jiffies; /* Current contention-measurement interval. */ + unsigned long srcu_n_lock_retries; /* Contention events in current interval. */ struct srcu_data __percpu *sda; /* Per-CPU srcu_data array. */ bool sda_is_static; /* May ->sda be passed to free_percpu()? */ unsigned long srcu_barrier_seq; /* srcu_barrier seq #. */ -- cgit From 5d90070816534882b9158f14154b7e2cdef1194a Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 4 Mar 2022 10:41:44 -0800 Subject: rcu-tasks: Make Tasks RCU account for userspace execution The main Tasks RCU quiescent state is voluntary context switch. However, userspace execution is also a valid quiescent state, and is a valuable one for userspace applications that spin repeatedly executing light-weight non-sleeping system calls. Currently, such an application can delay a Tasks RCU grace period for many tens of seconds. This commit therefore enlists the aid of the scheduler-clock interrupt to provide a Tasks RCU quiescent state when it interrupted a task executing in userspace. [ paulmck: Apply feedback from kernel test robot. ] Cc: Martin KaFai Lau Cc: Neil Spring Signed-off-by: Paul E. McKenney --- include/linux/rcupdate.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index e7c39c200e2b..1a32036c918c 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -196,6 +196,7 @@ void synchronize_rcu_tasks_rude(void); void exit_tasks_rcu_start(void); void exit_tasks_rcu_finish(void); #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */ +#define rcu_tasks_classic_qs(t, preempt) do { } while (0) #define rcu_tasks_qs(t, preempt) do { } while (0) #define rcu_note_voluntary_context_switch(t) do { } while (0) #define call_rcu_tasks call_rcu -- cgit From bd6c375b92c3f367e184d164e12952e4b9d9fb4f Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Tue, 15 Mar 2022 16:33:38 +0100 Subject: rcutorture: Call preempt_schedule() through static call/key The rcutorture test suite sometimess triggers a random scheduler preemption call while simulating a read delay. Unfortunately, its direct call to preempt_schedule() bypasses the static call/key filter used by CONFIG_PREEMPT_DYNAMIC. This breaks the no-preempt assumption when the dynamic preemption mode is "none". For example, rcu_blocking_is_gp() is fooled and abbreviates grace periods when the CPU runs in no-preempt UP mode. Fix this by making torture_preempt_schedule() call __preempt_schedule(), which uses the static call/key. Reported-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker Signed-off-by: Paul E. McKenney --- include/linux/torture.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/torture.h b/include/linux/torture.h index 63fa4196e51c..7038104463e4 100644 --- a/include/linux/torture.h +++ b/include/linux/torture.h @@ -118,7 +118,7 @@ void _torture_stop_kthread(char *m, struct task_struct **tp); _torture_stop_kthread("Stopping " #n " task", &(tp)) #ifdef CONFIG_PREEMPTION -#define torture_preempt_schedule() preempt_schedule() +#define torture_preempt_schedule() __preempt_schedule() #else #define torture_preempt_schedule() do { } while (0) #endif -- cgit From a28c1ab312712c26a8d004af1f68628d625dafac Mon Sep 17 00:00:00 2001 From: Sergey Shtylyov Date: Sat, 9 Apr 2022 22:13:56 +0300 Subject: ata: libata-core: fix parameter type in ata_xfer_mode2shift() The data transfer mode that corresponds to the 'xfer_mode' parameter for ata_xfer_mode2shift() is a 8-bit *unsigned* value. Using *unsigned long* to declare the parameter leads to a problematic implicit *int* to *unsigned long* cast and was most probably a result of a copy/paste mistake -- use the 'u8' type instead, as in ata_xfer_mode2mask()... Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool. Signed-off-by: Sergey Shtylyov Signed-off-by: Damien Le Moal --- include/linux/libata.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index 16107122e587..732de9014626 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -1110,7 +1110,7 @@ extern void ata_unpack_xfermask(unsigned long xfer_mask, unsigned long *udma_mask); extern u8 ata_xfer_mask2mode(unsigned long xfer_mask); extern unsigned long ata_xfer_mode2mask(u8 xfer_mode); -extern int ata_xfer_mode2shift(unsigned long xfer_mode); +extern int ata_xfer_mode2shift(u8 xfer_mode); extern const char *ata_mode_string(unsigned long xfer_mask); extern unsigned long ata_id_xfermask(const u16 *id); extern int ata_std_qc_defer(struct ata_queued_cmd *qc); -- cgit From 868e6139c5212e7d9de8332806aacfeafb349320 Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Sun, 27 Mar 2022 11:33:16 -0600 Subject: block: move lower_48_bits() to block The function is not generally applicable enough to be included in the core kernel header. Move it to block since it's the only subsystem using it. Suggested-by: Linus Torvalds Signed-off-by: Keith Busch Link: https://lore.kernel.org/r/20220327173316.315-1-kbusch@kernel.org Signed-off-by: Jens Axboe --- include/linux/kernel.h | 9 --------- include/linux/t10-pi.h | 9 +++++++++ 2 files changed, 9 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kernel.h b/include/linux/kernel.h index 08ba5995aa8b..a890428bcc1a 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -63,15 +63,6 @@ } \ ) -/** - * lower_48_bits() - return bits 0-47 of a number - * @n: the number we're accessing - */ -static inline u64 lower_48_bits(u64 n) -{ - return n & ((1ull << 48) - 1); -} - /** * upper_32_bits - return bits 32-63 of a number * @n: the number we're accessing diff --git a/include/linux/t10-pi.h b/include/linux/t10-pi.h index a4b1af581f69..248f4ac95642 100644 --- a/include/linux/t10-pi.h +++ b/include/linux/t10-pi.h @@ -59,6 +59,15 @@ struct crc64_pi_tuple { __u8 ref_tag[6]; }; +/** + * lower_48_bits() - return bits 0-47 of a number + * @n: the number we're accessing + */ +static inline u64 lower_48_bits(u64 n) +{ + return n & ((1ull << 48) - 1); +} + static inline u64 ext_pi_ref_tag(struct request *rq) { unsigned int shift = ilog2(queue_logical_block_size(rq->q)); -- cgit From 1a95e04e29a116c3424988c70c441ca8ec2779ff Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Tue, 12 Apr 2022 11:24:00 +0100 Subject: net: phylink: remove phylink_helper_basex_speed() As there are now no users of phylink_helper_basex_speed(), we can remove this obsolete functionality. Signed-off-by: Russell King (Oracle) Signed-off-by: David S. Miller --- include/linux/phylink.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 223781622b33..6d06896fc20d 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -160,11 +160,6 @@ struct phylink_mac_ops { * clearing unsupported speeds and duplex settings. The port modes * should not be cleared; phylink_set_port_modes() will help with this. * - * If the @state->interface mode is %PHY_INTERFACE_MODE_1000BASEX - * or %PHY_INTERFACE_MODE_2500BASEX, select the appropriate mode - * based on @state->advertising and/or @state->speed and update - * @state->interface accordingly. See phylink_helper_basex_speed(). - * * When @config->supported_interfaces has been set, phylink will iterate * over the supported interfaces to determine the full capability of the * MAC. The validation function must not print errors if @state->interface @@ -579,7 +574,6 @@ int phylink_speed_up(struct phylink *pl); #define phylink_test(bm, mode) __phylink_do_bit(test_bit, bm, mode) void phylink_set_port_modes(unsigned long *bits); -void phylink_helper_basex_speed(struct phylink_link_state *state); void phylink_mii_c22_pcs_decode_state(struct phylink_link_state *state, u16 bmsr, u16 lpa); -- cgit From 1306d5362a591493a2d07f685ed2cc480dcda320 Mon Sep 17 00:00:00 2001 From: Nikolay Aleksandrov Date: Wed, 13 Apr 2022 13:51:56 +0300 Subject: net: add ndo_fdb_del_bulk Add a new netdev op called ndo_fdb_del_bulk, it will be later used for driver-specific bulk delete implementation dispatched from rtnetlink. The first user will be the bridge, we need it to signal to rtnetlink from the driver that we support bulk delete operation (NLM_F_BULK). Signed-off-by: Nikolay Aleksandrov Signed-off-by: David S. Miller --- include/linux/netdevice.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 28ea4f8269d4..a602f29365b0 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1260,6 +1260,10 @@ struct netdev_net_notifier { * struct net_device *dev, * const unsigned char *addr, u16 vid) * Deletes the FDB entry from dev coresponding to addr. + * int (*ndo_fdb_del_bulk)(struct ndmsg *ndm, struct nlattr *tb[], + * struct net_device *dev, + * u16 vid, + * struct netlink_ext_ack *extack); * int (*ndo_fdb_dump)(struct sk_buff *skb, struct netlink_callback *cb, * struct net_device *dev, struct net_device *filter_dev, * int *idx) @@ -1510,6 +1514,11 @@ struct net_device_ops { struct net_device *dev, const unsigned char *addr, u16 vid); + int (*ndo_fdb_del_bulk)(struct ndmsg *ndm, + struct nlattr *tb[], + struct net_device *dev, + u16 vid, + struct netlink_ext_ack *extack); int (*ndo_fdb_dump)(struct sk_buff *skb, struct netlink_callback *cb, struct net_device *dev, -- cgit From b0c3e796f24b588b862b61ce235d3c9417dc8983 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Fri, 8 Apr 2022 18:14:57 +0200 Subject: random: make random_get_entropy() return an unsigned long Some implementations were returning type `unsigned long`, while others that fell back to get_cycles() were implicitly returning a `cycles_t` or an untyped constant int literal. That makes for weird and confusing code, and basically all code in the kernel already handled it like it was an `unsigned long`. I recently tried to handle it as the largest type it could be, a `cycles_t`, but doing so doesn't really help with much. Instead let's just make random_get_entropy() return an unsigned long all the time. This also matches the commonly used `arch_get_random_long()` function, so now RDRAND and RDTSC return the same sized integer, which means one can fallback to the other more gracefully. Cc: Dominik Brodowski Cc: Theodore Ts'o Acked-by: Thomas Gleixner Signed-off-by: Jason A. Donenfeld --- include/linux/timex.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/timex.h b/include/linux/timex.h index 059b18eb1f1f..5745c90c8800 100644 --- a/include/linux/timex.h +++ b/include/linux/timex.h @@ -75,7 +75,7 @@ * By default we use get_cycles() for this purpose, but individual * architectures may override this in their asm/timex.h header file. */ -#define random_get_entropy() get_cycles() +#define random_get_entropy() ((unsigned long)get_cycles()) #endif /* -- cgit From d6d3146ce532268ad0ffd8d92d2b7492898decf1 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Wed, 13 Apr 2022 16:15:52 +0800 Subject: skb: add some helpers for skb drop reasons In order to simply the definition and assignment for 'enum skb_drop_reason', introduce some helpers. SKB_DR() is used to define a variable of type 'enum skb_drop_reason' with the 'SKB_DROP_REASON_NOT_SPECIFIED' initial value. SKB_DR_SET() is used to set the value of the variable. Seems it is a little useless? But it makes the code shorter. SKB_DR_OR() is used to set the value of the variable if it is not set yet, which means its value is SKB_DROP_REASON_NOT_SPECIFIED. Signed-off-by: Menglong Dong Reviewed-by: Jiang Biao Reviewed-by: Hao Peng Signed-off-by: David S. Miller --- include/linux/skbuff.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 9b81ba497665..0cbd6ada957c 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -450,6 +450,18 @@ enum skb_drop_reason { SKB_DROP_REASON_MAX, }; +#define SKB_DR_INIT(name, reason) \ + enum skb_drop_reason name = SKB_DROP_REASON_##reason +#define SKB_DR(name) \ + SKB_DR_INIT(name, NOT_SPECIFIED) +#define SKB_DR_SET(name, reason) \ + (name = SKB_DROP_REASON_##reason) +#define SKB_DR_OR(name, reason) \ + do { \ + if (name == SKB_DROP_REASON_NOT_SPECIFIED) \ + SKB_DR_SET(name, reason); \ + } while (0) + /* To allow 64K frame to be packed as single skb without frag_list we * require 64K/PAGE_SIZE pages plus 1 additional page to allow for * buffers which do not start on a page boundary. -- cgit From c4eb664191b4a5ff6856478f903924176697719e Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Wed, 13 Apr 2022 16:15:53 +0800 Subject: net: ipv4: add skb drop reasons to ip_error() Eventually, I find out the handler function for inputting route lookup fail: ip_error(). The drop reasons we used in ip_error() are almost corresponding to IPSTATS_MIB_*, and following new reasons are introduced: SKB_DROP_REASON_IP_INADDRERRORS SKB_DROP_REASON_IP_INNOROUTES Isn't the name SKB_DROP_REASON_IP_HOSTUNREACH and SKB_DROP_REASON_IP_NETUNREACH more accurate? To make them corresponding to IPSTATS_MIB_*, we keep their name still. Signed-off-by: Menglong Dong Reviewed-by: Jiang Biao Reviewed-by: Hao Peng Signed-off-by: David S. Miller --- include/linux/skbuff.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 0cbd6ada957c..886e83ac4b70 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -447,6 +447,12 @@ enum skb_drop_reason { * 2211, such as a broadcasts * ICMP_TIMESTAMP */ + SKB_DROP_REASON_IP_INADDRERRORS, /* host unreachable, corresponding + * to IPSTATS_MIB_INADDRERRORS + */ + SKB_DROP_REASON_IP_INNOROUTES, /* network unreachable, corresponding + * to IPSTATS_MIB_INADDRERRORS + */ SKB_DROP_REASON_MAX, }; -- cgit From 2edc1a383fda8d2f580216292dfd9daeae691e47 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Wed, 13 Apr 2022 16:15:55 +0800 Subject: net: ip: add skb drop reasons to ip forwarding Replace kfree_skb() which is used in ip6_forward() and ip_forward() with kfree_skb_reason(). The new drop reason 'SKB_DROP_REASON_PKT_TOO_BIG' is introduced for the case that the length of the packet exceeds MTU and can't fragment. Signed-off-by: Menglong Dong Reviewed-by: Jiang Biao Reviewed-by: Hao Peng Signed-off-by: David S. Miller --- include/linux/skbuff.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 886e83ac4b70..0ef11df1bc67 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -453,6 +453,9 @@ enum skb_drop_reason { SKB_DROP_REASON_IP_INNOROUTES, /* network unreachable, corresponding * to IPSTATS_MIB_INADDRERRORS */ + SKB_DROP_REASON_PKT_TOO_BIG, /* packet size is too big (maybe exceed + * the MTU) + */ SKB_DROP_REASON_MAX, }; -- cgit From 1ad6d548e2a452f21bcee4606ee4ec7afcde5f37 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Wed, 13 Apr 2022 16:15:56 +0800 Subject: net: icmp: introduce function icmpv6_param_prob_reason() In order to add the skb drop reasons support to icmpv6_param_prob(), introduce the function icmpv6_param_prob_reason() and make icmpv6_param_prob() an inline call to it. This new function will be used in the following patches. Signed-off-by: Menglong Dong Reviewed-by: Jiang Biao Reviewed-by: Hao Peng Signed-off-by: David S. Miller --- include/linux/icmpv6.h | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/icmpv6.h b/include/linux/icmpv6.h index 9055cb380ee2..db0f4fcfdaf4 100644 --- a/include/linux/icmpv6.h +++ b/include/linux/icmpv6.h @@ -79,8 +79,9 @@ extern int icmpv6_init(void); extern int icmpv6_err_convert(u8 type, u8 code, int *err); extern void icmpv6_cleanup(void); -extern void icmpv6_param_prob(struct sk_buff *skb, - u8 code, int pos); +extern void icmpv6_param_prob_reason(struct sk_buff *skb, + u8 code, int pos, + enum skb_drop_reason reason); struct flowi6; struct in6_addr; @@ -91,6 +92,12 @@ extern void icmpv6_flow_init(struct sock *sk, const struct in6_addr *daddr, int oif); +static inline void icmpv6_param_prob(struct sk_buff *skb, u8 code, int pos) +{ + icmpv6_param_prob_reason(skb, code, pos, + SKB_DROP_REASON_NOT_SPECIFIED); +} + static inline bool icmpv6_is_err(int type) { switch (type) { -- cgit From f8e6b7babfeb40987e946bc1427609a9976017fa Mon Sep 17 00:00:00 2001 From: Karol Herbst Date: Mon, 11 Apr 2022 15:44:04 +0200 Subject: dma-buf-map: remove renamed header file MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit 7938f4218168 ("dma-buf-map: Rename to iosys-map") already renamed this file, but it got brought back by a merge. Delete it for real this time. Fixes: 30424ebae8df ("Merge tag 'drm-intel-gt-next-2022-02-17' of git://anongit.freedesktop.org/drm/drm-intel into drm-intel-next") Cc: Rodrigo Vivi Cc: Lucas De Marchi Cc: dri-devel@lists.freedesktop.org Signed-off-by: Karol Herbst Reviewed-by: Michel Dänzer Reviewed-by: Thomas Zimmermann Link: https://patchwork.freedesktop.org/patch/msgid/20220411134404.524776-1-kherbst@redhat.com --- include/linux/dma-buf-map.h | 266 -------------------------------------------- 1 file changed, 266 deletions(-) delete mode 100644 include/linux/dma-buf-map.h (limited to 'include/linux') diff --git a/include/linux/dma-buf-map.h b/include/linux/dma-buf-map.h deleted file mode 100644 index 19fa0b5ae5ec..000000000000 --- a/include/linux/dma-buf-map.h +++ /dev/null @@ -1,266 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Pointer to dma-buf-mapped memory, plus helpers. - */ - -#ifndef __DMA_BUF_MAP_H__ -#define __DMA_BUF_MAP_H__ - -#include -#include - -/** - * DOC: overview - * - * Calling dma-buf's vmap operation returns a pointer to the buffer's memory. - * Depending on the location of the buffer, users may have to access it with - * I/O operations or memory load/store operations. For example, copying to - * system memory could be done with memcpy(), copying to I/O memory would be - * done with memcpy_toio(). - * - * .. code-block:: c - * - * void *vaddr = ...; // pointer to system memory - * memcpy(vaddr, src, len); - * - * void *vaddr_iomem = ...; // pointer to I/O memory - * memcpy_toio(vaddr, _iomem, src, len); - * - * When using dma-buf's vmap operation, the returned pointer is encoded as - * :c:type:`struct dma_buf_map `. - * :c:type:`struct dma_buf_map ` stores the buffer's address in - * system or I/O memory and a flag that signals the required method of - * accessing the buffer. Use the returned instance and the helper functions - * to access the buffer's memory in the correct way. - * - * The type :c:type:`struct dma_buf_map ` and its helpers are - * actually independent from the dma-buf infrastructure. When sharing buffers - * among devices, drivers have to know the location of the memory to access - * the buffers in a safe way. :c:type:`struct dma_buf_map ` - * solves this problem for dma-buf and its users. If other drivers or - * sub-systems require similar functionality, the type could be generalized - * and moved to a more prominent header file. - * - * Open-coding access to :c:type:`struct dma_buf_map ` is - * considered bad style. Rather then accessing its fields directly, use one - * of the provided helper functions, or implement your own. For example, - * instances of :c:type:`struct dma_buf_map ` can be initialized - * statically with DMA_BUF_MAP_INIT_VADDR(), or at runtime with - * dma_buf_map_set_vaddr(). These helpers will set an address in system memory. - * - * .. code-block:: c - * - * struct dma_buf_map map = DMA_BUF_MAP_INIT_VADDR(0xdeadbeaf); - * - * dma_buf_map_set_vaddr(&map, 0xdeadbeaf); - * - * To set an address in I/O memory, use dma_buf_map_set_vaddr_iomem(). - * - * .. code-block:: c - * - * dma_buf_map_set_vaddr_iomem(&map, 0xdeadbeaf); - * - * Instances of struct dma_buf_map do not have to be cleaned up, but - * can be cleared to NULL with dma_buf_map_clear(). Cleared mappings - * always refer to system memory. - * - * .. code-block:: c - * - * dma_buf_map_clear(&map); - * - * Test if a mapping is valid with either dma_buf_map_is_set() or - * dma_buf_map_is_null(). - * - * .. code-block:: c - * - * if (dma_buf_map_is_set(&map) != dma_buf_map_is_null(&map)) - * // always true - * - * Instances of :c:type:`struct dma_buf_map ` can be compared - * for equality with dma_buf_map_is_equal(). Mappings the point to different - * memory spaces, system or I/O, are never equal. That's even true if both - * spaces are located in the same address space, both mappings contain the - * same address value, or both mappings refer to NULL. - * - * .. code-block:: c - * - * struct dma_buf_map sys_map; // refers to system memory - * struct dma_buf_map io_map; // refers to I/O memory - * - * if (dma_buf_map_is_equal(&sys_map, &io_map)) - * // always false - * - * A set up instance of struct dma_buf_map can be used to access or manipulate - * the buffer memory. Depending on the location of the memory, the provided - * helpers will pick the correct operations. Data can be copied into the memory - * with dma_buf_map_memcpy_to(). The address can be manipulated with - * dma_buf_map_incr(). - * - * .. code-block:: c - * - * const void *src = ...; // source buffer - * size_t len = ...; // length of src - * - * dma_buf_map_memcpy_to(&map, src, len); - * dma_buf_map_incr(&map, len); // go to first byte after the memcpy - */ - -/** - * struct dma_buf_map - Pointer to vmap'ed dma-buf memory. - * @vaddr_iomem: The buffer's address if in I/O memory - * @vaddr: The buffer's address if in system memory - * @is_iomem: True if the dma-buf memory is located in I/O - * memory, or false otherwise. - */ -struct dma_buf_map { - union { - void __iomem *vaddr_iomem; - void *vaddr; - }; - bool is_iomem; -}; - -/** - * DMA_BUF_MAP_INIT_VADDR - Initializes struct dma_buf_map to an address in system memory - * @vaddr_: A system-memory address - */ -#define DMA_BUF_MAP_INIT_VADDR(vaddr_) \ - { \ - .vaddr = (vaddr_), \ - .is_iomem = false, \ - } - -/** - * dma_buf_map_set_vaddr - Sets a dma-buf mapping structure to an address in system memory - * @map: The dma-buf mapping structure - * @vaddr: A system-memory address - * - * Sets the address and clears the I/O-memory flag. - */ -static inline void dma_buf_map_set_vaddr(struct dma_buf_map *map, void *vaddr) -{ - map->vaddr = vaddr; - map->is_iomem = false; -} - -/** - * dma_buf_map_set_vaddr_iomem - Sets a dma-buf mapping structure to an address in I/O memory - * @map: The dma-buf mapping structure - * @vaddr_iomem: An I/O-memory address - * - * Sets the address and the I/O-memory flag. - */ -static inline void dma_buf_map_set_vaddr_iomem(struct dma_buf_map *map, - void __iomem *vaddr_iomem) -{ - map->vaddr_iomem = vaddr_iomem; - map->is_iomem = true; -} - -/** - * dma_buf_map_is_equal - Compares two dma-buf mapping structures for equality - * @lhs: The dma-buf mapping structure - * @rhs: A dma-buf mapping structure to compare with - * - * Two dma-buf mapping structures are equal if they both refer to the same type of memory - * and to the same address within that memory. - * - * Returns: - * True is both structures are equal, or false otherwise. - */ -static inline bool dma_buf_map_is_equal(const struct dma_buf_map *lhs, - const struct dma_buf_map *rhs) -{ - if (lhs->is_iomem != rhs->is_iomem) - return false; - else if (lhs->is_iomem) - return lhs->vaddr_iomem == rhs->vaddr_iomem; - else - return lhs->vaddr == rhs->vaddr; -} - -/** - * dma_buf_map_is_null - Tests for a dma-buf mapping to be NULL - * @map: The dma-buf mapping structure - * - * Depending on the state of struct dma_buf_map.is_iomem, tests if the - * mapping is NULL. - * - * Returns: - * True if the mapping is NULL, or false otherwise. - */ -static inline bool dma_buf_map_is_null(const struct dma_buf_map *map) -{ - if (map->is_iomem) - return !map->vaddr_iomem; - return !map->vaddr; -} - -/** - * dma_buf_map_is_set - Tests is the dma-buf mapping has been set - * @map: The dma-buf mapping structure - * - * Depending on the state of struct dma_buf_map.is_iomem, tests if the - * mapping has been set. - * - * Returns: - * True if the mapping is been set, or false otherwise. - */ -static inline bool dma_buf_map_is_set(const struct dma_buf_map *map) -{ - return !dma_buf_map_is_null(map); -} - -/** - * dma_buf_map_clear - Clears a dma-buf mapping structure - * @map: The dma-buf mapping structure - * - * Clears all fields to zero; including struct dma_buf_map.is_iomem. So - * mapping structures that were set to point to I/O memory are reset for - * system memory. Pointers are cleared to NULL. This is the default. - */ -static inline void dma_buf_map_clear(struct dma_buf_map *map) -{ - if (map->is_iomem) { - map->vaddr_iomem = NULL; - map->is_iomem = false; - } else { - map->vaddr = NULL; - } -} - -/** - * dma_buf_map_memcpy_to - Memcpy into dma-buf mapping - * @dst: The dma-buf mapping structure - * @src: The source buffer - * @len: The number of byte in src - * - * Copies data into a dma-buf mapping. The source buffer is in system - * memory. Depending on the buffer's location, the helper picks the correct - * method of accessing the memory. - */ -static inline void dma_buf_map_memcpy_to(struct dma_buf_map *dst, const void *src, size_t len) -{ - if (dst->is_iomem) - memcpy_toio(dst->vaddr_iomem, src, len); - else - memcpy(dst->vaddr, src, len); -} - -/** - * dma_buf_map_incr - Increments the address stored in a dma-buf mapping - * @map: The dma-buf mapping structure - * @incr: The number of bytes to increment - * - * Increments the address stored in a dma-buf mapping. Depending on the - * buffer's location, the correct value will be updated. - */ -static inline void dma_buf_map_incr(struct dma_buf_map *map, size_t incr) -{ - if (map->is_iomem) - map->vaddr_iomem += incr; - else - map->vaddr += incr; -} - -#endif /* __DMA_BUF_MAP_H__ */ -- cgit From bdc21a4d286c6917fe966fd765de99411095b1b4 Mon Sep 17 00:00:00 2001 From: Lukasz Luba Date: Mon, 21 Mar 2022 09:57:22 +0000 Subject: PM: EM: Add .get_cost() callback The Energy Model (EM) supports devices which report abstract power scale, not only real Watts. The primary goal for EM is to enable the Energy Aware Scheduler (EAS) for a given platform. Some of the platforms might not be able to deliver proper power values. The only information that they might have is the relative efficiency between CPU types. Thus, it makes sense to remove some restrictions in the EM framework and introduce a mechanism which would support those platforms. What is crucial for EAS to operate is the 'cost' field in the EM. The 'cost' is calculated internally in EM framework based on knowledge from 'power' values. The 'cost' values must be strictly increasing. The existing API with its 'power' value size restrictions cannot guarantee that the 'cost' will meet this requirement. Since the platform is missing this detailed information, but has only efficiency details, introduce a new custom callback in the EM framework. The new callback would allow to provide the 'cost' values which reflect efficiency of the CPUs. This would allow to provide EAS information which has different relation than what would be forced by the EM internal formulas calculating 'cost' values. Thanks to this new callback it is possible to create a system view for EAS which has no overlapping performance states across many Performance Domains. Signed-off-by: Lukasz Luba Reviewed-by: Ionela Voinescu Signed-off-by: Rafael J. Wysocki --- include/linux/energy_model.h | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 9f3c400bc52d..0a3a5663177b 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -114,9 +114,30 @@ struct em_data_callback { */ int (*active_power)(unsigned long *power, unsigned long *freq, struct device *dev); + + /** + * get_cost() - Provide the cost at the given performance state of + * a device + * @dev : Device for which we do this operation (can be a CPU) + * @freq : Frequency at the performance state in kHz + * @cost : The cost value for the performance state + * (modified) + * + * In case of CPUs, the cost is the one of a single CPU in the domain. + * It is expected to fit in the [0, EM_MAX_POWER] range due to internal + * usage in EAS calculation. + * + * Return 0 on success, or appropriate error value in case of failure. + */ + int (*get_cost)(struct device *dev, unsigned long freq, + unsigned long *cost); }; -#define EM_DATA_CB(_active_power_cb) { .active_power = &_active_power_cb } #define EM_SET_ACTIVE_POWER_CB(em_cb, cb) ((em_cb).active_power = cb) +#define EM_ADV_DATA_CB(_active_power_cb, _cost_cb) \ + { .active_power = _active_power_cb, \ + .get_cost = _cost_cb } +#define EM_DATA_CB(_active_power_cb) \ + EM_ADV_DATA_CB(_active_power_cb, NULL) struct em_perf_domain *em_cpu_get(int cpu); struct em_perf_domain *em_pd_get(struct device *dev); @@ -264,6 +285,7 @@ static inline int em_pd_nr_perf_states(struct em_perf_domain *pd) #else struct em_data_callback {}; +#define EM_ADV_DATA_CB(_active_power_cb, _cost_cb) { } #define EM_DATA_CB(_active_power_cb) { } #define EM_SET_ACTIVE_POWER_CB(em_cb, cb) do { } while (0) -- cgit From fc3a9a9858478ab5f8441765a3f9552a0ceba10c Mon Sep 17 00:00:00 2001 From: Pierre Gondois Date: Mon, 21 Mar 2022 09:57:23 +0000 Subject: PM: EM: Add artificial EM flag The Energy Model (EM) can be used on platforms which are missing real power information. Those platforms would implement .get_cost() which populates needed values for the Energy Aware Scheduler (EAS). The EAS doesn't use 'power' fields from EM, but other frameworks might use them. Thus, to avoid miss-usage of this specific type of EM, introduce a new flags which can be checked by other frameworks. Signed-off-by: Pierre Gondois Signed-off-by: Lukasz Luba Reviewed-by: Ionela Voinescu Signed-off-by: Rafael J. Wysocki --- include/linux/energy_model.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 0a3a5663177b..92e82a322859 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -67,11 +67,16 @@ struct em_perf_domain { * * EM_PERF_DOMAIN_SKIP_INEFFICIENCIES: Skip inefficient states when estimating * energy consumption. + * + * EM_PERF_DOMAIN_ARTIFICIAL: The power values are artificial and might be + * created by platform missing real power information */ #define EM_PERF_DOMAIN_MILLIWATTS BIT(0) #define EM_PERF_DOMAIN_SKIP_INEFFICIENCIES BIT(1) +#define EM_PERF_DOMAIN_ARTIFICIAL BIT(2) #define em_span_cpus(em) (to_cpumask((em)->cpus)) +#define em_is_artificial(em) ((em)->flags & EM_PERF_DOMAIN_ARTIFICIAL) #ifdef CONFIG_ENERGY_MODEL #define EM_MAX_POWER 0xFFFF -- cgit From 75a3a99a5a9886af13be44e640cb415ebda80db2 Mon Sep 17 00:00:00 2001 From: Lukasz Luba Date: Mon, 21 Mar 2022 09:57:25 +0000 Subject: PM: EM: Change the order of arguments in the .active_power() callback The .active_power() callback passes the device pointer when it's called. Aligned with a convetion present in other subsystems and pass the 'dev' as a first argument. It looks more cleaner. Adjust all affected drivers which implement that API callback. Suggested-by: Ionela Voinescu Signed-off-by: Lukasz Luba Reviewed-by: Ionela Voinescu Signed-off-by: Rafael J. Wysocki --- include/linux/energy_model.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 92e82a322859..8419bffb4398 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -101,11 +101,11 @@ struct em_data_callback { /** * active_power() - Provide power at the next performance state of * a device + * @dev : Device for which we do this operation (can be a CPU) * @power : Active power at the performance state * (modified) * @freq : Frequency at the performance state in kHz * (modified) - * @dev : Device for which we do this operation (can be a CPU) * * active_power() must find the lowest performance state of 'dev' above * 'freq' and update 'power' and 'freq' to the matching active power @@ -117,8 +117,8 @@ struct em_data_callback { * * Return 0 on success. */ - int (*active_power)(unsigned long *power, unsigned long *freq, - struct device *dev); + int (*active_power)(struct device *dev, unsigned long *power, + unsigned long *freq); /** * get_cost() - Provide the cost at the given performance state of -- cgit From ce1cb680ff1c5b88505f878137796b1723e00193 Mon Sep 17 00:00:00 2001 From: David Cohen Date: Thu, 24 Mar 2022 08:07:30 +0000 Subject: PM: sleep: enable dynamic debug support within pm_pr_dbg() Currently pm_pr_dbg() is used to filter kernel pm debug messages based on pm_debug_messages_on flag. The problem is if we enable/disable this flag it will affect all pm_pr_dbg() calls at once, so we can't individually control them. This patch changes pm_pr_dbg() implementation as such: - If pm_debug_messages_on is enabled, print the message. - If pm_debug_messages_on is disabled and CONFIG_DYNAMIC_DEBUG is enabled, only print the messages explicitly enabled on /sys/kernel/debug/dynamic_debug/control. - If pm_debug_messages_on is disabled and CONFIG_DYNAMIC_DEBUG is disabled, don't print the message. Signed-off-by: David Cohen Signed-off-by: Rafael J. Wysocki --- include/linux/suspend.h | 44 +++++++++++++++++++++++++++++++++++++++----- 1 file changed, 39 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/suspend.h b/include/linux/suspend.h index 300273ff40cc..70f2921e2e70 100644 --- a/include/linux/suspend.h +++ b/include/linux/suspend.h @@ -542,22 +542,56 @@ static inline void unlock_system_sleep(void) {} #ifdef CONFIG_PM_SLEEP_DEBUG extern bool pm_print_times_enabled; extern bool pm_debug_messages_on; -extern __printf(2, 3) void __pm_pr_dbg(bool defer, const char *fmt, ...); +static inline int pm_dyn_debug_messages_on(void) +{ +#ifdef CONFIG_DYNAMIC_DEBUG + return 1; +#else + return 0; +#endif +} +#ifndef pr_fmt +#define pr_fmt(fmt) "PM: " fmt +#endif +#define __pm_pr_dbg(fmt, ...) \ + do { \ + if (pm_debug_messages_on) \ + printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__); \ + else if (pm_dyn_debug_messages_on()) \ + pr_debug(fmt, ##__VA_ARGS__); \ + } while (0) +#define __pm_deferred_pr_dbg(fmt, ...) \ + do { \ + if (pm_debug_messages_on) \ + printk_deferred(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__); \ + } while (0) #else #define pm_print_times_enabled (false) #define pm_debug_messages_on (false) #include -#define __pm_pr_dbg(defer, fmt, ...) \ - no_printk(KERN_DEBUG fmt, ##__VA_ARGS__) +#define __pm_pr_dbg(fmt, ...) \ + no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__) +#define __pm_deferred_pr_dbg(fmt, ...) \ + no_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__) #endif +/** + * pm_pr_dbg - print pm sleep debug messages + * + * If pm_debug_messages_on is enabled, print message. + * If pm_debug_messages_on is disabled and CONFIG_DYNAMIC_DEBUG is enabled, + * print message only from instances explicitly enabled on dynamic debug's + * control. + * If pm_debug_messages_on is disabled and CONFIG_DYNAMIC_DEBUG is disabled, + * don't print message. + */ #define pm_pr_dbg(fmt, ...) \ - __pm_pr_dbg(false, fmt, ##__VA_ARGS__) + __pm_pr_dbg(fmt, ##__VA_ARGS__) #define pm_deferred_pr_dbg(fmt, ...) \ - __pm_pr_dbg(true, fmt, ##__VA_ARGS__) + __pm_deferred_pr_dbg(fmt, ##__VA_ARGS__) #ifdef CONFIG_PM_AUTOSLEEP -- cgit From 1227418989346af3af179742cf42ce842e0ad484 Mon Sep 17 00:00:00 2001 From: Dov Murik Date: Tue, 12 Apr 2022 21:21:24 +0000 Subject: efi: Save location of EFI confidential computing area Confidential computing (coco) hardware such as AMD SEV (Secure Encrypted Virtualization) allows a guest owner to inject secrets into the VMs memory without the host/hypervisor being able to read them. Firmware support for secret injection is available in OVMF, which reserves a memory area for secret injection and includes a pointer to it the in EFI config table entry LINUX_EFI_COCO_SECRET_TABLE_GUID. If EFI exposes such a table entry, uefi_init() will keep a pointer to the EFI config table entry in efi.coco_secret, so it can be used later by the kernel (specifically drivers/virt/coco/efi_secret). It will also appear in the kernel log as "CocoSecret=ADDRESS"; for example: [ 0.000000] efi: EFI v2.70 by EDK II [ 0.000000] efi: CocoSecret=0x7f22e680 SMBIOS=0x7f541000 ACPI=0x7f77e000 ACPI 2.0=0x7f77e014 MEMATTR=0x7ea0c018 The new functionality can be enabled with CONFIG_EFI_COCO_SECRET=y. Signed-off-by: Dov Murik Reviewed-by: Gerd Hoffmann Link: https://lore.kernel.org/r/20220412212127.154182-2-dovmurik@linux.ibm.com Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index ccd4d3f91c98..771d4cd06b56 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -405,6 +405,7 @@ void efi_native_runtime_setup(void); #define LINUX_EFI_MEMRESERVE_TABLE_GUID EFI_GUID(0x888eb0c6, 0x8ede, 0x4ff5, 0xa8, 0xf0, 0x9a, 0xee, 0x5c, 0xb9, 0x77, 0xc2) #define LINUX_EFI_INITRD_MEDIA_GUID EFI_GUID(0x5568e427, 0x68fc, 0x4f3d, 0xac, 0x74, 0xca, 0x55, 0x52, 0x31, 0xcc, 0x68) #define LINUX_EFI_MOK_VARIABLE_TABLE_GUID EFI_GUID(0xc451ed2b, 0x9694, 0x45d3, 0xba, 0xba, 0xed, 0x9f, 0x89, 0x88, 0xa3, 0x89) +#define LINUX_EFI_COCO_SECRET_AREA_GUID EFI_GUID(0xadf956ad, 0xe98c, 0x484c, 0xae, 0x11, 0xb5, 0x1c, 0x7d, 0x33, 0x64, 0x47) /* OEM GUIDs */ #define DELLEMC_EFI_RCI2_TABLE_GUID EFI_GUID(0x2d9f28a2, 0xa886, 0x456a, 0x97, 0xa8, 0xf1, 0x1e, 0xf2, 0x4f, 0xf4, 0x55) @@ -596,6 +597,7 @@ extern struct efi { unsigned long tpm_log; /* TPM2 Event Log table */ unsigned long tpm_final_log; /* TPM2 Final Events Log table */ unsigned long mokvar_table; /* MOK variable config table */ + unsigned long coco_secret; /* Confidential computing secret table */ efi_get_time_t *get_time; efi_set_time_t *set_time; @@ -1335,4 +1337,12 @@ extern void efifb_setup_from_dmi(struct screen_info *si, const char *opt); static inline void efifb_setup_from_dmi(struct screen_info *si, const char *opt) { } #endif +struct linux_efi_coco_secret_area { + u64 base_pa; + u64 size; +}; + +/* Header of a populated EFI secret area */ +#define EFI_SECRET_TABLE_HEADER_GUID EFI_GUID(0x1e74f542, 0x71dd, 0x4d66, 0x96, 0x3e, 0xef, 0x42, 0x87, 0xff, 0x17, 0x3b) + #endif /* _LINUX_EFI_H */ -- cgit From aa480379d8bdb33920d68acfd90f823c8af32578 Mon Sep 17 00:00:00 2001 From: Jan Kiszka Date: Fri, 4 Mar 2022 07:36:37 +0100 Subject: efi: Add missing prototype for efi_capsule_setup_info Fixes "no previous declaration for 'efi_capsule_setup_info'" warnings under W=1. Fixes: 2959c95d510c ("efi/capsule: Add support for Quark security header") Signed-off-by: Jan Kiszka Link: https://lore.kernel.org/r/c28d3f86-dd72-27d1-e2c2-40971b8da6bd@siemens.com Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index 771d4cd06b56..fd266198f9d8 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -213,6 +213,8 @@ struct capsule_info { size_t page_bytes_remain; }; +int efi_capsule_setup_info(struct capsule_info *cap_info, void *kbuff, + size_t hdr_bytes); int __efi_capsule_setup_info(struct capsule_info *cap_info); /* -- cgit From 1ef3342a934e235aca72b4bcc0d6854d80a65077 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Wed, 13 Apr 2022 10:10:36 -0300 Subject: vfio/pci: Fix vf_token mechanism when device-specific VF drivers are used get_pf_vdev() tries to check if a PF is a VFIO PF by looking at the driver: if (pci_dev_driver(physfn) != pci_dev_driver(vdev->pdev)) { However now that we have multiple VF and PF drivers this is no longer reliable. This means that security tests realted to vf_token can be skipped by mixing and matching different VFIO PCI drivers. Instead of trying to use the driver core to find the PF devices maintain a linked list of all PF vfio_pci_core_device's that we have called pci_enable_sriov() on. When registering a VF just search the list to see if the PF is present and record the match permanently in the struct. PCI core locking prevents a PF from passing pci_disable_sriov() while VF drivers are attached so the VFIO owned PF becomes a static property of the VF. In common cases where vfio does not own the PF the global list remains empty and the VF's pointer is statically NULL. This also fixes a lockdep splat from recursive locking of the vfio_group::device_lock between vfio_device_get_from_name() and vfio_device_get_from_dev(). If the VF and PF share the same group this would deadlock. Fixes: ff53edf6d6ab ("vfio/pci: Split the pci_driver code out of vfio_pci_core.c") Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/0-v3-876570980634+f2e8-vfio_vf_token_jgg@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio_pci_core.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index 74a4a0f17b28..48f2dd3c568c 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -133,6 +133,8 @@ struct vfio_pci_core_device { struct mutex ioeventfds_lock; struct list_head ioeventfds_list; struct vfio_pci_vf_token *vf_token; + struct list_head sriov_pfs_item; + struct vfio_pci_core_device *sriov_pf_core_dev; struct notifier_block nb; struct mutex vma_lock; struct list_head vma_list; -- cgit From 002752af7b89b74c64fe6bec8c5fde3d3a7810d8 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Fri, 8 Apr 2022 21:48:40 +0300 Subject: device property: Allow error pointer to be passed to fwnode APIs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Some of the fwnode APIs might return an error pointer instead of NULL or valid fwnode handle. The result of such API call may be considered optional and hence the test for it is usually done in a form of fwnode = fwnode_find_reference(...); if (IS_ERR(fwnode)) ...error handling... Nevertheless the resulting fwnode may have bumped the reference count and hence caller of the above API is obliged to call fwnode_handle_put(). Since fwnode may be not valid either as NULL or error pointer the check has to be performed there. This approach uglifies the code and adds a point of making a mistake, i.e. forgetting about error point case. To prevent this, allow an error pointer to be passed to the fwnode APIs. Fixes: 83b34afb6b79 ("device property: Introduce fwnode_find_reference()") Reported-by: Nuno Sá Tested-by: Nuno Sá Acked-by: Nuno Sá Reviewed-by: Sakari Ailus Reviewed-by: Heikki Krogerus Signed-off-by: Andy Shevchenko Tested-by: Michael Walle Signed-off-by: Rafael J. Wysocki --- include/linux/fwnode.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h index 6ab69871b06d..9a81c4410b9f 100644 --- a/include/linux/fwnode.h +++ b/include/linux/fwnode.h @@ -153,12 +153,12 @@ struct fwnode_operations { int (*add_links)(struct fwnode_handle *fwnode); }; -#define fwnode_has_op(fwnode, op) \ - ((fwnode) && (fwnode)->ops && (fwnode)->ops->op) +#define fwnode_has_op(fwnode, op) \ + (!IS_ERR_OR_NULL(fwnode) && (fwnode)->ops && (fwnode)->ops->op) + #define fwnode_call_int_op(fwnode, op, ...) \ - (fwnode ? (fwnode_has_op(fwnode, op) ? \ - (fwnode)->ops->op(fwnode, ## __VA_ARGS__) : -ENXIO) : \ - -EINVAL) + (fwnode_has_op(fwnode, op) ? \ + (fwnode)->ops->op(fwnode, ## __VA_ARGS__) : (IS_ERR_OR_NULL(fwnode) ? -EINVAL : -ENXIO)) #define fwnode_call_bool_op(fwnode, op, ...) \ (fwnode_has_op(fwnode, op) ? \ -- cgit From 87ffea09470d94c93dd6a5a22d4b2216b395d1ea Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Fri, 8 Apr 2022 21:48:41 +0300 Subject: device property: Introduce fwnode_for_each_parent_node() In a few cases the functionality of fwnode_for_each_parent_node() is already in use. Introduce a common helper macro for it. It may be used by others as well in the future. Signed-off-by: Andy Shevchenko Reviewed-by: Sakari Ailus Signed-off-by: Rafael J. Wysocki --- include/linux/property.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/property.h b/include/linux/property.h index 4cd4b326941f..15d6863ae962 100644 --- a/include/linux/property.h +++ b/include/linux/property.h @@ -83,9 +83,14 @@ struct fwnode_handle *fwnode_find_reference(const struct fwnode_handle *fwnode, const char *fwnode_get_name(const struct fwnode_handle *fwnode); const char *fwnode_get_name_prefix(const struct fwnode_handle *fwnode); + struct fwnode_handle *fwnode_get_parent(const struct fwnode_handle *fwnode); -struct fwnode_handle *fwnode_get_next_parent( - struct fwnode_handle *fwnode); +struct fwnode_handle *fwnode_get_next_parent(struct fwnode_handle *fwnode); + +#define fwnode_for_each_parent_node(fwnode, parent) \ + for (parent = fwnode_get_parent(fwnode); parent; \ + parent = fwnode_get_next_parent(parent)) + struct device *fwnode_get_next_parent_dev(struct fwnode_handle *fwnode); unsigned int fwnode_count_parents(const struct fwnode_handle *fwn); struct fwnode_handle *fwnode_get_nth_parent(struct fwnode_handle *fwn, -- cgit From 022fe6bc8f3bd0f2b1f9bc778bacc38020dfd420 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Fri, 8 Apr 2022 21:48:42 +0300 Subject: device property: Drop 'test' prefix in parameters of fwnode_is_ancestor_of() The part 'is' in the function name implies the test against something. Drop unnecessary 'test' prefix in the fwnode_is_ancestor_of() parameters. No functional change intended. Signed-off-by: Andy Shevchenko Reviewed-by: Sakari Ailus Signed-off-by: Rafael J. Wysocki --- include/linux/property.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/property.h b/include/linux/property.h index 15d6863ae962..fc24d45632eb 100644 --- a/include/linux/property.h +++ b/include/linux/property.h @@ -95,8 +95,7 @@ struct device *fwnode_get_next_parent_dev(struct fwnode_handle *fwnode); unsigned int fwnode_count_parents(const struct fwnode_handle *fwn); struct fwnode_handle *fwnode_get_nth_parent(struct fwnode_handle *fwn, unsigned int depth); -bool fwnode_is_ancestor_of(struct fwnode_handle *test_ancestor, - struct fwnode_handle *test_child); +bool fwnode_is_ancestor_of(struct fwnode_handle *ancestor, struct fwnode_handle *child); struct fwnode_handle *fwnode_get_next_child_node( const struct fwnode_handle *fwnode, struct fwnode_handle *child); struct fwnode_handle *fwnode_get_next_available_child_node( -- cgit From 4e140f59d285c1ca1e5c81b4c13e27366865bd09 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 10 Jan 2022 23:15:27 +0000 Subject: mm/usercopy: Check kmap addresses properly If you are copying to an address in the kmap region, you may not copy across a page boundary, no matter what the size of the underlying allocation. You can't kmap() a slab page because slab pages always come from low memory. Signed-off-by: Matthew Wilcox (Oracle) Acked-by: Kees Cook Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220110231530.665970-2-willy@infradead.org --- include/linux/highmem-internal.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/highmem-internal.h b/include/linux/highmem-internal.h index a77be5630209..337bd9f32921 100644 --- a/include/linux/highmem-internal.h +++ b/include/linux/highmem-internal.h @@ -149,6 +149,11 @@ static inline void totalhigh_pages_add(long count) atomic_long_add(count, &_totalhigh_pages); } +static inline bool is_kmap_addr(const void *x) +{ + unsigned long addr = (unsigned long)x; + return addr >= PKMAP_ADDR(0) && addr < PKMAP_ADDR(LAST_PKMAP); +} #else /* CONFIG_HIGHMEM */ static inline struct page *kmap_to_page(void *addr) @@ -234,6 +239,11 @@ static inline void __kunmap_atomic(void *addr) static inline unsigned int nr_free_highpages(void) { return 0; } static inline unsigned long totalhigh_pages(void) { return 0UL; } +static inline bool is_kmap_addr(const void *x) +{ + return false; +} + #endif /* CONFIG_HIGHMEM */ /* -- cgit From e6f3b3c9c109ed57230996cf4a4c1b8ae7e36a81 Mon Sep 17 00:00:00 2001 From: Sami Tolvanen Date: Tue, 5 Apr 2022 15:16:18 -0700 Subject: cfi: Use __builtin_function_start Clang 14 added support for the __builtin_function_start function, which allows us to implement the function_nocfi macro without architecture-specific inline assembly and in a way that also works with static initializers. Change CONFIG_CFI_CLANG to depend on Clang >= 14, define function_nocfi using __builtin_function_start, and remove the arm64 inline assembly implementation. Link: https://github.com/llvm/llvm-project/commit/ec2e26eaf63558934f5b73a6e530edc453cf9508 Link: https://github.com/ClangBuiltLinux/linux/issues/1353 Signed-off-by: Sami Tolvanen Reviewed-by: Nick Desaulniers Reviewed-by: Mark Rutland Tested-by: Mark Rutland Acked-by: Will Deacon # arm64 Reviewed-by: Nathan Chancellor Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220405221618.633743-1-samitolvanen@google.com --- include/linux/compiler-clang.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index babb1347148c..c84fec767445 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -69,6 +69,16 @@ #define __nocfi __attribute__((__no_sanitize__("cfi"))) #define __cficanonical __attribute__((__cfi_canonical_jump_table__)) +#if defined(CONFIG_CFI_CLANG) +/* + * With CONFIG_CFI_CLANG, the compiler replaces function address + * references with the address of the function's CFI jump table + * entry. The function_nocfi macro always returns the address of the + * actual function instead. + */ +#define function_nocfi(x) __builtin_function_start(x) +#endif + /* * Turn individual warnings and errors on and off locally, depending * on version. -- cgit From 63cec1389e116ae0e2a15e612a5b49651e04be3f Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 4 Apr 2022 18:09:14 -0700 Subject: fscrypt: split up FS_CRYPTO_BLOCK_SIZE FS_CRYPTO_BLOCK_SIZE is neither the filesystem block size nor the granularity of encryption. Rather, it defines two logically separate constraints that both arise from the block size of the AES cipher: - The alignment required for the lengths of file contents blocks - The minimum input/output length for the filenames encryption modes Since there are way too many things called the "block size", and the connection with the AES block size is not easily understood, split FS_CRYPTO_BLOCK_SIZE into two constants FSCRYPT_CONTENTS_ALIGNMENT and FSCRYPT_FNAME_MIN_MSG_LEN that more clearly describe what they are. Signed-off-by: Eric Biggers Link: https://lore.kernel.org/r/20220405010914.18519-1-ebiggers@kernel.org --- include/linux/fscrypt.h | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h index 50d92d805bd8..efc7f96e5e26 100644 --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -18,7 +18,17 @@ #include #include -#define FS_CRYPTO_BLOCK_SIZE 16 +/* + * The lengths of all file contents blocks must be divisible by this value. + * This is needed to ensure that all contents encryption modes will work, as + * some of the supported modes don't support arbitrarily byte-aligned messages. + * + * Since the needed alignment is 16 bytes, most filesystems will meet this + * requirement naturally, as typical block sizes are powers of 2. However, if a + * filesystem can generate arbitrarily byte-aligned block lengths (e.g., via + * compression), then it will need to pad to this alignment before encryption. + */ +#define FSCRYPT_CONTENTS_ALIGNMENT 16 union fscrypt_policy; struct fscrypt_info; -- cgit From 9554e908fb5d02e48a681d1eca180225bf109e83 Mon Sep 17 00:00:00 2001 From: Brian Gerst Date: Fri, 25 Mar 2022 11:39:51 -0400 Subject: ELF: Remove elf_core_copy_kernel_regs() x86-32 was the last architecture that implemented separate user and kernel registers. Signed-off-by: Brian Gerst Signed-off-by: Borislav Petkov Reviewed-by: Thomas Gleixner Acked-by: Andy Lutomirski Acked-by: Michael Ellerman Link: https://lore.kernel.org/r/20220325153953.162643-3-brgerst@gmail.com --- include/linux/elfcore.h | 9 --------- 1 file changed, 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/elfcore.h b/include/linux/elfcore.h index f8e206e82476..346a8b56cdc8 100644 --- a/include/linux/elfcore.h +++ b/include/linux/elfcore.h @@ -84,15 +84,6 @@ static inline void elf_core_copy_regs(elf_gregset_t *elfregs, struct pt_regs *re #endif } -static inline void elf_core_copy_kernel_regs(elf_gregset_t *elfregs, struct pt_regs *regs) -{ -#ifdef ELF_CORE_COPY_KERNEL_REGS - ELF_CORE_COPY_KERNEL_REGS((*elfregs), regs); -#else - elf_core_copy_regs(elfregs, regs); -#endif -} - static inline int elf_core_copy_task_regs(struct task_struct *t, elf_gregset_t* elfregs) { #if defined (ELF_CORE_COPY_TASK_REGS) -- cgit From 64b97df995f0c943be469a019d6117c89e2131bc Mon Sep 17 00:00:00 2001 From: Lech Perczak Date: Wed, 13 Apr 2022 03:44:14 +0200 Subject: cdc_ether: export usbnet_cdc_zte_rx_fixup MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Commit bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling") introduces a workaround for certain ZTE modems reporting invalid MAC addresses over CDC-ECM. The same issue was present on their RNDIS interface,which was fixed in commit a5a18bdf7453 ("rndis_host: Set valid random MAC on buggy devices"). However, internal modem of ZTE MF286R router, on its RNDIS interface, also exhibits a second issue fixed already in CDC-ECM, of the device not respecting configured random MAC address. In order to share the fixup for this with rndis_host driver, export the workaround function, which will be re-used in the following commit in rndis_host. Cc: Kristian Evensen Cc: Bjørn Mork Cc: Oliver Neukum Signed-off-by: Lech Perczak Signed-off-by: Paolo Abeni --- include/linux/usb/usbnet.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h index 8336e86ce606..1b4d72d5e891 100644 --- a/include/linux/usb/usbnet.h +++ b/include/linux/usb/usbnet.h @@ -214,6 +214,7 @@ extern int usbnet_ether_cdc_bind(struct usbnet *dev, struct usb_interface *intf) extern int usbnet_cdc_bind(struct usbnet *, struct usb_interface *); extern void usbnet_cdc_unbind(struct usbnet *, struct usb_interface *); extern void usbnet_cdc_status(struct usbnet *, struct urb *); +extern int usbnet_cdc_zte_rx_fixup(struct usbnet *dev, struct sk_buff *skb); /* CDC and RNDIS support the same host-chosen packet filters for IN transfers */ #define DEFAULT_FILTER (USB_CDC_PACKET_TYPE_BROADCAST \ -- cgit From 36e747972d8b4c09e6e3275e31a3acba46e2c4d2 Mon Sep 17 00:00:00 2001 From: Lech Perczak Date: Wed, 13 Apr 2022 03:44:15 +0200 Subject: rndis_host: enable the bogus MAC fixup for ZTE devices from cdc_ether MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Certain ZTE modems, namely: MF823. MF831, MF910, built-in modem from MF286R, expose both CDC-ECM and RNDIS network interfaces. They have a trait of ignoring the locally-administered MAC address configured on the interface both in CDC-ECM and RNDIS part, and this leads to dropping of incoming traffic by the host. However, the workaround was only present in CDC-ECM, and MF286R explicitly requires it in RNDIS mode. Re-use the workaround in rndis_host as well, to fix operation of MF286R module, some versions of which expose only the RNDIS interface. Do so by introducing new flag, RNDIS_DRIVER_DATA_DST_MAC_FIXUP, and testing for it in rndis_rx_fixup. This is required, as RNDIS uses frame batching, and all of the packets inside the batch need the fixup. This might introduce a performance penalty, because test is done for every returned Ethernet frame. Apply the workaround to both "flavors" of RNDIS interfaces, as older ZTE modems, like MF823 found in the wild, report the USB_CLASS_COMM class interfaces, while MF286R reports USB_CLASS_WIRELESS_CONTROLLER. Suggested-by: Bjørn Mork Cc: Kristian Evensen Cc: Oliver Neukum Signed-off-by: Lech Perczak Signed-off-by: Paolo Abeni --- include/linux/usb/rndis_host.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/usb/rndis_host.h b/include/linux/usb/rndis_host.h index 809bccd08455..cc42db51bbba 100644 --- a/include/linux/usb/rndis_host.h +++ b/include/linux/usb/rndis_host.h @@ -197,6 +197,7 @@ struct rndis_keepalive_c { /* IN (optionally OUT) */ /* Flags for driver_info::data */ #define RNDIS_DRIVER_DATA_POLL_STATUS 1 /* poll status before control */ +#define RNDIS_DRIVER_DATA_DST_MAC_FIXUP 2 /* device ignores configured MAC address */ extern void rndis_status(struct usbnet *dev, struct urb *urb); extern int -- cgit From 3dc6ffae2da201284cb24af66af77ee0bbb2efaa Mon Sep 17 00:00:00 2001 From: Kurt Kanzenbach Date: Thu, 14 Apr 2022 11:18:03 +0200 Subject: timekeeping: Introduce fast accessor to clock tai Introduce fast/NMI safe accessor to clock tai for tracing. The Linux kernel tracing infrastructure has support for using different clocks to generate timestamps for trace events. Especially in TSN networks it's useful to have TAI as trace clock, because the application scheduling is done in accordance to the network time, which is based on TAI. With a tai trace_clock in place, it becomes very convenient to correlate network activity with Linux kernel application traces. Use the same implementation as ktime_get_boot_fast_ns() does by reading the monotonic time and adding the TAI offset. The same limitations as for the fast boot implementation apply. The TAI offset may change at run time e.g., by setting the time or using adjtimex() with an offset. However, these kind of offset changes are rare events. Nevertheless, the user has to be aware and deal with it in post processing. An alternative approach would be to use the same implementation as ktime_get_real_fast_ns() does. However, this requires to add an additional u64 member to the tk_read_base struct. This struct together with a seqcount is designed to fit into a single cache line on 64 bit architectures. Adding a new member would violate this constraint. Signed-off-by: Kurt Kanzenbach Signed-off-by: Thomas Gleixner Cc: Steven Rostedt Link: https://lore.kernel.org/r/20220414091805.89667-2-kurt@linutronix.de --- include/linux/timekeeping.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/timekeeping.h b/include/linux/timekeeping.h index 78a98bdff76d..fe1e467ba046 100644 --- a/include/linux/timekeeping.h +++ b/include/linux/timekeeping.h @@ -177,6 +177,7 @@ static inline u64 ktime_get_raw_ns(void) extern u64 ktime_get_mono_fast_ns(void); extern u64 ktime_get_raw_fast_ns(void); extern u64 ktime_get_boot_fast_ns(void); +extern u64 ktime_get_tai_fast_ns(void); extern u64 ktime_get_real_fast_ns(void); /* -- cgit From af47d8033fc731f19600efd27ba4a7d0fdfcc77c Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Thu, 14 Apr 2022 21:42:48 +0300 Subject: gpiolib: Introduce a helper to get first GPIO controller node Introduce a helper to get first GPIO controller node which drivers may want to use. Signed-off-by: Andy Shevchenko Tested-by: Marek Szyprowski --- include/linux/gpio/driver.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h index 12de0b22b4ef..83e2d72e51bb 100644 --- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -766,4 +766,14 @@ static inline unsigned int gpiochip_node_count(struct device *dev) return count; } +static inline struct fwnode_handle *gpiochip_node_get_first(struct device *dev) +{ + struct fwnode_handle *fwnode; + + for_each_gpiochip_node(dev, fwnode) + return fwnode; + + return NULL; +} + #endif /* __LINUX_GPIO_DRIVER_H */ -- cgit From f1724d397c60d296c0805c95a46ae7fc7163b70c Mon Sep 17 00:00:00 2001 From: Kai Ye Date: Sat, 9 Apr 2022 16:03:18 +0800 Subject: crypto: hisilicon/qm - add register checking for ACC Add register detection function to accelerator. Provided a tool that user can checking differential register through Debugfs. e.g. cd /sys/kernel/debug/hisi_zip//zip_dfx cat diff_regs Signed-off-by: Longfang Liu Signed-off-by: Kai Ye Signed-off-by: Herbert Xu --- include/linux/hisi_acc_qm.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h index 177f7b7cd414..39acc0316a60 100644 --- a/include/linux/hisi_acc_qm.h +++ b/include/linux/hisi_acc_qm.h @@ -168,6 +168,12 @@ enum qm_vf_state { QM_NOT_READY, }; +struct dfx_diff_registers { + u32 *regs; + u32 reg_offset; + u32 reg_len; +}; + struct qm_dfx { atomic64_t err_irq_cnt; atomic64_t aeq_irq_cnt; @@ -190,6 +196,8 @@ struct qm_debug { struct dentry *debug_root; struct dentry *qm_d; struct debugfs_file files[DEBUG_FILE_NUM]; + struct dfx_diff_registers *qm_diff_regs; + struct dfx_diff_registers *acc_diff_regs; }; struct qm_shaper_factor { @@ -448,6 +456,12 @@ int hisi_qm_sriov_disable(struct pci_dev *pdev, bool is_frozen); int hisi_qm_sriov_configure(struct pci_dev *pdev, int num_vfs); void hisi_qm_dev_err_init(struct hisi_qm *qm); void hisi_qm_dev_err_uninit(struct hisi_qm *qm); +int hisi_qm_diff_regs_init(struct hisi_qm *qm, + struct dfx_diff_registers *dregs, int reg_len); +void hisi_qm_diff_regs_uninit(struct hisi_qm *qm, int reg_len); +void hisi_qm_acc_diff_regs_dump(struct hisi_qm *qm, struct seq_file *s, + struct dfx_diff_registers *dregs, int regs_len); + pci_ers_result_t hisi_qm_dev_err_detected(struct pci_dev *pdev, pci_channel_state_t state); pci_ers_result_t hisi_qm_dev_slot_reset(struct pci_dev *pdev); -- cgit From a888ccd6c66683a49977ba6a2b91fe52fbec9367 Mon Sep 17 00:00:00 2001 From: Kai Ye Date: Sat, 9 Apr 2022 16:03:25 +0800 Subject: crypto: hisilicon/qm - add last word dumping for ACC Add last word dumping function during acc engines controller reset. The last words are reported to the printed information during the reset. The dmesg information included qm debugging registers and engine debugging registers. It can help to improve debugging capability. Signed-off-by: Kai Ye Signed-off-by: Herbert Xu --- include/linux/hisi_acc_qm.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h index 39acc0316a60..e5522eaf88fd 100644 --- a/include/linux/hisi_acc_qm.h +++ b/include/linux/hisi_acc_qm.h @@ -196,6 +196,9 @@ struct qm_debug { struct dentry *debug_root; struct dentry *qm_d; struct debugfs_file files[DEBUG_FILE_NUM]; + unsigned int *qm_last_words; + /* ACC engines recoreding last regs */ + unsigned int *last_words; struct dfx_diff_registers *qm_diff_regs; struct dfx_diff_registers *acc_diff_regs; }; @@ -251,6 +254,7 @@ struct hisi_qm_err_ini { void (*open_sva_prefetch)(struct hisi_qm *qm); void (*close_sva_prefetch)(struct hisi_qm *qm); void (*log_dev_hw_err)(struct hisi_qm *qm, u32 err_sts); + void (*show_last_dfx_regs)(struct hisi_qm *qm); void (*err_info_init)(struct hisi_qm *qm); }; -- cgit From f6f586102add59d57bcc6eea06fdeaae11bb17a1 Mon Sep 17 00:00:00 2001 From: Eric Tremblay Date: Wed, 30 Mar 2022 12:46:40 +0200 Subject: serial: 8250: Handle UART without interrupt on TEMT using em485 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Introduce the UART_CAP_NOTEMT capability. The capability indicates that the UART doesn't have an interrupt available on TEMT. In the case where the device does not support it, we calculate the maximum time it could take for the transmitter to empty the shift register. When we get in the situation where we get the THRE interrupt, we check if the TEMT bit is set. If it's not, we start the a timer and recall __stop_tx() after the delay. The transmit sequence is a bit modified when the capability is set. The new timer is used between the last interrupt(THRE) and a potential stop_tx timer. Signed-off-by: Giulio Benetti [moved to use added UART_CAP_TEMT] Signed-off-by: Heiko Stuebner [moved to use added UART_CAP_NOTEMT, improve timeout] Signed-off-by: Eric Tremblay [rebased to v5.17, making use of tty_get_frame_size] Signed-off-by: Uwe Kleine-König Link: https://lore.kernel.org/r/20220330104642.229507-2-u.kleine-koenig@pengutronix.de Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_8250.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h index ff84a3ed10ea..de135852107c 100644 --- a/include/linux/serial_8250.h +++ b/include/linux/serial_8250.h @@ -79,7 +79,9 @@ struct uart_8250_ops { struct uart_8250_em485 { struct hrtimer start_tx_timer; /* "rs485 start tx" timer */ struct hrtimer stop_tx_timer; /* "rs485 stop tx" timer */ + struct hrtimer no_temt_timer; /* "rs485 no TEMT interrupt" timer */ struct hrtimer *active_timer; /* pointer to active timer */ + unsigned long no_temt_delay; /* Delay for no_temt_timer */ struct uart_8250_port *port; /* for hrtimer callbacks */ unsigned int tx_stopped:1; /* tx is currently stopped */ }; -- cgit From 4dc84c06a343fcb95fd5a0acb537aefa4ebdd1b0 Mon Sep 17 00:00:00 2001 From: Jie Wang Date: Tue, 12 Apr 2022 10:01:19 +0800 Subject: net: ethtool: extend ringparam set/get APIs for tx_push Currently tx push is a standard driver feature which controls use of a fast path descriptor push. So this patch extends the ringparam APIs and data structures to support set/get tx push by ethtool -G/g. Signed-off-by: Jie Wang Signed-off-by: Guangbin Huang Signed-off-by: Jakub Kicinski --- include/linux/ethtool.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h index 4af58459a1e7..99dc7bfbcd3c 100644 --- a/include/linux/ethtool.h +++ b/include/linux/ethtool.h @@ -71,11 +71,13 @@ enum { * struct kernel_ethtool_ringparam - RX/TX ring configuration * @rx_buf_len: Current length of buffers on the rx ring. * @tcp_data_split: Scatter packet headers and data to separate buffers + * @tx_push: The flag of tx push mode * @cqe_size: Size of TX/RX completion queue event */ struct kernel_ethtool_ringparam { u32 rx_buf_len; u8 tcp_data_split; + u8 tx_push; u32 cqe_size; }; @@ -83,10 +85,12 @@ struct kernel_ethtool_ringparam { * enum ethtool_supported_ring_param - indicator caps for setting ring params * @ETHTOOL_RING_USE_RX_BUF_LEN: capture for setting rx_buf_len * @ETHTOOL_RING_USE_CQE_SIZE: capture for setting cqe_size + * @ETHTOOL_RING_USE_TX_PUSH: capture for setting tx_push */ enum ethtool_supported_ring_param { ETHTOOL_RING_USE_RX_BUF_LEN = BIT(0), ETHTOOL_RING_USE_CQE_SIZE = BIT(1), + ETHTOOL_RING_USE_TX_PUSH = BIT(2), }; #define __ETH_RSS_HASH_BIT(bit) ((u32)1 << (bit)) -- cgit From 2dfe63e61cc31ee59ce951672b0850b5229cd5b0 Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Thu, 14 Apr 2022 19:13:40 -0700 Subject: mm, kfence: support kmem_dump_obj() for KFENCE objects Calling kmem_obj_info() via kmem_dump_obj() on KFENCE objects has been producing garbage data due to the object not actually being maintained by SLAB or SLUB. Fix this by implementing __kfence_obj_info() that copies relevant information to struct kmem_obj_info when the object was allocated by KFENCE; this is called by a common kmem_obj_info(), which also calls the slab/slub/slob specific variant now called __kmem_obj_info(). For completeness, kmem_dump_obj() now displays if the object was allocated by KFENCE. Link: https://lore.kernel.org/all/20220323090520.GG16885@xsang-OptiPlex-9020/ Link: https://lkml.kernel.org/r/20220406131558.3558585-1-elver@google.com Fixes: b89fb5ef0ce6 ("mm, kfence: insert KFENCE hooks for SLUB") Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB") Signed-off-by: Marco Elver Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reported-by: kernel test robot Acked-by: Vlastimil Babka [slab] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kfence.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kfence.h b/include/linux/kfence.h index f49e64222628..726857a4b680 100644 --- a/include/linux/kfence.h +++ b/include/linux/kfence.h @@ -204,6 +204,22 @@ static __always_inline __must_check bool kfence_free(void *addr) */ bool __must_check kfence_handle_page_fault(unsigned long addr, bool is_write, struct pt_regs *regs); +#ifdef CONFIG_PRINTK +struct kmem_obj_info; +/** + * __kfence_obj_info() - fill kmem_obj_info struct + * @kpp: kmem_obj_info to be filled + * @object: the object + * + * Return: + * * false - not a KFENCE object + * * true - a KFENCE object, filled @kpp + * + * Copies information to @kpp for KFENCE objects. + */ +bool __kfence_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab); +#endif + #else /* CONFIG_KFENCE */ static inline bool is_kfence_address(const void *addr) { return false; } @@ -221,6 +237,14 @@ static inline bool __must_check kfence_handle_page_fault(unsigned long addr, boo return false; } +#ifdef CONFIG_PRINTK +struct kmem_obj_info; +static inline bool __kfence_obj_info(struct kmem_obj_info *kpp, void *object, struct slab *slab) +{ + return false; +} +#endif + #endif #endif /* _LINUX_KFENCE_H */ -- cgit From f9a2fb73318eb4dbf8cd84866b8b0dd012d8b116 Mon Sep 17 00:00:00 2001 From: Arun Ajith S Date: Fri, 15 Apr 2022 08:34:02 +0000 Subject: net/ipv6: Introduce accept_unsolicited_na knob to implement router-side changes for RFC9131 Add a new neighbour cache entry in STALE state for routers on receiving an unsolicited (gratuitous) neighbour advertisement with target link-layer-address option specified. This is similar to the arp_accept configuration for IPv4. A new sysctl endpoint is created to turn on this behaviour: /proc/sys/net/ipv6/conf/interface/accept_unsolicited_na. Signed-off-by: Arun Ajith S Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/ipv6.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index 16870f86c74d..918bfea4ef5f 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -61,6 +61,7 @@ struct ipv6_devconf { __s32 suppress_frag_ndisc; __s32 accept_ra_mtu; __s32 drop_unsolicited_na; + __s32 accept_unsolicited_na; struct ipv6_stable_secret { bool initialized; struct in6_addr secret; -- cgit From da40b613f89c43c58986e6f30560ad6573a4d569 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 15 Apr 2022 17:10:41 -0700 Subject: tcp: add drop reason support to tcp_validate_incoming() Creates four new drop reasons for the following cases: 1) packet being rejected by RFC 7323 PAWS check 2) packet being rejected by SEQUENCE check 3) Invalid RST packet 4) Invalid SYN packet Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/skbuff.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 0ef11df1bc67..a903da1fa0ed 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -381,6 +381,12 @@ enum skb_drop_reason { * the ofo queue, corresponding to * LINUX_MIB_TCPOFOMERGE */ + SKB_DROP_REASON_TCP_RFC7323_PAWS, /* PAWS check, corresponding to + * LINUX_MIB_PAWSESTABREJECTED + */ + SKB_DROP_REASON_TCP_INVALID_SEQUENCE, /* Not acceptable SEQ field */ + SKB_DROP_REASON_TCP_RESET, /* Invalid RST packet */ + SKB_DROP_REASON_TCP_INVALID_SYN, /* Incoming packet has unexpected SYN flag */ SKB_DROP_REASON_IP_OUTNOROUTES, /* route lookup failed */ SKB_DROP_REASON_BPF_CGROUP_EGRESS, /* dropped by * BPF_PROG_TYPE_CGROUP_SKB -- cgit From 669da7a71890b2b2a31a7e9571c0fdf1123e26ef Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 15 Apr 2022 17:10:43 -0700 Subject: tcp: add drop reasons to tcp_rcv_state_process() Add basic support for drop reasons in tcp_rcv_state_process() Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/skbuff.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a903da1fa0ed..6f1410b5ff13 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -387,6 +387,9 @@ enum skb_drop_reason { SKB_DROP_REASON_TCP_INVALID_SEQUENCE, /* Not acceptable SEQ field */ SKB_DROP_REASON_TCP_RESET, /* Invalid RST packet */ SKB_DROP_REASON_TCP_INVALID_SYN, /* Incoming packet has unexpected SYN flag */ + SKB_DROP_REASON_TCP_CLOSE, /* TCP socket in CLOSE state */ + SKB_DROP_REASON_TCP_FASTOPEN, /* dropped by FASTOPEN request socket */ + SKB_DROP_REASON_TCP_OLD_ACK, /* TCP ACK is old, but in window */ SKB_DROP_REASON_IP_OUTNOROUTES, /* route lookup failed */ SKB_DROP_REASON_BPF_CGROUP_EGRESS, /* dropped by * BPF_PROG_TYPE_CGROUP_SKB -- cgit From 4b506af9c5b8de0da34097d50d9448dfb33d70c3 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 15 Apr 2022 17:10:44 -0700 Subject: tcp: add two drop reasons for tcp_ack() Add TCP_TOO_OLD_ACK and TCP_ACK_UNSENT_DATA drop reasons so that tcp_rcv_established() can report them. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/skbuff.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 6f1410b5ff13..9ff5557b1909 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -390,6 +390,8 @@ enum skb_drop_reason { SKB_DROP_REASON_TCP_CLOSE, /* TCP socket in CLOSE state */ SKB_DROP_REASON_TCP_FASTOPEN, /* dropped by FASTOPEN request socket */ SKB_DROP_REASON_TCP_OLD_ACK, /* TCP ACK is old, but in window */ + SKB_DROP_REASON_TCP_TOO_OLD_ACK, /* TCP ACK is too old */ + SKB_DROP_REASON_TCP_ACK_UNSENT_DATA, /* TCP ACK for data we haven't sent yet */ SKB_DROP_REASON_IP_OUTNOROUTES, /* route lookup failed */ SKB_DROP_REASON_BPF_CGROUP_EGRESS, /* dropped by * BPF_PROG_TYPE_CGROUP_SKB -- cgit From e7c89ae4078eab24af71ba26b91642e819a4bd7f Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 15 Apr 2022 17:10:45 -0700 Subject: tcp: add drop reason support to tcp_prune_ofo_queue() Add one reason for packets dropped from OFO queue because of memory pressure. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/skbuff.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 9ff5557b1909..ad15ad208b56 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -392,6 +392,7 @@ enum skb_drop_reason { SKB_DROP_REASON_TCP_OLD_ACK, /* TCP ACK is old, but in window */ SKB_DROP_REASON_TCP_TOO_OLD_ACK, /* TCP ACK is too old */ SKB_DROP_REASON_TCP_ACK_UNSENT_DATA, /* TCP ACK for data we haven't sent yet */ + SKB_DROP_REASON_TCP_OFO_QUEUE_PRUNE, /* pruned from TCP OFO queue */ SKB_DROP_REASON_IP_OUTNOROUTES, /* route lookup failed */ SKB_DROP_REASON_BPF_CGROUP_EGRESS, /* dropped by * BPF_PROG_TYPE_CGROUP_SKB -- cgit From 8fbf195798b56e1e87f62d01be636a6425c304c2 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 15 Apr 2022 17:10:48 -0700 Subject: tcp: add drop reason support to tcp_ofo_queue() packets in OFO queue might be redundant, and dropped. tcp_drop() is no longer needed. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/skbuff.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index ad15ad208b56..84d78df60453 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -393,6 +393,7 @@ enum skb_drop_reason { SKB_DROP_REASON_TCP_TOO_OLD_ACK, /* TCP ACK is too old */ SKB_DROP_REASON_TCP_ACK_UNSENT_DATA, /* TCP ACK for data we haven't sent yet */ SKB_DROP_REASON_TCP_OFO_QUEUE_PRUNE, /* pruned from TCP OFO queue */ + SKB_DROP_REASON_TCP_OFO_DROP, /* data already in receive queue */ SKB_DROP_REASON_IP_OUTNOROUTES, /* route lookup failed */ SKB_DROP_REASON_BPF_CGROUP_EGRESS, /* dropped by * BPF_PROG_TYPE_CGROUP_SKB -- cgit From 0df71650c051ab106c921de257f4b38e9e3dd251 Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Thu, 24 Mar 2022 16:35:24 -0400 Subject: block: allow using the per-cpu bio cache from bio_alloc_bioset Replace the BIO_PERCPU_CACHE bio-internal flag with a REQ_ALLOC_CACHE one that can be passed to bio_alloc / bio_alloc_bioset, and implement the percpu cache allocation logic in a helper called from bio_alloc_bioset. This allows any bio_alloc_bioset user to use the percpu caches instead of having the functionality tied to struct kiocb. Signed-off-by: Mike Snitzer [hch: refactored a bit] Signed-off-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220324203526.62306-2-snitzer@kernel.org Signed-off-by: Jens Axboe --- include/linux/bio.h | 2 -- include/linux/blk_types.h | 3 ++- 2 files changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index 278cc81cc1e7..0fe8a51a0778 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -405,8 +405,6 @@ extern int bioset_init_from_src(struct bio_set *bs, struct bio_set *src); struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs, unsigned int opf, gfp_t gfp_mask, struct bio_set *bs); -struct bio *bio_alloc_kiocb(struct kiocb *kiocb, struct block_device *bdev, - unsigned short nr_vecs, unsigned int opf, struct bio_set *bs); struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned short nr_iovecs); extern void bio_put(struct bio *); diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 1973ef9bd40f..4968cb17b13b 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -329,7 +329,6 @@ enum { BIO_QOS_MERGED, /* but went through rq_qos merge path */ BIO_REMAPPED, BIO_ZONE_WRITE_LOCKED, /* Owns a zoned device zone write lock */ - BIO_PERCPU_CACHE, /* can participate in per-cpu alloc cache */ BIO_FLAG_LAST }; @@ -414,6 +413,7 @@ enum req_flag_bits { __REQ_NOUNMAP, /* do not free blocks when zeroing */ __REQ_POLLED, /* caller polls for completion using bio_poll */ + __REQ_ALLOC_CACHE, /* allocate IO from cache if available */ /* for driver use */ __REQ_DRV, @@ -439,6 +439,7 @@ enum req_flag_bits { #define REQ_NOUNMAP (1ULL << __REQ_NOUNMAP) #define REQ_POLLED (1ULL << __REQ_POLLED) +#define REQ_ALLOC_CACHE (1ULL << __REQ_ALLOC_CACHE) #define REQ_DRV (1ULL << __REQ_DRV) #define REQ_SWAP (1ULL << __REQ_SWAP) -- cgit From b53f3dcd705e52a07bd0311870dbfb6842e88c91 Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Thu, 24 Mar 2022 16:35:25 -0400 Subject: block: allow use of per-cpu bio alloc cache by block drivers Refine per-cpu bio alloc cache interfaces so that DM and other block drivers can properly create and use the cache: DM uses bioset_init_from_src() to do its final bioset initialization, so must update bioset_init_from_src() to set BIOSET_PERCPU_CACHE if %src bioset has a cache. Also move bio_clear_polled() to include/linux/bio.h to allow users outside of block core. Signed-off-by: Mike Snitzer Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220324203526.62306-3-snitzer@kernel.org Signed-off-by: Jens Axboe --- include/linux/bio.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index 0fe8a51a0778..8e87f3065042 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -780,6 +780,12 @@ static inline void bio_set_polled(struct bio *bio, struct kiocb *kiocb) bio->bi_opf |= REQ_NOWAIT; } +static inline void bio_clear_polled(struct bio *bio) +{ + /* can't support alloc cache if we turn off polling */ + bio->bi_opf &= ~(REQ_POLLED | REQ_ALLOC_CACHE); +} + struct bio *blk_next_bio(struct bio *bio, struct block_device *bdev, unsigned int nr_pages, unsigned int opf, gfp_t gfp); -- cgit From 066ff571011d8416e903d3d4f1f41e0b5eb91e1d Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 6 Apr 2022 08:12:27 +0200 Subject: block: turn bio_kmalloc into a simple kmalloc wrapper Remove the magic autofree semantics and require the callers to explicitly call bio_init to initialize the bio. This allows bio_free to catch accidental bio_put calls on bio_init()ed bios as well. Signed-off-by: Christoph Hellwig Acked-by: Coly Li Acked-by: Mike Snitzer Link: https://lore.kernel.org/r/20220406061228.410163-5-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/bio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index 8e87f3065042..49eff01fb829 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -405,7 +405,7 @@ extern int bioset_init_from_src(struct bio_set *bs, struct bio_set *src); struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs, unsigned int opf, gfp_t gfp_mask, struct bio_set *bs); -struct bio *bio_kmalloc(gfp_t gfp_mask, unsigned short nr_iovecs); +struct bio *bio_kmalloc(unsigned short nr_vecs, gfp_t gfp_mask); extern void bio_put(struct bio *); struct bio *bio_alloc_clone(struct block_device *bdev, struct bio *bio_src, -- cgit From 10f0d2a517796b8f6dc04fb0cc3e49003ae6b0bc Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 15 Apr 2022 06:52:42 +0200 Subject: block: add a bdev_nonrot helper Add a helper to check the nonrot flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Acked-by: David Sterba [btrfs] Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220415045258.199825-12-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 60d016138997..3a9578e14a6b 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1326,6 +1326,11 @@ static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev) return 0; } +static inline bool bdev_nonrot(struct block_device *bdev) +{ + return blk_queue_nonrot(bdev_get_queue(bdev)); +} + static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); -- cgit From 08e688fdb8f7e862092ae64cee20bc8b463d1046 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 15 Apr 2022 06:52:43 +0200 Subject: block: add a bdev_write_cache helper Add a helper to check the write cache flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Acked-by: David Sterba [btrfs] Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220415045258.199825-13-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 3a9578e14a6b..807a49aa5a27 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1331,6 +1331,11 @@ static inline bool bdev_nonrot(struct block_device *bdev) return blk_queue_nonrot(bdev_get_queue(bdev)); } +static inline bool bdev_write_cache(struct block_device *bdev) +{ + return test_bit(QUEUE_FLAG_WC, &bdev_get_queue(bdev)->queue_flags); +} + static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); -- cgit From a557e82e5a01826f902bd94fc925c03f253cb712 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 15 Apr 2022 06:52:44 +0200 Subject: block: add a bdev_fua helper Add a helper to check the FUA flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220415045258.199825-14-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 807a49aa5a27..075b16d4560e 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -602,7 +602,6 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q); REQ_FAILFAST_DRIVER)) #define blk_queue_quiesced(q) test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags) #define blk_queue_pm_only(q) atomic_read(&(q)->pm_only) -#define blk_queue_fua(q) test_bit(QUEUE_FLAG_FUA, &(q)->queue_flags) #define blk_queue_registered(q) test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags) #define blk_queue_nowait(q) test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags) @@ -1336,6 +1335,11 @@ static inline bool bdev_write_cache(struct block_device *bdev) return test_bit(QUEUE_FLAG_WC, &bdev_get_queue(bdev)->queue_flags); } +static inline bool bdev_fua(struct block_device *bdev) +{ + return test_bit(QUEUE_FLAG_FUA, &bdev_get_queue(bdev)->queue_flags); +} + static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); -- cgit From 36d254893aa6a6e204075c3cce94bb572ac32c04 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 15 Apr 2022 06:52:45 +0200 Subject: block: add a bdev_stable_writes helper Add a helper to check the stable writes flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220415045258.199825-15-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 075b16d4560e..a433798c3343 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1330,6 +1330,12 @@ static inline bool bdev_nonrot(struct block_device *bdev) return blk_queue_nonrot(bdev_get_queue(bdev)); } +static inline bool bdev_stable_writes(struct block_device *bdev) +{ + return test_bit(QUEUE_FLAG_STABLE_WRITES, + &bdev_get_queue(bdev)->queue_flags); +} + static inline bool bdev_write_cache(struct block_device *bdev) { return test_bit(QUEUE_FLAG_WC, &bdev_get_queue(bdev)->queue_flags); -- cgit From 2aba0d19f4d8c8929b4b3b94a9cfde2aa20e6ee2 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 15 Apr 2022 06:52:46 +0200 Subject: block: add a bdev_max_zone_append_sectors helper Add a helper to check the max supported sectors for zone append based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig Acked-by: Damien Le Moal Reviewed-by: Martin K. Petersen Reviewed-by: Johannes Thumshirn Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220415045258.199825-16-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index a433798c3343..f8c50b77543e 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1188,6 +1188,12 @@ static inline unsigned int queue_max_zone_append_sectors(const struct request_qu return min(l->max_zone_append_sectors, l->max_sectors); } +static inline unsigned int +bdev_max_zone_append_sectors(struct block_device *bdev) +{ + return queue_max_zone_append_sectors(bdev_get_queue(bdev)); +} + static inline unsigned queue_logical_block_size(const struct request_queue *q) { int retval = 512; -- cgit From 640f2a23911b8388989547f89d055afbb910b88e Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 15 Apr 2022 06:52:48 +0200 Subject: block: use bdev_alignment_offset in disk_alignment_offset_show This does the same as the open coded variant except for an extra branch, and allows to remove queue_alignment_offset entirely. Signed-off-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Link: https://lore.kernel.org/r/20220415045258.199825-18-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f8c50b77543e..d5346e72e364 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1251,14 +1251,6 @@ bdev_zone_write_granularity(struct block_device *bdev) return queue_zone_write_granularity(bdev_get_queue(bdev)); } -static inline int queue_alignment_offset(const struct request_queue *q) -{ - if (q->limits.misaligned) - return -1; - - return q->limits.alignment_offset; -} - static inline int queue_limit_alignment_offset(struct queue_limits *lim, sector_t sector) { unsigned int granularity = max(lim->physical_block_size, lim->io_min); -- cgit From 89098b075cb74a80083bc4ed6b71d0ee18b6898f Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 15 Apr 2022 06:52:49 +0200 Subject: block: move bdev_alignment_offset and queue_limit_alignment_offset out of line No need to inline these fairly larger helpers. Signed-off-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Link: https://lore.kernel.org/r/20220415045258.199825-19-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 21 +-------------------- 1 file changed, 1 insertion(+), 20 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index d5346e72e364..0a1795ac2627 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1251,26 +1251,7 @@ bdev_zone_write_granularity(struct block_device *bdev) return queue_zone_write_granularity(bdev_get_queue(bdev)); } -static inline int queue_limit_alignment_offset(struct queue_limits *lim, sector_t sector) -{ - unsigned int granularity = max(lim->physical_block_size, lim->io_min); - unsigned int alignment = sector_div(sector, granularity >> SECTOR_SHIFT) - << SECTOR_SHIFT; - - return (granularity + lim->alignment_offset - alignment) % granularity; -} - -static inline int bdev_alignment_offset(struct block_device *bdev) -{ - struct request_queue *q = bdev_get_queue(bdev); - - if (q->limits.misaligned) - return -1; - if (bdev_is_partition(bdev)) - return queue_limit_alignment_offset(&q->limits, - bdev->bd_start_sect); - return q->limits.alignment_offset; -} +int bdev_alignment_offset(struct block_device *bdev); static inline int queue_discard_alignment(const struct request_queue *q) { -- cgit From 4e1462ffe8998749884d61f91be251a7a8719677 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 15 Apr 2022 06:52:50 +0200 Subject: block: remove queue_discard_alignment Just use bdev_alignment_offset in disk_discard_alignment_show instead. That helpers is the same except for an always false branch that doesn't matter in this slow path. Signed-off-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Link: https://lore.kernel.org/r/20220415045258.199825-20-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 0a1795ac2627..5a9b7aeda010 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1253,14 +1253,6 @@ bdev_zone_write_granularity(struct block_device *bdev) int bdev_alignment_offset(struct block_device *bdev); -static inline int queue_discard_alignment(const struct request_queue *q) -{ - if (q->limits.discard_misaligned) - return -1; - - return q->limits.discard_alignment; -} - static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector_t sector) { unsigned int alignment, granularity, offset; -- cgit From 5c4b4a5c6f11c869a57c6bd977143430bc9dc43d Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 15 Apr 2022 06:52:52 +0200 Subject: block: move {bdev,queue_limit}_discard_alignment out of line No need to inline these fairly larger helpers. Also fix the return value to be unsigned, just like the field in struct queue_limits. Signed-off-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Link: https://lore.kernel.org/r/20220415045258.199825-22-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 34 +--------------------------------- 1 file changed, 1 insertion(+), 33 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 5a9b7aeda010..34b1cfd06742 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1252,39 +1252,7 @@ bdev_zone_write_granularity(struct block_device *bdev) } int bdev_alignment_offset(struct block_device *bdev); - -static inline int queue_limit_discard_alignment(struct queue_limits *lim, sector_t sector) -{ - unsigned int alignment, granularity, offset; - - if (!lim->max_discard_sectors) - return 0; - - /* Why are these in bytes, not sectors? */ - alignment = lim->discard_alignment >> SECTOR_SHIFT; - granularity = lim->discard_granularity >> SECTOR_SHIFT; - if (!granularity) - return 0; - - /* Offset of the partition start in 'granularity' sectors */ - offset = sector_div(sector, granularity); - - /* And why do we do this modulus *again* in blkdev_issue_discard()? */ - offset = (granularity + alignment - offset) % granularity; - - /* Turn it back into bytes, gaah */ - return offset << SECTOR_SHIFT; -} - -static inline int bdev_discard_alignment(struct block_device *bdev) -{ - struct request_queue *q = bdev_get_queue(bdev); - - if (bdev_is_partition(bdev)) - return queue_limit_discard_alignment(&q->limits, - bdev->bd_start_sect); - return q->limits.discard_alignment; -} +unsigned int bdev_discard_alignment(struct block_device *bdev); static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev) { -- cgit From cf0fbf894bb543f472f682c486be48298eccf199 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 15 Apr 2022 06:52:54 +0200 Subject: block: add a bdev_max_discard_sectors helper MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a helper to query the number of sectors support per each discard bio based on the block device and use this helper to stop various places from poking into the request_queue to see if discard is supported and if so how much. This mirrors what is done e.g. for write zeroes as well. Signed-off-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Acked-by: Christoph Böhmwalder [drbd] Acked-by: Coly Li [bcache] Acked-by: David Sterba [btrfs] Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220415045258.199825-24-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 34b1cfd06742..ce16247d3afa 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1254,6 +1254,11 @@ bdev_zone_write_granularity(struct block_device *bdev) int bdev_alignment_offset(struct block_device *bdev); unsigned int bdev_discard_alignment(struct block_device *bdev); +static inline unsigned int bdev_max_discard_sectors(struct block_device *bdev) +{ + return bdev_get_queue(bdev)->limits.max_discard_sectors; +} + static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); -- cgit From 70200574cc229f6ba038259e8142af2aa09e6976 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 15 Apr 2022 06:52:55 +0200 Subject: block: remove QUEUE_FLAG_DISCARD MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only places where needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Acked-by: Christoph Böhmwalder [drbd] Acked-by: Jan Höppner [s390] Acked-by: Coly Li [bcache] Acked-by: David Sterba [btrfs] Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index ce16247d3afa..767ab22e1052 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -540,7 +540,6 @@ struct request_queue { #define QUEUE_FLAG_NONROT 6 /* non-rotational device (SSD) */ #define QUEUE_FLAG_VIRT QUEUE_FLAG_NONROT /* paravirt device */ #define QUEUE_FLAG_IO_STAT 7 /* do disk/partitions IO accounting */ -#define QUEUE_FLAG_DISCARD 8 /* supports DISCARD */ #define QUEUE_FLAG_NOXMERGES 9 /* No extended merges */ #define QUEUE_FLAG_ADD_RANDOM 10 /* Contributes to random pool */ #define QUEUE_FLAG_SECERASE 11 /* supports secure erase */ @@ -582,7 +581,6 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q); test_bit(QUEUE_FLAG_STABLE_WRITES, &(q)->queue_flags) #define blk_queue_io_stat(q) test_bit(QUEUE_FLAG_IO_STAT, &(q)->queue_flags) #define blk_queue_add_random(q) test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags) -#define blk_queue_discard(q) test_bit(QUEUE_FLAG_DISCARD, &(q)->queue_flags) #define blk_queue_zone_resetall(q) \ test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags) #define blk_queue_secure_erase(q) \ -- cgit From 7b47ef52d0a2025fd1408a8a0990933b8e1e510f Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 15 Apr 2022 06:52:56 +0200 Subject: block: add a bdev_discard_granularity helper MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Abstract away implementation details from file systems by providing a block_device based helper to retrieve the discard granularity. Signed-off-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Acked-by: Christoph Böhmwalder [drbd] Acked-by: Ryusuke Konishi Acked-by: David Sterba [btrfs] Link: https://lore.kernel.org/r/20220415045258.199825-26-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 767ab22e1052..f1cf557ea20e 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1257,6 +1257,11 @@ static inline unsigned int bdev_max_discard_sectors(struct block_device *bdev) return bdev_get_queue(bdev)->limits.max_discard_sectors; } +static inline unsigned int bdev_discard_granularity(struct block_device *bdev) +{ + return bdev_get_queue(bdev)->limits.discard_granularity; +} + static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); -- cgit From 44abff2c0b970ae3d310b97617525dc01f248d7c Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 15 Apr 2022 06:52:57 +0200 Subject: block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Secure erase is a very different operation from discard in that it is a data integrity operation vs hint. Fully split the limits and helper infrastructure to make the separation more clear. Signed-off-by: Christoph Hellwig Reviewed-by: Martin K. Petersen Acked-by: Christoph Böhmwalder [drbd] Acked-by: Ryusuke Konishi [nifs2] Acked-by: Jaegeuk Kim [f2fs] Acked-by: Coly Li [bcache] Acked-by: David Sterba [btrfs] Acked-by: Chao Yu Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220415045258.199825-27-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f1cf557ea20e..c9b5925af5a3 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -248,6 +248,7 @@ struct queue_limits { unsigned int io_opt; unsigned int max_discard_sectors; unsigned int max_hw_discard_sectors; + unsigned int max_secure_erase_sectors; unsigned int max_write_zeroes_sectors; unsigned int max_zone_append_sectors; unsigned int discard_granularity; @@ -542,7 +543,6 @@ struct request_queue { #define QUEUE_FLAG_IO_STAT 7 /* do disk/partitions IO accounting */ #define QUEUE_FLAG_NOXMERGES 9 /* No extended merges */ #define QUEUE_FLAG_ADD_RANDOM 10 /* Contributes to random pool */ -#define QUEUE_FLAG_SECERASE 11 /* supports secure erase */ #define QUEUE_FLAG_SAME_FORCE 12 /* force complete on same CPU */ #define QUEUE_FLAG_DEAD 13 /* queue tear-down finished */ #define QUEUE_FLAG_INIT_DONE 14 /* queue is initialized */ @@ -583,8 +583,6 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q); #define blk_queue_add_random(q) test_bit(QUEUE_FLAG_ADD_RANDOM, &(q)->queue_flags) #define blk_queue_zone_resetall(q) \ test_bit(QUEUE_FLAG_ZONE_RESETALL, &(q)->queue_flags) -#define blk_queue_secure_erase(q) \ - (test_bit(QUEUE_FLAG_SECERASE, &(q)->queue_flags)) #define blk_queue_dax(q) test_bit(QUEUE_FLAG_DAX, &(q)->queue_flags) #define blk_queue_pci_p2pdma(q) \ test_bit(QUEUE_FLAG_PCI_P2PDMA, &(q)->queue_flags) @@ -947,6 +945,8 @@ extern void blk_queue_chunk_sectors(struct request_queue *, unsigned int); extern void blk_queue_max_segments(struct request_queue *, unsigned short); extern void blk_queue_max_discard_segments(struct request_queue *, unsigned short); +void blk_queue_max_secure_erase_sectors(struct request_queue *q, + unsigned int max_sectors); extern void blk_queue_max_segment_size(struct request_queue *, unsigned int); extern void blk_queue_max_discard_sectors(struct request_queue *q, unsigned int max_discard_sectors); @@ -1087,13 +1087,12 @@ static inline long nr_blockdev_pages(void) extern void blk_io_schedule(void); -#define BLKDEV_DISCARD_SECURE (1 << 0) /* issue a secure erase */ - -extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector, - sector_t nr_sects, gfp_t gfp_mask, unsigned long flags); -extern int __blkdev_issue_discard(struct block_device *bdev, sector_t sector, - sector_t nr_sects, gfp_t gfp_mask, int flags, - struct bio **biop); +int blkdev_issue_discard(struct block_device *bdev, sector_t sector, + sector_t nr_sects, gfp_t gfp_mask); +int __blkdev_issue_discard(struct block_device *bdev, sector_t sector, + sector_t nr_sects, gfp_t gfp_mask, struct bio **biop); +int blkdev_issue_secure_erase(struct block_device *bdev, sector_t sector, + sector_t nr_sects, gfp_t gfp); #define BLKDEV_ZERO_NOUNMAP (1 << 0) /* do not free blocks */ #define BLKDEV_ZERO_NOFALLBACK (1 << 1) /* don't write explicit zeroes */ @@ -1112,7 +1111,7 @@ static inline int sb_issue_discard(struct super_block *sb, sector_t block, SECTOR_SHIFT), nr_blocks << (sb->s_blocksize_bits - SECTOR_SHIFT), - gfp_mask, flags); + gfp_mask); } static inline int sb_issue_zeroout(struct super_block *sb, sector_t block, sector_t nr_blocks, gfp_t gfp_mask) @@ -1262,6 +1261,12 @@ static inline unsigned int bdev_discard_granularity(struct block_device *bdev) return bdev_get_queue(bdev)->limits.discard_granularity; } +static inline unsigned int +bdev_max_secure_erase_sectors(struct block_device *bdev) +{ + return bdev_get_queue(bdev)->limits.max_secure_erase_sectors; +} + static inline unsigned int bdev_write_zeroes_sectors(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); -- cgit From a2daa27c0c6137481226aee5b3136e453c642929 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 14 Feb 2022 11:44:42 +0100 Subject: swiotlb: simplify swiotlb_max_segment Remove the bogus Xen override that was usually larger than the actual size and just calculate the value on demand. Note that swiotlb_max_segment still doesn't make sense as an interface and should eventually be removed. Signed-off-by: Christoph Hellwig Reviewed-by: Anshuman Khandual Reviewed-by: Konrad Rzeszutek Wilk Tested-by: Boris Ostrovsky --- include/linux/swiotlb.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index f6c3638255d5..9fb3a568f0c5 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -164,7 +164,6 @@ static inline void swiotlb_adjust_size(unsigned long size) #endif /* CONFIG_SWIOTLB */ extern void swiotlb_print_info(void); -extern void swiotlb_set_max_segment(unsigned int); #ifdef CONFIG_DMA_RESTRICTED_POOL struct page *swiotlb_alloc(struct device *dev, size_t size); -- cgit From 0d5ffd9a256d8995764f9d4a35a8c3917839d169 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 14 Feb 2022 11:07:28 +0100 Subject: swiotlb: rename swiotlb_late_init_with_default_size swiotlb_late_init_with_default_size is an overly verbose name that doesn't even catch what the function is doing, given that the size is not just a default but the actual requested size. Rename it to swiotlb_init_late. Signed-off-by: Christoph Hellwig Reviewed-by: Anshuman Khandual Reviewed-by: Konrad Rzeszutek Wilk Tested-by: Boris Ostrovsky --- include/linux/swiotlb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 9fb3a568f0c5..b48b26bfa0ed 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -40,7 +40,7 @@ extern void swiotlb_init(int verbose); int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose); unsigned long swiotlb_size_or_default(void); extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs); -extern int swiotlb_late_init_with_default_size(size_t default_size); +int swiotlb_init_late(size_t size); extern void __init swiotlb_update_mem_attributes(void); phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys, -- cgit From 78013eaadf696d2105982abb4018fbae394ca08f Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 14 Feb 2022 14:11:44 +0100 Subject: x86: remove the IOMMU table infrastructure The IOMMU table tries to separate the different IOMMUs into different backends, but actually requires various cross calls. Rewrite the code to do the generic swiotlb/swiotlb-xen setup directly in pci-dma.c and then just call into the IOMMU drivers. Signed-off-by: Christoph Hellwig Reviewed-by: Konrad Rzeszutek Wilk Tested-by: Boris Ostrovsky --- include/linux/dmar.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dmar.h b/include/linux/dmar.h index 45e903d84733..cbd714a198a0 100644 --- a/include/linux/dmar.h +++ b/include/linux/dmar.h @@ -121,7 +121,7 @@ extern int dmar_remove_dev_scope(struct dmar_pci_notify_info *info, u16 segment, struct dmar_dev_scope *devices, int count); /* Intel IOMMU detection */ -extern int detect_intel_iommu(void); +void detect_intel_iommu(void); extern int enable_drhd_fault_handling(void); extern int dmar_device_add(acpi_handle handle); extern int dmar_device_remove(acpi_handle handle); @@ -197,6 +197,10 @@ static inline bool dmar_platform_optin(void) return false; } +static inline void detect_intel_iommu(void) +{ +} + #endif /* CONFIG_DMAR_TABLE */ struct irte { -- cgit From c6af2aa9ffc9763826607bc2664ef3ea4475ed18 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 29 Mar 2022 17:27:33 +0200 Subject: swiotlb: make the swiotlb_init interface more useful Pass a boolean flag to indicate if swiotlb needs to be enabled based on the addressing needs, and replace the verbose argument with a set of flags, including one to force enable bounce buffering. Note that this patch removes the possibility to force xen-swiotlb use with the swiotlb=force parameter on the command line on x86 (arm and arm64 never supported that), but this interface will be restored shortly. Signed-off-by: Christoph Hellwig Reviewed-by: Konrad Rzeszutek Wilk Tested-by: Boris Ostrovsky --- include/linux/swiotlb.h | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index b48b26bfa0ed..ae0407173e84 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -13,11 +13,8 @@ struct device; struct page; struct scatterlist; -enum swiotlb_force { - SWIOTLB_NORMAL, /* Default - depending on HW DMA mask etc. */ - SWIOTLB_FORCE, /* swiotlb=force */ - SWIOTLB_NO_FORCE, /* swiotlb=noforce */ -}; +#define SWIOTLB_VERBOSE (1 << 0) /* verbose initialization */ +#define SWIOTLB_FORCE (1 << 1) /* force bounce buffering */ /* * Maximum allowable number of contiguous slabs to map, @@ -36,8 +33,7 @@ enum swiotlb_force { /* default to 64MB */ #define IO_TLB_DEFAULT_SIZE (64UL<<20) -extern void swiotlb_init(int verbose); -int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose); +int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, unsigned int flags); unsigned long swiotlb_size_or_default(void); extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs); int swiotlb_init_late(size_t size); @@ -126,13 +122,16 @@ static inline bool is_swiotlb_force_bounce(struct device *dev) return mem && mem->force_bounce; } +void swiotlb_init(bool addressing_limited, unsigned int flags); void __init swiotlb_exit(void); unsigned int swiotlb_max_segment(void); size_t swiotlb_max_mapping_size(struct device *dev); bool is_swiotlb_active(struct device *dev); void __init swiotlb_adjust_size(unsigned long size); #else -#define swiotlb_force SWIOTLB_NO_FORCE +static inline void swiotlb_init(bool addressing_limited, unsigned int flags) +{ +} static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr) { return false; -- cgit From 8ba2ed1be90fc210126f68186564707478552c95 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 28 Feb 2022 13:36:57 +0200 Subject: swiotlb: add a SWIOTLB_ANY flag to lift the low memory restriction Power SVM wants to allocate a swiotlb buffer that is not restricted to low memory for the trusted hypervisor scheme. Consolidate the support for this into the swiotlb_init interface by adding a new flag. Signed-off-by: Christoph Hellwig Reviewed-by: Konrad Rzeszutek Wilk Tested-by: Boris Ostrovsky --- include/linux/swiotlb.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index ae0407173e84..eabdd8998702 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -15,6 +15,7 @@ struct scatterlist; #define SWIOTLB_VERBOSE (1 << 0) /* verbose initialization */ #define SWIOTLB_FORCE (1 << 1) /* force bounce buffering */ +#define SWIOTLB_ANY (1 << 2) /* allow any memory for the buffer */ /* * Maximum allowable number of contiguous slabs to map, -- cgit From 742519538e6b07250c8085bbff4bd358bc03bf16 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 14 Feb 2022 11:12:59 +0100 Subject: swiotlb: pass a gfp_mask argument to swiotlb_init_late Let the caller chose a zone to allocate from. This will be used later on by the xen-swiotlb initialization on arm. Signed-off-by: Christoph Hellwig Reviewed-by: Anshuman Khandual Reviewed-by: Konrad Rzeszutek Wilk Tested-by: Boris Ostrovsky --- include/linux/swiotlb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index eabdd8998702..ee655f2e4d28 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -37,7 +37,7 @@ struct scatterlist; int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, unsigned int flags); unsigned long swiotlb_size_or_default(void); extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs); -int swiotlb_init_late(size_t size); +int swiotlb_init_late(size_t size, gfp_t gfp_mask); extern void __init swiotlb_update_mem_attributes(void); phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys, -- cgit From 7374153d294eb51de5a81ac38ff1c4fef8927bec Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 14 Mar 2022 08:02:57 +0100 Subject: swiotlb: provide swiotlb_init variants that remap the buffer To shared more code between swiotlb and xen-swiotlb, offer a swiotlb_init_remap interface and add a remap callback to swiotlb_init_late that will allow Xen to remap the buffer without duplicating much of the logic. Signed-off-by: Christoph Hellwig Reviewed-by: Konrad Rzeszutek Wilk Tested-by: Boris Ostrovsky --- include/linux/swiotlb.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index ee655f2e4d28..7b50c82f84ce 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -36,8 +36,11 @@ struct scatterlist; int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, unsigned int flags); unsigned long swiotlb_size_or_default(void); +void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags, + int (*remap)(void *tlb, unsigned long nslabs)); +int swiotlb_init_late(size_t size, gfp_t gfp_mask, + int (*remap)(void *tlb, unsigned long nslabs)); extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs); -int swiotlb_init_late(size_t size, gfp_t gfp_mask); extern void __init swiotlb_update_mem_attributes(void); phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys, -- cgit From 6424e31b1c050a25aea033206d5f626f3523448c Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 15 Mar 2022 07:41:04 +0100 Subject: swiotlb: remove swiotlb_init_with_tbl and swiotlb_init_late_with_tbl No users left. Signed-off-by: Christoph Hellwig Reviewed-by: Konrad Rzeszutek Wilk Tested-by: Boris Ostrovsky --- include/linux/swiotlb.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 7b50c82f84ce..7ed35dd3de6e 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -34,13 +34,11 @@ struct scatterlist; /* default to 64MB */ #define IO_TLB_DEFAULT_SIZE (64UL<<20) -int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, unsigned int flags); unsigned long swiotlb_size_or_default(void); void __init swiotlb_init_remap(bool addressing_limit, unsigned int flags, int (*remap)(void *tlb, unsigned long nslabs)); int swiotlb_init_late(size_t size, gfp_t gfp_mask, int (*remap)(void *tlb, unsigned long nslabs)); -extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs); extern void __init swiotlb_update_mem_attributes(void); phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys, -- cgit From f47a6113f4e87db7ca066635822e1b3ca3ed9514 Mon Sep 17 00:00:00 2001 From: Tzung-Bi Shih Date: Wed, 16 Feb 2022 16:03:03 +0800 Subject: platform/chrome: cros_ec: remove unused variable `was_wake_device` Reviewed-by: Prashant Malani Signed-off-by: Tzung-Bi Shih --- include/linux/platform_data/cros_ec_proto.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/cros_ec_proto.h b/include/linux/platform_data/cros_ec_proto.h index df3c78c92ca2..c65971ec90ea 100644 --- a/include/linux/platform_data/cros_ec_proto.h +++ b/include/linux/platform_data/cros_ec_proto.h @@ -76,8 +76,6 @@ struct cros_ec_command { * struct cros_ec_device - Information about a ChromeOS EC device. * @phys_name: Name of physical comms layer (e.g. 'i2c-4'). * @dev: Device pointer for physical comms device - * @was_wake_device: True if this device was set to wake the system from - * sleep at the last suspend. * @cros_class: The class structure for this device. * @cmd_readmem: Direct read of the EC memory-mapped region, if supported. * @offset: Is within EC_LPC_ADDR_MEMMAP region. @@ -137,7 +135,6 @@ struct cros_ec_device { /* These are used by other drivers that want to talk to the EC */ const char *phys_name; struct device *dev; - bool was_wake_device; struct class *cros_class; int (*cmd_readmem)(struct cros_ec_device *ec, unsigned int offset, unsigned int bytes, void *dest); -- cgit From 5f0614a55ecebdf55f1a17db0b5f6b787ed009f1 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Sun, 17 Apr 2022 22:27:13 -0400 Subject: block: change exported IO accounting interface from gendisk to bdev Export IO accounting interfaces in terms of block_device now that gendisk has become more internal to block core. Rename __part_{start,end}_io_acct's first argument from part to bdev. Rename __part_{start,end}_io_acct to bdev_{start,end}_io_acct and export them. Remove disk_{start,end}_io_acct and update caller (zram) to use bdev_{start,end}_io_acct. DM can now be updated to use bdev_{start,end}_io_acct. Signed-off-by: Ming Lei Signed-off-by: Mike Snitzer Link: https://lore.kernel.org/r/20220418022733.56168-2-snitzer@kernel.org Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index c9b5925af5a3..34724b15813b 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1463,9 +1463,10 @@ static inline void blk_wake_io_task(struct task_struct *waiter) wake_up_process(waiter); } -unsigned long disk_start_io_acct(struct gendisk *disk, unsigned int sectors, - unsigned int op); -void disk_end_io_acct(struct gendisk *disk, unsigned int op, +unsigned long bdev_start_io_acct(struct block_device *bdev, + unsigned int sectors, unsigned int op, + unsigned long start_time); +void bdev_end_io_acct(struct block_device *bdev, unsigned int op, unsigned long start_time); void bio_start_io_acct_time(struct bio *bio, unsigned long start_time); -- cgit From dbdc1be32591af023db2812706f01e6cd2f42bfc Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 30 Mar 2022 07:29:06 +0200 Subject: block: add a disk_openers helper Add a helper that returns the openers for a given gendisk to avoid having drivers poke into disk->part0 to get at this information in a somewhat cumbersome way. Signed-off-by: Christoph Hellwig Reviewed-by: Jan Kara Link: https://lore.kernel.org/r/20220330052917.2566582-5-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index c9b5925af5a3..436645bde13f 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -176,6 +176,21 @@ static inline bool disk_live(struct gendisk *disk) return !inode_unhashed(disk->part0->bd_inode); } +/** + * disk_openers - returns how many openers are there for a disk + * @disk: disk to check + * + * This returns the number of openers for a disk. Note that this value is only + * stable if disk->open_mutex is held. + * + * Note: Due to a quirk in the block layer open code, each open partition is + * only counted once even if there are multiple openers. + */ +static inline unsigned int disk_openers(struct gendisk *disk) +{ + return disk->part0->bd_openers; +} + /* * The gendisk is refcounted by the part0 block_device, and the bd_device * therein is also used for device model presentation in sysfs. -- cgit From 9acf381f3e8f715175c29f4b6d722f6b6797d139 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 30 Mar 2022 07:29:07 +0200 Subject: block: turn bdev->bd_openers into an atomic_t All manipulation of bd_openers is under disk->open_mutex and will remain so for the foreseeable future. But at least one place reads it without the lock (blkdev_get) and there are more to be added. So make sure the compiler does not do turn the increments and decrements into non-atomic sequences by using an atomic_t. Signed-off-by: Christoph Hellwig Reviewed-by: Jan Kara Link: https://lore.kernel.org/r/20220330052917.2566582-6-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blk_types.h | 2 +- include/linux/blkdev.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 4968cb17b13b..c62274466e72 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -44,7 +44,7 @@ struct block_device { unsigned long bd_stamp; bool bd_read_only; /* read-only policy */ dev_t bd_dev; - int bd_openers; + atomic_t bd_openers; struct inode * bd_inode; /* will die */ struct super_block * bd_super; void * bd_claiming; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 436645bde13f..afad3d1d0dac 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -188,7 +188,7 @@ static inline bool disk_live(struct gendisk *disk) */ static inline unsigned int disk_openers(struct gendisk *disk) { - return disk->part0->bd_openers; + return atomic_read(&disk->part0->bd_openers); } /* -- cgit From 57b888ca2541785de2fcb90575b378921919b6c0 Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Fri, 18 Mar 2022 09:54:22 -0700 Subject: platform/chrome: Re-introduce cros_ec_cmd_xfer and use it for ioctls Commit 413dda8f2c6f ("platform/chrome: cros_ec_chardev: Use cros_ec_cmd_xfer_status helper") inadvertendly changed the userspace ABI. Previously, cros_ec ioctls would only report errors if the EC communication failed, and otherwise return success and the result of the EC communication. An EC command execution failure was reported in the EC response field. The above mentioned commit changed this behavior, and the ioctl itself would fail. This breaks userspace commands trying to analyze the EC command execution error since the actual EC command response is no longer reported to userspace. Fix the problem by re-introducing the cros_ec_cmd_xfer() helper, and use it to handle ioctl messages. Fixes: 413dda8f2c6f ("platform/chrome: cros_ec_chardev: Use cros_ec_cmd_xfer_status helper") Cc: Daisuke Nojiri Cc: Rob Barnes Cc: Rajat Jain Cc: Brian Norris Cc: Parth Malkan Reviewed-by: Daisuke Nojiri Reviewed-by: Brian Norris Signed-off-by: Guenter Roeck Signed-off-by: Tzung-Bi Shih --- include/linux/platform_data/cros_ec_proto.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/cros_ec_proto.h b/include/linux/platform_data/cros_ec_proto.h index c65971ec90ea..138fd912c808 100644 --- a/include/linux/platform_data/cros_ec_proto.h +++ b/include/linux/platform_data/cros_ec_proto.h @@ -213,6 +213,9 @@ int cros_ec_prepare_tx(struct cros_ec_device *ec_dev, int cros_ec_check_result(struct cros_ec_device *ec_dev, struct cros_ec_command *msg); +int cros_ec_cmd_xfer(struct cros_ec_device *ec_dev, + struct cros_ec_command *msg); + int cros_ec_cmd_xfer_status(struct cros_ec_device *ec_dev, struct cros_ec_command *msg); -- cgit From 2f1e85b1aee459b7d0fd981839042c6a38ffaf0c Mon Sep 17 00:00:00 2001 From: Tonghao Zhang Date: Sat, 16 Apr 2022 00:40:45 +0800 Subject: net: sched: use queue_mapping to pick tx queue This patch fixes issue: * If we install tc filters with act_skbedit in clsact hook. It doesn't work, because netdev_core_pick_tx() overwrites queue_mapping. $ tc filter ... action skbedit queue_mapping 1 And this patch is useful: * We can use FQ + EDT to implement efficient policies. Tx queues are picked by xps, ndo_select_queue of netdev driver, or skb hash in netdev_core_pick_tx(). In fact, the netdev driver, and skb hash are _not_ under control. xps uses the CPUs map to select Tx queues, but we can't figure out which task_struct of pod/containter running on this cpu in most case. We can use clsact filters to classify one pod/container traffic to one Tx queue. Why ? In containter networking environment, there are two kinds of pod/ containter/net-namespace. One kind (e.g. P1, P2), the high throughput is key in these applications. But avoid running out of network resource, the outbound traffic of these pods is limited, using or sharing one dedicated Tx queues assigned HTB/TBF/FQ Qdisc. Other kind of pods (e.g. Pn), the low latency of data access is key. And the traffic is not limited. Pods use or share other dedicated Tx queues assigned FIFO Qdisc. This choice provides two benefits. First, contention on the HTB/FQ Qdisc lock is significantly reduced since fewer CPUs contend for the same queue. More importantly, Qdisc contention can be eliminated completely if each CPU has its own FIFO Qdisc for the second kind of pods. There must be a mechanism in place to support classifying traffic based on pods/container to different Tx queues. Note that clsact is outside of Qdisc while Qdisc can run a classifier to select a sub-queue under the lock. In general recording the decision in the skb seems a little heavy handed. This patch introduces a per-CPU variable, suggested by Eric. The xmit.skip_txqueue flag is firstly cleared in __dev_queue_xmit(). - Tx Qdisc may install that skbedit actions, then xmit.skip_txqueue flag is set in qdisc->enqueue() though tx queue has been selected in netdev_tx_queue_mapping() or netdev_core_pick_tx(). That flag is cleared firstly in __dev_queue_xmit(), is useful: - Avoid picking Tx queue with netdev_tx_queue_mapping() in next netdev in such case: eth0 macvlan - eth0.3 vlan - eth0 ixgbe-phy: For example, eth0, macvlan in pod, which root Qdisc install skbedit queue_mapping, send packets to eth0.3, vlan in host. In __dev_queue_xmit() of eth0.3, clear the flag, does not select tx queue according to skb->queue_mapping because there is no filters in clsact or tx Qdisc of this netdev. Same action taked in eth0, ixgbe in Host. - Avoid picking Tx queue for next packet. If we set xmit.skip_txqueue in tx Qdisc (qdisc->enqueue()), the proper way to clear it is clearing it in __dev_queue_xmit when processing next packets. For performance reasons, use the static key. If user does not config the NET_EGRESS, the patch will not be compiled. +----+ +----+ +----+ | P1 | | P2 | | Pn | +----+ +----+ +----+ | | | +-----------+-----------+ | | clsact/skbedit | MQ v +-----------+-----------+ | q0 | q1 | qn v v v HTB/FQ HTB/FQ ... FIFO Cc: Jamal Hadi Salim Cc: Cong Wang Cc: Jiri Pirko Cc: "David S. Miller" Cc: Jakub Kicinski Cc: Jonathan Lemon Cc: Eric Dumazet Cc: Alexander Lobakin Cc: Paolo Abeni Cc: Talal Ahmad Cc: Kevin Hao Cc: Ilias Apalodimas Cc: Kees Cook Cc: Kumar Kartikeya Dwivedi Cc: Antoine Tenart Cc: Wei Wang Cc: Arnd Bergmann Suggested-by: Eric Dumazet Signed-off-by: Tonghao Zhang Acked-by: Jamal Hadi Salim Signed-off-by: Paolo Abeni --- include/linux/netdevice.h | 3 +++ include/linux/rtnetlink.h | 1 + 2 files changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index a602f29365b0..7dccbfd1bf56 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3061,6 +3061,9 @@ struct softnet_data { struct { u16 recursion; u8 more; +#ifdef CONFIG_NET_EGRESS + u8 skip_txqueue; +#endif } xmit; #ifdef CONFIG_RPS /* input_queue_head should be written by cpu owning this struct, diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h index 7f970b16da3a..ae2c6a3cec5d 100644 --- a/include/linux/rtnetlink.h +++ b/include/linux/rtnetlink.h @@ -100,6 +100,7 @@ void net_dec_ingress_queue(void); #ifdef CONFIG_NET_EGRESS void net_inc_egress_queue(void); void net_dec_egress_queue(void); +void netdev_xmit_skip_txqueue(bool skip); #endif void rtnetlink_init(void); -- cgit From c6547c2ed0e1487c91983faccad841611ab6a783 Mon Sep 17 00:00:00 2001 From: Sascha Hauer Date: Thu, 14 Apr 2022 18:22:37 +0200 Subject: dmaengine: imx: Move header to include/dma/ The i.MX DMA drivers are device tree only, nothing in include/linux/platform_data/dma-imx.h has platform_data in it, so move the file to include/linux/dma/imx-dma.h. Signed-off-by: Sascha Hauer Acked-By: Vinod Koul Link: https://lore.kernel.org/r/20220414162249.3934543-10-s.hauer@pengutronix.de Signed-off-by: Mark Brown --- include/linux/dma/imx-dma.h | 68 +++++++++++++++++++++++++++++++++++ include/linux/platform_data/dma-imx.h | 68 ----------------------------------- 2 files changed, 68 insertions(+), 68 deletions(-) create mode 100644 include/linux/dma/imx-dma.h delete mode 100644 include/linux/platform_data/dma-imx.h (limited to 'include/linux') diff --git a/include/linux/dma/imx-dma.h b/include/linux/dma/imx-dma.h new file mode 100644 index 000000000000..b06cba85a6d4 --- /dev/null +++ b/include/linux/dma/imx-dma.h @@ -0,0 +1,68 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright 2004-2009 Freescale Semiconductor, Inc. All Rights Reserved. + */ + +#ifndef __LINUX_DMA_IMX_H +#define __LINUX_DMA_IMX_H + +#include +#include +#include + +/* + * This enumerates peripheral types. Used for SDMA. + */ +enum sdma_peripheral_type { + IMX_DMATYPE_SSI, /* MCU domain SSI */ + IMX_DMATYPE_SSI_SP, /* Shared SSI */ + IMX_DMATYPE_MMC, /* MMC */ + IMX_DMATYPE_SDHC, /* SDHC */ + IMX_DMATYPE_UART, /* MCU domain UART */ + IMX_DMATYPE_UART_SP, /* Shared UART */ + IMX_DMATYPE_FIRI, /* FIRI */ + IMX_DMATYPE_CSPI, /* MCU domain CSPI */ + IMX_DMATYPE_CSPI_SP, /* Shared CSPI */ + IMX_DMATYPE_SIM, /* SIM */ + IMX_DMATYPE_ATA, /* ATA */ + IMX_DMATYPE_CCM, /* CCM */ + IMX_DMATYPE_EXT, /* External peripheral */ + IMX_DMATYPE_MSHC, /* Memory Stick Host Controller */ + IMX_DMATYPE_MSHC_SP, /* Shared Memory Stick Host Controller */ + IMX_DMATYPE_DSP, /* DSP */ + IMX_DMATYPE_MEMORY, /* Memory */ + IMX_DMATYPE_FIFO_MEMORY,/* FIFO type Memory */ + IMX_DMATYPE_SPDIF, /* SPDIF */ + IMX_DMATYPE_IPU_MEMORY, /* IPU Memory */ + IMX_DMATYPE_ASRC, /* ASRC */ + IMX_DMATYPE_ESAI, /* ESAI */ + IMX_DMATYPE_SSI_DUAL, /* SSI Dual FIFO */ + IMX_DMATYPE_ASRC_SP, /* Shared ASRC */ + IMX_DMATYPE_SAI, /* SAI */ +}; + +enum imx_dma_prio { + DMA_PRIO_HIGH = 0, + DMA_PRIO_MEDIUM = 1, + DMA_PRIO_LOW = 2 +}; + +struct imx_dma_data { + int dma_request; /* DMA request line */ + int dma_request2; /* secondary DMA request line */ + enum sdma_peripheral_type peripheral_type; + int priority; +}; + +static inline int imx_dma_is_ipu(struct dma_chan *chan) +{ + return !strcmp(dev_name(chan->device->dev), "ipu-core"); +} + +static inline int imx_dma_is_general_purpose(struct dma_chan *chan) +{ + return !strcmp(chan->device->dev->driver->name, "imx-sdma") || + !strcmp(chan->device->dev->driver->name, "imx-dma"); +} + +#endif /* __LINUX_DMA_IMX_H */ diff --git a/include/linux/platform_data/dma-imx.h b/include/linux/platform_data/dma-imx.h deleted file mode 100644 index 281adbb26e6b..000000000000 --- a/include/linux/platform_data/dma-imx.h +++ /dev/null @@ -1,68 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Copyright 2004-2009 Freescale Semiconductor, Inc. All Rights Reserved. - */ - -#ifndef __ASM_ARCH_MXC_DMA_H__ -#define __ASM_ARCH_MXC_DMA_H__ - -#include -#include -#include - -/* - * This enumerates peripheral types. Used for SDMA. - */ -enum sdma_peripheral_type { - IMX_DMATYPE_SSI, /* MCU domain SSI */ - IMX_DMATYPE_SSI_SP, /* Shared SSI */ - IMX_DMATYPE_MMC, /* MMC */ - IMX_DMATYPE_SDHC, /* SDHC */ - IMX_DMATYPE_UART, /* MCU domain UART */ - IMX_DMATYPE_UART_SP, /* Shared UART */ - IMX_DMATYPE_FIRI, /* FIRI */ - IMX_DMATYPE_CSPI, /* MCU domain CSPI */ - IMX_DMATYPE_CSPI_SP, /* Shared CSPI */ - IMX_DMATYPE_SIM, /* SIM */ - IMX_DMATYPE_ATA, /* ATA */ - IMX_DMATYPE_CCM, /* CCM */ - IMX_DMATYPE_EXT, /* External peripheral */ - IMX_DMATYPE_MSHC, /* Memory Stick Host Controller */ - IMX_DMATYPE_MSHC_SP, /* Shared Memory Stick Host Controller */ - IMX_DMATYPE_DSP, /* DSP */ - IMX_DMATYPE_MEMORY, /* Memory */ - IMX_DMATYPE_FIFO_MEMORY,/* FIFO type Memory */ - IMX_DMATYPE_SPDIF, /* SPDIF */ - IMX_DMATYPE_IPU_MEMORY, /* IPU Memory */ - IMX_DMATYPE_ASRC, /* ASRC */ - IMX_DMATYPE_ESAI, /* ESAI */ - IMX_DMATYPE_SSI_DUAL, /* SSI Dual FIFO */ - IMX_DMATYPE_ASRC_SP, /* Shared ASRC */ - IMX_DMATYPE_SAI, /* SAI */ -}; - -enum imx_dma_prio { - DMA_PRIO_HIGH = 0, - DMA_PRIO_MEDIUM = 1, - DMA_PRIO_LOW = 2 -}; - -struct imx_dma_data { - int dma_request; /* DMA request line */ - int dma_request2; /* secondary DMA request line */ - enum sdma_peripheral_type peripheral_type; - int priority; -}; - -static inline int imx_dma_is_ipu(struct dma_chan *chan) -{ - return !strcmp(dev_name(chan->device->dev), "ipu-core"); -} - -static inline int imx_dma_is_general_purpose(struct dma_chan *chan) -{ - return !strcmp(chan->device->dev->driver->name, "imx-sdma") || - !strcmp(chan->device->dev->driver->name, "imx-dma"); -} - -#endif -- cgit From 824a0a02cd74776461aaa30d792b1ed9111c9aa5 Mon Sep 17 00:00:00 2001 From: Sascha Hauer Date: Thu, 14 Apr 2022 18:22:39 +0200 Subject: dmaengine: imx-sdma: Add multi fifo support The i.MX SDMA engine can read from / write to multiple successive hardware FIFO registers, referred to as "Multi FIFO support". This is needed for the micfil driver and certain configurations of the SAI driver. This patch adds support for this feature. The number of FIFOs to read from / write to must be communicated from the client driver to the SDMA engine. For this the struct dma_slave_config::peripheral_config field is used. Signed-off-by: Sascha Hauer Acked-By: Vinod Koul Link: https://lore.kernel.org/r/20220414162249.3934543-12-s.hauer@pengutronix.de Signed-off-by: Mark Brown --- include/linux/dma/imx-dma.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dma/imx-dma.h b/include/linux/dma/imx-dma.h index b06cba85a6d4..8887762360d4 100644 --- a/include/linux/dma/imx-dma.h +++ b/include/linux/dma/imx-dma.h @@ -39,6 +39,7 @@ enum sdma_peripheral_type { IMX_DMATYPE_SSI_DUAL, /* SSI Dual FIFO */ IMX_DMATYPE_ASRC_SP, /* Shared ASRC */ IMX_DMATYPE_SAI, /* SAI */ + IMX_DMATYPE_MULTI_SAI, /* MULTI FIFOs For Audio */ }; enum imx_dma_prio { @@ -65,4 +66,23 @@ static inline int imx_dma_is_general_purpose(struct dma_chan *chan) !strcmp(chan->device->dev->driver->name, "imx-dma"); } +/** + * struct sdma_peripheral_config - SDMA config for audio + * @n_fifos_src: Number of FIFOs for recording + * @n_fifos_dst: Number of FIFOs for playback + * @sw_done: Use software done. Needed for PDM (micfil) + * + * Some i.MX Audio devices (SAI, micfil) have multiple successive FIFO + * registers. For multichannel recording/playback the SAI/micfil have + * one FIFO register per channel and the SDMA engine has to read/write + * the next channel from/to the next register and wrap around to the + * first register when all channels are handled. The number of active + * channels must be communicated to the SDMA engine using this struct. + */ +struct sdma_peripheral_config { + int n_fifos_src; + int n_fifos_dst; + bool sw_done; +}; + #endif /* __LINUX_DMA_IMX_H */ -- cgit From 6c846d026d490b2383d395bc8e7b06336219667b Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 19 Apr 2022 15:18:37 +0100 Subject: gpio: Don't fiddle with irqchips marked as immutable In order to move away from gpiolib messing with the internals of unsuspecting irqchips, add a flag by which irqchips advertise that they are not to be messed with, and do solemnly swear that they correctly call into the gpiolib helpers when required. Also nudge the users into converting their drivers to the new model. Reviewed-by: Andy Shevchenko Reviewed-by: Bartosz Golaszewski Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20220419141846.598305-2-maz@kernel.org --- include/linux/irq.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/irq.h b/include/linux/irq.h index f92788ccdba2..505308253d23 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -569,6 +569,7 @@ struct irq_chip { * IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND: Invokes __enable_irq()/__disable_irq() for wake irqs * in the suspend path if they are in disabled state * IRQCHIP_AFFINITY_PRE_STARTUP: Default affinity update before startup + * IRQCHIP_IMMUTABLE: Don't ever change anything in this chip */ enum { IRQCHIP_SET_TYPE_MASKED = (1 << 0), @@ -582,6 +583,7 @@ enum { IRQCHIP_SUPPORTS_NMI = (1 << 8), IRQCHIP_ENABLE_WAKEUP_ON_SUSPEND = (1 << 9), IRQCHIP_AFFINITY_PRE_STARTUP = (1 << 10), + IRQCHIP_IMMUTABLE = (1 << 11), }; #include -- cgit From 704f08753b6dcd0e08c1953af0b2c7f3fac87111 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 19 Apr 2022 15:18:38 +0100 Subject: gpio: Expose the gpiochip_irq_re[ql]res helpers The GPIO subsystem has a couple of internal helpers to manage resources on behalf of the irqchip. Expose them so that GPIO drivers can use them directly. Reviewed-by: Andy Shevchenko Reviewed-by: Bartosz Golaszewski Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20220419141846.598305-3-maz@kernel.org --- include/linux/gpio/driver.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h index 98c93510640e..066bcfdf878d 100644 --- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -579,6 +579,10 @@ void gpiochip_relres_irq(struct gpio_chip *gc, unsigned int offset); void gpiochip_disable_irq(struct gpio_chip *gc, unsigned int offset); void gpiochip_enable_irq(struct gpio_chip *gc, unsigned int offset); +/* irq_data versions of the above */ +int gpiochip_irq_reqres(struct irq_data *data); +void gpiochip_irq_relres(struct irq_data *data); + /* Line status inquiry for drivers */ bool gpiochip_line_is_open_drain(struct gpio_chip *gc, unsigned int offset); bool gpiochip_line_is_open_source(struct gpio_chip *gc, unsigned int offset); -- cgit From 36b78aae4bfee749bbde73be570796bfd0f56bec Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 19 Apr 2022 15:18:39 +0100 Subject: gpio: Add helpers to ease the transition towards immutable irq_chip Add a couple of new helpers to make it slightly simpler to convert drivers to immutable irq_chip structures: - GPIOCHIP_IRQ_RESOURCE_HELPERS populates the irq_chip structure with the resource management callbacks - gpio_irq_chip_set_chip() populates the gpio_irq_chip.chip structure, avoiding the proliferation of ugly casts Reviewed-by: Andy Shevchenko Reviewed-by: Bartosz Golaszewski Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20220419141846.598305-4-maz@kernel.org --- include/linux/gpio/driver.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h index 066bcfdf878d..832990099097 100644 --- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -583,6 +583,18 @@ void gpiochip_enable_irq(struct gpio_chip *gc, unsigned int offset); int gpiochip_irq_reqres(struct irq_data *data); void gpiochip_irq_relres(struct irq_data *data); +/* Paste this in your irq_chip structure */ +#define GPIOCHIP_IRQ_RESOURCE_HELPERS \ + .irq_request_resources = gpiochip_irq_reqres, \ + .irq_release_resources = gpiochip_irq_relres + +static inline void gpio_irq_chip_set_chip(struct gpio_irq_chip *girq, + const struct irq_chip *chip) +{ + /* Yes, dropping const is ugly, but it isn't like we have a choice */ + girq->chip = (struct irq_chip *)chip; +} + /* Line status inquiry for drivers */ bool gpiochip_line_is_open_drain(struct gpio_chip *gc, unsigned int offset); bool gpiochip_line_is_open_source(struct gpio_chip *gc, unsigned int offset); -- cgit From 08d3df8c81537089fc8f21006b56f2f6fb23c6f8 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Sun, 1 Sep 2019 22:26:10 +0200 Subject: ARM: pxa: split up mach/hardware.h The mach/hardware.h is included in lots of places, and it provides three different things on pxa: - the cpu_is_pxa* macros - an indirect inclusion of mach/addr-map.h - the __REG() and io_pv2() helper macros Split it up into separate and mach/pxa-regs.h headers, then change all the files that use mach/hardware.h to include the exact set of those three headers that they actually need, allowing for further more targeted cleanup. linux/soc/pxa/cpu.h can remain permanently exported and is now in a global location along with similar headers. pxa-regs.h and addr-map.h are only used in a very small number of drivers now and can be moved to arch/arm/mach-pxa/ directly when those drivers are to pass the necessary data as resources. Cc: Michael Turquette Cc: Stephen Boyd Acked-by: Viresh Kumar Acked-by: Dmitry Torokhov Cc: Jacek Anaszewski Cc: Pavel Machek Acked-by: Ulf Hansson Cc: Dominik Brodowski Acked-by: Alexandre Belloni Cc: Greg Kroah-Hartman Cc: Guenter Roeck Acked-by: Mark Brown Cc: linux-clk@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: linux-input@vger.kernel.org Cc: linux-leds@vger.kernel.org Cc: linux-mmc@vger.kernel.org Cc: linux-mtd@lists.infradead.org Cc: linux-rtc@vger.kernel.org Cc: linux-usb@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Cc: linux-fbdev@vger.kernel.org Cc: linux-watchdog@vger.kernel.org Cc: alsa-devel@alsa-project.org Signed-off-by: Arnd Bergmann --- include/linux/soc/pxa/cpu.h | 252 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 252 insertions(+) create mode 100644 include/linux/soc/pxa/cpu.h (limited to 'include/linux') diff --git a/include/linux/soc/pxa/cpu.h b/include/linux/soc/pxa/cpu.h new file mode 100644 index 000000000000..5782450ee45c --- /dev/null +++ b/include/linux/soc/pxa/cpu.h @@ -0,0 +1,252 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Author: Nicolas Pitre + * Created: Jun 15, 2001 + * Copyright: MontaVista Software Inc. + */ + +#ifndef __SOC_PXA_CPU_H +#define __SOC_PXA_CPU_H + +#ifdef CONFIG_ARM +#include +#endif + +/* + * CPU Stepping CPU_ID JTAG_ID + * + * PXA210 B0 0x69052922 0x2926C013 + * PXA210 B1 0x69052923 0x3926C013 + * PXA210 B2 0x69052924 0x4926C013 + * PXA210 C0 0x69052D25 0x5926C013 + * + * PXA250 A0 0x69052100 0x09264013 + * PXA250 A1 0x69052101 0x19264013 + * PXA250 B0 0x69052902 0x29264013 + * PXA250 B1 0x69052903 0x39264013 + * PXA250 B2 0x69052904 0x49264013 + * PXA250 C0 0x69052D05 0x59264013 + * + * PXA255 A0 0x69052D06 0x69264013 + * + * PXA26x A0 0x69052903 0x39264013 + * PXA26x B0 0x69052D05 0x59264013 + * + * PXA27x A0 0x69054110 0x09265013 + * PXA27x A1 0x69054111 0x19265013 + * PXA27x B0 0x69054112 0x29265013 + * PXA27x B1 0x69054113 0x39265013 + * PXA27x C0 0x69054114 0x49265013 + * PXA27x C5 0x69054117 0x79265013 + * + * PXA30x A0 0x69056880 0x0E648013 + * PXA30x A1 0x69056881 0x1E648013 + * PXA31x A0 0x69056890 0x0E649013 + * PXA31x A1 0x69056891 0x1E649013 + * PXA31x A2 0x69056892 0x2E649013 + * PXA32x B1 0x69056825 0x5E642013 + * PXA32x B2 0x69056826 0x6E642013 + * + * PXA930 B0 0x69056835 0x5E643013 + * PXA930 B1 0x69056837 0x7E643013 + * PXA930 B2 0x69056838 0x8E643013 + * + * PXA935 A0 0x56056931 0x1E653013 + * PXA935 B0 0x56056936 0x6E653013 + * PXA935 B1 0x56056938 0x8E653013 + */ +#ifdef CONFIG_PXA25x +#define __cpu_is_pxa210(id) \ + ({ \ + unsigned int _id = (id) & 0xf3f0; \ + _id == 0x2120; \ + }) + +#define __cpu_is_pxa250(id) \ + ({ \ + unsigned int _id = (id) & 0xf3ff; \ + _id <= 0x2105; \ + }) + +#define __cpu_is_pxa255(id) \ + ({ \ + unsigned int _id = (id) & 0xffff; \ + _id == 0x2d06; \ + }) + +#define __cpu_is_pxa25x(id) \ + ({ \ + unsigned int _id = (id) & 0xf300; \ + _id == 0x2100; \ + }) +#else +#define __cpu_is_pxa210(id) (0) +#define __cpu_is_pxa250(id) (0) +#define __cpu_is_pxa255(id) (0) +#define __cpu_is_pxa25x(id) (0) +#endif + +#ifdef CONFIG_PXA27x +#define __cpu_is_pxa27x(id) \ + ({ \ + unsigned int _id = (id) >> 4 & 0xfff; \ + _id == 0x411; \ + }) +#else +#define __cpu_is_pxa27x(id) (0) +#endif + +#ifdef CONFIG_CPU_PXA300 +#define __cpu_is_pxa300(id) \ + ({ \ + unsigned int _id = (id) >> 4 & 0xfff; \ + _id == 0x688; \ + }) +#else +#define __cpu_is_pxa300(id) (0) +#endif + +#ifdef CONFIG_CPU_PXA310 +#define __cpu_is_pxa310(id) \ + ({ \ + unsigned int _id = (id) >> 4 & 0xfff; \ + _id == 0x689; \ + }) +#else +#define __cpu_is_pxa310(id) (0) +#endif + +#ifdef CONFIG_CPU_PXA320 +#define __cpu_is_pxa320(id) \ + ({ \ + unsigned int _id = (id) >> 4 & 0xfff; \ + _id == 0x603 || _id == 0x682; \ + }) +#else +#define __cpu_is_pxa320(id) (0) +#endif + +#ifdef CONFIG_CPU_PXA930 +#define __cpu_is_pxa930(id) \ + ({ \ + unsigned int _id = (id) >> 4 & 0xfff; \ + _id == 0x683; \ + }) +#else +#define __cpu_is_pxa930(id) (0) +#endif + +#ifdef CONFIG_CPU_PXA935 +#define __cpu_is_pxa935(id) \ + ({ \ + unsigned int _id = (id) >> 4 & 0xfff; \ + _id == 0x693; \ + }) +#else +#define __cpu_is_pxa935(id) (0) +#endif + +#define cpu_is_pxa210() \ + ({ \ + __cpu_is_pxa210(read_cpuid_id()); \ + }) + +#define cpu_is_pxa250() \ + ({ \ + __cpu_is_pxa250(read_cpuid_id()); \ + }) + +#define cpu_is_pxa255() \ + ({ \ + __cpu_is_pxa255(read_cpuid_id()); \ + }) + +#define cpu_is_pxa25x() \ + ({ \ + __cpu_is_pxa25x(read_cpuid_id()); \ + }) + +#define cpu_is_pxa27x() \ + ({ \ + __cpu_is_pxa27x(read_cpuid_id()); \ + }) + +#define cpu_is_pxa300() \ + ({ \ + __cpu_is_pxa300(read_cpuid_id()); \ + }) + +#define cpu_is_pxa310() \ + ({ \ + __cpu_is_pxa310(read_cpuid_id()); \ + }) + +#define cpu_is_pxa320() \ + ({ \ + __cpu_is_pxa320(read_cpuid_id()); \ + }) + +#define cpu_is_pxa930() \ + ({ \ + __cpu_is_pxa930(read_cpuid_id()); \ + }) + +#define cpu_is_pxa935() \ + ({ \ + __cpu_is_pxa935(read_cpuid_id()); \ + }) + + + +/* + * CPUID Core Generation Bit + * <= 0x2 for pxa21x/pxa25x/pxa26x/pxa27x + */ +#if defined(CONFIG_PXA25x) || defined(CONFIG_PXA27x) +#define __cpu_is_pxa2xx(id) \ + ({ \ + unsigned int _id = (id) >> 13 & 0x7; \ + _id <= 0x2; \ + }) +#else +#define __cpu_is_pxa2xx(id) (0) +#endif + +#ifdef CONFIG_PXA3xx +#define __cpu_is_pxa3xx(id) \ + ({ \ + __cpu_is_pxa300(id) \ + || __cpu_is_pxa310(id) \ + || __cpu_is_pxa320(id) \ + || __cpu_is_pxa93x(id); \ + }) +#else +#define __cpu_is_pxa3xx(id) (0) +#endif + +#if defined(CONFIG_CPU_PXA930) || defined(CONFIG_CPU_PXA935) +#define __cpu_is_pxa93x(id) \ + ({ \ + __cpu_is_pxa930(id) \ + || __cpu_is_pxa935(id); \ + }) +#else +#define __cpu_is_pxa93x(id) (0) +#endif + +#define cpu_is_pxa2xx() \ + ({ \ + __cpu_is_pxa2xx(read_cpuid_id()); \ + }) + +#define cpu_is_pxa3xx() \ + ({ \ + __cpu_is_pxa3xx(read_cpuid_id()); \ + }) + +#define cpu_is_pxa93x() \ + ({ \ + __cpu_is_pxa93x(read_cpuid_id()); \ + }) + +#endif -- cgit From 22f0866513c2e531ae65a9d5dfc82f24497ef3b3 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 2 Sep 2019 00:02:08 +0200 Subject: ARM: pxa: move mach/sound.h to linux/platform_data/ This is a basically a platform_data file, so move it out of the mach/* header directory. Cc: Marek Vasut Cc: Tomas Cech Cc: Sergey Lapin Acked-by: Mark Brown Acked-by: Robert Jarzmik Cc: alsa-devel@alsa-project.org Signed-off-by: Arnd Bergmann --- include/linux/platform_data/asoc-pxa.h | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) create mode 100644 include/linux/platform_data/asoc-pxa.h (limited to 'include/linux') diff --git a/include/linux/platform_data/asoc-pxa.h b/include/linux/platform_data/asoc-pxa.h new file mode 100644 index 000000000000..327454cd8246 --- /dev/null +++ b/include/linux/platform_data/asoc-pxa.h @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __SOC_PXA_AUDIO_H__ +#define __SOC_PXA_AUDIO_H__ + +#include +#include +#include + +/* + * @reset_gpio: AC97 reset gpio (normally gpio113 or gpio95) + * a -1 value means no gpio will be used for reset + * @codec_pdata: AC97 codec platform_data + + * reset_gpio should only be specified for pxa27x CPUs where a silicon + * bug prevents correct operation of the reset line. If not specified, + * the default behaviour on these CPUs is to consider gpio 113 as the + * AC97 reset line, which is the default on most boards. + */ +typedef struct { + int (*startup)(struct snd_pcm_substream *, void *); + void (*shutdown)(struct snd_pcm_substream *, void *); + void (*suspend)(void *); + void (*resume)(void *); + void *priv; + int reset_gpio; + void *codec_pdata[AC97_BUS_MAX_DEVICES]; +} pxa2xx_audio_ops_t; + +extern void pxa_set_ac97_info(pxa2xx_audio_ops_t *ops); + +#endif -- cgit From ee84cbd5df2beaf14e8af0955f1ab15ad3f81504 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 2 Sep 2019 00:15:44 +0200 Subject: ARM: pxa: move regs-lcd.h into driver Only the pxafb driver uses this header, so move it into the same directory. The SMART_* macros are required by some platform data definitions and can go into the linux/platform_data/video-pxafb.h header. Acked-by: Bartlomiej Zolnierkiewicz Acked-by: Robert Jarzmik Cc: dri-devel@lists.freedesktop.org Cc: linux-fbdev@vger.kernel.org Signed-off-by: Arnd Bergmann --- include/linux/platform_data/video-pxafb.h | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/video-pxafb.h b/include/linux/platform_data/video-pxafb.h index b3d574778326..6333bac166a5 100644 --- a/include/linux/platform_data/video-pxafb.h +++ b/include/linux/platform_data/video-pxafb.h @@ -8,7 +8,6 @@ */ #include -#include /* * Supported LCD connections @@ -153,6 +152,27 @@ struct pxafb_mach_info { void pxa_set_fb_info(struct device *, struct pxafb_mach_info *); unsigned long pxafb_get_hsync_time(struct device *dev); +/* smartpanel related */ +#define SMART_CMD_A0 (0x1 << 8) +#define SMART_CMD_READ_STATUS_REG (0x0 << 9) +#define SMART_CMD_READ_FRAME_BUFFER ((0x0 << 9) | SMART_CMD_A0) +#define SMART_CMD_WRITE_COMMAND (0x1 << 9) +#define SMART_CMD_WRITE_DATA ((0x1 << 9) | SMART_CMD_A0) +#define SMART_CMD_WRITE_FRAME ((0x2 << 9) | SMART_CMD_A0) +#define SMART_CMD_WAIT_FOR_VSYNC (0x3 << 9) +#define SMART_CMD_NOOP (0x4 << 9) +#define SMART_CMD_INTERRUPT (0x5 << 9) + +#define SMART_CMD(x) (SMART_CMD_WRITE_COMMAND | ((x) & 0xff)) +#define SMART_DAT(x) (SMART_CMD_WRITE_DATA | ((x) & 0xff)) + +/* SMART_DELAY() is introduced for software controlled delay primitive which + * can be inserted between command sequences, unused command 0x6 is used here + * and delay ranges from 0ms ~ 255ms + */ +#define SMART_CMD_DELAY (0x6 << 9) +#define SMART_DELAY(ms) (SMART_CMD_DELAY | ((ms) & 0xff)) + #ifdef CONFIG_FB_PXA_SMARTPANEL extern int pxafb_smart_queue(struct fb_info *info, uint16_t *cmds, int); extern int pxafb_smart_flush(struct fb_info *info); -- cgit From eb38c2053b67977844404cbbdee341dbf3a02d36 Mon Sep 17 00:00:00 2001 From: Marc Kleine-Budde Date: Mon, 14 Mar 2022 23:09:10 +0100 Subject: can: rx-offload: rename can_rx_offload_queue_sorted() -> can_rx_offload_queue_timestamp() This patch renames the function can_rx_offload_queue_sorted() to can_rx_offload_queue_timestamp(). This better describes what the function does, it adds a newly RX'ed skb to the sorted queue by its timestamp. Link: https://lore.kernel.org/all/20220417194327.2699059-1-mkl@pengutronix.de Signed-off-by: Marc Kleine-Budde --- include/linux/can/rx-offload.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/can/rx-offload.h b/include/linux/can/rx-offload.h index c11477620403..c205c51d79c9 100644 --- a/include/linux/can/rx-offload.h +++ b/include/linux/can/rx-offload.h @@ -42,8 +42,8 @@ int can_rx_offload_add_manual(struct net_device *dev, int can_rx_offload_irq_offload_timestamp(struct can_rx_offload *offload, u64 reg); int can_rx_offload_irq_offload_fifo(struct can_rx_offload *offload); -int can_rx_offload_queue_sorted(struct can_rx_offload *offload, - struct sk_buff *skb, u32 timestamp); +int can_rx_offload_queue_timestamp(struct can_rx_offload *offload, + struct sk_buff *skb, u32 timestamp); unsigned int can_rx_offload_get_echo_skb(struct can_rx_offload *offload, unsigned int idx, u32 timestamp, unsigned int *frame_len_ptr); -- cgit From 055eb95533273bc334794dbc598400d10800528f Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Thu, 14 Apr 2022 09:12:33 -0700 Subject: bpf: Move rcu lock management out of BPF_PROG_RUN routines Commit 7d08c2c91171 ("bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions") switched a bunch of BPF_PROG_RUN macros to inline routines. This changed the semantic a bit. Due to arguments expansion of macros, it used to be: rcu_read_lock(); array = rcu_dereference(cgrp->bpf.effective[atype]); ... Now, with with inline routines, we have: array_rcu = rcu_dereference(cgrp->bpf.effective[atype]); /* array_rcu can be kfree'd here */ rcu_read_lock(); array = rcu_dereference(array_rcu); I'm assuming in practice rcu subsystem isn't fast enough to trigger this but let's use rcu API properly. Also, rename to lower caps to not confuse with macros. Additionally, drop and expand BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY. See [1] for more context. [1] https://lore.kernel.org/bpf/CAKH8qBs60fOinFdxiiQikK_q0EcVxGvNTQoWvHLEUGbgcj1UYg@mail.gmail.com/T/#u v2 - keep rcu locks inside by passing cgroup_bpf Fixes: 7d08c2c91171 ("bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions") Signed-off-by: Stanislav Fomichev Signed-off-by: Alexei Starovoitov Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/bpf/20220414161233.170780-1-sdf@google.com --- include/linux/bpf.h | 115 ++++------------------------------------------------ 1 file changed, 7 insertions(+), 108 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index bdb5298735ce..7bf441563ffc 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1221,7 +1221,7 @@ u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size, /* an array of programs to be executed under rcu_lock. * * Typical usage: - * ret = BPF_PROG_RUN_ARRAY(&bpf_prog_array, ctx, bpf_prog_run); + * ret = bpf_prog_run_array(rcu_dereference(&bpf_prog_array), ctx, bpf_prog_run); * * the structure returned by bpf_prog_array_alloc() should be populated * with program pointers and the last pointer must be NULL. @@ -1315,83 +1315,22 @@ static inline void bpf_reset_run_ctx(struct bpf_run_ctx *old_ctx) typedef u32 (*bpf_prog_run_fn)(const struct bpf_prog *prog, const void *ctx); -static __always_inline int -BPF_PROG_RUN_ARRAY_CG_FLAGS(const struct bpf_prog_array __rcu *array_rcu, - const void *ctx, bpf_prog_run_fn run_prog, - int retval, u32 *ret_flags) -{ - const struct bpf_prog_array_item *item; - const struct bpf_prog *prog; - const struct bpf_prog_array *array; - struct bpf_run_ctx *old_run_ctx; - struct bpf_cg_run_ctx run_ctx; - u32 func_ret; - - run_ctx.retval = retval; - migrate_disable(); - rcu_read_lock(); - array = rcu_dereference(array_rcu); - item = &array->items[0]; - old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); - while ((prog = READ_ONCE(item->prog))) { - run_ctx.prog_item = item; - func_ret = run_prog(prog, ctx); - if (!(func_ret & 1) && !IS_ERR_VALUE((long)run_ctx.retval)) - run_ctx.retval = -EPERM; - *(ret_flags) |= (func_ret >> 1); - item++; - } - bpf_reset_run_ctx(old_run_ctx); - rcu_read_unlock(); - migrate_enable(); - return run_ctx.retval; -} - -static __always_inline int -BPF_PROG_RUN_ARRAY_CG(const struct bpf_prog_array __rcu *array_rcu, - const void *ctx, bpf_prog_run_fn run_prog, - int retval) -{ - const struct bpf_prog_array_item *item; - const struct bpf_prog *prog; - const struct bpf_prog_array *array; - struct bpf_run_ctx *old_run_ctx; - struct bpf_cg_run_ctx run_ctx; - - run_ctx.retval = retval; - migrate_disable(); - rcu_read_lock(); - array = rcu_dereference(array_rcu); - item = &array->items[0]; - old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); - while ((prog = READ_ONCE(item->prog))) { - run_ctx.prog_item = item; - if (!run_prog(prog, ctx) && !IS_ERR_VALUE((long)run_ctx.retval)) - run_ctx.retval = -EPERM; - item++; - } - bpf_reset_run_ctx(old_run_ctx); - rcu_read_unlock(); - migrate_enable(); - return run_ctx.retval; -} - static __always_inline u32 -BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu, +bpf_prog_run_array(const struct bpf_prog_array *array, const void *ctx, bpf_prog_run_fn run_prog) { const struct bpf_prog_array_item *item; const struct bpf_prog *prog; - const struct bpf_prog_array *array; struct bpf_run_ctx *old_run_ctx; struct bpf_trace_run_ctx run_ctx; u32 ret = 1; - migrate_disable(); - rcu_read_lock(); - array = rcu_dereference(array_rcu); + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), "no rcu lock held"); + if (unlikely(!array)) - goto out; + return ret; + + migrate_disable(); old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); item = &array->items[0]; while ((prog = READ_ONCE(item->prog))) { @@ -1400,50 +1339,10 @@ BPF_PROG_RUN_ARRAY(const struct bpf_prog_array __rcu *array_rcu, item++; } bpf_reset_run_ctx(old_run_ctx); -out: - rcu_read_unlock(); migrate_enable(); return ret; } -/* To be used by __cgroup_bpf_run_filter_skb for EGRESS BPF progs - * so BPF programs can request cwr for TCP packets. - * - * Current cgroup skb programs can only return 0 or 1 (0 to drop the - * packet. This macro changes the behavior so the low order bit - * indicates whether the packet should be dropped (0) or not (1) - * and the next bit is a congestion notification bit. This could be - * used by TCP to call tcp_enter_cwr() - * - * Hence, new allowed return values of CGROUP EGRESS BPF programs are: - * 0: drop packet - * 1: keep packet - * 2: drop packet and cn - * 3: keep packet and cn - * - * This macro then converts it to one of the NET_XMIT or an error - * code that is then interpreted as drop packet (and no cn): - * 0: NET_XMIT_SUCCESS skb should be transmitted - * 1: NET_XMIT_DROP skb should be dropped and cn - * 2: NET_XMIT_CN skb should be transmitted and cn - * 3: -err skb should be dropped - */ -#define BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY(array, ctx, func) \ - ({ \ - u32 _flags = 0; \ - bool _cn; \ - u32 _ret; \ - _ret = BPF_PROG_RUN_ARRAY_CG_FLAGS(array, ctx, func, 0, &_flags); \ - _cn = _flags & BPF_RET_SET_CN; \ - if (_ret && !IS_ERR_VALUE((long)_ret)) \ - _ret = -EFAULT; \ - if (!_ret) \ - _ret = (_cn ? NET_XMIT_CN : NET_XMIT_SUCCESS); \ - else \ - _ret = (_cn ? NET_XMIT_DROP : _ret); \ - _ret; \ - }) - #ifdef CONFIG_BPF_SYSCALL DECLARE_PER_CPU(int, bpf_prog_active); extern struct mutex bpf_stats_enabled_mutex; -- cgit From 705191b03d507744c7e097f78d583621c14988ac Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Tue, 19 Apr 2022 15:14:23 +0200 Subject: fs: fix acl translation Last cycle we extended the idmapped mounts infrastructure to support idmapped mounts of idmapped filesystems (No such filesystem yet exist.). Since then, the meaning of an idmapped mount is a mount whose idmapping is different from the filesystems idmapping. While doing that work we missed to adapt the acl translation helpers. They still assume that checking for the identity mapping is enough. But they need to use the no_idmapping() helper instead. Note, POSIX ACLs are always translated right at the userspace-kernel boundary using the caller's current idmapping and the initial idmapping. The order depends on whether we're coming from or going to userspace. The filesystem's idmapping doesn't matter at the border. Consequently, if a non-idmapped mount is passed we need to make sure to always pass the initial idmapping as the mount's idmapping and not the filesystem idmapping. Since it's irrelevant here it would yield invalid ids and prevent setting acls for filesystems that are mountable in a userns and support posix acls (tmpfs and fuse). I verified the regression reported in [1] and verified that this patch fixes it. A regression test will be added to xfstests in parallel. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215849 [1] Fixes: bd303368b776 ("fs: support mapped mounts of mapped filesystems") Cc: Seth Forshee Cc: Christoph Hellwig Cc: # 5.17 Cc: Signed-off-by: Christian Brauner (Microsoft) Signed-off-by: Linus Torvalds --- include/linux/posix_acl_xattr.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/posix_acl_xattr.h b/include/linux/posix_acl_xattr.h index 060e8d203181..1766e1de6956 100644 --- a/include/linux/posix_acl_xattr.h +++ b/include/linux/posix_acl_xattr.h @@ -34,15 +34,19 @@ posix_acl_xattr_count(size_t size) #ifdef CONFIG_FS_POSIX_ACL void posix_acl_fix_xattr_from_user(struct user_namespace *mnt_userns, + struct inode *inode, void *value, size_t size); void posix_acl_fix_xattr_to_user(struct user_namespace *mnt_userns, + struct inode *inode, void *value, size_t size); #else static inline void posix_acl_fix_xattr_from_user(struct user_namespace *mnt_userns, + struct inode *inode, void *value, size_t size) { } static inline void posix_acl_fix_xattr_to_user(struct user_namespace *mnt_userns, + struct inode *inode, void *value, size_t size) { } -- cgit From 559089e0a93d44280ec3ab478830af319c56dbe3 Mon Sep 17 00:00:00 2001 From: Song Liu Date: Fri, 15 Apr 2022 09:44:10 -0700 Subject: vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP Huge page backed vmalloc memory could benefit performance in many cases. However, some users of vmalloc may not be ready to handle huge pages for various reasons: hardware constraints, potential pages split, etc. VM_NO_HUGE_VMAP was introduced to allow vmalloc users to opt-out huge pages. However, it is not easy to track down all the users that require the opt-out, as the allocation are passed different stacks and may cause issues in different layers. To address this issue, replace VM_NO_HUGE_VMAP with an opt-in flag, VM_ALLOW_HUGE_VMAP, so that users that benefit from huge pages could ask specificially. Also, remove vmalloc_no_huge() and add opt-in helper vmalloc_huge(). Fixes: fac54e2bfb5b ("x86/Kconfig: Select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP") Link: https://lore.kernel.org/netdev/14444103-d51b-0fb3-ee63-c3f182f0b546@molgen.mpg.de/" Reviewed-by: Christoph Hellwig Signed-off-by: Song Liu Reviewed-by: Rik van Riel Signed-off-by: Linus Torvalds --- include/linux/vmalloc.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 3b1df7da402d..b159c2789961 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -26,7 +26,7 @@ struct notifier_block; /* in notifier.h */ #define VM_KASAN 0x00000080 /* has allocated kasan shadow memory */ #define VM_FLUSH_RESET_PERMS 0x00000100 /* reset direct map and flush TLB on unmap, can't be freed in atomic context */ #define VM_MAP_PUT_PAGES 0x00000200 /* put pages and free array in vfree */ -#define VM_NO_HUGE_VMAP 0x00000400 /* force PAGE_SIZE pte mapping */ +#define VM_ALLOW_HUGE_VMAP 0x00000400 /* Allow for huge pages on archs with HAVE_ARCH_HUGE_VMALLOC */ #if (defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS)) && \ !defined(CONFIG_KASAN_VMALLOC) @@ -153,7 +153,7 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align, const void *caller) __alloc_size(1); void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask, int node, const void *caller) __alloc_size(1); -void *vmalloc_no_huge(unsigned long size) __alloc_size(1); +void *vmalloc_huge(unsigned long size, gfp_t gfp_mask) __alloc_size(1); extern void *__vmalloc_array(size_t n, size_t size, gfp_t flags) __alloc_size(1, 2); extern void *vmalloc_array(size_t n, size_t size) __alloc_size(1, 2); -- cgit From b83deaa741558babf4b8d51d34f6637ccfff1b26 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 28 May 2020 22:57:40 +0200 Subject: ARM: pxa: move pcmcia board data into mach-pxa The drivers/pcmcia/pxa2xx_*.c are essentially part of the board files, but for historic reasons located in drivers/pcmcia. Move them into the same place as the actual board file to avoid lots of machine header inclusions. Cc: Marek Vasut Cc: Dominik Brodowski Cc: Jonathan Cameron Signed-off-by: Arnd Bergmann --- include/linux/platform_data/pcmcia-pxa2xx_viper.h | 12 ------------ 1 file changed, 12 deletions(-) delete mode 100644 include/linux/platform_data/pcmcia-pxa2xx_viper.h (limited to 'include/linux') diff --git a/include/linux/platform_data/pcmcia-pxa2xx_viper.h b/include/linux/platform_data/pcmcia-pxa2xx_viper.h deleted file mode 100644 index a23b58aff9e1..000000000000 --- a/include/linux/platform_data/pcmcia-pxa2xx_viper.h +++ /dev/null @@ -1,12 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __ARCOM_PCMCIA_H -#define __ARCOM_PCMCIA_H - -struct arcom_pcmcia_pdata { - int cd_gpio; - int rdy_gpio; - int pwr_gpio; - void (*reset)(int state); -}; - -#endif -- cgit From d4e5268a08b211b536fed29beb24271ecd85187e Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 8 Apr 2022 11:45:55 +0200 Subject: x86,objtool: Mark cpu_startup_entry() __noreturn GCC-8 isn't clever enough to figure out that cpu_start_entry() is a noreturn while objtool is. This results in code after the call in start_secondary(). Give GCC a hand so that they all agree on things. vmlinux.o: warning: objtool: start_secondary()+0x10e: unreachable Reported-by: Rick Edgecombe Signed-off-by: Peter Zijlstra (Intel) Acked-by: Josh Poimboeuf Link: https://lore.kernel.org/r/20220408094718.383658532@infradead.org --- include/linux/cpu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 9cf51e41e697..54dc2f9a2d56 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -167,7 +167,7 @@ static inline int suspend_disable_secondary_cpus(void) { return 0; } static inline void suspend_enable_secondary_cpus(void) { } #endif /* !CONFIG_PM_SLEEP_SMP */ -void cpu_startup_entry(enum cpuhp_state state); +void __noreturn cpu_startup_entry(enum cpuhp_state state); void cpu_idle_poll_ctrl(bool enable); -- cgit From dcf456c9a095a6e71f53d6f6f004133ee851ee70 Mon Sep 17 00:00:00 2001 From: KP Singh Date: Mon, 18 Apr 2022 15:51:58 +0000 Subject: bpf: Fix usage of trace RCU in local storage. bpf_{sk,task,inode}_storage_free() do not need to use call_rcu_tasks_trace as no BPF program should be accessing the owner as it's being destroyed. The only other reader at this point is bpf_local_storage_map_free() which uses normal RCU. The only path that needs trace RCU are: * bpf_local_storage_{delete,update} helpers * map_{delete,update}_elem() syscalls Fixes: 0fe4b381a59e ("bpf: Allow bpf_local_storage to be used by sleepable programs") Signed-off-by: KP Singh Signed-off-by: Alexei Starovoitov Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/bpf/20220418155158.2865678-1-kpsingh@kernel.org --- include/linux/bpf_local_storage.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h index 493e63258497..7ea18d4da84b 100644 --- a/include/linux/bpf_local_storage.h +++ b/include/linux/bpf_local_storage.h @@ -143,9 +143,9 @@ void bpf_selem_link_storage_nolock(struct bpf_local_storage *local_storage, bool bpf_selem_unlink_storage_nolock(struct bpf_local_storage *local_storage, struct bpf_local_storage_elem *selem, - bool uncharge_omem); + bool uncharge_omem, bool use_trace_rcu); -void bpf_selem_unlink(struct bpf_local_storage_elem *selem); +void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool use_trace_rcu); void bpf_selem_link_map(struct bpf_local_storage_map *smap, struct bpf_local_storage_elem *selem); -- cgit From 3abfaefb9a6dda5e0bfa6160f55abfd0e32d8b0a Mon Sep 17 00:00:00 2001 From: Liu Ying Date: Tue, 19 Apr 2022 09:08:49 +0800 Subject: phy: Add LVDS configuration options This patch allows LVDS PHYs to be configured through the generic functions and through a custom structure added to the generic union. The parameters added here are based on common LVDS PHY implementation practices. The set of parameters should cover all potential users. Cc: Kishon Vijay Abraham I Cc: Vinod Koul Cc: NXP Linux Team Signed-off-by: Liu Ying Link: https://lore.kernel.org/r/20220419010852.452169-3-victor.liu@nxp.com Signed-off-by: Vinod Koul --- include/linux/phy/phy-lvds.h | 32 ++++++++++++++++++++++++++++++++ include/linux/phy/phy.h | 4 ++++ 2 files changed, 36 insertions(+) create mode 100644 include/linux/phy/phy-lvds.h (limited to 'include/linux') diff --git a/include/linux/phy/phy-lvds.h b/include/linux/phy/phy-lvds.h new file mode 100644 index 000000000000..09931d080a6d --- /dev/null +++ b/include/linux/phy/phy-lvds.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright 2020,2022 NXP + */ + +#ifndef __PHY_LVDS_H_ +#define __PHY_LVDS_H_ + +/** + * struct phy_configure_opts_lvds - LVDS configuration set + * @bits_per_lane_and_dclk_cycle: Number of bits per lane per differential + * clock cycle. + * @differential_clk_rate: Clock rate, in Hertz, of the LVDS + * differential clock. + * @lanes: Number of active, consecutive, + * data lanes, starting from lane 0, + * used for the transmissions. + * @is_slave: Boolean, true if the phy is a slave + * which works together with a master + * phy to support dual link transmission, + * otherwise a regular phy or a master phy. + * + * This structure is used to represent the configuration state of a LVDS phy. + */ +struct phy_configure_opts_lvds { + unsigned int bits_per_lane_and_dclk_cycle; + unsigned long differential_clk_rate; + unsigned int lanes; + bool is_slave; +}; + +#endif /* __PHY_LVDS_H_ */ diff --git a/include/linux/phy/phy.h b/include/linux/phy/phy.h index f3286f4cd306..b1413757fcc3 100644 --- a/include/linux/phy/phy.h +++ b/include/linux/phy/phy.h @@ -17,6 +17,7 @@ #include #include +#include #include struct phy; @@ -57,10 +58,13 @@ enum phy_media { * the MIPI_DPHY phy mode. * @dp: Configuration set applicable for phys supporting * the DisplayPort protocol. + * @lvds: Configuration set applicable for phys supporting + * the LVDS phy mode. */ union phy_configure_opts { struct phy_configure_opts_mipi_dphy mipi_dphy; struct phy_configure_opts_dp dp; + struct phy_configure_opts_lvds lvds; }; /** -- cgit From fc44ff0ae9f2aa20b64c4e63ab4614156df80240 Mon Sep 17 00:00:00 2001 From: Ben Walker Date: Tue, 1 Mar 2022 11:25:48 -0700 Subject: dmaengine: Document dmaengine_prep_dma_memset Document this function to make clear the expected behavior of the 'value' parameter. It was intended to match the behavior of POSIX memset as laid out here: https://lore.kernel.org/dmaengine/YejrA5ZWZ3lTRO%2F1@matsya/ Signed-off-by: Ben Walker Link: https://lore.kernel.org/r/20220301182551.883474-2-benjamin.walker@intel.com Signed-off-by: Vinod Koul --- include/linux/dmaengine.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h index 6db9e03afd0b..b46b88e6aa0d 100644 --- a/include/linux/dmaengine.h +++ b/include/linux/dmaengine.h @@ -1030,6 +1030,14 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_interleaved_dma( return chan->device->device_prep_interleaved_dma(chan, xt, flags); } +/** + * dmaengine_prep_dma_memset() - Prepare a DMA memset descriptor. + * @chan: The channel to be used for this descriptor + * @dest: Address of buffer to be set + * @value: Treated as a single byte value that fills the destination buffer + * @len: The total size of dest + * @flags: DMA engine flags + */ static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memset( struct dma_chan *chan, dma_addr_t dest, int value, size_t len, unsigned long flags) -- cgit From 5252c1c5a08e583ab1363f809002cd8a59272b35 Mon Sep 17 00:00:00 2001 From: Chun-Kuang Hu Date: Sat, 16 Apr 2022 17:54:28 +0800 Subject: soc: mediatek: cmdq: Use mailbox rx_callback instead of cmdq_task_cb rx_callback is a standard mailbox callback mechanism and could cover the function of proprietary cmdq_task_cb, so use the standard one instead of the proprietary one. Client has changed to use the standard callback machanism and sync dma buffer in client driver, so remove the proprietary callback in cmdq helper. Signed-off-by: Chun-Kuang Hu Reviewed-by: jason-jh.lin Tested-by: jason-jh.lin Link: https://lore.kernel.org/r/1650102868-26219-1-git-send-email-chunkuang.hu@kernel.org Signed-off-by: Matthias Brugger --- include/linux/soc/mediatek/mtk-cmdq.h | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soc/mediatek/mtk-cmdq.h b/include/linux/soc/mediatek/mtk-cmdq.h index ac6b5f3cba95..2b498f4f3946 100644 --- a/include/linux/soc/mediatek/mtk-cmdq.h +++ b/include/linux/soc/mediatek/mtk-cmdq.h @@ -268,8 +268,6 @@ int cmdq_pkt_finalize(struct cmdq_pkt *pkt); * cmdq_pkt_flush_async() - trigger CMDQ to asynchronously execute the CMDQ * packet and call back at the end of done packet * @pkt: the CMDQ packet - * @cb: called at the end of done packet - * @data: this data will pass back to cb * * Return: 0 for success; else the error code is returned * @@ -277,7 +275,6 @@ int cmdq_pkt_finalize(struct cmdq_pkt *pkt); * at the end of done packet. Note that this is an ASYNC function. When the * function returned, it may or may not be finished. */ -int cmdq_pkt_flush_async(struct cmdq_pkt *pkt, cmdq_async_flush_cb cb, - void *data); +int cmdq_pkt_flush_async(struct cmdq_pkt *pkt); #endif /* __MTK_CMDQ_H__ */ -- cgit From 7a107b2c6b813c3c587d1e76cb405dc637dd2b36 Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Wed, 20 Apr 2022 11:19:01 +0300 Subject: Revert "serial: 8250: Handle UART without interrupt on TEMT using em485" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This partially reverts commit f6f586102add. The code added by that commit containted math overflow for 32-bit archs. In addition, the approach used in it is unnecessarily complicated requiring a dedicated timer just for notemt. A simpler approach for providing UART_CAP_NOTEMT already exists (patches 1-2): https://lore.kernel.org/linux-serial/20220411083321.9131-3-ilpo.jarvinen@linux.intel.com/T/#u Thus, simply revert the UART_CAP_NOTEMT change for now. There were two driver changes within the patch series adding UART_CAP_NOTEMT taking advantage of the newly added flag. This does not revert the driver changes and therefore also UART_CAP_NOTEMT define has to remain. UART_CAP_NOTEMT remains no-op until support is again added. Fixes: f6f586102add ("serial: 8250: Handle UART without interrupt on TEMT using em485") Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/5f874142-fb1f-bff7-f33-fac823e65e2e@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_8250.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h index de135852107c..ff84a3ed10ea 100644 --- a/include/linux/serial_8250.h +++ b/include/linux/serial_8250.h @@ -79,9 +79,7 @@ struct uart_8250_ops { struct uart_8250_em485 { struct hrtimer start_tx_timer; /* "rs485 start tx" timer */ struct hrtimer stop_tx_timer; /* "rs485 stop tx" timer */ - struct hrtimer no_temt_timer; /* "rs485 no TEMT interrupt" timer */ struct hrtimer *active_timer; /* pointer to active timer */ - unsigned long no_temt_delay; /* Delay for no_temt_timer */ struct uart_8250_port *port; /* for hrtimer callbacks */ unsigned int tx_stopped:1; /* tx is currently stopped */ }; -- cgit From 92ece28072f18f30099770c5d4b8e300ea6820fa Mon Sep 17 00:00:00 2001 From: Liu Jian Date: Sat, 16 Apr 2022 18:58:00 +0800 Subject: net: Change skb_ensure_writable()'s write_len param to unsigned int type Both pskb_may_pull() and skb_clone_writable()'s length parameters are of type unsigned int already. Therefore, change this function's write_len param to unsigned int type. Signed-off-by: Liu Jian Signed-off-by: Daniel Borkmann Acked-by: Song Liu Link: https://lore.kernel.org/bpf/20220416105801.88708-3-liujian56@huawei.com --- include/linux/skbuff.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2394441fa3dd..b37c829d08e1 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -3885,7 +3885,7 @@ struct sk_buff *skb_segment(struct sk_buff *skb, netdev_features_t features); struct sk_buff *skb_segment_list(struct sk_buff *skb, netdev_features_t features, unsigned int offset); struct sk_buff *skb_vlan_untag(struct sk_buff *skb); -int skb_ensure_writable(struct sk_buff *skb, int write_len); +int skb_ensure_writable(struct sk_buff *skb, unsigned int write_len); int __skb_vlan_pop(struct sk_buff *skb, u16 *vlan_tci); int skb_vlan_pop(struct sk_buff *skb); int skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci); -- cgit From 37c5f9e80e015d0df17d0c377c18523002986851 Mon Sep 17 00:00:00 2001 From: Oleksandr Ocheretnyi Date: Sun, 17 Apr 2022 11:46:47 -0700 Subject: mtd: fix 'part' field data corruption in mtd_info Commit 46b5889cc2c5 ("mtd: implement proper partition handling") started using "mtd_get_master_ofs()" in mtd callbacks to determine memory offsets by means of 'part' field from mtd_info, what previously was smashed accessing 'master' field in the mtd_set_dev_defaults() method. That provides wrong offset what causes hardware access errors. Just make 'part', 'master' as separate fields, rather than using union type to avoid 'part' data corruption when mtd_set_dev_defaults() is called. Fixes: 46b5889cc2c5 ("mtd: implement proper partition handling") Signed-off-by: Oleksandr Ocheretnyi Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20220417184649.449289-1-oocheret@cisco.com --- include/linux/mtd/mtd.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h index 151607e9d64a..955aee14b0f7 100644 --- a/include/linux/mtd/mtd.h +++ b/include/linux/mtd/mtd.h @@ -389,10 +389,8 @@ struct mtd_info { /* List of partitions attached to this MTD device */ struct list_head partitions; - union { - struct mtd_part part; - struct mtd_master master; - }; + struct mtd_part part; + struct mtd_master master; }; static inline struct mtd_info *mtd_get_master(struct mtd_info *mtd) -- cgit From f4c5c7f9d2e5ab005d57826b740b694b042a737c Mon Sep 17 00:00:00 2001 From: Felix Matouschek Date: Mon, 18 Apr 2022 15:28:03 +0200 Subject: mtd: spinand: Add support for XTX XT26G0xA Add support for XTX Technology XT26G01AXXXXX, XTX26G02AXXXXX and XTX26G04AXXXXX SPI NAND. These are 3V, 1G/2G/4Gbit serial SLC NAND flash devices with on-die ECC (8bit strength per 512bytes). Tested on Teltonika RUTX10 flashed with OpenWrt. Links: - http://www.xtxtech.com/download/?AId=225 - https://datasheet.lcsc.com/szlcsc/2005251034_XTX-XT26G01AWSEGA_C558841.pdf Signed-off-by: Felix Matouschek Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20220418132803.664103-1-felix@matouschek.org --- include/linux/mtd/spinand.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mtd/spinand.h b/include/linux/mtd/spinand.h index 3aa28240a77f..5584d3bb6556 100644 --- a/include/linux/mtd/spinand.h +++ b/include/linux/mtd/spinand.h @@ -266,6 +266,7 @@ extern const struct spinand_manufacturer micron_spinand_manufacturer; extern const struct spinand_manufacturer paragon_spinand_manufacturer; extern const struct spinand_manufacturer toshiba_spinand_manufacturer; extern const struct spinand_manufacturer winbond_spinand_manufacturer; +extern const struct spinand_manufacturer xtx_spinand_manufacturer; /** * struct spinand_op_variants - SPI NAND operation variants -- cgit From 0b0002315adfea3d9eb102665a4441727d4d2621 Mon Sep 17 00:00:00 2001 From: Weili Qian Date: Sat, 16 Apr 2022 18:45:56 +0800 Subject: crypto: hisilicon/qm - remove unused function declaration The 'hisi_qm_get_hw_version' function is unused, so remove the function declaration. Signed-off-by: Weili Qian Signed-off-by: Herbert Xu --- include/linux/hisi_acc_qm.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h index e5522eaf88fd..5177858aeea9 100644 --- a/include/linux/hisi_acc_qm.h +++ b/include/linux/hisi_acc_qm.h @@ -453,7 +453,6 @@ int hisi_qp_send(struct hisi_qp *qp, const void *msg); int hisi_qm_get_free_qp_num(struct hisi_qm *qm); int hisi_qm_get_vft(struct hisi_qm *qm, u32 *base, u32 *number); void hisi_qm_debug_init(struct hisi_qm *qm); -enum qm_hw_ver hisi_qm_get_hw_version(struct pci_dev *pdev); void hisi_qm_debug_regs_clear(struct hisi_qm *qm); int hisi_qm_sriov_enable(struct pci_dev *pdev, int max_vfs); int hisi_qm_sriov_disable(struct pci_dev *pdev, bool is_frozen); -- cgit From fb06eb9727d67eac16e6b6aff35362476389cb88 Mon Sep 17 00:00:00 2001 From: Weili Qian Date: Sat, 16 Apr 2022 18:45:57 +0800 Subject: crypto: hisilicon/qm - set function with static These functions 'hisi_qm_create_qp' and 'hisi_qm_set_vft' are not used outside qm.c, so they are marked as static. Signed-off-by: Weili Qian Signed-off-by: Herbert Xu --- include/linux/hisi_acc_qm.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h index 5177858aeea9..e99b286b50bc 100644 --- a/include/linux/hisi_acc_qm.h +++ b/include/linux/hisi_acc_qm.h @@ -445,13 +445,11 @@ int hisi_qm_init(struct hisi_qm *qm); void hisi_qm_uninit(struct hisi_qm *qm); int hisi_qm_start(struct hisi_qm *qm); int hisi_qm_stop(struct hisi_qm *qm, enum qm_stop_reason r); -struct hisi_qp *hisi_qm_create_qp(struct hisi_qm *qm, u8 alg_type); int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg); int hisi_qm_stop_qp(struct hisi_qp *qp); void hisi_qm_release_qp(struct hisi_qp *qp); int hisi_qp_send(struct hisi_qp *qp, const void *msg); int hisi_qm_get_free_qp_num(struct hisi_qm *qm); -int hisi_qm_get_vft(struct hisi_qm *qm, u32 *base, u32 *number); void hisi_qm_debug_init(struct hisi_qm *qm); void hisi_qm_debug_regs_clear(struct hisi_qm *qm); int hisi_qm_sriov_enable(struct pci_dev *pdev, int max_vfs); -- cgit From 7982996c5b0812cbf7846deff0e2f5dcbf290129 Mon Sep 17 00:00:00 2001 From: Weili Qian Date: Sat, 16 Apr 2022 18:45:58 +0800 Subject: crypto: hisilicon/qm - replace hisi_qm_release_qp() with hisi_qm_free_qps() hisi_qm_free_qps() can release multiple queues in one call, and it is already exported. So, replace hisi_qm_release_qp() with hisi_qm_free_qps() in zip_crypto.c, and do not export hisi_qm_release_qp() outside qm.c. Signed-off-by: Weili Qian Signed-off-by: Herbert Xu --- include/linux/hisi_acc_qm.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h index e99b286b50bc..d2ccb11071c5 100644 --- a/include/linux/hisi_acc_qm.h +++ b/include/linux/hisi_acc_qm.h @@ -447,7 +447,6 @@ int hisi_qm_start(struct hisi_qm *qm); int hisi_qm_stop(struct hisi_qm *qm, enum qm_stop_reason r); int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg); int hisi_qm_stop_qp(struct hisi_qp *qp); -void hisi_qm_release_qp(struct hisi_qp *qp); int hisi_qp_send(struct hisi_qp *qp, const void *msg); int hisi_qm_get_free_qp_num(struct hisi_qm *qm); void hisi_qm_debug_init(struct hisi_qm *qm); -- cgit From b0c42232fce499ba96fbf2c5ebd2368efeb6597e Mon Sep 17 00:00:00 2001 From: Weili Qian Date: Sat, 16 Apr 2022 18:45:59 +0800 Subject: crypto: hisilicon/qm - remove hisi_qm_get_free_qp_num() hisi_qm_get_free_qp_num() is to get the free queue number on the function. It is a simple function and is only called by hisi_qm_get_available_instances(). This patch modifies to get the free queue directly in hisi_qm_get_available_instances(), and remove hisi_qm_get_free_qp_num(). Signed-off-by: Weili Qian Signed-off-by: Herbert Xu --- include/linux/hisi_acc_qm.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h index d2ccb11071c5..6cabafffd0dd 100644 --- a/include/linux/hisi_acc_qm.h +++ b/include/linux/hisi_acc_qm.h @@ -448,7 +448,6 @@ int hisi_qm_stop(struct hisi_qm *qm, enum qm_stop_reason r); int hisi_qm_start_qp(struct hisi_qp *qp, unsigned long arg); int hisi_qm_stop_qp(struct hisi_qp *qp); int hisi_qp_send(struct hisi_qp *qp, const void *msg); -int hisi_qm_get_free_qp_num(struct hisi_qm *qm); void hisi_qm_debug_init(struct hisi_qm *qm); void hisi_qm_debug_regs_clear(struct hisi_qm *qm); int hisi_qm_sriov_enable(struct pci_dev *pdev, int max_vfs); -- cgit From 6c7f98b334a32df5cac8abac8983ac4ce17cab57 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Mon, 11 Apr 2022 16:13:59 +0200 Subject: vfio/mdev: Remove vfio_mdev.c Now that all mdev drivers directly create their own mdev_device driver and directly register with the vfio core's vfio_device_ops this is all dead code. Delete vfio_mdev.c and the mdev_parent_ops members that are connected to it. Signed-off-by: Jason Gunthorpe Signed-off-by: Christoph Hellwig Signed-off-by: Zhi Wang Link: http://patchwork.freedesktop.org/patch/msgid/20220411141403.86980-31-hch@lst.de Reviewed-by: Kirti Wankhede Reviewed-by: Zhi Wang --- include/linux/mdev.h | 48 +----------------------------------------------- 1 file changed, 1 insertion(+), 47 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index 15d03f6532d0..192aed656116 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -40,40 +40,7 @@ struct device *mtype_get_parent_dev(struct mdev_type *mtype); * @mdev_attr_groups: Attributes of the mediated device. * @supported_type_groups: Attributes to define supported types. It is mandatory * to provide supported types. - * @create: Called to allocate basic resources in parent device's - * driver for a particular mediated device. It is - * mandatory to provide create ops. - * @mdev: mdev_device structure on of mediated device - * that is being created - * Returns integer: success (0) or error (< 0) - * @remove: Called to free resources in parent device's driver for - * a mediated device. It is mandatory to provide 'remove' - * ops. - * @mdev: mdev_device device structure which is being - * destroyed - * Returns integer: success (0) or error (< 0) - * @read: Read emulation callback - * @mdev: mediated device structure - * @buf: read buffer - * @count: number of bytes to read - * @ppos: address. - * Retuns number on bytes read on success or error. - * @write: Write emulation callback - * @mdev: mediated device structure - * @buf: write buffer - * @count: number of bytes to be written - * @ppos: address. - * Retuns number on bytes written on success or error. - * @ioctl: IOCTL callback - * @mdev: mediated device structure - * @cmd: ioctl command - * @arg: arguments to ioctl - * @mmap: mmap callback - * @mdev: mediated device structure - * @vma: vma structure - * @request: request callback to release device - * @mdev: mediated device structure - * @count: request sequence number + * * Parent device that support mediated device should be registered with mdev * module with mdev_parent_ops structure. **/ @@ -83,19 +50,6 @@ struct mdev_parent_ops { const struct attribute_group **dev_attr_groups; const struct attribute_group **mdev_attr_groups; struct attribute_group **supported_type_groups; - - int (*create)(struct mdev_device *mdev); - int (*remove)(struct mdev_device *mdev); - int (*open_device)(struct mdev_device *mdev); - void (*close_device)(struct mdev_device *mdev); - ssize_t (*read)(struct mdev_device *mdev, char __user *buf, - size_t count, loff_t *ppos); - ssize_t (*write)(struct mdev_device *mdev, const char __user *buf, - size_t count, loff_t *ppos); - long (*ioctl)(struct mdev_device *mdev, unsigned int cmd, - unsigned long arg); - int (*mmap)(struct mdev_device *mdev, struct vm_area_struct *vma); - void (*request)(struct mdev_device *mdev, unsigned int count); }; /* interface for exporting mdev supported type attributes */ -- cgit From e6486939d8ea22569af942a1888aa3824f59cc5e Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Mon, 11 Apr 2022 16:14:00 +0200 Subject: vfio/mdev: Remove mdev_parent_ops dev_attr_groups This is only used by one sample to print a fixed string that is pointless. In general, having a device driver attach sysfs attributes to the parent is horrific. This should never happen, and always leads to some kind of liftime bug as it become very difficult for the sysfs attribute to go back to any data owned by the device driver. Remove the general mechanism to create this abuse. Signed-off-by: Jason Gunthorpe Signed-off-by: Christoph Hellwig Signed-off-by: Zhi Wang Link: http://patchwork.freedesktop.org/patch/msgid/20220411141403.86980-32-hch@lst.de Reviewed-by: Kirti Wankhede Reviewed-by: Zhi Wang --- include/linux/mdev.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index 192aed656116..69df1fa2cb6f 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -36,7 +36,6 @@ struct device *mtype_get_parent_dev(struct mdev_type *mtype); * * @owner: The module owner. * @device_driver: Which device driver to probe() on newly created devices - * @dev_attr_groups: Attributes of the parent device. * @mdev_attr_groups: Attributes of the mediated device. * @supported_type_groups: Attributes to define supported types. It is mandatory * to provide supported types. @@ -47,7 +46,6 @@ struct device *mtype_get_parent_dev(struct mdev_type *mtype); struct mdev_parent_ops { struct module *owner; struct mdev_driver *device_driver; - const struct attribute_group **dev_attr_groups; const struct attribute_group **mdev_attr_groups; struct attribute_group **supported_type_groups; }; -- cgit From 6b42f491e17ce13f5ff7f2d1f49c73a0f4c47b20 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Mon, 11 Apr 2022 16:14:01 +0200 Subject: vfio/mdev: Remove mdev_parent_ops The last useful member in this struct is the supported_type_groups, move it to the mdev_driver and delete mdev_parent_ops. Replace it with mdev_driver as an argument to mdev_register_device() Signed-off-by: Jason Gunthorpe Signed-off-by: Christoph Hellwig Signed-off-by: Zhi Wang Link: http://patchwork.freedesktop.org/patch/msgid/20220411141403.86980-33-hch@lst.de Reviewed-by: Kirti Wankhede Reviewed-by: Zhi Wang --- include/linux/mdev.h | 25 ++++--------------------- 1 file changed, 4 insertions(+), 21 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index 69df1fa2cb6f..1f6f57a3c316 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -30,26 +30,6 @@ unsigned int mdev_get_type_group_id(struct mdev_device *mdev); unsigned int mtype_get_type_group_id(struct mdev_type *mtype); struct device *mtype_get_parent_dev(struct mdev_type *mtype); -/** - * struct mdev_parent_ops - Structure to be registered for each parent device to - * register the device to mdev module. - * - * @owner: The module owner. - * @device_driver: Which device driver to probe() on newly created devices - * @mdev_attr_groups: Attributes of the mediated device. - * @supported_type_groups: Attributes to define supported types. It is mandatory - * to provide supported types. - * - * Parent device that support mediated device should be registered with mdev - * module with mdev_parent_ops structure. - **/ -struct mdev_parent_ops { - struct module *owner; - struct mdev_driver *device_driver; - const struct attribute_group **mdev_attr_groups; - struct attribute_group **supported_type_groups; -}; - /* interface for exporting mdev supported type attributes */ struct mdev_type_attribute { struct attribute attr; @@ -74,12 +54,15 @@ struct mdev_type_attribute mdev_type_attr_##_name = \ * struct mdev_driver - Mediated device driver * @probe: called when new device created * @remove: called when device removed + * @supported_type_groups: Attributes to define supported types. It is mandatory + * to provide supported types. * @driver: device driver structure * **/ struct mdev_driver { int (*probe)(struct mdev_device *dev); void (*remove)(struct mdev_device *dev); + struct attribute_group **supported_type_groups; struct device_driver driver; }; @@ -98,7 +81,7 @@ static inline const guid_t *mdev_uuid(struct mdev_device *mdev) extern struct bus_type mdev_bus_type; -int mdev_register_device(struct device *dev, const struct mdev_parent_ops *ops); +int mdev_register_device(struct device *dev, struct mdev_driver *mdev_driver); void mdev_unregister_device(struct device *dev); int mdev_register_driver(struct mdev_driver *drv); -- cgit From 2917f53113be3b7a0f374e02cebe6d6b749366b5 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Mon, 11 Apr 2022 16:14:03 +0200 Subject: vfio/mdev: Remove mdev drvdata This is no longer used, remove it. All usages were moved over to either use container_of() from a vfio_device or to use dev_drvdata() directly on the mdev. Signed-off-by: Jason Gunthorpe Signed-off-by: Christoph Hellwig Signed-off-by: Zhi Wang Link: http://patchwork.freedesktop.org/patch/msgid/20220411141403.86980-35-hch@lst.de Reviewed-by: Kirti Wankhede Reviewed-by: Zhi Wang --- include/linux/mdev.h | 9 --------- 1 file changed, 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index 1f6f57a3c316..bb539794f54a 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -15,7 +15,6 @@ struct mdev_type; struct mdev_device { struct device dev; guid_t uuid; - void *driver_data; struct list_head next; struct mdev_type *type; bool active; @@ -66,14 +65,6 @@ struct mdev_driver { struct device_driver driver; }; -static inline void *mdev_get_drvdata(struct mdev_device *mdev) -{ - return mdev->driver_data; -} -static inline void mdev_set_drvdata(struct mdev_device *mdev, void *data) -{ - mdev->driver_data = data; -} static inline const guid_t *mdev_uuid(struct mdev_device *mdev) { return &mdev->uuid; -- cgit From 042c48848b7d8dab4f095bce2872c02f711bf5d0 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 5 Aug 2019 23:15:37 +0200 Subject: ARM: omap1: move lcd_dma code into omapfb driver The omapfb driver is split into platform specific code for omap1, and driver code that is also specific to omap1. Moving both parts into the driver directory simplifies the structure and avoids the dependency on certain omap machine header files. As mach/lcd_dma.h can not be included from include/linux/omap-dma.h any more now, move the omap_lcd_dma_running() declaration into the omap-dma header, which matches where it is defined. Acked-by: Bartlomiej Zolnierkiewicz Acked-by: Tony Lindgren Signed-off-by: Arnd Bergmann --- include/linux/omap-dma.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/omap-dma.h b/include/linux/omap-dma.h index 5c5c93ad6b50..441f5f0919c6 100644 --- a/include/linux/omap-dma.h +++ b/include/linux/omap-dma.h @@ -328,8 +328,8 @@ extern dma_addr_t omap_get_dma_dst_pos(int lch); extern int omap_get_dma_active_status(int lch); extern int omap_dma_running(void); -#if defined(CONFIG_ARCH_OMAP1) && IS_ENABLED(CONFIG_FB_OMAP) -#include +#if IS_ENABLED(CONFIG_FB_OMAP) +extern int omap_lcd_dma_running(void); #else static inline int omap_lcd_dma_running(void) { -- cgit From 0768fb6709343679e55f7135e2ed2c432e4500d8 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 9 Aug 2019 21:42:31 +0200 Subject: ARM: omap1: declare a dummy omap_set_dma_priority omapfb calls directly into the omap_set_dma_priority() function in the DMA driver. This prevents compile-testing omapfb on other architectures. Add an inline function next to the other ones for non-omap configurations. Acked-by: Bartlomiej Zolnierkiewicz Acked-by: Tony Lindgren Signed-off-by: Arnd Bergmann --- include/linux/omap-dma.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/omap-dma.h b/include/linux/omap-dma.h index 441f5f0919c6..5e228428fda1 100644 --- a/include/linux/omap-dma.h +++ b/include/linux/omap-dma.h @@ -338,6 +338,9 @@ static inline int omap_lcd_dma_running(void) #endif #else /* CONFIG_ARCH_OMAP */ +static inline void omap_set_dma_priority(int lch, int dst_port, int priority) +{ +} static inline struct omap_system_dma_plat_info *omap_get_plat_info(void) { -- cgit From e8e77e97507bf3ec5419b8067d242b538cf75cba Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 6 Aug 2019 14:34:31 +0200 Subject: ARM: omap1: move mach/usb.h to include/linux/soc The register definitions in this header are used in at least four different places, with little hope of completely cleaning that up. Split up the file into a portion that becomes a linux-wide header under include/linux/soc/ti/, and the parts that are actually only needed by board files. Acked-by: Felipe Balbi Acked-by: Tony Lindgren Signed-off-by: Arnd Bergmann --- include/linux/soc/ti/omap1-usb.h | 116 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 116 insertions(+) create mode 100644 include/linux/soc/ti/omap1-usb.h (limited to 'include/linux') diff --git a/include/linux/soc/ti/omap1-usb.h b/include/linux/soc/ti/omap1-usb.h new file mode 100644 index 000000000000..67488698601a --- /dev/null +++ b/include/linux/soc/ti/omap1-usb.h @@ -0,0 +1,116 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __SOC_TI_OMAP1_USB +#define __SOC_TI_OMAP1_USB +/* + * Constants in this file are used all over the place, in platform + * code, as well as the udc, phy and ohci drivers. + * This is not a great design, but unlikely to get fixed after + * such a long time. Don't do this elsewhere. + */ + +#define OMAP1_OTG_BASE 0xfffb0400 +#define OMAP1_UDC_BASE 0xfffb4000 + +#define OMAP2_UDC_BASE 0x4805e200 +#define OMAP2_OTG_BASE 0x4805e300 +#define OTG_BASE OMAP1_OTG_BASE +#define UDC_BASE OMAP1_UDC_BASE + +/* + * OTG and transceiver registers, for OMAPs starting with ARM926 + */ +#define OTG_REV (OTG_BASE + 0x00) +#define OTG_SYSCON_1 (OTG_BASE + 0x04) +# define USB2_TRX_MODE(w) (((w)>>24)&0x07) +# define USB1_TRX_MODE(w) (((w)>>20)&0x07) +# define USB0_TRX_MODE(w) (((w)>>16)&0x07) +# define OTG_IDLE_EN (1 << 15) +# define HST_IDLE_EN (1 << 14) +# define DEV_IDLE_EN (1 << 13) +# define OTG_RESET_DONE (1 << 2) +# define OTG_SOFT_RESET (1 << 1) +#define OTG_SYSCON_2 (OTG_BASE + 0x08) +# define OTG_EN (1 << 31) +# define USBX_SYNCHRO (1 << 30) +# define OTG_MST16 (1 << 29) +# define SRP_GPDATA (1 << 28) +# define SRP_GPDVBUS (1 << 27) +# define SRP_GPUVBUS(w) (((w)>>24)&0x07) +# define A_WAIT_VRISE(w) (((w)>>20)&0x07) +# define B_ASE_BRST(w) (((w)>>16)&0x07) +# define SRP_DPW (1 << 14) +# define SRP_DATA (1 << 13) +# define SRP_VBUS (1 << 12) +# define OTG_PADEN (1 << 10) +# define HMC_PADEN (1 << 9) +# define UHOST_EN (1 << 8) +# define HMC_TLLSPEED (1 << 7) +# define HMC_TLLATTACH (1 << 6) +# define OTG_HMC(w) (((w)>>0)&0x3f) +#define OTG_CTRL (OTG_BASE + 0x0c) +# define OTG_USB2_EN (1 << 29) +# define OTG_USB2_DP (1 << 28) +# define OTG_USB2_DM (1 << 27) +# define OTG_USB1_EN (1 << 26) +# define OTG_USB1_DP (1 << 25) +# define OTG_USB1_DM (1 << 24) +# define OTG_USB0_EN (1 << 23) +# define OTG_USB0_DP (1 << 22) +# define OTG_USB0_DM (1 << 21) +# define OTG_ASESSVLD (1 << 20) +# define OTG_BSESSEND (1 << 19) +# define OTG_BSESSVLD (1 << 18) +# define OTG_VBUSVLD (1 << 17) +# define OTG_ID (1 << 16) +# define OTG_DRIVER_SEL (1 << 15) +# define OTG_A_SETB_HNPEN (1 << 12) +# define OTG_A_BUSREQ (1 << 11) +# define OTG_B_HNPEN (1 << 9) +# define OTG_B_BUSREQ (1 << 8) +# define OTG_BUSDROP (1 << 7) +# define OTG_PULLDOWN (1 << 5) +# define OTG_PULLUP (1 << 4) +# define OTG_DRV_VBUS (1 << 3) +# define OTG_PD_VBUS (1 << 2) +# define OTG_PU_VBUS (1 << 1) +# define OTG_PU_ID (1 << 0) +#define OTG_IRQ_EN (OTG_BASE + 0x10) /* 16-bit */ +# define DRIVER_SWITCH (1 << 15) +# define A_VBUS_ERR (1 << 13) +# define A_REQ_TMROUT (1 << 12) +# define A_SRP_DETECT (1 << 11) +# define B_HNP_FAIL (1 << 10) +# define B_SRP_TMROUT (1 << 9) +# define B_SRP_DONE (1 << 8) +# define B_SRP_STARTED (1 << 7) +# define OPRT_CHG (1 << 0) +#define OTG_IRQ_SRC (OTG_BASE + 0x14) /* 16-bit */ + // same bits as in IRQ_EN +#define OTG_OUTCTRL (OTG_BASE + 0x18) /* 16-bit */ +# define OTGVPD (1 << 14) +# define OTGVPU (1 << 13) +# define OTGPUID (1 << 12) +# define USB2VDR (1 << 10) +# define USB2PDEN (1 << 9) +# define USB2PUEN (1 << 8) +# define USB1VDR (1 << 6) +# define USB1PDEN (1 << 5) +# define USB1PUEN (1 << 4) +# define USB0VDR (1 << 2) +# define USB0PDEN (1 << 1) +# define USB0PUEN (1 << 0) +#define OTG_TEST (OTG_BASE + 0x20) /* 16-bit */ +#define OTG_VENDOR_CODE (OTG_BASE + 0xfc) /* 16-bit */ + +/*-------------------------------------------------------------------------*/ + +/* OMAP1 */ +#define USB_TRANSCEIVER_CTRL (0xfffe1000 + 0x0064) +# define CONF_USB2_UNI_R (1 << 8) +# define CONF_USB1_UNI_R (1 << 7) +# define CONF_USB_PORT0_R(x) (((x)>>4)&0x7) +# define CONF_USB0_ISOLATE_R (1 << 3) +# define CONF_USB_PWRDN_DM_R (1 << 2) +# define CONF_USB_PWRDN_DP_R (1 << 1) + +#endif -- cgit From 1e9ca7c811f76b16c7822ad009b6b0c352227a15 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 6 Aug 2019 15:24:58 +0200 Subject: ARM: omap1: move some headers to include/linux/soc There are three remaining header files that are used by omap1 specific device drivers: - mach/soc.h provides cpu_is_omapXXX abstractions - mach/hardware.h provides omap_read/omap_write functions and physical addresses - mach/mux.h provides an omap specific pinctrl abstraction This is generally not how we do platform abstractions today, and it would be good to completely get rid of these in favor of passing information through platform devices and the pinctrl subsystem. However, given that nobody is working on that, just move it one step forward by splitting out the header files that are used by drivers today from the machine headers that are only used internally. Acked-by: Tony Lindgren Signed-off-by: Arnd Bergmann --- include/linux/soc/ti/omap1-io.h | 143 ++++++++++++++++++ include/linux/soc/ti/omap1-mux.h | 311 +++++++++++++++++++++++++++++++++++++++ include/linux/soc/ti/omap1-soc.h | 198 +++++++++++++++++++++++++ 3 files changed, 652 insertions(+) create mode 100644 include/linux/soc/ti/omap1-io.h create mode 100644 include/linux/soc/ti/omap1-mux.h create mode 100644 include/linux/soc/ti/omap1-soc.h (limited to 'include/linux') diff --git a/include/linux/soc/ti/omap1-io.h b/include/linux/soc/ti/omap1-io.h new file mode 100644 index 000000000000..9332c92690f4 --- /dev/null +++ b/include/linux/soc/ti/omap1-io.h @@ -0,0 +1,143 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __ASM_ARCH_OMAP_IO_H +#define __ASM_ARCH_OMAP_IO_H + +#ifndef __ASSEMBLER__ +#include + +#if defined(CONFIG_ARCH_OMAP) && defined(CONFIG_ARCH_OMAP1) +/* + * NOTE: Please use ioremap + __raw_read/write where possible instead of these + */ +extern u8 omap_readb(u32 pa); +extern u16 omap_readw(u32 pa); +extern u32 omap_readl(u32 pa); +extern void omap_writeb(u8 v, u32 pa); +extern void omap_writew(u16 v, u32 pa); +extern void omap_writel(u32 v, u32 pa); +#elif defined(CONFIG_COMPILE_TEST) +static inline u8 omap_readb(u32 pa) { return 0; } +static inline u16 omap_readw(u32 pa) { return 0; } +static inline u32 omap_readl(u32 pa) { return 0; } +static inline void omap_writeb(u8 v, u32 pa) { } +static inline void omap_writew(u16 v, u32 pa) { } +static inline void omap_writel(u32 v, u32 pa) { } +#endif +#endif + +/* + * ---------------------------------------------------------------------------- + * System control registers + * ---------------------------------------------------------------------------- + */ +#define MOD_CONF_CTRL_0 0xfffe1080 +#define MOD_CONF_CTRL_1 0xfffe1110 + +/* + * --------------------------------------------------------------------------- + * UPLD + * --------------------------------------------------------------------------- + */ +#define ULPD_REG_BASE (0xfffe0800) +#define ULPD_IT_STATUS (ULPD_REG_BASE + 0x14) +#define ULPD_SETUP_ANALOG_CELL_3 (ULPD_REG_BASE + 0x24) +#define ULPD_CLOCK_CTRL (ULPD_REG_BASE + 0x30) +# define DIS_USB_PVCI_CLK (1 << 5) /* no USB/FAC synch */ +# define USB_MCLK_EN (1 << 4) /* enable W4_USB_CLKO */ +#define ULPD_SOFT_REQ (ULPD_REG_BASE + 0x34) +# define SOFT_UDC_REQ (1 << 4) +# define SOFT_USB_CLK_REQ (1 << 3) +# define SOFT_DPLL_REQ (1 << 0) +#define ULPD_DPLL_CTRL (ULPD_REG_BASE + 0x3c) +#define ULPD_STATUS_REQ (ULPD_REG_BASE + 0x40) +#define ULPD_APLL_CTRL (ULPD_REG_BASE + 0x4c) +#define ULPD_POWER_CTRL (ULPD_REG_BASE + 0x50) +#define ULPD_SOFT_DISABLE_REQ_REG (ULPD_REG_BASE + 0x68) +# define DIS_MMC2_DPLL_REQ (1 << 11) +# define DIS_MMC1_DPLL_REQ (1 << 10) +# define DIS_UART3_DPLL_REQ (1 << 9) +# define DIS_UART2_DPLL_REQ (1 << 8) +# define DIS_UART1_DPLL_REQ (1 << 7) +# define DIS_USB_HOST_DPLL_REQ (1 << 6) +#define ULPD_SDW_CLK_DIV_CTRL_SEL (ULPD_REG_BASE + 0x74) +#define ULPD_CAM_CLK_CTRL (ULPD_REG_BASE + 0x7c) + +/* + * ---------------------------------------------------------------------------- + * Clocks + * ---------------------------------------------------------------------------- + */ +#define CLKGEN_REG_BASE (0xfffece00) +#define ARM_CKCTL (CLKGEN_REG_BASE + 0x0) +#define ARM_IDLECT1 (CLKGEN_REG_BASE + 0x4) +#define ARM_IDLECT2 (CLKGEN_REG_BASE + 0x8) +#define ARM_EWUPCT (CLKGEN_REG_BASE + 0xC) +#define ARM_RSTCT1 (CLKGEN_REG_BASE + 0x10) +#define ARM_RSTCT2 (CLKGEN_REG_BASE + 0x14) +#define ARM_SYSST (CLKGEN_REG_BASE + 0x18) +#define ARM_IDLECT3 (CLKGEN_REG_BASE + 0x24) + +#define CK_RATEF 1 +#define CK_IDLEF 2 +#define CK_ENABLEF 4 +#define CK_SELECTF 8 +#define SETARM_IDLE_SHIFT + +/* DPLL control registers */ +#define DPLL_CTL (0xfffecf00) + +/* DSP clock control. Must use __raw_readw() and __raw_writew() with these */ +#define DSP_CONFIG_REG_BASE IOMEM(0xe1008000) +#define DSP_CKCTL (DSP_CONFIG_REG_BASE + 0x0) +#define DSP_IDLECT1 (DSP_CONFIG_REG_BASE + 0x4) +#define DSP_IDLECT2 (DSP_CONFIG_REG_BASE + 0x8) +#define DSP_RSTCT2 (DSP_CONFIG_REG_BASE + 0x14) + +/* + * ---------------------------------------------------------------------------- + * Pulse-Width Light + * ---------------------------------------------------------------------------- + */ +#define OMAP_PWL_BASE 0xfffb5800 +#define OMAP_PWL_ENABLE (OMAP_PWL_BASE + 0x00) +#define OMAP_PWL_CLK_ENABLE (OMAP_PWL_BASE + 0x04) + +/* + * ---------------------------------------------------------------------------- + * Pin multiplexing registers + * ---------------------------------------------------------------------------- + */ +#define FUNC_MUX_CTRL_0 0xfffe1000 +#define FUNC_MUX_CTRL_1 0xfffe1004 +#define FUNC_MUX_CTRL_2 0xfffe1008 +#define COMP_MODE_CTRL_0 0xfffe100c +#define FUNC_MUX_CTRL_3 0xfffe1010 +#define FUNC_MUX_CTRL_4 0xfffe1014 +#define FUNC_MUX_CTRL_5 0xfffe1018 +#define FUNC_MUX_CTRL_6 0xfffe101C +#define FUNC_MUX_CTRL_7 0xfffe1020 +#define FUNC_MUX_CTRL_8 0xfffe1024 +#define FUNC_MUX_CTRL_9 0xfffe1028 +#define FUNC_MUX_CTRL_A 0xfffe102C +#define FUNC_MUX_CTRL_B 0xfffe1030 +#define FUNC_MUX_CTRL_C 0xfffe1034 +#define FUNC_MUX_CTRL_D 0xfffe1038 +#define PULL_DWN_CTRL_0 0xfffe1040 +#define PULL_DWN_CTRL_1 0xfffe1044 +#define PULL_DWN_CTRL_2 0xfffe1048 +#define PULL_DWN_CTRL_3 0xfffe104c +#define PULL_DWN_CTRL_4 0xfffe10ac + +/* OMAP-1610 specific multiplexing registers */ +#define FUNC_MUX_CTRL_E 0xfffe1090 +#define FUNC_MUX_CTRL_F 0xfffe1094 +#define FUNC_MUX_CTRL_10 0xfffe1098 +#define FUNC_MUX_CTRL_11 0xfffe109c +#define FUNC_MUX_CTRL_12 0xfffe10a0 +#define PU_PD_SEL_0 0xfffe10b4 +#define PU_PD_SEL_1 0xfffe10b8 +#define PU_PD_SEL_2 0xfffe10bc +#define PU_PD_SEL_3 0xfffe10c0 +#define PU_PD_SEL_4 0xfffe10c4 + +#endif diff --git a/include/linux/soc/ti/omap1-mux.h b/include/linux/soc/ti/omap1-mux.h new file mode 100644 index 000000000000..59c239b5569c --- /dev/null +++ b/include/linux/soc/ti/omap1-mux.h @@ -0,0 +1,311 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef __SOC_TI_OMAP1_MUX_H +#define __SOC_TI_OMAP1_MUX_H +/* + * This should not really be a global header, it reflects the + * traditional way that omap1 does pin muxing without the + * pinctrl subsystem. + */ + +enum omap7xx_index { + /* OMAP 730 keyboard */ + E2_7XX_KBR0, + J7_7XX_KBR1, + E1_7XX_KBR2, + F3_7XX_KBR3, + D2_7XX_KBR4, + C2_7XX_KBC0, + D3_7XX_KBC1, + E4_7XX_KBC2, + F4_7XX_KBC3, + E3_7XX_KBC4, + + /* USB */ + AA17_7XX_USB_DM, + W16_7XX_USB_PU_EN, + W17_7XX_USB_VBUSI, + W18_7XX_USB_DMCK_OUT, + W19_7XX_USB_DCRST, + + /* MMC */ + MMC_7XX_CMD, + MMC_7XX_CLK, + MMC_7XX_DAT0, + + /* I2C */ + I2C_7XX_SCL, + I2C_7XX_SDA, + + /* SPI */ + SPI_7XX_1, + SPI_7XX_2, + SPI_7XX_3, + SPI_7XX_4, + SPI_7XX_5, + SPI_7XX_6, + + /* UART */ + UART_7XX_1, + UART_7XX_2, +}; + +enum omap1xxx_index { + /* UART1 (BT_UART_GATING)*/ + UART1_TX = 0, + UART1_RTS, + + /* UART2 (COM_UART_GATING)*/ + UART2_TX, + UART2_RX, + UART2_CTS, + UART2_RTS, + + /* UART3 (GIGA_UART_GATING) */ + UART3_TX, + UART3_RX, + UART3_CTS, + UART3_RTS, + UART3_CLKREQ, + UART3_BCLK, /* 12MHz clock out */ + Y15_1610_UART3_RTS, + + /* PWT & PWL */ + PWT, + PWL, + + /* USB master generic */ + R18_USB_VBUS, + R18_1510_USB_GPIO0, + W4_USB_PUEN, + W4_USB_CLKO, + W4_USB_HIGHZ, + W4_GPIO58, + + /* USB1 master */ + USB1_SUSP, + USB1_SEO, + W13_1610_USB1_SE0, + USB1_TXEN, + USB1_TXD, + USB1_VP, + USB1_VM, + USB1_RCV, + USB1_SPEED, + R13_1610_USB1_SPEED, + R13_1710_USB1_SE0, + + /* USB2 master */ + USB2_SUSP, + USB2_VP, + USB2_TXEN, + USB2_VM, + USB2_RCV, + USB2_SEO, + USB2_TXD, + + /* OMAP-1510 GPIO */ + R18_1510_GPIO0, + R19_1510_GPIO1, + M14_1510_GPIO2, + + /* OMAP1610 GPIO */ + P18_1610_GPIO3, + Y15_1610_GPIO17, + + /* OMAP-1710 GPIO */ + R18_1710_GPIO0, + V2_1710_GPIO10, + N21_1710_GPIO14, + W15_1710_GPIO40, + + /* MPUIO */ + MPUIO2, + N15_1610_MPUIO2, + MPUIO4, + MPUIO5, + T20_1610_MPUIO5, + W11_1610_MPUIO6, + V10_1610_MPUIO7, + W11_1610_MPUIO9, + V10_1610_MPUIO10, + W10_1610_MPUIO11, + E20_1610_MPUIO13, + U20_1610_MPUIO14, + E19_1610_MPUIO15, + + /* MCBSP2 */ + MCBSP2_CLKR, + MCBSP2_CLKX, + MCBSP2_DR, + MCBSP2_DX, + MCBSP2_FSR, + MCBSP2_FSX, + + /* MCBSP3 */ + MCBSP3_CLKX, + + /* Misc ballouts */ + BALLOUT_V8_ARMIO3, + N20_HDQ, + + /* OMAP-1610 MMC2 */ + W8_1610_MMC2_DAT0, + V8_1610_MMC2_DAT1, + W15_1610_MMC2_DAT2, + R10_1610_MMC2_DAT3, + Y10_1610_MMC2_CLK, + Y8_1610_MMC2_CMD, + V9_1610_MMC2_CMDDIR, + V5_1610_MMC2_DATDIR0, + W19_1610_MMC2_DATDIR1, + R18_1610_MMC2_CLKIN, + + /* OMAP-1610 External Trace Interface */ + M19_1610_ETM_PSTAT0, + L15_1610_ETM_PSTAT1, + L18_1610_ETM_PSTAT2, + L19_1610_ETM_D0, + J19_1610_ETM_D6, + J18_1610_ETM_D7, + + /* OMAP16XX GPIO */ + P20_1610_GPIO4, + V9_1610_GPIO7, + W8_1610_GPIO9, + N20_1610_GPIO11, + N19_1610_GPIO13, + P10_1610_GPIO22, + V5_1610_GPIO24, + AA20_1610_GPIO_41, + W19_1610_GPIO48, + M7_1610_GPIO62, + V14_16XX_GPIO37, + R9_16XX_GPIO18, + L14_16XX_GPIO49, + + /* OMAP-1610 uWire */ + V19_1610_UWIRE_SCLK, + U18_1610_UWIRE_SDI, + W21_1610_UWIRE_SDO, + N14_1610_UWIRE_CS0, + P15_1610_UWIRE_CS3, + N15_1610_UWIRE_CS1, + + /* OMAP-1610 SPI */ + U19_1610_SPIF_SCK, + U18_1610_SPIF_DIN, + P20_1610_SPIF_DIN, + W21_1610_SPIF_DOUT, + R18_1610_SPIF_DOUT, + N14_1610_SPIF_CS0, + N15_1610_SPIF_CS1, + T19_1610_SPIF_CS2, + P15_1610_SPIF_CS3, + + /* OMAP-1610 Flash */ + L3_1610_FLASH_CS2B_OE, + M8_1610_FLASH_CS2B_WE, + + /* First MMC */ + MMC_CMD, + MMC_DAT1, + MMC_DAT2, + MMC_DAT0, + MMC_CLK, + MMC_DAT3, + + /* OMAP-1710 MMC CMDDIR and DATDIR0 */ + M15_1710_MMC_CLKI, + P19_1710_MMC_CMDDIR, + P20_1710_MMC_DATDIR0, + + /* OMAP-1610 USB0 alternate pin configuration */ + W9_USB0_TXEN, + AA9_USB0_VP, + Y5_USB0_RCV, + R9_USB0_VM, + V6_USB0_TXD, + W5_USB0_SE0, + V9_USB0_SPEED, + V9_USB0_SUSP, + + /* USB2 */ + W9_USB2_TXEN, + AA9_USB2_VP, + Y5_USB2_RCV, + R9_USB2_VM, + V6_USB2_TXD, + W5_USB2_SE0, + + /* 16XX UART */ + R13_1610_UART1_TX, + V14_16XX_UART1_RX, + R14_1610_UART1_CTS, + AA15_1610_UART1_RTS, + R9_16XX_UART2_RX, + L14_16XX_UART3_RX, + + /* I2C OMAP-1610 */ + I2C_SCL, + I2C_SDA, + + /* Keypad */ + F18_1610_KBC0, + D20_1610_KBC1, + D19_1610_KBC2, + E18_1610_KBC3, + C21_1610_KBC4, + G18_1610_KBR0, + F19_1610_KBR1, + H14_1610_KBR2, + E20_1610_KBR3, + E19_1610_KBR4, + N19_1610_KBR5, + + /* Power management */ + T20_1610_LOW_PWR, + + /* MCLK Settings */ + V5_1710_MCLK_ON, + V5_1710_MCLK_OFF, + R10_1610_MCLK_ON, + R10_1610_MCLK_OFF, + + /* CompactFlash controller */ + P11_1610_CF_CD2, + R11_1610_CF_IOIS16, + V10_1610_CF_IREQ, + W10_1610_CF_RESET, + W11_1610_CF_CD1, + + /* parallel camera */ + J15_1610_CAM_LCLK, + J18_1610_CAM_D7, + J19_1610_CAM_D6, + J14_1610_CAM_D5, + K18_1610_CAM_D4, + K19_1610_CAM_D3, + K15_1610_CAM_D2, + K14_1610_CAM_D1, + L19_1610_CAM_D0, + L18_1610_CAM_VS, + L15_1610_CAM_HS, + M19_1610_CAM_RSTZ, + Y15_1610_CAM_OUTCLK, + + /* serial camera */ + H19_1610_CAM_EXCLK, + Y12_1610_CCP_CLKP, + W13_1610_CCP_CLKM, + W14_1610_CCP_DATAP, + Y14_1610_CCP_DATAM, + +}; + +#ifdef CONFIG_OMAP_MUX +extern int omap_cfg_reg(unsigned long reg_cfg); +#else +static inline int omap_cfg_reg(unsigned long reg_cfg) { return 0; } +#endif + +#endif diff --git a/include/linux/soc/ti/omap1-soc.h b/include/linux/soc/ti/omap1-soc.h new file mode 100644 index 000000000000..81008d400bb6 --- /dev/null +++ b/include/linux/soc/ti/omap1-soc.h @@ -0,0 +1,198 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * OMAP cpu type detection + * + * Copyright (C) 2004, 2008 Nokia Corporation + * + * Copyright (C) 2009-11 Texas Instruments. + * + * Written by Tony Lindgren + * + * Added OMAP4/5 specific defines - Santosh Shilimkar + */ + +#ifndef __ASM_ARCH_OMAP_CPU_H +#define __ASM_ARCH_OMAP_CPU_H + +/* + * Test if multicore OMAP support is needed + */ +#undef MULTI_OMAP1 +#undef OMAP_NAME + +#ifdef CONFIG_ARCH_OMAP730 +# ifdef OMAP_NAME +# undef MULTI_OMAP1 +# define MULTI_OMAP1 +# else +# define OMAP_NAME omap730 +# endif +#endif +#ifdef CONFIG_ARCH_OMAP850 +# ifdef OMAP_NAME +# undef MULTI_OMAP1 +# define MULTI_OMAP1 +# else +# define OMAP_NAME omap850 +# endif +#endif +#ifdef CONFIG_ARCH_OMAP15XX +# ifdef OMAP_NAME +# undef MULTI_OMAP1 +# define MULTI_OMAP1 +# else +# define OMAP_NAME omap1510 +# endif +#endif +#ifdef CONFIG_ARCH_OMAP16XX +# ifdef OMAP_NAME +# undef MULTI_OMAP1 +# define MULTI_OMAP1 +# else +# define OMAP_NAME omap16xx +# endif +#endif + +/* + * omap_rev bits: + * CPU id bits (0730, 1510, 1710, 2422...) [31:16] + * CPU revision (See _REV_ defined in cpu.h) [15:08] + * CPU class bits (15xx, 16xx, 24xx, 34xx...) [07:00] + */ +unsigned int omap_rev(void); + +/* + * Get the CPU revision for OMAP devices + */ +#define GET_OMAP_REVISION() ((omap_rev() >> 8) & 0xff) + +/* + * Macros to group OMAP into cpu classes. + * These can be used in most places. + * cpu_is_omap7xx(): True for OMAP730, OMAP850 + * cpu_is_omap15xx(): True for OMAP1510, OMAP5910 and OMAP310 + * cpu_is_omap16xx(): True for OMAP1610, OMAP5912 and OMAP1710 + */ +#define GET_OMAP_CLASS (omap_rev() & 0xff) + +#define IS_OMAP_CLASS(class, id) \ +static inline int is_omap ##class (void) \ +{ \ + return (GET_OMAP_CLASS == (id)) ? 1 : 0; \ +} + +#define GET_OMAP_SUBCLASS ((omap_rev() >> 20) & 0x0fff) + +#define IS_OMAP_SUBCLASS(subclass, id) \ +static inline int is_omap ##subclass (void) \ +{ \ + return (GET_OMAP_SUBCLASS == (id)) ? 1 : 0; \ +} + +IS_OMAP_CLASS(7xx, 0x07) +IS_OMAP_CLASS(15xx, 0x15) +IS_OMAP_CLASS(16xx, 0x16) + +#define cpu_is_omap7xx() 0 +#define cpu_is_omap15xx() 0 +#define cpu_is_omap16xx() 0 + +#if defined(MULTI_OMAP1) +# if defined(CONFIG_ARCH_OMAP730) +# undef cpu_is_omap7xx +# define cpu_is_omap7xx() is_omap7xx() +# endif +# if defined(CONFIG_ARCH_OMAP850) +# undef cpu_is_omap7xx +# define cpu_is_omap7xx() is_omap7xx() +# endif +# if defined(CONFIG_ARCH_OMAP15XX) +# undef cpu_is_omap15xx +# define cpu_is_omap15xx() is_omap15xx() +# endif +# if defined(CONFIG_ARCH_OMAP16XX) +# undef cpu_is_omap16xx +# define cpu_is_omap16xx() is_omap16xx() +# endif +#else +# if defined(CONFIG_ARCH_OMAP730) +# undef cpu_is_omap7xx +# define cpu_is_omap7xx() 1 +# endif +# if defined(CONFIG_ARCH_OMAP850) +# undef cpu_is_omap7xx +# define cpu_is_omap7xx() 1 +# endif +# if defined(CONFIG_ARCH_OMAP15XX) +# undef cpu_is_omap15xx +# define cpu_is_omap15xx() 1 +# endif +# if defined(CONFIG_ARCH_OMAP16XX) +# undef cpu_is_omap16xx +# define cpu_is_omap16xx() 1 +# endif +#endif + +/* + * Macros to detect individual cpu types. + * These are only rarely needed. + * cpu_is_omap310(): True for OMAP310 + * cpu_is_omap1510(): True for OMAP1510 + * cpu_is_omap1610(): True for OMAP1610 + * cpu_is_omap1611(): True for OMAP1611 + * cpu_is_omap5912(): True for OMAP5912 + * cpu_is_omap1621(): True for OMAP1621 + * cpu_is_omap1710(): True for OMAP1710 + */ +#define GET_OMAP_TYPE ((omap_rev() >> 16) & 0xffff) + +#define IS_OMAP_TYPE(type, id) \ +static inline int is_omap ##type (void) \ +{ \ + return (GET_OMAP_TYPE == (id)) ? 1 : 0; \ +} + +IS_OMAP_TYPE(310, 0x0310) +IS_OMAP_TYPE(1510, 0x1510) +IS_OMAP_TYPE(1610, 0x1610) +IS_OMAP_TYPE(1611, 0x1611) +IS_OMAP_TYPE(5912, 0x1611) +IS_OMAP_TYPE(1621, 0x1621) +IS_OMAP_TYPE(1710, 0x1710) + +#define cpu_is_omap310() 0 +#define cpu_is_omap1510() 0 +#define cpu_is_omap1610() 0 +#define cpu_is_omap5912() 0 +#define cpu_is_omap1611() 0 +#define cpu_is_omap1621() 0 +#define cpu_is_omap1710() 0 + +#define cpu_class_is_omap1() 1 + +/* + * Whether we have MULTI_OMAP1 or not, we still need to distinguish + * between 310 vs. 1510 and 1611B/5912 vs. 1710. + */ + +#if defined(CONFIG_ARCH_OMAP15XX) +# undef cpu_is_omap310 +# undef cpu_is_omap1510 +# define cpu_is_omap310() is_omap310() +# define cpu_is_omap1510() is_omap1510() +#endif + +#if defined(CONFIG_ARCH_OMAP16XX) +# undef cpu_is_omap1610 +# undef cpu_is_omap1611 +# undef cpu_is_omap5912 +# undef cpu_is_omap1621 +# undef cpu_is_omap1710 +# define cpu_is_omap1610() is_omap1610() +# define cpu_is_omap1611() is_omap1611() +# define cpu_is_omap5912() is_omap5912() +# define cpu_is_omap1621() is_omap1621() +# define cpu_is_omap1710() is_omap1710() +#endif + +#endif -- cgit From 9fe15316563cbd46601c770a7214ccc5e1925bfb Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Tue, 6 Aug 2019 15:13:18 +0200 Subject: ARM: omap1: innovator: move ohci phy power handling to board file The innovator board needs a special case for its phy control. Move the corresponding code into the board file and out of the common code by adding another callback. Acked-by: Felipe Balbi Acked-by: Tony Lindgren Signed-off-by: Arnd Bergmann --- include/linux/platform_data/usb-omap1.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/usb-omap1.h b/include/linux/platform_data/usb-omap1.h index 878e572a78bf..e7b8dc92a269 100644 --- a/include/linux/platform_data/usb-omap1.h +++ b/include/linux/platform_data/usb-omap1.h @@ -50,6 +50,8 @@ struct omap_usb_config { int (*ocpi_enable)(void); void (*lb_reset)(void); + + int (*transceiver_power)(int on); }; #endif /* __LINUX_USB_OMAP1_H */ -- cgit From 17ea03b75e5665c9ce4945aa5afd097f3c845cdf Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 27 Nov 2020 15:46:47 +0100 Subject: ARM: omap: dma: make usb support optional Most of the plat-omap/dma.c code is specific to the USB driver. Hide that code when it is not in use, to make it clearer which parts are actually still required. Acked-by: Tony Lindgren Signed-off-by: Arnd Bergmann --- include/linux/omap-dma.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/omap-dma.h b/include/linux/omap-dma.h index 5e228428fda1..07fa58ae9902 100644 --- a/include/linux/omap-dma.h +++ b/include/linux/omap-dma.h @@ -299,8 +299,9 @@ extern void omap_set_dma_priority(int lch, int dst_port, int priority); extern int omap_request_dma(int dev_id, const char *dev_name, void (*callback)(int lch, u16 ch_status, void *data), void *data, int *dma_ch); -extern void omap_disable_dma_irq(int ch, u16 irq_bits); extern void omap_free_dma(int ch); +#if IS_ENABLED(CONFIG_USB_OMAP) +extern void omap_disable_dma_irq(int ch, u16 irq_bits); extern void omap_start_dma(int lch); extern void omap_stop_dma(int lch); extern void omap_set_dma_transfer_params(int lch, int data_type, @@ -326,6 +327,8 @@ extern void omap_set_dma_dest_burst_mode(int lch, extern dma_addr_t omap_get_dma_src_pos(int lch); extern dma_addr_t omap_get_dma_dst_pos(int lch); extern int omap_get_dma_active_status(int lch); +#endif + extern int omap_dma_running(void); #if IS_ENABLED(CONFIG_FB_OMAP) -- cgit From 3550bba25d5587a701e6edf20e20984d2ee72c78 Mon Sep 17 00:00:00 2001 From: Stefan Wahren Date: Sat, 9 Apr 2022 11:51:28 +0200 Subject: gpiolib: of: Introduce hook for missing gpio-ranges Since commit 2ab73c6d8323 ("gpio: Support GPIO controllers without pin-ranges") the device tree nodes of GPIO controller need the gpio-ranges property to handle gpio-hogs. Unfortunately it's impossible to guarantee that every new kernel is shipped with an updated device tree binary. In order to provide backward compatibility with those older DTB, we need a callback within of_gpiochip_add_pin_range() so the relevant platform driver can handle this case. Fixes: 2ab73c6d8323 ("gpio: Support GPIO controllers without pin-ranges") Signed-off-by: Stefan Wahren Reviewed-by: Florian Fainelli Tested-by: Florian Fainelli Acked-by: Bartosz Golaszewski Link: https://lore.kernel.org/r/20220409095129.45786-2-stefan.wahren@i2se.com Signed-off-by: Linus Walleij --- include/linux/gpio/driver.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h index 98c93510640e..c2041bd01f90 100644 --- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -492,6 +492,18 @@ struct gpio_chip { */ int (*of_xlate)(struct gpio_chip *gc, const struct of_phandle_args *gpiospec, u32 *flags); + + /** + * @of_gpio_ranges_fallback: + * + * Optional hook for the case that no gpio-ranges property is defined + * within the device tree node "np" (usually DT before introduction + * of gpio-ranges). So this callback is helpful to provide the + * necessary backward compatibility for the pin ranges. + */ + int (*of_gpio_ranges_fallback)(struct gpio_chip *gc, + struct device_node *np); + #endif /* CONFIG_OF_GPIO */ }; -- cgit From 8d084b2eae7fc5fcfc9f143cd7321a88e1cd76aa Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Tue, 5 Apr 2022 17:15:13 +0200 Subject: usb: typec: tcpm: Fix undefined behavior due to shift overflowing the constant MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix: drivers/usb/typec/tcpm/tcpm.c: In function ‘run_state_machine’: drivers/usb/typec/tcpm/tcpm.c:4724:3: error: case label does not reduce to an integer constant case BDO_MODE_TESTDATA: ^~~~ See https://lore.kernel.org/r/YkwQ6%2BtIH8GQpuct@zn.tnic for the gory details as to why it triggers with older gccs only. Signed-off-by: Borislav Petkov Cc: Greg Kroah-Hartman Cc: linux-usb@vger.kernel.org Link: https://lore.kernel.org/r/20220405151517.29753-8-bp@alien8.de Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/pd_bdo.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/usb/pd_bdo.h b/include/linux/usb/pd_bdo.h index 033fe3e17141..7c25b88d79f9 100644 --- a/include/linux/usb/pd_bdo.h +++ b/include/linux/usb/pd_bdo.h @@ -15,7 +15,7 @@ #define BDO_MODE_CARRIER2 (5 << 28) #define BDO_MODE_CARRIER3 (6 << 28) #define BDO_MODE_EYE (7 << 28) -#define BDO_MODE_TESTDATA (8 << 28) +#define BDO_MODE_TESTDATA (8U << 28) #define BDO_MODE_MASK(mode) ((mode) & 0xf0000000) -- cgit From 2031f2876896d82aca7e82f84accd9181b9587fb Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 15 Apr 2022 00:43:43 +0000 Subject: KVM: Add helpers to wrap vcpu->srcu_idx and yell if it's abused Add wrappers to acquire/release KVM's SRCU lock when stashing the index in vcpu->src_idx, along with rudimentary detection of illegal usage, e.g. re-acquiring SRCU and thus overwriting vcpu->src_idx. Because the SRCU index is (currently) either 0 or 1, illegal nesting bugs can go unnoticed for quite some time and only cause problems when the nested lock happens to get a different index. Wrap the WARNs in PROVE_RCU=y, and make them ONCE, otherwise KVM will likely yell so loudly that it will bring the kernel to its knees. Signed-off-by: Sean Christopherson Tested-by: Fabiano Rosas Message-Id: <20220415004343.2203171-4-seanjc@google.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 3f9b22c4983a..2dab4b696682 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -315,7 +315,10 @@ struct kvm_vcpu { int cpu; int vcpu_id; /* id given by userspace at creation */ int vcpu_idx; /* index in kvm->vcpus array */ - int srcu_idx; + int ____srcu_idx; /* Don't use this directly. You've been warned. */ +#ifdef CONFIG_PROVE_RCU + int srcu_depth; +#endif int mode; u64 requests; unsigned long guest_debug; @@ -840,6 +843,25 @@ static inline void kvm_vm_bugged(struct kvm *kvm) unlikely(__ret); \ }) +static inline void kvm_vcpu_srcu_read_lock(struct kvm_vcpu *vcpu) +{ +#ifdef CONFIG_PROVE_RCU + WARN_ONCE(vcpu->srcu_depth++, + "KVM: Illegal vCPU srcu_idx LOCK, depth=%d", vcpu->srcu_depth - 1); +#endif + vcpu->____srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); +} + +static inline void kvm_vcpu_srcu_read_unlock(struct kvm_vcpu *vcpu) +{ + srcu_read_unlock(&vcpu->kvm->srcu, vcpu->____srcu_idx); + +#ifdef CONFIG_PROVE_RCU + WARN_ONCE(--vcpu->srcu_depth, + "KVM: Illegal vCPU srcu_idx UNLOCK, depth=%d", vcpu->srcu_depth); +#endif +} + static inline bool kvm_dirty_log_manual_protect_and_init_set(struct kvm *kvm) { return !!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET); -- cgit From 1e3dc1d8622b2699e6cf1cc06885105b13c9c514 Mon Sep 17 00:00:00 2001 From: Tomas Winkler Date: Tue, 19 Apr 2022 12:33:08 -0700 Subject: drm/i915/gsc: add gsc as a mei auxiliary device GSC is a graphics system controller, it provides a chassis controller for graphics discrete cards. There are two MEI interfaces in GSC: HECI1 and HECI2. Both interfaces are on the BAR0 at offsets 0x00258000 and 0x00259000. GSC is a GT Engine (class 4: instance 6). HECI1 interrupt is signaled via bit 15 and HECI2 via bit 14 in the interrupt register. This patch exports GSC as auxiliary device for mei driver to bind to for HECI2 interface and prepares for HECI1 interface as it will follow up soon. CC: Rodrigo Vivi Signed-off-by: Tomas Winkler Signed-off-by: Vitaly Lubart Signed-off-by: Alexander Usyskin Acked-by: Tvrtko Ursulin Reviewed-by: Daniele Ceraolo Spurio Signed-off-by: Daniele Ceraolo Spurio Link: https://patchwork.freedesktop.org/patch/msgid/20220419193314.526966-2-daniele.ceraolospurio@intel.com --- include/linux/mei_aux.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) create mode 100644 include/linux/mei_aux.h (limited to 'include/linux') diff --git a/include/linux/mei_aux.h b/include/linux/mei_aux.h new file mode 100644 index 000000000000..587f25128848 --- /dev/null +++ b/include/linux/mei_aux.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (c) 2022, Intel Corporation. All rights reserved. + */ +#ifndef _LINUX_MEI_AUX_H +#define _LINUX_MEI_AUX_H + +#include + +struct mei_aux_device { + struct auxiliary_device aux_dev; + int irq; + struct resource bar; +}; + +#define auxiliary_dev_to_mei_aux_dev(auxiliary_dev) \ + container_of(auxiliary_dev, struct mei_aux_device, aux_dev) + +#endif /* _LINUX_MEI_AUX_H */ -- cgit From 988f11e046401f8561c4afefa506a50f0203de40 Mon Sep 17 00:00:00 2001 From: liaohua Date: Thu, 7 Apr 2022 15:29:48 +0800 Subject: latencytop: move sysctl to its own file This moves latencytop sysctl to kernel/latencytop.c Signed-off-by: liaohua Signed-off-by: Luis Chamberlain --- include/linux/latencytop.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/latencytop.h b/include/linux/latencytop.h index abe3d95f795b..84f1053cf2a8 100644 --- a/include/linux/latencytop.h +++ b/include/linux/latencytop.h @@ -38,9 +38,6 @@ account_scheduler_latency(struct task_struct *task, int usecs, int inter) void clear_tsk_latency_tracing(struct task_struct *p); -int sysctl_latencytop(struct ctl_table *table, int write, void *buffer, - size_t *lenp, loff_t *ppos); - #else static inline void -- cgit From 683412ccf61294d727ead4a73d97397396e69a6b Mon Sep 17 00:00:00 2001 From: Mingwei Zhang Date: Thu, 21 Apr 2022 03:14:07 +0000 Subject: KVM: SEV: add cache flush to solve SEV cache incoherency issues Flush the CPU caches when memory is reclaimed from an SEV guest (where reclaim also includes it being unmapped from KVM's memslots). Due to lack of coherency for SEV encrypted memory, failure to flush results in silent data corruption if userspace is malicious/broken and doesn't ensure SEV guest memory is properly pinned and unpinned. Cache coherency is not enforced across the VM boundary in SEV (AMD APM vol.2 Section 15.34.7). Confidential cachelines, generated by confidential VM guests have to be explicitly flushed on the host side. If a memory page containing dirty confidential cachelines was released by VM and reallocated to another user, the cachelines may corrupt the new user at a later time. KVM takes a shortcut by assuming all confidential memory remain pinned until the end of VM lifetime. Therefore, KVM does not flush cache at mmu_notifier invalidation events. Because of this incorrect assumption and the lack of cache flushing, malicous userspace can crash the host kernel: creating a malicious VM and continuously allocates/releases unpinned confidential memory pages when the VM is running. Add cache flush operations to mmu_notifier operations to ensure that any physical memory leaving the guest VM get flushed. In particular, hook mmu_notifier_invalidate_range_start and mmu_notifier_release events and flush cache accordingly. The hook after releasing the mmu lock to avoid contention with other vCPUs. Cc: stable@vger.kernel.org Suggested-by: Sean Christpherson Reported-by: Mingwei Zhang Signed-off-by: Mingwei Zhang Message-Id: <20220421031407.2516575-4-mizhang@google.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 2dab4b696682..34eed5f85ed6 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2219,6 +2219,8 @@ static inline long kvm_arch_vcpu_async_ioctl(struct file *filp, void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm, unsigned long start, unsigned long end); +void kvm_arch_guest_memory_reclaimed(struct kvm *kvm); + #ifdef CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu); #else -- cgit From 405ce051236cc65b30bbfe490b28ce60ae6aed85 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Thu, 21 Apr 2022 16:35:33 -0700 Subject: mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb() There is a race condition between memory_failure_hugetlb() and hugetlb free/demotion, which causes setting PageHWPoison flag on the wrong page. The one simple result is that wrong processes can be killed, but another (more serious) one is that the actual error is left unhandled, so no one prevents later access to it, and that might lead to more serious results like consuming corrupted data. Think about the below race window: CPU 1 CPU 2 memory_failure_hugetlb struct page *head = compound_head(p); hugetlb page might be freed to buddy, or even changed to another compound page. get_hwpoison_page -- page is not what we want now... The current code first does prechecks roughly and then reconfirms after taking refcount, but it's found that it makes code overly complicated, so move the prechecks in a single hugetlb_lock range. A newly introduced function, try_memory_failure_hugetlb(), always takes hugetlb_lock (even for non-hugetlb pages). That can be improved, but memory_failure() is rare in principle, so should not be a big problem. Link: https://lkml.kernel.org/r/20220408135323.1559401-2-naoya.horiguchi@linux.dev Fixes: 761ad8d7c7b5 ("mm: hwpoison: introduce memory_failure_hugetlb()") Signed-off-by: Naoya Horiguchi Reported-by: Mike Kravetz Reviewed-by: Miaohe Lin Reviewed-by: Mike Kravetz Cc: Yang Shi Cc: Dan Carpenter Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/hugetlb.h | 6 ++++++ include/linux/mm.h | 8 ++++++++ 2 files changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 53c1b6082a4c..ac2a1d758a80 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -169,6 +169,7 @@ long hugetlb_unreserve_pages(struct inode *inode, long start, long end, long freed); bool isolate_huge_page(struct page *page, struct list_head *list); int get_hwpoison_huge_page(struct page *page, bool *hugetlb); +int get_huge_page_for_hwpoison(unsigned long pfn, int flags); void putback_active_hugepage(struct page *page); void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason); void free_huge_page(struct page *page); @@ -378,6 +379,11 @@ static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb) return 0; } +static inline int get_huge_page_for_hwpoison(unsigned long pfn, int flags) +{ + return 0; +} + static inline void putback_active_hugepage(struct page *page) { } diff --git a/include/linux/mm.h b/include/linux/mm.h index e34edb775334..9f44254af8ce 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3197,6 +3197,14 @@ extern int sysctl_memory_failure_recovery; extern void shake_page(struct page *p); extern atomic_long_t num_poisoned_pages __read_mostly; extern int soft_offline_page(unsigned long pfn, int flags); +#ifdef CONFIG_MEMORY_FAILURE +extern int __get_huge_page_for_hwpoison(unsigned long pfn, int flags); +#else +static inline int __get_huge_page_for_hwpoison(unsigned long pfn, int flags) +{ + return 0; +} +#endif #ifndef arch_memory_failure static inline int arch_memory_failure(unsigned long pfn, int flags) -- cgit From 9b3016154c913b2e7ec5ae5c9a42eb9e732d86aa Mon Sep 17 00:00:00 2001 From: Shakeel Butt Date: Thu, 21 Apr 2022 16:35:40 -0700 Subject: memcg: sync flush only if periodic flush is delayed MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Daniel Dao has reported [1] a regression on workloads that may trigger a lot of refaults (anon and file). The underlying issue is that flushing rstat is expensive. Although rstat flush are batched with (nr_cpus * MEMCG_BATCH) stat updates, it seems like there are workloads which genuinely do stat updates larger than batch value within short amount of time. Since the rstat flush can happen in the performance critical codepaths like page faults, such workload can suffer greatly. This patch fixes this regression by making the rstat flushing conditional in the performance critical codepaths. More specifically, the kernel relies on the async periodic rstat flusher to flush the stats and only if the periodic flusher is delayed by more than twice the amount of its normal time window then the kernel allows rstat flushing from the performance critical codepaths. Now the question: what are the side-effects of this change? The worst that can happen is the refault codepath will see 4sec old lruvec stats and may cause false (or missed) activations of the refaulted page which may under-or-overestimate the workingset size. Though that is not very concerning as the kernel can already miss or do false activations. There are two more codepaths whose flushing behavior is not changed by this patch and we may need to come to them in future. One is the writeback stats used by dirty throttling and second is the deactivation heuristic in the reclaim. For now keeping an eye on them and if there is report of regression due to these codepaths, we will reevaluate then. Link: https://lore.kernel.org/all/CA+wXwBSyO87ZX5PVwdHm-=dBjZYECGmfnydUicUyrQqndgX2MQ@mail.gmail.com [1] Link: https://lkml.kernel.org/r/20220304184040.1304781-1-shakeelb@google.com Fixes: 1f828223b799 ("memcg: flush lruvec stats in the refault") Signed-off-by: Shakeel Butt Reported-by: Daniel Dao Tested-by: Ivan Babrou Cc: Michal Hocko Cc: Roman Gushchin Cc: Johannes Weiner Cc: Michal Koutný Cc: Frank Hofmann Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memcontrol.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index a68dce3873fc..89b14729d59f 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1012,6 +1012,7 @@ static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec, } void mem_cgroup_flush_stats(void); +void mem_cgroup_flush_stats_delayed(void); void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, int val); @@ -1455,6 +1456,10 @@ static inline void mem_cgroup_flush_stats(void) { } +static inline void mem_cgroup_flush_stats_delayed(void) +{ +} + static inline void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx, int val) { -- cgit From 5f24d5a579d1eace79d505b148808a850b417d4c Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Thu, 21 Apr 2022 16:35:46 -0700 Subject: mm, hugetlb: allow for "high" userspace addresses This is a fix for commit f6795053dac8 ("mm: mmap: Allow for "high" userspace addresses") for hugetlb. This patch adds support for "high" userspace addresses that are optionally supported on the system and have to be requested via a hint mechanism ("high" addr parameter to mmap). Architectures such as powerpc and x86 achieve this by making changes to their architectural versions of hugetlb_get_unmapped_area() function. However, arm64 uses the generic version of that function. So take into account arch_get_mmap_base() and arch_get_mmap_end() in hugetlb_get_unmapped_area(). To allow that, move those two macros out of mm/mmap.c into include/linux/sched/mm.h If these macros are not defined in architectural code then they default to (TASK_SIZE) and (base) so should not introduce any behavioural changes to architectures that do not define them. For the time being, only ARM64 is affected by this change. Catalin (ARM64) said "We should have fixed hugetlb_get_unmapped_area() as well when we added support for 52-bit VA. The reason for commit f6795053dac8 was to prevent normal mmap() from returning addresses above 48-bit by default as some user-space had hard assumptions about this. It's a slight ABI change if you do this for hugetlb_get_unmapped_area() but I doubt anyone would notice. It's more likely that the current behaviour would cause issues, so I'd rather have them consistent. Basically when arm64 gained support for 52-bit addresses we did not want user-space calling mmap() to suddenly get such high addresses, otherwise we could have inadvertently broken some programs (similar behaviour to x86 here). Hence we added commit f6795053dac8. But we missed hugetlbfs which could still get such high mmap() addresses. So in theory that's a potential regression that should have bee addressed at the same time as commit f6795053dac8 (and before arm64 enabled 52-bit addresses)" Link: https://lkml.kernel.org/r/ab847b6edb197bffdfe189e70fb4ac76bfe79e0d.1650033747.git.christophe.leroy@csgroup.eu Fixes: f6795053dac8 ("mm: mmap: Allow for "high" userspace addresses") Signed-off-by: Christophe Leroy Reviewed-by: Catalin Marinas Cc: Steve Capper Cc: Will Deacon Cc: [5.0.x] Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/sched/mm.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index a80356e9dc69..1ad1f4bfa025 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -136,6 +136,14 @@ static inline void mm_update_next_owner(struct mm_struct *mm) #endif /* CONFIG_MEMCG */ #ifdef CONFIG_MMU +#ifndef arch_get_mmap_end +#define arch_get_mmap_end(addr) (TASK_SIZE) +#endif + +#ifndef arch_get_mmap_base +#define arch_get_mmap_base(addr, base) (base) +#endif + extern void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack); extern unsigned long -- cgit From e4a38402c36e42df28eb1a5394be87e6571fb48a Mon Sep 17 00:00:00 2001 From: Nico Pache Date: Thu, 21 Apr 2022 16:36:01 -0700 Subject: oom_kill.c: futex: delay the OOM reaper to allow time for proper futex cleanup The pthread struct is allocated on PRIVATE|ANONYMOUS memory [1] which can be targeted by the oom reaper. This mapping is used to store the futex robust list head; the kernel does not keep a copy of the robust list and instead references a userspace address to maintain the robustness during a process death. A race can occur between exit_mm and the oom reaper that allows the oom reaper to free the memory of the futex robust list before the exit path has handled the futex death: CPU1 CPU2 -------------------------------------------------------------------- page_fault do_exit "signal" wake_oom_reaper oom_reaper oom_reap_task_mm (invalidates mm) exit_mm exit_mm_release futex_exit_release futex_cleanup exit_robust_list get_user (EFAULT- can't access memory) If the get_user EFAULT's, the kernel will be unable to recover the waiters on the robust_list, leaving userspace mutexes hung indefinitely. Delay the OOM reaper, allowing more time for the exit path to perform the futex cleanup. Reproducer: https://gitlab.com/jsavitz/oom_futex_reproducer Based on a patch by Michal Hocko. Link: https://elixir.bootlin.com/glibc/glibc-2.35/source/nptl/allocatestack.c#L370 [1] Link: https://lkml.kernel.org/r/20220414144042.677008-1-npache@redhat.com Fixes: 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") Signed-off-by: Joel Savitz Signed-off-by: Nico Pache Co-developed-by: Joel Savitz Suggested-by: Thomas Gleixner Acked-by: Thomas Gleixner Acked-by: Michal Hocko Cc: Rafael Aquini Cc: Waiman Long Cc: Herton R. Krzesinski Cc: Juri Lelli Cc: Vincent Guittot Cc: Dietmar Eggemann Cc: Steven Rostedt Cc: Ben Segall Cc: Mel Gorman Cc: Daniel Bristot de Oliveira Cc: David Rientjes Cc: Andrea Arcangeli Cc: Davidlohr Bueso Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Joel Savitz Cc: Darren Hart Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/sched.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index d5e3c00b74e1..a8911b1f35aa 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1443,6 +1443,7 @@ struct task_struct { int pagefault_disabled; #ifdef CONFIG_MMU struct task_struct *oom_reaper_list; + struct timer_list oom_reaper_timer; #endif #ifdef CONFIG_VMAP_STACK struct vm_struct *stack_vm_area; -- cgit From 52ef8efcb75e8a8aab88e74c1376c2785d9a5452 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Sat, 2 Apr 2022 00:28:23 +0200 Subject: dma: omap: hide legacy interface The legacy interface for omap-dma is only used on OMAP1, and the same is true for the non-DT case. Make both of these conditional on CONFIG_ARCH_OMAP1 being set to simplify the dependency. The non-OMAP stub functions in include/linux/omap-dma.h are note needed any more either now, because they are only called on OMAP1. Acked-by: Tony Lindgren Acked-By: Vinod Koul Signed-off-by: Arnd Bergmann --- include/linux/omap-dma.h | 22 ---------------------- 1 file changed, 22 deletions(-) (limited to 'include/linux') diff --git a/include/linux/omap-dma.h b/include/linux/omap-dma.h index 07fa58ae9902..254b4e10511b 100644 --- a/include/linux/omap-dma.h +++ b/include/linux/omap-dma.h @@ -292,7 +292,6 @@ struct omap_system_dma_plat_info { #define dma_omap15xx() __dma_omap15xx(d) #define dma_omap16xx() __dma_omap16xx(d) -#if defined(CONFIG_ARCH_OMAP) extern struct omap_system_dma_plat_info *omap_get_plat_info(void); extern void omap_set_dma_priority(int lch, int dst_port, int priority); @@ -340,25 +339,4 @@ static inline int omap_lcd_dma_running(void) } #endif -#else /* CONFIG_ARCH_OMAP */ -static inline void omap_set_dma_priority(int lch, int dst_port, int priority) -{ -} - -static inline struct omap_system_dma_plat_info *omap_get_plat_info(void) -{ - return NULL; -} - -static inline int omap_request_dma(int dev_id, const char *dev_name, - void (*callback)(int lch, u16 ch_status, void *data), - void *data, int *dma_ch) -{ - return -ENODEV; -} - -static inline void omap_free_dma(int ch) { } - -#endif /* CONFIG_ARCH_OMAP */ - #endif /* __LINUX_OMAP_DMA_H */ -- cgit From 615dce5bf7369334f8c4f19f7f8f2a5634b4f911 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 12 Sep 2019 10:11:24 +0200 Subject: ARM: omap1: fix build with no SoC selected In a multiplatform randconfig kernel, one can have CONFIG_ARCH_OMAP1 enabled, but none of the specific SoCs. This leads to some build issues as the code is not meant to deal with this configuration at the moment: arch/arm/mach-omap1/io.c:86:20: error: unused function 'omap1_map_common_io' [-Werror,-Wunused-function] arch/arm/mach-omap1/pm.h:113:2: error: "Power management for this processor not implemented yet" [-Werror,-W#warnings] Use the same trick as on OMAP2 and guard the actual compilation of platform code with another Makefile ifdef check based on an option that depends on having at least one SoC enabled. The io.c file still needs to get compiled to allow building device drivers with a dependency on CONFIG_ARCH_OMAP1. Acked-by: Tony Lindgren Signed-off-by: Arnd Bergmann --- include/linux/soc/ti/omap1-io.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soc/ti/omap1-io.h b/include/linux/soc/ti/omap1-io.h index 9332c92690f4..f7f12728d4a6 100644 --- a/include/linux/soc/ti/omap1-io.h +++ b/include/linux/soc/ti/omap1-io.h @@ -5,7 +5,7 @@ #ifndef __ASSEMBLER__ #include -#if defined(CONFIG_ARCH_OMAP) && defined(CONFIG_ARCH_OMAP1) +#ifdef CONFIG_ARCH_OMAP1_ANY /* * NOTE: Please use ioremap + __raw_read/write where possible instead of these */ @@ -15,7 +15,7 @@ extern u32 omap_readl(u32 pa); extern void omap_writeb(u8 v, u32 pa); extern void omap_writew(u16 v, u32 pa); extern void omap_writel(u32 v, u32 pa); -#elif defined(CONFIG_COMPILE_TEST) +#else static inline u8 omap_readb(u32 pa) { return 0; } static inline u16 omap_readw(u32 pa) { return 0; } static inline u32 omap_readl(u32 pa) { return 0; } -- cgit From 78ed93d72ded679e3caf0758357209887bda885f Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Mon, 4 Apr 2022 13:12:04 +0200 Subject: signal: Deliver SIGTRAP on perf event asynchronously if blocked With SIGTRAP on perf events, we have encountered termination of processes due to user space attempting to block delivery of SIGTRAP. Consider this case: ... sigset_t s; sigemptyset(&s); sigaddset(&s, SIGTRAP | ); sigprocmask(SIG_BLOCK, &s, ...); ... When the perf event triggers, while SIGTRAP is blocked, force_sig_perf() will force the signal, but revert back to the default handler, thus terminating the task. This makes sense for error conditions, but not so much for explicitly requested monitoring. However, the expectation is still that signals generated by perf events are synchronous, which will no longer be the case if the signal is blocked and delivered later. To give user space the ability to clearly distinguish synchronous from asynchronous signals, introduce siginfo_t::si_perf_flags and TRAP_PERF_FLAG_ASYNC (opted for flags in case more binary information is required in future). The resolution to the problem is then to (a) no longer force the signal (avoiding the terminations), but (b) tell user space via si_perf_flags if the signal was synchronous or not, so that such signals can be handled differently (e.g. let user space decide to ignore or consider the data imprecise). The alternative of making the kernel ignore SIGTRAP on perf events if the signal is blocked may work for some usecases, but likely causes issues in others that then have to revert back to interception of sigprocmask() (which we want to avoid). [ A concrete example: when using breakpoint perf events to track data-flow, in a region of code where signals are blocked, data-flow can no longer be tracked accurately. When a relevant asynchronous signal is received after unblocking the signal, the data-flow tracking logic needs to know its state is imprecise. ] Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Reported-by: Dmitry Vyukov Signed-off-by: Marco Elver Signed-off-by: Peter Zijlstra (Intel) Acked-by: Geert Uytterhoeven Tested-by: Dmitry Vyukov Link: https://lore.kernel.org/r/20220404111204.935357-1-elver@google.com --- include/linux/compat.h | 1 + include/linux/sched/signal.h | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/compat.h b/include/linux/compat.h index 1c758b0e0359..01fddf72a81f 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -235,6 +235,7 @@ typedef struct compat_siginfo { struct { compat_ulong_t _data; u32 _type; + u32 _flags; } _perf; }; } _sigfault; diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 3c8b34876744..bab7cc56b13a 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -320,7 +320,7 @@ int send_sig_mceerr(int code, void __user *, short, struct task_struct *); int force_sig_bnderr(void __user *addr, void __user *lower, void __user *upper); int force_sig_pkuerr(void __user *addr, u32 pkey); -int force_sig_perf(void __user *addr, u32 type, u64 sig_data); +int send_sig_perf(void __user *addr, u32 type, u64 sig_data); int force_sig_ptrace_errno_trap(int errno, void __user *addr); int force_sig_fault_trapno(int sig, int code, void __user *addr, int trapno); -- cgit From 03f16cd020eb8bb2eb837e2090086f296a9fa91d Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Mon, 18 Apr 2022 09:50:36 -0700 Subject: objtool: Add CONFIG_OBJTOOL Now that stack validation is an optional feature of objtool, add CONFIG_OBJTOOL and replace most usages of CONFIG_STACK_VALIDATION with it. CONFIG_STACK_VALIDATION can now be considered to be frame-pointer specific. CONFIG_UNWINDER_ORC is already inherently valid for live patching, so no need to "validate" it. Signed-off-by: Josh Poimboeuf Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Miroslav Benes Link: https://lkml.kernel.org/r/939bf3d85604b2a126412bf11af6e3bd3b872bcb.1650300597.git.jpoimboe@redhat.com --- include/linux/compiler.h | 6 +++--- include/linux/instrumentation.h | 6 +++--- include/linux/objtool.h | 6 +++--- 3 files changed, 9 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 219aa5ddbc73..01ce94b58b42 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -109,7 +109,7 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, #endif /* Unreachable code */ -#ifdef CONFIG_STACK_VALIDATION +#ifdef CONFIG_OBJTOOL /* * These macros help objtool understand GCC code flow for unreachable code. * The __COUNTER__ based labels are a hack to make each instance of the macros @@ -128,10 +128,10 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, /* Annotate a C jump table to allow objtool to follow the code flow */ #define __annotate_jump_table __section(".rodata..c_jump_table") -#else +#else /* !CONFIG_OBJTOOL */ #define annotate_unreachable() #define __annotate_jump_table -#endif +#endif /* CONFIG_OBJTOOL */ #ifndef unreachable # define unreachable() do { \ diff --git a/include/linux/instrumentation.h b/include/linux/instrumentation.h index 24359b4a9605..9111a3704072 100644 --- a/include/linux/instrumentation.h +++ b/include/linux/instrumentation.h @@ -2,7 +2,7 @@ #ifndef __LINUX_INSTRUMENTATION_H #define __LINUX_INSTRUMENTATION_H -#if defined(CONFIG_DEBUG_ENTRY) && defined(CONFIG_STACK_VALIDATION) +#ifdef CONFIG_VMLINUX_VALIDATION #include @@ -53,9 +53,9 @@ ".popsection\n\t" : : "i" (c)); \ }) #define instrumentation_end() __instrumentation_end(__COUNTER__) -#else +#else /* !CONFIG_VMLINUX_VALIDATION */ # define instrumentation_begin() do { } while(0) # define instrumentation_end() do { } while(0) -#endif +#endif /* CONFIG_VMLINUX_VALIDATION */ #endif /* __LINUX_INSTRUMENTATION_H */ diff --git a/include/linux/objtool.h b/include/linux/objtool.h index 586d35720f13..977d90ba642d 100644 --- a/include/linux/objtool.h +++ b/include/linux/objtool.h @@ -38,7 +38,7 @@ struct unwind_hint { #define UNWIND_HINT_TYPE_REGS_PARTIAL 2 #define UNWIND_HINT_TYPE_FUNC 3 -#ifdef CONFIG_STACK_VALIDATION +#ifdef CONFIG_OBJTOOL #ifndef __ASSEMBLY__ @@ -157,7 +157,7 @@ struct unwind_hint { #endif /* __ASSEMBLY__ */ -#else /* !CONFIG_STACK_VALIDATION */ +#else /* !CONFIG_OBJTOOL */ #ifndef __ASSEMBLY__ @@ -179,6 +179,6 @@ struct unwind_hint { .endm #endif -#endif /* CONFIG_STACK_VALIDATION */ +#endif /* CONFIG_OBJTOOL */ #endif /* _LINUX_OBJTOOL_H */ -- cgit From 0f620cefd7753175b6258fed85f76c2014ec3799 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Mon, 18 Apr 2022 09:50:41 -0700 Subject: objtool: Rename "VMLINUX_VALIDATION" -> "NOINSTR_VALIDATION" CONFIG_VMLINUX_VALIDATION is just the validation of the "noinstr" rules. That name is a misnomer, because now objtool actually does vmlinux validation for other reasons. Rename CONFIG_VMLINUX_VALIDATION to CONFIG_NOINSTR_VALIDATION. Signed-off-by: Josh Poimboeuf Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Miroslav Benes Link: https://lkml.kernel.org/r/173f07e2d6d1afc0874aed975a61783207c6a531.1650300597.git.jpoimboe@redhat.com --- include/linux/instrumentation.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/instrumentation.h b/include/linux/instrumentation.h index 9111a3704072..bc7babe91b2e 100644 --- a/include/linux/instrumentation.h +++ b/include/linux/instrumentation.h @@ -2,7 +2,7 @@ #ifndef __LINUX_INSTRUMENTATION_H #define __LINUX_INSTRUMENTATION_H -#ifdef CONFIG_VMLINUX_VALIDATION +#ifdef CONFIG_NOINSTR_VALIDATION #include @@ -53,9 +53,9 @@ ".popsection\n\t" : : "i" (c)); \ }) #define instrumentation_end() __instrumentation_end(__COUNTER__) -#else /* !CONFIG_VMLINUX_VALIDATION */ +#else /* !CONFIG_NOINSTR_VALIDATION */ # define instrumentation_begin() do { } while(0) # define instrumentation_end() do { } while(0) -#endif /* CONFIG_VMLINUX_VALIDATION */ +#endif /* CONFIG_NOINSTR_VALIDATION */ #endif /* __LINUX_INSTRUMENTATION_H */ -- cgit From 89e9c7280075f6733b22dd0740daeddeb1256ebf Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Wed, 20 Apr 2022 10:58:50 +0900 Subject: ipv6: Remove __ipv6_only_sock(). Since commit 9fe516ba3fb2 ("inet: move ipv6only in sock_common"), ipv6_only_sock() and __ipv6_only_sock() are the same macro. Let's remove the one. Signed-off-by: Kuniyuki Iwashima Reviewed-by: David Ahern Signed-off-by: David S. Miller --- include/linux/ipv6.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index 918bfea4ef5f..ec5ca392eaa3 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -340,8 +340,7 @@ static inline struct raw6_sock *raw6_sk(const struct sock *sk) return (struct raw6_sock *)sk; } -#define __ipv6_only_sock(sk) (sk->sk_ipv6only) -#define ipv6_only_sock(sk) (__ipv6_only_sock(sk)) +#define ipv6_only_sock(sk) (sk->sk_ipv6only) #define ipv6_sk_rxinfo(sk) ((sk)->sk_family == PF_INET6 && \ inet6_sk(sk)->rxopt.bits.rxinfo) @@ -358,7 +357,6 @@ static inline int inet_v6_ipv6only(const struct sock *sk) return ipv6_only_sock(sk); } #else -#define __ipv6_only_sock(sk) 0 #define ipv6_only_sock(sk) 0 #define ipv6_sk_rxinfo(sk) 0 -- cgit From b804923b7ccb9c9629703364e927b48cd02a9254 Mon Sep 17 00:00:00 2001 From: "jason-jh.lin" Date: Tue, 19 Apr 2022 17:41:36 +0800 Subject: soc: mediatek: add mtk-mmsys support for mt8195 vdosys0 1. Add mt8195 mmsys compatible for 2 vdosys. 2. Add io_start into each driver data of mt8195 vdosys. 3. Add get match data function to identify mmsys by io_start. 4. Add mt8195 routing table settings of vdosys0. Signed-off-by: jason-jh.lin Reviewed-by: AngeloGioacchino Del Regno Reviewed-by: Rex-BC Chen Reviewed-by: CK Hu Link: https://lore.kernel.org/r/20220419094143.9561-2-jason-jh.lin@mediatek.com Signed-off-by: Matthias Brugger --- include/linux/soc/mediatek/mtk-mmsys.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/mediatek/mtk-mmsys.h b/include/linux/soc/mediatek/mtk-mmsys.h index 4bba275e235a..cff5c9adbf46 100644 --- a/include/linux/soc/mediatek/mtk-mmsys.h +++ b/include/linux/soc/mediatek/mtk-mmsys.h @@ -17,13 +17,24 @@ enum mtk_ddp_comp_id { DDP_COMPONENT_COLOR0, DDP_COMPONENT_COLOR1, DDP_COMPONENT_DITHER, + DDP_COMPONENT_DITHER1, + DDP_COMPONENT_DP_INTF0, + DDP_COMPONENT_DP_INTF1, DDP_COMPONENT_DPI0, DDP_COMPONENT_DPI1, + DDP_COMPONENT_DSC0, + DDP_COMPONENT_DSC1, DDP_COMPONENT_DSI0, DDP_COMPONENT_DSI1, DDP_COMPONENT_DSI2, DDP_COMPONENT_DSI3, DDP_COMPONENT_GAMMA, + DDP_COMPONENT_MERGE0, + DDP_COMPONENT_MERGE1, + DDP_COMPONENT_MERGE2, + DDP_COMPONENT_MERGE3, + DDP_COMPONENT_MERGE4, + DDP_COMPONENT_MERGE5, DDP_COMPONENT_OD0, DDP_COMPONENT_OD1, DDP_COMPONENT_OVL0, -- cgit From 4e8988c634a1159e803bfdc0118578ded97e012c Mon Sep 17 00:00:00 2001 From: "jason-jh.lin" Date: Tue, 19 Apr 2022 17:41:41 +0800 Subject: soc: mediatek: add DDP_DOMPONENT_DITHER0 enum for mt8195 vdosys0 The mmsys routing table of mt8195 vdosys0 has 2 DITHER components, so mmsys need to add DDP_COMPONENT_DITHER1 and change all usages of DITHER enum form DDP_COMPONENT_DITHER to DDP_COMPONENT_DITHER0. But its header need to keep DDP_COMPONENT_DITHER enum until drm/mediatek also changed it. Signed-off-by: jason-jh.lin Reviewed-by: AngeloGioacchino Del Regno Reviewed-by: Rex-BC Chen Link: https://lore.kernel.org/r/20220419094143.9561-7-jason-jh.lin@mediatek.com Signed-off-by: Matthias Brugger --- include/linux/soc/mediatek/mtk-mmsys.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/soc/mediatek/mtk-mmsys.h b/include/linux/soc/mediatek/mtk-mmsys.h index cff5c9adbf46..59117d970daf 100644 --- a/include/linux/soc/mediatek/mtk-mmsys.h +++ b/include/linux/soc/mediatek/mtk-mmsys.h @@ -17,6 +17,7 @@ enum mtk_ddp_comp_id { DDP_COMPONENT_COLOR0, DDP_COMPONENT_COLOR1, DDP_COMPONENT_DITHER, + DDP_COMPONENT_DITHER0 = DDP_COMPONENT_DITHER, DDP_COMPONENT_DITHER1, DDP_COMPONENT_DP_INTF0, DDP_COMPONENT_DP_INTF1, -- cgit From bd40cbb0e3b37a4d2a2d9e2ac40122cdf619b1f3 Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Tue, 19 Apr 2022 19:29:16 +0200 Subject: PM: domains: Move genpd's time-accounting to ktime_get_mono_fast_ns() To move towards a more consistent behaviour between genpd and the runtime PM core, let's start by converting genpd's time-accounting from ktime_get() into ktime_get_mono_fast_ns(). Signed-off-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- include/linux/pm_domain.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h index 67017c9390c8..043d48e4420a 100644 --- a/include/linux/pm_domain.h +++ b/include/linux/pm_domain.h @@ -98,7 +98,7 @@ struct genpd_power_state { u64 usage; u64 rejected; struct fwnode_handle *fwnode; - ktime_t idle_time; + u64 idle_time; void *data; }; @@ -149,8 +149,8 @@ struct generic_pm_domain { unsigned int state_count); unsigned int state_count; /* number of states */ unsigned int state_idx; /* state that genpd will go to when off */ - ktime_t on_time; - ktime_t accounting_time; + u64 on_time; + u64 accounting_time; const struct genpd_lock_ops *lock_ops; union { struct mutex mlock; -- cgit From 6c2f421174273de8f83cde4286d1c076d43a2d35 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Tue, 19 Apr 2022 13:34:24 +0200 Subject: driver: platform: Add helper for safer setting of driver_override Several core drivers and buses expect that driver_override is a dynamically allocated memory thus later they can kfree() it. However such assumption is not documented, there were in the past and there are already users setting it to a string literal. This leads to kfree() of static memory during device release (e.g. in error paths or during unbind): kernel BUG at ../mm/slub.c:3960! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM ... (kfree) from [] (platform_device_release+0x88/0xb4) (platform_device_release) from [] (device_release+0x2c/0x90) (device_release) from [] (kobject_put+0xec/0x20c) (kobject_put) from [] (exynos5_clk_probe+0x154/0x18c) (exynos5_clk_probe) from [] (platform_drv_probe+0x6c/0xa4) (platform_drv_probe) from [] (really_probe+0x280/0x414) (really_probe) from [] (driver_probe_device+0x78/0x1c4) (driver_probe_device) from [] (bus_for_each_drv+0x74/0xb8) (bus_for_each_drv) from [] (__device_attach+0xd4/0x16c) (__device_attach) from [] (bus_probe_device+0x88/0x90) (bus_probe_device) from [] (device_add+0x3dc/0x62c) (device_add) from [] (of_platform_device_create_pdata+0x94/0xbc) (of_platform_device_create_pdata) from [] (of_platform_bus_create+0x1a8/0x4fc) (of_platform_bus_create) from [] (of_platform_bus_create+0x20c/0x4fc) (of_platform_bus_create) from [] (of_platform_populate+0x84/0x118) (of_platform_populate) from [] (of_platform_default_populate_init+0xa0/0xb8) (of_platform_default_populate_init) from [] (do_one_initcall+0x8c/0x404) Provide a helper which clearly documents the usage of driver_override. This will allow later to reuse the helper and reduce the amount of duplicated code. Convert the platform driver to use a new helper and make the driver_override field const char (it is not modified by the core). Reviewed-by: Rafael J. Wysocki Acked-by: Rafael J. Wysocki Signed-off-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20220419113435.246203-2-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/device/driver.h | 2 ++ include/linux/platform_device.h | 6 +++++- 2 files changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h index 15e7c5e15d62..700453017e1c 100644 --- a/include/linux/device/driver.h +++ b/include/linux/device/driver.h @@ -151,6 +151,8 @@ extern int __must_check driver_create_file(struct device_driver *driver, extern void driver_remove_file(struct device_driver *driver, const struct driver_attribute *attr); +int driver_set_override(struct device *dev, const char **override, + const char *s, size_t len); extern int __must_check driver_for_each_device(struct device_driver *drv, struct device *start, void *data, diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h index 7c96f169d274..582d83ed9a91 100644 --- a/include/linux/platform_device.h +++ b/include/linux/platform_device.h @@ -31,7 +31,11 @@ struct platform_device { struct resource *resource; const struct platform_device_id *id_entry; - char *driver_override; /* Driver name to force a match */ + /* + * Driver name to force a match. Do not set directly, because core + * frees it. Use driver_set_override() to set or clear it. + */ + const char *driver_override; /* MFD cell pointer */ struct mfd_cell *mfd_cell; -- cgit From 6e67955087e72f1a5f64dd8014c27e1d9317fbb4 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Tue, 19 Apr 2022 13:34:25 +0200 Subject: amba: Use driver_set_override() instead of open-coding Use a helper to set driver_override to reduce the amount of duplicated code. Make the driver_override field const char, because it is not modified by the core and it matches other subsystems. Signed-off-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20220419113435.246203-3-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/amba/bus.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/amba/bus.h b/include/linux/amba/bus.h index 6562f543c3e0..93799a72ff82 100644 --- a/include/linux/amba/bus.h +++ b/include/linux/amba/bus.h @@ -70,7 +70,11 @@ struct amba_device { unsigned int cid; struct amba_cs_uci_id uci; unsigned int irq[AMBA_NR_IRQS]; - char *driver_override; + /* + * Driver name to force a match. Do not set directly, because core + * frees it. Use driver_set_override() to set or clear it. + */ + const char *driver_override; }; struct amba_driver { -- cgit From 5688f212e98a2469583a067fa5da4312ddc4e357 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Tue, 19 Apr 2022 13:34:26 +0200 Subject: fsl-mc: Use driver_set_override() instead of open-coding Use a helper to set driver_override to reduce the amount of duplicated code. Make the driver_override field const char, because it is not modified by the core and it matches other subsystems. Signed-off-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20220419113435.246203-4-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/fsl/mc.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fsl/mc.h b/include/linux/fsl/mc.h index 7b6c42bfb660..7a87ab9eba99 100644 --- a/include/linux/fsl/mc.h +++ b/include/linux/fsl/mc.h @@ -170,7 +170,9 @@ struct fsl_mc_obj_desc { * @regions: pointer to array of MMIO region entries * @irqs: pointer to array of pointers to interrupts allocated to this device * @resource: generic resource associated with this MC object device, if any. - * @driver_override: driver name to force a match + * @driver_override: driver name to force a match; do not set directly, + * because core frees it; use driver_set_override() to + * set or clear it. * * Generic device object for MC object devices that are "attached" to a * MC bus. @@ -204,7 +206,7 @@ struct fsl_mc_device { struct fsl_mc_device_irq **irqs; struct fsl_mc_resource *resource; struct device_link *consumer_link; - char *driver_override; + const char *driver_override; }; #define to_fsl_mc_device(_dev) \ -- cgit From 01ed100276bdac259f1b418b998904a7486b0b68 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Tue, 19 Apr 2022 13:34:27 +0200 Subject: hv: Use driver_set_override() instead of open-coding Use a helper to set driver_override to the reduce amount of duplicated code. Make the driver_override field const char, because it is not modified by the core and it matches other subsystems. Reviewed-by: Michael Kelley Signed-off-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20220419113435.246203-5-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/hyperv.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index fe2e0179ed51..12e2336b23b7 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1257,7 +1257,11 @@ struct hv_device { u16 device_id; struct device device; - char *driver_override; /* Driver name to force a match */ + /* + * Driver name to force a match. Do not set directly, because core + * frees it. Use driver_set_override() to set or clear it. + */ + const char *driver_override; struct vmbus_channel *channel; struct kset *channels_kset; -- cgit From 23d99baf9d729ca30b2fb6798a7b403a37bfb800 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Tue, 19 Apr 2022 13:34:28 +0200 Subject: PCI: Use driver_set_override() instead of open-coding Use a helper to set driver_override to the reduce amount of duplicated code. Make the driver_override field const char, because it is not modified by the core and it matches other subsystems. Reviewed-by: Andy Shevchenko Acked-by: Bjorn Helgaas Signed-off-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20220419113435.246203-6-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/pci.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pci.h b/include/linux/pci.h index 60adf42460ab..844d38f589cf 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -516,7 +516,11 @@ struct pci_dev { u16 acs_cap; /* ACS Capability offset */ phys_addr_t rom; /* Physical address if not from BAR */ size_t romlen; /* Length if not from BAR */ - char *driver_override; /* Driver name to force a match */ + /* + * Driver name to force a match. Do not set directly, because core + * frees it. Use driver_set_override() to set or clear it. + */ + const char *driver_override; unsigned long priv_flags; /* Private flags for the PCI driver */ -- cgit From 19368f0f23e80929691dd5b1354832c0e0494419 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Tue, 19 Apr 2022 13:34:30 +0200 Subject: spi: Use helper for safer setting of driver_override Use a helper to set driver_override to the reduce amount of duplicated code. Reviewed-by: Mark Brown Signed-off-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20220419113435.246203-8-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/spi/spi.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 5f8c063ddff4..f0177f9b6e13 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -138,6 +138,8 @@ extern int spi_delay_exec(struct spi_delay *_delay, struct spi_transfer *xfer); * for driver coldplugging, and in uevents used for hotplugging * @driver_override: If the name of a driver is written to this attribute, then * the device will bind to the named driver and only the named driver. + * Do not set directly, because core frees it; use driver_set_override() to + * set or clear it. * @cs_gpiod: gpio descriptor of the chipselect line (optional, NULL when * not using a GPIO line) * @word_delay: delay to be inserted between consecutive -- cgit From 240bf4e665741241983580812a2be1b033f120ee Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Tue, 19 Apr 2022 13:34:31 +0200 Subject: vdpa: Use helper for safer setting of driver_override Use a helper to set driver_override to the reduce amount of duplicated code. Acked-by: Michael S. Tsirkin Signed-off-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20220419113435.246203-9-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/vdpa.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 8943a209202e..c0a5083632ab 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -64,7 +64,9 @@ struct vdpa_mgmt_dev; * struct vdpa_device - representation of a vDPA device * @dev: underlying device * @dma_dev: the actual device that is performing DMA - * @driver_override: driver name to force a match + * @driver_override: driver name to force a match; do not set directly, + * because core frees it; use driver_set_override() to + * set or clear it. * @config: the configuration ops for this device. * @cf_mutex: Protects get and set access to configuration layout. * @index: device index -- cgit From 42cd402b8fd4672b692400fe5f9eecd55d2794ac Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Tue, 19 Apr 2022 13:34:35 +0200 Subject: rpmsg: Fix kfree() of static memory on setting driver_override The driver_override field from platform driver should not be initialized from static memory (string literal) because the core later kfree() it, for example when driver_override is set via sysfs. Use dedicated helper to set driver_override properly. Fixes: 950a7388f02b ("rpmsg: Turn name service into a stand alone driver") Fixes: c0cdc19f84a4 ("rpmsg: Driver for user space endpoint interface") Reviewed-by: Bjorn Andersson Signed-off-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20220419113435.246203-13-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/rpmsg.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rpmsg.h b/include/linux/rpmsg.h index 02fa9116cd60..20c8cd1cde21 100644 --- a/include/linux/rpmsg.h +++ b/include/linux/rpmsg.h @@ -41,7 +41,9 @@ struct rpmsg_channel_info { * rpmsg_device - device that belong to the rpmsg bus * @dev: the device struct * @id: device id (used to match between rpmsg drivers and devices) - * @driver_override: driver name to force a match + * @driver_override: driver name to force a match; do not set directly, + * because core frees it; use driver_set_override() to + * set or clear it. * @src: local address * @dst: destination address * @ept: the rpmsg endpoint of this channel @@ -51,7 +53,7 @@ struct rpmsg_channel_info { struct rpmsg_device { struct device dev; struct rpmsg_device_id id; - char *driver_override; + const char *driver_override; u32 src; u32 dst; struct rpmsg_endpoint *ept; -- cgit From f918cfc08c1755b9e54cd6effc923fa809045cf4 Mon Sep 17 00:00:00 2001 From: Ronak Jain Date: Wed, 6 Apr 2022 03:55:23 -0700 Subject: firmware: xilinx: add support for IOCTL and QUERY ID feature check Add support to check if IOCTL ID or QUERY ID is supported in firmware or not. Signed-off-by: Ronak Jain Link: https://lore.kernel.org/r/1649242526-17493-2-git-send-email-ronak.jain@xilinx.com Signed-off-by: Greg Kroah-Hartman --- include/linux/firmware/xlnx-zynqmp.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firmware/xlnx-zynqmp.h b/include/linux/firmware/xlnx-zynqmp.h index 14f00a7672d1..1ec73d5352c3 100644 --- a/include/linux/firmware/xlnx-zynqmp.h +++ b/include/linux/firmware/xlnx-zynqmp.h @@ -29,6 +29,11 @@ /* SMC SIP service Call Function Identifier Prefix */ #define PM_SIP_SVC 0xC2000000 + +/* PM API versions */ +#define PM_API_VERSION_2 2 + +/* ATF only commands */ #define PM_GET_TRUSTZONE_VERSION 0xa03 #define PM_SET_SUSPEND_MODE 0xa02 #define GET_CALLBACK_DATA 0xa01 @@ -460,6 +465,7 @@ int zynqmp_pm_load_pdi(const u32 src, const u64 address); int zynqmp_pm_register_notifier(const u32 node, const u32 event, const u32 wake, const u32 enable); int zynqmp_pm_feature(const u32 api_id); +int zynqmp_pm_is_function_supported(const u32 api_id, const u32 id); int zynqmp_pm_set_feature_config(enum pm_feature_config_id id, u32 value); int zynqmp_pm_get_feature_config(enum pm_feature_config_id id, u32 *payload); #else @@ -678,6 +684,11 @@ static inline int zynqmp_pm_pinctrl_get_function(const u32 pin, u32 *id) return -ENODEV; } +static inline int zynqmp_pm_is_function_supported(const u32 api_id, const u32 id) +{ + return -ENODEV; +} + static inline int zynqmp_pm_pinctrl_set_function(const u32 pin, const u32 id) { return -ENODEV; -- cgit From faebd693c59387b7b765fab64b543855e15a91b4 Mon Sep 17 00:00:00 2001 From: John Ogness Date: Thu, 21 Apr 2022 23:28:36 +0206 Subject: printk: rename cpulock functions Since the printk cpulock is CPU-reentrant and since it is used in all contexts, its usage must be carefully considered and most likely will require programming locklessly. To avoid mistaking the printk cpulock as a typical lock, rename it to cpu_sync. The main functions then become: printk_cpu_sync_get_irqsave(flags); printk_cpu_sync_put_irqrestore(flags); Add extra notes of caution in the function description to help developers understand the requirements for correct usage. Signed-off-by: John Ogness Reviewed-by: Petr Mladek Signed-off-by: Petr Mladek Link: https://lore.kernel.org/r/20220421212250.565456-2-john.ogness@linutronix.de --- include/linux/printk.h | 54 ++++++++++++++++++++++++++++++-------------------- 1 file changed, 33 insertions(+), 21 deletions(-) (limited to 'include/linux') diff --git a/include/linux/printk.h b/include/linux/printk.h index 1522df223c0f..859323a52985 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -277,43 +277,55 @@ static inline void printk_trigger_flush(void) #endif #ifdef CONFIG_SMP -extern int __printk_cpu_trylock(void); -extern void __printk_wait_on_cpu_lock(void); -extern void __printk_cpu_unlock(void); +extern int __printk_cpu_sync_try_get(void); +extern void __printk_cpu_sync_wait(void); +extern void __printk_cpu_sync_put(void); /** - * printk_cpu_lock_irqsave() - Acquire the printk cpu-reentrant spinning - * lock and disable interrupts. + * printk_cpu_sync_get_irqsave() - Acquire the printk cpu-reentrant spinning + * lock and disable interrupts. * @flags: Stack-allocated storage for saving local interrupt state, - * to be passed to printk_cpu_unlock_irqrestore(). + * to be passed to printk_cpu_sync_put_irqrestore(). * * If the lock is owned by another CPU, spin until it becomes available. * Interrupts are restored while spinning. + * + * CAUTION: This function must be used carefully. It does not behave like a + * typical lock. Here are important things to watch out for... + * + * * This function is reentrant on the same CPU. Therefore the calling + * code must not assume exclusive access to data if code accessing the + * data can run reentrant or within NMI context on the same CPU. + * + * * If there exists usage of this function from NMI context, it becomes + * unsafe to perform any type of locking or spinning to wait for other + * CPUs after calling this function from any context. This includes + * using spinlocks or any other busy-waiting synchronization methods. */ -#define printk_cpu_lock_irqsave(flags) \ - for (;;) { \ - local_irq_save(flags); \ - if (__printk_cpu_trylock()) \ - break; \ - local_irq_restore(flags); \ - __printk_wait_on_cpu_lock(); \ +#define printk_cpu_sync_get_irqsave(flags) \ + for (;;) { \ + local_irq_save(flags); \ + if (__printk_cpu_sync_try_get()) \ + break; \ + local_irq_restore(flags); \ + __printk_cpu_sync_wait(); \ } /** - * printk_cpu_unlock_irqrestore() - Release the printk cpu-reentrant spinning - * lock and restore interrupts. - * @flags: Caller's saved interrupt state, from printk_cpu_lock_irqsave(). + * printk_cpu_sync_put_irqrestore() - Release the printk cpu-reentrant spinning + * lock and restore interrupts. + * @flags: Caller's saved interrupt state, from printk_cpu_sync_get_irqsave(). */ -#define printk_cpu_unlock_irqrestore(flags) \ +#define printk_cpu_sync_put_irqrestore(flags) \ do { \ - __printk_cpu_unlock(); \ + __printk_cpu_sync_put(); \ local_irq_restore(flags); \ - } while (0) \ + } while (0) #else -#define printk_cpu_lock_irqsave(flags) ((void)flags) -#define printk_cpu_unlock_irqrestore(flags) ((void)flags) +#define printk_cpu_sync_get_irqsave(flags) ((void)flags) +#define printk_cpu_sync_put_irqrestore(flags) ((void)flags) #endif /* CONFIG_SMP */ -- cgit From f5343321b71ac0a1112adeab0ff90b239bad3a83 Mon Sep 17 00:00:00 2001 From: John Ogness Date: Thu, 21 Apr 2022 23:28:37 +0206 Subject: printk: cpu sync always disable interrupts The CPU sync functions are a NOP for !CONFIG_SMP. But for !CONFIG_SMP they still need to disable interrupts in order to preserve context within the CPU sync sections. Signed-off-by: John Ogness Reviewed-by: Petr Mladek Reviewed-by: Sergey Senozhatsky Signed-off-by: Petr Mladek Link: https://lore.kernel.org/r/20220421212250.565456-3-john.ogness@linutronix.de --- include/linux/printk.h | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/printk.h b/include/linux/printk.h index 859323a52985..b70a42f94031 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -281,9 +281,16 @@ extern int __printk_cpu_sync_try_get(void); extern void __printk_cpu_sync_wait(void); extern void __printk_cpu_sync_put(void); +#else + +#define __printk_cpu_sync_try_get() true +#define __printk_cpu_sync_wait() +#define __printk_cpu_sync_put() +#endif /* CONFIG_SMP */ + /** - * printk_cpu_sync_get_irqsave() - Acquire the printk cpu-reentrant spinning - * lock and disable interrupts. + * printk_cpu_sync_get_irqsave() - Disable interrupts and acquire the printk + * cpu-reentrant spinning lock. * @flags: Stack-allocated storage for saving local interrupt state, * to be passed to printk_cpu_sync_put_irqrestore(). * @@ -322,13 +329,6 @@ extern void __printk_cpu_sync_put(void); local_irq_restore(flags); \ } while (0) -#else - -#define printk_cpu_sync_get_irqsave(flags) ((void)flags) -#define printk_cpu_sync_put_irqrestore(flags) ((void)flags) - -#endif /* CONFIG_SMP */ - extern int kptr_restrict; /** -- cgit From a699449bb13b70b8bd10dc03ad7327ea3993221e Mon Sep 17 00:00:00 2001 From: John Ogness Date: Thu, 21 Apr 2022 23:28:44 +0206 Subject: printk: refactor and rework printing logic Refactor/rework printing logic in order to prepare for moving to threaded console printing. - Move @console_seq into struct console so that the current "position" of each console can be tracked individually. - Move @console_dropped into struct console so that the current drop count of each console can be tracked individually. - Modify printing logic so that each console independently loads, prepares, and prints its next record. - Remove exclusive_console logic. Since console positions are handled independently, replaying past records occurs naturally. - Update the comments explaining why preemption is disabled while printing from printk() context. With these changes, there is a change in behavior: the console replaying the log (formerly exclusive console) will no longer block other consoles. New messages appear on the other consoles while the newly added console is still replaying. Signed-off-by: John Ogness Reviewed-by: Petr Mladek Signed-off-by: Petr Mladek Link: https://lore.kernel.org/r/20220421212250.565456-10-john.ogness@linutronix.de --- include/linux/console.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/console.h b/include/linux/console.h index 7cd758a4f44e..8c1686e2c233 100644 --- a/include/linux/console.h +++ b/include/linux/console.h @@ -151,6 +151,8 @@ struct console { int cflag; uint ispeed; uint ospeed; + u64 seq; + unsigned long dropped; void *data; struct console *next; }; -- cgit From 3b604ca81202eea2a917eb6491e90f610fba0ec7 Mon Sep 17 00:00:00 2001 From: John Ogness Date: Thu, 21 Apr 2022 23:28:46 +0206 Subject: printk: add pr_flush() Provide a might-sleep function to allow waiting for console printers to catch up to the latest logged message. Use pr_flush() whenever it is desirable to get buffered messages printed before continuing: suspend_console(), resume_console(), console_stop(), console_start(), console_unblank(). Signed-off-by: John Ogness Reviewed-by: Petr Mladek Signed-off-by: Petr Mladek Link: https://lore.kernel.org/r/20220421212250.565456-12-john.ogness@linutronix.de --- include/linux/printk.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/printk.h b/include/linux/printk.h index b70a42f94031..091fba7283e1 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -170,6 +170,8 @@ extern void __printk_safe_exit(void); #define printk_deferred_enter __printk_safe_enter #define printk_deferred_exit __printk_safe_exit +extern bool pr_flush(int timeout_ms, bool reset_on_progress); + /* * Please don't use printk_ratelimit(), because it shares ratelimiting state * with all other unrelated printk_ratelimit() callsites. Instead use @@ -220,6 +222,11 @@ static inline void printk_deferred_exit(void) { } +static inline bool pr_flush(int timeout_ms, bool reset_on_progress) +{ + return true; +} + static inline int printk_ratelimit(void) { return 0; -- cgit From 2bb2b7b57f81255c13f4395ea911d6bdc70c9fe2 Mon Sep 17 00:00:00 2001 From: John Ogness Date: Thu, 21 Apr 2022 23:28:47 +0206 Subject: printk: add functions to prefer direct printing Once kthread printing is available, console printing will no longer occur in the context of the printk caller. However, there are some special contexts where it is desirable for the printk caller to directly print out kernel messages. Using pr_flush() to wait for threaded printers is only possible if the caller is in a sleepable context and the kthreads are active. That is not always the case. Introduce printk_prefer_direct_enter() and printk_prefer_direct_exit() functions to explicitly (and globally) activate/deactivate preferred direct console printing. The term "direct console printing" refers to printing to all enabled consoles from the context of the printk caller. The term "prefer" is used because this type of printing is only best effort. If the console is currently locked or other printers are already actively printing, the printk caller will need to rely on the other contexts to handle the printing. This preferred direct printing is how all printing has been handled until now (unless it was explicitly deferred). When kthread printing is introduced, there may be some unanticipated problems due to kthreads being unable to flush important messages. In order to minimize such risks, preferred direct printing is activated for the primary important messages when the system experiences general types of major errors. These are: - emergency reboot/shutdown - cpu and rcu stalls - hard and soft lockups - hung tasks - warn - sysrq Note that since kthread printing does not yet exist, no behavior changes result from this commit. This is only implementing the counter and marking the various places where preferred direct printing is active. Signed-off-by: John Ogness Reviewed-by: Petr Mladek Acked-by: Paul E. McKenney # for RCU Signed-off-by: Petr Mladek Link: https://lore.kernel.org/r/20220421212250.565456-13-john.ogness@linutronix.de --- include/linux/printk.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/printk.h b/include/linux/printk.h index 091fba7283e1..cd26aab0ab2a 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -170,6 +170,9 @@ extern void __printk_safe_exit(void); #define printk_deferred_enter __printk_safe_enter #define printk_deferred_exit __printk_safe_exit +extern void printk_prefer_direct_enter(void); +extern void printk_prefer_direct_exit(void); + extern bool pr_flush(int timeout_ms, bool reset_on_progress); /* @@ -222,6 +225,14 @@ static inline void printk_deferred_exit(void) { } +static inline void printk_prefer_direct_enter(void) +{ +} + +static inline void printk_prefer_direct_exit(void) +{ +} + static inline bool pr_flush(int timeout_ms, bool reset_on_progress) { return true; -- cgit From 09c5ba0aa2fcfdadb17d045c3ee6f86d69270df7 Mon Sep 17 00:00:00 2001 From: John Ogness Date: Thu, 21 Apr 2022 23:28:48 +0206 Subject: printk: add kthread console printers Create a kthread for each console to perform console printing. During normal operation (@system_state == SYSTEM_RUNNING), the kthread printers are responsible for all printing on their respective consoles. During non-normal operation, console printing is done as it has been: within the context of the printk caller or within irqwork triggered by the printk caller, referred to as direct printing. Since threaded console printers are responsible for all printing during normal operation, this also includes messages generated via deferred printk calls. If direct printing is in effect during a deferred printk call, the queued irqwork will perform the direct printing. To make it clear that this is the only time that the irqwork will perform direct printing, rename the flag PRINTK_PENDING_OUTPUT to PRINTK_PENDING_DIRECT_OUTPUT. Threaded console printers synchronize against each other and against console lockers by taking the console lock for each message that is printed. Note that the kthread printers do not care about direct printing. They will always try to print if new records are available. They can be blocked by direct printing, but will be woken again once direct printing is finished. Console unregistration is a bit tricky because the associated kthread printer cannot be stopped while the console lock is held. A policy is implemented that states: whichever task clears con->thread (under the console lock) is responsible for stopping the kthread. unregister_console() will clear con->thread while the console lock is held and then stop the kthread after releasing the console lock. For consoles that have implemented the exit() callback, the kthread is stopped before exit() is called. Signed-off-by: John Ogness Reviewed-by: Petr Mladek Signed-off-by: Petr Mladek Link: https://lore.kernel.org/r/20220421212250.565456-14-john.ogness@linutronix.de --- include/linux/console.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/console.h b/include/linux/console.h index 8c1686e2c233..9a251e70c090 100644 --- a/include/linux/console.h +++ b/include/linux/console.h @@ -153,6 +153,8 @@ struct console { uint ospeed; u64 seq; unsigned long dropped; + struct task_struct *thread; + void *data; struct console *next; }; -- cgit From 5e7260712b9a76de19c32f15d667f8f93e573978 Mon Sep 17 00:00:00 2001 From: Guillaume Nault Date: Thu, 21 Apr 2022 14:47:26 +0200 Subject: qed: Remove IP services API. qed_nvmetcp_ip_services.c and its corresponding header file were introduced in commit 806ee7f81a2b ("qed: Add IP services APIs support") but there's still no users for any of the functions they declare. Since these files are effectively unused, let's just drop them. Found by code inspection. Compile-tested only. Signed-off-by: Guillaume Nault Link: https://lore.kernel.org/r/351ac8c847980e22850eb390553f8cc0e1ccd0ce.1650545051.git.gnault@redhat.com Signed-off-by: Jakub Kicinski --- include/linux/qed/qed_nvmetcp_ip_services_if.h | 29 -------------------------- 1 file changed, 29 deletions(-) delete mode 100644 include/linux/qed/qed_nvmetcp_ip_services_if.h (limited to 'include/linux') diff --git a/include/linux/qed/qed_nvmetcp_ip_services_if.h b/include/linux/qed/qed_nvmetcp_ip_services_if.h deleted file mode 100644 index 3604aee53796..000000000000 --- a/include/linux/qed/qed_nvmetcp_ip_services_if.h +++ /dev/null @@ -1,29 +0,0 @@ -/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */ -/* - * Copyright 2021 Marvell. All rights reserved. - */ - -#ifndef _QED_IP_SERVICES_IF_H -#define _QED_IP_SERVICES_IF_H - -#include -#include -#include -#include - -int qed_route_ipv4(struct sockaddr_storage *local_addr, - struct sockaddr_storage *remote_addr, - struct sockaddr *hardware_address, - struct net_device **ndev); -int qed_route_ipv6(struct sockaddr_storage *local_addr, - struct sockaddr_storage *remote_addr, - struct sockaddr *hardware_address, - struct net_device **ndev); -void qed_vlan_get_ndev(struct net_device **ndev, u16 *vlan_id); -struct pci_dev *qed_validate_ndev(struct net_device *ndev); -void qed_return_tcp_port(struct socket *sock); -int qed_fetch_tcp_port(struct sockaddr_storage local_ip_addr, - struct socket **sock, u16 *port); -__be16 qed_get_in_port(struct sockaddr_storage *sa); - -#endif /* _QED_IP_SERVICES_IF_H */ -- cgit From 9ea4dcf49878bb9546b8fa9319dcbdc9b7ee20f8 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Fri, 22 Apr 2022 15:58:11 -0700 Subject: PM: CXL: Disable suspend The CXL specification claims S3 support at a hardware level, but at a system software level there are some missing pieces. Section 9.4 (CXL 2.0) rightly claims that "CXL mem adapters may need aux power to retain memory context across S3", but there is no enumeration mechanism for the OS to determine if a given adapter has that support. Moreover the save state and resume image for the system may inadvertantly end up in a CXL device that needs to be restored before the save state is recoverable. I.e. a circular dependency that is not resolvable without a third party save-area. Arrange for the cxl_mem driver to fail S3 attempts. This still nominaly allows for suspend, but requires unbinding all CXL memory devices before the suspend to ensure the typical DRAM flow is taken. The cxl_mem unbind flow is intended to also tear down all CXL memory regions associated with a given cxl_memdev. It is reasonable to assume that any device participating in a System RAM range published in the EFI memory map is covered by aux power and save-area outside the device itself. So this restriction can be minimized in the future once pre-existing region enumeration support arrives, and perhaps a spec update to clarify if the EFI memory map is sufficent for determining the range of devices managed by platform-firmware for S3 support. Per Rafael, if the CXL configuration prevents suspend then it should fail early before tasks are frozen, and mem_sleep should stop showing 'mem' as an option [1]. Effectively CXL augments the platform suspend ->valid() op since, for example, the ACPI ops are not aware of the CXL / PCI dependencies. Given the split role of platform firmware vs OS provisioned CXL memory it is up to the cxl_mem driver to determine if the CXL configuration has elements that platform firmware may not be prepared to restore. Link: https://lore.kernel.org/r/CAJZ5v0hGVN_=3iU8OLpHY3Ak35T5+JcBM-qs8SbojKrpd0VXsA@mail.gmail.com [1] Cc: "Rafael J. Wysocki" Cc: Pavel Machek Cc: Len Brown Reviewed-by: Rafael J. Wysocki Link: https://lore.kernel.org/r/165066828317.3907920.5690432272182042556.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams --- include/linux/pm.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pm.h b/include/linux/pm.h index e65b3ab28377..7911c4c9a7be 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -36,6 +36,15 @@ static inline void pm_vt_switch_unregister(struct device *dev) } #endif /* CONFIG_VT_CONSOLE_SLEEP */ +#ifdef CONFIG_CXL_SUSPEND +bool cxl_mem_active(void); +#else +static inline bool cxl_mem_active(void) +{ + return false; +} +#endif + /* * Device power management */ -- cgit From f226041424cf87245d39a1b2dfae304308b36b6b Mon Sep 17 00:00:00 2001 From: Dave Gerlach Date: Sat, 9 Apr 2022 14:12:15 -0700 Subject: soc: ti: wkup_m3_ipc: Add support for toggling VTT regulator Some boards like the AM335x EVM-SK and AM437x GP EVM provide software control via a GPIO pin to toggle the DDR VTT regulator to reduce power consumption in low power states. The VTT regulator should be disabled after enabling self-refresh on suspend, and should be enabled before disabling self-refresh on resume. This is to allow proper self-refresh entry/exit commands to be transmitted to the memory. The "ti,vtt-gpio-pin" device tree property in the wkup_m3_ipc node specifies which GPIO pin to use. This property is communicated to the Wakeup Cortex M3 co-processor where the actual toggling of the GPIO pin happens in CM3 firmware [1]. Please note that the GPIO pin must be on the GPIO0 module as that module is in the wakeup power domain. [1] https://git.ti.com/cgit/processor-firmware/ti-amx3-cm3-pm-firmware/tree/src/pm_services/ddr.c?h=08.02.00.006#n190 Signed-off-by: Dave Gerlach Signed-off-by: Keerthy [dfustini: remove the unnecessary "ti,needs-vtt-toggle" property] Signed-off-by: Drew Fustini Signed-off-by: Nishanth Menon Link: https://lore.kernel.org/r/20220409211215.2529387-3-dfustini@baylibre.com --- include/linux/wkup_m3_ipc.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/wkup_m3_ipc.h b/include/linux/wkup_m3_ipc.h index 3f496967b538..2bc52c6381d5 100644 --- a/include/linux/wkup_m3_ipc.h +++ b/include/linux/wkup_m3_ipc.h @@ -33,6 +33,7 @@ struct wkup_m3_ipc { int mem_type; unsigned long resume_addr; + int vtt_conf; int state; struct completion sync_complete; -- cgit From 0f08c2e7458e25c967d844170f8ad1aac3b57a02 Mon Sep 17 00:00:00 2001 From: Vincent Mailhol Date: Thu, 17 Mar 2022 12:55:06 +0900 Subject: usb: deprecate the third argument of usb_maxpacket() This is a transitional patch with the ultimate goal of changing the prototype of usb_maxpacket() from: | static inline __u16 | usb_maxpacket(struct usb_device *udev, int pipe, int is_out) into: | static inline u16 usb_maxpacket(struct usb_device *udev, int pipe) The third argument of usb_maxpacket(): is_out gets removed because it can be derived from its second argument: pipe using usb_pipeout(pipe). Furthermore, in the current version, ubs_pipeout(pipe) is called regardless in order to sanitize the is_out parameter. In order to make a smooth change, we first deprecate the is_out parameter by simply ignoring it (using a variadic function) and will remove it later, once all the callers get updated. The body of the function is reworked accordingly and is_out is replaced by usb_pipeout(pipe). The WARN_ON() calls become unnecessary and get removed. Finally, the return type is changed from __u16 to u16 because this is not a UAPI function. Signed-off-by: Vincent Mailhol Link: https://lore.kernel.org/r/20220317035514.6378-2-mailhol.vincent@wanadoo.fr Signed-off-by: Greg Kroah-Hartman --- include/linux/usb.h | 16 +++++----------- 1 file changed, 5 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/usb.h b/include/linux/usb.h index 200b7b79acb5..572e136f6314 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -1969,21 +1969,17 @@ usb_pipe_endpoint(struct usb_device *dev, unsigned int pipe) return eps[usb_pipeendpoint(pipe)]; } -/*-------------------------------------------------------------------------*/ - -static inline __u16 -usb_maxpacket(struct usb_device *udev, int pipe, int is_out) +static inline u16 usb_maxpacket(struct usb_device *udev, int pipe, + /* int is_out deprecated */ ...) { struct usb_host_endpoint *ep; unsigned epnum = usb_pipeendpoint(pipe); - if (is_out) { - WARN_ON(usb_pipein(pipe)); + if (usb_pipeout(pipe)) ep = udev->ep_out[epnum]; - } else { - WARN_ON(usb_pipeout(pipe)); + else ep = udev->ep_in[epnum]; - } + if (!ep) return 0; @@ -1991,8 +1987,6 @@ usb_maxpacket(struct usb_device *udev, int pipe, int is_out) return usb_endpoint_maxp(&ep->desc); } -/* ----------------------------------------------------------------------- */ - /* translate USB error codes to codes user space understands */ static inline int usb_translate_errors(int error_code) { -- cgit From 2ddf7617d568fdcbd3cc454bb4849fa6d3398c86 Mon Sep 17 00:00:00 2001 From: Vincent Mailhol Date: Thu, 17 Mar 2022 12:55:13 +0900 Subject: usb: remove third argument of usb_maxpacket() Now that all users of usb_maxpacket() have been migrated to only use two arguments, remove the third variadic argument which was introduced for the transition. Signed-off-by: Vincent Mailhol Link: https://lore.kernel.org/r/20220317035514.6378-9-mailhol.vincent@wanadoo.fr Signed-off-by: Greg Kroah-Hartman --- include/linux/usb.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/usb.h b/include/linux/usb.h index 572e136f6314..8127782aa7a1 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -1969,8 +1969,7 @@ usb_pipe_endpoint(struct usb_device *dev, unsigned int pipe) return eps[usb_pipeendpoint(pipe)]; } -static inline u16 usb_maxpacket(struct usb_device *udev, int pipe, - /* int is_out deprecated */ ...) +static inline u16 usb_maxpacket(struct usb_device *udev, int pipe) { struct usb_host_endpoint *ep; unsigned epnum = usb_pipeendpoint(pipe); -- cgit From bdddc253b0938a0063798881d1f6a971ea1d8943 Mon Sep 17 00:00:00 2001 From: Vincent Mailhol Date: Thu, 17 Mar 2022 12:55:14 +0900 Subject: usb: rework usb_maxpacket() using usb_pipe_endpoint() Rework the body of usb_maxpacket() and just rely on the usb_pipe_endpoint() helper function to retrieve the host endpoint instead of doing it by hand. Signed-off-by: Vincent Mailhol Link: https://lore.kernel.org/r/20220317035514.6378-10-mailhol.vincent@wanadoo.fr Signed-off-by: Greg Kroah-Hartman --- include/linux/usb.h | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/usb.h b/include/linux/usb.h index 8127782aa7a1..60bee864d897 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -1971,13 +1971,7 @@ usb_pipe_endpoint(struct usb_device *dev, unsigned int pipe) static inline u16 usb_maxpacket(struct usb_device *udev, int pipe) { - struct usb_host_endpoint *ep; - unsigned epnum = usb_pipeendpoint(pipe); - - if (usb_pipeout(pipe)) - ep = udev->ep_out[epnum]; - else - ep = udev->ep_in[epnum]; + struct usb_host_endpoint *ep = usb_pipe_endpoint(udev, pipe); if (!ep) return 0; -- cgit From da214a475f8bd1d3e9e7a19ddfeb4d1617551bab Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Tue, 12 Apr 2022 14:22:39 -0600 Subject: net: add __sys_socket_file() This works like __sys_socket(), except instead of allocating and returning a socket fd, it just returns the file associated with the socket. No fd is installed into the process file table. This is similar to do_accept(), and allows io_uring to use this without instantiating a file descriptor in the process file table. Signed-off-by: Jens Axboe Acked-by: David S. Miller Link: https://lore.kernel.org/r/20220412202240.234207-2-axboe@kernel.dk --- include/linux/socket.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/socket.h b/include/linux/socket.h index 6f85f5d957ef..a1882e1e71d2 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -434,6 +434,7 @@ extern struct file *do_accept(struct file *file, unsigned file_flags, extern int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr, int __user *upeer_addrlen, int flags); extern int __sys_socket(int family, int type, int protocol); +extern struct file *__sys_socket_file(int family, int type, int protocol); extern int __sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen); extern int __sys_connect_file(struct file *file, struct sockaddr_storage *addr, int addrlen, int file_flags); -- cgit From da32b5817253697671af961715517bfbb308a592 Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Sat, 23 Apr 2022 11:07:49 +0100 Subject: mm: Add fault_in_subpage_writeable() to probe at sub-page granularity On hardware with features like arm64 MTE or SPARC ADI, an access fault can be triggered at sub-page granularity. Depending on how the fault_in_writeable() function is used, the caller can get into a live-lock by continuously retrying the fault-in on an address different from the one where the uaccess failed. In the majority of cases progress is ensured by the following conditions: 1. copy_to_user_nofault() guarantees at least one byte access if the user address is not faulting. 2. The fault_in_writeable() loop is resumed from the first address that could not be accessed by copy_to_user_nofault(). If the loop iteration is restarted from an earlier (initial) point, the loop is repeated with the same conditions and it would live-lock. Introduce an arch-specific probe_subpage_writeable() and call it from the newly added fault_in_subpage_writeable() function. The arch code with sub-page faults will have to implement the specific probing functionality. Note that no other fault_in_subpage_*() functions are added since they have no callers currently susceptible to a live-lock. Signed-off-by: Catalin Marinas Cc: Andrew Morton Link: https://lore.kernel.org/r/20220423100751.1870771-2-catalin.marinas@arm.com Signed-off-by: Catalin Marinas --- include/linux/pagemap.h | 1 + include/linux/uaccess.h | 22 ++++++++++++++++++++++ 2 files changed, 23 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 993994cd943a..6165283bdb6f 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -1046,6 +1046,7 @@ void folio_add_wait_queue(struct folio *folio, wait_queue_entry_t *waiter); * Fault in userspace address range. */ size_t fault_in_writeable(char __user *uaddr, size_t size); +size_t fault_in_subpage_writeable(char __user *uaddr, size_t size); size_t fault_in_safe_writeable(const char __user *uaddr, size_t size); size_t fault_in_readable(const char __user *uaddr, size_t size); diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h index 546179418ffa..5a328cf02b75 100644 --- a/include/linux/uaccess.h +++ b/include/linux/uaccess.h @@ -231,6 +231,28 @@ static inline bool pagefault_disabled(void) */ #define faulthandler_disabled() (pagefault_disabled() || in_atomic()) +#ifndef CONFIG_ARCH_HAS_SUBPAGE_FAULTS + +/** + * probe_subpage_writeable: probe the user range for write faults at sub-page + * granularity (e.g. arm64 MTE) + * @uaddr: start of address range + * @size: size of address range + * + * Returns 0 on success, the number of bytes not probed on fault. + * + * It is expected that the caller checked for the write permission of each + * page in the range either by put_user() or GUP. The architecture port can + * implement a more efficient get_user() probing if the same sub-page faults + * are triggered by either a read or a write. + */ +static inline size_t probe_subpage_writeable(char __user *uaddr, size_t size) +{ + return 0; +} + +#endif /* CONFIG_ARCH_HAS_SUBPAGE_FAULTS */ + #ifndef ARCH_HAS_NOCACHE_UACCESS static inline __must_check unsigned long -- cgit From 3c938cc5cebcbd2291fe97f523c0705a2c24c77d Mon Sep 17 00:00:00 2001 From: Schspa Shi Date: Tue, 19 Apr 2022 09:28:10 +0800 Subject: gpio: use raw spinlock for gpio chip shadowed data In case of PREEMPT_RT, there is a raw_spinlock -> spinlock dependency as the lockdep report shows. __irq_set_handler irq_get_desc_buslock __irq_get_desc_lock raw_spin_lock_irqsave(&desc->lock, *flags); // raw spinlock get here __irq_do_set_handler mask_ack_irq dwapb_irq_ack spin_lock_irqsave(&gc->bgpio_lock, flags); // sleep able spinlock irq_put_desc_busunlock Replace with a raw lock to avoid BUGs. This lock is only used to access registers, and It's safe to replace with the raw lock without bad influence. [ 15.090359][ T1] ============================= [ 15.090365][ T1] [ BUG: Invalid wait context ] [ 15.090373][ T1] 5.10.59-rt52-00983-g186a6841c682-dirty #3 Not tainted [ 15.090386][ T1] ----------------------------- [ 15.090392][ T1] swapper/0/1 is trying to lock: [ 15.090402][ T1] 70ff00018507c188 (&gc->bgpio_lock){....}-{3:3}, at: _raw_spin_lock_irqsave+0x1c/0x28 [ 15.090470][ T1] other info that might help us debug this: [ 15.090477][ T1] context-{5:5} [ 15.090485][ T1] 3 locks held by swapper/0/1: [ 15.090497][ T1] #0: c2ff0001816de1a0 (&dev->mutex){....}-{4:4}, at: __device_driver_lock+0x98/0x104 [ 15.090553][ T1] #1: ffff90001485b4b8 (irq_domain_mutex){+.+.}-{4:4}, at: irq_domain_associate+0xbc/0x6d4 [ 15.090606][ T1] #2: 4bff000185d7a8e0 (lock_class){....}-{2:2}, at: _raw_spin_lock_irqsave+0x1c/0x28 [ 15.090654][ T1] stack backtrace: [ 15.090661][ T1] CPU: 4 PID: 1 Comm: swapper/0 Not tainted 5.10.59-rt52-00983-g186a6841c682-dirty #3 [ 15.090682][ T1] Hardware name: Horizon Robotics Journey 5 DVB (DT) [ 15.090692][ T1] Call trace: ...... [ 15.090811][ T1] _raw_spin_lock_irqsave+0x1c/0x28 [ 15.090828][ T1] dwapb_irq_ack+0xb4/0x300 [ 15.090846][ T1] __irq_do_set_handler+0x494/0xb2c [ 15.090864][ T1] __irq_set_handler+0x74/0x114 [ 15.090881][ T1] irq_set_chip_and_handler_name+0x44/0x58 [ 15.090900][ T1] gpiochip_irq_map+0x210/0x644 Signed-off-by: Schspa Shi Reviewed-by: Andy Shevchenko Acked-by: Linus Walleij Acked-by: Doug Berger Acked-by: Serge Semin Signed-off-by: Bartosz Golaszewski --- include/linux/gpio/driver.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h index 98c93510640e..991d374dcf71 100644 --- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -436,7 +436,7 @@ struct gpio_chip { void __iomem *reg_dir_in; bool bgpio_dir_unreadable; int bgpio_bits; - spinlock_t bgpio_lock; + raw_spinlock_t bgpio_lock; unsigned long bgpio_data; unsigned long bgpio_dir; #endif /* CONFIG_GPIO_GENERIC */ -- cgit From 38035c04f5865c4ef9597d6beed6a7178f90f64a Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:13 +0300 Subject: inotify: move control flags from mask to mark flags The inotify control flags in the mark mask (e.g. FS_IN_ONE_SHOT) are not relevant to object interest mask, so move them to the mark flags. This frees up some bits in the object interest mask. Link: https://lore.kernel.org/r/20220422120327.3459282-3-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- include/linux/fsnotify_backend.h | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 0805b74cae44..b1c72edd9784 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -55,7 +55,6 @@ #define FS_ACCESS_PERM 0x00020000 /* access event in a permissions hook */ #define FS_OPEN_EXEC_PERM 0x00040000 /* open/exec event in a permission hook */ -#define FS_EXCL_UNLINK 0x04000000 /* do not send events if object is unlinked */ /* * Set on inode mark that cares about things that happen to its children. * Always set for dnotify and inotify. @@ -66,7 +65,6 @@ #define FS_RENAME 0x10000000 /* File was renamed */ #define FS_DN_MULTISHOT 0x20000000 /* dnotify multishot */ #define FS_ISDIR 0x40000000 /* event occurred against dir */ -#define FS_IN_ONESHOT 0x80000000 /* only send event once */ #define FS_MOVE (FS_MOVED_FROM | FS_MOVED_TO) @@ -106,8 +104,7 @@ FS_ERROR) /* Extra flags that may be reported with event or control handling of events */ -#define ALL_FSNOTIFY_FLAGS (FS_EXCL_UNLINK | FS_ISDIR | FS_IN_ONESHOT | \ - FS_DN_MULTISHOT | FS_EVENT_ON_CHILD) +#define ALL_FSNOTIFY_FLAGS (FS_ISDIR | FS_EVENT_ON_CHILD | FS_DN_MULTISHOT) #define ALL_FSNOTIFY_BITS (ALL_FSNOTIFY_EVENTS | ALL_FSNOTIFY_FLAGS) @@ -473,9 +470,14 @@ struct fsnotify_mark { struct fsnotify_mark_connector *connector; /* Events types to ignore [mark->lock, group->mark_mutex] */ __u32 ignored_mask; -#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x01 -#define FSNOTIFY_MARK_FLAG_ALIVE 0x02 -#define FSNOTIFY_MARK_FLAG_ATTACHED 0x04 + /* General fsnotify mark flags */ +#define FSNOTIFY_MARK_FLAG_ALIVE 0x0001 +#define FSNOTIFY_MARK_FLAG_ATTACHED 0x0002 + /* inotify mark flags */ +#define FSNOTIFY_MARK_FLAG_EXCL_UNLINK 0x0010 +#define FSNOTIFY_MARK_FLAG_IN_ONESHOT 0x0020 + /* fanotify mark flags */ +#define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x0100 unsigned int flags; /* flags [mark->lock] */ }; -- cgit From 867a448d587e7fa845bceaf4ee1c632448f2a9fa Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:15 +0300 Subject: fsnotify: pass flags argument to fsnotify_alloc_group() Add flags argument to fsnotify_alloc_group(), define and use the flag FSNOTIFY_GROUP_USER in inotify and fanotify instead of the helper fsnotify_alloc_user_group() to indicate user allocation. Although the flag FSNOTIFY_GROUP_USER is currently not used after group allocation, we store the flags argument in the group struct for future use of other group flags. Link: https://lore.kernel.org/r/20220422120327.3459282-5-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- include/linux/fsnotify_backend.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index b1c72edd9784..f0bf557af009 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -210,6 +210,9 @@ struct fsnotify_group { unsigned int priority; bool shutdown; /* group is being shut down, don't queue more events */ +#define FSNOTIFY_GROUP_USER 0x01 /* user allocated group */ + int flags; + /* stores all fastpath marks assoc with this group so they can be cleaned on unregister */ struct mutex mark_mutex; /* protect marks_list */ atomic_t user_waits; /* Number of tasks waiting for user @@ -543,8 +546,9 @@ static inline void fsnotify_update_flags(struct dentry *dentry) /* called from fsnotify listeners, such as fanotify or dnotify */ /* create a new group */ -extern struct fsnotify_group *fsnotify_alloc_group(const struct fsnotify_ops *ops); -extern struct fsnotify_group *fsnotify_alloc_user_group(const struct fsnotify_ops *ops); +extern struct fsnotify_group *fsnotify_alloc_group( + const struct fsnotify_ops *ops, + int flags); /* get reference to a group */ extern void fsnotify_get_group(struct fsnotify_group *group); /* drop reference on a group from fsnotify_alloc_group */ -- cgit From f3010343d9e119da35ee864b3a28993bb5c78ed7 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:16 +0300 Subject: fsnotify: make allow_dups a property of the group Instead of passing the allow_dups argument to fsnotify_add_mark() as an argument, define the group flag FSNOTIFY_GROUP_DUPS to express the allow_dups behavior and set this behavior at group creation time for all calls of fsnotify_add_mark(). Rename the allow_dups argument to generic add_flags argument for future use. Link: https://lore.kernel.org/r/20220422120327.3459282-6-amir73il@gmail.com Suggested-by: Jan Kara Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- include/linux/fsnotify_backend.h | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index f0bf557af009..dd440e6ff528 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -211,6 +211,7 @@ struct fsnotify_group { bool shutdown; /* group is being shut down, don't queue more events */ #define FSNOTIFY_GROUP_USER 0x01 /* user allocated group */ +#define FSNOTIFY_GROUP_DUPS 0x02 /* allow multiple marks per object */ int flags; /* stores all fastpath marks assoc with this group so they can be cleaned on unregister */ @@ -641,26 +642,26 @@ extern int fsnotify_get_conn_fsid(const struct fsnotify_mark_connector *conn, /* attach the mark to the object */ extern int fsnotify_add_mark(struct fsnotify_mark *mark, fsnotify_connp_t *connp, unsigned int obj_type, - int allow_dups, __kernel_fsid_t *fsid); + int add_flags, __kernel_fsid_t *fsid); extern int fsnotify_add_mark_locked(struct fsnotify_mark *mark, fsnotify_connp_t *connp, - unsigned int obj_type, int allow_dups, + unsigned int obj_type, int add_flags, __kernel_fsid_t *fsid); /* attach the mark to the inode */ static inline int fsnotify_add_inode_mark(struct fsnotify_mark *mark, struct inode *inode, - int allow_dups) + int add_flags) { return fsnotify_add_mark(mark, &inode->i_fsnotify_marks, - FSNOTIFY_OBJ_TYPE_INODE, allow_dups, NULL); + FSNOTIFY_OBJ_TYPE_INODE, add_flags, NULL); } static inline int fsnotify_add_inode_mark_locked(struct fsnotify_mark *mark, struct inode *inode, - int allow_dups) + int add_flags) { return fsnotify_add_mark_locked(mark, &inode->i_fsnotify_marks, - FSNOTIFY_OBJ_TYPE_INODE, allow_dups, + FSNOTIFY_OBJ_TYPE_INODE, add_flags, NULL); } -- cgit From 43b245a788e2d8f1bb742668a9bdace02fcb3e96 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:17 +0300 Subject: fsnotify: create helpers for group mark_mutex lock Create helpers to take and release the group mark_mutex lock. Define a flag FSNOTIFY_GROUP_NOFS in fsnotify_group that determines if the mark_mutex lock is fs reclaim safe or not. If not safe, the lock helpers take the lock and disable direct fs reclaim. In that case we annotate the mutex with a different lockdep class to express to lockdep that an allocation of mark of an fs reclaim safe group may take the group lock of another "NOFS" group to evict inodes. For now, converted only the callers in common code and no backend defines the NOFS flag. It is intended to be set by fanotify for evictable marks support. Link: https://lore.kernel.org/r/20220422120327.3459282-7-amir73il@gmail.com Suggested-by: Jan Kara Link: https://lore.kernel.org/r/20220321112310.vpr7oxro2xkz5llh@quack3.lan/ Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- include/linux/fsnotify_backend.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index dd440e6ff528..d62111e83244 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -20,6 +20,7 @@ #include #include #include +#include /* * IN_* from inotfy.h lines up EXACTLY with FS_*, this is so we can easily @@ -212,7 +213,9 @@ struct fsnotify_group { #define FSNOTIFY_GROUP_USER 0x01 /* user allocated group */ #define FSNOTIFY_GROUP_DUPS 0x02 /* allow multiple marks per object */ +#define FSNOTIFY_GROUP_NOFS 0x04 /* group lock is not direct reclaim safe */ int flags; + unsigned int owner_flags; /* stored flags of mark_mutex owner */ /* stores all fastpath marks assoc with this group so they can be cleaned on unregister */ struct mutex mark_mutex; /* protect marks_list */ @@ -254,6 +257,31 @@ struct fsnotify_group { }; }; +/* + * These helpers are used to prevent deadlock when reclaiming inodes with + * evictable marks of the same group that is allocating a new mark. + */ +static inline void fsnotify_group_lock(struct fsnotify_group *group) +{ + mutex_lock(&group->mark_mutex); + if (group->flags & FSNOTIFY_GROUP_NOFS) + group->owner_flags = memalloc_nofs_save(); +} + +static inline void fsnotify_group_unlock(struct fsnotify_group *group) +{ + if (group->flags & FSNOTIFY_GROUP_NOFS) + memalloc_nofs_restore(group->owner_flags); + mutex_unlock(&group->mark_mutex); +} + +static inline void fsnotify_group_assert_locked(struct fsnotify_group *group) +{ + WARN_ON_ONCE(!mutex_is_locked(&group->mark_mutex)); + if (group->flags & FSNOTIFY_GROUP_NOFS) + WARN_ON_ONCE(!(current->flags & PF_MEMALLOC_NOFS)); +} + /* When calling fsnotify tell it if the data is a path or inode */ enum fsnotify_data_type { FSNOTIFY_EVENT_NONE, -- cgit From c3638b5b13740fa31762d414bbce8b7a694e582a Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:22 +0300 Subject: fsnotify: allow adding an inode mark without pinning inode fsnotify_add_mark() and variants implicitly take a reference on inode when attaching a mark to an inode. Make that behavior opt-out with the mark flag FSNOTIFY_MARK_FLAG_NO_IREF. Instead of taking the inode reference when attaching connector to inode and dropping the inode reference when detaching connector from inode, take the inode reference on attach of the first mark that wants to hold an inode reference and drop the inode reference on detach of the last mark that wants to hold an inode reference. Backends can "upgrade" an existing mark to take an inode reference, but cannot "downgrade" a mark with inode reference to release the refernce. This leaves the choice to the backend whether or not to pin the inode when adding an inode mark. This is intended to be used when adding a mark with ignored mask that is used for optimization in cases where group can afford getting unneeded events and reinstate the mark with ignored mask when inode is accessed again after being evicted. Link: https://lore.kernel.org/r/20220422120327.3459282-12-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- include/linux/fsnotify_backend.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index d62111e83244..9a1a9e78f69f 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -456,6 +456,7 @@ struct fsnotify_mark_connector { spinlock_t lock; unsigned short type; /* Type of object [lock] */ #define FSNOTIFY_CONN_FLAG_HAS_FSID 0x01 +#define FSNOTIFY_CONN_FLAG_HAS_IREF 0x02 unsigned short flags; /* flags [lock] */ __kernel_fsid_t fsid; /* fsid of filesystem containing object */ union { @@ -510,6 +511,7 @@ struct fsnotify_mark { #define FSNOTIFY_MARK_FLAG_IN_ONESHOT 0x0020 /* fanotify mark flags */ #define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x0100 +#define FSNOTIFY_MARK_FLAG_NO_IREF 0x0200 unsigned int flags; /* flags [mark->lock] */ }; -- cgit From 5f9d3bd520261fd7a850818c71809fd580e0f30c Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Fri, 22 Apr 2022 15:03:27 +0300 Subject: fanotify: enable "evictable" inode marks Now that the direct reclaim path is handled we can enable evictable inode marks. Link: https://lore.kernel.org/r/20220422120327.3459282-17-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- include/linux/fanotify.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index 419cadcd7ff5..edc28555814c 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -66,6 +66,7 @@ FAN_MARK_ONLYDIR | \ FAN_MARK_IGNORED_MASK | \ FAN_MARK_IGNORED_SURV_MODIFY | \ + FAN_MARK_EVICTABLE | \ FAN_MARK_FLUSH) /* -- cgit From 430c3500995484962bdbccf358201afef8055535 Mon Sep 17 00:00:00 2001 From: Richard Fitzgerald Date: Mon, 25 Apr 2022 10:51:59 +0100 Subject: firmware: cirrus: cs_dsp: Avoid padding bytes in cs_dsp_coeff_ctl Change the order of members in struct cs_dsp_coeff_ctl to avoid the compiler having to insert alignment padding bytes. On a x86_64 build this saves 16 bytes per control. - Pointers are collected to the top of the struct (with the exception of priv, as noted below), so that they are inherently aligned. - The set and enable bitflags are placed together so they can be merged. - priv is placed at the end of the struct - it is for use by the client so it is helpful to make it stand out, and since the compiler will always pad the struct size to an alignment multiple putting a pointer last won't introduce any more padding. - struct cs_dsp_alg_region is placed at the end, right before priv, for the same reasoning as priv. Signed-off-by: Richard Fitzgerald Reviewed-by: Charles Keepax Link: https://lore.kernel.org/r/20220425095159.3044527-1-rf@opensource.cirrus.com Signed-off-by: Mark Brown --- include/linux/firmware/cirrus/cs_dsp.h | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/firmware/cirrus/cs_dsp.h b/include/linux/firmware/cirrus/cs_dsp.h index 38b4da3ddfe4..30055706cce2 100644 --- a/include/linux/firmware/cirrus/cs_dsp.h +++ b/include/linux/firmware/cirrus/cs_dsp.h @@ -68,36 +68,36 @@ struct cs_dsp_alg_region { /** * struct cs_dsp_coeff_ctl - Describes a coefficient control + * @list: List node for internal use + * @dsp: DSP instance associated with this control + * @cache: Cached value of the control * @fw_name: Name of the firmware * @subname: Name of the control parsed from the WMFW * @subname_len: Length of subname - * @alg_region: Logical region associated with this control - * @dsp: DSP instance associated with this control - * @enabled: Flag indicating whether control is enabled - * @list: List node for internal use - * @cache: Cached value of the control * @offset: Offset of control within alg_region in words * @len: Length of the cached value in bytes - * @set: Flag indicating the value has been written by the user - * @flags: Bitfield of WMFW_CTL_FLAG_ control flags defined in wmfw.h * @type: One of the WMFW_CTL_TYPE_ control types defined in wmfw.h + * @flags: Bitfield of WMFW_CTL_FLAG_ control flags defined in wmfw.h + * @set: Flag indicating the value has been written by the user + * @enabled: Flag indicating whether control is enabled + * @alg_region: Logical region associated with this control * @priv: For use by the client */ struct cs_dsp_coeff_ctl { + struct list_head list; + struct cs_dsp *dsp; + void *cache; const char *fw_name; /* Subname is needed to match with firmware */ const char *subname; unsigned int subname_len; - struct cs_dsp_alg_region alg_region; - struct cs_dsp *dsp; - unsigned int enabled:1; - struct list_head list; - void *cache; unsigned int offset; size_t len; - unsigned int set:1; - unsigned int flags; unsigned int type; + unsigned int flags; + unsigned int set:1; + unsigned int enabled:1; + struct cs_dsp_alg_region alg_region; void *priv; }; -- cgit From 66200bbcde697dea7f48071d325ea4dec6f66b11 Mon Sep 17 00:00:00 2001 From: Michael Kelley Date: Tue, 12 Apr 2022 19:49:00 -0700 Subject: Drivers: hv: vmbus: Add VMbus IMC device to unsupported list Hyper-V may offer an Initial Machine Configuration (IMC) synthetic device to guest VMs. The device may be used by Windows guests to get specialization information, such as the hostname. But the device is not used in Linux and there is no Linux driver, so it is unsupported. Currently, the IMC device GUID is not recognized by the VMbus driver, which results in an "Unknown GUID" error message during boot. Add the GUID to the list of known but unsupported devices so that the error message is not generated. Other than avoiding the error message, there is no change in guest behavior. Signed-off-by: Michael Kelley Link: https://lore.kernel.org/r/1649818140-100953-1-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu --- include/linux/hyperv.h | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index fe2e0179ed51..2d085d951ee4 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1451,12 +1451,14 @@ void vmbus_free_mmio(resource_size_t start, resource_size_t size); 0x80, 0x2e, 0x27, 0xed, 0xe1, 0x9f) /* - * Linux doesn't support the 3 devices: the first two are for - * Automatic Virtual Machine Activation, and the third is for - * Remote Desktop Virtualization. + * Linux doesn't support these 4 devices: the first two are for + * Automatic Virtual Machine Activation, the third is for + * Remote Desktop Virtualization, and the fourth is Initial + * Machine Configuration (IMC) used only by Windows guests. * {f8e65716-3cb3-4a06-9a60-1889c5cccab5} * {3375baf4-9e15-4b30-b765-67acb10d607b} * {276aacf4-ac15-426c-98dd-7521ad3f01fe} + * {c376c1c3-d276-48d2-90a9-c04748072c60} */ #define HV_AVMA1_GUID \ @@ -1471,6 +1473,10 @@ void vmbus_free_mmio(resource_size_t start, resource_size_t size); .guid = GUID_INIT(0x276aacf4, 0xac15, 0x426c, 0x98, 0xdd, \ 0x75, 0x21, 0xad, 0x3f, 0x01, 0xfe) +#define HV_IMC_GUID \ + .guid = GUID_INIT(0xc376c1c3, 0xd276, 0x48d2, 0x90, 0xa9, \ + 0xc0, 0x47, 0x48, 0x07, 0x2c, 0x60) + /* * Common header for Hyper-V ICs */ -- cgit From b03afa57c65e1e045e02df49777e953742745f4c Mon Sep 17 00:00:00 2001 From: "Andrea Parri (Microsoft)" Date: Tue, 19 Apr 2022 14:23:22 +0200 Subject: Drivers: hv: vmbus: Introduce vmbus_sendpacket_getid() The function can be used to send a VMbus packet and retrieve the corresponding transaction ID. It will be used by hv_pci. No functional change. Suggested-by: Michael Kelley Signed-off-by: Andrea Parri (Microsoft) Reviewed-by: Michael Kelley Link: https://lore.kernel.org/r/20220419122325.10078-4-parri.andrea@gmail.com Signed-off-by: Wei Liu --- include/linux/hyperv.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 2d085d951ee4..7cb8718825ee 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1161,6 +1161,13 @@ extern int vmbus_open(struct vmbus_channel *channel, extern void vmbus_close(struct vmbus_channel *channel); +extern int vmbus_sendpacket_getid(struct vmbus_channel *channel, + void *buffer, + u32 bufferLen, + u64 requestid, + u64 *trans_id, + enum vmbus_packet_type type, + u32 flags); extern int vmbus_sendpacket(struct vmbus_channel *channel, void *buffer, u32 bufferLen, -- cgit From 0aadb6a7bb811554cf39318b5d18e8ec50dd9f02 Mon Sep 17 00:00:00 2001 From: "Andrea Parri (Microsoft)" Date: Tue, 19 Apr 2022 14:23:23 +0200 Subject: Drivers: hv: vmbus: Introduce vmbus_request_addr_match() The function can be used to retrieve and clear/remove a transation ID from a channel requestor, provided the memory address corresponding to the ID equals a specified address. The function, and its 'lockless' variant __vmbus_request_addr_match(), will be used by hv_pci. Refactor vmbus_request_addr() to reuse the 'newly' introduced code. No functional change. Suggested-by: Michael Kelley Signed-off-by: Andrea Parri (Microsoft) Reviewed-by: Michael Kelley Link: https://lore.kernel.org/r/20220419122325.10078-5-parri.andrea@gmail.com Signed-off-by: Wei Liu --- include/linux/hyperv.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 7cb8718825ee..2e4962a2fbec 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -788,6 +788,7 @@ struct vmbus_requestor { #define VMBUS_NO_RQSTOR U64_MAX #define VMBUS_RQST_ERROR (U64_MAX - 1) +#define VMBUS_RQST_ADDR_ANY U64_MAX /* NetVSC-specific */ #define VMBUS_RQST_ID_NO_RESPONSE (U64_MAX - 2) /* StorVSC-specific */ @@ -1042,6 +1043,10 @@ struct vmbus_channel { }; u64 vmbus_next_request_id(struct vmbus_channel *channel, u64 rqst_addr); +u64 __vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id, + u64 rqst_addr); +u64 vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id, + u64 rqst_addr); u64 vmbus_request_addr(struct vmbus_channel *channel, u64 trans_id); static inline bool is_hvsock_channel(const struct vmbus_channel *c) -- cgit From b91eaf7267cf7aec0a4e087decf7770dfb694d78 Mon Sep 17 00:00:00 2001 From: "Andrea Parri (Microsoft)" Date: Tue, 19 Apr 2022 14:23:24 +0200 Subject: Drivers: hv: vmbus: Introduce {lock,unlock}_requestor() To abtract the lock and unlock operations on the requestor spin lock. The helpers will come in handy in hv_pci. No functional change. Suggested-by: Michael Kelley Signed-off-by: Andrea Parri (Microsoft) Reviewed-by: Michael Kelley Link: https://lore.kernel.org/r/20220419122325.10078-6-parri.andrea@gmail.com Signed-off-by: Wei Liu --- include/linux/hyperv.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 2e4962a2fbec..460a716f4748 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1042,6 +1042,21 @@ struct vmbus_channel { u32 max_pkt_size; }; +#define lock_requestor(channel, flags) \ +do { \ + struct vmbus_requestor *rqstor = &(channel)->requestor; \ + \ + spin_lock_irqsave(&rqstor->req_lock, flags); \ +} while (0) + +static __always_inline void unlock_requestor(struct vmbus_channel *channel, + unsigned long flags) +{ + struct vmbus_requestor *rqstor = &channel->requestor; + + spin_unlock_irqrestore(&rqstor->req_lock, flags); +} + u64 vmbus_next_request_id(struct vmbus_channel *channel, u64 rqst_addr); u64 __vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id, u64 rqst_addr); -- cgit From 067c098766c6af667a9002d4e33cf1f3c998abbe Mon Sep 17 00:00:00 2001 From: Frank Rowand Date: Wed, 20 Apr 2022 17:25:05 -0500 Subject: of: overlay: rework overlay apply and remove kfree()s Fix various kfree() issues related to of_overlay_apply(). - Double kfree() of fdt and tree when init_overlay_changeset() returns an error. - free_overlay_changeset() free the root of the unflattened overlay (variable tree) instead of the memory that contains the unflattened overlay. - For the case of a failure during applying an overlay, move kfree() of new_fdt and overlay_mem into free_overlay_changeset(), which is called by the function that allocated them. - For the case of removing an overlay, the kfree() of new_fdt and overlay_mem remains in free_overlay_changeset(). - Check return value of of_fdt_unflatten_tree() for error instead of checking the returned value of overlay_root. - When storing pointers to allocated objects in ovcs, do so as near to the allocation as possible instead of in deeply layered function. More clearly document policy related to lifetime of pointers into overlay memory. Double kfree() Reported-by: Slawomir Stepien Signed-off-by: Frank Rowand Signed-off-by: Rob Herring Link: https://lore.kernel.org/r/20220420222505.928492-3-frowand.list@gmail.com --- include/linux/of.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/of.h b/include/linux/of.h index 04971e85fbc9..17741eee0ca4 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -1543,7 +1543,8 @@ static inline bool of_device_is_system_power_controller(const struct device_node */ enum of_overlay_notify_action { - OF_OVERLAY_PRE_APPLY = 0, + OF_OVERLAY_INIT = 0, /* kzalloc() of ovcs sets this value */ + OF_OVERLAY_PRE_APPLY, OF_OVERLAY_POST_APPLY, OF_OVERLAY_PRE_REMOVE, OF_OVERLAY_POST_REMOVE, -- cgit From 5b0e58542acb3672f42e08cb1cd57784f5e08d6b Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Thu, 7 Apr 2022 12:08:54 +0200 Subject: net: ieee802154: Enhance/fix the names of the MLME return codes Let's keep these definitions as close to the specification as possible while they are not yet in use. The names get slightly longer, but we gain the minor cost of being able to search the spec more easily. Signed-off-by: Miquel Raynal Acked-by: Alexander Aring Link: https://lore.kernel.org/r/20220407100903.1695973-2-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt --- include/linux/ieee802154.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ieee802154.h b/include/linux/ieee802154.h index 95c831162212..01d945c8b2e1 100644 --- a/include/linux/ieee802154.h +++ b/include/linux/ieee802154.h @@ -136,16 +136,16 @@ enum { IEEE802154_SUCCESS = 0x0, /* The beacon was lost following a synchronization request. */ - IEEE802154_BEACON_LOSS = 0xe0, + IEEE802154_BEACON_LOST = 0xe0, /* * A transmission could not take place due to activity on the * channel, i.e., the CSMA-CA mechanism has failed. */ - IEEE802154_CHNL_ACCESS_FAIL = 0xe1, + IEEE802154_CHANNEL_ACCESS_FAILURE = 0xe1, /* The GTS request has been denied by the PAN coordinator. */ - IEEE802154_DENINED = 0xe2, + IEEE802154_DENIED = 0xe2, /* The attempt to disable the transceiver has failed. */ - IEEE802154_DISABLE_TRX_FAIL = 0xe3, + IEEE802154_DISABLE_TRX_FAILURE = 0xe3, /* * The received frame induces a failed security check according to * the security suite. @@ -185,9 +185,9 @@ enum { * A PAN identifier conflict has been detected and communicated to the * PAN coordinator. */ - IEEE802154_PANID_CONFLICT = 0xee, + IEEE802154_PAN_ID_CONFLICT = 0xee, /* A coordinator realignment command has been received. */ - IEEE802154_REALIGMENT = 0xef, + IEEE802154_REALIGNMENT = 0xef, /* The transaction has expired and its information discarded. */ IEEE802154_TRANSACTION_EXPIRED = 0xf0, /* There is no capacity to store the transaction. */ @@ -203,7 +203,7 @@ enum { * A SET/GET request was issued with the identifier of a PIB attribute * that is not supported. */ - IEEE802154_UNSUPPORTED_ATTR = 0xf4, + IEEE802154_UNSUPPORTED_ATTRIBUTE = 0xf4, /* * A request to perform a scan operation failed because the MLME was * in the process of performing a previously initiated scan operation. -- cgit From f06cfc233ac67e26ed137f4e098c047939cf1530 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Thu, 7 Apr 2022 12:08:55 +0200 Subject: net: ieee802154: Fill the list of MLME return codes There are more codes than already listed, let's be a bit more exhaustive. This will allow to drop device drivers local definitions of these codes. Signed-off-by: Miquel Raynal Acked-by: Alexander Aring Link: https://lore.kernel.org/r/20220407100903.1695973-3-miquel.raynal@bootlin.com Signed-off-by: Stefan Schmidt --- include/linux/ieee802154.h | 67 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 66 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ieee802154.h b/include/linux/ieee802154.h index 01d945c8b2e1..f1f9412b6ac6 100644 --- a/include/linux/ieee802154.h +++ b/include/linux/ieee802154.h @@ -134,7 +134,35 @@ enum { * a successful transmission. */ IEEE802154_SUCCESS = 0x0, - + /* The requested operation failed. */ + IEEE802154_MAC_ERROR = 0x1, + /* The requested operation has been cancelled. */ + IEEE802154_CANCELLED = 0x2, + /* + * Device is ready to poll the coordinator for data in a non beacon + * enabled PAN. + */ + IEEE802154_READY_FOR_POLL = 0x3, + /* Wrong frame counter. */ + IEEE802154_COUNTER_ERROR = 0xdb, + /* + * The frame does not conforms to the incoming key usage policy checking + * procedure. + */ + IEEE802154_IMPROPER_KEY_TYPE = 0xdc, + /* + * The frame does not conforms to the incoming security level usage + * policy checking procedure. + */ + IEEE802154_IMPROPER_SECURITY_LEVEL = 0xdd, + /* Secured frame received with an empty Frame Version field. */ + IEEE802154_UNSUPPORTED_LEGACY = 0xde, + /* + * A secured frame is received or must be sent but security is not + * enabled in the device. Or, the Auxiliary Security Header has security + * level of zero in it. + */ + IEEE802154_UNSUPPORTED_SECURITY = 0xdf, /* The beacon was lost following a synchronization request. */ IEEE802154_BEACON_LOST = 0xe0, /* @@ -204,11 +232,48 @@ enum { * that is not supported. */ IEEE802154_UNSUPPORTED_ATTRIBUTE = 0xf4, + /* Missing source or destination address or address mode. */ + IEEE802154_INVALID_ADDRESS = 0xf5, + /* + * MLME asked to turn the receiver on, but the on time duration is too + * big compared to the macBeaconOrder. + */ + IEEE802154_ON_TIME_TOO_LONG = 0xf6, + /* + * MLME asaked to turn the receiver on, but the request was delayed for + * too long before getting processed. + */ + IEEE802154_PAST_TIME = 0xf7, + /* + * The StartTime parameter is nonzero, and the MLME is not currently + * tracking the beacon of the coordinator through which it is + * associated. + */ + IEEE802154_TRACKING_OFF = 0xf8, + /* + * The index inside the hierarchical values in PIBAttribute is out of + * range. + */ + IEEE802154_INVALID_INDEX = 0xf9, + /* + * The number of PAN descriptors discovered during a scan has been + * reached. + */ + IEEE802154_LIMIT_REACHED = 0xfa, + /* + * The PIBAttribute parameter specifies an attribute that is a read-only + * attribute. + */ + IEEE802154_READ_ONLY = 0xfb, /* * A request to perform a scan operation failed because the MLME was * in the process of performing a previously initiated scan operation. */ IEEE802154_SCAN_IN_PROGRESS = 0xfc, + /* The outgoing superframe overlaps the incoming superframe. */ + IEEE802154_SUPERFRAME_OVERLAP = 0xfd, + /* Any other error situation. */ + IEEE802154_SYSTEM_ERROR = 0xff, }; /* frame control handling */ -- cgit From c83227a5d05ed77b634ce4c2fc5f143ae2a4d6f5 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Thu, 21 Apr 2022 22:06:54 +0200 Subject: irq/gpio: ixp4xx: Drop boardfile probe path The boardfiles for IXP4xx have been deleted. Delete all the quirks and code dealing with that boot path and rely solely on device tree boot. Fix some missing static keywords that the kernel test robot was complaining about while we're at it. Cc: Bartosz Golaszewski Acked-by: Marc Zyngier Signed-off-by: Linus Walleij Signed-off-by: Bartosz Golaszewski --- include/linux/irqchip/irq-ixp4xx.h | 12 ------------ 1 file changed, 12 deletions(-) delete mode 100644 include/linux/irqchip/irq-ixp4xx.h (limited to 'include/linux') diff --git a/include/linux/irqchip/irq-ixp4xx.h b/include/linux/irqchip/irq-ixp4xx.h deleted file mode 100644 index 9395917d6936..000000000000 --- a/include/linux/irqchip/irq-ixp4xx.h +++ /dev/null @@ -1,12 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __IRQ_IXP4XX_H -#define __IRQ_IXP4XX_H - -#include -struct irq_domain; - -void ixp4xx_irq_init(resource_size_t irqbase, - bool is_356); -struct irq_domain *ixp4xx_get_irq_domain(void); - -#endif /* __IRQ_IXP4XX_H */ -- cgit From fae74fb5d525f979085b6e70b883d7a7049bf15f Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Mon, 25 Apr 2022 19:32:55 +0200 Subject: gpio: pcf857x: Make teardown callback return void MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit All teardown functions return 0. Also there is little sense in returning a negative error code from an i2c remove function as this only results in emitting an error message but the device is removed nevertheless. This patch is a preparation for making i2c remove callbacks return void. Signed-off-by: Uwe Kleine-König Signed-off-by: Bartosz Golaszewski --- include/linux/platform_data/pcf857x.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/pcf857x.h b/include/linux/platform_data/pcf857x.h index 11d4ed78c7f4..01d0a3ea3aef 100644 --- a/include/linux/platform_data/pcf857x.h +++ b/include/linux/platform_data/pcf857x.h @@ -36,7 +36,7 @@ struct pcf857x_platform_data { int (*setup)(struct i2c_client *client, int gpio, unsigned ngpio, void *context); - int (*teardown)(struct i2c_client *client, + void (*teardown)(struct i2c_client *client, int gpio, unsigned ngpio, void *context); void *context; -- cgit From d9d31cf88702ae071bec033e5c8714048aa71285 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Mon, 25 Apr 2022 15:04:48 -0700 Subject: bpf: Use bpf_prog_run_array_cg_flags everywhere Rename bpf_prog_run_array_cg_flags to bpf_prog_run_array_cg and use it everywhere. check_return_code already enforces sane return ranges for all cgroup types. (only egress and bind hooks have uncanonical return ranges, the rest is using [0, 1]) No functional changes. v2: - 'func_ret & 1' under explicit test (Andrii & Martin) Suggested-by: Alexei Starovoitov Signed-off-by: Stanislav Fomichev Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220425220448.3669032-1-sdf@google.com --- include/linux/bpf-cgroup.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 88a51b242adc..669d96d074ad 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -225,24 +225,20 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk, #define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, atype) \ ({ \ - u32 __unused_flags; \ int __ret = 0; \ if (cgroup_bpf_enabled(atype)) \ __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype, \ - NULL, \ - &__unused_flags); \ + NULL, NULL); \ __ret; \ }) #define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, atype, t_ctx) \ ({ \ - u32 __unused_flags; \ int __ret = 0; \ if (cgroup_bpf_enabled(atype)) { \ lock_sock(sk); \ __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype, \ - t_ctx, \ - &__unused_flags); \ + t_ctx, NULL); \ release_sock(sk); \ } \ __ret; \ -- cgit From 61df10c7799e27807ad5e459eec9d77cddf8bf45 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Mon, 25 Apr 2022 03:18:49 +0530 Subject: bpf: Allow storing unreferenced kptr in map This commit introduces a new pointer type 'kptr' which can be embedded in a map value to hold a PTR_TO_BTF_ID stored by a BPF program during its invocation. When storing such a kptr, BPF program's PTR_TO_BTF_ID register must have the same type as in the map value's BTF, and loading a kptr marks the destination register as PTR_TO_BTF_ID with the correct kernel BTF and BTF ID. Such kptr are unreferenced, i.e. by the time another invocation of the BPF program loads this pointer, the object which the pointer points to may not longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are patched to PROBE_MEM loads by the verifier, it would safe to allow user to still access such invalid pointer, but passing such pointers into BPF helpers and kfuncs should not be permitted. A future patch in this series will close this gap. The flexibility offered by allowing programs to dereference such invalid pointers while being safe at runtime frees the verifier from doing complex lifetime tracking. As long as the user may ensure that the object remains valid, it can ensure data read by it from the kernel object is valid. The user indicates that a certain pointer must be treated as kptr capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using a BTF type tag 'kptr' on the pointed to type of the pointer. Then, this information is recorded in the object BTF which will be passed into the kernel by way of map's BTF information. The name and kind from the map value BTF is used to look up the in-kernel type, and the actual BTF and BTF ID is recorded in the map struct in a new kptr_off_tab member. For now, only storing pointers to structs is permitted. An example of this specification is shown below: #define __kptr __attribute__((btf_type_tag("kptr"))) struct map_value { ... struct task_struct __kptr *task; ... }; Then, in a BPF program, user may store PTR_TO_BTF_ID with the type task_struct into the map, and then load it later. Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as the verifier cannot know whether the value is NULL or not statically, it must treat all potential loads at that map value offset as loading a possibly NULL pointer. Only BPF_LDX, BPF_STX, and BPF_ST (with insn->imm = 0 to denote NULL) are allowed instructions that can access such a pointer. On BPF_LDX, the destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX, it is checked whether the source register type is a PTR_TO_BTF_ID with same BTF type as specified in the map BTF. The access size must always be BPF_DW. For the map in map support, the kptr_off_tab for outer map is copied from the inner map's kptr_off_tab. It was chosen to do a deep copy instead of introducing a refcount to kptr_off_tab, because the copy only needs to be done when paramterizing using inner_map_fd in the map in map case, hence would be unnecessary for all other users. It is not permitted to use MAP_FREEZE command and mmap for BPF map having kptrs, similar to the bpf_timer case. A kptr also requires that BPF program has both read and write access to the map (hence both BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG are disallowed). Note that check_map_access must be called from both check_helper_mem_access and for the BPF instructions, hence the kptr check must distinguish between ACCESS_DIRECT and ACCESS_HELPER, and reject ACCESS_HELPER cases. We rename stack_access_src to bpf_access_src and reuse it for this purpose. Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220424214901.2743946-2-memxor@gmail.com --- include/linux/bpf.h | 31 ++++++++++++++++++++++++++++++- include/linux/btf.h | 2 ++ 2 files changed, 32 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 7bf441563ffc..ce12124048c0 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -155,6 +155,24 @@ struct bpf_map_ops { const struct bpf_iter_seq_info *iter_seq_info; }; +enum { + /* Support at most 8 pointers in a BPF map value */ + BPF_MAP_VALUE_OFF_MAX = 8, +}; + +struct bpf_map_value_off_desc { + u32 offset; + struct { + struct btf *btf; + u32 btf_id; + } kptr; +}; + +struct bpf_map_value_off { + u32 nr_off; + struct bpf_map_value_off_desc off[]; +}; + struct bpf_map { /* The first two cachelines with read-mostly members of which some * are also accessed in fast-path (e.g. ops, max_entries). @@ -171,6 +189,7 @@ struct bpf_map { u64 map_extra; /* any per-map-type extra fields */ u32 map_flags; int spin_lock_off; /* >=0 valid offset, <0 error */ + struct bpf_map_value_off *kptr_off_tab; int timer_off; /* >=0 valid offset, <0 error */ u32 id; int numa_node; @@ -184,7 +203,7 @@ struct bpf_map { char name[BPF_OBJ_NAME_LEN]; bool bypass_spec_v1; bool frozen; /* write-once; write-protected by freeze_mutex */ - /* 14 bytes hole */ + /* 6 bytes hole */ /* The 3rd and 4th cacheline with misc members to avoid false sharing * particularly with refcounting. @@ -217,6 +236,11 @@ static inline bool map_value_has_timer(const struct bpf_map *map) return map->timer_off >= 0; } +static inline bool map_value_has_kptrs(const struct bpf_map *map) +{ + return !IS_ERR_OR_NULL(map->kptr_off_tab); +} + static inline void check_and_init_map_value(struct bpf_map *map, void *dst) { if (unlikely(map_value_has_spin_lock(map))) @@ -1396,6 +1420,11 @@ void bpf_prog_put(struct bpf_prog *prog); void bpf_prog_free_id(struct bpf_prog *prog, bool do_idr_lock); void bpf_map_free_id(struct bpf_map *map, bool do_idr_lock); +struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u32 offset); +void bpf_map_free_kptr_off_tab(struct bpf_map *map); +struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map); +bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b); + struct bpf_map *bpf_map_get(u32 ufd); struct bpf_map *bpf_map_get_with_uref(u32 ufd); struct bpf_map *__bpf_map_get(struct fd f); diff --git a/include/linux/btf.h b/include/linux/btf.h index 36bc09b8e890..19c297f9a52f 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -123,6 +123,8 @@ bool btf_member_is_reg_int(const struct btf *btf, const struct btf_type *s, u32 expected_offset, u32 expected_size); int btf_find_spin_lock(const struct btf *btf, const struct btf_type *t); int btf_find_timer(const struct btf *btf, const struct btf_type *t); +struct bpf_map_value_off *btf_parse_kptrs(const struct btf *btf, + const struct btf_type *t); bool btf_type_is_void(const struct btf_type *t); s32 btf_find_by_name_kind(const struct btf *btf, const char *name, u8 kind); const struct btf_type *btf_type_skip_modifiers(const struct btf *btf, -- cgit From 8f14852e89113d738c99c375b4c8b8b7e1073df1 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Mon, 25 Apr 2022 03:18:50 +0530 Subject: bpf: Tag argument to be released in bpf_func_proto Add a new type flag for bpf_arg_type that when set tells verifier that for a release function, that argument's register will be the one for which meta.ref_obj_id will be set, and which will then be released using release_reference. To capture the regno, introduce a new field release_regno in bpf_call_arg_meta. This would be required in the next patch, where we may either pass NULL or a refcounted pointer as an argument to the release function bpf_kptr_xchg. Just releasing only when meta.ref_obj_id is set is not enough, as there is a case where the type of argument needed matches, but the ref_obj_id is set to 0. Hence, we must enforce that whenever meta.ref_obj_id is zero, the register that is to be released can only be NULL for a release function. Since we now indicate whether an argument is to be released in bpf_func_proto itself, is_release_function helper has lost its utitlity, hence refactor code to work without it, and just rely on meta.release_regno to know when to release state for a ref_obj_id. Still, the restriction of one release argument and only one ref_obj_id passed to BPF helper or kfunc remains. This may be lifted in the future. Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220424214901.2743946-3-memxor@gmail.com --- include/linux/bpf.h | 5 ++++- include/linux/bpf_verifier.h | 3 +-- 2 files changed, 5 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index ce12124048c0..492edd2c5713 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -366,7 +366,10 @@ enum bpf_type_flag { */ MEM_PERCPU = BIT(4 + BPF_BASE_TYPE_BITS), - __BPF_TYPE_LAST_FLAG = MEM_PERCPU, + /* Indicates that the argument will be released. */ + OBJ_RELEASE = BIT(5 + BPF_BASE_TYPE_BITS), + + __BPF_TYPE_LAST_FLAG = OBJ_RELEASE, }; /* Max number of base types. */ diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 3a9d2d7cc6b7..1f1e7f2ea967 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -523,8 +523,7 @@ int check_ptr_off_reg(struct bpf_verifier_env *env, const struct bpf_reg_state *reg, int regno); int check_func_arg_reg_off(struct bpf_verifier_env *env, const struct bpf_reg_state *reg, int regno, - enum bpf_arg_type arg_type, - bool is_release_func); + enum bpf_arg_type arg_type); int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg, u32 regno); int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg, -- cgit From c0a5a21c25f37c9fd7b36072f9968cdff1e4aa13 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Mon, 25 Apr 2022 03:18:51 +0530 Subject: bpf: Allow storing referenced kptr in map Extending the code in previous commits, introduce referenced kptr support, which needs to be tagged using 'kptr_ref' tag instead. Unlike unreferenced kptr, referenced kptr have a lot more restrictions. In addition to the type matching, only a newly introduced bpf_kptr_xchg helper is allowed to modify the map value at that offset. This transfers the referenced pointer being stored into the map, releasing the references state for the program, and returning the old value and creating new reference state for the returned pointer. Similar to unreferenced pointer case, return value for this case will also be PTR_TO_BTF_ID_OR_NULL. The reference for the returned pointer must either be eventually released by calling the corresponding release function, otherwise it must be transferred into another map. It is also allowed to call bpf_kptr_xchg with a NULL pointer, to clear the value, and obtain the old value if any. BPF_LDX, BPF_STX, and BPF_ST cannot access referenced kptr. A future commit will permit using BPF_LDX for such pointers, but attempt at making it safe, since the lifetime of object won't be guaranteed. There are valid reasons to enforce the restriction of permitting only bpf_kptr_xchg to operate on referenced kptr. The pointer value must be consistent in face of concurrent modification, and any prior values contained in the map must also be released before a new one is moved into the map. To ensure proper transfer of this ownership, bpf_kptr_xchg returns the old value, which the verifier would require the user to either free or move into another map, and releases the reference held for the pointer being moved in. In the future, direct BPF_XCHG instruction may also be permitted to work like bpf_kptr_xchg helper. Note that process_kptr_func doesn't have to call check_helper_mem_access, since we already disallow rdonly/wronly flags for map, which is what check_map_access_type checks, and we already ensure the PTR_TO_MAP_VALUE refers to kptr by obtaining its off_desc, so check_map_access is also not required. Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220424214901.2743946-4-memxor@gmail.com --- include/linux/bpf.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 492edd2c5713..24310837bafc 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -160,8 +160,14 @@ enum { BPF_MAP_VALUE_OFF_MAX = 8, }; +enum bpf_kptr_type { + BPF_KPTR_UNREF, + BPF_KPTR_REF, +}; + struct bpf_map_value_off_desc { u32 offset; + enum bpf_kptr_type type; struct { struct btf *btf; u32 btf_id; @@ -418,6 +424,7 @@ enum bpf_arg_type { ARG_PTR_TO_STACK, /* pointer to stack */ ARG_PTR_TO_CONST_STR, /* pointer to a null terminated read-only string */ ARG_PTR_TO_TIMER, /* pointer to bpf_timer */ + ARG_PTR_TO_KPTR, /* pointer to referenced kptr */ __BPF_ARG_TYPE_MAX, /* Extended arg_types. */ @@ -427,6 +434,7 @@ enum bpf_arg_type { ARG_PTR_TO_SOCKET_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_SOCKET, ARG_PTR_TO_ALLOC_MEM_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_ALLOC_MEM, ARG_PTR_TO_STACK_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_STACK, + ARG_PTR_TO_BTF_ID_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_BTF_ID, /* This must be the last entry. Its purpose is to ensure the enum is * wide enough to hold the higher bits reserved for bpf_type_flag. -- cgit From 6efe152d4061a83a2c5db6a0e5b58f9345c9742f Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Mon, 25 Apr 2022 03:18:52 +0530 Subject: bpf: Prevent escaping of kptr loaded from maps While we can guarantee that even for unreferenced kptr, the object pointer points to being freed etc. can be handled by the verifier's exception handling (normal load patching to PROBE_MEM loads), we still cannot allow the user to pass these pointers to BPF helpers and kfunc, because the same exception handling won't be done for accesses inside the kernel. The same is true if a referenced pointer is loaded using normal load instruction. Since the reference is not guaranteed to be held while the pointer is used, it must be marked as untrusted. Hence introduce a new type flag, PTR_UNTRUSTED, which is used to mark all registers loading unreferenced and referenced kptr from BPF maps, and ensure they can never escape the BPF program and into the kernel by way of calling stable/unstable helpers. In check_ptr_to_btf_access, the !type_may_be_null check to reject type flags is still correct, as apart from PTR_MAYBE_NULL, only MEM_USER, MEM_PERCPU, and PTR_UNTRUSTED may be set for PTR_TO_BTF_ID. The first two are checked inside the function and rejected using a proper error message, but we still want to allow dereference of untrusted case. Also, we make sure to inherit PTR_UNTRUSTED when chain of pointers are walked, so that this flag is never dropped once it has been set on a PTR_TO_BTF_ID (i.e. trusted to untrusted transition can only be in one direction). In convert_ctx_accesses, extend the switch case to consider untrusted PTR_TO_BTF_ID in addition to normal PTR_TO_BTF_ID for PROBE_MEM conversion for BPF_LDX. Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220424214901.2743946-5-memxor@gmail.com --- include/linux/bpf.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 24310837bafc..e13a5cbd4ebb 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -375,7 +375,15 @@ enum bpf_type_flag { /* Indicates that the argument will be released. */ OBJ_RELEASE = BIT(5 + BPF_BASE_TYPE_BITS), - __BPF_TYPE_LAST_FLAG = OBJ_RELEASE, + /* PTR is not trusted. This is only used with PTR_TO_BTF_ID, to mark + * unreferenced and referenced kptr loaded from map value using a load + * instruction, so that they can only be dereferenced but not escape the + * BPF program into the kernel (i.e. cannot be passed as arguments to + * kfunc or bpf helpers). + */ + PTR_UNTRUSTED = BIT(6 + BPF_BASE_TYPE_BITS), + + __BPF_TYPE_LAST_FLAG = PTR_UNTRUSTED, }; /* Max number of base types. */ -- cgit From 4d7d7f69f4b104b2ddeec6a1e7fcfd2d044ed8c4 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Mon, 25 Apr 2022 03:18:53 +0530 Subject: bpf: Adapt copy_map_value for multiple offset case Since now there might be at most 10 offsets that need handling in copy_map_value, the manual shuffling and special case is no longer going to work. Hence, let's generalise the copy_map_value function by using a sorted array of offsets to skip regions that must be avoided while copying into and out of a map value. When the map is created, we populate the offset array in struct map, Then, copy_map_value uses this sorted offset array is used to memcpy while skipping timer, spin lock, and kptr. The array is allocated as in most cases none of these special fields would be present in map value, hence we can save on space for the common case by not embedding the entire object inside bpf_map struct. Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220424214901.2743946-6-memxor@gmail.com --- include/linux/bpf.h | 56 ++++++++++++++++++++++++++++------------------------- 1 file changed, 30 insertions(+), 26 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index e13a5cbd4ebb..4e12b27a5de6 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -158,6 +158,9 @@ struct bpf_map_ops { enum { /* Support at most 8 pointers in a BPF map value */ BPF_MAP_VALUE_OFF_MAX = 8, + BPF_MAP_OFF_ARR_MAX = BPF_MAP_VALUE_OFF_MAX + + 1 + /* for bpf_spin_lock */ + 1, /* for bpf_timer */ }; enum bpf_kptr_type { @@ -179,6 +182,12 @@ struct bpf_map_value_off { struct bpf_map_value_off_desc off[]; }; +struct bpf_map_off_arr { + u32 cnt; + u32 field_off[BPF_MAP_OFF_ARR_MAX]; + u8 field_sz[BPF_MAP_OFF_ARR_MAX]; +}; + struct bpf_map { /* The first two cachelines with read-mostly members of which some * are also accessed in fast-path (e.g. ops, max_entries). @@ -207,10 +216,7 @@ struct bpf_map { struct mem_cgroup *memcg; #endif char name[BPF_OBJ_NAME_LEN]; - bool bypass_spec_v1; - bool frozen; /* write-once; write-protected by freeze_mutex */ - /* 6 bytes hole */ - + struct bpf_map_off_arr *off_arr; /* The 3rd and 4th cacheline with misc members to avoid false sharing * particularly with refcounting. */ @@ -230,6 +236,8 @@ struct bpf_map { bool jited; bool xdp_has_frags; } owner; + bool bypass_spec_v1; + bool frozen; /* write-once; write-protected by freeze_mutex */ }; static inline bool map_value_has_spin_lock(const struct bpf_map *map) @@ -253,37 +261,33 @@ static inline void check_and_init_map_value(struct bpf_map *map, void *dst) memset(dst + map->spin_lock_off, 0, sizeof(struct bpf_spin_lock)); if (unlikely(map_value_has_timer(map))) memset(dst + map->timer_off, 0, sizeof(struct bpf_timer)); + if (unlikely(map_value_has_kptrs(map))) { + struct bpf_map_value_off *tab = map->kptr_off_tab; + int i; + + for (i = 0; i < tab->nr_off; i++) + *(u64 *)(dst + tab->off[i].offset) = 0; + } } /* copy everything but bpf_spin_lock and bpf_timer. There could be one of each. */ static inline void copy_map_value(struct bpf_map *map, void *dst, void *src) { - u32 s_off = 0, s_sz = 0, t_off = 0, t_sz = 0; + u32 curr_off = 0; + int i; - if (unlikely(map_value_has_spin_lock(map))) { - s_off = map->spin_lock_off; - s_sz = sizeof(struct bpf_spin_lock); - } - if (unlikely(map_value_has_timer(map))) { - t_off = map->timer_off; - t_sz = sizeof(struct bpf_timer); + if (likely(!map->off_arr)) { + memcpy(dst, src, map->value_size); + return; } - if (unlikely(s_sz || t_sz)) { - if (s_off < t_off || !s_sz) { - swap(s_off, t_off); - swap(s_sz, t_sz); - } - memcpy(dst, src, t_off); - memcpy(dst + t_off + t_sz, - src + t_off + t_sz, - s_off - t_off - t_sz); - memcpy(dst + s_off + s_sz, - src + s_off + s_sz, - map->value_size - s_off - s_sz); - } else { - memcpy(dst, src, map->value_size); + for (i = 0; i < map->off_arr->cnt; i++) { + u32 next_off = map->off_arr->field_off[i]; + + memcpy(dst + curr_off, src + curr_off, next_off - curr_off); + curr_off += map->off_arr->field_sz[i]; } + memcpy(dst + curr_off, src + curr_off, map->value_size - curr_off); } void copy_map_value_locked(struct bpf_map *map, void *dst, void *src, bool lock_src); -- cgit From 5ce937d613a423ca3102f53d9f3daf4210c1b6e2 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Mon, 25 Apr 2022 03:18:54 +0530 Subject: bpf: Populate pairs of btf_id and destructor kfunc in btf To support storing referenced PTR_TO_BTF_ID in maps, we require associating a specific BTF ID with a 'destructor' kfunc. This is because we need to release a live referenced pointer at a certain offset in map value from the map destruction path, otherwise we end up leaking resources. Hence, introduce support for passing an array of btf_id, kfunc_btf_id pairs that denote a BTF ID and its associated release function. Then, add an accessor 'btf_find_dtor_kfunc' which can be used to look up the destructor kfunc of a certain BTF ID. If found, we can use it to free the object from the map free path. The registration of these pairs also serve as a whitelist of structures which are allowed as referenced PTR_TO_BTF_ID in a BPF map, because without finding the destructor kfunc, we will bail and return an error. Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220424214901.2743946-7-memxor@gmail.com --- include/linux/btf.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'include/linux') diff --git a/include/linux/btf.h b/include/linux/btf.h index 19c297f9a52f..fea424681d66 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -40,6 +40,11 @@ struct btf_kfunc_id_set { }; }; +struct btf_id_dtor_kfunc { + u32 btf_id; + u32 kfunc_btf_id; +}; + extern const struct file_operations btf_fops; void btf_get(struct btf *btf); @@ -346,6 +351,9 @@ bool btf_kfunc_id_set_contains(const struct btf *btf, enum btf_kfunc_type type, u32 kfunc_btf_id); int register_btf_kfunc_id_set(enum bpf_prog_type prog_type, const struct btf_kfunc_id_set *s); +s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id); +int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, u32 add_cnt, + struct module *owner); #else static inline const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id) @@ -369,6 +377,15 @@ static inline int register_btf_kfunc_id_set(enum bpf_prog_type prog_type, { return 0; } +static inline s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id) +{ + return -ENOENT; +} +static inline int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dtors, + u32 add_cnt, struct module *owner) +{ + return 0; +} #endif #endif -- cgit From 14a324f6a67ef6a53e04362a70160a47eb8afffa Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Mon, 25 Apr 2022 03:18:55 +0530 Subject: bpf: Wire up freeing of referenced kptr A destructor kfunc can be defined as void func(type *), where type may be void or any other pointer type as per convenience. In this patch, we ensure that the type is sane and capture the function pointer into off_desc of ptr_off_tab for the specific pointer offset, with the invariant that the dtor pointer is always set when 'kptr_ref' tag is applied to the pointer's pointee type, which is indicated by the flag BPF_MAP_VALUE_OFF_F_REF. Note that only BTF IDs whose destructor kfunc is registered, thus become the allowed BTF IDs for embedding as referenced kptr. Hence it serves the purpose of finding dtor kfunc BTF ID, as well acting as a check against the whitelist of allowed BTF IDs for this purpose. Finally, wire up the actual freeing of the referenced pointer if any at all available offsets, so that no references are leaked after the BPF map goes away and the BPF program previously moved the ownership a referenced pointer into it. The behavior is similar to BPF timers, where bpf_map_{update,delete}_elem will free any existing referenced kptr. The same case is with LRU map's bpf_lru_push_free/htab_lru_push_free functions, which are extended to reset unreferenced and free referenced kptr. Note that unlike BPF timers, kptr is not reset or freed when map uref drops to zero. Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220424214901.2743946-8-memxor@gmail.com --- include/linux/bpf.h | 4 ++++ include/linux/btf.h | 2 ++ 2 files changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 4e12b27a5de6..6141564c76c8 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -23,6 +23,7 @@ #include #include #include +#include struct bpf_verifier_env; struct bpf_verifier_log; @@ -173,6 +174,8 @@ struct bpf_map_value_off_desc { enum bpf_kptr_type type; struct { struct btf *btf; + struct module *module; + btf_dtor_kfunc_t dtor; u32 btf_id; } kptr; }; @@ -1447,6 +1450,7 @@ struct bpf_map_value_off_desc *bpf_map_kptr_off_contains(struct bpf_map *map, u3 void bpf_map_free_kptr_off_tab(struct bpf_map *map); struct bpf_map_value_off *bpf_map_copy_kptr_off_tab(const struct bpf_map *map); bool bpf_map_equal_kptr_off_tab(const struct bpf_map *map_a, const struct bpf_map *map_b); +void bpf_map_free_kptrs(struct bpf_map *map, void *map_value); struct bpf_map *bpf_map_get(u32 ufd); struct bpf_map *bpf_map_get_with_uref(u32 ufd); diff --git a/include/linux/btf.h b/include/linux/btf.h index fea424681d66..f70625dd5bb4 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -45,6 +45,8 @@ struct btf_id_dtor_kfunc { u32 kfunc_btf_id; }; +typedef void (*btf_dtor_kfunc_t)(void *); + extern const struct file_operations btf_fops; void btf_get(struct btf *btf); -- cgit From a1ef195996526da45bbc9710849254023df75aea Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Mon, 25 Apr 2022 03:18:56 +0530 Subject: bpf: Teach verifier about kptr_get kfunc helpers We introduce a new style of kfunc helpers, namely *_kptr_get, where they take pointer to the map value which points to a referenced kernel pointer contained in the map. Since this is referenced, only bpf_kptr_xchg from BPF side and xchg from kernel side is allowed to change the current value, and each pointer that resides in that location would be referenced, and RCU protected (this must be kept in mind while adding kernel types embeddable as reference kptr in BPF maps). This means that if do the load of the pointer value in an RCU read section, and find a live pointer, then as long as we hold RCU read lock, it won't be freed by a parallel xchg + release operation. This allows us to implement a safe refcount increment scheme. Hence, enforce that first argument of all such kfunc is a proper PTR_TO_MAP_VALUE pointing at the right offset to referenced pointer. For the rest of the arguments, they are subjected to typical kfunc argument checks, hence allowing some flexibility in passing more intent into how the reference should be taken. For instance, in case of struct nf_conn, it is not freed until RCU grace period ends, but can still be reused for another tuple once refcount has dropped to zero. Hence, a bpf_ct_kptr_get helper not only needs to call refcount_inc_not_zero, but also do a tuple match after incrementing the reference, and when it fails to match it, put the reference again and return NULL. This can be implemented easily if we allow passing additional parameters to the bpf_ct_kptr_get kfunc, like a struct bpf_sock_tuple * and a tuple__sz pair. Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220424214901.2743946-9-memxor@gmail.com --- include/linux/btf.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/btf.h b/include/linux/btf.h index f70625dd5bb4..2611cea2c2b6 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -17,6 +17,7 @@ enum btf_kfunc_type { BTF_KFUNC_TYPE_ACQUIRE, BTF_KFUNC_TYPE_RELEASE, BTF_KFUNC_TYPE_RET_NULL, + BTF_KFUNC_TYPE_KPTR_ACQUIRE, BTF_KFUNC_TYPE_MAX, }; @@ -35,6 +36,7 @@ struct btf_kfunc_id_set { struct btf_id_set *acquire_set; struct btf_id_set *release_set; struct btf_id_set *ret_null_set; + struct btf_id_set *kptr_acquire_set; }; struct btf_id_set *sets[BTF_KFUNC_TYPE_MAX]; }; -- cgit From 2ab3b3808eb17f729edfd69e061667ca0a427195 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Mon, 25 Apr 2022 03:18:57 +0530 Subject: bpf: Make BTF type match stricter for release arguments The current of behavior of btf_struct_ids_match for release arguments is that when type match fails, it retries with first member type again (recursively). Since the offset is already 0, this is akin to just casting the pointer in normal C, since if type matches it was just embedded inside parent sturct as an object. However, we want to reject cases for release function type matching, be it kfunc or BPF helpers. An example is the following: struct foo { struct bar b; }; struct foo *v = acq_foo(); rel_bar(&v->b); // btf_struct_ids_match fails btf_types_are_same, then // retries with first member type and succeeds, while // it should fail. Hence, don't walk the struct and only rely on btf_types_are_same for strict mode. All users of strict mode must be dealing with zero offset anyway, since otherwise they would want the struct to be walked. Signed-off-by: Kumar Kartikeya Dwivedi Signed-off-by: Alexei Starovoitov Link: https://lore.kernel.org/bpf/20220424214901.2743946-10-memxor@gmail.com --- include/linux/bpf.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 6141564c76c8..0af5793ba417 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1748,7 +1748,8 @@ int btf_struct_access(struct bpf_verifier_log *log, const struct btf *btf, u32 *next_btf_id, enum bpf_type_flag *flag); bool btf_struct_ids_match(struct bpf_verifier_log *log, const struct btf *btf, u32 id, int off, - const struct btf *need_btf, u32 need_type_id); + const struct btf *need_btf, u32 need_type_id, + bool strict); int btf_distill_func_proto(struct bpf_verifier_log *log, struct btf *btf, -- cgit From 66eb6df79aefd6b3f7d2e749da7104e90cedc0ff Mon Sep 17 00:00:00 2001 From: Andrew Davis Date: Mon, 25 Apr 2022 09:16:16 -0500 Subject: tee: remove tee_shm_va2pa() and tee_shm_pa2va() We should not need to index into SHMs based on absolute VA/PA. These functions are not used and this kind of usage should not be encouraged anyway. Remove these functions. Signed-off-by: Andrew Davis Reviewed-by: Sumit Garg Signed-off-by: Jens Wiklander --- include/linux/tee_drv.h | 18 ------------------ 1 file changed, 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tee_drv.h b/include/linux/tee_drv.h index 911cad324acc..17eb1c5205d3 100644 --- a/include/linux/tee_drv.h +++ b/include/linux/tee_drv.h @@ -298,24 +298,6 @@ void tee_shm_free(struct tee_shm *shm); */ void tee_shm_put(struct tee_shm *shm); -/** - * tee_shm_va2pa() - Get physical address of a virtual address - * @shm: Shared memory handle - * @va: Virtual address to tranlsate - * @pa: Returned physical address - * @returns 0 on success and < 0 on failure - */ -int tee_shm_va2pa(struct tee_shm *shm, void *va, phys_addr_t *pa); - -/** - * tee_shm_pa2va() - Get virtual address of a physical address - * @shm: Shared memory handle - * @pa: Physical address to tranlsate - * @va: Returned virtual address - * @returns 0 on success and < 0 on failure - */ -int tee_shm_pa2va(struct tee_shm *shm, phys_addr_t pa, void **va); - /** * tee_shm_get_va() - Get virtual address of a shared memory plus an offset * @shm: Shared memory handle -- cgit From 97730bbb242cde22b7140acd202ffd88823886c9 Mon Sep 17 00:00:00 2001 From: Russ Weight Date: Thu, 21 Apr 2022 14:22:00 -0700 Subject: firmware_loader: Add firmware-upload support Extend the firmware subsystem to support a persistent sysfs interface that userspace may use to initiate a firmware update. For example, FPGA based PCIe cards load firmware and FPGA images from local FLASH when the card boots. The images in FLASH may be updated with new images provided by the user at his/her convenience. A device driver may call firmware_upload_register() to expose persistent "loading" and "data" sysfs files. These files are used in the same way as the fallback sysfs "loading" and "data" files. When 0 is written to "loading" to complete the write of firmware data, the data is transferred to the lower-level driver using pre-registered call-back functions. The data transfer is done in the context of a kernel worker thread. Reviewed-by: Luis Chamberlain Reviewed-by: Tianfei zhang Tested-by: Matthew Gerlach Signed-off-by: Russ Weight Link: https://lore.kernel.org/r/20220421212204.36052-5-russell.h.weight@intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/firmware.h | 82 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 82 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firmware.h b/include/linux/firmware.h index ec2ccfebef65..de7fea3bca51 100644 --- a/include/linux/firmware.h +++ b/include/linux/firmware.h @@ -17,6 +17,64 @@ struct firmware { void *priv; }; +/** + * enum fw_upload_err - firmware upload error codes + * @FW_UPLOAD_ERR_NONE: returned to indicate success + * @FW_UPLOAD_ERR_HW_ERROR: error signalled by hardware, see kernel log + * @FW_UPLOAD_ERR_TIMEOUT: SW timed out on handshake with HW/firmware + * @FW_UPLOAD_ERR_CANCELED: upload was cancelled by the user + * @FW_UPLOAD_ERR_BUSY: there is an upload operation already in progress + * @FW_UPLOAD_ERR_INVALID_SIZE: invalid firmware image size + * @FW_UPLOAD_ERR_RW_ERROR: read or write to HW failed, see kernel log + * @FW_UPLOAD_ERR_WEAROUT: FLASH device is approaching wear-out, wait & retry + * @FW_UPLOAD_ERR_MAX: Maximum error code marker + */ +enum fw_upload_err { + FW_UPLOAD_ERR_NONE, + FW_UPLOAD_ERR_HW_ERROR, + FW_UPLOAD_ERR_TIMEOUT, + FW_UPLOAD_ERR_CANCELED, + FW_UPLOAD_ERR_BUSY, + FW_UPLOAD_ERR_INVALID_SIZE, + FW_UPLOAD_ERR_RW_ERROR, + FW_UPLOAD_ERR_WEAROUT, + FW_UPLOAD_ERR_MAX +}; + +struct fw_upload { + void *dd_handle; /* reference to parent driver */ + void *priv; /* firmware loader private fields */ +}; + +/** + * struct fw_upload_ops - device specific operations to support firmware upload + * @prepare: Required: Prepare secure update + * @write: Required: The write() op receives the remaining + * size to be written and must return the actual + * size written or a negative error code. The write() + * op will be called repeatedly until all data is + * written. + * @poll_complete: Required: Check for the completion of the + * HW authentication/programming process. + * @cancel: Required: Request cancellation of update. This op + * is called from the context of a different kernel + * thread, so race conditions need to be considered. + * @cleanup: Optional: Complements the prepare() + * function and is called at the completion + * of the update, on success or failure, if the + * prepare function succeeded. + */ +struct fw_upload_ops { + enum fw_upload_err (*prepare)(struct fw_upload *fw_upload, + const u8 *data, u32 size); + enum fw_upload_err (*write)(struct fw_upload *fw_upload, + const u8 *data, u32 offset, + u32 size, u32 *written); + enum fw_upload_err (*poll_complete)(struct fw_upload *fw_upload); + void (*cancel)(struct fw_upload *fw_upload); + void (*cleanup)(struct fw_upload *fw_upload); +}; + struct module; struct device; @@ -112,6 +170,30 @@ static inline int request_partial_firmware_into_buf #endif +#ifdef CONFIG_FW_UPLOAD + +struct fw_upload * +firmware_upload_register(struct module *module, struct device *parent, + const char *name, const struct fw_upload_ops *ops, + void *dd_handle); +void firmware_upload_unregister(struct fw_upload *fw_upload); + +#else + +static inline struct fw_upload * +firmware_upload_register(struct module *module, struct device *parent, + const char *name, const struct fw_upload_ops *ops, + void *dd_handle) +{ + return ERR_PTR(-EINVAL); +} + +static inline void firmware_upload_unregister(struct fw_upload *fw_upload) +{ +} + +#endif + int firmware_request_cache(struct device *device, const char *name); #endif -- cgit From d434743e5cac3558381b767f343eef5af246a4dc Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Tue, 5 Apr 2022 19:27:37 +0530 Subject: bus: mhi: ep: Add support for registering MHI endpoint controllers This commit adds support for registering MHI endpoint controller drivers with the MHI endpoint stack. MHI endpoint controller drivers manage the interaction with the host machines (such as x86). They are also the MHI endpoint bus master in charge of managing the physical link between the host and endpoint device. Eventhough the MHI spec is bus agnostic, the current implementation is entirely based on PCIe bus. The endpoint controller driver encloses all information about the underlying physical bus like PCIe. The registration process involves parsing the channel configuration and allocating an MHI EP device. Channels used in the endpoint stack follows the perspective of the MHI host stack. i.e., UL - From host to endpoint DL - From endpoint to host Reviewed-by: Alex Elder Signed-off-by: Manivannan Sadhasivam Link: https://lore.kernel.org/r/20220405135754.6622-2-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/mhi_ep.h | 136 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 136 insertions(+) create mode 100644 include/linux/mhi_ep.h (limited to 'include/linux') diff --git a/include/linux/mhi_ep.h b/include/linux/mhi_ep.h new file mode 100644 index 000000000000..891b199a9703 --- /dev/null +++ b/include/linux/mhi_ep.h @@ -0,0 +1,136 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (c) 2022, Linaro Ltd. + * + */ +#ifndef _MHI_EP_H_ +#define _MHI_EP_H_ + +#include +#include + +#define MHI_EP_DEFAULT_MTU 0x8000 + +/** + * struct mhi_ep_channel_config - Channel configuration structure for controller + * @name: The name of this channel + * @num: The number assigned to this channel + * @num_elements: The number of elements that can be queued to this channel + * @dir: Direction that data may flow on this channel + */ +struct mhi_ep_channel_config { + char *name; + u32 num; + u32 num_elements; + enum dma_data_direction dir; +}; + +/** + * struct mhi_ep_cntrl_config - MHI Endpoint controller configuration + * @mhi_version: MHI spec version supported by the controller + * @max_channels: Maximum number of channels supported + * @num_channels: Number of channels defined in @ch_cfg + * @ch_cfg: Array of defined channels + */ +struct mhi_ep_cntrl_config { + u32 mhi_version; + u32 max_channels; + u32 num_channels; + const struct mhi_ep_channel_config *ch_cfg; +}; + +/** + * struct mhi_ep_db_info - MHI Endpoint doorbell info + * @mask: Mask of the doorbell interrupt + * @status: Status of the doorbell interrupt + */ +struct mhi_ep_db_info { + u32 mask; + u32 status; +}; + +/** + * struct mhi_ep_cntrl - MHI Endpoint controller structure + * @cntrl_dev: Pointer to the struct device of physical bus acting as the MHI + * Endpoint controller + * @mhi_dev: MHI Endpoint device instance for the controller + * @mmio: MMIO region containing the MHI registers + * @mhi_chan: Points to the channel configuration table + * @mhi_event: Points to the event ring configurations table + * @mhi_cmd: Points to the command ring configurations table + * @sm: MHI Endpoint state machine + * @raise_irq: CB function for raising IRQ to the host + * @alloc_map: CB function for allocating memory in endpoint for storing host context and mapping it + * @unmap_free: CB function to unmap and free the allocated memory in endpoint for storing host context + * @read_from_host: CB function for reading from host memory from endpoint + * @write_to_host: CB function for writing to host memory from endpoint + * @mhi_state: MHI Endpoint state + * @max_chan: Maximum channels supported by the endpoint controller + * @mru: MRU (Maximum Receive Unit) value of the endpoint controller + * @index: MHI Endpoint controller index + */ +struct mhi_ep_cntrl { + struct device *cntrl_dev; + struct mhi_ep_device *mhi_dev; + void __iomem *mmio; + + struct mhi_ep_chan *mhi_chan; + struct mhi_ep_event *mhi_event; + struct mhi_ep_cmd *mhi_cmd; + struct mhi_ep_sm *sm; + + void (*raise_irq)(struct mhi_ep_cntrl *mhi_cntrl, u32 vector); + int (*alloc_map)(struct mhi_ep_cntrl *mhi_cntrl, u64 pci_addr, phys_addr_t *phys_ptr, + void __iomem **virt, size_t size); + void (*unmap_free)(struct mhi_ep_cntrl *mhi_cntrl, u64 pci_addr, phys_addr_t phys, + void __iomem *virt, size_t size); + int (*read_from_host)(struct mhi_ep_cntrl *mhi_cntrl, u64 from, void *to, size_t size); + int (*write_to_host)(struct mhi_ep_cntrl *mhi_cntrl, void *from, u64 to, size_t size); + + enum mhi_state mhi_state; + + u32 max_chan; + u32 mru; + u32 index; +}; + +/** + * struct mhi_ep_device - Structure representing an MHI Endpoint device that binds + * to channels or is associated with controllers + * @dev: Driver model device node for the MHI Endpoint device + * @mhi_cntrl: Controller the device belongs to + * @id: Pointer to MHI Endpoint device ID struct + * @name: Name of the associated MHI Endpoint device + * @ul_chan: UL channel for the device + * @dl_chan: DL channel for the device + * @dev_type: MHI device type + */ +struct mhi_ep_device { + struct device dev; + struct mhi_ep_cntrl *mhi_cntrl; + const struct mhi_device_id *id; + const char *name; + struct mhi_ep_chan *ul_chan; + struct mhi_ep_chan *dl_chan; + enum mhi_device_type dev_type; +}; + +#define to_mhi_ep_device(dev) container_of(dev, struct mhi_ep_device, dev) + +/** + * mhi_ep_register_controller - Register MHI Endpoint controller + * @mhi_cntrl: MHI Endpoint controller to register + * @config: Configuration to use for the controller + * + * Return: 0 if controller registrations succeeds, a negative error code otherwise. + */ +int mhi_ep_register_controller(struct mhi_ep_cntrl *mhi_cntrl, + const struct mhi_ep_cntrl_config *config); + +/** + * mhi_ep_unregister_controller - Unregister MHI Endpoint controller + * @mhi_cntrl: MHI Endpoint controller to unregister + */ +void mhi_ep_unregister_controller(struct mhi_ep_cntrl *mhi_cntrl); + +#endif -- cgit From ee0360b20b3fa08e2eee300eacb45e812ae44578 Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Tue, 5 Apr 2022 19:27:38 +0530 Subject: bus: mhi: ep: Add support for registering MHI endpoint client drivers This commit adds support for registering MHI endpoint client drivers with the MHI endpoint stack. MHI endpoint client drivers bind to one or more MHI endpoint devices inorder to send and receive the upper-layer protocol packets like IP packets, modem control messages, and diagnostics messages over MHI bus. Reviewed-by: Hemant Kumar Reviewed-by: Alex Elder Signed-off-by: Manivannan Sadhasivam Link: https://lore.kernel.org/r/20220405135754.6622-3-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/mhi_ep.h | 57 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 55 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mhi_ep.h b/include/linux/mhi_ep.h index 891b199a9703..e2b94f9eb846 100644 --- a/include/linux/mhi_ep.h +++ b/include/linux/mhi_ep.h @@ -101,8 +101,8 @@ struct mhi_ep_cntrl { * @mhi_cntrl: Controller the device belongs to * @id: Pointer to MHI Endpoint device ID struct * @name: Name of the associated MHI Endpoint device - * @ul_chan: UL channel for the device - * @dl_chan: DL channel for the device + * @ul_chan: UL (from host to endpoint) channel for the device + * @dl_chan: DL (from endpoint to host) channel for the device * @dev_type: MHI device type */ struct mhi_ep_device { @@ -115,7 +115,60 @@ struct mhi_ep_device { enum mhi_device_type dev_type; }; +/** + * struct mhi_ep_driver - Structure representing a MHI Endpoint client driver + * @id_table: Pointer to MHI Endpoint device ID table + * @driver: Device driver model driver + * @probe: CB function for client driver probe function + * @remove: CB function for client driver remove function + * @ul_xfer_cb: CB function for UL (from host to endpoint) data transfer + * @dl_xfer_cb: CB function for DL (from endpoint to host) data transfer + */ +struct mhi_ep_driver { + const struct mhi_device_id *id_table; + struct device_driver driver; + int (*probe)(struct mhi_ep_device *mhi_ep, + const struct mhi_device_id *id); + void (*remove)(struct mhi_ep_device *mhi_ep); + void (*ul_xfer_cb)(struct mhi_ep_device *mhi_dev, + struct mhi_result *result); + void (*dl_xfer_cb)(struct mhi_ep_device *mhi_dev, + struct mhi_result *result); +}; + #define to_mhi_ep_device(dev) container_of(dev, struct mhi_ep_device, dev) +#define to_mhi_ep_driver(drv) container_of(drv, struct mhi_ep_driver, driver) + +/* + * module_mhi_ep_driver() - Helper macro for drivers that don't do + * anything special other than using default mhi_ep_driver_register() and + * mhi_ep_driver_unregister(). This eliminates a lot of boilerplate. + * Each module may only use this macro once. + */ +#define module_mhi_ep_driver(mhi_drv) \ + module_driver(mhi_drv, mhi_ep_driver_register, \ + mhi_ep_driver_unregister) + +/* + * Macro to avoid include chaining to get THIS_MODULE + */ +#define mhi_ep_driver_register(mhi_drv) \ + __mhi_ep_driver_register(mhi_drv, THIS_MODULE) + +/** + * __mhi_ep_driver_register - Register a driver with MHI Endpoint bus + * @mhi_drv: Driver to be associated with the device + * @owner: The module owner + * + * Return: 0 if driver registrations succeeds, a negative error code otherwise. + */ +int __mhi_ep_driver_register(struct mhi_ep_driver *mhi_drv, struct module *owner); + +/** + * mhi_ep_driver_unregister - Unregister a driver from MHI Endpoint bus + * @mhi_drv: Driver associated with the device + */ +void mhi_ep_driver_unregister(struct mhi_ep_driver *mhi_drv); /** * mhi_ep_register_controller - Register MHI Endpoint controller -- cgit From e9e4da23cd65ea76ba658346f5c182791bd1cea9 Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Tue, 5 Apr 2022 19:27:40 +0530 Subject: bus: mhi: ep: Add support for managing MMIO registers Add support for managing the Memory Mapped Input Output (MMIO) registers of the MHI bus. All MHI operations are carried out using the MMIO registers by both host and the endpoint device. The MMIO registers reside inside the endpoint device memory (fixed location based on the platform) and the address is passed by the MHI EP controller driver during its registration. Reviewed-by: Alex Elder Signed-off-by: Manivannan Sadhasivam Link: https://lore.kernel.org/r/20220405135754.6622-5-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/mhi_ep.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mhi_ep.h b/include/linux/mhi_ep.h index e2b94f9eb846..5db048e258e4 100644 --- a/include/linux/mhi_ep.h +++ b/include/linux/mhi_ep.h @@ -59,6 +59,10 @@ struct mhi_ep_db_info { * @mhi_event: Points to the event ring configurations table * @mhi_cmd: Points to the command ring configurations table * @sm: MHI Endpoint state machine + * @ch_ctx_host_pa: Physical address of host channel context data structure + * @ev_ctx_host_pa: Physical address of host event context data structure + * @cmd_ctx_host_pa: Physical address of host command context data structure + * @chdb: Array of channel doorbell interrupt info * @raise_irq: CB function for raising IRQ to the host * @alloc_map: CB function for allocating memory in endpoint for storing host context and mapping it * @unmap_free: CB function to unmap and free the allocated memory in endpoint for storing host context @@ -67,6 +71,10 @@ struct mhi_ep_db_info { * @mhi_state: MHI Endpoint state * @max_chan: Maximum channels supported by the endpoint controller * @mru: MRU (Maximum Receive Unit) value of the endpoint controller + * @event_rings: Number of event rings supported by the endpoint controller + * @hw_event_rings: Number of hardware event rings supported by the endpoint controller + * @chdb_offset: Channel doorbell offset set by the host + * @erdb_offset: Event ring doorbell offset set by the host * @index: MHI Endpoint controller index */ struct mhi_ep_cntrl { @@ -79,6 +87,12 @@ struct mhi_ep_cntrl { struct mhi_ep_cmd *mhi_cmd; struct mhi_ep_sm *sm; + u64 ch_ctx_host_pa; + u64 ev_ctx_host_pa; + u64 cmd_ctx_host_pa; + + struct mhi_ep_db_info chdb[4]; + void (*raise_irq)(struct mhi_ep_cntrl *mhi_cntrl, u32 vector); int (*alloc_map)(struct mhi_ep_cntrl *mhi_cntrl, u64 pci_addr, phys_addr_t *phys_ptr, void __iomem **virt, size_t size); @@ -91,6 +105,10 @@ struct mhi_ep_cntrl { u32 max_chan; u32 mru; + u32 event_rings; + u32 hw_event_rings; + u32 chdb_offset; + u32 erdb_offset; u32 index; }; -- cgit From 961aeb6892242e0a667f5b8eb62b9b0a6041752c Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Tue, 5 Apr 2022 19:27:42 +0530 Subject: bus: mhi: ep: Add support for sending events to the host Add support for sending the events to the host over MHI bus from the endpoint. Following events are supported: 1. Transfer completion event 2. Command completion event 3. State change event 4. Execution Environment (EE) change event An event is sent whenever an operation has been completed in the MHI EP device. Event is sent using the MHI event ring and additionally the host is notified using an IRQ if required. Reviewed-by: Alex Elder Signed-off-by: Manivannan Sadhasivam Link: https://lore.kernel.org/r/20220405135754.6622-7-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/mhi_ep.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mhi_ep.h b/include/linux/mhi_ep.h index 5db048e258e4..46236ffb528a 100644 --- a/include/linux/mhi_ep.h +++ b/include/linux/mhi_ep.h @@ -59,10 +59,14 @@ struct mhi_ep_db_info { * @mhi_event: Points to the event ring configurations table * @mhi_cmd: Points to the command ring configurations table * @sm: MHI Endpoint state machine + * @ch_ctx_cache: Cache of host channel context data structure + * @ev_ctx_cache: Cache of host event context data structure + * @cmd_ctx_cache: Cache of host command context data structure * @ch_ctx_host_pa: Physical address of host channel context data structure * @ev_ctx_host_pa: Physical address of host event context data structure * @cmd_ctx_host_pa: Physical address of host command context data structure * @chdb: Array of channel doorbell interrupt info + * @event_lock: Lock for protecting event rings * @raise_irq: CB function for raising IRQ to the host * @alloc_map: CB function for allocating memory in endpoint for storing host context and mapping it * @unmap_free: CB function to unmap and free the allocated memory in endpoint for storing host context @@ -87,11 +91,15 @@ struct mhi_ep_cntrl { struct mhi_ep_cmd *mhi_cmd; struct mhi_ep_sm *sm; + struct mhi_chan_ctxt *ch_ctx_cache; + struct mhi_event_ctxt *ev_ctx_cache; + struct mhi_cmd_ctxt *cmd_ctx_cache; u64 ch_ctx_host_pa; u64 ev_ctx_host_pa; u64 cmd_ctx_host_pa; struct mhi_ep_db_info chdb[4]; + struct mutex event_lock; void (*raise_irq)(struct mhi_ep_cntrl *mhi_cntrl, u32 vector); int (*alloc_map)(struct mhi_ep_cntrl *mhi_cntrl, u64 pci_addr, phys_addr_t *phys_ptr, -- cgit From f9baa4f737950523ca648866dfa345ac378e4487 Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Tue, 5 Apr 2022 19:27:43 +0530 Subject: bus: mhi: ep: Add support for managing MHI state machine Add support for managing the MHI state machine by controlling the state transitions. Only the following MHI state transitions are supported: 1. Ready state 2. M0 state 3. M3 state 4. SYS_ERR state Reviewed-by: Alex Elder Signed-off-by: Manivannan Sadhasivam Link: https://lore.kernel.org/r/20220405135754.6622-8-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/mhi_ep.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mhi_ep.h b/include/linux/mhi_ep.h index 46236ffb528a..2880d2aa88b8 100644 --- a/include/linux/mhi_ep.h +++ b/include/linux/mhi_ep.h @@ -67,6 +67,11 @@ struct mhi_ep_db_info { * @cmd_ctx_host_pa: Physical address of host command context data structure * @chdb: Array of channel doorbell interrupt info * @event_lock: Lock for protecting event rings + * @list_lock: Lock for protecting state transition and channel doorbell lists + * @state_lock: Lock for protecting state transitions + * @st_transition_list: List of state transitions + * @wq: Dedicated workqueue for handling rings and state changes + * @state_work: State transition worker * @raise_irq: CB function for raising IRQ to the host * @alloc_map: CB function for allocating memory in endpoint for storing host context and mapping it * @unmap_free: CB function to unmap and free the allocated memory in endpoint for storing host context @@ -100,6 +105,13 @@ struct mhi_ep_cntrl { struct mhi_ep_db_info chdb[4]; struct mutex event_lock; + spinlock_t list_lock; + spinlock_t state_lock; + + struct list_head st_transition_list; + + struct workqueue_struct *wq; + struct work_struct state_work; void (*raise_irq)(struct mhi_ep_cntrl *mhi_cntrl, u32 vector); int (*alloc_map)(struct mhi_ep_cntrl *mhi_cntrl, u64 pci_addr, phys_addr_t *phys_ptr, -- cgit From 4799e71b082615445dc40ba0bbb86cbb76c24724 Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Tue, 5 Apr 2022 19:27:44 +0530 Subject: bus: mhi: ep: Add support for processing MHI endpoint interrupts Add support for processing MHI endpoint interrupts such as control interrupt, command interrupt and channel interrupt from the host. The interrupts will be generated in the endpoint device whenever host writes to the corresponding doorbell registers. The doorbell logic is handled inside the hardware internally. Reviewed-by: Alex Elder Signed-off-by: Manivannan Sadhasivam Link: https://lore.kernel.org/r/20220405135754.6622-9-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/mhi_ep.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mhi_ep.h b/include/linux/mhi_ep.h index 2880d2aa88b8..137bd3ee2e43 100644 --- a/include/linux/mhi_ep.h +++ b/include/linux/mhi_ep.h @@ -70,6 +70,7 @@ struct mhi_ep_db_info { * @list_lock: Lock for protecting state transition and channel doorbell lists * @state_lock: Lock for protecting state transitions * @st_transition_list: List of state transitions + * @ch_db_list: List of queued channel doorbells * @wq: Dedicated workqueue for handling rings and state changes * @state_work: State transition worker * @raise_irq: CB function for raising IRQ to the host @@ -85,6 +86,7 @@ struct mhi_ep_db_info { * @chdb_offset: Channel doorbell offset set by the host * @erdb_offset: Event ring doorbell offset set by the host * @index: MHI Endpoint controller index + * @irq: IRQ used by the endpoint controller */ struct mhi_ep_cntrl { struct device *cntrl_dev; @@ -109,6 +111,7 @@ struct mhi_ep_cntrl { spinlock_t state_lock; struct list_head st_transition_list; + struct list_head ch_db_list; struct workqueue_struct *wq; struct work_struct state_work; @@ -130,6 +133,7 @@ struct mhi_ep_cntrl { u32 chdb_offset; u32 erdb_offset; u32 index; + int irq; }; /** -- cgit From fb3a26b7e8aff11e44d582604f61c38f63bd507c Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Tue, 5 Apr 2022 19:27:45 +0530 Subject: bus: mhi: ep: Add support for powering up the MHI endpoint stack Add support for MHI endpoint power_up that includes initializing the MMIO and rings, caching the host MHI registers, and setting the MHI state to M0. After registering the MHI EP controller, the stack has to be powered up for usage. Reviewed-by: Alex Elder Signed-off-by: Manivannan Sadhasivam Link: https://lore.kernel.org/r/20220405135754.6622-10-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/mhi_ep.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mhi_ep.h b/include/linux/mhi_ep.h index 137bd3ee2e43..3b065f82fbeb 100644 --- a/include/linux/mhi_ep.h +++ b/include/linux/mhi_ep.h @@ -65,6 +65,9 @@ struct mhi_ep_db_info { * @ch_ctx_host_pa: Physical address of host channel context data structure * @ev_ctx_host_pa: Physical address of host event context data structure * @cmd_ctx_host_pa: Physical address of host command context data structure + * @ch_ctx_cache_phys: Physical address of the host channel context cache + * @ev_ctx_cache_phys: Physical address of the host event context cache + * @cmd_ctx_cache_phys: Physical address of the host command context cache * @chdb: Array of channel doorbell interrupt info * @event_lock: Lock for protecting event rings * @list_lock: Lock for protecting state transition and channel doorbell lists @@ -87,6 +90,7 @@ struct mhi_ep_db_info { * @erdb_offset: Event ring doorbell offset set by the host * @index: MHI Endpoint controller index * @irq: IRQ used by the endpoint controller + * @enabled: Check if the endpoint controller is enabled or not */ struct mhi_ep_cntrl { struct device *cntrl_dev; @@ -104,6 +108,9 @@ struct mhi_ep_cntrl { u64 ch_ctx_host_pa; u64 ev_ctx_host_pa; u64 cmd_ctx_host_pa; + phys_addr_t ch_ctx_cache_phys; + phys_addr_t ev_ctx_cache_phys; + phys_addr_t cmd_ctx_cache_phys; struct mhi_ep_db_info chdb[4]; struct mutex event_lock; @@ -134,6 +141,7 @@ struct mhi_ep_cntrl { u32 erdb_offset; u32 index; int irq; + bool enabled; }; /** @@ -228,4 +236,12 @@ int mhi_ep_register_controller(struct mhi_ep_cntrl *mhi_cntrl, */ void mhi_ep_unregister_controller(struct mhi_ep_cntrl *mhi_cntrl); +/** + * mhi_ep_power_up - Power up the MHI endpoint stack + * @mhi_cntrl: MHI Endpoint controller + * + * Return: 0 if power up succeeds, a negative error code otherwise. + */ +int mhi_ep_power_up(struct mhi_ep_cntrl *mhi_cntrl); + #endif -- cgit From 5d507ee04894e166f8c5a29f05c6b06ce91d5833 Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Tue, 5 Apr 2022 19:27:46 +0530 Subject: bus: mhi: ep: Add support for powering down the MHI endpoint stack Add support for MHI endpoint power_down that includes stopping all available channels, destroying the channels, resetting the event and transfer rings and freeing the host cache. The stack will be powered down whenever the physical bus link goes down. Reviewed-by: Alex Elder Signed-off-by: Manivannan Sadhasivam Link: https://lore.kernel.org/r/20220405135754.6622-11-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/mhi_ep.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mhi_ep.h b/include/linux/mhi_ep.h index 3b065f82fbeb..9da683e8302c 100644 --- a/include/linux/mhi_ep.h +++ b/include/linux/mhi_ep.h @@ -244,4 +244,10 @@ void mhi_ep_unregister_controller(struct mhi_ep_cntrl *mhi_cntrl); */ int mhi_ep_power_up(struct mhi_ep_cntrl *mhi_cntrl); +/** + * mhi_ep_power_down - Power down the MHI endpoint stack + * @mhi_cntrl: MHI controller + */ +void mhi_ep_power_down(struct mhi_ep_cntrl *mhi_cntrl); + #endif -- cgit From 7a97b6b47353c60dd1b53ada6180741437e377f2 Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Tue, 5 Apr 2022 19:27:47 +0530 Subject: bus: mhi: ep: Add support for handling MHI_RESET Add support for handling MHI_RESET in MHI endpoint stack. MHI_RESET will be issued by the host during shutdown and during error scenario so that it can recover the endpoint device without restarting the whole device. MHI_RESET handling involves resetting the internal MHI registers, data structures, state machines, resetting all channels/rings and setting MHICTRL.RESET bit to 0. Additionally the device will also move to READY state if the reset was due to SYS_ERR. Reviewed-by: Alex Elder Signed-off-by: Manivannan Sadhasivam Link: https://lore.kernel.org/r/20220405135754.6622-12-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/mhi_ep.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mhi_ep.h b/include/linux/mhi_ep.h index 9da683e8302c..2f31a54c205f 100644 --- a/include/linux/mhi_ep.h +++ b/include/linux/mhi_ep.h @@ -76,6 +76,7 @@ struct mhi_ep_db_info { * @ch_db_list: List of queued channel doorbells * @wq: Dedicated workqueue for handling rings and state changes * @state_work: State transition worker + * @reset_work: Worker for MHI Endpoint reset * @raise_irq: CB function for raising IRQ to the host * @alloc_map: CB function for allocating memory in endpoint for storing host context and mapping it * @unmap_free: CB function to unmap and free the allocated memory in endpoint for storing host context @@ -122,6 +123,7 @@ struct mhi_ep_cntrl { struct workqueue_struct *wq; struct work_struct state_work; + struct work_struct reset_work; void (*raise_irq)(struct mhi_ep_cntrl *mhi_cntrl, u32 vector); int (*alloc_map)(struct mhi_ep_cntrl *mhi_cntrl, u64 pci_addr, phys_addr_t *phys_ptr, -- cgit From e827569062a804c67b51930ce83a4cb886113cb7 Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Tue, 5 Apr 2022 19:27:49 +0530 Subject: bus: mhi: ep: Add support for processing command rings Add support for processing the command rings. Command ring is used by the host to issue channel specific commands to the ep device. Following commands are supported: 1. Start channel 2. Stop channel 3. Reset channel Once the device receives the command doorbell interrupt from host, it executes the command and generates a command completion event to the host in the primary event ring. Reviewed-by: Alex Elder Signed-off-by: Manivannan Sadhasivam Link: https://lore.kernel.org/r/20220405135754.6622-14-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/mhi_ep.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mhi_ep.h b/include/linux/mhi_ep.h index 2f31a54c205f..8c6406d9c51f 100644 --- a/include/linux/mhi_ep.h +++ b/include/linux/mhi_ep.h @@ -77,6 +77,7 @@ struct mhi_ep_db_info { * @wq: Dedicated workqueue for handling rings and state changes * @state_work: State transition worker * @reset_work: Worker for MHI Endpoint reset + * @cmd_ring_work: Worker for processing command rings * @raise_irq: CB function for raising IRQ to the host * @alloc_map: CB function for allocating memory in endpoint for storing host context and mapping it * @unmap_free: CB function to unmap and free the allocated memory in endpoint for storing host context @@ -124,6 +125,7 @@ struct mhi_ep_cntrl { struct workqueue_struct *wq; struct work_struct state_work; struct work_struct reset_work; + struct work_struct cmd_ring_work; void (*raise_irq)(struct mhi_ep_cntrl *mhi_cntrl, u32 vector); int (*alloc_map)(struct mhi_ep_cntrl *mhi_cntrl, u64 pci_addr, phys_addr_t *phys_ptr, -- cgit From 530125889977365cb6db32d7d0bd84c9f54c8aab Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Tue, 5 Apr 2022 19:27:50 +0530 Subject: bus: mhi: ep: Add support for reading from the host Data transfer between host and the ep device happens over the transfer ring associated with each bi-directional channel pair. Host defines the transfer ring by allocating memory for it. The read and write pointer addresses of the transfer ring are stored in the channel context. Once host places the elements in the transfer ring, it increments the write pointer and rings the channel doorbell. Device will receive the doorbell interrupt and will process the transfer ring elements. This commit adds support for reading the transfer ring elements from the transfer ring till write pointer, incrementing the read pointer and finally sending the completion event to the host through corresponding event ring. Reviewed-by: Alex Elder Signed-off-by: Manivannan Sadhasivam Link: https://lore.kernel.org/r/20220405135754.6622-15-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/mhi_ep.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mhi_ep.h b/include/linux/mhi_ep.h index 8c6406d9c51f..fc7d197413eb 100644 --- a/include/linux/mhi_ep.h +++ b/include/linux/mhi_ep.h @@ -254,4 +254,13 @@ int mhi_ep_power_up(struct mhi_ep_cntrl *mhi_cntrl); */ void mhi_ep_power_down(struct mhi_ep_cntrl *mhi_cntrl); +/** + * mhi_ep_queue_is_empty - Determine whether the transfer queue is empty + * @mhi_dev: Device associated with the channels + * @dir: DMA direction for the channel + * + * Return: true if the queue is empty, false otherwise. + */ +bool mhi_ep_queue_is_empty(struct mhi_ep_device *mhi_dev, enum dma_data_direction dir); + #endif -- cgit From 03c0bb8ec983f993a704417d73cc0a3511453d3e Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Tue, 5 Apr 2022 19:27:51 +0530 Subject: bus: mhi: ep: Add support for processing channel rings Add support for processing the channel rings from host. For the channel ring associated with DL channel, the xfer callback will simply invoked. For the case of UL channel, the ring elements will be read in a buffer till the write pointer and later passed to the client driver using the xfer callback. The client drivers should provide the callbacks for both UL and DL channels during registration. Reviewed-by: Alex Elder Signed-off-by: Manivannan Sadhasivam Link: https://lore.kernel.org/r/20220405135754.6622-16-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/mhi_ep.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mhi_ep.h b/include/linux/mhi_ep.h index fc7d197413eb..eecc8f35d630 100644 --- a/include/linux/mhi_ep.h +++ b/include/linux/mhi_ep.h @@ -78,6 +78,7 @@ struct mhi_ep_db_info { * @state_work: State transition worker * @reset_work: Worker for MHI Endpoint reset * @cmd_ring_work: Worker for processing command rings + * @ch_ring_work: Worker for processing channel rings * @raise_irq: CB function for raising IRQ to the host * @alloc_map: CB function for allocating memory in endpoint for storing host context and mapping it * @unmap_free: CB function to unmap and free the allocated memory in endpoint for storing host context @@ -126,6 +127,7 @@ struct mhi_ep_cntrl { struct work_struct state_work; struct work_struct reset_work; struct work_struct cmd_ring_work; + struct work_struct ch_ring_work; void (*raise_irq)(struct mhi_ep_cntrl *mhi_cntrl, u32 vector); int (*alloc_map)(struct mhi_ep_cntrl *mhi_cntrl, u64 pci_addr, phys_addr_t *phys_ptr, -- cgit From 2d945a394d9c1c59d88397cb383b11216d018a6b Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Tue, 5 Apr 2022 19:27:52 +0530 Subject: bus: mhi: ep: Add support for queueing SKBs to the host Add support for queueing SKBs to the host over the transfer ring of the relevant channel. The mhi_ep_queue_skb() API will be used by the client networking drivers to queue the SKBs to the host over MHI bus. The host will add ring elements to the transfer ring periodically for the device and the device will write SKBs to the ring elements. If a single SKB doesn't fit in a ring element (TRE), it will be placed in multiple ring elements and the overflow event will be sent for all ring elements except the last one. For the last ring element, the EOT event will be sent indicating the packet boundary. Reviewed-by: Alex Elder Signed-off-by: Manivannan Sadhasivam Link: https://lore.kernel.org/r/20220405135754.6622-17-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/mhi_ep.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mhi_ep.h b/include/linux/mhi_ep.h index eecc8f35d630..478aece17046 100644 --- a/include/linux/mhi_ep.h +++ b/include/linux/mhi_ep.h @@ -265,4 +265,13 @@ void mhi_ep_power_down(struct mhi_ep_cntrl *mhi_cntrl); */ bool mhi_ep_queue_is_empty(struct mhi_ep_device *mhi_dev, enum dma_data_direction dir); +/** + * mhi_ep_queue_skb - Send SKBs to host over MHI Endpoint + * @mhi_dev: Device associated with the DL channel + * @skb: SKBs to be queued + * + * Return: 0 if the SKBs has been sent successfully, a negative error code otherwise. + */ +int mhi_ep_queue_skb(struct mhi_ep_device *mhi_dev, struct sk_buff *skb); + #endif -- cgit From c268c0a8a33047cd957fecc1349d09a68eb6ad9e Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Tue, 5 Apr 2022 19:27:54 +0530 Subject: bus: mhi: ep: Add uevent support for module autoloading Add uevent support to MHI endpoint bus so that the client drivers can be autoloaded by udev when the MHI endpoint devices gets created. The client drivers are expected to provide MODULE_DEVICE_TABLE with the MHI id_table struct so that the alias can be exported. The MHI endpoint reused the mhi_device_id structure of the MHI bus. Reviewed-by: Alex Elder Signed-off-by: Manivannan Sadhasivam Link: https://lore.kernel.org/r/20220405135754.6622-19-manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/mod_devicetable.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mod_devicetable.h b/include/linux/mod_devicetable.h index 5da5d990ff58..549590e9c644 100644 --- a/include/linux/mod_devicetable.h +++ b/include/linux/mod_devicetable.h @@ -835,6 +835,8 @@ struct wmi_device_id { #define MHI_DEVICE_MODALIAS_FMT "mhi:%s" #define MHI_NAME_SIZE 32 +#define MHI_EP_DEVICE_MODALIAS_FMT "mhi_ep:%s" + /** * struct mhi_device_id - MHI device identification * @chan: MHI channel name -- cgit From 31f6bd7fad3b149a1eb6f67fc2e742e4df369b3d Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Mon, 25 Apr 2022 17:33:58 +0300 Subject: serial: Store character timing information to uart_port MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Struct uart_port currently stores FIFO timeout. Having character timing information readily available is useful. Even serial core itself determines char_time from port->timeout using inverse calculation. Store frame_time directly into uart_port. Character time is stored in nanoseconds to have reasonable precision with high rates. To avoid overflow, 64-bit math is necessary. It might be possible to determine timeout from frame_time by multiplying it with fifosize as needed but only part of the users seem to be protected by a lock. Thus, this patch does not pursue storing only frame_time in uart_port. Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220425143410.12703-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index d4828e69087a..cbd5070bc87f 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -232,6 +232,7 @@ struct uart_port { int hw_stopped; /* sw-assisted CTS flow state */ unsigned int mctrl; /* current modem ctrl settings */ unsigned int timeout; /* character-based timeout */ + unsigned int frame_time; /* frame timing in ns */ unsigned int type; /* port type */ const struct uart_ops *ops; unsigned int custom_divisor; -- cgit From 7a20917d30fb627bebff6bacbe860ad4eb3a04d6 Mon Sep 17 00:00:00 2001 From: Bjorn Andersson Date: Fri, 22 Apr 2022 15:23:45 -0700 Subject: device property: Add helper to match multiple connections In some cases multiple connections with the same connection id needs to be resolved from a fwnode graph. One such example is when separate hardware is used for performing muxing and/or orientation switching of the SuperSpeed and SBU lines in a USB Type-C connector. In this case the connector needs to belong to a graph with multiple matching remote endpoints, and the Type-C controller needs to be able to resolve them both. Add a new API that allows this kind of lookup. Reviewed-by: Andy Shevchenko Signed-off-by: Bjorn Andersson Link: https://lore.kernel.org/r/20220422222351.1297276-2-bjorn.andersson@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/property.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/property.h b/include/linux/property.h index 4cd4b326941f..de7ff336d2c8 100644 --- a/include/linux/property.h +++ b/include/linux/property.h @@ -447,6 +447,11 @@ static inline void *device_connection_find_match(struct device *dev, return fwnode_connection_find_match(dev_fwnode(dev), con_id, data, match); } +int fwnode_connection_find_matches(struct fwnode_handle *fwnode, + const char *con_id, void *data, + devcon_match_fn_t match, + void **matches, unsigned int matches_len); + /* -------------------------------------------------------------------------- */ /* Software fwnode support - when HW description is incomplete or missing */ -- cgit From 713fd49b430c37263c6cae2c82954f4e1cbcd90d Mon Sep 17 00:00:00 2001 From: Bjorn Andersson Date: Fri, 22 Apr 2022 15:23:48 -0700 Subject: usb: typec: mux: Introduce indirection Rather than directly exposing the implementation's representation of the typec muxes to the controller/clients, introduce an indirection object. This enables the introduction of turning this relationship into a one-to-many in the following patch. Acked-by: Heikki Krogerus Signed-off-by: Bjorn Andersson Link: https://lore.kernel.org/r/20220422222351.1297276-5-bjorn.andersson@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/typec_mux.h | 22 ++++++++++++---------- 1 file changed, 12 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/usb/typec_mux.h b/include/linux/usb/typec_mux.h index a9d9957933dc..ee57781dcf28 100644 --- a/include/linux/usb/typec_mux.h +++ b/include/linux/usb/typec_mux.h @@ -8,11 +8,13 @@ struct device; struct typec_mux; +struct typec_mux_dev; struct typec_switch; +struct typec_switch_dev; struct typec_altmode; struct fwnode_handle; -typedef int (*typec_switch_set_fn_t)(struct typec_switch *sw, +typedef int (*typec_switch_set_fn_t)(struct typec_switch_dev *sw, enum typec_orientation orientation); struct typec_switch_desc { @@ -32,13 +34,13 @@ static inline struct typec_switch *typec_switch_get(struct device *dev) return fwnode_typec_switch_get(dev_fwnode(dev)); } -struct typec_switch * +struct typec_switch_dev * typec_switch_register(struct device *parent, const struct typec_switch_desc *desc); -void typec_switch_unregister(struct typec_switch *sw); +void typec_switch_unregister(struct typec_switch_dev *sw); -void typec_switch_set_drvdata(struct typec_switch *sw, void *data); -void *typec_switch_get_drvdata(struct typec_switch *sw); +void typec_switch_set_drvdata(struct typec_switch_dev *sw, void *data); +void *typec_switch_get_drvdata(struct typec_switch_dev *sw); struct typec_mux_state { struct typec_altmode *alt; @@ -46,7 +48,7 @@ struct typec_mux_state { void *data; }; -typedef int (*typec_mux_set_fn_t)(struct typec_mux *mux, +typedef int (*typec_mux_set_fn_t)(struct typec_mux_dev *mux, struct typec_mux_state *state); struct typec_mux_desc { @@ -67,11 +69,11 @@ typec_mux_get(struct device *dev, const struct typec_altmode_desc *desc) return fwnode_typec_mux_get(dev_fwnode(dev), desc); } -struct typec_mux * +struct typec_mux_dev * typec_mux_register(struct device *parent, const struct typec_mux_desc *desc); -void typec_mux_unregister(struct typec_mux *mux); +void typec_mux_unregister(struct typec_mux_dev *mux); -void typec_mux_set_drvdata(struct typec_mux *mux, void *data); -void *typec_mux_get_drvdata(struct typec_mux *mux); +void typec_mux_set_drvdata(struct typec_mux_dev *mux, void *data); +void *typec_mux_get_drvdata(struct typec_mux_dev *mux); #endif /* __USB_TYPEC_MUX */ -- cgit From af1969a2d734d6272c0640b50c3ed31e59e203a9 Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Sat, 23 Apr 2022 20:42:03 -0400 Subject: USB: gadget: Rename usb_gadget_probe_driver() In preparation for adding a "gadget" bus, this patch renames usb_gadget_probe_driver() to usb_gadget_register_driver(). The new name will be more accurate, since gadget drivers will be registered on the gadget bus and the probing will be done by the driver core, not the UDC core. Signed-off-by: Alan Stern Link: https://lore.kernel.org/r/YmSc29YZvxgT5fEJ@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/gadget.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/usb/gadget.h b/include/linux/usb/gadget.h index 10fe57cf40be..5830b8a903da 100644 --- a/include/linux/usb/gadget.h +++ b/include/linux/usb/gadget.h @@ -745,7 +745,7 @@ struct usb_gadget_driver { */ /** - * usb_gadget_probe_driver - probe a gadget driver + * usb_gadget_register_driver - register a gadget driver * @driver: the driver being registered * Context: can sleep * @@ -755,7 +755,7 @@ struct usb_gadget_driver { * registration call returns. It's expected that the @bind() function will * be in init sections. */ -int usb_gadget_probe_driver(struct usb_gadget_driver *driver); +int usb_gadget_register_driver(struct usb_gadget_driver *driver); /** * usb_gadget_unregister_driver - unregister a gadget driver -- cgit From fc274c1e997314bf47f6a62c79b5d7e554ed59c4 Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Sat, 23 Apr 2022 21:35:51 -0400 Subject: USB: gadget: Add a new bus for gadgets This patch adds a "gadget" bus and uses it for registering gadgets and their drivers. From now on, bindings will be managed by the driver core rather than through ad-hoc manipulations in the UDC core. As part of this change, the driver_pending_list is removed. The UDC core won't need to keep track of unbound drivers for later binding, because the driver core handles all of that for us. However, we do need one new feature: a way to prevent gadget drivers from being bound to more than one gadget at a time. The existing code does this automatically, but the driver core doesn't -- it's perfectly happy to bind a single driver to all the matching devices on the bus. The patch adds a new bitflag to the usb_gadget_driver structure for this purpose. A nice side effect of this change is a reduction in the total lines of code, since now the driver core will do part of the work that the UDC used to do. A possible future patch could add udc devices to the gadget bus, say as a separate device type. Signed-off-by: Alan Stern Link: https://lore.kernel.org/r/YmSpdxaDNeC2BBOf@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/gadget.h | 26 +++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/usb/gadget.h b/include/linux/usb/gadget.h index 5830b8a903da..cf7af8a0a6e9 100644 --- a/include/linux/usb/gadget.h +++ b/include/linux/usb/gadget.h @@ -664,9 +664,9 @@ static inline int usb_gadget_check_config(struct usb_gadget *gadget) * @driver: Driver model state for this driver. * @udc_name: A name of UDC this driver should be bound to. If udc_name is NULL, * this driver will be bound to any available UDC. - * @pending: UDC core private data used for deferred probe of this driver. - * @match_existing_only: If udc is not found, return an error and don't add this - * gadget driver to list of pending driver + * @match_existing_only: If udc is not found, return an error and fail + * the driver registration + * @is_bound: Allow a driver to be bound to only one gadget * * Devices are disabled till a gadget driver successfully bind()s, which * means the driver will handle setup() requests needed to enumerate (and @@ -729,8 +729,8 @@ struct usb_gadget_driver { struct device_driver driver; char *udc_name; - struct list_head pending; unsigned match_existing_only:1; + bool is_bound:1; }; @@ -740,22 +740,30 @@ struct usb_gadget_driver { /* driver modules register and unregister, as usual. * these calls must be made in a context that can sleep. * - * these will usually be implemented directly by the hardware-dependent - * usb bus interface driver, which will only support a single driver. + * A gadget driver can be bound to only one gadget at a time. */ /** - * usb_gadget_register_driver - register a gadget driver + * usb_gadget_register_driver_owner - register a gadget driver * @driver: the driver being registered + * @owner: the driver module + * @mod_name: the driver module's build name * Context: can sleep * * Call this in your gadget driver's module initialization function, - * to tell the underlying usb controller driver about your driver. + * to tell the underlying UDC controller driver about your driver. * The @bind() function will be called to bind it to a gadget before this * registration call returns. It's expected that the @bind() function will * be in init sections. + * + * Use the macro defined below instead of calling this directly. */ -int usb_gadget_register_driver(struct usb_gadget_driver *driver); +int usb_gadget_register_driver_owner(struct usb_gadget_driver *driver, + struct module *owner, const char *mod_name); + +/* use a define to avoid include chaining to get THIS_MODULE & friends */ +#define usb_gadget_register_driver(driver) \ + usb_gadget_register_driver_owner(driver, THIS_MODULE, KBUILD_MODNAME) /** * usb_gadget_unregister_driver - unregister a gadget driver -- cgit From 8e8b11956486e3fe8cacf54a1d492ebdd8cc1fb2 Mon Sep 17 00:00:00 2001 From: Matthias Kaehlcke Date: Thu, 17 Feb 2022 10:42:52 -0800 Subject: of/platform: Add stubs for of_platform_device_create/destroy() Code for platform_device_create() and of_platform_device_destroy() is only generated if CONFIG_OF_ADDRESS=y. Add stubs to avoid unresolved symbols when CONFIG_OF_ADDRESS is not set. Reviewed-by: Stephen Boyd Reviewed-by: Douglas Anderson Acked-by: Rob Herring Signed-off-by: Matthias Kaehlcke Link: https://lore.kernel.org/r/20220217104219.v21.1.I08fd2e1c775af04f663730e9fb4d00e6bbb38541@changeid Signed-off-by: Greg Kroah-Hartman --- include/linux/of_platform.h | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/of_platform.h b/include/linux/of_platform.h index 84a966623e78..d15b6cd5e1c3 100644 --- a/include/linux/of_platform.h +++ b/include/linux/of_platform.h @@ -61,16 +61,18 @@ static inline struct platform_device *of_find_device_by_node(struct device_node } #endif +extern int of_platform_bus_probe(struct device_node *root, + const struct of_device_id *matches, + struct device *parent); + +#ifdef CONFIG_OF_ADDRESS /* Platform devices and busses creation */ extern struct platform_device *of_platform_device_create(struct device_node *np, const char *bus_id, struct device *parent); extern int of_platform_device_destroy(struct device *dev, void *data); -extern int of_platform_bus_probe(struct device_node *root, - const struct of_device_id *matches, - struct device *parent); -#ifdef CONFIG_OF_ADDRESS + extern int of_platform_populate(struct device_node *root, const struct of_device_id *matches, const struct of_dev_auxdata *lookup, @@ -84,6 +86,18 @@ extern int devm_of_platform_populate(struct device *dev); extern void devm_of_platform_depopulate(struct device *dev); #else +/* Platform devices and busses creation */ +static inline struct platform_device *of_platform_device_create(struct device_node *np, + const char *bus_id, + struct device *parent) +{ + return NULL; +} +static inline int of_platform_device_destroy(struct device *dev, void *data) +{ + return -ENODEV; +} + static inline int of_platform_populate(struct device_node *root, const struct of_device_id *matches, const struct of_dev_auxdata *lookup, -- cgit From 0298b4b95cb373c21e6323c905589f8dac42c5b4 Mon Sep 17 00:00:00 2001 From: Matthias Kaehlcke Date: Thu, 17 Feb 2022 10:42:53 -0800 Subject: usb: misc: Add onboard_usb_hub driver The main issue this driver addresses is that a USB hub needs to be powered before it can be discovered. For discrete onboard hubs (an example for such a hub is the Realtek RTS5411) this is often solved by supplying the hub with an 'always-on' regulator, which is kind of a hack. Some onboard hubs may require further initialization steps, like changing the state of a GPIO or enabling a clock, which requires even more hacks. This driver creates a platform device representing the hub which performs the necessary initialization. Currently it only supports switching on a single regulator, support for multiple regulators or other actions can be added as needed. Different initialization sequences can be supported based on the compatible string. Besides performing the initialization the driver can be configured to power the hub off during system suspend. This can help to extend battery life on battery powered devices which have no requirements to keep the hub powered during suspend. The driver can also be configured to leave the hub powered when a wakeup capable USB device is connected when suspending, and power it off otherwise. Technically the driver consists of two drivers, the platform driver described above and a very thin USB driver that subclasses the generic driver. The purpose of this driver is to provide the platform driver with the USB devices corresponding to the hub(s) (a hub controller may provide multiple 'logical' hubs, e.g. one to support USB 2.0 and another for USB 3.x). Note: the current series only supports hubs connected directly to a root hub, support for other configurations could be added if needed. Co-developed-by: Ravi Chandra Sadineni Reviewed-by: Douglas Anderson Acked-by: Alan Stern Signed-off-by: Ravi Chandra Sadineni Signed-off-by: Matthias Kaehlcke Link: https://lore.kernel.org/r/20220217104219.v21.2.I7c9a1f1d6ced41dd8310e8a03da666a32364e790@changeid Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/onboard_hub.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) create mode 100644 include/linux/usb/onboard_hub.h (limited to 'include/linux') diff --git a/include/linux/usb/onboard_hub.h b/include/linux/usb/onboard_hub.h new file mode 100644 index 000000000000..d9373230556e --- /dev/null +++ b/include/linux/usb/onboard_hub.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __LINUX_USB_ONBOARD_HUB_H +#define __LINUX_USB_ONBOARD_HUB_H + +struct usb_device; +struct list_head; + +#if IS_ENABLED(CONFIG_USB_ONBOARD_HUB) +void onboard_hub_create_pdevs(struct usb_device *parent_hub, struct list_head *pdev_list); +void onboard_hub_destroy_pdevs(struct list_head *pdev_list); +#else +static inline void onboard_hub_create_pdevs(struct usb_device *parent_hub, + struct list_head *pdev_list) {} +static inline void onboard_hub_destroy_pdevs(struct list_head *pdev_list) {} +#endif + +#endif /* __LINUX_USB_ONBOARD_HUB_H */ -- cgit From c40b62216c1aecc0dc00faf33d71bd71cb440337 Mon Sep 17 00:00:00 2001 From: Matthias Kaehlcke Date: Thu, 17 Feb 2022 10:42:54 -0800 Subject: usb: core: hcd: Create platform devices for onboard hubs in probe() Call onboard_hub_create/destroy_pdevs() from usb_add/remove_hcd() for primary HCDs to create/destroy platform devices for onboard USB hubs that may be connected to the root hub of the controller. These functions are a NOP unless CONFIG_USB_ONBOARD_HUB=y/m. Also add a field to struct usb_hcd to keep track of the onboard hub platform devices that are owned by the HCD. Reviewed-by: Douglas Anderson Reviewed-by: Stephen Boyd Signed-off-by: Matthias Kaehlcke Link: https://lore.kernel.org/r/20220217104219.v21.3.I7a3a7d9d2126c34079b1cab87aa0b2ec3030f9b7@changeid Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/hcd.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h index 548a028f2dab..4ebc91c09182 100644 --- a/include/linux/usb/hcd.h +++ b/include/linux/usb/hcd.h @@ -198,6 +198,7 @@ struct usb_hcd { struct usb_hcd *shared_hcd; struct usb_hcd *primary_hcd; + struct list_head onboard_hub_devs; #define HCD_BUFFER_POOLS 4 struct dma_pool *pool[HCD_BUFFER_POOLS]; -- cgit From 9723f69d1de37ca25474372ad1753ea39bb0dce1 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Fri, 8 Apr 2022 10:00:43 +0200 Subject: mmc: core: improve API to make clear that mmc_sw_reset is for cards To make it unambiguous that mmc_sw_reset() is for cards and not for controllers, we make the function argument mmc_card instead of mmc_host. There are no users to convert currently. Suggested-by: Ulf Hansson Signed-off-by: Wolfram Sang Link: https://lore.kernel.org/r/20220408080045.6497-3-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson --- include/linux/mmc/core.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mmc/core.h b/include/linux/mmc/core.h index de5c64bbdb72..6efec0b9820c 100644 --- a/include/linux/mmc/core.h +++ b/include/linux/mmc/core.h @@ -176,7 +176,7 @@ int mmc_wait_for_cmd(struct mmc_host *host, struct mmc_command *cmd, int retries); int mmc_hw_reset(struct mmc_card *card); -int mmc_sw_reset(struct mmc_host *host); +int mmc_sw_reset(struct mmc_card *card); void mmc_set_data_timeout(struct mmc_data *data, const struct mmc_card *card); #endif /* LINUX_MMC_CORE_H */ -- cgit From 32f18e596141f40dbc5d8700b2730371ffd3055f Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Fri, 8 Apr 2022 10:00:44 +0200 Subject: mmc: improve API to make clear hw_reset callback is for cards To make it unambiguous that the hw_reset callback is for cards and not for controllers, we add 'card' to the callback name and convert all users in one go. We keep the argument as mmc_host, though, because the callback is used very early when mmc_card is not yet populated. Signed-off-by: Wolfram Sang Link: https://lore.kernel.org/r/20220408080045.6497-4-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson --- include/linux/mmc/host.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h index 7afb57cab00b..c193c50ccd78 100644 --- a/include/linux/mmc/host.h +++ b/include/linux/mmc/host.h @@ -181,7 +181,7 @@ struct mmc_host_ops { unsigned int max_dtr, int host_drv, int card_drv, int *drv_type); /* Reset the eMMC card via RST_n */ - void (*hw_reset)(struct mmc_host *host); + void (*card_hw_reset)(struct mmc_host *host); void (*card_event)(struct mmc_host *host); /* -- cgit From 13acb62ce1ee7377ef03034fbbad6f5499464b86 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Tue, 12 Apr 2022 11:31:02 +0200 Subject: mmc: sh_mmcif: move platform_data header to proper location We have a dedicated directory for platform_data meanwhile, don't spoil the MMC directory with it. Reviewed-by: Geert Uytterhoeven Signed-off-by: Wolfram Sang Link: https://lore.kernel.org/r/20220412093102.3428-1-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson --- include/linux/mmc/sh_mmcif.h | 209 --------------------------------- include/linux/platform_data/sh_mmcif.h | 207 ++++++++++++++++++++++++++++++++ 2 files changed, 207 insertions(+), 209 deletions(-) delete mode 100644 include/linux/mmc/sh_mmcif.h create mode 100644 include/linux/platform_data/sh_mmcif.h (limited to 'include/linux') diff --git a/include/linux/mmc/sh_mmcif.h b/include/linux/mmc/sh_mmcif.h deleted file mode 100644 index e25533b95d9f..000000000000 --- a/include/linux/mmc/sh_mmcif.h +++ /dev/null @@ -1,209 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * include/linux/mmc/sh_mmcif.h - * - * platform data for eMMC driver - * - * Copyright (C) 2010 Renesas Solutions Corp. - */ - -#ifndef LINUX_MMC_SH_MMCIF_H -#define LINUX_MMC_SH_MMCIF_H - -#include -#include - -/* - * MMCIF : CE_CLK_CTRL [19:16] - * 1000 : Peripheral clock / 512 - * 0111 : Peripheral clock / 256 - * 0110 : Peripheral clock / 128 - * 0101 : Peripheral clock / 64 - * 0100 : Peripheral clock / 32 - * 0011 : Peripheral clock / 16 - * 0010 : Peripheral clock / 8 - * 0001 : Peripheral clock / 4 - * 0000 : Peripheral clock / 2 - * 1111 : Peripheral clock (sup_pclk set '1') - */ - -struct sh_mmcif_plat_data { - unsigned int slave_id_tx; /* embedded slave_id_[tr]x */ - unsigned int slave_id_rx; - u8 sup_pclk; /* 1 :SH7757, 0: SH7724/SH7372 */ - unsigned long caps; - u32 ocr; -}; - -#define MMCIF_CE_CMD_SET 0x00000000 -#define MMCIF_CE_ARG 0x00000008 -#define MMCIF_CE_ARG_CMD12 0x0000000C -#define MMCIF_CE_CMD_CTRL 0x00000010 -#define MMCIF_CE_BLOCK_SET 0x00000014 -#define MMCIF_CE_CLK_CTRL 0x00000018 -#define MMCIF_CE_BUF_ACC 0x0000001C -#define MMCIF_CE_RESP3 0x00000020 -#define MMCIF_CE_RESP2 0x00000024 -#define MMCIF_CE_RESP1 0x00000028 -#define MMCIF_CE_RESP0 0x0000002C -#define MMCIF_CE_RESP_CMD12 0x00000030 -#define MMCIF_CE_DATA 0x00000034 -#define MMCIF_CE_INT 0x00000040 -#define MMCIF_CE_INT_MASK 0x00000044 -#define MMCIF_CE_HOST_STS1 0x00000048 -#define MMCIF_CE_HOST_STS2 0x0000004C -#define MMCIF_CE_CLK_CTRL2 0x00000070 -#define MMCIF_CE_VERSION 0x0000007C - -/* CE_BUF_ACC */ -#define BUF_ACC_DMAWEN (1 << 25) -#define BUF_ACC_DMAREN (1 << 24) -#define BUF_ACC_BUSW_32 (0 << 17) -#define BUF_ACC_BUSW_16 (1 << 17) -#define BUF_ACC_ATYP (1 << 16) - -/* CE_CLK_CTRL */ -#define CLK_ENABLE (1 << 24) /* 1: output mmc clock */ -#define CLK_CLEAR (0xf << 16) -#define CLK_SUP_PCLK (0xf << 16) -#define CLKDIV_4 (1 << 16) /* mmc clock frequency. - * n: bus clock/(2^(n+1)) */ -#define CLKDIV_256 (7 << 16) /* mmc clock frequency. (see above) */ -#define SRSPTO_256 (2 << 12) /* resp timeout */ -#define SRBSYTO_29 (0xf << 8) /* resp busy timeout */ -#define SRWDTO_29 (0xf << 4) /* read/write timeout */ -#define SCCSTO_29 (0xf << 0) /* ccs timeout */ - -/* CE_VERSION */ -#define SOFT_RST_ON (1 << 31) -#define SOFT_RST_OFF 0 - -static inline u32 sh_mmcif_readl(void __iomem *addr, int reg) -{ - return __raw_readl(addr + reg); -} - -static inline void sh_mmcif_writel(void __iomem *addr, int reg, u32 val) -{ - __raw_writel(val, addr + reg); -} - -#define SH_MMCIF_BBS 512 /* boot block size */ - -static inline void sh_mmcif_boot_cmd_send(void __iomem *base, - unsigned long cmd, unsigned long arg) -{ - sh_mmcif_writel(base, MMCIF_CE_INT, 0); - sh_mmcif_writel(base, MMCIF_CE_ARG, arg); - sh_mmcif_writel(base, MMCIF_CE_CMD_SET, cmd); -} - -static inline int sh_mmcif_boot_cmd_poll(void __iomem *base, unsigned long mask) -{ - unsigned long tmp; - int cnt; - - for (cnt = 0; cnt < 1000000; cnt++) { - tmp = sh_mmcif_readl(base, MMCIF_CE_INT); - if (tmp & mask) { - sh_mmcif_writel(base, MMCIF_CE_INT, tmp & ~mask); - return 0; - } - } - - return -1; -} - -static inline int sh_mmcif_boot_cmd(void __iomem *base, - unsigned long cmd, unsigned long arg) -{ - sh_mmcif_boot_cmd_send(base, cmd, arg); - return sh_mmcif_boot_cmd_poll(base, 0x00010000); -} - -static inline int sh_mmcif_boot_do_read_single(void __iomem *base, - unsigned int block_nr, - unsigned long *buf) -{ - int k; - - /* CMD13 - Status */ - sh_mmcif_boot_cmd(base, 0x0d400000, 0x00010000); - - if (sh_mmcif_readl(base, MMCIF_CE_RESP0) != 0x0900) - return -1; - - /* CMD17 - Read */ - sh_mmcif_boot_cmd(base, 0x11480000, block_nr * SH_MMCIF_BBS); - if (sh_mmcif_boot_cmd_poll(base, 0x00100000) < 0) - return -1; - - for (k = 0; k < (SH_MMCIF_BBS / 4); k++) - buf[k] = sh_mmcif_readl(base, MMCIF_CE_DATA); - - return 0; -} - -static inline int sh_mmcif_boot_do_read(void __iomem *base, - unsigned long first_block, - unsigned long nr_blocks, - void *buf) -{ - unsigned long k; - int ret = 0; - - /* In data transfer mode: Set clock to Bus clock/4 (about 20Mhz) */ - sh_mmcif_writel(base, MMCIF_CE_CLK_CTRL, - CLK_ENABLE | CLKDIV_4 | SRSPTO_256 | - SRBSYTO_29 | SRWDTO_29 | SCCSTO_29); - - /* CMD9 - Get CSD */ - sh_mmcif_boot_cmd(base, 0x09806000, 0x00010000); - - /* CMD7 - Select the card */ - sh_mmcif_boot_cmd(base, 0x07400000, 0x00010000); - - /* CMD16 - Set the block size */ - sh_mmcif_boot_cmd(base, 0x10400000, SH_MMCIF_BBS); - - for (k = 0; !ret && k < nr_blocks; k++) - ret = sh_mmcif_boot_do_read_single(base, first_block + k, - buf + (k * SH_MMCIF_BBS)); - - return ret; -} - -static inline void sh_mmcif_boot_init(void __iomem *base) -{ - /* reset */ - sh_mmcif_writel(base, MMCIF_CE_VERSION, SOFT_RST_ON); - sh_mmcif_writel(base, MMCIF_CE_VERSION, SOFT_RST_OFF); - - /* byte swap */ - sh_mmcif_writel(base, MMCIF_CE_BUF_ACC, BUF_ACC_ATYP); - - /* Set block size in MMCIF hardware */ - sh_mmcif_writel(base, MMCIF_CE_BLOCK_SET, SH_MMCIF_BBS); - - /* Enable the clock, set it to Bus clock/256 (about 325Khz). */ - sh_mmcif_writel(base, MMCIF_CE_CLK_CTRL, - CLK_ENABLE | CLKDIV_256 | SRSPTO_256 | - SRBSYTO_29 | SRWDTO_29 | SCCSTO_29); - - /* CMD0 */ - sh_mmcif_boot_cmd(base, 0x00000040, 0); - - /* CMD1 - Get OCR */ - do { - sh_mmcif_boot_cmd(base, 0x01405040, 0x40300000); /* CMD1 */ - } while ((sh_mmcif_readl(base, MMCIF_CE_RESP0) & 0x80000000) - != 0x80000000); - - /* CMD2 - Get CID */ - sh_mmcif_boot_cmd(base, 0x02806040, 0); - - /* CMD3 - Set card relative address */ - sh_mmcif_boot_cmd(base, 0x03400040, 0x00010000); -} - -#endif /* LINUX_MMC_SH_MMCIF_H */ diff --git a/include/linux/platform_data/sh_mmcif.h b/include/linux/platform_data/sh_mmcif.h new file mode 100644 index 000000000000..6eb914f958f9 --- /dev/null +++ b/include/linux/platform_data/sh_mmcif.h @@ -0,0 +1,207 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * platform data for eMMC driver + * + * Copyright (C) 2010 Renesas Solutions Corp. + */ + +#ifndef LINUX_MMC_SH_MMCIF_H +#define LINUX_MMC_SH_MMCIF_H + +#include +#include + +/* + * MMCIF : CE_CLK_CTRL [19:16] + * 1000 : Peripheral clock / 512 + * 0111 : Peripheral clock / 256 + * 0110 : Peripheral clock / 128 + * 0101 : Peripheral clock / 64 + * 0100 : Peripheral clock / 32 + * 0011 : Peripheral clock / 16 + * 0010 : Peripheral clock / 8 + * 0001 : Peripheral clock / 4 + * 0000 : Peripheral clock / 2 + * 1111 : Peripheral clock (sup_pclk set '1') + */ + +struct sh_mmcif_plat_data { + unsigned int slave_id_tx; /* embedded slave_id_[tr]x */ + unsigned int slave_id_rx; + u8 sup_pclk; /* 1 :SH7757, 0: SH7724/SH7372 */ + unsigned long caps; + u32 ocr; +}; + +#define MMCIF_CE_CMD_SET 0x00000000 +#define MMCIF_CE_ARG 0x00000008 +#define MMCIF_CE_ARG_CMD12 0x0000000C +#define MMCIF_CE_CMD_CTRL 0x00000010 +#define MMCIF_CE_BLOCK_SET 0x00000014 +#define MMCIF_CE_CLK_CTRL 0x00000018 +#define MMCIF_CE_BUF_ACC 0x0000001C +#define MMCIF_CE_RESP3 0x00000020 +#define MMCIF_CE_RESP2 0x00000024 +#define MMCIF_CE_RESP1 0x00000028 +#define MMCIF_CE_RESP0 0x0000002C +#define MMCIF_CE_RESP_CMD12 0x00000030 +#define MMCIF_CE_DATA 0x00000034 +#define MMCIF_CE_INT 0x00000040 +#define MMCIF_CE_INT_MASK 0x00000044 +#define MMCIF_CE_HOST_STS1 0x00000048 +#define MMCIF_CE_HOST_STS2 0x0000004C +#define MMCIF_CE_CLK_CTRL2 0x00000070 +#define MMCIF_CE_VERSION 0x0000007C + +/* CE_BUF_ACC */ +#define BUF_ACC_DMAWEN (1 << 25) +#define BUF_ACC_DMAREN (1 << 24) +#define BUF_ACC_BUSW_32 (0 << 17) +#define BUF_ACC_BUSW_16 (1 << 17) +#define BUF_ACC_ATYP (1 << 16) + +/* CE_CLK_CTRL */ +#define CLK_ENABLE (1 << 24) /* 1: output mmc clock */ +#define CLK_CLEAR (0xf << 16) +#define CLK_SUP_PCLK (0xf << 16) +#define CLKDIV_4 (1 << 16) /* mmc clock frequency. + * n: bus clock/(2^(n+1)) */ +#define CLKDIV_256 (7 << 16) /* mmc clock frequency. (see above) */ +#define SRSPTO_256 (2 << 12) /* resp timeout */ +#define SRBSYTO_29 (0xf << 8) /* resp busy timeout */ +#define SRWDTO_29 (0xf << 4) /* read/write timeout */ +#define SCCSTO_29 (0xf << 0) /* ccs timeout */ + +/* CE_VERSION */ +#define SOFT_RST_ON (1 << 31) +#define SOFT_RST_OFF 0 + +static inline u32 sh_mmcif_readl(void __iomem *addr, int reg) +{ + return __raw_readl(addr + reg); +} + +static inline void sh_mmcif_writel(void __iomem *addr, int reg, u32 val) +{ + __raw_writel(val, addr + reg); +} + +#define SH_MMCIF_BBS 512 /* boot block size */ + +static inline void sh_mmcif_boot_cmd_send(void __iomem *base, + unsigned long cmd, unsigned long arg) +{ + sh_mmcif_writel(base, MMCIF_CE_INT, 0); + sh_mmcif_writel(base, MMCIF_CE_ARG, arg); + sh_mmcif_writel(base, MMCIF_CE_CMD_SET, cmd); +} + +static inline int sh_mmcif_boot_cmd_poll(void __iomem *base, unsigned long mask) +{ + unsigned long tmp; + int cnt; + + for (cnt = 0; cnt < 1000000; cnt++) { + tmp = sh_mmcif_readl(base, MMCIF_CE_INT); + if (tmp & mask) { + sh_mmcif_writel(base, MMCIF_CE_INT, tmp & ~mask); + return 0; + } + } + + return -1; +} + +static inline int sh_mmcif_boot_cmd(void __iomem *base, + unsigned long cmd, unsigned long arg) +{ + sh_mmcif_boot_cmd_send(base, cmd, arg); + return sh_mmcif_boot_cmd_poll(base, 0x00010000); +} + +static inline int sh_mmcif_boot_do_read_single(void __iomem *base, + unsigned int block_nr, + unsigned long *buf) +{ + int k; + + /* CMD13 - Status */ + sh_mmcif_boot_cmd(base, 0x0d400000, 0x00010000); + + if (sh_mmcif_readl(base, MMCIF_CE_RESP0) != 0x0900) + return -1; + + /* CMD17 - Read */ + sh_mmcif_boot_cmd(base, 0x11480000, block_nr * SH_MMCIF_BBS); + if (sh_mmcif_boot_cmd_poll(base, 0x00100000) < 0) + return -1; + + for (k = 0; k < (SH_MMCIF_BBS / 4); k++) + buf[k] = sh_mmcif_readl(base, MMCIF_CE_DATA); + + return 0; +} + +static inline int sh_mmcif_boot_do_read(void __iomem *base, + unsigned long first_block, + unsigned long nr_blocks, + void *buf) +{ + unsigned long k; + int ret = 0; + + /* In data transfer mode: Set clock to Bus clock/4 (about 20Mhz) */ + sh_mmcif_writel(base, MMCIF_CE_CLK_CTRL, + CLK_ENABLE | CLKDIV_4 | SRSPTO_256 | + SRBSYTO_29 | SRWDTO_29 | SCCSTO_29); + + /* CMD9 - Get CSD */ + sh_mmcif_boot_cmd(base, 0x09806000, 0x00010000); + + /* CMD7 - Select the card */ + sh_mmcif_boot_cmd(base, 0x07400000, 0x00010000); + + /* CMD16 - Set the block size */ + sh_mmcif_boot_cmd(base, 0x10400000, SH_MMCIF_BBS); + + for (k = 0; !ret && k < nr_blocks; k++) + ret = sh_mmcif_boot_do_read_single(base, first_block + k, + buf + (k * SH_MMCIF_BBS)); + + return ret; +} + +static inline void sh_mmcif_boot_init(void __iomem *base) +{ + /* reset */ + sh_mmcif_writel(base, MMCIF_CE_VERSION, SOFT_RST_ON); + sh_mmcif_writel(base, MMCIF_CE_VERSION, SOFT_RST_OFF); + + /* byte swap */ + sh_mmcif_writel(base, MMCIF_CE_BUF_ACC, BUF_ACC_ATYP); + + /* Set block size in MMCIF hardware */ + sh_mmcif_writel(base, MMCIF_CE_BLOCK_SET, SH_MMCIF_BBS); + + /* Enable the clock, set it to Bus clock/256 (about 325Khz). */ + sh_mmcif_writel(base, MMCIF_CE_CLK_CTRL, + CLK_ENABLE | CLKDIV_256 | SRSPTO_256 | + SRBSYTO_29 | SRWDTO_29 | SCCSTO_29); + + /* CMD0 */ + sh_mmcif_boot_cmd(base, 0x00000040, 0); + + /* CMD1 - Get OCR */ + do { + sh_mmcif_boot_cmd(base, 0x01405040, 0x40300000); /* CMD1 */ + } while ((sh_mmcif_readl(base, MMCIF_CE_RESP0) & 0x80000000) + != 0x80000000); + + /* CMD2 - Get CID */ + sh_mmcif_boot_cmd(base, 0x02806040, 0); + + /* CMD3 - Set card relative address */ + sh_mmcif_boot_cmd(base, 0x03400040, 0x00010000); +} + +#endif /* LINUX_MMC_SH_MMCIF_H */ -- cgit From 8e274732115f63c1d09136284431b3555bd5cc56 Mon Sep 17 00:00:00 2001 From: John Ogness Date: Mon, 25 Apr 2022 23:04:28 +0206 Subject: printk: extend console_lock for per-console locking Currently threaded console printers synchronize against each other using console_lock(). However, different console drivers are unrelated and do not require any synchronization between each other. Removing the synchronization between the threaded console printers will allow each console to print at its own speed. But the threaded consoles printers do still need to synchronize against console_lock() callers. Introduce a per-console mutex and a new console boolean field @blocked to provide this synchronization. console_lock() is modified so that it must acquire the mutex of each console in order to set the @blocked field. Console printing threads will acquire their mutex while printing a record. If @blocked was set, the thread will go back to sleep instead of printing. The reason for the @blocked boolean field is so that console_lock() callers do not need to acquire multiple console mutexes simultaneously, which would introduce unnecessary complexity due to nested mutex locking. Also, a new field was chosen instead of adding a new @flags value so that the blocked status could be checked without concern of reading inconsistent values due to @flags updates from other contexts. Threaded console printers also need to synchronize against console_trylock() callers. Since console_trylock() may be called from any context, the per-console mutex cannot be used for this synchronization. (mutex_trylock() cannot be called from atomic contexts.) Introduce a global atomic counter to identify if any threaded printers are active. The threaded printers will also check the atomic counter to identify if the console has been locked by another task via console_trylock(). Note that @console_sem is still used to provide synchronization between console_lock() and console_trylock() callers. A locking overview for console_lock(), console_trylock(), and the threaded printers is as follows (pseudo code): console_lock() { down(&console_sem); for_each_console(con) { mutex_lock(&con->lock); con->blocked = true; mutex_unlock(&con->lock); } /* console_lock acquired */ } console_trylock() { if (down_trylock(&console_sem) == 0) { if (atomic_cmpxchg(&console_kthreads_active, 0, -1) == 0) { /* console_lock acquired */ } } } threaded_printer() { mutex_lock(&con->lock); if (!con->blocked) { /* console_lock() callers blocked */ if (atomic_inc_unless_negative(&console_kthreads_active)) { /* console_trylock() callers blocked */ con->write(); atomic_dec(&console_lock_count); } } mutex_unlock(&con->lock); } The console owner and waiter logic now only applies between contexts that have taken the console_lock via console_trylock(). Threaded printers never take the console_lock, so they do not have a console_lock to handover. Tasks that have used console_lock() will block the threaded printers using a mutex and if the console_lock is handed over to an atomic context, it would be unable to unblock the threaded printers. However, the console_trylock() case is really the only scenario that is interesting for handovers anyway. @panic_console_dropped must change to atomic_t since it is no longer protected exclusively by the console_lock. Since threaded printers remain asleep if they see that the console is locked, they now must be explicitly woken in __console_unlock(). This means wake_up_klogd() calls following a console_unlock() are no longer necessary and are removed. Also note that threaded printers no longer need to check @console_suspended. The check for the @blocked field implicitly covers the suspended console case. Signed-off-by: John Ogness Reviewed-by: Petr Mladek Signed-off-by: Petr Mladek Link: https://lore.kernel.org/r/878rrs6ft7.fsf@jogness.linutronix.de --- include/linux/console.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/console.h b/include/linux/console.h index 9a251e70c090..143653090c48 100644 --- a/include/linux/console.h +++ b/include/linux/console.h @@ -16,6 +16,7 @@ #include #include +#include struct vc_data; struct console_font_op; @@ -154,6 +155,20 @@ struct console { u64 seq; unsigned long dropped; struct task_struct *thread; + bool blocked; + + /* + * The per-console lock is used by printing kthreads to synchronize + * this console with callers of console_lock(). This is necessary in + * order to allow printing kthreads to run in parallel to each other, + * while each safely accessing the @blocked field and synchronizing + * against direct printing via console_lock/console_unlock. + * + * Note: For synchronizing against direct printing via + * console_trylock/console_unlock, see the static global + * variable @console_kthreads_active. + */ + struct mutex lock; void *data; struct console *next; -- cgit From 33337d03f04f9ea3a50ac2d3490a5d7a3ba9be82 Mon Sep 17 00:00:00 2001 From: Dylan Yudaken Date: Tue, 26 Apr 2022 01:29:05 -0700 Subject: io_uring: add io_uring_get_opcode In some debug scenarios it is useful to have the text representation of the opcode. Add this function in preparation. Signed-off-by: Dylan Yudaken Link: https://lore.kernel.org/r/20220426082907.3600028-3-dylany@fb.com Signed-off-by: Jens Axboe --- include/linux/io_uring.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index 1814e698d861..24651c229ed2 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -10,6 +10,7 @@ struct sock *io_uring_get_socket(struct file *file); void __io_uring_cancel(bool cancel_all); void __io_uring_free(struct task_struct *tsk); void io_uring_unreg_ringfd(void); +const char *io_uring_get_opcode(u8 opcode); static inline void io_uring_files_cancel(void) { @@ -42,6 +43,10 @@ static inline void io_uring_files_cancel(void) static inline void io_uring_free(struct task_struct *tsk) { } +static inline const char *io_uring_get_opcode(u8 opcode) +{ + return ""; +} #endif #endif -- cgit From 54c861f930180e096483a6e1744c9c4b76e20da5 Mon Sep 17 00:00:00 2001 From: Daniel Ammann Date: Tue, 5 Apr 2022 14:54:25 +0200 Subject: mfd: tps65218: Fix trivial typo in comment The driver is for TPS65218, not TPS65219. Signed-off-by: Daniel Ammann Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220405125426.28016-1-daniel.ammann@bytesatwork.ch --- include/linux/mfd/tps65218.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mfd/tps65218.h b/include/linux/mfd/tps65218.h index f4ca367e3473..122e24ddbd4b 100644 --- a/include/linux/mfd/tps65218.h +++ b/include/linux/mfd/tps65218.h @@ -1,7 +1,7 @@ /* * linux/mfd/tps65218.h * - * Functions to access TPS65219 power management chip. + * Functions to access TPS65218 power management chip. * * Copyright (C) 2014 Texas Instruments Incorporated - https://www.ti.com/ * -- cgit From 7f5aaa4a0ae6948eace6cf30f40a3ec27ece8441 Mon Sep 17 00:00:00 2001 From: Maíra Canal Date: Wed, 6 Apr 2022 11:40:41 -0300 Subject: mfd: hi655x-pmic: Replace legacy gpio interface for gpiod interface MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Considering the current transition of the GPIO subsystem, remove all dependencies of the legacy GPIO interface (linux/gpio.h and linux /of_gpio.h) and replace it with the descriptor-based GPIO approach. Signed-off-by: Maíra Canal Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/Yk2maZuf+5FGL+eg@fedora --- include/linux/mfd/hi655x-pmic.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mfd/hi655x-pmic.h b/include/linux/mfd/hi655x-pmic.h index af5d97239c0d..6a012784dd1b 100644 --- a/include/linux/mfd/hi655x-pmic.h +++ b/include/linux/mfd/hi655x-pmic.h @@ -12,6 +12,8 @@ #ifndef __HI655X_PMIC_H #define __HI655X_PMIC_H +#include + /* Hi655x registers are mapped to memory bus in 4 bytes stride */ #define HI655X_STRIDE 4 #define HI655X_BUS_ADDR(x) ((x) << 2) @@ -53,7 +55,7 @@ struct hi655x_pmic { struct resource *res; struct device *dev; struct regmap *regmap; - int gpio; + struct gpio_desc *gpio; unsigned int ver; struct regmap_irq_chip_data *irq_data; }; -- cgit From 82028ba4d590c4e9eb3c93326019db7c26d02e35 Mon Sep 17 00:00:00 2001 From: Fabien Parent Date: Fri, 15 Apr 2022 17:36:24 +0200 Subject: mfd: mt6359: Add missing defines necessary for mtk-pmic-keys support Add 2 missing MT6359 registers that are needed to implement the keyboard driver. Signed-off-by: Fabien Parent Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220415153629.1817202-3-fparent@baylibre.com --- include/linux/mfd/mt6359/registers.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mfd/mt6359/registers.h b/include/linux/mfd/mt6359/registers.h index 2135c9695918..2a4394a27b1c 100644 --- a/include/linux/mfd/mt6359/registers.h +++ b/include/linux/mfd/mt6359/registers.h @@ -8,6 +8,8 @@ /* PMIC Registers */ #define MT6359_SWCID 0xa +#define MT6359_TOPSTATUS 0x2a +#define MT6359_TOP_RST_MISC 0x14c #define MT6359_MISC_TOP_INT_CON0 0x188 #define MT6359_MISC_TOP_INT_STATUS0 0x194 #define MT6359_TOP_INT_STATUS0 0x19e -- cgit From c317ab71facc2cd0a94145973318a4c914e11acc Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Mon, 25 Apr 2022 21:32:47 +0800 Subject: bpf: Compute map_btf_id during build time For now, the field 'map_btf_id' in 'struct bpf_map_ops' for all map types are computed during vmlinux-btf init: btf_parse_vmlinux() -> btf_vmlinux_map_ids_init() It will lookup the btf_type according to the 'map_btf_name' field in 'struct bpf_map_ops'. This process can be done during build time, thanks to Jiri's resolve_btfids. selftest of map_ptr has passed: $96 map_ptr:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Reported-by: kernel test robot Signed-off-by: Menglong Dong Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 0af5793ba417..be94833d390a 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -148,8 +148,7 @@ struct bpf_map_ops { bpf_callback_t callback_fn, void *callback_ctx, u64 flags); - /* BTF name and id of struct allocated by map_alloc */ - const char * const map_btf_name; + /* BTF id of struct allocated by map_alloc */ int *map_btf_id; /* bpf_iter info used to open a seq_file */ -- cgit From 3ce0f2373f7073f04715b1ffd7b1812d3183f79a Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 5 Apr 2022 15:12:57 +0800 Subject: compat: consolidate the compat_flock{,64} definition Provide a single common definition for the compat_flock and compat_flock64 structures using the same tricks as for the native variants. Another extra define is added for the packing required on x86. Signed-off-by: Christoph Hellwig Signed-off-by: Guo Ren Reviewed-by: Arnd Bergmann Tested-by: Heiko Stuebner Acked-by: Helge Deller # parisc Link: https://lore.kernel.org/r/20220405071314.3225832-4-guoren@kernel.org Signed-off-by: Palmer Dabbelt --- include/linux/compat.h | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) (limited to 'include/linux') diff --git a/include/linux/compat.h b/include/linux/compat.h index 1c758b0e0359..a0481fe6c5d5 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -258,6 +258,37 @@ struct compat_rlimit { compat_ulong_t rlim_max; }; +#ifdef __ARCH_NEED_COMPAT_FLOCK64_PACKED +#define __ARCH_COMPAT_FLOCK64_PACK __attribute__((packed)) +#else +#define __ARCH_COMPAT_FLOCK64_PACK +#endif + +struct compat_flock { + short l_type; + short l_whence; + compat_off_t l_start; + compat_off_t l_len; +#ifdef __ARCH_COMPAT_FLOCK_EXTRA_SYSID + __ARCH_COMPAT_FLOCK_EXTRA_SYSID +#endif + compat_pid_t l_pid; +#ifdef __ARCH_COMPAT_FLOCK_PAD + __ARCH_COMPAT_FLOCK_PAD +#endif +}; + +struct compat_flock64 { + short l_type; + short l_whence; + compat_loff_t l_start; + compat_loff_t l_len; + compat_pid_t l_pid; +#ifdef __ARCH_COMPAT_FLOCK64_PAD + __ARCH_COMPAT_FLOCK64_PAD +#endif +} __ARCH_COMPAT_FLOCK64_PACK; + struct compat_rusage { struct old_timeval32 ru_utime; struct old_timeval32 ru_stime; -- cgit From 59c10c52f573faca862cda5ebcdd43831608eb5a Mon Sep 17 00:00:00 2001 From: Guo Ren Date: Tue, 5 Apr 2022 15:13:05 +0800 Subject: riscv: compat: syscall: Add compat_sys_call_table implementation Implement compat sys_call_table and some system call functions: truncate64, ftruncate64, fallocate, pread64, pwrite64, sync_file_range, readahead, fadvise64_64 which need argument translation. Signed-off-by: Guo Ren Signed-off-by: Guo Ren Reviewed-by: Arnd Bergmann Tested-by: Heiko Stuebner Link: https://lore.kernel.org/r/20220405071314.3225832-12-guoren@kernel.org Signed-off-by: Palmer Dabbelt --- include/linux/compat.h | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) (limited to 'include/linux') diff --git a/include/linux/compat.h b/include/linux/compat.h index a0481fe6c5d5..ce6cd9b30d27 100644 --- a/include/linux/compat.h +++ b/include/linux/compat.h @@ -926,6 +926,43 @@ asmlinkage long compat_sys_sigaction(int sig, /* obsolete: net/socket.c */ asmlinkage long compat_sys_socketcall(int call, u32 __user *args); +#ifdef __ARCH_WANT_COMPAT_TRUNCATE64 +asmlinkage long compat_sys_truncate64(const char __user *pathname, compat_arg_u64(len)); +#endif + +#ifdef __ARCH_WANT_COMPAT_FTRUNCATE64 +asmlinkage long compat_sys_ftruncate64(unsigned int fd, compat_arg_u64(len)); +#endif + +#ifdef __ARCH_WANT_COMPAT_FALLOCATE +asmlinkage long compat_sys_fallocate(int fd, int mode, compat_arg_u64(offset), + compat_arg_u64(len)); +#endif + +#ifdef __ARCH_WANT_COMPAT_PREAD64 +asmlinkage long compat_sys_pread64(unsigned int fd, char __user *buf, size_t count, + compat_arg_u64(pos)); +#endif + +#ifdef __ARCH_WANT_COMPAT_PWRITE64 +asmlinkage long compat_sys_pwrite64(unsigned int fd, const char __user *buf, size_t count, + compat_arg_u64(pos)); +#endif + +#ifdef __ARCH_WANT_COMPAT_SYNC_FILE_RANGE +asmlinkage long compat_sys_sync_file_range(int fd, compat_arg_u64(pos), + compat_arg_u64(nbytes), unsigned int flags); +#endif + +#ifdef __ARCH_WANT_COMPAT_FADVISE64_64 +asmlinkage long compat_sys_fadvise64_64(int fd, compat_arg_u64(pos), + compat_arg_u64(len), int advice); +#endif + +#ifdef __ARCH_WANT_COMPAT_READAHEAD +asmlinkage long compat_sys_readahead(int fd, compat_arg_u64(offset), size_t count); +#endif + #endif /* CONFIG_ARCH_HAS_SYSCALL_WRAPPER */ /** -- cgit From a2a9d67a26ec94a99ed29efbd61cf5be0a575678 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Wed, 6 Apr 2022 11:31:19 +0900 Subject: bootconfig: Support embedding a bootconfig file in kernel This allows kernel developer to embed a default bootconfig file in the kernel instead of embedding it in the initrd. This will be good for who are using the kernel without initrd, or who needs a default bootconfigs. This needs to set two kconfigs: CONFIG_BOOT_CONFIG_EMBED=y and set the file path to CONFIG_BOOT_CONFIG_EMBED_FILE. Note that you still need 'bootconfig' command line option to load the embedded bootconfig. Also if you boot using an initrd with a different bootconfig, the kernel will use the bootconfig in the initrd, instead of the default bootconfig. Link: https://lkml.kernel.org/r/164921227943.1090670.14035119557571329218.stgit@devnote2 Cc: Padmanabha Srinivasaiah Cc: Jonathan Corbet Cc: Randy Dunlap Cc: Nick Desaulniers Cc: Sami Tolvanen Cc: Nathan Chancellor Cc: Masahiro Yamada Cc: Linux Kbuild mailing list Signed-off-by: Masami Hiramatsu Signed-off-by: Steven Rostedt (Google) --- include/linux/bootconfig.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bootconfig.h b/include/linux/bootconfig.h index a4665c7ab07c..1611f9db878e 100644 --- a/include/linux/bootconfig.h +++ b/include/linux/bootconfig.h @@ -289,4 +289,14 @@ int __init xbc_get_info(int *node_size, size_t *data_size); /* XBC cleanup data structures */ void __init xbc_exit(void); +/* XBC embedded bootconfig data in kernel */ +#ifdef CONFIG_BOOT_CONFIG_EMBED +const char * __init xbc_get_embedded_bootconfig(size_t *size); +#else +static inline const char *xbc_get_embedded_bootconfig(size_t *size) +{ + return NULL; +} +#endif + #endif -- cgit From 68822bdf76f10c3dc80609d4e2cdc1e847429086 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 22 Apr 2022 13:12:37 -0700 Subject: net: generalize skb freeing deferral to per-cpu lists Logic added in commit f35f821935d8 ("tcp: defer skb freeing after socket lock is released") helped bulk TCP flows to move the cost of skbs frees outside of critical section where socket lock was held. But for RPC traffic, or hosts with RFS enabled, the solution is far from being ideal. For RPC traffic, recvmsg() has to return to user space right after skb payload has been consumed, meaning that BH handler has no chance to pick the skb before recvmsg() thread. This issue is more visible with BIG TCP, as more RPC fit one skb. For RFS, even if BH handler picks the skbs, they are still picked from the cpu on which user thread is running. Ideally, it is better to free the skbs (and associated page frags) on the cpu that originally allocated them. This patch removes the per socket anchor (sk->defer_list) and instead uses a per-cpu list, which will hold more skbs per round. This new per-cpu list is drained at the end of net_action_rx(), after incoming packets have been processed, to lower latencies. In normal conditions, skbs are added to the per-cpu list with no further action. In the (unlikely) cases where the cpu does not run net_action_rx() handler fast enough, we use an IPI to raise NET_RX_SOFTIRQ on the remote cpu. Also, we do not bother draining the per-cpu list from dev_cpu_dead() This is because skbs in this list have no requirement on how fast they should be freed. Note that we can add in the future a small per-cpu cache if we see any contention on sd->defer_lock. Tested on a pair of hosts with 100Gbit NIC, RFS enabled, and /proc/sys/net/ipv4/tcp_rmem[2] tuned to 16MB to work around page recycling strategy used by NIC driver (its page pool capacity being too small compared to number of skbs/pages held in sockets receive queues) Note that this tuning was only done to demonstrate worse conditions for skb freeing for this particular test. These conditions can happen in more general production workload. 10 runs of one TCP_STREAM flow Before: Average throughput: 49685 Mbit. Kernel profiles on cpu running user thread recvmsg() show high cost for skb freeing related functions (*) 57.81% [kernel] [k] copy_user_enhanced_fast_string (*) 12.87% [kernel] [k] skb_release_data (*) 4.25% [kernel] [k] __free_one_page (*) 3.57% [kernel] [k] __list_del_entry_valid 1.85% [kernel] [k] __netif_receive_skb_core 1.60% [kernel] [k] __skb_datagram_iter (*) 1.59% [kernel] [k] free_unref_page_commit (*) 1.16% [kernel] [k] __slab_free 1.16% [kernel] [k] _copy_to_iter (*) 1.01% [kernel] [k] kfree (*) 0.88% [kernel] [k] free_unref_page 0.57% [kernel] [k] ip6_rcv_core 0.55% [kernel] [k] ip6t_do_table 0.54% [kernel] [k] flush_smp_call_function_queue (*) 0.54% [kernel] [k] free_pcppages_bulk 0.51% [kernel] [k] llist_reverse_order 0.38% [kernel] [k] process_backlog (*) 0.38% [kernel] [k] free_pcp_prepare 0.37% [kernel] [k] tcp_recvmsg_locked (*) 0.37% [kernel] [k] __list_add_valid 0.34% [kernel] [k] sock_rfree 0.34% [kernel] [k] _raw_spin_lock_irq (*) 0.33% [kernel] [k] __page_cache_release 0.33% [kernel] [k] tcp_v6_rcv (*) 0.33% [kernel] [k] __put_page (*) 0.29% [kernel] [k] __mod_zone_page_state 0.27% [kernel] [k] _raw_spin_lock After patch: Average throughput: 73076 Mbit. Kernel profiles on cpu running user thread recvmsg() looks better: 81.35% [kernel] [k] copy_user_enhanced_fast_string 1.95% [kernel] [k] _copy_to_iter 1.95% [kernel] [k] __skb_datagram_iter 1.27% [kernel] [k] __netif_receive_skb_core 1.03% [kernel] [k] ip6t_do_table 0.60% [kernel] [k] sock_rfree 0.50% [kernel] [k] tcp_v6_rcv 0.47% [kernel] [k] ip6_rcv_core 0.45% [kernel] [k] read_tsc 0.44% [kernel] [k] _raw_spin_lock_irqsave 0.37% [kernel] [k] _raw_spin_lock 0.37% [kernel] [k] native_irq_return_iret 0.33% [kernel] [k] __inet6_lookup_established 0.31% [kernel] [k] ip6_protocol_deliver_rcu 0.29% [kernel] [k] tcp_rcv_established 0.29% [kernel] [k] llist_reverse_order v2: kdoc issue (kernel bots) do not defer if (alloc_cpu == smp_processor_id()) (Paolo) replace the sk_buff_head with a single-linked list (Jakub) add a READ_ONCE()/WRITE_ONCE() for the lockless read of sd->defer_list Signed-off-by: Eric Dumazet Acked-by: Paolo Abeni Link: https://lore.kernel.org/r/20220422201237.416238-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 5 +++++ include/linux/skbuff.h | 3 +++ 2 files changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 7dccbfd1bf56..ac8a5f71220a 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3081,6 +3081,11 @@ struct softnet_data { struct sk_buff_head input_pkt_queue; struct napi_struct backlog; + /* Another possibly contended cache line */ + spinlock_t defer_lock ____cacheline_aligned_in_smp; + int defer_count; + struct sk_buff *defer_list; + call_single_data_t defer_csd; }; static inline void input_queue_head_incr(struct softnet_data *sd) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 84d78df60453..5cbc184ca685 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -888,6 +888,7 @@ typedef unsigned char *sk_buff_data_t; * delivery_time at egress. * @napi_id: id of the NAPI struct this skb came from * @sender_cpu: (aka @napi_id) source CPU in XPS + * @alloc_cpu: CPU which did the skb allocation. * @secmark: security marking * @mark: Generic packet mark * @reserved_tailroom: (aka @mark) number of bytes of free space available @@ -1080,6 +1081,7 @@ struct sk_buff { unsigned int sender_cpu; }; #endif + u16 alloc_cpu; #ifdef CONFIG_NETWORK_SECMARK __u32 secmark; #endif @@ -1321,6 +1323,7 @@ struct sk_buff *__build_skb(void *data, unsigned int frag_size); struct sk_buff *build_skb(void *data, unsigned int frag_size); struct sk_buff *build_skb_around(struct sk_buff *skb, void *data, unsigned int frag_size); +void skb_attempt_defer_free(struct sk_buff *skb); struct sk_buff *napi_build_skb(void *data, unsigned int frag_size); -- cgit From 6510ea973d8d9d4a0cb2fb557b36bd1ab3eb49f6 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Mon, 25 Apr 2022 18:39:46 +0200 Subject: net: Use this_cpu_inc() to increment net->core_stats The macro dev_core_stats_##FIELD##_inc() disables preemption and invokes netdev_core_stats_alloc() to return a per-CPU pointer. netdev_core_stats_alloc() will allocate memory on its first invocation which breaks on PREEMPT_RT because it requires non-atomic context for memory allocation. This can be avoided by enabling preemption in netdev_core_stats_alloc() assuming the caller always disables preemption. It might be better to replace local_inc() with this_cpu_inc() now that dev_core_stats_##FIELD##_inc() gained a preempt-disable section and does not rely on already disabled preemption. This results in less instructions on x86-64: local_inc: | incl %gs:__preempt_count(%rip) # __preempt_count | movq 488(%rdi), %rax # _1->core_stats, _22 | testq %rax, %rax # _22 | je .L585 #, | add %gs:this_cpu_off(%rip), %rax # this_cpu_off, tcp_ptr__ | .L586: | testq %rax, %rax # _27 | je .L587 #, | incq (%rax) # _6->a.counter | .L587: | decl %gs:__preempt_count(%rip) # __preempt_count this_cpu_inc(), this patch: | movq 488(%rdi), %rax # _1->core_stats, _5 | testq %rax, %rax # _5 | je .L591 #, | .L585: | incq %gs:(%rax) # _18->rx_dropped Use unsigned long as type for the counter. Use this_cpu_inc() to increment the counter. Use a plain read of the counter. Signed-off-by: Sebastian Andrzej Siewior Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/YmbO0pxgtKpCw4SY@linutronix.de Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 59e27a2b7bf0..b1fbe21650bb 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -199,10 +199,10 @@ struct net_device_stats { * Try to fit them in a single cache line, for dev_get_stats() sake. */ struct net_device_core_stats { - local_t rx_dropped; - local_t tx_dropped; - local_t rx_nohandler; -} __aligned(4 * sizeof(local_t)); + unsigned long rx_dropped; + unsigned long tx_dropped; + unsigned long rx_nohandler; +} __aligned(4 * sizeof(unsigned long)); #include #include @@ -3843,15 +3843,15 @@ static __always_inline bool __is_skb_forwardable(const struct net_device *dev, return false; } -struct net_device_core_stats *netdev_core_stats_alloc(struct net_device *dev); +struct net_device_core_stats __percpu *netdev_core_stats_alloc(struct net_device *dev); -static inline struct net_device_core_stats *dev_core_stats(struct net_device *dev) +static inline struct net_device_core_stats __percpu *dev_core_stats(struct net_device *dev) { /* This READ_ONCE() pairs with the write in netdev_core_stats_alloc() */ struct net_device_core_stats __percpu *p = READ_ONCE(dev->core_stats); if (likely(p)) - return this_cpu_ptr(p); + return p; return netdev_core_stats_alloc(dev); } @@ -3859,14 +3859,11 @@ static inline struct net_device_core_stats *dev_core_stats(struct net_device *de #define DEV_CORE_STATS_INC(FIELD) \ static inline void dev_core_stats_##FIELD##_inc(struct net_device *dev) \ { \ - struct net_device_core_stats *p; \ + struct net_device_core_stats __percpu *p; \ \ - preempt_disable(); \ p = dev_core_stats(dev); \ - \ if (p) \ - local_inc(&p->FIELD); \ - preempt_enable(); \ + this_cpu_inc(p->FIELD); \ } DEV_CORE_STATS_INC(rx_dropped) DEV_CORE_STATS_INC(tx_dropped) -- cgit From 6423d2951087231706246f81851067f7f0593d4a Mon Sep 17 00:00:00 2001 From: Won Chung Date: Mon, 14 Mar 2022 19:54:58 +0000 Subject: driver core: Add sysfs support for physical location of a device When ACPI table includes _PLD fields for a device, create a new directory (physical_location) in sysfs to share _PLD fields. Currently without PLD information, when there are multiple of same devices, it is hard to distinguish which device corresponds to which physical device at which location. For example, when there are two Type C connectors, it is hard to find out which connector corresponds to the Type C port on the left panel versus the Type C port on the right panel. With PLD information provided, we can determine which specific device at which location is doing what. _PLD output includes much more fields, but only generic fields are added and exposed to sysfs, so that non-ACPI devices can also support it in the future. The minimal generic fields needed for locating a device are the following. - panel - vertical_position - horizontal_position - dock - lid Signed-off-by: Won Chung Link: https://lore.kernel.org/r/20220314195458.271430-1-wonchung@google.com Signed-off-by: Greg Kroah-Hartman --- include/linux/device.h | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 73 insertions(+) (limited to 'include/linux') diff --git a/include/linux/device.h b/include/linux/device.h index 93459724dcde..766fbea6ca83 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -386,6 +386,75 @@ struct dev_msi_info { #endif }; +/** + * enum device_physical_location_panel - Describes which panel surface of the + * system's housing the device connection point resides on. + * @DEVICE_PANEL_TOP: Device connection point is on the top panel. + * @DEVICE_PANEL_BOTTOM: Device connection point is on the bottom panel. + * @DEVICE_PANEL_LEFT: Device connection point is on the left panel. + * @DEVICE_PANEL_RIGHT: Device connection point is on the right panel. + * @DEVICE_PANEL_FRONT: Device connection point is on the front panel. + * @DEVICE_PANEL_BACK: Device connection point is on the back panel. + * @DEVICE_PANEL_UNKNOWN: The panel with device connection point is unknown. + */ +enum device_physical_location_panel { + DEVICE_PANEL_TOP, + DEVICE_PANEL_BOTTOM, + DEVICE_PANEL_LEFT, + DEVICE_PANEL_RIGHT, + DEVICE_PANEL_FRONT, + DEVICE_PANEL_BACK, + DEVICE_PANEL_UNKNOWN, +}; + +/** + * enum device_physical_location_vertical_position - Describes vertical + * position of the device connection point on the panel surface. + * @DEVICE_VERT_POS_UPPER: Device connection point is at upper part of panel. + * @DEVICE_VERT_POS_CENTER: Device connection point is at center part of panel. + * @DEVICE_VERT_POS_LOWER: Device connection point is at lower part of panel. + */ +enum device_physical_location_vertical_position { + DEVICE_VERT_POS_UPPER, + DEVICE_VERT_POS_CENTER, + DEVICE_VERT_POS_LOWER, +}; + +/** + * enum device_physical_location_horizontal_position - Describes horizontal + * position of the device connection point on the panel surface. + * @DEVICE_HORI_POS_LEFT: Device connection point is at left part of panel. + * @DEVICE_HORI_POS_CENTER: Device connection point is at center part of panel. + * @DEVICE_HORI_POS_RIGHT: Device connection point is at right part of panel. + */ +enum device_physical_location_horizontal_position { + DEVICE_HORI_POS_LEFT, + DEVICE_HORI_POS_CENTER, + DEVICE_HORI_POS_RIGHT, +}; + +/** + * struct device_physical_location - Device data related to physical location + * of the device connection point. + * @panel: Panel surface of the system's housing that the device connection + * point resides on. + * @vertical_position: Vertical position of the device connection point within + * the panel. + * @horizontal_position: Horizontal position of the device connection point + * within the panel. + * @dock: Set if the device connection point resides in a docking station or + * port replicator. + * @lid: Set if this device connection point resides on the lid of laptop + * system. + */ +struct device_physical_location { + enum device_physical_location_panel panel; + enum device_physical_location_vertical_position vertical_position; + enum device_physical_location_horizontal_position horizontal_position; + bool dock; + bool lid; +}; + /** * struct device - The basic device structure * @parent: The device's "parent" device, the device to which it is attached. @@ -453,6 +522,8 @@ struct dev_msi_info { * device (i.e. the bus driver that discovered the device). * @iommu_group: IOMMU group the device belongs to. * @iommu: Per device generic IOMMU runtime data + * @physical_location: Describes physical location of the device connection + * point in the system housing. * @removable: Whether the device can be removed from the system. This * should be set by the subsystem / bus driver that discovered * the device. @@ -567,6 +638,8 @@ struct device { struct iommu_group *iommu_group; struct dev_iommu *iommu; + struct device_physical_location *physical_location; + enum device_removable removable; bool offline_disabled:1; -- cgit From b041b525dab95352fbd666b14dc73ab898df465f Mon Sep 17 00:00:00 2001 From: Tony Luck Date: Thu, 10 Mar 2022 12:48:53 -0800 Subject: x86/split_lock: Make life miserable for split lockers In https://lore.kernel.org/all/87y22uujkm.ffs@tglx/ Thomas said: Its's simply wishful thinking that stuff gets fixed because of a WARN_ONCE(). This has never worked. The only thing which works is to make stuff fail hard or slow it down in a way which makes it annoying enough to users to complain. He was talking about WBINVD. But it made me think about how we use the split lock detection feature in Linux. Existing code has three options for applications: 1) Don't enable split lock detection (allow arbitrary split locks) 2) Warn once when a process uses split lock, but let the process keep running with split lock detection disabled 3) Kill process that use split locks Option 2 falls into the "wishful thinking" territory that Thomas warns does nothing. But option 3 might not be viable in a situation with legacy applications that need to run. Hence make option 2 much stricter to "slow it down in a way which makes it annoying". Primary reason for this change is to provide better quality of service to the rest of the applications running on the system. Internal testing shows that even with many processes splitting locks, performance for the rest of the system is much more responsive. The new "warn" mode operates like this. When an application tries to execute a bus lock the #AC handler. 1) Delays (interruptibly) 10 ms before moving to next step. 2) Blocks (interruptibly) until it can get the semaphore If interrupted, just return. Assume the signal will either kill the task, or direct execution away from the instruction that is trying to get the bus lock. 3) Disables split lock detection for the current core 4) Schedules a work queue to re-enable split lock detect in 2 jiffies 5) Returns The work queue that re-enables split lock detection also releases the semaphore. There is a corner case where a CPU may be taken offline while split lock detection is disabled. A CPU hotplug handler handles this case. Old behaviour was to only print the split lock warning on the first occurrence of a split lock from a task. Preserve that by adding a flag to the task structure that suppresses subsequent split lock messages from that task. Signed-off-by: Tony Luck Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/20220310204854.31752-2-tony.luck@intel.com --- include/linux/sched.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index a8911b1f35aa..23e03c7824c4 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -941,6 +941,9 @@ struct task_struct { #ifdef CONFIG_IOMMU_SVA unsigned pasid_activated:1; #endif +#ifdef CONFIG_CPU_SUP_INTEL + unsigned reported_split_lock:1; +#endif unsigned long atomic_flags; /* Flags requiring atomic access. */ -- cgit From 4fd62f15afa0d0da4823f429a2fb4c3492a84edf Mon Sep 17 00:00:00 2001 From: Chuanhong Guo Date: Sun, 24 Apr 2022 11:25:23 +0800 Subject: mtd: nand: make mtk_ecc.c a separated module this code will be used in mediatek snfi spi-mem controller with pipelined ECC engine. Signed-off-by: Chuanhong Guo Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20220424032527.673605-2-gch981213@gmail.com --- include/linux/mtd/nand-ecc-mtk.h | 47 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 47 insertions(+) create mode 100644 include/linux/mtd/nand-ecc-mtk.h (limited to 'include/linux') diff --git a/include/linux/mtd/nand-ecc-mtk.h b/include/linux/mtd/nand-ecc-mtk.h new file mode 100644 index 000000000000..0e48c36e6ca0 --- /dev/null +++ b/include/linux/mtd/nand-ecc-mtk.h @@ -0,0 +1,47 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* + * MTK SDG1 ECC controller + * + * Copyright (c) 2016 Mediatek + * Authors: Xiaolei Li + * Jorge Ramirez-Ortiz + */ + +#ifndef __DRIVERS_MTD_NAND_MTK_ECC_H__ +#define __DRIVERS_MTD_NAND_MTK_ECC_H__ + +#include + +enum mtk_ecc_mode {ECC_DMA_MODE = 0, ECC_NFI_MODE = 1}; +enum mtk_ecc_operation {ECC_ENCODE, ECC_DECODE}; + +struct device_node; +struct mtk_ecc; + +struct mtk_ecc_stats { + u32 corrected; + u32 bitflips; + u32 failed; +}; + +struct mtk_ecc_config { + enum mtk_ecc_operation op; + enum mtk_ecc_mode mode; + dma_addr_t addr; + u32 strength; + u32 sectors; + u32 len; +}; + +int mtk_ecc_encode(struct mtk_ecc *, struct mtk_ecc_config *, u8 *, u32); +void mtk_ecc_get_stats(struct mtk_ecc *, struct mtk_ecc_stats *, int); +int mtk_ecc_wait_done(struct mtk_ecc *, enum mtk_ecc_operation); +int mtk_ecc_enable(struct mtk_ecc *, struct mtk_ecc_config *); +void mtk_ecc_disable(struct mtk_ecc *); +void mtk_ecc_adjust_strength(struct mtk_ecc *ecc, u32 *p); +unsigned int mtk_ecc_get_parity_bits(struct mtk_ecc *ecc); + +struct mtk_ecc *of_mtk_ecc_get(struct device_node *); +void mtk_ecc_release(struct mtk_ecc *); + +#endif -- cgit From e5be15767e7e284351853cbaba80cde8620341fb Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Mon, 25 Apr 2022 08:07:48 -0400 Subject: hex2bin: make the function hex_to_bin constant-time The function hex2bin is used to load cryptographic keys into device mapper targets dm-crypt and dm-integrity. It should take constant time independent on the processed data, so that concurrently running unprivileged code can't infer any information about the keys via microarchitectural convert channels. This patch changes the function hex_to_bin so that it contains no branches and no memory accesses. Note that this shouldn't cause performance degradation because the size of the new function is the same as the size of the old function (on x86-64) - and the new function causes no branch misprediction penalties. I compile-tested this function with gcc on aarch64 alpha arm hppa hppa64 i386 ia64 m68k mips32 mips64 powerpc powerpc64 riscv sh4 s390x sparc32 sparc64 x86_64 and with clang on aarch64 arm hexagon i386 mips32 mips64 powerpc powerpc64 s390x sparc32 sparc64 x86_64 to verify that there are no branches in the generated code. Signed-off-by: Mikulas Patocka Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds --- include/linux/kernel.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kernel.h b/include/linux/kernel.h index a890428bcc1a..fe6efb24d151 100644 --- a/include/linux/kernel.h +++ b/include/linux/kernel.h @@ -285,7 +285,7 @@ static inline char *hex_byte_pack_upper(char *buf, u8 byte) return buf; } -extern int hex_to_bin(char ch); +extern int hex_to_bin(unsigned char ch); extern int __must_check hex2bin(u8 *dst, const char *src, size_t count); extern char *bin2hex(char *dst, const void *src, size_t count); -- cgit From 7d84c1ebf9ddafca27b481e6da7d24a023dacaa2 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Fri, 15 Apr 2022 21:20:02 +0200 Subject: x86/aperfmperf: Replace aperfmperf_get_khz() The frequency invariance infrastructure provides the APERF/MPERF samples already. Utilize them for the cpu frequency display in /proc/cpuinfo. The sample is considered valid for 20ms. So for idle or isolated NOHZ full CPUs the function returns 0, which is matching the previous behaviour. This gets rid of the mass IPIs and a delay of 20ms for stabilizing observed by Eric when reading /proc/cpuinfo. Reported-by: Eric Dumazet Signed-off-by: Thomas Gleixner Tested-by: Eric Dumazet Reviewed-by: Rafael J. Wysocki Acked-by: Peter Zijlstra (Intel) Acked-by: Paul E. McKenney Link: https://lore.kernel.org/r/20220415161206.875029458@linutronix.de --- include/linux/cpufreq.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h index 35c7d6db4139..d5595d57f4e5 100644 --- a/include/linux/cpufreq.h +++ b/include/linux/cpufreq.h @@ -1199,7 +1199,6 @@ static inline void sched_cpufreq_governor_change(struct cpufreq_policy *policy, struct cpufreq_governor *old_gov) { } #endif -extern void arch_freq_prepare_all(void); extern unsigned int arch_freq_get_on_cpu(int cpu); #ifndef arch_set_freq_scale -- cgit From b941820ec938d175fd6e4e7ba5b4a6ca9d092afd Mon Sep 17 00:00:00 2001 From: Heikki Krogerus Date: Mon, 25 Apr 2022 14:45:44 +0300 Subject: ACPI: OSL: Remove the helper for deactivating memory region There are no more users for acpi_release_memory(). Signed-off-by: Heikki Krogerus Reviewed-by: Greg Kroah-Hartman Signed-off-by: Rafael J. Wysocki --- include/linux/acpi.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index d7136d13aa44..fadda404bcc9 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -520,9 +520,6 @@ int acpi_check_resource_conflict(const struct resource *res); int acpi_check_region(resource_size_t start, resource_size_t n, const char *name); -acpi_status acpi_release_memory(acpi_handle handle, struct resource *res, - u32 level); - int acpi_resources_are_enforced(void); #ifdef CONFIG_HIBERNATION -- cgit From 0a8e98305f63deaf0a799d5cf5532cc83af035d1 Mon Sep 17 00:00:00 2001 From: Tokunori Ikegami Date: Thu, 24 Mar 2022 02:04:56 +0900 Subject: mtd: cfi_cmdset_0002: Use chip_ready() for write on S29GL064N Since commit dfeae1073583("mtd: cfi_cmdset_0002: Change write buffer to check correct value") buffered writes fail on S29GL064N. This is because, on S29GL064N, reads return 0xFF at the end of DQ polling for write completion, where as, chip_good() check expects actual data written to the last location to be returned post DQ polling completion. Fix is to revert to using chip_good() for S29GL064N which only checks for DQ lines to settle down to determine write completion. Link: https://lore.kernel.org/r/b687c259-6413-26c9-d4c9-b3afa69ea124@pengutronix.de/ Fixes: dfeae1073583("mtd: cfi_cmdset_0002: Change write buffer to check correct value") Cc: stable@vger.kernel.org Signed-off-by: Tokunori Ikegami Acked-by: Vignesh Raghavendra Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20220323170458.5608-3-ikegami.t@gmail.com --- include/linux/mtd/cfi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mtd/cfi.h b/include/linux/mtd/cfi.h index fd1ecb821106..d88bb56c18e2 100644 --- a/include/linux/mtd/cfi.h +++ b/include/linux/mtd/cfi.h @@ -286,6 +286,7 @@ struct cfi_private { map_word sector_erase_cmd; unsigned long chipshift; /* Because they're of the same type */ const char *im_name; /* inter_module name for cmdset_setup */ + unsigned long quirks; struct flchip chips[]; /* per-chip data structure for each chip */ }; -- cgit From ed36d04e8f8d7b00db451b0fa56a54e8e02ec43e Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Mon, 25 Apr 2022 13:42:02 +0100 Subject: iommu: Introduce device_iommu_capable() iommu_capable() only really works for systems where all IOMMU instances are completely homogeneous, and all devices are IOMMU-mapped. Implement the new variant which will be able to give a more accurate answer for whichever device the caller is actually interested in, and even more so once all the external users have been converted and we can reliably pass the device pointer through the internal driver interface too. Signed-off-by: Robin Murphy Link: https://lore.kernel.org/r/8407eb9586677995b7a9fd70d0fd82d85929a9bb.1650878781.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 9208eca4b0d1..e26cf84e5d82 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -417,6 +417,7 @@ static inline const struct iommu_ops *dev_iommu_ops(struct device *dev) extern int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops); extern int bus_iommu_probe(struct bus_type *bus); extern bool iommu_present(struct bus_type *bus); +extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap); extern bool iommu_capable(struct bus_type *bus, enum iommu_cap cap); extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus); extern struct iommu_group *iommu_group_get_by_id(int id); @@ -689,6 +690,11 @@ static inline bool iommu_present(struct bus_type *bus) return false; } +static inline bool device_iommu_capable(struct device *dev, enum iommu_cap cap) +{ + return false; +} + static inline bool iommu_capable(struct bus_type *bus, enum iommu_cap cap) { return false; -- cgit From d0be55fbeb6ac694d15af5d1aad19cdec8cd64e5 Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Mon, 25 Apr 2022 13:42:03 +0100 Subject: iommu: Add capability for pre-boot DMA protection VT-d's dmar_platform_optin() actually represents a combination of properties fairly well standardised by Microsoft as "Pre-boot DMA Protection" and "Kernel DMA Protection"[1]. As such, we can provide interested consumers with an abstracted capability rather than driver-specific interfaces that won't scale. We name it for the former aspect since that's what external callers are most likely to be interested in; the latter is for the IOMMU layer to handle itself. [1] https://docs.microsoft.com/en-us/windows-hardware/design/device-experiences/oem-kernel-dma-protection Suggested-by: Christoph Hellwig Reviewed-by: Christoph Hellwig Reviewed-by: Lu Baolu Signed-off-by: Robin Murphy Link: https://lore.kernel.org/r/d6218dff2702472da80db6aec2c9589010684551.1650878781.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index e26cf84e5d82..4123693ae319 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -107,6 +107,8 @@ enum iommu_cap { transactions */ IOMMU_CAP_INTR_REMAP, /* IOMMU supports interrupt isolation */ IOMMU_CAP_NOEXEC, /* IOMMU_NOEXEC flag */ + IOMMU_CAP_PRE_BOOT_PROTECTION, /* Firmware says it used the IOMMU for + DMA protection and we should too */ }; /* These are the possible reserved region types */ -- cgit From 86eaf4a5b4312bea8676fb79399d9e08b53d8e71 Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Mon, 25 Apr 2022 13:42:04 +0100 Subject: thunderbolt: Make iommu_dma_protection more accurate Between me trying to get rid of iommu_present() and Mario wanting to support the AMD equivalent of DMAR_PLATFORM_OPT_IN, scrutiny has shown that the iommu_dma_protection attribute is being far too optimistic. Even if an IOMMU might be present for some PCI segment in the system, that doesn't necessarily mean it provides translation for the device(s) we care about. Furthermore, all that DMAR_PLATFORM_OPT_IN really does is tell us that memory was protected before the kernel was loaded, and prevent the user from disabling the intel-iommu driver entirely. While that lets us assume kernel integrity, what matters for actual runtime DMA protection is whether we trust individual devices, based on the "external facing" property that we expect firmware to describe for Thunderbolt ports. It's proven challenging to determine the appropriate ports accurately given the variety of possible topologies, so while still not getting a perfect answer, by putting enough faith in firmware we can at least get a good bit closer. If we can see that any device near a Thunderbolt NHI has all the requisites for Kernel DMA Protection, chances are that it *is* a relevant port, but moreover that implies that firmware is playing the game overall, so we'll use that to assume that all Thunderbolt ports should be correctly marked and thus will end up fully protected. CC: Mario Limonciello Reviewed-by: Christoph Hellwig Acked-by: Mika Westerberg Signed-off-by: Robin Murphy Link: https://lore.kernel.org/r/b153f208bc9eafab5105bad0358b77366509d2d4.1650878781.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel --- include/linux/thunderbolt.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/thunderbolt.h b/include/linux/thunderbolt.h index 124e13cb1469..7a8ad984e651 100644 --- a/include/linux/thunderbolt.h +++ b/include/linux/thunderbolt.h @@ -465,6 +465,7 @@ static inline struct tb_xdomain *tb_service_parent(struct tb_service *svc) * @msix_ida: Used to allocate MSI-X vectors for rings * @going_away: The host controller device is about to disappear so when * this flag is set, avoid touching the hardware anymore. + * @iommu_dma_protection: An IOMMU will isolate external-facing ports. * @interrupt_work: Work scheduled to handle ring interrupt when no * MSI-X is used. * @hop_count: Number of rings (end point hops) supported by NHI. @@ -479,6 +480,7 @@ struct tb_nhi { struct tb_ring **rx_rings; struct ida msix_ida; bool going_away; + bool iommu_dma_protection; struct work_struct interrupt_work; u32 hop_count; unsigned long quirks; -- cgit From 2147c438fde135d6c145a96e373d9348e7076f7f Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Mon, 25 Apr 2022 16:40:02 -0700 Subject: x86/speculation: Add missing prototype for unpriv_ebpf_notify() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix the following warnings seen with "make W=1": kernel/sysctl.c:183:13: warning: no previous prototype for ‘unpriv_ebpf_notify’ [-Wmissing-prototypes] 183 | void __weak unpriv_ebpf_notify(int new_state) | ^~~~~~~~~~~~~~~~~~ arch/x86/kernel/cpu/bugs.c:659:6: warning: no previous prototype for ‘unpriv_ebpf_notify’ [-Wmissing-prototypes] 659 | void unpriv_ebpf_notify(int new_state) | ^~~~~~~~~~~~~~~~~~ Fixes: 44a3918c8245 ("x86/speculation: Include unprivileged eBPF status in Spectre v2 mitigation reporting") Reported-by: kernel test robot Signed-off-by: Josh Poimboeuf Signed-off-by: Borislav Petkov Link: https://lore.kernel.org/r/5689d065f739602ececaee1e05e68b8644009608.1650930000.git.jpoimboe@redhat.com --- include/linux/bpf.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index bdb5298735ce..ecc3d3ec41cf 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2085,6 +2085,8 @@ void bpf_offload_dev_netdev_unregister(struct bpf_offload_dev *offdev, struct net_device *netdev); bool bpf_offload_dev_match(struct bpf_prog *prog, struct net_device *netdev); +void unpriv_ebpf_notify(int new_state); + #if defined(CONFIG_NET) && defined(CONFIG_BPF_SYSCALL) int bpf_prog_offload_init(struct bpf_prog *prog, union bpf_attr *attr); -- cgit From 1ea2a07a532b0e22aabe7e8483f935c672b9e7ed Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Mon, 18 Apr 2022 08:49:50 +0800 Subject: iommu: Add DMA ownership management interfaces Multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. These devices must either be entirely under kernel control or userspace control, never a mixture. This adds dma ownership management in iommu core and exposes several interfaces for the device drivers and the device userspace assignment framework (i.e. VFIO), so that any conflict between user and kernel controlled dma could be detected at the beginning. The device driver oriented interfaces are, int iommu_device_use_default_domain(struct device *dev); void iommu_device_unuse_default_domain(struct device *dev); By calling iommu_device_use_default_domain(), the device driver tells the iommu layer that the device dma is handled through the kernel DMA APIs. The iommu layer will manage the IOVA and use the default domain for DMA address translation. The device user-space assignment framework oriented interfaces are, int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); void iommu_group_release_dma_owner(struct iommu_group *group); bool iommu_group_dma_owner_claimed(struct iommu_group *group); The device userspace assignment must be disallowed if the DMA owner claiming interface returns failure. Signed-off-by: Jason Gunthorpe Signed-off-by: Kevin Tian Signed-off-by: Lu Baolu Reviewed-by: Robin Murphy Link: https://lore.kernel.org/r/20220418005000.897664-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 9208eca4b0d1..77972ef978b5 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -675,6 +675,13 @@ struct iommu_sva *iommu_sva_bind_device(struct device *dev, void iommu_sva_unbind_device(struct iommu_sva *handle); u32 iommu_sva_get_pasid(struct iommu_sva *handle); +int iommu_device_use_default_domain(struct device *dev); +void iommu_device_unuse_default_domain(struct device *dev); + +int iommu_group_claim_dma_owner(struct iommu_group *group, void *owner); +void iommu_group_release_dma_owner(struct iommu_group *group); +bool iommu_group_dma_owner_claimed(struct iommu_group *group); + #else /* CONFIG_IOMMU_API */ struct iommu_ops {}; @@ -1031,6 +1038,30 @@ static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev) { return NULL; } + +static inline int iommu_device_use_default_domain(struct device *dev) +{ + return 0; +} + +static inline void iommu_device_unuse_default_domain(struct device *dev) +{ +} + +static inline int +iommu_group_claim_dma_owner(struct iommu_group *group, void *owner) +{ + return -ENODEV; +} + +static inline void iommu_group_release_dma_owner(struct iommu_group *group) +{ +} + +static inline bool iommu_group_dma_owner_claimed(struct iommu_group *group) +{ + return false; +} #endif /* CONFIG_IOMMU_API */ /** -- cgit From 25f3bcfc54bcf7b0e45d140ec8bfbbf10ba11869 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Mon, 18 Apr 2022 08:49:51 +0800 Subject: driver core: Add dma_cleanup callback in bus_type The bus_type structure defines dma_configure() callback for bus drivers to configure DMA on the devices. This adds the paired dma_cleanup() callback and calls it during driver unbinding so that bus drivers can do some cleanup work. One use case for this paired DMA callbacks is for the bus driver to check for DMA ownership conflicts during driver binding, where multiple devices belonging to a same IOMMU group (the minimum granularity of isolation and protection) may be assigned to kernel drivers or user space respectively. Without this change, for example, the vfio driver has to listen to a bus BOUND_DRIVER event and then BUG_ON() in case of dma ownership conflict. This leads to bad user experience since careless driver binding operation may crash the system if the admin overlooks the group restriction. Aside from bad design, this leads to a security problem as a root user, even with lockdown=integrity, can force the kernel to BUG. With this change, the bus driver could check and set the DMA ownership in driver binding process and fail on ownership conflicts. The DMA ownership should be released during driver unbinding. Signed-off-by: Lu Baolu Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20220418005000.897664-3-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/device/bus.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h index a039ab809753..d8b29ccd07e5 100644 --- a/include/linux/device/bus.h +++ b/include/linux/device/bus.h @@ -59,6 +59,8 @@ struct fwnode_handle; * bus supports. * @dma_configure: Called to setup DMA configuration on a device on * this bus. + * @dma_cleanup: Called to cleanup DMA configuration on a device on + * this bus. * @pm: Power management operations of this bus, callback the specific * device driver's pm-ops. * @iommu_ops: IOMMU specific operations for this bus, used to attach IOMMU @@ -103,6 +105,7 @@ struct bus_type { int (*num_vf)(struct device *dev); int (*dma_configure)(struct device *dev); + void (*dma_cleanup)(struct device *dev); const struct dev_pm_ops *pm; -- cgit From 4a6d9dd564d0e7339fc15ecc5ce66db4ad842be2 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Mon, 18 Apr 2022 08:49:52 +0800 Subject: amba: Stop sharing platform_dma_configure() Stop sharing platform_dma_configure() helper as they are about to have their own bus dma_configure callbacks. Signed-off-by: Lu Baolu Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20220418005000.897664-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/platform_device.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h index 7c96f169d274..17fde717df68 100644 --- a/include/linux/platform_device.h +++ b/include/linux/platform_device.h @@ -328,8 +328,6 @@ extern int platform_pm_restore(struct device *dev); #define platform_pm_restore NULL #endif -extern int platform_dma_configure(struct device *dev); - #ifdef CONFIG_PM_SLEEP #define USE_PLATFORM_PM_SLEEP_OPS \ .suspend = platform_pm_suspend, \ -- cgit From 512881eacfa72c2136b27b9934b7b27504a9efc2 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Mon, 18 Apr 2022 08:49:53 +0800 Subject: bus: platform,amba,fsl-mc,PCI: Add device DMA ownership management The devices on platform/amba/fsl-mc/PCI buses could be bound to drivers with the device DMA managed by kernel drivers or user-space applications. Unfortunately, multiple devices may be placed in the same IOMMU group because they cannot be isolated from each other. The DMA on these devices must either be entirely under kernel control or userspace control, never a mixture. Otherwise the driver integrity is not guaranteed because they could access each other through the peer-to-peer accesses which by-pass the IOMMU protection. This checks and sets the default DMA mode during driver binding, and cleanups during driver unbinding. In the default mode, the device DMA is managed by the device driver which handles DMA operations through the kernel DMA APIs (see Documentation/core-api/dma-api.rst). For cases where the devices are assigned for userspace control through the userspace driver framework(i.e. VFIO), the drivers(for example, vfio_pci/ vfio_platfrom etc.) may set a new flag (driver_managed_dma) to skip this default setting in the assumption that the drivers know what they are doing with the device DMA. Calling iommu_device_use_default_domain() before {of,acpi}_dma_configure is currently a problem. As things stand, the IOMMU driver ignored the initial iommu_probe_device() call when the device was added, since at that point it had no fwspec yet. In this situation, {of,acpi}_iommu_configure() are retriggering iommu_probe_device() after the IOMMU driver has seen the firmware data via .of_xlate to learn that it actually responsible for the given device. As the result, before that gets fixed, iommu_use_default_domain() goes at the end, and calls arch_teardown_dma_ops() if it fails. Cc: Greg Kroah-Hartman Cc: Bjorn Helgaas Cc: Stuart Yoder Cc: Laurentiu Tudor Signed-off-by: Lu Baolu Reviewed-by: Greg Kroah-Hartman Reviewed-by: Jason Gunthorpe Reviewed-by: Robin Murphy Tested-by: Eric Auger Link: https://lore.kernel.org/r/20220418005000.897664-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/amba/bus.h | 8 ++++++++ include/linux/fsl/mc.h | 8 ++++++++ include/linux/pci.h | 8 ++++++++ include/linux/platform_device.h | 8 ++++++++ 4 files changed, 32 insertions(+) (limited to 'include/linux') diff --git a/include/linux/amba/bus.h b/include/linux/amba/bus.h index 6562f543c3e0..2ddce9bcd00e 100644 --- a/include/linux/amba/bus.h +++ b/include/linux/amba/bus.h @@ -79,6 +79,14 @@ struct amba_driver { void (*remove)(struct amba_device *); void (*shutdown)(struct amba_device *); const struct amba_id *id_table; + /* + * For most device drivers, no need to care about this flag as long as + * all DMAs are handled through the kernel DMA API. For some special + * ones, for example VFIO drivers, they know how to manage the DMA + * themselves and set this flag so that the IOMMU layer will allow them + * to setup and manage their own I/O address space. + */ + bool driver_managed_dma; }; /* diff --git a/include/linux/fsl/mc.h b/include/linux/fsl/mc.h index 7b6c42bfb660..27efef8affb1 100644 --- a/include/linux/fsl/mc.h +++ b/include/linux/fsl/mc.h @@ -32,6 +32,13 @@ struct fsl_mc_io; * @shutdown: Function called at shutdown time to quiesce the device * @suspend: Function called when a device is stopped * @resume: Function called when a device is resumed + * @driver_managed_dma: Device driver doesn't use kernel DMA API for DMA. + * For most device drivers, no need to care about this flag + * as long as all DMAs are handled through the kernel DMA API. + * For some special ones, for example VFIO drivers, they know + * how to manage the DMA themselves and set this flag so that + * the IOMMU layer will allow them to setup and manage their + * own I/O address space. * * Generic DPAA device driver object for device drivers that are registered * with a DPRC bus. This structure is to be embedded in each device-specific @@ -45,6 +52,7 @@ struct fsl_mc_driver { void (*shutdown)(struct fsl_mc_device *dev); int (*suspend)(struct fsl_mc_device *dev, pm_message_t state); int (*resume)(struct fsl_mc_device *dev); + bool driver_managed_dma; }; #define to_fsl_mc_driver(_drv) \ diff --git a/include/linux/pci.h b/include/linux/pci.h index 60adf42460ab..b933d2b08d4d 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -895,6 +895,13 @@ struct module; * created once it is bound to the driver. * @driver: Driver model structure. * @dynids: List of dynamically added device IDs. + * @driver_managed_dma: Device driver doesn't use kernel DMA API for DMA. + * For most device drivers, no need to care about this flag + * as long as all DMAs are handled through the kernel DMA API. + * For some special ones, for example VFIO drivers, they know + * how to manage the DMA themselves and set this flag so that + * the IOMMU layer will allow them to setup and manage their + * own I/O address space. */ struct pci_driver { struct list_head node; @@ -913,6 +920,7 @@ struct pci_driver { const struct attribute_group **dev_groups; struct device_driver driver; struct pci_dynids dynids; + bool driver_managed_dma; }; static inline struct pci_driver *to_pci_driver(struct device_driver *drv) diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h index 17fde717df68..b3d9c744f1e5 100644 --- a/include/linux/platform_device.h +++ b/include/linux/platform_device.h @@ -210,6 +210,14 @@ struct platform_driver { struct device_driver driver; const struct platform_device_id *id_table; bool prevent_deferred_probe; + /* + * For most device drivers, no need to care about this flag as long as + * all DMAs are handled through the kernel DMA API. For some special + * ones, for example VFIO drivers, they know how to manage the DMA + * themselves and set this flag so that the IOMMU layer will allow them + * to setup and manage their own I/O address space. + */ + bool driver_managed_dma; }; #define to_platform_driver(drv) (container_of((drv), struct platform_driver, \ -- cgit From a5f1bd1afacd7b1e088f93f66af5453df0d8be9a Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Mon, 18 Apr 2022 08:50:00 +0800 Subject: iommu: Remove iommu group changes notifier The iommu group changes notifer is not referenced in the tree. Remove it to avoid dead code. Suggested-by: Christoph Hellwig Signed-off-by: Lu Baolu Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20220418005000.897664-12-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 23 ----------------------- 1 file changed, 23 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 77972ef978b5..6ef2df258673 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -407,13 +407,6 @@ static inline const struct iommu_ops *dev_iommu_ops(struct device *dev) return dev->iommu->iommu_dev->ops; } -#define IOMMU_GROUP_NOTIFY_ADD_DEVICE 1 /* Device added */ -#define IOMMU_GROUP_NOTIFY_DEL_DEVICE 2 /* Pre Device removed */ -#define IOMMU_GROUP_NOTIFY_BIND_DRIVER 3 /* Pre Driver bind */ -#define IOMMU_GROUP_NOTIFY_BOUND_DRIVER 4 /* Post Driver bind */ -#define IOMMU_GROUP_NOTIFY_UNBIND_DRIVER 5 /* Pre Driver unbind */ -#define IOMMU_GROUP_NOTIFY_UNBOUND_DRIVER 6 /* Post Driver unbind */ - extern int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops); extern int bus_iommu_probe(struct bus_type *bus); extern bool iommu_present(struct bus_type *bus); @@ -478,10 +471,6 @@ extern int iommu_group_for_each_dev(struct iommu_group *group, void *data, extern struct iommu_group *iommu_group_get(struct device *dev); extern struct iommu_group *iommu_group_ref_get(struct iommu_group *group); extern void iommu_group_put(struct iommu_group *group); -extern int iommu_group_register_notifier(struct iommu_group *group, - struct notifier_block *nb); -extern int iommu_group_unregister_notifier(struct iommu_group *group, - struct notifier_block *nb); extern int iommu_register_device_fault_handler(struct device *dev, iommu_dev_fault_handler_t handler, void *data); @@ -878,18 +867,6 @@ static inline void iommu_group_put(struct iommu_group *group) { } -static inline int iommu_group_register_notifier(struct iommu_group *group, - struct notifier_block *nb) -{ - return -ENODEV; -} - -static inline int iommu_group_unregister_notifier(struct iommu_group *group, - struct notifier_block *nb) -{ - return 0; -} - static inline int iommu_register_device_fault_handler(struct device *dev, iommu_dev_fault_handler_t handler, -- cgit From 00675017e0aeba5305665c52ded4ddce6a4c0231 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Mon, 4 Apr 2022 12:51:40 +0200 Subject: fs: add two trivial lookup helpers Similar to the addition of lookup_one() add a version of lookup_one_unlocked() and lookup_one_positive_unlocked() that take idmapped mounts into account. This is required to port overlay to support idmapped base layers. Cc: Tested-by: Giuseppe Scrivano Reviewed-by: Amir Goldstein Reviewed-by: Christoph Hellwig Signed-off-by: Christian Brauner (Microsoft) Signed-off-by: Miklos Szeredi --- include/linux/namei.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/namei.h b/include/linux/namei.h index e89329bb3134..caeb08a98536 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -69,6 +69,12 @@ extern struct dentry *lookup_one_len(const char *, struct dentry *, int); extern struct dentry *lookup_one_len_unlocked(const char *, struct dentry *, int); extern struct dentry *lookup_positive_unlocked(const char *, struct dentry *, int); struct dentry *lookup_one(struct user_namespace *, const char *, struct dentry *, int); +struct dentry *lookup_one_unlocked(struct user_namespace *mnt_userns, + const char *name, struct dentry *base, + int len); +struct dentry *lookup_one_positive_unlocked(struct user_namespace *mnt_userns, + const char *name, + struct dentry *base, int len); extern int follow_down_one(struct path *); extern int follow_down(struct path *); -- cgit From dbde6d0c7a5a462a1767a07c28eadd2c3dd08fb2 Mon Sep 17 00:00:00 2001 From: "Andrea Parri (Microsoft)" Date: Thu, 28 Apr 2022 16:51:05 +0200 Subject: hv_sock: Add validation for untrusted Hyper-V values For additional robustness in the face of Hyper-V errors or malicious behavior, validate all values that originate from packets that Hyper-V has sent to the guest in the host-to-guest ring buffer. Ensure that invalid values cannot cause data being copied out of the bounds of the source buffer in hvs_stream_dequeue(). Signed-off-by: Andrea Parri (Microsoft) Reviewed-by: Michael Kelley Reviewed-by: Stefano Garzarella Link: https://lore.kernel.org/r/20220428145107.7878-4-parri.andrea@gmail.com Signed-off-by: Wei Liu --- include/linux/hyperv.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 460a716f4748..a01c9fd0a334 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1696,6 +1696,11 @@ static inline u32 hv_pkt_datalen(const struct vmpacket_descriptor *desc) return (desc->len8 << 3) - (desc->offset8 << 3); } +/* Get packet length associated with descriptor */ +static inline u32 hv_pkt_len(const struct vmpacket_descriptor *desc) +{ + return desc->len8 << 3; +} struct vmpacket_descriptor * hv_pkt_iter_first_raw(struct vmbus_channel *channel); -- cgit From da795eb239d9e9496812e22cb582d38a71e0789a Mon Sep 17 00:00:00 2001 From: "Andrea Parri (Microsoft)" Date: Thu, 28 Apr 2022 16:51:06 +0200 Subject: Drivers: hv: vmbus: Accept hv_sock offers in isolated guests So that isolated guests can communicate with the host via hv_sock channels. Signed-off-by: Andrea Parri (Microsoft) Reviewed-by: Michael Kelley Link: https://lore.kernel.org/r/20220428145107.7878-5-parri.andrea@gmail.com Signed-off-by: Wei Liu --- include/linux/hyperv.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index a01c9fd0a334..b028905d8334 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1064,10 +1064,14 @@ u64 vmbus_request_addr_match(struct vmbus_channel *channel, u64 trans_id, u64 rqst_addr); u64 vmbus_request_addr(struct vmbus_channel *channel, u64 trans_id); +static inline bool is_hvsock_offer(const struct vmbus_channel_offer_channel *o) +{ + return !!(o->offer.chn_flags & VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER); +} + static inline bool is_hvsock_channel(const struct vmbus_channel *c) { - return !!(c->offermsg.offer.chn_flags & - VMBUS_CHANNEL_TLNPI_PROVIDER_OFFER); + return is_hvsock_offer(&c->offermsg); } static inline bool is_sub_channel(const struct vmbus_channel *c) -- cgit From 1c9de08f7f952b4101f092802581344033d84429 Mon Sep 17 00:00:00 2001 From: "Andrea Parri (Microsoft)" Date: Thu, 28 Apr 2022 16:51:07 +0200 Subject: Drivers: hv: vmbus: Refactor the ring-buffer iterator functions With no users of hv_pkt_iter_next_raw() and no "external" users of hv_pkt_iter_first_raw(), the iterator functions can be refactored and simplified to remove some indirection/code. Signed-off-by: Andrea Parri (Microsoft) Reviewed-by: Michael Kelley Link: https://lore.kernel.org/r/20220428145107.7878-6-parri.andrea@gmail.com Signed-off-by: Wei Liu --- include/linux/hyperv.h | 35 ++++------------------------------- 1 file changed, 4 insertions(+), 31 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index b028905d8334..c440c45887c2 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1706,55 +1706,28 @@ static inline u32 hv_pkt_len(const struct vmpacket_descriptor *desc) return desc->len8 << 3; } -struct vmpacket_descriptor * -hv_pkt_iter_first_raw(struct vmbus_channel *channel); - struct vmpacket_descriptor * hv_pkt_iter_first(struct vmbus_channel *channel); struct vmpacket_descriptor * __hv_pkt_iter_next(struct vmbus_channel *channel, - const struct vmpacket_descriptor *pkt, - bool copy); + const struct vmpacket_descriptor *pkt); void hv_pkt_iter_close(struct vmbus_channel *channel); static inline struct vmpacket_descriptor * -hv_pkt_iter_next_pkt(struct vmbus_channel *channel, - const struct vmpacket_descriptor *pkt, - bool copy) +hv_pkt_iter_next(struct vmbus_channel *channel, + const struct vmpacket_descriptor *pkt) { struct vmpacket_descriptor *nxt; - nxt = __hv_pkt_iter_next(channel, pkt, copy); + nxt = __hv_pkt_iter_next(channel, pkt); if (!nxt) hv_pkt_iter_close(channel); return nxt; } -/* - * Get next packet descriptor without copying it out of the ring buffer - * If at end of list, return NULL and update host. - */ -static inline struct vmpacket_descriptor * -hv_pkt_iter_next_raw(struct vmbus_channel *channel, - const struct vmpacket_descriptor *pkt) -{ - return hv_pkt_iter_next_pkt(channel, pkt, false); -} - -/* - * Get next packet descriptor from iterator - * If at end of list, return NULL and update host. - */ -static inline struct vmpacket_descriptor * -hv_pkt_iter_next(struct vmbus_channel *channel, - const struct vmpacket_descriptor *pkt) -{ - return hv_pkt_iter_next_pkt(channel, pkt, true); -} - #define foreach_vmbus_pkt(pkt, channel) \ for (pkt = hv_pkt_iter_first(channel); pkt; \ pkt = hv_pkt_iter_next(channel, pkt)) -- cgit From 6043257b1de069fbb5a2a52d7211c0275bc8c0e0 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Mon, 11 Apr 2022 12:16:05 -0300 Subject: iommu: Introduce the domain op enforce_cache_coherency() This new mechanism will replace using IOMMU_CAP_CACHE_COHERENCY and IOMMU_CACHE to control the no-snoop blocking behavior of the IOMMU. Currently only Intel and AMD IOMMUs are known to support this feature. They both implement it as an IOPTE bit, that when set, will cause PCIe TLPs to that IOVA with the no-snoop bit set to be treated as though the no-snoop bit was clear. The new API is triggered by calling enforce_cache_coherency() before mapping any IOVA to the domain which globally switches on no-snoop blocking. This allows other implementations that might block no-snoop globally and outside the IOPTE - AMD also documents such a HW capability. Leave AMD out of sync with Intel and have it block no-snoop even for in-kernel users. This can be trivially resolved in a follow up patch. Only VFIO needs to call this API because it does not have detailed control over the device to avoid requesting no-snoop behavior at the device level. Other places using domains with real kernel drivers should simply avoid asking their devices to set the no-snoop bit. Reviewed-by: Lu Baolu Reviewed-by: Kevin Tian Acked-by: Robin Murphy Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/1-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com Signed-off-by: Joerg Roedel --- include/linux/intel-iommu.h | 1 + include/linux/iommu.h | 4 ++++ 2 files changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 2f9891cb3d00..4c2baf2446c2 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -540,6 +540,7 @@ struct dmar_domain { u8 has_iotlb_device: 1; u8 iommu_coherency: 1; /* indicate coherency of iommu access */ u8 iommu_snooping: 1; /* indicate snooping control feature */ + u8 force_snooping : 1; /* Create IOPTEs with snoop control */ struct list_head devices; /* all devices' list */ struct iova_domain iovad; /* iova's that belong to this domain */ diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 4123693ae319..c7ad6b10e261 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -274,6 +274,9 @@ struct iommu_ops { * @iotlb_sync: Flush all queued ranges from the hardware TLBs and empty flush * queue * @iova_to_phys: translate iova to physical address + * @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE, + * including no-snoop TLPs on PCIe or other platform + * specific mechanisms. * @enable_nesting: Enable nesting * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*) * @free: Release the domain after use. @@ -302,6 +305,7 @@ struct iommu_domain_ops { phys_addr_t (*iova_to_phys)(struct iommu_domain *domain, dma_addr_t iova); + bool (*enforce_cache_coherency)(struct iommu_domain *domain); int (*enable_nesting)(struct iommu_domain *domain); int (*set_pgtable_quirks)(struct iommu_domain *domain, unsigned long quirks); -- cgit From 71cfafda9c9bd9812cdb62ddb94daf65a1af12c1 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Mon, 11 Apr 2022 12:16:06 -0300 Subject: vfio: Move the Intel no-snoop control off of IOMMU_CACHE IOMMU_CACHE means "normal DMA to this iommu_domain's IOVA should be cache coherent" and is used by the DMA API. The definition allows for special non-coherent DMA to exist - ie processing of the no-snoop flag in PCIe TLPs - so long as this behavior is opt-in by the device driver. The flag is mainly used by the DMA API to synchronize the IOMMU setting with the expected cache behavior of the DMA master. eg based on dev_is_dma_coherent() in some case. For Intel IOMMU IOMMU_CACHE was redefined to mean 'force all DMA to be cache coherent' which has the practical effect of causing the IOMMU to ignore the no-snoop bit in a PCIe TLP. x86 platforms are always IOMMU_CACHE, so Intel should ignore this flag. Instead use the new domain op enforce_cache_coherency() which causes every IOPTE created in the domain to have the no-snoop blocking behavior. Reconfigure VFIO to always use IOMMU_CACHE and call enforce_cache_coherency() to operate the special Intel behavior. Remove the IOMMU_CACHE test from Intel IOMMU. Ultimately VFIO plumbs the result of enforce_cache_coherency() back into the x86 platform code through kvm_arch_register_noncoherent_dma() which controls if the WBINVD instruction is available in the guest. No other archs implement kvm_arch_register_noncoherent_dma() nor are there any other known consumers of VFIO_DMA_CC_IOMMU that might be affected by the user visible result change on non-x86 archs. Reviewed-by: Kevin Tian Reviewed-by: Lu Baolu Acked-by: Alex Williamson Acked-by: Robin Murphy Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/2-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com Signed-off-by: Joerg Roedel --- include/linux/intel-iommu.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 4c2baf2446c2..72e5d7900e71 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -539,7 +539,6 @@ struct dmar_domain { u8 has_iotlb_device: 1; u8 iommu_coherency: 1; /* indicate coherency of iommu access */ - u8 iommu_snooping: 1; /* indicate snooping control feature */ u8 force_snooping : 1; /* Create IOPTEs with snoop control */ struct list_head devices; /* all devices' list */ -- cgit From f78dc1dad829e505d83e33dc0879887f074c52e1 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Mon, 11 Apr 2022 12:16:07 -0300 Subject: iommu: Redefine IOMMU_CAP_CACHE_COHERENCY as the cap flag for IOMMU_CACHE While the comment was correct that this flag was intended to convey the block no-snoop support in the IOMMU, it has become widely implemented and used to mean the IOMMU supports IOMMU_CACHE as a map flag. Only the Intel driver was different. Now that the Intel driver is using enforce_cache_coherency() update the comment to make it clear that IOMMU_CAP_CACHE_COHERENCY is only about IOMMU_CACHE. Fix the Intel driver to return true since IOMMU_CACHE always works. The two places that test this flag, usnic and vdpa, are both assigning userspace pages to a driver controlled iommu_domain and require IOMMU_CACHE behavior as they offer no way for userspace to synchronize caches. Reviewed-by: Kevin Tian Reviewed-by: Lu Baolu Acked-by: Robin Murphy Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/3-v3-2cf356649677+a32-intel_no_snoop_jgg@nvidia.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index c7ad6b10e261..575ab27ede5b 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -103,8 +103,7 @@ static inline bool iommu_is_dma_domain(struct iommu_domain *domain) } enum iommu_cap { - IOMMU_CAP_CACHE_COHERENCY, /* IOMMU can enforce cache coherent DMA - transactions */ + IOMMU_CAP_CACHE_COHERENCY, /* IOMMU_CACHE is supported */ IOMMU_CAP_INTR_REMAP, /* IOMMU supports interrupt isolation */ IOMMU_CAP_NOEXEC, /* IOMMU_NOEXEC flag */ IOMMU_CAP_PRE_BOOT_PROTECTION, /* Firmware says it used the IOMMU for -- cgit From 992be5d3c818fcc277db246cb409659ca82abdbe Mon Sep 17 00:00:00 2001 From: Cristian Marussi Date: Wed, 30 Mar 2022 16:05:35 +0100 Subject: firmware: arm_scmi: Make name_get operations return a const A few protocol operations are available that returns a pointer to an internal character array representing resource name. Make those functions return a const pointer to such array. Link: https://lore.kernel.org/r/20220330150551.2573938-7-cristian.marussi@arm.com Signed-off-by: Cristian Marussi Signed-off-by: Sudeep Holla --- include/linux/scmi_protocol.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h index b87551f41f9f..ced37d1de1fe 100644 --- a/include/linux/scmi_protocol.h +++ b/include/linux/scmi_protocol.h @@ -146,7 +146,8 @@ struct scmi_perf_proto_ops { */ struct scmi_power_proto_ops { int (*num_domains_get)(const struct scmi_protocol_handle *ph); - char *(*name_get)(const struct scmi_protocol_handle *ph, u32 domain); + const char *(*name_get)(const struct scmi_protocol_handle *ph, + u32 domain); #define SCMI_POWER_STATE_TYPE_SHIFT 30 #define SCMI_POWER_STATE_ID_MASK (BIT(28) - 1) #define SCMI_POWER_STATE_PARAM(type, id) \ @@ -484,7 +485,8 @@ struct scmi_sensor_proto_ops { */ struct scmi_reset_proto_ops { int (*num_domains_get)(const struct scmi_protocol_handle *ph); - char *(*name_get)(const struct scmi_protocol_handle *ph, u32 domain); + const char *(*name_get)(const struct scmi_protocol_handle *ph, + u32 domain); int (*latency_get)(const struct scmi_protocol_handle *ph, u32 domain); int (*reset)(const struct scmi_protocol_handle *ph, u32 domain); int (*assert)(const struct scmi_protocol_handle *ph, u32 domain); -- cgit From b260fccaebdc2c838e62aaef24fedf497f181d10 Mon Sep 17 00:00:00 2001 From: Cristian Marussi Date: Wed, 30 Mar 2022 16:05:40 +0100 Subject: firmware: arm_scmi: Add SCMI v3.1 protocol extended names support Using the common protocol helper implementation add support for all new SCMIv3.1 extended names commands related to all protocols with the exception of SENSOR_AXIS_GET_NAME. Link: https://lore.kernel.org/r/20220330150551.2573938-12-cristian.marussi@arm.com Signed-off-by: Cristian Marussi Signed-off-by: Sudeep Holla --- include/linux/scmi_protocol.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h index ced37d1de1fe..56e6f13355b8 100644 --- a/include/linux/scmi_protocol.h +++ b/include/linux/scmi_protocol.h @@ -13,7 +13,7 @@ #include #include -#define SCMI_MAX_STR_SIZE 16 +#define SCMI_MAX_STR_SIZE 64 #define SCMI_MAX_NUM_RATES 16 /** -- cgit From 7aa75496ea1f38dfd99b93c66f8d9bc525d11efc Mon Sep 17 00:00:00 2001 From: Cristian Marussi Date: Wed, 30 Mar 2022 16:05:48 +0100 Subject: firmware: arm_scmi: Add SCMI v3.1 clock notifications Add SCMI v3.1 clock pre and post notifications. Link: https://lore.kernel.org/r/20220330150551.2573938-20-cristian.marussi@arm.com Signed-off-by: Cristian Marussi Signed-off-by: Sudeep Holla --- include/linux/scmi_protocol.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h index 56e6f13355b8..0e20acc80d50 100644 --- a/include/linux/scmi_protocol.h +++ b/include/linux/scmi_protocol.h @@ -44,6 +44,8 @@ struct scmi_clock_info { char name[SCMI_MAX_STR_SIZE]; unsigned int enable_latency; bool rate_discrete; + bool rate_changed_notifications; + bool rate_change_requested_notifications; union { struct { int num_rates; @@ -744,6 +746,8 @@ void scmi_protocol_unregister(const struct scmi_protocol *proto); /* SCMI Notification API - Custom Event Reports */ enum scmi_notification_events { SCMI_EVENT_POWER_STATE_CHANGED = 0x0, + SCMI_EVENT_CLOCK_RATE_CHANGED = 0x0, + SCMI_EVENT_CLOCK_RATE_CHANGE_REQUESTED = 0x1, SCMI_EVENT_PERFORMANCE_LIMITS_CHANGED = 0x0, SCMI_EVENT_PERFORMANCE_LEVEL_CHANGED = 0x1, SCMI_EVENT_SENSOR_TRIP_POINT_EVENT = 0x0, @@ -760,6 +764,13 @@ struct scmi_power_state_changed_report { unsigned int power_state; }; +struct scmi_clock_rate_notif_report { + ktime_t timestamp; + unsigned int agent_id; + unsigned int clock_id; + unsigned long long rate; +}; + struct scmi_system_power_state_notifier_report { ktime_t timestamp; unsigned int agent_id; -- cgit From 4c74701b1eb7636eb0cdd66b488b42920105122a Mon Sep 17 00:00:00 2001 From: Cristian Marussi Date: Wed, 30 Mar 2022 16:05:49 +0100 Subject: firmware: arm_scmi: Add SCMI v3.1 VOLTAGE_LEVEL_SET_COMPLETE Add SCMI v3.1 voltage protocol support for asynchronous VOLTAGE_LEVEL_SET command. Note that, if a voltage domain is advertised to support the asynchronous version of VOLTAGE_LEVEL_SET, the command will be issued asynchronously unless explicitly requested to use the synchronous version by setting the mode to SCMI_VOLTAGE_LEVEL_SET_SYNC when calling voltage_ops->level_set. The SCMI regulator driver level_set invocation has been left unchanged so that it will transparently use the asynchronous version if available. Link: https://lore.kernel.org/r/20220330150551.2573938-21-cristian.marussi@arm.com Signed-off-by: Cristian Marussi Signed-off-by: Sudeep Holla --- include/linux/scmi_protocol.h | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h index 0e20acc80d50..1c58646ba381 100644 --- a/include/linux/scmi_protocol.h +++ b/include/linux/scmi_protocol.h @@ -495,6 +495,11 @@ struct scmi_reset_proto_ops { int (*deassert)(const struct scmi_protocol_handle *ph, u32 domain); }; +enum scmi_voltage_level_mode { + SCMI_VOLTAGE_LEVEL_SET_AUTO, + SCMI_VOLTAGE_LEVEL_SET_SYNC, +}; + /** * struct scmi_voltage_info - describe one available SCMI Voltage Domain * @@ -507,7 +512,8 @@ struct scmi_reset_proto_ops { * supported voltage level * @negative_volts_allowed: True if any of the entries of @levels_uv represent * a negative voltage. - * @attributes: represents Voltage Domain advertised attributes + * @async_level_set: True when the voltage domain supports asynchronous level + * set commands. * @name: name assigned to the Voltage Domain by platform * @num_levels: number of total entries in @levels_uv. * @levels_uv: array of entries describing the available voltage levels for @@ -517,7 +523,7 @@ struct scmi_voltage_info { unsigned int id; bool segmented; bool negative_volts_allowed; - unsigned int attributes; + bool async_level_set; char name[SCMI_MAX_STR_SIZE]; unsigned int num_levels; #define SCMI_VOLTAGE_SEGMENT_LOW 0 @@ -548,7 +554,7 @@ struct scmi_voltage_proto_ops { int (*config_get)(const struct scmi_protocol_handle *ph, u32 domain_id, u32 *config); int (*level_set)(const struct scmi_protocol_handle *ph, u32 domain_id, - u32 flags, s32 volt_uV); + enum scmi_voltage_level_mode mode, s32 volt_uV); int (*level_get)(const struct scmi_protocol_handle *ph, u32 domain_id, s32 *volt_uV); }; -- cgit From ac3e62f51b3fbda78a44fff62ece8f11d8f08a25 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Mon, 7 Feb 2022 15:38:40 +0100 Subject: iio: core: Clarify the modes As part of a previous discussion with Jonathan Cameron [1], it appeared necessary to clarify the meaning of each mode so that new developers could understand better what they should use or not use and when. The idea of renaming these modes as been let aside because naming is a big deal and requires a lot of thinking. So for now let's focus on correctly explaining what each mode implies. [1] https://lore.kernel.org/linux-iio/20210930165510.2295e6c4@jic23-huawei/ Suggested-by: Jonathan Cameron Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/r/20220207143840.707510-14-miquel.raynal@bootlin.com Signed-off-by: Jonathan Cameron --- include/linux/iio/iio.h | 49 ++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 48 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h index 85cb924debd9..233d2e6b7721 100644 --- a/include/linux/iio/iio.h +++ b/include/linux/iio/iio.h @@ -315,7 +315,54 @@ static inline bool iio_channel_has_available(const struct iio_chan_spec *chan, s64 iio_get_time_ns(const struct iio_dev *indio_dev); unsigned int iio_get_time_res(const struct iio_dev *indio_dev); -/* Device operating modes */ +/* + * Device operating modes + * @INDIO_DIRECT_MODE: There is an access to either: + * a) The last single value available for devices that do not provide + * on-demand reads. + * b) A new value after performing an on-demand read otherwise. + * On most devices, this is a single-shot read. On some devices with data + * streams without an 'on-demand' function, this might also be the 'last value' + * feature. Above all, this mode internally means that we are not in any of the + * other modes, and sysfs reads should work. + * Device drivers should inform the core if they support this mode. + * @INDIO_BUFFER_TRIGGERED: Common mode when dealing with kfifo buffers. + * It indicates that an explicit trigger is required. This requests the core to + * attach a poll function when enabling the buffer, which is indicated by the + * _TRIGGERED suffix. + * The core will ensure this mode is set when registering a triggered buffer + * with iio_triggered_buffer_setup(). + * @INDIO_BUFFER_SOFTWARE: Another kfifo buffer mode, but not event triggered. + * No poll function can be attached because there is no triggered infrastructure + * we can use to cause capture. There is a kfifo that the driver will fill, but + * not "only one scan at a time". Typically, hardware will have a buffer that + * can hold multiple scans. Software may read one or more scans at a single time + * and push the available data to a Kfifo. This means the core will not attach + * any poll function when enabling the buffer. + * The core will ensure this mode is set when registering a simple kfifo buffer + * with devm_iio_kfifo_buffer_setup(). + * @INDIO_BUFFER_HARDWARE: For specific hardware, if unsure do not use this mode. + * Same as above but this time the buffer is not a kfifo where we have direct + * access to the data. Instead, the consumer driver must access the data through + * non software visible channels (or DMA when there is no demux possible in + * software) + * The core will ensure this mode is set when registering a dmaengine buffer + * with devm_iio_dmaengine_buffer_setup(). + * @INDIO_EVENT_TRIGGERED: Very unusual mode. + * Triggers usually refer to an external event which will start data capture. + * Here it is kind of the opposite as, a particular state of the data might + * produce an event which can be considered as an event. We don't necessarily + * have access to the data itself, but to the event produced. For example, this + * can be a threshold detector. The internal path of this mode is very close to + * the INDIO_BUFFER_TRIGGERED mode. + * The core will ensure this mode is set when registering a triggered event. + * @INDIO_HARDWARE_TRIGGERED: Very unusual mode. + * Here, triggers can result in data capture and can be routed to multiple + * hardware components, which make them close to regular triggers in the way + * they must be managed by the core, but without the entire interrupts/poll + * functions burden. Interrupts are irrelevant as the data flow is hardware + * mediated and distributed. + */ #define INDIO_DIRECT_MODE 0x01 #define INDIO_BUFFER_TRIGGERED 0x02 #define INDIO_BUFFER_SOFTWARE 0x04 -- cgit From cc10eee95204579fcd66fd5965073fdcbf629676 Mon Sep 17 00:00:00 2001 From: Vishal Verma Date: Wed, 13 Apr 2022 01:36:16 -0600 Subject: PCI/ACPI: add a helper for retrieving _OSC Control DWORDs During _OSC negotiation, when the 'Control' DWORD is needed from the result buffer after running _OSC, a couple of places performed manual pointer arithmetic to offset into the right spot in the raw buffer. Add a acpi_osc_ctx_get_pci_control() helper to use the #define'd DWORD offsets to fetch the DWORDs needed from @acpi_osc_context, and replace the above instances of the open-coded arithmetic. Cc: "Rafael J. Wysocki" Suggested-by: Davidlohr Bueso Acked-by: Rafael J. Wysocki Reviewed-by: Rafael J. Wysocki Reviewed-by: Davidlohr Bueso Reviewed by: Adam Manzanares Signed-off-by: Vishal Verma Link: https://lore.kernel.org/r/20220413073618.291335-2-vishal.l.verma@intel.com Signed-off-by: Dan Williams --- include/linux/acpi.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index d7136d13aa44..04e5a038dd57 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -608,6 +608,13 @@ extern u32 osc_sb_native_usb4_control; #define OSC_PCI_EXPRESS_LTR_CONTROL 0x00000020 #define OSC_PCI_EXPRESS_DPC_CONTROL 0x00000080 +static inline u32 acpi_osc_ctx_get_pci_control(struct acpi_osc_context *context) +{ + u32 *ret = context->ret.pointer; + + return ret[OSC_CONTROL_DWORD]; +} + #define ACPI_GSB_ACCESS_ATTRIB_QUICK 0x00000002 #define ACPI_GSB_ACCESS_ATTRIB_SEND_RCV 0x00000004 #define ACPI_GSB_ACCESS_ATTRIB_BYTE 0x00000006 @@ -1004,6 +1011,12 @@ static inline int acpi_register_wakeup_handler(int wake_irq, static inline void acpi_unregister_wakeup_handler( bool (*wakeup)(void *context), void *context) { } +struct acpi_osc_context; +static inline u32 acpi_osc_ctx_get_pci_control(struct acpi_osc_context *context) +{ + return 0; +} + #endif /* !CONFIG_ACPI */ #ifdef CONFIG_ACPI_HOTPLUG_IOAPIC -- cgit From 241d26bc26add2e2867c546f7474902406d37c60 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Wed, 13 Apr 2022 01:36:17 -0600 Subject: PCI/ACPI: Prefer CXL _OSC instead of PCIe _OSC for CXL host bridges OB In preparation for negotiating OS control of CXL _OSC features, do the minimal enabling to use CXL _OSC to handle the base PCIe feature negotiation. Recall that CXL _OSC is a super-set of PCIe _OSC and the CXL 2.0 specification mandates: "If a CXL Host Bridge device exposes CXL _OSC, CXL aware OSPM shall evaluate CXL _OSC and not evaluate PCIe _OSC." Rather than pass a boolean flag alongside @root to all the helper functions that need to consider PCIe specifics, add is_pcie() and is_cxl() helper functions to check the flavor of @root. This also allows for dynamic fallback to PCIe _OSC in cases where an attempt to use CXL _OXC fails. This can happen on CXL 1.1 platforms that publish ACPI0016 devices to indicate CXL host bridges, but do not publish the optional CXL _OSC method. CXL _OSC is mandatory for CXL 2.0 hosts. Cc: Bjorn Helgaas Cc: "Rafael J. Wysocki" Cc: Robert Moore Reviewed-by: Jonathan Cameron Reviewed-by: Rafael J. Wysocki Reviewed-by: Davidlohr Bueso Signed-off-by: Vishal Verma Link: https://lore.kernel.org/r/20220413073618.291335-3-vishal.l.verma@intel.com Signed-off-by: Dan Williams --- include/linux/acpi.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 04e5a038dd57..82d91d9ccce5 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -550,6 +550,10 @@ struct acpi_osc_context { acpi_status acpi_run_osc(acpi_handle handle, struct acpi_osc_context *context); +/* Number of _OSC capability DWORDS depends on bridge type */ +#define OSC_PCI_CAPABILITY_DWORDS 3 +#define OSC_CXL_CAPABILITY_DWORDS 5 + /* Indexes into _OSC Capabilities Buffer (DWORDs 2 & 3 are device-specific) */ #define OSC_QUERY_DWORD 0 /* DWORD 1 */ #define OSC_SUPPORT_DWORD 1 /* DWORD 2 */ -- cgit From 56368029d93bbb3246ee2e03268fa6dd9754be05 Mon Sep 17 00:00:00 2001 From: Vishal Verma Date: Wed, 13 Apr 2022 01:36:18 -0600 Subject: PCI/ACPI: negotiate CXL _OSC Add full support for negotiating _OSC as defined in the CXL 2.0 spec, as applicable to CXL-enabled platforms. Advertise support for the CXL features we support - 'CXL 2.0 port/device register access', 'Protocol Error Reporting', and 'CXL Native Hot Plug'. Request control for 'CXL Memory Error Reporting'. The requests are dependent on CONFIG_* based prerequisites, and prior PCI enabling, similar to how the standard PCI _OSC bits are determined. The CXL specification does not define any additional constraints on the hotplug flow beyond PCIe native hotplug, so a kernel that supports native PCIe hotplug, supports CXL hotplug. For error handling protocol and link errors just use PCIe AER. There is nascent support for amending AER events with CXL specific status [1], but there's otherwise no additional OS responsibility for CXL errors beyond PCIe AER. CXL Memory Errors behave the same as typical memory errors so CONFIG_MEMORY_FAILURE is sufficient to indicate support to platform firmware. [1]: https://lore.kernel.org/linux-cxl/164740402242.3912056.8303625392871313860.stgit@dwillia2-desk3.amr.corp.intel.com/ Cc: Bjorn Helgaas Cc: "Rafael J. Wysocki" Cc: Robert Moore Cc: Dan Williams Reviewed-by: Rafael J. Wysocki Reviewed-by: Davidlohr Bueso Signed-off-by: Vishal Verma Link: https://lore.kernel.org/r/20220413073618.291335-4-vishal.l.verma@intel.com Signed-off-by: Dan Williams --- include/linux/acpi.h | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 82d91d9ccce5..378a431666b3 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -554,10 +554,12 @@ acpi_status acpi_run_osc(acpi_handle handle, struct acpi_osc_context *context); #define OSC_PCI_CAPABILITY_DWORDS 3 #define OSC_CXL_CAPABILITY_DWORDS 5 -/* Indexes into _OSC Capabilities Buffer (DWORDs 2 & 3 are device-specific) */ +/* Indexes into _OSC Capabilities Buffer (DWORDs 2 to 5 are device-specific) */ #define OSC_QUERY_DWORD 0 /* DWORD 1 */ #define OSC_SUPPORT_DWORD 1 /* DWORD 2 */ #define OSC_CONTROL_DWORD 2 /* DWORD 3 */ +#define OSC_EXT_SUPPORT_DWORD 3 /* DWORD 4 */ +#define OSC_EXT_CONTROL_DWORD 4 /* DWORD 5 */ /* _OSC Capabilities DWORD 1: Query/Control and Error Returns (generic) */ #define OSC_QUERY_ENABLE 0x00000001 /* input */ @@ -612,6 +614,15 @@ extern u32 osc_sb_native_usb4_control; #define OSC_PCI_EXPRESS_LTR_CONTROL 0x00000020 #define OSC_PCI_EXPRESS_DPC_CONTROL 0x00000080 +/* CXL _OSC: Capabilities DWORD 4: Support Field */ +#define OSC_CXL_1_1_PORT_REG_ACCESS_SUPPORT 0x00000001 +#define OSC_CXL_2_0_PORT_DEV_REG_ACCESS_SUPPORT 0x00000002 +#define OSC_CXL_PROTOCOL_ERR_REPORTING_SUPPORT 0x00000004 +#define OSC_CXL_NATIVE_HP_SUPPORT 0x00000008 + +/* CXL _OSC: Capabilities DWORD 5: Control Field */ +#define OSC_CXL_ERROR_REPORTING_CONTROL 0x00000001 + static inline u32 acpi_osc_ctx_get_pci_control(struct acpi_osc_context *context) { u32 *ret = context->ret.pointer; @@ -619,6 +630,13 @@ static inline u32 acpi_osc_ctx_get_pci_control(struct acpi_osc_context *context) return ret[OSC_CONTROL_DWORD]; } +static inline u32 acpi_osc_ctx_get_cxl_control(struct acpi_osc_context *context) +{ + u32 *ret = context->ret.pointer; + + return ret[OSC_EXT_CONTROL_DWORD]; +} + #define ACPI_GSB_ACCESS_ATTRIB_QUICK 0x00000002 #define ACPI_GSB_ACCESS_ATTRIB_SEND_RCV 0x00000004 #define ACPI_GSB_ACCESS_ATTRIB_BYTE 0x00000006 @@ -1021,6 +1039,11 @@ static inline u32 acpi_osc_ctx_get_pci_control(struct acpi_osc_context *context) return 0; } +static inline u32 acpi_osc_ctx_get_cxl_control(struct acpi_osc_context *context) +{ + return 0; +} + #endif /* !CONFIG_ACPI */ #ifdef CONFIG_ACPI_HOTPLUG_IOAPIC -- cgit From d864b8ea6468cf1dce614a58eec92a23d8e07fec Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Tue, 26 Apr 2022 12:22:44 -0700 Subject: cxl/acpi: Add root device lockdep validation The CXL "root" device, ACPI0017, is an attach point for coordinating platform level CXL resources and is the parent device for a CXL port topology tree. As such it has distinct locking rules relative to other CXL subsystem objects, but because it is an ACPI device the lock class is established well before it is given to the cxl_acpi driver. However, the lockdep API does support changing the lock class "live" for situations like this. Add a device_lock_set_class() helper that a driver can use in ->probe() to set a custom lock class, and device_lock_reset_class() to return to the default "no validate" class before the custom lock class key goes out of scope after ->remove(). Note the helpers are all macros to support dead code elimination in the CONFIG_PROVE_LOCKING=n case, however device_set_lock_class() still needs #ifdef CONFIG_PROVE_LOCKING since lockdep_match_class() explicitly does not have a helper in the CONFIG_PROVE_LOCKING=n case (see comment in lockdep.h). The lockdep API needs 2 small tweaks to prevent "unused" warnings for the @key argument to lock_set_class(), and a new lock_set_novalidate_class() is added to supplement lockdep_set_novalidate_class() in the cases where the lock class is converted while the lock is held. Suggested-by: Peter Zijlstra Cc: "Rafael J. Wysocki" Cc: Ingo Molnar Cc: Will Deacon Cc: Waiman Long Cc: Boqun Feng Cc: Alison Schofield Cc: Vishal Verma Cc: Ben Widawsky Cc: Jonathan Cameron Reviewed-by: Greg Kroah-Hartman Reviewed-by: Ira Weiny Link: https://lore.kernel.org/r/165100081305.1528964.11138612430659737238.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams --- include/linux/device.h | 43 +++++++++++++++++++++++++++++++++++++++++++ include/linux/lockdep.h | 6 +++++- 2 files changed, 48 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/device.h b/include/linux/device.h index 93459724dcde..833b0b3b0193 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -850,6 +850,49 @@ static inline bool device_supports_offline(struct device *dev) return dev->bus && dev->bus->offline && dev->bus->online; } +#define __device_lock_set_class(dev, name, key) \ +do { \ + struct device *__d2 __maybe_unused = dev; \ + lock_set_class(&__d2->mutex.dep_map, name, key, 0, _THIS_IP_); \ +} while (0) + +/** + * device_lock_set_class - Specify a temporary lock class while a device + * is attached to a driver + * @dev: device to modify + * @key: lock class key data + * + * This must be called with the device_lock() already held, for example + * from driver ->probe(). Take care to only override the default + * lockdep_no_validate class. + */ +#ifdef CONFIG_LOCKDEP +#define device_lock_set_class(dev, key) \ +do { \ + struct device *__d = dev; \ + dev_WARN_ONCE(__d, !lockdep_match_class(&__d->mutex, \ + &__lockdep_no_validate__), \ + "overriding existing custom lock class\n"); \ + __device_lock_set_class(__d, #key, key); \ +} while (0) +#else +#define device_lock_set_class(dev, key) __device_lock_set_class(dev, #key, key) +#endif + +/** + * device_lock_reset_class - Return a device to the default lockdep novalidate state + * @dev: device to modify + * + * This must be called with the device_lock() already held, for example + * from driver ->remove(). + */ +#define device_lock_reset_class(dev) \ +do { \ + struct device *__d __maybe_unused = dev; \ + lock_set_novalidate_class(&__d->mutex.dep_map, "&dev->mutex", \ + _THIS_IP_); \ +} while (0) + void lock_device_hotplug(void); void unlock_device_hotplug(void); int lock_device_hotplug_sysfs(void); diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 467b94257105..43b0dc6a0b21 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -290,6 +290,9 @@ extern void lock_set_class(struct lockdep_map *lock, const char *name, struct lock_class_key *key, unsigned int subclass, unsigned long ip); +#define lock_set_novalidate_class(l, n, i) \ + lock_set_class(l, n, &__lockdep_no_validate__, 0, i) + static inline void lock_set_subclass(struct lockdep_map *lock, unsigned int subclass, unsigned long ip) { @@ -357,7 +360,8 @@ static inline void lockdep_set_selftest_task(struct task_struct *task) # define lock_acquire(l, s, t, r, c, n, i) do { } while (0) # define lock_release(l, i) do { } while (0) # define lock_downgrade(l, i) do { } while (0) -# define lock_set_class(l, n, k, s, i) do { } while (0) +# define lock_set_class(l, n, key, s, i) do { (void)(key); } while (0) +# define lock_set_novalidate_class(l, n, i) do { } while (0) # define lock_set_subclass(l, s, i) do { } while (0) # define lockdep_init() do { } while (0) # define lockdep_init_map_type(lock, name, key, sub, inner, outer, type) \ -- cgit From fd3abd2cafa46955846d731b9a6ded2c19ab73d8 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Thu, 21 Apr 2022 08:33:45 -0700 Subject: device-core: Kill the lockdep_mutex Per Peter [1], the lockdep API has native support for all the use cases lockdep_mutex was attempting to enable. Now that all lockdep_mutex users have been converted to those APIs, drop this lock. Link: https://lore.kernel.org/r/Ylf0dewci8myLvoW@hirez.programming.kicks-ass.net [1] Cc: Greg Kroah-Hartman Cc: "Rafael J. Wysocki" Suggested-by: Peter Zijlstra Reviewed-by: Ira Weiny Acked-by: Greg Kroah-Hartman Link: https://lore.kernel.org/r/165055522548.3745911.14298368286915484086.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams --- include/linux/device.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/device.h b/include/linux/device.h index 833b0b3b0193..073f1b0126ac 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -400,8 +400,6 @@ struct dev_msi_info { * This identifies the device type and carries type-specific * information. * @mutex: Mutex to synchronize calls to its driver. - * @lockdep_mutex: An optional debug lock that a subsystem can use as a - * peer lock to gain localized lockdep coverage of the device_lock. * @bus: Type of bus device is on. * @driver: Which driver has allocated this * @platform_data: Platform data specific to the device. @@ -499,9 +497,6 @@ struct device { core doesn't touch it */ void *driver_data; /* Driver data, set and get with dev_set_drvdata/dev_get_drvdata */ -#ifdef CONFIG_PROVE_LOCKING - struct mutex lockdep_mutex; -#endif struct mutex mutex; /* mutex to synchronize calls to * its driver. */ -- cgit From 9096bbe951ddd3f9dc813e73b5bde6d5715d1cdb Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Thu, 28 Apr 2022 23:15:58 -0700 Subject: mm: shmem: make shmem_init return void The return value of shmem_init is never used. So we can make it return void now. [akpm@linux-foundation.org: remove `return;' from void-returning function, per Muchun Song] Link: https://lkml.kernel.org/r/20220328112707.22217-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Cc: Muchun Song Cc: Hugh Dickins Signed-off-by: Andrew Morton --- include/linux/shmem_fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index ab51d3cd39bd..3e915cc550bc 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -56,7 +56,7 @@ static inline struct shmem_inode_info *SHMEM_I(struct inode *inode) * Functions in mm/shmem.c called directly from elsewhere: */ extern const struct fs_parameter_spec shmem_fs_parameters[]; -extern int shmem_init(void); +extern void shmem_init(void); extern int shmem_init_fs_context(struct fs_context *fc); extern struct file *shmem_file_setup(const char *name, loff_t size, unsigned long flags); -- cgit From ef7a4ffc4c7f969d61e297c53cb2de1d6f012a11 Mon Sep 17 00:00:00 2001 From: Lu Jialin Date: Thu, 28 Apr 2022 23:16:00 -0700 Subject: mm/memcontrol.c: make cgroup_memory_noswap static cgroup_memory_noswap is only used in mm/memcontrol.c, therefore just make it static, and remove export in include/linux/memcontrol.h Link: https://lkml.kernel.org/r/20220421124736.62180-1-lujialin4@huawei.com Signed-off-by: Lu Jialin Acked-by: Johannes Weiner Acked-by: Shakeel Butt Acked-by: Roman Gushchin Reviewed-by: Muchun Song Cc: Michal Hocko Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 89b14729d59f..fe580cb96683 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -937,10 +937,6 @@ struct mem_cgroup *mem_cgroup_get_oom_group(struct task_struct *victim, struct mem_cgroup *oom_domain); void mem_cgroup_print_oom_group(struct mem_cgroup *memcg); -#ifdef CONFIG_MEMCG_SWAP -extern bool cgroup_memory_noswap; -#endif - void folio_memcg_lock(struct folio *folio); void folio_memcg_unlock(struct folio *folio); void lock_page_memcg(struct page *page); -- cgit From 2ba2b008a8bf5fd268a43d03ba79e0ad464d6836 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Thu, 28 Apr 2022 23:16:02 -0700 Subject: Revert "mm/memory-failure.c: fix race with changing page compound again" Reverts commit 888af2701db7 ("mm/memory-failure.c: fix race with changing page compound again") because now we fetch the page refcount under hugetlb_lock in try_memory_failure_hugetlb() so that the race check is no longer necessary. Link: https://lkml.kernel.org/r/20220408135323.1559401-4-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi Suggested-by: Miaohe Lin Reviewed-by: Miaohe Lin Reviewed-by: Mike Kravetz Cc: Miaohe Lin Cc: Yang Shi Cc: Dan Carpenter Signed-off-by: Andrew Morton --- include/linux/mm.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 9f44254af8ce..d446e834a3e5 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3251,7 +3251,6 @@ enum mf_action_page_type { MF_MSG_BUDDY, MF_MSG_DAX, MF_MSG_UNSPLIT_THP, - MF_MSG_DIFFERENT_PAGE_SIZE, MF_MSG_UNKNOWN, }; -- cgit From 7d6e2d96384556a4f30547803be1f606eb805a62 Mon Sep 17 00:00:00 2001 From: Oscar Salvador Date: Thu, 28 Apr 2022 23:16:09 -0700 Subject: mm: untangle config dependencies for demote-on-reclaim At the time demote-on-reclaim was introduced, it was tied to CONFIG_HOTPLUG_CPU + CONFIG_MIGRATE, but that is not really accurate. The only two things we need to depend on are CONFIG_NUMA + CONFIG_MIGRATE, so clean this up. Furthermore, we only register the hotplug memory notifier when the system has CONFIG_MEMORY_HOTPLUG. Link: https://lkml.kernel.org/r/20220322224016.4574-1-osalvador@suse.de Signed-off-by: Oscar Salvador Suggested-by: "Huang, Ying" Reviewed-by: "Huang, Ying" Cc: Dave Hansen Cc: Abhishek Goel Cc: Baolin Wang Signed-off-by: Andrew Morton --- include/linux/migrate.h | 34 +++++++++++++++------------------- 1 file changed, 15 insertions(+), 19 deletions(-) (limited to 'include/linux') diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 90e75d5a54d6..2707bfd43a0d 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -47,17 +47,8 @@ void folio_migrate_copy(struct folio *newfolio, struct folio *folio); int folio_migrate_mapping(struct address_space *mapping, struct folio *newfolio, struct folio *folio, int extra_count); -extern bool numa_demotion_enabled; -extern void migrate_on_reclaim_init(void); -#ifdef CONFIG_HOTPLUG_CPU -extern void set_migration_target_nodes(void); -#else -static inline void set_migration_target_nodes(void) {} -#endif #else -static inline void set_migration_target_nodes(void) {} - static inline void putback_movable_pages(struct list_head *l) {} static inline int migrate_pages(struct list_head *l, new_page_t new, free_page_t free, unsigned long private, enum migrate_mode mode, @@ -82,9 +73,23 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping, return -ENOSYS; } -#define numa_demotion_enabled false #endif /* CONFIG_MIGRATION */ +#if defined(CONFIG_MIGRATION) && defined(CONFIG_NUMA) +extern void set_migration_target_nodes(void); +extern void migrate_on_reclaim_init(void); +extern bool numa_demotion_enabled; +extern int next_demotion_node(int node); +#else +static inline void set_migration_target_nodes(void) {} +static inline void migrate_on_reclaim_init(void) {} +static inline int next_demotion_node(int node) +{ + return NUMA_NO_NODE; +} +#define numa_demotion_enabled false +#endif + #ifdef CONFIG_COMPACTION extern int PageMovable(struct page *page); extern void __SetPageMovable(struct page *page, struct address_space *mapping); @@ -172,15 +177,6 @@ struct migrate_vma { int migrate_vma_setup(struct migrate_vma *args); void migrate_vma_pages(struct migrate_vma *migrate); void migrate_vma_finalize(struct migrate_vma *migrate); -int next_demotion_node(int node); - -#else /* CONFIG_MIGRATION disabled: */ - -static inline int next_demotion_node(int node) -{ - return NUMA_NO_NODE; -} - #endif /* CONFIG_MIGRATION */ #endif /* _LINUX_MIGRATE_H */ -- cgit From 6a8e0596f00469c15ec556b9f3624acd2e9a04f9 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Thu, 28 Apr 2022 23:16:10 -0700 Subject: mm: rmap: introduce pfn_mkclean_range() to cleans PTEs The page_mkclean_one() is supposed to be used with the pfn that has a associated struct page, but not all the pfns (e.g. DAX) have a struct page. Introduce a new function pfn_mkclean_range() to cleans the PTEs (including PMDs) mapped with range of pfns which has no struct page associated with them. This helper will be used by DAX device in the next patch to make pfns clean. Link: https://lkml.kernel.org/r/20220403053957.10770-4-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: Alistair Popple Cc: Al Viro Cc: Christoph Hellwig Cc: Dan Williams Cc: Hugh Dickins Cc: Jan Kara Cc: "Kirill A. Shutemov" Cc: Matthew Wilcox Cc: Ralph Campbell Cc: Ross Zwisler Cc: Xiongchun Duan Cc: Xiyu Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/rmap.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 17230c458341..8573aae50d96 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -261,6 +261,9 @@ unsigned long page_address_in_vma(struct page *, struct vm_area_struct *); */ int folio_mkclean(struct folio *); +int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff, + struct vm_area_struct *vma); + void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked); /* -- cgit From 0e5e64c0b0d7bb33a5e971ad17e771cf6e0a1127 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Thu, 28 Apr 2022 23:16:10 -0700 Subject: mm: simplify follow_invalidate_pte() The only user (DAX) of range and pmdpp parameters of follow_invalidate_pte() is gone, it is safe to remove them and make it static to simlify the code. This is revertant of the following commits: 097963959594 ("mm: add follow_pte_pmd()") a4d1a8852513 ("dax: update to new mmu_notifier semantic") There is only one caller of the follow_invalidate_pte(). So just fold it into follow_pte() and remove it. Link: https://lkml.kernel.org/r/20220403053957.10770-7-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: Christoph Hellwig Cc: Alistair Popple Cc: Al Viro Cc: Dan Williams Cc: Hugh Dickins Cc: Jan Kara Cc: "Kirill A. Shutemov" Cc: Matthew Wilcox Cc: Ralph Campbell Cc: Ross Zwisler Cc: Xiongchun Duan Cc: Xiyu Yang Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/mm.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index d446e834a3e5..16d6b46d2d3e 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1845,9 +1845,6 @@ void free_pgd_range(struct mmu_gather *tlb, unsigned long addr, unsigned long end, unsigned long floor, unsigned long ceiling); int copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma); -int follow_invalidate_pte(struct mm_struct *mm, unsigned long address, - struct mmu_notifier_range *range, pte_t **ptepp, - pmd_t **pmdpp, spinlock_t **ptlp); int follow_pte(struct mm_struct *mm, unsigned long address, pte_t **ptepp, spinlock_t **ptlp); int follow_pfn(struct vm_area_struct *vma, unsigned long address, -- cgit From 3afa793082e624f7bd83533010ff4a676451d4ee Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Thu, 28 Apr 2022 23:16:14 -0700 Subject: mm/mmap: drop arch_vm_get_page_pgprot() There are no platforms left which use arch_vm_get_page_prot(). Just drop generic arch_vm_get_page_prot(). Link: https://lkml.kernel.org/r/20220414062125.609297-8-anshuman.khandual@arm.com Cc: Andrew Morton Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Christophe Leroy Reviewed-by: Catalin Marinas Signed-off-by: Anshuman Khandual Cc: Christoph Hellwig Cc: David S. Miller Cc: Ingo Molnar Cc: Khalid Aziz Cc: Michael Ellerman Cc: Paul Mackerras Cc: Thomas Gleixner Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mman.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mman.h b/include/linux/mman.h index b66e91b8176c..58b3abd457a3 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -93,10 +93,6 @@ static inline void vm_unacct_memory(long pages) #define arch_calc_vm_flag_bits(flags) 0 #endif -#ifndef arch_vm_get_page_prot -#define arch_vm_get_page_prot(vm_flags) __pgprot(0) -#endif - #ifndef arch_validate_prot /* * This is called from mprotect(). PROT_GROWSDOWN and PROT_GROWSUP have -- cgit From 5981611d0a006472d367d7a8e6ead8afaecf17c7 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Thu, 28 Apr 2022 23:16:14 -0700 Subject: mm: hugetlb_vmemmap: cleanup hugetlb_vmemmap related functions Patch series "cleanup hugetlb_vmemmap". The word of "free" is not expressive enough to express the feature of optimizing vmemmap pages associated with each HugeTLB, rename this keywork to "optimize" is more clear. In this series, cheanup related codes to make it more clear and expressive. This is suggested by David. This patch (of 3): The word of "free" is not expressive enough to express the feature of optimizing vmemmap pages associated with each HugeTLB, rename this keywork to "optimize". And some function names are prefixed with "huge_page" instead of "hugetlb", it is easily to be confused with THP. In this patch, cheanup related functions to make code more clear and expressive. Link: https://lkml.kernel.org/r/20220404074652.68024-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220404074652.68024-2-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: David Hildenbrand Cc: Mike Kravetz Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index ac2a1d758a80..189bddb8eca4 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -624,7 +624,7 @@ struct hstate { unsigned int free_huge_pages_node[MAX_NUMNODES]; unsigned int surplus_huge_pages_node[MAX_NUMNODES]; #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP - unsigned int nr_free_vmemmap_pages; + unsigned int optimize_vmemmap_pages; #endif #ifdef CONFIG_CGROUP_HUGETLB /* cgroup control files */ -- cgit From f10f1442c309ccef7a80ba3dc4abde0978e86fb4 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Thu, 28 Apr 2022 23:16:15 -0700 Subject: mm: hugetlb_vmemmap: cleanup hugetlb_free_vmemmap_enabled* The word of "free" is not expressive enough to express the feature of optimizing vmemmap pages associated with each HugeTLB, rename this keywork to "optimize". In this patch , cheanup the static key and hugetlb_free_vmemmap_enabled() to make code more expressive. Link: https://lkml.kernel.org/r/20220404074652.68024-3-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: David Hildenbrand Cc: Mike Kravetz Signed-off-by: Andrew Morton --- include/linux/page-flags.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 9d8eeaa67d05..573f499d5a2d 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -192,16 +192,16 @@ enum pageflags { #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON, - hugetlb_free_vmemmap_enabled_key); + hugetlb_optimize_vmemmap_key); -static __always_inline bool hugetlb_free_vmemmap_enabled(void) +static __always_inline bool hugetlb_optimize_vmemmap_enabled(void) { return static_branch_maybe(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON, - &hugetlb_free_vmemmap_enabled_key); + &hugetlb_optimize_vmemmap_key); } /* - * If the feature of freeing some vmemmap pages associated with each HugeTLB + * If the feature of optimizing vmemmap pages associated with each HugeTLB * page is enabled, the head vmemmap page frame is reused and all of the tail * vmemmap addresses map to the head vmemmap page frame (furture details can * refer to the figure at the head of the mm/hugetlb_vmemmap.c). In other @@ -218,7 +218,7 @@ static __always_inline bool hugetlb_free_vmemmap_enabled(void) */ static __always_inline const struct page *page_fixed_fake_head(const struct page *page) { - if (!hugetlb_free_vmemmap_enabled()) + if (!hugetlb_optimize_vmemmap_enabled()) return page; /* @@ -247,7 +247,7 @@ static inline const struct page *page_fixed_fake_head(const struct page *page) return page; } -static inline bool hugetlb_free_vmemmap_enabled(void) +static inline bool hugetlb_optimize_vmemmap_enabled(void) { return false; } -- cgit From 47010c040dec8af6347ec6259104fc13f7e7e30a Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Thu, 28 Apr 2022 23:16:15 -0700 Subject: mm: hugetlb_vmemmap: cleanup CONFIG_HUGETLB_PAGE_FREE_VMEMMAP* The word of "free" is not expressive enough to express the feature of optimizing vmemmap pages associated with each HugeTLB, rename this keywork to "optimize". In this patch , cheanup configs to make code more expressive. Link: https://lkml.kernel.org/r/20220404074652.68024-4-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: Mike Kravetz Cc: David Hildenbrand Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 2 +- include/linux/mm.h | 2 +- include/linux/page-flags.h | 6 +++--- 3 files changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 189bddb8eca4..ac2ece9e9c79 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -623,7 +623,7 @@ struct hstate { unsigned int nr_huge_pages_node[MAX_NUMNODES]; unsigned int free_huge_pages_node[MAX_NUMNODES]; unsigned int surplus_huge_pages_node[MAX_NUMNODES]; -#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP +#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP unsigned int optimize_vmemmap_pages; #endif #ifdef CONFIG_CGROUP_HUGETLB diff --git a/include/linux/mm.h b/include/linux/mm.h index 16d6b46d2d3e..a8f4c7e96ad5 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3145,7 +3145,7 @@ static inline void print_vma_addr(char *prefix, unsigned long rip) } #endif -#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP +#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP int vmemmap_remap_free(unsigned long start, unsigned long end, unsigned long reuse); int vmemmap_remap_alloc(unsigned long start, unsigned long end, diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 573f499d5a2d..1ea896887ee4 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -190,13 +190,13 @@ enum pageflags { #ifndef __GENERATING_BOUNDS_H -#ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP -DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON, +#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP +DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON, hugetlb_optimize_vmemmap_key); static __always_inline bool hugetlb_optimize_vmemmap_enabled(void) { - return static_branch_maybe(CONFIG_HUGETLB_PAGE_FREE_VMEMMAP_DEFAULT_ON, + return static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON, &hugetlb_optimize_vmemmap_key); } -- cgit From e3246d8f52173a798710314a42fea83223036fc8 Mon Sep 17 00:00:00 2001 From: Joao Martins Date: Thu, 28 Apr 2022 23:16:15 -0700 Subject: mm/sparse-vmemmap: add a pgmap argument to section activation Patch series "sparse-vmemmap: memory savings for compound devmaps (device-dax)", v9. This series minimizes 'struct page' overhead by pursuing a similar approach as Muchun Song series "Free some vmemmap pages of hugetlb page" (now merged since v5.14), but applied to devmap with @vmemmap_shift (device-dax). The vmemmap dedpulication original idea (already used in HugeTLB) is to reuse/deduplicate tail page vmemmap areas, particular the area which only describes tail pages. So a vmemmap page describes 64 struct pages, and the first page for a given ZONE_DEVICE vmemmap would contain the head page and 63 tail pages. The second vmemmap page would contain only tail pages, and that's what gets reused across the rest of the subsection/section. The bigger the page size, the bigger the savings (2M hpage -> save 6 vmemmap pages; 1G hpage -> save 4094 vmemmap pages). This is done for PMEM /specifically only/ on device-dax configured namespaces, not fsdax. In other words, a devmap with a @vmemmap_shift. In terms of savings, per 1Tb of memory, the struct page cost would go down with compound devmap: * with 2M pages we lose 4G instead of 16G (0.39% instead of 1.5% of total memory) * with 1G pages we lose 40MB instead of 16G (0.0014% instead of 1.5% of total memory) The series is mostly summed up by patch 4, and to summarize what the series does: Patches 1 - 3: Minor cleanups in preparation for patch 4. Move the very nice docs of hugetlb_vmemmap.c into a Documentation/vm/ entry. Patch 4: Patch 4 is the one that takes care of the struct page savings (also referred to here as tail-page/vmemmap deduplication). Much like Muchun series, we reuse the second PTE tail page vmemmap areas across a given @vmemmap_shift On important difference though, is that contrary to the hugetlbfs series, there's no vmemmap for the area because we are late-populating it as opposed to remapping a system-ram range. IOW no freeing of pages of already initialized vmemmap like the case for hugetlbfs, which greatly simplifies the logic (besides not being arch-specific). altmap case unchanged and still goes via the vmemmap_populate(). Also adjust the newly added docs to the device-dax case. [Note that device-dax is still a little behind HugeTLB in terms of savings. I have an additional simple patch that reuses the head vmemmap page too, as a follow-up. That will double the savings and namespaces initialization.] Patch 5: Initialize fewer struct pages depending on the page size with DRAM backed struct pages -- because fewer pages are unique and most tail pages (with bigger vmemmap_shift). NVDIMM namespace bootstrap improves from ~268-358 ms to ~80-110/<1ms on 128G NVDIMMs with 2M and 1G respectivally. And struct page needed capacity will be 3.8x / 1071x smaller for 2M and 1G respectivelly. Tested on x86 with 1.5Tb of pmem (including pinning, and RDMA registration/deregistration scalability with 2M MRs) This patch (of 5): In support of using compound pages for devmap mappings, plumb the pgmap down to the vmemmap_populate implementation. Note that while altmap is retrievable from pgmap the memory hotplug code passes altmap without pgmap[*], so both need to be independently plumbed. So in addition to @altmap, pass @pgmap to sparse section populate functions namely: sparse_add_section section_activate populate_section_memmap __populate_section_memmap Passing @pgmap allows __populate_section_memmap() to both fetch the vmemmap_shift in which memmap metadata is created for and also to let sparse-vmemmap fetch pgmap ranges to co-relate to a given section and pick whether to just reuse tail pages from past onlined sections. While at it, fix the kdoc for @altmap for sparse_add_section(). [*] https://lore.kernel.org/linux-mm/20210319092635.6214-1-osalvador@suse.de/ Link: https://lkml.kernel.org/r/20220420155310.9712-1-joao.m.martins@oracle.com Link: https://lkml.kernel.org/r/20220420155310.9712-2-joao.m.martins@oracle.com Signed-off-by: Joao Martins Reviewed-by: Dan Williams Reviewed-by: Muchun Song Cc: Vishal Verma Cc: Matthew Wilcox Cc: Jason Gunthorpe Cc: Jane Chu Cc: Mike Kravetz Cc: Jonathan Corbet Cc: Christoph Hellwig Signed-off-by: Andrew Morton --- include/linux/memory_hotplug.h | 5 ++++- include/linux/mm.h | 3 ++- 2 files changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 1ce6f8044f1e..e0b2209ab71c 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -15,6 +15,7 @@ struct memory_block; struct memory_group; struct resource; struct vmem_altmap; +struct dev_pagemap; #ifdef CONFIG_HAVE_ARCH_NODEDATA_EXTENSION /* @@ -122,6 +123,7 @@ typedef int __bitwise mhp_t; struct mhp_params { struct vmem_altmap *altmap; pgprot_t pgprot; + struct dev_pagemap *pgmap; }; bool mhp_range_allowed(u64 start, u64 size, bool need_mapping); @@ -333,7 +335,8 @@ extern void remove_pfn_range_from_zone(struct zone *zone, unsigned long nr_pages); extern bool is_memblock_offlined(struct memory_block *mem); extern int sparse_add_section(int nid, unsigned long pfn, - unsigned long nr_pages, struct vmem_altmap *altmap); + unsigned long nr_pages, struct vmem_altmap *altmap, + struct dev_pagemap *pgmap); extern void sparse_remove_section(struct mem_section *ms, unsigned long pfn, unsigned long nr_pages, unsigned long map_offset, struct vmem_altmap *altmap); diff --git a/include/linux/mm.h b/include/linux/mm.h index a8f4c7e96ad5..80bba49387e9 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3154,7 +3154,8 @@ int vmemmap_remap_alloc(unsigned long start, unsigned long end, void *sparse_buffer_alloc(unsigned long size); struct page * __populate_section_memmap(unsigned long pfn, - unsigned long nr_pages, int nid, struct vmem_altmap *altmap); + unsigned long nr_pages, int nid, struct vmem_altmap *altmap, + struct dev_pagemap *pgmap); pgd_t *vmemmap_pgd_populate(unsigned long addr, int node); p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node); pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node); -- cgit From 4917f55b4ef963e2d2288fe4eb651728be8db406 Mon Sep 17 00:00:00 2001 From: Joao Martins Date: Thu, 28 Apr 2022 23:16:16 -0700 Subject: mm/sparse-vmemmap: improve memory savings for compound devmaps A compound devmap is a dev_pagemap with @vmemmap_shift > 0 and it means that pages are mapped at a given huge page alignment and utilize uses compound pages as opposed to order-0 pages. Take advantage of the fact that most tail pages look the same (except the first two) to minimize struct page overhead. Allocate a separate page for the vmemmap area which contains the head page and separate for the next 64 pages. The rest of the subsections then reuse this tail vmemmap page to initialize the rest of the tail pages. Sections are arch-dependent (e.g. on x86 it's 64M, 128M or 512M) and when initializing compound devmap with big enough @vmemmap_shift (e.g. 1G PUD) it may cross multiple sections. The vmemmap code needs to consult @pgmap so that multiple sections that all map the same tail data can refer back to the first copy of that data for a given gigantic page. On compound devmaps with 2M align, this mechanism lets 6 pages be saved out of the 8 necessary PFNs necessary to set the subsection's 512 struct pages being mapped. On a 1G compound devmap it saves 4094 pages. Altmap isn't supported yet, given various restrictions in altmap pfn allocator, thus fallback to the already in use vmemmap_populate(). It is worth noting that altmap for devmap mappings was there to relieve the pressure of inordinate amounts of memmap space to map terabytes of pmem. With compound pages the motivation for altmaps for pmem gets reduced. Link: https://lkml.kernel.org/r/20220420155310.9712-5-joao.m.martins@oracle.com Signed-off-by: Joao Martins Reviewed-by: Muchun Song Cc: Christoph Hellwig Cc: Dan Williams Cc: Jane Chu Cc: Jason Gunthorpe Cc: Jonathan Corbet Cc: Matthew Wilcox Cc: Mike Kravetz Cc: Vishal Verma Signed-off-by: Andrew Morton --- include/linux/mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 80bba49387e9..b9316b2c11ce 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3161,7 +3161,7 @@ p4d_t *vmemmap_p4d_populate(pgd_t *pgd, unsigned long addr, int node); pud_t *vmemmap_pud_populate(p4d_t *p4d, unsigned long addr, int node); pmd_t *vmemmap_pmd_populate(pud_t *pud, unsigned long addr, int node); pte_t *vmemmap_pte_populate(pmd_t *pmd, unsigned long addr, int node, - struct vmem_altmap *altmap); + struct vmem_altmap *altmap, struct page *reuse); void *vmemmap_alloc_block(unsigned long size, int node); struct vmem_altmap; void *vmemmap_alloc_block_buf(unsigned long size, int node, -- cgit From ba91fb7dd03c4f463453f28af24851097a6547fb Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 28 Apr 2022 23:16:16 -0700 Subject: include/linux/swapops.h: remove stub for non_swap_entry() The stub for non_swap_entry() may not help much, because MAX_SWAPFILES has already contained all the information to decide whether a swap entry is real swap entry or pesudo ones (migrations, ...). There can be some performance influences on non_swap_entry() with below conditions all met: !CONFIG_MIGRATION && !CONFIG_MEMORY_FAILURE && !CONFIG_DEVICE_PRIVATE But that's definitely not the major config most machines will use, at the meantime it's already in a slow path of swap entry (being parsed from a swap pte), so IMHO it shouldn't be a major issue. Also according to the analysis from Alistair, somehow the stub didn't do the job right [1]. To make the code cleaner, let's drop the stub. [1] https://lore.kernel.org/lkml/8735ihbw6g.fsf@nvdebian.thelocal/ Note: the uffd-wp shmem & hugetlbfs series will need this patch to make sure swap entries work as expected with below config as spotted by Alistair: !CONFIG_MIGRATION && !CONFIG_MEMORY_FAILURE && !CONFIG_DEVICE_PRIVATE && CONFIG_PTE_MARKER (PS: this config should mostly never gonna happen, though, afaict..) Link: https://lkml.kernel.org/r/20220413191147.66645-1-peterx@redhat.com Signed-off-by: Peter Xu Reviewed-by: Alistair Popple Cc: Hugh Dickins Signed-off-by: Andrew Morton --- include/linux/swapops.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swapops.h b/include/linux/swapops.h index d356ab4047f7..5af852b68805 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -387,18 +387,10 @@ static inline void num_poisoned_pages_inc(void) } #endif -#if defined(CONFIG_MEMORY_FAILURE) || defined(CONFIG_MIGRATION) || \ - defined(CONFIG_DEVICE_PRIVATE) static inline int non_swap_entry(swp_entry_t entry) { return swp_type(entry) >= MAX_SWAPFILES; } -#else -static inline int non_swap_entry(swp_entry_t entry) -{ - return 0; -} -#endif #endif /* CONFIG_MMU */ #endif /* _LINUX_SWAPOPS_H */ -- cgit From 7609385337a4feb6236e42dcd0df2185683ce839 Mon Sep 17 00:00:00 2001 From: xu xin Date: Thu, 28 Apr 2022 23:16:16 -0700 Subject: ksm: count ksm merging pages for each process Some applications or containers want to use KSM by calling madvise() to advise areas of address space to be MERGEABLE. But they may not know which applications are more likely to cause real merges in the deployment. If this patch is applied, it helps them know their corresponding number of merged pages, and then optimize their app code. As current KSM only counts the number of KSM merging pages(e.g. ksm_pages_sharing and ksm_pages_shared) of the whole system, we cannot see the more fine-grained KSM merging, for the upper application optimization, the merging area cannot be set easily according to the KSM page merging probability of each process. Therefore, it is necessary to add extra statistical means so that the upper level users can know the detailed KSM merging information of each process. We add a new proc file named as ksm_merging_pages under /proc// to indicate the involved ksm merging pages of this process. [akpm@linux-foundation.org: fix comment typo, remove BUG_ON()s] Link: https://lkml.kernel.org/r/20220325082318.2352853-1-xu.xin16@zte.com.cn Signed-off-by: xu xin Reported-by: kernel test robot Reviewed-by: Yang Yang Reviewed-by: Ran Xiaokai Reported-by: Zeal Robot Cc: Kees Cook Cc: Alexey Dobriyan Cc: Stephen Rothwell Cc: Ohhoon Kwon Cc: Matthew Wilcox (Oracle) Cc: Stephen Brennan Cc: Vlastimil Babka Cc: Feng Tang Cc: Yang Yang Cc: Ran Xiaokai Cc: Zeal Robot Signed-off-by: Andrew Morton --- include/linux/mm_types.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 8834e38c06a4..87eddd509de2 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -654,6 +654,13 @@ struct mm_struct { #ifdef CONFIG_IOMMU_SVA u32 pasid; +#endif +#ifdef CONFIG_KSM + /* + * Represent how many pages of this process are involved in KSM + * merging. + */ + unsigned long ksm_merging_pages; #endif } __randomize_layout; -- cgit From 94bfe85bde18a99612bb5d0ce348643c2d352836 Mon Sep 17 00:00:00 2001 From: Yang Yang Date: Thu, 28 Apr 2022 23:16:16 -0700 Subject: mm/vmstat: add events for ksm cow Users may use ksm by calling madvise(, , MADV_MERGEABLE) when they want to save memory, it's a tradeoff by suffering delay on ksm cow. Users can get to know how much memory ksm saved by reading /sys/kernel/mm/ksm/pages_sharing, but they don't know what's the costs of ksm cow, and this is important of some delay sensitive tasks. So add ksm cow events to help users evaluate whether or how to use ksm. Also update Documentation/admin-guide/mm/ksm.rst with new added events. Link: https://lkml.kernel.org/r/20220331035616.2390805-1-yang.yang29@zte.com.cn Signed-off-by: Yang Yang Reviewed-by: David Hildenbrand Reviewed-by: xu xin Reviewed-by: Ran Xiaokai Cc: Matthew Wilcox (Oracle) Cc: Jonathan Corbet Cc: Dave Hansen Cc: Saravanan D Cc: Minchan Kim Cc: John Hubbard Signed-off-by: Andrew Morton --- include/linux/vm_event_item.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 16a0a4fd000b..e83967e4c20e 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -133,6 +133,9 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, KSM_SWPIN_COPY, #endif #endif +#ifdef CONFIG_KSM + COW_KSM, +#endif #ifdef CONFIG_X86 DIRECT_MAP_LEVEL2_SPLIT, DIRECT_MAP_LEVEL3_SPLIT, -- cgit From 024c61eaff17f6a8c73cb8b25137251f0c3ef039 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Thu, 28 Apr 2022 23:16:17 -0700 Subject: mm: compaction: remove unneeded return value of kcompactd_run Patch series "A few cleanup and fixup patches for compaction". This series contains a few patches to clean up some obsolete comment, remove unneeded return value and so on. Also we fix the possible NULL pointer dereference. More details can be found in the respective changelogs. This patch (of 12): The return value of kcompactd_run() is unused now. Clean it up. Link: https://lkml.kernel.org/r/20220418141253.24298-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220418141253.24298-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Cc; Mel Gorman Cc: David Hildenbrand Cc: Vlastimil Babka Cc: Pintu Kumar Cc: Charan Teja Kalla Signed-off-by: Andrew Morton --- include/linux/compaction.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/compaction.h b/include/linux/compaction.h index 34bce35c808d..52a9ff65faee 100644 --- a/include/linux/compaction.h +++ b/include/linux/compaction.h @@ -177,7 +177,7 @@ static inline bool compaction_withdrawn(enum compact_result result) bool compaction_zonelist_suitable(struct alloc_context *ac, int order, int alloc_flags); -extern int kcompactd_run(int nid); +extern void kcompactd_run(int nid); extern void kcompactd_stop(int nid); extern void wakeup_kcompactd(pg_data_t *pgdat, int order, int highest_zoneidx); @@ -212,9 +212,8 @@ static inline bool compaction_withdrawn(enum compact_result result) return true; } -static inline int kcompactd_run(int nid) +static inline void kcompactd_run(int nid) { - return 0; } static inline void kcompactd_stop(int nid) { -- cgit From 766475cb526b2fc5ca21374b5810d6d8557870fc Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Thu, 28 Apr 2022 12:28:59 +0200 Subject: ARM: omap1: add back omap_set_dma_priority() stub One of my multiplatform patches went a little too far and removed a declaration that is needed for compile-testing the omapfb driver on non-OMAP1 platforms: arm-linux-gnueabi-ld: drivers/video/fbdev/omap/omapfb_main.o: in function `omapfb_do_probe': omapfb_main.c:(.text+0x41ec): undefined reference to `omap_set_dma_priority' Add back the inline stub, and in turn hide the definition when omapfb is disabled, like we do for the usb specific bits. Reported-by: kernel test robot Reviewed-by: Tony Lindgren Fixes: 52ef8efcb75e ("dma: omap: hide legacy interface") Signed-off-by: Arnd Bergmann --- include/linux/omap-dma.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/omap-dma.h b/include/linux/omap-dma.h index 254b4e10511b..6f6c31e3fb93 100644 --- a/include/linux/omap-dma.h +++ b/include/linux/omap-dma.h @@ -294,7 +294,14 @@ struct omap_system_dma_plat_info { extern struct omap_system_dma_plat_info *omap_get_plat_info(void); +#if defined(CONFIG_ARCH_OMAP1) extern void omap_set_dma_priority(int lch, int dst_port, int priority); +#else +static inline void omap_set_dma_priority(int lch, int dst_port, int priority) +{ +} +#endif + extern int omap_request_dma(int dev_id, const char *dev_name, void (*callback)(int lch, u16 ch_status, void *data), void *data, int *dma_ch); -- cgit From 50e7b416d2ab10b9771bd00a4d85df90ad2e4b37 Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Thu, 28 Apr 2022 15:43:37 +0100 Subject: sched/fair: Remove sched_trace_*() helper functions We no longer need them as we can use DWARF debug info or BTF + pahole to re-generate the required structs to compile against them for a given kernel. This moves the burden of maintaining these helper functions to the module. https://github.com/qais-yousef/sched_tp Note that pahole v1.15 is required at least for using DWARF. And for BTF v1.23 which is not yet released will be required. There's alignment problem that will lead to crashes in earlier versions when used with BTF. We should have enough infrastructure to make these helper functions now obsolete, so remove them. [Rewrote commit message to reflect the new alternative] Signed-off-by: Dietmar Eggemann Signed-off-by: Qais Yousef Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220428144338.479094-2-qais.yousef@arm.com --- include/linux/sched.h | 14 -------------- 1 file changed, 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 67f06f72c50e..fc74ea2578b7 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2378,20 +2378,6 @@ static inline void rseq_syscall(struct pt_regs *regs) #endif -const struct sched_avg *sched_trace_cfs_rq_avg(struct cfs_rq *cfs_rq); -char *sched_trace_cfs_rq_path(struct cfs_rq *cfs_rq, char *str, int len); -int sched_trace_cfs_rq_cpu(struct cfs_rq *cfs_rq); - -const struct sched_avg *sched_trace_rq_avg_rt(struct rq *rq); -const struct sched_avg *sched_trace_rq_avg_dl(struct rq *rq); -const struct sched_avg *sched_trace_rq_avg_irq(struct rq *rq); - -int sched_trace_rq_cpu(struct rq *rq); -int sched_trace_rq_cpu_capacity(struct rq *rq); -int sched_trace_rq_nr_running(struct rq *rq); - -const struct cpumask *sched_trace_rd_span(struct root_domain *rd); - #ifdef CONFIG_SCHED_CORE extern void sched_core_free(struct task_struct *tsk); extern void sched_core_fork(struct task_struct *p); -- cgit From 498af8d1678ae2351218337b47bbf3cb0fc16821 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Fri, 29 Apr 2022 12:39:45 +0100 Subject: firmware: arm_ffa: Add ffa_dev_get_drvdata helper function Add a helper function to fetch ffa_dev's driver_data using dev_get_drvdata. At the same time move existing ffa_dev_set_drvdata to use dev_set_drvdata. Link: https://lore.kernel.org/r/20220429113946.2087145-3-sudeep.holla@arm.com Suggested-by: Arunachalam Ganapathy Signed-off-by: Sudeep Holla --- include/linux/arm_ffa.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/arm_ffa.h b/include/linux/arm_ffa.h index 85651e41ded8..e5c76c1ef9ed 100644 --- a/include/linux/arm_ffa.h +++ b/include/linux/arm_ffa.h @@ -38,7 +38,12 @@ struct ffa_driver { static inline void ffa_dev_set_drvdata(struct ffa_device *fdev, void *data) { - fdev->dev.driver_data = data; + dev_set_drvdata(&fdev->dev, data); +} + +static inline void *ffa_dev_get_drvdata(struct ffa_device *fdev) +{ + return dev_get_drvdata(&fdev->dev); } #if IS_REACHABLE(CONFIG_ARM_FFA_TRANSPORT) -- cgit From 892de36fd4a98fab3298d417c051d9099af5448d Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Fri, 29 Apr 2022 12:22:10 -0400 Subject: SUNRPC: Ensure gss-proxy connects on setup For reasons best known to the author, gss-proxy does not implement a NULL procedure, and returns RPC_PROC_UNAVAIL. However we still want to ensure that we connect to the service at setup time. So add a quirk-flag specially for this case. Fixes: 1d658336b05f ("SUNRPC: Add RPC based upcall mechanism for RPCGSS auth") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust --- include/linux/sunrpc/clnt.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h index 267b7aeaf1a6..db5149567305 100644 --- a/include/linux/sunrpc/clnt.h +++ b/include/linux/sunrpc/clnt.h @@ -160,6 +160,7 @@ struct rpc_add_xprt_test { #define RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT (1UL << 9) #define RPC_CLNT_CREATE_SOFTERR (1UL << 10) #define RPC_CLNT_CREATE_REUSEPORT (1UL << 11) +#define RPC_CLNT_CREATE_IGNORE_NULL_UNAVAIL (1UL << 12) struct rpc_clnt *rpc_create(struct rpc_create_args *args); struct rpc_clnt *rpc_bind_new_program(struct rpc_clnt *, -- cgit From ec2a0f9c8b50865a11720651c9c536f20c578def Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Fri, 29 Apr 2022 14:36:58 -0700 Subject: kasan: mark KASAN_VMALLOC flags as kasan_vmalloc_flags_t Fix sparse warning: mm/kasan/shadow.c:496:15: warning: restricted kasan_vmalloc_flags_t degrades to integer Link: https://lkml.kernel.org/r/52d8fccdd3a48d4bdfd0ff522553bac2a13f1579.1649351254.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Reported-by: kernel test robot Cc: Andrey Konovalov Cc: Marco Elver Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: Andrey Ryabinin Signed-off-by: Andrew Morton --- include/linux/kasan.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index ceebcb9de7bf..b092277bf48d 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -23,10 +23,10 @@ struct task_struct; typedef unsigned int __bitwise kasan_vmalloc_flags_t; -#define KASAN_VMALLOC_NONE 0x00u -#define KASAN_VMALLOC_INIT 0x01u -#define KASAN_VMALLOC_VM_ALLOC 0x02u -#define KASAN_VMALLOC_PROT_NORMAL 0x04u +#define KASAN_VMALLOC_NONE ((__force kasan_vmalloc_flags_t)0x00u) +#define KASAN_VMALLOC_INIT ((__force kasan_vmalloc_flags_t)0x01u) +#define KASAN_VMALLOC_VM_ALLOC ((__force kasan_vmalloc_flags_t)0x02u) +#define KASAN_VMALLOC_PROT_NORMAL ((__force kasan_vmalloc_flags_t)0x04u) #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) -- cgit From 5d8de293c224896a4da99763fce4f9794308caf4 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 29 Apr 2022 14:37:59 -0700 Subject: vmcore: convert copy_oldmem_page() to take an iov_iter Patch series "Convert vmcore to use an iov_iter", v5. For some reason several people have been sending bad patches to fix compiler warnings in vmcore recently. Here's how it should be done. Compile-tested only on x86. As noted in the first patch, s390 should take this conversion a bit further, but I'm not inclined to do that work myself. This patch (of 3): Instead of passing in a 'buf' and 'userbuf' argument, pass in an iov_iter. s390 needs more work to pass the iov_iter down further, or refactor, but I'd be more comfortable if someone who can test on s390 did that work. It's more convenient to convert the whole of read_from_oldmem() to take an iov_iter at the same time, so rename it to read_from_oldmem_iter() and add a temporary read_from_oldmem() wrapper that creates an iov_iter. Link: https://lkml.kernel.org/r/20220408090636.560886-1-bhe@redhat.com Link: https://lkml.kernel.org/r/20220408090636.560886-2-bhe@redhat.com Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Baoquan He Reviewed-by: Christoph Hellwig Cc: Heiko Carstens Signed-off-by: Andrew Morton --- include/linux/crash_dump.h | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h index 620821549b23..a1cf7d5c03c7 100644 --- a/include/linux/crash_dump.h +++ b/include/linux/crash_dump.h @@ -24,11 +24,10 @@ extern int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from, unsigned long pfn, unsigned long size, pgprot_t prot); -extern ssize_t copy_oldmem_page(unsigned long, char *, size_t, - unsigned long, int); -extern ssize_t copy_oldmem_page_encrypted(unsigned long pfn, char *buf, - size_t csize, unsigned long offset, - int userbuf); +ssize_t copy_oldmem_page(struct iov_iter *i, unsigned long pfn, size_t csize, + unsigned long offset); +ssize_t copy_oldmem_page_encrypted(struct iov_iter *iter, unsigned long pfn, + size_t csize, unsigned long offset); void vmcore_cleanup(void); -- cgit From e0690479917cbce740eef51fa3de92c69647a5ad Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 29 Apr 2022 14:37:59 -0700 Subject: vmcore: convert read_from_oldmem() to take an iov_iter Remove the read_from_oldmem() wrapper introduced earlier and convert all the remaining callers to pass an iov_iter. Link: https://lkml.kernel.org/r/20220408090636.560886-4-bhe@redhat.com Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Baoquan He Reviewed-by: Christoph Hellwig Cc: Tiezhu Yang Cc: Amit Daniel Kachhap Cc: Al Viro Cc: Matthew Wilcox Signed-off-by: Andrew Morton --- include/linux/crash_dump.h | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/crash_dump.h b/include/linux/crash_dump.h index a1cf7d5c03c7..0f3a656293b0 100644 --- a/include/linux/crash_dump.h +++ b/include/linux/crash_dump.h @@ -134,13 +134,11 @@ static inline int vmcore_add_device_dump(struct vmcoredd_data *data) #endif /* CONFIG_PROC_VMCORE_DEVICE_DUMP */ #ifdef CONFIG_PROC_VMCORE -ssize_t read_from_oldmem(char *buf, size_t count, - u64 *ppos, int userbuf, - bool encrypted); +ssize_t read_from_oldmem(struct iov_iter *iter, size_t count, + u64 *ppos, bool encrypted); #else -static inline ssize_t read_from_oldmem(char *buf, size_t count, - u64 *ppos, int userbuf, - bool encrypted) +static inline ssize_t read_from_oldmem(struct iov_iter *iter, size_t count, + u64 *ppos, bool encrypted) { return -EOPNOTSUPP; } -- cgit From f485922d8fe4e44f6d52a5bb95a603b7c65554bb Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 29 Apr 2022 14:38:01 -0700 Subject: pipe: make poll_usage boolean and annotate its access Patch series "Fix data-races around epoll reported by KCSAN." This series suppresses a false positive KCSAN's message and fixes a real data-race. This patch (of 2): pipe_poll() runs locklessly and assigns 1 to poll_usage. Once poll_usage is set to 1, it never changes in other places. However, concurrent writes of a value trigger KCSAN, so let's make KCSAN happy. BUG: KCSAN: data-race in pipe_poll / pipe_poll write to 0xffff8880042f6678 of 4 bytes by task 174 on cpu 3: pipe_poll (fs/pipe.c:656) ep_item_poll.isra.0 (./include/linux/poll.h:88 fs/eventpoll.c:853) do_epoll_wait (fs/eventpoll.c:1692 fs/eventpoll.c:1806 fs/eventpoll.c:2234) __x64_sys_epoll_wait (fs/eventpoll.c:2246 fs/eventpoll.c:2241 fs/eventpoll.c:2241) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) write to 0xffff8880042f6678 of 4 bytes by task 177 on cpu 1: pipe_poll (fs/pipe.c:656) ep_item_poll.isra.0 (./include/linux/poll.h:88 fs/eventpoll.c:853) do_epoll_wait (fs/eventpoll.c:1692 fs/eventpoll.c:1806 fs/eventpoll.c:2234) __x64_sys_epoll_wait (fs/eventpoll.c:2246 fs/eventpoll.c:2241 fs/eventpoll.c:2241) do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:113) Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 177 Comm: epoll_race Not tainted 5.17.0-58927-gf443e374ae13 #6 Hardware name: Red Hat KVM, BIOS 1.11.0-2.amzn2 04/01/2014 Link: https://lkml.kernel.org/r/20220322002653.33865-1-kuniyu@amazon.co.jp Link: https://lkml.kernel.org/r/20220322002653.33865-2-kuniyu@amazon.co.jp Fixes: 3b844826b6c6 ("pipe: avoid unnecessary EPOLLET wakeups under normal loads") Signed-off-by: Kuniyuki Iwashima Cc: Alexander Duyck Cc: Al Viro Cc: Davidlohr Bueso Cc: Kuniyuki Iwashima Cc: "Soheil Hassas Yeganeh" Cc: "Sridhar Samudrala" Signed-off-by: Andrew Morton --- include/linux/pipe_fs_i.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index c00c618ef290..cb0fd633a610 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -71,7 +71,7 @@ struct pipe_inode_info { unsigned int files; unsigned int r_counter; unsigned int w_counter; - unsigned int poll_usage; + bool poll_usage; struct page *tmp_page; struct fasync_struct *fasync_readers; struct fasync_struct *fasync_writers; -- cgit From d679ae94fdd5d3ab00c35078f5af5f37e068b03d Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Fri, 29 Apr 2022 14:38:01 -0700 Subject: list: fix a data-race around ep->rdllist ep_poll() first calls ep_events_available() with no lock held and checks if ep->rdllist is empty by list_empty_careful(), which reads rdllist->prev. Thus all accesses to it need some protection to avoid store/load-tearing. Note INIT_LIST_HEAD_RCU() already has the annotation for both prev and next. Commit bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.") added the first lockless ep_events_available(), and commit c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()") made some ep_events_available() calls lockless and added single call under a lock, finally commit e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout") made the last ep_events_available() lockless. BUG: KCSAN: data-race in do_epoll_wait / do_epoll_wait write to 0xffff88810480c7d8 of 8 bytes by task 1802 on cpu 0: INIT_LIST_HEAD include/linux/list.h:38 [inline] list_splice_init include/linux/list.h:492 [inline] ep_start_scan fs/eventpoll.c:622 [inline] ep_send_events fs/eventpoll.c:1656 [inline] ep_poll fs/eventpoll.c:1806 [inline] do_epoll_wait+0x4eb/0xf40 fs/eventpoll.c:2234 do_epoll_pwait fs/eventpoll.c:2268 [inline] __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline] __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275 __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae read to 0xffff88810480c7d8 of 8 bytes by task 1799 on cpu 1: list_empty_careful include/linux/list.h:329 [inline] ep_events_available fs/eventpoll.c:381 [inline] ep_poll fs/eventpoll.c:1797 [inline] do_epoll_wait+0x279/0xf40 fs/eventpoll.c:2234 do_epoll_pwait fs/eventpoll.c:2268 [inline] __do_sys_epoll_pwait fs/eventpoll.c:2281 [inline] __se_sys_epoll_pwait+0x12b/0x240 fs/eventpoll.c:2275 __x64_sys_epoll_pwait+0x74/0x80 fs/eventpoll.c:2275 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae value changed: 0xffff88810480c7d0 -> 0xffff888103c15098 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 1799 Comm: syz-fuzzer Tainted: G W 5.17.0-rc7-syzkaller-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Link: https://lkml.kernel.org/r/20220322002653.33865-3-kuniyu@amazon.co.jp Fixes: e59d3c64cba6 ("epoll: eliminate unnecessary lock for zero timeout") Fixes: c5a282e9635e ("fs/epoll: reduce the scope of wq lock in epoll_wait()") Fixes: bf3b9f6372c4 ("epoll: Add busy poll support to epoll with socket fds.") Signed-off-by: Kuniyuki Iwashima Reported-by: syzbot+bdd6e38a1ed5ee58d8bd@syzkaller.appspotmail.com Cc: Al Viro , Andrew Morton Cc: Kuniyuki Iwashima Cc: Kuniyuki Iwashima Cc: "Soheil Hassas Yeganeh" Cc: Davidlohr Bueso Cc: "Sridhar Samudrala" Cc: Alexander Duyck Signed-off-by: Andrew Morton --- include/linux/list.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/list.h b/include/linux/list.h index dd6c2041d09c..d7d2bfa1a365 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -35,7 +35,7 @@ static inline void INIT_LIST_HEAD(struct list_head *list) { WRITE_ONCE(list->next, list); - list->prev = list; + WRITE_ONCE(list->prev, list); } #ifdef CONFIG_DEBUG_LIST @@ -306,7 +306,7 @@ static inline int list_empty(const struct list_head *head) static inline void list_del_init_careful(struct list_head *entry) { __list_del_entry(entry); - entry->prev = entry; + WRITE_ONCE(entry->prev, entry); smp_store_release(&entry->next, entry); } @@ -326,7 +326,7 @@ static inline void list_del_init_careful(struct list_head *entry) static inline int list_empty_careful(const struct list_head *head) { struct list_head *next = smp_load_acquire(&head->next); - return list_is_head(next, head) && (next == head->prev); + return list_is_head(next, head) && (next == READ_ONCE(head->prev)); } /** -- cgit From a9866bef5171c859cfabc1155c594d28f194aa23 Mon Sep 17 00:00:00 2001 From: Tiezhu Yang Date: Fri, 29 Apr 2022 14:38:02 -0700 Subject: ptrace: fix wrong comment of PT_DTRACE PT_DTRACE is only used on um now, fix the wrong comment. Link: https://lkml.kernel.org/r/1649240981-11024-3-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Tiezhu Yang Cc: Oleg Nesterov Signed-off-by: Andrew Morton --- include/linux/ptrace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index 15b3d176b6b4..db4509587d2c 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -30,7 +30,7 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr, #define PT_SEIZED 0x00010000 /* SEIZE used, enable new behavior */ #define PT_PTRACED 0x00000001 -#define PT_DTRACE 0x00000002 /* delayed trace (used on m68k, i386) */ +#define PT_DTRACE 0x00000002 /* delayed trace (used on um) */ #define PT_OPT_FLAG_SHIFT 3 /* PT_TRACE_* event enable flags */ -- cgit From f94fd25cb0aaf77fd7453f31c5d394a1a68ecf60 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Thu, 28 Apr 2022 18:45:06 -0600 Subject: tcp: pass back data left in socket after receive This is currently done for CMSG_INQ, add an ability to do so via struct msghdr as well and have CMSG_INQ use that too. If the caller sets msghdr->msg_get_inq, then we'll pass back the hint in msghdr->msg_inq. Rearrange struct msghdr a bit so we can add this member while shrinking it at the same time. On a 64-bit build, it was 96 bytes before this change and 88 bytes afterwards. Reviewed-by: Eric Dumazet Signed-off-by: Jens Axboe Link: https://lore.kernel.org/r/650c22ca-cffc-0255-9a05-2413a1e20826@kernel.dk Signed-off-by: Jakub Kicinski --- include/linux/socket.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/socket.h b/include/linux/socket.h index 6f85f5d957ef..12085c9a8544 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -50,6 +50,9 @@ struct linger { struct msghdr { void *msg_name; /* ptr to socket address structure */ int msg_namelen; /* size of socket address structure */ + + int msg_inq; /* output, data left in socket */ + struct iov_iter msg_iter; /* data */ /* @@ -62,8 +65,9 @@ struct msghdr { void __user *msg_control_user; }; bool msg_control_is_user : 1; - __kernel_size_t msg_controllen; /* ancillary data buffer length */ + bool msg_get_inq : 1;/* return INQ after receive */ unsigned int msg_flags; /* flags on received message */ + __kernel_size_t msg_controllen; /* ancillary data buffer length */ struct kiocb *msg_iocb; /* ptr to iocb for async requests */ }; -- cgit From 657dd5f97b2ed16cdaa6339f42f9130240af1c04 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Thu, 28 Apr 2022 11:58:45 +0100 Subject: net: inline skb_zerocopy_iter_dgram skb_zerocopy_iter_dgram() is a small proxy function, inline it. For that, move __zerocopy_sg_from_iter into linux/skbuff.h Signed-off-by: Pavel Begunkov Signed-off-by: David S. Miller --- include/linux/skbuff.h | 36 ++++++++++++++++++++++-------------- 1 file changed, 22 insertions(+), 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index dc4b3e1cf21b..3270cb72e4d8 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -684,20 +684,6 @@ struct ubuf_info { int mm_account_pinned_pages(struct mmpin *mmp, size_t size); void mm_unaccount_pinned_pages(struct mmpin *mmp); -struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size); -struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size, - struct ubuf_info *uarg); - -void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref); - -void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg, - bool success); - -int skb_zerocopy_iter_dgram(struct sk_buff *skb, struct msghdr *msg, int len); -int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, - struct msghdr *msg, int len, - struct ubuf_info *uarg); - /* This data is invariant across clones and lives at * the end of the header data, ie. at skb->end. */ @@ -1679,6 +1665,28 @@ static inline void skb_set_end_offset(struct sk_buff *skb, unsigned int offset) } #endif +struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size); +struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size, + struct ubuf_info *uarg); + +void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref); + +void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg, + bool success); + +int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb, + struct iov_iter *from, size_t length); + +static inline int skb_zerocopy_iter_dgram(struct sk_buff *skb, + struct msghdr *msg, int len) +{ + return __zerocopy_sg_from_iter(skb->sk, skb, &msg->msg_iter, len); +} + +int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, + struct msghdr *msg, int len, + struct ubuf_info *uarg); + /* Internal */ #define skb_shinfo(SKB) ((struct skb_shared_info *)(skb_end_pointer(SKB))) -- cgit From c526fd8f9f4f21cb83c0b1c9a1ee9c0ac9be9e2e Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Thu, 28 Apr 2022 11:58:46 +0100 Subject: net: inline dev_queue_xmit() Inline dev_queue_xmit() and dev_queue_xmit_accel(), they both are small proxy functions doing nothing but redirecting the control flow to __dev_queue_xmit(). Signed-off-by: Pavel Begunkov Signed-off-by: David S. Miller --- include/linux/netdevice.h | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b75ca2d095ae..4aba92a4042a 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2940,10 +2940,20 @@ u16 dev_pick_tx_zero(struct net_device *dev, struct sk_buff *skb, u16 dev_pick_tx_cpu_id(struct net_device *dev, struct sk_buff *skb, struct net_device *sb_dev); -int dev_queue_xmit(struct sk_buff *skb); -int dev_queue_xmit_accel(struct sk_buff *skb, struct net_device *sb_dev); +int __dev_queue_xmit(struct sk_buff *skb, struct net_device *sb_dev); int __dev_direct_xmit(struct sk_buff *skb, u16 queue_id); +static inline int dev_queue_xmit(struct sk_buff *skb) +{ + return __dev_queue_xmit(skb, NULL); +} + +static inline int dev_queue_xmit_accel(struct sk_buff *skb, + struct net_device *sb_dev) +{ + return __dev_queue_xmit(skb, sb_dev); +} + static inline int dev_direct_xmit(struct sk_buff *skb, u16 queue_id) { int ret; -- cgit From 48cec73a891cca087fbc7791c4753784180991a9 Mon Sep 17 00:00:00 2001 From: Horatiu Vultur Date: Fri, 29 Apr 2022 09:19:53 +0200 Subject: net: lan966x: Fix compilation error MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Starting from the blamed commit, the lan966x build fails with the following compilation error: drivers/net/ethernet/microchip/lan966x/lan966x_ptp.c:342:9: error: implicit declaration of function ‘ptp_find_pin_unlocked’ [-Werror=implicit-function-declaration] 342 | pin = ptp_find_pin_unlocked(phc->clock, PTP_PF_EXTTS, 0); The issue is that there is no stub function for ptp_find_pin_unlocked in case CONFIG_PTP_1588_CLOCK is not selected. Therefore add one. Reported-by: kernel test robot Fixes: f3d8e0a9c28ba0 ("net: lan966x: Add support for PTP_PF_EXTTS") Signed-off-by: Horatiu Vultur Signed-off-by: David S. Miller --- include/linux/ptp_clock_kernel.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h index 554454cb8693..e8cc8b6bbf50 100644 --- a/include/linux/ptp_clock_kernel.h +++ b/include/linux/ptp_clock_kernel.h @@ -321,6 +321,10 @@ static inline int ptp_clock_index(struct ptp_clock *ptp) static inline int ptp_find_pin(struct ptp_clock *ptp, enum ptp_pin_function func, unsigned int chan) { return -1; } +static inline int ptp_find_pin_unlocked(struct ptp_clock *ptp, + enum ptp_pin_function func, + unsigned int chan) +{ return -1; } static inline int ptp_schedule_worker(struct ptp_clock *ptp, unsigned long delay) { return -EOPNOTSUPP; } -- cgit From e788be95a57a9bebe446878ce9bf2750f6fe4974 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Thu, 28 Apr 2022 17:25:16 -0600 Subject: task_work: allow TWA_SIGNAL without a rescheduling IPI Some use cases don't always need an IPI when sending a TWA_SIGNAL notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL, except it doesn't send an IPI to the target task. It merely sets TIF_NOTIFY_SIGNAL and wakes up the task. This can be useful in avoiding a forceful transition to the kernel if the task is running in userspace. Depending on the task_work in question, it may be quite fine waiting for the next reschedule or kernel enter anyway, or the use case may even have other mechanisms for hinting to the task that a transition may be useful. This can drive more cooperative scheduling of task_work. Reviewed-by: Pavel Begunkov Link: https://lore.kernel.org/r/821f42b6-7d91-8074-8212-d34998097de4@kernel.dk Signed-off-by: Jens Axboe --- include/linux/sched/signal.h | 13 +++++++++++-- include/linux/task_work.h | 1 + 2 files changed, 12 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 3c8b34876744..66b689f6cfcb 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -355,14 +355,23 @@ static inline void clear_notify_signal(void) smp_mb__after_atomic(); } +/* + * Returns 'true' if kick_process() is needed to force a transition from + * user -> kernel to guarantee expedient run of TWA_SIGNAL based task_work. + */ +static inline bool __set_notify_signal(struct task_struct *task) +{ + return !test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) && + !wake_up_state(task, TASK_INTERRUPTIBLE); +} + /* * Called to break out of interruptible wait loops, and enter the * exit_to_user_mode_loop(). */ static inline void set_notify_signal(struct task_struct *task) { - if (!test_and_set_tsk_thread_flag(task, TIF_NOTIFY_SIGNAL) && - !wake_up_state(task, TASK_INTERRUPTIBLE)) + if (__set_notify_signal(task)) kick_process(task); } diff --git a/include/linux/task_work.h b/include/linux/task_work.h index 897494b597ba..795ef5a68429 100644 --- a/include/linux/task_work.h +++ b/include/linux/task_work.h @@ -17,6 +17,7 @@ enum task_work_notify_mode { TWA_NONE, TWA_RESUME, TWA_SIGNAL, + TWA_SIGNAL_NO_IPI, }; static inline bool task_work_pending(struct task_struct *task) -- cgit From d664e399128bd78b905ff480917e2c2d4949e101 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Wed, 13 Apr 2022 15:31:02 +0200 Subject: sched: Fix missing prototype warnings A W=1 build emits more than a dozen missing prototype warnings related to scheduler and scheduler specific includes. Reported-by: kernel test robot Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220413133024.249118058@linutronix.de --- include/linux/sched.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index fc74ea2578b7..a27316f5f737 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2388,4 +2388,6 @@ static inline void sched_core_free(struct task_struct *tsk) { } static inline void sched_core_fork(struct task_struct *p) { } #endif +extern void sched_set_stop_task(int cpu, struct task_struct *stop); + #endif -- cgit From 1a90bfd220201fbe050dfc15deaac20ca5f15638 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Wed, 13 Apr 2022 15:31:05 +0200 Subject: smp: Make softirq handling RT safe in flush_smp_call_function_queue() flush_smp_call_function_queue() invokes do_softirq() which is not available on PREEMPT_RT. flush_smp_call_function_queue() is invoked from the idle task and the migration task with preemption or interrupts disabled. So RT kernels cannot process soft interrupts in that context as that has to acquire 'sleeping spinlocks' which is not possible with preemption or interrupts disabled and forbidden from the idle task anyway. The currently known SMP function call which raises a soft interrupt is in the block layer, but this functionality is not enabled on RT kernels due to latency and performance reasons. RT could wake up ksoftirqd unconditionally, but this wants to be avoided if there were soft interrupts pending already when this is invoked in the context of the migration task. The migration task might have preempted a threaded interrupt handler which raised a soft interrupt, but did not reach the local_bh_enable() to process it. The "running" ksoftirqd might prevent the handling in the interrupt thread context which is causing latency issues. Add a new function which handles this case explicitely for RT and falls back to do_softirq() on !RT kernels. In the RT case this warns when one of the flushed SMP function calls raised a soft interrupt so this can be investigated. [ tglx: Moved the RT part out of SMP code ] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/YgKgL6aPj8aBES6G@linutronix.de Link: https://lore.kernel.org/r/20220413133024.356509586@linutronix.de --- include/linux/interrupt.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index f40754caaefa..a49fe8d88676 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -607,6 +607,15 @@ struct softirq_action asmlinkage void do_softirq(void); asmlinkage void __do_softirq(void); +#ifdef CONFIG_PREEMPT_RT +extern void do_softirq_post_smp_call_flush(unsigned int was_pending); +#else +static inline void do_softirq_post_smp_call_flush(unsigned int unused) +{ + do_softirq(); +} +#endif + extern void open_softirq(int nr, void (*action)(struct softirq_action *)); extern void softirq_init(void); extern void __raise_softirq_irqoff(unsigned int nr); -- cgit From 47f753c1108e287edb3e27fad8a7511a9d55578e Mon Sep 17 00:00:00 2001 From: Tan Tee Min Date: Fri, 29 Apr 2022 19:58:07 +0800 Subject: net: stmmac: disable Split Header (SPH) for Intel platforms Based on DesignWare Ethernet QoS datasheet, we are seeing the limitation of Split Header (SPH) feature is not supported for Ipv4 fragmented packet. This SPH limitation will cause ping failure when the packets size exceed the MTU size. For example, the issue happens once the basic ping packet size is larger than the configured MTU size and the data is lost inside the fragmented packet, replaced by zeros/corrupted values, and leads to ping fail. So, disable the Split Header for Intel platforms. v2: Add fixes tag in commit message. Fixes: 67afd6d1cfdf("net: stmmac: Add Split Header support and enable it in XGMAC cores") Cc: # 5.10.x Suggested-by: Ong, Boon Leong Signed-off-by: Mohammad Athari Bin Ismail Signed-off-by: Wong Vee Khee Signed-off-by: Tan Tee Min Signed-off-by: David S. Miller --- include/linux/stmmac.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 24eea1b05ca2..29917850f079 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -270,5 +270,6 @@ struct plat_stmmacenet_data { int msi_rx_base_vec; int msi_tx_base_vec; bool use_phy_wol; + bool sph_disable; }; #endif -- cgit From 9bd1d9a0d8bb1a549831fd98fcc3105960f7068b Mon Sep 17 00:00:00 2001 From: Sven Peter Date: Sun, 1 May 2022 16:55:06 +0200 Subject: soc: apple: Add RTKit IPC library Apple SoCs such as the M1 come with multiple embedded co-processors running proprietary firmware. Communication with those is established over a simple mailbox using the RTKit IPC protocol. This cannot be implemented inside the mailbox subsystem since on top of communication over channels we also need support for starting, hibernating and resetting these co-processors. We also need to handle shared memory allocations differently depending on the co-processor and don't want to split that across multiple drivers. Reviewed-by: Arnd Bergmann Signed-off-by: Sven Peter --- include/linux/soc/apple/rtkit.h | 155 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 155 insertions(+) create mode 100644 include/linux/soc/apple/rtkit.h (limited to 'include/linux') diff --git a/include/linux/soc/apple/rtkit.h b/include/linux/soc/apple/rtkit.h new file mode 100644 index 000000000000..88eb832eac7b --- /dev/null +++ b/include/linux/soc/apple/rtkit.h @@ -0,0 +1,155 @@ +/* SPDX-License-Identifier: GPL-2.0-only OR MIT */ +/* + * Apple RTKit IPC Library + * Copyright (C) The Asahi Linux Contributors + * + * Apple's SoCs come with various co-processors running their RTKit operating + * system. This protocol library is used by client drivers to use the + * features provided by them. + */ +#ifndef _LINUX_APPLE_RTKIT_H_ +#define _LINUX_APPLE_RTKIT_H_ + +#include +#include +#include + +/* + * Struct to represent implementation-specific RTKit operations. + * + * @buffer: Shared memory buffer allocated inside normal RAM. + * @iomem: Shared memory buffer controlled by the co-processors. + * @size: Size of the shared memory buffer. + * @iova: Device VA of shared memory buffer. + * @is_mapped: Shared memory buffer is managed by the co-processor. + */ + +struct apple_rtkit_shmem { + void *buffer; + void __iomem *iomem; + size_t size; + dma_addr_t iova; + bool is_mapped; +}; + +/* + * Struct to represent implementation-specific RTKit operations. + * + * @crashed: Called when the co-processor has crashed. Runs in process + * context. + * @recv_message: Function called when a message from RTKit is received + * on a non-system endpoint. Called from a worker thread. + * @recv_message_early: + * Like recv_message, but called from atomic context. It + * should return true if it handled the message. If it + * returns false, the message will be passed on to the + * worker thread. + * @shmem_setup: Setup shared memory buffer. If bfr.is_iomem is true the + * buffer is managed by the co-processor and needs to be mapped. + * Otherwise the buffer is managed by Linux and needs to be + * allocated. If not specified dma_alloc_coherent is used. + * Called in process context. + * @shmem_destroy: Undo the shared memory buffer setup in shmem_setup. If not + * specified dma_free_coherent is used. Called in process + * context. + */ +struct apple_rtkit_ops { + void (*crashed)(void *cookie); + void (*recv_message)(void *cookie, u8 endpoint, u64 message); + bool (*recv_message_early)(void *cookie, u8 endpoint, u64 message); + int (*shmem_setup)(void *cookie, struct apple_rtkit_shmem *bfr); + void (*shmem_destroy)(void *cookie, struct apple_rtkit_shmem *bfr); +}; + +struct apple_rtkit; + +/* + * Initializes the internal state required to handle RTKit. This + * should usually be called within _probe. + * + * @dev: Pointer to the device node this coprocessor is assocated with + * @cookie: opaque cookie passed to all functions defined in rtkit_ops + * @mbox_name: mailbox name used to communicate with the co-processor + * @mbox_idx: mailbox index to be used if mbox_name is NULL + * @ops: pointer to rtkit_ops to be used for this co-processor + */ +struct apple_rtkit *devm_apple_rtkit_init(struct device *dev, void *cookie, + const char *mbox_name, int mbox_idx, + const struct apple_rtkit_ops *ops); + +/* + * Reinitialize internal structures. Must only be called with the co-processor + * is held in reset. + */ +int apple_rtkit_reinit(struct apple_rtkit *rtk); + +/* + * Handle RTKit's boot process. Should be called after the CPU of the + * co-processor has been started. + */ +int apple_rtkit_boot(struct apple_rtkit *rtk); + +/* + * Quiesce the co-processor. + */ +int apple_rtkit_quiesce(struct apple_rtkit *rtk); + +/* + * Wake the co-processor up from hibernation mode. + */ +int apple_rtkit_wake(struct apple_rtkit *rtk); + +/* + * Shutdown the co-processor + */ +int apple_rtkit_shutdown(struct apple_rtkit *rtk); + +/* + * Checks if RTKit is running and ready to handle messages. + */ +bool apple_rtkit_is_running(struct apple_rtkit *rtk); + +/* + * Checks if RTKit has crashed. + */ +bool apple_rtkit_is_crashed(struct apple_rtkit *rtk); + +/* + * Starts an endpoint. Must be called after boot but before any messages can be + * sent or received from that endpoint. + */ +int apple_rtkit_start_ep(struct apple_rtkit *rtk, u8 endpoint); + +/* + * Send a message to the given endpoint. + * + * @rtk: RTKit reference + * @ep: target endpoint + * @message: message to be sent + * @completeion: will be completed once the message has been submitted + * to the hardware FIFO. Can be NULL. + * @atomic: if set to true this function can be called from atomic + * context. + */ +int apple_rtkit_send_message(struct apple_rtkit *rtk, u8 ep, u64 message, + struct completion *completion, bool atomic); + +/* + * Send a message to the given endpoint and wait until it has been submitted + * to the hardware FIFO. + * Will return zero on success and a negative error code on failure + * (e.g. -ETIME when the message couldn't be written within the given + * timeout) + * + * @rtk: RTKit reference + * @ep: target endpoint + * @message: message to be sent + * @timeout: timeout in milliseconds to allow the message transmission + * to be completed + * @atomic: if set to true this function can be called from atomic + * context. + */ +int apple_rtkit_send_message_wait(struct apple_rtkit *rtk, u8 ep, u64 message, + unsigned long timeout, bool atomic); + +#endif /* _LINUX_APPLE_RTKIT_H_ */ -- cgit From 3254e0b9eb5649ffaa48717ebc9c593adc4ee6a9 Mon Sep 17 00:00:00 2001 From: Alexandru Tachici Date: Fri, 29 Apr 2022 18:34:31 +0300 Subject: ethtool: Add 10base-T1L link mode entry Add entry for the 10base-T1L full duplex mode. Reviewed-by: Andrew Lunn Reviewed-by: Oleksij Rempel Signed-off-by: Alexandru Tachici Signed-off-by: David S. Miller --- include/linux/phy.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 36ca2b5c2253..b12af9e2f389 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -65,7 +65,7 @@ extern const int phy_basic_ports_array[3]; extern const int phy_fibre_port_array[1]; extern const int phy_all_ports_features_array[7]; extern const int phy_10_100_features_array[4]; -extern const int phy_basic_t1_features_array[2]; +extern const int phy_basic_t1_features_array[3]; extern const int phy_gbit_features_array[2]; extern const int phy_10gbit_features_array[1]; -- cgit From 3da8ffd8545f62fec85a48a3c637b2f427974f11 Mon Sep 17 00:00:00 2001 From: Alexandru Tachici Date: Fri, 29 Apr 2022 18:34:34 +0300 Subject: net: phy: Add 10BASE-T1L support in phy-c45 This patch is needed because the BASE-T1 uses different registers for status, control and advertisement to those already employed in the existing phy-c45 functions. Where required, genphy_c45 functions will now check whether the device supports BASE-T1 and use the specific registers instead: 45.2.7.19 BASE-T1 AN control register, 45.2.7.20 BASE-T1 AN status, 45.2.7.21 BASE-T1 AN advertisement register, 45.2.7.22 BASE-T1 AN LP Base Page ability register, 45.2.1.185 BASE-T1 PMA/PMD control register. Tested-by: Oleksij Rempel Signed-off-by: Alexandru Tachici Signed-off-by: David S. Miller --- include/linux/mdio.h | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/phy.h | 3 +++ 2 files changed, 73 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mdio.h b/include/linux/mdio.h index ecac96d52e01..00177567cfef 100644 --- a/include/linux/mdio.h +++ b/include/linux/mdio.h @@ -340,6 +340,76 @@ static inline void mii_10gbt_stat_mod_linkmode_lpa_t(unsigned long *advertising, advertising, lpa & MDIO_AN_10GBT_STAT_LP10G); } +/** + * mii_t1_adv_l_mod_linkmode_t + * @advertising: target the linkmode advertisement settings + * @lpa: value of the BASE-T1 Autonegotiation Advertisement [15:0] Register + * + * A small helper function that translates BASE-T1 Autonegotiation + * Advertisement [15:0] Register bits to linkmode advertisement settings. + * Other bits in advertising aren't changed. + */ +static inline void mii_t1_adv_l_mod_linkmode_t(unsigned long *advertising, u32 lpa) +{ + linkmode_mod_bit(ETHTOOL_LINK_MODE_Pause_BIT, advertising, + lpa & MDIO_AN_T1_ADV_L_PAUSE_CAP); + linkmode_mod_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, advertising, + lpa & MDIO_AN_T1_ADV_L_PAUSE_ASYM); +} + +/** + * mii_t1_adv_m_mod_linkmode_t + * @advertising: target the linkmode advertisement settings + * @lpa: value of the BASE-T1 Autonegotiation Advertisement [31:16] Register + * + * A small helper function that translates BASE-T1 Autonegotiation + * Advertisement [31:16] Register bits to linkmode advertisement settings. + * Other bits in advertising aren't changed. + */ +static inline void mii_t1_adv_m_mod_linkmode_t(unsigned long *advertising, u32 lpa) +{ + linkmode_mod_bit(ETHTOOL_LINK_MODE_10baseT1L_Full_BIT, + advertising, lpa & MDIO_AN_T1_ADV_M_B10L); +} + +/** + * linkmode_adv_to_mii_t1_adv_l_t + * @advertising: the linkmode advertisement settings + * + * A small helper function that translates linkmode advertisement + * settings to phy autonegotiation advertisements for the + * BASE-T1 Autonegotiation Advertisement [15:0] Register. + */ +static inline u32 linkmode_adv_to_mii_t1_adv_l_t(unsigned long *advertising) +{ + u32 result = 0; + + if (linkmode_test_bit(ETHTOOL_LINK_MODE_Pause_BIT, advertising)) + result |= MDIO_AN_T1_ADV_L_PAUSE_CAP; + if (linkmode_test_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, advertising)) + result |= MDIO_AN_T1_ADV_L_PAUSE_ASYM; + + return result; +} + +/** + * linkmode_adv_to_mii_t1_adv_m_t + * @advertising: the linkmode advertisement settings + * + * A small helper function that translates linkmode advertisement + * settings to phy autonegotiation advertisements for the + * BASE-T1 Autonegotiation Advertisement [31:16] Register. + */ +static inline u32 linkmode_adv_to_mii_t1_adv_m_t(unsigned long *advertising) +{ + u32 result = 0; + + if (linkmode_test_bit(ETHTOOL_LINK_MODE_10baseT1L_Full_BIT, advertising)) + result |= MDIO_AN_T1_ADV_M_B10L; + + return result; +} + int __mdiobus_read(struct mii_bus *bus, int addr, u32 regnum); int __mdiobus_write(struct mii_bus *bus, int addr, u32 regnum, u16 val); int __mdiobus_modify_changed(struct mii_bus *bus, int addr, u32 regnum, diff --git a/include/linux/phy.h b/include/linux/phy.h index b12af9e2f389..2d12054932ba 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -570,6 +570,7 @@ struct macsec_ops; * @autoneg_complete: Flag auto negotiation of the link has completed * @mdix: Current crossover * @mdix_ctrl: User setting of crossover + * @pma_extable: Cached value of PMA/PMD Extended Abilities Register * @interrupts: Flag interrupts have been enabled * @interface: enum phy_interface_t value * @skb: Netlink message for cable diagnostics @@ -698,6 +699,8 @@ struct phy_device { u8 mdix; u8 mdix_ctrl; + int pma_extable; + void (*phy_link_change)(struct phy_device *phydev, bool up); void (*adjust_link)(struct net_device *dev); -- cgit From 246d921646c071b878480997c294db6c83215b06 Mon Sep 17 00:00:00 2001 From: Mimi Zohar Date: Tue, 23 Nov 2021 13:37:52 -0500 Subject: fs-verity: define a function to return the integrity protected file digest Define a function named fsverity_get_digest() to return the verity file digest and the associated hash algorithm (enum hash_algo). This assumes that before calling fsverity_get_digest() the file must have been opened, which is even true for the IMA measure/appraise on file open policy rule use case (func=FILE_CHECK). do_open() calls vfs_open() immediately prior to ima_file_check(). Acked-by: Eric Biggers Signed-off-by: Mimi Zohar --- include/linux/fsverity.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h index a7afc800bd8d..7af030fa3c36 100644 --- a/include/linux/fsverity.h +++ b/include/linux/fsverity.h @@ -12,8 +12,16 @@ #define _LINUX_FSVERITY_H #include +#include +#include #include +/* + * Largest digest size among all hash algorithms supported by fs-verity. + * Currently assumed to be <= size of fsverity_descriptor::root_hash. + */ +#define FS_VERITY_MAX_DIGEST_SIZE SHA512_DIGEST_SIZE + /* Verity operations for filesystems */ struct fsverity_operations { @@ -131,6 +139,9 @@ int fsverity_ioctl_enable(struct file *filp, const void __user *arg); /* measure.c */ int fsverity_ioctl_measure(struct file *filp, void __user *arg); +int fsverity_get_digest(struct inode *inode, + u8 digest[FS_VERITY_MAX_DIGEST_SIZE], + enum hash_algo *alg); /* open.c */ @@ -170,6 +181,13 @@ static inline int fsverity_ioctl_measure(struct file *filp, void __user *arg) return -EOPNOTSUPP; } +static inline int fsverity_get_digest(struct inode *inode, + u8 digest[FS_VERITY_MAX_DIGEST_SIZE], + enum hash_algo *alg) +{ + return -EOPNOTSUPP; +} + /* open.c */ static inline int fsverity_file_open(struct inode *inode, struct file *filp) -- cgit From 8cf9e1210adf49cd83d11e2b1984540cf0e0be71 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 28 Apr 2022 17:59:39 +0200 Subject: mm: slab: fix comment for ARCH_KMALLOC_MINALIGN The comment next to the ARCH_KMALLOC_MINALIGN definition says that ARCH_KMALLOC_MINALIGN can be defined in arch headers. This is incorrect: it's actually ARCH_DMA_MINALIGN that can be defined there. Fix the comment. Signed-off-by: Andrey Konovalov Acked-by: David Rientjes Signed-off-by: Vlastimil Babka Link: https://lore.kernel.org/r/fe1ca7a25a054b61d1038686d07569416e287e7b.1651161548.git.andreyknvl@google.com --- include/linux/slab.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 11ceddcae9f4..8c090599fc21 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -197,7 +197,7 @@ void kmem_dump_obj(void *object); /* * Some archs want to perform DMA into kmalloc caches and need a guaranteed * alignment larger than the alignment of a 64-bit integer. - * Setting ARCH_KMALLOC_MINALIGN in arch headers allows that. + * Setting ARCH_DMA_MINALIGN in arch headers allows that. */ #if defined(ARCH_DMA_MINALIGN) && ARCH_DMA_MINALIGN > 8 #define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN -- cgit From 154036a3b3f39cb17fc65511011f4c4024846e91 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 28 Apr 2022 17:59:40 +0200 Subject: mm: slab: fix comment for __assume_kmalloc_alignment The comment next to the __assume_kmalloc_alignment definition is not precise: kmalloc relies on kmem_cache_alloc, so kmalloc technically returns pointers aligned to both ARCH_KMALLOC_MINALIGN and ARCH_SLAB_MINALIGN, not only to ARCH_KMALLOC_MINALIGN. (See create_kmalloc_cache()->create_boot_cache()->calculate_alignment() for SLAB and SLUB and __do_kmalloc_node() for SLOB.) Clarify the comment. The assumption specified by __assume_kmalloc_alignment is still correct, although it can be made stronger. I'll leave this to a separate patch. Signed-off-by: Andrey Konovalov Acked-by: David Rientjes Signed-off-by: Vlastimil Babka Link: https://lore.kernel.org/r/84d8142747230f2015eaf9705ee7c2e1a9f56596.1651161548.git.andreyknvl@google.com --- include/linux/slab.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 8c090599fc21..58bb9392775d 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -217,9 +217,9 @@ void kmem_dump_obj(void *object); #endif /* - * kmalloc and friends return ARCH_KMALLOC_MINALIGN aligned - * pointers. kmem_cache_alloc and friends return ARCH_SLAB_MINALIGN - * aligned pointers. + * kmem_cache_alloc and friends return pointers aligned to ARCH_SLAB_MINALIGN. + * kmalloc and friends return pointers aligned to both ARCH_KMALLOC_MINALIGN + * and ARCH_SLAB_MINALIGN, but here we only assume the former alignment. */ #define __assume_kmalloc_alignment __assume_aligned(ARCH_KMALLOC_MINALIGN) #define __assume_slab_alignment __assume_aligned(ARCH_SLAB_MINALIGN) -- cgit From 23587f7c5daa936524ede26201253a089666b5d9 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Fri, 29 Apr 2022 17:05:45 +0800 Subject: mm/slub: remove unused kmem_cache_order_objects max max field holds the largest slab order that was ever used for a slab cache. But it's unused now. Remove it. Signed-off-by: Miaohe Lin Reviewed-by: Muchun Song Acked-by: David Rientjes Signed-off-by: Vlastimil Babka Link: https://lore.kernel.org/r/20220429090545.33413-1-linmiaohe@huawei.com --- include/linux/slub_def.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index 33c5c0e3bd8d..f9c68a9dac04 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -105,7 +105,6 @@ struct kmem_cache { struct kmem_cache_order_objects oo; /* Allocation and freeing of slabs */ - struct kmem_cache_order_objects max; struct kmem_cache_order_objects min; gfp_t allocflags; /* gfp flags to use on each alloc */ int refcount; /* Refcount for slab cache destroy */ -- cgit From 94f697c5384bd7f9632acca483ba1ef9dd99ea97 Mon Sep 17 00:00:00 2001 From: Michael Walle Date: Fri, 29 Apr 2022 12:01:53 +0200 Subject: mtd: spi-nor: move spi_nor_write_ear() to winbond module The "Extended Address Register" is winbond specific. If the flash is larger than 16MiB and is used in 3 byte address mode, it is used to set the remaining address bits. Move the write_ear() function, the opcode macros and the spimem op template into the winbond module and rename them accordingly. Signed-off-by: Michael Walle Signed-off-by: Pratyush Yadav Reviewed-by: Pratyush Yadav Link: https://lore.kernel.org/r/20220429100153.2338501-1-michael@walle.cc --- include/linux/mtd/spi-nor.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mtd/spi-nor.h b/include/linux/mtd/spi-nor.h index 5e25a7b75ae2..e505c4a5c530 100644 --- a/include/linux/mtd/spi-nor.h +++ b/include/linux/mtd/spi-nor.h @@ -47,8 +47,6 @@ #define SPINOR_OP_RDID 0x9f /* Read JEDEC ID */ #define SPINOR_OP_RDSFDP 0x5a /* Read SFDP */ #define SPINOR_OP_RDCR 0x35 /* Read configuration register */ -#define SPINOR_OP_RDEAR 0xc8 /* Read Extended Address Register */ -#define SPINOR_OP_WREAR 0xc5 /* Write Extended Address Register */ #define SPINOR_OP_SRSTEN 0x66 /* Software Reset Enable */ #define SPINOR_OP_SRST 0x99 /* Software Reset */ #define SPINOR_OP_GBULK 0x98 /* Global Block Unlock */ -- cgit From b170143ae1113882731666aec9b9105356f1fc17 Mon Sep 17 00:00:00 2001 From: Sven Peter Date: Sun, 1 May 2022 16:55:08 +0200 Subject: soc: apple: Add SART driver The NVMe co-processor on the Apple M1 uses a DMA address filter called SART for some DMA transactions. This adds a simple driver used to configure the memory regions from which DMA transactions are allowed. Unlike a real IOMMU, SART does not support any pagetables and can't be implemented inside the IOMMU subsystem using iommu_ops. It also can't be implemented using dma_map_ops since not all DMA transactions of the NVMe controller are filtered by SART. Instead, most buffers have to be registered using the integrated NVMe IOMMU and we can't have two separate dma_map_ops implementations for a single device. Co-developed-by: Hector Martin Reviewed-by: Arnd Bergmann Signed-off-by: Hector Martin Signed-off-by: Sven Peter --- include/linux/soc/apple/sart.h | 53 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+) create mode 100644 include/linux/soc/apple/sart.h (limited to 'include/linux') diff --git a/include/linux/soc/apple/sart.h b/include/linux/soc/apple/sart.h new file mode 100644 index 000000000000..2249bf6cde09 --- /dev/null +++ b/include/linux/soc/apple/sart.h @@ -0,0 +1,53 @@ +/* SPDX-License-Identifier: GPL-2.0-only OR MIT */ +/* + * Apple SART device driver + * Copyright (C) The Asahi Linux Contributors + * + * Apple SART is a simple address filter for DMA transactions. + * Regions of physical memory must be added to the SART's allow + * list before any DMA can target these. Unlike a proper + * IOMMU no remapping can be done. + */ + +#ifndef _LINUX_SOC_APPLE_SART_H_ +#define _LINUX_SOC_APPLE_SART_H_ + +#include +#include +#include + +struct apple_sart; + +/* + * Get a reference to the SART attached to dev. + * + * Looks for the phandle reference in apple,sart and returns a pointer + * to the corresponding apple_sart struct to be used with + * apple_sart_add_allowed_region and apple_sart_remove_allowed_region. + */ +struct apple_sart *devm_apple_sart_get(struct device *dev); + +/* + * Adds the region [paddr, paddr+size] to the DMA allow list. + * + * @sart: SART reference + * @paddr: Start address of the region to be used for DMA + * @size: Size of the region to be used for DMA. + */ +int apple_sart_add_allowed_region(struct apple_sart *sart, phys_addr_t paddr, + size_t size); + +/* + * Removes the region [paddr, paddr+size] from the DMA allow list. + * + * Note that exact same paddr and size used for apple_sart_add_allowed_region + * have to be passed. + * + * @sart: SART reference + * @paddr: Start address of the region no longer used for DMA + * @size: Size of the region no longer used for DMA. + */ +int apple_sart_remove_allowed_region(struct apple_sart *sart, phys_addr_t paddr, + size_t size); + +#endif /* _LINUX_SOC_APPLE_SART_H_ */ -- cgit From f502cc568de95e5ed9cc9e6133fa454fbe0c5c01 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 4 Mar 2022 11:48:38 -0800 Subject: KVM: Add max_vcpus field in common 'struct kvm' For TDX guests, the maximum number of vcpus needs to be specified when the TDX guest VM is initialized (creating the TDX data corresponding to TDX guest) before creating vcpu. It needs to record the maximum number of vcpus on VM creation (KVM_CREATE_VM) and return error if the number of vcpus exceeds it Because there is already max_vcpu member in arm64 struct kvm_arch, move it to common struct kvm and initialize it to KVM_MAX_VCPUS before kvm_arch_init_vm() instead of adding it to x86 struct kvm_arch. Signed-off-by: Sean Christopherson Signed-off-by: Isaku Yamahata Message-Id: Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 252ee4a61b58..f94f72bbd2d3 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -725,6 +725,7 @@ struct kvm { * and is accessed atomically. */ atomic_t online_vcpus; + int max_vcpus; int created_vcpus; int last_boosted_vcpu; struct list_head vm_list; -- cgit From db05628435aa761d30b4eae481a82befe7a8492a Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 20 Apr 2022 06:27:12 +0200 Subject: blk-cgroup: move blkcg_{get,set}_fc_appid out of line No need to have these helpers inline. Also remove the stubs and just use an IS_ENABLED for the get side (the set side already is only built conditionlly). Signed-off-by: Christoph Hellwig Acked-by: Tejun Heo Link: https://lore.kernel.org/r/20220420042723.1010598-5-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blk-cgroup.h | 58 ++-------------------------------------------- 1 file changed, 2 insertions(+), 56 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index 652cd05b0924..7a2f7de30173 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -218,61 +218,7 @@ static inline struct blkcg *bio_blkcg(struct bio *bio) { return NULL; } #endif /* CONFIG_BLK_CGROUP */ -#ifdef CONFIG_BLK_CGROUP_FC_APPID -/* - * Sets the fc_app_id field associted to blkcg - * @app_id: application identifier - * @cgrp_id: cgroup id - * @app_id_len: size of application identifier - */ -static inline int blkcg_set_fc_appid(char *app_id, u64 cgrp_id, size_t app_id_len) -{ - struct cgroup *cgrp; - struct cgroup_subsys_state *css; - struct blkcg *blkcg; - int ret = 0; - - if (app_id_len > FC_APPID_LEN) - return -EINVAL; - - cgrp = cgroup_get_from_id(cgrp_id); - if (!cgrp) - return -ENOENT; - css = cgroup_get_e_css(cgrp, &io_cgrp_subsys); - if (!css) { - ret = -ENOENT; - goto out_cgrp_put; - } - blkcg = css_to_blkcg(css); - /* - * There is a slight race condition on setting the appid. - * Worst case an I/O may not find the right id. - * This is no different from the I/O we let pass while obtaining - * the vmid from the fabric. - * Adding the overhead of a lock is not necessary. - */ - strlcpy(blkcg->fc_app_id, app_id, app_id_len); - css_put(css); -out_cgrp_put: - cgroup_put(cgrp); - return ret; -} +int blkcg_set_fc_appid(char *app_id, u64 cgrp_id, size_t app_id_len); +char *blkcg_get_fc_appid(struct bio *bio); -/** - * blkcg_get_fc_appid - get the fc app identifier associated with a bio - * @bio: target bio - * - * On success return the fc_app_id, on failure return NULL - */ -static inline char *blkcg_get_fc_appid(struct bio *bio) -{ - if (bio && bio->bi_blkg && - (bio->bi_blkg->blkcg->fc_app_id[0] != '\0')) - return bio->bi_blkg->blkcg->fc_app_id; - return NULL; -} -#else -static inline int blkcg_set_fc_appid(char *buf, u64 id, size_t len) { return -EINVAL; } -static inline char *blkcg_get_fc_appid(struct bio *bio) { return NULL; } -#endif /*CONFIG_BLK_CGROUP_FC_APPID*/ #endif /* _BLK_CGROUP_H */ -- cgit From 216889aad362b5b7e998a5371348b5e95d485dd1 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 20 Apr 2022 06:27:13 +0200 Subject: blk-cgroup: move blk_cgroup_congested out line There is no urgent need to inline this function, so move it out of line. Signed-off-by: Christoph Hellwig Acked-by: Tejun Heo Link: https://lore.kernel.org/r/20220420042723.1010598-6-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blk-cgroup.h | 20 +------------------- 1 file changed, 1 insertion(+), 19 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index 7a2f7de30173..988965c1c27b 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -135,25 +135,7 @@ static inline struct blkcg *bio_blkcg(struct bio *bio) return NULL; } -static inline bool blk_cgroup_congested(void) -{ - struct cgroup_subsys_state *css; - bool ret = false; - - rcu_read_lock(); - css = kthread_blkcg(); - if (!css) - css = task_css(current, io_cgrp_id); - while (css) { - if (atomic_read(&css->cgroup->congestion_count)) { - ret = true; - break; - } - css = css->parent; - } - rcu_read_unlock(); - return ret; -} +bool blk_cgroup_congested(void); /** * blkcg_parent - get the parent of a blkcg -- cgit From 397c9f46ee4d99024c64954b007c1b5762d01cb4 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 20 Apr 2022 06:27:14 +0200 Subject: blk-cgroup: move blkcg_{pin,unpin}_online out of line Move these two functions out of line as there is no good reason to inline them. Also switch to passing a cgroup_subsys_state instead of doing the conversion in the caller to prepare for making the blkcg structure private to blk-cgroup. Signed-off-by: Christoph Hellwig Acked-by: Tejun Heo Link: https://lore.kernel.org/r/20220420042723.1010598-7-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blk-cgroup.h | 46 ++-------------------------------------------- 1 file changed, 2 insertions(+), 44 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index 988965c1c27b..0fb7459096e9 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -111,7 +111,6 @@ struct blkcg_gq { extern struct cgroup_subsys_state * const blkcg_root_css; -void blkcg_destroy_blkgs(struct blkcg *blkcg); void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay); void blkcg_maybe_throttle_current(void); @@ -136,49 +135,8 @@ static inline struct blkcg *bio_blkcg(struct bio *bio) } bool blk_cgroup_congested(void); - -/** - * blkcg_parent - get the parent of a blkcg - * @blkcg: blkcg of interest - * - * Return the parent blkcg of @blkcg. Can be called anytime. - */ -static inline struct blkcg *blkcg_parent(struct blkcg *blkcg) -{ - return css_to_blkcg(blkcg->css.parent); -} - -/** - * blkcg_pin_online - pin online state - * @blkcg: blkcg of interest - * - * While pinned, a blkcg is kept online. This is primarily used to - * impedance-match blkg and cgwb lifetimes so that blkg doesn't go offline - * while an associated cgwb is still active. - */ -static inline void blkcg_pin_online(struct blkcg *blkcg) -{ - refcount_inc(&blkcg->online_pin); -} - -/** - * blkcg_unpin_online - unpin online state - * @blkcg: blkcg of interest - * - * This is primarily used to impedance-match blkg and cgwb lifetimes so - * that blkg doesn't go offline while an associated cgwb is still active. - * When this count goes to zero, all active cgwbs have finished so the - * blkcg can continue destruction by calling blkcg_destroy_blkgs(). - */ -static inline void blkcg_unpin_online(struct blkcg *blkcg) -{ - do { - if (!refcount_dec_and_test(&blkcg->online_pin)) - break; - blkcg_destroy_blkgs(blkcg); - blkcg = blkcg_parent(blkcg); - } while (blkcg); -} +void blkcg_pin_online(struct cgroup_subsys_state *blkcg_css); +void blkcg_unpin_online(struct cgroup_subsys_state *blkcg_css); #else /* CONFIG_BLK_CGROUP */ -- cgit From dec223c92a4688f6c9642d640cfe15a99d289dd4 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 20 Apr 2022 06:27:15 +0200 Subject: blk-cgroup: move struct blkcg to block/blk-cgroup.h There is no real need to expose the blkcg structure to the whole kernel. Move it to the private header an expose a helper to let the writeback code access the cgwb_list member. Signed-off-by: Christoph Hellwig Acked-by: Tejun Heo Link: https://lore.kernel.org/r/20220420042723.1010598-8-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/backing-dev.h | 6 ++---- include/linux/blk-cgroup.h | 32 +------------------------------- 2 files changed, 3 insertions(+), 35 deletions(-) (limited to 'include/linux') diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index 87ce24d238f3..2bd073fa6bb5 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -17,8 +17,6 @@ #include #include -struct blkcg; - static inline struct backing_dev_info *bdi_get(struct backing_dev_info *bdi) { kref_get(&bdi->refcnt); @@ -154,7 +152,7 @@ struct bdi_writeback *wb_get_create(struct backing_dev_info *bdi, struct cgroup_subsys_state *memcg_css, gfp_t gfp); void wb_memcg_offline(struct mem_cgroup *memcg); -void wb_blkcg_offline(struct blkcg *blkcg); +void wb_blkcg_offline(struct cgroup_subsys_state *css); /** * inode_cgwb_enabled - test whether cgroup writeback is enabled on an inode @@ -378,7 +376,7 @@ static inline void wb_memcg_offline(struct mem_cgroup *memcg) { } -static inline void wb_blkcg_offline(struct blkcg *blkcg) +static inline void wb_blkcg_offline(struct cgroup_subsys_state *css) { } diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index 0fb7459096e9..d7b188095040 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -37,29 +37,6 @@ enum blkg_iostat_type { BLKG_IOSTAT_NR, }; -struct blkcg_gq; -struct blkg_policy_data; - -struct blkcg { - struct cgroup_subsys_state css; - spinlock_t lock; - refcount_t online_pin; - - struct radix_tree_root blkg_tree; - struct blkcg_gq __rcu *blkg_hint; - struct hlist_head blkg_list; - - struct blkcg_policy_data *cpd[BLKCG_MAX_POLS]; - - struct list_head all_blkcgs_node; -#ifdef CONFIG_BLK_CGROUP_FC_APPID - char fc_app_id[FC_APPID_LEN]; -#endif -#ifdef CONFIG_CGROUP_WRITEBACK - struct list_head cgwb_list; -#endif -}; - struct blkg_iostat { u64 bytes[BLKG_IOSTAT_NR]; u64 ios[BLKG_IOSTAT_NR]; @@ -114,11 +91,6 @@ extern struct cgroup_subsys_state * const blkcg_root_css; void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay); void blkcg_maybe_throttle_current(void); -static inline struct blkcg *css_to_blkcg(struct cgroup_subsys_state *css) -{ - return css ? container_of(css, struct blkcg, css) : NULL; -} - /** * bio_blkcg - grab the blkcg associated with a bio * @bio: target bio @@ -137,12 +109,10 @@ static inline struct blkcg *bio_blkcg(struct bio *bio) bool blk_cgroup_congested(void); void blkcg_pin_online(struct cgroup_subsys_state *blkcg_css); void blkcg_unpin_online(struct cgroup_subsys_state *blkcg_css); +struct list_head *blkcg_get_cgwb_list(struct cgroup_subsys_state *css); #else /* CONFIG_BLK_CGROUP */ -struct blkcg { -}; - struct blkcg_gq { }; -- cgit From f4a6a61cb6d40d9ae63e47743d33200f3efe3fe7 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 20 Apr 2022 06:27:16 +0200 Subject: blktrace: cleanup the __trace_note_message interface Pass the cgroup_subsys_state instead of a the blkg so that blktrace doesn't need to poke into blk-cgroup internals, and give the name a blk prefix as the current name is way too generic for a public interface. Signed-off-by: Christoph Hellwig Acked-by: Tejun Heo Link: https://lore.kernel.org/r/20220420042723.1010598-9-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blktrace_api.h | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h index 22501a293fa5..623e22492afa 100644 --- a/include/linux/blktrace_api.h +++ b/include/linux/blktrace_api.h @@ -27,12 +27,10 @@ struct blk_trace { atomic_t dropped; }; -struct blkcg; - extern int blk_trace_ioctl(struct block_device *, unsigned, char __user *); extern void blk_trace_shutdown(struct request_queue *); -extern __printf(3, 4) -void __trace_note_message(struct blk_trace *, struct blkcg *blkcg, const char *fmt, ...); +__printf(3, 4) void __blk_trace_note_message(struct blk_trace *bt, + struct cgroup_subsys_state *css, const char *fmt, ...); /** * blk_add_trace_msg - Add a (simple) message to the blktrace stream @@ -47,14 +45,14 @@ void __trace_note_message(struct blk_trace *, struct blkcg *blkcg, const char *f * NOTE: Can not use 'static inline' due to presence of var args... * **/ -#define blk_add_cgroup_trace_msg(q, cg, fmt, ...) \ +#define blk_add_cgroup_trace_msg(q, css, fmt, ...) \ do { \ struct blk_trace *bt; \ \ rcu_read_lock(); \ bt = rcu_dereference((q)->blk_trace); \ if (unlikely(bt)) \ - __trace_note_message(bt, cg, fmt, ##__VA_ARGS__);\ + __blk_trace_note_message(bt, css, fmt, ##__VA_ARGS__);\ rcu_read_unlock(); \ } while (0) #define blk_add_trace_msg(q, fmt, ...) \ -- cgit From bbb1ebe7a909db4de49777fb7676d5bf293f34c9 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 20 Apr 2022 06:27:17 +0200 Subject: blk-cgroup: replace bio_blkcg with bio_blkcg_css All callers of bio_blkcg actually want the CSS, so replace it with an interface that does return the CSS. This now allows to move struct blkcg_gq to block/blk-cgroup.h instead of exposing it in a public header. Signed-off-by: Christoph Hellwig Acked-by: Tejun Heo Link: https://lore.kernel.org/r/20220420042723.1010598-10-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blk-cgroup.h | 83 +++------------------------------------------- 1 file changed, 5 insertions(+), 78 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index d7b188095040..97c7968e3204 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -28,94 +28,18 @@ #define FC_APPID_LEN 129 #ifdef CONFIG_BLK_CGROUP - -enum blkg_iostat_type { - BLKG_IOSTAT_READ, - BLKG_IOSTAT_WRITE, - BLKG_IOSTAT_DISCARD, - - BLKG_IOSTAT_NR, -}; - -struct blkg_iostat { - u64 bytes[BLKG_IOSTAT_NR]; - u64 ios[BLKG_IOSTAT_NR]; -}; - -struct blkg_iostat_set { - struct u64_stats_sync sync; - struct blkg_iostat cur; - struct blkg_iostat last; -}; - -/* association between a blk cgroup and a request queue */ -struct blkcg_gq { - /* Pointer to the associated request_queue */ - struct request_queue *q; - struct list_head q_node; - struct hlist_node blkcg_node; - struct blkcg *blkcg; - - /* all non-root blkcg_gq's are guaranteed to have access to parent */ - struct blkcg_gq *parent; - - /* reference count */ - struct percpu_ref refcnt; - - /* is this blkg online? protected by both blkcg and q locks */ - bool online; - - struct blkg_iostat_set __percpu *iostat_cpu; - struct blkg_iostat_set iostat; - - struct blkg_policy_data *pd[BLKCG_MAX_POLS]; - - spinlock_t async_bio_lock; - struct bio_list async_bios; - union { - struct work_struct async_bio_work; - struct work_struct free_work; - }; - - atomic_t use_delay; - atomic64_t delay_nsec; - atomic64_t delay_start; - u64 last_delay; - int last_use; - - struct rcu_head rcu_head; -}; - extern struct cgroup_subsys_state * const blkcg_root_css; void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay); void blkcg_maybe_throttle_current(void); - -/** - * bio_blkcg - grab the blkcg associated with a bio - * @bio: target bio - * - * This returns the blkcg associated with a bio, %NULL if not associated. - * Callers are expected to either handle %NULL or know association has been - * done prior to calling this. - */ -static inline struct blkcg *bio_blkcg(struct bio *bio) -{ - if (bio && bio->bi_blkg) - return bio->bi_blkg->blkcg; - return NULL; -} - bool blk_cgroup_congested(void); void blkcg_pin_online(struct cgroup_subsys_state *blkcg_css); void blkcg_unpin_online(struct cgroup_subsys_state *blkcg_css); struct list_head *blkcg_get_cgwb_list(struct cgroup_subsys_state *css); +struct cgroup_subsys_state *bio_blkcg_css(struct bio *bio); #else /* CONFIG_BLK_CGROUP */ -struct blkcg_gq { -}; - #define blkcg_root_css ((struct cgroup_subsys_state *)ERR_PTR(-EINVAL)) static inline void blkcg_maybe_throttle_current(void) { } @@ -123,7 +47,10 @@ static inline bool blk_cgroup_congested(void) { return false; } #ifdef CONFIG_BLOCK static inline void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay) { } -static inline struct blkcg *bio_blkcg(struct bio *bio) { return NULL; } +static inline struct cgroup_subsys_state *bio_blkcg_css(struct bio *bio) +{ + return NULL; +} #endif /* CONFIG_BLOCK */ #endif /* CONFIG_BLK_CGROUP */ -- cgit From 7f20ba7c42fd899557cef7d001f48711c3066ba5 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 20 Apr 2022 06:27:18 +0200 Subject: blk-cgroup: remove pointless CONFIG_BLOCK ifdefs No need to make BLK_CGROUP stubs conditional on CONFIG_BLOCK as they can't be used without that. Signed-off-by: Christoph Hellwig Acked-by: Tejun Heo Link: https://lore.kernel.org/r/20220420042723.1010598-11-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blk-cgroup.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index 97c7968e3204..abbfa97d6d46 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -44,15 +44,11 @@ struct cgroup_subsys_state *bio_blkcg_css(struct bio *bio); static inline void blkcg_maybe_throttle_current(void) { } static inline bool blk_cgroup_congested(void) { return false; } - -#ifdef CONFIG_BLOCK static inline void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay) { } static inline struct cgroup_subsys_state *bio_blkcg_css(struct bio *bio) { return NULL; } -#endif /* CONFIG_BLOCK */ - #endif /* CONFIG_BLK_CGROUP */ int blkcg_set_fc_appid(char *app_id, u64 cgrp_id, size_t app_id_len); -- cgit From c97ab271576dec2170e7b804cb05f7617b30fed9 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 20 Apr 2022 06:27:19 +0200 Subject: blk-cgroup: remove unneeded includes from Remove all the includes that aren't actually needed from and push them to the actual source files where needed. Signed-off-by: Christoph Hellwig Acked-by: Tejun Heo Link: https://lore.kernel.org/r/20220420042723.1010598-12-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blk-cgroup.h | 15 +++++---------- 1 file changed, 5 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index abbfa97d6d46..9f40dbc65f82 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -14,16 +14,11 @@ * Nauman Rafique */ -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include +#include + +struct bio; +struct cgroup_subsys_state; +struct request_queue; #define FC_APPID_LEN 129 -- cgit From f624506f98b198e65b44da303f44974590fb16c0 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 20 Apr 2022 06:27:23 +0200 Subject: kthread: unexport kthread_blkcg kthread_blkcg is only used by the built-in blk-cgroup code. Signed-off-by: Christoph Hellwig Acked-by: Tejun Heo Link: https://lore.kernel.org/r/20220420042723.1010598-16-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/kthread.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kthread.h b/include/linux/kthread.h index de5d75bafd66..30e5bec81d2b 100644 --- a/include/linux/kthread.h +++ b/include/linux/kthread.h @@ -222,9 +222,5 @@ void kthread_associate_blkcg(struct cgroup_subsys_state *css); struct cgroup_subsys_state *kthread_blkcg(void); #else static inline void kthread_associate_blkcg(struct cgroup_subsys_state *css) { } -static inline struct cgroup_subsys_state *kthread_blkcg(void) -{ - return NULL; -} #endif #endif /* _LINUX_KTHREAD_H */ -- cgit From d639af621600dc52dd9ed0dcf62b22595f2f5b52 Mon Sep 17 00:00:00 2001 From: Mark Bloch Date: Tue, 22 Mar 2022 09:16:39 +0000 Subject: net/mlx5: fs, split software and IFC flow destination definitions Separate flow destinations between software and IFC. Flow destination type passed by callers was used as the input in firmware commands and over the years software only types were added which resulted in mixing between the two. Create an IFC enum that contains only the flow destinations defined when talking to the firmware. Now that there is a proper software only enum for flow destinations the hardcoded values can be removed as the values are no longer used in firmware commands. Signed-off-by: Mark Bloch Reviewed-by: Maor Gottlieb Signed-off-by: Saeed Mahameed --- include/linux/mlx5/fs.h | 11 +++++++++++ include/linux/mlx5/mlx5_ifc.h | 16 ++++++---------- 2 files changed, 17 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index e3bfed68b08a..9da9df9ae751 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -40,6 +40,17 @@ #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v) +enum mlx5_flow_destination_type { + MLX5_FLOW_DESTINATION_TYPE_VPORT, + MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE, + MLX5_FLOW_DESTINATION_TYPE_TIR, + MLX5_FLOW_DESTINATION_TYPE_FLOW_SAMPLER, + MLX5_FLOW_DESTINATION_TYPE_UPLINK, + MLX5_FLOW_DESTINATION_TYPE_PORT, + MLX5_FLOW_DESTINATION_TYPE_COUNTER, + MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE_NUM, +}; + enum { MLX5_FLOW_CONTEXT_ACTION_FWD_NEXT_PRIO = 1 << 16, MLX5_FLOW_CONTEXT_ACTION_ENCRYPT = 1 << 17, diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 7d2d0ba82144..7f4ec9faa180 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1806,16 +1806,12 @@ struct mlx5_ifc_cmd_hca_cap_2_bits { u8 reserved_at_c0[0x740]; }; -enum mlx5_flow_destination_type { - MLX5_FLOW_DESTINATION_TYPE_VPORT = 0x0, - MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE = 0x1, - MLX5_FLOW_DESTINATION_TYPE_TIR = 0x2, - MLX5_FLOW_DESTINATION_TYPE_FLOW_SAMPLER = 0x6, - MLX5_FLOW_DESTINATION_TYPE_UPLINK = 0x8, - - MLX5_FLOW_DESTINATION_TYPE_PORT = 0x99, - MLX5_FLOW_DESTINATION_TYPE_COUNTER = 0x100, - MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE_NUM = 0x101, +enum mlx5_ifc_flow_destination_type { + MLX5_IFC_FLOW_DESTINATION_TYPE_VPORT = 0x0, + MLX5_IFC_FLOW_DESTINATION_TYPE_FLOW_TABLE = 0x1, + MLX5_IFC_FLOW_DESTINATION_TYPE_TIR = 0x2, + MLX5_IFC_FLOW_DESTINATION_TYPE_FLOW_SAMPLER = 0x6, + MLX5_IFC_FLOW_DESTINATION_TYPE_UPLINK = 0x8, }; enum mlx5_flow_table_miss_action { -- cgit From 6510bc0d7cb4119ebe10f9ab43aca6fd0e5f3e73 Mon Sep 17 00:00:00 2001 From: Mark Bloch Date: Tue, 15 Mar 2022 14:08:23 +0000 Subject: net/mlx5: fs, add unused destination type When the caller doesn't pass a destination fs_core will create a unused rule just so a context can be returned. This unused rule is zeroed out and its type is 0 which can be mixed up with MLX5_FLOW_DESTINATION_TYPE_VPORT. Create a dedicated type to differentiate between the two named MLX5_FLOW_DESTINATION_TYPE_NONE. Signed-off-by: Mark Bloch Reviewed-by: Maor Gottlieb Signed-off-by: Saeed Mahameed --- include/linux/mlx5/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 9da9df9ae751..8135713b0d2d 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -41,6 +41,7 @@ #define MLX5_SET_CFG(p, f, v) MLX5_SET(create_flow_group_in, p, f, v) enum mlx5_flow_destination_type { + MLX5_FLOW_DESTINATION_TYPE_NONE, MLX5_FLOW_DESTINATION_TYPE_VPORT, MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE, MLX5_FLOW_DESTINATION_TYPE_TIR, -- cgit From c9bc1a0ef9f613a7bc1adfff4c67dc5e5d7d1709 Mon Sep 17 00:00:00 2001 From: "Dustin L. Howett" Date: Thu, 17 Feb 2022 10:59:30 -0600 Subject: platform/chrome: cros_ec_lpcs: reserve the MEC LPC I/O ports first Some ChromeOS EC devices (such as the Framework Laptop) only map I/O ports 0x800-0x807. Making the larger reservation required by the non-MEC LPC (the 0xFF ports for the memory map, and the 0xFF ports for the parameter region) is non-viable on these devices. Since we probe the MEC EC first, we can get away with a smaller reservation that covers the MEC EC ports. If we fall back to classic LPC, we can grow the reservation to cover the memory map and the parameter region. cros_ec_lpc_probe also interacted with I/O ports 0x800-0x807 without a reservation. Restructuring the code to request the MEC LPC region first obviates the need to do so. Signed-off-by: Dustin L. Howett Signed-off-by: Tzung-Bi Shih Link: https://lore.kernel.org/r/20220217165930.15081-3-dustin@howett.net --- include/linux/platform_data/cros_ec_commands.h | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/cros_ec_commands.h b/include/linux/platform_data/cros_ec_commands.h index c23554531961..8cfa8cfca77e 100644 --- a/include/linux/platform_data/cros_ec_commands.h +++ b/include/linux/platform_data/cros_ec_commands.h @@ -51,10 +51,14 @@ /* * The actual block is 0x800-0x8ff, but some BIOSes think it's 0x880-0x8ff * and they tell the kernel that so we have to think of it as two parts. + * + * Other BIOSes report only the I/O port region spanned by the Microchip + * MEC series EC; an attempt to address a larger region may fail. */ -#define EC_HOST_CMD_REGION0 0x800 -#define EC_HOST_CMD_REGION1 0x880 -#define EC_HOST_CMD_REGION_SIZE 0x80 +#define EC_HOST_CMD_REGION0 0x800 +#define EC_HOST_CMD_REGION1 0x880 +#define EC_HOST_CMD_REGION_SIZE 0x80 +#define EC_HOST_CMD_MEC_REGION_SIZE 0x8 /* EC command register bit functions */ #define EC_LPC_CMDR_DATA BIT(0) /* Data ready for host to read */ -- cgit From 4c7f24f857c7cd0381dd92495db476066d1c6aec Mon Sep 17 00:00:00 2001 From: Tonghao Zhang Date: Sun, 1 May 2022 11:55:23 +0800 Subject: net: sysctl: introduce sysctl SYSCTL_THREE This patch introdues the SYSCTL_THREE. KUnit: [00:10:14] ================ sysctl_test (10 subtests) ================= [00:10:14] [PASSED] sysctl_test_api_dointvec_null_tbl_data [00:10:14] [PASSED] sysctl_test_api_dointvec_table_maxlen_unset [00:10:14] [PASSED] sysctl_test_api_dointvec_table_len_is_zero [00:10:14] [PASSED] sysctl_test_api_dointvec_table_read_but_position_set [00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_positive [00:10:14] [PASSED] sysctl_test_dointvec_read_happy_single_negative [00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_positive [00:10:14] [PASSED] sysctl_test_dointvec_write_happy_single_negative [00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_less_int_min [00:10:14] [PASSED] sysctl_test_api_dointvec_write_single_greater_int_max [00:10:14] =================== [PASSED] sysctl_test =================== ./run_kselftest.sh -c sysctl ... ok 1 selftests: sysctl: sysctl.sh Cc: Luis Chamberlain Cc: Kees Cook Cc: Iurii Zaikin Cc: "David S. Miller" Cc: Jakub Kicinski Cc: Paolo Abeni Cc: Hideaki YOSHIFUJI Cc: David Ahern Cc: Simon Horman Cc: Julian Anastasov Cc: Pablo Neira Ayuso Cc: Jozsef Kadlecsik Cc: Florian Westphal Cc: Shuah Khan Cc: Andrew Morton Cc: Alexei Starovoitov Cc: Eric Dumazet Cc: Lorenz Bauer Cc: Akhmat Karakotov Signed-off-by: Tonghao Zhang Reviewed-by: Simon Horman Signed-off-by: Paolo Abeni --- include/linux/sysctl.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 6353d6db69b2..80263f7cdb77 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -38,10 +38,10 @@ struct ctl_table_header; struct ctl_dir; /* Keep the same order as in fs/proc/proc_sysctl.c */ -#define SYSCTL_NEG_ONE ((void *)&sysctl_vals[0]) -#define SYSCTL_ZERO ((void *)&sysctl_vals[1]) -#define SYSCTL_ONE ((void *)&sysctl_vals[2]) -#define SYSCTL_TWO ((void *)&sysctl_vals[3]) +#define SYSCTL_ZERO ((void *)&sysctl_vals[0]) +#define SYSCTL_ONE ((void *)&sysctl_vals[1]) +#define SYSCTL_TWO ((void *)&sysctl_vals[2]) +#define SYSCTL_THREE ((void *)&sysctl_vals[3]) #define SYSCTL_FOUR ((void *)&sysctl_vals[4]) #define SYSCTL_ONE_HUNDRED ((void *)&sysctl_vals[5]) #define SYSCTL_TWO_HUNDRED ((void *)&sysctl_vals[6]) @@ -51,6 +51,7 @@ struct ctl_dir; /* this is needed for the proc_dointvec_minmax for [fs_]overflow UID and GID */ #define SYSCTL_MAXOLDUID ((void *)&sysctl_vals[10]) +#define SYSCTL_NEG_ONE ((void *)&sysctl_vals[11]) extern const int sysctl_vals[]; -- cgit From 62139f52b7e588d565aa9df81ea0a0548a68b823 Mon Sep 17 00:00:00 2001 From: Per-Daniel Olsson Date: Fri, 29 Apr 2022 09:22:08 +0200 Subject: regulator: pca9450: Make I2C Level Translator configurable Make the I2C Level Translator included in PCA9450 configurable from devicetree. The reset state is off. By setting nxp,i2c-lt-enable, the I2C Level Translator will be enabled while in STANDBY or RUN state. Signed-off-by: Per-Daniel Olsson Signed-off-by: Rickard x Andersson Link: https://lore.kernel.org/r/20220429072211.24957-2-rickaran@axis.com Signed-off-by: Mark Brown --- include/linux/regulator/pca9450.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/regulator/pca9450.h b/include/linux/regulator/pca9450.h index 71902f41c919..3c01c2bf84f5 100644 --- a/include/linux/regulator/pca9450.h +++ b/include/linux/regulator/pca9450.h @@ -226,4 +226,11 @@ enum { #define WDOG_B_CFG_COLD_LDO12 0x80 #define WDOG_B_CFG_COLD 0xC0 +/* PCA9450_REG_CONFIG2 bits */ +#define I2C_LT_MASK 0x03 +#define I2C_LT_FORCE_DISABLE 0x00 +#define I2C_LT_ON_STANDBY_RUN 0x01 +#define I2C_LT_ON_RUN 0x02 +#define I2C_LT_FORCE_ENABLE 0x03 + #endif /* __LINUX_REG_PCA9450_H__ */ -- cgit From 1dcbae86ee669bdb0338954cd0136863f5c96c0a Mon Sep 17 00:00:00 2001 From: Dave Gerlach Date: Thu, 14 Apr 2022 12:27:24 -0700 Subject: soc: ti: wkup_m3_ipc: Add support for IO Isolation AM43xx support isolation of the IOs so that control is taken from the peripheral they are connected to and overridden by values present in the CTRL_CONF_* registers for the pad in the control module. The actual toggling happens from the wkup_m3, so use a DT property from the wkup_m3_ipc node to allow the PM code to communicate the necessity for placing the IOs into isolation to the firmware. Signed-off-by: Dave Gerlach Signed-off-by: Keerthy Signed-off-by: Drew Fustini Signed-off-by: Nishanth Menon Link: https://lore.kernel.org/r/20220414192722.2978837-3-dfustini@baylibre.com --- include/linux/wkup_m3_ipc.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/wkup_m3_ipc.h b/include/linux/wkup_m3_ipc.h index 2bc52c6381d5..b706eac58f92 100644 --- a/include/linux/wkup_m3_ipc.h +++ b/include/linux/wkup_m3_ipc.h @@ -34,6 +34,7 @@ struct wkup_m3_ipc { int mem_type; unsigned long resume_addr; int vtt_conf; + int isolation_conf; int state; struct completion sync_complete; -- cgit From ea082040fe071d2ba1f8f73792743d7ca9fb218e Mon Sep 17 00:00:00 2001 From: Dave Gerlach Date: Tue, 26 Apr 2022 13:07:44 -0700 Subject: soc: ti: wkup_m3_ipc: Add support for i2c voltage scaling Allow loading of a binary containing i2c scaling sequences to be provided to the wkup_m3 firmware in order to properly scale voltage rails on the PMIC during low power modes like DeepSleep0. Proper binary format is determined by the FW in use. Code expects firmware to have 0x0C57 present as the first two bytes followed by one byte defining offset to sleep sequence followed by one byte defining offset to wake sequence and then lastly both sequences. Each sequence is a series of I2C transfers in the form: u8 length | u8 chip address | u8 byte0/reg address | u8 byte1 | u8 byteN .. The length indicates the number of bytes to transfer, including the register address. The length of each transfer is limited by the I2C buffer size of 32 bytes. Based on previous work by Russ Dill. [dfustini: replace FW_ACTION_HOTPLUG with FW_ACTION_UEVENT] Signed-off-by: Dave Gerlach Signed-off-by: Keerthy [dfustini: add NULL argument to rproc_da_to_va() call] Signed-off-by: Drew Fustini Signed-off-by: Nishanth Menon Link: https://lore.kernel.org/r/20220426200741.712842-3-dfustini@baylibre.com --- include/linux/wkup_m3_ipc.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/wkup_m3_ipc.h b/include/linux/wkup_m3_ipc.h index b706eac58f92..fef0fac60f8c 100644 --- a/include/linux/wkup_m3_ipc.h +++ b/include/linux/wkup_m3_ipc.h @@ -37,6 +37,9 @@ struct wkup_m3_ipc { int isolation_conf; int state; + unsigned long volt_scale_offsets; + const char *sd_fw_name; + struct completion sync_complete; struct mbox_client mbox_client; struct mbox_chan *mbox; @@ -50,6 +53,12 @@ struct wkup_m3_wakeup_src { char src[10]; }; +struct wkup_m3_scale_data_header { + u16 magic; + u8 sleep_offset; + u8 wake_offset; +} __packed; + struct wkup_m3_ipc_ops { void (*set_mem_type)(struct wkup_m3_ipc *m3_ipc, int mem_type); void (*set_resume_address)(struct wkup_m3_ipc *m3_ipc, void *addr); -- cgit From 2a21f9e6d9a408dbd09a01caf5fff42c2f70fa82 Mon Sep 17 00:00:00 2001 From: Dave Gerlach Date: Sun, 1 May 2022 20:32:12 -0700 Subject: soc: ti: wkup_m3_ipc: Add debug option to halt m3 in suspend Add a debugfs option to allow configurable halting of the wkup_m3 during suspend at the last possible point before low power mode entry. This condition can only be resolved through JTAG and advancing beyond the while loop in a8_lp_ds0_handler [1]. Although this hangs the system it forces the system to remain active once it has been entirely configured for low power mode entry, allowing for register inspection through JTAG to help in debugging transition errors. Halt mode can be set using the enable_off_mode entry under wkup_m3_ipc in the debugfs. [1] https://git.ti.com/cgit/processor-firmware/ti-amx3-cm3-pm-firmware/tree/src/pm_services/pm_handlers.c?h=08.02.00.006#n141 Suggested-by: Brad Griffis Signed-off-by: Dave Gerlach [dfustini: add link for a8_lp_ds0_handler() in ti-amx3-cm3-pm-firmware] Signed-off-by: Drew Fustini Signed-off-by: Nishanth Menon Link: https://lore.kernel.org/r/20220502033211.1383158-1-dfustini@baylibre.com --- include/linux/wkup_m3_ipc.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/wkup_m3_ipc.h b/include/linux/wkup_m3_ipc.h index fef0fac60f8c..26d1eb058fa3 100644 --- a/include/linux/wkup_m3_ipc.h +++ b/include/linux/wkup_m3_ipc.h @@ -36,6 +36,7 @@ struct wkup_m3_ipc { int vtt_conf; int isolation_conf; int state; + u32 halt; unsigned long volt_scale_offsets; const char *sd_fw_name; @@ -46,6 +47,7 @@ struct wkup_m3_ipc { struct wkup_m3_ipc_ops *ops; int is_rtc_only; + struct dentry *dbg_path; }; struct wkup_m3_wakeup_src { -- cgit From 3ba75c1316390b2bc39c19cb8f0f85922ab3f9ed Mon Sep 17 00:00:00 2001 From: Baskov Evgeniy Date: Thu, 3 Mar 2022 17:21:19 +0300 Subject: efi: libstub: declare DXE services table UEFI DXE services are not yet used in kernel code but are required to manipulate page table memory protection flags. Add required declarations to use DXE services functions. Signed-off-by: Baskov Evgeniy Link: https://lore.kernel.org/r/20220303142120.1975-2-baskov@ispras.ru [ardb: ignore absent DXE table but warn if the signature check fails] Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index fd266198f9d8..b1f7c6a3e5bf 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -385,6 +385,7 @@ void efi_native_runtime_setup(void); #define EFI_LOAD_FILE_PROTOCOL_GUID EFI_GUID(0x56ec3091, 0x954c, 0x11d2, 0x8e, 0x3f, 0x00, 0xa0, 0xc9, 0x69, 0x72, 0x3b) #define EFI_LOAD_FILE2_PROTOCOL_GUID EFI_GUID(0x4006c0c1, 0xfcb3, 0x403e, 0x99, 0x6d, 0x4a, 0x6c, 0x87, 0x24, 0xe0, 0x6d) #define EFI_RT_PROPERTIES_TABLE_GUID EFI_GUID(0xeb66918a, 0x7eef, 0x402a, 0x84, 0x2e, 0x93, 0x1d, 0x21, 0xc3, 0x8a, 0xe9) +#define EFI_DXE_SERVICES_TABLE_GUID EFI_GUID(0x05ad34ba, 0x6f02, 0x4214, 0x95, 0x2e, 0x4d, 0xa0, 0x39, 0x8e, 0x2b, 0xb9) #define EFI_IMAGE_SECURITY_DATABASE_GUID EFI_GUID(0xd719b2cb, 0x3d3a, 0x4596, 0xa3, 0xbc, 0xda, 0xd0, 0x0e, 0x67, 0x65, 0x6f) #define EFI_SHIM_LOCK_GUID EFI_GUID(0x605dab50, 0xe046, 0x4300, 0xab, 0xb6, 0x3d, 0xd8, 0x10, 0xdd, 0x8b, 0x23) @@ -438,6 +439,7 @@ typedef struct { } efi_config_table_type_t; #define EFI_SYSTEM_TABLE_SIGNATURE ((u64)0x5453595320494249ULL) +#define EFI_DXE_SERVICES_TABLE_SIGNATURE ((u64)0x565245535f455844ULL) #define EFI_2_30_SYSTEM_TABLE_REVISION ((2 << 16) | (30)) #define EFI_2_20_SYSTEM_TABLE_REVISION ((2 << 16) | (20)) -- cgit From 07768c55f9c2ad64ccae3ed82447a87d8af8a687 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Sat, 19 Mar 2022 19:00:20 +0100 Subject: efi/arm64: libstub: run image in place if randomized by the loader If the loader has already placed the EFI kernel image randomly in physical memory, and indicates having done so by installing the 'fixed placement' protocol onto the image handle, don't bother randomizing the placement again in the EFI stub. Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index b1f7c6a3e5bf..580ce607d6f5 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -410,6 +410,17 @@ void efi_native_runtime_setup(void); #define LINUX_EFI_MOK_VARIABLE_TABLE_GUID EFI_GUID(0xc451ed2b, 0x9694, 0x45d3, 0xba, 0xba, 0xed, 0x9f, 0x89, 0x88, 0xa3, 0x89) #define LINUX_EFI_COCO_SECRET_AREA_GUID EFI_GUID(0xadf956ad, 0xe98c, 0x484c, 0xae, 0x11, 0xb5, 0x1c, 0x7d, 0x33, 0x64, 0x47) +/* + * This GUID may be installed onto the kernel image's handle as a NULL protocol + * to signal to the stub that the placement of the image should be respected, + * and moving the image in physical memory is undesirable. To ensure + * compatibility with 64k pages kernels with virtually mapped stacks, and to + * avoid defeating physical randomization, this protocol should only be + * installed if the image was placed at a randomized 128k aligned address in + * memory. + */ +#define LINUX_EFI_LOADED_IMAGE_FIXED_GUID EFI_GUID(0xf5a37b6d, 0x3344, 0x42a5, 0xb6, 0xbb, 0x97, 0x86, 0x48, 0xc1, 0x89, 0x0a) + /* OEM GUIDs */ #define DELLEMC_EFI_RCI2_TABLE_GUID EFI_GUID(0x2d9f28a2, 0xa886, 0x456a, 0x97, 0xa8, 0xf1, 0x1e, 0xf2, 0x4f, 0xf4, 0x55) #define AMD_SEV_MEM_ENCRYPT_GUID EFI_GUID(0x0cf29b71, 0x9e51, 0x433a, 0xa3, 0xb7, 0x81, 0xf3, 0xab, 0x16, 0xb8, 0x75) -- cgit From 56c134f7f1b58be08bdb0ca8372474a4a5165f31 Mon Sep 17 00:00:00 2001 From: Thomas Zimmermann Date: Fri, 29 Apr 2022 12:08:31 +0200 Subject: fbdev: Track deferred-I/O pages in pageref struct Store the per-page state for fbdev's deferred I/O in struct fb_deferred_io_pageref. Maintain a list of pagerefs for the pages that have to be written back to video memory. Update all affected drivers. As with pages before, fbdev acquires a pageref when an mmaped page of the framebuffer is being written to. It holds the pageref in a list of all currently written pagerefs until it flushes the written pages to video memory. Writeback occurs periodically. After writeback fbdev releases all pagerefs and builds up a new dirty list until the next writeback occurs. Using pagerefs has a number of benefits. For pages of the framebuffer, the deferred I/O code used struct page.lru as an entry into the list of dirty pages. The lru field is owned by the page cache, which makes deferred I/O incompatible with some memory pages (e.g., most notably DRM's GEM SHMEM allocator). struct fb_deferred_io_pageref now provides an entry into a list of dirty framebuffer pages, freeing lru for use with the page cache. Drivers also assumed that struct page.index is the page offset into the framebuffer. This is not true for DRM buffers, which are located at various offset within a mapped area. struct fb_deferred_io_pageref explicitly stores an offset into the framebuffer. struct page.index is now only the page offset into the mapped area. These changes will allow DRM to use fbdev deferred I/O without an intermediate shadow buffer. v3: * use pageref->offset for sorting * fix grammar in comment v2: * minor fixes in commit message Signed-off-by: Thomas Zimmermann Reviewed-by: Javier Martinez Canillas Link: https://patchwork.freedesktop.org/patch/msgid/20220429100834.18898-3-tzimmermann@suse.de --- include/linux/fb.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fb.h b/include/linux/fb.h index f95da1af9ff6..a332590c0fae 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -201,6 +201,13 @@ struct fb_pixmap { }; #ifdef CONFIG_FB_DEFERRED_IO +struct fb_deferred_io_pageref { + struct page *page; + unsigned long offset; + /* private */ + struct list_head list; +}; + struct fb_deferred_io { /* delay between mkwrite and deferred handler */ unsigned long delay; @@ -468,6 +475,8 @@ struct fb_info { #endif #ifdef CONFIG_FB_DEFERRED_IO struct delayed_work deferred_work; + unsigned long npagerefs; + struct fb_deferred_io_pageref *pagerefs; struct fb_deferred_io *fbdefio; #endif @@ -661,7 +670,7 @@ static inline void __fb_pad_aligned_buffer(u8 *dst, u32 d_pitch, /* drivers/video/fb_defio.c */ int fb_deferred_io_mmap(struct fb_info *info, struct vm_area_struct *vma); -extern void fb_deferred_io_init(struct fb_info *info); +extern int fb_deferred_io_init(struct fb_info *info); extern void fb_deferred_io_open(struct fb_info *info, struct inode *inode, struct file *file); -- cgit From e80eec1b871a2acb8f5c92db4c237e9ae6dd322b Mon Sep 17 00:00:00 2001 From: Thomas Zimmermann Date: Fri, 29 Apr 2022 12:08:33 +0200 Subject: fbdev: Rename pagelist to pagereflist for deferred I/O Rename various instances of pagelist to pagereflist. The list now stores pageref structures, so the new name is more appropriate. In their write-back helpers, several fbdev drivers refer to the pageref list in struct fb_deferred_io instead of using the one supplied as argument to the function. Convert them over to the supplied one. It's the same instance, so no change of behavior occurs. v4: * fix commit message (Javier) Suggested-by: Sam Ravnborg Signed-off-by: Thomas Zimmermann Reviewed-by: Javier Martinez Canillas Link: https://patchwork.freedesktop.org/patch/msgid/20220429100834.18898-5-tzimmermann@suse.de --- include/linux/fb.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fb.h b/include/linux/fb.h index a332590c0fae..69c67c70fa78 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -211,9 +211,9 @@ struct fb_deferred_io_pageref { struct fb_deferred_io { /* delay between mkwrite and deferred handler */ unsigned long delay; - bool sort_pagelist; /* sort pagelist by offset */ - struct mutex lock; /* mutex that protects the page list */ - struct list_head pagelist; /* list of touched pages */ + bool sort_pagereflist; /* sort pagelist by offset */ + struct mutex lock; /* mutex that protects the pageref list */ + struct list_head pagereflist; /* list of pagerefs for touched pages */ /* callback */ void (*first_io)(struct fb_info *info); void (*deferred_io)(struct fb_info *info, struct list_head *pagelist); -- cgit From f2b6e79c7378542e03e42fd47c0c8bf00a2fcf6b Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 3 May 2022 16:44:02 +0200 Subject: Revert "usb: core: hcd: Create platform devices for onboard hubs in probe()" This reverts commit c40b62216c1aecc0dc00faf33d71bd71cb440337. The series still has built errors as reported in linux-next, so revert it for now. Reported-by: Stephen Rothwell Link: https://lore.kernel.org/r/20220502210728.0b36f3cd@canb.auug.org.au Cc: Stephen Boyd Cc: Douglas Anderson Cc: Matthias Kaehlcke Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/hcd.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h index 4ebc91c09182..548a028f2dab 100644 --- a/include/linux/usb/hcd.h +++ b/include/linux/usb/hcd.h @@ -198,7 +198,6 @@ struct usb_hcd { struct usb_hcd *shared_hcd; struct usb_hcd *primary_hcd; - struct list_head onboard_hub_devs; #define HCD_BUFFER_POOLS 4 struct dma_pool *pool[HCD_BUFFER_POOLS]; -- cgit From 67a7570ad31f2a7f02550153e774bf4c630f0fef Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 3 May 2022 16:44:11 +0200 Subject: Revert "usb: misc: Add onboard_usb_hub driver" This reverts commit 0298b4b95cb373c21e6323c905589f8dac42c5b4. The series still has built errors as reported in linux-next, so revert it for now. Reported-by: Stephen Rothwell Link: https://lore.kernel.org/r/20220502210728.0b36f3cd@canb.auug.org.au Cc: Alan Stern Cc: Douglas Anderson Cc: Matthias Kaehlcke Cc: Ravi Chandra Sadineni Cc: Stephen Boyd Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/onboard_hub.h | 18 ------------------ 1 file changed, 18 deletions(-) delete mode 100644 include/linux/usb/onboard_hub.h (limited to 'include/linux') diff --git a/include/linux/usb/onboard_hub.h b/include/linux/usb/onboard_hub.h deleted file mode 100644 index d9373230556e..000000000000 --- a/include/linux/usb/onboard_hub.h +++ /dev/null @@ -1,18 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ - -#ifndef __LINUX_USB_ONBOARD_HUB_H -#define __LINUX_USB_ONBOARD_HUB_H - -struct usb_device; -struct list_head; - -#if IS_ENABLED(CONFIG_USB_ONBOARD_HUB) -void onboard_hub_create_pdevs(struct usb_device *parent_hub, struct list_head *pdev_list); -void onboard_hub_destroy_pdevs(struct list_head *pdev_list); -#else -static inline void onboard_hub_create_pdevs(struct usb_device *parent_hub, - struct list_head *pdev_list) {} -static inline void onboard_hub_destroy_pdevs(struct list_head *pdev_list) {} -#endif - -#endif /* __LINUX_USB_ONBOARD_HUB_H */ -- cgit From 1a9517a0a43040b77b5ef64d9053df214ba33d17 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 3 May 2022 16:44:24 +0200 Subject: Revert "of/platform: Add stubs for of_platform_device_create/destroy()" This reverts commit 8e8b11956486e3fe8cacf54a1d492ebdd8cc1fb2. The series still has built errors as reported in linux-next, so revert it for now. Reported-by: Stephen Rothwell Link: https://lore.kernel.org/r/20220502210728.0b36f3cd@canb.auug.org.au Cc: Stephen Boyd Cc: Douglas Anderson Cc: Rob Herring Cc: Matthias Kaehlcke Signed-off-by: Greg Kroah-Hartman --- include/linux/of_platform.h | 22 ++++------------------ 1 file changed, 4 insertions(+), 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/of_platform.h b/include/linux/of_platform.h index d15b6cd5e1c3..84a966623e78 100644 --- a/include/linux/of_platform.h +++ b/include/linux/of_platform.h @@ -61,18 +61,16 @@ static inline struct platform_device *of_find_device_by_node(struct device_node } #endif -extern int of_platform_bus_probe(struct device_node *root, - const struct of_device_id *matches, - struct device *parent); - -#ifdef CONFIG_OF_ADDRESS /* Platform devices and busses creation */ extern struct platform_device *of_platform_device_create(struct device_node *np, const char *bus_id, struct device *parent); extern int of_platform_device_destroy(struct device *dev, void *data); - +extern int of_platform_bus_probe(struct device_node *root, + const struct of_device_id *matches, + struct device *parent); +#ifdef CONFIG_OF_ADDRESS extern int of_platform_populate(struct device_node *root, const struct of_device_id *matches, const struct of_dev_auxdata *lookup, @@ -86,18 +84,6 @@ extern int devm_of_platform_populate(struct device *dev); extern void devm_of_platform_depopulate(struct device *dev); #else -/* Platform devices and busses creation */ -static inline struct platform_device *of_platform_device_create(struct device_node *np, - const char *bus_id, - struct device *parent) -{ - return NULL; -} -static inline int of_platform_device_destroy(struct device *dev, void *data) -{ - return -ENODEV; -} - static inline int of_platform_populate(struct device_node *root, const struct of_device_id *matches, const struct of_dev_auxdata *lookup, -- cgit From 1ac17586c950a2c129393f8a92901a2b357acf24 Mon Sep 17 00:00:00 2001 From: Frank Rowand Date: Mon, 2 May 2022 13:17:40 -0500 Subject: of: overlay: add entry to of_overlay_action_name[] The values of enum of_overlay_notify_action are used to index into array of_overlay_action_name. Add an entry to of_overlay_action_name for the value recently added to of_overlay_notify_action. Array of_overlay_action_name[] is moved into include/linux/of.h adjacent to enum of_overlay_notify_action to make the connection between the two more obvious if either is modified in the future. The only use of of_overlay_action_name is for error reporting in overlay_notify(). All callers of overlay_notify() report the same error, but with fewer details. Remove the redundant error reports in the callers. Fixes: 067c098766c6 ("of: overlay: rework overlay apply and remove kfree()s") Reported-by: Dan Carpenter Signed-off-by: Frank Rowand Signed-off-by: Rob Herring Link: https://lore.kernel.org/r/20220502181742.1402826-2-frowand.list@gmail.com --- include/linux/of.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/of.h b/include/linux/of.h index 17741eee0ca4..f0a5d6b10c5a 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -1550,6 +1550,19 @@ enum of_overlay_notify_action { OF_OVERLAY_POST_REMOVE, }; +static inline char *of_overlay_action_name(enum of_overlay_notify_action action) +{ + static char *of_overlay_action_name[] = { + "init", + "pre-apply", + "post-apply", + "pre-remove", + "post-remove", + }; + + return of_overlay_action_name[action]; +} + struct of_overlay_notify_data { struct device_node *overlay; struct device_node *target; -- cgit From 282d8998e9979c2186af7f7d22366f2fc3149838 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 8 Mar 2022 15:45:33 -0800 Subject: srcu: Prevent expedited GPs and blocking readers from consuming CPU If an SRCU reader blocks while a synchronize_srcu_expedited() waits for that same reader, then that grace period will spawn an endless series of workqueue handlers, consuming a full CPU. This quickly gets pointless because consuming more CPU isn't going to make that reader get done faster, especially if it is blocked waiting for an external event. This commit therefore spawns at most one pair of back-to-back workqueue handlers per expedited grace period phase, instead inserting increasing delays as that grace period phase grows older, but capped at 10 jiffies. In any case, if there have been at least 100 back-to-back workqueue handlers within a single jiffy, regardless of grace period or grace-period phase, then a one-jiffy delay is inserted. [ paulmck: Apply feedback from kernel test robot. ] Cc: Neeraj Upadhyay Reported-by: Song Liu Tested-by: kernel test robot Signed-off-by: Paul E. McKenney --- include/linux/srcutree.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/srcutree.h b/include/linux/srcutree.h index 1b9ff4ed37e4..e3014319d1ad 100644 --- a/include/linux/srcutree.h +++ b/include/linux/srcutree.h @@ -71,9 +71,11 @@ struct srcu_struct { unsigned long srcu_gp_seq; /* Grace-period seq #. */ unsigned long srcu_gp_seq_needed; /* Latest gp_seq needed. */ unsigned long srcu_gp_seq_needed_exp; /* Furthest future exp GP. */ + unsigned long srcu_gp_start; /* Last GP start timestamp (jiffies) */ unsigned long srcu_last_gp_end; /* Last GP end timestamp (ns) */ unsigned long srcu_size_jiffies; /* Current contention-measurement interval. */ unsigned long srcu_n_lock_retries; /* Contention events in current interval. */ + unsigned long srcu_n_exp_nodelay; /* # expedited no-delays in current GP phase. */ struct srcu_data __percpu *sda; /* Per-CPU srcu_data array. */ bool sda_is_static; /* May ->sda be passed to free_percpu()? */ unsigned long srcu_barrier_seq; /* srcu_barrier seq #. */ @@ -83,6 +85,8 @@ struct srcu_struct { atomic_t srcu_barrier_cpu_cnt; /* # CPUs not yet posting a */ /* callback for the barrier */ /* operation. */ + unsigned long reschedule_jiffies; + unsigned long reschedule_count; struct delayed_work work; struct lockdep_map dep_map; }; -- cgit From c29722fad4aabbf6bb841b8f058f858ec911df56 Mon Sep 17 00:00:00 2001 From: Christian Göttsche Date: Tue, 8 Mar 2022 18:09:26 +0100 Subject: selinux: log anon inode class name MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Log the anonymous inode class name in the security hook inode_init_security_anon. This name is the key for name based type transitions on the anon_inode security class on creation. Example: type=AVC msg=audit(02/16/22 22:02:50.585:216) : avc: granted \ { create } for pid=2136 comm=mariadbd anonclass=[io_uring] \ scontext=system_u:system_r:mysqld_t:s0 \ tcontext=system_u:system_r:mysqld_iouring_t:s0 tclass=anon_inode Add a new LSM audit data type holding the inode and the class name. Signed-off-by: Christian Göttsche [PM: adjusted 'anonclass' to be a trusted string, cgzones approved] Signed-off-by: Paul Moore --- include/linux/lsm_audit.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/lsm_audit.h b/include/linux/lsm_audit.h index 17d02eda9538..97a8b21eb033 100644 --- a/include/linux/lsm_audit.h +++ b/include/linux/lsm_audit.h @@ -76,6 +76,7 @@ struct common_audit_data { #define LSM_AUDIT_DATA_IBENDPORT 14 #define LSM_AUDIT_DATA_LOCKDOWN 15 #define LSM_AUDIT_DATA_NOTIFICATION 16 +#define LSM_AUDIT_DATA_ANONINODE 17 union { struct path path; struct dentry *dentry; @@ -96,6 +97,7 @@ struct common_audit_data { struct lsm_ibpkey_audit *ibpkey; struct lsm_ibendport_audit *ibendport; int reason; + const char *anonclass; } u; /* this union contains LSM specific data */ union { -- cgit From c2aa2dfef243efe213a480a1ee8566507a5152f4 Mon Sep 17 00:00:00 2001 From: Sargun Dhillon Date: Tue, 3 May 2022 01:09:56 -0700 Subject: seccomp: Add wait_killable semantic to seccomp user notifier This introduces a per-filter flag (SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV) that makes it so that when notifications are received by the supervisor the notifying process will transition to wait killable semantics. Although wait killable isn't a set of semantics formally exposed to userspace, the concept is searchable. If the notifying process is signaled prior to the notification being received by the userspace agent, it will be handled as normal. One quirk about how this is handled is that the notifying process only switches to TASK_KILLABLE if it receives a wakeup from either an addfd or a signal. This is to avoid an unnecessary wakeup of the notifying task. The reasons behind switching into wait_killable only after userspace receives the notification are: * Avoiding unncessary work - Often, workloads will perform work that they may abort (request racing comes to mind). This allows for syscalls to be aborted safely prior to the notification being received by the supervisor. In this, the supervisor doesn't end up doing work that the workload does not want to complete anyways. * Avoiding side effects - We don't want the syscall to be interruptible once the supervisor starts doing work because it may not be trivial to reverse the operation. For example, unmounting a file system may take a long time, and it's hard to rollback, or treat that as reentrant. * Avoid breaking runtimes - Various runtimes do not GC when they are during a syscall (or while running native code that subsequently calls a syscall). If many notifications are blocked, and not picked up by the supervisor, this can get the application into a bad state. Signed-off-by: Sargun Dhillon Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220503080958.20220-2-sargun@sargun.me --- include/linux/seccomp.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h index 0c564e5d40ff..d31d76be4982 100644 --- a/include/linux/seccomp.h +++ b/include/linux/seccomp.h @@ -8,7 +8,8 @@ SECCOMP_FILTER_FLAG_LOG | \ SECCOMP_FILTER_FLAG_SPEC_ALLOW | \ SECCOMP_FILTER_FLAG_NEW_LISTENER | \ - SECCOMP_FILTER_FLAG_TSYNC_ESRCH) + SECCOMP_FILTER_FLAG_TSYNC_ESRCH | \ + SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV) /* sizeof() the first published struct seccomp_notif_addfd */ #define SECCOMP_NOTIFY_ADDFD_SIZE_VER0 24 -- cgit From 58caed3dacb4354a25a1aa8d2febc3e9648ba1f4 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 2 May 2022 16:27:03 -0700 Subject: netdev: reshuffle netif_napi_add() APIs to allow dropping weight Most drivers should not have to worry about selecting the right weight for their NAPI instances and pass NAPI_POLL_WEIGHT. It'd be best if we didn't require the argument at all and selected the default internally. This change prepares the ground for such reshuffling, allowing for a smooth transition. The following API should remain after the next release cycle: netif_napi_add() netif_napi_add_weight() netif_napi_add_tx() netif_napi_add_tx_weight() Where the _weight() variants take an explicit weight argument. I opted for a _weight() suffix rather than a __ prefix, because we use __ in places to mean that caller needs to also issue a synchronize_net() call. Signed-off-by: Jakub Kicinski Link: https://lore.kernel.org/r/20220502232703.396351-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 50 +++++++++++++++++++++++++++++++---------------- 1 file changed, 33 insertions(+), 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 4aba92a4042a..eaf66e57d891 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2499,37 +2499,53 @@ static inline void *netdev_priv(const struct net_device *dev) */ #define NAPI_POLL_WEIGHT 64 +void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi, + int (*poll)(struct napi_struct *, int), int weight); + /** - * netif_napi_add - initialize a NAPI context - * @dev: network device - * @napi: NAPI context - * @poll: polling function - * @weight: default weight + * netif_napi_add() - initialize a NAPI context + * @dev: network device + * @napi: NAPI context + * @poll: polling function + * @weight: default weight * * netif_napi_add() must be used to initialize a NAPI context prior to calling * *any* of the other NAPI-related functions. */ -void netif_napi_add(struct net_device *dev, struct napi_struct *napi, - int (*poll)(struct napi_struct *, int), int weight); +static inline void +netif_napi_add(struct net_device *dev, struct napi_struct *napi, + int (*poll)(struct napi_struct *, int), int weight) +{ + netif_napi_add_weight(dev, napi, poll, weight); +} + +static inline void +netif_napi_add_tx_weight(struct net_device *dev, + struct napi_struct *napi, + int (*poll)(struct napi_struct *, int), + int weight) +{ + set_bit(NAPI_STATE_NO_BUSY_POLL, &napi->state); + netif_napi_add_weight(dev, napi, poll, weight); +} + +#define netif_tx_napi_add netif_napi_add_tx_weight /** - * netif_tx_napi_add - initialize a NAPI context - * @dev: network device - * @napi: NAPI context - * @poll: polling function - * @weight: default weight + * netif_napi_add_tx() - initialize a NAPI context to be used for Tx only + * @dev: network device + * @napi: NAPI context + * @poll: polling function * * This variant of netif_napi_add() should be used from drivers using NAPI * to exclusively poll a TX queue. * This will avoid we add it into napi_hash[], thus polluting this hash table. */ -static inline void netif_tx_napi_add(struct net_device *dev, +static inline void netif_napi_add_tx(struct net_device *dev, struct napi_struct *napi, - int (*poll)(struct napi_struct *, int), - int weight) + int (*poll)(struct napi_struct *, int)) { - set_bit(NAPI_STATE_NO_BUSY_POLL, &napi->state); - netif_napi_add(dev, napi, poll, weight); + netif_napi_add_tx_weight(dev, napi, poll, NAPI_POLL_WEIGHT); } /** -- cgit From c674df973ad8af2074c834788e167332d81309fa Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Tue, 8 Mar 2022 20:36:15 +0200 Subject: net/mlx5: Store IPsec ESN update work in XFRM state mlx5 IPsec code updated ESN through workqueue with allocation calls in the data path, which can be saved easily if the work is created during XFRM state initialization routine. The locking used later in the work didn't protect from anything because change of HW context is possible during XFRM state add or delete only, which can cancel work and make sure that it is not running. Reviewed-by: Raed Salem Signed-off-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- include/linux/mlx5/accel.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/accel.h b/include/linux/mlx5/accel.h index 0f2596297f6a..a2720ebbb9fd 100644 --- a/include/linux/mlx5/accel.h +++ b/include/linux/mlx5/accel.h @@ -127,8 +127,8 @@ struct mlx5_accel_esp_xfrm * mlx5_accel_esp_create_xfrm(struct mlx5_core_dev *mdev, const struct mlx5_accel_esp_xfrm_attrs *attrs); void mlx5_accel_esp_destroy_xfrm(struct mlx5_accel_esp_xfrm *xfrm); -int mlx5_accel_esp_modify_xfrm(struct mlx5_accel_esp_xfrm *xfrm, - const struct mlx5_accel_esp_xfrm_attrs *attrs); +void mlx5_accel_esp_modify_xfrm(struct mlx5_accel_esp_xfrm *xfrm, + const struct mlx5_accel_esp_xfrm_attrs *attrs); #else @@ -145,9 +145,11 @@ mlx5_accel_esp_create_xfrm(struct mlx5_core_dev *mdev, } static inline void mlx5_accel_esp_destroy_xfrm(struct mlx5_accel_esp_xfrm *xfrm) {} -static inline int +static inline void mlx5_accel_esp_modify_xfrm(struct mlx5_accel_esp_xfrm *xfrm, - const struct mlx5_accel_esp_xfrm_attrs *attrs) { return -EOPNOTSUPP; } + const struct mlx5_accel_esp_xfrm_attrs *attrs) +{ +} #endif /* CONFIG_MLX5_EN_IPSEC */ #endif /* __MLX5_ACCEL_H__ */ -- cgit From 2ea36e2e4ad2b77bd5d45142a529b72eb5ca9156 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Tue, 8 Mar 2022 20:55:00 +0200 Subject: net/mlx5: Remove useless validity check All callers build xfrm attributes with help of mlx5e_ipsec_build_accel_xfrm_attrs() function that ensure validity of attributes. There is no need to recheck them again. Reviewed-by: Raed Salem Signed-off-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- include/linux/mlx5/accel.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/accel.h b/include/linux/mlx5/accel.h index a2720ebbb9fd..9c511d466e55 100644 --- a/include/linux/mlx5/accel.h +++ b/include/linux/mlx5/accel.h @@ -36,10 +36,6 @@ #include -enum mlx5_accel_esp_aes_gcm_keymat_iv_algo { - MLX5_ACCEL_ESP_AES_GCM_IV_ALGO_SEQ, -}; - enum mlx5_accel_esp_flags { MLX5_ACCEL_ESP_FLAGS_TUNNEL = 0, /* Default */ MLX5_ACCEL_ESP_FLAGS_TRANSPORT = 1UL << 0, @@ -57,14 +53,9 @@ enum mlx5_accel_esp_keymats { MLX5_ACCEL_ESP_KEYMAT_AES_GCM, }; -enum mlx5_accel_esp_replay { - MLX5_ACCEL_ESP_REPLAY_NONE, - MLX5_ACCEL_ESP_REPLAY_BMP, -}; struct aes_gcm_keymat { u64 seq_iv; - enum mlx5_accel_esp_aes_gcm_keymat_iv_algo iv_algo; u32 salt; u32 icv_len; @@ -81,7 +72,6 @@ struct mlx5_accel_esp_xfrm_attrs { u32 tfc_pad; u32 flags; u32 sa_handle; - enum mlx5_accel_esp_replay replay_type; union { struct { u32 size; -- cgit From c6e3b421c7079af67201351c9faff62613e06f40 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Wed, 9 Mar 2022 10:35:05 +0200 Subject: net/mlx5: Merge various control path IPsec headers into one file The mlx5 IPsec code has logical separation between code that operates with XFRM objects (ipsec.c), HW objects (ipsec_offload.c), flow steering logic (ipsec_fs.c) and data path (ipsec_rxtx.c). Such separation makes sense for C-files, but isn't needed at all for H-files as they are included in batch anyway. Reviewed-by: Raed Salem Signed-off-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- include/linux/mlx5/accel.h | 145 --------------------------------------------- 1 file changed, 145 deletions(-) delete mode 100644 include/linux/mlx5/accel.h (limited to 'include/linux') diff --git a/include/linux/mlx5/accel.h b/include/linux/mlx5/accel.h deleted file mode 100644 index 9c511d466e55..000000000000 --- a/include/linux/mlx5/accel.h +++ /dev/null @@ -1,145 +0,0 @@ -/* - * Copyright (c) 2018 Mellanox Technologies. All rights reserved. - * - * This software is available to you under a choice of one of two - * licenses. You may choose to be licensed under the terms of the GNU - * General Public License (GPL) Version 2, available from the file - * COPYING in the main directory of this source tree, or the - * OpenIB.org BSD license below: - * - * Redistribution and use in source and binary forms, with or - * without modification, are permitted provided that the following - * conditions are met: - * - * - Redistributions of source code must retain the above - * copyright notice, this list of conditions and the following - * disclaimer. - * - * - Redistributions in binary form must reproduce the above - * copyright notice, this list of conditions and the following - * disclaimer in the documentation and/or other materials - * provided with the distribution. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, - * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF - * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND - * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS - * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN - * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE - * SOFTWARE. - * - */ - -#ifndef __MLX5_ACCEL_H__ -#define __MLX5_ACCEL_H__ - -#include - -enum mlx5_accel_esp_flags { - MLX5_ACCEL_ESP_FLAGS_TUNNEL = 0, /* Default */ - MLX5_ACCEL_ESP_FLAGS_TRANSPORT = 1UL << 0, - MLX5_ACCEL_ESP_FLAGS_ESN_TRIGGERED = 1UL << 1, - MLX5_ACCEL_ESP_FLAGS_ESN_STATE_OVERLAP = 1UL << 2, -}; - -enum mlx5_accel_esp_action { - MLX5_ACCEL_ESP_ACTION_DECRYPT, - MLX5_ACCEL_ESP_ACTION_ENCRYPT, -}; - -enum mlx5_accel_esp_keymats { - MLX5_ACCEL_ESP_KEYMAT_AES_NONE, - MLX5_ACCEL_ESP_KEYMAT_AES_GCM, -}; - - -struct aes_gcm_keymat { - u64 seq_iv; - - u32 salt; - u32 icv_len; - - u32 key_len; - u32 aes_key[256 / 32]; -}; - -struct mlx5_accel_esp_xfrm_attrs { - enum mlx5_accel_esp_action action; - u32 esn; - __be32 spi; - u32 seq; - u32 tfc_pad; - u32 flags; - u32 sa_handle; - union { - struct { - u32 size; - - } bmp; - } replay; - enum mlx5_accel_esp_keymats keymat_type; - union { - struct aes_gcm_keymat aes_gcm; - } keymat; - - union { - __be32 a4; - __be32 a6[4]; - } saddr; - - union { - __be32 a4; - __be32 a6[4]; - } daddr; - - u8 is_ipv6; -}; - -struct mlx5_accel_esp_xfrm { - struct mlx5_core_dev *mdev; - struct mlx5_accel_esp_xfrm_attrs attrs; -}; - -enum mlx5_accel_ipsec_cap { - MLX5_ACCEL_IPSEC_CAP_DEVICE = 1 << 0, - MLX5_ACCEL_IPSEC_CAP_ESP = 1 << 1, - MLX5_ACCEL_IPSEC_CAP_IPV6 = 1 << 2, - MLX5_ACCEL_IPSEC_CAP_LSO = 1 << 3, - MLX5_ACCEL_IPSEC_CAP_ESN = 1 << 4, -}; - -#ifdef CONFIG_MLX5_EN_IPSEC - -u32 mlx5_ipsec_device_caps(struct mlx5_core_dev *mdev); - -struct mlx5_accel_esp_xfrm * -mlx5_accel_esp_create_xfrm(struct mlx5_core_dev *mdev, - const struct mlx5_accel_esp_xfrm_attrs *attrs); -void mlx5_accel_esp_destroy_xfrm(struct mlx5_accel_esp_xfrm *xfrm); -void mlx5_accel_esp_modify_xfrm(struct mlx5_accel_esp_xfrm *xfrm, - const struct mlx5_accel_esp_xfrm_attrs *attrs); - -#else - -static inline u32 mlx5_ipsec_device_caps(struct mlx5_core_dev *mdev) -{ - return 0; -} - -static inline struct mlx5_accel_esp_xfrm * -mlx5_accel_esp_create_xfrm(struct mlx5_core_dev *mdev, - const struct mlx5_accel_esp_xfrm_attrs *attrs) -{ - return ERR_PTR(-EOPNOTSUPP); -} -static inline void -mlx5_accel_esp_destroy_xfrm(struct mlx5_accel_esp_xfrm *xfrm) {} -static inline void -mlx5_accel_esp_modify_xfrm(struct mlx5_accel_esp_xfrm *xfrm, - const struct mlx5_accel_esp_xfrm_attrs *attrs) -{ -} - -#endif /* CONFIG_MLX5_EN_IPSEC */ -#endif /* __MLX5_ACCEL_H__ */ -- cgit From 1c4a59b9fa98c222bd6a8fa82bff942312165c1a Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Sun, 27 Mar 2022 19:23:19 +0300 Subject: net/mlx5: Remove not-supported ICV length mlx5 doesn't allow to configure any AEAD ICV length other than 128, so remove the logic that configures other unsupported values. Reviewed-by: Raed Salem Signed-off-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 7f4ec9faa180..7bab3e51c61e 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -11379,8 +11379,6 @@ enum { enum { MLX5_IPSEC_OBJECT_ICV_LEN_16B, - MLX5_IPSEC_OBJECT_ICV_LEN_12B, - MLX5_IPSEC_OBJECT_ICV_LEN_8B, }; struct mlx5_ifc_ipsec_obj_bits { -- cgit From 31ab09b4218879bc394c9faa6da983a82a694600 Mon Sep 17 00:00:00 2001 From: Dipen Patel Date: Fri, 22 Apr 2022 13:52:13 -0700 Subject: drivers: Add hardware timestamp engine (HTE) subsystem Some devices can timestamp system lines/signals/Buses in real-time using the hardware counter or other hardware means which can give finer granularity and help avoid jitter introduced by software timestamping. To utilize such functionality, this patchset creates HTE subsystem where devices can register themselves as providers so that the consumers devices can request specific line from the providers. The patch also adds compilation support in Makefile and menu options in Kconfig. The provider does following: - Registers chip with the framework. - Provides translation hook to convert logical line id. - Provides enable/disable, request/release callbacks. - Pushes timestamp data to HTE subsystem. The consumer does following: - Initializes line attribute. - Gets HTE timestamp descriptor. - Requests timestamp functionality. - Puts HTE timestamp descriptor. Signed-off-by: Dipen Patel Reported-by: kernel test robot Signed-off-by: Thierry Reding --- include/linux/hte.h | 271 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 271 insertions(+) create mode 100644 include/linux/hte.h (limited to 'include/linux') diff --git a/include/linux/hte.h b/include/linux/hte.h new file mode 100644 index 000000000000..8289055061ab --- /dev/null +++ b/include/linux/hte.h @@ -0,0 +1,271 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __LINUX_HTE_H +#define __LINUX_HTE_H + +#include + +struct hte_chip; +struct hte_device; +struct of_phandle_args; + +/** + * enum hte_edge - HTE line edge flags. + * + * @HTE_EDGE_NO_SETUP: No edge setup. In this case consumer will setup edges, + * for example during request irq call. + * @HTE_RISING_EDGE_TS: Rising edge. + * @HTE_FALLING_EDGE_TS: Falling edge. + * + */ +enum hte_edge { + HTE_EDGE_NO_SETUP = 1U << 0, + HTE_RISING_EDGE_TS = 1U << 1, + HTE_FALLING_EDGE_TS = 1U << 2, +}; + +/** + * enum hte_return - HTE subsystem return values used during callback. + * + * @HTE_CB_HANDLED: The consumer handled the data. + * @HTE_RUN_SECOND_CB: The consumer needs further processing, in that case + * HTE subsystem calls secondary callback provided by the consumer where it + * is allowed to sleep. + */ +enum hte_return { + HTE_CB_HANDLED, + HTE_RUN_SECOND_CB, +}; + +/** + * struct hte_ts_data - HTE timestamp data. + * + * @tsc: Timestamp value. + * @seq: Sequence counter of the timestamps. + * @raw_level: Level of the line at the timestamp if provider supports it, + * -1 otherwise. + */ +struct hte_ts_data { + u64 tsc; + u64 seq; + int raw_level; +}; + +/** + * struct hte_clk_info - Clock source info that HTE provider uses to timestamp. + * + * @hz: Supported clock rate in HZ, for example 1KHz clock = 1000. + * @type: Supported clock type. + */ +struct hte_clk_info { + u64 hz; + clockid_t type; +}; + +/** + * typedef hte_ts_cb_t - HTE timestamp data processing primary callback. + * + * The callback is used to push timestamp data to the client and it is + * not allowed to sleep. + * + * @ts: HW timestamp data. + * @data: Client supplied data. + */ +typedef enum hte_return (*hte_ts_cb_t)(struct hte_ts_data *ts, void *data); + +/** + * typedef hte_ts_sec_cb_t - HTE timestamp data processing secondary callback. + * + * This is used when the client needs further processing where it is + * allowed to sleep. + * + * @data: Client supplied data. + * + */ +typedef enum hte_return (*hte_ts_sec_cb_t)(void *data); + +/** + * struct hte_line_attr - Line attributes. + * + * @line_id: The logical ID understood by the consumers and providers. + * @line_data: Line data related to line_id. + * @edge_flags: Edge setup flags. + * @name: Descriptive name of the entity that is being monitored for the + * hardware timestamping. If null, HTE core will construct the name. + * + */ +struct hte_line_attr { + u32 line_id; + void *line_data; + unsigned long edge_flags; + const char *name; +}; + +/** + * struct hte_ts_desc - HTE timestamp descriptor. + * + * This structure is a communication token between consumers to subsystem + * and subsystem to providers. + * + * @attr: The line attributes. + * @hte_data: Subsystem's private data, set by HTE subsystem. + */ +struct hte_ts_desc { + struct hte_line_attr attr; + void *hte_data; +}; + +/** + * struct hte_ops - HTE operations set by providers. + * + * @request: Hook for requesting a HTE timestamp. Returns 0 on success, + * non-zero for failures. + * @release: Hook for releasing a HTE timestamp. Returns 0 on success, + * non-zero for failures. + * @enable: Hook to enable the specified timestamp. Returns 0 on success, + * non-zero for failures. + * @disable: Hook to disable specified timestamp. Returns 0 on success, + * non-zero for failures. + * @get_clk_src_info: Hook to get the clock information the provider uses + * to timestamp. Returns 0 for success and negative error code for failure. On + * success HTE subsystem fills up provided struct hte_clk_info. + * + * xlated_id parameter is used to communicate between HTE subsystem and the + * providers and is translated by the provider. + */ +struct hte_ops { + int (*request)(struct hte_chip *chip, struct hte_ts_desc *desc, + u32 xlated_id); + int (*release)(struct hte_chip *chip, struct hte_ts_desc *desc, + u32 xlated_id); + int (*enable)(struct hte_chip *chip, u32 xlated_id); + int (*disable)(struct hte_chip *chip, u32 xlated_id); + int (*get_clk_src_info)(struct hte_chip *chip, + struct hte_clk_info *ci); +}; + +/** + * struct hte_chip - Abstract HTE chip. + * + * @name: functional name of the HTE IP block. + * @dev: device providing the HTE. + * @ops: callbacks for this HTE. + * @nlines: number of lines/signals supported by this chip. + * @xlate_of: Callback which translates consumer supplied logical ids to + * physical ids, return 0 for the success and negative for the failures. + * It stores (between 0 to @nlines) in xlated_id parameter for the success. + * @xlate_plat: Same as above but for the consumers with no DT node. + * @match_from_linedata: Match HTE device using the line_data. + * @of_hte_n_cells: Number of cells used to form the HTE specifier. + * @gdev: HTE subsystem abstract device, internal to the HTE subsystem. + * @data: chip specific private data. + */ +struct hte_chip { + const char *name; + struct device *dev; + const struct hte_ops *ops; + u32 nlines; + int (*xlate_of)(struct hte_chip *gc, + const struct of_phandle_args *args, + struct hte_ts_desc *desc, u32 *xlated_id); + int (*xlate_plat)(struct hte_chip *gc, struct hte_ts_desc *desc, + u32 *xlated_id); + bool (*match_from_linedata)(const struct hte_chip *chip, + const struct hte_ts_desc *hdesc); + u8 of_hte_n_cells; + + struct hte_device *gdev; + void *data; +}; + +#if IS_ENABLED(CONFIG_HTE) +/* HTE APIs for the providers */ +int devm_hte_register_chip(struct hte_chip *chip); +int hte_push_ts_ns(const struct hte_chip *chip, u32 xlated_id, + struct hte_ts_data *data); + +/* HTE APIs for the consumers */ +int hte_init_line_attr(struct hte_ts_desc *desc, u32 line_id, + unsigned long edge_flags, const char *name, + void *data); +int hte_ts_get(struct device *dev, struct hte_ts_desc *desc, int index); +int hte_ts_put(struct hte_ts_desc *desc); +int hte_request_ts_ns(struct hte_ts_desc *desc, hte_ts_cb_t cb, + hte_ts_sec_cb_t tcb, void *data); +int devm_hte_request_ts_ns(struct device *dev, struct hte_ts_desc *desc, + hte_ts_cb_t cb, hte_ts_sec_cb_t tcb, void *data); +int of_hte_req_count(struct device *dev); +int hte_enable_ts(struct hte_ts_desc *desc); +int hte_disable_ts(struct hte_ts_desc *desc); +int hte_get_clk_src_info(const struct hte_ts_desc *desc, + struct hte_clk_info *ci); + +#else /* !CONFIG_HTE */ +static inline int devm_hte_register_chip(struct hte_chip *chip) +{ + return -EOPNOTSUPP; +} + +static inline int hte_push_ts_ns(const struct hte_chip *chip, + u32 xlated_id, + const struct hte_ts_data *data) +{ + return -EOPNOTSUPP; +} + +static inline int hte_init_line_attr(struct hte_ts_desc *desc, u32 line_id, + unsigned long edge_flags, + const char *name, void *data) +{ + return -EOPNOTSUPP; +} + +static inline int hte_ts_get(struct device *dev, struct hte_ts_desc *desc, + int index) +{ + return -EOPNOTSUPP; +} + +static inline int hte_ts_put(struct hte_ts_desc *desc) +{ + return -EOPNOTSUPP; +} + +static inline int hte_request_ts_ns(struct hte_ts_desc *desc, hte_ts_cb_t cb, + hte_ts_sec_cb_t tcb, void *data) +{ + return -EOPNOTSUPP; +} + +static inline int devm_hte_request_ts_ns(struct device *dev, + struct hte_ts_desc *desc, + hte_ts_cb_t cb, + hte_ts_sec_cb_t tcb, + void *data) +{ + return -EOPNOTSUPP; +} + +static inline int of_hte_req_count(struct device *dev) +{ + return -EOPNOTSUPP; +} + +static inline int hte_enable_ts(struct hte_ts_desc *desc) +{ + return -EOPNOTSUPP; +} + +static inline int hte_disable_ts(struct hte_ts_desc *desc) +{ + return -EOPNOTSUPP; +} + +static inline int hte_get_clk_src_info(const struct hte_ts_desc *desc, + struct hte_clk_info *ci) +{ + return -EOPNOTSUPP; +} +#endif /* !CONFIG_HTE */ + +#endif -- cgit From 42112dd77b74220e6a1f4a71bb51ca3f583d3842 Mon Sep 17 00:00:00 2001 From: Dipen Patel Date: Fri, 22 Apr 2022 13:52:16 -0700 Subject: gpiolib: Add HTE support Some GPIO chip can provide hardware timestamp support on its GPIO lines , in order to support that, additional API needs to be added which can talk to both GPIO chip and HTE (hardware timestamping engine) providers if there is any dependencies. This patch introduces optional hooks to enable and disable hardware timestamping related features in the GPIO controller chip. Signed-off-by: Dipen Patel Reviewed-by: Linus Walleij Reported-by: kernel test robot Signed-off-by: Thierry Reding --- include/linux/gpio/consumer.h | 16 ++++++++++++++-- include/linux/gpio/driver.h | 10 ++++++++++ 2 files changed, 24 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gpio/consumer.h b/include/linux/gpio/consumer.h index c3aa8b330e1c..7eaec081ae6c 100644 --- a/include/linux/gpio/consumer.h +++ b/include/linux/gpio/consumer.h @@ -109,6 +109,8 @@ int gpiod_get_direction(struct gpio_desc *desc); int gpiod_direction_input(struct gpio_desc *desc); int gpiod_direction_output(struct gpio_desc *desc, int value); int gpiod_direction_output_raw(struct gpio_desc *desc, int value); +int gpiod_enable_hw_timestamp_ns(struct gpio_desc *desc, unsigned long flags); +int gpiod_disable_hw_timestamp_ns(struct gpio_desc *desc, unsigned long flags); /* Value get/set from non-sleeping context */ int gpiod_get_value(const struct gpio_desc *desc); @@ -350,8 +352,18 @@ static inline int gpiod_direction_output_raw(struct gpio_desc *desc, int value) WARN_ON(desc); return -ENOSYS; } - - +static inline int gpiod_enable_hw_timestamp_ns(struct gpio_desc *desc, + unsigned long flags) +{ + WARN_ON(desc); + return -ENOSYS; +} +static inline int gpiod_disable_hw_timestamp_ns(struct gpio_desc *desc, + unsigned long flags) +{ + WARN_ON(desc); + return -ENOSYS; +} static inline int gpiod_get_value(const struct gpio_desc *desc) { /* GPIO can never have been requested */ diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h index 98c93510640e..0ce96ebdaa36 100644 --- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -323,6 +323,10 @@ struct gpio_irq_chip { * @add_pin_ranges: optional routine to initialize pin ranges, to be used when * requires special mapping of the pins that provides GPIO functionality. * It is called after adding GPIO chip and before adding IRQ chip. + * @en_hw_timestamp: Dependent on GPIO chip, an optional routine to + * enable hardware timestamp. + * @dis_hw_timestamp: Dependent on GPIO chip, an optional routine to + * disable hardware timestamp. * @base: identifies the first GPIO number handled by this chip; * or, if negative during registration, requests dynamic ID allocation. * DEPRECATION: providing anything non-negative and nailing the base @@ -419,6 +423,12 @@ struct gpio_chip { int (*add_pin_ranges)(struct gpio_chip *gc); + int (*en_hw_timestamp)(struct gpio_chip *gc, + u32 offset, + unsigned long flags); + int (*dis_hw_timestamp)(struct gpio_chip *gc, + u32 offset, + unsigned long flags); int base; u16 ngpio; u16 offset; -- cgit From 3ae2722c93c916b47bfe47a4326e88cc4abb9bf4 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Wed, 27 Apr 2022 14:55:57 +0200 Subject: mmc: mmci: Remove custom ios handler The custom boardfile ios handler isn't used anywhere in the kernel. Delete it. Signed-off-by: Linus Walleij Link: https://lore.kernel.org/r/20220427125557.1608825-1-linus.walleij@linaro.org Signed-off-by: Ulf Hansson --- include/linux/amba/mmci.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/amba/mmci.h b/include/linux/amba/mmci.h index c92ebc39fc1f..6f96dc2209c0 100644 --- a/include/linux/amba/mmci.h +++ b/include/linux/amba/mmci.h @@ -13,17 +13,11 @@ * @ocr_mask: available voltages on the 4 pins from the block, this * is ignored if a regulator is used, see the MMC_VDD_* masks in * mmc/host.h - * @ios_handler: a callback function to act on specfic ios changes, - * used for example to control a levelshifter - * mask into a value to be binary (or set some other custom bits - * in MMCIPWR) or:ed and written into the MMCIPWR register of the - * block. May also control external power based on the power_mode. * @status: if no GPIO line was given to the block in this function will * be called to determine whether a card is present in the MMC slot or not */ struct mmci_platform_data { unsigned int ocr_mask; - int (*ios_handler)(struct device *, struct mmc_ios *); unsigned int (*status)(struct device *); }; -- cgit From 00ce3873f730fb24c657854ec04374358e711838 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Wed, 4 May 2022 10:17:32 +0200 Subject: opp: Add apis to retrieve opps with interconnect bandwidth Add dev_pm_opp_find_bw_ceil and dev_pm_opp_find_bw_floor to retrieve opps based on interconnect associated with the opp and bandwidth. The index variable is the index of the interconnect as specified in the opp table in Devicetree. Co-developed-by: Thara Gopinath Signed-off-by: Thara Gopinath Signed-off-by: Krzysztof Kozlowski Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 0d85a63a1f78..dcea178868c9 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -129,6 +129,13 @@ struct dev_pm_opp *dev_pm_opp_find_freq_ceil_by_volt(struct device *dev, struct dev_pm_opp *dev_pm_opp_find_freq_ceil(struct device *dev, unsigned long *freq); + +struct dev_pm_opp *dev_pm_opp_find_bw_ceil(struct device *dev, + unsigned int *bw, int index); + +struct dev_pm_opp *dev_pm_opp_find_bw_floor(struct device *dev, + unsigned int *bw, int index); + void dev_pm_opp_put(struct dev_pm_opp *opp); int dev_pm_opp_add(struct device *dev, unsigned long freq, @@ -279,6 +286,18 @@ static inline struct dev_pm_opp *dev_pm_opp_find_freq_ceil(struct device *dev, return ERR_PTR(-EOPNOTSUPP); } +static inline struct dev_pm_opp *dev_pm_opp_find_bw_ceil(struct device *dev, + unsigned int *bw, int index) +{ + return ERR_PTR(-EOPNOTSUPP); +} + +static inline struct dev_pm_opp *dev_pm_opp_find_bw_floor(struct device *dev, + unsigned int *bw, int index) +{ + return ERR_PTR(-EOPNOTSUPP); +} + static inline void dev_pm_opp_put(struct dev_pm_opp *opp) {} static inline int dev_pm_opp_add(struct device *dev, unsigned long freq, -- cgit From 22079af7df5a5dfef1c4d160abfd43035211759e Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Wed, 4 May 2022 15:40:22 +0530 Subject: opp: Reorder definition of ceil/floor helpers Reorder the helpers to keep all freq specific ones, followed by level and bw. No functional change. Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index dcea178868c9..6708b4ec244d 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -117,16 +117,16 @@ unsigned long dev_pm_opp_get_suspend_opp_freq(struct device *dev); struct dev_pm_opp *dev_pm_opp_find_freq_exact(struct device *dev, unsigned long freq, bool available); -struct dev_pm_opp *dev_pm_opp_find_level_exact(struct device *dev, - unsigned int level); -struct dev_pm_opp *dev_pm_opp_find_level_ceil(struct device *dev, - unsigned int *level); - struct dev_pm_opp *dev_pm_opp_find_freq_floor(struct device *dev, unsigned long *freq); struct dev_pm_opp *dev_pm_opp_find_freq_ceil_by_volt(struct device *dev, unsigned long u_volt); +struct dev_pm_opp *dev_pm_opp_find_level_exact(struct device *dev, + unsigned int level); +struct dev_pm_opp *dev_pm_opp_find_level_ceil(struct device *dev, + unsigned int *level); + struct dev_pm_opp *dev_pm_opp_find_freq_ceil(struct device *dev, unsigned long *freq); @@ -250,12 +250,6 @@ static inline unsigned long dev_pm_opp_get_suspend_opp_freq(struct device *dev) return 0; } -static inline struct dev_pm_opp *dev_pm_opp_find_freq_exact(struct device *dev, - unsigned long freq, bool available) -{ - return ERR_PTR(-EOPNOTSUPP); -} - static inline struct dev_pm_opp *dev_pm_opp_find_level_exact(struct device *dev, unsigned int level) { @@ -268,6 +262,12 @@ static inline struct dev_pm_opp *dev_pm_opp_find_level_ceil(struct device *dev, return ERR_PTR(-EOPNOTSUPP); } +static inline struct dev_pm_opp *dev_pm_opp_find_freq_exact(struct device *dev, + unsigned long freq, bool available) +{ + return ERR_PTR(-EOPNOTSUPP); +} + static inline struct dev_pm_opp *dev_pm_opp_find_freq_floor(struct device *dev, unsigned long *freq) { -- cgit From 34453c2e9f799d02f5f379519495208bbd96a935 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Tue, 5 Apr 2022 19:23:24 +0100 Subject: irqchip/gic-v3: Exposes bit values for GICR_CTLR.{IR, CES} As we're about to expose GICR_CTLR.{IR,CES} to guests, populate the include file with the architectural values. Signed-off-by: Marc Zyngier Reviewed-by: Oliver Upton Link: https://lore.kernel.org/r/20220405182327.205520-2-maz@kernel.org --- include/linux/irqchip/arm-gic-v3.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h index 12d91f0dedf9..728691365464 100644 --- a/include/linux/irqchip/arm-gic-v3.h +++ b/include/linux/irqchip/arm-gic-v3.h @@ -127,6 +127,8 @@ #define GICR_PIDR2 GICD_PIDR2 #define GICR_CTLR_ENABLE_LPIS (1UL << 0) +#define GICR_CTLR_CES (1UL << 1) +#define GICR_CTLR_IR (1UL << 2) #define GICR_CTLR_RWP (1UL << 3) #define GICR_TYPER_CPU_NUMBER(r) (((r) >> 8) & 0xffff) -- cgit From ec69dfbdc426f22a9557e5c5408d7902fe0e0144 Mon Sep 17 00:00:00 2001 From: Bjorn Andersson Date: Mon, 2 May 2022 14:54:06 -0700 Subject: soc: qcom: llcc: Add sc8180x and sc8280xp configurations Add LLCC configuration data for the SC8180X and SC8280XP platforms, based on the downstream tables. Signed-off-by: Bjorn Andersson Reviewed-by: Sai Prakash Ranjan Link: https://lore.kernel.org/r/20220502215406.612967-3-bjorn.andersson@linaro.org --- include/linux/soc/qcom/llcc-qcom.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/qcom/llcc-qcom.h b/include/linux/soc/qcom/llcc-qcom.h index 0bc21ee58fac..9ed5384c5ca1 100644 --- a/include/linux/soc/qcom/llcc-qcom.h +++ b/include/linux/soc/qcom/llcc-qcom.h @@ -29,6 +29,8 @@ #define LLCC_AUDHW 22 #define LLCC_NPU 23 #define LLCC_WLHW 24 +#define LLCC_PIMEM 25 +#define LLCC_DRE 26 #define LLCC_CVP 28 #define LLCC_MODPE 29 #define LLCC_APTCM 30 -- cgit From 170f37d6aa6ad4582eefd7459015de79e244536e Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 3 May 2022 00:09:31 -0400 Subject: block: Do not call folio_next() on an unreferenced folio It is unsafe to call folio_next() on a folio unless you hold a reference on it that prevents it from being split or freed. After returning from the iterator, iomap calls folio_end_writeback() which may drop the last reference to the page, or allow the page to be split. If that happens, the iterator will not advance far enough through the bio_vec, leading to assertion failures like the BUG() in folio_end_writeback() that checks we're not trying to end writeback on a page not currently under writeback. Other assertion failures were also seen, but they're all explained by this one bug. Fix the bug by remembering where the next folio starts before returning from the iterator. There are other ways of fixing this bug, but this seems the simplest. Reported-by: Darrick J. Wong Tested-by: Darrick J. Wong Reported-by: Brian Foster Tested-by: Brian Foster Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/bio.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index 278cc81cc1e7..00450fd86bb4 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -269,6 +269,7 @@ struct folio_iter { size_t offset; size_t length; /* private: for use by the iterator */ + struct folio *_next; size_t _seg_count; int _i; }; @@ -283,6 +284,7 @@ static inline void bio_first_folio(struct folio_iter *fi, struct bio *bio, PAGE_SIZE * (bvec->bv_page - &fi->folio->page); fi->_seg_count = bvec->bv_len; fi->length = min(folio_size(fi->folio) - fi->offset, fi->_seg_count); + fi->_next = folio_next(fi->folio); fi->_i = i; } @@ -290,9 +292,10 @@ static inline void bio_next_folio(struct folio_iter *fi, struct bio *bio) { fi->_seg_count -= fi->length; if (fi->_seg_count) { - fi->folio = folio_next(fi->folio); + fi->folio = fi->_next; fi->offset = 0; fi->length = min(folio_size(fi->folio), fi->_seg_count); + fi->_next = folio_next(fi->folio); } else if (fi->_i + 1 < bio->bi_vcnt) { bio_first_folio(fi, bio, fi->_i + 1); } else { -- cgit From 8e1de7042596abb7cb277ea751fc13a4c2b65aea Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Sun, 13 Feb 2022 16:44:45 +0200 Subject: thunderbolt: Add support for XDomain lane bonding The USB4 Inter-Domain Service specification defines a protocol that can be used to establish lane bonding between two USB4 domains (hosts). So far we have not implemented it because the host controller DMA was not fast enough to be able to go over 20 Gbits/s even if lanes were bonded. However, starting from Intel Alder Lake CPUs the DMA can go over 20 Gbits/s so now it makes more sense to add this support to the driver. Because both ends need to negotiate the bonding we add a simple state machine that tracks the connection state and does the necessary steps described by the USB4 Inter-Domain Service specification. We only establish lane bonding when both sides of the link support it. Otherwise we default to use the single lane. Also this is only done when software connection manager is used. On systems with firmware based connection manager, it handles the high-speed tunneling so bonding lanes is specific to the implementation (Intel firmware based connection manager does not support lane bonding). Signed-off-by: Mika Westerberg --- include/linux/thunderbolt.h | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/thunderbolt.h b/include/linux/thunderbolt.h index 124e13cb1469..e13fe15e6a51 100644 --- a/include/linux/thunderbolt.h +++ b/include/linux/thunderbolt.h @@ -198,15 +198,15 @@ void tb_unregister_property_dir(const char *key, struct tb_property_dir *dir); * @local_property_block_len: Length of the @local_property_block in dwords * @remote_properties: Properties exported by the remote domain * @remote_property_block_gen: Generation of @remote_properties - * @get_uuid_work: Work used to retrieve @remote_uuid - * @uuid_retries: Number of times left @remote_uuid is requested before - * giving up - * @get_properties_work: Work used to get remote domain properties - * @properties_retries: Number of times left to read properties + * @state: Next XDomain discovery state to run + * @state_work: Work used to run the next state + * @state_retries: Number of retries remain for the state * @properties_changed_work: Work used to notify the remote domain that * our properties have changed * @properties_changed_retries: Number of times left to send properties * changed notification + * @bonding_possible: True if lane bonding is possible on local side + * @target_link_width: Target link width from the remote host * @link: Root switch link the remote domain is connected (ICM only) * @depth: Depth in the chain the remote domain is connected (ICM only) * @@ -244,12 +244,13 @@ struct tb_xdomain { u32 local_property_block_len; struct tb_property_dir *remote_properties; u32 remote_property_block_gen; - struct delayed_work get_uuid_work; - int uuid_retries; - struct delayed_work get_properties_work; - int properties_retries; + int state; + struct delayed_work state_work; + int state_retries; struct delayed_work properties_changed_work; int properties_changed_retries; + bool bonding_possible; + u8 target_link_width; u8 link; u8 depth; }; -- cgit From 4b439e25e29ec336c0e71ef1d1b212c412526518 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Sat, 9 Apr 2022 19:17:27 +0200 Subject: mm, hugetlbfs: Allow an arch to always use generic versions of get_unmapped_area functions Unlike most architectures, powerpc can only define at runtime if it is going to use the generic arch_get_unmapped_area() or not. Today, powerpc has a copy of the generic arch_get_unmapped_area() because when selection HAVE_ARCH_UNMAPPED_AREA the generic arch_get_unmapped_area() is not available. Rename it generic_get_unmapped_area() and make it independent of HAVE_ARCH_UNMAPPED_AREA. Do the same for arch_get_unmapped_area_topdown() versus HAVE_ARCH_UNMAPPED_AREA_TOPDOWN. Do the same for hugetlb_get_unmapped_area() versus HAVE_ARCH_HUGETLB_UNMAPPED_AREA. Signed-off-by: Christophe Leroy Reviewed-by: Nicholas Piggin Acked-by: Andrew Morton Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/77f9d3e592f1c8511df9381aa1c4e754651da4d1.1649523076.git.christophe.leroy@csgroup.eu --- include/linux/hugetlb.h | 5 +++++ include/linux/sched/mm.h | 9 +++++++++ 2 files changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index ac2a1d758a80..bc97c8458df6 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -519,6 +519,11 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr, unsigned long flags); #endif /* HAVE_ARCH_HUGETLB_UNMAPPED_AREA */ +unsigned long +generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr, + unsigned long len, unsigned long pgoff, + unsigned long flags); + /* * huegtlb page specific state flags. These flags are located in page.private * of the hugetlb head page. Functions created via the below macros should be diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index 1ad1f4bfa025..da3a32d05b34 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -153,6 +153,15 @@ extern unsigned long arch_get_unmapped_area_topdown(struct file *filp, unsigned long addr, unsigned long len, unsigned long pgoff, unsigned long flags); + +unsigned long +generic_get_unmapped_area(struct file *filp, unsigned long addr, + unsigned long len, unsigned long pgoff, + unsigned long flags); +unsigned long +generic_get_unmapped_area_topdown(struct file *filp, unsigned long addr, + unsigned long len, unsigned long pgoff, + unsigned long flags); #else static inline void arch_pick_mmap_layout(struct mm_struct *mm, struct rlimit *rlim_stack) {} -- cgit From 2cb4de085f383cb9289083d1bedbaad046f640eb Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Sat, 9 Apr 2022 19:17:28 +0200 Subject: mm: Add len and flags parameters to arch_get_mmap_end() Powerpc needs flags and len to make decision on arch_get_mmap_end(). So add them as parameters to arch_get_mmap_end(). Signed-off-by: Christophe Leroy Acked-by: Catalin Marinas Acked-by: Andrew Morton Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/b556daabe7d2bdb2361c4d6130280da7c1ba2c14.1649523076.git.christophe.leroy@csgroup.eu --- include/linux/sched/mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index da3a32d05b34..8cd975a8bfeb 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -137,7 +137,7 @@ static inline void mm_update_next_owner(struct mm_struct *mm) #ifdef CONFIG_MMU #ifndef arch_get_mmap_end -#define arch_get_mmap_end(addr) (TASK_SIZE) +#define arch_get_mmap_end(addr, len, flags) (TASK_SIZE) #endif #ifndef arch_get_mmap_base -- cgit From d77e745613680c54708470402e2b623dcd769681 Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Sat, 30 Apr 2022 04:51:44 +0200 Subject: regmap: Add bulk read/write callbacks into regmap_config Currently the regmap_config structure only allows the user to implement single element register read/write using .reg_read/.reg_write callbacks. The regmap_bus already implements bulk counterparts of both, and is being misused as a workaround for the missing bulk read/write callbacks in regmap_config by a couple of drivers. To stop this misuse, add the bulk read/write callbacks to regmap_config and call them from the regmap core code. Signed-off-by: Marek Vasut Cc: Jagan Teki Cc: Mark Brown Cc: Maxime Ripard Cc: Robert Foss Cc: Sam Ravnborg Cc: Thomas Zimmermann To: dri-devel@lists.freedesktop.org Link: https://lore.kernel.org/r/20220430025145.640305-1-marex@denx.de Signed-off-by: Mark Brown --- include/linux/regmap.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/regmap.h b/include/linux/regmap.h index de81a94d7b30..8952fa3d0d59 100644 --- a/include/linux/regmap.h +++ b/include/linux/regmap.h @@ -299,6 +299,12 @@ typedef void (*regmap_unlock)(void *); * if the function require special handling with lock and reg * handling and the operation cannot be represented as a simple * update_bits operation on a bus such as SPI, I2C, etc. + * @read: Optional callback that if filled will be used to perform all the + * bulk reads from the registers. Data is returned in the buffer used + * to transmit data. + * @write: Same as above for writing. + * @max_raw_read: Max raw read size that can be used on the device. + * @max_raw_write: Max raw write size that can be used on the device. * @fast_io: Register IO is fast. Use a spinlock instead of a mutex * to perform locking. This field is ignored if custom lock/unlock * functions are used (see fields lock/unlock of struct regmap_config). @@ -385,6 +391,12 @@ struct regmap_config { int (*reg_write)(void *context, unsigned int reg, unsigned int val); int (*reg_update_bits)(void *context, unsigned int reg, unsigned int mask, unsigned int val); + /* Bulk read/write */ + int (*read)(void *context, const void *reg_buf, size_t reg_size, + void *val_buf, size_t val_size); + int (*write)(void *context, const void *data, size_t count); + size_t max_raw_read; + size_t max_raw_write; bool fast_io; -- cgit From 6d5f2207447b28dc73c25b3907e7ee32ee66bdbd Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Mon, 2 May 2022 19:08:27 +0200 Subject: gpio: max732x: Drop unused support for irq and setup code via platform data MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The only user of max732x_platform_data is arch/arm/mach-pxa/littleton.c and it only uses .gpio_base. So drop the other members from the data struct and simplify the driver accordingly. The motivating side effect of this change is that the .remove() callback cannot return a nonzero error code any more which prepares making i2c remove callbacks return void. Signed-off-by: Uwe Kleine-König Signed-off-by: Bartosz Golaszewski --- include/linux/platform_data/max732x.h | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/max732x.h b/include/linux/platform_data/max732x.h index f231c635faec..423999207cd5 100644 --- a/include/linux/platform_data/max732x.h +++ b/include/linux/platform_data/max732x.h @@ -7,17 +7,5 @@ struct max732x_platform_data { /* number of the first GPIO */ unsigned gpio_base; - - /* interrupt base */ - int irq_base; - - void *context; /* param to setup/teardown */ - - int (*setup)(struct i2c_client *client, - unsigned gpio, unsigned ngpio, - void *context); - int (*teardown)(struct i2c_client *client, - unsigned gpio, unsigned ngpio, - void *context); }; #endif /* __LINUX_I2C_MAX732X_H */ -- cgit From 8221ecd4e4620cf2f0e942cafcdecac1685f8f16 Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Thu, 14 Apr 2022 15:04:27 +0200 Subject: PCI/PM: Drop the runtime_d3cold device flag The runtime_d3cold flag is not needed any more, so drop it. Link: https://lore.kernel.org/r/8077784.T7Z3S40VBb@kreacher Signed-off-by: Rafael J. Wysocki Signed-off-by: Bjorn Helgaas Reviewed-by: Mika Westerberg --- include/linux/pci.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pci.h b/include/linux/pci.h index 60adf42460ab..3266ac08f8ec 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -379,10 +379,6 @@ struct pci_dev { unsigned int mmio_always_on:1; /* Disallow turning off io/mem decoding during BAR sizing */ unsigned int wakeup_prepared:1; - unsigned int runtime_d3cold:1; /* Whether go through runtime - D3cold, not set for devices - powered on/off by the - corresponding bridge */ unsigned int skip_bus_pm:1; /* Internal: Skip bus-level PM */ unsigned int ignore_hotplug:1; /* Ignore hotplug events */ unsigned int hotplug_user_indicators:1; /* SlotCtl indicators -- cgit From 26817fb7b066b21c66cd41d75ed2137e046045be Mon Sep 17 00:00:00 2001 From: Alex Deucher Date: Wed, 4 May 2022 09:27:10 -0400 Subject: Revert "fbdev: fbmem: add a helper to determine if an aperture is used by a fw fb" This reverts commit 9a45ac2320d0a6ae01880a30d4b86025fce4061b. This was added a helper for amdgpu to workaround a runtime pm regression caused by a runtime pm fix in efifb. We now have a better workaround in amdgpu in commit f95af4a9236695 ("drm/amdgpu: don't runtime suspend if there are displays attached (v3)") so this workaround is no longer necessary. Since amdgpu was the only user of this interface, we can remove it. Reviewed-by: Javier Martinez Canillas Acked-by: Daniel Vetter Signed-off-by: Alex Deucher --- include/linux/fb.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fb.h b/include/linux/fb.h index 9a77ab615c36..147d582dab41 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -612,7 +612,6 @@ extern int remove_conflicting_pci_framebuffers(struct pci_dev *pdev, const char *name); extern int remove_conflicting_framebuffers(struct apertures_struct *a, const char *name, bool primary); -extern bool is_firmware_framebuffer(struct apertures_struct *a); extern int fb_prepare_logo(struct fb_info *fb_info, int rotate); extern int fb_show_logo(struct fb_info *fb_info, int rotate); extern char* fb_get_buffer_offset(struct fb_info *info, struct fb_pixmap *buf, u32 size); -- cgit From c67b627e99affe4865b13557d68152ad4c35515c Mon Sep 17 00:00:00 2001 From: David Ahern Date: Wed, 4 May 2022 10:09:47 -0700 Subject: net: Make msg_zerocopy_alloc static msg_zerocopy_alloc is only used by msg_zerocopy_realloc; remove the export and make static in skbuff.c Signed-off-by: David Ahern Acked-by: Jonathan Lemon Link: https://lore.kernel.org/r/20220504170947.18773-1-dsahern@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 3270cb72e4d8..5c2599e3fe7d 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1665,7 +1665,6 @@ static inline void skb_set_end_offset(struct sk_buff *skb, unsigned int offset) } #endif -struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size); struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size, struct ubuf_info *uarg); -- cgit From 85db6352fc8a158a893151baa1716463d34a20d0 Mon Sep 17 00:00:00 2001 From: Tariq Toukan Date: Wed, 4 May 2022 11:09:14 +0300 Subject: net: Fix features skip in for_each_netdev_feature() The find_next_netdev_feature() macro gets the "remaining length", not bit index. Passing "bit - 1" for the following iteration is wrong as it skips the adjacent bit. Pass "bit" instead. Fixes: 3b89ea9c5902 ("net: Fix for_each_netdev_feature on Big endian") Signed-off-by: Tariq Toukan Reviewed-by: Gal Pressman Link: https://lore.kernel.org/r/20220504080914.1918-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski --- include/linux/netdev_features.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h index 2c6b9e416225..7c2d77d75a88 100644 --- a/include/linux/netdev_features.h +++ b/include/linux/netdev_features.h @@ -169,7 +169,7 @@ enum { #define NETIF_F_HW_HSR_FWD __NETIF_F(HW_HSR_FWD) #define NETIF_F_HW_HSR_DUP __NETIF_F(HW_HSR_DUP) -/* Finds the next feature with the highest number of the range of start till 0. +/* Finds the next feature with the highest number of the range of start-1 till 0. */ static inline int find_next_netdev_feature(u64 feature, unsigned long start) { @@ -188,7 +188,7 @@ static inline int find_next_netdev_feature(u64 feature, unsigned long start) for ((bit) = find_next_netdev_feature((mask_addr), \ NETDEV_FEATURE_COUNT); \ (bit) >= 0; \ - (bit) = find_next_netdev_feature((mask_addr), (bit) - 1)) + (bit) = find_next_netdev_feature((mask_addr), (bit))) /* Features valid for ethtool to change */ /* = all defined minus driver/device-class-related */ -- cgit From bb17d110cbf270d5247a6e261c5ad50e362d1675 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Fri, 29 Apr 2022 21:59:45 +0200 Subject: rpmsg: Fix calling device_lock() on non-initialized device driver_set_override() helper uses device_lock() so it should not be called before rpmsg_register_device() (which calls device_register()). Effect can be seen with CONFIG_DEBUG_MUTEXES: DEBUG_LOCKS_WARN_ON(lock->magic != lock) WARNING: CPU: 3 PID: 57 at kernel/locking/mutex.c:582 __mutex_lock+0x1ec/0x430 ... Call trace: __mutex_lock+0x1ec/0x430 mutex_lock_nested+0x44/0x50 driver_set_override+0x124/0x150 qcom_glink_native_probe+0x30c/0x3b0 glink_rpm_probe+0x274/0x350 platform_probe+0x6c/0xe0 really_probe+0x17c/0x3d0 __driver_probe_device+0x114/0x190 driver_probe_device+0x3c/0xf0 ... Refactor the rpmsg_register_device() function to use two-step device registering (initialization + add) and call driver_set_override() in proper moment. This moves the code around, so while at it also NULL-ify the rpdev->driver_override in error path to be sure it won't be kfree() second time. Fixes: 42cd402b8fd4 ("rpmsg: Fix kfree() of static memory on setting driver_override") Reported-by: Marek Szyprowski Signed-off-by: Krzysztof Kozlowski Tested-by: Marek Szyprowski Link: https://lore.kernel.org/r/20220429195946.1061725-2-krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/rpmsg.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rpmsg.h b/include/linux/rpmsg.h index 20c8cd1cde21..523c98b96cb4 100644 --- a/include/linux/rpmsg.h +++ b/include/linux/rpmsg.h @@ -165,6 +165,8 @@ static inline __rpmsg64 cpu_to_rpmsg64(struct rpmsg_device *rpdev, u64 val) #if IS_ENABLED(CONFIG_RPMSG) +int rpmsg_register_device_override(struct rpmsg_device *rpdev, + const char *driver_override); int rpmsg_register_device(struct rpmsg_device *rpdev); int rpmsg_unregister_device(struct device *parent, struct rpmsg_channel_info *chinfo); @@ -192,6 +194,12 @@ ssize_t rpmsg_get_mtu(struct rpmsg_endpoint *ept); #else +static inline int rpmsg_register_device_override(struct rpmsg_device *rpdev, + const char *driver_override) +{ + return -ENXIO; +} + static inline int rpmsg_register_device(struct rpmsg_device *rpdev) { return -ENXIO; -- cgit From d143b9db8069f0e2a0fa34484e806a55a0dd4855 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 27 Apr 2022 11:04:42 +0200 Subject: export: fix string handling of namespace in EXPORT_SYMBOL_NS Commit c3a6cf19e695 ("export: avoid code duplication in include/linux/export.h") broke the ability for a defined string to be used as a namespace value. Fix this up by using stringify to properly encode the namespace name. Fixes: c3a6cf19e695 ("export: avoid code duplication in include/linux/export.h") Cc: Miroslav Benes Cc: Emil Velikov Cc: Jessica Yu Cc: Quentin Perret Cc: Matthias Maennich Reviewed-by: Masahiro Yamada Link: https://lore.kernel.org/r/20220427090442.2105905-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman --- include/linux/export.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/export.h b/include/linux/export.h index 27d848712b90..5910ccb66ca2 100644 --- a/include/linux/export.h +++ b/include/linux/export.h @@ -2,6 +2,8 @@ #ifndef _LINUX_EXPORT_H #define _LINUX_EXPORT_H +#include + /* * Export symbols from the kernel to modules. Forked from module.h * to reduce the amount of pointless cruft we feed to gcc when only @@ -154,7 +156,6 @@ struct kernel_symbol { #endif /* CONFIG_MODULES */ #ifdef DEFAULT_SYMBOL_NAMESPACE -#include #define _EXPORT_SYMBOL(sym, sec) __EXPORT_SYMBOL(sym, sec, __stringify(DEFAULT_SYMBOL_NAMESPACE)) #else #define _EXPORT_SYMBOL(sym, sec) __EXPORT_SYMBOL(sym, sec, "") @@ -162,8 +163,8 @@ struct kernel_symbol { #define EXPORT_SYMBOL(sym) _EXPORT_SYMBOL(sym, "") #define EXPORT_SYMBOL_GPL(sym) _EXPORT_SYMBOL(sym, "_gpl") -#define EXPORT_SYMBOL_NS(sym, ns) __EXPORT_SYMBOL(sym, "", #ns) -#define EXPORT_SYMBOL_NS_GPL(sym, ns) __EXPORT_SYMBOL(sym, "_gpl", #ns) +#define EXPORT_SYMBOL_NS(sym, ns) __EXPORT_SYMBOL(sym, "", __stringify(ns)) +#define EXPORT_SYMBOL_NS_GPL(sym, ns) __EXPORT_SYMBOL(sym, "_gpl", __stringify(ns)) #endif /* !__ASSEMBLY__ */ -- cgit From 6de4d4eca9a2d0195f802bc97b0e9aeeaff05900 Mon Sep 17 00:00:00 2001 From: Paul Gortmaker Date: Thu, 28 Apr 2022 02:24:27 -0400 Subject: platform/x86: pmc_atom: remove unused pmc_atom_write() This function isn't used anywhere in the driver or anywhere in tree. So remove it. It can always be re-added if/when a use arises. Cc: Andy Shevchenko Cc: Aubrey Li Cc: Hans de Goede Cc: Mark Gross Cc: platform-driver-x86@vger.kernel.org Signed-off-by: Paul Gortmaker Link: https://lore.kernel.org/r/20220428062430.31010-2-paul.gortmaker@windriver.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/pmc_atom.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/pmc_atom.h b/include/linux/platform_data/x86/pmc_atom.h index 022bcea9edec..6807839c718b 100644 --- a/include/linux/platform_data/x86/pmc_atom.h +++ b/include/linux/platform_data/x86/pmc_atom.h @@ -144,6 +144,5 @@ #define SLEEP_ENABLE 0x2000 extern int pmc_atom_read(int offset, u32 *value); -extern int pmc_atom_write(int offset, u32 value); #endif /* PMC_ATOM_H */ -- cgit From 6df6398f7c8b481ce83f28143bc08a5231616deb Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 5 May 2022 19:51:31 -0700 Subject: net: add netif_inherit_tso_max() To make later patches smaller create a helper for inheriting the TSO limitations of a lower device. The TSO in the name is not an accident, subsequent patches will replace GSO with TSO in more names. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/netdevice.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index eaf66e57d891..006bb5c0e413 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4895,6 +4895,9 @@ static inline void netif_set_gro_max_size(struct net_device *dev, WRITE_ONCE(dev->gro_max_size, size); } +void netif_inherit_tso_max(struct net_device *to, + const struct net_device *from); + static inline void skb_gso_error_unwind(struct sk_buff *skb, __be16 protocol, int pulled_hlen, u16 mac_offset, int mac_len) -- cgit From 14d7b8122fd591693a2388b98563707ba72c6780 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 5 May 2022 19:51:32 -0700 Subject: net: don't allow user space to lift the device limits Up until commit 46e6b992c250 ("rtnetlink: allow GSO maximums to be set on device creation") the gso_max_segs and gso_max_size of a device were not controlled from user space. The quoted commit added the ability to control them because of the following setup: netns A | netns B veth<->veth eth0 If eth0 has TSO limitations and user wants to efficiently forward traffic between eth0 and the veths they should copy the TSO limitations of eth0 onto the veths. This would happen automatically for macvlans or ipvlan but veth users are not so lucky (given the loose coupling). Unfortunately the commit in question allowed users to also override the limits on real HW devices. It may be useful to control the max GSO size and someone may be using that ability (not that I know of any user), so create a separate set of knobs to reliably record the TSO limitations. Validate the user requests. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/netdevice.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 006bb5c0e413..e12f7de6d6ae 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1917,8 +1917,10 @@ enum netdev_ml_priv_type { * @rtnl_link_ops: Rtnl_link_ops * * @gso_max_size: Maximum size of generic segmentation offload + * @tso_max_size: Device (as in HW) limit on the max TSO request size * @gso_max_segs: Maximum number of segments that can be passed to the * NIC for GSO + * @tso_max_segs: Device (as in HW) limit on the max TSO segment count * * @dcbnl_ops: Data Center Bridging netlink ops * @num_tc: Number of traffic classes in the net device @@ -2262,8 +2264,13 @@ struct net_device { /* for setting kernel sock attribute on TCP connection setup */ #define GSO_MAX_SIZE 65536 unsigned int gso_max_size; +#define TSO_LEGACY_MAX_SIZE 65536 +#define TSO_MAX_SIZE UINT_MAX + unsigned int tso_max_size; #define GSO_MAX_SEGS 65535 u16 gso_max_segs; +#define TSO_MAX_SEGS U16_MAX + u16 tso_max_segs; #ifdef CONFIG_DCB const struct dcbnl_rtnl_ops *dcbnl_ops; @@ -4895,6 +4902,8 @@ static inline void netif_set_gro_max_size(struct net_device *dev, WRITE_ONCE(dev->gro_max_size, size); } +void netif_set_tso_max_size(struct net_device *dev, unsigned int size); +void netif_set_tso_max_segs(struct net_device *dev, unsigned int segs); void netif_inherit_tso_max(struct net_device *to, const struct net_device *from); -- cgit From 744d49daf8bd3b17b345c836f2e6f97d49fa6ae8 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 5 May 2022 19:51:34 -0700 Subject: net: move netif_set_gso_max helpers These are now internal to the core, no need to expose them. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/netdevice.h | 21 --------------------- 1 file changed, 21 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index e12f7de6d6ae..8cf0ac616cb9 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4881,27 +4881,6 @@ static inline bool netif_needs_gso(struct sk_buff *skb, (skb->ip_summed != CHECKSUM_UNNECESSARY))); } -static inline void netif_set_gso_max_size(struct net_device *dev, - unsigned int size) -{ - /* dev->gso_max_size is read locklessly from sk_setup_caps() */ - WRITE_ONCE(dev->gso_max_size, size); -} - -static inline void netif_set_gso_max_segs(struct net_device *dev, - unsigned int segs) -{ - /* dev->gso_max_segs is read locklessly from sk_setup_caps() */ - WRITE_ONCE(dev->gso_max_segs, segs); -} - -static inline void netif_set_gro_max_size(struct net_device *dev, - unsigned int size) -{ - /* This pairs with the READ_ONCE() in skb_gro_receive() */ - WRITE_ONCE(dev->gro_max_size, size); -} - void netif_set_tso_max_size(struct net_device *dev, unsigned int size); void netif_set_tso_max_segs(struct net_device *dev, unsigned int segs); void netif_inherit_tso_max(struct net_device *to, -- cgit From 3ff5f7840979aa36d47a6a00694826c78d63bf3c Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 6 May 2022 14:14:36 +0200 Subject: linkage: Fix issue with missing symbol size Occasionally, typically when a function doesn't end with 'ret', an alias on that function will have 0 size. The difference between what GCC generates and our linkage magic, is that GCC doesn't appear to provide .size for the alias'ed symbol at all. And indeed, removing this directive cures the issue. Additionally, GCC also doesn't emit .type for alias symbols either, so also omit that. Fixes: e0891269a8c2 ("linkage: add SYM_FUNC_ALIAS{,_LOCAL,_WEAK}()") Suggested-by: Mark Rutland Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Borislav Petkov Reviewed-by: Mark Rutland Acked-by: Josh Poimboeuf Link: https://lore.kernel.org/r/20220506121631.437480085@infradead.org --- include/linux/linkage.h | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/linkage.h b/include/linux/linkage.h index acb1ad2356f1..1feab6136b5b 100644 --- a/include/linux/linkage.h +++ b/include/linux/linkage.h @@ -171,12 +171,9 @@ /* SYM_ALIAS -- use only if you have to */ #ifndef SYM_ALIAS -#define SYM_ALIAS(alias, name, sym_type, linkage) \ - linkage(alias) ASM_NL \ - .set alias, name ASM_NL \ - .type alias sym_type ASM_NL \ - .set .L__sym_size_##alias, .L__sym_size_##name ASM_NL \ - .size alias, .L__sym_size_##alias +#define SYM_ALIAS(alias, name, linkage) \ + linkage(alias) ASM_NL \ + .set alias, name ASM_NL #endif /* === code annotations === */ @@ -261,7 +258,7 @@ */ #ifndef SYM_FUNC_ALIAS #define SYM_FUNC_ALIAS(alias, name) \ - SYM_ALIAS(alias, name, SYM_T_FUNC, SYM_L_GLOBAL) + SYM_ALIAS(alias, name, SYM_L_GLOBAL) #endif /* @@ -269,7 +266,7 @@ */ #ifndef SYM_FUNC_ALIAS_LOCAL #define SYM_FUNC_ALIAS_LOCAL(alias, name) \ - SYM_ALIAS(alias, name, SYM_T_FUNC, SYM_L_LOCAL) + SYM_ALIAS(alias, name, SYM_L_LOCAL) #endif /* @@ -277,7 +274,7 @@ */ #ifndef SYM_FUNC_ALIAS_WEAK #define SYM_FUNC_ALIAS_WEAK(alias, name) \ - SYM_ALIAS(alias, name, SYM_T_FUNC, SYM_L_WEAK) + SYM_ALIAS(alias, name, SYM_L_WEAK) #endif /* SYM_CODE_START -- use for non-C (special) functions */ -- cgit From 6b79738b6ed91a2d0fe958819469eeedac3bca81 Mon Sep 17 00:00:00 2001 From: Qi Liu Date: Fri, 15 Apr 2022 18:23:52 +0800 Subject: drivers/perf: hisi: Add Support for CPA PMU On HiSilicon Hip09 platform, there is a CPA (Coherency Protocol Agent) on each SICL (Super IO Cluster) which implements packet format translation, route parsing and traffic statistics. CPA PMU has 8 PMU counters and interrupt is supported to handle counter overflow. Let's support its driver under the framework of HiSilicon PMU driver. Signed-off-by: Qi Liu Reviewed-by: John Garry Reviewed-by: Shaokun Zhang Link: https://lore.kernel.org/r/20220415102352.6665-3-liuqi115@huawei.com Signed-off-by: Will Deacon --- include/linux/cpuhotplug.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index 82e33137f917..b66c5f389159 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -222,6 +222,7 @@ enum cpuhp_state { CPUHP_AP_PERF_S390_SF_ONLINE, CPUHP_AP_PERF_ARM_CCI_ONLINE, CPUHP_AP_PERF_ARM_CCN_ONLINE, + CPUHP_AP_PERF_ARM_HISI_CPA_ONLINE, CPUHP_AP_PERF_ARM_HISI_DDRC_ONLINE, CPUHP_AP_PERF_ARM_HISI_HHA_ONLINE, CPUHP_AP_PERF_ARM_HISI_L3_ONLINE, -- cgit From 343f4c49f2438d8920f1f76fa823ee59b91f02e4 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Mon, 11 Apr 2022 11:40:14 -0500 Subject: kthread: Don't allocate kthread_struct for init and umh MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If kthread_is_per_cpu runs concurrently with free_kthread_struct the kthread_struct that was just freed may be read from. This bug was introduced by commit 40966e316f86 ("kthread: Ensure struct kthread is present for all kthreads"). When kthread_struct started to be allocated for all tasks that have PF_KTHREAD set. This in turn required the kthread_struct to be freed in kernel_execve and violated the assumption that kthread_struct will have the same lifetime as the task. Looking a bit deeper this only applies to callers of kernel_execve which is just the init process and the user mode helper processes. These processes really don't want to be kernel threads but are for historical reasons. Mostly that copy_thread does not know how to take a kernel mode function to the process with for processes without PF_KTHREAD or PF_IO_WORKER set. Solve this by not allocating kthread_struct for the init process and the user mode helper processes. This is done by adding a kthread member to struct kernel_clone_args. Setting kthread in fork_idle and kernel_thread. Adding user_mode_thread that works like kernel_thread except it does not set kthread. In fork only allocating the kthread_struct if .kthread is set. I have looked at kernel/kthread.c and since commit 40966e316f86 ("kthread: Ensure struct kthread is present for all kthreads") there have been no assumptions added that to_kthread or __to_kthread will not return NULL. There are a few callers of to_kthread or __to_kthread that assume a non-NULL struct kthread pointer will be returned. These functions are kthread_data(), kthread_parmme(), kthread_exit(), kthread(), kthread_park(), kthread_unpark(), kthread_stop(). All of those functions can reasonably expected to be called when it is know that a task is a kthread so that assumption seems reasonable. Cc: stable@vger.kernel.org Fixes: 40966e316f86 ("kthread: Ensure struct kthread is present for all kthreads") Reported-by: Максим Кутявин Link: https://lkml.kernel.org/r/20220506141512.516114-1-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/sched/task.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index 719c9a6cac8d..4492266935dd 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -32,6 +32,7 @@ struct kernel_clone_args { size_t set_tid_size; int cgroup; int io_thread; + int kthread; struct cgroup *cgrp; struct css_set *cset; }; @@ -89,6 +90,7 @@ struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node); struct task_struct *fork_idle(int); struct mm_struct *copy_init_mm(void); extern pid_t kernel_thread(int (*fn)(void *), void *arg, unsigned long flags); +extern pid_t user_mode_thread(int (*fn)(void *), void *arg, unsigned long flags); extern long kernel_wait4(pid_t, int __user *, int, struct rusage *); int kernel_wait(pid_t pid, int *stat); -- cgit From e2ef115813c34ea5380ac5b4879f515070150210 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 6 May 2022 14:14:37 +0200 Subject: objtool: Fix STACK_FRAME_NON_STANDARD reloc type STACK_FRAME_NON_STANDARD results in inconsistent relocation types depending on .c or .S usage: Relocation section '.rela.discard.func_stack_frame_non_standard' at offset 0x3c01090 contains 5 entries: Offset Info Type Symbol's Value Symbol's Name + Addend 0000000000000000 00020c2200000002 R_X86_64_PC32 0000000000047b40 do_suspend_lowlevel + 0 0000000000000008 0002461e00000001 R_X86_64_64 00000000000480a0 machine_real_restart + 0 0000000000000010 0000001400000001 R_X86_64_64 0000000000000000 .rodata + b3d4 0000000000000018 0002444600000002 R_X86_64_PC32 00000000000678a0 __efi64_thunk + 0 0000000000000020 0002659d00000001 R_X86_64_64 0000000000113160 __crash_kexec + 0 Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Borislav Petkov Link: https://lore.kernel.org/r/20220506121631.508692613@infradead.org --- include/linux/objtool.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/objtool.h b/include/linux/objtool.h index 586d35720f13..b9c1474a571e 100644 --- a/include/linux/objtool.h +++ b/include/linux/objtool.h @@ -40,6 +40,8 @@ struct unwind_hint { #ifdef CONFIG_STACK_VALIDATION +#include + #ifndef __ASSEMBLY__ #define UNWIND_HINT(sp_reg, sp_offset, type, end) \ @@ -137,7 +139,7 @@ struct unwind_hint { .macro STACK_FRAME_NON_STANDARD func:req .pushsection .discard.func_stack_frame_non_standard, "aw" - .long \func - . + _ASM_PTR \func .popsection .endm -- cgit From c5febea0956fd3874e8fb59c6f84d68f128d68f8 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 8 Apr 2022 18:07:50 -0500 Subject: fork: Pass struct kernel_clone_args into copy_thread With io_uring we have started supporting tasks that are for most purposes user space tasks that exclusively run code in kernel mode. The kernel task that exec's init and tasks that exec user mode helpers are also user mode tasks that just run kernel code until they call kernel execve. Pass kernel_clone_args into copy_thread so these oddball tasks can be supported more cleanly and easily. v2: Fix spelling of kenrel_clone_args on h8300 Link: https://lkml.kernel.org/r/20220506141512.516114-2-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/sched/task.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index 4492266935dd..fcdcba231aac 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -68,8 +68,7 @@ extern void fork_init(void); extern void release_task(struct task_struct * p); -extern int copy_thread(unsigned long, unsigned long, unsigned long, - struct task_struct *, unsigned long); +extern int copy_thread(struct task_struct *, const struct kernel_clone_args *); extern void flush_thread(void); -- cgit From 36cb0e1cda645ee645b85a6ce652cb46a16e14e5 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Mon, 11 Apr 2022 16:17:28 -0500 Subject: fork: Explicity test for idle tasks in copy_thread The architectures ia64 and parisc have special handling for the idle thread in copy_process. Add a flag named idle to kernel_clone_args and use it to explicity test if an idle process is being created. Fullfill the expectations of the rest of the copy_thread implemetations and pass a function pointer in .stack from fork_idle(). This makes what is happening in copy_thread better defined, and is useful to make idle threads less special. Link: https://lkml.kernel.org/r/20220506141512.516114-3-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/sched/task.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index fcdcba231aac..3d6b99ce5408 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -33,6 +33,7 @@ struct kernel_clone_args { int cgroup; int io_thread; int kthread; + int idle; struct cgroup *cgrp; struct css_set *cset; }; -- cgit From 5bd2e97c868a8a44470950ed01846cab6328e540 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Tue, 12 Apr 2022 10:18:48 -0500 Subject: fork: Generalize PF_IO_WORKER handling Add fn and fn_arg members into struct kernel_clone_args and test for them in copy_thread (instead of testing for PF_KTHREAD | PF_IO_WORKER). This allows any task that wants to be a user space task that only runs in kernel mode to use this functionality. The code on x86 is an exception and still retains a PF_KTHREAD test because x86 unlikely everything else handles kthreads slightly differently than user space tasks that start with a function. The functions that created tasks that start with a function have been updated to set ".fn" and ".fn_arg" instead of ".stack" and ".stack_size". These functions are fork_idle(), create_io_thread(), kernel_thread(), and user_mode_thread(). Link: https://lkml.kernel.org/r/20220506141512.516114-4-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/sched/task.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index 3d6b99ce5408..505aaf9fe477 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -34,6 +34,8 @@ struct kernel_clone_args { int io_thread; int kthread; int idle; + int (*fn)(void *); + void *fn_arg; struct cgroup *cgrp; struct css_set *cset; }; -- cgit From 3d1b0d351441c2be7de082b92f43d0bfdc95a8f3 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Sat, 7 May 2022 13:48:21 -0400 Subject: Revert "SUNRPC: Ensure gss-proxy connects on setup" This reverts commit 892de36fd4a98fab3298d417c051d9099af5448d. The gssproxy server is unresponsive when it calls into the kernel to start the upcall service, so it will not reply to our RPC ping at all. Reported-by: "J.Bruce Fields" Fixes: 892de36fd4a9 ("SUNRPC: Ensure gss-proxy connects on setup") Signed-off-by: Trond Myklebust --- include/linux/sunrpc/clnt.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h index db5149567305..267b7aeaf1a6 100644 --- a/include/linux/sunrpc/clnt.h +++ b/include/linux/sunrpc/clnt.h @@ -160,7 +160,6 @@ struct rpc_add_xprt_test { #define RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT (1UL << 9) #define RPC_CLNT_CREATE_SOFTERR (1UL << 10) #define RPC_CLNT_CREATE_REUSEPORT (1UL << 11) -#define RPC_CLNT_CREATE_IGNORE_NULL_UNAVAIL (1UL << 12) struct rpc_clnt *rpc_create(struct rpc_create_args *args); struct rpc_clnt *rpc_bind_new_program(struct rpc_clnt *, -- cgit From fd13359f54ee854f00134abc6be32da94ec53dbf Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Sat, 7 May 2022 13:53:59 -0400 Subject: SUNRPC: Ensure that the gssproxy client can start in a connected state Ensure that the gssproxy client connects to the server from the gssproxy daemon process context so that the AF_LOCAL socket connection is done using the correct path and namespaces. Fixes: 1d658336b05f ("SUNRPC: Add RPC based upcall mechanism for RPCGSS auth") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust --- include/linux/sunrpc/clnt.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h index 267b7aeaf1a6..90501404fa49 100644 --- a/include/linux/sunrpc/clnt.h +++ b/include/linux/sunrpc/clnt.h @@ -160,6 +160,7 @@ struct rpc_add_xprt_test { #define RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT (1UL << 9) #define RPC_CLNT_CREATE_SOFTERR (1UL << 10) #define RPC_CLNT_CREATE_REUSEPORT (1UL << 11) +#define RPC_CLNT_CREATE_CONNECTED (1UL << 12) struct rpc_clnt *rpc_create(struct rpc_create_args *args); struct rpc_clnt *rpc_bind_new_program(struct rpc_clnt *, -- cgit From 813c2aee51dd7d7d9092251851e33f66719513cc Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Sat, 7 May 2022 14:33:31 +0200 Subject: ARM/pxa/mfd/power/sound: Switch Tosa to GPIO descriptors The Tosa device (Sharp SL-6000) has a mishmash driver set-up for the Toshiba TC6393xb MFD that includes a battery charger and touchscreen and has some kind of relationship to the SoC sound driver for the AC97 codec. Other devices define a chip like this but seem only half-implemented, not really handling battery charging etc. This patch switches the Toshiba MFD device to provide GPIO descriptors to the battery charger and SoC codec. As a result some descriptors need to be moved out of the Tosa boardfile and new one added: all SoC GPIO resources to these drivers now comes from the main boardfile, while the MFD provide GPIOs for its portions. As a result we can request one GPIO from our own GPIO chip and drop two hairy callbacks into the board file. This platform badly needs to have its drivers split up and converted to device tree probing to handle this quite complex relationship in an orderly manner. I just do my best in solving the GPIO descriptor part of the puzzle. Please don't ask me to fix everything that is wrong with these driver to todays standards, I am just trying to fix one aspect. I do try to use modern devres resource management and handle deferred probe using new functions where appropriate. Cc: Dmitry Eremin-Solenikov Cc: Dirk Opfer Cc: Robert Jarzmik Cc: Daniel Mack Cc: Haojian Zhuang Cc: Lee Jones Cc: Liam Girdwood Reviewed-by: Dmitry Baryshkov Acked-by: Mark Brown Acked-by: Sebastian Reichel Signed-off-by: Linus Walleij Signed-off-by: Arnd Bergmann --- include/linux/gpio/machine.h | 12 ++++++++++++ include/linux/mfd/tc6393xb.h | 3 --- 2 files changed, 12 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gpio/machine.h b/include/linux/gpio/machine.h index 2647dd10b541..4d55da28e664 100644 --- a/include/linux/gpio/machine.h +++ b/include/linux/gpio/machine.h @@ -63,6 +63,18 @@ struct gpiod_hog { int dflags; }; +/* + * Helper for lookup tables with just one single lookup for a device. + */ +#define GPIO_LOOKUP_SINGLE(_name, _dev_id, _key, _chip_hwnum, _con_id, _flags) \ +static struct gpiod_lookup_table _name = { \ + .dev_id = _dev_id, \ + .table = { \ + GPIO_LOOKUP(_key, _chip_hwnum, _con_id, _flags), \ + {}, \ + }, \ +} + /* * Simple definition of a single GPIO under a con_id */ diff --git a/include/linux/mfd/tc6393xb.h b/include/linux/mfd/tc6393xb.h index fcc8e74f0e8d..d336c541b7df 100644 --- a/include/linux/mfd/tc6393xb.h +++ b/include/linux/mfd/tc6393xb.h @@ -27,9 +27,6 @@ struct tc6393xb_platform_data { int (*resume)(struct platform_device *dev); int irq_base; /* base for subdevice irqs */ - int gpio_base; - int (*setup)(struct platform_device *dev); - void (*teardown)(struct platform_device *dev); struct tmio_nand_data *nand_data; struct tmio_fb_data *fb_data; -- cgit From ac70f4d80df414223130b04d9b4435bf56dda654 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 11 Sep 2019 10:58:26 +0200 Subject: ARM: pxa: poodle: use platform data for poodle asoc driver The poodle audio driver shows its age by using a custom gpio api for the "locomo" support chip. In a perfect world, this would get converted to use gpiolib and a gpio lookup table. As the world is not perfect, just pass all the required data in a custom platform_data structure. to avoid the globally visible mach/poodle.h header. Acked-by: Mark Brown Acked-by: Robert Jarzmik Cc: alsa-devel@alsa-project.org Signed-off-by: Arnd Bergmann --- include/linux/platform_data/asoc-poodle.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) create mode 100644 include/linux/platform_data/asoc-poodle.h (limited to 'include/linux') diff --git a/include/linux/platform_data/asoc-poodle.h b/include/linux/platform_data/asoc-poodle.h new file mode 100644 index 000000000000..2052fad55c5c --- /dev/null +++ b/include/linux/platform_data/asoc-poodle.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __LINUX_PLATFORM_DATA_POODLE_AUDIO +#define __LINUX_PLATFORM_DATA_POODLE_AUDIO + +/* locomo is not a proper gpio driver, and uses its own api */ +struct poodle_audio_platform_data { + struct device *locomo_dev; + + int gpio_amp_on; + int gpio_mute_l; + int gpio_mute_r; + int gpio_232vcc_on; + int gpio_jk_b; +}; + +#endif -- cgit From a2ef926143b84664f4823c82a437b29c4667a08c Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Fri, 18 Oct 2019 13:48:33 -0700 Subject: Input: wm97xx - switch to using threaded IRQ Instead of manually disabling and enabling interrupts and scheduling work to access the device, let's use threaded oneshot interrupt handler. It simplifies things. Signed-off-by: Dmitry Torokhov Signed-off-by: Arnd Bergmann --- include/linux/wm97xx.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/wm97xx.h b/include/linux/wm97xx.h index 462854f4f286..85bd8dd3caea 100644 --- a/include/linux/wm97xx.h +++ b/include/linux/wm97xx.h @@ -281,7 +281,6 @@ struct wm97xx { unsigned long ts_reader_min_interval; /* Minimum interval */ unsigned int pen_irq; /* Pen IRQ number in use */ struct workqueue_struct *ts_workq; - struct work_struct pen_event_work; u16 acc_slot; /* AC97 slot used for acc touch data */ u16 acc_rate; /* acc touch data rate */ unsigned pen_is_down:1; /* Pen is down */ -- cgit From e1d8f31218aab302df9e18526345af189fd061f0 Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Fri, 18 Oct 2019 13:48:34 -0700 Subject: Input: wm97xx - get rid of irq_enable method in wm97xx_mach_ops Now that we are using oneshot threaded IRQ this method is not used anymore. Signed-off-by: Dmitry Torokhov [arnd: add the db1300 change as well] Cc: Manuel Lauss Signed-off-by: Arnd Bergmann --- include/linux/wm97xx.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/wm97xx.h b/include/linux/wm97xx.h index 85bd8dd3caea..332d2b0f29b9 100644 --- a/include/linux/wm97xx.h +++ b/include/linux/wm97xx.h @@ -254,9 +254,6 @@ struct wm97xx_mach_ops { int (*acc_startup) (struct wm97xx *); void (*acc_shutdown) (struct wm97xx *); - /* interrupt mask control - required for accelerated operation */ - void (*irq_enable) (struct wm97xx *, int enable); - /* GPIO pin used for accelerated operation */ int irq_gpio; -- cgit From 6a946f1bd5cc966c9dd377b14efa4cec388101ce Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 18 Sep 2019 14:50:37 +0200 Subject: ARM: pxa: pcmcia: move smemc configuration back to arch Rather than poking at the smemc registers directly from the pcmcia/pxa2xx_base driver, move those bits into machine file to have a cleaner interface. Cc: Dominik Brodowski Link: https://lore.kernel.org/lkml/87d0egjzxk.fsf@belgarion.home/ Signed-off-by: Arnd Bergmann --- include/linux/soc/pxa/smemc.h | 10 ++++++++++ 1 file changed, 10 insertions(+) create mode 100644 include/linux/soc/pxa/smemc.h (limited to 'include/linux') diff --git a/include/linux/soc/pxa/smemc.h b/include/linux/soc/pxa/smemc.h new file mode 100644 index 000000000000..cbf1a2d8af29 --- /dev/null +++ b/include/linux/soc/pxa/smemc.h @@ -0,0 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +#ifndef __PXA_REGS_H +#define __PXA_REGS_H + +#include + +void pxa_smemc_set_pcmcia_timing(int sock, u32 mcmem, u32 mcatt, u32 mcio); +void pxa_smemc_set_pcmcia_socket(int nr); + +#endif -- cgit From 5c6603e741921c75f67a724e798642ed50a6328e Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 18 Sep 2019 15:34:10 +0200 Subject: cpufreq: pxa3: move clk register access to clk driver The driver needs some low-level register access for setting the core and bus frequencies. These registers are owned by the clk driver, so move the low-level access into that driver with a slightly higher-level interface and avoid any machine header file dependencies. Cc: Michael Turquette Cc: Stephen Boyd Acked-by: Viresh Kumar Cc: linux-clk@vger.kernel.org Cc: linux-pm@vger.kernel.org Signed-off-by: Arnd Bergmann --- include/linux/clk/pxa.h | 9 +++++++++ 1 file changed, 9 insertions(+) create mode 100644 include/linux/clk/pxa.h (limited to 'include/linux') diff --git a/include/linux/clk/pxa.h b/include/linux/clk/pxa.h new file mode 100644 index 000000000000..e5516c608c99 --- /dev/null +++ b/include/linux/clk/pxa.h @@ -0,0 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ + +#ifdef CONFIG_PXA3xx +extern unsigned pxa3xx_get_clk_frequency_khz(int); +extern void pxa3xx_clk_update_accr(u32 disable, u32 enable, u32 xclkcfg, u32 mask); +#else +#define pxa3xx_get_clk_frequency_khz(x) (0) +#define pxa3xx_clk_update_accr(disable, enable, xclkcfg, mask) do { } while (0) +#endif -- cgit From fd13f8117f7a2d4054bf420ec1428e918a24a480 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 18 Sep 2019 17:42:52 +0200 Subject: ARM: pxa: move smemc register access from clk to platform The get_sdram_rows() and get_memclkdiv() helpers need smemc register that are separate from the clk registers, move them out of the clk driver, and use an extern declaration instead. Cc: Michael Turquette Cc: Stephen Boyd Cc: linux-clk@vger.kernel.org Link: https://lore.kernel.org/lkml/87pnielzo4.fsf@belgarion.home/ Signed-off-by: Arnd Bergmann --- include/linux/soc/pxa/smemc.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/pxa/smemc.h b/include/linux/soc/pxa/smemc.h index cbf1a2d8af29..f1ffea236c15 100644 --- a/include/linux/soc/pxa/smemc.h +++ b/include/linux/soc/pxa/smemc.h @@ -6,5 +6,8 @@ void pxa_smemc_set_pcmcia_timing(int sock, u32 mcmem, u32 mcatt, u32 mcio); void pxa_smemc_set_pcmcia_socket(int nr); +int pxa2xx_smemc_get_sdram_rows(void); +unsigned int pxa3xx_smemc_get_memclkdiv(void); +void __iomem *pxa_smemc_get_mdrefr(void); #endif -- cgit From 3c816d950a494ae6e16b1fa017af29bc53cb7791 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Wed, 18 Sep 2019 20:54:17 +0200 Subject: ARM: pxa: move clk register definitions to driver The clock register definitions are now used (almost) exclusively in the clk driver, and that relies on no other mach/*.h header files any more. Remove the dependency on mach/pxa*-regs.h by addressing the registers as offsets from a void __iomem * pointer, which is either passed from a board file, or (for the moment) ioremapped at boot time from a hardcoded address in case of DT (this should be moved into the DT of course). Cc: linux-clk@vger.kernel.org Acked-by: Stephen Boyd Acked-by: Robert Jarzmik Signed-off-by: Arnd Bergmann --- include/linux/clk/pxa.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/clk/pxa.h b/include/linux/clk/pxa.h index e5516c608c99..736b8bb91bd7 100644 --- a/include/linux/clk/pxa.h +++ b/include/linux/clk/pxa.h @@ -1,5 +1,12 @@ /* SPDX-License-Identifier: GPL-2.0-only */ +#include +#include + +extern int pxa25x_clocks_init(void __iomem *regs); +extern int pxa27x_clocks_init(void __iomem *regs); +extern int pxa3xx_clocks_init(void __iomem *regs, void __iomem *oscc_reg); + #ifdef CONFIG_PXA3xx extern unsigned pxa3xx_get_clk_frequency_khz(int); extern void pxa3xx_clk_update_accr(u32 disable, u32 enable, u32 xclkcfg, u32 mask); -- cgit From 64dbc4dd7a7cc6642c522963a6194b62480e2a68 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 20 Sep 2019 13:33:40 +0200 Subject: ARM: pxa: move plat-pxa to drivers/soc/ There are two drivers in arch/arm/plat-pxa: mfp and ssp. Both of them should ideally not be needed at all, as there are proper subsystems to replace them. OTOH, they are self-contained and can simply be normal SoC drivers, so move them over there to eliminate one more of the plat-* directories. Acked-by: Robert Jarzmik (mach-pxa) Acked-by: Lubomir Rintel (mach-mmp) Signed-off-by: Arnd Bergmann --- include/linux/soc/pxa/mfp.h | 470 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 470 insertions(+) create mode 100644 include/linux/soc/pxa/mfp.h (limited to 'include/linux') diff --git a/include/linux/soc/pxa/mfp.h b/include/linux/soc/pxa/mfp.h new file mode 100644 index 000000000000..39779cbed0c0 --- /dev/null +++ b/include/linux/soc/pxa/mfp.h @@ -0,0 +1,470 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Common Multi-Function Pin Definitions + * + * Copyright (C) 2007 Marvell International Ltd. + * + * 2007-8-21: eric miao + * initial version + */ + +#ifndef __ASM_PLAT_MFP_H +#define __ASM_PLAT_MFP_H + +#define mfp_to_gpio(m) ((m) % 256) + +/* list of all the configurable MFP pins */ +enum { + MFP_PIN_INVALID = -1, + + MFP_PIN_GPIO0 = 0, + MFP_PIN_GPIO1, + MFP_PIN_GPIO2, + MFP_PIN_GPIO3, + MFP_PIN_GPIO4, + MFP_PIN_GPIO5, + MFP_PIN_GPIO6, + MFP_PIN_GPIO7, + MFP_PIN_GPIO8, + MFP_PIN_GPIO9, + MFP_PIN_GPIO10, + MFP_PIN_GPIO11, + MFP_PIN_GPIO12, + MFP_PIN_GPIO13, + MFP_PIN_GPIO14, + MFP_PIN_GPIO15, + MFP_PIN_GPIO16, + MFP_PIN_GPIO17, + MFP_PIN_GPIO18, + MFP_PIN_GPIO19, + MFP_PIN_GPIO20, + MFP_PIN_GPIO21, + MFP_PIN_GPIO22, + MFP_PIN_GPIO23, + MFP_PIN_GPIO24, + MFP_PIN_GPIO25, + MFP_PIN_GPIO26, + MFP_PIN_GPIO27, + MFP_PIN_GPIO28, + MFP_PIN_GPIO29, + MFP_PIN_GPIO30, + MFP_PIN_GPIO31, + MFP_PIN_GPIO32, + MFP_PIN_GPIO33, + MFP_PIN_GPIO34, + MFP_PIN_GPIO35, + MFP_PIN_GPIO36, + MFP_PIN_GPIO37, + MFP_PIN_GPIO38, + MFP_PIN_GPIO39, + MFP_PIN_GPIO40, + MFP_PIN_GPIO41, + MFP_PIN_GPIO42, + MFP_PIN_GPIO43, + MFP_PIN_GPIO44, + MFP_PIN_GPIO45, + MFP_PIN_GPIO46, + MFP_PIN_GPIO47, + MFP_PIN_GPIO48, + MFP_PIN_GPIO49, + MFP_PIN_GPIO50, + MFP_PIN_GPIO51, + MFP_PIN_GPIO52, + MFP_PIN_GPIO53, + MFP_PIN_GPIO54, + MFP_PIN_GPIO55, + MFP_PIN_GPIO56, + MFP_PIN_GPIO57, + MFP_PIN_GPIO58, + MFP_PIN_GPIO59, + MFP_PIN_GPIO60, + MFP_PIN_GPIO61, + MFP_PIN_GPIO62, + MFP_PIN_GPIO63, + MFP_PIN_GPIO64, + MFP_PIN_GPIO65, + MFP_PIN_GPIO66, + MFP_PIN_GPIO67, + MFP_PIN_GPIO68, + MFP_PIN_GPIO69, + MFP_PIN_GPIO70, + MFP_PIN_GPIO71, + MFP_PIN_GPIO72, + MFP_PIN_GPIO73, + MFP_PIN_GPIO74, + MFP_PIN_GPIO75, + MFP_PIN_GPIO76, + MFP_PIN_GPIO77, + MFP_PIN_GPIO78, + MFP_PIN_GPIO79, + MFP_PIN_GPIO80, + MFP_PIN_GPIO81, + MFP_PIN_GPIO82, + MFP_PIN_GPIO83, + MFP_PIN_GPIO84, + MFP_PIN_GPIO85, + MFP_PIN_GPIO86, + MFP_PIN_GPIO87, + MFP_PIN_GPIO88, + MFP_PIN_GPIO89, + MFP_PIN_GPIO90, + MFP_PIN_GPIO91, + MFP_PIN_GPIO92, + MFP_PIN_GPIO93, + MFP_PIN_GPIO94, + MFP_PIN_GPIO95, + MFP_PIN_GPIO96, + MFP_PIN_GPIO97, + MFP_PIN_GPIO98, + MFP_PIN_GPIO99, + MFP_PIN_GPIO100, + MFP_PIN_GPIO101, + MFP_PIN_GPIO102, + MFP_PIN_GPIO103, + MFP_PIN_GPIO104, + MFP_PIN_GPIO105, + MFP_PIN_GPIO106, + MFP_PIN_GPIO107, + MFP_PIN_GPIO108, + MFP_PIN_GPIO109, + MFP_PIN_GPIO110, + MFP_PIN_GPIO111, + MFP_PIN_GPIO112, + MFP_PIN_GPIO113, + MFP_PIN_GPIO114, + MFP_PIN_GPIO115, + MFP_PIN_GPIO116, + MFP_PIN_GPIO117, + MFP_PIN_GPIO118, + MFP_PIN_GPIO119, + MFP_PIN_GPIO120, + MFP_PIN_GPIO121, + MFP_PIN_GPIO122, + MFP_PIN_GPIO123, + MFP_PIN_GPIO124, + MFP_PIN_GPIO125, + MFP_PIN_GPIO126, + MFP_PIN_GPIO127, + + MFP_PIN_GPIO128, + MFP_PIN_GPIO129, + MFP_PIN_GPIO130, + MFP_PIN_GPIO131, + MFP_PIN_GPIO132, + MFP_PIN_GPIO133, + MFP_PIN_GPIO134, + MFP_PIN_GPIO135, + MFP_PIN_GPIO136, + MFP_PIN_GPIO137, + MFP_PIN_GPIO138, + MFP_PIN_GPIO139, + MFP_PIN_GPIO140, + MFP_PIN_GPIO141, + MFP_PIN_GPIO142, + MFP_PIN_GPIO143, + MFP_PIN_GPIO144, + MFP_PIN_GPIO145, + MFP_PIN_GPIO146, + MFP_PIN_GPIO147, + MFP_PIN_GPIO148, + MFP_PIN_GPIO149, + MFP_PIN_GPIO150, + MFP_PIN_GPIO151, + MFP_PIN_GPIO152, + MFP_PIN_GPIO153, + MFP_PIN_GPIO154, + MFP_PIN_GPIO155, + MFP_PIN_GPIO156, + MFP_PIN_GPIO157, + MFP_PIN_GPIO158, + MFP_PIN_GPIO159, + MFP_PIN_GPIO160, + MFP_PIN_GPIO161, + MFP_PIN_GPIO162, + MFP_PIN_GPIO163, + MFP_PIN_GPIO164, + MFP_PIN_GPIO165, + MFP_PIN_GPIO166, + MFP_PIN_GPIO167, + MFP_PIN_GPIO168, + MFP_PIN_GPIO169, + MFP_PIN_GPIO170, + MFP_PIN_GPIO171, + MFP_PIN_GPIO172, + MFP_PIN_GPIO173, + MFP_PIN_GPIO174, + MFP_PIN_GPIO175, + MFP_PIN_GPIO176, + MFP_PIN_GPIO177, + MFP_PIN_GPIO178, + MFP_PIN_GPIO179, + MFP_PIN_GPIO180, + MFP_PIN_GPIO181, + MFP_PIN_GPIO182, + MFP_PIN_GPIO183, + MFP_PIN_GPIO184, + MFP_PIN_GPIO185, + MFP_PIN_GPIO186, + MFP_PIN_GPIO187, + MFP_PIN_GPIO188, + MFP_PIN_GPIO189, + MFP_PIN_GPIO190, + MFP_PIN_GPIO191, + + MFP_PIN_GPIO255 = 255, + + MFP_PIN_GPIO0_2, + MFP_PIN_GPIO1_2, + MFP_PIN_GPIO2_2, + MFP_PIN_GPIO3_2, + MFP_PIN_GPIO4_2, + MFP_PIN_GPIO5_2, + MFP_PIN_GPIO6_2, + MFP_PIN_GPIO7_2, + MFP_PIN_GPIO8_2, + MFP_PIN_GPIO9_2, + MFP_PIN_GPIO10_2, + MFP_PIN_GPIO11_2, + MFP_PIN_GPIO12_2, + MFP_PIN_GPIO13_2, + MFP_PIN_GPIO14_2, + MFP_PIN_GPIO15_2, + MFP_PIN_GPIO16_2, + MFP_PIN_GPIO17_2, + + MFP_PIN_ULPI_STP, + MFP_PIN_ULPI_NXT, + MFP_PIN_ULPI_DIR, + + MFP_PIN_nXCVREN, + MFP_PIN_DF_CLE_nOE, + MFP_PIN_DF_nADV1_ALE, + MFP_PIN_DF_SCLK_E, + MFP_PIN_DF_SCLK_S, + MFP_PIN_nBE0, + MFP_PIN_nBE1, + MFP_PIN_DF_nADV2_ALE, + MFP_PIN_DF_INT_RnB, + MFP_PIN_DF_nCS0, + MFP_PIN_DF_nCS1, + MFP_PIN_nLUA, + MFP_PIN_nLLA, + MFP_PIN_DF_nWE, + MFP_PIN_DF_ALE_nWE, + MFP_PIN_DF_nRE_nOE, + MFP_PIN_DF_ADDR0, + MFP_PIN_DF_ADDR1, + MFP_PIN_DF_ADDR2, + MFP_PIN_DF_ADDR3, + MFP_PIN_DF_IO0, + MFP_PIN_DF_IO1, + MFP_PIN_DF_IO2, + MFP_PIN_DF_IO3, + MFP_PIN_DF_IO4, + MFP_PIN_DF_IO5, + MFP_PIN_DF_IO6, + MFP_PIN_DF_IO7, + MFP_PIN_DF_IO8, + MFP_PIN_DF_IO9, + MFP_PIN_DF_IO10, + MFP_PIN_DF_IO11, + MFP_PIN_DF_IO12, + MFP_PIN_DF_IO13, + MFP_PIN_DF_IO14, + MFP_PIN_DF_IO15, + MFP_PIN_DF_nCS0_SM_nCS2, + MFP_PIN_DF_nCS1_SM_nCS3, + MFP_PIN_SM_nCS0, + MFP_PIN_SM_nCS1, + MFP_PIN_DF_WEn, + MFP_PIN_DF_REn, + MFP_PIN_DF_CLE_SM_OEn, + MFP_PIN_DF_ALE_SM_WEn, + MFP_PIN_DF_RDY0, + MFP_PIN_DF_RDY1, + + MFP_PIN_SM_SCLK, + MFP_PIN_SM_BE0, + MFP_PIN_SM_BE1, + MFP_PIN_SM_ADV, + MFP_PIN_SM_ADVMUX, + MFP_PIN_SM_RDY, + + MFP_PIN_MMC1_DAT7, + MFP_PIN_MMC1_DAT6, + MFP_PIN_MMC1_DAT5, + MFP_PIN_MMC1_DAT4, + MFP_PIN_MMC1_DAT3, + MFP_PIN_MMC1_DAT2, + MFP_PIN_MMC1_DAT1, + MFP_PIN_MMC1_DAT0, + MFP_PIN_MMC1_CMD, + MFP_PIN_MMC1_CLK, + MFP_PIN_MMC1_CD, + MFP_PIN_MMC1_WP, + + /* additional pins on PXA930 */ + MFP_PIN_GSIM_UIO, + MFP_PIN_GSIM_UCLK, + MFP_PIN_GSIM_UDET, + MFP_PIN_GSIM_nURST, + MFP_PIN_PMIC_INT, + MFP_PIN_RDY, + + /* additional pins on MMP2 */ + MFP_PIN_TWSI1_SCL, + MFP_PIN_TWSI1_SDA, + MFP_PIN_TWSI4_SCL, + MFP_PIN_TWSI4_SDA, + MFP_PIN_CLK_REQ, + + MFP_PIN_MAX, +}; + +/* + * a possible MFP configuration is represented by a 32-bit integer + * + * bit 0.. 9 - MFP Pin Number (1024 Pins Maximum) + * bit 10..12 - Alternate Function Selection + * bit 13..15 - Drive Strength + * bit 16..18 - Low Power Mode State + * bit 19..20 - Low Power Mode Edge Detection + * bit 21..22 - Run Mode Pull State + * + * to facilitate the definition, the following macros are provided + * + * MFP_CFG_DEFAULT - default MFP configuration value, with + * alternate function = 0, + * drive strength = fast 3mA (MFP_DS03X) + * low power mode = default + * edge detection = none + * + * MFP_CFG - default MFPR value with alternate function + * MFP_CFG_DRV - default MFPR value with alternate function and + * pin drive strength + * MFP_CFG_LPM - default MFPR value with alternate function and + * low power mode + * MFP_CFG_X - default MFPR value with alternate function, + * pin drive strength and low power mode + */ + +typedef unsigned long mfp_cfg_t; + +#define MFP_PIN(x) ((x) & 0x3ff) + +#define MFP_AF0 (0x0 << 10) +#define MFP_AF1 (0x1 << 10) +#define MFP_AF2 (0x2 << 10) +#define MFP_AF3 (0x3 << 10) +#define MFP_AF4 (0x4 << 10) +#define MFP_AF5 (0x5 << 10) +#define MFP_AF6 (0x6 << 10) +#define MFP_AF7 (0x7 << 10) +#define MFP_AF_MASK (0x7 << 10) +#define MFP_AF(x) (((x) >> 10) & 0x7) + +#define MFP_DS01X (0x0 << 13) +#define MFP_DS02X (0x1 << 13) +#define MFP_DS03X (0x2 << 13) +#define MFP_DS04X (0x3 << 13) +#define MFP_DS06X (0x4 << 13) +#define MFP_DS08X (0x5 << 13) +#define MFP_DS10X (0x6 << 13) +#define MFP_DS13X (0x7 << 13) +#define MFP_DS_MASK (0x7 << 13) +#define MFP_DS(x) (((x) >> 13) & 0x7) + +#define MFP_LPM_DEFAULT (0x0 << 16) +#define MFP_LPM_DRIVE_LOW (0x1 << 16) +#define MFP_LPM_DRIVE_HIGH (0x2 << 16) +#define MFP_LPM_PULL_LOW (0x3 << 16) +#define MFP_LPM_PULL_HIGH (0x4 << 16) +#define MFP_LPM_FLOAT (0x5 << 16) +#define MFP_LPM_INPUT (0x6 << 16) +#define MFP_LPM_STATE_MASK (0x7 << 16) +#define MFP_LPM_STATE(x) (((x) >> 16) & 0x7) + +#define MFP_LPM_EDGE_NONE (0x0 << 19) +#define MFP_LPM_EDGE_RISE (0x1 << 19) +#define MFP_LPM_EDGE_FALL (0x2 << 19) +#define MFP_LPM_EDGE_BOTH (0x3 << 19) +#define MFP_LPM_EDGE_MASK (0x3 << 19) +#define MFP_LPM_EDGE(x) (((x) >> 19) & 0x3) + +#define MFP_PULL_NONE (0x0 << 21) +#define MFP_PULL_LOW (0x1 << 21) +#define MFP_PULL_HIGH (0x2 << 21) +#define MFP_PULL_BOTH (0x3 << 21) +#define MFP_PULL_FLOAT (0x4 << 21) +#define MFP_PULL_MASK (0x7 << 21) +#define MFP_PULL(x) (((x) >> 21) & 0x7) + +#define MFP_CFG_DEFAULT (MFP_AF0 | MFP_DS03X | MFP_LPM_DEFAULT |\ + MFP_LPM_EDGE_NONE | MFP_PULL_NONE) + +#define MFP_CFG(pin, af) \ + ((MFP_CFG_DEFAULT & ~MFP_AF_MASK) |\ + (MFP_PIN(MFP_PIN_##pin) | MFP_##af)) + +#define MFP_CFG_DRV(pin, af, drv) \ + ((MFP_CFG_DEFAULT & ~(MFP_AF_MASK | MFP_DS_MASK)) |\ + (MFP_PIN(MFP_PIN_##pin) | MFP_##af | MFP_##drv)) + +#define MFP_CFG_LPM(pin, af, lpm) \ + ((MFP_CFG_DEFAULT & ~(MFP_AF_MASK | MFP_LPM_STATE_MASK)) |\ + (MFP_PIN(MFP_PIN_##pin) | MFP_##af | MFP_LPM_##lpm)) + +#define MFP_CFG_X(pin, af, drv, lpm) \ + ((MFP_CFG_DEFAULT & ~(MFP_AF_MASK | MFP_DS_MASK | MFP_LPM_STATE_MASK)) |\ + (MFP_PIN(MFP_PIN_##pin) | MFP_##af | MFP_##drv | MFP_LPM_##lpm)) + +#if defined(CONFIG_PXA3xx) || defined(CONFIG_ARCH_MMP) +/* + * each MFP pin will have a MFPR register, since the offset of the + * register varies between processors, the processor specific code + * should initialize the pin offsets by mfp_init() + * + * mfp_init_base() - accepts a virtual base for all MFPR registers and + * initialize the MFP table to a default state + * + * mfp_init_addr() - accepts a table of "mfp_addr_map" structure, which + * represents a range of MFP pins from "start" to "end", with the offset + * beginning at "offset", to define a single pin, let "end" = -1. + * + * use + * + * MFP_ADDR_X() to define a range of pins + * MFP_ADDR() to define a single pin + * MFP_ADDR_END to signal the end of pin offset definitions + */ +struct mfp_addr_map { + unsigned int start; + unsigned int end; + unsigned long offset; +}; + +#define MFP_ADDR_X(start, end, offset) \ + { MFP_PIN_##start, MFP_PIN_##end, offset } + +#define MFP_ADDR(pin, offset) \ + { MFP_PIN_##pin, -1, offset } + +#define MFP_ADDR_END { MFP_PIN_INVALID, 0 } + +void mfp_init_base(void __iomem *mfpr_base); +void mfp_init_addr(struct mfp_addr_map *map); + +/* + * mfp_{read, write}() - for direct read/write access to the MFPR register + * mfp_config() - for configuring a group of MFPR registers + * mfp_config_lpm() - configuring all low power MFPR registers for suspend + * mfp_config_run() - configuring all run time MFPR registers after resume + */ +unsigned long mfp_read(int mfp); +void mfp_write(int mfp, unsigned long mfpr_val); +void mfp_config(unsigned long *mfp_cfgs, int num); +void mfp_config_run(void); +void mfp_config_lpm(void); +#endif /* CONFIG_PXA3xx || CONFIG_ARCH_MMP */ + +#endif /* __ASM_PLAT_MFP_H */ -- cgit From 3b5eed3c71a2fb60aa4405ad92a2a6ad2677f220 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 3 May 2022 13:54:58 -0700 Subject: netfs: Eliminate Clang randstruct warning Clang's structure layout randomization feature gets upset when it sees struct inode (which is randomized) cast to struct netfs_i_context. This is due to seeing the inode pointer as being treated as an array of inodes, rather than "something else, following struct inode". Since netfs can't use container_of() (since it doesn't know what the true containing struct is), it uses this direct offset instead. Adjust the code to better reflect what is happening: an arbitrary pointer is being adjusted and cast to something else: use a "void *" for the math. The resulting binary output is the same, but Clang no longer sees an unexpected cross-structure cast: In file included from ../fs/nfs/inode.c:50: In file included from ../fs/nfs/fscache.h:15: In file included from ../include/linux/fscache.h:18: ../include/linux/netfs.h:298:9: error: casting from randomized structure pointer type 'struct inode *' to 'struct netfs_i_context *' return (struct netfs_i_context *)(inode + 1); ^ 1 error generated. Cc: David Howells Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220503205503.3054173-2-keescook@chromium.org Reviewed-by: Jeff Layton Link: https://lore.kernel.org/lkml/7562f8eccd7cc0e447becfe9912179088784e3b9.camel@kernel.org --- include/linux/netfs.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index c7bf1eaf51d5..0c33b715cbfd 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -295,7 +295,7 @@ extern void netfs_stats_show(struct seq_file *); */ static inline struct netfs_i_context *netfs_i_context(struct inode *inode) { - return (struct netfs_i_context *)(inode + 1); + return (void *)inode + sizeof(*inode); } /** @@ -307,7 +307,7 @@ static inline struct netfs_i_context *netfs_i_context(struct inode *inode) */ static inline struct inode *netfs_inode(struct netfs_i_context *ctx) { - return ((struct inode *)ctx) - 1; + return (void *)ctx - sizeof(struct inode); } /** -- cgit From 595b893e2087de306d0781795fb8ec47873596a6 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 3 May 2022 13:55:00 -0700 Subject: randstruct: Reorganize Kconfigs and attribute macros In preparation for Clang supporting randstruct, reorganize the Kconfigs, move the attribute macros, and generalize the feature to be named CONFIG_RANDSTRUCT for on/off, CONFIG_RANDSTRUCT_FULL for the full randomization mode, and CONFIG_RANDSTRUCT_PERFORMANCE for the cache-line sized mode. Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220503205503.3054173-4-keescook@chromium.org --- include/linux/compiler-gcc.h | 8 -------- include/linux/compiler_types.h | 14 +++++++------- include/linux/vermagic.h | 8 ++++---- 3 files changed, 11 insertions(+), 19 deletions(-) (limited to 'include/linux') diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index 52299c957c98..a0c55eeaeaf1 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -66,14 +66,6 @@ __builtin_unreachable(); \ } while (0) -#if defined(RANDSTRUCT_PLUGIN) && !defined(__CHECKER__) -#define __randomize_layout __attribute__((randomize_layout)) -#define __no_randomize_layout __attribute__((no_randomize_layout)) -/* This anon struct can add padding, so only enable it under randstruct. */ -#define randomized_struct_fields_start struct { -#define randomized_struct_fields_end } __randomize_layout; -#endif - /* * GCC 'asm goto' miscompiles certain code sequences: * diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h index 1c2c33ae1b37..d08dfcb0ac68 100644 --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -242,15 +242,15 @@ struct ftrace_likely_data { # define __latent_entropy #endif -#ifndef __randomize_layout +#if defined(RANDSTRUCT) && !defined(__CHECKER__) +# define __randomize_layout __designated_init __attribute__((randomize_layout)) +# define __no_randomize_layout __attribute__((no_randomize_layout)) +/* This anon struct can add padding, so only enable it under randstruct. */ +# define randomized_struct_fields_start struct { +# define randomized_struct_fields_end } __randomize_layout; +#else # define __randomize_layout __designated_init -#endif - -#ifndef __no_randomize_layout # define __no_randomize_layout -#endif - -#ifndef randomized_struct_fields_start # define randomized_struct_fields_start # define randomized_struct_fields_end #endif diff --git a/include/linux/vermagic.h b/include/linux/vermagic.h index 329d63babaeb..efb51a2da599 100644 --- a/include/linux/vermagic.h +++ b/include/linux/vermagic.h @@ -32,11 +32,11 @@ #else #define MODULE_VERMAGIC_MODVERSIONS "" #endif -#ifdef RANDSTRUCT_PLUGIN +#ifdef RANDSTRUCT #include -#define MODULE_RANDSTRUCT_PLUGIN "RANDSTRUCT_PLUGIN_" RANDSTRUCT_HASHED_SEED +#define MODULE_RANDSTRUCT "RANDSTRUCT_" RANDSTRUCT_HASHED_SEED #else -#define MODULE_RANDSTRUCT_PLUGIN +#define MODULE_RANDSTRUCT #endif #define VERMAGIC_STRING \ @@ -44,6 +44,6 @@ MODULE_VERMAGIC_SMP MODULE_VERMAGIC_PREEMPT \ MODULE_VERMAGIC_MODULE_UNLOAD MODULE_VERMAGIC_MODVERSIONS \ MODULE_ARCH_VERMAGIC \ - MODULE_RANDSTRUCT_PLUGIN + MODULE_RANDSTRUCT #endif /* _LINUX_VERMAGIC_H */ -- cgit From be2b34fa9be31c60a95989f984c9a5d40cd781b6 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 3 May 2022 13:55:02 -0700 Subject: randstruct: Move seed generation into scripts/basic/ To enable Clang randstruct support, move the structure layout randomization seed generation out of scripts/gcc-plugins/ into scripts/basic/ so it happens early enough that it can be used by either compiler implementation. The gcc-plugin still builds its own header file, but now does so from the common "randstruct.seed" file. Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220503205503.3054173-6-keescook@chromium.org --- include/linux/vermagic.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/vermagic.h b/include/linux/vermagic.h index efb51a2da599..a54046bf37e5 100644 --- a/include/linux/vermagic.h +++ b/include/linux/vermagic.h @@ -33,7 +33,7 @@ #define MODULE_VERMAGIC_MODVERSIONS "" #endif #ifdef RANDSTRUCT -#include +#include #define MODULE_RANDSTRUCT "RANDSTRUCT_" RANDSTRUCT_HASHED_SEED #else #define MODULE_RANDSTRUCT -- cgit From 9ec79840d6afaf472294588a6bbe145bcdffa28b Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Wed, 27 Apr 2022 18:31:19 +0100 Subject: stackleak: rework stack low bound handling In stackleak_task_init(), stackleak_track_stack(), and __stackleak_erase(), we open-code skipping the STACK_END_MAGIC at the bottom of the stack. Each case is implemented slightly differently, and only the __stackleak_erase() case is commented. In stackleak_task_init() and stackleak_track_stack() we unconditionally add sizeof(unsigned long) to the lowest stack address. In stackleak_task_init() we use end_of_stack() for this, and in stackleak_track_stack() we use task_stack_page(). In __stackleak_erase() we handle this by detecting if `kstack_ptr` has hit the stack end boundary, and if so, conditionally moving it above the magic. This patch adds a new stackleak_task_low_bound() helper which is used in all three cases, which unconditionally adds sizeof(unsigned long) to the lowest address on the task stack, with commentary as to why. This uses end_of_stack() as stackleak_task_init() did prior to this patch, as this is consistent with the code in kernel/fork.c which initializes the STACK_END_MAGIC value. In __stackleak_erase() we no longer need to check whether we've spilled into the STACK_END_MAGIC value, as stackleak_track_stack() ensures that `current->lowest_stack` stops immediately above this, and similarly the poison scan will stop immediately above this. For stackleak_task_init() and stackleak_track_stack() this results in no change to code generation. For __stackleak_erase() the generated assembly is slightly simpler and shorter. Signed-off-by: Mark Rutland Cc: Alexander Popov Cc: Andrew Morton Cc: Andy Lutomirski Cc: Kees Cook Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220427173128.2603085-5-mark.rutland@arm.com --- include/linux/stackleak.h | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stackleak.h b/include/linux/stackleak.h index ccaab2043fcd..67430faa5c51 100644 --- a/include/linux/stackleak.h +++ b/include/linux/stackleak.h @@ -15,9 +15,22 @@ #ifdef CONFIG_GCC_PLUGIN_STACKLEAK #include +/* + * The lowest address on tsk's stack which we can plausibly erase. + */ +static __always_inline unsigned long +stackleak_task_low_bound(const struct task_struct *tsk) +{ + /* + * The lowest unsigned long on the task stack contains STACK_END_MAGIC, + * which we must not corrupt. + */ + return (unsigned long)end_of_stack(tsk) + sizeof(unsigned long); +} + static inline void stackleak_task_init(struct task_struct *t) { - t->lowest_stack = (unsigned long)end_of_stack(t) + sizeof(unsigned long); + t->lowest_stack = stackleak_task_low_bound(t); # ifdef CONFIG_STACKLEAK_METRICS t->prev_lowest_stack = t->lowest_stack; # endif -- cgit From 0cfa2ccd285d98ad62218add2eebdcfff69fb2c0 Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Wed, 27 Apr 2022 18:31:21 +0100 Subject: stackleak: rework stack high bound handling Prior to returning to userspace, we reset current->lowest_stack to a reasonable high bound. Currently we do this by subtracting the arbitrary value `THREAD_SIZE/64` from the top of the stack, for reasons lost to history. Looking at configurations today: * On i386 where THREAD_SIZE is 8K, the bound will be 128 bytes. The pt_regs at the top of the stack is 68 bytes (with 0 to 16 bytes of padding above), and so this covers an additional portion of 44 to 60 bytes. * On x86_64 where THREAD_SIZE is at least 16K (up to 32K with KASAN) the bound will be at least 256 bytes (up to 512 with KASAN). The pt_regs at the top of the stack is 168 bytes, and so this cover an additional 88 bytes of stack (up to 344 with KASAN). * On arm64 where THREAD_SIZE is at least 16K (up to 64K with 64K pages and VMAP_STACK), the bound will be at least 256 bytes (up to 1024 with KASAN). The pt_regs at the top of the stack is 336 bytes, so this can fall within the pt_regs, or can cover an additional 688 bytes of stack. Clearly the `THREAD_SIZE/64` value doesn't make much sense -- in the worst case, this will cause more than 600 bytes of stack to be erased for every syscall, even if actual stack usage were substantially smaller. This patches makes this slightly less nonsensical by consistently resetting current->lowest_stack to the base of the task pt_regs. For clarity and for consistency with the handling of the low bound, the generation of the high bound is split into a helper with commentary explaining why. Since the pt_regs at the top of the stack will be clobbered upon the next exception entry, we don't need to poison these at exception exit. By using task_pt_regs() as the high stack boundary instead of current_top_of_stack() we avoid some redundant poisoning, and the compiler can share the address generation between the poisoning and resetting of `current->lowest_stack`, making the generated code more optimal. It's not clear to me whether the existing `THREAD_SIZE/64` offset was a dodgy heuristic to skip the pt_regs, or whether it was attempting to minimize the number of times stackleak_check_stack() would have to update `current->lowest_stack` when stack usage was shallow at the cost of unconditionally poisoning a small portion of the stack for every exit to userspace. For now I've simply removed the offset, and if we need/want to minimize updates for shallow stack usage it should be easy to add a better heuristic atop, with appropriate commentary so we know what's going on. Signed-off-by: Mark Rutland Cc: Alexander Popov Cc: Andrew Morton Cc: Andy Lutomirski Cc: Kees Cook Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220427173128.2603085-7-mark.rutland@arm.com --- include/linux/stackleak.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/stackleak.h b/include/linux/stackleak.h index 67430faa5c51..467661aeb413 100644 --- a/include/linux/stackleak.h +++ b/include/linux/stackleak.h @@ -28,6 +28,20 @@ stackleak_task_low_bound(const struct task_struct *tsk) return (unsigned long)end_of_stack(tsk) + sizeof(unsigned long); } +/* + * The address immediately after the highest address on tsk's stack which we + * can plausibly erase. + */ +static __always_inline unsigned long +stackleak_task_high_bound(const struct task_struct *tsk) +{ + /* + * The task's pt_regs lives at the top of the task stack and will be + * overwritten by exception entry, so there's no need to erase them. + */ + return (unsigned long)task_pt_regs(tsk); +} + static inline void stackleak_task_init(struct task_struct *t) { t->lowest_stack = stackleak_task_low_bound(t); -- cgit From 77cf2b6dee6680536a3109d894f1b1ccda3fc5bf Mon Sep 17 00:00:00 2001 From: Mark Rutland Date: Wed, 27 Apr 2022 18:31:22 +0100 Subject: stackleak: rework poison scanning Currently we over-estimate the region of stack which must be erased. To determine the region to be erased, we scan downwards for a contiguous block of poison values (or the low bound of the stack). There are a few minor problems with this today: * When we find a block of poison values, we include this block within the region to erase. As this is included within the region to erase, this causes us to redundantly overwrite 'STACKLEAK_SEARCH_DEPTH' (128) bytes with poison. * As the loop condition checks 'poison_count <= depth', it will run an additional iteration after finding the contiguous block of poison, decrementing 'erase_low' once more than necessary. As this is included within the region to erase, this causes us to redundantly overwrite an additional unsigned long with poison. * As we always decrement 'erase_low' after checking an element on the stack, we always include the element below this within the region to erase. As this is included within the region to erase, this causes us to redundantly overwrite an additional unsigned long with poison. Note that this is not a functional problem. As the loop condition checks 'erase_low > task_stack_low', we'll never clobber the STACK_END_MAGIC. As we always decrement 'erase_low' after this, we'll never fail to erase the element immediately above the STACK_END_MAGIC. In total, this can cause us to erase `128 + 2 * sizeof(unsigned long)` bytes more than necessary, which is unfortunate. This patch reworks the logic to find the address immediately above the poisoned region, by finding the lowest non-poisoned address. This is factored into a stackleak_find_top_of_poison() helper both for clarity and so that this can be shared with the LKDTM test in subsequent patches. Signed-off-by: Mark Rutland Cc: Alexander Popov Cc: Andrew Morton Cc: Andy Lutomirski Cc: Kees Cook Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220427173128.2603085-8-mark.rutland@arm.com --- include/linux/stackleak.h | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'include/linux') diff --git a/include/linux/stackleak.h b/include/linux/stackleak.h index 467661aeb413..c36e7a3b45e7 100644 --- a/include/linux/stackleak.h +++ b/include/linux/stackleak.h @@ -42,6 +42,32 @@ stackleak_task_high_bound(const struct task_struct *tsk) return (unsigned long)task_pt_regs(tsk); } +/* + * Find the address immediately above the poisoned region of the stack, where + * that region falls between 'low' (inclusive) and 'high' (exclusive). + */ +static __always_inline unsigned long +stackleak_find_top_of_poison(const unsigned long low, const unsigned long high) +{ + const unsigned int depth = STACKLEAK_SEARCH_DEPTH / sizeof(unsigned long); + unsigned int poison_count = 0; + unsigned long poison_high = high; + unsigned long sp = high; + + while (sp > low && poison_count < depth) { + sp -= sizeof(unsigned long); + + if (*(unsigned long *)sp == STACKLEAK_POISON) { + poison_count++; + } else { + poison_count = 0; + poison_high = sp; + } + } + + return poison_high; +} + static inline void stackleak_task_init(struct task_struct *t) { t->lowest_stack = stackleak_task_low_bound(t); -- cgit From 2e3afb42dd480d755cd7d97ca04586a5616c1a5e Mon Sep 17 00:00:00 2001 From: Willy Tarreau Date: Sun, 8 May 2022 11:37:09 +0200 Subject: blk-mq: remove the error_count from struct request The last two users were floppy.c and ataflop.c respectively, it was verified that no other drivers makes use of this, so let's remove it. Suggested-by: Linus Torvalds Cc: Minh Yuan Cc: Denis Efremov , Cc: Geert Uytterhoeven Cc: Christoph Hellwig Signed-off-by: Willy Tarreau Signed-off-by: Linus Torvalds --- include/linux/blk-mq.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 7aa5c54901a9..9f07061418db 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -163,7 +163,6 @@ struct request { struct rb_node rb_node; /* sort/lookup */ struct bio_vec special_vec; void *completion_data; - int error_count; /* for legacy drivers, don't use */ }; -- cgit From 56f5746c414d92ae8e8314f46760822b4ecf8be3 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 22 Feb 2022 09:40:54 -0500 Subject: namei: Merge page_symlink() and __page_symlink() There are no callers of __page_symlink() left, so we can remove that entry point. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Christian Brauner --- include/linux/fs.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index bbde95387a23..e108aff23a28 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3109,8 +3109,6 @@ extern int page_readlink(struct dentry *, char __user *, int); extern const char *page_get_link(struct dentry *, struct inode *, struct delayed_call *); extern void page_put_link(void *); -extern int __page_symlink(struct inode *inode, const char *symname, int len, - int nofs); extern int page_symlink(struct inode *inode, const char *symname, int len); extern const struct inode_operations page_symlink_inode_operations; extern void kfree_link(void *); -- cgit From 236d93c4bf2d6da83241cc8e4625e89d9604cb43 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 22 Feb 2022 10:40:11 -0500 Subject: fs: Remove AOP_FLAG_NOFS With all users of this flag gone, we can stop testing whether it's set. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/fs.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index e108aff23a28..f81bc5cbcbb6 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -275,10 +275,6 @@ enum positive_aop_returns { AOP_TRUNCATED_PAGE = 0x80001, }; -#define AOP_FLAG_NOFS 0x0002 /* used by filesystem to direct - * helper code (eg buffer layer) - * to clear GFP_FS from alloc */ - /* * oh the beauties of C type declarations. */ -- cgit From de2a931150177957d37e9c975025604f4a1fe853 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 22 Feb 2022 10:47:09 -0500 Subject: fs: Remove aop_flags parameter from netfs_write_begin() There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/netfs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index c7bf1eaf51d5..1c29f317d907 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -276,7 +276,7 @@ struct readahead_control; extern void netfs_readahead(struct readahead_control *); extern int netfs_readpage(struct file *, struct page *); extern int netfs_write_begin(struct file *, struct address_space *, - loff_t, unsigned int, unsigned int, struct folio **, + loff_t, unsigned int, struct folio **, void **); extern void netfs_subreq_terminated(struct netfs_io_subrequest *, ssize_t, bool); -- cgit From b3992d1e2ebcd478e0614494a6abd95e902a029b Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 22 Feb 2022 11:25:12 -0500 Subject: fs: Remove aop flags parameter from block_write_begin() There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/buffer_head.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index bcb4fe9b8575..63e49dfa7738 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -226,7 +226,7 @@ int __block_write_full_page(struct inode *inode, struct page *page, int block_read_full_page(struct page*, get_block_t*); bool block_is_partially_uptodate(struct folio *, size_t from, size_t count); int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len, - unsigned flags, struct page **pagep, get_block_t *get_block); + struct page **pagep, get_block_t *get_block); int __block_write_begin(struct page *page, loff_t pos, unsigned len, get_block_t *get_block); int block_write_end(struct file *, struct address_space *, -- cgit From be3bbbc588118bdc10e21fdd7bfa6ee6b8c2555d Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 22 Feb 2022 11:25:12 -0500 Subject: fs: Remove aop flags parameter from cont_write_begin() There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/buffer_head.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 63e49dfa7738..127b60fad77e 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -238,7 +238,7 @@ int generic_write_end(struct file *, struct address_space *, void page_zero_new_buffers(struct page *page, unsigned from, unsigned to); void clean_page_buffers(struct page *page); int cont_write_begin(struct file *, struct address_space *, loff_t, - unsigned, unsigned, struct page **, void **, + unsigned, struct page **, void **, get_block_t *, loff_t *); int generic_cont_expand_simple(struct inode *inode, loff_t size); int block_commit_write(struct page *page, unsigned from, unsigned to); -- cgit From b7446e7cf15f0926866c8e5de90ab278998bf8c8 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 22 Feb 2022 11:25:12 -0500 Subject: fs: Remove aop flags parameter from grab_cache_page_write_begin() There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/pagemap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 993994cd943a..65ae8f96554b 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -735,7 +735,7 @@ static inline unsigned find_get_pages_tag(struct address_space *mapping, } struct page *grab_cache_page_write_begin(struct address_space *mapping, - pgoff_t index, unsigned flags); + pgoff_t index); /* * Returns locked page at given index in given cache, creating it if needed. -- cgit From 8371f30cf774a20fd627a0f7b1ecf00e8257f3bc Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 22 Feb 2022 11:54:56 -0500 Subject: fs: Remove aop flags parameter from nobh_write_begin() There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/buffer_head.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 127b60fad77e..6e5a64005fef 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -258,7 +258,7 @@ static inline vm_fault_t block_page_mkwrite_return(int err) } sector_t generic_block_bmap(struct address_space *, sector_t, get_block_t *); int block_truncate_page(struct address_space *, loff_t, get_block_t *); -int nobh_write_begin(struct address_space *, loff_t, unsigned, unsigned, +int nobh_write_begin(struct address_space *, loff_t, unsigned len, struct page **, void **, get_block_t*); int nobh_write_end(struct file *, struct address_space *, loff_t, unsigned, unsigned, -- cgit From 9d6b0cd7579844761ed68926eb3073bab1dca87b Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 22 Feb 2022 14:31:43 -0500 Subject: fs: Remove flags parameter from aops->write_begin There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/fs.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index f81bc5cbcbb6..a0e73432526f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -346,7 +346,7 @@ struct address_space_operations { void (*readahead)(struct readahead_control *); int (*write_begin)(struct file *, struct address_space *mapping, - loff_t pos, unsigned len, unsigned flags, + loff_t pos, unsigned len, struct page **pagep, void **fsdata); int (*write_end)(struct file *, struct address_space *mapping, loff_t pos, unsigned len, unsigned copied, @@ -3179,7 +3179,7 @@ extern int noop_fsync(struct file *, loff_t, loff_t, int); extern ssize_t noop_direct_IO(struct kiocb *iocb, struct iov_iter *iter); extern int simple_empty(struct dentry *); extern int simple_write_begin(struct file *file, struct address_space *mapping, - loff_t pos, unsigned len, unsigned flags, + loff_t pos, unsigned len, struct page **pagep, void **fsdata); extern const struct address_space_operations ram_aops; extern int always_delete_dentry(const struct dentry *); -- cgit From 84a1041c60ff8f648a09d28af7b2e50a8f6345ed Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 3 Mar 2022 15:00:20 -0500 Subject: fs: Remove pagecache_write_begin() and pagecache_write_end() These wrappers have no more users; remove them. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/fs.h | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index a0e73432526f..b35ce086a7a1 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -380,18 +380,6 @@ struct address_space_operations { extern const struct address_space_operations empty_aops; -/* - * pagecache_write_begin/pagecache_write_end must be used by general code - * to write into the pagecache. - */ -int pagecache_write_begin(struct file *, struct address_space *mapping, - loff_t pos, unsigned len, unsigned flags, - struct page **pagep, void **fsdata); - -int pagecache_write_end(struct file *, struct address_space *mapping, - loff_t pos, unsigned len, unsigned copied, - struct page *page, void *fsdata); - /** * struct address_space - Contents of a cacheable, mappable object. * @host: Owner, either the inode or the block_device. -- cgit From 65aa6b5a18294b3713a90c120312ed5d63a16b82 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Mon, 4 Apr 2022 11:38:20 -0400 Subject: filemap: Remove obsolete comment in lock_page We no longer need the page's inode pinned. This comment dates back to commit db37648cd6ce ("[PATCH] mm: non syncing lock_page()") which added lock_page_nosync(). That was removed by commit 7eaceaccab5f ("block: remove per-queue plugging") which also made this comment obsolete. Signed-off-by: Miaohe Lin Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/pagemap.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 65ae8f96554b..ab47579af434 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -908,9 +908,6 @@ static inline void folio_lock(struct folio *folio) __folio_lock(folio); } -/* - * lock_page may only be called if we have the page's inode pinned. - */ static inline void lock_page(struct page *page) { struct folio *folio; -- cgit From cd125eeab2de8ef8ca3a0f3a284bc695375c73af Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 4 Apr 2022 13:24:36 -0400 Subject: filemap: Update the folio_lock documentation Add kernel-doc for several functions relating to take the folio lock. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/pagemap.h | 59 +++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 57 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index ab47579af434..60657132080f 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -888,6 +888,18 @@ bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm, void unlock_page(struct page *page); void folio_unlock(struct folio *folio); +/** + * folio_trylock() - Attempt to lock a folio. + * @folio: The folio to attempt to lock. + * + * Sometimes it is undesirable to wait for a folio to be unlocked (eg + * when the locks are being taken in the wrong order, or if making + * progress through a batch of folios is more important than processing + * them in order). Usually folio_lock() is the correct function to call. + * + * Context: Any context. + * Return: Whether the lock was successfully acquired. + */ static inline bool folio_trylock(struct folio *folio) { return likely(!test_and_set_bit_lock(PG_locked, folio_flags(folio, 0))); @@ -901,6 +913,28 @@ static inline int trylock_page(struct page *page) return folio_trylock(page_folio(page)); } +/** + * folio_lock() - Lock this folio. + * @folio: The folio to lock. + * + * The folio lock protects against many things, probably more than it + * should. It is primarily held while a folio is being brought uptodate, + * either from its backing file or from swap. It is also held while a + * folio is being truncated from its address_space, so holding the lock + * is sufficient to keep folio->mapping stable. + * + * The folio lock is also held while write() is modifying the page to + * provide POSIX atomicity guarantees (as long as the write does not + * cross a page boundary). Other modifications to the data in the folio + * do not hold the folio lock and can race with writes, eg DMA and stores + * to mapped pages. + * + * Context: May sleep. If you need to acquire the locks of two or + * more folios, they must be in order of ascending index, if they are + * in the same address_space. If they are in different address_spaces, + * acquire the lock of the folio which belongs to the address_space which + * has the lowest address in memory first. + */ static inline void folio_lock(struct folio *folio) { might_sleep(); @@ -908,6 +942,17 @@ static inline void folio_lock(struct folio *folio) __folio_lock(folio); } +/** + * lock_page() - Lock the folio containing this page. + * @page: The page to lock. + * + * See folio_lock() for a description of what the lock protects. + * This is a legacy function and new code should probably use folio_lock() + * instead. + * + * Context: May sleep. Pages in the same folio share a lock, so do not + * attempt to lock two pages which share a folio. + */ static inline void lock_page(struct page *page) { struct folio *folio; @@ -918,6 +963,16 @@ static inline void lock_page(struct page *page) __folio_lock(folio); } +/** + * folio_lock_killable() - Lock this folio, interruptible by a fatal signal. + * @folio: The folio to lock. + * + * Attempts to lock the folio, like folio_lock(), except that the sleep + * to acquire the lock is interruptible by a fatal signal. + * + * Context: May sleep; see folio_lock(). + * Return: 0 if the lock was acquired; -EINTR if a fatal signal was received. + */ static inline int folio_lock_killable(struct folio *folio) { might_sleep(); @@ -964,8 +1019,8 @@ int folio_wait_bit_killable(struct folio *folio, int bit_nr); * Wait for a folio to be unlocked. * * This must be called with the caller "holding" the folio, - * ie with increased "page->count" so that the folio won't - * go away during the wait.. + * ie with increased folio reference count so that the folio won't + * go away during the wait. */ static inline void folio_wait_locked(struct folio *folio) { -- cgit From 520f301c54faa3484e820b80d4505d48ee587163 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 17 Jan 2022 14:35:22 -0500 Subject: fs: Convert is_dirty_writeback() to take a folio Pass a folio instead of a page to aops->is_dirty_writeback(). Convert both implementations and the caller. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/buffer_head.h | 2 +- include/linux/fs.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 6e5a64005fef..805c4e12700a 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -146,7 +146,7 @@ BUFFER_FNS(Defer_Completion, defer_completion) #define page_has_buffers(page) PagePrivate(page) #define folio_buffers(folio) folio_get_private(folio) -void buffer_check_dirty_writeback(struct page *page, +void buffer_check_dirty_writeback(struct folio *folio, bool *dirty, bool *writeback); /* diff --git a/include/linux/fs.h b/include/linux/fs.h index b35ce086a7a1..2be852661a29 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -369,7 +369,7 @@ struct address_space_operations { int (*launder_folio)(struct folio *); bool (*is_partially_uptodate) (struct folio *, size_t from, size_t count); - void (*is_dirty_writeback) (struct page *, bool *, bool *); + void (*is_dirty_writeback) (struct folio *, bool *dirty, bool *wb); int (*error_remove_page)(struct address_space *, struct page *); /* swapfile support */ -- cgit From 2ebdd1df316636c2faf25a1780e12553adf09cf7 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 17 Mar 2021 22:38:26 -0400 Subject: mm/readahead: Convert page_cache_async_readahead to take a folio Removes a couple of calls to compound_head and saves a few bytes. Also convert verity's read_file_data_page() to be folio-based. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/pagemap.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 60657132080f..b70192f56454 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -1242,7 +1242,7 @@ void page_cache_sync_readahead(struct address_space *mapping, * @mapping: address_space which holds the pagecache and I/O vectors * @ra: file_ra_state which holds the readahead state * @file: Used by the filesystem for authentication. - * @page: The page at @index which triggered the readahead call. + * @folio: The folio at @index which triggered the readahead call. * @index: Index of first page to be read. * @req_count: Total number of pages being read by the caller. * @@ -1254,10 +1254,10 @@ void page_cache_sync_readahead(struct address_space *mapping, static inline void page_cache_async_readahead(struct address_space *mapping, struct file_ra_state *ra, struct file *file, - struct page *page, pgoff_t index, unsigned long req_count) + struct folio *folio, pgoff_t index, unsigned long req_count) { DEFINE_READAHEAD(ractl, file, ra, mapping, index); - page_cache_async_ra(&ractl, page_folio(page), req_count); + page_cache_async_ra(&ractl, folio, req_count); } static inline struct folio *__readahead_folio(struct readahead_control *ractl) -- cgit From 6d97af487dee3176cb1342d4ab16637e495440ad Mon Sep 17 00:00:00 2001 From: Sven Schnelle Date: Wed, 4 May 2022 08:23:50 +0200 Subject: entry: Rename arch_check_user_regs() to arch_enter_from_user_mode() arch_check_user_regs() is used at the moment to verify that struct pt_regs contains valid values when entering the kernel from userspace. s390 needs a place in the generic entry code to modify a cpu data structure when switching from userspace to kernel mode. As arch_check_user_regs() is exactly this, rename it to arch_enter_from_user_mode(). When entering the kernel from userspace, arch_check_user_regs() is used to verify that struct pt_regs contains valid values. Note that the NMI codepath doesn't call this function. s390 needs a place in the generic entry code to modify a cpu data structure when switching from userspace to kernel mode. As arch_check_user_regs() is exactly this, rename it to arch_enter_from_user_mode(). Signed-off-by: Sven Schnelle Reviewed-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Cc: Andy Lutomirski Link: https://lore.kernel.org/r/20220504062351.2954280-2-tmricht@linux.ibm.com Signed-off-by: Heiko Carstens --- include/linux/entry-common.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index ab78bd4c2eb0..c92ac75d6556 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -63,7 +63,7 @@ ARCH_EXIT_TO_USER_MODE_WORK) /** - * arch_check_user_regs - Architecture specific sanity check for user mode regs + * arch_enter_from_user_mode - Architecture specific sanity check for user mode regs * @regs: Pointer to currents pt_regs * * Defaults to an empty implementation. Can be replaced by architecture @@ -73,10 +73,10 @@ * section. Use __always_inline so the compiler cannot push it out of line * and make it instrumentable. */ -static __always_inline void arch_check_user_regs(struct pt_regs *regs); +static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs); -#ifndef arch_check_user_regs -static __always_inline void arch_check_user_regs(struct pt_regs *regs) {} +#ifndef arch_enter_from_user_mode +static __always_inline void arch_enter_from_user_mode(struct pt_regs *regs) {} #endif /** -- cgit From 2fbdf45d7d26361a0c3ec8833fd96edf0f5812da Mon Sep 17 00:00:00 2001 From: Ricardo Martinez Date: Fri, 6 May 2022 11:12:57 -0700 Subject: list: Add list_next_entry_circular() and list_prev_entry_circular() Add macros to get the next or previous entries and wraparound if needed. For example, calling list_next_entry_circular() on the last element should return the first element in the list. Signed-off-by: Ricardo Martinez Reviewed-by: Andy Shevchenko Signed-off-by: David S. Miller --- include/linux/list.h | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'include/linux') diff --git a/include/linux/list.h b/include/linux/list.h index dd6c2041d09c..c147eeb2d39d 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -563,6 +563,19 @@ static inline void list_splice_tail_init(struct list_head *list, #define list_next_entry(pos, member) \ list_entry((pos)->member.next, typeof(*(pos)), member) +/** + * list_next_entry_circular - get the next element in list + * @pos: the type * to cursor. + * @head: the list head to take the element from. + * @member: the name of the list_head within the struct. + * + * Wraparound if pos is the last element (return the first element). + * Note, that list is expected to be not empty. + */ +#define list_next_entry_circular(pos, head, member) \ + (list_is_last(&(pos)->member, head) ? \ + list_first_entry(head, typeof(*(pos)), member) : list_next_entry(pos, member)) + /** * list_prev_entry - get the prev element in list * @pos: the type * to cursor @@ -571,6 +584,19 @@ static inline void list_splice_tail_init(struct list_head *list, #define list_prev_entry(pos, member) \ list_entry((pos)->member.prev, typeof(*(pos)), member) +/** + * list_prev_entry_circular - get the prev element in list + * @pos: the type * to cursor. + * @head: the list head to take the element from. + * @member: the name of the list_head within the struct. + * + * Wraparound if pos is the first element (return the last element). + * Note, that list is expected to be not empty. + */ +#define list_prev_entry_circular(pos, head, member) \ + (list_is_first(&(pos)->member, head) ? \ + list_last_entry(head, typeof(*(pos)), member) : list_prev_entry(pos, member)) + /** * list_for_each - iterate over a list * @pos: the &struct list_head to use as a loop cursor. -- cgit From a4ff365346c98081311c299955d91d657123e8aa Mon Sep 17 00:00:00 2001 From: Ricardo Martinez Date: Fri, 6 May 2022 11:12:58 -0700 Subject: net: skb: introduce skb_data_area_size() Helper to calculate the linear data space in the skb. Signed-off-by: Ricardo Martinez Reviewed-by: Sergey Ryazanov Signed-off-by: David S. Miller --- include/linux/skbuff.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 5c2599e3fe7d..d58669d6cb91 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1665,6 +1665,11 @@ static inline void skb_set_end_offset(struct sk_buff *skb, unsigned int offset) } #endif +static inline unsigned int skb_data_area_size(struct sk_buff *skb) +{ + return skb_end_pointer(skb) - skb->data; +} + struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size, struct ubuf_info *uarg); -- cgit From ca4567f1e6f660f86fcd04f3563c0045b0d4772f Mon Sep 17 00:00:00 2001 From: Alaa Mohamed Date: Thu, 5 May 2022 17:09:57 +0200 Subject: rtnetlink: add extack support in fdb del handlers Add extack support to .ndo_fdb_del in netdevice.h and all related methods. Signed-off-by: Alaa Mohamed Signed-off-by: David S. Miller --- include/linux/netdevice.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 8cf0ac616cb9..74c97a34921d 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1513,7 +1513,7 @@ struct net_device_ops { struct nlattr *tb[], struct net_device *dev, const unsigned char *addr, - u16 vid); + u16 vid, struct netlink_ext_ack *extack); int (*ndo_fdb_del_bulk)(struct ndmsg *ndm, struct nlattr *tb[], struct net_device *dev, -- cgit From 90532850eb210dff70423eaf20e323a91ef542d8 Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Fri, 6 May 2022 06:23:52 +0200 Subject: net: phy: introduce genphy_c45_pma_baset1_setup_master_slave() Move baset1 specific part of genphy_c45_pma_setup_forced() code to separate function to make it reusable by PHY drivers. Signed-off-by: Oleksij Rempel Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- include/linux/phy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 2d12054932ba..d3f924d3b235 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1614,6 +1614,7 @@ int genphy_c45_read_link(struct phy_device *phydev); int genphy_c45_read_lpa(struct phy_device *phydev); int genphy_c45_read_pma(struct phy_device *phydev); int genphy_c45_pma_setup_forced(struct phy_device *phydev); +int genphy_c45_pma_baset1_setup_master_slave(struct phy_device *phydev); int genphy_c45_an_config_aneg(struct phy_device *phydev); int genphy_c45_an_disable_aneg(struct phy_device *phydev); int genphy_c45_read_mdix(struct phy_device *phydev); -- cgit From b9a366f3d874ce1c9c0148ec399f2ce4ab05a467 Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Fri, 6 May 2022 06:23:54 +0200 Subject: net: phy: introduce genphy_c45_pma_baset1_read_master_slave() Move baset1 specific part of genphy_c45_read_pma() code to separate function to make it reusable by PHY drivers. Signed-off-by: Oleksij Rempel Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- include/linux/phy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index d3f924d3b235..4713c95d65fb 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1619,6 +1619,7 @@ int genphy_c45_an_config_aneg(struct phy_device *phydev); int genphy_c45_an_disable_aneg(struct phy_device *phydev); int genphy_c45_read_mdix(struct phy_device *phydev); int genphy_c45_pma_read_abilities(struct phy_device *phydev); +int genphy_c45_pma_baset1_read_master_slave(struct phy_device *phydev); int genphy_c45_read_status(struct phy_device *phydev); int genphy_c45_config_aneg(struct phy_device *phydev); int genphy_c45_loopback(struct phy_device *phydev, bool enable); -- cgit From 2013ad8836aceb828c0fb5b16fa8bb98fd3d9b28 Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Fri, 6 May 2022 06:23:56 +0200 Subject: net: phy: export genphy_c45_baset1_read_status() Export genphy_c45_baset1_read_status() to make it reusable by PHY drivers. Signed-off-by: Oleksij Rempel Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- include/linux/phy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 4713c95d65fb..508f1149665b 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1621,6 +1621,7 @@ int genphy_c45_read_mdix(struct phy_device *phydev); int genphy_c45_pma_read_abilities(struct phy_device *phydev); int genphy_c45_pma_baset1_read_master_slave(struct phy_device *phydev); int genphy_c45_read_status(struct phy_device *phydev); +int genphy_c45_baset1_read_status(struct phy_device *phydev); int genphy_c45_config_aneg(struct phy_device *phydev); int genphy_c45_loopback(struct phy_device *phydev, bool enable); int genphy_c45_pma_resume(struct phy_device *phydev); -- cgit From 0257be79fc4a16a3252ce80aa13b3640f728c425 Mon Sep 17 00:00:00 2001 From: Michael Walle Date: Fri, 29 Apr 2022 12:20:18 +0200 Subject: mtd: spi-nor: expose internal parameters via debugfs There is no way to gather all information to verify support for a new flash chip. Also if you want to convert an existing flash chip to the new SFDP parsing, there is not enough information to determine if the flash will work like before. To ease this development, expose internal parameters via the debugfs. Signed-off-by: Michael Walle Signed-off-by: Pratyush Yadav Reviewed-by: Pratyush Yadav Link: https://lore.kernel.org/r/20220429102018.2361038-2-michael@walle.cc --- include/linux/mtd/spi-nor.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mtd/spi-nor.h b/include/linux/mtd/spi-nor.h index e505c4a5c530..1ede4c89805a 100644 --- a/include/linux/mtd/spi-nor.h +++ b/include/linux/mtd/spi-nor.h @@ -363,6 +363,7 @@ struct spi_nor_flash_parameter; * @write_proto: the SPI protocol for write operations * @reg_proto: the SPI protocol for read_reg/write_reg/erase operations * @sfdp: the SFDP data of the flash + * @debugfs_root: pointer to the debugfs directory * @controller_ops: SPI NOR controller driver specific operations. * @params: [FLASH-SPECIFIC] SPI NOR flash parameters and settings. * The structure includes legacy flash parameters and @@ -392,6 +393,7 @@ struct spi_nor { u32 flags; enum spi_nor_cmd_ext cmd_ext_type; struct sfdp *sfdp; + struct dentry *debugfs_root; const struct spi_nor_controller_ops *controller_ops; -- cgit From b1c5f3085149e9643b125eb10aae0e74644d7dcc Mon Sep 17 00:00:00 2001 From: Ricky WU Date: Mon, 21 Mar 2022 11:18:30 +0000 Subject: misc: rtsx: add rts5261 efuse function move rts5261_fetch_vendor_settings() to rts5261_init_from_hw() make sure it be called from S3 or D3 add more register setting when efuse is set read efuse setting to register on init flow Signed-off-by: Ricky Wu Link: https://lore.kernel.org/r/18101ecb0f0749ccb9f564eda171ba40@realtek.com Signed-off-by: Greg Kroah-Hartman --- include/linux/rtsx_pci.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rtsx_pci.h b/include/linux/rtsx_pci.h index 3d780b44e678..534038d962e4 100644 --- a/include/linux/rtsx_pci.h +++ b/include/linux/rtsx_pci.h @@ -1067,6 +1067,9 @@ #define PCR_SETTING_REG1 0x724 #define PCR_SETTING_REG2 0x814 #define PCR_SETTING_REG3 0x747 +#define PCR_SETTING_REG4 0x818 +#define PCR_SETTING_REG5 0x81C + #define rtsx_pci_init_cmd(pcr) ((pcr)->ci = 0) -- cgit From dbc2f62061c6bfba0aee93161ee3194dcee84bd0 Mon Sep 17 00:00:00 2001 From: Rafał Miłecki Date: Fri, 29 Apr 2022 17:26:46 +0100 Subject: nvmem: core: support passing DT node in cell info MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Some hardware may have NVMEM cells described in Device Tree using individual nodes. Let drivers pass such nodes to the NVMEM subsystem so they can be later used by NVMEM consumers. Signed-off-by: Rafał Miłecki Signed-off-by: Srinivas Kandagatla Link: https://lore.kernel.org/r/20220429162701.2222-2-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/nvmem-consumer.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/nvmem-consumer.h b/include/linux/nvmem-consumer.h index c0c0cefc3b92..980f9c9ac0bc 100644 --- a/include/linux/nvmem-consumer.h +++ b/include/linux/nvmem-consumer.h @@ -25,6 +25,7 @@ struct nvmem_cell_info { unsigned int bytes; unsigned int bit_offset; unsigned int nbits; + struct device_node *np; }; /** -- cgit From 5efe7448a1426250b5747c10ad438517f44f1e51 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 29 Apr 2022 08:43:23 -0400 Subject: fs: Introduce aops->read_folio Change all the callers of ->readpage to call ->read_folio in preference, if it exists. This is a transitional duplication, and will be removed by the end of the series. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 2be852661a29..5ad942183a2c 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -336,6 +336,7 @@ static inline bool is_sync_kiocb(struct kiocb *kiocb) struct address_space_operations { int (*writepage)(struct page *page, struct writeback_control *wbc); int (*readpage)(struct file *, struct page *); + int (*read_folio)(struct file *, struct folio *); /* Write back some dirty pages from this mapping. */ int (*writepages)(struct address_space *, struct writeback_control *); -- cgit From 6c62371b7fd77628feb5b806bc29433caecedff8 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 29 Apr 2022 08:49:28 -0400 Subject: fs: Convert netfs_readpage to netfs_read_folio This is straightforward because netfs already worked in terms of folios. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/netfs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 1c29f317d907..4bd5ee709daa 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -274,7 +274,7 @@ struct netfs_cache_ops { struct readahead_control; extern void netfs_readahead(struct readahead_control *); -extern int netfs_readpage(struct file *, struct page *); +int netfs_read_folio(struct file *, struct folio *); extern int netfs_write_begin(struct file *, struct address_space *, loff_t, unsigned int, struct folio **, void **); -- cgit From 7479c505b4ab5ed5f81f35fdd68c44c58d6f0439 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 29 Apr 2022 08:54:32 -0400 Subject: fs: Convert iomap_readpage to iomap_read_folio A straightforward conversion as iomap_readpage already worked in folios. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/iomap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/iomap.h b/include/linux/iomap.h index b76f0dd149fb..5b2aa45ddda3 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -225,7 +225,7 @@ static inline const struct iomap *iomap_iter_srcmap(const struct iomap_iter *i) ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from, const struct iomap_ops *ops); -int iomap_readpage(struct page *page, const struct iomap_ops *ops); +int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops); void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops); bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count); int iomap_releasepage(struct page *page, gfp_t gfp_mask); -- cgit From 2c69e2057962b6bd76d72446453862eb59325b49 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 29 Apr 2022 10:40:40 -0400 Subject: fs: Convert block_read_full_page() to block_read_full_folio() This function is NOT converted to handle large folios, so include an assert that the filesystem isn't passing one in. Otherwise, use the folio functions instead of the page functions, where they exist. Convert all filesystems which use block_read_full_page(). Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/buffer_head.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 805c4e12700a..31d82fd9abe8 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -223,7 +223,7 @@ int block_write_full_page(struct page *page, get_block_t *get_block, int __block_write_full_page(struct inode *inode, struct page *page, get_block_t *get_block, struct writeback_control *wbc, bh_end_io_t *handler); -int block_read_full_page(struct page*, get_block_t*); +int block_read_full_folio(struct folio *, get_block_t *); bool block_is_partially_uptodate(struct folio *, size_t from, size_t count); int block_write_begin(struct address_space *mapping, loff_t pos, unsigned len, struct page **pagep, get_block_t *get_block); -- cgit From f132ab7d3ab03c5bae28d31fb80ba77c4da05500 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 29 Apr 2022 11:47:39 -0400 Subject: fs: Convert mpage_readpage to mpage_read_folio mpage_readpage still works in terms of pages, and has not been audited for correctness with large folios, so include an assertion that the filesystem is not passing it large folios. Convert all the filesystems to call mpage_read_folio() instead of mpage_readpage(). Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/mpage.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mpage.h b/include/linux/mpage.h index f4f5e90a6844..43986f7ec4dd 100644 --- a/include/linux/mpage.h +++ b/include/linux/mpage.h @@ -16,7 +16,7 @@ struct writeback_control; struct readahead_control; void mpage_readahead(struct readahead_control *, get_block_t get_block); -int mpage_readpage(struct page *page, get_block_t get_block); +int mpage_read_folio(struct folio *folio, get_block_t get_block); int mpage_writepages(struct address_space *mapping, struct writeback_control *wbc, get_block_t get_block); int mpage_writepage(struct page *page, get_block_t *get_block, -- cgit From 65d023af7f29eb1250a6105141a74776bae7e1f8 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 29 Apr 2022 11:12:16 -0400 Subject: nfs: Convert nfs to read_folio This is a "weak" conversion which converts straight back to using pages. A full conversion should be performed at some point, hopefully by someone familiar with the filesystem. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/nfs_fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index b48b9259e02c..1bba71757d62 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -594,7 +594,7 @@ static inline bool nfs_have_writebacks(const struct inode *inode) /* * linux/fs/nfs/read.c */ -extern int nfs_readpage(struct file *, struct page *); +int nfs_read_folio(struct file *, struct folio *); void nfs_readahead(struct readahead_control *); /* -- cgit From 7e0a126519b82648b254afcd95a168c15f65ea40 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 29 Apr 2022 11:53:28 -0400 Subject: mm,fs: Remove aops->readpage With all implementations of aops->readpage converted to aops->read_folio, we can stop checking whether it's set and remove the member from aops. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/fs.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 5ad942183a2c..f812f5aa07dd 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -262,7 +262,7 @@ struct iattr { * trying again. The aop will be taking reasonable * precautions not to livelock. If the caller held a page * reference, it should drop it before retrying. Returned - * by readpage(). + * by read_folio(). * * address_space_operation functions return these large constants to indicate * special semantics to the caller. These are much larger than the bytes in a @@ -335,7 +335,6 @@ static inline bool is_sync_kiocb(struct kiocb *kiocb) struct address_space_operations { int (*writepage)(struct page *page, struct writeback_control *wbc); - int (*readpage)(struct file *, struct page *); int (*read_folio)(struct file *, struct folio *); /* Write back some dirty pages from this mapping. */ -- cgit From e9b5b23e957ef9260fec811d8d8081125889308a Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sun, 1 May 2022 21:39:29 -0400 Subject: fs: Change the type of filler_t By making filler_t the same as read_folio, we can use the same function for both in gfs2. We can push the use of folios down one more level in jffs2 and nfs. We also increase type safety for future users of the various read_cache_page() family of functions by forcing the parameter to be a pointer to struct file (or NULL). Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Andreas Gruenbacher --- include/linux/pagemap.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index b70192f56454..831b28dab01a 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -492,7 +492,7 @@ static inline gfp_t readahead_gfp_mask(struct address_space *x) return mapping_gfp_mask(x) | __GFP_NORETRY | __GFP_NOWARN; } -typedef int filler_t(void *, struct page *); +typedef int filler_t(struct file *, struct folio *); pgoff_t page_cache_next_miss(struct address_space *mapping, pgoff_t index, unsigned long max_scan); @@ -747,9 +747,9 @@ static inline struct page *grab_cache_page(struct address_space *mapping, } struct folio *read_cache_folio(struct address_space *, pgoff_t index, - filler_t *filler, void *data); + filler_t *filler, struct file *file); struct page *read_cache_page(struct address_space *, pgoff_t index, - filler_t *filler, void *data); + filler_t *filler, struct file *file); extern struct page * read_cache_page_gfp(struct address_space *mapping, pgoff_t index, gfp_t gfp_mask); -- cgit From 218d921b581eadf312c8ef0e09113b111f104eeb Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Sat, 30 Apr 2022 22:08:54 -0700 Subject: fscrypt: add new helper functions for test_dummy_encryption Unfortunately the design of fscrypt_set_test_dummy_encryption() doesn't work properly for the new mount API, as it combines too many steps into one function: - Parse the argument to test_dummy_encryption - Check the setting against the filesystem instance - Apply the setting to the filesystem instance The new mount API has split these into separate steps. ext4 partially worked around this by duplicating some of the logic, but it still had some bugs. To address this, add some new helper functions that split up the steps of fscrypt_set_test_dummy_encryption(): - fscrypt_parse_test_dummy_encryption() - fscrypt_dummy_policies_equal() - fscrypt_add_test_dummy_key() While we're add it, also add a function fscrypt_is_dummy_policy_set() which will be useful to avoid some #ifdef's. Signed-off-by: Eric Biggers Link: https://lore.kernel.org/r/20220501050857.538984-5-ebiggers@kernel.org --- include/linux/fscrypt.h | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h index efc7f96e5e26..e60d57c99cb6 100644 --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -32,6 +32,7 @@ union fscrypt_policy; struct fscrypt_info; +struct fs_parameter; struct seq_file; struct fscrypt_str { @@ -289,10 +290,19 @@ struct fscrypt_dummy_policy { const union fscrypt_policy *policy; }; +int fscrypt_parse_test_dummy_encryption(const struct fs_parameter *param, + struct fscrypt_dummy_policy *dummy_policy); +bool fscrypt_dummy_policies_equal(const struct fscrypt_dummy_policy *p1, + const struct fscrypt_dummy_policy *p2); int fscrypt_set_test_dummy_encryption(struct super_block *sb, const char *arg, struct fscrypt_dummy_policy *dummy_policy); void fscrypt_show_test_dummy_encryption(struct seq_file *seq, char sep, struct super_block *sb); +static inline bool +fscrypt_is_dummy_policy_set(const struct fscrypt_dummy_policy *dummy_policy) +{ + return dummy_policy->policy != NULL; +} static inline void fscrypt_free_dummy_policy(struct fscrypt_dummy_policy *dummy_policy) { @@ -303,6 +313,8 @@ fscrypt_free_dummy_policy(struct fscrypt_dummy_policy *dummy_policy) /* keyring.c */ void fscrypt_sb_free(struct super_block *sb); int fscrypt_ioctl_add_key(struct file *filp, void __user *arg); +int fscrypt_add_test_dummy_key(struct super_block *sb, + const struct fscrypt_dummy_policy *dummy_policy); int fscrypt_ioctl_remove_key(struct file *filp, void __user *arg); int fscrypt_ioctl_remove_key_all_users(struct file *filp, void __user *arg); int fscrypt_ioctl_get_key_status(struct file *filp, void __user *arg); @@ -477,12 +489,32 @@ static inline int fscrypt_set_context(struct inode *inode, void *fs_data) struct fscrypt_dummy_policy { }; +static inline int +fscrypt_parse_test_dummy_encryption(const struct fs_parameter *param, + struct fscrypt_dummy_policy *dummy_policy) +{ + return -EINVAL; +} + +static inline bool +fscrypt_dummy_policies_equal(const struct fscrypt_dummy_policy *p1, + const struct fscrypt_dummy_policy *p2) +{ + return true; +} + static inline void fscrypt_show_test_dummy_encryption(struct seq_file *seq, char sep, struct super_block *sb) { } +static inline bool +fscrypt_is_dummy_policy_set(const struct fscrypt_dummy_policy *dummy_policy) +{ + return false; +} + static inline void fscrypt_free_dummy_policy(struct fscrypt_dummy_policy *dummy_policy) { @@ -498,6 +530,13 @@ static inline int fscrypt_ioctl_add_key(struct file *filp, void __user *arg) return -EOPNOTSUPP; } +static inline int +fscrypt_add_test_dummy_key(struct super_block *sb, + const struct fscrypt_dummy_policy *dummy_policy) +{ + return 0; +} + static inline int fscrypt_ioctl_remove_key(struct file *filp, void __user *arg) { return -EOPNOTSUPP; -- cgit From 623a1ddfeb232526275ddd0c8378771e6712aad4 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 9 May 2022 18:20:42 -0700 Subject: mm/hugetlb: take src_mm->write_protect_seq in copy_hugetlb_page_range() Let's do it just like copy_page_range(), taking the seqlock and making sure the mmap_lock is held in write mode. This allows for add a VM_BUG_ON to page_needs_cow_for_dma() and properly synchronizes concurrent fork() with GUP-fast of hugetlb pages, which will be relevant for further changes. Link: https://lkml.kernel.org/r/20220428083441.37290-3-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Vlastimil Babka Cc: Andrea Arcangeli Cc: Christoph Hellwig Cc: David Rientjes Cc: Don Dutile Cc: Hugh Dickins Cc: Jan Kara Cc: Jann Horn Cc: Jason Gunthorpe Cc: John Hubbard Cc: Khalid Aziz Cc: "Kirill A. Shutemov" Cc: Liang Zhang Cc: "Matthew Wilcox (Oracle)" Cc: Michal Hocko Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Cc: Oded Gabbay Cc: Oleg Nesterov Cc: Pedro Demarchi Gomes Cc: Peter Xu Cc: Rik van Riel Cc: Roman Gushchin Cc: Shakeel Butt Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/mm.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index b9316b2c11ce..a02812178562 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1576,6 +1576,8 @@ static inline bool page_maybe_dma_pinned(struct page *page) /* * This should most likely only be called during fork() to see whether we * should break the cow immediately for a page on the src mm. + * + * The caller has to hold the PT lock and the vma->vm_mm->->write_protect_seq. */ static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma, struct page *page) @@ -1583,6 +1585,8 @@ static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma, if (!is_cow_mapping(vma->vm_flags)) return false; + VM_BUG_ON(!(raw_read_seqcount(&vma->vm_mm->write_protect_seq) & 1)); + if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags)) return false; -- cgit From fb3d824d1a46c5bb0584ea88f32dc2495544aebf Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 9 May 2022 18:20:43 -0700 Subject: mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap() ... and move the special check for pinned pages into page_try_dup_anon_rmap() to prepare for tracking exclusive anonymous pages via a new pageflag, clearing it only after making sure that there are no GUP pins on the anonymous page. We really only care about pins on anonymous pages, because they are prone to getting replaced in the COW handler once mapped R/O. For !anon pages in cow-mappings (!VM_SHARED && VM_MAYWRITE) we shouldn't really care about that, at least not that I could come up with an example. Let's drop the is_cow_mapping() check from page_needs_cow_for_dma(), as we know we're dealing with anonymous pages. Also, drop the handling of pinned pages from copy_huge_pud() and add a comment if ever supporting anonymous pages on the PUD level. This is a preparation for tracking exclusivity of anonymous pages in the rmap code, and disallowing marking a page shared (-> failing to duplicate) if there are GUP pins on a page. Link: https://lkml.kernel.org/r/20220428083441.37290-5-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Vlastimil Babka Cc: Andrea Arcangeli Cc: Christoph Hellwig Cc: David Rientjes Cc: Don Dutile Cc: Hugh Dickins Cc: Jan Kara Cc: Jann Horn Cc: Jason Gunthorpe Cc: John Hubbard Cc: Khalid Aziz Cc: "Kirill A. Shutemov" Cc: Liang Zhang Cc: "Matthew Wilcox (Oracle)" Cc: Michal Hocko Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Cc: Oded Gabbay Cc: Oleg Nesterov Cc: Pedro Demarchi Gomes Cc: Peter Xu Cc: Rik van Riel Cc: Roman Gushchin Cc: Shakeel Butt Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/mm.h | 5 +---- include/linux/rmap.h | 49 ++++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 49 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index a02812178562..9ee3ae51e8e3 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1575,16 +1575,13 @@ static inline bool page_maybe_dma_pinned(struct page *page) /* * This should most likely only be called during fork() to see whether we - * should break the cow immediately for a page on the src mm. + * should break the cow immediately for an anon page on the src mm. * * The caller has to hold the PT lock and the vma->vm_mm->->write_protect_seq. */ static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma, struct page *page) { - if (!is_cow_mapping(vma->vm_flags)) - return false; - VM_BUG_ON(!(raw_read_seqcount(&vma->vm_mm->write_protect_seq) & 1)); if (!test_bit(MMF_HAS_PINNED, &vma->vm_mm->flags)) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 8573aae50d96..73f41544084f 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -12,6 +12,7 @@ #include #include #include +#include /* * The anon_vma heads a list of private "related" vmas, to scan if @@ -182,11 +183,57 @@ void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *, void hugepage_add_new_anon_rmap(struct page *, struct vm_area_struct *, unsigned long address); -static inline void page_dup_rmap(struct page *page, bool compound) +static inline void __page_dup_rmap(struct page *page, bool compound) { atomic_inc(compound ? compound_mapcount_ptr(page) : &page->_mapcount); } +static inline void page_dup_file_rmap(struct page *page, bool compound) +{ + __page_dup_rmap(page, compound); +} + +/** + * page_try_dup_anon_rmap - try duplicating a mapping of an already mapped + * anonymous page + * @page: the page to duplicate the mapping for + * @compound: the page is mapped as compound or as a small page + * @vma: the source vma + * + * The caller needs to hold the PT lock and the vma->vma_mm->write_protect_seq. + * + * Duplicating the mapping can only fail if the page may be pinned; device + * private pages cannot get pinned and consequently this function cannot fail. + * + * If duplicating the mapping succeeds, the page has to be mapped R/O into + * the parent and the child. It must *not* get mapped writable after this call. + * + * Returns 0 if duplicating the mapping succeeded. Returns -EBUSY otherwise. + */ +static inline int page_try_dup_anon_rmap(struct page *page, bool compound, + struct vm_area_struct *vma) +{ + VM_BUG_ON_PAGE(!PageAnon(page), page); + + /* + * If this page may have been pinned by the parent process, + * don't allow to duplicate the mapping but instead require to e.g., + * copy the page immediately for the child so that we'll always + * guarantee the pinned page won't be randomly replaced in the + * future on write faults. + */ + if (likely(!is_device_private_page(page) && + unlikely(page_needs_cow_for_dma(vma, page)))) + return -EBUSY; + + /* + * It's okay to share the anon page between both processes, mapping + * the page R/O into both processes. + */ + __page_dup_rmap(page, compound); + return 0; +} + /* * Called from mm/vmscan.c to handle paging out */ -- cgit From 14f9135d547060d1d0c182501201f8da19895fe3 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 9 May 2022 18:20:43 -0700 Subject: mm/rmap: convert RMAP flags to a proper distinct rmap_t type We want to pass the flags to more than one anon rmap function, getting rid of special "do_page_add_anon_rmap()". So let's pass around a distinct __bitwise type and refine documentation. Link: https://lkml.kernel.org/r/20220428083441.37290-6-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Vlastimil Babka Cc: Andrea Arcangeli Cc: Christoph Hellwig Cc: David Rientjes Cc: Don Dutile Cc: Hugh Dickins Cc: Jan Kara Cc: Jann Horn Cc: Jason Gunthorpe Cc: John Hubbard Cc: Khalid Aziz Cc: "Kirill A. Shutemov" Cc: Liang Zhang Cc: "Matthew Wilcox (Oracle)" Cc: Michal Hocko Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Cc: Oded Gabbay Cc: Oleg Nesterov Cc: Pedro Demarchi Gomes Cc: Peter Xu Cc: Rik van Riel Cc: Roman Gushchin Cc: Shakeel Butt Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/rmap.h | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 73f41544084f..c53a38151089 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -160,9 +160,23 @@ static inline void anon_vma_merge(struct vm_area_struct *vma, struct anon_vma *page_get_anon_vma(struct page *page); -/* bitflags for do_page_add_anon_rmap() */ -#define RMAP_EXCLUSIVE 0x01 -#define RMAP_COMPOUND 0x02 +/* RMAP flags, currently only relevant for some anon rmap operations. */ +typedef int __bitwise rmap_t; + +/* + * No special request: if the page is a subpage of a compound page, it is + * mapped via a PTE. The mapped (sub)page is possibly shared between processes. + */ +#define RMAP_NONE ((__force rmap_t)0) + +/* The (sub)page is exclusive to a single process. */ +#define RMAP_EXCLUSIVE ((__force rmap_t)BIT(0)) + +/* + * The compound page is not mapped via PTEs, but instead via a single PMD and + * should be accounted accordingly. + */ +#define RMAP_COMPOUND ((__force rmap_t)BIT(1)) /* * rmap interfaces called when adding or removing pte of page @@ -171,7 +185,7 @@ void page_move_anon_rmap(struct page *, struct vm_area_struct *); void page_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long address, bool compound); void do_page_add_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long address, int flags); + unsigned long address, rmap_t flags); void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, unsigned long address, bool compound); void page_add_file_rmap(struct page *, struct vm_area_struct *, -- cgit From f1e2db12e45baaa2d366f87c885968096c2ff5aa Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 9 May 2022 18:20:43 -0700 Subject: mm/rmap: remove do_page_add_anon_rmap() ... and instead convert page_add_anon_rmap() to accept flags. Passing flags instead of bools is usually nicer either way, and we want to more often also pass RMAP_EXCLUSIVE in follow up patches when detecting that an anonymous page is exclusive: for example, when restoring an anonymous page from a writable migration entry. This is a preparation for marking an anonymous page inside page_add_anon_rmap() as exclusive when RMAP_EXCLUSIVE is passed. Link: https://lkml.kernel.org/r/20220428083441.37290-7-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Vlastimil Babka Cc: Andrea Arcangeli Cc: Christoph Hellwig Cc: David Rientjes Cc: Don Dutile Cc: Hugh Dickins Cc: Jan Kara Cc: Jann Horn Cc: Jason Gunthorpe Cc: John Hubbard Cc: Khalid Aziz Cc: "Kirill A. Shutemov" Cc: Liang Zhang Cc: "Matthew Wilcox (Oracle)" Cc: Michal Hocko Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Cc: Oded Gabbay Cc: Oleg Nesterov Cc: Pedro Demarchi Gomes Cc: Peter Xu Cc: Rik van Riel Cc: Roman Gushchin Cc: Shakeel Butt Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/rmap.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index c53a38151089..643801a937f3 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -183,8 +183,6 @@ typedef int __bitwise rmap_t; */ void page_move_anon_rmap(struct page *, struct vm_area_struct *); void page_add_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long address, bool compound); -void do_page_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long address, rmap_t flags); void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, unsigned long address, bool compound); -- cgit From 28c5209dfd5f86f4398ce01bfac8508b2c4d4050 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 9 May 2022 18:20:43 -0700 Subject: mm/rmap: pass rmap flags to hugepage_add_anon_rmap() Let's prepare for passing RMAP_EXCLUSIVE, similarly as we do for page_add_anon_rmap() now. RMAP_COMPOUND is implicit for hugetlb pages and ignored. Link: https://lkml.kernel.org/r/20220428083441.37290-8-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Vlastimil Babka Cc: Andrea Arcangeli Cc: Christoph Hellwig Cc: David Rientjes Cc: Don Dutile Cc: Hugh Dickins Cc: Jan Kara Cc: Jann Horn Cc: Jason Gunthorpe Cc: John Hubbard Cc: Khalid Aziz Cc: "Kirill A. Shutemov" Cc: Liang Zhang Cc: "Matthew Wilcox (Oracle)" Cc: Michal Hocko Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Cc: Oded Gabbay Cc: Oleg Nesterov Cc: Pedro Demarchi Gomes Cc: Peter Xu Cc: Rik van Riel Cc: Roman Gushchin Cc: Shakeel Butt Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/rmap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 643801a937f3..90e1e6925789 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -191,7 +191,7 @@ void page_add_file_rmap(struct page *, struct vm_area_struct *, void page_remove_rmap(struct page *, struct vm_area_struct *, bool compound); void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long address); + unsigned long address, rmap_t flags); void hugepage_add_new_anon_rmap(struct page *, struct vm_area_struct *, unsigned long address); -- cgit From 40f2bbf71161fa9195c7869004290003af152375 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 9 May 2022 18:20:43 -0700 Subject: mm/rmap: drop "compound" parameter from page_add_new_anon_rmap() New anonymous pages are always mapped natively: only THP/khugepaged code maps a new compound anonymous page and passes "true". Otherwise, we're just dealing with simple, non-compound pages. Let's give the interface clearer semantics and document these. Remove the PageTransCompound() sanity check from page_add_new_anon_rmap(). Link: https://lkml.kernel.org/r/20220428083441.37290-9-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Vlastimil Babka Cc: Andrea Arcangeli Cc: Christoph Hellwig Cc: David Rientjes Cc: Don Dutile Cc: Hugh Dickins Cc: Jan Kara Cc: Jann Horn Cc: Jason Gunthorpe Cc: John Hubbard Cc: Khalid Aziz Cc: "Kirill A. Shutemov" Cc: Liang Zhang Cc: "Matthew Wilcox (Oracle)" Cc: Michal Hocko Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Cc: Oded Gabbay Cc: Oleg Nesterov Cc: Pedro Demarchi Gomes Cc: Peter Xu Cc: Rik van Riel Cc: Roman Gushchin Cc: Shakeel Butt Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/rmap.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 90e1e6925789..e4156921eea9 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -185,11 +185,12 @@ void page_move_anon_rmap(struct page *, struct vm_area_struct *); void page_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long address, rmap_t flags); void page_add_new_anon_rmap(struct page *, struct vm_area_struct *, - unsigned long address, bool compound); + unsigned long address); void page_add_file_rmap(struct page *, struct vm_area_struct *, bool compound); void page_remove_rmap(struct page *, struct vm_area_struct *, bool compound); + void hugepage_add_anon_rmap(struct page *, struct vm_area_struct *, unsigned long address, rmap_t flags); void hugepage_add_new_anon_rmap(struct page *, struct vm_area_struct *, -- cgit From 78fbe906cc900b33ce078102e13e0e99b5b8c406 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 9 May 2022 18:20:44 -0700 Subject: mm/page-flags: reuse PG_mappedtodisk as PG_anon_exclusive for PageAnon() pages The basic question we would like to have a reliable and efficient answer to is: is this anonymous page exclusive to a single process or might it be shared? We need that information for ordinary/single pages, hugetlb pages, and possibly each subpage of a THP. Introduce a way to mark an anonymous page as exclusive, with the ultimate goal of teaching our COW logic to not do "wrong COWs", whereby GUP pins lose consistency with the pages mapped into the page table, resulting in reported memory corruptions. Most pageflags already have semantics for anonymous pages, however, PG_mappedtodisk should never apply to pages in the swapcache, so let's reuse that flag. As PG_has_hwpoisoned also uses that flag on the second tail page of a compound page, convert it to PG_error instead, which is marked as PF_NO_TAIL, so never used for tail pages. Use custom page flag modification functions such that we can do additional sanity checks. The semantics we'll put into some kernel doc in the future are: " PG_anon_exclusive is *usually* only expressive in combination with a page table entry. Depending on the page table entry type it might store the following information: Is what's mapped via this page table entry exclusive to the single process and can be mapped writable without further checks? If not, it might be shared and we might have to COW. For now, we only expect PTE-mapped THPs to make use of PG_anon_exclusive in subpages. For other anonymous compound folios (i.e., hugetlb), only the head page is logically mapped and holds this information. For example, an exclusive, PMD-mapped THP only has PG_anon_exclusive set on the head page. When replacing the PMD by a page table full of PTEs, PG_anon_exclusive, if set on the head page, will be set on all tail pages accordingly. Note that converting from a PTE-mapping to a PMD mapping using the same compound page is currently not possible and consequently doesn't require care. If GUP wants to take a reliable pin (FOLL_PIN) on an anonymous page, it should only pin if the relevant PG_anon_exclusive is set. In that case, the pin will be fully reliable and stay consistent with the pages mapped into the page table, as the bit cannot get cleared (e.g., by fork(), KSM) while the page is pinned. For anonymous pages that are mapped R/W, PG_anon_exclusive can be assumed to always be set because such pages cannot possibly be shared. The page table lock protecting the page table entry is the primary synchronization mechanism for PG_anon_exclusive; GUP-fast that does not take the PT lock needs special care when trying to clear the flag. Page table entry types and PG_anon_exclusive: * Present: PG_anon_exclusive applies. * Swap: the information is lost. PG_anon_exclusive was cleared. * Migration: the entry holds this information instead. PG_anon_exclusive was cleared. * Device private: PG_anon_exclusive applies. * Device exclusive: PG_anon_exclusive applies. * HW Poison: PG_anon_exclusive is stale and not changed. If the page may be pinned (FOLL_PIN), clearing PG_anon_exclusive is not allowed and the flag will stick around until the page is freed and folio->mapping is cleared. " We won't be clearing PG_anon_exclusive on destructive unmapping (i.e., zapping) of page table entries, page freeing code will handle that when also invalidate page->mapping to not indicate PageAnon() anymore. Letting information about exclusivity stick around will be an important property when adding sanity checks to unpinning code. Note that we properly clear the flag in free_pages_prepare() via PAGE_FLAGS_CHECK_AT_PREP for each individual subpage of a compound page, so there is no need to manually clear the flag. Link: https://lkml.kernel.org/r/20220428083441.37290-12-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Vlastimil Babka Cc: Andrea Arcangeli Cc: Christoph Hellwig Cc: David Rientjes Cc: Don Dutile Cc: Hugh Dickins Cc: Jan Kara Cc: Jann Horn Cc: Jason Gunthorpe Cc: John Hubbard Cc: Khalid Aziz Cc: "Kirill A. Shutemov" Cc: Liang Zhang Cc: "Matthew Wilcox (Oracle)" Cc: Michal Hocko Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Cc: Oded Gabbay Cc: Oleg Nesterov Cc: Pedro Demarchi Gomes Cc: Peter Xu Cc: Rik van Riel Cc: Roman Gushchin Cc: Shakeel Butt Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/page-flags.h | 39 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 1ea896887ee4..b70124b9c7c1 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -142,6 +142,15 @@ enum pageflags { PG_readahead = PG_reclaim, + /* + * Depending on the way an anonymous folio can be mapped into a page + * table (e.g., single PMD/PUD/CONT of the head page vs. PTE-mapped + * THP), PG_anon_exclusive may be set only for the head page or for + * tail pages of an anonymous folio. For now, we only expect it to be + * set on tail pages for PTE-mapped THP. + */ + PG_anon_exclusive = PG_mappedtodisk, + /* Filesystems */ PG_checked = PG_owner_priv_1, @@ -176,7 +185,7 @@ enum pageflags { * Indicates that at least one subpage is hwpoisoned in the * THP. */ - PG_has_hwpoisoned = PG_mappedtodisk, + PG_has_hwpoisoned = PG_error, #endif /* non-lru isolated movable page */ @@ -1002,6 +1011,34 @@ extern bool is_free_buddy_page(struct page *page); PAGEFLAG(Isolated, isolated, PF_ANY); +static __always_inline int PageAnonExclusive(struct page *page) +{ + VM_BUG_ON_PGFLAGS(!PageAnon(page), page); + VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page); + return test_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags); +} + +static __always_inline void SetPageAnonExclusive(struct page *page) +{ + VM_BUG_ON_PGFLAGS(!PageAnon(page) || PageKsm(page), page); + VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page); + set_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags); +} + +static __always_inline void ClearPageAnonExclusive(struct page *page) +{ + VM_BUG_ON_PGFLAGS(!PageAnon(page) || PageKsm(page), page); + VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page); + clear_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags); +} + +static __always_inline void __ClearPageAnonExclusive(struct page *page) +{ + VM_BUG_ON_PGFLAGS(!PageAnon(page), page); + VM_BUG_ON_PGFLAGS(PageHuge(page) && !PageHead(page), page); + __clear_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags); +} + #ifdef CONFIG_MMU #define __PG_MLOCKED (1UL << PG_mlocked) #else -- cgit From 6c287605fd56466e645693eff3ae7c08fba56e0a Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 9 May 2022 18:20:44 -0700 Subject: mm: remember exclusively mapped anonymous pages with PG_anon_exclusive Let's mark exclusively mapped anonymous pages with PG_anon_exclusive as exclusive, and use that information to make GUP pins reliable and stay consistent with the page mapped into the page table even if the page table entry gets write-protected. With that information at hand, we can extend our COW logic to always reuse anonymous pages that are exclusive. For anonymous pages that might be shared, the existing logic applies. As already documented, PG_anon_exclusive is usually only expressive in combination with a page table entry. Especially PTE vs. PMD-mapped anonymous pages require more thought, some examples: due to mremap() we can easily have a single compound page PTE-mapped into multiple page tables exclusively in a single process -- multiple page table locks apply. Further, due to MADV_WIPEONFORK we might not necessarily write-protect all PTEs, and only some subpages might be pinned. Long story short: once PTE-mapped, we have to track information about exclusivity per sub-page, but until then, we can just track it for the compound page in the head page and not having to update a whole bunch of subpages all of the time for a simple PMD mapping of a THP. For simplicity, this commit mostly talks about "anonymous pages", while it's for THP actually "the part of an anonymous folio referenced via a page table entry". To not spill PG_anon_exclusive code all over the mm code-base, we let the anon rmap code to handle all PG_anon_exclusive logic it can easily handle. If a writable, present page table entry points at an anonymous (sub)page, that (sub)page must be PG_anon_exclusive. If GUP wants to take a reliably pin (FOLL_PIN) on an anonymous page references via a present page table entry, it must only pin if PG_anon_exclusive is set for the mapped (sub)page. This commit doesn't adjust GUP, so this is only implicitly handled for FOLL_WRITE, follow-up commits will teach GUP to also respect it for FOLL_PIN without FOLL_WRITE, to make all GUP pins of anonymous pages fully reliable. Whenever an anonymous page is to be shared (fork(), KSM), or when temporarily unmapping an anonymous page (swap, migration), the relevant PG_anon_exclusive bit has to be cleared to mark the anonymous page possibly shared. Clearing will fail if there are GUP pins on the page: * For fork(), this means having to copy the page and not being able to share it. fork() protects against concurrent GUP using the PT lock and the src_mm->write_protect_seq. * For KSM, this means sharing will fail. For swap this means, unmapping will fail, For migration this means, migration will fail early. All three cases protect against concurrent GUP using the PT lock and a proper clear/invalidate+flush of the relevant page table entry. This fixes memory corruptions reported for FOLL_PIN | FOLL_WRITE, when a pinned page gets mapped R/O and the successive write fault ends up replacing the page instead of reusing it. It improves the situation for O_DIRECT/vmsplice/... that still use FOLL_GET instead of FOLL_PIN, if fork() is *not* involved, however swapout and fork() are still problematic. Properly using FOLL_PIN instead of FOLL_GET for these GUP users will fix the issue for them. I. Details about basic handling I.1. Fresh anonymous pages page_add_new_anon_rmap() and hugepage_add_new_anon_rmap() will mark the given page exclusive via __page_set_anon_rmap(exclusive=1). As that is the mechanism fresh anonymous pages come into life (besides migration code where we copy the page->mapping), all fresh anonymous pages will start out as exclusive. I.2. COW reuse handling of anonymous pages When a COW handler stumbles over a (sub)page that's marked exclusive, it simply reuses it. Otherwise, the handler tries harder under page lock to detect if the (sub)page is exclusive and can be reused. If exclusive, page_move_anon_rmap() will mark the given (sub)page exclusive. Note that hugetlb code does not yet check for PageAnonExclusive(), as it still uses the old COW logic that is prone to the COW security issue because hugetlb code cannot really tolerate unnecessary/wrong COW as huge pages are a scarce resource. I.3. Migration handling try_to_migrate() has to try marking an exclusive anonymous page shared via page_try_share_anon_rmap(). If it fails because there are GUP pins on the page, unmap fails. migrate_vma_collect_pmd() and __split_huge_pmd_locked() are handled similarly. Writable migration entries implicitly point at shared anonymous pages. For readable migration entries that information is stored via a new "readable-exclusive" migration entry, specific to anonymous pages. When restoring a migration entry in remove_migration_pte(), information about exlusivity is detected via the migration entry type, and RMAP_EXCLUSIVE is set accordingly for page_add_anon_rmap()/hugepage_add_anon_rmap() to restore that information. I.4. Swapout handling try_to_unmap() has to try marking the mapped page possibly shared via page_try_share_anon_rmap(). If it fails because there are GUP pins on the page, unmap fails. For now, information about exclusivity is lost. In the future, we might want to remember that information in the swap entry in some cases, however, it requires more thought, care, and a way to store that information in swap entries. I.5. Swapin handling do_swap_page() will never stumble over exclusive anonymous pages in the swap cache, as try_to_migrate() prohibits that. do_swap_page() always has to detect manually if an anonymous page is exclusive and has to set RMAP_EXCLUSIVE for page_add_anon_rmap() accordingly. I.6. THP handling __split_huge_pmd_locked() has to move the information about exclusivity from the PMD to the PTEs. a) In case we have a readable-exclusive PMD migration entry, simply insert readable-exclusive PTE migration entries. b) In case we have a present PMD entry and we don't want to freeze ("convert to migration entries"), simply forward PG_anon_exclusive to all sub-pages, no need to temporarily clear the bit. c) In case we have a present PMD entry and want to freeze, handle it similar to try_to_migrate(): try marking the page shared first. In case we fail, we ignore the "freeze" instruction and simply split ordinarily. try_to_migrate() will properly fail because the THP is still mapped via PTEs. When splitting a compound anonymous folio (THP), the information about exclusivity is implicitly handled via the migration entries: no need to replicate PG_anon_exclusive manually. I.7. fork() handling fork() handling is relatively easy, because PG_anon_exclusive is only expressive for some page table entry types. a) Present anonymous pages page_try_dup_anon_rmap() will mark the given subpage shared -- which will fail if the page is pinned. If it failed, we have to copy (or PTE-map a PMD to handle it on the PTE level). Note that device exclusive entries are just a pointer at a PageAnon() page. fork() will first convert a device exclusive entry to a present page table and handle it just like present anonymous pages. b) Device private entry Device private entries point at PageAnon() pages that cannot be mapped directly and, therefore, cannot get pinned. page_try_dup_anon_rmap() will mark the given subpage shared, which cannot fail because they cannot get pinned. c) HW poison entries PG_anon_exclusive will remain untouched and is stale -- the page table entry is just a placeholder after all. d) Migration entries Writable and readable-exclusive entries are converted to readable entries: possibly shared. I.8. mprotect() handling mprotect() only has to properly handle the new readable-exclusive migration entry: When write-protecting a migration entry that points at an anonymous page, remember the information about exclusivity via the "readable-exclusive" migration entry type. II. Migration and GUP-fast Whenever replacing a present page table entry that maps an exclusive anonymous page by a migration entry, we have to mark the page possibly shared and synchronize against GUP-fast by a proper clear/invalidate+flush to make the following scenario impossible: 1. try_to_migrate() places a migration entry after checking for GUP pins and marks the page possibly shared. 2. GUP-fast pins the page due to lack of synchronization 3. fork() converts the "writable/readable-exclusive" migration entry into a readable migration entry 4. Migration fails due to the GUP pin (failing to freeze the refcount) 5. Migration entries are restored. PG_anon_exclusive is lost -> We have a pinned page that is not marked exclusive anymore. Note that we move information about exclusivity from the page to the migration entry as it otherwise highly overcomplicates fork() and PTE-mapping a THP. III. Swapout and GUP-fast Whenever replacing a present page table entry that maps an exclusive anonymous page by a swap entry, we have to mark the page possibly shared and synchronize against GUP-fast by a proper clear/invalidate+flush to make the following scenario impossible: 1. try_to_unmap() places a swap entry after checking for GUP pins and clears exclusivity information on the page. 2. GUP-fast pins the page due to lack of synchronization. -> We have a pinned page that is not marked exclusive anymore. If we'd ever store information about exclusivity in the swap entry, similar to migration handling, the same considerations as in II would apply. This is future work. Link: https://lkml.kernel.org/r/20220428083441.37290-13-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Vlastimil Babka Cc: Andrea Arcangeli Cc: Christoph Hellwig Cc: David Rientjes Cc: Don Dutile Cc: Hugh Dickins Cc: Jan Kara Cc: Jann Horn Cc: Jason Gunthorpe Cc: John Hubbard Cc: Khalid Aziz Cc: "Kirill A. Shutemov" Cc: Liang Zhang Cc: "Matthew Wilcox (Oracle)" Cc: Michal Hocko Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Cc: Oded Gabbay Cc: Oleg Nesterov Cc: Pedro Demarchi Gomes Cc: Peter Xu Cc: Rik van Riel Cc: Roman Gushchin Cc: Shakeel Butt Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/rmap.h | 40 ++++++++++++++++++++++++++++++++++++++++ include/linux/swap.h | 15 +++++++++++---- include/linux/swapops.h | 25 +++++++++++++++++++++++++ 3 files changed, 76 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index e4156921eea9..cbe279a6f0de 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -228,6 +228,13 @@ static inline int page_try_dup_anon_rmap(struct page *page, bool compound, { VM_BUG_ON_PAGE(!PageAnon(page), page); + /* + * No need to check+clear for already shared pages, including KSM + * pages. + */ + if (!PageAnonExclusive(page)) + goto dup; + /* * If this page may have been pinned by the parent process, * don't allow to duplicate the mapping but instead require to e.g., @@ -239,14 +246,47 @@ static inline int page_try_dup_anon_rmap(struct page *page, bool compound, unlikely(page_needs_cow_for_dma(vma, page)))) return -EBUSY; + ClearPageAnonExclusive(page); /* * It's okay to share the anon page between both processes, mapping * the page R/O into both processes. */ +dup: __page_dup_rmap(page, compound); return 0; } +/** + * page_try_share_anon_rmap - try marking an exclusive anonymous page possibly + * shared to prepare for KSM or temporary unmapping + * @page: the exclusive anonymous page to try marking possibly shared + * + * The caller needs to hold the PT lock and has to have the page table entry + * cleared/invalidated+flushed, to properly sync against GUP-fast. + * + * This is similar to page_try_dup_anon_rmap(), however, not used during fork() + * to duplicate a mapping, but instead to prepare for KSM or temporarily + * unmapping a page (swap, migration) via page_remove_rmap(). + * + * Marking the page shared can only fail if the page may be pinned; device + * private pages cannot get pinned and consequently this function cannot fail. + * + * Returns 0 if marking the page possibly shared succeeded. Returns -EBUSY + * otherwise. + */ +static inline int page_try_share_anon_rmap(struct page *page) +{ + VM_BUG_ON_PAGE(!PageAnon(page) || !PageAnonExclusive(page), page); + + /* See page_try_dup_anon_rmap(). */ + if (likely(!is_device_private_page(page) && + unlikely(page_maybe_dma_pinned(page)))) + return -EBUSY; + + ClearPageAnonExclusive(page); + return 0; +} + /* * Called from mm/vmscan.c to handle paging out */ diff --git a/include/linux/swap.h b/include/linux/swap.h index 27093b477c5f..e6d70a4156e8 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -78,12 +78,19 @@ static inline int current_is_kswapd(void) #endif /* - * NUMA node memory migration support + * Page migration support. + * + * SWP_MIGRATION_READ_EXCLUSIVE is only applicable to anonymous pages and + * indicates that the referenced (part of) an anonymous page is exclusive to + * a single process. For SWP_MIGRATION_WRITE, that information is implicit: + * (part of) an anonymous page that are mapped writable are exclusive to a + * single process. */ #ifdef CONFIG_MIGRATION -#define SWP_MIGRATION_NUM 2 -#define SWP_MIGRATION_READ (MAX_SWAPFILES + SWP_HWPOISON_NUM) -#define SWP_MIGRATION_WRITE (MAX_SWAPFILES + SWP_HWPOISON_NUM + 1) +#define SWP_MIGRATION_NUM 3 +#define SWP_MIGRATION_READ (MAX_SWAPFILES + SWP_HWPOISON_NUM) +#define SWP_MIGRATION_READ_EXCLUSIVE (MAX_SWAPFILES + SWP_HWPOISON_NUM + 1) +#define SWP_MIGRATION_WRITE (MAX_SWAPFILES + SWP_HWPOISON_NUM + 2) #else #define SWP_MIGRATION_NUM 0 #endif diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 5af852b68805..6648b97244e7 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -194,6 +194,7 @@ static inline bool is_writable_device_exclusive_entry(swp_entry_t entry) static inline int is_migration_entry(swp_entry_t entry) { return unlikely(swp_type(entry) == SWP_MIGRATION_READ || + swp_type(entry) == SWP_MIGRATION_READ_EXCLUSIVE || swp_type(entry) == SWP_MIGRATION_WRITE); } @@ -202,11 +203,26 @@ static inline int is_writable_migration_entry(swp_entry_t entry) return unlikely(swp_type(entry) == SWP_MIGRATION_WRITE); } +static inline int is_readable_migration_entry(swp_entry_t entry) +{ + return unlikely(swp_type(entry) == SWP_MIGRATION_READ); +} + +static inline int is_readable_exclusive_migration_entry(swp_entry_t entry) +{ + return unlikely(swp_type(entry) == SWP_MIGRATION_READ_EXCLUSIVE); +} + static inline swp_entry_t make_readable_migration_entry(pgoff_t offset) { return swp_entry(SWP_MIGRATION_READ, offset); } +static inline swp_entry_t make_readable_exclusive_migration_entry(pgoff_t offset) +{ + return swp_entry(SWP_MIGRATION_READ_EXCLUSIVE, offset); +} + static inline swp_entry_t make_writable_migration_entry(pgoff_t offset) { return swp_entry(SWP_MIGRATION_WRITE, offset); @@ -224,6 +240,11 @@ static inline swp_entry_t make_readable_migration_entry(pgoff_t offset) return swp_entry(0, 0); } +static inline swp_entry_t make_readable_exclusive_migration_entry(pgoff_t offset) +{ + return swp_entry(0, 0); +} + static inline swp_entry_t make_writable_migration_entry(pgoff_t offset) { return swp_entry(0, 0); @@ -244,6 +265,10 @@ static inline int is_writable_migration_entry(swp_entry_t entry) { return 0; } +static inline int is_readable_migration_entry(swp_entry_t entry) +{ + return 0; +} #endif -- cgit From 7f5abe609b3dcbc62a36e18b1437f4f3521ecb75 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 9 May 2022 18:20:44 -0700 Subject: mm/rmap: fail try_to_migrate() early when setting a PMD migration entry fails Let's fail right away in case we cannot clear PG_anon_exclusive because the anon THP may be pinned. Right now, we continue trying to install migration entries and the caller of try_to_migrate() will realize that the page is still mapped and has to restore the migration entries. Let's just fail fast just like for PTE migration entries. Link: https://lkml.kernel.org/r/20220428083441.37290-14-david@redhat.com Signed-off-by: David Hildenbrand Suggested-by: Vlastimil Babka Cc: Andrea Arcangeli Cc: Christoph Hellwig Cc: David Rientjes Cc: Don Dutile Cc: Hugh Dickins Cc: Jan Kara Cc: Jann Horn Cc: Jason Gunthorpe Cc: John Hubbard Cc: Khalid Aziz Cc: "Kirill A. Shutemov" Cc: Liang Zhang Cc: "Matthew Wilcox (Oracle)" Cc: Michal Hocko Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Cc: Oded Gabbay Cc: Oleg Nesterov Cc: Pedro Demarchi Gomes Cc: Peter Xu Cc: Rik van Riel Cc: Roman Gushchin Cc: Shakeel Butt Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/swapops.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 6648b97244e7..e476a8fed537 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -299,7 +299,7 @@ static inline bool is_pfn_swap_entry(swp_entry_t entry) struct page_vma_mapped_walk; #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION -extern void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, +extern int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, struct page *page); extern void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, @@ -332,7 +332,7 @@ static inline int is_pmd_migration_entry(pmd_t pmd) return !pmd_present(pmd) && is_migration_entry(pmd_to_swp_entry(pmd)); } #else -static inline void set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, +static inline int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, struct page *page) { BUILD_BUG(); -- cgit From c89357e27f20dda3fff6791d27bb6c91eae99f4a Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 9 May 2022 18:20:45 -0700 Subject: mm: support GUP-triggered unsharing of anonymous pages Whenever GUP currently ends up taking a R/O pin on an anonymous page that might be shared -- mapped R/O and !PageAnonExclusive() -- any write fault on the page table entry will end up replacing the mapped anonymous page due to COW, resulting in the GUP pin no longer being consistent with the page actually mapped into the page table. The possible ways to deal with this situation are: (1) Ignore and pin -- what we do right now. (2) Fail to pin -- which would be rather surprising to callers and could break user space. (3) Trigger unsharing and pin the now exclusive page -- reliable R/O pins. We want to implement 3) because it provides the clearest semantics and allows for checking in unpin_user_pages() and friends for possible BUGs: when trying to unpin a page that's no longer exclusive, clearly something went very wrong and might result in memory corruptions that might be hard to debug. So we better have a nice way to spot such issues. To implement 3), we need a way for GUP to trigger unsharing: FAULT_FLAG_UNSHARE. FAULT_FLAG_UNSHARE is only applicable to R/O mapped anonymous pages and resembles COW logic during a write fault. However, in contrast to a write fault, GUP-triggered unsharing will, for example, still maintain the write protection. Let's implement FAULT_FLAG_UNSHARE by hooking into the existing write fault handlers for all applicable anonymous page types: ordinary pages, THP and hugetlb. * If FAULT_FLAG_UNSHARE finds a R/O-mapped anonymous page that has been marked exclusive in the meantime by someone else, there is nothing to do. * If FAULT_FLAG_UNSHARE finds a R/O-mapped anonymous page that's not marked exclusive, it will try detecting if the process is the exclusive owner. If exclusive, it can be set exclusive similar to reuse logic during write faults via page_move_anon_rmap() and there is nothing else to do; otherwise, we either have to copy and map a fresh, anonymous exclusive page R/O (ordinary pages, hugetlb), or split the THP. This commit is heavily based on patches by Andrea. Link: https://lkml.kernel.org/r/20220428083441.37290-16-david@redhat.com Signed-off-by: Andrea Arcangeli Signed-off-by: David Hildenbrand Acked-by: Vlastimil Babka Co-developed-by: Andrea Arcangeli Cc: Christoph Hellwig Cc: David Rientjes Cc: Don Dutile Cc: Hugh Dickins Cc: Jan Kara Cc: Jann Horn Cc: Jason Gunthorpe Cc: John Hubbard Cc: Khalid Aziz Cc: "Kirill A. Shutemov" Cc: Liang Zhang Cc: "Matthew Wilcox (Oracle)" Cc: Michal Hocko Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Cc: Oded Gabbay Cc: Oleg Nesterov Cc: Pedro Demarchi Gomes Cc: Peter Xu Cc: Rik van Riel Cc: Roman Gushchin Cc: Shakeel Butt Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/mm_types.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 87eddd509de2..7216d77a5884 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -819,6 +819,9 @@ typedef struct { * @FAULT_FLAG_REMOTE: The fault is not for current task/mm. * @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch. * @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal signals. + * @FAULT_FLAG_UNSHARE: The fault is an unsharing request to unshare (and mark + * exclusive) a possibly shared anonymous page that is + * mapped R/O. * * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify * whether we would allow page faults to retry by specifying these two @@ -838,6 +841,10 @@ typedef struct { * continuous faults with flags (b). We should always try to detect pending * signals before a retry to make sure the continuous page faults can still be * interrupted if necessary. + * + * The combination FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE is illegal. + * FAULT_FLAG_UNSHARE is ignored and treated like an ordinary read fault when + * no existing R/O-mapped anonymous page is encountered. */ enum fault_flag { FAULT_FLAG_WRITE = 1 << 0, @@ -850,6 +857,7 @@ enum fault_flag { FAULT_FLAG_REMOTE = 1 << 7, FAULT_FLAG_INSTRUCTION = 1 << 8, FAULT_FLAG_INTERRUPTIBLE = 1 << 9, + FAULT_FLAG_UNSHARE = 1 << 10, }; #endif /* _LINUX_MM_TYPES_H */ -- cgit From a7f226604170acd6b142b76472c1a49c12ebb83d Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 9 May 2022 18:20:45 -0700 Subject: mm/gup: trigger FAULT_FLAG_UNSHARE when R/O-pinning a possibly shared anonymous page Whenever GUP currently ends up taking a R/O pin on an anonymous page that might be shared -- mapped R/O and !PageAnonExclusive() -- any write fault on the page table entry will end up replacing the mapped anonymous page due to COW, resulting in the GUP pin no longer being consistent with the page actually mapped into the page table. The possible ways to deal with this situation are: (1) Ignore and pin -- what we do right now. (2) Fail to pin -- which would be rather surprising to callers and could break user space. (3) Trigger unsharing and pin the now exclusive page -- reliable R/O pins. Let's implement 3) because it provides the clearest semantics and allows for checking in unpin_user_pages() and friends for possible BUGs: when trying to unpin a page that's no longer exclusive, clearly something went very wrong and might result in memory corruptions that might be hard to debug. So we better have a nice way to spot such issues. This change implies that whenever user space *wrote* to a private mapping (IOW, we have an anonymous page mapped), that GUP pins will always remain consistent: reliable R/O GUP pins of anonymous pages. As a side note, this commit fixes the COW security issue for hugetlb with FOLL_PIN as documented in: https://lore.kernel.org/r/3ae33b08-d9ef-f846-56fb-645e3b9b4c66@redhat.com The vmsplice reproducer still applies, because vmsplice uses FOLL_GET instead of FOLL_PIN. Note that follow_huge_pmd() doesn't apply because we cannot end up in there with FOLL_PIN. This commit is heavily based on prototype patches by Andrea. Link: https://lkml.kernel.org/r/20220428083441.37290-17-david@redhat.com Signed-off-by: Andrea Arcangeli Signed-off-by: David Hildenbrand Acked-by: Vlastimil Babka Co-developed-by: Andrea Arcangeli Cc: Christoph Hellwig Cc: David Rientjes Cc: Don Dutile Cc: Hugh Dickins Cc: Jan Kara Cc: Jann Horn Cc: Jason Gunthorpe Cc: John Hubbard Cc: Khalid Aziz Cc: "Kirill A. Shutemov" Cc: Liang Zhang Cc: "Matthew Wilcox (Oracle)" Cc: Michal Hocko Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Cc: Oded Gabbay Cc: Oleg Nesterov Cc: Pedro Demarchi Gomes Cc: Peter Xu Cc: Rik van Riel Cc: Roman Gushchin Cc: Shakeel Butt Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/mm.h | 39 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 9ee3ae51e8e3..5dc5005ccc9b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3002,6 +3002,45 @@ static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags) return 0; } +/* + * Indicates for which pages that are write-protected in the page table, + * whether GUP has to trigger unsharing via FAULT_FLAG_UNSHARE such that the + * GUP pin will remain consistent with the pages mapped into the page tables + * of the MM. + * + * Temporary unmapping of PageAnonExclusive() pages or clearing of + * PageAnonExclusive() has to protect against concurrent GUP: + * * Ordinary GUP: Using the PT lock + * * GUP-fast and fork(): mm->write_protect_seq + * * GUP-fast and KSM or temporary unmapping (swap, migration): + * clear/invalidate+flush of the page table entry + * + * Must be called with the (sub)page that's actually referenced via the + * page table entry, which might not necessarily be the head page for a + * PTE-mapped THP. + */ +static inline bool gup_must_unshare(unsigned int flags, struct page *page) +{ + /* + * FOLL_WRITE is implicitly handled correctly as the page table entry + * has to be writable -- and if it references (part of) an anonymous + * folio, that part is required to be marked exclusive. + */ + if ((flags & (FOLL_WRITE | FOLL_PIN)) != FOLL_PIN) + return false; + /* + * Note: PageAnon(page) is stable until the page is actually getting + * freed. + */ + if (!PageAnon(page)) + return false; + /* + * Note that PageKsm() pages cannot be exclusive, and consequently, + * cannot get pinned. + */ + return !PageAnonExclusive(page); +} + typedef int (*pte_fn_t)(pte_t *pte, unsigned long addr, void *data); extern int apply_to_page_range(struct mm_struct *mm, unsigned long address, unsigned long size, pte_fn_t fn, void *data); -- cgit From 1493a1913e34b0ac366e33f9ebad721e69fd06ac Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 9 May 2022 18:20:45 -0700 Subject: mm/swap: remember PG_anon_exclusive via a swp pte bit Patch series "mm: COW fixes part 3: reliable GUP R/W FOLL_GET of anonymous pages", v2. This series fixes memory corruptions when a GUP R/W reference (FOLL_WRITE | FOLL_GET) was taken on an anonymous page and COW logic fails to detect exclusivity of the page to then replacing the anonymous page by a copy in the page table: The GUP reference lost synchronicity with the pages mapped into the page tables. This series focuses on x86, arm64, s390x and ppc64/book3s -- other architectures are fairly easy to support by implementing __HAVE_ARCH_PTE_SWP_EXCLUSIVE. This primarily fixes the O_DIRECT memory corruptions that can happen on concurrent swapout, whereby we lose DMA reads to a page (modifying the user page by writing to it). O_DIRECT currently uses FOLL_GET for short-term (!FOLL_LONGTERM) DMA from/to a user page. In the long run, we want to convert it to properly use FOLL_PIN, and John is working on it, but that might take a while and might not be easy to backport. In the meantime, let's restore what used to work before we started modifying our COW logic: make R/W FOLL_GET references reliable as long as there is no fork() after GUP involved. This is just the natural follow-up of part 2, that will also further reduce "wrong COW" on the swapin path, for example, when we cannot remove a page from the swapcache due to concurrent writeback, or if we have two threads faulting on the same swapped-out page. Fixing O_DIRECT is just a nice side-product This issue, including other related COW issues, has been summarized in [3] under 2): " 2. Intra Process Memory Corruptions due to Wrong COW (FOLL_GET) It was discovered that we can create a memory corruption by reading a file via O_DIRECT to a part (e.g., first 512 bytes) of a page, concurrently writing to an unrelated part (e.g., last byte) of the same page, and concurrently write-protecting the page via clear_refs SOFTDIRTY tracking [6]. For the reproducer, the issue is that O_DIRECT grabs a reference of the target page (via FOLL_GET) and clear_refs write-protects the relevant page table entry. On successive write access to the page from the process itself, we wrongly COW the page when resolving the write fault, resulting in a loss of synchronicity and consequently a memory corruption. While some people might think that using clear_refs in this combination is a corner cases, it turns out to be a more generic problem unfortunately. For example, it was just recently discovered that we can similarly create a memory corruption without clear_refs, simply by concurrently swapping out the buffer pages [7]. Note that we nowadays even use the swap infrastructure in Linux without an actual swap disk/partition: the prime example is zram which is enabled as default under Fedora [10]. The root issue is that a write-fault on a page that has additional references results in a COW and thereby a loss of synchronicity and consequently a memory corruption if two parties believe they are referencing the same page. " We don't particularly care about R/O FOLL_GET references: they were never reliable and O_DIRECT doesn't expect to observe modifications from a page after DMA was started. Note that: * this only fixes the issue on x86, arm64, s390x and ppc64/book3s ("enterprise architectures"). Other architectures have to implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE to achieve the same. * this does *not * consider any kind of fork() after taking the reference: fork() after GUP never worked reliably with FOLL_GET. * Not losing PG_anon_exclusive during swapout was the last remaining piece. KSM already makes sure that there are no other references on a page before considering it for sharing. Page migration maintains PG_anon_exclusive and simply fails when there are additional references (freezing the refcount fails). Only swapout code dropped the PG_anon_exclusive flag because it requires more work to remember + restore it. With this series in place, most COW issues of [3] are fixed on said architectures. Other architectures can implement __HAVE_ARCH_PTE_SWP_EXCLUSIVE fairly easily. [1] https://lkml.kernel.org/r/20220329160440.193848-1-david@redhat.com [2] https://lkml.kernel.org/r/20211217113049.23850-1-david@redhat.com [3] https://lore.kernel.org/r/3ae33b08-d9ef-f846-56fb-645e3b9b4c66@redhat.com This patch (of 8): Currently, we clear PG_anon_exclusive in try_to_unmap() and forget about it. We do this, to keep fork() logic on swap entries easy and efficient: for example, if we wouldn't clear it when unmapping, we'd have to lookup the page in the swapcache for each and every swap entry during fork() and clear PG_anon_exclusive if set. Instead, we want to store that information directly in the swap pte, protected by the page table lock, similarly to how we handle SWP_MIGRATION_READ_EXCLUSIVE for migration entries. However, for actual swap entries, we don't want to mess with the swap type (e.g., still one bit) because it overcomplicates swap code. In try_to_unmap(), we already reject to unmap in case the page might be pinned, because we must not lose PG_anon_exclusive on pinned pages ever. Checking if there are other unexpected references reliably *before* completely unmapping a page is unfortunately not really possible: THP heavily overcomplicate the situation. Once fully unmapped it's easier -- we, for example, make sure that there are no unexpected references *after* unmapping a page before starting writeback on that page. So, we currently might end up unmapping a page and clearing PG_anon_exclusive if that page has additional references, for example, due to a FOLL_GET. do_swap_page() has to re-determine if a page is exclusive, which will easily fail if there are other references on a page, most prominently GUP references via FOLL_GET. This can currently result in memory corruptions when taking a FOLL_GET | FOLL_WRITE reference on a page even when fork() is never involved: try_to_unmap() will succeed, and when refaulting the page, it cannot be marked exclusive and will get replaced by a copy in the page tables on the next write access, resulting in writes via the GUP reference to the page being lost. In an ideal world, everybody that uses GUP and wants to modify page content, such as O_DIRECT, would properly use FOLL_PIN. However, that conversion will take a while. It's easier to fix what used to work in the past (FOLL_GET | FOLL_WRITE) remembering PG_anon_exclusive. In addition, by remembering PG_anon_exclusive we can further reduce unnecessary COW in some cases, so it's the natural thing to do. So let's transfer the PG_anon_exclusive information to the swap pte and store it via an architecture-dependant pte bit; use that information when restoring the swap pte in do_swap_page() and unuse_pte(). During fork(), we simply have to clear the pte bit and are done. Of course, there is one corner case to handle: swap backends that don't support concurrent page modifications while the page is under writeback. Special case these, and drop the exclusive marker. Add a comment why that is just fine (also, reuse_swap_page() would have done the same in the past). In the future, we'll hopefully have all architectures support __HAVE_ARCH_PTE_SWP_EXCLUSIVE, such that we can get rid of the empty stubs and the define completely. Then, we can also convert SWP_MIGRATION_READ_EXCLUSIVE. For architectures it's fairly easy to support: either simply use a yet unused pte bit that can be used for swap entries, steal one from the arch type bits if they exceed 5, or steal one from the offset bits. Note: R/O FOLL_GET references were never really reliable, especially when taking one on a shared page and then writing to the page (e.g., GUP after fork()). FOLL_GET, including R/W references, were never really reliable once fork was involved (e.g., GUP before fork(), GUP during fork()). KSM steps back in case it stumbles over unexpected references and is, therefore, fine. [david@redhat.com: fix SWP_STABLE_WRITES test] Link: https://lkml.kernel.org/r/ac725bcb-313a-4fff-250a-68ba9a8f85fb@redhat.comLink: https://lkml.kernel.org/r/20220329164329.208407-1-david@redhat.com Link: https://lkml.kernel.org/r/20220329164329.208407-2-david@redhat.com Signed-off-by: David Hildenbrand Acked-by: Vlastimil Babka Cc: Hugh Dickins Cc: Shakeel Butt Cc: John Hubbard Cc: Jason Gunthorpe Cc: Mike Kravetz Cc: Mike Rapoport Cc: "Kirill A. Shutemov" Cc: Matthew Wilcox (Oracle) Cc: Jann Horn Cc: Michal Hocko Cc: Nadav Amit Cc: Rik van Riel Cc: Roman Gushchin Cc: Andrea Arcangeli Cc: Peter Xu Cc: Don Dutile Cc: Christoph Hellwig Cc: Oleg Nesterov Cc: Jan Kara Cc: Liang Zhang Cc: Pedro Demarchi Gomes Cc: Oded Gabbay Cc: Catalin Marinas Cc: Will Deacon Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Heiko Carstens Cc: Vasily Gorbik Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: Dave Hansen Cc: Gerald Schaefer Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 29 +++++++++++++++++++++++++++++ include/linux/swapops.h | 2 ++ 2 files changed, 31 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index f4f4077b97aa..53750224e176 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1003,6 +1003,35 @@ static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot) #define arch_start_context_switch(prev) do {} while (0) #endif +/* + * When replacing an anonymous page by a real (!non) swap entry, we clear + * PG_anon_exclusive from the page and instead remember whether the flag was + * set in the swp pte. During fork(), we have to mark the entry as !exclusive + * (possibly shared). On swapin, we use that information to restore + * PG_anon_exclusive, which is very helpful in cases where we might have + * additional (e.g., FOLL_GET) references on a page and wouldn't be able to + * detect exclusivity. + * + * These functions don't apply to non-swap entries (e.g., migration, hwpoison, + * ...). + */ +#ifndef __HAVE_ARCH_PTE_SWP_EXCLUSIVE +static inline pte_t pte_swp_mkexclusive(pte_t pte) +{ + return pte; +} + +static inline int pte_swp_exclusive(pte_t pte) +{ + return false; +} + +static inline pte_t pte_swp_clear_exclusive(pte_t pte) +{ + return pte; +} +#endif + #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY #ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd) diff --git a/include/linux/swapops.h b/include/linux/swapops.h index e476a8fed537..d1b728904d4e 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -26,6 +26,8 @@ /* Clear all flags but only keep swp_entry_t related information */ static inline pte_t pte_swp_clear_flags(pte_t pte) { + if (pte_swp_exclusive(pte)) + pte = pte_swp_clear_exclusive(pte); if (pte_swp_soft_dirty(pte)) pte = pte_swp_clear_soft_dirty(pte); if (pte_swp_uffd_wp(pte)) -- cgit From 014bb1de4fc17d54907d54418126a9a9736f4aff Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 9 May 2022 18:20:47 -0700 Subject: mm: create new mm/swap.h header file Patch series "MM changes to improve swap-over-NFS support". Assorted improvements for swap-via-filesystem. This is a resend of these patches, rebased on current HEAD. The only substantial changes is that swap_dirty_folio has replaced swap_set_page_dirty. Currently swap-via-fs (SWP_FS_OPS) doesn't work for any filesystem. It has previously worked for NFS but that broke a few releases back. This series changes to use a new ->swap_rw rather than ->readpage and ->direct_IO. It also makes other improvements. There is a companion series already in linux-next which fixes various issues with NFS. Once both series land, a final patch is needed which changes NFS over to use ->swap_rw. This patch (of 10): Many functions declared in include/linux/swap.h are only used within mm/ Create a new "mm/swap.h" and move some of these declarations there. Remove the redundant 'extern' from the function declarations. [akpm@linux-foundation.org: mm/memory-failure.c needs mm/swap.h] Link: https://lkml.kernel.org/r/164859751830.29473.5309689752169286816.stgit@noble.brown Link: https://lkml.kernel.org/r/164859778120.29473.11725907882296224053.stgit@noble.brown Signed-off-by: NeilBrown Reviewed-by: Christoph Hellwig Tested-by: David Howells Tested-by: Geert Uytterhoeven Cc: Trond Myklebust Cc: Hugh Dickins Cc: Mel Gorman Cc: Miaohe Lin Signed-off-by: Andrew Morton --- include/linux/swap.h | 121 --------------------------------------------------- 1 file changed, 121 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index e6d70a4156e8..def67112dce4 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -427,62 +427,19 @@ extern void kswapd_stop(int nid); #ifdef CONFIG_SWAP -#include /* for bio_end_io_t */ - -/* linux/mm/page_io.c */ -extern int swap_readpage(struct page *page, bool do_poll); -extern int swap_writepage(struct page *page, struct writeback_control *wbc); -extern void end_swap_bio_write(struct bio *bio); -extern int __swap_writepage(struct page *page, struct writeback_control *wbc, - bio_end_io_t end_write_func); bool swap_dirty_folio(struct address_space *mapping, struct folio *folio); - int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page, unsigned long nr_pages, sector_t start_block); int generic_swapfile_activate(struct swap_info_struct *, struct file *, sector_t *); -/* linux/mm/swap_state.c */ -/* One swap address space for each 64M swap space */ -#define SWAP_ADDRESS_SPACE_SHIFT 14 -#define SWAP_ADDRESS_SPACE_PAGES (1 << SWAP_ADDRESS_SPACE_SHIFT) -extern struct address_space *swapper_spaces[]; -#define swap_address_space(entry) \ - (&swapper_spaces[swp_type(entry)][swp_offset(entry) \ - >> SWAP_ADDRESS_SPACE_SHIFT]) static inline unsigned long total_swapcache_pages(void) { return global_node_page_state(NR_SWAPCACHE); } -extern void show_swap_cache_info(void); -extern int add_to_swap(struct page *page); -extern void *get_shadow_from_swap_cache(swp_entry_t entry); -extern int add_to_swap_cache(struct page *page, swp_entry_t entry, - gfp_t gfp, void **shadowp); -extern void __delete_from_swap_cache(struct page *page, - swp_entry_t entry, void *shadow); -extern void delete_from_swap_cache(struct page *); -extern void clear_shadow_from_swap_cache(int type, unsigned long begin, - unsigned long end); -extern void free_swap_cache(struct page *); extern void free_page_and_swap_cache(struct page *); extern void free_pages_and_swap_cache(struct page **, int); -extern struct page *lookup_swap_cache(swp_entry_t entry, - struct vm_area_struct *vma, - unsigned long addr); -struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index); -extern struct page *read_swap_cache_async(swp_entry_t, gfp_t, - struct vm_area_struct *vma, unsigned long addr, - bool do_poll); -extern struct page *__read_swap_cache_async(swp_entry_t, gfp_t, - struct vm_area_struct *vma, unsigned long addr, - bool *new_page_allocated); -extern struct page *swap_cluster_readahead(swp_entry_t entry, gfp_t flag, - struct vm_fault *vmf); -extern struct page *swapin_readahead(swp_entry_t entry, gfp_t flag, - struct vm_fault *vmf); - /* linux/mm/swapfile.c */ extern atomic_long_t nr_swap_pages; extern long total_swap_pages; @@ -535,12 +492,6 @@ static inline void put_swap_device(struct swap_info_struct *si) } #else /* CONFIG_SWAP */ - -static inline int swap_readpage(struct page *page, bool do_poll) -{ - return 0; -} - static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry) { return NULL; @@ -555,11 +506,6 @@ static inline void put_swap_device(struct swap_info_struct *si) { } -static inline struct address_space *swap_address_space(swp_entry_t entry) -{ - return NULL; -} - #define get_nr_swap_pages() 0L #define total_swap_pages 0L #define total_swapcache_pages() 0UL @@ -574,14 +520,6 @@ static inline struct address_space *swap_address_space(swp_entry_t entry) #define free_pages_and_swap_cache(pages, nr) \ release_pages((pages), (nr)); -static inline void free_swap_cache(struct page *page) -{ -} - -static inline void show_swap_cache_info(void) -{ -} - /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */ #define free_swap_and_cache(e) is_pfn_swap_entry(e) @@ -607,65 +545,6 @@ static inline void put_swap_page(struct page *page, swp_entry_t swp) { } -static inline struct page *swap_cluster_readahead(swp_entry_t entry, - gfp_t gfp_mask, struct vm_fault *vmf) -{ - return NULL; -} - -static inline struct page *swapin_readahead(swp_entry_t swp, gfp_t gfp_mask, - struct vm_fault *vmf) -{ - return NULL; -} - -static inline int swap_writepage(struct page *p, struct writeback_control *wbc) -{ - return 0; -} - -static inline struct page *lookup_swap_cache(swp_entry_t swp, - struct vm_area_struct *vma, - unsigned long addr) -{ - return NULL; -} - -static inline -struct page *find_get_incore_page(struct address_space *mapping, pgoff_t index) -{ - return find_get_page(mapping, index); -} - -static inline int add_to_swap(struct page *page) -{ - return 0; -} - -static inline void *get_shadow_from_swap_cache(swp_entry_t entry) -{ - return NULL; -} - -static inline int add_to_swap_cache(struct page *page, swp_entry_t entry, - gfp_t gfp_mask, void **shadowp) -{ - return -1; -} - -static inline void __delete_from_swap_cache(struct page *page, - swp_entry_t entry, void *shadow) -{ -} - -static inline void delete_from_swap_cache(struct page *page) -{ -} - -static inline void clear_shadow_from_swap_cache(int type, unsigned long begin, - unsigned long end) -{ -} static inline int page_swapcount(struct page *page) { -- cgit From 4c4a763406ef903b78334bd2ccea168d2f7a741a Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 9 May 2022 18:20:47 -0700 Subject: mm: drop swap_dirty_folio folios that are written to swap are owned by the MM subsystem - not any filesystem. When such a folio is passed to a filesystem to be written out to a swap-file, the filesystem handles the data, but the folio itself does not belong to the filesystem. So calling the filesystem's ->dirty_folio() address_space operation makes no sense. This is for folios in the given address space, and a folio to be written to swap does not exist in the given address space. So drop swap_dirty_folio() which calls the address-space's ->dirty_folio(), and always use noop_dirty_folio(), which is appropriate for folios being swapped out. Link: https://lkml.kernel.org/r/164859778123.29473.6900942583784889976.stgit@noble.brown Signed-off-by: NeilBrown Reviewed-by: Christoph Hellwig Tested-by: David Howells Tested-by: Geert Uytterhoeven Cc: Hugh Dickins Cc: Mel Gorman Cc: Trond Myklebust Cc: Miaohe Lin Signed-off-by: Andrew Morton --- include/linux/swap.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index def67112dce4..8170254c56b0 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -427,7 +427,6 @@ extern void kswapd_stop(int nid); #ifdef CONFIG_SWAP -bool swap_dirty_folio(struct address_space *mapping, struct folio *folio); int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page, unsigned long nr_pages, sector_t start_block); int generic_swapfile_activate(struct swap_info_struct *, struct file *, -- cgit From 4b60c0ff2f2021ab99b7fb9da63b7ed1579ef1d8 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 9 May 2022 18:20:48 -0700 Subject: mm: move responsibility for setting SWP_FS_OPS to ->swap_activate If a filesystem wishes to handle all swap IO itself (via ->direct_IO and ->readpage), rather than just providing devices addresses for submit_bio(), SWP_FS_OPS must be set. Currently the protocol for setting this it to have ->swap_activate return zero. In that case SWP_FS_OPS is set, and add_swap_extent() is called for the entire file. This is a little clumsy as different return values for ->swap_activate have quite different meanings, and it makes it hard to search for which filesystems require SWP_FS_OPS to be set. So remove the special meaning of a zero return, and require the filesystem to set SWP_FS_OPS if it so desires, and to always call add_swap_extent() as required. Currently only NFS and CIFS return zero for add_swap_extent(). Link: https://lkml.kernel.org/r/164859778123.29473.17908205846599043598.stgit@noble.brown Signed-off-by: NeilBrown Reviewed-by: Christoph Hellwig Tested-by: David Howells Tested-by: Geert Uytterhoeven Cc: Hugh Dickins Cc: Mel Gorman Cc: Trond Myklebust Cc: Miaohe Lin Signed-off-by: Andrew Morton --- include/linux/swap.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 8170254c56b0..7daae5a4b3e1 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -577,6 +577,12 @@ static inline swp_entry_t get_swap_page(struct page *page) return entry; } +static inline int add_swap_extent(struct swap_info_struct *sis, + unsigned long start_page, + unsigned long nr_pages, sector_t start_block) +{ + return -EINVAL; +} #endif /* CONFIG_SWAP */ #ifdef CONFIG_THP_SWAP -- cgit From e1209d3a7a67c281260ba9989621060ba7328b8c Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 9 May 2022 18:20:48 -0700 Subject: mm: introduce ->swap_rw and use it for reads from SWP_FS_OPS swap-space swap currently uses ->readpage to read swap pages. This can only request one page at a time from the filesystem, which is not most efficient. swap uses ->direct_IO for writes which while this is adequate is an inappropriate over-loading. ->direct_IO may need to had handle allocate space for holes or other details that are not relevant for swap. So this patch introduces a new address_space operation: ->swap_rw. In this patch it is used for reads, and a subsequent patch will switch writes to use it. No filesystem yet supports ->swap_rw, but that is not a problem because no filesystem actually works with filesystem-based swap. Only two filesystems set SWP_FS_OPS: - cifs sets the flag, but ->direct_IO always fails so swap cannot work. - nfs sets the flag, but ->direct_IO calls generic_write_checks() which has failed on swap files for several releases. To ensure that a NULL ->swap_rw isn't called, ->activate_swap() for both NFS and cifs are changed to fail if ->swap_rw is not set. This can be removed if/when the function is added. Future patches will restore swap-over-NFS functionality. To submit an async read with ->swap_rw() we need to allocate a structure to hold the kiocb and other details. swap_readpage() cannot handle transient failure, so we create a mempool to provide the structures. Link: https://lkml.kernel.org/r/164859778125.29473.13430559328221330589.stgit@noble.brown Signed-off-by: NeilBrown Reviewed-by: Christoph Hellwig Tested-by: David Howells Tested-by: Geert Uytterhoeven Cc: Hugh Dickins Cc: Mel Gorman Cc: Trond Myklebust Cc: Miaohe Lin Signed-off-by: Andrew Morton --- include/linux/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index bbde95387a23..dbc5d10328df 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -380,6 +380,7 @@ struct address_space_operations { int (*swap_activate)(struct swap_info_struct *sis, struct file *file, sector_t *span); void (*swap_deactivate)(struct file *file); + int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter); }; extern const struct address_space_operations empty_aops; -- cgit From eb79f3af9395fbe448e91b0940a6c395b7d06be4 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 9 May 2022 18:20:48 -0700 Subject: nfs: rename nfs_direct_IO and use as ->swap_rw The nfs_direct_IO() exists to support SWAP IO, but hasn't worked for a while. We now need a ->swap_rw function which behaves slightly differently, returning zero for success rather than a byte count. So modify nfs_direct_IO accordingly, rename it, and use it as the ->swap_rw function. Link: https://lkml.kernel.org/r/165119301493.15698.7491285551903597618.stgit@noble.brown Signed-off-by: NeilBrown Reviewed-by: Christoph Hellwig Tested-by: Geert Uytterhoeven (on Renesas RSK+RZA1 with 32 MiB of SDRAM) Cc: David Howells Cc: Hugh Dickins Cc: Mel Gorman Cc: Miaohe Lin Cc: Trond Myklebust Signed-off-by: Andrew Morton --- include/linux/nfs_fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index b48b9259e02c..fd5543486a3f 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -507,7 +507,7 @@ static inline const struct cred *nfs_file_cred(struct file *file) /* * linux/fs/nfs/direct.c */ -extern ssize_t nfs_direct_IO(struct kiocb *, struct iov_iter *); +int nfs_swap_rw(struct kiocb *iocb, struct iov_iter *iter); ssize_t nfs_file_direct_read(struct kiocb *iocb, struct iov_iter *iter, bool swap); ssize_t nfs_file_direct_write(struct kiocb *iocb, -- cgit From 2282679fb20bf036a714ed49fadd0230c278a203 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 9 May 2022 18:20:49 -0700 Subject: mm: submit multipage write for SWP_FS_OPS swap-space swap_writepage() is given one page at a time, but may be called repeatedly in succession. For block-device swapspace, the blk_plug functionality allows the multiple pages to be combined together at lower layers. That cannot be used for SWP_FS_OPS as blk_plug may not exist - it is only active when CONFIG_BLOCK=y. Consequently all swap reads over NFS are single page reads. With this patch we pass a pointer-to-pointer via the wbc. swap_writepage can store state between calls - much like the pointer passed explicitly to swap_readpage. After calling swap_writepage() some number of times, the state will be passed to swap_write_unplug() which can submit the combined request. Link: https://lkml.kernel.org/r/164859778128.29473.5191868522654408537.stgit@noble.brown Signed-off-by: NeilBrown Reviewed-by: Christoph Hellwig Tested-by: David Howells Tested-by: Geert Uytterhoeven Cc: Hugh Dickins Cc: Mel Gorman Cc: Trond Myklebust Cc: Miaohe Lin Signed-off-by: Andrew Morton --- include/linux/writeback.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/writeback.h b/include/linux/writeback.h index fec248ab1fec..32b35f21cb97 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -80,6 +80,13 @@ struct writeback_control { unsigned punt_to_cgroup:1; /* cgrp punting, see __REQ_CGROUP_PUNT */ + /* To enable batching of swap writes to non-block-device backends, + * "plug" can be set point to a 'struct swap_iocb *'. When all swap + * writes have been submitted, if with swap_iocb is not NULL, + * swap_write_unplug() should be called. + */ + struct swap_iocb **swap_plug; + #ifdef CONFIG_CGROUP_WRITEBACK struct bdi_writeback *wb; /* wb this writeback is issued under */ struct inode *inode; /* inode being written out */ -- cgit From a2ad63daa88b9d6846976fd2a0b5e4f5cfc58377 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 9 May 2022 18:20:49 -0700 Subject: VFS: add FMODE_CAN_ODIRECT file flag Currently various places test if direct IO is possible on a file by checking for the existence of the direct_IO address space operation. This is a poor choice, as the direct_IO operation may not be used - it is only used if the generic_file_*_iter functions are called for direct IO and some filesystems - particularly NFS - don't do this. Instead, introduce a new f_mode flag: FMODE_CAN_ODIRECT and change the various places to check this (avoiding pointer dereferences). do_dentry_open() will set this flag if ->direct_IO is present, so filesystems do not need to be changed. NFS *is* changed, to set the flag explicitly and discard the direct_IO entry in the address_space_operations for files. Other filesystems which currently use noop_direct_IO could usefully be changed to set this flag instead. Link: https://lkml.kernel.org/r/164859778128.29473.15189737957277399416.stgit@noble.brown Reviewed-by: Christoph Hellwig Signed-off-by: NeilBrown Tested-by: David Howells Tested-by: Geert Uytterhoeven Cc: Hugh Dickins Cc: Mel Gorman Cc: Trond Myklebust Cc: Miaohe Lin Signed-off-by: Andrew Morton --- include/linux/fs.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index dbc5d10328df..b81cacc51d2f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -162,6 +162,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset, /* File is stream-like */ #define FMODE_STREAM ((__force fmode_t)0x200000) +/* File supports DIRECT IO */ +#define FMODE_CAN_ODIRECT ((__force fmode_t)0x400000) + /* File was opened by fanotify and shouldn't generate fanotify events */ #define FMODE_NONOTIFY ((__force fmode_t)0x4000000) -- cgit From fa29000b6b2603ec2bfdc4c73249fcb00cd54f85 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 29 Apr 2022 17:00:05 -0400 Subject: fs: Add aops->release_folio This replaces aops->releasepage. Update the documentation, and call it if it exists. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Jeff Layton --- include/linux/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index f812f5aa07dd..ad768f13f485 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -355,6 +355,7 @@ struct address_space_operations { /* Unfortunately this kludge is needed for FIBMAP. Don't use it */ sector_t (*bmap)(struct address_space *, sector_t); void (*invalidate_folio) (struct folio *, size_t offset, size_t len); + bool (*release_folio)(struct folio *, gfp_t); int (*releasepage) (struct page *, gfp_t); void (*freepage)(struct page *); ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter); -- cgit From 8597447dc565a6a3fa7bc503674452b7ae2b914c Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 30 Apr 2022 23:01:08 -0400 Subject: iomap: Convert to release_folio Change all the filesystems which used iomap_releasepage to use the new function. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Jeff Layton --- include/linux/iomap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 5b2aa45ddda3..0d674695b6d3 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -228,7 +228,7 @@ ssize_t iomap_file_buffered_write(struct kiocb *iocb, struct iov_iter *from, int iomap_read_folio(struct folio *folio, const struct iomap_ops *ops); void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops); bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count); -int iomap_releasepage(struct page *page, gfp_t gfp_mask); +bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags); void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len); #ifdef CONFIG_MIGRATION int iomap_migrate_page(struct address_space *mapping, struct page *newpage, -- cgit From 704ead2bed202579f025a4754e52e9ab21ff3ada Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sun, 1 May 2022 00:27:53 -0400 Subject: fs: Remove last vestiges of releasepage All users are now converted to release_folio Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Jeff Layton --- include/linux/fs.h | 1 - include/linux/page-flags.h | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index ad768f13f485..1cee64d9724b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -356,7 +356,6 @@ struct address_space_operations { sector_t (*bmap)(struct address_space *, sector_t); void (*invalidate_folio) (struct folio *, size_t offset, size_t len); bool (*release_folio)(struct folio *, gfp_t); - int (*releasepage) (struct page *, gfp_t); void (*freepage)(struct page *); ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter); /* diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 9d8eeaa67d05..af10149a6c31 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -516,7 +516,7 @@ PAGEFLAG(SwapBacked, swapbacked, PF_NO_TAIL) /* * Private page markings that may be used by the filesystem that owns the page * for its own purposes. - * - PG_private and PG_private_2 cause releasepage() and co to be invoked + * - PG_private and PG_private_2 cause release_folio() and co to be invoked */ PAGEFLAG(Private, private, PF_ANY) PAGEFLAG(Private2, private_2, PF_ANY) TESTSCFLAG(Private2, private_2, PF_ANY) -- cgit From c56a6eb03deb187c989a966fda5a254249b56c2a Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sun, 1 May 2022 00:46:03 -0400 Subject: jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio Also convert it to return a bool since it's called from release_folio(). Signed-off-by: Matthew Wilcox (Oracle) Acked-by: Theodore Ts'o Reviewed-by: Jeff Layton --- include/linux/jbd2.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index de9536680b2b..e79d6e0b14e8 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1529,7 +1529,7 @@ extern int jbd2_journal_dirty_metadata (handle_t *, struct buffer_head *); extern int jbd2_journal_forget (handle_t *, struct buffer_head *); int jbd2_journal_invalidate_folio(journal_t *, struct folio *, size_t offset, size_t length); -extern int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page); +bool jbd2_journal_try_to_free_buffers(journal_t *journal, struct folio *folio); extern int jbd2_journal_stop(handle_t *); extern int jbd2_journal_flush(journal_t *journal, unsigned int flags); extern void jbd2_journal_lock_updates (journal_t *); -- cgit From 68189fef88c7d02eb92e038be3d6428ebd0d2945 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sun, 1 May 2022 01:08:08 -0400 Subject: fs: Change try_to_free_buffers() to take a folio All but two of the callers already have a folio; pass a folio into try_to_free_buffers(). This removes the last user of cancel_dirty_page() so remove that wrapper function too. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Jeff Layton --- include/linux/buffer_head.h | 4 ++-- include/linux/pagemap.h | 4 ---- 2 files changed, 2 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 31d82fd9abe8..c9d1463bb20f 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -158,7 +158,7 @@ void mark_buffer_write_io_error(struct buffer_head *bh); void touch_buffer(struct buffer_head *bh); void set_bh_page(struct buffer_head *bh, struct page *page, unsigned long offset); -int try_to_free_buffers(struct page *); +bool try_to_free_buffers(struct folio *); struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size, bool retry); void create_empty_buffers(struct page *, unsigned long, @@ -402,7 +402,7 @@ bool block_dirty_folio(struct address_space *mapping, struct folio *folio); #else /* CONFIG_BLOCK */ static inline void buffer_init(void) {} -static inline int try_to_free_buffers(struct page *page) { return 1; } +static inline bool try_to_free_buffers(struct folio *folio) { return true; } static inline int inode_has_buffers(struct inode *inode) { return 0; } static inline void invalidate_inode_buffers(struct inode *inode) {} static inline int remove_inode_buffers(struct inode *inode) { return 1; } diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 831b28dab01a..82dfb279e0c4 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -1067,10 +1067,6 @@ static inline void folio_cancel_dirty(struct folio *folio) if (folio_test_dirty(folio)) __folio_cancel_dirty(folio); } -static inline void cancel_dirty_page(struct page *page) -{ - folio_cancel_dirty(page_folio(page)); -} bool folio_clear_dirty_for_io(struct folio *folio); bool clear_page_dirty_for_io(struct page *page); void folio_invalidate(struct folio *folio, size_t offset, size_t length); -- cgit From d2329aa0c78f4a8dd368bb706f196ab99f692eaa Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sun, 1 May 2022 07:35:31 -0400 Subject: fs: Add free_folio address space operation Include documentation and convert the callers to use ->free_folio as well as ->freepage. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 1cee64d9724b..915844e6293e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -356,6 +356,7 @@ struct address_space_operations { sector_t (*bmap)(struct address_space *, sector_t); void (*invalidate_folio) (struct folio *, size_t offset, size_t len); bool (*release_folio)(struct folio *, gfp_t); + void (*free_folio)(struct folio *folio); void (*freepage)(struct page *); ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter); /* -- cgit From 8560cb1a7d75048af275dd23fb0cf05382b3c2b9 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 5 May 2022 00:43:09 -0400 Subject: fs: Remove aops->freepage All implementations now use free_folio so we can delete the callers and the method. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/fs.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 915844e6293e..6f305f1097a5 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -357,7 +357,6 @@ struct address_space_operations { void (*invalidate_folio) (struct folio *, size_t offset, size_t len); bool (*release_folio)(struct folio *, gfp_t); void (*free_folio)(struct folio *folio); - void (*freepage)(struct page *); ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter); /* * migrate the contents of a page to the specified target. If -- cgit From 8324a02c342a36336114a497130826612ed5520d Mon Sep 17 00:00:00 2001 From: Gavin Li Date: Sun, 27 Mar 2022 17:45:32 +0300 Subject: net/mlx5: Add exit route when waiting for FW Currently, removing a device needs to get the driver interface lock before doing any cleanup. If the driver is waiting in a loop for FW init, there is no way to cancel the wait, instead the device cleanup waits for the loop to conclude and release the lock. To allow immediate response to remove device commands, check the TEARDOWN flag while waiting for FW init, and exit the loop if it has been set. Signed-off-by: Gavin Li Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index ff47d49d8be4..f327d0544038 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -632,6 +632,7 @@ enum mlx5_device_state { enum mlx5_interface_state { MLX5_INTERFACE_STATE_UP = BIT(0), + MLX5_BREAK_FW_WAIT = BIT(1), }; enum mlx5_pci_status { -- cgit From 34a30d7635a8e37275a7b63bec09035ed762969b Mon Sep 17 00:00:00 2001 From: Mark Bloch Date: Tue, 1 Mar 2022 15:42:01 +0000 Subject: net/mlx5: Lag, expose number of lag ports Downstream patches will add support for hardware lag with more than 2 ports. Add a way for users to query the number of lag ports. Signed-off-by: Mark Bloch Reviewed-by: Maor Gottlieb Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index f327d0544038..62ea1120de9c 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1142,6 +1142,7 @@ int mlx5_lag_query_cong_counters(struct mlx5_core_dev *dev, int num_counters, size_t *offsets); struct mlx5_core_dev *mlx5_lag_get_peer_mdev(struct mlx5_core_dev *dev); +u8 mlx5_lag_get_num_ports(struct mlx5_core_dev *dev); struct mlx5_uars_page *mlx5_get_uars_page(struct mlx5_core_dev *mdev); void mlx5_put_uars_page(struct mlx5_core_dev *mdev, struct mlx5_uars_page *up); int mlx5_dm_sw_icm_alloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type, -- cgit From 4cd14d44b11dabf195d1e66dadbb954336224658 Mon Sep 17 00:00:00 2001 From: Mark Bloch Date: Tue, 1 Mar 2022 17:34:58 +0000 Subject: net/mlx5: Support devices with more than 2 ports Increase the define MLX5_MAX_PORTS to 4 as the driver is ready to support NICs with 4 ports. Signed-off-by: Mark Bloch Reviewed-by: Maor Gottlieb Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 62ea1120de9c..fdb9d07a05a4 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -84,7 +84,7 @@ enum mlx5_sqp_t { }; enum { - MLX5_MAX_PORTS = 2, + MLX5_MAX_PORTS = 4, }; enum { -- cgit From 7f46a0b7327ae261f9981888708dbca22c283900 Mon Sep 17 00:00:00 2001 From: Mark Bloch Date: Tue, 15 Mar 2022 16:56:50 +0000 Subject: net/mlx5: Lag, add debugfs to query hardware lag state Lag state has become very complicated with many modes, flags, types and port selections methods and future work will add additional features. Add a debugfs to query the current lag state. A new directory named "lag" will be created under the mlx5 debugfs directory. As the driver has debugfs per pci function the location will be: /mlx5//lag For example: /sys/kernel/debug/mlx5/0000:08:00.0/lag The following files are exposed: - state: Returns "active" or "disabled". If "active" it means hardware lag is active. - members: Returns the BDFs of all the members of lag object. - type: Returns the type of the lag currently configured. Valid only if hardware lag is active. * "roce" - Members are bare metal PFs. * "switchdev" - Members are in switchdev mode. * "multipath" - ECMP offloads. - port_sel_mode: Returns the egress port selection method, valid only if hardware lag is active. * "queue_affinity" - Egress port is selected by the QP/SQ affinity. * "hash" - Egress port is selected by hash done on each packet. Controlled by: xmit_hash_policy of the bond device. - flags: Returns flags that are specific per lag @type. Valid only if hardware lag is active. * "shared_fdb" - "on" or "off", if "on" single FDB is used. - mapping: Returns the mapping which is used to select egress port. Valid only if hardware lag is active. If @port_sel_mode is "hash" returns the active egress ports. The hash result will select only active ports. if @port_sel_mode is "queue_affinity" returns the mapping between the configured port affinity of the QP/SQ and actual egress port. For example: * 1:1 - Mapping means if the configured affinity is port 1 traffic will egress via port 1. * 1:2 - Mapping means if the configured affinity is port 1 traffic will egress via port 2. This can happen if port 1 is down or in active/backup mode and port 1 is backup. Signed-off-by: Mark Bloch Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index fdb9d07a05a4..d6bac3976913 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -558,6 +558,7 @@ struct mlx5_debugfs_entries { struct dentry *cq_debugfs; struct dentry *cmdif_debugfs; struct dentry *pages_debugfs; + struct dentry *lag_debugfs; }; struct mlx5_ft_pool; -- cgit From 42704b26b0f1d891f6cf4ebc877dbac0d17c690d Mon Sep 17 00:00:00 2001 From: Gerhard Engleder Date: Fri, 6 May 2022 22:01:37 +0200 Subject: ptp: Add cycles support for virtual clocks ptp vclocks require a free running time for their timecounter. Currently only a physical clock forced to free running is supported. If vclocks are used, then the physical clock cannot be synchronized anymore. The synchronized time is not available in hardware in this case. As a result, timed transmission with TAPRIO hardware support is not possible anymore. If hardware would support a free running time additionally to the physical clock, then the physical clock does not need to be forced to free running. Thus, the physical clocks can still be synchronized while vclocks are in use. The physical clock could be used to synchronize the time domain of the TSN network and trigger TAPRIO. In parallel vclocks can be used to synchronize other time domains. Introduce support for a free running cycle counter called cycles to physical clocks. Rework ptp vclocks to use this free running cycle counter. Default implementation is based on time of physical clock. Thus, behavior of ptp vclocks based on physical clocks without free running cycle counter is identical to previous behavior. Signed-off-by: Gerhard Engleder Acked-by: Richard Cochran Signed-off-by: Paolo Abeni --- include/linux/ptp_clock_kernel.h | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h index e8cc8b6bbf50..ad309202cf9f 100644 --- a/include/linux/ptp_clock_kernel.h +++ b/include/linux/ptp_clock_kernel.h @@ -108,6 +108,32 @@ struct ptp_system_timestamp { * @settime64: Set the current time on the hardware clock. * parameter ts: Time value to set. * + * @getcycles64: Reads the current free running cycle counter from the hardware + * clock. + * If @getcycles64 and @getcyclesx64 are not supported, then + * @gettime64 or @gettimex64 will be used as default + * implementation. + * parameter ts: Holds the result. + * + * @getcyclesx64: Reads the current free running cycle counter from the + * hardware clock and optionally also the system clock. + * If @getcycles64 and @getcyclesx64 are not supported, then + * @gettimex64 will be used as default implementation if + * available. + * parameter ts: Holds the PHC timestamp. + * parameter sts: If not NULL, it holds a pair of timestamps + * from the system clock. The first reading is made right before + * reading the lowest bits of the PHC timestamp and the second + * reading immediately follows that. + * + * @getcrosscycles: Reads the current free running cycle counter from the + * hardware clock and system clock simultaneously. + * If @getcycles64 and @getcyclesx64 are not supported, then + * @getcrosststamp will be used as default implementation if + * available. + * parameter cts: Contains timestamp (device,system) pair, + * where system time is realtime and monotonic. + * * @enable: Request driver to enable or disable an ancillary feature. * parameter request: Desired resource to enable or disable. * parameter on: Caller passes one to enable or zero to disable. @@ -155,6 +181,11 @@ struct ptp_clock_info { int (*getcrosststamp)(struct ptp_clock_info *ptp, struct system_device_crosststamp *cts); int (*settime64)(struct ptp_clock_info *p, const struct timespec64 *ts); + int (*getcycles64)(struct ptp_clock_info *ptp, struct timespec64 *ts); + int (*getcyclesx64)(struct ptp_clock_info *ptp, struct timespec64 *ts, + struct ptp_system_timestamp *sts); + int (*getcrosscycles)(struct ptp_clock_info *ptp, + struct system_device_crosststamp *cts); int (*enable)(struct ptp_clock_info *ptp, struct ptp_clock_request *request, int on); int (*verify)(struct ptp_clock_info *ptp, unsigned int pin, -- cgit From 51eb7492af276b5b4d27cfa4474d40bdac7b9cf8 Mon Sep 17 00:00:00 2001 From: Gerhard Engleder Date: Fri, 6 May 2022 22:01:38 +0200 Subject: ptp: Request cycles for TX timestamp The free running cycle counter of physical clocks called cycles shall be used for hardware timestamps to enable synchronisation. Introduce new flag SKBTX_HW_TSTAMP_USE_CYCLES, which signals driver to provide a TX timestamp based on cycles if cycles are supported. Signed-off-by: Gerhard Engleder Acked-by: Richard Cochran Signed-off-by: Paolo Abeni --- include/linux/skbuff.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index d58669d6cb91..4a4f25975ca2 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -615,6 +615,9 @@ enum { /* device driver is going to provide hardware time stamp */ SKBTX_IN_PROGRESS = 1 << 2, + /* generate hardware time stamp based on cycles if supported */ + SKBTX_HW_TSTAMP_USE_CYCLES = 1 << 3, + /* generate wifi status information (where possible) */ SKBTX_WIFI_STATUS = 1 << 4, @@ -624,7 +627,9 @@ enum { #define SKBTX_ANY_SW_TSTAMP (SKBTX_SW_TSTAMP | \ SKBTX_SCHED_TSTAMP) -#define SKBTX_ANY_TSTAMP (SKBTX_HW_TSTAMP | SKBTX_ANY_SW_TSTAMP) +#define SKBTX_ANY_TSTAMP (SKBTX_HW_TSTAMP | \ + SKBTX_HW_TSTAMP_USE_CYCLES | \ + SKBTX_ANY_SW_TSTAMP) /* Definitions for flags in struct skb_shared_info */ enum { -- cgit From d58809d854c9ee19e4cd41023e137e65e9dc3f94 Mon Sep 17 00:00:00 2001 From: Gerhard Engleder Date: Fri, 6 May 2022 22:01:39 +0200 Subject: ptp: Pass hwtstamp to ptp_convert_timestamp() ptp_convert_timestamp() converts only the timestamp hwtstamp, which is a field of the argument with the type struct skb_shared_hwtstamps *. So a pointer to the hwtstamp field of this structure is sufficient. Rework ptp_convert_timestamp() to use an argument of type ktime_t *. This allows to add additional timestamp manipulation stages before the call of ptp_convert_timestamp(). Signed-off-by: Gerhard Engleder Acked-by: Richard Cochran Signed-off-by: Paolo Abeni --- include/linux/ptp_clock_kernel.h | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ptp_clock_kernel.h b/include/linux/ptp_clock_kernel.h index ad309202cf9f..92b44161408e 100644 --- a/include/linux/ptp_clock_kernel.h +++ b/include/linux/ptp_clock_kernel.h @@ -384,17 +384,16 @@ int ptp_get_vclocks_index(int pclock_index, int **vclock_index); /** * ptp_convert_timestamp() - convert timestamp to a ptp vclock time * - * @hwtstamps: skb_shared_hwtstamps structure pointer + * @hwtstamp: timestamp * @vclock_index: phc index of ptp vclock. * * Returns converted timestamp, or 0 on error. */ -ktime_t ptp_convert_timestamp(const struct skb_shared_hwtstamps *hwtstamps, - int vclock_index); +ktime_t ptp_convert_timestamp(const ktime_t *hwtstamp, int vclock_index); #else static inline int ptp_get_vclocks_index(int pclock_index, int **vclock_index) { return 0; } -static inline ktime_t ptp_convert_timestamp(const struct skb_shared_hwtstamps *hwtstamps, +static inline ktime_t ptp_convert_timestamp(const ktime_t *hwtstamp, int vclock_index) { return 0; } -- cgit From 97dc7cd92ac67f6e05df418df1772ba4a7fbf693 Mon Sep 17 00:00:00 2001 From: Gerhard Engleder Date: Fri, 6 May 2022 22:01:40 +0200 Subject: ptp: Support late timestamp determination If a physical clock supports a free running cycle counter, then timestamps shall be based on this time too. For TX it is known in advance before the transmission if a timestamp based on the free running cycle counter is needed. For RX it is impossible to know which timestamp is needed before the packet is received and assigned to a socket. Support late timestamp determination by a network device. Therefore, an address/cookie is stored within the new netdev_data field of struct skb_shared_hwtstamps. This address/cookie is provided to a new network device function called ndo_get_tstamp(), which returns a timestamp based on the normal/adjustable time or based on the free running cycle counter. If function is not supported, then timestamp handling is not changed. This mechanism is intended for RX, but TX use is also possible. Signed-off-by: Gerhard Engleder Acked-by: Jonathan Lemon Acked-by: Richard Cochran Signed-off-by: Paolo Abeni --- include/linux/netdevice.h | 21 +++++++++++++++++++++ include/linux/skbuff.h | 14 +++++++++++--- 2 files changed, 32 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 74c97a34921d..51fe143091cf 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1356,6 +1356,12 @@ struct netdev_net_notifier { * The caller must be under RCU read context. * int (*ndo_fill_forward_path)(struct net_device_path_ctx *ctx, struct net_device_path *path); * Get the forwarding path to reach the real device from the HW destination address + * ktime_t (*ndo_get_tstamp)(struct net_device *dev, + * const struct skb_shared_hwtstamps *hwtstamps, + * bool cycles); + * Get hardware timestamp based on normal/adjustable time or free running + * cycle counter. This function is required if physical clock supports a + * free running cycle counter. */ struct net_device_ops { int (*ndo_init)(struct net_device *dev); @@ -1578,6 +1584,9 @@ struct net_device_ops { struct net_device * (*ndo_get_peer_dev)(struct net_device *dev); int (*ndo_fill_forward_path)(struct net_device_path_ctx *ctx, struct net_device_path *path); + ktime_t (*ndo_get_tstamp)(struct net_device *dev, + const struct skb_shared_hwtstamps *hwtstamps, + bool cycles); }; /** @@ -4761,6 +4770,18 @@ static inline void netdev_rx_csum_fault(struct net_device *dev, void net_enable_timestamp(void); void net_disable_timestamp(void); +static inline ktime_t netdev_get_tstamp(struct net_device *dev, + const struct skb_shared_hwtstamps *hwtstamps, + bool cycles) +{ + const struct net_device_ops *ops = dev->netdev_ops; + + if (ops->ndo_get_tstamp) + return ops->ndo_get_tstamp(dev, hwtstamps, cycles); + + return hwtstamps->hwtstamp; +} + static inline netdev_tx_t __netdev_start_xmit(const struct net_device_ops *ops, struct sk_buff *skb, struct net_device *dev, bool more) diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 4a4f25975ca2..97de40bcfef6 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -588,8 +588,10 @@ static inline bool skb_frag_must_loop(struct page *p) /** * struct skb_shared_hwtstamps - hardware time stamps - * @hwtstamp: hardware time stamp transformed into duration - * since arbitrary point in time + * @hwtstamp: hardware time stamp transformed into duration + * since arbitrary point in time + * @netdev_data: address/cookie of network device driver used as + * reference to actual hardware time stamp * * Software time stamps generated by ktime_get_real() are stored in * skb->tstamp. @@ -601,7 +603,10 @@ static inline bool skb_frag_must_loop(struct page *p) * &skb_shared_info. Use skb_hwtstamps() to get a pointer. */ struct skb_shared_hwtstamps { - ktime_t hwtstamp; + union { + ktime_t hwtstamp; + void *netdev_data; + }; }; /* Definitions for tx_flags in struct skb_shared_info */ @@ -621,6 +626,9 @@ enum { /* generate wifi status information (where possible) */ SKBTX_WIFI_STATUS = 1 << 4, + /* determine hardware time stamp based on time or cycles */ + SKBTX_HW_TSTAMP_NETDEV = 1 << 5, + /* generate software time stamp when entering packet scheduling */ SKBTX_SCHED_TSTAMP = 1 << 6, }; -- cgit From 57ce2e406fe1fa005e8fdbf20936c791252ac7bc Mon Sep 17 00:00:00 2001 From: Nava kishore Manne Date: Sat, 23 Apr 2022 22:32:32 +0530 Subject: fpga: fix for coding style issues fixes the below checks reported by checkpatch.pl: - Lines should not end with a '(' - Alignment should match open parenthesis Signed-off-by: Nava kishore Manne Link: https://lore.kernel.org/r/20220423170235.2115479-3-nava.manne@xilinx.com Signed-off-by: Xu Yilun --- include/linux/fpga/fpga-region.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fpga/fpga-region.h b/include/linux/fpga/fpga-region.h index 3b87f232425c..9d4d32909340 100644 --- a/include/linux/fpga/fpga-region.h +++ b/include/linux/fpga/fpga-region.h @@ -52,9 +52,9 @@ struct fpga_region { #define to_fpga_region(d) container_of(d, struct fpga_region, dev) -struct fpga_region *fpga_region_class_find( - struct device *start, const void *data, - int (*match)(struct device *, const void *)); +struct fpga_region * +fpga_region_class_find(struct device *start, const void *data, + int (*match)(struct device *, const void *)); int fpga_region_program_fpga(struct fpga_region *region); -- cgit From 846e437387e74c44ddc9f3eeec472fd37ca3cdb9 Mon Sep 17 00:00:00 2001 From: Yishai Hadas Date: Tue, 10 May 2022 12:02:03 +0300 Subject: net/mlx5: Expose mlx5_sriov_blocking_notifier_register / unregister APIs Expose mlx5_sriov_blocking_notifier_register / unregister APIs to let a VF register to be notified for its enablement / disablement by the PF. Upon VF probe it will call mlx5_sriov_blocking_notifier_register() with its notifier block and upon VF remove it will call mlx5_sriov_blocking_notifier_unregister() to drop its registration. This can give a VF the ability to clean some resources upon disable before that the command interface goes down and on the other hand sets some stuff before that it's enabled. This may be used by a VF which is migration capable in few cases.(e.g. PF load/unload upon an health recovery). Link: https://lore.kernel.org/r/20220510090206.90374-2-yishaih@nvidia.com Signed-off-by: Yishai Hadas Signed-off-by: Saeed Mahameed Signed-off-by: Leon Romanovsky --- include/linux/mlx5/driver.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index ff47d49d8be4..6fac5427d899 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -445,6 +445,11 @@ struct mlx5_qp_table { struct radix_tree_root tree; }; +enum { + MLX5_PF_NOTIFY_DISABLE_VF, + MLX5_PF_NOTIFY_ENABLE_VF, +}; + struct mlx5_vf_context { int enabled; u64 port_guid; @@ -455,6 +460,7 @@ struct mlx5_vf_context { u8 port_guid_valid:1; u8 node_guid_valid:1; enum port_state_policy policy; + struct blocking_notifier_head notifier; }; struct mlx5_core_sriov { @@ -1152,6 +1158,12 @@ int mlx5_dm_sw_icm_dealloc(struct mlx5_core_dev *dev, enum mlx5_sw_icm_type type struct mlx5_core_dev *mlx5_vf_get_core_dev(struct pci_dev *pdev); void mlx5_vf_put_core_dev(struct mlx5_core_dev *mdev); +int mlx5_sriov_blocking_notifier_register(struct mlx5_core_dev *mdev, + int vf_id, + struct notifier_block *nb); +void mlx5_sriov_blocking_notifier_unregister(struct mlx5_core_dev *mdev, + int vf_id, + struct notifier_block *nb); #ifdef CONFIG_MLX5_CORE_IPOIB struct net_device *mlx5_rdma_netdev_alloc(struct mlx5_core_dev *mdev, struct ib_device *ibdev, -- cgit From 1ff297584fad2eef390f212b860e0fbb7363e0e8 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 10 May 2022 10:29:36 -0700 Subject: randomize_kstack: Improve docs on requirements/rationale There were some recent questions about where and why to use the random_kstack routines when applying them to new architectures[1]. Update the header comments to reflect the design choices for the routines. [1] https://lore.kernel.org/lkml/1652173338.7bltwybi0c.astroid@bobo.none Cc: Nicholas Piggin Cc: Xiu Jianfeng Signed-off-by: Kees Cook --- include/linux/randomize_kstack.h | 27 ++++++++++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h index 1468caf001c0..5d868505a94e 100644 --- a/include/linux/randomize_kstack.h +++ b/include/linux/randomize_kstack.h @@ -40,10 +40,14 @@ DECLARE_PER_CPU(u32, kstack_offset); */ #define KSTACK_OFFSET_MAX(x) ((x) & 0x3FF) -/* - * These macros must be used during syscall entry when interrupts and +/** + * add_random_kstack_offset - Increase stack utilization by previously + * chosen random offset + * + * This should be used in the syscall entry path when interrupts and * preempt are disabled, and after user registers have been stored to - * the stack. + * the stack. For testing the resulting entropy, please see: + * tools/testing/selftests/lkdtm/stack-entropy.sh */ #define add_random_kstack_offset() do { \ if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \ @@ -55,6 +59,23 @@ DECLARE_PER_CPU(u32, kstack_offset); } \ } while (0) +/** + * choose_random_kstack_offset - Choose the random offset for the next + * add_random_kstack_offset() + * + * This should only be used during syscall exit when interrupts and + * preempt are disabled. This position in the syscall flow is done to + * frustrate attacks from userspace attempting to learn the next offset: + * - Maximize the timing uncertainty visible from userspace: if the + * offset is chosen at syscall entry, userspace has much more control + * over the timing between choosing offsets. "How long will we be in + * kernel mode?" tends to be more difficult to predict than "how long + * will we be in user mode?" + * - Reduce the lifetime of the new offset sitting in memory during + * kernel mode execution. Exposure of "thread-local" memory content + * (e.g. current, percpu, etc) tends to be easier than arbitrary + * location memory exposure. + */ #define choose_random_kstack_offset(rand) do { \ if (static_branch_maybe(CONFIG_RANDOMIZE_KSTACK_OFFSET_DEFAULT, \ &randomize_kstack_offset)) { \ -- cgit From 9f88361273082825d9f0d13a543d49f9fa0d44a8 Mon Sep 17 00:00:00 2001 From: Dmitrii Dolgov <9erthalion6@gmail.com> Date: Tue, 10 May 2022 17:52:30 +0200 Subject: bpf: Add bpf_link iterator Implement bpf_link iterator to traverse links via bpf_seq_file operations. The changeset is mostly shamelessly copied from commit a228a64fc1e4 ("bpf: Add bpf_prog iterator") Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com> Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20220510155233.9815-2-9erthalion6@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index be94833d390a..551b7198ae8a 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1544,6 +1544,7 @@ void bpf_link_put(struct bpf_link *link); int bpf_link_new_fd(struct bpf_link *link); struct file *bpf_link_new_file(struct bpf_link *link, int *reserved_fd); struct bpf_link *bpf_link_get_from_fd(u32 ufd); +struct bpf_link *bpf_link_get_curr_or_next(u32 *id); int bpf_obj_pin_user(u32 ufd, const char __user *pathname); int bpf_obj_get_user(const char __user *pathname, int flags); -- cgit From d721def7392a7348ffb9f3583b264239cbd3702c Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Tue, 10 May 2022 14:26:12 +0200 Subject: kallsyms: Make kallsyms_on_each_symbol generally available Making kallsyms_on_each_symbol generally available, so it can be used outside CONFIG_LIVEPATCH option in following changes. Rather than adding another ifdef option let's make the function generally available (when CONFIG_KALLSYMS option is defined). Cc: Christoph Hellwig Reviewed-by: Masami Hiramatsu Signed-off-by: Jiri Olsa Link: https://lore.kernel.org/r/20220510122616.2652285-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov --- include/linux/kallsyms.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index ce1bd2fbf23e..ad39636e0c3f 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -65,11 +65,11 @@ static inline void *dereference_symbol_descriptor(void *ptr) return ptr; } +#ifdef CONFIG_KALLSYMS int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, unsigned long), void *data); -#ifdef CONFIG_KALLSYMS /* Lookup the address for a symbol. Returns 0 if not found. */ unsigned long kallsyms_lookup_name(const char *name); @@ -163,6 +163,11 @@ static inline bool kallsyms_show_value(const struct cred *cred) return false; } +static inline int kallsyms_on_each_symbol(int (*fn)(void *, const char *, struct module *, + unsigned long), void *data) +{ + return -EOPNOTSUPP; +} #endif /*CONFIG_KALLSYMS*/ static inline void print_ip_sym(const char *loglvl, unsigned long ip) -- cgit From bed0d9a50dacee6fcf785c555cfb0d2573355afc Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Tue, 10 May 2022 14:26:13 +0200 Subject: ftrace: Add ftrace_lookup_symbols function Adding ftrace_lookup_symbols function that resolves array of symbols with single pass over kallsyms. The user provides array of string pointers with count and pointer to allocated array for resolved values. int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long *addrs) It iterates all kallsyms symbols and tries to loop up each in provided symbols array with bsearch. The symbols array needs to be sorted by name for this reason. We also check each symbol to pass ftrace_location, because this API will be used for fprobe symbols resolving. Suggested-by: Andrii Nakryiko Acked-by: Andrii Nakryiko Reviewed-by: Masami Hiramatsu Signed-off-by: Jiri Olsa Link: https://lore.kernel.org/r/20220510122616.2652285-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov --- include/linux/ftrace.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 4816b7e11047..820500430eae 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -303,6 +303,8 @@ int unregister_ftrace_function(struct ftrace_ops *ops); extern void ftrace_stub(unsigned long a0, unsigned long a1, struct ftrace_ops *op, struct ftrace_regs *fregs); + +int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long *addrs); #else /* !CONFIG_FUNCTION_TRACER */ /* * (un)register_ftrace_function must be a macro since the ops parameter @@ -313,6 +315,10 @@ extern void ftrace_stub(unsigned long a0, unsigned long a1, static inline void ftrace_kill(void) { } static inline void ftrace_free_init_mem(void) { } static inline void ftrace_free_mem(struct module *mod, void *start, void *end) { } +static inline int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long *addrs) +{ + return -EOPNOTSUPP; +} #endif /* CONFIG_FUNCTION_TRACER */ struct ftrace_func_entry { -- cgit From ddccc9ef55992716ad477d38fbcd9f8f1d34fc67 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 9 May 2022 09:04:54 -0700 Subject: skbuff: add a basic intro doc Add basic skb documentation. It's mostly an intro to the subsequent patches - it would looks strange if we documented advanced topics without covering the basics in any way. Reviewed-by: David Ahern Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 40 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 97de40bcfef6..69802624276d 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -800,6 +800,46 @@ typedef unsigned int sk_buff_data_t; typedef unsigned char *sk_buff_data_t; #endif +/** + * DOC: Basic sk_buff geometry + * + * struct sk_buff itself is a metadata structure and does not hold any packet + * data. All the data is held in associated buffers. + * + * &sk_buff.head points to the main "head" buffer. The head buffer is divided + * into two parts: + * + * - data buffer, containing headers and sometimes payload; + * this is the part of the skb operated on by the common helpers + * such as skb_put() or skb_pull(); + * - shared info (struct skb_shared_info) which holds an array of pointers + * to read-only data in the (page, offset, length) format. + * + * Optionally &skb_shared_info.frag_list may point to another skb. + * + * Basic diagram may look like this:: + * + * --------------- + * | sk_buff | + * --------------- + * ,--------------------------- + head + * / ,----------------- + data + * / / ,----------- + tail + * | | | , + end + * | | | | + * v v v v + * ----------------------------------------------- + * | headroom | data | tailroom | skb_shared_info | + * ----------------------------------------------- + * + [page frag] + * + [page frag] + * + [page frag] + * + [page frag] --------- + * + frag_list --> | sk_buff | + * --------- + * + */ + /** * struct sk_buff - socket buffer * @next: Next buffer in list -- cgit From 9ec7ea1462084df695f34c5ac2d2d2250d9d6897 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 9 May 2022 09:04:55 -0700 Subject: skbuff: rewrite the doc for data-only skbs The comment about shinfo->dataref split is really unhelpful, at least to me. Rewrite it and render it to skb documentation. Reviewed-by: David Ahern Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 42 ++++++++++++++++++++++++++++++------------ 1 file changed, 30 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 69802624276d..0e492f9bf532 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -727,16 +727,32 @@ struct skb_shared_info { skb_frag_t frags[MAX_SKB_FRAGS]; }; -/* We divide dataref into two halves. The higher 16 bits hold references - * to the payload part of skb->data. The lower 16 bits hold references to - * the entire skb->data. A clone of a headerless skb holds the length of - * the header in skb->hdr_len. - * - * All users must obey the rule that the skb->data reference count must be - * greater than or equal to the payload reference count. - * - * Holding a reference to the payload part means that the user does not - * care about modifications to the header part of skb->data. +/** + * DOC: dataref and headerless skbs + * + * Transport layers send out clones of payload skbs they hold for + * retransmissions. To allow lower layers of the stack to prepend their headers + * we split &skb_shared_info.dataref into two halves. + * The lower 16 bits count the overall number of references. + * The higher 16 bits indicate how many of the references are payload-only. + * skb_header_cloned() checks if skb is allowed to add / write the headers. + * + * The creator of the skb (e.g. TCP) marks its skb as &sk_buff.nohdr + * (via __skb_header_release()). Any clone created from marked skb will get + * &sk_buff.hdr_len populated with the available headroom. + * If there's the only clone in existence it's able to modify the headroom + * at will. The sequence of calls inside the transport layer is:: + * + * + * skb_reserve() + * __skb_header_release() + * skb_clone() + * // send the clone down the stack + * + * This is not a very generic construct and it depends on the transport layers + * doing the right thing. In practice there's usually only one payload-only skb. + * Having multiple payload-only skbs with different lengths of hdr_len is not + * possible. The payload-only skbs should never leave their owner. */ #define SKB_DATAREF_SHIFT 16 #define SKB_DATAREF_MASK ((1 << SKB_DATAREF_SHIFT) - 1) @@ -2027,8 +2043,10 @@ static inline int skb_header_unclone(struct sk_buff *skb, gfp_t pri) } /** - * __skb_header_release - release reference to header - * @skb: buffer to operate on + * __skb_header_release() - allow clones to use the headroom + * @skb: buffer to operate on + * + * See "DOC: dataref and headerless skbs". */ static inline void __skb_header_release(struct sk_buff *skb) { -- cgit From 9facd94114b59afebd405e74b3bc2bf184efe986 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 9 May 2022 09:04:56 -0700 Subject: skbuff: render the checksum comment to documentation Long time ago Tom added a giant comment to skbuff.h explaining checksums. Now that we have a place in Documentation for skbuff docs we should render it. Sprinkle some markup while at it. Reviewed-by: David Ahern Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 219 ++++++++++++++++++++++++++++--------------------- 1 file changed, 124 insertions(+), 95 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 0e492f9bf532..5296b2c37f62 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -43,98 +43,112 @@ #include #endif -/* The interface for checksum offload between the stack and networking drivers +/** + * DOC: skb checksums + * + * The interface for checksum offload between the stack and networking drivers * is as follows... * - * A. IP checksum related features + * IP checksum related features + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * * Drivers advertise checksum offload capabilities in the features of a device. * From the stack's point of view these are capabilities offered by the driver. * A driver typically only advertises features that it is capable of offloading * to its device. * - * The checksum related features are: - * - * NETIF_F_HW_CSUM - The driver (or its device) is able to compute one - * IP (one's complement) checksum for any combination - * of protocols or protocol layering. The checksum is - * computed and set in a packet per the CHECKSUM_PARTIAL - * interface (see below). - * - * NETIF_F_IP_CSUM - Driver (device) is only able to checksum plain - * TCP or UDP packets over IPv4. These are specifically - * unencapsulated packets of the form IPv4|TCP or - * IPv4|UDP where the Protocol field in the IPv4 header - * is TCP or UDP. The IPv4 header may contain IP options. - * This feature cannot be set in features for a device - * with NETIF_F_HW_CSUM also set. This feature is being - * DEPRECATED (see below). - * - * NETIF_F_IPV6_CSUM - Driver (device) is only able to checksum plain - * TCP or UDP packets over IPv6. These are specifically - * unencapsulated packets of the form IPv6|TCP or - * IPv6|UDP where the Next Header field in the IPv6 - * header is either TCP or UDP. IPv6 extension headers - * are not supported with this feature. This feature - * cannot be set in features for a device with - * NETIF_F_HW_CSUM also set. This feature is being - * DEPRECATED (see below). - * - * NETIF_F_RXCSUM - Driver (device) performs receive checksum offload. - * This flag is only used to disable the RX checksum - * feature for a device. The stack will accept receive - * checksum indication in packets received on a device - * regardless of whether NETIF_F_RXCSUM is set. - * - * B. Checksumming of received packets by device. Indication of checksum - * verification is set in skb->ip_summed. Possible values are: - * - * CHECKSUM_NONE: + * .. flat-table:: Checksum related device features + * :widths: 1 10 + * + * * - %NETIF_F_HW_CSUM + * - The driver (or its device) is able to compute one + * IP (one's complement) checksum for any combination + * of protocols or protocol layering. The checksum is + * computed and set in a packet per the CHECKSUM_PARTIAL + * interface (see below). + * + * * - %NETIF_F_IP_CSUM + * - Driver (device) is only able to checksum plain + * TCP or UDP packets over IPv4. These are specifically + * unencapsulated packets of the form IPv4|TCP or + * IPv4|UDP where the Protocol field in the IPv4 header + * is TCP or UDP. The IPv4 header may contain IP options. + * This feature cannot be set in features for a device + * with NETIF_F_HW_CSUM also set. This feature is being + * DEPRECATED (see below). + * + * * - %NETIF_F_IPV6_CSUM + * - Driver (device) is only able to checksum plain + * TCP or UDP packets over IPv6. These are specifically + * unencapsulated packets of the form IPv6|TCP or + * IPv6|UDP where the Next Header field in the IPv6 + * header is either TCP or UDP. IPv6 extension headers + * are not supported with this feature. This feature + * cannot be set in features for a device with + * NETIF_F_HW_CSUM also set. This feature is being + * DEPRECATED (see below). + * + * * - %NETIF_F_RXCSUM + * - Driver (device) performs receive checksum offload. + * This flag is only used to disable the RX checksum + * feature for a device. The stack will accept receive + * checksum indication in packets received on a device + * regardless of whether NETIF_F_RXCSUM is set. + * + * Checksumming of received packets by device + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + * + * Indication of checksum verification is set in &sk_buff.ip_summed. + * Possible values are: + * + * - %CHECKSUM_NONE * * Device did not checksum this packet e.g. due to lack of capabilities. * The packet contains full (though not verified) checksum in packet but * not in skb->csum. Thus, skb->csum is undefined in this case. * - * CHECKSUM_UNNECESSARY: + * - %CHECKSUM_UNNECESSARY * * The hardware you're dealing with doesn't calculate the full checksum - * (as in CHECKSUM_COMPLETE), but it does parse headers and verify checksums - * for specific protocols. For such packets it will set CHECKSUM_UNNECESSARY - * if their checksums are okay. skb->csum is still undefined in this case + * (as in %CHECKSUM_COMPLETE), but it does parse headers and verify checksums + * for specific protocols. For such packets it will set %CHECKSUM_UNNECESSARY + * if their checksums are okay. &sk_buff.csum is still undefined in this case * though. A driver or device must never modify the checksum field in the * packet even if checksum is verified. * - * CHECKSUM_UNNECESSARY is applicable to following protocols: - * TCP: IPv6 and IPv4. - * UDP: IPv4 and IPv6. A device may apply CHECKSUM_UNNECESSARY to a + * %CHECKSUM_UNNECESSARY is applicable to following protocols: + * + * - TCP: IPv6 and IPv4. + * - UDP: IPv4 and IPv6. A device may apply CHECKSUM_UNNECESSARY to a * zero UDP checksum for either IPv4 or IPv6, the networking stack * may perform further validation in this case. - * GRE: only if the checksum is present in the header. - * SCTP: indicates the CRC in SCTP header has been validated. - * FCOE: indicates the CRC in FC frame has been validated. + * - GRE: only if the checksum is present in the header. + * - SCTP: indicates the CRC in SCTP header has been validated. + * - FCOE: indicates the CRC in FC frame has been validated. * - * skb->csum_level indicates the number of consecutive checksums found in - * the packet minus one that have been verified as CHECKSUM_UNNECESSARY. + * &sk_buff.csum_level indicates the number of consecutive checksums found in + * the packet minus one that have been verified as %CHECKSUM_UNNECESSARY. * For instance if a device receives an IPv6->UDP->GRE->IPv4->TCP packet * and a device is able to verify the checksums for UDP (possibly zero), - * GRE (checksum flag is set) and TCP, skb->csum_level would be set to + * GRE (checksum flag is set) and TCP, &sk_buff.csum_level would be set to * two. If the device were only able to verify the UDP checksum and not * GRE, either because it doesn't support GRE checksum or because GRE * checksum is bad, skb->csum_level would be set to zero (TCP checksum is * not considered in this case). * - * CHECKSUM_COMPLETE: + * - %CHECKSUM_COMPLETE * * This is the most generic way. The device supplied checksum of the _whole_ - * packet as seen by netif_rx() and fills in skb->csum. This means the + * packet as seen by netif_rx() and fills in &sk_buff.csum. This means the * hardware doesn't need to parse L3/L4 headers to implement this. * * Notes: + * * - Even if device supports only some protocols, but is able to produce * skb->csum, it MUST use CHECKSUM_COMPLETE, not CHECKSUM_UNNECESSARY. * - CHECKSUM_COMPLETE is not applicable to SCTP and FCoE protocols. * - * CHECKSUM_PARTIAL: + * - %CHECKSUM_PARTIAL * * A checksum is set up to be offloaded to a device as described in the * output description for CHECKSUM_PARTIAL. This may occur on a packet @@ -146,14 +160,18 @@ * packet that are after the checksum being offloaded are not considered to * be verified. * - * C. Checksumming on transmit for non-GSO. The stack requests checksum offload - * in the skb->ip_summed for a packet. Values are: + * Checksumming on transmit for non-GSO + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + * + * The stack requests checksum offload in the &sk_buff.ip_summed for a packet. + * Values are: * - * CHECKSUM_PARTIAL: + * - %CHECKSUM_PARTIAL * * The driver is required to checksum the packet as seen by hard_start_xmit() - * from skb->csum_start up to the end, and to record/write the checksum at - * offset skb->csum_start + skb->csum_offset. A driver may verify that the + * from &sk_buff.csum_start up to the end, and to record/write the checksum at + * offset &sk_buff.csum_start + &sk_buff.csum_offset. + * A driver may verify that the * csum_start and csum_offset values are valid values given the length and * offset of the packet, but it should not attempt to validate that the * checksum refers to a legitimate transport layer checksum -- it is the @@ -165,55 +183,66 @@ * checksum calculation to the device, or call skb_checksum_help (in the case * that the device does not support offload for a particular checksum). * - * NETIF_F_IP_CSUM and NETIF_F_IPV6_CSUM are being deprecated in favor of - * NETIF_F_HW_CSUM. New devices should use NETIF_F_HW_CSUM to indicate + * %NETIF_F_IP_CSUM and %NETIF_F_IPV6_CSUM are being deprecated in favor of + * %NETIF_F_HW_CSUM. New devices should use %NETIF_F_HW_CSUM to indicate * checksum offload capability. - * skb_csum_hwoffload_help() can be called to resolve CHECKSUM_PARTIAL based + * skb_csum_hwoffload_help() can be called to resolve %CHECKSUM_PARTIAL based * on network device checksumming capabilities: if a packet does not match - * them, skb_checksum_help or skb_crc32c_help (depending on the value of - * csum_not_inet, see item D.) is called to resolve the checksum. + * them, skb_checksum_help() or skb_crc32c_help() (depending on the value of + * &sk_buff.csum_not_inet, see :ref:`crc`) + * is called to resolve the checksum. * - * CHECKSUM_NONE: + * - %CHECKSUM_NONE * * The skb was already checksummed by the protocol, or a checksum is not * required. * - * CHECKSUM_UNNECESSARY: + * - %CHECKSUM_UNNECESSARY * * This has the same meaning as CHECKSUM_NONE for checksum offload on * output. * - * CHECKSUM_COMPLETE: + * - %CHECKSUM_COMPLETE + * * Not used in checksum output. If a driver observes a packet with this value - * set in skbuff, it should treat the packet as if CHECKSUM_NONE were set. - * - * D. Non-IP checksum (CRC) offloads - * - * NETIF_F_SCTP_CRC - This feature indicates that a device is capable of - * offloading the SCTP CRC in a packet. To perform this offload the stack - * will set csum_start and csum_offset accordingly, set ip_summed to - * CHECKSUM_PARTIAL and set csum_not_inet to 1, to provide an indication in - * the skbuff that the CHECKSUM_PARTIAL refers to CRC32c. - * A driver that supports both IP checksum offload and SCTP CRC32c offload - * must verify which offload is configured for a packet by testing the - * value of skb->csum_not_inet; skb_crc32c_csum_help is provided to resolve - * CHECKSUM_PARTIAL on skbs where csum_not_inet is set to 1. - * - * NETIF_F_FCOE_CRC - This feature indicates that a device is capable of - * offloading the FCOE CRC in a packet. To perform this offload the stack - * will set ip_summed to CHECKSUM_PARTIAL and set csum_start and csum_offset - * accordingly. Note that there is no indication in the skbuff that the - * CHECKSUM_PARTIAL refers to an FCOE checksum, so a driver that supports - * both IP checksum offload and FCOE CRC offload must verify which offload - * is configured for a packet, presumably by inspecting packet headers. - * - * E. Checksumming on output with GSO. - * - * In the case of a GSO packet (skb_is_gso(skb) is true), checksum offload + * set in skbuff, it should treat the packet as if %CHECKSUM_NONE were set. + * + * .. _crc: + * + * Non-IP checksum (CRC) offloads + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + * + * .. flat-table:: + * :widths: 1 10 + * + * * - %NETIF_F_SCTP_CRC + * - This feature indicates that a device is capable of + * offloading the SCTP CRC in a packet. To perform this offload the stack + * will set csum_start and csum_offset accordingly, set ip_summed to + * %CHECKSUM_PARTIAL and set csum_not_inet to 1, to provide an indication + * in the skbuff that the %CHECKSUM_PARTIAL refers to CRC32c. + * A driver that supports both IP checksum offload and SCTP CRC32c offload + * must verify which offload is configured for a packet by testing the + * value of &sk_buff.csum_not_inet; skb_crc32c_csum_help() is provided to + * resolve %CHECKSUM_PARTIAL on skbs where csum_not_inet is set to 1. + * + * * - %NETIF_F_FCOE_CRC + * - This feature indicates that a device is capable of offloading the FCOE + * CRC in a packet. To perform this offload the stack will set ip_summed + * to %CHECKSUM_PARTIAL and set csum_start and csum_offset + * accordingly. Note that there is no indication in the skbuff that the + * %CHECKSUM_PARTIAL refers to an FCOE checksum, so a driver that supports + * both IP checksum offload and FCOE CRC offload must verify which offload + * is configured for a packet, presumably by inspecting packet headers. + * + * Checksumming on output with GSO + * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + * + * In the case of a GSO packet (skb_is_gso() is true), checksum offload * is implied by the SKB_GSO_* flags in gso_type. Most obviously, if the - * gso_type is SKB_GSO_TCPV4 or SKB_GSO_TCPV6, TCP checksum offload as + * gso_type is %SKB_GSO_TCPV4 or %SKB_GSO_TCPV6, TCP checksum offload as * part of the GSO operation is implied. If a checksum is being offloaded - * with GSO then ip_summed is CHECKSUM_PARTIAL, and both csum_start and + * with GSO then ip_summed is %CHECKSUM_PARTIAL, and both csum_start and * csum_offset are set to refer to the outermost checksum being offloaded * (two offloaded checksums are possible with UDP encapsulation). */ -- cgit From f7e0beaf39d3868dc700d4954b26cf8443c5d423 Mon Sep 17 00:00:00 2001 From: Kui-Feng Lee Date: Tue, 10 May 2022 13:59:19 -0700 Subject: bpf, x86: Generate trampolines from bpf_tramp_links Replace struct bpf_tramp_progs with struct bpf_tramp_links to collect struct bpf_tramp_link(s) for a trampoline. struct bpf_tramp_link extends bpf_link to act as a linked list node. arch_prepare_bpf_trampoline() accepts a struct bpf_tramp_links to collects all bpf_tramp_link(s) that a trampoline should call. Change BPF trampoline and bpf_struct_ops to pass bpf_tramp_links instead of bpf_tramp_progs. Signed-off-by: Kui-Feng Lee Signed-off-by: Alexei Starovoitov Signed-off-by: Andrii Nakryiko Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20220510205923.3206889-2-kuifeng@fb.com --- include/linux/bpf.h | 36 ++++++++++++++++++++++++------------ include/linux/bpf_types.h | 1 + 2 files changed, 25 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 551b7198ae8a..75e0110a65e1 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -723,11 +723,11 @@ struct btf_func_model { /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50 * bytes on x86. Pick a number to fit into BPF_IMAGE_SIZE / 2 */ -#define BPF_MAX_TRAMP_PROGS 38 +#define BPF_MAX_TRAMP_LINKS 38 -struct bpf_tramp_progs { - struct bpf_prog *progs[BPF_MAX_TRAMP_PROGS]; - int nr_progs; +struct bpf_tramp_links { + struct bpf_tramp_link *links[BPF_MAX_TRAMP_LINKS]; + int nr_links; }; /* Different use cases for BPF trampoline: @@ -753,7 +753,7 @@ struct bpf_tramp_progs { struct bpf_tramp_image; int arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *image_end, const struct btf_func_model *m, u32 flags, - struct bpf_tramp_progs *tprogs, + struct bpf_tramp_links *tlinks, void *orig_call); /* these two functions are called from generated trampoline */ u64 notrace __bpf_prog_enter(struct bpf_prog *prog); @@ -852,9 +852,10 @@ static __always_inline __nocfi unsigned int bpf_dispatcher_nop_func( { return bpf_func(ctx, insnsi); } + #ifdef CONFIG_BPF_JIT -int bpf_trampoline_link_prog(struct bpf_prog *prog, struct bpf_trampoline *tr); -int bpf_trampoline_unlink_prog(struct bpf_prog *prog, struct bpf_trampoline *tr); +int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr); +int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr); struct bpf_trampoline *bpf_trampoline_get(u64 key, struct bpf_attach_target_info *tgt_info); void bpf_trampoline_put(struct bpf_trampoline *tr); @@ -905,12 +906,12 @@ int bpf_jit_charge_modmem(u32 size); void bpf_jit_uncharge_modmem(u32 size); bool bpf_prog_has_trampoline(const struct bpf_prog *prog); #else -static inline int bpf_trampoline_link_prog(struct bpf_prog *prog, +static inline int bpf_trampoline_link_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr) { return -ENOTSUPP; } -static inline int bpf_trampoline_unlink_prog(struct bpf_prog *prog, +static inline int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampoline *tr) { return -ENOTSUPP; @@ -1009,7 +1010,6 @@ struct bpf_prog_aux { bool tail_call_reachable; bool xdp_has_frags; bool use_bpf_prog_pack; - struct hlist_node tramp_hlist; /* BTF_KIND_FUNC_PROTO for valid attach_btf_id */ const struct btf_type *attach_func_proto; /* function name for valid attach_btf_id */ @@ -1096,6 +1096,18 @@ struct bpf_link_ops { struct bpf_link_info *info); }; +struct bpf_tramp_link { + struct bpf_link link; + struct hlist_node tramp_hlist; +}; + +struct bpf_tracing_link { + struct bpf_tramp_link link; + enum bpf_attach_type attach_type; + struct bpf_trampoline *trampoline; + struct bpf_prog *tgt_prog; +}; + struct bpf_link_primer { struct bpf_link *link; struct file *file; @@ -1133,8 +1145,8 @@ bool bpf_struct_ops_get(const void *kdata); void bpf_struct_ops_put(const void *kdata); int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, void *key, void *value); -int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_progs *tprogs, - struct bpf_prog *prog, +int bpf_struct_ops_prepare_trampoline(struct bpf_tramp_links *tlinks, + struct bpf_tramp_link *link, const struct btf_func_model *model, void *image, void *image_end); static inline bool bpf_try_module_get(const void *data, struct module *owner) diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index 3e24ad0c4b3c..2b9112b80171 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -141,3 +141,4 @@ BPF_LINK_TYPE(BPF_LINK_TYPE_XDP, xdp) BPF_LINK_TYPE(BPF_LINK_TYPE_PERF_EVENT, perf) #endif BPF_LINK_TYPE(BPF_LINK_TYPE_KPROBE_MULTI, kprobe_multi) +BPF_LINK_TYPE(BPF_LINK_TYPE_STRUCT_OPS, struct_ops) -- cgit From e384c7b7b46d0a5f4bf3c554f963e6e9622d0ab1 Mon Sep 17 00:00:00 2001 From: Kui-Feng Lee Date: Tue, 10 May 2022 13:59:20 -0700 Subject: bpf, x86: Create bpf_tramp_run_ctx on the caller thread's stack BPF trampolines will create a bpf_tramp_run_ctx, a bpf_run_ctx, on stacks and set/reset the current bpf_run_ctx before/after calling a bpf_prog. Signed-off-by: Kui-Feng Lee Signed-off-by: Alexei Starovoitov Signed-off-by: Andrii Nakryiko Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20220510205923.3206889-3-kuifeng@fb.com --- include/linux/bpf.h | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 75e0110a65e1..256fb802e580 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -730,6 +730,8 @@ struct bpf_tramp_links { int nr_links; }; +struct bpf_tramp_run_ctx; + /* Different use cases for BPF trampoline: * 1. replace nop at the function entry (kprobe equivalent) * flags = BPF_TRAMP_F_RESTORE_REGS @@ -756,10 +758,11 @@ int arch_prepare_bpf_trampoline(struct bpf_tramp_image *tr, void *image, void *i struct bpf_tramp_links *tlinks, void *orig_call); /* these two functions are called from generated trampoline */ -u64 notrace __bpf_prog_enter(struct bpf_prog *prog); -void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start); -u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog); -void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start); +u64 notrace __bpf_prog_enter(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx); +void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start, struct bpf_tramp_run_ctx *run_ctx); +u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx); +void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start, + struct bpf_tramp_run_ctx *run_ctx); void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr); void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr); @@ -1351,6 +1354,12 @@ struct bpf_trace_run_ctx { u64 bpf_cookie; }; +struct bpf_tramp_run_ctx { + struct bpf_run_ctx run_ctx; + u64 bpf_cookie; + struct bpf_run_ctx *saved_run_ctx; +}; + static inline struct bpf_run_ctx *bpf_set_run_ctx(struct bpf_run_ctx *new_ctx) { struct bpf_run_ctx *old_ctx = NULL; -- cgit From 2fcc82411e74e5e6aba336561cf56fb899bfae4e Mon Sep 17 00:00:00 2001 From: Kui-Feng Lee Date: Tue, 10 May 2022 13:59:21 -0700 Subject: bpf, x86: Attach a cookie to fentry/fexit/fmod_ret/lsm. Pass a cookie along with BPF_LINK_CREATE requests. Add a bpf_cookie field to struct bpf_tracing_link to attach a cookie. The cookie of a bpf_tracing_link is available by calling bpf_get_attach_cookie when running the BPF program of the attached link. The value of a cookie will be set at bpf_tramp_run_ctx by the trampoline of the link. Signed-off-by: Kui-Feng Lee Signed-off-by: Alexei Starovoitov Signed-off-by: Andrii Nakryiko Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20220510205923.3206889-4-kuifeng@fb.com --- include/linux/bpf.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 256fb802e580..aba7ded56436 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1102,6 +1102,7 @@ struct bpf_link_ops { struct bpf_tramp_link { struct bpf_link link; struct hlist_node tramp_hlist; + u64 cookie; }; struct bpf_tracing_link { -- cgit From 5b87be9e4978fe507c7e14c0bdff147d10d39aec Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 9 May 2022 20:57:37 -0700 Subject: net: add include/net/net_debug.h Remove from include/linux/netdevice.h helpers that send debug/info/warnings to syslog. We plan adding more helpers in following patches. v2: added two includes, and 'struct net_device' forward declaration to avoid compile errors (kernel bots) Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/netdevice.h | 141 +--------------------------------------------- 1 file changed, 1 insertion(+), 140 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 51fe143091cf..536321691c72 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -50,6 +50,7 @@ #include #include #include +#include struct netpoll_info; struct device; @@ -5071,81 +5072,9 @@ static inline const char *netdev_reg_state(const struct net_device *dev) return " (unknown)"; } -__printf(3, 4) __cold -void netdev_printk(const char *level, const struct net_device *dev, - const char *format, ...); -__printf(2, 3) __cold -void netdev_emerg(const struct net_device *dev, const char *format, ...); -__printf(2, 3) __cold -void netdev_alert(const struct net_device *dev, const char *format, ...); -__printf(2, 3) __cold -void netdev_crit(const struct net_device *dev, const char *format, ...); -__printf(2, 3) __cold -void netdev_err(const struct net_device *dev, const char *format, ...); -__printf(2, 3) __cold -void netdev_warn(const struct net_device *dev, const char *format, ...); -__printf(2, 3) __cold -void netdev_notice(const struct net_device *dev, const char *format, ...); -__printf(2, 3) __cold -void netdev_info(const struct net_device *dev, const char *format, ...); - -#define netdev_level_once(level, dev, fmt, ...) \ -do { \ - static bool __section(".data.once") __print_once; \ - \ - if (!__print_once) { \ - __print_once = true; \ - netdev_printk(level, dev, fmt, ##__VA_ARGS__); \ - } \ -} while (0) - -#define netdev_emerg_once(dev, fmt, ...) \ - netdev_level_once(KERN_EMERG, dev, fmt, ##__VA_ARGS__) -#define netdev_alert_once(dev, fmt, ...) \ - netdev_level_once(KERN_ALERT, dev, fmt, ##__VA_ARGS__) -#define netdev_crit_once(dev, fmt, ...) \ - netdev_level_once(KERN_CRIT, dev, fmt, ##__VA_ARGS__) -#define netdev_err_once(dev, fmt, ...) \ - netdev_level_once(KERN_ERR, dev, fmt, ##__VA_ARGS__) -#define netdev_warn_once(dev, fmt, ...) \ - netdev_level_once(KERN_WARNING, dev, fmt, ##__VA_ARGS__) -#define netdev_notice_once(dev, fmt, ...) \ - netdev_level_once(KERN_NOTICE, dev, fmt, ##__VA_ARGS__) -#define netdev_info_once(dev, fmt, ...) \ - netdev_level_once(KERN_INFO, dev, fmt, ##__VA_ARGS__) - #define MODULE_ALIAS_NETDEV(device) \ MODULE_ALIAS("netdev-" device) -#if defined(CONFIG_DYNAMIC_DEBUG) || \ - (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE)) -#define netdev_dbg(__dev, format, args...) \ -do { \ - dynamic_netdev_dbg(__dev, format, ##args); \ -} while (0) -#elif defined(DEBUG) -#define netdev_dbg(__dev, format, args...) \ - netdev_printk(KERN_DEBUG, __dev, format, ##args) -#else -#define netdev_dbg(__dev, format, args...) \ -({ \ - if (0) \ - netdev_printk(KERN_DEBUG, __dev, format, ##args); \ -}) -#endif - -#if defined(VERBOSE_DEBUG) -#define netdev_vdbg netdev_dbg -#else - -#define netdev_vdbg(dev, format, args...) \ -({ \ - if (0) \ - netdev_printk(KERN_DEBUG, dev, format, ##args); \ - 0; \ -}) -#endif - /* * netdev_WARN() acts like dev_printk(), but with the key difference * of using a WARN/WARN_ON to get the message out, including the @@ -5159,74 +5088,6 @@ do { \ WARN_ONCE(1, "netdevice: %s%s: " format, netdev_name(dev), \ netdev_reg_state(dev), ##args) -/* netif printk helpers, similar to netdev_printk */ - -#define netif_printk(priv, type, level, dev, fmt, args...) \ -do { \ - if (netif_msg_##type(priv)) \ - netdev_printk(level, (dev), fmt, ##args); \ -} while (0) - -#define netif_level(level, priv, type, dev, fmt, args...) \ -do { \ - if (netif_msg_##type(priv)) \ - netdev_##level(dev, fmt, ##args); \ -} while (0) - -#define netif_emerg(priv, type, dev, fmt, args...) \ - netif_level(emerg, priv, type, dev, fmt, ##args) -#define netif_alert(priv, type, dev, fmt, args...) \ - netif_level(alert, priv, type, dev, fmt, ##args) -#define netif_crit(priv, type, dev, fmt, args...) \ - netif_level(crit, priv, type, dev, fmt, ##args) -#define netif_err(priv, type, dev, fmt, args...) \ - netif_level(err, priv, type, dev, fmt, ##args) -#define netif_warn(priv, type, dev, fmt, args...) \ - netif_level(warn, priv, type, dev, fmt, ##args) -#define netif_notice(priv, type, dev, fmt, args...) \ - netif_level(notice, priv, type, dev, fmt, ##args) -#define netif_info(priv, type, dev, fmt, args...) \ - netif_level(info, priv, type, dev, fmt, ##args) - -#if defined(CONFIG_DYNAMIC_DEBUG) || \ - (defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE)) -#define netif_dbg(priv, type, netdev, format, args...) \ -do { \ - if (netif_msg_##type(priv)) \ - dynamic_netdev_dbg(netdev, format, ##args); \ -} while (0) -#elif defined(DEBUG) -#define netif_dbg(priv, type, dev, format, args...) \ - netif_printk(priv, type, KERN_DEBUG, dev, format, ##args) -#else -#define netif_dbg(priv, type, dev, format, args...) \ -({ \ - if (0) \ - netif_printk(priv, type, KERN_DEBUG, dev, format, ##args); \ - 0; \ -}) -#endif - -/* if @cond then downgrade to debug, else print at @level */ -#define netif_cond_dbg(priv, type, netdev, cond, level, fmt, args...) \ - do { \ - if (cond) \ - netif_dbg(priv, type, netdev, fmt, ##args); \ - else \ - netif_ ## level(priv, type, netdev, fmt, ##args); \ - } while (0) - -#if defined(VERBOSE_DEBUG) -#define netif_vdbg netif_dbg -#else -#define netif_vdbg(priv, type, dev, format, args...) \ -({ \ - if (0) \ - netif_printk(priv, type, KERN_DEBUG, dev, format, ##args); \ - 0; \ -}) -#endif - /* * The list of packet types we will receive (as opposed to discard) * and the routines to invoke. -- cgit From 66e4c8d950083df8e12981babca788e1635c92b6 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 9 May 2022 20:57:39 -0700 Subject: net: warn if transport header was not set Make sure skb_transport_header() and skb_transport_offset() uses are not fooled if the transport header has not been set. This change will likely expose existing bugs in linux networking stacks. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/skbuff.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 5296b2c37f62..b91d225fdc13 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -42,6 +42,7 @@ #if IS_ENABLED(CONFIG_NF_CONNTRACK) #include #endif +#include /** * DOC: skb checksums @@ -2904,6 +2905,7 @@ static inline bool skb_transport_header_was_set(const struct sk_buff *skb) static inline unsigned char *skb_transport_header(const struct sk_buff *skb) { + DEBUG_NET_WARN_ON_ONCE(!skb_transport_header_was_set(skb)); return skb->head + skb->transport_header; } -- cgit From ee692a21e9bf8354bd3ec816f1cf4bff8619ed77 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Wed, 11 May 2022 11:17:45 +0530 Subject: fs,io_uring: add infrastructure for uring-cmd file_operations->uring_cmd is a file private handler. This is somewhat similar to ioctl but hopefully a lot more sane and useful as it can be used to enable many io_uring capabilities for the underlying operation. IORING_OP_URING_CMD is a file private kind of request. io_uring doesn't know what is in this command type, it's for the provider of ->uring_cmd() to deal with. Co-developed-by: Kanchan Joshi Signed-off-by: Kanchan Joshi Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220511054750.20432-2-joshi.k@samsung.com Signed-off-by: Jens Axboe --- include/linux/fs.h | 2 ++ include/linux/io_uring.h | 33 +++++++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index bbde95387a23..87b5af1d9fbe 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1953,6 +1953,7 @@ struct dir_context { #define REMAP_FILE_ADVISORY (REMAP_FILE_CAN_SHORTEN) struct iov_iter; +struct io_uring_cmd; struct file_operations { struct module *owner; @@ -1995,6 +1996,7 @@ struct file_operations { struct file *file_out, loff_t pos_out, loff_t len, unsigned int remap_flags); int (*fadvise)(struct file *, loff_t, loff_t, int); + int (*uring_cmd)(struct io_uring_cmd *ioucmd, unsigned int issue_flags); } __randomize_layout; struct inode_operations { diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index 24651c229ed2..4a2f6cc5a492 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -5,7 +5,32 @@ #include #include +enum io_uring_cmd_flags { + IO_URING_F_COMPLETE_DEFER = 1, + IO_URING_F_UNLOCKED = 2, + /* int's last bit, sign checks are usually faster than a bit test */ + IO_URING_F_NONBLOCK = INT_MIN, + + /* ctx state flags, for URING_CMD */ + IO_URING_F_SQE128 = 4, + IO_URING_F_CQE32 = 8, + IO_URING_F_IOPOLL = 16, +}; + +struct io_uring_cmd { + struct file *file; + const void *cmd; + /* callback to defer completions to task context */ + void (*task_work_cb)(struct io_uring_cmd *cmd); + u32 cmd_op; + u32 pad; + u8 pdu[32]; /* available inline for free use */ +}; + #if defined(CONFIG_IO_URING) +void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2); +void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd, + void (*task_work_cb)(struct io_uring_cmd *)); struct sock *io_uring_get_socket(struct file *file); void __io_uring_cancel(bool cancel_all); void __io_uring_free(struct task_struct *tsk); @@ -30,6 +55,14 @@ static inline void io_uring_free(struct task_struct *tsk) __io_uring_free(tsk); } #else +static inline void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, + ssize_t ret2) +{ +} +static inline void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd, + void (*task_work_cb)(struct io_uring_cmd *)) +{ +} static inline struct sock *io_uring_get_socket(struct file *file) { return NULL; -- cgit From deaf7c4b4bf8b802cc465bb9b33fe6c76e812924 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Mon, 9 May 2022 21:03:43 +0200 Subject: lockdep: Delete local_irq_enable_in_hardirq() No more users and there is no desire to grow new ones. Signed-off-by: Thomas Gleixner Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/8735hir0j4.ffs@tglx --- include/linux/interrupt.h | 18 ------------------ 1 file changed, 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index f40754caaefa..b5e06a6e4019 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -222,24 +222,6 @@ devm_request_any_context_irq(struct device *dev, unsigned int irq, extern void devm_free_irq(struct device *dev, unsigned int irq, void *dev_id); -/* - * On lockdep we dont want to enable hardirqs in hardirq - * context. Use local_irq_enable_in_hardirq() to annotate - * kernel code that has to do this nevertheless (pretty much - * the only valid case is for old/broken hardware that is - * insanely slow). - * - * NOTE: in theory this might break fragile code that relies - * on hardirq delivery - in practice we dont seem to have such - * places left. So the only effect should be slightly increased - * irqs-off latencies. - */ -#ifdef CONFIG_LOCKDEP -# define local_irq_enable_in_hardirq() do { } while (0) -#else -# define local_irq_enable_in_hardirq() local_irq_enable() -#endif - bool irq_has_action(unsigned int irq); extern void disable_irq_nosync(unsigned int irq); extern bool disable_hardirq(unsigned int irq); -- cgit From abcebcd39fe094b68826cc04f2eca835606697f9 Mon Sep 17 00:00:00 2001 From: Michael Shych Date: Sat, 30 Apr 2022 14:58:07 +0300 Subject: platform_data/mlxreg: Add field for notification callback Add notification callback to inform caller that platform driver probing has been completed. It allows to caller to perform some initialization flow steps depending on specific driver probing completion. Signed-off-by: Michael Shych Reviewed-by: Vadim Pasternak Link: https://lore.kernel.org/r/20220430115809.54565-2-michaelsh@nvidia.com Signed-off-by: Hans de Goede --- include/linux/platform_data/mlxreg.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/mlxreg.h b/include/linux/platform_data/mlxreg.h index 40185f9d7c14..a6bd74e29b6b 100644 --- a/include/linux/platform_data/mlxreg.h +++ b/include/linux/platform_data/mlxreg.h @@ -216,6 +216,8 @@ struct mlxreg_core_platform_data { * @mask_low: low aggregation interrupt common mask; * @deferred_nr: I2C adapter number must be exist prior probing execution; * @shift_nr: I2C adapter numbers must be incremented by this value; + * @handle: handle to be passed by callback; + * @completion_notify: callback to notify when platform driver probing is done; */ struct mlxreg_core_hotplug_platform_data { struct mlxreg_core_item *items; @@ -228,6 +230,8 @@ struct mlxreg_core_hotplug_platform_data { u32 mask_low; int deferred_nr; int shift_nr; + void *handle; + int (*completion_notify)(void *handle, int id); }; #endif /* __LINUX_PLATFORM_DATA_MLXREG_H */ -- cgit From f9d76d15072caf1ec5558fa7cc6d93c7b9d33488 Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Tue, 10 May 2022 11:51:29 -0400 Subject: USB: gadget: Add ID numbers to gadget names Putting USB gadgets on a new bus of their own encounters a problem when multiple gadgets are present: They all have the same name! The driver core fails with a "sys: cannot create duplicate filename" error when creating any of the /sys/bus/gadget/devices/ symbolic links after the first. This patch fixes the problem by adding a ".N" suffix to each gadget's name when the gadget is registered (where N is a unique ID number), thus making the names distinct. Reported-and-tested-by: Geert Uytterhoeven Signed-off-by: Alan Stern Reviewed-by: Geert Uytterhoeven Fixes: fc274c1e9973 ("USB: gadget: Add a new bus for gadgets") Link: https://lore.kernel.org/r/YnqKAXKyp9Vq/pbn@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/gadget.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/usb/gadget.h b/include/linux/usb/gadget.h index cf7af8a0a6e9..3ad58b7a0824 100644 --- a/include/linux/usb/gadget.h +++ b/include/linux/usb/gadget.h @@ -386,6 +386,7 @@ struct usb_gadget_ops { * @lpm_capable: If the gadget max_speed is FULL or HIGH, this flag * indicates that it supports LPM as per the LPM ECN & errata. * @irq: the interrupt number for device controller. + * @id_number: a unique ID number for ensuring that gadget names are distinct * * Gadgets have a mostly-portable "gadget driver" implementing device * functions, handling all usb configurations and interfaces. Gadget @@ -446,6 +447,7 @@ struct usb_gadget { unsigned connected:1; unsigned lpm_capable:1; int irq; + int id_number; }; #define work_to_gadget(w) (container_of((w), struct usb_gadget, work)) -- cgit From a6b94c6b49198266eaf78095a632df7245ef5196 Mon Sep 17 00:00:00 2001 From: Michael Kelley Date: Mon, 2 May 2022 09:36:28 -0700 Subject: Drivers: hv: vmbus: Remove support for Hyper-V 2008 and Hyper-V 2008R2/Win7 The VMbus driver has special case code for running on the first released versions of Hyper-V: 2008 and 2008 R2/Windows 7. These versions are now out of support (except for extended security updates) and lack the performance features needed for effective production usage of Linux guests. Simplify the code by removing the negotiation of the VMbus protocol versions required for these releases of Hyper-V, and by removing the special case code for handling these VMbus protocol versions. Signed-off-by: Michael Kelley Reviewed-by: Andrea Parri (Microsoft) Link: https://lore.kernel.org/r/1651509391-2058-2-git-send-email-mikelley@microsoft.com Signed-off-by: Wei Liu --- include/linux/hyperv.h | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index c440c45887c2..a2464295c14a 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -230,15 +230,19 @@ static inline u32 hv_get_avail_to_write_percent( * two 16 bit quantities: major_number. minor_number. * * 0 . 13 (Windows Server 2008) - * 1 . 1 (Windows 7) - * 2 . 4 (Windows 8) - * 3 . 0 (Windows 8 R2) + * 1 . 1 (Windows 7, WS2008 R2) + * 2 . 4 (Windows 8, WS2012) + * 3 . 0 (Windows 8.1, WS2012 R2) * 4 . 0 (Windows 10) * 4 . 1 (Windows 10 RS3) * 5 . 0 (Newer Windows 10) * 5 . 1 (Windows 10 RS4) * 5 . 2 (Windows Server 2019, RS5) * 5 . 3 (Windows Server 2022) + * + * The WS2008 and WIN7 versions are listed here for + * completeness but are no longer supported in the + * Linux kernel. */ #define VERSION_WS2008 ((0 << 16) | (13)) -- cgit From 09ea48efffa3156218980e20aaf23dcc7d6000fc Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Wed, 11 May 2022 13:12:58 -0600 Subject: vfio: Make vfio_(un)register_notifier accept a vfio_device All callers have a struct vfio_device trivially available, pass it in directly and avoid calling the expensive vfio_group_get_from_dev(). Acked-by: Eric Farman Reviewed-by: Jason J. Herne Reviewed-by: Tony Krowiak Reviewed-by: Kevin Tian Reviewed-by: Christoph Hellwig Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/1-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 66dda06ec42d..a00fd722f044 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -178,11 +178,11 @@ enum vfio_notify_type { /* events for VFIO_GROUP_NOTIFY */ #define VFIO_GROUP_NOTIFY_SET_KVM BIT(0) -extern int vfio_register_notifier(struct device *dev, +extern int vfio_register_notifier(struct vfio_device *device, enum vfio_notify_type type, unsigned long *required_events, struct notifier_block *nb); -extern int vfio_unregister_notifier(struct device *dev, +extern int vfio_unregister_notifier(struct vfio_device *device, enum vfio_notify_type type, struct notifier_block *nb); -- cgit From 8e432bb015b6c327d016b1dca509964e189c4770 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Wed, 11 May 2022 13:12:59 -0600 Subject: vfio/mdev: Pass in a struct vfio_device * to vfio_pin/unpin_pages() Every caller has a readily available vfio_device pointer, use that instead of passing in a generic struct device. The struct vfio_device already contains the group we need so this avoids complexity, extra refcountings, and a confusing lifecycle model. Reviewed-by: Christoph Hellwig Acked-by: Eric Farman Reviewed-by: Jason J. Herne Reviewed-by: Tony Krowiak Reviewed-by: Kevin Tian Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/3-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index a00fd722f044..bddc70f88899 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -150,9 +150,9 @@ extern long vfio_external_check_extension(struct vfio_group *group, #define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long)) -extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn, +extern int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn, int npage, int prot, unsigned long *phys_pfn); -extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, +extern int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn, int npage); extern int vfio_group_pin_pages(struct vfio_group *group, -- cgit From c6250ffbacc5989a5db3b9acce34b93570938f60 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Wed, 11 May 2022 13:12:59 -0600 Subject: vfio/mdev: Pass in a struct vfio_device * to vfio_dma_rw() Every caller has a readily available vfio_device pointer, use that instead of passing in a generic struct device. Change vfio_dma_rw() to take in the struct vfio_device and move the container users that would have been held by vfio_group_get_external_user_from_dev() to vfio_dma_rw() directly, like vfio_pin/unpin_pages(). Reviewed-by: Christoph Hellwig Reviewed-by: Kevin Tian Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/4-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index bddc70f88899..8a1510258717 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -161,7 +161,7 @@ extern int vfio_group_pin_pages(struct vfio_group *group, extern int vfio_group_unpin_pages(struct vfio_group *group, unsigned long *user_iova_pfn, int npage); -extern int vfio_dma_rw(struct vfio_group *group, dma_addr_t user_iova, +extern int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, void *data, size_t len, bool write); extern struct iommu_domain *vfio_group_iommu_domain(struct vfio_group *group); -- cgit From 231657b345046552b35670b45b892cfce2610e12 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Wed, 11 May 2022 13:13:00 -0600 Subject: vfio: Remove dead code Now that callers have been updated to use the vfio_device APIs the driver facing group interface is no longer used, delete it: - vfio_group_get_external_user_from_dev() - vfio_group_pin_pages() - vfio_group_unpin_pages() - vfio_group_iommu_domain() -- Reviewed-by: Christoph Hellwig Reviewed-by: Kevin Tian Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/6-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 11 ----------- 1 file changed, 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 8a1510258717..6195edd2edcd 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -140,8 +140,6 @@ int vfio_mig_get_next_state(struct vfio_device *device, */ extern struct vfio_group *vfio_group_get_external_user(struct file *filep); extern void vfio_group_put_external_user(struct vfio_group *group); -extern struct vfio_group *vfio_group_get_external_user_from_dev(struct device - *dev); extern bool vfio_external_group_match_file(struct vfio_group *group, struct file *filep); extern int vfio_external_user_iommu_id(struct vfio_group *group); @@ -154,18 +152,9 @@ extern int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn, int npage, int prot, unsigned long *phys_pfn); extern int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn, int npage); - -extern int vfio_group_pin_pages(struct vfio_group *group, - unsigned long *user_iova_pfn, int npage, - int prot, unsigned long *phys_pfn); -extern int vfio_group_unpin_pages(struct vfio_group *group, - unsigned long *user_iova_pfn, int npage); - extern int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, void *data, size_t len, bool write); -extern struct iommu_domain *vfio_group_iommu_domain(struct vfio_group *group); - /* each type has independent events */ enum vfio_notify_type { VFIO_IOMMU_NOTIFY = 0, -- cgit From ff806cbd90bd2cc3d08a026e26157e5b5a6b5e03 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Wed, 11 May 2022 13:19:07 -0600 Subject: vfio/pci: Remove vfio_device_get_from_dev() The last user of this function is in PCI callbacks that want to convert their struct pci_dev to a vfio_device. Instead of searching use the vfio_device available trivially through the drvdata. When a callback in the device_driver is called, the caller must hold the device_lock() on dev. The purpose of the device_lock is to prevent remove() from being called (see __device_release_driver), and allow the driver to safely interact with its drvdata without races. The PCI core correctly follows this and holds the device_lock() when calling error_detected (see report_error_detected) and sriov_configure (see sriov_numvfs_store). Further, since the drvdata holds a positive refcount on the vfio_device any access of the drvdata, under the device_lock(), from a driver callback needs no further protection or refcounting. Thus the remark in the vfio_device_get_from_dev() comment does not apply here, VFIO PCI drivers all call vfio_unregister_group_dev() from their remove callbacks under the device_lock() and cannot race with the remaining callers. Reviewed-by: Kevin Tian Reviewed-by: Shameer Kolothum Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/2-v4-c841817a0349+8f-vfio_get_from_dev_jgg@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 2 -- include/linux/vfio_pci_core.h | 3 ++- 2 files changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 6195edd2edcd..be1e7db4a314 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -125,8 +125,6 @@ void vfio_uninit_group_dev(struct vfio_device *device); int vfio_register_group_dev(struct vfio_device *device); int vfio_register_emulated_iommu_dev(struct vfio_device *device); void vfio_unregister_group_dev(struct vfio_device *device); -extern struct vfio_device *vfio_device_get_from_dev(struct device *dev); -extern void vfio_device_put(struct vfio_device *device); int vfio_assign_device_set(struct vfio_device *device, void *set_id); diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index 48f2dd3c568c..23c176d4b073 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -227,8 +227,9 @@ void vfio_pci_core_init_device(struct vfio_pci_core_device *vdev, int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev); void vfio_pci_core_uninit_device(struct vfio_pci_core_device *vdev); void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev); -int vfio_pci_core_sriov_configure(struct pci_dev *pdev, int nr_virtfn); extern const struct pci_error_handlers vfio_pci_core_err_handlers; +int vfio_pci_core_sriov_configure(struct vfio_pci_core_device *vdev, + int nr_virtfn); long vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd, unsigned long arg); int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags, -- cgit From 157cc18122b4a1456d19048e151a164216c4a704 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 22 Apr 2022 09:48:54 -0500 Subject: signal: Rename send_signal send_signal_locked Rename send_signal and __send_signal to send_signal_locked and __send_signal_locked to make send_signal usable outside of signal.c. Tested-by: Kees Cook Reviewed-by: Oleg Nesterov Link: https://lkml.kernel.org/r/20220505182645.497868-1-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/signal.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/signal.h b/include/linux/signal.h index a6db6f2ae113..55605bdf5ce9 100644 --- a/include/linux/signal.h +++ b/include/linux/signal.h @@ -283,6 +283,8 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info, extern int group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p, enum pid_type type); extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *); +extern int send_signal_locked(int sig, struct kernel_siginfo *info, + struct task_struct *p, enum pid_type type); extern int sigprocmask(int, sigset_t *, sigset_t *); extern void set_current_blocked(sigset_t *); extern void __set_current_blocked(const sigset_t *); -- cgit From e71ba124078e391879e0bf111529fa2d630d106c Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Fri, 22 Apr 2022 09:28:50 -0500 Subject: signal: Replace __group_send_sig_info with send_signal_locked The function __group_send_sig_info is just a light wrapper around send_signal_locked with one parameter fixed to a constant value. As the wrapper adds no real value update the code to directly call the wrapped function. Tested-by: Kees Cook Reviewed-by: Oleg Nesterov Link: https://lkml.kernel.org/r/20220505182645.497868-2-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/signal.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/signal.h b/include/linux/signal.h index 55605bdf5ce9..3b98e7a28538 100644 --- a/include/linux/signal.h +++ b/include/linux/signal.h @@ -282,7 +282,6 @@ extern int do_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p, enum pid_type type); extern int group_send_sig_info(int sig, struct kernel_siginfo *info, struct task_struct *p, enum pid_type type); -extern int __group_send_sig_info(int, struct kernel_siginfo *, struct task_struct *); extern int send_signal_locked(int sig, struct kernel_siginfo *info, struct task_struct *p, enum pid_type type); extern int sigprocmask(int, sigset_t *, sigset_t *); -- cgit From c200e4bb44e80b343c09841e7caaaca0aac5e5fa Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Tue, 26 Apr 2022 16:30:17 -0500 Subject: ptrace/um: Replace PT_DTRACE with TIF_SINGLESTEP User mode linux is the last user of the PT_DTRACE flag. Using the flag to indicate single stepping is a little confusing and worse changing tsk->ptrace without locking could potentionally cause problems. So use a thread info flag with a better name instead of flag in tsk->ptrace. Remove the definition PT_DTRACE as uml is the last user. Cc: stable@vger.kernel.org Acked-by: Johannes Berg Tested-by: Kees Cook Reviewed-by: Oleg Nesterov Link: https://lkml.kernel.org/r/20220505182645.497868-3-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/ptrace.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index 15b3d176b6b4..4c06f9f8ef3f 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -30,7 +30,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr, #define PT_SEIZED 0x00010000 /* SEIZE used, enable new behavior */ #define PT_PTRACED 0x00000001 -#define PT_DTRACE 0x00000002 /* delayed trace (used on m68k, i386) */ #define PT_OPT_FLAG_SHIFT 3 /* PT_TRACE_* event enable flags */ -- cgit From 4a3d2717d140401df7501a95e454180831a0c5af Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Tue, 26 Apr 2022 16:45:37 -0500 Subject: ptrace/xtensa: Replace PT_SINGLESTEP with TIF_SINGLESTEP xtensa is the last user of the PT_SINGLESTEP flag. Changing tsk->ptrace in user_enable_single_step and user_disable_single_step without locking could potentiallly cause problems. So use a thread info flag instead of a flag in tsk->ptrace. Use TIF_SINGLESTEP that xtensa already had defined but unused. Remove the definitions of PT_SINGLESTEP and PT_BLOCKSTEP as they have no more users. Cc: stable@vger.kernel.org Acked-by: Max Filippov Tested-by: Kees Cook Reviewed-by: Oleg Nesterov Link: https://lkml.kernel.org/r/20220505182645.497868-4-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/ptrace.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ptrace.h b/include/linux/ptrace.h index 4c06f9f8ef3f..c952c5ba8fab 100644 --- a/include/linux/ptrace.h +++ b/include/linux/ptrace.h @@ -46,12 +46,6 @@ extern int ptrace_access_vm(struct task_struct *tsk, unsigned long addr, #define PT_EXITKILL (PTRACE_O_EXITKILL << PT_OPT_FLAG_SHIFT) #define PT_SUSPEND_SECCOMP (PTRACE_O_SUSPEND_SECCOMP << PT_OPT_FLAG_SHIFT) -/* single stepping state bits (used on ARM and PA-RISC) */ -#define PT_SINGLESTEP_BIT 31 -#define PT_SINGLESTEP (1< Date: Fri, 29 Apr 2022 08:43:34 -0500 Subject: ptrace: Don't change __state Stop playing with tsk->__state to remove TASK_WAKEKILL while a ptrace command is executing. Instead remove TASK_WAKEKILL from the definition of TASK_TRACED, and implement a new jobctl flag TASK_PTRACE_FROZEN. This new flag is set in jobctl_freeze_task and cleared when ptrace_stop is awoken or in jobctl_unfreeze_task (when ptrace_stop remains asleep). In signal_wake_up add __TASK_TRACED to state along with TASK_WAKEKILL when the wake up is for a fatal signal. Skip adding __TASK_TRACED when TASK_PTRACE_FROZEN is not set. This has the same effect as changing TASK_TRACED to __TASK_TRACED as all of the wake_ups that use TASK_KILLABLE go through signal_wake_up. Handle a ptrace_stop being called with a pending fatal signal. Previously it would have been handled by schedule simply failing to sleep. As TASK_WAKEKILL is no longer part of TASK_TRACED schedule will sleep with a fatal_signal_pending. The code in signal_wake_up guarantees that the code will be awaked by any fatal signal that codes after TASK_TRACED is set. Previously the __state value of __TASK_TRACED was changed to TASK_RUNNING when woken up or back to TASK_TRACED when the code was left in ptrace_stop. Now when woken up ptrace_stop now clears JOBCTL_PTRACE_FROZEN and when left sleeping ptrace_unfreezed_traced clears JOBCTL_PTRACE_FROZEN. Tested-by: Kees Cook Reviewed-by: Oleg Nesterov Link: https://lkml.kernel.org/r/20220505182645.497868-10-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" --- include/linux/sched.h | 2 +- include/linux/sched/jobctl.h | 2 ++ include/linux/sched/signal.h | 5 +++-- 3 files changed, 6 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index d5e3c00b74e1..610f2fdb1e2c 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -103,7 +103,7 @@ struct task_group; /* Convenience macros for the sake of set_current_state: */ #define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE) #define TASK_STOPPED (TASK_WAKEKILL | __TASK_STOPPED) -#define TASK_TRACED (TASK_WAKEKILL | __TASK_TRACED) +#define TASK_TRACED __TASK_TRACED #define TASK_IDLE (TASK_UNINTERRUPTIBLE | TASK_NOLOAD) diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h index fa067de9f1a9..d556c3425963 100644 --- a/include/linux/sched/jobctl.h +++ b/include/linux/sched/jobctl.h @@ -19,6 +19,7 @@ struct task_struct; #define JOBCTL_TRAPPING_BIT 21 /* switching to TRACED */ #define JOBCTL_LISTENING_BIT 22 /* ptracer is listening for events */ #define JOBCTL_TRAP_FREEZE_BIT 23 /* trap for cgroup freezer */ +#define JOBCTL_PTRACE_FROZEN_BIT 24 /* frozen for ptrace */ #define JOBCTL_STOP_DEQUEUED (1UL << JOBCTL_STOP_DEQUEUED_BIT) #define JOBCTL_STOP_PENDING (1UL << JOBCTL_STOP_PENDING_BIT) @@ -28,6 +29,7 @@ struct task_struct; #define JOBCTL_TRAPPING (1UL << JOBCTL_TRAPPING_BIT) #define JOBCTL_LISTENING (1UL << JOBCTL_LISTENING_BIT) #define JOBCTL_TRAP_FREEZE (1UL << JOBCTL_TRAP_FREEZE_BIT) +#define JOBCTL_PTRACE_FROZEN (1UL << JOBCTL_PTRACE_FROZEN_BIT) #define JOBCTL_TRAP_MASK (JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY) #define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK) diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index 3c8b34876744..e66948abbee4 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -435,9 +435,10 @@ extern void calculate_sigpending(void); extern void signal_wake_up_state(struct task_struct *t, unsigned int state); -static inline void signal_wake_up(struct task_struct *t, bool resume) +static inline void signal_wake_up(struct task_struct *t, bool fatal) { - signal_wake_up_state(t, resume ? TASK_WAKEKILL : 0); + fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN); + signal_wake_up_state(t, fatal ? TASK_WAKEKILL | __TASK_TRACED : 0); } static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume) { -- cgit From 31cae1eaae4fd65095ad6a3659db467bc3c2599e Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 3 May 2022 15:57:47 -0500 Subject: sched,signal,ptrace: Rework TASK_TRACED, TASK_STOPPED state Currently ptrace_stop() / do_signal_stop() rely on the special states TASK_TRACED and TASK_STOPPED resp. to keep unique state. That is, this state exists only in task->__state and nowhere else. There's two spots of bother with this: - PREEMPT_RT has task->saved_state which complicates matters, meaning task_is_{traced,stopped}() needs to check an additional variable. - An alternative freezer implementation that itself relies on a special TASK state would loose TASK_TRACED/TASK_STOPPED and will result in misbehaviour. As such, add additional state to task->jobctl to track this state outside of task->__state. NOTE: this doesn't actually fix anything yet, just adds extra state. --EWB * didn't add a unnecessary newline in signal.h * Update t->jobctl in signal_wake_up and ptrace_signal_wake_up instead of in signal_wake_up_state. This prevents the clearing of TASK_STOPPED and TASK_TRACED from getting lost. * Added warnings if JOBCTL_STOPPED or JOBCTL_TRACED are not cleared Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20220421150654.757693825@infradead.org Tested-by: Kees Cook Reviewed-by: Oleg Nesterov Link: https://lkml.kernel.org/r/20220505182645.497868-12-ebiederm@xmission.com Signed-off-by: Eric W. Biederman --- include/linux/sched.h | 8 +++----- include/linux/sched/jobctl.h | 6 ++++++ include/linux/sched/signal.h | 19 +++++++++++++++---- 3 files changed, 24 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 610f2fdb1e2c..cbe5c899599c 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -118,11 +118,9 @@ struct task_group; #define task_is_running(task) (READ_ONCE((task)->__state) == TASK_RUNNING) -#define task_is_traced(task) ((READ_ONCE(task->__state) & __TASK_TRACED) != 0) - -#define task_is_stopped(task) ((READ_ONCE(task->__state) & __TASK_STOPPED) != 0) - -#define task_is_stopped_or_traced(task) ((READ_ONCE(task->__state) & (__TASK_STOPPED | __TASK_TRACED)) != 0) +#define task_is_traced(task) ((READ_ONCE(task->jobctl) & JOBCTL_TRACED) != 0) +#define task_is_stopped(task) ((READ_ONCE(task->jobctl) & JOBCTL_STOPPED) != 0) +#define task_is_stopped_or_traced(task) ((READ_ONCE(task->jobctl) & (JOBCTL_STOPPED | JOBCTL_TRACED)) != 0) /* * Special states are those that do not use the normal wait-loop pattern. See diff --git a/include/linux/sched/jobctl.h b/include/linux/sched/jobctl.h index d556c3425963..68876d0a7ef9 100644 --- a/include/linux/sched/jobctl.h +++ b/include/linux/sched/jobctl.h @@ -21,6 +21,9 @@ struct task_struct; #define JOBCTL_TRAP_FREEZE_BIT 23 /* trap for cgroup freezer */ #define JOBCTL_PTRACE_FROZEN_BIT 24 /* frozen for ptrace */ +#define JOBCTL_STOPPED_BIT 26 /* do_signal_stop() */ +#define JOBCTL_TRACED_BIT 27 /* ptrace_stop() */ + #define JOBCTL_STOP_DEQUEUED (1UL << JOBCTL_STOP_DEQUEUED_BIT) #define JOBCTL_STOP_PENDING (1UL << JOBCTL_STOP_PENDING_BIT) #define JOBCTL_STOP_CONSUME (1UL << JOBCTL_STOP_CONSUME_BIT) @@ -31,6 +34,9 @@ struct task_struct; #define JOBCTL_TRAP_FREEZE (1UL << JOBCTL_TRAP_FREEZE_BIT) #define JOBCTL_PTRACE_FROZEN (1UL << JOBCTL_PTRACE_FROZEN_BIT) +#define JOBCTL_STOPPED (1UL << JOBCTL_STOPPED_BIT) +#define JOBCTL_TRACED (1UL << JOBCTL_TRACED_BIT) + #define JOBCTL_TRAP_MASK (JOBCTL_TRAP_STOP | JOBCTL_TRAP_NOTIFY) #define JOBCTL_PENDING_MASK (JOBCTL_STOP_PENDING | JOBCTL_TRAP_MASK) diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index e66948abbee4..07ba3404fcde 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -294,8 +294,10 @@ static inline int kernel_dequeue_signal(void) static inline void kernel_signal_stop(void) { spin_lock_irq(¤t->sighand->siglock); - if (current->jobctl & JOBCTL_STOP_DEQUEUED) + if (current->jobctl & JOBCTL_STOP_DEQUEUED) { + current->jobctl |= JOBCTL_STOPPED; set_special_state(TASK_STOPPED); + } spin_unlock_irq(¤t->sighand->siglock); schedule(); @@ -437,12 +439,21 @@ extern void signal_wake_up_state(struct task_struct *t, unsigned int state); static inline void signal_wake_up(struct task_struct *t, bool fatal) { - fatal = fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN); - signal_wake_up_state(t, fatal ? TASK_WAKEKILL | __TASK_TRACED : 0); + unsigned int state = 0; + if (fatal && !(t->jobctl & JOBCTL_PTRACE_FROZEN)) { + t->jobctl &= ~(JOBCTL_STOPPED | JOBCTL_TRACED); + state = TASK_WAKEKILL | __TASK_TRACED; + } + signal_wake_up_state(t, state); } static inline void ptrace_signal_wake_up(struct task_struct *t, bool resume) { - signal_wake_up_state(t, resume ? __TASK_TRACED : 0); + unsigned int state = 0; + if (resume) { + t->jobctl &= ~JOBCTL_TRACED; + state = __TASK_TRACED; + } + signal_wake_up_state(t, state); } void task_join_group_stop(struct task_struct *task); -- cgit From 5b74c690e1c55953ec99fd9dab74f72dbee4fe95 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Thu, 12 May 2022 01:16:51 +0530 Subject: bpf: Fix sparse warning for bpf_kptr_xchg_proto Kernel Test Robot complained about missing static storage class annotation for bpf_kptr_xchg_proto variable. sparse: symbol 'bpf_kptr_xchg_proto' was not declared. Should it be static? This caused by missing extern definition in the header. Add it to suppress the sparse warning. Fixes: c0a5a21c25f3 ("bpf: Allow storing referenced kptr in map") Reported-by: kernel test robot Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220511194654.765705-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index aba7ded56436..3ded8711457f 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2246,6 +2246,7 @@ extern const struct bpf_func_proto bpf_find_vma_proto; extern const struct bpf_func_proto bpf_loop_proto; extern const struct bpf_func_proto bpf_strncmp_proto; extern const struct bpf_func_proto bpf_copy_from_user_task_proto; +extern const struct bpf_func_proto bpf_kptr_xchg_proto; const struct bpf_func_proto *tracing_prog_func_proto( enum bpf_func_id func_id, const struct bpf_prog *prog); -- cgit From 07343110b293456d30393e89b86c4dee1ac051c8 Mon Sep 17 00:00:00 2001 From: Feng Zhou Date: Wed, 11 May 2022 17:38:53 +0800 Subject: bpf: add bpf_map_lookup_percpu_elem for percpu map Add new ebpf helpers bpf_map_lookup_percpu_elem. The implementation method is relatively simple, refer to the implementation method of map_lookup_elem of percpu map, increase the parameters of cpu, and obtain it according to the specified cpu. Signed-off-by: Feng Zhou Link: https://lore.kernel.org/r/20220511093854.411-2-zhoufeng.zf@bytedance.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 3ded8711457f..5061ccd8b2dc 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -89,6 +89,7 @@ struct bpf_map_ops { int (*map_push_elem)(struct bpf_map *map, void *value, u64 flags); int (*map_pop_elem)(struct bpf_map *map, void *value); int (*map_peek_elem)(struct bpf_map *map, void *value); + void *(*map_lookup_percpu_elem)(struct bpf_map *map, void *key, u32 cpu); /* funcs called by prog_array and perf_event_array map */ void *(*map_fd_get_ptr)(struct bpf_map *map, struct file *map_file, @@ -2184,6 +2185,7 @@ extern const struct bpf_func_proto bpf_map_delete_elem_proto; extern const struct bpf_func_proto bpf_map_push_elem_proto; extern const struct bpf_func_proto bpf_map_pop_elem_proto; extern const struct bpf_func_proto bpf_map_peek_elem_proto; +extern const struct bpf_func_proto bpf_map_lookup_percpu_elem_proto; extern const struct bpf_func_proto bpf_get_prandom_u32_proto; extern const struct bpf_func_proto bpf_get_smp_processor_id_proto; -- cgit From 43213daed6d6cb60e8cf69058a6db8648a556d9d Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Tue, 10 May 2022 19:53:01 -0700 Subject: fortify: Provide a memcpy trap door for sharp corners As we continue to narrow the scope of what the FORTIFY memcpy() will accept and build alternative APIs that give the compiler appropriate visibility into more complex memcpy scenarios, there is a need for "unfortified" memcpy use in rare cases where combinations of compiler behaviors, source code layout, etc, result in cases where the stricter memcpy checks need to be bypassed until appropriate solutions can be developed (i.e. fix compiler bugs, code refactoring, new API, etc). The intention is for this to be used only if there's no other reasonable solution, for its use to include a justification that can be used to assess future solutions, and for it to be temporary. Example usage included, based on analysis and discussion from: https://lore.kernel.org/netdev/CANn89iLS_2cshtuXPyNUGDPaic=sJiYfvTb_wNLgWrZRyBxZ_g@mail.gmail.com Cc: Jakub Kicinski Cc: Eric Dumazet Cc: "David S. Miller" Cc: Paolo Abeni Cc: Coco Li Cc: Tariq Toukan Cc: Saeed Mahameed Cc: Leon Romanovsky Cc: netdev@vger.kernel.org Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220511025301.3636666-1-keescook@chromium.org Signed-off-by: Paolo Abeni --- include/linux/fortify-string.h | 16 ++++++++++++++++ include/linux/string.h | 4 ++++ 2 files changed, 20 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index 295637a66c46..3b401fa0f374 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -52,6 +52,22 @@ extern char *__underlying_strncpy(char *p, const char *q, __kernel_size_t size) #define __underlying_strncpy __builtin_strncpy #endif +/** + * unsafe_memcpy - memcpy implementation with no FORTIFY bounds checking + * + * @dst: Destination memory address to write to + * @src: Source memory address to read from + * @bytes: How many bytes to write to @dst from @src + * @justification: Free-form text or comment describing why the use is needed + * + * This should be used for corner cases where the compiler cannot do the + * right thing, or during transitions between APIs, etc. It should be used + * very rarely, and includes a place for justification detailing where bounds + * checking has happened, and why existing solutions cannot be employed. + */ +#define unsafe_memcpy(dst, src, bytes, justification) \ + __underlying_memcpy(dst, src, bytes) + /* * Clang's use of __builtin_object_size() within inlines needs hinting via * __pass_object_size(). The preference is to only ever use type 1 (member diff --git a/include/linux/string.h b/include/linux/string.h index b6572aeca2f5..61ec7e4f6311 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -252,6 +252,10 @@ static inline const char *kbasename(const char *path) #if !defined(__NO_FORTIFY) && defined(__OPTIMIZE__) && defined(CONFIG_FORTIFY_SOURCE) #include #endif +#ifndef unsafe_memcpy +#define unsafe_memcpy(dst, src, bytes, justification) \ + memcpy(dst, src, bytes) +#endif void memcpy_and_pad(void *dest, size_t dest_len, const void *src, size_t count, int pad); -- cgit From a44623d9279086c89f631201d993aa332f7c9e66 Mon Sep 17 00:00:00 2001 From: Kishon Vijay Abraham I Date: Tue, 10 May 2022 14:46:29 +0530 Subject: usb: core: hcd: Add support for deferring roothub registration It has been observed with certain PCIe USB cards (like Inateck connected to AM64 EVM or J7200 EVM) that as soon as the primary roothub is registered, port status change is handled even before xHC is running leading to cold plug USB devices not detected. For such cases, registering both the root hubs along with the second HCD is required. Add support for deferring roothub registration in usb_add_hcd(), so that both primary and secondary roothubs are registered along with the second HCD. This patch has been added and reverted earier as it triggered a race in usb device enumeration. That race is now fixed in 5.16-rc3, and in stable back to 5.4 commit 6cca13de26ee ("usb: hub: Fix locking issues with address0_mutex") commit 6ae6dc22d2d1 ("usb: hub: Fix usb enumeration issue due to address0 race") CC: stable@vger.kernel.org # 5.4+ Suggested-by: Mathias Nyman Tested-by: Chris Chiu Acked-by: Alan Stern Signed-off-by: Kishon Vijay Abraham I Link: https://lore.kernel.org/r/20220510091630.16564-2-kishon@ti.com Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/hcd.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h index 548a028f2dab..2c1fc9212cf2 100644 --- a/include/linux/usb/hcd.h +++ b/include/linux/usb/hcd.h @@ -124,6 +124,7 @@ struct usb_hcd { #define HCD_FLAG_RH_RUNNING 5 /* root hub is running? */ #define HCD_FLAG_DEAD 6 /* controller has died? */ #define HCD_FLAG_INTF_AUTHORIZED 7 /* authorize interfaces? */ +#define HCD_FLAG_DEFER_RH_REGISTER 8 /* Defer roothub registration */ /* The flags can be tested using these macros; they are likely to * be slightly faster than test_bit(). @@ -134,6 +135,7 @@ struct usb_hcd { #define HCD_WAKEUP_PENDING(hcd) ((hcd)->flags & (1U << HCD_FLAG_WAKEUP_PENDING)) #define HCD_RH_RUNNING(hcd) ((hcd)->flags & (1U << HCD_FLAG_RH_RUNNING)) #define HCD_DEAD(hcd) ((hcd)->flags & (1U << HCD_FLAG_DEAD)) +#define HCD_DEFER_RH_REGISTER(hcd) ((hcd)->flags & (1U << HCD_FLAG_DEFER_RH_REGISTER)) /* * Specifies if interfaces are authorized by default -- cgit From 5ce7729f25c16d5045deff4c9577e6d565da2d8d Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 12 May 2022 08:14:08 +0200 Subject: block: reorder the REQ_ flags Keep the op-specific flag last so that they are clearly separate from the generic flags. Various recent commits just kept adding new flags at the end. Signed-off-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220512061408.1826595-1-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blk_types.h | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 4968cb17b13b..30f9a5391fd5 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -408,16 +408,17 @@ enum req_flag_bits { * work item to avoid such priority inversions. */ __REQ_CGROUP_PUNT, - - /* command specific flags for REQ_OP_WRITE_ZEROES: */ - __REQ_NOUNMAP, /* do not free blocks when zeroing */ - __REQ_POLLED, /* caller polls for completion using bio_poll */ __REQ_ALLOC_CACHE, /* allocate IO from cache if available */ + __REQ_SWAP, /* swap I/O */ + __REQ_DRV, /* for driver use */ + + /* + * Command specific flags, keep last: + */ + /* for REQ_OP_WRITE_ZEROES: */ + __REQ_NOUNMAP, /* do not free blocks when zeroing */ - /* for driver use */ - __REQ_DRV, - __REQ_SWAP, /* swapping request. */ __REQ_NR_BITS, /* stops here */ }; -- cgit From 5d2ae14276e698c76fa0c8ce870103f343b38263 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Wed, 11 May 2022 16:51:52 -0700 Subject: block: Fix the bio.bi_opf comment Commit ef295ecf090d modified the Linux kernel such that the bottom bits of the bi_opf member contain the operation instead of the topmost bits. That commit did not update the comment next to bi_opf. Hence this patch. From commit ef295ecf090d: -#define bio_op(bio) ((bio)->bi_opf >> BIO_OP_SHIFT) +#define bio_op(bio) ((bio)->bi_opf & REQ_OP_MASK) Cc: Christoph Hellwig Cc: Ming Lei Fixes: ef295ecf090d ("block: better op and flags encoding") Signed-off-by: Bart Van Assche Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220511235152.1082246-1-bvanassche@acm.org Signed-off-by: Jens Axboe --- include/linux/blk_types.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 30f9a5391fd5..40e815400611 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -246,9 +246,8 @@ typedef unsigned int blk_qc_t; struct bio { struct bio *bi_next; /* request queue link */ struct block_device *bi_bdev; - unsigned int bi_opf; /* bottom bits req flags, - * top bits REQ_OP. Use - * accessors. + unsigned int bi_opf; /* bottom bits REQ_OP, top bits + * req_flags. */ unsigned short bi_flags; /* BIO_* below */ unsigned short bi_ioprio; -- cgit From 2760f5a415c3b86c6394738c6cff740c8b4ce664 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 6 May 2022 15:54:01 -0700 Subject: stop_machine: Add stop_core_cpuslocked() for per-core operations Hardware core level testing features require near simultaneous execution of WRMSR instructions on all threads of a core to initiate a test. Provide a customized cut down version of stop_machine_cpuslocked() that just operates on the threads of a single core. Suggested-by: Thomas Gleixner Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Tony Luck Reviewed-by: Thomas Gleixner Link: https://lore.kernel.org/r/20220506225410.1652287-4-tony.luck@intel.com Signed-off-by: Hans de Goede --- include/linux/stop_machine.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h index 46fb3ebdd16e..ea7a74ea7389 100644 --- a/include/linux/stop_machine.h +++ b/include/linux/stop_machine.h @@ -124,6 +124,22 @@ int stop_machine(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus); */ int stop_machine_cpuslocked(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus); +/** + * stop_core_cpuslocked: - stop all threads on just one core + * @cpu: any cpu in the targeted core + * @fn: the function to run + * @data: the data ptr for @fn() + * + * Same as above, but instead of every CPU, only the logical CPUs of a + * single core are affected. + * + * Context: Must be called from within a cpus_read_lock() protected region. + * + * Return: 0 if all executions of @fn returned 0, any non zero return + * value if any returned non zero. + */ +int stop_core_cpuslocked(unsigned int cpu, cpu_stop_fn_t fn, void *data); + int stop_machine_from_inactive_cpu(cpu_stop_fn_t fn, void *data, const struct cpumask *cpus); #else /* CONFIG_SMP || CONFIG_HOTPLUG_CPU */ -- cgit From 75d6fe48a21a0ea1565228c12b9c16f3fb37b673 Mon Sep 17 00:00:00 2001 From: Siddh Raman Pant Date: Thu, 12 May 2022 20:06:45 +0530 Subject: spi: Doc fix - Describe add_lock and dma_map_dev in spi_controller This fixes the corresponding warnings during building the docs. Signed-off-by: Siddh Raman Pant Link: https://lore.kernel.org/r/4e6187a4-d0f8-4750-e407-e09cc1c91789@gmail.com Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 5f8c063ddff4..df70eb1a671e 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -347,6 +347,7 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch * @max_message_size: function that returns the max message size for * a &spi_device; may be %NULL, so the default %SIZE_MAX will be used. * @io_mutex: mutex for physical bus access + * @add_lock: mutex to avoid adding devices to the same chipselect * @bus_lock_spinlock: spinlock for SPI bus locking * @bus_lock_mutex: mutex for exclusion of multiple callers * @bus_lock_flag: indicates that the SPI bus is locked for exclusive use @@ -361,6 +362,7 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch * @transfer: adds a message to the controller's transfer queue. * @cleanup: frees controller-specific state * @can_dma: determine whether this controller supports DMA + * @dma_map_dev: device which can be used for DMA mapping * @queued: whether this controller is providing an internal message queue * @kworker: pointer to thread struct for message pump * @pump_messages: work struct for scheduling work to the message pump -- cgit From 9824117dd964ecebf5d81990dbf21dfb56445049 Mon Sep 17 00:00:00 2001 From: Corey Minyard Date: Tue, 12 Apr 2022 15:38:51 -0500 Subject: ipmi: Add an intializer for ipmi_smi_msg struct There was a "type" element added to this structure, but some static values were missed. The default value will be zero, which is correct, but create an initializer for the type and initialize the type properly in the initializer to avoid future issues. Reported-by: Joe Wiese Signed-off-by: Corey Minyard --- include/linux/ipmi_smi.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ipmi_smi.h b/include/linux/ipmi_smi.h index 9277d21c2690..5d69820d8b02 100644 --- a/include/linux/ipmi_smi.h +++ b/include/linux/ipmi_smi.h @@ -125,6 +125,12 @@ struct ipmi_smi_msg { void (*done)(struct ipmi_smi_msg *msg); }; +#define INIT_IPMI_SMI_MSG(done_handler) \ +{ \ + .done = done_handler, \ + .type = IPMI_SMI_MSG_TYPE_NORMAL \ +} + struct ipmi_smi_handlers { struct module *owner; -- cgit From f214549d717310f795c20db9497db3938116399d Mon Sep 17 00:00:00 2001 From: Corey Minyard Date: Tue, 12 Apr 2022 15:49:47 -0500 Subject: ipmi: Add an intializer for ipmi_recv_msg struct Don't hand-initialize the struct here, create a macro to initialize it so new fields added don't get forgotten in places. Signed-off-by: Corey Minyard --- include/linux/ipmi.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ipmi.h b/include/linux/ipmi.h index 163831a087ef..a1c9c0d48ebf 100644 --- a/include/linux/ipmi.h +++ b/include/linux/ipmi.h @@ -72,6 +72,11 @@ struct ipmi_recv_msg { unsigned char msg_data[IPMI_MAX_MSG_LENGTH]; }; +#define INIT_IPMI_RECV_MSG(done_handler) \ +{ \ + .done = done_handler \ +} + /* Allocate and free the receive message. */ void ipmi_free_recv_msg(struct ipmi_recv_msg *msg); -- cgit From 80140a81f7f833998d732102eea0fea230b88067 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Wed, 27 Apr 2022 11:03:21 +0200 Subject: module.h: simplify MODULE_IMPORT_NS In commit ca321ec74322 ("module.h: allow #define strings to work with MODULE_IMPORT_NS") I fixed up the MODULE_IMPORT_NS() macro to allow defined strings to work with it. Unfortunatly I did it in a two-stage process, when it could just be done with the __stringify() macro as pointed out by Masahiro Yamada. Clean this up to only be one macro instead of two steps to achieve the same end result. Fixes: ca321ec74322 ("module.h: allow #define strings to work with MODULE_IMPORT_NS") Reported-by: Masahiro Yamada Cc: Luis Chamberlain Cc: Jessica Yu Cc: Matthias Maennich Signed-off-by: Greg Kroah-Hartman Signed-off-by: Luis Chamberlain --- include/linux/module.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/module.h b/include/linux/module.h index 46d4d5f2516e..abd9fa916b7d 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -290,8 +290,7 @@ extern typeof(name) __mod_##type##__##name##_device_table \ * files require multiple MODULE_FIRMWARE() specifiers */ #define MODULE_FIRMWARE(_firmware) MODULE_INFO(firmware, _firmware) -#define _MODULE_IMPORT_NS(ns) MODULE_INFO(import_ns, #ns) -#define MODULE_IMPORT_NS(ns) _MODULE_IMPORT_NS(ns) +#define MODULE_IMPORT_NS(ns) MODULE_INFO(import_ns, __stringify(ns)) struct notifier_block; -- cgit From 0df65743537dd6e1c8b0924714f14dc367cba2be Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 11 May 2022 10:23:05 -0700 Subject: skbuff: replace a BUG_ON() with the new DEBUG_NET_WARN_ON_ONCE() Very few drivers actually have Kconfig knobs for adding -DDEBUG. 8 according to a quick grep, while there are 93 users of skb_checksum_none_assert(). Switch to the new DEBUG_NET_WARN_ON_ONCE() to catch bad skbs. Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/20220511172305.1382810-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index b91d225fdc13..9d82a8b6c8f1 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -5048,9 +5048,7 @@ static inline void skb_forward_csum(struct sk_buff *skb) */ static inline void skb_checksum_none_assert(const struct sk_buff *skb) { -#ifdef DEBUG - BUG_ON(skb->ip_summed != CHECKSUM_NONE); -#endif + DEBUG_NET_WARN_ON_ONCE(skb->ip_summed != CHECKSUM_NONE); } bool skb_partial_csum_set(struct sk_buff *skb, u16 start, u16 off); -- cgit From 58e4a2d27d3255e4e8c507fdc13734dccc9fc4c7 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 17 Dec 2021 09:28:46 +0300 Subject: extcon: Fix extcon_get_extcon_dev() error handling The extcon_get_extcon_dev() function returns error pointers on error, NULL when it's a -EPROBE_DEFER defer situation, and ERR_PTR(-ENODEV) when the CONFIG_EXTCON option is disabled. This is very complicated for the callers to handle and a number of them had bugs that would lead to an Oops. In real life, there are two things which prevented crashes. First, error pointers would only be returned if there was bug in the caller where they passed a NULL "extcon_name" and none of them do that. Second, only two out of the eight drivers will build when CONFIG_EXTCON is disabled. The normal way to write this would be to return -EPROBE_DEFER directly when appropriate and return NULL when CONFIG_EXTCON is disabled. Then the error handling is simple and just looks like: dev->edev = extcon_get_extcon_dev(acpi_dev_name(adev)); if (IS_ERR(dev->edev)) return PTR_ERR(dev->edev); For the two drivers which can build with CONFIG_EXTCON disabled, then extcon_get_extcon_dev() will now return NULL which is not treated as an error and the probe will continue successfully. Those two drivers are "typec_fusb302" and "max8997-battery". In the original code, the typec_fusb302 driver had an 800ms hang in tcpm_get_current_limit() but now that function is a no-op. For the max8997-battery driver everything should continue working as is. Signed-off-by: Dan Carpenter Reviewed-by: Hans de Goede Reviewed-by: Heikki Krogerus Reviewed-by: Guenter Roeck Acked-by: Sebastian Reichel Signed-off-by: Chanwoo Choi --- include/linux/extcon.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/extcon.h b/include/linux/extcon.h index 0c19010da77f..685401d94d39 100644 --- a/include/linux/extcon.h +++ b/include/linux/extcon.h @@ -296,7 +296,7 @@ static inline void devm_extcon_unregister_notifier_all(struct device *dev, static inline struct extcon_dev *extcon_get_extcon_dev(const char *extcon_name) { - return ERR_PTR(-ENODEV); + return NULL; } static inline struct extcon_dev *extcon_find_edev_by_node(struct device_node *node) -- cgit From 3367aa7d74d240261de2543ddb35531ccad9d884 Mon Sep 17 00:00:00 2001 From: Javier Martinez Canillas Date: Wed, 11 May 2022 13:30:39 +0200 Subject: fbdev: Restart conflicting fb removal loop when unregistering devices Drivers that want to remove registered conflicting framebuffers prior to register their own framebuffer, call to remove_conflicting_framebuffers(). This function takes the registration_lock mutex, to prevent a race when drivers register framebuffer devices. But if a conflicting framebuffer device is found, the underlaying platform device is unregistered and this will lead to the platform driver .remove callback to be called. Which in turn will call to unregister_framebuffer() that takes the same lock. To prevent this, a struct fb_info.forced_out field was used as indication to unregister_framebuffer() whether the mutex has to be grabbed or not. But this could be unsafe, since the fbdev core is making assumptions about what drivers may or may not do in their .remove callbacks. Allowing to run these callbacks with the registration_lock held can cause deadlocks, since the fbdev core has no control over what drivers do in their removal path. A better solution is to drop the lock before platform_device_unregister(), so unregister_framebuffer() can take it when called from the fbdev driver. The lock is acquired again after the device has been unregistered and at this point the removal loop can be restarted. Since the conflicting framebuffer device has already been removed, the loop would just finish when no more conflicting framebuffers are found. Suggested-by: Daniel Vetter Signed-off-by: Javier Martinez Canillas Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220511113039.1252432-1-javierm@redhat.com --- include/linux/fb.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fb.h b/include/linux/fb.h index 69c67c70fa78..bbe1e4571899 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -511,7 +511,6 @@ struct fb_info { } *apertures; bool skip_vt_switch; /* no VT switch on suspend/resume required */ - bool forced_out; /* set when being removed by another driver */ }; static inline struct apertures_struct *alloc_apertures(unsigned int max_num) { -- cgit From 59fba9eed5a736caa257df4738c57ff72492365f Mon Sep 17 00:00:00 2001 From: Yunfei Dong Date: Thu, 12 May 2022 04:19:47 +0200 Subject: media: mediatek: vcodec: support stateless H.264 decoding for mt8192 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds h264 lat and core architecture driver for mt8192, and the decode mode is frame based for stateless decoder. Signed-off-by: Yunfei Dong Tested-by: Nícolas F. R. A. Prado Signed-off-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab --- include/linux/remoteproc/mtk_scp.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/remoteproc/mtk_scp.h b/include/linux/remoteproc/mtk_scp.h index b47416f7aeb8..7c2b7cc9fe6c 100644 --- a/include/linux/remoteproc/mtk_scp.h +++ b/include/linux/remoteproc/mtk_scp.h @@ -41,6 +41,8 @@ enum scp_ipi_id { SCP_IPI_ISP_FRAME, SCP_IPI_FD_CMD, SCP_IPI_CROS_HOST_CMD, + SCP_IPI_VDEC_LAT, + SCP_IPI_VDEC_CORE, SCP_IPI_NS_SERVICE = 0xFF, SCP_IPI_MAX = 0x100, }; -- cgit From ea661ad6e1573d5b08c27444ff2ed403bf39ff66 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Tue, 10 May 2022 10:34:03 +0800 Subject: iommu/vt-d: Size Page Request Queue to avoid overflow condition PRQ overflow may cause I/O throughput congestion, resulting in unnecessary degradation of I/O performance. Appropriately increasing the length of PRQ can greatly reduce the occurrence of PRQ overflow. The count of maximum page requests that can be generated in parallel by a PCIe device is statically defined in the Outstanding Page Request Capacity field of the PCIe ATS configure space. The new length of PRQ is calculated by summing up the value of Outstanding Page Request Capacity register across all devices where Page Requests are supported on the real PR-capable platform (Intel Sapphire Rapids). The result is round to the nearest higher power of 2. The PRQ length is also double sized as the VT-d IOMMU driver only updates the Page Request Queue Head Register (PQH_REG) after processing the entire queue. Signed-off-by: Lu Baolu Reviewed-by: Kevin Tian Link: https://lore.kernel.org/r/20220421113558.3504874-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20220510023407.2759143-5-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/intel-svm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/intel-svm.h b/include/linux/intel-svm.h index b3b125b332aa..207ef06ba3e1 100644 --- a/include/linux/intel-svm.h +++ b/include/linux/intel-svm.h @@ -9,7 +9,7 @@ #define __INTEL_SVM_H__ /* Page Request Queue depth */ -#define PRQ_ORDER 2 +#define PRQ_ORDER 4 #define PRQ_RING_MASK ((0x1000 << PRQ_ORDER) - 0x20) #define PRQ_DEPTH ((0x1000 << PRQ_ORDER) >> 5) -- cgit From fc0051cb95909ab56bd8c929f24d48c9870c3e3a Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Tue, 10 May 2022 10:34:05 +0800 Subject: iommu/vt-d: Check domain force_snooping against attached devices As domain->force_snooping only impacts the devices attached with the domain, there's no need to check against all IOMMU units. On the other hand, force_snooping could be set on a domain no matter whether it has been attached or not, and once set it is an immutable flag. If no device attached, the operation always succeeds. Then this empty domain can be only attached to a device of which the IOMMU supports snoop control. Signed-off-by: Lu Baolu Reviewed-by: Kevin Tian Link: https://lore.kernel.org/r/20220508123525.1973626-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20220510023407.2759143-7-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/intel-iommu.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 72e5d7900e71..4f29139bbfc3 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -540,6 +540,7 @@ struct dmar_domain { u8 has_iotlb_device: 1; u8 iommu_coherency: 1; /* indicate coherency of iommu access */ u8 force_snooping : 1; /* Create IOPTEs with snoop control */ + u8 set_pte_snp:1; struct list_head devices; /* all devices' list */ struct iova_domain iovad; /* iova's that belong to this domain */ -- cgit From 4a18419f71cdf9155d2d2a6c79546f720978b990 Mon Sep 17 00:00:00 2001 From: Nadav Amit Date: Mon, 9 May 2022 18:20:50 -0700 Subject: mm/mprotect: use mmu_gather Patch series "mm/mprotect: avoid unnecessary TLB flushes", v6. This patchset is intended to remove unnecessary TLB flushes during mprotect() syscalls. Once this patch-set make it through, similar and further optimizations for MADV_COLD and userfaultfd would be possible. Basically, there are 3 optimizations in this patch-set: 1. Use TLB batching infrastructure to batch flushes across VMAs and do better/fewer flushes. This would also be handy for later userfaultfd enhancements. 2. Avoid unnecessary TLB flushes. This optimization is the one that provides most of the performance benefits. Unlike previous versions, we now only avoid flushes that would not result in spurious page-faults. 3. Avoiding TLB flushes on change_huge_pmd() that are only needed to prevent the A/D bits from changing. Andrew asked for some benchmark numbers. I do not have an easy determinate macrobenchmark in which it is easy to show benefit. I therefore ran a microbenchmark: a loop that does the following on anonymous memory, just as a sanity check to see that time is saved by avoiding TLB flushes. The loop goes: mprotect(p, PAGE_SIZE, PROT_READ) mprotect(p, PAGE_SIZE, PROT_READ|PROT_WRITE) *p = 0; // make the page writable The test was run in KVM guest with 1 or 2 threads (the second thread was busy-looping). I measured the time (cycles) of each operation: 1 thread 2 threads mmots +patch mmots +patch PROT_READ 3494 2725 (-22%) 8630 7788 (-10%) PROT_READ|WRITE 3952 2724 (-31%) 9075 2865 (-68%) [ mmots = v5.17-rc6-mmots-2022-03-06-20-38 ] The exact numbers are really meaningless, but the benefit is clear. There are 2 interesting results though. (1) PROT_READ is cheaper, while one can expect it not to be affected. This is presumably due to TLB miss that is saved (2) Without memory access (*p = 0), the speedup of the patch is even greater. In that scenario mprotect(PROT_READ) also avoids the TLB flush. As a result both operations on the patched kernel take roughly ~1500 cycles (with either 1 or 2 threads), whereas on mmotm their cost is as high as presented in the table. This patch (of 3): change_pXX_range() currently does not use mmu_gather, but instead implements its own deferred TLB flushes scheme. This both complicates the code, as developers need to be aware of different invalidation schemes, and prevents opportunities to avoid TLB flushes or perform them in finer granularity. The use of mmu_gather for modified PTEs has benefits in various scenarios even if pages are not released. For instance, if only a single page needs to be flushed out of a range of many pages, only that page would be flushed. If a THP page is flushed, on x86 a single TLB invlpg instruction can be used instead of 512 instructions (or a full TLB flush, which would Linux would actually use by default). mprotect() over multiple VMAs requires a single flush. Use mmu_gather in change_pXX_range(). As the pages are not released, only record the flushed range using tlb_flush_pXX_range(). Handle THP similarly and get rid of flush_cache_range() which becomes redundant since tlb_start_vma() calls it when needed. Link: https://lkml.kernel.org/r/20220401180821.1986781-1-namit@vmware.com Link: https://lkml.kernel.org/r/20220401180821.1986781-2-namit@vmware.com Signed-off-by: Nadav Amit Acked-by: Peter Zijlstra (Intel) Cc: Andrea Arcangeli Cc: Andrew Cooper Cc: Andy Lutomirski Cc: Dave Hansen Cc: Peter Xu Cc: Thomas Gleixner Cc: Will Deacon Cc: Yu Zhao Cc: Nick Piggin Signed-off-by: Andrew Morton --- include/linux/huge_mm.h | 5 +++-- include/linux/mm.h | 5 +++-- 2 files changed, 6 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 2999190adc22..9a26bd10e083 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -36,8 +36,9 @@ int zap_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, pud_t *pud, unsigned long addr); bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr, unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd); -int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, - pgprot_t newprot, unsigned long cp_flags); +int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma, + pmd_t *pmd, unsigned long addr, pgprot_t newprot, + unsigned long cp_flags); vm_fault_t vmf_insert_pfn_pmd_prot(struct vm_fault *vmf, pfn_t pfn, pgprot_t pgprot, bool write); diff --git a/include/linux/mm.h b/include/linux/mm.h index 5dc5005ccc9b..d63ba0e0e068 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1967,10 +1967,11 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma, #define MM_CP_UFFD_WP_ALL (MM_CP_UFFD_WP | \ MM_CP_UFFD_WP_RESOLVE) -extern unsigned long change_protection(struct vm_area_struct *vma, unsigned long start, +extern unsigned long change_protection(struct mmu_gather *tlb, + struct vm_area_struct *vma, unsigned long start, unsigned long end, pgprot_t newprot, unsigned long cp_flags); -extern int mprotect_fixup(struct vm_area_struct *vma, +extern int mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma, struct vm_area_struct **pprev, unsigned long start, unsigned long end, unsigned long newflags); -- cgit From 4f83145721f362c2f4d312edc4755269a2069488 Mon Sep 17 00:00:00 2001 From: Nadav Amit Date: Mon, 9 May 2022 18:20:50 -0700 Subject: mm: avoid unnecessary flush on change_huge_pmd() Calls to change_protection_range() on THP can trigger, at least on x86, two TLB flushes for one page: one immediately, when pmdp_invalidate() is called by change_huge_pmd(), and then another one later (that can be batched) when change_protection_range() finishes. The first TLB flush is only necessary to prevent the dirty bit (and with a lesser importance the access bit) from changing while the PTE is modified. However, this is not necessary as the x86 CPUs set the dirty-bit atomically with an additional check that the PTE is (still) present. One caveat is Intel's Knights Landing that has a bug and does not do so. Leverage this behavior to eliminate the unnecessary TLB flush in change_huge_pmd(). Introduce a new arch specific pmdp_invalidate_ad() that only invalidates the access and dirty bit from further changes. Link: https://lkml.kernel.org/r/20220401180821.1986781-4-namit@vmware.com Signed-off-by: Nadav Amit Cc: Andrea Arcangeli Cc: Andrew Cooper Cc: Andy Lutomirski Cc: Dave Hansen Cc: Peter Xu Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Will Deacon Cc: Yu Zhao Cc: Nick Piggin Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 53750224e176..530b1817b58c 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -570,6 +570,26 @@ extern pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp); #endif +#ifndef __HAVE_ARCH_PMDP_INVALIDATE_AD + +/* + * pmdp_invalidate_ad() invalidates the PMD while changing a transparent + * hugepage mapping in the page tables. This function is similar to + * pmdp_invalidate(), but should only be used if the access and dirty bits would + * not be cleared by the software in the new PMD value. The function ensures + * that hardware changes of the access and dirty bits updates would not be lost. + * + * Doing so can allow in certain architectures to avoid a TLB flush in most + * cases. Yet, another TLB flush might be necessary later if the PMD update + * itself requires such flush (e.g., if protection was set to be stricter). Yet, + * even when a TLB flush is needed because of the update, the caller may be able + * to batch these TLB flushing operations, so fewer TLB flush operations are + * needed. + */ +extern pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, + unsigned long address, pmd_t *pmdp); +#endif + #ifndef __HAVE_ARCH_PTE_SAME static inline int pte_same(pte_t pte_a, pte_t pte_b) { -- cgit From f38adfef7e6bfe681d6602b6668be31fe1310ed0 Mon Sep 17 00:00:00 2001 From: "Fabio M. De Francesco" Date: Mon, 9 May 2022 18:20:51 -0700 Subject: mm/highmem: VM_BUG_ON() if offset + len > PAGE_SIZE Add VM_BUG_ON() bounds checking to make sure that, if "offset + len> PAGE_SIZE", memset() does not corrupt data in adjacent pages. Mainly to match all the similar functions in highmem.h. Link: https://lkml.kernel.org/r/20220426193020.8710-1-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco Reviewed-by: Andrew Morton Cc: Ira Weiny Cc: Catalin Marinas Cc: "Matthew Wilcox (Oracle)" Cc: Peter Collingbourne Signed-off-by: Andrew Morton --- include/linux/highmem.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/highmem.h b/include/linux/highmem.h index 39bb9b47fa9c..67720bfd6ade 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -358,6 +358,8 @@ static inline void memcpy_to_page(struct page *page, size_t offset, static inline void memzero_page(struct page *page, size_t offset, size_t len) { char *addr = kmap_local_page(page); + + VM_BUG_ON(offset + len > PAGE_SIZE); memset(addr + offset, 0, len); flush_dcache_page(page); kunmap_local(addr); -- cgit From 152e56178ad72037bc6a2f08d4a608543bc67cdf Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Mon, 9 May 2022 18:20:51 -0700 Subject: mm/damon/core: add a function for damon_operations registration checks Patch series "mm/damon: allow users know which monitoring ops are available". DAMON users can configure it for vaious address spaces including virtual address spaces and the physical address space by setting its monitoring operations set with appropriate one for their purpose. However, there is no celan and simple way to know exactly which monitoring operations sets are available on the currently running kernel. This patchset adds functions for the purpose on DAMON's kernel API ('damon_is_registered_ops()') and sysfs interface ('avail_operations' file under each context directory). This patch (of 4): To know if a specific 'damon_operations' is registered, users need to check the kernel config or try 'damon_select_ops()' with the ops of the question, and then see if it successes. In the latter case, the user should also revert the change. To make the process simple and convenient, this commit adds a function for checking if a specific 'damon_operations' is registered or not. Link: https://lkml.kernel.org/r/20220426203843.45238-1-sj@kernel.org Link: https://lkml.kernel.org/r/20220426203843.45238-2-sj@kernel.org Signed-off-by: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/damon.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index f23cbfa4248d..73ff0e2d2a4d 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -509,6 +509,7 @@ int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, int damon_set_schemes(struct damon_ctx *ctx, struct damos **schemes, ssize_t nr_schemes); int damon_nr_running_ctxs(void); +bool damon_is_registered_ops(enum damon_ops_id id); int damon_register_ops(struct damon_operations *ops); int damon_select_ops(struct damon_ctx *ctx, enum damon_ops_id id); -- cgit From de6d01542a5c4f6fe6f0c65c14694b760f896acc Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Mon, 9 May 2022 18:20:52 -0700 Subject: mm/damon/vaddr: register a damon_operations for fixed virtual address ranges monitoring Patch series "support fixed virtual address ranges monitoring". The monitoring operations set for virtual address spaces automatically updates the monitoring target regions to cover entire mappings of the virtual address spaces as much as possible. Some users could have more information about their programs than kernel and therefore have interest in not entire regions but only specific regions. For such cases, the automatic monitoring target regions updates are only unnecessary overhead or distractions. This patchset adds supports for the use case on DAMON's kernel API (DAMON_OPS_FVADDR) and sysfs interface ('fvaddr' keyword for 'operations' sysfs file). This patch (of 3): The monitoring operations set for virtual address spaces automatically updates the monitoring target regions to cover entire mappings of the virtual address spaces as much as possible. Some users could have more information about their programs than kernel and therefore have interest in not entire regions but only specific regions. For such cases, the automatic monitoring target regions updates are only unnecessary overheads or distractions. For such cases, DAMON's API users can simply set the '->init()' and '->update()' of the DAMON context's '->ops' NULL, and set the target monitoring regions when creating the context. But, that would be a dirty hack. Worse yet, the hack is unavailable for DAMON user space interface users. To support the use case in a clean way that can easily exported to the user space, this commit adds another monitoring operations set called 'fvaddr', which is same to 'vaddr' but does not automatically update the monitoring regions. Instead, it will only respect the virtual address regions which have explicitly passed at the initial context creation. Note that this commit leave sysfs interface not supporting the feature yet. The support will be made in a following commit. Link: https://lkml.kernel.org/r/20220426231750.48822-1-sj@kernel.org Link: https://lkml.kernel.org/r/20220426231750.48822-2-sj@kernel.org Signed-off-by: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/damon.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 73ff0e2d2a4d..09a5d0d02c00 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -261,10 +261,13 @@ struct damos { * enum damon_ops_id - Identifier for each monitoring operations implementation * * @DAMON_OPS_VADDR: Monitoring operations for virtual address spaces + * @DAMON_OPS_FVADDR: Monitoring operations for only fixed ranges of virtual + * address spaces * @DAMON_OPS_PADDR: Monitoring operations for the physical address space */ enum damon_ops_id { DAMON_OPS_VADDR, + DAMON_OPS_FVADDR, DAMON_OPS_PADDR, NR_DAMON_OPS, }; -- cgit From 534aa1dc975ac883ad89110534585a96630802a0 Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Mon, 9 May 2022 18:20:53 -0700 Subject: printk: stop including cache.h from printk.h An inclusion of cache.h in printk.h was added in 2014 in commit c28aa1f0a847 ("printk/cache: mark printk_once test variable __read_mostly") in order to bring in the definition of __read_mostly. The usage of __read_mostly was later removed in commit 3ec25826ae33 ("printk: Tie printk_once / printk_deferred_once into .data.once for reset") which made the inclusion of cache.h unnecessary, so remove it. We have a small amount of code that depended on the inclusion of cache.h from printk.h; fix that code to include the appropriate header. This fixes a circular inclusion on arm64 (linux/printk.h -> linux/cache.h -> asm/cache.h -> linux/kasan-enabled.h -> linux/static_key.h -> linux/jump_label.h -> linux/bug.h -> asm/bug.h -> linux/printk.h) that would otherwise be introduced by the next patch. Build tested using {allyesconfig,defconfig} x {arm64,x86_64}. Link: https://linux-review.googlesource.com/id/I8fd51f72c9ef1f2d6afd3b2cbc875aa4792c1fba Link: https://lkml.kernel.org/r/20220427195820.1716975-1-pcc@google.com Signed-off-by: Peter Collingbourne Cc: Alexander Potapenko Cc: Andrey Konovalov Cc: Andrey Ryabinin Cc: Catalin Marinas Cc: David Rientjes Cc: Dmitry Vyukov Cc: Eric W. Biederman Cc: Herbert Xu Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim Cc: Kees Cook Cc: Pekka Enberg Cc: Roman Gushchin Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/printk.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/printk.h b/include/linux/printk.h index 1522df223c0f..8e8d74edf121 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -6,7 +6,6 @@ #include #include #include -#include #include #include -- cgit From d949a8155d139aa890795b802004a196b7f00598 Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Mon, 9 May 2022 18:20:53 -0700 Subject: mm: make minimum slab alignment a runtime property When CONFIG_KASAN_HW_TAGS is enabled we currently increase the minimum slab alignment to 16. This happens even if MTE is not supported in hardware or disabled via kasan=off, which creates an unnecessary memory overhead in those cases. Eliminate this overhead by making the minimum slab alignment a runtime property and only aligning to 16 if KASAN is enabled at runtime. On a DragonBoard 845c (non-MTE hardware) with a kernel built with CONFIG_KASAN_HW_TAGS, waiting for quiescence after a full Android boot I see the following Slab measurements in /proc/meminfo (median of 3 reboots): Before: 169020 kB After: 167304 kB [akpm@linux-foundation.org: make slab alignment type `unsigned int' to avoid casting] Link: https://linux-review.googlesource.com/id/I752e725179b43b144153f4b6f584ceb646473ead Link: https://lkml.kernel.org/r/20220427195820.1716975-2-pcc@google.com Signed-off-by: Peter Collingbourne Reviewed-by: Andrey Konovalov Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by: David Rientjes Reviewed-by: Catalin Marinas Acked-by: Vlastimil Babka Cc: Pekka Enberg Cc: Roman Gushchin Cc: Joonsoo Kim Cc: Herbert Xu Cc: Andrey Ryabinin Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: Eric W. Biederman Cc: Kees Cook Signed-off-by: Andrew Morton --- include/linux/slab.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 373b3ef99f4e..3d2f2a3ca17e 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -209,6 +209,18 @@ void kmem_dump_obj(void *object); #define ARCH_SLAB_MINALIGN __alignof__(unsigned long long) #endif +/* + * Arches can define this function if they want to decide the minimum slab + * alignment at runtime. The value returned by the function must be a power + * of two and >= ARCH_SLAB_MINALIGN. + */ +#ifndef arch_slab_minalign +static inline unsigned int arch_slab_minalign(void) +{ + return ARCH_SLAB_MINALIGN; +} +#endif + /* * kmalloc and friends return ARCH_KMALLOC_MINALIGN aligned * pointers. kmem_cache_alloc and friends return ARCH_SLAB_MINALIGN -- cgit From b304c6f0d39d927a87e72a8ac6c89b96ac25f355 Mon Sep 17 00:00:00 2001 From: Hongchen Zhang Date: Mon, 9 May 2022 18:20:53 -0700 Subject: mm/swapops: make is_pmd_migration_entry more strict A pmd migration entry should first be a swap pmd,so use is_swap_pmd(pmd) instead of !pmd_present(pmd). On the other hand, some architecture (MIPS for example) may misjudge a pmd_none entry as a pmd migration entry. Link: https://lkml.kernel.org/r/1651131333-6386-1-git-send-email-zhanghongchen@loongson.cn Signed-off-by: Hongchen Zhang Acked-by: Peter Xu Cc: Alistair Popple Cc: Ralph Campbell Cc: Naoya Horiguchi Cc: Hugh Dickins Signed-off-by: Andrew Morton --- include/linux/swapops.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swapops.h b/include/linux/swapops.h index d1b728904d4e..82265b055a0d 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -331,7 +331,7 @@ static inline pmd_t swp_entry_to_pmd(swp_entry_t entry) static inline int is_pmd_migration_entry(pmd_t pmd) { - return !pmd_present(pmd) && is_migration_entry(pmd_to_swp_entry(pmd)); + return is_swap_pmd(pmd) && is_migration_entry(pmd_to_swp_entry(pmd)); } #else static inline int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, -- cgit From 6e74d2bf5a265113ca54a8323783d2f3fdde96b7 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Mon, 9 May 2022 18:20:54 -0700 Subject: mm/damon/core: add a new callback for watermarks checks Patch series "mm/damon: Support online tuning". Effects of DAMON and DAMON-based Operation Schemes highly depends on the configurations. Wrong configurations could even result in unexpected efficiency degradations. For finding a best configuration, repeating incremental configuration changes and results measurements, in other words, online tuning, could be helpful. Nevertheless, DAMON kernel API supports only restrictive online tuning. Worse yet, the sysfs-based DAMON user interface doesn't support online tuning at all. DAMON_RECLAIM also doesn't support online tuning. This patchset makes the DAMON kernel API, DAMON sysfs interface, and DAMON_RECLAIM supports online tuning. Sequence of patches ------------------- First two patches enhance DAMON online tuning for kernel API users. Specifically, patch 1 let kernel API users to be able to do DAMON online tuning without a restriction, and patch 2 makes error handling easier. Following seven patches (patches 3-9) refactor code for better readability and easier reuse of code fragments that will be useful for online tuning support. Patch 10 introduces DAMON callback based user request handling structure for DAMON sysfs interface, and patch 11 enables DAMON online tuning via DAMON sysfs interface. Documentation patch (patch 12) for usage of it follows. Patch 13 enables online tuning of DAMON_RECLAIM and finally patch 14 documents the DAMON_RECLAIM online tuning usage. This patch (of 14): For updating input parameters for running DAMON contexts, DAMON kernel API users can use the contexts' callbacks, as it is the safe place for context internal data accesses. When the context has DAMON-based operation schemes and all schemes are deactivated due to their watermarks, however, DAMON does nothing but only watermarks checks. As a result, no callbacks will be called back, and therefore the kernel API users cannot update the input parameters including monitoring attributes, DAMON-based operation schemes, and watermarks. To let users easily update such DAMON input parameters in such a case, this commit adds a new callback, 'after_wmarks_check()'. It will be called after each watermarks check. Users can do the online input parameters update in the callback even under the schemes deactivated case. Link: https://lkml.kernel.org/r/20220429160606.127307-2-sj@kernel.org Signed-off-by: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/damon.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 09a5d0d02c00..6cb5ab5d8e9d 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -343,6 +343,7 @@ struct damon_operations { * struct damon_callback - Monitoring events notification callbacks. * * @before_start: Called before starting the monitoring. + * @after_wmarks_check: Called after each schemes' watermarks check. * @after_sampling: Called after each sampling. * @after_aggregation: Called after each aggregation. * @before_terminate: Called before terminating the monitoring. @@ -353,6 +354,11 @@ struct damon_operations { * respectively. Therefore, those are good places for installing and cleaning * @private. * + * The monitoring thread calls @after_wmarks_check after each DAMON-based + * operation schemes' watermarks check. If users need to make changes to the + * attributes of the monitoring context while it's deactivated due to the + * watermarks, this is the good place to do. + * * The monitoring thread calls @after_sampling and @after_aggregation for each * of the sampling intervals and aggregation intervals, respectively. * Therefore, users can safely access the monitoring results without additional @@ -365,6 +371,7 @@ struct damon_callback { void *private; int (*before_start)(struct damon_ctx *context); + int (*after_wmarks_check)(struct damon_ctx *context); int (*after_sampling)(struct damon_ctx *context); int (*after_aggregation)(struct damon_ctx *context); void (*before_terminate)(struct damon_ctx *context); -- cgit From d0723bc04185b15ed96ade0a0bf9277ef00008db Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Mon, 9 May 2022 18:20:55 -0700 Subject: mm/damon/vaddr: move 'damon_set_regions()' to core This commit moves 'damon_set_regions()' from vaddr to core, as it is aimed to be used by not only 'vaddr' but also other parts of DAMON. Link: https://lkml.kernel.org/r/20220429160606.127307-5-sj@kernel.org Signed-off-by: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/damon.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 6cb5ab5d8e9d..d1e6ee28a2ff 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -494,6 +494,8 @@ static inline void damon_insert_region(struct damon_region *r, void damon_add_region(struct damon_region *r, struct damon_target *t); void damon_destroy_region(struct damon_region *r, struct damon_target *t); +int damon_set_regions(struct damon_target *t, struct damon_addr_range *ranges, + unsigned int nr_ranges); struct damos *damon_new_scheme( unsigned long min_sz_region, unsigned long max_sz_region, -- cgit From 679d103319101800567c8bb7e341b5eee39f6685 Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 12 May 2022 20:22:52 -0700 Subject: mm: introduce PTE_MARKER swap entry Patch series "userfaultfd-wp: Support shmem and hugetlbfs", v8. Overview ======== Userfaultfd-wp anonymous support was merged two years ago. There're quite a few applications that started to leverage this capability either to take snapshots for user-app memory, or use it for full user controled swapping. This series tries to complete the feature for uffd-wp so as to cover all the RAM-based memory types. So far uffd-wp is the only missing piece of the rest features (uffd-missing & uffd-minor mode). One major reason to do so is that anonymous pages are sometimes not satisfying the need of applications, and there're growing users of either shmem and hugetlbfs for either sharing purpose (e.g., sharing guest mem between hypervisor process and device emulation process, shmem local live migration for upgrades), or for performance on tlb hits. All these mean that if a uffd-wp app wants to switch to any of the memory types, it'll stop working. I think it's worthwhile to have the kernel to cover all these aspects. This series chose to protect pages in pte level not page level. One major reason is safety. I have no idea how we could make it safe if any of the uffd-privileged app can wr-protect a page that any other application can use. It means this app can block any process potentially for any time it wants. The other reason is that it aligns very well with not only the anonymous uffd-wp solution, but also uffd as a whole. For example, userfaultfd is implemented fundamentally based on VMAs. We set flags to VMAs showing the status of uffd tracking. For another per-page based protection solution, it'll be crossing the fundation line on VMA-based, and it could simply be too far away already from what's called userfaultfd. PTE markers =========== The patchset is based on the idea called PTE markers. It was discussed in one of the mm alignment sessions, proposed starting from v6, and this is the 2nd version of it using PTE marker idea. PTE marker is a new type of swap entry that is ony applicable to file backed memories like shmem and hugetlbfs. It's used to persist some pte-level information even if the original present ptes in pgtable are zapped. Logically pte markers can store more than uffd-wp information, but so far only one bit is used for uffd-wp purpose. When the pte marker is installed with uffd-wp bit set, it means this pte is wr-protected by uffd. It solves the problem on e.g. file-backed memory mapped ptes got zapped due to any reason (e.g. thp split, or swapped out), we can still keep the wr-protect information in the ptes. Then when the page fault triggers again, we'll know this pte is wr-protected so we can treat the pte the same as a normal uffd wr-protected pte. The extra information is encoded into the swap entry, or swp_offset to be explicit, with the swp_type being PTE_MARKER. So far uffd-wp only uses one bit out of the swap entry, the rest bits of swp_offset are still reserved for other purposes. There're two configs to enable/disable PTE markers: CONFIG_PTE_MARKER CONFIG_PTE_MARKER_UFFD_WP We can set !PTE_MARKER to completely disable all the PTE markers, along with uffd-wp support. I made two config so we can also enable PTE marker but disable uffd-wp file-backed for other purposes. At the end of current series, I'll enable CONFIG_PTE_MARKER by default, but that patch is standalone and if anyone worries about having it by default, we can also consider turn it off by dropping that oneliner patch. So far I don't see a huge risk of doing so, so I kept that patch. In most cases, PTE markers should be treated as none ptes. It is because that unlike most of the other swap entry types, there's no PFN or block offset information encoded into PTE markers but some extra well-defined bits showing the status of the pte. These bits should only be used as extra data when servicing an upcoming page fault, and then we behave as if it's a none pte. I did spend a lot of time observing all the pte_none() users this time. It is indeed a challenge because there're a lot, and I hope I didn't miss a single of them when we should take care of pte markers. Luckily, I don't think it'll need to be considered in many cases, for example: boot code, arch code (especially non-x86), kernel-only page handlings (e.g. CPA), or device driver codes when we're tackling with pure PFN mappings. I introduced pte_none_mostly() in this series when we need to handle pte markers the same as none pte, the "mostly" is the other way to write "either none pte or a pte marker". I didn't replace pte_none() to cover pte markers for below reasons: - Very rare case of pte_none() callers will handle pte markers. E.g., all the kernel pages do not require knowledge of pte markers. So we don't pollute the major use cases. - Unconditionally change pte_none() semantics could confuse people, because pte_none() existed for so long a time. - Unconditionally change pte_none() semantics could make pte_none() slower even if in many cases pte markers do not exist. - There're cases where we'd like to handle pte markers differntly from pte_none(), so a full replace is also impossible. E.g. khugepaged should still treat pte markers as normal swap ptes rather than none ptes, because pte markers will always need a fault-in to merge the marker with a valid pte. Or the smap code will need to parse PTE markers not none ptes. Patch Layout ============ Introducing PTE marker and uffd-wp bit in PTE marker: mm: Introduce PTE_MARKER swap entry mm: Teach core mm about pte markers mm: Check against orig_pte for finish_fault() mm/uffd: PTE_MARKER_UFFD_WP Adding support for shmem uffd-wp: mm/shmem: Take care of UFFDIO_COPY_MODE_WP mm/shmem: Handle uffd-wp special pte in page fault handler mm/shmem: Persist uffd-wp bit across zapping for file-backed mm/shmem: Allow uffd wr-protect none pte for file-backed mem mm/shmem: Allows file-back mem to be uffd wr-protected on thps mm/shmem: Handle uffd-wp during fork() Adding support for hugetlbfs uffd-wp: mm/hugetlb: Introduce huge pte version of uffd-wp helpers mm/hugetlb: Hook page faults for uffd write protection mm/hugetlb: Take care of UFFDIO_COPY_MODE_WP mm/hugetlb: Handle UFFDIO_WRITEPROTECT mm/hugetlb: Handle pte markers in page faults mm/hugetlb: Allow uffd wr-protect none ptes mm/hugetlb: Only drop uffd-wp special pte if required mm/hugetlb: Handle uffd-wp during fork() Misc handling on the rest mm for uffd-wp file-backed: mm/khugepaged: Don't recycle vma pgtable if uffd-wp registered mm/pagemap: Recognize uffd-wp bit for shmem/hugetlbfs Enabling of uffd-wp on file-backed memory: mm/uffd: Enable write protection for shmem & hugetlbfs mm: Enable PTE markers by default selftests/uffd: Enable uffd-wp for shmem/hugetlbfs Tests ===== - Compile test on x86_64 and aarch64 on different configs - Kernel selftests - uffd-test [0] - Umapsort [1,2] test for shmem/hugetlb, with swap on/off [0] https://github.com/xzpeter/clibs/tree/master/uffd-test [1] https://github.com/xzpeter/umap-apps/tree/peter [2] https://github.com/xzpeter/umap/tree/peter-shmem-hugetlbfs This patch (of 23): Introduces a new swap entry type called PTE_MARKER. It can be installed for any pte that maps a file-backed memory when the pte is temporarily zapped, so as to maintain per-pte information. The information that kept in the pte is called a "marker". Here we define the marker as "unsigned long" just to match pgoff_t, however it will only work if it still fits in swp_offset(), which is e.g. currently 58 bits on x86_64. A new config CONFIG_PTE_MARKER is introduced too; it's by default off. A bunch of helpers are defined altogether to service the rest of the pte marker code. [peterx@redhat.com: fixup] Link: https://lkml.kernel.org/r/Yk2rdB7SXZf+2BDF@xz-m1.local Link: https://lkml.kernel.org/r/20220405014646.13522-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20220405014646.13522-2-peterx@redhat.com Signed-off-by: Peter Xu Cc: Mike Kravetz Cc: David Hildenbrand Cc: Matthew Wilcox Cc: Alistair Popple Cc: Nadav Amit Cc: Axel Rasmussen Cc: Andrea Arcangeli Cc: "Kirill A . Shutemov" Cc: Hugh Dickins Cc: Jerome Glisse Cc: Mike Rapoport Signed-off-by: Andrew Morton --- include/linux/swap.h | 15 +++++++++- include/linux/swapops.h | 78 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 92 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 7daae5a4b3e1..5553189d0215 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -55,6 +55,19 @@ static inline int current_is_kswapd(void) * actions on faults. */ +/* + * PTE markers are used to persist information onto PTEs that are mapped with + * file-backed memories. As its name "PTE" hints, it should only be applied to + * the leaves of pgtables. + */ +#ifdef CONFIG_PTE_MARKER +#define SWP_PTE_MARKER_NUM 1 +#define SWP_PTE_MARKER (MAX_SWAPFILES + SWP_HWPOISON_NUM + \ + SWP_MIGRATION_NUM + SWP_DEVICE_NUM) +#else +#define SWP_PTE_MARKER_NUM 0 +#endif + /* * Unaddressable device memory support. See include/linux/hmm.h and * Documentation/vm/hmm.rst. Short description is we need struct pages for @@ -107,7 +120,7 @@ static inline int current_is_kswapd(void) #define MAX_SWAPFILES \ ((1 << MAX_SWAPFILES_SHIFT) - SWP_DEVICE_NUM - \ - SWP_MIGRATION_NUM - SWP_HWPOISON_NUM) + SWP_MIGRATION_NUM - SWP_HWPOISON_NUM - SWP_PTE_MARKER_NUM) /* * Magic header for a swap area. The first part of the union is diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 82265b055a0d..bb70da461b7d 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -274,6 +274,84 @@ static inline int is_readable_migration_entry(swp_entry_t entry) #endif +typedef unsigned long pte_marker; + +#define PTE_MARKER_MASK (0) + +#ifdef CONFIG_PTE_MARKER + +static inline swp_entry_t make_pte_marker_entry(pte_marker marker) +{ + return swp_entry(SWP_PTE_MARKER, marker); +} + +static inline bool is_pte_marker_entry(swp_entry_t entry) +{ + return swp_type(entry) == SWP_PTE_MARKER; +} + +static inline pte_marker pte_marker_get(swp_entry_t entry) +{ + return swp_offset(entry) & PTE_MARKER_MASK; +} + +static inline bool is_pte_marker(pte_t pte) +{ + return is_swap_pte(pte) && is_pte_marker_entry(pte_to_swp_entry(pte)); +} + +#else /* CONFIG_PTE_MARKER */ + +static inline swp_entry_t make_pte_marker_entry(pte_marker marker) +{ + /* This should never be called if !CONFIG_PTE_MARKER */ + WARN_ON_ONCE(1); + return swp_entry(0, 0); +} + +static inline bool is_pte_marker_entry(swp_entry_t entry) +{ + return false; +} + +static inline pte_marker pte_marker_get(swp_entry_t entry) +{ + return 0; +} + +static inline bool is_pte_marker(pte_t pte) +{ + return false; +} + +#endif /* CONFIG_PTE_MARKER */ + +static inline pte_t make_pte_marker(pte_marker marker) +{ + return swp_entry_to_pte(make_pte_marker_entry(marker)); +} + +/* + * This is a special version to check pte_none() just to cover the case when + * the pte is a pte marker. It existed because in many cases the pte marker + * should be seen as a none pte; it's just that we have stored some information + * onto the none pte so it becomes not-none any more. + * + * It should be used when the pte is file-backed, ram-based and backing + * userspace pages, like shmem. It is not needed upon pgtables that do not + * support pte markers at all. For example, it's not needed on anonymous + * memory, kernel-only memory (including when the system is during-boot), + * non-ram based generic file-system. It's fine to be used even there, but the + * extra pte marker check will be pure overhead. + * + * For systems configured with !CONFIG_PTE_MARKER this will be automatically + * optimized to pte_none(). + */ +static inline int pte_none_mostly(pte_t pte) +{ + return pte_none(pte) || is_pte_marker(pte); +} + static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry) { struct page *p = pfn_to_page(swp_offset(entry)); -- cgit From f46f2adecdcc1ba0799383e67fe98f65f41fea5c Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 12 May 2022 20:22:52 -0700 Subject: mm: check against orig_pte for finish_fault() This patch allows do_fault() to trigger on !pte_none() cases too. This prepares for the pte markers to be handled by do_fault() just like none pte. To achieve this, instead of unconditionally check against pte_none() in finish_fault(), we may hit the case that the orig_pte was some pte marker so what we want to do is to replace the pte marker with some valid pte entry. Then if orig_pte was set we'd want to check the current *pte (under pgtable lock) against orig_pte rather than none pte. Right now there's no solid way to safely reference orig_pte because when pmd is not allocated handle_pte_fault() will not initialize orig_pte, so it's not safe to reference it. There's another solution proposed before this patch to do pte_clear() for vmf->orig_pte for pmd==NULL case, however it turns out it'll break arm32 because arm32 could have assumption that pte_t* pointer will always reside on a real ram32 pgtable, not any kernel stack variable. To solve this, we add a new flag FAULT_FLAG_ORIG_PTE_VALID, and it'll be set along with orig_pte when there is valid orig_pte, or it'll be cleared when orig_pte was not initialized. It'll be updated every time we call handle_pte_fault(), so e.g. if a page fault retry happened it'll be properly updated along with orig_pte. [1] https://lore.kernel.org/lkml/710c48c9-406d-e4c5-a394-10501b951316@samsung.com/ [akpm@linux-foundation.org: coding-style cleanups] [peterx@redhat.com: fix crash reported by Marek] Link: https://lkml.kernel.org/r/Ylb9rXJyPm8/ao8f@xz-m1.local Link: https://lkml.kernel.org/r/20220405014836.14077-1-peterx@redhat.com Signed-off-by: Peter Xu Reviewed-by: Alistair Popple Tested-by: Marek Szyprowski Cc: Andrea Arcangeli Cc: Axel Rasmussen Cc: David Hildenbrand Cc: Hugh Dickins Cc: Jerome Glisse Cc: "Kirill A . Shutemov" Cc: Matthew Wilcox Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Signed-off-by: Andrew Morton --- include/linux/mm_types.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 7216d77a5884..dd382270ae40 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -822,6 +822,8 @@ typedef struct { * @FAULT_FLAG_UNSHARE: The fault is an unsharing request to unshare (and mark * exclusive) a possibly shared anonymous page that is * mapped R/O. + * @FAULT_FLAG_ORIG_PTE_VALID: whether the fault has vmf->orig_pte cached. + * We should only access orig_pte if this flag set. * * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify * whether we would allow page faults to retry by specifying these two @@ -858,6 +860,7 @@ enum fault_flag { FAULT_FLAG_INSTRUCTION = 1 << 8, FAULT_FLAG_INTERRUPTIBLE = 1 << 9, FAULT_FLAG_UNSHARE = 1 << 10, + FAULT_FLAG_ORIG_PTE_VALID = 1 << 11, }; #endif /* _LINUX_MM_TYPES_H */ -- cgit From 1db9dbc2ef05205bf1022f9b14aa29b1dd8efd7e Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 12 May 2022 20:22:52 -0700 Subject: mm/uffd: PTE_MARKER_UFFD_WP This patch introduces the 1st user of pte marker: the uffd-wp marker. When the pte marker is installed with the uffd-wp bit set, it means this pte was wr-protected by uffd. We will use this special pte to arm the ptes that got either unmapped or swapped out for a file-backed region that was previously wr-protected. This special pte could trigger a page fault just like swap entries. This idea is greatly inspired by Hugh and Andrea in the discussion, which is referenced in the links below. Some helpers are introduced to detect whether a swap pte is uffd wr-protected. After the pte marker introduced, one swap pte can be wr-protected in two forms: either it is a normal swap pte and it has _PAGE_SWP_UFFD_WP set, or it's a pte marker that has PTE_MARKER_UFFD_WP set. [peterx@redhat.com: fixup] Link: https://lkml.kernel.org/r/YkzKiM8tI4+qOfXF@xz-m1.local Link: https://lore.kernel.org/lkml/20201126222359.8120-1-peterx@redhat.com/ Link: https://lore.kernel.org/lkml/20201130230603.46187-1-peterx@redhat.com/ Link: https://lkml.kernel.org/r/20220405014838.14131-1-peterx@redhat.com Signed-off-by: Peter Xu Suggested-by: Andrea Arcangeli Suggested-by: Hugh Dickins Cc: Alistair Popple Cc: Axel Rasmussen Cc: David Hildenbrand Cc: Jerome Glisse Cc: "Kirill A . Shutemov" Cc: Matthew Wilcox Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Signed-off-by: Andrew Morton --- include/linux/swapops.h | 3 ++- include/linux/userfaultfd_k.h | 47 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 49 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swapops.h b/include/linux/swapops.h index bb70da461b7d..09945342bf76 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -276,7 +276,8 @@ static inline int is_readable_migration_entry(swp_entry_t entry) typedef unsigned long pte_marker; -#define PTE_MARKER_MASK (0) +#define PTE_MARKER_UFFD_WP BIT(0) +#define PTE_MARKER_MASK (PTE_MARKER_UFFD_WP) #ifdef CONFIG_PTE_MARKER diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 33cea484d1ad..3354d9ddc35f 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -15,6 +15,8 @@ #include #include +#include +#include #include /* The set of all possible UFFD-related VM flags. */ @@ -236,4 +238,49 @@ static inline void userfaultfd_unmap_complete(struct mm_struct *mm, #endif /* CONFIG_USERFAULTFD */ +static inline bool pte_marker_entry_uffd_wp(swp_entry_t entry) +{ +#ifdef CONFIG_PTE_MARKER_UFFD_WP + return is_pte_marker_entry(entry) && + (pte_marker_get(entry) & PTE_MARKER_UFFD_WP); +#else + return false; +#endif +} + +static inline bool pte_marker_uffd_wp(pte_t pte) +{ +#ifdef CONFIG_PTE_MARKER_UFFD_WP + swp_entry_t entry; + + if (!is_swap_pte(pte)) + return false; + + entry = pte_to_swp_entry(pte); + + return pte_marker_entry_uffd_wp(entry); +#else + return false; +#endif +} + +/* + * Returns true if this is a swap pte and was uffd-wp wr-protected in either + * forms (pte marker or a normal swap pte), false otherwise. + */ +static inline bool pte_swp_uffd_wp_any(pte_t pte) +{ +#ifdef CONFIG_PTE_MARKER_UFFD_WP + if (!is_swap_pte(pte)) + return false; + + if (pte_swp_uffd_wp(pte)) + return true; + + if (pte_marker_uffd_wp(pte)) + return true; +#endif + return false; +} + #endif /* _LINUX_USERFAULTFD_K_H */ -- cgit From 8ee79edff6d3b43b2d0c1e41f92b32e128988b22 Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 12 May 2022 20:22:52 -0700 Subject: mm/shmem: take care of UFFDIO_COPY_MODE_WP Pass wp_copy into shmem_mfill_atomic_pte() through the stack, then apply the UFFD_WP bit properly when the UFFDIO_COPY on shmem is with UFFDIO_COPY_MODE_WP. wp_copy lands mfill_atomic_install_pte() finally. Note: we must do pte_wrprotect() if !writable in mfill_atomic_install_pte(), as mk_pte() could return a writable pte (e.g., when VM_SHARED on a shmem file). Link: https://lkml.kernel.org/r/20220405014841.14185-1-peterx@redhat.com Signed-off-by: Peter Xu Cc: Alistair Popple Cc: Andrea Arcangeli Cc: Axel Rasmussen Cc: David Hildenbrand Cc: Hugh Dickins Cc: Jerome Glisse Cc: "Kirill A . Shutemov" Cc: Matthew Wilcox Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Signed-off-by: Andrew Morton --- include/linux/shmem_fs.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 3e915cc550bc..a68f982f22d1 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -145,11 +145,11 @@ extern int shmem_mfill_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, struct vm_area_struct *dst_vma, unsigned long dst_addr, unsigned long src_addr, - bool zeropage, + bool zeropage, bool wp_copy, struct page **pagep); #else /* !CONFIG_SHMEM */ #define shmem_mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, \ - src_addr, zeropage, pagep) ({ BUG(); 0; }) + src_addr, zeropage, wp_copy, pagep) ({ BUG(); 0; }) #endif /* CONFIG_SHMEM */ #endif /* CONFIG_USERFAULTFD */ -- cgit From 9c28a205c06123b9f0a0c4d819ece9f5f552d004 Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 12 May 2022 20:22:53 -0700 Subject: mm/shmem: handle uffd-wp special pte in page fault handler File-backed memories are prone to unmap/swap so the ptes are always unstable, because they can be easily faulted back later using the page cache. This could lead to uffd-wp getting lost when unmapping or swapping out such memory. One example is shmem. PTE markers are needed to store those information. This patch prepares it by handling uffd-wp pte markers first it is applied elsewhere, so that the page fault handler can recognize uffd-wp pte markers. The handling of uffd-wp pte markers is similar to missing fault, it's just that we'll handle this "missing fault" when we see the pte markers, meanwhile we need to make sure the marker information is kept during processing the fault. This is a slow path of uffd-wp handling, because zapping of wr-protected shmem ptes should be rare. So far it should only trigger in two conditions: (1) When trying to punch holes in shmem_fallocate(), there is an optimization to zap the pgtables before evicting the page. (2) When swapping out shmem pages. Because of this, the page fault handling is simplifed too by not sending the wr-protect message in the 1st page fault, instead the page will be installed read-only, so the uffd-wp message will be generated in the next fault, which will trigger the do_wp_page() path of general uffd-wp handling. Disable fault-around for all uffd-wp registered ranges for extra safety just like uffd-minor fault, and clean the code up. Link: https://lkml.kernel.org/r/20220405014844.14239-1-peterx@redhat.com Signed-off-by: Peter Xu Cc: Alistair Popple Cc: Andrea Arcangeli Cc: Axel Rasmussen Cc: David Hildenbrand Cc: Hugh Dickins Cc: Jerome Glisse Cc: "Kirill A . Shutemov" Cc: Matthew Wilcox Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Signed-off-by: Andrew Morton --- include/linux/userfaultfd_k.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'include/linux') diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 3354d9ddc35f..e7afcdfd4b46 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -96,6 +96,18 @@ static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma) return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR); } +/* + * Don't do fault around for either WP or MINOR registered uffd range. For + * MINOR registered range, fault around will be a total disaster and ptes can + * be installed without notifications; for WP it should mostly be fine as long + * as the fault around checks for pte_none() before the installation, however + * to be super safe we just forbid it. + */ +static inline bool uffd_disable_fault_around(struct vm_area_struct *vma) +{ + return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR); +} + static inline bool userfaultfd_missing(struct vm_area_struct *vma) { return vma->vm_flags & VM_UFFD_MISSING; @@ -236,6 +248,11 @@ static inline void userfaultfd_unmap_complete(struct mm_struct *mm, { } +static inline bool uffd_disable_fault_around(struct vm_area_struct *vma) +{ + return false; +} + #endif /* CONFIG_USERFAULTFD */ static inline bool pte_marker_entry_uffd_wp(swp_entry_t entry) -- cgit From 999dad824c39ed14dee7c4412aae531ba9e74a90 Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 12 May 2022 20:22:53 -0700 Subject: mm/shmem: persist uffd-wp bit across zapping for file-backed File-backed memory is prone to being unmapped at any time. It means all information in the pte will be dropped, including the uffd-wp flag. To persist the uffd-wp flag, we'll use the pte markers. This patch teaches the zap code to understand uffd-wp and know when to keep or drop the uffd-wp bit. Add a new flag ZAP_FLAG_DROP_MARKER and set it in zap_details when we don't want to persist such an information, for example, when destroying the whole vma, or punching a hole in a shmem file. For the rest cases we should never drop the uffd-wp bit, or the wr-protect information will get lost. The new ZAP_FLAG_DROP_MARKER needs to be put into mm.h rather than memory.c because it'll be further referenced in hugetlb files later. Link: https://lkml.kernel.org/r/20220405014847.14295-1-peterx@redhat.com Signed-off-by: Peter Xu Cc: Alistair Popple Cc: Andrea Arcangeli Cc: Axel Rasmussen Cc: David Hildenbrand Cc: Hugh Dickins Cc: Jerome Glisse Cc: "Kirill A . Shutemov" Cc: Matthew Wilcox Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Signed-off-by: Andrew Morton --- include/linux/mm.h | 10 ++++++++++ include/linux/mm_inline.h | 43 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 53 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index d63ba0e0e068..61786259e52a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3428,4 +3428,14 @@ madvise_set_anon_name(struct mm_struct *mm, unsigned long start, } #endif +typedef unsigned int __bitwise zap_flags_t; + +/* + * Whether to drop the pte markers, for example, the uffd-wp information for + * file-backed memory. This should only be specified when we will completely + * drop the page in the mm, either by truncation or unmapping of the vma. By + * default, the flag is not set. + */ +#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0)) + #endif /* _LINUX_MM_H */ diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index ac32125745ab..7b25b53c474a 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -6,6 +6,8 @@ #include #include #include +#include +#include /** * folio_is_file_lru - Should the folio be on a file LRU or anon LRU? @@ -316,5 +318,46 @@ static inline bool mm_tlb_flush_nested(struct mm_struct *mm) return atomic_read(&mm->tlb_flush_pending) > 1; } +/* + * If this pte is wr-protected by uffd-wp in any form, arm the special pte to + * replace a none pte. NOTE! This should only be called when *pte is already + * cleared so we will never accidentally replace something valuable. Meanwhile + * none pte also means we are not demoting the pte so tlb flushed is not needed. + * E.g., when pte cleared the caller should have taken care of the tlb flush. + * + * Must be called with pgtable lock held so that no thread will see the none + * pte, and if they see it, they'll fault and serialize at the pgtable lock. + * + * This function is a no-op if PTE_MARKER_UFFD_WP is not enabled. + */ +static inline void +pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr, + pte_t *pte, pte_t pteval) +{ +#ifdef CONFIG_PTE_MARKER_UFFD_WP + bool arm_uffd_pte = false; + + /* The current status of the pte should be "cleared" before calling */ + WARN_ON_ONCE(!pte_none(*pte)); + + if (vma_is_anonymous(vma) || !userfaultfd_wp(vma)) + return; + + /* A uffd-wp wr-protected normal pte */ + if (unlikely(pte_present(pteval) && pte_uffd_wp(pteval))) + arm_uffd_pte = true; + + /* + * A uffd-wp wr-protected swap pte. Note: this should even cover an + * existing pte marker with uffd-wp bit set. + */ + if (unlikely(pte_swp_uffd_wp_any(pteval))) + arm_uffd_pte = true; + + if (unlikely(arm_uffd_pte)) + set_pte_at(vma->vm_mm, addr, pte, + make_pte_marker(PTE_MARKER_UFFD_WP)); +#endif +} #endif -- cgit From 6041c69179034278ac6d57f90a55b09e588f4b90 Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 12 May 2022 20:22:54 -0700 Subject: mm/hugetlb: take care of UFFDIO_COPY_MODE_WP Pass the wp_copy variable into hugetlb_mcopy_atomic_pte() thoughout the stack. Apply the UFFD_WP bit if UFFDIO_COPY_MODE_WP is with UFFDIO_COPY. Hugetlb pages are only managed by hugetlbfs, so we're safe even without setting dirty bit in the huge pte if the page is installed as read-only. However we'd better still keep the dirty bit set for a read-only UFFDIO_COPY pte (when UFFDIO_COPY_MODE_WP bit is set), not only to match what we do with shmem, but also because the page does contain dirty data that the kernel just copied from the userspace. Link: https://lkml.kernel.org/r/20220405014904.14643-1-peterx@redhat.com Signed-off-by: Peter Xu Cc: Alistair Popple Cc: Andrea Arcangeli Cc: Axel Rasmussen Cc: David Hildenbrand Cc: Hugh Dickins Cc: Jerome Glisse Cc: "Kirill A . Shutemov" Cc: Matthew Wilcox Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index ac2ece9e9c79..159b253e523d 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -160,7 +160,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte, unsigned long dst_addr, unsigned long src_addr, enum mcopy_atomic_mode mode, - struct page **pagep); + struct page **pagep, + bool wp_copy); #endif /* CONFIG_USERFAULTFD */ bool hugetlb_reserve_pages(struct inode *inode, long from, long to, struct vm_area_struct *vma, @@ -356,7 +357,8 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, unsigned long dst_addr, unsigned long src_addr, enum mcopy_atomic_mode mode, - struct page **pagep) + struct page **pagep, + bool wp_copy) { BUG(); return 0; -- cgit From 5a90d5a103c2badfcf12d48e2fec350969e3f486 Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 12 May 2022 20:22:54 -0700 Subject: mm/hugetlb: handle UFFDIO_WRITEPROTECT This starts from passing cp_flags into hugetlb_change_protection() so hugetlb will be able to handle MM_CP_UFFD_WP[_RESOLVE] requests. huge_pte_clear_uffd_wp() is introduced to handle the case where the UFFDIO_WRITEPROTECT is requested upon migrating huge page entries. Link: https://lkml.kernel.org/r/20220405014906.14708-1-peterx@redhat.com Signed-off-by: Peter Xu Reviewed-by: Mike Kravetz Cc: Alistair Popple Cc: Andrea Arcangeli Cc: Axel Rasmussen Cc: David Hildenbrand Cc: Hugh Dickins Cc: Jerome Glisse Cc: "Kirill A . Shutemov" Cc: Matthew Wilcox Cc: Mike Rapoport Cc: Nadav Amit Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 159b253e523d..f1143f1fb444 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -211,7 +211,8 @@ struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address, int pmd_huge(pmd_t pmd); int pud_huge(pud_t pud); unsigned long hugetlb_change_protection(struct vm_area_struct *vma, - unsigned long address, unsigned long end, pgprot_t newprot); + unsigned long address, unsigned long end, pgprot_t newprot, + unsigned long cp_flags); bool is_hugetlb_entry_migration(pte_t pte); void hugetlb_unshare_all_pmds(struct vm_area_struct *vma); @@ -397,7 +398,8 @@ static inline void move_hugetlb_state(struct page *oldpage, static inline unsigned long hugetlb_change_protection( struct vm_area_struct *vma, unsigned long address, - unsigned long end, pgprot_t newprot) + unsigned long end, pgprot_t newprot, + unsigned long cp_flags) { return 0; } -- cgit From 05e90bd05eea33fc77d6b11e121e2da01fee379f Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 12 May 2022 20:22:55 -0700 Subject: mm/hugetlb: only drop uffd-wp special pte if required As with shmem uffd-wp special ptes, only drop the uffd-wp special swap pte if unmapping an entire vma or synchronized such that faults can not race with the unmap operation. This requires passing zap_flags all the way to the lowest level hugetlb unmap routine: __unmap_hugepage_range. In general, unmap calls originated in hugetlbfs code will pass the ZAP_FLAG_DROP_MARKER flag as synchronization is in place to prevent faults. The exception is hole punch which will first unmap without any synchronization. Later when hole punch actually removes the page from the file, it will check to see if there was a subsequent fault and if so take the hugetlb fault mutex while unmapping again. This second unmap will pass in ZAP_FLAG_DROP_MARKER. The justification of "whether to apply ZAP_FLAG_DROP_MARKER flag when unmap a hugetlb range" is (IMHO): we should never reach a state when a page fault could errornously fault in a page-cache page that was wr-protected to be writable, even in an extremely short period. That could happen if e.g. we pass ZAP_FLAG_DROP_MARKER when hugetlbfs_punch_hole() calls hugetlb_vmdelete_list(), because if a page faults after that call and before remove_inode_hugepages() is executed, the page cache can be mapped writable again in the small racy window, that can cause unexpected data overwritten. [peterx@redhat.com: fix sparse warning] Link: https://lkml.kernel.org/r/Ylcdw8I1L5iAoWhb@xz-m1.local [akpm@linux-foundation.org: move zap_flags_t from mm.h to mm_types.h to fix build issues] Link: https://lkml.kernel.org/r/20220405014915.14873-1-peterx@redhat.com Signed-off-by: Peter Xu Reviewed-by: Mike Kravetz Cc: Alistair Popple Cc: Andrea Arcangeli Cc: Axel Rasmussen Cc: David Hildenbrand Cc: Hugh Dickins Cc: Jerome Glisse Cc: "Kirill A . Shutemov" Cc: Matthew Wilcox Cc: Mike Rapoport Cc: Nadav Amit Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 8 +++++--- include/linux/mm.h | 2 -- include/linux/mm_types.h | 2 ++ 3 files changed, 7 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index f1143f1fb444..19cec415f546 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -143,11 +143,12 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, unsigned long *, unsigned long *, long, unsigned int, int *); void unmap_hugepage_range(struct vm_area_struct *, - unsigned long, unsigned long, struct page *); + unsigned long, unsigned long, struct page *, + zap_flags_t); void __unmap_hugepage_range_final(struct mmu_gather *tlb, struct vm_area_struct *vma, unsigned long start, unsigned long end, - struct page *ref_page); + struct page *ref_page, zap_flags_t zap_flags); void hugetlb_report_meminfo(struct seq_file *); int hugetlb_report_node_meminfo(char *buf, int len, int nid); void hugetlb_show_meminfo(void); @@ -406,7 +407,8 @@ static inline unsigned long hugetlb_change_protection( static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb, struct vm_area_struct *vma, unsigned long start, - unsigned long end, struct page *ref_page) + unsigned long end, struct page *ref_page, + zap_flags_t zap_flags) { BUG(); } diff --git a/include/linux/mm.h b/include/linux/mm.h index 61786259e52a..de32c0383387 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3428,8 +3428,6 @@ madvise_set_anon_name(struct mm_struct *mm, unsigned long start, } #endif -typedef unsigned int __bitwise zap_flags_t; - /* * Whether to drop the pte markers, for example, the uffd-wp information for * file-backed memory. This should only be specified when we will completely diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index dd382270ae40..b34ff2cdbc4f 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -863,4 +863,6 @@ enum fault_flag { FAULT_FLAG_ORIG_PTE_VALID = 1 << 11, }; +typedef unsigned int __bitwise zap_flags_t; + #endif /* _LINUX_MM_TYPES_H */ -- cgit From bc70fbf269fdff410b0b6d75c3770b9f59117b90 Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 12 May 2022 20:22:55 -0700 Subject: mm/hugetlb: handle uffd-wp during fork() Firstly, we'll need to pass in dst_vma into copy_hugetlb_page_range() because for uffd-wp it's the dst vma that matters on deciding how we should treat uffd-wp protected ptes. We should recognize pte markers during fork and do the pte copy if needed. [lkp@intel.com: vma_needs_copy can be static] Link: https://lkml.kernel.org/r/Ylb0CGeFJlc4EzLk@7ec4ff11d4ae Link: https://lkml.kernel.org/r/20220405014918.14932-1-peterx@redhat.com Signed-off-by: Peter Xu Cc: Alistair Popple Cc: Andrea Arcangeli Cc: Axel Rasmussen Cc: David Hildenbrand Cc: Hugh Dickins Cc: Jerome Glisse Cc: "Kirill A . Shutemov" Cc: Matthew Wilcox Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 19cec415f546..04f0186b089b 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -137,7 +137,8 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma, struct vm_area_struct *new_vma, unsigned long old_addr, unsigned long new_addr, unsigned long len); -int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *); +int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, + struct vm_area_struct *, struct vm_area_struct *); long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *, struct page **, struct vm_area_struct **, unsigned long *, unsigned long *, long, unsigned int, @@ -269,7 +270,9 @@ static inline struct page *follow_huge_addr(struct mm_struct *mm, } static inline int copy_hugetlb_page_range(struct mm_struct *dst, - struct mm_struct *src, struct vm_area_struct *vma) + struct mm_struct *src, + struct vm_area_struct *dst_vma, + struct vm_area_struct *src_vma) { BUG(); return 0; -- cgit From b1f9e876862d8f7176299ec4fb2108bc1045cbc8 Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 12 May 2022 20:22:56 -0700 Subject: mm/uffd: enable write protection for shmem & hugetlbfs We've had all the necessary changes ready for both shmem and hugetlbfs. Turn on all the shmem/hugetlbfs switches for userfaultfd-wp. We can expand UFFD_API_RANGE_IOCTLS_BASIC with _UFFDIO_WRITEPROTECT too because all existing types now support write protection mode. Since vma_can_userfault() will be used elsewhere, move into userfaultfd_k.h. Link: https://lkml.kernel.org/r/20220405014926.15101-1-peterx@redhat.com Signed-off-by: Peter Xu Cc: Alistair Popple Cc: Andrea Arcangeli Cc: Axel Rasmussen Cc: David Hildenbrand Cc: Hugh Dickins Cc: Jerome Glisse Cc: "Kirill A . Shutemov" Cc: Matthew Wilcox Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Signed-off-by: Andrew Morton --- include/linux/userfaultfd_k.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'include/linux') diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index e7afcdfd4b46..732b522bacb7 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -18,6 +18,7 @@ #include #include #include +#include /* The set of all possible UFFD-related VM flags. */ #define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR) @@ -140,6 +141,25 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma) return vma->vm_flags & __VM_UFFD_FLAGS; } +static inline bool vma_can_userfault(struct vm_area_struct *vma, + unsigned long vm_flags) +{ + if (vm_flags & VM_UFFD_MINOR) + return is_vm_hugetlb_page(vma) || vma_is_shmem(vma); + +#ifndef CONFIG_PTE_MARKER_UFFD_WP + /* + * If user requested uffd-wp but not enabled pte markers for + * uffd-wp, then shmem & hugetlbfs are not supported but only + * anonymous. + */ + if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma)) + return false; +#endif + return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) || + vma_is_shmem(vma); +} + extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *); extern void dup_userfaultfd_complete(struct list_head *); -- cgit From b48d8a8e5ce53e3114a1ffe96563e3555b51d40b Mon Sep 17 00:00:00 2001 From: Zi Yan Date: Thu, 12 May 2022 20:22:57 -0700 Subject: mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c Patch series "Use pageblock_order for cma and alloc_contig_range alignment", v11. This patchset tries to remove the MAX_ORDER-1 alignment requirement for CMA and alloc_contig_range(). It prepares for my upcoming changes to make MAX_ORDER adjustable at boot time[1]. The MAX_ORDER - 1 alignment requirement comes from that alloc_contig_range() isolates pageblocks to remove free memory from buddy allocator but isolating only a subset of pageblocks within a page spanning across multiple pageblocks causes free page accounting issues. Isolated page might not be put into the right free list, since the code assumes the migratetype of the first pageblock as the whole free page migratetype. This is based on the discussion at [2]. To remove the requirement, this patchset: 1. isolates pages at pageblock granularity instead of max(MAX_ORDER_NR_PAEGS, pageblock_nr_pages); 2. splits free pages across the specified range or migrates in-use pages across the specified range then splits the freed page to avoid free page accounting issues (it happens when multiple pageblocks within a single page have different migratetypes); 3. only checks unmovable pages within the range instead of MAX_ORDER - 1 aligned range during isolation to avoid alloc_contig_range() failure when pageblocks within a MAX_ORDER - 1 aligned range are allocated separately. 4. returns pages not in the range as it did before. One optimization might come later: 1. make MIGRATE_ISOLATE a separate bit to be able to restore the original migratetypes when isolation fails in the middle of the range. [1] https://lore.kernel.org/linux-mm/20210805190253.2795604-1-zi.yan@sent.com/ [2] https://lore.kernel.org/linux-mm/d19fb078-cb9b-f60f-e310-fdeea1b947d2@redhat.com/ This patch (of 6): has_unmovable_pages() is only used in mm/page_isolation.c. Move it from mm/page_alloc.c and make it static. Link: https://lkml.kernel.org/r/20220425143118.2850746-2-zi.yan@sent.com Signed-off-by: Zi Yan Reviewed-by: Oscar Salvador Reviewed-by: Mike Rapoport Acked-by: David Hildenbrand Cc: Vlastimil Babka Cc: Mel Gorman Cc: Eric Ren Cc: Christophe Leroy Cc: Minchan Kim Cc: kernel test robot Signed-off-by: Andrew Morton --- include/linux/page-isolation.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 572458016331..e14eddf6741a 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -33,8 +33,6 @@ static inline bool is_migrate_isolate(int migratetype) #define MEMORY_OFFLINE 0x1 #define REPORT_FAILURE 0x2 -struct page *has_unmovable_pages(struct zone *zone, struct page *page, - int migratetype, int flags); void set_pageblock_migratetype(struct page *page, int migratetype); int move_freepages_block(struct zone *zone, struct page *page, int migratetype, int *num_movable); -- cgit From b2c9e2fbba32539626522b6aed30d1dde7b7e971 Mon Sep 17 00:00:00 2001 From: Zi Yan Date: Thu, 12 May 2022 20:22:58 -0700 Subject: mm: make alloc_contig_range work at pageblock granularity alloc_contig_range() worked at MAX_ORDER_NR_PAGES granularity to avoid merging pageblocks with different migratetypes. It might unnecessarily convert extra pageblocks at the beginning and at the end of the range. Change alloc_contig_range() to work at pageblock granularity. Special handling is needed for free pages and in-use pages across the boundaries of the range specified by alloc_contig_range(). Because these= Partially isolated pages causes free page accounting issues. The free pages will be split and freed into separate migratetype lists; the in-use= Pages will be migrated then the freed pages will be handled in the aforementioned way. [ziy@nvidia.com: fix deadlock/crash] Link: https://lkml.kernel.org/r/23A7297E-6C84-4138-A9FE-3598234004E6@nvidia.com Link: https://lkml.kernel.org/r/20220425143118.2850746-4-zi.yan@sent.com Signed-off-by: Zi Yan Reported-by: kernel test robot Cc: Christophe Leroy Cc: David Hildenbrand Cc: Eric Ren Cc: Mel Gorman Cc: Mike Rapoport Cc: Minchan Kim Cc: Oscar Salvador Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/page-isolation.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index e14eddf6741a..5456b7be38ae 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -42,7 +42,7 @@ int move_freepages_block(struct zone *zone, struct page *page, */ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, - unsigned migratetype, int flags); + int migratetype, int flags, gfp_t gfp_flags); /* * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE. @@ -50,7 +50,7 @@ start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, */ void undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, - unsigned migratetype); + int migratetype); /* * Test all pages in [start_pfn, end_pfn) are isolated or not. -- cgit From 11ac3e87ce09c27f4587a8c4fe0829d814021a82 Mon Sep 17 00:00:00 2001 From: Zi Yan Date: Thu, 12 May 2022 20:22:58 -0700 Subject: mm: cma: use pageblock_order as the single alignment Now alloc_contig_range() works at pageblock granularity. Change CMA allocation, which uses alloc_contig_range(), to use pageblock_nr_pages alignment. Link: https://lkml.kernel.org/r/20220425143118.2850746-6-zi.yan@sent.com Signed-off-by: Zi Yan Cc: Christophe Leroy Cc: David Hildenbrand Cc: Eric Ren Cc: kernel test robot Cc: Mel Gorman Cc: Mike Rapoport Cc: Minchan Kim Cc: Oscar Salvador Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/cma.h | 4 ++-- include/linux/mmzone.h | 5 +---- 2 files changed, 3 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cma.h b/include/linux/cma.h index a6f637342740..63873b93deaa 100644 --- a/include/linux/cma.h +++ b/include/linux/cma.h @@ -17,11 +17,11 @@ #define CMA_MAX_NAME 64 /* - * TODO: once the buddy -- especially pageblock merging and alloc_contig_range() + * the buddy -- especially pageblock merging and alloc_contig_range() * -- can deal with only some pageblocks of a higher-order page being * MIGRATE_CMA, we can use pageblock_nr_pages. */ -#define CMA_MIN_ALIGNMENT_PAGES MAX_ORDER_NR_PAGES +#define CMA_MIN_ALIGNMENT_PAGES pageblock_nr_pages #define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES) struct cma; diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 46ffab808f03..aab70355d64f 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -54,10 +54,7 @@ enum migratetype { * * The way to use it is to change migratetype of a range of * pageblocks to MIGRATE_CMA which can be done by - * __free_pageblock_cma() function. What is important though - * is that a range of pageblocks must be aligned to - * MAX_ORDER_NR_PAGES should biggest page be bigger than - * a single pageblock. + * __free_pageblock_cma() function. */ MIGRATE_CMA, #endif -- cgit From adf88aa8ea7ff143825a2a8a7193f92e0e6fc79b Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 12 May 2022 20:23:01 -0700 Subject: mm: remove alloc_pages_vma() All callers have now been converted to use vma_alloc_folio(), so convert the body of alloc_pages_vma() to allocate folios instead. Link: https://lkml.kernel.org/r/20220504182857.4013401-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Signed-off-by: Andrew Morton --- include/linux/gfp.h | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 3e3d36fc2109..2a08a3c4ba95 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -613,13 +613,8 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask, #ifdef CONFIG_NUMA struct page *alloc_pages(gfp_t gfp, unsigned int order); struct folio *folio_alloc(gfp_t gfp, unsigned order); -struct page *alloc_pages_vma(gfp_t gfp_mask, int order, - struct vm_area_struct *vma, unsigned long addr, - bool hugepage); struct folio *vma_alloc_folio(gfp_t gfp, int order, struct vm_area_struct *vma, unsigned long addr, bool hugepage); -#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ - alloc_pages_vma(gfp_mask, order, vma, addr, true) #else static inline struct page *alloc_pages(gfp_t gfp_mask, unsigned int order) { @@ -629,16 +624,17 @@ static inline struct folio *folio_alloc(gfp_t gfp, unsigned int order) { return __folio_alloc_node(gfp, order, numa_node_id()); } -#define alloc_pages_vma(gfp_mask, order, vma, addr, hugepage) \ - alloc_pages(gfp_mask, order) #define vma_alloc_folio(gfp, order, vma, addr, hugepage) \ folio_alloc(gfp, order) -#define alloc_hugepage_vma(gfp_mask, vma, addr, order) \ - alloc_pages(gfp_mask, order) #endif #define alloc_page(gfp_mask) alloc_pages(gfp_mask, 0) -#define alloc_page_vma(gfp_mask, vma, addr) \ - alloc_pages_vma(gfp_mask, 0, vma, addr, false) +static inline struct page *alloc_page_vma(gfp_t gfp, + struct vm_area_struct *vma, unsigned long addr) +{ + struct folio *folio = vma_alloc_folio(gfp, 0, vma, addr, false); + + return &folio->page; +} extern unsigned long __get_free_pages(gfp_t gfp_mask, unsigned int order); extern unsigned long get_zeroed_page(gfp_t gfp_mask); -- cgit From e2e3fdc7d4afdb8e7ba981eba7827993f2d390a8 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 12 May 2022 20:23:02 -0700 Subject: swap: turn get_swap_page() into folio_alloc_swap() This removes an assumption that a large folio is HPAGE_PMD_NR pages in size. Link: https://lkml.kernel.org/r/20220504182857.4013401-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Signed-off-by: Andrew Morton --- include/linux/swap.h | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 5553189d0215..252c23a6667f 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -470,7 +470,7 @@ static inline long get_nr_swap_pages(void) } extern void si_swapinfo(struct sysinfo *); -extern swp_entry_t get_swap_page(struct page *page); +swp_entry_t folio_alloc_swap(struct folio *folio); extern void put_swap_page(struct page *page, swp_entry_t entry); extern swp_entry_t get_swap_page_of_type(int); extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size); @@ -583,7 +583,7 @@ static inline int try_to_free_swap(struct page *page) return 0; } -static inline swp_entry_t get_swap_page(struct page *page) +static inline swp_entry_t folio_alloc_swap(struct folio *folio) { swp_entry_t entry; entry.val = 0; @@ -643,12 +643,13 @@ static inline void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask) #ifdef CONFIG_MEMCG_SWAP void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry); -extern int __mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry); -static inline int mem_cgroup_try_charge_swap(struct page *page, swp_entry_t entry) +int __mem_cgroup_try_charge_swap(struct folio *folio, swp_entry_t entry); +static inline int mem_cgroup_try_charge_swap(struct folio *folio, + swp_entry_t entry) { if (mem_cgroup_disabled()) return 0; - return __mem_cgroup_try_charge_swap(page, entry); + return __mem_cgroup_try_charge_swap(folio, entry); } extern void __mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_pages); @@ -666,7 +667,7 @@ static inline void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry) { } -static inline int mem_cgroup_try_charge_swap(struct page *page, +static inline int mem_cgroup_try_charge_swap(struct folio *folio, swp_entry_t entry) { return 0; -- cgit From 64daa5d818ae3430f0785206b0af13ef528cb9ef Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 12 May 2022 20:23:03 -0700 Subject: vmscan: convert lazy freeing to folios Remove a hidden call to compound_head(), and account nr_pages instead of a single page. This matches the code in lru_lazyfree_fn() that accounts nr_pages to PGLAZYFREE. Link: https://lkml.kernel.org/r/20220504182857.4013401-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index fe580cb96683..8ea4b541c31e 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1057,6 +1057,15 @@ static inline void count_memcg_page_event(struct page *page, count_memcg_events(memcg, idx, 1); } +static inline void count_memcg_folio_events(struct folio *folio, + enum vm_event_item idx, unsigned long nr) +{ + struct mem_cgroup *memcg = folio_memcg(folio); + + if (memcg) + count_memcg_events(memcg, idx, nr); +} + static inline void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx) { @@ -1494,6 +1503,11 @@ static inline void count_memcg_page_event(struct page *page, { } +static inline void count_memcg_folio_events(struct folio *folio, + enum vm_event_item idx, unsigned long nr) +{ +} + static inline void count_memcg_event_mm(struct mm_struct *mm, enum vm_event_item idx) { -- cgit From dc786690a6a1d9a680e6872821291ad7ca9f520d Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 12 May 2022 20:23:03 -0700 Subject: mm: allow can_split_folio() to be called when THP are disabled The call to can_split_folio() in vmscan is currently guarded by a test of PageTransHuge() so the BUILD_BUG() is eliminated if THP are disabled. The next patch replaces that test with folio_test_large() which may be true, even when THP are disabled. However, if THP are disabled, we cannot split, so an unconditional return of false is appropriate. Link: https://lkml.kernel.org/r/20220504182857.4013401-15-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Signed-off-by: Andrew Morton --- include/linux/huge_mm.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 9a26bd10e083..fbf36bb1be22 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -348,7 +348,6 @@ static inline void prep_transhuge_page(struct page *page) {} static inline bool can_split_folio(struct folio *folio, int *pextra_pins) { - BUILD_BUG(); return false; } static inline int -- cgit From 039bc1240165c99b932fb4be2f784948d1038eea Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 12 May 2022 20:23:04 -0700 Subject: mm/swap: add folio_throttle_swaprate The only use of the page argument to cgroup_throttle_swaprate() is to get the node ID, and this will be the same for all pages in the folio, so just pass in the first page of the folio. Link: https://lkml.kernel.org/r/20220504182857.4013401-18-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Signed-off-by: Andrew Morton --- include/linux/swap.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 252c23a6667f..6bf1506e5080 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -640,6 +640,10 @@ static inline void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask) { } #endif +static inline void folio_throttle_swaprate(struct folio *folio, gfp_t gfp) +{ + cgroup_throttle_swaprate(&folio->page, gfp); +} #ifdef CONFIG_MEMCG_SWAP void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry); -- cgit From da08e9b7932345aff0a748c55f8a25709505f2a3 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 12 May 2022 20:23:05 -0700 Subject: mm/shmem: convert shmem_swapin_page() to shmem_swapin_folio() shmem_swapin_page() only brings in order-0 pages, which are folios by definition. Link: https://lkml.kernel.org/r/20220504182857.4013401-24-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 530b1817b58c..39fa93e94b80 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -758,7 +758,7 @@ static inline void arch_swap_invalidate_area(int type) #endif #ifndef __HAVE_ARCH_SWAP_RESTORE -static inline void arch_swap_restore(swp_entry_t entry, struct page *page) +static inline void arch_swap_restore(swp_entry_t entry, struct folio *folio) { } #endif -- cgit From a9595b305c0f6cbbf9505325509d95637f6a7062 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 12 May 2022 20:23:05 -0700 Subject: mm: add folio_mapping_flags() This is the equivalent of PageMappingFlags and is needed for converting mm/migrate.c to folios. Link: https://lkml.kernel.org/r/20220504182857.4013401-25-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Signed-off-by: Andrew Morton --- include/linux/page-flags.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index b70124b9c7c1..97eb88dbcbbc 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -650,6 +650,11 @@ __PAGEFLAG(Reported, reported, PF_NO_COMPOUND) #define PAGE_MAPPING_KSM (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE) #define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE) +static __always_inline bool folio_mapping_flags(struct folio *folio) +{ + return ((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS) != 0; +} + static __always_inline int PageMappingFlags(struct page *page) { return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) != 0; -- cgit From 8b463be3a0241558071e6ba9bf11e4bf42ccc856 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 12 May 2022 20:23:05 -0700 Subject: mm: add folio_test_movable() This is the folio equivalent of PageMovable() which is needed to convert mm/migrate.c to folios. Link: https://lkml.kernel.org/r/20220504182857.4013401-26-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Signed-off-by: Andrew Morton --- include/linux/migrate.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 2707bfd43a0d..069a89e847f3 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -105,6 +105,11 @@ static inline void __ClearPageMovable(struct page *page) } #endif +static inline bool folio_test_movable(struct folio *folio) +{ + return PageMovable(&folio->page); +} + #ifdef CONFIG_NUMA_BALANCING extern int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, int node); -- cgit From de8c8e52836d0082188508548d0b939f49f7f0e6 Mon Sep 17 00:00:00 2001 From: Tong Tiangen Date: Thu, 12 May 2022 20:23:06 -0700 Subject: mm: page_table_check: add hooks to public helpers Move ptep_clear() to the include/linux/pgtable.h and add page table check relate hooks to some helpers, it's prepare for support page table check feature on new architecture. Optimize the implementation of ptep_clear(), page table hooks added page table check stubs, the interface control should be at stubs, there is no rationale for doing a IS_ENABLED() check here. For architectures that do not enable CONFIG_PAGE_TABLE_CHECK, they will call a fallback page table check stubs[1] when getting their page table helpers[2] in include/linux/pgtable.h. [1] page table check stubs defined in include/linux/page_table_check.h [2] ptep_clear() ptep_get_and_clear() pmdp_huge_get_and_clear() pudp_huge_get_and_clear() Link: https://lkml.kernel.org/r/20220507110114.4128854-4-tongtiangen@huawei.com Signed-off-by: Tong Tiangen Acked-by: Pasha Tatashin Cc: Anshuman Khandual Cc: Borislav Petkov Cc: Catalin Marinas Cc: Dave Hansen Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Kefeng Wang Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 39fa93e94b80..e0a449b477c5 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -12,6 +12,7 @@ #include #include #include +#include #if 5 - defined(__PAGETABLE_P4D_FOLDED) - defined(__PAGETABLE_PUD_FOLDED) - \ defined(__PAGETABLE_PMD_FOLDED) != CONFIG_PGTABLE_LEVELS @@ -259,14 +260,6 @@ static inline int pmdp_clear_flush_young(struct vm_area_struct *vma, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif -#ifndef __HAVE_ARCH_PTEP_CLEAR -static inline void ptep_clear(struct mm_struct *mm, unsigned long addr, - pte_t *ptep) -{ - pte_clear(mm, addr, ptep); -} -#endif - #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long address, @@ -274,10 +267,19 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, { pte_t pte = *ptep; pte_clear(mm, address, ptep); + page_table_check_pte_clear(mm, address, pte); return pte; } #endif +#ifndef __HAVE_ARCH_PTEP_CLEAR +static inline void ptep_clear(struct mm_struct *mm, unsigned long addr, + pte_t *ptep) +{ + ptep_get_and_clear(mm, addr, ptep); +} +#endif + #ifndef __HAVE_ARCH_PTEP_GET static inline pte_t ptep_get(pte_t *ptep) { @@ -347,7 +349,10 @@ static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, pmd_t *pmdp) { pmd_t pmd = *pmdp; + pmd_clear(pmdp); + page_table_check_pmd_clear(mm, address, pmd); + return pmd; } #endif /* __HAVE_ARCH_PMDP_HUGE_GET_AND_CLEAR */ @@ -359,6 +364,8 @@ static inline pud_t pudp_huge_get_and_clear(struct mm_struct *mm, pud_t pud = *pudp; pud_clear(pudp); + page_table_check_pud_clear(mm, address, pud); + return pud; } #endif /* __HAVE_ARCH_PUDP_HUGE_GET_AND_CLEAR */ -- cgit From 2e7dc2b632a324c670ff11dd6c84d711043c683f Mon Sep 17 00:00:00 2001 From: Tong Tiangen Date: Thu, 12 May 2022 20:23:06 -0700 Subject: mm: remove __HAVE_ARCH_PTEP_CLEAR in pgtable.h Currently, there is no architecture definition __HAVE_ARCH_PTEP_CLEAR, Generic ptep_clear() is the only definition for all architecture, So drop the "#ifndef __HAVE_ARCH_PTEP_CLEAR". Link: https://lkml.kernel.org/r/20220507110114.4128854-5-tongtiangen@huawei.com Signed-off-by: Tong Tiangen Suggested-by: Anshuman Khandual Cc: Borislav Petkov Cc: Catalin Marinas Cc: Dave Hansen Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Kefeng Wang Cc: Palmer Dabbelt Cc: Pasha Tatashin Cc: Paul Walmsley Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index e0a449b477c5..ecb4c762d760 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -272,13 +272,11 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm, } #endif -#ifndef __HAVE_ARCH_PTEP_CLEAR static inline void ptep_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { ptep_get_and_clear(mm, addr, ptep); } -#endif #ifndef __HAVE_ARCH_PTEP_GET static inline pte_t ptep_get(pte_t *ptep) -- cgit From c8db8c2628afc7088a43de3f7cfbcc2ef1f182f7 Mon Sep 17 00:00:00 2001 From: Li kunyu Date: Thu, 12 May 2022 20:23:07 -0700 Subject: mm: functions may simplify the use of return values p4d_clear_huge may be optimized for void return type and function usage. vunmap_p4d_range function saves a few steps here. Link: https://lkml.kernel.org/r/20220507150630.90399-1-kunyu@nfschina.com Signed-off-by: Li kunyu Cc: Dave Hansen Cc: Andy Lutomirski Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Borislav Petkov Cc: "H. Peter Anvin" Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index ecb4c762d760..3cdc16cfd867 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1448,16 +1448,13 @@ static inline int pmd_protnone(pmd_t pmd) #ifndef __PAGETABLE_P4D_FOLDED int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot); -int p4d_clear_huge(p4d_t *p4d); +void p4d_clear_huge(p4d_t *p4d); #else static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot) { return 0; } -static inline int p4d_clear_huge(p4d_t *p4d) -{ - return 0; -} +static inline void p4d_clear_huge(p4d_t *p4d) { } #endif /* !__PAGETABLE_P4D_FOLDED */ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot); @@ -1480,10 +1477,7 @@ static inline int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot) { return 0; } -static inline int p4d_clear_huge(p4d_t *p4d) -{ - return 0; -} +static inline void p4d_clear_huge(p4d_t *p4d) { } static inline int pud_clear_huge(pud_t *pud) { return 0; -- cgit From fe573327ffb1deba802c91dd1d4ff41dafa97a0e Mon Sep 17 00:00:00 2001 From: Vasily Averin Date: Thu, 12 May 2022 20:23:08 -0700 Subject: tracing: incorrect gfp_t conversion Fixes the following sparse warnings: include/trace/events/*: sparse: cast to restricted gfp_t include/trace/events/*: sparse: restricted gfp_t degrades to integer gfp_t type is bitwise and requires __force attributes for any casts. Link: https://lkml.kernel.org/r/331d88fe-f4f7-657c-02a2-d977f15fbff6@openvz.org Signed-off-by: Vasily Averin Cc: Steven Rostedt Cc: Ingo Molnar Signed-off-by: Andrew Morton --- include/linux/gfp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 2a08a3c4ba95..2d2ccae933c2 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -367,7 +367,7 @@ static inline int gfp_migratetype(const gfp_t gfp_flags) return MIGRATE_UNMOVABLE; /* Group based on mobility */ - return (gfp_flags & GFP_MOVABLE_MASK) >> GFP_MOVABLE_SHIFT; + return (__force unsigned long)(gfp_flags & GFP_MOVABLE_MASK) >> GFP_MOVABLE_SHIFT; } #undef GFP_MOVABLE_MASK #undef GFP_MOVABLE_SHIFT -- cgit From 50d63b5bbfd12262069ad062611cd5e69c5e9e05 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Wed, 4 May 2022 16:14:41 -0300 Subject: vfio: Change vfio_external_user_iommu_id() to vfio_file_iommu_group() The only caller wants to get a pointer to the struct iommu_group associated with the VFIO group file. Instead of returning the group ID then searching sysfs for that string to get the struct iommu_group just directly return the iommu_group pointer already held by the vfio_group struct. It already has a safe lifetime due to the struct file kref, the vfio_group and thus the iommu_group cannot be destroyed while the group file is open. Reviewed-by: Kevin Tian Reviewed-by: Christoph Hellwig Reviewed-by: Yi Liu Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/3-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index be1e7db4a314..d8c34c8490cb 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -140,7 +140,7 @@ extern struct vfio_group *vfio_group_get_external_user(struct file *filep); extern void vfio_group_put_external_user(struct vfio_group *group); extern bool vfio_external_group_match_file(struct vfio_group *group, struct file *filep); -extern int vfio_external_user_iommu_id(struct vfio_group *group); +extern struct iommu_group *vfio_file_iommu_group(struct file *file); extern long vfio_external_check_extension(struct vfio_group *group, unsigned long arg); -- cgit From c38ff5b0c373fbbd6a249eb461ffd4ae0f9dbfa0 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Wed, 4 May 2022 16:14:42 -0300 Subject: vfio: Remove vfio_external_group_match_file() vfio_group_fops_open() ensures there is only ever one struct file open for any struct vfio_group at any time: /* Do we need multiple instances of the group open? Seems not. */ opened = atomic_cmpxchg(&group->opened, 0, 1); if (opened) { vfio_group_put(group); return -EBUSY; Therefor the struct file * can be used directly to search the list of VFIO groups that KVM keeps instead of using the vfio_external_group_match_file() callback to try to figure out if the passed in FD matches the list or not. Delete vfio_external_group_match_file(). Reviewed-by: Kevin Tian Reviewed-by: Christoph Hellwig Reviewed-by: Yi Liu Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/4-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index d8c34c8490cb..df0adecdf1ea 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -138,8 +138,6 @@ int vfio_mig_get_next_state(struct vfio_device *device, */ extern struct vfio_group *vfio_group_get_external_user(struct file *filep); extern void vfio_group_put_external_user(struct vfio_group *group); -extern bool vfio_external_group_match_file(struct vfio_group *group, - struct file *filep); extern struct iommu_group *vfio_file_iommu_group(struct file *file); extern long vfio_external_check_extension(struct vfio_group *group, unsigned long arg); -- cgit From a905ad043f32bbb0c35d4325036397f20f30c8a9 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Wed, 4 May 2022 16:14:43 -0300 Subject: vfio: Change vfio_external_check_extension() to vfio_file_enforced_coherent() Instead of a general extension check change the function into a limited test if the iommu_domain has enforced coherency, which is the only thing kvm needs to query. Make the new op self contained by properly refcounting the container before touching it. Reviewed-by: Christoph Hellwig Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/5-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index df0adecdf1ea..7cc374da4839 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -139,8 +139,7 @@ int vfio_mig_get_next_state(struct vfio_device *device, extern struct vfio_group *vfio_group_get_external_user(struct file *filep); extern void vfio_group_put_external_user(struct vfio_group *group); extern struct iommu_group *vfio_file_iommu_group(struct file *file); -extern long vfio_external_check_extension(struct vfio_group *group, - unsigned long arg); +extern bool vfio_file_enforced_coherent(struct file *file); #define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long)) -- cgit From ba70a89f3c2a8279809ea0fc7684857c91938b8a Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Wed, 4 May 2022 16:14:44 -0300 Subject: vfio: Change vfio_group_set_kvm() to vfio_file_set_kvm() Just change the argument from struct vfio_group to struct file *. Reviewed-by: Kevin Tian Reviewed-by: Christoph Hellwig Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/6-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 7cc374da4839..b298a26c4f07 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -15,6 +15,8 @@ #include #include +struct kvm; + /* * VFIO devices can be placed in a set, this allows all devices to share this * structure and the VFIO core will provide a lock that is held around @@ -140,6 +142,7 @@ extern struct vfio_group *vfio_group_get_external_user(struct file *filep); extern void vfio_group_put_external_user(struct vfio_group *group); extern struct iommu_group *vfio_file_iommu_group(struct file *file); extern bool vfio_file_enforced_coherent(struct file *file); +extern void vfio_file_set_kvm(struct file *file, struct kvm *kvm); #define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long)) @@ -170,8 +173,6 @@ extern int vfio_unregister_notifier(struct vfio_device *device, enum vfio_notify_type type, struct notifier_block *nb); -struct kvm; -extern void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm); /* * Sub-module helpers -- cgit From 6a985ae80befcf2c00e7c889336bfe9e9739e2ef Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Wed, 4 May 2022 16:14:46 -0300 Subject: vfio/pci: Use the struct file as the handle not the vfio_group VFIO PCI does a security check as part of hot reset to prove that the user has permission to manipulate all the devices that will be impacted by the reset. Use a new API vfio_file_has_dev() to perform this security check against the struct file directly and remove the vfio_group from VFIO PCI. Since VFIO PCI was the last user of vfio_group_get_external_user() and vfio_group_put_external_user() remove it as well. Reviewed-by: Kevin Tian Reviewed-by: Christoph Hellwig Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/8-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index b298a26c4f07..45b287826ce6 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -138,11 +138,10 @@ int vfio_mig_get_next_state(struct vfio_device *device, /* * External user API */ -extern struct vfio_group *vfio_group_get_external_user(struct file *filep); -extern void vfio_group_put_external_user(struct vfio_group *group); extern struct iommu_group *vfio_file_iommu_group(struct file *file); extern bool vfio_file_enforced_coherent(struct file *file); extern void vfio_file_set_kvm(struct file *file, struct kvm *kvm); +extern bool vfio_file_has_dev(struct file *file, struct vfio_device *device); #define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long)) -- cgit From 1af0e4a0233fea7e8226cb977d379dc20f9bbe11 Mon Sep 17 00:00:00 2001 From: Christian Göttsche Date: Thu, 17 Feb 2022 15:18:57 +0100 Subject: security: declare member holding string literal const MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The struct security_hook_list member lsm is assigned in security_add_hooks() with string literals passed from the individual security modules. Declare the function parameter and the struct member const to signal their immutability. Reported by Clang [-Wwrite-strings]: security/selinux/hooks.c:7388:63: error: passing 'const char [8]' to parameter of type 'char *' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers] security_add_hooks(selinux_hooks, ARRAY_SIZE(selinux_hooks), selinux); ^~~~~~~~~ ./include/linux/lsm_hooks.h:1629:11: note: passing argument to parameter 'lsm' here char *lsm); ^ Signed-off-by: Christian Göttsche Reviewed-by: Paul Moore Reviewed-by: Casey Schaufler Signed-off-by: Paul Moore --- include/linux/lsm_hooks.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 419b5febc3ca..47cdf3fbecef 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -1595,7 +1595,7 @@ struct security_hook_list { struct hlist_node list; struct hlist_head *head; union security_list_options hook; - char *lsm; + const char *lsm; } __randomize_layout; /* @@ -1630,7 +1630,7 @@ extern struct security_hook_heads security_hook_heads; extern char *lsm_names; extern void security_add_hooks(struct security_hook_list *hooks, int count, - char *lsm); + const char *lsm); #define LSM_FLAG_LEGACY_MAJOR BIT(0) #define LSM_FLAG_EXCLUSIVE BIT(1) -- cgit From 1366992e16bddd5e2d9a561687f367f9f802e2e4 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Sun, 10 Apr 2022 16:49:50 +0200 Subject: timekeeping: Add raw clock fallback for random_get_entropy() The addition of random_get_entropy_fallback() provides access to whichever time source has the highest frequency, which is useful for gathering entropy on platforms without available cycle counters. It's not necessarily as good as being able to quickly access a cycle counter that the CPU has, but it's still something, even when it falls back to being jiffies-based. In the event that a given arch does not define get_cycles(), falling back to the get_cycles() default implementation that returns 0 is really not the best we can do. Instead, at least calling random_get_entropy_fallback() would be preferable, because that always needs to return _something_, even falling back to jiffies eventually. It's not as though random_get_entropy_fallback() is super high precision or guaranteed to be entropic, but basically anything that's not zero all the time is better than returning zero all the time. Finally, since random_get_entropy_fallback() is used during extremely early boot when randomizing freelists in mm_init(), it can be called before timekeeping has been initialized. In that case there really is nothing we can do; jiffies hasn't even started ticking yet. So just give up and return 0. Suggested-by: Thomas Gleixner Signed-off-by: Jason A. Donenfeld Reviewed-by: Thomas Gleixner Cc: Arnd Bergmann Cc: Theodore Ts'o --- include/linux/timex.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/timex.h b/include/linux/timex.h index 5745c90c8800..3871b06bd302 100644 --- a/include/linux/timex.h +++ b/include/linux/timex.h @@ -62,6 +62,8 @@ #include #include +unsigned long random_get_entropy_fallback(void); + #include #ifndef random_get_entropy @@ -74,8 +76,14 @@ * * By default we use get_cycles() for this purpose, but individual * architectures may override this in their asm/timex.h header file. + * If a given arch does not have get_cycles(), then we fallback to + * using random_get_entropy_fallback(). */ +#ifdef get_cycles #define random_get_entropy() ((unsigned long)get_cycles()) +#else +#define random_get_entropy() random_get_entropy_fallback() +#endif #endif /* -- cgit From 16d1e00c7e8a4950e914223b3112144289a82913 Mon Sep 17 00:00:00 2001 From: Joanne Koong Date: Mon, 9 May 2022 15:42:52 -0700 Subject: bpf: Add MEM_UNINIT as a bpf_type_flag Instead of having uninitialized versions of arguments as separate bpf_arg_types (eg ARG_PTR_TO_UNINIT_MEM as the uninitialized version of ARG_PTR_TO_MEM), we can instead use MEM_UNINIT as a bpf_type_flag modifier to denote that the argument is uninitialized. Doing so cleans up some of the logic in the verifier. We no longer need to do two checks against an argument type (eg "if (base_type(arg_type) == ARG_PTR_TO_MEM || base_type(arg_type) == ARG_PTR_TO_UNINIT_MEM)"), since uninitialized and initialized versions of the same argument type will now share the same base type. In the near future, MEM_UNINIT will be used by dynptr helper functions as well. Signed-off-by: Joanne Koong Acked-by: Andrii Nakryiko Acked-by: David Vernet Link: https://lore.kernel.org/r/20220509224257.3222614-2-joannelkoong@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 5061ccd8b2dc..c107392b0ba7 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -390,7 +390,10 @@ enum bpf_type_flag { */ PTR_UNTRUSTED = BIT(6 + BPF_BASE_TYPE_BITS), - __BPF_TYPE_LAST_FLAG = PTR_UNTRUSTED, + MEM_UNINIT = BIT(7 + BPF_BASE_TYPE_BITS), + + __BPF_TYPE_FLAG_MAX, + __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, }; /* Max number of base types. */ @@ -409,16 +412,11 @@ enum bpf_arg_type { ARG_CONST_MAP_PTR, /* const argument used as pointer to bpf_map */ ARG_PTR_TO_MAP_KEY, /* pointer to stack used as map key */ ARG_PTR_TO_MAP_VALUE, /* pointer to stack used as map value */ - ARG_PTR_TO_UNINIT_MAP_VALUE, /* pointer to valid memory used to store a map value */ - /* the following constraints used to prototype bpf_memcmp() and other - * functions that access data on eBPF program stack + /* Used to prototype bpf_memcmp() and other functions that access data + * on eBPF program stack */ ARG_PTR_TO_MEM, /* pointer to valid memory (stack, packet, map value) */ - ARG_PTR_TO_UNINIT_MEM, /* pointer to memory does not need to be initialized, - * helper function must fill all bytes or clear - * them in error case. - */ ARG_CONST_SIZE, /* number of bytes accessed from memory */ ARG_CONST_SIZE_OR_ZERO, /* number of bytes accessed from memory or 0 */ @@ -450,6 +448,10 @@ enum bpf_arg_type { ARG_PTR_TO_ALLOC_MEM_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_ALLOC_MEM, ARG_PTR_TO_STACK_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_STACK, ARG_PTR_TO_BTF_ID_OR_NULL = PTR_MAYBE_NULL | ARG_PTR_TO_BTF_ID, + /* pointer to memory does not need to be initialized, helper function must fill + * all bytes or clear them in error case. + */ + ARG_PTR_TO_UNINIT_MEM = MEM_UNINIT | ARG_PTR_TO_MEM, /* This must be the last entry. Its purpose is to ensure the enum is * wide enough to hold the higher bits reserved for bpf_type_flag. -- cgit From e7392b4eca84e86718a7b73aea8fa5573b06ba76 Mon Sep 17 00:00:00 2001 From: "Fabio M. De Francesco" Date: Fri, 13 May 2022 16:48:55 -0700 Subject: mm/highmem: fix kernel-doc warnings in highmem*.h Patch series "Extend and reorganize Highmem's documentation", v4. This series has the purpose to extend and reorganize Highmem's documentation. This is a work in progress because some information should still be moved from highmem.rst to highmem.h and highmem-internal.h. Specifically I'm talking about moving the "how to" information to the relevant headers, as it as been suggested by Ira Weiny (Intel). Also, this is a work in progress because some kdocs in highmem.h and highmem-internal.h should be improved. This patch (of 4): `scripts/kernel-doc -v -none include/linux/highmem*` reports the following warnings: include/linux/highmem.h:160: warning: expecting prototype for kunmap_atomic(). Prototype was for nr_free_highpages() instead include/linux/highmem.h:204: warning: No description found for return value of 'alloc_zeroed_user_highpage_movable' include/linux/highmem-internal.h:256: warning: Function parameter or member '__addr' not described in 'kunmap_atomic' include/linux/highmem-internal.h:256: warning: Excess function parameter 'addr' description in 'kunmap_atomic' Fix these warnings by (1) moving the kernel-doc comments from highmem.h to highmem-internal.h (which is the file were the kunmap_atomic() macro is actually defined), (2) extending and merging it with the comment which was already in highmem-internal.h, and (3) using correct parameter names (4) correcting a few technical inaccuracies in comments, and (5) adding a deprecation notice in kunmap_atomic() for consistency with kmap_atomic(). Link: https://lkml.kernel.org/r/20220428212455.892-1-fmdefrancesco@gmail.com Link: https://lkml.kernel.org/r/20220428212455.892-2-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco Reviewed-by: Sebastian Andrzej Siewior Reviewed-by: Ira Weiny Cc: Matthew Wilcox Cc: Mike Rapoport Cc: Catalin Marinas Cc: Will Deacon Cc: Peter Collingbourne Cc: Vlastimil Babka Cc: Jonathan Corbet Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- include/linux/highmem-internal.h | 18 +++++++++++++++--- include/linux/highmem.h | 22 ++++++++-------------- 2 files changed, 23 insertions(+), 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/highmem-internal.h b/include/linux/highmem-internal.h index a77be5630209..a694ca95c4ed 100644 --- a/include/linux/highmem-internal.h +++ b/include/linux/highmem-internal.h @@ -236,9 +236,21 @@ static inline unsigned long totalhigh_pages(void) { return 0UL; } #endif /* CONFIG_HIGHMEM */ -/* - * Prevent people trying to call kunmap_atomic() as if it were kunmap() - * kunmap_atomic() should get the return value of kmap_atomic, not the page. +/** + * kunmap_atomic - Unmap the virtual address mapped by kmap_atomic() - deprecated! + * @__addr: Virtual address to be unmapped + * + * Unmaps an address previously mapped by kmap_atomic() and re-enables + * pagefaults. Depending on PREEMP_RT configuration, re-enables also + * migration and preemption. Users should not count on these side effects. + * + * Mappings should be unmapped in the reverse order that they were mapped. + * See kmap_local_page() for details on nesting. + * + * @__addr can be any address within the mapped page, so there is no need + * to subtract any offset that has been added. In contrast to kunmap(), + * this function takes the address returned from kmap_atomic(), not the + * page passed to it. The compiler will warn you if you pass the page. */ #define kunmap_atomic(__addr) \ do { \ diff --git a/include/linux/highmem.h b/include/linux/highmem.h index 67720bfd6ade..c05ae395898a 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -37,7 +37,7 @@ static inline void *kmap(struct page *page); /** * kunmap - Unmap the virtual address mapped by kmap() - * @addr: Virtual address to be unmapped + * @page: Pointer to the page which was mapped by kmap() * * Counterpart to kmap(). A NOOP for CONFIG_HIGHMEM=n and for mappings of * pages in the low memory area. @@ -138,24 +138,16 @@ static inline void *kmap_local_folio(struct folio *folio, size_t offset); * * Returns: The virtual address of the mapping * - * Effectively a wrapper around kmap_local_page() which disables pagefaults - * and preemption. + * In fact a wrapper around kmap_local_page() which also disables pagefaults + * and, depending on PREEMPT_RT configuration, also CPU migration and + * preemption. Therefore users should not count on the latter two side effects. + * + * Mappings should always be released by kunmap_atomic(). * * Do not use in new code. Use kmap_local_page() instead. */ static inline void *kmap_atomic(struct page *page); -/** - * kunmap_atomic - Unmap the virtual address mapped by kmap_atomic() - * @addr: Virtual address to be unmapped - * - * Counterpart to kmap_atomic(). - * - * Effectively a wrapper around kunmap_local() which additionally undoes - * the side effects of kmap_atomic(), i.e. reenabling pagefaults and - * preemption. - */ - /* Highmem related interfaces for management code */ static inline unsigned int nr_free_highpages(void); static inline unsigned long totalhigh_pages(void); @@ -191,6 +183,8 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr) * @vma: The VMA the page is to be allocated for * @vaddr: The virtual address the page will be inserted into * + * Returns: The allocated and zeroed HIGHMEM page + * * This function will allocate a page for a VMA that the caller knows will * be able to migrate in the future using move_pages() or reclaimed * -- cgit From 85a85e7601263f2b4edc86136dd0172c2cfbfaa1 Mon Sep 17 00:00:00 2001 From: "Fabio M. De Francesco" Date: Fri, 13 May 2022 16:48:55 -0700 Subject: Documentation/vm: move "Using kmap-atomic" to highmem.h The use of kmap_atomic() is new code is being deprecated in favor of kmap_local_page(). For this reason the "Using kmap_atomic" section in highmem.rst is obsolete and unnecessary, but it can still help developers if it were moved to kdocs in highmem.h. Therefore, move the relevant parts of this section from highmem.rst and merge them with the kdocs in highmem.h. Link: https://lkml.kernel.org/r/20220428212455.892-4-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco Suggested-by: Ira Weiny Reviewed-by: Sebastian Andrzej Siewior Reviewed-by: Ira Weiny Cc: Jonathan Corbet Cc: Matthew Wilcox Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Catalin Marinas Cc: Mike Rapoport Cc: Peter Collingbourne Cc: Vlastimil Babka Cc: Will Deacon Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- include/linux/highmem.h | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) (limited to 'include/linux') diff --git a/include/linux/highmem.h b/include/linux/highmem.h index c05ae395898a..3af34de54330 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -145,6 +145,37 @@ static inline void *kmap_local_folio(struct folio *folio, size_t offset); * Mappings should always be released by kunmap_atomic(). * * Do not use in new code. Use kmap_local_page() instead. + * + * It is used in atomic context when code wants to access the contents of a + * page that might be allocated from high memory (see __GFP_HIGHMEM), for + * example a page in the pagecache. The API has two functions, and they + * can be used in a manner similar to the following: + * + * -- Find the page of interest. -- + * struct page *page = find_get_page(mapping, offset); + * + * -- Gain access to the contents of that page. -- + * void *vaddr = kmap_atomic(page); + * + * -- Do something to the contents of that page. -- + * memset(vaddr, 0, PAGE_SIZE); + * + * -- Unmap that page. -- + * kunmap_atomic(vaddr); + * + * Note that the kunmap_atomic() call takes the result of the kmap_atomic() + * call, not the argument. + * + * If you need to map two pages because you want to copy from one page to + * another you need to keep the kmap_atomic calls strictly nested, like: + * + * vaddr1 = kmap_atomic(page1); + * vaddr2 = kmap_atomic(page2); + * + * memcpy(vaddr1, vaddr2, PAGE_SIZE); + * + * kunmap_atomic(vaddr2); + * kunmap_atomic(vaddr1); */ static inline void *kmap_atomic(struct page *page); -- cgit From 5d4af6195c87c6b162b7963e0ad00a214b80d764 Mon Sep 17 00:00:00 2001 From: Baolin Wang Date: Fri, 13 May 2022 16:48:55 -0700 Subject: mm: rmap: fix CONT-PTE/PMD size hugetlb issue when migration On some architectures (like ARM64), it can support CONT-PTE/PMD size hugetlb, which means it can support not only PMD/PUD size hugetlb: 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page size specified. When migrating a hugetlb page, we will get the relevant page table entry by huge_pte_offset() only once to nuke it and remap it with a migration pte entry. This is correct for PMD or PUD size hugetlb, since they always contain only one pmd entry or pud entry in the page table. However this is incorrect for CONT-PTE and CONT-PMD size hugetlb, since they can contain several continuous pte or pmd entry with same page table attributes. So we will nuke or remap only one pte or pmd entry for this CONT-PTE/PMD size hugetlb page, which is not expected for hugetlb migration. The problem is we can still continue to modify the subpages' data of a hugetlb page during migrating a hugetlb page, which can cause a serious data consistent issue, since we did not nuke the page table entry and set a migration pte for the subpages of a hugetlb page. To fix this issue, we should change to use huge_ptep_clear_flush() to nuke a hugetlb page table, and remap it with set_huge_pte_at() and set_huge_swap_pte_at() when migrating a hugetlb page, which already considered the CONT-PTE or CONT-PMD size hugetlb. [akpm@linux-foundation.org: fix nommu build] [baolin.wang@linux.alibaba.com: fix build errors for !CONFIG_MMU] Link: https://lkml.kernel.org/r/a4baca670aca637e7198d9ae4543b8873cb224dc.1652270205.git.baolin.wang@linux.alibaba.com Link: https://lkml.kernel.org/r/ea5abf529f0997b5430961012bfda6166c1efc8c.1652147571.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang Reviewed-by: Muchun Song Reviewed-by: Mike Kravetz Acked-by: David Hildenbrand Cc: Alexander Gordeev Cc: Arnd Bergmann Cc: Benjamin Herrenschmidt Cc: Catalin Marinas Cc: Christian Borntraeger Cc: David S. Miller Cc: Gerald Schaefer Cc: Heiko Carstens Cc: Helge Deller Cc: James Bottomley Cc: Michael Ellerman Cc: Paul Mackerras Cc: Rich Felker Cc: Sven Schnelle Cc: Thomas Bogendoerfer Cc: Vasily Gorbik Cc: Will Deacon Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 04f0186b089b..0f2894d01333 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1093,6 +1093,17 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr pte_t *ptep, pte_t pte, unsigned long sz) { } + +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + return *ptep; +} + +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ +} #endif /* CONFIG_HUGETLB_PAGE */ static inline spinlock_t *huge_pte_lock(struct hstate *h, -- cgit From 78f39084b41d287aedb2ea55f2c1895cfa11d61a Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Fri, 13 May 2022 16:48:56 -0700 Subject: mm: hugetlb_vmemmap: add hugetlb_optimize_vmemmap sysctl We must add hugetlb_free_vmemmap=on (or "off") to the boot cmdline and reboot the server to enable or disable the feature of optimizing vmemmap pages associated with HugeTLB pages. However, rebooting usually takes a long time. So add a sysctl to enable or disable the feature at runtime without rebooting. Why we need this? There are 3 use cases. 1) The feature of minimizing overhead of struct page associated with each HugeTLB is disabled by default without passing "hugetlb_free_vmemmap=on" to the boot cmdline. When we (ByteDance) deliver the servers to the users who want to enable this feature, they have to configure the grub (change boot cmdline) and reboot the servers, whereas rebooting usually takes a long time (we have thousands of servers). It's a very bad experience for the users. So we need a approach to enable this feature after rebooting. This is a use case in our practical environment. 2) Some use cases are that HugeTLB pages are allocated 'on the fly' instead of being pulled from the HugeTLB pool, those workloads would be affected with this feature enabled. Those workloads could be identified by the characteristics of they never explicitly allocating huge pages with 'nr_hugepages' but only set 'nr_overcommit_hugepages' and then let the pages be allocated from the buddy allocator at fault time. We can confirm it is a real use case from the commit 099730d67417. For those workloads, the page fault time could be ~2x slower than before. We suspect those users want to disable this feature if the system has enabled this before and they don't think the memory savings benefit is enough to make up for the performance drop. 3) If the workload which wants vmemmap pages to be optimized and the workload which wants to set 'nr_overcommit_hugepages' and does not want the extera overhead at fault time when the overcommitted pages be allocated from the buddy allocator are deployed in the same server. The user could enable this feature and set 'nr_hugepages' and 'nr_overcommit_hugepages', then disable the feature. In this case, the overcommited HugeTLB pages will not encounter the extra overhead at fault time. Link: https://lkml.kernel.org/r/20220512041142.39501-5-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: Mike Kravetz Cc: Jonathan Corbet Cc: Luis Chamberlain Cc: Kees Cook Cc: Iurii Zaikin Cc: Oscar Salvador Cc: David Hildenbrand Cc: Masahiro Yamada Cc: Xiongchun Duan Signed-off-by: Andrew Morton --- include/linux/memory_hotplug.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index e0b2209ab71c..20d7edf62a6a 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -351,4 +351,13 @@ void arch_remove_linear_mapping(u64 start, u64 size); extern bool mhp_supports_memmap_on_memory(unsigned long size); #endif /* CONFIG_MEMORY_HOTPLUG */ +#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY +bool mhp_memmap_on_memory(void); +#else +static inline bool mhp_memmap_on_memory(void) +{ + return false; +} +#endif + #endif /* __LINUX_MEMORY_HOTPLUG_H */ -- cgit From d4a157f5a26fbf1f1fd5da47a4772a738306348f Mon Sep 17 00:00:00 2001 From: Gautam Menghani Date: Fri, 13 May 2022 16:48:57 -0700 Subject: mm/damon: add documentation for Enum value Fix the warning - "Enum value 'NR_DAMON_OPS' not described in enum 'damon_ops_id'" generated by the command "make pdfdocs" Link: https://lkml.kernel.org/r/20220508073316.141401-1-gautammenghani201@gmail.com Signed-off-by: Gautam Menghani Reviewed-by: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/damon.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index d1e6ee28a2ff..7c62da31ce4b 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -264,6 +264,7 @@ struct damos { * @DAMON_OPS_FVADDR: Monitoring operations for only fixed ranges of virtual * address spaces * @DAMON_OPS_PADDR: Monitoring operations for the physical address space + * @NR_DAMON_OPS: Number of monitoring operations implementations */ enum damon_ops_id { DAMON_OPS_VADDR, -- cgit From 81132a39c152ca09832b9e4cb748129cee5f55ec Mon Sep 17 00:00:00 2001 From: Gou Hao Date: Tue, 2 Nov 2021 10:46:48 +0800 Subject: fs: remove fget_many and fput_many interface These two interface were added in 091141a42 commit, but now there is no place to call them. The only user of fput/fget_many() was removed in commit 62906e89e63b ("io_uring: remove file batch-get optimisation"). A user of get_file_rcu_many() were removed in commit f073531070d2 ("init: add an init_dup helper"). And replace atomic_long_sub/add to atomic_long_dec/inc can improve performance. Here are the test results of unixbench: Cmd: ./Run -c 64 context1 Without patch: System Benchmarks Partial Index BASELINE RESULT INDEX Pipe-based Context Switching 4000.0 2798407.0 6996.0 ======== System Benchmarks Index Score (Partial Only) 6996.0 With patch: System Benchmarks Partial Index BASELINE RESULT INDEX Pipe-based Context Switching 4000.0 3486268.8 8715.7 ======== System Benchmarks Index Score (Partial Only) 8715.7 Signed-off-by: Gou Hao Signed-off-by: Al Viro --- include/linux/file.h | 2 -- include/linux/fs.h | 4 +--- 2 files changed, 1 insertion(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/file.h b/include/linux/file.h index 51e830b4fe3a..39704eae83e2 100644 --- a/include/linux/file.h +++ b/include/linux/file.h @@ -14,7 +14,6 @@ struct file; extern void fput(struct file *); -extern void fput_many(struct file *, unsigned int); struct file_operations; struct task_struct; @@ -47,7 +46,6 @@ static inline void fdput(struct fd fd) } extern struct file *fget(unsigned int fd); -extern struct file *fget_many(unsigned int fd, unsigned int refs); extern struct file *fget_raw(unsigned int fd); extern struct file *fget_task(struct task_struct *task, unsigned int fd); extern unsigned long __fdget(unsigned int fd); diff --git a/include/linux/fs.h b/include/linux/fs.h index bbde95387a23..3660c338bb16 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -981,9 +981,7 @@ static inline struct file *get_file(struct file *f) atomic_long_inc(&f->f_count); return f; } -#define get_file_rcu_many(x, cnt) \ - atomic_long_add_unless(&(x)->f_count, (cnt), 0) -#define get_file_rcu(x) get_file_rcu_many((x), 1) +#define get_file_rcu(x) atomic_long_inc_not_zero(&(x)->f_count) #define file_count(x) atomic_long_read(&(x)->f_count) #define MAX_NON_LFS ((1UL<<31) - 1) -- cgit From 6319194ec57b0452dcda4589d24c4e7db299c5bf Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 12 May 2022 17:08:03 -0400 Subject: Unify the primitives for file descriptor closing Currently we have 3 primitives for removing an opened file from descriptor table - pick_file(), __close_fd_get_file() and close_fd_get_file(). Their calling conventions are rather odd and there's a code duplication for no good reason. They can be unified - 1) have __range_close() cap max_fd in the very beginning; that way we don't need separate way for pick_file() to report being past the end of descriptor table. 2) make {__,}close_fd_get_file() return file (or NULL) directly, rather than returning it via struct file ** argument. Don't bother with (bogus) return value - nobody wants that -ENOENT. 3) make pick_file() return NULL on unopened descriptor - the only caller that used to care about the distinction between descriptor past the end of descriptor table and finding NULL in descriptor table doesn't give a damn after (1). 4) lift ->files_lock out of pick_file() That actually simplifies the callers, as well as the primitives themselves. Code duplication is also gone... Reviewed-by: Christian Brauner (Microsoft) Signed-off-by: Al Viro --- include/linux/fdtable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fdtable.h b/include/linux/fdtable.h index d0e78174874a..e066816f3519 100644 --- a/include/linux/fdtable.h +++ b/include/linux/fdtable.h @@ -125,7 +125,7 @@ int iterate_fd(struct files_struct *, unsigned, extern int close_fd(unsigned int fd); extern int __close_range(unsigned int fd, unsigned int max_fd, unsigned int flags); -extern int close_fd_get_file(unsigned int fd, struct file **res); +extern struct file *close_fd_get_file(unsigned int fd); extern int unshare_fd(unsigned long unshare_flags, unsigned int max_fds, struct files_struct **new_fdp); -- cgit From 03fea699b050805ad6ee111f9db04f223f3e835e Mon Sep 17 00:00:00 2001 From: Paul Gortmaker Date: Sun, 15 May 2022 21:58:30 +0100 Subject: cdrom: remove the unused driver specific disc change ioctl This was only used by the ide-cd driver, which went away in commit b7fb14d3ac63 ("ide: remove the legacy ide driver") so we might as well take advantage of that and get rid of this hook as well. Cc: Christoph Hellwig Cc: Jens Axboe Cc: Phillip Potter Signed-off-by: Paul Gortmaker Link: https://lore.kernel.org/all/20220427132436.12795-2-paul.gortmaker@windriver.com Signed-off-by: Phillip Potter Link: https://lore.kernel.org/r/20220515205833.944139-3-phil@philpotter.co.uk Signed-off-by: Jens Axboe --- include/linux/cdrom.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/cdrom.h b/include/linux/cdrom.h index 0a89f111e00e..67caa909e3e6 100644 --- a/include/linux/cdrom.h +++ b/include/linux/cdrom.h @@ -77,7 +77,6 @@ struct cdrom_device_ops { int (*tray_move) (struct cdrom_device_info *, int); int (*lock_door) (struct cdrom_device_info *, int); int (*select_speed) (struct cdrom_device_info *, int); - int (*select_disc) (struct cdrom_device_info *, int); int (*get_last_session) (struct cdrom_device_info *, struct cdrom_multisession *); int (*get_mcn) (struct cdrom_device_info *, -- cgit From ca2d89925ae3f3d5c65182ff75e58bc9b484e69c Mon Sep 17 00:00:00 2001 From: Max Gurtovoy Date: Thu, 28 Apr 2022 12:19:35 +0300 Subject: nvme: add missing status values to verbose logging Log a few more path related status codes. Signed-off-by: Max Gurtovoy Reviewed-by: Hannes Reinecke Signed-off-by: Christoph Hellwig --- include/linux/nvme.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/nvme.h b/include/linux/nvme.h index f626a445d1a8..bbabdc7600da 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -1679,9 +1679,11 @@ enum { /* * Path-related Errors: */ + NVME_SC_INTERNAL_PATH_ERROR = 0x300, NVME_SC_ANA_PERSISTENT_LOSS = 0x301, NVME_SC_ANA_INACCESSIBLE = 0x302, NVME_SC_ANA_TRANSITION = 0x303, + NVME_SC_CTRL_PATH_ERROR = 0x360, NVME_SC_HOST_PATH_ERROR = 0x370, NVME_SC_HOST_ABORTED_CMD = 0x371, -- cgit From 7c4e983c4f3cf94fcd879730c6caa877e0768a4d Mon Sep 17 00:00:00 2001 From: Alexander Duyck Date: Fri, 13 May 2022 11:33:57 -0700 Subject: net: allow gso_max_size to exceed 65536 The code for gso_max_size was added originally to allow for debugging and workaround of buggy devices that couldn't support TSO with blocks 64K in size. The original reason for limiting it to 64K was because that was the existing limits of IPv4 and non-jumbogram IPv6 length fields. With the addition of Big TCP we can remove this limit and allow the value to potentially go up to UINT_MAX and instead be limited by the tso_max_size value. So in order to support this we need to go through and clean up the remaining users of the gso_max_size value so that the values will cap at 64K for non-TCPv6 flows. In addition we can clean up the GSO_MAX_SIZE value so that 64K becomes GSO_LEGACY_MAX_SIZE and UINT_MAX will now be the upper limit for GSO_MAX_SIZE. v6: (edumazet) fixed a compile error if CONFIG_IPV6=n, in a new sk_trim_gso_size() helper. netif_set_tso_max_size() caps the requested TSO size with GSO_MAX_SIZE. Signed-off-by: Alexander Duyck Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/netdevice.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 536321691c72..ce780e352f43 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2272,7 +2272,9 @@ struct net_device { const struct rtnl_link_ops *rtnl_link_ops; /* for setting kernel sock attribute on TCP connection setup */ -#define GSO_MAX_SIZE 65536 +#define GSO_LEGACY_MAX_SIZE 65536u +#define GSO_MAX_SIZE UINT_MAX + unsigned int gso_max_size; #define TSO_LEGACY_MAX_SIZE 65536 #define TSO_MAX_SIZE UINT_MAX -- cgit From 34b92e8d19da1e44070dc0ecc2747871469ac534 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Fri, 13 May 2022 11:33:58 -0700 Subject: net: limit GSO_MAX_SIZE to 524280 bytes Make sure we will not overflow shinfo->gso_segs Minimal TCP MSS size is 8 bytes, and shinfo->gso_segs is a 16bit field. TCP_MIN_GSO_SIZE is currently defined in include/net/tcp.h, it seems cleaner to not bring tcp details into include/linux/netdevice.h Signed-off-by: Eric Dumazet Acked-by: Alexander Duyck Signed-off-by: David S. Miller --- include/linux/netdevice.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index ce780e352f43..fd38847d0dc7 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2272,14 +2272,17 @@ struct net_device { const struct rtnl_link_ops *rtnl_link_ops; /* for setting kernel sock attribute on TCP connection setup */ +#define GSO_MAX_SEGS 65535u #define GSO_LEGACY_MAX_SIZE 65536u -#define GSO_MAX_SIZE UINT_MAX +/* TCP minimal MSS is 8 (TCP_MIN_GSO_SIZE), + * and shinfo->gso_segs is a 16bit field. + */ +#define GSO_MAX_SIZE (8 * GSO_MAX_SEGS) unsigned int gso_max_size; #define TSO_LEGACY_MAX_SIZE 65536 #define TSO_MAX_SIZE UINT_MAX unsigned int tso_max_size; -#define GSO_MAX_SEGS 65535 u16 gso_max_segs; #define TSO_MAX_SEGS U16_MAX u16 tso_max_segs; -- cgit From 0fe79f28bfaf73b66b7b1562d2468f94aa03bd12 Mon Sep 17 00:00:00 2001 From: Alexander Duyck Date: Fri, 13 May 2022 11:34:03 -0700 Subject: net: allow gro_max_size to exceed 65536 Allow the gro_max_size to exceed a value larger than 65536. There weren't really any external limitations that prevented this other than the fact that IPv4 only supports a 16 bit length field. Since we have the option of adding a hop-by-hop header for IPv6 we can allow IPv6 to exceed this value and for IPv4 and non-TCP flows we can cap things at 65536 via a constant rather than relying on gro_max_size. [edumazet] limit GRO_MAX_SIZE to (8 * 65535) to avoid overflows. Signed-off-by: Alexander Duyck Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/netdevice.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index fd38847d0dc7..d57ce248004c 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2161,7 +2161,11 @@ struct net_device { struct bpf_prog __rcu *xdp_prog; unsigned long gro_flush_timeout; int napi_defer_hard_irqs; -#define GRO_MAX_SIZE 65536 +#define GRO_LEGACY_MAX_SIZE 65536u +/* TCP minimal MSS is 8 (TCP_MIN_GSO_SIZE), + * and shinfo->gso_segs is a 16bit field. + */ +#define GRO_MAX_SIZE (8 * 65535u) unsigned int gro_max_size; rx_handler_func_t __rcu *rx_handler; void __rcu *rx_handler_data; -- cgit From 80e425b613421911f89664663a7060216abcaed2 Mon Sep 17 00:00:00 2001 From: Coco Li Date: Fri, 13 May 2022 11:34:04 -0700 Subject: ipv6: Add hop-by-hop header to jumbograms in ip6_output Instead of simply forcing a 0 payload_len in IPv6 header, implement RFC 2675 and insert a custom extension header. Note that only TCP stack is currently potentially generating jumbograms, and that this extension header is purely local, it wont be sent on a physical link. This is needed so that packet capture (tcpdump and friends) can properly dissect these large packets. Signed-off-by: Coco Li Signed-off-by: Eric Dumazet Acked-by: Alexander Duyck Signed-off-by: David S. Miller --- include/linux/ipv6.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index ec5ca392eaa3..38c8203d52cb 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -145,6 +145,7 @@ struct inet6_skb_parm { #define IP6SKB_L3SLAVE 64 #define IP6SKB_JUMBOGRAM 128 #define IP6SKB_SEG6 256 +#define IP6SKB_FAKEJUMBO 512 }; #if defined(CONFIG_NET_L3_MASTER_DEV) -- cgit From 7ebd3f3ee51a9e02994cd0a1be44fbd325d1e0dc Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Fri, 13 May 2022 11:03:38 +0800 Subject: net: skb: change the definition SKB_DR_SET() The SKB_DR_OR() is used to set the drop reason to a value when it is not set or specified yet. SKB_NOT_DROPPED_YET should also be considered as not set. Reviewed-by: Jiang Biao Reviewed-by: Hao Peng Signed-off-by: Menglong Dong Signed-off-by: David S. Miller --- include/linux/skbuff.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 9d82a8b6c8f1..5a2b29df6cab 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -510,7 +510,8 @@ enum skb_drop_reason { (name = SKB_DROP_REASON_##reason) #define SKB_DR_OR(name, reason) \ do { \ - if (name == SKB_DROP_REASON_NOT_SPECIFIED) \ + if (name == SKB_DROP_REASON_NOT_SPECIFIED || \ + name == SKB_NOT_DROPPED_YET) \ SKB_DR_SET(name, reason); \ } while (0) -- cgit From 97e719a82b43c6c2bb5eebdb3c5d479a332ac2ac Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sun, 15 May 2022 21:24:53 -0700 Subject: net: fix possible race in skb_attempt_defer_free() A cpu can observe sd->defer_count reaching 128, and call smp_call_function_single_async() Problem is that the remote CPU can clear sd->defer_count before the IPI is run/acknowledged. Other cpus can queue more packets and also decide to call smp_call_function_single_async() while the pending IPI was not yet delivered. This is a common issue with smp_call_function_single_async(). Callers must ensure correct synchronization and serialization. I triggered this issue while experimenting smaller threshold. Performing the call to smp_call_function_single_async() under sd->defer_lock protection did not solve the problem. Commit 5a18ceca6350 ("smp: Allow smp_call_function_single_async() to insert locked csd") replaced an informative WARN_ON_ONCE() with a return of -EBUSY, which is often ignored. Test of CSD_FLAG_LOCK presence is racy anyway. Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists") Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/netdevice.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index d57ce248004c..cbaf312e365b 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3136,6 +3136,7 @@ struct softnet_data { /* Another possibly contended cache line */ spinlock_t defer_lock ____cacheline_aligned_in_smp; int defer_count; + int defer_ipi_scheduled; struct sk_buff *defer_list; call_single_data_t defer_csd; }; -- cgit From cf2df74e202d81b09f09d84c2d8903e0e87e9274 Mon Sep 17 00:00:00 2001 From: Felix Fietkau Date: Mon, 9 May 2022 14:26:15 +0200 Subject: net: fix dev_fill_forward_path with pppoe + bridge When calling dev_fill_forward_path on a pppoe device, the provided destination address is invalid. In order for the bridge fdb lookup to succeed, the pppoe code needs to update ctx->daddr to the correct value. Fix this by storing the address inside struct net_device_path_ctx Fixes: f6efc675c9dd ("net: ppp: resolve forwarding path for bridge pppoe devices") Signed-off-by: Felix Fietkau Signed-off-by: Pablo Neira Ayuso --- include/linux/netdevice.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index b1fbe21650bb..f736c020cde2 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -900,7 +900,7 @@ struct net_device_path_stack { struct net_device_path_ctx { const struct net_device *dev; - const u8 *daddr; + u8 daddr[ETH_ALEN]; int num_vlans; struct { -- cgit From 9db69df4bdd37eb1f65b6931ee067fb15b9a4d5c Mon Sep 17 00:00:00 2001 From: TingHan Shen Date: Thu, 12 May 2022 16:22:13 +0800 Subject: firmware: mediatek: Add adsp ipc protocol interface Some of mediatek processors contain the Tensilica HiFix DSP for audio processing. The communication between Host CPU and DSP firmware is taking place using a shared memory area for message passing. ADSP IPC protocol offers (send/recv) interfaces using mediatek-mailbox APIs. We use two mbox channels to implement a request-reply protocol. Signed-off-by: Allen-KH Cheng Signed-off-by: TingHan Shen Reviewed-by: Pierre-Louis Bossart Reviewed-by: Curtis Malainey Reviewed-by: Tzung-Bi Shih Reviewed-by: YC Hung Reviewed-by: AngeloGioacchino Del Regno Link: https://lore.kernel.org/r/20220512082215.3018-2-tinghan.shen@mediatek.com Signed-off-by: Mark Brown --- include/linux/firmware/mediatek/mtk-adsp-ipc.h | 65 ++++++++++++++++++++++++++ 1 file changed, 65 insertions(+) create mode 100644 include/linux/firmware/mediatek/mtk-adsp-ipc.h (limited to 'include/linux') diff --git a/include/linux/firmware/mediatek/mtk-adsp-ipc.h b/include/linux/firmware/mediatek/mtk-adsp-ipc.h new file mode 100644 index 000000000000..28fd313340b8 --- /dev/null +++ b/include/linux/firmware/mediatek/mtk-adsp-ipc.h @@ -0,0 +1,65 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (c) 2022 MediaTek Inc. + */ + +#ifndef MTK_ADSP_IPC_H +#define MTK_ADSP_IPC_H + +#include +#include +#include +#include + +#define MTK_ADSP_IPC_REQ 0 +#define MTK_ADSP_IPC_RSP 1 +#define MTK_ADSP_IPC_OP_REQ 0x1 +#define MTK_ADSP_IPC_OP_RSP 0x2 + +enum { + MTK_ADSP_MBOX_REPLY, + MTK_ADSP_MBOX_REQUEST, + MTK_ADSP_MBOX_NUM, +}; + +struct mtk_adsp_ipc; + +struct mtk_adsp_ipc_ops { + void (*handle_reply)(struct mtk_adsp_ipc *ipc); + void (*handle_request)(struct mtk_adsp_ipc *ipc); +}; + +struct mtk_adsp_chan { + struct mtk_adsp_ipc *ipc; + struct mbox_client cl; + struct mbox_chan *ch; + char *name; + int idx; +}; + +struct mtk_adsp_ipc { + struct mtk_adsp_chan chans[MTK_ADSP_MBOX_NUM]; + struct device *dev; + struct mtk_adsp_ipc_ops *ops; + void *private_data; +}; + +static inline void mtk_adsp_ipc_set_data(struct mtk_adsp_ipc *ipc, void *data) +{ + if (!ipc) + return; + + ipc->private_data = data; +} + +static inline void *mtk_adsp_ipc_get_data(struct mtk_adsp_ipc *ipc) +{ + if (!ipc) + return NULL; + + return ipc->private_data; +} + +int mtk_adsp_ipc_send(struct mtk_adsp_ipc *ipc, unsigned int idx, uint32_t op); + +#endif /* MTK_ADSP_IPC_H */ -- cgit From 7f8d12ea96352275c2850c24a1367166179392d2 Mon Sep 17 00:00:00 2001 From: Naohiro Aota Date: Tue, 29 Mar 2022 15:55:59 +0900 Subject: fs: add a lockdep check function for sb_start_write() Add a function sb_write_started() to allow callers to verify if sb_start_write() is properly called. It will be used for assertion in btrfs. Reviewed-by: Filipe Manana Signed-off-by: Naohiro Aota Reviewed-by: David Sterba Signed-off-by: David Sterba --- include/linux/fs.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index bbde95387a23..01d61984ce7a 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1708,6 +1708,11 @@ static inline bool __sb_start_write_trylock(struct super_block *sb, int level) #define __sb_writers_release(sb, lev) \ percpu_rwsem_release(&(sb)->s_writers.rw_sem[(lev)-1], 1, _THIS_IP_) +static inline bool sb_write_started(const struct super_block *sb) +{ + return lockdep_is_held_type(sb->s_writers.rw_sem + SB_FREEZE_WRITE - 1, 1); +} + /** * sb_end_write - drop write access to a superblock * @sb: the super we wrote to -- cgit From 908c54909ae72dcbf1d7e1440f7297187d06c275 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 5 May 2022 15:11:10 -0500 Subject: iomap: allow the file system to provide a bio_set for direct I/O Allow the file system to provide a specific bio_set for allocating direct I/O bios. This will allow file systems that use the ->submit_io hook to stash away additional information for file system use. To make use of this additional space for information in the completion path, the file system needs to override the ->bi_end_io callback and then call back into iomap, so export iomap_dio_bio_end_io for that. Reviewed-by: Darrick J. Wong Reviewed-by: Nikolay Borisov Signed-off-by: Christoph Hellwig Reviewed-by: David Sterba Signed-off-by: David Sterba --- include/linux/iomap.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iomap.h b/include/linux/iomap.h index b76f0dd149fb..cf903f1a230f 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -320,6 +320,16 @@ struct iomap_dio_ops { unsigned flags); void (*submit_io)(const struct iomap_iter *iter, struct bio *bio, loff_t file_offset); + + /* + * Filesystems wishing to attach private information to a direct io bio + * must provide a ->submit_io method that attaches the additional + * information to the bio and changes the ->bi_end_io callback to a + * custom function. This function should, at a minimum, perform any + * relevant post-processing of the bio and end with a call to + * iomap_dio_bio_end_io. + */ + struct bio_set *bio_set; }; /* @@ -349,6 +359,7 @@ struct iomap_dio *__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops, const struct iomap_dio_ops *dops, unsigned int dio_flags, size_t done_before); ssize_t iomap_dio_complete(struct iomap_dio *dio); +void iomap_dio_bio_end_io(struct bio *bio); #ifdef CONFIG_SWAP struct file; -- cgit From 786f847f43a54e63161474fe85a4f1764d871a35 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 5 May 2022 15:11:11 -0500 Subject: iomap: add per-iomap_iter private data Allow the file system to keep state for all iterations. For now only wire it up for direct I/O as there is an immediate need for it there. Reviewed-by: Darrick J. Wong Reviewed-by: Nikolay Borisov Signed-off-by: Christoph Hellwig Reviewed-by: David Sterba Signed-off-by: David Sterba --- include/linux/iomap.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iomap.h b/include/linux/iomap.h index cf903f1a230f..5b6f64f4d771 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -188,6 +188,7 @@ struct iomap_iter { unsigned flags; struct iomap iomap; struct iomap srcmap; + void *private; }; int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops); @@ -354,10 +355,10 @@ struct iomap_dio_ops { ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops, const struct iomap_dio_ops *dops, - unsigned int dio_flags, size_t done_before); + unsigned int dio_flags, void *private, size_t done_before); struct iomap_dio *__iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops, const struct iomap_dio_ops *dops, - unsigned int dio_flags, size_t done_before); + unsigned int dio_flags, void *private, size_t done_before); ssize_t iomap_dio_complete(struct iomap_dio *dio); void iomap_dio_bio_end_io(struct bio *bio); -- cgit From b3fdf9398a16f01dc013967a4ab25e99c3f4fc12 Mon Sep 17 00:00:00 2001 From: Jane Chu Date: Mon, 16 May 2022 11:21:46 -0700 Subject: x86/mce: relocate set{clear}_mce_nospec() functions Relocate the twin mce functions to arch/x86/mm/pat/set_memory.c file where they belong. While at it, fixup a function name in a comment. Reviewed-by: Christoph Hellwig Reviewed-by: Dan Williams Signed-off-by: Jane Chu Acked-by: Borislav Petkov Cc: Stephen Rothwell [sfr: gate {set,clear}_mce_nospec() by CONFIG_X86_64] Link: https://lore.kernel.org/r/165272527328.90175.8336008202048685278.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams --- include/linux/set_memory.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h index f36be5166c19..683a6c3f7179 100644 --- a/include/linux/set_memory.h +++ b/include/linux/set_memory.h @@ -42,14 +42,14 @@ static inline bool can_set_direct_map(void) #endif #endif /* CONFIG_ARCH_HAS_SET_DIRECT_MAP */ -#ifndef set_mce_nospec +#ifdef CONFIG_X86_64 +int set_mce_nospec(unsigned long pfn, bool unmap); +int clear_mce_nospec(unsigned long pfn); +#else static inline int set_mce_nospec(unsigned long pfn, bool unmap) { return 0; } -#endif - -#ifndef clear_mce_nospec static inline int clear_mce_nospec(unsigned long pfn) { return 0; -- cgit From 5898b43af954b83c4a4ee4ab85c4dbafa395822a Mon Sep 17 00:00:00 2001 From: Jane Chu Date: Mon, 16 May 2022 11:38:10 -0700 Subject: mce: fix set_mce_nospec to always unmap the whole page The set_memory_uc() approach doesn't work well in all cases. As Dan pointed out when "The VMM unmapped the bad page from guest physical space and passed the machine check to the guest." "The guest gets virtual #MC on an access to that page. When the guest tries to do set_memory_uc() and instructs cpa_flush() to do clean caches that results in taking another fault / exception perhaps because the VMM unmapped the page from the guest." Since the driver has special knowledge to handle NP or UC, mark the poisoned page with NP and let driver handle it when it comes down to repair. Please refer to discussions here for more details. https://lore.kernel.org/all/CAPcyv4hrXPb1tASBZUg-GgdVs0OOFKXMXLiHmktg_kFi7YBMyQ@mail.gmail.com/ Now since poisoned page is marked as not-present, in order to avoid writing to a not-present page and trigger kernel Oops, also fix pmem_do_write(). Fixes: 284ce4011ba6 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()") Reviewed-by: Christoph Hellwig Reviewed-by: Dan Williams Signed-off-by: Jane Chu Acked-by: Tony Luck Link: https://lore.kernel.org/r/165272615484.103830.2563950688772226611.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams --- include/linux/set_memory.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/set_memory.h b/include/linux/set_memory.h index 683a6c3f7179..369769ce7399 100644 --- a/include/linux/set_memory.h +++ b/include/linux/set_memory.h @@ -43,10 +43,10 @@ static inline bool can_set_direct_map(void) #endif /* CONFIG_ARCH_HAS_SET_DIRECT_MAP */ #ifdef CONFIG_X86_64 -int set_mce_nospec(unsigned long pfn, bool unmap); +int set_mce_nospec(unsigned long pfn); int clear_mce_nospec(unsigned long pfn); #else -static inline int set_mce_nospec(unsigned long pfn, bool unmap) +static inline int set_mce_nospec(unsigned long pfn) { return 0; } -- cgit From e511c4a3d2a1f64aafc1f5df37a2ffcf7ef91b55 Mon Sep 17 00:00:00 2001 From: Jane Chu Date: Fri, 13 May 2022 15:10:58 -0700 Subject: dax: introduce DAX_RECOVERY_WRITE dax access mode Up till now, dax_direct_access() is used implicitly for normal access, but for the purpose of recovery write, dax range with poison is requested. To make the interface clear, introduce enum dax_access_mode { DAX_ACCESS, DAX_RECOVERY_WRITE, } where DAX_ACCESS is used for normal dax access, and DAX_RECOVERY_WRITE is used for dax recovery write. Suggested-by: Dan Williams Signed-off-by: Jane Chu Reviewed-by: Christoph Hellwig Cc: Mike Snitzer Reviewed-by: Vivek Goyal Link: https://lore.kernel.org/r/165247982851.52965.11024212198889762949.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams --- include/linux/dax.h | 9 +++++++-- include/linux/device-mapper.h | 4 +++- 2 files changed, 10 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dax.h b/include/linux/dax.h index 9fc5f99a0ae2..3f1339bce3c0 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -14,6 +14,11 @@ struct iomap_ops; struct iomap_iter; struct iomap; +enum dax_access_mode { + DAX_ACCESS, + DAX_RECOVERY_WRITE, +}; + struct dax_operations { /* * direct_access: translate a device-relative @@ -21,7 +26,7 @@ struct dax_operations { * number of pages available for DAX at that pfn. */ long (*direct_access)(struct dax_device *, pgoff_t, long, - void **, pfn_t *); + enum dax_access_mode, void **, pfn_t *); /* * Validate whether this device is usable as an fsdax backing * device. @@ -178,7 +183,7 @@ static inline void dax_read_unlock(int id) bool dax_alive(struct dax_device *dax_dev); void *dax_get_private(struct dax_device *dax_dev); long dax_direct_access(struct dax_device *dax_dev, pgoff_t pgoff, long nr_pages, - void **kaddr, pfn_t *pfn); + enum dax_access_mode mode, void **kaddr, pfn_t *pfn); size_t dax_copy_from_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i); size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr, diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index c2a3758c4aaa..acdedda0d12b 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -20,6 +20,7 @@ struct dm_table; struct dm_report_zones_args; struct mapped_device; struct bio_vec; +enum dax_access_mode; /* * Type of table, mapped_device's mempool and request_queue @@ -146,7 +147,8 @@ typedef int (*dm_busy_fn) (struct dm_target *ti); * >= 0 : the number of bytes accessible at the address */ typedef long (*dm_dax_direct_access_fn) (struct dm_target *ti, pgoff_t pgoff, - long nr_pages, void **kaddr, pfn_t *pfn); + long nr_pages, enum dax_access_mode node, void **kaddr, + pfn_t *pfn); typedef int (*dm_dax_zero_page_range_fn)(struct dm_target *ti, pgoff_t pgoff, size_t nr_pages); -- cgit From 047218ec904da19c45c4a70274fc3f818a1fcba1 Mon Sep 17 00:00:00 2001 From: Jane Chu Date: Fri, 22 Apr 2022 16:45:06 -0600 Subject: dax: add .recovery_write dax_operation Introduce dax_recovery_write() operation. The function is used to recover a dax range that contains poison. Typical use case is when a user process receives a SIGBUS with si_code BUS_MCEERR_AR indicating poison(s) in a dax range, in response, the user process issues a pwrite() to the page-aligned dax range, thus clears the poison and puts valid data in the range. Reviewed-by: Christoph Hellwig Signed-off-by: Jane Chu Link: https://lore.kernel.org/r/20220422224508.440670-6-jane.chu@oracle.com Signed-off-by: Dan Williams --- include/linux/dax.h | 13 +++++++++++++ include/linux/device-mapper.h | 9 +++++++++ 2 files changed, 22 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dax.h b/include/linux/dax.h index 3f1339bce3c0..e7b81634c52a 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -35,6 +35,12 @@ struct dax_operations { sector_t, sector_t); /* zero_page_range: required operation. Zero page range */ int (*zero_page_range)(struct dax_device *, pgoff_t, size_t); + /* + * recovery_write: recover a poisoned range by DAX device driver + * capable of clearing poison. + */ + size_t (*recovery_write)(struct dax_device *dax_dev, pgoff_t pgoff, + void *addr, size_t bytes, struct iov_iter *iter); }; #if IS_ENABLED(CONFIG_DAX) @@ -45,6 +51,8 @@ void dax_write_cache(struct dax_device *dax_dev, bool wc); bool dax_write_cache_enabled(struct dax_device *dax_dev); bool dax_synchronous(struct dax_device *dax_dev); void set_dax_synchronous(struct dax_device *dax_dev); +size_t dax_recovery_write(struct dax_device *dax_dev, pgoff_t pgoff, + void *addr, size_t bytes, struct iov_iter *i); /* * Check if given mapping is supported by the file / underlying device. */ @@ -92,6 +100,11 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma, { return !(vma->vm_flags & VM_SYNC); } +static inline size_t dax_recovery_write(struct dax_device *dax_dev, + pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i) +{ + return 0; +} #endif void set_dax_nocache(struct dax_device *dax_dev); diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index acdedda0d12b..47a01c7cffdf 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -152,6 +152,14 @@ typedef long (*dm_dax_direct_access_fn) (struct dm_target *ti, pgoff_t pgoff, typedef int (*dm_dax_zero_page_range_fn)(struct dm_target *ti, pgoff_t pgoff, size_t nr_pages); +/* + * Returns: + * != 0 : number of bytes transferred + * 0 : recovery write failed + */ +typedef size_t (*dm_dax_recovery_write_fn)(struct dm_target *ti, pgoff_t pgoff, + void *addr, size_t bytes, struct iov_iter *i); + void dm_error(const char *message); struct dm_dev { @@ -201,6 +209,7 @@ struct target_type { dm_io_hints_fn io_hints; dm_dax_direct_access_fn direct_access; dm_dax_zero_page_range_fn dax_zero_page_range; + dm_dax_recovery_write_fn dax_recovery_write; /* For internal device-mapper use. */ struct list_head list; -- cgit From 89af2ce2d95c80f6da9388d9da2e35c84c2573b1 Mon Sep 17 00:00:00 2001 From: Ricardo Martinez Date: Fri, 13 May 2022 10:34:00 -0700 Subject: net: skb: Remove skb_data_area_size() skb_data_area_size() is not needed. As Jakub pointed out [1]: For Rx, drivers can use the size passed during skb allocation or use skb_tailroom(). For Tx, drivers should use skb_headlen(). [1] https://lore.kernel.org/netdev/CAHNKnsTmH-rGgWi3jtyC=ktM1DW2W1VJkYoTMJV2Z_Bt498bsg@mail.gmail.com/ Signed-off-by: Ricardo Martinez Reviewed-by: Andy Shevchenko Reviewed-by: Sergey Ryazanov Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 5a2b29df6cab..da96f0d3e753 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1765,11 +1765,6 @@ static inline void skb_set_end_offset(struct sk_buff *skb, unsigned int offset) } #endif -static inline unsigned int skb_data_area_size(struct sk_buff *skb) -{ - return skb_end_pointer(skb) - skb->data; -} - struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size, struct ubuf_info *uarg); -- cgit From e626f37e657adbab2a7abe51480925891662a5f3 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 16 May 2022 14:29:43 +0200 Subject: nvme: split the enum used for various register constants Instead of having one big enum add one for each register or field. Signed-off-by: Christoph Hellwig Reviewed-by: Keith Busch Reviewed-by: Chaitanya Kulkarni --- include/linux/nvme.h | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nvme.h b/include/linux/nvme.h index bbabdc7600da..5f6d432fa06a 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -204,8 +204,9 @@ enum { NVME_CC_SHN_MASK = 3 << NVME_CC_SHN_SHIFT, NVME_CC_IOSQES = NVME_NVM_IOSQES << NVME_CC_IOSQES_SHIFT, NVME_CC_IOCQES = NVME_NVM_IOCQES << NVME_CC_IOCQES_SHIFT, - NVME_CAP_CSS_NVM = 1 << 0, - NVME_CAP_CSS_CSI = 1 << 6, +}; + +enum { NVME_CSTS_RDY = 1 << 0, NVME_CSTS_CFS = 1 << 1, NVME_CSTS_NSSRO = 1 << 4, @@ -214,10 +215,18 @@ enum { NVME_CSTS_SHST_OCCUR = 1 << 2, NVME_CSTS_SHST_CMPLT = 2 << 2, NVME_CSTS_SHST_MASK = 3 << 2, +}; + +enum { NVME_CMBMSC_CRE = 1 << 0, NVME_CMBMSC_CMSE = 1 << 1, }; +enum { + NVME_CAP_CSS_NVM = 1 << 0, + NVME_CAP_CSS_CSI = 1 << 6, +}; + struct nvme_id_power_state { __le16 max_power; /* centiwatts */ __u8 rsvd2; -- cgit From a03dacb0316f74400846aaf144d6c73f4217ca08 Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Tue, 2 Mar 2021 15:58:21 +0900 Subject: PM / devfreq: Add cpu based scaling support to passive governor Many CPU architectures have caches that can scale independent of the CPUs. Frequency scaling of the caches is necessary to make sure that the cache is not a performance bottleneck that leads to poor performance and power. The same idea applies for RAM/DDR. To achieve this, this patch adds support for cpu based scaling to the passive governor. This is accomplished by taking the current frequency of each CPU frequency domain and then adjust the frequency of the cache (or any devfreq device) based on the frequency of the CPUs. It listens to CPU frequency transition notifiers to keep itself up to date on the current CPU frequency. To decide the frequency of the device, the governor does one of the following: * Derives the optimal devfreq device opp from required-opps property of the parent cpu opp_table. * Scales the device frequency in proportion to the CPU frequency. So, if the CPUs are running at their max frequency, the device runs at its max frequency. If the CPUs are running at their min frequency, the device runs at its min frequency. It is interpolated for frequencies in between. Tested-by: Chen-Yu Tsai Tested-by: Johnson Wang Signed-off-by: Saravana Kannan [Sibi: Integrated cpu-freqmap governor into passive_governor] Signed-off-by: Sibi Sankar [Chanwoo: Fix conflict with latest code and cleanup code] Signed-off-by: Chanwoo Choi --- include/linux/devfreq.h | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h index 142474b4af96..b1e4a6f796ce 100644 --- a/include/linux/devfreq.h +++ b/include/linux/devfreq.h @@ -38,6 +38,7 @@ enum devfreq_timer { struct devfreq; struct devfreq_governor; +struct devfreq_cpu_data; struct thermal_cooling_device; /** @@ -288,6 +289,11 @@ struct devfreq_simple_ondemand_data { #endif #if IS_ENABLED(CONFIG_DEVFREQ_GOV_PASSIVE) +enum devfreq_parent_dev_type { + DEVFREQ_PARENT_DEV, + CPUFREQ_PARENT_DEV, +}; + /** * struct devfreq_passive_data - ``void *data`` fed to struct devfreq * and devfreq_add_device @@ -299,8 +305,11 @@ struct devfreq_simple_ondemand_data { * using governors except for passive governor. * If the devfreq device has the specific method to decide * the next frequency, should use this callback. - * @this: the devfreq instance of own device. - * @nb: the notifier block for DEVFREQ_TRANSITION_NOTIFIER list + * @parent_type: the parent type of the device. + * @this: the devfreq instance of own device. + * @nb: the notifier block for DEVFREQ_TRANSITION_NOTIFIER or + * CPUFREQ_TRANSITION_NOTIFIER list. + * @parent_cpu_data: the state min/max/current frequency of all online cpu's. * * The devfreq_passive_data have to set the devfreq instance of parent * device with governors except for the passive governor. But, don't need to @@ -314,9 +323,13 @@ struct devfreq_passive_data { /* Optional callback to decide the next frequency of passvice device */ int (*get_target_freq)(struct devfreq *this, unsigned long *freq); + /* Should set the type of parent device */ + enum devfreq_parent_dev_type parent_type; + /* For passive governor's internal use. Don't need to set them */ struct devfreq *this; struct notifier_block nb; + struct devfreq_cpu_data *parent_cpu_data[NR_CPUS]; }; #endif -- cgit From 26984d9d581e5049bd75091d2e789b9cc3ea12e0 Mon Sep 17 00:00:00 2001 From: Chanwoo Choi Date: Wed, 27 Apr 2022 03:49:19 +0900 Subject: PM / devfreq: passive: Keep cpufreq_policy for possible cpus The passive governor requires the cpu data to get the next target frequency of devfreq device if depending on cpu. In order to reduce the unnecessary memory data, keep cpufreq_policy data for possible cpus instead of NR_CPU. Tested-by: Chen-Yu Tsai Tested-by: Johnson Wang Signed-off-by: Chanwoo Choi --- include/linux/devfreq.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h index b1e4a6f796ce..dc10bee75a72 100644 --- a/include/linux/devfreq.h +++ b/include/linux/devfreq.h @@ -309,7 +309,7 @@ enum devfreq_parent_dev_type { * @this: the devfreq instance of own device. * @nb: the notifier block for DEVFREQ_TRANSITION_NOTIFIER or * CPUFREQ_TRANSITION_NOTIFIER list. - * @parent_cpu_data: the state min/max/current frequency of all online cpu's. + * @cpu_data_list: the list of cpu frequency data for all cpufreq_policy. * * The devfreq_passive_data have to set the devfreq instance of parent * device with governors except for the passive governor. But, don't need to @@ -329,7 +329,7 @@ struct devfreq_passive_data { /* For passive governor's internal use. Don't need to set them */ struct devfreq *this; struct notifier_block nb; - struct devfreq_cpu_data *parent_cpu_data[NR_CPUS]; + struct list_head cpu_data_list; }; #endif -- cgit From 1ad6c3b7ef132e1d8c5d606008069724625c8daf Mon Sep 17 00:00:00 2001 From: Michael Walle Date: Tue, 5 Apr 2022 11:24:51 +0200 Subject: hwmon: introduce hwmon_sanitize_name() More and more drivers will check for bad characters in the hwmon name and all are using the same code snippet. Consolidate that code by adding a new hwmon_sanitize_name() function. Signed-off-by: Michael Walle Reviewed-by: Tom Rix Link: https://lore.kernel.org/r/20220405092452.4033674-2-michael@walle.cc Signed-off-by: Guenter Roeck --- include/linux/hwmon.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hwmon.h b/include/linux/hwmon.h index eba380b76d15..4efaf06fd2b8 100644 --- a/include/linux/hwmon.h +++ b/include/linux/hwmon.h @@ -461,6 +461,9 @@ void devm_hwmon_device_unregister(struct device *dev); int hwmon_notify_event(struct device *dev, enum hwmon_sensor_types type, u32 attr, int channel); +char *hwmon_sanitize_name(const char *name); +char *devm_hwmon_sanitize_name(struct device *dev, const char *name); + /** * hwmon_is_bad_char - Is the char invalid in a hwmon name * @ch: the char to be considered -- cgit From c8383054506c77b814489c09877b5db83fd4abf2 Mon Sep 17 00:00:00 2001 From: Jeffle Xu Date: Mon, 25 Apr 2022 20:21:24 +0800 Subject: cachefiles: notify the user daemon when looking up cookie Fscache/CacheFiles used to serve as a local cache for a remote networking fs. A new on-demand read mode will be introduced for CacheFiles, which can boost the scenario where on-demand read semantics are needed, e.g. container image distribution. The essential difference between these two modes is seen when a cache miss occurs: In the original mode, the netfs will fetch the data from the remote server and then write it to the cache file; in on-demand read mode, fetching the data and writing it into the cache is delegated to a user daemon. As the first step, notify the user daemon when looking up cookie. In this case, an anonymous fd is sent to the user daemon, through which the user daemon can write the fetched data to the cache file. Since the user daemon may move the anonymous fd around, e.g. through dup(), an object ID uniquely identifying the cache file is also attached. Also add one advisory flag (FSCACHE_ADV_WANT_CACHE_SIZE) suggesting that the cache file size shall be retrieved at runtime. This helps the scenario where one cache file contains multiple netfs files, e.g. for the purpose of deduplication. In this case, netfs itself has no idea the size of the cache file, whilst the user daemon should give the hint on it. Signed-off-by: Jeffle Xu Link: https://lore.kernel.org/r/20220509074028.74954-3-jefflexu@linux.alibaba.com Acked-by: David Howells Signed-off-by: Gao Xiang --- include/linux/fscache.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fscache.h b/include/linux/fscache.h index e25539072463..72585c9729a2 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -39,6 +39,7 @@ struct fscache_cookie; #define FSCACHE_ADV_SINGLE_CHUNK 0x01 /* The object is a single chunk of data */ #define FSCACHE_ADV_WRITE_CACHE 0x00 /* Do cache if written to locally */ #define FSCACHE_ADV_WRITE_NOCACHE 0x02 /* Don't cache if written to locally */ +#define FSCACHE_ADV_WANT_CACHE_SIZE 0x04 /* Retrieve cache size at runtime */ #define FSCACHE_INVAL_DIO_WRITE 0x01 /* Invalidate due to DIO write */ -- cgit From 9032b6e8589f269743984aac53e82e4835be16dc Mon Sep 17 00:00:00 2001 From: Jeffle Xu Date: Mon, 25 Apr 2022 20:21:27 +0800 Subject: cachefiles: implement on-demand read Implement the data plane of on-demand read mode. The early implementation [1] place the entry to cachefiles_ondemand_read() in fscache_read(). However, fscache_read() can only detect if the requested file range is fully cache miss, whilst we need to notify the user daemon as long as there's a hole inside the requested file range. Thus the entry is now placed in cachefiles_prepare_read(). When working in on-demand read mode, once a hole detected, the read routine will send a READ request to the user daemon. The user daemon needs to fetch the data and write it to the cache file. After sending the READ request, the read routine will hang there, until the READ request is handled by the user daemon. Then it will retry to read from the same file range. If no progress encountered, the read routine will fail then. A new NETFS_SREQ_ONDEMAND flag is introduced to indicate that on-demand read should be done when a cache miss encountered. [1] https://lore.kernel.org/all/20220406075612.60298-6-jefflexu@linux.alibaba.com/ #v8 Signed-off-by: Jeffle Xu Acked-by: David Howells Link: https://lore.kernel.org/r/20220425122143.56815-6-jefflexu@linux.alibaba.com Signed-off-by: Gao Xiang --- include/linux/netfs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index c7bf1eaf51d5..057d04efaf79 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -159,6 +159,7 @@ struct netfs_io_subrequest { #define NETFS_SREQ_SHORT_IO 2 /* Set if the I/O was short */ #define NETFS_SREQ_SEEK_DATA_READ 3 /* Set if ->read() should SEEK_DATA first */ #define NETFS_SREQ_NO_PROGRESS 4 /* Set if we didn't manage to read any data */ +#define NETFS_SREQ_ONDEMAND 5 /* Set if it's from on-demand read mode */ }; enum netfs_io_origin { -- cgit From 7b8b44eb7710e34a1ea5278392417993a549fabf Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Sat, 14 May 2022 10:36:58 -0400 Subject: NFSv4: Specify the type of ACL to cache When caching a NFSv4 ACL, we want to specify whether we are caching an NFSv4.0 type acl, the NFSv4.1 dacl or the NFSv4.1 sacl. Signed-off-by: Trond Myklebust Signed-off-by: Anna Schumaker --- include/linux/nfs4.h | 2 ++ include/linux/nfs_xdr.h | 7 +++++++ 2 files changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h index 5662d8be04eb..8d04b6a5964c 100644 --- a/include/linux/nfs4.h +++ b/include/linux/nfs4.h @@ -451,6 +451,8 @@ enum lock_type4 { #define FATTR4_WORD1_TIME_MODIFY (1UL << 21) #define FATTR4_WORD1_TIME_MODIFY_SET (1UL << 22) #define FATTR4_WORD1_MOUNTED_ON_FILEID (1UL << 23) +#define FATTR4_WORD1_DACL (1UL << 26) +#define FATTR4_WORD1_SACL (1UL << 27) #define FATTR4_WORD1_FS_LAYOUT_TYPES (1UL << 30) #define FATTR4_WORD2_LAYOUT_TYPES (1UL << 0) #define FATTR4_WORD2_LAYOUT_BLKSIZE (1UL << 1) diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index 2863e5a69c6a..13d068c57d8d 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -800,6 +800,13 @@ struct nfs_setattrargs { const struct nfs4_label *label; }; +enum nfs4_acl_type { + NFS4ACL_NONE = 0, + NFS4ACL_ACL, + NFS4ACL_DACL, + NFS4ACL_SACL, +}; + struct nfs_setaclargs { struct nfs4_sequence_args seq_args; struct nfs_fh * fh; -- cgit From db145db021abf75aa1a5810e631425055cfbed47 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Sat, 14 May 2022 10:36:59 -0400 Subject: NFSv4: Add encoders/decoders for the NFSv4.1 dacl and sacl attributes Add the ability to set or retrieve the acl using the NFSv4.1 'dacl' and 'sacl' attributes to the NFSv4 xdr encoders/decoders. Signed-off-by: Trond Myklebust Signed-off-by: Anna Schumaker --- include/linux/nfs_xdr.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index 13d068c57d8d..4a8ba84f848e 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -810,6 +810,7 @@ enum nfs4_acl_type { struct nfs_setaclargs { struct nfs4_sequence_args seq_args; struct nfs_fh * fh; + enum nfs4_acl_type acl_type; size_t acl_len; struct page ** acl_pages; }; @@ -821,6 +822,7 @@ struct nfs_setaclres { struct nfs_getaclargs { struct nfs4_sequence_args seq_args; struct nfs_fh * fh; + enum nfs4_acl_type acl_type; size_t acl_len; struct page ** acl_pages; }; @@ -829,6 +831,7 @@ struct nfs_getaclargs { #define NFS4_ACL_TRUNC 0x0001 /* ACL was truncated */ struct nfs_getaclres { struct nfs4_sequence_res seq_res; + enum nfs4_acl_type acl_type; size_t acl_len; size_t acl_data_offset; int acl_flags; -- cgit From 69e9cd66ae1392437234a63a3a1d60b6655f92ef Mon Sep 17 00:00:00 2001 From: Julian Orth Date: Tue, 17 May 2022 12:32:53 +0200 Subject: audit,io_uring,io-wq: call __audit_uring_exit for dummy contexts Not calling the function for dummy contexts will cause the context to not be reset. During the next syscall, this will cause an error in __audit_syscall_entry: WARN_ON(context->context != AUDIT_CTX_UNUSED); WARN_ON(context->name_count); if (context->context != AUDIT_CTX_UNUSED || context->name_count) { audit_panic("unrecoverable error in audit_syscall_entry()"); return; } These problematic dummy contexts are created via the following call chain: exit_to_user_mode_prepare -> arch_do_signal_or_restart -> get_signal -> task_work_run -> tctx_task_work -> io_req_task_submit -> io_issue_sqe -> audit_uring_entry Cc: stable@vger.kernel.org Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring") Signed-off-by: Julian Orth [PM: subject line tweaks] Signed-off-by: Paul Moore --- include/linux/audit.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/audit.h b/include/linux/audit.h index d06134ac6245..cece70231138 100644 --- a/include/linux/audit.h +++ b/include/linux/audit.h @@ -339,7 +339,7 @@ static inline void audit_uring_entry(u8 op) } static inline void audit_uring_exit(int success, long code) { - if (unlikely(!audit_dummy_context())) + if (unlikely(audit_context())) __audit_uring_exit(success, code); } static inline void audit_syscall_entry(int major, unsigned long a0, -- cgit From 0aa7be05d83cc584da0782405e8007e351dfb6cc Mon Sep 17 00:00:00 2001 From: Uros Bizjak Date: Sun, 15 May 2022 20:42:03 +0200 Subject: locking/atomic: Add generic try_cmpxchg64 support Add generic support for try_cmpxchg64{,_acquire,_release,_relaxed} and their falbacks involving cmpxchg64. Signed-off-by: Uros Bizjak Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20220515184205.103089-2-ubizjak@gmail.com --- include/linux/atomic/atomic-arch-fallback.h | 72 ++++++++++++++++++++++++++++- include/linux/atomic/atomic-instrumented.h | 40 +++++++++++++++- 2 files changed, 110 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/atomic/atomic-arch-fallback.h b/include/linux/atomic/atomic-arch-fallback.h index 6db58d180866..77bc5522e61c 100644 --- a/include/linux/atomic/atomic-arch-fallback.h +++ b/include/linux/atomic/atomic-arch-fallback.h @@ -147,6 +147,76 @@ #endif /* arch_try_cmpxchg_relaxed */ +#ifndef arch_try_cmpxchg64_relaxed +#ifdef arch_try_cmpxchg64 +#define arch_try_cmpxchg64_acquire arch_try_cmpxchg64 +#define arch_try_cmpxchg64_release arch_try_cmpxchg64 +#define arch_try_cmpxchg64_relaxed arch_try_cmpxchg64 +#endif /* arch_try_cmpxchg64 */ + +#ifndef arch_try_cmpxchg64 +#define arch_try_cmpxchg64(_ptr, _oldp, _new) \ +({ \ + typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \ + ___r = arch_cmpxchg64((_ptr), ___o, (_new)); \ + if (unlikely(___r != ___o)) \ + *___op = ___r; \ + likely(___r == ___o); \ +}) +#endif /* arch_try_cmpxchg64 */ + +#ifndef arch_try_cmpxchg64_acquire +#define arch_try_cmpxchg64_acquire(_ptr, _oldp, _new) \ +({ \ + typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \ + ___r = arch_cmpxchg64_acquire((_ptr), ___o, (_new)); \ + if (unlikely(___r != ___o)) \ + *___op = ___r; \ + likely(___r == ___o); \ +}) +#endif /* arch_try_cmpxchg64_acquire */ + +#ifndef arch_try_cmpxchg64_release +#define arch_try_cmpxchg64_release(_ptr, _oldp, _new) \ +({ \ + typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \ + ___r = arch_cmpxchg64_release((_ptr), ___o, (_new)); \ + if (unlikely(___r != ___o)) \ + *___op = ___r; \ + likely(___r == ___o); \ +}) +#endif /* arch_try_cmpxchg64_release */ + +#ifndef arch_try_cmpxchg64_relaxed +#define arch_try_cmpxchg64_relaxed(_ptr, _oldp, _new) \ +({ \ + typeof(*(_ptr)) *___op = (_oldp), ___o = *___op, ___r; \ + ___r = arch_cmpxchg64_relaxed((_ptr), ___o, (_new)); \ + if (unlikely(___r != ___o)) \ + *___op = ___r; \ + likely(___r == ___o); \ +}) +#endif /* arch_try_cmpxchg64_relaxed */ + +#else /* arch_try_cmpxchg64_relaxed */ + +#ifndef arch_try_cmpxchg64_acquire +#define arch_try_cmpxchg64_acquire(...) \ + __atomic_op_acquire(arch_try_cmpxchg64, __VA_ARGS__) +#endif + +#ifndef arch_try_cmpxchg64_release +#define arch_try_cmpxchg64_release(...) \ + __atomic_op_release(arch_try_cmpxchg64, __VA_ARGS__) +#endif + +#ifndef arch_try_cmpxchg64 +#define arch_try_cmpxchg64(...) \ + __atomic_op_fence(arch_try_cmpxchg64, __VA_ARGS__) +#endif + +#endif /* arch_try_cmpxchg64_relaxed */ + #ifndef arch_atomic_read_acquire static __always_inline int arch_atomic_read_acquire(const atomic_t *v) @@ -2386,4 +2456,4 @@ arch_atomic64_dec_if_positive(atomic64_t *v) #endif #endif /* _LINUX_ATOMIC_FALLBACK_H */ -// 8e2cc06bc0d2c0967d2f8424762bd48555ee40ae +// b5e87bdd5ede61470c29f7a7e4de781af3770f09 diff --git a/include/linux/atomic/atomic-instrumented.h b/include/linux/atomic/atomic-instrumented.h index 5d69b143c28e..7a139ec030b0 100644 --- a/include/linux/atomic/atomic-instrumented.h +++ b/include/linux/atomic/atomic-instrumented.h @@ -2006,6 +2006,44 @@ atomic_long_dec_if_positive(atomic_long_t *v) arch_try_cmpxchg_relaxed(__ai_ptr, __ai_oldp, __VA_ARGS__); \ }) +#define try_cmpxchg64(ptr, oldp, ...) \ +({ \ + typeof(ptr) __ai_ptr = (ptr); \ + typeof(oldp) __ai_oldp = (oldp); \ + kcsan_mb(); \ + instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \ + arch_try_cmpxchg64(__ai_ptr, __ai_oldp, __VA_ARGS__); \ +}) + +#define try_cmpxchg64_acquire(ptr, oldp, ...) \ +({ \ + typeof(ptr) __ai_ptr = (ptr); \ + typeof(oldp) __ai_oldp = (oldp); \ + instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \ + arch_try_cmpxchg64_acquire(__ai_ptr, __ai_oldp, __VA_ARGS__); \ +}) + +#define try_cmpxchg64_release(ptr, oldp, ...) \ +({ \ + typeof(ptr) __ai_ptr = (ptr); \ + typeof(oldp) __ai_oldp = (oldp); \ + kcsan_release(); \ + instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \ + arch_try_cmpxchg64_release(__ai_ptr, __ai_oldp, __VA_ARGS__); \ +}) + +#define try_cmpxchg64_relaxed(ptr, oldp, ...) \ +({ \ + typeof(ptr) __ai_ptr = (ptr); \ + typeof(oldp) __ai_oldp = (oldp); \ + instrument_atomic_write(__ai_ptr, sizeof(*__ai_ptr)); \ + instrument_atomic_write(__ai_oldp, sizeof(*__ai_oldp)); \ + arch_try_cmpxchg64_relaxed(__ai_ptr, __ai_oldp, __VA_ARGS__); \ +}) + #define cmpxchg_local(ptr, ...) \ ({ \ typeof(ptr) __ai_ptr = (ptr); \ @@ -2045,4 +2083,4 @@ atomic_long_dec_if_positive(atomic_long_t *v) }) #endif /* _LINUX_ATOMIC_INSTRUMENTED_H */ -// 87c974b93032afd42143613434d1a7788fa598f9 +// 764f741eb77a7ad565dc8d99ce2837d5542e8aee -- cgit From bec67592521ec816371f5f072b1a340e1c2ad434 Mon Sep 17 00:00:00 2001 From: Min Li Date: Mon, 16 May 2022 10:47:06 -0400 Subject: ptp: ptp_clockmatrix: Add PTP_CLK_REQ_EXTTS support Use TOD_READ_SECONDARY for extts to keep TOD_READ_PRIMARY for gettime and settime exclusively. Before this change, TOD_READ_PRIMARY was used for both extts and gettime/settime, which would result in changing TOD read/write triggers between operations. Using TOD_READ_SECONDARY would make extts independent of gettime/settime operation Signed-off-by: Min Li Acked-by: Richard Cochran Link: https://lore.kernel.org/r/1652712427-14703-1-git-send-email-min.li.xe@renesas.com Signed-off-by: Jakub Kicinski --- include/linux/mfd/idt8a340_reg.h | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mfd/idt8a340_reg.h b/include/linux/mfd/idt8a340_reg.h index a18c1539a152..0c706085c205 100644 --- a/include/linux/mfd/idt8a340_reg.h +++ b/include/linux/mfd/idt8a340_reg.h @@ -407,7 +407,7 @@ #define TOD_READ_PRIMARY_0 0xcc40 #define TOD_READ_PRIMARY_0_V520 0xcc50 /* 8-bit subns, 32-bit ns, 48-bit seconds */ -#define TOD_READ_PRIMARY 0x0000 +#define TOD_READ_PRIMARY_BASE 0x0000 /* Counter increments after TOD write is completed */ #define TOD_READ_PRIMARY_COUNTER 0x000b /* Read trigger configuration */ @@ -424,6 +424,16 @@ #define TOD_READ_SECONDARY_0 0xcc90 #define TOD_READ_SECONDARY_0_V520 0xcca0 +/* 8-bit subns, 32-bit ns, 48-bit seconds */ +#define TOD_READ_SECONDARY_BASE 0x0000 +/* Counter increments after TOD write is completed */ +#define TOD_READ_SECONDARY_COUNTER 0x000b +/* Read trigger configuration */ +#define TOD_READ_SECONDARY_SEL_CFG_0 0x000c +/* Read trigger selection */ +#define TOD_READ_SECONDARY_CMD 0x000e +#define TOD_READ_SECONDARY_CMD_V520 0x000f + #define TOD_READ_SECONDARY_1 0xcca0 #define TOD_READ_SECONDARY_1_V520 0xccb0 #define TOD_READ_SECONDARY_2 0xccb0 -- cgit From 1d2c717bc7f7fd3c9cf38d4a0d5d7ede06adf05b Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Fri, 13 May 2022 06:19:31 +0300 Subject: net/mlx5: Add last command failure syndrome to debugfs Add syndrome of last command failure per command type to debugfs to ease debugging of such failure. last_failed_syndrome - last command failed syndrome returned by FW. Signed-off-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index d6bac3976913..74c8cfb771a2 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -272,6 +272,8 @@ struct mlx5_cmd_stats { u32 last_failed_errno; /* last bad status returned by FW */ u8 last_failed_mbox_status; + /* last command failed syndrome returned by FW */ + u32 last_failed_syndrome; struct dentry *root; /* protect command average calculations */ spinlock_t lock; -- cgit From 9b45bde82c229fda94618896ff530dcba9d66fe0 Mon Sep 17 00:00:00 2001 From: Tariq Toukan Date: Tue, 25 Jan 2022 14:47:36 +0200 Subject: net/mlx5: Inline db alloc API function Take the wrapper version which picks default node into a header file. This reduces the number of exported functions. Signed-off-by: Tariq Toukan Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 74c8cfb771a2..b064bc278f52 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1053,9 +1053,14 @@ int mlx5_core_access_reg(struct mlx5_core_dev *dev, void *data_in, int size_in, void *data_out, int size_out, u16 reg_num, int arg, int write); -int mlx5_db_alloc(struct mlx5_core_dev *dev, struct mlx5_db *db); int mlx5_db_alloc_node(struct mlx5_core_dev *dev, struct mlx5_db *db, int node); + +static inline int mlx5_db_alloc(struct mlx5_core_dev *dev, struct mlx5_db *db) +{ + return mlx5_db_alloc_node(dev, db, dev->priv.numa_node); +} + void mlx5_db_free(struct mlx5_core_dev *dev, struct mlx5_db *db); const char *mlx5_command_str(int command); -- cgit From 94db3317781922ba52722c58061e0e8517d4d80d Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Mon, 31 Jan 2022 07:49:51 +0200 Subject: net/mlx5: Support multiport eswitch mode Multiport eswitch mode is a LAG mode that allows to add rules that forward traffic to a specific physical port without being affected by LAG affinity configuration. This mode of operation is mutual exclusive with the other LAG modes used by multipath and bonding. To make the transition between the modes, we maintain a counter on the number of rules specifying one of the uplink representors as the target of mirred egress redirect action. An example of such rule would be: $ tc filter add dev enp8s0f0_0 prot all root flower dst_mac \ 00:11:22:33:44:55 action mirred egress redirect dev enp8s0f0 If the reference count just grows to one and LAG is not in use, we create the LAG in multiport eswitch mode. Other mode changes are not allowed while in this mode. When the reference count reaches zero, we destroy the LAG and let other modes be used if needed. logic also changed such that if forwarding to some uplink destination cannot be guaranteed, we fail the operation so the rule will eventually be in software and not in hardware. Signed-off-by: Eli Cohen Reviewed-by: Mark Bloch Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 7bab3e51c61e..78b3d3465dd7 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1359,7 +1359,7 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 vhca_resource_manager[0x1]; u8 hca_cap_2[0x1]; - u8 reserved_at_21[0x1]; + u8 create_lag_when_not_master_up[0x1]; u8 dtor[0x1]; u8 event_on_vhca_state_teardown_request[0x1]; u8 event_on_vhca_state_in_use[0x1]; @@ -10816,7 +10816,8 @@ struct mlx5_ifc_dcbx_param_bits { enum { MLX5_LAG_PORT_SELECT_MODE_QUEUE_AFFINITY = 0, - MLX5_LAG_PORT_SELECT_MODE_PORT_SELECT_FT, + MLX5_LAG_PORT_SELECT_MODE_PORT_SELECT_FT = 1, + MLX5_LAG_PORT_SELECT_MODE_PORT_SELECT_MPESW = 2, }; struct mlx5_ifc_lagc_bits { -- cgit From 41929c9f628b9990d33a200c54bb0c919e089aa8 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Wed, 6 Apr 2022 22:55:05 +0200 Subject: clocksource/drivers/ixp4xx: Drop boardfile probe path The boardfiles for IXP4xx have been deleted. Delete all the quirks and code dealing with that boot path and rely solely on device tree boot. Signed-off-by: Linus Walleij Link: https://lore.kernel.org/r/20220406205505.2332821-1-linus.walleij@linaro.org Signed-off-by: Daniel Lezcano --- include/linux/platform_data/timer-ixp4xx.h | 11 ----------- 1 file changed, 11 deletions(-) delete mode 100644 include/linux/platform_data/timer-ixp4xx.h (limited to 'include/linux') diff --git a/include/linux/platform_data/timer-ixp4xx.h b/include/linux/platform_data/timer-ixp4xx.h deleted file mode 100644 index ee92ae7edaed..000000000000 --- a/include/linux/platform_data/timer-ixp4xx.h +++ /dev/null @@ -1,11 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __TIMER_IXP4XX_H -#define __TIMER_IXP4XX_H - -#include - -void __init ixp4xx_timer_setup(resource_size_t timerbase, - int timer_irq, - unsigned int timer_freq); - -#endif -- cgit From 14362a2541797cf9df0e86fb12dcd7950baf566e Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 11 May 2022 22:02:12 +0300 Subject: fsnotify: introduce mark type iterator fsnotify_foreach_iter_mark_type() is used to reduce boilerplate code of iterating all marks of a specific group interested in an event by consulting the iterator report_mask. Use an open coded version of that iterator in fsnotify_iter_next() that collects all marks of the current iteration group without consulting the iterator report_mask. At the moment, the two iterator variants are the same, but this decoupling will allow us to exclude some of the group's marks from reporting the event, for example for event on child and inode marks on parent did not request to watch events on children. Fixes: 2f02fd3fa13e ("fanotify: fix ignore mask logic for events on child and on dir") Reported-by: Jan Kara Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara Link: https://lore.kernel.org/r/20220511190213.831646-2-amir73il@gmail.com --- include/linux/fsnotify_backend.h | 31 ++++++++++++++++++++++++------- 1 file changed, 24 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 9a1a9e78f69f..9560734759fa 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -399,6 +399,7 @@ static inline bool fsnotify_valid_obj_type(unsigned int obj_type) struct fsnotify_iter_info { struct fsnotify_mark *marks[FSNOTIFY_ITER_TYPE_COUNT]; + struct fsnotify_group *current_group; unsigned int report_mask; int srcu_idx; }; @@ -415,20 +416,31 @@ static inline void fsnotify_iter_set_report_type( iter_info->report_mask |= (1U << iter_type); } -static inline void fsnotify_iter_set_report_type_mark( - struct fsnotify_iter_info *iter_info, int iter_type, - struct fsnotify_mark *mark) +static inline struct fsnotify_mark *fsnotify_iter_mark( + struct fsnotify_iter_info *iter_info, int iter_type) { - iter_info->marks[iter_type] = mark; - iter_info->report_mask |= (1U << iter_type); + if (fsnotify_iter_should_report_type(iter_info, iter_type)) + return iter_info->marks[iter_type]; + return NULL; +} + +static inline int fsnotify_iter_step(struct fsnotify_iter_info *iter, int type, + struct fsnotify_mark **markp) +{ + while (type < FSNOTIFY_ITER_TYPE_COUNT) { + *markp = fsnotify_iter_mark(iter, type); + if (*markp) + break; + type++; + } + return type; } #define FSNOTIFY_ITER_FUNCS(name, NAME) \ static inline struct fsnotify_mark *fsnotify_iter_##name##_mark( \ struct fsnotify_iter_info *iter_info) \ { \ - return (iter_info->report_mask & (1U << FSNOTIFY_ITER_TYPE_##NAME)) ? \ - iter_info->marks[FSNOTIFY_ITER_TYPE_##NAME] : NULL; \ + return fsnotify_iter_mark(iter_info, FSNOTIFY_ITER_TYPE_##NAME); \ } FSNOTIFY_ITER_FUNCS(inode, INODE) @@ -438,6 +450,11 @@ FSNOTIFY_ITER_FUNCS(sb, SB) #define fsnotify_foreach_iter_type(type) \ for (type = 0; type < FSNOTIFY_ITER_TYPE_COUNT; type++) +#define fsnotify_foreach_iter_mark_type(iter, mark, type) \ + for (type = 0; \ + type = fsnotify_iter_step(iter, type, &mark), \ + type < FSNOTIFY_ITER_TYPE_COUNT; \ + type++) /* * fsnotify_connp_t is what we embed in objects which connector can be attached -- cgit From e73aaae2fa9024832e1f42e30c787c7baf61d014 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Sat, 7 May 2022 14:03:46 +0200 Subject: siphash: use one source of truth for siphash permutations The SipHash family of permutations is currently used in three places: - siphash.c itself, used in the ordinary way it was intended. - random32.c, in a construction from an anonymous contributor. - random.c, as part of its fast_mix function. Each one of these places reinvents the wheel with the same C code, same rotation constants, and same symmetry-breaking constants. This commit tidies things up a bit by placing macros for the permutations and constants into siphash.h, where each of the three .c users can access them. It also leaves a note dissuading more users of them from emerging. Signed-off-by: Jason A. Donenfeld --- include/linux/prandom.h | 23 +++++++---------------- include/linux/siphash.h | 28 ++++++++++++++++++++++++++++ 2 files changed, 35 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/prandom.h b/include/linux/prandom.h index 056d31317e49..a4aadd2dc153 100644 --- a/include/linux/prandom.h +++ b/include/linux/prandom.h @@ -10,6 +10,7 @@ #include #include +#include u32 prandom_u32(void); void prandom_bytes(void *buf, size_t nbytes); @@ -27,15 +28,10 @@ DECLARE_PER_CPU(unsigned long, net_rand_noise); * The core SipHash round function. Each line can be executed in * parallel given enough CPU resources. */ -#define PRND_SIPROUND(v0, v1, v2, v3) ( \ - v0 += v1, v1 = rol64(v1, 13), v2 += v3, v3 = rol64(v3, 16), \ - v1 ^= v0, v0 = rol64(v0, 32), v3 ^= v2, \ - v0 += v3, v3 = rol64(v3, 21), v2 += v1, v1 = rol64(v1, 17), \ - v3 ^= v0, v1 ^= v2, v2 = rol64(v2, 32) \ -) +#define PRND_SIPROUND(v0, v1, v2, v3) SIPHASH_PERMUTATION(v0, v1, v2, v3) -#define PRND_K0 (0x736f6d6570736575 ^ 0x6c7967656e657261) -#define PRND_K1 (0x646f72616e646f6d ^ 0x7465646279746573) +#define PRND_K0 (SIPHASH_CONST_0 ^ SIPHASH_CONST_2) +#define PRND_K1 (SIPHASH_CONST_1 ^ SIPHASH_CONST_3) #elif BITS_PER_LONG == 32 /* @@ -43,14 +39,9 @@ DECLARE_PER_CPU(unsigned long, net_rand_noise); * This is weaker, but 32-bit machines are not used for high-traffic * applications, so there is less output for an attacker to analyze. */ -#define PRND_SIPROUND(v0, v1, v2, v3) ( \ - v0 += v1, v1 = rol32(v1, 5), v2 += v3, v3 = rol32(v3, 8), \ - v1 ^= v0, v0 = rol32(v0, 16), v3 ^= v2, \ - v0 += v3, v3 = rol32(v3, 7), v2 += v1, v1 = rol32(v1, 13), \ - v3 ^= v0, v1 ^= v2, v2 = rol32(v2, 16) \ -) -#define PRND_K0 0x6c796765 -#define PRND_K1 0x74656462 +#define PRND_SIPROUND(v0, v1, v2, v3) HSIPHASH_PERMUTATION(v0, v1, v2, v3) +#define PRND_K0 (HSIPHASH_CONST_0 ^ HSIPHASH_CONST_2) +#define PRND_K1 (HSIPHASH_CONST_1 ^ HSIPHASH_CONST_3) #else #error Unsupported BITS_PER_LONG diff --git a/include/linux/siphash.h b/include/linux/siphash.h index cce8a9acc76c..3af1428da559 100644 --- a/include/linux/siphash.h +++ b/include/linux/siphash.h @@ -138,4 +138,32 @@ static inline u32 hsiphash(const void *data, size_t len, return ___hsiphash_aligned(data, len, key); } +/* + * These macros expose the raw SipHash and HalfSipHash permutations. + * Do not use them directly! If you think you have a use for them, + * be sure to CC the maintainer of this file explaining why. + */ + +#define SIPHASH_PERMUTATION(a, b, c, d) ( \ + (a) += (b), (b) = rol64((b), 13), (b) ^= (a), (a) = rol64((a), 32), \ + (c) += (d), (d) = rol64((d), 16), (d) ^= (c), \ + (a) += (d), (d) = rol64((d), 21), (d) ^= (a), \ + (c) += (b), (b) = rol64((b), 17), (b) ^= (c), (c) = rol64((c), 32)) + +#define SIPHASH_CONST_0 0x736f6d6570736575ULL +#define SIPHASH_CONST_1 0x646f72616e646f6dULL +#define SIPHASH_CONST_2 0x6c7967656e657261ULL +#define SIPHASH_CONST_3 0x7465646279746573ULL + +#define HSIPHASH_PERMUTATION(a, b, c, d) ( \ + (a) += (b), (b) = rol32((b), 5), (b) ^= (a), (a) = rol32((a), 16), \ + (c) += (d), (d) = rol32((d), 8), (d) ^= (c), \ + (a) += (d), (d) = rol32((d), 7), (d) ^= (a), \ + (c) += (b), (b) = rol32((b), 13), (b) ^= (c), (c) = rol32((c), 16)) + +#define HSIPHASH_CONST_0 0U +#define HSIPHASH_CONST_1 0U +#define HSIPHASH_CONST_2 0x6c796765U +#define HSIPHASH_CONST_3 0x74656462U + #endif /* _LINUX_SIPHASH_H */ -- cgit From d4150779e60fb6c49be25572596b2cdfc5d46a09 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Wed, 11 May 2022 16:11:29 +0200 Subject: random32: use real rng for non-deterministic randomness MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit random32.c has two random number generators in it: one that is meant to be used deterministically, with some predefined seed, and one that does the same exact thing as random.c, except does it poorly. The first one has some use cases. The second one no longer does and can be replaced with calls to random.c's proper random number generator. The relatively recent siphash-based bad random32.c code was added in response to concerns that the prior random32.c was too deterministic. Out of fears that random.c was (at the time) too slow, this code was anonymously contributed. Then out of that emerged a kind of shadow entropy gathering system, with its own tentacles throughout various net code, added willy nilly. Stop👏making👏bespoke👏random👏number👏generators👏. Fortunately, recent advances in random.c mean that we can stop playing with this sketchiness, and just use get_random_u32(), which is now fast enough. In micro benchmarks using RDPMC, I'm seeing the same median cycle count between the two functions, with the mean being _slightly_ higher due to batches refilling (which we can optimize further need be). However, when doing *real* benchmarks of the net functions that actually use these random numbers, the mean cycles actually *decreased* slightly (with the median still staying the same), likely because the additional prandom code means icache misses and complexity, whereas random.c is generally already being used by something else nearby. The biggest benefit of this is that there are many users of prandom who probably should be using cryptographically secure random numbers. This makes all of those accidental cases become secure by just flipping a switch. Later on, we can do a tree-wide cleanup to remove the static inline wrapper functions that this commit adds. There are also some low-ish hanging fruits for making this even faster in the future: a get_random_u16() function for use in the networking stack will give a 2x performance boost there, using SIMD for ChaCha20 will let us compute 4 or 8 or 16 blocks of output in parallel, instead of just one, giving us large buffers for cheap, and introducing a get_random_*_bh() function that assumes irqs are already disabled will shave off a few cycles for ordinary calls. These are things we can chip away at down the road. Acked-by: Jakub Kicinski Acked-by: Theodore Ts'o Signed-off-by: Jason A. Donenfeld --- include/linux/prandom.h | 52 +++++++------------------------------------------ 1 file changed, 7 insertions(+), 45 deletions(-) (limited to 'include/linux') diff --git a/include/linux/prandom.h b/include/linux/prandom.h index a4aadd2dc153..deace5fb4e62 100644 --- a/include/linux/prandom.h +++ b/include/linux/prandom.h @@ -10,53 +10,16 @@ #include #include -#include +#include -u32 prandom_u32(void); -void prandom_bytes(void *buf, size_t nbytes); -void prandom_seed(u32 seed); -void prandom_reseed_late(void); - -DECLARE_PER_CPU(unsigned long, net_rand_noise); - -#define PRANDOM_ADD_NOISE(a, b, c, d) \ - prandom_u32_add_noise((unsigned long)(a), (unsigned long)(b), \ - (unsigned long)(c), (unsigned long)(d)) - -#if BITS_PER_LONG == 64 -/* - * The core SipHash round function. Each line can be executed in - * parallel given enough CPU resources. - */ -#define PRND_SIPROUND(v0, v1, v2, v3) SIPHASH_PERMUTATION(v0, v1, v2, v3) - -#define PRND_K0 (SIPHASH_CONST_0 ^ SIPHASH_CONST_2) -#define PRND_K1 (SIPHASH_CONST_1 ^ SIPHASH_CONST_3) - -#elif BITS_PER_LONG == 32 -/* - * On 32-bit machines, we use HSipHash, a reduced-width version of SipHash. - * This is weaker, but 32-bit machines are not used for high-traffic - * applications, so there is less output for an attacker to analyze. - */ -#define PRND_SIPROUND(v0, v1, v2, v3) HSIPHASH_PERMUTATION(v0, v1, v2, v3) -#define PRND_K0 (HSIPHASH_CONST_0 ^ HSIPHASH_CONST_2) -#define PRND_K1 (HSIPHASH_CONST_1 ^ HSIPHASH_CONST_3) - -#else -#error Unsupported BITS_PER_LONG -#endif +static inline u32 prandom_u32(void) +{ + return get_random_u32(); +} -static inline void prandom_u32_add_noise(unsigned long a, unsigned long b, - unsigned long c, unsigned long d) +static inline void prandom_bytes(void *buf, size_t nbytes) { - /* - * This is not used cryptographically; it's just - * a convenient 4-word hash function. (3 xor, 2 add, 2 rol) - */ - a ^= raw_cpu_read(net_rand_noise); - PRND_SIPROUND(a, b, c, d); - raw_cpu_write(net_rand_noise, d); + return get_random_bytes(buf, nbytes); } struct rnd_state { @@ -108,7 +71,6 @@ static inline void prandom_seed_state(struct rnd_state *state, u64 seed) state->s2 = __seed(i, 8U); state->s3 = __seed(i, 16U); state->s4 = __seed(i, 128U); - PRANDOM_ADD_NOISE(state, i, 0, 0); } /* Pseudo random number generator from numerical recipes. */ -- cgit From 2f14062bb14b0fcfcc21e6dc7d5b5c0d25966164 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Thu, 5 May 2022 02:20:22 +0200 Subject: random: handle latent entropy and command line from random_init() Currently, start_kernel() adds latent entropy and the command line to the entropy bool *after* the RNG has been initialized, deferring when it's actually used by things like stack canaries until the next time the pool is seeded. This surely is not intended. Rather than splitting up which entropy gets added where and when between start_kernel() and random_init(), just do everything in random_init(), which should eliminate these kinds of bugs in the future. While we're at it, rename the awkwardly titled "rand_initialize()" to the more standard "random_init()" nomenclature. Reviewed-by: Dominik Brodowski Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index f673fbb838b3..b52963955a99 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -14,22 +14,21 @@ struct notifier_block; extern void add_device_randomness(const void *, size_t); extern void add_bootloader_randomness(const void *, size_t); +extern void add_input_randomness(unsigned int type, unsigned int code, + unsigned int value) __latent_entropy; +extern void add_interrupt_randomness(int irq) __latent_entropy; +extern void add_hwgenerator_randomness(const void *buffer, size_t count, + size_t entropy); #if defined(LATENT_ENTROPY_PLUGIN) && !defined(__CHECKER__) static inline void add_latent_entropy(void) { - add_device_randomness((const void *)&latent_entropy, - sizeof(latent_entropy)); + add_device_randomness((const void *)&latent_entropy, sizeof(latent_entropy)); } #else static inline void add_latent_entropy(void) {} #endif -extern void add_input_randomness(unsigned int type, unsigned int code, - unsigned int value) __latent_entropy; -extern void add_interrupt_randomness(int irq) __latent_entropy; -extern void add_hwgenerator_randomness(const void *buffer, size_t count, - size_t entropy); #if IS_ENABLED(CONFIG_VMGENID) extern void add_vmfork_randomness(const void *unique_vm_id, size_t size); extern int register_random_vmfork_notifier(struct notifier_block *nb); @@ -41,7 +40,7 @@ static inline int unregister_random_vmfork_notifier(struct notifier_block *nb) { extern void get_random_bytes(void *buf, size_t nbytes); extern int wait_for_random_bytes(void); -extern int __init rand_initialize(void); +extern int __init random_init(const char *command_line); extern bool rng_is_initialized(void); extern int register_random_ready_notifier(struct notifier_block *nb); extern int unregister_random_ready_notifier(struct notifier_block *nb); -- cgit From 354201c53e61e493017b15327294b0c8ab522d69 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 16 May 2022 15:09:21 +0200 Subject: nvme: add support for TP4084 - Time-to-Ready Enhancements Add support for using longer timeouts during controller initialization and letting the controller come up with namespaces that are not ready for I/O yet. We skip these not ready namespaces during scanning and only bring them online once anoter scan is kicked off by the AEN that is set when the NRDY bit gets set in the I/O Command Set Independent Identify Namespace Data Structure. This asynchronous probing avoids blocking the kernel boot when controllers take a very long time to recover after unclean shutdowns (up to minutes). Signed-off-by: Christoph Hellwig Reviewed-by: Keith Busch Reviewed-by: Chaitanya Kulkarni Reviewed-by: Hannes Reinecke --- include/linux/nvme.h | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) (limited to 'include/linux') diff --git a/include/linux/nvme.h b/include/linux/nvme.h index 5f6d432fa06a..29ec3e3481ff 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -137,6 +137,7 @@ enum { NVME_REG_CMBMSC = 0x0050, /* Controller Memory Buffer Memory * Space Control */ + NVME_REG_CRTO = 0x0068, /* Controller Ready Timeouts */ NVME_REG_PMRCAP = 0x0e00, /* Persistent Memory Capabilities */ NVME_REG_PMRCTL = 0x0e04, /* Persistent Memory Region Control */ NVME_REG_PMRSTS = 0x0e08, /* Persistent Memory Region Status */ @@ -161,6 +162,9 @@ enum { #define NVME_CMB_BIR(cmbloc) ((cmbloc) & 0x7) #define NVME_CMB_OFST(cmbloc) (((cmbloc) >> 12) & 0xfffff) +#define NVME_CRTO_CRIMT(crto) ((crto) >> 16) +#define NVME_CRTO_CRWMT(crto) ((crto) & 0xffff) + enum { NVME_CMBSZ_SQS = 1 << 0, NVME_CMBSZ_CQS = 1 << 1, @@ -204,6 +208,7 @@ enum { NVME_CC_SHN_MASK = 3 << NVME_CC_SHN_SHIFT, NVME_CC_IOSQES = NVME_NVM_IOSQES << NVME_CC_IOSQES_SHIFT, NVME_CC_IOCQES = NVME_NVM_IOCQES << NVME_CC_IOCQES_SHIFT, + NVME_CC_CRIME = 1 << 24, }; enum { @@ -227,6 +232,11 @@ enum { NVME_CAP_CSS_CSI = 1 << 6, }; +enum { + NVME_CAP_CRMS_CRIMS = 1ULL << 59, + NVME_CAP_CRMS_CRWMS = 1ULL << 60, +}; + struct nvme_id_power_state { __le16 max_power; /* centiwatts */ __u8 rsvd2; @@ -414,6 +424,21 @@ struct nvme_id_ns { __u8 vs[3712]; }; +/* I/O Command Set Independent Identify Namespace Data Structure */ +struct nvme_id_ns_cs_indep { + __u8 nsfeat; + __u8 nmic; + __u8 rescap; + __u8 fpi; + __le32 anagrpid; + __u8 nsattr; + __u8 rsvd9; + __le16 nvmsetid; + __le16 endgid; + __u8 nstat; + __u8 rsvd15[4081]; +}; + struct nvme_zns_lbafe { __le64 zsze; __u8 zdes; @@ -478,6 +503,7 @@ enum { NVME_ID_CNS_NS_DESC_LIST = 0x03, NVME_ID_CNS_CS_NS = 0x05, NVME_ID_CNS_CS_CTRL = 0x06, + NVME_ID_CNS_NS_CS_INDEP = 0x08, NVME_ID_CNS_NS_PRESENT_LIST = 0x10, NVME_ID_CNS_NS_PRESENT = 0x11, NVME_ID_CNS_CTRL_NS_LIST = 0x12, @@ -531,6 +557,10 @@ enum { NVME_NS_DPS_PI_TYPE3 = 3, }; +enum { + NVME_NSTAT_NRDY = 1 << 0, +}; + enum { NVME_NVM_NS_16B_GUARD = 0, NVME_NVM_NS_32B_GUARD = 1, @@ -1592,6 +1622,7 @@ enum { NVME_SC_NS_WRITE_PROTECTED = 0x20, NVME_SC_CMD_INTERRUPTED = 0x21, NVME_SC_TRANSIENT_TR_ERR = 0x22, + NVME_SC_ADMIN_COMMAND_MEDIA_NOT_READY = 0x24, NVME_SC_INVALID_IO_CMD_SET = 0x2C, NVME_SC_LBA_RANGE = 0x80, -- cgit From 75dbb685f4e8786c33ddef8279bab0eadfb0731f Mon Sep 17 00:00:00 2001 From: Ilya Dryomov Date: Sat, 14 May 2022 12:16:47 +0200 Subject: libceph: fix potential use-after-free on linger ping and resends request_reinit() is not only ugly as the comment rightfully suggests, but also unsafe. Even though it is called with osdc->lock held for write in all cases, resetting the OSD request refcount can still race with handle_reply() and result in use-after-free. Taking linger ping as an example: handle_timeout thread handle_reply thread down_read(&osdc->lock) req = lookup_request(...) ... finish_request(req) # unregisters up_read(&osdc->lock) __complete_request(req) linger_ping_cb(req) # req->r_kref == 2 because handle_reply still holds its ref down_write(&osdc->lock) send_linger_ping(lreq) req = lreq->ping_req # same req # cancel_linger_request is NOT # called - handle_reply already # unregistered request_reinit(req) WARN_ON(req->r_kref != 1) # fires request_init(req) kref_init(req->r_kref) # req->r_kref == 1 after kref_init ceph_osdc_put_request(req) kref_put(req->r_kref) # req->r_kref == 0 after kref_put, req is freed !!! This happens because send_linger_ping() always (re)uses the same OSD request for watch ping requests, relying on cancel_linger_request() to unregister it from the OSD client and rip its messages out from the messenger. send_linger() does the same for watch/notify registration and watch reconnect requests. Unfortunately cancel_request() doesn't guarantee that after it returns the OSD client would be completely done with the OSD request -- a ref could still be held and the callback (if specified) could still be invoked too. The original motivation for request_reinit() was inability to deal with allocation failures in send_linger() and send_linger_ping(). Switching to using osdc->req_mempool (currently only used by CephFS) respects that and allows us to get rid of request_reinit(). Cc: stable@vger.kernel.org Signed-off-by: Ilya Dryomov Reviewed-by: Xiubo Li Acked-by: Jeff Layton --- include/linux/ceph/osd_client.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index 3431011f364d..cba8a6ffc329 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -287,6 +287,9 @@ struct ceph_osd_linger_request { rados_watcherrcb_t errcb; void *data; + struct ceph_pagelist *request_pl; + struct page **notify_id_pages; + struct page ***preply_pages; size_t *preply_len; }; -- cgit From de399236e240743ad2dd10d719c37b97ddf31996 Mon Sep 17 00:00:00 2001 From: Alexey Gladkov Date: Wed, 18 May 2022 19:17:30 +0200 Subject: ucounts: Split rlimit and ucount values and max values Since the semantics of maximum rlimit values are different, it would be better not to mix ucount and rlimit values. This will prevent the error of using inc_count/dec_ucount for rlimit parameters. This patch also renames the functions to emphasize the lack of connection between rlimit and ucount. v3: - Fix BUG:KASAN:use-after-free_in_dec_ucount. v2: - Fix the array-index-out-of-bounds that was found by the lkp project. Reported-by: kernel test robot Signed-off-by: Alexey Gladkov Signed-off-by: Eric W. Biederman Link: https://lkml.kernel.org/r/20220518171730.l65lmnnjtnxnftpq@example.org Signed-off-by: Eric W. Biederman --- include/linux/user_namespace.h | 35 ++++++++++++++++++++++------------- 1 file changed, 22 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 33a4240e6a6f..45f09bec02c4 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -54,15 +54,17 @@ enum ucount_type { UCOUNT_FANOTIFY_GROUPS, UCOUNT_FANOTIFY_MARKS, #endif + UCOUNT_COUNTS, +}; + +enum rlimit_type { UCOUNT_RLIMIT_NPROC, UCOUNT_RLIMIT_MSGQUEUE, UCOUNT_RLIMIT_SIGPENDING, UCOUNT_RLIMIT_MEMLOCK, - UCOUNT_COUNTS, + UCOUNT_RLIMIT_COUNTS, }; -#define MAX_PER_NAMESPACE_UCOUNTS UCOUNT_RLIMIT_NPROC - struct user_namespace { struct uid_gid_map uid_map; struct uid_gid_map gid_map; @@ -99,6 +101,7 @@ struct user_namespace { #endif struct ucounts *ucounts; long ucount_max[UCOUNT_COUNTS]; + long rlimit_max[UCOUNT_RLIMIT_COUNTS]; } __randomize_layout; struct ucounts { @@ -107,6 +110,7 @@ struct ucounts { kuid_t uid; atomic_t count; atomic_long_t ucount[UCOUNT_COUNTS]; + atomic_long_t rlimit[UCOUNT_RLIMIT_COUNTS]; }; extern struct user_namespace init_user_ns; @@ -120,21 +124,26 @@ struct ucounts *alloc_ucounts(struct user_namespace *ns, kuid_t uid); struct ucounts * __must_check get_ucounts(struct ucounts *ucounts); void put_ucounts(struct ucounts *ucounts); -static inline long get_ucounts_value(struct ucounts *ucounts, enum ucount_type type) +static inline long get_rlimit_value(struct ucounts *ucounts, enum rlimit_type type) { - return atomic_long_read(&ucounts->ucount[type]); + return atomic_long_read(&ucounts->rlimit[type]); } -long inc_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v); -bool dec_rlimit_ucounts(struct ucounts *ucounts, enum ucount_type type, long v); -long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum ucount_type type); -void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum ucount_type type); -bool is_ucounts_overlimit(struct ucounts *ucounts, enum ucount_type type, unsigned long max); +long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v); +bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v); +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type); +void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type); +bool is_rlimit_overlimit(struct ucounts *ucounts, enum rlimit_type type, unsigned long max); + +static inline long get_userns_rlimit_max(struct user_namespace *ns, enum rlimit_type type) +{ + return READ_ONCE(ns->rlimit_max[type]); +} -static inline void set_rlimit_ucount_max(struct user_namespace *ns, - enum ucount_type type, unsigned long max) +static inline void set_userns_rlimit_max(struct user_namespace *ns, + enum rlimit_type type, unsigned long max) { - ns->ucount_max[type] = max <= LONG_MAX ? max : LONG_MAX; + ns->rlimit_max[type] = max <= LONG_MAX ? max : LONG_MAX; } #ifdef CONFIG_USER_NS -- cgit From 3f68e69520d3d52d66a6ad872a75b7d8f2ea7665 Mon Sep 17 00:00:00 2001 From: Sunil V L Date: Thu, 19 May 2022 10:45:12 +0530 Subject: riscv/efi_stub: Add support for RISCV_EFI_BOOT_PROTOCOL Add support for getting the boot hart ID from the Linux EFI stub using RISCV_EFI_BOOT_PROTOCOL. This method is preferred over the existing DT based approach since it works irrespective of DT or ACPI. The specification of the protocol is hosted at: https://github.com/riscv-non-isa/riscv-uefi Signed-off-by: Sunil V L Acked-by: Palmer Dabbelt Reviewed-by: Heinrich Schuchardt Link: https://lore.kernel.org/r/20220519051512.136724-2-sunilvl@ventanamicro.com [ardb: minor tweaks for coding style and whitespace] Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index 580ce607d6f5..0412304ce34e 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -410,6 +410,8 @@ void efi_native_runtime_setup(void); #define LINUX_EFI_MOK_VARIABLE_TABLE_GUID EFI_GUID(0xc451ed2b, 0x9694, 0x45d3, 0xba, 0xba, 0xed, 0x9f, 0x89, 0x88, 0xa3, 0x89) #define LINUX_EFI_COCO_SECRET_AREA_GUID EFI_GUID(0xadf956ad, 0xe98c, 0x484c, 0xae, 0x11, 0xb5, 0x1c, 0x7d, 0x33, 0x64, 0x47) +#define RISCV_EFI_BOOT_PROTOCOL_GUID EFI_GUID(0xccd15fec, 0x6f73, 0x4eec, 0x83, 0x95, 0x3e, 0x69, 0xe4, 0xb9, 0x40, 0xbf) + /* * This GUID may be installed onto the kernel image's handle as a NULL protocol * to signal to the stub that the placement of the image should be respected, -- cgit From 238e34ad7d5ce36939ac1442c98c7cbb7771f4de Mon Sep 17 00:00:00 2001 From: Jishnu Prakash Date: Sun, 3 Apr 2022 18:47:47 +0530 Subject: iio: adc: qcom-vadc-common: add reverse scaling for PMIC5 Gen2 ADC_TM Add reverse scaling function for PMIC5 Gen2 ADC_TM, to convert temperature to raw ADC code, for setting thresholds for thermistor channels. Signed-off-by: Jishnu Prakash Acked-by: Jonathan Cameron Link: https://lore.kernel.org/r/1648991869-20899-3-git-send-email-quic_jprakash@quicinc.com Signed-off-by: Daniel Lezcano --- include/linux/iio/adc/qcom-vadc-common.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iio/adc/qcom-vadc-common.h b/include/linux/iio/adc/qcom-vadc-common.h index ce78d4804994..aa21b032e861 100644 --- a/include/linux/iio/adc/qcom-vadc-common.h +++ b/include/linux/iio/adc/qcom-vadc-common.h @@ -152,6 +152,8 @@ int qcom_adc5_hw_scale(enum vadc_scale_fn_type scaletype, u16 qcom_adc_tm5_temp_volt_scale(unsigned int prescale_ratio, u32 full_scale_code_volt, int temp); +u16 qcom_adc_tm5_gen2_temp_res_scale(int temp); + int qcom_adc5_prescaling_from_dt(u32 num, u32 den); int qcom_adc5_hw_settle_time_from_dt(u32 value, const unsigned int *hw_settle); -- cgit From bf70c577516b8d9fbe703371aa98bbea13661cec Mon Sep 17 00:00:00 2001 From: Manaf Meethalavalappu Pallikunhi Date: Wed, 9 Mar 2022 00:56:26 +0530 Subject: thermal/drivers/thermal_of: Add change_mode ops support for thermal_of sensor The sensor driver which register through thermal_of interface doesn't have an option to get thermal zone mode change notification from thermal core. Add support for change_mode ops in thermal_of interface so that sensor driver can use this ops for mode change notification. Signed-off-by: Manaf Meethalavalappu Pallikunhi Link: https://lore.kernel.org/r/1646767586-31908-1-git-send-email-quic_manafm@quicinc.com Signed-off-by: Daniel Lezcano --- include/linux/thermal.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index c314893970b3..365733b428d8 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -299,6 +299,8 @@ struct thermal_zone_params { * temperature. * @set_trip_temp: a pointer to a function that sets the trip temperature on * hardware. + * @change_mode: a pointer to a function that notifies the thermal zone + * mode change. */ struct thermal_zone_of_device_ops { int (*get_temp)(void *, int *); @@ -306,6 +308,7 @@ struct thermal_zone_of_device_ops { int (*set_trips)(void *, int, int); int (*set_emul_temp)(void *, int); int (*set_trip_temp)(void *, int, int); + int (*change_mode) (void *, enum thermal_device_mode); }; /* Function declarations */ -- cgit From 7782cfeca7d420e8bb707613d4cfb0f7ff29bb3a Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Fri, 13 May 2022 12:29:38 +0200 Subject: random: remove extern from functions in header Accoriding to the kernel style guide, having `extern` on functions in headers is old school and deprecated, and doesn't add anything. So remove them from random.h, and tidy up the file a little bit too. Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 77 ++++++++++++++++++++------------------------------ 1 file changed, 31 insertions(+), 46 deletions(-) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index b52963955a99..97f879ea78a5 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -12,13 +12,12 @@ struct notifier_block; -extern void add_device_randomness(const void *, size_t); -extern void add_bootloader_randomness(const void *, size_t); -extern void add_input_randomness(unsigned int type, unsigned int code, - unsigned int value) __latent_entropy; -extern void add_interrupt_randomness(int irq) __latent_entropy; -extern void add_hwgenerator_randomness(const void *buffer, size_t count, - size_t entropy); +void add_device_randomness(const void *, size_t); +void add_bootloader_randomness(const void *, size_t); +void add_input_randomness(unsigned int type, unsigned int code, + unsigned int value) __latent_entropy; +void add_interrupt_randomness(int irq) __latent_entropy; +void add_hwgenerator_randomness(const void *buffer, size_t count, size_t entropy); #if defined(LATENT_ENTROPY_PLUGIN) && !defined(__CHECKER__) static inline void add_latent_entropy(void) @@ -26,30 +25,20 @@ static inline void add_latent_entropy(void) add_device_randomness((const void *)&latent_entropy, sizeof(latent_entropy)); } #else -static inline void add_latent_entropy(void) {} +static inline void add_latent_entropy(void) { } #endif #if IS_ENABLED(CONFIG_VMGENID) -extern void add_vmfork_randomness(const void *unique_vm_id, size_t size); -extern int register_random_vmfork_notifier(struct notifier_block *nb); -extern int unregister_random_vmfork_notifier(struct notifier_block *nb); +void add_vmfork_randomness(const void *unique_vm_id, size_t size); +int register_random_vmfork_notifier(struct notifier_block *nb); +int unregister_random_vmfork_notifier(struct notifier_block *nb); #else static inline int register_random_vmfork_notifier(struct notifier_block *nb) { return 0; } static inline int unregister_random_vmfork_notifier(struct notifier_block *nb) { return 0; } #endif -extern void get_random_bytes(void *buf, size_t nbytes); -extern int wait_for_random_bytes(void); -extern int __init random_init(const char *command_line); -extern bool rng_is_initialized(void); -extern int register_random_ready_notifier(struct notifier_block *nb); -extern int unregister_random_ready_notifier(struct notifier_block *nb); -extern size_t __must_check get_random_bytes_arch(void *buf, size_t nbytes); - -#ifndef MODULE -extern const struct file_operations random_fops, urandom_fops; -#endif - +void get_random_bytes(void *buf, size_t nbytes); +size_t __must_check get_random_bytes_arch(void *buf, size_t nbytes); u32 get_random_u32(void); u64 get_random_u64(void); static inline unsigned int get_random_int(void) @@ -81,11 +70,17 @@ static inline unsigned long get_random_long(void) static inline unsigned long get_random_canary(void) { - unsigned long val = get_random_long(); - - return val & CANARY_MASK; + return get_random_long() & CANARY_MASK; } +unsigned long randomize_page(unsigned long start, unsigned long range); + +int __init random_init(const char *command_line); +bool rng_is_initialized(void); +int wait_for_random_bytes(void); +int register_random_ready_notifier(struct notifier_block *nb); +int unregister_random_ready_notifier(struct notifier_block *nb); + /* Calls wait_for_random_bytes() and then calls get_random_bytes(buf, nbytes). * Returns the result of the call to wait_for_random_bytes. */ static inline int get_random_bytes_wait(void *buf, size_t nbytes) @@ -109,8 +104,6 @@ declare_get_random_var_wait(int) declare_get_random_var_wait(long) #undef declare_get_random_var -unsigned long randomize_page(unsigned long start, unsigned long range); - /* * This is designed to be standalone for just prandom * users, but for now we include it from @@ -121,22 +114,10 @@ unsigned long randomize_page(unsigned long start, unsigned long range); #ifdef CONFIG_ARCH_RANDOM # include #else -static inline bool __must_check arch_get_random_long(unsigned long *v) -{ - return false; -} -static inline bool __must_check arch_get_random_int(unsigned int *v) -{ - return false; -} -static inline bool __must_check arch_get_random_seed_long(unsigned long *v) -{ - return false; -} -static inline bool __must_check arch_get_random_seed_int(unsigned int *v) -{ - return false; -} +static inline bool __must_check arch_get_random_long(unsigned long *v) { return false; } +static inline bool __must_check arch_get_random_int(unsigned int *v) { return false; } +static inline bool __must_check arch_get_random_seed_long(unsigned long *v) { return false; } +static inline bool __must_check arch_get_random_seed_int(unsigned int *v) { return false; } #endif /* @@ -160,8 +141,12 @@ static inline bool __init arch_get_random_long_early(unsigned long *v) #endif #ifdef CONFIG_SMP -extern int random_prepare_cpu(unsigned int cpu); -extern int random_online_cpu(unsigned int cpu); +int random_prepare_cpu(unsigned int cpu); +int random_online_cpu(unsigned int cpu); +#endif + +#ifndef MODULE +extern const struct file_operations random_fops, urandom_fops; #endif #endif /* _LINUX_RANDOM_H */ -- cgit From 7c3a8a1db5e03d02cc0abb3357a84b8b326dfac3 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Fri, 13 May 2022 12:32:23 +0200 Subject: random: use proper return types on get_random_{int,long}_wait() Before these were returning signed values, but the API is intended to be used with unsigned values. Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index 97f879ea78a5..883bf9a23d84 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -90,18 +90,18 @@ static inline int get_random_bytes_wait(void *buf, size_t nbytes) return ret; } -#define declare_get_random_var_wait(var) \ - static inline int get_random_ ## var ## _wait(var *out) { \ +#define declare_get_random_var_wait(name, ret_type) \ + static inline int get_random_ ## name ## _wait(ret_type *out) { \ int ret = wait_for_random_bytes(); \ if (unlikely(ret)) \ return ret; \ - *out = get_random_ ## var(); \ + *out = get_random_ ## name(); \ return 0; \ } -declare_get_random_var_wait(u32) -declare_get_random_var_wait(u64) -declare_get_random_var_wait(int) -declare_get_random_var_wait(long) +declare_get_random_var_wait(u32, u32) +declare_get_random_var_wait(u64, u32) +declare_get_random_var_wait(int, unsigned int) +declare_get_random_var_wait(long, unsigned long) #undef declare_get_random_var /* -- cgit From a19402634c435a4eae226df53c141cdbb9922e7b Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Fri, 13 May 2022 13:18:46 +0200 Subject: random: make consistent use of buf and len The current code was a mix of "nbytes", "count", "size", "buffer", "in", and so forth. Instead, let's clean this up by naming input parameters "buf" (or "ubuf") and "len", so that you always understand that you're reading this variety of function argument. Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index 883bf9a23d84..fc82f1dc36f1 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -12,12 +12,12 @@ struct notifier_block; -void add_device_randomness(const void *, size_t); -void add_bootloader_randomness(const void *, size_t); +void add_device_randomness(const void *buf, size_t len); +void add_bootloader_randomness(const void *buf, size_t len); void add_input_randomness(unsigned int type, unsigned int code, unsigned int value) __latent_entropy; void add_interrupt_randomness(int irq) __latent_entropy; -void add_hwgenerator_randomness(const void *buffer, size_t count, size_t entropy); +void add_hwgenerator_randomness(const void *buf, size_t len, size_t entropy); #if defined(LATENT_ENTROPY_PLUGIN) && !defined(__CHECKER__) static inline void add_latent_entropy(void) @@ -29,7 +29,7 @@ static inline void add_latent_entropy(void) { } #endif #if IS_ENABLED(CONFIG_VMGENID) -void add_vmfork_randomness(const void *unique_vm_id, size_t size); +void add_vmfork_randomness(const void *unique_vm_id, size_t len); int register_random_vmfork_notifier(struct notifier_block *nb); int unregister_random_vmfork_notifier(struct notifier_block *nb); #else @@ -37,8 +37,8 @@ static inline int register_random_vmfork_notifier(struct notifier_block *nb) { r static inline int unregister_random_vmfork_notifier(struct notifier_block *nb) { return 0; } #endif -void get_random_bytes(void *buf, size_t nbytes); -size_t __must_check get_random_bytes_arch(void *buf, size_t nbytes); +void get_random_bytes(void *buf, size_t len); +size_t __must_check get_random_bytes_arch(void *buf, size_t len); u32 get_random_u32(void); u64 get_random_u64(void); static inline unsigned int get_random_int(void) -- cgit From 248561ad25a8ba4ecbc7df42f9a5a82fd5fbb4f6 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Sat, 14 May 2022 13:09:17 +0200 Subject: random: remove get_random_bytes_arch() and add rng_has_arch_random() The RNG incorporates RDRAND into its state at boot and every time it reseeds, so there's no reason for callers to use it directly. The hashing that the RNG does on it is preferable to using the bytes raw. The only current use case of get_random_bytes_arch() is vsprintf's siphash key for pointer hashing, which uses it to initialize the pointer secret earlier than usual if RDRAND is available. In order to replace this narrow use case, just expose whether RDRAND is mixed into the RNG, with a new function called rng_has_arch_random(). With that taken care of, there are no users of get_random_bytes_arch() left, so it can be removed. Later, if trust_cpu gets turned on by default (as most distros are doing), this one use of rng_has_arch_random() can probably go away as well. Cc: Steven Rostedt Cc: Sergey Senozhatsky Acked-by: Petr Mladek # for vsprintf.c Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index fc82f1dc36f1..6af130c6edb9 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -38,7 +38,6 @@ static inline int unregister_random_vmfork_notifier(struct notifier_block *nb) { #endif void get_random_bytes(void *buf, size_t len); -size_t __must_check get_random_bytes_arch(void *buf, size_t len); u32 get_random_u32(void); u64 get_random_u64(void); static inline unsigned int get_random_int(void) @@ -77,6 +76,7 @@ unsigned long randomize_page(unsigned long start, unsigned long range); int __init random_init(const char *command_line); bool rng_is_initialized(void); +bool rng_has_arch_random(void); int wait_for_random_bytes(void); int register_random_ready_notifier(struct notifier_block *nb); int unregister_random_ready_notifier(struct notifier_block *nb); -- cgit From 6701de6c51c172b5de5633374479503c81fefc0b Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Sun, 15 May 2022 15:06:18 +0200 Subject: random: remove mostly unused async readiness notifier The register_random_ready_notifier() notifier is somewhat complicated, and was already recently rewritten to use notifier blocks. It is only used now by one consumer in the kernel, vsprintf.c, for which the async mechanism is really overly complex for what it actually needs. This commit removes register_random_ready_notifier() and unregister_random_ ready_notifier(), because it just adds complication with little utility, and changes vsprintf.c to just check on `!rng_is_initialized() && !rng_has_arch_random()`, which will eventually be true. Performance- wise, that code was already using a static branch, so there's basically no overhead at all to this change. Cc: Steven Rostedt Cc: Sergey Senozhatsky Acked-by: Petr Mladek # for vsprintf.c Reviewed-by: Petr Mladek Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index 6af130c6edb9..d2360b2825b6 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -78,8 +78,6 @@ int __init random_init(const char *command_line); bool rng_is_initialized(void); bool rng_has_arch_random(void); int wait_for_random_bytes(void); -int register_random_ready_notifier(struct notifier_block *nb); -int unregister_random_ready_notifier(struct notifier_block *nb); /* Calls wait_for_random_bytes() and then calls get_random_bytes(buf, nbytes). * Returns the result of the call to wait_for_random_bytes. */ -- cgit From 5ad7dd882e45d7fe432c32e896e2aaa0b21746ea Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Sat, 14 May 2022 13:59:30 +0200 Subject: random: move randomize_page() into mm where it belongs randomize_page is an mm function. It is documented like one. It contains the history of one. It has the naming convention of one. It looks just like another very similar function in mm, randomize_stack_top(). And it has always been maintained and updated by mm people. There is no need for it to be in random.c. In the "which shape does not look like the other ones" test, pointing to randomize_page() is correct. So move randomize_page() into mm/util.c, right next to the similar randomize_stack_top() function. This commit contains no actual code changes. Cc: Andrew Morton Signed-off-by: Jason A. Donenfeld --- include/linux/mm.h | 1 + include/linux/random.h | 2 -- 2 files changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 9f44254af8ce..b0183450e484 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2677,6 +2677,7 @@ extern int install_special_mapping(struct mm_struct *mm, unsigned long flags, struct page **pages); unsigned long randomize_stack_top(unsigned long stack_top); +unsigned long randomize_page(unsigned long start, unsigned long range); extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long); diff --git a/include/linux/random.h b/include/linux/random.h index d2360b2825b6..fae0c84027fd 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -72,8 +72,6 @@ static inline unsigned long get_random_canary(void) return get_random_long() & CANARY_MASK; } -unsigned long randomize_page(unsigned long start, unsigned long range); - int __init random_init(const char *command_line); bool rng_is_initialized(void); bool rng_has_arch_random(void); -- cgit From 135c579d77d066aece83609329150b87fe81e454 Mon Sep 17 00:00:00 2001 From: Hector Martin Date: Mon, 2 May 2022 18:25:05 +0900 Subject: tty: serial: samsung_tty: Fix suspend/resume on S5L We were restoring the IRQ masks then clearing them again, because ucon_mask wasn't set properly. Adding that makes suspend/resume work as intended. Signed-off-by: Hector Martin Link: https://lore.kernel.org/r/20220502092505.30934-1-marcan@marcan.st Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_s3c.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/serial_s3c.h b/include/linux/serial_s3c.h index f6c3323fc4c5..dec15f5b3dec 100644 --- a/include/linux/serial_s3c.h +++ b/include/linux/serial_s3c.h @@ -256,6 +256,9 @@ #define APPLE_S5L_UCON_DEFAULT (S3C2410_UCON_TXIRQMODE | \ S3C2410_UCON_RXIRQMODE | \ S3C2410_UCON_RXFIFO_TOI) +#define APPLE_S5L_UCON_MASK (APPLE_S5L_UCON_RXTO_ENA_MSK | \ + APPLE_S5L_UCON_RXTHRESH_ENA_MSK | \ + APPLE_S5L_UCON_TXTHRESH_ENA_MSK) #define APPLE_S5L_UTRSTAT_RXTHRESH (1<<4) #define APPLE_S5L_UTRSTAT_TXTHRESH (1<<5) -- cgit From 0b6c14bdd908879078ec85d65cfb78472a97e4e7 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Fri, 1 Apr 2022 13:10:31 -0400 Subject: SUNRPC: Make cache_req::thread_wait an unsigned long The second parameter of wait_for_completion_interruptible_timeout() is a jiffies value whose type is "unsigned long". Avoid an unnecessary and potentially fraught implicit type conversion at the wait_for_completion_interruptible_timeout() call site in cache_wait_req(). Signed-off-by: Chuck Lever --- include/linux/sunrpc/cache.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/cache.h b/include/linux/sunrpc/cache.h index b134b2b3371c..ec5a555df96f 100644 --- a/include/linux/sunrpc/cache.h +++ b/include/linux/sunrpc/cache.h @@ -121,17 +121,17 @@ struct cache_detail { struct net *net; }; - /* this must be embedded in any request structure that * identifies an object that will want a callback on * a cache fill */ struct cache_req { struct cache_deferred_req *(*defer)(struct cache_req *req); - int thread_wait; /* How long (jiffies) we can block the - * current thread to wait for updates. - */ + unsigned long thread_wait; /* How long (jiffies) we can block the + * current thread to wait for updates. + */ }; + /* this must be embedded in a deferred_request that is being * delayed awaiting cache-fill */ -- cgit From 983084b2672c593959e3148d6a17c8b920797dde Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 6 Apr 2022 14:38:59 -0400 Subject: SUNRPC: Remove svc_rqst::rq_xprt_hlen Clean up: This field is now always set to zero. Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 217711fc9cac..b73704fbd09b 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -257,7 +257,6 @@ struct svc_rqst { void * rq_xprt_ctxt; /* transport specific context ptr */ struct svc_deferred_req*rq_deferred; /* deferred request we are replaying */ - size_t rq_xprt_hlen; /* xprt header len */ struct xdr_buf rq_arg; struct xdr_stream rq_arg_stream; struct xdr_stream rq_res_stream; @@ -397,7 +396,6 @@ struct svc_deferred_req { size_t daddrlen; void *xprt_ctxt; struct cache_deferred_req handle; - size_t xprt_hlen; int argslen; __be32 args[]; }; -- cgit From 591502c5cb325b1c6ec59ab161927d606b918aa0 Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Mon, 2 May 2022 14:19:24 -0700 Subject: fs/lock: add helper locks_owner_has_blockers to check for blockers Add helper locks_owner_has_blockers to check if there is any blockers for a given lockowner. Reviewed-by: J. Bruce Fields Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton --- include/linux/fs.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index bbde95387a23..b8ed7f974fb4 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1174,6 +1174,8 @@ extern void lease_unregister_notifier(struct notifier_block *); struct files_struct; extern void show_fd_locks(struct seq_file *f, struct file *filp, struct files_struct *files); +extern bool locks_owner_has_blockers(struct file_lock_context *flctx, + fl_owner_t owner); #else /* !CONFIG_FILE_LOCKING */ static inline int fcntl_getlk(struct file *file, unsigned int cmd, struct flock __user *user) @@ -1309,6 +1311,11 @@ static inline int lease_modify(struct file_lock *fl, int arg, struct files_struct; static inline void show_fd_locks(struct seq_file *f, struct file *filp, struct files_struct *files) {} +static inline bool locks_owner_has_blockers(struct file_lock_context *flctx, + fl_owner_t owner) +{ + return false; +} #endif /* !CONFIG_FILE_LOCKING */ static inline struct inode *file_inode(const struct file *f) -- cgit From 2443da2259e97688f93d64d17ab69b15f466078a Mon Sep 17 00:00:00 2001 From: Dai Ngo Date: Mon, 2 May 2022 14:19:25 -0700 Subject: fs/lock: add 2 callbacks to lock_manager_operations to resolve conflict Add 2 new callbacks, lm_lock_expirable and lm_expire_lock, to lock_manager_operations to allow the lock manager to take appropriate action to resolve the lock conflict if possible. A new field, lm_mod_owner, is also added to lock_manager_operations. The lm_mod_owner is used by the fs/lock code to make sure the lock manager module such as nfsd, is not freed while lock conflict is being resolved. lm_lock_expirable checks and returns true to indicate that the lock conflict can be resolved else return false. This callback must be called with the flc_lock held so it can not block. lm_expire_lock is called to resolve the lock conflict if the returned value from lm_lock_expirable is true. This callback is called without the flc_lock held since it's allowed to block. Upon returning from this callback, the lock conflict should be resolved and the caller is expected to restart the conflict check from the beginnning of the list. Lock manager, such as NFSv4 courteous server, uses this callback to resolve conflict by destroying lock owner, or the NFSv4 courtesy client (client that has expired but allowed to maintains its states) that owns the lock. Reviewed-by: J. Bruce Fields Signed-off-by: Dai Ngo Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton --- include/linux/fs.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index b8ed7f974fb4..aa6c1bbdb8c4 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1029,6 +1029,7 @@ struct file_lock_operations { }; struct lock_manager_operations { + void *lm_mod_owner; fl_owner_t (*lm_get_owner)(fl_owner_t); void (*lm_put_owner)(fl_owner_t); void (*lm_notify)(struct file_lock *); /* unblock callback */ @@ -1037,6 +1038,8 @@ struct lock_manager_operations { int (*lm_change)(struct file_lock *, int, struct list_head *); void (*lm_setup)(struct file_lock *, void **); bool (*lm_breaker_owns_lease)(struct file_lock *); + bool (*lm_lock_expirable)(struct file_lock *cfl); + void (*lm_expire_lock)(void); }; struct lock_manager { -- cgit From 2059b698a2efcce3c67b0e61eab7d7680bbe10bd Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 10 May 2022 11:45:53 -0400 Subject: SUNRPC: Simplify synopsis of svc_pool_for_cpu() Clean up: There is one caller. The @cpu argument can be made implicit now that a get_cpu/put_cpu pair is no longer needed. Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index b73704fbd09b..daecb009c05b 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -504,7 +504,7 @@ int svc_register(const struct svc_serv *, struct net *, const int, void svc_wake_up(struct svc_serv *); void svc_reserve(struct svc_rqst *rqstp, int space); -struct svc_pool * svc_pool_for_cpu(struct svc_serv *serv, int cpu); +struct svc_pool *svc_pool_for_cpu(struct svc_serv *serv); char * svc_print_addr(struct svc_rqst *, char *, size_t); const char * svc_proc_name(const struct svc_rqst *rqstp); int svc_encode_result_payload(struct svc_rqst *rqstp, -- cgit From 53c83d6d8e399fad3d3d25df0ea0d54eb0f94f88 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Wed, 18 May 2022 15:23:45 +0200 Subject: siphash: add SPDX tags as sole licensing authority The text "dual BSD/GPLv2 license" is somewhat ambiguous, and moving this over to SPDX is overdue. This commit adds SPDX tags to the relevant files and clarifies that it's GPLv2 only and 3-clause BSD. It also removes the old text, so that the SPDX tags are the only source of the information. Suggested-by: Thomas Gleixner Signed-off-by: Jason A. Donenfeld Signed-off-by: Greg Kroah-Hartman --- include/linux/siphash.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/siphash.h b/include/linux/siphash.h index cce8a9acc76c..9e822311b29e 100644 --- a/include/linux/siphash.h +++ b/include/linux/siphash.h @@ -1,6 +1,5 @@ -/* Copyright (C) 2016 Jason A. Donenfeld . All Rights Reserved. - * - * This file is provided under a dual BSD/GPLv2 license. +/* SPDX-License-Identifier: (GPL-2.0-only OR BSD-3-Clause) */ +/* Copyright (C) 2016-2022 Jason A. Donenfeld . All Rights Reserved. * * SipHash: a fast short-input PRF * https://131002.net/siphash/ -- cgit From e6d3c99adf54fc1dcd07729990dc32acd3be874b Mon Sep 17 00:00:00 2001 From: Abhyuday Godhasara Date: Wed, 27 Apr 2022 00:48:03 -0700 Subject: driver: soc: xilinx: Update function prototype for xlnx_unregister_event As per the current implementation only single callback data gets saved per event, driver is throwing an error if try to register multiple callback for same event. So at time of unregistration of any event required things are event details and callback handler as parameter of xlnx_unregister_event(). As part of adding support of multiple callbacks for same event also require change in prototype of xlnx_unregister_event(). During unregistration of any events, now required things are event details, callback handler and agent's private data as parameter of xlnx_unregister_event(). Also modify the usage of xlnx_unregister_event() in xilinx/zynqmp_power.c driver as per new implementation. Signed-off-by: Abhyuday Godhasara Link: https://lore.kernel.org/r/20220427074803.19009-3-abhyuday.godhasara@xilinx.com Signed-off-by: Greg Kroah-Hartman --- include/linux/firmware/xlnx-event-manager.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/firmware/xlnx-event-manager.h b/include/linux/firmware/xlnx-event-manager.h index 3f87c4929d21..82e8254b0f80 100644 --- a/include/linux/firmware/xlnx-event-manager.h +++ b/include/linux/firmware/xlnx-event-manager.h @@ -17,7 +17,7 @@ int xlnx_register_event(const enum pm_api_cb_id cb_type, const u32 node_id, event_cb_func_t cb_fun, void *data); int xlnx_unregister_event(const enum pm_api_cb_id cb_type, const u32 node_id, - const u32 event, event_cb_func_t cb_fun); + const u32 event, event_cb_func_t cb_fun, void *data); #else static inline int xlnx_register_event(const enum pm_api_cb_id cb_type, const u32 node_id, const u32 event, const bool wake, @@ -27,7 +27,7 @@ static inline int xlnx_register_event(const enum pm_api_cb_id cb_type, const u32 } static inline int xlnx_unregister_event(const enum pm_api_cb_id cb_type, const u32 node_id, - const u32 event, event_cb_func_t cb_fun) + const u32 event, event_cb_func_t cb_fun, void *data) { return -ENODEV; } -- cgit From 885525c1e7e27ea6207d648a8db20dfbbd9e4238 Mon Sep 17 00:00:00 2001 From: Miquel Raynal Date: Wed, 27 Apr 2022 11:56:48 +0200 Subject: clk: renesas: r9a06g032: Export function to set dmamux The dmamux register is located within the system controller. Without syscon, we need an extra helper in order to give write access to this register to a dmamux driver. Signed-off-by: Miquel Raynal Acked-by: Stephen Boyd Reviewed-by: Geert Uytterhoeven Acked-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/20220427095653.91804-5-miquel.raynal@bootlin.com Signed-off-by: Vinod Koul --- include/linux/soc/renesas/r9a06g032-sysctrl.h | 11 +++++++++++ 1 file changed, 11 insertions(+) create mode 100644 include/linux/soc/renesas/r9a06g032-sysctrl.h (limited to 'include/linux') diff --git a/include/linux/soc/renesas/r9a06g032-sysctrl.h b/include/linux/soc/renesas/r9a06g032-sysctrl.h new file mode 100644 index 000000000000..066dfb15cbdd --- /dev/null +++ b/include/linux/soc/renesas/r9a06g032-sysctrl.h @@ -0,0 +1,11 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __LINUX_SOC_RENESAS_R9A06G032_SYSCTRL_H__ +#define __LINUX_SOC_RENESAS_R9A06G032_SYSCTRL_H__ + +#ifdef CONFIG_CLK_R9A06G032 +int r9a06g032_sysctrl_set_dmamux(u32 mask, u32 val); +#else +static inline int r9a06g032_sysctrl_set_dmamux(u32 mask, u32 val) { return -ENODEV; } +#endif + +#endif /* __LINUX_SOC_RENESAS_R9A06G032_SYSCTRL_H__ */ -- cgit From 13dfd97a341a5cf9d15f415dd469f45e971ef12a Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Thu, 19 May 2022 14:02:32 +0300 Subject: notifier: Add atomic_notifier_call_chain_is_empty() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add atomic_notifier_call_chain_is_empty() that returns true if given atomic call chain is empty. The first user of this new notifier API function will be the kernel power-off core code that will support power-off call chains. The core code will need to check whether there is a power-off handler registered at all in order to decide whether to halt machine or power it off. Reviewed-by: Michał Mirosław Signed-off-by: Dmitry Osipenko Signed-off-by: Rafael J. Wysocki --- include/linux/notifier.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/notifier.h b/include/linux/notifier.h index 87069b8459af..95e2440037de 100644 --- a/include/linux/notifier.h +++ b/include/linux/notifier.h @@ -173,6 +173,8 @@ extern int blocking_notifier_call_chain_robust(struct blocking_notifier_head *nh extern int raw_notifier_call_chain_robust(struct raw_notifier_head *nh, unsigned long val_up, unsigned long val_down, void *v); +extern bool atomic_notifier_call_chain_is_empty(struct atomic_notifier_head *nh); + #define NOTIFY_DONE 0x0000 /* Don't care */ #define NOTIFY_OK 0x0001 /* Suits me */ #define NOTIFY_STOP_MASK 0x8000 /* Don't call further */ -- cgit From c82f898d873ce10d88f8d60b38c8dd19f1ed7c66 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 10 May 2022 02:32:10 +0300 Subject: notifier: Add blocking/atomic_notifier_chain_register_unique_prio() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add variant of blocking/atomic_notifier_chain_register() functions that allow registration of a notifier only if it has unique priority, otherwise -EBUSY error code is returned by the new functions. Reviewed-by: Michał Mirosław Signed-off-by: Dmitry Osipenko Signed-off-by: Rafael J. Wysocki --- include/linux/notifier.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/notifier.h b/include/linux/notifier.h index 95e2440037de..aef88c2d1173 100644 --- a/include/linux/notifier.h +++ b/include/linux/notifier.h @@ -150,6 +150,11 @@ extern int raw_notifier_chain_register(struct raw_notifier_head *nh, extern int srcu_notifier_chain_register(struct srcu_notifier_head *nh, struct notifier_block *nb); +extern int atomic_notifier_chain_register_unique_prio( + struct atomic_notifier_head *nh, struct notifier_block *nb); +extern int blocking_notifier_chain_register_unique_prio( + struct blocking_notifier_head *nh, struct notifier_block *nb); + extern int atomic_notifier_chain_unregister(struct atomic_notifier_head *nh, struct notifier_block *nb); extern int blocking_notifier_chain_unregister(struct blocking_notifier_head *nh, -- cgit From 232edc2f72f5beda3586360de6254d443820050a Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 10 May 2022 02:32:11 +0300 Subject: kernel/reboot: Introduce sys-off handler API In order to support power-off chaining we need to get rid of the global pm_* variables, replacing them with the new kernel API functions that support chaining. Introduce new generic sys-off handler API that brings the following features: 1. Power-off and restart handlers are registered using same API function that supports chaining, hence all power-off and restart modes will support chaining using this unified function. 2. Prevents notifier priority collisions by disallowing registration of multiple handlers at the non-default priority level. 3. Supports passing opaque user argument to callback, which allows us to remove global variables from drivers. This patch adds support of the following sys-off modes: - SYS_OFF_MODE_POWER_OFF_PREPARE that replaces global pm_power_off_prepare variable and provides chaining support for power-off-prepare handlers. - SYS_OFF_MODE_POWER_OFF that replaces global pm_power_off variable and provides chaining support for power-off handlers. - SYS_OFF_MODE_RESTART that provides a better restart API, removing a need from drivers to have a global scratch variable by utilizing the opaque callback argument. Signed-off-by: Dmitry Osipenko Signed-off-by: Rafael J. Wysocki --- include/linux/reboot.h | 77 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 77 insertions(+) (limited to 'include/linux') diff --git a/include/linux/reboot.h b/include/linux/reboot.h index af907a3d68d1..b79f8942ea5f 100644 --- a/include/linux/reboot.h +++ b/include/linux/reboot.h @@ -7,6 +7,7 @@ #include struct device; +struct sys_off_handler; #define SYS_DOWN 0x0001 /* Notify of system down */ #define SYS_RESTART SYS_DOWN @@ -62,6 +63,82 @@ extern void machine_shutdown(void); struct pt_regs; extern void machine_crash_shutdown(struct pt_regs *); +/* + * sys-off handler API. + */ + +/* + * Standard sys-off priority levels. Users are expected to set priorities + * relative to the standard levels. + * + * SYS_OFF_PRIO_PLATFORM: Use this for platform-level handlers. + * + * SYS_OFF_PRIO_LOW: Use this for handler of last resort. + * + * SYS_OFF_PRIO_DEFAULT: Use this for normal handlers. + * + * SYS_OFF_PRIO_HIGH: Use this for higher priority handlers. + * + * SYS_OFF_PRIO_FIRMWARE: Use this if handler uses firmware call. + */ +#define SYS_OFF_PRIO_PLATFORM -256 +#define SYS_OFF_PRIO_LOW -128 +#define SYS_OFF_PRIO_DEFAULT 0 +#define SYS_OFF_PRIO_HIGH 192 +#define SYS_OFF_PRIO_FIRMWARE 224 + +enum sys_off_mode { + /** + * @SYS_OFF_MODE_POWER_OFF_PREPARE: + * + * Handlers prepare system to be powered off. Handlers are + * allowed to sleep. + */ + SYS_OFF_MODE_POWER_OFF_PREPARE, + + /** + * @SYS_OFF_MODE_POWER_OFF: + * + * Handlers power-off system. Handlers are disallowed to sleep. + */ + SYS_OFF_MODE_POWER_OFF, + + /** + * @SYS_OFF_MODE_RESTART: + * + * Handlers restart system. Handlers are disallowed to sleep. + */ + SYS_OFF_MODE_RESTART, +}; + +/** + * struct sys_off_data - sys-off callback argument + * + * @mode: Mode ID. Currently used only by the sys-off restart mode, + * see enum reboot_mode for the available modes. + * @cb_data: User's callback data. + * @cmd: Command string. Currently used only by the sys-off restart mode, + * NULL otherwise. + */ +struct sys_off_data { + int mode; + void *cb_data; + const char *cmd; +}; + +struct sys_off_handler * +register_sys_off_handler(enum sys_off_mode mode, + int priority, + int (*callback)(struct sys_off_data *data), + void *cb_data); +void unregister_sys_off_handler(struct sys_off_handler *handler); + +int devm_register_sys_off_handler(struct device *dev, + enum sys_off_mode mode, + int priority, + int (*callback)(struct sys_off_data *data), + void *cb_data); + /* * Architecture independent implemenations of sys_reboot commands. */ -- cgit From 2b6aa7332f8020dfdaffe340ff038aac4df35238 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 10 May 2022 02:32:13 +0300 Subject: kernel/reboot: Add do_kernel_power_off() Add do_kernel_power_off() helper that will remove open-coded pm_power_off invocations from the architecture code. This is the first step on the way to remove the global pm_power_off variable, which will allow us to implement consistent power-off chaining support. Signed-off-by: Dmitry Osipenko Signed-off-by: Rafael J. Wysocki --- include/linux/reboot.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/reboot.h b/include/linux/reboot.h index b79f8942ea5f..99a9753e77bc 100644 --- a/include/linux/reboot.h +++ b/include/linux/reboot.h @@ -63,6 +63,8 @@ extern void machine_shutdown(void); struct pt_regs; extern void machine_crash_shutdown(struct pt_regs *); +void do_kernel_power_off(void); + /* * sys-off handler API. */ -- cgit From 0e2110d2e910e44cc7cf23fce14613e232602602 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 10 May 2022 02:32:15 +0300 Subject: kernel/reboot: Add kernel_can_power_off() Add kernel_can_power_off() helper that replaces open-coded checks of the global pm_power_off variable. This is a necessary step towards supporting chained power-off handlers. Signed-off-by: Dmitry Osipenko Signed-off-by: Rafael J. Wysocki --- include/linux/reboot.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/reboot.h b/include/linux/reboot.h index 99a9753e77bc..5837c6c2fb40 100644 --- a/include/linux/reboot.h +++ b/include/linux/reboot.h @@ -149,6 +149,7 @@ extern void kernel_restart_prepare(char *cmd); extern void kernel_restart(char *cmd); extern void kernel_halt(void); extern void kernel_power_off(void); +extern bool kernel_can_power_off(void); extern int C_A_D; /* for sysctl */ void ctrl_alt_del(void); -- cgit From fb61375ecfba49e153af561402f49f6fe3bebd39 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 10 May 2022 02:32:16 +0300 Subject: kernel/reboot: Add register_platform_power_off() Add platform-level registration helpers that will ease transition of the arch/platform power-off callbacks to the new sys-off based API, allowing us to remove the global pm_power_off variable in the future. Signed-off-by: Dmitry Osipenko Signed-off-by: Rafael J. Wysocki --- include/linux/reboot.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/reboot.h b/include/linux/reboot.h index 5837c6c2fb40..a8fee1d09c65 100644 --- a/include/linux/reboot.h +++ b/include/linux/reboot.h @@ -141,6 +141,9 @@ int devm_register_sys_off_handler(struct device *dev, int (*callback)(struct sys_off_data *data), void *cb_data); +int register_platform_power_off(void (*power_off)(void)); +void unregister_platform_power_off(void (*power_off)(void)); + /* * Architecture independent implemenations of sys_reboot commands. */ -- cgit From 5b71808eb7c97815fc3b9c386ca0ef6daf2dc053 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 10 May 2022 02:32:32 +0300 Subject: reboot: Remove pm_power_off_prepare() All pm_power_off_prepare() users were converted to sys-off handler API. Remove the obsolete global callback variable. Signed-off-by: Dmitry Osipenko Signed-off-by: Rafael J. Wysocki --- include/linux/pm.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pm.h b/include/linux/pm.h index e65b3ab28377..3786d306dad2 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -21,7 +21,6 @@ * Callbacks for platform drivers to implement. */ extern void (*pm_power_off)(void); -extern void (*pm_power_off_prepare)(void); struct device; /* we have a circular dep with device.h */ #ifdef CONFIG_VT_CONSOLE_SLEEP -- cgit From d2c5415327171e6c1bca2dc4d8a77f450ecf7150 Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 10 May 2022 02:32:34 +0300 Subject: kernel/reboot: Add devm_register_power_off_handler() Add devm_register_power_off_handler() helper that registers sys-off handler using power-off mode and with a default priority. Most drivers will want to register power-off handler with a default priority, so this helper will reduce the boilerplate code and make code easier to read and follow. Signed-off-by: Dmitry Osipenko Signed-off-by: Rafael J. Wysocki --- include/linux/reboot.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/reboot.h b/include/linux/reboot.h index a8fee1d09c65..3b063c5b6118 100644 --- a/include/linux/reboot.h +++ b/include/linux/reboot.h @@ -141,6 +141,10 @@ int devm_register_sys_off_handler(struct device *dev, int (*callback)(struct sys_off_data *data), void *cb_data); +int devm_register_power_off_handler(struct device *dev, + int (*callback)(struct sys_off_data *data), + void *cb_data); + int register_platform_power_off(void (*power_off)(void)); void unregister_platform_power_off(void (*power_off)(void)); -- cgit From 6779db970bd287bb35b28bd5dc256fd7aef19d1c Mon Sep 17 00:00:00 2001 From: Dmitry Osipenko Date: Tue, 10 May 2022 02:32:35 +0300 Subject: kernel/reboot: Add devm_register_restart_handler() Add devm_register_restart_handler() helper that registers sys-off handler using restart mode and with a default priority. Most drivers will want to register restart handler with a default priority, so this helper will reduce the boilerplate code and make code easier to read and follow. Signed-off-by: Dmitry Osipenko Signed-off-by: Rafael J. Wysocki --- include/linux/reboot.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/reboot.h b/include/linux/reboot.h index 3b063c5b6118..d29d37a2abb6 100644 --- a/include/linux/reboot.h +++ b/include/linux/reboot.h @@ -145,6 +145,10 @@ int devm_register_power_off_handler(struct device *dev, int (*callback)(struct sys_off_data *data), void *cb_data); +int devm_register_restart_handler(struct device *dev, + int (*callback)(struct sys_off_data *data), + void *cb_data); + int register_platform_power_off(void (*power_off)(void)); void unregister_platform_power_off(void (*power_off)(void)); -- cgit From 15f214f9bdb7c1f560b4bf863c5a72ff53b442a4 Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Fri, 13 May 2022 11:34:33 +0200 Subject: topology: Remove unused cpu_cluster_mask() default_topology[] uses cpu_clustergroup_mask() for the CLS level (guarded by CONFIG_SCHED_CLUSTER) which is currently provided by x86 (arch/x86/kernel/smpboot.c) and arm64 (drivers/base/arch_topology.c). Fixes: 778c558f49a2 ("sched: Add cluster scheduler level in core and related Kconfig for ARM64") Acked-by: Barry Song Signed-off-by: Dietmar Eggemann Link: https://lore.kernel.org/r/20220513093433.425163-1-dietmar.eggemann@arm.com Signed-off-by: Greg Kroah-Hartman --- include/linux/topology.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/topology.h b/include/linux/topology.h index f19bc3626297..4564faafd0e1 100644 --- a/include/linux/topology.h +++ b/include/linux/topology.h @@ -240,13 +240,6 @@ static inline const struct cpumask *cpu_smt_mask(int cpu) } #endif -#if defined(CONFIG_SCHED_CLUSTER) && !defined(cpu_cluster_mask) -static inline const struct cpumask *cpu_cluster_mask(int cpu) -{ - return topology_cluster_cpumask(cpu); -} -#endif - static inline const struct cpumask *cpu_cpu_mask(int cpu) { return cpumask_of_node(cpu_to_node(cpu)); -- cgit From 0651ab90e4ade17f1d4f4367b70f6120480410f3 Mon Sep 17 00:00:00 2001 From: Pierre Gondois Date: Wed, 18 May 2022 11:08:57 +0200 Subject: ACPI: CPPC: Check _OSC for flexible address space MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ACPI 6.2 Section 6.2.11.2 'Platform-Wide OSPM Capabilities': Starting with ACPI Specification 6.2, all _CPC registers can be in PCC, System Memory, System IO, or Functional Fixed Hardware address spaces. OSPM support for this more flexible register space scheme is indicated by the “Flexible Address Space for CPPC Registers” _OSC bit Otherwise (cf ACPI 6.1, s8.4.7.1.1.X), _CPC registers must be in: - PCC or Functional Fixed Hardware address space if defined - SystemMemory address space (NULL register) if not defined Add the corresponding _OSC bit and check it when parsing _CPC objects. Signed-off-by: Pierre Gondois Reviewed-by: Sudeep Holla Signed-off-by: Rafael J. Wysocki --- include/linux/acpi.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index d7136d13aa44..03465db16b68 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -574,6 +574,7 @@ acpi_status acpi_run_osc(acpi_handle handle, struct acpi_osc_context *context); #define OSC_SB_OSLPI_SUPPORT 0x00000100 #define OSC_SB_CPC_DIVERSE_HIGH_SUPPORT 0x00001000 #define OSC_SB_GENERIC_INITIATOR_SUPPORT 0x00002000 +#define OSC_SB_CPC_FLEXIBLE_ADR_SPACE 0x00004000 #define OSC_SB_NATIVE_USB4_SUPPORT 0x00040000 #define OSC_SB_PRM_SUPPORT 0x00200000 @@ -581,6 +582,7 @@ extern bool osc_sb_apei_support_acked; extern bool osc_pc_lpi_support_confirmed; extern bool osc_sb_native_usb4_support_confirmed; extern bool osc_sb_cppc_not_supported; +extern bool osc_cpc_flexible_adr_space_confirmed; /* USB4 Capabilities */ #define OSC_USB_USB3_TUNNELING 0x00000001 -- cgit From 66d29d802ef3bf55a49b07568b0048823d4a72a6 Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Wed, 11 May 2022 16:56:56 +0200 Subject: PM: domains: Allocate gpd_timing_data dynamically based on governor If a genpd doesn't have an associated governor assigned, there's really no point to allocate the per device gpd_timing_data, as the data isn't being used by a governor anyway. To avoid wasting memory, let's therefore convert the corresponding td variable in the struct generic_pm_domain_data into a pointer and manage the allocation of its data dynamically. Signed-off-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- include/linux/pm_domain.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h index 043d48e4420a..126a4b9ab215 100644 --- a/include/linux/pm_domain.h +++ b/include/linux/pm_domain.h @@ -193,7 +193,7 @@ struct pm_domain_data { struct generic_pm_domain_data { struct pm_domain_data base; - struct gpd_timing_data td; + struct gpd_timing_data *td; struct notifier_block nb; struct notifier_block *power_nb; int cpu; -- cgit From 9c74f2ac4801473d03b0e29fab74eb94a9944521 Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Wed, 11 May 2022 16:56:57 +0200 Subject: PM: domains: Move the next_wakeup variable into the struct gpd_timing_data If the corresponding genpd for the device doesn't use a governor, the variable next_wakeup within the struct generic_pm_domain_data becomes superfluous. To avoid wasting memory, let's move it into the struct gpd_timing_data, which is already being allocated based upon if there is governor assigned. Signed-off-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- include/linux/pm_domain.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h index 126a4b9ab215..1f370f074f30 100644 --- a/include/linux/pm_domain.h +++ b/include/linux/pm_domain.h @@ -182,6 +182,7 @@ struct gpd_timing_data { s64 suspend_latency_ns; s64 resume_latency_ns; s64 effective_constraint_ns; + ktime_t next_wakeup; bool constraint_changed; bool cached_suspend_ok; }; @@ -200,7 +201,6 @@ struct generic_pm_domain_data { unsigned int performance_state; unsigned int default_pstate; unsigned int rpm_pstate; - ktime_t next_wakeup; void *data; }; -- cgit From f38d1a6d002526a4e8840e9bb19733e9d4ce1a67 Mon Sep 17 00:00:00 2001 From: Ulf Hansson Date: Wed, 11 May 2022 16:57:02 +0200 Subject: PM: domains: Allocate governor data dynamically based on a genpd governor If a genpd doesn't have an associated governor assigned, several variables in the struct generic_pm_domain becomes superfluous. Rather than wasting memory in allocated genpds, let's move the variables from the struct generic_pm_domain into a new separate struct. In this way, we can instead dynamically decide when we need to allocate the corresponding data for it. Signed-off-by: Ulf Hansson Signed-off-by: Rafael J. Wysocki --- include/linux/pm_domain.h | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_domain.h b/include/linux/pm_domain.h index 1f370f074f30..ebc351698090 100644 --- a/include/linux/pm_domain.h +++ b/include/linux/pm_domain.h @@ -91,6 +91,14 @@ struct gpd_dev_ops { int (*stop)(struct device *dev); }; +struct genpd_governor_data { + s64 max_off_time_ns; + bool max_off_time_changed; + ktime_t next_wakeup; + bool cached_power_down_ok; + bool cached_power_down_state_idx; +}; + struct genpd_power_state { s64 power_off_latency_ns; s64 power_on_latency_ns; @@ -114,6 +122,7 @@ struct generic_pm_domain { struct list_head child_links; /* Links with PM domain as a child */ struct list_head dev_list; /* List of devices */ struct dev_power_governor *gov; + struct genpd_governor_data *gd; /* Data used by a genpd governor. */ struct work_struct power_off_work; struct fwnode_handle *provider; /* Identity of the domain provider */ bool has_provider; @@ -134,11 +143,6 @@ struct generic_pm_domain { int (*set_performance_state)(struct generic_pm_domain *genpd, unsigned int state); struct gpd_dev_ops dev_ops; - s64 max_off_time_ns; /* Maximum allowed "suspended" time. */ - ktime_t next_wakeup; /* Maintained by the domain governor */ - bool max_off_time_changed; - bool cached_power_down_ok; - bool cached_power_down_state_idx; int (*attach_dev)(struct generic_pm_domain *domain, struct device *dev); void (*detach_dev)(struct generic_pm_domain *domain, -- cgit From 4853f68d158ac59b05985a6af5b7da7ccdbc14c8 Mon Sep 17 00:00:00 2001 From: Liao Chang Date: Fri, 8 Apr 2022 18:09:09 +0800 Subject: kexec_file: Fix kexec_file.c build error for riscv platform When CONFIG_KEXEC_FILE is set for riscv platform, the compilation of kernel/kexec_file.c generate build error: kernel/kexec_file.c: In function 'crash_prepare_elf64_headers': ./arch/riscv/include/asm/page.h:110:71: error: request for member 'virt_addr' in something not a structure or union 110 | ((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < kernel_map.virt_addr)) | ^ ./arch/riscv/include/asm/page.h:131:2: note: in expansion of macro 'is_linear_mapping' 131 | is_linear_mapping(_x) ? \ | ^~~~~~~~~~~~~~~~~ ./arch/riscv/include/asm/page.h:140:31: note: in expansion of macro '__va_to_pa_nodebug' 140 | #define __phys_addr_symbol(x) __va_to_pa_nodebug(x) | ^~~~~~~~~~~~~~~~~~ ./arch/riscv/include/asm/page.h:143:24: note: in expansion of macro '__phys_addr_symbol' 143 | #define __pa_symbol(x) __phys_addr_symbol(RELOC_HIDE((unsigned long)(x), 0)) | ^~~~~~~~~~~~~~~~~~ kernel/kexec_file.c:1327:36: note: in expansion of macro '__pa_symbol' 1327 | phdr->p_offset = phdr->p_paddr = __pa_symbol(_text); This occurs is because the "kernel_map" referenced in macro is_linear_mapping() is suppose to be the one of struct kernel_mapping defined in arch/riscv/mm/init.c, but the 2nd argument of crash_prepare_elf64_header() has same symbol name, in expansion of macro is_linear_mapping in function crash_prepare_elf64_header(), "kernel_map" actually is the local variable. Signed-off-by: Liao Chang Link: https://lore.kernel.org/r/20220408100914.150110-2-lizhengyu3@huawei.com Signed-off-by: Palmer Dabbelt --- include/linux/kexec.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 58d1b58a971e..ebb1bffbf068 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -227,7 +227,7 @@ struct crash_mem { extern int crash_exclude_mem_range(struct crash_mem *mem, unsigned long long mstart, unsigned long long mend); -extern int crash_prepare_elf64_headers(struct crash_mem *mem, int kernel_map, +extern int crash_prepare_elf64_headers(struct crash_mem *mem, int need_kernel_map, void **addr, unsigned long *sz); #endif /* CONFIG_KEXEC_FILE */ -- cgit From 6c1e423a3c84953edcf91ff03ab97829b287184a Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp Date: Wed, 18 May 2022 17:45:27 +0200 Subject: can: can-dev: remove obsolete CAN LED support Since commit 30f3b42147ba6f ("can: mark led trigger as broken") the CAN specific LED support was disabled and marked as BROKEN. As the common LED support with CONFIG_LEDS_TRIGGER_NETDEV should do this work now the code can be removed as preparation for a CAN netdevice Kconfig rework. Link: https://lore.kernel.org/all/20220518154527.29046-1-socketcan@hartkopp.net Suggested-by: Vincent Mailhol Signed-off-by: Oliver Hartkopp [mkl: remove led.h from MAINTAINERS] Signed-off-by: Marc Kleine-Budde --- include/linux/can/dev.h | 10 ---------- include/linux/can/led.h | 51 ------------------------------------------------- 2 files changed, 61 deletions(-) delete mode 100644 include/linux/can/led.h (limited to 'include/linux') diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h index c2ea47f30046..e22dc03c850e 100644 --- a/include/linux/can/dev.h +++ b/include/linux/can/dev.h @@ -17,7 +17,6 @@ #include #include #include -#include #include #include #include @@ -85,15 +84,6 @@ struct can_priv { int (*do_get_berr_counter)(const struct net_device *dev, struct can_berr_counter *bec); int (*do_get_auto_tdcv)(const struct net_device *dev, u32 *tdcv); - -#ifdef CONFIG_CAN_LEDS - struct led_trigger *tx_led_trig; - char tx_led_trig_name[CAN_LED_NAME_SZ]; - struct led_trigger *rx_led_trig; - char rx_led_trig_name[CAN_LED_NAME_SZ]; - struct led_trigger *rxtx_led_trig; - char rxtx_led_trig_name[CAN_LED_NAME_SZ]; -#endif }; static inline bool can_tdc_is_enabled(const struct can_priv *priv) diff --git a/include/linux/can/led.h b/include/linux/can/led.h deleted file mode 100644 index 7c3cfd798c56..000000000000 --- a/include/linux/can/led.h +++ /dev/null @@ -1,51 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Copyright 2012, Fabio Baltieri - */ - -#ifndef _CAN_LED_H -#define _CAN_LED_H - -#include -#include -#include - -enum can_led_event { - CAN_LED_EVENT_OPEN, - CAN_LED_EVENT_STOP, - CAN_LED_EVENT_TX, - CAN_LED_EVENT_RX, -}; - -#ifdef CONFIG_CAN_LEDS - -/* keep space for interface name + "-tx"/"-rx"/"-rxtx" - * suffix and null terminator - */ -#define CAN_LED_NAME_SZ (IFNAMSIZ + 6) - -void can_led_event(struct net_device *netdev, enum can_led_event event); -void devm_can_led_init(struct net_device *netdev); -int __init can_led_notifier_init(void); -void __exit can_led_notifier_exit(void); - -#else - -static inline void can_led_event(struct net_device *netdev, - enum can_led_event event) -{ -} -static inline void devm_can_led_init(struct net_device *netdev) -{ -} -static inline int can_led_notifier_init(void) -{ - return 0; -} -static inline void can_led_notifier_exit(void) -{ -} - -#endif - -#endif /* !_CAN_LED_H */ -- cgit From b265cdebdfefbee4ef1ae2972af36f7a7cac5148 Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Thu, 19 May 2022 14:08:48 -0700 Subject: sched: coredump.h: clarify the use of MMF_VM_HUGEPAGE Patch series "Make khugepaged collapse readonly FS THP more consistent", v4. The readonly FS THP relies on khugepaged to collapse THP for suitable vmas. But the behavior is inconsistent for "always" mode (https://lore.kernel.org/linux-mm/00f195d4-d039-3cf2-d3a1-a2c88de397a0@suse.cz/). The "always" mode means THP allocation should be tried all the time and khugepaged should try to collapse THP all the time. Of course the allocation and collapse may fail due to other factors and conditions. Currently file THP may not be collapsed by khugepaged even though all the conditions are met. That does break the semantics of "always" mode. So make sure readonly FS vmas are registered to khugepaged to fix the break. Register suitable vmas in common mmap path, that could cover both readonly FS vmas and shmem vmas, so remove the khugepaged calls in shmem.c. The patch 1-7 are minor bug fixes, clean up and preparation patches. Patch 8 is the real meat. Tested with khugepaged test in selftests and the testcase provided by Vlastimil Babka in https://lore.kernel.org/lkml/df3b5d1c-a36b-2c73-3e27-99e74983de3a@suse.cz/ by commenting out MADV_HUGEPAGE call. This patch (of 8): MMF_VM_HUGEPAGE is set as long as the mm is available for khugepaged by khugepaged_enter(), not only when VM_HUGEPAGE is set on vma. Correct the comment to avoid confusion. Link: https://lkml.kernel.org/r/20220510203222.24246-1-shy828301@gmail.com Link: https://lkml.kernel.org/r/20220510203222.24246-2-shy828301@gmail.com Signed-off-by: Yang Shi Reviewed-by: Miaohe Lin Acked-by: Song Liu Acked-by: Vlastmil Babka Cc: Kirill A. Shutemov Cc: Matthew Wilcox (Oracle) Cc: Rik van Riel Cc: Song Liu Cc: Theodore Ts'o Cc: Zi Yan Signed-off-by: Andrew Morton --- include/linux/sched/coredump.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h index 4d9e3a656875..4d0a5be28b70 100644 --- a/include/linux/sched/coredump.h +++ b/include/linux/sched/coredump.h @@ -57,7 +57,8 @@ static inline int get_dumpable(struct mm_struct *mm) #endif /* leave room for more dump flags */ #define MMF_VM_MERGEABLE 16 /* KSM may merge identical pages */ -#define MMF_VM_HUGEPAGE 17 /* set when VM_HUGEPAGE is set on vma */ +#define MMF_VM_HUGEPAGE 17 /* set when mm is available for + khugepaged */ /* * This one-shot flag is dropped due to necessity of changing exe once again * on NFS restore -- cgit From 78d12c19e02d8151b1c82228ae6bb10f174a68e2 Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Thu, 19 May 2022 14:08:49 -0700 Subject: mm: thp: only regular file could be THP eligible Since commit a4aeaa06d45e ("mm: khugepaged: skip huge page collapse for special files"), khugepaged just collapses THP for regular file which is the intended usecase for readonly fs THP. Only show regular file as THP eligible accordingly. And make file_thp_enabled() available for khugepaged too in order to remove duplicate code. Link: https://lkml.kernel.org/r/20220510203222.24246-5-shy828301@gmail.com Signed-off-by: Yang Shi Acked-by: Song Liu Acked-by: Vlastmil Babka Cc: Kirill A. Shutemov Cc: Matthew Wilcox (Oracle) Cc: Miaohe Lin Cc: Rik van Riel Cc: Song Liu Cc: Theodore Ts'o Cc: Zi Yan Signed-off-by: Andrew Morton --- include/linux/huge_mm.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index fbf36bb1be22..de29821231c9 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -173,6 +173,20 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma) return false; } +static inline bool file_thp_enabled(struct vm_area_struct *vma) +{ + struct inode *inode; + + if (!vma->vm_file) + return false; + + inode = vma->vm_file->f_inode; + + return (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS)) && + (vma->vm_flags & VM_EXEC) && + !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode); +} + bool transparent_hugepage_active(struct vm_area_struct *vma); #define transparent_hugepage_use_zero_page() \ -- cgit From d2081b2bf8195b8239c67fdd61518e077da7cbec Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Thu, 19 May 2022 14:08:49 -0700 Subject: mm: khugepaged: make khugepaged_enter() void function The most callers of khugepaged_enter() don't care about the return value. Only dup_mmap(), anonymous THP page fault and MADV_HUGEPAGE handle the error by returning -ENOMEM. Actually it is not harmful for them to ignore the error case either. It also sounds overkilling to fail fork() and page fault early due to khugepaged_enter() error, and MADV_HUGEPAGE does set VM_HUGEPAGE flag regardless of the error. Link: https://lkml.kernel.org/r/20220510203222.24246-6-shy828301@gmail.com Signed-off-by: Yang Shi Acked-by: Song Liu Acked-by: Vlastmil Babka Cc: Kirill A. Shutemov Cc: Matthew Wilcox (Oracle) Cc: Miaohe Lin Cc: Rik van Riel Cc: Song Liu Cc: Theodore Ts'o Cc: Zi Yan Signed-off-by: Andrew Morton --- include/linux/khugepaged.h | 30 ++++++++++++------------------ 1 file changed, 12 insertions(+), 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h index 2fcc01891b47..0423d3619f26 100644 --- a/include/linux/khugepaged.h +++ b/include/linux/khugepaged.h @@ -12,10 +12,10 @@ extern struct attribute_group khugepaged_attr_group; extern int khugepaged_init(void); extern void khugepaged_destroy(void); extern int start_stop_khugepaged(void); -extern int __khugepaged_enter(struct mm_struct *mm); +extern void __khugepaged_enter(struct mm_struct *mm); extern void __khugepaged_exit(struct mm_struct *mm); -extern int khugepaged_enter_vma_merge(struct vm_area_struct *vma, - unsigned long vm_flags); +extern void khugepaged_enter_vma_merge(struct vm_area_struct *vma, + unsigned long vm_flags); extern void khugepaged_min_free_kbytes_update(void); #ifdef CONFIG_SHMEM extern void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr); @@ -40,11 +40,10 @@ static inline void collapse_pte_mapped_thp(struct mm_struct *mm, (transparent_hugepage_flags & \ (1<flags)) - return __khugepaged_enter(mm); - return 0; + __khugepaged_enter(mm); } static inline void khugepaged_exit(struct mm_struct *mm) @@ -53,7 +52,7 @@ static inline void khugepaged_exit(struct mm_struct *mm) __khugepaged_exit(mm); } -static inline int khugepaged_enter(struct vm_area_struct *vma, +static inline void khugepaged_enter(struct vm_area_struct *vma, unsigned long vm_flags) { if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags)) @@ -62,27 +61,22 @@ static inline int khugepaged_enter(struct vm_area_struct *vma, (khugepaged_req_madv() && (vm_flags & VM_HUGEPAGE))) && !(vm_flags & VM_NOHUGEPAGE) && !test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) - if (__khugepaged_enter(vma->vm_mm)) - return -ENOMEM; - return 0; + __khugepaged_enter(vma->vm_mm); } #else /* CONFIG_TRANSPARENT_HUGEPAGE */ -static inline int khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm) +static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm) { - return 0; } static inline void khugepaged_exit(struct mm_struct *mm) { } -static inline int khugepaged_enter(struct vm_area_struct *vma, - unsigned long vm_flags) +static inline void khugepaged_enter(struct vm_area_struct *vma, + unsigned long vm_flags) { - return 0; } -static inline int khugepaged_enter_vma_merge(struct vm_area_struct *vma, - unsigned long vm_flags) +static inline void khugepaged_enter_vma_merge(struct vm_area_struct *vma, + unsigned long vm_flags) { - return 0; } static inline void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr) -- cgit From 2647d11b9e71d495b2d4985ebaf7a734373a4b80 Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Thu, 19 May 2022 14:08:49 -0700 Subject: mm: khugepaged: make hugepage_vma_check() non-static The hugepage_vma_check() could be reused by khugepaged_enter() and khugepaged_enter_vma_merge(), but it is static in khugepaged.c. Make it non-static and declare it in khugepaged.h. Link: https://lkml.kernel.org/r/20220510203222.24246-7-shy828301@gmail.com Signed-off-by: Yang Shi Suggested-by: Vlastimil Babka Cc: Kirill A. Shutemov Cc: Matthew Wilcox (Oracle) Cc: Miaohe Lin Cc: Rik van Riel Cc: Song Liu Cc: Song Liu Cc: Theodore Ts'o Cc: Zi Yan Signed-off-by: Andrew Morton --- include/linux/khugepaged.h | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h index 0423d3619f26..c340f6bb39d6 100644 --- a/include/linux/khugepaged.h +++ b/include/linux/khugepaged.h @@ -3,8 +3,6 @@ #define _LINUX_KHUGEPAGED_H #include /* MMF_VM_HUGEPAGE */ -#include - #ifdef CONFIG_TRANSPARENT_HUGEPAGE extern struct attribute_group khugepaged_attr_group; @@ -12,6 +10,8 @@ extern struct attribute_group khugepaged_attr_group; extern int khugepaged_init(void); extern void khugepaged_destroy(void); extern int start_stop_khugepaged(void); +extern bool hugepage_vma_check(struct vm_area_struct *vma, + unsigned long vm_flags); extern void __khugepaged_enter(struct mm_struct *mm); extern void __khugepaged_exit(struct mm_struct *mm); extern void khugepaged_enter_vma_merge(struct vm_area_struct *vma, @@ -55,13 +55,11 @@ static inline void khugepaged_exit(struct mm_struct *mm) static inline void khugepaged_enter(struct vm_area_struct *vma, unsigned long vm_flags) { - if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags)) - if ((khugepaged_always() || - (shmem_file(vma->vm_file) && shmem_huge_enabled(vma)) || - (khugepaged_req_madv() && (vm_flags & VM_HUGEPAGE))) && - !(vm_flags & VM_NOHUGEPAGE) && - !test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) + if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) && + khugepaged_enabled()) { + if (hugepage_vma_check(vma, vm_flags)) __khugepaged_enter(vma->vm_mm); + } } #else /* CONFIG_TRANSPARENT_HUGEPAGE */ static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm) -- cgit From c791576c60288c89b351ea2d2098f6a872d78fa7 Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Thu, 19 May 2022 14:08:50 -0700 Subject: mm: khugepaged: introduce khugepaged_enter_vma() helper The khugepaged_enter_vma_merge() actually does as the same thing as the khugepaged_enter() section called by shmem_mmap(), so consolidate them into one helper and rename it to khugepaged_enter_vma(). Link: https://lkml.kernel.org/r/20220510203222.24246-8-shy828301@gmail.com Signed-off-by: Yang Shi Acked-by: Vlastmil Babka Cc: Kirill A. Shutemov Cc: Matthew Wilcox (Oracle) Cc: Miaohe Lin Cc: Rik van Riel Cc: Song Liu Cc: Song Liu Cc: Theodore Ts'o Cc: Zi Yan Signed-off-by: Andrew Morton --- include/linux/khugepaged.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h index c340f6bb39d6..392d34c3c59a 100644 --- a/include/linux/khugepaged.h +++ b/include/linux/khugepaged.h @@ -14,8 +14,8 @@ extern bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags); extern void __khugepaged_enter(struct mm_struct *mm); extern void __khugepaged_exit(struct mm_struct *mm); -extern void khugepaged_enter_vma_merge(struct vm_area_struct *vma, - unsigned long vm_flags); +extern void khugepaged_enter_vma(struct vm_area_struct *vma, + unsigned long vm_flags); extern void khugepaged_min_free_kbytes_update(void); #ifdef CONFIG_SHMEM extern void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr); @@ -72,8 +72,8 @@ static inline void khugepaged_enter(struct vm_area_struct *vma, unsigned long vm_flags) { } -static inline void khugepaged_enter_vma_merge(struct vm_area_struct *vma, - unsigned long vm_flags) +static inline void khugepaged_enter_vma(struct vm_area_struct *vma, + unsigned long vm_flags) { } static inline void collapse_pte_mapped_thp(struct mm_struct *mm, -- cgit From bc4a68adb151f08b50822158c7767f5726346370 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Thu, 19 May 2022 14:08:50 -0700 Subject: mm/swap: remove unneeded return value of free_swap_slot The return value of free_swap_slot is always 0 and also ignored now. Remove it to clean up the code. Link: https://lkml.kernel.org/r/20220509131416.17553-5-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Cc: Alistair Popple Cc: David Howells Cc: Matthew Wilcox Cc: Naoya Horiguchi Cc: NeilBrown Cc: Peter Xu Cc: Suren Baghdasaryan Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/swap_slots.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swap_slots.h b/include/linux/swap_slots.h index 347f1a304190..15adfb8c813a 100644 --- a/include/linux/swap_slots.h +++ b/include/linux/swap_slots.h @@ -24,7 +24,7 @@ struct swap_slots_cache { void disable_swap_slots_cache_lock(void); void reenable_swap_slots_cache_unlock(void); void enable_swap_slots_cache(void); -int free_swap_slot(swp_entry_t entry); +void free_swap_slot(swp_entry_t entry); extern bool swap_slot_cache_enabled; -- cgit From 3db3264d8a5f4816c022eb9052694ce51e657bfc Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Thu, 19 May 2022 14:08:51 -0700 Subject: mm/swap: make page_swapcount and __lru_add_drain_all static Make page_swapcount and __lru_add_drain_all static. They are only used within the file now. Link: https://lkml.kernel.org/r/20220509131416.17553-9-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Cc: Alistair Popple Cc: David Howells Cc: Matthew Wilcox Cc: Naoya Horiguchi Cc: NeilBrown Cc: Peter Xu Cc: Suren Baghdasaryan Cc: Vlastimil Babka Cc: Oscar Salvador Signed-off-by: Andrew Morton --- include/linux/swap.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 6bf1506e5080..0ff887cad383 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -485,7 +485,6 @@ int swap_type_of(dev_t device, sector_t offset); int find_first_swap(dev_t *device); extern unsigned int count_swap_pages(int, int); extern sector_t swapdev_block(int, pgoff_t); -extern int page_swapcount(struct page *); extern int __swap_count(swp_entry_t entry); extern int __swp_swapcount(swp_entry_t entry); extern int swp_swapcount(swp_entry_t entry); @@ -557,12 +556,6 @@ static inline void put_swap_page(struct page *page, swp_entry_t swp) { } - -static inline int page_swapcount(struct page *page) -{ - return 0; -} - static inline int __swap_count(swp_entry_t entry) { return 0; -- cgit From a930c210c42dec32b031ad7645bd6904701b5297 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Thu, 19 May 2022 14:08:52 -0700 Subject: mm/swap: fix the obsolete comment for SWP_TYPE_SHIFT Since commit 3159f943aafd ("xarray: Replace exceptional entries"), there is only one bit of 'type' can be shifted up. Update the corresponding comment. Link: https://lkml.kernel.org/r/20220509131416.17553-13-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: David Hildenbrand Cc: Alistair Popple Cc: David Howells Cc: Matthew Wilcox Cc: Naoya Horiguchi Cc: NeilBrown Cc: Peter Xu Cc: Suren Baghdasaryan Cc: Vlastimil Babka Cc: Oscar Salvador Signed-off-by: Andrew Morton --- include/linux/swapops.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 09945342bf76..fe220df499f1 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -13,10 +13,10 @@ * get good packing density in that tree, so the index should be dense in * the low-order bits. * - * We arrange the `type' and `offset' fields so that `type' is at the seven + * We arrange the `type' and `offset' fields so that `type' is at the six * high-order bits of the swp_entry_t and `offset' is right-aligned in the * remaining bits. Although `type' itself needs only five bits, we allow for - * shmem/tmpfs to shift it all up a further two bits: see swp_to_radix_entry(). + * shmem/tmpfs to shift it all up a further one bit: see swp_to_radix_entry(). * * swp_entry_t's are *never* stored anywhere in their arch-dependent format. */ -- cgit From ff351f4bb9606a6c9a1f3fcdd918635b51514779 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Thu, 19 May 2022 14:08:52 -0700 Subject: mm/swap: fix comment about swap extent Since commit 4efaceb1c5f8 ("mm, swap: use rbtree for swap_extent"), rbtree is used for swap extent. Also curr_swap_extent is removed at that time. Update the corresponding comment. Link: https://lkml.kernel.org/r/20220509131416.17553-16-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Cc: Alistair Popple Cc: David Hildenbrand Cc: David Howells Cc: Matthew Wilcox Cc: Naoya Horiguchi Cc: NeilBrown Cc: Peter Xu Cc: Suren Baghdasaryan Cc: Vlastimil Babka Cc: Oscar Salvador Signed-off-by: Andrew Morton --- include/linux/swap.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 0ff887cad383..04c0713c1d6e 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -168,8 +168,8 @@ struct zone; /* * A swap extent maps a range of a swapfile's PAGE_SIZE pages onto a range of - * disk blocks. A list of swap extents maps the entire swapfile. (Where the - * term `swapfile' refers to either a blockdevice or an IS_REG file. Apart + * disk blocks. A rbtree of swap extents maps the entire swapfile (Where the + * term `swapfile' refers to either a blockdevice or an IS_REG file). Apart * from setup, they're handled identically. * * We always assume that blocks are of size PAGE_SIZE. -- cgit From f6498b776d280b30a4614d8261840961e993c2c8 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Thu, 19 May 2022 14:08:53 -0700 Subject: mm: zswap: add basic meminfo and vmstat coverage Currently it requires poking at debugfs to figure out the size and population of the zswap cache on a host. There are no counters for reads and writes against the cache. As a result, it's difficult to understand zswap behavior on production systems. Print zswap memory consumption and how many pages are zswapped out in /proc/meminfo. Count zswapouts and zswapins in /proc/vmstat. Link: https://lkml.kernel.org/r/20220510152847.230957-6-hannes@cmpxchg.org Signed-off-by: Johannes Weiner Acked-by: David Hildenbrand Cc: Dan Streetman Cc: Michal Hocko Cc: Minchan Kim Cc: Roman Gushchin Cc: Seth Jennings Cc: Shakeel Butt Signed-off-by: Andrew Morton --- include/linux/swap.h | 5 +++++ include/linux/vm_event_item.h | 4 ++++ 2 files changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 04c0713c1d6e..f3ae17b43f20 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -620,6 +620,11 @@ static inline int mem_cgroup_swappiness(struct mem_cgroup *mem) } #endif +#ifdef CONFIG_ZSWAP +extern u64 zswap_pool_total_size; +extern atomic_t zswap_stored_pages; +#endif + #if defined(CONFIG_SWAP) && defined(CONFIG_MEMCG) && defined(CONFIG_BLK_CGROUP) extern void __cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask); static inline void cgroup_throttle_swaprate(struct page *page, gfp_t gfp_mask) diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index e83967e4c20e..404024486fa5 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -136,6 +136,10 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, #ifdef CONFIG_KSM COW_KSM, #endif +#ifdef CONFIG_ZSWAP + ZSWPIN, + ZSWPOUT, +#endif #ifdef CONFIG_X86 DIRECT_MAP_LEVEL2_SPLIT, DIRECT_MAP_LEVEL3_SPLIT, -- cgit From f4840ccfca25db225b3371a8f7b5770febee87c5 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Thu, 19 May 2022 14:08:53 -0700 Subject: zswap: memcg accounting Applications can currently escape their cgroup memory containment when zswap is enabled. This patch adds per-cgroup tracking and limiting of zswap backend memory to rectify this. The existing cgroup2 memory.stat file is extended to show zswap statistics analogous to what's in meminfo and vmstat. Furthermore, two new control files, memory.zswap.current and memory.zswap.max, are added to allow tuning zswap usage on a per-workload basis. This is important since not all workloads benefit from zswap equally; some even suffer compared to disk swap when memory contents don't compress well. The optimal size of the zswap pool, and the threshold for writeback, also depends on the size of the workload's warm set. The implementation doesn't use a traditional page_counter transaction. zswap is unconventional as a memory consumer in that we only know the amount of memory to charge once expensive compression has occurred. If zwap is disabled or the limit is already exceeded we obviously don't want to compress page upon page only to reject them all. Instead, the limit is checked against current usage, then we compress and charge. This allows some limit overrun, but not enough to matter in practice. [hannes@cmpxchg.org: fix for CONFIG_SLOB builds] Link: https://lkml.kernel.org/r/YnwD14zxYjUJPc2w@cmpxchg.org [hannes@cmpxchg.org: opt out of cgroups v1] Link: https://lkml.kernel.org/r/Yn6it9mBYFA+/lTb@cmpxchg.org Link: https://lkml.kernel.org/r/20220510152847.230957-7-hannes@cmpxchg.org Signed-off-by: Johannes Weiner Cc: Michal Hocko Cc: Roman Gushchin Cc: Shakeel Butt Cc: Seth Jennings Cc: Dan Streetman Cc: Minchan Kim Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 54 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 8ea4b541c31e..9ecead1042b9 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -35,6 +35,8 @@ enum memcg_stat_item { MEMCG_PERCPU_B, MEMCG_VMALLOC, MEMCG_KMEM, + MEMCG_ZSWAP_B, + MEMCG_ZSWAPPED, MEMCG_NR_STAT, }; @@ -252,6 +254,10 @@ struct mem_cgroup { /* Range enforcement for interrupt charges */ struct work_struct high_work; +#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP) + unsigned long zswap_max; +#endif + unsigned long soft_limit; /* vmpressure notifications */ @@ -1273,6 +1279,10 @@ struct mem_cgroup *mem_cgroup_from_css(struct cgroup_subsys_state *css) return NULL; } +static inline void obj_cgroup_put(struct obj_cgroup *objcg) +{ +} + static inline void mem_cgroup_put(struct mem_cgroup *memcg) { } @@ -1694,6 +1704,7 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order); void __memcg_kmem_uncharge_page(struct page *page, int order); struct obj_cgroup *get_obj_cgroup_from_current(void); +struct obj_cgroup *get_obj_cgroup_from_page(struct page *page); int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size); void obj_cgroup_uncharge(struct obj_cgroup *objcg, size_t size); @@ -1730,6 +1741,20 @@ static inline int memcg_kmem_id(struct mem_cgroup *memcg) struct mem_cgroup *mem_cgroup_from_obj(void *p); +static inline void count_objcg_event(struct obj_cgroup *objcg, + enum vm_event_item idx) +{ + struct mem_cgroup *memcg; + + if (mem_cgroup_kmem_disabled()) + return; + + rcu_read_lock(); + memcg = obj_cgroup_memcg(objcg); + count_memcg_events(memcg, idx, 1); + rcu_read_unlock(); +} + #else static inline bool mem_cgroup_kmem_disabled(void) { @@ -1756,6 +1781,11 @@ static inline void __memcg_kmem_uncharge_page(struct page *page, int order) { } +static inline struct obj_cgroup *get_obj_cgroup_from_page(struct page *page) +{ + return NULL; +} + static inline bool memcg_kmem_enabled(void) { return false; @@ -1771,6 +1801,30 @@ static inline struct mem_cgroup *mem_cgroup_from_obj(void *p) return NULL; } +static inline void count_objcg_event(struct obj_cgroup *objcg, + enum vm_event_item idx) +{ +} + #endif /* CONFIG_MEMCG_KMEM */ +#if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP) +bool obj_cgroup_may_zswap(struct obj_cgroup *objcg); +void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, size_t size); +void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, size_t size); +#else +static inline bool obj_cgroup_may_zswap(struct obj_cgroup *objcg) +{ + return true; +} +static inline void obj_cgroup_charge_zswap(struct obj_cgroup *objcg, + size_t size) +{ +} +static inline void obj_cgroup_uncharge_zswap(struct obj_cgroup *objcg, + size_t size) +{ +} +#endif + #endif /* _LINUX_MEMCONTROL_H */ -- cgit From 6d4675e601357834dadd2ba1d803f6484596015c Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Thu, 19 May 2022 14:08:54 -0700 Subject: mm: don't be stuck to rmap lock on reclaim path The rmap locks(i_mmap_rwsem and anon_vma->root->rwsem) could be contended under memory pressure if processes keep working on their vmas(e.g., fork, mmap, munmap). It makes reclaim path stuck. In our real workload traces, we see kswapd is waiting the lock for 300ms+(worst case, a sec) and it makes other processes entering direct reclaim, which were also stuck on the lock. This patch makes lru aging path try_lock mode like shink_page_list so the reclaim context will keep working with next lru pages without being stuck. if it found the rmap lock contended, it rotates the page back to head of lru in both active/inactive lrus to make them consistent behavior, which is basic starting point rather than adding more heristic. Since this patch introduces a new "contended" field as out-param along with try_lock in-param in rmap_walk_control, it's not immutable any longer if the try_lock is set so remove const keywords on rmap related functions. Since rmap walking is already expensive operation, I doubt the const would help sizable benefit( And we didn't have it until 5.17). In a heavy app workload in Android, trace shows following statistics. It almost removes rmap lock contention from reclaim path. Martin Liu reported: Before: max_dur(ms) min_dur(ms) max-min(dur)ms avg_dur(ms) sum_dur(ms) count blocked_function 1632 0 1631 151.542173 31672 209 page_lock_anon_vma_read 601 0 601 145.544681 28817 198 rmap_walk_file After: max_dur(ms) min_dur(ms) max-min(dur)ms avg_dur(ms) sum_dur(ms) count blocked_function NaN NaN NaN NaN NaN 0.0 NaN 0 0 0 0.127645 1 12 rmap_walk_file [minchan@kernel.org: add comment, per Matthew] Link: https://lkml.kernel.org/r/YnNqeB5tUf6LZ57b@google.com Link: https://lkml.kernel.org/r/20220510215423.164547-1-minchan@kernel.org Signed-off-by: Minchan Kim Acked-by: Johannes Weiner Cc: Suren Baghdasaryan Cc: Michal Hocko Cc: John Dias Cc: Tim Murray Cc: Matthew Wilcox Cc: Vladimir Davydov Cc: Martin Liu Cc: Minchan Kim Cc: Matthew Wilcox Signed-off-by: Andrew Morton --- include/linux/fs.h | 5 +++++ include/linux/ksm.h | 4 ++-- include/linux/rmap.h | 28 ++++++++++++++++++++-------- 3 files changed, 27 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index b81cacc51d2f..044b67f8d861 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -477,6 +477,11 @@ static inline void i_mmap_unlock_write(struct address_space *mapping) up_write(&mapping->i_mmap_rwsem); } +static inline int i_mmap_trylock_read(struct address_space *mapping) +{ + return down_read_trylock(&mapping->i_mmap_rwsem); +} + static inline void i_mmap_lock_read(struct address_space *mapping) { down_read(&mapping->i_mmap_rwsem); diff --git a/include/linux/ksm.h b/include/linux/ksm.h index 0630e545f4cb..0b4f17418f64 100644 --- a/include/linux/ksm.h +++ b/include/linux/ksm.h @@ -51,7 +51,7 @@ static inline void ksm_exit(struct mm_struct *mm) struct page *ksm_might_need_to_copy(struct page *page, struct vm_area_struct *vma, unsigned long address); -void rmap_walk_ksm(struct folio *folio, const struct rmap_walk_control *rwc); +void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc); void folio_migrate_ksm(struct folio *newfolio, struct folio *folio); #else /* !CONFIG_KSM */ @@ -79,7 +79,7 @@ static inline struct page *ksm_might_need_to_copy(struct page *page, } static inline void rmap_walk_ksm(struct folio *folio, - const struct rmap_walk_control *rwc) + struct rmap_walk_control *rwc) { } diff --git a/include/linux/rmap.h b/include/linux/rmap.h index cbe279a6f0de..9ec23138e410 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -128,6 +128,11 @@ static inline void anon_vma_lock_read(struct anon_vma *anon_vma) down_read(&anon_vma->root->rwsem); } +static inline int anon_vma_trylock_read(struct anon_vma *anon_vma) +{ + return down_read_trylock(&anon_vma->root->rwsem); +} + static inline void anon_vma_unlock_read(struct anon_vma *anon_vma) { up_read(&anon_vma->root->rwsem); @@ -366,17 +371,14 @@ int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff, void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked); -/* - * Called by memory-failure.c to kill processes. - */ -struct anon_vma *folio_lock_anon_vma_read(struct folio *folio); -void page_unlock_anon_vma_read(struct anon_vma *anon_vma); int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma); /* * rmap_walk_control: To control rmap traversing for specific needs * * arg: passed to rmap_one() and invalid_vma() + * try_lock: bail out if the rmap lock is contended + * contended: indicate the rmap traversal bailed out due to lock contention * rmap_one: executed on each vma where page is mapped * done: for checking traversing termination condition * anon_lock: for getting anon_lock by optimized way rather than default @@ -384,6 +386,8 @@ int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma); */ struct rmap_walk_control { void *arg; + bool try_lock; + bool contended; /* * Return false if page table scanning in rmap_walk should be stopped. * Otherwise, return true. @@ -391,12 +395,20 @@ struct rmap_walk_control { bool (*rmap_one)(struct folio *folio, struct vm_area_struct *vma, unsigned long addr, void *arg); int (*done)(struct folio *folio); - struct anon_vma *(*anon_lock)(struct folio *folio); + struct anon_vma *(*anon_lock)(struct folio *folio, + struct rmap_walk_control *rwc); bool (*invalid_vma)(struct vm_area_struct *vma, void *arg); }; -void rmap_walk(struct folio *folio, const struct rmap_walk_control *rwc); -void rmap_walk_locked(struct folio *folio, const struct rmap_walk_control *rwc); +void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc); +void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc); + +/* + * Called by memory-failure.c to kill processes. + */ +struct anon_vma *folio_lock_anon_vma_read(struct folio *folio, + struct rmap_walk_control *rwc); +void page_unlock_anon_vma_read(struct anon_vma *anon_vma); #else /* !CONFIG_MMU */ -- cgit From 3f913fc5f9745613088d3c569778c9813ab9c129 Mon Sep 17 00:00:00 2001 From: Qi Zheng Date: Thu, 19 May 2022 14:08:55 -0700 Subject: mm: fix missing handler for __GFP_NOWARN We expect no warnings to be issued when we specify __GFP_NOWARN, but currently in paths like alloc_pages() and kmalloc(), there are still some warnings printed, fix it. But for some warnings that report usage problems, we don't deal with them. If such warnings are printed, then we should fix the usage problems. Such as the following case: WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1)); [zhengqi.arch@bytedance.com: v2] Link: https://lkml.kernel.org/r/20220511061951.1114-1-zhengqi.arch@bytedance.com Link: https://lkml.kernel.org/r/20220510113809.80626-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng Cc: Akinobu Mita Cc: Vlastimil Babka Cc: Greg Kroah-Hartman Cc: Jiri Slaby Cc: Steven Rostedt (Google) Signed-off-by: Andrew Morton --- include/linux/fault-inject.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fault-inject.h b/include/linux/fault-inject.h index 2d04f6448cde..9f6e25467844 100644 --- a/include/linux/fault-inject.h +++ b/include/linux/fault-inject.h @@ -20,6 +20,7 @@ struct fault_attr { atomic_t space; unsigned long verbose; bool task_filter; + bool no_warn; unsigned long stacktrace_depth; unsigned long require_start; unsigned long require_end; @@ -39,6 +40,7 @@ struct fault_attr { .ratelimit_state = RATELIMIT_STATE_INIT_DISABLED, \ .verbose = 2, \ .dname = NULL, \ + .no_warn = false, \ } #define DECLARE_FAULT_ATTR(name) struct fault_attr name = FAULT_ATTR_INITIALIZER -- cgit From 37462a920392cb86541650a6f4121155f11f1199 Mon Sep 17 00:00:00 2001 From: Christophe de Dinechin Date: Thu, 14 Apr 2022 17:08:54 +0200 Subject: nodemask.h: fix compilation error with GCC12 With gcc version 12.0.1 20220401 (Red Hat 12.0.1-0), building with defconfig results in the following compilation error: | CC mm/swapfile.o | mm/swapfile.c: In function `setup_swap_info': | mm/swapfile.c:2291:47: error: array subscript -1 is below array bounds | of `struct plist_node[]' [-Werror=array-bounds] | 2291 | p->avail_lists[i].prio = 1; | | ~~~~~~~~~~~~~~^~~ | In file included from mm/swapfile.c:16: | ./include/linux/swap.h:292:27: note: while referencing `avail_lists' | 292 | struct plist_node avail_lists[]; /* | | ^~~~~~~~~~~ This is due to the compiler detecting that the mask in node_states[__state] could theoretically be zero, which would lead to first_node() returning -1 through find_first_bit. I believe that the warning/error is legitimate. I first tried adding a test to check that the node mask is not emtpy, since a similar test exists in the case where MAX_NUMNODES == 1. However, adding the if statement causes other warnings to appear in for_each_cpu_node_but, because it introduces a dangling else ambiguity. And unfortunately, GCC is not smart enough to detect that the added test makes the case where (node) == -1 impossible, so it still complains with the same message. This is why I settled on replacing that with a harmless, but relatively useless (node) >= 0 test. Based on the warning for the dangling else, I also decided to fix the case where MAX_NUMNODES == 1 by moving the condition inside the for loop. It will still only be tested once. This ensures that the meaning of an else following for_each_node_mask or derivatives would not silently have a different meaning depending on the configuration. Link: https://lkml.kernel.org/r/20220414150855.2407137-3-dinechin@redhat.com Signed-off-by: Christophe de Dinechin Signed-off-by: Christophe de Dinechin Reviewed-by: Andrew Morton Cc: Ben Segall Cc: "Michael S. Tsirkin" Cc: Steven Rostedt Cc: Ingo Molnar Cc: Mel Gorman Cc: Dietmar Eggemann Cc: Vincent Guittot Cc: Paolo Bonzini Cc: Daniel Bristot de Oliveira Cc: Jason Wang Cc: Zhen Lei Cc: Juri Lelli Cc: Peter Zijlstra Cc: Signed-off-by: Andrew Morton --- include/linux/nodemask.h | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 567c3ddba2c4..c6199dbe2591 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -375,14 +375,13 @@ static inline void __nodes_fold(nodemask_t *dstp, const nodemask_t *origp, } #if MAX_NUMNODES > 1 -#define for_each_node_mask(node, mask) \ - for ((node) = first_node(mask); \ - (node) < MAX_NUMNODES; \ - (node) = next_node((node), (mask))) +#define for_each_node_mask(node, mask) \ + for ((node) = first_node(mask); \ + (node >= 0) && (node) < MAX_NUMNODES; \ + (node) = next_node((node), (mask))) #else /* MAX_NUMNODES == 1 */ -#define for_each_node_mask(node, mask) \ - if (!nodes_empty(mask)) \ - for ((node) = 0; (node) < 1; (node)++) +#define for_each_node_mask(node, mask) \ + for ((node) = 0; (node) < 1 && !nodes_empty(mask); (node)++) #endif /* MAX_NUMNODES */ /* -- cgit From 991d8d8142cad94f9c5c05db25e67fa83d6f772a Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Fri, 13 May 2022 11:34:33 +0200 Subject: topology: Remove unused cpu_cluster_mask() default_topology[] uses cpu_clustergroup_mask() for the CLS level (guarded by CONFIG_SCHED_CLUSTER) which is currently provided by x86 (arch/x86/kernel/smpboot.c) and arm64 (drivers/base/arch_topology.c). Fixes: 778c558f49a2c ("sched: Add cluster scheduler level in core and related Kconfig for ARM64") Signed-off-by: Dietmar Eggemann Signed-off-by: Peter Zijlstra (Intel) Acked-by: Barry Song Link: https://lore.kernel.org/r/20220513093433.425163-1-dietmar.eggemann@arm.com --- include/linux/topology.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/topology.h b/include/linux/topology.h index f19bc3626297..4564faafd0e1 100644 --- a/include/linux/topology.h +++ b/include/linux/topology.h @@ -240,13 +240,6 @@ static inline const struct cpumask *cpu_smt_mask(int cpu) } #endif -#if defined(CONFIG_SCHED_CLUSTER) && !defined(cpu_cluster_mask) -static inline const struct cpumask *cpu_cluster_mask(int cpu) -{ - return topology_cluster_cpumask(cpu); -} -#endif - static inline const struct cpumask *cpu_cpu_mask(int cpu) { return cpumask_of_node(cpu_to_node(cpu)); -- cgit From 5cebb40bc9554aafcc492431181f43c6231b0459 Mon Sep 17 00:00:00 2001 From: Harini Katakam Date: Wed, 18 May 2022 22:37:56 +0530 Subject: net: macb: Fix PTP one step sync support PTP one step sync packets cannot have CSUM padding and insertion in SW since time stamp is inserted on the fly by HW. In addition, ptp4l version 3.0 and above report an error when skb timestamps are reported for packets that not processed for TX TS after transmission. Add a helper to identify PTP one step sync and fix the above two errors. Add a common mask for PTP header flag field "twoStepflag". Also reset ptp OSS bit when one step is not selected. Fixes: ab91f0a9b5f4 ("net: macb: Add hardware PTP support") Fixes: 653e92a9175e ("net: macb: add support for padding and fcs computation") Signed-off-by: Harini Katakam Reviewed-by: Radhey Shyam Pandey Reviewed-by: Claudiu Beznea Link: https://lore.kernel.org/r/20220518170756.7752-1-harini.katakam@xilinx.com Signed-off-by: Jakub Kicinski --- include/linux/ptp_classify.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ptp_classify.h b/include/linux/ptp_classify.h index fefa7790dc46..2b6ea36ad162 100644 --- a/include/linux/ptp_classify.h +++ b/include/linux/ptp_classify.h @@ -43,6 +43,9 @@ #define OFF_PTP_SOURCE_UUID 22 /* PTPv1 only */ #define OFF_PTP_SEQUENCE_ID 30 +/* PTP header flag fields */ +#define PTP_FLAG_TWOSTEP BIT(1) + /* Below defines should actually be removed at some point in time. */ #define IP6_HLEN 40 #define UDP_HLEN 8 -- cgit From 827fc630e4c8087df5a8e8ee013b686bd6f13736 Mon Sep 17 00:00:00 2001 From: Muneendra Kumar Date: Thu, 19 May 2022 05:31:07 -0700 Subject: scsi: nvme-fc: Add new routine nvme_fc_io_getuuid() Add nvme_fc_io_getuuid() to the nvme-fc transport. The routine is invoked by the FC LLDD on a per-I/O request basis. The routine translates from the FC-specific request structure to the bio and the cgroup structure in order to obtain the FC appid stored in the cgroup structure. If a value is not set or a bio is not found, a NULL appid (aka uuid) will be returned to the LLDD. Link: https://lore.kernel.org/r/20220519123110.17361-2-jsmart2021@gmail.com Reviewed-by: Hannes Reinecke Reviewed-by: Himanshu Madhani Acked-by: Christoph Hellwig Signed-off-by: Muneendra Kumar Signed-off-by: James Smart Signed-off-by: Martin K. Petersen --- include/linux/nvme-fc-driver.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/nvme-fc-driver.h b/include/linux/nvme-fc-driver.h index 5358a5facdee..fa092b9be2fd 100644 --- a/include/linux/nvme-fc-driver.h +++ b/include/linux/nvme-fc-driver.h @@ -564,6 +564,15 @@ int nvme_fc_rcv_ls_req(struct nvme_fc_remote_port *remoteport, void *lsreqbuf, u32 lsreqbuf_len); +/* + * Routine called to get the appid field associated with request by the lldd + * + * If the return value is NULL : the user/libvirt has not set the appid to VM + * If the return value is non-zero: Returns the appid associated with VM + * + * @req: IO request from nvme fc to driver + */ +char *nvme_fc_io_getuuid(struct nvmefc_fcp_req *req); /* * *************** LLDD FC-NVME Target/Subsystem API *************** @@ -1048,5 +1057,10 @@ int nvmet_fc_rcv_fcp_req(struct nvmet_fc_target_port *tgtport, void nvmet_fc_rcv_fcp_abort(struct nvmet_fc_target_port *tgtport, struct nvmefc_tgt_fcp_req *fcpreq); +/* + * add a define, visible to the compiler, that indicates support + * for feature. Allows for conditional compilation in LLDDs. + */ +#define NVME_FC_FEAT_UUID 0x0001 #endif /* _NVME_FC_DRIVER_H */ -- cgit From 59df85d5fbae17175c391d89ad03e9e7a01b7a55 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Tue, 1 Mar 2022 19:56:53 -0500 Subject: linux/mount.h: trim includes Signed-off-by: Al Viro --- include/linux/mount.h | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mount.h b/include/linux/mount.h index 7f18a7555dff..b3b149dcbf96 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -11,17 +11,15 @@ #define _LINUX_MOUNT_H #include -#include -#include -#include -#include -#include +#include struct super_block; -struct vfsmount; struct dentry; -struct mnt_namespace; +struct user_namespace; +struct file_system_type; struct fs_context; +struct file; +struct path; #define MNT_NOSUID 0x01 #define MNT_NODEV 0x02 @@ -81,9 +79,6 @@ static inline struct user_namespace *mnt_user_ns(const struct vfsmount *mnt) return smp_load_acquire(&mnt->mnt_userns); } -struct file; /* forward dec */ -struct path; - extern int mnt_want_write(struct vfsmount *mnt); extern int mnt_want_write_file(struct file *file); extern void mnt_drop_write(struct vfsmount *mnt); @@ -94,12 +89,10 @@ extern struct vfsmount *mnt_clone_internal(const struct path *path); extern bool __mnt_is_readonly(struct vfsmount *mnt); extern bool mnt_may_suid(struct vfsmount *mnt); -struct path; extern struct vfsmount *clone_private_mount(const struct path *path); extern int __mnt_want_write(struct vfsmount *); extern void __mnt_drop_write(struct vfsmount *); -struct file_system_type; extern struct vfsmount *fc_mount(struct fs_context *fc); extern struct vfsmount *vfs_create_mount(struct fs_context *fc); extern struct vfsmount *vfs_kern_mount(struct file_system_type *type, -- cgit From 70f8d9c5750bbb0ca4ef7e23d6abcb05e6061138 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Wed, 2 Mar 2022 17:49:09 -0500 Subject: move mount-related externs from fs.h to mount.h Signed-off-by: Al Viro --- include/linux/fs.h | 11 ----------- include/linux/mount.h | 12 ++++++++++++ 2 files changed, 12 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index bbde95387a23..14df7c049b43 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2461,22 +2461,11 @@ struct super_block *sget(struct file_system_type *type, extern int register_filesystem(struct file_system_type *); extern int unregister_filesystem(struct file_system_type *); -extern struct vfsmount *kern_mount(struct file_system_type *); -extern void kern_unmount(struct vfsmount *mnt); -extern int may_umount_tree(struct vfsmount *); -extern int may_umount(struct vfsmount *); -extern long do_mount(const char *, const char __user *, - const char *, unsigned long, void *); -extern struct vfsmount *collect_mounts(const struct path *); -extern void drop_collected_mounts(struct vfsmount *); -extern int iterate_mounts(int (*)(struct vfsmount *, void *), void *, - struct vfsmount *); extern int vfs_statfs(const struct path *, struct kstatfs *); extern int user_statfs(const char __user *, struct kstatfs *); extern int fd_statfs(int, struct kstatfs *); extern int freeze_super(struct super_block *super); extern int thaw_super(struct super_block *super); -extern bool our_mnt(struct vfsmount *mnt); extern __printf(2, 3) int super_setup_bdi_name(struct super_block *sb, char *fmt, ...); extern int super_setup_bdi(struct super_block *sb); diff --git a/include/linux/mount.h b/include/linux/mount.h index b3b149dcbf96..55a4abaf6715 100644 --- a/include/linux/mount.h +++ b/include/linux/mount.h @@ -108,6 +108,18 @@ extern void mark_mounts_for_expiry(struct list_head *mounts); extern dev_t name_to_dev_t(const char *name); extern bool path_is_mountpoint(const struct path *path); +extern bool our_mnt(struct vfsmount *mnt); + +extern struct vfsmount *kern_mount(struct file_system_type *); +extern void kern_unmount(struct vfsmount *mnt); +extern int may_umount_tree(struct vfsmount *); +extern int may_umount(struct vfsmount *); +extern long do_mount(const char *, const char __user *, + const char *, unsigned long, void *); +extern struct vfsmount *collect_mounts(const struct path *); +extern void drop_collected_mounts(struct vfsmount *); +extern int iterate_mounts(int (*)(struct vfsmount *, void *), void *, + struct vfsmount *); extern void kern_unmount_array(struct vfsmount *mnt[], unsigned int num); #endif /* _LINUX_MOUNT_H */ -- cgit From 3bc253c2e652cf5f12cd8c00d80d8ec55d67d1a7 Mon Sep 17 00:00:00 2001 From: Geliang Tang Date: Thu, 19 May 2022 16:30:10 -0700 Subject: bpf: Add bpf_skc_to_mptcp_sock_proto This patch implements a new struct bpf_func_proto, named bpf_skc_to_mptcp_sock_proto. Define a new bpf_id BTF_SOCK_TYPE_MPTCP, and a new helper bpf_skc_to_mptcp_sock(), which invokes another new helper bpf_mptcp_sock_from_subflow() in net/mptcp/bpf.c to get struct mptcp_sock from a given subflow socket. v2: Emit BTF type, add func_id checks in verifier.c and bpf_trace.c, remove build check for CONFIG_BPF_JIT v5: Drop EXPORT_SYMBOL (Martin) Co-developed-by: Nicolas Rybowski Co-developed-by: Matthieu Baerts Signed-off-by: Nicolas Rybowski Signed-off-by: Matthieu Baerts Signed-off-by: Geliang Tang Signed-off-by: Mat Martineau Signed-off-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20220519233016.105670-2-mathew.j.martineau@linux.intel.com --- include/linux/bpf.h | 1 + include/linux/btf_ids.h | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index c107392b0ba7..a3ef078401cf 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2231,6 +2231,7 @@ extern const struct bpf_func_proto bpf_skc_to_tcp_timewait_sock_proto; extern const struct bpf_func_proto bpf_skc_to_tcp_request_sock_proto; extern const struct bpf_func_proto bpf_skc_to_udp6_sock_proto; extern const struct bpf_func_proto bpf_skc_to_unix_sock_proto; +extern const struct bpf_func_proto bpf_skc_to_mptcp_sock_proto; extern const struct bpf_func_proto bpf_copy_from_user_proto; extern const struct bpf_func_proto bpf_snprintf_btf_proto; extern const struct bpf_func_proto bpf_snprintf_proto; diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h index bc5d9cc34e4c..335a19092368 100644 --- a/include/linux/btf_ids.h +++ b/include/linux/btf_ids.h @@ -178,7 +178,8 @@ extern struct btf_id_set name; BTF_SOCK_TYPE(BTF_SOCK_TYPE_TCP6, tcp6_sock) \ BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP, udp_sock) \ BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP6, udp6_sock) \ - BTF_SOCK_TYPE(BTF_SOCK_TYPE_UNIX, unix_sock) + BTF_SOCK_TYPE(BTF_SOCK_TYPE_UNIX, unix_sock) \ + BTF_SOCK_TYPE(BTF_SOCK_TYPE_MPTCP, mptcp_sock) enum { #define BTF_SOCK_TYPE(name, str) name, -- cgit From b23316aabffa835ecc516cb81daeef5b9155e8a5 Mon Sep 17 00:00:00 2001 From: Yuntao Wang Date: Thu, 19 May 2022 23:06:10 +0800 Subject: selftests/bpf: Add missing trampoline program type to trampoline_count test Currently the trampoline_count test doesn't include any fmod_ret bpf programs, fix it to make the test cover all possible trampoline program types. Since fmod_ret bpf programs can't be attached to __set_task_comm function, as it's neither whitelisted for error injection nor a security hook, change it to bpf_modify_return_test. This patch also does some other cleanups such as removing duplicate code, dropping inconsistent comments, etc. Signed-off-by: Yuntao Wang Signed-off-by: Andrii Nakryiko Acked-by: Yonghong Song Link: https://lore.kernel.org/bpf/20220519150610.601313-1-ytcoode@gmail.com --- include/linux/bpf.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a3ef078401cf..cc4d5e394031 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -724,7 +724,7 @@ struct btf_func_model { #define BPF_TRAMP_F_RET_FENTRY_RET BIT(4) /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50 - * bytes on x86. Pick a number to fit into BPF_IMAGE_SIZE / 2 + * bytes on x86. */ #define BPF_MAX_TRAMP_LINKS 38 -- cgit From 8d50cdf8b8341770bc6367bce40c0c1bb0e1d5b3 Mon Sep 17 00:00:00 2001 From: Pawan Gupta Date: Thu, 19 May 2022 20:32:13 -0700 Subject: x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data Add the sysfs reporting file for Processor MMIO Stale Data vulnerability. It exposes the vulnerability and mitigation state similar to the existing files for the other hardware vulnerabilities. Signed-off-by: Pawan Gupta Signed-off-by: Borislav Petkov --- include/linux/cpu.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 54dc2f9a2d56..2c7477354744 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -65,6 +65,9 @@ extern ssize_t cpu_show_tsx_async_abort(struct device *dev, extern ssize_t cpu_show_itlb_multihit(struct device *dev, struct device_attribute *attr, char *buf); extern ssize_t cpu_show_srbds(struct device *dev, struct device_attribute *attr, char *buf); +extern ssize_t cpu_show_mmio_stale_data(struct device *dev, + struct device_attribute *attr, + char *buf); extern __printf(4, 5) struct device *cpu_device_create(struct device *parent, void *drvdata, -- cgit From bb12dd42d20f5513a8d1da225232af0a0743fd79 Mon Sep 17 00:00:00 2001 From: Krzysztof Kozlowski Date: Fri, 24 Sep 2021 12:56:53 +0200 Subject: powerpc/powermac: constify device_node in of_irq_parse_oldworld() The of_irq_parse_oldworld() does not modify passed device_node so make it a pointer to const for safety. Drop the extern while modifying the line. Signed-off-by: Krzysztof Kozlowski Acked-by: Rob Herring Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20210924105653.46963-2-krzysztof.kozlowski@canonical.com --- include/linux/of_irq.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/of_irq.h b/include/linux/of_irq.h index aaf219bd0354..83fccd0c9bba 100644 --- a/include/linux/of_irq.h +++ b/include/linux/of_irq.h @@ -20,12 +20,12 @@ typedef int (*of_irq_init_cb_t)(struct device_node *, struct device_node *); #if defined(CONFIG_PPC32) && defined(CONFIG_PPC_PMAC) extern unsigned int of_irq_workarounds; extern struct device_node *of_irq_dflt_pic; -extern int of_irq_parse_oldworld(struct device_node *device, int index, - struct of_phandle_args *out_irq); +int of_irq_parse_oldworld(const struct device_node *device, int index, + struct of_phandle_args *out_irq); #else /* CONFIG_PPC32 && CONFIG_PPC_PMAC */ #define of_irq_workarounds (0) #define of_irq_dflt_pic (NULL) -static inline int of_irq_parse_oldworld(struct device_node *device, int index, +static inline int of_irq_parse_oldworld(const struct device_node *device, int index, struct of_phandle_args *out_irq) { return -EINVAL; -- cgit From cd705ea857fdd859a9df09e8adda4cb4c906e8a2 Mon Sep 17 00:00:00 2001 From: Michael Walle Date: Fri, 1 Apr 2022 23:40:29 +0200 Subject: lib: add generic polynomial calculation Some temperature and voltage sensors use a polynomial to convert between raw data points and actual temperature or voltage. The polynomial is usually the result of a curve fitting of the diode characteristic. The BT1 PVT hwmon driver already uses such a polynonmial calculation which is rather generic. Move it to lib/ so other drivers can reuse it. Signed-off-by: Michael Walle Reviewed-by: Guenter Roeck Link: https://lore.kernel.org/r/20220401214032.3738095-2-michael@walle.cc Signed-off-by: Guenter Roeck --- include/linux/polynomial.h | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) create mode 100644 include/linux/polynomial.h (limited to 'include/linux') diff --git a/include/linux/polynomial.h b/include/linux/polynomial.h new file mode 100644 index 000000000000..9e074a0bb6fa --- /dev/null +++ b/include/linux/polynomial.h @@ -0,0 +1,35 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2020 BAIKAL ELECTRONICS, JSC + */ + +#ifndef _POLYNOMIAL_H +#define _POLYNOMIAL_H + +/* + * struct polynomial_term - one term descriptor of a polynomial + * @deg: degree of the term. + * @coef: multiplication factor of the term. + * @divider: distributed divider per each degree. + * @divider_leftover: divider leftover, which couldn't be redistributed. + */ +struct polynomial_term { + unsigned int deg; + long coef; + long divider; + long divider_leftover; +}; + +/* + * struct polynomial - a polynomial descriptor + * @total_divider: total data divider. + * @terms: polynomial terms, last term must have degree of 0 + */ +struct polynomial { + long total_divider; + struct polynomial_term terms[]; +}; + +long polynomial_calc(const struct polynomial *poly, long data); + +#endif -- cgit From e5d21072054fbadf41cd56062a3a14e447e8c22b Mon Sep 17 00:00:00 2001 From: Guenter Roeck Date: Wed, 11 May 2022 06:29:59 -0700 Subject: hwmon: Introduce hwmon_device_register_for_thermal The thermal subsystem registers a hwmon driver without providing chip or sysfs group information. This is for legacy reasons and would be difficult to change. At the same time, we want to enforce that chip information is provided when registering a hwmon device using hwmon_device_register_with_info(). To enable this, introduce a special API for use only by the thermal subsystem. Acked-by: Rafael J . Wysocki Signed-off-by: Guenter Roeck --- include/linux/hwmon.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hwmon.h b/include/linux/hwmon.h index 4efaf06fd2b8..14325f93c6b2 100644 --- a/include/linux/hwmon.h +++ b/include/linux/hwmon.h @@ -450,6 +450,9 @@ hwmon_device_register_with_info(struct device *dev, const struct hwmon_chip_info *info, const struct attribute_group **extra_groups); struct device * +hwmon_device_register_for_thermal(struct device *dev, const char *name, + void *drvdata); +struct device * devm_hwmon_device_register_with_info(struct device *dev, const char *name, void *drvdata, const struct hwmon_chip_info *info, -- cgit From cc4e7fa549cb69c18bc2ba7209df373ca06749e3 Mon Sep 17 00:00:00 2001 From: Julia Lawall Date: Sat, 21 May 2022 13:10:53 +0200 Subject: net: qed: fix typos in comments Spelling mistakes (triple letters) in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall Signed-off-by: David S. Miller --- include/linux/qed/qed_fcoe_if.h | 4 ++-- include/linux/qed/qed_iscsi_if.h | 4 ++-- include/linux/qed/qed_nvmetcp_if.h | 2 +- 3 files changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/qed/qed_fcoe_if.h b/include/linux/qed/qed_fcoe_if.h index 16752eca5cbd..90e3045b2dcb 100644 --- a/include/linux/qed/qed_fcoe_if.h +++ b/include/linux/qed/qed_fcoe_if.h @@ -76,7 +76,7 @@ void qed_fcoe_set_pf_params(struct qed_dev *cdev, * @fill_dev_info: fills FCoE specific information * @param cdev * @param info - * @return 0 on sucesss, otherwise error value. + * @return 0 on success, otherwise error value. * @register_ops: register FCoE operations * @param cdev * @param ops - specified using qed_iscsi_cb_ops @@ -96,7 +96,7 @@ void qed_fcoe_set_pf_params(struct qed_dev *cdev, * connection. * @param p_doorbell - qed will fill the address of the * doorbell. - * return 0 on sucesss, otherwise error value. + * return 0 on success, otherwise error value. * @release_conn: release a previously acquired fcoe connection * @param cdev * @param handle - the connection handle. diff --git a/include/linux/qed/qed_iscsi_if.h b/include/linux/qed/qed_iscsi_if.h index 494cdc3cd840..fbf7973ae9ba 100644 --- a/include/linux/qed/qed_iscsi_if.h +++ b/include/linux/qed/qed_iscsi_if.h @@ -133,7 +133,7 @@ struct qed_iscsi_cb_ops { * @fill_dev_info: fills iSCSI specific information * @param cdev * @param info - * @return 0 on sucesss, otherwise error value. + * @return 0 on success, otherwise error value. * @register_ops: register iscsi operations * @param cdev * @param ops - specified using qed_iscsi_cb_ops @@ -152,7 +152,7 @@ struct qed_iscsi_cb_ops { * connection. * @param p_doorbell - qed will fill the address of the * doorbell. - * @return 0 on sucesss, otherwise error value. + * @return 0 on success, otherwise error value. * @release_conn: release a previously acquired iscsi connection * @param cdev * @param handle - the connection handle. diff --git a/include/linux/qed/qed_nvmetcp_if.h b/include/linux/qed/qed_nvmetcp_if.h index 1d51df347560..bbfbfba51f37 100644 --- a/include/linux/qed/qed_nvmetcp_if.h +++ b/include/linux/qed/qed_nvmetcp_if.h @@ -132,7 +132,7 @@ struct nvmetcp_task_params { * connection. * @param p_doorbell - qed will fill the address of the * doorbell. - * @return 0 on sucesss, otherwise error value. + * @return 0 on success, otherwise error value. * @release_conn: release a previously acquired nvmetcp connection * @param cdev * @param handle - the connection handle. -- cgit From ad25f5cb39872ca14bcbe00816ae65c22fe04b89 Mon Sep 17 00:00:00 2001 From: David Howells Date: Sat, 21 May 2022 08:45:28 +0100 Subject: rxrpc: Fix locking issue There's a locking issue with the per-netns list of calls in rxrpc. The pieces of code that add and remove a call from the list use write_lock() and the calls procfile uses read_lock() to access it. However, the timer callback function may trigger a removal by trying to queue a call for processing and finding that it's already queued - at which point it has a spare refcount that it has to do something with. Unfortunately, if it puts the call and this reduces the refcount to 0, the call will be removed from the list. Unfortunately, since the _bh variants of the locking functions aren't used, this can deadlock. ================================ WARNING: inconsistent lock state 5.18.0-rc3-build4+ #10 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/2/25 [HC0[0]:SC1[1]:HE1:SE0] takes: ffff888107ac4038 (&rxnet->call_lock){+.?.}-{2:2}, at: rxrpc_put_call+0x103/0x14b {SOFTIRQ-ON-W} state was registered at: ... Possible unsafe locking scenario: CPU0 ---- lock(&rxnet->call_lock); lock(&rxnet->call_lock); *** DEADLOCK *** 1 lock held by ksoftirqd/2/25: #0: ffff8881008ffdb0 ((&call->timer)){+.-.}-{0:0}, at: call_timer_fn+0x5/0x23d Changes ======= ver #2) - Changed to using list_next_rcu() rather than rcu_dereference() directly. Fixes: 17926a79320a ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both") Signed-off-by: David Howells cc: Marc Dionne cc: linux-afs@lists.infradead.org Signed-off-by: David S. Miller --- include/linux/list.h | 10 ++++++++++ include/linux/seq_file.h | 4 ++++ 2 files changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/list.h b/include/linux/list.h index c147eeb2d39d..57e8b559cdf6 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -605,6 +605,16 @@ static inline void list_splice_tail_init(struct list_head *list, #define list_for_each(pos, head) \ for (pos = (head)->next; !list_is_head(pos, (head)); pos = pos->next) +/** + * list_for_each_rcu - Iterate over a list in an RCU-safe fashion + * @pos: the &struct list_head to use as a loop cursor. + * @head: the head for your list. + */ +#define list_for_each_rcu(pos, head) \ + for (pos = rcu_dereference((head)->next); \ + !list_is_head(pos, (head)); \ + pos = rcu_dereference(pos->next)) + /** * list_for_each_continue - continue iteration over a list * @pos: the &struct list_head to use as a loop cursor. diff --git a/include/linux/seq_file.h b/include/linux/seq_file.h index 60820ab511d2..bd023dd38ae6 100644 --- a/include/linux/seq_file.h +++ b/include/linux/seq_file.h @@ -277,6 +277,10 @@ extern struct list_head *seq_list_start_head(struct list_head *head, extern struct list_head *seq_list_next(void *v, struct list_head *head, loff_t *ppos); +extern struct list_head *seq_list_start_rcu(struct list_head *head, loff_t pos); +extern struct list_head *seq_list_start_head_rcu(struct list_head *head, loff_t pos); +extern struct list_head *seq_list_next_rcu(void *v, struct list_head *head, loff_t *ppos); + /* * Helpers for iteration over hlist_head-s in seq_files */ -- cgit From c304eddcecfe2513ff98ce3ae97d1c196d82ba08 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 19 May 2022 13:20:54 -0700 Subject: net: wrap the wireless pointers in struct net_device in an ifdef Most protocol-specific pointers in struct net_device are under a respective ifdef. Wireless is the notable exception. Since there's a sizable number of custom-built kernels for datacenter workloads which don't build wireless it seems reasonable to ifdefy those pointers as well. While at it move IPv4 and IPv6 pointers up, those are special for obvious reasons. Acked-by: Johannes Berg Acked-by: Stefan Schmidt # ieee802154 Acked-by: Sven Eckelmann Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/netdevice.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 1ce91cb8074a..f615a66c89e9 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2119,6 +2119,8 @@ struct net_device { /* Protocol-specific pointers */ + struct in_device __rcu *ip_ptr; + struct inet6_dev __rcu *ip6_ptr; #if IS_ENABLED(CONFIG_VLAN_8021Q) struct vlan_info __rcu *vlan_info; #endif @@ -2131,16 +2133,18 @@ struct net_device { #if IS_ENABLED(CONFIG_ATALK) void *atalk_ptr; #endif - struct in_device __rcu *ip_ptr; #if IS_ENABLED(CONFIG_DECNET) struct dn_dev __rcu *dn_ptr; #endif - struct inet6_dev __rcu *ip6_ptr; #if IS_ENABLED(CONFIG_AX25) void *ax25_ptr; #endif +#if IS_ENABLED(CONFIG_CFG80211) struct wireless_dev *ieee80211_ptr; +#endif +#if IS_ENABLED(CONFIG_IEEE802154) || IS_ENABLED(CONFIG_6LOWPAN) struct wpan_dev *ieee802154_ptr; +#endif #if IS_ENABLED(CONFIG_MPLS_ROUTING) struct mpls_dev __rcu *mpls_ptr; #endif -- cgit From 97d6fb1b48f5e6f6d58028593defe8a23641b0b4 Mon Sep 17 00:00:00 2001 From: Yuezhang Mo Date: Tue, 12 Apr 2022 12:23:10 +0900 Subject: block: add sync_blockdev_range() sync_blockdev_range() is to support syncing multiple sectors with as few block device requests as possible, it is helpful to make the block device to give full play to its performance. Signed-off-by: Yuezhang Mo Suggested-by: Christoph Hellwig Reviewed-by: Andy Wu Reviewed-by: Aoyama Wataru Reviewed-by: Christoph Hellwig Reviewed-by: Jens Axboe Acked-by: Sungjong Seo Signed-off-by: Namjae Jeon --- include/linux/blkdev.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 60d016138997..331cc6918ee9 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1547,6 +1547,7 @@ int truncate_bdev_range(struct block_device *bdev, fmode_t mode, loff_t lstart, #ifdef CONFIG_BLOCK void invalidate_bdev(struct block_device *bdev); int sync_blockdev(struct block_device *bdev); +int sync_blockdev_range(struct block_device *bdev, loff_t lstart, loff_t lend); int sync_blockdev_nowait(struct block_device *bdev); void sync_bdevs(bool wait); void printk_all_partitions(void); -- cgit From 100f59d964050020285f0c8264ce520f0c406c13 Mon Sep 17 00:00:00 2001 From: Mickaël Salaün Date: Fri, 6 May 2022 18:10:56 +0200 Subject: LSM: Remove double path_rename hook calls for RENAME_EXCHANGE MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In order to be able to identify a file exchange with renameat2(2) and RENAME_EXCHANGE, which will be useful for Landlock [1], propagate the rename flags to LSMs. This may also improve performance because of the switch from two set of LSM hook calls to only one, and because LSMs using this hook may optimize the double check (e.g. only one lock, reduce the number of path walks). AppArmor, Landlock and Tomoyo are updated to leverage this change. This should not change the current behavior (same check order), except (different level of) speed boosts. [1] https://lore.kernel.org/r/20220221212522.320243-1-mic@digikod.net Cc: James Morris Cc: Kentaro Takeda Cc: Serge E. Hallyn Acked-by: John Johansen Acked-by: Tetsuo Handa Reviewed-by: Paul Moore Signed-off-by: Mickaël Salaün Link: https://lore.kernel.org/r/20220506161102.525323-7-mic@digikod.net --- include/linux/lsm_hook_defs.h | 2 +- include/linux/lsm_hooks.h | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index db924fe379c9..eafa1d2489fd 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -100,7 +100,7 @@ LSM_HOOK(int, 0, path_link, struct dentry *old_dentry, const struct path *new_dir, struct dentry *new_dentry) LSM_HOOK(int, 0, path_rename, const struct path *old_dir, struct dentry *old_dentry, const struct path *new_dir, - struct dentry *new_dentry) + struct dentry *new_dentry, unsigned int flags) LSM_HOOK(int, 0, path_chmod, const struct path *path, umode_t mode) LSM_HOOK(int, 0, path_chown, const struct path *path, kuid_t uid, kgid_t gid) LSM_HOOK(int, 0, path_chroot, const struct path *path) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 419b5febc3ca..9acf5e368d73 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -358,6 +358,7 @@ * @old_dentry contains the dentry structure of the old link. * @new_dir contains the path structure for parent of the new link. * @new_dentry contains the dentry structure of the new link. + * @flags may contain rename options such as RENAME_EXCHANGE. * Return 0 if permission is granted. * @path_chmod: * Check for permission to change a mode of the file @path. The new -- cgit From fb70bf124b051d4ded4ce57511dfec6d3ebf2b43 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 30 Mar 2022 10:30:54 -0400 Subject: NFSD: Instantiate a struct file when creating a regular NFSv4 file There have been reports of races that cause NFSv4 OPEN(CREATE) to return an error even though the requested file was created. NFSv4 does not provide a status code for this case. To mitigate some of these problems, reorganize the NFSv4 OPEN(CREATE) logic to allocate resources before the file is actually created, and open the new file while the parent directory is still locked. Two new APIs are added: + Add an API that works like nfsd_file_acquire() but does not open the underlying file. The OPEN(CREATE) path can use this API when it already has an open file. + Add an API that is kin to dentry_open(). NFSD needs to create a file and grab an open "struct file *" atomically. The alloc_empty_file() has to be done before the inode create. If it fails (for example, because the NFS server has exceeded its max_files limit), we avoid creating the file and can still return an error to the NFS client. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=382 Signed-off-by: Chuck Lever Tested-by: JianHong Yin --- include/linux/fs.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index aa6c1bbdb8c4..b848355b5e6c 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2640,6 +2640,8 @@ static inline struct file *file_open_root_mnt(struct vfsmount *mnt, name, flags, mode); } extern struct file * dentry_open(const struct path *, int, const struct cred *); +extern struct file *dentry_create(const struct path *path, int flags, + umode_t mode, const struct cred *cred); extern struct file * open_with_fake_path(const struct path *, int, struct inode*, const struct cred *); static inline struct file *file_clone_open(struct file *file) -- cgit From bca1a1004615efe141fd78f360ecc48c60bc4ad5 Mon Sep 17 00:00:00 2001 From: Björn Ardö Date: Thu, 31 Mar 2022 09:01:15 +0200 Subject: mailbox: forward the hrtimer if not queued and under a lock MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This reverts commit c7dacf5b0f32957b24ef29df1207dc2cd8307743, "mailbox: avoid timer start from callback" The previous commit was reverted since it lead to a race that caused the hrtimer to not be started at all. The check for hrtimer_active() in msg_submit() will return true if the callback function txdone_hrtimer() is currently running. This function could return HRTIMER_NORESTART and then the timer will not be restarted, and also msg_submit() will not start the timer. This will lead to a message actually being submitted but no timer will start to check for its compleation. The original fix that added checking hrtimer_active() was added to avoid a warning with hrtimer_forward. Looking in the kernel another solution to avoid this warning is to check hrtimer_is_queued() before calling hrtimer_forward_now() instead. This however requires a lock so the timer is not started by msg_submit() inbetween this check and the hrtimer_forward() call. Fixes: c7dacf5b0f32 ("mailbox: avoid timer start from callback") Signed-off-by: Björn Ardö Signed-off-by: Jassi Brar --- include/linux/mailbox_controller.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mailbox_controller.h b/include/linux/mailbox_controller.h index 36d6ce673503..6fee33cb52f5 100644 --- a/include/linux/mailbox_controller.h +++ b/include/linux/mailbox_controller.h @@ -83,6 +83,7 @@ struct mbox_controller { const struct of_phandle_args *sp); /* Internal to API */ struct hrtimer poll_hrt; + spinlock_t poll_hrt_lock; struct list_head node; }; -- cgit From fe736565efb775620dbcf3c459c1cd80d3e868da Mon Sep 17 00:00:00 2001 From: Song Liu Date: Fri, 20 May 2022 16:57:53 -0700 Subject: bpf: Introduce bpf_arch_text_invalidate for bpf_prog_pack Introduce bpf_arch_text_invalidate and use it to fill unused part of the bpf_prog_pack with illegal instructions when a BPF program is freed. Fixes: 57631054fae6 ("bpf: Introduce bpf_prog_pack allocator") Fixes: 33c9805860e5 ("bpf: Introduce bpf_jit_binary_pack_[alloc|finalize|free]") Reported-by: Linus Torvalds Signed-off-by: Song Liu Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20220520235758.1858153-4-song@kernel.org --- include/linux/bpf.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index cc4d5e394031..a9b1875212f6 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2365,6 +2365,7 @@ int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t, void *addr1, void *addr2); void *bpf_arch_text_copy(void *dst, void *src, size_t len); +int bpf_arch_text_invalidate(void *dst, size_t len); struct btf_id_set; bool btf_id_set_contains(const struct btf_id_set *set, u32 id); -- cgit From 97e03f521050c092919591e668107b3d69c5f426 Mon Sep 17 00:00:00 2001 From: Joanne Koong Date: Mon, 23 May 2022 14:07:07 -0700 Subject: bpf: Add verifier support for dynptrs This patch adds the bulk of the verifier work for supporting dynamic pointers (dynptrs) in bpf. A bpf_dynptr is opaque to the bpf program. It is a 16-byte structure defined internally as: struct bpf_dynptr_kern { void *data; u32 size; u32 offset; } __aligned(8); The upper 8 bits of *size* is reserved (it contains extra metadata about read-only status and dynptr type). Consequently, a dynptr only supports memory less than 16 MB. There are different types of dynptrs (eg malloc, ringbuf, ...). In this patchset, the most basic one, dynptrs to a bpf program's local memory, is added. For now only local memory that is of reg type PTR_TO_MAP_VALUE is supported. In the verifier, dynptr state information will be tracked in stack slots. When the program passes in an uninitialized dynptr (ARG_PTR_TO_DYNPTR | MEM_UNINIT), the stack slots corresponding to the frame pointer where the dynptr resides at are marked STACK_DYNPTR. For helper functions that take in initialized dynptrs (eg bpf_dynptr_read + bpf_dynptr_write which are added later in this patchset), the verifier enforces that the dynptr has been initialized properly by checking that their corresponding stack slots have been marked as STACK_DYNPTR. The 6th patch in this patchset adds test cases that the verifier should successfully reject, such as for example attempting to use a dynptr after doing a direct write into it inside the bpf program. Signed-off-by: Joanne Koong Signed-off-by: Andrii Nakryiko Acked-by: Andrii Nakryiko Acked-by: David Vernet Link: https://lore.kernel.org/bpf/20220523210712.3641569-2-joannelkoong@gmail.com --- include/linux/bpf.h | 28 ++++++++++++++++++++++++++++ include/linux/bpf_verifier.h | 18 ++++++++++++++++++ 2 files changed, 46 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a9b1875212f6..b26c8176b9e0 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -392,10 +392,15 @@ enum bpf_type_flag { MEM_UNINIT = BIT(7 + BPF_BASE_TYPE_BITS), + /* DYNPTR points to memory local to the bpf program. */ + DYNPTR_TYPE_LOCAL = BIT(8 + BPF_BASE_TYPE_BITS), + __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, }; +#define DYNPTR_TYPE_FLAG_MASK DYNPTR_TYPE_LOCAL + /* Max number of base types. */ #define BPF_BASE_TYPE_LIMIT (1UL << BPF_BASE_TYPE_BITS) @@ -438,6 +443,7 @@ enum bpf_arg_type { ARG_PTR_TO_CONST_STR, /* pointer to a null terminated read-only string */ ARG_PTR_TO_TIMER, /* pointer to bpf_timer */ ARG_PTR_TO_KPTR, /* pointer to referenced kptr */ + ARG_PTR_TO_DYNPTR, /* pointer to bpf_dynptr. See bpf_type_flag for dynptr type */ __BPF_ARG_TYPE_MAX, /* Extended arg_types. */ @@ -2376,4 +2382,26 @@ int bpf_bprintf_prepare(char *fmt, u32 fmt_size, const u64 *raw_args, u32 **bin_buf, u32 num_args); void bpf_bprintf_cleanup(void); +/* the implementation of the opaque uapi struct bpf_dynptr */ +struct bpf_dynptr_kern { + void *data; + /* Size represents the number of usable bytes of dynptr data. + * If for example the offset is at 4 for a local dynptr whose data is + * of type u64, the number of usable bytes is 4. + * + * The upper 8 bits are reserved. It is as follows: + * Bits 0 - 23 = size + * Bits 24 - 30 = dynptr type + * Bit 31 = whether dynptr is read-only + */ + u32 size; + u32 offset; +} __aligned(8); + +enum bpf_dynptr_type { + BPF_DYNPTR_TYPE_INVALID, + /* Points to memory that is local to the bpf program */ + BPF_DYNPTR_TYPE_LOCAL, +}; + #endif /* _LINUX_BPF_H */ diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 1f1e7f2ea967..af5b2135215e 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -72,6 +72,18 @@ struct bpf_reg_state { u32 mem_size; /* for PTR_TO_MEM | PTR_TO_MEM_OR_NULL */ + /* For dynptr stack slots */ + struct { + enum bpf_dynptr_type type; + /* A dynptr is 16 bytes so it takes up 2 stack slots. + * We need to track which slot is the first slot + * to protect against cases where the user may try to + * pass in an address starting at the second slot of the + * dynptr. + */ + bool first_slot; + } dynptr; + /* Max size from any of the above. */ struct { unsigned long raw1; @@ -174,9 +186,15 @@ enum bpf_stack_slot_type { STACK_SPILL, /* register spilled into stack */ STACK_MISC, /* BPF program wrote some data into this slot */ STACK_ZERO, /* BPF program wrote constant zero */ + /* A dynptr is stored in this stack slot. The type of dynptr + * is stored in bpf_stack_state->spilled_ptr.dynptr.type + */ + STACK_DYNPTR, }; #define BPF_REG_SIZE 8 /* size of eBPF register in bytes */ +#define BPF_DYNPTR_SIZE sizeof(struct bpf_dynptr_kern) +#define BPF_DYNPTR_NR_SLOTS (BPF_DYNPTR_SIZE / BPF_REG_SIZE) struct bpf_stack_state { struct bpf_reg_state spilled_ptr; -- cgit From bc34dee65a65e9c920c420005b8a43f2a721a458 Mon Sep 17 00:00:00 2001 From: Joanne Koong Date: Mon, 23 May 2022 14:07:09 -0700 Subject: bpf: Dynptr support for ring buffers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Currently, our only way of writing dynamically-sized data into a ring buffer is through bpf_ringbuf_output but this incurs an extra memcpy cost. bpf_ringbuf_reserve + bpf_ringbuf_commit avoids this extra memcpy, but it can only safely support reservation sizes that are statically known since the verifier cannot guarantee that the bpf program won’t access memory outside the reserved space. The bpf_dynptr abstraction allows for dynamically-sized ring buffer reservations without the extra memcpy. There are 3 new APIs: long bpf_ringbuf_reserve_dynptr(void *ringbuf, u32 size, u64 flags, struct bpf_dynptr *ptr); void bpf_ringbuf_submit_dynptr(struct bpf_dynptr *ptr, u64 flags); void bpf_ringbuf_discard_dynptr(struct bpf_dynptr *ptr, u64 flags); These closely follow the functionalities of the original ringbuf APIs. For example, all ringbuffer dynptrs that have been reserved must be either submitted or discarded before the program exits. Signed-off-by: Joanne Koong Signed-off-by: Andrii Nakryiko Acked-by: Andrii Nakryiko Acked-by: David Vernet Link: https://lore.kernel.org/bpf/20220523210712.3641569-4-joannelkoong@gmail.com --- include/linux/bpf.h | 15 ++++++++++++++- include/linux/bpf_verifier.h | 2 ++ 2 files changed, 16 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index b26c8176b9e0..c72321b6f306 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -395,11 +395,14 @@ enum bpf_type_flag { /* DYNPTR points to memory local to the bpf program. */ DYNPTR_TYPE_LOCAL = BIT(8 + BPF_BASE_TYPE_BITS), + /* DYNPTR points to a ringbuf record. */ + DYNPTR_TYPE_RINGBUF = BIT(9 + BPF_BASE_TYPE_BITS), + __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, }; -#define DYNPTR_TYPE_FLAG_MASK DYNPTR_TYPE_LOCAL +#define DYNPTR_TYPE_FLAG_MASK (DYNPTR_TYPE_LOCAL | DYNPTR_TYPE_RINGBUF) /* Max number of base types. */ #define BPF_BASE_TYPE_LIMIT (1UL << BPF_BASE_TYPE_BITS) @@ -2231,6 +2234,9 @@ extern const struct bpf_func_proto bpf_ringbuf_reserve_proto; extern const struct bpf_func_proto bpf_ringbuf_submit_proto; extern const struct bpf_func_proto bpf_ringbuf_discard_proto; extern const struct bpf_func_proto bpf_ringbuf_query_proto; +extern const struct bpf_func_proto bpf_ringbuf_reserve_dynptr_proto; +extern const struct bpf_func_proto bpf_ringbuf_submit_dynptr_proto; +extern const struct bpf_func_proto bpf_ringbuf_discard_dynptr_proto; extern const struct bpf_func_proto bpf_skc_to_tcp6_sock_proto; extern const struct bpf_func_proto bpf_skc_to_tcp_sock_proto; extern const struct bpf_func_proto bpf_skc_to_tcp_timewait_sock_proto; @@ -2402,6 +2408,13 @@ enum bpf_dynptr_type { BPF_DYNPTR_TYPE_INVALID, /* Points to memory that is local to the bpf program */ BPF_DYNPTR_TYPE_LOCAL, + /* Underlying data is a ringbuf record */ + BPF_DYNPTR_TYPE_RINGBUF, }; +void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, + enum bpf_dynptr_type type, u32 offset, u32 size); +void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr); +int bpf_dynptr_check_size(u32 size); + #endif /* _LINUX_BPF_H */ diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index af5b2135215e..e8439f6cbe57 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -100,6 +100,8 @@ struct bpf_reg_state { * for the purpose of tracking that it's freed. * For PTR_TO_SOCKET this is used to share which pointers retain the * same reference to the socket, to determine proper reference freeing. + * For stack slots that are dynptrs, this is used to track references to + * the dynptr to determine proper reference freeing. */ u32 id; /* PTR_TO_SOCKET and PTR_TO_TCP_SOCK could be a ptr returned -- cgit From 34d4ef5775f776ec4b0d53a02d588bf3195cada6 Mon Sep 17 00:00:00 2001 From: Joanne Koong Date: Mon, 23 May 2022 14:07:11 -0700 Subject: bpf: Add dynptr data slices This patch adds a new helper function void *bpf_dynptr_data(struct bpf_dynptr *ptr, u32 offset, u32 len); which returns a pointer to the underlying data of a dynptr. *len* must be a statically known value. The bpf program may access the returned data slice as a normal buffer (eg can do direct reads and writes), since the verifier associates the length with the returned pointer, and enforces that no out of bounds accesses occur. Signed-off-by: Joanne Koong Signed-off-by: Andrii Nakryiko Acked-by: Yonghong Song Link: https://lore.kernel.org/bpf/20220523210712.3641569-6-joannelkoong@gmail.com --- include/linux/bpf.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index c72321b6f306..a7080c86fa76 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -488,6 +488,7 @@ enum bpf_return_type { RET_PTR_TO_TCP_SOCK_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_TCP_SOCK, RET_PTR_TO_SOCK_COMMON_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_SOCK_COMMON, RET_PTR_TO_ALLOC_MEM_OR_NULL = PTR_MAYBE_NULL | MEM_ALLOC | RET_PTR_TO_ALLOC_MEM, + RET_PTR_TO_DYNPTR_MEM_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_ALLOC_MEM, RET_PTR_TO_BTF_ID_OR_NULL = PTR_MAYBE_NULL | RET_PTR_TO_BTF_ID, /* This must be the last entry. Its purpose is to ensure the enum is -- cgit From 5d7c854593a460706dacf8e1b16c9bdcb1c2d7bb Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Mon, 28 Mar 2022 08:26:48 +0200 Subject: livepatch: Remove klp_arch_set_pc() and asm/livepatch.h All three versions of klp_arch_set_pc() do exactly the same: they call ftrace_instruction_pointer_set(). Call ftrace_instruction_pointer_set() directly and remove klp_arch_set_pc(). As klp_arch_set_pc() was the only thing remaining in asm/livepatch.h on x86 and s390, remove asm/livepatch.h livepatch.h remains on powerpc but its content is exclusively used by powerpc specific code. Signed-off-by: Christophe Leroy Acked-by: Petr Mladek Acked-by: Peter Zijlstra (Intel) Acked-by: Miroslav Benes Signed-off-by: Petr Mladek --- include/linux/livepatch.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h index 2614247a9781..293e29960c6e 100644 --- a/include/linux/livepatch.h +++ b/include/linux/livepatch.h @@ -16,8 +16,6 @@ #if IS_ENABLED(CONFIG_LIVEPATCH) -#include - /* task patch states */ #define KLP_UNDEFINED -1 #define KLP_UNPATCHED 0 -- cgit From 7b4537199a4a8480b8c3ba37a2d44765ce76cd9b Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Fri, 13 May 2022 20:39:22 +0900 Subject: kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS include/{linux,asm-generic}/export.h defines a weak symbol, __crc_* as a placeholder. Genksyms writes the version CRCs into the linker script, which will be used for filling the __crc_* symbols. The linker script format depends on CONFIG_MODULE_REL_CRCS. If it is enabled, __crc_* holds the offset to the reference of CRC. It is time to get rid of this complexity. Now that modpost parses text files (.*.cmd) to collect all the CRCs, it can generate C code that will be linked to the vmlinux or modules. Generate a new C file, .vmlinux.export.c, which contains the CRCs of symbols exported by vmlinux. It is compiled and linked to vmlinux in scripts/link-vmlinux.sh. Put the CRCs of symbols exported by modules into the existing *.mod.c files. No additional build step is needed for modules. As before, *.mod.c are compiled and linked to *.ko in scripts/Makefile.modfinal. No linker magic is used here. The new C implementation works in the same way, whether CONFIG_RELOCATABLE is enabled or not. CONFIG_MODULE_REL_CRCS is no longer needed. Previously, Kbuild invoked additional $(LD) to update the CRCs in objects, but this step is unneeded too. Signed-off-by: Masahiro Yamada Tested-by: Nathan Chancellor Tested-by: Nicolas Schier Reviewed-by: Nicolas Schier Tested-by: Sedat Dilek # LLVM-14 (x86-64) --- include/linux/export-internal.h | 17 +++++++++++++++++ include/linux/export.h | 30 ++++++++---------------------- 2 files changed, 25 insertions(+), 22 deletions(-) create mode 100644 include/linux/export-internal.h (limited to 'include/linux') diff --git a/include/linux/export-internal.h b/include/linux/export-internal.h new file mode 100644 index 000000000000..c2b1d4fd5987 --- /dev/null +++ b/include/linux/export-internal.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Please do not include this explicitly. + * This is used by C files generated by modpost. + */ + +#ifndef __LINUX_EXPORT_INTERNAL_H__ +#define __LINUX_EXPORT_INTERNAL_H__ + +#include +#include + +/* __used is needed to keep __crc_* for LTO */ +#define SYMBOL_CRC(sym, crc, sec) \ + u32 __section("___kcrctab" sec "+" #sym) __used __crc_##sym = crc + +#endif /* __LINUX_EXPORT_INTERNAL_H__ */ diff --git a/include/linux/export.h b/include/linux/export.h index 27d848712b90..565c5ffcb26f 100644 --- a/include/linux/export.h +++ b/include/linux/export.h @@ -11,6 +11,14 @@ * hackers place grumpy comments in header files. */ +/* + * This comment block is used by fixdep. Please do not remove. + * + * When CONFIG_MODVERSIONS is changed from n to y, all source files having + * EXPORT_SYMBOL variants must be re-compiled because genksyms is run as a + * side effect of the *.o build rule. + */ + #ifndef __ASSEMBLY__ #ifdef MODULE extern struct module __this_module; @@ -19,26 +27,6 @@ extern struct module __this_module; #define THIS_MODULE ((struct module *)0) #endif -#ifdef CONFIG_MODVERSIONS -/* Mark the CRC weak since genksyms apparently decides not to - * generate a checksums for some symbols */ -#if defined(CONFIG_MODULE_REL_CRCS) -#define __CRC_SYMBOL(sym, sec) \ - asm(" .section \"___kcrctab" sec "+" #sym "\", \"a\" \n" \ - " .weak __crc_" #sym " \n" \ - " .long __crc_" #sym " - . \n" \ - " .previous \n") -#else -#define __CRC_SYMBOL(sym, sec) \ - asm(" .section \"___kcrctab" sec "+" #sym "\", \"a\" \n" \ - " .weak __crc_" #sym " \n" \ - " .long __crc_" #sym " \n" \ - " .previous \n") -#endif -#else -#define __CRC_SYMBOL(sym, sec) -#endif - #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS #include /* @@ -85,7 +73,6 @@ struct kernel_symbol { /* * For every exported symbol, do the following: * - * - If applicable, place a CRC entry in the __kcrctab section. * - Put the name of the symbol and namespace (empty string "" for none) in * __ksymtab_strings. * - Place a struct kernel_symbol entry in the __ksymtab section. @@ -98,7 +85,6 @@ struct kernel_symbol { extern typeof(sym) sym; \ extern const char __kstrtab_##sym[]; \ extern const char __kstrtabns_##sym[]; \ - __CRC_SYMBOL(sym, sec); \ asm(" .section \"__ksymtab_strings\",\"aMS\",%progbits,1 \n" \ "__kstrtab_" #sym ": \n" \ " .asciz \"" #sym "\" \n" \ -- cgit From 421cfe6596f6cb316991c02bf30a93bd81092853 Mon Sep 17 00:00:00 2001 From: Matthew Rosato Date: Thu, 19 May 2022 14:33:11 -0400 Subject: vfio: remove VFIO_GROUP_NOTIFY_SET_KVM Rather than relying on a notifier for associating the KVM with the group, let's assume that the association has already been made prior to device_open. The first time a device is opened associate the group KVM with the device. This fixes a user-triggerable oops in GVT. Reviewed-by: Tony Krowiak Reviewed-by: Kevin Tian Reviewed-by: Christoph Hellwig Signed-off-by: Jason Gunthorpe Signed-off-by: Matthew Rosato Reviewed-by: Jason Gunthorpe Acked-by: Zhi Wang Link: https://lore.kernel.org/r/20220519183311.582380-2-mjrosato@linux.ibm.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 45b287826ce6..aa888cc51757 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -36,6 +36,8 @@ struct vfio_device { struct vfio_device_set *dev_set; struct list_head dev_set_list; unsigned int migration_flags; + /* Driver must reference the kvm during open_device or never touch it */ + struct kvm *kvm; /* Members below here are private, not for driver use */ refcount_t refcount; @@ -155,15 +157,11 @@ extern int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, /* each type has independent events */ enum vfio_notify_type { VFIO_IOMMU_NOTIFY = 0, - VFIO_GROUP_NOTIFY = 1, }; /* events for VFIO_IOMMU_NOTIFY */ #define VFIO_IOMMU_NOTIFY_DMA_UNMAP BIT(0) -/* events for VFIO_GROUP_NOTIFY */ -#define VFIO_GROUP_NOTIFY_SET_KVM BIT(0) - extern int vfio_register_notifier(struct vfio_device *device, enum vfio_notify_type type, unsigned long *required_events, -- cgit From eadb2f47a3ced5c64b23b90fd2a3463f63726066 Mon Sep 17 00:00:00 2001 From: Daniel Thompson Date: Mon, 23 May 2022 19:11:02 +0100 Subject: lockdown: also lock down previous kgdb use KGDB and KDB allow read and write access to kernel memory, and thus should be restricted during lockdown. An attacker with access to a serial port (for example, via a hypervisor console, which some cloud vendors provide over the network) could trigger the debugger so it is important that the debugger respect the lockdown mode when/if it is triggered. Fix this by integrating lockdown into kdb's existing permissions mechanism. Unfortunately kgdb does not have any permissions mechanism (although it certainly could be added later) so, for now, kgdb is simply and brutally disabled by immediately exiting the gdb stub without taking any action. For lockdowns established early in the boot (e.g. the normal case) then this should be fine but on systems where kgdb has set breakpoints before the lockdown is enacted than "bad things" will happen. CVE: CVE-2022-21499 Co-developed-by: Stephen Brennan Signed-off-by: Stephen Brennan Reviewed-by: Douglas Anderson Signed-off-by: Daniel Thompson Signed-off-by: Linus Torvalds --- include/linux/security.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/security.h b/include/linux/security.h index 25b3ef71f495..7fc4e9f49f54 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -121,10 +121,12 @@ enum lockdown_reason { LOCKDOWN_DEBUGFS, LOCKDOWN_XMON_WR, LOCKDOWN_BPF_WRITE_USER, + LOCKDOWN_DBG_WRITE_KERNEL, LOCKDOWN_INTEGRITY_MAX, LOCKDOWN_KCORE, LOCKDOWN_KPROBES, LOCKDOWN_BPF_READ_KERNEL, + LOCKDOWN_DBG_READ_KERNEL, LOCKDOWN_PERF, LOCKDOWN_TRACEFS, LOCKDOWN_XMON_RW, -- cgit From 93984f19e7bce4c18084a6ef3dacafb155b806ed Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 29 Apr 2022 21:00:23 +0000 Subject: KVM: Fully serialize gfn=>pfn cache refresh via mutex Protect gfn=>pfn cache refresh with a mutex to fully serialize refreshes. The refresh logic doesn't protect against - concurrent unmaps, or refreshes with different GPAs (which may or may not happen in practice, for example if a cache is only used under vcpu->mutex; but it's allowed in the code) - a false negative on the memslot generation. If the first refresh sees a stale memslot generation, it will refresh the hva and generation before moving on to the hva=>pfn translation. If it then drops gpc->lock, a different user of the cache can come along, acquire gpc->lock, see that the memslot generation is fresh, and skip the hva=>pfn update due to the userspace address also matching (because it too was updated). The refresh path can already sleep during hva=>pfn resolution, so wrap the refresh with a mutex to ensure that any given refresh runs to completion before other callers can start their refresh. Cc: stable@vger.kernel.org Cc: Lai Jiangshan Signed-off-by: Sean Christopherson Message-Id: <20220429210025.3293691-7-seanjc@google.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_types.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index ac1ebb37a0ff..f328a01db4fe 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ -19,6 +19,7 @@ struct kvm_memslots; enum kvm_mr_change; #include +#include #include #include @@ -69,6 +70,7 @@ struct gfn_to_pfn_cache { struct kvm_vcpu *vcpu; struct list_head list; rwlock_t lock; + struct mutex refresh_lock; void *khva; kvm_pfn_t pfn; enum pfn_cache_usage usage; -- cgit From 43994049180704fd1faf78623fabd9a5cd443708 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Wed, 4 May 2022 12:36:31 +0900 Subject: kprobes: Fix build errors with CONFIG_KRETPROBES=n MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Max Filippov reported: When building kernel with CONFIG_KRETPROBES=n kernel/kprobes.c compilation fails with the following messages: kernel/kprobes.c: In function ‘recycle_rp_inst’: kernel/kprobes.c:1273:32: error: implicit declaration of function ‘get_kretprobe’ kernel/kprobes.c: In function ‘kprobe_flush_task’: kernel/kprobes.c:1299:35: error: ‘struct task_struct’ has no member named ‘kretprobe_instances’ This came from the commit d741bf41d7c7 ("kprobes: Remove kretprobe hash") which introduced get_kretprobe() and kretprobe_instances member in task_struct when CONFIG_KRETPROBES=y, but did not make recycle_rp_inst() and kprobe_flush_task() depending on CONFIG_KRETPORBES. Since those functions are only used for kretprobe, move those functions into #ifdef CONFIG_KRETPROBE area. Link: https://lkml.kernel.org/r/165163539094.74407.3838114721073251225.stgit@devnote2 Reported-by: Max Filippov Fixes: d741bf41d7c7 ("kprobes: Remove kretprobe hash") Cc: "Naveen N . Rao" Cc: Anil S Keshavamurthy Cc: "David S . Miller" Cc: Peter Zijlstra Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu Tested-by: Max Filippov Signed-off-by: Steven Rostedt (Google) --- include/linux/kprobes.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index 157168769fc2..55041d2f884d 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -424,7 +424,7 @@ void unregister_kretprobe(struct kretprobe *rp); int register_kretprobes(struct kretprobe **rps, int num); void unregister_kretprobes(struct kretprobe **rps, int num); -#ifdef CONFIG_KRETPROBE_ON_RETHOOK +#if defined(CONFIG_KRETPROBE_ON_RETHOOK) || !defined(CONFIG_KRETPROBES) #define kprobe_flush_task(tk) do {} while (0) #else void kprobe_flush_task(struct task_struct *tk); -- cgit From 3a2bfec0b02f2226ff3376a5d2ff604d799bd7ea Mon Sep 17 00:00:00 2001 From: Li kunyu Date: Wed, 18 May 2022 10:36:40 +0800 Subject: ftrace: Remove return value of ftrace_arch_modify_*() All instances of the function ftrace_arch_modify_prepare() and ftrace_arch_modify_post_process() return zero. There's no point in checking their return value. Just have them be void functions. Link: https://lkml.kernel.org/r/20220518023639.4065-1-kunyu@nfschina.com Signed-off-by: Li kunyu Signed-off-by: Steven Rostedt (Google) --- include/linux/ftrace.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 4816b7e11047..a5f74f6e7e4e 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -449,8 +449,8 @@ static inline void stack_tracer_enable(void) { } #ifdef CONFIG_DYNAMIC_FTRACE -int ftrace_arch_code_modify_prepare(void); -int ftrace_arch_code_modify_post_process(void); +void ftrace_arch_code_modify_prepare(void); +void ftrace_arch_code_modify_post_process(void); enum ftrace_bug_type { FTRACE_BUG_UNKNOWN, -- cgit From 656d054e0a15ec327bd82801ccd58201e59f6896 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 2 May 2022 12:30:20 +0200 Subject: jump_label,noinstr: Avoid instrumentation for JUMP_LABEL=n builds When building x86_64 with JUMP_LABEL=n it's possible for instrumentation to sneak into noinstr: vmlinux.o: warning: objtool: exit_to_user_mode+0x14: call to static_key_count.constprop.0() leaves .noinstr.text section vmlinux.o: warning: objtool: syscall_exit_to_user_mode+0x2d: call to static_key_count.constprop.0() leaves .noinstr.text section vmlinux.o: warning: objtool: irqentry_exit_to_user_mode+0x1b: call to static_key_count.constprop.0() leaves .noinstr.text section Switch to arch_ prefixed atomic to avoid the explicit instrumentation. Reported-by: kernel test robot Signed-off-by: Peter Zijlstra (Intel) --- include/linux/jump_label.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index 107751cc047b..bf1eef337a07 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -256,9 +256,9 @@ extern void static_key_disable_cpuslocked(struct static_key *key); #include #include -static inline int static_key_count(struct static_key *key) +static __always_inline int static_key_count(struct static_key *key) { - return atomic_read(&key->enabled); + return arch_atomic_read(&key->enabled); } static __always_inline void jump_label_init(void) -- cgit From 620f8d3bd3d5e82dff8cc591c831827d4beeae2e Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Sat, 7 May 2022 13:35:37 +0200 Subject: context_tracking: Always inline empty stubs Because GCC is seriously challenged.. vmlinux.o: warning: objtool: enter_from_user_mode+0x85: call to context_tracking_enabled() leaves .noinstr.text section vmlinux.o: warning: objtool: syscall_enter_from_user_mode+0x8f: call to context_tracking_enabled() leaves .noinstr.text section vmlinux.o: warning: objtool: syscall_enter_from_user_mode_prepare+0x85: call to context_tracking_enabled() leaves .noinstr.text section vmlinux.o: warning: objtool: irqentry_enter_from_user_mode+0x85: call to context_tracking_enabled() leaves .noinstr.text section Signed-off-by: Peter Zijlstra (Intel) Acked-by: Mark Rutland Link: https://lkml.kernel.org/r/20220526105958.134113388@infradead.org --- include/linux/context_tracking_state.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h index 65a60d3313b0..ae1e63e26947 100644 --- a/include/linux/context_tracking_state.h +++ b/include/linux/context_tracking_state.h @@ -46,10 +46,10 @@ static __always_inline bool context_tracking_in_user(void) return __this_cpu_read(context_tracking.state) == CONTEXT_USER; } #else -static inline bool context_tracking_in_user(void) { return false; } -static inline bool context_tracking_enabled(void) { return false; } -static inline bool context_tracking_enabled_cpu(int cpu) { return false; } -static inline bool context_tracking_enabled_this_cpu(void) { return false; } +static __always_inline bool context_tracking_in_user(void) { return false; } +static __always_inline bool context_tracking_enabled(void) { return false; } +static __always_inline bool context_tracking_enabled_cpu(int cpu) { return false; } +static __always_inline bool context_tracking_enabled_this_cpu(void) { return false; } #endif /* CONFIG_CONTEXT_TRACKING */ #endif -- cgit From b9684a71fca793213378dd410cd11675d973eaa1 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 27 May 2022 07:58:06 +0200 Subject: block, loop: support partitions without scanning Historically we did distinguish between a flag that surpressed partition scanning, and a combinations of the minors variable and another flag if any partitions were supported. This was generally confusing and doesn't make much sense, but some corner case uses of the loop driver actually do want to support manually added partitions on a device that does not actively scan for partitions. To make things worsee the loop driver also wants to dynamically toggle the scanning for partitions on a live gendisk, which makes the disk->flags updates non-atomic. Introduce a new GD_SUPPRESS_PART_SCAN bit in disk->state that disables just scanning for partitions, and toggle that instead of GENHD_FL_NO_PART in the loop driver. Fixes: 1ebe2e5f9d68 ("block: remove GENHD_FL_EXT_DEVT") Reported-by: Ming Lei Signed-off-by: Christoph Hellwig Reviewed-by: Ming Lei Link: https://lore.kernel.org/r/20220527055806.1972352-1-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index afad3d1d0dac..691b4c15b8ce 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -147,6 +147,7 @@ struct gendisk { #define GD_DEAD 2 #define GD_NATIVE_CAPACITY 3 #define GD_ADDED 4 +#define GD_SUPPRESS_PART_SCAN 5 struct mutex open_mutex; /* open/close mutex */ unsigned open_partitions; /* number of open partitions */ -- cgit From 3e35142ef99fe6b4fe5d834ad43ee13cca10a2dc Mon Sep 17 00:00:00 2001 From: "Naveen N. Rao" Date: Thu, 19 May 2022 14:42:37 +0530 Subject: kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add] Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") [1], binutils (v2.36+) started dropping section symbols that it thought were unused. This isn't an issue in general, but with kexec_file.c, gcc is placing kexec_arch_apply_relocations[_add] into a separate .text.unlikely section and the section symbol ".text.unlikely" is being dropped. Due to this, recordmcount is unable to find a non-weak symbol in .text.unlikely to generate a relocation record against. Address this by dropping the weak attribute from these functions. Instead, follow the existing pattern of having architectures #define the name of the function they want to override in their headers. [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=d1bcae833b32f1 [akpm@linux-foundation.org: arch/s390/include/asm/kexec.h needs linux/module.h] Link: https://lkml.kernel.org/r/20220519091237.676736-1-naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Michael Ellerman Signed-off-by: Naveen N. Rao Cc: "Eric W. Biederman" Cc: Signed-off-by: Andrew Morton --- include/linux/kexec.h | 46 ++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 38 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 58d1b58a971e..fcd5035209f1 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -193,14 +193,6 @@ void *kexec_purgatory_get_symbol_addr(struct kimage *image, const char *name); int arch_kexec_kernel_image_probe(struct kimage *image, void *buf, unsigned long buf_len); void *arch_kexec_kernel_image_load(struct kimage *image); -int arch_kexec_apply_relocations_add(struct purgatory_info *pi, - Elf_Shdr *section, - const Elf_Shdr *relsec, - const Elf_Shdr *symtab); -int arch_kexec_apply_relocations(struct purgatory_info *pi, - Elf_Shdr *section, - const Elf_Shdr *relsec, - const Elf_Shdr *symtab); int arch_kimage_file_post_load_cleanup(struct kimage *image); #ifdef CONFIG_KEXEC_SIG int arch_kexec_kernel_verify_sig(struct kimage *image, void *buf, @@ -229,6 +221,44 @@ extern int crash_exclude_mem_range(struct crash_mem *mem, unsigned long long mend); extern int crash_prepare_elf64_headers(struct crash_mem *mem, int kernel_map, void **addr, unsigned long *sz); + +#ifndef arch_kexec_apply_relocations_add +/* + * arch_kexec_apply_relocations_add - apply relocations of type RELA + * @pi: Purgatory to be relocated. + * @section: Section relocations applying to. + * @relsec: Section containing RELAs. + * @symtab: Corresponding symtab. + * + * Return: 0 on success, negative errno on error. + */ +static inline int +arch_kexec_apply_relocations_add(struct purgatory_info *pi, Elf_Shdr *section, + const Elf_Shdr *relsec, const Elf_Shdr *symtab) +{ + pr_err("RELA relocation unsupported.\n"); + return -ENOEXEC; +} +#endif + +#ifndef arch_kexec_apply_relocations +/* + * arch_kexec_apply_relocations - apply relocations of type REL + * @pi: Purgatory to be relocated. + * @section: Section relocations applying to. + * @relsec: Section containing RELs. + * @symtab: Corresponding symtab. + * + * Return: 0 on success, negative errno on error. + */ +static inline int +arch_kexec_apply_relocations(struct purgatory_info *pi, Elf_Shdr *section, + const Elf_Shdr *relsec, const Elf_Shdr *symtab) +{ + pr_err("REL relocation unsupported.\n"); + return -ENOEXEC; +} +#endif #endif /* CONFIG_KEXEC_FILE */ #ifdef CONFIG_KEXEC_ELF -- cgit From 9f186f9e5fa9ebdaef909fd45f825a6ce281f13c Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Thu, 19 May 2022 20:50:26 +0800 Subject: mm/swapfile: unuse_pte can map random data if swap read fails Patch series "A few fixup patches for mm", v4. This series contains a few patches to avoid mapping random data if swap read fails and fix lost swap bits in unuse_pte. Also we free hwpoison and swapin error entry in madvise_free_pte_range and so on. More details can be found in the respective changelogs. This patch (of 5): There is a bug in unuse_pte(): when swap page happens to be unreadable, page filled with random data is mapped into user address space. In case of error, a special swap entry indicating swap read fails is set to the page table. So the swapcache page can be freed and the user won't end up with a permanently mounted swap because a sector is bad. And if the page is accessed later, the user process will be killed so that corrupted data is never consumed. On the other hand, if the page is never accessed, the user won't even notice it. Link: https://lkml.kernel.org/r/20220519125030.21486-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220519125030.21486-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Acked-by: David Hildenbrand Cc: Hugh Dickins Cc: Matthew Wilcox (Oracle) Cc: Vlastimil Babka Cc: David Howells Cc: NeilBrown Cc: Alistair Popple Cc: Suren Baghdasaryan Cc: Peter Xu Cc: Ralph Campbell Cc: Naoya Horiguchi Signed-off-by: Andrew Morton --- include/linux/swap.h | 7 ++++++- include/linux/swapops.h | 10 ++++++++++ 2 files changed, 16 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index f3ae17b43f20..0c0fed1b348f 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -55,6 +55,10 @@ static inline int current_is_kswapd(void) * actions on faults. */ +#define SWP_SWAPIN_ERROR_NUM 1 +#define SWP_SWAPIN_ERROR (MAX_SWAPFILES + SWP_HWPOISON_NUM + \ + SWP_MIGRATION_NUM + SWP_DEVICE_NUM + \ + SWP_PTE_MARKER_NUM) /* * PTE markers are used to persist information onto PTEs that are mapped with * file-backed memories. As its name "PTE" hints, it should only be applied to @@ -120,7 +124,8 @@ static inline int current_is_kswapd(void) #define MAX_SWAPFILES \ ((1 << MAX_SWAPFILES_SHIFT) - SWP_DEVICE_NUM - \ - SWP_MIGRATION_NUM - SWP_HWPOISON_NUM - SWP_PTE_MARKER_NUM) + SWP_MIGRATION_NUM - SWP_HWPOISON_NUM - \ + SWP_PTE_MARKER_NUM - SWP_SWAPIN_ERROR_NUM) /* * Magic header for a swap area. The first part of the union is diff --git a/include/linux/swapops.h b/include/linux/swapops.h index fe220df499f1..f24775b41880 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -108,6 +108,16 @@ static inline void *swp_to_radix_entry(swp_entry_t entry) return xa_mk_value(entry.val); } +static inline swp_entry_t make_swapin_error_entry(struct page *page) +{ + return swp_entry(SWP_SWAPIN_ERROR, page_to_pfn(page)); +} + +static inline int is_swapin_error_entry(swp_entry_t entry) +{ + return swp_type(entry) == SWP_SWAPIN_ERROR; +} + #if IS_ENABLED(CONFIG_DEVICE_PRIVATE) static inline swp_entry_t make_readable_device_private_entry(pgoff_t offset) { -- cgit From 1c563432588dbffa71e67ca6e37c826f9fa86e04 Mon Sep 17 00:00:00 2001 From: Minchan Kim Date: Tue, 24 May 2022 10:15:25 -0700 Subject: mm: fix is_pinnable_page against a cma page Pages in the CMA area could have MIGRATE_ISOLATE as well as MIGRATE_CMA so the current is_pinnable_page() could miss CMA pages which have MIGRATE_ISOLATE. It ends up pinning CMA pages as longterm for the pin_user_pages() API so CMA allocations keep failing until the pin is released. CPU 0 CPU 1 - Task B cma_alloc alloc_contig_range pin_user_pages_fast(FOLL_LONGTERM) change pageblock as MIGRATE_ISOLATE internal_get_user_pages_fast lockless_pages_from_mm gup_pte_range try_grab_folio is_pinnable_page return true; So, pinned the page successfully. page migration failure with pinned page .. .. After 30 sec unpin_user_page(page) CMA allocation succeeded after 30 sec. The CMA allocation path protects the migration type change race using zone->lock but what GUP path need to know is just whether the page is on CMA area or not rather than exact migration type. Thus, we don't need zone->lock but just checks migration type in either of (MIGRATE_ISOLATE and MIGRATE_CMA). Adding the MIGRATE_ISOLATE check in is_pinnable_page could cause rejecting of pinning pages on MIGRATE_ISOLATE pageblocks even though it's neither CMA nor movable zone if the page is temporarily unmovable. However, such a migration failure by unexpected temporal refcount holding is general issue, not only come from MIGRATE_ISOLATE and the MIGRATE_ISOLATE is also transient state like other temporal elevated refcount problem. Link: https://lkml.kernel.org/r/20220524171525.976723-1-minchan@kernel.org Signed-off-by: Minchan Kim Reviewed-by: John Hubbard Acked-by: Paul E. McKenney Cc: David Hildenbrand Signed-off-by: Andrew Morton --- include/linux/mm.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index de32c0383387..93a46ff33dc2 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1594,8 +1594,13 @@ static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma, #ifdef CONFIG_MIGRATION static inline bool is_pinnable_page(struct page *page) { - return !(is_zone_movable_page(page) || is_migrate_cma_page(page)) || - is_zero_pfn(page_to_pfn(page)); +#ifdef CONFIG_CMA + int mt = get_pageblock_migratetype(page); + + if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) + return false; +#endif + return !(is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page))); } #else static inline bool is_pinnable_page(struct page *page) -- cgit From 98d40e76652e9aeb3aec4065f600d633ed335e94 Mon Sep 17 00:00:00 2001 From: Hannes Reinecke Date: Tue, 24 May 2022 07:56:30 +0200 Subject: block: document BLK_STS_AGAIN usage BLK_STS_AGAIN should only be used if RQF_NOWAIT is set and the bio would block. So we'd better document that to avoid accidental misuse. Signed-off-by: Hannes Reinecke Reviewed-by: Chaitanya Kulkarni Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220524055631.85480-2-hare@suse.de Signed-off-by: Jens Axboe --- include/linux/blk_types.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 40e815400611..8b38367d1bc7 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -105,6 +105,10 @@ typedef u16 blk_short_t; /* hack for device mapper, don't use elsewhere: */ #define BLK_STS_DM_REQUEUE ((__force blk_status_t)11) +/* + * BLK_STS_AGAIN should only be returned if RQF_NOWAIT is set + * and the bio would block (cf bio_wouldblock_error()) + */ #define BLK_STS_AGAIN ((__force blk_status_t)12) /* -- cgit From e2e530867245d051dc7800b0d07193b3e581f5b9 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 24 May 2022 14:15:30 +0200 Subject: blk-mq: remove the done argument to blk_execute_rq_nowait Let the caller set it together with the end_io_data instead of passing a pointless argument. Note the the target code did in fact already set it and then just overrode it again by calling blk_execute_rq_nowait. Signed-off-by: Christoph Hellwig Reviewed-by: Keith Busch Reviewed-by: Kanchan Joshi Reviewed-by: Chaitanya Kulkarni Link: https://lore.kernel.org/r/20220524121530.943123-4-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 9f07061418db..e2d9daf7e8dd 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -969,8 +969,7 @@ int blk_rq_unmap_user(struct bio *); int blk_rq_map_kern(struct request_queue *, struct request *, void *, unsigned int, gfp_t); int blk_rq_append_bio(struct request *rq, struct bio *bio); -void blk_execute_rq_nowait(struct request *rq, bool at_head, - rq_end_io_fn *end_io); +void blk_execute_rq_nowait(struct request *rq, bool at_head); blk_status_t blk_execute_rq(struct request *rq, bool at_head); struct req_iterator { -- cgit From 01357a5a45ed8eb9543183f5c9c6713ae60fc1f3 Mon Sep 17 00:00:00 2001 From: Christian König Date: Sun, 24 Apr 2022 16:55:14 +0200 Subject: dma-buf: cleanup dma_fence_unwrap implementation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Move the code from the inline functions into exported functions. Signed-off-by: Christian König Acked-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220518135844.3338-3-christian.koenig@amd.com --- include/linux/dma-fence-unwrap.h | 52 ++++------------------------------------ 1 file changed, 4 insertions(+), 48 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-fence-unwrap.h b/include/linux/dma-fence-unwrap.h index 77e335a1bcac..e7c219da4ed7 100644 --- a/include/linux/dma-fence-unwrap.h +++ b/include/linux/dma-fence-unwrap.h @@ -1,7 +1,5 @@ /* SPDX-License-Identifier: GPL-2.0-only */ /* - * fence-chain: chain fences together in a timeline - * * Copyright (C) 2022 Advanced Micro Devices, Inc. * Authors: * Christian König @@ -10,8 +8,7 @@ #ifndef __LINUX_DMA_FENCE_UNWRAP_H #define __LINUX_DMA_FENCE_UNWRAP_H -#include -#include +struct dma_fence; /** * struct dma_fence_unwrap - cursor into the container structure @@ -33,50 +30,9 @@ struct dma_fence_unwrap { unsigned int index; }; -/* Internal helper to start new array iteration, don't use directly */ -static inline struct dma_fence * -__dma_fence_unwrap_array(struct dma_fence_unwrap * cursor) -{ - cursor->array = dma_fence_chain_contained(cursor->chain); - cursor->index = 0; - return dma_fence_array_first(cursor->array); -} - -/** - * dma_fence_unwrap_first - return the first fence from fence containers - * @head: the entrypoint into the containers - * @cursor: current position inside the containers - * - * Unwraps potential dma_fence_chain/dma_fence_array containers and return the - * first fence. - */ -static inline struct dma_fence * -dma_fence_unwrap_first(struct dma_fence *head, struct dma_fence_unwrap *cursor) -{ - cursor->chain = dma_fence_get(head); - return __dma_fence_unwrap_array(cursor); -} - -/** - * dma_fence_unwrap_next - return the next fence from a fence containers - * @cursor: current position inside the containers - * - * Continue unwrapping the dma_fence_chain/dma_fence_array containers and return - * the next fence from them. - */ -static inline struct dma_fence * -dma_fence_unwrap_next(struct dma_fence_unwrap *cursor) -{ - struct dma_fence *tmp; - - ++cursor->index; - tmp = dma_fence_array_next(cursor->array, cursor->index); - if (tmp) - return tmp; - - cursor->chain = dma_fence_chain_walk(cursor->chain); - return __dma_fence_unwrap_array(cursor); -} +struct dma_fence *dma_fence_unwrap_first(struct dma_fence *head, + struct dma_fence_unwrap *cursor); +struct dma_fence *dma_fence_unwrap_next(struct dma_fence_unwrap *cursor); /** * dma_fence_unwrap_for_each - iterate over all fences in containers -- cgit From 8f61973718485f3e89bc4f408f929048b7b47c83 Mon Sep 17 00:00:00 2001 From: Christian König Date: Wed, 4 May 2022 13:01:29 +0200 Subject: dma-buf: return only unsignaled fences in dma_fence_unwrap_for_each v3 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit dma_fence_chain containers cleanup signaled fences automatically, so filter those out from arrays as well. v2: fix missing walk over the array v3: massively simplify the patch and actually update the description. Signed-off-by: Christian König Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220518135844.3338-4-christian.koenig@amd.com --- include/linux/dma-fence-unwrap.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dma-fence-unwrap.h b/include/linux/dma-fence-unwrap.h index e7c219da4ed7..a4d342fef8e0 100644 --- a/include/linux/dma-fence-unwrap.h +++ b/include/linux/dma-fence-unwrap.h @@ -43,9 +43,13 @@ struct dma_fence *dma_fence_unwrap_next(struct dma_fence_unwrap *cursor); * Unwrap dma_fence_chain and dma_fence_array containers and deep dive into all * potential fences in them. If @head is just a normal fence only that one is * returned. + * + * Note that signalled fences are opportunistically filtered out, which + * means the iteration is potentially over no fence at all. */ #define dma_fence_unwrap_for_each(fence, cursor, head) \ for (fence = dma_fence_unwrap_first(head, cursor); fence; \ - fence = dma_fence_unwrap_next(cursor)) + fence = dma_fence_unwrap_next(cursor)) \ + if (!dma_fence_is_signaled(fence)) #endif -- cgit From 245a4a7b531cffb41233a716497c25b06835cf4b Mon Sep 17 00:00:00 2001 From: Christian König Date: Mon, 25 Apr 2022 14:22:12 +0200 Subject: dma-buf: generalize dma_fence unwrap & merging v3 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Introduce a dma_fence_unwrap_merge() macro which allows to unwrap fences which potentially can be containers as well and then merge them back together into a flat dma_fence_array. v2: rename the function, add some more comments about how the wrapper is used, move filtering of signaled fences into the unwrap iterator, add complex selftest which covers more cases. v3: fix signaled fence filtering once more Signed-off-by: Christian König Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220518135844.3338-5-christian.koenig@amd.com --- include/linux/dma-fence-unwrap.h | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dma-fence-unwrap.h b/include/linux/dma-fence-unwrap.h index a4d342fef8e0..390de1ee9d35 100644 --- a/include/linux/dma-fence-unwrap.h +++ b/include/linux/dma-fence-unwrap.h @@ -52,4 +52,28 @@ struct dma_fence *dma_fence_unwrap_next(struct dma_fence_unwrap *cursor); fence = dma_fence_unwrap_next(cursor)) \ if (!dma_fence_is_signaled(fence)) +struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences, + struct dma_fence **fences, + struct dma_fence_unwrap *cursors); + +/** + * dma_fence_unwrap_merge - unwrap and merge fences + * + * All fences given as parameters are unwrapped and merged back together as flat + * dma_fence_array. Useful if multiple containers need to be merged together. + * + * Implemented as a macro to allocate the necessary arrays on the stack and + * account the stack frame size to the caller. + * + * Returns NULL on memory allocation failure, a dma_fence object representing + * all the given fences otherwise. + */ +#define dma_fence_unwrap_merge(...) \ + ({ \ + struct dma_fence *__f[] = { __VA_ARGS__ }; \ + struct dma_fence_unwrap __c[ARRAY_SIZE(__f)]; \ + \ + __dma_fence_unwrap_merge(ARRAY_SIZE(__f), __f, __c); \ + }) + #endif -- cgit From 3e0b8f529c10037ae0b369fc892e524eae5a5485 Mon Sep 17 00:00:00 2001 From: Arun Ajith S Date: Mon, 30 May 2022 10:14:14 +0000 Subject: net/ipv6: Expand and rename accept_unsolicited_na to accept_untracked_na RFC 9131 changes default behaviour of handling RX of NA messages when the corresponding entry is absent in the neighbour cache. The current implementation is limited to accept just unsolicited NAs. However, the RFC is more generic where it also accepts solicited NAs. Both types should result in adding a STALE entry for this case. Expand accept_untracked_na behaviour to also accept solicited NAs to be compliant with the RFC and rename the sysctl knob to accept_untracked_na. Fixes: f9a2fb73318e ("net/ipv6: Introduce accept_unsolicited_na knob to implement router-side changes for RFC9131") Signed-off-by: Arun Ajith S Reviewed-by: David Ahern Link: https://lore.kernel.org/r/20220530101414.65439-1-aajith@arista.com Signed-off-by: Paolo Abeni --- include/linux/ipv6.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ipv6.h b/include/linux/ipv6.h index 38c8203d52cb..37dfdcfcdd54 100644 --- a/include/linux/ipv6.h +++ b/include/linux/ipv6.h @@ -61,7 +61,7 @@ struct ipv6_devconf { __s32 suppress_frag_ndisc; __s32 accept_ra_mtu; __s32 drop_unsolicited_na; - __s32 accept_unsolicited_na; + __s32 accept_untracked_na; struct ipv6_stable_secret { bool initialized; struct in6_addr secret; -- cgit From 13b00b135665c92065a27c0c39dd97e0f380bd4f Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Wed, 18 May 2022 16:38:00 +0300 Subject: vdpa: Add support for querying vendor statistics Allows to read vendor statistics of a vdpa device. The specific statistics data are received from the upstream driver in the form of an (attribute name, attribute value) pairs. An example of statistics for mlx5_vdpa device are: received_desc - number of descriptors received by the virtqueue completed_desc - number of descriptors completed by the virtqueue A descriptor using indirect buffers is still counted as 1. In addition, N chained descriptors are counted correctly N times as one would expect. A new callback was added to vdpa_config_ops which provides the means for the vdpa driver to return statistics results. The interface allows for reading all the supported virtqueues, including the control virtqueue if it exists. Below are some examples taken from mlx5_vdpa which are introduced in the following patch: 1. Read statistics for the virtqueue at index 1 $ vdpa dev vstats show vdpa-a qidx 1 vdpa-a: queue_type tx queue_index 1 received_desc 3844836 completed_desc 3844836 2. Read statistics for the virtqueue at index 32 $ vdpa dev vstats show vdpa-a qidx 32 vdpa-a: queue_type control_vq queue_index 32 received_desc 62 completed_desc 62 3. Read statisitics for the virtqueue at index 0 with json output $ vdpa -j dev vstats show vdpa-a qidx 0 {"vstats":{"vdpa-a":{ "queue_type":"rx","queue_index":0,"name":"received_desc","value":417776,\ "name":"completed_desc","value":417548}}} 4. Read statistics for the virtqueue at index 0 with preety json output $ vdpa -jp dev vstats show vdpa-a qidx 0 { "vstats": { "vdpa-a": { "queue_type": "rx", "queue_index": 0, "name": "received_desc", "value": 417776, "name": "completed_desc", "value": 417548 } } } Signed-off-by: Eli Cohen Message-Id: <20220518133804.1075129-3-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin --- include/linux/vdpa.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 8943a209202e..2ae8443331e1 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -276,6 +276,9 @@ struct vdpa_config_ops { const struct vdpa_vq_state *state); int (*get_vq_state)(struct vdpa_device *vdev, u16 idx, struct vdpa_vq_state *state); + int (*get_vendor_vq_stats)(struct vdpa_device *vdev, u16 idx, + struct sk_buff *msg, + struct netlink_ext_ack *extack); struct vdpa_notification_area (*get_vq_notification)(struct vdpa_device *vdev, u16 idx); /* vq irq is not expected to be changed once DRIVER_OK is set */ -- cgit From a6a51adc6e8aafebfe0c4beb80e99694ea562b40 Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Wed, 18 May 2022 16:38:02 +0300 Subject: net/vdpa: Use readers/writers semaphore instead of cf_mutex Replace cf_mutex with rw_semaphore to reflect the fact that some calls could be called concurrently but can suffice with read lock. Suggested-by: Si-Wei Liu Signed-off-by: Eli Cohen Message-Id: <20220518133804.1075129-5-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin --- include/linux/vdpa.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 2ae8443331e1..2cb14847831e 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -66,7 +66,7 @@ struct vdpa_mgmt_dev; * @dma_dev: the actual device that is performing DMA * @driver_override: driver name to force a match * @config: the configuration ops for this device. - * @cf_mutex: Protects get and set access to configuration layout. + * @cf_lock: Protects get and set access to configuration layout. * @index: device index * @features_valid: were features initialized? for legacy guests * @use_va: indicate whether virtual address must be used by this device @@ -79,7 +79,7 @@ struct vdpa_device { struct device *dma_dev; const char *driver_override; const struct vdpa_config_ops *config; - struct mutex cf_mutex; /* Protects get/set config */ + struct rw_semaphore cf_lock; /* Protects get/set config */ unsigned int index; bool features_valid; bool use_va; @@ -398,10 +398,10 @@ static inline int vdpa_reset(struct vdpa_device *vdev) const struct vdpa_config_ops *ops = vdev->config; int ret; - mutex_lock(&vdev->cf_mutex); + down_write(&vdev->cf_lock); vdev->features_valid = false; ret = ops->reset(vdev); - mutex_unlock(&vdev->cf_mutex); + up_write(&vdev->cf_lock); return ret; } @@ -420,9 +420,9 @@ static inline int vdpa_set_features(struct vdpa_device *vdev, u64 features) { int ret; - mutex_lock(&vdev->cf_mutex); + down_write(&vdev->cf_lock); ret = vdpa_set_features_unlocked(vdev, features); - mutex_unlock(&vdev->cf_mutex); + up_write(&vdev->cf_lock); return ret; } -- cgit From 1892a3d425bf525ac98d6d3534035e6ed2bfab50 Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Wed, 18 May 2022 16:38:03 +0300 Subject: vdpa/mlx5: Add support for reading descriptor statistics Implement the get_vq_stats calback of vdpa_config_ops to return the statistics for a virtqueue. The statistics are provided as vendor specific statistics where the driver provides a pair of attribute name and attribute value. Currently supported are received descriptors and completed descriptors. Signed-off-by: Eli Cohen Message-Id: <20220518133804.1075129-6-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin --- include/linux/mlx5/mlx5_ifc.h | 1 + include/linux/mlx5/mlx5_ifc_vdpa.h | 39 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 40 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 78b3d3465dd7..2a8334bb5f82 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -87,6 +87,7 @@ enum { enum { MLX5_OBJ_TYPE_GENEVE_TLV_OPT = 0x000b, MLX5_OBJ_TYPE_VIRTIO_NET_Q = 0x000d, + MLX5_OBJ_TYPE_VIRTIO_Q_COUNTERS = 0x001c, MLX5_OBJ_TYPE_MATCH_DEFINER = 0x0018, MLX5_OBJ_TYPE_MKEY = 0xff01, MLX5_OBJ_TYPE_QP = 0xff02, diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h b/include/linux/mlx5/mlx5_ifc_vdpa.h index 1a9c9d94cb59..4414ed5b6ed2 100644 --- a/include/linux/mlx5/mlx5_ifc_vdpa.h +++ b/include/linux/mlx5/mlx5_ifc_vdpa.h @@ -165,4 +165,43 @@ struct mlx5_ifc_modify_virtio_net_q_out_bits { struct mlx5_ifc_general_obj_out_cmd_hdr_bits general_obj_out_cmd_hdr; }; +struct mlx5_ifc_virtio_q_counters_bits { + u8 modify_field_select[0x40]; + u8 reserved_at_40[0x40]; + u8 received_desc[0x40]; + u8 completed_desc[0x40]; + u8 error_cqes[0x20]; + u8 bad_desc_errors[0x20]; + u8 exceed_max_chain[0x20]; + u8 invalid_buffer[0x20]; + u8 reserved_at_180[0x280]; +}; + +struct mlx5_ifc_create_virtio_q_counters_in_bits { + struct mlx5_ifc_general_obj_in_cmd_hdr_bits hdr; + struct mlx5_ifc_virtio_q_counters_bits virtio_q_counters; +}; + +struct mlx5_ifc_create_virtio_q_counters_out_bits { + struct mlx5_ifc_general_obj_in_cmd_hdr_bits hdr; + struct mlx5_ifc_virtio_q_counters_bits virtio_q_counters; +}; + +struct mlx5_ifc_destroy_virtio_q_counters_in_bits { + struct mlx5_ifc_general_obj_in_cmd_hdr_bits hdr; +}; + +struct mlx5_ifc_destroy_virtio_q_counters_out_bits { + struct mlx5_ifc_general_obj_out_cmd_hdr_bits hdr; +}; + +struct mlx5_ifc_query_virtio_q_counters_in_bits { + struct mlx5_ifc_general_obj_in_cmd_hdr_bits hdr; +}; + +struct mlx5_ifc_query_virtio_q_counters_out_bits { + struct mlx5_ifc_general_obj_in_cmd_hdr_bits hdr; + struct mlx5_ifc_virtio_q_counters_bits counters; +}; + #endif /* __MLX5_IFC_VDPA_H_ */ -- cgit From d4821902e43453b85b31329441a9f6ac071228a8 Mon Sep 17 00:00:00 2001 From: Gautam Dawar Date: Wed, 30 Mar 2022 23:33:45 +0530 Subject: vdpa: introduce virtqueue groups This patch introduces virtqueue groups to vDPA device. The virtqueue group is the minimal set of virtqueues that must share an address space. And the address space identifier could only be attached to a specific virtqueue group. Signed-off-by: Jason Wang Signed-off-by: Gautam Dawar Message-Id: <20220330180436.24644-6-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin --- include/linux/vdpa.h | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 2cb14847831e..e4e53574183e 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -85,6 +85,7 @@ struct vdpa_device { bool use_va; u32 nvqs; struct vdpa_mgmt_dev *mdev; + unsigned int ngroups; }; /** @@ -172,6 +173,10 @@ struct vdpa_map_file { * for the device * @vdev: vdpa device * Returns virtqueue algin requirement + * @get_vq_group: Get the group id for a specific virtqueue + * @vdev: vdpa device + * @idx: virtqueue index + * Returns u32: group id for this virtqueue * @get_device_features: Get virtio features supported by the device * @vdev: vdpa device * Returns the virtio features support by the @@ -286,6 +291,7 @@ struct vdpa_config_ops { /* Device ops */ u32 (*get_vq_align)(struct vdpa_device *vdev); + u32 (*get_vq_group)(struct vdpa_device *vdev, u16 idx); u64 (*get_device_features)(struct vdpa_device *vdev); int (*set_driver_features)(struct vdpa_device *vdev, u64 features); u64 (*get_driver_features)(struct vdpa_device *vdev); @@ -318,6 +324,7 @@ struct vdpa_config_ops { struct vdpa_device *__vdpa_alloc_device(struct device *parent, const struct vdpa_config_ops *config, + unsigned int ngroups, size_t size, const char *name, bool use_va); @@ -328,17 +335,18 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent, * @member: the name of struct vdpa_device within the @dev_struct * @parent: the parent device * @config: the bus operations that is supported by this device + * @ngroups: the number of virtqueue groups supported by this device * @name: name of the vdpa device * @use_va: indicate whether virtual address must be used by this device * * Return allocated data structure or ERR_PTR upon error */ -#define vdpa_alloc_device(dev_struct, member, parent, config, name, use_va) \ - container_of(__vdpa_alloc_device( \ - parent, config, \ +#define vdpa_alloc_device(dev_struct, member, parent, config, ngroups, name, use_va) \ + container_of((__vdpa_alloc_device( \ + parent, config, ngroups, \ sizeof(dev_struct) + \ BUILD_BUG_ON_ZERO(offsetof( \ - dev_struct, member)), name, use_va), \ + dev_struct, member)), name, use_va)), \ dev_struct, member) int vdpa_register_device(struct vdpa_device *vdev, u32 nvqs); -- cgit From db9adcbf4286ad1c1fca091a870db6e49bb0df07 Mon Sep 17 00:00:00 2001 From: Gautam Dawar Date: Wed, 30 Mar 2022 23:33:46 +0530 Subject: vdpa: multiple address spaces support This patches introduces the multiple address spaces support for vDPA device. This idea is to identify a specific address space via an dedicated identifier - ASID. During vDPA device allocation, vDPA device driver needs to report the number of address spaces supported by the device then the DMA mapping ops of the vDPA device needs to be extended to support ASID. This helps to isolate the environments for the virtqueue that will not be assigned directly. E.g in the case of virtio-net, the control virtqueue will not be assigned directly to guest. As a start, simply claim 1 virtqueue groups and 1 address spaces for all vDPA devices. And vhost-vDPA will simply reject the device with more than 1 virtqueue groups or address spaces. Signed-off-by: Jason Wang Signed-off-by: Gautam Dawar Message-Id: <20220330180436.24644-7-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin --- include/linux/vdpa.h | 28 +++++++++++++++++++--------- 1 file changed, 19 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index e4e53574183e..1515748e84ff 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -69,6 +69,8 @@ struct vdpa_mgmt_dev; * @cf_lock: Protects get and set access to configuration layout. * @index: device index * @features_valid: were features initialized? for legacy guests + * @ngroups: the number of virtqueue groups + * @nas: the number of address spaces * @use_va: indicate whether virtual address must be used by this device * @nvqs: maximum number of supported virtqueues * @mdev: management device pointer; caller must setup when registering device as part @@ -86,6 +88,7 @@ struct vdpa_device { u32 nvqs; struct vdpa_mgmt_dev *mdev; unsigned int ngroups; + unsigned int nas; }; /** @@ -241,6 +244,7 @@ struct vdpa_map_file { * Needed for device that using device * specific DMA translation (on-chip IOMMU) * @vdev: vdpa device + * @asid: address space identifier * @iotlb: vhost memory mapping to be * used by the vDPA * Returns integer: success (0) or error (< 0) @@ -249,6 +253,7 @@ struct vdpa_map_file { * specific DMA translation (on-chip IOMMU) * and preferring incremental map. * @vdev: vdpa device + * @asid: address space identifier * @iova: iova to be mapped * @size: size of the area * @pa: physical address for the map @@ -260,6 +265,7 @@ struct vdpa_map_file { * specific DMA translation (on-chip IOMMU) * and preferring incremental unmap. * @vdev: vdpa device + * @asid: address space identifier * @iova: iova to be unmapped * @size: size of the area * Returns integer: success (0) or error (< 0) @@ -313,10 +319,12 @@ struct vdpa_config_ops { struct vdpa_iova_range (*get_iova_range)(struct vdpa_device *vdev); /* DMA ops */ - int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb); - int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size, - u64 pa, u32 perm, void *opaque); - int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size); + int (*set_map)(struct vdpa_device *vdev, unsigned int asid, + struct vhost_iotlb *iotlb); + int (*dma_map)(struct vdpa_device *vdev, unsigned int asid, + u64 iova, u64 size, u64 pa, u32 perm, void *opaque); + int (*dma_unmap)(struct vdpa_device *vdev, unsigned int asid, + u64 iova, u64 size); /* Free device resources */ void (*free)(struct vdpa_device *vdev); @@ -324,7 +332,7 @@ struct vdpa_config_ops { struct vdpa_device *__vdpa_alloc_device(struct device *parent, const struct vdpa_config_ops *config, - unsigned int ngroups, + unsigned int ngroups, unsigned int nas, size_t size, const char *name, bool use_va); @@ -336,17 +344,19 @@ struct vdpa_device *__vdpa_alloc_device(struct device *parent, * @parent: the parent device * @config: the bus operations that is supported by this device * @ngroups: the number of virtqueue groups supported by this device + * @nas: the number of address spaces * @name: name of the vdpa device * @use_va: indicate whether virtual address must be used by this device * * Return allocated data structure or ERR_PTR upon error */ -#define vdpa_alloc_device(dev_struct, member, parent, config, ngroups, name, use_va) \ +#define vdpa_alloc_device(dev_struct, member, parent, config, ngroups, nas, \ + name, use_va) \ container_of((__vdpa_alloc_device( \ - parent, config, ngroups, \ - sizeof(dev_struct) + \ + parent, config, ngroups, nas, \ + (sizeof(dev_struct) + \ BUILD_BUG_ON_ZERO(offsetof( \ - dev_struct, member)), name, use_va)), \ + dev_struct, member))), name, use_va)), \ dev_struct, member) int vdpa_register_device(struct vdpa_device *vdev, u32 nvqs); -- cgit From 46d554b1bcd19133401d2d5c0728b85e7bfd1358 Mon Sep 17 00:00:00 2001 From: Gautam Dawar Date: Wed, 30 Mar 2022 23:33:47 +0530 Subject: vdpa: introduce config operations for associating ASID to a virtqueue group This patch introduces a new bus operation to allow the vDPA bus driver to associate an ASID to a virtqueue group. Signed-off-by: Jason Wang Signed-off-by: Gautam Dawar Message-Id: <20220330180436.24644-8-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin --- include/linux/vdpa.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 1515748e84ff..f336d253db3d 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -240,6 +240,12 @@ struct vdpa_map_file { * @vdev: vdpa device * Returns the iova range supported by * the device. + * @set_group_asid: Set address space identifier for a + * virtqueue group + * @vdev: vdpa device + * @group: virtqueue group + * @asid: address space id for this group + * Returns integer: success (0) or error (< 0) * @set_map: Set device memory mapping (optional) * Needed for device that using device * specific DMA translation (on-chip IOMMU) @@ -325,6 +331,8 @@ struct vdpa_config_ops { u64 iova, u64 size, u64 pa, u32 perm, void *opaque); int (*dma_unmap)(struct vdpa_device *vdev, unsigned int asid, u64 iova, u64 size); + int (*set_group_asid)(struct vdpa_device *vdev, unsigned int group, + unsigned int asid); /* Free device resources */ void (*free)(struct vdpa_device *vdev); -- cgit From 1cb108994c6830cc6a6e066ad7d9a22ef59fa167 Mon Sep 17 00:00:00 2001 From: Gautam Dawar Date: Wed, 30 Mar 2022 23:33:48 +0530 Subject: vhost_iotlb: split out IOTLB initialization This patch splits out IOTLB initialization to make sure it could be reused by external modules. Signed-off-by: Jason Wang Signed-off-by: Gautam Dawar Message-Id: <20220330180436.24644-9-gdawar@xilinx.com> Signed-off-by: Michael S. Tsirkin --- include/linux/vhost_iotlb.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vhost_iotlb.h b/include/linux/vhost_iotlb.h index 2d0e2f52f938..e79a40838998 100644 --- a/include/linux/vhost_iotlb.h +++ b/include/linux/vhost_iotlb.h @@ -36,6 +36,8 @@ int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, u64 start, u64 last, u64 addr, unsigned int perm); void vhost_iotlb_del_range(struct vhost_iotlb *iotlb, u64 start, u64 last); +void vhost_iotlb_init(struct vhost_iotlb *iotlb, unsigned int limit, + unsigned int flags); struct vhost_iotlb *vhost_iotlb_alloc(unsigned int limit, unsigned int flags); void vhost_iotlb_free(struct vhost_iotlb *iotlb); void vhost_iotlb_reset(struct vhost_iotlb *iotlb); -- cgit From ffbda8e9df10d1784d5427ec199e7d8308e3763f Mon Sep 17 00:00:00 2001 From: Cindy Lu Date: Fri, 29 Apr 2022 17:10:30 +0800 Subject: vdpa/vp_vdpa : add vdpa tool support in vp_vdpa this patch is to add the support for vdpa tool in vp_vdpa here is the example steps modprobe vp_vdpa modprobe vhost_vdpa echo 0000:00:06.0>/sys/bus/pci/drivers/virtio-pci/unbind echo 1af4 1041 > /sys/bus/pci/drivers/vp-vdpa/new_id vdpa dev add name vdpa1 mgmtdev pci/0000:00:06.0 Signed-off-by: Cindy Lu Message-Id: <20220429091030.547434-1-lulu@redhat.com> Signed-off-by: Michael S. Tsirkin Acked-by: Jason Wang --- include/linux/vdpa.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index f336d253db3d..15af802d41c4 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -492,7 +492,7 @@ struct vdpa_mgmtdev_ops { struct vdpa_mgmt_dev { struct device *device; const struct vdpa_mgmtdev_ops *ops; - const struct virtio_device_id *id_table; + struct virtio_device_id *id_table; u64 config_attr_mask; struct list_head list; u64 supported_features; -- cgit From 48b69959a8559db5df78661d9c6d5138c2f20871 Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Fri, 27 May 2022 14:01:14 +0800 Subject: virtio: introduce config op to synchronize vring callbacks This patch introduces new virtio config op to vring callbacks. Transport specific method is required to make sure the write before this function is visible to the vring_interrupt() that is called after the return of this function. For the transport that doesn't provide synchronize_vqs(), use synchornize_rcu() which synchronize with IRQ implicitly as a fallback. Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Marc Zyngier Cc: Halil Pasic Cc: Cornelia Huck Cc: Vineeth Vijayan Cc: Peter Oberparleiter Cc: linux-s390@vger.kernel.org Reviewed-by: Cornelia Huck Signed-off-by: Jason Wang Message-Id: <20220527060120.20964-4-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin Reviewed-by: Xuan Zhuo Reviewed-by: Stefano Garzarella --- include/linux/virtio_config.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index b341dd62aa4d..25be018810a7 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -57,6 +57,11 @@ struct virtio_shm_region { * include a NULL entry for vqs unused by driver * Returns 0 on success or error status * @del_vqs: free virtqueues found by find_vqs(). + * @synchronize_cbs: synchronize with the virtqueue callbacks (optional) + * The function guarantees that all memory operations on the + * queue before it are visible to the vring_interrupt() that is + * called after it. + * vdev: the virtio_device * @get_features: get the array of feature bits for this device. * vdev: the virtio_device * Returns the first 64 feature bits (all we currently need). @@ -89,6 +94,7 @@ struct virtio_config_ops { const char * const names[], const bool *ctx, struct irq_affinity *desc); void (*del_vqs)(struct virtio_device *); + void (*synchronize_cbs)(struct virtio_device *); u64 (*get_features)(struct virtio_device *vdev); int (*finalize_features)(struct virtio_device *vdev); const char *(*bus_name)(struct virtio_device *vdev); @@ -217,6 +223,25 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, unsigned nvqs, desc); } +/** + * virtio_synchronize_cbs - synchronize with virtqueue callbacks + * @vdev: the device + */ +static inline +void virtio_synchronize_cbs(struct virtio_device *dev) +{ + if (dev->config->synchronize_cbs) { + dev->config->synchronize_cbs(dev); + } else { + /* + * A best effort fallback to synchronize with + * interrupts, preemption and softirq disabled + * regions. See comment above synchronize_rcu(). + */ + synchronize_rcu(); + } +} + /** * virtio_device_ready - enable vq use in probe function * @vdev: the device -- cgit From be83f04d2529e8dc4273efdd1ccf7b7502741071 Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Fri, 27 May 2022 14:01:18 +0800 Subject: virtio: allow to unbreak virtqueue This patch allows the new introduced __virtio_break_device() to unbreak the virtqueue. Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Marc Zyngier Cc: Halil Pasic Cc: Cornelia Huck Cc: Vineeth Vijayan Cc: Peter Oberparleiter Cc: linux-s390@vger.kernel.org Signed-off-by: Jason Wang Message-Id: <20220527060120.20964-8-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin Reviewed-by: Xuan Zhuo --- include/linux/virtio.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 5464f398912a..d8fdf170637c 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -131,6 +131,7 @@ void unregister_virtio_device(struct virtio_device *dev); bool is_virtio_device(struct device *dev); void virtio_break_device(struct virtio_device *dev); +void __virtio_unbreak_device(struct virtio_device *dev); void virtio_config_changed(struct virtio_device *dev); #ifdef CONFIG_PM_SLEEP -- cgit From 8b4ec69d7e098a7ddf832e1e7840de53ed474c77 Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Fri, 27 May 2022 14:01:19 +0800 Subject: virtio: harden vring IRQ This is a rework on the previous IRQ hardening that is done for virtio-pci where several drawbacks were found and were reverted: 1) try to use IRQF_NO_AUTOEN which is not friendly to affinity managed IRQ that is used by some device such as virtio-blk 2) done only for PCI transport The vq->broken is re-used in this patch for implementing the IRQ hardening. The vq->broken is set to true during both initialization and reset. And the vq->broken is set to false in virtio_device_ready(). Then vring_interrupt() can check and return when vq->broken is true. And in this case, switch to return IRQ_NONE to let the interrupt core aware of such invalid interrupt to prevent IRQ storm. The reason of using a per queue variable instead of a per device one is that we may need it for per queue reset hardening in the future. Note that the hardening is only done for vring interrupt since the config interrupt hardening is already done in commit 22b7050a024d7 ("virtio: defer config changed notifications"). But the method that is used by config interrupt can't be reused by the vring interrupt handler because it uses spinlock to do the synchronization which is expensive. Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Marc Zyngier Cc: Halil Pasic Cc: Cornelia Huck Cc: Vineeth Vijayan Cc: Peter Oberparleiter Cc: linux-s390@vger.kernel.org Signed-off-by: Jason Wang Message-Id: <20220527060120.20964-9-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin Reviewed-by: Xuan Zhuo --- include/linux/virtio_config.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index 25be018810a7..d4edfd7d91bb 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -256,6 +256,26 @@ void virtio_device_ready(struct virtio_device *dev) unsigned status = dev->config->get_status(dev); BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK); + + /* + * The virtio_synchronize_cbs() makes sure vring_interrupt() + * will see the driver specific setup if it sees vq->broken + * as false (even if the notifications come before DRIVER_OK). + */ + virtio_synchronize_cbs(dev); + __virtio_unbreak_device(dev); + /* + * The transport should ensure the visibility of vq->broken + * before setting DRIVER_OK. See the comments for the transport + * specific set_status() method. + * + * A well behaved device will only notify a virtqueue after + * DRIVER_OK, this means the device should "see" the coherenct + * memory write that set vq->broken as false which is done by + * the driver when it sees DRIVER_OK, then the following + * driver's vring_interrupt() will see vq->broken as false so + * we won't lose any notification. + */ dev->config->set_status(dev, status | VIRTIO_CONFIG_S_DRIVER_OK); } -- cgit From 619e9e14ba3c97a80776c0b5a68a01caa41dd148 Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Fri, 27 May 2022 14:01:20 +0800 Subject: virtio: use WARN_ON() to warning illegal status value We used to use BUG_ON() in virtio_device_ready() to detect illegal status value, this seems sub-optimal since the value is under the control of the device. Switch to use WARN_ON() instead. Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Marc Zyngier Cc: Halil Pasic Cc: Cornelia Huck Cc: Vineeth Vijayan Cc: Peter Oberparleiter Cc: linux-s390@vger.kernel.org Signed-off-by: Jason Wang Message-Id: <20220527060120.20964-10-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin Reviewed-by: Xuan Zhuo --- include/linux/virtio_config.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index d4edfd7d91bb..9a36051ceb76 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -255,7 +255,7 @@ void virtio_device_ready(struct virtio_device *dev) { unsigned status = dev->config->get_status(dev); - BUG_ON(status & VIRTIO_CONFIG_S_DRIVER_OK); + WARN_ON(status & VIRTIO_CONFIG_S_DRIVER_OK); /* * The virtio_synchronize_cbs() makes sure vring_interrupt() -- cgit From 3fc2a9e89b3508a5cc0c324f26d7b4740ba8c456 Mon Sep 17 00:00:00 2001 From: Changcheng Liu Date: Tue, 26 Apr 2022 21:28:14 +0800 Subject: net/mlx5: correct ECE offset in query qp output ECE field should be after opt_param_mask in query qp output. Fixes: 6b646a7e4af6 ("net/mlx5: Add ability to read and write ECE options") Signed-off-by: Changcheng Liu Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 78b3d3465dd7..2cd7d611e7b3 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -5176,12 +5176,11 @@ struct mlx5_ifc_query_qp_out_bits { u8 syndrome[0x20]; - u8 reserved_at_40[0x20]; - u8 ece[0x20]; + u8 reserved_at_40[0x40]; u8 opt_param_mask[0x20]; - u8 reserved_at_a0[0x20]; + u8 ece[0x20]; struct mlx5_ifc_qpc_bits qpc; -- cgit From c3ed222745d9ad7b69299b349a64ba533c64a34f Mon Sep 17 00:00:00 2001 From: Benjamin Coddington Date: Sat, 14 May 2022 07:05:13 -0400 Subject: NFSv4: Fix free of uninitialized nfs4_label on referral lookup. Send along the already-allocated fattr along with nfs4_fs_locations, and drop the memcpy of fattr. We end up growing two more allocations, but this fixes up a crash as: PID: 790 TASK: ffff88811b43c000 CPU: 0 COMMAND: "ls" #0 [ffffc90000857920] panic at ffffffff81b9bfde #1 [ffffc900008579c0] do_trap at ffffffff81023a9b #2 [ffffc90000857a10] do_error_trap at ffffffff81023b78 #3 [ffffc90000857a58] exc_stack_segment at ffffffff81be1f45 #4 [ffffc90000857a80] asm_exc_stack_segment at ffffffff81c009de #5 [ffffc90000857b08] nfs_lookup at ffffffffa0302322 [nfs] #6 [ffffc90000857b70] __lookup_slow at ffffffff813a4a5f #7 [ffffc90000857c60] walk_component at ffffffff813a86c4 #8 [ffffc90000857cb8] path_lookupat at ffffffff813a9553 #9 [ffffc90000857cf0] filename_lookup at ffffffff813ab86b Suggested-by: Trond Myklebust Fixes: 9558a007dbc3 ("NFS: Remove the label from the nfs4_lookup_res struct") Signed-off-by: Benjamin Coddington Signed-off-by: Anna Schumaker --- include/linux/nfs_xdr.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index 4a8ba84f848e..0e3aa0f5f324 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -1222,7 +1222,7 @@ struct nfs4_fs_location { #define NFS4_FS_LOCATIONS_MAXENTRIES 10 struct nfs4_fs_locations { - struct nfs_fattr fattr; + struct nfs_fattr *fattr; const struct nfs_server *server; struct nfs4_pathname fs_path; int nlocations; -- cgit From 118f09eda21d392e1eeb9f8a4bee044958cccf20 Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia Date: Wed, 25 May 2022 12:12:59 -0400 Subject: NFSv4.1 mark qualified async operations as MOVEABLE tasks Mark async operations such as RENAME, REMOVE, COMMIT MOVEABLE for the nfsv4.1+ sessions. Fixes: 85e39feead948 ("NFSv4.1 identify and mark RPC tasks that can move between transports") Signed-off-by: Olga Kornievskaia Signed-off-by: Anna Schumaker --- include/linux/nfs_fs_sb.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h index 157d2bd6b241..ea2f7e6b1b0b 100644 --- a/include/linux/nfs_fs_sb.h +++ b/include/linux/nfs_fs_sb.h @@ -287,4 +287,5 @@ struct nfs_server { #define NFS_CAP_XATTR (1U << 28) #define NFS_CAP_READ_PLUS (1U << 29) #define NFS_CAP_FS_LOCATIONS (1U << 30) +#define NFS_CAP_MOVEABLE (1U << 31) #endif -- cgit From 597b89d30b42dcc8e6b262e6876b42dde66f97f0 Mon Sep 17 00:00:00 2001 From: Mikko Perttunen Date: Mon, 16 May 2022 11:52:51 +0300 Subject: gpu: host1x: Add context bus The context bus is a "dummy" bus that contains struct devices that correspond to IOMMU contexts assigned through Host1x to processes. Even when host1x itself is built as a module, the bus is registered in built-in code so that the built-in ARM SMMU driver is able to reference it. Signed-off-by: Mikko Perttunen Signed-off-by: Thierry Reding --- include/linux/host1x_context_bus.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) create mode 100644 include/linux/host1x_context_bus.h (limited to 'include/linux') diff --git a/include/linux/host1x_context_bus.h b/include/linux/host1x_context_bus.h new file mode 100644 index 000000000000..72462737a6db --- /dev/null +++ b/include/linux/host1x_context_bus.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (c) 2021, NVIDIA Corporation. All rights reserved. + */ + +#ifndef __LINUX_HOST1X_CONTEXT_BUS_H +#define __LINUX_HOST1X_CONTEXT_BUS_H + +#include + +#ifdef CONFIG_TEGRA_HOST1X_CONTEXT_BUS +extern struct bus_type host1x_context_device_bus_type; +#endif + +#endif -- cgit From 662ce1dc9caf493c309200edbe38d186f1ea20d0 Mon Sep 17 00:00:00 2001 From: Yang Yang Date: Wed, 1 Jun 2022 15:55:25 -0700 Subject: delayacct: track delays from write-protect copy Delay accounting does not track the delay of write-protect copy. When tasks trigger many write-protect copys(include COW and unsharing of anonymous pages[1]), it may spend a amount of time waiting for them. To get the delay of tasks in write-protect copy, could help users to evaluate the impact of using KSM or fork() or GUP. Also update tools/accounting/getdelays.c: / # ./getdelays -dl -p 231 print delayacct stats ON listen forever PID 231 CPU count real total virtual total delay total delay average 6247 1859000000 2154070021 1674255063 0.268ms IO count delay total delay average 0 0 0ms SWAP count delay total delay average 0 0 0ms RECLAIM count delay total delay average 0 0 0ms THRASHING count delay total delay average 0 0 0ms COMPACT count delay total delay average 3 72758 0ms WPCOPY count delay total delay average 3635 271567604 0ms [1] commit 31cc5bc4af70("mm: support GUP-triggered unsharing of anonymous pages") Link: https://lkml.kernel.org/r/20220409014342.2505532-1-yang.yang29@zte.com.cn Signed-off-by: Yang Yang Reviewed-by: David Hildenbrand Reviewed-by: Jiang Xuexin Reviewed-by: Ran Xiaokai Reviewed-by: wangyong Cc: Jonathan Corbet Cc: Balbir Singh Cc: Mike Kravetz Cc: Stephen Rothwell Signed-off-by: Andrew Morton --- include/linux/delayacct.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) (limited to 'include/linux') diff --git a/include/linux/delayacct.h b/include/linux/delayacct.h index 6b16a6930a19..58aea2d7385c 100644 --- a/include/linux/delayacct.h +++ b/include/linux/delayacct.h @@ -45,9 +45,13 @@ struct task_delay_info { u64 compact_start; u64 compact_delay; /* wait for memory compact */ + u64 wpcopy_start; + u64 wpcopy_delay; /* wait for write-protect copy */ + u32 freepages_count; /* total count of memory reclaim */ u32 thrashing_count; /* total count of thrash waits */ u32 compact_count; /* total count of memory compact */ + u32 wpcopy_count; /* total count of write-protect copy */ }; #endif @@ -75,6 +79,8 @@ extern void __delayacct_swapin_start(void); extern void __delayacct_swapin_end(void); extern void __delayacct_compact_start(void); extern void __delayacct_compact_end(void); +extern void __delayacct_wpcopy_start(void); +extern void __delayacct_wpcopy_end(void); static inline void delayacct_tsk_init(struct task_struct *tsk) { @@ -191,6 +197,24 @@ static inline void delayacct_compact_end(void) __delayacct_compact_end(); } +static inline void delayacct_wpcopy_start(void) +{ + if (!static_branch_unlikely(&delayacct_key)) + return; + + if (current->delays) + __delayacct_wpcopy_start(); +} + +static inline void delayacct_wpcopy_end(void) +{ + if (!static_branch_unlikely(&delayacct_key)) + return; + + if (current->delays) + __delayacct_wpcopy_end(); +} + #else static inline void delayacct_init(void) {} @@ -225,6 +249,10 @@ static inline void delayacct_compact_start(void) {} static inline void delayacct_compact_end(void) {} +static inline void delayacct_wpcopy_start(void) +{} +static inline void delayacct_wpcopy_end(void) +{} #endif /* CONFIG_TASK_DELAY_ACCT */ -- cgit From 22296a5c0cd35aaf62e1af3266f82cdf6b0b9b78 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 2 Jun 2022 09:18:58 -0700 Subject: net: add debug info to __skb_pull() While analyzing yet another syzbot report, I found the following patch very useful. It allows to better understand what went wrong. This debug info is only enabled if CONFIG_DEBUG_NET=y, which is the case for syzbot builds. Signed-off-by: Eric Dumazet Acked-by: Willem de Bruijn Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index da96f0d3e753..d3d10556f0fa 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2696,7 +2696,14 @@ void *skb_pull(struct sk_buff *skb, unsigned int len); static inline void *__skb_pull(struct sk_buff *skb, unsigned int len) { skb->len -= len; - BUG_ON(skb->len < skb->data_len); + if (unlikely(skb->len < skb->data_len)) { +#if defined(CONFIG_DEBUG_NET) + skb->len += len; + pr_err("__skb_pull(len=%u)\n", len); + skb_dump(KERN_ERR, skb, false); +#endif + BUG(); + } return skb->data += len; } -- cgit From d8616ee2affcff37c5d315310da557a694a3303d Mon Sep 17 00:00:00 2001 From: Wang Yufen Date: Tue, 24 May 2022 15:53:11 +0800 Subject: bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues During TCP sockmap redirect pressure test, the following warning is triggered: WARNING: CPU: 3 PID: 2145 at net/core/stream.c:205 sk_stream_kill_queues+0xbc/0xd0 CPU: 3 PID: 2145 Comm: iperf Kdump: loaded Tainted: G W 5.10.0+ #9 Call Trace: inet_csk_destroy_sock+0x55/0x110 inet_csk_listen_stop+0xbb/0x380 tcp_close+0x41b/0x480 inet_release+0x42/0x80 __sock_release+0x3d/0xa0 sock_close+0x11/0x20 __fput+0x9d/0x240 task_work_run+0x62/0x90 exit_to_user_mode_prepare+0x110/0x120 syscall_exit_to_user_mode+0x27/0x190 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The reason we observed is that: When the listener is closing, a connection may have completed the three-way handshake but not accepted, and the client has sent some packets. The child sks in accept queue release by inet_child_forget()->inet_csk_destroy_sock(), but psocks of child sks have not released. To fix, add sock_map_destroy to release psocks. Signed-off-by: Wang Yufen Signed-off-by: Daniel Borkmann Signed-off-by: Andrii Nakryiko Acked-by: Jakub Sitnicki Acked-by: John Fastabend Link: https://lore.kernel.org/bpf/20220524075311.649153-1-wangyufen@huawei.com --- include/linux/bpf.h | 1 + include/linux/skmsg.h | 1 + 2 files changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 2b914a56a2c5..8e6092d0ea95 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2104,6 +2104,7 @@ int sock_map_bpf_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr); void sock_map_unhash(struct sock *sk); +void sock_map_destroy(struct sock *sk); void sock_map_close(struct sock *sk, long timeout); #else static inline int bpf_prog_offload_init(struct bpf_prog *prog, diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index c5a2d6f50f25..153b6dec9b6a 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -95,6 +95,7 @@ struct sk_psock { spinlock_t link_lock; refcount_t refcnt; void (*saved_unhash)(struct sock *sk); + void (*saved_destroy)(struct sock *sk); void (*saved_close)(struct sock *sk, long timeout); void (*saved_write_space)(struct sock *sk); void (*saved_data_ready)(struct sock *sk); -- cgit From 46859ac8af52ae599e1b51992ddef3eb43f295fc Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Tue, 31 May 2022 18:04:12 +0800 Subject: LoongArch: Add multi-processor (SMP) support LoongArch-based procesors have 4, 8 or 16 cores per package. This patch adds multi-processor (SMP) support for LoongArch. Reviewed-by: WANG Xuerui Reviewed-by: Jiaxun Yang Signed-off-by: Huacai Chen --- include/linux/cpuhotplug.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index b66c5f389159..19f0dbfdd7fe 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -130,6 +130,7 @@ enum cpuhp_state { CPUHP_ZCOMP_PREPARE, CPUHP_TIMERS_PREPARE, CPUHP_MIPS_SOC_PREPARE, + CPUHP_LOONGARCH_SOC_PREPARE, CPUHP_BP_PREPARE_DYN, CPUHP_BP_PREPARE_DYN_END = CPUHP_BP_PREPARE_DYN + 20, CPUHP_BRINGUP_CPU, -- cgit From 6d7131bd52b3e0dd068a0067e5584b3f36cf17eb Mon Sep 17 00:00:00 2001 From: Anna-Maria Behnsen Date: Mon, 11 Apr 2022 17:05:55 +0200 Subject: include/linux/find: Fix documentation The order of the arguments in function documentation doesn't fit the implementation. Change the documentation so that it corresponds to the code. This prevent people to get confused when reading the documentation. Signed-off-by: Anna-Maria Behnsen Reviewed-by: Andy Shevchenko Signed-off-by: Yury Norov --- include/linux/find.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/find.h b/include/linux/find.h index 5bb6db213bcb..424ef67d4a42 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -21,8 +21,8 @@ extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long siz /** * find_next_bit - find the next set bit in a memory region * @addr: The address to base the search on - * @offset: The bitnumber to start searching at * @size: The bitmap size in bits + * @offset: The bitnumber to start searching at * * Returns the bit number for the next set bit * If no bits are set, returns @size. @@ -50,8 +50,8 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size, * find_next_and_bit - find the next set bit in both memory regions * @addr1: The first address to base the search on * @addr2: The second address to base the search on - * @offset: The bitnumber to start searching at * @size: The bitmap size in bits + * @offset: The bitnumber to start searching at * * Returns the bit number for the next set bit * If no bits are set, returns @size. @@ -79,8 +79,8 @@ unsigned long find_next_and_bit(const unsigned long *addr1, /** * find_next_zero_bit - find the next cleared bit in a memory region * @addr: The address to base the search on - * @offset: The bitnumber to start searching at * @size: The bitmap size in bits + * @offset: The bitnumber to start searching at * * Returns the bit number of the next zero bit * If no bits are zero, returns @size. -- cgit From e041e0ac53dd52d2d201aa87edc3adaca1085299 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Thu, 28 Apr 2022 13:51:12 -0700 Subject: lib/bitmap: extend comment for bitmap_(from,to)_arr32() On LE systems bitmaps are naturally ordered, therefore we can potentially use bitmap_copy routines when converting from 32-bit arrays, even if host system is 64-bit. But it may lead to out-of-bond access due to unsafe typecast, and the bitmap_(from,to)_arr32 comment doesn't explain that clearly CC: Alexander Gordeev CC: Andy Shevchenko CC: Christian Borntraeger CC: Claudio Imbrenda CC: David Hildenbrand CC: Heiko Carstens CC: Janosch Frank CC: Rasmus Villemoes CC: Sven Schnelle CC: Vasily Gorbik Signed-off-by: Yury Norov --- include/linux/bitmap.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index 7dba0847510c..afcf7b8dddd1 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -264,8 +264,12 @@ static inline void bitmap_copy_clear_tail(unsigned long *dst, } /* - * On 32-bit systems bitmaps are represented as u32 arrays internally, and - * therefore conversion is not needed when copying data from/to arrays of u32. + * On 32-bit systems bitmaps are represented as u32 arrays internally. On LE64 + * machines the order of hi and lo parts of numbers match the bitmap structure. + * In both cases conversion is not needed when copying data from/to arrays of + * u32. But in LE64 case, typecast in bitmap_copy_clear_tail() may lead + * to out-of-bound access. To avoid that, both LE and BE variants of 64-bit + * architectures are not using bitmap_copy_clear_tail(). */ #if BITS_PER_LONG == 64 void bitmap_from_arr32(unsigned long *bitmap, const u32 *buf, -- cgit From 0a97953fd2210d0ac8eb5c76f8bd08fb53b6d3d6 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Thu, 28 Apr 2022 13:51:13 -0700 Subject: lib: add bitmap_{from,to}_arr64 Manipulating 64-bit arrays with bitmap functions is potentially dangerous because on 32-bit BE machines the order of halfwords doesn't match. Another issue is that compiler may throw a warning about out-of-boundary access. This patch adds bitmap_{from,to}_arr64 functions in addition to existing bitmap_{from,to}_arr32. CC: Alexander Gordeev CC: Andy Shevchenko CC: Christian Borntraeger CC: Claudio Imbrenda CC: David Hildenbrand CC: Heiko Carstens CC: Janosch Frank CC: Rasmus Villemoes CC: Sven Schnelle CC: Vasily Gorbik Signed-off-by: Yury Norov --- include/linux/bitmap.h | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index afcf7b8dddd1..71147b7d721b 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -72,6 +72,8 @@ struct device; * bitmap_allocate_region(bitmap, pos, order) Allocate specified bit region * bitmap_from_arr32(dst, buf, nbits) Copy nbits from u32[] buf to dst * bitmap_to_arr32(buf, src, nbits) Copy nbits from buf to u32[] dst + * bitmap_to_arr64(buf, src, nbits) Copy nbits from buf to u64[] dst + * bitmap_to_arr64(buf, src, nbits) Copy nbits from buf to u64[] dst * bitmap_get_value8(map, start) Get 8bit value from map at start * bitmap_set_value8(map, value, start) Set 8bit value to map at start * @@ -285,6 +287,22 @@ void bitmap_to_arr32(u32 *buf, const unsigned long *bitmap, (const unsigned long *) (bitmap), (nbits)) #endif +/* + * On 64-bit systems bitmaps are represented as u64 arrays internally. On LE32 + * machines the order of hi and lo parts of numbers match the bitmap structure. + * In both cases conversion is not needed when copying data from/to arrays of + * u64. + */ +#if (BITS_PER_LONG == 32) && defined(__BIG_ENDIAN) +void bitmap_from_arr64(unsigned long *bitmap, const u64 *buf, unsigned int nbits); +void bitmap_to_arr64(u64 *buf, const unsigned long *bitmap, unsigned int nbits); +#else +#define bitmap_from_arr64(bitmap, buf, nbits) \ + bitmap_copy_clear_tail((unsigned long *)(bitmap), (const unsigned long *)(buf), (nbits)) +#define bitmap_to_arr64(buf, bitmap, nbits) \ + bitmap_copy_clear_tail((unsigned long *)(buf), (const unsigned long *)(bitmap), (nbits)) +#endif + static inline int bitmap_and(unsigned long *dst, const unsigned long *src1, const unsigned long *src2, unsigned int nbits) { @@ -518,10 +536,7 @@ static inline void bitmap_next_set_region(unsigned long *bitmap, */ static inline void bitmap_from_u64(unsigned long *dst, u64 mask) { - dst[0] = mask & ULONG_MAX; - - if (sizeof(mask) > sizeof(unsigned long)) - dst[1] = mask >> 32; + bitmap_from_arr64(dst, &mask, 64); } /** -- cgit From 005f17007f47495dbbb659aa5db7e581065d16e7 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 18 May 2022 13:52:22 -0700 Subject: bitmap: Fix return values to be unsigned Both nodemask and bitmap routines had mixed return values that provided potentially signed return values that could never happen. This was leading to the compiler getting confusing about the range of possible return values (it was thinking things could be negative where they could not be). In preparation for fixing nodemask, fix all the bitmap routines that should be returning unsigned (or bool) values. Cc: Yury Norov Cc: Rasmus Villemoes Cc: Christophe de Dinechin Cc: Alexey Dobriyan Cc: Andy Shevchenko Cc: Andrew Morton Cc: Zhen Lei Signed-off-by: Kees Cook Signed-off-by: Yury Norov --- include/linux/bitmap.h | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index 71147b7d721b..2e6cd5681040 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -134,8 +134,8 @@ unsigned long *devm_bitmap_zalloc(struct device *dev, * lib/bitmap.c provides these functions: */ -int __bitmap_equal(const unsigned long *bitmap1, - const unsigned long *bitmap2, unsigned int nbits); +bool __bitmap_equal(const unsigned long *bitmap1, + const unsigned long *bitmap2, unsigned int nbits); bool __pure __bitmap_or_equal(const unsigned long *src1, const unsigned long *src2, const unsigned long *src3, @@ -159,10 +159,10 @@ int __bitmap_andnot(unsigned long *dst, const unsigned long *bitmap1, void __bitmap_replace(unsigned long *dst, const unsigned long *old, const unsigned long *new, const unsigned long *mask, unsigned int nbits); -int __bitmap_intersects(const unsigned long *bitmap1, - const unsigned long *bitmap2, unsigned int nbits); -int __bitmap_subset(const unsigned long *bitmap1, - const unsigned long *bitmap2, unsigned int nbits); +bool __bitmap_intersects(const unsigned long *bitmap1, + const unsigned long *bitmap2, unsigned int nbits); +bool __bitmap_subset(const unsigned long *bitmap1, + const unsigned long *bitmap2, unsigned int nbits); int __bitmap_weight(const unsigned long *bitmap, unsigned int nbits); void __bitmap_set(unsigned long *map, unsigned int start, int len); void __bitmap_clear(unsigned long *map, unsigned int start, int len); @@ -353,8 +353,8 @@ static inline void bitmap_complement(unsigned long *dst, const unsigned long *sr #endif #define BITMAP_MEM_MASK (BITMAP_MEM_ALIGNMENT - 1) -static inline int bitmap_equal(const unsigned long *src1, - const unsigned long *src2, unsigned int nbits) +static inline bool bitmap_equal(const unsigned long *src1, + const unsigned long *src2, unsigned int nbits) { if (small_const_nbits(nbits)) return !((*src1 ^ *src2) & BITMAP_LAST_WORD_MASK(nbits)); @@ -384,8 +384,9 @@ static inline bool bitmap_or_equal(const unsigned long *src1, return !(((*src1 | *src2) ^ *src3) & BITMAP_LAST_WORD_MASK(nbits)); } -static inline int bitmap_intersects(const unsigned long *src1, - const unsigned long *src2, unsigned int nbits) +static inline bool bitmap_intersects(const unsigned long *src1, + const unsigned long *src2, + unsigned int nbits) { if (small_const_nbits(nbits)) return ((*src1 & *src2) & BITMAP_LAST_WORD_MASK(nbits)) != 0; @@ -393,8 +394,8 @@ static inline int bitmap_intersects(const unsigned long *src1, return __bitmap_intersects(src1, src2, nbits); } -static inline int bitmap_subset(const unsigned long *src1, - const unsigned long *src2, unsigned int nbits) +static inline bool bitmap_subset(const unsigned long *src1, + const unsigned long *src2, unsigned int nbits) { if (small_const_nbits(nbits)) return ! ((*src1 & ~(*src2)) & BITMAP_LAST_WORD_MASK(nbits)); -- cgit From 0dfe54071d7c828a02917b595456bfde1afdddc9 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 18 May 2022 13:52:23 -0700 Subject: nodemask: Fix return values to be unsigned MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The nodemask routines had mixed return values that provided potentially signed return values that could never happen. This was leading to the compiler getting confusing about the range of possible return values (it was thinking things could be negative where they could not be). Fix all the nodemask routines that should be returning unsigned (or bool) values. Silences: mm/swapfile.c: In function ‘setup_swap_info’: mm/swapfile.c:2291:47: error: array subscript -1 is below array bounds of ‘struct plist_node[]’ [-Werror=array-bounds] 2291 | p->avail_lists[i].prio = 1; | ~~~~~~~~~~~~~~^~~ In file included from mm/swapfile.c:16: ./include/linux/swap.h:292:27: note: while referencing ‘avail_lists’ 292 | struct plist_node avail_lists[]; /* | ^~~~~~~~~~~ Reported-by: Christophe de Dinechin Link: https://lore.kernel.org/lkml/20220414150855.2407137-3-dinechin@redhat.com/ Cc: Alexey Dobriyan Cc: Yury Norov Cc: Andy Shevchenko Cc: Rasmus Villemoes Cc: Andrew Morton Cc: Zhen Lei Signed-off-by: Kees Cook Signed-off-by: Yury Norov --- include/linux/nodemask.h | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 567c3ddba2c4..2c39663c3407 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -42,11 +42,11 @@ * void nodes_shift_right(dst, src, n) Shift right * void nodes_shift_left(dst, src, n) Shift left * - * int first_node(mask) Number lowest set bit, or MAX_NUMNODES - * int next_node(node, mask) Next node past 'node', or MAX_NUMNODES - * int next_node_in(node, mask) Next node past 'node', or wrap to first, + * unsigned int first_node(mask) Number lowest set bit, or MAX_NUMNODES + * unsigend int next_node(node, mask) Next node past 'node', or MAX_NUMNODES + * unsigned int next_node_in(node, mask) Next node past 'node', or wrap to first, * or MAX_NUMNODES - * int first_unset_node(mask) First node not set in mask, or + * unsigned int first_unset_node(mask) First node not set in mask, or * MAX_NUMNODES * * nodemask_t nodemask_of_node(node) Return nodemask with bit 'node' set @@ -153,7 +153,7 @@ static inline void __nodes_clear(nodemask_t *dstp, unsigned int nbits) #define node_test_and_set(node, nodemask) \ __node_test_and_set((node), &(nodemask)) -static inline int __node_test_and_set(int node, nodemask_t *addr) +static inline bool __node_test_and_set(int node, nodemask_t *addr) { return test_and_set_bit(node, addr->bits); } @@ -200,7 +200,7 @@ static inline void __nodes_complement(nodemask_t *dstp, #define nodes_equal(src1, src2) \ __nodes_equal(&(src1), &(src2), MAX_NUMNODES) -static inline int __nodes_equal(const nodemask_t *src1p, +static inline bool __nodes_equal(const nodemask_t *src1p, const nodemask_t *src2p, unsigned int nbits) { return bitmap_equal(src1p->bits, src2p->bits, nbits); @@ -208,7 +208,7 @@ static inline int __nodes_equal(const nodemask_t *src1p, #define nodes_intersects(src1, src2) \ __nodes_intersects(&(src1), &(src2), MAX_NUMNODES) -static inline int __nodes_intersects(const nodemask_t *src1p, +static inline bool __nodes_intersects(const nodemask_t *src1p, const nodemask_t *src2p, unsigned int nbits) { return bitmap_intersects(src1p->bits, src2p->bits, nbits); @@ -216,20 +216,20 @@ static inline int __nodes_intersects(const nodemask_t *src1p, #define nodes_subset(src1, src2) \ __nodes_subset(&(src1), &(src2), MAX_NUMNODES) -static inline int __nodes_subset(const nodemask_t *src1p, +static inline bool __nodes_subset(const nodemask_t *src1p, const nodemask_t *src2p, unsigned int nbits) { return bitmap_subset(src1p->bits, src2p->bits, nbits); } #define nodes_empty(src) __nodes_empty(&(src), MAX_NUMNODES) -static inline int __nodes_empty(const nodemask_t *srcp, unsigned int nbits) +static inline bool __nodes_empty(const nodemask_t *srcp, unsigned int nbits) { return bitmap_empty(srcp->bits, nbits); } #define nodes_full(nodemask) __nodes_full(&(nodemask), MAX_NUMNODES) -static inline int __nodes_full(const nodemask_t *srcp, unsigned int nbits) +static inline bool __nodes_full(const nodemask_t *srcp, unsigned int nbits) { return bitmap_full(srcp->bits, nbits); } @@ -260,15 +260,15 @@ static inline void __nodes_shift_left(nodemask_t *dstp, > MAX_NUMNODES, then the silly min_ts could be dropped. */ #define first_node(src) __first_node(&(src)) -static inline int __first_node(const nodemask_t *srcp) +static inline unsigned int __first_node(const nodemask_t *srcp) { - return min_t(int, MAX_NUMNODES, find_first_bit(srcp->bits, MAX_NUMNODES)); + return min_t(unsigned int, MAX_NUMNODES, find_first_bit(srcp->bits, MAX_NUMNODES)); } #define next_node(n, src) __next_node((n), &(src)) -static inline int __next_node(int n, const nodemask_t *srcp) +static inline unsigned int __next_node(int n, const nodemask_t *srcp) { - return min_t(int,MAX_NUMNODES,find_next_bit(srcp->bits, MAX_NUMNODES, n+1)); + return min_t(unsigned int, MAX_NUMNODES, find_next_bit(srcp->bits, MAX_NUMNODES, n+1)); } /* @@ -276,7 +276,7 @@ static inline int __next_node(int n, const nodemask_t *srcp) * the first node in src if needed. Returns MAX_NUMNODES if src is empty. */ #define next_node_in(n, src) __next_node_in((n), &(src)) -int __next_node_in(int node, const nodemask_t *srcp); +unsigned int __next_node_in(int node, const nodemask_t *srcp); static inline void init_nodemask_of_node(nodemask_t *mask, int node) { @@ -296,9 +296,9 @@ static inline void init_nodemask_of_node(nodemask_t *mask, int node) }) #define first_unset_node(mask) __first_unset_node(&(mask)) -static inline int __first_unset_node(const nodemask_t *maskp) +static inline unsigned int __first_unset_node(const nodemask_t *maskp) { - return min_t(int,MAX_NUMNODES, + return min_t(unsigned int, MAX_NUMNODES, find_first_zero_bit(maskp->bits, MAX_NUMNODES)); } @@ -436,11 +436,11 @@ static inline int num_node_state(enum node_states state) #define first_online_node first_node(node_states[N_ONLINE]) #define first_memory_node first_node(node_states[N_MEMORY]) -static inline int next_online_node(int nid) +static inline unsigned int next_online_node(int nid) { return next_node(nid, node_states[N_ONLINE]); } -static inline int next_memory_node(int nid) +static inline unsigned int next_memory_node(int nid) { return next_node(nid, node_states[N_MEMORY]); } -- cgit From a734510fa8b4e61e6a37176f0da01f4c55fa52de Mon Sep 17 00:00:00 2001 From: Hannes Reinecke Date: Wed, 25 May 2022 13:49:42 +0200 Subject: ata: libata: drop 'sas_last_tag' Unused now. Fixes: 4f1a22ee7b57 ("libata: Improve ATA queued command allocation") Cc: John Garry Signed-off-by: Hannes Reinecke Reviewed-by: John Garry Signed-off-by: Damien Le Moal --- include/linux/libata.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index 732de9014626..0f2a59c9c735 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -822,7 +822,6 @@ struct ata_port { struct ata_queued_cmd qcmd[ATA_MAX_QUEUE + 1]; u64 qc_active; int nr_active_links; /* #links with active qcs */ - unsigned int sas_last_tag; /* track next tag hw expects */ struct ata_link link; /* host default link */ struct ata_link *slave_link; /* see ata_slave_link_init() */ -- cgit From 8d5976089c97a4beeeda4de59e2fba9862946893 Mon Sep 17 00:00:00 2001 From: Xiang wangx Date: Mon, 6 Jun 2022 10:23:13 +0800 Subject: platform/chrome: cros_ec_commands: Fix syntax errors in comments Delete the redundant word 'using'. Signed-off-by: Xiang wangx Reviewed-by: Tzung-Bi Shih Signed-off-by: Tzung-Bi Shih Link: https://lore.kernel.org/r/20220606022313.22912-1-wangxiang@cdjrlc.com --- include/linux/platform_data/cros_ec_commands.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/cros_ec_commands.h b/include/linux/platform_data/cros_ec_commands.h index 8cfa8cfca77e..e59f51c41a1c 100644 --- a/include/linux/platform_data/cros_ec_commands.h +++ b/include/linux/platform_data/cros_ec_commands.h @@ -787,7 +787,7 @@ struct ec_host_response { * * Packets always start with a request or response header. They are followed * by data_len bytes of data. If the data_crc_present flag is set, the data - * bytes are followed by a CRC-8 of that data, using using x^8 + x^2 + x + 1 + * bytes are followed by a CRC-8 of that data, using x^8 + x^2 + x + 1 * polynomial. * * Host algorithm when sending a request q: -- cgit From 2130a790ca49763f724ec45cf93b9dd765e2023e Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Thu, 2 Jun 2022 15:05:26 +0200 Subject: kernel: add platform_has() infrastructure Add a simple infrastructure for setting, resetting and querying platform feature flags. Flags can be either global or architecture specific. Signed-off-by: Juergen Gross Reviewed-by: Oleksandr Tyshchenko Tested-by: Oleksandr Tyshchenko # Arm64 only Reviewed-by: Christoph Hellwig Acked-by: Borislav Petkov Signed-off-by: Juergen Gross --- include/linux/platform-feature.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) create mode 100644 include/linux/platform-feature.h (limited to 'include/linux') diff --git a/include/linux/platform-feature.h b/include/linux/platform-feature.h new file mode 100644 index 000000000000..6ed859928b97 --- /dev/null +++ b/include/linux/platform-feature.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _PLATFORM_FEATURE_H +#define _PLATFORM_FEATURE_H + +#include +#include + +/* The platform features are starting with the architecture specific ones. */ +#define PLATFORM_FEAT_N (0 + PLATFORM_ARCH_FEAT_N) + +void platform_set(unsigned int feature); +void platform_clear(unsigned int feature); +bool platform_has(unsigned int feature); + +#endif /* _PLATFORM_FEATURE_H */ -- cgit From 3f9dfbebdc48cebfbda738f6f3d1dbf6d7232f90 Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Mon, 6 Jun 2022 08:09:16 +0200 Subject: virtio: replace arch_has_restricted_virtio_memory_access() Instead of using arch_has_restricted_virtio_memory_access() together with CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS, replace those with platform_has() and a new platform feature PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS. Signed-off-by: Juergen Gross Reviewed-by: Oleksandr Tyshchenko Tested-by: Oleksandr Tyshchenko # Arm64 only Reviewed-by: Christoph Hellwig Acked-by: Borislav Petkov --- include/linux/platform-feature.h | 6 +++++- include/linux/virtio_config.h | 9 --------- 2 files changed, 5 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform-feature.h b/include/linux/platform-feature.h index 6ed859928b97..b2f48be999fa 100644 --- a/include/linux/platform-feature.h +++ b/include/linux/platform-feature.h @@ -6,7 +6,11 @@ #include /* The platform features are starting with the architecture specific ones. */ -#define PLATFORM_FEAT_N (0 + PLATFORM_ARCH_FEAT_N) + +/* Used to enable platform specific DMA handling for virtio devices. */ +#define PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS (0 + PLATFORM_ARCH_FEAT_N) + +#define PLATFORM_FEAT_N (1 + PLATFORM_ARCH_FEAT_N) void platform_set(unsigned int feature); void platform_clear(unsigned int feature); diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index 9a36051ceb76..49c7c32815f1 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -604,13 +604,4 @@ static inline void virtio_cwrite64(struct virtio_device *vdev, _r; \ }) -#ifdef CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS -int arch_has_restricted_virtio_memory_access(void); -#else -static inline int arch_has_restricted_virtio_memory_access(void) -{ - return 0; -} -#endif /* CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS */ - #endif /* _LINUX_VIRTIO_CONFIG_H */ -- cgit From 657f8bd88cb5a968d907fd1c891cee52dc156caa Mon Sep 17 00:00:00 2001 From: Julia Lawall Date: Sat, 21 May 2022 13:10:23 +0200 Subject: spi: fix typo in comment Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall Link: https://lore.kernel.org/r/20220521111145.81697-13-Julia.Lawall@inria.fr Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index d361ba26203b..eadfd3ba3cbc 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -1127,7 +1127,7 @@ spi_max_transfer_size(struct spi_device *spi) if (ctlr->max_transfer_size) tr_max = ctlr->max_transfer_size(spi); - /* transfer size limit must not be greater than messsage size limit */ + /* transfer size limit must not be greater than message size limit */ return min(tr_max, msg_max); } -- cgit From 6598b91b5ac32bc756d7c3000a31f775d4ead1c4 Mon Sep 17 00:00:00 2001 From: David Jander Date: Tue, 24 May 2022 11:18:08 +0200 Subject: spi: spi.c: Convert statistics to per-cpu u64_stats_t This change gives a dramatic performance improvement in the hot path, since many costly spin_lock_irqsave() calls can be avoided. On an i.MX8MM system with a MCP2518FD CAN controller connected via SPI, the time the driver takes to handle interrupts, or in other words the time the IRQ line of the CAN controller stays low is mainly dominated by the time it takes to do 3 relatively short sync SPI transfers. The effect of this patch is a reduction of this time from 136us down to only 98us. Suggested-by: Andrew Lunn Signed-off-by: David Jander Link: https://lore.kernel.org/r/20220524091808.2269898-1-david@protonic.nl Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 52 +++++++++++++++++++++++++++---------------------- 1 file changed, 29 insertions(+), 23 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index eadfd3ba3cbc..eac8d3caf954 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -17,6 +17,7 @@ #include #include +#include struct dma_chan; struct software_node; @@ -59,37 +60,42 @@ extern struct bus_type spi_bus_type; * maxsize limit */ struct spi_statistics { - spinlock_t lock; /* lock for the whole structure */ + struct u64_stats_sync syncp; - unsigned long messages; - unsigned long transfers; - unsigned long errors; - unsigned long timedout; + u64_stats_t messages; + u64_stats_t transfers; + u64_stats_t errors; + u64_stats_t timedout; - unsigned long spi_sync; - unsigned long spi_sync_immediate; - unsigned long spi_async; + u64_stats_t spi_sync; + u64_stats_t spi_sync_immediate; + u64_stats_t spi_async; - unsigned long long bytes; - unsigned long long bytes_rx; - unsigned long long bytes_tx; + u64_stats_t bytes; + u64_stats_t bytes_rx; + u64_stats_t bytes_tx; #define SPI_STATISTICS_HISTO_SIZE 17 - unsigned long transfer_bytes_histo[SPI_STATISTICS_HISTO_SIZE]; + u64_stats_t transfer_bytes_histo[SPI_STATISTICS_HISTO_SIZE]; - unsigned long transfers_split_maxsize; + u64_stats_t transfers_split_maxsize; }; -#define SPI_STATISTICS_ADD_TO_FIELD(stats, field, count) \ - do { \ - unsigned long flags; \ - spin_lock_irqsave(&(stats)->lock, flags); \ - (stats)->field += count; \ - spin_unlock_irqrestore(&(stats)->lock, flags); \ +#define SPI_STATISTICS_ADD_TO_FIELD(pcpu_stats, field, count) \ + do { \ + struct spi_statistics *__lstats = this_cpu_ptr(pcpu_stats); \ + u64_stats_update_begin(&__lstats->syncp); \ + u64_stats_add(&__lstats->field, count); \ + u64_stats_update_end(&__lstats->syncp); \ } while (0) -#define SPI_STATISTICS_INCREMENT_FIELD(stats, field) \ - SPI_STATISTICS_ADD_TO_FIELD(stats, field, 1) +#define SPI_STATISTICS_INCREMENT_FIELD(pcpu_stats, field) \ + do { \ + struct spi_statistics *__lstats = this_cpu_ptr(pcpu_stats); \ + u64_stats_update_begin(&__lstats->syncp); \ + u64_stats_inc(&__lstats->field); \ + u64_stats_update_end(&__lstats->syncp); \ + } while (0) /** * struct spi_delay - SPI delay information @@ -194,7 +200,7 @@ struct spi_device { struct spi_delay cs_inactive; /* the statistics */ - struct spi_statistics statistics; + struct spi_statistics __percpu *pcpu_statistics; /* * likely need more hooks for more protocol options affecting how @@ -647,7 +653,7 @@ struct spi_controller { s8 max_native_cs; /* statistics */ - struct spi_statistics statistics; + struct spi_statistics __percpu *pcpu_statistics; /* DMA channels for use with core dmaengine helpers */ struct dma_chan *dma_tx; -- cgit From fc602b4f692cb83c937b5f79628bca32b60c4402 Mon Sep 17 00:00:00 2001 From: Aidan MacDonald Date: Sat, 4 Jun 2022 12:32:50 +0100 Subject: mtd: spinand: Add support for ATO25D1GA Add support for the ATO25D1GA SPI NAND flash. Datasheet: - https://atta.szlcsc.com/upload/public/pdf/source/20191212/C469320_04599D67B03B078044EB65FF5AEDDDE9.pdf Signed-off-by: Aidan MacDonald Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20220604113250.4745-1-aidanmacdonald.0x0@gmail.com --- include/linux/mtd/spinand.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mtd/spinand.h b/include/linux/mtd/spinand.h index 5584d3bb6556..6d3392a7edc6 100644 --- a/include/linux/mtd/spinand.h +++ b/include/linux/mtd/spinand.h @@ -260,6 +260,7 @@ struct spinand_manufacturer { }; /* SPI NAND manufacturers */ +extern const struct spinand_manufacturer ato_spinand_manufacturer; extern const struct spinand_manufacturer gigadevice_spinand_manufacturer; extern const struct spinand_manufacturer macronix_spinand_manufacturer; extern const struct spinand_manufacturer micron_spinand_manufacturer; -- cgit From 39d649602be2ceb3f8f8e98483d09e4d71133c6a Mon Sep 17 00:00:00 2001 From: Clément Léger Date: Wed, 1 Jun 2022 10:17:58 +0200 Subject: of: constify of_property_check_flags() prop argument MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This argument is not modified and thus can be set as const. Signed-off-by: Clément Léger Signed-off-by: Rob Herring Link: https://lore.kernel.org/r/20220601081801.348571-2-clement.leger@bootlin.com --- include/linux/of.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/of.h b/include/linux/of.h index f0a5d6b10c5a..d74fd82a6963 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -207,7 +207,7 @@ static inline void of_node_clear_flag(struct device_node *n, unsigned long flag) } #if defined(CONFIG_OF_DYNAMIC) || defined(CONFIG_SPARC) -static inline int of_property_check_flag(struct property *p, unsigned long flag) +static inline int of_property_check_flag(const struct property *p, unsigned long flag) { return test_bit(flag, &p->_flags); } @@ -814,7 +814,8 @@ static inline void of_node_clear_flag(struct device_node *n, unsigned long flag) { } -static inline int of_property_check_flag(struct property *p, unsigned long flag) +static inline int of_property_check_flag(const struct property *p, + unsigned long flag) { return 0; } -- cgit From 7b6c7a877cc616bc7dc9cd39646fe454acbed48b Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Fri, 3 Jun 2022 08:04:44 -0700 Subject: x86/ftrace: Remove OBJECT_FILES_NON_STANDARD usage The file-wide OBJECT_FILES_NON_STANDARD annotation is used with CONFIG_FRAME_POINTER to tell objtool to skip the entire file when frame pointers are enabled. However that annotation is now deprecated because it doesn't work with IBT, where objtool runs on vmlinux.o instead of individual translation units. Instead, use more fine-grained function-specific annotations: - The 'save_mcount_regs' macro does funny things with the frame pointer. Use STACK_FRAME_NON_STANDARD_FP to tell objtool to ignore the functions using it. - The return_to_handler() "function" isn't actually a callable function. Instead of being called, it's returned to. The real return address isn't on the stack, so unwinding is already doomed no matter which unwinder is used. So just remove the STT_FUNC annotation, telling objtool to ignore it. That also removes the implicit ANNOTATE_NOENDBR, which now needs to be made explicit. Fixes the following warning: vmlinux.o: warning: objtool: __fentry__+0x16: return with modified stack frame Fixes: ed53a0d97192 ("x86/alternative: Use .ibt_endbr_seal to seal indirect calls") Reported-by: kernel test robot Signed-off-by: Josh Poimboeuf Link: https://lore.kernel.org/r/b7a7a42fe306aca37826043dac89e113a1acdbac.1654268610.git.jpoimboe@kernel.org --- include/linux/objtool.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/objtool.h b/include/linux/objtool.h index 6491fa8fba6d..15b940ec1eac 100644 --- a/include/linux/objtool.h +++ b/include/linux/objtool.h @@ -143,6 +143,12 @@ struct unwind_hint { .popsection .endm +.macro STACK_FRAME_NON_STANDARD_FP func:req +#ifdef CONFIG_FRAME_POINTER + STACK_FRAME_NON_STANDARD \func +#endif +.endm + .macro ANNOTATE_NOENDBR .Lhere_\@: .pushsection .discard.noendbr -- cgit From 77991645952c21962a095910c51fe0f73d35bf91 Mon Sep 17 00:00:00 2001 From: Roger Knecht Date: Sat, 21 May 2022 14:47:45 +0200 Subject: crc-itu-t: fix typo in CRC ITU-T polynomial comment The code comment says that the polynomial is x^16 + x^12 + x^15 + 1, but the correct polynomial is x^16 + x^12 + x^5 + 1. Quoting from page 2 in the ITU-T V.41 specification [1]: 2 Encoding and checking process The service bits and information bits, taken in conjunction, correspond to the coefficients of a message polynomial having terms from x^(n-1) (n = total number of bits in a block or sequence) down to x^16. This polynomial is divided, modulo 2, by the generating polynomial x^16 + x^12 + x^5 + 1. The hex (truncated) polynomial 0x1021 and CRC code implementation are correct, however. [1] https://www.itu.int/rec/T-REC-V.41-198811-I/en Signed-off-by: Roger Knecht Acked-by: Randy Dunlap Signed-off-by: Jason A. Donenfeld --- include/linux/crc-itu-t.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/crc-itu-t.h b/include/linux/crc-itu-t.h index a4367051e192..2f991a427ade 100644 --- a/include/linux/crc-itu-t.h +++ b/include/linux/crc-itu-t.h @@ -4,7 +4,7 @@ * * Implements the standard CRC ITU-T V.41: * Width 16 - * Poly 0x1021 (x^16 + x^12 + x^15 + 1) + * Poly 0x1021 (x^16 + x^12 + x^5 + 1) * Init 0 */ -- cgit From ff8372a467fa28752610f57a6512c5365413faa2 Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Mon, 6 Jun 2022 10:24:34 +0800 Subject: net: skb: move enum skb_drop_reason to standalone header file As the skb drop reasons are getting more and more, move the enum 'skb_drop_reason' and related function to the standalone header 'dropreason.h', as Jakub Kicinski suggested. Signed-off-by: Menglong Dong Signed-off-by: Paolo Abeni --- include/linux/skbuff.h | 179 +------------------------------------------------ 1 file changed, 1 insertion(+), 178 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index d3d10556f0fa..82edf0359ab3 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -43,6 +43,7 @@ #include #endif #include +#include /** * DOC: skb checksums @@ -337,184 +338,6 @@ struct sk_buff_head { struct sk_buff; -/* The reason of skb drop, which is used in kfree_skb_reason(). - * en...maybe they should be splited by group? - * - * Each item here should also be in 'TRACE_SKB_DROP_REASON', which is - * used to translate the reason to string. - */ -enum skb_drop_reason { - SKB_NOT_DROPPED_YET = 0, - SKB_DROP_REASON_NOT_SPECIFIED, /* drop reason is not specified */ - SKB_DROP_REASON_NO_SOCKET, /* socket not found */ - SKB_DROP_REASON_PKT_TOO_SMALL, /* packet size is too small */ - SKB_DROP_REASON_TCP_CSUM, /* TCP checksum error */ - SKB_DROP_REASON_SOCKET_FILTER, /* dropped by socket filter */ - SKB_DROP_REASON_UDP_CSUM, /* UDP checksum error */ - SKB_DROP_REASON_NETFILTER_DROP, /* dropped by netfilter */ - SKB_DROP_REASON_OTHERHOST, /* packet don't belong to current - * host (interface is in promisc - * mode) - */ - SKB_DROP_REASON_IP_CSUM, /* IP checksum error */ - SKB_DROP_REASON_IP_INHDR, /* there is something wrong with - * IP header (see - * IPSTATS_MIB_INHDRERRORS) - */ - SKB_DROP_REASON_IP_RPFILTER, /* IP rpfilter validate failed. - * see the document for rp_filter - * in ip-sysctl.rst for more - * information - */ - SKB_DROP_REASON_UNICAST_IN_L2_MULTICAST, /* destination address of L2 - * is multicast, but L3 is - * unicast. - */ - SKB_DROP_REASON_XFRM_POLICY, /* xfrm policy check failed */ - SKB_DROP_REASON_IP_NOPROTO, /* no support for IP protocol */ - SKB_DROP_REASON_SOCKET_RCVBUFF, /* socket receive buff is full */ - SKB_DROP_REASON_PROTO_MEM, /* proto memory limition, such as - * udp packet drop out of - * udp_memory_allocated. - */ - SKB_DROP_REASON_TCP_MD5NOTFOUND, /* no MD5 hash and one - * expected, corresponding - * to LINUX_MIB_TCPMD5NOTFOUND - */ - SKB_DROP_REASON_TCP_MD5UNEXPECTED, /* MD5 hash and we're not - * expecting one, corresponding - * to LINUX_MIB_TCPMD5UNEXPECTED - */ - SKB_DROP_REASON_TCP_MD5FAILURE, /* MD5 hash and its wrong, - * corresponding to - * LINUX_MIB_TCPMD5FAILURE - */ - SKB_DROP_REASON_SOCKET_BACKLOG, /* failed to add skb to socket - * backlog (see - * LINUX_MIB_TCPBACKLOGDROP) - */ - SKB_DROP_REASON_TCP_FLAGS, /* TCP flags invalid */ - SKB_DROP_REASON_TCP_ZEROWINDOW, /* TCP receive window size is zero, - * see LINUX_MIB_TCPZEROWINDOWDROP - */ - SKB_DROP_REASON_TCP_OLD_DATA, /* the TCP data reveived is already - * received before (spurious retrans - * may happened), see - * LINUX_MIB_DELAYEDACKLOST - */ - SKB_DROP_REASON_TCP_OVERWINDOW, /* the TCP data is out of window, - * the seq of the first byte exceed - * the right edges of receive - * window - */ - SKB_DROP_REASON_TCP_OFOMERGE, /* the data of skb is already in - * the ofo queue, corresponding to - * LINUX_MIB_TCPOFOMERGE - */ - SKB_DROP_REASON_TCP_RFC7323_PAWS, /* PAWS check, corresponding to - * LINUX_MIB_PAWSESTABREJECTED - */ - SKB_DROP_REASON_TCP_INVALID_SEQUENCE, /* Not acceptable SEQ field */ - SKB_DROP_REASON_TCP_RESET, /* Invalid RST packet */ - SKB_DROP_REASON_TCP_INVALID_SYN, /* Incoming packet has unexpected SYN flag */ - SKB_DROP_REASON_TCP_CLOSE, /* TCP socket in CLOSE state */ - SKB_DROP_REASON_TCP_FASTOPEN, /* dropped by FASTOPEN request socket */ - SKB_DROP_REASON_TCP_OLD_ACK, /* TCP ACK is old, but in window */ - SKB_DROP_REASON_TCP_TOO_OLD_ACK, /* TCP ACK is too old */ - SKB_DROP_REASON_TCP_ACK_UNSENT_DATA, /* TCP ACK for data we haven't sent yet */ - SKB_DROP_REASON_TCP_OFO_QUEUE_PRUNE, /* pruned from TCP OFO queue */ - SKB_DROP_REASON_TCP_OFO_DROP, /* data already in receive queue */ - SKB_DROP_REASON_IP_OUTNOROUTES, /* route lookup failed */ - SKB_DROP_REASON_BPF_CGROUP_EGRESS, /* dropped by - * BPF_PROG_TYPE_CGROUP_SKB - * eBPF program - */ - SKB_DROP_REASON_IPV6DISABLED, /* IPv6 is disabled on the device */ - SKB_DROP_REASON_NEIGH_CREATEFAIL, /* failed to create neigh - * entry - */ - SKB_DROP_REASON_NEIGH_FAILED, /* neigh entry in failed state */ - SKB_DROP_REASON_NEIGH_QUEUEFULL, /* arp_queue for neigh - * entry is full - */ - SKB_DROP_REASON_NEIGH_DEAD, /* neigh entry is dead */ - SKB_DROP_REASON_TC_EGRESS, /* dropped in TC egress HOOK */ - SKB_DROP_REASON_QDISC_DROP, /* dropped by qdisc when packet - * outputting (failed to enqueue to - * current qdisc) - */ - SKB_DROP_REASON_CPU_BACKLOG, /* failed to enqueue the skb to - * the per CPU backlog queue. This - * can be caused by backlog queue - * full (see netdev_max_backlog in - * net.rst) or RPS flow limit - */ - SKB_DROP_REASON_XDP, /* dropped by XDP in input path */ - SKB_DROP_REASON_TC_INGRESS, /* dropped in TC ingress HOOK */ - SKB_DROP_REASON_UNHANDLED_PROTO, /* protocol not implemented - * or not supported - */ - SKB_DROP_REASON_SKB_CSUM, /* sk_buff checksum computation - * error - */ - SKB_DROP_REASON_SKB_GSO_SEG, /* gso segmentation error */ - SKB_DROP_REASON_SKB_UCOPY_FAULT, /* failed to copy data from - * user space, e.g., via - * zerocopy_sg_from_iter() - * or skb_orphan_frags_rx() - */ - SKB_DROP_REASON_DEV_HDR, /* device driver specific - * header/metadata is invalid - */ - /* the device is not ready to xmit/recv due to any of its data - * structure that is not up/ready/initialized, e.g., the IFF_UP is - * not set, or driver specific tun->tfiles[txq] is not initialized - */ - SKB_DROP_REASON_DEV_READY, - SKB_DROP_REASON_FULL_RING, /* ring buffer is full */ - SKB_DROP_REASON_NOMEM, /* error due to OOM */ - SKB_DROP_REASON_HDR_TRUNC, /* failed to trunc/extract the header - * from networking data, e.g., failed - * to pull the protocol header from - * frags via pskb_may_pull() - */ - SKB_DROP_REASON_TAP_FILTER, /* dropped by (ebpf) filter directly - * attached to tun/tap, e.g., via - * TUNSETFILTEREBPF - */ - SKB_DROP_REASON_TAP_TXFILTER, /* dropped by tx filter implemented - * at tun/tap, e.g., check_filter() - */ - SKB_DROP_REASON_ICMP_CSUM, /* ICMP checksum error */ - SKB_DROP_REASON_INVALID_PROTO, /* the packet doesn't follow RFC - * 2211, such as a broadcasts - * ICMP_TIMESTAMP - */ - SKB_DROP_REASON_IP_INADDRERRORS, /* host unreachable, corresponding - * to IPSTATS_MIB_INADDRERRORS - */ - SKB_DROP_REASON_IP_INNOROUTES, /* network unreachable, corresponding - * to IPSTATS_MIB_INADDRERRORS - */ - SKB_DROP_REASON_PKT_TOO_BIG, /* packet size is too big (maybe exceed - * the MTU) - */ - SKB_DROP_REASON_MAX, -}; - -#define SKB_DR_INIT(name, reason) \ - enum skb_drop_reason name = SKB_DROP_REASON_##reason -#define SKB_DR(name) \ - SKB_DR_INIT(name, NOT_SPECIFIED) -#define SKB_DR_SET(name, reason) \ - (name = SKB_DROP_REASON_##reason) -#define SKB_DR_OR(name, reason) \ - do { \ - if (name == SKB_DROP_REASON_NOT_SPECIFIED || \ - name == SKB_NOT_DROPPED_YET) \ - SKB_DR_SET(name, reason); \ - } while (0) - /* To allow 64K frame to be packed as single skb without frag_list we * require 64K/PAGE_SIZE pages plus 1 additional page to allow for * buffers which do not start on a page boundary. -- cgit From c4f135d643823a869becfa87539f7820ef9d5bfa Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Wed, 1 Jun 2022 16:32:47 +0900 Subject: workqueue: Wrap flush_workqueue() using a macro Since flush operation synchronously waits for completion, flushing system-wide WQs (e.g. system_wq) might introduce possibility of deadlock due to unexpected locking dependency. Tejun Heo commented at [1] that it makes no sense at all to call flush_workqueue() on the shared WQs as the caller has no idea what it's gonna end up waiting for. Although there is flush_scheduled_work() which flushes system_wq WQ with "Think twice before calling this function! It's very easy to get into trouble if you don't take great care." warning message, syzbot found a circular locking dependency caused by flushing system_wq WQ [2]. Therefore, let's change the direction to that developers had better use their local WQs if flush_scheduled_work()/flush_workqueue(system_*_wq) is inevitable. Steps for converting system-wide WQs into local WQs are explained at [3], and a conversion to stop flushing system-wide WQs is in progress. Now we want some mechanism for preventing developers who are not aware of this conversion from again start flushing system-wide WQs. Since I found that WARN_ON() is complete but awkward approach for teaching developers about this problem, let's use __compiletime_warning() for incomplete but handy approach. For completeness, we will also insert WARN_ON() into __flush_workqueue() after all in-tree users stopped calling flush_scheduled_work(). Link: https://lore.kernel.org/all/YgnQGZWT%2Fn3VAITX@slm.duckdns.org/ [1] Link: https://syzkaller.appspot.com/bug?extid=bde0f89deacca7c765b8 [2] Link: https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp [3] Signed-off-by: Tetsuo Handa Signed-off-by: Tejun Heo --- include/linux/workqueue.h | 64 +++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 56 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 7fee9b6cfede..e1f1c8b1121b 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -445,7 +445,7 @@ extern bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq, struct delayed_work *dwork, unsigned long delay); extern bool queue_rcu_work(struct workqueue_struct *wq, struct rcu_work *rwork); -extern void flush_workqueue(struct workqueue_struct *wq); +extern void __flush_workqueue(struct workqueue_struct *wq); extern void drain_workqueue(struct workqueue_struct *wq); extern int schedule_on_each_cpu(work_func_t func); @@ -563,15 +563,23 @@ static inline bool schedule_work(struct work_struct *work) return queue_work(system_wq, work); } +/* + * Detect attempt to flush system-wide workqueues at compile time when possible. + * + * See https://lkml.kernel.org/r/49925af7-78a8-a3dd-bce6-cfc02e1a9236@I-love.SAKURA.ne.jp + * for reasons and steps for converting system-wide workqueues into local workqueues. + */ +extern void __warn_flushing_systemwide_wq(void) + __compiletime_warning("Please avoid flushing system-wide workqueues."); + /** * flush_scheduled_work - ensure that any scheduled work has run to completion. * * Forces execution of the kernel-global workqueue and blocks until its * completion. * - * Think twice before calling this function! It's very easy to get into - * trouble if you don't take great care. Either of the following situations - * will lead to deadlock: + * It's very easy to get into trouble if you don't take great care. + * Either of the following situations will lead to deadlock: * * One of the work items currently on the workqueue needs to acquire * a lock held by your code or its caller. @@ -586,11 +594,51 @@ static inline bool schedule_work(struct work_struct *work) * need to know that a particular work item isn't queued and isn't running. * In such cases you should use cancel_delayed_work_sync() or * cancel_work_sync() instead. + * + * Please stop calling this function! A conversion to stop flushing system-wide + * workqueues is in progress. This function will be removed after all in-tree + * users stopped calling this function. */ -static inline void flush_scheduled_work(void) -{ - flush_workqueue(system_wq); -} +/* + * The background of commit 771c035372a036f8 ("deprecate the + * '__deprecated' attribute warnings entirely and for good") is that, + * since Linus builds all modules between every single pull he does, + * the standard kernel build needs to be _clean_ in order to be able to + * notice when new problems happen. Therefore, don't emit warning while + * there are in-tree users. + */ +#define flush_scheduled_work() \ +({ \ + if (0) \ + __warn_flushing_systemwide_wq(); \ + __flush_workqueue(system_wq); \ +}) + +/* + * Although there is no longer in-tree caller, for now just emit warning + * in order to give out-of-tree callers time to update. + */ +#define flush_workqueue(wq) \ +({ \ + struct workqueue_struct *_wq = (wq); \ + \ + if ((__builtin_constant_p(_wq == system_wq) && \ + _wq == system_wq) || \ + (__builtin_constant_p(_wq == system_highpri_wq) && \ + _wq == system_highpri_wq) || \ + (__builtin_constant_p(_wq == system_long_wq) && \ + _wq == system_long_wq) || \ + (__builtin_constant_p(_wq == system_unbound_wq) && \ + _wq == system_unbound_wq) || \ + (__builtin_constant_p(_wq == system_freezable_wq) && \ + _wq == system_freezable_wq) || \ + (__builtin_constant_p(_wq == system_power_efficient_wq) && \ + _wq == system_power_efficient_wq) || \ + (__builtin_constant_p(_wq == system_freezable_power_efficient_wq) && \ + _wq == system_freezable_power_efficient_wq)) \ + __warn_flushing_systemwide_wq(); \ + __flush_workqueue(_wq); \ +}) /** * schedule_delayed_work_on - queue work in global workqueue on CPU after delay -- cgit From 5f69a6577bc33d8f6d6bbe02bccdeb357b287f56 Mon Sep 17 00:00:00 2001 From: Chen Wandun Date: Thu, 26 May 2022 20:26:56 +0800 Subject: psi: dont alloc memory for psi by default Memory about struct psi_group is allocated by default for each cgroup even if psi_disabled is true, in this case, these allocated memory is waste, so alloc memory for struct psi_group only when psi_disabled is false. Signed-off-by: Chen Wandun Acked-by: Johannes Weiner Signed-off-by: Tejun Heo --- include/linux/cgroup-defs.h | 2 +- include/linux/cgroup.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 1bfcfb1af352..672de25e3ec8 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -475,7 +475,7 @@ struct cgroup { struct work_struct release_agent_work; /* used to track pressure stalls */ - struct psi_group psi; + struct psi_group *psi; /* used to store eBPF programs */ struct cgroup_bpf bpf; diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 0d1ada8968d7..ed53bfe7c46c 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -674,7 +674,7 @@ static inline void pr_cont_cgroup_path(struct cgroup *cgrp) static inline struct psi_group *cgroup_psi(struct cgroup *cgrp) { - return &cgrp->psi; + return cgrp->psi; } bool cgroup_psi_enabled(void); -- cgit From 6089fb325cf737eeb2c4d236c94697112ca860da Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Mon, 6 Jun 2022 23:26:00 -0700 Subject: bpf: Add btf enum64 support Currently, BTF only supports upto 32bit enum value with BTF_KIND_ENUM. But in kernel, some enum indeed has 64bit values, e.g., in uapi bpf.h, we have enum { BPF_F_INDEX_MASK = 0xffffffffULL, BPF_F_CURRENT_CPU = BPF_F_INDEX_MASK, BPF_F_CTXLEN_MASK = (0xfffffULL << 32), }; In this case, BTF_KIND_ENUM will encode the value of BPF_F_CTXLEN_MASK as 0, which certainly is incorrect. This patch added a new btf kind, BTF_KIND_ENUM64, which permits 64bit value to cover the above use case. The BTF_KIND_ENUM64 has the following three fields followed by the common type: struct bpf_enum64 { __u32 nume_off; __u32 val_lo32; __u32 val_hi32; }; Currently, btf type section has an alignment of 4 as all element types are u32. Representing the value with __u64 will introduce a pad for bpf_enum64 and may also introduce misalignment for the 64bit value. Hence, two members of val_hi32 and val_lo32 are chosen to avoid these issues. The kflag is also introduced for BTF_KIND_ENUM and BTF_KIND_ENUM64 to indicate whether the value is signed or unsigned. The kflag intends to provide consistent output of BTF C fortmat with the original source code. For example, the original BTF_KIND_ENUM bit value is 0xffffffff. The format C has two choices, printing out 0xffffffff or -1 and current libbpf prints out as unsigned value. But if the signedness is preserved in btf, the value can be printed the same as the original source code. The kflag value 0 means unsigned values, which is consistent to the default by libbpf and should also cover most cases as well. The new BTF_KIND_ENUM64 is intended to support the enum value represented as 64bit value. But it can represent all BTF_KIND_ENUM values as well. The compiler ([1]) and pahole will generate BTF_KIND_ENUM64 only if the value has to be represented with 64 bits. In addition, a static inline function btf_kind_core_compat() is introduced which will be used later when libbpf relo_core.c changed. Here the kernel shares the same relo_core.c with libbpf. [1] https://reviews.llvm.org/D124641 Acked-by: Andrii Nakryiko Signed-off-by: Yonghong Song Link: https://lore.kernel.org/r/20220607062600.3716578-1-yhs@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/btf.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) (limited to 'include/linux') diff --git a/include/linux/btf.h b/include/linux/btf.h index 2611cea2c2b6..1bfed7fa0428 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -177,6 +177,19 @@ static inline bool btf_type_is_enum(const struct btf_type *t) return BTF_INFO_KIND(t->info) == BTF_KIND_ENUM; } +static inline bool btf_is_any_enum(const struct btf_type *t) +{ + return BTF_INFO_KIND(t->info) == BTF_KIND_ENUM || + BTF_INFO_KIND(t->info) == BTF_KIND_ENUM64; +} + +static inline bool btf_kind_core_compat(const struct btf_type *t1, + const struct btf_type *t2) +{ + return BTF_INFO_KIND(t1->info) == BTF_INFO_KIND(t2->info) || + (btf_is_any_enum(t1) && btf_is_any_enum(t2)); +} + static inline bool str_is_empty(const char *s) { return !s || !s[0]; @@ -192,6 +205,16 @@ static inline bool btf_is_enum(const struct btf_type *t) return btf_kind(t) == BTF_KIND_ENUM; } +static inline bool btf_is_enum64(const struct btf_type *t) +{ + return btf_kind(t) == BTF_KIND_ENUM64; +} + +static inline u64 btf_enum64_value(const struct btf_enum64 *e) +{ + return ((u64)e->val_hi32 << 32) | e->val_lo32; +} + static inline bool btf_is_composite(const struct btf_type *t) { u16 kind = btf_kind(t); @@ -332,6 +355,11 @@ static inline struct btf_enum *btf_enum(const struct btf_type *t) return (struct btf_enum *)(t + 1); } +static inline struct btf_enum64 *btf_enum64(const struct btf_type *t) +{ + return (struct btf_enum64 *)(t + 1); +} + static inline const struct btf_var_secinfo *btf_type_var_secinfo( const struct btf_type *t) { -- cgit From 0e3c3b901c00364198d31482fa2552ccf2d5c899 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Mon, 6 Jun 2022 18:42:59 -0400 Subject: No need of likely/unlikely on calls of check_copy_size() it's inline and unlikely() inside of it (including the implicit one in WARN_ON_ONCE()) suffice to convince the compiler that getting false from check_copy_size() is unlikely. Spotted-by: Jens Axboe Reviewed-by: Christoph Hellwig Reviewed-by: Christian Brauner (Microsoft) Signed-off-by: Al Viro --- include/linux/uaccess.h | 4 ++-- include/linux/uio.h | 15 ++++++--------- 2 files changed, 8 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h index 5a328cf02b75..47e5d374c7eb 100644 --- a/include/linux/uaccess.h +++ b/include/linux/uaccess.h @@ -148,7 +148,7 @@ _copy_to_user(void __user *, const void *, unsigned long); static __always_inline unsigned long __must_check copy_from_user(void *to, const void __user *from, unsigned long n) { - if (likely(check_copy_size(to, n, false))) + if (check_copy_size(to, n, false)) n = _copy_from_user(to, from, n); return n; } @@ -156,7 +156,7 @@ copy_from_user(void *to, const void __user *from, unsigned long n) static __always_inline unsigned long __must_check copy_to_user(void __user *to, const void *from, unsigned long n) { - if (likely(check_copy_size(from, n, true))) + if (check_copy_size(from, n, true)) n = _copy_to_user(to, from, n); return n; } diff --git a/include/linux/uio.h b/include/linux/uio.h index 739285fe5a2f..76d305f3d4c2 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -156,19 +156,17 @@ static inline size_t copy_folio_to_iter(struct folio *folio, size_t offset, static __always_inline __must_check size_t copy_to_iter(const void *addr, size_t bytes, struct iov_iter *i) { - if (unlikely(!check_copy_size(addr, bytes, true))) - return 0; - else + if (check_copy_size(addr, bytes, true)) return _copy_to_iter(addr, bytes, i); + return 0; } static __always_inline __must_check size_t copy_from_iter(void *addr, size_t bytes, struct iov_iter *i) { - if (unlikely(!check_copy_size(addr, bytes, false))) - return 0; - else + if (check_copy_size(addr, bytes, false)) return _copy_from_iter(addr, bytes, i); + return 0; } static __always_inline __must_check @@ -184,10 +182,9 @@ bool copy_from_iter_full(void *addr, size_t bytes, struct iov_iter *i) static __always_inline __must_check size_t copy_from_iter_nocache(void *addr, size_t bytes, struct iov_iter *i) { - if (unlikely(!check_copy_size(addr, bytes, false))) - return 0; - else + if (check_copy_size(addr, bytes, false)) return _copy_from_iter_nocache(addr, bytes, i); + return 0; } static __always_inline __must_check -- cgit From b1d288d9c3c5ca28df062214656a59cf7ee370e0 Mon Sep 17 00:00:00 2001 From: Prashant Malani Date: Mon, 6 Jun 2022 20:18:03 +0000 Subject: platform/chrome: cros_ec_proto: Rename cros_ec_command function cros_ec_command() is the name of a function as well as a struct, as such it can confuse indexing tools (like ctags). Avoid this by renaming it to cros_ec_cmd(). Update all the callsites to use the new name. This patch is a find-and-replace, so should not introduce any functional changes. Suggested-by: Stephen Boyd Signed-off-by: Prashant Malani Acked-by: Lee Jones Reviewed-by: Stephen Boyd Reviewed-by: Guenter Roeck Signed-off-by: Tzung-Bi Shih Link: https://lore.kernel.org/r/20220606201825.763788-3-pmalani@chromium.org --- include/linux/platform_data/cros_ec_proto.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/cros_ec_proto.h b/include/linux/platform_data/cros_ec_proto.h index 138fd912c808..816da4eef3e5 100644 --- a/include/linux/platform_data/cros_ec_proto.h +++ b/include/linux/platform_data/cros_ec_proto.h @@ -231,7 +231,7 @@ bool cros_ec_check_features(struct cros_ec_dev *ec, int feature); int cros_ec_get_sensor_count(struct cros_ec_dev *ec); -int cros_ec_command(struct cros_ec_device *ec_dev, unsigned int version, int command, void *outdata, +int cros_ec_cmd(struct cros_ec_device *ec_dev, unsigned int version, int command, void *outdata, int outsize, void *indata, int insize); /** -- cgit From f87e15fbf6d8cdb51f953338d41a4a52ad1aca14 Mon Sep 17 00:00:00 2001 From: Prashant Malani Date: Mon, 6 Jun 2022 20:18:05 +0000 Subject: platform/chrome: cros_ec_proto: Update size arg types cros_ec_cmd() takes 2 size arguments. Update them to be of the more appropriate type size_t. Suggested-by: Stephen Boyd Signed-off-by: Prashant Malani Reviewed-by: Stephen Boyd Reviewed-by: Guenter Roeck Signed-off-by: Tzung-Bi Shih Link: https://lore.kernel.org/r/20220606201825.763788-4-pmalani@chromium.org --- include/linux/platform_data/cros_ec_proto.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/cros_ec_proto.h b/include/linux/platform_data/cros_ec_proto.h index 816da4eef3e5..85e29300f63d 100644 --- a/include/linux/platform_data/cros_ec_proto.h +++ b/include/linux/platform_data/cros_ec_proto.h @@ -232,7 +232,7 @@ bool cros_ec_check_features(struct cros_ec_dev *ec, int feature); int cros_ec_get_sensor_count(struct cros_ec_dev *ec); int cros_ec_cmd(struct cros_ec_device *ec_dev, unsigned int version, int command, void *outdata, - int outsize, void *indata, int insize); + size_t outsize, void *indata, size_t insize); /** * cros_ec_get_time_ns() - Return time in ns. -- cgit From 62ed448cc53b654036f7d7f3c99f299d79ad14c3 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 7 Jun 2022 16:47:58 -0400 Subject: SUNRPC: Optimize xdr_reserve_space() Transitioning between encode buffers is quite infrequent. It happens about 1 time in 400 calls to xdr_reserve_space(), measured on NFSD with a typical build/test workload. Force the compiler to remove that code from xdr_reserve_space(), which is a hot path on both the server and the client. This change reduces the size of xdr_reserve_space() from 10 cache lines to 2 when compiled with -Os. Signed-off-by: Chuck Lever Reviewed-by: J. Bruce Fields --- include/linux/sunrpc/xdr.h | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 4417f667c757..5860f32e3958 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -243,7 +243,7 @@ extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes); extern int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec, size_t nbytes); -extern void xdr_commit_encode(struct xdr_stream *xdr); +extern void __xdr_commit_encode(struct xdr_stream *xdr); extern void xdr_truncate_encode(struct xdr_stream *xdr, size_t len); extern int xdr_restrict_buflen(struct xdr_stream *xdr, int newbuflen); extern void xdr_write_pages(struct xdr_stream *xdr, struct page **pages, @@ -306,6 +306,20 @@ xdr_reset_scratch_buffer(struct xdr_stream *xdr) xdr_set_scratch_buffer(xdr, NULL, 0); } +/** + * xdr_commit_encode - Ensure all data is written to xdr->buf + * @xdr: pointer to xdr_stream + * + * Handle encoding across page boundaries by giving the caller a + * temporary location to write to, then later copying the data into + * place. __xdr_commit_encode() does that copying. + */ +static inline void xdr_commit_encode(struct xdr_stream *xdr) +{ + if (unlikely(xdr->scratch.iov_len)) + __xdr_commit_encode(xdr); +} + /** * xdr_stream_remaining - Return the number of bytes remaining in the stream * @xdr: pointer to struct xdr_stream -- cgit From 5dfac65b621733e69b789150a0a3f1bf2f9095a3 Mon Sep 17 00:00:00 2001 From: David Jander Date: Wed, 8 Jun 2022 17:33:09 +0200 Subject: spi: : Add missing documentation for struct members Fixes these "make htmldocs" warnings: include/linux/spi/spi.h:82: warning: Function parameter or member 'syncp' not described in 'spi_statistics' include/linux/spi/spi.h:213: warning: Function parameter or member 'pcpu_statistics' not described in 'spi_device' include/linux/spi/spi.h:676: warning: Function parameter or member 'pcpu_statistics' not described in 'spi_controller' Fixes: 6598b91b5ac3 ("spi: spi.c: Convert statistics to per-cpu u64_stats_t") Reported-by: Stephen Rothwell Signed-off-by: David Jander Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220608153309.2899565-1-david@protonic.nl Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index eac8d3caf954..2e63b4935deb 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -35,7 +35,8 @@ extern struct bus_type spi_bus_type; /** * struct spi_statistics - statistics for spi transfers - * @lock: lock protecting this structure + * @syncp: seqcount to protect members in this struct for per-cpu udate + * on 32-bit systems * * @messages: number of spi-messages handled * @transfers: number of spi_transfers handled @@ -155,7 +156,7 @@ extern int spi_delay_exec(struct spi_delay *_delay, struct spi_transfer *xfer); * @cs_inactive: delay to be introduced by the controller after CS is * deasserted. If @cs_change_delay is used from @spi_transfer, then the * two delays will be added up. - * @statistics: statistics for the spi_device + * @pcpu_statistics: statistics for the spi_device * * A @spi_device is used to interchange data between an SPI slave * (usually a discrete chip) and CPU memory. @@ -439,7 +440,7 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch * @max_native_cs: When cs_gpiods is used, and this field is filled in, * spi_register_controller() will validate all native CS (including the * unused native CS) against this value. - * @statistics: statistics for the spi_controller + * @pcpu_statistics: statistics for the spi_controller * @dma_tx: DMA transmit channel * @dma_rx: DMA receive channel * @dummy_rx: dummy receive buffer for full-duplex devices -- cgit From d5a37b19983725d2045588cfa3a4699f5b39ae26 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 8 Jun 2022 08:34:07 +0200 Subject: block: remove bioset_init_from_src Unused now, and the interface never really made a whole lot of sense to start with. Signed-off-by: Christoph Hellwig Signed-off-by: Mike Snitzer --- include/linux/bio.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index 1cf3738ef1ea..992ee987f273 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -403,7 +403,6 @@ enum { extern int bioset_init(struct bio_set *, unsigned int, unsigned int, int flags); extern void bioset_exit(struct bio_set *); extern int biovec_init_pool(mempool_t *pool, int pool_entries); -extern int bioset_init_from_src(struct bio_set *bs, struct bio_set *src); struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs, unsigned int opf, gfp_t gfp_mask, -- cgit From 00d1f546470d89e072dd3cda12b5c794341e7268 Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Thu, 9 Jun 2022 12:19:01 +0800 Subject: vdpa: make get_vq_group and set_group_asid optional This patch makes get_vq_group and set_group_asid optional. This is needed to unbreak the vDPA parent that doesn't support multiple address spaces. Cc: Gautam Dawar Fixes: aaca8373c4b1 ("vhost-vdpa: support ASID based IOTLB API") Signed-off-by: Jason Wang Message-Id: <20220609041901.2029-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin --- include/linux/vdpa.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 4700a88a28f6..7b4a13d3bd91 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -178,7 +178,8 @@ struct vdpa_map_file { * for the device * @vdev: vdpa device * Returns virtqueue algin requirement - * @get_vq_group: Get the group id for a specific virtqueue + * @get_vq_group: Get the group id for a specific + * virtqueue (optional) * @vdev: vdpa device * @idx: virtqueue index * Returns u32: group id for this virtqueue @@ -243,7 +244,7 @@ struct vdpa_map_file { * Returns the iova range supported by * the device. * @set_group_asid: Set address space identifier for a - * virtqueue group + * virtqueue group (optional) * @vdev: vdpa device * @group: virtqueue group * @asid: address space id for this group -- cgit From 0c90466a7985d39355f743e9cd2139da3e86c4d8 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Fri, 3 Jun 2022 23:07:45 +0200 Subject: mtd: hyperbus: Make hyperbus_unregister_device() return void MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The only thing that could theoretically fail in that function is mtd_device_unregister(). However it's not supposed to fail and when used correctly it doesn't. So wail loudly if it does anyhow. This matches how other drivers (e.g. nand/raw/nandsim.c) use mtd_device_unregister(). This is a preparation for making platform remove callbacks return void. Signed-off-by: Uwe Kleine-König Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20220603210758.148493-2-u.kleine-koenig@pengutronix.de --- include/linux/mtd/hyperbus.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mtd/hyperbus.h b/include/linux/mtd/hyperbus.h index 0ce612428aea..bb6b7121a542 100644 --- a/include/linux/mtd/hyperbus.h +++ b/include/linux/mtd/hyperbus.h @@ -89,9 +89,7 @@ int hyperbus_register_device(struct hyperbus_device *hbdev); /** * hyperbus_unregister_device - deregister HyperBus slave memory device * @hbdev: hyperbus_device to be unregistered - * - * Return: 0 for success, others for failure. */ -int hyperbus_unregister_device(struct hyperbus_device *hbdev); +void hyperbus_unregister_device(struct hyperbus_device *hbdev); #endif /* __LINUX_MTD_HYPERBUS_H__ */ -- cgit From 0949ee75da6c918fcbd567e1bfa4943a56ab4e5d Mon Sep 17 00:00:00 2001 From: Javier Martinez Canillas Date: Tue, 7 Jun 2022 20:23:34 +0200 Subject: firmware: sysfb: Make sysfb_create_simplefb() return a pdev pointer This function just returned 0 on success or an errno code on error, but it could be useful for sysfb_init() callers to have a pointer to the device. Signed-off-by: Javier Martinez Canillas Reviewed-by: Daniel Vetter Reviewed-by: Thomas Zimmermann Link: https://patchwork.freedesktop.org/patch/msgid/20220607182338.344270-2-javierm@redhat.com --- include/linux/sysfb.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sysfb.h b/include/linux/sysfb.h index b0dcfa26d07b..708152e9037b 100644 --- a/include/linux/sysfb.h +++ b/include/linux/sysfb.h @@ -72,8 +72,8 @@ static inline void sysfb_apply_efi_quirks(struct platform_device *pd) bool sysfb_parse_mode(const struct screen_info *si, struct simplefb_platform_data *mode); -int sysfb_create_simplefb(const struct screen_info *si, - const struct simplefb_platform_data *mode); +struct platform_device *sysfb_create_simplefb(const struct screen_info *si, + const struct simplefb_platform_data *mode); #else /* CONFIG_SYSFB_SIMPLE */ @@ -83,10 +83,10 @@ static inline bool sysfb_parse_mode(const struct screen_info *si, return false; } -static inline int sysfb_create_simplefb(const struct screen_info *si, - const struct simplefb_platform_data *mode) +static inline struct platform_device *sysfb_create_simplefb(const struct screen_info *si, + const struct simplefb_platform_data *mode) { - return -EINVAL; + return ERR_PTR(-EINVAL); } #endif /* CONFIG_SYSFB_SIMPLE */ -- cgit From bc824922b264aff40eba8c160972ee07a95e7dd4 Mon Sep 17 00:00:00 2001 From: Javier Martinez Canillas Date: Tue, 7 Jun 2022 20:23:35 +0200 Subject: firmware: sysfb: Add sysfb_disable() helper function This can be used by subsystems to unregister a platform device registered by sysfb and also to disable future platform device registration in sysfb. Suggested-by: Daniel Vetter Signed-off-by: Javier Martinez Canillas Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220607182338.344270-3-javierm@redhat.com --- include/linux/sysfb.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sysfb.h b/include/linux/sysfb.h index 708152e9037b..8ba8b5be5567 100644 --- a/include/linux/sysfb.h +++ b/include/linux/sysfb.h @@ -55,6 +55,18 @@ struct efifb_dmi_info { int flags; }; +#ifdef CONFIG_SYSFB + +void sysfb_disable(void); + +#else /* CONFIG_SYSFB */ + +static inline void sysfb_disable(void) +{ +} + +#endif /* CONFIG_SYSFB */ + #ifdef CONFIG_EFI extern struct efifb_dmi_info efifb_dmi_list[]; -- cgit From 69a37a8ba1b408a1c7616494aa7018e4b3844cbe Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 8 Jun 2022 15:18:34 -0400 Subject: mm/huge_memory: Fix xarray node memory leak If xas_split_alloc() fails to allocate the necessary nodes to complete the xarray entry split, it sets the xa_state to -ENOMEM, which xas_nomem() then interprets as "Please allocate more memory", not as "Please free any unnecessary memory" (which was the intended outcome). It's confusing to use xas_nomem() to free memory in this context, so call xas_destroy() instead. Reported-by: syzbot+9e27a75a8c24f3fe75c1@syzkaller.appspotmail.com Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Cc: stable@vger.kernel.org Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/xarray.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/xarray.h b/include/linux/xarray.h index 72feab5ea8d4..c29e11b2c073 100644 --- a/include/linux/xarray.h +++ b/include/linux/xarray.h @@ -1508,6 +1508,7 @@ void *xas_find_marked(struct xa_state *, unsigned long max, xa_mark_t); void xas_init_marks(const struct xa_state *); bool xas_nomem(struct xa_state *, gfp_t); +void xas_destroy(struct xa_state *); void xas_pause(struct xa_state *); void xas_create_range(struct xa_state *); -- cgit From 334f6f53abcf57782bd2fe81da1cbd893e4ef05c Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Thu, 9 Jun 2022 09:13:57 -0400 Subject: mm: Add kernel-doc for folio->mlock_count Fix "./include/linux/mm_types.h:279: warning: Function parameter or member 'mlock_count' not described in 'folio'". Also neaten the html by hiding the anon struct. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/mm_types.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index b34ff2cdbc4f..c29ab4c0cd5c 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -227,6 +227,7 @@ struct page { * struct folio - Represents a contiguous set of bytes. * @flags: Identical to the page flags. * @lru: Least Recently Used list; tracks how recently this folio was used. + * @mlock_count: Number of times this folio has been pinned by mlock(). * @mapping: The file this page belongs to, or refers to the anon_vma for * anonymous memory. * @index: Offset within the file, in units of pages. For anonymous memory, @@ -255,10 +256,14 @@ struct folio { unsigned long flags; union { struct list_head lru; + /* private: avoid cluttering the output */ struct { void *__filler; + /* public: */ unsigned int mlock_count; + /* private: */ }; + /* public: */ }; struct address_space *mapping; pgoff_t index; -- cgit From 874c8ca1e60b2c564a48f7e7acc40d328d5c8733 Mon Sep 17 00:00:00 2001 From: David Howells Date: Thu, 9 Jun 2022 21:46:04 +0100 Subject: netfs: Fix gcc-12 warning by embedding vfs inode in netfs_i_context While randstruct was satisfied with using an open-coded "void *" offset cast for the netfs_i_context <-> inode casting, __builtin_object_size() as used by FORTIFY_SOURCE was not as easily fooled. This was causing the following complaint[1] from gcc v12: In file included from include/linux/string.h:253, from include/linux/ceph/ceph_debug.h:7, from fs/ceph/inode.c:2: In function 'fortify_memset_chk', inlined from 'netfs_i_context_init' at include/linux/netfs.h:326:2, inlined from 'ceph_alloc_inode' at fs/ceph/inode.c:463:2: include/linux/fortify-string.h:242:25: warning: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 242 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by embedding a struct inode into struct netfs_i_context (which should perhaps be renamed to struct netfs_inode). The struct inode vfs_inode fields are then removed from the 9p, afs, ceph and cifs inode structs and vfs_inode is then simply changed to "netfs.inode" in those filesystems. Further, rename netfs_i_context to netfs_inode, get rid of the netfs_inode() function that converted a netfs_i_context pointer to an inode pointer (that can now be done with &ctx->inode) and rename the netfs_i_context() function to netfs_inode() (which is now a wrapper around container_of()). Most of the changes were done with: perl -p -i -e 's/vfs_inode/netfs.inode/'g \ `git grep -l 'vfs_inode' -- fs/{9p,afs,ceph,cifs}/*.[ch]` Kees suggested doing it with a pair structure[2] and a special declarator to insert that into the network filesystem's inode wrapper[3], but I think it's cleaner to embed it - and then it doesn't matter if struct randomisation reorders things. Dave Chinner suggested using a filesystem-specific VFS_I() function in each filesystem to convert that filesystem's own inode wrapper struct into the VFS inode struct[4]. Version #2: - Fix a couple of missed name changes due to a disabled cifs option. - Rename nfs_i_context to nfs_inode - Use "netfs" instead of "nic" as the member name in per-fs inode wrapper structs. [ This also undoes commit 507160f46c55 ("netfs: gcc-12: temporarily disable '-Wattribute-warning' for now") that is no longer needed ] Fixes: bc899ee1c898 ("netfs: Add a netfs inode context") Reported-by: Jeff Layton Signed-off-by: David Howells Reviewed-by: Jeff Layton Reviewed-by: Kees Cook Reviewed-by: Xiubo Li cc: Jonathan Corbet cc: Eric Van Hensbergen cc: Latchesar Ionkov cc: Dominique Martinet cc: Christian Schoenebeck cc: Marc Dionne cc: Ilya Dryomov cc: Steve French cc: William Kucharski cc: "Matthew Wilcox (Oracle)" cc: Dave Chinner cc: linux-doc@vger.kernel.org cc: v9fs-developer@lists.sourceforge.net cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: samba-technical@lists.samba.org cc: linux-fsdevel@vger.kernel.org cc: linux-hardening@vger.kernel.org Link: https://lore.kernel.org/r/d2ad3a3d7bdd794c6efb562d2f2b655fb67756b9.camel@kernel.org/ [1] Link: https://lore.kernel.org/r/20220517210230.864239-1-keescook@chromium.org/ [2] Link: https://lore.kernel.org/r/20220518202212.2322058-1-keescook@chromium.org/ [3] Link: https://lore.kernel.org/r/20220524101205.GI2306852@dread.disaster.area/ [4] Link: https://lore.kernel.org/r/165296786831.3591209.12111293034669289733.stgit@warthog.procyon.org.uk/ # v1 Link: https://lore.kernel.org/r/165305805651.4094995.7763502506786714216.stgit@warthog.procyon.org.uk # v2 Signed-off-by: Linus Torvalds --- include/linux/netfs.h | 41 ++++++++++++++++------------------------- 1 file changed, 16 insertions(+), 25 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 77fa6a61706a..6dbb4c9ce50d 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -119,9 +119,10 @@ typedef void (*netfs_io_terminated_t)(void *priv, ssize_t transferred_or_error, bool was_async); /* - * Per-inode description. This must be directly after the inode struct. + * Per-inode context. This wraps the VFS inode. */ -struct netfs_i_context { +struct netfs_inode { + struct inode inode; /* The VFS inode */ const struct netfs_request_ops *ops; #if IS_ENABLED(CONFIG_FSCACHE) struct fscache_cookie *cache; @@ -256,7 +257,7 @@ struct netfs_cache_ops { * boundary as appropriate. */ enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq, - loff_t i_size); + loff_t i_size); /* Prepare a write operation, working out what part of the write we can * actually do. @@ -288,45 +289,35 @@ extern void netfs_put_subrequest(struct netfs_io_subrequest *subreq, extern void netfs_stats_show(struct seq_file *); /** - * netfs_i_context - Get the netfs inode context from the inode + * netfs_inode - Get the netfs inode context from the inode * @inode: The inode to query * * Get the netfs lib inode context from the network filesystem's inode. The * context struct is expected to directly follow on from the VFS inode struct. */ -static inline struct netfs_i_context *netfs_i_context(struct inode *inode) +static inline struct netfs_inode *netfs_inode(struct inode *inode) { - return (void *)inode + sizeof(*inode); + return container_of(inode, struct netfs_inode, inode); } /** - * netfs_inode - Get the netfs inode from the inode context - * @ctx: The context to query - * - * Get the netfs inode from the netfs library's inode context. The VFS inode - * is expected to directly precede the context struct. - */ -static inline struct inode *netfs_inode(struct netfs_i_context *ctx) -{ - return (void *)ctx - sizeof(struct inode); -} - -/** - * netfs_i_context_init - Initialise a netfs lib context + * netfs_inode_init - Initialise a netfslib inode context * @inode: The inode with which the context is associated * @ops: The netfs's operations list * * Initialise the netfs library context struct. This is expected to follow on * directly from the VFS inode struct. */ -static inline void netfs_i_context_init(struct inode *inode, - const struct netfs_request_ops *ops) +static inline void netfs_inode_init(struct inode *inode, + const struct netfs_request_ops *ops) { - struct netfs_i_context *ctx = netfs_i_context(inode); + struct netfs_inode *ctx = netfs_inode(inode); - memset(ctx, 0, sizeof(*ctx)); ctx->ops = ops; ctx->remote_i_size = i_size_read(inode); +#if IS_ENABLED(CONFIG_FSCACHE) + ctx->cache = NULL; +#endif } /** @@ -338,7 +329,7 @@ static inline void netfs_i_context_init(struct inode *inode, */ static inline void netfs_resize_file(struct inode *inode, loff_t new_i_size) { - struct netfs_i_context *ctx = netfs_i_context(inode); + struct netfs_inode *ctx = netfs_inode(inode); ctx->remote_i_size = new_i_size; } @@ -352,7 +343,7 @@ static inline void netfs_resize_file(struct inode *inode, loff_t new_i_size) static inline struct fscache_cookie *netfs_i_cookie(struct inode *inode) { #if IS_ENABLED(CONFIG_FSCACHE) - struct netfs_i_context *ctx = netfs_i_context(inode); + struct netfs_inode *ctx = netfs_inode(inode); return ctx->cache; #else return NULL; -- cgit From ea7f0f777d28db6e500a05836f2a9d467c7012de Mon Sep 17 00:00:00 2001 From: Tzung-Bi Shih Date: Thu, 9 Jun 2022 08:49:37 +0000 Subject: platform/chrome: cros_ec_commands: fix compile errors Fix compile errors when including cros_ec_commands.h solely. 1. cros_ec_commands.h:587:9: error: unknown type name 'uint8_t' 587 | uint8_t flags; | ^~~~~~~ 2. cros_ec_commands.h:1105:43: error: implicit declaration of function 'BIT' 1105 | EC_COMMS_STATUS_PROCESSING = BIT(0), | ^~~ Reviewed-by: Guenter Roeck Signed-off-by: Tzung-Bi Shih Link: https://lore.kernel.org/r/20220609084957.3684698-2-tzungbi@kernel.org --- include/linux/platform_data/cros_ec_commands.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/cros_ec_commands.h b/include/linux/platform_data/cros_ec_commands.h index e59f51c41a1c..f13568b3e247 100644 --- a/include/linux/platform_data/cros_ec_commands.h +++ b/include/linux/platform_data/cros_ec_commands.h @@ -13,8 +13,8 @@ #ifndef __CROS_EC_COMMANDS_H #define __CROS_EC_COMMANDS_H - - +#include +#include #define BUILD_ASSERT(_cond) -- cgit From 3db0c9e5de7bd9dbe52580eb9752b2b3049e38da Mon Sep 17 00:00:00 2001 From: Tzung-Bi Shih Date: Thu, 9 Jun 2022 08:49:39 +0000 Subject: platform/chrome: use macros for passthru indexes Move passthru indexes for EC and PD devices to common header. Also use them instead of literal constants. Reviewed-by: Guenter Roeck Signed-off-by: Tzung-Bi Shih Link: https://lore.kernel.org/r/20220609084957.3684698-4-tzungbi@kernel.org --- include/linux/platform_data/cros_ec_proto.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/cros_ec_proto.h b/include/linux/platform_data/cros_ec_proto.h index 85e29300f63d..c82a8285d936 100644 --- a/include/linux/platform_data/cros_ec_proto.h +++ b/include/linux/platform_data/cros_ec_proto.h @@ -21,6 +21,9 @@ #define CROS_EC_DEV_SCP_NAME "cros_scp" #define CROS_EC_DEV_TP_NAME "cros_tp" +#define CROS_EC_DEV_EC_INDEX 0 +#define CROS_EC_DEV_PD_INDEX 1 + /* * The EC is unresponsive for a time after a reboot command. Add a * simple delay to make sure that the bus stays locked. -- cgit From d62607c3fe45911b2331fac073355a8c914bbde2 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Tue, 7 Jun 2022 21:39:55 -0700 Subject: net: rename reference+tracking helpers Netdev reference helpers have a dev_ prefix for historic reasons. Renaming the old helpers would be too much churn but we can rename the tracking ones which are relatively recent and should be the default for new code. Rename: dev_hold_track() -> netdev_hold() dev_put_track() -> netdev_put() dev_replace_track() -> netdev_ref_replace() Link: https://lore.kernel.org/r/20220608043955.919359-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f615a66c89e9..e2e5088888b1 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3981,8 +3981,8 @@ static inline void netdev_tracker_free(struct net_device *dev, #endif } -static inline void dev_hold_track(struct net_device *dev, - netdevice_tracker *tracker, gfp_t gfp) +static inline void netdev_hold(struct net_device *dev, + netdevice_tracker *tracker, gfp_t gfp) { if (dev) { __dev_hold(dev); @@ -3990,8 +3990,8 @@ static inline void dev_hold_track(struct net_device *dev, } } -static inline void dev_put_track(struct net_device *dev, - netdevice_tracker *tracker) +static inline void netdev_put(struct net_device *dev, + netdevice_tracker *tracker) { if (dev) { netdev_tracker_free(dev, tracker); @@ -4004,11 +4004,11 @@ static inline void dev_put_track(struct net_device *dev, * @dev: network device * * Hold reference to device to keep it from being freed. - * Try using dev_hold_track() instead. + * Try using netdev_hold() instead. */ static inline void dev_hold(struct net_device *dev) { - dev_hold_track(dev, NULL, GFP_ATOMIC); + netdev_hold(dev, NULL, GFP_ATOMIC); } /** @@ -4016,17 +4016,17 @@ static inline void dev_hold(struct net_device *dev) * @dev: network device * * Release reference to device to allow it to be freed. - * Try using dev_put_track() instead. + * Try using netdev_put() instead. */ static inline void dev_put(struct net_device *dev) { - dev_put_track(dev, NULL); + netdev_put(dev, NULL); } -static inline void dev_replace_track(struct net_device *odev, - struct net_device *ndev, - netdevice_tracker *tracker, - gfp_t gfp) +static inline void netdev_ref_replace(struct net_device *odev, + struct net_device *ndev, + netdevice_tracker *tracker, + gfp_t gfp) { if (odev) netdev_tracker_free(odev, tracker); -- cgit From 09cca53c1656960c38c04bb12da30f61952ca660 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 8 Jun 2022 08:46:32 -0700 Subject: vlan: adopt u64_stats_t As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Add READ_ONCE() when reading rx_errors & tx_dropped. Signed-off-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- include/linux/if_macvlan.h | 6 +++--- include/linux/if_vlan.h | 10 +++++----- 2 files changed, 8 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h index b42294739063..523025106a64 100644 --- a/include/linux/if_macvlan.h +++ b/include/linux/if_macvlan.h @@ -46,10 +46,10 @@ static inline void macvlan_count_rx(const struct macvlan_dev *vlan, pcpu_stats = get_cpu_ptr(vlan->pcpu_stats); u64_stats_update_begin(&pcpu_stats->syncp); - pcpu_stats->rx_packets++; - pcpu_stats->rx_bytes += len; + u64_stats_inc(&pcpu_stats->rx_packets); + u64_stats_add(&pcpu_stats->rx_bytes, len); if (multicast) - pcpu_stats->rx_multicast++; + u64_stats_inc(&pcpu_stats->rx_multicast); u64_stats_update_end(&pcpu_stats->syncp); put_cpu_ptr(vlan->pcpu_stats); } else { diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h index 2be4dd7e90a9..e00c4ee81ff7 100644 --- a/include/linux/if_vlan.h +++ b/include/linux/if_vlan.h @@ -118,11 +118,11 @@ static inline void vlan_drop_rx_stag_filter_info(struct net_device *dev) * @tx_dropped: number of tx drops */ struct vlan_pcpu_stats { - u64 rx_packets; - u64 rx_bytes; - u64 rx_multicast; - u64 tx_packets; - u64 tx_bytes; + u64_stats_t rx_packets; + u64_stats_t rx_bytes; + u64_stats_t rx_multicast; + u64_stats_t tx_packets; + u64_stats_t tx_bytes; struct u64_stats_sync syncp; u32 rx_errors; u32 tx_dropped; -- cgit From 9962acefbcb92736c268aafe5f52200948f60f3e Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 8 Jun 2022 08:46:37 -0700 Subject: net: adopt u64_stats_t in struct pcpu_sw_netstats As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Signed-off-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index e2e5088888b1..89afa4f7747d 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2636,10 +2636,10 @@ struct packet_offload { /* often modified stats are per-CPU, other are shared (netdev->stats) */ struct pcpu_sw_netstats { - u64 rx_packets; - u64 rx_bytes; - u64 tx_packets; - u64 tx_bytes; + u64_stats_t rx_packets; + u64_stats_t rx_bytes; + u64_stats_t tx_packets; + u64_stats_t tx_bytes; struct u64_stats_sync syncp; } __aligned(4 * sizeof(u64)); @@ -2656,8 +2656,8 @@ static inline void dev_sw_netstats_rx_add(struct net_device *dev, unsigned int l struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats); u64_stats_update_begin(&tstats->syncp); - tstats->rx_bytes += len; - tstats->rx_packets++; + u64_stats_add(&tstats->rx_bytes, len); + u64_stats_inc(&tstats->rx_packets); u64_stats_update_end(&tstats->syncp); } @@ -2668,8 +2668,8 @@ static inline void dev_sw_netstats_tx_add(struct net_device *dev, struct pcpu_sw_netstats *tstats = this_cpu_ptr(dev->tstats); u64_stats_update_begin(&tstats->syncp); - tstats->tx_bytes += len; - tstats->tx_packets += packets; + u64_stats_add(&tstats->tx_bytes, len); + u64_stats_add(&tstats->tx_packets, packets); u64_stats_update_end(&tstats->syncp); } -- cgit From 9ec321aba2eaca109139a2a67c65ede7f07bd8ec Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 8 Jun 2022 08:46:40 -0700 Subject: team: adopt u64_stats_t As explained in commit 316580b69d0a ("u64_stats: provide u64_stats_t type") we should use u64_stats_t and related accessors to avoid load/store tearing. Signed-off-by: Eric Dumazet Signed-off-by: Jakub Kicinski --- include/linux/if_team.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/if_team.h b/include/linux/if_team.h index add607943c95..fc985e5c739d 100644 --- a/include/linux/if_team.h +++ b/include/linux/if_team.h @@ -12,11 +12,11 @@ #include struct team_pcpu_stats { - u64 rx_packets; - u64 rx_bytes; - u64 rx_multicast; - u64 tx_packets; - u64 tx_bytes; + u64_stats_t rx_packets; + u64_stats_t rx_bytes; + u64_stats_t rx_multicast; + u64_stats_t tx_packets; + u64_stats_t tx_bytes; struct u64_stats_sync syncp; u32 rx_dropped; u32 tx_dropped; -- cgit From 21ab562c1f65e583b5d89081acaf096805522482 Mon Sep 17 00:00:00 2001 From: Po Hao Huang Date: Wed, 8 Jun 2022 19:32:22 +0800 Subject: ieee80211: add trigger frame definition Define trigger stype of control frame, and its checking function, struct and trigger type within common_info of trigger. Signed-off-by: Po Hao Huang Signed-off-by: Ping-Ke Shih Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20220608113224.11193-2-pkshih@realtek.com --- include/linux/ieee80211.h | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index 75d40acb60c1..5c65ae6b8154 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -76,6 +76,7 @@ #define IEEE80211_STYPE_ACTION 0x00D0 /* control */ +#define IEEE80211_STYPE_TRIGGER 0x0020 #define IEEE80211_STYPE_CTL_EXT 0x0060 #define IEEE80211_STYPE_BACK_REQ 0x0080 #define IEEE80211_STYPE_BACK 0x0090 @@ -295,6 +296,17 @@ static inline u16 ieee80211_sn_sub(u16 sn1, u16 sn2) #define IEEE80211_HT_CTL_LEN 4 +/* trigger type within common_info of trigger frame */ +#define IEEE80211_TRIGGER_TYPE_MASK 0xf +#define IEEE80211_TRIGGER_TYPE_BASIC 0x0 +#define IEEE80211_TRIGGER_TYPE_BFRP 0x1 +#define IEEE80211_TRIGGER_TYPE_MU_BAR 0x2 +#define IEEE80211_TRIGGER_TYPE_MU_RTS 0x3 +#define IEEE80211_TRIGGER_TYPE_BSRP 0x4 +#define IEEE80211_TRIGGER_TYPE_GCR_MU_BAR 0x5 +#define IEEE80211_TRIGGER_TYPE_BQRP 0x6 +#define IEEE80211_TRIGGER_TYPE_NFRP 0x7 + struct ieee80211_hdr { __le16 frame_control; __le16 duration_id; @@ -324,6 +336,15 @@ struct ieee80211_qos_hdr { __le16 qos_ctrl; } __packed __aligned(2); +struct ieee80211_trigger { + __le16 frame_control; + __le16 duration; + u8 ra[ETH_ALEN]; + u8 ta[ETH_ALEN]; + __le64 common_info; + u8 variable[]; +} __packed __aligned(2); + /** * ieee80211_has_tods - check if IEEE80211_FCTL_TODS is set * @fc: frame control bytes in little-endian byteorder @@ -729,6 +750,16 @@ static inline bool ieee80211_is_qos_nullfunc(__le16 fc) cpu_to_le16(IEEE80211_FTYPE_DATA | IEEE80211_STYPE_QOS_NULLFUNC); } +/** + * ieee80211_is_trigger - check if frame is trigger frame + * @fc: frame control field in little-endian byteorder + */ +static inline bool ieee80211_is_trigger(__le16 fc) +{ + return (fc & cpu_to_le16(IEEE80211_FCTL_FTYPE | IEEE80211_FCTL_STYPE)) == + cpu_to_le16(IEEE80211_FTYPE_CTL | IEEE80211_STYPE_TRIGGER); +} + /** * ieee80211_is_any_nullfunc - check if frame is regular or QoS nullfunc frame * @fc: frame control bytes in little-endian byteorder -- cgit From e3fa404a261bb8b7795b62be1b9af7b7dc1030f6 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sat, 28 May 2022 16:59:17 +0200 Subject: USB: Follow-up to SPDX identifiers addition - remove now useless comments All these files have been updated in the commit given in the Fixes: tag below. When the SPDX-License-Identifier: has been added, the corresponding text at the beginning of the files has not been deleted. All these texts are about GPL-2.0, with different variation in the wording. Remove these now useless lines to save some LoC. Fixes: 5fd54ace4721 ("USB: add SPDX identifiers to all remaining files in drivers/usb/") Signed-off-by: Christophe JAILLET Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/audio-v2.h | 3 --- include/linux/usb/audio.h | 3 --- include/linux/usb/cdc-wdm.h | 4 ---- include/linux/usb/cdc.h | 4 ---- include/linux/usb/gadget.h | 2 -- include/linux/usb/input.h | 4 ---- include/linux/usb/isp1301.h | 10 ---------- include/linux/usb/m66592.h | 14 -------------- include/linux/usb/of.h | 2 -- include/linux/usb/r8a66597.h | 14 -------------- include/linux/usb/serial.h | 5 ----- include/linux/usb/storage.h | 2 -- include/linux/usb/tegra_usb_phy.h | 10 ---------- include/linux/usb/ulpi.h | 4 ---- include/linux/usb/xhci-dbgp.h | 4 ---- 15 files changed, 85 deletions(-) (limited to 'include/linux') diff --git a/include/linux/usb/audio-v2.h b/include/linux/usb/audio-v2.h index 8fc2abd7aecb..ca796dc1a984 100644 --- a/include/linux/usb/audio-v2.h +++ b/include/linux/usb/audio-v2.h @@ -2,9 +2,6 @@ /* * Copyright (c) 2010 Daniel Mack * - * This software is distributed under the terms of the GNU General Public - * License ("GPL") version 2, as published by the Free Software Foundation. - * * This file holds USB constants and structures defined * by the USB Device Class Definition for Audio Devices in version 2.0. * Comments below reference relevant sections of the documents contained diff --git a/include/linux/usb/audio.h b/include/linux/usb/audio.h index 170acd500ea1..0747b24a1a7c 100644 --- a/include/linux/usb/audio.h +++ b/include/linux/usb/audio.h @@ -6,9 +6,6 @@ * Developed for Thumtronics by Grey Innovation * Ben Williamson * - * This software is distributed under the terms of the GNU General Public - * License ("GPL") version 2, as published by the Free Software Foundation. - * * This file holds USB constants and structures defined * by the USB Device Class Definition for Audio Devices. * Comments below reference relevant sections of that document: diff --git a/include/linux/usb/cdc-wdm.h b/include/linux/usb/cdc-wdm.h index 9f5a51f79ba5..85417f00a89a 100644 --- a/include/linux/usb/cdc-wdm.h +++ b/include/linux/usb/cdc-wdm.h @@ -3,10 +3,6 @@ * USB CDC Device Management subdriver * * Copyright (c) 2012 Bjørn Mork - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * version 2 as published by the Free Software Foundation. */ #ifndef __LINUX_USB_CDC_WDM_H diff --git a/include/linux/usb/cdc.h b/include/linux/usb/cdc.h index 35d784cf32a4..0af0db51bc89 100644 --- a/include/linux/usb/cdc.h +++ b/include/linux/usb/cdc.h @@ -3,10 +3,6 @@ * USB CDC common helpers * * Copyright (c) 2015 Oliver Neukum - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License - * version 2 as published by the Free Software Foundation. */ #ifndef __LINUX_USB_CDC_H #define __LINUX_USB_CDC_H diff --git a/include/linux/usb/gadget.h b/include/linux/usb/gadget.h index 3ad58b7a0824..dc3092cea99e 100644 --- a/include/linux/usb/gadget.h +++ b/include/linux/usb/gadget.h @@ -10,8 +10,6 @@ * * (C) Copyright 2002-2004 by David Brownell * All Rights Reserved. - * - * This software is licensed under the GNU GPL version 2. */ #ifndef __LINUX_USB_GADGET_H diff --git a/include/linux/usb/input.h b/include/linux/usb/input.h index 974befa72ac0..5e759b2cf551 100644 --- a/include/linux/usb/input.h +++ b/include/linux/usb/input.h @@ -1,10 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 2005 Dmitry Torokhov - * - * This program is free software; you can redistribute it and/or modify it - * under the terms of the GNU General Public License version 2 as published by - * the Free Software Foundation. */ #ifndef __LINUX_USB_INPUT_H diff --git a/include/linux/usb/isp1301.h b/include/linux/usb/isp1301.h index dedb3b2473e8..fa986b926a12 100644 --- a/include/linux/usb/isp1301.h +++ b/include/linux/usb/isp1301.h @@ -3,16 +3,6 @@ * NXP ISP1301 USB transceiver driver * * Copyright (C) 2012 Roland Stigge - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; version 2 of the License. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * */ #ifndef __LINUX_USB_ISP1301_H diff --git a/include/linux/usb/m66592.h b/include/linux/usb/m66592.h index 2dfe68183495..5f04de2b47fd 100644 --- a/include/linux/usb/m66592.h +++ b/include/linux/usb/m66592.h @@ -3,20 +3,6 @@ * M66592 driver platform data * * Copyright (C) 2009 Renesas Solutions Corp. - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; version 2 of the License. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA - * */ #ifndef __LINUX_USB_M66592_H diff --git a/include/linux/usb/of.h b/include/linux/usb/of.h index dba55ccb9b53..98487fd7ab11 100644 --- a/include/linux/usb/of.h +++ b/include/linux/usb/of.h @@ -1,8 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 /* * OF helpers for usb devices. - * - * This file is released under the GPLv2 */ #ifndef __LINUX_USB_OF_H diff --git a/include/linux/usb/r8a66597.h b/include/linux/usb/r8a66597.h index c0753d026bbf..f0fa7ddadbaa 100644 --- a/include/linux/usb/r8a66597.h +++ b/include/linux/usb/r8a66597.h @@ -5,20 +5,6 @@ * Copyright (C) 2009 Renesas Solutions Corp. * * Author : Yoshihiro Shimoda - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; version 2 of the License. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA - * */ #ifndef __LINUX_USB_R8A66597_H diff --git a/include/linux/usb/serial.h b/include/linux/usb/serial.h index 16ea5a4cc586..8ea319f89e1f 100644 --- a/include/linux/usb/serial.h +++ b/include/linux/usb/serial.h @@ -4,11 +4,6 @@ * * Copyright (C) 1999 - 2012 * Greg Kroah-Hartman (greg@kroah.com) - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; version 2 of the License. - * */ #ifndef __LINUX_USB_SERIAL_H diff --git a/include/linux/usb/storage.h b/include/linux/usb/storage.h index e0240f864548..2827ce72e502 100644 --- a/include/linux/usb/storage.h +++ b/include/linux/usb/storage.h @@ -9,8 +9,6 @@ * * This file contains definitions taken from the * USB Mass Storage Class Specification Overview - * - * Distributed under the terms of the GNU GPL, version two. */ /* Storage subclass codes */ diff --git a/include/linux/usb/tegra_usb_phy.h b/include/linux/usb/tegra_usb_phy.h index d3e65eb9e16f..46e73584b6e6 100644 --- a/include/linux/usb/tegra_usb_phy.h +++ b/include/linux/usb/tegra_usb_phy.h @@ -1,16 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 /* * Copyright (C) 2010 Google, Inc. - * - * This software is licensed under the terms of the GNU General Public - * License version 2, as published by the Free Software Foundation, and - * may be copied, distributed, and modified under those terms. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * */ #ifndef __TEGRA_USB_PHY_H diff --git a/include/linux/usb/ulpi.h b/include/linux/usb/ulpi.h index 36c2982780ad..5050f502c1ed 100644 --- a/include/linux/usb/ulpi.h +++ b/include/linux/usb/ulpi.h @@ -3,10 +3,6 @@ * ulpi.h -- ULPI defines and function prorotypes * * Copyright (C) 2010 Nokia Corporation - * - * This software is distributed under the terms of the GNU General - * Public License ("GPL") as published by the Free Software Foundation, - * version 2 of that License. */ #ifndef __LINUX_USB_ULPI_H diff --git a/include/linux/usb/xhci-dbgp.h b/include/linux/usb/xhci-dbgp.h index 01fe768873f9..171fd74b1cfc 100644 --- a/include/linux/usb/xhci-dbgp.h +++ b/include/linux/usb/xhci-dbgp.h @@ -5,10 +5,6 @@ * Copyright (C) 2016 Intel Corporation * * Author: Lu Baolu - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. */ #ifndef __LINUX_XHCI_DBGP_H -- cgit From 3e00a22fdc9a9457f1c64d885532710e174babd8 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sat, 28 May 2022 17:13:56 +0200 Subject: USB: Follow-up to SPDX GPL-2.0+ identifiers addition - remove now useless comments All these files have been updated in the commit given in the Fixes: tag below. When the SPDX-License-Identifier: has been added, the corresponding text at the beginning of the files has not been deleted. All these texts are about GPL-2.0+, with different variation in the wording. Remove these now useless lines to save some LoC. Fixes: 5fd54ace4721 ("USB: add SPDX identifiers to all remaining files in drivers/usb/") Signed-off-by: Christophe JAILLET Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/c67x00.h | 15 --------------- include/linux/usb/composite.h | 14 -------------- include/linux/usb/ehci_def.h | 14 -------------- include/linux/usb/ehci_pdriver.h | 14 -------------- include/linux/usb/g_hid.h | 14 -------------- include/linux/usb/hcd.h | 14 -------------- include/linux/usb/musb-ux500.h | 10 ---------- include/linux/usb/net2280.h | 14 -------------- include/linux/usb/ohci_pdriver.h | 14 -------------- include/linux/usb/otg-fsm.h | 17 ++--------------- include/linux/usb/phy_companion.h | 10 ---------- include/linux/usb/rndis_host.h | 14 -------------- include/linux/usb/usb338x.h | 11 ----------- include/linux/usb/usbnet.h | 14 -------------- 14 files changed, 2 insertions(+), 187 deletions(-) (limited to 'include/linux') diff --git a/include/linux/usb/c67x00.h b/include/linux/usb/c67x00.h index 2fc39e3b7281..45e0757e58f3 100644 --- a/include/linux/usb/c67x00.h +++ b/include/linux/usb/c67x00.h @@ -3,21 +3,6 @@ * usb_c67x00.h: platform definitions for the Cypress C67X00 USB chip * * Copyright (C) 2006-2008 Barco N.V. - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, - * MA 02110-1301 USA. */ #ifndef _LINUX_USB_C67X00_H diff --git a/include/linux/usb/composite.h b/include/linux/usb/composite.h index 9d2762279286..43ac3fa760db 100644 --- a/include/linux/usb/composite.h +++ b/include/linux/usb/composite.h @@ -3,20 +3,6 @@ * composite.h -- framework for usb gadgets which are composite devices * * Copyright (C) 2006-2008 David Brownell - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */ #ifndef __LINUX_USB_COMPOSITE_H diff --git a/include/linux/usb/ehci_def.h b/include/linux/usb/ehci_def.h index c892c5bc6638..fbabadd3b372 100644 --- a/include/linux/usb/ehci_def.h +++ b/include/linux/usb/ehci_def.h @@ -1,20 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* * Copyright (c) 2001-2002 by David Brownell - * - * This program is free software; you can redistribute it and/or modify it - * under the terms of the GNU General Public License as published by the - * Free Software Foundation; either version 2 of the License, or (at your - * option) any later version. - * - * This program is distributed in the hope that it will be useful, but - * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY - * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License - * for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software Foundation, - * Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ #ifndef __LINUX_USB_EHCI_DEF_H diff --git a/include/linux/usb/ehci_pdriver.h b/include/linux/usb/ehci_pdriver.h index 89fc901e778f..0f1b166f5aa0 100644 --- a/include/linux/usb/ehci_pdriver.h +++ b/include/linux/usb/ehci_pdriver.h @@ -1,20 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* * Copyright (C) 2012 Hauke Mehrtens - * - * This program is free software; you can redistribute it and/or modify it - * under the terms of the GNU General Public License as published by the - * Free Software Foundation; either version 2 of the License, or (at your - * option) any later version. - * - * This program is distributed in the hope that it will be useful, but - * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY - * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License - * for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software Foundation, - * Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ #ifndef __USB_CORE_EHCI_PDRIVER_H diff --git a/include/linux/usb/g_hid.h b/include/linux/usb/g_hid.h index 7581e488c237..d56bfedeb079 100644 --- a/include/linux/usb/g_hid.h +++ b/include/linux/usb/g_hid.h @@ -3,20 +3,6 @@ * g_hid.h -- Header file for USB HID gadget driver * * Copyright (C) 2010 Fabien Chouteau - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ #ifndef __LINUX_USB_G_HID_H diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h index 2c1fc9212cf2..37ccb31b1d40 100644 --- a/include/linux/usb/hcd.h +++ b/include/linux/usb/hcd.h @@ -1,20 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* * Copyright (c) 2001-2002 by David Brownell - * - * This program is free software; you can redistribute it and/or modify it - * under the terms of the GNU General Public License as published by the - * Free Software Foundation; either version 2 of the License, or (at your - * option) any later version. - * - * This program is distributed in the hope that it will be useful, but - * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY - * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License - * for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software Foundation, - * Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ #ifndef __USB_CORE_HCD_H diff --git a/include/linux/usb/musb-ux500.h b/include/linux/usb/musb-ux500.h index c4b7ad9850ca..d60dcfc56b5a 100644 --- a/include/linux/usb/musb-ux500.h +++ b/include/linux/usb/musb-ux500.h @@ -1,16 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* * Copyright (C) 2013 ST-Ericsson AB - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. */ #ifndef __MUSB_UX500_H__ diff --git a/include/linux/usb/net2280.h b/include/linux/usb/net2280.h index 08b85caecfaf..f29fe6a1f415 100644 --- a/include/linux/usb/net2280.h +++ b/include/linux/usb/net2280.h @@ -5,20 +5,6 @@ * * Copyright (C) 2002 NetChip Technology, Inc. (http://www.netchip.com) * Copyright (C) 2003 David Brownell - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ #ifndef __LINUX_USB_NET2280_H diff --git a/include/linux/usb/ohci_pdriver.h b/include/linux/usb/ohci_pdriver.h index 7eb16cf587ee..2447c78b1766 100644 --- a/include/linux/usb/ohci_pdriver.h +++ b/include/linux/usb/ohci_pdriver.h @@ -1,20 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ /* * Copyright (C) 2012 Hauke Mehrtens - * - * This program is free software; you can redistribute it and/or modify it - * under the terms of the GNU General Public License as published by the - * Free Software Foundation; either version 2 of the License, or (at your - * option) any later version. - * - * This program is distributed in the hope that it will be useful, but - * WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY - * or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License - * for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software Foundation, - * Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ #ifndef __USB_CORE_OHCI_PDRIVER_H diff --git a/include/linux/usb/otg-fsm.h b/include/linux/usb/otg-fsm.h index 784659d4dc99..6135d076c53d 100644 --- a/include/linux/usb/otg-fsm.h +++ b/include/linux/usb/otg-fsm.h @@ -1,19 +1,6 @@ // SPDX-License-Identifier: GPL-2.0+ -/* Copyright (C) 2007,2008 Freescale Semiconductor, Inc. - * - * This program is free software; you can redistribute it and/or modify it - * under the terms of the GNU General Public License as published by the - * Free Software Foundation; either version 2 of the License, or (at your - * option) any later version. - * - * This program is distributed in the hope that it will be useful, but - * WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU - * General Public License for more details. - * - * You should have received a copy of the GNU General Public License along - * with this program; if not, write to the Free Software Foundation, Inc., - * 675 Mass Ave, Cambridge, MA 02139, USA. +/* + * Copyright (C) 2007,2008 Freescale Semiconductor, Inc. */ #ifndef __LINUX_USB_OTG_FSM_H diff --git a/include/linux/usb/phy_companion.h b/include/linux/usb/phy_companion.h index 263196f05015..862aaeca2319 100644 --- a/include/linux/usb/phy_companion.h +++ b/include/linux/usb/phy_companion.h @@ -3,18 +3,8 @@ * phy-companion.h -- phy companion to indicate the comparator part of PHY * * Copyright (C) 2012 Texas Instruments Incorporated - https://www.ti.com - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. * * Author: Kishon Vijay Abraham I - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * */ #ifndef __DRIVERS_PHY_COMPANION_H diff --git a/include/linux/usb/rndis_host.h b/include/linux/usb/rndis_host.h index cc42db51bbba..489cfb1d00f6 100644 --- a/include/linux/usb/rndis_host.h +++ b/include/linux/usb/rndis_host.h @@ -2,20 +2,6 @@ /* * Host Side support for RNDIS Networking Links * Copyright (C) 2005 by David Brownell - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ #ifndef __LINUX_USB_RNDIS_HOST_H diff --git a/include/linux/usb/usb338x.h b/include/linux/usb/usb338x.h index 20020c1336d5..70a7e3cdb3c9 100644 --- a/include/linux/usb/usb338x.h +++ b/include/linux/usb/usb338x.h @@ -6,17 +6,6 @@ * Copyright (C) 2002 NetChip Technology, Inc. (http://www.netchip.com) * Copyright (C) 2003 David Brownell * Copyright (C) 2014 Ricardo Ribalda - Qtechnology/AS - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * */ #ifndef __LINUX_USB_USB338X_H diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h index 1b4d72d5e891..4f5608cbd9ab 100644 --- a/include/linux/usb/usbnet.h +++ b/include/linux/usb/usbnet.h @@ -4,20 +4,6 @@ * * Copyright (C) 2000-2005 by David Brownell * Copyright (C) 2003-2005 David Hollis - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by - * the Free Software Foundation; either version 2 of the License, or - * (at your option) any later version. - * - * This program is distributed in the hope that it will be useful, - * but WITHOUT ANY WARRANTY; without even the implied warranty of - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - * - * You should have received a copy of the GNU General Public License - * along with this program; if not, write to the Free Software - * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA */ #ifndef __LINUX_USB_USBNET_H -- cgit From 39e0f991a62ed5efabd20711a7b6e7da92603170 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Tue, 7 Jun 2022 17:00:16 +0200 Subject: random: mark bootloader randomness code as __init add_bootloader_randomness() and the variables it touches are only used during __init and not after, so mark these as __init. At the same time, unexport this, since it's only called by other __init code that's built-in. Cc: stable@vger.kernel.org Fixes: 428826f5358c ("fdt: add support for rng-seed") Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index fae0c84027fd..223b4bd584e7 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -13,7 +13,7 @@ struct notifier_block; void add_device_randomness(const void *buf, size_t len); -void add_bootloader_randomness(const void *buf, size_t len); +void __init add_bootloader_randomness(const void *buf, size_t len); void add_input_randomness(unsigned int type, unsigned int code, unsigned int value) __latent_entropy; void add_interrupt_randomness(int irq) __latent_entropy; -- cgit From e052a478a7daeca67664f7addd308ff51dd40654 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Wed, 8 Jun 2022 10:31:25 +0200 Subject: random: remove rng_has_arch_random() With arch randomness being used by every distro and enabled in defconfigs, the distinction between rng_has_arch_random() and rng_is_initialized() is now rather small. In fact, the places where they differ are now places where paranoid users and system builders really don't want arch randomness to be used, in which case we should respect that choice, or places where arch randomness is known to be broken, in which case that choice is all the more important. So this commit just removes the function and its one user. Reviewed-by: Petr Mladek # for vsprintf.c Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index 223b4bd584e7..20e389a14e5c 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -74,7 +74,6 @@ static inline unsigned long get_random_canary(void) int __init random_init(const char *command_line); bool rng_is_initialized(void); -bool rng_has_arch_random(void); int wait_for_random_bytes(void); /* Calls wait_for_random_bytes() and then calls get_random_bytes(buf, nbytes). -- cgit From cfab87c2c2715763dc7e43d9968bdaa01cde4bc3 Mon Sep 17 00:00:00 2001 From: Vijaya Krishna Nivarthi Date: Wed, 8 Jun 2022 00:22:44 +0530 Subject: serial: core: Introduce callback for start_rx and do stop_rx in suspend only if this callback implementation is present. In suspend sequence there is a need to perform stop_rx during suspend sequence to prevent any asynchronous data over rx line. However this can cause problem to drivers which dont do re-start_rx during set_termios. Add new callback start_rx and perform stop_rx only when implementation of start_rx is present. Also add call to start_rx in resume sequence so that drivers who come across this problem can make use of this framework. Fixes: c9d2325cdb92 ("serial: core: Do stop_rx in suspend path for console if console_suspend is disabled") Reviewed-by: Douglas Anderson Signed-off-by: Vijaya Krishna Nivarthi Link: https://lore.kernel.org/r/1654627965-1461-2-git-send-email-quic_vnivarth@quicinc.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index cbd5070bc87f..657a0fc68a3f 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -45,6 +45,7 @@ struct uart_ops { void (*unthrottle)(struct uart_port *); void (*send_xchar)(struct uart_port *, char ch); void (*stop_rx)(struct uart_port *); + void (*start_rx)(struct uart_port *); void (*enable_ms)(struct uart_port *); void (*break_ctl)(struct uart_port *, int ctl); int (*startup)(struct uart_port *); -- cgit From 4173f018aae16b6496d292c234b858241f85254f Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Tue, 7 Jun 2022 12:49:12 +0200 Subject: tty/vt: consolemap: rename and document struct uni_pagedir struct uni_pagedir contains 32 unicode page directories, so the name of the structure is a bit misleading. Rename the structure to uni_pagedict, so it looks like this: struct uni_pagedict -> 32 page dirs -> 32 rows -> 64 glyphs Signed-off-by: Jiri Slaby Link: https://lore.kernel.org/r/20220607104946.18710-2-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman --- include/linux/console_struct.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/console_struct.h b/include/linux/console_struct.h index d5b9c8d40c18..f75033f0277f 100644 --- a/include/linux/console_struct.h +++ b/include/linux/console_struct.h @@ -17,7 +17,7 @@ #include #include -struct uni_pagedir; +struct uni_pagedict; struct uni_screen; #define NPAR 16 @@ -157,8 +157,8 @@ struct vc_data { unsigned int vc_bell_duration; /* Console bell duration */ unsigned short vc_cur_blink_ms; /* Cursor blink duration */ struct vc_data **vc_display_fg; /* [!] Ptr to var holding fg console for this display */ - struct uni_pagedir *vc_uni_pagedir; - struct uni_pagedir **vc_uni_pagedir_loc; /* [!] Location of uni_pagedir variable for this console */ + struct uni_pagedict *vc_uni_pagedir; + struct uni_pagedict **vc_uni_pagedir_loc; /* [!] Location of uni_pagedict variable for this console */ struct uni_screen *vc_uni_screen; /* unicode screen content */ /* additional information is in vt_kern.h */ }; -- cgit From 0b75f7968d61566d150747512344cabaa59e54c6 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Tue, 7 Jun 2022 12:49:15 +0200 Subject: tty/vt: consolemap: remove extern from function decls MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The extern keyword is not needed for function declarations. Remove it, so that the consolemap header conforms to other tty headers. Reviewed-by: Ilpo Järvinen Signed-off-by: Jiri Slaby Link: https://lore.kernel.org/r/20220607104946.18710-5-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman --- include/linux/consolemap.h | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h index bcfce748c9d8..abc28e4450bf 100644 --- a/include/linux/consolemap.h +++ b/include/linux/consolemap.h @@ -17,12 +17,11 @@ #ifdef CONFIG_CONSOLE_TRANSLATIONS struct vc_data; -extern u16 inverse_translate(const struct vc_data *conp, int glyph, - int use_unicode); -extern unsigned short *set_translate(int m, struct vc_data *vc); -extern int conv_uni_to_pc(struct vc_data *conp, long ucs); -extern u32 conv_8bit_to_uni(unsigned char c); -extern int conv_uni_to_8bit(u32 uni); +u16 inverse_translate(const struct vc_data *conp, int glyph, int use_unicode); +unsigned short *set_translate(int m, struct vc_data *vc); +int conv_uni_to_pc(struct vc_data *conp, long ucs); +u32 conv_8bit_to_uni(unsigned char c); +int conv_uni_to_8bit(u32 uni); void console_map_init(void); #else #define inverse_translate(conp, glyph, uni) ((uint16_t)glyph) -- cgit From f827c754f9b6247f4d4f9cbb508b22bff2cf8e6d Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Tue, 7 Jun 2022 12:49:16 +0200 Subject: tty/vt: consolemap: convert macros to static inlines MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This commit changes !CONFIG_CONSOLE_TRANSLATIONS definitions to real (inline) functions. So the commit: * makes type checking much stronger, * removes the need of many parentheses and casts, and * makes the code more readable. Reviewed-by: Ilpo Järvinen Signed-off-by: Jiri Slaby Link: https://lore.kernel.org/r/20220607104946.18710-6-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman --- include/linux/consolemap.h | 35 ++++++++++++++++++++++++++++------- 1 file changed, 28 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h index abc28e4450bf..98171dbed51f 100644 --- a/include/linux/consolemap.h +++ b/include/linux/consolemap.h @@ -14,9 +14,9 @@ #include -#ifdef CONFIG_CONSOLE_TRANSLATIONS struct vc_data; +#ifdef CONFIG_CONSOLE_TRANSLATIONS u16 inverse_translate(const struct vc_data *conp, int glyph, int use_unicode); unsigned short *set_translate(int m, struct vc_data *vc); int conv_uni_to_pc(struct vc_data *conp, long ucs); @@ -24,12 +24,33 @@ u32 conv_8bit_to_uni(unsigned char c); int conv_uni_to_8bit(u32 uni); void console_map_init(void); #else -#define inverse_translate(conp, glyph, uni) ((uint16_t)glyph) -#define set_translate(m, vc) ((unsigned short *)NULL) -#define conv_uni_to_pc(conp, ucs) ((int) (ucs > 0xff ? -1: ucs)) -#define conv_8bit_to_uni(c) ((uint32_t)(c)) -#define conv_uni_to_8bit(c) ((int) ((c) & 0xff)) -#define console_map_init(c) do { ; } while (0) +static inline u16 inverse_translate(const struct vc_data *conp, int glyph, + int use_unicode) +{ + return glyph; +} + +static inline unsigned short *set_translate(int m, struct vc_data *vc) +{ + return NULL; +} + +static inline int conv_uni_to_pc(struct vc_data *conp, long ucs) +{ + return ucs > 0xff ? -1 : ucs; +} + +static inline u32 conv_8bit_to_uni(unsigned char c) +{ + return c; +} + +static inline int conv_uni_to_8bit(u32 uni) +{ + return uni & 0xff; +} + +static inline void console_map_init(void) { } #endif /* CONFIG_CONSOLE_TRANSLATIONS */ #endif /* __LINUX_CONSOLEMAP_H__ */ -- cgit From d9ebb906a45adc71a62aceb1e3608c847cafa660 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Tue, 7 Jun 2022 12:49:17 +0200 Subject: tty/vt: consolemap: make parameters of inverse_translate() saner MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - int use_unicode -> bool: it's used as bool at some places already, so make it explicit. - int glyph -> u16: every caller passes a u16 in. So make it explicit too. And remove a negative check from inverse_translate() as it never could be negative. Reviewed-by: Ilpo Järvinen Signed-off-by: Jiri Slaby Link: https://lore.kernel.org/r/20220607104946.18710-7-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman --- include/linux/consolemap.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h index 98171dbed51f..1ff2bf55eb85 100644 --- a/include/linux/consolemap.h +++ b/include/linux/consolemap.h @@ -17,15 +17,15 @@ struct vc_data; #ifdef CONFIG_CONSOLE_TRANSLATIONS -u16 inverse_translate(const struct vc_data *conp, int glyph, int use_unicode); +u16 inverse_translate(const struct vc_data *conp, u16 glyph, bool use_unicode); unsigned short *set_translate(int m, struct vc_data *vc); int conv_uni_to_pc(struct vc_data *conp, long ucs); u32 conv_8bit_to_uni(unsigned char c); int conv_uni_to_8bit(u32 uni); void console_map_init(void); #else -static inline u16 inverse_translate(const struct vc_data *conp, int glyph, - int use_unicode) +static inline u16 inverse_translate(const struct vc_data *conp, u16 glyph, + bool use_unicode) { return glyph; } -- cgit From 5a904a936b407624cd1ff5ee3f1675ca3d2366a5 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Tue, 7 Jun 2022 12:49:27 +0200 Subject: tty/vt: consolemap: introduce enum translation_map and use it Again, instead of magic constants in the code, declare an enum and be a little bit more explicit. Both in the translations definition and in the loops etc. Signed-off-by: Jiri Slaby Link: https://lore.kernel.org/r/20220607104946.18710-17-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman --- include/linux/consolemap.h | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/consolemap.h b/include/linux/consolemap.h index 1ff2bf55eb85..c35db4896c37 100644 --- a/include/linux/consolemap.h +++ b/include/linux/consolemap.h @@ -7,10 +7,15 @@ #ifndef __LINUX_CONSOLEMAP_H__ #define __LINUX_CONSOLEMAP_H__ -#define LAT1_MAP 0 -#define GRAF_MAP 1 -#define IBMPC_MAP 2 -#define USER_MAP 3 +enum translation_map { + LAT1_MAP, + GRAF_MAP, + IBMPC_MAP, + USER_MAP, + + FIRST_MAP = LAT1_MAP, + LAST_MAP = USER_MAP, +}; #include @@ -18,7 +23,7 @@ struct vc_data; #ifdef CONFIG_CONSOLE_TRANSLATIONS u16 inverse_translate(const struct vc_data *conp, u16 glyph, bool use_unicode); -unsigned short *set_translate(int m, struct vc_data *vc); +unsigned short *set_translate(enum translation_map m, struct vc_data *vc); int conv_uni_to_pc(struct vc_data *conp, long ucs); u32 conv_8bit_to_uni(unsigned char c); int conv_uni_to_8bit(u32 uni); @@ -30,7 +35,8 @@ static inline u16 inverse_translate(const struct vc_data *conp, u16 glyph, return glyph; } -static inline unsigned short *set_translate(int m, struct vc_data *vc) +static inline unsigned short *set_translate(enum translation_map m, + struct vc_data *vc) { return NULL; } -- cgit From 8322b1f527159de578aab277629296575a11eb3c Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Mon, 6 Jun 2022 13:03:58 +0300 Subject: serial: Add uart_rs485_config() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A few serial drivers make a call to rs485_config() themselves (all these seem to relate to init). Convert them all to use a common helper which makes it easy to make adjustments on tasks related to it as serial_rs485 struct sanitization is going to be added. In pci_fintek_setup() (in 8250_pci.c), the rs485_config() call was made with NULL, however, it can be changed to pass uart_port's rs485 struct. No other callers should pass NULL into rs485_config() so the NULL check can now be eliminated. Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220606100433.13793-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index cbd5070bc87f..d3ebb4db2d80 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -592,4 +592,5 @@ static inline int uart_handle_break(struct uart_port *port) !((cflag) & CLOCAL)) int uart_get_rs485_mode(struct uart_port *port); +int uart_rs485_config(struct uart_port *port); #endif /* LINUX_SERIAL_CORE_H */ -- cgit From 8925c31c1ac2f1e05da988581f2a70a2a8c4d638 Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Mon, 6 Jun 2022 13:04:00 +0300 Subject: serial: Add rs485_supported to uart_port MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Preparing to move serial_rs485 struct sanitization into serial core, each driver has to provide what fields/flags it supports. This information is pointed into by rs485_supported. Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220606100433.13793-4-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index d3ebb4db2d80..5518b70177b3 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -254,6 +254,7 @@ struct uart_port { struct attribute_group *attr_group; /* port specific attributes */ const struct attribute_group **tty_groups; /* all attributes (serial core use only) */ struct serial_rs485 rs485; + const struct serial_rs485 *rs485_supported; /* Supported mask for serial_rs485 */ struct gpio_desc *rs485_term_gpio; /* enable RS485 bus termination */ struct serial_iso7816 iso7816; void *private_data; /* generic platform data pointer */ -- cgit From 6bb6fa6908ebd3cb4e14cd4f0ce272ec885d2eb0 Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Mon, 6 Jun 2022 18:36:51 +0300 Subject: tty: Implement lookahead to process XON/XOFF timely MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When tty is not read from, XON/XOFF may get stuck into an intermediate buffer. As those characters are there to do software flow-control, it is not very useful. In the case where neither end reads from ttys, the receiving ends might not be able receive the XOFF characters and just keep sending more data to the opposite direction. This problem is almost guaranteed to occur with DMA which sends data in large chunks. If TTY is slow to process characters, that is, eats less than given amount in receive_buf, invoke lookahead for the rest of the chars to process potential XON/XOFF characters. We need to keep track of how many characters have been processed by the lookahead to avoid processing the flow control char again on the normal path. Bookkeeping occurs parallel on two layers (tty_buffer and n_tty) to avoid passing the lookahead_count through the whole call chain. When a flow-control char is processed, two things must occur: a) it must not be treated as normal char b) if not yet processed, flow-control actions need to be taken The return value of n_tty_receive_char_flow_ctrl() tells caller a), and b) is kept internal to n_tty_receive_char_flow_ctrl(). If characters were previous looked ahead, __receive_buf() makes two calls to the appropriate n_tty_receive_buf_* function. First call is made with lookahead_done=true for the characters that were subject to lookahead earlier and then with lookahead=false for the new characters. Either of the calls might be skipped when it has no characters to handle. Reported-by: Gilles Buloz Reviewed-by: Andy Shevchenko Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220606153652.63554-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_buffer.h | 1 + include/linux/tty_ldisc.h | 14 ++++++++++++++ include/linux/tty_port.h | 2 ++ 3 files changed, 17 insertions(+) (limited to 'include/linux') diff --git a/include/linux/tty_buffer.h b/include/linux/tty_buffer.h index 3b9d77604291..1796648c2907 100644 --- a/include/linux/tty_buffer.h +++ b/include/linux/tty_buffer.h @@ -15,6 +15,7 @@ struct tty_buffer { int used; int size; int commit; + int lookahead; /* Lazy update on recv, can become less than "read" */ int read; int flags; /* Data points here */ diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h index e85002b56752..33678e1936f6 100644 --- a/include/linux/tty_ldisc.h +++ b/include/linux/tty_ldisc.h @@ -186,6 +186,18 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * indicate all data received is %TTY_NORMAL. If assigned, prefer this * function for automatic flow control. * + * @lookahead_buf: [DRV] ``void ()(struct tty_struct *tty, + * const unsigned char *cp, const char *fp, int count) + * + * This function is called by the low-level tty driver for characters + * not eaten by ->receive_buf() or ->receive_buf2(). It is useful for + * processing high-priority characters such as software flow-control + * characters that could otherwise get stuck into the intermediate + * buffer until tty has room to receive them. Ldisc must be able to + * handle later a ->receive_buf() or ->receive_buf2() call for the + * same characters (e.g. by skipping the actions for high-priority + * characters already handled by ->lookahead_buf()). + * * @owner: module containting this ldisc (for reference counting) * * This structure defines the interface between the tty line discipline @@ -229,6 +241,8 @@ struct tty_ldisc_ops { void (*dcd_change)(struct tty_struct *tty, unsigned int status); int (*receive_buf2)(struct tty_struct *tty, const unsigned char *cp, const char *fp, int count); + void (*lookahead_buf)(struct tty_struct *tty, const unsigned char *cp, + const unsigned char *fp, unsigned int count); struct module *owner; }; diff --git a/include/linux/tty_port.h b/include/linux/tty_port.h index 58e9619116b7..fa3c3bdaa234 100644 --- a/include/linux/tty_port.h +++ b/include/linux/tty_port.h @@ -40,6 +40,8 @@ struct tty_port_operations { struct tty_port_client_operations { int (*receive_buf)(struct tty_port *port, const unsigned char *, const unsigned char *, size_t); + void (*lookahead_buf)(struct tty_port *port, const unsigned char *cp, + const unsigned char *fp, unsigned int count); void (*write_wakeup)(struct tty_port *port); }; -- cgit From 67b9d64139e13621d3ab8bb0daad7602e5fe0778 Mon Sep 17 00:00:00 2001 From: David Jander Date: Thu, 9 Jun 2022 14:13:34 +0200 Subject: spi: Fix per-cpu stats access on 32 bit systems MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit On 32 bit systems, the following kernel BUG is hit: BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 caller is debug_smp_processor_id+0x18/0x24 CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.19.0-rc1-00001-g6ae0aec8a366 #181 Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) Backtrace: dump_backtrace from show_stack+0x20/0x24 r7:81024ffd r6:00000000 r5:81024ffd r4:60000013 show_stack from dump_stack_lvl+0x60/0x78 dump_stack_lvl from dump_stack+0x14/0x1c r7:81024ffd r6:80f652de r5:80bec180 r4:819a2500 dump_stack from check_preemption_disabled+0xc8/0xf0 check_preemption_disabled from debug_smp_processor_id+0x18/0x24 r8:8119b7e0 r7:81205534 r6:819f5c00 r5:819f4c00 r4:c083d724 debug_smp_processor_id from __spi_sync+0x78/0x220 __spi_sync from spi_sync+0x34/0x4c r10:bb7bf4e0 r9:c083d724 r8:00000007 r7:81a068c0 r6:822a83c0 r5:c083d724 r4:819f4c00 spi_sync from spi_mem_exec_op+0x338/0x370 r5:000000b4 r4:c083d910 spi_mem_exec_op from spi_nor_read_id+0x98/0xdc r10:bb7bf4e0 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:82358040 r4:819f7c40 spi_nor_read_id from spi_nor_detect+0x38/0x114 r7:82358040 r6:00000000 r5:819f7c40 r4:819f7c40 spi_nor_detect from spi_nor_scan+0x11c/0xbec r10:bb7bf4e0 r9:00000000 r8:00000000 r7:c083da4c r6:00000000 r5:00010101 r4:819f7c40 spi_nor_scan from spi_nor_probe+0x10c/0x2d0 r10:bb7bf4e0 r9:bb7bf4d0 r8:00000000 r7:819f4c00 r6:00000000 r5:00000000 r4:819f7c40 per-cpu access needs to be guarded against preemption. Fixes: 6598b91b5ac3 ("spi: spi.c: Convert statistics to per-cpu u64_stats_t") Reported-by: Marc Kleine-Budde Signed-off-by: David Jander Tested-by: Nícolas F. R. A. Prado Link: https://lore.kernel.org/r/20220609121334.2984808-1-david@protonic.nl Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 2e63b4935deb..c96f526d9a20 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -84,18 +84,24 @@ struct spi_statistics { #define SPI_STATISTICS_ADD_TO_FIELD(pcpu_stats, field, count) \ do { \ - struct spi_statistics *__lstats = this_cpu_ptr(pcpu_stats); \ + struct spi_statistics *__lstats; \ + get_cpu(); \ + __lstats = this_cpu_ptr(pcpu_stats); \ u64_stats_update_begin(&__lstats->syncp); \ u64_stats_add(&__lstats->field, count); \ u64_stats_update_end(&__lstats->syncp); \ + put_cpu(); \ } while (0) #define SPI_STATISTICS_INCREMENT_FIELD(pcpu_stats, field) \ do { \ - struct spi_statistics *__lstats = this_cpu_ptr(pcpu_stats); \ + struct spi_statistics *__lstats; \ + get_cpu(); \ + __lstats = this_cpu_ptr(pcpu_stats); \ u64_stats_update_begin(&__lstats->syncp); \ u64_stats_inc(&__lstats->field); \ u64_stats_update_end(&__lstats->syncp); \ + put_cpu(); \ } while (0) /** -- cgit From a6546f89eac9bfad4b50a7f8ccfbefe0ae01e7df Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 7 Jun 2022 16:11:10 +0200 Subject: treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_8.RULE Based on the normalized pattern: this program is free software you can redistribute it and/or modify it under the terms of the gnu general public license version 2 as published by the free software foundation extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference. Reviewed-by: Allison Randal Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- include/linux/input/elan-i2c-ids.h | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/input/elan-i2c-ids.h b/include/linux/input/elan-i2c-ids.h index 520858d12680..51cca17ee94c 100644 --- a/include/linux/input/elan-i2c-ids.h +++ b/include/linux/input/elan-i2c-ids.h @@ -1,3 +1,4 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * Elan I2C/SMBus Touchpad device whitelist * @@ -11,10 +12,6 @@ * copyright (c) 2011-2012 Cypress Semiconductor, Inc. * copyright (c) 2011-2012 Google, Inc. * - * This program is free software; you can redistribute it and/or modify it - * under the terms of the GNU General Public License version 2 as published - * by the Free Software Foundation. - * * Trademarks are the property of their respective owners. */ -- cgit From 2aec85b26f39fa9e036c5872950c0ef9b479a1ec Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 7 Jun 2022 16:11:13 +0200 Subject: treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_30.RULE (part 2) Based on the normalized pattern: this program is free software you can redistribute it and/or modify it under the terms of the gnu general public license as published by the free software foundation version 2 this program is distributed as is without any warranty of any kind whether express or implied without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference. Reviewed-by: Allison Randal Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- include/linux/mfd/lp873x.h | 10 +--------- include/linux/mfd/tps65217.h | 10 +--------- include/linux/platform_data/davinci_asp.h | 10 +--------- include/linux/platform_data/gpio-davinci.h | 10 +--------- include/linux/platform_data/uio_dmem_genirq.h | 10 +--------- include/linux/platform_data/uio_pruss.h | 10 +--------- include/linux/reset/bcm63xx_pmb.h | 10 +--------- include/linux/soc/ti/knav_dma.h | 10 +--------- include/linux/soc/ti/knav_qmss.h | 10 +--------- include/linux/sram.h | 14 ++------------ include/linux/ti-emif-sram.h | 10 +--------- include/linux/wkup_m3_ipc.h | 10 +--------- 12 files changed, 13 insertions(+), 111 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mfd/lp873x.h b/include/linux/mfd/lp873x.h index 5546688c7da7..fe8174cc8637 100644 --- a/include/linux/mfd/lp873x.h +++ b/include/linux/mfd/lp873x.h @@ -1,16 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * Functions to access LP873X power management chip. * * Copyright (C) 2016 Texas Instruments Incorporated - https://www.ti.com/ - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License as - * published by the Free Software Foundation version 2. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. */ #ifndef __LINUX_MFD_LP873X_H diff --git a/include/linux/mfd/tps65217.h b/include/linux/mfd/tps65217.h index db7091824ed0..877d9c41c53d 100644 --- a/include/linux/mfd/tps65217.h +++ b/include/linux/mfd/tps65217.h @@ -1,18 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * linux/mfd/tps65217.h * * Functions to access TPS65217 power management chip. * * Copyright (C) 2011 Texas Instruments Incorporated - https://www.ti.com/ - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License as - * published by the Free Software Foundation version 2. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. */ #ifndef __LINUX_MFD_TPS65217_H diff --git a/include/linux/platform_data/davinci_asp.h b/include/linux/platform_data/davinci_asp.h index 76b13ef67562..c8645b2ed3c0 100644 --- a/include/linux/platform_data/davinci_asp.h +++ b/include/linux/platform_data/davinci_asp.h @@ -1,16 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * TI DaVinci Audio Serial Port support * * Copyright (C) 2012 Texas Instruments Incorporated - https://www.ti.com/ - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License as - * published by the Free Software Foundation version 2. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. */ #ifndef __DAVINCI_ASP_H diff --git a/include/linux/platform_data/gpio-davinci.h b/include/linux/platform_data/gpio-davinci.h index e182a46e609f..b82e44662efe 100644 --- a/include/linux/platform_data/gpio-davinci.h +++ b/include/linux/platform_data/gpio-davinci.h @@ -1,16 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * DaVinci GPIO Platform Related Defines * * Copyright (C) 2013 Texas Instruments Incorporated - https://www.ti.com/ - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License as - * published by the Free Software Foundation version 2. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. */ #ifndef __DAVINCI_GPIO_PLATFORM_H diff --git a/include/linux/platform_data/uio_dmem_genirq.h b/include/linux/platform_data/uio_dmem_genirq.h index 973c1bb32168..c8f6de685306 100644 --- a/include/linux/platform_data/uio_dmem_genirq.h +++ b/include/linux/platform_data/uio_dmem_genirq.h @@ -1,16 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * include/linux/platform_data/uio_dmem_genirq.h * * Copyright (C) 2012 Damian Hobson-Garcia - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License as - * published by the Free Software Foundation version 2. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. */ #ifndef _UIO_DMEM_GENIRQ_H diff --git a/include/linux/platform_data/uio_pruss.h b/include/linux/platform_data/uio_pruss.h index 31f2e22661bc..f76fa393b802 100644 --- a/include/linux/platform_data/uio_pruss.h +++ b/include/linux/platform_data/uio_pruss.h @@ -1,18 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * include/linux/platform_data/uio_pruss.h * * Platform data for uio_pruss driver * * Copyright (C) 2010-11 Texas Instruments Incorporated - https://www.ti.com/ - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License as - * published by the Free Software Foundation version 2. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. */ #ifndef _UIO_PRUSS_H_ diff --git a/include/linux/reset/bcm63xx_pmb.h b/include/linux/reset/bcm63xx_pmb.h index bb4af7b5eb36..c77b6999518a 100644 --- a/include/linux/reset/bcm63xx_pmb.h +++ b/include/linux/reset/bcm63xx_pmb.h @@ -1,17 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * Broadcom BCM63xx Processor Monitor Bus shared routines (SMP and reset) * * Copyright (C) 2015, Broadcom Corporation * Author: Florian Fainelli - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License as - * published by the Free Software Foundation version 2. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. */ #ifndef __BCM63XX_PMB_H #define __BCM63XX_PMB_H diff --git a/include/linux/soc/ti/knav_dma.h b/include/linux/soc/ti/knav_dma.h index 7127ec301537..18d806a8e52c 100644 --- a/include/linux/soc/ti/knav_dma.h +++ b/include/linux/soc/ti/knav_dma.h @@ -1,17 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * Copyright (C) 2014 Texas Instruments Incorporated * Authors: Sandeep Nair - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License as - * published by the Free Software Foundation version 2. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. */ #ifndef __SOC_TI_KEYSTONE_NAVIGATOR_DMA_H__ diff --git a/include/linux/soc/ti/knav_qmss.h b/include/linux/soc/ti/knav_qmss.h index c75ef99c99ca..175f466ebcc3 100644 --- a/include/linux/soc/ti/knav_qmss.h +++ b/include/linux/soc/ti/knav_qmss.h @@ -1,3 +1,4 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * Keystone Navigator Queue Management Sub-System header * @@ -5,15 +6,6 @@ * Author: Sandeep Nair * Cyril Chemparathy * Santosh Shilimkar - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License as - * published by the Free Software Foundation version 2. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. */ #ifndef __SOC_TI_KNAV_QMSS_H__ diff --git a/include/linux/sram.h b/include/linux/sram.h index 4fb405fb0480..d7dee19505c6 100644 --- a/include/linux/sram.h +++ b/include/linux/sram.h @@ -1,15 +1,5 @@ -/* - * Generic SRAM Driver Interface - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License as - * published by the Free Software Foundation version 2. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - */ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Generic SRAM Driver Interface */ #ifndef __LINUX_SRAM_H__ #define __LINUX_SRAM_H__ diff --git a/include/linux/ti-emif-sram.h b/include/linux/ti-emif-sram.h index 2fc854155c27..441b2988e66a 100644 --- a/include/linux/ti-emif-sram.h +++ b/include/linux/ti-emif-sram.h @@ -1,17 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * TI AM33XX EMIF Routines * * Copyright (C) 2016-2017 Texas Instruments Inc. * Dave Gerlach - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License as - * published by the Free Software Foundation version 2. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. */ #ifndef __LINUX_TI_EMIF_H #define __LINUX_TI_EMIF_H diff --git a/include/linux/wkup_m3_ipc.h b/include/linux/wkup_m3_ipc.h index 26d1eb058fa3..5e1b26f988e2 100644 --- a/include/linux/wkup_m3_ipc.h +++ b/include/linux/wkup_m3_ipc.h @@ -1,17 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * TI Wakeup M3 for AMx3 SoCs Power Management Routines * * Copyright (C) 2015 Texas Instruments Incorporated - https://www.ti.com/ * Dave Gerlach - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License as - * published by the Free Software Foundation version 2. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. */ #ifndef _LINUX_WKUP_M3_IPC_H -- cgit From 1accad5e74635d7f9d994e692649fbe736afe150 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 7 Jun 2022 16:11:20 +0200 Subject: treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_149.RULE Based on the normalized pattern: netapp provides this source code under the gpl v2 license the gpl v2 license is available at https://opensource org/licenses/gpl-license php this software is provided by the copyright holders and contributors as is and any express or implied warranties including but not limited to the implied warranties of merchantability and fitness for a particular purpose are disclaimed in no event shall the copyright owner or contributors be liable for any direct indirect incidental special exemplary or consequential damages (including but not limited to procurement of substitute goods or services loss of use data or profits or business interruption) however caused and on any theory of liability whether in contract strict liability or tort (including negligence or otherwise) arising in any way out of the use of this software even if advised of the possibility of such damage extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference. Reviewed-by: Allison Randal Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- include/linux/sunrpc/bc_xprt.h | 17 +---------------- 1 file changed, 1 insertion(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/bc_xprt.h b/include/linux/sunrpc/bc_xprt.h index f07c334c599f..db30a159f9d5 100644 --- a/include/linux/sunrpc/bc_xprt.h +++ b/include/linux/sunrpc/bc_xprt.h @@ -1,22 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /****************************************************************************** (c) 2008 NetApp. All Rights Reserved. -NetApp provides this source code under the GPL v2 License. -The GPL v2 license is available at -https://opensource.org/licenses/gpl-license.php. - -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR -CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, -EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, -PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR -PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF -LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING -NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS -SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. ******************************************************************************/ @@ -83,4 +69,3 @@ static inline void xprt_free_bc_request(struct rpc_rqst *req) } #endif /* CONFIG_SUNRPC_BACKCHANNEL */ #endif /* _LINUX_SUNRPC_BC_XPRT_H */ - -- cgit From 298b95f111be85f2d5a18dc0177eb9a64130f9b4 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 7 Jun 2022 16:11:21 +0200 Subject: treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_152.RULE Based on the normalized pattern: this software is distributed under the terms of the gnu general public license ( gpl ) version 2 as published by the free software foundation this software is provided by the copyright holders and contributors as is and any express or implied warranties including but not limited to the implied warranties of merchantability and fitness for a particular purpose are disclaimed in no event shall the copyright owner or contributors be liable for any direct indirect incidental special exemplary or consequential damages (including but not limited to procurement of substitute goods or services loss of use data or profits or business interruption) however caused and on any theory of liability whether in contract strict liability or tort (including negligence or otherwise) arising in any way out of the use of this software even if advised of the possibility of such damage extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference. Reviewed-by: Allison Randal Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- include/linux/platform_data/usb-omap.h | 16 +--------------- 1 file changed, 1 insertion(+), 15 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/usb-omap.h b/include/linux/platform_data/usb-omap.h index 5e70d667031c..580978e468f8 100644 --- a/include/linux/platform_data/usb-omap.h +++ b/include/linux/platform_data/usb-omap.h @@ -1,22 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * usb-omap.h - Platform data for the various OMAP USB IPs * * Copyright (C) 2012 Texas Instruments Incorporated - https://www.ti.com - * - * This software is distributed under the terms of the GNU General Public - * License ("GPL") version 2, as published by the Free Software Foundation. - * - * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" - * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE - * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE - * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE - * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR - * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF - * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS - * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN - * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) - * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE - * POSSIBILITY OF SUCH DAMAGE. */ #define OMAP3_HS_USB_PORTS 3 -- cgit From abd462747539bbc630e50e8be59e3aebca63441d Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 7 Jun 2022 16:11:31 +0200 Subject: treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_319.RULE Based on the normalized pattern: this program is free software you can redistribute it and/or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed as is without any warranty of any kind whether expressed or implied without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license version 2 for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference. Reviewed-by: Allison Randal Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- include/linux/mfd/tps65086.h | 10 +--------- include/linux/mfd/tps65218.h | 10 +--------- include/linux/mfd/tps65912.h | 10 +--------- 3 files changed, 3 insertions(+), 27 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mfd/tps65086.h b/include/linux/mfd/tps65086.h index e0a417e53766..16f87cccc003 100644 --- a/include/linux/mfd/tps65086.h +++ b/include/linux/mfd/tps65086.h @@ -1,16 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * Copyright (C) 2015 Texas Instruments Incorporated - https://www.ti.com/ * Andrew F. Davis * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether expressed or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License version 2 for more details. - * * Based on the TPS65912 driver */ diff --git a/include/linux/mfd/tps65218.h b/include/linux/mfd/tps65218.h index 122e24ddbd4b..2946be2f15f3 100644 --- a/include/linux/mfd/tps65218.h +++ b/include/linux/mfd/tps65218.h @@ -1,18 +1,10 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * linux/mfd/tps65218.h * * Functions to access TPS65218 power management chip. * * Copyright (C) 2014 Texas Instruments Incorporated - https://www.ti.com/ - * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether expressed or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License version 2 for more details. */ #ifndef __LINUX_MFD_TPS65218_H diff --git a/include/linux/mfd/tps65912.h b/include/linux/mfd/tps65912.h index 8a61386cb8c1..860ec0a16c96 100644 --- a/include/linux/mfd/tps65912.h +++ b/include/linux/mfd/tps65912.h @@ -1,16 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * Copyright (C) 2015 Texas Instruments Incorporated - https://www.ti.com/ * Andrew F. Davis * - * This program is free software; you can redistribute it and/or - * modify it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether expressed or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License version 2 for more details. - * * Based on the TPS65218 driver and the previous TPS65912 driver by * Margarita Olaya Cabrera */ -- cgit From 5a729246e57eac410e4a13f5aba66ae2dc552632 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Tue, 7 Jun 2022 16:11:32 +0200 Subject: treewide: Replace GPLv2 boilerplate/reference with SPDX - gpl-2.0_320.RULE Based on the normalized pattern: this program is free software you can redistribute it and/or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed as is without any warranty of any kind whether express or implied without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference. Reviewed-by: Allison Randal Signed-off-by: Thomas Gleixner Signed-off-by: Greg Kroah-Hartman --- include/linux/clk/ti.h | 10 +--------- include/linux/pm_wakeirq.h | 14 ++------------ include/linux/soc/ti/ti-msgmgr.h | 10 +--------- 3 files changed, 4 insertions(+), 30 deletions(-) (limited to 'include/linux') diff --git a/include/linux/clk/ti.h b/include/linux/clk/ti.h index 3486f20a3753..cbfcbf186ce3 100644 --- a/include/linux/clk/ti.h +++ b/include/linux/clk/ti.h @@ -1,16 +1,8 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * TI clock drivers support * * Copyright (C) 2013 Texas Instruments, Inc. - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. */ #ifndef __LINUX_CLK_TI_H__ #define __LINUX_CLK_TI_H__ diff --git a/include/linux/pm_wakeirq.h b/include/linux/pm_wakeirq.h index e63a63aa47a3..dd42d16945d0 100644 --- a/include/linux/pm_wakeirq.h +++ b/include/linux/pm_wakeirq.h @@ -1,15 +1,5 @@ -/* - * pm_wakeirq.h - Device wakeirq helper functions - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. - */ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* pm_wakeirq.h - Device wakeirq helper functions */ #ifndef _LINUX_PM_WAKEIRQ_H #define _LINUX_PM_WAKEIRQ_H diff --git a/include/linux/soc/ti/ti-msgmgr.h b/include/linux/soc/ti/ti-msgmgr.h index 69a8d7682c4b..543da257a5f2 100644 --- a/include/linux/soc/ti/ti-msgmgr.h +++ b/include/linux/soc/ti/ti-msgmgr.h @@ -1,17 +1,9 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ /* * Texas Instruments' Message Manager * * Copyright (C) 2015-2022 Texas Instruments Incorporated - https://www.ti.com/ * Nishanth Menon - * - * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License version 2 as - * published by the Free Software Foundation. - * - * This program is distributed "as is" WITHOUT ANY WARRANTY of any - * kind, whether express or implied; without even the implied warranty - * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the - * GNU General Public License for more details. */ #ifndef TI_MSGMGR_H -- cgit From cd756dafd86ee3a4969906086f3c2537e0c6d9d0 Mon Sep 17 00:00:00 2001 From: Peter Robinson Date: Mon, 6 Jun 2022 14:22:00 +0100 Subject: staging: Also remove the Unisys visorbus.h The commit that removed the Unisys s-Par and visorbus drivers left around the include/linux/visorbus.h file mentioned in the MAINTAINERS entry, we can also remove that too. Fixes: e5f45b011e4a ("staging: Remove the drivers for the Unisys s-Par") Reviewed-by: Fabio M. De Francesco Signed-off-by: Peter Robinson Link: https://lore.kernel.org/r/20220606132200.2873243-1-pbrobinson@gmail.com Signed-off-by: Greg Kroah-Hartman --- include/linux/visorbus.h | 344 ----------------------------------------------- 1 file changed, 344 deletions(-) delete mode 100644 include/linux/visorbus.h (limited to 'include/linux') diff --git a/include/linux/visorbus.h b/include/linux/visorbus.h deleted file mode 100644 index 0d8bd6769b13..000000000000 --- a/include/linux/visorbus.h +++ /dev/null @@ -1,344 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0+ -/* - * Copyright (C) 2010 - 2013 UNISYS CORPORATION - * All rights reserved. - */ - -/* - * This header file is to be included by other kernel mode components that - * implement a particular kind of visor_device. Each of these other kernel - * mode components is called a visor device driver. Refer to visortemplate - * for a minimal sample visor device driver. - * - * There should be nothing in this file that is private to the visorbus - * bus implementation itself. - */ - -#ifndef __VISORBUS_H__ -#define __VISORBUS_H__ - -#include - -#define VISOR_CHANNEL_SIGNATURE ('L' << 24 | 'N' << 16 | 'C' << 8 | 'E') - -/* - * enum channel_serverstate - * @CHANNELSRV_UNINITIALIZED: Channel is in an undefined state. - * @CHANNELSRV_READY: Channel has been initialized by server. - */ -enum channel_serverstate { - CHANNELSRV_UNINITIALIZED = 0, - CHANNELSRV_READY = 1 -}; - -/* - * enum channel_clientstate - * @CHANNELCLI_DETACHED: - * @CHANNELCLI_DISABLED: Client can see channel but is NOT allowed to use it - * unless given TBD* explicit request - * (should actually be < DETACHED). - * @CHANNELCLI_ATTACHING: Legacy EFI client request for EFI server to attach. - * @CHANNELCLI_ATTACHED: Idle, but client may want to use channel any time. - * @CHANNELCLI_BUSY: Client either wants to use or is using channel. - * @CHANNELCLI_OWNED: "No worries" state - client can access channel - * anytime. - */ -enum channel_clientstate { - CHANNELCLI_DETACHED = 0, - CHANNELCLI_DISABLED = 1, - CHANNELCLI_ATTACHING = 2, - CHANNELCLI_ATTACHED = 3, - CHANNELCLI_BUSY = 4, - CHANNELCLI_OWNED = 5 -}; - -/* - * Values for VISOR_CHANNEL_PROTOCOL.Features: This define exists so that - * a guest can look at the FeatureFlags in the io channel, and configure the - * driver to use interrupts or not based on this setting. All feature bits for - * all channels should be defined here. The io channel feature bits are defined - * below. - */ -#define VISOR_DRIVER_ENABLES_INTS (0x1ULL << 1) -#define VISOR_CHANNEL_IS_POLLING (0x1ULL << 3) -#define VISOR_IOVM_OK_DRIVER_DISABLING_INTS (0x1ULL << 4) -#define VISOR_DRIVER_DISABLES_INTS (0x1ULL << 5) -#define VISOR_DRIVER_ENHANCED_RCVBUF_CHECKING (0x1ULL << 6) - -/* - * struct channel_header - Common Channel Header - * @signature: Signature. - * @legacy_state: DEPRECATED - being replaced by. - * @header_size: sizeof(struct channel_header). - * @size: Total size of this channel in bytes. - * @features: Flags to modify behavior. - * @chtype: Channel type: data, bus, control, etc.. - * @partition_handle: ID of guest partition. - * @handle: Device number of this channel in client. - * @ch_space_offset: Offset in bytes to channel specific area. - * @version_id: Struct channel_header Version ID. - * @partition_index: Index of guest partition. - * @zone_uuid: Guid of Channel's zone. - * @cli_str_offset: Offset from channel header to null-terminated - * ClientString (0 if ClientString not present). - * @cli_state_boot: CHANNEL_CLIENTSTATE of pre-boot EFI client of this - * channel. - * @cmd_state_cli: CHANNEL_COMMANDSTATE (overloaded in Windows drivers, see - * ServerStateUp, ServerStateDown, etc). - * @cli_state_os: CHANNEL_CLIENTSTATE of Guest OS client of this channel. - * @ch_characteristic: CHANNEL_CHARACTERISTIC_. - * @cmd_state_srv: CHANNEL_COMMANDSTATE (overloaded in Windows drivers, see - * ServerStateUp, ServerStateDown, etc). - * @srv_state: CHANNEL_SERVERSTATE. - * @cli_error_boot: Bits to indicate err states for boot clients, so err - * messages can be throttled. - * @cli_error_os: Bits to indicate err states for OS clients, so err - * messages can be throttled. - * @filler: Pad out to 128 byte cacheline. - * @recover_channel: Please add all new single-byte values below here. - */ -struct channel_header { - u64 signature; - u32 legacy_state; - /* SrvState, CliStateBoot, and CliStateOS below */ - u32 header_size; - u64 size; - u64 features; - guid_t chtype; - u64 partition_handle; - u64 handle; - u64 ch_space_offset; - u32 version_id; - u32 partition_index; - guid_t zone_guid; - u32 cli_str_offset; - u32 cli_state_boot; - u32 cmd_state_cli; - u32 cli_state_os; - u32 ch_characteristic; - u32 cmd_state_srv; - u32 srv_state; - u8 cli_error_boot; - u8 cli_error_os; - u8 filler[1]; - u8 recover_channel; -} __packed; - -#define VISOR_CHANNEL_ENABLE_INTS (0x1ULL << 0) - -/* - * struct signal_queue_header - Subheader for the Signal Type variation of the - * Common Channel. - * @version: SIGNAL_QUEUE_HEADER Version ID. - * @chtype: Queue type: storage, network. - * @size: Total size of this queue in bytes. - * @sig_base_offset: Offset to signal queue area. - * @features: Flags to modify behavior. - * @num_sent: Total # of signals placed in this queue. - * @num_overflows: Total # of inserts failed due to full queue. - * @signal_size: Total size of a signal for this queue. - * @max_slots: Max # of slots in queue, 1 slot is always empty. - * @max_signals: Max # of signals in queue (MaxSignalSlots-1). - * @head: Queue head signal #. - * @num_received: Total # of signals removed from this queue. - * @tail: Queue tail signal. - * @reserved1: Reserved field. - * @reserved2: Reserved field. - * @client_queue: - * @num_irq_received: Total # of Interrupts received. This is incremented by the - * ISR in the guest windows driver. - * @num_empty: Number of times that visor_signal_remove is called and - * returned Empty Status. - * @errorflags: Error bits set during SignalReinit to denote trouble with - * client's fields. - * @filler: Pad out to 64 byte cacheline. - */ -struct signal_queue_header { - /* 1st cache line */ - u32 version; - u32 chtype; - u64 size; - u64 sig_base_offset; - u64 features; - u64 num_sent; - u64 num_overflows; - u32 signal_size; - u32 max_slots; - u32 max_signals; - u32 head; - /* 2nd cache line */ - u64 num_received; - u32 tail; - u32 reserved1; - u64 reserved2; - u64 client_queue; - u64 num_irq_received; - u64 num_empty; - u32 errorflags; - u8 filler[12]; -} __packed; - -/* VISORCHANNEL Guids */ -/* {414815ed-c58c-11da-95a9-00e08161165f} */ -#define VISOR_VHBA_CHANNEL_GUID \ - GUID_INIT(0x414815ed, 0xc58c, 0x11da, \ - 0x95, 0xa9, 0x0, 0xe0, 0x81, 0x61, 0x16, 0x5f) -#define VISOR_VHBA_CHANNEL_GUID_STR \ - "414815ed-c58c-11da-95a9-00e08161165f" -struct visorchipset_state { - u32 created:1; - u32 attached:1; - u32 configured:1; - u32 running:1; - /* Remaining bits in this 32-bit word are reserved. */ -}; - -/** - * struct visor_device - A device type for things "plugged" into the visorbus - * bus - * @visorchannel: Points to the channel that the device is - * associated with. - * @channel_type_guid: Identifies the channel type to the bus driver. - * @device: Device struct meant for use by the bus driver - * only. - * @list_all: Used by the bus driver to enumerate devices. - * @timer: Timer fired periodically to do interrupt-type - * activity. - * @being_removed: Indicates that the device is being removed from - * the bus. Private bus driver use only. - * @visordriver_callback_lock: Used by the bus driver to lock when adding and - * removing devices. - * @pausing: Indicates that a change towards a paused state. - * is in progress. Only modified by the bus driver. - * @resuming: Indicates that a change towards a running state - * is in progress. Only modified by the bus driver. - * @chipset_bus_no: Private field used by the bus driver. - * @chipset_dev_no: Private field used the bus driver. - * @state: Used to indicate the current state of the - * device. - * @inst: Unique GUID for this instance of the device. - * @name: Name of the device. - * @pending_msg_hdr: For private use by bus driver to respond to - * hypervisor requests. - * @vbus_hdr_info: A pointer to header info. Private use by bus - * driver. - * @partition_guid: Indicates client partion id. This should be the - * same across all visor_devices in the current - * guest. Private use by bus driver only. - */ -struct visor_device { - struct visorchannel *visorchannel; - guid_t channel_type_guid; - /* These fields are for private use by the bus driver only. */ - struct device device; - struct list_head list_all; - struct timer_list timer; - bool timer_active; - bool being_removed; - struct mutex visordriver_callback_lock; /* synchronize probe/remove */ - bool pausing; - bool resuming; - u32 chipset_bus_no; - u32 chipset_dev_no; - struct visorchipset_state state; - guid_t inst; - u8 *name; - struct controlvm_message_header *pending_msg_hdr; - void *vbus_hdr_info; - guid_t partition_guid; - struct dentry *debugfs_dir; - struct dentry *debugfs_bus_info; -}; - -#define to_visor_device(x) container_of(x, struct visor_device, device) - -typedef void (*visorbus_state_complete_func) (struct visor_device *dev, - int status); - -/* - * This struct describes a specific visor channel, by providing its GUID, name, - * and sizes. - */ -struct visor_channeltype_descriptor { - const guid_t guid; - const char *name; - u64 min_bytes; - u32 version; -}; - -/** - * struct visor_driver - Information provided by each visor driver when it - * registers with the visorbus driver - * @name: Name of the visor driver. - * @owner: The module owner. - * @channel_types: Types of channels handled by this driver, ending with - * a zero GUID. Our specialized BUS.match() method knows - * about this list, and uses it to determine whether this - * driver will in fact handle a new device that it has - * detected. - * @probe: Called when a new device comes online, by our probe() - * function specified by driver.probe() (triggered - * ultimately by some call to driver_register(), - * bus_add_driver(), or driver_attach()). - * @remove: Called when a new device is removed, by our remove() - * function specified by driver.remove() (triggered - * ultimately by some call to device_release_driver()). - * @channel_interrupt: Called periodically, whenever there is a possiblity - * that "something interesting" may have happened to the - * channel. - * @pause: Called to initiate a change of the device's state. If - * the return valu`e is < 0, there was an error and the - * state transition will NOT occur. If the return value - * is >= 0, then the state transition was INITIATED - * successfully, and complete_func() will be called (or - * was just called) with the final status when either the - * state transition fails or completes successfully. - * @resume: Behaves similar to pause. - * @driver: Private reference to the device driver. For use by bus - * driver only. - */ -struct visor_driver { - const char *name; - struct module *owner; - struct visor_channeltype_descriptor *channel_types; - int (*probe)(struct visor_device *dev); - void (*remove)(struct visor_device *dev); - void (*channel_interrupt)(struct visor_device *dev); - int (*pause)(struct visor_device *dev, - visorbus_state_complete_func complete_func); - int (*resume)(struct visor_device *dev, - visorbus_state_complete_func complete_func); - - /* These fields are for private use by the bus driver only. */ - struct device_driver driver; -}; - -#define to_visor_driver(x) (container_of(x, struct visor_driver, driver)) - -int visor_check_channel(struct channel_header *ch, struct device *dev, - const guid_t *expected_uuid, char *chname, - u64 expected_min_bytes, u32 expected_version, - u64 expected_signature); - -int visorbus_register_visor_driver(struct visor_driver *drv); -void visorbus_unregister_visor_driver(struct visor_driver *drv); -int visorbus_read_channel(struct visor_device *dev, - unsigned long offset, void *dest, - unsigned long nbytes); -int visorbus_write_channel(struct visor_device *dev, - unsigned long offset, void *src, - unsigned long nbytes); -int visorbus_enable_channel_interrupts(struct visor_device *dev); -void visorbus_disable_channel_interrupts(struct visor_device *dev); - -int visorchannel_signalremove(struct visorchannel *channel, u32 queue, - void *msg); -int visorchannel_signalinsert(struct visorchannel *channel, u32 queue, - void *msg); -bool visorchannel_signalempty(struct visorchannel *channel, u32 queue); -const guid_t *visorchannel_get_guid(struct visorchannel *channel); - -#define BUS_ROOT_DEVICE UINT_MAX -struct visor_device *visorbus_get_device_by_id(u32 bus_no, u32 dev_no, - struct visor_device *from); -#endif -- cgit From 35ba63b8f6d07d353159505423cfca5a4378a11c Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Mon, 6 Jun 2022 10:41:05 +0200 Subject: vme: move back to staging The VME subsystem graduated from staging into a top-level subsystem in 2012, with commit db3b9e990e75 ("Staging: VME: move VME drivers out of staging") stating: The VME device drivers have not moved out yet due to some API questions they are still working through, that should happen soon, hopefully. However, this never happened: maintenance of drivers/vme effectively stopped in 2017, with all subsequent changes being treewide cleanups. No hardware driver remains in staging, only the limited user-level access, and I just removed one of the two bridge drivers and the only remaining board. drivers/staging/vme/devices/ was recently moved to drivers/staging/vme_user/, but as the vme_user driver is the only one remaining for this subsystem, it is easier to just move the remaining three source files into this directory rather than keeping the original hierarchy. Signed-off-by: Arnd Bergmann Link: https://lore.kernel.org/r/20220606084109.4108188-3-arnd@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/vme.h | 190 ---------------------------------------------------- 1 file changed, 190 deletions(-) delete mode 100644 include/linux/vme.h (limited to 'include/linux') diff --git a/include/linux/vme.h b/include/linux/vme.h deleted file mode 100644 index b204a9b4be1b..000000000000 --- a/include/linux/vme.h +++ /dev/null @@ -1,190 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _VME_H_ -#define _VME_H_ - -/* Resource Type */ -enum vme_resource_type { - VME_MASTER, - VME_SLAVE, - VME_DMA, - VME_LM -}; - -/* VME Address Spaces */ -#define VME_A16 0x1 -#define VME_A24 0x2 -#define VME_A32 0x4 -#define VME_A64 0x8 -#define VME_CRCSR 0x10 -#define VME_USER1 0x20 -#define VME_USER2 0x40 -#define VME_USER3 0x80 -#define VME_USER4 0x100 - -#define VME_A16_MAX 0x10000ULL -#define VME_A24_MAX 0x1000000ULL -#define VME_A32_MAX 0x100000000ULL -#define VME_A64_MAX 0x10000000000000000ULL -#define VME_CRCSR_MAX 0x1000000ULL - - -/* VME Cycle Types */ -#define VME_SCT 0x1 -#define VME_BLT 0x2 -#define VME_MBLT 0x4 -#define VME_2eVME 0x8 -#define VME_2eSST 0x10 -#define VME_2eSSTB 0x20 - -#define VME_2eSST160 0x100 -#define VME_2eSST267 0x200 -#define VME_2eSST320 0x400 - -#define VME_SUPER 0x1000 -#define VME_USER 0x2000 -#define VME_PROG 0x4000 -#define VME_DATA 0x8000 - -/* VME Data Widths */ -#define VME_D8 0x1 -#define VME_D16 0x2 -#define VME_D32 0x4 -#define VME_D64 0x8 - -/* Arbitration Scheduling Modes */ -#define VME_R_ROBIN_MODE 0x1 -#define VME_PRIORITY_MODE 0x2 - -#define VME_DMA_PATTERN (1<<0) -#define VME_DMA_PCI (1<<1) -#define VME_DMA_VME (1<<2) - -#define VME_DMA_PATTERN_BYTE (1<<0) -#define VME_DMA_PATTERN_WORD (1<<1) -#define VME_DMA_PATTERN_INCREMENT (1<<2) - -#define VME_DMA_VME_TO_MEM (1<<0) -#define VME_DMA_MEM_TO_VME (1<<1) -#define VME_DMA_VME_TO_VME (1<<2) -#define VME_DMA_MEM_TO_MEM (1<<3) -#define VME_DMA_PATTERN_TO_VME (1<<4) -#define VME_DMA_PATTERN_TO_MEM (1<<5) - -struct vme_dma_attr { - u32 type; - void *private; -}; - -struct vme_resource { - enum vme_resource_type type; - struct list_head *entry; -}; - -extern struct bus_type vme_bus_type; - -/* Number of VME interrupt vectors */ -#define VME_NUM_STATUSID 256 - -/* VME_MAX_BRIDGES comes from the type of vme_bus_numbers */ -#define VME_MAX_BRIDGES (sizeof(unsigned int)*8) -#define VME_MAX_SLOTS 32 - -#define VME_SLOT_CURRENT -1 -#define VME_SLOT_ALL -2 - -/** - * struct vme_dev - Structure representing a VME device - * @num: The device number - * @bridge: Pointer to the bridge device this device is on - * @dev: Internal device structure - * @drv_list: List of devices (per driver) - * @bridge_list: List of devices (per bridge) - */ -struct vme_dev { - int num; - struct vme_bridge *bridge; - struct device dev; - struct list_head drv_list; - struct list_head bridge_list; -}; - -/** - * struct vme_driver - Structure representing a VME driver - * @name: Driver name, should be unique among VME drivers and usually the same - * as the module name. - * @match: Callback used to determine whether probe should be run. - * @probe: Callback for device binding, called when new device is detected. - * @remove: Callback, called on device removal. - * @driver: Underlying generic device driver structure. - * @devices: List of VME devices (struct vme_dev) associated with this driver. - */ -struct vme_driver { - const char *name; - int (*match)(struct vme_dev *); - int (*probe)(struct vme_dev *); - void (*remove)(struct vme_dev *); - struct device_driver driver; - struct list_head devices; -}; - -void *vme_alloc_consistent(struct vme_resource *, size_t, dma_addr_t *); -void vme_free_consistent(struct vme_resource *, size_t, void *, - dma_addr_t); - -size_t vme_get_size(struct vme_resource *); -int vme_check_window(u32 aspace, unsigned long long vme_base, - unsigned long long size); - -struct vme_resource *vme_slave_request(struct vme_dev *, u32, u32); -int vme_slave_set(struct vme_resource *, int, unsigned long long, - unsigned long long, dma_addr_t, u32, u32); -int vme_slave_get(struct vme_resource *, int *, unsigned long long *, - unsigned long long *, dma_addr_t *, u32 *, u32 *); -void vme_slave_free(struct vme_resource *); - -struct vme_resource *vme_master_request(struct vme_dev *, u32, u32, u32); -int vme_master_set(struct vme_resource *, int, unsigned long long, - unsigned long long, u32, u32, u32); -int vme_master_get(struct vme_resource *, int *, unsigned long long *, - unsigned long long *, u32 *, u32 *, u32 *); -ssize_t vme_master_read(struct vme_resource *, void *, size_t, loff_t); -ssize_t vme_master_write(struct vme_resource *, void *, size_t, loff_t); -unsigned int vme_master_rmw(struct vme_resource *, unsigned int, unsigned int, - unsigned int, loff_t); -int vme_master_mmap(struct vme_resource *resource, struct vm_area_struct *vma); -void vme_master_free(struct vme_resource *); - -struct vme_resource *vme_dma_request(struct vme_dev *, u32); -struct vme_dma_list *vme_new_dma_list(struct vme_resource *); -struct vme_dma_attr *vme_dma_pattern_attribute(u32, u32); -struct vme_dma_attr *vme_dma_pci_attribute(dma_addr_t); -struct vme_dma_attr *vme_dma_vme_attribute(unsigned long long, u32, u32, u32); -void vme_dma_free_attribute(struct vme_dma_attr *); -int vme_dma_list_add(struct vme_dma_list *, struct vme_dma_attr *, - struct vme_dma_attr *, size_t); -int vme_dma_list_exec(struct vme_dma_list *); -int vme_dma_list_free(struct vme_dma_list *); -int vme_dma_free(struct vme_resource *); - -int vme_irq_request(struct vme_dev *, int, int, - void (*callback)(int, int, void *), void *); -void vme_irq_free(struct vme_dev *, int, int); -int vme_irq_generate(struct vme_dev *, int, int); - -struct vme_resource *vme_lm_request(struct vme_dev *); -int vme_lm_count(struct vme_resource *); -int vme_lm_set(struct vme_resource *, unsigned long long, u32, u32); -int vme_lm_get(struct vme_resource *, unsigned long long *, u32 *, u32 *); -int vme_lm_attach(struct vme_resource *, int, void (*callback)(void *), void *); -int vme_lm_detach(struct vme_resource *, int); -void vme_lm_free(struct vme_resource *); - -int vme_slot_num(struct vme_dev *); -int vme_bus_num(struct vme_dev *); - -int vme_register_driver(struct vme_driver *, unsigned int); -void vme_unregister_driver(struct vme_driver *); - - -#endif /* _VME_H_ */ - -- cgit From 2f8c3ae8288e4a4018330ed5c4e758b878d9c555 Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Wed, 1 Jun 2022 00:07:00 -0700 Subject: driver core: Add wait_for_init_devices_probe helper function Some devices might need to be probed and bound successfully before the kernel boot sequence can finish and move on to init/userspace. For example, a network interface might need to be bound to be able to mount a NFS rootfs. With fw_devlink=on by default, some of these devices might be blocked from probing because they are waiting on a optional supplier that doesn't have a driver. While fw_devlink will eventually identify such devices and unblock the probing automatically, it might be too late by the time it unblocks the probing of devices. For example, the IP4 autoconfig might timeout before fw_devlink unblocks probing of the network interface. This function is available to temporarily try and probe all devices that have a driver even if some of their suppliers haven't been added or don't have drivers. The drivers can then decide which of the suppliers are optional vs mandatory and probe the device if possible. By the time this function returns, all such "best effort" probes are guaranteed to be completed. If a device successfully probes in this mode, we delete all fw_devlink discovered dependencies of that device where the supplier hasn't yet probed successfully because they have to be optional dependencies. This also means that some devices that aren't needed for init and could have waited for their optional supplier to probe (when the supplier's module is loaded later on) would end up probing prematurely with limited functionality. So call this function only when boot would fail without it. Tested-by: Geert Uytterhoeven Signed-off-by: Saravana Kannan Link: https://lore.kernel.org/r/20220601070707.3946847-5-saravanak@google.com Signed-off-by: Greg Kroah-Hartman --- include/linux/device/driver.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h index 700453017e1c..2114d65b862f 100644 --- a/include/linux/device/driver.h +++ b/include/linux/device/driver.h @@ -129,6 +129,7 @@ extern struct device_driver *driver_find(const char *name, struct bus_type *bus); extern int driver_probe_done(void); extern void wait_for_device_probe(void); +void __init wait_for_init_devices_probe(void); /* sysfs interface for exporting driver attributes */ -- cgit From 9cbffc7a59561be950ecc675d19a3d2b45202b2b Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Wed, 1 Jun 2022 00:07:05 -0700 Subject: driver core: Delete driver_deferred_probe_check_state() The function is no longer used. So delete it. Tested-by: Geert Uytterhoeven Signed-off-by: Saravana Kannan Link: https://lore.kernel.org/r/20220601070707.3946847-10-saravanak@google.com Signed-off-by: Greg Kroah-Hartman --- include/linux/device/driver.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h index 2114d65b862f..7acaabde5396 100644 --- a/include/linux/device/driver.h +++ b/include/linux/device/driver.h @@ -242,7 +242,6 @@ driver_find_device_by_acpi_dev(struct device_driver *drv, const void *adev) extern int driver_deferred_probe_timeout; void driver_deferred_probe_add(struct device *dev); -int driver_deferred_probe_check_state(struct device *dev); void driver_init(void); /** -- cgit From 82b070beae1ef55b0049768c8dc91d87565bb191 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Fri, 10 Jun 2022 15:02:18 +0300 Subject: driver core: Introduce device_find_any_child() helper There are several places in the kernel where this kind of functionality is being used. Provide a generic helper for such cases. Reviewed-by: Rafael J. Wysocki Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220610120219.18988-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/device.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/device.h b/include/linux/device.h index dc941997795c..424b55df0272 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -905,6 +905,8 @@ struct device *device_find_child(struct device *dev, void *data, int (*match)(struct device *dev, void *data)); struct device *device_find_child_by_name(struct device *parent, const char *name); +struct device *device_find_any_child(struct device *parent); + int device_rename(struct device *dev, const char *new_name); int device_move(struct device *dev, struct device *new_parent, enum dpm_order dpm_order); -- cgit From 4314f9f4f85832b5082f4e38b07b63b11baa538c Mon Sep 17 00:00:00 2001 From: Cristian Marussi Date: Wed, 8 Jun 2022 10:55:28 +0100 Subject: firmware: arm_scmi: Avoid using extended string-buffers sizes if not necessary Commit b260fccaebdc2 ("firmware: arm_scmi: Add SCMI v3.1 protocol extended names support") moved all the name string buffers to use the extended buffer size of 64 instead of the required 16 bytes. While that should be fine if the firmware terminates the string before 16 bytes, there is possibility of copying random data if the name is not NULL terminated by the firmware. SCMI base protocol agent_name/vendor_id/sub_vendor_id are defined by the specification as NULL-terminated ASCII strings up to 16-bytes in length. The underlying buffers and message descriptors are currently bigger than needed; resize them to fit only the strictly needed 16 bytes to avoid any possible leaks when reading data from the firmware. Change the size argument of strlcpy to use SCMI_SHORT_NAME_MAX_SIZE always when dealing with short domain names, so as to limit the possibility that an ill-formed non-NULL terminated short reply from the SCMI platform firmware can leak stale content laying in the underlying transport shared memory area. While at that, convert all strings handling routines to use the preferred strscpy. Link: https://lore.kernel.org/r/20220608095530.497879-1-cristian.marussi@arm.com Fixes: b260fccaebdc2 ("firmware: arm_scmi: Add SCMI v3.1 protocol extended names support") Signed-off-by: Cristian Marussi Signed-off-by: Sudeep Holla --- include/linux/scmi_protocol.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h index 1c58646ba381..704111f63993 100644 --- a/include/linux/scmi_protocol.h +++ b/include/linux/scmi_protocol.h @@ -13,8 +13,9 @@ #include #include -#define SCMI_MAX_STR_SIZE 64 -#define SCMI_MAX_NUM_RATES 16 +#define SCMI_MAX_STR_SIZE 64 +#define SCMI_SHORT_NAME_MAX_SIZE 16 +#define SCMI_MAX_NUM_RATES 16 /** * struct scmi_revision_info - version information structure @@ -36,8 +37,8 @@ struct scmi_revision_info { u8 num_protocols; u8 num_agents; u32 impl_ver; - char vendor_id[SCMI_MAX_STR_SIZE]; - char sub_vendor_id[SCMI_MAX_STR_SIZE]; + char vendor_id[SCMI_SHORT_NAME_MAX_SIZE]; + char sub_vendor_id[SCMI_SHORT_NAME_MAX_SIZE]; }; struct scmi_clock_info { -- cgit From 73b4b53276a1d6290cd4f47dbbc885b6e6e59ac6 Mon Sep 17 00:00:00 2001 From: Andrey Grodzovsky Date: Thu, 19 May 2022 09:47:28 -0400 Subject: Revert "workqueue: remove unused cancel_work()" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This reverts commit 6417250d3f894e66a68ba1cd93676143f2376a6f. amdpgu need this function in order to prematurly stop pending reset works when another reset work already in progress. Acked-by: Tejun Heo Signed-off-by: Andrey Grodzovsky Reviewed-by: Lai Jiangshan Reviewed-by: Christian König Signed-off-by: Alex Deucher --- include/linux/workqueue.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 7fee9b6cfede..9e41e1226193 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -453,6 +453,7 @@ extern int schedule_on_each_cpu(work_func_t func); int execute_in_process_context(work_func_t fn, struct execute_work *); extern bool flush_work(struct work_struct *work); +extern bool cancel_work(struct work_struct *work); extern bool cancel_work_sync(struct work_struct *work); extern bool flush_delayed_work(struct delayed_work *dwork); -- cgit From e81fb4198e27925b151aad1450e0fd607d6733f8 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Thu, 9 Jun 2022 15:04:01 -0700 Subject: netfs: Further cleanups after struct netfs_inode wrapper introduced Change the signature of netfs helper functions to take a struct netfs_inode pointer rather than a struct inode pointer where appropriate, thereby relieving the need for the network filesystem to convert its internal inode format down to the VFS inode only for netfslib to bounce it back up. For type safety, it's better not to do that (and it's less typing too). Give netfs_write_begin() an extra argument to pass in a pointer to the netfs_inode struct rather than deriving it internally from the file pointer. Note that the ->write_begin() and ->write_end() ops are intended to be replaced in the future by netfslib code that manages this without the need to call in twice for each page. netfs_readpage() and similar are intended to be pointed at directly by the address_space_operations table, so must stick to the signature dictated by the function pointers there. Changes ======= - Updated the kerneldoc comments and documentation [DH]. Signed-off-by: David Howells cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/CAHk-=wgkwKyNmNdKpQkqZ6DnmUL-x9hp0YBnUGjaPFEAdxDTbw@mail.gmail.com/ --- include/linux/netfs.h | 22 +++++++++------------- 1 file changed, 9 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 6dbb4c9ce50d..a62739f3726b 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -277,7 +277,8 @@ struct netfs_cache_ops { struct readahead_control; extern void netfs_readahead(struct readahead_control *); int netfs_read_folio(struct file *, struct folio *); -extern int netfs_write_begin(struct file *, struct address_space *, +extern int netfs_write_begin(struct netfs_inode *, + struct file *, struct address_space *, loff_t, unsigned int, struct folio **, void **); @@ -302,19 +303,17 @@ static inline struct netfs_inode *netfs_inode(struct inode *inode) /** * netfs_inode_init - Initialise a netfslib inode context - * @inode: The inode with which the context is associated + * @inode: The netfs inode to initialise * @ops: The netfs's operations list * * Initialise the netfs library context struct. This is expected to follow on * directly from the VFS inode struct. */ -static inline void netfs_inode_init(struct inode *inode, +static inline void netfs_inode_init(struct netfs_inode *ctx, const struct netfs_request_ops *ops) { - struct netfs_inode *ctx = netfs_inode(inode); - ctx->ops = ops; - ctx->remote_i_size = i_size_read(inode); + ctx->remote_i_size = i_size_read(&ctx->inode); #if IS_ENABLED(CONFIG_FSCACHE) ctx->cache = NULL; #endif @@ -322,28 +321,25 @@ static inline void netfs_inode_init(struct inode *inode, /** * netfs_resize_file - Note that a file got resized - * @inode: The inode being resized + * @ctx: The netfs inode being resized * @new_i_size: The new file size * * Inform the netfs lib that a file got resized so that it can adjust its state. */ -static inline void netfs_resize_file(struct inode *inode, loff_t new_i_size) +static inline void netfs_resize_file(struct netfs_inode *ctx, loff_t new_i_size) { - struct netfs_inode *ctx = netfs_inode(inode); - ctx->remote_i_size = new_i_size; } /** * netfs_i_cookie - Get the cache cookie from the inode - * @inode: The inode to query + * @ctx: The netfs inode to query * * Get the caching cookie (if enabled) from the network filesystem's inode. */ -static inline struct fscache_cookie *netfs_i_cookie(struct inode *inode) +static inline struct fscache_cookie *netfs_i_cookie(struct netfs_inode *ctx) { #if IS_ENABLED(CONFIG_FSCACHE) - struct netfs_inode *ctx = netfs_inode(inode); return ctx->cache; #else return NULL; -- cgit From 40a81101202300df7db273f77dda9fbe6271b1d2 Mon Sep 17 00:00:00 2001 From: David Howells Date: Fri, 25 Feb 2022 11:19:14 +0000 Subject: netfs: Rename the netfs_io_request cleanup op and give it an op pointer The netfs_io_request cleanup op is now always in a position to be given a pointer to a netfs_io_request struct, so this can be passed in instead of the mapping and private data arguments (both of which are included in the struct). So rename the ->cleanup op to ->free_request (to match ->init_request) and pass in the I/O pointer. Signed-off-by: David Howells Reviewed-by: Jeff Layton cc: linux-cachefs@redhat.com --- include/linux/netfs.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index a62739f3726b..097cdd644665 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -206,7 +206,9 @@ struct netfs_io_request { */ struct netfs_request_ops { int (*init_request)(struct netfs_io_request *rreq, struct file *file); + void (*free_request)(struct netfs_io_request *rreq); int (*begin_cache_operation)(struct netfs_io_request *rreq); + void (*expand_readahead)(struct netfs_io_request *rreq); bool (*clamp_length)(struct netfs_io_subrequest *subreq); void (*issue_read)(struct netfs_io_subrequest *subreq); @@ -214,7 +216,6 @@ struct netfs_request_ops { int (*check_write_begin)(struct file *file, loff_t pos, unsigned len, struct folio *folio, void **_fsdata); void (*done)(struct netfs_io_request *rreq); - void (*cleanup)(struct address_space *mapping, void *netfs_priv); }; /* -- cgit From 36518b6b4da7e8d4387bc19ad21e772f1060e9d7 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Tue, 7 Jun 2022 16:04:03 -0400 Subject: teach iomap_dio_rw() to suppress dsync New flag, equivalent to removal of IOCB_DSYNC from iocb flags. This mimics what btrfs is doing (and that's what btrfs will switch to). However, I'm not at all sure that we want to suppress REQ_FUA for those - all btrfs hack really cares about is suppression of generic_write_sync(). For now let's keep the existing behaviour, but I really want to hear more detailed arguments pro or contra. [folded brain fix from willy] Suggested-by: Christoph Hellwig Reviewed-by: Christian Brauner (Microsoft) Signed-off-by: Al Viro --- include/linux/iomap.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iomap.h b/include/linux/iomap.h index e552097c67e0..c8622d8f064e 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -353,6 +353,12 @@ struct iomap_dio_ops { */ #define IOMAP_DIO_PARTIAL (1 << 2) +/* + * The caller will sync the write if needed; do not sync it within + * iomap_dio_rw. Overrides IOMAP_DIO_FORCE_WAIT. + */ +#define IOMAP_DIO_NOSYNC (1 << 3) + ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops, const struct iomap_dio_ops *dops, unsigned int dio_flags, void *private, size_t done_before); -- cgit From e87f2c26c8085dac59978dee1beeb05cef31a9dd Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 22 May 2022 09:28:12 -0400 Subject: struct file: use anonymous union member for rcuhead and llist Once upon a time we couldn't afford anon unions; these days minimal gcc version had been raised enough to take care of that. Reviewed-by: Jan Kara Reviewed-by: Christian Brauner (Microsoft) Signed-off-by: Al Viro --- include/linux/fs.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 9ad5e3520fae..6a2a4906041f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -924,9 +924,9 @@ static inline int ra_has_index(struct file_ra_state *ra, pgoff_t index) struct file { union { - struct llist_node fu_llist; - struct rcu_head fu_rcuhead; - } f_u; + struct llist_node f_llist; + struct rcu_head f_rcuhead; + }; struct path f_path; struct inode *f_inode; /* cached value */ const struct file_operations *f_op; -- cgit From 91b94c5d6ae55d1161633047ffeea644b110b35f Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 22 May 2022 09:39:27 -0400 Subject: iocb: delay evaluation of IS_SYNC(...) until we want to check IOCB_DSYNC New helper to be used instead of direct checks for IOCB_DSYNC: iocb_is_dsync(iocb). Checks converted, which allows to avoid the IS_SYNC(iocb->ki_filp->f_mapping->host) part (4 cache lines) from iocb_flags() - it's checked in iocb_is_dsync() instead Reviewed-by: Christian Brauner (Microsoft) Signed-off-by: Al Viro --- include/linux/fs.h | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 6a2a4906041f..380a1292f4f9 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2720,6 +2720,12 @@ extern int vfs_fsync(struct file *file, int datasync); extern int sync_file_range(struct file *file, loff_t offset, loff_t nbytes, unsigned int flags); +static inline bool iocb_is_dsync(const struct kiocb *iocb) +{ + return (iocb->ki_flags & IOCB_DSYNC) || + IS_SYNC(iocb->ki_filp->f_mapping->host); +} + /* * Sync the bytes written if this was a synchronous write. Expect ki_pos * to already be updated for the write, and will return either the amount @@ -2727,7 +2733,7 @@ extern int sync_file_range(struct file *file, loff_t offset, loff_t nbytes, */ static inline ssize_t generic_write_sync(struct kiocb *iocb, ssize_t count) { - if (iocb->ki_flags & IOCB_DSYNC) { + if (iocb_is_dsync(iocb)) { int ret = vfs_fsync_range(iocb->ki_filp, iocb->ki_pos - count, iocb->ki_pos - 1, (iocb->ki_flags & IOCB_SYNC) ? 0 : 1); @@ -3262,7 +3268,7 @@ static inline int iocb_flags(struct file *file) res |= IOCB_APPEND; if (file->f_flags & O_DIRECT) res |= IOCB_DIRECT; - if ((file->f_flags & O_DSYNC) || IS_SYNC(file->f_mapping->host)) + if (file->f_flags & O_DSYNC) res |= IOCB_DSYNC; if (file->f_flags & __O_SYNC) res |= IOCB_SYNC; -- cgit From 164f4064ca81eefcea29f7f5dcf394f92be1d0c0 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 22 May 2022 11:38:11 -0400 Subject: keep iocb_flags() result cached in struct file * calculate at the time we set FMODE_OPENED (do_dentry_open() for normal opens, alloc_file() for pipe()/socket()/etc.) * update when handling F_SETFL * keep in a new field - file->f_iocb_flags; since that thing is needed only before the refcount reaches zero, we can put it into the same anon union where ->f_rcuhead and ->f_llist live - those are used only after refcount reaches zero. Reviewed-by: Christian Brauner (Microsoft) Signed-off-by: Al Viro --- include/linux/fs.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 380a1292f4f9..c82b9d442f56 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -926,6 +926,7 @@ struct file { union { struct llist_node f_llist; struct rcu_head f_rcuhead; + unsigned int f_iocb_flags; }; struct path f_path; struct inode *f_inode; /* cached value */ @@ -2199,13 +2200,11 @@ static inline bool HAS_UNMAPPED_ID(struct user_namespace *mnt_userns, !gid_valid(i_gid_into_mnt(mnt_userns, inode)); } -static inline int iocb_flags(struct file *file); - static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp) { *kiocb = (struct kiocb) { .ki_filp = filp, - .ki_flags = iocb_flags(filp), + .ki_flags = filp->f_iocb_flags, .ki_ioprio = get_current_ioprio(), }; } -- cgit From 7cbb6681d7e5b88688234ad370e027a9346ff7a9 Mon Sep 17 00:00:00 2001 From: Gwendal Grignou Date: Wed, 27 Apr 2022 12:08:04 -0700 Subject: iio: common: cros_ec_sensors: Add label attribute When sensor location is known, populate iio sysfs "label" attribute: * "accel-base" : the sensor is in the base of the convertible (2-1) device. * "accel-display" : the sensor is in the lid/display plane of the device. * "accel-camera" : the sensor is in the swivel camera subassembly. The non-standard |location| attribute is removed, the field |loc| in cros_ec_sensors_core_state is removed. It apply to standalone accelerometer as well as IMU (accelerometer + gyroscope) and sensors where the location is known (light). Signed-off-by: Gwendal Grignou Link: https://lore.kernel.org/r/20220427190804.961697-3-gwendal@chromium.org Signed-off-by: Jonathan Cameron --- include/linux/iio/common/cros_ec_sensors_core.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iio/common/cros_ec_sensors_core.h b/include/linux/iio/common/cros_ec_sensors_core.h index c582e1a14232..a8259c8822f5 100644 --- a/include/linux/iio/common/cros_ec_sensors_core.h +++ b/include/linux/iio/common/cros_ec_sensors_core.h @@ -41,7 +41,6 @@ typedef irqreturn_t (*cros_ec_sensors_capture_t)(int irq, void *p); * @param: motion sensor parameters structure * @resp: motion sensor response structure * @type: type of motion sensor - * @loc: location where the motion sensor is placed * @range_updated: True if the range of the sensor has been * updated. * @curr_range: If updated, the current range value. @@ -67,7 +66,6 @@ struct cros_ec_sensors_core_state { struct ec_response_motion_sense *resp; enum motionsensor_type type; - enum motionsensor_location loc; bool range_updated; int curr_range; -- cgit From ccd8a9351f7b44bc1c0c8e4d39c0d6593b106fc4 Mon Sep 17 00:00:00 2001 From: Vincent Mailhol Date: Fri, 10 Jun 2022 23:30:08 +0900 Subject: can: skb: move can_dropped_invalid_skb() and can_skb_headroom_valid() to skb.c The functions can_dropped_invalid_skb() and can_skb_headroom_valid() grew a lot over the years to a point which it does not make much sense to have them defined as static inline in header files. Move those two functions to the .c counterpart of skb.h. can_skb_headroom_valid()'s only caller being can_dropped_invalid_skb(), the declaration is removed from the header. Only can_dropped_invalid_skb() gets its symbol exported. While doing so, do a small cleanup: add brackets around the else block in can_dropped_invalid_skb(). Link: https://lore.kernel.org/all/20220610143009.323579-7-mailhol.vincent@wanadoo.fr Signed-off-by: Vincent Mailhol Reported-by: kernel test robot Acked-by: Max Staudt Tested-by: Oliver Hartkopp Signed-off-by: Marc Kleine-Budde --- include/linux/can/skb.h | 59 +------------------------------------------------ 1 file changed, 1 insertion(+), 58 deletions(-) (limited to 'include/linux') diff --git a/include/linux/can/skb.h b/include/linux/can/skb.h index fdb22b00674a..182749e858b3 100644 --- a/include/linux/can/skb.h +++ b/include/linux/can/skb.h @@ -31,6 +31,7 @@ struct sk_buff *alloc_canfd_skb(struct net_device *dev, struct canfd_frame **cfd); struct sk_buff *alloc_can_err_skb(struct net_device *dev, struct can_frame **cf); +bool can_dropped_invalid_skb(struct net_device *dev, struct sk_buff *skb); /* * The struct can_skb_priv is used to transport additional information along @@ -96,64 +97,6 @@ static inline struct sk_buff *can_create_echo_skb(struct sk_buff *skb) return nskb; } -/* Check for outgoing skbs that have not been created by the CAN subsystem */ -static inline bool can_skb_headroom_valid(struct net_device *dev, - struct sk_buff *skb) -{ - /* af_packet creates a headroom of HH_DATA_MOD bytes which is fine */ - if (WARN_ON_ONCE(skb_headroom(skb) < sizeof(struct can_skb_priv))) - return false; - - /* af_packet does not apply CAN skb specific settings */ - if (skb->ip_summed == CHECKSUM_NONE) { - /* init headroom */ - can_skb_prv(skb)->ifindex = dev->ifindex; - can_skb_prv(skb)->skbcnt = 0; - - skb->ip_summed = CHECKSUM_UNNECESSARY; - - /* perform proper loopback on capable devices */ - if (dev->flags & IFF_ECHO) - skb->pkt_type = PACKET_LOOPBACK; - else - skb->pkt_type = PACKET_HOST; - - skb_reset_mac_header(skb); - skb_reset_network_header(skb); - skb_reset_transport_header(skb); - } - - return true; -} - -/* Drop a given socketbuffer if it does not contain a valid CAN frame. */ -static inline bool can_dropped_invalid_skb(struct net_device *dev, - struct sk_buff *skb) -{ - const struct canfd_frame *cfd = (struct canfd_frame *)skb->data; - - if (skb->protocol == htons(ETH_P_CAN)) { - if (unlikely(skb->len != CAN_MTU || - cfd->len > CAN_MAX_DLEN)) - goto inval_skb; - } else if (skb->protocol == htons(ETH_P_CANFD)) { - if (unlikely(skb->len != CANFD_MTU || - cfd->len > CANFD_MAX_DLEN)) - goto inval_skb; - } else - goto inval_skb; - - if (!can_skb_headroom_valid(dev, skb)) - goto inval_skb; - - return false; - -inval_skb: - kfree_skb(skb); - dev->stats.tx_dropped++; - return true; -} - static inline bool can_is_canfd_skb(const struct sk_buff *skb) { /* the CAN specific type of skb is identified by its data length */ -- cgit From 8bee9dd953b69c634d1c9a3241a8b357469ad4aa Mon Sep 17 00:00:00 2001 From: Jonathan Neuschäfer Date: Fri, 10 Jun 2022 01:41:10 +0200 Subject: workqueue: Switch to new kerneldoc syntax for named variable macro argument MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The syntax without dots is available since commit 43756e347f21 ("scripts/kernel-doc: Add support for named variable macro arguments"). The same HTML output is produced with and without this patch. Signed-off-by: Jonathan Neuschäfer Acked-by: Tejun Heo Signed-off-by: Tejun Heo --- include/linux/workqueue.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index e1f1c8b1121b..62e75dd40d9a 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -406,7 +406,7 @@ alloc_workqueue(const char *fmt, unsigned int flags, int max_active, ...); * alloc_ordered_workqueue - allocate an ordered workqueue * @fmt: printf format for the name of the workqueue * @flags: WQ_* flags (only WQ_FREEZABLE and WQ_MEM_RECLAIM are meaningful) - * @args...: args for @fmt + * @args: args for @fmt * * Allocate an ordered workqueue. An ordered workqueue executes at * most one work item at any given time in the queued order. They are -- cgit From 662a60102c122e44fdaf5c826f7f415eb57d48ad Mon Sep 17 00:00:00 2001 From: Heikki Krogerus Date: Mon, 2 May 2022 16:20:56 +0300 Subject: usb: typec: Separate USB Power Delivery from USB Type-C Introducing a small device class for USB Power Delivery. The idea with it is that we do not mix any more USB Power Delivery information into the USB Type-C connectors only. This separation will make it possible to register USB Power Delivery devices also from other places, for example from USB Type-C Bridges (see USB Type-C Bridge Specification). The device class will not always deal with only the messages and objects that were negotiated with the partner, but instead messages and objects that can be used in the negotiation. That allows the USB PD devices to be shared and reconfigured. The ports can decide which objects are to be advertised to the partner before the contract is negotiated. It is also possible to allow the user space to make that decision if needed. Signed-off-by: Heikki Krogerus Link: https://lore.kernel.org/r/20220502132058.86236-2-heikki.krogerus@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/pd.h | 38 ++++++++++++++++++++++++++++++++++++++ include/linux/usb/typec.h | 10 ++++++++++ 2 files changed, 48 insertions(+) (limited to 'include/linux') diff --git a/include/linux/usb/pd.h b/include/linux/usb/pd.h index 96b7ff66f074..c59fb79a42e8 100644 --- a/include/linux/usb/pd.h +++ b/include/linux/usb/pd.h @@ -495,4 +495,42 @@ static inline unsigned int rdo_max_power(u32 rdo) #define PD_P_SNK_STDBY_MW 2500 /* 2500 mW */ +#if IS_ENABLED(CONFIG_TYPEC) + +struct usb_power_delivery; + +/** + * usb_power_delivery_desc - USB Power Delivery Descriptor + * @revision: USB Power Delivery Specification Revision + * @version: USB Power Delivery Specicication Version - optional + */ +struct usb_power_delivery_desc { + u16 revision; + u16 version; +}; + +/** + * usb_power_delivery_capabilities_desc - Description of USB Power Delivery Capabilities Message + * @pdo: The Power Data Objects in the Capability Message + * @role: Power role of the capabilities + */ +struct usb_power_delivery_capabilities_desc { + u32 pdo[PDO_MAX_OBJECTS]; + enum typec_role role; +}; + +struct usb_power_delivery_capabilities * +usb_power_delivery_register_capabilities(struct usb_power_delivery *pd, + struct usb_power_delivery_capabilities_desc *desc); +void usb_power_delivery_unregister_capabilities(struct usb_power_delivery_capabilities *cap); + +struct usb_power_delivery *usb_power_delivery_register(struct device *parent, + struct usb_power_delivery_desc *desc); +void usb_power_delivery_unregister(struct usb_power_delivery *pd); + +int usb_power_delivery_link_device(struct usb_power_delivery *pd, struct device *dev); +void usb_power_delivery_unlink_device(struct usb_power_delivery *pd, struct device *dev); + +#endif /* CONFIG_TYPEC */ + #endif /* __LINUX_USB_PD_H */ diff --git a/include/linux/usb/typec.h b/include/linux/usb/typec.h index fdf737d48b3b..45e28d14ae56 100644 --- a/include/linux/usb/typec.h +++ b/include/linux/usb/typec.h @@ -52,6 +52,16 @@ enum typec_role { TYPEC_SOURCE, }; +static inline int is_sink(enum typec_role role) +{ + return role == TYPEC_SINK; +} + +static inline int is_source(enum typec_role role) +{ + return role == TYPEC_SOURCE; +} + enum typec_pwr_opmode { TYPEC_PWR_MODE_USB, TYPEC_PWR_MODE_1_5A, -- cgit From a7cff92f0635c794e2198a69a7ff4ecfe0decab9 Mon Sep 17 00:00:00 2001 From: Heikki Krogerus Date: Mon, 2 May 2022 16:20:57 +0300 Subject: usb: typec: USB Power Delivery helpers for ports and partners All the USB Type-C Connector Class devices are protected, so the drivers can not directly access them. This will adds a few helpers that can be used to link the ports and partners to the correct USB Power Delivery objects. For ports a new optional sysfs attribute file is also added that can be used to select the USB Power Delivery capabilities that the port will advertise to the partner. Signed-off-by: Heikki Krogerus Link: https://lore.kernel.org/r/20220502132058.86236-3-heikki.krogerus@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/typec.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/usb/typec.h b/include/linux/usb/typec.h index 45e28d14ae56..7751bedcae5d 100644 --- a/include/linux/usb/typec.h +++ b/include/linux/usb/typec.h @@ -22,6 +22,8 @@ struct typec_altmode_ops; struct fwnode_handle; struct device; +struct usb_power_delivery; + enum typec_port_type { TYPEC_PORT_SRC, TYPEC_PORT_SNK, @@ -223,6 +225,8 @@ struct typec_partner_desc { * @pr_set: Set Power Role * @vconn_set: Source VCONN * @port_type_set: Set port type + * @pd_get: Get available USB Power Delivery Capabilities. + * @pd_set: Set USB Power Delivery Capabilities. */ struct typec_operations { int (*try_role)(struct typec_port *port, int role); @@ -231,6 +235,8 @@ struct typec_operations { int (*vconn_set)(struct typec_port *port, enum typec_role role); int (*port_type_set)(struct typec_port *port, enum typec_port_type type); + struct usb_power_delivery **(*pd_get)(struct typec_port *port); + int (*pd_set)(struct typec_port *port, struct usb_power_delivery *pd); }; enum usb_pd_svdm_ver { @@ -250,6 +256,7 @@ enum usb_pd_svdm_ver { * @accessory: Supported Accessory Modes * @fwnode: Optional fwnode of the port * @driver_data: Private pointer for driver specific info + * @pd: Optional USB Power Delivery Support * @ops: Port operations vector * * Static capabilities of a single USB Type-C port. @@ -267,6 +274,8 @@ struct typec_capability { struct fwnode_handle *fwnode; void *driver_data; + struct usb_power_delivery *pd; + const struct typec_operations *ops; }; @@ -318,4 +327,8 @@ void typec_partner_set_svdm_version(struct typec_partner *partner, enum usb_pd_svdm_ver svdm_version); int typec_get_negotiated_svdm_version(struct typec_port *port); +int typec_port_set_usb_power_delivery(struct typec_port *port, struct usb_power_delivery *pd); +int typec_partner_set_usb_power_delivery(struct typec_partner *partner, + struct usb_power_delivery *pd); + #endif /* __LINUX_USB_TYPEC_H */ -- cgit From e146caf303493c4f2458173d7f1598b76a9b1396 Mon Sep 17 00:00:00 2001 From: Mathias Nyman Date: Fri, 6 May 2022 19:18:07 +0300 Subject: usb: Avoid extra usb SET_SEL requests when enabling link power management The host needs to tell the device the exit latencies using the SET_SEL request before device initiated link powermanagement can be enabled. The exit latency values do not change after enumeration, it's enough to set them once. So do like Windows 10 and issue the SET_SEL request once just before setting the configuration. This is also the sequence described in USB 3.2 specs "9.1.2 Bus enumeration". SET_SEL is issued once before the Set Configuration request, and won't be cleared by the Set Configuration, Set Interface or ClearFeature (STALL) requests. Only warm reset, hot reset, set Address 0 clears the exit latencies. See USB 3.2 section 9.4.14 Table 9-10 Device parameters and events Add udev->lpm_devinit_allow, and set it if SET_SEL was successful. If not set, then don't try to enable device initiated LPM We used to issue a SET_SEL request every time lpm is enabled for either U1 or U2 link states, meaning a SET_SEL was issued twice after every Set Configuration and Set Interface requests, easily accumulating to over 15 SET_SEL requets during a USB3 webcam enumeration. Signed-off-by: Mathias Nyman Link: https://lore.kernel.org/r/20220506161807.3369439-1-mathias.nyman@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/usb.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/usb.h b/include/linux/usb.h index 60bee864d897..f7a9914fc97f 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -584,6 +584,7 @@ struct usb3_lpm_parameters { * @authenticated: Crypto authentication passed * @wusb: device is Wireless USB * @lpm_capable: device supports LPM + * @lpm_devinit_allow: Allow USB3 device initiated LPM, exit latency is in range * @usb2_hw_lpm_capable: device can perform USB2 hardware LPM * @usb2_hw_lpm_besl_capable: device can perform USB2 hardware BESL LPM * @usb2_hw_lpm_enabled: USB2 hardware LPM is enabled @@ -666,6 +667,7 @@ struct usb_device { unsigned authenticated:1; unsigned wusb:1; unsigned lpm_capable:1; + unsigned lpm_devinit_allow:1; unsigned usb2_hw_lpm_capable:1; unsigned usb2_hw_lpm_besl_capable:1; unsigned usb2_hw_lpm_enabled:1; -- cgit From c04245328dd7e915e21ac6395ffd218616e22754 Mon Sep 17 00:00:00 2001 From: Yajun Deng Date: Fri, 10 Jun 2022 17:10:17 +0800 Subject: net: make __sys_accept4_file() static __sys_accept4_file() isn't used outside of the file, make it static. As the same time, move file_flags and nofile parameters into __sys_accept4_file(). Signed-off-by: Yajun Deng Signed-off-by: David S. Miller --- include/linux/socket.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/socket.h b/include/linux/socket.h index 17311ad9f9af..414b8c7bb8f7 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -428,10 +428,6 @@ extern int __sys_recvfrom(int fd, void __user *ubuf, size_t size, extern int __sys_sendto(int fd, void __user *buff, size_t len, unsigned int flags, struct sockaddr __user *addr, int addr_len); -extern int __sys_accept4_file(struct file *file, unsigned file_flags, - struct sockaddr __user *upeer_sockaddr, - int __user *upeer_addrlen, int flags, - unsigned long nofile); extern struct file *do_accept(struct file *file, unsigned file_flags, struct sockaddr __user *upeer_sockaddr, int __user *upeer_addrlen, int flags); -- cgit From 0eb6584068642767baab17c2e4385a6b9c029caa Mon Sep 17 00:00:00 2001 From: Maximilian Luz Date: Fri, 27 May 2022 04:34:36 +0200 Subject: platform/surface: aggregator: Allow is_ssam_device() to be used when CONFIG_SURFACE_AGGREGATOR_BUS is disabled In SSAM subsystem drivers that handle both ACPI and SSAM-native client devices, we may want to check whether we have a SSAM (native) client device. Further, we may want to do this even when instantiation thereof cannot happen due to CONFIG_SURFACE_AGGREGATOR_BUS=n. Currently, doing so causes an error due to an undefined reference error due to ssam_device_type being placed in the bus source unit. Therefore, if CONFIG_SURFACE_AGGREGATOR_BUS is not defined, simply let is_ssam_device() return false to prevent this error. Signed-off-by: Maximilian Luz Link: https://lore.kernel.org/r/20220527023447.2460025-2-luzmaximilian@gmail.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/surface_aggregator/device.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/surface_aggregator/device.h b/include/linux/surface_aggregator/device.h index cc257097eb05..62b38b4487eb 100644 --- a/include/linux/surface_aggregator/device.h +++ b/include/linux/surface_aggregator/device.h @@ -177,6 +177,8 @@ struct ssam_device_driver { void (*remove)(struct ssam_device *sdev); }; +#ifdef CONFIG_SURFACE_AGGREGATOR_BUS + extern struct bus_type ssam_bus_type; extern const struct device_type ssam_device_type; @@ -193,6 +195,15 @@ static inline bool is_ssam_device(struct device *d) return d->type == &ssam_device_type; } +#else /* CONFIG_SURFACE_AGGREGATOR_BUS */ + +static inline bool is_ssam_device(struct device *d) +{ + return false; +} + +#endif /* CONFIG_SURFACE_AGGREGATOR_BUS */ + /** * to_ssam_device() - Casts the given device to a SSAM client device. * @d: The device to cast. -- cgit From dc0393c76f378f68961587fd4f32de29fb8f0c79 Mon Sep 17 00:00:00 2001 From: Maximilian Luz Date: Fri, 27 May 2022 04:34:37 +0200 Subject: platform/surface: aggregator: Allow devices to be marked as hot-removed Some SSAM devices, notably the keyboard cover (keyboard and touchpad) on the Surface Pro 8, can be hot-removed. When this occurs, communication with the device may fail and time out. This timeout can unnecessarily block and slow down device removal and even cause issues when the devices are detached and re-attached quickly. Thus, communication should generally be avoided once hot-removal is detected. While we already remove a device as soon as we detect its (hot-)removal, the corresponding device driver may still attempt to communicate with the device during teardown. This is especially critical as communication failure may also extend to disabling of events, which is typically done at that stage. Add a flag to allow marking devices as hot-removed. This can then be used during client driver teardown to check if any communication attempts should be avoided. Signed-off-by: Maximilian Luz Link: https://lore.kernel.org/r/20220527023447.2460025-3-luzmaximilian@gmail.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/surface_aggregator/device.h | 48 +++++++++++++++++++++++++++++-- 1 file changed, 45 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/surface_aggregator/device.h b/include/linux/surface_aggregator/device.h index 62b38b4487eb..6df7c8d4e50e 100644 --- a/include/linux/surface_aggregator/device.h +++ b/include/linux/surface_aggregator/device.h @@ -148,17 +148,30 @@ struct ssam_device_uid { #define SSAM_SDEV(cat, tid, iid, fun) \ SSAM_DEVICE(SSAM_DOMAIN_SERIALHUB, SSAM_SSH_TC_##cat, tid, iid, fun) +/* + * enum ssam_device_flags - Flags for SSAM client devices. + * @SSAM_DEVICE_HOT_REMOVED_BIT: + * The device has been hot-removed. Further communication with it may time + * out and should be avoided. + */ +enum ssam_device_flags { + SSAM_DEVICE_HOT_REMOVED_BIT = 0, +}; + /** * struct ssam_device - SSAM client device. - * @dev: Driver model representation of the device. - * @ctrl: SSAM controller managing this device. - * @uid: UID identifying the device. + * @dev: Driver model representation of the device. + * @ctrl: SSAM controller managing this device. + * @uid: UID identifying the device. + * @flags: Device state flags, see &enum ssam_device_flags. */ struct ssam_device { struct device dev; struct ssam_controller *ctrl; struct ssam_device_uid uid; + + unsigned long flags; }; /** @@ -251,6 +264,35 @@ struct ssam_device *ssam_device_alloc(struct ssam_controller *ctrl, int ssam_device_add(struct ssam_device *sdev); void ssam_device_remove(struct ssam_device *sdev); +/** + * ssam_device_mark_hot_removed() - Mark the given device as hot-removed. + * @sdev: The device to mark as hot-removed. + * + * Mark the device as having been hot-removed. This signals drivers using the + * device that communication with the device should be avoided and may lead to + * timeouts. + */ +static inline void ssam_device_mark_hot_removed(struct ssam_device *sdev) +{ + dev_dbg(&sdev->dev, "marking device as hot-removed\n"); + set_bit(SSAM_DEVICE_HOT_REMOVED_BIT, &sdev->flags); +} + +/** + * ssam_device_is_hot_removed() - Check if the given device has been + * hot-removed. + * @sdev: The device to check. + * + * Checks if the given device has been marked as hot-removed. See + * ssam_device_mark_hot_removed() for more details. + * + * Return: Returns ``true`` if the device has been marked as hot-removed. + */ +static inline bool ssam_device_is_hot_removed(struct ssam_device *sdev) +{ + return test_bit(SSAM_DEVICE_HOT_REMOVED_BIT, &sdev->flags); +} + /** * ssam_device_get() - Increment reference count of SSAM client device. * @sdev: The device to increment the reference count of. -- cgit From 5c1e88b98c60e4074796e9a05d3c674479ab1919 Mon Sep 17 00:00:00 2001 From: Maximilian Luz Date: Fri, 27 May 2022 04:34:38 +0200 Subject: platform/surface: aggregator: Allow notifiers to avoid communication on unregistering When SSAM client devices have been (physically) hot-removed, communication attempts with those devices may fail and time out. This can even extend to event notifiers, due to which timeouts may occur during device removal, slowing down that process. Add a parameter to the notifier unregister function that allows skipping communication with the EC to prevent this. Furthermore, add wrappers for registering and unregistering notifiers belonging to SSAM client devices that automatically check if the device has been marked as hot-removed and communication should be avoided. Note that non-SSAM client devices can generally not be hot-removed, so also add a convenience wrapper for those, defaulting to allow communication. Signed-off-by: Maximilian Luz Link: https://lore.kernel.org/r/20220527023447.2460025-4-luzmaximilian@gmail.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/surface_aggregator/controller.h | 24 +++++++++- include/linux/surface_aggregator/device.h | 66 +++++++++++++++++++++++++++ 2 files changed, 88 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/surface_aggregator/controller.h b/include/linux/surface_aggregator/controller.h index 74bfdffaf7b0..50a2b4926c06 100644 --- a/include/linux/surface_aggregator/controller.h +++ b/include/linux/surface_aggregator/controller.h @@ -835,8 +835,28 @@ struct ssam_event_notifier { int ssam_notifier_register(struct ssam_controller *ctrl, struct ssam_event_notifier *n); -int ssam_notifier_unregister(struct ssam_controller *ctrl, - struct ssam_event_notifier *n); +int __ssam_notifier_unregister(struct ssam_controller *ctrl, + struct ssam_event_notifier *n, bool disable); + +/** + * ssam_notifier_unregister() - Unregister an event notifier. + * @ctrl: The controller the notifier has been registered on. + * @n: The event notifier to unregister. + * + * Unregister an event notifier. Decrement the usage counter of the associated + * SAM event if the notifier is not marked as an observer. If the usage counter + * reaches zero, the event will be disabled. + * + * Return: Returns zero on success, %-ENOENT if the given notifier block has + * not been registered on the controller. If the given notifier block was the + * last one associated with its specific event, returns the status of the + * event-disable EC-command. + */ +static inline int ssam_notifier_unregister(struct ssam_controller *ctrl, + struct ssam_event_notifier *n) +{ + return __ssam_notifier_unregister(ctrl, n, true); +} int ssam_controller_event_enable(struct ssam_controller *ctrl, struct ssam_event_registry reg, diff --git a/include/linux/surface_aggregator/device.h b/include/linux/surface_aggregator/device.h index 6df7c8d4e50e..c418f7f2732d 100644 --- a/include/linux/surface_aggregator/device.h +++ b/include/linux/surface_aggregator/device.h @@ -483,4 +483,70 @@ static inline void ssam_remove_clients(struct device *dev) {} sdev->uid.instance, ret); \ } + +/* -- Helpers for client-device notifiers. ---------------------------------- */ + +/** + * ssam_device_notifier_register() - Register an event notifier for the + * specified client device. + * @sdev: The device the notifier should be registered on. + * @n: The event notifier to register. + * + * Register an event notifier. Increment the usage counter of the associated + * SAM event if the notifier is not marked as an observer. If the event is not + * marked as an observer and is currently not enabled, it will be enabled + * during this call. If the notifier is marked as an observer, no attempt will + * be made at enabling any event and no reference count will be modified. + * + * Notifiers marked as observers do not need to be associated with one specific + * event, i.e. as long as no event matching is performed, only the event target + * category needs to be set. + * + * Return: Returns zero on success, %-ENOSPC if there have already been + * %INT_MAX notifiers for the event ID/type associated with the notifier block + * registered, %-ENOMEM if the corresponding event entry could not be + * allocated, %-ENODEV if the device is marked as hot-removed. If this is the + * first time that a notifier block is registered for the specific associated + * event, returns the status of the event-enable EC-command. + */ +static inline int ssam_device_notifier_register(struct ssam_device *sdev, + struct ssam_event_notifier *n) +{ + /* + * Note that this check does not provide any guarantees whatsoever as + * hot-removal could happen at any point and we can't protect against + * it. Nevertheless, if we can detect hot-removal, bail early to avoid + * communication timeouts. + */ + if (ssam_device_is_hot_removed(sdev)) + return -ENODEV; + + return ssam_notifier_register(sdev->ctrl, n); +} + +/** + * ssam_device_notifier_unregister() - Unregister an event notifier for the + * specified client device. + * @sdev: The device the notifier has been registered on. + * @n: The event notifier to unregister. + * + * Unregister an event notifier. Decrement the usage counter of the associated + * SAM event if the notifier is not marked as an observer. If the usage counter + * reaches zero, the event will be disabled. + * + * In case the device has been marked as hot-removed, the event will not be + * disabled on the EC, as in those cases any attempt at doing so may time out. + * + * Return: Returns zero on success, %-ENOENT if the given notifier block has + * not been registered on the controller. If the given notifier block was the + * last one associated with its specific event, returns the status of the + * event-disable EC-command. + */ +static inline int ssam_device_notifier_unregister(struct ssam_device *sdev, + struct ssam_event_notifier *n) +{ + return __ssam_notifier_unregister(sdev->ctrl, n, + !ssam_device_is_hot_removed(sdev)); +} + #endif /* _LINUX_SURFACE_AGGREGATOR_DEVICE_H */ -- cgit From 25e2ca7301bd3ca5a63a6be41d729eb42202bc21 Mon Sep 17 00:00:00 2001 From: Maximilian Luz Date: Fri, 27 May 2022 04:34:43 +0200 Subject: platform/surface: aggregator: Add comment for KIP subsystem category The KIP subsystem (full name unknown, abbreviation has been obtained through reverse engineering) handles detachable peripherals such as the keyboard cover on the Surface Pro X and Surface Pro 8. It is currently not entirely clear what this subsystem entails, but at the very least it provides event notifications for when the keyboard cover on the Surface Pro X and Surface Pro 8 have been detached or re-attached, as well as the state that the keyboard cover is currently in (e.g. folded-back, folded laptop-like, closed, etc.). Signed-off-by: Maximilian Luz Link: https://lore.kernel.org/r/20220527023447.2460025-9-luzmaximilian@gmail.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/surface_aggregator/serial_hub.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/surface_aggregator/serial_hub.h b/include/linux/surface_aggregator/serial_hub.h index c3de43edcffa..26b95ec12733 100644 --- a/include/linux/surface_aggregator/serial_hub.h +++ b/include/linux/surface_aggregator/serial_hub.h @@ -306,7 +306,7 @@ enum ssam_ssh_tc { SSAM_SSH_TC_LPC = 0x0b, SSAM_SSH_TC_TCL = 0x0c, SSAM_SSH_TC_SFL = 0x0d, - SSAM_SSH_TC_KIP = 0x0e, + SSAM_SSH_TC_KIP = 0x0e, /* Manages detachable peripherals (Pro X/8 keyboard cover) */ SSAM_SSH_TC_EXT = 0x0f, SSAM_SSH_TC_BLD = 0x10, SSAM_SSH_TC_BAS = 0x11, /* Detachment system (Surface Book 2/3). */ -- cgit From 993d0b287e2ef7bee2e8b13b0ce4d2b5066f278e Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sun, 12 Jun 2022 22:32:25 +0100 Subject: usercopy: Handle vm_map_ram() areas vmalloc does not allocate a vm_struct for vm_map_ram() areas. That causes us to deny usercopies from those areas. This affects XFS which uses vm_map_ram() for its directories. Fix this by calling find_vmap_area() instead of find_vm_area(). Fixes: 0aef499f3172 ("mm/usercopy: Detect vmalloc overruns") Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Uladzislau Rezki (Sony) Tested-by: Zorro Lang Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220612213227.3881769-2-willy@infradead.org --- include/linux/vmalloc.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index b159c2789961..096d48aa3437 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -215,6 +215,7 @@ extern struct vm_struct *__get_vm_area_caller(unsigned long size, void free_vm_area(struct vm_struct *area); extern struct vm_struct *remove_vm_area(const void *addr); extern struct vm_struct *find_vm_area(const void *addr); +struct vmap_area *find_vmap_area(unsigned long addr); static inline bool is_vm_area_hugepages(const void *addr) { -- cgit From 546093206ba16623c18e344630dbfdd71a4327e0 Mon Sep 17 00:00:00 2001 From: Xiu Jianfeng Date: Sat, 11 Jun 2022 17:23:04 +0800 Subject: audit: make is_audit_feature_set() static Currently nobody use is_audit_feature_set() outside this file, so make it static. Signed-off-by: Xiu Jianfeng Signed-off-by: Paul Moore --- include/linux/audit.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/audit.h b/include/linux/audit.h index cece70231138..00f7a80f1a3e 100644 --- a/include/linux/audit.h +++ b/include/linux/audit.h @@ -119,8 +119,6 @@ enum audit_nfcfgop { AUDIT_NFT_OP_INVALID, }; -extern int is_audit_feature_set(int which); - extern int __init audit_register_class(int class, unsigned *list); extern int audit_classify_syscall(int abi, unsigned syscall); extern int audit_classify_arch(int arch); -- cgit From 795e10b450a88b2943a241a5eaa6e86ae4f47694 Mon Sep 17 00:00:00 2001 From: Yevgeny Kliteynik Date: Tue, 7 Jun 2022 15:47:43 +0300 Subject: net/mlx5: Introduce header-modify-pattern ICM properties Added new fields for device memory capabilities, in order to support creation of ICM memory for modify header patterns. Signed-off-by: Erez Shitrit Signed-off-by: Yevgeny Kliteynik Signed-off-by: Leon Romanovsky Acked-by: Saeed Mahameed Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index fd7d083a34d3..789d64400744 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1086,11 +1086,14 @@ struct mlx5_ifc_device_mem_cap_bits { u8 log_sw_icm_alloc_granularity[0x6]; u8 log_steering_sw_icm_size[0x8]; - u8 reserved_at_120[0x20]; + u8 reserved_at_120[0x18]; + u8 log_header_modify_pattern_sw_icm_size[0x8]; u8 header_modify_sw_icm_start_address[0x40]; - u8 reserved_at_180[0x80]; + u8 reserved_at_180[0x40]; + + u8 header_modify_pattern_sw_icm_start_address[0x40]; u8 memic_operations[0x20]; -- cgit From 667658364b2056f344a2769280b939a5e45610be Mon Sep 17 00:00:00 2001 From: Yevgeny Kliteynik Date: Tue, 7 Jun 2022 15:47:44 +0300 Subject: net/mlx5: Manage ICM of type modify-header pattern Added support for managing new type of ICM for devices that support sw_owner_v2. Signed-off-by: Erez Shitrit Signed-off-by: Yevgeny Kliteynik Signed-off-by: Leon Romanovsky Acked-by: Saeed Mahameed Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 5040cd774c5a..76d7661e3e63 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -676,6 +676,7 @@ struct mlx5e_resources { enum mlx5_sw_icm_type { MLX5_SW_ICM_TYPE_STEERING, MLX5_SW_ICM_TYPE_HEADER_MODIFY, + MLX5_SW_ICM_TYPE_HEADER_MODIFY_PATTERN, }; #define MLX5_MAX_RESERVED_GIDS 8 -- cgit From f5d23ee137e51b4e5cd5d263b144d5e6719f6e52 Mon Sep 17 00:00:00 2001 From: Jianbo Liu Date: Wed, 8 Jun 2022 13:04:47 -0700 Subject: net/mlx5: Add IFC bits and enums for flow meter Add/extend structure layouts and defines for flow meter. Signed-off-by: Jianbo Liu Reviewed-by: Ariel Levkovich Signed-off-by: Saeed Mahameed --- include/linux/mlx5/device.h | 1 + include/linux/mlx5/mlx5_ifc.h | 114 ++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 111 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 604b85dd770a..15ac02eeed4f 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -455,6 +455,7 @@ enum { MLX5_OPCODE_UMR = 0x25, + MLX5_OPCODE_ACCESS_ASO = 0x2d, }; enum { diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 789d64400744..91872afb2bfe 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -442,7 +442,9 @@ struct mlx5_ifc_flow_table_prop_layout_bits { u8 max_modify_header_actions[0x8]; u8 max_ft_level[0x8]; - u8 reserved_at_40[0x20]; + u8 reserved_at_40[0x6]; + u8 execute_aso[0x1]; + u8 reserved_at_47[0x19]; u8 reserved_at_60[0x2]; u8 reformat_insert[0x1]; @@ -940,7 +942,17 @@ struct mlx5_ifc_qos_cap_bits { u8 max_tsar_bw_share[0x20]; - u8 reserved_at_100[0x700]; + u8 reserved_at_100[0x20]; + + u8 reserved_at_120[0x3]; + u8 log_meter_aso_granularity[0x5]; + u8 reserved_at_128[0x3]; + u8 log_meter_aso_max_alloc[0x5]; + u8 reserved_at_130[0x3]; + u8 log_max_num_meter_aso[0x5]; + u8 reserved_at_138[0x8]; + + u8 reserved_at_140[0x6c0]; }; struct mlx5_ifc_debug_cap_bits { @@ -3280,6 +3292,7 @@ enum { MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH_2 = 0x800, MLX5_FLOW_CONTEXT_ACTION_IPSEC_DECRYPT = 0x1000, MLX5_FLOW_CONTEXT_ACTION_IPSEC_ENCRYPT = 0x2000, + MLX5_FLOW_CONTEXT_ACTION_EXECUTE_ASO = 0x4000, }; enum { @@ -3295,6 +3308,38 @@ struct mlx5_ifc_vlan_bits { u8 vid[0xc]; }; +enum { + MLX5_FLOW_METER_COLOR_RED = 0x0, + MLX5_FLOW_METER_COLOR_YELLOW = 0x1, + MLX5_FLOW_METER_COLOR_GREEN = 0x2, + MLX5_FLOW_METER_COLOR_UNDEFINED = 0x3, +}; + +enum { + MLX5_EXE_ASO_FLOW_METER = 0x2, +}; + +struct mlx5_ifc_exe_aso_ctrl_flow_meter_bits { + u8 return_reg_id[0x4]; + u8 aso_type[0x4]; + u8 reserved_at_8[0x14]; + u8 action[0x1]; + u8 init_color[0x2]; + u8 meter_id[0x1]; +}; + +union mlx5_ifc_exe_aso_ctrl { + struct mlx5_ifc_exe_aso_ctrl_flow_meter_bits exe_aso_ctrl_flow_meter; +}; + +struct mlx5_ifc_execute_aso_bits { + u8 valid[0x1]; + u8 reserved_at_1[0x7]; + u8 aso_object_id[0x18]; + + union mlx5_ifc_exe_aso_ctrl exe_aso_ctrl; +}; + struct mlx5_ifc_flow_context_bits { struct mlx5_ifc_vlan_bits push_vlan; @@ -3326,7 +3371,9 @@ struct mlx5_ifc_flow_context_bits { struct mlx5_ifc_fte_match_param_bits match_value; - u8 reserved_at_1200[0x600]; + struct mlx5_ifc_execute_aso_bits execute_aso[4]; + + u8 reserved_at_1300[0x500]; union mlx5_ifc_dest_format_struct_flow_counter_list_auto_bits destination[]; }; @@ -5973,7 +6020,9 @@ struct mlx5_ifc_general_obj_in_cmd_hdr_bits { u8 obj_id[0x20]; - u8 reserved_at_60[0x20]; + u8 reserved_at_60[0x3]; + u8 log_obj_range[0x5]; + u8 reserved_at_68[0x18]; }; struct mlx5_ifc_general_obj_out_cmd_hdr_bits { @@ -11373,12 +11422,14 @@ enum { MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_ENCRYPTION_KEY = BIT_ULL(0xc), MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_IPSEC = BIT_ULL(0x13), MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_SAMPLER = BIT_ULL(0x20), + MLX5_HCA_CAP_GENERAL_OBJECT_TYPES_FLOW_METER_ASO = BIT_ULL(0x24), }; enum { MLX5_GENERAL_OBJECT_TYPES_ENCRYPTION_KEY = 0xc, MLX5_GENERAL_OBJECT_TYPES_IPSEC = 0x13, MLX5_GENERAL_OBJECT_TYPES_SAMPLER = 0x20, + MLX5_GENERAL_OBJECT_TYPES_FLOW_METER_ASO = 0x24, }; enum { @@ -11451,6 +11502,61 @@ struct mlx5_ifc_create_encryption_key_in_bits { struct mlx5_ifc_encryption_key_obj_bits encryption_key_object; }; +enum { + MLX5_FLOW_METER_MODE_BYTES_IP_LENGTH = 0x0, + MLX5_FLOW_METER_MODE_BYTES_CALC_WITH_L2 = 0x1, + MLX5_FLOW_METER_MODE_BYTES_CALC_WITH_L2_IPG = 0x2, + MLX5_FLOW_METER_MODE_NUM_PACKETS = 0x3, +}; + +struct mlx5_ifc_flow_meter_parameters_bits { + u8 valid[0x1]; + u8 bucket_overflow[0x1]; + u8 start_color[0x2]; + u8 both_buckets_on_green[0x1]; + u8 reserved_at_5[0x1]; + u8 meter_mode[0x2]; + u8 reserved_at_8[0x18]; + + u8 reserved_at_20[0x20]; + + u8 reserved_at_40[0x3]; + u8 cbs_exponent[0x5]; + u8 cbs_mantissa[0x8]; + u8 reserved_at_50[0x3]; + u8 cir_exponent[0x5]; + u8 cir_mantissa[0x8]; + + u8 reserved_at_60[0x20]; + + u8 reserved_at_80[0x3]; + u8 ebs_exponent[0x5]; + u8 ebs_mantissa[0x8]; + u8 reserved_at_90[0x3]; + u8 eir_exponent[0x5]; + u8 eir_mantissa[0x8]; + + u8 reserved_at_a0[0x60]; +}; + +struct mlx5_ifc_flow_meter_aso_obj_bits { + u8 modify_field_select[0x40]; + + u8 reserved_at_40[0x40]; + + u8 reserved_at_80[0x8]; + u8 meter_aso_access_pd[0x18]; + + u8 reserved_at_a0[0x160]; + + struct mlx5_ifc_flow_meter_parameters_bits flow_meter_parameters[2]; +}; + +struct mlx5_ifc_create_flow_meter_aso_obj_in_bits { + struct mlx5_ifc_general_obj_in_cmd_hdr_bits hdr; + struct mlx5_ifc_flow_meter_aso_obj_bits flow_meter_aso_obj; +}; + struct mlx5_ifc_sampler_obj_bits { u8 modify_field_select[0x40]; -- cgit From 3e94e61bd44d90070dcda53b647fdc826097ef26 Mon Sep 17 00:00:00 2001 From: Saeed Mahameed Date: Wed, 8 Jun 2022 13:04:48 -0700 Subject: net/mlx5: Add HW definitions of vport debug counters total_q_under_processor_handle - number of queues in error state due to an async error or errored command. send_queue_priority_update_flow - number of QP/SQ priority/SL update events. cq_overrun - number of times CQ entered an error state due to an overflow. async_eq_overrun -number of time an EQ mapped to async events was overrun. comp_eq_overrun - number of time an EQ mapped to completion events was overrun. quota_exceeded_command - number of commands issued and failed due to quota exceeded. invalid_command - number of commands issued and failed dues to any reason other than quota exceeded. Signed-off-by: Saeed Mahameed Signed-off-by: Michael Guralnik Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 91872afb2bfe..f678ad88a7d5 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1441,7 +1441,8 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 reserved_at_120[0xa]; u8 log_max_ra_req_dc[0x6]; - u8 reserved_at_130[0xa]; + u8 reserved_at_130[0x9]; + u8 vnic_env_cq_overrun[0x1]; u8 log_max_ra_res_dc[0x6]; u8 reserved_at_140[0x5]; @@ -1636,7 +1637,11 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 nic_receive_steering_discard[0x1]; u8 receive_discard_vport_down[0x1]; u8 transmit_discard_vport_down[0x1]; - u8 reserved_at_343[0x5]; + u8 eq_overrun_count[0x1]; + u8 reserved_at_344[0x1]; + u8 invalid_command_count[0x1]; + u8 quota_exceeded_count[0x1]; + u8 reserved_at_347[0x1]; u8 log_max_flow_counter_bulk[0x8]; u8 max_flow_counter_15_0[0x10]; @@ -3441,11 +3446,21 @@ struct mlx5_ifc_vnic_diagnostic_statistics_bits { u8 transmit_discard_vport_down[0x40]; - u8 reserved_at_140[0xa0]; + u8 async_eq_overrun[0x20]; + + u8 comp_eq_overrun[0x20]; + + u8 reserved_at_180[0x20]; + + u8 invalid_command[0x20]; + + u8 quota_exceeded_command[0x20]; u8 internal_rq_out_of_buffer[0x20]; - u8 reserved_at_200[0xe00]; + u8 cq_overrun[0x20]; + + u8 reserved_at_220[0xde0]; }; struct mlx5_ifc_traffic_counter_bits { -- cgit From 91707779a481aab9c7f1d7a5ea3033ce87dc4fd6 Mon Sep 17 00:00:00 2001 From: Jianbo Liu Date: Wed, 8 Jun 2022 13:04:49 -0700 Subject: net/mlx5: Add support EXECUTE_ASO action for flow entry Attach flow meter to FTE with object id and index. Use metadata register C5 to store the packet color meter result. Signed-off-by: Jianbo Liu Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- include/linux/mlx5/fs.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 8135713b0d2d..ece3e35622d7 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -212,6 +212,19 @@ struct mlx5_flow_group * mlx5_create_flow_group(struct mlx5_flow_table *ft, u32 *in); void mlx5_destroy_flow_group(struct mlx5_flow_group *fg); +struct mlx5_exe_aso { + u32 object_id; + u8 type; + u8 return_reg_id; + union { + u32 ctrl_data; + struct { + u8 meter_idx; + u8 init_color; + } flow_meter; + }; +}; + struct mlx5_fs_vlan { u16 ethtype; u16 vid; @@ -237,6 +250,7 @@ struct mlx5_flow_act { struct mlx5_fs_vlan vlan[MLX5_FS_VLAN_DEPTH]; struct ib_counters *counters; struct mlx5_flow_group *fg; + struct mlx5_exe_aso exe_aso; }; #define MLX5_DECLARE_FLOW_ACT(name) \ -- cgit From d107ba1f7c067b08eb4b8ca7c51187fb7c4d97f2 Mon Sep 17 00:00:00 2001 From: Shay Drory Date: Wed, 8 Jun 2022 13:04:51 -0700 Subject: net/mlx5: Remove not used MLX5_CAP_BITS_RW_MASK Remove not used MLX5_CAP_BITS_RW_MASK. While at it, remove CAP_MASK, MLX5_CAP_OFF_CMDIF_CSUM and MLX5_DEV_CAP_FLAG_*, since MLX5_CAP_BITS_RW_MASK was their only user. Signed-off-by: Shay Drory Signed-off-by: Saeed Mahameed --- include/linux/mlx5/device.h | 19 ------------------- 1 file changed, 19 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 15ac02eeed4f..95a4fa0fd40a 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -386,21 +386,6 @@ enum { MLX5_PORT_CHANGE_SUBTYPE_CLIENT_REREG = 9, }; -enum { - MLX5_DEV_CAP_FLAG_XRC = 1LL << 3, - MLX5_DEV_CAP_FLAG_BAD_PKEY_CNTR = 1LL << 8, - MLX5_DEV_CAP_FLAG_BAD_QKEY_CNTR = 1LL << 9, - MLX5_DEV_CAP_FLAG_APM = 1LL << 17, - MLX5_DEV_CAP_FLAG_ATOMIC = 1LL << 18, - MLX5_DEV_CAP_FLAG_BLOCK_MCAST = 1LL << 23, - MLX5_DEV_CAP_FLAG_ON_DMND_PG = 1LL << 24, - MLX5_DEV_CAP_FLAG_CQ_MODER = 1LL << 29, - MLX5_DEV_CAP_FLAG_RESIZE_CQ = 1LL << 30, - MLX5_DEV_CAP_FLAG_DCT = 1LL << 37, - MLX5_DEV_CAP_FLAG_SIG_HAND_OVER = 1LL << 40, - MLX5_DEV_CAP_FLAG_CMDIF_CSUM = 3LL << 46, -}; - enum { MLX5_ROCE_VERSION_1 = 0, MLX5_ROCE_VERSION_2 = 2, @@ -496,10 +481,6 @@ enum { MLX5_MAX_PAGE_SHIFT = 31 }; -enum { - MLX5_CAP_OFF_CMDIF_CSUM = 46, -}; - enum { /* * Max wqe size for rdma read is 512 bytes, so this -- cgit From cdcdce948d64139aea1c6dfea4b04f5c8ad2784e Mon Sep 17 00:00:00 2001 From: Ofer Levi Date: Wed, 8 Jun 2022 13:04:52 -0700 Subject: net/mlx5: Add bits and fields to support enhanced CQE compression Expose ifc bits and add needed structure fields and methods to support enhanced CQE compression feature. The enhanced CQE compression feature improves cpu utiliziation with better packet latency from nic to host. Signed-off-by: Ofer Levi Signed-off-by: Saeed Mahameed --- include/linux/mlx5/device.h | 16 +++++++++++++++- include/linux/mlx5/mlx5_ifc.h | 7 +++++-- 2 files changed, 20 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 95a4fa0fd40a..b5f58fd37a0f 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -822,7 +822,10 @@ struct mlx5_cqe64 { __be32 timestamp_l; __be32 sop_drop_qpn; __be16 wqe_counter; - u8 signature; + union { + u8 signature; + u8 validity_iteration_count; + }; u8 op_own; }; @@ -854,6 +857,11 @@ enum { MLX5_CQE_FORMAT_CSUM_STRIDX = 0x3, }; +enum { + MLX5_CQE_COMPRESS_LAYOUT_BASIC = 0, + MLX5_CQE_COMPRESS_LAYOUT_ENHANCED = 1, +}; + #define MLX5_MINI_CQE_ARRAY_SIZE 8 static inline u8 mlx5_get_cqe_format(struct mlx5_cqe64 *cqe) @@ -866,6 +874,12 @@ static inline u8 get_cqe_opcode(struct mlx5_cqe64 *cqe) return cqe->op_own >> 4; } +static inline u8 get_cqe_enhanced_num_mini_cqes(struct mlx5_cqe64 *cqe) +{ + /* num_of_mini_cqes is zero based */ + return get_cqe_opcode(cqe) + 1; +} + static inline u8 get_cqe_lro_tcppsh(struct mlx5_cqe64 *cqe) { return (cqe->lro.tcppsh_abort_dupack >> 6) & 1; diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index f678ad88a7d5..8e87eb47f9dc 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1739,7 +1739,9 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 log_max_dci_errored_streams[0x5]; u8 reserved_at_598[0x8]; - u8 reserved_at_5a0[0x13]; + u8 reserved_at_5a0[0x10]; + u8 enhanced_cqe_compression[0x1]; + u8 reserved_at_5b1[0x2]; u8 log_max_dek[0x5]; u8 reserved_at_5b8[0x4]; u8 mini_cqe_resp_stride_index[0x1]; @@ -4139,7 +4141,8 @@ struct mlx5_ifc_cqc_bits { u8 cqe_comp_en[0x1]; u8 mini_cqe_res_format[0x2]; u8 st[0x4]; - u8 reserved_at_18[0x8]; + u8 reserved_at_18[0x6]; + u8 cqe_compression_layout[0x2]; u8 reserved_at_20[0x20]; -- cgit From 9822bb87cee12a9d1e3321b652ba537af1f174aa Mon Sep 17 00:00:00 2001 From: Jonathan Cameron Date: Sun, 20 Feb 2022 16:33:27 +0000 Subject: iio: core: drop iio_get_time_res() This function was introduced with the ability to pick a clock. There are no upstream users so presumably it isn't as obviously useful as it seemed at the time. Hence drop it. Signed-off-by: Jonathan Cameron Link: https://lore.kernel.org/r/20220220163327.424696-1-jic23@kernel.org --- include/linux/iio/iio.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h index 233d2e6b7721..f6ea2ed99457 100644 --- a/include/linux/iio/iio.h +++ b/include/linux/iio/iio.h @@ -313,7 +313,6 @@ static inline bool iio_channel_has_available(const struct iio_chan_spec *chan, } s64 iio_get_time_ns(const struct iio_dev *indio_dev); -unsigned int iio_get_time_res(const struct iio_dev *indio_dev); /* * Device operating modes -- cgit From 12c4efe3509b8018e76ea3ebda8227cb53bf5887 Mon Sep 17 00:00:00 2001 From: Jonathan Cameron Date: Sun, 8 May 2022 18:55:41 +0100 Subject: iio: core: Fix IIO_ALIGN and rename as it was not sufficiently large MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Discussion of the series: https://lore.kernel.org/all/20220405135758.774016-1-catalin.marinas@arm.com/ mm, arm64: Reduce ARCH_KMALLOC_MINALIGN brought to my attention that our current IIO usage of L1CACHE_ALIGN is insufficient as their are Arm platforms out their with non coherent DMA and larger cache lines at at higher levels of their cache hierarchy. Rename the define to make it's purpose more explicit. It will be used much more widely going forwards (to replace incorrect ____cacheline_aligned markings. Note this patch will greatly reduce the padding on some architectures that have smaller requirements for DMA safe buffers. The history of changing values of ARCH_KMALLOC_MINALIGN via ARCH_DMA_MINALIGN on arm64 is rather complex. I'm not tagging this as fixing a particular patch from that route as it's not clear what to tag. Most recently a change to bring them back inline was reverted because of some Qualcomm Kryo cores with an L2 cache with 128-byte lines sitting above the point of coherency. c1132702c71f Revert "arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES)" That reverts: 65688d2a05de arm64: cache: Lower ARCH_DMA_MINALIGN to 64 (L1_CACHE_BYTES) which refers to the change originally being motivated by Thunder x1 performance rather than correctness. Fixes: 6f7c8ee585e9d ("staging:iio: Add ability to allocate private data space to iio_allocate_device") Signed-off-by: Jonathan Cameron Acked-by: Nuno Sá Link: https://lore.kernel.org/r/20220508175712.647246-2-jic23@kernel.org --- include/linux/iio/iio.h | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h index f6ea2ed99457..4e21a82b3756 100644 --- a/include/linux/iio/iio.h +++ b/include/linux/iio/iio.h @@ -9,6 +9,7 @@ #include #include +#include #include #include /* IIO TODO LIST */ @@ -708,8 +709,13 @@ static inline void *iio_device_get_drvdata(const struct iio_dev *indio_dev) return dev_get_drvdata(&indio_dev->dev); } -/* Can we make this smaller? */ -#define IIO_ALIGN L1_CACHE_BYTES +/* + * Used to ensure the iio_priv() structure is aligned to allow that structure + * to in turn include IIO_DMA_MINALIGN'd elements such as buffers which + * must not share cachelines with the rest of the structure, thus making + * them safe for use with non-coherent DMA. + */ +#define IIO_DMA_MINALIGN ARCH_KMALLOC_MINALIGN struct iio_dev *iio_device_alloc(struct device *parent, int sizeof_priv); /* The information at the returned address is guaranteed to be cacheline aligned */ -- cgit From 6dbdc9f35360d4ef4704462c265f63b32fcb5354 Mon Sep 17 00:00:00 2001 From: Hongyi Lu Date: Mon, 13 Jun 2022 21:16:33 +0000 Subject: bpf: Fix spelling in bpf_verifier.h Minor spelling fix spotted in bpf_verifier.h. Spelling is no big deal, but it is still an improvement when reading through the code. Signed-off-by: Hongyi Lu Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20220613211633.58647-1-jwnhy0@gmail.com --- include/linux/bpf_verifier.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index e8439f6cbe57..3930c963fa67 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -299,7 +299,7 @@ struct bpf_verifier_state { * If is_state_visited() sees a state with branches > 0 it means * there is a loop. If such state is exactly equal to the current state * it's an infinite loop. Note states_equal() checks for states - * equvalency, so two states being 'states_equal' does not mean + * equivalency, so two states being 'states_equal' does not mean * infinite loop. The exact comparison is provided by * states_maybe_looping() function. It's a stronger pre-check and * much faster than states_equal(). -- cgit From 018ab4fabddd94f1c96f3b59e180691b9e88d5d8 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Tue, 14 Jun 2022 10:36:11 -0700 Subject: netfs: fix up netfs_inode_init() docbook comment Commit e81fb4198e27 ("netfs: Further cleanups after struct netfs_inode wrapper introduced") changed the argument types and names, and actually updated the comment too (although that was thanks to David Howells, not me: my original patch only changed the code). But the comment fixup didn't go quite far enough, and didn't change the argument name in the comment, resulting in include/linux/netfs.h:314: warning: Function parameter or member 'ctx' not described in 'netfs_inode_init' include/linux/netfs.h:314: warning: Excess function parameter 'inode' description in 'netfs_inode_init' during htmldoc generation. Fixes: e81fb4198e27 ("netfs: Further cleanups after struct netfs_inode wrapper introduced") Reported-by: Stephen Rothwell Signed-off-by: Linus Torvalds --- include/linux/netfs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 097cdd644665..1773e5df8e65 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -304,7 +304,7 @@ static inline struct netfs_inode *netfs_inode(struct inode *inode) /** * netfs_inode_init - Initialise a netfslib inode context - * @inode: The netfs inode to initialise + * @ctx: The netfs inode to initialise * @ops: The netfs's operations list * * Initialise the netfs library context struct. This is expected to follow on -- cgit From 064520e8aeaa2569f6504a50a37ac801b73656bc Mon Sep 17 00:00:00 2001 From: Bard Liao Date: Wed, 15 Jun 2022 16:43:48 +0800 Subject: ASoC: SOF: Intel: Add support for MeteorLake (MTL) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add platform abstraction for the Meteor Lake platform. This platform has significant differences compared to the TGL/ADL generation: it relies on new hardware using the code name 'ACE' and only supports the INTEL_IPC4 protocol and firmware architecture based on the Zephyr RTOS Co-developed-by: Ranjani Sridharan Signed-off-by: Ranjani Sridharan Signed-off-by: Bard Liao Reviewed-by: Péter Ujfalusi Reviewed-by: Rander Wang Link: https://lore.kernel.org/r/20220615084348.3489-3-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown --- include/linux/soundwire/sdw_intel.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index 67e0d3e750b5..b5b489ea1aef 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -9,6 +9,8 @@ #define SDW_SHIM_BASE 0x2C000 #define SDW_ALH_BASE 0x2C800 +#define SDW_SHIM_BASE_ACE 0x38000 +#define SDW_ALH_BASE_ACE 0x24000 #define SDW_LINK_BASE 0x30000 #define SDW_LINK_SIZE 0x10000 -- cgit From 6365a1935c5151455812e96d8de434c551dc0d98 Mon Sep 17 00:00:00 2001 From: Ma Wupeng Date: Tue, 14 Jun 2022 17:21:52 +0800 Subject: efi: Make code to find mirrored memory ranges generic Commit b05b9f5f9dcf ("x86, mirror: x86 enabling - find mirrored memory ranges") introduce the efi_find_mirror() function on x86. In order to reuse the API we make it public. Arm64 can support mirrored memory too, so function efi_find_mirror() is added to efi_init() to this support for arm64. Since efi_init() is shared by ARM, arm64 and riscv, this patch will bring mirror memory support for these architectures, but this support is only tested in arm64. Signed-off-by: Ma Wupeng Link: https://lore.kernel.org/r/20220614092156.1972846-2-mawupeng1@huawei.com [ardb: fix subject to better reflect the payload] Acked-by: Mike Rapoport Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index 7d9b0bb47eb3..53f64c14a525 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -872,6 +872,7 @@ static inline bool efi_rt_services_supported(unsigned int mask) { return (efi.runtime_supported_mask & mask) == mask; } +extern void efi_find_mirror(void); #else static inline bool efi_enabled(int feature) { @@ -889,6 +890,8 @@ static inline bool efi_rt_services_supported(unsigned int mask) { return false; } + +static inline void efi_find_mirror(void) {} #endif extern int efi_status_to_err(efi_status_t status); -- cgit From f67be8b7ee90c292948c3ec6395673963cccaee6 Mon Sep 17 00:00:00 2001 From: Li Chen Date: Sun, 22 May 2022 20:26:58 -0700 Subject: regmap: provide regmap_field helpers for simple bit operations We have set/clear/test operations for regmap, but not for regmap_field yet. So let's introduce regmap_field helpers too. In many instances regmap_field_update_bits() is used for simple bit setting and clearing. In these cases the last argument is redundant and we can hide it with a static inline function. This adds three new helpers for simple bit operations: set_bits, clear_bits and test_bits (the last one defined as a regular function). Signed-off-by: Li Chen Link: https://lore.kernel.org/r/180eef422c3.deae9cd960729.8518395646822099769@zohomail.com Signed-off-by: Mark Brown --- include/linux/regmap.h | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+) (limited to 'include/linux') diff --git a/include/linux/regmap.h b/include/linux/regmap.h index 8952fa3d0d59..d5b08f4f0dc0 100644 --- a/include/linux/regmap.h +++ b/include/linux/regmap.h @@ -1336,6 +1336,22 @@ static inline int regmap_field_update_bits(struct regmap_field *field, NULL, false, false); } +static inline int regmap_field_set_bits(struct regmap_field *field, + unsigned int bits) +{ + return regmap_field_update_bits_base(field, bits, bits, NULL, false, + false); +} + +static inline int regmap_field_clear_bits(struct regmap_field *field, + unsigned int bits) +{ + return regmap_field_update_bits_base(field, bits, 0, NULL, false, + false); +} + +int regmap_field_test_bits(struct regmap_field *field, unsigned int bits); + static inline int regmap_field_force_update_bits(struct regmap_field *field, unsigned int mask, unsigned int val) @@ -1769,6 +1785,27 @@ regmap_field_force_update_bits(struct regmap_field *field, return -EINVAL; } +static inline int regmap_field_set_bits(struct regmap_field *field, + unsigned int bits) +{ + WARN_ONCE(1, "regmap API is disabled"); + return -EINVAL; +} + +static inline int regmap_field_clear_bits(struct regmap_field *field, + unsigned int bits) +{ + WARN_ONCE(1, "regmap API is disabled"); + return -EINVAL; +} + +static inline int regmap_field_test_bits(struct regmap_field *field, + unsigned int bits) +{ + WARN_ONCE(1, "regmap API is disabled"); + return -EINVAL; +} + static inline int regmap_fields_write(struct regmap_field *field, unsigned int id, unsigned int val) { -- cgit From 003cbe046171596809c2f37dc07e69df1b4d9f95 Mon Sep 17 00:00:00 2001 From: Basavaraj Natikar Date: Wed, 1 Jun 2022 20:58:55 +0530 Subject: pinctrl: Add pingroup and define PINCTRL_PINGROUP Add 'struct pingroup' to represent pingroup and 'PINCTRL_PINGROUP' macro for inline use. Both are used to manage and represent larger number of pingroups. Signed-off-by: Basavaraj Natikar Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220601152900.1012813-2-Basavaraj.Natikar@amd.com Signed-off-by: Linus Walleij --- include/linux/pinctrl/pinctrl.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pinctrl/pinctrl.h b/include/linux/pinctrl/pinctrl.h index 70b45d28e7a9..487117ccb1bc 100644 --- a/include/linux/pinctrl/pinctrl.h +++ b/include/linux/pinctrl/pinctrl.h @@ -26,6 +26,26 @@ struct pin_config_item; struct gpio_chip; struct device_node; +/** + * struct pingroup - provides information on pingroup + * @name: a name for pingroup + * @pins: an array of pins in the pingroup + * @npins: number of pins in the pingroup + */ +struct pingroup { + const char *name; + const unsigned int *pins; + size_t npins; +}; + +/* Convenience macro to define a single named or anonymous pingroup */ +#define PINCTRL_PINGROUP(_name, _pins, _npins) \ +(struct pingroup){ \ + .name = _name, \ + .pins = _pins, \ + .npins = _npins, \ +} + /** * struct pinctrl_pin_desc - boards/machines provide information on their * pins, pads or other muxable units in this struct -- cgit From b87f02307d3cfbda768520f0687c51ca77e14fc3 Mon Sep 17 00:00:00 2001 From: Petr Mladek Date: Wed, 15 Jun 2022 18:28:05 +0200 Subject: printk: Wait for the global console lock when the system is going down There are reports that the console kthreads block the global console lock when the system is going down, for example, reboot, panic. First part of the solution was to block kthreads in these problematic system states so they stopped handling newly added messages. Second part of the solution is to wait when for the kthreads when they are actively printing. It solves the problem when a message was printed before the system entered the problematic state and the kthreads managed to step in. A busy waiting has to be used because panic() can be called in any context and in an unknown state of the scheduler. There must be a timeout because the kthread might get stuck or sleeping and never release the lock. The timeout 10s is an arbitrary value inspired by the softlockup timeout. Link: https://lore.kernel.org/r/20220610205038.GA3050413@paulmck-ThinkPad-P17-Gen-1 Link: https://lore.kernel.org/r/CAMdYzYpF4FNTBPZsEFeWRuEwSies36QM_As8osPWZSr2q-viEA@mail.gmail.com Signed-off-by: Petr Mladek Tested-by: Paul E. McKenney Link: https://lore.kernel.org/r/20220615162805.27962-3-pmladek@suse.com --- include/linux/printk.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/printk.h b/include/linux/printk.h index cd26aab0ab2a..c1e07c0652c7 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -174,6 +174,7 @@ extern void printk_prefer_direct_enter(void); extern void printk_prefer_direct_exit(void); extern bool pr_flush(int timeout_ms, bool reset_on_progress); +extern void try_block_console_kthreads(int timeout_ms); /* * Please don't use printk_ratelimit(), because it shares ratelimiting state @@ -238,6 +239,10 @@ static inline bool pr_flush(int timeout_ms, bool reset_on_progress) return true; } +static inline void try_block_console_kthreads(int timeout_ms) +{ +} + static inline int printk_ratelimit(void) { return 0; -- cgit From 10f09307199da274584b8170a41228ca6dfed6d3 Mon Sep 17 00:00:00 2001 From: Nuno Sá Date: Fri, 10 Jun 2022 10:45:30 +0200 Subject: iio: core: drop of.h from iio.h MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There is no reason to include OF as we only need to forward declare 'of_phandle_args'. Previously, some drivers were actually relying on this for some headers (those were already fixed). Signed-off-by: Nuno Sá Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220610084545.547700-20-nuno.sa@analog.com Signed-off-by: Jonathan Cameron --- include/linux/iio/iio.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h index 4e21a82b3756..d9b4a9ca9a0f 100644 --- a/include/linux/iio/iio.h +++ b/include/linux/iio/iio.h @@ -11,13 +11,14 @@ #include #include #include -#include /* IIO TODO LIST */ /* * Provide means of adjusting timer accuracy. * Currently assumes nano seconds. */ +struct of_phandle_args; + enum iio_shared_by { IIO_SEPARATE, IIO_SHARED_BY_TYPE, -- cgit From af89cd45603483135bdd238fcb3fa871155a0ae1 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Fri, 20 May 2022 09:57:34 +0200 Subject: clk: Improve documentation for devm_clk_get() and its optional variant MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Make use of "Context:" and "Return:". Mention that the clk is not to be expected to be prepared, previously only not being enabled was mentioned which probably dates from the times when the concept of clk preparation wasn't invented yet. Also describe devm_clk_get_optional() fully instead of just referencing devm_clk_get(). Signed-off-by: Uwe Kleine-König Link: https://lore.kernel.org/r/20220520075737.758761-2-u.kleine-koenig@pengutronix.de Reviewed-by: Russell King (Oracle) Signed-off-by: Stephen Boyd --- include/linux/clk.h | 25 +++++++++++++++++++------ 1 file changed, 19 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/clk.h b/include/linux/clk.h index 39faa54efe88..c8fc398d2ad7 100644 --- a/include/linux/clk.h +++ b/include/linux/clk.h @@ -443,15 +443,16 @@ int __must_check devm_clk_bulk_get_all(struct device *dev, * @dev: device for clock "consumer" * @id: clock consumer ID * - * Returns a struct clk corresponding to the clock producer, or + * Context: May sleep. + * + * Return: a struct clk corresponding to the clock producer, or * valid IS_ERR() condition containing errno. The implementation * uses @dev and @id to determine the clock consumer, and thereby * the clock producer. (IOW, @id may be identical strings, but * clk_get may return different clock producers depending on @dev.) * - * Drivers must assume that the clock source is not enabled. - * - * devm_clk_get should not be called from within interrupt context. + * Drivers must assume that the clock source is neither prepared nor + * enabled. * * The clock will automatically be freed when the device is unbound * from the bus. @@ -464,8 +465,20 @@ struct clk *devm_clk_get(struct device *dev, const char *id); * @dev: device for clock "consumer" * @id: clock consumer ID * - * Behaves the same as devm_clk_get() except where there is no clock producer. - * In this case, instead of returning -ENOENT, the function returns NULL. + * Context: May sleep. + * + * Return: a struct clk corresponding to the clock producer, or + * valid IS_ERR() condition containing errno. The implementation + * uses @dev and @id to determine the clock consumer, and thereby + * the clock producer. If no such clk is found, it returns NULL + * which serves as a dummy clk. That's the only difference compared + * to devm_clk_get(). + * + * Drivers must assume that the clock source is neither prepared nor + * enabled. + * + * The clock will automatically be freed when the device is unbound + * from the bus. */ struct clk *devm_clk_get_optional(struct device *dev, const char *id); -- cgit From 7ef9651e9792b08eb310c6beb202cbc947f43cab Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Fri, 20 May 2022 09:57:36 +0200 Subject: clk: Provide new devm_clk helpers for prepared and enabled clocks MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When a driver keeps a clock prepared (or enabled) during the whole lifetime of the driver, these helpers allow to simplify the drivers. Reviewed-by: Jonathan Cameron Reviewed-by: Alexandru Ardelean Signed-off-by: Uwe Kleine-König Link: https://lore.kernel.org/r/20220520075737.758761-4-u.kleine-koenig@pengutronix.de Signed-off-by: Stephen Boyd --- include/linux/clk.h | 109 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 109 insertions(+) (limited to 'include/linux') diff --git a/include/linux/clk.h b/include/linux/clk.h index c8fc398d2ad7..c13061cabdfc 100644 --- a/include/linux/clk.h +++ b/include/linux/clk.h @@ -459,6 +459,47 @@ int __must_check devm_clk_bulk_get_all(struct device *dev, */ struct clk *devm_clk_get(struct device *dev, const char *id); +/** + * devm_clk_get_prepared - devm_clk_get() + clk_prepare() + * @dev: device for clock "consumer" + * @id: clock consumer ID + * + * Context: May sleep. + * + * Return: a struct clk corresponding to the clock producer, or + * valid IS_ERR() condition containing errno. The implementation + * uses @dev and @id to determine the clock consumer, and thereby + * the clock producer. (IOW, @id may be identical strings, but + * clk_get may return different clock producers depending on @dev.) + * + * The returned clk (if valid) is prepared. Drivers must however assume + * that the clock is not enabled. + * + * The clock will automatically be unprepared and freed when the device + * is unbound from the bus. + */ +struct clk *devm_clk_get_prepared(struct device *dev, const char *id); + +/** + * devm_clk_get_enabled - devm_clk_get() + clk_prepare_enable() + * @dev: device for clock "consumer" + * @id: clock consumer ID + * + * Context: May sleep. + * + * Return: a struct clk corresponding to the clock producer, or + * valid IS_ERR() condition containing errno. The implementation + * uses @dev and @id to determine the clock consumer, and thereby + * the clock producer. (IOW, @id may be identical strings, but + * clk_get may return different clock producers depending on @dev.) + * + * The returned clk (if valid) is prepared and enabled. + * + * The clock will automatically be disabled, unprepared and freed + * when the device is unbound from the bus. + */ +struct clk *devm_clk_get_enabled(struct device *dev, const char *id); + /** * devm_clk_get_optional - lookup and obtain a managed reference to an optional * clock producer. @@ -482,6 +523,50 @@ struct clk *devm_clk_get(struct device *dev, const char *id); */ struct clk *devm_clk_get_optional(struct device *dev, const char *id); +/** + * devm_clk_get_optional_prepared - devm_clk_get_optional() + clk_prepare() + * @dev: device for clock "consumer" + * @id: clock consumer ID + * + * Context: May sleep. + * + * Return: a struct clk corresponding to the clock producer, or + * valid IS_ERR() condition containing errno. The implementation + * uses @dev and @id to determine the clock consumer, and thereby + * the clock producer. If no such clk is found, it returns NULL + * which serves as a dummy clk. That's the only difference compared + * to devm_clk_get_prepared(). + * + * The returned clk (if valid) is prepared. Drivers must however + * assume that the clock is not enabled. + * + * The clock will automatically be unprepared and freed when the + * device is unbound from the bus. + */ +struct clk *devm_clk_get_optional_prepared(struct device *dev, const char *id); + +/** + * devm_clk_get_optional_enabled - devm_clk_get_optional() + + * clk_prepare_enable() + * @dev: device for clock "consumer" + * @id: clock consumer ID + * + * Context: May sleep. + * + * Return: a struct clk corresponding to the clock producer, or + * valid IS_ERR() condition containing errno. The implementation + * uses @dev and @id to determine the clock consumer, and thereby + * the clock producer. If no such clk is found, it returns NULL + * which serves as a dummy clk. That's the only difference compared + * to devm_clk_get_enabled(). + * + * The returned clk (if valid) is prepared and enabled. + * + * The clock will automatically be disabled, unprepared and freed + * when the device is unbound from the bus. + */ +struct clk *devm_clk_get_optional_enabled(struct device *dev, const char *id); + /** * devm_get_clk_from_child - lookup and obtain a managed reference to a * clock producer from child node. @@ -826,12 +911,36 @@ static inline struct clk *devm_clk_get(struct device *dev, const char *id) return NULL; } +static inline struct clk *devm_clk_get_prepared(struct device *dev, + const char *id) +{ + return NULL; +} + +static inline struct clk *devm_clk_get_enabled(struct device *dev, + const char *id) +{ + return NULL; +} + static inline struct clk *devm_clk_get_optional(struct device *dev, const char *id) { return NULL; } +static inline struct clk *devm_clk_get_optional_prepared(struct device *dev, + const char *id) +{ + return NULL; +} + +static inline struct clk *devm_clk_get_optional_enabled(struct device *dev, + const char *id) +{ + return NULL; +} + static inline int __must_check devm_clk_bulk_get(struct device *dev, int num_clks, struct clk_bulk_data *clks) { -- cgit From 4bca7e80b6455772b4bf3f536dcbc19aac424d6a Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 15 Jun 2022 15:22:29 +0200 Subject: init: Initialize noop_backing_dev_info early noop_backing_dev_info is used by superblocks of various pseudofilesystems such as kdevtmpfs. After commit 10e14073107d ("writeback: Fix inode->i_io_list not be protected by inode->i_lock error") this broke because __mark_inode_dirty() started to access more fields from noop_backing_dev_info and this led to crashes inside locked_inode_to_wb_and_lock_list() called from __mark_inode_dirty(). Fix the problem by initializing noop_backing_dev_info before the filesystems get mounted. Fixes: 10e14073107d ("writeback: Fix inode->i_io_list not be protected by inode->i_lock error") Reported-and-tested-by: Suzuki K Poulose Reported-and-tested-by: Alexandru Elisei Reported-and-tested-by: Guenter Roeck Reviewed-by: Christoph Hellwig Signed-off-by: Jan Kara --- include/linux/backing-dev.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index 2bd073fa6bb5..d452071db572 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -119,6 +119,8 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio); extern struct backing_dev_info noop_backing_dev_info; +int bdi_init(struct backing_dev_info *bdi); + /** * writeback_in_progress - determine whether there is writeback in progress * @wb: bdi_writeback of interest -- cgit From 5a0e4529d9aee8ce348f628ad476c9ddb6cf457d Mon Sep 17 00:00:00 2001 From: Frank Li Date: Tue, 24 May 2022 10:21:52 -0500 Subject: dmaengine: dw-edma: Remove unused irq field in struct dw_edma_chip The "irq" field of struct dw_edma_chip was never used. Remove it. Link: https://lore.kernel.org/r/20220524152159.2370739-2-Frank.Li@nxp.com Tested-by: Serge Semin Tested-by: Manivannan Sadhasivam Signed-off-by: Frank Li Signed-off-by: Bjorn Helgaas Reviewed-by: Serge Semin Reviewed-by: Manivannan Sadhasivam Acked-By: Vinod Koul --- include/linux/dma/edma.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h index cab6e18773da..d4333e721588 100644 --- a/include/linux/dma/edma.h +++ b/include/linux/dma/edma.h @@ -18,13 +18,11 @@ struct dw_edma; * struct dw_edma_chip - representation of DesignWare eDMA controller hardware * @dev: struct device of the eDMA controller * @id: instance ID - * @irq: irq line * @dw: struct dw_edma that is filed by dw_edma_probe() */ struct dw_edma_chip { struct device *dev; int id; - int irq; struct dw_edma *dw; }; -- cgit From 07fd5b6cdf3cc30bfde8fe0f644771688be04447 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 13 Jun 2022 12:19:50 -1000 Subject: cgroup: Use separate src/dst nodes when preloading css_sets for migration Each cset (css_set) is pinned by its tasks. When we're moving tasks around across csets for a migration, we need to hold the source and destination csets to ensure that they don't go away while we're moving tasks about. This is done by linking cset->mg_preload_node on either the mgctx->preloaded_src_csets or mgctx->preloaded_dst_csets list. Using the same cset->mg_preload_node for both the src and dst lists was deemed okay as a cset can't be both the source and destination at the same time. Unfortunately, this overloading becomes problematic when multiple tasks are involved in a migration and some of them are identity noop migrations while others are actually moving across cgroups. For example, this can happen with the following sequence on cgroup1: #1> mkdir -p /sys/fs/cgroup/misc/a/b #2> echo $$ > /sys/fs/cgroup/misc/a/cgroup.procs #3> RUN_A_COMMAND_WHICH_CREATES_MULTIPLE_THREADS & #4> PID=$! #5> echo $PID > /sys/fs/cgroup/misc/a/b/tasks #6> echo $PID > /sys/fs/cgroup/misc/a/cgroup.procs the process including the group leader back into a. In this final migration, non-leader threads would be doing identity migration while the group leader is doing an actual one. After #3, let's say the whole process was in cset A, and that after #4, the leader moves to cset B. Then, during #6, the following happens: 1. cgroup_migrate_add_src() is called on B for the leader. 2. cgroup_migrate_add_src() is called on A for the other threads. 3. cgroup_migrate_prepare_dst() is called. It scans the src list. 4. It notices that B wants to migrate to A, so it tries to A to the dst list but realizes that its ->mg_preload_node is already busy. 5. and then it notices A wants to migrate to A as it's an identity migration, it culls it by list_del_init()'ing its ->mg_preload_node and putting references accordingly. 6. The rest of migration takes place with B on the src list but nothing on the dst list. This means that A isn't held while migration is in progress. If all tasks leave A before the migration finishes and the incoming task pins it, the cset will be destroyed leading to use-after-free. This is caused by overloading cset->mg_preload_node for both src and dst preload lists. We wanted to exclude the cset from the src list but ended up inadvertently excluding it from the dst list too. This patch fixes the issue by separating out cset->mg_preload_node into ->mg_src_preload_node and ->mg_dst_preload_node, so that the src and dst preloadings don't interfere with each other. Signed-off-by: Tejun Heo Reported-by: Mukesh Ojha Reported-by: shisiyuan Link: http://lkml.kernel.org/r/1654187688-27411-1-git-send-email-shisiyuan@xiaomi.com Link: https://www.spinics.net/lists/cgroups/msg33313.html Fixes: f817de98513d ("cgroup: prepare migration path for unified hierarchy") Cc: stable@vger.kernel.org # v3.16+ --- include/linux/cgroup-defs.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 1bfcfb1af352..d4427d0a0e18 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -264,7 +264,8 @@ struct css_set { * List of csets participating in the on-going migration either as * source or destination. Protected by cgroup_mutex. */ - struct list_head mg_preload_node; + struct list_head mg_src_preload_node; + struct list_head mg_dst_preload_node; struct list_head mg_node; /* -- cgit From 4d337cebcb1c27d9b48c48b9a98e939d4552d584 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Thu, 16 Jun 2022 09:44:00 +0800 Subject: blk-mq: avoid to touch q->elevator without any protection q->elevator is referred in blk_mq_has_sqsched() without any protection, no .q_usage_counter is held, no queue srcu and rcu read lock is held, so potential use-after-free may be triggered. Fix the issue by adding one queue flag for checking if the elevator uses single queue style dispatch. Meantime the elevator feature flag of ELEVATOR_F_MQ_AWARE isn't needed any more. Cc: Jan Kara Signed-off-by: Ming Lei Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220616014401.817001-3-ming.lei@redhat.com Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 608d577734c2..bb6e3c31b3b7 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -575,6 +575,7 @@ struct request_queue { #define QUEUE_FLAG_RQ_ALLOC_TIME 27 /* record rq->alloc_time_ns */ #define QUEUE_FLAG_HCTX_ACTIVE 28 /* at least one blk-mq hctx is active */ #define QUEUE_FLAG_NOWAIT 29 /* device supports NOWAIT */ +#define QUEUE_FLAG_SQ_SCHED 30 /* single queue style io dispatch */ #define QUEUE_FLAG_MQ_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \ (1 << QUEUE_FLAG_SAME_COMP) | \ @@ -616,6 +617,7 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q); #define blk_queue_pm_only(q) atomic_read(&(q)->pm_only) #define blk_queue_registered(q) test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags) #define blk_queue_nowait(q) test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags) +#define blk_queue_sq_sched(q) test_bit(QUEUE_FLAG_SQ_SCHED, &(q)->queue_flags) extern void blk_set_pm_only(struct request_queue *q); extern void blk_clear_pm_only(struct request_queue *q); @@ -1006,8 +1008,6 @@ void disk_set_independent_access_ranges(struct gendisk *disk, */ /* Supports zoned block devices sequential write constraint */ #define ELEVATOR_F_ZBD_SEQ_WRITE (1U << 0) -/* Supports scheduling on multiple hardware queues */ -#define ELEVATOR_F_MQ_AWARE (1U << 1) extern void blk_queue_required_elevator_features(struct request_queue *q, unsigned int features); -- cgit From e4a8864f74e9e9e4a7eb93952a4cfa35c165c930 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Fri, 10 Jun 2022 16:21:30 -0700 Subject: iosys-map: Fix typo in documentation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit It's one argument, vaddr_iomem, not 2 (vaddr and _iomem). Signed-off-by: Lucas De Marchi Reviewed-by: Christian König Link: https://patchwork.freedesktop.org/patch/msgid/20220610232130.2865479-3-lucas.demarchi@intel.com --- include/linux/iosys-map.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h index e69a002d5aa4..4b8406ee8bc4 100644 --- a/include/linux/iosys-map.h +++ b/include/linux/iosys-map.h @@ -23,7 +23,7 @@ * memcpy(vaddr, src, len); * * void *vaddr_iomem = ...; // pointer to I/O memory - * memcpy_toio(vaddr, _iomem, src, len); + * memcpy_toio(vaddr_iomem, src, len); * * The user of such pointer may not have information about the mapping of that * region or may want to have a single code path to handle operations on that -- cgit From 034e5afad921f1c08c001bf147fb1ba76ae33498 Mon Sep 17 00:00:00 2001 From: Alex Williamson Date: Fri, 10 Jun 2022 16:35:13 -0600 Subject: mm: re-allow pinning of zero pfns The commit referenced below subtly and inadvertently changed the logic to disallow pinning of zero pfns. This breaks device assignment with vfio and potentially various other users of gup. Exclude the zero page test from the negation. Link: https://lkml.kernel.org/r/165490039431.944052.12458624139225785964.stgit@omen Fixes: 1c563432588d ("mm: fix is_pinnable_page against a cma page") Signed-off-by: Alex Williamson Acked-by: Minchan Kim Acked-by: David Hildenbrand Reported-by: Yishai Hadas Cc: Paul E. McKenney Cc: John Hubbard Cc: John Dias Cc: Jason Gunthorpe Cc: Zhangfei Gao Cc: Matthew Wilcox Cc: Joao Martins Cc: Yi Liu Signed-off-by: Andrew Morton --- include/linux/mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index bc8f326be0ce..781fae17177d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1600,7 +1600,7 @@ static inline bool is_pinnable_page(struct page *page) if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) return false; #endif - return !(is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page))); + return !is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page)); } #else static inline bool is_pinnable_page(struct page *page) -- cgit From 67f22ba7750f940bcd7e1b12720896c505c2d63f Mon Sep 17 00:00:00 2001 From: zhenwei pi Date: Wed, 15 Jun 2022 17:32:09 +0800 Subject: mm/memory-failure: disable unpoison once hw error happens Currently unpoison_memory(unsigned long pfn) is designed for soft poison(hwpoison-inject) only. Since 17fae1294ad9d, the KPTE gets cleared on a x86 platform once hardware memory corrupts. Unpoisoning a hardware corrupted page puts page back buddy only, the kernel has a chance to access the page with *NOT PRESENT* KPTE. This leads BUG during accessing on the corrupted KPTE. Suggested by David&Naoya, disable unpoison mechanism when a real HW error happens to avoid BUG like this: Unpoison: Software-unpoisoned page 0x61234 BUG: unable to handle page fault for address: ffff888061234000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 2c01067 P4D 2c01067 PUD 107267063 PMD 10382b063 PTE 800fffff9edcb062 Oops: 0002 [#1] PREEMPT SMP NOPTI CPU: 4 PID: 26551 Comm: stress Kdump: loaded Tainted: G M OE 5.18.0.bm.1-amd64 #7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ... RIP: 0010:clear_page_erms+0x7/0x10 Code: ... RSP: 0000:ffffc90001107bc8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000001000 RDX: ffffea0001848d00 RSI: ffffea0001848d40 RDI: ffff888061234000 RBP: ffffea0001848d00 R08: 0000000000000901 R09: 0000000000001276 R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000001 R13: 0000000000000000 R14: 0000000000140dca R15: 0000000000000001 FS: 00007fd8b2333740(0000) GS:ffff88813fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff888061234000 CR3: 00000001023d2005 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: prep_new_page+0x151/0x170 get_page_from_freelist+0xca0/0xe20 ? sysvec_apic_timer_interrupt+0xab/0xc0 ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 __alloc_pages+0x17e/0x340 __folio_alloc+0x17/0x40 vma_alloc_folio+0x84/0x280 __handle_mm_fault+0x8d4/0xeb0 handle_mm_fault+0xd5/0x2a0 do_user_addr_fault+0x1d0/0x680 ? kvm_read_and_reset_apf_flags+0x3b/0x50 exc_page_fault+0x78/0x170 asm_exc_page_fault+0x27/0x30 Link: https://lkml.kernel.org/r/20220615093209.259374-2-pizhenwei@bytedance.com Fixes: 847ce401df392 ("HWPOISON: Add unpoisoning support") Fixes: 17fae1294ad9d ("x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned") Signed-off-by: zhenwei pi Acked-by: David Hildenbrand Acked-by: Naoya Horiguchi Reviewed-by: Miaohe Lin Reviewed-by: Oscar Salvador Cc: Greg Kroah-Hartman Cc: [5.8+] Signed-off-by: Andrew Morton --- include/linux/mm.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 781fae17177d..cf3d0d673f6b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3232,6 +3232,7 @@ enum mf_flags { MF_MUST_KILL = 1 << 2, MF_SOFT_OFFLINE = 1 << 3, MF_UNPOISON = 1 << 4, + MF_SW_SIMULATED = 1 << 5, }; extern int memory_failure(unsigned long pfn, int flags); extern void memory_failure_queue(unsigned long pfn, int flags); -- cgit From d687f621c518d791b5fffde8add3112d869b0b1b Mon Sep 17 00:00:00 2001 From: Delyan Kratunov Date: Tue, 14 Jun 2022 23:10:42 +0000 Subject: bpf: move bpf_prog to bpf.h In order to add a version of bpf_prog_run_array which accesses the bpf_prog->aux member, bpf_prog needs to be more than a forward declaration inside bpf.h. Given that filter.h already includes bpf.h, this merely reorders the type declarations for filter.h users. bpf.h users now have access to bpf_prog internals. Signed-off-by: Delyan Kratunov Link: https://lore.kernel.org/r/3ed7824e3948f22d84583649ccac0ff0d38b6b58.1655248076.git.delyank@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 36 ++++++++++++++++++++++++++++++++++++ include/linux/filter.h | 34 ---------------------------------- 2 files changed, 36 insertions(+), 34 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 8e6092d0ea95..69106ae46464 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -5,6 +5,7 @@ #define _LINUX_BPF_H 1 #include +#include #include #include @@ -22,6 +23,7 @@ #include #include #include +#include #include #include @@ -1084,6 +1086,40 @@ struct bpf_prog_aux { }; }; +struct bpf_prog { + u16 pages; /* Number of allocated pages */ + u16 jited:1, /* Is our filter JIT'ed? */ + jit_requested:1,/* archs need to JIT the prog */ + gpl_compatible:1, /* Is filter GPL compatible? */ + cb_access:1, /* Is control block accessed? */ + dst_needed:1, /* Do we need dst entry? */ + blinding_requested:1, /* needs constant blinding */ + blinded:1, /* Was blinded */ + is_func:1, /* program is a bpf function */ + kprobe_override:1, /* Do we override a kprobe? */ + has_callchain_buf:1, /* callchain buffer allocated? */ + enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */ + call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */ + call_get_func_ip:1, /* Do we call get_func_ip() */ + tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */ + enum bpf_prog_type type; /* Type of BPF program */ + enum bpf_attach_type expected_attach_type; /* For some prog types */ + u32 len; /* Number of filter blocks */ + u32 jited_len; /* Size of jited insns in bytes */ + u8 tag[BPF_TAG_SIZE]; + struct bpf_prog_stats __percpu *stats; + int __percpu *active; + unsigned int (*bpf_func)(const void *ctx, + const struct bpf_insn *insn); + struct bpf_prog_aux *aux; /* Auxiliary fields */ + struct sock_fprog_kern *orig_prog; /* Original BPF program */ + /* Instructions for interpreter */ + union { + DECLARE_FLEX_ARRAY(struct sock_filter, insns); + DECLARE_FLEX_ARRAY(struct bpf_insn, insnsi); + }; +}; + struct bpf_array_aux { /* Programs with direct jumps into programs part of this array. */ struct list_head poke_progs; diff --git a/include/linux/filter.h b/include/linux/filter.h index ed0c0ff42ad5..d0cbb31b1b4d 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -559,40 +559,6 @@ struct bpf_prog_stats { struct u64_stats_sync syncp; } __aligned(2 * sizeof(u64)); -struct bpf_prog { - u16 pages; /* Number of allocated pages */ - u16 jited:1, /* Is our filter JIT'ed? */ - jit_requested:1,/* archs need to JIT the prog */ - gpl_compatible:1, /* Is filter GPL compatible? */ - cb_access:1, /* Is control block accessed? */ - dst_needed:1, /* Do we need dst entry? */ - blinding_requested:1, /* needs constant blinding */ - blinded:1, /* Was blinded */ - is_func:1, /* program is a bpf function */ - kprobe_override:1, /* Do we override a kprobe? */ - has_callchain_buf:1, /* callchain buffer allocated? */ - enforce_expected_attach_type:1, /* Enforce expected_attach_type checking at attach time */ - call_get_stack:1, /* Do we call bpf_get_stack() or bpf_get_stackid() */ - call_get_func_ip:1, /* Do we call get_func_ip() */ - tstamp_type_access:1; /* Accessed __sk_buff->tstamp_type */ - enum bpf_prog_type type; /* Type of BPF program */ - enum bpf_attach_type expected_attach_type; /* For some prog types */ - u32 len; /* Number of filter blocks */ - u32 jited_len; /* Size of jited insns in bytes */ - u8 tag[BPF_TAG_SIZE]; - struct bpf_prog_stats __percpu *stats; - int __percpu *active; - unsigned int (*bpf_func)(const void *ctx, - const struct bpf_insn *insn); - struct bpf_prog_aux *aux; /* Auxiliary fields */ - struct sock_fprog_kern *orig_prog; /* Original BPF program */ - /* Instructions for interpreter */ - union { - DECLARE_FLEX_ARRAY(struct sock_filter, insns); - DECLARE_FLEX_ARRAY(struct bpf_insn, insnsi); - }; -}; - struct sk_filter { refcount_t refcnt; struct rcu_head rcu; -- cgit From 8c7dcb84e3b744b2b70baa7a44a9b1881c33a9c9 Mon Sep 17 00:00:00 2001 From: Delyan Kratunov Date: Tue, 14 Jun 2022 23:10:46 +0000 Subject: bpf: implement sleepable uprobes by chaining gps uprobes work by raising a trap, setting a task flag from within the interrupt handler, and processing the actual work for the uprobe on the way back to userspace. As a result, uprobe handlers already execute in a might_fault/_sleep context. The primary obstacle to sleepable bpf uprobe programs is therefore on the bpf side. Namely, the bpf_prog_array attached to the uprobe is protected by normal rcu. In order for uprobe bpf programs to become sleepable, it has to be protected by the tasks_trace rcu flavor instead (and kfree() called after a corresponding grace period). Therefore, the free path for bpf_prog_array now chains a tasks_trace and normal grace periods one after the other. Users who iterate under tasks_trace read section would be safe, as would users who iterate under normal read sections (from non-sleepable locations). The downside is that the tasks_trace latency affects all perf_event-attached bpf programs (and not just uprobe ones). This is deemed safe given the possible attach rates for kprobe/uprobe/tp programs. Separately, non-sleepable programs need access to dynamically sized rcu-protected maps, so bpf_run_prog_array_sleepables now conditionally takes an rcu read section, in addition to the overarching tasks_trace section. Signed-off-by: Delyan Kratunov Link: https://lore.kernel.org/r/ce844d62a2fd0443b08c5ab02e95bc7149f9aeb1.1655248076.git.delyank@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 69106ae46464..f3e88afdaffe 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -26,6 +26,7 @@ #include #include #include +#include struct bpf_verifier_env; struct bpf_verifier_log; @@ -1372,6 +1373,8 @@ extern struct bpf_empty_prog_array bpf_empty_prog_array; struct bpf_prog_array *bpf_prog_array_alloc(u32 prog_cnt, gfp_t flags); void bpf_prog_array_free(struct bpf_prog_array *progs); +/* Use when traversal over the bpf_prog_array uses tasks_trace rcu */ +void bpf_prog_array_free_sleepable(struct bpf_prog_array *progs); int bpf_prog_array_length(struct bpf_prog_array *progs); bool bpf_prog_array_is_empty(struct bpf_prog_array *array); int bpf_prog_array_copy_to_user(struct bpf_prog_array *progs, @@ -1463,6 +1466,55 @@ bpf_prog_run_array(const struct bpf_prog_array *array, return ret; } +/* Notes on RCU design for bpf_prog_arrays containing sleepable programs: + * + * We use the tasks_trace rcu flavor read section to protect the bpf_prog_array + * overall. As a result, we must use the bpf_prog_array_free_sleepable + * in order to use the tasks_trace rcu grace period. + * + * When a non-sleepable program is inside the array, we take the rcu read + * section and disable preemption for that program alone, so it can access + * rcu-protected dynamically sized maps. + */ +static __always_inline u32 +bpf_prog_run_array_sleepable(const struct bpf_prog_array __rcu *array_rcu, + const void *ctx, bpf_prog_run_fn run_prog) +{ + const struct bpf_prog_array_item *item; + const struct bpf_prog *prog; + const struct bpf_prog_array *array; + struct bpf_run_ctx *old_run_ctx; + struct bpf_trace_run_ctx run_ctx; + u32 ret = 1; + + might_fault(); + + rcu_read_lock_trace(); + migrate_disable(); + + array = rcu_dereference_check(array_rcu, rcu_read_lock_trace_held()); + if (unlikely(!array)) + goto out; + old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); + item = &array->items[0]; + while ((prog = READ_ONCE(item->prog))) { + if (!prog->aux->sleepable) + rcu_read_lock(); + + run_ctx.bpf_cookie = item->bpf_cookie; + ret &= run_prog(prog, ctx); + item++; + + if (!prog->aux->sleepable) + rcu_read_unlock(); + } + bpf_reset_run_ctx(old_run_ctx); +out: + migrate_enable(); + rcu_read_unlock_trace(); + return ret; +} + #ifdef CONFIG_BPF_SYSCALL DECLARE_PER_CPU(int, bpf_prog_active); extern struct mutex bpf_stats_enabled_mutex; -- cgit From d92725256b4f22d084b813b37ddc394da79aacab Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Mon, 30 May 2022 14:34:50 -0400 Subject: mm: avoid unnecessary page fault retires on shared memory types I observed that for each of the shared file-backed page faults, we're very likely to retry one more time for the 1st write fault upon no page. It's because we'll need to release the mmap lock for dirty rate limit purpose with balance_dirty_pages_ratelimited() (in fault_dirty_shared_page()). Then after that throttling we return VM_FAULT_RETRY. We did that probably because VM_FAULT_RETRY is the only way we can return to the fault handler at that time telling it we've released the mmap lock. However that's not ideal because it's very likely the fault does not need to be retried at all since the pgtable was well installed before the throttling, so the next continuous fault (including taking mmap read lock, walk the pgtable, etc.) could be in most cases unnecessary. It's not only slowing down page faults for shared file-backed, but also add more mmap lock contention which is in most cases not needed at all. To observe this, one could try to write to some shmem page and look at "pgfault" value in /proc/vmstat, then we should expect 2 counts for each shmem write simply because we retried, and vm event "pgfault" will capture that. To make it more efficient, add a new VM_FAULT_COMPLETED return code just to show that we've completed the whole fault and released the lock. It's also a hint that we should very possibly not need another fault immediately on this page because we've just completed it. This patch provides a ~12% perf boost on my aarch64 test VM with a simple program sequentially dirtying 400MB shmem file being mmap()ed and these are the time it needs: Before: 650.980 ms (+-1.94%) After: 569.396 ms (+-1.38%) I believe it could help more than that. We need some special care on GUP and the s390 pgfault handler (for gmap code before returning from pgfault), the rest changes in the page fault handlers should be relatively straightforward. Another thing to mention is that mm_account_fault() does take this new fault as a generic fault to be accounted, unlike VM_FAULT_RETRY. I explicitly didn't touch hmm_vma_fault() and break_ksm() because they do not handle VM_FAULT_RETRY even with existing code, so I'm literally keeping them as-is. Link: https://lkml.kernel.org/r/20220530183450.42886-1-peterx@redhat.com Signed-off-by: Peter Xu Acked-by: Geert Uytterhoeven Acked-by: Peter Zijlstra (Intel) Acked-by: Johannes Weiner Acked-by: Vineet Gupta Acked-by: Guo Ren Acked-by: Max Filippov Acked-by: Christian Borntraeger Acked-by: Michael Ellerman (powerpc) Acked-by: Catalin Marinas Reviewed-by: Alistair Popple Reviewed-by: Ingo Molnar Acked-by: Russell King (Oracle) [arm part] Acked-by: Heiko Carstens Cc: Vasily Gorbik Cc: Stafford Horne Cc: David S. Miller Cc: Johannes Berg Cc: Brian Cain Cc: Richard Henderson Cc: Richard Weinberger Cc: Benjamin Herrenschmidt Cc: Thomas Gleixner Cc: Janosch Frank Cc: Albert Ou Cc: Anton Ivanov Cc: Dave Hansen Cc: Borislav Petkov Cc: Sven Schnelle Cc: Andrea Arcangeli Cc: James Bottomley Cc: Al Viro Cc: Alexander Gordeev Cc: Jonas Bonn Cc: Will Deacon Cc: Vlastimil Babka Cc: Michal Simek Cc: Matt Turner Cc: Paul Mackerras Cc: David Hildenbrand Cc: Nicholas Piggin Cc: Palmer Dabbelt Cc: Stefan Kristiansson Cc: Paul Walmsley Cc: Ivan Kokshaysky Cc: Chris Zankel Cc: Hugh Dickins Cc: Dinh Nguyen Cc: Rich Felker Cc: H. Peter Anvin Cc: Andy Lutomirski Cc: Thomas Bogendoerfer Cc: Helge Deller Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/mm_types.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index c29ab4c0cd5c..6b961a29bf26 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -729,6 +729,7 @@ typedef __bitwise unsigned int vm_fault_t; * @VM_FAULT_NEEDDSYNC: ->fault did not modify page tables and needs * fsync() to complete (for synchronous page faults * in DAX) + * @VM_FAULT_COMPLETED: ->fault completed, meanwhile mmap lock released * @VM_FAULT_HINDEX_MASK: mask HINDEX value * */ @@ -746,6 +747,7 @@ enum vm_fault_reason { VM_FAULT_FALLBACK = (__force vm_fault_t)0x000800, VM_FAULT_DONE_COW = (__force vm_fault_t)0x001000, VM_FAULT_NEEDDSYNC = (__force vm_fault_t)0x002000, + VM_FAULT_COMPLETED = (__force vm_fault_t)0x004000, VM_FAULT_HINDEX_MASK = (__force vm_fault_t)0x0f0000, }; -- cgit From bcc728eb4f446073e0160671d7d0059a4e9aa300 Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Tue, 31 May 2022 10:04:21 +0800 Subject: mm/damon: remove obsolete comments of kdamond_stop Since commit 0f91d13366a4 ("mm/damon: simplify stop mechanism") delete kdamond_stop and change to use kthread stop mechanism, these obsolete comments should be removed accordingly. Link: https://lkml.kernel.org/r/20220531020421.46849-1-zhouchengming@bytedance.com Signed-off-by: Chengming Zhou Reviewed-by: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/damon.h | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 7c62da31ce4b..2765c7d99beb 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -397,7 +397,6 @@ struct damon_callback { * detail. * * @kdamond: Kernel thread who does the monitoring. - * @kdamond_stop: Notifies whether kdamond should stop. * @kdamond_lock: Mutex for the synchronizations with @kdamond. * * For each monitoring context, one kernel thread for the monitoring is @@ -406,14 +405,14 @@ struct damon_callback { * Once started, the monitoring thread runs until explicitly required to be * terminated or every monitoring target is invalid. The validity of the * targets is checked via the &damon_operations.target_valid of @ops. The - * termination can also be explicitly requested by writing non-zero to - * @kdamond_stop. The thread sets @kdamond to NULL when it terminates. - * Therefore, users can know whether the monitoring is ongoing or terminated by - * reading @kdamond. Reads and writes to @kdamond and @kdamond_stop from - * outside of the monitoring thread must be protected by @kdamond_lock. - * - * Note that the monitoring thread protects only @kdamond and @kdamond_stop via - * @kdamond_lock. Accesses to other fields must be protected by themselves. + * termination can also be explicitly requested by calling damon_stop(). + * The thread sets @kdamond to NULL when it terminates. Therefore, users can + * know whether the monitoring is ongoing or terminated by reading @kdamond. + * Reads and writes to @kdamond from outside of the monitoring thread must + * be protected by @kdamond_lock. + * + * Note that the monitoring thread protects only @kdamond via @kdamond_lock. + * Accesses to other fields must be protected by themselves. * * @ops: Set of monitoring operations for given use cases. * @callback: Set of callbacks for monitoring events notifications. -- cgit From 9384d79249d04b03572abb7e551a35d99c9268c0 Mon Sep 17 00:00:00 2001 From: "Fabio M. De Francesco" Date: Mon, 6 Jun 2022 16:15:33 +0200 Subject: mm/highmem: delete memmove_page() Matthew Wilcox reported that, while he was looking at memmove_page(), he realized that it can't actually work. The reasons are hidden in its implementation, which makes use of memmove() on logical addresses provided by kmap_local_page(). memmove() does the wrong thing when it tests "if (dest <= src)". Therefore, delete memmove_page(). No need to change any other code because we have no call sites of memmove_page() across the whole kernel. Link: https://lkml.kernel.org/r/20220606141533.555-1-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco Reported-by: Matthew Wilcox Reviewed-by: Baoquan He Reviewed-by: Ira Weiny Cc: Sebastian Andrzej Siewior Signed-off-by: Andrew Morton --- include/linux/highmem.h | 13 ------------- 1 file changed, 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/highmem.h b/include/linux/highmem.h index 3af34de54330..fee9835e3793 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -336,19 +336,6 @@ static inline void memcpy_page(struct page *dst_page, size_t dst_off, kunmap_local(dst); } -static inline void memmove_page(struct page *dst_page, size_t dst_off, - struct page *src_page, size_t src_off, - size_t len) -{ - char *dst = kmap_local_page(dst_page); - char *src = kmap_local_page(src_page); - - VM_BUG_ON(dst_off + len > PAGE_SIZE || src_off + len > PAGE_SIZE); - memmove(dst + dst_off, src + src_off, len); - kunmap_local(src); - kunmap_local(dst); -} - static inline void memset_page(struct page *page, size_t offset, int val, size_t len) { -- cgit From c200d90049dbe08fa8b016f74b713fddefca0479 Mon Sep 17 00:00:00 2001 From: Patrick Wang Date: Sat, 11 Jun 2022 11:55:48 +0800 Subject: mm: kmemleak: remove kmemleak_not_leak_phys() and the min_count argument to kmemleak_alloc_phys() Patch series "mm: kmemleak: store objects allocated with physical address separately and check when scan", v4. The kmemleak_*_phys() interface uses "min_low_pfn" and "max_low_pfn" to check address. But on some architectures, kmemleak_*_phys() is called before those two variables initialized. The following steps will be taken: 1) Add OBJECT_PHYS flag and rbtree for the objects allocated with physical address 2) Store physical address in objects if allocated with OBJECT_PHYS 3) Check the boundary when scan instead of in kmemleak_*_phys() This patch set will solve: https://lore.kernel.org/r/20220527032504.30341-1-yee.lee@mediatek.com https://lore.kernel.org/r/9dd08bb5-f39e-53d8-f88d-bec598a08c93@gmail.com v3: https://lore.kernel.org/r/20220609124950.1694394-1-patrick.wang.shcn@gmail.com v2: https://lore.kernel.org/r/20220603035415.1243913-1-patrick.wang.shcn@gmail.com v1: https://lore.kernel.org/r/20220531150823.1004101-1-patrick.wang.shcn@gmail.com This patch (of 4): Remove the unused kmemleak_not_leak_phys() function. And remove the min_count argument to kmemleak_alloc_phys() function, assume it's 0. Link: https://lkml.kernel.org/r/20220611035551.1823303-1-patrick.wang.shcn@gmail.com Link: https://lkml.kernel.org/r/20220611035551.1823303-2-patrick.wang.shcn@gmail.com Signed-off-by: Patrick Wang Suggested-by: Catalin Marinas Reviewed-by: Catalin Marinas Cc: Yee Lee Signed-off-by: Andrew Morton --- include/linux/kmemleak.h | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kmemleak.h b/include/linux/kmemleak.h index 34684b2026ab..6a3cd1bf4680 100644 --- a/include/linux/kmemleak.h +++ b/include/linux/kmemleak.h @@ -29,10 +29,9 @@ extern void kmemleak_not_leak(const void *ptr) __ref; extern void kmemleak_ignore(const void *ptr) __ref; extern void kmemleak_scan_area(const void *ptr, size_t size, gfp_t gfp) __ref; extern void kmemleak_no_scan(const void *ptr) __ref; -extern void kmemleak_alloc_phys(phys_addr_t phys, size_t size, int min_count, +extern void kmemleak_alloc_phys(phys_addr_t phys, size_t size, gfp_t gfp) __ref; extern void kmemleak_free_part_phys(phys_addr_t phys, size_t size) __ref; -extern void kmemleak_not_leak_phys(phys_addr_t phys) __ref; extern void kmemleak_ignore_phys(phys_addr_t phys) __ref; static inline void kmemleak_alloc_recursive(const void *ptr, size_t size, @@ -107,15 +106,12 @@ static inline void kmemleak_no_scan(const void *ptr) { } static inline void kmemleak_alloc_phys(phys_addr_t phys, size_t size, - int min_count, gfp_t gfp) + gfp_t gfp) { } static inline void kmemleak_free_part_phys(phys_addr_t phys, size_t size) { } -static inline void kmemleak_not_leak_phys(phys_addr_t phys) -{ -} static inline void kmemleak_ignore_phys(phys_addr_t phys) { } -- cgit From fc4db90fe71e640e3fe88df346f7cf653b75315d Mon Sep 17 00:00:00 2001 From: Roman Gushchin Date: Fri, 10 Jun 2022 11:03:10 -0700 Subject: mm: kmem: make mem_cgroup_from_obj() vmalloc()-safe MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Currently mem_cgroup_from_obj() is not working properly with objects allocated using vmalloc(). It creates problems in some cases, when it's called for static objects belonging to modules or generally allocated using vmalloc(). This patch makes mem_cgroup_from_obj() safe to be called on objects allocated using vmalloc(). It also introduces mem_cgroup_from_slab_obj(), which is a faster version to use in places when we know the object is either a slab object or a generic slab page (e.g. when adding an object to a lru list). Link: https://lkml.kernel.org/r/20220610180310.1725111-1-roman.gushchin@linux.dev Suggested-by: Kefeng Wang Signed-off-by: Roman Gushchin Tested-by: Linux Kernel Functional Testing Acked-by: Shakeel Butt Tested-by: Vasily Averin Acked-by: Michal Hocko Acked-by: Muchun Song Cc: Johannes Weiner Cc: Naresh Kamboju Cc: Qian Cai Cc: Kefeng Wang Cc: David S. Miller Cc: Eric Dumazet Cc: Florian Westphal Cc: Jakub Kicinski Cc: Michal Koutný Cc: Paolo Abeni Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 9ecead1042b9..3ce96ce5fe3e 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1740,6 +1740,7 @@ static inline int memcg_kmem_id(struct mem_cgroup *memcg) } struct mem_cgroup *mem_cgroup_from_obj(void *p); +struct mem_cgroup *mem_cgroup_from_slab_obj(void *p); static inline void count_objcg_event(struct obj_cgroup *objcg, enum vm_event_item idx) @@ -1801,6 +1802,11 @@ static inline struct mem_cgroup *mem_cgroup_from_obj(void *p) return NULL; } +static inline struct mem_cgroup *mem_cgroup_from_slab_obj(void *p) +{ + return NULL; +} + static inline void count_objcg_event(struct obj_cgroup *objcg, enum vm_event_item idx) { -- cgit From 1d0403d20f6c281cb3d14c5f1db5317caeec48e9 Mon Sep 17 00:00:00 2001 From: Vasily Averin Date: Fri, 3 Jun 2022 07:19:43 +0300 Subject: net: set proper memcg for net_init hooks allocations MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit __register_pernet_operations() executes init hook of registered pernet_operation structure in all existing net namespaces. Typically, these hooks are called by a process associated with the specified net namespace, and all __GFP_ACCOUNT marked allocation are accounted for corresponding container/memcg. However __register_pernet_operations() calls the hooks in the same context, and as a result all marked allocations are accounted to one memcg for all processed net namespaces. This patch adjusts active memcg for each net namespace and helps to account memory allocated inside ops_init() into the proper memcg. Link: https://lkml.kernel.org/r/f9394752-e272-9bf9-645f-a18c56d1c4ec@openvz.org Signed-off-by: Vasily Averin Acked-by: Roman Gushchin Acked-by: Shakeel Butt Cc: Michal Koutný Cc: Vlastimil Babka Cc: Michal Hocko Cc: Florian Westphal Cc: David S. Miller Cc: Jakub Kicinski Cc: Paolo Abeni Cc: Eric Dumazet Cc: Johannes Weiner Cc: Kefeng Wang Cc: Linux Kernel Functional Testing Cc: Muchun Song Cc: Naresh Kamboju Cc: Qian Cai Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 47 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 46 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 3ce96ce5fe3e..04f2f33607e9 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1756,6 +1756,42 @@ static inline void count_objcg_event(struct obj_cgroup *objcg, rcu_read_unlock(); } +/** + * get_mem_cgroup_from_obj - get a memcg associated with passed kernel object. + * @p: pointer to object from which memcg should be extracted. It can be NULL. + * + * Retrieves the memory group into which the memory of the pointed kernel + * object is accounted. If memcg is found, its reference is taken. + * If a passed kernel object is uncharged, or if proper memcg cannot be found, + * as well as if mem_cgroup is disabled, NULL is returned. + * + * Return: valid memcg pointer with taken reference or NULL. + */ +static inline struct mem_cgroup *get_mem_cgroup_from_obj(void *p) +{ + struct mem_cgroup *memcg; + + rcu_read_lock(); + do { + memcg = mem_cgroup_from_obj(p); + } while (memcg && !css_tryget(&memcg->css)); + rcu_read_unlock(); + return memcg; +} + +/** + * mem_cgroup_or_root - always returns a pointer to a valid memory cgroup. + * @memcg: pointer to a valid memory cgroup or NULL. + * + * If passed argument is not NULL, returns it without any additional checks + * and changes. Otherwise, root_mem_cgroup is returned. + * + * NOTE: root_mem_cgroup can be NULL during early boot. + */ +static inline struct mem_cgroup *mem_cgroup_or_root(struct mem_cgroup *memcg) +{ + return memcg ? memcg : root_mem_cgroup; +} #else static inline bool mem_cgroup_kmem_disabled(void) { @@ -1799,7 +1835,7 @@ static inline int memcg_kmem_id(struct mem_cgroup *memcg) static inline struct mem_cgroup *mem_cgroup_from_obj(void *p) { - return NULL; + return NULL; } static inline struct mem_cgroup *mem_cgroup_from_slab_obj(void *p) @@ -1812,6 +1848,15 @@ static inline void count_objcg_event(struct obj_cgroup *objcg, { } +static inline struct mem_cgroup *get_mem_cgroup_from_obj(void *p) +{ + return NULL; +} + +static inline struct mem_cgroup *mem_cgroup_or_root(struct mem_cgroup *memcg) +{ + return NULL; +} #endif /* CONFIG_MEMCG_KMEM */ #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP) -- cgit From 4815a36009044ba69a9b8d781943ec6505c451a2 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Fri, 3 Jun 2022 20:10:12 +0300 Subject: include/linux/rbtree.h: replace kernel.h with the necessary inclusions When kernel.h is used in the headers it adds a lot into dependency hell, especially when there are circular dependencies are involved. Replace kernel.h inclusion with the list of what is really being used. Link: https://lkml.kernel.org/r/20220603171012.48880-1-andriy.shevchenko@linux.intel.com Signed-off-by: Andy Shevchenko Cc: Matthew Wilcox Signed-off-by: Andrew Morton --- include/linux/rbtree.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h index 235047d7a1b5..f7edca369eda 100644 --- a/include/linux/rbtree.h +++ b/include/linux/rbtree.h @@ -17,9 +17,9 @@ #ifndef _LINUX_RBTREE_H #define _LINUX_RBTREE_H +#include #include -#include #include #include -- cgit From dabba87229411a5e9d20ac03ffc36463c53ae672 Mon Sep 17 00:00:00 2001 From: Pasha Tatashin Date: Fri, 27 May 2022 02:55:34 +0000 Subject: fs/kernel_read_file: allow to read files up-to ssize_t Patch series "Allow to kexec with initramfs larger than 2G", v2. Currently, the largest initramfs that is supported by kexec_file_load() syscall is 2G. This is because kernel_read_file() returns int, and is limited to INT_MAX or 2G. On the other hand, there are kexec based boot loaders (i.e. u-root), that may need to boot netboot images that might be larger than 2G. The first patch changes the return type from int to ssize_t in kernel_read_file* functions. The second patch increases the maximum initramfs file size to 4G. Tested: verified that can kexec_file_load() works with 4G initramfs on x86_64. This patch (of 2): Currently, the maximum file size that is supported is 2G. This may be too small in some cases. For example, kexec_file_load() system call loads initramfs. In some netboot cases initramfs can be rather large. Allow to use up-to ssize_t bytes. The callers still can limit the maximum file size via buf_size. Link: https://lkml.kernel.org/r/20220527025535.3953665-1-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20220527025535.3953665-2-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin Cc: Al Viro Cc: Baoquan He Cc: "Eric W. Biederman" Cc: Greg Thelen Cc: Sasha Levin Signed-off-by: Andrew Morton --- include/linux/kernel_read_file.h | 32 ++++++++++++++++---------------- include/linux/limits.h | 1 + 2 files changed, 17 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kernel_read_file.h b/include/linux/kernel_read_file.h index 575ffa1031d3..90451e2e12bd 100644 --- a/include/linux/kernel_read_file.h +++ b/include/linux/kernel_read_file.h @@ -35,21 +35,21 @@ static inline const char *kernel_read_file_id_str(enum kernel_read_file_id id) return kernel_read_file_str[id]; } -int kernel_read_file(struct file *file, loff_t offset, - void **buf, size_t buf_size, - size_t *file_size, - enum kernel_read_file_id id); -int kernel_read_file_from_path(const char *path, loff_t offset, - void **buf, size_t buf_size, - size_t *file_size, - enum kernel_read_file_id id); -int kernel_read_file_from_path_initns(const char *path, loff_t offset, - void **buf, size_t buf_size, - size_t *file_size, - enum kernel_read_file_id id); -int kernel_read_file_from_fd(int fd, loff_t offset, - void **buf, size_t buf_size, - size_t *file_size, - enum kernel_read_file_id id); +ssize_t kernel_read_file(struct file *file, loff_t offset, + void **buf, size_t buf_size, + size_t *file_size, + enum kernel_read_file_id id); +ssize_t kernel_read_file_from_path(const char *path, loff_t offset, + void **buf, size_t buf_size, + size_t *file_size, + enum kernel_read_file_id id); +ssize_t kernel_read_file_from_path_initns(const char *path, loff_t offset, + void **buf, size_t buf_size, + size_t *file_size, + enum kernel_read_file_id id); +ssize_t kernel_read_file_from_fd(int fd, loff_t offset, + void **buf, size_t buf_size, + size_t *file_size, + enum kernel_read_file_id id); #endif /* _LINUX_KERNEL_READ_FILE_H */ diff --git a/include/linux/limits.h b/include/linux/limits.h index b568b9c30bbf..f6bcc9369010 100644 --- a/include/linux/limits.h +++ b/include/linux/limits.h @@ -7,6 +7,7 @@ #include #define SIZE_MAX (~(size_t)0) +#define SSIZE_MAX ((ssize_t)(SIZE_MAX >> 1)) #define PHYS_ADDR_MAX (~(phys_addr_t)0) #define U8_MAX ((u8)~0U) -- cgit From a793679827a87b01f9d973ea30b923cd5a3ff2c5 Mon Sep 17 00:00:00 2001 From: Rasmus Villemoes Date: Tue, 14 Jun 2022 10:46:11 +0200 Subject: linux/phy.h: add phydev_err_probe() wrapper for dev_err_probe() The dev_err_probe() function is quite useful to avoid boilerplate related to -EPROBE_DEFER handling. Add a phydev_err_probe() helper to simplify making use of that from phy drivers which otherwise use the phydev_* helpers. Signed-off-by: Rasmus Villemoes Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 508f1149665b..bed9a347481b 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -1539,6 +1539,9 @@ static inline void phy_device_reset(struct phy_device *phydev, int value) #define phydev_err(_phydev, format, args...) \ dev_err(&_phydev->mdio.dev, format, ##args) +#define phydev_err_probe(_phydev, err, format, args...) \ + dev_err_probe(&_phydev->mdio.dev, err, format, ##args) + #define phydev_info(_phydev, format, args...) \ dev_info(&_phydev->mdio.dev, format, ##args) -- cgit From 508362ac66b0478affb4e52cb8da98478312d72d Mon Sep 17 00:00:00 2001 From: Maxim Mikityanskiy Date: Wed, 15 Jun 2022 16:48:43 +0300 Subject: bpf: Allow helpers to accept pointers with a fixed size Before this commit, the BPF verifier required ARG_PTR_TO_MEM arguments to be followed by ARG_CONST_SIZE holding the size of the memory region. The helpers had to check that size in runtime. There are cases where the size expected by a helper is a compile-time constant. Checking it in runtime is an unnecessary overhead and waste of BPF registers. This commit allows helpers to accept pointers to memory without the corresponding ARG_CONST_SIZE, given that they define the memory region size in struct bpf_func_proto and use ARG_PTR_TO_FIXED_SIZE_MEM type. arg_size is unionized with arg_btf_id to reduce the kernel image size, and it's valid because they are used by different argument types. Signed-off-by: Maxim Mikityanskiy Reviewed-by: Tariq Toukan Link: https://lore.kernel.org/r/20220615134847.3753567-3-maximmi@nvidia.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index f3e88afdaffe..a94531971a7a 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -401,6 +401,9 @@ enum bpf_type_flag { /* DYNPTR points to a ringbuf record. */ DYNPTR_TYPE_RINGBUF = BIT(9 + BPF_BASE_TYPE_BITS), + /* Size is known at compile time. */ + MEM_FIXED_SIZE = BIT(10 + BPF_BASE_TYPE_BITS), + __BPF_TYPE_FLAG_MAX, __BPF_TYPE_LAST_FLAG = __BPF_TYPE_FLAG_MAX - 1, }; @@ -464,6 +467,8 @@ enum bpf_arg_type { * all bytes or clear them in error case. */ ARG_PTR_TO_UNINIT_MEM = MEM_UNINIT | ARG_PTR_TO_MEM, + /* Pointer to valid memory of size known at compile time. */ + ARG_PTR_TO_FIXED_SIZE_MEM = MEM_FIXED_SIZE | ARG_PTR_TO_MEM, /* This must be the last entry. Its purpose is to ensure the enum is * wide enough to hold the higher bits reserved for bpf_type_flag. @@ -529,6 +534,14 @@ struct bpf_func_proto { u32 *arg5_btf_id; }; u32 *arg_btf_id[5]; + struct { + size_t arg1_size; + size_t arg2_size; + size_t arg3_size; + size_t arg4_size; + size_t arg5_size; + }; + size_t arg_size[5]; }; int *ret_btf_id; /* return value btf_id */ bool (*allowed)(const struct bpf_prog *prog); -- cgit From f0a6d77b351c18c122fc1638ac9e58f5e0346f64 Mon Sep 17 00:00:00 2001 From: Sergey Shtylyov Date: Tue, 14 Jun 2022 22:51:47 +0300 Subject: ata: make transfer mode masks *unsigned int* The packed transfer mode masks and also the {pio|mwdma|udma}_mask fields of *struct*s ata_device and ata_port_info are declared as *unsigned long* (which is a 64-bit type on 64-bit architectures) but actually the packed masks occupy only 20 bits (7 PIO modes, 5 MWDMA modes, and 8 UDMA modes) and the PIO/MWDMA/UDMA masks easily fit into just 8 bits each, so we can safely use (always 32-bit) *unsigned int* variables instead. This saves 745 bytes of object code in libata-core.o alone, not to mention LLDDs... Signed-off-by: Sergey Shtylyov Signed-off-by: Damien Le Moal --- include/linux/libata.h | 49 ++++++++++++++++++++++++------------------------- 1 file changed, 24 insertions(+), 25 deletions(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index 0f2a59c9c735..a8bc88b4fe07 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -275,7 +275,7 @@ enum { PORT_DISABLED = 2, /* encoding various smaller bitmaps into a single - * unsigned long bitmap + * unsigned int bitmap */ ATA_NR_PIO_MODES = 7, ATA_NR_MWDMA_MODES = 5, @@ -426,12 +426,9 @@ enum { }; enum ata_xfer_mask { - ATA_MASK_PIO = ((1LU << ATA_NR_PIO_MODES) - 1) - << ATA_SHIFT_PIO, - ATA_MASK_MWDMA = ((1LU << ATA_NR_MWDMA_MODES) - 1) - << ATA_SHIFT_MWDMA, - ATA_MASK_UDMA = ((1LU << ATA_NR_UDMA_MODES) - 1) - << ATA_SHIFT_UDMA, + ATA_MASK_PIO = ((1U << ATA_NR_PIO_MODES) - 1) << ATA_SHIFT_PIO, + ATA_MASK_MWDMA = ((1U << ATA_NR_MWDMA_MODES) - 1) << ATA_SHIFT_MWDMA, + ATA_MASK_UDMA = ((1U << ATA_NR_UDMA_MODES) - 1) << ATA_SHIFT_UDMA, }; enum hsm_task_states { @@ -680,9 +677,9 @@ struct ata_device { unsigned int cdb_len; /* per-dev xfer mask */ - unsigned long pio_mask; - unsigned long mwdma_mask; - unsigned long udma_mask; + unsigned int pio_mask; + unsigned int mwdma_mask; + unsigned int udma_mask; /* for CHS addressing */ u16 cylinders; /* Number of cylinders */ @@ -885,7 +882,7 @@ struct ata_port_operations { * Configuration and exception handling */ int (*cable_detect)(struct ata_port *ap); - unsigned long (*mode_filter)(struct ata_device *dev, unsigned long xfer_mask); + unsigned int (*mode_filter)(struct ata_device *dev, unsigned int xfer_mask); void (*set_piomode)(struct ata_port *ap, struct ata_device *dev); void (*set_dmamode)(struct ata_port *ap, struct ata_device *dev); int (*set_mode)(struct ata_link *link, struct ata_device **r_failed_dev); @@ -981,9 +978,9 @@ struct ata_port_operations { struct ata_port_info { unsigned long flags; unsigned long link_flags; - unsigned long pio_mask; - unsigned long mwdma_mask; - unsigned long udma_mask; + unsigned int pio_mask; + unsigned int mwdma_mask; + unsigned int udma_mask; struct ata_port_operations *port_ops; void *private_data; }; @@ -1102,16 +1099,18 @@ extern void ata_msleep(struct ata_port *ap, unsigned int msecs); extern u32 ata_wait_register(struct ata_port *ap, void __iomem *reg, u32 mask, u32 val, unsigned long interval, unsigned long timeout); extern int atapi_cmd_type(u8 opcode); -extern unsigned long ata_pack_xfermask(unsigned long pio_mask, - unsigned long mwdma_mask, unsigned long udma_mask); -extern void ata_unpack_xfermask(unsigned long xfer_mask, - unsigned long *pio_mask, unsigned long *mwdma_mask, - unsigned long *udma_mask); -extern u8 ata_xfer_mask2mode(unsigned long xfer_mask); -extern unsigned long ata_xfer_mode2mask(u8 xfer_mode); +extern unsigned int ata_pack_xfermask(unsigned int pio_mask, + unsigned int mwdma_mask, + unsigned int udma_mask); +extern void ata_unpack_xfermask(unsigned int xfer_mask, + unsigned int *pio_mask, + unsigned int *mwdma_mask, + unsigned int *udma_mask); +extern u8 ata_xfer_mask2mode(unsigned int xfer_mask); +extern unsigned int ata_xfer_mode2mask(u8 xfer_mode); extern int ata_xfer_mode2shift(u8 xfer_mode); -extern const char *ata_mode_string(unsigned long xfer_mask); -extern unsigned long ata_id_xfermask(const u16 *id); +extern const char *ata_mode_string(unsigned int xfer_mask); +extern unsigned int ata_id_xfermask(const u16 *id); extern int ata_std_qc_defer(struct ata_queued_cmd *qc); extern enum ata_completion_errors ata_noop_qc_prep(struct ata_queued_cmd *qc); extern void ata_sg_init(struct ata_queued_cmd *qc, struct scatterlist *sg, @@ -1283,8 +1282,8 @@ static inline const struct ata_acpi_gtm *ata_acpi_init_gtm(struct ata_port *ap) } int ata_acpi_stm(struct ata_port *ap, const struct ata_acpi_gtm *stm); int ata_acpi_gtm(struct ata_port *ap, struct ata_acpi_gtm *stm); -unsigned long ata_acpi_gtm_xfermask(struct ata_device *dev, - const struct ata_acpi_gtm *gtm); +unsigned int ata_acpi_gtm_xfermask(struct ata_device *dev, + const struct ata_acpi_gtm *gtm); int ata_acpi_cbl_80wire(struct ata_port *ap, const struct ata_acpi_gtm *gtm); #else static inline const struct ata_acpi_gtm *ata_acpi_init_gtm(struct ata_port *ap) -- cgit From d64de9773c18409d2161228242968ff3ebe3707e Mon Sep 17 00:00:00 2001 From: Weili Qian Date: Thu, 9 Jun 2022 20:31:19 +0800 Subject: crypto: hisilicon/qm - modify event irq processing When the driver receives an event interrupt, the driver will enable the event interrupt after handling all completed tasks on the function, tasks on the function are parsed through only one thread. If the task's user callback takes time, other tasks on the function will be blocked. Therefore, the event irq processing is modified as follows: 1. Obtain the ID of the queue that completes the task. 2. Enable event interrupt. 3. Parse the completed tasks in the queue and call the user callback. Enabling event interrupt in advance can quickly report pending event interrupts and process tasks in multiple threads. Signed-off-by: Weili Qian Signed-off-by: Herbert Xu --- include/linux/hisi_acc_qm.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h index 6cabafffd0dd..116e8bd68c99 100644 --- a/include/linux/hisi_acc_qm.h +++ b/include/linux/hisi_acc_qm.h @@ -265,6 +265,12 @@ struct hisi_qm_list { void (*unregister_from_crypto)(struct hisi_qm *qm); }; +struct hisi_qm_poll_data { + struct hisi_qm *qm; + struct work_struct work; + u16 *qp_finish_id; +}; + struct hisi_qm { enum qm_hw_ver ver; enum qm_fun_type fun_type; @@ -302,6 +308,7 @@ struct hisi_qm { struct rw_semaphore qps_lock; struct idr qp_idr; struct hisi_qp *qp_array; + struct hisi_qm_poll_data *poll_data; struct mutex mailbox_lock; @@ -312,7 +319,6 @@ struct hisi_qm { u32 error_mask; struct workqueue_struct *wq; - struct work_struct work; struct work_struct rst_work; struct work_struct cmd_process; -- cgit From fa9c562f9735d24c3253747eb21f3f0c0f6de48e Mon Sep 17 00:00:00 2001 From: Ong Boon Leong Date: Wed, 15 Jun 2022 16:39:04 +0800 Subject: net: make xpcs_do_config to accept advertising for pcs-xpcs and sja1105 xpcs_config() has 'advertising' input that is required for C37 1000BASE-X AN in later patch series. So, we prepare xpcs_do_config() for it. For sja1105, xpcs_do_config() is used for xpcs configuration without depending on advertising input, so set to NULL. Reported-by: kernel test robot Signed-off-by: Ong Boon Leong Signed-off-by: David S. Miller --- include/linux/pcs/pcs-xpcs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pcs/pcs-xpcs.h b/include/linux/pcs/pcs-xpcs.h index 266eb26fb029..37eb97cc2283 100644 --- a/include/linux/pcs/pcs-xpcs.h +++ b/include/linux/pcs/pcs-xpcs.h @@ -30,7 +30,7 @@ int xpcs_get_an_mode(struct dw_xpcs *xpcs, phy_interface_t interface); void xpcs_link_up(struct phylink_pcs *pcs, unsigned int mode, phy_interface_t interface, int speed, int duplex); int xpcs_do_config(struct dw_xpcs *xpcs, phy_interface_t interface, - unsigned int mode); + unsigned int mode, const unsigned long *advertising); void xpcs_get_interfaces(struct dw_xpcs *xpcs, unsigned long *interfaces); int xpcs_config_eee(struct dw_xpcs *xpcs, int mult_fact_100ns, int enable); -- cgit From b47aec885bcd672ebca2108a8b7e9ce3e3982775 Mon Sep 17 00:00:00 2001 From: Ong Boon Leong Date: Wed, 15 Jun 2022 16:39:06 +0800 Subject: net: pcs: xpcs: add CL37 1000BASE-X AN support For CL37 1000BASE-X AN, DW xPCS does not support C22 method but offers C45 vendor-specific MII MMD for programming. We also add the ability to disable Autoneg (through ethtool for certain network switch that supports 1000BASE-X (1000Mbps and Full-Duplex) but not Autoneg capability. v4: Fixes to comment from Russell King. Thanks! https://patchwork.kernel.org/comment/24894239/ Make xpcs_modify_changed() as private, change to use mdiodev_modify_changed() for cleaner code. v3: Fixes to issues spotted by Russell King. Thanks! https://patchwork.kernel.org/comment/24890210/ Use phylink_mii_c22_pcs_decode_state(), remove unnecessary interrupt clearing and skip speed & duplex setting if AN is enabled. v2: Fixes to issues spotted by Russell King in v1. Thanks! https://patchwork.kernel.org/comment/24826650/ Use phylink_mii_c22_pcs_encode_advertisement() and implement C45 MII ADV handling since IP only support C45 access. Tested-by: Emilio Riva Signed-off-by: Ong Boon Leong Reviewed-by: Russell King (Oracle) Signed-off-by: David S. Miller --- include/linux/pcs/pcs-xpcs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/pcs/pcs-xpcs.h b/include/linux/pcs/pcs-xpcs.h index 37eb97cc2283..d2da1e0b4a92 100644 --- a/include/linux/pcs/pcs-xpcs.h +++ b/include/linux/pcs/pcs-xpcs.h @@ -17,6 +17,7 @@ #define DW_AN_C73 1 #define DW_AN_C37_SGMII 2 #define DW_2500BASEX 3 +#define DW_AN_C37_1000BASEX 4 struct xpcs_id; -- cgit From 5cf9c91ba927119fc6606b938b1895bb2459d3bc Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 14 Jun 2022 09:48:25 +0200 Subject: block: serialize all debugfs operations using q->debugfs_mutex Various places like I/O schedulers or the QOS infrastructure try to register debugfs files on demans, which can race with creating and removing the main queue debugfs directory. Use the existing debugfs_mutex to serialize all debugfs operations that rely on q->debugfs_dir or the directories hanging off it. To make the teardown code a little simpler declare all debugfs dentry pointers and not just the main one uncoditionally in blkdev.h. Move debugfs_mutex next to the dentries that it protects and document what it is used for. Signed-off-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220614074827.458955-3-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index bb6e3c31b3b7..73c886eba8e1 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -482,7 +482,6 @@ struct request_queue { #endif /* CONFIG_BLK_DEV_ZONED */ int node; - struct mutex debugfs_mutex; #ifdef CONFIG_BLK_DEV_IO_TRACE struct blk_trace __rcu *blk_trace; #endif @@ -526,11 +525,12 @@ struct request_queue { struct bio_set bio_split; struct dentry *debugfs_dir; - -#ifdef CONFIG_BLK_DEBUG_FS struct dentry *sched_debugfs_dir; struct dentry *rqos_debugfs_dir; -#endif + /* + * Serializes all debugfs metadata operations using the above dentries. + */ + struct mutex debugfs_mutex; bool mq_sysfs_init_done; -- cgit From d0804085c5a7feb557bd3219777b51f5598dfdc6 Mon Sep 17 00:00:00 2001 From: Moudy Ho Date: Fri, 10 Jun 2022 14:34:18 +0800 Subject: soc: mediatek: mutex: add common interface for modules setting In order to allow multiple modules to operate MUTEX hardware through a common interfrace, two flexible indexes "mtk_mutex_mod_index" and "mtk_mutex_sof_index" need to be added to replace original component ID so that like DDP and MDP can add their own MOD table or SOF settings independently. In addition, 2 generic interface "mtk_mutex_write_mod" and "mtk_mutex_write_sof" have been added, which is expected to replace the "mtk_mutex_add_comp" and "mtk_mutex_remove_comp" pair originally dedicated to DDP in the future. Signed-off-by: Moudy Ho Reviewed-by: AngeloGioacchino Del Regno Reviewed-by: Rex-BC Chen Reviewed-by: CK Hu Link: https://lore.kernel.org/r/20220610063424.7800-2-moudy.ho@mediatek.com Signed-off-by: Matthias Brugger --- include/linux/soc/mediatek/mtk-mutex.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/mediatek/mtk-mutex.h b/include/linux/soc/mediatek/mtk-mutex.h index 6fe4ffbde290..2ddab9d2b85d 100644 --- a/include/linux/soc/mediatek/mtk-mutex.h +++ b/include/linux/soc/mediatek/mtk-mutex.h @@ -10,6 +10,26 @@ struct regmap; struct device; struct mtk_mutex; +enum mtk_mutex_mod_index { + /* MDP table index */ + MUTEX_MOD_IDX_MDP_RDMA0, + MUTEX_MOD_IDX_MDP_RSZ0, + MUTEX_MOD_IDX_MDP_RSZ1, + MUTEX_MOD_IDX_MDP_TDSHP0, + MUTEX_MOD_IDX_MDP_WROT0, + MUTEX_MOD_IDX_MDP_WDMA, + MUTEX_MOD_IDX_MDP_AAL0, + MUTEX_MOD_IDX_MDP_CCORR0, + + MUTEX_MOD_IDX_MAX /* ALWAYS keep at the end */ +}; + +enum mtk_mutex_sof_index { + MUTEX_SOF_IDX_SINGLE_MODE, + + MUTEX_SOF_IDX_MAX /* ALWAYS keep at the end */ +}; + struct mtk_mutex *mtk_mutex_get(struct device *dev); int mtk_mutex_prepare(struct mtk_mutex *mutex); void mtk_mutex_add_comp(struct mtk_mutex *mutex, @@ -22,5 +42,10 @@ void mtk_mutex_unprepare(struct mtk_mutex *mutex); void mtk_mutex_put(struct mtk_mutex *mutex); void mtk_mutex_acquire(struct mtk_mutex *mutex); void mtk_mutex_release(struct mtk_mutex *mutex); +int mtk_mutex_write_mod(struct mtk_mutex *mutex, + enum mtk_mutex_mod_index idx, + bool clear); +int mtk_mutex_write_sof(struct mtk_mutex *mutex, + enum mtk_mutex_sof_index idx); #endif /* MTK_MUTEX_H */ -- cgit From e5758850c2ea448dd750a280d128a3590d68b899 Mon Sep 17 00:00:00 2001 From: Moudy Ho Date: Fri, 10 Jun 2022 14:34:23 +0800 Subject: soc: mediatek: mutex: add functions that operate registers by CMDQ Due to HW limitations, MDP3 is necessary to enable MUTEX in each frame for SOF triggering and cooperate with CMDQ control to reduce the amount of interrupts generated(also, reduce frame latency). In response to the above situation, a new interface "mtk_mutex_enable_by_cmdq" has been added to achieve the purpose. Signed-off-by: Moudy Ho Reviewed-by: AngeloGioacchino Del Regno Reviewed-by: Rex-BC Chen Reviewed-by: CK Hu Link: https://lore.kernel.org/r/20220610063424.7800-7-moudy.ho@mediatek.com Signed-off-by: Matthias Brugger --- include/linux/soc/mediatek/mtk-mutex.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/mediatek/mtk-mutex.h b/include/linux/soc/mediatek/mtk-mutex.h index 2ddab9d2b85d..a0f4f51a3b45 100644 --- a/include/linux/soc/mediatek/mtk-mutex.h +++ b/include/linux/soc/mediatek/mtk-mutex.h @@ -35,6 +35,8 @@ int mtk_mutex_prepare(struct mtk_mutex *mutex); void mtk_mutex_add_comp(struct mtk_mutex *mutex, enum mtk_ddp_comp_id id); void mtk_mutex_enable(struct mtk_mutex *mutex); +int mtk_mutex_enable_by_cmdq(struct mtk_mutex *mutex, + void *pkt); void mtk_mutex_disable(struct mtk_mutex *mutex); void mtk_mutex_remove_comp(struct mtk_mutex *mutex, enum mtk_ddp_comp_id id); -- cgit From dc368e1c658e4f478a45e8d1d5b0c8392ca87506 Mon Sep 17 00:00:00 2001 From: Joanne Koong Date: Thu, 16 Jun 2022 15:54:07 -0700 Subject: bpf: Fix non-static bpf_func_proto struct definitions This patch does two things: 1) Marks the dynptr bpf_func_proto structs that were added in [1] as static, as pointed out by the kernel test robot in [2]. 2) There are some bpf_func_proto structs marked as extern which can instead be statically defined. [1] https://lore.kernel.org/bpf/20220523210712.3641569-1-joannelkoong@gmail.com/ [2] https://lore.kernel.org/bpf/62ab89f2.Pko7sI08RAKdF8R6%25lkp@intel.com/ Reported-by: kernel test robot Signed-off-by: Joanne Koong Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20220616225407.1878436-1-joannelkoong@gmail.com --- include/linux/bpf.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a94531971a7a..0edd7d2c0064 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2363,12 +2363,9 @@ extern const struct bpf_func_proto bpf_for_each_map_elem_proto; extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto; extern const struct bpf_func_proto bpf_sk_setsockopt_proto; extern const struct bpf_func_proto bpf_sk_getsockopt_proto; -extern const struct bpf_func_proto bpf_kallsyms_lookup_name_proto; extern const struct bpf_func_proto bpf_find_vma_proto; extern const struct bpf_func_proto bpf_loop_proto; -extern const struct bpf_func_proto bpf_strncmp_proto; extern const struct bpf_func_proto bpf_copy_from_user_task_proto; -extern const struct bpf_func_proto bpf_kptr_xchg_proto; const struct bpf_func_proto *tracing_prog_func_proto( enum bpf_func_id func_id, const struct bpf_prog *prog); -- cgit From 9a2139c2912ea64e288749c21452930dc752d4fd Mon Sep 17 00:00:00 2001 From: Caleb Connolly Date: Fri, 29 Apr 2022 23:08:56 +0100 Subject: spmi: add a helper to look up an SPMI device from a device node The helper function spmi_device_from_of() takes a device node and returns the SPMI device associated with it. This is like of_find_device_by_node but for SPMI devices. Signed-off-by: Caleb Connolly Acked-by: Stephen Boyd Link: https://lore.kernel.org/r/20220429220904.137297-2-caleb.connolly@linaro.org Signed-off-by: Jonathan Cameron --- include/linux/spmi.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/spmi.h b/include/linux/spmi.h index 729bcbf9f5ad..eac1956a8727 100644 --- a/include/linux/spmi.h +++ b/include/linux/spmi.h @@ -164,6 +164,9 @@ static inline void spmi_driver_unregister(struct spmi_driver *sdrv) module_driver(__spmi_driver, spmi_driver_register, \ spmi_driver_unregister) +struct device_node; + +struct spmi_device *spmi_device_from_of(struct device_node *np); int spmi_register_read(struct spmi_device *sdev, u8 addr, u8 *buf); int spmi_ext_register_read(struct spmi_device *sdev, u8 addr, u8 *buf, size_t len); -- cgit From 4a08069461ac9822855116d24fbf11d5fb223cd1 Mon Sep 17 00:00:00 2001 From: Dmitry Rokosov Date: Tue, 7 Jun 2022 18:39:18 +0000 Subject: iio: trigger: warn about non-registered iio trigger getting attempt As a part of patch series about wrong trigger register() and get() calls order in the some IIO drivers trigger initialization path: https://lore.kernel.org/all/20220524181150.9240-1-ddrokosov@sberdevices.ru/ runtime WARN_ONCE() is added to alarm IIO driver authors who make such a mistake. When an IIO driver allocates a new IIO trigger, it should register it before calling the get() operation. In other words, each IIO driver must abide by IIO trigger alloc()/register()/get() calls order. Signed-off-by: Dmitry Rokosov Link: https://lore.kernel.org/r/20220607183907.20017-1-ddrokosov@sberdevices.ru Signed-off-by: Jonathan Cameron --- include/linux/iio/trigger.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iio/trigger.h b/include/linux/iio/trigger.h index 4c69b144677b..03b1d6863436 100644 --- a/include/linux/iio/trigger.h +++ b/include/linux/iio/trigger.h @@ -93,6 +93,11 @@ static inline void iio_trigger_put(struct iio_trigger *trig) static inline struct iio_trigger *iio_trigger_get(struct iio_trigger *trig) { get_device(&trig->dev); + + WARN_ONCE(list_empty(&trig->list), + "Getting non-registered iio trigger %s is prohibited\n", + trig->name); + __module_get(trig->owner); return trig; -- cgit From bdb6cfe7512f7a214815a3092f0be50963dcacbc Mon Sep 17 00:00:00 2001 From: "Russell King (Oracle)" Date: Sat, 18 Jun 2022 11:28:32 +0100 Subject: net: mii: add mii_bmcr_encode_fixed() Add a function to encode a fixed speed/duplex to a BMCR value. Signed-off-by: Russell King (Oracle) Signed-off-by: David S. Miller --- include/linux/mii.h | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mii.h b/include/linux/mii.h index 5ee13083cec7..d5a959ce4877 100644 --- a/include/linux/mii.h +++ b/include/linux/mii.h @@ -545,4 +545,39 @@ static inline u8 mii_resolve_flowctrl_fdx(u16 lcladv, u16 rmtadv) return cap; } +/** + * mii_bmcr_encode_fixed - encode fixed speed/duplex settings to a BMCR value + * @speed: a SPEED_* value + * @duplex: a DUPLEX_* value + * + * Encode the speed and duplex to a BMCR value. 2500, 1000, 100 and 10 Mbps are + * supported. 2500Mbps is encoded to 1000Mbps. Other speeds are encoded as 10 + * Mbps. Unknown duplex values are encoded to half-duplex. + */ +static inline u16 mii_bmcr_encode_fixed(int speed, int duplex) +{ + u16 bmcr; + + switch (speed) { + case SPEED_2500: + case SPEED_1000: + bmcr = BMCR_SPEED1000; + break; + + case SPEED_100: + bmcr = BMCR_SPEED100; + break; + + case SPEED_10: + default: + bmcr = BMCR_SPEED10; + break; + } + + if (duplex == DUPLEX_FULL) + bmcr |= BMCR_FULLDPLX; + + return bmcr; +} + #endif /* __LINUX_MII_H__ */ -- cgit From c01d4d0a82b71857be7449380338bc53dde2da92 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Thu, 16 Jun 2022 15:00:51 +0200 Subject: random: quiet urandom warning ratelimit suppression message random.c ratelimits how much it warns about uninitialized urandom reads using __ratelimit(). When the RNG is finally initialized, it prints the number of missed messages due to ratelimiting. It has been this way since that functionality was introduced back in 2018. Recently, cc1e127bfa95 ("random: remove ratelimiting for in-kernel unseeded randomness") put a bit more stress on the urandom ratelimiting, which teased out a bug in the implementation. Specifically, when under pressure, __ratelimit() will print its own message and reset the count back to 0, making the final message at the end less useful. Secondly, it does so as a pr_warn(), which apparently is undesirable for people's CI. Fortunately, __ratelimit() has the RATELIMIT_MSG_ON_RELEASE flag exactly for this purpose, so we set the flag. Fixes: 4e00b339e264 ("random: rate limit unseeded randomness warnings") Cc: stable@vger.kernel.org Reported-by: Jon Hunter Reported-by: Ron Economos Tested-by: Ron Economos Signed-off-by: Jason A. Donenfeld --- include/linux/ratelimit_types.h | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ratelimit_types.h b/include/linux/ratelimit_types.h index c21c7f8103e2..002266693e50 100644 --- a/include/linux/ratelimit_types.h +++ b/include/linux/ratelimit_types.h @@ -23,12 +23,16 @@ struct ratelimit_state { unsigned long flags; }; -#define RATELIMIT_STATE_INIT(name, interval_init, burst_init) { \ - .lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock), \ - .interval = interval_init, \ - .burst = burst_init, \ +#define RATELIMIT_STATE_INIT_FLAGS(name, interval_init, burst_init, flags_init) { \ + .lock = __RAW_SPIN_LOCK_UNLOCKED(name.lock), \ + .interval = interval_init, \ + .burst = burst_init, \ + .flags = flags_init, \ } +#define RATELIMIT_STATE_INIT(name, interval_init, burst_init) \ + RATELIMIT_STATE_INIT_FLAGS(name, interval_init, burst_init, 0) + #define RATELIMIT_STATE_INIT_DISABLED \ RATELIMIT_STATE_INIT(ratelimit_state, 0, DEFAULT_RATELIMIT_BURST) -- cgit From 2e0aee8f0a22c60a1ae0876f7233e70ad9d026b8 Mon Sep 17 00:00:00 2001 From: Sergey Shtylyov Date: Thu, 16 Jun 2022 23:51:49 +0300 Subject: ata: make ata_port::fastdrain_cnt *unsigned int* *unsigned long* ata_port::fastdrain_cnt (64-bit value in a 64-bit kernel) is always assigned from the 32-bit *unsigned int* variables, thus could also be made just *unsigned int*... Signed-off-by: Sergey Shtylyov Signed-off-by: Damien Le Moal --- include/linux/libata.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index a8bc88b4fe07..0269ff114f5a 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -847,7 +847,7 @@ struct ata_port { enum ata_lpm_policy target_lpm_policy; struct timer_list fastdrain_timer; - unsigned long fastdrain_cnt; + unsigned int fastdrain_cnt; async_cookie_t cookie; -- cgit From 9243fc4cd28c8bdddd7fe0abd5bbec3c4fdf5052 Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Fri, 3 Jun 2022 14:35:29 +0900 Subject: block: remove queue from struct blk_independent_access_range The request queue pointer in struct blk_independent_access_range is unused. Remove it. Signed-off-by: Damien Le Moal Fixes: 41e46b3c2aa2 ("block: Fix potential deadlock in blk_ia_range_sysfs_show()") Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220603053529.76405-1-damien.lemoal@opensource.wdc.com Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 73c886eba8e1..2f7b43444c5f 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -342,7 +342,6 @@ static inline int blkdev_zone_mgmt_ioctl(struct block_device *bdev, */ struct blk_independent_access_range { struct kobject kobj; - struct request_queue *queue; sector_t sector; sector_t nr_sectors; }; -- cgit From 8e1c69149f27189cff93a0cfe9402e576d89ce29 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 29 Apr 2022 01:04:10 +0000 Subject: KVM: Avoid pfn_to_page() and vice versa when releasing pages Invert the order of KVM's page/pfn release helpers so that the "inner" helper operates on a page instead of a pfn. As pointed out by Linus[*], converting between struct page and a pfn isn't necessarily cheap, and that's not even counting the overhead of is_error_noslot_pfn() and kvm_is_reserved_pfn(). Even if the checks were dirt cheap, there's no reason to convert from a page to a pfn and back to a page, just to mark the page dirty/accessed or to put a reference to the page. Opportunistically drop a stale declaration of kvm_set_page_accessed() from kvm_host.h (there was no implementation). No functional change intended. [*] https://lore.kernel.org/all/CAHk-=wifQimj2d6npq-wCi5onYPjzQg4vyO4tFcPJJZr268cRw@mail.gmail.com Signed-off-by: Sean Christopherson Message-Id: <20220429010416.2788472-5-seanjc@google.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index c20f2d55840c..af4b5c0bf04e 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1139,7 +1139,6 @@ unsigned long gfn_to_hva_memslot_prot(struct kvm_memory_slot *slot, gfn_t gfn, bool *writable); void kvm_release_page_clean(struct page *page); void kvm_release_page_dirty(struct page *page); -void kvm_set_page_accessed(struct page *page); kvm_pfn_t gfn_to_pfn(struct kvm *kvm, gfn_t gfn); kvm_pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault, -- cgit From b1624f99aa8fedafcabf1b92fa51ed88dde14acb Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 29 Apr 2022 01:04:13 +0000 Subject: KVM: Remove kvm_vcpu_gfn_to_page() and kvm_vcpu_gpa_to_page() Drop helpers to convert a gfn/gpa to a 'struct page' in the context of a vCPU. KVM doesn't require that guests be backed by 'struct page' memory, thus any use of helpers that assume 'struct page' is bound to be flawed, as was the case for the recently removed last user in x86's nested VMX. No functional change intended. Signed-off-by: Sean Christopherson Message-Id: <20220429010416.2788472-8-seanjc@google.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index af4b5c0bf04e..4334789409c0 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1230,7 +1230,6 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn kvm_pfn_t kvm_vcpu_gfn_to_pfn_atomic(struct kvm_vcpu *vcpu, gfn_t gfn); kvm_pfn_t kvm_vcpu_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn); int kvm_vcpu_map(struct kvm_vcpu *vcpu, gpa_t gpa, struct kvm_host_map *map); -struct page *kvm_vcpu_gfn_to_page(struct kvm_vcpu *vcpu, gfn_t gfn); void kvm_vcpu_unmap(struct kvm_vcpu *vcpu, struct kvm_host_map *map, bool dirty); unsigned long kvm_vcpu_gfn_to_hva(struct kvm_vcpu *vcpu, gfn_t gfn); unsigned long kvm_vcpu_gfn_to_hva_prot(struct kvm_vcpu *vcpu, gfn_t gfn, bool *writable); @@ -1718,12 +1717,6 @@ static inline hpa_t pfn_to_hpa(kvm_pfn_t pfn) return (hpa_t)pfn << PAGE_SHIFT; } -static inline struct page *kvm_vcpu_gpa_to_page(struct kvm_vcpu *vcpu, - gpa_t gpa) -{ - return kvm_vcpu_gfn_to_page(vcpu, gpa_to_gfn(gpa)); -} - static inline bool kvm_is_error_gpa(struct kvm *kvm, gpa_t gpa) { unsigned long hva = gfn_to_hva(kvm, gpa_to_gfn(gpa)); -- cgit From 284dc49307738d2a897fd375431f741213cd0f27 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 29 Apr 2022 01:04:14 +0000 Subject: KVM: Take a 'struct page', not a pfn in kvm_is_zone_device_page() Operate on a 'struct page' instead of a pfn when checking if a page is a ZONE_DEVICE page, and rename the helper accordingly. Generally speaking, KVM doesn't actually care about ZONE_DEVICE memory, i.e. shouldn't do anything special for ZONE_DEVICE memory. Rather, KVM wants to treat ZONE_DEVICE memory like regular memory, and the need to identify ZONE_DEVICE memory only arises as an exception to PG_reserved pages. In other words, KVM should only ever check for ZONE_DEVICE memory after KVM has already verified that there is a struct page associated with the pfn. No functional change intended. Signed-off-by: Sean Christopherson Message-Id: <20220429010416.2788472-9-seanjc@google.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 4334789409c0..6fb433ac2ba1 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1571,7 +1571,7 @@ void kvm_arch_sync_events(struct kvm *kvm); int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu); bool kvm_is_reserved_pfn(kvm_pfn_t pfn); -bool kvm_is_zone_device_pfn(kvm_pfn_t pfn); +bool kvm_is_zone_device_page(struct page *page); struct kvm_irq_ack_notifier { struct hlist_node link; -- cgit From b14b2690c50e02145bb867dfcde8845eb17aa8a4 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Fri, 29 Apr 2022 01:04:15 +0000 Subject: KVM: Rename/refactor kvm_is_reserved_pfn() to kvm_pfn_to_refcounted_page() Rename and refactor kvm_is_reserved_pfn() to kvm_pfn_to_refcounted_page() to better reflect what KVM is actually checking, and to eliminate extra pfn_to_page() lookups. The kvm_release_pfn_*() an kvm_try_get_pfn() helpers in particular benefit from "refouncted" nomenclature, as it's not all that obvious why KVM needs to get/put refcounts for some PG_reserved pages (ZERO_PAGE and ZONE_DEVICE). Add a comment to call out that the list of exceptions to PG_reserved is all but guaranteed to be incomplete. The list has mostly been compiled by people throwing noodles at KVM and finding out they stick a little too well, e.g. the ZERO_PAGE's refcount overflowed and ZONE_DEVICE pages didn't get freed. No functional change intended. Signed-off-by: Sean Christopherson Message-Id: <20220429010416.2788472-10-seanjc@google.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 6fb433ac2ba1..a2bbdf3ab086 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1570,7 +1570,7 @@ void kvm_arch_sync_events(struct kvm *kvm); int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu); -bool kvm_is_reserved_pfn(kvm_pfn_t pfn); +struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn); bool kvm_is_zone_device_page(struct page *page); struct kvm_irq_ack_notifier { -- cgit From 7b0a0e3c3a88260b6fcb017e49f198463aa62ed1 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Thu, 14 Apr 2022 16:50:57 +0200 Subject: wifi: cfg80211: do some rework towards MLO link APIs In order to support multi-link operation with multiple links, start adding some APIs. The notable addition here is to have the link ID in a new nl80211 attribute, that will be used to differentiate the links in many nl80211 operations. So far, this patch adds the netlink NL80211_ATTR_MLO_LINK_ID attribute (as well as the NL80211_ATTR_MLO_LINKS attribute) and plugs it through the system in some places, checking the validity etc. along with other infrastructure needed for it. For now, I've decided to include only the over-the-air link ID in the API. I know we discussed that we eventually need to have to have other ways of identifying a link, but for local AP mode and auth/assoc commands as well as set_key etc. we'll use the OTA ID. Also included in this patch is some refactoring of the data structures in struct wireless_dev, splitting for the first time the data into type dependent pieces, to make reasoning about these things easier. Signed-off-by: Johannes Berg --- include/linux/ieee80211.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index 5c65ae6b8154..e15771965916 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -4376,4 +4376,7 @@ enum ieee80211_range_params_max_total_ltf { IEEE80211_RANGE_PARAMS_MAX_TOTAL_LTF_UNSPECIFIED, }; +/* multi-link device */ +#define IEEE80211_MLD_MAX_NUM_LINKS 15 + #endif /* LINUX_IEEE80211_H */ -- cgit From 0f48b8b88aa9ed7b65d7cb55dbc57ec914ddada1 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Tue, 31 May 2022 14:03:38 +0200 Subject: wifi: ieee80211: add definitions for multi-link element Add the definitions necessary to build and parse some of the multi-link element, the per-STA profile isn't fully included. Signed-off-by: Johannes Berg --- include/linux/ieee80211.h | 222 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 222 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index e15771965916..870f659087d6 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -4379,4 +4379,226 @@ enum ieee80211_range_params_max_total_ltf { /* multi-link device */ #define IEEE80211_MLD_MAX_NUM_LINKS 15 +#define IEEE80211_ML_CONTROL_TYPE 0x0007 +#define IEEE80211_ML_CONTROL_TYPE_BASIC 0 +#define IEEE80211_ML_CONTROL_TYPE_PREQ 1 +#define IEEE80211_ML_CONTROL_TYPE_RECONF 2 +#define IEEE80211_ML_CONTROL_TYPE_TDLS 3 +#define IEEE80211_ML_CONTROL_TYPE_PRIO_ACCESS 4 +#define IEEE80211_ML_CONTROL_PRESENCE_MASK 0xfff0 + +struct ieee80211_multi_link_elem { + __le16 control; + u8 variable[]; +} __packed; + +#define IEEE80211_MLC_BASIC_PRES_LINK_ID 0x0010 +#define IEEE80211_MLC_BASIC_PRES_BSS_PARAM_CH_CNT 0x0020 +#define IEEE80211_MLC_BASIC_PRES_MED_SYNC_DELAY 0x0040 +#define IEEE80211_MLC_BASIC_PRES_EML_CAPA 0x0080 +#define IEEE80211_MLC_BASIC_PRES_MLD_CAPA_OP 0x0100 +#define IEEE80211_MLC_BASIC_PRES_MLD_ID 0x0200 + +#define IEEE80211_MED_SYNC_DELAY_DURATION 0x00ff +#define IEEE80211_MED_SYNC_DELAY_SYNC_OFDM_ED_THRESH 0x0f00 +#define IEEE80211_MED_SYNC_DELAY_SYNC_MAX_NUM_TXOPS 0xf000 + +#define IEEE80211_EML_CAP_EMLSR_SUPP 0x0001 +#define IEEE80211_EML_CAP_EMLSR_PADDING_DELAY 0x000e +#define IEEE80211_EML_CAP_EMLSR_PADDING_DELAY_0US 0 +#define IEEE80211_EML_CAP_EMLSR_PADDING_DELAY_32US 1 +#define IEEE80211_EML_CAP_EMLSR_PADDING_DELAY_64US 2 +#define IEEE80211_EML_CAP_EMLSR_PADDING_DELAY_128US 3 +#define IEEE80211_EML_CAP_EMLSR_PADDING_DELAY_256US 4 +#define IEEE80211_EML_CAP_EMLSR_TRANSITION_DELAY 0x0070 +#define IEEE80211_EML_CAP_EMLSR_TRANSITION_DELAY_0US 0 +#define IEEE80211_EML_CAP_EMLSR_TRANSITION_DELAY_16US 1 +#define IEEE80211_EML_CAP_EMLSR_TRANSITION_DELAY_32US 2 +#define IEEE80211_EML_CAP_EMLSR_TRANSITION_DELAY_64US 3 +#define IEEE80211_EML_CAP_EMLSR_TRANSITION_DELAY_128US 4 +#define IEEE80211_EML_CAP_EMLSR_TRANSITION_DELAY_256US 5 +#define IEEE80211_EML_CAP_EMLMR_SUPPORT 0x0080 +#define IEEE80211_EML_CAP_EMLMR_DELAY 0x0700 +#define IEEE80211_EML_CAP_EMLMR_DELAY_0US 0 +#define IEEE80211_EML_CAP_EMLMR_DELAY_32US 1 +#define IEEE80211_EML_CAP_EMLMR_DELAY_64US 2 +#define IEEE80211_EML_CAP_EMLMR_DELAY_128US 3 +#define IEEE80211_EML_CAP_EMLMR_DELAY_256US 4 +#define IEEE80211_EML_CAP_TRANSITION_TIMEOUT 0x7800 +#define IEEE80211_EML_CAP_TRANSITION_TIMEOUT_0 0 +#define IEEE80211_EML_CAP_TRANSITION_TIMEOUT_128US 1 +#define IEEE80211_EML_CAP_TRANSITION_TIMEOUT_256US 2 +#define IEEE80211_EML_CAP_TRANSITION_TIMEOUT_512US 3 +#define IEEE80211_EML_CAP_TRANSITION_TIMEOUT_1TU 4 +#define IEEE80211_EML_CAP_TRANSITION_TIMEOUT_2TU 5 +#define IEEE80211_EML_CAP_TRANSITION_TIMEOUT_4TU 6 +#define IEEE80211_EML_CAP_TRANSITION_TIMEOUT_8TU 7 +#define IEEE80211_EML_CAP_TRANSITION_TIMEOUT_16TU 8 +#define IEEE80211_EML_CAP_TRANSITION_TIMEOUT_32TU 9 +#define IEEE80211_EML_CAP_TRANSITION_TIMEOUT_64TU 10 +#define IEEE80211_EML_CAP_TRANSITION_TIMEOUT_128TU 11 + +#define IEEE80211_MLD_CAP_OP_MAX_SIMUL_LINKS 0x000f +#define IEEE80211_MLD_CAP_OP_SRS_SUPPORT 0x0010 +#define IEEE80211_MLD_CAP_OP_TID_TO_LINK_MAP_NEG_SUPP 0x0060 +#define IEEE80211_MLD_CAP_OP_FREQ_SEP_TYPE_IND 0x0f80 +#define IEEE80211_MLD_CAP_OP_AAR_SUPPORT 0x1000 + +struct ieee80211_mle_basic_common_info { + u8 len; + u8 mld_mac_addr[ETH_ALEN]; + u8 variable[]; +} __packed; + +#define IEEE80211_MLC_PREQ_PRES_MLD_ID 0x0010 + +struct ieee80211_mle_preq_common_info { + u8 len; + u8 variable[]; +} __packed; + +#define IEEE80211_MLC_RECONF_PRES_MLD_MAC_ADDR 0x0010 + +/* no fixed fields in RECONF */ + +struct ieee80211_mle_tdls_common_info { + u8 len; + u8 ap_mld_mac_addr[ETH_ALEN]; +} __packed; + +#define IEEE80211_MLC_PRIO_ACCESS_PRES_AP_MLD_MAC_ADDR 0x0010 + +/* no fixed fields in PRIO_ACCESS */ + +/** + * ieee80211_mle_common_size - check multi-link element common size + * @data: multi-link element, must already be checked for size using + * ieee80211_mle_size_ok() + */ +static inline u8 ieee80211_mle_common_size(const u8 *data) +{ + const struct ieee80211_multi_link_elem *mle = (const void *)data; + u16 control = le16_to_cpu(mle->control); + u8 common = 0; + + switch (u16_get_bits(control, IEEE80211_ML_CONTROL_TYPE)) { + case IEEE80211_ML_CONTROL_TYPE_BASIC: + common += sizeof(struct ieee80211_mle_basic_common_info); + break; + case IEEE80211_ML_CONTROL_TYPE_PREQ: + common += sizeof(struct ieee80211_mle_preq_common_info); + break; + case IEEE80211_ML_CONTROL_TYPE_RECONF: + if (control & IEEE80211_MLC_RECONF_PRES_MLD_MAC_ADDR) + common += ETH_ALEN; + return common; + case IEEE80211_ML_CONTROL_TYPE_TDLS: + common += sizeof(struct ieee80211_mle_tdls_common_info); + break; + case IEEE80211_ML_CONTROL_TYPE_PRIO_ACCESS: + if (control & IEEE80211_MLC_PRIO_ACCESS_PRES_AP_MLD_MAC_ADDR) + common += ETH_ALEN; + return common; + default: + WARN_ON(1); + return 0; + } + + return common + mle->variable[0]; +} + +/** + * ieee80211_mle_size_ok - validate multi-link element size + * @data: pointer to the element data + * @len: length of the containing element + */ +static inline bool ieee80211_mle_size_ok(const u8 *data, u8 len) +{ + const struct ieee80211_multi_link_elem *mle = (const void *)data; + u8 fixed = sizeof(*mle); + u8 common = 0; + bool check_common_len = false; + u16 control; + + if (len < fixed) + return false; + + control = le16_to_cpu(mle->control); + + switch (u16_get_bits(control, IEEE80211_ML_CONTROL_TYPE)) { + case IEEE80211_ML_CONTROL_TYPE_BASIC: + common += sizeof(struct ieee80211_mle_basic_common_info); + check_common_len = true; + if (control & IEEE80211_MLC_BASIC_PRES_LINK_ID) + common += 1; + if (control & IEEE80211_MLC_BASIC_PRES_BSS_PARAM_CH_CNT) + common += 1; + if (control & IEEE80211_MLC_BASIC_PRES_MED_SYNC_DELAY) + common += 2; + if (control & IEEE80211_MLC_BASIC_PRES_EML_CAPA) + common += 2; + if (control & IEEE80211_MLC_BASIC_PRES_MLD_CAPA_OP) + common += 2; + if (control & IEEE80211_MLC_BASIC_PRES_MLD_ID) + common += 1; + break; + case IEEE80211_ML_CONTROL_TYPE_PREQ: + common += sizeof(struct ieee80211_mle_preq_common_info); + if (control & IEEE80211_MLC_PREQ_PRES_MLD_ID) + common += 1; + check_common_len = true; + break; + case IEEE80211_ML_CONTROL_TYPE_RECONF: + if (control & IEEE80211_MLC_RECONF_PRES_MLD_MAC_ADDR) + common += ETH_ALEN; + break; + case IEEE80211_ML_CONTROL_TYPE_TDLS: + common += sizeof(struct ieee80211_mle_tdls_common_info); + check_common_len = true; + break; + case IEEE80211_ML_CONTROL_TYPE_PRIO_ACCESS: + if (control & IEEE80211_MLC_PRIO_ACCESS_PRES_AP_MLD_MAC_ADDR) + common += ETH_ALEN; + break; + default: + /* we don't know this type */ + return true; + } + + if (len < fixed + common) + return false; + + if (!check_common_len) + return true; + + /* if present, common length is the first octet there */ + return mle->variable[0] >= common; +} + +enum ieee80211_mle_subelems { + IEEE80211_MLE_SUBELEM_PER_STA_PROFILE = 0, +}; + +#define IEEE80211_MLE_STA_CONTROL_LINK_ID 0x000f +#define IEEE80211_MLE_STA_CONTROL_COMPLETE_PROFILE 0x0010 +#define IEEE80211_MLE_STA_CONTROL_STA_MAC_ADDR_PRESENT 0x0020 +#define IEEE80211_MLE_STA_CONTROL_BEACON_INT_PRESENT 0x0040 +#define IEEE80211_MLE_STA_CONTROL_TSF_OFFS_PRESENT 0x0080 +#define IEEE80211_MLE_STA_CONTROL_DTIM_INFO_PRESENT 0x0100 +#define IEEE80211_MLE_STA_CONTROL_NSTR_LINK_PAIR_PRESENT 0x0200 +#define IEEE80211_MLE_STA_CONTROL_NSTR_BITMAP_SIZE 0x0400 +#define IEEE80211_MLE_STA_CONTROL_BSS_PARAM_CHANGE_CNT_PRESENT 0x0800 + +struct ieee80211_mle_per_sta_profile { + __le16 control; + u8 sta_info_len; + u8 variable[]; +} __packed; + +#define for_each_mle_subelement(_elem, _data, _len) \ + if (ieee80211_mle_size_ok(_data, _len)) \ + for_each_element(_elem, \ + _data + ieee80211_mle_common_size(_data),\ + _len - ieee80211_mle_common_size(_data)) + #endif /* LINUX_IEEE80211_H */ -- cgit From 965b57b469a589d64d81b1688b38dcb537011bb0 Mon Sep 17 00:00:00 2001 From: Cong Wang Date: Wed, 15 Jun 2022 09:20:12 -0700 Subject: net: Introduce a new proto_ops ->read_skb() Currently both splice() and sockmap use ->read_sock() to read skb from receive queue, but for sockmap we only read one entire skb at a time, so ->read_sock() is too conservative to use. Introduce a new proto_ops ->read_skb() which supports this sematic, with this we can finally pass the ownership of skb to recv actors. For non-TCP protocols, all ->read_sock() can be simply converted to ->read_skb(). Signed-off-by: Cong Wang Signed-off-by: Daniel Borkmann Reviewed-by: John Fastabend Link: https://lore.kernel.org/bpf/20220615162014.89193-3-xiyou.wangcong@gmail.com --- include/linux/net.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/net.h b/include/linux/net.h index 12093f4db50c..a03485e8cbb2 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -152,6 +152,8 @@ struct module; struct sk_buff; typedef int (*sk_read_actor_t)(read_descriptor_t *, struct sk_buff *, unsigned int, size_t); +typedef int (*skb_read_actor_t)(struct sock *, struct sk_buff *); + struct proto_ops { int family; @@ -214,6 +216,8 @@ struct proto_ops { */ int (*read_sock)(struct sock *sk, read_descriptor_t *desc, sk_read_actor_t recv_actor); + /* This is different from read_sock(), it reads an entire skb at a time. */ + int (*read_skb)(struct sock *sk, skb_read_actor_t recv_actor); int (*sendpage_locked)(struct sock *sk, struct page *page, int offset, size_t size, int flags); int (*sendmsg_locked)(struct sock *sk, struct msghdr *msg, -- cgit From 414c12385d4741e35d88670c6cc2f40a77809734 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 13 Apr 2022 15:17:25 -0700 Subject: rcu: Provide a get_completed_synchronize_rcu() function It is currently up to the caller to handle stale return values from get_state_synchronize_rcu(). If poll_state_synchronize_rcu() returned true once, a grace period has elapsed, regardless of the fact that counter wrap might cause some future poll_state_synchronize_rcu() invocation to return false. For example, the caller might store a separate flag that indicates whether some previous call to poll_state_synchronize_rcu() determined that the relevant grace period had already ended. This approach works, but it requires extra storage and is easy to get wrong. This commit therefore introduces a get_completed_synchronize_rcu() that returns a cookie that causes poll_state_synchronize_rcu() to always return true. This already-completed cookie can be stored in place of the cookie that previously caused poll_state_synchronize_rcu() to return true. It can also be used to flag a given structure as not having been exposed to readers, and thus not requiring a grace period to elapse. This commit is in preparation for polled expedited grace periods. Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/ Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing Cc: Brian Foster Cc: Dave Chinner Cc: Al Viro Cc: Ian Kent Signed-off-by: Paul E. McKenney --- include/linux/rcupdate.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 1a32036c918c..7f12daa4708b 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -41,6 +41,7 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func); void rcu_barrier_tasks(void); void rcu_barrier_tasks_rude(void); void synchronize_rcu(void); +unsigned long get_completed_synchronize_rcu(void); #ifdef CONFIG_PREEMPT_RCU -- cgit From 3847b64570b1753e9863a4106aec03f4d68e7b17 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 23 May 2022 20:50:11 -0700 Subject: rcu-tasks: Merge state into .b.need_qs and atomically update This commit gets rid of the task_struct structure's ->trc_reader_checked field, making it instead be a bit within the task_struct structure's existing ->trc_reader_special.b.need_qs field. This commit also atomically loads, stores, and checks the resulting combination of the reader-checked and need-quiescent state flags. This will in turn allow significant simplification of the rcu_tasks_trace_postgp() function as well as elimination of the trc_n_readers_need_end counter in later commits. These changes will in turn simplify later elimination of the RCU Tasks Trace scan of the task list, which will make RCU Tasks Trace grace periods less CPU-intensive. [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Paul E. McKenney Cc: Neeraj Upadhyay Cc: Eric Dumazet Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: Martin KaFai Lau Cc: KP Singh --- include/linux/rcupdate.h | 18 +++++++++++------- include/linux/sched.h | 1 - 2 files changed, 11 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 1a32036c918c..1e728d544fc1 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -169,13 +169,17 @@ void synchronize_rcu_tasks(void); # endif # ifdef CONFIG_TASKS_TRACE_RCU -# define rcu_tasks_trace_qs(t) \ - do { \ - if (!likely(READ_ONCE((t)->trc_reader_checked)) && \ - !unlikely(READ_ONCE((t)->trc_reader_nesting))) { \ - smp_store_release(&(t)->trc_reader_checked, true); \ - smp_mb(); /* Readers partitioned by store. */ \ - } \ +// Bits for ->trc_reader_special.b.need_qs field. +#define TRC_NEED_QS 0x1 // Task needs a quiescent state. +#define TRC_NEED_QS_CHECKED 0x2 // Task has been checked for needing quiescent state. + +u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new); + +# define rcu_tasks_trace_qs(t) \ + do { \ + if (likely(!READ_ONCE((t)->trc_reader_special.b.need_qs)) && \ + likely(!READ_ONCE((t)->trc_reader_nesting))) \ + rcu_trc_cmpxchg_need_qs((t), 0, TRC_NEED_QS_CHECKED); \ } while (0) # else # define rcu_tasks_trace_qs(t) do { } while (0) diff --git a/include/linux/sched.h b/include/linux/sched.h index c46f3a63b758..e6eb5871593e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -843,7 +843,6 @@ struct task_struct { int trc_reader_nesting; int trc_ipi_to_cpu; union rcu_special trc_reader_special; - bool trc_reader_checked; struct list_head trc_holdout_list; #endif /* #ifdef CONFIG_TASKS_TRACE_RCU */ -- cgit From 434c9eefb959c36331a93617ea95df903469b99f Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Mon, 16 May 2022 17:56:16 -0700 Subject: rcu-tasks: Add data structures for lightweight grace periods This commit adds fields to task_struct and to rcu_tasks_percpu that will be used to avoid the task-list scan for RCU Tasks Trace grace periods, and also initializes these fields. Signed-off-by: Paul E. McKenney Cc: Neeraj Upadhyay Cc: Eric Dumazet Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: Martin KaFai Lau Cc: KP Singh --- include/linux/sched.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index e6eb5871593e..b88caf54e168 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -844,6 +844,8 @@ struct task_struct { int trc_ipi_to_cpu; union rcu_special trc_reader_special; struct list_head trc_holdout_list; + struct list_head trc_blkd_node; + int trc_blkd_cpu; #endif /* #ifdef CONFIG_TASKS_TRACE_RCU */ struct sched_info sched_info; -- cgit From 0356d4e66214569de674ab2684f2e0b440a466ab Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 17 May 2022 11:30:32 -0700 Subject: rcu-tasks: Track blocked RCU Tasks Trace readers This commit places any task that has ever blocked within its current RCU Tasks Trace read-side critical section on a per-CPU list within the rcu_tasks_percpu structure. Tasks are removed from this list when they exit by the exit_tasks_rcu_finish_trace() function. The purpose of this commit is to provide the information needed to eliminate the current scan of the full task list. This commit offsets the INT_MIN value for ->trc_reader_nesting with the new nesting level in order to avoid queueing tasks that are exiting their read-side critical sections. [ paulmck: Apply kernel test robot feedback. ] [ paulmck: Apply feedback from syzbot+9bb26e7c5e8e4fa7e641@syzkaller.appspotmail.com ] Signed-off-by: Paul E. McKenney Tested-by: syzbot Tested-by: "Zhang, Qiang1" Cc: Peter Zijlstra Cc: Neeraj Upadhyay Cc: Eric Dumazet Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: Martin KaFai Lau Cc: KP Singh --- include/linux/rcupdate.h | 11 +++++++++-- include/linux/rcupdate_trace.h | 2 +- 2 files changed, 10 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 1e728d544fc1..ebdfeead44e5 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -174,12 +174,19 @@ void synchronize_rcu_tasks(void); #define TRC_NEED_QS_CHECKED 0x2 // Task has been checked for needing quiescent state. u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new); +void rcu_tasks_trace_qs_blkd(struct task_struct *t); # define rcu_tasks_trace_qs(t) \ do { \ + int ___rttq_nesting = READ_ONCE((t)->trc_reader_nesting); \ + \ if (likely(!READ_ONCE((t)->trc_reader_special.b.need_qs)) && \ - likely(!READ_ONCE((t)->trc_reader_nesting))) \ + likely(!___rttq_nesting)) { \ rcu_trc_cmpxchg_need_qs((t), 0, TRC_NEED_QS_CHECKED); \ + } else if (___rttq_nesting && ___rttq_nesting != INT_MIN && \ + !READ_ONCE((t)->trc_reader_special.b.blocked)) { \ + rcu_tasks_trace_qs_blkd(t); \ + } \ } while (0) # else # define rcu_tasks_trace_qs(t) do { } while (0) @@ -188,7 +195,7 @@ u8 rcu_trc_cmpxchg_need_qs(struct task_struct *t, u8 old, u8 new); #define rcu_tasks_qs(t, preempt) \ do { \ rcu_tasks_classic_qs((t), (preempt)); \ - rcu_tasks_trace_qs((t)); \ + rcu_tasks_trace_qs(t); \ } while (0) # ifdef CONFIG_TASKS_RUDE_RCU diff --git a/include/linux/rcupdate_trace.h b/include/linux/rcupdate_trace.h index 6f9c35817398..9bc8cbb33340 100644 --- a/include/linux/rcupdate_trace.h +++ b/include/linux/rcupdate_trace.h @@ -75,7 +75,7 @@ static inline void rcu_read_unlock_trace(void) nesting = READ_ONCE(t->trc_reader_nesting) - 1; barrier(); // Critical section before disabling. // Disable IPI-based setting of .need_qs. - WRITE_ONCE(t->trc_reader_nesting, INT_MIN); + WRITE_ONCE(t->trc_reader_nesting, INT_MIN + nesting); if (likely(!READ_ONCE(t->trc_reader_special.s)) || nesting) { WRITE_ONCE(t->trc_reader_nesting, nesting); return; // We assume shallow reader nesting. -- cgit From 5f8a62af527ad8a615d68889d816b72c3f5f227f Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:18 +0200 Subject: context_tracking: Remove unused context_tracking_in_user() This function is not used and CT_WARN_ON() coupled with ct_state() is the preferred way to assert context tracking state values. Reported-by: Nicolas Saenz Julienne Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Frederic Weisbecker Signed-off-by: Paul E. McKenney --- include/linux/context_tracking_state.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h index ae1e63e26947..edc7b46376a6 100644 --- a/include/linux/context_tracking_state.h +++ b/include/linux/context_tracking_state.h @@ -41,12 +41,7 @@ static inline bool context_tracking_enabled_this_cpu(void) return context_tracking_enabled() && __this_cpu_read(context_tracking.active); } -static __always_inline bool context_tracking_in_user(void) -{ - return __this_cpu_read(context_tracking.state) == CONTEXT_USER; -} #else -static __always_inline bool context_tracking_in_user(void) { return false; } static __always_inline bool context_tracking_enabled(void) { return false; } static __always_inline bool context_tracking_enabled_cpu(int cpu) { return false; } static __always_inline bool context_tracking_enabled_this_cpu(void) { return false; } -- cgit From 1ade23711971b0eececf0d7fedc29d3c1d2fce01 Mon Sep 17 00:00:00 2001 From: Eduard Zingerman Date: Tue, 21 Jun 2022 02:53:42 +0300 Subject: bpf: Inline calls to bpf_loop when callback is known Calls to `bpf_loop` are replaced with direct loops to avoid indirection. E.g. the following: bpf_loop(10, foo, NULL, 0); Is replaced by equivalent of the following: for (int i = 0; i < 10; ++i) foo(i, NULL); This transformation could be applied when: - callback is known and does not change during program execution; - flags passed to `bpf_loop` are always zero. Inlining logic works as follows: - During execution simulation function `update_loop_inline_state` tracks the following information for each `bpf_loop` call instruction: - is callback known and constant? - are flags constant and zero? - Function `optimize_bpf_loop` increases stack depth for functions where `bpf_loop` calls can be inlined and invokes `inline_bpf_loop` to apply the inlining. The additional stack space is used to spill registers R6, R7 and R8. These registers are used as loop counter, loop maximal bound and callback context parameter; Measurements using `benchs/run_bench_bpf_loop.sh` inside QEMU / KVM on i7-4710HQ CPU show a drop in latency from 14 ns/op to 2 ns/op. Signed-off-by: Eduard Zingerman Acked-by: Song Liu Link: https://lore.kernel.org/r/20220620235344.569325-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 3 +++ include/linux/bpf_verifier.h | 12 ++++++++++++ 2 files changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 0edd7d2c0064..d05e1495a06e 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1286,6 +1286,9 @@ struct bpf_array { #define BPF_COMPLEXITY_LIMIT_INSNS 1000000 /* yes. 1M insns */ #define MAX_TAIL_CALL_CNT 33 +/* Maximum number of loops for bpf_loop */ +#define BPF_MAX_LOOPS BIT(23) + #define BPF_F_ACCESS_MASK (BPF_F_RDONLY | \ BPF_F_RDONLY_PROG | \ BPF_F_WRONLY | \ diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 3930c963fa67..81b19669efba 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -344,6 +344,14 @@ struct bpf_verifier_state_list { int miss_cnt, hit_cnt; }; +struct bpf_loop_inline_state { + int initialized:1; /* set to true upon first entry */ + int fit_for_inline:1; /* true if callback function is the same + * at each call and flags are always zero + */ + u32 callback_subprogno; /* valid when fit_for_inline is true */ +}; + /* Possible states for alu_state member. */ #define BPF_ALU_SANITIZE_SRC (1U << 0) #define BPF_ALU_SANITIZE_DST (1U << 1) @@ -373,6 +381,10 @@ struct bpf_insn_aux_data { u32 mem_size; /* mem_size for non-struct typed var */ }; } btf_var; + /* if instruction is a call to bpf_loop this field tracks + * the state of the relevant registers to make decision about inlining + */ + struct bpf_loop_inline_state loop_inline_state; }; u64 map_key_state; /* constant (32 bit) key tracking for maps */ int ctx_field_size; /* the ctx field size for load insn, maybe 0 */ -- cgit From f9aefd6b2aa38b9787d2705f0f1161dfd23cdb8f Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Mon, 20 Jun 2022 02:30:17 -0700 Subject: net: warn if mac header was not set Make sure skb_mac_header(), skb_mac_offset() and skb_mac_header_len() uses are not fooled if the mac header has not been set. These checks are enabled if CONFIG_DEBUG_NET=y This commit will likely expose existing bugs in linux networking stacks. Signed-off-by: Eric Dumazet Link: https://lore.kernel.org/r/20220620093017.3366713-1-eric.dumazet@gmail.com Signed-off-by: Paolo Abeni --- include/linux/skbuff.h | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 82edf0359ab3..cd4a8268894a 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2763,8 +2763,14 @@ static inline void skb_set_network_header(struct sk_buff *skb, const int offset) skb->network_header += offset; } +static inline int skb_mac_header_was_set(const struct sk_buff *skb) +{ + return skb->mac_header != (typeof(skb->mac_header))~0U; +} + static inline unsigned char *skb_mac_header(const struct sk_buff *skb) { + DEBUG_NET_WARN_ON_ONCE(!skb_mac_header_was_set(skb)); return skb->head + skb->mac_header; } @@ -2775,14 +2781,10 @@ static inline int skb_mac_offset(const struct sk_buff *skb) static inline u32 skb_mac_header_len(const struct sk_buff *skb) { + DEBUG_NET_WARN_ON_ONCE(!skb_mac_header_was_set(skb)); return skb->network_header - skb->mac_header; } -static inline int skb_mac_header_was_set(const struct sk_buff *skb) -{ - return skb->mac_header != (typeof(skb->mac_header))~0U; -} - static inline void skb_unset_mac_header(struct sk_buff *skb) { skb->mac_header = (typeof(skb->mac_header))~0U; -- cgit From a37599ebfb656c2af4ca119de556eba29b6926d6 Mon Sep 17 00:00:00 2001 From: Prashant Malani Date: Wed, 15 Jun 2022 17:20:18 +0000 Subject: usb: typec: mux: Add CONFIG guards for functions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There are some drivers that can use the Type C mux API, but don't have to. Introduce CONFIG guards for the mux functions so that drivers can include the header file and not run into compilation errors on systems which don't have CONFIG_TYPEC enabled. When CONFIG_TYPEC is not enabled, the Type C mux functions will be stub versions of the original calls. Reported-by: kernel test robot Reviewed-by: Nícolas F. R. A. Prado Reviewed-by: Heikki Krogerus Reviewed-by: AngeloGioacchino Del Regno Tested-by: Nícolas F. R. A. Prado Signed-off-by: Prashant Malani Link: https://lore.kernel.org/r/20220615172129.1314056-3-pmalani@chromium.org Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/typec_mux.h | 44 +++++++++++++++++++++++++++++++++++++------ 1 file changed, 38 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/usb/typec_mux.h b/include/linux/usb/typec_mux.h index ee57781dcf28..9292f0e07846 100644 --- a/include/linux/usb/typec_mux.h +++ b/include/linux/usb/typec_mux.h @@ -58,17 +58,13 @@ struct typec_mux_desc { void *drvdata; }; +#if IS_ENABLED(CONFIG_TYPEC) + struct typec_mux *fwnode_typec_mux_get(struct fwnode_handle *fwnode, const struct typec_altmode_desc *desc); void typec_mux_put(struct typec_mux *mux); int typec_mux_set(struct typec_mux *mux, struct typec_mux_state *state); -static inline struct typec_mux * -typec_mux_get(struct device *dev, const struct typec_altmode_desc *desc) -{ - return fwnode_typec_mux_get(dev_fwnode(dev), desc); -} - struct typec_mux_dev * typec_mux_register(struct device *parent, const struct typec_mux_desc *desc); void typec_mux_unregister(struct typec_mux_dev *mux); @@ -76,4 +72,40 @@ void typec_mux_unregister(struct typec_mux_dev *mux); void typec_mux_set_drvdata(struct typec_mux_dev *mux, void *data); void *typec_mux_get_drvdata(struct typec_mux_dev *mux); +#else + +static inline struct typec_mux *fwnode_typec_mux_get(struct fwnode_handle *fwnode, + const struct typec_altmode_desc *desc) +{ + return NULL; +} + +static inline void typec_mux_put(struct typec_mux *mux) {} + +static inline int typec_mux_set(struct typec_mux *mux, struct typec_mux_state *state) +{ + return 0; +} + +static inline struct typec_mux_dev * +typec_mux_register(struct device *parent, const struct typec_mux_desc *desc) +{ + return ERR_PTR(-EOPNOTSUPP); +} +static inline void typec_mux_unregister(struct typec_mux_dev *mux) {} + +static inline void typec_mux_set_drvdata(struct typec_mux_dev *mux, void *data) {} +static inline void *typec_mux_get_drvdata(struct typec_mux_dev *mux) +{ + return ERR_PTR(-EOPNOTSUPP); +} + +#endif /* CONFIG_TYPEC */ + +static inline struct typec_mux * +typec_mux_get(struct device *dev, const struct typec_altmode_desc *desc) +{ + return fwnode_typec_mux_get(dev_fwnode(dev), desc); +} + #endif /* __USB_TYPEC_MUX */ -- cgit From 95acd8817e66d031d2e6ee7def3f1e1874819317 Mon Sep 17 00:00:00 2001 From: Tony Ambardar Date: Fri, 17 Jun 2022 12:57:34 +0200 Subject: bpf, x64: Add predicate for bpf2bpf with tailcalls support in JIT The BPF core/verifier is hard-coded to permit mixing bpf2bpf and tail calls for only x86-64. Change the logic to instead rely on a new weak function 'bool bpf_jit_supports_subprog_tailcalls(void)', which a capable JIT backend can override. Update the x86-64 eBPF JIT to reflect this. Signed-off-by: Tony Ambardar [jakub: drop MIPS bits and tweak patch subject] Signed-off-by: Jakub Sitnicki Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20220617105735.733938-2-jakub@cloudflare.com --- include/linux/filter.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/filter.h b/include/linux/filter.h index d0cbb31b1b4d..4c1a8b247545 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -914,6 +914,7 @@ u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *prog); void bpf_jit_compile(struct bpf_prog *prog); bool bpf_jit_needs_zext(void); +bool bpf_jit_supports_subprog_tailcalls(void); bool bpf_jit_supports_kfunc_call(void); bool bpf_helper_changes_pkt_data(void *func); -- cgit From 77515ebaf01920e2db49e04672ef669a7c2907f2 Mon Sep 17 00:00:00 2001 From: Duoming Zhou Date: Tue, 7 Jun 2022 11:26:25 +0800 Subject: devcoredump: remove the useless gfp_t parameter in dev_coredumpv and dev_coredumpm The dev_coredumpv() and dev_coredumpm() could not be used in atomic context, because they call kvasprintf_const() and kstrdup() with GFP_KERNEL parameter. The process is shown below: dev_coredumpv(.., gfp_t gfp) dev_coredumpm(.., gfp_t gfp) dev_set_name kobject_set_name_vargs kvasprintf_const(GFP_KERNEL, ...); //may sleep kstrdup(s, GFP_KERNEL); //may sleep This patch removes gfp_t parameter of dev_coredumpv() and dev_coredumpm() and changes the gfp_t parameter of kzalloc() in dev_coredumpm() to GFP_KERNEL in order to show they could not be used in atomic context. Fixes: 833c95456a70 ("device coredump: add new device coredump class") Reviewed-by: Brian Norris Reviewed-by: Johannes Berg Signed-off-by: Duoming Zhou Link: https://lore.kernel.org/r/df72af3b1862bac7d8e793d1f3931857d3779dfd.1654569290.git.duoming@zju.edu.cn Signed-off-by: Greg Kroah-Hartman --- include/linux/devcoredump.h | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/devcoredump.h b/include/linux/devcoredump.h index c008169ed2c6..c7d840d824c3 100644 --- a/include/linux/devcoredump.h +++ b/include/linux/devcoredump.h @@ -52,27 +52,26 @@ static inline void _devcd_free_sgtable(struct scatterlist *table) #ifdef CONFIG_DEV_COREDUMP -void dev_coredumpv(struct device *dev, void *data, size_t datalen, - gfp_t gfp); +void dev_coredumpv(struct device *dev, void *data, size_t datalen); void dev_coredumpm(struct device *dev, struct module *owner, - void *data, size_t datalen, gfp_t gfp, + void *data, size_t datalen, ssize_t (*read)(char *buffer, loff_t offset, size_t count, void *data, size_t datalen), void (*free)(void *data)); void dev_coredumpsg(struct device *dev, struct scatterlist *table, - size_t datalen, gfp_t gfp); + size_t datalen); #else static inline void dev_coredumpv(struct device *dev, void *data, - size_t datalen, gfp_t gfp) + size_t datalen) { vfree(data); } static inline void dev_coredumpm(struct device *dev, struct module *owner, - void *data, size_t datalen, gfp_t gfp, + void *data, size_t datalen, ssize_t (*read)(char *buffer, loff_t offset, size_t count, void *data, size_t datalen), void (*free)(void *data)) @@ -81,7 +80,7 @@ dev_coredumpm(struct device *dev, struct module *owner, } static inline void dev_coredumpsg(struct device *dev, struct scatterlist *table, - size_t datalen, gfp_t gfp) + size_t datalen) { _devcd_free_sgtable(table); } -- cgit From e386b6725798eec07facedf4d4bb710c079fd25c Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 2 Jun 2022 17:30:01 -0700 Subject: rcu-tasks: Eliminate RCU Tasks Trace IPIs to online CPUs Currently, the RCU Tasks Trace grace-period kthread IPIs each online CPU using smp_call_function_single() in order to track any tasks currently in RCU Tasks Trace read-side critical sections during which the corresponding task has neither blocked nor been preempted. These IPIs are annoying and are also not strictly necessary because any task that blocks or is preempted within its current RCU Tasks Trace read-side critical section will be tracked on one of the per-CPU rcu_tasks_percpu structure's ->rtp_blkd_tasks list. So the only time that this is a problem is if one of the CPUs runs through a long-duration RCU Tasks Trace read-side critical section without a context switch. Note that the task_call_func() function cannot help here because there is no safe way to identify the target task. Of course, the task_call_func() function will be very useful later, when processing the list of tasks, but it needs to know the task. This commit therefore creates a cpu_curr_snapshot() function that returns a pointer the task_struct structure of some task that happened to be running on the specified CPU more or less during the time that the cpu_curr_snapshot() function was executing. If there was no context switch during this time, this function will return a pointer to the task_struct structure of the task that was running throughout. If there was a context switch, then the outgoing task will be taken care of by RCU's context-switch hook, and the incoming task was either already taken care during some previous context switch, or it is not currently within an RCU Tasks Trace read-side critical section. And in this latter case, the grace period already started, so there is no need to wait on this task. This new cpu_curr_snapshot() function is invoked on each CPU early in the RCU Tasks Trace grace-period processing, and the resulting tasks are queued for later quiescent-state inspection. Signed-off-by: Paul E. McKenney Cc: Peter Zijlstra Cc: Neeraj Upadhyay Cc: Eric Dumazet Cc: Alexei Starovoitov Cc: Andrii Nakryiko Cc: Martin KaFai Lau Cc: KP Singh --- include/linux/sched.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index b88caf54e168..72242bc73d85 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2224,6 +2224,7 @@ static inline void set_task_cpu(struct task_struct *p, unsigned int cpu) extern bool sched_task_on_rq(struct task_struct *p); extern unsigned long get_wchan(struct task_struct *p); +extern struct task_struct *cpu_curr_snapshot(int cpu); /* * In order to reduce various lock holder preemption latencies provide an -- cgit From 0ffc781a19ed3030c792ad1a0e44e6e047bb9adc Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:20 +0200 Subject: context_tracking: Rename __context_tracking_enter/exit() to __ct_user_enter/exit() The context tracking namespace is going to expand and some new functions will require even longer names. Start shrinking the context_tracking prefix to "ct" as is already the case for some existing macros, this will make the introduction of new functions easier. Acked-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Paul E. McKenney Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne --- include/linux/context_tracking.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h index 7a14807c9d1a..773035124bad 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -14,8 +14,8 @@ extern void context_tracking_cpu_set(int cpu); /* Called with interrupts disabled. */ -extern void __context_tracking_enter(enum ctx_state state); -extern void __context_tracking_exit(enum ctx_state state); +extern void __ct_user_enter(enum ctx_state state); +extern void __ct_user_exit(enum ctx_state state); extern void context_tracking_enter(enum ctx_state state); extern void context_tracking_exit(enum ctx_state state); @@ -38,13 +38,13 @@ static inline void user_exit(void) static __always_inline void user_enter_irqoff(void) { if (context_tracking_enabled()) - __context_tracking_enter(CONTEXT_USER); + __ct_user_enter(CONTEXT_USER); } static __always_inline void user_exit_irqoff(void) { if (context_tracking_enabled()) - __context_tracking_exit(CONTEXT_USER); + __ct_user_exit(CONTEXT_USER); } static inline enum ctx_state exception_enter(void) @@ -74,7 +74,7 @@ static inline void exception_exit(enum ctx_state prev_ctx) static __always_inline bool context_tracking_guest_enter(void) { if (context_tracking_enabled()) - __context_tracking_enter(CONTEXT_GUEST); + __ct_user_enter(CONTEXT_GUEST); return context_tracking_enabled_this_cpu(); } @@ -82,7 +82,7 @@ static __always_inline bool context_tracking_guest_enter(void) static __always_inline void context_tracking_guest_exit(void) { if (context_tracking_enabled()) - __context_tracking_exit(CONTEXT_GUEST); + __ct_user_exit(CONTEXT_GUEST); } /** -- cgit From e244a46a529a9a4c43ae3a2b4bf613e260ec8f81 Mon Sep 17 00:00:00 2001 From: Maximilian Luz Date: Tue, 14 Jun 2022 21:41:17 +0200 Subject: platform/surface: aggregator: Reserve more event- and target-categories With the introduction of the Surface Laptop Studio, more event- and target categories have been added. Therefore, increase the number of reserved events and extend the enum of know target categories to accommodate this. Signed-off-by: Maximilian Luz Link: https://lore.kernel.org/r/20220614194117.4118897-1-luzmaximilian@gmail.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/surface_aggregator/serial_hub.h | 75 ++++++++++++++------------- 1 file changed, 40 insertions(+), 35 deletions(-) (limited to 'include/linux') diff --git a/include/linux/surface_aggregator/serial_hub.h b/include/linux/surface_aggregator/serial_hub.h index 26b95ec12733..45501b6e54e8 100644 --- a/include/linux/surface_aggregator/serial_hub.h +++ b/include/linux/surface_aggregator/serial_hub.h @@ -201,7 +201,7 @@ static inline u16 ssh_crc(const u8 *buf, size_t len) * exception of zero, which is not an event ID. Thus, this is also the * absolute maximum number of event handlers that can be registered. */ -#define SSH_NUM_EVENTS 34 +#define SSH_NUM_EVENTS 38 /* * SSH_NUM_TARGETS - The number of communication targets used in the protocol. @@ -292,40 +292,45 @@ struct ssam_span { * Windows driver. */ enum ssam_ssh_tc { - /* Category 0x00 is invalid for EC use. */ - SSAM_SSH_TC_SAM = 0x01, /* Generic system functionality, real-time clock. */ - SSAM_SSH_TC_BAT = 0x02, /* Battery/power subsystem. */ - SSAM_SSH_TC_TMP = 0x03, /* Thermal subsystem. */ - SSAM_SSH_TC_PMC = 0x04, - SSAM_SSH_TC_FAN = 0x05, - SSAM_SSH_TC_PoM = 0x06, - SSAM_SSH_TC_DBG = 0x07, - SSAM_SSH_TC_KBD = 0x08, /* Legacy keyboard (Laptop 1/2). */ - SSAM_SSH_TC_FWU = 0x09, - SSAM_SSH_TC_UNI = 0x0a, - SSAM_SSH_TC_LPC = 0x0b, - SSAM_SSH_TC_TCL = 0x0c, - SSAM_SSH_TC_SFL = 0x0d, - SSAM_SSH_TC_KIP = 0x0e, /* Manages detachable peripherals (Pro X/8 keyboard cover) */ - SSAM_SSH_TC_EXT = 0x0f, - SSAM_SSH_TC_BLD = 0x10, - SSAM_SSH_TC_BAS = 0x11, /* Detachment system (Surface Book 2/3). */ - SSAM_SSH_TC_SEN = 0x12, - SSAM_SSH_TC_SRQ = 0x13, - SSAM_SSH_TC_MCU = 0x14, - SSAM_SSH_TC_HID = 0x15, /* Generic HID input subsystem. */ - SSAM_SSH_TC_TCH = 0x16, - SSAM_SSH_TC_BKL = 0x17, - SSAM_SSH_TC_TAM = 0x18, - SSAM_SSH_TC_ACC = 0x19, - SSAM_SSH_TC_UFI = 0x1a, - SSAM_SSH_TC_USC = 0x1b, - SSAM_SSH_TC_PEN = 0x1c, - SSAM_SSH_TC_VID = 0x1d, - SSAM_SSH_TC_AUD = 0x1e, - SSAM_SSH_TC_SMC = 0x1f, - SSAM_SSH_TC_KPD = 0x20, - SSAM_SSH_TC_REG = 0x21, /* Extended event registry. */ + /* Category 0x00 is invalid for EC use. */ + SSAM_SSH_TC_SAM = 0x01, /* Generic system functionality, real-time clock. */ + SSAM_SSH_TC_BAT = 0x02, /* Battery/power subsystem. */ + SSAM_SSH_TC_TMP = 0x03, /* Thermal subsystem. */ + SSAM_SSH_TC_PMC = 0x04, + SSAM_SSH_TC_FAN = 0x05, + SSAM_SSH_TC_PoM = 0x06, + SSAM_SSH_TC_DBG = 0x07, + SSAM_SSH_TC_KBD = 0x08, /* Legacy keyboard (Laptop 1/2). */ + SSAM_SSH_TC_FWU = 0x09, + SSAM_SSH_TC_UNI = 0x0a, + SSAM_SSH_TC_LPC = 0x0b, + SSAM_SSH_TC_TCL = 0x0c, + SSAM_SSH_TC_SFL = 0x0d, + SSAM_SSH_TC_KIP = 0x0e, /* Manages detachable peripherals (Pro X/8 keyboard cover) */ + SSAM_SSH_TC_EXT = 0x0f, + SSAM_SSH_TC_BLD = 0x10, + SSAM_SSH_TC_BAS = 0x11, /* Detachment system (Surface Book 2/3). */ + SSAM_SSH_TC_SEN = 0x12, + SSAM_SSH_TC_SRQ = 0x13, + SSAM_SSH_TC_MCU = 0x14, + SSAM_SSH_TC_HID = 0x15, /* Generic HID input subsystem. */ + SSAM_SSH_TC_TCH = 0x16, + SSAM_SSH_TC_BKL = 0x17, + SSAM_SSH_TC_TAM = 0x18, + SSAM_SSH_TC_ACC0 = 0x19, + SSAM_SSH_TC_UFI = 0x1a, + SSAM_SSH_TC_USC = 0x1b, + SSAM_SSH_TC_PEN = 0x1c, + SSAM_SSH_TC_VID = 0x1d, + SSAM_SSH_TC_AUD = 0x1e, + SSAM_SSH_TC_SMC = 0x1f, + SSAM_SSH_TC_KPD = 0x20, + SSAM_SSH_TC_REG = 0x21, /* Extended event registry. */ + SSAM_SSH_TC_SPT = 0x22, + SSAM_SSH_TC_SYS = 0x23, + SSAM_SSH_TC_ACC1 = 0x24, + SSAM_SSH_TC_SHB = 0x25, + SSAM_SSH_TC_POS = 0x26, /* For obtaining Laptop Studio screen position. */ }; -- cgit From 1a3c7d0841ae24d28a15bbf87ee7d08614cec957 Mon Sep 17 00:00:00 2001 From: Dongli Zhang Date: Sat, 11 Jun 2022 01:25:11 -0700 Subject: swiotlb: remove the unused swiotlb_force declaration The 'swiotlb_force' is removed since commit c6af2aa9ffc9 ("swiotlb: make the swiotlb_init interface more useful"). Signed-off-by: Dongli Zhang Signed-off-by: Christoph Hellwig --- include/linux/swiotlb.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index 7ed35dd3de6e..bdc58a0e20b1 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -60,7 +60,6 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys, size_t size, enum dma_data_direction dir, unsigned long attrs); #ifdef CONFIG_SWIOTLB -extern enum swiotlb_force swiotlb_force; /** * struct io_tlb_mem - IO TLB Memory Pool Descriptor -- cgit From 0829c35dc5346e90f428de61896362b51ab58296 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Sat, 21 May 2022 13:02:03 +0200 Subject: pwm: Drop support for legacy drivers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There are no drivers left providing the legacy callbacks. So drop support for these. If this commit breaks your out-of-tree pwm driver, look at e.g. commit ec00cd5e63f0 ("pwm: renesas-tpu: Implement .apply() callback") for an example of the needed conversion for your driver. Signed-off-by: Uwe Kleine-König Signed-off-by: Thierry Reding --- include/linux/pwm.h | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pwm.h b/include/linux/pwm.h index 9771a0761a40..76463b71ad4e 100644 --- a/include/linux/pwm.h +++ b/include/linux/pwm.h @@ -261,10 +261,6 @@ pwm_set_relative_duty_cycle(struct pwm_state *state, unsigned int duty_cycle, * called once per PWM device when the PWM chip is * registered. * @owner: helps prevent removal of modules exporting active PWMs - * @config: configure duty cycles and period length for this PWM - * @set_polarity: configure the polarity of this PWM - * @enable: enable PWM output toggling - * @disable: disable PWM output toggling */ struct pwm_ops { int (*request)(struct pwm_chip *chip, struct pwm_device *pwm); @@ -276,14 +272,6 @@ struct pwm_ops { void (*get_state)(struct pwm_chip *chip, struct pwm_device *pwm, struct pwm_state *state); struct module *owner; - - /* Only used by legacy drivers */ - int (*config)(struct pwm_chip *chip, struct pwm_device *pwm, - int duty_ns, int period_ns); - int (*set_polarity)(struct pwm_chip *chip, struct pwm_device *pwm, - enum pwm_polarity polarity); - int (*enable)(struct pwm_chip *chip, struct pwm_device *pwm); - void (*disable)(struct pwm_chip *chip, struct pwm_device *pwm); }; /** -- cgit From ef2e35d90890932ee4546231c775326ffd7a3909 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Mon, 23 May 2022 19:45:00 +0200 Subject: pwm: Reorder header file to get rid of struct pwm_capture forward declaration MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There is no cyclic dependency, so by reordering the forward declaration can be dropped. Signed-off-by: Uwe Kleine-König Signed-off-by: Thierry Reding --- include/linux/pwm.h | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pwm.h b/include/linux/pwm.h index 76463b71ad4e..6d6946dec771 100644 --- a/include/linux/pwm.h +++ b/include/linux/pwm.h @@ -6,7 +6,6 @@ #include #include -struct pwm_capture; struct seq_file; struct pwm_chip; @@ -251,6 +250,16 @@ pwm_set_relative_duty_cycle(struct pwm_state *state, unsigned int duty_cycle, return 0; } +/** + * struct pwm_capture - PWM capture data + * @period: period of the PWM signal (in nanoseconds) + * @duty_cycle: duty cycle of the PWM signal (in nanoseconds) + */ +struct pwm_capture { + unsigned int period; + unsigned int duty_cycle; +}; + /** * struct pwm_ops - PWM controller operations * @request: optional hook for requesting a PWM @@ -300,16 +309,6 @@ struct pwm_chip { struct pwm_device *pwms; }; -/** - * struct pwm_capture - PWM capture data - * @period: period of the PWM signal (in nanoseconds) - * @duty_cycle: duty cycle of the PWM signal (in nanoseconds) - */ -struct pwm_capture { - unsigned int period; - unsigned int duty_cycle; -}; - #if IS_ENABLED(CONFIG_PWM) /* PWM user APIs */ struct pwm_device *pwm_request(int pwm_id, const char *label); -- cgit From 5c8dca97404bf23f6531456ae4d14c862f6871a3 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Mon, 23 May 2022 19:45:01 +0200 Subject: pwm: Drop unused forward declaration from pwm.h MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The declaration was necessary until commit cc2d22477779 ("pwm: Drop per-chip dbg_show callback"). Signed-off-by: Uwe Kleine-König Signed-off-by: Thierry Reding --- include/linux/pwm.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pwm.h b/include/linux/pwm.h index 6d6946dec771..9429930c5566 100644 --- a/include/linux/pwm.h +++ b/include/linux/pwm.h @@ -6,8 +6,6 @@ #include #include -struct seq_file; - struct pwm_chip; /** -- cgit From 62c0aff64c8d6594357b6217db019552ea664b90 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 22 Jun 2022 20:11:47 +0300 Subject: clk: Remove never used devm_clk_*unregister() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit For the entire history of the devm_clk_*unregister() existence they were used only once (*) in 2015. Remove them. *) The commit 264e3b75de4e ("clk: s2mps11: Simplify s2mps11_clk_probe unwind paths") exactly supports the point of the change proposed here. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220622171147.85603-1-andriy.shevchenko@linux.intel.com Acked-by: Uwe Kleine-König Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index c10dc4c659e2..72d937c03a3e 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -1176,10 +1176,8 @@ int __must_check devm_clk_hw_register(struct device *dev, struct clk_hw *hw); int __must_check of_clk_hw_register(struct device_node *node, struct clk_hw *hw); void clk_unregister(struct clk *clk); -void devm_clk_unregister(struct device *dev, struct clk *clk); void clk_hw_unregister(struct clk_hw *hw); -void devm_clk_hw_unregister(struct device *dev, struct clk_hw *hw); /* helper functions */ const char *__clk_get_name(const struct clk *clk); -- cgit From 23c9cd56007e90b2c2317c5eab6ab12921b4314a Mon Sep 17 00:00:00 2001 From: Joel Granados Date: Tue, 21 Jun 2022 09:05:00 +0200 Subject: nvme: fix the CRIMS and CRWMS definitions to match the spec Adjust the values of NVME_CAP_CRMS_CRIMS and NVME_CAP_CRMS_CRWMS masks as they are different from the ones in TP4084 - Time-to-ready. Fixes: 354201c53e61 ("nvme: add support for TP4084 - Time-to-Ready Enhancements"). Signed-off-by: Joel Granados Reviewed-by: Keith Busch Reviewed-by: Sagi Grimberg Reviewed-by: Chaitanya Kulkarni Signed-off-by: Christoph Hellwig --- include/linux/nvme.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nvme.h b/include/linux/nvme.h index 29ec3e3481ff..e3934003f239 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -233,8 +233,8 @@ enum { }; enum { - NVME_CAP_CRMS_CRIMS = 1ULL << 59, - NVME_CAP_CRMS_CRWMS = 1ULL << 60, + NVME_CAP_CRMS_CRWMS = 1ULL << 59, + NVME_CAP_CRMS_CRIMS = 1ULL << 60, }; struct nvme_id_power_state { -- cgit From 20fb0c8272bbb102d15bdd11aa64f828619dd7cc Mon Sep 17 00:00:00 2001 From: Petr Mladek Date: Thu, 23 Jun 2022 16:51:52 +0200 Subject: Revert "printk: Wait for the global console lock when the system is going down" This reverts commit b87f02307d3cfbda768520f0687c51ca77e14fc3. The testing of 5.19 release candidates revealed missing synchronization between early and regular console functionality. It would be possible to start the console kthreads later as a workaround. But it is clear that console lock serialized console drivers between each other. It opens a big area of possible problems that were not considered by people involved in the development and review. printk() is crucial for debugging kernel issues and console output is very important part of it. The number of consoles is huge and a proper review would take some time. As a result it need to be reverted for 5.19. Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley Signed-off-by: Petr Mladek Link: https://lore.kernel.org/r/20220623145157.21938-2-pmladek@suse.com --- include/linux/printk.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/printk.h b/include/linux/printk.h index c1e07c0652c7..cd26aab0ab2a 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -174,7 +174,6 @@ extern void printk_prefer_direct_enter(void); extern void printk_prefer_direct_exit(void); extern bool pr_flush(int timeout_ms, bool reset_on_progress); -extern void try_block_console_kthreads(int timeout_ms); /* * Please don't use printk_ratelimit(), because it shares ratelimiting state @@ -239,10 +238,6 @@ static inline bool pr_flush(int timeout_ms, bool reset_on_progress) return true; } -static inline void try_block_console_kthreads(int timeout_ms) -{ -} - static inline int printk_ratelimit(void) { return 0; -- cgit From 2d9ef940f89e0ab4fde7ba6f769d82f2a450c35a Mon Sep 17 00:00:00 2001 From: Petr Mladek Date: Thu, 23 Jun 2022 16:51:55 +0200 Subject: Revert "printk: extend console_lock for per-console locking" This reverts commit 8e274732115f63c1d09136284431b3555bd5cc56. The testing of 5.19 release candidates revealed missing synchronization between early and regular console functionality. It would be possible to start the console kthreads later as a workaround. But it is clear that console lock serialized console drivers between each other. It opens a big area of possible problems that were not considered by people involved in the development and review. printk() is crucial for debugging kernel issues and console output is very important part of it. The number of consoles is huge and a proper review would take some time. As a result it need to be reverted for 5.19. Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley Signed-off-by: Petr Mladek Link: https://lore.kernel.org/r/20220623145157.21938-5-pmladek@suse.com --- include/linux/console.h | 15 --------------- 1 file changed, 15 deletions(-) (limited to 'include/linux') diff --git a/include/linux/console.h b/include/linux/console.h index 143653090c48..9a251e70c090 100644 --- a/include/linux/console.h +++ b/include/linux/console.h @@ -16,7 +16,6 @@ #include #include -#include struct vc_data; struct console_font_op; @@ -155,20 +154,6 @@ struct console { u64 seq; unsigned long dropped; struct task_struct *thread; - bool blocked; - - /* - * The per-console lock is used by printing kthreads to synchronize - * this console with callers of console_lock(). This is necessary in - * order to allow printing kthreads to run in parallel to each other, - * while each safely accessing the @blocked field and synchronizing - * against direct printing via console_lock/console_unlock. - * - * Note: For synchronizing against direct printing via - * console_trylock/console_unlock, see the static global - * variable @console_kthreads_active. - */ - struct mutex lock; void *data; struct console *next; -- cgit From 5831788afb17b89c5b531fb60cbd798613ccbb63 Mon Sep 17 00:00:00 2001 From: Petr Mladek Date: Thu, 23 Jun 2022 16:51:56 +0200 Subject: Revert "printk: add kthread console printers" This reverts commit 09c5ba0aa2fcfdadb17d045c3ee6f86d69270df7. This reverts commit b87f02307d3cfbda768520f0687c51ca77e14fc3. The testing of 5.19 release candidates revealed missing synchronization between early and regular console functionality. It would be possible to start the console kthreads later as a workaround. But it is clear that console lock serialized console drivers between each other. It opens a big area of possible problems that were not considered by people involved in the development and review. printk() is crucial for debugging kernel issues and console output is very important part of it. The number of consoles is huge and a proper review would take some time. As a result it need to be reverted for 5.19. Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley Signed-off-by: Petr Mladek Link: https://lore.kernel.org/r/20220623145157.21938-6-pmladek@suse.com --- include/linux/console.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/console.h b/include/linux/console.h index 9a251e70c090..8c1686e2c233 100644 --- a/include/linux/console.h +++ b/include/linux/console.h @@ -153,8 +153,6 @@ struct console { uint ospeed; u64 seq; unsigned long dropped; - struct task_struct *thread; - void *data; struct console *next; }; -- cgit From 07a22b61946f0b80065b0ddcc703b715f84355f5 Mon Sep 17 00:00:00 2001 From: Petr Mladek Date: Thu, 23 Jun 2022 16:51:57 +0200 Subject: Revert "printk: add functions to prefer direct printing" This reverts commit 2bb2b7b57f81255c13f4395ea911d6bdc70c9fe2. The testing of 5.19 release candidates revealed missing synchronization between early and regular console functionality. It would be possible to start the console kthreads later as a workaround. But it is clear that console lock serialized console drivers between each other. It opens a big area of possible problems that were not considered by people involved in the development and review. printk() is crucial for debugging kernel issues and console output is very important part of it. The number of consoles is huge and a proper review would take some time. As a result it need to be reverted for 5.19. Link: https://lore.kernel.org/r/YrBdjVwBOVgLfHyb@alley Signed-off-by: Petr Mladek Link: https://lore.kernel.org/r/20220623145157.21938-7-pmladek@suse.com --- include/linux/printk.h | 11 ----------- 1 file changed, 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/printk.h b/include/linux/printk.h index cd26aab0ab2a..091fba7283e1 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -170,9 +170,6 @@ extern void __printk_safe_exit(void); #define printk_deferred_enter __printk_safe_enter #define printk_deferred_exit __printk_safe_exit -extern void printk_prefer_direct_enter(void); -extern void printk_prefer_direct_exit(void); - extern bool pr_flush(int timeout_ms, bool reset_on_progress); /* @@ -225,14 +222,6 @@ static inline void printk_deferred_exit(void) { } -static inline void printk_prefer_direct_enter(void) -{ -} - -static inline void printk_prefer_direct_exit(void) -{ -} - static inline bool pr_flush(int timeout_ms, bool reset_on_progress) { return true; -- cgit From 203184571388a988283543f0fd7da1a0da7c3f91 Mon Sep 17 00:00:00 2001 From: Frank Li Date: Tue, 24 May 2022 10:21:53 -0500 Subject: dmaengine: dw-edma: Detach the private data and chip info structures "struct dw_edma_chip" contains an internal structure "struct dw_edma" that is used by the eDMA core internally and should not be touched by the eDMA controller drivers themselves. But currently, the eDMA controller drivers like "dw-edma-pci" allocate and populate this internal structure before passing it on to the eDMA core. The eDMA core further populates the structure and uses it. This is wrong! Hence, move all the "struct dw_edma" specifics from controller drivers to the eDMA core. Link: https://lore.kernel.org/r/20220524152159.2370739-3-Frank.Li@nxp.com Tested-by: Serge Semin Tested-by: Manivannan Sadhasivam Signed-off-by: Frank Li Signed-off-by: Bjorn Helgaas Reviewed-by: Serge Semin Reviewed-by: Manivannan Sadhasivam Acked-By: Vinod Koul --- include/linux/dma/edma.h | 48 +++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 47 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h index d4333e721588..6f64e90d5c38 100644 --- a/include/linux/dma/edma.h +++ b/include/linux/dma/edma.h @@ -12,17 +12,63 @@ #include #include +#define EDMA_MAX_WR_CH 8 +#define EDMA_MAX_RD_CH 8 + struct dw_edma; +struct dw_edma_region { + phys_addr_t paddr; + void __iomem *vaddr; + size_t sz; +}; + +struct dw_edma_core_ops { + int (*irq_vector)(struct device *dev, unsigned int nr); +}; + +enum dw_edma_map_format { + EDMA_MF_EDMA_LEGACY = 0x0, + EDMA_MF_EDMA_UNROLL = 0x1, + EDMA_MF_HDMA_COMPAT = 0x5 +}; + /** * struct dw_edma_chip - representation of DesignWare eDMA controller hardware * @dev: struct device of the eDMA controller * @id: instance ID - * @dw: struct dw_edma that is filed by dw_edma_probe() + * @nr_irqs: total number of DMA IRQs + * @ops DMA channel to IRQ number mapping + * @wr_ch_cnt DMA write channel number + * @rd_ch_cnt DMA read channel number + * @rg_region DMA register region + * @ll_region_wr DMA descriptor link list memory for write channel + * @ll_region_rd DMA descriptor link list memory for read channel + * @dt_region_wr DMA data memory for write channel + * @dt_region_rd DMA data memory for read channel + * @mf DMA register map format + * @dw: struct dw_edma that is filled by dw_edma_probe() */ struct dw_edma_chip { struct device *dev; int id; + int nr_irqs; + const struct dw_edma_core_ops *ops; + + struct dw_edma_region rg_region; + + u16 wr_ch_cnt; + u16 rd_ch_cnt; + /* link list address */ + struct dw_edma_region ll_region_wr[EDMA_MAX_WR_CH]; + struct dw_edma_region ll_region_rd[EDMA_MAX_RD_CH]; + + /* data region */ + struct dw_edma_region dt_region_wr[EDMA_MAX_WR_CH]; + struct dw_edma_region dt_region_rd[EDMA_MAX_RD_CH]; + + enum dw_edma_map_format mf; + struct dw_edma *dw; }; -- cgit From e51b3048116a6e10b96bd5298cbcb209b6d729cd Mon Sep 17 00:00:00 2001 From: Frank Li Date: Tue, 24 May 2022 10:21:54 -0500 Subject: dmaengine: dw-edma: Change rg_region to reg_base in struct dw_edma_chip struct dw_edma_region rg_region included virtual address, physical address and size information. But only the virtual address is used by EDMA driver. Change it to void __iomem *reg_base to clean up code. Link: https://lore.kernel.org/r/20220524152159.2370739-4-Frank.Li@nxp.com Tested-by: Serge Semin Tested-by: Manivannan Sadhasivam Signed-off-by: Frank Li Signed-off-by: Bjorn Helgaas Reviewed-by: Serge Semin Reviewed-by: Manivannan Sadhasivam Acked-By: Vinod Koul --- include/linux/dma/edma.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h index 6f64e90d5c38..df9ba3ecb437 100644 --- a/include/linux/dma/edma.h +++ b/include/linux/dma/edma.h @@ -39,6 +39,7 @@ enum dw_edma_map_format { * @id: instance ID * @nr_irqs: total number of DMA IRQs * @ops DMA channel to IRQ number mapping + * @reg_base DMA register base address * @wr_ch_cnt DMA write channel number * @rd_ch_cnt DMA read channel number * @rg_region DMA register region @@ -55,7 +56,7 @@ struct dw_edma_chip { int nr_irqs; const struct dw_edma_core_ops *ops; - struct dw_edma_region rg_region; + void __iomem *reg_base; u16 wr_ch_cnt; u16 rd_ch_cnt; -- cgit From 6951ee96c649f6e963b98c11b2b1a92697d3c45c Mon Sep 17 00:00:00 2001 From: Frank Li Date: Tue, 24 May 2022 10:21:55 -0500 Subject: dmaengine: dw-edma: Rename wr(rd)_ch_cnt to ll_wr(rd)_cnt in struct dw_edma_chip The struct dw_edma contains wr(rd)_ch_cnt fields. The EDMA driver gets write(read) channel number from register, then saves these into dw_edma. The wr(rd)_ch_cnt in dw_edma_chip actually means how many link list memory are available in ll_region_wr(rd)[EDMA_MAX_WR_CH]. Rename it to ll_wr(rd)_cnt to indicate actual usage. Link: https://lore.kernel.org/r/20220524152159.2370739-5-Frank.Li@nxp.com Tested-by: Serge Semin Tested-by: Manivannan Sadhasivam Signed-off-by: Frank Li Signed-off-by: Bjorn Helgaas Reviewed-by: Serge Semin Reviewed-by: Manivannan Sadhasivam Acked-By: Vinod Koul --- include/linux/dma/edma.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h index df9ba3ecb437..fdbbeda170a9 100644 --- a/include/linux/dma/edma.h +++ b/include/linux/dma/edma.h @@ -40,8 +40,8 @@ enum dw_edma_map_format { * @nr_irqs: total number of DMA IRQs * @ops DMA channel to IRQ number mapping * @reg_base DMA register base address - * @wr_ch_cnt DMA write channel number - * @rd_ch_cnt DMA read channel number + * @ll_wr_cnt DMA write link list count + * @ll_rd_cnt DMA read link list count * @rg_region DMA register region * @ll_region_wr DMA descriptor link list memory for write channel * @ll_region_rd DMA descriptor link list memory for read channel @@ -58,8 +58,8 @@ struct dw_edma_chip { void __iomem *reg_base; - u16 wr_ch_cnt; - u16 rd_ch_cnt; + u16 ll_wr_cnt; + u16 ll_rd_cnt; /* link list address */ struct dw_edma_region ll_region_wr[EDMA_MAX_WR_CH]; struct dw_edma_region ll_region_rd[EDMA_MAX_RD_CH]; -- cgit From d6b03171f9fc8127b3a7adfd4e74ee5d4dae5d14 Mon Sep 17 00:00:00 2001 From: Frank Li Date: Tue, 24 May 2022 10:21:58 -0500 Subject: dmaengine: dw-edma: Add support for chip-specific flags Add a "flags" field to the "struct dw_edma_chip" so that the controller drivers can pass flags that are relevant to the platform. DW_EDMA_CHIP_LOCAL - Used by the controller drivers accessing eDMA locally. Local eDMA access doesn't require generating MSIs to the remote. Link: https://lore.kernel.org/r/20220524152159.2370739-8-Frank.Li@nxp.com Tested-by: Serge Semin Tested-by: Manivannan Sadhasivam Signed-off-by: Frank Li Signed-off-by: Bjorn Helgaas Reviewed-by: Serge Semin Reviewed-by: Manivannan Sadhasivam Acked-By: Vinod Koul --- include/linux/dma/edma.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dma/edma.h b/include/linux/dma/edma.h index fdbbeda170a9..7d8062e9c544 100644 --- a/include/linux/dma/edma.h +++ b/include/linux/dma/edma.h @@ -33,12 +33,21 @@ enum dw_edma_map_format { EDMA_MF_HDMA_COMPAT = 0x5 }; +/** + * enum dw_edma_chip_flags - Flags specific to an eDMA chip + * @DW_EDMA_CHIP_LOCAL: eDMA is used locally by an endpoint + */ +enum dw_edma_chip_flags { + DW_EDMA_CHIP_LOCAL = BIT(0), +}; + /** * struct dw_edma_chip - representation of DesignWare eDMA controller hardware * @dev: struct device of the eDMA controller * @id: instance ID * @nr_irqs: total number of DMA IRQs * @ops DMA channel to IRQ number mapping + * @flags dw_edma_chip_flags * @reg_base DMA register base address * @ll_wr_cnt DMA write link list count * @ll_rd_cnt DMA read link list count @@ -55,6 +64,7 @@ struct dw_edma_chip { int id; int nr_irqs; const struct dw_edma_core_ops *ops; + u32 flags; void __iomem *reg_base; -- cgit From c7e1c443584ddd7facffca00e9e23b0d084e5bb3 Mon Sep 17 00:00:00 2001 From: Akira Yokosawa Date: Mon, 6 Jun 2022 13:44:24 +0900 Subject: gpio: Fix kernel-doc comments to nested union Commit 48ec13d36d3f ("gpio: Properly document parent data union") is supposed to have fixed a warning from "make htmldocs" regarding kernel-doc comments to union members. However, the same warning still remains [1]. Fix the issue by following the example found in section "Nested structs/unions" of Documentation/doc-guide/kernel-doc.rst. Signed-off-by: Akira Yokosawa Reported-by: Stephen Rothwell Fixes: 48ec13d36d3f ("gpio: Properly document parent data union") Link: https://lore.kernel.org/r/20220606093302.21febee3@canb.auug.org.au/ [1] Cc: Linus Walleij Cc: Bartosz Golaszewski Cc: Joey Gouly Cc: Marc Zyngier Tested-by: Stephen Rothwell Reviewed-by: Mauro Carvalho Chehab Signed-off-by: Bartosz Golaszewski --- include/linux/gpio/driver.h | 29 ++++++++++++++++------------- 1 file changed, 16 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h index b1e0f1f8ee2e..54c3c6506503 100644 --- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -167,21 +167,24 @@ struct gpio_irq_chip { */ irq_flow_handler_t parent_handler; - /** - * @parent_handler_data: - * - * If @per_parent_data is false, @parent_handler_data is a single - * pointer used as the data associated with every parent interrupt. - * - * @parent_handler_data_array: - * - * If @per_parent_data is true, @parent_handler_data_array is - * an array of @num_parents pointers, and is used to associate - * different data for each parent. This cannot be NULL if - * @per_parent_data is true. - */ union { + /** + * @parent_handler_data: + * + * If @per_parent_data is false, @parent_handler_data is a + * single pointer used as the data associated with every + * parent interrupt. + */ void *parent_handler_data; + + /** + * @parent_handler_data_array: + * + * If @per_parent_data is true, @parent_handler_data_array is + * an array of @num_parents pointers, and is used to associate + * different data for each parent. This cannot be NULL if + * @per_parent_data is true. + */ void **parent_handler_data_array; }; -- cgit From f50974eee5c4a5de1e4f1a3d873099f170df25f8 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Thu, 23 Jun 2022 13:02:31 -0700 Subject: memregion: Fix memregion_free() fallback definition In the CONFIG_MEMREGION=n case, memregion_free() is meant to be a static inline. 0day reports: In file included from drivers/cxl/core/port.c:4: include/linux/memregion.h:19:6: warning: no previous prototype for function 'memregion_free' [-Wmissing-prototypes] Mark memregion_free() static. Fixes: 33dd70752cd7 ("lib: Uplevel the pmem "region" ida to a global allocator") Reported-by: kernel test robot Reviewed-by: Alison Schofield Link: https://lore.kernel.org/r/165601455171.4042645.3350844271068713515.stgit@dwillia2-xfh Signed-off-by: Dan Williams --- include/linux/memregion.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/memregion.h b/include/linux/memregion.h index e11595256cac..c04c4fd2e209 100644 --- a/include/linux/memregion.h +++ b/include/linux/memregion.h @@ -16,7 +16,7 @@ static inline int memregion_alloc(gfp_t gfp) { return -ENOMEM; } -void memregion_free(int id) +static inline void memregion_free(int id) { } #endif -- cgit From 3cc624beba6310a8a534fb00841f22445a200d54 Mon Sep 17 00:00:00 2001 From: Ivan Bornyakov Date: Thu, 23 Jun 2022 19:32:44 +0300 Subject: fpga: fpga-mgr: support bitstream offset in image buffer At the moment FPGA manager core loads to the device entire image provided to fpga_mgr_load(). But it is not always whole FPGA image buffer meant to be written to the device. In particular, .dat formatted image for Microchip MPF contains meta info in the header that is not meant to be written to the device. This is issue for those low level drivers that loads data to the device with write() fpga_manager_ops callback, since write() can be called in iterator over scatter-gather table, not only linear image buffer. On the other hand, write_sg() callback is provided with whole image in scatter-gather form and can decide itself which part should be sent to the device. Add header_size and data_size to the fpga_image_info struct, add skip_header to the fpga_manager_ops struct and adjust fpga_mgr_write() callers with respect to them. * info->header_size indicates part at the beginning of image buffer that contains some meta info. It is optional and can be 0, initialized with mops->initial_header_size. * mops->skip_header tells fpga-mgr core whether write should start from the beginning of image buffer or at the offset of header_size. * info->data_size is the size of bitstream data that is meant to be written to the device. It is also optional and can be 0, which means bitstream data is up to the end of image buffer. Also add parse_header() callback to fpga_manager_ops, which purpose is to set info->header_size and info->data_size. At least initial_header_size bytes of image buffer will be passed into parse_header() first time. If it is not enough, parse_header() should set desired size into info->header_size and return -EAGAIN, then it will be called again with greater part of image buffer on the input. Suggested-by: Xu Yilun Signed-off-by: Ivan Bornyakov Acked-by: Xu Yilun Link: https://lore.kernel.org/r/20220623163248.3672-2-i.bornyakov@metrotek.ru Signed-off-by: Xu Yilun --- include/linux/fpga/fpga-mgr.h | 24 ++++++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fpga/fpga-mgr.h b/include/linux/fpga/fpga-mgr.h index 0f9468771bb9..54f63459efd6 100644 --- a/include/linux/fpga/fpga-mgr.h +++ b/include/linux/fpga/fpga-mgr.h @@ -22,6 +22,8 @@ struct sg_table; * @FPGA_MGR_STATE_RESET: FPGA in reset state * @FPGA_MGR_STATE_FIRMWARE_REQ: firmware request in progress * @FPGA_MGR_STATE_FIRMWARE_REQ_ERR: firmware request failed + * @FPGA_MGR_STATE_PARSE_HEADER: parse FPGA image header + * @FPGA_MGR_STATE_PARSE_HEADER_ERR: Error during PARSE_HEADER stage * @FPGA_MGR_STATE_WRITE_INIT: preparing FPGA for programming * @FPGA_MGR_STATE_WRITE_INIT_ERR: Error during WRITE_INIT stage * @FPGA_MGR_STATE_WRITE: writing image to FPGA @@ -41,7 +43,9 @@ enum fpga_mgr_states { FPGA_MGR_STATE_FIRMWARE_REQ, FPGA_MGR_STATE_FIRMWARE_REQ_ERR, - /* write sequence: init, write, complete */ + /* write sequence: parse header, init, write, complete */ + FPGA_MGR_STATE_PARSE_HEADER, + FPGA_MGR_STATE_PARSE_HEADER_ERR, FPGA_MGR_STATE_WRITE_INIT, FPGA_MGR_STATE_WRITE_INIT_ERR, FPGA_MGR_STATE_WRITE, @@ -85,6 +89,9 @@ enum fpga_mgr_states { * @sgt: scatter/gather table containing FPGA image * @buf: contiguous buffer containing FPGA image * @count: size of buf + * @header_size: size of image header. + * @data_size: size of image data to be sent to the device. If not specified, + * whole image will be used. Header may be skipped in either case. * @region_id: id of target region * @dev: device that owns this * @overlay: Device Tree overlay @@ -98,6 +105,8 @@ struct fpga_image_info { struct sg_table *sgt; const char *buf; size_t count; + size_t header_size; + size_t data_size; int region_id; struct device *dev; #ifdef CONFIG_OF @@ -137,9 +146,16 @@ struct fpga_manager_info { /** * struct fpga_manager_ops - ops for low level fpga manager drivers - * @initial_header_size: Maximum number of bytes that should be passed into write_init + * @initial_header_size: minimum number of bytes that should be passed into + * parse_header and write_init. + * @skip_header: bool flag to tell fpga-mgr core whether it should skip + * info->header_size part at the beginning of the image when invoking + * write callback. * @state: returns an enum value of the FPGA's state * @status: returns status of the FPGA, including reconfiguration error code + * @parse_header: parse FPGA image header to set info->header_size and + * info->data_size. In case the input buffer is not large enough, set + * required size to info->header_size and return -EAGAIN. * @write_init: prepare the FPGA to receive configuration data * @write: write count bytes of configuration data to the FPGA * @write_sg: write the scatter list of configuration data to the FPGA @@ -153,8 +169,12 @@ struct fpga_manager_info { */ struct fpga_manager_ops { size_t initial_header_size; + bool skip_header; enum fpga_mgr_states (*state)(struct fpga_manager *mgr); u64 (*status)(struct fpga_manager *mgr); + int (*parse_header)(struct fpga_manager *mgr, + struct fpga_image_info *info, + const char *buf, size_t count); int (*write_init)(struct fpga_manager *mgr, struct fpga_image_info *info, const char *buf, size_t count); -- cgit From c346dae4f3fbce51bbd4f2ec5e8c6f9b91e93163 Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Wed, 22 Jun 2022 09:29:40 +0800 Subject: virtio: disable notification hardening by default We try to harden virtio device notifications in 8b4ec69d7e09 ("virtio: harden vring IRQ"). It works with the assumption that the driver or core can properly call virtio_device_ready() at the right place. Unfortunately, this seems to be not true and uncover various bugs of the existing drivers, mainly the issue of using virtio_device_ready() incorrectly. So let's add a Kconfig option and disable it by default. It gives us time to fix the drivers and then we can consider re-enabling it. Signed-off-by: Jason Wang Message-Id: <20220622012940.21441-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin Reviewed-by: Cornelia Huck --- include/linux/virtio_config.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index 49c7c32815f1..b47c2e7ed0ee 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -257,6 +257,7 @@ void virtio_device_ready(struct virtio_device *dev) WARN_ON(status & VIRTIO_CONFIG_S_DRIVER_OK); +#ifdef CONFIG_VIRTIO_HARDEN_NOTIFICATION /* * The virtio_synchronize_cbs() makes sure vring_interrupt() * will see the driver specific setup if it sees vq->broken @@ -264,6 +265,7 @@ void virtio_device_ready(struct virtio_device *dev) */ virtio_synchronize_cbs(dev); __virtio_unbreak_device(dev); +#endif /* * The transport should ensure the visibility of vq->broken * before setting DRIVER_OK. See the comments for the transport -- cgit From fdfd42892f311e2b3695852036e5be23661dc590 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Wed, 15 Jun 2022 17:41:41 +0200 Subject: jump_label: mips: move module NOP patching into arch code MIPS is the only remaining architecture that needs to patch jump label NOP encodings to initialize them at load time. So let's move the module patching part of that from generic code into arch/mips, and drop it from the others. Signed-off-by: Ard Biesheuvel Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220615154142.1574619-3-ardb@kernel.org --- include/linux/jump_label.h | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index bf1eef337a07..2003a0935478 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -230,12 +230,12 @@ extern void static_key_slow_inc(struct static_key *key); extern void static_key_slow_dec(struct static_key *key); extern void static_key_slow_inc_cpuslocked(struct static_key *key); extern void static_key_slow_dec_cpuslocked(struct static_key *key); -extern void jump_label_apply_nops(struct module *mod); extern int static_key_count(struct static_key *key); extern void static_key_enable(struct static_key *key); extern void static_key_disable(struct static_key *key); extern void static_key_enable_cpuslocked(struct static_key *key); extern void static_key_disable_cpuslocked(struct static_key *key); +extern enum jump_label_type jump_label_init_type(struct jump_entry *entry); /* * We should be using ATOMIC_INIT() for initializing .enabled, but @@ -303,11 +303,6 @@ static inline int jump_label_text_reserved(void *start, void *end) static inline void jump_label_lock(void) {} static inline void jump_label_unlock(void) {} -static inline int jump_label_apply_nops(struct module *mod) -{ - return 0; -} - static inline void static_key_enable(struct static_key *key) { STATIC_KEY_CHECK_USE(key); -- cgit From 7e6b9db27de9f69a705c1a046d45882c768e16c3 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Wed, 15 Jun 2022 17:41:42 +0200 Subject: jump_label: make initial NOP patching the special case Instead of defaulting to patching NOP opcodes at init time, and leaving it to the architectures to override this if this is not needed, switch to a model where doing nothing is the default. This is the common case by far, as only MIPS requires NOP patching at init time. On all other architectures, the correct encodings are emitted by the compiler and so no initial patching is needed. Signed-off-by: Ard Biesheuvel Signed-off-by: Peter Zijlstra (Intel) Acked-by: Mark Rutland Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220615154142.1574619-4-ardb@kernel.org --- include/linux/jump_label.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h index 2003a0935478..570831ca9951 100644 --- a/include/linux/jump_label.h +++ b/include/linux/jump_label.h @@ -220,8 +220,6 @@ extern void jump_label_lock(void); extern void jump_label_unlock(void); extern void arch_jump_label_transform(struct jump_entry *entry, enum jump_label_type type); -extern void arch_jump_label_transform_static(struct jump_entry *entry, - enum jump_label_type type); extern bool arch_jump_label_transform_queue(struct jump_entry *entry, enum jump_label_type type); extern void arch_jump_label_transform_apply(void); -- cgit From eae6d58d67d9739be5f7ae2dbead1d0ef6528243 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 17 Jun 2022 15:26:06 +0200 Subject: locking/lockdep: Fix lockdep_init_map_*() confusion Commit dfd5e3f5fe27 ("locking/lockdep: Mark local_lock_t") added yet another lockdep_init_map_*() variant, but forgot to update all the existing users of the most complicated version. This could lead to a loss of lock_type and hence an incorrect report. Given the relative rarity of both local_lock and these annotations, this is unlikely to happen in practise, still, best fix things. Fixes: dfd5e3f5fe27 ("locking/lockdep: Mark local_lock_t") Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/YqyEDtoan20K0CVD@worktop.programming.kicks-ass.net --- include/linux/lockdep.h | 30 +++++++++++++++++------------- 1 file changed, 17 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index b6829b970093..1f1099dac3f0 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -188,7 +188,7 @@ static inline void lockdep_init_map_waits(struct lockdep_map *lock, const char *name, struct lock_class_key *key, int subclass, u8 inner, u8 outer) { - lockdep_init_map_type(lock, name, key, subclass, inner, LD_WAIT_INV, LD_LOCK_NORMAL); + lockdep_init_map_type(lock, name, key, subclass, inner, outer, LD_LOCK_NORMAL); } static inline void @@ -211,24 +211,28 @@ static inline void lockdep_init_map(struct lockdep_map *lock, const char *name, * or they are too narrow (they suffer from a false class-split): */ #define lockdep_set_class(lock, key) \ - lockdep_init_map_waits(&(lock)->dep_map, #key, key, 0, \ - (lock)->dep_map.wait_type_inner, \ - (lock)->dep_map.wait_type_outer) + lockdep_init_map_type(&(lock)->dep_map, #key, key, 0, \ + (lock)->dep_map.wait_type_inner, \ + (lock)->dep_map.wait_type_outer, \ + (lock)->dep_map.lock_type) #define lockdep_set_class_and_name(lock, key, name) \ - lockdep_init_map_waits(&(lock)->dep_map, name, key, 0, \ - (lock)->dep_map.wait_type_inner, \ - (lock)->dep_map.wait_type_outer) + lockdep_init_map_type(&(lock)->dep_map, name, key, 0, \ + (lock)->dep_map.wait_type_inner, \ + (lock)->dep_map.wait_type_outer, \ + (lock)->dep_map.lock_type) #define lockdep_set_class_and_subclass(lock, key, sub) \ - lockdep_init_map_waits(&(lock)->dep_map, #key, key, sub,\ - (lock)->dep_map.wait_type_inner, \ - (lock)->dep_map.wait_type_outer) + lockdep_init_map_type(&(lock)->dep_map, #key, key, sub, \ + (lock)->dep_map.wait_type_inner, \ + (lock)->dep_map.wait_type_outer, \ + (lock)->dep_map.lock_type) #define lockdep_set_subclass(lock, sub) \ - lockdep_init_map_waits(&(lock)->dep_map, #lock, (lock)->dep_map.key, sub,\ - (lock)->dep_map.wait_type_inner, \ - (lock)->dep_map.wait_type_outer) + lockdep_init_map_type(&(lock)->dep_map, #lock, (lock)->dep_map.key, sub,\ + (lock)->dep_map.wait_type_inner, \ + (lock)->dep_map.wait_type_outer, \ + (lock)->dep_map.lock_type) #define lockdep_set_novalidate_class(lock) \ lockdep_set_class_and_name(lock, &__lockdep_no_validate__, #lock) -- cgit From 837f66c71207542283831d0762c5dca3db5b397a Mon Sep 17 00:00:00 2001 From: David Matlack Date: Wed, 22 Jun 2022 15:27:08 -0400 Subject: KVM: Allow for different capacities in kvm_mmu_memory_cache structs Allow the capacity of the kvm_mmu_memory_cache struct to be chosen at declaration time rather than being fixed for all declarations. This will be used in a follow-up commit to declare an cache in x86 with a capacity of 512+ objects without having to increase the capacity of all caches in KVM. This change requires each cache now specify its capacity at runtime, since the cache struct itself no longer has a fixed capacity known at compile time. To protect against someone accidentally defining a kvm_mmu_memory_cache struct directly (without the extra storage), this commit includes a WARN_ON() in kvm_mmu_topup_memory_cache(). In order to support different capacities, this commit changes the objects pointer array to be dynamically allocated the first time the cache is topped-up. While here, opportunistically clean up the stack-allocated kvm_mmu_memory_cache structs in riscv and arm64 to use designated initializers. No functional change intended. Reviewed-by: Marc Zyngier Signed-off-by: David Matlack Message-Id: <20220516232138.1783324-22-dmatlack@google.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 1 + include/linux/kvm_types.h | 6 +++++- 2 files changed, 6 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a2bbdf3ab086..3554e48406e4 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1356,6 +1356,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm); #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min); +int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, int min); int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc); void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc); void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index f328a01db4fe..4d933518060f 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ -85,12 +85,16 @@ struct gfn_to_pfn_cache { * MMU flows is problematic, as is triggering reclaim, I/O, etc... while * holding MMU locks. Note, these caches act more like prefetch buffers than * classical caches, i.e. objects are not returned to the cache on being freed. + * + * The @capacity field and @objects array are lazily initialized when the cache + * is topped up (__kvm_mmu_topup_memory_cache()). */ struct kvm_mmu_memory_cache { int nobjs; gfp_t gfp_zero; struct kmem_cache *kmem_cache; - void *objects[KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE]; + int capacity; + void **objects; }; #endif -- cgit From ebc3197963fc42841b183d0280bd3f9f85f13c30 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 23 Jun 2022 04:34:32 +0000 Subject: ipmr: add rcu protection over (struct vif_device)->dev We will soon use RCU instead of rwlock in ipmr & ip6mr This preliminary patch adds proper rcu verbs to read/write (struct vif_device)->dev Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/mroute_base.h | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mroute_base.h b/include/linux/mroute_base.h index e05ee9f001ff..10d1e4fb4e9f 100644 --- a/include/linux/mroute_base.h +++ b/include/linux/mroute_base.h @@ -26,7 +26,7 @@ * @remote: Remote address for tunnels */ struct vif_device { - struct net_device *dev; + struct net_device __rcu *dev; netdevice_tracker dev_tracker; unsigned long bytes_in, bytes_out; unsigned long pkt_in, pkt_out; @@ -52,6 +52,7 @@ static inline int mr_call_vif_notifier(struct notifier_block *nb, unsigned short family, enum fib_event_type event_type, struct vif_device *vif, + struct net_device *vif_dev, unsigned short vif_index, u32 tb_id, struct netlink_ext_ack *extack) { @@ -60,7 +61,7 @@ static inline int mr_call_vif_notifier(struct notifier_block *nb, .family = family, .extack = extack, }, - .dev = vif->dev, + .dev = vif_dev, .vif_index = vif_index, .vif_flags = vif->flags, .tb_id = tb_id, @@ -73,6 +74,7 @@ static inline int mr_call_vif_notifiers(struct net *net, unsigned short family, enum fib_event_type event_type, struct vif_device *vif, + struct net_device *vif_dev, unsigned short vif_index, u32 tb_id, unsigned int *ipmr_seq) { @@ -80,7 +82,7 @@ static inline int mr_call_vif_notifiers(struct net *net, .info = { .family = family, }, - .dev = vif->dev, + .dev = vif_dev, .vif_index = vif_index, .vif_flags = vif->flags, .tb_id = tb_id, @@ -98,7 +100,8 @@ static inline int mr_call_vif_notifiers(struct net *net, #define MAXVIFS 32 #endif -#define VIF_EXISTS(_mrt, _idx) (!!((_mrt)->vif_table[_idx].dev)) +/* Note: This helper is deprecated. */ +#define VIF_EXISTS(_mrt, _idx) (!!rcu_access_pointer((_mrt)->vif_table[_idx].dev)) /* mfc_flags: * MFC_STATIC - the entry was added statically (not by a routing daemon) -- cgit From 194366b28b8306b7a24596c57c09635ab2891252 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 23 Jun 2022 04:34:46 +0000 Subject: ipmr: adopt rcu_read_lock() in mr_dump() We no longer need to acquire mrt_lock() in mr_dump, using rcu_read_lock() is enough. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/mroute_base.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mroute_base.h b/include/linux/mroute_base.h index 10d1e4fb4e9f..9dd4bf157255 100644 --- a/include/linux/mroute_base.h +++ b/include/linux/mroute_base.h @@ -308,7 +308,7 @@ int mr_dump(struct net *net, struct notifier_block *nb, unsigned short family, struct netlink_ext_ack *extack), struct mr_table *(*mr_iter)(struct net *net, struct mr_table *mrt), - rwlock_t *mrt_lock, struct netlink_ext_ack *extack); + struct netlink_ext_ack *extack); #else static inline void vif_device_init(struct vif_device *v, struct net_device *dev, @@ -363,7 +363,7 @@ static inline int mr_dump(struct net *net, struct notifier_block *nb, struct netlink_ext_ack *extack), struct mr_table *(*mr_iter)(struct net *net, struct mr_table *mrt), - rwlock_t *mrt_lock, struct netlink_ext_ack *extack) + struct netlink_ext_ack *extack) { return -EINVAL; } -- cgit From ba1afa676d0babf99e99f5415db43fdd7ecef104 Mon Sep 17 00:00:00 2001 From: Qu Wenruo Date: Fri, 24 Jun 2022 17:31:47 +0800 Subject: lib: bitmap: fix the duplicated comments on bitmap_to_arr64() Thanks to the recent commit 0a97953fd221 ("lib: add bitmap_{from,to}_arr64") now we can directly convert a U64 value into a bitmap and vice verse. However when checking the header there is duplicated helper for bitmap_to_arr64(), but no bitmap_from_arr64(). Just fix the copy-n-paste error. Signed-off-by: Qu Wenruo Signed-off-by: Yury Norov --- include/linux/bitmap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index 2e6cd5681040..f091a1664bf1 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -71,9 +71,9 @@ struct device; * bitmap_release_region(bitmap, pos, order) Free specified bit region * bitmap_allocate_region(bitmap, pos, order) Allocate specified bit region * bitmap_from_arr32(dst, buf, nbits) Copy nbits from u32[] buf to dst + * bitmap_from_arr64(dst, buf, nbits) Copy nbits from u64[] buf to dst * bitmap_to_arr32(buf, src, nbits) Copy nbits from buf to u32[] dst * bitmap_to_arr64(buf, src, nbits) Copy nbits from buf to u64[] dst - * bitmap_to_arr64(buf, src, nbits) Copy nbits from buf to u64[] dst * bitmap_get_value8(map, start) Get 8bit value from map at start * bitmap_set_value8(map, value, start) Set 8bit value to map at start * -- cgit From e61c451476e61450f6771ce03bbc01210a09be16 Mon Sep 17 00:00:00 2001 From: Mark-PK Tsai Date: Fri, 22 Apr 2022 14:24:35 +0800 Subject: dma-mapping: Add dma_release_coherent_memory to DMA API Add dma_release_coherent_memory to DMA API to allow dma user call it to release dev->dma_mem when the device is removed. Signed-off-by: Mark-PK Tsai Acked-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220422062436.14384-2-mark-pk.tsai@mediatek.com Signed-off-by: Mathieu Poirier --- include/linux/dma-map-ops.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index 0d5b06b3a4a6..53db9655efe9 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -166,6 +166,7 @@ static inline void dma_pernuma_cma_reserve(void) { } #ifdef CONFIG_DMA_DECLARE_COHERENT int dma_declare_coherent_memory(struct device *dev, phys_addr_t phys_addr, dma_addr_t device_addr, size_t size); +void dma_release_coherent_memory(struct device *dev); int dma_alloc_from_dev_coherent(struct device *dev, ssize_t size, dma_addr_t *dma_handle, void **ret); int dma_release_from_dev_coherent(struct device *dev, int order, void *vaddr); @@ -177,6 +178,8 @@ static inline int dma_declare_coherent_memory(struct device *dev, { return -ENOSYS; } + +#define dma_release_coherent_memory(dev) (0) #define dma_alloc_from_dev_coherent(dev, size, handle, ret) (0) #define dma_release_from_dev_coherent(dev, order, vaddr) (0) #define dma_mmap_from_dev_coherent(dev, vma, vaddr, order, ret) (0) -- cgit From e36de87d34a7f2f26b2e2129c6dc18a0024663eb Mon Sep 17 00:00:00 2001 From: Vineeth Pillai Date: Mon, 23 May 2022 15:03:27 -0400 Subject: KVM: debugfs: expose pid of vcpu threads Add a new debugfs file to expose the pid of each vcpu threads. This is very helpful for userland tools to get the vcpu pids without worrying about thread naming conventions of the VMM. Signed-off-by: Vineeth Pillai (Google) Message-Id: <20220523190327.2658-1-vineeth@bitbyteword.org> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 3554e48406e4..3b40f8d68fbb 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1435,6 +1435,8 @@ int kvm_arch_pm_notifier(struct kvm *kvm, unsigned long state); #ifdef __KVM_HAVE_ARCH_VCPU_DEBUGFS void kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu, struct dentry *debugfs_dentry); +#else +static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {} #endif int kvm_arch_hardware_enable(void); -- cgit From 8ca869b24538a7b5501af368e87e4a59b0c04117 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Wed, 22 Jun 2022 09:31:31 +0200 Subject: pstore: Add priv field to pstore_record for backend specific use The EFI pstore backend will need to store per-record variable name data when we switch away from the efivars layer. Add a priv field to struct pstore_record, and document it as holding a backend specific pointer that is assumed to be a kmalloc()d buffer, and will be kfree()d when the entire record is freed. Acked-by: Kees Cook Signed-off-by: Ard Biesheuvel --- include/linux/pstore.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pstore.h b/include/linux/pstore.h index e97a8188f0fd..638507a3c8ff 100644 --- a/include/linux/pstore.h +++ b/include/linux/pstore.h @@ -57,6 +57,9 @@ struct pstore_info; * @size: size of @buf * @ecc_notice_size: * ECC information for @buf + * @priv: pointer for backend specific use, will be + * kfree()d by the pstore core if non-NULL + * when the record is freed. * * Valid for PSTORE_TYPE_DMESG @type: * @@ -74,6 +77,7 @@ struct pstore_record { char *buf; ssize_t size; ssize_t ecc_notice_size; + void *priv; int count; enum kmsg_dump_reason reason; -- cgit From ec3507b2ca51286de6ecd85fdac8e722219cdef8 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Mon, 20 Jun 2022 17:04:32 +0200 Subject: efi: vars: Don't drop lock in the middle of efivar_init() Even though the efivars_lock lock is documented as protecting the efivars->ops pointer (among other things), efivar_init() happily releases and reacquires the lock for every EFI variable that it enumerates. This used to be needed because the lock was originally a spinlock, which prevented the callback that is invoked for every variable from being able to sleep. However, releasing the lock could potentially invalidate the ops pointer, but more importantly, it might allow a SetVariable() runtime service call to take place concurrently, and the UEFI spec does not define how this affects an enumeration that is running in parallel using the GetNextVariable() runtime service, which is what efivar_init() uses. In the meantime, the lock has been converted into a semaphore, and the only reason we need to drop the lock is because the efivarfs pseudo filesystem driver will otherwise deadlock when it invokes the efivars API from the callback to create the efivar_entry items and insert them into the linked list. (EFI pstore is affected in a similar way) So let's switch to helpers that can be used while the lock is already taken. This way, we can hold on to the lock throughout the enumeration. Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index 53f64c14a525..56f04b6daeb0 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1064,6 +1064,7 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *), void *data, bool duplicates, struct list_head *head); int efivar_entry_add(struct efivar_entry *entry, struct list_head *head); +void __efivar_entry_add(struct efivar_entry *entry, struct list_head *head); int efivar_entry_remove(struct efivar_entry *entry); int __efivar_entry_delete(struct efivar_entry *entry); -- cgit From 472831d4c4b2d8eac783b256e5c829487d5310df Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Mon, 20 Jun 2022 13:17:20 +0200 Subject: efi: vars: Add thin wrapper around EFI get/set variable interface The current efivars layer is a jumble of list iterators, shadow data structures and safe variable manipulation helpers that really belong in the efivarfs pseudo file system once the obsolete sysfs access method to EFI variables is removed. So split off a minimal efivar get/set variable API that reuses the existing efivars_lock semaphore to mediate access to the various runtime services, primarily to ensure that performing a SetVariable() on one CPU while another is calling GetNextVariable() in a loop to enumerate the contents of the EFI variable store does not result in surprises. Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index 56f04b6daeb0..c828ab6f0e2a 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1099,6 +1099,26 @@ bool efivar_validate(efi_guid_t vendor, efi_char16_t *var_name, u8 *data, bool efivar_variable_is_removable(efi_guid_t vendor, const char *name, size_t len); +int efivar_lock(void); +int efivar_trylock(void); +void efivar_unlock(void); + +efi_status_t efivar_get_variable(efi_char16_t *name, efi_guid_t *vendor, + u32 *attr, unsigned long *size, void *data); + +efi_status_t efivar_get_next_variable(unsigned long *name_size, + efi_char16_t *name, efi_guid_t *vendor); + +efi_status_t efivar_set_variable_locked(efi_char16_t *name, efi_guid_t *vendor, + u32 attr, unsigned long data_size, + void *data, bool nonblocking); + +efi_status_t efivar_set_variable(efi_char16_t *name, efi_guid_t *vendor, + u32 attr, unsigned long data_size, void *data); + +efi_status_t check_var_size(u32 attributes, unsigned long size); +efi_status_t check_var_size_nonblocking(u32 attributes, unsigned long size); + #if IS_ENABLED(CONFIG_EFI_CAPSULE_LOADER) extern bool efi_capsule_pending(int *reset_type); -- cgit From 859748255b43460685e93a1f8a40b8cdc3be02f2 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Mon, 20 Jun 2022 13:21:26 +0200 Subject: efi: pstore: Omit efivars caching EFI varstore access layer Avoid the efivars layer and simply call the newly introduced EFI varstore helpers instead. This simplifies the code substantially, and also allows us to remove some hacks in the shared efivars layer that were added for efi-pstore specifically. In order to be able to delete the EFI variable associated with a record, store the UTF-16 name of the variable in the pstore record's priv field. That way, we don't have to make guesses regarding which variable the record may have been loaded from. Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index c828ab6f0e2a..08bc6215e3b4 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1043,8 +1043,6 @@ struct efivar_entry { struct efi_variable var; struct list_head list; struct kobject kobj; - bool scanning; - bool deleting; }; static inline void -- cgit From 0f5b2c69a4cbe4166ca24b76d5ada98ed2867741 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Mon, 20 Jun 2022 13:34:03 +0200 Subject: efi: vars: Remove deprecated 'efivars' sysfs interface Commit 5d9db883761a ("efi: Add support for a UEFI variable filesystem") dated Oct 5, 2012, introduced a new efivarfs pseudo-filesystem to replace the efivars sysfs interface that was used up to that point to expose EFI variables to user space. The main problem with the sysfs interface was that it only supported up to 1024 bytes of payload per file, whereas the underlying variables themselves are only bounded by a platform specific per-variable and global limit that is typically much higher than 1024 bytes. The deprecated sysfs interface is only enabled on x86 and Itanium, other EFI enabled architectures only support the efivarfs pseudo-filesystem. So let's finally rip off the band aid, and drop the old interface entirely. This will make it easier to refactor and clean up the underlying infrastructure that is shared between efivars, efivarfs and efi-pstore, and is long overdue for a makeover. Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 18 ------------------ 1 file changed, 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index 93ce85a14a46..10ef0a0d5e9a 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1045,12 +1045,6 @@ struct efivar_entry { struct kobject kobj; }; -static inline void -efivar_unregister(struct efivar_entry *var) -{ - kobject_put(&var->kobj); -} - int efivars_register(struct efivars *efivars, const struct efivar_operations *ops, struct kobject *kobject); @@ -1064,8 +1058,6 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *), int efivar_entry_add(struct efivar_entry *entry, struct list_head *head); void __efivar_entry_add(struct efivar_entry *entry, struct list_head *head); void efivar_entry_remove(struct efivar_entry *entry); - -int __efivar_entry_delete(struct efivar_entry *entry); int efivar_entry_delete(struct efivar_entry *entry); int efivar_entry_size(struct efivar_entry *entry, unsigned long *size); @@ -1073,22 +1065,12 @@ int __efivar_entry_get(struct efivar_entry *entry, u32 *attributes, unsigned long *size, void *data); int efivar_entry_get(struct efivar_entry *entry, u32 *attributes, unsigned long *size, void *data); -int efivar_entry_set(struct efivar_entry *entry, u32 attributes, - unsigned long size, void *data, struct list_head *head); int efivar_entry_set_get_size(struct efivar_entry *entry, u32 attributes, unsigned long *size, void *data, bool *set); -int efivar_entry_set_safe(efi_char16_t *name, efi_guid_t vendor, u32 attributes, - bool block, unsigned long size, void *data); - -int efivar_entry_iter_begin(void); -void efivar_entry_iter_end(void); int efivar_entry_iter(int (*func)(struct efivar_entry *, void *), struct list_head *head, void *data); -struct efivar_entry *efivar_entry_find(efi_char16_t *name, efi_guid_t guid, - struct list_head *head, bool remove); - bool efivar_validate(efi_guid_t vendor, efi_char16_t *var_name, u8 *data, unsigned long data_size); bool efivar_variable_is_removable(efi_guid_t vendor, const char *name, -- cgit From 3a75f9f2f9ad19bb9a0f566373ae91d8f09db85e Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Tue, 21 Jun 2022 15:48:29 +0200 Subject: efi: vars: Use locking version to iterate over efivars linked lists Both efivars and efivarfs uses __efivar_entry_iter() to go over the linked list that shadows the list of EFI variables held by the firmware, but fail to call the begin/end helpers that are documented as a prerequisite. So switch to the proper version, which is efivar_entry_iter(). Given that in both cases, efivar_entry_remove() is invoked with the lock held already, don't take the lock there anymore. Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index 08bc6215e3b4..54ca2d6b6c78 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1063,7 +1063,7 @@ int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *), int efivar_entry_add(struct efivar_entry *entry, struct list_head *head); void __efivar_entry_add(struct efivar_entry *entry, struct list_head *head); -int efivar_entry_remove(struct efivar_entry *entry); +void efivar_entry_remove(struct efivar_entry *entry); int __efivar_entry_delete(struct efivar_entry *entry); int efivar_entry_delete(struct efivar_entry *entry); -- cgit From 5ac941367a6f85777ef34ec15d60e17ea8e446d4 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Tue, 21 Jun 2022 15:54:53 +0200 Subject: efi: vars: Drop __efivar_entry_iter() helper which is no longer used __efivar_entry_iter() uses a list iterator in a dubious way, i.e., it assumes that the iteration variable always points to an object of the appropriate type, even if the list traversal exhausts the list completely, in which case it will point somewhere in the vicinity of the list's anchor instead. Fortunately, we no longer use this function so we can just get rid of it entirely. Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index 54ca2d6b6c78..93ce85a14a46 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1083,9 +1083,6 @@ int efivar_entry_set_safe(efi_char16_t *name, efi_guid_t vendor, u32 attributes, int efivar_entry_iter_begin(void); void efivar_entry_iter_end(void); -int __efivar_entry_iter(int (*func)(struct efivar_entry *, void *), - struct list_head *head, void *data, - struct efivar_entry **prev); int efivar_entry_iter(int (*func)(struct efivar_entry *, void *), struct list_head *head, void *data); -- cgit From 2d82e6227ea189c0589e7383a36616ac2a2d248c Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Mon, 20 Jun 2022 18:19:43 +0200 Subject: efi: vars: Move efivar caching layer into efivarfs Move the fiddly bits of the efivar layer into its only remaining user, efivarfs, and confine its use to that particular module. All other uses of the EFI variable store have no need for this additional layer of complexity, given that they either only read variables, or read and write variables into a separate GUIDed namespace, and cannot be used to manipulate EFI variables that are covered by the EFI spec and/or affect the boot flow. Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 38 -------------------------------------- 1 file changed, 38 deletions(-) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index 10ef0a0d5e9a..8122c2ed505c 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1030,21 +1030,6 @@ struct efivars { #define EFI_VAR_NAME_LEN 1024 -struct efi_variable { - efi_char16_t VariableName[EFI_VAR_NAME_LEN/sizeof(efi_char16_t)]; - efi_guid_t VendorGuid; - unsigned long DataSize; - __u8 Data[1024]; - efi_status_t Status; - __u32 Attributes; -} __attribute__((packed)); - -struct efivar_entry { - struct efi_variable var; - struct list_head list; - struct kobject kobj; -}; - int efivars_register(struct efivars *efivars, const struct efivar_operations *ops, struct kobject *kobject); @@ -1052,29 +1037,6 @@ int efivars_unregister(struct efivars *efivars); struct kobject *efivars_kobject(void); int efivar_supports_writes(void); -int efivar_init(int (*func)(efi_char16_t *, efi_guid_t, unsigned long, void *), - void *data, bool duplicates, struct list_head *head); - -int efivar_entry_add(struct efivar_entry *entry, struct list_head *head); -void __efivar_entry_add(struct efivar_entry *entry, struct list_head *head); -void efivar_entry_remove(struct efivar_entry *entry); -int efivar_entry_delete(struct efivar_entry *entry); - -int efivar_entry_size(struct efivar_entry *entry, unsigned long *size); -int __efivar_entry_get(struct efivar_entry *entry, u32 *attributes, - unsigned long *size, void *data); -int efivar_entry_get(struct efivar_entry *entry, u32 *attributes, - unsigned long *size, void *data); -int efivar_entry_set_get_size(struct efivar_entry *entry, u32 attributes, - unsigned long *size, void *data, bool *set); - -int efivar_entry_iter(int (*func)(struct efivar_entry *, void *), - struct list_head *head, void *data); - -bool efivar_validate(efi_guid_t vendor, efi_char16_t *var_name, u8 *data, - unsigned long data_size); -bool efivar_variable_is_removable(efi_guid_t vendor, const char *name, - size_t len); int efivar_lock(void); int efivar_trylock(void); -- cgit From ede57d58e6f38d5bc66137368e4a1e68a157af6e Mon Sep 17 00:00:00 2001 From: Richard Gobert Date: Wed, 22 Jun 2022 18:09:03 +0200 Subject: net: helper function skb_len_add Move the len fields manipulation in the skbs to a helper function. There is a comment specifically requesting this and there are several other areas in the code displaying the same pattern which can be refactored. This improves code readability. Signed-off-by: Richard Gobert Link: https://lore.kernel.org/r/20220622160853.GA6478@debian Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index cd4a8268894a..f6a27ab19202 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2351,6 +2351,18 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb) return skb_headlen(skb) + __skb_pagelen(skb); } +/** + * skb_len_add - adds a number to len fields of skb + * @skb: buffer to add len to + * @delta: number of bytes to add + */ +static inline void skb_len_add(struct sk_buff *skb, int delta) +{ + skb->len += delta; + skb->data_len += delta; + skb->truesize += delta; +} + /** * __skb_fill_page_desc - initialise a paged fragment in an skb * @skb: buffer containing fragment to be initialised -- cgit From 3b89b511ea0c705cc418440e2abf9d692a556d84 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Thu, 23 Jun 2022 16:32:32 +0300 Subject: net: fix IFF_TX_SKB_NO_LINEAR definition The "1<<31" shift has a sign extension bug so IFF_TX_SKB_NO_LINEAR is 0xffffffff80000000 instead of 0x0000000080000000. Fixes: c2ff53d8049f ("net: Add priv_flags for allow tx skb without linear") Signed-off-by: Dan Carpenter Reviewed-by: Xuan Zhuo Link: https://lore.kernel.org/r/YrRrcGttfEVnf85Q@kili Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f615a66c89e9..2563d30736e9 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1671,7 +1671,7 @@ enum netdev_priv_flags { IFF_FAILOVER_SLAVE = 1<<28, IFF_L3MDEV_RX_HANDLER = 1<<29, IFF_LIVE_RENAME_OK = 1<<30, - IFF_TX_SKB_NO_LINEAR = 1<<31, + IFF_TX_SKB_NO_LINEAR = BIT_ULL(31), IFF_CHANGE_PROTO_DOWN = BIT_ULL(32), }; -- cgit From 1e5267cd0895183e09c5bb76da85c674014285d0 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Tue, 21 Jun 2022 16:14:47 +0200 Subject: mnt_idmapping: add vfs{g,u}id_t Introduces new vfs{g,u}id_t types. Similar to k{g,u}id_t the new types are just simple wrapper structs around regular {g,u}id_t types. They allows to establish a type safety boundary between {g,u}ids on idmapped mounts and {g,u}ids as they are represented in filesystems themselves. A vfs{g,u}id_t is always created from a k{g,u}id_t, never directly from a {g,u}id_t as idmapped mounts remap a given {g,u}id according to the mount's idmapping. This is expressed in the VFS{G,U}IDT_INIT() macros. A vfs{g,u}id_t may be used as a k{g,u}id_t via AS_K{G,U}IDT(). This often happens when we need to check whether a {g,u}id mapped according to an idmapped mount is identical to a given k{g,u}id_t. For an example, see vfsgid_in_group_p() which determines whether the value of vfsgid_t matches the value of any of the caller's groups. Similar logic is expressed in the k{g,u}id_eq_vfs{g,u}id(). The from_vfs{g,u}id() helpers map a given vfs{g,u}id_t from the mount's idmapping into the filesystem idmapping. They make it possible to update a filesystem object such as inode->i_{g,u}id with the correct value. This makes it harder to accidently write a wrong {g,u}id anwywhere. The vfs{g,u}id_has_fsmapping() helpers check whether a given vfs{g,u}id_t can be mapped into the filesystem idmapping. All new helpers are nops on non-idmapped mounts. I've done work on this roughly 7 months ago but dropped it to focus on the testsuite. Linus brought this up independently just last week and it's time to move this along (see [1]). [1]: https://lore.kernel.org/lkml/CAHk-=win6+ahs1EwLkcq8apqLi_1wXFWbrPf340zYEhObpz4jA@mail.gmail.com Link: https://lore.kernel.org/r/20220621141454.2914719-2-brauner@kernel.org Cc: Seth Forshee Cc: Christoph Hellwig Cc: Aleksa Sarai Cc: Linus Torvalds Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner (Microsoft) --- include/linux/mnt_idmapping.h | 262 +++++++++++++++++++++++++++++++++++++----- 1 file changed, 234 insertions(+), 28 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mnt_idmapping.h b/include/linux/mnt_idmapping.h index ee5a217de2a8..71b4cd5a7466 100644 --- a/include/linux/mnt_idmapping.h +++ b/include/linux/mnt_idmapping.h @@ -13,6 +13,122 @@ struct user_namespace; */ extern struct user_namespace init_user_ns; +typedef struct { + uid_t val; +} vfsuid_t; + +typedef struct { + gid_t val; +} vfsgid_t; + +#ifdef CONFIG_MULTIUSER +static inline uid_t __vfsuid_val(vfsuid_t uid) +{ + return uid.val; +} + +static inline gid_t __vfsgid_val(vfsgid_t gid) +{ + return gid.val; +} +#else +static inline uid_t __vfsuid_val(vfsuid_t uid) +{ + return 0; +} + +static inline gid_t __vfsgid_val(vfsgid_t gid) +{ + return 0; +} +#endif + +static inline bool vfsuid_valid(vfsuid_t uid) +{ + return __vfsuid_val(uid) != (uid_t)-1; +} + +static inline bool vfsgid_valid(vfsgid_t gid) +{ + return __vfsgid_val(gid) != (gid_t)-1; +} + +static inline bool vfsuid_eq(vfsuid_t left, vfsuid_t right) +{ + return __vfsuid_val(left) == __vfsuid_val(right); +} + +static inline bool vfsgid_eq(vfsgid_t left, vfsgid_t right) +{ + return __vfsgid_val(left) == __vfsgid_val(right); +} + +/** + * vfsuid_eq_kuid - check whether kuid and vfsuid have the same value + * @kuid: the kuid to compare + * @vfsuid: the vfsuid to compare + * + * Check whether @kuid and @vfsuid have the same values. + * + * Return: true if @kuid and @vfsuid have the same value, false if not. + */ +static inline bool vfsuid_eq_kuid(vfsuid_t vfsuid, kuid_t kuid) +{ + return __vfsuid_val(vfsuid) == __kuid_val(kuid); +} + +/** + * vfsgid_eq_kgid - check whether kgid and vfsgid have the same value + * @kgid: the kgid to compare + * @vfsgid: the vfsgid to compare + * + * Check whether @kgid and @vfsgid have the same values. + * + * Return: true if @kgid and @vfsgid have the same value, false if not. + */ +static inline bool vfsgid_eq_kgid(kgid_t kgid, vfsgid_t vfsgid) +{ + return __vfsgid_val(vfsgid) == __kgid_val(kgid); +} + +/* + * vfs{g,u}ids are created from k{g,u}ids. + * We don't allow them to be created from regular {u,g}id. + */ +#define VFSUIDT_INIT(val) (vfsuid_t){ __kuid_val(val) } +#define VFSGIDT_INIT(val) (vfsgid_t){ __kgid_val(val) } + +#define INVALID_VFSUID VFSUIDT_INIT(INVALID_UID) +#define INVALID_VFSGID VFSGIDT_INIT(INVALID_GID) + +/* + * Allow a vfs{g,u}id to be used as a k{g,u}id where we want to compare + * whether the mapped value is identical to value of a k{g,u}id. + */ +#define AS_KUIDT(val) (kuid_t){ __vfsuid_val(val) } +#define AS_KGIDT(val) (kgid_t){ __vfsgid_val(val) } + +#ifdef CONFIG_MULTIUSER +/** + * vfsgid_in_group_p() - check whether a vfsuid matches the caller's groups + * @vfsgid: the mnt gid to match + * + * This function can be used to determine whether @vfsuid matches any of the + * caller's groups. + * + * Return: 1 if vfsuid matches caller's groups, 0 if not. + */ +static inline int vfsgid_in_group_p(vfsgid_t vfsgid) +{ + return in_group_p(AS_KGIDT(vfsgid)); +} +#else +static inline int vfsgid_in_group_p(vfsgid_t vfsgid) +{ + return 1; +} +#endif + /** * initial_idmapping - check whether this is the initial mapping * @ns: idmapping to check @@ -67,21 +183,29 @@ static inline bool no_idmapping(const struct user_namespace *mnt_userns, * If @kuid has no mapping in either @mnt_userns or @fs_userns INVALID_UID is * returned. */ -static inline kuid_t mapped_kuid_fs(struct user_namespace *mnt_userns, - struct user_namespace *fs_userns, - kuid_t kuid) + +static inline vfsuid_t make_vfsuid(struct user_namespace *mnt_userns, + struct user_namespace *fs_userns, + kuid_t kuid) { uid_t uid; if (no_idmapping(mnt_userns, fs_userns)) - return kuid; + return VFSUIDT_INIT(kuid); if (initial_idmapping(fs_userns)) uid = __kuid_val(kuid); else uid = from_kuid(fs_userns, kuid); if (uid == (uid_t)-1) - return INVALID_UID; - return make_kuid(mnt_userns, uid); + return INVALID_VFSUID; + return VFSUIDT_INIT(make_kuid(mnt_userns, uid)); +} + +static inline kuid_t mapped_kuid_fs(struct user_namespace *mnt_userns, + struct user_namespace *fs_userns, + kuid_t kuid) +{ + return AS_KUIDT(make_vfsuid(mnt_userns, fs_userns, kuid)); } /** @@ -104,21 +228,56 @@ static inline kuid_t mapped_kuid_fs(struct user_namespace *mnt_userns, * If @kgid has no mapping in either @mnt_userns or @fs_userns INVALID_GID is * returned. */ -static inline kgid_t mapped_kgid_fs(struct user_namespace *mnt_userns, - struct user_namespace *fs_userns, - kgid_t kgid) + +static inline vfsgid_t make_vfsgid(struct user_namespace *mnt_userns, + struct user_namespace *fs_userns, + kgid_t kgid) { gid_t gid; if (no_idmapping(mnt_userns, fs_userns)) - return kgid; + return VFSGIDT_INIT(kgid); if (initial_idmapping(fs_userns)) gid = __kgid_val(kgid); else gid = from_kgid(fs_userns, kgid); if (gid == (gid_t)-1) - return INVALID_GID; - return make_kgid(mnt_userns, gid); + return INVALID_VFSGID; + return VFSGIDT_INIT(make_kgid(mnt_userns, gid)); +} + +static inline kgid_t mapped_kgid_fs(struct user_namespace *mnt_userns, + struct user_namespace *fs_userns, + kgid_t kgid) +{ + return AS_KGIDT(make_vfsgid(mnt_userns, fs_userns, kgid)); +} + +/** + * from_vfsuid - map a vfsuid into the filesystem idmapping + * @mnt_userns: the mount's idmapping + * @fs_userns: the filesystem's idmapping + * @vfsuid : vfsuid to be mapped + * + * Map @vfsuid into the filesystem idmapping. This function has to be used in + * order to e.g. write @vfsuid to inode->i_uid. + * + * Return: @vfsuid mapped into the filesystem idmapping + */ +static inline kuid_t from_vfsuid(struct user_namespace *mnt_userns, + struct user_namespace *fs_userns, + vfsuid_t vfsuid) +{ + uid_t uid; + + if (no_idmapping(mnt_userns, fs_userns)) + return AS_KUIDT(vfsuid); + uid = from_kuid(mnt_userns, AS_KUIDT(vfsuid)); + if (uid == (uid_t)-1) + return INVALID_UID; + if (initial_idmapping(fs_userns)) + return KUIDT_INIT(uid); + return make_kuid(fs_userns, uid); } /** @@ -145,16 +304,53 @@ static inline kuid_t mapped_kuid_user(struct user_namespace *mnt_userns, struct user_namespace *fs_userns, kuid_t kuid) { - uid_t uid; + return from_vfsuid(mnt_userns, fs_userns, VFSUIDT_INIT(kuid)); +} + +/** + * vfsuid_has_fsmapping - check whether a vfsuid maps into the filesystem + * @mnt_userns: the mount's idmapping + * @fs_userns: the filesystem's idmapping + * @vfsuid: vfsuid to be mapped + * + * Check whether @vfsuid has a mapping in the filesystem idmapping. Use this + * function to check whether the filesystem idmapping has a mapping for + * @vfsuid. + * + * Return: true if @vfsuid has a mapping in the filesystem, false if not. + */ +static inline bool vfsuid_has_fsmapping(struct user_namespace *mnt_userns, + struct user_namespace *fs_userns, + vfsuid_t vfsuid) +{ + return uid_valid(from_vfsuid(mnt_userns, fs_userns, vfsuid)); +} + +/** + * from_vfsgid - map a vfsgid into the filesystem idmapping + * @mnt_userns: the mount's idmapping + * @fs_userns: the filesystem's idmapping + * @vfsgid : vfsgid to be mapped + * + * Map @vfsgid into the filesystem idmapping. This function has to be used in + * order to e.g. write @vfsgid to inode->i_gid. + * + * Return: @vfsgid mapped into the filesystem idmapping + */ +static inline kgid_t from_vfsgid(struct user_namespace *mnt_userns, + struct user_namespace *fs_userns, + vfsgid_t vfsgid) +{ + gid_t gid; if (no_idmapping(mnt_userns, fs_userns)) - return kuid; - uid = from_kuid(mnt_userns, kuid); - if (uid == (uid_t)-1) - return INVALID_UID; + return AS_KGIDT(vfsgid); + gid = from_kgid(mnt_userns, AS_KGIDT(vfsgid)); + if (gid == (gid_t)-1) + return INVALID_GID; if (initial_idmapping(fs_userns)) - return KUIDT_INIT(uid); - return make_kuid(fs_userns, uid); + return KGIDT_INIT(gid); + return make_kgid(fs_userns, gid); } /** @@ -181,16 +377,26 @@ static inline kgid_t mapped_kgid_user(struct user_namespace *mnt_userns, struct user_namespace *fs_userns, kgid_t kgid) { - gid_t gid; + return from_vfsgid(mnt_userns, fs_userns, VFSGIDT_INIT(kgid)); +} - if (no_idmapping(mnt_userns, fs_userns)) - return kgid; - gid = from_kgid(mnt_userns, kgid); - if (gid == (gid_t)-1) - return INVALID_GID; - if (initial_idmapping(fs_userns)) - return KGIDT_INIT(gid); - return make_kgid(fs_userns, gid); +/** + * vfsgid_has_fsmapping - check whether a vfsgid maps into the filesystem + * @mnt_userns: the mount's idmapping + * @fs_userns: the filesystem's idmapping + * @vfsgid: vfsgid to be mapped + * + * Check whether @vfsgid has a mapping in the filesystem idmapping. Use this + * function to check whether the filesystem idmapping has a mapping for + * @vfsgid. + * + * Return: true if @vfsgid has a mapping in the filesystem, false if not. + */ +static inline bool vfsgid_has_fsmapping(struct user_namespace *mnt_userns, + struct user_namespace *fs_userns, + vfsgid_t vfsgid) +{ + return gid_valid(from_vfsgid(mnt_userns, fs_userns, vfsgid)); } /** -- cgit From 234a3113f28d02973ecf501f83d796ea89db295f Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Tue, 21 Jun 2022 16:14:48 +0200 Subject: fs: add two type safe mapping helpers Introduce i_{g,u}id_into_vfs{g,u}id(). They return vfs{g,u}id_t. This makes it way harder to confused idmapped mount {g,u}ids with filesystem {g,u}ids. The two helpers will eventually replace the old non type safe i_{g,u}id_into_mnt() helpers once we finished converting all places. Add a comment noting that they will be removed in the future. All new helpers are nops on non-idmapped mounts. Link: https://lore.kernel.org/r/20220621141454.2914719-3-brauner@kernel.org Cc: Seth Forshee Cc: Christoph Hellwig Cc: Aleksa Sarai Cc: Linus Torvalds Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner (Microsoft) --- include/linux/fs.h | 38 ++++++++++++++++++++++++++++++++++++-- 1 file changed, 36 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 9ad5e3520fae..afcffa251cb9 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1600,13 +1600,30 @@ static inline void i_gid_write(struct inode *inode, gid_t gid) * @mnt_userns: user namespace of the mount the inode was found from * @inode: inode to map * + * Note, this will eventually be removed completely in favor of the type-safe + * i_uid_into_vfsuid(). + * * Return: the inode's i_uid mapped down according to @mnt_userns. * If the inode's i_uid has no mapping INVALID_UID is returned. */ static inline kuid_t i_uid_into_mnt(struct user_namespace *mnt_userns, const struct inode *inode) { - return mapped_kuid_fs(mnt_userns, i_user_ns(inode), inode->i_uid); + return AS_KUIDT(make_vfsuid(mnt_userns, i_user_ns(inode), inode->i_uid)); +} + +/** + * i_uid_into_vfsuid - map an inode's i_uid down into a mnt_userns + * @mnt_userns: user namespace of the mount the inode was found from + * @inode: inode to map + * + * Return: whe inode's i_uid mapped down according to @mnt_userns. + * If the inode's i_uid has no mapping INVALID_VFSUID is returned. + */ +static inline vfsuid_t i_uid_into_vfsuid(struct user_namespace *mnt_userns, + const struct inode *inode) +{ + return make_vfsuid(mnt_userns, i_user_ns(inode), inode->i_uid); } /** @@ -1614,13 +1631,30 @@ static inline kuid_t i_uid_into_mnt(struct user_namespace *mnt_userns, * @mnt_userns: user namespace of the mount the inode was found from * @inode: inode to map * + * Note, this will eventually be removed completely in favor of the type-safe + * i_gid_into_vfsgid(). + * * Return: the inode's i_gid mapped down according to @mnt_userns. * If the inode's i_gid has no mapping INVALID_GID is returned. */ static inline kgid_t i_gid_into_mnt(struct user_namespace *mnt_userns, const struct inode *inode) { - return mapped_kgid_fs(mnt_userns, i_user_ns(inode), inode->i_gid); + return AS_KGIDT(make_vfsgid(mnt_userns, i_user_ns(inode), inode->i_gid)); +} + +/** + * i_gid_into_vfsgid - map an inode's i_gid down into a mnt_userns + * @mnt_userns: user namespace of the mount the inode was found from + * @inode: inode to map + * + * Return: the inode's i_gid mapped down according to @mnt_userns. + * If the inode's i_gid has no mapping INVALID_VFSGID is returned. + */ +static inline vfsgid_t i_gid_into_vfsgid(struct user_namespace *mnt_userns, + const struct inode *inode) +{ + return make_vfsgid(mnt_userns, i_user_ns(inode), inode->i_gid); } /** -- cgit From 45c311501c77217c50d08ed08aa722c812d92ab5 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Tue, 21 Jun 2022 16:14:49 +0200 Subject: fs: use mount types in iattr Add ia_vfs{g,u}id members of type vfs{g,u}id_t to struct iattr. We use an anonymous union (similar to what we do in struct file) around ia_{g,u}id and ia_vfs{g,u}id. At the end of this series ia_{g,u}id and ia_vfs{g,u}id will always contain the same value independent of whether struct iattr is initialized from an idmapped mount. This is a change from how this is done today. Wrapping this in a anonymous unions has a few advantages. It allows us to avoid needlessly increasing struct iattr. Since the types for ia_{g,u}id and ia_vfs{g,u}id are structures with overlapping/identical members they are covered by 6.5.2.3/6 of the C standard and it is safe to initialize and access them. Filesystems that raise FS_ALLOW_IDMAP and thus support idmapped mounts will have to use ia_vfs{g,u}id and the associated helpers. And will be ported at the end of this series. They will immediately benefit from the type safe new helpers. Filesystems that do not support FS_ALLOW_IDMAP can continue to use ia_{g,u}id for now. The aim is to convert every filesystem to always use ia_vfs{g,u}id and thus ultimately remove the ia_{g,u}id members. Link: https://lore.kernel.org/r/20220621141454.2914719-4-brauner@kernel.org Cc: Seth Forshee Cc: Christoph Hellwig Cc: Aleksa Sarai Cc: Linus Torvalds Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner (Microsoft) --- include/linux/fs.h | 18 ++++++++++++++++-- include/linux/mnt_idmapping.h | 5 +++++ 2 files changed, 21 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index afcffa251cb9..54ffcdce3ccb 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -221,8 +221,22 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset, struct iattr { unsigned int ia_valid; umode_t ia_mode; - kuid_t ia_uid; - kgid_t ia_gid; + /* + * The two anonymous unions wrap structures with the same member. + * + * Filesystems raising FS_ALLOW_IDMAP need to use ia_vfs{g,u}id which + * are a dedicated type requiring the filesystem to use the dedicated + * helpers. Other filesystem can continue to use ia_{g,u}id until they + * have been ported. + */ + union { + kuid_t ia_uid; + vfsuid_t ia_vfsuid; + }; + union { + kgid_t ia_gid; + vfsgid_t ia_vfsgid; + }; loff_t ia_size; struct timespec64 ia_atime; struct timespec64 ia_mtime; diff --git a/include/linux/mnt_idmapping.h b/include/linux/mnt_idmapping.h index 71b4cd5a7466..6a752b61088b 100644 --- a/include/linux/mnt_idmapping.h +++ b/include/linux/mnt_idmapping.h @@ -21,6 +21,11 @@ typedef struct { gid_t val; } vfsgid_t; +static_assert(sizeof(vfsuid_t) == sizeof(kuid_t)); +static_assert(sizeof(vfsgid_t) == sizeof(kgid_t)); +static_assert(offsetof(vfsuid_t, val) == offsetof(kuid_t, val)); +static_assert(offsetof(vfsgid_t, val) == offsetof(kgid_t, val)); + #ifdef CONFIG_MULTIUSER static inline uid_t __vfsuid_val(vfsuid_t uid) { -- cgit From 1f36146a5a3dc6098566d34a9886f9e97c88d93e Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Tue, 21 Jun 2022 16:14:50 +0200 Subject: fs: introduce tiny iattr ownership update helpers Nearly all fileystems currently open-code the same checks for determining whether the i_{g,u}id fields of an inode need to be updated and then updating the fields. Introduce tiny helpers i_{g,u}id_needs_update() and i_{g,u}id_update() that wrap this logic. This allows filesystems to not care about updating inode->i_{g,u}id with the correct values themselves instead leaving this to the helpers. We also get rid of a lot of code duplication and make it easier to change struct iattr in the future since changes can be localized to these helpers. And finally we make it hard to conflate k{g,u}id_t types with vfs{g,u}id_t types for filesystems that support idmapped mounts. In the following patch we will port all filesystems that raise FS_ALLOW_IDMAP to use the new helpers. However, the ultimate goal is to convert all filesystems to make use of these helpers. All new helpers are nops on non-idmapped mounts. Link: https://lore.kernel.org/r/20220621141454.2914719-5-brauner@kernel.org Cc: Seth Forshee Cc: Christoph Hellwig Cc: Aleksa Sarai Cc: Linus Torvalds Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner (Microsoft) --- include/linux/fs.h | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 76 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 54ffcdce3ccb..0f8cc7f2665a 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1640,6 +1640,44 @@ static inline vfsuid_t i_uid_into_vfsuid(struct user_namespace *mnt_userns, return make_vfsuid(mnt_userns, i_user_ns(inode), inode->i_uid); } +/** + * i_uid_needs_update - check whether inode's i_uid needs to be updated + * @mnt_userns: user namespace of the mount the inode was found from + * @attr: the new attributes of @inode + * @inode: the inode to update + * + * Check whether the $inode's i_uid field needs to be updated taking idmapped + * mounts into account if the filesystem supports it. + * + * Return: true if @inode's i_uid field needs to be updated, false if not. + */ +static inline bool i_uid_needs_update(struct user_namespace *mnt_userns, + const struct iattr *attr, + const struct inode *inode) +{ + return ((attr->ia_valid & ATTR_UID) && + !vfsuid_eq(attr->ia_vfsuid, + i_uid_into_vfsuid(mnt_userns, inode))); +} + +/** + * i_uid_update - update @inode's i_uid field + * @mnt_userns: user namespace of the mount the inode was found from + * @attr: the new attributes of @inode + * @inode: the inode to update + * + * Safely update @inode's i_uid field translating the vfsuid of any idmapped + * mount into the filesystem kuid. + */ +static inline void i_uid_update(struct user_namespace *mnt_userns, + const struct iattr *attr, + struct inode *inode) +{ + if (attr->ia_valid & ATTR_UID) + inode->i_uid = from_vfsuid(mnt_userns, i_user_ns(inode), + attr->ia_vfsuid); +} + /** * i_gid_into_mnt - map an inode's i_gid down into a mnt_userns * @mnt_userns: user namespace of the mount the inode was found from @@ -1671,6 +1709,44 @@ static inline vfsgid_t i_gid_into_vfsgid(struct user_namespace *mnt_userns, return make_vfsgid(mnt_userns, i_user_ns(inode), inode->i_gid); } +/** + * i_gid_needs_update - check whether inode's i_gid needs to be updated + * @mnt_userns: user namespace of the mount the inode was found from + * @attr: the new attributes of @inode + * @inode: the inode to update + * + * Check whether the $inode's i_gid field needs to be updated taking idmapped + * mounts into account if the filesystem supports it. + * + * Return: true if @inode's i_gid field needs to be updated, false if not. + */ +static inline bool i_gid_needs_update(struct user_namespace *mnt_userns, + const struct iattr *attr, + const struct inode *inode) +{ + return ((attr->ia_valid & ATTR_GID) && + !vfsgid_eq(attr->ia_vfsgid, + i_gid_into_vfsgid(mnt_userns, inode))); +} + +/** + * i_gid_update - update @inode's i_gid field + * @mnt_userns: user namespace of the mount the inode was found from + * @attr: the new attributes of @inode + * @inode: the inode to update + * + * Safely update @inode's i_gid field translating the vfsgid of any idmapped + * mount into the filesystem kgid. + */ +static inline void i_gid_update(struct user_namespace *mnt_userns, + const struct iattr *attr, + struct inode *inode) +{ + if (attr->ia_valid & ATTR_GID) + inode->i_gid = from_vfsgid(mnt_userns, i_user_ns(inode), + attr->ia_vfsgid); +} + /** * inode_fsuid_set - initialize inode's i_uid field with callers fsuid * @inode: inode to initialize -- cgit From 35faf3109a78516f60ca13f957083d5e5535fde0 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Tue, 21 Jun 2022 16:14:51 +0200 Subject: fs: port to iattr ownership update helpers Earlier we introduced new helpers to abstract ownership update and remove code duplication. This converts all filesystems supporting idmapped mounts to make use of these new helpers. For now we always pass the initial idmapping which makes the idmapping functions these helpers call nops. This is done because we currently always pass the actual value to be written to i_{g,u}id via struct iattr. While this allowed us to treat the {g,u}id values in struct iattr as values that can be directly written to inode->i_{g,u}id it also increases the potential for confusion for filesystems. Now that we are have dedicated types to prevent this confusion we will ultimately only map the value from the idmapped mount into a filesystem value that can be written to inode->i_{g,u}id when the filesystem actually updates the inode. So pass down the initial idmapping until we finished that conversion at which point we pass down the mount's idmapping. No functional changes intended. Link: https://lore.kernel.org/r/20220621141454.2914719-6-brauner@kernel.org Cc: Seth Forshee Cc: Christoph Hellwig Cc: Aleksa Sarai Cc: Linus Torvalds Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner (Microsoft) --- include/linux/quotaops.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h index a0f6668924d3..61ee34861ca2 100644 --- a/include/linux/quotaops.h +++ b/include/linux/quotaops.h @@ -22,9 +22,9 @@ static inline struct quota_info *sb_dqopt(struct super_block *sb) /* i_mutex must being held */ static inline bool is_quota_modification(struct inode *inode, struct iattr *ia) { - return (ia->ia_valid & ATTR_SIZE) || - (ia->ia_valid & ATTR_UID && !uid_eq(ia->ia_uid, inode->i_uid)) || - (ia->ia_valid & ATTR_GID && !gid_eq(ia->ia_gid, inode->i_gid)); + return ((ia->ia_valid & ATTR_SIZE) || + i_uid_needs_update(&init_user_ns, ia, inode) || + i_gid_needs_update(&init_user_ns, ia, inode)); } #if defined(CONFIG_QUOTA) -- cgit From 71e7b535b8900d7ce7d5279fa472711db5251ae5 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Tue, 21 Jun 2022 16:14:52 +0200 Subject: quota: port quota helpers mount ids Port the is_quota_modification() and dqout_transfer() helper to type safe vfs{g,u}id_t. Since these helpers are only called by a few filesystems don't introduce a new helper but simply extend the existing helpers to pass down the mount's idmapping. Note, that this is a non-functional change, i.e. nothing will have happened here or at the end of this series to how quota are done! This a change necessary because we will at the end of this series make ownership changes easier to reason about by keeping the original value in struct iattr for both non-idmapped and idmapped mounts. For now we always pass the initial idmapping which makes the idmapping functions these helpers call nops. This is done because we currently always pass the actual value to be written to i_{g,u}id via struct iattr. While this allowed us to treat the {g,u}id values in struct iattr as values that can be directly written to inode->i_{g,u}id it also increases the potential for confusion for filesystems. Now that we are have dedicated types to prevent this confusion we will ultimately only map the value from the idmapped mount into a filesystem value that can be written to inode->i_{g,u}id when the filesystem actually updates the inode. So pass down the initial idmapping until we finished that conversion at which point we pass down the mount's idmapping. Since struct iattr uses an anonymous union with overlapping types as supported by the C standard, filesystems that haven't converted to ia_vfs{g,u}id won't see any difference and things will continue to work as before. In other words, no functional changes intended with this change. Link: https://lore.kernel.org/r/20220621141454.2914719-7-brauner@kernel.org Cc: Seth Forshee Cc: Christoph Hellwig Cc: Jan Kara Cc: Aleksa Sarai Cc: Linus Torvalds Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Jan Kara Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner (Microsoft) --- include/linux/quotaops.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h index 61ee34861ca2..0342ff6584fd 100644 --- a/include/linux/quotaops.h +++ b/include/linux/quotaops.h @@ -20,7 +20,8 @@ static inline struct quota_info *sb_dqopt(struct super_block *sb) } /* i_mutex must being held */ -static inline bool is_quota_modification(struct inode *inode, struct iattr *ia) +static inline bool is_quota_modification(struct user_namespace *mnt_userns, + struct inode *inode, struct iattr *ia) { return ((ia->ia_valid & ATTR_SIZE) || i_uid_needs_update(&init_user_ns, ia, inode) || @@ -115,7 +116,8 @@ int dquot_set_dqblk(struct super_block *sb, struct kqid id, struct qc_dqblk *di); int __dquot_transfer(struct inode *inode, struct dquot **transfer_to); -int dquot_transfer(struct inode *inode, struct iattr *iattr); +int dquot_transfer(struct user_namespace *mnt_userns, struct inode *inode, + struct iattr *iattr); static inline struct mem_dqinfo *sb_dqinfo(struct super_block *sb, int type) { @@ -234,7 +236,8 @@ static inline void dquot_free_inode(struct inode *inode) { } -static inline int dquot_transfer(struct inode *inode, struct iattr *iattr) +static inline int dquot_transfer(struct user_namespace *mnt_userns, + struct inode *inode, struct iattr *iattr) { return 0; } -- cgit From 0e363cf3fa598c69340794da068d4d9cbc869322 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Tue, 21 Jun 2022 16:14:53 +0200 Subject: security: pass down mount idmapping to setattr hook Before this change we used to take a shortcut and place the actual values that would be written to inode->i_{g,u}id into struct iattr. This had the advantage that we moved idmappings mostly out of the picture early on but it made reasoning about changes more difficult than it should be. The filesystem was never explicitly told that it dealt with an idmapped mount. The transition to the value that needed to be stored in inode->i_{g,u}id appeared way too early and increased the probability of bugs in various codepaths. We know place the same value in struct iattr no matter if this is an idmapped mount or not. The vfs will only deal with type safe vfs{g,u}id_t. This makes it massively safer to perform permission checks as the type will tell us what checks we need to perform and what helpers we need to use. Adapt the security_inode_setattr() helper to pass down the mount's idmapping to account for that change. Link: https://lore.kernel.org/r/20220621141454.2914719-8-brauner@kernel.org Cc: Seth Forshee Cc: Christoph Hellwig Cc: Aleksa Sarai Cc: Linus Torvalds Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner (Microsoft) --- include/linux/evm.h | 6 ++++-- include/linux/security.h | 8 +++++--- 2 files changed, 9 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/evm.h b/include/linux/evm.h index 4c374be70247..aa63e0b3c0a2 100644 --- a/include/linux/evm.h +++ b/include/linux/evm.h @@ -21,7 +21,8 @@ extern enum integrity_status evm_verifyxattr(struct dentry *dentry, void *xattr_value, size_t xattr_value_len, struct integrity_iint_cache *iint); -extern int evm_inode_setattr(struct dentry *dentry, struct iattr *attr); +extern int evm_inode_setattr(struct user_namespace *mnt_userns, + struct dentry *dentry, struct iattr *attr); extern void evm_inode_post_setattr(struct dentry *dentry, int ia_valid); extern int evm_inode_setxattr(struct user_namespace *mnt_userns, struct dentry *dentry, const char *name, @@ -68,7 +69,8 @@ static inline enum integrity_status evm_verifyxattr(struct dentry *dentry, } #endif -static inline int evm_inode_setattr(struct dentry *dentry, struct iattr *attr) +static inline int evm_inode_setattr(struct user_namespace *mnt_userns, + struct dentry *dentry, struct iattr *attr) { return 0; } diff --git a/include/linux/security.h b/include/linux/security.h index 7fc4e9f49f54..4d0baf30266e 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -353,7 +353,8 @@ int security_inode_readlink(struct dentry *dentry); int security_inode_follow_link(struct dentry *dentry, struct inode *inode, bool rcu); int security_inode_permission(struct inode *inode, int mask); -int security_inode_setattr(struct dentry *dentry, struct iattr *attr); +int security_inode_setattr(struct user_namespace *mnt_userns, + struct dentry *dentry, struct iattr *attr); int security_inode_getattr(const struct path *path); int security_inode_setxattr(struct user_namespace *mnt_userns, struct dentry *dentry, const char *name, @@ -848,8 +849,9 @@ static inline int security_inode_permission(struct inode *inode, int mask) return 0; } -static inline int security_inode_setattr(struct dentry *dentry, - struct iattr *attr) +static inline int security_inode_setattr(struct user_namespace *mnt_userns, + struct dentry *dentry, + struct iattr *attr) { return 0; } -- cgit From b27c82e1296572cfa3997e58db3118a33915f85c Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Tue, 21 Jun 2022 16:14:54 +0200 Subject: attr: port attribute changes to new types Now that we introduced new infrastructure to increase the type safety for filesystems supporting idmapped mounts port the first part of the vfs over to them. This ports the attribute changes codepaths to rely on the new better helpers using a dedicated type. Before this change we used to take a shortcut and place the actual values that would be written to inode->i_{g,u}id into struct iattr. This had the advantage that we moved idmappings mostly out of the picture early on but it made reasoning about changes more difficult than it should be. The filesystem was never explicitly told that it dealt with an idmapped mount. The transition to the value that needed to be stored in inode->i_{g,u}id appeared way too early and increased the probability of bugs in various codepaths. We know place the same value in struct iattr no matter if this is an idmapped mount or not. The vfs will only deal with type safe vfs{g,u}id_t. This makes it massively safer to perform permission checks as the type will tell us what checks we need to perform and what helpers we need to use. Fileystems raising FS_ALLOW_IDMAP can't simply write ia_vfs{g,u}id to inode->i_{g,u}id since they are different types. Instead they need to use the dedicated vfs{g,u}id_to_k{g,u}id() helpers that map the vfs{g,u}id into the filesystem. The other nice effect is that filesystems like overlayfs don't need to care about idmappings explicitly anymore and can simply set up struct iattr accordingly directly. Link: https://lore.kernel.org/lkml/CAHk-=win6+ahs1EwLkcq8apqLi_1wXFWbrPf340zYEhObpz4jA@mail.gmail.com [1] Link: https://lore.kernel.org/r/20220621141454.2914719-9-brauner@kernel.org Cc: Seth Forshee Cc: Christoph Hellwig Cc: Aleksa Sarai Cc: Linus Torvalds Cc: Al Viro CC: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner (Microsoft) --- include/linux/fs.h | 4 ++++ include/linux/quotaops.h | 4 ++-- 2 files changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 0f8cc7f2665a..d6e3347cbf69 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -228,6 +228,10 @@ struct iattr { * are a dedicated type requiring the filesystem to use the dedicated * helpers. Other filesystem can continue to use ia_{g,u}id until they * have been ported. + * + * They always contain the same value. In other words FS_ALLOW_IDMAP + * pass down the same value on idmapped mounts as they would on regular + * mounts. */ union { kuid_t ia_uid; diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h index 0342ff6584fd..0d8625d71733 100644 --- a/include/linux/quotaops.h +++ b/include/linux/quotaops.h @@ -24,8 +24,8 @@ static inline bool is_quota_modification(struct user_namespace *mnt_userns, struct inode *inode, struct iattr *ia) { return ((ia->ia_valid & ATTR_SIZE) || - i_uid_needs_update(&init_user_ns, ia, inode) || - i_gid_needs_update(&init_user_ns, ia, inode)); + i_uid_needs_update(mnt_userns, ia, inode) || + i_gid_needs_update(mnt_userns, ia, inode)); } #if defined(CONFIG_QUOTA) -- cgit From 0cae04373b77a117830e5f7d7aaa7eaf01f950d5 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 6 Jun 2022 09:47:33 +0200 Subject: dmaengine: remove DMA_MEMCPY_SG once again This was removed before due to the complete lack of users, but 3218910fd585 ("dmaengine: Add core function and capability check for DMA_MEMCPY_SG") and 29cf37fa6dd9 ("dmaengine: Add consumer for the new DMA_MEMCPY_SG API function.") added it back despite still not having any users whatsoever. Fixes: 3218910fd585 ("dmaengine: Add core function and capability check for DMA_MEMCPY_SG") Fixes: 29cf37fa6dd9 ("dmaengine: Add consumer for the new DMA_MEMCPY_SG API function.") Signed-off-by: Christoph Hellwig Acked-by: Michal Simek Link: https://lore.kernel.org/r/20220606074733.622616-1-hch@lst.de Signed-off-by: Vinod Koul --- include/linux/dmaengine.h | 20 -------------------- 1 file changed, 20 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h index b46b88e6aa0d..c923f4e60f24 100644 --- a/include/linux/dmaengine.h +++ b/include/linux/dmaengine.h @@ -50,7 +50,6 @@ enum dma_status { */ enum dma_transaction_type { DMA_MEMCPY, - DMA_MEMCPY_SG, DMA_XOR, DMA_PQ, DMA_XOR_VAL, @@ -887,11 +886,6 @@ struct dma_device { struct dma_async_tx_descriptor *(*device_prep_dma_memcpy)( struct dma_chan *chan, dma_addr_t dst, dma_addr_t src, size_t len, unsigned long flags); - struct dma_async_tx_descriptor *(*device_prep_dma_memcpy_sg)( - struct dma_chan *chan, - struct scatterlist *dst_sg, unsigned int dst_nents, - struct scatterlist *src_sg, unsigned int src_nents, - unsigned long flags); struct dma_async_tx_descriptor *(*device_prep_dma_xor)( struct dma_chan *chan, dma_addr_t dst, dma_addr_t *src, unsigned int src_cnt, size_t len, unsigned long flags); @@ -1060,20 +1054,6 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memcpy( len, flags); } -static inline struct dma_async_tx_descriptor *dmaengine_prep_dma_memcpy_sg( - struct dma_chan *chan, - struct scatterlist *dst_sg, unsigned int dst_nents, - struct scatterlist *src_sg, unsigned int src_nents, - unsigned long flags) -{ - if (!chan || !chan->device || !chan->device->device_prep_dma_memcpy_sg) - return NULL; - - return chan->device->device_prep_dma_memcpy_sg(chan, dst_sg, dst_nents, - src_sg, src_nents, - flags); -} - static inline bool dmaengine_is_metadata_mode_supported(struct dma_chan *chan, enum dma_desc_metadata_mode mode) { -- cgit From 742ab6df974ae8384a2dd213db1a3a06cf6d8936 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 14 Jun 2022 23:15:32 +0200 Subject: x86/kvm/vmx: Make noinstr clean The recent mmio_stale_data fixes broke the noinstr constraints: vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x15b: call to wrmsrl.constprop.0() leaves .noinstr.text section vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x1bf: call to kvm_arch_has_assigned_device() leaves .noinstr.text section make it all happy again. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Borislav Petkov --- include/linux/kvm_host.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index c20f2d55840c..83cf7fd842e0 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1513,7 +1513,7 @@ static inline void kvm_arch_end_assignment(struct kvm *kvm) { } -static inline bool kvm_arch_has_assigned_device(struct kvm *kvm) +static __always_inline bool kvm_arch_has_assigned_device(struct kvm *kvm) { return false; } -- cgit From 6b80b59b3555706508008f1f127b5412c89c7fd8 Mon Sep 17 00:00:00 2001 From: Alexandre Chartre Date: Tue, 14 Jun 2022 23:15:49 +0200 Subject: x86/bugs: Report AMD retbleed vulnerability Report that AMD x86 CPUs are vulnerable to the RETBleed (Arbitrary Speculative Code Execution with Return Instructions) attack. [peterz: add hygon] [kim: invert parity; fam15h] Co-developed-by: Kim Phillips Signed-off-by: Kim Phillips Signed-off-by: Alexandre Chartre Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Borislav Petkov Reviewed-by: Josh Poimboeuf Signed-off-by: Borislav Petkov --- include/linux/cpu.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 2c7477354744..314802f98b9d 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -68,6 +68,8 @@ extern ssize_t cpu_show_srbds(struct device *dev, struct device_attribute *attr, extern ssize_t cpu_show_mmio_stale_data(struct device *dev, struct device_attribute *attr, char *buf); +extern ssize_t cpu_show_retbleed(struct device *dev, + struct device_attribute *attr, char *buf); extern __printf(4, 5) struct device *cpu_device_create(struct device *parent, void *drvdata, -- cgit From a09a6e2399ba0595c3042b3164f3ca68a3cff33e Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 14 Jun 2022 23:16:03 +0200 Subject: objtool: Add entry UNRET validation Since entry asm is tricky, add a validation pass that ensures the retbleed mitigation has been done before the first actual RET instruction. Entry points are those that either have UNWIND_HINT_ENTRY, which acts as UNWIND_HINT_EMPTY but marks the instruction as an entry point, or those that have UWIND_HINT_IRET_REGS at +0. This is basically a variant of validate_branch() that is intra-function and it will simply follow all branches from marked entry points and ensures that all paths lead to ANNOTATE_UNRET_END. If a path hits RET or an indirection the path is a fail and will be reported. There are 3 ANNOTATE_UNRET_END instances: - UNTRAIN_RET itself - exception from-kernel; this path doesn't need UNTRAIN_RET - all early exceptions; these also don't need UNTRAIN_RET Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Borislav Petkov Reviewed-by: Josh Poimboeuf Signed-off-by: Borislav Petkov --- include/linux/objtool.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/objtool.h b/include/linux/objtool.h index 15b940ec1eac..b026f1ae39c6 100644 --- a/include/linux/objtool.h +++ b/include/linux/objtool.h @@ -32,11 +32,14 @@ struct unwind_hint { * * UNWIND_HINT_FUNC: Generate the unwind metadata of a callable function. * Useful for code which doesn't have an ELF function annotation. + * + * UNWIND_HINT_ENTRY: machine entry without stack, SYSCALL/SYSENTER etc. */ #define UNWIND_HINT_TYPE_CALL 0 #define UNWIND_HINT_TYPE_REGS 1 #define UNWIND_HINT_TYPE_REGS_PARTIAL 2 #define UNWIND_HINT_TYPE_FUNC 3 +#define UNWIND_HINT_TYPE_ENTRY 4 #ifdef CONFIG_OBJTOOL -- cgit From 8faea26e611189e933ea2281975ff4dc7c1106b6 Mon Sep 17 00:00:00 2001 From: Josh Poimboeuf Date: Fri, 24 Jun 2022 12:52:40 +0200 Subject: objtool: Re-add UNWIND_HINT_{SAVE_RESTORE} Commit c536ed2fffd5 ("objtool: Remove SAVE/RESTORE hints") removed the save/restore unwind hints because they were no longer needed. Now they're going to be needed again so re-add them. Signed-off-by: Josh Poimboeuf Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Borislav Petkov --- include/linux/objtool.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/objtool.h b/include/linux/objtool.h index b026f1ae39c6..10bc88cc3bf6 100644 --- a/include/linux/objtool.h +++ b/include/linux/objtool.h @@ -40,6 +40,8 @@ struct unwind_hint { #define UNWIND_HINT_TYPE_REGS_PARTIAL 2 #define UNWIND_HINT_TYPE_FUNC 3 #define UNWIND_HINT_TYPE_ENTRY 4 +#define UNWIND_HINT_TYPE_SAVE 5 +#define UNWIND_HINT_TYPE_RESTORE 6 #ifdef CONFIG_OBJTOOL @@ -127,7 +129,7 @@ struct unwind_hint { * the debuginfo as necessary. It will also warn if it sees any * inconsistencies. */ -.macro UNWIND_HINT sp_reg:req sp_offset=0 type:req end=0 +.macro UNWIND_HINT type:req sp_reg=0 sp_offset=0 end=0 .Lunwind_hint_ip_\@: .pushsection .discard.unwind_hints /* struct unwind_hint */ @@ -180,7 +182,7 @@ struct unwind_hint { #define ASM_REACHABLE #else #define ANNOTATE_INTRA_FUNCTION_CALL -.macro UNWIND_HINT sp_reg:req sp_offset=0 type:req end=0 +.macro UNWIND_HINT type:req sp_reg=0 sp_offset=0 end=0 .endm .macro STACK_FRAME_NON_STANDARD func:req .endm -- cgit From 7283f862bd991c8657e9bf1c02db772fcf018f13 Mon Sep 17 00:00:00 2001 From: Thomas Zimmermann Date: Wed, 22 Jun 2022 16:01:33 +0200 Subject: drm: Implement DRM aperture helpers under video/ Implement DRM's aperture helpers under video/ for sharing with other sub-systems. Remove DRM-isms from the interface. The helpers track the ownership of framebuffer apertures and provide hand-over from firmware, such as EFI and VESA, to native graphics drivers. Other subsystems, such as fbdev and vfio, also have to maintain ownership of framebuffer apertures. Moving DRM's aperture helpers to a more public location allows all subsystems to interact with each other and share a common implementation. The aperture helpers are selected by the various firmware drivers within DRM and fbdev, and the VGA text-console driver. The original DRM interface is kept in place for use by DRM drivers. v3: * prefix all interfaces with aperture_ (Javier) * rework and simplify documentation (Javier) * rename struct dev_aperture to struct aperture_range * rebase onto latest DRM * update MAINTAINERS entry Signed-off-by: Thomas Zimmermann Reviewed-by: Javier Martinez Canillas Tested-by: Laszlo Ersek Link: https://patchwork.freedesktop.org/patch/msgid/20220622140134.12763-3-tzimmermann@suse.de --- include/linux/aperture.h | 56 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 56 insertions(+) create mode 100644 include/linux/aperture.h (limited to 'include/linux') diff --git a/include/linux/aperture.h b/include/linux/aperture.h new file mode 100644 index 000000000000..442f15a57cad --- /dev/null +++ b/include/linux/aperture.h @@ -0,0 +1,56 @@ +/* SPDX-License-Identifier: MIT */ + +#ifndef _LINUX_APERTURE_H_ +#define _LINUX_APERTURE_H_ + +#include + +struct pci_dev; +struct platform_device; + +#if defined(CONFIG_APERTURE_HELPERS) +int devm_aperture_acquire_for_platform_device(struct platform_device *pdev, + resource_size_t base, + resource_size_t size); + +int aperture_remove_conflicting_devices(resource_size_t base, resource_size_t size, + bool primary, const char *name); + +int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *name); +#else +static inline int devm_aperture_acquire_for_platform_device(struct platform_device *pdev, + resource_size_t base, + resource_size_t size) +{ + return 0; +} + +static inline int aperture_remove_conflicting_devices(resource_size_t base, resource_size_t size, + bool primary, const char *name) +{ + return 0; +} + +static inline int aperture_remove_conflicting_pci_devices(struct pci_dev *pdev, const char *name) +{ + return 0; +} +#endif + +/** + * aperture_remove_all_conflicting_devices - remove all existing framebuffers + * @primary: also kick vga16fb if present; only relevant for VGA devices + * @name: a descriptive name of the requesting driver + * + * This function removes all graphics device drivers. Use this function on systems + * that can have their framebuffer located anywhere in memory. + * + * Returns: + * 0 on success, or a negative errno code otherwise + */ +static inline int aperture_remove_all_conflicting_devices(bool primary, const char *name) +{ + return aperture_remove_conflicting_devices(0, (resource_size_t)-1, primary, name); +} + +#endif -- cgit From 7dc54d3b8d9100c65b0860f76342fc9f3b4694d9 Mon Sep 17 00:00:00 2001 From: Clément Léger Date: Fri, 24 Jun 2022 16:39:50 +0200 Subject: net: pcs: add Renesas MII converter driver MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add a PCS driver for the MII converter that is present on the Renesas RZ/N1 SoC. This MII converter is reponsible for converting MII to RMII/RGMII or act as a MII pass-trough. Exposing it as a PCS allows to reuse it in both the switch driver and the stmmac driver. Currently, this driver only allows the PCS to be used by the dual Cortex-A7 subsystem since the register locking system is not used. Signed-off-by: Clément Léger Reviewed-by: Vladimir Oltean Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/linux/pcs-rzn1-miic.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) create mode 100644 include/linux/pcs-rzn1-miic.h (limited to 'include/linux') diff --git a/include/linux/pcs-rzn1-miic.h b/include/linux/pcs-rzn1-miic.h new file mode 100644 index 000000000000..56d12b21365d --- /dev/null +++ b/include/linux/pcs-rzn1-miic.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022 Schneider Electric + * + * Clément Léger + */ + +#ifndef __LINUX_PCS_MIIC_H +#define __LINUX_PCS_MIIC_H + +struct phylink; +struct device_node; + +struct phylink_pcs *miic_create(struct device *dev, struct device_node *np); + +void miic_destroy(struct phylink_pcs *pcs); + +#endif /* __LINUX_PCS_MIIC_H */ -- cgit From 8da443b1a4036b5863fd2ec7e0251164b704c726 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Tue, 14 Jun 2022 11:05:34 +0200 Subject: tty/vt: consolemap: rename struct vc_data::vc_uni_pagedir* MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit As a follow-up to the commit 4173f018aae1 (tty/vt: consolemap: rename and document struct uni_pagedir), rename also the members of struct vc_data. I.e. pagedir -> pagedict. And while touching all the places, remove also the unnecessary vc_ prefix. Suggested-by: Ilpo Järvinen Reviewed-by: Ilpo Järvinen Signed-off-by: Jiri Slaby Link: https://lore.kernel.org/r/20220614090537.15557-5-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman --- include/linux/console_struct.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/console_struct.h b/include/linux/console_struct.h index f75033f0277f..1518568aaf0f 100644 --- a/include/linux/console_struct.h +++ b/include/linux/console_struct.h @@ -157,8 +157,8 @@ struct vc_data { unsigned int vc_bell_duration; /* Console bell duration */ unsigned short vc_cur_blink_ms; /* Cursor blink duration */ struct vc_data **vc_display_fg; /* [!] Ptr to var holding fg console for this display */ - struct uni_pagedict *vc_uni_pagedir; - struct uni_pagedict **vc_uni_pagedir_loc; /* [!] Location of uni_pagedict variable for this console */ + struct uni_pagedict *uni_pagedict; + struct uni_pagedict **uni_pagedict_loc; /* [!] Location of uni_pagedict variable for this console */ struct uni_screen *vc_uni_screen; /* unicode screen content */ /* additional information is in vt_kern.h */ }; -- cgit From 1714582a3a087eda8786d5a1b32b2ec86ca8a303 Mon Sep 17 00:00:00 2001 From: David Jander Date: Tue, 21 Jun 2022 08:12:24 +0200 Subject: spi: Move ctlr->cur_msg_prepared to struct spi_message This enables the possibility to transfer a message that is not at the current tip of the async message queue. This is in preparation of the next patch(es) which enable spi_sync messages to skip the queue altogether. Signed-off-by: David Jander Link: https://lore.kernel.org/r/20220621061234.3626638-2-david@protonic.nl Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index c96f526d9a20..1a75c26742f2 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -385,8 +385,6 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch * @queue: message queue * @idling: the device is entering idle state * @cur_msg: the currently in-flight message - * @cur_msg_prepared: spi_prepare_message was called for the currently - * in-flight message * @cur_msg_mapped: message has been mapped for DMA * @last_cs: the last chip_select that is recorded by set_cs, -1 on non chip * selected @@ -621,7 +619,6 @@ struct spi_controller { bool running; bool rt; bool auto_runtime_pm; - bool cur_msg_prepared; bool cur_msg_mapped; char last_cs; bool last_cs_mode_high; @@ -988,6 +985,7 @@ struct spi_transfer { * @queue: for use by whichever driver currently owns the message * @state: for use by whichever driver currently owns the message * @resources: for resource management when the spi message is processed + * @prepared: spi_prepare_message was called for the this message * * A @spi_message is used to execute an atomic sequence of data transfers, * each represented by a struct spi_transfer. The sequence is "atomic" @@ -1037,6 +1035,9 @@ struct spi_message { /* list of spi_res reources when the spi message is processed */ struct list_head resources; + + /* spi_prepare_message was called for this message */ + bool prepared; }; static inline void spi_message_init_no_memset(struct spi_message *m) -- cgit From ae7d2346dc89ae89a6e0aabe6037591a11e593c0 Mon Sep 17 00:00:00 2001 From: David Jander Date: Tue, 21 Jun 2022 08:12:25 +0200 Subject: spi: Don't use the message queue if possible in spi_sync The interaction with the controller message queue and its corresponding auxiliary flags and variables requires the use of the queue_lock which is costly. Since spi_sync will transfer the complete message anyway, and not return until it is finished, there is no need to put the message into the queue if the queue is empty. This can save a lot of overhead. As an example of how significant this is, when using the MCP2518FD SPI CAN controller on a i.MX8MM SoC, the time during which the interrupt line stays active (during 3 relatively short spi_sync messages), is reduced from 98us to 72us by this patch. Signed-off-by: David Jander Link: https://lore.kernel.org/r/20220621061234.3626638-3-david@protonic.nl Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 1a75c26742f2..74261a83b5fa 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -461,6 +461,8 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch * @irq_flags: Interrupt enable state during PTP system timestamping * @fallback: fallback to pio if dma transfer return failure with * SPI_TRANS_FAIL_NO_START. + * @queue_empty: signal green light for opportunistically skipping the queue + * for spi_sync transfers. * * Each SPI controller can communicate with one or more @spi_device * children. These make a small bus, sharing MOSI, MISO and SCK signals @@ -677,6 +679,9 @@ struct spi_controller { /* Interrupt enable state during PTP system timestamping */ unsigned long irq_flags; + + /* Flag for enabling opportunistic skipping of the queue in spi_sync */ + bool queue_empty; }; static inline void *spi_controller_get_devdata(struct spi_controller *ctlr) @@ -986,6 +991,7 @@ struct spi_transfer { * @state: for use by whichever driver currently owns the message * @resources: for resource management when the spi message is processed * @prepared: spi_prepare_message was called for the this message + * @sync: this message took the direct sync path skipping the async queue * * A @spi_message is used to execute an atomic sequence of data transfers, * each represented by a struct spi_transfer. The sequence is "atomic" @@ -1037,7 +1043,10 @@ struct spi_message { struct list_head resources; /* spi_prepare_message was called for this message */ - bool prepared; + bool prepared; + + /* this message is skipping the async queue */ + bool sync; }; static inline void spi_message_init_no_memset(struct spi_message *m) -- cgit From 66a221593cb26dd6aabba63bcd18173f4e69c7ab Mon Sep 17 00:00:00 2001 From: David Jander Date: Tue, 21 Jun 2022 08:12:30 +0200 Subject: spi: Remove the now unused ctlr->idling flag The ctlr->idling flag is never checked now, so we don't need to set it either. Signed-off-by: David Jander Link: https://lore.kernel.org/r/20220621061234.3626638-8-david@protonic.nl Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 74261a83b5fa..c58f46be762f 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -383,7 +383,6 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch * @pump_messages: work struct for scheduling work to the message pump * @queue_lock: spinlock to syncronise access to message queue * @queue: message queue - * @idling: the device is entering idle state * @cur_msg: the currently in-flight message * @cur_msg_mapped: message has been mapped for DMA * @last_cs: the last chip_select that is recorded by set_cs, -1 on non chip @@ -616,7 +615,6 @@ struct spi_controller { spinlock_t queue_lock; struct list_head queue; struct spi_message *cur_msg; - bool idling; bool busy; bool running; bool rt; -- cgit From 69fa95905d40846756d22402690ddf5361a9d13b Mon Sep 17 00:00:00 2001 From: David Jander Date: Tue, 21 Jun 2022 08:12:33 +0200 Subject: spi: Ensure the io_mutex is held until spi_finalize_current_message() This patch introduces a completion that is completed in spi_finalize_current_message() and waited for in __spi_pump_transfer_message(). This way all manipulation of ctlr->cur_msg is done with the io_mutex held and strictly ordered: __spi_pump_transfer_message() will not return until spi_finalize_current_message() is done using ctlr->cur_msg, and its calling context is only touching ctlr->cur_msg after returning. Due to this, we can safely drop the spin-locks around ctlr->cur_msg. Signed-off-by: David Jander Link: https://lore.kernel.org/r/20220621061234.3626638-11-david@protonic.nl Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index c58f46be762f..c56e0d240a58 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -384,6 +384,7 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch * @queue_lock: spinlock to syncronise access to message queue * @queue: message queue * @cur_msg: the currently in-flight message + * @cur_msg_completion: a completion for the current in-flight message * @cur_msg_mapped: message has been mapped for DMA * @last_cs: the last chip_select that is recorded by set_cs, -1 on non chip * selected @@ -615,6 +616,7 @@ struct spi_controller { spinlock_t queue_lock; struct list_head queue; struct spi_message *cur_msg; + struct completion cur_msg_completion; bool busy; bool running; bool rt; @@ -989,7 +991,6 @@ struct spi_transfer { * @state: for use by whichever driver currently owns the message * @resources: for resource management when the spi message is processed * @prepared: spi_prepare_message was called for the this message - * @sync: this message took the direct sync path skipping the async queue * * A @spi_message is used to execute an atomic sequence of data transfers, * each represented by a struct spi_transfer. The sequence is "atomic" @@ -1042,9 +1043,6 @@ struct spi_message { /* spi_prepare_message was called for this message */ bool prepared; - - /* this message is skipping the async queue */ - bool sync; }; static inline void spi_message_init_no_memset(struct spi_message *m) -- cgit From dc3029056b02414c29b6627e3dd7b16624725ae9 Mon Sep 17 00:00:00 2001 From: David Jander Date: Tue, 21 Jun 2022 08:12:34 +0200 Subject: spi: opportunistically skip ctlr->cur_msg_completion There are only a few drivers that do not call spi_finalize_current_message() in the context of transfer_one_message(), and even for those cases the completion ctlr->cur_msg_completion is not needed always. The calls to complete() and wait_for_completion() each take a spin-lock, which is costly. This patch makes it possible to avoid those calls in the big majority of cases, by introducing two flags that with the help of ordering via barriers can avoid using the completion safely. In case of a race with the context calling spi_finalize_current_message(), the scheme errs on the safe side and takes the completion. The impact of this patch is worth the effort: On a i.MX8MM SoC, the time the SPI bus is idle between two consecutive calls to spi_sync(), is reduced from 19.6us to 16.8us... roughly 15%. Signed-off-by: David Jander Link: https://lore.kernel.org/r/20220621061234.3626638-12-david@protonic.nl Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index c56e0d240a58..eb0d316e3c36 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -385,6 +385,12 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch * @queue: message queue * @cur_msg: the currently in-flight message * @cur_msg_completion: a completion for the current in-flight message + * @cur_msg_incomplete: Flag used internally to opportunistically skip + * the @cur_msg_completion. This flag is used to check if the driver has + * already called spi_finalize_current_message(). + * @cur_msg_need_completion: Flag used internally to opportunistically skip + * the @cur_msg_completion. This flag is used to signal the context that + * is running spi_finalize_current_message() that it needs to complete() * @cur_msg_mapped: message has been mapped for DMA * @last_cs: the last chip_select that is recorded by set_cs, -1 on non chip * selected @@ -617,6 +623,8 @@ struct spi_controller { struct list_head queue; struct spi_message *cur_msg; struct completion cur_msg_completion; + bool cur_msg_incomplete; + bool cur_msg_need_completion; bool busy; bool running; bool rt; -- cgit From 4a2dcc35911324d6fcde09b1760cf4f2962699ef Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Fri, 10 Jun 2022 12:58:23 -0700 Subject: block: introduce bdev_dma_alignment helper Preparing for upcoming dma_alignment users that have a block_device, but don't need the request_queue. Signed-off-by: Keith Busch Reviewed-by: Johannes Thumshirn Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220610195830.3574005-5-kbusch@fb.com Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 2f7b43444c5f..2556fcdb645b 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1365,6 +1365,11 @@ static inline int queue_dma_alignment(const struct request_queue *q) return q ? q->dma_alignment : 511; } +static inline unsigned int bdev_dma_alignment(struct block_device *bdev) +{ + return queue_dma_alignment(bdev_get_queue(bdev)); +} + static inline int blk_rq_aligned(struct request_queue *q, unsigned long addr, unsigned int len) { -- cgit From cfa320f72882f0e944e2237287db84b0f7df877d Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Fri, 10 Jun 2022 12:58:27 -0700 Subject: iov: introduce iov_iter_aligned The existing iov_iter_alignment() function returns the logical OR of address and length. For cases where address and length need to be considered separately, introduce a helper function that a caller can specificy length and address masks that indicate if the iov is unaligned. Cc: Alexander Viro Signed-off-by: Keith Busch Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220610195830.3574005-9-kbusch@fb.com Signed-off-by: Jens Axboe --- include/linux/uio.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/uio.h b/include/linux/uio.h index 739285fe5a2f..34ba4a731179 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -219,6 +219,8 @@ size_t _copy_mc_to_iter(const void *addr, size_t bytes, struct iov_iter *i); #endif size_t iov_iter_zero(size_t bytes, struct iov_iter *); +bool iov_iter_is_aligned(const struct iov_iter *i, unsigned addr_mask, + unsigned len_mask); unsigned long iov_iter_alignment(const struct iov_iter *i); unsigned long iov_iter_gap_alignment(const struct iov_iter *i); void iov_iter_init(struct iov_iter *i, unsigned int direction, const struct iovec *iov, -- cgit From 5debd9691c3ac64c3acd6867c264ad38bbe48cdc Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Fri, 10 Jun 2022 12:58:28 -0700 Subject: block: introduce bdev_iter_is_aligned helper Provide a convenient function for this repeatable coding pattern. Signed-off-by: Keith Busch Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220610195830.3574005-10-kbusch@fb.com Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 2556fcdb645b..0b8bc1fe0b2c 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1370,6 +1370,13 @@ static inline unsigned int bdev_dma_alignment(struct block_device *bdev) return queue_dma_alignment(bdev_get_queue(bdev)); } +static inline bool bdev_iter_is_aligned(struct block_device *bdev, + struct iov_iter *iter) +{ + return iov_iter_is_aligned(iter, bdev_dma_alignment(bdev), + bdev_logical_block_size(bdev) - 1); +} + static inline int blk_rq_aligned(struct request_queue *q, unsigned long addr, unsigned int len) { -- cgit From b1a000d3b8ec582da64bb644be633e5a0beffcbf Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Fri, 10 Jun 2022 12:58:29 -0700 Subject: block: relax direct io memory alignment Use the address alignment requirements from the block_device for direct io instead of requiring addresses be aligned to the block size. User space can discover the alignment requirements from the dma_alignment queue attribute. User space can specify any hardware compatible DMA offset for each segment, but every segment length is still required to be a multiple of the block size. Signed-off-by: Keith Busch Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220610195830.3574005-11-kbusch@fb.com Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 0b8bc1fe0b2c..886c44e97434 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -424,6 +424,11 @@ struct request_queue { unsigned long nr_requests; /* Max # of requests */ unsigned int dma_pad_mask; + /* + * Drivers that set dma_alignment to less than 511 must be prepared to + * handle individual bvec's that are not a multiple of a SECTOR_SIZE + * due to possible offsets. + */ unsigned int dma_alignment; #ifdef CONFIG_BLK_INLINE_ENCRYPTION -- cgit From 8689461be3f1ce6686bc26f1f379790bb0fc7a8c Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 14 Jun 2022 11:09:29 +0200 Subject: block: factor out a chunk_size_left helper Factor out a helper from blk_max_size_offset so that it can be reused independently. Signed-off-by: Christoph Hellwig Reviewed-by: Bart Van Assche Reviewed-by: Pankaj Raghav Link: https://lore.kernel.org/r/20220614090934.570632-2-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 886c44e97434..283961257cc9 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -934,6 +934,17 @@ static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q, return q->limits.max_sectors; } +/* + * Return how much of the chunk is left to be used for I/O at a given offset. + */ +static inline unsigned int blk_chunk_sectors_left(sector_t offset, + unsigned int chunk_sectors) +{ + if (unlikely(!is_power_of_2(chunk_sectors))) + return chunk_sectors - sector_div(offset, chunk_sectors); + return chunk_sectors - (offset & (chunk_sectors - 1)); +} + /* * Return maximum size of a request at given offset. Only valid for * file system requests. @@ -949,12 +960,8 @@ static inline unsigned int blk_max_size_offset(struct request_queue *q, return q->limits.max_sectors; } - if (likely(is_power_of_2(chunk_sectors))) - chunk_sectors -= offset & (chunk_sectors - 1); - else - chunk_sectors -= sector_div(offset, chunk_sectors); - - return min(q->limits.max_sectors, chunk_sectors); + return min(q->limits.max_sectors, + blk_chunk_sectors_left(offset, chunk_sectors)); } /* -- cgit From efef739d5f37dc998b113fb965aea68d42a9eddc Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 14 Jun 2022 11:09:33 +0200 Subject: block: fold blk_max_size_offset into get_max_io_size Now that blk_max_size_offset has a single caller left, fold it into that and clean up the naming convention for the local variables there. Signed-off-by: Christoph Hellwig Reviewed-by: Pankaj Raghav Reviewed-by: Bart Van Assche Link: https://lore.kernel.org/r/20220614090934.570632-6-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 19 ------------------- 1 file changed, 19 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 283961257cc9..652c357dafb9 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -945,25 +945,6 @@ static inline unsigned int blk_chunk_sectors_left(sector_t offset, return chunk_sectors - (offset & (chunk_sectors - 1)); } -/* - * Return maximum size of a request at given offset. Only valid for - * file system requests. - */ -static inline unsigned int blk_max_size_offset(struct request_queue *q, - sector_t offset, - unsigned int chunk_sectors) -{ - if (!chunk_sectors) { - if (q->limits.chunk_sectors) - chunk_sectors = q->limits.chunk_sectors; - else - return q->limits.max_sectors; - } - - return min(q->limits.max_sectors, - blk_chunk_sectors_left(offset, chunk_sectors)); -} - /* * Access functions for manipulating queue properties */ -- cgit From 2a9336c42a6abdcef564d522648a684a474a3483 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 14 Jun 2022 11:09:34 +0200 Subject: block: move blk_queue_get_max_sectors to blk.h blk_queue_get_max_sectors is private to the block layer, so move it out of blkdev.h. Signed-off-by: Christoph Hellwig Reviewed-by: Bart Van Assche Link: https://lore.kernel.org/r/20220614090934.570632-7-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 13 ------------- 1 file changed, 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 652c357dafb9..b2d42201bd5d 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -921,19 +921,6 @@ static inline unsigned int bio_zone_is_seq(struct bio *bio) } #endif /* CONFIG_BLK_DEV_ZONED */ -static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q, - int op) -{ - if (unlikely(op == REQ_OP_DISCARD || op == REQ_OP_SECURE_ERASE)) - return min(q->limits.max_discard_sectors, - UINT_MAX >> SECTOR_SHIFT); - - if (unlikely(op == REQ_OP_WRITE_ZEROES)) - return q->limits.max_write_zeroes_sectors; - - return q->limits.max_sectors; -} - /* * Return how much of the chunk is left to be used for I/O at a given offset. */ -- cgit From e589f46445960c274cc813a1cc8e2fc73b2a1849 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Thu, 23 Jun 2022 09:48:26 +0200 Subject: block: fix default IO priority handling again Commit e70344c05995 ("block: fix default IO priority handling") introduced an inconsistency in get_current_ioprio() that tasks without IO context return IOPRIO_DEFAULT priority while tasks with freshly allocated IO context will return 0 (IOPRIO_CLASS_NONE/0) IO priority. Tasks without IO context used to be rare before 5a9d041ba2f6 ("block: move io_context creation into where it's needed") but after this commit they became common because now only BFQ IO scheduler setups task's IO context. Similar inconsistency is there for get_task_ioprio() so this inconsistency is now exposed to userspace and userspace will see different IO priority for tasks operating on devices with BFQ compared to devices without BFQ. Furthemore the changes done by commit e70344c05995 change the behavior when no IO priority is set for BFQ IO scheduler which is also documented in ioprio_set(2) manpage: "If no I/O scheduler has been set for a thread, then by default the I/O priority will follow the CPU nice value (setpriority(2)). In Linux kernels before version 2.6.24, once an I/O priority had been set using ioprio_set(), there was no way to reset the I/O scheduling behavior to the default. Since Linux 2.6.24, specifying ioprio as 0 can be used to reset to the default I/O scheduling behavior." So make sure we default to IOPRIO_CLASS_NONE as used to be the case before commit e70344c05995. Also cleanup alloc_io_context() to explicitely set this IO priority for the allocated IO context to avoid future surprises. Note that we tweak ioprio_best() to maintain ioprio_get(2) behavior and make this commit easily backportable. CC: stable@vger.kernel.org Fixes: e70344c05995 ("block: fix default IO priority handling") Reviewed-by: Damien Le Moal Tested-by: Damien Le Moal Signed-off-by: Jan Kara Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220623074840.5960-1-jack@suse.cz Signed-off-by: Jens Axboe --- include/linux/ioprio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h index 3f53bc27a19b..3d088a88f832 100644 --- a/include/linux/ioprio.h +++ b/include/linux/ioprio.h @@ -11,7 +11,7 @@ /* * Default IO priority. */ -#define IOPRIO_DEFAULT IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, IOPRIO_BE_NORM) +#define IOPRIO_DEFAULT IOPRIO_PRIO_VALUE(IOPRIO_CLASS_NONE, 0) /* * Check that a priority value has a valid class. -- cgit From f7eda402878b12bc0884c5bc1192a9e76ad121fb Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Thu, 23 Jun 2022 09:48:27 +0200 Subject: block: Return effective IO priority from get_current_ioprio() get_current_ioprio() is used to initialize IO priority of various requests. As such it should be returning the effective IO priority of the task (i.e., reflecting the fact that unset IO priority should get set based on task's CPU priority) so that the conversion is concentrated in one place. Reviewed-by: Damien Le Moal Tested-by: Damien Le Moal Signed-off-by: Jan Kara Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220623074840.5960-2-jack@suse.cz Signed-off-by: Jens Axboe --- include/linux/ioprio.h | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h index 3d088a88f832..61ed6bb4998e 100644 --- a/include/linux/ioprio.h +++ b/include/linux/ioprio.h @@ -53,10 +53,17 @@ static inline int task_nice_ioclass(struct task_struct *task) static inline int get_current_ioprio(void) { struct io_context *ioc = current->io_context; + int prio; if (ioc) - return ioc->ioprio; - return IOPRIO_DEFAULT; + prio = ioc->ioprio; + else + prio = IOPRIO_DEFAULT; + + if (IOPRIO_PRIO_CLASS(prio) == IOPRIO_CLASS_NONE) + prio = IOPRIO_PRIO_VALUE(task_nice_ioclass(current), + task_nice_ioprio(current)); + return prio; } /* -- cgit From 893e5d32d5832674bcf6465f27958e883b72b346 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Thu, 23 Jun 2022 09:48:28 +0200 Subject: block: Generalize get_current_ioprio() for any task get_current_ioprio() operates only on current task. We will need the same functionality for other tasks as well. Generalize get_current_ioprio() for that and also move the bulk out of the header file because it is large enough. Reviewed-by: Damien Le Moal Tested-by: Damien Le Moal Signed-off-by: Jan Kara Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220623074840.5960-3-jack@suse.cz Signed-off-by: Jens Axboe --- include/linux/ioprio.h | 26 ++++++++++---------------- 1 file changed, 10 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h index 61ed6bb4998e..9752cf4a9c7c 100644 --- a/include/linux/ioprio.h +++ b/include/linux/ioprio.h @@ -46,24 +46,18 @@ static inline int task_nice_ioclass(struct task_struct *task) return IOPRIO_CLASS_BE; } -/* - * If the calling process has set an I/O priority, use that. Otherwise, return - * the default I/O priority. - */ -static inline int get_current_ioprio(void) +#ifdef CONFIG_BLOCK +int __get_task_ioprio(struct task_struct *p); +#else +static inline int __get_task_ioprio(struct task_struct *p) { - struct io_context *ioc = current->io_context; - int prio; - - if (ioc) - prio = ioc->ioprio; - else - prio = IOPRIO_DEFAULT; + return IOPRIO_DEFAULT; +} +#endif /* CONFIG_BLOCK */ - if (IOPRIO_PRIO_CLASS(prio) == IOPRIO_CLASS_NONE) - prio = IOPRIO_PRIO_VALUE(task_nice_ioclass(current), - task_nice_ioprio(current)); - return prio; +static inline int get_current_ioprio(void) +{ + return __get_task_ioprio(current); } /* -- cgit From fc25545e17bd74befe0b8ab2c65ac84936be5066 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Thu, 23 Jun 2022 09:48:29 +0200 Subject: block: Make ioprio_best() static Nobody outside of block/ioprio.c uses it. Reviewed-by: Damien Le Moal Tested-by: Damien Le Moal Signed-off-by: Jan Kara Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220623074840.5960-4-jack@suse.cz Signed-off-by: Jens Axboe --- include/linux/ioprio.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h index 9752cf4a9c7c..7578d4f6a969 100644 --- a/include/linux/ioprio.h +++ b/include/linux/ioprio.h @@ -60,11 +60,6 @@ static inline int get_current_ioprio(void) return __get_task_ioprio(current); } -/* - * For inheritance, return the highest of the two given priorities - */ -extern int ioprio_best(unsigned short aprio, unsigned short bprio); - extern int set_task_ioprio(struct task_struct *task, int ioprio); #ifdef CONFIG_BLOCK -- cgit From ab24a01b276508dc884761bcb8e2759c36702377 Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Tue, 14 Jun 2022 10:50:54 +0300 Subject: tty: Add closing marker into comment in tty_ldisc.h MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The closing `` is missing. Add it. Fixes: 6bb6fa6908eb ("tty: Implement lookahead to process XON/XOFF timely") Reported-by: Stephen Rothwell Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/9bc6d45d-48c8-519-1646-78ba22505b1f@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_ldisc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h index 33678e1936f6..ede6f2157f32 100644 --- a/include/linux/tty_ldisc.h +++ b/include/linux/tty_ldisc.h @@ -187,7 +187,7 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * function for automatic flow control. * * @lookahead_buf: [DRV] ``void ()(struct tty_struct *tty, - * const unsigned char *cp, const char *fp, int count) + * const unsigned char *cp, const char *fp, int count)`` * * This function is called by the low-level tty driver for characters * not eaten by ->receive_buf() or ->receive_buf2(). It is useful for -- cgit From f9008285bb69e4713918a665250ab2d356b731ba Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Mon, 13 Jun 2022 14:39:05 +0300 Subject: serial: Drop timeout from uart_port MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Since commit 31f6bd7fad3b ("serial: Store character timing information to uart_port"), per frame timing information is available on uart_port. Uart port's timeout can be derived from frame_time by multiplying with fifosize. Most callers of uart_poll_timeout are not made under port's lock. To be on the safe side, make sure frame_time is only accessed once. As fifo_size is effectively a constant, it shouldn't cause any issues. Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220613113905.22962-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index 8032ffa741ed..faaf2372c60d 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -232,7 +232,6 @@ struct uart_port { int hw_stopped; /* sw-assisted CTS flow state */ unsigned int mctrl; /* current modem ctrl settings */ - unsigned int timeout; /* character-based timeout */ unsigned int frame_time; /* frame timing in ns */ unsigned int type; /* port type */ const struct uart_ops *ops; @@ -335,10 +334,23 @@ unsigned int uart_get_baud_rate(struct uart_port *port, struct ktermios *termios unsigned int max); unsigned int uart_get_divisor(struct uart_port *port, unsigned int baud); +/* + * Calculates FIFO drain time. + */ +static inline unsigned long uart_fifo_timeout(struct uart_port *port) +{ + u64 fifo_timeout = (u64)READ_ONCE(port->frame_time) * port->fifosize; + + /* Add .02 seconds of slop */ + fifo_timeout += 20 * NSEC_PER_MSEC; + + return max(nsecs_to_jiffies(fifo_timeout), 1UL); +} + /* Base timer interval for polling */ static inline int uart_poll_timeout(struct uart_port *port) { - int timeout = port->timeout; + int timeout = uart_fifo_timeout(port); return timeout > 6 ? (timeout / 2 - 2) : 1; } -- cgit From eb47b59afb7e46c952d7b03884245364990d4910 Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Fri, 24 Jun 2022 23:54:23 +0300 Subject: serial: Convert SERIAL_XMIT_SIZE to UART_XMIT_SIZE MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Both UART_XMIT_SIZE and SERIAL_XMIT_SIZE are defined. Make them all UART_XMIT_SIZE. Reviewed-by: Jiri Slaby Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220624205424.12686-6-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/serial.h b/include/linux/serial.h index 0b8b7d7c8f33..70a9866e4abb 100644 --- a/include/linux/serial.h +++ b/include/linux/serial.h @@ -9,7 +9,6 @@ #ifndef _LINUX_SERIAL_H #define _LINUX_SERIAL_H -#include #include /* Helper for dealing with UART_LCR_WLEN* defines */ @@ -25,11 +24,6 @@ struct async_icount { __u32 buf_overrun; }; -/* - * The size of the serial xmit buffer is 1 page, or 4096 bytes - */ -#define SERIAL_XMIT_SIZE PAGE_SIZE - #include #endif /* _LINUX_SERIAL_H */ -- cgit From 34619de1b8cb52afa90bbeb3b4fbad34c28f19cf Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Fri, 24 Jun 2022 23:54:24 +0300 Subject: serial: Consolidate BOTH_EMPTY use MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per file BOTH_EMPTY defines are littering our source code here and there. Define once in serial.h and create helper for the check too. Reviewed-by: Jiri Slaby Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220624205424.12686-7-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/serial.h b/include/linux/serial.h index 70a9866e4abb..3d6fe3ef92cf 100644 --- a/include/linux/serial.h +++ b/include/linux/serial.h @@ -10,10 +10,19 @@ #define _LINUX_SERIAL_H #include +#include /* Helper for dealing with UART_LCR_WLEN* defines */ #define UART_LCR_WLEN(x) ((x) - 5) +/* FIFO and shifting register empty */ +#define UART_LSR_BOTH_EMPTY (UART_LSR_TEMT | UART_LSR_THRE) + +static inline bool uart_lsr_tx_empty(u16 lsr) +{ + return (lsr & UART_LSR_BOTH_EMPTY) == UART_LSR_BOTH_EMPTY; +} + /* * Counters of the input lines (CTS, DSR, RI, CD) interrupts */ -- cgit From f8ba5680a56be696b3f4343ed0a591abab807da4 Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Fri, 24 Jun 2022 23:42:05 +0300 Subject: serial: 8250: make saved LSR larger MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit DW flags address received as BIT(8) in LSR. In order to not lose that on read, enlarge lsr_saved_flags to u16. Adjust lsr/status variables and related call chains to use u16. Technically, some of these type conversion would not be needed but it doesn't hurt to be consistent. Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220624204210.11112-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_8250.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h index ff84a3ed10ea..4565f25ba9a2 100644 --- a/include/linux/serial_8250.h +++ b/include/linux/serial_8250.h @@ -119,7 +119,7 @@ struct uart_8250_port { * be immediately processed. */ #define LSR_SAVE_FLAGS UART_LSR_BRK_ERROR_BITS - unsigned char lsr_saved_flags; + u16 lsr_saved_flags; #define MSR_SAVE_FLAGS UART_MSR_ANY_DELTA unsigned char msr_saved_flags; @@ -170,8 +170,8 @@ extern void serial8250_do_set_divisor(struct uart_port *port, unsigned int baud, unsigned int quot_frac); extern int fsl8250_handle_irq(struct uart_port *port); int serial8250_handle_irq(struct uart_port *port, unsigned int iir); -unsigned char serial8250_rx_chars(struct uart_8250_port *up, unsigned char lsr); -void serial8250_read_char(struct uart_8250_port *up, unsigned char lsr); +u16 serial8250_rx_chars(struct uart_8250_port *up, u16 lsr); +void serial8250_read_char(struct uart_8250_port *up, u16 lsr); void serial8250_tx_chars(struct uart_8250_port *up); unsigned int serial8250_modem_status(struct uart_8250_port *up); void serial8250_init_port(struct uart_8250_port *up); -- cgit From 507bd6fbaaefcb8dd89bd00baddf00b439d30c51 Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Fri, 24 Jun 2022 23:42:06 +0300 Subject: serial: 8250: create lsr_save_mask MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Allow drivers to alter LSR save mask. Reviewed-by: Andy Shevchenko Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220624204210.11112-3-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_8250.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h index 4565f25ba9a2..8c7b793aa4d7 100644 --- a/include/linux/serial_8250.h +++ b/include/linux/serial_8250.h @@ -120,6 +120,7 @@ struct uart_8250_port { */ #define LSR_SAVE_FLAGS UART_LSR_BRK_ERROR_BITS u16 lsr_saved_flags; + u16 lsr_save_mask; #define MSR_SAVE_FLAGS UART_MSR_ANY_DELTA unsigned char msr_saved_flags; -- cgit From ae50bb2752836277ae15aa4e9d99074d6d947946 Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Fri, 24 Jun 2022 23:42:08 +0300 Subject: serial: take termios_rwsem for ->rs485_config() & pass termios as param MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit To be able to alter ADDRB within ->rs485_config(), take termios_rwsem before calling ->rs485_config() and pass termios. Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220624204210.11112-5-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index faaf2372c60d..b7b86ee3cb12 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -133,6 +133,7 @@ struct uart_port { unsigned int old); void (*handle_break)(struct uart_port *); int (*rs485_config)(struct uart_port *, + struct ktermios *termios, struct serial_rs485 *rs485); int (*iso7816_config)(struct uart_port *, struct serial_iso7816 *iso7816); -- cgit From 4d0548a7b806a78ba253f1389b9ecdcaca47d583 Mon Sep 17 00:00:00 2001 From: Seth Forshee Date: Mon, 27 Jun 2022 08:01:58 -0500 Subject: mnt_idmapping: return false when comparing two invalid ids INVALID_VFS{U,G}ID represent ids which have no mapping in the target mnt_usersns. This can happen for a couple of different reasons -- the source id might be valid but has no mapping in mnt_userns, or the source id might have been invalid (either due to a failed mapping or because it was set to invalid to indicate it is uninitialized). This means that two arbitrary vfs{u,g}ids which are both invalid could represent two different underlying ids, or they could represent a failed mapping and an uninitialized value. In these situation the vfs{u,g}id equality functions evaluate these ids as equal, and care must be taken when comparing ids to avoid problems. It would be less error prone to always evaluate two invalid ids as not equal to each other, and to check explicitly for vfs{u,g}id validity when that is needed. Change all vfs{u,g}id equality functions to return false when both ids are invalid. Functions for checking whether an id is valid exist and are already being used by code which needs to check this. Link: https://lore.kernel.org/linux-fsdevel/YrIMZirGoE0VIO45@do-x1extreme Signed-off-by: Seth Forshee Reviewed-by: Christian Brauner (Microsoft) Signed-off-by: Christian Brauner (Microsoft) --- include/linux/mnt_idmapping.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mnt_idmapping.h b/include/linux/mnt_idmapping.h index 6a752b61088b..21bd22a7b326 100644 --- a/include/linux/mnt_idmapping.h +++ b/include/linux/mnt_idmapping.h @@ -60,12 +60,12 @@ static inline bool vfsgid_valid(vfsgid_t gid) static inline bool vfsuid_eq(vfsuid_t left, vfsuid_t right) { - return __vfsuid_val(left) == __vfsuid_val(right); + return vfsuid_valid(left) && __vfsuid_val(left) == __vfsuid_val(right); } static inline bool vfsgid_eq(vfsgid_t left, vfsgid_t right) { - return __vfsgid_val(left) == __vfsgid_val(right); + return vfsgid_valid(left) && __vfsgid_val(left) == __vfsgid_val(right); } /** @@ -76,10 +76,11 @@ static inline bool vfsgid_eq(vfsgid_t left, vfsgid_t right) * Check whether @kuid and @vfsuid have the same values. * * Return: true if @kuid and @vfsuid have the same value, false if not. + * Comparison between two invalid uids returns false. */ static inline bool vfsuid_eq_kuid(vfsuid_t vfsuid, kuid_t kuid) { - return __vfsuid_val(vfsuid) == __kuid_val(kuid); + return vfsuid_valid(vfsuid) && __vfsuid_val(vfsuid) == __kuid_val(kuid); } /** @@ -90,10 +91,11 @@ static inline bool vfsuid_eq_kuid(vfsuid_t vfsuid, kuid_t kuid) * Check whether @kgid and @vfsgid have the same values. * * Return: true if @kgid and @vfsgid have the same value, false if not. + * Comparison between two invalid gids returns false. */ static inline bool vfsgid_eq_kgid(kgid_t kgid, vfsgid_t vfsgid) { - return __vfsgid_val(vfsgid) == __kgid_val(kgid); + return vfsgid_valid(vfsgid) && __vfsgid_val(vfsgid) == __kgid_val(kgid); } /* -- cgit From 38a523a2946d3a0961d141d477a1ee2b1f3bdbb1 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Mon, 27 Jun 2022 16:36:57 +0200 Subject: Revert "devcoredump: remove the useless gfp_t parameter in dev_coredumpv and dev_coredumpm" This reverts commit 77515ebaf01920e2db49e04672ef669a7c2907f2 as it causes build problems in linux-next. It needs to be reintroduced in a way that can allow the api to evolve and not require a "flag day" to catch all users. Link: https://lore.kernel.org/r/20220623160723.7a44b573@canb.auug.org.au Cc: Duoming Zhou Cc: Brian Norris Cc: Johannes Berg Reported-by: Stephen Rothwell Signed-off-by: Greg Kroah-Hartman --- include/linux/devcoredump.h | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/devcoredump.h b/include/linux/devcoredump.h index c7d840d824c3..c008169ed2c6 100644 --- a/include/linux/devcoredump.h +++ b/include/linux/devcoredump.h @@ -52,26 +52,27 @@ static inline void _devcd_free_sgtable(struct scatterlist *table) #ifdef CONFIG_DEV_COREDUMP -void dev_coredumpv(struct device *dev, void *data, size_t datalen); +void dev_coredumpv(struct device *dev, void *data, size_t datalen, + gfp_t gfp); void dev_coredumpm(struct device *dev, struct module *owner, - void *data, size_t datalen, + void *data, size_t datalen, gfp_t gfp, ssize_t (*read)(char *buffer, loff_t offset, size_t count, void *data, size_t datalen), void (*free)(void *data)); void dev_coredumpsg(struct device *dev, struct scatterlist *table, - size_t datalen); + size_t datalen, gfp_t gfp); #else static inline void dev_coredumpv(struct device *dev, void *data, - size_t datalen) + size_t datalen, gfp_t gfp) { vfree(data); } static inline void dev_coredumpm(struct device *dev, struct module *owner, - void *data, size_t datalen, + void *data, size_t datalen, gfp_t gfp, ssize_t (*read)(char *buffer, loff_t offset, size_t count, void *data, size_t datalen), void (*free)(void *data)) @@ -80,7 +81,7 @@ dev_coredumpm(struct device *dev, struct module *owner, } static inline void dev_coredumpsg(struct device *dev, struct scatterlist *table, - size_t datalen) + size_t datalen, gfp_t gfp) { _devcd_free_sgtable(table); } -- cgit From 086c00c71fc8d47db6983f419a45f9ee167de03f Mon Sep 17 00:00:00 2001 From: Imran Khan Date: Wed, 15 Jun 2022 12:10:56 +1000 Subject: kernfs: make ->attr.open RCU protected. After removal of kernfs_open_node->refcnt in the previous patch, kernfs_open_node_lock can be removed as well by making ->attr.open RCU protected. kernfs_put_open_node can delegate freeing to ->attr.open to RCU and other readers of ->attr.open can do so under rcu_read_(un)lock. Suggested by: Al Viro Acked-by: Tejun Heo Signed-off-by: Imran Khan Link: https://lore.kernel.org/r/20220615021059.862643-2-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman --- include/linux/kernfs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index e2ae15a6225e..13f54f078a52 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -114,7 +114,7 @@ struct kernfs_elem_symlink { struct kernfs_elem_attr { const struct kernfs_ops *ops; - struct kernfs_open_node *open; + struct kernfs_open_node __rcu *open; loff_t size; struct kernfs_node *notify_next; /* for kernfs_notify() */ }; -- cgit From b8f35fa1188b84035c59d4842826c4e93a1b1c9f Mon Sep 17 00:00:00 2001 From: Imran Khan Date: Wed, 15 Jun 2022 12:10:57 +1000 Subject: kernfs: Change kernfs_notify_list to llist. At present kernfs_notify_list is implemented as a singly linked list of kernfs_node(s), where last element points to itself and value of ->attr.next tells if node is present on the list or not. Both addition and deletion to list happen under kernfs_notify_lock. Change kernfs_notify_list to llist so that addition to list can heppen locklessly. Suggested by: Al Viro Acked-by: Tejun Heo Signed-off-by: Imran Khan Link: https://lore.kernel.org/r/20220615021059.862643-3-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman --- include/linux/kernfs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 13f54f078a52..2dd9c8df0f4f 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -116,7 +116,7 @@ struct kernfs_elem_attr { const struct kernfs_ops *ops; struct kernfs_open_node __rcu *open; loff_t size; - struct kernfs_node *notify_next; /* for kernfs_notify() */ + struct llist_node notify_next; /* for kernfs_notify() */ }; /* -- cgit From 1d25b84e444ad66313c473407979ea9cd33deb3f Mon Sep 17 00:00:00 2001 From: Imran Khan Date: Wed, 15 Jun 2022 12:10:59 +1000 Subject: kernfs: Replace global kernfs_open_file_mutex with hashed mutexes. In current kernfs design a single mutex, kernfs_open_file_mutex, protects the list of kernfs_open_file instances corresponding to a sysfs attribute. So even if different tasks are opening or closing different sysfs files they can contend on osq_lock of this mutex. The contention is more apparent in large scale systems with few hundred CPUs where most of the CPUs have running tasks that are opening, accessing or closing sysfs files at any point of time. Using hashed mutexes in place of a single global mutex, can significantly reduce contention around global mutex and hence can provide better scalability. Moreover as these hashed mutexes are not part of kernfs_node objects we will not see any singnificant change in memory utilization of kernfs based file systems like sysfs, cgroupfs etc. Modify interface introduced in previous patch to make use of hashed mutexes. Use kernfs_node address as hashing key. Acked-by: Tejun Heo Signed-off-by: Imran Khan Link: https://lore.kernel.org/r/20220615021059.862643-5-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman --- include/linux/kernfs.h | 57 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 57 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 2dd9c8df0f4f..13e703f615f7 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -18,6 +18,7 @@ #include #include #include +#include struct file; struct dentry; @@ -34,6 +35,62 @@ struct kernfs_fs_context; struct kernfs_open_node; struct kernfs_iattrs; +/* + * NR_KERNFS_LOCK_BITS determines size (NR_KERNFS_LOCKS) of hash + * table of locks. + * Having a small hash table would impact scalability, since + * more and more kernfs_node objects will end up using same lock + * and having a very large hash table would waste memory. + * + * At the moment size of hash table of locks is being set based on + * the number of CPUs as follows: + * + * NR_CPU NR_KERNFS_LOCK_BITS NR_KERNFS_LOCKS + * 1 1 2 + * 2-3 2 4 + * 4-7 4 16 + * 8-15 6 64 + * 16-31 8 256 + * 32 and more 10 1024 + * + * The above relation between NR_CPU and number of locks is based + * on some internal experimentation which involved booting qemu + * with different values of smp, performing some sysfs operations + * on all CPUs and observing how increase in number of locks impacts + * completion time of these sysfs operations on each CPU. + */ +#ifdef CONFIG_SMP +#define NR_KERNFS_LOCK_BITS (2 * (ilog2(NR_CPUS < 32 ? NR_CPUS : 32))) +#else +#define NR_KERNFS_LOCK_BITS 1 +#endif + +#define NR_KERNFS_LOCKS (1 << NR_KERNFS_LOCK_BITS) + +/* + * There's one kernfs_open_file for each open file and one kernfs_open_node + * for each kernfs_node with one or more open files. + * + * filp->private_data points to seq_file whose ->private points to + * kernfs_open_file. + * + * kernfs_open_files are chained at kernfs_open_node->files, which is + * protected by kernfs_global_locks.open_file_mutex[i]. + * + * To reduce possible contention in sysfs access, arising due to single + * locks, use an array of locks (e.g. open_file_mutex) and use kernfs_node + * object address as hash keys to get the index of these locks. + * + * Hashed mutexes are safe to use here because operations using these don't + * rely on global exclusion. + * + * In future we intend to replace other global locks with hashed ones as well. + * kernfs_global_locks acts as a holder for all such hash tables. + */ +struct kernfs_global_locks { + struct mutex open_file_mutex[NR_KERNFS_LOCKS]; +}; + enum kernfs_node_type { KERNFS_DIR = 0x0001, KERNFS_FILE = 0x0002, -- cgit From 8f486cab263ca1b761c1aab9b36975afd355b799 Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Thu, 23 Jun 2022 01:03:42 -0700 Subject: driver core: fw_devlink: Allow firmware to mark devices as best effort When firmware sets the FWNODE_FLAG_BEST_EFFORT flag for a fwnode, fw_devlink will do a best effort ordering for that device where it'll only enforce the probe/suspend/resume ordering of that device with suppliers that have drivers. The driver of that device can then decide if it wants to defer probe or probe without the suppliers. This will be useful for avoid probe delays of the console device that were caused by commit 71066545b48e ("driver core: Set fw_devlink.strict=1 by default"). Fixes: 71066545b48e ("driver core: Set fw_devlink.strict=1 by default") Reported-by: Sascha Hauer Reported-by: Peng Fan Tested-by: Peng Fan Signed-off-by: Saravana Kannan Link: https://lore.kernel.org/r/20220623080344.783549-2-saravanak@google.com Signed-off-by: Greg Kroah-Hartman --- include/linux/fwnode.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h index 9a81c4410b9f..89b9bdfca925 100644 --- a/include/linux/fwnode.h +++ b/include/linux/fwnode.h @@ -27,11 +27,15 @@ struct device; * driver needs its child devices to be bound with * their respective drivers as soon as they are * added. + * BEST_EFFORT: The fwnode/device needs to probe early and might be missing some + * suppliers. Only enforce ordering with suppliers that have + * drivers. */ #define FWNODE_FLAG_LINKS_ADDED BIT(0) #define FWNODE_FLAG_NOT_DEVICE BIT(1) #define FWNODE_FLAG_INITIALIZED BIT(2) #define FWNODE_FLAG_NEEDS_CHILD_BOUND_ON_ADD BIT(3) +#define FWNODE_FLAG_BEST_EFFORT BIT(4) struct fwnode_handle { struct fwnode_handle *secondary; -- cgit From d1877e639bc6bf1c3131eda3f9ede73f8da96c22 Mon Sep 17 00:00:00 2001 From: Alex Williamson Date: Wed, 8 Jun 2022 12:55:13 -0600 Subject: vfio: de-extern-ify function prototypes The use of 'extern' in function prototypes has been disrecommended in the kernel coding style for several years now, remove them from all vfio related files so contributors no longer need to decide between style and consistency. Reviewed-by: Kevin Tian Reviewed-by: Jason Gunthorpe Reviewed-by: Eric Farman Reviewed-by: Eric Auger Reviewed-by: Cornelia Huck Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/165471414407.203056.474032786990662279.stgit@omen Signed-off-by: Alex Williamson --- include/linux/vfio.h | 70 +++++++++++++++++++++---------------------- include/linux/vfio_pci_core.h | 65 ++++++++++++++++++++-------------------- 2 files changed, 66 insertions(+), 69 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index aa888cc51757..49580fa2073a 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -140,19 +140,19 @@ int vfio_mig_get_next_state(struct vfio_device *device, /* * External user API */ -extern struct iommu_group *vfio_file_iommu_group(struct file *file); -extern bool vfio_file_enforced_coherent(struct file *file); -extern void vfio_file_set_kvm(struct file *file, struct kvm *kvm); -extern bool vfio_file_has_dev(struct file *file, struct vfio_device *device); +struct iommu_group *vfio_file_iommu_group(struct file *file); +bool vfio_file_enforced_coherent(struct file *file); +void vfio_file_set_kvm(struct file *file, struct kvm *kvm); +bool vfio_file_has_dev(struct file *file, struct vfio_device *device); #define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long)) -extern int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn, - int npage, int prot, unsigned long *phys_pfn); -extern int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn, - int npage); -extern int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, - void *data, size_t len, bool write); +int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn, + int npage, int prot, unsigned long *phys_pfn); +int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn, + int npage); +int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, + void *data, size_t len, bool write); /* each type has independent events */ enum vfio_notify_type { @@ -162,13 +162,13 @@ enum vfio_notify_type { /* events for VFIO_IOMMU_NOTIFY */ #define VFIO_IOMMU_NOTIFY_DMA_UNMAP BIT(0) -extern int vfio_register_notifier(struct vfio_device *device, - enum vfio_notify_type type, - unsigned long *required_events, - struct notifier_block *nb); -extern int vfio_unregister_notifier(struct vfio_device *device, - enum vfio_notify_type type, - struct notifier_block *nb); +int vfio_register_notifier(struct vfio_device *device, + enum vfio_notify_type type, + unsigned long *required_events, + struct notifier_block *nb); +int vfio_unregister_notifier(struct vfio_device *device, + enum vfio_notify_type type, + struct notifier_block *nb); /* @@ -178,25 +178,24 @@ struct vfio_info_cap { struct vfio_info_cap_header *buf; size_t size; }; -extern struct vfio_info_cap_header *vfio_info_cap_add( - struct vfio_info_cap *caps, size_t size, u16 id, u16 version); -extern void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset); +struct vfio_info_cap_header *vfio_info_cap_add(struct vfio_info_cap *caps, + size_t size, u16 id, + u16 version); +void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset); -extern int vfio_info_add_capability(struct vfio_info_cap *caps, - struct vfio_info_cap_header *cap, - size_t size); +int vfio_info_add_capability(struct vfio_info_cap *caps, + struct vfio_info_cap_header *cap, size_t size); -extern int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, - int num_irqs, int max_irq_type, - size_t *data_size); +int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, + int num_irqs, int max_irq_type, + size_t *data_size); struct pci_dev; #if IS_ENABLED(CONFIG_VFIO_SPAPR_EEH) -extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev); -extern void vfio_spapr_pci_eeh_release(struct pci_dev *pdev); -extern long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group, - unsigned int cmd, - unsigned long arg); +void vfio_spapr_pci_eeh_open(struct pci_dev *pdev); +void vfio_spapr_pci_eeh_release(struct pci_dev *pdev); +long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group, unsigned int cmd, + unsigned long arg); #else static inline void vfio_spapr_pci_eeh_open(struct pci_dev *pdev) { @@ -230,10 +229,9 @@ struct virqfd { struct virqfd **pvirqfd; }; -extern int vfio_virqfd_enable(void *opaque, - int (*handler)(void *, void *), - void (*thread)(void *, void *), - void *data, struct virqfd **pvirqfd, int fd); -extern void vfio_virqfd_disable(struct virqfd **pvirqfd); +int vfio_virqfd_enable(void *opaque, int (*handler)(void *, void *), + void (*thread)(void *, void *), void *data, + struct virqfd **pvirqfd, int fd); +void vfio_virqfd_disable(struct virqfd **pvirqfd); #endif /* VFIO_H */ diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index 23c176d4b073..22de2bce6394 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -147,23 +147,23 @@ struct vfio_pci_core_device { #define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev))) #define irq_is(vdev, type) (vdev->irq_type == type) -extern void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev); -extern void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev); +void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev); +void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev); -extern int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, - uint32_t flags, unsigned index, - unsigned start, unsigned count, void *data); +int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, + uint32_t flags, unsigned index, + unsigned start, unsigned count, void *data); -extern ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, - char __user *buf, size_t count, - loff_t *ppos, bool iswrite); +ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, + char __user *buf, size_t count, + loff_t *ppos, bool iswrite); -extern ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, - size_t count, loff_t *ppos, bool iswrite); +ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, + size_t count, loff_t *ppos, bool iswrite); #ifdef CONFIG_VFIO_PCI_VGA -extern ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf, - size_t count, loff_t *ppos, bool iswrite); +ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf, + size_t count, loff_t *ppos, bool iswrite); #else static inline ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf, size_t count, @@ -173,32 +173,31 @@ static inline ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, } #endif -extern long vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset, - uint64_t data, int count, int fd); +long vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset, + uint64_t data, int count, int fd); -extern int vfio_pci_init_perm_bits(void); -extern void vfio_pci_uninit_perm_bits(void); +int vfio_pci_init_perm_bits(void); +void vfio_pci_uninit_perm_bits(void); -extern int vfio_config_init(struct vfio_pci_core_device *vdev); -extern void vfio_config_free(struct vfio_pci_core_device *vdev); +int vfio_config_init(struct vfio_pci_core_device *vdev); +void vfio_config_free(struct vfio_pci_core_device *vdev); -extern int vfio_pci_register_dev_region(struct vfio_pci_core_device *vdev, - unsigned int type, unsigned int subtype, - const struct vfio_pci_regops *ops, - size_t size, u32 flags, void *data); +int vfio_pci_register_dev_region(struct vfio_pci_core_device *vdev, + unsigned int type, unsigned int subtype, + const struct vfio_pci_regops *ops, + size_t size, u32 flags, void *data); -extern int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, - pci_power_t state); +int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, + pci_power_t state); -extern bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev); -extern void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device - *vdev); -extern u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_core_device *vdev); -extern void vfio_pci_memory_unlock_and_restore(struct vfio_pci_core_device *vdev, - u16 cmd); +bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev); +void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device *vdev); +u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_core_device *vdev); +void vfio_pci_memory_unlock_and_restore(struct vfio_pci_core_device *vdev, + u16 cmd); #ifdef CONFIG_VFIO_PCI_IGD -extern int vfio_pci_igd_init(struct vfio_pci_core_device *vdev); +int vfio_pci_igd_init(struct vfio_pci_core_device *vdev); #else static inline int vfio_pci_igd_init(struct vfio_pci_core_device *vdev) { @@ -207,8 +206,8 @@ static inline int vfio_pci_igd_init(struct vfio_pci_core_device *vdev) #endif #ifdef CONFIG_S390 -extern int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev, - struct vfio_info_cap *caps); +int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev, + struct vfio_info_cap *caps); #else static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev, struct vfio_info_cap *caps) -- cgit From ee65728e103bb7dd99d8604bf6c7aa89c7d7e446 Mon Sep 17 00:00:00 2001 From: Mike Rapoport Date: Mon, 27 Jun 2022 09:00:26 +0300 Subject: docs: rename Documentation/vm to Documentation/mm so it will be consistent with code mm directory and with Documentation/admin-guide/mm and won't be confused with virtual machines. Signed-off-by: Mike Rapoport Suggested-by: Matthew Wilcox Tested-by: Ira Weiny Acked-by: Jonathan Corbet Acked-by: Wu XiangCheng --- include/linux/hmm.h | 4 ++-- include/linux/memremap.h | 2 +- include/linux/mmu_notifier.h | 2 +- include/linux/sched/mm.h | 4 ++-- include/linux/swap.h | 2 +- 5 files changed, 7 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hmm.h b/include/linux/hmm.h index d5a6f101f843..126a36571667 100644 --- a/include/linux/hmm.h +++ b/include/linux/hmm.h @@ -4,7 +4,7 @@ * * Authors: Jérôme Glisse * - * See Documentation/vm/hmm.rst for reasons and overview of what HMM is. + * See Documentation/mm/hmm.rst for reasons and overview of what HMM is. */ #ifndef LINUX_HMM_H #define LINUX_HMM_H @@ -100,7 +100,7 @@ struct hmm_range { }; /* - * Please see Documentation/vm/hmm.rst for how to use the range API. + * Please see Documentation/mm/hmm.rst for how to use the range API. */ int hmm_range_fault(struct hmm_range *range); diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 8af304f6b504..9f5ee49482de 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -39,7 +39,7 @@ struct vmem_altmap { * must be treated as an opaque object, rather than a "normal" struct page. * * A more complete discussion of unaddressable memory may be found in - * include/linux/hmm.h and Documentation/vm/hmm.rst. + * include/linux/hmm.h and Documentation/mm/hmm.rst. * * MEMORY_DEVICE_FS_DAX: * Host memory that has similar access semantics as System RAM i.e. DMA diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index 45fc2c81e370..d6c06e140277 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -198,7 +198,7 @@ struct mmu_notifier_ops { * invalidate_range_start()/end() notifiers, as * invalidate_range() already catches the points in time when an * external TLB range needs to be flushed. For more in depth - * discussion on this see Documentation/vm/mmu_notifier.rst + * discussion on this see Documentation/mm/mmu_notifier.rst * * Note that this function might be called with just a sub-range * of what was passed to invalidate_range_start()/end(), if diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index 8cd975a8bfeb..2a243616f222 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -29,7 +29,7 @@ extern struct mm_struct *mm_alloc(void); * * Use mmdrop() to release the reference acquired by mmgrab(). * - * See also for an in-depth explanation + * See also for an in-depth explanation * of &mm_struct.mm_count vs &mm_struct.mm_users. */ static inline void mmgrab(struct mm_struct *mm) @@ -92,7 +92,7 @@ static inline void mmdrop_sched(struct mm_struct *mm) * * Use mmput() to release the reference acquired by mmget(). * - * See also for an in-depth explanation + * See also for an in-depth explanation * of &mm_struct.mm_count vs &mm_struct.mm_users. */ static inline void mmget(struct mm_struct *mm) diff --git a/include/linux/swap.h b/include/linux/swap.h index 0c0fed1b348f..95a5b7aa1ae9 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -74,7 +74,7 @@ static inline int current_is_kswapd(void) /* * Unaddressable device memory support. See include/linux/hmm.h and - * Documentation/vm/hmm.rst. Short description is we need struct pages for + * Documentation/mm/hmm.rst. Short description is we need struct pages for * device memory that is unaddressable (inaccessible) by CPU, so that we can * migrate part of a process memory to device memory. * -- cgit From 70fb5ccf2ebb09a0c8ebba775041567812d45f86 Mon Sep 17 00:00:00 2001 From: Chen Yu Date: Mon, 13 Jun 2022 00:34:28 +0800 Subject: sched/fair: Introduce SIS_UTIL to search idle CPU based on sum of util_avg [Problem Statement] select_idle_cpu() might spend too much time searching for an idle CPU, when the system is overloaded. The following histogram is the time spent in select_idle_cpu(), when running 224 instances of netperf on a system with 112 CPUs per LLC domain: @usecs: [0] 533 | | [1] 5495 | | [2, 4) 12008 | | [4, 8) 239252 | | [8, 16) 4041924 |@@@@@@@@@@@@@@ | [16, 32) 12357398 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [32, 64) 14820255 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [64, 128) 13047682 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [128, 256) 8235013 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [256, 512) 4507667 |@@@@@@@@@@@@@@@ | [512, 1K) 2600472 |@@@@@@@@@ | [1K, 2K) 927912 |@@@ | [2K, 4K) 218720 | | [4K, 8K) 98161 | | [8K, 16K) 37722 | | [16K, 32K) 6715 | | [32K, 64K) 477 | | [64K, 128K) 7 | | netperf latency usecs: ======= case load Lat_99th std% TCP_RR thread-224 257.39 ( 0.21) The time spent in select_idle_cpu() is visible to netperf and might have a negative impact. [Symptom analysis] The patch [1] from Mel Gorman has been applied to track the efficiency of select_idle_sibling. Copy the indicators here: SIS Search Efficiency(se_eff%): A ratio expressed as a percentage of runqueues scanned versus idle CPUs found. A 100% efficiency indicates that the target, prev or recent CPU of a task was idle at wakeup. The lower the efficiency, the more runqueues were scanned before an idle CPU was found. SIS Domain Search Efficiency(dom_eff%): Similar, except only for the slower SIS patch. SIS Fast Success Rate(fast_rate%): Percentage of SIS that used target, prev or recent CPUs. SIS Success rate(success_rate%): Percentage of scans that found an idle CPU. The test is based on Aubrey's schedtests tool, including netperf, hackbench, schbench and tbench. Test on vanilla kernel: schedstat_parse.py -f netperf_vanilla.log case load se_eff% dom_eff% fast_rate% success_rate% TCP_RR 28 threads 99.978 18.535 99.995 100.000 TCP_RR 56 threads 99.397 5.671 99.964 100.000 TCP_RR 84 threads 21.721 6.818 73.632 100.000 TCP_RR 112 threads 12.500 5.533 59.000 100.000 TCP_RR 140 threads 8.524 4.535 49.020 100.000 TCP_RR 168 threads 6.438 3.945 40.309 99.999 TCP_RR 196 threads 5.397 3.718 32.320 99.982 TCP_RR 224 threads 4.874 3.661 25.775 99.767 UDP_RR 28 threads 99.988 17.704 99.997 100.000 UDP_RR 56 threads 99.528 5.977 99.970 100.000 UDP_RR 84 threads 24.219 6.992 76.479 100.000 UDP_RR 112 threads 13.907 5.706 62.538 100.000 UDP_RR 140 threads 9.408 4.699 52.519 100.000 UDP_RR 168 threads 7.095 4.077 44.352 100.000 UDP_RR 196 threads 5.757 3.775 35.764 99.991 UDP_RR 224 threads 5.124 3.704 28.748 99.860 schedstat_parse.py -f schbench_vanilla.log (each group has 28 tasks) case load se_eff% dom_eff% fast_rate% success_rate% normal 1 mthread 99.152 6.400 99.941 100.000 normal 2 mthreads 97.844 4.003 99.908 100.000 normal 3 mthreads 96.395 2.118 99.917 99.998 normal 4 mthreads 55.288 1.451 98.615 99.804 normal 5 mthreads 7.004 1.870 45.597 61.036 normal 6 mthreads 3.354 1.346 20.777 34.230 normal 7 mthreads 2.183 1.028 11.257 21.055 normal 8 mthreads 1.653 0.825 7.849 15.549 schedstat_parse.py -f hackbench_vanilla.log (each group has 28 tasks) case load se_eff% dom_eff% fast_rate% success_rate% process-pipe 1 group 99.991 7.692 99.999 100.000 process-pipe 2 groups 99.934 4.615 99.997 100.000 process-pipe 3 groups 99.597 3.198 99.987 100.000 process-pipe 4 groups 98.378 2.464 99.958 100.000 process-pipe 5 groups 27.474 3.653 89.811 99.800 process-pipe 6 groups 20.201 4.098 82.763 99.570 process-pipe 7 groups 16.423 4.156 77.398 99.316 process-pipe 8 groups 13.165 3.920 72.232 98.828 process-sockets 1 group 99.977 5.882 99.999 100.000 process-sockets 2 groups 99.927 5.505 99.996 100.000 process-sockets 3 groups 99.397 3.250 99.980 100.000 process-sockets 4 groups 79.680 4.258 98.864 99.998 process-sockets 5 groups 7.673 2.503 63.659 92.115 process-sockets 6 groups 4.642 1.584 58.946 88.048 process-sockets 7 groups 3.493 1.379 49.816 81.164 process-sockets 8 groups 3.015 1.407 40.845 75.500 threads-pipe 1 group 99.997 0.000 100.000 100.000 threads-pipe 2 groups 99.894 2.932 99.997 100.000 threads-pipe 3 groups 99.611 4.117 99.983 100.000 threads-pipe 4 groups 97.703 2.624 99.937 100.000 threads-pipe 5 groups 22.919 3.623 87.150 99.764 threads-pipe 6 groups 18.016 4.038 80.491 99.557 threads-pipe 7 groups 14.663 3.991 75.239 99.247 threads-pipe 8 groups 12.242 3.808 70.651 98.644 threads-sockets 1 group 99.990 6.667 99.999 100.000 threads-sockets 2 groups 99.940 5.114 99.997 100.000 threads-sockets 3 groups 99.469 4.115 99.977 100.000 threads-sockets 4 groups 87.528 4.038 99.400 100.000 threads-sockets 5 groups 6.942 2.398 59.244 88.337 threads-sockets 6 groups 4.359 1.954 49.448 87.860 threads-sockets 7 groups 2.845 1.345 41.198 77.102 threads-sockets 8 groups 2.871 1.404 38.512 74.312 schedstat_parse.py -f tbench_vanilla.log case load se_eff% dom_eff% fast_rate% success_rate% loopback 28 threads 99.976 18.369 99.995 100.000 loopback 56 threads 99.222 7.799 99.934 100.000 loopback 84 threads 19.723 6.819 70.215 100.000 loopback 112 threads 11.283 5.371 55.371 99.999 loopback 140 threads 0.000 0.000 0.000 0.000 loopback 168 threads 0.000 0.000 0.000 0.000 loopback 196 threads 0.000 0.000 0.000 0.000 loopback 224 threads 0.000 0.000 0.000 0.000 According to the test above, if the system becomes busy, the SIS Search Efficiency(se_eff%) drops significantly. Although some benchmarks would finally find an idle CPU(success_rate% = 100%), it is doubtful whether it is worth it to search the whole LLC domain. [Proposal] It would be ideal to have a crystal ball to answer this question: How many CPUs must a wakeup path walk down, before it can find an idle CPU? Many potential metrics could be used to predict the number. One candidate is the sum of util_avg in this LLC domain. The benefit of choosing util_avg is that it is a metric of accumulated historic activity, which seems to be smoother than instantaneous metrics (such as rq->nr_running). Besides, choosing the sum of util_avg would help predict the load of the LLC domain more precisely, because SIS_PROP uses one CPU's idle time to estimate the total LLC domain idle time. In summary, the lower the util_avg is, the more select_idle_cpu() should scan for idle CPU, and vice versa. When the sum of util_avg in this LLC domain hits 85% or above, the scan stops. The reason to choose 85% as the threshold is that this is the imbalance_pct(117) when a LLC sched group is overloaded. Introduce the quadratic function: y = SCHED_CAPACITY_SCALE - p * x^2 and y'= y / SCHED_CAPACITY_SCALE x is the ratio of sum_util compared to the CPU capacity: x = sum_util / (llc_weight * SCHED_CAPACITY_SCALE) y' is the ratio of CPUs to be scanned in the LLC domain, and the number of CPUs to scan is calculated by: nr_scan = llc_weight * y' Choosing quadratic function is because: [1] Compared to the linear function, it scans more aggressively when the sum_util is low. [2] Compared to the exponential function, it is easier to calculate. [3] It seems that there is no accurate mapping between the sum of util_avg and the number of CPUs to be scanned. Use heuristic scan for now. For a platform with 112 CPUs per LLC, the number of CPUs to scan is: sum_util% 0 5 15 25 35 45 55 65 75 85 86 ... scan_nr 112 111 108 102 93 81 65 47 25 1 0 ... For a platform with 16 CPUs per LLC, the number of CPUs to scan is: sum_util% 0 5 15 25 35 45 55 65 75 85 86 ... scan_nr 16 15 15 14 13 11 9 6 3 0 0 ... Furthermore, to minimize the overhead of calculating the metrics in select_idle_cpu(), borrow the statistics from periodic load balance. As mentioned by Abel, on a platform with 112 CPUs per LLC, the sum_util calculated by periodic load balance after 112 ms would decay to about 0.5 * 0.5 * 0.5 * 0.7 = 8.75%, thus bringing a delay in reflecting the latest utilization. But it is a trade-off. Checking the util_avg in newidle load balance would be more frequent, but it brings overhead - multiple CPUs write/read the per-LLC shared variable and introduces cache contention. Tim also mentioned that, it is allowed to be non-optimal in terms of scheduling for the short-term variations, but if there is a long-term trend in the load behavior, the scheduler can adjust for that. When SIS_UTIL is enabled, the select_idle_cpu() uses the nr_scan calculated by SIS_UTIL instead of the one from SIS_PROP. As Peter and Mel suggested, SIS_UTIL should be enabled by default. This patch is based on the util_avg, which is very sensitive to the CPU frequency invariance. There is an issue that, when the max frequency has been clamp, the util_avg would decay insanely fast when the CPU is idle. Commit addca285120b ("cpufreq: intel_pstate: Handle no_turbo in frequency invariance") could be used to mitigate this symptom, by adjusting the arch_max_freq_ratio when turbo is disabled. But this issue is still not thoroughly fixed, because the current code is unaware of the user-specified max CPU frequency. [Test result] netperf and tbench were launched with 25% 50% 75% 100% 125% 150% 175% 200% of CPU number respectively. Hackbench and schbench were launched by 1, 2 ,4, 8 groups. Each test lasts for 100 seconds and repeats 3 times. The following is the benchmark result comparison between baseline:vanilla v5.19-rc1 and compare:patched kernel. Positive compare% indicates better performance. Each netperf test is a: netperf -4 -H 127.0.1 -t TCP/UDP_RR -c -C -l 100 netperf.throughput ======= case load baseline(std%) compare%( std%) TCP_RR 28 threads 1.00 ( 0.34) -0.16 ( 0.40) TCP_RR 56 threads 1.00 ( 0.19) -0.02 ( 0.20) TCP_RR 84 threads 1.00 ( 0.39) -0.47 ( 0.40) TCP_RR 112 threads 1.00 ( 0.21) -0.66 ( 0.22) TCP_RR 140 threads 1.00 ( 0.19) -0.69 ( 0.19) TCP_RR 168 threads 1.00 ( 0.18) -0.48 ( 0.18) TCP_RR 196 threads 1.00 ( 0.16) +194.70 ( 16.43) TCP_RR 224 threads 1.00 ( 0.16) +197.30 ( 7.85) UDP_RR 28 threads 1.00 ( 0.37) +0.35 ( 0.33) UDP_RR 56 threads 1.00 ( 11.18) -0.32 ( 0.21) UDP_RR 84 threads 1.00 ( 1.46) -0.98 ( 0.32) UDP_RR 112 threads 1.00 ( 28.85) -2.48 ( 19.61) UDP_RR 140 threads 1.00 ( 0.70) -0.71 ( 14.04) UDP_RR 168 threads 1.00 ( 14.33) -0.26 ( 11.16) UDP_RR 196 threads 1.00 ( 12.92) +186.92 ( 20.93) UDP_RR 224 threads 1.00 ( 11.74) +196.79 ( 18.62) Take the 224 threads as an example, the SIS search metrics changes are illustrated below: vanilla patched 4544492 +237.5% 15338634 sched_debug.cpu.sis_domain_search.avg 38539 +39686.8% 15333634 sched_debug.cpu.sis_failed.avg 128300000 -87.9% 15551326 sched_debug.cpu.sis_scanned.avg 5842896 +162.7% 15347978 sched_debug.cpu.sis_search.avg There is -87.9% less CPU scans after patched, which indicates lower overhead. Besides, with this patch applied, there is -13% less rq lock contention in perf-profile.calltrace.cycles-pp._raw_spin_lock.raw_spin_rq_lock_nested .try_to_wake_up.default_wake_function.woken_wake_function. This might help explain the performance improvement - Because this patch allows the waking task to remain on the previous CPU, rather than grabbing other CPUs' lock. Each hackbench test is a: hackbench -g $job --process/threads --pipe/sockets -l 1000000 -s 100 hackbench.throughput ========= case load baseline(std%) compare%( std%) process-pipe 1 group 1.00 ( 1.29) +0.57 ( 0.47) process-pipe 2 groups 1.00 ( 0.27) +0.77 ( 0.81) process-pipe 4 groups 1.00 ( 0.26) +1.17 ( 0.02) process-pipe 8 groups 1.00 ( 0.15) -4.79 ( 0.02) process-sockets 1 group 1.00 ( 0.63) -0.92 ( 0.13) process-sockets 2 groups 1.00 ( 0.03) -0.83 ( 0.14) process-sockets 4 groups 1.00 ( 0.40) +5.20 ( 0.26) process-sockets 8 groups 1.00 ( 0.04) +3.52 ( 0.03) threads-pipe 1 group 1.00 ( 1.28) +0.07 ( 0.14) threads-pipe 2 groups 1.00 ( 0.22) -0.49 ( 0.74) threads-pipe 4 groups 1.00 ( 0.05) +1.88 ( 0.13) threads-pipe 8 groups 1.00 ( 0.09) -4.90 ( 0.06) threads-sockets 1 group 1.00 ( 0.25) -0.70 ( 0.53) threads-sockets 2 groups 1.00 ( 0.10) -0.63 ( 0.26) threads-sockets 4 groups 1.00 ( 0.19) +11.92 ( 0.24) threads-sockets 8 groups 1.00 ( 0.08) +4.31 ( 0.11) Each tbench test is a: tbench -t 100 $job 127.0.0.1 tbench.throughput ====== case load baseline(std%) compare%( std%) loopback 28 threads 1.00 ( 0.06) -0.14 ( 0.09) loopback 56 threads 1.00 ( 0.03) -0.04 ( 0.17) loopback 84 threads 1.00 ( 0.05) +0.36 ( 0.13) loopback 112 threads 1.00 ( 0.03) +0.51 ( 0.03) loopback 140 threads 1.00 ( 0.02) -1.67 ( 0.19) loopback 168 threads 1.00 ( 0.38) +1.27 ( 0.27) loopback 196 threads 1.00 ( 0.11) +1.34 ( 0.17) loopback 224 threads 1.00 ( 0.11) +1.67 ( 0.22) Each schbench test is a: schbench -m $job -t 28 -r 100 -s 30000 -c 30000 schbench.latency_90%_us ======== case load baseline(std%) compare%( std%) normal 1 mthread 1.00 ( 31.22) -7.36 ( 20.25)* normal 2 mthreads 1.00 ( 2.45) -0.48 ( 1.79) normal 4 mthreads 1.00 ( 1.69) +0.45 ( 0.64) normal 8 mthreads 1.00 ( 5.47) +9.81 ( 14.28) *Consider the Standard Deviation, this -7.36% regression might not be valid. Also, a OLTP workload with a commercial RDBMS has been tested, and there is no significant change. There were concerns that unbalanced tasks among CPUs would cause problems. For example, suppose the LLC domain is composed of 8 CPUs, and 7 tasks are bound to CPU0~CPU6, while CPU7 is idle: CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7 util_avg 1024 1024 1024 1024 1024 1024 1024 0 Since the util_avg ratio is 87.5%( = 7/8 ), which is higher than 85%, select_idle_cpu() will not scan, thus CPU7 is undetected during scan. But according to Mel, it is unlikely the CPU7 will be idle all the time because CPU7 could pull some tasks via CPU_NEWLY_IDLE. lkp(kernel test robot) has reported a regression on stress-ng.sock on a very busy system. According to the sched_debug statistics, it might be caused by SIS_UTIL terminates the scan and chooses a previous CPU earlier, and this might introduce more context switch, especially involuntary preemption, which impacts a busy stress-ng. This regression has shown that, not all benchmarks in every scenario benefit from idle CPU scan limit, and it needs further investigation. Besides, there is slight regression in hackbench's 16 groups case when the LLC domain has 16 CPUs. Prateek mentioned that we should scan aggressively in an LLC domain with 16 CPUs. Because the cost to search for an idle one among 16 CPUs is negligible. The current patch aims to propose a generic solution and only considers the util_avg. Something like the below could be applied on top of the current patch to fulfill the requirement: if (llc_weight <= 16) nr_scan = nr_scan * 32 / llc_weight; For LLC domain with 16 CPUs, the nr_scan will be expanded to 2 times large. The smaller the CPU number this LLC domain has, the larger nr_scan will be expanded. This needs further investigation. There is also ongoing work[2] from Abel to filter out the busy CPUs during wakeup, to further speed up the idle CPU scan. And it could be a following-up optimization on top of this change. Suggested-by: Tim Chen Suggested-by: Peter Zijlstra Signed-off-by: Chen Yu Signed-off-by: Peter Zijlstra (Intel) Tested-by: Yicong Yang Tested-by: Mohini Narkhede Tested-by: K Prateek Nayak Link: https://lore.kernel.org/r/20220612163428.849378-1-yu.c.chen@intel.com --- include/linux/sched/topology.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sched/topology.h b/include/linux/sched/topology.h index 56cffe42abbc..816df6cc444e 100644 --- a/include/linux/sched/topology.h +++ b/include/linux/sched/topology.h @@ -81,6 +81,7 @@ struct sched_domain_shared { atomic_t ref; atomic_t nr_busy_cpus; int has_idle_cores; + int nr_idle_scan; }; struct sched_domain { -- cgit From 119a784c81270eb88e573174ed2209225d646656 Mon Sep 17 00:00:00 2001 From: Namhyung Kim Date: Thu, 16 Jun 2022 11:06:23 -0700 Subject: perf/core: Add a new read format to get a number of lost samples Sometimes we want to know an accurate number of samples even if it's lost. Currenlty PERF_RECORD_LOST is generated for a ring-buffer which might be shared with other events. So it's hard to know per-event lost count. Add event->lost_samples field and PERF_FORMAT_LOST to retrieve it from userspace. Original-patch-by: Jiri Olsa Signed-off-by: Namhyung Kim Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20220616180623.1358843-1-namhyung@kernel.org --- include/linux/perf_event.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index da759560eec5..ee8b9ecdc03b 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -759,6 +759,8 @@ struct perf_event { struct pid_namespace *ns; u64 id; + atomic64_t lost_samples; + u64 (*clock)(void); perf_overflow_handler_t overflow_handler; void *overflow_handler_context; -- cgit From bb4479994945e9170534389a7762eb56149320ac Mon Sep 17 00:00:00 2001 From: Dietmar Eggemann Date: Tue, 21 Jun 2022 10:04:10 +0100 Subject: sched, drivers: Remove max param from effective_cpu_util()/sched_cpu_util() effective_cpu_util() already has a `int cpu' parameter which allows to retrieve the CPU capacity scale factor (or maximum CPU capacity) inside this function via an arch_scale_cpu_capacity(cpu). A lot of code calling effective_cpu_util() (or the shim sched_cpu_util()) needs the maximum CPU capacity, i.e. it will call arch_scale_cpu_capacity() already. But not having to pass it into effective_cpu_util() will make the EAS wake-up code easier, especially when the maximum CPU capacity reduced by the thermal pressure is passed through the EAS wake-up functions. Due to the asymmetric CPU capacity support of arm/arm64 architectures, arch_scale_cpu_capacity(int cpu) is a per-CPU variable read access via per_cpu(cpu_scale, cpu) on such a system. On all other architectures it is a a compile-time constant (SCHED_CAPACITY_SCALE). Signed-off-by: Dietmar Eggemann Signed-off-by: Peter Zijlstra (Intel) Acked-by: Vincent Guittot Tested-by: Lukasz Luba Link: https://lkml.kernel.org/r/20220621090414.433602-4-vdonnefort@google.com --- include/linux/sched.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index c46f3a63b758..88b8817b827d 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -2257,7 +2257,7 @@ static inline bool owner_on_cpu(struct task_struct *owner) } /* Returns effective CPU energy utilization, as seen by the scheduler */ -unsigned long sched_cpu_util(int cpu, unsigned long max); +unsigned long sched_cpu_util(int cpu); #endif /* CONFIG_SMP */ #ifdef CONFIG_RSEQ -- cgit From 586b3b7600e44c1cad52c683ccfbb76fb2c10cc8 Mon Sep 17 00:00:00 2001 From: Sai Krishna Potthuri Date: Fri, 17 Jun 2022 16:16:56 +0530 Subject: firmware: xilinx: Add configuration values for tri-state Add configuration values(enable/disable) for tri-state parameter. Signed-off-by: Sai Krishna Potthuri Link: https://lore.kernel.org/r/1655462819-28801-2-git-send-email-lakshmi.sai.krishna.potthuri@xilinx.com Signed-off-by: Linus Walleij --- include/linux/firmware/xlnx-zynqmp.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firmware/xlnx-zynqmp.h b/include/linux/firmware/xlnx-zynqmp.h index 1ec73d5352c3..4901fb31de3e 100644 --- a/include/linux/firmware/xlnx-zynqmp.h +++ b/include/linux/firmware/xlnx-zynqmp.h @@ -368,6 +368,11 @@ enum pm_pinctrl_drive_strength { PM_PINCTRL_DRIVE_STRENGTH_12MA = 3, }; +enum pm_pinctrl_tri_state { + PM_PINCTRL_TRI_STATE_DISABLE = 0, + PM_PINCTRL_TRI_STATE_ENABLE = 1, +}; + enum zynqmp_pm_shutdown_type { ZYNQMP_PM_SHUTDOWN_TYPE_SHUTDOWN = 0, ZYNQMP_PM_SHUTDOWN_TYPE_RESET = 1, -- cgit From 8698e3bab4dd7968666e84e111d0bfd17c040e77 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Mon, 27 Jun 2022 20:47:19 +0300 Subject: fanotify: refine the validation checks on non-dir inode mask Commit ceaf69f8eadc ("fanotify: do not allow setting dirent events in mask of non-dir") added restrictions about setting dirent events in the mask of a non-dir inode mark, which does not make any sense. For backward compatibility, these restictions were added only to new (v5.17+) APIs. It also does not make any sense to set the flags FAN_EVENT_ON_CHILD or FAN_ONDIR in the mask of a non-dir inode. Add these flags to the dir-only restriction of the new APIs as well. Move the check of the dir-only flags for new APIs into the helper fanotify_events_supported(), which is only called for FAN_MARK_ADD, because there is no need to error on an attempt to remove the dir-only flags from non-dir inode. Fixes: ceaf69f8eadc ("fanotify: do not allow setting dirent events in mask of non-dir") Link: https://lore.kernel.org/linux-fsdevel/20220627113224.kr2725conevh53u4@quack3.lan/ Link: https://lore.kernel.org/r/20220627174719.2838175-1-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- include/linux/fanotify.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index edc28555814c..e517dbcf74ed 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -111,6 +111,10 @@ FANOTIFY_PERM_EVENTS | \ FAN_Q_OVERFLOW | FAN_ONDIR) +/* Events and flags relevant only for directories */ +#define FANOTIFY_DIRONLY_EVENT_BITS (FANOTIFY_DIRENT_EVENTS | \ + FAN_EVENT_ON_CHILD | FAN_ONDIR) + #define ALL_FANOTIFY_EVENT_BITS (FANOTIFY_OUTGOING_EVENTS | \ FANOTIFY_EVENT_FLAGS) -- cgit From 0e584d46218e3b9dc12a98e18e81a0cd3e0d5419 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 28 Jun 2022 10:46:22 +0100 Subject: regulator: fix a kernel-doc warning document n_ramp_values field at struct regulator_desc, in order to solve this warning: include/linux/regulator/driver.h:434: warning: Function parameter or member 'n_ramp_values' not described in 'regulator_desc' Signed-off-by: Mauro Carvalho Chehab Link: https://lore.kernel.org/r/15efc16e878aa327aa2769023bcdf959a795f41d.1656409369.git.mchehab@kernel.org Signed-off-by: Mark Brown --- include/linux/regulator/driver.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/regulator/driver.h b/include/linux/regulator/driver.h index 0228caaa6741..f9a7461e72b8 100644 --- a/include/linux/regulator/driver.h +++ b/include/linux/regulator/driver.h @@ -348,6 +348,7 @@ enum regulator_type { * @ramp_delay_table: Table for mapping the regulator ramp-rate values. Values * should be given in units of V/S (uV/uS). See the * regulator_set_ramp_delay_regmap(). + * @n_ramp_values: number of elements at @ramp_delay_table. * * @enable_time: Time taken for initial enable of regulator (in uS). * @off_on_delay: guard time (in uS), before re-enabling a regulator -- cgit From 1f90307e5f0d7bc9a336ead528f616a5df8e5944 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Sun, 19 Jun 2022 08:05:49 +0200 Subject: block: remove QUEUE_FLAG_DEAD Disallow setting the blk-mq state on any queue that is already dying as setting the state even then is a bad idea, and remove the now unused QUEUE_FLAG_DEAD flag. Signed-off-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Link: https://lore.kernel.org/r/20220619060552.1850436-4-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index b2d42201bd5d..f4632f4fe884 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -564,7 +564,6 @@ struct request_queue { #define QUEUE_FLAG_NOXMERGES 9 /* No extended merges */ #define QUEUE_FLAG_ADD_RANDOM 10 /* Contributes to random pool */ #define QUEUE_FLAG_SAME_FORCE 12 /* force complete on same CPU */ -#define QUEUE_FLAG_DEAD 13 /* queue tear-down finished */ #define QUEUE_FLAG_INIT_DONE 14 /* queue is initialized */ #define QUEUE_FLAG_STABLE_WRITES 15 /* don't modify blks until WB is done */ #define QUEUE_FLAG_POLL 16 /* IO polling enabled if set */ @@ -592,7 +591,6 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q); #define blk_queue_stopped(q) test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags) #define blk_queue_dying(q) test_bit(QUEUE_FLAG_DYING, &(q)->queue_flags) #define blk_queue_has_srcu(q) test_bit(QUEUE_FLAG_HAS_SRCU, &(q)->queue_flags) -#define blk_queue_dead(q) test_bit(QUEUE_FLAG_DEAD, &(q)->queue_flags) #define blk_queue_init_done(q) test_bit(QUEUE_FLAG_INIT_DONE, &(q)->queue_flags) #define blk_queue_nomerges(q) test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags) #define blk_queue_noxmerges(q) \ -- cgit From 6f8191fdf41d3a53cc1d63fe2234e812c55a0092 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Sun, 19 Jun 2022 08:05:51 +0200 Subject: block: simplify disk shutdown Set the queue dying flag and call blk_mq_exit_queue from del_gendisk for all disks that do not have separately allocated queues, and thus remove the need to call blk_cleanup_queue for them. Rename blk_cleanup_disk to blk_mq_destroy_queue to make it clear that this function is intended only for separately allocated blk-mq queues. This saves an extra queue freeze for devices without a separately allocated queue. Signed-off-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Link: https://lore.kernel.org/r/20220619060552.1850436-6-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 3 +++ include/linux/blkdev.h | 4 +--- 2 files changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index e2d9daf7e8dd..0fd96e92c6c6 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -686,10 +686,13 @@ struct gendisk *__blk_mq_alloc_disk(struct blk_mq_tag_set *set, void *queuedata, \ __blk_mq_alloc_disk(set, queuedata, &__key); \ }) +struct gendisk *blk_mq_alloc_disk_for_queue(struct request_queue *q, + struct lock_class_key *lkclass); struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *); int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set, struct request_queue *q); void blk_mq_unregister_dev(struct device *, struct request_queue *); +void blk_mq_destroy_queue(struct request_queue *); int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set); int blk_mq_alloc_sq_tag_set(struct blk_mq_tag_set *set, diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f4632f4fe884..530eeccffda3 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -148,6 +148,7 @@ struct gendisk { #define GD_NATIVE_CAPACITY 3 #define GD_ADDED 4 #define GD_SUPPRESS_PART_SCAN 5 +#define GD_OWNS_QUEUE 6 struct mutex open_mutex; /* open/close mutex */ unsigned open_partitions; /* number of open partitions */ @@ -815,8 +816,6 @@ static inline u64 sb_bdev_nr_blocks(struct super_block *sb) int bdev_disk_changed(struct gendisk *disk, bool invalidate); -struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id, - struct lock_class_key *lkclass); void put_disk(struct gendisk *disk); struct gendisk *__blk_alloc_disk(int node, struct lock_class_key *lkclass); @@ -933,7 +932,6 @@ static inline unsigned int blk_chunk_sectors_left(sector_t offset, /* * Access functions for manipulating queue properties */ -extern void blk_cleanup_queue(struct request_queue *); void blk_queue_bounce_limit(struct request_queue *q, enum blk_bounce limit); extern void blk_queue_max_hw_sectors(struct request_queue *, unsigned int); extern void blk_queue_chunk_sectors(struct request_queue *, unsigned int); -- cgit From 8b9ab62662048a3274361c7e5f64037c2c133e2c Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Sun, 19 Jun 2022 08:05:52 +0200 Subject: block: remove blk_cleanup_disk blk_cleanup_disk is nothing but a trivial wrapper for put_disk now, so remove it. Signed-off-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Link: https://lore.kernel.org/r/20220619060552.1850436-7-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 530eeccffda3..22b12531aeb7 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -834,7 +834,6 @@ struct gendisk *__blk_alloc_disk(int node, struct lock_class_key *lkclass); \ __blk_alloc_disk(node_id, &__key); \ }) -void blk_cleanup_disk(struct gendisk *disk); int __register_blkdev(unsigned int major, const char *name, void (*probe)(dev_t devt)); -- cgit From 0fe3dbbefb74a8575f61d7801b08dbc50523d60d Mon Sep 17 00:00:00 2001 From: Tao Liu Date: Mon, 27 Jun 2022 22:00:04 +0800 Subject: linux/dim: Fix divide by 0 in RDMA DIM Fix a divide 0 error in rdma_dim_stats_compare() when prev->cpe_ratio == 0. CallTrace: Hardware name: H3C R4900 G3/RS33M2C9S, BIOS 2.00.37P21 03/12/2020 task: ffff880194b78000 task.stack: ffffc90006714000 RIP: 0010:backport_rdma_dim+0x10e/0x240 [mlx_compat] RSP: 0018:ffff880c10e83ec0 EFLAGS: 00010202 RAX: 0000000000002710 RBX: ffff88096cd7f780 RCX: 0000000000000064 RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000001 RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 000000001d7c6c09 R13: ffff88096cd7f780 R14: ffff880b174fe800 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff880c10e80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000a0965b00 CR3: 000000000200a003 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ib_poll_handler+0x43/0x80 [ib_core] irq_poll_softirq+0xae/0x110 __do_softirq+0xd1/0x28c irq_exit+0xde/0xf0 do_IRQ+0x54/0xe0 common_interrupt+0x8f/0x8f ? cpuidle_enter_state+0xd9/0x2a0 ? cpuidle_enter_state+0xc7/0x2a0 ? do_idle+0x170/0x1d0 ? cpu_startup_entry+0x6f/0x80 ? start_secondary+0x1b9/0x210 ? secondary_startup_64+0xa5/0xb0 Code: 0f 87 e1 00 00 00 8b 4c 24 14 44 8b 43 14 89 c8 4d 63 c8 44 29 c0 99 31 d0 29 d0 31 d2 48 98 48 8d 04 80 48 8d 04 80 48 c1 e0 02 <49> f7 f1 48 83 f8 0a 0f 86 c1 00 00 00 44 39 c1 7f 10 48 89 df RIP: backport_rdma_dim+0x10e/0x240 [mlx_compat] RSP: ffff880c10e83ec0 Fixes: f4915455dcf0 ("linux/dim: Implement RDMA adaptive moderation (DIM)") Link: https://lore.kernel.org/r/20220627140004.3099-1-thomas.liu@ucloud.cn Signed-off-by: Tao Liu Reviewed-by: Max Gurtovoy Acked-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe --- include/linux/dim.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dim.h b/include/linux/dim.h index b698266d0035..6c5733981563 100644 --- a/include/linux/dim.h +++ b/include/linux/dim.h @@ -21,7 +21,7 @@ * We consider 10% difference as significant. */ #define IS_SIGNIFICANT_DIFF(val, ref) \ - (((100UL * abs((val) - (ref))) / (ref)) > 10) + ((ref) && (((100UL * abs((val) - (ref))) / (ref)) > 10)) /* * Calculate the gap between two values. -- cgit From cc5c516df028b221d94c65c47c5ae8d20f61b6f9 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 28 Jun 2022 19:18:45 +0200 Subject: block: simplify blktrace sysfs attribute creation Add the trace attributes to the default gendisk attributes, just like we already do for partitions. Signed-off-by: Christoph Hellwig Reviewed-by: Bart Van Assche Link: https://lore.kernel.org/r/20220628171850.1313069-2-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blktrace_api.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h index 623e22492afa..f6f9b544365a 100644 --- a/include/linux/blktrace_api.h +++ b/include/linux/blktrace_api.h @@ -77,10 +77,6 @@ extern int blk_trace_setup(struct request_queue *q, char *name, dev_t dev, char __user *arg); extern int blk_trace_startstop(struct request_queue *q, int start); extern int blk_trace_remove(struct request_queue *q); -extern void blk_trace_remove_sysfs(struct device *dev); -extern int blk_trace_init_sysfs(struct device *dev); - -extern struct attribute_group blk_trace_attr_group; #else /* !CONFIG_BLK_DEV_IO_TRACE */ # define blk_trace_ioctl(bdev, cmd, arg) (-ENOTTY) @@ -91,13 +87,7 @@ extern struct attribute_group blk_trace_attr_group; # define blk_trace_remove(q) (-ENOTTY) # define blk_add_trace_msg(q, fmt, ...) do { } while (0) # define blk_add_cgroup_trace_msg(q, cg, fmt, ...) do { } while (0) -# define blk_trace_remove_sysfs(dev) do { } while (0) # define blk_trace_note_message_enabled(q) (false) -static inline int blk_trace_init_sysfs(struct device *dev) -{ - return 0; -} - #endif /* CONFIG_BLK_DEV_IO_TRACE */ #ifdef CONFIG_COMPAT -- cgit From 8682b92e5ab852b93739a0f2b261fff4c733be57 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 28 Jun 2022 19:18:50 +0200 Subject: blk-mq: cleanup disk sysfs registration Pass a gendisk to the sysfs register/unregister functions and give them descriptive names. Also move the unregistration helper next to the one doing the registration. Signed-off-by: Christoph Hellwig Reviewed-by: Bart Van Assche Link: https://lore.kernel.org/r/20220628171850.1313069-7-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 0fd96e92c6c6..43aad0da3305 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -691,7 +691,6 @@ struct gendisk *blk_mq_alloc_disk_for_queue(struct request_queue *q, struct request_queue *blk_mq_init_queue(struct blk_mq_tag_set *); int blk_mq_init_allocated_queue(struct blk_mq_tag_set *set, struct request_queue *q); -void blk_mq_unregister_dev(struct device *, struct request_queue *); void blk_mq_destroy_queue(struct request_queue *); int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set); -- cgit From 8add9a3a2243166f8f60fc20e876caaf30a333f7 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Tue, 28 Jun 2022 15:18:21 +0100 Subject: efi: Simplify arch_efi_call_virt() macro Currently, the arch_efi_call_virt() assumes all users of it will have defined a type 'efi_##f##_t' to make use of it. Simplify the arch_efi_call_virt() macro by eliminating the explicit need for efi_##f##_t type for every user of this macro. Signed-off-by: Sudeep Holla Acked-by: Russell King (Oracle) [ardb: apply Sudeep's ARM fix to i686, Loongarch and RISC-V too] Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index 93ce85a14a46..9ff63acef1ec 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1200,6 +1200,8 @@ static inline void efi_check_for_embedded_firmwares(void) { } efi_status_t efi_random_get_seed(void); +#define arch_efi_call_virt(p, f, args...) ((p)->f(args)) + /* * Arch code can implement the following three template macros, avoiding * reptition for the void/non-void return cases of {__,}efi_call_virt(): -- cgit From c3497fd009ef2c59eea60d21c3ac22de3585ed7d Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 12 Jun 2022 19:50:29 -0400 Subject: fix short copy handling in copy_mc_pipe_to_iter() Unlike other copying operations on ITER_PIPE, copy_mc_to_iter() can result in a short copy. In that case we need to trim the unused buffers, as well as the length of partially filled one - it's not enough to set ->head, ->iov_offset and ->count to reflect how much had we copied. Not hard to fix, fortunately... I'd put a helper (pipe_discard_from(pipe, head)) into pipe_fs_i.h, rather than iov_iter.c - it has nothing to do with iov_iter and having it will allow us to avoid an ugly kludge in fs/splice.c. We could put it into lib/iov_iter.c for now and move it later, but I don't see the point going that way... Cc: stable@kernel.org # 4.19+ Fixes: ca146f6f091e "lib/iov_iter: Fix pipe handling in _copy_to_iter_mcsafe()" Reviewed-by: Jeff Layton Reviewed-by: Christian Brauner (Microsoft) Signed-off-by: Al Viro --- include/linux/pipe_fs_i.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index cb0fd633a610..4ea496924106 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -229,6 +229,15 @@ static inline bool pipe_buf_try_steal(struct pipe_inode_info *pipe, return buf->ops->try_steal(pipe, buf); } +static inline void pipe_discard_from(struct pipe_inode_info *pipe, + unsigned int old_head) +{ + unsigned int mask = pipe->ring_size - 1; + + while (pipe->head > old_head) + pipe_buf_release(pipe, &pipe->bufs[--pipe->head & mask]); +} + /* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual memory allocation, whereas PIPE_BUF makes atomicity guarantees. */ #define PIPE_SIZE PAGE_SIZE -- cgit From 9e121040e54abef9ed5542e5fdfa87911cd96204 Mon Sep 17 00:00:00 2001 From: Javier Martinez Canillas Date: Tue, 7 Jun 2022 20:23:34 +0200 Subject: firmware: sysfb: Make sysfb_create_simplefb() return a pdev pointer This function just returned 0 on success or an errno code on error, but it could be useful for sysfb_init() callers to have a pointer to the device. Signed-off-by: Javier Martinez Canillas Reviewed-by: Daniel Vetter Reviewed-by: Thomas Zimmermann Link: https://patchwork.freedesktop.org/patch/msgid/20220607182338.344270-2-javierm@redhat.com --- include/linux/sysfb.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sysfb.h b/include/linux/sysfb.h index b0dcfa26d07b..708152e9037b 100644 --- a/include/linux/sysfb.h +++ b/include/linux/sysfb.h @@ -72,8 +72,8 @@ static inline void sysfb_apply_efi_quirks(struct platform_device *pd) bool sysfb_parse_mode(const struct screen_info *si, struct simplefb_platform_data *mode); -int sysfb_create_simplefb(const struct screen_info *si, - const struct simplefb_platform_data *mode); +struct platform_device *sysfb_create_simplefb(const struct screen_info *si, + const struct simplefb_platform_data *mode); #else /* CONFIG_SYSFB_SIMPLE */ @@ -83,10 +83,10 @@ static inline bool sysfb_parse_mode(const struct screen_info *si, return false; } -static inline int sysfb_create_simplefb(const struct screen_info *si, - const struct simplefb_platform_data *mode) +static inline struct platform_device *sysfb_create_simplefb(const struct screen_info *si, + const struct simplefb_platform_data *mode) { - return -EINVAL; + return ERR_PTR(-EINVAL); } #endif /* CONFIG_SYSFB_SIMPLE */ -- cgit From bde376e9de3c0bc55eedc8956b0f114c05531595 Mon Sep 17 00:00:00 2001 From: Javier Martinez Canillas Date: Tue, 7 Jun 2022 20:23:35 +0200 Subject: firmware: sysfb: Add sysfb_disable() helper function This can be used by subsystems to unregister a platform device registered by sysfb and also to disable future platform device registration in sysfb. Suggested-by: Daniel Vetter Signed-off-by: Javier Martinez Canillas Reviewed-by: Daniel Vetter Link: https://patchwork.freedesktop.org/patch/msgid/20220607182338.344270-3-javierm@redhat.com --- include/linux/sysfb.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sysfb.h b/include/linux/sysfb.h index 708152e9037b..8ba8b5be5567 100644 --- a/include/linux/sysfb.h +++ b/include/linux/sysfb.h @@ -55,6 +55,18 @@ struct efifb_dmi_info { int flags; }; +#ifdef CONFIG_SYSFB + +void sysfb_disable(void); + +#else /* CONFIG_SYSFB */ + +static inline void sysfb_disable(void) +{ +} + +#endif /* CONFIG_SYSFB */ + #ifdef CONFIG_EFI extern struct efifb_dmi_info efifb_dmi_list[]; -- cgit From 9adf24a40978c19f57f44572b292b38938da7686 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Wed, 29 Jun 2022 12:25:53 +0200 Subject: fs: port HAS_UNMAPPED_ID() to vfs{g,u}id_t The HAS_UNMAPPED_ID() helper is fully self contained so we can port it to vfs{g,u}id_t without much effort. Cc: Seth Forshee Signed-off-by: Christian Brauner (Microsoft) --- include/linux/fs.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index d6e3347cbf69..ec2e35886779 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2323,8 +2323,8 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags static inline bool HAS_UNMAPPED_ID(struct user_namespace *mnt_userns, struct inode *inode) { - return !uid_valid(i_uid_into_mnt(mnt_userns, inode)) || - !gid_valid(i_gid_into_mnt(mnt_userns, inode)); + return !vfsuid_valid(i_uid_into_vfsuid(mnt_userns, inode)) || + !vfsgid_valid(i_gid_into_vfsgid(mnt_userns, inode)); } static inline int iocb_flags(struct file *file); -- cgit From fc04dafd263d8c0b0251b63d47f35b29373d50f2 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Wed, 29 Jun 2022 13:15:10 +0200 Subject: mnt_idmapping: use new helpers in mapped_fs{g,u}id() The old non-type safe helpers will soon be removed. Cc: Seth Forshee Signed-off-by: Christian Brauner (Microsoft) --- include/linux/mnt_idmapping.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mnt_idmapping.h b/include/linux/mnt_idmapping.h index 21bd22a7b326..be83643cadff 100644 --- a/include/linux/mnt_idmapping.h +++ b/include/linux/mnt_idmapping.h @@ -422,7 +422,8 @@ static inline bool vfsgid_has_fsmapping(struct user_namespace *mnt_userns, static inline kuid_t mapped_fsuid(struct user_namespace *mnt_userns, struct user_namespace *fs_userns) { - return mapped_kuid_user(mnt_userns, fs_userns, current_fsuid()); + return from_vfsuid(mnt_userns, fs_userns, + VFSUIDT_INIT(current_fsuid())); } /** @@ -441,7 +442,8 @@ static inline kuid_t mapped_fsuid(struct user_namespace *mnt_userns, static inline kgid_t mapped_fsgid(struct user_namespace *mnt_userns, struct user_namespace *fs_userns) { - return mapped_kgid_user(mnt_userns, fs_userns, current_fsgid()); + return from_vfsgid(mnt_userns, fs_userns, + VFSGIDT_INIT(current_fsgid())); } #endif /* _LINUX_MNT_IDMAPPING_H */ -- cgit From acd6510dd7ab3664b69eb99e37c4fd6325a7d442 Mon Sep 17 00:00:00 2001 From: Tanmay Shah Date: Tue, 7 Jun 2022 15:42:54 -0700 Subject: firmware: xilinx: Add TF_A_PM_REGISTER_SGI SMC call SGI interrupt register and reset is performed by EEMI ioctl IOCTL_REGISTER_SGI. However, this is not correct use of EEMI call. SGI registration functionality does not qualify as energy management activity and so shouldn't be mapped to EEMI call. This new call will replace IOCTL_REGISTER_SGI and will be handled by TF-A specific handler in TF-A. To maintain backward compatibility for a while firmware driver will still use IOCTL_REGISTER_SGI as fallback strategy if new call fails or is not supported by TF-A. This new design also helps to make TF-A as pass through layer for EEMI calls. So we don't have to maintain PM_IOCTL as EEMI API ID in TF-A. Signed-off-by: Tanmay Shah Acked-by: Michal Simek Link: https://lore.kernel.org/r/20220607224253.54919-1-tanmay.shah@xilinx.com Signed-off-by: Michal Simek --- include/linux/firmware/xlnx-zynqmp.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firmware/xlnx-zynqmp.h b/include/linux/firmware/xlnx-zynqmp.h index 1ec73d5352c3..cbde3b1fa414 100644 --- a/include/linux/firmware/xlnx-zynqmp.h +++ b/include/linux/firmware/xlnx-zynqmp.h @@ -34,6 +34,7 @@ #define PM_API_VERSION_2 2 /* ATF only commands */ +#define TF_A_PM_REGISTER_SGI 0xa04 #define PM_GET_TRUSTZONE_VERSION 0xa03 #define PM_SET_SUSPEND_MODE 0xa02 #define GET_CALLBACK_DATA 0xa01 @@ -468,6 +469,7 @@ int zynqmp_pm_feature(const u32 api_id); int zynqmp_pm_is_function_supported(const u32 api_id, const u32 id); int zynqmp_pm_set_feature_config(enum pm_feature_config_id id, u32 value); int zynqmp_pm_get_feature_config(enum pm_feature_config_id id, u32 *payload); +int zynqmp_pm_register_sgi(u32 sgi_num, u32 reset); #else static inline int zynqmp_pm_get_api_version(u32 *version) { @@ -733,6 +735,11 @@ static inline int zynqmp_pm_get_feature_config(enum pm_feature_config_id id, { return -ENODEV; } + +static inline int zynqmp_pm_register_sgi(u32 sgi_num, u32 reset) +{ + return -ENODEV; +} #endif #endif /* __FIRMWARE_ZYNQMP_H__ */ -- cgit From 6ffcd825e7d0416d78fd41cd5b7856a78122cc8c Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 28 Jun 2022 20:41:40 -0400 Subject: mm: Remove __delete_from_page_cache() This wrapper is no longer used. Remove it and all references to it. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/pagemap.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index ce96866fbec4..44b9c265a234 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -1107,10 +1107,6 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio, void filemap_remove_folio(struct folio *folio); void delete_from_page_cache(struct page *page); void __filemap_remove_folio(struct folio *folio, void *shadow); -static inline void __delete_from_page_cache(struct page *page, void *shadow) -{ - __filemap_remove_folio(page_folio(page), shadow); -} void replace_page_cache_page(struct page *old, struct page *new); void delete_from_page_cache_batch(struct address_space *mapping, struct folio_batch *fbatch); -- cgit From 2bb876b58d593d7f2522ec0f41f20a74fde76822 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 1 Jun 2022 15:13:59 -0400 Subject: filemap: Remove add_to_page_cache() and add_to_page_cache_locked() These functions have no more users, so delete them. Signed-off-by: Matthew Wilcox (Oracle) Acked-by: Mike Kravetz Reviewed-by: Muchun Song --- include/linux/pagemap.h | 18 ------------------ 1 file changed, 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 44b9c265a234..dee519ef7436 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -1098,8 +1098,6 @@ size_t fault_in_subpage_writeable(char __user *uaddr, size_t size); size_t fault_in_safe_writeable(const char __user *uaddr, size_t size); size_t fault_in_readable(const char __user *uaddr, size_t size); -int add_to_page_cache_locked(struct page *page, struct address_space *mapping, - pgoff_t index, gfp_t gfp); int add_to_page_cache_lru(struct page *page, struct address_space *mapping, pgoff_t index, gfp_t gfp); int filemap_add_folio(struct address_space *mapping, struct folio *folio, @@ -1115,22 +1113,6 @@ bool filemap_release_folio(struct folio *folio, gfp_t gfp); loff_t mapping_seek_hole_data(struct address_space *, loff_t start, loff_t end, int whence); -/* - * Like add_to_page_cache_locked, but used to add newly allocated pages: - * the page is new, so we can just run __SetPageLocked() against it. - */ -static inline int add_to_page_cache(struct page *page, - struct address_space *mapping, pgoff_t offset, gfp_t gfp_mask) -{ - int error; - - __SetPageLocked(page); - error = add_to_page_cache_locked(page, mapping, offset, gfp_mask); - if (unlikely(error)) - __ClearPageLocked(page); - return error; -} - /* Must be non-static for BPF error injection */ int __filemap_add_folio(struct address_space *mapping, struct folio *folio, pgoff_t index, gfp_t gfp, void **shadowp); -- cgit From be0ced5e9cb81a4c2edefd081a7794a1814fb60d Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 3 Jun 2022 15:30:25 -0400 Subject: filemap: Add filemap_get_folios() This is the equivalent of find_get_pages() but fills a folio_batch instead of an array of pages. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Acked-by: Christian Brauner (Microsoft) --- include/linux/pagemap.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index dee519ef7436..cfd0e8001b3b 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -718,6 +718,8 @@ static inline struct page *find_subpage(struct page *head, pgoff_t index) return head + (index & (thp_nr_pages(head) - 1)); } +unsigned filemap_get_folios(struct address_space *mapping, pgoff_t *start, + pgoff_t end, struct folio_batch *fbatch); unsigned find_get_pages_range(struct address_space *mapping, pgoff_t *start, pgoff_t end, unsigned int nr_pages, struct page **pages); -- cgit From 77414d195f905dd43f58bce82118775ffa59575c Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 4 Jun 2022 17:39:09 -0400 Subject: vmscan: Add check_move_unevictable_folios() Change the guts of check_move_unevictable_pages() over to use folios and add check_move_unevictable_pages() as a wrapper. Signed-off-by: Matthew Wilcox (Oracle) Acked-by: Christian Brauner (Microsoft) --- include/linux/swap.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 0c0fed1b348f..8672a7123ccd 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -438,7 +438,8 @@ static inline bool node_reclaim_enabled(void) return node_reclaim_mode & (RECLAIM_ZONE|RECLAIM_WRITE|RECLAIM_UNMAP); } -extern void check_move_unevictable_pages(struct pagevec *pvec); +void check_move_unevictable_folios(struct folio_batch *fbatch); +void check_move_unevictable_pages(struct pagevec *pvec); extern void kswapd_run(int nid); extern void kswapd_stop(int nid); -- cgit From bb4b42ba926224e26ed699e51def164b4b163935 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Sat, 4 Jun 2022 17:46:02 -0400 Subject: filemap: Remove find_get_pages_range() and associated functions All callers of find_get_pages_range(), pagevec_lookup_range() and pagevec_lookup() have now been removed. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Acked-by: Christian Brauner (Microsoft) --- include/linux/pagemap.h | 3 --- include/linux/pagevec.h | 10 ---------- 2 files changed, 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index cfd0e8001b3b..87d4ea571240 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -720,9 +720,6 @@ static inline struct page *find_subpage(struct page *head, pgoff_t index) unsigned filemap_get_folios(struct address_space *mapping, pgoff_t *start, pgoff_t end, struct folio_batch *fbatch); -unsigned find_get_pages_range(struct address_space *mapping, pgoff_t *start, - pgoff_t end, unsigned int nr_pages, - struct page **pages); unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t start, unsigned int nr_pages, struct page **pages); unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index, diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h index 67b1246f136b..6649154a2115 100644 --- a/include/linux/pagevec.h +++ b/include/linux/pagevec.h @@ -27,16 +27,6 @@ struct pagevec { void __pagevec_release(struct pagevec *pvec); void __pagevec_lru_add(struct pagevec *pvec); -unsigned pagevec_lookup_range(struct pagevec *pvec, - struct address_space *mapping, - pgoff_t *start, pgoff_t end); -static inline unsigned pagevec_lookup(struct pagevec *pvec, - struct address_space *mapping, - pgoff_t *start) -{ - return pagevec_lookup_range(pvec, mapping, start, (pgoff_t)-1); -} - unsigned pagevec_lookup_range_tag(struct pagevec *pvec, struct address_space *mapping, pgoff_t *index, pgoff_t end, xa_mark_t tag); -- cgit From 0e8e08cca5e3256a6209f02b482bee96fb91ba1b Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 29 Apr 2022 08:49:28 -0400 Subject: netfs: Remove extern from function prototypes The 'extern' keyword is not necessary and removing it lets us shorten some lines. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/netfs.h | 23 +++++++++++------------ 1 file changed, 11 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 1773e5df8e65..11df38d03359 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -276,19 +276,18 @@ struct netfs_cache_ops { }; struct readahead_control; -extern void netfs_readahead(struct readahead_control *); +void netfs_readahead(struct readahead_control *); int netfs_read_folio(struct file *, struct folio *); -extern int netfs_write_begin(struct netfs_inode *, - struct file *, struct address_space *, - loff_t, unsigned int, struct folio **, - void **); - -extern void netfs_subreq_terminated(struct netfs_io_subrequest *, ssize_t, bool); -extern void netfs_get_subrequest(struct netfs_io_subrequest *subreq, - enum netfs_sreq_ref_trace what); -extern void netfs_put_subrequest(struct netfs_io_subrequest *subreq, - bool was_async, enum netfs_sreq_ref_trace what); -extern void netfs_stats_show(struct seq_file *); +int netfs_write_begin(struct netfs_inode *, struct file *, + struct address_space *, loff_t pos, unsigned int len, + struct folio **, void **fsdata); + +void netfs_subreq_terminated(struct netfs_io_subrequest *, ssize_t, bool); +void netfs_get_subrequest(struct netfs_io_subrequest *subreq, + enum netfs_sreq_ref_trace what); +void netfs_put_subrequest(struct netfs_io_subrequest *subreq, + bool was_async, enum netfs_sreq_ref_trace what); +void netfs_stats_show(struct seq_file *); /** * netfs_inode - Get the netfs inode context from the inode -- cgit From a57f68ddc8865d59a19783080cc52fb4a11dc209 Mon Sep 17 00:00:00 2001 From: Serge Semin Date: Fri, 24 Jun 2022 17:18:45 +0300 Subject: reset: Fix devm bulk optional exclusive control getter Most likely due to copy-paste mistake the device managed version of the denoted reset control getter has been implemented with invalid semantic, which can be immediately spotted by having "WARN_ON(shared && acquired)" warning in the system log as soon as the method is called. Anyway let's fix it by altering the boolean arguments passed to the __devm_reset_control_bulk_get() method from - shared = true, optional = false, acquired = true to + shared = false, optional = true, acquired = true That's what they were supposed to be in the first place (see the non-devm version of the same method: reset_control_bulk_get_optional_exclusive()). Fixes: 48d71395896d ("reset: Add reset_control_bulk API") Signed-off-by: Serge Semin Reviewed-by: Dmitry Osipenko Signed-off-by: Philipp Zabel Link: https://lore.kernel.org/r/20220624141853.7417-2-Sergey.Semin@baikalelectronics.ru --- include/linux/reset.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/reset.h b/include/linux/reset.h index 8a21b5756c3e..514ddf003efc 100644 --- a/include/linux/reset.h +++ b/include/linux/reset.h @@ -731,7 +731,7 @@ static inline int __must_check devm_reset_control_bulk_get_optional_exclusive(struct device *dev, int num_rstcs, struct reset_control_bulk_data *rstcs) { - return __devm_reset_control_bulk_get(dev, num_rstcs, rstcs, true, false, true); + return __devm_reset_control_bulk_get(dev, num_rstcs, rstcs, false, true, true); } /** -- cgit From 77940f0d96cd2ec9fe2125f74f513a7254bcdd7f Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Wed, 29 Jun 2022 16:31:12 +0200 Subject: mnt_idmapping: align kernel doc and parameter order The kernel documentation added for the new helpers had a few tiny ordering issues. Fix them. Signed-off-by: Christian Brauner (Microsoft) --- include/linux/mnt_idmapping.h | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mnt_idmapping.h b/include/linux/mnt_idmapping.h index be83643cadff..41dc80f8b67c 100644 --- a/include/linux/mnt_idmapping.h +++ b/include/linux/mnt_idmapping.h @@ -70,12 +70,12 @@ static inline bool vfsgid_eq(vfsgid_t left, vfsgid_t right) /** * vfsuid_eq_kuid - check whether kuid and vfsuid have the same value - * @kuid: the kuid to compare * @vfsuid: the vfsuid to compare + * @kuid: the kuid to compare * - * Check whether @kuid and @vfsuid have the same values. + * Check whether @vfsuid and @kuid have the same values. * - * Return: true if @kuid and @vfsuid have the same value, false if not. + * Return: true if @vfsuid and @kuid have the same value, false if not. * Comparison between two invalid uids returns false. */ static inline bool vfsuid_eq_kuid(vfsuid_t vfsuid, kuid_t kuid) @@ -85,15 +85,15 @@ static inline bool vfsuid_eq_kuid(vfsuid_t vfsuid, kuid_t kuid) /** * vfsgid_eq_kgid - check whether kgid and vfsgid have the same value - * @kgid: the kgid to compare * @vfsgid: the vfsgid to compare + * @kgid: the kgid to compare * - * Check whether @kgid and @vfsgid have the same values. + * Check whether @vfsgid and @kgid have the same values. * - * Return: true if @kgid and @vfsgid have the same value, false if not. + * Return: true if @vfsgid and @kgid have the same value, false if not. * Comparison between two invalid gids returns false. */ -static inline bool vfsgid_eq_kgid(kgid_t kgid, vfsgid_t vfsgid) +static inline bool vfsgid_eq_kgid(vfsgid_t vfsgid, kgid_t kgid) { return vfsgid_valid(vfsgid) && __vfsgid_val(vfsgid) == __kgid_val(kgid); } @@ -171,7 +171,7 @@ static inline bool no_idmapping(const struct user_namespace *mnt_userns, } /** - * mapped_kuid_fs - map a filesystem kuid into a mnt_userns + * make_vfsuid - map a filesystem kuid into a mnt_userns * @mnt_userns: the mount's idmapping * @fs_userns: the filesystem's idmapping * @kuid : kuid to be mapped @@ -216,7 +216,7 @@ static inline kuid_t mapped_kuid_fs(struct user_namespace *mnt_userns, } /** - * mapped_kgid_fs - map a filesystem kgid into a mnt_userns + * make_vfsgid - map a filesystem kgid into a mnt_userns * @mnt_userns: the mount's idmapping * @fs_userns: the filesystem's idmapping * @kgid : kgid to be mapped -- cgit From 6a27d28c81bc5843de2490688a04ee5baa6615e7 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 29 Jun 2022 08:20:12 +0200 Subject: block: move ->ia_ranges from the request_queue to the gendisk Independent access ranges only matter for file system I/O and are only valid with a registered gendisk, so move them there. Signed-off-by: Christoph Hellwig Reviewed-by: Damien Le Moal Tested-by: Damien Le Moal Link: https://lore.kernel.org/r/20220629062013.1331068-2-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 22b12531aeb7..b9a94c53c6cd 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -171,6 +171,12 @@ struct gendisk { struct badblocks *bb; struct lockdep_map lockdep_map; u64 diskseq; + + /* + * Independent sector access ranges. This is always NULL for + * devices that do not have multiple independent access ranges. + */ + struct blk_independent_access_ranges *ia_ranges; }; static inline bool disk_live(struct gendisk *disk) @@ -539,12 +545,6 @@ struct request_queue { bool mq_sysfs_init_done; - /* - * Independent sector access ranges. This is always NULL for - * devices that do not have multiple independent access ranges. - */ - struct blk_independent_access_ranges *ia_ranges; - /** * @srcu: Sleepable RCU. Use as lock when type of the request queue * is blocking (BLK_MQ_F_BLOCKING). Must be the last member -- cgit From 445cbd219ac3b8f451153c210aaf97adcbf4bd02 Mon Sep 17 00:00:00 2001 From: Aidan MacDonald Date: Thu, 23 Jun 2022 22:14:09 +0100 Subject: regmap-irq: Convert bool bitfields to unsigned int Use 'unsigned int' for bitfields for consistency with most other kernel code. Signed-off-by: Aidan MacDonald Link: https://lore.kernel.org/r/20220623211420.918875-2-aidanmacdonald.0x0@gmail.com Signed-off-by: Mark Brown --- include/linux/regmap.h | 26 +++++++++++++------------- 1 file changed, 13 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/regmap.h b/include/linux/regmap.h index d5b08f4f0dc0..bcbc5d4ffd30 100644 --- a/include/linux/regmap.h +++ b/include/linux/regmap.h @@ -1534,19 +1534,19 @@ struct regmap_irq_chip { unsigned int type_base; unsigned int *virt_reg_base; unsigned int irq_reg_stride; - bool mask_writeonly:1; - bool init_ack_masked:1; - bool mask_invert:1; - bool use_ack:1; - bool ack_invert:1; - bool clear_ack:1; - bool wake_invert:1; - bool runtime_pm:1; - bool type_invert:1; - bool type_in_mask:1; - bool clear_on_unmask:1; - bool not_fixed_stride:1; - bool status_invert:1; + unsigned int mask_writeonly:1; + unsigned int init_ack_masked:1; + unsigned int mask_invert:1; + unsigned int use_ack:1; + unsigned int ack_invert:1; + unsigned int clear_ack:1; + unsigned int wake_invert:1; + unsigned int runtime_pm:1; + unsigned int type_invert:1; + unsigned int type_in_mask:1; + unsigned int clear_on_unmask:1; + unsigned int not_fixed_stride:1; + unsigned int status_invert:1; int num_regs; -- cgit From 53a1a16dcc972163bd5816192d5d63ae433c9e56 Mon Sep 17 00:00:00 2001 From: Aidan MacDonald Date: Thu, 23 Jun 2022 22:14:10 +0100 Subject: regmap-irq: Remove unused type_reg_stride field It appears that no chip ever required a nonzero type_reg_stride and commit 1066cfbdfa3f ("regmap-irq: Extend sub-irq to support non-fixed reg strides") broke support. Just remove the field. Signed-off-by: Aidan MacDonald Link: https://lore.kernel.org/r/20220623211420.918875-3-aidanmacdonald.0x0@gmail.com Signed-off-by: Mark Brown --- include/linux/regmap.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/regmap.h b/include/linux/regmap.h index bcbc5d4ffd30..58db2206f193 100644 --- a/include/linux/regmap.h +++ b/include/linux/regmap.h @@ -1503,8 +1503,6 @@ struct regmap_irq_sub_irq_map { * @num_type_reg: Number of type registers. * @num_virt_regs: Number of non-standard irq configuration registers. * If zero unsupported. - * @type_reg_stride: Stride to use for chips where type registers are not - * contiguous. * @handle_pre_irq: Driver specific callback to handle interrupt from device * before regmap_irq_handler process the interrupts. * @handle_post_irq: Driver specific callback to handle interrupt from device @@ -1555,7 +1553,6 @@ struct regmap_irq_chip { int num_type_reg; int num_virt_regs; - unsigned int type_reg_stride; int (*handle_pre_irq)(void *irq_drv_data); int (*handle_post_irq)(void *irq_drv_data); -- cgit From 610fdd668e6af48fcae7908161d14eee3a95ec92 Mon Sep 17 00:00:00 2001 From: Aidan MacDonald Date: Thu, 23 Jun 2022 22:14:12 +0100 Subject: regmap-irq: Remove an unnecessary restriction on type_in_mask Check types_supported instead of checking type_rising/falling_val when using type_in_mask interrupts. This makes the intent clearer and allows a type_in_mask irq to support level or edge triggers, rather than only edge triggers. Update the documentation and comments to reflect the new behavior. This shouldn't affect existing drivers, because if they didn't set types_supported properly the type buffer wouldn't be updated. Signed-off-by: Aidan MacDonald Link: https://lore.kernel.org/r/20220623211420.918875-5-aidanmacdonald.0x0@gmail.com Signed-off-by: Mark Brown --- include/linux/regmap.h | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/regmap.h b/include/linux/regmap.h index 58db2206f193..b404c59b3b20 100644 --- a/include/linux/regmap.h +++ b/include/linux/regmap.h @@ -1484,9 +1484,11 @@ struct regmap_irq_sub_irq_map { * @clear_ack: Use this to set 1 and 0 or vice-versa to clear interrupts. * @wake_invert: Inverted wake register: cleared bits are wake enabled. * @type_invert: Invert the type flags. - * @type_in_mask: Use the mask registers for controlling irq type. For - * interrupts defining type_rising/falling_mask use mask_base - * for edge configuration and never update bits in type_base. + * @type_in_mask: Use the mask registers for controlling irq type. Use this if + * the hardware provides separate bits for rising/falling edge + * or low/high level interrupts and they should be combined into + * a single logical interrupt. Use &struct regmap_irq_type data + * to define the mask bit for each irq type. * @clear_on_unmask: For chips with interrupts cleared on read: read the status * registers before unmasking interrupts to clear any bits * set when they were masked. -- cgit From ad22b3e98f9430896bd4bd8f4fbff4667f02a0c8 Mon Sep 17 00:00:00 2001 From: Aidan MacDonald Date: Thu, 23 Jun 2022 22:14:14 +0100 Subject: regmap-irq: Remove mask_writeonly and regmap_irq_update_bits() Commit a71411dbf6c8 ("regmap: irq: add chip option mask_writeonly") introduced the mask_writeonly option, but it isn't used now and it appears it's never been used by any in-tree drivers. The motivation for the option is mentioned in the commit message, Some irq controllers have writeonly/multipurpose register layouts. In those cases we read invalid data back. [...] The option causes mask register updates to use regmap_write_bits() instead of regmap_update_bits(). However, regmap_write_bits() doesn't solve the reading invalid data problem. It's still a read-modify-write op like regmap_update_bits(). The difference is that 'update bits' will only write the new value if it is different from the current value, while 'write bits' will write the new value unconditionally, even if it's the same as the current value. This seems like a bit of a specialized use case and probably isn't that useful for regmap-irq, so let's just remove the option and go back to using an 'update bits' op for the mask registers. We can always add the option back if some driver ends up needing it in the future. Signed-off-by: Aidan MacDonald Link: https://lore.kernel.org/r/20220623211420.918875-7-aidanmacdonald.0x0@gmail.com Signed-off-by: Mark Brown --- include/linux/regmap.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/regmap.h b/include/linux/regmap.h index b404c59b3b20..6489b3610362 100644 --- a/include/linux/regmap.h +++ b/include/linux/regmap.h @@ -1468,7 +1468,6 @@ struct regmap_irq_sub_irq_map { * * @status_base: Base status register address. * @mask_base: Base mask register address. - * @mask_writeonly: Base mask register is write only. * @unmask_base: Base unmask register address. for chips who have * separate mask and unmask registers * @ack_base: Base ack address. If zero then the chip is clear on read. @@ -1534,7 +1533,6 @@ struct regmap_irq_chip { unsigned int type_base; unsigned int *virt_reg_base; unsigned int irq_reg_stride; - unsigned int mask_writeonly:1; unsigned int init_ack_masked:1; unsigned int mask_invert:1; unsigned int use_ack:1; -- cgit From faa87ce9196dbb074d75bd4aecb8bacf18f19b4e Mon Sep 17 00:00:00 2001 From: Aidan MacDonald Date: Thu, 23 Jun 2022 22:14:16 +0100 Subject: regmap-irq: Introduce config registers for irq types Config registers provide a more uniform approach to handling irq type registers. They are essentially an extension of the virtual registers used by the qcom-pm8008 driver. Config registers can be represented as a 2D array: config_base[0] reg0,0 reg0,1 reg0,2 reg0,3 config_base[1] reg1,0 reg1,1 reg1,2 reg1,3 config_base[2] reg2,0 reg2,1 reg2,2 reg2,3 There are 'num_config_bases' base registers, each of which is used to address 'num_config_regs' registers. The addresses are calculated in the same way as for other bases. It is assumed that an irq's type is controlled by one column of registers; that column is identified by the irq's 'type_reg_offset'. The set_type_config() callback is responsible for updating the config register contents. It receives an array of buffers (each represents a row of registers) and the index of the column to update, along with the 'struct regmap_irq' description and requested irq type. Buffered values are written to registers in regmap_irq_sync_unlock(). Note that the entire register contents are overwritten, which is a minor change in behavior from type registers via 'type_base'. Signed-off-by: Aidan MacDonald Link: https://lore.kernel.org/r/20220623211420.918875-9-aidanmacdonald.0x0@gmail.com Signed-off-by: Mark Brown --- include/linux/regmap.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/regmap.h b/include/linux/regmap.h index 6489b3610362..791c15ad1f63 100644 --- a/include/linux/regmap.h +++ b/include/linux/regmap.h @@ -1475,6 +1475,7 @@ struct regmap_irq_sub_irq_map { * @wake_base: Base address for wake enables. If zero unsupported. * @type_base: Base address for irq type. If zero unsupported. * @virt_reg_base: Base addresses for extra config regs. + * @config_base: Base address for IRQ type config regs. If null unsupported. * @irq_reg_stride: Stride to use for chips where registers are not contiguous. * @init_ack_masked: Ack all masked interrupts once during initalization. * @mask_invert: Inverted mask register: cleared bits are masked out. @@ -1504,12 +1505,15 @@ struct regmap_irq_sub_irq_map { * @num_type_reg: Number of type registers. * @num_virt_regs: Number of non-standard irq configuration registers. * If zero unsupported. + * @num_config_bases: Number of config base registers. + * @num_config_regs: Number of config registers for each config base register. * @handle_pre_irq: Driver specific callback to handle interrupt from device * before regmap_irq_handler process the interrupts. * @handle_post_irq: Driver specific callback to handle interrupt from device * after handling the interrupts in regmap_irq_handler(). * @set_type_virt: Driver specific callback to extend regmap_irq_set_type() * and configure virt regs. + * @set_type_config: Callback used for configuring irq types. * @irq_drv_data: Driver specific IRQ data which is passed as parameter when * driver specific pre/post interrupt handler is called. * @@ -1532,6 +1536,7 @@ struct regmap_irq_chip { unsigned int wake_base; unsigned int type_base; unsigned int *virt_reg_base; + const unsigned int *config_base; unsigned int irq_reg_stride; unsigned int init_ack_masked:1; unsigned int mask_invert:1; @@ -1553,16 +1558,23 @@ struct regmap_irq_chip { int num_type_reg; int num_virt_regs; + int num_config_bases; + int num_config_regs; int (*handle_pre_irq)(void *irq_drv_data); int (*handle_post_irq)(void *irq_drv_data); int (*set_type_virt)(unsigned int **buf, unsigned int type, unsigned long hwirq, int reg); + int (*set_type_config)(unsigned int **buf, unsigned int type, + const struct regmap_irq *irq_data, int idx); void *irq_drv_data; }; struct regmap_irq_chip_data; +int regmap_irq_set_type_config_simple(unsigned int **buf, unsigned int type, + const struct regmap_irq *irq_data, int idx); + int regmap_add_irq_chip(struct regmap *map, int irq, int irq_flags, int irq_base, const struct regmap_irq_chip *chip, struct regmap_irq_chip_data **data); -- cgit From 9edd4f5aee8470dcfd0db04005908f61fbfae8e0 Mon Sep 17 00:00:00 2001 From: Aidan MacDonald Date: Thu, 23 Jun 2022 22:14:17 +0100 Subject: regmap-irq: Deprecate type registers and virtual registers Config registers can be used to replace both type and virtual registers, so mark both features are deprecated and issue a warning if they're used. Signed-off-by: Aidan MacDonald Link: https://lore.kernel.org/r/20220623211420.918875-10-aidanmacdonald.0x0@gmail.com Signed-off-by: Mark Brown --- include/linux/regmap.h | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/regmap.h b/include/linux/regmap.h index 791c15ad1f63..c9a9bbbb6261 100644 --- a/include/linux/regmap.h +++ b/include/linux/regmap.h @@ -1473,8 +1473,10 @@ struct regmap_irq_sub_irq_map { * @ack_base: Base ack address. If zero then the chip is clear on read. * Using zero value is possible with @use_ack bit. * @wake_base: Base address for wake enables. If zero unsupported. - * @type_base: Base address for irq type. If zero unsupported. - * @virt_reg_base: Base addresses for extra config regs. + * @type_base: Base address for irq type. If zero unsupported. Deprecated, + * use @config_base instead. + * @virt_reg_base: Base addresses for extra config regs. Deprecated, use + * @config_base instead. * @config_base: Base address for IRQ type config regs. If null unsupported. * @irq_reg_stride: Stride to use for chips where registers are not contiguous. * @init_ack_masked: Ack all masked interrupts once during initalization. @@ -1483,7 +1485,8 @@ struct regmap_irq_sub_irq_map { * @ack_invert: Inverted ack register: cleared bits for ack. * @clear_ack: Use this to set 1 and 0 or vice-versa to clear interrupts. * @wake_invert: Inverted wake register: cleared bits are wake enabled. - * @type_invert: Invert the type flags. + * @type_invert: Invert the type flags. Deprecated, use config registers + * instead. * @type_in_mask: Use the mask registers for controlling irq type. Use this if * the hardware provides separate bits for rising/falling edge * or low/high level interrupts and they should be combined into @@ -1502,9 +1505,11 @@ struct regmap_irq_sub_irq_map { * @irqs: Descriptors for individual IRQs. Interrupt numbers are * assigned based on the index in the array of the interrupt. * @num_irqs: Number of descriptors. - * @num_type_reg: Number of type registers. + * @num_type_reg: Number of type registers. Deprecated, use config registers + * instead. * @num_virt_regs: Number of non-standard irq configuration registers. - * If zero unsupported. + * If zero unsupported. Deprecated, use config registers + * instead. * @num_config_bases: Number of config base registers. * @num_config_regs: Number of config registers for each config base register. * @handle_pre_irq: Driver specific callback to handle interrupt from device @@ -1512,7 +1517,8 @@ struct regmap_irq_sub_irq_map { * @handle_post_irq: Driver specific callback to handle interrupt from device * after handling the interrupts in regmap_irq_handler(). * @set_type_virt: Driver specific callback to extend regmap_irq_set_type() - * and configure virt regs. + * and configure virt regs. Deprecated, use @set_type_config + * callback and config registers instead. * @set_type_config: Callback used for configuring irq types. * @irq_drv_data: Driver specific IRQ data which is passed as parameter when * driver specific pre/post interrupt handler is called. -- cgit From e8ffb12e7f065db616a3eba79ff138bececf0825 Mon Sep 17 00:00:00 2001 From: Aidan MacDonald Date: Thu, 23 Jun 2022 22:14:18 +0100 Subject: regmap-irq: Fix inverted handling of unmask registers To me "unmask" suggests that we write 1s to the register when an interrupt is enabled. This also makes sense because it's the opposite of what the "mask" register does (write 1s to disable an interrupt). But regmap-irq does the opposite: for a disabled interrupt, it writes 1s to "unmask" and 0s to "mask". This is surprising and deviates from the usual way mask registers are handled. Additionally, mask_invert didn't interact with unmask registers properly -- it caused them to be ignored entirely. Fix this by making mask and unmask registers orthogonal, using the following behavior: * Mask registers are written with 1s for disabled interrupts. * Unmask registers are written with 1s for enabled interrupts. This behavior supports both normal or inverted mask registers and separate set/clear registers via different combinations of mask_base/unmask_base. The old unmask register behavior is deprecated. Drivers need to opt-in to the new behavior by setting mask_unmask_non_inverted. Warnings are issued if the driver relies on deprecated behavior. Chips that only set one of mask_base/unmask_base don't have to use the mask_unmask_non_inverted flag because that use case was previously not supported. The mask_invert flag is also deprecated in favor of describing inverted mask registers as unmask registers. Signed-off-by: Aidan MacDonald Link: https://lore.kernel.org/r/20220623211420.918875-11-aidanmacdonald.0x0@gmail.com Signed-off-by: Mark Brown --- include/linux/regmap.h | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/regmap.h b/include/linux/regmap.h index c9a9bbbb6261..c9c9e39750ed 100644 --- a/include/linux/regmap.h +++ b/include/linux/regmap.h @@ -1467,9 +1467,10 @@ struct regmap_irq_sub_irq_map { * main_status set. * * @status_base: Base status register address. - * @mask_base: Base mask register address. - * @unmask_base: Base unmask register address. for chips who have - * separate mask and unmask registers + * @mask_base: Base mask register address. Mask bits are set to 1 when an + * interrupt is masked, 0 when unmasked. + * @unmask_base: Base unmask register address. Unmask bits are set to 1 when + * an interrupt is unmasked and 0 when masked. * @ack_base: Base ack address. If zero then the chip is clear on read. * Using zero value is possible with @use_ack bit. * @wake_base: Base address for wake enables. If zero unsupported. @@ -1481,6 +1482,16 @@ struct regmap_irq_sub_irq_map { * @irq_reg_stride: Stride to use for chips where registers are not contiguous. * @init_ack_masked: Ack all masked interrupts once during initalization. * @mask_invert: Inverted mask register: cleared bits are masked out. + * Deprecated; prefer describing an inverted mask register as + * an unmask register. + * @mask_unmask_non_inverted: Controls mask bit inversion for chips that set + * both @mask_base and @unmask_base. If false, mask and unmask bits are + * inverted (which is deprecated behavior); if true, bits will not be + * inverted and the registers keep their normal behavior. Note that if + * you use only one of @mask_base or @unmask_base, this flag has no + * effect and is unnecessary. Any new drivers that set both @mask_base + * and @unmask_base should set this to true to avoid relying on the + * deprecated behavior. * @use_ack: Use @ack register even if it is zero. * @ack_invert: Inverted ack register: cleared bits for ack. * @clear_ack: Use this to set 1 and 0 or vice-versa to clear interrupts. @@ -1546,6 +1557,7 @@ struct regmap_irq_chip { unsigned int irq_reg_stride; unsigned int init_ack_masked:1; unsigned int mask_invert:1; + unsigned int mask_unmask_non_inverted:1; unsigned int use_ack:1; unsigned int ack_invert:1; unsigned int clear_ack:1; -- cgit From bdf9b86cd3adbbcf590ab82b74ab8554534c9b6e Mon Sep 17 00:00:00 2001 From: Aidan MacDonald Date: Thu, 23 Jun 2022 22:14:19 +0100 Subject: regmap-irq: Add get_irq_reg() callback Replace the internal sub_irq_reg() function with a public callback that drivers can use when they have more complex register layouts. The default implementation is regmap_irq_get_irq_reg_linear(), used if the chip doesn't provide its own callback. Signed-off-by: Aidan MacDonald Link: https://lore.kernel.org/r/20220623211420.918875-12-aidanmacdonald.0x0@gmail.com Signed-off-by: Mark Brown --- include/linux/regmap.h | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/regmap.h b/include/linux/regmap.h index c9c9e39750ed..0d240922d4da 100644 --- a/include/linux/regmap.h +++ b/include/linux/regmap.h @@ -1440,6 +1440,8 @@ struct regmap_irq_sub_irq_map { unsigned int *offset; }; +struct regmap_irq_chip_data; + /** * struct regmap_irq_chip - Description of a generic regmap irq_chip. * @@ -1531,6 +1533,13 @@ struct regmap_irq_sub_irq_map { * and configure virt regs. Deprecated, use @set_type_config * callback and config registers instead. * @set_type_config: Callback used for configuring irq types. + * @get_irq_reg: Callback for mapping (base register, index) pairs to register + * addresses. The base register will be one of @status_base, + * @mask_base, etc., @main_status, or any of @config_base. + * The index will be in the range [0, num_main_regs[ for the + * main status base, [0, num_type_settings[ for any config + * register base, and [0, num_regs[ for any other base. + * If unspecified then regmap_irq_get_irq_reg_linear() is used. * @irq_drv_data: Driver specific IRQ data which is passed as parameter when * driver specific pre/post interrupt handler is called. * @@ -1585,11 +1594,13 @@ struct regmap_irq_chip { unsigned long hwirq, int reg); int (*set_type_config)(unsigned int **buf, unsigned int type, const struct regmap_irq *irq_data, int idx); + unsigned int (*get_irq_reg)(struct regmap_irq_chip_data *data, + unsigned int base, int index); void *irq_drv_data; }; -struct regmap_irq_chip_data; - +unsigned int regmap_irq_get_irq_reg_linear(struct regmap_irq_chip_data *data, + unsigned int base, int index); int regmap_irq_set_type_config_simple(unsigned int **buf, unsigned int type, const struct regmap_irq *irq_data, int idx); -- cgit From 48e014ee9a61e8f4700987b82f7cb1dc3c89fa76 Mon Sep 17 00:00:00 2001 From: Aidan MacDonald Date: Thu, 23 Jun 2022 22:14:20 +0100 Subject: regmap-irq: Deprecate the not_fixed_stride flag This flag is a bit of a hack and the same thing can be accomplished using a custom ->get_irq_reg() callback. Add a warning to catch any use of the flag. Signed-off-by: Aidan MacDonald Link: https://lore.kernel.org/r/20220623211420.918875-13-aidanmacdonald.0x0@gmail.com Signed-off-by: Mark Brown --- include/linux/regmap.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/regmap.h b/include/linux/regmap.h index 0d240922d4da..7cf2157134ac 100644 --- a/include/linux/regmap.h +++ b/include/linux/regmap.h @@ -1509,8 +1509,10 @@ struct regmap_irq_chip_data; * registers before unmasking interrupts to clear any bits * set when they were masked. * @not_fixed_stride: Used when chip peripherals are not laid out with fixed - * stride. Must be used with sub_reg_offsets containing the - * offsets to each peripheral. + * stride. Must be used with sub_reg_offsets containing the + * offsets to each peripheral. Deprecated; the same thing + * can be accomplished with a @get_irq_reg callback, without + * the need for a @sub_reg_offsets table. * @status_invert: Inverted status register: cleared bits are active interrupts. * @runtime_pm: Hold a runtime PM lock on the device when accessing it. * -- cgit From ae92b1c84306a68c361d90b46ad3acef99c10e29 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Tue, 28 Jun 2022 10:46:24 +0100 Subject: usb: typec_altmode: add a missing "@" at a kernel-doc parameter Without that, the parameter is not properly parsed: include/linux/usb/typec_altmode.h:132: warning: Function parameter or member 'altmode' not described in 'typec_altmode_get_orientation' Signed-off-by: Mauro Carvalho Chehab Acked-by: Heikki Krogerus Link: https://lore.kernel.org/r/70dc4c5d744cf1fe9a0efe6b85deaa0489628282.1656409369.git.mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/typec_altmode.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/usb/typec_altmode.h b/include/linux/usb/typec_altmode.h index 65933cbe9129..350d49012659 100644 --- a/include/linux/usb/typec_altmode.h +++ b/include/linux/usb/typec_altmode.h @@ -124,7 +124,7 @@ struct typec_altmode *typec_match_altmode(struct typec_altmode **altmodes, /** * typec_altmode_get_orientation - Get cable plug orientation - * altmode: Handle to the alternate mode + * @altmode: Handle to the alternate mode */ static inline enum typec_orientation typec_altmode_get_orientation(struct typec_altmode *altmode) -- cgit From b5d281f6c16dd432b618bdfd36ddba1a58d5b603 Mon Sep 17 00:00:00 2001 From: Christian Marangi Date: Mon, 20 Jun 2022 00:03:51 +0200 Subject: PM / devfreq: Rework freq_table to be local to devfreq struct On a devfreq PROBE_DEFER, the freq_table in the driver profile struct, is never reset and may be leaved in an undefined state. This comes from the fact that we store the freq_table in the driver profile struct that is commonly defined as static and not reset on PROBE_DEFER. We currently skip the reinit of the freq_table if we found it's already defined since a driver may declare his own freq_table. This logic is flawed in the case devfreq core generate a freq_table, set it in the profile struct and then PROBE_DEFER, freeing the freq_table. In this case devfreq will found a NOT NULL freq_table that has been freed, skip the freq_table generation and probe the driver based on the wrong table. To fix this and correctly handle PROBE_DEFER, use a local freq_table and max_state in the devfreq struct and never modify the freq_table present in the profile struct if it does provide it. Fixes: 0ec09ac2cebe ("PM / devfreq: Set the freq_table of devfreq device") Cc: stable@vger.kernel.org Signed-off-by: Christian Marangi Signed-off-by: Chanwoo Choi --- include/linux/devfreq.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h index dc10bee75a72..34aab4dd336c 100644 --- a/include/linux/devfreq.h +++ b/include/linux/devfreq.h @@ -148,6 +148,8 @@ struct devfreq_stats { * reevaluate operable frequencies. Devfreq users may use * devfreq.nb to the corresponding register notifier call chain. * @work: delayed work for load monitoring. + * @freq_table: current frequency table used by the devfreq driver. + * @max_state: count of entry present in the frequency table. * @previous_freq: previously configured frequency value. * @last_status: devfreq user device info, performance statistics * @data: Private data of the governor. The devfreq framework does not @@ -185,6 +187,9 @@ struct devfreq { struct notifier_block nb; struct delayed_work work; + unsigned long *freq_table; + unsigned int max_state; + unsigned long previous_freq; struct devfreq_dev_status last_status; -- cgit From af3f4134006bf3bf894179c08aaf98ed5938cf4e Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Tue, 28 Jun 2022 10:43:04 -0700 Subject: bpf: add bpf_func_t and trampoline helpers I'll be adding lsm cgroup specific helpers that grab trampoline mutex. No functional changes. Reviewed-by: Martin KaFai Lau Signed-off-by: Stanislav Fomichev Link: https://lore.kernel.org/r/20220628174314.1216643-2-sdf@google.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index d05e1495a06e..d547be9db75f 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -56,6 +56,8 @@ typedef u64 (*bpf_callback_t)(u64, u64, u64, u64, u64); typedef int (*bpf_iter_init_seq_priv_t)(void *private_data, struct bpf_iter_aux_info *aux); typedef void (*bpf_iter_fini_seq_priv_t)(void *private_data); +typedef unsigned int (*bpf_func_t)(const void *, + const struct bpf_insn *); struct bpf_iter_seq_info { const struct seq_operations *seq_ops; bpf_iter_init_seq_priv_t init_seq_private; @@ -879,8 +881,7 @@ struct bpf_dispatcher { static __always_inline __nocfi unsigned int bpf_dispatcher_nop_func( const void *ctx, const struct bpf_insn *insnsi, - unsigned int (*bpf_func)(const void *, - const struct bpf_insn *)) + bpf_func_t bpf_func) { return bpf_func(ctx, insnsi); } @@ -909,8 +910,7 @@ int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs); noinline __nocfi unsigned int bpf_dispatcher_##name##_func( \ const void *ctx, \ const struct bpf_insn *insnsi, \ - unsigned int (*bpf_func)(const void *, \ - const struct bpf_insn *)) \ + bpf_func_t bpf_func) \ { \ return bpf_func(ctx, insnsi); \ } \ @@ -921,8 +921,7 @@ int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs); unsigned int bpf_dispatcher_##name##_func( \ const void *ctx, \ const struct bpf_insn *insnsi, \ - unsigned int (*bpf_func)(const void *, \ - const struct bpf_insn *)); \ + bpf_func_t bpf_func); \ extern struct bpf_dispatcher bpf_dispatcher_##name; #define BPF_DISPATCHER_FUNC(name) bpf_dispatcher_##name##_func #define BPF_DISPATCHER_PTR(name) (&bpf_dispatcher_##name) -- cgit From 00442143a2ab7f1da46fbf4d2a99c85df767d49a Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Tue, 28 Jun 2022 10:43:05 -0700 Subject: bpf: convert cgroup_bpf.progs to hlist This lets us reclaim some space to be used by new cgroup lsm slots. Before: struct cgroup_bpf { struct bpf_prog_array * effective[23]; /* 0 184 */ /* --- cacheline 2 boundary (128 bytes) was 56 bytes ago --- */ struct list_head progs[23]; /* 184 368 */ /* --- cacheline 8 boundary (512 bytes) was 40 bytes ago --- */ u32 flags[23]; /* 552 92 */ /* XXX 4 bytes hole, try to pack */ /* --- cacheline 10 boundary (640 bytes) was 8 bytes ago --- */ struct list_head storages; /* 648 16 */ struct bpf_prog_array * inactive; /* 664 8 */ struct percpu_ref refcnt; /* 672 16 */ struct work_struct release_work; /* 688 32 */ /* size: 720, cachelines: 12, members: 7 */ /* sum members: 716, holes: 1, sum holes: 4 */ /* last cacheline: 16 bytes */ }; After: struct cgroup_bpf { struct bpf_prog_array * effective[23]; /* 0 184 */ /* --- cacheline 2 boundary (128 bytes) was 56 bytes ago --- */ struct hlist_head progs[23]; /* 184 184 */ /* --- cacheline 5 boundary (320 bytes) was 48 bytes ago --- */ u8 flags[23]; /* 368 23 */ /* XXX 1 byte hole, try to pack */ /* --- cacheline 6 boundary (384 bytes) was 8 bytes ago --- */ struct list_head storages; /* 392 16 */ struct bpf_prog_array * inactive; /* 408 8 */ struct percpu_ref refcnt; /* 416 16 */ struct work_struct release_work; /* 432 72 */ /* size: 504, cachelines: 8, members: 7 */ /* sum members: 503, holes: 1, sum holes: 1 */ /* last cacheline: 56 bytes */ }; Suggested-by: Jakub Sitnicki Reviewed-by: Jakub Sitnicki Reviewed-by: Martin KaFai Lau Signed-off-by: Stanislav Fomichev Link: https://lore.kernel.org/r/20220628174314.1216643-3-sdf@google.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf-cgroup-defs.h | 4 ++-- include/linux/bpf-cgroup.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h index 695d1224a71b..5d268e76d8e6 100644 --- a/include/linux/bpf-cgroup-defs.h +++ b/include/linux/bpf-cgroup-defs.h @@ -47,8 +47,8 @@ struct cgroup_bpf { * have either zero or one element * when BPF_F_ALLOW_MULTI the list can have up to BPF_CGROUP_MAX_PROGS */ - struct list_head progs[MAX_CGROUP_BPF_ATTACH_TYPE]; - u32 flags[MAX_CGROUP_BPF_ATTACH_TYPE]; + struct hlist_head progs[MAX_CGROUP_BPF_ATTACH_TYPE]; + u8 flags[MAX_CGROUP_BPF_ATTACH_TYPE]; /* list of cgroup shared storages */ struct list_head storages; diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 669d96d074ad..6673acfbf2ef 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -95,7 +95,7 @@ struct bpf_cgroup_link { }; struct bpf_prog_list { - struct list_head node; + struct hlist_node node; struct bpf_prog *prog; struct bpf_cgroup_link *link; struct bpf_cgroup_storage *storage[MAX_BPF_CGROUP_STORAGE_TYPE]; -- cgit From 69fd337a975c7e690dfe49d9cb4fe5ba1e6db44e Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Tue, 28 Jun 2022 10:43:06 -0700 Subject: bpf: per-cgroup lsm flavor Allow attaching to lsm hooks in the cgroup context. Attaching to per-cgroup LSM works exactly like attaching to other per-cgroup hooks. New BPF_LSM_CGROUP is added to trigger new mode; the actual lsm hook we attach to is signaled via existing attach_btf_id. For the hooks that have 'struct socket' or 'struct sock' as its first argument, we use the cgroup associated with that socket. For the rest, we use 'current' cgroup (this is all on default hierarchy == v2 only). Note that for some hooks that work on 'struct sock' we still take the cgroup from 'current' because some of them work on the socket that hasn't been properly initialized yet. Behind the scenes, we allocate a shim program that is attached to the trampoline and runs cgroup effective BPF programs array. This shim has some rudimentary ref counting and can be shared between several programs attaching to the same lsm hook from different cgroups. Note that this patch bloats cgroup size because we add 211 cgroup_bpf_attach_type(s) for simplicity sake. This will be addressed in the subsequent patch. Also note that we only add non-sleepable flavor for now. To enable sleepable use-cases, bpf_prog_run_array_cg has to grab trace rcu, shim programs have to be freed via trace rcu, cgroup_bpf.effective should be also trace-rcu-managed + maybe some other changes that I'm not aware of. Reviewed-by: Martin KaFai Lau Signed-off-by: Stanislav Fomichev Link: https://lore.kernel.org/r/20220628174314.1216643-4-sdf@google.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf-cgroup-defs.h | 8 ++++++++ include/linux/bpf-cgroup.h | 7 +++++++ include/linux/bpf.h | 24 ++++++++++++++++++++++++ include/linux/bpf_lsm.h | 13 +++++++++++++ include/linux/btf_ids.h | 3 ++- 5 files changed, 54 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h index 5d268e76d8e6..b99f8c3e37ea 100644 --- a/include/linux/bpf-cgroup-defs.h +++ b/include/linux/bpf-cgroup-defs.h @@ -10,6 +10,12 @@ struct bpf_prog_array; +#ifdef CONFIG_BPF_LSM +#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */ +#else +#define CGROUP_LSM_NUM 0 +#endif + enum cgroup_bpf_attach_type { CGROUP_BPF_ATTACH_TYPE_INVALID = -1, CGROUP_INET_INGRESS = 0, @@ -35,6 +41,8 @@ enum cgroup_bpf_attach_type { CGROUP_INET4_GETSOCKNAME, CGROUP_INET6_GETSOCKNAME, CGROUP_INET_SOCK_RELEASE, + CGROUP_LSM_START, + CGROUP_LSM_END = CGROUP_LSM_START + CGROUP_LSM_NUM - 1, MAX_CGROUP_BPF_ATTACH_TYPE }; diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 6673acfbf2ef..2bd1b5f8de9b 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -23,6 +23,13 @@ struct ctl_table; struct ctl_table_header; struct task_struct; +unsigned int __cgroup_bpf_run_lsm_sock(const void *ctx, + const struct bpf_insn *insn); +unsigned int __cgroup_bpf_run_lsm_socket(const void *ctx, + const struct bpf_insn *insn); +unsigned int __cgroup_bpf_run_lsm_current(const void *ctx, + const struct bpf_insn *insn); + #ifdef CONFIG_CGROUP_BPF #define CGROUP_ATYPE(type) \ diff --git a/include/linux/bpf.h b/include/linux/bpf.h index d547be9db75f..77cd613a00bd 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -794,6 +794,10 @@ void notrace __bpf_prog_exit(struct bpf_prog *prog, u64 start, struct bpf_tramp_ u64 notrace __bpf_prog_enter_sleepable(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx); void notrace __bpf_prog_exit_sleepable(struct bpf_prog *prog, u64 start, struct bpf_tramp_run_ctx *run_ctx); +u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog, + struct bpf_tramp_run_ctx *run_ctx); +void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start, + struct bpf_tramp_run_ctx *run_ctx); void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr); void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr); @@ -1060,6 +1064,7 @@ struct bpf_prog_aux { struct user_struct *user; u64 load_time; /* ns since boottime */ u32 verified_insns; + int cgroup_atype; /* enum cgroup_bpf_attach_type */ struct bpf_map *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE]; char name[BPF_OBJ_NAME_LEN]; #ifdef CONFIG_SECURITY @@ -1167,6 +1172,11 @@ struct bpf_tramp_link { u64 cookie; }; +struct bpf_shim_tramp_link { + struct bpf_tramp_link link; + struct bpf_trampoline *trampoline; +}; + struct bpf_tracing_link { struct bpf_tramp_link link; enum bpf_attach_type attach_type; @@ -1245,6 +1255,9 @@ struct bpf_dummy_ops { int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr); #endif +int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog, + int cgroup_atype); +void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog); #else static inline const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id) { @@ -1268,6 +1281,14 @@ static inline int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, { return -EINVAL; } +static inline int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog, + int cgroup_atype) +{ + return -EOPNOTSUPP; +} +static inline void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog) +{ +} #endif struct bpf_array { @@ -2368,6 +2389,8 @@ extern const struct bpf_func_proto bpf_sk_getsockopt_proto; extern const struct bpf_func_proto bpf_find_vma_proto; extern const struct bpf_func_proto bpf_loop_proto; extern const struct bpf_func_proto bpf_copy_from_user_task_proto; +extern const struct bpf_func_proto bpf_set_retval_proto; +extern const struct bpf_func_proto bpf_get_retval_proto; const struct bpf_func_proto *tracing_prog_func_proto( enum bpf_func_id func_id, const struct bpf_prog *prog); @@ -2485,6 +2508,7 @@ int bpf_arch_text_invalidate(void *dst, size_t len); struct btf_id_set; bool btf_id_set_contains(const struct btf_id_set *set, u32 id); +int btf_id_set_index(const struct btf_id_set *set, u32 id); #define MAX_BPRINTF_VARARGS 12 diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h index 479c101546ad..61787a5f6af9 100644 --- a/include/linux/bpf_lsm.h +++ b/include/linux/bpf_lsm.h @@ -42,6 +42,9 @@ extern const struct bpf_func_proto bpf_inode_storage_get_proto; extern const struct bpf_func_proto bpf_inode_storage_delete_proto; void bpf_inode_storage_free(struct inode *inode); +int bpf_lsm_hook_idx(u32 btf_id); +void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func); + #else /* !CONFIG_BPF_LSM */ static inline bool bpf_lsm_is_sleepable_hook(u32 btf_id) @@ -65,6 +68,16 @@ static inline void bpf_inode_storage_free(struct inode *inode) { } +static inline void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, + bpf_func_t *bpf_func) +{ +} + +static inline int bpf_lsm_hook_idx(u32 btf_id) +{ + return -EINVAL; +} + #endif /* CONFIG_BPF_LSM */ #endif /* _LINUX_BPF_LSM_H */ diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h index 335a19092368..252a4befeab1 100644 --- a/include/linux/btf_ids.h +++ b/include/linux/btf_ids.h @@ -179,7 +179,8 @@ extern struct btf_id_set name; BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP, udp_sock) \ BTF_SOCK_TYPE(BTF_SOCK_TYPE_UDP6, udp6_sock) \ BTF_SOCK_TYPE(BTF_SOCK_TYPE_UNIX, unix_sock) \ - BTF_SOCK_TYPE(BTF_SOCK_TYPE_MPTCP, mptcp_sock) + BTF_SOCK_TYPE(BTF_SOCK_TYPE_MPTCP, mptcp_sock) \ + BTF_SOCK_TYPE(BTF_SOCK_TYPE_SOCKET, socket) enum { #define BTF_SOCK_TYPE(name, str) name, -- cgit From c0e19f2c9a3edd38e4b1bdae98eb44555d02bc31 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Tue, 28 Jun 2022 10:43:07 -0700 Subject: bpf: minimize number of allocated lsm slots per program Previous patch adds 1:1 mapping between all 211 LSM hooks and bpf_cgroup program array. Instead of reserving a slot per possible hook, reserve 10 slots per cgroup for lsm programs. Those slots are dynamically allocated on demand and reclaimed. struct cgroup_bpf { struct bpf_prog_array * effective[33]; /* 0 264 */ /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */ struct hlist_head progs[33]; /* 264 264 */ /* --- cacheline 8 boundary (512 bytes) was 16 bytes ago --- */ u8 flags[33]; /* 528 33 */ /* XXX 7 bytes hole, try to pack */ struct list_head storages; /* 568 16 */ /* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */ struct bpf_prog_array * inactive; /* 584 8 */ struct percpu_ref refcnt; /* 592 16 */ struct work_struct release_work; /* 608 72 */ /* size: 680, cachelines: 11, members: 7 */ /* sum members: 673, holes: 1, sum holes: 7 */ /* last cacheline: 40 bytes */ }; Reviewed-by: Martin KaFai Lau Signed-off-by: Stanislav Fomichev Link: https://lore.kernel.org/r/20220628174314.1216643-5-sdf@google.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf-cgroup-defs.h | 3 ++- include/linux/bpf.h | 9 ++++++++- include/linux/bpf_lsm.h | 6 ------ 3 files changed, 10 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf-cgroup-defs.h b/include/linux/bpf-cgroup-defs.h index b99f8c3e37ea..7b121bd780eb 100644 --- a/include/linux/bpf-cgroup-defs.h +++ b/include/linux/bpf-cgroup-defs.h @@ -11,7 +11,8 @@ struct bpf_prog_array; #ifdef CONFIG_BPF_LSM -#define CGROUP_LSM_NUM 211 /* will be addressed in the next patch */ +/* Maximum number of concurrently attachable per-cgroup LSM hooks. */ +#define CGROUP_LSM_NUM 10 #else #define CGROUP_LSM_NUM 0 #endif diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 77cd613a00bd..5d2afa55c7c3 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2508,7 +2508,6 @@ int bpf_arch_text_invalidate(void *dst, size_t len); struct btf_id_set; bool btf_id_set_contains(const struct btf_id_set *set, u32 id); -int btf_id_set_index(const struct btf_id_set *set, u32 id); #define MAX_BPRINTF_VARARGS 12 @@ -2545,4 +2544,12 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr); int bpf_dynptr_check_size(u32 size); +#ifdef CONFIG_BPF_LSM +void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype); +void bpf_cgroup_atype_put(int cgroup_atype); +#else +static inline void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype) {} +static inline void bpf_cgroup_atype_put(int cgroup_atype) {} +#endif /* CONFIG_BPF_LSM */ + #endif /* _LINUX_BPF_H */ diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h index 61787a5f6af9..4bcf76a9bb06 100644 --- a/include/linux/bpf_lsm.h +++ b/include/linux/bpf_lsm.h @@ -42,7 +42,6 @@ extern const struct bpf_func_proto bpf_inode_storage_get_proto; extern const struct bpf_func_proto bpf_inode_storage_delete_proto; void bpf_inode_storage_free(struct inode *inode); -int bpf_lsm_hook_idx(u32 btf_id); void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, bpf_func_t *bpf_func); #else /* !CONFIG_BPF_LSM */ @@ -73,11 +72,6 @@ static inline void bpf_lsm_find_cgroup_shim(const struct bpf_prog *prog, { } -static inline int bpf_lsm_hook_idx(u32 btf_id) -{ - return -EINVAL; -} - #endif /* CONFIG_BPF_LSM */ #endif /* _LINUX_BPF_LSM_H */ -- cgit From 9113d7e48e9128522b9f5a54dfd30dff10509a92 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Tue, 28 Jun 2022 10:43:09 -0700 Subject: bpf: expose bpf_{g,s}etsockopt to lsm cgroup I don't see how to make it nice without introducing btf id lists for the hooks where these helpers are allowed. Some LSM hooks work on the locked sockets, some are triggering early and don't grab any locks, so have two lists for now: 1. LSM hooks which trigger under socket lock - minority of the hooks, but ideal case for us, we can expose existing BTF-based helpers 2. LSM hooks which trigger without socket lock, but they trigger early in the socket creation path where it should be safe to do setsockopt without any locks 3. The rest are prohibited. I'm thinking that this use-case might be a good gateway to sleeping lsm cgroup hooks in the future. We can either expose lock/unlock operations (and add tracking to the verifier) or have another set of bpf_setsockopt wrapper that grab the locks and might sleep. Reviewed-by: Martin KaFai Lau Signed-off-by: Stanislav Fomichev Link: https://lore.kernel.org/r/20220628174314.1216643-7-sdf@google.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 5d2afa55c7c3..2b21f2a3452f 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2386,6 +2386,8 @@ extern const struct bpf_func_proto bpf_for_each_map_elem_proto; extern const struct bpf_func_proto bpf_btf_find_by_name_kind_proto; extern const struct bpf_func_proto bpf_sk_setsockopt_proto; extern const struct bpf_func_proto bpf_sk_getsockopt_proto; +extern const struct bpf_func_proto bpf_unlocked_sk_setsockopt_proto; +extern const struct bpf_func_proto bpf_unlocked_sk_getsockopt_proto; extern const struct bpf_func_proto bpf_find_vma_proto; extern const struct bpf_func_proto bpf_loop_proto; extern const struct bpf_func_proto bpf_copy_from_user_task_proto; -- cgit From f163f0302ab69722c052519f4014814bf10026a9 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:21 +0200 Subject: context_tracking: Rename context_tracking_user_enter/exit() to user_enter/exit_callable() context_tracking_user_enter() and context_tracking_user_exit() are ASM callable versions of user_enter() and user_exit() for architectures that didn't manage to check the context tracking static key from ASM. Change those function names to better reflect their purpose. [ frederic: Apply Max Filippov feedback. ] Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Paul E. McKenney Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne --- include/linux/context_tracking.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h index 773035124bad..69532cd18f72 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -19,8 +19,8 @@ extern void __ct_user_exit(enum ctx_state state); extern void context_tracking_enter(enum ctx_state state); extern void context_tracking_exit(enum ctx_state state); -extern void context_tracking_user_enter(void); -extern void context_tracking_user_exit(void); +extern void user_enter_callable(void); +extern void user_exit_callable(void); static inline void user_enter(void) { -- cgit From fe98db1c6d1ad7349e61a5a2766ad64975bc9ae4 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:22 +0200 Subject: context_tracking: Rename context_tracking_enter/exit() to ct_user_enter/exit() context_tracking_enter() and context_tracking_exit() have confusing names that don't explain the fact they are referring to user/guest state. Use more self-explanatory names and shrink to the new context tracking prefix instead. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Paul E. McKenney Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne --- include/linux/context_tracking.h | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h index 69532cd18f72..7a5f04ae1758 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -17,21 +17,22 @@ extern void context_tracking_cpu_set(int cpu); extern void __ct_user_enter(enum ctx_state state); extern void __ct_user_exit(enum ctx_state state); -extern void context_tracking_enter(enum ctx_state state); -extern void context_tracking_exit(enum ctx_state state); +extern void ct_user_enter(enum ctx_state state); +extern void ct_user_exit(enum ctx_state state); + extern void user_enter_callable(void); extern void user_exit_callable(void); static inline void user_enter(void) { if (context_tracking_enabled()) - context_tracking_enter(CONTEXT_USER); + ct_user_enter(CONTEXT_USER); } static inline void user_exit(void) { if (context_tracking_enabled()) - context_tracking_exit(CONTEXT_USER); + ct_user_exit(CONTEXT_USER); } /* Called with interrupts disabled. */ @@ -57,7 +58,7 @@ static inline enum ctx_state exception_enter(void) prev_ctx = this_cpu_read(context_tracking.state); if (prev_ctx != CONTEXT_KERNEL) - context_tracking_exit(prev_ctx); + ct_user_exit(prev_ctx); return prev_ctx; } @@ -67,7 +68,7 @@ static inline void exception_exit(enum ctx_state prev_ctx) if (!IS_ENABLED(CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK) && context_tracking_enabled()) { if (prev_ctx != CONTEXT_KERNEL) - context_tracking_enter(prev_ctx); + ct_user_enter(prev_ctx); } } -- cgit From 2a0aafce963dab996b843f6cdb49237b0ae4a118 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:23 +0200 Subject: context_tracking: Rename context_tracking_cpu_set() to ct_cpu_track_user() context_tracking_cpu_set() is called in order to tell a CPU to track user/kernel transitions. Since context tracking is going to expand in to also track transitions from/to idle/IRQ/NMIs, the scope of this function name becomes too broad and needs to be made more specific. Also shorten the prefix to align with the new namespace. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Paul E. McKenney Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne --- include/linux/context_tracking.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h index 7a5f04ae1758..63259fece7c7 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -11,7 +11,7 @@ #ifdef CONFIG_CONTEXT_TRACKING -extern void context_tracking_cpu_set(int cpu); +extern void ct_cpu_track_user(int cpu); /* Called with interrupts disabled. */ extern void __ct_user_enter(enum ctx_state state); -- cgit From 24a9c54182b3758801b8ca6c8c237cc2ff654732 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:24 +0200 Subject: context_tracking: Split user tracking Kconfig Context tracking is going to be used not only to track user transitions but also idle/IRQs/NMIs. The user tracking part will then become a separate feature. Prepare Kconfig for that. [ frederic: Apply Max Filippov feedback. ] Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Paul E. McKenney Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne --- include/linux/context_tracking.h | 12 ++++++------ include/linux/context_tracking_state.h | 4 ++-- 2 files changed, 8 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h index 63259fece7c7..e35ae66b4794 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -10,7 +10,7 @@ #include -#ifdef CONFIG_CONTEXT_TRACKING +#ifdef CONFIG_CONTEXT_TRACKING_USER extern void ct_cpu_track_user(int cpu); /* Called with interrupts disabled. */ @@ -52,7 +52,7 @@ static inline enum ctx_state exception_enter(void) { enum ctx_state prev_ctx; - if (IS_ENABLED(CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK) || + if (IS_ENABLED(CONFIG_HAVE_CONTEXT_TRACKING_USER_OFFSTACK) || !context_tracking_enabled()) return 0; @@ -65,7 +65,7 @@ static inline enum ctx_state exception_enter(void) static inline void exception_exit(enum ctx_state prev_ctx) { - if (!IS_ENABLED(CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK) && + if (!IS_ENABLED(CONFIG_HAVE_CONTEXT_TRACKING_USER_OFFSTACK) && context_tracking_enabled()) { if (prev_ctx != CONTEXT_KERNEL) ct_user_enter(prev_ctx); @@ -109,14 +109,14 @@ static inline enum ctx_state ct_state(void) { return CONTEXT_DISABLED; } static __always_inline bool context_tracking_guest_enter(void) { return false; } static inline void context_tracking_guest_exit(void) { } -#endif /* !CONFIG_CONTEXT_TRACKING */ +#endif /* !CONFIG_CONTEXT_TRACKING_USER */ #define CT_WARN_ON(cond) WARN_ON(context_tracking_enabled() && (cond)) -#ifdef CONFIG_CONTEXT_TRACKING_FORCE +#ifdef CONFIG_CONTEXT_TRACKING_USER_FORCE extern void context_tracking_init(void); #else static inline void context_tracking_init(void) { } -#endif /* CONFIG_CONTEXT_TRACKING_FORCE */ +#endif /* CONFIG_CONTEXT_TRACKING_USER_FORCE */ #endif diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h index edc7b46376a6..2b46afe105a9 100644 --- a/include/linux/context_tracking_state.h +++ b/include/linux/context_tracking_state.h @@ -22,7 +22,7 @@ struct context_tracking { } state; }; -#ifdef CONFIG_CONTEXT_TRACKING +#ifdef CONFIG_CONTEXT_TRACKING_USER extern struct static_key_false context_tracking_key; DECLARE_PER_CPU(struct context_tracking, context_tracking); @@ -45,6 +45,6 @@ static inline bool context_tracking_enabled_this_cpu(void) static __always_inline bool context_tracking_enabled(void) { return false; } static __always_inline bool context_tracking_enabled_cpu(int cpu) { return false; } static __always_inline bool context_tracking_enabled_this_cpu(void) { return false; } -#endif /* CONFIG_CONTEXT_TRACKING */ +#endif /* CONFIG_CONTEXT_TRACKING_USER */ #endif -- cgit From 5f278dbd540b7548bc5193552e6d478255c14c2d Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Tue, 28 Jun 2022 12:10:15 -0700 Subject: iosys-map: Add per-word read MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Instead of always falling back to memcpy_fromio() for any size, prefer using read{b,w,l}(). When reading struct members it's common to read individual integer variables individually. Going through memcpy_fromio() for each of them poses a high penalty. Employ a similar trick as __seqprop() by using _Generic() to generate only the specific call based on a type-compatible variable. For a pariticular i915 workload producing GPU context switches, __get_engine_usage_record() is particularly hot since the engine usage is read from device local memory with dgfx, possibly multiple times since it's racy. Test execution time for this test shows a ~12.5% improvement with DG2: Before: nrepeats = 1000; min = 7.63243e+06; max = 1.01817e+07; median = 9.52548e+06; var = 526149; After: nrepeats = 1000; min = 7.03402e+06; max = 8.8832e+06; median = 8.33955e+06; var = 333113; Other things attempted that didn't prove very useful: 1) Change the _Generic() on x86 to just dereference the memory address 2) Change __get_engine_usage_record() to do just 1 read per loop, comparing with the previous value read 3) Change __get_engine_usage_record() to access the fields directly as it was before the conversion to iosys-map (3) did gave a small improvement (~3%), but doesn't seem to scale well to other similar cases in the driver. Additional test by Chris Wilson using gem_create from igt with some changes to track object creation time. This happens to accidentally stress this code path: Pre iosys_map conversion of engine busyness: lmem0: Creating 262144 4KiB objects took 59274.2ms Unpatched: lmem0: Creating 262144 4KiB objects took 108830.2ms With readl (this patch): lmem0: Creating 262144 4KiB objects took 61348.6ms s/readl/READ_ONCE/ lmem0: Creating 262144 4KiB objects took 61333.2ms So we do take a little bit more time than before the conversion, but that is due to other factors: bringing the READ_ONCE back would be as good as just doing this conversion. v2: - Remove default from _Generic() - callers wanting to read more than u64 should use iosys_map_memcpy_from() - Add READ_ONCE() cases dereferencing the pointer when using system memory v3: - Fix precedence issue when casting inside READ_ONCE(). By not using () around vaddr__ the offset was not part of the cast, but rather added to it, producing a wrong address - Remove compiletime_assert() as READ_ONCE() already contains it Signed-off-by: Lucas De Marchi Reviewed-by: Christian König # v1 Reviewed-by: Thomas Zimmermann Link: https://patchwork.freedesktop.org/patch/msgid/20220628191016.3899428-1-lucas.demarchi@intel.com --- include/linux/iosys-map.h | 42 +++++++++++++++++++++++++++++++++--------- 1 file changed, 33 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h index 4b8406ee8bc4..d19977e011ae 100644 --- a/include/linux/iosys-map.h +++ b/include/linux/iosys-map.h @@ -6,6 +6,7 @@ #ifndef __IOSYS_MAP_H__ #define __IOSYS_MAP_H__ +#include #include #include @@ -333,6 +334,23 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, memset(dst->vaddr + offset, value, len); } +#ifdef CONFIG_64BIT +#define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_) \ + u64: val_ = readq(vaddr_iomem_) +#else +#define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_) \ + u64: memcpy_fromio(&(val_), vaddr_iomem_, sizeof(u64)) +#endif + +#define __iosys_map_rd_io(val__, vaddr_iomem__, type__) _Generic(val__, \ + u8: val__ = readb(vaddr_iomem__), \ + u16: val__ = readw(vaddr_iomem__), \ + u32: val__ = readl(vaddr_iomem__), \ + __iosys_map_rd_io_u64_case(val__, vaddr_iomem__)) + +#define __iosys_map_rd_sys(val__, vaddr__, type__) \ + val__ = READ_ONCE(*(type__ *)(vaddr__)) + /** * iosys_map_rd - Read a C-type value from the iosys_map * @@ -340,16 +358,21 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, * @offset__: The offset from which to read * @type__: Type of the value being read * - * Read a C type value from iosys_map, handling possible un-aligned accesses to - * the mapping. + * Read a C type value (u8, u16, u32 and u64) from iosys_map. For other types or + * if pointer may be unaligned (and problematic for the architecture supported), + * use iosys_map_memcpy_from(). * * Returns: * The value read from the mapping. */ -#define iosys_map_rd(map__, offset__, type__) ({ \ - type__ val; \ - iosys_map_memcpy_from(&val, map__, offset__, sizeof(val)); \ - val; \ +#define iosys_map_rd(map__, offset__, type__) ({ \ + type__ val; \ + if ((map__)->is_iomem) { \ + __iosys_map_rd_io(val, (map__)->vaddr_iomem + (offset__), type__);\ + } else { \ + __iosys_map_rd_sys(val, (map__)->vaddr + (offset__), type__); \ + } \ + val; \ }) /** @@ -379,9 +402,10 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, * * Read a value from iosys_map considering its layout is described by a C struct * starting at @struct_offset__. The field offset and size is calculated and its - * value read handling possible un-aligned memory accesses. For example: suppose - * there is a @struct foo defined as below and the value ``foo.field2.inner2`` - * needs to be read from the iosys_map: + * value read. If the field access would incur in un-aligned access, then either + * iosys_map_memcpy_from() needs to be used or the architecture must support it. + * For example: suppose there is a @struct foo defined as below and the value + * ``foo.field2.inner2`` needs to be read from the iosys_map: * * .. code-block:: c * -- cgit From 6fb5ee7cec06266a29f25ecc01a23b9d107f64e1 Mon Sep 17 00:00:00 2001 From: Lucas De Marchi Date: Tue, 28 Jun 2022 12:10:16 -0700 Subject: iosys-map: Add per-word write MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Like was done for read, provide the equivalent for write. Even if current users are not in the hot path, this should future-proof it. v2: - Remove default from _Generic() - callers wanting to write more than u64 should use iosys_map_memcpy_to() - Add WRITE_ONCE() cases dereferencing the pointer when using system memory v3: - Fix precedence issue when casting inside WRITE_ONCE(). By not using () around vaddr__ the offset was not part of the cast, but rather added to it, producing a wrong address - Remove compiletime_assert() as WRITE_ONCE() already contains it Signed-off-by: Lucas De Marchi Reviewed-by: Reviewed-by: Christian König # v1 Reviewed-by: Thomas Zimmermann Link: https://patchwork.freedesktop.org/patch/msgid/20220628191016.3899428-2-lucas.demarchi@intel.com --- include/linux/iosys-map.h | 38 +++++++++++++++++++++++++++++--------- 1 file changed, 29 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h index d19977e011ae..a533cae189d7 100644 --- a/include/linux/iosys-map.h +++ b/include/linux/iosys-map.h @@ -337,9 +337,13 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, #ifdef CONFIG_64BIT #define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_) \ u64: val_ = readq(vaddr_iomem_) +#define __iosys_map_wr_io_u64_case(val_, vaddr_iomem_) \ + u64: writeq(val_, vaddr_iomem_) #else #define __iosys_map_rd_io_u64_case(val_, vaddr_iomem_) \ u64: memcpy_fromio(&(val_), vaddr_iomem_, sizeof(u64)) +#define __iosys_map_wr_io_u64_case(val_, vaddr_iomem_) \ + u64: memcpy_toio(vaddr_iomem_, &(val_), sizeof(u64)) #endif #define __iosys_map_rd_io(val__, vaddr_iomem__, type__) _Generic(val__, \ @@ -351,6 +355,15 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, #define __iosys_map_rd_sys(val__, vaddr__, type__) \ val__ = READ_ONCE(*(type__ *)(vaddr__)) +#define __iosys_map_wr_io(val__, vaddr_iomem__, type__) _Generic(val__, \ + u8: writeb(val__, vaddr_iomem__), \ + u16: writew(val__, vaddr_iomem__), \ + u32: writel(val__, vaddr_iomem__), \ + __iosys_map_wr_io_u64_case(val__, vaddr_iomem__)) + +#define __iosys_map_wr_sys(val__, vaddr__, type__) \ + WRITE_ONCE(*(type__ *)(vaddr__), val__) + /** * iosys_map_rd - Read a C-type value from the iosys_map * @@ -383,12 +396,17 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, * @type__: Type of the value being written * @val__: Value to write * - * Write a C-type value to the iosys_map, handling possible un-aligned accesses - * to the mapping. + * Write a C type value (u8, u16, u32 and u64) to the iosys_map. For other types + * or if pointer may be unaligned (and problematic for the architecture + * supported), use iosys_map_memcpy_to() */ -#define iosys_map_wr(map__, offset__, type__, val__) ({ \ - type__ val = (val__); \ - iosys_map_memcpy_to(map__, offset__, &val, sizeof(val)); \ +#define iosys_map_wr(map__, offset__, type__, val__) ({ \ + type__ val = (val__); \ + if ((map__)->is_iomem) { \ + __iosys_map_wr_io(val, (map__)->vaddr_iomem + (offset__), type__);\ + } else { \ + __iosys_map_wr_sys(val, (map__)->vaddr + (offset__), type__); \ + } \ }) /** @@ -469,10 +487,12 @@ static inline void iosys_map_memset(struct iosys_map *dst, size_t offset, * @field__: Member of the struct to read * @val__: Value to write * - * Write a value to the iosys_map considering its layout is described by a C struct - * starting at @struct_offset__. The field offset and size is calculated and the - * @val__ is written handling possible un-aligned memory accesses. Refer to - * iosys_map_rd_field() for expected usage and memory layout. + * Write a value to the iosys_map considering its layout is described by a C + * struct starting at @struct_offset__. The field offset and size is calculated + * and the @val__ is written. If the field access would incur in un-aligned + * access, then either iosys_map_memcpy_to() needs to be used or the + * architecture must support it. Refer to iosys_map_rd_field() for expected + * usage and memory layout. */ #define iosys_map_wr_field(map__, struct_offset__, struct_type__, field__, val__) ({ \ struct_type__ *s; \ -- cgit From 1758bde2e4aa5ff188d53e7d9d388bbb7e12eebb Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Tue, 28 Jun 2022 12:15:08 +0200 Subject: net: phy: Don't trigger state machine while in suspend Upon system sleep, mdio_bus_phy_suspend() stops the phy_state_machine(), but subsequent interrupts may retrigger it: They may have been left enabled to facilitate wakeup and are not quiesced until the ->suspend_noirq() phase. Unwanted interrupts may hence occur between mdio_bus_phy_suspend() and dpm_suspend_noirq(), as well as between dpm_resume_noirq() and mdio_bus_phy_resume(). Retriggering the phy_state_machine() through an interrupt is not only undesirable for the reason given in mdio_bus_phy_suspend() (freezing it midway with phydev->lock held), but also because the PHY may be inaccessible after it's suspended: Accesses to USB-attached PHYs are blocked once usb_suspend_both() clears the can_submit flag and PHYs on PCI network cards may become inaccessible upon suspend as well. Amend phy_interrupt() to avoid triggering the state machine if the PHY is suspended. Signal wakeup instead if the attached net_device or its parent has been configured as a wakeup source. (Those conditions are identical to mdio_bus_phy_may_suspend().) Postpone handling of the interrupt until the PHY has resumed. Before stopping the phy_state_machine() in mdio_bus_phy_suspend(), wait for a concurrent phy_interrupt() to run to completion. That is necessary because phy_interrupt() may have checked the PHY's suspend status before the system sleep transition commenced and it may thus retrigger the state machine after it was stopped. Likewise, after re-enabling interrupt handling in mdio_bus_phy_resume(), wait for a concurrent phy_interrupt() to complete to ensure that interrupts which it postponed are properly rerun. The issue was exposed by commit 1ce8b37241ed ("usbnet: smsc95xx: Forward PHY interrupts to PHY driver to avoid polling"), but has existed since forever. Fixes: 541cd3ee00a4 ("phylib: Fix deadlock on resume") Link: https://lore.kernel.org/netdev/a5315a8a-32c2-962f-f696-de9a26d30091@samsung.com/ Reported-by: Marek Szyprowski Tested-by: Marek Szyprowski Signed-off-by: Lukas Wunner Acked-by: Rafael J. Wysocki Cc: stable@vger.kernel.org # v2.6.33+ Reviewed-by: Andrew Lunn Link: https://lore.kernel.org/r/b7f386d04e9b5b0e2738f0125743e30676f309ef.1656410895.git.lukas@wunner.de Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 508f1149665b..b09f7d36cff2 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -572,6 +572,10 @@ struct macsec_ops; * @mdix_ctrl: User setting of crossover * @pma_extable: Cached value of PMA/PMD Extended Abilities Register * @interrupts: Flag interrupts have been enabled + * @irq_suspended: Flag indicating PHY is suspended and therefore interrupt + * handling shall be postponed until PHY has resumed + * @irq_rerun: Flag indicating interrupts occurred while PHY was suspended, + * requiring a rerun of the interrupt handler after resume * @interface: enum phy_interface_t value * @skb: Netlink message for cable diagnostics * @nest: Netlink nest used for cable diagnostics @@ -626,6 +630,8 @@ struct phy_device { /* Interrupts are enabled */ unsigned interrupts:1; + unsigned irq_suspended:1; + unsigned irq_rerun:1; enum phy_state state; -- cgit From c381d02b2fd5f82d2207db1b9b25ff60d0d9c27c Mon Sep 17 00:00:00 2001 From: Yuwei Wang Date: Wed, 29 Jun 2022 08:48:31 +0000 Subject: sysctl: add proc_dointvec_ms_jiffies_minmax add proc_dointvec_ms_jiffies_minmax to fit read msecs value to jiffies with a limited range of values Signed-off-by: Yuwei Wang Signed-off-by: Paolo Abeni --- include/linux/sysctl.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 80263f7cdb77..17b42ce89d3e 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -75,6 +75,8 @@ int proc_douintvec_minmax(struct ctl_table *table, int write, void *buffer, int proc_dou8vec_minmax(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos); int proc_dointvec_jiffies(struct ctl_table *, int, void *, size_t *, loff_t *); +int proc_dointvec_ms_jiffies_minmax(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos); int proc_dointvec_userhz_jiffies(struct ctl_table *, int, void *, size_t *, loff_t *); int proc_dointvec_ms_jiffies(struct ctl_table *, int, void *, size_t *, -- cgit From 95c8222f0e52b09b7607616274e7cae84d519a9b Mon Sep 17 00:00:00 2001 From: David Jander Date: Wed, 29 Jun 2022 16:25:18 +0200 Subject: spi: spi.c: Fix comment style Capitalize first word in comment where appropriate and add parentheses to function names. Reported-by: Andy Shevchenko Signed-off-by: David Jander Link: https://lore.kernel.org/r/20220629142519.3985486-3-david@protonic.nl Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 82 ++++++++++++++++++++++++------------------------- 1 file changed, 41 insertions(+), 41 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index eb0d316e3c36..e6c73d5ff1a8 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -176,13 +176,13 @@ extern int spi_delay_exec(struct spi_delay *_delay, struct spi_transfer *xfer); struct spi_device { struct device dev; struct spi_controller *controller; - struct spi_controller *master; /* compatibility layer */ + struct spi_controller *master; /* Compatibility layer */ u32 max_speed_hz; u8 chip_select; u8 bits_per_word; bool rt; -#define SPI_NO_TX BIT(31) /* no transmit wire */ -#define SPI_NO_RX BIT(30) /* no receive wire */ +#define SPI_NO_TX BIT(31) /* No transmit wire */ +#define SPI_NO_RX BIT(30) /* No receive wire */ /* * All bits defined above should be covered by SPI_MODE_KERNEL_MASK. * The SPI_MODE_KERNEL_MASK has the SPI_MODE_USER_MASK counterpart, @@ -199,14 +199,14 @@ struct spi_device { void *controller_data; char modalias[SPI_NAME_SIZE]; const char *driver_override; - struct gpio_desc *cs_gpiod; /* chip select gpio desc */ - struct spi_delay word_delay; /* inter-word delay */ + struct gpio_desc *cs_gpiod; /* Chip select gpio desc */ + struct spi_delay word_delay; /* Inter-word delay */ /* CS delays */ struct spi_delay cs_setup; struct spi_delay cs_hold; struct spi_delay cs_inactive; - /* the statistics */ + /* The statistics */ struct spi_statistics __percpu *pcpu_statistics; /* @@ -228,7 +228,7 @@ static inline struct spi_device *to_spi_device(struct device *dev) return dev ? container_of(dev, struct spi_device, dev) : NULL; } -/* most drivers won't need to care about device refcounting */ +/* Most drivers won't need to care about device refcounting */ static inline struct spi_device *spi_dev_get(struct spi_device *spi) { return (spi && get_device(&spi->dev)) ? spi : NULL; @@ -251,7 +251,7 @@ static inline void spi_set_ctldata(struct spi_device *spi, void *state) spi->controller_state = state; } -/* device driver data */ +/* Device driver data */ static inline void spi_set_drvdata(struct spi_device *spi, void *data) { @@ -318,7 +318,7 @@ static inline void spi_unregister_driver(struct spi_driver *sdrv) extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 chip_select); -/* use a define to avoid include chaining to get THIS_MODULE */ +/* Use a define to avoid include chaining to get THIS_MODULE */ #define spi_register_driver(driver) \ __spi_register_driver(THIS_MODULE, driver) @@ -486,7 +486,7 @@ struct spi_controller { struct list_head list; - /* other than negative (== assign one dynamically), bus_num is fully + /* Other than negative (== assign one dynamically), bus_num is fully * board-specific. usually that simplifies to being SOC-specific. * example: one SOC has three SPI controllers, numbered 0..2, * and one board's schematics might show it using SPI-2. software @@ -499,7 +499,7 @@ struct spi_controller { */ u16 num_chipselect; - /* some SPI controllers pose alignment requirements on DMAable + /* Some SPI controllers pose alignment requirements on DMAable * buffers; let protocol drivers know about these requirements. */ u16 dma_alignment; @@ -510,29 +510,29 @@ struct spi_controller { /* spi_device.mode flags override flags for this controller */ u32 buswidth_override_bits; - /* bitmask of supported bits_per_word for transfers */ + /* Bitmask of supported bits_per_word for transfers */ u32 bits_per_word_mask; #define SPI_BPW_MASK(bits) BIT((bits) - 1) #define SPI_BPW_RANGE_MASK(min, max) GENMASK((max) - 1, (min) - 1) - /* limits on transfer speed */ + /* Limits on transfer speed */ u32 min_speed_hz; u32 max_speed_hz; - /* other constraints relevant to this driver */ + /* Other constraints relevant to this driver */ u16 flags; -#define SPI_CONTROLLER_HALF_DUPLEX BIT(0) /* can't do full duplex */ -#define SPI_CONTROLLER_NO_RX BIT(1) /* can't do buffer read */ -#define SPI_CONTROLLER_NO_TX BIT(2) /* can't do buffer write */ -#define SPI_CONTROLLER_MUST_RX BIT(3) /* requires rx */ -#define SPI_CONTROLLER_MUST_TX BIT(4) /* requires tx */ +#define SPI_CONTROLLER_HALF_DUPLEX BIT(0) /* Can't do full duplex */ +#define SPI_CONTROLLER_NO_RX BIT(1) /* Can't do buffer read */ +#define SPI_CONTROLLER_NO_TX BIT(2) /* Can't do buffer write */ +#define SPI_CONTROLLER_MUST_RX BIT(3) /* Requires rx */ +#define SPI_CONTROLLER_MUST_TX BIT(4) /* Requires tx */ #define SPI_MASTER_GPIO_SS BIT(5) /* GPIO CS must select slave */ - /* flag indicating if the allocation of this struct is devres-managed */ + /* Flag indicating if the allocation of this struct is devres-managed */ bool devm_allocated; - /* flag indicating this is an SPI slave controller */ + /* Flag indicating this is an SPI slave controller */ bool slave; /* @@ -548,11 +548,11 @@ struct spi_controller { /* Used to avoid adding the same CS twice */ struct mutex add_lock; - /* lock and mutex for SPI bus locking */ + /* Lock and mutex for SPI bus locking */ spinlock_t bus_lock_spinlock; struct mutex bus_lock_mutex; - /* flag indicating that the SPI bus is locked for exclusive use */ + /* Flag indicating that the SPI bus is locked for exclusive use */ bool bus_lock_flag; /* Setup mode and clock, etc (spi driver may call many times). @@ -573,7 +573,7 @@ struct spi_controller { */ int (*set_cs_timing)(struct spi_device *spi); - /* bidirectional bulk transfers + /* Bidirectional bulk transfers * * + The transfer() method may not sleep; its main role is * just to add the message to the queue. @@ -595,7 +595,7 @@ struct spi_controller { int (*transfer)(struct spi_device *spi, struct spi_message *mesg); - /* called on release() to free memory provided by spi_controller */ + /* Called on release() to free memory provided by spi_controller */ void (*cleanup)(struct spi_device *spi); /* @@ -666,14 +666,14 @@ struct spi_controller { s8 unused_native_cs; s8 max_native_cs; - /* statistics */ + /* Statistics */ struct spi_statistics __percpu *pcpu_statistics; /* DMA channels for use with core dmaengine helpers */ struct dma_chan *dma_tx; struct dma_chan *dma_rx; - /* dummy data for full duplex devices */ + /* Dummy data for full duplex devices */ void *dummy_rx; void *dummy_tx; @@ -738,7 +738,7 @@ void spi_take_timestamp_post(struct spi_controller *ctlr, struct spi_transfer *xfer, size_t progress, bool irqs_off); -/* the spi driver core manages memory for the spi_controller classdev */ +/* The spi driver core manages memory for the spi_controller classdev */ extern struct spi_controller *__spi_alloc_controller(struct device *host, unsigned int size, bool slave); @@ -808,7 +808,7 @@ typedef void (*spi_res_release_t)(struct spi_controller *ctlr, struct spi_res { struct list_head entry; spi_res_release_t release; - unsigned long long data[]; /* guarantee ull alignment */ + unsigned long long data[]; /* Guarantee ull alignment */ }; /*---------------------------------------------------------------------------*/ @@ -941,7 +941,7 @@ struct spi_res { * and its transfers, ignore them until its completion callback. */ struct spi_transfer { - /* it's ok if tx_buf == rx_buf (right?) + /* It's ok if tx_buf == rx_buf (right?) * for MicroWire, one buffer must be null * buffers must work with dma_*map_single() calls, unless * spi_message.is_dma_mapped reports a pre-existing mapping @@ -1032,24 +1032,24 @@ struct spi_message { * tell them about such special cases. */ - /* completion is reported through a callback */ + /* Completion is reported through a callback */ void (*complete)(void *context); void *context; unsigned frame_length; unsigned actual_length; int status; - /* for optional use by whatever driver currently owns the + /* For optional use by whatever driver currently owns the * spi_message ... between calls to spi_async and then later * complete(), that's the spi_controller controller driver. */ struct list_head queue; void *state; - /* list of spi_res reources when the spi message is processed */ + /* List of spi_res reources when the spi message is processed */ struct list_head resources; - /* spi_prepare_message was called for this message */ + /* spi_prepare_message() was called for this message */ bool prepared; }; @@ -1154,7 +1154,7 @@ spi_max_transfer_size(struct spi_device *spi) if (ctlr->max_transfer_size) tr_max = ctlr->max_transfer_size(spi); - /* transfer size limit must not be greater than message size limit */ + /* Transfer size limit must not be greater than message size limit */ return min(tr_max, msg_max); } @@ -1305,7 +1305,7 @@ spi_read(struct spi_device *spi, void *buf, size_t len) return spi_sync_transfer(spi, &t, 1); } -/* this copies txbuf and rxbuf data; for small transfers only! */ +/* This copies txbuf and rxbuf data; for small transfers only! */ extern int spi_write_then_read(struct spi_device *spi, const void *txbuf, unsigned n_tx, void *rxbuf, unsigned n_rx); @@ -1328,7 +1328,7 @@ static inline ssize_t spi_w8r8(struct spi_device *spi, u8 cmd) status = spi_write_then_read(spi, &cmd, 1, &result, 1); - /* return negative errno or unsigned value */ + /* Return negative errno or unsigned value */ return (status < 0) ? status : result; } @@ -1353,7 +1353,7 @@ static inline ssize_t spi_w8r16(struct spi_device *spi, u8 cmd) status = spi_write_then_read(spi, &cmd, 1, &result, 2); - /* return negative errno or unsigned value */ + /* Return negative errno or unsigned value */ return (status < 0) ? status : result; } @@ -1433,7 +1433,7 @@ static inline ssize_t spi_w8r16be(struct spi_device *spi, u8 cmd) * are active in some dynamic board configuration models. */ struct spi_board_info { - /* the device name and module name are coupled, like platform_bus; + /* The device name and module name are coupled, like platform_bus; * "modalias" is normally the driver name. * * platform_data goes to spi_device.dev.platform_data, @@ -1446,7 +1446,7 @@ struct spi_board_info { void *controller_data; int irq; - /* slower signaling on noisy or low voltage boards */ + /* Slower signaling on noisy or low voltage boards */ u32 max_speed_hz; @@ -1475,7 +1475,7 @@ struct spi_board_info { extern int spi_register_board_info(struct spi_board_info const *info, unsigned n); #else -/* board init code may ignore whether SPI is configured or not */ +/* Board init code may ignore whether SPI is configured or not */ static inline int spi_register_board_info(struct spi_board_info const *info, unsigned n) { return 0; } -- cgit From 72a43046b61a3fe7164a622224bcdfc3cf6b795d Mon Sep 17 00:00:00 2001 From: Chanho Park Date: Wed, 29 Jun 2022 09:41:41 +0900 Subject: tty: serial: samsung_tty: loopback mode support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Internal loopback mode can be supported by setting UCON register's Loopback Mode bit. The mode & bit can be supported since s3c2410 and later SoCs. The prefix of LOOPBACK / BIT(5) naming should be also changed to S3C2410_ in order to avoid confusion. Reviewed-by: Krzysztof Kozlowski Reviewed-by: Ilpo Järvinen Signed-off-by: Chanho Park Link: https://lore.kernel.org/r/20220629004141.51484-1-chanho61.park@samsung.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_s3c.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/serial_s3c.h b/include/linux/serial_s3c.h index dec15f5b3dec..1672cf0810ef 100644 --- a/include/linux/serial_s3c.h +++ b/include/linux/serial_s3c.h @@ -83,7 +83,7 @@ #define S3C2410_UCON_RXIRQMODE (1<<0) #define S3C2410_UCON_RXFIFO_TOI (1<<7) #define S3C2443_UCON_RXERR_IRQEN (1<<6) -#define S3C2443_UCON_LOOPBACK (1<<5) +#define S3C2410_UCON_LOOPBACK (1<<5) #define S3C2410_UCON_DEFAULT (S3C2410_UCON_TXILEVEL | \ S3C2410_UCON_RXILEVEL | \ -- cgit From f9b11229b79c0fb2100b5bb4628a101b1d37fbf6 Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Wed, 29 Jun 2022 12:48:41 +0300 Subject: serial: 8250: Fix PM usage_count for console handover MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When console is enabled, univ8250_console_setup() calls serial8250_console_setup() before .dev is set to uart_port. Therefore, it will not call pm_runtime_get_sync(). Later, when the actual driver is going to take over univ8250_console_exit() is called. As .dev is already set, serial8250_console_exit() makes pm_runtime_put_sync() call with usage count being zero triggering PM usage count warning (extra debug for univ8250_console_setup(), univ8250_console_exit(), and serial8250_register_ports()): [ 0.068987] univ8250_console_setup ttyS0 nodev [ 0.499670] printk: console [ttyS0] enabled [ 0.717955] printk: console [ttyS0] printing thread started [ 1.960163] serial8250_register_ports assigned dev for ttyS0 [ 1.976830] printk: console [ttyS0] disabled [ 1.976888] printk: console [ttyS0] printing thread stopped [ 1.977073] univ8250_console_exit ttyS0 usage:0 [ 1.977075] serial8250 serial8250: Runtime PM usage count underflow! [ 1.977429] dw-apb-uart.6: ttyS0 at MMIO 0x4010006000 (irq = 33, base_baud = 115200) is a 16550A [ 1.977812] univ8250_console_setup ttyS0 usage:2 [ 1.978167] printk: console [ttyS0] printing thread started [ 1.978203] printk: console [ttyS0] enabled To fix the issue, call pm_runtime_get_sync() in serial8250_register_ports() as soon as .dev is set for an uart_port if it has console enabled. This problem became apparent only recently because 82586a721595 ("PM: runtime: Avoid device usage count underflows") added the warning printout. I confirmed this problem also occurs with v5.18 (w/o the warning printout, obviously). Fixes: bedb404e91bb ("serial: 8250_port: Don't use power management for kernel console") Cc: stable Tested-by: Tony Lindgren Reviewed-by: Andy Shevchenko Reviewed-by: Tony Lindgren Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/b4f428e9-491f-daf2-2232-819928dc276e@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index 657a0fc68a3f..fde258b3decd 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -390,6 +390,11 @@ static const bool earlycon_acpi_spcr_enable EARLYCON_USED_OR_UNUSED; static inline int setup_earlycon(char *buf) { return 0; } #endif +static inline bool uart_console_enabled(struct uart_port *port) +{ + return uart_console(port) && (port->cons->flags & CON_ENABLED); +} + struct uart_port *uart_get_console(struct uart_port *ports, int nr, struct console *c); int uart_parse_earlycon(char *p, unsigned char *iotype, resource_size_t *addr, -- cgit From 6e97eba8ad8748fabb795cffc5d9e1a7dcfd7367 Mon Sep 17 00:00:00 2001 From: Yishai Hadas Date: Tue, 28 Jun 2022 18:59:10 +0300 Subject: vfio: Split migration ops from main device ops vfio core checks whether the driver sets some migration op (e.g. set_state/get_state) and accordingly calls its op. However, currently mlx5 driver sets the above ops without regards to its migration caps. This might lead to unexpected usage/Oops if user space may call to the above ops even if the driver doesn't support migration. As for example, the migration state_mutex is not initialized in that case. The cleanest way to manage that seems to split the migration ops from the main device ops, this will let the driver setting them separately from the main ops when it's applicable. As part of that, validate ops construction on registration and include a check for VFIO_MIGRATION_STOP_COPY since the uAPI claims it must be set in migration_flags. HISI driver was changed as well to match this scheme. This scheme may enable down the road to come with some extra group of ops (e.g. DMA log) that can be set without regards to the other options based on driver caps. Fixes: 6fadb021266d ("vfio/mlx5: Implement vfio_pci driver for mlx5 devices") Reviewed-by: Kevin Tian Signed-off-by: Yishai Hadas Link: https://lore.kernel.org/r/20220628155910.171454-3-yishaih@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 30 ++++++++++++++++++++---------- 1 file changed, 20 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 49580fa2073a..4d26e149db81 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -32,6 +32,11 @@ struct vfio_device_set { struct vfio_device { struct device *dev; const struct vfio_device_ops *ops; + /* + * mig_ops is a static property of the vfio_device which must be set + * prior to registering the vfio_device. + */ + const struct vfio_migration_ops *mig_ops; struct vfio_group *group; struct vfio_device_set *dev_set; struct list_head dev_set_list; @@ -61,16 +66,6 @@ struct vfio_device { * match, -errno for abort (ex. match with insufficient or incorrect * additional args) * @device_feature: Optional, fill in the VFIO_DEVICE_FEATURE ioctl - * @migration_set_state: Optional callback to change the migration state for - * devices that support migration. It's mandatory for - * VFIO_DEVICE_FEATURE_MIGRATION migration support. - * The returned FD is used for data transfer according to the FSM - * definition. The driver is responsible to ensure that FD reaches end - * of stream or error whenever the migration FSM leaves a data transfer - * state or before close_device() returns. - * @migration_get_state: Optional callback to get the migration state for - * devices that support migration. It's mandatory for - * VFIO_DEVICE_FEATURE_MIGRATION migration support. */ struct vfio_device_ops { char *name; @@ -87,6 +82,21 @@ struct vfio_device_ops { int (*match)(struct vfio_device *vdev, char *buf); int (*device_feature)(struct vfio_device *device, u32 flags, void __user *arg, size_t argsz); +}; + +/** + * @migration_set_state: Optional callback to change the migration state for + * devices that support migration. It's mandatory for + * VFIO_DEVICE_FEATURE_MIGRATION migration support. + * The returned FD is used for data transfer according to the FSM + * definition. The driver is responsible to ensure that FD reaches end + * of stream or error whenever the migration FSM leaves a data transfer + * state or before close_device() returns. + * @migration_get_state: Optional callback to get the migration state for + * devices that support migration. It's mandatory for + * VFIO_DEVICE_FEATURE_MIGRATION migration support. + */ +struct vfio_migration_ops { struct file *(*migration_set_state)( struct vfio_device *device, enum vfio_device_mig_state new_state); -- cgit From 0e862838f290147ea9c16db852d8d494b552d38d Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Fri, 24 Jun 2022 14:13:07 +0200 Subject: bitops: unify non-atomic bitops prototypes across architectures Currently, there is a mess with the prototypes of the non-atomic bitops across the different architectures: ret bool, int, unsigned long nr int, long, unsigned int, unsigned long addr volatile unsigned long *, volatile void * Thankfully, it doesn't provoke any bugs, but can sometimes make the compiler angry when it's not handy at all. Adjust all the prototypes to the following standard: ret bool retval can be only 0 or 1 nr unsigned long native; signed makes no sense addr volatile unsigned long * bitmaps are arrays of ulongs Next, some architectures don't define 'arch_' versions as they don't support instrumentation, others do. To make sure there is always the same set of callables present and to ease any potential future changes, make them all follow the rule: * architecture-specific files define only 'arch_' versions; * non-prefixed versions can be defined only in asm-generic files; and place the non-prefixed definitions into a new file in asm-generic to be included by non-instrumented architectures. Finally, add some static assertions in order to prevent people from making a mess in this room again. I also used the %__always_inline attribute consistently, so that they always get resolved to the actual operations. Suggested-by: Andy Shevchenko Signed-off-by: Alexander Lobakin Acked-by: Mark Rutland Reviewed-by: Yury Norov Reviewed-by: Andy Shevchenko Signed-off-by: Yury Norov --- include/linux/bitops.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bitops.h b/include/linux/bitops.h index 7aaed501f768..87087454a288 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -26,12 +26,29 @@ extern unsigned int __sw_hweight16(unsigned int w); extern unsigned int __sw_hweight32(unsigned int w); extern unsigned long __sw_hweight64(__u64 w); +#include + /* * Include this here because some architectures need generic_ffs/fls in * scope */ #include +/* Check that the bitops prototypes are sane */ +#define __check_bitop_pr(name) \ + static_assert(__same_type(arch_##name, generic_##name) && \ + __same_type(name, generic_##name)) + +__check_bitop_pr(__set_bit); +__check_bitop_pr(__clear_bit); +__check_bitop_pr(__change_bit); +__check_bitop_pr(__test_and_set_bit); +__check_bitop_pr(__test_and_clear_bit); +__check_bitop_pr(__test_and_change_bit); +__check_bitop_pr(test_bit); + +#undef __check_bitop_pr + static inline int get_bitmask_order(unsigned int count) { int order; -- cgit From bb7379bfa680bd48b468e856475778db2ad866c1 Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Fri, 24 Jun 2022 14:13:08 +0200 Subject: bitops: define const_*() versions of the non-atomics Define const_*() variants of the non-atomic bitops to be used when the input arguments are compile-time constants, so that the compiler will be always able to resolve those to compile-time constants as well. Those are mostly direct aliases for generic_*() with one exception for const_test_bit(): the original one is declared atomic-safe and thus doesn't discard the `volatile` qualifier, so in order to let optimize code, define it separately disregarding the qualifier. Add them to the compile-time type checks as well just in case. Suggested-by: Marco Elver Signed-off-by: Alexander Lobakin Reviewed-by: Marco Elver Reviewed-by: Andy Shevchenko Signed-off-by: Yury Norov --- include/linux/bitops.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/bitops.h b/include/linux/bitops.h index 87087454a288..d393297287d5 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -37,6 +37,7 @@ extern unsigned long __sw_hweight64(__u64 w); /* Check that the bitops prototypes are sane */ #define __check_bitop_pr(name) \ static_assert(__same_type(arch_##name, generic_##name) && \ + __same_type(const_##name, generic_##name) && \ __same_type(name, generic_##name)) __check_bitop_pr(__set_bit); -- cgit From e69eb9c460f128b71c6b995d75a05244e4b6cc3e Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Fri, 24 Jun 2022 14:13:09 +0200 Subject: bitops: wrap non-atomic bitops with a transparent macro In preparation for altering the non-atomic bitops with a macro, wrap them in a transparent definition. This requires prepending one more '_' to their names in order to be able to do that seamlessly. It is a simple change, given that all the non-prefixed definitions are now in asm-generic. sparc32 already has several triple-underscored functions, so I had to rename them ('___' -> 'sp32_'). Signed-off-by: Alexander Lobakin Reviewed-by: Marco Elver Reviewed-by: Andy Shevchenko Signed-off-by: Yury Norov --- include/linux/bitops.h | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bitops.h b/include/linux/bitops.h index d393297287d5..3c3afbae1533 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -26,8 +26,24 @@ extern unsigned int __sw_hweight16(unsigned int w); extern unsigned int __sw_hweight32(unsigned int w); extern unsigned long __sw_hweight64(__u64 w); +/* + * Defined here because those may be needed by architecture-specific static + * inlines. + */ + #include +#define bitop(op, nr, addr) \ + op(nr, addr) + +#define __set_bit(nr, addr) bitop(___set_bit, nr, addr) +#define __clear_bit(nr, addr) bitop(___clear_bit, nr, addr) +#define __change_bit(nr, addr) bitop(___change_bit, nr, addr) +#define __test_and_set_bit(nr, addr) bitop(___test_and_set_bit, nr, addr) +#define __test_and_clear_bit(nr, addr) bitop(___test_and_clear_bit, nr, addr) +#define __test_and_change_bit(nr, addr) bitop(___test_and_change_bit, nr, addr) +#define test_bit(nr, addr) bitop(_test_bit, nr, addr) + /* * Include this here because some architectures need generic_ffs/fls in * scope @@ -38,7 +54,7 @@ extern unsigned long __sw_hweight64(__u64 w); #define __check_bitop_pr(name) \ static_assert(__same_type(arch_##name, generic_##name) && \ __same_type(const_##name, generic_##name) && \ - __same_type(name, generic_##name)) + __same_type(_##name, generic_##name)) __check_bitop_pr(__set_bit); __check_bitop_pr(__clear_bit); -- cgit From b03fc1173c0c2bb8fad61902a862985cecdc4b1b Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Fri, 24 Jun 2022 14:13:10 +0200 Subject: bitops: let optimize out non-atomic bitops on compile-time constants Currently, many architecture-specific non-atomic bitop implementations use inline asm or other hacks which are faster or more robust when working with "real" variables (i.e. fields from the structures etc.), but the compilers have no clue how to optimize them out when called on compile-time constants. That said, the following code: DECLARE_BITMAP(foo, BITS_PER_LONG) = { }; // -> unsigned long foo[1]; unsigned long bar = BIT(BAR_BIT); unsigned long baz = 0; __set_bit(FOO_BIT, foo); baz |= BIT(BAZ_BIT); BUILD_BUG_ON(!__builtin_constant_p(test_bit(FOO_BIT, foo)); BUILD_BUG_ON(!__builtin_constant_p(bar & BAR_BIT)); BUILD_BUG_ON(!__builtin_constant_p(baz & BAZ_BIT)); triggers the first assertion on x86_64, which means that the compiler is unable to evaluate it to a compile-time initializer when the architecture-specific bitop is used even if it's obvious. In order to let the compiler optimize out such cases, expand the bitop() macro to use the "constant" C non-atomic bitop implementations when all of the arguments passed are compile-time constants, which means that the result will be a compile-time constant as well, so that it produces more efficient and simple code in 100% cases, comparing to the architecture-specific counterparts. The savings are architecture, compiler and compiler flags dependent, for example, on x86_64 -O2: GCC 12: add/remove: 78/29 grow/shrink: 332/525 up/down: 31325/-61560 (-30235) LLVM 13: add/remove: 79/76 grow/shrink: 184/537 up/down: 55076/-141892 (-86816) LLVM 14: add/remove: 10/3 grow/shrink: 93/138 up/down: 3705/-6992 (-3287) and ARM64 (courtesy of Mark): GCC 11: add/remove: 92/29 grow/shrink: 933/2766 up/down: 39340/-82580 (-43240) LLVM 14: add/remove: 21/11 grow/shrink: 620/651 up/down: 12060/-15824 (-3764) Cc: Mark Rutland Signed-off-by: Alexander Lobakin Reviewed-by: Marco Elver Signed-off-by: Yury Norov --- include/linux/bitops.h | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bitops.h b/include/linux/bitops.h index 3c3afbae1533..cf9bf65039f2 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -33,8 +33,24 @@ extern unsigned long __sw_hweight64(__u64 w); #include +/* + * Many architecture-specific non-atomic bitops contain inline asm code and due + * to that the compiler can't optimize them to compile-time expressions or + * constants. In contrary, generic_*() helpers are defined in pure C and + * compilers optimize them just well. + * Therefore, to make `unsigned long foo = 0; __set_bit(BAR, &foo)` effectively + * equal to `unsigned long foo = BIT(BAR)`, pick the generic C alternative when + * the arguments can be resolved at compile time. That expression itself is a + * constant and doesn't bring any functional changes to the rest of cases. + * The casts to `uintptr_t` are needed to mitigate `-Waddress` warnings when + * passing a bitmap from .bss or .data (-> `!!addr` is always true). + */ #define bitop(op, nr, addr) \ - op(nr, addr) + ((__builtin_constant_p(nr) && \ + __builtin_constant_p((uintptr_t)(addr) != (uintptr_t)NULL) && \ + (uintptr_t)(addr) != (uintptr_t)NULL && \ + __builtin_constant_p(*(const unsigned long *)(addr))) ? \ + const##op(nr, addr) : op(nr, addr)) #define __set_bit(nr, addr) bitop(___set_bit, nr, addr) #define __clear_bit(nr, addr) bitop(___clear_bit, nr, addr) -- cgit From 3e7e5baaaba78075a7f3a57432609e363bf2a486 Mon Sep 17 00:00:00 2001 From: Alexander Lobakin Date: Fri, 24 Jun 2022 14:13:12 +0200 Subject: bitmap: don't assume compiler evaluates small mem*() builtins calls Intel kernel bot triggered the build bug on ARC architecture that in fact is as follows: DECLARE_BITMAP(bitmap, BITS_PER_LONG); bitmap_clear(bitmap, 0, BITS_PER_LONG); BUILD_BUG_ON(!__builtin_constant_p(*bitmap)); which can be expanded to: unsigned long bitmap[1]; memset(bitmap, 0, sizeof(*bitmap)); BUILD_BUG_ON(!__builtin_constant_p(*bitmap)); In most cases, a compiler is able to expand small/simple mem*() calls to simple assignments or bitops, in this case that would mean: unsigned long bitmap[1] = { 0 }; BUILD_BUG_ON(!__builtin_constant_p(*bitmap)); and on most architectures this works, but not on ARC, despite having -O3 for every build. So, to make this work, in case when the last bit to modify is still within the first long (small_const_nbits()), just use plain assignments for the rest of bitmap_*() functions which still use mem*(), but didn't receive such compile-time optimizations yet. This doesn't have the same coverage as compilers provide, but at least something to start: text: add/remove: 3/7 grow/shrink: 43/78 up/down: 1848/-3370 (-1546) data: add/remove: 1/11 grow/shrink: 0/8 up/down: 4/-356 (-352) notably cpumask_*() family when NR_CPUS <= BITS_PER_LONG: netif_get_num_default_rss_queues 38 4 -34 cpumask_copy 90 - -90 cpumask_clear 146 - -146 and the abovementioned assertion started passing. Signed-off-by: Alexander Lobakin Signed-off-by: Yury Norov --- include/linux/bitmap.h | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index f091a1664bf1..c91638e507f2 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -238,20 +238,32 @@ extern int bitmap_print_list_to_buf(char *buf, const unsigned long *maskp, static inline void bitmap_zero(unsigned long *dst, unsigned int nbits) { unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); - memset(dst, 0, len); + + if (small_const_nbits(nbits)) + *dst = 0; + else + memset(dst, 0, len); } static inline void bitmap_fill(unsigned long *dst, unsigned int nbits) { unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); - memset(dst, 0xff, len); + + if (small_const_nbits(nbits)) + *dst = ~0UL; + else + memset(dst, 0xff, len); } static inline void bitmap_copy(unsigned long *dst, const unsigned long *src, unsigned int nbits) { unsigned int len = BITS_TO_LONGS(nbits) * sizeof(unsigned long); - memcpy(dst, src, len); + + if (small_const_nbits(nbits)) + *dst = *src; + else + memcpy(dst, src, len); } /* @@ -431,6 +443,8 @@ static __always_inline void bitmap_set(unsigned long *map, unsigned int start, { if (__builtin_constant_p(nbits) && nbits == 1) __set_bit(start, map); + else if (small_const_nbits(start + nbits)) + *map |= GENMASK(start + nbits - 1, start); else if (__builtin_constant_p(start & BITMAP_MEM_MASK) && IS_ALIGNED(start, BITMAP_MEM_ALIGNMENT) && __builtin_constant_p(nbits & BITMAP_MEM_MASK) && @@ -445,6 +459,8 @@ static __always_inline void bitmap_clear(unsigned long *map, unsigned int start, { if (__builtin_constant_p(nbits) && nbits == 1) __clear_bit(start, map); + else if (small_const_nbits(start + nbits)) + *map &= ~GENMASK(start + nbits - 1, start); else if (__builtin_constant_p(start & BITMAP_MEM_MASK) && IS_ALIGNED(start, BITMAP_MEM_ALIGNMENT) && __builtin_constant_p(nbits & BITMAP_MEM_MASK) && -- cgit From 837ced3a1a5d8bb1a637dd584711f31ae6b54d93 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Tue, 28 Jun 2022 17:52:38 +0300 Subject: time64.h: consolidate uses of PSEC_PER_NSEC Time-sensitive networking code needs to work with PTP times expressed in nanoseconds, and with packet transmission times expressed in picoseconds, since those would be fractional at higher than gigabit speed when expressed in nanoseconds. Convert the existing uses in tc-taprio and the ocelot/felix DSA driver to a PSEC_PER_NSEC macro. This macro is placed in include/linux/time64.h as opposed to its relatives (PSEC_PER_SEC etc) from include/vdso/time64.h because the vDSO library does not (yet) need/use it. Cc: Andy Lutomirski Cc: Thomas Gleixner Signed-off-by: Vladimir Oltean Reviewed-by: Vincenzo Frascino # for the vDSO parts Signed-off-by: Jakub Kicinski --- include/linux/time64.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/time64.h b/include/linux/time64.h index 81b9686a2079..2fb8232cff1d 100644 --- a/include/linux/time64.h +++ b/include/linux/time64.h @@ -20,6 +20,9 @@ struct itimerspec64 { struct timespec64 it_value; }; +/* Parameters used to convert the timespec values: */ +#define PSEC_PER_NSEC 1000L + /* Located here for timespec[64]_valid_strict */ #define TIME64_MAX ((s64)~((u64)1 << 63)) #define TIME64_MIN (-TIME64_MAX - 1) -- cgit From eb7f8e28420372787933eec079735c35034bda7d Mon Sep 17 00:00:00 2001 From: Shuah Khan Date: Thu, 30 Jun 2022 20:32:55 -0600 Subject: misc: rtsx_usb: fix use of dma mapped buffer for usb bulk transfer rtsx_usb driver allocates coherent dma buffer for urb transfers. This buffer is passed to usb_bulk_msg() and usb core tries to map already mapped buffer running into a dma mapping error. xhci_hcd 0000:01:00.0: rejecting DMA map of vmalloc memory WARNING: CPU: 1 PID: 279 at include/linux/dma-mapping.h:326 usb_ hcd_map_urb_for_dma+0x7d6/0x820 ... xhci_map_urb_for_dma+0x291/0x4e0 usb_hcd_submit_urb+0x199/0x12b0 ... usb_submit_urb+0x3b8/0x9e0 usb_start_wait_urb+0xe3/0x2d0 usb_bulk_msg+0x115/0x240 rtsx_usb_transfer_data+0x185/0x1a8 [rtsx_usb] rtsx_usb_send_cmd+0xbb/0x123 [rtsx_usb] rtsx_usb_write_register+0x12c/0x143 [rtsx_usb] rtsx_usb_probe+0x226/0x4b2 [rtsx_usb] Fix it to use kmalloc() to get DMA-able memory region instead. Signed-off-by: Shuah Khan Cc: stable Link: https://lore.kernel.org/r/667d627d502e1ba9ff4f9b94966df3299d2d3c0d.1656642167.git.skhan@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman --- include/linux/rtsx_usb.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/rtsx_usb.h b/include/linux/rtsx_usb.h index 159729cffd8e..a07f7341ebc2 100644 --- a/include/linux/rtsx_usb.h +++ b/include/linux/rtsx_usb.h @@ -55,7 +55,6 @@ struct rtsx_ucr { struct usb_interface *pusb_intf; struct usb_sg_request current_sg; unsigned char *iobuf; - dma_addr_t iobuf_dma; struct timer_list sg_timer; struct mutex dev_mutex; -- cgit From 3776c78559853fd151be7c41e369fd076fb679d5 Mon Sep 17 00:00:00 2001 From: Shuah Khan Date: Thu, 30 Jun 2022 20:32:56 -0600 Subject: misc: rtsx_usb: use separate command and response buffers rtsx_usb uses same buffer for command and response. There could be a potential conflict using the same buffer for both especially if retries and timeouts are involved. Use separate command and response buffers to avoid conflicts. Signed-off-by: Shuah Khan Cc: stable Link: https://lore.kernel.org/r/07e3721804ff07aaab9ef5b39a5691d0718b9ade.1656642167.git.skhan@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman --- include/linux/rtsx_usb.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/rtsx_usb.h b/include/linux/rtsx_usb.h index a07f7341ebc2..3247ed8e9ff0 100644 --- a/include/linux/rtsx_usb.h +++ b/include/linux/rtsx_usb.h @@ -54,7 +54,6 @@ struct rtsx_ucr { struct usb_device *pusb_dev; struct usb_interface *pusb_intf; struct usb_sg_request current_sg; - unsigned char *iobuf; struct timer_list sg_timer; struct mutex dev_mutex; -- cgit From 6708be40047789aa3587a3866b782d5cda7b2a31 Mon Sep 17 00:00:00 2001 From: Peter Chiu Date: Wed, 22 Jun 2022 09:08:20 +0800 Subject: wifi: ieee80211: s1g action frames are not robust S1g action frame with code 22 is not protected so update the robust action frame list. Signed-off-by: Peter Chiu Link: https://lore.kernel.org/r/20220622010820.17522-1-chui-hao.chiu@mediatek.com Signed-off-by: Johannes Berg --- include/linux/ieee80211.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index 870f659087d6..f386f9ed41f3 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -4094,6 +4094,7 @@ static inline bool _ieee80211_is_robust_mgmt_frame(struct ieee80211_hdr *hdr) *category != WLAN_CATEGORY_SELF_PROTECTED && *category != WLAN_CATEGORY_UNPROT_DMG && *category != WLAN_CATEGORY_VHT && + *category != WLAN_CATEGORY_S1G && *category != WLAN_CATEGORY_VENDOR_SPECIFIC; } -- cgit From 80fc671bcc0173836e9032b0c698ea74c13b9d7c Mon Sep 17 00:00:00 2001 From: Jean-Philippe Brucker Date: Fri, 1 Jul 2022 11:48:43 +0800 Subject: uacce: Handle parent device removal or parent driver module rmmod The uacce driver must deal with a possible removal of the parent device or parent driver module rmmod at any time. Although uacce_remove(), called on device removal and on driver unbind, prevents future use of the uacce fops by removing the cdev, fops that were called before that point may still be running. Serialize uacce_fops_open() and uacce_remove() with uacce->mutex. Serialize other fops against uacce_remove() with q->mutex. Since we need to protect uacce_fops_poll() which gets called on the fast path, replace uacce->queues_lock with q->mutex to improve scalability. The other fops are only used during setup. uacce_queue_is_valid(), checked under q->mutex or uacce->mutex, denotes whether uacce_remove() has disabled all queues. If that is the case, don't go any further since the parent device is being removed and uacce->ops should not be called anymore. Reported-by: Yang Shen Signed-off-by: Zhangfei Gao Signed-off-by: Jean-Philippe Brucker Link: https://lore.kernel.org/r/20220701034843.7502-1-zhangfei.gao@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/uacce.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/uacce.h b/include/linux/uacce.h index 48e319f40275..9ce88c28b0a8 100644 --- a/include/linux/uacce.h +++ b/include/linux/uacce.h @@ -70,6 +70,7 @@ enum uacce_q_state { * @wait: wait queue head * @list: index into uacce queues list * @qfrs: pointer of qfr regions + * @mutex: protects queue state * @state: queue state machine * @pasid: pasid associated to the mm * @handle: iommu_sva handle returned by iommu_sva_bind_device() @@ -80,6 +81,7 @@ struct uacce_queue { wait_queue_head_t wait; struct list_head list; struct uacce_qfile_region *qfrs[UACCE_MAX_REGION]; + struct mutex mutex; enum uacce_q_state state; u32 pasid; struct iommu_sva *handle; @@ -97,9 +99,9 @@ struct uacce_queue { * @dev_id: id of the uacce device * @cdev: cdev of the uacce * @dev: dev of the uacce + * @mutex: protects uacce operation * @priv: private pointer of the uacce * @queues: list of queues - * @queues_lock: lock for queues list * @inode: core vfs */ struct uacce_device { @@ -113,9 +115,9 @@ struct uacce_device { u32 dev_id; struct cdev *cdev; struct device dev; + struct mutex mutex; void *priv; struct list_head queues; - struct mutex queues_lock; struct inode *inode; }; -- cgit From c882716b6d411495f221bdd73e9137357c16a3ea Mon Sep 17 00:00:00 2001 From: Liang He Date: Tue, 28 Jun 2022 10:16:40 +0800 Subject: firmware: Hold a reference for of_find_compatible_node() In of_register_trusted_foundations(), we need to hold the reference returned by of_find_compatible_node() and then use it to call of_node_put() for refcount balance. Signed-off-by: Liang He Link: https://lore.kernel.org/r/20220628021640.4015-1-windhl@126.com Signed-off-by: Greg Kroah-Hartman --- include/linux/firmware/trusted_foundations.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/firmware/trusted_foundations.h b/include/linux/firmware/trusted_foundations.h index be5984bda592..931b6c5c72df 100644 --- a/include/linux/firmware/trusted_foundations.h +++ b/include/linux/firmware/trusted_foundations.h @@ -71,12 +71,16 @@ static inline void register_trusted_foundations( static inline void of_register_trusted_foundations(void) { + struct device_node *np = of_find_compatible_node(NULL, NULL, "tlm,trusted-foundations"); + + if (!np) + return; + of_node_put(np); /* * If we find the target should enable TF but does not support it, * fail as the system won't be able to do much anyway */ - if (of_find_compatible_node(NULL, NULL, "tlm,trusted-foundations")) - register_trusted_foundations(NULL); + register_trusted_foundations(NULL); } static inline bool trusted_foundations_registered(void) -- cgit From 31a371e419c885e0f137ce70395356ba8639dc52 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 29 Jun 2022 17:42:08 +0300 Subject: fanotify: prepare for setting event flags in ignore mask Setting flags FAN_ONDIR FAN_EVENT_ON_CHILD in ignore mask has no effect. The FAN_EVENT_ON_CHILD flag in mask implicitly applies to ignore mask and ignore mask is always implicitly applied to events on directories. Define a mark flag that replaces this legacy behavior with logic of applying the ignore mask according to event flags in ignore mask. Implement the new logic to prepare for supporting an ignore mask that ignores events on children and ignore mask that does not ignore events on directories. To emphasize the change in terminology, also rename ignored_mask mark member to ignore_mask and use accessors to get only the effective ignored events or the ignored events and flags. This change in terminology finally aligns with the "ignore mask" language in man pages and in most of the comments. Link: https://lore.kernel.org/r/20220629144210.2983229-2-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- include/linux/fsnotify_backend.h | 89 +++++++++++++++++++++++++++++++++++++--- 1 file changed, 83 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fsnotify_backend.h b/include/linux/fsnotify_backend.h index 9560734759fa..d7d96c806bff 100644 --- a/include/linux/fsnotify_backend.h +++ b/include/linux/fsnotify_backend.h @@ -518,8 +518,8 @@ struct fsnotify_mark { struct hlist_node obj_list; /* Head of list of marks for an object [mark ref] */ struct fsnotify_mark_connector *connector; - /* Events types to ignore [mark->lock, group->mark_mutex] */ - __u32 ignored_mask; + /* Events types and flags to ignore [mark->lock, group->mark_mutex] */ + __u32 ignore_mask; /* General fsnotify mark flags */ #define FSNOTIFY_MARK_FLAG_ALIVE 0x0001 #define FSNOTIFY_MARK_FLAG_ATTACHED 0x0002 @@ -529,6 +529,7 @@ struct fsnotify_mark { /* fanotify mark flags */ #define FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY 0x0100 #define FSNOTIFY_MARK_FLAG_NO_IREF 0x0200 +#define FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS 0x0400 unsigned int flags; /* flags [mark->lock] */ }; @@ -655,15 +656,91 @@ extern void fsnotify_remove_queued_event(struct fsnotify_group *group, /* functions used to manipulate the marks attached to inodes */ -/* Get mask for calculating object interest taking ignored mask into account */ +/* + * Canonical "ignore mask" including event flags. + * + * Note the subtle semantic difference from the legacy ->ignored_mask. + * ->ignored_mask traditionally only meant which events should be ignored, + * while ->ignore_mask also includes flags regarding the type of objects on + * which events should be ignored. + */ +static inline __u32 fsnotify_ignore_mask(struct fsnotify_mark *mark) +{ + __u32 ignore_mask = mark->ignore_mask; + + /* The event flags in ignore mask take effect */ + if (mark->flags & FSNOTIFY_MARK_FLAG_HAS_IGNORE_FLAGS) + return ignore_mask; + + /* + * Legacy behavior: + * - Always ignore events on dir + * - Ignore events on child if parent is watching children + */ + ignore_mask |= FS_ISDIR; + ignore_mask &= ~FS_EVENT_ON_CHILD; + ignore_mask |= mark->mask & FS_EVENT_ON_CHILD; + + return ignore_mask; +} + +/* Legacy ignored_mask - only event types to ignore */ +static inline __u32 fsnotify_ignored_events(struct fsnotify_mark *mark) +{ + return mark->ignore_mask & ALL_FSNOTIFY_EVENTS; +} + +/* + * Check if mask (or ignore mask) should be applied depending if victim is a + * directory and whether it is reported to a watching parent. + */ +static inline bool fsnotify_mask_applicable(__u32 mask, bool is_dir, + int iter_type) +{ + /* Should mask be applied to a directory? */ + if (is_dir && !(mask & FS_ISDIR)) + return false; + + /* Should mask be applied to a child? */ + if (iter_type == FSNOTIFY_ITER_TYPE_PARENT && + !(mask & FS_EVENT_ON_CHILD)) + return false; + + return true; +} + +/* + * Effective ignore mask taking into account if event victim is a + * directory and whether it is reported to a watching parent. + */ +static inline __u32 fsnotify_effective_ignore_mask(struct fsnotify_mark *mark, + bool is_dir, int iter_type) +{ + __u32 ignore_mask = fsnotify_ignored_events(mark); + + if (!ignore_mask) + return 0; + + /* For non-dir and non-child, no need to consult the event flags */ + if (!is_dir && iter_type != FSNOTIFY_ITER_TYPE_PARENT) + return ignore_mask; + + ignore_mask = fsnotify_ignore_mask(mark); + if (!fsnotify_mask_applicable(ignore_mask, is_dir, iter_type)) + return 0; + + return ignore_mask & ALL_FSNOTIFY_EVENTS; +} + +/* Get mask for calculating object interest taking ignore mask into account */ static inline __u32 fsnotify_calc_mask(struct fsnotify_mark *mark) { __u32 mask = mark->mask; - if (!mark->ignored_mask) + if (!fsnotify_ignored_events(mark)) return mask; - /* Interest in FS_MODIFY may be needed for clearing ignored mask */ + /* Interest in FS_MODIFY may be needed for clearing ignore mask */ if (!(mark->flags & FSNOTIFY_MARK_FLAG_IGNORED_SURV_MODIFY)) mask |= FS_MODIFY; @@ -671,7 +748,7 @@ static inline __u32 fsnotify_calc_mask(struct fsnotify_mark *mark) * If mark is interested in ignoring events on children, the object must * show interest in those events for fsnotify_parent() to notice it. */ - return mask | (mark->ignored_mask & ALL_FSNOTIFY_EVENTS); + return mask | mark->ignore_mask; } /* Get mask of events for a list of marks */ -- cgit From 8afd7215aa97f8868d033f6e1d01a276ab2d29c0 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 29 Jun 2022 17:42:09 +0300 Subject: fanotify: cleanups for fanotify_mark() input validations Create helper fanotify_may_update_existing_mark() for checking for conflicts between existing mark flags and fanotify_mark() flags. Use variable mark_cmd to make the checks for mark command bits cleaner. Link: https://lore.kernel.org/r/20220629144210.2983229-3-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- include/linux/fanotify.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index e517dbcf74ed..a7207f092fd1 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -59,15 +59,16 @@ #define FANOTIFY_MARK_TYPE_BITS (FAN_MARK_INODE | FAN_MARK_MOUNT | \ FAN_MARK_FILESYSTEM) +#define FANOTIFY_MARK_CMD_BITS (FAN_MARK_ADD | FAN_MARK_REMOVE | \ + FAN_MARK_FLUSH) + #define FANOTIFY_MARK_FLAGS (FANOTIFY_MARK_TYPE_BITS | \ - FAN_MARK_ADD | \ - FAN_MARK_REMOVE | \ + FANOTIFY_MARK_CMD_BITS | \ FAN_MARK_DONT_FOLLOW | \ FAN_MARK_ONLYDIR | \ FAN_MARK_IGNORED_MASK | \ FAN_MARK_IGNORED_SURV_MODIFY | \ - FAN_MARK_EVICTABLE | \ - FAN_MARK_FLUSH) + FAN_MARK_EVICTABLE) /* * Events that can be reported with data type FSNOTIFY_EVENT_PATH. -- cgit From e252f2ed1c8c6c3884ab5dd34e003ed21f1fe6e0 Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Wed, 29 Jun 2022 17:42:10 +0300 Subject: fanotify: introduce FAN_MARK_IGNORE This flag is a new way to configure ignore mask which allows adding and removing the event flags FAN_ONDIR and FAN_EVENT_ON_CHILD in ignore mask. The legacy FAN_MARK_IGNORED_MASK flag would always ignore events on directories and would ignore events on children depending on whether the FAN_EVENT_ON_CHILD flag was set in the (non ignored) mask. FAN_MARK_IGNORE can be used to ignore events on children without setting FAN_EVENT_ON_CHILD in the mark's mask and will not ignore events on directories unconditionally, only when FAN_ONDIR is set in ignore mask. The new behavior is non-downgradable. After calling fanotify_mark() with FAN_MARK_IGNORE once, calling fanotify_mark() with FAN_MARK_IGNORED_MASK on the same object will return EEXIST error. Setting the event flags with FAN_MARK_IGNORE on a non-dir inode mark has no meaning and will return ENOTDIR error. The meaning of FAN_MARK_IGNORED_SURV_MODIFY is preserved with the new FAN_MARK_IGNORE flag, but with a few semantic differences: 1. FAN_MARK_IGNORED_SURV_MODIFY is required for filesystem and mount marks and on an inode mark on a directory. Omitting this flag will return EINVAL or EISDIR error. 2. An ignore mask on a non-directory inode that survives modify could never be downgraded to an ignore mask that does not survive modify. With new FAN_MARK_IGNORE semantics we make that rule explicit - trying to update a surviving ignore mask without the flag FAN_MARK_IGNORED_SURV_MODIFY will return EEXIST error. The conveniene macro FAN_MARK_IGNORE_SURV is added for (FAN_MARK_IGNORE | FAN_MARK_IGNORED_SURV_MODIFY), because the common case should use short constant names. Link: https://lore.kernel.org/r/20220629144210.2983229-4-amir73il@gmail.com Signed-off-by: Amir Goldstein Signed-off-by: Jan Kara --- include/linux/fanotify.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fanotify.h b/include/linux/fanotify.h index a7207f092fd1..8ad743def6f3 100644 --- a/include/linux/fanotify.h +++ b/include/linux/fanotify.h @@ -62,11 +62,14 @@ #define FANOTIFY_MARK_CMD_BITS (FAN_MARK_ADD | FAN_MARK_REMOVE | \ FAN_MARK_FLUSH) +#define FANOTIFY_MARK_IGNORE_BITS (FAN_MARK_IGNORED_MASK | \ + FAN_MARK_IGNORE) + #define FANOTIFY_MARK_FLAGS (FANOTIFY_MARK_TYPE_BITS | \ FANOTIFY_MARK_CMD_BITS | \ + FANOTIFY_MARK_IGNORE_BITS | \ FAN_MARK_DONT_FOLLOW | \ FAN_MARK_ONLYDIR | \ - FAN_MARK_IGNORED_MASK | \ FAN_MARK_IGNORED_SURV_MODIFY | \ FAN_MARK_EVICTABLE) -- cgit From b69a2afd5afce9bf6d56e349d6ab592c916e20f2 Mon Sep 17 00:00:00 2001 From: Jonathan McDowell Date: Thu, 30 Jun 2022 08:36:12 +0000 Subject: x86/kexec: Carry forward IMA measurement log on kexec On kexec file load, the Integrity Measurement Architecture (IMA) subsystem may verify the IMA signature of the kernel and initramfs, and measure it. The command line parameters passed to the kernel in the kexec call may also be measured by IMA. A remote attestation service can verify a TPM quote based on the TPM event log, the IMA measurement list and the TPM PCR data. This can be achieved only if the IMA measurement log is carried over from the current kernel to the next kernel across the kexec call. PowerPC and ARM64 both achieve this using device tree with a "linux,ima-kexec-buffer" node. x86 platforms generally don't make use of device tree, so use the setup_data mechanism to pass the IMA buffer to the new kernel. Signed-off-by: Jonathan McDowell Signed-off-by: Borislav Petkov Reviewed-by: Mimi Zohar # IMA function definitions Link: https://lore.kernel.org/r/YmKyvlF3my1yWTvK@noodles-fedora-PC23Y6EG --- include/linux/ima.h | 5 +++++ include/linux/of.h | 2 -- 2 files changed, 5 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ima.h b/include/linux/ima.h index 426b1744215e..81708ca0ebc7 100644 --- a/include/linux/ima.h +++ b/include/linux/ima.h @@ -140,6 +140,11 @@ static inline int ima_measure_critical_data(const char *event_label, #endif /* CONFIG_IMA */ +#ifdef CONFIG_HAVE_IMA_KEXEC +int __init ima_free_kexec_buffer(void); +int __init ima_get_kexec_buffer(void **addr, size_t *size); +#endif + #ifdef CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT extern bool arch_ima_get_secureboot(void); extern const char * const *arch_get_ima_policy(void); diff --git a/include/linux/of.h b/include/linux/of.h index f0a5d6b10c5a..20a4e7cb7afe 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -441,8 +441,6 @@ void *of_kexec_alloc_and_setup_fdt(const struct kimage *image, unsigned long initrd_load_addr, unsigned long initrd_len, const char *cmdline, size_t extra_fdt_size); -int ima_get_kexec_buffer(void **addr, size_t *size); -int ima_free_kexec_buffer(void); #else /* CONFIG_OF */ static inline void of_core_init(void) -- cgit From 07358194badf73e267289b40b761f5dc56928eab Mon Sep 17 00:00:00 2001 From: "Rafael J. Wysocki" Date: Mon, 27 Jun 2022 20:42:18 +0200 Subject: PM: runtime: Redefine pm_runtime_release_supplier() Instead of passing an extra bool argument to pm_runtime_release_supplier(), make its callers take care of triggering a runtime-suspend of the supplier device as needed. No expected functional impact. Suggested-by: Greg Kroah-Hartman Signed-off-by: Rafael J. Wysocki Reviewed-by: Greg Kroah-Hartman Cc: 5.1+ # 5.1+ --- include/linux/pm_runtime.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h index 9e4d056967c6..0a41b2dcccad 100644 --- a/include/linux/pm_runtime.h +++ b/include/linux/pm_runtime.h @@ -88,7 +88,7 @@ extern void pm_runtime_get_suppliers(struct device *dev); extern void pm_runtime_put_suppliers(struct device *dev); extern void pm_runtime_new_link(struct device *dev); extern void pm_runtime_drop_link(struct device_link *link); -extern void pm_runtime_release_supplier(struct device_link *link, bool check_idle); +extern void pm_runtime_release_supplier(struct device_link *link); extern int devm_pm_runtime_enable(struct device *dev); @@ -314,8 +314,7 @@ static inline void pm_runtime_get_suppliers(struct device *dev) {} static inline void pm_runtime_put_suppliers(struct device *dev) {} static inline void pm_runtime_new_link(struct device *dev) {} static inline void pm_runtime_drop_link(struct device_link *link) {} -static inline void pm_runtime_release_supplier(struct device_link *link, - bool check_idle) {} +static inline void pm_runtime_release_supplier(struct device_link *link) {} #endif /* !CONFIG_PM */ -- cgit From 2852ca7fba9f77b204f0fe953b31fadd0057c936 Mon Sep 17 00:00:00 2001 From: David Gow Date: Fri, 1 Jul 2022 16:47:41 +0800 Subject: panic: Taint kernel if tests are run Most in-kernel tests (such as KUnit tests) are not supposed to run on production systems: they may do deliberately illegal things to trigger errors, and have security implications (for example, KUnit assertions will often deliberately leak kernel addresses). Add a new taint type, TAINT_TEST to signal that a test has been run. This will be printed as 'N' (originally for kuNit, as every other sensible letter was taken.) This should discourage people from running these tests on production systems, and to make it easier to tell if tests have been run accidentally (by loading the wrong configuration, etc.) Acked-by: Luis Chamberlain Reviewed-by: Brendan Higgins Signed-off-by: David Gow Signed-off-by: Shuah Khan --- include/linux/panic.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/panic.h b/include/linux/panic.h index e71161da69c4..c7759b3f2045 100644 --- a/include/linux/panic.h +++ b/include/linux/panic.h @@ -68,7 +68,8 @@ static inline void set_arch_panic_timeout(int timeout, int arch_default_timeout) #define TAINT_LIVEPATCH 15 #define TAINT_AUX 16 #define TAINT_RANDSTRUCT 17 -#define TAINT_FLAGS_COUNT 18 +#define TAINT_TEST 18 +#define TAINT_FLAGS_COUNT 19 #define TAINT_FLAGS_MAX ((1UL << TAINT_FLAGS_COUNT) - 1) struct taint_flag { -- cgit From eb003bf3ba221bb3d21d1fdcddaa36c158fd2d8f Mon Sep 17 00:00:00 2001 From: Maximilian Luz Date: Fri, 24 Jun 2022 20:36:39 +0200 Subject: platform/surface: aggregator: Add helper macros for requests with argument and return value Add helper macros for synchronous stack-allocated Surface Aggregator request with both argument and return value, similar to the current argument-only and return-value-only ones. Signed-off-by: Maximilian Luz Link: https://lore.kernel.org/r/20220624183642.910893-2-luzmaximilian@gmail.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/surface_aggregator/controller.h | 125 ++++++++++++++++++++++++++ include/linux/surface_aggregator/device.h | 36 ++++++++ 2 files changed, 161 insertions(+) (limited to 'include/linux') diff --git a/include/linux/surface_aggregator/controller.h b/include/linux/surface_aggregator/controller.h index 50a2b4926c06..d11a1c6e3186 100644 --- a/include/linux/surface_aggregator/controller.h +++ b/include/linux/surface_aggregator/controller.h @@ -469,6 +469,67 @@ struct ssam_request_spec_md { return 0; \ } +/** + * SSAM_DEFINE_SYNC_REQUEST_WR() - Define synchronous SAM request function with + * both argument and return value. + * @name: Name of the generated function. + * @atype: Type of the request's argument. + * @rtype: Type of the request's return value. + * @spec: Specification (&struct ssam_request_spec) defining the request. + * + * Defines a function executing the synchronous SAM request specified by @spec, + * with the request taking an argument of type @atype and having a return value + * of type @rtype. The generated function takes care of setting up the request + * and response structs, buffer allocation, as well as execution of the request + * itself, returning once the request has been fully completed. The required + * transport buffer will be allocated on the stack. + * + * The generated function is defined as ``static int name(struct + * ssam_controller *ctrl, const atype *arg, rtype *ret)``, returning the status + * of the request, which is zero on success and negative on failure. The + * ``ctrl`` parameter is the controller via which the request is sent. The + * request argument is specified via the ``arg`` pointer. The request's return + * value is written to the memory pointed to by the ``ret`` parameter. + * + * Refer to ssam_request_sync_onstack() for more details on the behavior of + * the generated function. + */ +#define SSAM_DEFINE_SYNC_REQUEST_WR(name, atype, rtype, spec...) \ + static int name(struct ssam_controller *ctrl, const atype *arg, rtype *ret) \ + { \ + struct ssam_request_spec s = (struct ssam_request_spec)spec; \ + struct ssam_request rqst; \ + struct ssam_response rsp; \ + int status; \ + \ + rqst.target_category = s.target_category; \ + rqst.target_id = s.target_id; \ + rqst.command_id = s.command_id; \ + rqst.instance_id = s.instance_id; \ + rqst.flags = s.flags | SSAM_REQUEST_HAS_RESPONSE; \ + rqst.length = sizeof(atype); \ + rqst.payload = (u8 *)arg; \ + \ + rsp.capacity = sizeof(rtype); \ + rsp.length = 0; \ + rsp.pointer = (u8 *)ret; \ + \ + status = ssam_request_sync_onstack(ctrl, &rqst, &rsp, sizeof(atype)); \ + if (status) \ + return status; \ + \ + if (rsp.length != sizeof(rtype)) { \ + struct device *dev = ssam_controller_device(ctrl); \ + dev_err(dev, \ + "rqst: invalid response length, expected %zu, got %zu (tc: %#04x, cid: %#04x)", \ + sizeof(rtype), rsp.length, rqst.target_category,\ + rqst.command_id); \ + return -EIO; \ + } \ + \ + return 0; \ + } + /** * SSAM_DEFINE_SYNC_REQUEST_MD_N() - Define synchronous multi-device SAM * request function with neither argument nor return value. @@ -613,6 +674,70 @@ struct ssam_request_spec_md { return 0; \ } +/** + * SSAM_DEFINE_SYNC_REQUEST_MD_WR() - Define synchronous multi-device SAM + * request function with both argument and return value. + * @name: Name of the generated function. + * @atype: Type of the request's argument. + * @rtype: Type of the request's return value. + * @spec: Specification (&struct ssam_request_spec_md) defining the request. + * + * Defines a function executing the synchronous SAM request specified by @spec, + * with the request taking an argument of type @atype and having a return value + * of type @rtype. Device specifying parameters are not hard-coded, but instead + * must be provided to the function. The generated function takes care of + * setting up the request and response structs, buffer allocation, as well as + * execution of the request itself, returning once the request has been fully + * completed. The required transport buffer will be allocated on the stack. + * + * The generated function is defined as ``static int name(struct + * ssam_controller *ctrl, u8 tid, u8 iid, const atype *arg, rtype *ret)``, + * returning the status of the request, which is zero on success and negative + * on failure. The ``ctrl`` parameter is the controller via which the request + * is sent, ``tid`` the target ID for the request, and ``iid`` the instance ID. + * The request argument is specified via the ``arg`` pointer. The request's + * return value is written to the memory pointed to by the ``ret`` parameter. + * + * Refer to ssam_request_sync_onstack() for more details on the behavior of + * the generated function. + */ +#define SSAM_DEFINE_SYNC_REQUEST_MD_WR(name, atype, rtype, spec...) \ + static int name(struct ssam_controller *ctrl, u8 tid, u8 iid, \ + const atype *arg, rtype *ret) \ + { \ + struct ssam_request_spec_md s = (struct ssam_request_spec_md)spec; \ + struct ssam_request rqst; \ + struct ssam_response rsp; \ + int status; \ + \ + rqst.target_category = s.target_category; \ + rqst.target_id = tid; \ + rqst.command_id = s.command_id; \ + rqst.instance_id = iid; \ + rqst.flags = s.flags | SSAM_REQUEST_HAS_RESPONSE; \ + rqst.length = sizeof(atype); \ + rqst.payload = (u8 *)arg; \ + \ + rsp.capacity = sizeof(rtype); \ + rsp.length = 0; \ + rsp.pointer = (u8 *)ret; \ + \ + status = ssam_request_sync_onstack(ctrl, &rqst, &rsp, sizeof(atype)); \ + if (status) \ + return status; \ + \ + if (rsp.length != sizeof(rtype)) { \ + struct device *dev = ssam_controller_device(ctrl); \ + dev_err(dev, \ + "rqst: invalid response length, expected %zu, got %zu (tc: %#04x, cid: %#04x)", \ + sizeof(rtype), rsp.length, rqst.target_category,\ + rqst.command_id); \ + return -EIO; \ + } \ + \ + return 0; \ + } + /* -- Event notifier/callbacks. --------------------------------------------- */ diff --git a/include/linux/surface_aggregator/device.h b/include/linux/surface_aggregator/device.h index c418f7f2732d..6cf7e80312d5 100644 --- a/include/linux/surface_aggregator/device.h +++ b/include/linux/surface_aggregator/device.h @@ -483,6 +483,42 @@ static inline void ssam_remove_clients(struct device *dev) {} sdev->uid.instance, ret); \ } +/** + * SSAM_DEFINE_SYNC_REQUEST_CL_WR() - Define synchronous client-device SAM + * request function with argument and return value. + * @name: Name of the generated function. + * @atype: Type of the request's argument. + * @rtype: Type of the request's return value. + * @spec: Specification (&struct ssam_request_spec_md) defining the request. + * + * Defines a function executing the synchronous SAM request specified by @spec, + * with the request taking an argument of type @atype and having a return value + * of type @rtype. Device specifying parameters are not hard-coded, but instead + * are provided via the client device, specifically its UID, supplied when + * calling this function. The generated function takes care of setting up the + * request struct, buffer allocation, as well as execution of the request + * itself, returning once the request has been fully completed. The required + * transport buffer will be allocated on the stack. + * + * The generated function is defined as ``static int name(struct ssam_device + * *sdev, const atype *arg, rtype *ret)``, returning the status of the request, + * which is zero on success and negative on failure. The ``sdev`` parameter + * specifies both the target device of the request and by association the + * controller via which the request is sent. The request's argument is + * specified via the ``arg`` pointer. The request's return value is written to + * the memory pointed to by the ``ret`` parameter. + * + * Refer to ssam_request_sync_onstack() for more details on the behavior of + * the generated function. + */ +#define SSAM_DEFINE_SYNC_REQUEST_CL_WR(name, atype, rtype, spec...) \ + SSAM_DEFINE_SYNC_REQUEST_MD_WR(__raw_##name, atype, rtype, spec) \ + static int name(struct ssam_device *sdev, const atype *arg, rtype *ret) \ + { \ + return __raw_##name(sdev->ctrl, sdev->uid.target, \ + sdev->uid.instance, arg, ret); \ + } + /* -- Helpers for client-device notifiers. ---------------------------------- */ -- cgit From 4a4ab610b8ae912c28a4fd28442ef24ed7a1a5bd Mon Sep 17 00:00:00 2001 From: Maximilian Luz Date: Fri, 24 Jun 2022 22:57:58 +0200 Subject: platform/surface: aggregator: Move device registry helper functions to core module Move helper functions for client device registration to the core module. This simplifies addition of future DT/OF support and also allows us to split out the device hub drivers into their own module. At the same time, also improve device node validation a bit by not silently skipping devices with invalid device UID specifiers. Further, ensure proper lifetime management for the firmware/software nodes associated with the added devices. Signed-off-by: Maximilian Luz Link: https://lore.kernel.org/r/20220624205800.1355621-2-luzmaximilian@gmail.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/surface_aggregator/device.h | 52 +++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) (limited to 'include/linux') diff --git a/include/linux/surface_aggregator/device.h b/include/linux/surface_aggregator/device.h index 6cf7e80312d5..46c45d1b6368 100644 --- a/include/linux/surface_aggregator/device.h +++ b/include/linux/surface_aggregator/device.h @@ -15,6 +15,7 @@ #include #include +#include #include #include @@ -375,11 +376,62 @@ void ssam_device_driver_unregister(struct ssam_device_driver *d); /* -- Helpers for controller and hub devices. ------------------------------- */ #ifdef CONFIG_SURFACE_AGGREGATOR_BUS + +int __ssam_register_clients(struct device *parent, struct ssam_controller *ctrl, + struct fwnode_handle *node); void ssam_remove_clients(struct device *dev); + #else /* CONFIG_SURFACE_AGGREGATOR_BUS */ + +static inline int __ssam_register_clients(struct device *parent, struct ssam_controller *ctrl, + struct fwnode_handle *node) +{ + return 0; +} + static inline void ssam_remove_clients(struct device *dev) {} + #endif /* CONFIG_SURFACE_AGGREGATOR_BUS */ +/** + * ssam_register_clients() - Register all client devices defined under the + * given parent device. + * @dev: The parent device under which clients should be registered. + * @ctrl: The controller with which client should be registered. + * + * Register all clients that have via firmware nodes been defined as children + * of the given (parent) device. The respective child firmware nodes will be + * associated with the correspondingly created child devices. + * + * The given controller will be used to instantiate the new devices. See + * ssam_device_add() for details. + * + * Return: Returns zero on success, nonzero on failure. + */ +static inline int ssam_register_clients(struct device *dev, struct ssam_controller *ctrl) +{ + return __ssam_register_clients(dev, ctrl, dev_fwnode(dev)); +} + +/** + * ssam_device_register_clients() - Register all client devices defined under + * the given SSAM parent device. + * @sdev: The parent device under which clients should be registered. + * + * Register all clients that have via firmware nodes been defined as children + * of the given (parent) device. The respective child firmware nodes will be + * associated with the correspondingly created child devices. + * + * The controller used by the parent device will be used to instantiate the new + * devices. See ssam_device_add() for details. + * + * Return: Returns zero on success, nonzero on failure. + */ +static inline int ssam_device_register_clients(struct ssam_device *sdev) +{ + return ssam_register_clients(&sdev->dev, sdev->ctrl); +} + /* -- Helpers for client-device requests. ----------------------------------- */ -- cgit From 504148fedb854299972d164b001357b888a9193e Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Thu, 30 Jun 2022 15:07:50 +0000 Subject: net: add skb_[inner_]tcp_all_headers helpers Most drivers use "skb_transport_offset(skb) + tcp_hdrlen(skb)" to compute headers length for a TCP packet, but others use more convoluted (but equivalent) ways. Add skb_tcp_all_headers() and skb_inner_tcp_all_headers() helpers to harmonize this a bit. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/tcp.h | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) (limited to 'include/linux') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index 1168302b7927..a9fbe22732c3 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -46,6 +46,36 @@ static inline unsigned int inner_tcp_hdrlen(const struct sk_buff *skb) return inner_tcp_hdr(skb)->doff * 4; } +/** + * skb_tcp_all_headers - Returns size of all headers for a TCP packet + * @skb: buffer + * + * Used in TX path, for a packet known to be a TCP one. + * + * if (skb_is_gso(skb)) { + * int hlen = skb_tcp_all_headers(skb); + * ... + */ +static inline int skb_tcp_all_headers(const struct sk_buff *skb) +{ + return skb_transport_offset(skb) + tcp_hdrlen(skb); +} + +/** + * skb_inner_tcp_all_headers - Returns size of all headers for an encap TCP packet + * @skb: buffer + * + * Used in TX path, for a packet known to be a TCP one. + * + * if (skb_is_gso(skb) && skb->encapsulation) { + * int hlen = skb_inner_tcp_all_headers(skb); + * ... + */ +static inline int skb_inner_tcp_all_headers(const struct sk_buff *skb) +{ + return skb_inner_transport_offset(skb) + inner_tcp_hdrlen(skb); +} + static inline unsigned int tcp_optlen(const struct sk_buff *skb) { return (tcp_hdr(skb)->doff - 5) * 4; -- cgit From f019679ea5f2ab650c3348a79e7d9c3625f62899 Mon Sep 17 00:00:00 2001 From: Chris Mi Date: Mon, 30 May 2022 06:07:57 +0300 Subject: net/mlx5: E-switch, Remove dependency between sriov and eswitch mode Currently, there are three eswitch modes, none, legacy and switchdev. None is the default mode. Remove redundant none mode as eswitch mode should always be either legacy mode or switchdev mode. With this patch, there are two behavior changes: 1. Legacy becomes the default mode. When querying eswitch mode using devlink, a valid mode is always returned. 2. When disabling sriov, the eswitch mode will not change, only vfs are unloaded. Signed-off-by: Chris Mi Reviewed-by: Maor Dickman Reviewed-by: Roi Dayan Signed-off-by: Saeed Mahameed --- include/linux/mlx5/eswitch.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/eswitch.h b/include/linux/mlx5/eswitch.h index 8b18fe9771f9..e2701ed0200e 100644 --- a/include/linux/mlx5/eswitch.h +++ b/include/linux/mlx5/eswitch.h @@ -12,7 +12,6 @@ #define MLX5_ESWITCH_MANAGER(mdev) MLX5_CAP_GEN(mdev, eswitch_manager) enum { - MLX5_ESWITCH_NONE, MLX5_ESWITCH_LEGACY, MLX5_ESWITCH_OFFLOADS }; @@ -153,7 +152,7 @@ struct mlx5_core_dev *mlx5_eswitch_get_core_dev(struct mlx5_eswitch *esw); static inline u8 mlx5_eswitch_mode(const struct mlx5_core_dev *dev) { - return MLX5_ESWITCH_NONE; + return MLX5_ESWITCH_LEGACY; } static inline enum devlink_eswitch_encap_mode @@ -198,6 +197,11 @@ static inline struct mlx5_core_dev *mlx5_eswitch_get_core_dev(struct mlx5_eswitc #endif /* CONFIG_MLX5_ESWITCH */ +static inline bool is_mdev_legacy_mode(struct mlx5_core_dev *dev) +{ + return mlx5_eswitch_mode(dev) == MLX5_ESWITCH_LEGACY; +} + static inline bool is_mdev_switchdev_mode(struct mlx5_core_dev *dev) { return mlx5_eswitch_mode(dev) == MLX5_ESWITCH_OFFLOADS; -- cgit From 036bff2800cbcf8217dd0bc93d8421b5b8f72476 Mon Sep 17 00:00:00 2001 From: Dario Binacchi Date: Tue, 28 Jun 2022 18:31:28 +0200 Subject: can: netlink: dump bitrate 0 if can_priv::bittiming.bitrate is -1U Upcoming changes on slcan driver will require you to specify a bitrate of value -1 to prevent the open_candev() from failing but at the same time highlighting that it is a fake value. In this case the command `ip --details -s -s link show' would print 4294967295 as the bitrate value. The patch change this value in 0. Link: https://lore.kernel.org/all/20220628163137.413025-5-dario.binacchi@amarulasolutions.com Suggested-by: Marc Kleine-Budde Signed-off-by: Dario Binacchi Tested-by: Jeroen Hofstee Signed-off-by: Marc Kleine-Budde --- include/linux/can/bittiming.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/can/bittiming.h b/include/linux/can/bittiming.h index 7ae21c0f7f23..ef0a77173e3c 100644 --- a/include/linux/can/bittiming.h +++ b/include/linux/can/bittiming.h @@ -11,6 +11,8 @@ #define CAN_SYNC_SEG 1 +#define CAN_BITRATE_UNSET 0 +#define CAN_BITRATE_UNKNOWN (-1U) #define CAN_CTRLMODE_TDC_MASK \ (CAN_CTRLMODE_TDC_AUTO | CAN_CTRLMODE_TDC_MANUAL) -- cgit From 4a557a5d1a6145ea586dc9b17a9b4e5190c9c017 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Thu, 30 Jun 2022 09:34:10 -0700 Subject: sparse: introduce conditional lock acquire function attribute The kernel tends to try to avoid conditional locking semantics because it makes it harder to think about and statically check locking rules, but we do have a few fundamental locking primitives that take locks conditionally - most obviously the 'trylock' functions. That has always been a problem for 'sparse' checking for locking imbalance, and we've had a special '__cond_lock()' macro that we've used to let sparse know how the locking works: # define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0) so that you can then use this to tell sparse that (for example) the spinlock trylock macro ends up acquiring the lock when it succeeds, but not when it fails: #define raw_spin_trylock(lock) __cond_lock(lock, _raw_spin_trylock(lock)) and then sparse can follow along the locking rules when you have code like if (!spin_trylock(&dentry->d_lock)) return LRU_SKIP; .. sparse sees that the lock is held here.. spin_unlock(&dentry->d_lock); and sparse ends up happy about the lock contexts. However, this '__cond_lock()' use does result in very ugly header files, and requires you to basically wrap the real function with that macro that uses '__cond_lock'. Which has made PeterZ NAK things that try to fix sparse warnings over the years [1]. To solve this, there is now a very experimental patch to sparse that basically does the exact same thing as '__cond_lock()' did, but using a function attribute instead. That seems to make PeterZ happy [2]. Note that this does not replace existing use of '__cond_lock()', but only exposes the new proposed attribute and uses it for the previously unannotated 'refcount_dec_and_lock()' family of functions. For existing sparse installations, this will make no difference (a negative output context was ignored), but if you have the experimental sparse patch it will make sparse now understand code that uses those functions, the same way '__cond_lock()' makes sparse understand the very similar 'atomic_dec_and_lock()' uses that have the old '__cond_lock()' annotations. Note that in some cases this will silence existing context imbalance warnings. But in other cases it may end up exposing new sparse warnings for code that sparse just didn't see the locking for at all before. This is a trial, in other words. I'd expect that if it ends up being successful, and new sparse releases end up having this new attribute, we'll migrate the old-style '__cond_lock()' users to use the new-style '__cond_acquires' function attribute. The actual experimental sparse patch was posted in [3]. Link: https://lore.kernel.org/all/20130930134434.GC12926@twins.programming.kicks-ass.net/ [1] Link: https://lore.kernel.org/all/Yr60tWxN4P568x3W@worktop.programming.kicks-ass.net/ [2] Link: https://lore.kernel.org/all/CAHk-=wjZfO9hGqJ2_hGQG3U_XzSh9_XaXze=HgPdvJbgrvASfA@mail.gmail.com/ [3] Acked-by: Peter Zijlstra Cc: Alexander Aring Cc: Luc Van Oostenryck Signed-off-by: Linus Torvalds --- include/linux/compiler_types.h | 2 ++ include/linux/refcount.h | 6 +++--- 2 files changed, 5 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h index d08dfcb0ac68..4f2a819fd60a 100644 --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -24,6 +24,7 @@ static inline void __chk_io_ptr(const volatile void __iomem *ptr) { } /* context/locking */ # define __must_hold(x) __attribute__((context(x,1,1))) # define __acquires(x) __attribute__((context(x,0,1))) +# define __cond_acquires(x) __attribute__((context(x,0,-1))) # define __releases(x) __attribute__((context(x,1,0))) # define __acquire(x) __context__(x,1) # define __release(x) __context__(x,-1) @@ -50,6 +51,7 @@ static inline void __chk_io_ptr(const volatile void __iomem *ptr) { } /* context/locking */ # define __must_hold(x) # define __acquires(x) +# define __cond_acquires(x) # define __releases(x) # define __acquire(x) (void)0 # define __release(x) (void)0 diff --git a/include/linux/refcount.h b/include/linux/refcount.h index b8a6e387f8f9..a62fcca97486 100644 --- a/include/linux/refcount.h +++ b/include/linux/refcount.h @@ -361,9 +361,9 @@ static inline void refcount_dec(refcount_t *r) extern __must_check bool refcount_dec_if_one(refcount_t *r); extern __must_check bool refcount_dec_not_one(refcount_t *r); -extern __must_check bool refcount_dec_and_mutex_lock(refcount_t *r, struct mutex *lock); -extern __must_check bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock); +extern __must_check bool refcount_dec_and_mutex_lock(refcount_t *r, struct mutex *lock) __cond_acquires(lock); +extern __must_check bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock) __cond_acquires(lock); extern __must_check bool refcount_dec_and_lock_irqsave(refcount_t *r, spinlock_t *lock, - unsigned long *flags); + unsigned long *flags) __cond_acquires(lock); #endif /* _LINUX_REFCOUNT_H */ -- cgit From b8d5109f50969ead9d49c3e8bd78ec1f82e548e3 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 3 Jul 2022 14:40:28 -0700 Subject: lockref: remove unused 'lockref_get_or_lock()' function Looking at the conditional lock acquire functions in the kernel due to the new sparse support (see commit 4a557a5d1a61 "sparse: introduce conditional lock acquire function attribute"), it became obvious that the lockref code has a couple of them, but they don't match the usual naming convention for the other ones, and their return value logic is also reversed. In the other very similar places, the naming pattern is '*_and_lock()' (eg 'atomic_put_and_lock()' and 'refcount_dec_and_lock()'), and the function returns true when the lock is taken. The lockref code is superficially very similar to the refcount code, only with the special "atomic wrt the embedded lock" semantics. But instead of the '*_and_lock()' naming it uses '*_or_lock()'. And instead of returning true in case it took the lock, it returns true if it *didn't* take the lock. Now, arguably the reflock code is quite logical: it really is a "either decrement _or_ lock" kind of situation - and the return value is about whether the operation succeeded without any special care needed. So despite the similarities, the differences do make some sense, and maybe it's not worth trying to unify the different conditional locking primitives in this area. But while looking at this all, it did become obvious that the 'lockref_get_or_lock()' function hasn't actually had any users for almost a decade. The only user it ever had was the shortlived 'd_rcu_to_refcount()' function, and it got removed and replaced with 'lockref_get_not_dead()' back in 2013 in commits 0d98439ea3c6 ("vfs: use lockred 'dead' flag to mark unrecoverably dead dentries") and e5c832d55588 ("vfs: fix dentry RCU to refcounting possibly sleeping dput()") In fact, that single use was removed less than a week after the whole function was introduced in commit b3abd80250c1 ("lockref: add 'lockref_get_or_lock() helper") so this function has been around for a decade, but only had a user for six days. Let's just put this mis-designed and unused function out of its misery. We can think about the naming and semantic oddities of the remaining 'lockref_put_or_lock()' later, but at least that function has users. And while the naming is different and the return value doesn't match, that function matches the whole '{atomic,refcount}_dec_and_test()' pattern much better (ie the magic happens when the count goes down to zero, not when it is incremented from zero). Signed-off-by: Linus Torvalds --- include/linux/lockref.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/lockref.h b/include/linux/lockref.h index 99f17cc8e163..c3a1f78bc884 100644 --- a/include/linux/lockref.h +++ b/include/linux/lockref.h @@ -38,7 +38,6 @@ extern void lockref_get(struct lockref *); extern int lockref_put_return(struct lockref *); extern int lockref_get_not_zero(struct lockref *); extern int lockref_put_not_zero(struct lockref *); -extern int lockref_get_or_lock(struct lockref *); extern int lockref_put_or_lock(struct lockref *); extern void lockref_mark_dead(struct lockref *); -- cgit From cffe57bee62b155c08d71218fc0e9e84a0a90bbb Mon Sep 17 00:00:00 2001 From: Bagas Sanjaya Date: Wed, 22 Jun 2022 15:45:46 +0700 Subject: Documentation: highmem: use literal block for code example in highmem.h comment When building htmldocs on Linus's tree, there are inline emphasis warnings on include/linux/highmem.h: Documentation/vm/highmem:166: ./include/linux/highmem.h:154: WARNING: Inline emphasis start-string without end-string. Documentation/vm/highmem:166: ./include/linux/highmem.h:157: WARNING: Inline emphasis start-string without end-string. These warnings above are due to comments in code example at the mentioned lines above are enclosed by double dash (--), which confuses Sphinx as inline markup delimiters instead. Fix these warnings by indenting the code example with literal block indentation and making the comments C comments. Link: https://lkml.kernel.org/r/20220622084546.17745-1-bagasdotme@gmail.com Fixes: 85a85e7601263f ("Documentation/vm: move "Using kmap-atomic" to highmem.h") Signed-off-by: Bagas Sanjaya Reviewed-by: Ira Weiny Tested-by: Ira Weiny Cc: "Matthew Wilcox (Oracle)" Cc: "Fabio M. De Francesco" Cc: Sebastian Andrzej Siewior Signed-off-by: Andrew Morton --- include/linux/highmem.h | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/highmem.h b/include/linux/highmem.h index 3af34de54330..56d6a0196534 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -149,19 +149,19 @@ static inline void *kmap_local_folio(struct folio *folio, size_t offset); * It is used in atomic context when code wants to access the contents of a * page that might be allocated from high memory (see __GFP_HIGHMEM), for * example a page in the pagecache. The API has two functions, and they - * can be used in a manner similar to the following: + * can be used in a manner similar to the following:: * - * -- Find the page of interest. -- - * struct page *page = find_get_page(mapping, offset); + * // Find the page of interest. + * struct page *page = find_get_page(mapping, offset); * - * -- Gain access to the contents of that page. -- - * void *vaddr = kmap_atomic(page); + * // Gain access to the contents of that page. + * void *vaddr = kmap_atomic(page); * - * -- Do something to the contents of that page. -- - * memset(vaddr, 0, PAGE_SIZE); + * // Do something to the contents of that page. + * memset(vaddr, 0, PAGE_SIZE); * - * -- Unmap that page. -- - * kunmap_atomic(vaddr); + * // Unmap that page. + * kunmap_atomic(vaddr); * * Note that the kunmap_atomic() call takes the result of the kmap_atomic() * call, not the argument. -- cgit From 507db7927cd181d409dd495c8384b8e14c21c600 Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Sun, 3 Jul 2022 18:08:36 -0700 Subject: mm: rmap: use the correct parameter name for DEFINE_PAGE_VMA_WALK The parameter used by DEFINE_PAGE_VMA_WALK is _page not page, fix the parameter name. It didn't cause any build error, it is probably because the only caller is write_protect_page() from ksm.c, which pass in page. Link: https://lkml.kernel.org/r/20220512174551.81279-1-shy828301@gmail.com Fixes: 2aff7a4755be ("mm: Convert page_vma_mapped_walk to work on PFNs") Signed-off-by: Yang Shi Reviewed-by: Muchun Song Reviewed-by: Matthew Wilcox (Oracle) Cc: Muchun Song Signed-off-by: Andrew Morton --- include/linux/rmap.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 9ec23138e410..bf80adca980b 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -325,8 +325,8 @@ struct page_vma_mapped_walk { #define DEFINE_PAGE_VMA_WALK(name, _page, _vma, _address, _flags) \ struct page_vma_mapped_walk name = { \ .pfn = page_to_pfn(_page), \ - .nr_pages = compound_nr(page), \ - .pgoff = page_to_pgoff(page), \ + .nr_pages = compound_nr(_page), \ + .pgoff = page_to_pgoff(_page), \ .vma = _vma, \ .address = _address, \ .flags = _flags, \ -- cgit From c453d8c7d1384d7e1d7f26d3ec0d527092edf801 Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Fri, 13 May 2022 12:17:05 -0700 Subject: mm/page_vma_mapped.c: check possible huge PMD map with transhuge_vma_suitable() IIUC page_vma_mapped_walk() checks if the vma is possibly huge PMD mapped with transparent_hugepage_active() and "pvmw->nr_pages >= HPAGE_PMD_NR". Actually pvmw->nr_pages is returned by compound_nr() or folio_nr_pages(), so the page should be THP as long as "pvmw->nr_pages >= HPAGE_PMD_NR". And it is guaranteed THP is allocated for valid VMA in the first place. But it may be not PMD mapped if the VMA is file VMA and it is not properly aligned. The transhuge_vma_suitable() is used to do such check, so replace transparent_hugepage_active() to it, which is too heavy and overkilling. Link: https://lkml.kernel.org/r/20220513191705.457775-1-shy828301@gmail.com Signed-off-by: Yang Shi Reviewed-by: Muchun Song Cc: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/huge_mm.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index de29821231c9..648cb3ce7099 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -117,8 +117,10 @@ extern struct kobj_attribute shmem_enabled_attr; extern unsigned long transparent_hugepage_flags; static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, - unsigned long haddr) + unsigned long addr) { + unsigned long haddr; + /* Don't have to check pgoff for anonymous vma */ if (!vma_is_anonymous(vma)) { if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff, @@ -126,6 +128,8 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, return false; } + haddr = addr & HPAGE_PMD_MASK; + if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end) return false; return true; @@ -342,7 +346,7 @@ static inline bool transparent_hugepage_active(struct vm_area_struct *vma) } static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, - unsigned long haddr) + unsigned long addr) { return false; } -- cgit From 7ce82f4c3f3ead13a9d9498768e3b1a79975c4d8 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Mon, 30 May 2022 19:30:15 +0800 Subject: mm/migration: return errno when isolate_huge_page failed We might fail to isolate huge page due to e.g. the page is under migration which cleared HPageMigratable. We should return errno in this case rather than always return 1 which could confuse the user, i.e. the caller might think all of the memory is migrated while the hugetlb page is left behind. We make the prototype of isolate_huge_page consistent with isolate_lru_page as suggested by Huang Ying and rename isolate_huge_page to isolate_hugetlb as suggested by Muchun to improve the readability. Link: https://lkml.kernel.org/r/20220530113016.16663-4-linmiaohe@huawei.com Fixes: e8db67eb0ded ("mm: migrate: move_pages() supports thp migration") Signed-off-by: Miaohe Lin Suggested-by: Huang Ying Reported-by: kernel test robot (build error) Cc: Alistair Popple Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Hildenbrand Cc: David Howells Cc: Mike Kravetz Cc: Muchun Song Cc: Oscar Salvador Cc: Peter Xu Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index e4cff27d1198..756b66ff025e 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -170,7 +170,7 @@ bool hugetlb_reserve_pages(struct inode *inode, long from, long to, vm_flags_t vm_flags); long hugetlb_unreserve_pages(struct inode *inode, long start, long end, long freed); -bool isolate_huge_page(struct page *page, struct list_head *list); +int isolate_hugetlb(struct page *page, struct list_head *list); int get_hwpoison_huge_page(struct page *page, bool *hugetlb); int get_huge_page_for_hwpoison(unsigned long pfn, int flags); void putback_active_hugepage(struct page *page); @@ -376,9 +376,9 @@ static inline pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, return NULL; } -static inline bool isolate_huge_page(struct page *page, struct list_head *list) +static inline int isolate_hugetlb(struct page *page, struct list_head *list) { - return false; + return -EBUSY; } static inline int get_hwpoison_huge_page(struct page *page, bool *hugetlb) -- cgit From ad1ac596e8a8c4b06715dfbd89853eb73c9886b2 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Mon, 30 May 2022 19:30:16 +0800 Subject: mm/migration: fix potential pte_unmap on an not mapped pte __migration_entry_wait and migration_entry_wait_on_locked assume pte is always mapped from caller. But this is not the case when it's called from migration_entry_wait_huge and follow_huge_pmd. Add a hugetlbfs variant that calls hugetlb_migration_entry_wait(ptep == NULL) to fix this issue. Link: https://lkml.kernel.org/r/20220530113016.16663-5-linmiaohe@huawei.com Fixes: 30dad30922cc ("mm: migration: add migrate_entry_wait_huge()") Signed-off-by: Miaohe Lin Suggested-by: David Hildenbrand Reviewed-by: David Hildenbrand Cc: Alistair Popple Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Howells Cc: Huang Ying Cc: kernel test robot Cc: Mike Kravetz Cc: Muchun Song Cc: Oscar Salvador Cc: Peter Xu Signed-off-by: Andrew Morton --- include/linux/swapops.h | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swapops.h b/include/linux/swapops.h index f24775b41880..bb7afd03a324 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -244,8 +244,10 @@ extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep, spinlock_t *ptl); extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address); -extern void migration_entry_wait_huge(struct vm_area_struct *vma, - struct mm_struct *mm, pte_t *pte); +#ifdef CONFIG_HUGETLB_PAGE +extern void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl); +extern void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte); +#endif #else static inline swp_entry_t make_readable_migration_entry(pgoff_t offset) { @@ -271,8 +273,10 @@ static inline void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep, spinlock_t *ptl) { } static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address) { } -static inline void migration_entry_wait_huge(struct vm_area_struct *vma, - struct mm_struct *mm, pte_t *pte) { } +#ifdef CONFIG_HUGETLB_PAGE +static inline void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl) { } +static inline void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { } +#endif static inline int is_writable_migration_entry(swp_entry_t entry) { return 0; -- cgit From c9e124e0382d83d458db204f929002ea98daa6a8 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Mon, 6 Jun 2022 18:23:06 +0000 Subject: mm/damon/{dbgfs,sysfs}: move target_has_pid() from dbgfs to damon.h The function for knowing if given monitoring context's targets will have pid or not is defined and used in dbgfs only. However, the logic is also needed for sysfs. This commit moves the code to damon.h and makes both dbgfs and sysfs to use it. Link: https://lkml.kernel.org/r/20220606182310.48781-3-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- include/linux/damon.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 2765c7d99beb..b9aae19fab3e 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -525,6 +525,12 @@ bool damon_is_registered_ops(enum damon_ops_id id); int damon_register_ops(struct damon_operations *ops); int damon_select_ops(struct damon_ctx *ctx, enum damon_ops_id id); +static inline bool damon_target_has_pid(const struct damon_ctx *ctx) +{ + return ctx->ops.id == DAMON_OPS_VADDR || ctx->ops.id == DAMON_OPS_FVADDR; +} + + int damon_start(struct damon_ctx **ctxs, int nr_ctxs, bool exclusive); int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); -- cgit From d9da8f6cf55eeca642c021912af1890002464c64 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Thu, 9 Jun 2022 20:18:46 +0200 Subject: mm: introduce clear_highpage_kasan_tagged Add a clear_highpage_kasan_tagged() helper that does clear_highpage() on a page potentially tagged by KASAN. This helper is used by the following patch. Link: https://lkml.kernel.org/r/4471979b46b2c487787ddcd08b9dc5fedd1b6ffd.1654798516.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Dmitry Vyukov Cc: Marco Elver Signed-off-by: Andrew Morton --- include/linux/highmem.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/highmem.h b/include/linux/highmem.h index fee9835e3793..22379a63e293 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -243,6 +243,16 @@ static inline void clear_highpage(struct page *page) kunmap_local(kaddr); } +static inline void clear_highpage_kasan_tagged(struct page *page) +{ + u8 tag; + + tag = page_kasan_tag(page); + page_kasan_tag_reset(page); + clear_highpage(page); + page_kasan_tag_set(page, tag); +} + #ifndef __HAVE_ARCH_TAG_CLEAR_HIGHPAGE static inline void tag_clear_highpage(struct page *page) -- cgit From c15187a4a2d660bf490f7873afd0de5288f65c8f Mon Sep 17 00:00:00 2001 From: Roman Gushchin Date: Tue, 31 May 2022 20:22:22 -0700 Subject: mm: memcontrol: introduce mem_cgroup_ino() and mem_cgroup_get_from_ino() Patch series "mm: introduce shrinker debugfs interface", v5. The only existing debugging mechanism is a couple of tracepoints in do_shrink_slab(): mm_shrink_slab_start and mm_shrink_slab_end. They aren't covering everything though: shrinkers which report 0 objects will never show up, there is no support for memcg-aware shrinkers. Shrinkers are identified by their scan function, which is not always enough (e.g. hard to guess which super block's shrinker it is having only "super_cache_scan"). To provide a better visibility and debug options for memory shrinkers this patchset introduces a /sys/kernel/debug/shrinker interface, to some extent similar to /sys/kernel/slab. For each shrinker registered in the system a directory is created. As now, the directory will contain only a "scan" file, which allows to get the number of managed objects for each memory cgroup (for memcg-aware shrinkers) and each numa node (for numa-aware shrinkers on a numa machine). Other interfaces might be added in the future. To make debugging more pleasant, the patchset also names all shrinkers, so that debugfs entries can have meaningful names. This patch (of 5): Shrinker debugfs requires a way to represent memory cgroups without using full paths, both for displaying information and getting input from a user. Cgroup inode number is a perfect way, already used by bpf. This commit adds a couple of helper functions which will be used to handle memcg-aware shrinkers. Link: https://lkml.kernel.org/r/20220601032227.4076670-1-roman.gushchin@linux.dev Link: https://lkml.kernel.org/r/20220601032227.4076670-2-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin Acked-by: Muchun Song Cc: Dave Chinner Cc: Kent Overstreet Cc: Hillf Danton Cc: Christophe JAILLET Cc: Roman Gushchin Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 04f2f33607e9..4d31ce55b1c0 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -837,6 +837,15 @@ static inline unsigned short mem_cgroup_id(struct mem_cgroup *memcg) } struct mem_cgroup *mem_cgroup_from_id(unsigned short id); +#ifdef CONFIG_SHRINKER_DEBUG +static inline unsigned long mem_cgroup_ino(struct mem_cgroup *memcg) +{ + return memcg ? cgroup_ino(memcg->css.cgroup) : 0; +} + +struct mem_cgroup *mem_cgroup_get_from_ino(unsigned long ino); +#endif + static inline struct mem_cgroup *mem_cgroup_from_seq(struct seq_file *m) { return mem_cgroup_from_css(seq_css(m)); @@ -1343,6 +1352,18 @@ static inline struct mem_cgroup *mem_cgroup_from_id(unsigned short id) return NULL; } +#ifdef CONFIG_SHRINKER_DEBUG +static inline unsigned long mem_cgroup_ino(struct mem_cgroup *memcg) +{ + return 0; +} + +static inline struct mem_cgroup *mem_cgroup_get_from_ino(unsigned long ino) +{ + return NULL; +} +#endif + static inline struct mem_cgroup *mem_cgroup_from_seq(struct seq_file *m) { return NULL; -- cgit From 5035ebc644aec92d55d1bbfe042f35341e4bffb5 Mon Sep 17 00:00:00 2001 From: Roman Gushchin Date: Tue, 31 May 2022 20:22:23 -0700 Subject: mm: shrinkers: introduce debugfs interface for memory shrinkers This commit introduces the /sys/kernel/debug/shrinker debugfs interface which provides an ability to observe the state of individual kernel memory shrinkers. Because the feature adds some memory overhead (which shouldn't be large unless there is a huge amount of registered shrinkers), it's guarded by a config option (enabled by default). This commit introduces the "count" interface for each shrinker registered in the system. The output is in the following format: ... ... ... To reduce the size of output on machines with many thousands cgroups, if the total number of objects on all nodes is 0, the line is omitted. If the shrinker is not memcg-aware or CONFIG_MEMCG is off, 0 is printed as cgroup inode id. If the shrinker is not numa-aware, 0's are printed for all nodes except the first one. This commit gives debugfs entries simple numeric names, which are not very convenient. The following commit in the series will provide shrinkers with more meaningful names. [akpm@linux-foundation.org: remove WARN_ON_ONCE(), per Roman] Reported-by: syzbot+300d27c79fe6d4cbcc39@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/20220601032227.4076670-3-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin Reviewed-by: Kent Overstreet Acked-by: Muchun Song Cc: Christophe JAILLET Cc: Dave Chinner Cc: Hillf Danton Signed-off-by: Andrew Morton --- include/linux/shrinker.h | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h index 76fbf92b04d9..2ced8149c513 100644 --- a/include/linux/shrinker.h +++ b/include/linux/shrinker.h @@ -72,6 +72,10 @@ struct shrinker { #ifdef CONFIG_MEMCG /* ID in shrinker_idr */ int id; +#endif +#ifdef CONFIG_SHRINKER_DEBUG + int debugfs_id; + struct dentry *debugfs_entry; #endif /* objs pending delete, per node */ atomic_long_t *nr_deferred; @@ -94,4 +98,17 @@ extern int register_shrinker(struct shrinker *shrinker); extern void unregister_shrinker(struct shrinker *shrinker); extern void free_prealloced_shrinker(struct shrinker *shrinker); extern void synchronize_shrinkers(void); -#endif + +#ifdef CONFIG_SHRINKER_DEBUG +extern int shrinker_debugfs_add(struct shrinker *shrinker); +extern void shrinker_debugfs_remove(struct shrinker *shrinker); +#else /* CONFIG_SHRINKER_DEBUG */ +static inline int shrinker_debugfs_add(struct shrinker *shrinker) +{ + return 0; +} +static inline void shrinker_debugfs_remove(struct shrinker *shrinker) +{ +} +#endif /* CONFIG_SHRINKER_DEBUG */ +#endif /* _LINUX_SHRINKER_H */ -- cgit From e33c267ab70de4249d22d7eab1cc7d68a889bac2 Mon Sep 17 00:00:00 2001 From: Roman Gushchin Date: Tue, 31 May 2022 20:22:24 -0700 Subject: mm: shrinkers: provide shrinkers with names Currently shrinkers are anonymous objects. For debugging purposes they can be identified by count/scan function names, but it's not always useful: e.g. for superblock's shrinkers it's nice to have at least an idea of to which superblock the shrinker belongs. This commit adds names to shrinkers. register_shrinker() and prealloc_shrinker() functions are extended to take a format and arguments to master a name. In some cases it's not possible to determine a good name at the time when a shrinker is allocated. For such cases shrinker_debugfs_rename() is provided. The expected format is: -[:]- For some shrinkers an instance can be encoded as (MAJOR:MINOR) pair. After this change the shrinker debugfs directory looks like: $ cd /sys/kernel/debug/shrinker/ $ ls dquota-cache-16 sb-devpts-28 sb-proc-47 sb-tmpfs-42 mm-shadow-18 sb-devtmpfs-5 sb-proc-48 sb-tmpfs-43 mm-zspool:zram0-34 sb-hugetlbfs-17 sb-pstore-31 sb-tmpfs-44 rcu-kfree-0 sb-hugetlbfs-33 sb-rootfs-2 sb-tmpfs-49 sb-aio-20 sb-iomem-12 sb-securityfs-6 sb-tracefs-13 sb-anon_inodefs-15 sb-mqueue-21 sb-selinuxfs-22 sb-xfs:vda1-36 sb-bdev-3 sb-nsfs-4 sb-sockfs-8 sb-zsmalloc-19 sb-bpf-32 sb-pipefs-14 sb-sysfs-26 thp-deferred_split-10 sb-btrfs:vda2-24 sb-proc-25 sb-tmpfs-1 thp-zero-9 sb-cgroup2-30 sb-proc-39 sb-tmpfs-27 xfs-buf:vda1-37 sb-configfs-23 sb-proc-41 sb-tmpfs-29 xfs-inodegc:vda1-38 sb-dax-11 sb-proc-45 sb-tmpfs-35 sb-debugfs-7 sb-proc-46 sb-tmpfs-40 [roman.gushchin@linux.dev: fix build warnings] Link: https://lkml.kernel.org/r/Yr+ZTnLb9lJk6fJO@castle Reported-by: kernel test robot Link: https://lkml.kernel.org/r/20220601032227.4076670-4-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin Cc: Christophe JAILLET Cc: Dave Chinner Cc: Hillf Danton Cc: Kent Overstreet Cc: Muchun Song Signed-off-by: Andrew Morton --- include/linux/shrinker.h | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h index 2ced8149c513..08e6054e061f 100644 --- a/include/linux/shrinker.h +++ b/include/linux/shrinker.h @@ -75,6 +75,7 @@ struct shrinker { #endif #ifdef CONFIG_SHRINKER_DEBUG int debugfs_id; + const char *name; struct dentry *debugfs_entry; #endif /* objs pending delete, per node */ @@ -92,9 +93,11 @@ struct shrinker { */ #define SHRINKER_NONSLAB (1 << 3) -extern int prealloc_shrinker(struct shrinker *shrinker); +extern int __printf(2, 3) prealloc_shrinker(struct shrinker *shrinker, + const char *fmt, ...); extern void register_shrinker_prepared(struct shrinker *shrinker); -extern int register_shrinker(struct shrinker *shrinker); +extern int __printf(2, 3) register_shrinker(struct shrinker *shrinker, + const char *fmt, ...); extern void unregister_shrinker(struct shrinker *shrinker); extern void free_prealloced_shrinker(struct shrinker *shrinker); extern void synchronize_shrinkers(void); @@ -102,6 +105,8 @@ extern void synchronize_shrinkers(void); #ifdef CONFIG_SHRINKER_DEBUG extern int shrinker_debugfs_add(struct shrinker *shrinker); extern void shrinker_debugfs_remove(struct shrinker *shrinker); +extern int __printf(2, 3) shrinker_debugfs_rename(struct shrinker *shrinker, + const char *fmt, ...); #else /* CONFIG_SHRINKER_DEBUG */ static inline int shrinker_debugfs_add(struct shrinker *shrinker) { @@ -110,5 +115,10 @@ static inline int shrinker_debugfs_add(struct shrinker *shrinker) static inline void shrinker_debugfs_remove(struct shrinker *shrinker) { } +static inline __printf(2, 3) +int shrinker_debugfs_rename(struct shrinker *shrinker, const char *fmt, ...) +{ + return 0; +} #endif /* CONFIG_SHRINKER_DEBUG */ #endif /* _LINUX_SHRINKER_H */ -- cgit From 8cdcc532268df0893d9756f537cbce479f4c4831 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Mon, 13 Jun 2022 19:22:56 +0000 Subject: mm/damon/schemes: add 'LRU_PRIO' DAMOS action This commit adds a new DAMOS action called 'LRU_PRIO' for the physical address space. The action prioritizes pages in the memory regions of the user-specified target access pattern on their LRU lists. This is hence supposed to be used for frequently accessed (hot) memory regions so that hot pages could be more likely protected under memory pressure. Internally, it simply calls 'mark_page_accessed()'. Link: https://lkml.kernel.org/r/20220613192301.8817-5-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- include/linux/damon.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index b9aae19fab3e..4c64e03e94d8 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -86,6 +86,7 @@ struct damon_target { * @DAMOS_PAGEOUT: Call ``madvise()`` for the region with MADV_PAGEOUT. * @DAMOS_HUGEPAGE: Call ``madvise()`` for the region with MADV_HUGEPAGE. * @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE. + * @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists. * @DAMOS_STAT: Do nothing but count the stat. * @NR_DAMOS_ACTIONS: Total number of DAMOS actions */ @@ -95,6 +96,7 @@ enum damos_action { DAMOS_PAGEOUT, DAMOS_HUGEPAGE, DAMOS_NOHUGEPAGE, + DAMOS_LRU_PRIO, DAMOS_STAT, /* Do nothing but only record the stat */ NR_DAMOS_ACTIONS, }; -- cgit From 99cdc2cd180a7adc87badc9ca92f8af803d8bf3b Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Mon, 13 Jun 2022 19:22:58 +0000 Subject: mm/damon/schemes: add 'LRU_DEPRIO' action This commit adds a new DAMON-based operation scheme action called 'LRU_DEPRIO' for physical address space. The action deprioritizes pages in the memory area of the target access pattern on their LRU lists. This is hence supposed to be used for rarely accessed (cold) memory regions so that cold pages could be more likely reclaimed first under memory pressure. Internally, it simply calls 'lru_deactivate()'. Using this with 'LRU_PRIO' action for hot pages, users can proactively sort LRU lists based on the access pattern. That is, it can make the LRU lists somewhat more trustworthy source of access temperature. As a result, efficiency of LRU-lists based mechanisms including the reclamation target selection could be improved. Link: https://lkml.kernel.org/r/20220613192301.8817-7-sj@kernel.org Signed-off-by: SeongJae Park Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- include/linux/damon.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 4c64e03e94d8..7b1f4a488230 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -87,6 +87,7 @@ struct damon_target { * @DAMOS_HUGEPAGE: Call ``madvise()`` for the region with MADV_HUGEPAGE. * @DAMOS_NOHUGEPAGE: Call ``madvise()`` for the region with MADV_NOHUGEPAGE. * @DAMOS_LRU_PRIO: Prioritize the region on its LRU lists. + * @DAMOS_LRU_DEPRIO: Deprioritize the region on its LRU lists. * @DAMOS_STAT: Do nothing but count the stat. * @NR_DAMOS_ACTIONS: Total number of DAMOS actions */ @@ -97,6 +98,7 @@ enum damos_action { DAMOS_HUGEPAGE, DAMOS_NOHUGEPAGE, DAMOS_LRU_PRIO, + DAMOS_LRU_DEPRIO, DAMOS_STAT, /* Do nothing but only record the stat */ NR_DAMOS_ACTIONS, }; -- cgit From 64fe24a3e05e5f3ac56fcd45afd2fd1d9cc8fcb6 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Tue, 14 Jun 2022 11:36:29 +0200 Subject: mm/mprotect: try avoiding write faults for exclusive anonymous pages when changing protection Similar to our MM_CP_DIRTY_ACCT handling for shared, writable mappings, we can try mapping anonymous pages in a private writable mapping writable if they are exclusive, the PTE is already dirty, and no special handling applies. Mapping the anonymous page writable is essentially the same thing the write fault handler would do in this case. Special handling is required for uffd-wp and softdirty tracking, so take care of that properly. Also, leave PROT_NONE handling alone for now; in the future, we could similarly extend the logic in do_numa_page() or use pte_mk_savedwrite() here. While this improves mprotect(PROT_READ)+mprotect(PROT_READ|PROT_WRITE) performance, it should also be a valuable optimization for uffd-wp, when un-protecting. This has been previously suggested by Peter Collingbourne in [1], relevant in the context of the Scudo memory allocator, before we had PageAnonExclusive. This commit doesn't add the same handling for PMDs (i.e., anonymous THP, anonymous hugetlb); benchmark results from Andrea indicate that there are minor performance gains, so it's might still be valuable to streamline that logic for all anonymous pages in the future. As we now also set MM_CP_DIRTY_ACCT for private mappings, let's rename it to MM_CP_TRY_CHANGE_WRITABLE, to make it clearer what's actually happening. Micro-benchmark courtesy of Andrea: === #define _GNU_SOURCE #include #include #include #include #include #define SIZE (1024*1024*1024) int main(int argc, char *argv[]) { char *p; if (posix_memalign((void **)&p, sysconf(_SC_PAGESIZE)*512, SIZE)) perror("posix_memalign"), exit(1); if (madvise(p, SIZE, argc > 1 ? MADV_HUGEPAGE : MADV_NOHUGEPAGE)) perror("madvise"); explicit_bzero(p, SIZE); for (int loops = 0; loops < 40; loops++) { if (mprotect(p, SIZE, PROT_READ)) perror("mprotect"), exit(1); if (mprotect(p, SIZE, PROT_READ|PROT_WRITE)) perror("mprotect"), exit(1); explicit_bzero(p, SIZE); } } === Results on my Ryzen 9 3900X: Stock 10 runs (lower is better): AVG 6.398s, STDEV 0.043 Patched 10 runs (lower is better): AVG 3.780s, STDEV 0.026 === [1] https://lkml.kernel.org/r/20210429214801.2583336-1-pcc@google.com Link: https://lkml.kernel.org/r/20220614093629.76309-1-david@redhat.com Signed-off-by: David Hildenbrand Suggested-by: Peter Collingbourne Acked-by: Peter Xu Cc: Nadav Amit Cc: Dave Hansen Cc: Andrea Arcangeli Cc: Yang Shi Cc: Hugh Dickins Cc: Mel Gorman Signed-off-by: Andrew Morton --- include/linux/mm.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index cf3d0d673f6b..09ea26056e2f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1962,8 +1962,12 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma, * for now all the callers are only use one of the flags at the same * time. */ -/* Whether we should allow dirty bit accounting */ -#define MM_CP_DIRTY_ACCT (1UL << 0) +/* + * Whether we should manually check if we can map individual PTEs writable, + * because something (e.g., COW, uffd-wp) blocks that from happening for all + * PTEs automatically in a writable mapping. + */ +#define MM_CP_TRY_CHANGE_WRITABLE (1UL << 0) /* Whether this protection change is for NUMA hints */ #define MM_CP_PROT_NUMA (1UL << 1) /* Whether this change is for write protecting */ -- cgit From b8cecb9376b9d3031cf62b476a0db087b6b01072 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 17 Jun 2022 16:42:44 +0100 Subject: mm/vmscan: convert reclaim_clean_pages_from_list() to folios Patch series "nvert much of vmscan to folios" vmscan always operates on folios since it puts the pages on the LRU list. Switching all of these functions from pages to folios saves 1483 bytes of text from removing all the baggage around calling compound_page() and similar functions. This patch (of 5): This is a straightforward conversion which removes several hidden calls to compound_head, saving 330 bytes of kernel text. Link: https://lkml.kernel.org/r/20220617154248.700416-1-willy@infradead.org Link: https://lkml.kernel.org/r/20220617154248.700416-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Signed-off-by: Andrew Morton --- include/linux/page-flags.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index e66f7aa3191d..f32aade2a6e0 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -670,6 +670,12 @@ static __always_inline bool PageAnon(struct page *page) return folio_test_anon(page_folio(page)); } +static __always_inline bool __folio_test_movable(const struct folio *folio) +{ + return ((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS) == + PAGE_MAPPING_MOVABLE; +} + static __always_inline int __PageMovable(struct page *page) { return ((unsigned long)page->mapping & PAGE_MAPPING_FLAGS) == -- cgit From e3c4cebf3f9db8c9150eb1982da7e353d9938bed Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 17 Jun 2022 18:49:59 +0100 Subject: mm: add folios_put() Patch series "Convert the swap code to be more folio-based". There's still more to do with the swap code, but this reaps a lot of the folio benefit. More than 4kB of kernel text saved (with the UEK7 kernel config). I don't know how much that's going to translate into CPU savings, but some of those compound_head() calls are on every page free, so it should be noticable. It might even be noticable just from an I-cache consumption perspective. This patch (of 22): This is just a wrapper around release_pages() for now. Place the prototype in mm.h along with folio_put() and folio_put_refs(). Link: https://lkml.kernel.org/r/20220617175020.717127-1-willy@infradead.org Link: https://lkml.kernel.org/r/20220617175020.717127-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/mm.h | 19 +++++++++++++++++++ include/linux/pagemap.h | 2 -- 2 files changed, 19 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 09ea26056e2f..09670ccb94e7 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1220,6 +1220,25 @@ static inline void folio_put_refs(struct folio *folio, int refs) __put_page(&folio->page); } +void release_pages(struct page **pages, int nr); + +/** + * folios_put - Decrement the reference count on an array of folios. + * @folios: The folios. + * @nr: How many folios there are. + * + * Like folio_put(), but for an array of folios. This is more efficient + * than writing the loop yourself as it will optimise the locks which + * need to be taken if the folios are freed. + * + * Context: May be called in process or interrupt context, but not in NMI + * context. May be called while holding a spinlock. + */ +static inline void folios_put(struct folio **folios, unsigned int nr) +{ + release_pages((struct page **)folios, nr); +} + static inline void put_page(struct page *page) { struct folio *folio = page_folio(page); diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index ce96866fbec4..c399a9c5da7d 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -345,8 +345,6 @@ static inline void filemap_nr_thps_dec(struct address_space *mapping) #endif } -void release_pages(struct page **pages, int nr); - struct address_space *page_mapping(struct page *); struct address_space *folio_mapping(struct folio *); struct address_space *swapcache_mapping(struct folio *); -- cgit From 7d80dd096f8f889128f67a2d452e4dadeed71e63 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 17 Jun 2022 18:50:01 +0100 Subject: mm/swap: make __pagevec_lru_add static __pagevec_lru_add has no callers outside swap.c, so make it static, and move it to a more logical position in the file. Link: https://lkml.kernel.org/r/20220617175020.717127-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/pagevec.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h index 67b1246f136b..b0e3540f3a4c 100644 --- a/include/linux/pagevec.h +++ b/include/linux/pagevec.h @@ -26,7 +26,6 @@ struct pagevec { }; void __pagevec_release(struct pagevec *pvec); -void __pagevec_lru_add(struct pagevec *pvec); unsigned pagevec_lookup_range(struct pagevec *pvec, struct address_space *mapping, pgoff_t *start, pgoff_t end); -- cgit From 8d29c7036f5ff360ea1f51b9fed5d909be7c8094 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 17 Jun 2022 18:50:13 +0100 Subject: mm/swap: convert __put_page() to __folio_put() Saves 11 bytes of text by removing a check of PageTail. Link: https://lkml.kernel.org/r/20220617175020.717127-16-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/mm.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 09670ccb94e7..3fb49aec13fd 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -855,7 +855,7 @@ static inline struct folio *virt_to_folio(const void *x) return page_folio(page); } -void __put_page(struct page *page); +void __folio_put(struct folio *folio); void put_pages_list(struct list_head *pages); @@ -1197,7 +1197,7 @@ static inline __must_check bool try_get_page(struct page *page) static inline void folio_put(struct folio *folio) { if (folio_put_testzero(folio)) - __put_page(&folio->page); + __folio_put(folio); } /** @@ -1217,7 +1217,7 @@ static inline void folio_put(struct folio *folio) static inline void folio_put_refs(struct folio *folio, int refs) { if (folio_ref_sub_and_test(folio, refs)) - __put_page(&folio->page); + __folio_put(folio); } void release_pages(struct page **pages, int nr); -- cgit From 5375336c8c42a343c3b440b6f1e21c65e7b174b9 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 17 Jun 2022 18:50:17 +0100 Subject: mm: convert destroy_compound_page() to destroy_large_folio() All callers now have a folio, so push the folio->page conversion down to this function. [akpm@linux-foundation.org: uninline destroy_large_folio() to fix build issue] Link: https://lkml.kernel.org/r/20220617175020.717127-20-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/mm.h | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 3fb49aec13fd..9cc02a7e503b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -892,11 +892,7 @@ static inline void set_compound_page_dtor(struct page *page, page[1].compound_dtor = compound_dtor; } -static inline void destroy_compound_page(struct page *page) -{ - VM_BUG_ON_PAGE(page[1].compound_dtor >= NR_COMPOUND_DTORS, page); - compound_page_dtors[page[1].compound_dtor](page); -} +void destroy_large_folio(struct folio *folio); static inline int head_compound_pincount(struct page *head) { -- cgit From ed7802dd48f7a507213cbb95bb4c6f1fe134eb5d Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Fri, 17 Jun 2022 21:56:49 +0800 Subject: mm: memory_hotplug: enumerate all supported section flags Patch series "make hugetlb_optimize_vmemmap compatible with memmap_on_memory", v3. This series makes hugetlb_optimize_vmemmap compatible with memmap_on_memory. This patch (of 2): We are almost running out of section flags, only one bit is available in the worst case (powerpc with 256k pages). However, there are still some free bits (in ->section_mem_map) on other architectures (e.g. x86_64 has 10 bits available, arm64 has 8 bits available with worst case of 64K pages). We have hard coded those numbers in code, it is inconvenient to use those bits on other architectures except powerpc. So transfer those section flags to enumeration to make it easy to add new section flags in the future. Also, move SECTION_TAINT_ZONE_DEVICE into the scope of CONFIG_ZONE_DEVICE to save a bit on non-zone-device case. [songmuchun@bytedance.com: replace enum with defines per David] Link: https://lkml.kernel.org/r/20220620110616.12056-2-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220617135650.74901-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220617135650.74901-2-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: David Hildenbrand Cc: Jonathan Corbet Cc: Mike Kravetz Cc: Oscar Salvador Cc: Paul E. McKenney Cc: Xiongchun Duan Signed-off-by: Andrew Morton --- include/linux/mmzone.h | 41 ++++++++++++++++++++++++++++++++--------- 1 file changed, 32 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index aab70355d64f..2b5757752333 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1418,16 +1418,32 @@ extern size_t mem_section_usage_size(void); * (equal SECTION_SIZE_BITS - PAGE_SHIFT), and the * worst combination is powerpc with 256k pages, * which results in PFN_SECTION_SHIFT equal 6. - * To sum it up, at least 6 bits are available. + * To sum it up, at least 6 bits are available on all architectures. + * However, we can exceed 6 bits on some other architectures except + * powerpc (e.g. 15 bits are available on x86_64, 13 bits are available + * with the worst case of 64K pages on arm64) if we make sure the + * exceeded bit is not applicable to powerpc. */ -#define SECTION_MARKED_PRESENT (1UL<<0) -#define SECTION_HAS_MEM_MAP (1UL<<1) -#define SECTION_IS_ONLINE (1UL<<2) -#define SECTION_IS_EARLY (1UL<<3) -#define SECTION_TAINT_ZONE_DEVICE (1UL<<4) -#define SECTION_MAP_LAST_BIT (1UL<<5) -#define SECTION_MAP_MASK (~(SECTION_MAP_LAST_BIT-1)) -#define SECTION_NID_SHIFT 6 +enum { + SECTION_MARKED_PRESENT_BIT, + SECTION_HAS_MEM_MAP_BIT, + SECTION_IS_ONLINE_BIT, + SECTION_IS_EARLY_BIT, +#ifdef CONFIG_ZONE_DEVICE + SECTION_TAINT_ZONE_DEVICE_BIT, +#endif + SECTION_MAP_LAST_BIT, +}; + +#define SECTION_MARKED_PRESENT BIT(SECTION_MARKED_PRESENT_BIT) +#define SECTION_HAS_MEM_MAP BIT(SECTION_HAS_MEM_MAP_BIT) +#define SECTION_IS_ONLINE BIT(SECTION_IS_ONLINE_BIT) +#define SECTION_IS_EARLY BIT(SECTION_IS_EARLY_BIT) +#ifdef CONFIG_ZONE_DEVICE +#define SECTION_TAINT_ZONE_DEVICE BIT(SECTION_TAINT_ZONE_DEVICE_BIT) +#endif +#define SECTION_MAP_MASK (~(BIT(SECTION_MAP_LAST_BIT) - 1)) +#define SECTION_NID_SHIFT SECTION_MAP_LAST_BIT static inline struct page *__section_mem_map_addr(struct mem_section *section) { @@ -1466,12 +1482,19 @@ static inline int online_section(struct mem_section *section) return (section && (section->section_mem_map & SECTION_IS_ONLINE)); } +#ifdef CONFIG_ZONE_DEVICE static inline int online_device_section(struct mem_section *section) { unsigned long flags = SECTION_IS_ONLINE | SECTION_TAINT_ZONE_DEVICE; return section && ((section->section_mem_map & flags) == flags); } +#else +static inline int online_device_section(struct mem_section *section) +{ + return 0; +} +#endif static inline int online_section_nr(unsigned long nr) { -- cgit From 66361095129b3b5d065e6c09cf0c085ef4a8c40f Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Fri, 17 Jun 2022 21:56:50 +0800 Subject: mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with memmap_on_memory For now, the feature of hugetlb_free_vmemmap is not compatible with the feature of memory_hotplug.memmap_on_memory, and hugetlb_free_vmemmap takes precedence over memory_hotplug.memmap_on_memory. However, someone wants to make memory_hotplug.memmap_on_memory takes precedence over hugetlb_free_vmemmap since memmap_on_memory makes it more likely to succeed memory hotplug in close-to-OOM situations. So the decision of making hugetlb_free_vmemmap take precedence is not wise and elegant. The proper approach is to have hugetlb_vmemmap.c do the check whether the section which the HugeTLB pages belong to can be optimized. If the section's vmemmap pages are allocated from the added memory block itself, hugetlb_free_vmemmap should refuse to optimize the vmemmap, otherwise, do the optimization. Then both kernel parameters are compatible. So this patch introduces VmemmapSelfHosted to mask any non-optimizable vmemmap pages. The hugetlb_vmemmap can use this flag to detect if a vmemmap page can be optimized. [songmuchun@bytedance.com: walk vmemmap page tables to avoid false-positive] Link: https://lkml.kernel.org/r/20220620110616.12056-3-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220617135650.74901-3-songmuchun@bytedance.com Signed-off-by: Muchun Song Co-developed-by: Oscar Salvador Signed-off-by: Oscar Salvador Acked-by: David Hildenbrand Cc: Jonathan Corbet Cc: Mike Kravetz Cc: Paul E. McKenney Cc: Xiongchun Duan Signed-off-by: Andrew Morton --- include/linux/memory_hotplug.h | 9 --------- include/linux/page-flags.h | 11 +++++++++++ 2 files changed, 11 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 20d7edf62a6a..e0b2209ab71c 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -351,13 +351,4 @@ void arch_remove_linear_mapping(u64 start, u64 size); extern bool mhp_supports_memmap_on_memory(unsigned long size); #endif /* CONFIG_MEMORY_HOTPLUG */ -#ifdef CONFIG_MHP_MEMMAP_ON_MEMORY -bool mhp_memmap_on_memory(void); -#else -static inline bool mhp_memmap_on_memory(void) -{ - return false; -} -#endif - #endif /* __LINUX_MEMORY_HOTPLUG_H */ diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index f32aade2a6e0..82719d33c0f1 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -193,6 +193,11 @@ enum pageflags { /* Only valid for buddy pages. Used to track pages that are reported */ PG_reported = PG_uptodate, + +#ifdef CONFIG_MEMORY_HOTPLUG + /* For self-hosted memmap pages */ + PG_vmemmap_self_hosted = PG_owner_priv_1, +#endif }; #define PAGEFLAGS_MASK ((1UL << NR_PAGEFLAGS) - 1) @@ -628,6 +633,12 @@ PAGEFLAG_FALSE(SkipKASanPoison, skip_kasan_poison) */ __PAGEFLAG(Reported, reported, PF_NO_COMPOUND) +#ifdef CONFIG_MEMORY_HOTPLUG +PAGEFLAG(VmemmapSelfHosted, vmemmap_self_hosted, PF_ANY) +#else +PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted) +#endif + /* * On an anonymous page mapped into a user virtual memory area, * page->mapping points to its anon_vma, not to a struct address_space; -- cgit From e8da368a1e42a8056d1a6b419e1b91b6cf11d77e Mon Sep 17 00:00:00 2001 From: Yun-Ze Li Date: Mon, 20 Jun 2022 07:15:16 +0000 Subject: mm, docs: fix comments that mention mem_hotplug_end() Comments that mention mem_hotplug_end() are confusing as there is no function called mem_hotplug_end(). Fix them by replacing all the occurences of mem_hotplug_end() in the comments with mem_hotplug_done(). [akpm@linux-foundation.org: grammatical fixes] Link: https://lkml.kernel.org/r/20220620071516.1286101-1-p76091292@gs.ncku.edu.tw Signed-off-by: Yun-Ze Li Cc: Souptick Joarder Signed-off-by: Andrew Morton --- include/linux/mmzone.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 2b5757752333..735bf5b37949 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -591,8 +591,8 @@ struct zone { * give them a chance of being in the same cacheline. * * Write access to present_pages at runtime should be protected by - * mem_hotplug_begin/end(). Any reader who can't tolerant drift of - * present_pages should get_online_mems() to get a stable value. + * mem_hotplug_begin/done(). Any reader who can't tolerant drift of + * present_pages should use get_online_mems() to get a stable value. */ atomic_long_t managed_pages; unsigned long spanned_pages; @@ -870,7 +870,7 @@ typedef struct pglist_data { unsigned long nr_reclaim_start; /* nr pages written while throttled * when throttling started. */ struct task_struct *kswapd; /* Protected by - mem_hotplug_begin/end() */ + mem_hotplug_begin/done() */ int kswapd_order; enum zone_type kswapd_highest_zoneidx; -- cgit From 18f3962953e40401b7ed98e8524167282c3e626e Mon Sep 17 00:00:00 2001 From: Qi Zheng Date: Sun, 26 Jun 2022 22:57:17 +0800 Subject: mm: hugetlb: kill set_huge_swap_pte_at() Commit e5251fd43007 ("mm/hugetlb: introduce set_huge_swap_pte_at() helper") add set_huge_swap_pte_at() to handle swap entries on architectures that support hugepages consisting of contiguous ptes. And currently the set_huge_swap_pte_at() is only overridden by arm64. set_huge_swap_pte_at() provide a sz parameter to help determine the number of entries to be updated. But in fact, all hugetlb swap entries contain pfn information, so we can find the corresponding folio through the pfn recorded in the swap entry, then the folio_size() is the number of entries that need to be updated. And considering that users will easily cause bugs by ignoring the difference between set_huge_swap_pte_at() and set_huge_pte_at(). Let's handle swap entries in set_huge_pte_at() and remove the set_huge_swap_pte_at(), then we can call set_huge_pte_at() anywhere, which simplifies our coding. Link: https://lkml.kernel.org/r/20220626145717.53572-1-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng Acked-by: Muchun Song Cc: Mike Kravetz Cc: Catalin Marinas Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 13 ------------- 1 file changed, 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 756b66ff025e..c6cccfaf8708 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -903,14 +903,6 @@ static inline void hugetlb_count_sub(long l, struct mm_struct *mm) atomic_long_sub(l, &mm->hugetlb_usage); } -#ifndef set_huge_swap_pte_at -static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t pte, unsigned long sz) -{ - set_huge_pte_at(mm, addr, ptep, pte); -} -#endif - #ifndef huge_ptep_modify_prot_start #define huge_ptep_modify_prot_start huge_ptep_modify_prot_start static inline pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, @@ -1094,11 +1086,6 @@ static inline void hugetlb_count_sub(long l, struct mm_struct *mm) { } -static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t pte, unsigned long sz) -{ -} - static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { -- cgit From 1baec203b77cafa24610b5c9ae7a2aa380d74ef6 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Sat, 25 Jun 2022 17:28:16 +0800 Subject: mm/khugepaged: try to free transhuge swapcache when possible Transhuge swapcaches won't be freed in __collapse_huge_page_copy(). It's because release_pte_page() is not called for these pages and thus free_page_and_swap_cache can't grab the page lock. These pages won't be freed from swap cache even if we are the only user until next time reclaim. It shouldn't hurt indeed, but we could try to free these pages to save more memory for system. Link: https://lkml.kernel.org/r/20220625092816.4856-8-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Cc: Alistair Popple Cc: Andrea Arcangeli Cc: David Hildenbrand Cc: David Howells Cc: Matthew Wilcox (Oracle) Cc: NeilBrown Cc: Peter Xu Cc: Suren Baghdasaryan Cc: Vlastimil Babka Cc: Yang Shi Cc: Zach O'Keefe Signed-off-by: Andrew Morton --- include/linux/swap.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 95a5b7aa1ae9..6d11c51b2b62 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -455,6 +455,7 @@ static inline unsigned long total_swapcache_pages(void) return global_node_page_state(NR_SWAPCACHE); } +extern void free_swap_cache(struct page *page); extern void free_page_and_swap_cache(struct page *); extern void free_pages_and_swap_cache(struct page **, int); /* linux/mm/swapfile.c */ @@ -539,6 +540,10 @@ static inline void put_swap_device(struct swap_info_struct *si) /* used to sanity check ptes in zap_pte_range when CONFIG_SWAP=0 */ #define free_swap_and_cache(e) is_pfn_swap_entry(e) +static inline void free_swap_cache(struct page *page) +{ +} + static inline int add_swap_count_continuation(swp_entry_t swp, gfp_t gfp_mask) { return 0; -- cgit From 1fcf54deb767d474181ad7cf33c92bb2a33607fb Mon Sep 17 00:00:00 2001 From: Josh Don Date: Wed, 29 Jun 2022 14:14:26 -0700 Subject: sched/core: add forced idle accounting for cgroups 4feee7d1260 previously added per-task forced idle accounting. This patch extends this to also include cgroups. rstat is used for cgroup accounting, except for the root, which uses kcpustat in order to bypass the need for doing an rstat flush when reading root stats. Only cgroup v2 is supported. Similar to the task accounting, the cgroup accounting requires that schedstats is enabled. Signed-off-by: Josh Don Signed-off-by: Peter Zijlstra (Intel) Acked-by: Tejun Heo Link: https://lkml.kernel.org/r/20220629211426.3329954-1-joshdon@google.com --- include/linux/cgroup-defs.h | 4 ++++ include/linux/kernel_stat.h | 7 +++++++ 2 files changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 1bfcfb1af352..025fd0e84a31 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -287,6 +287,10 @@ struct css_set { struct cgroup_base_stat { struct task_cputime cputime; + +#ifdef CONFIG_SCHED_CORE + u64 forceidle_sum; +#endif }; /* diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h index 69ae6b278464..ddb5a358fd82 100644 --- a/include/linux/kernel_stat.h +++ b/include/linux/kernel_stat.h @@ -28,6 +28,9 @@ enum cpu_usage_stat { CPUTIME_STEAL, CPUTIME_GUEST, CPUTIME_GUEST_NICE, +#ifdef CONFIG_SCHED_CORE + CPUTIME_FORCEIDLE, +#endif NR_STATS, }; @@ -115,4 +118,8 @@ extern void account_process_tick(struct task_struct *, int user); extern void account_idle_ticks(unsigned long ticks); +#ifdef CONFIG_SCHED_CORE +extern void __account_forceidle_time(struct task_struct *tsk, u64 delta); +#endif + #endif /* _LINUX_KERNEL_STAT_H */ -- cgit From 39bfb3c12d792181de0cf3306be1ea03664a1b05 Mon Sep 17 00:00:00 2001 From: Kurt Kanzenbach Date: Fri, 1 Jul 2022 19:56:06 +0200 Subject: net: phy: broadcom: Add support for BCM53128 internal PHYs Add support for BCM53128 internal PHYs. These support interrupts as well as statistics. Therefore, enable the Broadcom PHY driver for them. Tested on BCM53128 switch using the mainline b53 DSA driver. Signed-off-by: Kurt Kanzenbach Reviewed-by: Florian Fainelli Signed-off-by: David S. Miller --- include/linux/brcmphy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h index 747fad264033..6ff567ece34a 100644 --- a/include/linux/brcmphy.h +++ b/include/linux/brcmphy.h @@ -16,6 +16,7 @@ #define PHY_ID_BCM5481 0x0143bca0 #define PHY_ID_BCM5395 0x0143bcf0 #define PHY_ID_BCM53125 0x03625f20 +#define PHY_ID_BCM53128 0x03625e10 #define PHY_ID_BCM54810 0x03625d00 #define PHY_ID_BCM54811 0x03625cc0 #define PHY_ID_BCM5482 0x0143bcb0 -- cgit From df76234276e22136b2468825c18407fdfbb2076a Mon Sep 17 00:00:00 2001 From: Stefan Wahren Date: Sat, 25 Jun 2022 13:36:15 +0200 Subject: mfd: bcm2835-pm: Add support for BCM2711 In BCM2711 the new RPiVid ASB took over V3D. The old ASB is still present with the ISP and H264 bits, and V3D is in the same place in the new ASB as the old one. As per the devicetree bindings, BCM2711 will provide both the old and new ASB resources, so get both of them and pass them into 'bcm2835-power,' which will take care of selecting which one to use accordingly. Since the RPiVid ASB's resources were being provided prior to formalizing the bindings[1], also support the old DT files that didn't use 'reg-names.' [1] See: 7dbe8c62ceeb ("ARM: dts: Add minimal Raspberry Pi 4 support") Signed-off-by: Stefan Wahren Reviewed-by: Peter Robinson Acked-by: Florian Fainelli Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220625113619.15944-8-stefan.wahren@i2se.com --- include/linux/mfd/bcm2835-pm.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mfd/bcm2835-pm.h b/include/linux/mfd/bcm2835-pm.h index ed37dc40e82a..f70a810c55f7 100644 --- a/include/linux/mfd/bcm2835-pm.h +++ b/include/linux/mfd/bcm2835-pm.h @@ -9,6 +9,7 @@ struct bcm2835_pm { struct device *dev; void __iomem *base; void __iomem *asb; + void __iomem *rpivid_asb; }; #endif /* BCM2835_MFD_PM_H */ -- cgit From 2fcfa72fc13f0203bb676bd1ec30ec85e17855be Mon Sep 17 00:00:00 2001 From: Peng Fan Date: Sun, 3 Jul 2022 17:11:25 +0800 Subject: interconnect: add device managed bulk API Add device managed bulk API to simplify driver. Signed-off-by: Peng Fan Link: https://lore.kernel.org/r/20220703091132.1412063-4-peng.fan@oss.nxp.com Signed-off-by: Georgi Djakov --- include/linux/interconnect.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/interconnect.h b/include/linux/interconnect.h index f685777b875e..2b0e784ba771 100644 --- a/include/linux/interconnect.h +++ b/include/linux/interconnect.h @@ -44,6 +44,7 @@ struct icc_path *icc_get(struct device *dev, const int src_id, const int dst_id); struct icc_path *of_icc_get(struct device *dev, const char *name); struct icc_path *devm_of_icc_get(struct device *dev, const char *name); +int devm_of_icc_bulk_get(struct device *dev, int num_paths, struct icc_bulk_data *paths); struct icc_path *of_icc_get_by_index(struct device *dev, int idx); void icc_put(struct icc_path *path); int icc_enable(struct icc_path *path); @@ -116,6 +117,12 @@ static inline int of_icc_bulk_get(struct device *dev, int num_paths, struct icc_ return 0; } +static inline int devm_of_icc_bulk_get(struct device *dev, int num_paths, + struct icc_bulk_data *paths) +{ + return 0; +} + static inline void icc_bulk_put(int num_paths, struct icc_bulk_data *paths) { } -- cgit From 7097f29819bb70374dfae9f705e548a720f16f94 Mon Sep 17 00:00:00 2001 From: Cristian Marussi Date: Mon, 4 Jul 2022 11:19:31 +0100 Subject: firmware: arm_scmi: Add SCMI v3.1 System Power extensions Add support for SCMIv3.1 System Power optional timeout field while dispatching SYSTEM_POWER_STATE_NOTIFIER notification. Link: https://lore.kernel.org/r/20220704101933.2981635-3-cristian.marussi@arm.com Signed-off-by: Cristian Marussi Signed-off-by: Sudeep Holla --- include/linux/scmi_protocol.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h index 704111f63993..311aa3c73aac 100644 --- a/include/linux/scmi_protocol.h +++ b/include/linux/scmi_protocol.h @@ -781,8 +781,10 @@ struct scmi_clock_rate_notif_report { struct scmi_system_power_state_notifier_report { ktime_t timestamp; unsigned int agent_id; +#define SCMI_SYSPOWER_IS_REQUEST_GRACEFUL(flags) ((flags) & BIT(0)) unsigned int flags; unsigned int system_state; + unsigned int timeout; }; struct scmi_perf_limits_report { -- cgit From d91079995fa62720e11a39c08b932f2f9a8cbfae Mon Sep 17 00:00:00 2001 From: Cristian Marussi Date: Mon, 4 Jul 2022 11:19:32 +0100 Subject: firmware: arm_scmi: Add devm_protocol_acquire helper Add a method to get hold of a protocol, causing it to be initialized and its resource accounting updated, without getting access to its operations and handle. Some protocols, like SCMI SystemPower, do not expose any protocol ops to the kernel OSPM agent but still need to be at least initialized. This helper avoids the need to invoke a full devm_get_protocol() only to get the protocol initialized while throwing away unused the protocol ops and handle. Link: https://lore.kernel.org/r/20220704101933.2981635-4-cristian.marussi@arm.com Signed-off-by: Cristian Marussi Signed-off-by: Sudeep Holla --- include/linux/scmi_protocol.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h index 311aa3c73aac..898488c7f185 100644 --- a/include/linux/scmi_protocol.h +++ b/include/linux/scmi_protocol.h @@ -624,6 +624,9 @@ struct scmi_notify_ops { * * @dev: pointer to the SCMI device * @version: pointer to the structure containing SCMI version information + * @devm_protocol_acquire: devres managed method to get hold of a protocol, + * causing its initialization and related resource + * accounting * @devm_protocol_get: devres managed method to acquire a protocol and get specific * operations and a dedicated protocol handler * @devm_protocol_put: devres managed method to release a protocol @@ -642,6 +645,8 @@ struct scmi_handle { struct device *dev; struct scmi_revision_info *version; + int __must_check (*devm_protocol_acquire)(struct scmi_device *sdev, + u8 proto); const void __must_check * (*devm_protocol_get)(struct scmi_device *sdev, u8 proto, struct scmi_protocol_handle **ph); -- cgit From 0316f99c4780b0a5fd60b7f136c64cb1af8d5fc3 Mon Sep 17 00:00:00 2001 From: Cristian Marussi Date: Mon, 4 Jul 2022 11:22:36 +0100 Subject: firmware: arm_scmi: Add SCMI v3.1 powercap protocol basic support Add support for SCMI v3.1 powercap protocol, with the exception of powercap fast channels, exposing all the new related powercap protocol operations as usual in include/linux/scmi_protocol.h. Link: https://lore.kernel.org/r/20220704102241.2988447-3-cristian.marussi@arm.com Signed-off-by: Cristian Marussi Signed-off-by: Sudeep Holla --- include/linux/scmi_protocol.h | 125 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 125 insertions(+) (limited to 'include/linux') diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h index 898488c7f185..95023dc8b3a3 100644 --- a/include/linux/scmi_protocol.h +++ b/include/linux/scmi_protocol.h @@ -560,6 +560,114 @@ struct scmi_voltage_proto_ops { s32 *volt_uV); }; +/** + * struct scmi_powercap_info - Describe one available Powercap domain + * + * @id: Domain ID as advertised by the platform. + * @notify_powercap_cap_change: CAP change notification support. + * @notify_powercap_measurement_change: MEASUREMENTS change notifications + * support. + * @async_powercap_cap_set: Asynchronous CAP set support. + * @powercap_cap_config: CAP configuration support. + * @powercap_monitoring: Monitoring (measurements) support. + * @powercap_pai_config: PAI configuration support. + * @powercap_scale_mw: Domain reports power data in milliwatt units. + * @powercap_scale_uw: Domain reports power data in microwatt units. + * Note that, when both @powercap_scale_mw and + * @powercap_scale_uw are set to false, the domain + * reports power data on an abstract linear scale. + * @name: name assigned to the Powercap Domain by platform. + * @min_pai: Minimum configurable PAI. + * @max_pai: Maximum configurable PAI. + * @pai_step: Step size between two consecutive PAI values. + * @min_power_cap: Minimum configurable CAP. + * @max_power_cap: Maximum configurable CAP. + * @power_cap_step: Step size between two consecutive CAP values. + * @sustainable_power: Maximum sustainable power consumption for this domain + * under normal conditions. + * @accuracy: The accuracy with which the power is measured and reported in + * integral multiples of 0.001 percent. + * @parent_id: Identifier of the containing parent power capping domain, or the + * value 0xFFFFFFFF if this powercap domain is a root domain not + * contained in any other domain. + */ +struct scmi_powercap_info { + unsigned int id; + bool notify_powercap_cap_change; + bool notify_powercap_measurement_change; + bool async_powercap_cap_set; + bool powercap_cap_config; + bool powercap_monitoring; + bool powercap_pai_config; + bool powercap_scale_mw; + bool powercap_scale_uw; + char name[SCMI_MAX_STR_SIZE]; + unsigned int min_pai; + unsigned int max_pai; + unsigned int pai_step; + unsigned int min_power_cap; + unsigned int max_power_cap; + unsigned int power_cap_step; + unsigned int sustainable_power; + unsigned int accuracy; +#define SCMI_POWERCAP_ROOT_ZONE_ID 0xFFFFFFFFUL + unsigned int parent_id; +}; + +/** + * struct scmi_powercap_proto_ops - represents the various operations provided + * by SCMI Powercap Protocol + * + * @num_domains_get: get the count of powercap domains provided by SCMI. + * @info_get: get the information for the specified domain. + * @cap_get: get the current CAP value for the specified domain. + * @cap_set: set the CAP value for the specified domain to the provided value; + * if the domain supports setting the CAP with an asynchronous command + * this request will finally trigger an asynchronous transfer, but, if + * @ignore_dresp here is set to true, this call will anyway return + * immediately without waiting for the related delayed response. + * @pai_get: get the current PAI value for the specified domain. + * @pai_set: set the PAI value for the specified domain to the provided value. + * @measurements_get: retrieve the current average power measurements for the + * specified domain and the related PAI upon which is + * calculated. + * @measurements_threshold_set: set the desired low and high power thresholds + * to be used when registering for notification + * of type POWERCAP_MEASUREMENTS_NOTIFY with this + * powercap domain. + * Note that this must be called at least once + * before registering any callback with the usual + * @scmi_notify_ops; moreover, in case this method + * is called with measurement notifications already + * enabled it will also trigger, transparently, a + * proper update of the power thresholds configured + * in the SCMI backend server. + * @measurements_threshold_get: get the currently configured low and high power + * thresholds used when registering callbacks for + * notification POWERCAP_MEASUREMENTS_NOTIFY. + */ +struct scmi_powercap_proto_ops { + int (*num_domains_get)(const struct scmi_protocol_handle *ph); + const struct scmi_powercap_info __must_check *(*info_get) + (const struct scmi_protocol_handle *ph, u32 domain_id); + int (*cap_get)(const struct scmi_protocol_handle *ph, u32 domain_id, + u32 *power_cap); + int (*cap_set)(const struct scmi_protocol_handle *ph, u32 domain_id, + u32 power_cap, bool ignore_dresp); + int (*pai_get)(const struct scmi_protocol_handle *ph, u32 domain_id, + u32 *pai); + int (*pai_set)(const struct scmi_protocol_handle *ph, u32 domain_id, + u32 pai); + int (*measurements_get)(const struct scmi_protocol_handle *ph, + u32 domain_id, u32 *average_power, u32 *pai); + int (*measurements_threshold_set)(const struct scmi_protocol_handle *ph, + u32 domain_id, u32 power_thresh_low, + u32 power_thresh_high); + int (*measurements_threshold_get)(const struct scmi_protocol_handle *ph, + u32 domain_id, u32 *power_thresh_low, + u32 *power_thresh_high); +}; + /** * struct scmi_notify_ops - represents notifications' operations provided by * SCMI core @@ -666,6 +774,7 @@ enum scmi_std_protocol { SCMI_PROTOCOL_SENSOR = 0x15, SCMI_PROTOCOL_RESET = 0x16, SCMI_PROTOCOL_VOLTAGE = 0x17, + SCMI_PROTOCOL_POWERCAP = 0x18, }; enum scmi_system_events { @@ -767,6 +876,8 @@ enum scmi_notification_events { SCMI_EVENT_RESET_ISSUED = 0x0, SCMI_EVENT_BASE_ERROR_EVENT = 0x0, SCMI_EVENT_SYSTEM_POWER_STATE_NOTIFIER = 0x0, + SCMI_EVENT_POWERCAP_CAP_CHANGED = 0x0, + SCMI_EVENT_POWERCAP_MEASUREMENTS_CHANGED = 0x1, }; struct scmi_power_state_changed_report { @@ -837,4 +948,18 @@ struct scmi_base_error_report { unsigned long long reports[]; }; +struct scmi_powercap_cap_changed_report { + ktime_t timestamp; + unsigned int agent_id; + unsigned int domain_id; + unsigned int power_cap; + unsigned int pai; +}; + +struct scmi_powercap_meas_changed_report { + ktime_t timestamp; + unsigned int agent_id; + unsigned int domain_id; + unsigned int power; +}; #endif /* _LINUX_SCMI_PROTOCOL_H */ -- cgit From 855aa26e5f56d415b71d3f8d86ef0cc51b2166a3 Mon Sep 17 00:00:00 2001 From: Cristian Marussi Date: Mon, 4 Jul 2022 11:22:38 +0100 Subject: firmware: arm_scmi: Add SCMI v3.1 powercap fast channels support Add SCMIv3.1 powercap protocol fast channel support using common helpers provided by the SCMI core with scmi_proto_helpers_ops operations. Link: https://lore.kernel.org/r/20220704102241.2988447-5-cristian.marussi@arm.com Signed-off-by: Cristian Marussi Signed-off-by: Sudeep Holla --- include/linux/scmi_protocol.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h index 95023dc8b3a3..7f4f9df1b20f 100644 --- a/include/linux/scmi_protocol.h +++ b/include/linux/scmi_protocol.h @@ -601,6 +601,7 @@ struct scmi_powercap_info { bool powercap_pai_config; bool powercap_scale_mw; bool powercap_scale_uw; + bool fastchannels; char name[SCMI_MAX_STR_SIZE]; unsigned int min_pai; unsigned int max_pai; @@ -612,6 +613,7 @@ struct scmi_powercap_info { unsigned int accuracy; #define SCMI_POWERCAP_ROOT_ZONE_ID 0xFFFFFFFFUL unsigned int parent_id; + struct scmi_fc_info *fc_info; }; /** -- cgit From cc1cfc47ea47187a21ec1f079b3c53264157fe15 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:49 +0100 Subject: cacheinfo: Add support to check if last level cache(LLC) is valid or shared It is useful to have helper to check if the given two CPUs share last level cache. We can do that check by comparing fw_token or by comparing the cache ID. Currently we check just for fw_token as the cache ID is optional. This helper can be used to build the llc_sibling during arch specific topology parsing and feeding information to the sched_domains. This also helps to get rid of llc_id in the CPU topology as it is sort of duplicate information. Also add helper to check if the llc information in cacheinfo is valid or not. Link: https://lore.kernel.org/r/20220704101605.1318280-6-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Gavin Shan Signed-off-by: Sudeep Holla --- include/linux/cacheinfo.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h index 4ff37cb763ae..7e429bc5c1a4 100644 --- a/include/linux/cacheinfo.h +++ b/include/linux/cacheinfo.h @@ -82,6 +82,8 @@ struct cpu_cacheinfo *get_cpu_cacheinfo(unsigned int cpu); int init_cache_level(unsigned int cpu); int populate_cache_leaves(unsigned int cpu); int cache_setup_acpi(unsigned int cpu); +bool last_level_cache_is_valid(unsigned int cpu); +bool last_level_cache_is_shared(unsigned int cpu_x, unsigned int cpu_y); #ifndef CONFIG_ACPI_PPTT /* * acpi_find_last_cache_level is only called on ACPI enabled -- cgit From 36bbc5b4ffab33ccac0f4db27f619a6ba7a4fd32 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:50 +0100 Subject: cacheinfo: Allow early detection and population of cache attributes Some architecture/platforms may need to setup cache properties very early in the boot along with other cpu topologies so that all these information can be used to build sched_domains which is used by the scheduler. Allow detect_cache_attributes to be called quite early during the boot. Link: https://lore.kernel.org/r/20220704101605.1318280-7-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Gavin Shan Signed-off-by: Sudeep Holla --- include/linux/cacheinfo.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/cacheinfo.h b/include/linux/cacheinfo.h index 7e429bc5c1a4..00b7a6ae8617 100644 --- a/include/linux/cacheinfo.h +++ b/include/linux/cacheinfo.h @@ -84,6 +84,7 @@ int populate_cache_leaves(unsigned int cpu); int cache_setup_acpi(unsigned int cpu); bool last_level_cache_is_valid(unsigned int cpu); bool last_level_cache_is_shared(unsigned int cpu_x, unsigned int cpu_y); +int detect_cache_attributes(unsigned int cpu); #ifndef CONFIG_ACPI_PPTT /* * acpi_find_last_cache_level is only called on ACPI enabled -- cgit From 5b8dc787ce4a45c87254d1b0b22f161347ab7f81 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:15:56 +0100 Subject: arch_topology: Drop LLC identifier stash from the CPU topology Since the cacheinfo LLC information is used directly in arch_topology, there is no need to parse and store the LLC ID information only for ACPI systems in the CPU topology. Remove the redundant LLC ID from the generic CPU arch_topology information. Link: https://lore.kernel.org/r/20220704101605.1318280-13-sudeep.holla@arm.com Tested-by: Ionela Voinescu Tested-by: Conor Dooley Reviewed-by: Gavin Shan Signed-off-by: Sudeep Holla --- include/linux/arch_topology.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h index 58cbe18d825c..a07b510e7dc5 100644 --- a/include/linux/arch_topology.h +++ b/include/linux/arch_topology.h @@ -68,7 +68,6 @@ struct cpu_topology { int core_id; int cluster_id; int package_id; - int llc_id; cpumask_t thread_sibling; cpumask_t core_sibling; cpumask_t cluster_sibling; -- cgit From 7128af87c7f1c30cd6cebe0b012cc25872c689e2 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Mon, 4 Jul 2022 11:16:05 +0100 Subject: ACPI: Remove the unused find_acpi_cpu_cache_topology() The sole user of this find_acpi_cpu_cache_topology() was arm64 topology which is now consolidated into the generic arch_topology without the need of this function. Drop the unused function find_acpi_cpu_cache_topology(). Link: https://lore.kernel.org/r/20220704101605.1318280-22-sudeep.holla@arm.com Cc: Rafael J. Wysocki Reported-by: Ionela Voinescu Tested-by: Conor Dooley Acked-by: Rafael J. Wysocki Signed-off-by: Sudeep Holla --- include/linux/acpi.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 4f82a5bc6d98..7b96a8bff6d2 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -1429,7 +1429,6 @@ int find_acpi_cpu_topology(unsigned int cpu, int level); int find_acpi_cpu_topology_cluster(unsigned int cpu); int find_acpi_cpu_topology_package(unsigned int cpu); int find_acpi_cpu_topology_hetero_id(unsigned int cpu); -int find_acpi_cpu_cache_topology(unsigned int cpu, int level); #else static inline int acpi_pptt_cpu_is_thread(unsigned int cpu) { @@ -1451,10 +1450,6 @@ static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu) { return -EINVAL; } -static inline int find_acpi_cpu_cache_topology(unsigned int cpu, int level) -{ - return -EINVAL; -} #endif #ifdef CONFIG_ACPI_PCC -- cgit From 50d6281ce9b8412f7ef02d1bc9d23aa62ae0cf98 Mon Sep 17 00:00:00 2001 From: Ren Zhijie Date: Thu, 30 Jun 2022 20:35:28 +0800 Subject: dma-mapping: Fix build error unused-value MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If CONFIG_DMA_DECLARE_COHERENT is not set, make ARCH=x86_64 CROSS_COMPILE=x86_64-linux-gnu- will be failed, like this: drivers/remoteproc/remoteproc_core.c: In function ‘rproc_rvdev_release’: ./include/linux/dma-map-ops.h:182:42: error: statement with no effect [-Werror=unused-value] #define dma_release_coherent_memory(dev) (0) ^ drivers/remoteproc/remoteproc_core.c:464:2: note: in expansion of macro ‘dma_release_coherent_memory’ dma_release_coherent_memory(dev); ^~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors The return type of function dma_release_coherent_memory in CONFIG_DMA_DECLARE_COHERENT area is void, so in !CONFIG_DMA_DECLARE_COHERENT area it should neither return any value nor be defined as zero. Reported-by: Hulk Robot Fixes: e61c451476e6 ("dma-mapping: Add dma_release_coherent_memory to DMA API") Signed-off-by: Ren Zhijie Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220630123528.251181-1-renzhijie2@huawei.com Signed-off-by: Mathieu Poirier --- include/linux/dma-map-ops.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index 53db9655efe9..bfffe494356a 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -179,10 +179,10 @@ static inline int dma_declare_coherent_memory(struct device *dev, return -ENOSYS; } -#define dma_release_coherent_memory(dev) (0) #define dma_alloc_from_dev_coherent(dev, size, handle, ret) (0) #define dma_release_from_dev_coherent(dev, order, vaddr) (0) #define dma_mmap_from_dev_coherent(dev, vma, vaddr, order, ret) (0) +static inline void dma_release_coherent_memory(struct device *dev) { } #endif /* CONFIG_DMA_DECLARE_COHERENT */ #ifdef CONFIG_DMA_GLOBAL_POOL -- cgit From 85e4ea1049c70fb99de5c6057e835d151fb647da Mon Sep 17 00:00:00 2001 From: David Howells Date: Wed, 20 Apr 2022 14:27:17 +0100 Subject: fscache: Fix invalidation/lookup race If an NFS file is opened for writing and closed, fscache_invalidate() will be asked to invalidate the file - however, if the cookie is in the LOOKING_UP state (or the CREATING state), then request to invalidate doesn't get recorded for fscache_cookie_state_machine() to do something with. Fix this by making __fscache_invalidate() set a flag if it sees the cookie is in the LOOKING_UP state to indicate that we need to go to invalidation. Note that this requires a count on the n_accesses counter for the state machine, which that will release when it's done. fscache_cookie_state_machine() then shifts to the INVALIDATING state if it sees the flag. Without this, an nfs file can get corrupted if it gets modified locally and then read locally as the cache contents may not get updated. Fixes: d24af13e2e23 ("fscache: Implement cookie invalidation") Reported-by: Max Kellermann Signed-off-by: David Howells Tested-by: Max Kellermann Link: https://lore.kernel.org/r/YlWWbpW5Foynjllo@rabbit.intern.cm-ag [1] --- include/linux/fscache.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fscache.h b/include/linux/fscache.h index 72585c9729a2..b86265664879 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -130,6 +130,7 @@ struct fscache_cookie { #define FSCACHE_COOKIE_DO_PREP_TO_WRITE 12 /* T if cookie needs write preparation */ #define FSCACHE_COOKIE_HAVE_DATA 13 /* T if this cookie has data stored */ #define FSCACHE_COOKIE_IS_HASHED 14 /* T if this cookie is hashed */ +#define FSCACHE_COOKIE_DO_INVALIDATE 15 /* T if cookie needs invalidation */ enum fscache_cookie_state state; u8 advice; /* FSCACHE_ADV_* */ -- cgit From 3dcb861dbc6ab101838a1548b1efddd00ca3c3ec Mon Sep 17 00:00:00 2001 From: Eric Auger Date: Thu, 30 Jun 2022 11:40:59 +0200 Subject: ACPI: VIOT: Fix ACS setup Currently acpi_viot_init() gets called after the pci device has been scanned and pci_enable_acs() has been called. So pci_request_acs() fails to be taken into account leading to wrong single iommu group topologies when dealing with multi-function root ports for instance. We cannot simply move the acpi_viot_init() earlier, similarly as the IORT init because the VIOT parsing relies on the pci scan. However we can detect VIOT is present earlier and in such a case, request ACS. Introduce a new acpi_viot_early_init() routine that allows to call pci_request_acs() before the scan. While at it, guard the call to pci_request_acs() with #ifdef CONFIG_PCI. Fixes: 3cf485540e7b ("ACPI: Add driver for the VIOT table") Signed-off-by: Eric Auger Reported-by: Jin Liu Reviewed-by: Jean-Philippe Brucker Tested-by: Jean-Philippe Brucker Signed-off-by: Rafael J. Wysocki --- include/linux/acpi_viot.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h index 1eb8ee5b0e5f..a5a122431563 100644 --- a/include/linux/acpi_viot.h +++ b/include/linux/acpi_viot.h @@ -6,9 +6,11 @@ #include #ifdef CONFIG_ACPI_VIOT +void __init acpi_viot_early_init(void); void __init acpi_viot_init(void); int viot_iommu_configure(struct device *dev); #else +static inline void acpi_viot_early_init(void) {} static inline void acpi_viot_init(void) {} static inline int viot_iommu_configure(struct device *dev) { -- cgit From 7feec7430edddb87c24b0a86b08a03d0b496a755 Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Tue, 5 Jul 2022 13:29:14 -0500 Subject: ACPI: CPPC: Only probe for _CPC if CPPC v2 is acked Previously the kernel used to ignore whether the firmware masked CPPC or CPPCv2 and would just pretend that it worked. When support for the USB4 bit in _OSC was introduced from commit 9e1f561afb ("ACPI: Execute platform _OSC also with query bit clear") the kernel began to look at the return when the query bit was clear. This caused regressions that were misdiagnosed and attempted to be solved as part of commit 2ca8e6285250 ("Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag""). This caused a different regression where non-Intel systems weren't able to negotiate _OSC properly. This was reverted in commit 2ca8e6285250 ("Revert "ACPI: Pass the same capabilities to the _OSC regardless of the query flag"") and attempted to be fixed by commit c42fa24b4475 ("ACPI: bus: Avoid using CPPC if not supported by firmware") but the regression still returned. These systems with the regression only load support for CPPC from an SSDT dynamically when _OSC reports CPPC v2. Avoid the problem by not letting CPPC satisfy the requirement in `acpi_cppc_processor_probe`. Reported-by: CUI Hao Reported-by: maxim.novozhilov@gmail.com Reported-by: lethe.tree@protonmail.com Reported-by: garystephenwright@gmail.com Reported-by: galaxyking0419@gmail.com Fixes: c42fa24b4475 ("ACPI: bus: Avoid using CPPC if not supported by firmware") Fixes: 2ca8e6285250 ("Revert "ACPI Pass the same capabilities to the _OSC regardless of the query flag"") Link: https://bugzilla.kernel.org/show_bug.cgi?id=213023 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2075387 Reviewed-by: Mika Westerberg Tested-by: CUI Hao Signed-off-by: Mario Limonciello Signed-off-by: Rafael J. Wysocki --- include/linux/acpi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 4f82a5bc6d98..44975c1bbe12 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -584,7 +584,7 @@ acpi_status acpi_run_osc(acpi_handle handle, struct acpi_osc_context *context); extern bool osc_sb_apei_support_acked; extern bool osc_pc_lpi_support_confirmed; extern bool osc_sb_native_usb4_support_confirmed; -extern bool osc_sb_cppc_not_supported; +extern bool osc_sb_cppc2_support_acked; extern bool osc_cpc_flexible_adr_space_confirmed; /* USB4 Capabilities */ -- cgit From 09d3154a6f0f0bb5b604832095804780f3684b96 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Mon, 6 Jun 2022 22:51:58 -0500 Subject: PM: wakeup: Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP Previously the CONFIG_PM_SLEEP and !CONFIG_PM_SLEEP device_init_wakeup() implementations differed in confusing ways: - The PM_SLEEP version checked for a NULL device pointer and returned -EINVAL, while the !PM_SLEEP version did not and would simply dereference a NULL pointer. - When called with "false", the !PM_SLEEP version cleared "capable" and "enable" in the opposite order of the PM_SLEEP version. That was harmless because for !PM_SLEEP they're simple assignments, but it's unnecessary confusion. Use a simplified version of the PM_SLEEP implementation for both cases. Signed-off-by: Bjorn Helgaas Signed-off-by: Rafael J. Wysocki --- include/linux/pm_wakeup.h | 31 +++++++++++++++++++++++-------- 1 file changed, 23 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_wakeup.h b/include/linux/pm_wakeup.h index 196a157456aa..77f4849e3418 100644 --- a/include/linux/pm_wakeup.h +++ b/include/linux/pm_wakeup.h @@ -109,7 +109,6 @@ extern struct wakeup_source *wakeup_sources_walk_next(struct wakeup_source *ws); extern int device_wakeup_enable(struct device *dev); extern int device_wakeup_disable(struct device *dev); extern void device_set_wakeup_capable(struct device *dev, bool capable); -extern int device_init_wakeup(struct device *dev, bool val); extern int device_set_wakeup_enable(struct device *dev, bool enable); extern void __pm_stay_awake(struct wakeup_source *ws); extern void pm_stay_awake(struct device *dev); @@ -167,13 +166,6 @@ static inline int device_set_wakeup_enable(struct device *dev, bool enable) return 0; } -static inline int device_init_wakeup(struct device *dev, bool val) -{ - device_set_wakeup_capable(dev, val); - device_set_wakeup_enable(dev, val); - return 0; -} - static inline bool device_may_wakeup(struct device *dev) { return dev->power.can_wakeup && dev->power.should_wakeup; @@ -217,4 +209,27 @@ static inline void pm_wakeup_hard_event(struct device *dev) return pm_wakeup_dev_event(dev, 0, true); } +/** + * device_init_wakeup - Device wakeup initialization. + * @dev: Device to handle. + * @enable: Whether or not to enable @dev as a wakeup device. + * + * By default, most devices should leave wakeup disabled. The exceptions are + * devices that everyone expects to be wakeup sources: keyboards, power buttons, + * possibly network interfaces, etc. Also, devices that don't generate their + * own wakeup requests but merely forward requests from one bus to another + * (like PCI bridges) should have wakeup enabled by default. + */ +static inline int device_init_wakeup(struct device *dev, bool enable) +{ + if (enable) { + device_set_wakeup_capable(dev, true); + return device_wakeup_enable(dev); + } else { + device_wakeup_disable(dev); + device_set_wakeup_capable(dev, false); + return 0; + } +} + #endif /* _LINUX_PM_WAKEUP_H */ -- cgit From e67198cc05b8ecbb7b8e2d8ef9fb5c8d26821873 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:25 +0200 Subject: context_tracking: Take idle eqs entrypoints over RCU The RCU dynticks counter is going to be merged into the context tracking subsystem. Start with moving the idle extended quiescent states entrypoints to context tracking. For now those are dumb redirections to existing RCU calls. [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Paul E. McKenney Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne --- include/linux/context_tracking.h | 8 ++++++++ include/linux/rcupdate.h | 2 +- 2 files changed, 9 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h index e35ae66b4794..01abadb2f993 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -119,4 +119,12 @@ extern void context_tracking_init(void); static inline void context_tracking_init(void) { } #endif /* CONFIG_CONTEXT_TRACKING_USER_FORCE */ +#ifdef CONFIG_CONTEXT_TRACKING_IDLE +extern void ct_idle_enter(void); +extern void ct_idle_exit(void); +#else +static inline void ct_idle_enter(void) { } +static inline void ct_idle_exit(void) { } +#endif /* !CONFIG_CONTEXT_TRACKING_IDLE */ + #endif diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 1a32036c918c..6ebe754501c3 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -128,7 +128,7 @@ static inline void rcu_nocb_flush_deferred_wakeup(void) { } * @a: Code that RCU needs to pay attention to. * * RCU read-side critical sections are forbidden in the inner idle loop, - * that is, between the rcu_idle_enter() and the rcu_idle_exit() -- RCU + * that is, between the ct_idle_enter() and the ct_idle_exit() -- RCU * will happily ignore any such read-side critical sections. However, * things like powertop need tracepoints in the inner idle loop. * -- cgit From 6f0e6c1598b1a3d19fc30db86b6e26d6f881b43d Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:26 +0200 Subject: context_tracking: Take IRQ eqs entrypoints over RCU The RCU dynticks counter is going to be merged into the context tracking subsystem. Prepare with moving the IRQ extended quiescent states entrypoints to context tracking. For now those are dumb redirection to existing RCU calls. [ paulmck: Apply Stephen Rothwell feedback from -next. ] [ paulmck: Apply Nathan Chancellor feedback. ] Acked-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Paul E. McKenney Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne --- include/linux/context_tracking_irq.h | 17 +++++++++++++++++ include/linux/context_tracking_state.h | 1 + include/linux/entry-common.h | 10 +++++----- include/linux/rcupdate.h | 5 +++-- include/linux/tracepoint.h | 4 ++-- 5 files changed, 28 insertions(+), 9 deletions(-) create mode 100644 include/linux/context_tracking_irq.h (limited to 'include/linux') diff --git a/include/linux/context_tracking_irq.h b/include/linux/context_tracking_irq.h new file mode 100644 index 000000000000..62f62bbd1a50 --- /dev/null +++ b/include/linux/context_tracking_irq.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_CONTEXT_TRACKING_IRQ_H +#define _LINUX_CONTEXT_TRACKING_IRQ_H + +#ifdef CONFIG_CONTEXT_TRACKING_IDLE +void ct_irq_enter(void); +void ct_irq_exit(void); +void ct_irq_enter_irqson(void); +void ct_irq_exit_irqson(void); +#else +static inline void ct_irq_enter(void) { } +static inline void ct_irq_exit(void) { } +static inline void ct_irq_enter_irqson(void) { } +static inline void ct_irq_exit_irqson(void) { } +#endif + +#endif diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h index 2b46afe105a9..9c16a8b2c194 100644 --- a/include/linux/context_tracking_state.h +++ b/include/linux/context_tracking_state.h @@ -4,6 +4,7 @@ #include #include +#include struct context_tracking { /* diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index c92ac75d6556..84a466b176cf 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -357,7 +357,7 @@ void irqentry_exit_to_user_mode(struct pt_regs *regs); /** * struct irqentry_state - Opaque object for exception state storage * @exit_rcu: Used exclusively in the irqentry_*() calls; signals whether the - * exit path has to invoke rcu_irq_exit(). + * exit path has to invoke ct_irq_exit(). * @lockdep: Used exclusively in the irqentry_nmi_*() calls; ensures that * lockdep state is restored correctly on exit from nmi. * @@ -395,12 +395,12 @@ typedef struct irqentry_state { * * For kernel mode entries RCU handling is done conditional. If RCU is * watching then the only RCU requirement is to check whether the tick has - * to be restarted. If RCU is not watching then rcu_irq_enter() has to be - * invoked on entry and rcu_irq_exit() on exit. + * to be restarted. If RCU is not watching then ct_irq_enter() has to be + * invoked on entry and ct_irq_exit() on exit. * - * Avoiding the rcu_irq_enter/exit() calls is an optimization but also + * Avoiding the ct_irq_enter/exit() calls is an optimization but also * solves the problem of kernel mode pagefaults which can schedule, which - * is not possible after invoking rcu_irq_enter() without undoing it. + * is not possible after invoking ct_irq_enter() without undoing it. * * For user mode entries irqentry_enter_from_user_mode() is invoked to * establish the proper context for NOHZ_FULL. Otherwise scheduling on exit diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 6ebe754501c3..f1562d91c67d 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -29,6 +29,7 @@ #include #include #include +#include #define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b)) #define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b)) @@ -143,9 +144,9 @@ static inline void rcu_nocb_flush_deferred_wakeup(void) { } */ #define RCU_NONIDLE(a) \ do { \ - rcu_irq_enter_irqson(); \ + ct_irq_enter_irqson(); \ do { a; } while (0); \ - rcu_irq_exit_irqson(); \ + ct_irq_exit_irqson(); \ } while (0) /* diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 28031b15f878..55717a2eda08 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -200,13 +200,13 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) */ \ if (rcuidle) { \ __idx = srcu_read_lock_notrace(&tracepoint_srcu);\ - rcu_irq_enter_irqson(); \ + ct_irq_enter_irqson(); \ } \ \ __DO_TRACE_CALL(name, TP_ARGS(args)); \ \ if (rcuidle) { \ - rcu_irq_exit_irqson(); \ + ct_irq_exit_irqson(); \ srcu_read_unlock_notrace(&tracepoint_srcu, __idx);\ } \ \ -- cgit From 493c1822825f00025d6754ec0632990a27edc6f8 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:27 +0200 Subject: context_tracking: Take NMI eqs entrypoints over RCU The RCU dynticks counter is going to be merged into the context tracking subsystem. Prepare with moving the NMI extended quiescent states entrypoints to context tracking. For now those are dumb redirection to existing RCU calls. Acked-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Paul E. McKenney Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne --- include/linux/context_tracking_irq.h | 4 ++++ include/linux/hardirq.h | 4 ++-- 2 files changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/context_tracking_irq.h b/include/linux/context_tracking_irq.h index 62f62bbd1a50..c50b5670c4a5 100644 --- a/include/linux/context_tracking_irq.h +++ b/include/linux/context_tracking_irq.h @@ -7,11 +7,15 @@ void ct_irq_enter(void); void ct_irq_exit(void); void ct_irq_enter_irqson(void); void ct_irq_exit_irqson(void); +void ct_nmi_enter(void); +void ct_nmi_exit(void); #else static inline void ct_irq_enter(void) { } static inline void ct_irq_exit(void) { } static inline void ct_irq_enter_irqson(void) { } static inline void ct_irq_exit_irqson(void) { } +static inline void ct_nmi_enter(void) { } +static inline void ct_nmi_exit(void) { } #endif #endif diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h index 76878b357ffa..345cdbe9c1b7 100644 --- a/include/linux/hardirq.h +++ b/include/linux/hardirq.h @@ -124,7 +124,7 @@ extern void rcu_nmi_exit(void); do { \ __nmi_enter(); \ lockdep_hardirq_enter(); \ - rcu_nmi_enter(); \ + ct_nmi_enter(); \ instrumentation_begin(); \ ftrace_nmi_enter(); \ instrumentation_end(); \ @@ -143,7 +143,7 @@ extern void rcu_nmi_exit(void); instrumentation_begin(); \ ftrace_nmi_exit(); \ instrumentation_end(); \ - rcu_nmi_exit(); \ + ct_nmi_exit(); \ lockdep_hardirq_exit(); \ __nmi_exit(); \ } while (0) -- cgit From 3864caafe7c66f01b188ffccb6a4215f3bf56292 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:28 +0200 Subject: rcu/context-tracking: Remove rcu_irq_enter/exit() Now rcu_irq_enter/exit() is an unnecessary middle call between ct_irq_enter/exit() and nmi_irq_enter/exit(). Take this opportunity to remove the former functions and move the comments above them to the new entrypoints. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Paul E. McKenney Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne --- include/linux/rcutiny.h | 4 ---- include/linux/rcutree.h | 4 ---- 2 files changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index 5fed476f977f..591119413cf1 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -78,10 +78,6 @@ static inline void rcu_cpu_stall_reset(void) { } static inline int rcu_jiffies_till_stall_check(void) { return 21 * HZ; } static inline void rcu_idle_enter(void) { } static inline void rcu_idle_exit(void) { } -static inline void rcu_irq_enter(void) { } -static inline void rcu_irq_exit_irqson(void) { } -static inline void rcu_irq_enter_irqson(void) { } -static inline void rcu_irq_exit(void) { } static inline void rcu_irq_exit_check_preempt(void) { } #define rcu_is_idle_cpu(cpu) \ (is_idle_task(current) && !in_nmi() && !in_hardirq() && !in_serving_softirq()) diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 9c6cfb742504..4522b6a7cc42 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -47,10 +47,6 @@ void cond_synchronize_rcu(unsigned long oldstate); void rcu_idle_enter(void); void rcu_idle_exit(void); -void rcu_irq_enter(void); -void rcu_irq_exit(void); -void rcu_irq_enter_irqson(void); -void rcu_irq_exit_irqson(void); bool rcu_is_idle_cpu(int cpu); #ifdef CONFIG_PROVE_RCU -- cgit From 62e2412df4b90ae6706ce1f1a9649b789b2e44ef Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:29 +0200 Subject: rcu/context_tracking: Move dynticks counter to context tracking In order to prepare for merging RCU dynticks counter into the context tracking state, move the rcu_data's dynticks field to the context tracking structure. It will later be mixed within the context tracking state itself. [ paulmck: Move enum ctx_state into global scope. ] Acked-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Paul E. McKenney Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne --- include/linux/context_tracking_state.h | 45 ++++++++++++++++++++++++++++------ 1 file changed, 38 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h index 9c16a8b2c194..5a8da2787287 100644 --- a/include/linux/context_tracking_state.h +++ b/include/linux/context_tracking_state.h @@ -6,7 +6,15 @@ #include #include +enum ctx_state { + CONTEXT_DISABLED = -1, /* returned by ct_state() if unknown */ + CONTEXT_KERNEL = 0, + CONTEXT_USER, + CONTEXT_GUEST, +}; + struct context_tracking { +#ifdef CONFIG_CONTEXT_TRACKING_USER /* * When active is false, probes are unset in order * to minimize overhead: TIF flags are cleared @@ -15,17 +23,40 @@ struct context_tracking { */ bool active; int recursion; - enum ctx_state { - CONTEXT_DISABLED = -1, /* returned by ct_state() if unknown */ - CONTEXT_KERNEL = 0, - CONTEXT_USER, - CONTEXT_GUEST, - } state; + enum ctx_state state; +#endif +#ifdef CONFIG_CONTEXT_TRACKING_IDLE + atomic_t dynticks; /* Even value for idle, else odd. */ +#endif }; +#ifdef CONFIG_CONTEXT_TRACKING +DECLARE_PER_CPU(struct context_tracking, context_tracking); +#endif + +#ifdef CONFIG_CONTEXT_TRACKING_IDLE +static __always_inline int ct_dynticks(void) +{ + return atomic_read(this_cpu_ptr(&context_tracking.dynticks)); +} + +static __always_inline int ct_dynticks_cpu(int cpu) +{ + struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu); + + return atomic_read(&ct->dynticks); +} + +static __always_inline int ct_dynticks_cpu_acquire(int cpu) +{ + struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu); + + return atomic_read_acquire(&ct->dynticks); +} +#endif /* #ifdef CONFIG_CONTEXT_TRACKING_IDLE */ + #ifdef CONFIG_CONTEXT_TRACKING_USER extern struct static_key_false context_tracking_key; -DECLARE_PER_CPU(struct context_tracking, context_tracking); static __always_inline bool context_tracking_enabled(void) { -- cgit From 904e600e60f46f92eb4bcfb95788b1fedf7e8237 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:30 +0200 Subject: rcu/context_tracking: Move dynticks_nesting to context tracking The RCU eqs tracking is going to be performed by the context tracking subsystem. The related nesting counters thus need to be moved to the context tracking structure. Acked-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Paul E. McKenney Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne --- include/linux/context_tracking_state.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h index 5a8da2787287..13a4a9d1ec7e 100644 --- a/include/linux/context_tracking_state.h +++ b/include/linux/context_tracking_state.h @@ -27,6 +27,7 @@ struct context_tracking { #endif #ifdef CONFIG_CONTEXT_TRACKING_IDLE atomic_t dynticks; /* Even value for idle, else odd. */ + long dynticks_nesting; /* Track process nesting level. */ #endif }; @@ -53,6 +54,18 @@ static __always_inline int ct_dynticks_cpu_acquire(int cpu) return atomic_read_acquire(&ct->dynticks); } + +static __always_inline long ct_dynticks_nesting(void) +{ + return __this_cpu_read(context_tracking.dynticks_nesting); +} + +static __always_inline long ct_dynticks_nesting_cpu(int cpu) +{ + struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu); + + return ct->dynticks_nesting; +} #endif /* #ifdef CONFIG_CONTEXT_TRACKING_IDLE */ #ifdef CONFIG_CONTEXT_TRACKING_USER -- cgit From 95e04f48ec0a634e2f221081f5fa1a904755f326 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:31 +0200 Subject: rcu/context_tracking: Move dynticks_nmi_nesting to context tracking The RCU eqs tracking is going to be performed by the context tracking subsystem. The related nesting counters thus need to be moved to the context tracking structure. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Paul E. McKenney Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne --- include/linux/context_tracking_state.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h index 13a4a9d1ec7e..5f11e3d2d85a 100644 --- a/include/linux/context_tracking_state.h +++ b/include/linux/context_tracking_state.h @@ -13,6 +13,9 @@ enum ctx_state { CONTEXT_GUEST, }; +/* Offset to allow distinguishing irq vs. task-based idle entry/exit. */ +#define DYNTICK_IRQ_NONIDLE ((LONG_MAX / 2) + 1) + struct context_tracking { #ifdef CONFIG_CONTEXT_TRACKING_USER /* @@ -28,6 +31,7 @@ struct context_tracking { #ifdef CONFIG_CONTEXT_TRACKING_IDLE atomic_t dynticks; /* Even value for idle, else odd. */ long dynticks_nesting; /* Track process nesting level. */ + long dynticks_nmi_nesting; /* Track irq/NMI nesting level. */ #endif }; @@ -66,6 +70,18 @@ static __always_inline long ct_dynticks_nesting_cpu(int cpu) return ct->dynticks_nesting; } + +static __always_inline long ct_dynticks_nmi_nesting(void) +{ + return __this_cpu_read(context_tracking.dynticks_nmi_nesting); +} + +static __always_inline long ct_dynticks_nmi_nesting_cpu(int cpu) +{ + struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu); + + return ct->dynticks_nmi_nesting; +} #endif /* #ifdef CONFIG_CONTEXT_TRACKING_IDLE */ #ifdef CONFIG_CONTEXT_TRACKING_USER -- cgit From 564506495ca96a6e66d077d3d5b9f02d4b9b0f45 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:32 +0200 Subject: rcu/context-tracking: Move deferred nocb resched to context tracking To prepare for migrating the RCU eqs accounting code to context tracking, split the last-resort deferred nocb resched from rcu_user_enter() and move it into a separate call from context tracking. Acked-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Paul E. McKenney Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne --- include/linux/rcupdate.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index f1562d91c67d..3717cad983a6 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -112,6 +112,12 @@ static inline void rcu_user_enter(void) { } static inline void rcu_user_exit(void) { } #endif /* CONFIG_NO_HZ_FULL */ +#if defined(CONFIG_NO_HZ_FULL) && (!defined(CONFIG_GENERIC_ENTRY) || !defined(CONFIG_KVM_XFER_TO_GUEST_WORK)) +void rcu_irq_work_resched(void); +#else +static inline void rcu_irq_work_resched(void) { } +#endif + #ifdef CONFIG_RCU_NOCB_CPU void rcu_init_nohz(void); int rcu_nocb_cpu_offload(int cpu); -- cgit From 172114552701b85d5c3b1a089a73ee85d0d7786b Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:33 +0200 Subject: rcu/context-tracking: Move RCU-dynticks internal functions to context_tracking Move the core RCU eqs/dynticks functions to context tracking so that we can later merge all that code within context tracking. Acked-by: Paul E. McKenney Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Paul E. McKenney Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne --- include/linux/context_tracking.h | 20 ++++++++++++++++++++ include/linux/rcutree.h | 3 +++ 2 files changed, 23 insertions(+) (limited to 'include/linux') diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h index 01abadb2f993..1f568676bc1d 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -122,6 +122,26 @@ static inline void context_tracking_init(void) { } #ifdef CONFIG_CONTEXT_TRACKING_IDLE extern void ct_idle_enter(void); extern void ct_idle_exit(void); + +/* + * Is the current CPU in an extended quiescent state? + * + * No ordering, as we are sampling CPU-local information. + */ +static __always_inline bool rcu_dynticks_curr_cpu_in_eqs(void) +{ + return !(arch_atomic_read(this_cpu_ptr(&context_tracking.dynticks)) & 0x1); +} + +/* + * Increment the current CPU's context_tracking structure's ->dynticks field + * with ordering. Return the new value. + */ +static __always_inline unsigned long rcu_dynticks_inc(int incby) +{ + return arch_atomic_add_return(incby, this_cpu_ptr(&context_tracking.dynticks)); +} + #else static inline void ct_idle_enter(void) { } static inline void ct_idle_exit(void) { } diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 4522b6a7cc42..24db1e41695c 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -55,6 +55,9 @@ void rcu_irq_exit_check_preempt(void); static inline void rcu_irq_exit_check_preempt(void) { } #endif +struct task_struct; +void rcu_preempt_deferred_qs(struct task_struct *t); + void exit_rcu(void); void rcu_scheduler_starting(void); -- cgit From c33ef43a359001415032665dfcd433979c462b71 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:34 +0200 Subject: rcu/context-tracking: Remove unused and/or unecessary middle functions Some eqs functions are now only used internally by context tracking, so their public declarations can be removed. Also middle functions such as rcu_user_*() and rcu_idle_*() which now directly call to rcu_eqs_enter() and rcu_eqs_exit() can be wiped out as well. Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Paul E. McKenney Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne --- include/linux/hardirq.h | 8 -------- include/linux/rcupdate.h | 8 -------- include/linux/rcutiny.h | 2 -- include/linux/rcutree.h | 2 -- 4 files changed, 20 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h index 345cdbe9c1b7..d57cab4d4c06 100644 --- a/include/linux/hardirq.h +++ b/include/linux/hardirq.h @@ -92,14 +92,6 @@ void irq_exit_rcu(void); #define arch_nmi_exit() do { } while (0) #endif -#ifdef CONFIG_TINY_RCU -static inline void rcu_nmi_enter(void) { } -static inline void rcu_nmi_exit(void) { } -#else -extern void rcu_nmi_enter(void); -extern void rcu_nmi_exit(void); -#endif - /* * NMI vs Tracing * -------------- diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 3717cad983a6..434da1eb88cd 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -104,14 +104,6 @@ static inline void rcu_sysrq_start(void) { } static inline void rcu_sysrq_end(void) { } #endif /* #else #ifdef CONFIG_RCU_STALL_COMMON */ -#ifdef CONFIG_NO_HZ_FULL -void rcu_user_enter(void); -void rcu_user_exit(void); -#else -static inline void rcu_user_enter(void) { } -static inline void rcu_user_exit(void) { } -#endif /* CONFIG_NO_HZ_FULL */ - #if defined(CONFIG_NO_HZ_FULL) && (!defined(CONFIG_GENERIC_ENTRY) || !defined(CONFIG_KVM_XFER_TO_GUEST_WORK)) void rcu_irq_work_resched(void); #else diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index 591119413cf1..900ba35c3582 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -76,8 +76,6 @@ static inline int rcu_needs_cpu(void) static inline void rcu_virt_note_context_switch(int cpu) { } static inline void rcu_cpu_stall_reset(void) { } static inline int rcu_jiffies_till_stall_check(void) { return 21 * HZ; } -static inline void rcu_idle_enter(void) { } -static inline void rcu_idle_exit(void) { } static inline void rcu_irq_exit_check_preempt(void) { } #define rcu_is_idle_cpu(cpu) \ (is_idle_task(current) && !in_nmi() && !in_hardirq() && !in_serving_softirq()) diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 24db1e41695c..9cca00ed9bc9 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -45,8 +45,6 @@ unsigned long start_poll_synchronize_rcu(void); bool poll_state_synchronize_rcu(unsigned long oldstate); void cond_synchronize_rcu(unsigned long oldstate); -void rcu_idle_enter(void); -void rcu_idle_exit(void); bool rcu_is_idle_cpu(int cpu); #ifdef CONFIG_PROVE_RCU -- cgit From 171476775d32a40bfebf83250136c19b2e842672 Mon Sep 17 00:00:00 2001 From: Frederic Weisbecker Date: Wed, 8 Jun 2022 16:40:35 +0200 Subject: context_tracking: Convert state to atomic_t Context tracking's state and dynticks counter are going to be merged in a single field so that both updates can happen atomically and at the same time. Prepare for that with converting the state into an atomic_t. [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Frederic Weisbecker Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Neeraj Upadhyay Cc: Uladzislau Rezki Cc: Joel Fernandes Cc: Boqun Feng Cc: Nicolas Saenz Julienne Cc: Marcelo Tosatti Cc: Xiongfeng Wang Cc: Yu Liao Cc: Phil Auld Cc: Paul Gortmaker Cc: Alex Belits Signed-off-by: Paul E. McKenney Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne --- include/linux/context_tracking.h | 32 ++++++------------- include/linux/context_tracking_state.h | 57 +++++++++++++++++++++++++++------- 2 files changed, 56 insertions(+), 33 deletions(-) (limited to 'include/linux') diff --git a/include/linux/context_tracking.h b/include/linux/context_tracking.h index 1f568676bc1d..dcef4a9e4d63 100644 --- a/include/linux/context_tracking.h +++ b/include/linux/context_tracking.h @@ -56,7 +56,7 @@ static inline enum ctx_state exception_enter(void) !context_tracking_enabled()) return 0; - prev_ctx = this_cpu_read(context_tracking.state); + prev_ctx = __ct_state(); if (prev_ctx != CONTEXT_KERNEL) ct_user_exit(prev_ctx); @@ -86,33 +86,21 @@ static __always_inline void context_tracking_guest_exit(void) __ct_user_exit(CONTEXT_GUEST); } -/** - * ct_state() - return the current context tracking state if known - * - * Returns the current cpu's context tracking state if context tracking - * is enabled. If context tracking is disabled, returns - * CONTEXT_DISABLED. This should be used primarily for debugging. - */ -static __always_inline enum ctx_state ct_state(void) -{ - return context_tracking_enabled() ? - this_cpu_read(context_tracking.state) : CONTEXT_DISABLED; -} +#define CT_WARN_ON(cond) WARN_ON(context_tracking_enabled() && (cond)) + #else static inline void user_enter(void) { } static inline void user_exit(void) { } static inline void user_enter_irqoff(void) { } static inline void user_exit_irqoff(void) { } -static inline enum ctx_state exception_enter(void) { return 0; } +static inline int exception_enter(void) { return 0; } static inline void exception_exit(enum ctx_state prev_ctx) { } -static inline enum ctx_state ct_state(void) { return CONTEXT_DISABLED; } +static inline int ct_state(void) { return -1; } static __always_inline bool context_tracking_guest_enter(void) { return false; } static inline void context_tracking_guest_exit(void) { } - +#define CT_WARN_ON(cond) do { } while (0) #endif /* !CONFIG_CONTEXT_TRACKING_USER */ -#define CT_WARN_ON(cond) WARN_ON(context_tracking_enabled() && (cond)) - #ifdef CONFIG_CONTEXT_TRACKING_USER_FORCE extern void context_tracking_init(void); #else @@ -130,16 +118,16 @@ extern void ct_idle_exit(void); */ static __always_inline bool rcu_dynticks_curr_cpu_in_eqs(void) { - return !(arch_atomic_read(this_cpu_ptr(&context_tracking.dynticks)) & 0x1); + return !(arch_atomic_read(this_cpu_ptr(&context_tracking.state)) & RCU_DYNTICKS_IDX); } /* - * Increment the current CPU's context_tracking structure's ->dynticks field + * Increment the current CPU's context_tracking structure's ->state field * with ordering. Return the new value. */ -static __always_inline unsigned long rcu_dynticks_inc(int incby) +static __always_inline unsigned long ct_state_inc(int incby) { - return arch_atomic_add_return(incby, this_cpu_ptr(&context_tracking.dynticks)); + return arch_atomic_add_return(incby, this_cpu_ptr(&context_tracking.state)); } #else diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h index 5f11e3d2d85a..e20a74bc0597 100644 --- a/include/linux/context_tracking_state.h +++ b/include/linux/context_tracking_state.h @@ -6,15 +6,23 @@ #include #include +/* Offset to allow distinguishing irq vs. task-based idle entry/exit. */ +#define DYNTICK_IRQ_NONIDLE ((LONG_MAX / 2) + 1) + enum ctx_state { - CONTEXT_DISABLED = -1, /* returned by ct_state() if unknown */ - CONTEXT_KERNEL = 0, - CONTEXT_USER, - CONTEXT_GUEST, + CONTEXT_DISABLED = -1, /* returned by ct_state() if unknown */ + CONTEXT_KERNEL = 0, + CONTEXT_IDLE = 1, + CONTEXT_USER = 2, + CONTEXT_GUEST = 3, + CONTEXT_MAX = 4, }; -/* Offset to allow distinguishing irq vs. task-based idle entry/exit. */ -#define DYNTICK_IRQ_NONIDLE ((LONG_MAX / 2) + 1) +/* Even value for idle, else odd. */ +#define RCU_DYNTICKS_IDX CONTEXT_MAX + +#define CT_STATE_MASK (CONTEXT_MAX - 1) +#define CT_DYNTICKS_MASK (~CT_STATE_MASK) struct context_tracking { #ifdef CONFIG_CONTEXT_TRACKING_USER @@ -26,10 +34,11 @@ struct context_tracking { */ bool active; int recursion; - enum ctx_state state; +#endif +#ifdef CONFIG_CONTEXT_TRACKING + atomic_t state; #endif #ifdef CONFIG_CONTEXT_TRACKING_IDLE - atomic_t dynticks; /* Even value for idle, else odd. */ long dynticks_nesting; /* Track process nesting level. */ long dynticks_nmi_nesting; /* Track irq/NMI nesting level. */ #endif @@ -37,26 +46,31 @@ struct context_tracking { #ifdef CONFIG_CONTEXT_TRACKING DECLARE_PER_CPU(struct context_tracking, context_tracking); + +static __always_inline int __ct_state(void) +{ + return atomic_read(this_cpu_ptr(&context_tracking.state)) & CT_STATE_MASK; +} #endif #ifdef CONFIG_CONTEXT_TRACKING_IDLE static __always_inline int ct_dynticks(void) { - return atomic_read(this_cpu_ptr(&context_tracking.dynticks)); + return atomic_read(this_cpu_ptr(&context_tracking.state)) & CT_DYNTICKS_MASK; } static __always_inline int ct_dynticks_cpu(int cpu) { struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu); - return atomic_read(&ct->dynticks); + return atomic_read(&ct->state) & CT_DYNTICKS_MASK; } static __always_inline int ct_dynticks_cpu_acquire(int cpu) { struct context_tracking *ct = per_cpu_ptr(&context_tracking, cpu); - return atomic_read_acquire(&ct->dynticks); + return atomic_read_acquire(&ct->state) & CT_DYNTICKS_MASK; } static __always_inline long ct_dynticks_nesting(void) @@ -102,6 +116,27 @@ static inline bool context_tracking_enabled_this_cpu(void) return context_tracking_enabled() && __this_cpu_read(context_tracking.active); } +/** + * ct_state() - return the current context tracking state if known + * + * Returns the current cpu's context tracking state if context tracking + * is enabled. If context tracking is disabled, returns + * CONTEXT_DISABLED. This should be used primarily for debugging. + */ +static __always_inline int ct_state(void) +{ + int ret; + + if (!context_tracking_enabled()) + return CONTEXT_DISABLED; + + preempt_disable(); + ret = __ct_state(); + preempt_enable(); + + return ret; +} + #else static __always_inline bool context_tracking_enabled(void) { return false; } static __always_inline bool context_tracking_enabled_cpu(int cpu) { return false; } -- cgit From 1dcaa3b462265f688613163a1562a65ee53a3311 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 16 Jun 2022 09:30:37 -0700 Subject: context_tracking: Use arch_atomic_read() in __ct_state for KASAN Context tracking's __ct_state() function can be invoked from noinstr state where RCU is not watching. This means that its use of atomic_read() causes KASAN to invoke the non-noinstr __kasan_check_read() function from the noinstr function __ct_state(). This is problematic because someone tracing the __kasan_check_read() function could get a nasty surprise because of RCU not watching. This commit therefore replaces the __ct_state() function's use of atomic_read() with arch_atomic_read(), which KASAN does not attempt to add instrumention to. Reported-by: kernel test robot Signed-off-by: Paul E. McKenney Cc: Frederic Weisbecker Cc: Marco Elver Reviewed-by: Marco Elver --- include/linux/context_tracking_state.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/context_tracking_state.h b/include/linux/context_tracking_state.h index e20a74bc0597..4a4d56f77180 100644 --- a/include/linux/context_tracking_state.h +++ b/include/linux/context_tracking_state.h @@ -49,7 +49,7 @@ DECLARE_PER_CPU(struct context_tracking, context_tracking); static __always_inline int __ct_state(void) { - return atomic_read(this_cpu_ptr(&context_tracking.state)) & CT_STATE_MASK; + return arch_atomic_read(this_cpu_ptr(&context_tracking.state)) & CT_STATE_MASK; } #endif -- cgit From 6a4e9307cd3782f8e805fac970b9a240ab3078d6 Mon Sep 17 00:00:00 2001 From: Julia Lawall Date: Sat, 21 May 2022 13:11:12 +0200 Subject: dmaengine: qcom: fix typo in comment Spelling mistake (triple letters) in comment. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall Link: https://lore.kernel.org/r/20220521111145.81697-62-Julia.Lawall@inria.fr Signed-off-by: Vinod Koul --- include/linux/dma/qcom-gpi-dma.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dma/qcom-gpi-dma.h b/include/linux/dma/qcom-gpi-dma.h index f46dc3372f11..6680dd1a43c6 100644 --- a/include/linux/dma/qcom-gpi-dma.h +++ b/include/linux/dma/qcom-gpi-dma.h @@ -26,7 +26,7 @@ enum spi_transfer_cmd { * @clk_div: source clock divider * @clk_src: serial clock * @cmd: spi cmd - * @fragmentation: keep CS assserted at end of sequence + * @fragmentation: keep CS asserted at end of sequence * @cs: chip select toggle * @set_config: set peripheral config * @rx_len: receive length for buffer -- cgit From bd29c00edd0a5dac8b6e7332bb470cd50f92e893 Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Tue, 21 Jun 2022 17:56:38 -0500 Subject: soundwire: revisit driver bind/unbind and callbacks MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In the SoundWire probe, we store a pointer from the driver ops into the 'slave' structure. This can lead to kernel oopses when unbinding codec drivers, e.g. with the following sequence to remove machine driver and codec driver. /sbin/modprobe -r snd_soc_sof_sdw /sbin/modprobe -r snd_soc_rt711 The full details can be found in the BugLink below, for reference the two following examples show different cases of driver ops/callbacks being invoked after the driver .remove(). kernel: BUG: kernel NULL pointer dereference, address: 0000000000000150 kernel: Workqueue: events cdns_update_slave_status_work [soundwire_cadence] kernel: RIP: 0010:mutex_lock+0x19/0x30 kernel: Call Trace: kernel: ? sdw_handle_slave_status+0x426/0xe00 [soundwire_bus 94ff184bf398570c3f8ff7efe9e32529f532e4ae] kernel: ? newidle_balance+0x26a/0x400 kernel: ? cdns_update_slave_status_work+0x1e9/0x200 [soundwire_cadence 1bcf98eebe5ba9833cd433323769ac923c9c6f82] kernel: BUG: unable to handle page fault for address: ffffffffc07654c8 kernel: Workqueue: pm pm_runtime_work kernel: RIP: 0010:sdw_bus_prep_clk_stop+0x6f/0x160 [soundwire_bus] kernel: Call Trace: kernel: kernel: sdw_cdns_clock_stop+0xb5/0x1b0 [soundwire_cadence 1bcf98eebe5ba9833cd433323769ac923c9c6f82] kernel: intel_suspend_runtime+0x5f/0x120 [soundwire_intel aca858f7c87048d3152a4a41bb68abb9b663a1dd] kernel: ? dpm_sysfs_remove+0x60/0x60 This was not detected earlier in Intel tests since the tests first remove the parent PCI device and shut down the bus. The sequence above is a corner case which keeps the bus operational but without a driver bound. While trying to solve this kernel oopses, it became clear that the existing SoundWire bus does not deal well with the unbind case. Commit 528be501b7d4a ("soundwire: sdw_slave: add probe_complete structure and new fields") added a 'probed' status variable and a 'probe_complete' struct completion. This status is however not reset on remove and likewise the 'probe complete' is not re-initialized, so the bind/unbind/bind test cases would fail. The timeout used before the 'update_status' callback was also a bad idea in hindsight, there should really be no timing assumption as to if and when a driver is bound to a device. An initial draft was based on device_lock() and device_unlock() was tested. This proved too complicated, with deadlocks created during the suspend-resume sequences, which also use the same device_lock/unlock() as the bind/unbind sequences. On a CometLake device, a bad DSDT/BIOS caused spurious resumes and the use of device_lock() caused hangs during suspend. After multiple weeks or testing and painful reverse-engineering of deadlocks on different devices, we looked for alternatives that did not interfere with the device core. A bus notifier was used successfully to keep track of DRIVER_BOUND and DRIVER_UNBIND events. This solved the bind-unbind-bind case in tests, but it can still be defeated with a theoretical corner case where the memory is freed by a .remove while the callback is in use. The notifier only helps make sure the driver callbacks are valid, but not that the memory allocated in probe remains valid while the callbacks are invoked. This patch suggests the introduction of a new 'sdw_dev_lock' mutex protecting probe/remove and all driver callbacks. Since this mutex is 'local' to SoundWire only, it does not interfere with existing locks and does not create deadlocks. In addition, this patch removes the 'probe_complete' completion, instead we directly invoke the 'update_status' from the probe routine. That removes any sort of timing dependency and a much better support for the device/driver model, the driver could be bound before the bus started, or eons after the bus started and the hardware would be properly initialized in all cases. BugLink: https://github.com/thesofproject/linux/issues/3531 Fixes: 56d4fe31af77 ("soundwire: Add MIPI DisCo property helpers") Fixes: 528be501b7d4a ("soundwire: sdw_slave: add probe_complete structure and new fields") Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Reviewed-by: Ranjani Sridharan Reviewed-by: Bard Liao Reviewed-by: Péter Ujfalusi Link: https://lore.kernel.org/r/20220621225641.221170-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index 76ce3f3ac0f2..bf6f0decb3f6 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -646,9 +646,6 @@ struct sdw_slave_ops { * @dev_num: Current Device Number, values can be 0 or dev_num_sticky * @dev_num_sticky: one-time static Device Number assigned by Bus * @probed: boolean tracking driver state - * @probe_complete: completion utility to control potential races - * on startup between driver probe/initialization and SoundWire - * Slave state changes/implementation-defined interrupts * @enumeration_complete: completion utility to control potential races * on startup between device enumeration and read/write access to the * Slave device @@ -663,6 +660,7 @@ struct sdw_slave_ops { * for a Slave happens for the first time after enumeration * @is_mockup_device: status flag used to squelch errors in the command/control * protocol for SoundWire mockup devices + * @sdw_dev_lock: mutex used to protect callbacks/remove races */ struct sdw_slave { struct sdw_slave_id id; @@ -680,12 +678,12 @@ struct sdw_slave { u16 dev_num; u16 dev_num_sticky; bool probed; - struct completion probe_complete; struct completion enumeration_complete; struct completion initialization_complete; u32 unattach_request; bool first_interrupt_done; bool is_mockup_device; + struct mutex sdw_dev_lock; /* protect callbacks/remove races */ }; #define dev_to_sdw_dev(_dev) container_of(_dev, struct sdw_slave, dev) -- cgit From 9a24bb35b0d81edb826f174d7752c2a54bc00abd Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Tue, 21 Jun 2022 17:56:39 -0500 Subject: soundwire: peripheral: remove useless ops pointer MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Now that we are using the ops structure directly from the driver, there are no users left of this ops pointer. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Reviewed-by: Ranjani Sridharan Reviewed-by: Bard Liao Reviewed-by: Péter Ujfalusi Link: https://lore.kernel.org/r/20220621225641.221170-3-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index bf6f0decb3f6..39058c841469 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -637,7 +637,6 @@ struct sdw_slave_ops { * @dev: Linux device * @status: Status reported by the Slave * @bus: Bus handle - * @ops: Slave callback ops * @prop: Slave properties * @debugfs: Slave debugfs * @node: node for bus list @@ -667,7 +666,6 @@ struct sdw_slave { struct device dev; enum sdw_slave_status status; struct sdw_bus *bus; - const struct sdw_slave_ops *ops; struct sdw_slave_prop prop; #ifdef CONFIG_DEBUG_FS struct dentry *debugfs; -- cgit From e9a023f2b73ac35ff5cfbefe8524c64d8173d65f Mon Sep 17 00:00:00 2001 From: Eric Lin Date: Tue, 5 Jul 2022 17:19:20 +0800 Subject: drivers/perf: riscv_pmu: Add riscv pmu pm notifier Currently, when the CPU is doing suspend to ram, we don't save pmu counter register and its content will be lost. To ensure perf profiling is not affected by suspend to ram, this patch is based on arm_pmu CPU_PM notifier and implements riscv pmu pm notifier. In the pm notifier, we stop the counter and update the counter value before suspend and start the counter after resume. Signed-off-by: Eric Lin Link: https://lore.kernel.org/r/20220705091920.27432-1-eric.lin@sifive.com Signed-off-by: Will Deacon --- include/linux/perf/riscv_pmu.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h index 46f9b6fe306e..bf66fe011fa8 100644 --- a/include/linux/perf/riscv_pmu.h +++ b/include/linux/perf/riscv_pmu.h @@ -56,9 +56,13 @@ struct riscv_pmu { struct cpu_hw_events __percpu *hw_events; struct hlist_node node; + struct notifier_block riscv_pm_nb; }; #define to_riscv_pmu(p) (container_of(p, struct riscv_pmu, pmu)) + +void riscv_pmu_start(struct perf_event *event, int flags); +void riscv_pmu_stop(struct perf_event *event, int flags); unsigned long riscv_pmu_ctr_read_csr(unsigned long csr); int riscv_pmu_event_set_period(struct perf_event *event); uint64_t riscv_pmu_ctr_get_width_mask(struct perf_event *event); -- cgit From 66637ab137b44914356a9dc7a9b3f8ebcf0b0695 Mon Sep 17 00:00:00 2001 From: Guangbin Huang Date: Tue, 28 Jun 2022 14:34:19 +0800 Subject: drivers/perf: hisi: add driver for HNS3 PMU HNS3(HiSilicon Network System 3) PMU is RCiEP device in HiSilicon SoC NIC, supports collection of performance statistics such as bandwidth, latency, packet rate and interrupt rate. NIC of each SICL has one PMU device for it. Driver registers each PMU device to perf, and exports information of supported events, filter mode of each event, bdf range, hardware clock frequency, identifier and so on via sysfs. Each PMU device has its own registers of control, counters and interrupt, and it supports 8 hardware events, each hardward event has its own registers for configuration, counters and interrupt. Filter options contains: config - select event port - select physical port of nic tc - select tc(must be used with port) func - select PF/VF queue - select queue of PF/VF(must be used with func) intr - select interrupt number(must be used with func) global - select all functions of IO DIE Signed-off-by: Guangbin Huang Reviewed-by: John Garry Reviewed-by: Shaokun Zhang Link: https://lore.kernel.org/r/20220628063419.38514-3-huangguangbin2@huawei.com Signed-off-by: Will Deacon --- include/linux/cpuhotplug.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index 19f0dbfdd7fe..3e99fb4d3134 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -230,6 +230,7 @@ enum cpuhp_state { CPUHP_AP_PERF_ARM_HISI_PA_ONLINE, CPUHP_AP_PERF_ARM_HISI_SLLC_ONLINE, CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE, + CPUHP_AP_PERF_ARM_HNS3_PMU_ONLINE, CPUHP_AP_PERF_ARM_L2X0_ONLINE, CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE, CPUHP_AP_PERF_ARM_QCOM_L3_ONLINE, -- cgit From 3b7e2482f9a3f2e99628048b945c9c6cc53bd38e Mon Sep 17 00:00:00 2001 From: Shameer Kolothum Date: Wed, 15 Jun 2022 11:10:36 +0100 Subject: iommu: Introduce a callback to struct iommu_resv_region MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A callback is introduced to struct iommu_resv_region to free memory allocations associated with the reserved region. This will be useful when we introduce support for IORT RMR based reserved regions. Reviewed-by: Christoph Hellwig Tested-by: Steven Price Tested-by: Laurentiu Tudor Tested-by: Hanjun Guo Signed-off-by: Shameer Kolothum Acked-by: Robin Murphy Link: https://lore.kernel.org/r/20220615101044.1972-2-shameerali.kolothum.thodi@huawei.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 5e1afe169549..b22ffa6bc4a9 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -135,6 +135,7 @@ enum iommu_resv_type { * @length: Length of the region in bytes * @prot: IOMMU Protection flags (READ/WRITE/...) * @type: Type of the reserved region + * @free: Callback to free associated memory allocations */ struct iommu_resv_region { struct list_head list; @@ -142,6 +143,7 @@ struct iommu_resv_region { size_t length; int prot; enum iommu_resv_type type; + void (*free)(struct device *dev, struct iommu_resv_region *region); }; /** -- cgit From 8778b1d48117055c710a01498f65fa730160fdfa Mon Sep 17 00:00:00 2001 From: Shameer Kolothum Date: Wed, 15 Jun 2022 11:10:37 +0100 Subject: ACPI/IORT: Make iort_iommu_msi_get_resv_regions() return void At present iort_iommu_msi_get_resv_regions() returns the number of MSI reserved regions on success and there are no users for this. The reserved region list will get populated anyway for platforms that require the HW MSI region reservation. Hence, change the function to return void instead. Reviewed-by: Christoph Hellwig Tested-by: Steven Price Tested-by: Laurentiu Tudor Reviewed-by: Hanjun Guo Signed-off-by: Shameer Kolothum Acked-by: Robin Murphy Link: https://lore.kernel.org/r/20220615101044.1972-3-shameerali.kolothum.thodi@huawei.com Signed-off-by: Joerg Roedel --- include/linux/acpi_iort.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h index f1f0842a2cb2..a8198b83753d 100644 --- a/include/linux/acpi_iort.h +++ b/include/linux/acpi_iort.h @@ -36,7 +36,7 @@ int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id); /* IOMMU interface */ int iort_dma_get_ranges(struct device *dev, u64 *size); int iort_iommu_configure_id(struct device *dev, const u32 *id_in); -int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head); +void iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head); phys_addr_t acpi_iort_dma_get_max_cpu_address(void); #else static inline void acpi_iort_init(void) { } @@ -52,8 +52,8 @@ static inline int iort_dma_get_ranges(struct device *dev, u64 *size) static inline int iort_iommu_configure_id(struct device *dev, const u32 *id_in) { return -ENODEV; } static inline -int iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) -{ return 0; } +void iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) +{ } static inline phys_addr_t acpi_iort_dma_get_max_cpu_address(void) { return PHYS_ADDR_MAX; } -- cgit From 55be25b8b5e4e2fd680cfb073b84a74a79c002fa Mon Sep 17 00:00:00 2001 From: Shameer Kolothum Date: Wed, 15 Jun 2022 11:10:38 +0100 Subject: ACPI/IORT: Provide a generic helper to retrieve reserve regions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Currently IORT provides a helper to retrieve HW MSI reserve regions. Change this to a generic helper to retrieve any IORT related reserve regions. This will be useful when we add support for RMR nodes in subsequent patches. [Lorenzo: For ACPI IORT] Reviewed-by: Lorenzo Pieralisi Reviewed-by: Christoph Hellwig Tested-by: Steven Price Tested-by: Laurentiu Tudor Tested-by: Hanjun Guo Reviewed-by: Hanjun Guo Signed-off-by: Shameer Kolothum Acked-by: Robin Murphy Link: https://lore.kernel.org/r/20220615101044.1972-4-shameerali.kolothum.thodi@huawei.com Signed-off-by: Joerg Roedel --- include/linux/acpi_iort.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h index a8198b83753d..e5d2de9caf7f 100644 --- a/include/linux/acpi_iort.h +++ b/include/linux/acpi_iort.h @@ -36,7 +36,7 @@ int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id); /* IOMMU interface */ int iort_dma_get_ranges(struct device *dev, u64 *size); int iort_iommu_configure_id(struct device *dev, const u32 *id_in); -void iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head); +void iort_iommu_get_resv_regions(struct device *dev, struct list_head *head); phys_addr_t acpi_iort_dma_get_max_cpu_address(void); #else static inline void acpi_iort_init(void) { } @@ -52,7 +52,7 @@ static inline int iort_dma_get_ranges(struct device *dev, u64 *size) static inline int iort_iommu_configure_id(struct device *dev, const u32 *id_in) { return -ENODEV; } static inline -void iort_iommu_msi_get_resv_regions(struct device *dev, struct list_head *head) +void iort_iommu_get_resv_regions(struct device *dev, struct list_head *head) { } static inline phys_addr_t acpi_iort_dma_get_max_cpu_address(void) -- cgit From 491cf4a6735a957cd0733365f8d17e0f4308f5a4 Mon Sep 17 00:00:00 2001 From: Shameer Kolothum Date: Wed, 15 Jun 2022 11:10:39 +0100 Subject: ACPI/IORT: Add support to retrieve IORT RMR reserved regions Parse through the IORT RMR nodes and populate the reserve region list corresponding to a given IOMMU and device(optional). Also, go through the ID mappings of the RMR node and retrieve all the SIDs associated with it. Reviewed-by: Lorenzo Pieralisi Tested-by: Steven Price Tested-by: Laurentiu Tudor Tested-by: Hanjun Guo Reviewed-by: Hanjun Guo Signed-off-by: Shameer Kolothum Acked-by: Robin Murphy Link: https://lore.kernel.org/r/20220615101044.1972-5-shameerali.kolothum.thodi@huawei.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index b22ffa6bc4a9..e6abd998dbe7 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -146,6 +146,14 @@ struct iommu_resv_region { void (*free)(struct device *dev, struct iommu_resv_region *region); }; +struct iommu_iort_rmr_data { + struct iommu_resv_region rr; + + /* Stream IDs associated with IORT RMR entry */ + const u32 *sids; + u32 num_sids; +}; + /** * enum iommu_dev_features - Per device IOMMU features * @IOMMU_DEV_FEAT_SVA: Shared Virtual Addresses -- cgit From e302eea8f49717253ac64fd45b7cc719e87fa010 Mon Sep 17 00:00:00 2001 From: Shameer Kolothum Date: Wed, 15 Jun 2022 11:10:40 +0100 Subject: ACPI/IORT: Add a helper to retrieve RMR info directly This will provide a way for SMMU drivers to retrieve StreamIDs associated with IORT RMR nodes and use that to set bypass settings for those IDs. Tested-by: Steven Price Tested-by: Laurentiu Tudor Tested-by: Hanjun Guo Reviewed-by: Hanjun Guo Signed-off-by: Shameer Kolothum Acked-by: Robin Murphy Link: https://lore.kernel.org/r/20220615101044.1972-6-shameerali.kolothum.thodi@huawei.com Signed-off-by: Joerg Roedel --- include/linux/acpi_iort.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/acpi_iort.h b/include/linux/acpi_iort.h index e5d2de9caf7f..b43be0987b19 100644 --- a/include/linux/acpi_iort.h +++ b/include/linux/acpi_iort.h @@ -33,6 +33,10 @@ struct irq_domain *iort_get_device_domain(struct device *dev, u32 id, enum irq_domain_bus_token bus_token); void acpi_configure_pmsi_domain(struct device *dev); int iort_pmsi_get_dev_id(struct device *dev, u32 *dev_id); +void iort_get_rmr_sids(struct fwnode_handle *iommu_fwnode, + struct list_head *head); +void iort_put_rmr_sids(struct fwnode_handle *iommu_fwnode, + struct list_head *head); /* IOMMU interface */ int iort_dma_get_ranges(struct device *dev, u64 *size); int iort_iommu_configure_id(struct device *dev, const u32 *id_in); @@ -46,6 +50,10 @@ static inline struct irq_domain *iort_get_device_domain( struct device *dev, u32 id, enum irq_domain_bus_token bus_token) { return NULL; } static inline void acpi_configure_pmsi_domain(struct device *dev) { } +static inline +void iort_get_rmr_sids(struct fwnode_handle *iommu_fwnode, struct list_head *head) { } +static inline +void iort_put_rmr_sids(struct fwnode_handle *iommu_fwnode, struct list_head *head) { } /* IOMMU interface */ static inline int iort_dma_get_ranges(struct device *dev, u64 *size) { return -ENODEV; } -- cgit From 4140d77a022101376bbfa3ec3e3da5063455c60e Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Sat, 25 Jun 2022 21:34:30 +0800 Subject: iommu/vt-d: Fix RID2PASID setup/teardown failure The IOMMU driver shares the pasid table for PCI alias devices. When the RID2PASID entry of the shared pasid table has been filled by the first device, the subsequent device will encounter the "DMAR: Setup RID2PASID failed" failure as the pasid entry has already been marked as present. As the result, the IOMMU probing process will be aborted. On the contrary, when any alias device is hot-removed from the system, for example, by writing to /sys/bus/pci/devices/.../remove, the shared RID2PASID will be cleared without any notifications to other devices. As the result, any DMAs from those rest devices are blocked. Sharing pasid table among PCI alias devices could save two memory pages for devices underneath the PCIe-to-PCI bridges. Anyway, considering that those devices are rare on modern platforms that support VT-d in scalable mode and the saved memory is negligible, it's reasonable to remove this part of immature code to make the driver feasible and stable. Fixes: ef848b7e5a6a0 ("iommu/vt-d: Setup pasid entry for RID2PASID support") Reported-by: Chenyi Qiang Reported-by: Ethan Zhao Signed-off-by: Lu Baolu Reviewed-by: Kevin Tian Reviewed-by: Ethan Zhao Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220623065720.727849-1-baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/20220625133430.2200315-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/intel-iommu.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 4f29139bbfc3..5fcf89faa31a 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -612,7 +612,6 @@ struct intel_iommu { struct device_domain_info { struct list_head link; /* link to domain siblings */ struct list_head global; /* link to global list */ - struct list_head table; /* link to pasid table */ u32 segment; /* PCI segment number */ u8 bus; /* PCI bus number */ u8 devfn; /* PCI devfn number */ @@ -729,8 +728,6 @@ extern int dmar_ir_support(void); void *alloc_pgtable_page(int node); void free_pgtable_page(void *vaddr); struct intel_iommu *domain_get_iommu(struct dmar_domain *domain); -int for_each_device_domain(int (*fn)(struct device_domain_info *info, - void *data), void *data); void iommu_flush_write_buffer(struct intel_iommu *iommu); int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev); struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn); -- cgit From 88527790c079fb1ea41cbcfa4450ee37906a2fb0 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Tue, 5 Jul 2022 16:59:24 -0700 Subject: tls: rx: add sockopt for enabling optimistic decrypt with TLS 1.3 Since optimisitic decrypt may add extra load in case of retries require socket owner to explicitly opt-in. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/sockptr.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h index ea193414298b..d45902fb4cad 100644 --- a/include/linux/sockptr.h +++ b/include/linux/sockptr.h @@ -102,4 +102,12 @@ static inline long strncpy_from_sockptr(char *dst, sockptr_t src, size_t count) return strncpy_from_user(dst, src.user, count); } +static inline int check_zeroed_sockptr(sockptr_t src, size_t offset, + size_t size) +{ + if (!sockptr_is_kernel(src)) + return check_zeroed_user(src.user + offset, size); + return memchr_inv(src.kernel + offset, 0, size) == NULL; +} + #endif /* _LINUX_SOCKPTR_H */ -- cgit From 2fd26970cf66bd52dc42843c46968040caa8c9a1 Mon Sep 17 00:00:00 2001 From: Imran Khan Date: Wed, 6 Jul 2022 06:10:26 +1000 Subject: Revert "kernfs: Change kernfs_notify_list to llist." This reverts commit b8f35fa1188b84035c59d4842826c4e93a1b1c9f. This is causing regression due to same kernfs_node getting added multiple times in kernfs_notify_list so revert it until safe way of using llist in this context is found. Reported-by: Nathan Chancellor Reported-by: Michael Walle Reported-by: Marek Szyprowski Signed-off-by: Imran Khan Cc: Tejun Heo Link: https://lore.kernel.org/r/20220705201026.2487665-1-imran.f.khan@oracle.com Signed-off-by: Greg Kroah-Hartman --- include/linux/kernfs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 13e703f615f7..367044d7708c 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -173,7 +173,7 @@ struct kernfs_elem_attr { const struct kernfs_ops *ops; struct kernfs_open_node __rcu *open; loff_t size; - struct llist_node notify_next; /* for kernfs_notify() */ + struct kernfs_node *notify_next; /* for kernfs_notify() */ }; /* -- cgit From 99e48cd6855e9535488e3c90d65edd46c6e6fc1b Mon Sep 17 00:00:00 2001 From: John Garry Date: Wed, 6 Jul 2022 20:03:50 +0800 Subject: blk-mq: Add a flag for reserved requests Add a flag for reserved requests so that drivers may know this for any special handling. Signed-off-by: John Garry Reviewed-by: Christoph Hellwig Reviewed-by: Bart Van Assche Reviewed-by: Hannes Reinecke Reviewed-by: Martin K. Petersen Link: https://lore.kernel.org/r/1657109034-206040-3-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 43aad0da3305..7c62b7fabec7 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -57,6 +57,7 @@ typedef __u32 __bitwise req_flags_t; #define RQF_TIMED_OUT ((__force req_flags_t)(1 << 21)) /* queue has elevator attached */ #define RQF_ELV ((__force req_flags_t)(1 << 22)) +#define RQF_RESV ((__force req_flags_t)(1 << 23)) /* flags that prevent us from merging requests: */ #define RQF_NOMERGE_FLAGS \ @@ -825,6 +826,11 @@ static inline bool blk_mq_need_time_stamp(struct request *rq) return (rq->rq_flags & (RQF_IO_STAT | RQF_STATS | RQF_ELV)); } +static inline bool blk_mq_is_reserved_rq(struct request *rq) +{ + return rq->rq_flags & RQF_RESV; +} + /* * Batched completions only work when there is no I/O error and no special * ->end_io handler. -- cgit From 9bdb4833dd399cbff82cc20893f52bdec66a9eca Mon Sep 17 00:00:00 2001 From: John Garry Date: Wed, 6 Jul 2022 20:03:51 +0800 Subject: blk-mq: Drop blk_mq_ops.timeout 'reserved' arg With new API blk_mq_is_reserved_rq() we can tell if a request is from the reserved pool, so stop passing 'reserved' arg. There is actually only a single user of that arg for all the callback implementations, which can use blk_mq_is_reserved_rq() instead. This will also allow us to stop passing the same 'reserved' around the blk-mq iter functions next. Signed-off-by: John Garry Reviewed-by: Christoph Hellwig Reviewed-by: Bart Van Assche Reviewed-by: Hannes Reinecke Acked-by: Ulf Hansson # For MMC Reviewed-by: Martin K. Petersen Link: https://lore.kernel.org/r/1657109034-206040-4-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 7c62b7fabec7..c84c56d296fe 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -575,7 +575,7 @@ struct blk_mq_ops { /** * @timeout: Called on request timeout. */ - enum blk_eh_timer_return (*timeout)(struct request *, bool); + enum blk_eh_timer_return (*timeout)(struct request *); /** * @poll: Called to poll for completion of a specific tag. -- cgit From 2dd6532e9591f201e7571b30915db807603ab924 Mon Sep 17 00:00:00 2001 From: John Garry Date: Wed, 6 Jul 2022 20:03:53 +0800 Subject: blk-mq: Drop 'reserved' arg of busy_tag_iter_fn We no longer use the 'reserved' arg in busy_tag_iter_fn for any iter function so it may be dropped. Signed-off-by: John Garry Reviewed-by: Christoph Hellwig Reviewed-by: Hannes Reinecke Reviewed-by: Martin K. Petersen Reviewed-by: Sagi Grimberg #nvme Reviewed-by: Bart Van Assche Link: https://lore.kernel.org/r/1657109034-206040-6-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index c84c56d296fe..810a24884f7e 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -520,7 +520,7 @@ struct blk_mq_queue_data { bool last; }; -typedef bool (busy_tag_iter_fn)(struct request *, void *, bool); +typedef bool (busy_tag_iter_fn)(struct request *, void *); /** * struct blk_mq_ops - Callback functions that implements block driver -- cgit From f1a8bbd1100d9cd117bc8b7fc0903982bbaf474f Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 6 Jul 2022 09:03:35 +0200 Subject: block: remove a superflous ifdef in blkdev.h It doesn't hurt to always have the blk_zone_cond_str prototype, and the two inlines can also be defined unconditionally. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Reviewed-by: Damien Le Moal Reviewed-by: Johannes Thumshirn Link: https://lore.kernel.org/r/20220706070350.1703384-2-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index b9a94c53c6cd..270cd0c55292 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -899,8 +899,6 @@ static inline struct request_queue *bdev_get_queue(struct block_device *bdev) return bdev->bd_queue; /* this is never NULL */ } -#ifdef CONFIG_BLK_DEV_ZONED - /* Helper to convert BLK_ZONE_ZONE_XXX to its string format XXX */ const char *blk_zone_cond_str(enum blk_zone_cond zone_cond); @@ -915,7 +913,6 @@ static inline unsigned int bio_zone_is_seq(struct bio *bio) return blk_queue_zone_is_seq(bdev_get_queue(bio->bi_bdev), bio->bi_iter.bi_sector); } -#endif /* CONFIG_BLK_DEV_ZONED */ /* * Return how much of the chunk is left to be used for I/O at a given offset. -- cgit From 6b2bd274744e6454ba7bbbe6a09b44866f2f414a Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 6 Jul 2022 09:03:40 +0200 Subject: block: pass a gendisk to blk_queue_set_zoned Prepare for storing the zone related field in struct gendisk instead of struct request_queue. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Reviewed-by: Damien Le Moal Reviewed-by: Johannes Thumshirn Link: https://lore.kernel.org/r/20220706070350.1703384-7-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 270cd0c55292..416faa013782 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -291,7 +291,7 @@ struct queue_limits { typedef int (*report_zones_cb)(struct blk_zone *zone, unsigned int idx, void *data); -void blk_queue_set_zoned(struct gendisk *disk, enum blk_zoned_model model); +void disk_set_zoned(struct gendisk *disk, enum blk_zoned_model model); #ifdef CONFIG_BLK_DEV_ZONED -- cgit From 1dc0172027b0aa09823b430e395b1116d2745f36 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 6 Jul 2022 09:03:43 +0200 Subject: block: remove queue_max_open_zones and queue_max_active_zones Always use the bdev based helpers instead. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Reviewed-by: Damien Le Moal Reviewed-by: Johannes Thumshirn Link: https://lore.kernel.org/r/20220706070350.1703384-10-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 37 ++++++++++--------------------------- 1 file changed, 10 insertions(+), 27 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 416faa013782..7d4105d23b0a 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -702,21 +702,22 @@ static inline void blk_queue_max_open_zones(struct request_queue *q, q->max_open_zones = max_open_zones; } -static inline unsigned int queue_max_open_zones(const struct request_queue *q) -{ - return q->max_open_zones; -} - static inline void blk_queue_max_active_zones(struct request_queue *q, unsigned int max_active_zones) { q->max_active_zones = max_active_zones; } -static inline unsigned int queue_max_active_zones(const struct request_queue *q) +static inline unsigned int bdev_max_open_zones(struct block_device *bdev) +{ + return bdev->bd_disk->queue->max_open_zones; +} + +static inline unsigned int bdev_max_active_zones(struct block_device *bdev) { - return q->max_active_zones; + return bdev->bd_disk->queue->max_active_zones; } + #else /* CONFIG_BLK_DEV_ZONED */ static inline unsigned int blk_queue_nr_zones(struct request_queue *q) { @@ -732,11 +733,11 @@ static inline unsigned int blk_queue_zone_no(struct request_queue *q, { return 0; } -static inline unsigned int queue_max_open_zones(const struct request_queue *q) +static inline unsigned int bdev_max_open_zones(struct block_device *bdev) { return 0; } -static inline unsigned int queue_max_active_zones(const struct request_queue *q) +static inline unsigned int bdev_max_active_zones(struct block_device *bdev) { return 0; } @@ -1314,24 +1315,6 @@ static inline sector_t bdev_zone_sectors(struct block_device *bdev) return 0; } -static inline unsigned int bdev_max_open_zones(struct block_device *bdev) -{ - struct request_queue *q = bdev_get_queue(bdev); - - if (q) - return queue_max_open_zones(q); - return 0; -} - -static inline unsigned int bdev_max_active_zones(struct block_device *bdev) -{ - struct request_queue *q = bdev_get_queue(bdev); - - if (q) - return queue_max_active_zones(q); - return 0; -} - static inline int queue_dma_alignment(const struct request_queue *q) { return q ? q->dma_alignment : 511; -- cgit From 982977df48179c8c690868f398051074e68eef0f Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 6 Jul 2022 09:03:44 +0200 Subject: block: pass a gendisk to blk_queue_max_open_zones and blk_queue_max_active_zones Switch to a gendisk based API in preparation for moving all zone related fields from the request_queue to the gendisk. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Reviewed-by: Damien Le Moal Reviewed-by: Johannes Thumshirn Link: https://lore.kernel.org/r/20220706070350.1703384-11-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 7d4105d23b0a..c05e1cc05c26 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -696,16 +696,16 @@ static inline bool blk_queue_zone_is_seq(struct request_queue *q, return !test_bit(blk_queue_zone_no(q, sector), q->conv_zones_bitmap); } -static inline void blk_queue_max_open_zones(struct request_queue *q, +static inline void disk_set_max_open_zones(struct gendisk *disk, unsigned int max_open_zones) { - q->max_open_zones = max_open_zones; + disk->queue->max_open_zones = max_open_zones; } -static inline void blk_queue_max_active_zones(struct request_queue *q, +static inline void disk_set_max_active_zones(struct gendisk *disk, unsigned int max_active_zones) { - q->max_active_zones = max_active_zones; + disk->queue->max_active_zones = max_active_zones; } static inline unsigned int bdev_max_open_zones(struct block_device *bdev) -- cgit From b623e347323f6464b20fb0d899a0a73522ed8f6c Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 6 Jul 2022 09:03:45 +0200 Subject: block: replace blkdev_nr_zones with bdev_nr_zones Pass a block_device instead of a request_queue as that is what most callers have at hand. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Reviewed-by: Johannes Thumshirn Reviewed-by: Damien Le Moal Acked-by: Damien Le Moal Link: https://lore.kernel.org/r/20220706070350.1703384-12-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index c05e1cc05c26..fa2757ef4a84 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -298,7 +298,7 @@ void disk_set_zoned(struct gendisk *disk, enum blk_zoned_model model); #define BLK_ALL_ZONES ((unsigned int)-1) int blkdev_report_zones(struct block_device *bdev, sector_t sector, unsigned int nr_zones, report_zones_cb cb, void *data); -unsigned int blkdev_nr_zones(struct gendisk *disk); +unsigned int bdev_nr_zones(struct block_device *bdev); extern int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op, sector_t sectors, sector_t nr_sectors, gfp_t gfp_mask); @@ -312,7 +312,7 @@ extern int blkdev_zone_mgmt_ioctl(struct block_device *bdev, fmode_t mode, #else /* CONFIG_BLK_DEV_ZONED */ -static inline unsigned int blkdev_nr_zones(struct gendisk *disk) +static inline unsigned int bdev_nr_zones(struct block_device *bdev) { return 0; } -- cgit From de71973c2951cb2ce4b46560f021f03b15906408 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 6 Jul 2022 09:03:49 +0200 Subject: block: remove blk_queue_zone_sectors Always use bdev_zone_sectors instead. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Reviewed-by: Damien Le Moal Reviewed-by: Johannes Thumshirn Link: https://lore.kernel.org/r/20220706070350.1703384-16-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 11 +++-------- 1 file changed, 3 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index fa2757ef4a84..21b97f7115dc 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -667,11 +667,6 @@ static inline bool blk_queue_is_zoned(struct request_queue *q) } } -static inline sector_t blk_queue_zone_sectors(struct request_queue *q) -{ - return blk_queue_is_zoned(q) ? q->limits.chunk_sectors : 0; -} - #ifdef CONFIG_BLK_DEV_ZONED static inline unsigned int blk_queue_nr_zones(struct request_queue *q) { @@ -1310,9 +1305,9 @@ static inline sector_t bdev_zone_sectors(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); - if (q) - return blk_queue_zone_sectors(q); - return 0; + if (!blk_queue_is_zoned(q)) + return 0; + return q->limits.chunk_sectors; } static inline int queue_dma_alignment(const struct request_queue *q) -- cgit From d86e716aa40643e3eb8c69fab3a198146bf76dd6 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 6 Jul 2022 09:03:50 +0200 Subject: block: move zone related fields to struct gendisk Move the zone related fields that are currently stored in struct request_queue to struct gendisk as these are part of the highlevel block layer API and are only used for non-passthrough I/O that requires the gendisk. Signed-off-by: Christoph Hellwig Reviewed-by: Chaitanya Kulkarni Reviewed-by: Damien Le Moal Reviewed-by: Johannes Thumshirn Link: https://lore.kernel.org/r/20220706070350.1703384-17-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 8 ++--- include/linux/blkdev.h | 91 +++++++++++++++++++++++--------------------------- 2 files changed, 46 insertions(+), 53 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 810a24884f7e..d74f6a6b7e69 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -1129,12 +1129,12 @@ void blk_dump_rq_flags(struct request *, char *); #ifdef CONFIG_BLK_DEV_ZONED static inline unsigned int blk_rq_zone_no(struct request *rq) { - return blk_queue_zone_no(rq->q, blk_rq_pos(rq)); + return disk_zone_no(rq->q->disk, blk_rq_pos(rq)); } static inline unsigned int blk_rq_zone_is_seq(struct request *rq) { - return blk_queue_zone_is_seq(rq->q, blk_rq_pos(rq)); + return disk_zone_is_seq(rq->q->disk, blk_rq_pos(rq)); } bool blk_req_needs_zone_write_lock(struct request *rq); @@ -1156,8 +1156,8 @@ static inline void blk_req_zone_write_unlock(struct request *rq) static inline bool blk_req_zone_is_write_locked(struct request *rq) { - return rq->q->seq_zones_wlock && - test_bit(blk_rq_zone_no(rq), rq->q->seq_zones_wlock); + return rq->q->disk->seq_zones_wlock && + test_bit(blk_rq_zone_no(rq), rq->q->disk->seq_zones_wlock); } static inline bool blk_req_can_dispatch_to_zone(struct request *rq) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 21b97f7115dc..22c477fadc0f 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -164,6 +164,29 @@ struct gendisk { #ifdef CONFIG_BLK_DEV_INTEGRITY struct kobject integrity_kobj; #endif /* CONFIG_BLK_DEV_INTEGRITY */ + +#ifdef CONFIG_BLK_DEV_ZONED + /* + * Zoned block device information for request dispatch control. + * nr_zones is the total number of zones of the device. This is always + * 0 for regular block devices. conv_zones_bitmap is a bitmap of nr_zones + * bits which indicates if a zone is conventional (bit set) or + * sequential (bit clear). seq_zones_wlock is a bitmap of nr_zones + * bits which indicates if a zone is write locked, that is, if a write + * request targeting the zone was dispatched. + * + * Reads of this information must be protected with blk_queue_enter() / + * blk_queue_exit(). Modifying this information is only allowed while + * no requests are being processed. See also blk_mq_freeze_queue() and + * blk_mq_unfreeze_queue(). + */ + unsigned int nr_zones; + unsigned int max_open_zones; + unsigned int max_active_zones; + unsigned long *conv_zones_bitmap; + unsigned long *seq_zones_wlock; +#endif /* CONFIG_BLK_DEV_ZONED */ + #if IS_ENABLED(CONFIG_CDROM) struct cdrom_device_info *cdi; #endif @@ -467,31 +490,6 @@ struct request_queue { unsigned int required_elevator_features; -#ifdef CONFIG_BLK_DEV_ZONED - /* - * Zoned block device information for request dispatch control. - * nr_zones is the total number of zones of the device. This is always - * 0 for regular block devices. conv_zones_bitmap is a bitmap of nr_zones - * bits which indicates if a zone is conventional (bit set) or - * sequential (bit clear). seq_zones_wlock is a bitmap of nr_zones - * bits which indicates if a zone is write locked, that is, if a write - * request targeting the zone was dispatched. All three fields are - * initialized by the low level device driver (e.g. scsi/sd.c). - * Stacking drivers (device mappers) may or may not initialize - * these fields. - * - * Reads of this information must be protected with blk_queue_enter() / - * blk_queue_exit(). Modifying this information is only allowed while - * no requests are being processed. See also blk_mq_freeze_queue() and - * blk_mq_unfreeze_queue(). - */ - unsigned int nr_zones; - unsigned long *conv_zones_bitmap; - unsigned long *seq_zones_wlock; - unsigned int max_open_zones; - unsigned int max_active_zones; -#endif /* CONFIG_BLK_DEV_ZONED */ - int node; #ifdef CONFIG_BLK_DEV_IO_TRACE struct blk_trace __rcu *blk_trace; @@ -668,63 +666,59 @@ static inline bool blk_queue_is_zoned(struct request_queue *q) } #ifdef CONFIG_BLK_DEV_ZONED -static inline unsigned int blk_queue_nr_zones(struct request_queue *q) +static inline unsigned int disk_nr_zones(struct gendisk *disk) { - return blk_queue_is_zoned(q) ? q->nr_zones : 0; + return blk_queue_is_zoned(disk->queue) ? disk->nr_zones : 0; } -static inline unsigned int blk_queue_zone_no(struct request_queue *q, - sector_t sector) +static inline unsigned int disk_zone_no(struct gendisk *disk, sector_t sector) { - if (!blk_queue_is_zoned(q)) + if (!blk_queue_is_zoned(disk->queue)) return 0; - return sector >> ilog2(q->limits.chunk_sectors); + return sector >> ilog2(disk->queue->limits.chunk_sectors); } -static inline bool blk_queue_zone_is_seq(struct request_queue *q, - sector_t sector) +static inline bool disk_zone_is_seq(struct gendisk *disk, sector_t sector) { - if (!blk_queue_is_zoned(q)) + if (!blk_queue_is_zoned(disk->queue)) return false; - if (!q->conv_zones_bitmap) + if (!disk->conv_zones_bitmap) return true; - return !test_bit(blk_queue_zone_no(q, sector), q->conv_zones_bitmap); + return !test_bit(disk_zone_no(disk, sector), disk->conv_zones_bitmap); } static inline void disk_set_max_open_zones(struct gendisk *disk, unsigned int max_open_zones) { - disk->queue->max_open_zones = max_open_zones; + disk->max_open_zones = max_open_zones; } static inline void disk_set_max_active_zones(struct gendisk *disk, unsigned int max_active_zones) { - disk->queue->max_active_zones = max_active_zones; + disk->max_active_zones = max_active_zones; } static inline unsigned int bdev_max_open_zones(struct block_device *bdev) { - return bdev->bd_disk->queue->max_open_zones; + return bdev->bd_disk->max_open_zones; } static inline unsigned int bdev_max_active_zones(struct block_device *bdev) { - return bdev->bd_disk->queue->max_active_zones; + return bdev->bd_disk->max_active_zones; } #else /* CONFIG_BLK_DEV_ZONED */ -static inline unsigned int blk_queue_nr_zones(struct request_queue *q) +static inline unsigned int disk_nr_zones(struct gendisk *disk) { return 0; } -static inline bool blk_queue_zone_is_seq(struct request_queue *q, - sector_t sector) +static inline bool disk_zone_is_seq(struct gendisk *disk, sector_t sector) { return false; } -static inline unsigned int blk_queue_zone_no(struct request_queue *q, - sector_t sector) +static inline unsigned int disk_zone_no(struct gendisk *disk, sector_t sector) { return 0; } @@ -732,6 +726,7 @@ static inline unsigned int bdev_max_open_zones(struct block_device *bdev) { return 0; } + static inline unsigned int bdev_max_active_zones(struct block_device *bdev) { return 0; @@ -900,14 +895,12 @@ const char *blk_zone_cond_str(enum blk_zone_cond zone_cond); static inline unsigned int bio_zone_no(struct bio *bio) { - return blk_queue_zone_no(bdev_get_queue(bio->bi_bdev), - bio->bi_iter.bi_sector); + return disk_zone_no(bio->bi_bdev->bd_disk, bio->bi_iter.bi_sector); } static inline unsigned int bio_zone_is_seq(struct bio *bio) { - return blk_queue_zone_is_seq(bdev_get_queue(bio->bi_bdev), - bio->bi_iter.bi_sector); + return disk_zone_is_seq(bio->bi_bdev->bd_disk, bio->bi_iter.bi_sector); } /* -- cgit From 2d693ed436a67db06d1473c22d4caee899a4e9d2 Mon Sep 17 00:00:00 2001 From: James Clark Date: Wed, 11 May 2022 15:45:58 +0100 Subject: coresight: Add config flag to enable branch broadcast When enabled, all taken branch addresses are output, even if the branch was because of a direct branch instruction. This enables reconstruction of the program flow without having access to the memory image of the code being executed. Use bit 8 for the config option which would be the correct bit for programming ETMv3. Although branch broadcast can't be enabled on ETMv3 because it's not in the define ETM3X_SUPPORTED_OPTIONS, using the correct bit might help prevent future collisions or allow it to be enabled if needed. Signed-off-by: James Clark Reviewed-by: Mike Leach Signed-off-by: Suzuki K Poulose Link: https://lore.kernel.org/r/20220511144601.2257870-2-james.clark@arm.com --- include/linux/coresight-pmu.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/coresight-pmu.h b/include/linux/coresight-pmu.h index 4ac5c081af93..6c2fd6cc5a98 100644 --- a/include/linux/coresight-pmu.h +++ b/include/linux/coresight-pmu.h @@ -18,6 +18,7 @@ * ETMv3.5/PTM doesn't define ETMCR config bits with prefix "ETM3_" and * directly use below macros as config bits. */ +#define ETM_OPT_BRANCH_BROADCAST 8 #define ETM_OPT_CYCACC 12 #define ETM_OPT_CTXTID 14 #define ETM_OPT_CTXTID2 15 @@ -25,6 +26,7 @@ #define ETM_OPT_RETSTK 29 /* ETMv4 CONFIGR programming bits for the ETM OPTs */ +#define ETM4_CFG_BIT_BB 3 #define ETM4_CFG_BIT_CYCACC 4 #define ETM4_CFG_BIT_CTXTID 6 #define ETM4_CFG_BIT_VMID 7 -- cgit From 5c629dc9609dc43492a7bc8060cc6120875bf096 Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Wed, 6 Jul 2022 10:05:05 -0700 Subject: nvme: use struct group for generic command dwords This will allow the trace event to know the full size of the data intended to be copied and silence read overflow checks. Reported-by: John Garry Suggested-by: Christoph Hellwig Signed-off-by: Keith Busch Signed-off-by: Christoph Hellwig --- include/linux/nvme.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/nvme.h b/include/linux/nvme.h index e3934003f239..07cfc922f8e4 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -906,12 +906,14 @@ struct nvme_common_command { __le32 cdw2[2]; __le64 metadata; union nvme_data_ptr dptr; + struct_group(cdws, __le32 cdw10; __le32 cdw11; __le32 cdw12; __le32 cdw13; __le32 cdw14; __le32 cdw15; + ); }; struct nvme_rw_command { -- cgit From bfdd231374181254742c5e2faef0bef2d30c0ee4 Mon Sep 17 00:00:00 2001 From: Yunfei Wang Date: Thu, 30 Jun 2022 17:29:25 +0800 Subject: iommu/io-pgtable-arm-v7s: Add a quirk to allow pgtable PA up to 35bit Single memory zone feature will remove ZONE_DMA32 and ZONE_DMA and cause pgtable PA size larger than 32bit. Since Mediatek IOMMU hardware support at most 35bit PA in pgtable, so add a quirk to allow the PA of pgtables support up to bit35. Signed-off-by: Ning Li Signed-off-by: Yunfei Wang Reviewed-by: Robin Murphy Acked-by: Will Deacon Link: https://lore.kernel.org/r/20220630092927.24925-2-yf.wang@mediatek.com Signed-off-by: Joerg Roedel --- include/linux/io-pgtable.h | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 86af6f0a00a2..ca98aeadcc80 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -74,17 +74,22 @@ struct io_pgtable_cfg { * to support up to 35 bits PA where the bit32, bit33 and bit34 are * encoded in the bit9, bit4 and bit5 of the PTE respectively. * + * IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT: (ARM v7s format) MediaTek IOMMUs + * extend the translation table base support up to 35 bits PA, the + * encoding format is same with IO_PGTABLE_QUIRK_ARM_MTK_EXT. + * * IO_PGTABLE_QUIRK_ARM_TTBR1: (ARM LPAE format) Configure the table * for use in the upper half of a split address space. * * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability * attributes set in the TCR for a non-coherent page-table walker. */ - #define IO_PGTABLE_QUIRK_ARM_NS BIT(0) - #define IO_PGTABLE_QUIRK_NO_PERMS BIT(1) - #define IO_PGTABLE_QUIRK_ARM_MTK_EXT BIT(3) - #define IO_PGTABLE_QUIRK_ARM_TTBR1 BIT(5) - #define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6) + #define IO_PGTABLE_QUIRK_ARM_NS BIT(0) + #define IO_PGTABLE_QUIRK_NO_PERMS BIT(1) + #define IO_PGTABLE_QUIRK_ARM_MTK_EXT BIT(3) + #define IO_PGTABLE_QUIRK_ARM_MTK_TTBR_EXT BIT(4) + #define IO_PGTABLE_QUIRK_ARM_TTBR1 BIT(5) + #define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6) unsigned long quirks; unsigned long pgsize_bitmap; unsigned int ias; -- cgit From 961343d7822624d0e329ab4167c7e1d02bb53112 Mon Sep 17 00:00:00 2001 From: Samuel Holland Date: Fri, 1 Jul 2022 15:00:53 -0500 Subject: genirq: Refactor accessors to use irq_data_get_affinity_mask A couple of functions directly reference the affinity mask. Route them through irq_data_get_affinity_mask so they will pick up any refactoring done there. Signed-off-by: Samuel Holland Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20220701200056.46555-6-samuel@sholland.org --- include/linux/irq.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/irq.h b/include/linux/irq.h index 505308253d23..69ee4e2f36ce 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -879,16 +879,16 @@ static inline int irq_data_get_node(struct irq_data *d) return irq_common_data_get_node(d->common); } -static inline struct cpumask *irq_get_affinity_mask(int irq) +static inline struct cpumask *irq_data_get_affinity_mask(struct irq_data *d) { - struct irq_data *d = irq_get_irq_data(irq); - - return d ? d->common->affinity : NULL; + return d->common->affinity; } -static inline struct cpumask *irq_data_get_affinity_mask(struct irq_data *d) +static inline struct cpumask *irq_get_affinity_mask(int irq) { - return d->common->affinity; + struct irq_data *d = irq_get_irq_data(irq); + + return d ? irq_data_get_affinity_mask(d) : NULL; } #ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK @@ -910,7 +910,7 @@ static inline void irq_data_update_effective_affinity(struct irq_data *d, static inline struct cpumask *irq_data_get_effective_affinity_mask(struct irq_data *d) { - return d->common->affinity; + return irq_data_get_affinity_mask(d); } #endif -- cgit From 073352e951f60946452da358d64841066c3142ff Mon Sep 17 00:00:00 2001 From: Samuel Holland Date: Fri, 1 Jul 2022 15:00:54 -0500 Subject: genirq: Add and use an irq_data_update_affinity helper Some architectures and irqchip drivers modify the cpumask returned by irq_data_get_affinity_mask, usually by copying in to it. This is problematic for uniprocessor configurations, where the affinity mask should be constant, as it is known at compile time. Add and use a setter for the affinity mask, following the pattern of irq_data_update_effective_affinity. This allows the getter function to return a const cpumask pointer. Signed-off-by: Samuel Holland Reviewed-by: Oleksandr Tyshchenko # Xen bits Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20220701200056.46555-7-samuel@sholland.org --- include/linux/irq.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/irq.h b/include/linux/irq.h index 69ee4e2f36ce..adcfebceb777 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -884,6 +884,12 @@ static inline struct cpumask *irq_data_get_affinity_mask(struct irq_data *d) return d->common->affinity; } +static inline void irq_data_update_affinity(struct irq_data *d, + const struct cpumask *m) +{ + cpumask_copy(d->common->affinity, m); +} + static inline struct cpumask *irq_get_affinity_mask(int irq) { struct irq_data *d = irq_get_irq_data(irq); -- cgit From 4d0b8298818b623f5fa51d5c49e1a142d3618ac9 Mon Sep 17 00:00:00 2001 From: Samuel Holland Date: Fri, 1 Jul 2022 15:00:55 -0500 Subject: genirq: Return a const cpumask from irq_data_get_affinity_mask Now that the irq_data_update_affinity helper exists, enforce its use by returning a a const cpumask from irq_data_get_affinity_mask. Since the previous commit already updated places that needed to call irq_data_update_affinity, this commit updates the remaining code that either did not modify the cpumask or immediately passed the modified mask to irq_set_affinity. Signed-off-by: Samuel Holland Reviewed-by: Michael Kelley Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20220701200056.46555-8-samuel@sholland.org --- include/linux/irq.h | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/irq.h b/include/linux/irq.h index adcfebceb777..02073f7a156e 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -879,7 +879,8 @@ static inline int irq_data_get_node(struct irq_data *d) return irq_common_data_get_node(d->common); } -static inline struct cpumask *irq_data_get_affinity_mask(struct irq_data *d) +static inline +const struct cpumask *irq_data_get_affinity_mask(struct irq_data *d) { return d->common->affinity; } @@ -890,7 +891,7 @@ static inline void irq_data_update_affinity(struct irq_data *d, cpumask_copy(d->common->affinity, m); } -static inline struct cpumask *irq_get_affinity_mask(int irq) +static inline const struct cpumask *irq_get_affinity_mask(int irq) { struct irq_data *d = irq_get_irq_data(irq); @@ -899,7 +900,7 @@ static inline struct cpumask *irq_get_affinity_mask(int irq) #ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK static inline -struct cpumask *irq_data_get_effective_affinity_mask(struct irq_data *d) +const struct cpumask *irq_data_get_effective_affinity_mask(struct irq_data *d) { return d->common->effective_affinity; } @@ -914,13 +915,14 @@ static inline void irq_data_update_effective_affinity(struct irq_data *d, { } static inline -struct cpumask *irq_data_get_effective_affinity_mask(struct irq_data *d) +const struct cpumask *irq_data_get_effective_affinity_mask(struct irq_data *d) { return irq_data_get_affinity_mask(d); } #endif -static inline struct cpumask *irq_get_effective_affinity_mask(unsigned int irq) +static inline +const struct cpumask *irq_get_effective_affinity_mask(unsigned int irq) { struct irq_data *d = irq_get_irq_data(irq); -- cgit From aa0813581b8d37bdd91cd40b67ef79ffa45104b2 Mon Sep 17 00:00:00 2001 From: Samuel Holland Date: Fri, 1 Jul 2022 15:00:56 -0500 Subject: genirq: Provide an IRQ affinity mask in non-SMP configs IRQ affinity masks are not allocated in uniprocessor configurations. This requires special case non-SMP code in drivers for irqchips which have per-CPU enable or mask registers. Since IRQ affinity is always the same in a uniprocessor configuration, we can provide a correct affinity mask without allocating one per IRQ. By returning a real cpumask from irq_data_get_affinity_mask even when SMP is disabled, irqchip drivers which iterate over that mask will automatically do the right thing. Signed-off-by: Samuel Holland Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20220701200056.46555-9-samuel@sholland.org --- include/linux/irq.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/irq.h b/include/linux/irq.h index 02073f7a156e..996e22744edd 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -151,7 +151,9 @@ struct irq_common_data { #endif void *handler_data; struct msi_desc *msi_desc; +#ifdef CONFIG_SMP cpumask_var_t affinity; +#endif #ifdef CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK cpumask_var_t effective_affinity; #endif @@ -882,13 +884,19 @@ static inline int irq_data_get_node(struct irq_data *d) static inline const struct cpumask *irq_data_get_affinity_mask(struct irq_data *d) { +#ifdef CONFIG_SMP return d->common->affinity; +#else + return cpumask_of(0); +#endif } static inline void irq_data_update_affinity(struct irq_data *d, const struct cpumask *m) { +#ifdef CONFIG_SMP cpumask_copy(d->common->affinity, m); +#endif } static inline const struct cpumask *irq_get_affinity_mask(int irq) -- cgit From e64242caef18b4a5840b0e7a9bff37abd4f4f933 Mon Sep 17 00:00:00 2001 From: Helge Deller Date: Sat, 25 Jun 2022 13:00:34 +0200 Subject: fbcon: Prevent that screen size is smaller than font size We need to prevent that users configure a screen size which is smaller than the currently selected font size. Otherwise rendering chars on the screen will access memory outside the graphics memory region. This patch adds a new function fbcon_modechange_possible() which implements this check and which later may be extended with other checks if necessary. The new function is called from the FBIOPUT_VSCREENINFO ioctl handler in fbmem.c, which will return -EINVAL if userspace asked for a too small screen size. Signed-off-by: Helge Deller Reviewed-by: Geert Uytterhoeven Cc: stable@vger.kernel.org # v5.4+ --- include/linux/fbcon.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fbcon.h b/include/linux/fbcon.h index ff5596dd30f8..2382dec6d6ab 100644 --- a/include/linux/fbcon.h +++ b/include/linux/fbcon.h @@ -15,6 +15,8 @@ void fbcon_new_modelist(struct fb_info *info); void fbcon_get_requirement(struct fb_info *info, struct fb_blit_caps *caps); void fbcon_fb_blanked(struct fb_info *info, int blank); +int fbcon_modechange_possible(struct fb_info *info, + struct fb_var_screeninfo *var); void fbcon_update_vcs(struct fb_info *info, bool all); void fbcon_remap_all(struct fb_info *info); int fbcon_set_con2fb_map_ioctl(void __user *argp); @@ -33,6 +35,8 @@ static inline void fbcon_new_modelist(struct fb_info *info) {} static inline void fbcon_get_requirement(struct fb_info *info, struct fb_blit_caps *caps) {} static inline void fbcon_fb_blanked(struct fb_info *info, int blank) {} +static inline int fbcon_modechange_possible(struct fb_info *info, + struct fb_var_screeninfo *var) { return 0; } static inline void fbcon_update_vcs(struct fb_info *info, bool all) {} static inline void fbcon_remap_all(struct fb_info *info) {} static inline int fbcon_set_con2fb_map_ioctl(void __user *argp) { return 0; } -- cgit From 70c248aca9e7efa85a6664d5ab56c17c326c958f Mon Sep 17 00:00:00 2001 From: Catalin Marinas Date: Fri, 10 Jun 2022 16:21:39 +0100 Subject: mm: kasan: Skip unpoisoning of user pages Commit c275c5c6d50a ("kasan: disable freed user page poisoning with HW tags") added __GFP_SKIP_KASAN_POISON to GFP_HIGHUSER_MOVABLE. A similar argument can be made about unpoisoning, so also add __GFP_SKIP_KASAN_UNPOISON to user pages. To ensure the user page is still accessible via page_address() without a kasan fault, reset the page->flags tag. With the above changes, there is no need for the arm64 tag_clear_highpage() to reset the page->flags tag. Signed-off-by: Catalin Marinas Cc: Andrey Ryabinin Cc: Andrey Konovalov Cc: Peter Collingbourne Cc: Vincenzo Frascino Reviewed-by: Vincenzo Frascino Reviewed-by: Andrey Konovalov Link: https://lore.kernel.org/r/20220610152141.2148929-3-catalin.marinas@arm.com Signed-off-by: Will Deacon --- include/linux/gfp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 2d2ccae933c2..0ace7759acd2 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -348,7 +348,7 @@ struct vm_area_struct; #define GFP_DMA32 __GFP_DMA32 #define GFP_HIGHUSER (GFP_USER | __GFP_HIGHMEM) #define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE | \ - __GFP_SKIP_KASAN_POISON) + __GFP_SKIP_KASAN_POISON | __GFP_SKIP_KASAN_UNPOISON) #define GFP_TRANSHUGE_LIGHT ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \ __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM) #define GFP_TRANSHUGE (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM) -- cgit From 2aec377a29250b942f14d3c16d49783da3e9df11 Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Tue, 5 Jul 2022 14:00:36 -0400 Subject: dm table: remove dm_table_get_num_targets() wrapper More efficient and readable to just access table->num_targets directly. Suggested-by: Linus Torvalds Signed-off-by: Mike Snitzer --- include/linux/device-mapper.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index 47a01c7cffdf..920085dd7f3b 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -561,7 +561,6 @@ void dm_sync_table(struct mapped_device *md); * Queries */ sector_t dm_table_get_size(struct dm_table *t); -unsigned int dm_table_get_num_targets(struct dm_table *t); fmode_t dm_table_get_mode(struct dm_table *t); struct mapped_device *dm_table_get_md(struct dm_table *t); const char *dm_table_device_name(struct dm_table *t); -- cgit From 752f596371289cb3b45e99453b5c09d96ed36614 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Sun, 26 Jun 2022 10:10:48 +0100 Subject: docs: filesystems: update netfs-api.rst reference Changeset efc930fa1d84 ("docs: filesystems: caching/netfs-api.txt: convert it to ReST") renamed: Documentation/filesystems/caching/netfs-api.txt to: Documentation/filesystems/caching/netfs-api.rst. Update its cross-reference accordingly. Fixes: efc930fa1d84 ("docs: filesystems: caching/netfs-api.txt: convert it to ReST") Signed-off-by: Mauro Carvalho Chehab Link: https://lore.kernel.org/r/5f867f01d42c3e65e111167739ed1a41a26623f9.1656234456.git.mchehab@kernel.org Signed-off-by: Jonathan Corbet --- include/linux/fscache.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fscache.h b/include/linux/fscache.h index 72585c9729a2..2f82ea31d4c1 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -378,7 +378,7 @@ void fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data, * * Request that the size of an object be changed. * - * See Documentation/filesystems/caching/netfs-api.txt for a complete + * See Documentation/filesystems/caching/netfs-api.rst for a complete * description. */ static inline -- cgit From c02b872a7ca7842e4cdbbf621f77607d0a655f83 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Sun, 26 Jun 2022 10:10:56 +0100 Subject: Documentation: update watch_queue.rst references Changeset f5461124d59b ("Documentation: move watch_queue to core-api") renamed: Documentation/watch_queue.rst to: Documentation/core-api/watch_queue.rst. Update the cross-references accordingly. Fixes: f5461124d59b ("Documentation: move watch_queue to core-api") Reviewed-by: Randy Dunlap Signed-off-by: Mauro Carvalho Chehab Link: https://lore.kernel.org/r/1c220de9c58f35e815a3df9458ac2bea323c8bfb.1656234456.git.mchehab@kernel.org Signed-off-by: Jonathan Corbet --- include/linux/watch_queue.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/watch_queue.h b/include/linux/watch_queue.h index 3b9a40ae8bdb..fc6bba20273b 100644 --- a/include/linux/watch_queue.h +++ b/include/linux/watch_queue.h @@ -4,7 +4,7 @@ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved. * Written by David Howells (dhowells@redhat.com) * - * See Documentation/watch_queue.rst + * See Documentation/core-api/watch_queue.rst */ #ifndef _LINUX_WATCH_QUEUE_H -- cgit From d6a21f2d73258f2a4cd2e7806f5755ee73fddced Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Sun, 26 Jun 2022 10:11:01 +0100 Subject: objtool: update objtool.txt references Changeset a8e35fece49b ("objtool: Update documentation") renamed: tools/objtool/Documentation/stack-validation.txt to: tools/objtool/Documentation/objtool.txt. Update the cross-references accordingly. Fixes: a8e35fece49b ("objtool: Update documentation") Signed-off-by: Mauro Carvalho Chehab Link: https://lore.kernel.org/r/ec285ece6348a5be191aebe45f78d06b3319056b.1656234456.git.mchehab@kernel.org Signed-off-by: Jonathan Corbet --- include/linux/objtool.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/objtool.h b/include/linux/objtool.h index 6491fa8fba6d..6f89471005ee 100644 --- a/include/linux/objtool.h +++ b/include/linux/objtool.h @@ -62,7 +62,7 @@ struct unwind_hint { * It should only be used in special cases where you're 100% sure it won't * affect the reliability of frame pointers and kernel stack traces. * - * For more information, see tools/objtool/Documentation/stack-validation.txt. + * For more information, see tools/objtool/Documentation/objtool.txt. */ #define STACK_FRAME_NON_STANDARD(func) \ static void __used __section(".discard.func_stack_frame_non_standard") \ -- cgit From 3566ee1d776c1393393564b2514f9cd52a49c16e Mon Sep 17 00:00:00 2001 From: Michael Kawano Date: Thu, 7 Jul 2022 15:57:27 +0200 Subject: vfio/ccw: Remove UUID from s390 debug log As vfio-ccw devices are created/destroyed, the uuid of the associated mdevs that are recorded in $S390DBF/vfio_ccw_msg/sprintf get lost. This is because a pointer to the UUID is stored instead of the UUID itself, and that memory may have been repurposed if/when the logs are examined. The result is usually garbage UUID data in the logs, though there is an outside chance of an oops happening here. Simply remove the UUID from the traces, as the subchannel number will provide useful configuration information for problem determination, and is stored directly into the log instead of a pointer. As we were the only consumer of mdev_uuid(), remove that too. Cc: Kirti Wankhede Signed-off-by: Michael Kawano Fixes: 60e05d1cf0875 ("vfio-ccw: add some logging") Fixes: b7701dfbf9832 ("vfio-ccw: Register a chp_event callback for vfio-ccw") [farman: reworded commit message, added Fixes: tags] Signed-off-by: Eric Farman Reviewed-by: Jason Gunthorpe Reviewed-by: Matthew Rosato Reviewed-by: Kirti Wankhede Link: https://lore.kernel.org/r/20220707135737.720765-2-farman@linux.ibm.com Signed-off-by: Alex Williamson --- include/linux/mdev.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index bb539794f54a..47ad3b104d9e 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -65,11 +65,6 @@ struct mdev_driver { struct device_driver driver; }; -static inline const guid_t *mdev_uuid(struct mdev_device *mdev) -{ - return &mdev->uuid; -} - extern struct bus_type mdev_bus_type; int mdev_register_device(struct device *dev, struct mdev_driver *mdev_driver); -- cgit From 87686cc845c3be7dea777f1dbf2de0767007cda8 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Mon, 4 Jul 2022 16:10:39 +0530 Subject: OPP: Make dev_pm_opp_set_regulators() accept NULL terminated list Make dev_pm_opp_set_regulators() accept a NULL terminated list of names instead of making the callers keep the two parameters in sync, which creates an opportunity for bugs to get in. Suggested-by: Greg Kroah-Hartman Reviewed-by: Steven Price # panfrost Reviewed-by: Chanwoo Choi Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 6708b4ec244d..4c490865d574 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -159,9 +159,9 @@ void dev_pm_opp_put_supported_hw(struct opp_table *opp_table); int devm_pm_opp_set_supported_hw(struct device *dev, const u32 *versions, unsigned int count); struct opp_table *dev_pm_opp_set_prop_name(struct device *dev, const char *name); void dev_pm_opp_put_prop_name(struct opp_table *opp_table); -struct opp_table *dev_pm_opp_set_regulators(struct device *dev, const char * const names[], unsigned int count); +struct opp_table *dev_pm_opp_set_regulators(struct device *dev, const char * const names[]); void dev_pm_opp_put_regulators(struct opp_table *opp_table); -int devm_pm_opp_set_regulators(struct device *dev, const char * const names[], unsigned int count); +int devm_pm_opp_set_regulators(struct device *dev, const char * const names[]); struct opp_table *dev_pm_opp_set_clkname(struct device *dev, const char *name); void dev_pm_opp_put_clkname(struct opp_table *opp_table); int devm_pm_opp_set_clkname(struct device *dev, const char *name); @@ -379,7 +379,7 @@ static inline struct opp_table *dev_pm_opp_set_prop_name(struct device *dev, con static inline void dev_pm_opp_put_prop_name(struct opp_table *opp_table) {} -static inline struct opp_table *dev_pm_opp_set_regulators(struct device *dev, const char * const names[], unsigned int count) +static inline struct opp_table *dev_pm_opp_set_regulators(struct device *dev, const char * const names[]) { return ERR_PTR(-EOPNOTSUPP); } @@ -387,8 +387,7 @@ static inline struct opp_table *dev_pm_opp_set_regulators(struct device *dev, co static inline void dev_pm_opp_put_regulators(struct opp_table *opp_table) {} static inline int devm_pm_opp_set_regulators(struct device *dev, - const char * const names[], - unsigned int count) + const char * const names[]) { return -EOPNOTSUPP; } -- cgit From 11b9b663585c4f8b00846089ebbca4d1e3283e86 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Wed, 25 May 2022 15:23:16 +0530 Subject: OPP: Add dev_pm_opp_set_config() and friends The OPP core already have few configuration specific APIs and it is getting complex or messy for both the OPP core and its users. Lets introduce a new set of API which will be used for all kind of different configurations, and shall eventually be used by all the existing ones. The new API, returns a unique token instead of a pointer to the OPP table, which allows the OPP core to drop the resources selectively later on. Tested-by: Dmitry Osipenko Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 4c490865d574..a08f9481efb3 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -90,6 +90,32 @@ struct dev_pm_set_opp_data { struct device *dev; }; +/** + * struct dev_pm_opp_config - Device OPP configuration values + * @clk_names: Clk names, NULL terminated array, max 1 clock for now. + * @prop_name: Name to postfix to properties. + * @set_opp: Custom set OPP helper. + * @supported_hw: Array of hierarchy of versions to match. + * @supported_hw_count: Number of elements in the array. + * @regulator_names: Array of pointers to the names of the regulator, NULL terminated. + * @genpd_names: Null terminated array of pointers containing names of genpd to + * attach. + * @virt_devs: Pointer to return the array of virtual devices. + * + * This structure contains platform specific OPP configurations for the device. + */ +struct dev_pm_opp_config { + /* NULL terminated */ + const char * const *clk_names; + const char *prop_name; + int (*set_opp)(struct dev_pm_set_opp_data *data); + const unsigned int *supported_hw; + unsigned int supported_hw_count; + const char * const *regulator_names; + const char * const *genpd_names; + struct device ***virt_devs; +}; + #if defined(CONFIG_PM_OPP) struct opp_table *dev_pm_opp_get_opp_table(struct device *dev); @@ -154,6 +180,10 @@ int dev_pm_opp_disable(struct device *dev, unsigned long freq); int dev_pm_opp_register_notifier(struct device *dev, struct notifier_block *nb); int dev_pm_opp_unregister_notifier(struct device *dev, struct notifier_block *nb); +int dev_pm_opp_set_config(struct device *dev, struct dev_pm_opp_config *config); +int devm_pm_opp_set_config(struct device *dev, struct dev_pm_opp_config *config); +void dev_pm_opp_clear_config(int token); + struct opp_table *dev_pm_opp_set_supported_hw(struct device *dev, const u32 *versions, unsigned int count); void dev_pm_opp_put_supported_hw(struct opp_table *opp_table); int devm_pm_opp_set_supported_hw(struct device *dev, const u32 *versions, unsigned int count); @@ -418,6 +448,18 @@ static inline int devm_pm_opp_attach_genpd(struct device *dev, return -EOPNOTSUPP; } +static inline int dev_pm_opp_set_config(struct device *dev, struct dev_pm_opp_config *config) +{ + return -EOPNOTSUPP; +} + +static inline int devm_pm_opp_set_config(struct device *dev, struct dev_pm_opp_config *config) +{ + return -EOPNOTSUPP; +} + +static inline void dev_pm_opp_clear_config(int token) {} + static inline struct dev_pm_opp *dev_pm_opp_xlate_required_opp(struct opp_table *src_table, struct opp_table *dst_table, struct dev_pm_opp *src_opp) { -- cgit From b0ec09428621daee5101130c307634a390b0213b Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Mon, 4 Jul 2022 16:36:26 +0530 Subject: OPP: Migrate set-regulators API to use set-config helpers Now that we have a central API to handle all OPP table configurations, migrate the set-regulators family of helpers to use the new infrastructure. The return type and parameter to the APIs change a bit due to this, update the current users as well in the same commit in order to avoid breaking builds. Reviewed-by: Chanwoo Choi Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 44 ++++++++++++++++++++++++++++---------------- 1 file changed, 28 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index a08f9481efb3..f014bd172c99 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -189,9 +189,6 @@ void dev_pm_opp_put_supported_hw(struct opp_table *opp_table); int devm_pm_opp_set_supported_hw(struct device *dev, const u32 *versions, unsigned int count); struct opp_table *dev_pm_opp_set_prop_name(struct device *dev, const char *name); void dev_pm_opp_put_prop_name(struct opp_table *opp_table); -struct opp_table *dev_pm_opp_set_regulators(struct device *dev, const char * const names[]); -void dev_pm_opp_put_regulators(struct opp_table *opp_table); -int devm_pm_opp_set_regulators(struct device *dev, const char * const names[]); struct opp_table *dev_pm_opp_set_clkname(struct device *dev, const char *name); void dev_pm_opp_put_clkname(struct opp_table *opp_table); int devm_pm_opp_set_clkname(struct device *dev, const char *name); @@ -409,19 +406,6 @@ static inline struct opp_table *dev_pm_opp_set_prop_name(struct device *dev, con static inline void dev_pm_opp_put_prop_name(struct opp_table *opp_table) {} -static inline struct opp_table *dev_pm_opp_set_regulators(struct device *dev, const char * const names[]) -{ - return ERR_PTR(-EOPNOTSUPP); -} - -static inline void dev_pm_opp_put_regulators(struct opp_table *opp_table) {} - -static inline int devm_pm_opp_set_regulators(struct device *dev, - const char * const names[]) -{ - return -EOPNOTSUPP; -} - static inline struct opp_table *dev_pm_opp_set_clkname(struct device *dev, const char *name) { return ERR_PTR(-EOPNOTSUPP); @@ -606,4 +590,32 @@ static inline int dev_pm_opp_of_find_icc_paths(struct device *dev, struct opp_ta } #endif +/* OPP Configuration helpers */ + +/* Regulators helpers */ +static inline int dev_pm_opp_set_regulators(struct device *dev, + const char * const names[]) +{ + struct dev_pm_opp_config config = { + .regulator_names = names, + }; + + return dev_pm_opp_set_config(dev, &config); +} + +static inline void dev_pm_opp_put_regulators(int token) +{ + dev_pm_opp_clear_config(token); +} + +static inline int devm_pm_opp_set_regulators(struct device *dev, + const char * const names[]) +{ + struct dev_pm_opp_config config = { + .regulator_names = names, + }; + + return devm_pm_opp_set_config(dev, &config); +} + #endif /* __LINUX_OPP_H__ */ -- cgit From 89f03984fa2abface1ffb1fe050b7c175651ffc7 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Thu, 26 May 2022 09:36:27 +0530 Subject: OPP: Migrate set-supported-hw API to use set-config helpers Now that we have a central API to handle all OPP table configurations, migrate the set-supported-hw family of helpers to use the new infrastructure. The return type and parameter to the APIs change a bit due to this, update the current users as well in the same commit in order to avoid breaking builds. Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 49 ++++++++++++++++++++++++++++++------------------- 1 file changed, 30 insertions(+), 19 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index f014bd172c99..94d0101c254c 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -184,9 +184,6 @@ int dev_pm_opp_set_config(struct device *dev, struct dev_pm_opp_config *config); int devm_pm_opp_set_config(struct device *dev, struct dev_pm_opp_config *config); void dev_pm_opp_clear_config(int token); -struct opp_table *dev_pm_opp_set_supported_hw(struct device *dev, const u32 *versions, unsigned int count); -void dev_pm_opp_put_supported_hw(struct opp_table *opp_table); -int devm_pm_opp_set_supported_hw(struct device *dev, const u32 *versions, unsigned int count); struct opp_table *dev_pm_opp_set_prop_name(struct device *dev, const char *name); void dev_pm_opp_put_prop_name(struct opp_table *opp_table); struct opp_table *dev_pm_opp_set_clkname(struct device *dev, const char *name); @@ -369,22 +366,6 @@ static inline int dev_pm_opp_unregister_notifier(struct device *dev, struct noti return -EOPNOTSUPP; } -static inline struct opp_table *dev_pm_opp_set_supported_hw(struct device *dev, - const u32 *versions, - unsigned int count) -{ - return ERR_PTR(-EOPNOTSUPP); -} - -static inline void dev_pm_opp_put_supported_hw(struct opp_table *opp_table) {} - -static inline int devm_pm_opp_set_supported_hw(struct device *dev, - const u32 *versions, - unsigned int count) -{ - return -EOPNOTSUPP; -} - static inline struct opp_table *dev_pm_opp_register_set_opp_helper(struct device *dev, int (*set_opp)(struct dev_pm_set_opp_data *data)) { @@ -618,4 +599,34 @@ static inline int devm_pm_opp_set_regulators(struct device *dev, return devm_pm_opp_set_config(dev, &config); } +/* Supported-hw helpers */ +static inline int dev_pm_opp_set_supported_hw(struct device *dev, + const u32 *versions, + unsigned int count) +{ + struct dev_pm_opp_config config = { + .supported_hw = versions, + .supported_hw_count = count, + }; + + return dev_pm_opp_set_config(dev, &config); +} + +static inline void dev_pm_opp_put_supported_hw(int token) +{ + dev_pm_opp_clear_config(token); +} + +static inline int devm_pm_opp_set_supported_hw(struct device *dev, + const u32 *versions, + unsigned int count) +{ + struct dev_pm_opp_config config = { + .supported_hw = versions, + .supported_hw_count = count, + }; + + return devm_pm_opp_set_config(dev, &config); +} + #endif /* __LINUX_OPP_H__ */ -- cgit From 2368f57685768f9f9cd666eaa4194a359d89afb8 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Thu, 26 May 2022 09:36:27 +0530 Subject: OPP: Migrate set-clk-name API to use set-config helpers Now that we have a central API to handle all OPP table configurations, migrate the set-clk-name family of helpers to use the new infrastructure. Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 41 ++++++++++++++++++++++++++--------------- 1 file changed, 26 insertions(+), 15 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 94d0101c254c..ed1906bbe8bb 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -186,9 +186,6 @@ void dev_pm_opp_clear_config(int token); struct opp_table *dev_pm_opp_set_prop_name(struct device *dev, const char *name); void dev_pm_opp_put_prop_name(struct opp_table *opp_table); -struct opp_table *dev_pm_opp_set_clkname(struct device *dev, const char *name); -void dev_pm_opp_put_clkname(struct opp_table *opp_table); -int devm_pm_opp_set_clkname(struct device *dev, const char *name); struct opp_table *dev_pm_opp_register_set_opp_helper(struct device *dev, int (*set_opp)(struct dev_pm_set_opp_data *data)); void dev_pm_opp_unregister_set_opp_helper(struct opp_table *opp_table); int devm_pm_opp_register_set_opp_helper(struct device *dev, int (*set_opp)(struct dev_pm_set_opp_data *data)); @@ -387,18 +384,6 @@ static inline struct opp_table *dev_pm_opp_set_prop_name(struct device *dev, con static inline void dev_pm_opp_put_prop_name(struct opp_table *opp_table) {} -static inline struct opp_table *dev_pm_opp_set_clkname(struct device *dev, const char *name) -{ - return ERR_PTR(-EOPNOTSUPP); -} - -static inline void dev_pm_opp_put_clkname(struct opp_table *opp_table) {} - -static inline int devm_pm_opp_set_clkname(struct device *dev, const char *name) -{ - return -EOPNOTSUPP; -} - static inline struct opp_table *dev_pm_opp_attach_genpd(struct device *dev, const char * const *names, struct device ***virt_devs) { return ERR_PTR(-EOPNOTSUPP); @@ -629,4 +614,30 @@ static inline int devm_pm_opp_set_supported_hw(struct device *dev, return devm_pm_opp_set_config(dev, &config); } +/* clkname helpers */ +static inline int dev_pm_opp_set_clkname(struct device *dev, const char *name) +{ + const char *names[] = { name, NULL }; + struct dev_pm_opp_config config = { + .clk_names = names, + }; + + return dev_pm_opp_set_config(dev, &config); +} + +static inline void dev_pm_opp_put_clkname(int token) +{ + dev_pm_opp_clear_config(token); +} + +static inline int devm_pm_opp_set_clkname(struct device *dev, const char *name) +{ + const char *names[] = { name, NULL }; + struct dev_pm_opp_config config = { + .clk_names = names, + }; + + return devm_pm_opp_set_config(dev, &config); +} + #endif /* __LINUX_OPP_H__ */ -- cgit From 3c543b42a6df35feb3db6689e512fa167739451e Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Thu, 26 May 2022 09:36:27 +0530 Subject: OPP: Migrate set-opp-helper API to use set-config helpers Now that we have a central API to handle all OPP table configurations, migrate the set-opp-helper family of helpers to use the new infrastructure. The return type and parameter to the APIs change a bit due to this, update the current users as well in the same commit in order to avoid breaking builds. Remove devm_pm_opp_register_set_opp_helper() as it has no users currently. Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 33 ++++++++++++++++----------------- 1 file changed, 16 insertions(+), 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index ed1906bbe8bb..85a4b7353979 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -186,9 +186,6 @@ void dev_pm_opp_clear_config(int token); struct opp_table *dev_pm_opp_set_prop_name(struct device *dev, const char *name); void dev_pm_opp_put_prop_name(struct opp_table *opp_table); -struct opp_table *dev_pm_opp_register_set_opp_helper(struct device *dev, int (*set_opp)(struct dev_pm_set_opp_data *data)); -void dev_pm_opp_unregister_set_opp_helper(struct opp_table *opp_table); -int devm_pm_opp_register_set_opp_helper(struct device *dev, int (*set_opp)(struct dev_pm_set_opp_data *data)); struct opp_table *dev_pm_opp_attach_genpd(struct device *dev, const char * const *names, struct device ***virt_devs); void dev_pm_opp_detach_genpd(struct opp_table *opp_table); int devm_pm_opp_attach_genpd(struct device *dev, const char * const *names, struct device ***virt_devs); @@ -363,20 +360,6 @@ static inline int dev_pm_opp_unregister_notifier(struct device *dev, struct noti return -EOPNOTSUPP; } -static inline struct opp_table *dev_pm_opp_register_set_opp_helper(struct device *dev, - int (*set_opp)(struct dev_pm_set_opp_data *data)) -{ - return ERR_PTR(-EOPNOTSUPP); -} - -static inline void dev_pm_opp_unregister_set_opp_helper(struct opp_table *opp_table) {} - -static inline int devm_pm_opp_register_set_opp_helper(struct device *dev, - int (*set_opp)(struct dev_pm_set_opp_data *data)) -{ - return -EOPNOTSUPP; -} - static inline struct opp_table *dev_pm_opp_set_prop_name(struct device *dev, const char *name) { return ERR_PTR(-EOPNOTSUPP); @@ -640,4 +623,20 @@ static inline int devm_pm_opp_set_clkname(struct device *dev, const char *name) return devm_pm_opp_set_config(dev, &config); } +/* set-opp helpers */ +static inline int dev_pm_opp_register_set_opp_helper(struct device *dev, + int (*set_opp)(struct dev_pm_set_opp_data *data)) +{ + struct dev_pm_opp_config config = { + .set_opp = set_opp, + }; + + return dev_pm_opp_set_config(dev, &config); +} + +static inline void dev_pm_opp_unregister_set_opp_helper(int token) +{ + dev_pm_opp_clear_config(token); +} + #endif /* __LINUX_OPP_H__ */ -- cgit From 442e7a1786e628b38175314ec1805966b8d0c02c Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Thu, 26 May 2022 09:36:27 +0530 Subject: OPP: Migrate attach-genpd API to use set-config helpers Now that we have a central API to handle all OPP table configurations, migrate the attach-genpd family of helpers to use the new infrastructure. Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 48 +++++++++++++++++++++++++++++++----------------- 1 file changed, 31 insertions(+), 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 85a4b7353979..20e1e5060a8a 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -186,9 +186,6 @@ void dev_pm_opp_clear_config(int token); struct opp_table *dev_pm_opp_set_prop_name(struct device *dev, const char *name); void dev_pm_opp_put_prop_name(struct opp_table *opp_table); -struct opp_table *dev_pm_opp_attach_genpd(struct device *dev, const char * const *names, struct device ***virt_devs); -void dev_pm_opp_detach_genpd(struct opp_table *opp_table); -int devm_pm_opp_attach_genpd(struct device *dev, const char * const *names, struct device ***virt_devs); struct dev_pm_opp *dev_pm_opp_xlate_required_opp(struct opp_table *src_table, struct opp_table *dst_table, struct dev_pm_opp *src_opp); int dev_pm_opp_xlate_performance_state(struct opp_table *src_table, struct opp_table *dst_table, unsigned int pstate); int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq); @@ -367,20 +364,6 @@ static inline struct opp_table *dev_pm_opp_set_prop_name(struct device *dev, con static inline void dev_pm_opp_put_prop_name(struct opp_table *opp_table) {} -static inline struct opp_table *dev_pm_opp_attach_genpd(struct device *dev, const char * const *names, struct device ***virt_devs) -{ - return ERR_PTR(-EOPNOTSUPP); -} - -static inline void dev_pm_opp_detach_genpd(struct opp_table *opp_table) {} - -static inline int devm_pm_opp_attach_genpd(struct device *dev, - const char * const *names, - struct device ***virt_devs) -{ - return -EOPNOTSUPP; -} - static inline int dev_pm_opp_set_config(struct device *dev, struct dev_pm_opp_config *config) { return -EOPNOTSUPP; @@ -639,4 +622,35 @@ static inline void dev_pm_opp_unregister_set_opp_helper(int token) dev_pm_opp_clear_config(token); } +/* genpd helpers */ +static inline int dev_pm_opp_attach_genpd(struct device *dev, + const char * const *names, + struct device ***virt_devs) +{ + struct dev_pm_opp_config config = { + .genpd_names = names, + .virt_devs = virt_devs, + }; + + return dev_pm_opp_set_config(dev, &config); +} + +static inline void dev_pm_opp_detach_genpd(int token) +{ + dev_pm_opp_clear_config(token); +} + +static inline int devm_pm_opp_attach_genpd(struct device *dev, + const char * const *names, + struct device ***virt_devs) +{ + struct dev_pm_opp_config config = { + .genpd_names = names, + .virt_devs = virt_devs, + }; + + return devm_pm_opp_set_config(dev, &config); +} + + #endif /* __LINUX_OPP_H__ */ -- cgit From 298098e55a6fcc176a5af52cd689f33577ead5ca Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Thu, 26 May 2022 09:36:27 +0530 Subject: OPP: Migrate set-prop-name helper API to use set-config helpers Now that we have a central API to handle all OPP table configurations, migrate the set-prop-name family of helpers to use the new infrastructure. The return type and parameter to the APIs change a bit due to this, update the current users as well in the same commit in order to avoid breaking builds. Acked-by: Samuel Holland # sun50i Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 20e1e5060a8a..f995ca1406e8 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -184,8 +184,6 @@ int dev_pm_opp_set_config(struct device *dev, struct dev_pm_opp_config *config); int devm_pm_opp_set_config(struct device *dev, struct dev_pm_opp_config *config); void dev_pm_opp_clear_config(int token); -struct opp_table *dev_pm_opp_set_prop_name(struct device *dev, const char *name); -void dev_pm_opp_put_prop_name(struct opp_table *opp_table); struct dev_pm_opp *dev_pm_opp_xlate_required_opp(struct opp_table *src_table, struct opp_table *dst_table, struct dev_pm_opp *src_opp); int dev_pm_opp_xlate_performance_state(struct opp_table *src_table, struct opp_table *dst_table, unsigned int pstate); int dev_pm_opp_set_rate(struct device *dev, unsigned long target_freq); @@ -357,13 +355,6 @@ static inline int dev_pm_opp_unregister_notifier(struct device *dev, struct noti return -EOPNOTSUPP; } -static inline struct opp_table *dev_pm_opp_set_prop_name(struct device *dev, const char *name) -{ - return ERR_PTR(-EOPNOTSUPP); -} - -static inline void dev_pm_opp_put_prop_name(struct opp_table *opp_table) {} - static inline int dev_pm_opp_set_config(struct device *dev, struct dev_pm_opp_config *config) { return -EOPNOTSUPP; @@ -652,5 +643,19 @@ static inline int devm_pm_opp_attach_genpd(struct device *dev, return devm_pm_opp_set_config(dev, &config); } +/* prop-name helpers */ +static inline int dev_pm_opp_set_prop_name(struct device *dev, const char *name) +{ + struct dev_pm_opp_config config = { + .prop_name = name, + }; + + return dev_pm_opp_set_config(dev, &config); +} + +static inline void dev_pm_opp_put_prop_name(int token) +{ + dev_pm_opp_clear_config(token); +} #endif /* __LINUX_OPP_H__ */ -- cgit From aee3352f6ecf8cfad1f1ee5838cfc4d37c6b8f75 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Mon, 4 Jul 2022 13:45:08 +0530 Subject: OPP: Add support for config_regulators() helper Extend the dev_pm_opp_set_config() interface to allow adding config_regulators() helpers. This helper will be called to set the voltages of the regulators from the regular path in _set_opp(), while we are trying to change the OPP. This will eventually replace the custom set_opp() helper. Tested-by: Dmitry Osipenko Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index f995ca1406e8..9f2f9a792a19 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -90,11 +90,16 @@ struct dev_pm_set_opp_data { struct device *dev; }; +typedef int (*config_regulators_t)(struct device *dev, + struct dev_pm_opp *old_opp, struct dev_pm_opp *new_opp, + struct regulator **regulators, unsigned int count); + /** * struct dev_pm_opp_config - Device OPP configuration values * @clk_names: Clk names, NULL terminated array, max 1 clock for now. * @prop_name: Name to postfix to properties. * @set_opp: Custom set OPP helper. + * @config_regulators: Custom set regulator helper. * @supported_hw: Array of hierarchy of versions to match. * @supported_hw_count: Number of elements in the array. * @regulator_names: Array of pointers to the names of the regulator, NULL terminated. @@ -109,6 +114,7 @@ struct dev_pm_opp_config { const char * const *clk_names; const char *prop_name; int (*set_opp)(struct dev_pm_set_opp_data *data); + config_regulators_t config_regulators; const unsigned int *supported_hw; unsigned int supported_hw_count; const char * const *regulator_names; @@ -613,6 +619,22 @@ static inline void dev_pm_opp_unregister_set_opp_helper(int token) dev_pm_opp_clear_config(token); } +/* config-regulators helpers */ +static inline int dev_pm_opp_set_config_regulators(struct device *dev, + config_regulators_t helper) +{ + struct dev_pm_opp_config config = { + .config_regulators = helper, + }; + + return dev_pm_opp_set_config(dev, &config); +} + +static inline void dev_pm_opp_put_config_regulators(int token) +{ + dev_pm_opp_clear_config(token); +} + /* genpd helpers */ static inline int dev_pm_opp_attach_genpd(struct device *dev, const char * const *names, -- cgit From 69b1af178a3a534da2f20fa0c7356fcbbe8cce1f Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Tue, 31 May 2022 13:24:21 +0530 Subject: OPP: Add dev_pm_opp_get_supplies() We already have an API for getting voltage information for a single regulator, dev_pm_opp_get_voltage(), but there is nothing available for multiple regulator case. This patch adds a new API, dev_pm_opp_get_supplies(), to get all information related to the supplies for an OPP. Tested-by: Dmitry Osipenko Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 9f2f9a792a19..1e2b33d79ba6 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -129,6 +129,8 @@ void dev_pm_opp_put_opp_table(struct opp_table *opp_table); unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp); +int dev_pm_opp_get_supplies(struct dev_pm_opp *opp, struct dev_pm_opp_supply *supplies); + unsigned long dev_pm_opp_get_power(struct dev_pm_opp *opp); unsigned long dev_pm_opp_get_freq(struct dev_pm_opp *opp); @@ -217,6 +219,11 @@ static inline unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp) return 0; } +static inline int dev_pm_opp_get_supplies(struct dev_pm_opp *opp, struct dev_pm_opp_supply *supplies) +{ + return -EOPNOTSUPP; +} + static inline unsigned long dev_pm_opp_get_power(struct dev_pm_opp *opp) { return 0; -- cgit From 1f378c6ead5c69decf36e8e61de2e50377c76ab8 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Mon, 4 Jul 2022 14:57:06 +0530 Subject: OPP: Remove custom OPP helper support The only user of the custom helper is migrated to use dev_pm_opp_set_config_regulators() interface. Remove the now unused custom OPP helper support. This cleans up _set_opp() and leaves a single code path to be used by all users. Tested-by: Dmitry Osipenko Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 51 -------------------------------------------------- 1 file changed, 51 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 1e2b33d79ba6..9d59aedc2be3 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -57,39 +57,6 @@ struct dev_pm_opp_icc_bw { u32 peak; }; -/** - * struct dev_pm_opp_info - OPP freq/voltage/current values - * @rate: Target clk rate in hz - * @supplies: Array of voltage/current values for all power supplies - * - * This structure stores the freq/voltage/current values for a single OPP. - */ -struct dev_pm_opp_info { - unsigned long rate; - struct dev_pm_opp_supply *supplies; -}; - -/** - * struct dev_pm_set_opp_data - Set OPP data - * @old_opp: Old OPP info - * @new_opp: New OPP info - * @regulators: Array of regulator pointers - * @regulator_count: Number of regulators - * @clk: Pointer to clk - * @dev: Pointer to the struct device - * - * This structure contains all information required for setting an OPP. - */ -struct dev_pm_set_opp_data { - struct dev_pm_opp_info old_opp; - struct dev_pm_opp_info new_opp; - - struct regulator **regulators; - unsigned int regulator_count; - struct clk *clk; - struct device *dev; -}; - typedef int (*config_regulators_t)(struct device *dev, struct dev_pm_opp *old_opp, struct dev_pm_opp *new_opp, struct regulator **regulators, unsigned int count); @@ -98,7 +65,6 @@ typedef int (*config_regulators_t)(struct device *dev, * struct dev_pm_opp_config - Device OPP configuration values * @clk_names: Clk names, NULL terminated array, max 1 clock for now. * @prop_name: Name to postfix to properties. - * @set_opp: Custom set OPP helper. * @config_regulators: Custom set regulator helper. * @supported_hw: Array of hierarchy of versions to match. * @supported_hw_count: Number of elements in the array. @@ -113,7 +79,6 @@ struct dev_pm_opp_config { /* NULL terminated */ const char * const *clk_names; const char *prop_name; - int (*set_opp)(struct dev_pm_set_opp_data *data); config_regulators_t config_regulators; const unsigned int *supported_hw; unsigned int supported_hw_count; @@ -610,22 +575,6 @@ static inline int devm_pm_opp_set_clkname(struct device *dev, const char *name) return devm_pm_opp_set_config(dev, &config); } -/* set-opp helpers */ -static inline int dev_pm_opp_register_set_opp_helper(struct device *dev, - int (*set_opp)(struct dev_pm_set_opp_data *data)) -{ - struct dev_pm_opp_config config = { - .set_opp = set_opp, - }; - - return dev_pm_opp_set_config(dev, &config); -} - -static inline void dev_pm_opp_unregister_set_opp_helper(int token) -{ - dev_pm_opp_clear_config(token); -} - /* config-regulators helpers */ static inline int dev_pm_opp_set_config_regulators(struct device *dev, config_regulators_t helper) -- cgit From 9fbb62605607d8f00c32a4765cd1ae67f022b640 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Thu, 2 Jun 2022 14:05:59 +0530 Subject: OPP: Remove dev_pm_opp_find_freq_ceil_by_volt() This was added few years back, but the code that was supposed to use it never got merged. Remove the unused helper. Tested-by: Dmitry Osipenko Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 9d59aedc2be3..50cbc75bef71 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -118,8 +118,6 @@ struct dev_pm_opp *dev_pm_opp_find_freq_exact(struct device *dev, bool available); struct dev_pm_opp *dev_pm_opp_find_freq_floor(struct device *dev, unsigned long *freq); -struct dev_pm_opp *dev_pm_opp_find_freq_ceil_by_volt(struct device *dev, - unsigned long u_volt); struct dev_pm_opp *dev_pm_opp_find_level_exact(struct device *dev, unsigned int level); @@ -265,12 +263,6 @@ static inline struct dev_pm_opp *dev_pm_opp_find_freq_floor(struct device *dev, return ERR_PTR(-EOPNOTSUPP); } -static inline struct dev_pm_opp *dev_pm_opp_find_freq_ceil_by_volt(struct device *dev, - unsigned long u_volt) -{ - return ERR_PTR(-EOPNOTSUPP); -} - static inline struct dev_pm_opp *dev_pm_opp_find_freq_ceil(struct device *dev, unsigned long *freq) { -- cgit From 7963d4d710112bc457f99bdb56608211e561190e Mon Sep 17 00:00:00 2001 From: Xin Ji Date: Wed, 6 Jul 2022 16:34:31 +0800 Subject: usb: typec: tcpci: move tcpci.h to include/linux/usb/ USB PD controllers which consisting of a microcontroller (acting as the TCPM) and a port controller (TCPC) - may require that the driver for the PD controller accesses directly also the on-chip port controller in some cases. Move tcpci.h to include/linux/usb/ is convenience access TCPC registers. Reviewed-by: Heikki Krogerus Signed-off-by: Xin Ji Link: https://lore.kernel.org/r/20220706083433.2415524-1-xji@analogixsemi.com Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/tcpci.h | 210 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 210 insertions(+) create mode 100644 include/linux/usb/tcpci.h (limited to 'include/linux') diff --git a/include/linux/usb/tcpci.h b/include/linux/usb/tcpci.h new file mode 100644 index 000000000000..20c0bedb8ec8 --- /dev/null +++ b/include/linux/usb/tcpci.h @@ -0,0 +1,210 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +/* + * Copyright 2015-2017 Google, Inc + * + * USB Type-C Port Controller Interface. + */ + +#ifndef __LINUX_USB_TCPCI_H +#define __LINUX_USB_TCPCI_H + +#include +#include + +#define TCPC_VENDOR_ID 0x0 +#define TCPC_PRODUCT_ID 0x2 +#define TCPC_BCD_DEV 0x4 +#define TCPC_TC_REV 0x6 +#define TCPC_PD_REV 0x8 +#define TCPC_PD_INT_REV 0xa + +#define TCPC_ALERT 0x10 +#define TCPC_ALERT_EXTND BIT(14) +#define TCPC_ALERT_EXTENDED_STATUS BIT(13) +#define TCPC_ALERT_VBUS_DISCNCT BIT(11) +#define TCPC_ALERT_RX_BUF_OVF BIT(10) +#define TCPC_ALERT_FAULT BIT(9) +#define TCPC_ALERT_V_ALARM_LO BIT(8) +#define TCPC_ALERT_V_ALARM_HI BIT(7) +#define TCPC_ALERT_TX_SUCCESS BIT(6) +#define TCPC_ALERT_TX_DISCARDED BIT(5) +#define TCPC_ALERT_TX_FAILED BIT(4) +#define TCPC_ALERT_RX_HARD_RST BIT(3) +#define TCPC_ALERT_RX_STATUS BIT(2) +#define TCPC_ALERT_POWER_STATUS BIT(1) +#define TCPC_ALERT_CC_STATUS BIT(0) + +#define TCPC_ALERT_MASK 0x12 +#define TCPC_POWER_STATUS_MASK 0x14 +#define TCPC_FAULT_STATUS_MASK 0x15 + +#define TCPC_EXTENDED_STATUS_MASK 0x16 +#define TCPC_EXTENDED_STATUS_MASK_VSAFE0V BIT(0) + +#define TCPC_ALERT_EXTENDED_MASK 0x17 +#define TCPC_SINK_FAST_ROLE_SWAP BIT(0) + +#define TCPC_CONFIG_STD_OUTPUT 0x18 + +#define TCPC_TCPC_CTRL 0x19 +#define TCPC_TCPC_CTRL_ORIENTATION BIT(0) +#define PLUG_ORNT_CC1 0 +#define PLUG_ORNT_CC2 1 +#define TCPC_TCPC_CTRL_BIST_TM BIT(1) +#define TCPC_TCPC_CTRL_EN_LK4CONN_ALRT BIT(6) + +#define TCPC_EXTENDED_STATUS 0x20 +#define TCPC_EXTENDED_STATUS_VSAFE0V BIT(0) + +#define TCPC_ROLE_CTRL 0x1a +#define TCPC_ROLE_CTRL_DRP BIT(6) +#define TCPC_ROLE_CTRL_RP_VAL_SHIFT 4 +#define TCPC_ROLE_CTRL_RP_VAL_MASK 0x3 +#define TCPC_ROLE_CTRL_RP_VAL_DEF 0x0 +#define TCPC_ROLE_CTRL_RP_VAL_1_5 0x1 +#define TCPC_ROLE_CTRL_RP_VAL_3_0 0x2 +#define TCPC_ROLE_CTRL_CC2_SHIFT 2 +#define TCPC_ROLE_CTRL_CC2_MASK 0x3 +#define TCPC_ROLE_CTRL_CC1_SHIFT 0 +#define TCPC_ROLE_CTRL_CC1_MASK 0x3 +#define TCPC_ROLE_CTRL_CC_RA 0x0 +#define TCPC_ROLE_CTRL_CC_RP 0x1 +#define TCPC_ROLE_CTRL_CC_RD 0x2 +#define TCPC_ROLE_CTRL_CC_OPEN 0x3 + +#define TCPC_FAULT_CTRL 0x1b + +#define TCPC_POWER_CTRL 0x1c +#define TCPC_POWER_CTRL_VCONN_ENABLE BIT(0) +#define TCPC_POWER_CTRL_BLEED_DISCHARGE BIT(3) +#define TCPC_POWER_CTRL_AUTO_DISCHARGE BIT(4) +#define TCPC_DIS_VOLT_ALRM BIT(5) +#define TCPC_POWER_CTRL_VBUS_VOLT_MON BIT(6) +#define TCPC_FAST_ROLE_SWAP_EN BIT(7) + +#define TCPC_CC_STATUS 0x1d +#define TCPC_CC_STATUS_TOGGLING BIT(5) +#define TCPC_CC_STATUS_TERM BIT(4) +#define TCPC_CC_STATUS_TERM_RP 0 +#define TCPC_CC_STATUS_TERM_RD 1 +#define TCPC_CC_STATE_SRC_OPEN 0 +#define TCPC_CC_STATUS_CC2_SHIFT 2 +#define TCPC_CC_STATUS_CC2_MASK 0x3 +#define TCPC_CC_STATUS_CC1_SHIFT 0 +#define TCPC_CC_STATUS_CC1_MASK 0x3 + +#define TCPC_POWER_STATUS 0x1e +#define TCPC_POWER_STATUS_DBG_ACC_CON BIT(7) +#define TCPC_POWER_STATUS_UNINIT BIT(6) +#define TCPC_POWER_STATUS_SOURCING_VBUS BIT(4) +#define TCPC_POWER_STATUS_VBUS_DET BIT(3) +#define TCPC_POWER_STATUS_VBUS_PRES BIT(2) +#define TCPC_POWER_STATUS_VCONN_PRES BIT(1) +#define TCPC_POWER_STATUS_SINKING_VBUS BIT(0) + +#define TCPC_FAULT_STATUS 0x1f + +#define TCPC_ALERT_EXTENDED 0x21 + +#define TCPC_COMMAND 0x23 +#define TCPC_CMD_WAKE_I2C 0x11 +#define TCPC_CMD_DISABLE_VBUS_DETECT 0x22 +#define TCPC_CMD_ENABLE_VBUS_DETECT 0x33 +#define TCPC_CMD_DISABLE_SINK_VBUS 0x44 +#define TCPC_CMD_SINK_VBUS 0x55 +#define TCPC_CMD_DISABLE_SRC_VBUS 0x66 +#define TCPC_CMD_SRC_VBUS_DEFAULT 0x77 +#define TCPC_CMD_SRC_VBUS_HIGH 0x88 +#define TCPC_CMD_LOOK4CONNECTION 0x99 +#define TCPC_CMD_RXONEMORE 0xAA +#define TCPC_CMD_I2C_IDLE 0xFF + +#define TCPC_DEV_CAP_1 0x24 +#define TCPC_DEV_CAP_2 0x26 +#define TCPC_STD_INPUT_CAP 0x28 +#define TCPC_STD_OUTPUT_CAP 0x29 + +#define TCPC_MSG_HDR_INFO 0x2e +#define TCPC_MSG_HDR_INFO_DATA_ROLE BIT(3) +#define TCPC_MSG_HDR_INFO_PWR_ROLE BIT(0) +#define TCPC_MSG_HDR_INFO_REV_SHIFT 1 +#define TCPC_MSG_HDR_INFO_REV_MASK 0x3 + +#define TCPC_RX_DETECT 0x2f +#define TCPC_RX_DETECT_HARD_RESET BIT(5) +#define TCPC_RX_DETECT_SOP BIT(0) +#define TCPC_RX_DETECT_SOP1 BIT(1) +#define TCPC_RX_DETECT_SOP2 BIT(2) +#define TCPC_RX_DETECT_DBG1 BIT(3) +#define TCPC_RX_DETECT_DBG2 BIT(4) + +#define TCPC_RX_BYTE_CNT 0x30 +#define TCPC_RX_BUF_FRAME_TYPE 0x31 +#define TCPC_RX_BUF_FRAME_TYPE_SOP 0 +#define TCPC_RX_HDR 0x32 +#define TCPC_RX_DATA 0x34 /* through 0x4f */ + +#define TCPC_TRANSMIT 0x50 +#define TCPC_TRANSMIT_RETRY_SHIFT 4 +#define TCPC_TRANSMIT_RETRY_MASK 0x3 +#define TCPC_TRANSMIT_TYPE_SHIFT 0 +#define TCPC_TRANSMIT_TYPE_MASK 0x7 + +#define TCPC_TX_BYTE_CNT 0x51 +#define TCPC_TX_HDR 0x52 +#define TCPC_TX_DATA 0x54 /* through 0x6f */ + +#define TCPC_VBUS_VOLTAGE 0x70 +#define TCPC_VBUS_VOLTAGE_MASK 0x3ff +#define TCPC_VBUS_VOLTAGE_LSB_MV 25 +#define TCPC_VBUS_SINK_DISCONNECT_THRESH 0x72 +#define TCPC_VBUS_SINK_DISCONNECT_THRESH_LSB_MV 25 +#define TCPC_VBUS_SINK_DISCONNECT_THRESH_MAX 0x3ff +#define TCPC_VBUS_STOP_DISCHARGE_THRESH 0x74 +#define TCPC_VBUS_VOLTAGE_ALARM_HI_CFG 0x76 +#define TCPC_VBUS_VOLTAGE_ALARM_LO_CFG 0x78 + +/* I2C_WRITE_BYTE_COUNT + 1 when TX_BUF_BYTE_x is only accessible I2C_WRITE_BYTE_COUNT */ +#define TCPC_TRANSMIT_BUFFER_MAX_LEN 31 + +struct tcpci; + +/* + * @TX_BUF_BYTE_x_hidden: + * optional; Set when TX_BUF_BYTE_x can only be accessed through I2C_WRITE_BYTE_COUNT. + * @frs_sourcing_vbus: + * Optional; Callback to perform chip specific operations when FRS + * is sourcing vbus. + * @auto_discharge_disconnect: + * Optional; Enables TCPC to autonously discharge vbus on disconnect. + * @vbus_vsafe0v: + * optional; Set when TCPC can detect whether vbus is at VSAFE0V. + * @set_partner_usb_comm_capable: + * Optional; The USB Communications Capable bit indicates if port + * partner is capable of communication over the USB data lines + * (e.g. D+/- or SS Tx/Rx). Called to notify the status of the bit. + */ +struct tcpci_data { + struct regmap *regmap; + unsigned char TX_BUF_BYTE_x_hidden:1; + unsigned char auto_discharge_disconnect:1; + unsigned char vbus_vsafe0v:1; + + int (*init)(struct tcpci *tcpci, struct tcpci_data *data); + int (*set_vconn)(struct tcpci *tcpci, struct tcpci_data *data, + bool enable); + int (*start_drp_toggling)(struct tcpci *tcpci, struct tcpci_data *data, + enum typec_cc_status cc); + int (*set_vbus)(struct tcpci *tcpci, struct tcpci_data *data, bool source, bool sink); + void (*frs_sourcing_vbus)(struct tcpci *tcpci, struct tcpci_data *data); + void (*set_partner_usb_comm_capable)(struct tcpci *tcpci, struct tcpci_data *data, + bool capable); +}; + +struct tcpci *tcpci_register_port(struct device *dev, struct tcpci_data *data); +void tcpci_unregister_port(struct tcpci *tcpci); +irqreturn_t tcpci_irq(struct tcpci *tcpci); + +struct tcpm_port; +struct tcpm_port *tcpci_get_tcpm_port(struct tcpci *tcpci); +#endif /* __LINUX_USB_TCPCI_H */ -- cgit From 620e8e8ba6210003fa45081b63d90ef4f3a0637f Mon Sep 17 00:00:00 2001 From: Matthias Kaehlcke Date: Thu, 30 Jun 2022 12:35:27 -0700 Subject: of/platform: Add stubs for of_platform_device_create/destroy() Code for platform_device_create() and of_platform_device_destroy() is only generated if CONFIG_OF_ADDRESS=y. Add stubs to avoid unresolved symbols when CONFIG_OF_ADDRESS is not set. Reviewed-by: Stephen Boyd Reviewed-by: Douglas Anderson Acked-by: Rob Herring Signed-off-by: Matthias Kaehlcke Link: https://lore.kernel.org/r/20220630123445.v24.1.I08fd2e1c775af04f663730e9fb4d00e6bbb38541@changeid Signed-off-by: Greg Kroah-Hartman --- include/linux/of_platform.h | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/of_platform.h b/include/linux/of_platform.h index 84a966623e78..d15b6cd5e1c3 100644 --- a/include/linux/of_platform.h +++ b/include/linux/of_platform.h @@ -61,16 +61,18 @@ static inline struct platform_device *of_find_device_by_node(struct device_node } #endif +extern int of_platform_bus_probe(struct device_node *root, + const struct of_device_id *matches, + struct device *parent); + +#ifdef CONFIG_OF_ADDRESS /* Platform devices and busses creation */ extern struct platform_device *of_platform_device_create(struct device_node *np, const char *bus_id, struct device *parent); extern int of_platform_device_destroy(struct device *dev, void *data); -extern int of_platform_bus_probe(struct device_node *root, - const struct of_device_id *matches, - struct device *parent); -#ifdef CONFIG_OF_ADDRESS + extern int of_platform_populate(struct device_node *root, const struct of_device_id *matches, const struct of_dev_auxdata *lookup, @@ -84,6 +86,18 @@ extern int devm_of_platform_populate(struct device *dev); extern void devm_of_platform_depopulate(struct device *dev); #else +/* Platform devices and busses creation */ +static inline struct platform_device *of_platform_device_create(struct device_node *np, + const char *bus_id, + struct device *parent) +{ + return NULL; +} +static inline int of_platform_device_destroy(struct device *dev, void *data) +{ + return -ENODEV; +} + static inline int of_platform_populate(struct device_node *root, const struct of_device_id *matches, const struct of_dev_auxdata *lookup, -- cgit From 8bc063641cebf9d555e41d135db2b5035b521768 Mon Sep 17 00:00:00 2001 From: Matthias Kaehlcke Date: Thu, 30 Jun 2022 12:35:29 -0700 Subject: usb: misc: Add onboard_usb_hub driver The main issue this driver addresses is that a USB hub needs to be powered before it can be discovered. For discrete onboard hubs (an example for such a hub is the Realtek RTS5411) this is often solved by supplying the hub with an 'always-on' regulator, which is kind of a hack. Some onboard hubs may require further initialization steps, like changing the state of a GPIO or enabling a clock, which requires even more hacks. This driver creates a platform device representing the hub which performs the necessary initialization. Currently it only supports switching on a single regulator, support for multiple regulators or other actions can be added as needed. Different initialization sequences can be supported based on the compatible string. Besides performing the initialization the driver can be configured to power the hub off during system suspend. This can help to extend battery life on battery powered devices which have no requirements to keep the hub powered during suspend. The driver can also be configured to leave the hub powered when a wakeup capable USB device is connected when suspending, and power it off otherwise. Technically the driver consists of two drivers, the platform driver described above and a very thin USB driver that subclasses the generic driver. The purpose of this driver is to provide the platform driver with the USB devices corresponding to the hub(s) (a hub controller may provide multiple 'logical' hubs, e.g. one to support USB 2.0 and another for USB 3.x). Co-developed-by: Ravi Chandra Sadineni Reviewed-by: Douglas Anderson Signed-off-by: Ravi Chandra Sadineni Signed-off-by: Matthias Kaehlcke Link: https://lore.kernel.org/r/20220630123445.v24.3.I7c9a1f1d6ced41dd8310e8a03da666a32364e790@changeid Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/onboard_hub.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) create mode 100644 include/linux/usb/onboard_hub.h (limited to 'include/linux') diff --git a/include/linux/usb/onboard_hub.h b/include/linux/usb/onboard_hub.h new file mode 100644 index 000000000000..d9373230556e --- /dev/null +++ b/include/linux/usb/onboard_hub.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __LINUX_USB_ONBOARD_HUB_H +#define __LINUX_USB_ONBOARD_HUB_H + +struct usb_device; +struct list_head; + +#if IS_ENABLED(CONFIG_USB_ONBOARD_HUB) +void onboard_hub_create_pdevs(struct usb_device *parent_hub, struct list_head *pdev_list); +void onboard_hub_destroy_pdevs(struct list_head *pdev_list); +#else +static inline void onboard_hub_create_pdevs(struct usb_device *parent_hub, + struct list_head *pdev_list) {} +static inline void onboard_hub_destroy_pdevs(struct list_head *pdev_list) {} +#endif + +#endif /* __LINUX_USB_ONBOARD_HUB_H */ -- cgit From 0139da50dc53f0ce2804e83566d290c7e626fd17 Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Mon, 4 Jul 2022 12:45:14 +0300 Subject: serial: Embed rs485_supported to uart_port MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Embed rs485_supported to uart_port to allow serial core to tweak it as needed. Reviewed-by: Lino Sanfilippo Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220704094515.6831-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index b7b86ee3cb12..a6fa7c40c330 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -255,7 +255,7 @@ struct uart_port { struct attribute_group *attr_group; /* port specific attributes */ const struct attribute_group **tty_groups; /* all attributes (serial core use only) */ struct serial_rs485 rs485; - const struct serial_rs485 *rs485_supported; /* Supported mask for serial_rs485 */ + struct serial_rs485 rs485_supported; /* Supported mask for serial_rs485 */ struct gpio_desc *rs485_term_gpio; /* enable RS485 bus termination */ struct serial_iso7816 iso7816; void *private_data; /* generic platform data pointer */ -- cgit From 8aa5bcb61612060429223d1fbb7a1c30a579fc1f Mon Sep 17 00:00:00 2001 From: Mikko Perttunen Date: Mon, 27 Jun 2022 17:19:48 +0300 Subject: gpu: host1x: Add context device management code Add code to register context devices from device tree, allocate them out and manage their refcounts. Signed-off-by: Mikko Perttunen Signed-off-by: Thierry Reding --- include/linux/host1x.h | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) (limited to 'include/linux') diff --git a/include/linux/host1x.h b/include/linux/host1x.h index c0bf4e581fe9..32a82da13fed 100644 --- a/include/linux/host1x.h +++ b/include/linux/host1x.h @@ -446,4 +446,38 @@ int tegra_mipi_disable(struct tegra_mipi_device *device); int tegra_mipi_start_calibration(struct tegra_mipi_device *device); int tegra_mipi_finish_calibration(struct tegra_mipi_device *device); +/* host1x memory contexts */ + +struct host1x_memory_context { + struct host1x *host; + + refcount_t ref; + struct pid *owner; + + struct device dev; + u64 dma_mask; + u32 stream_id; +}; + +#ifdef CONFIG_IOMMU_API +struct host1x_memory_context *host1x_memory_context_alloc(struct host1x *host1x, + struct pid *pid); +void host1x_memory_context_get(struct host1x_memory_context *cd); +void host1x_memory_context_put(struct host1x_memory_context *cd); +#else +static inline struct host1x_memory_context *host1x_memory_context_alloc(struct host1x *host1x, + struct pid *pid) +{ + return NULL; +} + +static inline void host1x_memory_context_get(struct host1x_memory_context *cd) +{ +} + +static inline void host1x_memory_context_put(struct host1x_memory_context *cd) +{ +} +#endif + #endif -- cgit From 2486254781eab6f6119fabea78f11386c54460d2 Mon Sep 17 00:00:00 2001 From: Mikko Perttunen Date: Mon, 27 Jun 2022 17:19:49 +0300 Subject: gpu: host1x: Program context stream ID on submission Add code to do stream ID switching at the beginning of a job. The stream ID is switched to the stream ID specified by the context passed in the job structure. Before switching the stream ID, an OP_DONE wait is done on the channel's engine to ensure that there is no residual ongoing work that might do DMA using the new stream ID. Signed-off-by: Mikko Perttunen Signed-off-by: Thierry Reding --- include/linux/host1x.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/host1x.h b/include/linux/host1x.h index 32a82da13fed..cb2100d9b0ff 100644 --- a/include/linux/host1x.h +++ b/include/linux/host1x.h @@ -327,6 +327,14 @@ struct host1x_job { /* Whether host1x-side firewall should be ran for this job or not */ bool enable_firewall; + + /* Options for configuring engine data stream ID */ + /* Context device to use for job */ + struct host1x_memory_context *memory_context; + /* Stream ID to use if context isolation is disabled (!memory_context) */ + u32 engine_fallback_streamid; + /* Engine offset to program stream ID to */ + u32 engine_streamid_offset; }; struct host1x_job *host1x_job_alloc(struct host1x_channel *ch, -- cgit From b6c1c5745ccc68ac5d57c7ffb51ea25a86d0e97b Mon Sep 17 00:00:00 2001 From: Matthias Kaehlcke Date: Mon, 27 Jun 2022 08:35:24 -0700 Subject: dm: Add verity helpers for LoadPin LoadPin limits loading of kernel modules, firmware and certain other files to a 'pinned' file system (typically a read-only rootfs). To provide more flexibility LoadPin is being extended to also allow loading these files from trusted dm-verity devices. For that purpose LoadPin can be provided with a list of verity root digests that it should consider as trusted. Add a bunch of helpers to allow LoadPin to check whether a DM device is a trusted verity device. The new functions broadly fall in two categories: those that need access to verity internals (like the root digest), and the 'glue' between LoadPin and verity. The new file dm-verity-loadpin.c contains the glue functions. Signed-off-by: Matthias Kaehlcke Acked-by: Mike Snitzer Link: https://lore.kernel.org/lkml/20220627083512.v7.1.I3e928575a23481121e73286874c4c2bdb403355d@changeid Signed-off-by: Kees Cook --- include/linux/dm-verity-loadpin.h | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) create mode 100644 include/linux/dm-verity-loadpin.h (limited to 'include/linux') diff --git a/include/linux/dm-verity-loadpin.h b/include/linux/dm-verity-loadpin.h new file mode 100644 index 000000000000..fb695ecaa5d5 --- /dev/null +++ b/include/linux/dm-verity-loadpin.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __LINUX_DM_VERITY_LOADPIN_H +#define __LINUX_DM_VERITY_LOADPIN_H + +#include + +struct block_device; + +extern struct list_head dm_verity_loadpin_trusted_root_digests; + +struct dm_verity_loadpin_trusted_root_digest { + struct list_head node; + unsigned int len; + u8 data[]; +}; + +#if IS_ENABLED(CONFIG_SECURITY_LOADPIN) && IS_BUILTIN(CONFIG_DM_VERITY) +bool dm_verity_loadpin_is_bdev_trusted(struct block_device *bdev); +#else +static inline bool dm_verity_loadpin_is_bdev_trusted(struct block_device *bdev) +{ + return false; +} +#endif + +#endif /* __LINUX_DM_VERITY_LOADPIN_H */ -- cgit From 231af4709018a8e4f20e511da4b6506346d662d3 Mon Sep 17 00:00:00 2001 From: Matthias Kaehlcke Date: Mon, 27 Jun 2022 08:35:26 -0700 Subject: dm: verity-loadpin: Use CONFIG_SECURITY_LOADPIN_VERITY for conditional compilation The verity glue for LoadPin is only needed when CONFIG_SECURITY_LOADPIN_VERITY is set, use this option for conditional compilation instead of the combo of CONFIG_DM_VERITY and CONFIG_SECURITY_LOADPIN. Signed-off-by: Matthias Kaehlcke Acked-by: Mike Snitzer Link: https://lore.kernel.org/lkml/20220627083512.v7.3.I5aca2dcc3b06de4bf53696cd21329dce8272b8aa@changeid Signed-off-by: Kees Cook --- include/linux/dm-verity-loadpin.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dm-verity-loadpin.h b/include/linux/dm-verity-loadpin.h index fb695ecaa5d5..552b817ab102 100644 --- a/include/linux/dm-verity-loadpin.h +++ b/include/linux/dm-verity-loadpin.h @@ -15,7 +15,7 @@ struct dm_verity_loadpin_trusted_root_digest { u8 data[]; }; -#if IS_ENABLED(CONFIG_SECURITY_LOADPIN) && IS_BUILTIN(CONFIG_DM_VERITY) +#if IS_ENABLED(CONFIG_SECURITY_LOADPIN_VERITY) bool dm_verity_loadpin_is_bdev_trusted(struct block_device *bdev); #else static inline bool dm_verity_loadpin_is_bdev_trusted(struct block_device *bdev) -- cgit From 91a29af413def677495e447fb9a06957ebc8bed5 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Thu, 7 Jul 2022 19:23:09 +0100 Subject: gpio: Remove dynamic allocation from populate_parent_alloc_arg() The gpiolib is unique in the way it uses intermediate fwspecs when feeding an interrupt specifier to the parent domain, as it relies on the populate_parent_alloc_arg() callback to perform a dynamic allocation. This is pretty inefficient (we free the structure almost immediately), and the only reason this isn't a stack allocation is that our ThunderX friend uses MSIs rather than a FW-constructed structure. Let's solve it by providing a new type composed of the union of a struct irq_fwspec and a msi_info_t, which satisfies both requirements. This allows us to use a stack allocation, and we can move the handful of users to this new scheme. Also perform some additional cleanup, such as getting rid of the stub versions of the irq_domain_translate_*cell helpers, which are never used when CONFIG_IRQ_DOMAIN_HIERARCHY isn't selected. Tested on a Tegra186. Reviewed-by: Linus Walleij Signed-off-by: Marc Zyngier Cc: Daniel Palmer Cc: Romain Perier Cc: Bartosz Golaszewski Cc: Thierry Reding Cc: Jonathan Hunter Cc: Robert Richter Cc: Nobuhiro Iwamatsu Cc: Andy Gross Cc: Bjorn Andersson Acked-by: Bartosz Golaszewski Link: https://lore.kernel.org/r/20220707182314.66610-2-prabhakar.mahadev-lad.rj@bp.renesas.com --- include/linux/gpio/driver.h | 42 +++++++++++++++++++----------------------- 1 file changed, 19 insertions(+), 23 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gpio/driver.h b/include/linux/gpio/driver.h index b1e0f1f8ee2e..ad5b92b74ccf 100644 --- a/include/linux/gpio/driver.h +++ b/include/linux/gpio/driver.h @@ -12,6 +12,8 @@ #include #include +#include + struct gpio_desc; struct of_phandle_args; struct device_node; @@ -23,6 +25,13 @@ enum gpio_lookup_flags; struct gpio_chip; +union gpio_irq_fwspec { + struct irq_fwspec fwspec; +#ifdef CONFIG_GENERIC_MSI_IRQ_DOMAIN + msi_alloc_info_t msiinfo; +#endif +}; + #define GPIO_LINE_DIRECTION_IN 1 #define GPIO_LINE_DIRECTION_OUT 0 @@ -103,9 +112,10 @@ struct gpio_irq_chip { * variant named &gpiochip_populate_parent_fwspec_fourcell is also * available. */ - void *(*populate_parent_alloc_arg)(struct gpio_chip *gc, - unsigned int parent_hwirq, - unsigned int parent_type); + int (*populate_parent_alloc_arg)(struct gpio_chip *gc, + union gpio_irq_fwspec *fwspec, + unsigned int parent_hwirq, + unsigned int parent_type); /** * @child_offset_to_irq: @@ -646,28 +656,14 @@ struct bgpio_pdata { #ifdef CONFIG_IRQ_DOMAIN_HIERARCHY -void *gpiochip_populate_parent_fwspec_twocell(struct gpio_chip *gc, +int gpiochip_populate_parent_fwspec_twocell(struct gpio_chip *gc, + union gpio_irq_fwspec *gfwspec, + unsigned int parent_hwirq, + unsigned int parent_type); +int gpiochip_populate_parent_fwspec_fourcell(struct gpio_chip *gc, + union gpio_irq_fwspec *gfwspec, unsigned int parent_hwirq, unsigned int parent_type); -void *gpiochip_populate_parent_fwspec_fourcell(struct gpio_chip *gc, - unsigned int parent_hwirq, - unsigned int parent_type); - -#else - -static inline void *gpiochip_populate_parent_fwspec_twocell(struct gpio_chip *gc, - unsigned int parent_hwirq, - unsigned int parent_type) -{ - return NULL; -} - -static inline void *gpiochip_populate_parent_fwspec_fourcell(struct gpio_chip *gc, - unsigned int parent_hwirq, - unsigned int parent_type) -{ - return NULL; -} #endif /* CONFIG_IRQ_DOMAIN_HIERARCHY */ -- cgit From 1dd685c414a7b9fdb3d23aca3aedae84f0b998ae Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Wed, 6 Jul 2022 14:51:00 -0400 Subject: XArray: Add calls to might_alloc() Catch bogus GFP flags deterministically, instead of occasionally when we actually have to allocate memory. Reported-by: Nikolay Borisov Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/xarray.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/xarray.h b/include/linux/xarray.h index c29e11b2c073..44dd6d6e01bc 100644 --- a/include/linux/xarray.h +++ b/include/linux/xarray.h @@ -16,6 +16,7 @@ #include #include #include +#include #include #include @@ -586,6 +587,7 @@ static inline void *xa_store_bh(struct xarray *xa, unsigned long index, { void *curr; + might_alloc(gfp); xa_lock_bh(xa); curr = __xa_store(xa, index, entry, gfp); xa_unlock_bh(xa); @@ -612,6 +614,7 @@ static inline void *xa_store_irq(struct xarray *xa, unsigned long index, { void *curr; + might_alloc(gfp); xa_lock_irq(xa); curr = __xa_store(xa, index, entry, gfp); xa_unlock_irq(xa); @@ -687,6 +690,7 @@ static inline void *xa_cmpxchg(struct xarray *xa, unsigned long index, { void *curr; + might_alloc(gfp); xa_lock(xa); curr = __xa_cmpxchg(xa, index, old, entry, gfp); xa_unlock(xa); @@ -714,6 +718,7 @@ static inline void *xa_cmpxchg_bh(struct xarray *xa, unsigned long index, { void *curr; + might_alloc(gfp); xa_lock_bh(xa); curr = __xa_cmpxchg(xa, index, old, entry, gfp); xa_unlock_bh(xa); @@ -741,6 +746,7 @@ static inline void *xa_cmpxchg_irq(struct xarray *xa, unsigned long index, { void *curr; + might_alloc(gfp); xa_lock_irq(xa); curr = __xa_cmpxchg(xa, index, old, entry, gfp); xa_unlock_irq(xa); @@ -770,6 +776,7 @@ static inline int __must_check xa_insert(struct xarray *xa, { int err; + might_alloc(gfp); xa_lock(xa); err = __xa_insert(xa, index, entry, gfp); xa_unlock(xa); @@ -799,6 +806,7 @@ static inline int __must_check xa_insert_bh(struct xarray *xa, { int err; + might_alloc(gfp); xa_lock_bh(xa); err = __xa_insert(xa, index, entry, gfp); xa_unlock_bh(xa); @@ -828,6 +836,7 @@ static inline int __must_check xa_insert_irq(struct xarray *xa, { int err; + might_alloc(gfp); xa_lock_irq(xa); err = __xa_insert(xa, index, entry, gfp); xa_unlock_irq(xa); @@ -857,6 +866,7 @@ static inline __must_check int xa_alloc(struct xarray *xa, u32 *id, { int err; + might_alloc(gfp); xa_lock(xa); err = __xa_alloc(xa, id, entry, limit, gfp); xa_unlock(xa); @@ -886,6 +896,7 @@ static inline int __must_check xa_alloc_bh(struct xarray *xa, u32 *id, { int err; + might_alloc(gfp); xa_lock_bh(xa); err = __xa_alloc(xa, id, entry, limit, gfp); xa_unlock_bh(xa); @@ -915,6 +926,7 @@ static inline int __must_check xa_alloc_irq(struct xarray *xa, u32 *id, { int err; + might_alloc(gfp); xa_lock_irq(xa); err = __xa_alloc(xa, id, entry, limit, gfp); xa_unlock_irq(xa); @@ -948,6 +960,7 @@ static inline int xa_alloc_cyclic(struct xarray *xa, u32 *id, void *entry, { int err; + might_alloc(gfp); xa_lock(xa); err = __xa_alloc_cyclic(xa, id, entry, limit, next, gfp); xa_unlock(xa); @@ -981,6 +994,7 @@ static inline int xa_alloc_cyclic_bh(struct xarray *xa, u32 *id, void *entry, { int err; + might_alloc(gfp); xa_lock_bh(xa); err = __xa_alloc_cyclic(xa, id, entry, limit, next, gfp); xa_unlock_bh(xa); @@ -1014,6 +1028,7 @@ static inline int xa_alloc_cyclic_irq(struct xarray *xa, u32 *id, void *entry, { int err; + might_alloc(gfp); xa_lock_irq(xa); err = __xa_alloc_cyclic(xa, id, entry, limit, next, gfp); xa_unlock_irq(xa); -- cgit From c435c54639aa5513ab877c8b014dd83d4ce6b40e Mon Sep 17 00:00:00 2001 From: Matthew Rosato Date: Mon, 6 Jun 2022 16:33:14 -0400 Subject: vfio/pci: introduce CONFIG_VFIO_PCI_ZDEV_KVM The current contents of vfio-pci-zdev are today only useful in a KVM environment; let's tie everything currently under vfio-pci-zdev to this Kconfig statement and require KVM in this case, reducing complexity (e.g. symbol lookups). Signed-off-by: Matthew Rosato Acked-by: Alex Williamson Reviewed-by: Pierre Morel Link: https://lore.kernel.org/r/20220606203325.110625-11-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger --- include/linux/vfio_pci_core.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index 23c176d4b073..63af2897939c 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -206,7 +206,7 @@ static inline int vfio_pci_igd_init(struct vfio_pci_core_device *vdev) } #endif -#ifdef CONFIG_S390 +#ifdef CONFIG_VFIO_PCI_ZDEV_KVM extern int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev, struct vfio_info_cap *caps); #else -- cgit From 3c5a1b6f0a18520a0edd0600fef6f1a8553b8fdc Mon Sep 17 00:00:00 2001 From: Matthew Rosato Date: Mon, 6 Jun 2022 16:33:19 -0400 Subject: KVM: s390: pci: provide routines for enabling/disabling interrupt forwarding These routines will be wired into a kvm ioctl in order to respond to requests to enable / disable a device for Adapter Event Notifications / Adapter Interuption Forwarding. Reviewed-by: Christian Borntraeger Acked-by: Niklas Schnelle Signed-off-by: Matthew Rosato Link: https://lore.kernel.org/r/20220606203325.110625-16-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger --- include/linux/sched/user.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sched/user.h b/include/linux/sched/user.h index 00ed419dd464..f054d0360a75 100644 --- a/include/linux/sched/user.h +++ b/include/linux/sched/user.h @@ -24,7 +24,8 @@ struct user_struct { kuid_t uid; #if defined(CONFIG_PERF_EVENTS) || defined(CONFIG_BPF_SYSCALL) || \ - defined(CONFIG_NET) || defined(CONFIG_IO_URING) + defined(CONFIG_NET) || defined(CONFIG_IO_URING) || \ + defined(CONFIG_VFIO_PCI_ZDEV_KVM) atomic_long_t locked_vm; #endif #ifdef CONFIG_WATCH_QUEUE -- cgit From 8061d1c31f1a018281bc9877ecce44bfc779e21d Mon Sep 17 00:00:00 2001 From: Matthew Rosato Date: Mon, 6 Jun 2022 16:33:21 -0400 Subject: vfio-pci/zdev: add open/close device hooks During vfio-pci open_device, pass the KVM associated with the vfio group (if one exists). This is needed in order to pass a special indicator (GISA) to firmware to allow zPCI interpretation facilities to be used for only the specific KVM associated with the vfio-pci device. During vfio-pci close_device, unregister the notifier. Signed-off-by: Matthew Rosato Acked-by: Alex Williamson Reviewed-by: Pierre Morel Link: https://lore.kernel.org/r/20220606203325.110625-18-mjrosato@linux.ibm.com Signed-off-by: Christian Borntraeger --- include/linux/vfio_pci_core.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index 63af2897939c..d5d9e17f0156 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -209,12 +209,22 @@ static inline int vfio_pci_igd_init(struct vfio_pci_core_device *vdev) #ifdef CONFIG_VFIO_PCI_ZDEV_KVM extern int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev, struct vfio_info_cap *caps); +int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev); +void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev); #else static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev, struct vfio_info_cap *caps) { return -ENODEV; } + +static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev) +{ + return 0; +} + +static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev) +{} #endif /* Will be exported for vfio pci drivers usage */ -- cgit From ef6e5d61eb7a0a30f776a829274573094185d03d Mon Sep 17 00:00:00 2001 From: Michael Walle Date: Wed, 6 Jul 2022 17:15:52 +0200 Subject: genirq: Allow irq_set_chip_handler_name_locked() to take a const irq_chip Similar to commit 393e1280f765 ("genirq: Allow irq_chip registration functions to take a const irq_chip"), allow the irq_set_chip_handler_name_locked() function to take a const irq_chip argument. Signed-off-by: Michael Walle Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20220706151553.1580790-1-michael@walle.cc --- include/linux/irqdesc.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h index a77584593f7d..1cd4e36890fb 100644 --- a/include/linux/irqdesc.h +++ b/include/linux/irqdesc.h @@ -209,14 +209,15 @@ static inline void irq_set_handler_locked(struct irq_data *data, * Must be called with irq_desc locked and valid parameters. */ static inline void -irq_set_chip_handler_name_locked(struct irq_data *data, struct irq_chip *chip, +irq_set_chip_handler_name_locked(struct irq_data *data, + const struct irq_chip *chip, irq_flow_handler_t handler, const char *name) { struct irq_desc *desc = irq_data_to_desc(data); desc->handle_irq = handler; desc->name = name; - data->chip = chip; + data->chip = (struct irq_chip *)chip; } bool irq_check_status_bit(unsigned int irq, unsigned int bitmask); -- cgit From 6976890e8998afd8abbbd9fe27ed71387b24f57f Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Wed, 22 Jun 2022 11:00:45 +0200 Subject: netfilter: nf_conntrack: add missing __rcu annotations Access to the hook pointers use correct helpers but the pointers lack the needed __rcu annotation. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- include/linux/netfilter/nf_conntrack_sip.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netfilter/nf_conntrack_sip.h b/include/linux/netfilter/nf_conntrack_sip.h index c620521c42bc..dbc614dfe0d5 100644 --- a/include/linux/netfilter/nf_conntrack_sip.h +++ b/include/linux/netfilter/nf_conntrack_sip.h @@ -164,7 +164,7 @@ struct nf_nat_sip_hooks { unsigned int medialen, union nf_inet_addr *rtp_addr); }; -extern const struct nf_nat_sip_hooks *nf_nat_sip_hooks; +extern const struct nf_nat_sip_hooks __rcu *nf_nat_sip_hooks; int ct_sip_parse_request(const struct nf_conn *ct, const char *dptr, unsigned int datalen, unsigned int *matchoff, -- cgit From d3f2d0a292c24fc624afb2b4f47f838e83775721 Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Wed, 22 Jun 2022 11:00:47 +0200 Subject: netfilter: h323: merge nat hook pointers into one sparse complains about incorrect rcu usage. Code uses the correct rcu access primitives, but the function pointers lack rcu annotations. Collapse all of them into a single structure, then annotate the pointer. Signed-off-by: Florian Westphal Signed-off-by: Pablo Neira Ayuso --- include/linux/netfilter/nf_conntrack_h323.h | 109 ++++++++++++++-------------- 1 file changed, 56 insertions(+), 53 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netfilter/nf_conntrack_h323.h b/include/linux/netfilter/nf_conntrack_h323.h index 4561ec0fcea4..9e937f64a1ad 100644 --- a/include/linux/netfilter/nf_conntrack_h323.h +++ b/include/linux/netfilter/nf_conntrack_h323.h @@ -38,60 +38,63 @@ void nf_conntrack_h245_expect(struct nf_conn *new, struct nf_conntrack_expect *this); void nf_conntrack_q931_expect(struct nf_conn *new, struct nf_conntrack_expect *this); -extern int (*set_h245_addr_hook) (struct sk_buff *skb, unsigned int protoff, - unsigned char **data, int dataoff, - H245_TransportAddress *taddr, - union nf_inet_addr *addr, - __be16 port); -extern int (*set_h225_addr_hook) (struct sk_buff *skb, unsigned int protoff, - unsigned char **data, int dataoff, - TransportAddress *taddr, - union nf_inet_addr *addr, - __be16 port); -extern int (*set_sig_addr_hook) (struct sk_buff *skb, - struct nf_conn *ct, - enum ip_conntrack_info ctinfo, - unsigned int protoff, unsigned char **data, - TransportAddress *taddr, int count); -extern int (*set_ras_addr_hook) (struct sk_buff *skb, - struct nf_conn *ct, - enum ip_conntrack_info ctinfo, - unsigned int protoff, unsigned char **data, - TransportAddress *taddr, int count); -extern int (*nat_rtp_rtcp_hook) (struct sk_buff *skb, - struct nf_conn *ct, - enum ip_conntrack_info ctinfo, - unsigned int protoff, unsigned char **data, - int dataoff, - H245_TransportAddress *taddr, - __be16 port, __be16 rtp_port, - struct nf_conntrack_expect *rtp_exp, - struct nf_conntrack_expect *rtcp_exp); -extern int (*nat_t120_hook) (struct sk_buff *skb, struct nf_conn *ct, - enum ip_conntrack_info ctinfo, - unsigned int protoff, + +struct nfct_h323_nat_hooks { + int (*set_h245_addr)(struct sk_buff *skb, unsigned int protoff, unsigned char **data, int dataoff, - H245_TransportAddress *taddr, __be16 port, - struct nf_conntrack_expect *exp); -extern int (*nat_h245_hook) (struct sk_buff *skb, struct nf_conn *ct, - enum ip_conntrack_info ctinfo, - unsigned int protoff, + H245_TransportAddress *taddr, + union nf_inet_addr *addr, __be16 port); + int (*set_h225_addr)(struct sk_buff *skb, unsigned int protoff, unsigned char **data, int dataoff, - TransportAddress *taddr, __be16 port, - struct nf_conntrack_expect *exp); -extern int (*nat_callforwarding_hook) (struct sk_buff *skb, - struct nf_conn *ct, - enum ip_conntrack_info ctinfo, - unsigned int protoff, - unsigned char **data, int dataoff, - TransportAddress *taddr, - __be16 port, - struct nf_conntrack_expect *exp); -extern int (*nat_q931_hook) (struct sk_buff *skb, struct nf_conn *ct, - enum ip_conntrack_info ctinfo, - unsigned int protoff, - unsigned char **data, TransportAddress *taddr, - int idx, __be16 port, - struct nf_conntrack_expect *exp); + TransportAddress *taddr, + union nf_inet_addr *addr, __be16 port); + int (*set_sig_addr)(struct sk_buff *skb, + struct nf_conn *ct, + enum ip_conntrack_info ctinfo, + unsigned int protoff, unsigned char **data, + TransportAddress *taddr, int count); + int (*set_ras_addr)(struct sk_buff *skb, + struct nf_conn *ct, + enum ip_conntrack_info ctinfo, + unsigned int protoff, unsigned char **data, + TransportAddress *taddr, int count); + int (*nat_rtp_rtcp)(struct sk_buff *skb, + struct nf_conn *ct, + enum ip_conntrack_info ctinfo, + unsigned int protoff, + unsigned char **data, int dataoff, + H245_TransportAddress *taddr, + __be16 port, __be16 rtp_port, + struct nf_conntrack_expect *rtp_exp, + struct nf_conntrack_expect *rtcp_exp); + int (*nat_t120)(struct sk_buff *skb, + struct nf_conn *ct, + enum ip_conntrack_info ctinfo, + unsigned int protoff, + unsigned char **data, int dataoff, + H245_TransportAddress *taddr, __be16 port, + struct nf_conntrack_expect *exp); + int (*nat_h245)(struct sk_buff *skb, + struct nf_conn *ct, + enum ip_conntrack_info ctinfo, + unsigned int protoff, + unsigned char **data, int dataoff, + TransportAddress *taddr, __be16 port, + struct nf_conntrack_expect *exp); + int (*nat_callforwarding)(struct sk_buff *skb, + struct nf_conn *ct, + enum ip_conntrack_info ctinfo, + unsigned int protoff, + unsigned char **data, int dataoff, + TransportAddress *taddr, __be16 port, + struct nf_conntrack_expect *exp); + int (*nat_q931)(struct sk_buff *skb, + struct nf_conn *ct, + enum ip_conntrack_info ctinfo, + unsigned int protoff, + unsigned char **data, TransportAddress *taddr, int idx, + __be16 port, struct nf_conntrack_expect *exp); +}; +extern const struct nfct_h323_nat_hooks __rcu *nfct_h323_nat_hook; #endif -- cgit From d5b36a4dbd06c5e8e36ca8ccc552f679069e2946 Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 11 Jul 2022 18:16:25 +0200 Subject: fix race between exit_itimers() and /proc/pid/timers As Chris explains, the comment above exit_itimers() is not correct, we can race with proc_timers_seq_ops. Change exit_itimers() to clear signal->posix_timers with ->siglock held. Cc: Reported-by: chris@accessvector.net Signed-off-by: Oleg Nesterov Signed-off-by: Linus Torvalds --- include/linux/sched/task.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index 505aaf9fe477..81cab4b01edc 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -85,7 +85,7 @@ static inline void exit_thread(struct task_struct *tsk) extern __noreturn void do_group_exit(int); extern void exit_files(struct task_struct *); -extern void exit_itimers(struct signal_struct *); +extern void exit_itimers(struct task_struct *); extern pid_t kernel_clone(struct kernel_clone_args *kargs); struct task_struct *create_io_thread(int (*fn)(void *), void *arg, int node); -- cgit From 3d6e44623841c8b82c2157f2f749019803fb238a Mon Sep 17 00:00:00 2001 From: Jeremy Kerr Date: Sat, 9 Jul 2022 11:19:57 +0800 Subject: kunit: unify module and builtin suite definitions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Currently, KUnit runs built-in tests and tests loaded from modules differently. For built-in tests, the kunit_test_suite{,s}() macro adds a list of suites in the .kunit_test_suites linker section. However, for kernel modules, a module_init() function is used to run the test suites. This causes problems if tests are included in a module which already defines module_init/exit_module functions, as they'll conflict with the kunit-provided ones. This change removes the kunit-defined module inits, and instead parses the kunit tests from their own section in the module. After module init, we call __kunit_test_suites_init() on the contents of that section, which prepares and runs the suite. This essentially unifies the module- and non-module kunit init formats. Tested-by: Maíra Canal Reviewed-by: Brendan Higgins Signed-off-by: Jeremy Kerr Signed-off-by: Daniel Latypov Signed-off-by: David Gow Signed-off-by: Shuah Khan --- include/linux/module.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/module.h b/include/linux/module.h index abd9fa916b7d..2490223c975d 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -505,6 +505,11 @@ struct module { int num_static_call_sites; struct static_call_site *static_call_sites; #endif +#if IS_ENABLED(CONFIG_KUNIT) + int num_kunit_suites; + struct kunit_suite ***kunit_suites; +#endif + #ifdef CONFIG_LIVEPATCH bool klp; /* Is this a livepatch module? */ -- cgit From e5857d396f35e59e6fe96cf1178b0357cc3a1ea4 Mon Sep 17 00:00:00 2001 From: Daniel Latypov Date: Sat, 9 Jul 2022 11:19:58 +0800 Subject: kunit: flatten kunit_suite*** to kunit_suite** in .kunit_test_suites MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We currently store kunit suites in the .kunit_test_suites ELF section as a `struct kunit_suite***` (modulo some `const`s). For every test file, we store a struct kunit_suite** NULL-terminated array. This adds quite a bit of complexity to the test filtering code in the executor. Instead, let's just make the .kunit_test_suites section contain a single giant array of struct kunit_suite pointers, which can then be directly manipulated. This array is not NULL-terminated, and so none of the test filtering code needs to NULL-terminate anything. Tested-by: Maíra Canal Reviewed-by: Brendan Higgins Signed-off-by: Daniel Latypov Co-developed-by: David Gow Signed-off-by: David Gow Signed-off-by: Shuah Khan --- include/linux/module.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/module.h b/include/linux/module.h index 2490223c975d..518296ea7f73 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -507,7 +507,7 @@ struct module { #endif #if IS_ENABLED(CONFIG_KUNIT) int num_kunit_suites; - struct kunit_suite ***kunit_suites; + struct kunit_suite **kunit_suites; #endif -- cgit From f16214c102f0f64b2f3546e989498525bd7b7708 Mon Sep 17 00:00:00 2001 From: Matthieu Baerts Date: Mon, 11 Jul 2022 10:12:00 +0200 Subject: bpf: Fix 'dubious one-bit signed bitfield' warnings Our CI[1] reported these warnings when using Sparse: $ touch net/mptcp/bpf.c $ make C=1 net/mptcp/bpf.o net/mptcp/bpf.c: note: in included file: include/linux/bpf_verifier.h:348:26: error: dubious one-bit signed bitfield include/linux/bpf_verifier.h:349:29: error: dubious one-bit signed bitfield Set them as 'unsigned' to avoid warnings. [1] https://github.com/multipath-tcp/mptcp_net-next/actions/runs/2643588487 Fixes: 1ade23711971 ("bpf: Inline calls to bpf_loop when callback is known") Signed-off-by: Matthieu Baerts Signed-off-by: Andrii Nakryiko Acked-by: Yonghong Song Link: https://lore.kernel.org/bpf/20220711081200.2081262-1-matthieu.baerts@tessares.net --- include/linux/bpf_verifier.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 81b19669efba..2e3bad8640dc 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -345,10 +345,10 @@ struct bpf_verifier_state_list { }; struct bpf_loop_inline_state { - int initialized:1; /* set to true upon first entry */ - int fit_for_inline:1; /* true if callback function is the same - * at each call and flags are always zero - */ + unsigned int initialized:1; /* set to true upon first entry */ + unsigned int fit_for_inline:1; /* true if callback function is the same + * at each call and flags are always zero + */ u32 callback_subprogno; /* valid when fit_for_inline is true */ }; -- cgit From 7b19119f4c7dc9412b556ea12fae6b32574e2810 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Mon, 11 Jul 2022 01:14:06 -0700 Subject: net/mlx5: Use devl_ API in mlx5e_devlink_port_register As part of the flows invoked by mlx5_devlink_eswitch_mode_set() get to mlx5_rescan_drivers_locked() which can call mlx5e_probe()/mlx5e_remove and register/unregister mlx5e driver ports accordingly. This can lead to deadlock once mlx5_devlink_eswitch_mode_set() will use devlink lock. Use devl_port_register/unregister() instead of devlink_port_register/unregister() and add devlink instance locks in the driver paths to this function to have it locked while calling devl_ API function. If remove or probe were called by module init or module cleanup flows, need to lock devlink just before calling devl_port_register(), otherwise it is called by attach/detach or register/unregister flow and we can have the flow locked. Added flag to distinguish between these cases. This will be used by the downstream patch to invoke mlx5_devlink_eswitch_mode_set() with devlink locked. Signed-off-by: Moshe Shemesh Signed-off-by: Saeed Mahameed Signed-off-by: Paolo Abeni --- include/linux/mlx5/driver.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 76d7661e3e63..bd882884b23c 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -551,6 +551,10 @@ enum { * creation/deletion on drivers rescan. Unset during device attach. */ MLX5_PRIV_FLAGS_DETACH = 1 << 2, + /* Distinguish between mlx5e_probe/remove called by module init/cleanup + * and called by other flows which can already hold devlink lock + */ + MLX5_PRIV_FLAGS_MLX5E_LOCKED_FLOW = 1 << 3, }; struct mlx5_adev { -- cgit From 91f059c95c6a5dbc0907a5f871e7915a5e93c1f9 Mon Sep 17 00:00:00 2001 From: Shaik Sajida Bhanu Date: Fri, 27 May 2022 23:23:52 +0530 Subject: mmc: core: Capture eMMC and SD card errors Add changes to capture eMMC and SD card errors. This is useful for debug and testing. Signed-off-by: Liangliang Lu Signed-off-by: Sayali Lokhande Signed-off-by: Bao D. Nguyen Signed-off-by: Ram Prakash Gupta Signed-off-by: Shaik Sajida Bhanu Acked-by: Adrian Hunter Link: https://lore.kernel.org/r/1653674036-21829-2-git-send-email-quic_c_sbhanu@quicinc.com Signed-off-by: Ulf Hansson --- include/linux/mmc/host.h | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h index c193c50ccd78..eb8bc5b9b0b7 100644 --- a/include/linux/mmc/host.h +++ b/include/linux/mmc/host.h @@ -93,6 +93,25 @@ struct mmc_clk_phase_map { struct mmc_host; +enum mmc_err_stat { + MMC_ERR_CMD_TIMEOUT, + MMC_ERR_CMD_CRC, + MMC_ERR_DAT_TIMEOUT, + MMC_ERR_DAT_CRC, + MMC_ERR_AUTO_CMD, + MMC_ERR_ADMA, + MMC_ERR_TUNING, + MMC_ERR_CMDQ_RED, + MMC_ERR_CMDQ_GCE, + MMC_ERR_CMDQ_ICCE, + MMC_ERR_REQ_TIMEOUT, + MMC_ERR_CMDQ_REQ_TIMEOUT, + MMC_ERR_ICE_CFG, + MMC_ERR_CTRL_TIMEOUT, + MMC_ERR_UNEXPECTED_IRQ, + MMC_ERR_MAX, +}; + struct mmc_host_ops { /* * It is optional for the host to implement pre_req and post_req in @@ -501,6 +520,7 @@ struct mmc_host { /* Host Software Queue support */ bool hsq_enabled; + u32 err_stats[MMC_ERR_MAX]; unsigned long private[] ____cacheline_aligned; }; @@ -635,6 +655,12 @@ static inline enum dma_data_direction mmc_get_dma_dir(struct mmc_data *data) return data->flags & MMC_DATA_WRITE ? DMA_TO_DEVICE : DMA_FROM_DEVICE; } +static inline void mmc_debugfs_err_stats_inc(struct mmc_host *host, + enum mmc_err_stat stat) +{ + host->err_stats[stat] += 1; +} + int mmc_send_tuning(struct mmc_host *host, u32 opcode, int *cmd_error); int mmc_send_abort_tuning(struct mmc_host *host, u32 opcode); int mmc_get_ext_csd(struct mmc_card *card, u8 **new_ext_csd); -- cgit From efe8f5c9b5e118070f424205078ababc46fd130a Mon Sep 17 00:00:00 2001 From: Shaik Sajida Bhanu Date: Fri, 27 May 2022 23:23:53 +0530 Subject: mmc: sdhci: Capture eMMC and SD card errors Add changes to capture eMMC and SD card errors. This is useful for debug and testing. Signed-off-by: Liangliang Lu Signed-off-by: Sayali Lokhande Signed-off-by: Bao D. Nguyen Signed-off-by: Shaik Sajida Bhanu Acked-by: Adrian Hunter Link: https://lore.kernel.org/r/1653674036-21829-3-git-send-email-quic_c_sbhanu@quicinc.com Signed-off-by: Ulf Hansson --- include/linux/mmc/mmc.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mmc/mmc.h b/include/linux/mmc/mmc.h index d9a65c6a8816..9c50bc40f8ff 100644 --- a/include/linux/mmc/mmc.h +++ b/include/linux/mmc/mmc.h @@ -99,6 +99,12 @@ static inline bool mmc_op_multi(u32 opcode) opcode == MMC_READ_MULTIPLE_BLOCK; } +static inline bool mmc_op_tuning(u32 opcode) +{ + return opcode == MMC_SEND_TUNING_BLOCK || + opcode == MMC_SEND_TUNING_BLOCK_HS200; +} + /* * MMC_SWITCH argument format: * -- cgit From 2083da24eb56ce622332946800a67a7449d85fe5 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Fri, 10 Jun 2022 12:02:04 +0530 Subject: OPP: Allow multiple clocks for a device This patch adds support to allow multiple clocks for a device. The design is pretty much similar to how this is done for regulators, and platforms can supply their own version of the config_clks() callback if they have multiple clocks for their device. The core manages the calls via opp_table->config_clks() eventually. We have kept both "clk" and "clks" fields in the OPP table structure and the reason is provided as a comment in _opp_set_clknames(). The same isn't done for "rates" though and we use rates[0] at most of the places now. Co-developed-by: Krzysztof Kozlowski Signed-off-by: Krzysztof Kozlowski Tested-by: Jon Hunter Tested-by: Dmitry Osipenko Tested-by: Manivannan Sadhasivam Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 50cbc75bef71..104151dfe46c 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -61,9 +61,13 @@ typedef int (*config_regulators_t)(struct device *dev, struct dev_pm_opp *old_opp, struct dev_pm_opp *new_opp, struct regulator **regulators, unsigned int count); +typedef int (*config_clks_t)(struct device *dev, struct opp_table *opp_table, + struct dev_pm_opp *opp, void *data, bool scaling_down); + /** * struct dev_pm_opp_config - Device OPP configuration values - * @clk_names: Clk names, NULL terminated array, max 1 clock for now. + * @clk_names: Clk names, NULL terminated array. + * @config_clks: Custom set clk helper. * @prop_name: Name to postfix to properties. * @config_regulators: Custom set regulator helper. * @supported_hw: Array of hierarchy of versions to match. @@ -78,6 +82,7 @@ typedef int (*config_regulators_t)(struct device *dev, struct dev_pm_opp_config { /* NULL terminated */ const char * const *clk_names; + config_clks_t config_clks; const char *prop_name; config_regulators_t config_regulators; const unsigned int *supported_hw; -- cgit From 8174a3a613af1a911ab19da812824f7180b261f9 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Fri, 10 Jun 2022 12:37:25 +0530 Subject: OPP: Provide a simple implementation to configure multiple clocks This provides a simple implementation to configure multiple clocks for a device. Tested-by: Dmitry Osipenko Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 104151dfe46c..683e6baf9618 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -159,6 +159,9 @@ int dev_pm_opp_unregister_notifier(struct device *dev, struct notifier_block *nb int dev_pm_opp_set_config(struct device *dev, struct dev_pm_opp_config *config); int devm_pm_opp_set_config(struct device *dev, struct dev_pm_opp_config *config); void dev_pm_opp_clear_config(int token); +int dev_pm_opp_config_clks_simple(struct device *dev, + struct opp_table *opp_table, struct dev_pm_opp *opp, void *data, + bool scaling_down); struct dev_pm_opp *dev_pm_opp_xlate_required_opp(struct opp_table *src_table, struct opp_table *dst_table, struct dev_pm_opp *src_opp); int dev_pm_opp_xlate_performance_state(struct opp_table *src_table, struct opp_table *dst_table, unsigned int pstate); @@ -342,6 +345,13 @@ static inline int devm_pm_opp_set_config(struct device *dev, struct dev_pm_opp_c static inline void dev_pm_opp_clear_config(int token) {} +static inline int dev_pm_opp_config_clks_simple(struct device *dev, + struct opp_table *opp_table, struct dev_pm_opp *opp, void *data, + bool scaling_down) +{ + return -EOPNOTSUPP; +} + static inline struct dev_pm_opp *dev_pm_opp_xlate_required_opp(struct opp_table *src_table, struct opp_table *dst_table, struct dev_pm_opp *src_opp) { -- cgit From 1e5fb38442ebaabb65057c2e58355ee36c2a178f Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Tue, 5 Jul 2022 09:41:56 +0530 Subject: OPP: Remove dev{m}_pm_opp_of_add_table_noclk() Remove the now unused variants and the now unnecessary "getclk" parameter from few routines. Signed-off-by: Viresh Kumar --- include/linux/pm_opp.h | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm_opp.h b/include/linux/pm_opp.h index 683e6baf9618..dc1fb5890792 100644 --- a/include/linux/pm_opp.h +++ b/include/linux/pm_opp.h @@ -402,8 +402,6 @@ static inline int dev_pm_opp_sync_regulators(struct device *dev) int dev_pm_opp_of_add_table(struct device *dev); int dev_pm_opp_of_add_table_indexed(struct device *dev, int index); int devm_pm_opp_of_add_table_indexed(struct device *dev, int index); -int dev_pm_opp_of_add_table_noclk(struct device *dev, int index); -int devm_pm_opp_of_add_table_noclk(struct device *dev, int index); void dev_pm_opp_of_remove_table(struct device *dev); int devm_pm_opp_of_add_table(struct device *dev); int dev_pm_opp_of_cpumask_add_table(const struct cpumask *cpumask); @@ -434,16 +432,6 @@ static inline int devm_pm_opp_of_add_table_indexed(struct device *dev, int index return -EOPNOTSUPP; } -static inline int dev_pm_opp_of_add_table_noclk(struct device *dev, int index) -{ - return -EOPNOTSUPP; -} - -static inline int devm_pm_opp_of_add_table_noclk(struct device *dev, int index) -{ - return -EOPNOTSUPP; -} - static inline void dev_pm_opp_of_remove_table(struct device *dev) { } -- cgit From bbd60fee2d2166b2b8722cbad740996ef2e7ce40 Mon Sep 17 00:00:00 2001 From: Christian König Date: Mon, 11 Jul 2022 16:48:01 +0200 Subject: dma-buf: revert "return only unsignaled fences in dma_fence_unwrap_for_each v3" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This reverts commit 8f61973718485f3e89bc4f408f929048b7b47c83. It turned out that this is not correct. Especially the sync_file info IOCTL needs to see even signaled fences to correctly report back their status to userspace. Instead add the filter in the merge function again where it makes sense. Signed-off-by: Christian König Tested-by: Karolina Drobnik Reviewed-by: Alex Deucher Link: https://patchwork.freedesktop.org/patch/msgid/20220712102849.1562-1-christian.koenig@amd.com --- include/linux/dma-fence-unwrap.h | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-fence-unwrap.h b/include/linux/dma-fence-unwrap.h index 390de1ee9d35..66b1e56fbb81 100644 --- a/include/linux/dma-fence-unwrap.h +++ b/include/linux/dma-fence-unwrap.h @@ -43,14 +43,10 @@ struct dma_fence *dma_fence_unwrap_next(struct dma_fence_unwrap *cursor); * Unwrap dma_fence_chain and dma_fence_array containers and deep dive into all * potential fences in them. If @head is just a normal fence only that one is * returned. - * - * Note that signalled fences are opportunistically filtered out, which - * means the iteration is potentially over no fence at all. */ #define dma_fence_unwrap_for_each(fence, cursor, head) \ for (fence = dma_fence_unwrap_first(head, cursor); fence; \ - fence = dma_fence_unwrap_next(cursor)) \ - if (!dma_fence_is_signaled(fence)) + fence = dma_fence_unwrap_next(cursor)) struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences, struct dma_fence **fences, -- cgit From 79f772b9e8004c542cc88aaf680189c3f6e9a0f2 Mon Sep 17 00:00:00 2001 From: Sean Christopherson Date: Tue, 14 Jun 2022 22:56:15 +0000 Subject: KVM: x86: Query vcpu->vcpu_idx directly and drop its accessor, again Read vcpu->vcpu_idx directly instead of bouncing through the one-line wrapper, kvm_vcpu_get_idx(), and drop the wrapper. The wrapper is a remnant of the original implementation and serves no purpose; remove it (again) before it gains more users. kvm_vcpu_get_idx() was removed in the not-too-distant past by commit 4eeef2424153 ("KVM: x86: Query vcpu->vcpu_idx directly and drop its accessor"), but was unintentionally re-introduced by commit a54d806688fe ("KVM: Keep memslots in tree-based structures instead of array-based ones"), likely due to a rebase goof. The wrapper then managed to gain users in KVM's Xen code. No functional change intended. Signed-off-by: Sean Christopherson Reviewed-by: Jim Mattson Link: https://lore.kernel.org/r/20220614225615.3843835-1-seanjc@google.com --- include/linux/kvm_host.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 3b40f8d68fbb..cb8168cc9755 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -907,11 +907,6 @@ static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id) return NULL; } -static inline int kvm_vcpu_get_idx(struct kvm_vcpu *vcpu) -{ - return vcpu->vcpu_idx; -} - void kvm_destroy_vcpus(struct kvm *kvm); void vcpu_load(struct kvm_vcpu *vcpu); -- cgit From 4201d9ab3e42d9e2a20320b751a931e6239c0df2 Mon Sep 17 00:00:00 2001 From: Roman Gushchin Date: Mon, 11 Jul 2022 09:28:27 -0700 Subject: bpf: reparent bpf maps on memcg offlining The memory consumed by a bpf map is always accounted to the memory cgroup of the process which created the map. The map can outlive the memory cgroup if it's used by processes in other cgroups or is pinned on bpffs. In this case the map pins the original cgroup in the dying state. For other types of objects (slab objects, non-slab kernel allocations, percpu objects and recently LRU pages) there is a reparenting process implemented: on cgroup offlining charged objects are getting reassigned to the parent cgroup. Because all charges and statistics are fully recursive it's a fairly cheap operation. For efficiency and consistency with other types of objects, let's do the same for bpf maps. Fortunately thanks to the objcg API, the required changes are minimal. Please, note that individual allocations (slabs, percpu and large kmallocs) already have the reparenting mechanism. This commit adds it to the saved map->memcg pointer by replacing it to map->objcg. Because dying cgroups are not visible for a user and all charges are recursive, this commit doesn't bring any behavior changes for a user. v2: added a missing const qualifier Signed-off-by: Roman Gushchin Reviewed-by: Shakeel Butt Link: https://lore.kernel.org/r/20220711162827.184743-1-roman.gushchin@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 2b21f2a3452f..85a4db3e0536 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -221,7 +221,7 @@ struct bpf_map { u32 btf_vmlinux_value_type_id; struct btf *btf; #ifdef CONFIG_MEMCG_KMEM - struct mem_cgroup *memcg; + struct obj_cgroup *objcg; #endif char name[BPF_OBJ_NAME_LEN]; struct bpf_map_off_arr *off_arr; -- cgit From 1d5f82d9dd477d5c66e0214a68c3e4f308eadd6d Mon Sep 17 00:00:00 2001 From: Song Liu Date: Tue, 5 Jul 2022 17:26:12 -0700 Subject: bpf, x86: fix freeing of not-finalized bpf_prog_pack syzbot reported a few issues with bpf_prog_pack [1], [2]. This only happens with multiple subprogs. In jit_subprogs(), we first call bpf_int_jit_compile() on each sub program. And then, we call it on each sub program again. jit_data is not freed in the first call of bpf_int_jit_compile(). Similarly we don't call bpf_jit_binary_pack_finalize() in the first call of bpf_int_jit_compile(). If bpf_int_jit_compile() failed for one sub program, we will call bpf_jit_binary_pack_finalize() for this sub program. However, we don't have a chance to call it for other sub programs. Then we will hit "goto out_free" in jit_subprogs(), and call bpf_jit_free on some subprograms that haven't got bpf_jit_binary_pack_finalize() yet. At this point, bpf_jit_binary_pack_free() is called and the whole 2MB page is freed erroneously. Fix this with a custom bpf_jit_free() for x86_64, which calls bpf_jit_binary_pack_finalize() if necessary. Also, with custom bpf_jit_free(), bpf_prog_aux->use_bpf_prog_pack is not needed any more, remove it. Fixes: 1022a5498f6f ("bpf, x86_64: Use bpf_jit_binary_pack_alloc") [1] https://syzkaller.appspot.com/bug?extid=2f649ec6d2eea1495a8f [2] https://syzkaller.appspot.com/bug?extid=87f65c75f4a72db05445 Reported-by: syzbot+2f649ec6d2eea1495a8f@syzkaller.appspotmail.com Reported-by: syzbot+87f65c75f4a72db05445@syzkaller.appspotmail.com Signed-off-by: Song Liu Link: https://lore.kernel.org/r/20220706002612.4013790-1-song@kernel.org Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 1 - include/linux/filter.h | 8 ++++++++ 2 files changed, 8 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 85a4db3e0536..a5bf00649995 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1044,7 +1044,6 @@ struct bpf_prog_aux { bool sleepable; bool tail_call_reachable; bool xdp_has_frags; - bool use_bpf_prog_pack; /* BTF_KIND_FUNC_PROTO for valid attach_btf_id */ const struct btf_type *attach_func_proto; /* function name for valid attach_btf_id */ diff --git a/include/linux/filter.h b/include/linux/filter.h index 4c1a8b247545..a5f21dc3c432 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1027,6 +1027,14 @@ u64 bpf_jit_alloc_exec_limit(void); void *bpf_jit_alloc_exec(unsigned long size); void bpf_jit_free_exec(void *addr); void bpf_jit_free(struct bpf_prog *fp); +struct bpf_binary_header * +bpf_jit_binary_pack_hdr(const struct bpf_prog *fp); + +static inline bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp) +{ + return list_empty(&fp->aux->ksym.lnode) || + fp->aux->ksym.lnode.prev == LIST_POISON2; +} struct bpf_binary_header * bpf_jit_binary_pack_alloc(unsigned int proglen, u8 **ro_image, -- cgit From 401e4963bf45c800e3e9ea0d3a0289d738005fd4 Mon Sep 17 00:00:00 2001 From: John Keeping Date: Fri, 8 Jul 2022 17:27:02 +0100 Subject: sched/core: Always flush pending blk_plug With CONFIG_PREEMPT_RT, it is possible to hit a deadlock between two normal priority tasks (SCHED_OTHER, nice level zero): INFO: task kworker/u8:0:8 blocked for more than 491 seconds. Not tainted 5.15.49-rt46 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u8:0 state:D stack: 0 pid: 8 ppid: 2 flags:0x00000000 Workqueue: writeback wb_workfn (flush-7:0) [] (__schedule) from [] (schedule+0xdc/0x134) [] (schedule) from [] (rt_mutex_slowlock_block.constprop.0+0xb8/0x174) [] (rt_mutex_slowlock_block.constprop.0) from [] +(rt_mutex_slowlock.constprop.0+0xac/0x174) [] (rt_mutex_slowlock.constprop.0) from [] (fat_write_inode+0x34/0x54) [] (fat_write_inode) from [] (__writeback_single_inode+0x354/0x3ec) [] (__writeback_single_inode) from [] (writeback_sb_inodes+0x250/0x45c) [] (writeback_sb_inodes) from [] (__writeback_inodes_wb+0x7c/0xb8) [] (__writeback_inodes_wb) from [] (wb_writeback+0x2c8/0x2e4) [] (wb_writeback) from [] (wb_workfn+0x1a4/0x3e4) [] (wb_workfn) from [] (process_one_work+0x1fc/0x32c) [] (process_one_work) from [] (worker_thread+0x22c/0x2d8) [] (worker_thread) from [] (kthread+0x16c/0x178) [] (kthread) from [] (ret_from_fork+0x14/0x38) Exception stack(0xc10e3fb0 to 0xc10e3ff8) 3fa0: 00000000 00000000 00000000 00000000 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000 INFO: task tar:2083 blocked for more than 491 seconds. Not tainted 5.15.49-rt46 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:tar state:D stack: 0 pid: 2083 ppid: 2082 flags:0x00000000 [] (__schedule) from [] (schedule+0xdc/0x134) [] (schedule) from [] (io_schedule+0x14/0x24) [] (io_schedule) from [] (bit_wait_io+0xc/0x30) [] (bit_wait_io) from [] (__wait_on_bit_lock+0x54/0xa8) [] (__wait_on_bit_lock) from [] (out_of_line_wait_on_bit_lock+0x84/0xb0) [] (out_of_line_wait_on_bit_lock) from [] (fat_mirror_bhs+0xa0/0x144) [] (fat_mirror_bhs) from [] (fat_alloc_clusters+0x138/0x2a4) [] (fat_alloc_clusters) from [] (fat_alloc_new_dir+0x34/0x250) [] (fat_alloc_new_dir) from [] (vfat_mkdir+0x58/0x148) [] (vfat_mkdir) from [] (vfs_mkdir+0x68/0x98) [] (vfs_mkdir) from [] (do_mkdirat+0xb0/0xec) [] (do_mkdirat) from [] (ret_fast_syscall+0x0/0x1c) Exception stack(0xc2e1bfa8 to 0xc2e1bff0) bfa0: 01ee42f0 01ee4208 01ee42f0 000041ed 00000000 00004000 bfc0: 01ee42f0 01ee4208 00000000 00000027 01ee4302 00000004 000dcb00 01ee4190 bfe0: 000dc368 bed11924 0006d4b0 b6ebddfc Here the kworker is waiting on msdos_sb_info::s_lock which is held by tar which is in turn waiting for a buffer which is locked waiting to be flushed, but this operation is plugged in the kworker. The lock is a normal struct mutex, so tsk_is_pi_blocked() will always return false on !RT and thus the behaviour changes for RT. It seems that the intent here is to skip blk_flush_plug() in the case where a non-preemptible lock (such as a spinlock) has been converted to a rtmutex on RT, which is the case covered by the SM_RTLOCK_WAIT schedule flag. But sched_submit_work() is only called from schedule() which is never called in this scenario, so the check can simply be deleted. Looking at the history of the -rt patchset, in fact this change was present from v5.9.1-rt20 until being dropped in v5.13-rt1 as it was part of a larger patch [1] most of which was replaced by commit b4bfa3fcfe3b ("sched/core: Rework the __schedule() preempt argument"). As described in [1]: The schedule process must distinguish between blocking on a regular sleeping lock (rwsem and mutex) and a RT-only sleeping lock (spinlock and rwlock): - rwsem and mutex must flush block requests (blk_schedule_flush_plug()) even if blocked on a lock. This can not deadlock because this also happens for non-RT. There should be a warning if the scheduling point is within a RCU read section. - spinlock and rwlock must not flush block requests. This will deadlock if the callback attempts to acquire a lock which is already acquired. Similarly to being preempted, there should be no warning if the scheduling point is within a RCU read section. and with the tsk_is_pi_blocked() in the scheduler path, we hit the first issue. [1] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/0022-locking-rtmutex-Use-custom-scheduling-function-for-s.patch?h=linux-5.10.y-rt-patches Signed-off-by: John Keeping Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Steven Rostedt (Google) Link: https://lkml.kernel.org/r/20220708162702.1758865-1-john@metanate.com --- include/linux/sched/rt.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched/rt.h b/include/linux/sched/rt.h index e5af028c08b4..994c25640e15 100644 --- a/include/linux/sched/rt.h +++ b/include/linux/sched/rt.h @@ -39,20 +39,12 @@ static inline struct task_struct *rt_mutex_get_top_task(struct task_struct *p) } extern void rt_mutex_setprio(struct task_struct *p, struct task_struct *pi_task); extern void rt_mutex_adjust_pi(struct task_struct *p); -static inline bool tsk_is_pi_blocked(struct task_struct *tsk) -{ - return tsk->pi_blocked_on != NULL; -} #else static inline struct task_struct *rt_mutex_get_top_task(struct task_struct *task) { return NULL; } # define rt_mutex_adjust_pi(p) do { } while (0) -static inline bool tsk_is_pi_blocked(struct task_struct *tsk) -{ - return false; -} #endif extern void normalize_rt_tasks(void); -- cgit From 3beb0ab5bffba625007ea5c9e5e0ee5eef05c1ea Mon Sep 17 00:00:00 2001 From: Seunghui Lee Date: Wed, 13 Jul 2022 12:36:34 +0900 Subject: mmc: core: Use mmc_card_* macro and add a new for the sd_combo type Add mmc_card_sd_combo() macro for sd combo type card and use the mmc_card_* macro to simplify code instead of comparing card->type. Signed-off-by: Seunghui Lee Link: https://lore.kernel.org/r/20220713033635.28432-2-sh043.lee@samsung.com Signed-off-by: Ulf Hansson --- include/linux/mmc/card.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h index 37f975875102..156a7b673a28 100644 --- a/include/linux/mmc/card.h +++ b/include/linux/mmc/card.h @@ -348,5 +348,6 @@ bool mmc_card_is_blockaddr(struct mmc_card *card); #define mmc_card_mmc(c) ((c)->type == MMC_TYPE_MMC) #define mmc_card_sd(c) ((c)->type == MMC_TYPE_SD) #define mmc_card_sdio(c) ((c)->type == MMC_TYPE_SDIO) +#define mmc_card_sd_combo(c) ((c)->type == MMC_TYPE_SD_COMBO) #endif /* LINUX_MMC_CARD_H */ -- cgit From 20347fca71a387a3751f7bb270062616ddc5317a Mon Sep 17 00:00:00 2001 From: Tianyu Lan Date: Fri, 8 Jul 2022 12:15:44 -0400 Subject: swiotlb: split up the global swiotlb lock Traditionally swiotlb was not performance critical because it was only used for slow devices. But in some setups, like TDX/SEV confidential guests, all IO has to go through swiotlb. Currently swiotlb only has a single lock. Under high IO load with multiple CPUs this can lead to significat lock contention on the swiotlb lock. This patch splits the swiotlb bounce buffer pool into individual areas which have their own lock. Each CPU tries to allocate in its own area first. Only if that fails does it search other areas. On freeing the allocation is freed into the area where the memory was originally allocated from. Area number can be set via swiotlb kernel parameter and is default to be possible cpu number. If possible cpu number is not power of 2, area number will be round up to the next power of 2. This idea from Andi Kleen patch(https://github.com/intel/tdx/commit/ 4529b5784c141782c72ec9bd9a92df2b68cb7d45). Based-on-idea-by: Andi Kleen Signed-off-by: Tianyu Lan Signed-off-by: Christoph Hellwig --- include/linux/swiotlb.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index bdc58a0e20b1..f65ff1930120 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -88,6 +88,8 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys, * @late_alloc: %true if allocated using the page allocator * @force_bounce: %true if swiotlb bouncing is forced * @for_alloc: %true if the pool is used for memory allocation + * @nareas: The area number in the pool. + * @area_nslabs: The slot number in the area. */ struct io_tlb_mem { phys_addr_t start; @@ -101,6 +103,9 @@ struct io_tlb_mem { bool late_alloc; bool force_bounce; bool for_alloc; + unsigned int nareas; + unsigned int area_nslabs; + struct io_tlb_area *areas; struct io_tlb_slot { phys_addr_t orig_addr; size_t alloc_size; -- cgit From 6d1c1a73e1126572de0a8b063fe62fe43786ed59 Mon Sep 17 00:00:00 2001 From: Bard Liao Date: Fri, 8 Jul 2022 14:13:11 +0800 Subject: soundwire: Intel: add trigger callback When a pipeline is split into FE and BE parts, the BE pipeline may need to be triggered separately in the BE trigger op. So add the trigger callback in the link_res ops that will be invoked during BE DAI trigger. Signed-off-by: Bard Liao Reviewed-by: Rander Wang Reviewed-by: Pierre-Louis Bossart Reviewed-by: Ranjani Sridharan Acked-by: Vinod Koul Link: https://lore.kernel.org/r/20220708061312.25878-2-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown --- include/linux/soundwire/sdw_intel.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index b5b489ea1aef..ec16ae49e6a4 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -121,6 +121,7 @@ struct sdw_intel_ops { struct sdw_intel_stream_params_data *params_data); int (*free_stream)(struct device *dev, struct sdw_intel_stream_free_data *free_data); + int (*trigger)(struct snd_soc_dai *dai, int cmd, int stream); }; /** -- cgit From af16df54b89dee72df253abc5e7b5e8a6d16c11c Mon Sep 17 00:00:00 2001 From: Coiby Xu Date: Wed, 13 Jul 2022 15:21:11 +0800 Subject: ima: force signature verification when CONFIG_KEXEC_SIG is configured Currently, an unsigned kernel could be kexec'ed when IMA arch specific policy is configured unless lockdown is enabled. Enforce kernel signature verification check in the kexec_file_load syscall when IMA arch specific policy is configured. Fixes: 99d5cadfde2b ("kexec_file: split KEXEC_VERIFY_SIG into KEXEC_SIG and KEXEC_SIG_FORCE") Reported-and-suggested-by: Mimi Zohar Signed-off-by: Coiby Xu Signed-off-by: Mimi Zohar --- include/linux/kexec.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kexec.h b/include/linux/kexec.h index ce6536f1d269..475683cd67f1 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -452,6 +452,12 @@ static inline int kexec_crash_loaded(void) { return 0; } #define kexec_in_progress false #endif /* CONFIG_KEXEC_CORE */ +#ifdef CONFIG_KEXEC_SIG +void set_kexec_sig_enforced(void); +#else +static inline void set_kexec_sig_enforced(void) {} +#endif + #endif /* !defined(__ASSEBMLY__) */ #endif /* LINUX_KEXEC_H */ -- cgit From 0372c546eca575445331c0ad8902210b70be6d61 Mon Sep 17 00:00:00 2001 From: Yishai Hadas Date: Thu, 2 Jun 2022 12:41:00 +0300 Subject: net/mlx5: Introduce ifc bits for using software vhca id Introduce ifc related stuff to enable using software vhca id functionality. Signed-off-by: Yishai Hadas Reviewed-by: Mark Bloch Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 8e87eb47f9dc..254cc22f5eec 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1826,7 +1826,14 @@ struct mlx5_ifc_cmd_hca_cap_2_bits { u8 max_reformat_remove_size[0x8]; u8 max_reformat_remove_offset[0x8]; - u8 reserved_at_c0[0x740]; + u8 reserved_at_c0[0x160]; + + u8 reserved_at_220[0x1]; + u8 sw_vhca_id_valid[0x1]; + u8 sw_vhca_id[0xe]; + u8 reserved_at_230[0x10]; + + u8 reserved_at_240[0x5c0]; }; enum mlx5_ifc_flow_destination_type { @@ -3782,6 +3789,11 @@ struct mlx5_ifc_rmpc_bits { struct mlx5_ifc_wq_bits wq; }; +enum { + VHCA_ID_TYPE_HW = 0, + VHCA_ID_TYPE_SW = 1, +}; + struct mlx5_ifc_nic_vport_context_bits { u8 reserved_at_0[0x5]; u8 min_wqe_inline_mode[0x3]; @@ -3798,8 +3810,8 @@ struct mlx5_ifc_nic_vport_context_bits { u8 event_on_mc_address_change[0x1]; u8 event_on_uc_address_change[0x1]; - u8 reserved_at_40[0xc]; - + u8 vhca_id_type[0x1]; + u8 reserved_at_41[0xb]; u8 affiliation_criteria[0x4]; u8 affiliated_vhca_id[0x10]; @@ -7259,7 +7271,12 @@ struct mlx5_ifc_init_hca_in_bits { u8 reserved_at_20[0x10]; u8 op_mod[0x10]; - u8 reserved_at_40[0x40]; + u8 reserved_at_40[0x20]; + + u8 reserved_at_60[0x2]; + u8 sw_vhca_id[0xe]; + u8 reserved_at_70[0x10]; + u8 sw_owner_id[4][0x20]; }; -- cgit From dc402ccc0d7b55922a79505df3000da7deb77a2b Mon Sep 17 00:00:00 2001 From: Yishai Hadas Date: Thu, 2 Jun 2022 12:47:34 +0300 Subject: net/mlx5: Use software VHCA id when it's supported Use software VHCA id when it's supported by the firmware. A unique id is allocated upon mlx5_mdev_init() and freed upon mlx5_mdev_uninit(), as such it stays the same during the full life cycle of the device including upon health recovery if occurred. The conjunction of sw_vhca_id with sw_owner_id will be a global unique id per function which uses mlx5_core. The sw_vhca_id is set upon init_hca command and is used to specify the VHCA that the NIC vport is affiliated with. This functionality is needed upon migration of VM which is MPV based. (i.e. multi port device). Signed-off-by: Yishai Hadas Reviewed-by: Mark Bloch Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index bd882884b23c..ecda6e63d5f2 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -610,6 +610,7 @@ struct mlx5_priv { spinlock_t ctx_lock; struct mlx5_adev **adev; int adev_idx; + int sw_vhca_id; struct mlx5_events *events; struct mlx5_flow_steering *steering; -- cgit From fac47b43c760ea90e64b895dba60df0327be7775 Mon Sep 17 00:00:00 2001 From: Xiubo Li Date: Mon, 11 Jul 2022 12:11:21 +0800 Subject: netfs: do not unlock and put the folio twice check_write_begin() will unlock and put the folio when return non-zero. So we should avoid unlocking and putting it twice in netfs layer. Change the way ->check_write_begin() works in the following two ways: (1) Pass it a pointer to the folio pointer, allowing it to unlock and put the folio prior to doing the stuff it wants to do, provided it clears the folio pointer. (2) Change the return values such that 0 with folio pointer set means continue, 0 with folio pointer cleared means re-get and all error codes indicating an error (no special treatment for -EAGAIN). [ bagasdotme: use Sphinx code text syntax for *foliop pointer ] Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/56423 Link: https://lore.kernel.org/r/cf169f43-8ee7-8697-25da-0204d1b4343e@redhat.com Co-developed-by: David Howells Signed-off-by: Xiubo Li Signed-off-by: David Howells Signed-off-by: Bagas Sanjaya Signed-off-by: Ilya Dryomov --- include/linux/netfs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/netfs.h b/include/linux/netfs.h index 1773e5df8e65..1b18dfa52e48 100644 --- a/include/linux/netfs.h +++ b/include/linux/netfs.h @@ -214,7 +214,7 @@ struct netfs_request_ops { void (*issue_read)(struct netfs_io_subrequest *subreq); bool (*is_still_valid)(struct netfs_io_request *rreq); int (*check_write_begin)(struct file *file, loff_t pos, unsigned len, - struct folio *folio, void **_fsdata); + struct folio **foliop, void **_fsdata); void (*done)(struct netfs_io_request *rreq); }; -- cgit From 9745fb07474f4501eff62130a78a42a8b8c18b05 Mon Sep 17 00:00:00 2001 From: Jonathan Yong Date: Mon, 6 Jun 2022 19:41:27 +0300 Subject: platform/x86/intel: Add Primary to Sideband (P2SB) bridge support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit SoC features such as GPIO are accessed via a reserved MMIO area, we don't know its address but can obtain it from the BAR of the P2SB device, that device is normally hidden so we have to temporarily unhide it, read address and hide it back. There are already a few users and at least one more is coming which require an access to Primary to Sideband (P2SB) bridge in order to get IO or MMIO BAR hidden by BIOS. Create a library to access P2SB for x86 devices in a unified way. Background information ====================== Note, the term "bridge" is used in the documentation and it has nothing to do with a PCI (host) bridge as per the PCI specifications. The P2SB is an interesting device by its nature and hardware design. First of all, it has several devices in the hardware behind it. These devices may or may not be represented as ACPI devices by a firmware. It also has a hardwired (to 0s) the least significant bits of the base address register which is represented by the only 64-bit BAR0. It means that OS mustn't reallocate the BAR. On top of that in some cases P2SB is represented by function 0 on PCI slot (in terms of B:D.F) and according to the PCI specification any other function can't be seen until function 0 is present and visible. In the PCI configuration space of P2SB device the full 32-bit register is allocated for the only purpose of hiding the entire P2SB device. As per [3]: 3.1.39 P2SB Control (P2SBC)—Offset E0h Hide Device (HIDE): When this bit is set, the P2SB will return 1s on any PCI Configuration Read on IOSF-P. All other transactions including PCI Configuration Writes on IOSF-P are unaffected by this. This does not affect reads performed on the IOSF-SB interface. This doesn't prevent MMIO accesses, although preventing the OS from assigning these addresses. The firmware on the affected platforms marks the region as unusable (by cutting it off from the PCI host bridge resources) as depicted in the Apollo Lake example below: PCI host bridge to bus 0000:00 pci_bus 0000:00: root bus resource [io 0x0070-0x0077] pci_bus 0000:00: root bus resource [io 0x0000-0x006f window] pci_bus 0000:00: root bus resource [io 0x0078-0x0cf7 window] pci_bus 0000:00: root bus resource [io 0x0d00-0xffff window] pci_bus 0000:00: root bus resource [mem 0x7c000001-0x7fffffff window] pci_bus 0000:00: root bus resource [mem 0x7b800001-0x7bffffff window] pci_bus 0000:00: root bus resource [mem 0x80000000-0xcfffffff window] pci_bus 0000:00: root bus resource [mem 0xe0000000-0xefffffff window] pci_bus 0000:00: root bus resource [bus 00-ff] The P2SB 16MB BAR is located at 0xd0000000-0xd0ffffff memory window. The generic solution ==================== The generic solution for all cases when we need to access to the information behind P2SB device is a library code where users ask for necessary resources by demand and hence those users take care of not being run on the systems where this access is not required. The library provides the p2sb_bar() API to retrieve the MMIO of the BAR0 of the device from P2SB device slot. P2SB unconditional unhiding awareness ===================================== Technically it's possible to unhide the P2SB device and devices on the same PCI slot and access them at any time as needed. But there are several potential issues with that: - the systems were never tested against such configuration and hence nobody knows what kind of bugs it may bring, especially when we talk about SPI NOR case which contains Intel FirmWare Image (IFWI) code (including BIOS) and already known to be problematic in the past for end users - the PCI by its nature is a hotpluggable bus and in case somebody attaches a driver to the functions of a P2SB slot device(s) the end user experience and system behaviour can be unpredictable - the kernel code would need some ugly hacks (or code looking as an ugly hack) under arch/x86/pci in order to enable these devices on only selected platforms (which may include CPU ID table followed by a potentially growing number of DMI strings The future improvements ======================= The future improvements with this code may go in order to gain some kind of cache, if it's possible at all, to prevent unhiding and hiding many times to take static information that may be saved once per boot. Links ===== [1]: https://lab.whitequark.org/notes/2017-11-08/accessing-intel-ich-pch-gpios/ [2]: https://cdrdv2.intel.com/v1/dl/getContent/332690?wapkw=332690 [3]: https://cdrdv2.intel.com/v1/dl/getContent/332691?wapkw=332691 [4]: https://medium.com/@jacksonchen_43335/bios-gpio-p2sb-70e9b829b403 Signed-off-by: Jonathan Yong Co-developed-by: Andy Shevchenko Signed-off-by: Andy Shevchenko Tested-by: Henning Schild Acked-by: Hans de Goede Acked-by: Linus Walleij Signed-off-by: Lee Jones --- include/linux/platform_data/x86/p2sb.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) create mode 100644 include/linux/platform_data/x86/p2sb.h (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/p2sb.h b/include/linux/platform_data/x86/p2sb.h new file mode 100644 index 000000000000..a1d5fddc8f13 --- /dev/null +++ b/include/linux/platform_data/x86/p2sb.h @@ -0,0 +1,28 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Primary to Sideband (P2SB) bridge access support + */ + +#ifndef _PLATFORM_DATA_X86_P2SB_H +#define _PLATFORM_DATA_X86_P2SB_H + +#include +#include + +struct pci_bus; +struct resource; + +#if IS_BUILTIN(CONFIG_P2SB) + +int p2sb_bar(struct pci_bus *bus, unsigned int devfn, struct resource *mem); + +#else /* CONFIG_P2SB */ + +static inline int p2sb_bar(struct pci_bus *bus, unsigned int devfn, struct resource *mem) +{ + return -ENODEV; +} + +#endif /* CONFIG_P2SB is not set */ + +#endif /* _PLATFORM_DATA_X86_P2SB_H */ -- cgit From 446f0cf9e08b483d7dc6f61eee0ee846b22f6386 Mon Sep 17 00:00:00 2001 From: Henning Schild Date: Mon, 6 Jun 2022 19:41:37 +0300 Subject: platform/x86: simatic-ipc: drop custom P2SB bar code The two drivers that used to use this have been switched over to the common P2SB accessor, so this code is not needed any longer. Signed-off-by: Henning Schild Signed-off-by: Andy Shevchenko Reviewed-by: Hans de Goede Signed-off-by: Lee Jones --- include/linux/platform_data/x86/simatic-ipc-base.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/simatic-ipc-base.h b/include/linux/platform_data/x86/simatic-ipc-base.h index 62d2bc774067..39fefd48cf4d 100644 --- a/include/linux/platform_data/x86/simatic-ipc-base.h +++ b/include/linux/platform_data/x86/simatic-ipc-base.h @@ -24,6 +24,4 @@ struct simatic_ipc_platform { u8 devmode; }; -u32 simatic_ipc_get_membase0(unsigned int p2sb); - #endif /* __PLATFORM_DATA_X86_SIMATIC_IPC_BASE_H */ -- cgit From 1b870fa5573e260bc74d19f381ab0dd971a8d8e7 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Thu, 14 Jul 2022 07:27:31 -0400 Subject: kvm: stats: tell userspace which values are boolean Some of the statistics values exported by KVM are always only 0 or 1. It can be useful to export this fact to userspace so that it can track them specially (for example by polling the value every now and then to compute a % of time spent in a specific state). Therefore, add "boolean value" as a new "unit". While it is not exactly a unit, it walks and quacks like one. In particular, using the type would be wrong because boolean values could be instantaneous or peak values (e.g. "is the rmap allocated?") or even two-bucket histograms (e.g. "number of posted vs. non-posted interrupt injections"). Suggested-by: Amneesh Singh Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 83cf7fd842e0..90a45ef7203b 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1822,6 +1822,15 @@ struct _kvm_stats_desc { STATS_DESC_PEAK(SCOPE, name, KVM_STATS_UNIT_NONE, \ KVM_STATS_BASE_POW10, 0) +/* Instantaneous boolean value, read only */ +#define STATS_DESC_IBOOLEAN(SCOPE, name) \ + STATS_DESC_INSTANT(SCOPE, name, KVM_STATS_UNIT_BOOLEAN, \ + KVM_STATS_BASE_POW10, 0) +/* Peak (sticky) boolean value, read/write */ +#define STATS_DESC_PBOOLEAN(SCOPE, name) \ + STATS_DESC_PEAK(SCOPE, name, KVM_STATS_UNIT_BOOLEAN, \ + KVM_STATS_BASE_POW10, 0) + /* Cumulative time in nanosecond */ #define STATS_DESC_TIME_NSEC(SCOPE, name) \ STATS_DESC_CUMULATIVE(SCOPE, name, KVM_STATS_UNIT_SECONDS, \ @@ -1853,7 +1862,7 @@ struct _kvm_stats_desc { HALT_POLL_HIST_COUNT), \ STATS_DESC_LOGHIST_TIME_NSEC(VCPU_GENERIC, halt_wait_hist, \ HALT_POLL_HIST_COUNT), \ - STATS_DESC_ICOUNTER(VCPU_GENERIC, blocking) + STATS_DESC_IBOOLEAN(VCPU_GENERIC, blocking) extern struct dentry *kvm_debugfs_dir; -- cgit From 4fa05a67b558d2cb3acd2bb299b91220d405ca5e Mon Sep 17 00:00:00 2001 From: Christian König Date: Mon, 11 Jul 2022 16:48:01 +0200 Subject: dma-buf: revert "return only unsignaled fences in dma_fence_unwrap_for_each v3" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This reverts commit 8f61973718485f3e89bc4f408f929048b7b47c83. It turned out that this is not correct. Especially the sync_file info IOCTL needs to see even signaled fences to correctly report back their status to userspace. Instead add the filter in the merge function again where it makes sense. Signed-off-by: Christian König Tested-by: Karolina Drobnik Reviewed-by: Alex Deucher Link: https://patchwork.freedesktop.org/patch/msgid/20220712102849.1562-1-christian.koenig@amd.com --- include/linux/dma-fence-unwrap.h | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-fence-unwrap.h b/include/linux/dma-fence-unwrap.h index 390de1ee9d35..66b1e56fbb81 100644 --- a/include/linux/dma-fence-unwrap.h +++ b/include/linux/dma-fence-unwrap.h @@ -43,14 +43,10 @@ struct dma_fence *dma_fence_unwrap_next(struct dma_fence_unwrap *cursor); * Unwrap dma_fence_chain and dma_fence_array containers and deep dive into all * potential fences in them. If @head is just a normal fence only that one is * returned. - * - * Note that signalled fences are opportunistically filtered out, which - * means the iteration is potentially over no fence at all. */ #define dma_fence_unwrap_for_each(fence, cursor, head) \ for (fence = dma_fence_unwrap_first(head, cursor); fence; \ - fence = dma_fence_unwrap_next(cursor)) \ - if (!dma_fence_is_signaled(fence)) + fence = dma_fence_unwrap_next(cursor)) struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences, struct dma_fence **fences, -- cgit From ddaf8d96f93bccb3f2b1f4f156c098b272440004 Mon Sep 17 00:00:00 2001 From: Prashant Malani Date: Mon, 11 Jul 2022 07:22:55 +0000 Subject: usb: typec: Add support for retimers Introduce a retimer device class and associated functions that register and use retimer "switch" devices. These operate in a manner similar to the "mode-switch" and help configure retimers that exist between the Type-C connector and host controller(s). Type C ports can be linked to retimers using firmware node device references (again, in a manner similar to "mode-switch"). There are no new sysfs files being created; there is the new retimer class directory, but there are no class-specific files being created there. Signed-off-by: Prashant Malani Acked-by: Heikki Krogerus Link: https://lore.kernel.org/r/20220711072333.2064341-2-pmalani@chromium.org Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/typec_retimer.h | 45 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) create mode 100644 include/linux/usb/typec_retimer.h (limited to 'include/linux') diff --git a/include/linux/usb/typec_retimer.h b/include/linux/usb/typec_retimer.h new file mode 100644 index 000000000000..5e036b3360e2 --- /dev/null +++ b/include/linux/usb/typec_retimer.h @@ -0,0 +1,45 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __USB_TYPEC_RETIMER +#define __USB_TYPEC_RETIMER + +#include +#include + +struct device; +struct typec_retimer; +struct typec_altmode; +struct fwnode_handle; + +struct typec_retimer_state { + struct typec_altmode *alt; + unsigned long mode; + void *data; +}; + +typedef int (*typec_retimer_set_fn_t)(struct typec_retimer *retimer, + struct typec_retimer_state *state); + +struct typec_retimer_desc { + struct fwnode_handle *fwnode; + typec_retimer_set_fn_t set; + const char *name; + void *drvdata; +}; + +struct typec_retimer *fwnode_typec_retimer_get(struct fwnode_handle *fwnode); +void typec_retimer_put(struct typec_retimer *retimer); +int typec_retimer_set(struct typec_retimer *retimer, struct typec_retimer_state *state); + +static inline struct typec_retimer *typec_retimer_get(struct device *dev) +{ + return fwnode_typec_retimer_get(dev_fwnode(dev)); +} + +struct typec_retimer * +typec_retimer_register(struct device *parent, const struct typec_retimer_desc *desc); +void typec_retimer_unregister(struct typec_retimer *retimer); + +void *typec_retimer_get_drvdata(struct typec_retimer *retimer); + +#endif /* __USB_TYPEC_RETIMER */ -- cgit From e6281c26674e037798bf674f26a7593a324cdf39 Mon Sep 17 00:00:00 2001 From: Ang Tien Sung Date: Mon, 11 Jul 2022 17:31:35 -0500 Subject: firmware: stratix10-svc: Add support for FCS Extend Intel service layer driver to support FPGA Crypto service(FCS) features on Intel Soc platforms. Adding an additional channel and FCS platform driver ("intel_fcs") as part of the probe method. FCS driver uses the driver to send crypto operations' commands to the secure device manager(SDM) on Intel Soc platforms Stratix10 and Agilex. Signed-off-by: Ang Tien Sung Signed-off-by: Dinh Nguyen Link: https://lore.kernel.org/r/20220711223140.2307945-1-dinguyen@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/firmware/intel/stratix10-svc-client.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/firmware/intel/stratix10-svc-client.h b/include/linux/firmware/intel/stratix10-svc-client.h index 18c1841fdb1f..24121f8e0d7c 100644 --- a/include/linux/firmware/intel/stratix10-svc-client.h +++ b/include/linux/firmware/intel/stratix10-svc-client.h @@ -14,6 +14,7 @@ */ #define SVC_CLIENT_FPGA "fpga" #define SVC_CLIENT_RSU "rsu" +#define SVC_CLIENT_FCS "fcs" /* * Status of the sent command, in bit number -- cgit From 79b936254aa0eb4e3bc73fecaacf049145613c0a Mon Sep 17 00:00:00 2001 From: Ang Tien Sung Date: Mon, 11 Jul 2022 17:31:36 -0500 Subject: firmware: stratix10-svc: add FCS polling command Introduce a new SMC command INTEL_SIP_SMC_FUNCID_SERVICE_COMPLETED that polls if a previous asynchronous command was completed. This SMC command is used by the new FPGA Crypto Service (FCS). A basic example is that the FCS sends an AES data encryption call to the secure device manager(SDM) and waits for the completion of the operation by continuously polling the results with the new command. Signed-off-by: Ang Tien Sung Signed-off-by: Dinh Nguyen Link: https://lore.kernel.org/r/20220711223140.2307945-2-dinguyen@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/firmware/intel/stratix10-smc.h | 25 ++++++++++++++++++++++ .../linux/firmware/intel/stratix10-svc-client.h | 5 +++++ 2 files changed, 30 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firmware/intel/stratix10-smc.h b/include/linux/firmware/intel/stratix10-smc.h index aad497a9ad8b..0de104009566 100644 --- a/include/linux/firmware/intel/stratix10-smc.h +++ b/include/linux/firmware/intel/stratix10-smc.h @@ -403,6 +403,31 @@ INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_FPGA_CONFIG_COMPLETED_WRITE) #define INTEL_SIP_SMC_RSU_MAX_RETRY \ INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_RSU_MAX_RETRY) +/** + * Request INTEL_SIP_SMC_SERVICE_COMPLETED + * Sync call to check if the secure world have completed service request + * or not. + * + * Call register usage: + * a0: INTEL_SIP_SMC_SERVICE_COMPLETED + * a1: this register is optional. If used, it is the physical address for + * secure firmware to put output data + * a2: this register is optional. If used, it is the size of output data + * a3-a7: not used + * + * Return status: + * a0: INTEL_SIP_SMC_STATUS_OK, INTEL_SIP_SMC_STATUS_ERROR, + * INTEL_SIP_SMC_REJECTED or INTEL_SIP_SMC_STATUS_BUSY + * a1: mailbox error if a0 is INTEL_SIP_SMC_STATUS_ERROR + * a2: physical address containing the process info + * for FCS certificate -- the data contains the certificate status + * for FCS cryption -- the data contains the actual data size FW processes + * a3: output data size + */ +#define INTEL_SIP_SMC_FUNCID_SERVICE_COMPLETED 30 +#define INTEL_SIP_SMC_SERVICE_COMPLETED \ + INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_SERVICE_COMPLETED) + /** * Request INTEL_SIP_SMC_FIRMWARE_VERSION * diff --git a/include/linux/firmware/intel/stratix10-svc-client.h b/include/linux/firmware/intel/stratix10-svc-client.h index 24121f8e0d7c..5d0e814e0c41 100644 --- a/include/linux/firmware/intel/stratix10-svc-client.h +++ b/include/linux/firmware/intel/stratix10-svc-client.h @@ -106,6 +106,9 @@ struct stratix10_svc_chan; * @COMMAND_RSU_DCMF_VERSION: query firmware for the DCMF version, return status * is SVC_STATUS_OK or SVC_STATUS_ERROR * + * @COMMAND_POLL_SERVICE_STATUS: poll if the service request is complete, + * return statis is SVC_STATUS_OK, SVC_STATUS_ERROR or SVC_STATUS_BUSY + * * @COMMAND_FIRMWARE_VERSION: query running firmware version, return status * is SVC_STATUS_OK or SVC_STATUS_ERROR */ @@ -122,6 +125,8 @@ enum stratix10_svc_command_code { COMMAND_RSU_MAX_RETRY, COMMAND_RSU_DCMF_VERSION, COMMAND_FIRMWARE_VERSION, + /* for general status poll */ + COMMAND_POLL_SERVICE_STATUS = 40, }; /** -- cgit From 4a4709d470e624e65a879b5c430dee5e27e9ac83 Mon Sep 17 00:00:00 2001 From: Ang Tien Sung Date: Mon, 11 Jul 2022 17:31:37 -0500 Subject: firmware: stratix10-svc: add new FCS commands Extending the fpga svc driver to support 6 new FPGA Crypto Service(FCS) commands. We are adding FCS SDOS data encryption and decryption, random number generator, image validation request, reading the data provision and certificate validation. Signed-off-by: Ang Tien Sung Signed-off-by: Dinh Nguyen Link: https://lore.kernel.org/r/20220711223140.2307945-3-dinguyen@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/firmware/intel/stratix10-smc.h | 111 +++++++++++++++++++++ .../linux/firmware/intel/stratix10-svc-client.h | 41 +++++++- 2 files changed, 148 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/firmware/intel/stratix10-smc.h b/include/linux/firmware/intel/stratix10-smc.h index 0de104009566..8c198984a47c 100644 --- a/include/linux/firmware/intel/stratix10-smc.h +++ b/include/linux/firmware/intel/stratix10-smc.h @@ -445,4 +445,115 @@ INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_FPGA_CONFIG_COMPLETED_WRITE) #define INTEL_SIP_SMC_FIRMWARE_VERSION \ INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_FIRMWARE_VERSION) +/** + * SMC call protocol for FPGA Crypto Service (FCS) + * FUNCID starts from 90 + */ + +/** + * Request INTEL_SIP_SMC_FCS_RANDOM_NUMBER + * + * Sync call used to query the random number generated by the firmware + * + * Call register usage: + * a0 INTEL_SIP_SMC_FCS_RANDOM_NUMBER + * a1 the physical address for firmware to write generated random data + * a2-a7 not used + * + * Return status: + * a0 INTEL_SIP_SMC_STATUS_OK, INTEL_SIP_SMC_FCS_ERROR or + * INTEL_SIP_SMC_FCS_REJECTED + * a1 mailbox error + * a2 the physical address of generated random number + * a3 size + */ +#define INTEL_SIP_SMC_FUNCID_FCS_RANDOM_NUMBER 90 +#define INTEL_SIP_SMC_FCS_RANDOM_NUMBER \ + INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_FCS_RANDOM_NUMBER) + +/** + * Request INTEL_SIP_SMC_FCS_CRYPTION + * Async call for data encryption and HMAC signature generation, or for + * data decryption and HMAC verification. + * + * Call INTEL_SIP_SMC_SERVICE_COMPLETED to get the output encrypted or + * decrypted data + * + * Call register usage: + * a0 INTEL_SIP_SMC_FCS_CRYPTION + * a1 cryption mode (1 for encryption and 0 for decryption) + * a2 physical address which stores to be encrypted or decrypted data + * a3 input data size + * a4 physical address which will hold the encrypted or decrypted output data + * a5 output data size + * a6-a7 not used + * + * Return status: + * a0 INTEL_SIP_SMC_STATUS_OK, INTEL_SIP_SMC_STATUS_ERROR or + * INTEL_SIP_SMC_STATUS_REJECTED + * a1-3 not used + */ +#define INTEL_SIP_SMC_FUNCID_FCS_CRYPTION 91 +#define INTEL_SIP_SMC_FCS_CRYPTION \ + INTEL_SIP_SMC_STD_CALL_VAL(INTEL_SIP_SMC_FUNCID_FCS_CRYPTION) + +/** + * Request INTEL_SIP_SMC_FCS_SERVICE_REQUEST + * Async call for authentication service of HPS software + * + * Call register usage: + * a0 INTEL_SIP_SMC_FCS_SERVICE_REQUEST + * a1 the physical address of data block + * a2 size of data block + * a3-a7 not used + * + * Return status: + * a0 INTEL_SIP_SMC_STATUS_OK, INTEL_SIP_SMC_ERROR or + * INTEL_SIP_SMC_REJECTED + * a1-a3 not used + */ +#define INTEL_SIP_SMC_FUNCID_FCS_SERVICE_REQUEST 92 +#define INTEL_SIP_SMC_FCS_SERVICE_REQUEST \ + INTEL_SIP_SMC_STD_CALL_VAL(INTEL_SIP_SMC_FUNCID_FCS_SERVICE_REQUEST) + +/** + * Request INTEL_SIP_SMC_FUNCID_FCS_SEND_CERTIFICATE + * Sync call to send a signed certificate + * + * Call register usage: + * a0 INTEL_SIP_SMC_FCS_SEND_CERTIFICATE + * a1 the physical address of CERTIFICATE block + * a2 size of data block + * a3-a7 not used + * + * Return status: + * a0 INTEL_SIP_SMC_STATUS_OK or INTEL_SIP_SMC_FCS_REJECTED + * a1-a3 not used + */ +#define INTEL_SIP_SMC_FUNCID_FCS_SEND_CERTIFICATE 93 +#define INTEL_SIP_SMC_FCS_SEND_CERTIFICATE \ + INTEL_SIP_SMC_STD_CALL_VAL(INTEL_SIP_SMC_FUNCID_FCS_SEND_CERTIFICATE) + +/** + * Request INTEL_SIP_SMC_FCS_GET_PROVISION_DATA + * Sync call to dump all the fuses and key hashes + * + * Call register usage: + * a0 INTEL_SIP_SMC_FCS_GET_PROVISION_DATA + * a1 the physical address for firmware to write structure of fuse and + * key hashes + * a2-a7 not used + * + * Return status: + * a0 INTEL_SIP_SMC_STATUS_OK, INTEL_SIP_SMC_FCS_ERROR or + * INTEL_SIP_SMC_FCS_REJECTED + * a1 mailbox error + * a2 physical address for the structure of fuse and key hashes + * a3 the size of structure + * + */ +#define INTEL_SIP_SMC_FUNCID_FCS_GET_PROVISION_DATA 94 +#define INTEL_SIP_SMC_FCS_GET_PROVISION_DATA \ + INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_FCS_GET_PROVISION_DATA) + #endif diff --git a/include/linux/firmware/intel/stratix10-svc-client.h b/include/linux/firmware/intel/stratix10-svc-client.h index 5d0e814e0c41..677720792259 100644 --- a/include/linux/firmware/intel/stratix10-svc-client.h +++ b/include/linux/firmware/intel/stratix10-svc-client.h @@ -50,8 +50,8 @@ #define SVC_STATUS_BUSY 4 #define SVC_STATUS_ERROR 5 #define SVC_STATUS_NO_SUPPORT 6 - -/* +#define SVC_STATUS_INVALID_PARAM 7 +/** * Flag bit for COMMAND_RECONFIG * * COMMAND_RECONFIG_FLAG_PARTIAL: @@ -67,6 +67,8 @@ #define SVC_RECONFIG_REQUEST_TIMEOUT_MS 300 #define SVC_RECONFIG_BUFFER_TIMEOUT_MS 720 #define SVC_RSU_REQUEST_TIMEOUT_MS 300 +#define SVC_FCS_REQUEST_TIMEOUT_MS 2000 +#define SVC_COMPLETED_TIMEOUT_MS 30000 struct stratix10_svc_chan; @@ -111,20 +113,47 @@ struct stratix10_svc_chan; * * @COMMAND_FIRMWARE_VERSION: query running firmware version, return status * is SVC_STATUS_OK or SVC_STATUS_ERROR + * + * @COMMAND_FCS_REQUEST_SERVICE: request validation of image from firmware, + * return status is SVC_STATUS_OK, SVC_STATUS_INVALID_PARAM + * + * @COMMAND_FCS_SEND_CERTIFICATE: send a certificate, return status is + * SVC_STATUS_OK, SVC_STATUS_INVALID_PARAM, SVC_STATUS_ERROR + * + * @COMMAND_FCS_GET_PROVISION_DATA: read the provisioning data, return status is + * SVC_STATUS_OK, SVC_STATUS_INVALID_PARAM, SVC_STATUS_ERROR + * + * @COMMAND_FCS_DATA_ENCRYPTION: encrypt the data, return status is + * SVC_STATUS_OK, SVC_STATUS_INVALID_PARAM, SVC_STATUS_ERROR + * + * @COMMAND_FCS_DATA_DECRYPTION: decrypt the data, return status is + * SVC_STATUS_OK, SVC_STATUS_INVALID_PARAM, SVC_STATUS_ERROR + * + * @COMMAND_FCS_RANDOM_NUMBER_GEN: generate a random number, return status + * is SVC_STATUS_OK, SVC_STATUS_ERROR */ enum stratix10_svc_command_code { + /* for FPGA */ COMMAND_NOOP = 0, COMMAND_RECONFIG, COMMAND_RECONFIG_DATA_SUBMIT, COMMAND_RECONFIG_DATA_CLAIM, COMMAND_RECONFIG_STATUS, - COMMAND_RSU_STATUS, + /* for RSU */ + COMMAND_RSU_STATUS = 10, COMMAND_RSU_UPDATE, COMMAND_RSU_NOTIFY, COMMAND_RSU_RETRY, COMMAND_RSU_MAX_RETRY, COMMAND_RSU_DCMF_VERSION, COMMAND_FIRMWARE_VERSION, + /* for FCS */ + COMMAND_FCS_REQUEST_SERVICE = 20, + COMMAND_FCS_SEND_CERTIFICATE, + COMMAND_FCS_GET_PROVISION_DATA, + COMMAND_FCS_DATA_ENCRYPTION, + COMMAND_FCS_DATA_DECRYPTION, + COMMAND_FCS_RANDOM_NUMBER_GEN, /* for general status poll */ COMMAND_POLL_SERVICE_STATUS = 40, }; @@ -132,13 +161,17 @@ enum stratix10_svc_command_code { /** * struct stratix10_svc_client_msg - message sent by client to service * @payload: starting address of data need be processed - * @payload_length: data size in bytes + * @payload_length: to be processed data size in bytes + * @payload_output: starting address of processed data + * @payload_length_output: processed data size in bytes * @command: service command * @arg: args to be passed via registers and not physically mapped buffers */ struct stratix10_svc_client_msg { void *payload; size_t payload_length; + void *payload_output; + size_t payload_length_output; enum stratix10_svc_command_code command; u64 arg[3]; }; -- cgit From 1b4394c5d731593063f53df9d72467335c3b0367 Mon Sep 17 00:00:00 2001 From: Kah Jing Lee Date: Mon, 11 Jul 2022 17:31:39 -0500 Subject: firmware: stratix10-svc: extend svc to support RSU feature Extend Intel Stratix10 service layer driver to support new RSU DCMF status reporting. The status of each DCMF is reported. The currently used DCMF is used as reference, while the other three are compared against it to determine if they are corrupted. DCMF = Decision Configuration Management Firmware RSU = Remote System Update Signed-off-by: Radu Bacrau Signed-off-by: Ang Tien Sung Signed-off-by: Kah Jing Lee Signed-off-by: Dinh Nguyen Link: https://lore.kernel.org/r/20220711223140.2307945-5-dinguyen@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/firmware/intel/stratix10-smc.h | 21 +++++++++++++++++++++ include/linux/firmware/intel/stratix10-svc-client.h | 4 ++++ 2 files changed, 25 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firmware/intel/stratix10-smc.h b/include/linux/firmware/intel/stratix10-smc.h index 8c198984a47c..8fc4aa450517 100644 --- a/include/linux/firmware/intel/stratix10-smc.h +++ b/include/linux/firmware/intel/stratix10-smc.h @@ -403,6 +403,27 @@ INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_FPGA_CONFIG_COMPLETED_WRITE) #define INTEL_SIP_SMC_RSU_MAX_RETRY \ INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_RSU_MAX_RETRY) +/** + * Request INTEL_SIP_SMC_RSU_DCMF_STATUS + * + * Sync call used by service driver at EL1 to query DCMF status from FW + * + * Call register usage: + * a0 INTEL_SIP_SMC_RSU_DCMF_STATUS + * a1-7 not used + * + * Return status + * a0 INTEL_SIP_SMC_STATUS_OK + * a1 dcmf3 | dcmf2 | dcmf1 | dcmf0 + * + * Or + * + * a0 INTEL_SIP_SMC_RSU_ERROR + */ +#define INTEL_SIP_SMC_FUNCID_RSU_DCMF_STATUS 20 +#define INTEL_SIP_SMC_RSU_DCMF_STATUS \ + INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_RSU_DCMF_STATUS) + /** * Request INTEL_SIP_SMC_SERVICE_COMPLETED * Sync call to check if the secure world have completed service request diff --git a/include/linux/firmware/intel/stratix10-svc-client.h b/include/linux/firmware/intel/stratix10-svc-client.h index 677720792259..82a20e62f15f 100644 --- a/include/linux/firmware/intel/stratix10-svc-client.h +++ b/include/linux/firmware/intel/stratix10-svc-client.h @@ -114,6 +114,9 @@ struct stratix10_svc_chan; * @COMMAND_FIRMWARE_VERSION: query running firmware version, return status * is SVC_STATUS_OK or SVC_STATUS_ERROR * + * @COMMAND_RSU_DCMF_STATUS: query firmware for the DCMF status + * return status is SVC_STATUS_OK or SVC_STATUS_ERROR + * * @COMMAND_FCS_REQUEST_SERVICE: request validation of image from firmware, * return status is SVC_STATUS_OK, SVC_STATUS_INVALID_PARAM * @@ -146,6 +149,7 @@ enum stratix10_svc_command_code { COMMAND_RSU_RETRY, COMMAND_RSU_MAX_RETRY, COMMAND_RSU_DCMF_VERSION, + COMMAND_RSU_DCMF_STATUS, COMMAND_FIRMWARE_VERSION, /* for FCS */ COMMAND_FCS_REQUEST_SERVICE = 20, -- cgit From 7935e899b35c93faa26e1d272a51b3d9cb39f23f Mon Sep 17 00:00:00 2001 From: Ang Tien Sung Date: Mon, 11 Jul 2022 17:31:40 -0500 Subject: firmware: stratix10-svc: To support a command ATF Get Version We are to support a new SMC Command of hexadecimal 0x200 that returns the ATF Firmware major and minor version. Signed-off-by: Ang Tien Sung Signed-off-by: Dinh Nguyen Link: https://lore.kernel.org/r/20220711223140.2307945-6-dinguyen@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/firmware/intel/stratix10-smc.h | 18 ++++++++++++++++++ include/linux/firmware/intel/stratix10-svc-client.h | 5 +++++ 2 files changed, 23 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firmware/intel/stratix10-smc.h b/include/linux/firmware/intel/stratix10-smc.h index 8fc4aa450517..a718f853d457 100644 --- a/include/linux/firmware/intel/stratix10-smc.h +++ b/include/linux/firmware/intel/stratix10-smc.h @@ -466,6 +466,24 @@ INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_FPGA_CONFIG_COMPLETED_WRITE) #define INTEL_SIP_SMC_FIRMWARE_VERSION \ INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_FUNCID_FIRMWARE_VERSION) +/** + * Request INTEL_SIP_SMC_SVC_VERSION + * + * Sync call used to query the SIP SMC API Version + * + * Call register usage: + * a0 INTEL_SIP_SMC_SVC_VERSION + * a1-a7 not used + * + * Return status: + * a0 INTEL_SIP_SMC_STATUS_OK + * a1 Major + * a2 Minor + */ +#define INTEL_SIP_SMC_SVC_FUNCID_VERSION 512 +#define INTEL_SIP_SMC_SVC_VERSION \ + INTEL_SIP_SMC_FAST_CALL_VAL(INTEL_SIP_SMC_SVC_FUNCID_VERSION) + /** * SMC call protocol for FPGA Crypto Service (FCS) * FUNCID starts from 90 diff --git a/include/linux/firmware/intel/stratix10-svc-client.h b/include/linux/firmware/intel/stratix10-svc-client.h index 82a20e62f15f..a776a787ac75 100644 --- a/include/linux/firmware/intel/stratix10-svc-client.h +++ b/include/linux/firmware/intel/stratix10-svc-client.h @@ -114,6 +114,9 @@ struct stratix10_svc_chan; * @COMMAND_FIRMWARE_VERSION: query running firmware version, return status * is SVC_STATUS_OK or SVC_STATUS_ERROR * + * @COMMAND_SMC_SVC_VERSION: Non-mailbox SMC SVC API Version, + * return status is SVC_STATUS_OK + * * @COMMAND_RSU_DCMF_STATUS: query firmware for the DCMF status * return status is SVC_STATUS_OK or SVC_STATUS_ERROR * @@ -160,6 +163,8 @@ enum stratix10_svc_command_code { COMMAND_FCS_RANDOM_NUMBER_GEN, /* for general status poll */ COMMAND_POLL_SERVICE_STATUS = 40, + /* Non-mailbox SMC Call */ + COMMAND_SMC_SVC_VERSION = 200, }; /** -- cgit From 900d156bac2bc474cf7c7bee4efbc6c83ec5ae58 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 13 Jul 2022 07:53:17 +0200 Subject: block: remove bdevname Replace the remaining calls of bdevname with snprintf using the %pg format specifier. Signed-off-by: Christoph Hellwig Reviewed-by: Jan Kara Reviewed-by: Chaitanya Kulkarni Reviewed-by: Johannes Thumshirn Link: https://lore.kernel.org/r/20220713055317.1888500-10-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 22c477fadc0f..2775763c51b9 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1457,7 +1457,6 @@ static inline void bio_end_io_acct(struct bio *bio, unsigned long start_time) int bdev_read_only(struct block_device *bdev); int set_blocksize(struct block_device *bdev, int size); -const char *bdevname(struct block_device *bdev, char *buffer); int lookup_bdev(const char *pathname, dev_t *dev); void blkdev_show(struct seq_file *seqf, off_t offset); -- cgit From ff07a02e9e8e6489db841e0c48a5c78e7e78d572 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 14 Jul 2022 11:06:27 -0700 Subject: treewide: Rename enum req_opf into enum req_op The type name enum req_opf is misleading since it suggests that values of this type include both an operation type and flags. Since values of this type represent an operation only, change the type name into enum req_op. Convert the enum req_op documentation into kernel-doc format. Move a few definitions such that the enum req_op documentation occurs just above the enum req_op definition. The name "req_opf" was introduced by commit ef295ecf090d ("block: better op and flags encoding"). Cc: Christoph Hellwig Cc: Ming Lei Cc: Hannes Reinecke Cc: Damien Le Moal Cc: Johannes Thumshirn Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20220714180729.1065367-2-bvanassche@acm.org Signed-off-by: Jens Axboe --- include/linux/blk_types.h | 16 ++++++++-------- include/linux/blkdev.h | 2 +- 2 files changed, 9 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index a24d4078fb21..0e6a2af7ed3d 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -337,8 +337,12 @@ enum { typedef __u32 __bitwise blk_mq_req_flags_t; -/* - * Operations and flags common to the bio and request structures. +#define REQ_OP_BITS 8 +#define REQ_OP_MASK ((1 << REQ_OP_BITS) - 1) +#define REQ_FLAG_BITS 24 + +/** + * enum req_op - Operations common to the bio and request structures. * We use 8 bits for encoding the operation, and the remaining 24 for flags. * * The least significant bit of the operation number indicates the data @@ -350,11 +354,7 @@ typedef __u32 __bitwise blk_mq_req_flags_t; * If a operation does not transfer data the least significant bit has no * meaning. */ -#define REQ_OP_BITS 8 -#define REQ_OP_MASK ((1 << REQ_OP_BITS) - 1) -#define REQ_FLAG_BITS 24 - -enum req_opf { +enum req_op { /* read sectors from the device */ REQ_OP_READ = 0, /* write sectors to the device */ @@ -509,7 +509,7 @@ static inline bool op_is_discard(unsigned int op) * due to its different handling in the block layer and device response in * case of command failure. */ -static inline bool op_is_zone_mgmt(enum req_opf op) +static inline bool op_is_zone_mgmt(enum req_op op) { switch (op & REQ_OP_MASK) { case REQ_OP_ZONE_RESET: diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 2775763c51b9..ec072a5129bf 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -322,7 +322,7 @@ void disk_set_zoned(struct gendisk *disk, enum blk_zoned_model model); int blkdev_report_zones(struct block_device *bdev, sector_t sector, unsigned int nr_zones, report_zones_cb cb, void *data); unsigned int bdev_nr_zones(struct block_device *bdev); -extern int blkdev_zone_mgmt(struct block_device *bdev, enum req_opf op, +extern int blkdev_zone_mgmt(struct block_device *bdev, enum req_op op, sector_t sectors, sector_t nr_sectors, gfp_t gfp_mask); int blk_revalidate_disk_zones(struct gendisk *disk, -- cgit From 77e7ffd7ad3952909be6a9c599b7d164c8866fec Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 14 Jul 2022 11:06:28 -0700 Subject: block: Use enum req_op where appropriate Change the type of the arguments that are used to pass a REQ_OP_* value from int or unsigned int into enum req_op to improve static type checking. Cc: Christoph Hellwig Cc: Ming Lei Cc: Hannes Reinecke Cc: Damien Le Moal Cc: Johannes Thumshirn Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20220714180729.1065367-3-bvanassche@acm.org Signed-off-by: Jens Axboe --- include/linux/blk_types.h | 2 +- include/linux/blkdev.h | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 0e6a2af7ed3d..cce8768bc00b 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -522,7 +522,7 @@ static inline bool op_is_zone_mgmt(enum req_op op) } } -static inline int op_stat_group(unsigned int op) +static inline int op_stat_group(enum req_op op) { if (op_is_discard(op)) return STAT_DISCARD; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index ec072a5129bf..2f13f0062192 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -872,7 +872,7 @@ extern void blk_queue_exit(struct request_queue *q); extern void blk_sync_queue(struct request_queue *q); /* Helper to convert REQ_OP_XXX to its string format XXX */ -extern const char *blk_op_str(unsigned int op); +extern const char *blk_op_str(enum req_op op); int blk_status_to_errno(blk_status_t status); blk_status_t errno_to_blk_status(int errno); @@ -1434,9 +1434,9 @@ static inline void blk_wake_io_task(struct task_struct *waiter) } unsigned long bdev_start_io_acct(struct block_device *bdev, - unsigned int sectors, unsigned int op, + unsigned int sectors, enum req_op op, unsigned long start_time); -void bdev_end_io_acct(struct block_device *bdev, unsigned int op, +void bdev_end_io_acct(struct block_device *bdev, enum req_op op, unsigned long start_time); void bio_start_io_acct_time(struct bio *bio, unsigned long start_time); -- cgit From 86947df3a9236481276e8baadde50a403b02b4d4 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 14 Jul 2022 11:06:29 -0700 Subject: block: Change the type of the last .rw_page() argument All .rw_page() callers pass an enum req_op value as last argument. Make this explicit by changing the type of the last argument into enum req_op. See also commit 3f289dcb4b26 ("block: make bdev_ops->rw_page() take a REQ_OP instead of bool"). Cc: Tejun Heo Cc: Minchan Kim Cc: Dan Williams Cc: Hannes Reinecke Cc: Damien Le Moal Cc: Johannes Thumshirn Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20220714180729.1065367-4-bvanassche@acm.org Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 2f13f0062192..ca2ff113ea00 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1381,7 +1381,7 @@ struct block_device_operations { unsigned int flags); int (*open) (struct block_device *, fmode_t); void (*release) (struct gendisk *, fmode_t); - int (*rw_page)(struct block_device *, sector_t, struct page *, unsigned int); + int (*rw_page)(struct block_device *, sector_t, struct page *, enum req_op); int (*ioctl) (struct block_device *, fmode_t, unsigned, unsigned long); int (*compat_ioctl) (struct block_device *, fmode_t, unsigned, unsigned long); unsigned int (*check_events) (struct gendisk *disk, -- cgit From 2d9b02be73ba8efba406b399a722b4e33614dd0e Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 14 Jul 2022 11:06:30 -0700 Subject: block: Change the type of req_op() and bio_op() into enum req_op Improve static type checking by changing the type of the value returned by req_op() and bio_op() from unsigned int into enum req_op. Insert 'default: break;' in switch statements on the enum req_op type to prevent that the compiler warns about these switch statements. Cc: Christoph Hellwig Cc: Ming Lei Cc: Hannes Reinecke Cc: Damien Le Moal Cc: Johannes Thumshirn Cc: Tim Waugh Cc: Alasdair Kergon Cc: Mike Snitzer Cc: Mikulas Patocka Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20220714180729.1065367-5-bvanassche@acm.org Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 6 ++++-- include/linux/blk_types.h | 6 ++++-- 2 files changed, 8 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index d74f6a6b7e69..677195de0663 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -198,8 +198,10 @@ struct request { void *end_io_data; }; -#define req_op(req) \ - ((req)->cmd_flags & REQ_OP_MASK) +static inline enum req_op req_op(const struct request *req) +{ + return req->cmd_flags & REQ_OP_MASK; +} static inline bool blk_rq_is_passthrough(struct request *rq) { diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index cce8768bc00b..e66cbe377ae8 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -463,8 +463,10 @@ enum stat_group { NR_STAT_GROUPS }; -#define bio_op(bio) \ - ((bio)->bi_opf & REQ_OP_MASK) +static inline enum req_op bio_op(const struct bio *bio) +{ + return bio->bi_opf & REQ_OP_MASK; +} /* obsolete, don't use in new code */ static inline void bio_set_op_attrs(struct bio *bio, unsigned op, -- cgit From 342a72a334073f163da924b69c3d3fb4685eb33a Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 14 Jul 2022 11:06:31 -0700 Subject: block: Introduce the type blk_opf_t Introduce the type blk_opf_t for the request operation and flags (REQ_OP_* and REQ_*). This type will be used to improve documentation of the block layer code and also to allow sparse to verify whether request flags are used correctly. Cc: Christoph Hellwig Cc: Ming Lei Cc: Hannes Reinecke Cc: Damien Le Moal Cc: Johannes Thumshirn Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20220714180729.1065367-6-bvanassche@acm.org Signed-off-by: Jens Axboe --- include/linux/blk_types.h | 97 +++++++++++++++++++++++++---------------------- 1 file changed, 51 insertions(+), 46 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index e66cbe377ae8..1ef99790f6ed 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -240,6 +240,8 @@ static inline void bio_issue_init(struct bio_issue *issue, ((u64)size << BIO_ISSUE_SIZE_SHIFT)); } +typedef __u32 __bitwise blk_opf_t; + typedef unsigned int blk_qc_t; #define BLK_QC_T_NONE -1U @@ -250,7 +252,7 @@ typedef unsigned int blk_qc_t; struct bio { struct bio *bi_next; /* request queue link */ struct block_device *bi_bdev; - unsigned int bi_opf; /* bottom bits REQ_OP, top bits + blk_opf_t bi_opf; /* bottom bits REQ_OP, top bits * req_flags. */ unsigned short bi_flags; /* BIO_* below */ @@ -338,7 +340,7 @@ enum { typedef __u32 __bitwise blk_mq_req_flags_t; #define REQ_OP_BITS 8 -#define REQ_OP_MASK ((1 << REQ_OP_BITS) - 1) +#define REQ_OP_MASK (__force blk_opf_t)((1 << REQ_OP_BITS) - 1) #define REQ_FLAG_BITS 24 /** @@ -356,35 +358,35 @@ typedef __u32 __bitwise blk_mq_req_flags_t; */ enum req_op { /* read sectors from the device */ - REQ_OP_READ = 0, + REQ_OP_READ = (__force blk_opf_t)0, /* write sectors to the device */ - REQ_OP_WRITE = 1, + REQ_OP_WRITE = (__force blk_opf_t)1, /* flush the volatile write cache */ - REQ_OP_FLUSH = 2, + REQ_OP_FLUSH = (__force blk_opf_t)2, /* discard sectors */ - REQ_OP_DISCARD = 3, + REQ_OP_DISCARD = (__force blk_opf_t)3, /* securely erase sectors */ - REQ_OP_SECURE_ERASE = 5, + REQ_OP_SECURE_ERASE = (__force blk_opf_t)5, /* write the zero filled sector many times */ - REQ_OP_WRITE_ZEROES = 9, + REQ_OP_WRITE_ZEROES = (__force blk_opf_t)9, /* Open a zone */ - REQ_OP_ZONE_OPEN = 10, + REQ_OP_ZONE_OPEN = (__force blk_opf_t)10, /* Close a zone */ - REQ_OP_ZONE_CLOSE = 11, + REQ_OP_ZONE_CLOSE = (__force blk_opf_t)11, /* Transition a zone to full */ - REQ_OP_ZONE_FINISH = 12, + REQ_OP_ZONE_FINISH = (__force blk_opf_t)12, /* write data at the current zone write pointer */ - REQ_OP_ZONE_APPEND = 13, + REQ_OP_ZONE_APPEND = (__force blk_opf_t)13, /* reset a zone write pointer */ - REQ_OP_ZONE_RESET = 15, + REQ_OP_ZONE_RESET = (__force blk_opf_t)15, /* reset all the zone present on the device */ - REQ_OP_ZONE_RESET_ALL = 17, + REQ_OP_ZONE_RESET_ALL = (__force blk_opf_t)17, /* Driver private requests */ - REQ_OP_DRV_IN = 34, - REQ_OP_DRV_OUT = 35, + REQ_OP_DRV_IN = (__force blk_opf_t)34, + REQ_OP_DRV_OUT = (__force blk_opf_t)35, - REQ_OP_LAST, + REQ_OP_LAST = (__force blk_opf_t)36, }; enum req_flag_bits { @@ -425,28 +427,31 @@ enum req_flag_bits { __REQ_NR_BITS, /* stops here */ }; -#define REQ_FAILFAST_DEV (1ULL << __REQ_FAILFAST_DEV) -#define REQ_FAILFAST_TRANSPORT (1ULL << __REQ_FAILFAST_TRANSPORT) -#define REQ_FAILFAST_DRIVER (1ULL << __REQ_FAILFAST_DRIVER) -#define REQ_SYNC (1ULL << __REQ_SYNC) -#define REQ_META (1ULL << __REQ_META) -#define REQ_PRIO (1ULL << __REQ_PRIO) -#define REQ_NOMERGE (1ULL << __REQ_NOMERGE) -#define REQ_IDLE (1ULL << __REQ_IDLE) -#define REQ_INTEGRITY (1ULL << __REQ_INTEGRITY) -#define REQ_FUA (1ULL << __REQ_FUA) -#define REQ_PREFLUSH (1ULL << __REQ_PREFLUSH) -#define REQ_RAHEAD (1ULL << __REQ_RAHEAD) -#define REQ_BACKGROUND (1ULL << __REQ_BACKGROUND) -#define REQ_NOWAIT (1ULL << __REQ_NOWAIT) -#define REQ_CGROUP_PUNT (1ULL << __REQ_CGROUP_PUNT) - -#define REQ_NOUNMAP (1ULL << __REQ_NOUNMAP) -#define REQ_POLLED (1ULL << __REQ_POLLED) -#define REQ_ALLOC_CACHE (1ULL << __REQ_ALLOC_CACHE) - -#define REQ_DRV (1ULL << __REQ_DRV) -#define REQ_SWAP (1ULL << __REQ_SWAP) +#define REQ_FAILFAST_DEV \ + (__force blk_opf_t)(1ULL << __REQ_FAILFAST_DEV) +#define REQ_FAILFAST_TRANSPORT \ + (__force blk_opf_t)(1ULL << __REQ_FAILFAST_TRANSPORT) +#define REQ_FAILFAST_DRIVER \ + (__force blk_opf_t)(1ULL << __REQ_FAILFAST_DRIVER) +#define REQ_SYNC (__force blk_opf_t)(1ULL << __REQ_SYNC) +#define REQ_META (__force blk_opf_t)(1ULL << __REQ_META) +#define REQ_PRIO (__force blk_opf_t)(1ULL << __REQ_PRIO) +#define REQ_NOMERGE (__force blk_opf_t)(1ULL << __REQ_NOMERGE) +#define REQ_IDLE (__force blk_opf_t)(1ULL << __REQ_IDLE) +#define REQ_INTEGRITY (__force blk_opf_t)(1ULL << __REQ_INTEGRITY) +#define REQ_FUA (__force blk_opf_t)(1ULL << __REQ_FUA) +#define REQ_PREFLUSH (__force blk_opf_t)(1ULL << __REQ_PREFLUSH) +#define REQ_RAHEAD (__force blk_opf_t)(1ULL << __REQ_RAHEAD) +#define REQ_BACKGROUND (__force blk_opf_t)(1ULL << __REQ_BACKGROUND) +#define REQ_NOWAIT (__force blk_opf_t)(1ULL << __REQ_NOWAIT) +#define REQ_CGROUP_PUNT (__force blk_opf_t)(1ULL << __REQ_CGROUP_PUNT) + +#define REQ_NOUNMAP (__force blk_opf_t)(1ULL << __REQ_NOUNMAP) +#define REQ_POLLED (__force blk_opf_t)(1ULL << __REQ_POLLED) +#define REQ_ALLOC_CACHE (__force blk_opf_t)(1ULL << __REQ_ALLOC_CACHE) + +#define REQ_DRV (__force blk_opf_t)(1ULL << __REQ_DRV) +#define REQ_SWAP (__force blk_opf_t)(1ULL << __REQ_SWAP) #define REQ_FAILFAST_MASK \ (REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER) @@ -469,22 +474,22 @@ static inline enum req_op bio_op(const struct bio *bio) } /* obsolete, don't use in new code */ -static inline void bio_set_op_attrs(struct bio *bio, unsigned op, - unsigned op_flags) +static inline void bio_set_op_attrs(struct bio *bio, enum req_op op, + blk_opf_t op_flags) { bio->bi_opf = op | op_flags; } -static inline bool op_is_write(unsigned int op) +static inline bool op_is_write(blk_opf_t op) { - return (op & 1); + return !!(op & (__force blk_opf_t)1); } /* * Check if the bio or request is one that needs special treatment in the * flush state machine. */ -static inline bool op_is_flush(unsigned int op) +static inline bool op_is_flush(blk_opf_t op) { return op & (REQ_FUA | REQ_PREFLUSH); } @@ -494,13 +499,13 @@ static inline bool op_is_flush(unsigned int op) * PREFLUSH flag. Other operations may be marked as synchronous using the * REQ_SYNC flag. */ -static inline bool op_is_sync(unsigned int op) +static inline bool op_is_sync(blk_opf_t op) { return (op & REQ_OP_MASK) == REQ_OP_READ || (op & (REQ_SYNC | REQ_FUA | REQ_PREFLUSH)); } -static inline bool op_is_discard(unsigned int op) +static inline bool op_is_discard(blk_opf_t op) { return (op & REQ_OP_MASK) == REQ_OP_DISCARD; } -- cgit From 16458cf3bd15e5624205df6e8a76b9a5363555f3 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 14 Jul 2022 11:06:32 -0700 Subject: block: Use the new blk_opf_t type Use the new blk_opf_t type for arguments and variables that represent request flags or a bitwise combination of a request operation and request flags. Rename the function arguments and also a structure member that hold a request operation and flags from 'rw' into 'opf'. This patch does not change any functionality. Cc: Christoph Hellwig Cc: Ming Lei Cc: Hannes Reinecke Cc: Damien Le Moal Cc: Johannes Thumshirn Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20220714180729.1065367-7-bvanassche@acm.org Signed-off-by: Jens Axboe --- include/linux/bio.h | 10 +++++----- include/linux/blk-mq.h | 6 +++--- include/linux/blkdev.h | 2 +- 3 files changed, 9 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index 992ee987f273..ca22b06700a9 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -405,7 +405,7 @@ extern void bioset_exit(struct bio_set *); extern int biovec_init_pool(mempool_t *pool, int pool_entries); struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs, - unsigned int opf, gfp_t gfp_mask, + blk_opf_t opf, gfp_t gfp_mask, struct bio_set *bs); struct bio *bio_kmalloc(unsigned short nr_vecs, gfp_t gfp_mask); extern void bio_put(struct bio *); @@ -418,7 +418,7 @@ int bio_init_clone(struct block_device *bdev, struct bio *bio, extern struct bio_set fs_bio_set; static inline struct bio *bio_alloc(struct block_device *bdev, - unsigned short nr_vecs, unsigned int opf, gfp_t gfp_mask) + unsigned short nr_vecs, blk_opf_t opf, gfp_t gfp_mask) { return bio_alloc_bioset(bdev, nr_vecs, opf, gfp_mask, &fs_bio_set); } @@ -456,9 +456,9 @@ struct request_queue; extern int submit_bio_wait(struct bio *bio); void bio_init(struct bio *bio, struct block_device *bdev, struct bio_vec *table, - unsigned short max_vecs, unsigned int opf); + unsigned short max_vecs, blk_opf_t opf); extern void bio_uninit(struct bio *); -void bio_reset(struct bio *bio, struct block_device *bdev, unsigned int opf); +void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf); void bio_chain(struct bio *, struct bio *); int bio_add_page(struct bio *, struct page *, unsigned len, unsigned off); @@ -789,6 +789,6 @@ static inline void bio_clear_polled(struct bio *bio) } struct bio *blk_next_bio(struct bio *bio, struct block_device *bdev, - unsigned int nr_pages, unsigned int opf, gfp_t gfp); + unsigned int nr_pages, blk_opf_t opf, gfp_t gfp); #endif /* __LINUX_BIO_H */ diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 677195de0663..effee1dc715a 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -80,7 +80,7 @@ struct request { struct blk_mq_ctx *mq_ctx; struct blk_mq_hw_ctx *mq_hctx; - unsigned int cmd_flags; /* op and common flags */ + blk_opf_t cmd_flags; /* op and common flags */ req_flags_t rq_flags; int tag; @@ -715,10 +715,10 @@ enum { BLK_MQ_REQ_PM = (__force blk_mq_req_flags_t)(1 << 2), }; -struct request *blk_mq_alloc_request(struct request_queue *q, unsigned int op, +struct request *blk_mq_alloc_request(struct request_queue *q, blk_opf_t opf, blk_mq_req_flags_t flags); struct request *blk_mq_alloc_request_hctx(struct request_queue *q, - unsigned int op, blk_mq_req_flags_t flags, + blk_opf_t opf, blk_mq_req_flags_t flags, unsigned int hctx_idx); /* diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index ca2ff113ea00..d04bdf549efa 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -250,7 +250,7 @@ static inline int blk_validate_block_size(unsigned long bsize) return 0; } -static inline bool blk_op_is_passthrough(unsigned int op) +static inline bool blk_op_is_passthrough(blk_opf_t op) { op &= REQ_OP_MASK; return op == REQ_OP_DRV_IN || op == REQ_OP_DRV_OUT; -- cgit From 919dbca8670d0f7828dfbb2f9b434ac22dca8d2e Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 14 Jul 2022 11:06:37 -0700 Subject: blktrace: Use the new blk_opf_t type Improve static type checking by using the new blk_opf_t type for a function argument that represents a combination of a request operation and request flags. Rename that argument from 'op' into 'opf' to make its role more clear. Cc: Christoph Hellwig Cc: Steven Rostedt Cc: Li Zefan Cc: Chaitanya Kulkarni Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20220714180729.1065367-12-bvanassche@acm.org Signed-off-by: Jens Axboe --- include/linux/blktrace_api.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blktrace_api.h b/include/linux/blktrace_api.h index f6f9b544365a..cfbda114348c 100644 --- a/include/linux/blktrace_api.h +++ b/include/linux/blktrace_api.h @@ -7,6 +7,7 @@ #include #include #include +#include #if defined(CONFIG_BLK_DEV_IO_TRACE) @@ -105,7 +106,7 @@ struct compat_blk_user_trace_setup { #endif -void blk_fill_rwbs(char *rwbs, unsigned int op); +void blk_fill_rwbs(char *rwbs, blk_opf_t opf); static inline sector_t blk_rq_trace_sector(struct request *rq) { -- cgit From 581075e4f6475bb97c73ecccf68636a9453a31fd Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 14 Jul 2022 11:06:47 -0700 Subject: dm/core: Reduce the size of struct dm_io_request Combine the bi_op and bi_op_flags into the bi_opf member. Use the new blk_opf_t type to improve static type checking. This patch does not change any functionality. Cc: Alasdair Kergon Cc: Mike Snitzer Cc: Mikulas Patocka Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20220714180729.1065367-22-bvanassche@acm.org Signed-off-by: Jens Axboe --- include/linux/dm-io.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dm-io.h b/include/linux/dm-io.h index a52c6580cc9a..8e1c4ab5df04 100644 --- a/include/linux/dm-io.h +++ b/include/linux/dm-io.h @@ -13,6 +13,7 @@ #ifdef __KERNEL__ #include +#include struct dm_io_region { struct block_device *bdev; @@ -57,8 +58,7 @@ struct dm_io_notify { */ struct dm_io_client; struct dm_io_request { - int bi_op; /* REQ_OP */ - int bi_op_flags; /* req_flag_bits */ + blk_opf_t bi_opf; /* Request type and flags */ struct dm_io_memory mem; /* Memory to use for io */ struct dm_io_notify notify; /* Synchronous if notify.fn is NULL */ struct dm_io_client *client; /* Client memory handler */ -- cgit From f8e6e4bd9fd8c452565f3eaeb358e3cc08d880f4 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 14 Jul 2022 11:07:11 -0700 Subject: mm: Use the new blk_opf_t type Improve static type checking by using the new blk_opf_t type for block layer request flags. Cc: Andrew Morton Cc: Christoph Hellwig Cc: Jan Kara Cc: Stefan Roesch Cc: NeilBrown Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20220714180729.1065367-46-bvanassche@acm.org Signed-off-by: Jens Axboe --- include/linux/writeback.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/writeback.h b/include/linux/writeback.h index da21d63f70e2..e91bea371b18 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -101,9 +101,9 @@ struct writeback_control { #endif }; -static inline int wbc_to_write_flags(struct writeback_control *wbc) +static inline blk_opf_t wbc_to_write_flags(struct writeback_control *wbc) { - int flags = 0; + blk_opf_t flags = 0; if (wbc->punt_to_cgroup) flags = REQ_CGROUP_PUNT; -- cgit From 3ae7286943ae6f6bfecfe0a3da9d1a4c64f5531f Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 14 Jul 2022 11:07:12 -0700 Subject: fs/buffer: Use the new blk_opf_t type Improve static type checking by using the new blk_opf_t type for block layer request flags. Change WRITE into REQ_OP_WRITE. This patch does not change any functionality since REQ_OP_WRITE == WRITE == 1. Reviewed-by: Jan Kara Cc: Al Viro Cc: Christoph Hellwig Cc: Matthew Wilcox Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20220714180729.1065367-47-bvanassche@acm.org Signed-off-by: Jens Axboe --- include/linux/buffer_head.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index c9d1463bb20f..9795df9400bd 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -9,6 +9,7 @@ #define _LINUX_BUFFER_HEAD_H #include +#include #include #include #include @@ -201,11 +202,11 @@ struct buffer_head *alloc_buffer_head(gfp_t gfp_flags); void free_buffer_head(struct buffer_head * bh); void unlock_buffer(struct buffer_head *bh); void __lock_buffer(struct buffer_head *bh); -void ll_rw_block(int, int, int, struct buffer_head * bh[]); +void ll_rw_block(enum req_op, blk_opf_t, int, struct buffer_head * bh[]); int sync_dirty_buffer(struct buffer_head *bh); -int __sync_dirty_buffer(struct buffer_head *bh, int op_flags); -void write_dirty_buffer(struct buffer_head *bh, int op_flags); -int submit_bh(int, int, struct buffer_head *); +int __sync_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags); +void write_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags); +int submit_bh(enum req_op, blk_opf_t, struct buffer_head *); void write_boundary_block(struct block_device *bdev, sector_t bblock, unsigned blocksize); int bh_uptodate_or_lock(struct buffer_head *bh); -- cgit From 1420c4a549bf28ffddbed827d61fb3d4d2132ddb Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 14 Jul 2022 11:07:13 -0700 Subject: fs/buffer: Combine two submit_bh() and ll_rw_block() arguments Both submit_bh() and ll_rw_block() accept a request operation type and request flags as their first two arguments. Micro-optimize these two functions by combining these first two arguments into a single argument. This patch does not change the behavior of any of the modified code. Cc: Alexander Viro Cc: Jan Kara Acked-by: Song Liu (for the md changes) Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20220714180729.1065367-48-bvanassche@acm.org Signed-off-by: Jens Axboe --- include/linux/buffer_head.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 9795df9400bd..bb68eb6407da 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -202,11 +202,11 @@ struct buffer_head *alloc_buffer_head(gfp_t gfp_flags); void free_buffer_head(struct buffer_head * bh); void unlock_buffer(struct buffer_head *bh); void __lock_buffer(struct buffer_head *bh); -void ll_rw_block(enum req_op, blk_opf_t, int, struct buffer_head * bh[]); +void ll_rw_block(blk_opf_t, int, struct buffer_head * bh[]); int sync_dirty_buffer(struct buffer_head *bh); int __sync_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags); void write_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags); -int submit_bh(enum req_op, blk_opf_t, struct buffer_head *); +int submit_bh(blk_opf_t, struct buffer_head *); void write_boundary_block(struct block_device *bdev, sector_t bblock, unsigned blocksize); int bh_uptodate_or_lock(struct buffer_head *bh); -- cgit From 6669797b0dd41ced457760b6e1014fdda8ce19ce Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Thu, 14 Jul 2022 11:07:22 -0700 Subject: fs/jbd2: Fix the documentation of the jbd2_write_superblock() callers Commit 2a222ca992c3 ("fs: have submit_bh users pass in op and flags separately") renamed the jbd2_write_superblock() 'write_op' argument into 'write_flags'. Propagate this change to the jbd2_write_superblock() callers. Additionally, change the type of 'write_flags' into blk_opf_t. Cc: Mike Christie Cc: Theodore Ts'o Signed-off-by: Bart Van Assche Link: https://lore.kernel.org/r/20220714180729.1065367-57-bvanassche@acm.org Signed-off-by: Jens Axboe --- include/linux/jbd2.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index e79d6e0b14e8..dc1724131300 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1557,7 +1557,7 @@ extern int jbd2_journal_wipe (journal_t *, int); extern int jbd2_journal_skip_recovery (journal_t *); extern void jbd2_journal_update_sb_errno(journal_t *); extern int jbd2_journal_update_sb_log_tail (journal_t *, tid_t, - unsigned long, int); + unsigned long, blk_opf_t); extern void jbd2_journal_abort (journal_t *, int); extern int jbd2_journal_errno (journal_t *); extern void jbd2_journal_ack_err (journal_t *); -- cgit From b644c95598adfe2b968e97daee3a890dd2fda84d Mon Sep 17 00:00:00 2001 From: PaddyKP_Yao Date: Mon, 11 Jul 2022 19:51:25 +0800 Subject: platform/x86: asus-wmi: Add mic-mute LED classdev support In some new ASUS devices, hotkey Fn+F13 is used for mic mute. If mic-mute LED is present by checking WMI ASUS_WMI_DEVID_MICMUTE_LED, we will add a mic-mute LED classdev, asus::micmute, in the asus-wmi driver to control it. The binding of mic-mute LED controls will be swithched with LED trigger. Signed-off-by: PaddyKP_Yao Link: https://lore.kernel.org/r/20220711115125.2072508-1-PaddyKP_Yao@asus.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/asus-wmi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h index a571b47ff362..98f2b2f20f3e 100644 --- a/include/linux/platform_data/x86/asus-wmi.h +++ b/include/linux/platform_data/x86/asus-wmi.h @@ -49,6 +49,7 @@ #define ASUS_WMI_DEVID_LED4 0x00020014 #define ASUS_WMI_DEVID_LED5 0x00020015 #define ASUS_WMI_DEVID_LED6 0x00020016 +#define ASUS_WMI_DEVID_MICMUTE_LED 0x00040017 /* Backlight and Brightness */ #define ASUS_WMI_DEVID_ALS_ENABLE 0x00050001 /* Ambient Light Sensor */ -- cgit From cd89edda4002b7fb3c0a6765c3a60a60d5b1dc16 Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Thu, 14 Jul 2022 20:42:12 +0800 Subject: PCI: loongson: Add ACPI init support Loongson PCH (LS7A chipset) will be used by both MIPS-based and LoongArch- based Loongson processors. MIPS-based Loongson uses FDT, while LoongArch- based Loongson uses ACPI. Add ACPI init support for the driver in pci-loongson.c because it is currently FDT-only. LoongArch is a new RISC ISA, mainline support will come soon, and documentations are here (in translation): https://github.com/loongson/LoongArch-Documentation Link: https://lore.kernel.org/r/20220714124216.1489304-4-chenhuacai@loongson.cn Signed-off-by: Huacai Chen Signed-off-by: Bjorn Helgaas --- include/linux/pci-ecam.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/pci-ecam.h b/include/linux/pci-ecam.h index adea5a4771cf..6b1301e2498e 100644 --- a/include/linux/pci-ecam.h +++ b/include/linux/pci-ecam.h @@ -87,6 +87,7 @@ extern const struct pci_ecam_ops xgene_v1_pcie_ecam_ops; /* APM X-Gene PCIe v1 * extern const struct pci_ecam_ops xgene_v2_pcie_ecam_ops; /* APM X-Gene PCIe v2.x */ extern const struct pci_ecam_ops al_pcie_ops; /* Amazon Annapurna Labs PCIe */ extern const struct pci_ecam_ops tegra194_pcie_ops; /* Tegra194 PCIe */ +extern const struct pci_ecam_ops loongson_pci_ecam_ops; /* Loongson PCIe */ #endif #if IS_ENABLED(CONFIG_PCI_HOST_COMMON) -- cgit From e2863a78593d638d3924a6f67900c4820034f349 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Fri, 1 Jul 2022 05:54:24 -0700 Subject: lib/bitmap: change return types to bool where appropriate Some bitmap functions return boolean results in int variables. Fix it by changing return types to bool. Signed-off-by: Yury Norov --- include/linux/bitmap.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index c91638e507f2..e1a438bdda52 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -148,13 +148,13 @@ void __bitmap_shift_left(unsigned long *dst, const unsigned long *src, unsigned int shift, unsigned int nbits); void bitmap_cut(unsigned long *dst, const unsigned long *src, unsigned int first, unsigned int cut, unsigned int nbits); -int __bitmap_and(unsigned long *dst, const unsigned long *bitmap1, +bool __bitmap_and(unsigned long *dst, const unsigned long *bitmap1, const unsigned long *bitmap2, unsigned int nbits); void __bitmap_or(unsigned long *dst, const unsigned long *bitmap1, const unsigned long *bitmap2, unsigned int nbits); void __bitmap_xor(unsigned long *dst, const unsigned long *bitmap1, const unsigned long *bitmap2, unsigned int nbits); -int __bitmap_andnot(unsigned long *dst, const unsigned long *bitmap1, +bool __bitmap_andnot(unsigned long *dst, const unsigned long *bitmap1, const unsigned long *bitmap2, unsigned int nbits); void __bitmap_replace(unsigned long *dst, const unsigned long *old, const unsigned long *new, @@ -315,7 +315,7 @@ void bitmap_to_arr64(u64 *buf, const unsigned long *bitmap, unsigned int nbits); bitmap_copy_clear_tail((unsigned long *)(buf), (const unsigned long *)(bitmap), (nbits)) #endif -static inline int bitmap_and(unsigned long *dst, const unsigned long *src1, +static inline bool bitmap_and(unsigned long *dst, const unsigned long *src1, const unsigned long *src2, unsigned int nbits) { if (small_const_nbits(nbits)) @@ -341,7 +341,7 @@ static inline void bitmap_xor(unsigned long *dst, const unsigned long *src1, __bitmap_xor(dst, src1, src2, nbits); } -static inline int bitmap_andnot(unsigned long *dst, const unsigned long *src1, +static inline bool bitmap_andnot(unsigned long *dst, const unsigned long *src1, const unsigned long *src2, unsigned int nbits) { if (small_const_nbits(nbits)) -- cgit From 3a06ed80265fa62eecaf519d92f1633e4f9510c7 Mon Sep 17 00:00:00 2001 From: Michael Wu Date: Fri, 8 Jul 2022 17:57:14 +0800 Subject: extcon: Add EXTCON_DISP_CVBS and EXTCON_DISP_EDP Add EXTCON_DISP_CVBS for Composite Video Broadcast Signal. Add EXTCON_DISP_EDP for Embedded Display Port [1] https://en.wikipedia.org/wiki/Composite_video [2] https://en.wikipedia.org/wiki/DisplayPort#eDP Signed-off-by: Michael Wu Signed-off-by: Chanwoo Choi --- include/linux/extcon.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/extcon.h b/include/linux/extcon.h index 685401d94d39..3c45c3846fe9 100644 --- a/include/linux/extcon.h +++ b/include/linux/extcon.h @@ -76,6 +76,8 @@ #define EXTCON_DISP_VGA 43 /* Video Graphics Array */ #define EXTCON_DISP_DP 44 /* Display Port */ #define EXTCON_DISP_HMD 45 /* Head-Mounted Display */ +#define EXTCON_DISP_CVBS 46 /* Composite Video Broadcast Signal */ +#define EXTCON_DISP_EDP 47 /* Embedded Display Port */ /* Miscellaneous external connector */ #define EXTCON_DOCK 60 -- cgit From 309c56e84602d894e7ca424b0b13837117568fca Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 8 Jul 2022 10:06:13 +0200 Subject: iommu: remove the unused dev_has_feat method This method is never actually called. Signed-off-by: Christoph Hellwig Reviewed-by: Lu Baolu Link: https://lore.kernel.org/r/20220708080616.238833-2-hch@lst.de Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index e6abd998dbe7..a3acdb46b939 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -164,8 +164,7 @@ struct iommu_iort_rmr_data { * supported, this feature must be enabled before and * disabled after %IOMMU_DEV_FEAT_SVA. * - * Device drivers query whether a feature is supported using - * iommu_dev_has_feature(), and enable it using iommu_dev_enable_feature(). + * Device drivers enable a feature using iommu_dev_enable_feature(). */ enum iommu_dev_features { IOMMU_DEV_FEAT_SVA, @@ -248,7 +247,6 @@ struct iommu_ops { bool (*is_attach_deferred)(struct device *dev); /* Per device IOMMU features */ - bool (*dev_has_feat)(struct device *dev, enum iommu_dev_features f); bool (*dev_feat_enabled)(struct device *dev, enum iommu_dev_features f); int (*dev_enable_feat)(struct device *dev, enum iommu_dev_features f); int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f); -- cgit From a871765d5588531e1972902d1ae077b8be306c94 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 8 Jul 2022 10:06:14 +0200 Subject: iommu: remove iommu_dev_feature_enabled Remove the unused iommu_dev_feature_enabled function. Signed-off-by: Christoph Hellwig Reviewed-by: Lu Baolu Link: https://lore.kernel.org/r/20220708080616.238833-3-hch@lst.de Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 9 --------- 1 file changed, 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index a3acdb46b939..0bc2eb14b026 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -215,7 +215,6 @@ struct iommu_iotlb_gather { * driver init to device driver init (default no) * @dev_has/enable/disable_feat: per device entries to check/enable/disable * iommu specific features. - * @dev_feat_enabled: check enabled feature * @sva_bind: Bind process address space to device * @sva_unbind: Unbind process address space from device * @sva_get_pasid: Get PASID associated to a SVA handle @@ -247,7 +246,6 @@ struct iommu_ops { bool (*is_attach_deferred)(struct device *dev); /* Per device IOMMU features */ - bool (*dev_feat_enabled)(struct device *dev, enum iommu_dev_features f); int (*dev_enable_feat)(struct device *dev, enum iommu_dev_features f); int (*dev_disable_feat)(struct device *dev, enum iommu_dev_features f); @@ -670,7 +668,6 @@ void iommu_release_device(struct device *dev); int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features f); int iommu_dev_disable_feature(struct device *dev, enum iommu_dev_features f); -bool iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features f); struct iommu_sva *iommu_sva_bind_device(struct device *dev, struct mm_struct *mm, @@ -997,12 +994,6 @@ const struct iommu_ops *iommu_ops_from_fwnode(struct fwnode_handle *fwnode) return NULL; } -static inline bool -iommu_dev_feature_enabled(struct device *dev, enum iommu_dev_features feat) -{ - return false; -} - static inline int iommu_dev_enable_feature(struct device *dev, enum iommu_dev_features feat) { -- cgit From ae3ff39a51a0f5843960487962e110339f321b0f Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 8 Jul 2022 10:06:15 +0200 Subject: iommu: remove the put_resv_regions method All drivers that implement get_resv_regions just use generic_put_resv_regions to implement the put side. Remove the indirections and document the allocations constraints. Signed-off-by: Christoph Hellwig Reviewed-by: Lu Baolu Link: https://lore.kernel.org/r/20220708080616.238833-4-hch@lst.de Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 0bc2eb14b026..ea30f00dc145 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -209,7 +209,6 @@ struct iommu_iotlb_gather { * group and attached to the groups domain * @device_group: find iommu group for a particular device * @get_resv_regions: Request list of reserved regions for a device - * @put_resv_regions: Free list of reserved regions for a device * @of_xlate: add OF master IDs to iommu grouping * @is_attach_deferred: Check if domain attach should be deferred from iommu * driver init to device driver init (default no) @@ -240,7 +239,6 @@ struct iommu_ops { /* Request/Free a list of reserved regions for a device */ void (*get_resv_regions)(struct device *dev, struct list_head *list); - void (*put_resv_regions)(struct device *dev, struct list_head *list); int (*of_xlate)(struct device *dev, struct of_phandle_args *args); bool (*is_attach_deferred)(struct device *dev); @@ -454,8 +452,6 @@ extern void iommu_set_fault_handler(struct iommu_domain *domain, extern void iommu_get_resv_regions(struct device *dev, struct list_head *list); extern void iommu_put_resv_regions(struct device *dev, struct list_head *list); -extern void generic_iommu_put_resv_regions(struct device *dev, - struct list_head *list); extern void iommu_set_default_passthrough(bool cmd_line); extern void iommu_set_default_translated(bool cmd_line); extern bool iommu_default_passthrough(void); -- cgit From f9903555dd05a4096d8fef4eb823c0b94f710982 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Tue, 12 Jul 2022 08:08:46 +0800 Subject: iommu/vt-d: Remove unnecessary exported symbol The exported symbol intel_iommu_gfx_mapped is not used anywhere in the tree. Remove it to avoid dead code. Signed-off-by: Lu Baolu Reviewed-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: Steve Wahl Link: https://lore.kernel.org/r/20220514014322.2927339-4-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/intel-iommu.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h index 5fcf89faa31a..4ebf3c4da45e 100644 --- a/include/linux/intel-iommu.h +++ b/include/linux/intel-iommu.h @@ -787,7 +787,6 @@ extern int iommu_calculate_agaw(struct intel_iommu *iommu); extern int iommu_calculate_max_sagaw(struct intel_iommu *iommu); extern int dmar_disabled; extern int intel_iommu_enabled; -extern int intel_iommu_gfx_mapped; #else static inline int iommu_calculate_agaw(struct intel_iommu *iommu) { -- cgit From 853788b9a66ff089053df3b4ed7d166e61def449 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Tue, 12 Jul 2022 08:08:49 +0800 Subject: x86/boot/tboot: Move tboot_force_iommu() to Intel IOMMU tboot_force_iommu() is only called by the Intel IOMMU driver. Move the helper into that driver. No functional change intended. Signed-off-by: Lu Baolu Reviewed-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: Steve Wahl Link: https://lore.kernel.org/r/20220514014322.2927339-7-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/tboot.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tboot.h b/include/linux/tboot.h index 5146d2574e85..d2279160ef39 100644 --- a/include/linux/tboot.h +++ b/include/linux/tboot.h @@ -126,7 +126,6 @@ extern void tboot_probe(void); extern void tboot_shutdown(u32 shutdown_type); extern struct acpi_table_header *tboot_get_dmar_table( struct acpi_table_header *dmar_tbl); -extern int tboot_force_iommu(void); #else @@ -136,7 +135,6 @@ extern int tboot_force_iommu(void); #define tboot_sleep(sleep_state, pm1a_control, pm1b_control) \ do { } while (0) #define tboot_get_dmar_table(dmar_tbl) (dmar_tbl) -#define tboot_force_iommu() 0 #endif /* !CONFIG_INTEL_TXT */ -- cgit From 2585a2790e7fdb3dadfe309c9220bbc03704f84f Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Tue, 12 Jul 2022 08:08:50 +0800 Subject: iommu/vt-d: Move include/linux/intel-iommu.h under iommu This header file is private to the Intel IOMMU driver. Move it to the driver folder. Signed-off-by: Lu Baolu Reviewed-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: Steve Wahl Link: https://lore.kernel.org/r/20220514014322.2927339-8-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/intel-iommu.h | 831 -------------------------------------------- 1 file changed, 831 deletions(-) delete mode 100644 include/linux/intel-iommu.h (limited to 'include/linux') diff --git a/include/linux/intel-iommu.h b/include/linux/intel-iommu.h deleted file mode 100644 index 4ebf3c4da45e..000000000000 --- a/include/linux/intel-iommu.h +++ /dev/null @@ -1,831 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Copyright © 2006-2015, Intel Corporation. - * - * Authors: Ashok Raj - * Anil S Keshavamurthy - * David Woodhouse - */ - -#ifndef _INTEL_IOMMU_H_ -#define _INTEL_IOMMU_H_ - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include -#include - -/* - * VT-d hardware uses 4KiB page size regardless of host page size. - */ -#define VTD_PAGE_SHIFT (12) -#define VTD_PAGE_SIZE (1UL << VTD_PAGE_SHIFT) -#define VTD_PAGE_MASK (((u64)-1) << VTD_PAGE_SHIFT) -#define VTD_PAGE_ALIGN(addr) (((addr) + VTD_PAGE_SIZE - 1) & VTD_PAGE_MASK) - -#define VTD_STRIDE_SHIFT (9) -#define VTD_STRIDE_MASK (((u64)-1) << VTD_STRIDE_SHIFT) - -#define DMA_PTE_READ BIT_ULL(0) -#define DMA_PTE_WRITE BIT_ULL(1) -#define DMA_PTE_LARGE_PAGE BIT_ULL(7) -#define DMA_PTE_SNP BIT_ULL(11) - -#define DMA_FL_PTE_PRESENT BIT_ULL(0) -#define DMA_FL_PTE_US BIT_ULL(2) -#define DMA_FL_PTE_ACCESS BIT_ULL(5) -#define DMA_FL_PTE_DIRTY BIT_ULL(6) -#define DMA_FL_PTE_XD BIT_ULL(63) - -#define ADDR_WIDTH_5LEVEL (57) -#define ADDR_WIDTH_4LEVEL (48) - -#define CONTEXT_TT_MULTI_LEVEL 0 -#define CONTEXT_TT_DEV_IOTLB 1 -#define CONTEXT_TT_PASS_THROUGH 2 -#define CONTEXT_PASIDE BIT_ULL(3) - -/* - * Intel IOMMU register specification per version 1.0 public spec. - */ -#define DMAR_VER_REG 0x0 /* Arch version supported by this IOMMU */ -#define DMAR_CAP_REG 0x8 /* Hardware supported capabilities */ -#define DMAR_ECAP_REG 0x10 /* Extended capabilities supported */ -#define DMAR_GCMD_REG 0x18 /* Global command register */ -#define DMAR_GSTS_REG 0x1c /* Global status register */ -#define DMAR_RTADDR_REG 0x20 /* Root entry table */ -#define DMAR_CCMD_REG 0x28 /* Context command reg */ -#define DMAR_FSTS_REG 0x34 /* Fault Status register */ -#define DMAR_FECTL_REG 0x38 /* Fault control register */ -#define DMAR_FEDATA_REG 0x3c /* Fault event interrupt data register */ -#define DMAR_FEADDR_REG 0x40 /* Fault event interrupt addr register */ -#define DMAR_FEUADDR_REG 0x44 /* Upper address register */ -#define DMAR_AFLOG_REG 0x58 /* Advanced Fault control */ -#define DMAR_PMEN_REG 0x64 /* Enable Protected Memory Region */ -#define DMAR_PLMBASE_REG 0x68 /* PMRR Low addr */ -#define DMAR_PLMLIMIT_REG 0x6c /* PMRR low limit */ -#define DMAR_PHMBASE_REG 0x70 /* pmrr high base addr */ -#define DMAR_PHMLIMIT_REG 0x78 /* pmrr high limit */ -#define DMAR_IQH_REG 0x80 /* Invalidation queue head register */ -#define DMAR_IQT_REG 0x88 /* Invalidation queue tail register */ -#define DMAR_IQ_SHIFT 4 /* Invalidation queue head/tail shift */ -#define DMAR_IQA_REG 0x90 /* Invalidation queue addr register */ -#define DMAR_ICS_REG 0x9c /* Invalidation complete status register */ -#define DMAR_IQER_REG 0xb0 /* Invalidation queue error record register */ -#define DMAR_IRTA_REG 0xb8 /* Interrupt remapping table addr register */ -#define DMAR_PQH_REG 0xc0 /* Page request queue head register */ -#define DMAR_PQT_REG 0xc8 /* Page request queue tail register */ -#define DMAR_PQA_REG 0xd0 /* Page request queue address register */ -#define DMAR_PRS_REG 0xdc /* Page request status register */ -#define DMAR_PECTL_REG 0xe0 /* Page request event control register */ -#define DMAR_PEDATA_REG 0xe4 /* Page request event interrupt data register */ -#define DMAR_PEADDR_REG 0xe8 /* Page request event interrupt addr register */ -#define DMAR_PEUADDR_REG 0xec /* Page request event Upper address register */ -#define DMAR_MTRRCAP_REG 0x100 /* MTRR capability register */ -#define DMAR_MTRRDEF_REG 0x108 /* MTRR default type register */ -#define DMAR_MTRR_FIX64K_00000_REG 0x120 /* MTRR Fixed range registers */ -#define DMAR_MTRR_FIX16K_80000_REG 0x128 -#define DMAR_MTRR_FIX16K_A0000_REG 0x130 -#define DMAR_MTRR_FIX4K_C0000_REG 0x138 -#define DMAR_MTRR_FIX4K_C8000_REG 0x140 -#define DMAR_MTRR_FIX4K_D0000_REG 0x148 -#define DMAR_MTRR_FIX4K_D8000_REG 0x150 -#define DMAR_MTRR_FIX4K_E0000_REG 0x158 -#define DMAR_MTRR_FIX4K_E8000_REG 0x160 -#define DMAR_MTRR_FIX4K_F0000_REG 0x168 -#define DMAR_MTRR_FIX4K_F8000_REG 0x170 -#define DMAR_MTRR_PHYSBASE0_REG 0x180 /* MTRR Variable range registers */ -#define DMAR_MTRR_PHYSMASK0_REG 0x188 -#define DMAR_MTRR_PHYSBASE1_REG 0x190 -#define DMAR_MTRR_PHYSMASK1_REG 0x198 -#define DMAR_MTRR_PHYSBASE2_REG 0x1a0 -#define DMAR_MTRR_PHYSMASK2_REG 0x1a8 -#define DMAR_MTRR_PHYSBASE3_REG 0x1b0 -#define DMAR_MTRR_PHYSMASK3_REG 0x1b8 -#define DMAR_MTRR_PHYSBASE4_REG 0x1c0 -#define DMAR_MTRR_PHYSMASK4_REG 0x1c8 -#define DMAR_MTRR_PHYSBASE5_REG 0x1d0 -#define DMAR_MTRR_PHYSMASK5_REG 0x1d8 -#define DMAR_MTRR_PHYSBASE6_REG 0x1e0 -#define DMAR_MTRR_PHYSMASK6_REG 0x1e8 -#define DMAR_MTRR_PHYSBASE7_REG 0x1f0 -#define DMAR_MTRR_PHYSMASK7_REG 0x1f8 -#define DMAR_MTRR_PHYSBASE8_REG 0x200 -#define DMAR_MTRR_PHYSMASK8_REG 0x208 -#define DMAR_MTRR_PHYSBASE9_REG 0x210 -#define DMAR_MTRR_PHYSMASK9_REG 0x218 -#define DMAR_VCCAP_REG 0xe30 /* Virtual command capability register */ -#define DMAR_VCMD_REG 0xe00 /* Virtual command register */ -#define DMAR_VCRSP_REG 0xe10 /* Virtual command response register */ - -#define DMAR_IQER_REG_IQEI(reg) FIELD_GET(GENMASK_ULL(3, 0), reg) -#define DMAR_IQER_REG_ITESID(reg) FIELD_GET(GENMASK_ULL(47, 32), reg) -#define DMAR_IQER_REG_ICESID(reg) FIELD_GET(GENMASK_ULL(63, 48), reg) - -#define OFFSET_STRIDE (9) - -#define dmar_readq(a) readq(a) -#define dmar_writeq(a,v) writeq(v,a) -#define dmar_readl(a) readl(a) -#define dmar_writel(a, v) writel(v, a) - -#define DMAR_VER_MAJOR(v) (((v) & 0xf0) >> 4) -#define DMAR_VER_MINOR(v) ((v) & 0x0f) - -/* - * Decoding Capability Register - */ -#define cap_5lp_support(c) (((c) >> 60) & 1) -#define cap_pi_support(c) (((c) >> 59) & 1) -#define cap_fl1gp_support(c) (((c) >> 56) & 1) -#define cap_read_drain(c) (((c) >> 55) & 1) -#define cap_write_drain(c) (((c) >> 54) & 1) -#define cap_max_amask_val(c) (((c) >> 48) & 0x3f) -#define cap_num_fault_regs(c) ((((c) >> 40) & 0xff) + 1) -#define cap_pgsel_inv(c) (((c) >> 39) & 1) - -#define cap_super_page_val(c) (((c) >> 34) & 0xf) -#define cap_super_offset(c) (((find_first_bit(&cap_super_page_val(c), 4)) \ - * OFFSET_STRIDE) + 21) - -#define cap_fault_reg_offset(c) ((((c) >> 24) & 0x3ff) * 16) -#define cap_max_fault_reg_offset(c) \ - (cap_fault_reg_offset(c) + cap_num_fault_regs(c) * 16) - -#define cap_zlr(c) (((c) >> 22) & 1) -#define cap_isoch(c) (((c) >> 23) & 1) -#define cap_mgaw(c) ((((c) >> 16) & 0x3f) + 1) -#define cap_sagaw(c) (((c) >> 8) & 0x1f) -#define cap_caching_mode(c) (((c) >> 7) & 1) -#define cap_phmr(c) (((c) >> 6) & 1) -#define cap_plmr(c) (((c) >> 5) & 1) -#define cap_rwbf(c) (((c) >> 4) & 1) -#define cap_afl(c) (((c) >> 3) & 1) -#define cap_ndoms(c) (((unsigned long)1) << (4 + 2 * ((c) & 0x7))) -/* - * Extended Capability Register - */ - -#define ecap_rps(e) (((e) >> 49) & 0x1) -#define ecap_smpwc(e) (((e) >> 48) & 0x1) -#define ecap_flts(e) (((e) >> 47) & 0x1) -#define ecap_slts(e) (((e) >> 46) & 0x1) -#define ecap_slads(e) (((e) >> 45) & 0x1) -#define ecap_vcs(e) (((e) >> 44) & 0x1) -#define ecap_smts(e) (((e) >> 43) & 0x1) -#define ecap_dit(e) (((e) >> 41) & 0x1) -#define ecap_pds(e) (((e) >> 42) & 0x1) -#define ecap_pasid(e) (((e) >> 40) & 0x1) -#define ecap_pss(e) (((e) >> 35) & 0x1f) -#define ecap_eafs(e) (((e) >> 34) & 0x1) -#define ecap_nwfs(e) (((e) >> 33) & 0x1) -#define ecap_srs(e) (((e) >> 31) & 0x1) -#define ecap_ers(e) (((e) >> 30) & 0x1) -#define ecap_prs(e) (((e) >> 29) & 0x1) -#define ecap_broken_pasid(e) (((e) >> 28) & 0x1) -#define ecap_dis(e) (((e) >> 27) & 0x1) -#define ecap_nest(e) (((e) >> 26) & 0x1) -#define ecap_mts(e) (((e) >> 25) & 0x1) -#define ecap_ecs(e) (((e) >> 24) & 0x1) -#define ecap_iotlb_offset(e) ((((e) >> 8) & 0x3ff) * 16) -#define ecap_max_iotlb_offset(e) (ecap_iotlb_offset(e) + 16) -#define ecap_coherent(e) ((e) & 0x1) -#define ecap_qis(e) ((e) & 0x2) -#define ecap_pass_through(e) (((e) >> 6) & 0x1) -#define ecap_eim_support(e) (((e) >> 4) & 0x1) -#define ecap_ir_support(e) (((e) >> 3) & 0x1) -#define ecap_dev_iotlb_support(e) (((e) >> 2) & 0x1) -#define ecap_max_handle_mask(e) (((e) >> 20) & 0xf) -#define ecap_sc_support(e) (((e) >> 7) & 0x1) /* Snooping Control */ - -/* Virtual command interface capability */ -#define vccap_pasid(v) (((v) & DMA_VCS_PAS)) /* PASID allocation */ - -/* IOTLB_REG */ -#define DMA_TLB_FLUSH_GRANU_OFFSET 60 -#define DMA_TLB_GLOBAL_FLUSH (((u64)1) << 60) -#define DMA_TLB_DSI_FLUSH (((u64)2) << 60) -#define DMA_TLB_PSI_FLUSH (((u64)3) << 60) -#define DMA_TLB_IIRG(type) ((type >> 60) & 3) -#define DMA_TLB_IAIG(val) (((val) >> 57) & 3) -#define DMA_TLB_READ_DRAIN (((u64)1) << 49) -#define DMA_TLB_WRITE_DRAIN (((u64)1) << 48) -#define DMA_TLB_DID(id) (((u64)((id) & 0xffff)) << 32) -#define DMA_TLB_IVT (((u64)1) << 63) -#define DMA_TLB_IH_NONLEAF (((u64)1) << 6) -#define DMA_TLB_MAX_SIZE (0x3f) - -/* INVALID_DESC */ -#define DMA_CCMD_INVL_GRANU_OFFSET 61 -#define DMA_ID_TLB_GLOBAL_FLUSH (((u64)1) << 4) -#define DMA_ID_TLB_DSI_FLUSH (((u64)2) << 4) -#define DMA_ID_TLB_PSI_FLUSH (((u64)3) << 4) -#define DMA_ID_TLB_READ_DRAIN (((u64)1) << 7) -#define DMA_ID_TLB_WRITE_DRAIN (((u64)1) << 6) -#define DMA_ID_TLB_DID(id) (((u64)((id & 0xffff) << 16))) -#define DMA_ID_TLB_IH_NONLEAF (((u64)1) << 6) -#define DMA_ID_TLB_ADDR(addr) (addr) -#define DMA_ID_TLB_ADDR_MASK(mask) (mask) - -/* PMEN_REG */ -#define DMA_PMEN_EPM (((u32)1)<<31) -#define DMA_PMEN_PRS (((u32)1)<<0) - -/* GCMD_REG */ -#define DMA_GCMD_TE (((u32)1) << 31) -#define DMA_GCMD_SRTP (((u32)1) << 30) -#define DMA_GCMD_SFL (((u32)1) << 29) -#define DMA_GCMD_EAFL (((u32)1) << 28) -#define DMA_GCMD_WBF (((u32)1) << 27) -#define DMA_GCMD_QIE (((u32)1) << 26) -#define DMA_GCMD_SIRTP (((u32)1) << 24) -#define DMA_GCMD_IRE (((u32) 1) << 25) -#define DMA_GCMD_CFI (((u32) 1) << 23) - -/* GSTS_REG */ -#define DMA_GSTS_TES (((u32)1) << 31) -#define DMA_GSTS_RTPS (((u32)1) << 30) -#define DMA_GSTS_FLS (((u32)1) << 29) -#define DMA_GSTS_AFLS (((u32)1) << 28) -#define DMA_GSTS_WBFS (((u32)1) << 27) -#define DMA_GSTS_QIES (((u32)1) << 26) -#define DMA_GSTS_IRTPS (((u32)1) << 24) -#define DMA_GSTS_IRES (((u32)1) << 25) -#define DMA_GSTS_CFIS (((u32)1) << 23) - -/* DMA_RTADDR_REG */ -#define DMA_RTADDR_RTT (((u64)1) << 11) -#define DMA_RTADDR_SMT (((u64)1) << 10) - -/* CCMD_REG */ -#define DMA_CCMD_ICC (((u64)1) << 63) -#define DMA_CCMD_GLOBAL_INVL (((u64)1) << 61) -#define DMA_CCMD_DOMAIN_INVL (((u64)2) << 61) -#define DMA_CCMD_DEVICE_INVL (((u64)3) << 61) -#define DMA_CCMD_FM(m) (((u64)((m) & 0x3)) << 32) -#define DMA_CCMD_MASK_NOBIT 0 -#define DMA_CCMD_MASK_1BIT 1 -#define DMA_CCMD_MASK_2BIT 2 -#define DMA_CCMD_MASK_3BIT 3 -#define DMA_CCMD_SID(s) (((u64)((s) & 0xffff)) << 16) -#define DMA_CCMD_DID(d) ((u64)((d) & 0xffff)) - -/* FECTL_REG */ -#define DMA_FECTL_IM (((u32)1) << 31) - -/* FSTS_REG */ -#define DMA_FSTS_PFO (1 << 0) /* Primary Fault Overflow */ -#define DMA_FSTS_PPF (1 << 1) /* Primary Pending Fault */ -#define DMA_FSTS_IQE (1 << 4) /* Invalidation Queue Error */ -#define DMA_FSTS_ICE (1 << 5) /* Invalidation Completion Error */ -#define DMA_FSTS_ITE (1 << 6) /* Invalidation Time-out Error */ -#define DMA_FSTS_PRO (1 << 7) /* Page Request Overflow */ -#define dma_fsts_fault_record_index(s) (((s) >> 8) & 0xff) - -/* FRCD_REG, 32 bits access */ -#define DMA_FRCD_F (((u32)1) << 31) -#define dma_frcd_type(d) ((d >> 30) & 1) -#define dma_frcd_fault_reason(c) (c & 0xff) -#define dma_frcd_source_id(c) (c & 0xffff) -#define dma_frcd_pasid_value(c) (((c) >> 8) & 0xfffff) -#define dma_frcd_pasid_present(c) (((c) >> 31) & 1) -/* low 64 bit */ -#define dma_frcd_page_addr(d) (d & (((u64)-1) << PAGE_SHIFT)) - -/* PRS_REG */ -#define DMA_PRS_PPR ((u32)1) -#define DMA_PRS_PRO ((u32)2) - -#define DMA_VCS_PAS ((u64)1) - -#define IOMMU_WAIT_OP(iommu, offset, op, cond, sts) \ -do { \ - cycles_t start_time = get_cycles(); \ - while (1) { \ - sts = op(iommu->reg + offset); \ - if (cond) \ - break; \ - if (DMAR_OPERATION_TIMEOUT < (get_cycles() - start_time))\ - panic("DMAR hardware is malfunctioning\n"); \ - cpu_relax(); \ - } \ -} while (0) - -#define QI_LENGTH 256 /* queue length */ - -enum { - QI_FREE, - QI_IN_USE, - QI_DONE, - QI_ABORT -}; - -#define QI_CC_TYPE 0x1 -#define QI_IOTLB_TYPE 0x2 -#define QI_DIOTLB_TYPE 0x3 -#define QI_IEC_TYPE 0x4 -#define QI_IWD_TYPE 0x5 -#define QI_EIOTLB_TYPE 0x6 -#define QI_PC_TYPE 0x7 -#define QI_DEIOTLB_TYPE 0x8 -#define QI_PGRP_RESP_TYPE 0x9 -#define QI_PSTRM_RESP_TYPE 0xa - -#define QI_IEC_SELECTIVE (((u64)1) << 4) -#define QI_IEC_IIDEX(idx) (((u64)(idx & 0xffff) << 32)) -#define QI_IEC_IM(m) (((u64)(m & 0x1f) << 27)) - -#define QI_IWD_STATUS_DATA(d) (((u64)d) << 32) -#define QI_IWD_STATUS_WRITE (((u64)1) << 5) -#define QI_IWD_FENCE (((u64)1) << 6) -#define QI_IWD_PRQ_DRAIN (((u64)1) << 7) - -#define QI_IOTLB_DID(did) (((u64)did) << 16) -#define QI_IOTLB_DR(dr) (((u64)dr) << 7) -#define QI_IOTLB_DW(dw) (((u64)dw) << 6) -#define QI_IOTLB_GRAN(gran) (((u64)gran) >> (DMA_TLB_FLUSH_GRANU_OFFSET-4)) -#define QI_IOTLB_ADDR(addr) (((u64)addr) & VTD_PAGE_MASK) -#define QI_IOTLB_IH(ih) (((u64)ih) << 6) -#define QI_IOTLB_AM(am) (((u8)am) & 0x3f) - -#define QI_CC_FM(fm) (((u64)fm) << 48) -#define QI_CC_SID(sid) (((u64)sid) << 32) -#define QI_CC_DID(did) (((u64)did) << 16) -#define QI_CC_GRAN(gran) (((u64)gran) >> (DMA_CCMD_INVL_GRANU_OFFSET-4)) - -#define QI_DEV_IOTLB_SID(sid) ((u64)((sid) & 0xffff) << 32) -#define QI_DEV_IOTLB_QDEP(qdep) (((qdep) & 0x1f) << 16) -#define QI_DEV_IOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK) -#define QI_DEV_IOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | \ - ((u64)((pfsid >> 4) & 0xfff) << 52)) -#define QI_DEV_IOTLB_SIZE 1 -#define QI_DEV_IOTLB_MAX_INVS 32 - -#define QI_PC_PASID(pasid) (((u64)pasid) << 32) -#define QI_PC_DID(did) (((u64)did) << 16) -#define QI_PC_GRAN(gran) (((u64)gran) << 4) - -/* PASID cache invalidation granu */ -#define QI_PC_ALL_PASIDS 0 -#define QI_PC_PASID_SEL 1 -#define QI_PC_GLOBAL 3 - -#define QI_EIOTLB_ADDR(addr) ((u64)(addr) & VTD_PAGE_MASK) -#define QI_EIOTLB_IH(ih) (((u64)ih) << 6) -#define QI_EIOTLB_AM(am) (((u64)am) & 0x3f) -#define QI_EIOTLB_PASID(pasid) (((u64)pasid) << 32) -#define QI_EIOTLB_DID(did) (((u64)did) << 16) -#define QI_EIOTLB_GRAN(gran) (((u64)gran) << 4) - -/* QI Dev-IOTLB inv granu */ -#define QI_DEV_IOTLB_GRAN_ALL 1 -#define QI_DEV_IOTLB_GRAN_PASID_SEL 0 - -#define QI_DEV_EIOTLB_ADDR(a) ((u64)(a) & VTD_PAGE_MASK) -#define QI_DEV_EIOTLB_SIZE (((u64)1) << 11) -#define QI_DEV_EIOTLB_PASID(p) ((u64)((p) & 0xfffff) << 32) -#define QI_DEV_EIOTLB_SID(sid) ((u64)((sid) & 0xffff) << 16) -#define QI_DEV_EIOTLB_QDEP(qd) ((u64)((qd) & 0x1f) << 4) -#define QI_DEV_EIOTLB_PFSID(pfsid) (((u64)(pfsid & 0xf) << 12) | \ - ((u64)((pfsid >> 4) & 0xfff) << 52)) -#define QI_DEV_EIOTLB_MAX_INVS 32 - -/* Page group response descriptor QW0 */ -#define QI_PGRP_PASID_P(p) (((u64)(p)) << 4) -#define QI_PGRP_PDP(p) (((u64)(p)) << 5) -#define QI_PGRP_RESP_CODE(res) (((u64)(res)) << 12) -#define QI_PGRP_DID(rid) (((u64)(rid)) << 16) -#define QI_PGRP_PASID(pasid) (((u64)(pasid)) << 32) - -/* Page group response descriptor QW1 */ -#define QI_PGRP_LPIG(x) (((u64)(x)) << 2) -#define QI_PGRP_IDX(idx) (((u64)(idx)) << 3) - - -#define QI_RESP_SUCCESS 0x0 -#define QI_RESP_INVALID 0x1 -#define QI_RESP_FAILURE 0xf - -#define QI_GRAN_NONG_PASID 2 -#define QI_GRAN_PSI_PASID 3 - -#define qi_shift(iommu) (DMAR_IQ_SHIFT + !!ecap_smts((iommu)->ecap)) - -struct qi_desc { - u64 qw0; - u64 qw1; - u64 qw2; - u64 qw3; -}; - -struct q_inval { - raw_spinlock_t q_lock; - void *desc; /* invalidation queue */ - int *desc_status; /* desc status */ - int free_head; /* first free entry */ - int free_tail; /* last free entry */ - int free_cnt; -}; - -struct dmar_pci_notify_info; - -#ifdef CONFIG_IRQ_REMAP -/* 1MB - maximum possible interrupt remapping table size */ -#define INTR_REMAP_PAGE_ORDER 8 -#define INTR_REMAP_TABLE_REG_SIZE 0xf -#define INTR_REMAP_TABLE_REG_SIZE_MASK 0xf - -#define INTR_REMAP_TABLE_ENTRIES 65536 - -struct irq_domain; - -struct ir_table { - struct irte *base; - unsigned long *bitmap; -}; - -void intel_irq_remap_add_device(struct dmar_pci_notify_info *info); -#else -static inline void -intel_irq_remap_add_device(struct dmar_pci_notify_info *info) { } -#endif - -struct iommu_flush { - void (*flush_context)(struct intel_iommu *iommu, u16 did, u16 sid, - u8 fm, u64 type); - void (*flush_iotlb)(struct intel_iommu *iommu, u16 did, u64 addr, - unsigned int size_order, u64 type); -}; - -enum { - SR_DMAR_FECTL_REG, - SR_DMAR_FEDATA_REG, - SR_DMAR_FEADDR_REG, - SR_DMAR_FEUADDR_REG, - MAX_SR_DMAR_REGS -}; - -#define VTD_FLAG_TRANS_PRE_ENABLED (1 << 0) -#define VTD_FLAG_IRQ_REMAP_PRE_ENABLED (1 << 1) -#define VTD_FLAG_SVM_CAPABLE (1 << 2) - -extern int intel_iommu_sm; -extern spinlock_t device_domain_lock; - -#define sm_supported(iommu) (intel_iommu_sm && ecap_smts((iommu)->ecap)) -#define pasid_supported(iommu) (sm_supported(iommu) && \ - ecap_pasid((iommu)->ecap)) - -struct pasid_entry; -struct pasid_state_entry; -struct page_req_dsc; - -/* - * 0: Present - * 1-11: Reserved - * 12-63: Context Ptr (12 - (haw-1)) - * 64-127: Reserved - */ -struct root_entry { - u64 lo; - u64 hi; -}; - -/* - * low 64 bits: - * 0: present - * 1: fault processing disable - * 2-3: translation type - * 12-63: address space root - * high 64 bits: - * 0-2: address width - * 3-6: aval - * 8-23: domain id - */ -struct context_entry { - u64 lo; - u64 hi; -}; - -/* - * When VT-d works in the scalable mode, it allows DMA translation to - * happen through either first level or second level page table. This - * bit marks that the DMA translation for the domain goes through the - * first level page table, otherwise, it goes through the second level. - */ -#define DOMAIN_FLAG_USE_FIRST_LEVEL BIT(1) - -struct dmar_domain { - int nid; /* node id */ - - unsigned int iommu_refcnt[DMAR_UNITS_SUPPORTED]; - /* Refcount of devices per iommu */ - - - u16 iommu_did[DMAR_UNITS_SUPPORTED]; - /* Domain ids per IOMMU. Use u16 since - * domain ids are 16 bit wide according - * to VT-d spec, section 9.3 */ - - u8 has_iotlb_device: 1; - u8 iommu_coherency: 1; /* indicate coherency of iommu access */ - u8 force_snooping : 1; /* Create IOPTEs with snoop control */ - u8 set_pte_snp:1; - - struct list_head devices; /* all devices' list */ - struct iova_domain iovad; /* iova's that belong to this domain */ - - struct dma_pte *pgd; /* virtual address */ - int gaw; /* max guest address width */ - - /* adjusted guest address width, 0 is level 2 30-bit */ - int agaw; - - int flags; /* flags to find out type of domain */ - int iommu_superpage;/* Level of superpages supported: - 0 == 4KiB (no superpages), 1 == 2MiB, - 2 == 1GiB, 3 == 512GiB, 4 == 1TiB */ - u64 max_addr; /* maximum mapped address */ - - struct iommu_domain domain; /* generic domain data structure for - iommu core */ -}; - -struct intel_iommu { - void __iomem *reg; /* Pointer to hardware regs, virtual addr */ - u64 reg_phys; /* physical address of hw register set */ - u64 reg_size; /* size of hw register set */ - u64 cap; - u64 ecap; - u64 vccap; - u32 gcmd; /* Holds TE, EAFL. Don't need SRTP, SFL, WBF */ - raw_spinlock_t register_lock; /* protect register handling */ - int seq_id; /* sequence id of the iommu */ - int agaw; /* agaw of this iommu */ - int msagaw; /* max sagaw of this iommu */ - unsigned int irq, pr_irq; - u16 segment; /* PCI segment# */ - unsigned char name[13]; /* Device Name */ - -#ifdef CONFIG_INTEL_IOMMU - unsigned long *domain_ids; /* bitmap of domains */ - spinlock_t lock; /* protect context, domain ids */ - struct root_entry *root_entry; /* virtual address */ - - struct iommu_flush flush; -#endif -#ifdef CONFIG_INTEL_IOMMU_SVM - struct page_req_dsc *prq; - unsigned char prq_name[16]; /* Name for PRQ interrupt */ - struct completion prq_complete; - struct ioasid_allocator_ops pasid_allocator; /* Custom allocator for PASIDs */ -#endif - struct iopf_queue *iopf_queue; - unsigned char iopfq_name[16]; - struct q_inval *qi; /* Queued invalidation info */ - u32 *iommu_state; /* Store iommu states between suspend and resume.*/ - -#ifdef CONFIG_IRQ_REMAP - struct ir_table *ir_table; /* Interrupt remapping info */ - struct irq_domain *ir_domain; - struct irq_domain *ir_msi_domain; -#endif - struct iommu_device iommu; /* IOMMU core code handle */ - int node; - u32 flags; /* Software defined flags */ - - struct dmar_drhd_unit *drhd; - void *perf_statistic; -}; - -/* PCI domain-device relationship */ -struct device_domain_info { - struct list_head link; /* link to domain siblings */ - struct list_head global; /* link to global list */ - u32 segment; /* PCI segment number */ - u8 bus; /* PCI bus number */ - u8 devfn; /* PCI devfn number */ - u16 pfsid; /* SRIOV physical function source ID */ - u8 pasid_supported:3; - u8 pasid_enabled:1; - u8 pri_supported:1; - u8 pri_enabled:1; - u8 ats_supported:1; - u8 ats_enabled:1; - u8 ats_qdep; - struct device *dev; /* it's NULL for PCIe-to-PCI bridge */ - struct intel_iommu *iommu; /* IOMMU used by this device */ - struct dmar_domain *domain; /* pointer to domain */ - struct pasid_table *pasid_table; /* pasid table */ -}; - -static inline void __iommu_flush_cache( - struct intel_iommu *iommu, void *addr, int size) -{ - if (!ecap_coherent(iommu->ecap)) - clflush_cache_range(addr, size); -} - -/* Convert generic struct iommu_domain to private struct dmar_domain */ -static inline struct dmar_domain *to_dmar_domain(struct iommu_domain *dom) -{ - return container_of(dom, struct dmar_domain, domain); -} - -/* - * 0: readable - * 1: writable - * 2-6: reserved - * 7: super page - * 8-10: available - * 11: snoop behavior - * 12-63: Host physical address - */ -struct dma_pte { - u64 val; -}; - -static inline void dma_clear_pte(struct dma_pte *pte) -{ - pte->val = 0; -} - -static inline u64 dma_pte_addr(struct dma_pte *pte) -{ -#ifdef CONFIG_64BIT - return pte->val & VTD_PAGE_MASK & (~DMA_FL_PTE_XD); -#else - /* Must have a full atomic 64-bit read */ - return __cmpxchg64(&pte->val, 0ULL, 0ULL) & - VTD_PAGE_MASK & (~DMA_FL_PTE_XD); -#endif -} - -static inline bool dma_pte_present(struct dma_pte *pte) -{ - return (pte->val & 3) != 0; -} - -static inline bool dma_pte_superpage(struct dma_pte *pte) -{ - return (pte->val & DMA_PTE_LARGE_PAGE); -} - -static inline bool first_pte_in_page(struct dma_pte *pte) -{ - return IS_ALIGNED((unsigned long)pte, VTD_PAGE_SIZE); -} - -static inline int nr_pte_to_next_page(struct dma_pte *pte) -{ - return first_pte_in_page(pte) ? BIT_ULL(VTD_STRIDE_SHIFT) : - (struct dma_pte *)ALIGN((unsigned long)pte, VTD_PAGE_SIZE) - pte; -} - -extern struct dmar_drhd_unit * dmar_find_matched_drhd_unit(struct pci_dev *dev); - -extern int dmar_enable_qi(struct intel_iommu *iommu); -extern void dmar_disable_qi(struct intel_iommu *iommu); -extern int dmar_reenable_qi(struct intel_iommu *iommu); -extern void qi_global_iec(struct intel_iommu *iommu); - -extern void qi_flush_context(struct intel_iommu *iommu, u16 did, u16 sid, - u8 fm, u64 type); -extern void qi_flush_iotlb(struct intel_iommu *iommu, u16 did, u64 addr, - unsigned int size_order, u64 type); -extern void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, - u16 qdep, u64 addr, unsigned mask); - -void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr, - unsigned long npages, bool ih); - -void qi_flush_dev_iotlb_pasid(struct intel_iommu *iommu, u16 sid, u16 pfsid, - u32 pasid, u16 qdep, u64 addr, - unsigned int size_order); -void qi_flush_pasid_cache(struct intel_iommu *iommu, u16 did, u64 granu, - u32 pasid); - -int qi_submit_sync(struct intel_iommu *iommu, struct qi_desc *desc, - unsigned int count, unsigned long options); -/* - * Options used in qi_submit_sync: - * QI_OPT_WAIT_DRAIN - Wait for PRQ drain completion, spec 6.5.2.8. - */ -#define QI_OPT_WAIT_DRAIN BIT(0) - -extern int dmar_ir_support(void); - -void *alloc_pgtable_page(int node); -void free_pgtable_page(void *vaddr); -struct intel_iommu *domain_get_iommu(struct dmar_domain *domain); -void iommu_flush_write_buffer(struct intel_iommu *iommu); -int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev); -struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn); - -#ifdef CONFIG_INTEL_IOMMU_SVM -extern void intel_svm_check(struct intel_iommu *iommu); -extern int intel_svm_enable_prq(struct intel_iommu *iommu); -extern int intel_svm_finish_prq(struct intel_iommu *iommu); -struct iommu_sva *intel_svm_bind(struct device *dev, struct mm_struct *mm, - void *drvdata); -void intel_svm_unbind(struct iommu_sva *handle); -u32 intel_svm_get_pasid(struct iommu_sva *handle); -int intel_svm_page_response(struct device *dev, struct iommu_fault_event *evt, - struct iommu_page_response *msg); - -struct intel_svm_dev { - struct list_head list; - struct rcu_head rcu; - struct device *dev; - struct intel_iommu *iommu; - struct iommu_sva sva; - unsigned long prq_seq_number; - u32 pasid; - int users; - u16 did; - u16 dev_iotlb:1; - u16 sid, qdep; -}; - -struct intel_svm { - struct mmu_notifier notifier; - struct mm_struct *mm; - - unsigned int flags; - u32 pasid; - struct list_head devs; -}; -#else -static inline void intel_svm_check(struct intel_iommu *iommu) {} -#endif - -#ifdef CONFIG_INTEL_IOMMU_DEBUGFS -void intel_iommu_debugfs_init(void); -#else -static inline void intel_iommu_debugfs_init(void) {} -#endif /* CONFIG_INTEL_IOMMU_DEBUGFS */ - -extern const struct attribute_group *intel_iommu_groups[]; -bool context_present(struct context_entry *context); -struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus, - u8 devfn, int alloc); - -extern const struct iommu_ops intel_iommu_ops; - -#ifdef CONFIG_INTEL_IOMMU -extern int iommu_calculate_agaw(struct intel_iommu *iommu); -extern int iommu_calculate_max_sagaw(struct intel_iommu *iommu); -extern int dmar_disabled; -extern int intel_iommu_enabled; -#else -static inline int iommu_calculate_agaw(struct intel_iommu *iommu) -{ - return 0; -} -static inline int iommu_calculate_max_sagaw(struct intel_iommu *iommu) -{ - return 0; -} -#define dmar_disabled (1) -#define intel_iommu_enabled (0) -#endif - -static inline const char *decode_prq_descriptor(char *str, size_t size, - u64 dw0, u64 dw1, u64 dw2, u64 dw3) -{ - char *buf = str; - int bytes; - - bytes = snprintf(buf, size, - "rid=0x%llx addr=0x%llx %c%c%c%c%c pasid=0x%llx index=0x%llx", - FIELD_GET(GENMASK_ULL(31, 16), dw0), - FIELD_GET(GENMASK_ULL(63, 12), dw1), - dw1 & BIT_ULL(0) ? 'r' : '-', - dw1 & BIT_ULL(1) ? 'w' : '-', - dw0 & BIT_ULL(52) ? 'x' : '-', - dw0 & BIT_ULL(53) ? 'p' : '-', - dw1 & BIT_ULL(2) ? 'l' : '-', - FIELD_GET(GENMASK_ULL(51, 32), dw0), - FIELD_GET(GENMASK_ULL(11, 3), dw1)); - - /* Private Data */ - if (dw0 & BIT_ULL(9)) { - size -= bytes; - buf += bytes; - snprintf(buf, size, " private=0x%llx/0x%llx\n", dw2, dw3); - } - - return str; -} - -#endif -- cgit From 25357900f4e67094dd09b7dcb8a5f47c3f5a159d Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Tue, 12 Jul 2022 08:09:08 +0800 Subject: iommu/vt-d: Make DMAR_UNITS_SUPPORTED default 1024 If the available hardware exceeds DMAR_UNITS_SUPPORTED (previously set to MAX_IO_APICS, or 128), it causes these messages: "DMAR: Failed to allocate seq_id", "DMAR: Parse DMAR table failure.", and "x2apic: IRQ remapping doesn't support X2APIC mode x2apic disabled"; and the system fails to boot properly. To support up to 64 sockets with 10 DMAR units each (640), make the value of DMAR_UNITS_SUPPORTED default 1024. Signed-off-by: Steve Wahl Signed-off-by: Lu Baolu Reviewed-by: Kevin Tian Reviewed-by: Steve Wahl Link: https://lore.kernel.org/linux-iommu/20220615183650.32075-1-steve.wahl@hpe.com/ Link: https://lore.kernel.org/r/20220702015610.2849494-7-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/dmar.h | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dmar.h b/include/linux/dmar.h index cbd714a198a0..d81a51978d01 100644 --- a/include/linux/dmar.h +++ b/include/linux/dmar.h @@ -18,11 +18,7 @@ struct acpi_dmar_header; -#ifdef CONFIG_X86 -# define DMAR_UNITS_SUPPORTED MAX_IO_APICS -#else -# define DMAR_UNITS_SUPPORTED 64 -#endif +#define DMAR_UNITS_SUPPORTED 1024 /* DMAR Flags */ #define DMAR_INTR_REMAP 0x1 -- cgit From fb2accadaa9427a374fa9f482ea24ca731f675ba Mon Sep 17 00:00:00 2001 From: Brijesh Singh Date: Wed, 13 Jul 2022 17:56:48 -0500 Subject: iommu/amd: Introduce function to check and enable SNP To support SNP, IOMMU needs to be enabled, and prohibits IOMMU configurations where DTE[Mode]=0, which means it cannot be supported with IOMMU passthrough domain (a.k.a IOMMU_DOMAIN_IDENTITY), and when AMD IOMMU driver is configured to not use the IOMMU host (v1) page table. Otherwise, RMP table initialization could cause the system to crash. The request to enable SNP support in IOMMU must be done before PCI initialization state of the IOMMU driver because enabling SNP affects how IOMMU driver sets up IOMMU data structures (i.e. DTE). Unlike other IOMMU features, SNP feature does not have an enable bit in the IOMMU control register. Instead, the IOMMU driver introduces an amd_iommu_snp_en variable to track enabling state of SNP. Introduce amd_iommu_snp_enable() for other drivers to request enabling the SNP support in IOMMU, which checks all prerequisites and determines if the feature can be safely enabled. Please see the IOMMU spec section 2.12 for further details. Reviewed-by: Robin Murphy Co-developed-by: Suravee Suthikulpanit Signed-off-by: Suravee Suthikulpanit Signed-off-by: Brijesh Singh Link: https://lore.kernel.org/r/20220713225651.20758-7-suravee.suthikulpanit@amd.com Signed-off-by: Joerg Roedel --- include/linux/amd-iommu.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/amd-iommu.h b/include/linux/amd-iommu.h index 58e6c3806c09..953e6f12fa1c 100644 --- a/include/linux/amd-iommu.h +++ b/include/linux/amd-iommu.h @@ -206,4 +206,8 @@ int amd_iommu_pc_get_reg(struct amd_iommu *iommu, u8 bank, u8 cntr, u8 fxn, u64 *value); struct amd_iommu *get_amd_iommu(unsigned int idx); +#ifdef CONFIG_AMD_MEM_ENCRYPT +int amd_iommu_snp_enable(void); +#endif + #endif /* _ASM_X86_AMD_IOMMU_H */ -- cgit From 1e0b3b0b6cb5e04bcb25fbe4164180d64e3ace07 Mon Sep 17 00:00:00 2001 From: Ilan Peer Date: Wed, 27 Apr 2022 18:02:10 +0300 Subject: wifi: mac80211: Align with Draft P802.11be_D1.5 Align the mac80211 implementation with P802.11be_D1.5. Signed-off-by: Ilan Peer Signed-off-by: Johannes Berg --- include/linux/ieee80211.h | 52 ++++++++++++++++++++++++++++++++--------------- 1 file changed, 36 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index f386f9ed41f3..e75c73ca11ec 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -2046,25 +2046,38 @@ struct ieee80211_eht_cap_elem { u8 optional[]; } __packed; +#define IEEE80211_EHT_OPER_INFO_PRESENT 0x1 +#define IEEE80211_EHT_OPER_DISABLED_SUBCHANNEL_BITMAP_PRESENT 0x2 + /** * struct ieee80211_eht_operation - eht operation element * * This structure is the "EHT Operation Element" fields as - * described in P802.11be_D1.4 section 9.4.2.311 + * described in P802.11be_D1.5 section 9.4.2.311 * - * FIXME: The spec is unclear how big the fields are, and doesn't - * indicate the "Disabled Subchannel Bitmap Present" in the - * structure (Figure 9-1002a) at all ... + * @params: EHT operation element parameters. See &IEEE80211_EHT_OPER_* + * @optional: optional parts */ struct ieee80211_eht_operation { - u8 chan_width; - u8 ccfs; - u8 present_bm; - - u8 disable_subchannel_bitmap[]; + u8 params; + u8 optional[]; } __packed; -#define IEEE80211_EHT_OPER_DISABLED_SUBCHANNEL_BITMAP_PRESENT 0x1 +/** + * struct ieee80211_eht_operation_info - eht operation information + * + * @control: EHT operation information control. + * @ccfs0: defines a channel center frequency for a 20, 40, 80, 160, or 320 MHz + * EHT BSS. + * @ccfs1: defines a channel center frequency for a 160 or 320 MHz EHT BSS. + * @optional: optional parts + */ +struct ieee80211_eht_operation_info { + u8 control; + u8 ccfs0; + u8 ccfs1; + u8 optional[]; +} __packed; /* 802.11ac VHT Capabilities */ #define IEEE80211_VHT_CAP_MAX_MPDU_LENGTH_3895 0x00000000 @@ -2773,10 +2786,12 @@ ieee80211_he_spr_size(const u8 *he_spr_ie) #define IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE2 0x08 #define IEEE80211_EHT_MAC_CAP0_RESTRICTED_TWT 0x10 #define IEEE80211_EHT_MAC_CAP0_SCS_TRAFFIC_DESC 0x20 -#define IEEE80211_EHT_MAC_CAP0_MAX_AMPDU_LEN_MASK 0xc0 -#define IEEE80211_EHT_MAC_CAP0_MAX_AMPDU_LEN_3895 0 -#define IEEE80211_EHT_MAC_CAP0_MAX_AMPDU_LEN_7991 1 -#define IEEE80211_EHT_MAC_CAP0_MAX_AMPDU_LEN_11454 2 +#define IEEE80211_EHT_MAC_CAP0_MAX_MPDU_LEN_MASK 0xc0 +#define IEEE80211_EHT_MAC_CAP0_MAX_MPDU_LEN_3895 0 +#define IEEE80211_EHT_MAC_CAP0_MAX_MPDU_LEN_7991 1 +#define IEEE80211_EHT_MAC_CAP0_MAX_MPDU_LEN_11454 2 + +#define IEEE80211_EHT_MAC_CAP1_MAX_AMPDU_LEN_MASK 0x01 /* EHT PHY capabilities as defined in P802.11be_D1.4 section 9.4.2.313.3 */ #define IEEE80211_EHT_PHY_CAP0_320MHZ_IN_6GHZ 0x02 @@ -2949,8 +2964,13 @@ ieee80211_eht_oper_size_ok(const u8 *data, u8 len) if (len < needed) return false; - if (elem->present_bm & IEEE80211_EHT_OPER_DISABLED_SUBCHANNEL_BITMAP_PRESENT) - needed += 2; + if (elem->params & IEEE80211_EHT_OPER_INFO_PRESENT) { + needed += 3; + + if (elem->params & + IEEE80211_EHT_OPER_DISABLED_SUBCHANNEL_BITMAP_PRESENT) + needed += 2; + } return len >= needed; } -- cgit From 062e8e02dfd43c94b0d601c071e8a5c50bed830e Mon Sep 17 00:00:00 2001 From: Ilan Peer Date: Wed, 1 Jun 2022 17:43:34 +0300 Subject: wifi: mac80211: Align with Draft P802.11be_D2.0 Align the mac80211 implementation with P802.11be_D2.0. Signed-off-by: Ilan Peer Signed-off-by: Johannes Berg --- include/linux/ieee80211.h | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index e75c73ca11ec..cca564372d16 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -2020,7 +2020,7 @@ struct ieee80211_eht_mcs_nss_supp_bw { * struct ieee80211_eht_cap_elem_fixed - EHT capabilities fixed data * * This structure is the "EHT Capabilities element" fixed fields as - * described in P802.11be_D1.4 section 9.4.2.313. + * described in P802.11be_D2.0 section 9.4.2.313. * * @mac_cap_info: MAC capabilities, see IEEE80211_EHT_MAC_CAP* * @phy_cap_info: PHY capabilities, see IEEE80211_EHT_PHY_CAP* @@ -2046,20 +2046,27 @@ struct ieee80211_eht_cap_elem { u8 optional[]; } __packed; -#define IEEE80211_EHT_OPER_INFO_PRESENT 0x1 -#define IEEE80211_EHT_OPER_DISABLED_SUBCHANNEL_BITMAP_PRESENT 0x2 +#define IEEE80211_EHT_OPER_INFO_PRESENT 0x01 +#define IEEE80211_EHT_OPER_DISABLED_SUBCHANNEL_BITMAP_PRESENT 0x02 +#define IEEE80211_EHT_OPER_EHT_DEF_PE_DURATION 0x04 +#define IEEE80211_EHT_OPER_GROUP_ADDRESSED_BU_IND_LIMIT 0x08 +#define IEEE80211_EHT_OPER_GROUP_ADDRESSED_BU_IND_EXP_MASK 0x30 /** * struct ieee80211_eht_operation - eht operation element * * This structure is the "EHT Operation Element" fields as - * described in P802.11be_D1.5 section 9.4.2.311 + * described in P802.11be_D2.0 section 9.4.2.311 * * @params: EHT operation element parameters. See &IEEE80211_EHT_OPER_* + * @basic_mcs_nss: indicates the EHT-MCSs for each number of spatial streams in + * EHT PPDUs that are supported by all EHT STAs in the BSS in transmit and + * receive. * @optional: optional parts */ struct ieee80211_eht_operation { u8 params; + __le32 basic_mcs_nss; u8 optional[]; } __packed; @@ -2779,8 +2786,8 @@ ieee80211_he_spr_size(const u8 *he_spr_ie) #define S1G_OPER_CH_WIDTH_PRIMARY_1MHZ BIT(0) #define S1G_OPER_CH_WIDTH_OPER GENMASK(4, 1) -/* EHT MAC capabilities as defined in P802.11be_D1.4 section 9.4.2.313.2 */ -#define IEEE80211_EHT_MAC_CAP0_NSEP_PRIO_ACCESS 0x01 +/* EHT MAC capabilities as defined in P802.11be_D2.0 section 9.4.2.313.2 */ +#define IEEE80211_EHT_MAC_CAP0_EPCS_PRIO_ACCESS 0x01 #define IEEE80211_EHT_MAC_CAP0_OM_CONTROL 0x02 #define IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE1 0x04 #define IEEE80211_EHT_MAC_CAP0_TRIG_TXOP_SHARING_MODE2 0x08 @@ -2793,7 +2800,7 @@ ieee80211_he_spr_size(const u8 *he_spr_ie) #define IEEE80211_EHT_MAC_CAP1_MAX_AMPDU_LEN_MASK 0x01 -/* EHT PHY capabilities as defined in P802.11be_D1.4 section 9.4.2.313.3 */ +/* EHT PHY capabilities as defined in P802.11be_D2.0 section 9.4.2.313.3 */ #define IEEE80211_EHT_PHY_CAP0_320MHZ_IN_6GHZ 0x02 #define IEEE80211_EHT_PHY_CAP0_242_TONE_RU_GT20MHZ 0x04 #define IEEE80211_EHT_PHY_CAP0_NDP_4_EHT_LFT_32_GI 0x08 @@ -2858,7 +2865,7 @@ ieee80211_he_spr_size(const u8 *he_spr_ie) #define IEEE80211_EHT_PHY_CAP8_RX_4096QAM_WIDER_BW_DL_OFDMA 0x02 /* - * EHT operation channel width as defined in P802.11be_D1.4 section 9.4.2.311 + * EHT operation channel width as defined in P802.11be_D2.0 section 9.4.2.311 */ #define IEEE80211_EHT_OPER_CHAN_WIDTH 0x7 #define IEEE80211_EHT_OPER_CHAN_WIDTH_20MHZ 0 -- cgit From 93064e15c8a3a8394319a11b8037666e4b7d653d Mon Sep 17 00:00:00 2001 From: Stefan Binding Date: Thu, 7 Jul 2022 16:10:36 +0100 Subject: ACPI: utils: Add api to read _SUB from ACPI Add a wrapper function to read the _SUB string from ACPI. Signed-off-by: Stefan Binding Acked-by: Rafael J. Wysocki Link: https://lore.kernel.org/r/20220707151037.3901050-2-sbinding@opensource.cirrus.com Signed-off-by: Mark Brown --- include/linux/acpi.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 4f82a5bc6d98..968a187f14f0 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -762,6 +762,7 @@ static inline u64 acpi_arch_get_root_pointer(void) #endif int acpi_get_local_address(acpi_handle handle, u32 *addr); +const char *acpi_get_subsystem_id(acpi_handle handle); #else /* !CONFIG_ACPI */ @@ -1023,6 +1024,11 @@ static inline int acpi_get_local_address(acpi_handle handle, u32 *addr) return -ENODEV; } +static inline const char *acpi_get_subsystem_id(acpi_handle handle) +{ + return ERR_PTR(-ENODEV); +} + static inline int acpi_register_wakeup_handler(int wake_irq, bool (*wakeup)(void *context), void *context) { -- cgit From 28a6ed8e39f77f6ac613ec9b7461aa75e85fa79a Mon Sep 17 00:00:00 2001 From: Prashant Malani Date: Mon, 11 Jul 2022 07:22:57 +0000 Subject: platform/chrome: Add Type-C mux set command definitions Copy EC header definitions for the USB Type-C Mux control command from the EC code base. Also pull in "TBT_UFP_REPLY" definitions, since that is the prior entry in the enum. These headers are already present in the EC code base. [1] [1] https://chromium.googlesource.com/chromiumos/platform/ec/+/b80f85a94a423273c1638ef7b662c56931a138dd/include/ec_commands.h Signed-off-by: Prashant Malani Link: https://lore.kernel.org/r/20220711072333.2064341-4-pmalani@chromium.org Signed-off-by: Greg Kroah-Hartman --- include/linux/platform_data/cros_ec_commands.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/cros_ec_commands.h b/include/linux/platform_data/cros_ec_commands.h index 8cfa8cfca77e..a3945c5e7f50 100644 --- a/include/linux/platform_data/cros_ec_commands.h +++ b/include/linux/platform_data/cros_ec_commands.h @@ -5722,8 +5722,21 @@ enum typec_control_command { TYPEC_CONTROL_COMMAND_EXIT_MODES, TYPEC_CONTROL_COMMAND_CLEAR_EVENTS, TYPEC_CONTROL_COMMAND_ENTER_MODE, + TYPEC_CONTROL_COMMAND_TBT_UFP_REPLY, + TYPEC_CONTROL_COMMAND_USB_MUX_SET, }; +/* Replies the AP may specify to the TBT EnterMode command as a UFP */ +enum typec_tbt_ufp_reply { + TYPEC_TBT_UFP_REPLY_NAK, + TYPEC_TBT_UFP_REPLY_ACK, +}; + +struct typec_usb_mux_set { + uint8_t mux_index; /* Index of the mux to set in the chain */ + uint8_t mux_flags; /* USB_PD_MUX_*-encoded USB mux state to set */ +} __ec_align1; + struct ec_params_typec_control { uint8_t port; uint8_t command; /* enum typec_control_command */ @@ -5737,6 +5750,8 @@ struct ec_params_typec_control { union { uint32_t clear_events_mask; uint8_t mode_to_enter; /* enum typec_mode */ + uint8_t tbt_ufp_reply; /* enum typec_tbt_ufp_reply */ + struct typec_usb_mux_set mux_params; uint8_t placeholder[128]; }; } __ec_align1; @@ -5815,6 +5830,9 @@ enum tcpc_cc_polarity { #define PD_STATUS_EVENT_SOP_DISC_DONE BIT(0) #define PD_STATUS_EVENT_SOP_PRIME_DISC_DONE BIT(1) #define PD_STATUS_EVENT_HARD_RESET BIT(2) +#define PD_STATUS_EVENT_DISCONNECTED BIT(3) +#define PD_STATUS_EVENT_MUX_0_SET_DONE BIT(4) +#define PD_STATUS_EVENT_MUX_1_SET_DONE BIT(5) struct ec_params_typec_status { uint8_t port; -- cgit From 4dea97f8636d0514befc9fc5cf342b351b7d0e20 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Fri, 1 Jul 2022 05:54:25 -0700 Subject: lib/bitmap: change type of bitmap_weight to unsigned long bitmap_weight() doesn't return negative values, so change it's type to unsigned long. It may help compiler to generate better code and catch bugs. Signed-off-by: Yury Norov --- include/linux/bitmap.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index e1a438bdda52..035d4ac66641 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -163,7 +163,7 @@ bool __bitmap_intersects(const unsigned long *bitmap1, const unsigned long *bitmap2, unsigned int nbits); bool __bitmap_subset(const unsigned long *bitmap1, const unsigned long *bitmap2, unsigned int nbits); -int __bitmap_weight(const unsigned long *bitmap, unsigned int nbits); +unsigned long __bitmap_weight(const unsigned long *bitmap, unsigned int nbits); void __bitmap_set(unsigned long *map, unsigned int start, int len); void __bitmap_clear(unsigned long *map, unsigned int start, int len); @@ -431,7 +431,8 @@ static inline bool bitmap_full(const unsigned long *src, unsigned int nbits) return find_first_zero_bit(src, nbits) == nbits; } -static __always_inline int bitmap_weight(const unsigned long *src, unsigned int nbits) +static __always_inline +unsigned long bitmap_weight(const unsigned long *src, unsigned int nbits) { if (small_const_nbits(nbits)) return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits)); -- cgit From cb32c285cc10e428589194e30233d673e7c23c72 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Fri, 1 Jul 2022 05:54:26 -0700 Subject: cpumask: change return types to bool where appropriate Some cpumask functions have integer return types where return values are naturally booleans. Signed-off-by: Yury Norov --- include/linux/cpumask.h | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index fe29ac7cc469..b54e27d9da6b 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -372,9 +372,9 @@ static __always_inline void __cpumask_clear_cpu(int cpu, struct cpumask *dstp) * @cpu: cpu number (< nr_cpu_ids) * @cpumask: the cpumask pointer * - * Returns 1 if @cpu is set in @cpumask, else returns 0 + * Returns true if @cpu is set in @cpumask, else returns false */ -static __always_inline int cpumask_test_cpu(int cpu, const struct cpumask *cpumask) +static __always_inline bool cpumask_test_cpu(int cpu, const struct cpumask *cpumask) { return test_bit(cpumask_check(cpu), cpumask_bits((cpumask))); } @@ -384,11 +384,11 @@ static __always_inline int cpumask_test_cpu(int cpu, const struct cpumask *cpuma * @cpu: cpu number (< nr_cpu_ids) * @cpumask: the cpumask pointer * - * Returns 1 if @cpu is set in old bitmap of @cpumask, else returns 0 + * Returns true if @cpu is set in old bitmap of @cpumask, else returns false * * test_and_set_bit wrapper for cpumasks. */ -static __always_inline int cpumask_test_and_set_cpu(int cpu, struct cpumask *cpumask) +static __always_inline bool cpumask_test_and_set_cpu(int cpu, struct cpumask *cpumask) { return test_and_set_bit(cpumask_check(cpu), cpumask_bits(cpumask)); } @@ -398,11 +398,11 @@ static __always_inline int cpumask_test_and_set_cpu(int cpu, struct cpumask *cpu * @cpu: cpu number (< nr_cpu_ids) * @cpumask: the cpumask pointer * - * Returns 1 if @cpu is set in old bitmap of @cpumask, else returns 0 + * Returns true if @cpu is set in old bitmap of @cpumask, else returns false * * test_and_clear_bit wrapper for cpumasks. */ -static __always_inline int cpumask_test_and_clear_cpu(int cpu, struct cpumask *cpumask) +static __always_inline bool cpumask_test_and_clear_cpu(int cpu, struct cpumask *cpumask) { return test_and_clear_bit(cpumask_check(cpu), cpumask_bits(cpumask)); } @@ -431,9 +431,9 @@ static inline void cpumask_clear(struct cpumask *dstp) * @src1p: the first input * @src2p: the second input * - * If *@dstp is empty, returns 0, else returns 1 + * If *@dstp is empty, returns false, else returns true */ -static inline int cpumask_and(struct cpumask *dstp, +static inline bool cpumask_and(struct cpumask *dstp, const struct cpumask *src1p, const struct cpumask *src2p) { @@ -474,9 +474,9 @@ static inline void cpumask_xor(struct cpumask *dstp, * @src1p: the first input * @src2p: the second input * - * If *@dstp is empty, returns 0, else returns 1 + * If *@dstp is empty, returns false, else returns true */ -static inline int cpumask_andnot(struct cpumask *dstp, +static inline bool cpumask_andnot(struct cpumask *dstp, const struct cpumask *src1p, const struct cpumask *src2p) { @@ -539,9 +539,9 @@ static inline bool cpumask_intersects(const struct cpumask *src1p, * @src1p: the first input * @src2p: the second input * - * Returns 1 if *@src1p is a subset of *@src2p, else returns 0 + * Returns true if *@src1p is a subset of *@src2p, else returns false */ -static inline int cpumask_subset(const struct cpumask *src1p, +static inline bool cpumask_subset(const struct cpumask *src1p, const struct cpumask *src2p) { return bitmap_subset(cpumask_bits(src1p), cpumask_bits(src2p), -- cgit From 8b6b795d9bfc031a8953c40fac8d3cf67e1a4d3d Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Fri, 1 Jul 2022 05:54:27 -0700 Subject: lib/cpumask: change return types to unsigned where appropriate Switch return types to unsigned int where return values cannot be negative. Signed-off-by: Yury Norov --- include/linux/cpumask.h | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index b54e27d9da6b..760022bcb925 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -176,12 +176,12 @@ static inline unsigned int cpumask_local_spread(unsigned int i, int node) return 0; } -static inline int cpumask_any_and_distribute(const struct cpumask *src1p, +static inline unsigned int cpumask_any_and_distribute(const struct cpumask *src1p, const struct cpumask *src2p) { return cpumask_first_and(src1p, src2p); } -static inline int cpumask_any_distribute(const struct cpumask *srcp) +static inline unsigned int cpumask_any_distribute(const struct cpumask *srcp) { return cpumask_first(srcp); } @@ -258,12 +258,12 @@ static inline unsigned int cpumask_next_zero(int n, const struct cpumask *srcp) return find_next_zero_bit(cpumask_bits(srcp), nr_cpumask_bits, n+1); } -int __pure cpumask_next_and(int n, const struct cpumask *, const struct cpumask *); -int __pure cpumask_any_but(const struct cpumask *mask, unsigned int cpu); +unsigned int __pure cpumask_next_and(int n, const struct cpumask *, const struct cpumask *); +unsigned int __pure cpumask_any_but(const struct cpumask *mask, unsigned int cpu); unsigned int cpumask_local_spread(unsigned int i, int node); -int cpumask_any_and_distribute(const struct cpumask *src1p, +unsigned int cpumask_any_and_distribute(const struct cpumask *src1p, const struct cpumask *src2p); -int cpumask_any_distribute(const struct cpumask *srcp); +unsigned int cpumask_any_distribute(const struct cpumask *srcp); /** * for_each_cpu - iterate over every cpu in a mask @@ -289,7 +289,7 @@ int cpumask_any_distribute(const struct cpumask *srcp); (cpu) = cpumask_next_zero((cpu), (mask)), \ (cpu) < nr_cpu_ids;) -extern int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool wrap); +unsigned int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool wrap); /** * for_each_cpu_wrap - iterate over every cpu in a mask, starting at a specified location -- cgit From 9b2e70860ef2f0d74b6d9e57929d57b14481b9c9 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Fri, 1 Jul 2022 05:54:28 -0700 Subject: lib/cpumask: move trivial wrappers around find_bit to the header To avoid circular dependencies, cpumask keeps simple (almost) one-line wrappers around find_bit() in a c-file. Commit 47d8c15615c0a2 ("include: move find.h from asm_generic to linux") moved find.h header out of asm_generic include path, and it helped to fix many circular dependencies, including some in cpumask.h. This patch moves those one-liners to header files. Signed-off-by: Yury Norov --- include/linux/cpumask.h | 57 ++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 54 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 760022bcb925..ea3de2c2c180 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -241,7 +241,21 @@ static inline unsigned int cpumask_last(const struct cpumask *srcp) return find_last_bit(cpumask_bits(srcp), nr_cpumask_bits); } -unsigned int __pure cpumask_next(int n, const struct cpumask *srcp); +/** + * cpumask_next - get the next cpu in a cpumask + * @n: the cpu prior to the place to search (ie. return will be > @n) + * @srcp: the cpumask pointer + * + * Returns >= nr_cpu_ids if no further cpus set. + */ +static inline +unsigned int cpumask_next(int n, const struct cpumask *srcp) +{ + /* -1 is a legal arg here. */ + if (n != -1) + cpumask_check(n); + return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1); +} /** * cpumask_next_zero - get the next unset cpu in a cpumask @@ -258,8 +272,25 @@ static inline unsigned int cpumask_next_zero(int n, const struct cpumask *srcp) return find_next_zero_bit(cpumask_bits(srcp), nr_cpumask_bits, n+1); } -unsigned int __pure cpumask_next_and(int n, const struct cpumask *, const struct cpumask *); -unsigned int __pure cpumask_any_but(const struct cpumask *mask, unsigned int cpu); +/** + * cpumask_next_and - get the next cpu in *src1p & *src2p + * @n: the cpu prior to the place to search (ie. return will be > @n) + * @src1p: the first cpumask pointer + * @src2p: the second cpumask pointer + * + * Returns >= nr_cpu_ids if no further cpus set in both. + */ +static inline +unsigned int cpumask_next_and(int n, const struct cpumask *src1p, + const struct cpumask *src2p) +{ + /* -1 is a legal arg here. */ + if (n != -1) + cpumask_check(n); + return find_next_and_bit(cpumask_bits(src1p), cpumask_bits(src2p), + nr_cpumask_bits, n + 1); +} + unsigned int cpumask_local_spread(unsigned int i, int node); unsigned int cpumask_any_and_distribute(const struct cpumask *src1p, const struct cpumask *src2p); @@ -324,6 +355,26 @@ unsigned int cpumask_next_wrap(int n, const struct cpumask *mask, int start, boo for ((cpu) = -1; \ (cpu) = cpumask_next_and((cpu), (mask1), (mask2)), \ (cpu) < nr_cpu_ids;) + +/** + * cpumask_any_but - return a "random" in a cpumask, but not this one. + * @mask: the cpumask to search + * @cpu: the cpu to ignore. + * + * Often used to find any cpu but smp_processor_id() in a mask. + * Returns >= nr_cpu_ids if no cpus set. + */ +static inline +unsigned int cpumask_any_but(const struct cpumask *mask, unsigned int cpu) +{ + unsigned int i; + + cpumask_check(cpu); + for_each_cpu(i, mask) + if (i != cpu) + break; + return i; +} #endif /* SMP */ #define CPU_BITS_NONE \ -- cgit From db96b0c5f9db22d908ab5f7cd75904adba4b28ca Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Thu, 10 Jun 2021 10:56:49 +0200 Subject: headers/deps: mm: Optimize header dependencies There's a couple of superfluous inclusions here - remove them before doing bigger changes. Signed-off-by: Ingo Molnar Signed-off-by: Yury Norov --- include/linux/gfp.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 2d2ccae933c2..52f2c873a7d4 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -2,10 +2,7 @@ #ifndef __LINUX_GFP_H #define __LINUX_GFP_H -#include #include -#include -#include #include /* The typedef is in types.h but we want the documentation here */ -- cgit From cb5a065b4ea9c062a18143c8a14e831179687f54 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Thu, 14 Apr 2022 18:42:28 +0200 Subject: headers/deps: mm: Split out of This is a much smaller header. Signed-off-by: Ingo Molnar Signed-off-by: Yury Norov --- include/linux/gfp.h | 345 +-------------------------------------------- include/linux/gfp_types.h | 348 ++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 350 insertions(+), 343 deletions(-) create mode 100644 include/linux/gfp_types.h (limited to 'include/linux') diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 52f2c873a7d4..f314be58fa77 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -2,354 +2,13 @@ #ifndef __LINUX_GFP_H #define __LINUX_GFP_H +#include + #include #include -/* The typedef is in types.h but we want the documentation here */ -#if 0 -/** - * typedef gfp_t - Memory allocation flags. - * - * GFP flags are commonly used throughout Linux to indicate how memory - * should be allocated. The GFP acronym stands for get_free_pages(), - * the underlying memory allocation function. Not every GFP flag is - * supported by every function which may allocate memory. Most users - * will want to use a plain ``GFP_KERNEL``. - */ -typedef unsigned int __bitwise gfp_t; -#endif - struct vm_area_struct; -/* - * In case of changes, please don't forget to update - * include/trace/events/mmflags.h and tools/perf/builtin-kmem.c - */ - -/* Plain integer GFP bitmasks. Do not use this directly. */ -#define ___GFP_DMA 0x01u -#define ___GFP_HIGHMEM 0x02u -#define ___GFP_DMA32 0x04u -#define ___GFP_MOVABLE 0x08u -#define ___GFP_RECLAIMABLE 0x10u -#define ___GFP_HIGH 0x20u -#define ___GFP_IO 0x40u -#define ___GFP_FS 0x80u -#define ___GFP_ZERO 0x100u -#define ___GFP_ATOMIC 0x200u -#define ___GFP_DIRECT_RECLAIM 0x400u -#define ___GFP_KSWAPD_RECLAIM 0x800u -#define ___GFP_WRITE 0x1000u -#define ___GFP_NOWARN 0x2000u -#define ___GFP_RETRY_MAYFAIL 0x4000u -#define ___GFP_NOFAIL 0x8000u -#define ___GFP_NORETRY 0x10000u -#define ___GFP_MEMALLOC 0x20000u -#define ___GFP_COMP 0x40000u -#define ___GFP_NOMEMALLOC 0x80000u -#define ___GFP_HARDWALL 0x100000u -#define ___GFP_THISNODE 0x200000u -#define ___GFP_ACCOUNT 0x400000u -#define ___GFP_ZEROTAGS 0x800000u -#ifdef CONFIG_KASAN_HW_TAGS -#define ___GFP_SKIP_ZERO 0x1000000u -#define ___GFP_SKIP_KASAN_UNPOISON 0x2000000u -#define ___GFP_SKIP_KASAN_POISON 0x4000000u -#else -#define ___GFP_SKIP_ZERO 0 -#define ___GFP_SKIP_KASAN_UNPOISON 0 -#define ___GFP_SKIP_KASAN_POISON 0 -#endif -#ifdef CONFIG_LOCKDEP -#define ___GFP_NOLOCKDEP 0x8000000u -#else -#define ___GFP_NOLOCKDEP 0 -#endif -/* If the above are modified, __GFP_BITS_SHIFT may need updating */ - -/* - * Physical address zone modifiers (see linux/mmzone.h - low four bits) - * - * Do not put any conditional on these. If necessary modify the definitions - * without the underscores and use them consistently. The definitions here may - * be used in bit comparisons. - */ -#define __GFP_DMA ((__force gfp_t)___GFP_DMA) -#define __GFP_HIGHMEM ((__force gfp_t)___GFP_HIGHMEM) -#define __GFP_DMA32 ((__force gfp_t)___GFP_DMA32) -#define __GFP_MOVABLE ((__force gfp_t)___GFP_MOVABLE) /* ZONE_MOVABLE allowed */ -#define GFP_ZONEMASK (__GFP_DMA|__GFP_HIGHMEM|__GFP_DMA32|__GFP_MOVABLE) - -/** - * DOC: Page mobility and placement hints - * - * Page mobility and placement hints - * --------------------------------- - * - * These flags provide hints about how mobile the page is. Pages with similar - * mobility are placed within the same pageblocks to minimise problems due - * to external fragmentation. - * - * %__GFP_MOVABLE (also a zone modifier) indicates that the page can be - * moved by page migration during memory compaction or can be reclaimed. - * - * %__GFP_RECLAIMABLE is used for slab allocations that specify - * SLAB_RECLAIM_ACCOUNT and whose pages can be freed via shrinkers. - * - * %__GFP_WRITE indicates the caller intends to dirty the page. Where possible, - * these pages will be spread between local zones to avoid all the dirty - * pages being in one zone (fair zone allocation policy). - * - * %__GFP_HARDWALL enforces the cpuset memory allocation policy. - * - * %__GFP_THISNODE forces the allocation to be satisfied from the requested - * node with no fallbacks or placement policy enforcements. - * - * %__GFP_ACCOUNT causes the allocation to be accounted to kmemcg. - */ -#define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) -#define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) -#define __GFP_HARDWALL ((__force gfp_t)___GFP_HARDWALL) -#define __GFP_THISNODE ((__force gfp_t)___GFP_THISNODE) -#define __GFP_ACCOUNT ((__force gfp_t)___GFP_ACCOUNT) - -/** - * DOC: Watermark modifiers - * - * Watermark modifiers -- controls access to emergency reserves - * ------------------------------------------------------------ - * - * %__GFP_HIGH indicates that the caller is high-priority and that granting - * the request is necessary before the system can make forward progress. - * For example, creating an IO context to clean pages. - * - * %__GFP_ATOMIC indicates that the caller cannot reclaim or sleep and is - * high priority. Users are typically interrupt handlers. This may be - * used in conjunction with %__GFP_HIGH - * - * %__GFP_MEMALLOC allows access to all memory. This should only be used when - * the caller guarantees the allocation will allow more memory to be freed - * very shortly e.g. process exiting or swapping. Users either should - * be the MM or co-ordinating closely with the VM (e.g. swap over NFS). - * Users of this flag have to be extremely careful to not deplete the reserve - * completely and implement a throttling mechanism which controls the - * consumption of the reserve based on the amount of freed memory. - * Usage of a pre-allocated pool (e.g. mempool) should be always considered - * before using this flag. - * - * %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency reserves. - * This takes precedence over the %__GFP_MEMALLOC flag if both are set. - */ -#define __GFP_ATOMIC ((__force gfp_t)___GFP_ATOMIC) -#define __GFP_HIGH ((__force gfp_t)___GFP_HIGH) -#define __GFP_MEMALLOC ((__force gfp_t)___GFP_MEMALLOC) -#define __GFP_NOMEMALLOC ((__force gfp_t)___GFP_NOMEMALLOC) - -/** - * DOC: Reclaim modifiers - * - * Reclaim modifiers - * ----------------- - * Please note that all the following flags are only applicable to sleepable - * allocations (e.g. %GFP_NOWAIT and %GFP_ATOMIC will ignore them). - * - * %__GFP_IO can start physical IO. - * - * %__GFP_FS can call down to the low-level FS. Clearing the flag avoids the - * allocator recursing into the filesystem which might already be holding - * locks. - * - * %__GFP_DIRECT_RECLAIM indicates that the caller may enter direct reclaim. - * This flag can be cleared to avoid unnecessary delays when a fallback - * option is available. - * - * %__GFP_KSWAPD_RECLAIM indicates that the caller wants to wake kswapd when - * the low watermark is reached and have it reclaim pages until the high - * watermark is reached. A caller may wish to clear this flag when fallback - * options are available and the reclaim is likely to disrupt the system. The - * canonical example is THP allocation where a fallback is cheap but - * reclaim/compaction may cause indirect stalls. - * - * %__GFP_RECLAIM is shorthand to allow/forbid both direct and kswapd reclaim. - * - * The default allocator behavior depends on the request size. We have a concept - * of so called costly allocations (with order > %PAGE_ALLOC_COSTLY_ORDER). - * !costly allocations are too essential to fail so they are implicitly - * non-failing by default (with some exceptions like OOM victims might fail so - * the caller still has to check for failures) while costly requests try to be - * not disruptive and back off even without invoking the OOM killer. - * The following three modifiers might be used to override some of these - * implicit rules - * - * %__GFP_NORETRY: The VM implementation will try only very lightweight - * memory direct reclaim to get some memory under memory pressure (thus - * it can sleep). It will avoid disruptive actions like OOM killer. The - * caller must handle the failure which is quite likely to happen under - * heavy memory pressure. The flag is suitable when failure can easily be - * handled at small cost, such as reduced throughput - * - * %__GFP_RETRY_MAYFAIL: The VM implementation will retry memory reclaim - * procedures that have previously failed if there is some indication - * that progress has been made else where. It can wait for other - * tasks to attempt high level approaches to freeing memory such as - * compaction (which removes fragmentation) and page-out. - * There is still a definite limit to the number of retries, but it is - * a larger limit than with %__GFP_NORETRY. - * Allocations with this flag may fail, but only when there is - * genuinely little unused memory. While these allocations do not - * directly trigger the OOM killer, their failure indicates that - * the system is likely to need to use the OOM killer soon. The - * caller must handle failure, but can reasonably do so by failing - * a higher-level request, or completing it only in a much less - * efficient manner. - * If the allocation does fail, and the caller is in a position to - * free some non-essential memory, doing so could benefit the system - * as a whole. - * - * %__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller - * cannot handle allocation failures. The allocation could block - * indefinitely but will never return with failure. Testing for - * failure is pointless. - * New users should be evaluated carefully (and the flag should be - * used only when there is no reasonable failure policy) but it is - * definitely preferable to use the flag rather than opencode endless - * loop around allocator. - * Using this flag for costly allocations is _highly_ discouraged. - */ -#define __GFP_IO ((__force gfp_t)___GFP_IO) -#define __GFP_FS ((__force gfp_t)___GFP_FS) -#define __GFP_DIRECT_RECLAIM ((__force gfp_t)___GFP_DIRECT_RECLAIM) /* Caller can reclaim */ -#define __GFP_KSWAPD_RECLAIM ((__force gfp_t)___GFP_KSWAPD_RECLAIM) /* kswapd can wake */ -#define __GFP_RECLAIM ((__force gfp_t)(___GFP_DIRECT_RECLAIM|___GFP_KSWAPD_RECLAIM)) -#define __GFP_RETRY_MAYFAIL ((__force gfp_t)___GFP_RETRY_MAYFAIL) -#define __GFP_NOFAIL ((__force gfp_t)___GFP_NOFAIL) -#define __GFP_NORETRY ((__force gfp_t)___GFP_NORETRY) - -/** - * DOC: Action modifiers - * - * Action modifiers - * ---------------- - * - * %__GFP_NOWARN suppresses allocation failure reports. - * - * %__GFP_COMP address compound page metadata. - * - * %__GFP_ZERO returns a zeroed page on success. - * - * %__GFP_ZEROTAGS zeroes memory tags at allocation time if the memory itself - * is being zeroed (either via __GFP_ZERO or via init_on_alloc, provided that - * __GFP_SKIP_ZERO is not set). This flag is intended for optimization: setting - * memory tags at the same time as zeroing memory has minimal additional - * performace impact. - * - * %__GFP_SKIP_KASAN_UNPOISON makes KASAN skip unpoisoning on page allocation. - * Only effective in HW_TAGS mode. - * - * %__GFP_SKIP_KASAN_POISON makes KASAN skip poisoning on page deallocation. - * Typically, used for userspace pages. Only effective in HW_TAGS mode. - */ -#define __GFP_NOWARN ((__force gfp_t)___GFP_NOWARN) -#define __GFP_COMP ((__force gfp_t)___GFP_COMP) -#define __GFP_ZERO ((__force gfp_t)___GFP_ZERO) -#define __GFP_ZEROTAGS ((__force gfp_t)___GFP_ZEROTAGS) -#define __GFP_SKIP_ZERO ((__force gfp_t)___GFP_SKIP_ZERO) -#define __GFP_SKIP_KASAN_UNPOISON ((__force gfp_t)___GFP_SKIP_KASAN_UNPOISON) -#define __GFP_SKIP_KASAN_POISON ((__force gfp_t)___GFP_SKIP_KASAN_POISON) - -/* Disable lockdep for GFP context tracking */ -#define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP) - -/* Room for N __GFP_FOO bits */ -#define __GFP_BITS_SHIFT (27 + IS_ENABLED(CONFIG_LOCKDEP)) -#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) - -/** - * DOC: Useful GFP flag combinations - * - * Useful GFP flag combinations - * ---------------------------- - * - * Useful GFP flag combinations that are commonly used. It is recommended - * that subsystems start with one of these combinations and then set/clear - * %__GFP_FOO flags as necessary. - * - * %GFP_ATOMIC users can not sleep and need the allocation to succeed. A lower - * watermark is applied to allow access to "atomic reserves". - * The current implementation doesn't support NMI and few other strict - * non-preemptive contexts (e.g. raw_spin_lock). The same applies to %GFP_NOWAIT. - * - * %GFP_KERNEL is typical for kernel-internal allocations. The caller requires - * %ZONE_NORMAL or a lower zone for direct access but can direct reclaim. - * - * %GFP_KERNEL_ACCOUNT is the same as GFP_KERNEL, except the allocation is - * accounted to kmemcg. - * - * %GFP_NOWAIT is for kernel allocations that should not stall for direct - * reclaim, start physical IO or use any filesystem callback. - * - * %GFP_NOIO will use direct reclaim to discard clean pages or slab pages - * that do not require the starting of any physical IO. - * Please try to avoid using this flag directly and instead use - * memalloc_noio_{save,restore} to mark the whole scope which cannot - * perform any IO with a short explanation why. All allocation requests - * will inherit GFP_NOIO implicitly. - * - * %GFP_NOFS will use direct reclaim but will not use any filesystem interfaces. - * Please try to avoid using this flag directly and instead use - * memalloc_nofs_{save,restore} to mark the whole scope which cannot/shouldn't - * recurse into the FS layer with a short explanation why. All allocation - * requests will inherit GFP_NOFS implicitly. - * - * %GFP_USER is for userspace allocations that also need to be directly - * accessibly by the kernel or hardware. It is typically used by hardware - * for buffers that are mapped to userspace (e.g. graphics) that hardware - * still must DMA to. cpuset limits are enforced for these allocations. - * - * %GFP_DMA exists for historical reasons and should be avoided where possible. - * The flags indicates that the caller requires that the lowest zone be - * used (%ZONE_DMA or 16M on x86-64). Ideally, this would be removed but - * it would require careful auditing as some users really require it and - * others use the flag to avoid lowmem reserves in %ZONE_DMA and treat the - * lowest zone as a type of emergency reserve. - * - * %GFP_DMA32 is similar to %GFP_DMA except that the caller requires a 32-bit - * address. Note that kmalloc(..., GFP_DMA32) does not return DMA32 memory - * because the DMA32 kmalloc cache array is not implemented. - * (Reason: there is no such user in kernel). - * - * %GFP_HIGHUSER is for userspace allocations that may be mapped to userspace, - * do not need to be directly accessible by the kernel but that cannot - * move once in use. An example may be a hardware allocation that maps - * data directly into userspace but has no addressing limitations. - * - * %GFP_HIGHUSER_MOVABLE is for userspace allocations that the kernel does not - * need direct access to but can use kmap() when access is required. They - * are expected to be movable via page reclaim or page migration. Typically, - * pages on the LRU would also be allocated with %GFP_HIGHUSER_MOVABLE. - * - * %GFP_TRANSHUGE and %GFP_TRANSHUGE_LIGHT are used for THP allocations. They - * are compound allocations that will generally fail quickly if memory is not - * available and will not wake kswapd/kcompactd on failure. The _LIGHT - * version does not attempt reclaim/compaction at all and is by default used - * in page fault path, while the non-light is used by khugepaged. - */ -#define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM) -#define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS) -#define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT) -#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM) -#define GFP_NOIO (__GFP_RECLAIM) -#define GFP_NOFS (__GFP_RECLAIM | __GFP_IO) -#define GFP_USER (__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL) -#define GFP_DMA __GFP_DMA -#define GFP_DMA32 __GFP_DMA32 -#define GFP_HIGHUSER (GFP_USER | __GFP_HIGHMEM) -#define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE | \ - __GFP_SKIP_KASAN_POISON) -#define GFP_TRANSHUGE_LIGHT ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \ - __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM) -#define GFP_TRANSHUGE (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM) - /* Convert GFP flags to their corresponding migrate type */ #define GFP_MOVABLE_MASK (__GFP_RECLAIMABLE|__GFP_MOVABLE) #define GFP_MOVABLE_SHIFT 3 diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h new file mode 100644 index 000000000000..06fc85cee23f --- /dev/null +++ b/include/linux/gfp_types.h @@ -0,0 +1,348 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __LINUX_GFP_TYPES_H +#define __LINUX_GFP_TYPES_H + +/* The typedef is in types.h but we want the documentation here */ +#if 0 +/** + * typedef gfp_t - Memory allocation flags. + * + * GFP flags are commonly used throughout Linux to indicate how memory + * should be allocated. The GFP acronym stands for get_free_pages(), + * the underlying memory allocation function. Not every GFP flag is + * supported by every function which may allocate memory. Most users + * will want to use a plain ``GFP_KERNEL``. + */ +typedef unsigned int __bitwise gfp_t; +#endif + +/* + * In case of changes, please don't forget to update + * include/trace/events/mmflags.h and tools/perf/builtin-kmem.c + */ + +/* Plain integer GFP bitmasks. Do not use this directly. */ +#define ___GFP_DMA 0x01u +#define ___GFP_HIGHMEM 0x02u +#define ___GFP_DMA32 0x04u +#define ___GFP_MOVABLE 0x08u +#define ___GFP_RECLAIMABLE 0x10u +#define ___GFP_HIGH 0x20u +#define ___GFP_IO 0x40u +#define ___GFP_FS 0x80u +#define ___GFP_ZERO 0x100u +#define ___GFP_ATOMIC 0x200u +#define ___GFP_DIRECT_RECLAIM 0x400u +#define ___GFP_KSWAPD_RECLAIM 0x800u +#define ___GFP_WRITE 0x1000u +#define ___GFP_NOWARN 0x2000u +#define ___GFP_RETRY_MAYFAIL 0x4000u +#define ___GFP_NOFAIL 0x8000u +#define ___GFP_NORETRY 0x10000u +#define ___GFP_MEMALLOC 0x20000u +#define ___GFP_COMP 0x40000u +#define ___GFP_NOMEMALLOC 0x80000u +#define ___GFP_HARDWALL 0x100000u +#define ___GFP_THISNODE 0x200000u +#define ___GFP_ACCOUNT 0x400000u +#define ___GFP_ZEROTAGS 0x800000u +#ifdef CONFIG_KASAN_HW_TAGS +#define ___GFP_SKIP_ZERO 0x1000000u +#define ___GFP_SKIP_KASAN_UNPOISON 0x2000000u +#define ___GFP_SKIP_KASAN_POISON 0x4000000u +#else +#define ___GFP_SKIP_ZERO 0 +#define ___GFP_SKIP_KASAN_UNPOISON 0 +#define ___GFP_SKIP_KASAN_POISON 0 +#endif +#ifdef CONFIG_LOCKDEP +#define ___GFP_NOLOCKDEP 0x8000000u +#else +#define ___GFP_NOLOCKDEP 0 +#endif +/* If the above are modified, __GFP_BITS_SHIFT may need updating */ + +/* + * Physical address zone modifiers (see linux/mmzone.h - low four bits) + * + * Do not put any conditional on these. If necessary modify the definitions + * without the underscores and use them consistently. The definitions here may + * be used in bit comparisons. + */ +#define __GFP_DMA ((__force gfp_t)___GFP_DMA) +#define __GFP_HIGHMEM ((__force gfp_t)___GFP_HIGHMEM) +#define __GFP_DMA32 ((__force gfp_t)___GFP_DMA32) +#define __GFP_MOVABLE ((__force gfp_t)___GFP_MOVABLE) /* ZONE_MOVABLE allowed */ +#define GFP_ZONEMASK (__GFP_DMA|__GFP_HIGHMEM|__GFP_DMA32|__GFP_MOVABLE) + +/** + * DOC: Page mobility and placement hints + * + * Page mobility and placement hints + * --------------------------------- + * + * These flags provide hints about how mobile the page is. Pages with similar + * mobility are placed within the same pageblocks to minimise problems due + * to external fragmentation. + * + * %__GFP_MOVABLE (also a zone modifier) indicates that the page can be + * moved by page migration during memory compaction or can be reclaimed. + * + * %__GFP_RECLAIMABLE is used for slab allocations that specify + * SLAB_RECLAIM_ACCOUNT and whose pages can be freed via shrinkers. + * + * %__GFP_WRITE indicates the caller intends to dirty the page. Where possible, + * these pages will be spread between local zones to avoid all the dirty + * pages being in one zone (fair zone allocation policy). + * + * %__GFP_HARDWALL enforces the cpuset memory allocation policy. + * + * %__GFP_THISNODE forces the allocation to be satisfied from the requested + * node with no fallbacks or placement policy enforcements. + * + * %__GFP_ACCOUNT causes the allocation to be accounted to kmemcg. + */ +#define __GFP_RECLAIMABLE ((__force gfp_t)___GFP_RECLAIMABLE) +#define __GFP_WRITE ((__force gfp_t)___GFP_WRITE) +#define __GFP_HARDWALL ((__force gfp_t)___GFP_HARDWALL) +#define __GFP_THISNODE ((__force gfp_t)___GFP_THISNODE) +#define __GFP_ACCOUNT ((__force gfp_t)___GFP_ACCOUNT) + +/** + * DOC: Watermark modifiers + * + * Watermark modifiers -- controls access to emergency reserves + * ------------------------------------------------------------ + * + * %__GFP_HIGH indicates that the caller is high-priority and that granting + * the request is necessary before the system can make forward progress. + * For example, creating an IO context to clean pages. + * + * %__GFP_ATOMIC indicates that the caller cannot reclaim or sleep and is + * high priority. Users are typically interrupt handlers. This may be + * used in conjunction with %__GFP_HIGH + * + * %__GFP_MEMALLOC allows access to all memory. This should only be used when + * the caller guarantees the allocation will allow more memory to be freed + * very shortly e.g. process exiting or swapping. Users either should + * be the MM or co-ordinating closely with the VM (e.g. swap over NFS). + * Users of this flag have to be extremely careful to not deplete the reserve + * completely and implement a throttling mechanism which controls the + * consumption of the reserve based on the amount of freed memory. + * Usage of a pre-allocated pool (e.g. mempool) should be always considered + * before using this flag. + * + * %__GFP_NOMEMALLOC is used to explicitly forbid access to emergency reserves. + * This takes precedence over the %__GFP_MEMALLOC flag if both are set. + */ +#define __GFP_ATOMIC ((__force gfp_t)___GFP_ATOMIC) +#define __GFP_HIGH ((__force gfp_t)___GFP_HIGH) +#define __GFP_MEMALLOC ((__force gfp_t)___GFP_MEMALLOC) +#define __GFP_NOMEMALLOC ((__force gfp_t)___GFP_NOMEMALLOC) + +/** + * DOC: Reclaim modifiers + * + * Reclaim modifiers + * ----------------- + * Please note that all the following flags are only applicable to sleepable + * allocations (e.g. %GFP_NOWAIT and %GFP_ATOMIC will ignore them). + * + * %__GFP_IO can start physical IO. + * + * %__GFP_FS can call down to the low-level FS. Clearing the flag avoids the + * allocator recursing into the filesystem which might already be holding + * locks. + * + * %__GFP_DIRECT_RECLAIM indicates that the caller may enter direct reclaim. + * This flag can be cleared to avoid unnecessary delays when a fallback + * option is available. + * + * %__GFP_KSWAPD_RECLAIM indicates that the caller wants to wake kswapd when + * the low watermark is reached and have it reclaim pages until the high + * watermark is reached. A caller may wish to clear this flag when fallback + * options are available and the reclaim is likely to disrupt the system. The + * canonical example is THP allocation where a fallback is cheap but + * reclaim/compaction may cause indirect stalls. + * + * %__GFP_RECLAIM is shorthand to allow/forbid both direct and kswapd reclaim. + * + * The default allocator behavior depends on the request size. We have a concept + * of so called costly allocations (with order > %PAGE_ALLOC_COSTLY_ORDER). + * !costly allocations are too essential to fail so they are implicitly + * non-failing by default (with some exceptions like OOM victims might fail so + * the caller still has to check for failures) while costly requests try to be + * not disruptive and back off even without invoking the OOM killer. + * The following three modifiers might be used to override some of these + * implicit rules + * + * %__GFP_NORETRY: The VM implementation will try only very lightweight + * memory direct reclaim to get some memory under memory pressure (thus + * it can sleep). It will avoid disruptive actions like OOM killer. The + * caller must handle the failure which is quite likely to happen under + * heavy memory pressure. The flag is suitable when failure can easily be + * handled at small cost, such as reduced throughput + * + * %__GFP_RETRY_MAYFAIL: The VM implementation will retry memory reclaim + * procedures that have previously failed if there is some indication + * that progress has been made else where. It can wait for other + * tasks to attempt high level approaches to freeing memory such as + * compaction (which removes fragmentation) and page-out. + * There is still a definite limit to the number of retries, but it is + * a larger limit than with %__GFP_NORETRY. + * Allocations with this flag may fail, but only when there is + * genuinely little unused memory. While these allocations do not + * directly trigger the OOM killer, their failure indicates that + * the system is likely to need to use the OOM killer soon. The + * caller must handle failure, but can reasonably do so by failing + * a higher-level request, or completing it only in a much less + * efficient manner. + * If the allocation does fail, and the caller is in a position to + * free some non-essential memory, doing so could benefit the system + * as a whole. + * + * %__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller + * cannot handle allocation failures. The allocation could block + * indefinitely but will never return with failure. Testing for + * failure is pointless. + * New users should be evaluated carefully (and the flag should be + * used only when there is no reasonable failure policy) but it is + * definitely preferable to use the flag rather than opencode endless + * loop around allocator. + * Using this flag for costly allocations is _highly_ discouraged. + */ +#define __GFP_IO ((__force gfp_t)___GFP_IO) +#define __GFP_FS ((__force gfp_t)___GFP_FS) +#define __GFP_DIRECT_RECLAIM ((__force gfp_t)___GFP_DIRECT_RECLAIM) /* Caller can reclaim */ +#define __GFP_KSWAPD_RECLAIM ((__force gfp_t)___GFP_KSWAPD_RECLAIM) /* kswapd can wake */ +#define __GFP_RECLAIM ((__force gfp_t)(___GFP_DIRECT_RECLAIM|___GFP_KSWAPD_RECLAIM)) +#define __GFP_RETRY_MAYFAIL ((__force gfp_t)___GFP_RETRY_MAYFAIL) +#define __GFP_NOFAIL ((__force gfp_t)___GFP_NOFAIL) +#define __GFP_NORETRY ((__force gfp_t)___GFP_NORETRY) + +/** + * DOC: Action modifiers + * + * Action modifiers + * ---------------- + * + * %__GFP_NOWARN suppresses allocation failure reports. + * + * %__GFP_COMP address compound page metadata. + * + * %__GFP_ZERO returns a zeroed page on success. + * + * %__GFP_ZEROTAGS zeroes memory tags at allocation time if the memory itself + * is being zeroed (either via __GFP_ZERO or via init_on_alloc, provided that + * __GFP_SKIP_ZERO is not set). This flag is intended for optimization: setting + * memory tags at the same time as zeroing memory has minimal additional + * performace impact. + * + * %__GFP_SKIP_KASAN_UNPOISON makes KASAN skip unpoisoning on page allocation. + * Only effective in HW_TAGS mode. + * + * %__GFP_SKIP_KASAN_POISON makes KASAN skip poisoning on page deallocation. + * Typically, used for userspace pages. Only effective in HW_TAGS mode. + */ +#define __GFP_NOWARN ((__force gfp_t)___GFP_NOWARN) +#define __GFP_COMP ((__force gfp_t)___GFP_COMP) +#define __GFP_ZERO ((__force gfp_t)___GFP_ZERO) +#define __GFP_ZEROTAGS ((__force gfp_t)___GFP_ZEROTAGS) +#define __GFP_SKIP_ZERO ((__force gfp_t)___GFP_SKIP_ZERO) +#define __GFP_SKIP_KASAN_UNPOISON ((__force gfp_t)___GFP_SKIP_KASAN_UNPOISON) +#define __GFP_SKIP_KASAN_POISON ((__force gfp_t)___GFP_SKIP_KASAN_POISON) + +/* Disable lockdep for GFP context tracking */ +#define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP) + +/* Room for N __GFP_FOO bits */ +#define __GFP_BITS_SHIFT (27 + IS_ENABLED(CONFIG_LOCKDEP)) +#define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) + +/** + * DOC: Useful GFP flag combinations + * + * Useful GFP flag combinations + * ---------------------------- + * + * Useful GFP flag combinations that are commonly used. It is recommended + * that subsystems start with one of these combinations and then set/clear + * %__GFP_FOO flags as necessary. + * + * %GFP_ATOMIC users can not sleep and need the allocation to succeed. A lower + * watermark is applied to allow access to "atomic reserves". + * The current implementation doesn't support NMI and few other strict + * non-preemptive contexts (e.g. raw_spin_lock). The same applies to %GFP_NOWAIT. + * + * %GFP_KERNEL is typical for kernel-internal allocations. The caller requires + * %ZONE_NORMAL or a lower zone for direct access but can direct reclaim. + * + * %GFP_KERNEL_ACCOUNT is the same as GFP_KERNEL, except the allocation is + * accounted to kmemcg. + * + * %GFP_NOWAIT is for kernel allocations that should not stall for direct + * reclaim, start physical IO or use any filesystem callback. + * + * %GFP_NOIO will use direct reclaim to discard clean pages or slab pages + * that do not require the starting of any physical IO. + * Please try to avoid using this flag directly and instead use + * memalloc_noio_{save,restore} to mark the whole scope which cannot + * perform any IO with a short explanation why. All allocation requests + * will inherit GFP_NOIO implicitly. + * + * %GFP_NOFS will use direct reclaim but will not use any filesystem interfaces. + * Please try to avoid using this flag directly and instead use + * memalloc_nofs_{save,restore} to mark the whole scope which cannot/shouldn't + * recurse into the FS layer with a short explanation why. All allocation + * requests will inherit GFP_NOFS implicitly. + * + * %GFP_USER is for userspace allocations that also need to be directly + * accessibly by the kernel or hardware. It is typically used by hardware + * for buffers that are mapped to userspace (e.g. graphics) that hardware + * still must DMA to. cpuset limits are enforced for these allocations. + * + * %GFP_DMA exists for historical reasons and should be avoided where possible. + * The flags indicates that the caller requires that the lowest zone be + * used (%ZONE_DMA or 16M on x86-64). Ideally, this would be removed but + * it would require careful auditing as some users really require it and + * others use the flag to avoid lowmem reserves in %ZONE_DMA and treat the + * lowest zone as a type of emergency reserve. + * + * %GFP_DMA32 is similar to %GFP_DMA except that the caller requires a 32-bit + * address. Note that kmalloc(..., GFP_DMA32) does not return DMA32 memory + * because the DMA32 kmalloc cache array is not implemented. + * (Reason: there is no such user in kernel). + * + * %GFP_HIGHUSER is for userspace allocations that may be mapped to userspace, + * do not need to be directly accessible by the kernel but that cannot + * move once in use. An example may be a hardware allocation that maps + * data directly into userspace but has no addressing limitations. + * + * %GFP_HIGHUSER_MOVABLE is for userspace allocations that the kernel does not + * need direct access to but can use kmap() when access is required. They + * are expected to be movable via page reclaim or page migration. Typically, + * pages on the LRU would also be allocated with %GFP_HIGHUSER_MOVABLE. + * + * %GFP_TRANSHUGE and %GFP_TRANSHUGE_LIGHT are used for THP allocations. They + * are compound allocations that will generally fail quickly if memory is not + * available and will not wake kswapd/kcompactd on failure. The _LIGHT + * version does not attempt reclaim/compaction at all and is by default used + * in page fault path, while the non-light is used by khugepaged. + */ +#define GFP_ATOMIC (__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM) +#define GFP_KERNEL (__GFP_RECLAIM | __GFP_IO | __GFP_FS) +#define GFP_KERNEL_ACCOUNT (GFP_KERNEL | __GFP_ACCOUNT) +#define GFP_NOWAIT (__GFP_KSWAPD_RECLAIM) +#define GFP_NOIO (__GFP_RECLAIM) +#define GFP_NOFS (__GFP_RECLAIM | __GFP_IO) +#define GFP_USER (__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL) +#define GFP_DMA __GFP_DMA +#define GFP_DMA32 __GFP_DMA32 +#define GFP_HIGHUSER (GFP_USER | __GFP_HIGHMEM) +#define GFP_HIGHUSER_MOVABLE (GFP_HIGHUSER | __GFP_MOVABLE | \ + __GFP_SKIP_KASAN_POISON) +#define GFP_TRANSHUGE_LIGHT ((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \ + __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM) +#define GFP_TRANSHUGE (GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM) + +#endif /* __LINUX_GFP_TYPES_H */ -- cgit From f0dd891dd5a1d6dc6c9d486333aac4f433f17d17 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Fri, 1 Jul 2022 05:54:30 -0700 Subject: lib/cpumask: move some one-line wrappers to header file After moving gfp flags to a separate header, it's possible to move some cpumask allocators into headers, and avoid creating real functions. Signed-off-by: Yury Norov --- include/linux/cpumask.h | 34 +++++++++++++++++++++++++++++++--- 1 file changed, 31 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index ea3de2c2c180..80627362c774 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -12,6 +12,8 @@ #include #include #include +#include +#include /* Don't assign or return these: may not be this big! */ typedef struct cpumask { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t; @@ -794,9 +796,35 @@ typedef struct cpumask *cpumask_var_t; #define __cpumask_var_read_mostly __read_mostly bool alloc_cpumask_var_node(cpumask_var_t *mask, gfp_t flags, int node); -bool alloc_cpumask_var(cpumask_var_t *mask, gfp_t flags); -bool zalloc_cpumask_var_node(cpumask_var_t *mask, gfp_t flags, int node); -bool zalloc_cpumask_var(cpumask_var_t *mask, gfp_t flags); + +static inline +bool zalloc_cpumask_var_node(cpumask_var_t *mask, gfp_t flags, int node) +{ + return alloc_cpumask_var_node(mask, flags | __GFP_ZERO, node); +} + +/** + * alloc_cpumask_var - allocate a struct cpumask + * @mask: pointer to cpumask_var_t where the cpumask is returned + * @flags: GFP_ flags + * + * Only defined when CONFIG_CPUMASK_OFFSTACK=y, otherwise is + * a nop returning a constant 1 (in ). + * + * See alloc_cpumask_var_node. + */ +static inline +bool alloc_cpumask_var(cpumask_var_t *mask, gfp_t flags) +{ + return alloc_cpumask_var_node(mask, flags, NUMA_NO_NODE); +} + +static inline +bool zalloc_cpumask_var(cpumask_var_t *mask, gfp_t flags) +{ + return alloc_cpumask_var(mask, flags | __GFP_ZERO); +} + void alloc_bootmem_cpumask_var(cpumask_var_t *mask); void free_cpumask_var(cpumask_var_t mask); void free_bootmem_cpumask_var(cpumask_var_t mask); -- cgit From a8755e9bdd6a416331e286cc25cddbd93b2c064a Mon Sep 17 00:00:00 2001 From: Dinh Nguyen Date: Fri, 15 Jul 2022 10:03:49 -0500 Subject: firmware: stratix10-svc: fix kernel-doc warning include/linux/firmware/intel/stratix10-svc-client.h:55: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst * Flag bit for COMMAND_RECONFIG Fixes: 4a4709d470e6 ("firmware: stratix10-svc: add new FCS commands") Signed-off-by: Dinh Nguyen Reported-by: Stephen Rothwell Link: https://lore.kernel.org/r/20220715150349.2413994-1-dinguyen@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/firmware/intel/stratix10-svc-client.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/firmware/intel/stratix10-svc-client.h b/include/linux/firmware/intel/stratix10-svc-client.h index a776a787ac75..0c16037fd08d 100644 --- a/include/linux/firmware/intel/stratix10-svc-client.h +++ b/include/linux/firmware/intel/stratix10-svc-client.h @@ -51,7 +51,8 @@ #define SVC_STATUS_ERROR 5 #define SVC_STATUS_NO_SUPPORT 6 #define SVC_STATUS_INVALID_PARAM 7 -/** + +/* * Flag bit for COMMAND_RECONFIG * * COMMAND_RECONFIG_FLAG_PARTIAL: -- cgit From 7ee951acd31a88f941fd6535fbdee3a1567f1d63 Mon Sep 17 00:00:00 2001 From: Phil Auld Date: Fri, 15 Jul 2022 09:49:24 -0400 Subject: drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist Using bin_attributes with a 0 size causes fstat and friends to return that 0 size. This breaks userspace code that retrieves the size before reading the file. Rather than reverting 75bd50fa841 ("drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI") let's put in a size value at compile time. For cpulist the maximum size is on the order of NR_CPUS * (ceil(log10(NR_CPUS)) + 1)/2 which for 8192 is 20480 (8192 * 5)/2. In order to get near that you'd need a system with every other CPU on one node. For example: (0,2,4,8, ... ). To simplify the math and support larger NR_CPUS in the future we are using (NR_CPUS * 7)/2. We also set it to a min of PAGE_SIZE to retain the older behavior for smaller NR_CPUS. The cpumap file the size works out to be NR_CPUS/4 + NR_CPUS/32 - 1 (or NR_CPUS * 9/32 - 1) including the ","s. Add a set of macros for these values to cpumask.h so they can be used in multiple places. Apply these to the handful of such files in drivers/base/topology.c as well as node.c. As an example, on an 80 cpu 4-node system (NR_CPUS == 8192): before: -r--r--r--. 1 root root 0 Jul 12 14:08 system/node/node0/cpulist -r--r--r--. 1 root root 0 Jul 11 17:25 system/node/node0/cpumap after: -r--r--r--. 1 root root 28672 Jul 13 11:32 system/node/node0/cpulist -r--r--r--. 1 root root 4096 Jul 13 11:31 system/node/node0/cpumap CONFIG_NR_CPUS = 16384 -r--r--r--. 1 root root 57344 Jul 13 14:03 system/node/node0/cpulist -r--r--r--. 1 root root 4607 Jul 13 14:02 system/node/node0/cpumap The actual number of cpus doesn't matter for the reported size since they are based on NR_CPUS. Fixes: 75bd50fa841d ("drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI") Fixes: bb9ec13d156e ("topology: use bin_attribute to break the size limitation of cpumap ABI") Cc: Greg Kroah-Hartman Cc: "Rafael J. Wysocki" Cc: Yury Norov Cc: stable@vger.kernel.org Acked-by: Yury Norov (for include/linux/cpumask.h) Signed-off-by: Phil Auld Link: https://lore.kernel.org/r/20220715134924.3466194-1-pauld@redhat.com Signed-off-by: Greg Kroah-Hartman --- include/linux/cpumask.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index fe29ac7cc469..4592d0845941 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -1071,4 +1071,22 @@ cpumap_print_list_to_buf(char *buf, const struct cpumask *mask, [0] = 1UL \ } } +/* + * Provide a valid theoretical max size for cpumap and cpulist sysfs files + * to avoid breaking userspace which may allocate a buffer based on the size + * reported by e.g. fstat. + * + * for cpumap NR_CPUS * 9/32 - 1 should be an exact length. + * + * For cpulist 7 is (ceil(log10(NR_CPUS)) + 1) allowing for NR_CPUS to be up + * to 2 orders of magnitude larger than 8192. And then we divide by 2 to + * cover a worst-case of every other cpu being on one of two nodes for a + * very large NR_CPUS. + * + * Use PAGE_SIZE as a minimum for smaller configurations. + */ +#define CPUMAP_FILE_MAX_BYTES ((((NR_CPUS * 9)/32 - 1) > PAGE_SIZE) \ + ? (NR_CPUS * 9)/32 - 1 : PAGE_SIZE) +#define CPULIST_FILE_MAX_BYTES (((NR_CPUS * 7)/2 > PAGE_SIZE) ? (NR_CPUS * 7)/2 : PAGE_SIZE) + #endif /* __LINUX_CPUMASK_H */ -- cgit From 65d9a9a60fd71be964effb2e94747a6acb6e7015 Mon Sep 17 00:00:00 2001 From: "Naveen N. Rao" Date: Fri, 1 Jul 2022 13:04:04 +0530 Subject: kexec_file: drop weak attribute from functions As requested (http://lkml.kernel.org/r/87ee0q7b92.fsf@email.froward.int.ebiederm.org), this series converts weak functions in kexec to use the #ifdef approach. Quoting the 3e35142ef99fe ("kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]") changelog: : Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") : [1], binutils (v2.36+) started dropping section symbols that it thought : were unused. This isn't an issue in general, but with kexec_file.c, gcc : is placing kexec_arch_apply_relocations[_add] into a separate : .text.unlikely section and the section symbol ".text.unlikely" is being : dropped. Due to this, recordmcount is unable to find a non-weak symbol in : .text.unlikely to generate a relocation record against. This patch (of 2); Drop __weak attribute from functions in kexec_file.c: - arch_kexec_kernel_image_probe() - arch_kimage_file_post_load_cleanup() - arch_kexec_kernel_image_load() - arch_kexec_locate_mem_hole() - arch_kexec_kernel_verify_sig() arch_kexec_kernel_image_load() calls into kexec_image_load_default(), so drop the static attribute for the latter. arch_kexec_kernel_verify_sig() is not overridden by any architecture, so drop the __weak attribute. Link: https://lkml.kernel.org/r/cover.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Link: https://lkml.kernel.org/r/2cd7ca1fe4d6bb6ca38e3283c717878388ed6788.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Naveen N. Rao Suggested-by: Eric Biederman Signed-off-by: Andrew Morton Signed-off-by: Mimi Zohar --- include/linux/kexec.h | 44 ++++++++++++++++++++++++++++++++++++++------ 1 file changed, 38 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 475683cd67f1..6958c6b471f4 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -188,21 +188,53 @@ int kexec_purgatory_get_set_symbol(struct kimage *image, const char *name, void *buf, unsigned int size, bool get_value); void *kexec_purgatory_get_symbol_addr(struct kimage *image, const char *name); +void *kexec_image_load_default(struct kimage *image); + +#ifndef arch_kexec_kernel_image_probe +static inline int +arch_kexec_kernel_image_probe(struct kimage *image, void *buf, unsigned long buf_len) +{ + return kexec_image_probe_default(image, buf, buf_len); +} +#endif + +#ifndef arch_kimage_file_post_load_cleanup +static inline int arch_kimage_file_post_load_cleanup(struct kimage *image) +{ + return kexec_image_post_load_cleanup_default(image); +} +#endif + +#ifndef arch_kexec_kernel_image_load +static inline void *arch_kexec_kernel_image_load(struct kimage *image) +{ + return kexec_image_load_default(image); +} +#endif -/* Architectures may override the below functions */ -int arch_kexec_kernel_image_probe(struct kimage *image, void *buf, - unsigned long buf_len); -void *arch_kexec_kernel_image_load(struct kimage *image); -int arch_kimage_file_post_load_cleanup(struct kimage *image); #ifdef CONFIG_KEXEC_SIG int arch_kexec_kernel_verify_sig(struct kimage *image, void *buf, unsigned long buf_len); #endif -int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf); extern int kexec_add_buffer(struct kexec_buf *kbuf); int kexec_locate_mem_hole(struct kexec_buf *kbuf); +#ifndef arch_kexec_locate_mem_hole +/** + * arch_kexec_locate_mem_hole - Find free memory to place the segments. + * @kbuf: Parameters for the memory search. + * + * On success, kbuf->mem will have the start address of the memory region found. + * + * Return: 0 on success, negative errno on error. + */ +static inline int arch_kexec_locate_mem_hole(struct kexec_buf *kbuf) +{ + return kexec_locate_mem_hole(kbuf); +} +#endif + /* Alignment required for elf header segment */ #define ELF_CORE_HEADER_ALIGN 4096 -- cgit From 0738eceb6201691534df07e0928d0a6168a35787 Mon Sep 17 00:00:00 2001 From: "Naveen N. Rao" Date: Fri, 1 Jul 2022 13:04:05 +0530 Subject: kexec: drop weak attribute from functions Drop __weak attribute from functions in kexec_core.c: - machine_kexec_post_load() - arch_kexec_protect_crashkres() - arch_kexec_unprotect_crashkres() - crash_free_reserved_phys_range() Link: https://lkml.kernel.org/r/c0f6219e03cb399d166d518ab505095218a902dd.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Naveen N. Rao Suggested-by: Eric Biederman Signed-off-by: Andrew Morton Signed-off-by: Mimi Zohar --- include/linux/kexec.h | 32 ++++++++++++++++++++++++++++---- 1 file changed, 28 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 6958c6b471f4..8107606ad1e8 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -390,7 +390,10 @@ extern void machine_kexec_cleanup(struct kimage *image); extern int kernel_kexec(void); extern struct page *kimage_alloc_control_pages(struct kimage *image, unsigned int order); -int machine_kexec_post_load(struct kimage *image); + +#ifndef machine_kexec_post_load +static inline int machine_kexec_post_load(struct kimage *image) { return 0; } +#endif extern void __crash_kexec(struct pt_regs *); extern void crash_kexec(struct pt_regs *); @@ -423,10 +426,21 @@ extern bool kexec_in_progress; int crash_shrink_memory(unsigned long new_size); size_t crash_get_memory_size(void); -void crash_free_reserved_phys_range(unsigned long begin, unsigned long end); -void arch_kexec_protect_crashkres(void); -void arch_kexec_unprotect_crashkres(void); +#ifndef arch_kexec_protect_crashkres +/* + * Protection mechanism for crashkernel reserved memory after + * the kdump kernel is loaded. + * + * Provide an empty default implementation here -- architecture + * code may override this + */ +static inline void arch_kexec_protect_crashkres(void) { } +#endif + +#ifndef arch_kexec_unprotect_crashkres +static inline void arch_kexec_unprotect_crashkres(void) { } +#endif #ifndef page_to_boot_pfn static inline unsigned long page_to_boot_pfn(struct page *page) @@ -456,6 +470,16 @@ static inline phys_addr_t boot_phys_to_phys(unsigned long boot_phys) } #endif +#ifndef crash_free_reserved_phys_range +static inline void crash_free_reserved_phys_range(unsigned long begin, unsigned long end) +{ + unsigned long addr; + + for (addr = begin; addr < end; addr += PAGE_SIZE) + free_reserved_page(boot_pfn_to_page(addr >> PAGE_SHIFT)); +} +#endif + static inline unsigned long virt_to_boot_phys(void *addr) { return phys_to_boot_phys(__pa((unsigned long)addr)); -- cgit From 689a71493bd2f31c024f8c0395f85a1fd4b2138e Mon Sep 17 00:00:00 2001 From: Coiby Xu Date: Thu, 14 Jul 2022 21:40:24 +0800 Subject: kexec: clean up arch_kexec_kernel_verify_sig Before commit 105e10e2cf1c ("kexec_file: drop weak attribute from functions"), there was already no arch-specific implementation of arch_kexec_kernel_verify_sig. With weak attribute dropped by that commit, arch_kexec_kernel_verify_sig is completely useless. So clean it up. Note later patches are dependent on this patch so it should be backported to the stable tree as well. Cc: stable@vger.kernel.org Suggested-by: Eric W. Biederman Reviewed-by: Michal Suchanek Acked-by: Baoquan He Signed-off-by: Coiby Xu [zohar@linux.ibm.com: reworded patch description "Note"] Link: https://lore.kernel.org/linux-integrity/20220714134027.394370-1-coxu@redhat.com/ Signed-off-by: Mimi Zohar --- include/linux/kexec.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 8107606ad1e8..7f710fb3712b 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -212,11 +212,6 @@ static inline void *arch_kexec_kernel_image_load(struct kimage *image) } #endif -#ifdef CONFIG_KEXEC_SIG -int arch_kexec_kernel_verify_sig(struct kimage *image, void *buf, - unsigned long buf_len); -#endif - extern int kexec_add_buffer(struct kexec_buf *kbuf); int kexec_locate_mem_hole(struct kexec_buf *kbuf); -- cgit From c903dae8941deb55043ee46ded29e84e97cd84bb Mon Sep 17 00:00:00 2001 From: Coiby Xu Date: Thu, 14 Jul 2022 21:40:25 +0800 Subject: kexec, KEYS: make the code in bzImage64_verify_sig generic commit 278311e417be ("kexec, KEYS: Make use of platform keyring for signature verify") adds platform keyring support on x86 kexec but not arm64. The code in bzImage64_verify_sig uses the keys on the .builtin_trusted_keys, .machine, if configured and enabled, .secondary_trusted_keys, also if configured, and .platform keyrings to verify the signed kernel image as PE file. Cc: kexec@lists.infradead.org Cc: keyrings@vger.kernel.org Cc: linux-security-module@vger.kernel.org Reviewed-by: Michal Suchanek Signed-off-by: Coiby Xu Signed-off-by: Mimi Zohar --- include/linux/kexec.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 7f710fb3712b..13e6c4b58f07 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -19,6 +19,7 @@ #include #include +#include /* Location of a reserved region to hold the crash kernel. */ @@ -212,6 +213,12 @@ static inline void *arch_kexec_kernel_image_load(struct kimage *image) } #endif +#ifdef CONFIG_KEXEC_SIG +#ifdef CONFIG_SIGNED_PE_FILE_VERIFICATION +int kexec_kernel_verify_pe_sig(const char *kernel, unsigned long kernel_len); +#endif +#endif + extern int kexec_add_buffer(struct kexec_buf *kbuf); int kexec_locate_mem_hole(struct kexec_buf *kbuf); -- cgit From ae6ccaa650380d243cf43d31c864c5ced2fd4612 Mon Sep 17 00:00:00 2001 From: Lukasz Luba Date: Thu, 7 Jul 2022 08:15:52 +0100 Subject: PM: EM: convert power field to micro-Watts precision and align drivers The milli-Watts precision causes rounding errors while calculating efficiency cost for each OPP. This is especially visible in the 'simple' Energy Model (EM), where the power for each OPP is provided from OPP framework. This can cause some OPPs to be marked inefficient, while using micro-Watts precision that might not happen. Update all EM users which access 'power' field and assume the value is in milli-Watts. Solve also an issue with potential overflow in calculation of energy estimation on 32bit machine. It's needed now since the power value (thus the 'cost' as well) are higher. Example calculation which shows the rounding error and impact: power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000 power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18 power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961 power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21 max_freq = 2000MHz cost_a_mW = 18 * 2000MHz/500MHz = 72 cost_a_uW = 18000 * 2000MHz/500MHz = 72000 cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better cost_b_uW = 21961 * 2000MHz/600MHz = 73203 The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly better that the 'cost_b_uW' (this patch uses micro-Watts) and such would have impact on the 'inefficient OPPs' information in the Cpufreq framework. This patch set removes the rounding issue. Signed-off-by: Lukasz Luba Acked-by: Daniel Lezcano Acked-by: Viresh Kumar Signed-off-by: Rafael J. Wysocki --- include/linux/energy_model.h | 54 +++++++++++++++++++++++++++++++------------- 1 file changed, 38 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/energy_model.h b/include/linux/energy_model.h index 8419bffb4398..b9caa01dfac4 100644 --- a/include/linux/energy_model.h +++ b/include/linux/energy_model.h @@ -62,7 +62,7 @@ struct em_perf_domain { /* * em_perf_domain flags: * - * EM_PERF_DOMAIN_MILLIWATTS: The power values are in milli-Watts or some + * EM_PERF_DOMAIN_MICROWATTS: The power values are in micro-Watts or some * other scale. * * EM_PERF_DOMAIN_SKIP_INEFFICIENCIES: Skip inefficient states when estimating @@ -71,7 +71,7 @@ struct em_perf_domain { * EM_PERF_DOMAIN_ARTIFICIAL: The power values are artificial and might be * created by platform missing real power information */ -#define EM_PERF_DOMAIN_MILLIWATTS BIT(0) +#define EM_PERF_DOMAIN_MICROWATTS BIT(0) #define EM_PERF_DOMAIN_SKIP_INEFFICIENCIES BIT(1) #define EM_PERF_DOMAIN_ARTIFICIAL BIT(2) @@ -79,22 +79,44 @@ struct em_perf_domain { #define em_is_artificial(em) ((em)->flags & EM_PERF_DOMAIN_ARTIFICIAL) #ifdef CONFIG_ENERGY_MODEL -#define EM_MAX_POWER 0xFFFF +/* + * The max power value in micro-Watts. The limit of 64 Watts is set as + * a safety net to not overflow multiplications on 32bit platforms. The + * 32bit value limit for total Perf Domain power implies a limit of + * maximum CPUs in such domain to 64. + */ +#define EM_MAX_POWER (64000000) /* 64 Watts */ + +/* + * To avoid possible energy estimation overflow on 32bit machines add + * limits to number of CPUs in the Perf. Domain. + * We are safe on 64bit machine, thus some big number. + */ +#ifdef CONFIG_64BIT +#define EM_MAX_NUM_CPUS 4096 +#else +#define EM_MAX_NUM_CPUS 16 +#endif /* - * Increase resolution of energy estimation calculations for 64-bit - * architectures. The extra resolution improves decision made by EAS for the - * task placement when two Performance Domains might provide similar energy - * estimation values (w/o better resolution the values could be equal). + * To avoid an overflow on 32bit machines while calculating the energy + * use a different order in the operation. First divide by the 'cpu_scale' + * which would reduce big value stored in the 'cost' field, then multiply by + * the 'sum_util'. This would allow to handle existing platforms, which have + * e.g. power ~1.3 Watt at max freq, so the 'cost' value > 1mln micro-Watts. + * In such scenario, where there are 4 CPUs in the Perf. Domain the 'sum_util' + * could be 4096, then multiplication: 'cost' * 'sum_util' would overflow. + * This reordering of operations has some limitations, we lose small + * precision in the estimation (comparing to 64bit platform w/o reordering). * - * We increase resolution only if we have enough bits to allow this increased - * resolution (i.e. 64-bit). The costs for increasing resolution when 32-bit - * are pretty high and the returns do not justify the increased costs. + * We are safe on 64bit machine. */ #ifdef CONFIG_64BIT -#define em_scale_power(p) ((p) * 1000) +#define em_estimate_energy(cost, sum_util, scale_cpu) \ + (((cost) * (sum_util)) / (scale_cpu)) #else -#define em_scale_power(p) (p) +#define em_estimate_energy(cost, sum_util, scale_cpu) \ + (((cost) / (scale_cpu)) * (sum_util)) #endif struct em_data_callback { @@ -112,7 +134,7 @@ struct em_data_callback { * and frequency. * * In case of CPUs, the power is the one of a single CPU in the domain, - * expressed in milli-Watts or an abstract scale. It is expected to + * expressed in micro-Watts or an abstract scale. It is expected to * fit in the [0, EM_MAX_POWER] range. * * Return 0 on success. @@ -148,7 +170,7 @@ struct em_perf_domain *em_cpu_get(int cpu); struct em_perf_domain *em_pd_get(struct device *dev); int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, struct em_data_callback *cb, cpumask_t *span, - bool milliwatts); + bool microwatts); void em_dev_unregister_perf_domain(struct device *dev); /** @@ -273,7 +295,7 @@ static inline unsigned long em_cpu_energy(struct em_perf_domain *pd, * pd_nrg = ------------------------ (4) * scale_cpu */ - return ps->cost * sum_util / scale_cpu; + return em_estimate_energy(ps->cost, sum_util, scale_cpu); } /** @@ -297,7 +319,7 @@ struct em_data_callback {}; static inline int em_dev_register_perf_domain(struct device *dev, unsigned int nr_states, struct em_data_callback *cb, cpumask_t *span, - bool milliwatts) + bool microwatts) { return -EINVAL; } -- cgit From 5e0fd2026cdd474a85b3135c312912321e60f47a Mon Sep 17 00:00:00 2001 From: Lukasz Luba Date: Thu, 7 Jul 2022 08:15:54 +0100 Subject: firmware: arm_scmi: Get detailed power scale from perf In SCMI v3.1 the power scale can be in micro-Watts. The upper layers, e.g. cpufreq and EM should handle received power values properly (upscale when needed). Thus, provide an interface which allows to check what is the scale for power values. The old interface allowed to distinguish between bogo-Watts and milli-Watts only (which was good for older SCMI spec). Acked-by: Sudeep Holla Signed-off-by: Lukasz Luba Signed-off-by: Rafael J. Wysocki --- include/linux/scmi_protocol.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h index 704111f63993..a0a246310ba1 100644 --- a/include/linux/scmi_protocol.h +++ b/include/linux/scmi_protocol.h @@ -60,6 +60,12 @@ struct scmi_clock_info { }; }; +enum scmi_power_scale { + SCMI_POWER_BOGOWATTS, + SCMI_POWER_MILLIWATTS, + SCMI_POWER_MICROWATTS +}; + struct scmi_handle; struct scmi_device; struct scmi_protocol_handle; @@ -135,7 +141,7 @@ struct scmi_perf_proto_ops { unsigned long *rate, unsigned long *power); bool (*fast_switch_possible)(const struct scmi_protocol_handle *ph, struct device *dev); - bool (*power_scale_mw_get)(const struct scmi_protocol_handle *ph); + enum scmi_power_scale (*power_scale_get)(const struct scmi_protocol_handle *ph); }; /** -- cgit From fcfe0ac2fcfae7d5fcad3d0375cb8ff38caf8aba Mon Sep 17 00:00:00 2001 From: Micah Morton Date: Wed, 8 Jun 2022 20:57:11 +0000 Subject: security: Add LSM hook to setgroups() syscall Give the LSM framework the ability to filter setgroups() syscalls. There are already analagous hooks for the set*uid() and set*gid() syscalls. The SafeSetID LSM will use this new hook to ensure setgroups() calls are allowed by the installed security policy. Tested by putting print statement in security_task_fix_setgroups() hook and confirming that it gets hit when userspace does a setgroups() syscall. Acked-by: Casey Schaufler Reviewed-by: Serge Hallyn Signed-off-by: Micah Morton --- include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 7 +++++++ include/linux/security.h | 7 +++++++ 3 files changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index eafa1d2489fd..806448173033 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -201,6 +201,7 @@ LSM_HOOK(int, 0, task_fix_setuid, struct cred *new, const struct cred *old, int flags) LSM_HOOK(int, 0, task_fix_setgid, struct cred *new, const struct cred * old, int flags) +LSM_HOOK(int, 0, task_fix_setgroups, struct cred *new, const struct cred * old) LSM_HOOK(int, 0, task_setpgid, struct task_struct *p, pid_t pgid) LSM_HOOK(int, 0, task_getpgid, struct task_struct *p) LSM_HOOK(int, 0, task_getsid, struct task_struct *p) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 91c8146649f5..84a0d7e02176 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -702,6 +702,13 @@ * @old is the set of credentials that are being replaced. * @flags contains one of the LSM_SETID_* values. * Return 0 on success. + * @task_fix_setgroups: + * Update the module's state after setting the supplementary group + * identity attributes of the current process. + * @new is the set of credentials that will be installed. Modifications + * should be made to this rather than to @current->cred. + * @old is the set of credentials that are being replaced. + * Return 0 on success. * @task_setpgid: * Check permission before setting the process group identifier of the * process @p to @pgid. diff --git a/include/linux/security.h b/include/linux/security.h index 7fc4e9f49f54..1dfd32c49fa3 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -415,6 +415,7 @@ int security_task_fix_setuid(struct cred *new, const struct cred *old, int flags); int security_task_fix_setgid(struct cred *new, const struct cred *old, int flags); +int security_task_fix_setgroups(struct cred *new, const struct cred *old); int security_task_setpgid(struct task_struct *p, pid_t pgid); int security_task_getpgid(struct task_struct *p); int security_task_getsid(struct task_struct *p); @@ -1098,6 +1099,12 @@ static inline int security_task_fix_setgid(struct cred *new, return 0; } +static inline int security_task_fix_setgroups(struct cred *new, + const struct cred *old) +{ + return 0; +} + static inline int security_task_setpgid(struct task_struct *p, pid_t pgid) { return 0; -- cgit From c9fa2b07fa99e648b5042a32dbfa39ba68a190db Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Sun, 26 Jun 2022 20:45:07 +0200 Subject: mnt_idmapping: add vfs[g,u]id_into_k[g,u]id() Add two tiny helpers to conver a vfsuid into a kuid. Signed-off-by: Christian Brauner (Microsoft) --- include/linux/mnt_idmapping.h | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mnt_idmapping.h b/include/linux/mnt_idmapping.h index 41dc80f8b67c..f6e5369d2928 100644 --- a/include/linux/mnt_idmapping.h +++ b/include/linux/mnt_idmapping.h @@ -333,6 +333,19 @@ static inline bool vfsuid_has_fsmapping(struct user_namespace *mnt_userns, return uid_valid(from_vfsuid(mnt_userns, fs_userns, vfsuid)); } +/** + * vfsuid_into_kuid - convert vfsuid into kuid + * @vfsuid: the vfsuid to convert + * + * This can be used when a vfsuid is committed as a kuid. + * + * Return: a kuid with the value of @vfsuid + */ +static inline kuid_t vfsuid_into_kuid(vfsuid_t vfsuid) +{ + return AS_KUIDT(vfsuid); +} + /** * from_vfsgid - map a vfsgid into the filesystem idmapping * @mnt_userns: the mount's idmapping @@ -406,6 +419,19 @@ static inline bool vfsgid_has_fsmapping(struct user_namespace *mnt_userns, return gid_valid(from_vfsgid(mnt_userns, fs_userns, vfsgid)); } +/** + * vfsgid_into_kgid - convert vfsgid into kgid + * @vfsgid: the vfsgid to convert + * + * This can be used when a vfsgid is committed as a kgid. + * + * Return: a kgid with the value of @vfsgid + */ +static inline kgid_t vfsgid_into_kgid(vfsgid_t vfsgid) +{ + return AS_KGIDT(vfsgid); +} + /** * mapped_fsuid - return caller's fsuid mapped up into a mnt_userns * @mnt_userns: the mount's idmapping -- cgit From 0c5fd887d2bb47aa37aa9fb1eb1d1d2abac62972 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Wed, 6 Jul 2022 18:30:59 +0200 Subject: acl: move idmapped mount fixup into vfs_{g,s}etxattr() This cycle we added support for mounting overlayfs on top of idmapped mounts. Recently I've started looking into potential corner cases when trying to add additional tests and I noticed that reporting for POSIX ACLs is currently wrong when using idmapped layers with overlayfs mounted on top of it. I'm going to give a rather detailed explanation to both the origin of the problem and the solution. Let's assume the user creates the following directory layout and they have a rootfs /var/lib/lxc/c1/rootfs. The files in this rootfs are owned as you would expect files on your host system to be owned. For example, ~/.bashrc for your regular user would be owned by 1000:1000 and /root/.bashrc would be owned by 0:0. IOW, this is just regular boring filesystem tree on an ext4 or xfs filesystem. The user chooses to set POSIX ACLs using the setfacl binary granting the user with uid 4 read, write, and execute permissions for their .bashrc file: setfacl -m u:4:rwx /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc Now they to expose the whole rootfs to a container using an idmapped mount. So they first create: mkdir -pv /vol/contpool/{ctrover,merge,lowermap,overmap} mkdir -pv /vol/contpool/ctrover/{over,work} chown 10000000:10000000 /vol/contpool/ctrover/{over,work} The user now creates an idmapped mount for the rootfs: mount-idmapped/mount-idmapped --map-mount=b:0:10000000:65536 \ /var/lib/lxc/c2/rootfs \ /vol/contpool/lowermap This for example makes it so that /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc which is owned by uid and gid 1000 as being owned by uid and gid 10001000 at /vol/contpool/lowermap/home/ubuntu/.bashrc. Assume the user wants to expose these idmapped mounts through an overlayfs mount to a container. mount -t overlay overlay \ -o lowerdir=/vol/contpool/lowermap, \ upperdir=/vol/contpool/overmap/over, \ workdir=/vol/contpool/overmap/work \ /vol/contpool/merge The user can do this in two ways: (1) Mount overlayfs in the initial user namespace and expose it to the container. (2) Mount overlayfs on top of the idmapped mounts inside of the container's user namespace. Let's assume the user chooses the (1) option and mounts overlayfs on the host and then changes into a container which uses the idmapping 0:10000000:65536 which is the same used for the two idmapped mounts. Now the user tries to retrieve the POSIX ACLs using the getfacl command getfacl -n /vol/contpool/lowermap/home/ubuntu/.bashrc and to their surprise they see: # file: vol/contpool/merge/home/ubuntu/.bashrc # owner: 1000 # group: 1000 user::rw- user:4294967295:rwx group::r-- mask::rwx other::r-- indicating the the uid wasn't correctly translated according to the idmapped mount. The problem is how we currently translate POSIX ACLs. Let's inspect the callchain in this example: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */ sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() |> vfs_getxattr() | -> __vfs_getxattr() | -> handler->get == ovl_posix_acl_xattr_get() | -> ovl_xattr_get() | -> vfs_getxattr() | -> __vfs_getxattr() | -> handler->get() /* lower filesystem callback */ |> posix_acl_fix_xattr_to_user() { 4 = make_kuid(&init_user_ns, 4); 4 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 4); /* FAILURE */ -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4); } If the user chooses to use option (2) and mounts overlayfs on top of idmapped mounts inside the container things don't look that much better: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536 sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() |> vfs_getxattr() | -> __vfs_getxattr() | -> handler->get == ovl_posix_acl_xattr_get() | -> ovl_xattr_get() | -> vfs_getxattr() | -> __vfs_getxattr() | -> handler->get() /* lower filesystem callback */ |> posix_acl_fix_xattr_to_user() { 4 = make_kuid(&init_user_ns, 4); 4 = mapped_kuid_fs(&init_user_ns, 4); /* FAILURE */ -1 = from_kuid(0:10000000:65536 /* caller's idmapping */, 4); } As is easily seen the problem arises because the idmapping of the lower mount isn't taken into account as all of this happens in do_gexattr(). But do_getxattr() is always called on an overlayfs mount and inode and thus cannot possible take the idmapping of the lower layers into account. This problem is similar for fscaps but there the translation happens as part of vfs_getxattr() already. Let's walk through an fscaps overlayfs callchain: setcap 'cap_net_raw+ep' /var/lib/lxc/c2/rootfs/home/ubuntu/.bashrc The expected outcome here is that we'll receive the cap_net_raw capability as we are able to map the uid associated with the fscap to 0 within our container. IOW, we want to see 0 as the result of the idmapping translations. If the user chooses option (1) we get the following callchain for fscaps: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */ sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() -> vfs_getxattr() -> xattr_getsecurity() -> security_inode_getsecurity() ________________________________ -> cap_inode_getsecurity() | | { V | 10000000 = make_kuid(0:0:4k /* overlayfs idmapping */, 10000000); | 10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000); | /* Expected result is 0 and thus that we own the fscap. */ | 0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000); | } | -> vfs_getxattr_alloc() | -> handler->get == ovl_other_xattr_get() | -> vfs_getxattr() | -> xattr_getsecurity() | -> security_inode_getsecurity() | -> cap_inode_getsecurity() | { | 0 = make_kuid(0:0:4k /* lower s_user_ns */, 0); | 10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); | 10000000 = from_kuid(0:0:4k /* overlayfs idmapping */, 10000000); | |____________________________________________________________________| } -> vfs_getxattr_alloc() -> handler->get == /* lower filesystem callback */ And if the user chooses option (2) we get: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536 sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() -> vfs_getxattr() -> xattr_getsecurity() -> security_inode_getsecurity() _______________________________ -> cap_inode_getsecurity() | | { V | 10000000 = make_kuid(0:10000000:65536 /* overlayfs idmapping */, 0); | 10000000 = mapped_kuid_fs(0:0:4k /* no idmapped mount */, 10000000); | /* Expected result is 0 and thus that we own the fscap. */ | 0 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000000); | } | -> vfs_getxattr_alloc() | -> handler->get == ovl_other_xattr_get() | |-> vfs_getxattr() | -> xattr_getsecurity() | -> security_inode_getsecurity() | -> cap_inode_getsecurity() | { | 0 = make_kuid(0:0:4k /* lower s_user_ns */, 0); | 10000000 = mapped_kuid_fs(0:10000000:65536 /* idmapped mount */, 0); | 0 = from_kuid(0:10000000:65536 /* overlayfs idmapping */, 10000000); | |____________________________________________________________________| } -> vfs_getxattr_alloc() -> handler->get == /* lower filesystem callback */ We can see how the translation happens correctly in those cases as the conversion happens within the vfs_getxattr() helper. For POSIX ACLs we need to do something similar. However, in contrast to fscaps we cannot apply the fix directly to the kernel internal posix acl data structure as this would alter the cached values and would also require a rework of how we currently deal with POSIX ACLs in general which almost never take the filesystem idmapping into account (the noteable exception being FUSE but even there the implementation is special) and instead retrieve the raw values based on the initial idmapping. The correct values are then generated right before returning to userspace. The fix for this is to move taking the mount's idmapping into account directly in vfs_getxattr() instead of having it be part of posix_acl_fix_xattr_to_user(). To this end we split out two small and unexported helpers posix_acl_getxattr_idmapped_mnt() and posix_acl_setxattr_idmapped_mnt(). The former to be called in vfs_getxattr() and the latter to be called in vfs_setxattr(). Let's go back to the original example. Assume the user chose option (1) and mounted overlayfs on top of idmapped mounts on the host: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:0:4k /* initial idmapping */ sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() |> vfs_getxattr() | |> __vfs_getxattr() | | -> handler->get == ovl_posix_acl_xattr_get() | | -> ovl_xattr_get() | | -> vfs_getxattr() | | |> __vfs_getxattr() | | | -> handler->get() /* lower filesystem callback */ | | |> posix_acl_getxattr_idmapped_mnt() | | { | | 4 = make_kuid(&init_user_ns, 4); | | 10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4); | | 10000004 = from_kuid(&init_user_ns, 10000004); | | |_______________________ | | } | | | | | |> posix_acl_getxattr_idmapped_mnt() | | { | | V | 10000004 = make_kuid(&init_user_ns, 10000004); | 10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004); | 10000004 = from_kuid(&init_user_ns, 10000004); | } |_________________________________________________ | | | | |> posix_acl_fix_xattr_to_user() | { V 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004); /* SUCCESS */ 4 = from_kuid(0:10000000:65536 /* caller's idmapping */, 10000004); } And similarly if the user chooses option (1) and mounted overayfs on top of idmapped mounts inside the container: idmapped mount /vol/contpool/merge: 0:10000000:65536 caller's idmapping: 0:10000000:65536 overlayfs idmapping (ofs->creator_cred): 0:10000000:65536 sys_getxattr() -> path_getxattr() -> getxattr() -> do_getxattr() |> vfs_getxattr() | |> __vfs_getxattr() | | -> handler->get == ovl_posix_acl_xattr_get() | | -> ovl_xattr_get() | | -> vfs_getxattr() | | |> __vfs_getxattr() | | | -> handler->get() /* lower filesystem callback */ | | |> posix_acl_getxattr_idmapped_mnt() | | { | | 4 = make_kuid(&init_user_ns, 4); | | 10000004 = mapped_kuid_fs(0:10000000:65536 /* lower idmapped mount */, 4); | | 10000004 = from_kuid(&init_user_ns, 10000004); | | |_______________________ | | } | | | | | |> posix_acl_getxattr_idmapped_mnt() | | { V | 10000004 = make_kuid(&init_user_ns, 10000004); | 10000004 = mapped_kuid_fs(&init_user_ns /* no idmapped mount */, 10000004); | 10000004 = from_kuid(0(&init_user_ns, 10000004); | |_________________________________________________ | } | | | |> posix_acl_fix_xattr_to_user() | { V 10000004 = make_kuid(0:0:4k /* init_user_ns */, 10000004); /* SUCCESS */ 4 = from_kuid(0:10000000:65536 /* caller's idmappings */, 10000004); } The last remaining problem we need to fix here is ovl_get_acl(). During ovl_permission() overlayfs will call: ovl_permission() -> generic_permission() -> acl_permission_check() -> check_acl() -> get_acl() -> inode->i_op->get_acl() == ovl_get_acl() > get_acl() /* on the underlying filesystem) ->inode->i_op->get_acl() == /*lower filesystem callback */ -> posix_acl_permission() passing through the get_acl request to the underlying filesystem. This will retrieve the acls stored in the lower filesystem without taking the idmapping of the underlying mount into account as this would mean altering the cached values for the lower filesystem. So we block using ACLs for now until we decided on a nice way to fix this. Note this limitation both in the documentation and in the code. The most straightforward solution would be to have ovl_get_acl() simply duplicate the ACLs, update the values according to the idmapped mount and return it to acl_permission_check() so it can be used in posix_acl_permission() forgetting them afterwards. This is a bit heavy handed but fairly straightforward otherwise. Link: https://github.com/brauner/mount-idmapped/issues/9 Link: https://lore.kernel.org/r/20220708090134.385160-2-brauner@kernel.org Cc: Seth Forshee Cc: Amir Goldstein Cc: Vivek Goyal Cc: Christoph Hellwig Cc: Aleksa Sarai Cc: Miklos Szeredi Cc: linux-unionfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner (Microsoft) --- include/linux/posix_acl_xattr.h | 34 ++++++++++++++++++++++------------ include/linux/xattr.h | 2 +- 2 files changed, 23 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/posix_acl_xattr.h b/include/linux/posix_acl_xattr.h index 1766e1de6956..b6bd3eac2bcc 100644 --- a/include/linux/posix_acl_xattr.h +++ b/include/linux/posix_acl_xattr.h @@ -33,21 +33,31 @@ posix_acl_xattr_count(size_t size) } #ifdef CONFIG_FS_POSIX_ACL -void posix_acl_fix_xattr_from_user(struct user_namespace *mnt_userns, - struct inode *inode, - void *value, size_t size); -void posix_acl_fix_xattr_to_user(struct user_namespace *mnt_userns, - struct inode *inode, - void *value, size_t size); +void posix_acl_fix_xattr_from_user(void *value, size_t size); +void posix_acl_fix_xattr_to_user(void *value, size_t size); +void posix_acl_getxattr_idmapped_mnt(struct user_namespace *mnt_userns, + const struct inode *inode, + void *value, size_t size); +void posix_acl_setxattr_idmapped_mnt(struct user_namespace *mnt_userns, + const struct inode *inode, + void *value, size_t size); #else -static inline void posix_acl_fix_xattr_from_user(struct user_namespace *mnt_userns, - struct inode *inode, - void *value, size_t size) +static inline void posix_acl_fix_xattr_from_user(void *value, size_t size) { } -static inline void posix_acl_fix_xattr_to_user(struct user_namespace *mnt_userns, - struct inode *inode, - void *value, size_t size) +static inline void posix_acl_fix_xattr_to_user(void *value, size_t size) +{ +} +static inline void +posix_acl_getxattr_idmapped_mnt(struct user_namespace *mnt_userns, + const struct inode *inode, void *value, + size_t size) +{ +} +static inline void +posix_acl_setxattr_idmapped_mnt(struct user_namespace *mnt_userns, + const struct inode *inode, void *value, + size_t size) { } #endif diff --git a/include/linux/xattr.h b/include/linux/xattr.h index 4c379d23ec6e..979a9d3e5bfb 100644 --- a/include/linux/xattr.h +++ b/include/linux/xattr.h @@ -61,7 +61,7 @@ int __vfs_setxattr_locked(struct user_namespace *, struct dentry *, const char *, const void *, size_t, int, struct inode **); int vfs_setxattr(struct user_namespace *, struct dentry *, const char *, - const void *, size_t, int); + void *, size_t, int); int __vfs_removexattr(struct user_namespace *, struct dentry *, const char *); int __vfs_removexattr_locked(struct user_namespace *, struct dentry *, const char *, struct inode **); -- cgit From 8043bffd01833a8544f2466fb3804310d6e73d09 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Wed, 6 Jul 2022 17:13:23 +0200 Subject: acl: make posix_acl_clone() available to overlayfs The ovl_get_acl() function needs to alter the POSIX ACLs retrieved from the lower filesystem. Instead of hand-rolling a overlayfs specific posix_acl_clone() variant allow export it. It's not special and it's not deeply internal anyway. Link: https://lore.kernel.org/r/20220708090134.385160-3-brauner@kernel.org Cc: Seth Forshee Cc: Amir Goldstein Cc: Vivek Goyal Cc: Christoph Hellwig Cc: Aleksa Sarai Cc: Miklos Szeredi Cc: linux-unionfs@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Reviewed-by: Seth Forshee Signed-off-by: Christian Brauner (Microsoft) --- include/linux/posix_acl.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/posix_acl.h b/include/linux/posix_acl.h index b65c877d92b8..7d1e604c1325 100644 --- a/include/linux/posix_acl.h +++ b/include/linux/posix_acl.h @@ -73,6 +73,7 @@ extern int set_posix_acl(struct user_namespace *, struct inode *, int, struct posix_acl *); struct posix_acl *get_cached_acl_rcu(struct inode *inode, int type); +struct posix_acl *posix_acl_clone(const struct posix_acl *acl, gfp_t flags); #ifdef CONFIG_FS_POSIX_ACL int posix_acl_chmod(struct user_namespace *, struct inode *, umode_t); -- cgit From 0563231f93c6d1f582b168a47753b345c1e20d81 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Google)" Date: Tue, 5 Jul 2022 18:44:54 -0400 Subject: tracing/events: Add __vstring() and __assign_vstr() helper macros There's several places that open code the following logic: TP_STRUCT__entry(__dynamic_array(char, msg, MSG_MAX)), TP_fast_assign(vsnprintf(__get_str(msg), MSG_MAX, vaf->fmt, *vaf->va);) To load a string created by variable array va_list. The main issue with this approach is that "MSG_MAX" usage in the __dynamic_array() portion. That actually just reserves the MSG_MAX in the event, and even wastes space because there's dynamic meta data also saved in the event to denote the offset and size of the dynamic array. It would have been better to just use a static __array() field. Instead, create __vstring() and __assign_vstr() that work like __string and __assign_str() but instead of taking a destination string to copy, take a format string and a va_list pointer and fill in the values. It uses the helper: #define __trace_event_vstr_len(fmt, va) \ ({ \ va_list __ap; \ int __ret; \ \ va_copy(__ap, *(va)); \ __ret = vsnprintf(NULL, 0, fmt, __ap) + 1; \ va_end(__ap); \ \ min(__ret, TRACE_EVENT_STR_MAX); \ }) To figure out the length to store the string. It may be slightly slower as it needs to run the vsnprintf() twice, but it now saves space on the ring buffer. Link: https://lkml.kernel.org/r/20220705224749.053570613@goodmis.org Cc: Dennis Dalessandro Cc: Ingo Molnar Cc: Andrew Morton Cc: Jason Gunthorpe Cc: Leon Romanovsky Cc: Kalle Valo Cc: "David S. Miller" Cc: Eric Dumazet Cc: Jakub Kicinski Cc: Paolo Abeni Cc: Arend van Spriel Cc: Franky Lin Cc: Hante Meuleman Cc: Gregory Greenman Cc: Peter Chen Cc: Greg Kroah-Hartman Cc: Mathias Nyman Cc: Chunfeng Yun Cc: Bin Liu Cc: Marek Lindner Cc: Simon Wunderlich Cc: Antonio Quartulli Cc: Sven Eckelmann Cc: Johannes Berg Cc: Jim Cromie Signed-off-by: Steven Rostedt (Google) --- include/linux/trace_events.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index e6e95a9f07a5..b18759a673c6 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -916,6 +916,24 @@ perf_trace_buf_submit(void *raw_data, int size, int rctx, u16 type, #endif +#define TRACE_EVENT_STR_MAX 512 + +/* + * gcc warns that you can not use a va_list in an inlined + * function. But lets me make it into a macro :-/ + */ +#define __trace_event_vstr_len(fmt, va) \ +({ \ + va_list __ap; \ + int __ret; \ + \ + va_copy(__ap, *(va)); \ + __ret = vsnprintf(NULL, 0, fmt, __ap) + 1; \ + va_end(__ap); \ + \ + min(__ret, TRACE_EVENT_STR_MAX); \ +}) + #endif /* _LINUX_TRACE_EVENT_H */ /* -- cgit From e68c5dcf0aacc48a23cedcb3ce81b8c60837f48c Mon Sep 17 00:00:00 2001 From: Jaehee Park Date: Wed, 13 Jul 2022 16:40:47 -0700 Subject: net: ipv4: new arp_accept option to accept garp only if in-network In many deployments, we want the option to not learn a neighbor from garp if the src ip is not in the same subnet as an address configured on the interface that received the garp message. net.ipv4.arp_accept sysctl is currently used to control creation of a neigh from a received garp packet. This patch adds a new option '2' to net.ipv4.arp_accept which extends option '1' by including the subnet check. Signed-off-by: Jaehee Park Suggested-by: Roopa Prabhu Signed-off-by: Jakub Kicinski --- include/linux/inetdevice.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/inetdevice.h b/include/linux/inetdevice.h index ead323243e7b..ddb27fc0ee8c 100644 --- a/include/linux/inetdevice.h +++ b/include/linux/inetdevice.h @@ -131,7 +131,7 @@ static inline void ipv4_devconf_setall(struct in_device *in_dev) IN_DEV_ORCONF((in_dev), IGNORE_ROUTES_WITH_LINKDOWN) #define IN_DEV_ARPFILTER(in_dev) IN_DEV_ORCONF((in_dev), ARPFILTER) -#define IN_DEV_ARP_ACCEPT(in_dev) IN_DEV_ORCONF((in_dev), ARP_ACCEPT) +#define IN_DEV_ARP_ACCEPT(in_dev) IN_DEV_MAXCONF((in_dev), ARP_ACCEPT) #define IN_DEV_ARP_ANNOUNCE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_ANNOUNCE) #define IN_DEV_ARP_IGNORE(in_dev) IN_DEV_MAXCONF((in_dev), ARP_IGNORE) #define IN_DEV_ARP_NOTIFY(in_dev) IN_DEV_MAXCONF((in_dev), ARP_NOTIFY) -- cgit From 868941b14441282ba08761b770fc6cad69d5bdb7 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Wed, 29 Jun 2022 15:07:00 +0200 Subject: fs: remove no_llseek Now that all callers of ->llseek are going through vfs_llseek(), we don't gain anything by keeping no_llseek around. Nothing actually calls it and setting ->llseek to no_lseek is completely equivalent to leaving it NULL. Longer term (== by the end of merge window) we want to remove all such intializations. To simplify the merge window this commit does *not* touch initializers - it only defines no_llseek as NULL (and simplifies the tests on file opening). At -rc1 we'll need do a mechanical removal of no_llseek - git grep -l -w no_llseek | grep -v porting.rst | while read i; do sed -i '/\/d' $i done would do it. Signed-off-by: Jason A. Donenfeld Signed-off-by: Al Viro --- include/linux/fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 9ad5e3520fae..294932167335 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3022,7 +3022,7 @@ extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out, extern void file_ra_state_init(struct file_ra_state *ra, struct address_space *mapping); extern loff_t noop_llseek(struct file *file, loff_t offset, int whence); -extern loff_t no_llseek(struct file *file, loff_t offset, int whence); +#define no_llseek NULL extern loff_t vfs_setpos(struct file *file, loff_t offset, loff_t maxsize); extern loff_t generic_file_llseek(struct file *file, loff_t offset, int whence); extern loff_t generic_file_llseek_size(struct file *file, loff_t offset, -- cgit From bc72d938c149197688ae3b3ecaa25d4aee8653cb Mon Sep 17 00:00:00 2001 From: Dmitry Rokosov Date: Wed, 1 Jun 2022 17:48:32 +0000 Subject: iio: trigger: move trig->owner init to trigger allocate() stage To provide a new IIO trigger to the IIO core, usually driver executes the following pipeline: allocate()/register()/get(). Before, IIO core assigned trig->owner as a pointer to the module which registered this trigger at the register() stage. But actually the trigger object is owned by the module earlier, on the allocate() stage, when trigger object is successfully allocated for the driver. This patch moves trig->owner initialization from register() stage of trigger initialization pipeline to allocate() stage to eliminate all misunderstandings and time gaps between trigger object creation and owner acquiring. Signed-off-by: Dmitry Rokosov Link: https://lore.kernel.org/r/20220601174837.20292-1-ddrokosov@sberdevices.ru Signed-off-by: Jonathan Cameron --- include/linux/iio/iio.h | 9 ++++++--- include/linux/iio/trigger.h | 21 ++++++++++----------- 2 files changed, 16 insertions(+), 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h index d9b4a9ca9a0f..5dfbfc991c69 100644 --- a/include/linux/iio/iio.h +++ b/include/linux/iio/iio.h @@ -727,10 +727,13 @@ static inline void *iio_priv(const struct iio_dev *indio_dev) void iio_device_free(struct iio_dev *indio_dev); struct iio_dev *devm_iio_device_alloc(struct device *parent, int sizeof_priv); -__printf(2, 3) -struct iio_trigger *devm_iio_trigger_alloc(struct device *parent, - const char *fmt, ...); +#define devm_iio_trigger_alloc(parent, fmt, ...) \ + __devm_iio_trigger_alloc((parent), THIS_MODULE, (fmt), ##__VA_ARGS__) +__printf(3, 4) +struct iio_trigger *__devm_iio_trigger_alloc(struct device *parent, + struct module *this_mod, + const char *fmt, ...); /** * iio_get_debugfs_dentry() - helper function to get the debugfs_dentry * @indio_dev: IIO device structure for device diff --git a/include/linux/iio/trigger.h b/include/linux/iio/trigger.h index 03b1d6863436..f6360d9a492d 100644 --- a/include/linux/iio/trigger.h +++ b/include/linux/iio/trigger.h @@ -131,16 +131,10 @@ static inline void *iio_trigger_get_drvdata(struct iio_trigger *trig) * iio_trigger_register() - register a trigger with the IIO core * @trig_info: trigger to be registered **/ -#define iio_trigger_register(trig_info) \ - __iio_trigger_register((trig_info), THIS_MODULE) -int __iio_trigger_register(struct iio_trigger *trig_info, - struct module *this_mod); +int iio_trigger_register(struct iio_trigger *trig_info); -#define devm_iio_trigger_register(dev, trig_info) \ - __devm_iio_trigger_register((dev), (trig_info), THIS_MODULE) -int __devm_iio_trigger_register(struct device *dev, - struct iio_trigger *trig_info, - struct module *this_mod); +int devm_iio_trigger_register(struct device *dev, + struct iio_trigger *trig_info); /** * iio_trigger_unregister() - unregister a trigger from the core @@ -168,8 +162,13 @@ void iio_trigger_poll_chained(struct iio_trigger *trig); irqreturn_t iio_trigger_generic_data_rdy_poll(int irq, void *private); -__printf(2, 3) -struct iio_trigger *iio_trigger_alloc(struct device *parent, const char *fmt, ...); +#define iio_trigger_alloc(parent, fmt, ...) \ + __iio_trigger_alloc((parent), THIS_MODULE, (fmt), ##__VA_ARGS__) + +__printf(3, 4) +struct iio_trigger *__iio_trigger_alloc(struct device *parent, + struct module *this_mod, + const char *fmt, ...); void iio_trigger_free(struct iio_trigger *trig); /** -- cgit From f484da847a01936e1d087a80cc96e8254492527c Mon Sep 17 00:00:00 2001 From: Mark Bloch Date: Sun, 3 Jul 2022 13:54:03 -0700 Subject: net/mlx5: Expose the ability to point to any UID from shared UID Expose shared_object_to_user_object_allowed, this capability means an object created with shared UID can point to any UID. Signed-off-by: Mark Bloch Reviewed-by: Maor Gottlieb Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 8e87eb47f9dc..9321d774e2d8 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1371,7 +1371,9 @@ enum { }; struct mlx5_ifc_cmd_hca_cap_bits { - u8 reserved_at_0[0x1f]; + u8 reserved_at_0[0x10]; + u8 shared_object_to_user_object_allowed[0x1]; + u8 reserved_at_13[0xe]; u8 vhca_resource_manager[0x1]; u8 hca_cap_2[0x1]; @@ -8507,7 +8509,7 @@ struct mlx5_ifc_create_flow_table_out_bits { struct mlx5_ifc_create_flow_table_in_bits { u8 opcode[0x10]; - u8 reserved_at_10[0x10]; + u8 uid[0x10]; u8 reserved_at_20[0x10]; u8 op_mod[0x10]; -- cgit From 6c27c56cdc69608211617eea217e1a6cecccc675 Mon Sep 17 00:00:00 2001 From: Mark Bloch Date: Sun, 3 Jul 2022 13:54:04 -0700 Subject: net/mlx5: fs, expose flow table ID to users Expose the flow table ID to users. This will be used by downstream patches to allow creating steering rules that point to a flow table ID. Signed-off-by: Mark Bloch Reviewed-by: Maor Gottlieb Signed-off-by: Saeed Mahameed --- include/linux/mlx5/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index ece3e35622d7..eee07d416b56 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -315,4 +315,5 @@ struct mlx5_pkt_reformat *mlx5_packet_reformat_alloc(struct mlx5_core_dev *dev, void mlx5_packet_reformat_dealloc(struct mlx5_core_dev *dev, struct mlx5_pkt_reformat *reformat); +u32 mlx5_flow_table_id(struct mlx5_flow_table *ft); #endif -- cgit From b0bb369ee451323968b31392a86398f15a2ba183 Mon Sep 17 00:00:00 2001 From: Mark Bloch Date: Sun, 3 Jul 2022 13:54:05 -0700 Subject: net/mlx5: fs, allow flow table creation with a UID Add UID field to flow table attributes to allow creating flow tables with a non default (zero) uid. Signed-off-by: Mark Bloch Reviewed-by: Alex Vesker Signed-off-by: Saeed Mahameed --- include/linux/mlx5/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index eee07d416b56..8e73c377da2c 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -178,6 +178,7 @@ struct mlx5_flow_table_attr { int max_fte; u32 level; u32 flags; + u16 uid; struct mlx5_flow_table *next_ft; struct { -- cgit From 335e52c28cf9954d65b819cb68912fd32de3c844 Mon Sep 17 00:00:00 2001 From: David Gow Date: Fri, 1 Jul 2022 17:16:19 +0800 Subject: mm: Add PAGE_ALIGN_DOWN macro This is just the same as PAGE_ALIGN(), but rounds the address down, not up. Suggested-by: Dmitry Vyukov Signed-off-by: David Gow Acked-by: Andrew Morton Signed-off-by: Richard Weinberger --- include/linux/mm.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index cf3d0d673f6b..bd58ee018555 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -221,6 +221,9 @@ int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *, /* to align the pointer to the (next) page boundary */ #define PAGE_ALIGN(addr) ALIGN(addr, PAGE_SIZE) +/* to align the pointer to the (prev) page boundary */ +#define PAGE_ALIGN_DOWN(addr) ALIGN_DOWN(addr, PAGE_SIZE) + /* test whether an address (unsigned long or pointer) is aligned to PAGE_SIZE */ #define PAGE_ALIGNED(addr) IS_ALIGNED((unsigned long)(addr), PAGE_SIZE) -- cgit From 6077c943beee407168f72ece745b0aeaef6b896f Mon Sep 17 00:00:00 2001 From: Alex Sierra Date: Fri, 15 Jul 2022 10:05:08 -0500 Subject: mm: rename is_pinnable_page() to is_longterm_pinnable_page() Patch series "Add MEMORY_DEVICE_COHERENT for coherent device memory mapping", v9. This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory owned by a device that can be mapped into CPU page tables like MEMORY_DEVICE_GENERIC and can also be migrated like MEMORY_DEVICE_PRIVATE. This patch series is mostly self-contained except for a few places where it needs to update other subsystems to handle the new memory type. System stability and performance are not affected according to our ongoing testing, including xfstests. How it works: The system BIOS advertises the GPU device memory (aka VRAM) as SPM (special purpose memory) in the UEFI system address map. The amdgpu driver registers the memory with devmap as MEMORY_DEVICE_COHERENT using devm_memremap_pages. The initial user for this hardware page migration capability is the Frontier supercomputer project. This functionality is not AMD-specific. We expect other GPU vendors to find this functionality useful, and possibly other hardware types in the future. Our test nodes in the lab are similar to the Frontier configuration, with .5 TB of system memory plus 256 GB of device memory split across 4 GPUs, all in a single coherent address space. Page migration is expected to improve application efficiency significantly. We will report empirical results as they become available. Coherent device type pages at gup are now migrated back to system memory if they are being pinned long-term (FOLL_LONGTERM). The reason is, that long-term pinning would interfere with the device memory manager owning the device-coherent pages (e.g. evictions in TTM). These series incorporate Alistair Popple patches to do this migration from pin_user_pages() calls. hmm_gup_test has been added to hmm-test to test different get user pages calls. This series includes handling of device-managed anonymous pages returned by vm_normal_pages. Although they behave like normal pages for purposes of mapping in CPU page tables and for COW, they do not support LRU lists, NUMA migration or THP. We also introduced a FOLL_LRU flag that adds the same behaviour to follow_page and related APIs, to allow callers to specify that they expect to put pages on an LRU list. This patch (of 14): is_pinnable_page() and folio_is_pinnable() are renamed to is_longterm_pinnable_page() and folio_is_longterm_pinnable() respectively. These functions are used in the FOLL_LONGTERM flag context. Link: https://lkml.kernel.org/r/20220715150521.18165-1-alex.sierra@amd.com Link: https://lkml.kernel.org/r/20220715150521.18165-2-alex.sierra@amd.com Signed-off-by: Alex Sierra Reviewed-by: David Hildenbrand Cc: Jason Gunthorpe Cc: Felix Kuehling Cc: Ralph Campbell Cc: Christoph Hellwig Cc: Jerome Glisse Cc: Alistair Popple Cc: Matthew Wilcox Signed-off-by: Andrew Morton --- include/linux/mm.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 9cc02a7e503b..3c044e38958c 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1607,7 +1607,7 @@ static inline bool page_needs_cow_for_dma(struct vm_area_struct *vma, /* MIGRATE_CMA and ZONE_MOVABLE do not allow pin pages */ #ifdef CONFIG_MIGRATION -static inline bool is_pinnable_page(struct page *page) +static inline bool is_longterm_pinnable_page(struct page *page) { #ifdef CONFIG_CMA int mt = get_pageblock_migratetype(page); @@ -1618,15 +1618,15 @@ static inline bool is_pinnable_page(struct page *page) return !is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page)); } #else -static inline bool is_pinnable_page(struct page *page) +static inline bool is_longterm_pinnable_page(struct page *page) { return true; } #endif -static inline bool folio_is_pinnable(struct folio *folio) +static inline bool folio_is_longterm_pinnable(struct folio *folio) { - return is_pinnable_page(&folio->page); + return is_longterm_pinnable_page(&folio->page); } static inline void set_page_zone(struct page *page, enum zone_type zone) -- cgit From 5bb88dc571b1cbf0284100a317fb21ab7d03e40c Mon Sep 17 00:00:00 2001 From: Alex Sierra Date: Fri, 15 Jul 2022 10:05:09 -0500 Subject: mm: move page zone helpers from mm.h to mmzone.h It makes more sense to have these helpers in zone specific header file, rather than the generic mm.h Link: https://lkml.kernel.org/r/20220715150521.18165-3-alex.sierra@amd.com Signed-off-by: Alex Sierra Cc: Alistair Popple Cc: Christoph Hellwig Cc: David Hildenbrand Cc: Felix Kuehling Cc: Jason Gunthorpe Cc: Jerome Glisse Cc: Matthew Wilcox Cc: Ralph Campbell Signed-off-by: Andrew Morton --- include/linux/memremap.h | 2 +- include/linux/mm.h | 78 ---------------------------------------------- include/linux/mmzone.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 81 insertions(+), 79 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 9f5ee49482de..732dde5988fb 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -2,7 +2,7 @@ #ifndef _LINUX_MEMREMAP_H_ #define _LINUX_MEMREMAP_H_ -#include +#include #include #include #include diff --git a/include/linux/mm.h b/include/linux/mm.h index 3c044e38958c..a2d01e49253b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1045,84 +1045,6 @@ vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf); * back into memory. */ -/* - * The zone field is never updated after free_area_init_core() - * sets it, so none of the operations on it need to be atomic. - */ - -/* Page flags: | [SECTION] | [NODE] | ZONE | [LAST_CPUPID] | ... | FLAGS | */ -#define SECTIONS_PGOFF ((sizeof(unsigned long)*8) - SECTIONS_WIDTH) -#define NODES_PGOFF (SECTIONS_PGOFF - NODES_WIDTH) -#define ZONES_PGOFF (NODES_PGOFF - ZONES_WIDTH) -#define LAST_CPUPID_PGOFF (ZONES_PGOFF - LAST_CPUPID_WIDTH) -#define KASAN_TAG_PGOFF (LAST_CPUPID_PGOFF - KASAN_TAG_WIDTH) - -/* - * Define the bit shifts to access each section. For non-existent - * sections we define the shift as 0; that plus a 0 mask ensures - * the compiler will optimise away reference to them. - */ -#define SECTIONS_PGSHIFT (SECTIONS_PGOFF * (SECTIONS_WIDTH != 0)) -#define NODES_PGSHIFT (NODES_PGOFF * (NODES_WIDTH != 0)) -#define ZONES_PGSHIFT (ZONES_PGOFF * (ZONES_WIDTH != 0)) -#define LAST_CPUPID_PGSHIFT (LAST_CPUPID_PGOFF * (LAST_CPUPID_WIDTH != 0)) -#define KASAN_TAG_PGSHIFT (KASAN_TAG_PGOFF * (KASAN_TAG_WIDTH != 0)) - -/* NODE:ZONE or SECTION:ZONE is used to ID a zone for the buddy allocator */ -#ifdef NODE_NOT_IN_PAGE_FLAGS -#define ZONEID_SHIFT (SECTIONS_SHIFT + ZONES_SHIFT) -#define ZONEID_PGOFF ((SECTIONS_PGOFF < ZONES_PGOFF)? \ - SECTIONS_PGOFF : ZONES_PGOFF) -#else -#define ZONEID_SHIFT (NODES_SHIFT + ZONES_SHIFT) -#define ZONEID_PGOFF ((NODES_PGOFF < ZONES_PGOFF)? \ - NODES_PGOFF : ZONES_PGOFF) -#endif - -#define ZONEID_PGSHIFT (ZONEID_PGOFF * (ZONEID_SHIFT != 0)) - -#define ZONES_MASK ((1UL << ZONES_WIDTH) - 1) -#define NODES_MASK ((1UL << NODES_WIDTH) - 1) -#define SECTIONS_MASK ((1UL << SECTIONS_WIDTH) - 1) -#define LAST_CPUPID_MASK ((1UL << LAST_CPUPID_SHIFT) - 1) -#define KASAN_TAG_MASK ((1UL << KASAN_TAG_WIDTH) - 1) -#define ZONEID_MASK ((1UL << ZONEID_SHIFT) - 1) - -static inline enum zone_type page_zonenum(const struct page *page) -{ - ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT); - return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK; -} - -static inline enum zone_type folio_zonenum(const struct folio *folio) -{ - return page_zonenum(&folio->page); -} - -#ifdef CONFIG_ZONE_DEVICE -static inline bool is_zone_device_page(const struct page *page) -{ - return page_zonenum(page) == ZONE_DEVICE; -} -extern void memmap_init_zone_device(struct zone *, unsigned long, - unsigned long, struct dev_pagemap *); -#else -static inline bool is_zone_device_page(const struct page *page) -{ - return false; -} -#endif - -static inline bool folio_is_zone_device(const struct folio *folio) -{ - return is_zone_device_page(&folio->page); -} - -static inline bool is_zone_movable_page(const struct page *page) -{ - return page_zonenum(page) == ZONE_MOVABLE; -} - #if defined(CONFIG_ZONE_DEVICE) && defined(CONFIG_FS_DAX) DECLARE_STATIC_KEY_FALSE(devmap_managed_key); diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 735bf5b37949..5da1135e6755 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -730,6 +730,86 @@ static inline bool zone_is_empty(struct zone *zone) return zone->spanned_pages == 0; } +#ifndef BUILD_VDSO32_64 +/* + * The zone field is never updated after free_area_init_core() + * sets it, so none of the operations on it need to be atomic. + */ + +/* Page flags: | [SECTION] | [NODE] | ZONE | [LAST_CPUPID] | ... | FLAGS | */ +#define SECTIONS_PGOFF ((sizeof(unsigned long)*8) - SECTIONS_WIDTH) +#define NODES_PGOFF (SECTIONS_PGOFF - NODES_WIDTH) +#define ZONES_PGOFF (NODES_PGOFF - ZONES_WIDTH) +#define LAST_CPUPID_PGOFF (ZONES_PGOFF - LAST_CPUPID_WIDTH) +#define KASAN_TAG_PGOFF (LAST_CPUPID_PGOFF - KASAN_TAG_WIDTH) + +/* + * Define the bit shifts to access each section. For non-existent + * sections we define the shift as 0; that plus a 0 mask ensures + * the compiler will optimise away reference to them. + */ +#define SECTIONS_PGSHIFT (SECTIONS_PGOFF * (SECTIONS_WIDTH != 0)) +#define NODES_PGSHIFT (NODES_PGOFF * (NODES_WIDTH != 0)) +#define ZONES_PGSHIFT (ZONES_PGOFF * (ZONES_WIDTH != 0)) +#define LAST_CPUPID_PGSHIFT (LAST_CPUPID_PGOFF * (LAST_CPUPID_WIDTH != 0)) +#define KASAN_TAG_PGSHIFT (KASAN_TAG_PGOFF * (KASAN_TAG_WIDTH != 0)) + +/* NODE:ZONE or SECTION:ZONE is used to ID a zone for the buddy allocator */ +#ifdef NODE_NOT_IN_PAGE_FLAGS +#define ZONEID_SHIFT (SECTIONS_SHIFT + ZONES_SHIFT) +#define ZONEID_PGOFF ((SECTIONS_PGOFF < ZONES_PGOFF) ? \ + SECTIONS_PGOFF : ZONES_PGOFF) +#else +#define ZONEID_SHIFT (NODES_SHIFT + ZONES_SHIFT) +#define ZONEID_PGOFF ((NODES_PGOFF < ZONES_PGOFF) ? \ + NODES_PGOFF : ZONES_PGOFF) +#endif + +#define ZONEID_PGSHIFT (ZONEID_PGOFF * (ZONEID_SHIFT != 0)) + +#define ZONES_MASK ((1UL << ZONES_WIDTH) - 1) +#define NODES_MASK ((1UL << NODES_WIDTH) - 1) +#define SECTIONS_MASK ((1UL << SECTIONS_WIDTH) - 1) +#define LAST_CPUPID_MASK ((1UL << LAST_CPUPID_SHIFT) - 1) +#define KASAN_TAG_MASK ((1UL << KASAN_TAG_WIDTH) - 1) +#define ZONEID_MASK ((1UL << ZONEID_SHIFT) - 1) + +static inline enum zone_type page_zonenum(const struct page *page) +{ + ASSERT_EXCLUSIVE_BITS(page->flags, ZONES_MASK << ZONES_PGSHIFT); + return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK; +} + +static inline enum zone_type folio_zonenum(const struct folio *folio) +{ + return page_zonenum(&folio->page); +} + +#ifdef CONFIG_ZONE_DEVICE +static inline bool is_zone_device_page(const struct page *page) +{ + return page_zonenum(page) == ZONE_DEVICE; +} +extern void memmap_init_zone_device(struct zone *, unsigned long, + unsigned long, struct dev_pagemap *); +#else +static inline bool is_zone_device_page(const struct page *page) +{ + return false; +} +#endif + +static inline bool folio_is_zone_device(const struct folio *folio) +{ + return is_zone_device_page(&folio->page); +} + +static inline bool is_zone_movable_page(const struct page *page) +{ + return page_zonenum(page) == ZONE_MOVABLE; +} +#endif + /* * Return true if [start_pfn, start_pfn + nr_pages) range has a non-empty * intersection with the given zone -- cgit From f25cbb7a95a24ff9a2a3bebd308e303942ae6b2c Mon Sep 17 00:00:00 2001 From: Alex Sierra Date: Fri, 15 Jul 2022 10:05:10 -0500 Subject: mm: add zone device coherent type memory support Device memory that is cache coherent from device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CXL). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory so that it can always be evicted. [hch@lst.de: rebased ontop of the refcount changes, remove is_dev_private_or_coherent_page] Link: https://lkml.kernel.org/r/20220715150521.18165-4-alex.sierra@amd.com Signed-off-by: Alex Sierra Signed-off-by: Christoph Hellwig Acked-by: Felix Kuehling Reviewed-by: Alistair Popple Acked-by: David Hildenbrand Cc: Jason Gunthorpe Cc: Jerome Glisse Cc: Matthew Wilcox Cc: Ralph Campbell Signed-off-by: Andrew Morton --- include/linux/memremap.h | 19 +++++++++++++++++++ include/linux/mm.h | 5 ++++- 2 files changed, 23 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 732dde5988fb..09320b7f706c 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -41,6 +41,13 @@ struct vmem_altmap { * A more complete discussion of unaddressable memory may be found in * include/linux/hmm.h and Documentation/mm/hmm.rst. * + * MEMORY_DEVICE_COHERENT: + * Device memory that is cache coherent from device and CPU point of view. This + * is used on platforms that have an advanced system bus (like CAPI or CXL). A + * driver can hotplug the device memory using ZONE_DEVICE and with that memory + * type. Any page of a process can be migrated to such memory. However no one + * should be allowed to pin such memory so that it can always be evicted. + * * MEMORY_DEVICE_FS_DAX: * Host memory that has similar access semantics as System RAM i.e. DMA * coherent and supports page pinning. In support of coordinating page @@ -61,6 +68,7 @@ struct vmem_altmap { enum memory_type { /* 0 is reserved to catch uninitialized type fields */ MEMORY_DEVICE_PRIVATE = 1, + MEMORY_DEVICE_COHERENT, MEMORY_DEVICE_FS_DAX, MEMORY_DEVICE_GENERIC, MEMORY_DEVICE_PCI_P2PDMA, @@ -150,6 +158,17 @@ static inline bool is_pci_p2pdma_page(const struct page *page) page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA; } +static inline bool is_device_coherent_page(const struct page *page) +{ + return is_zone_device_page(page) && + page->pgmap->type == MEMORY_DEVICE_COHERENT; +} + +static inline bool folio_is_device_coherent(const struct folio *folio) +{ + return is_device_coherent_page(&folio->page); +} + #ifdef CONFIG_ZONE_DEVICE void *memremap_pages(struct dev_pagemap *pgmap, int nid); void memunmap_pages(struct dev_pagemap *pgmap); diff --git a/include/linux/mm.h b/include/linux/mm.h index a2d01e49253b..64393ed3330a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -28,6 +28,7 @@ #include #include #include +#include struct mempolicy; struct anon_vma; @@ -1537,7 +1538,9 @@ static inline bool is_longterm_pinnable_page(struct page *page) if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) return false; #endif - return !is_zone_movable_page(page) || is_zero_pfn(page_to_pfn(page)); + return !(is_device_coherent_page(page) || + is_zone_movable_page(page) || + is_zero_pfn(page_to_pfn(page))); } #else static inline bool is_longterm_pinnable_page(struct page *page) -- cgit From dd19e6d8ffaa1289d75d7833de97faf1b6b2c8e4 Mon Sep 17 00:00:00 2001 From: Alex Sierra Date: Fri, 15 Jul 2022 10:05:12 -0500 Subject: mm: add device coherent vma selection for memory migration This case is used to migrate pages from device memory, back to system memory. Device coherent type memory is cache coherent from device and CPU point of view. Link: https://lkml.kernel.org/r/20220715150521.18165-6-alex.sierra@amd.com Signed-off-by: Alex Sierra Signed-off-by: Christoph Hellwig Acked-by: Felix Kuehling Reviewed-by: Alistair Poppple Reviewed-by: David Hildenbrand Cc: Jason Gunthorpe Cc: Jerome Glisse Cc: Matthew Wilcox Cc: Ralph Campbell Signed-off-by: Andrew Morton --- include/linux/migrate.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 069a89e847f3..b84908debe5c 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -148,6 +148,7 @@ static inline unsigned long migrate_pfn(unsigned long pfn) enum migrate_vma_direction { MIGRATE_VMA_SELECT_SYSTEM = 1 << 0, MIGRATE_VMA_SELECT_DEVICE_PRIVATE = 1 << 1, + MIGRATE_VMA_SELECT_DEVICE_COHERENT = 1 << 2, }; struct migrate_vma { -- cgit From 8012b866085523758780850087102421dbcce522 Mon Sep 17 00:00:00 2001 From: Shiyang Ruan Date: Fri, 3 Jun 2022 13:37:25 +0800 Subject: dax: introduce holder for dax_device Patch series "v14 fsdax-rmap + v11 fsdax-reflink", v2. The patchset fsdax-rmap is aimed to support shared pages tracking for fsdax. It moves owner tracking from dax_assocaite_entry() to pmem device driver, by introducing an interface ->memory_failure() for struct pagemap. This interface is called by memory_failure() in mm, and implemented by pmem device. Then call holder operations to find the filesystem which the corrupted data located in, and call filesystem handler to track files or metadata associated with this page. Finally we are able to try to fix the corrupted data in filesystem and do other necessary processing, such as killing processes who are using the files affected. The call trace is like this: memory_failure() |* fsdax case |------------ |pgmap->ops->memory_failure() => pmem_pgmap_memory_failure() | dax_holder_notify_failure() => | dax_device->holder_ops->notify_failure() => | - xfs_dax_notify_failure() | |* xfs_dax_notify_failure() | |-------------------------- | | xfs_rmap_query_range() | | xfs_dax_failure_fn() | | * corrupted on metadata | | try to recover data, call xfs_force_shutdown() | | * corrupted on file data | | try to recover data, call mf_dax_kill_procs() |* normal case |------------- |mf_generic_kill_procs() The patchset fsdax-reflink attempts to add CoW support for fsdax, and takes XFS, which has both reflink and fsdax features, as an example. One of the key mechanisms needed to be implemented in fsdax is CoW. Copy the data from srcmap before we actually write data to the destination iomap. And we just copy range in which data won't be changed. Another mechanism is range comparison. In page cache case, readpage() is used to load data on disk to page cache in order to be able to compare data. In fsdax case, readpage() does not work. So, we need another compare data with direct access support. With the two mechanisms implemented in fsdax, we are able to make reflink and fsdax work together in XFS. This patch (of 14): To easily track filesystem from a pmem device, we introduce a holder for dax_device structure, and also its operation. This holder is used to remember who is using this dax_device: - When it is the backend of a filesystem, the holder will be the instance of this filesystem. - When this pmem device is one of the targets in a mapped device, the holder will be this mapped device. In this case, the mapped device has its own dax_device and it will follow the first rule. So that we can finally track to the filesystem we needed. The holder and holder_ops will be set when filesystem is being mounted, or an target device is being activated. Link: https://lkml.kernel.org/r/20220603053738.1218681-1-ruansy.fnst@fujitsu.com Link: https://lkml.kernel.org/r/20220603053738.1218681-2-ruansy.fnst@fujitsu.com Signed-off-by: Shiyang Ruan Reviewed-by: Christoph Hellwig Reviewed-by: Dan Williams Reviewed-by: Darrick J. Wong Cc: Dave Chinner Cc: Jane Chu Cc: Goldwyn Rodrigues Cc: Al Viro Cc: Matthew Wilcox Cc: Naoya Horiguchi Cc: Miaohe Lin Cc: Dan Williams Cc: Goldwyn Rodrigues Cc: Ritesh Harjani Signed-off-by: Andrew Morton --- include/linux/dax.h | 33 +++++++++++++++++++++++++-------- 1 file changed, 25 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dax.h b/include/linux/dax.h index e7b81634c52a..cf85fc36da5f 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -43,8 +43,21 @@ struct dax_operations { void *addr, size_t bytes, struct iov_iter *iter); }; +struct dax_holder_operations { + /* + * notify_failure - notify memory failure into inner holder device + * @dax_dev: the dax device which contains the holder + * @offset: offset on this dax device where memory failure occurs + * @len: length of this memory failure event + * @flags: action flags for memory failure handler + */ + int (*notify_failure)(struct dax_device *dax_dev, u64 offset, + u64 len, int mf_flags); +}; + #if IS_ENABLED(CONFIG_DAX) struct dax_device *alloc_dax(void *private, const struct dax_operations *ops); +void *dax_holder(struct dax_device *dax_dev); void put_dax(struct dax_device *dax_dev); void kill_dax(struct dax_device *dax_dev); void dax_write_cache(struct dax_device *dax_dev, bool wc); @@ -66,6 +79,10 @@ static inline bool daxdev_mapping_supported(struct vm_area_struct *vma, return dax_synchronous(dax_dev); } #else +static inline void *dax_holder(struct dax_device *dax_dev) +{ + return NULL; +} static inline struct dax_device *alloc_dax(void *private, const struct dax_operations *ops) { @@ -114,12 +131,9 @@ struct writeback_control; #if defined(CONFIG_BLOCK) && defined(CONFIG_FS_DAX) int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk); void dax_remove_host(struct gendisk *disk); -struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev, - u64 *start_off); -static inline void fs_put_dax(struct dax_device *dax_dev) -{ - put_dax(dax_dev); -} +struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev, u64 *start_off, + void *holder, const struct dax_holder_operations *ops); +void fs_put_dax(struct dax_device *dax_dev, void *holder); #else static inline int dax_add_host(struct dax_device *dax_dev, struct gendisk *disk) { @@ -129,11 +143,12 @@ static inline void dax_remove_host(struct gendisk *disk) { } static inline struct dax_device *fs_dax_get_by_bdev(struct block_device *bdev, - u64 *start_off) + u64 *start_off, void *holder, + const struct dax_holder_operations *ops) { return NULL; } -static inline void fs_put_dax(struct dax_device *dax_dev) +static inline void fs_put_dax(struct dax_device *dax_dev, void *holder) { } #endif /* CONFIG_BLOCK && CONFIG_FS_DAX */ @@ -203,6 +218,8 @@ size_t dax_copy_to_iter(struct dax_device *dax_dev, pgoff_t pgoff, void *addr, size_t bytes, struct iov_iter *i); int dax_zero_page_range(struct dax_device *dax_dev, pgoff_t pgoff, size_t nr_pages); +int dax_holder_notify_failure(struct dax_device *dax_dev, u64 off, u64 len, + int mf_flags); void dax_flush(struct dax_device *dax_dev, void *addr, size_t size); ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, -- cgit From 33a8f7f2b3a3437d016d1b4047a4fd37eb6951b3 Mon Sep 17 00:00:00 2001 From: Shiyang Ruan Date: Fri, 3 Jun 2022 13:37:27 +0800 Subject: pagemap,pmem: introduce ->memory_failure() When memory-failure occurs, we call this function which is implemented by each kind of devices. For the fsdax case, pmem device driver implements it. Pmem device driver will find out the filesystem in which the corrupted page located in. With dax_holder notify support, we are able to notify the memory failure from pmem driver to upper layers. If there is something not support in the notify routine, memory_failure will fall back to the generic hanlder. Link: https://lkml.kernel.org/r/20220603053738.1218681-4-ruansy.fnst@fujitsu.com Signed-off-by: Shiyang Ruan Reviewed-by: Christoph Hellwig Reviewed-by: Dan Williams Reviewed-by: Darrick J. Wong Reviewed-by: Naoya Horiguchi Cc: Al Viro Cc: Dan Williams Cc: Dave Chinner Cc: Goldwyn Rodrigues Cc: Goldwyn Rodrigues Cc: Jane Chu Cc: Matthew Wilcox Cc: Miaohe Lin Cc: Ritesh Harjani Signed-off-by: Andrew Morton --- include/linux/memremap.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 09320b7f706c..19010491a603 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -87,6 +87,18 @@ struct dev_pagemap_ops { * the page back to a CPU accessible page. */ vm_fault_t (*migrate_to_ram)(struct vm_fault *vmf); + + /* + * Handle the memory failure happens on a range of pfns. Notify the + * processes who are using these pfns, and try to recover the data on + * them if necessary. The mf_flags is finally passed to the recover + * function through the whole notify routine. + * + * When this is not implemented, or it returns -EOPNOTSUPP, the caller + * will fall back to a common handler called mf_generic_kill_procs(). + */ + int (*memory_failure)(struct dev_pagemap *pgmap, unsigned long pfn, + unsigned long nr_pages, int mf_flags); }; #define PGMAP_ALTMAP_VALID (1 << 0) -- cgit From 2f437effc689ef913fbe5e31110580b4e7cf04be Mon Sep 17 00:00:00 2001 From: Shiyang Ruan Date: Fri, 3 Jun 2022 13:37:28 +0800 Subject: fsdax: introduce dax_lock_mapping_entry() The current dax_lock_page() locks dax entry by obtaining mapping and index in page. To support 1-to-N RMAP in NVDIMM, we need a new function to lock a specific dax entry corresponding to this file's mapping,index. And output the page corresponding to the specific dax entry for caller use. Link: https://lkml.kernel.org/r/20220603053738.1218681-5-ruansy.fnst@fujitsu.com Signed-off-by: Shiyang Ruan Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Cc: Al Viro Cc: Dan Williams Cc: Dan Williams Cc: Dave Chinner Cc: Goldwyn Rodrigues Cc: Goldwyn Rodrigues Cc: Jane Chu Cc: Matthew Wilcox Cc: Miaohe Lin Cc: Naoya Horiguchi Cc: Ritesh Harjani Signed-off-by: Andrew Morton --- include/linux/dax.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dax.h b/include/linux/dax.h index cf85fc36da5f..7116681b48c0 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -161,6 +161,10 @@ struct page *dax_layout_busy_page(struct address_space *mapping); struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t start, loff_t end); dax_entry_t dax_lock_page(struct page *page); void dax_unlock_page(struct page *page, dax_entry_t cookie); +dax_entry_t dax_lock_mapping_entry(struct address_space *mapping, + unsigned long index, struct page **page); +void dax_unlock_mapping_entry(struct address_space *mapping, + unsigned long index, dax_entry_t cookie); #else static inline struct page *dax_layout_busy_page(struct address_space *mapping) { @@ -188,6 +192,17 @@ static inline dax_entry_t dax_lock_page(struct page *page) static inline void dax_unlock_page(struct page *page, dax_entry_t cookie) { } + +static inline dax_entry_t dax_lock_mapping_entry(struct address_space *mapping, + unsigned long index, struct page **page) +{ + return 0; +} + +static inline void dax_unlock_mapping_entry(struct address_space *mapping, + unsigned long index, dax_entry_t cookie) +{ +} #endif int dax_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero, -- cgit From c36e2024957120566efd99395b5c8cc95b5175c1 Mon Sep 17 00:00:00 2001 From: Shiyang Ruan Date: Fri, 3 Jun 2022 13:37:29 +0800 Subject: mm: introduce mf_dax_kill_procs() for fsdax case This new function is a variant of mf_generic_kill_procs that accepts a file, offset pair instead of a struct to support multiple files sharing a DAX mapping. It is intended to be called by the file systems as part of the memory_failure handler after the file system performed a reverse mapping from the storage address to the file and file offset. Link: https://lkml.kernel.org/r/20220603053738.1218681-6-ruansy.fnst@fujitsu.com Signed-off-by: Shiyang Ruan Reviewed-by: Dan Williams Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Reviewed-by: Miaohe Lin Cc: Al Viro Cc: Dan Williams Cc: Dave Chinner Cc: Goldwyn Rodrigues Cc: Goldwyn Rodrigues Cc: Jane Chu Cc: Matthew Wilcox Cc: Naoya Horiguchi Cc: Ritesh Harjani Signed-off-by: Andrew Morton --- include/linux/mm.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 64393ed3330a..d4ebfc206e2b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3178,6 +3178,8 @@ enum mf_flags { MF_UNPOISON = 1 << 4, MF_SW_SIMULATED = 1 << 5, }; +int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index, + unsigned long count, int mf_flags); extern int memory_failure(unsigned long pfn, int flags); extern void memory_failure_queue(unsigned long pfn, int flags); extern void memory_failure_queue_kick(int cpu); -- cgit From 6061b69b9a550a2ab84e805d0d2315ba6215f112 Mon Sep 17 00:00:00 2001 From: Shiyang Ruan Date: Fri, 3 Jun 2022 13:37:31 +0800 Subject: fsdax: set a CoW flag when associate reflink mappings Introduce a PAGE_MAPPING_DAX_COW flag to support association with CoW file mappings. In this case, since the dax-rmap has already took the responsibility to look up for shared files by given dax page, the page->mapping is no longer to used for rmap but for marking that this dax page is shared. And to make sure disassociation works fine, we use page->index as refcount, and clear page->mapping to the initial state when page->index is decreased to 0. With the help of this new flag, it is able to distinguish normal case and CoW case, and keep the warning in normal case. Link: https://lkml.kernel.org/r/20220603053738.1218681-8-ruansy.fnst@fujitsu.com Signed-off-by: Shiyang Ruan Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Cc: Al Viro Cc: Dan Williams Cc: Dan Williams Cc: Dave Chinner Cc: Goldwyn Rodrigues Cc: Goldwyn Rodrigues Cc: Jane Chu Cc: Matthew Wilcox Cc: Miaohe Lin Cc: Naoya Horiguchi Cc: Ritesh Harjani Signed-off-by: Andrew Morton --- include/linux/page-flags.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 82719d33c0f1..f2ff65f1bf83 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -661,6 +661,12 @@ PAGEFLAG_FALSE(VmemmapSelfHosted, vmemmap_self_hosted) #define PAGE_MAPPING_KSM (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE) #define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE) +/* + * Different with flags above, this flag is used only for fsdax mode. It + * indicates that this page->mapping is now under reflink case. + */ +#define PAGE_MAPPING_DAX_COW 0x1 + static __always_inline bool folio_mapping_flags(struct folio *folio) { return ((unsigned long)folio->mapping & PAGE_MAPPING_FLAGS) != 0; -- cgit From 6f7db3894ae23eb5d40af4efb404aa0c072a68d2 Mon Sep 17 00:00:00 2001 From: Shiyang Ruan Date: Fri, 3 Jun 2022 13:37:36 +0800 Subject: fsdax: dedup file range to use a compare function With dax we cannot deal with readpage() etc. So, we create a dax comparison function which is similar with vfs_dedupe_file_range_compare(). And introduce dax_remap_file_range_prep() for filesystem use. Link: https://lkml.kernel.org/r/20220603053738.1218681-13-ruansy.fnst@fujitsu.com Signed-off-by: Goldwyn Rodrigues Signed-off-by: Shiyang Ruan Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig Cc: Al Viro Cc: Dan Williams Cc: Dan Williams Cc: Dave Chinner Cc: Goldwyn Rodrigues Cc: Jane Chu Cc: Matthew Wilcox Cc: Miaohe Lin Cc: Naoya Horiguchi Cc: Ritesh Harjani Signed-off-by: Andrew Morton --- include/linux/dax.h | 8 ++++++++ include/linux/fs.h | 12 ++++++++---- 2 files changed, 16 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dax.h b/include/linux/dax.h index 7116681b48c0..ba985333e26b 100644 --- a/include/linux/dax.h +++ b/include/linux/dax.h @@ -246,6 +246,14 @@ vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf, int dax_delete_mapping_entry(struct address_space *mapping, pgoff_t index); int dax_invalidate_mapping_entry_sync(struct address_space *mapping, pgoff_t index); +int dax_dedupe_file_range_compare(struct inode *src, loff_t srcoff, + struct inode *dest, loff_t destoff, + loff_t len, bool *is_same, + const struct iomap_ops *ops); +int dax_remap_file_range_prep(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, + loff_t *len, unsigned int remap_flags, + const struct iomap_ops *ops); static inline bool dax_mapping(struct address_space *mapping) { return mapping->host && IS_DAX(mapping->host); diff --git a/include/linux/fs.h b/include/linux/fs.h index 9ad5e3520fae..134e9d7ad5d6 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -74,6 +74,7 @@ struct fsverity_operations; struct fs_context; struct fs_parameter_spec; struct fileattr; +struct iomap_ops; extern void __init inode_init(void); extern void __init inode_init_early(void); @@ -2070,10 +2071,13 @@ extern ssize_t vfs_copy_file_range(struct file *, loff_t , struct file *, extern ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, size_t len, unsigned int flags); -extern int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, - struct file *file_out, loff_t pos_out, - loff_t *count, - unsigned int remap_flags); +int __generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, + loff_t *len, unsigned int remap_flags, + const struct iomap_ops *dax_read_ops); +int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, + struct file *file_out, loff_t pos_out, + loff_t *count, unsigned int remap_flags); extern loff_t do_clone_file_range(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t len, unsigned int remap_flags); -- cgit From 4fa6893faeaaea4fe4440512d2a708527ef47051 Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Thu, 16 Jun 2022 10:48:35 -0700 Subject: mm: thp: consolidate vma size check to transhuge_vma_suitable There are couple of places that check whether the vma size is ok for THP or whether address fits, they are open coded and duplicate, use transhuge_vma_suitable() to do the job by passing in (vma->end - HPAGE_PMD_SIZE). Move vma size check into hugepage_vma_check(). This will make khugepaged_enter() is as same as khugepaged_enter_vma(). There is just one caller for khugepaged_enter(), replace it to khugepaged_enter_vma() and remove khugepaged_enter(). Link: https://lkml.kernel.org/r/20220616174840.1202070-3-shy828301@gmail.com Signed-off-by: Yang Shi Reviewed-by: Zach O'Keefe Cc: Kirill A. Shutemov Cc: Matthew Wilcox Cc: Miaohe Lin Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/huge_mm.h | 11 +++++++++++ include/linux/khugepaged.h | 14 -------------- 2 files changed, 11 insertions(+), 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 648cb3ce7099..8a5a8bfce0f5 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -116,6 +116,17 @@ extern struct kobj_attribute shmem_enabled_attr; extern unsigned long transparent_hugepage_flags; +/* + * Do the below checks: + * - For file vma, check if the linear page offset of vma is + * HPAGE_PMD_NR aligned within the file. The hugepage is + * guaranteed to be hugepage-aligned within the file, but we must + * check that the PMD-aligned addresses in the VMA map to + * PMD-aligned offsets within the file, else the hugepage will + * not be PMD-mappable. + * - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE + * area. + */ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, unsigned long addr) { diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h index 392d34c3c59a..31ca8a7f78f4 100644 --- a/include/linux/khugepaged.h +++ b/include/linux/khugepaged.h @@ -51,16 +51,6 @@ static inline void khugepaged_exit(struct mm_struct *mm) if (test_bit(MMF_VM_HUGEPAGE, &mm->flags)) __khugepaged_exit(mm); } - -static inline void khugepaged_enter(struct vm_area_struct *vma, - unsigned long vm_flags) -{ - if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags) && - khugepaged_enabled()) { - if (hugepage_vma_check(vma, vm_flags)) - __khugepaged_enter(vma->vm_mm); - } -} #else /* CONFIG_TRANSPARENT_HUGEPAGE */ static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm) { @@ -68,10 +58,6 @@ static inline void khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm static inline void khugepaged_exit(struct mm_struct *mm) { } -static inline void khugepaged_enter(struct vm_area_struct *vma, - unsigned long vm_flags) -{ -} static inline void khugepaged_enter_vma(struct vm_area_struct *vma, unsigned long vm_flags) { -- cgit From 9fec51689ff60d9766b38051a0b1692f93d95364 Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Thu, 16 Jun 2022 10:48:37 -0700 Subject: mm: thp: kill transparent_hugepage_active() The transparent_hugepage_active() was introduced to show THP eligibility bit in smaps in proc, smaps is the only user. But it actually does the similar check as hugepage_vma_check() which is used by khugepaged. We definitely don't have to maintain two similar checks, so kill transparent_hugepage_active(). This patch also fixed the wrong behavior for VM_NO_KHUGEPAGED vmas. Also move hugepage_vma_check() to huge_memory.c and huge_mm.h since it is not only for khugepaged anymore. [akpm@linux-foundation.org: check vma->vm_mm, per Zach] [akpm@linux-foundation.org: add comment to vdso check] Link: https://lkml.kernel.org/r/20220616174840.1202070-5-shy828301@gmail.com Signed-off-by: Yang Shi Reviewed-by: Zach O'Keefe Cc: Kirill A. Shutemov Cc: Matthew Wilcox Cc: Miaohe Lin Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/huge_mm.h | 16 ++++++++++------ include/linux/khugepaged.h | 2 -- 2 files changed, 10 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 8a5a8bfce0f5..64487bcd0c7b 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -202,7 +202,9 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma) !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode); } -bool transparent_hugepage_active(struct vm_area_struct *vma); +bool hugepage_vma_check(struct vm_area_struct *vma, + unsigned long vm_flags, + bool smaps); #define transparent_hugepage_use_zero_page() \ (transparent_hugepage_flags & \ @@ -351,11 +353,6 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma) return false; } -static inline bool transparent_hugepage_active(struct vm_area_struct *vma) -{ - return false; -} - static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, unsigned long addr) { @@ -368,6 +365,13 @@ static inline bool transhuge_vma_enabled(struct vm_area_struct *vma, return false; } +static inline bool hugepage_vma_check(struct vm_area_struct *vma, + unsigned long vm_flags, + bool smaps) +{ + return false; +} + static inline void prep_transhuge_page(struct page *page) {} #define transparent_hugepage_flags 0UL diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h index 31ca8a7f78f4..ea5fd4c398f7 100644 --- a/include/linux/khugepaged.h +++ b/include/linux/khugepaged.h @@ -10,8 +10,6 @@ extern struct attribute_group khugepaged_attr_group; extern int khugepaged_init(void); extern void khugepaged_destroy(void); extern int start_stop_khugepaged(void); -extern bool hugepage_vma_check(struct vm_area_struct *vma, - unsigned long vm_flags); extern void __khugepaged_enter(struct mm_struct *mm); extern void __khugepaged_exit(struct mm_struct *mm); extern void khugepaged_enter_vma(struct vm_area_struct *vma, -- cgit From 7da4e2cb8b1ff8221759bfc7512d651ee69516dc Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Thu, 16 Jun 2022 10:48:38 -0700 Subject: mm: thp: kill __transhuge_page_enabled() The page fault path checks THP eligibility with __transhuge_page_enabled() which does the similar thing as hugepage_vma_check(), so use hugepage_vma_check() instead. However page fault allows DAX and !anon_vma cases, so added a new flag, in_pf, to hugepage_vma_check() to make page fault work correctly. The in_pf flag is also used to skip shmem and file THP for page fault since shmem handles THP in its own shmem_fault() and file THP allocation on fault is not supported yet. Also remove hugepage_vma_enabled() since hugepage_vma_check() is the only caller now, it is not necessary to have a helper function. Link: https://lkml.kernel.org/r/20220616174840.1202070-6-shy828301@gmail.com Signed-off-by: Yang Shi Reviewed-by: Zach O'Keefe Cc: Kirill A. Shutemov Cc: Matthew Wilcox Cc: Miaohe Lin Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/huge_mm.h | 57 ++----------------------------------------------- 1 file changed, 2 insertions(+), 55 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 64487bcd0c7b..cd8a6c5d9fe5 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -146,48 +146,6 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, return true; } -static inline bool transhuge_vma_enabled(struct vm_area_struct *vma, - unsigned long vm_flags) -{ - /* Explicitly disabled through madvise. */ - if ((vm_flags & VM_NOHUGEPAGE) || - test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) - return false; - return true; -} - -/* - * to be used on vmas which are known to support THP. - * Use transparent_hugepage_active otherwise - */ -static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma) -{ - - /* - * If the hardware/firmware marked hugepage support disabled. - */ - if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX)) - return false; - - if (!transhuge_vma_enabled(vma, vma->vm_flags)) - return false; - - if (vma_is_temporary_stack(vma)) - return false; - - if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_FLAG)) - return true; - - if (vma_is_dax(vma)) - return true; - - if (transparent_hugepage_flags & - (1 << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG)) - return !!(vma->vm_flags & VM_HUGEPAGE); - - return false; -} - static inline bool file_thp_enabled(struct vm_area_struct *vma) { struct inode *inode; @@ -204,7 +162,7 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma) bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, - bool smaps); + bool smaps, bool in_pf); #define transparent_hugepage_use_zero_page() \ (transparent_hugepage_flags & \ @@ -348,26 +306,15 @@ static inline bool folio_test_pmd_mappable(struct folio *folio) return false; } -static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma) -{ - return false; -} - static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, unsigned long addr) { return false; } -static inline bool transhuge_vma_enabled(struct vm_area_struct *vma, - unsigned long vm_flags) -{ - return false; -} - static inline bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, - bool smaps) + bool smaps, bool in_pf) { return false; } -- cgit From 1064026bab9f011bdea1251d44d66bbbcee04f6e Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Thu, 16 Jun 2022 10:48:39 -0700 Subject: mm: khugepaged: reorg some khugepaged helpers The khugepaged_{enabled|always|req_madv} are not khugepaged only anymore, move them to huge_mm.h and rename to hugepage_flags_xxx, and remove khugepaged_req_madv due to no users. Also move khugepaged_defrag to khugepaged.c since its only caller is in that file, it doesn't have to be in a header file. Link: https://lkml.kernel.org/r/20220616174840.1202070-7-shy828301@gmail.com Signed-off-by: Yang Shi Reviewed-by: Zach O'Keefe Cc: Kirill A. Shutemov Cc: Matthew Wilcox Cc: Miaohe Lin Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/huge_mm.h | 8 ++++++++ include/linux/khugepaged.h | 14 -------------- 2 files changed, 8 insertions(+), 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index cd8a6c5d9fe5..ae3d8e2fd9e2 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -116,6 +116,14 @@ extern struct kobj_attribute shmem_enabled_attr; extern unsigned long transparent_hugepage_flags; +#define hugepage_flags_enabled() \ + (transparent_hugepage_flags & \ + ((1<flags)) -- cgit From e95a9851787bbb3cd4deb40fe8bab03f731852d1 Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Tue, 21 Jun 2022 16:56:17 -0700 Subject: hugetlb: skip to end of PT page mapping when pte not present Patch series "hugetlb: speed up linear address scanning", v2. At unmap, fork and remap time hugetlb address ranges are linearly scanned. We can optimize these scans if the ranges are sparsely populated. Also, enable page table "Lazy copy" for hugetlb at fork. NOTE: Architectures not defining CONFIG_ARCH_WANT_GENERAL_HUGETLB need to add an arch specific version hugetlb_mask_last_page() to take advantage of sparse address scanning improvements. Baolin Wang added the routine for arm64. Other architectures which could be optimized are: ia64, mips, parisc, powerpc, s390, sh and sparc. This patch (of 4): HugeTLB address ranges are linearly scanned during fork, unmap and remap operations. If a non-present entry is encountered, the code currently continues to the next huge page aligned address. However, a non-present entry implies that the page table page for that entry is not present. Therefore, the linear scan can skip to the end of range mapped by the page table page. This can speed operations on large sparsely populated hugetlb mappings. Create a new routine hugetlb_mask_last_page() that will return an address mask. When the mask is ORed with an address, the result will be the address of the last huge page mapped by the associated page table page. Use this mask to update addresses in routines which linearly scan hugetlb address ranges when a non-present pte is encountered. hugetlb_mask_last_page is related to the implementation of huge_pte_offset as hugetlb_mask_last_page is called when huge_pte_offset returns NULL. This patch only provides a complete hugetlb_mask_last_page implementation when CONFIG_ARCH_WANT_GENERAL_HUGETLB is defined. Architectures which provide their own versions of huge_pte_offset can also provide their own version of hugetlb_mask_last_page. Link: https://lkml.kernel.org/r/20220621235620.291305-1-mike.kravetz@oracle.com Link: https://lkml.kernel.org/r/20220621235620.291305-2-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz Tested-by: Baolin Wang Reviewed-by: Baolin Wang Acked-by: Muchun Song Reported-by: kernel test robot Cc: Michal Hocko Cc: Peter Xu Cc: Naoya Horiguchi Cc: James Houghton Cc: Mina Almasry Cc: "Aneesh Kumar K.V" Cc: Anshuman Khandual Cc: Paul Walmsley Cc: Christian Borntraeger Cc: Catalin Marinas Cc: Will Deacon Cc: Rolf Eike Beer Cc: David Hildenbrand Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index c6cccfaf8708..ce30fad5fd13 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -194,6 +194,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz); pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, unsigned long sz); +unsigned long hugetlb_mask_last_page(struct hstate *h); int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long *addr, pte_t *ptep); void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, -- cgit From 4ddb4d91b82f4b64458fe35bc8e395c7c082ea2b Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Tue, 21 Jun 2022 16:56:19 -0700 Subject: hugetlb: do not update address in huge_pmd_unshare As an optimization for loops sequentially processing hugetlb address ranges, huge_pmd_unshare would update a passed address if it unshared a pmd. Updating a loop control variable outside the loop like this is generally a bad idea. These loops are now using hugetlb_mask_last_page to optimize scanning when non-present ptes are discovered. The same can be done when huge_pmd_unshare returns 1 indicating a pmd was unshared. Remove address update from huge_pmd_unshare. Change the passed argument type and update all callers. In loops sequentially processing addresses use hugetlb_mask_last_page to update address if pmd is unshared. [sfr@canb.auug.org.au: fix an unused variable warning/error] Link: https://lkml.kernel.org/r/20220622171117.70850960@canb.auug.org.au Link: https://lkml.kernel.org/r/20220621235620.291305-4-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz Signed-off-by: Stephen Rothwell Acked-by: Muchun Song Reviewed-by: Baolin Wang Cc: "Aneesh Kumar K.V" Cc: Anshuman Khandual Cc: Catalin Marinas Cc: Christian Borntraeger Cc: David Hildenbrand Cc: James Houghton Cc: kernel test robot Cc: Michal Hocko Cc: Mina Almasry Cc: Naoya Horiguchi Cc: Paul Walmsley Cc: Peter Xu Cc: Rolf Eike Beer Cc: Will Deacon Cc: Stephen Rothwell Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index ce30fad5fd13..75ee739d815b 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -196,7 +196,7 @@ pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, unsigned long sz); unsigned long hugetlb_mask_last_page(struct hstate *h); int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long *addr, pte_t *ptep); + unsigned long addr, pte_t *ptep); void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, unsigned long *start, unsigned long *end); struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address, @@ -243,7 +243,7 @@ static inline struct address_space *hugetlb_page_mapping_lock_write( static inline int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long *addr, pte_t *ptep) + unsigned long addr, pte_t *ptep) { return 0; } -- cgit From bf75f200569dd05ac2112797f44548beb6b4be26 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Fri, 24 Jun 2022 13:54:17 +0100 Subject: mm/page_alloc: add page->buddy_list and page->pcp_list Patch series "Drain remote per-cpu directly", v5. Some setups, notably NOHZ_FULL CPUs, may be running realtime or latency-sensitive applications that cannot tolerate interference due to per-cpu drain work queued by __drain_all_pages(). Introduce a new mechanism to remotely drain the per-cpu lists. It is made possible by remotely locking 'struct per_cpu_pages' new per-cpu spinlocks. This has two advantages, the time to drain is more predictable and other unrelated tasks are not interrupted. This series has the same intent as Nicolas' series "mm/page_alloc: Remote per-cpu lists drain support" -- avoid interference of a high priority task due to a workqueue item draining per-cpu page lists. While many workloads can tolerate a brief interruption, it may cause a real-time task running on a NOHZ_FULL CPU to miss a deadline and at minimum, the draining is non-deterministic. Currently an IRQ-safe local_lock protects the page allocator per-cpu lists. The local_lock on its own prevents migration and the IRQ disabling protects from corruption due to an interrupt arriving while a page allocation is in progress. This series adjusts the locking. A spinlock is added to struct per_cpu_pages to protect the list contents while local_lock_irq is ultimately replaced by just the spinlock in the final patch. This allows a remote CPU to safely. Follow-on work should allow the spin_lock_irqsave to be converted to spin_lock to avoid IRQs being disabled/enabled in most cases. The follow-on patch will be one kernel release later as it is relatively high risk and it'll make bisections more clear if there are any problems. Patch 1 is a cosmetic patch to clarify when page->lru is storing buddy pages and when it is storing per-cpu pages. Patch 2 shrinks per_cpu_pages to make room for a spin lock. Strictly speaking this is not necessary but it avoids per_cpu_pages consuming another cache line. Patch 3 is a preparation patch to avoid code duplication. Patch 4 is a minor correction. Patch 5 uses a spin_lock to protect the per_cpu_pages contents while still relying on local_lock to prevent migration, stabilise the pcp lookup and prevent IRQ reentrancy. Patch 6 remote drains per-cpu pages directly instead of using a workqueue. Patch 7 uses a normal spinlock instead of local_lock for remote draining This patch (of 7): The page allocator uses page->lru for storing pages on either buddy or PCP lists. Create page->buddy_list and page->pcp_list as a union with page->lru. This is simply to clarify what type of list a page is on in the page allocator. No functional change intended. [minchan@kernel.org: fix page lru fields in macros] Link: https://lkml.kernel.org/r/20220624125423.6126-2-mgorman@techsingularity.net Signed-off-by: Mel Gorman Tested-by: Minchan Kim Acked-by: Minchan Kim Reviewed-by: Nicolas Saenz Julienne Acked-by: Vlastimil Babka Tested-by: Yu Zhao Cc: Marcelo Tosatti Cc: Michal Hocko Cc: Hugh Dickins Cc: Marek Szyprowski Signed-off-by: Andrew Morton --- include/linux/mm_types.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 6b961a29bf26..cf97f3884fda 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -87,6 +87,7 @@ struct page { */ union { struct list_head lru; + /* Or, for the Unevictable "LRU list" slot */ struct { /* Always even, to negate PageTail */ @@ -94,6 +95,10 @@ struct page { /* Count page's or folio's mlocks */ unsigned int mlock_count; }; + + /* Or, free page */ + struct list_head buddy_list; + struct list_head pcp_list; }; /* See page-flags.h for PAGE_MAPPING_FLAGS */ struct address_space *mapping; -- cgit From 5d0a661d808fc8ddc26940b1a12b82ae356f3ae2 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Fri, 24 Jun 2022 13:54:18 +0100 Subject: mm/page_alloc: use only one PCP list for THP-sized allocations The per_cpu_pages is cache-aligned on a standard x86-64 distribution configuration but a later patch will add a new field which would push the structure into the next cache line. Use only one list to store THP-sized pages on the per-cpu list. This assumes that the vast majority of THP-sized allocations are GFP_MOVABLE but even if it was another type, it would not contribute to serious fragmentation that potentially causes a later THP allocation failure. Align per_cpu_pages on the cacheline boundary to ensure there is no false cache sharing. After this patch, the structure sizing is; struct per_cpu_pages { int count; /* 0 4 */ int high; /* 4 4 */ int batch; /* 8 4 */ short int free_factor; /* 12 2 */ short int expire; /* 14 2 */ struct list_head lists[13]; /* 16 208 */ /* size: 256, cachelines: 4, members: 6 */ /* padding: 32 */ } __attribute__((__aligned__(64))); Link: https://lkml.kernel.org/r/20220624125423.6126-3-mgorman@techsingularity.net Signed-off-by: Mel Gorman Tested-by: Minchan Kim Acked-by: Minchan Kim Acked-by: Vlastimil Babka Tested-by: Yu Zhao Cc: Hugh Dickins Cc: Marcelo Tosatti Cc: Marek Szyprowski Cc: Michal Hocko Cc: Nicolas Saenz Julienne Signed-off-by: Andrew Morton --- include/linux/mmzone.h | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 5da1135e6755..041136b5628a 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -355,15 +355,18 @@ enum zone_watermarks { }; /* - * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER plus one additional - * for pageblock size for THP if configured. + * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER. One additional list + * for THP which will usually be GFP_MOVABLE. Even if it is another type, + * it should not contribute to serious fragmentation causing THP allocation + * failures. */ #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define NR_PCP_THP 1 #else #define NR_PCP_THP 0 #endif -#define NR_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1 + NR_PCP_THP)) +#define NR_LOWORDER_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1)) +#define NR_PCP_LISTS (NR_LOWORDER_PCP_LISTS + NR_PCP_THP) /* * Shift to encode migratetype and order in the same integer, with order @@ -389,7 +392,7 @@ struct per_cpu_pages { /* Lists of pages, one per migrate type stored on the pcp-lists */ struct list_head lists[NR_PCP_LISTS]; -}; +} ____cacheline_aligned_in_smp; struct per_cpu_zonestat { #ifdef CONFIG_SMP -- cgit From 4b23a68f953628eb4e4b7fe1294ebf93d4b8ceee Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Fri, 24 Jun 2022 13:54:21 +0100 Subject: mm/page_alloc: protect PCP lists with a spinlock Currently the PCP lists are protected by using local_lock_irqsave to prevent migration and IRQ reentrancy but this is inconvenient. Remote draining of the lists is impossible and a workqueue is required and every task allocation/free must disable then enable interrupts which is expensive. As preparation for dealing with both of those problems, protect the lists with a spinlock. The IRQ-unsafe version of the lock is used because IRQs are already disabled by local_lock_irqsave. spin_trylock is used in combination with local_lock_irqsave() but later will be replaced with a spin_trylock_irqsave when the local_lock is removed. The per_cpu_pages still fits within the same number of cache lines after this patch relative to before the series. struct per_cpu_pages { spinlock_t lock; /* 0 4 */ int count; /* 4 4 */ int high; /* 8 4 */ int batch; /* 12 4 */ short int free_factor; /* 16 2 */ short int expire; /* 18 2 */ /* XXX 4 bytes hole, try to pack */ struct list_head lists[13]; /* 24 208 */ /* size: 256, cachelines: 4, members: 7 */ /* sum members: 228, holes: 1, sum holes: 4 */ /* padding: 24 */ } __attribute__((__aligned__(64))); There is overhead in the fast path due to acquiring the spinlock even though the spinlock is per-cpu and uncontended in the common case. Page Fault Test (PFT) running on a 1-socket reported the following results on a 1 socket machine. 5.19.0-rc3 5.19.0-rc3 vanilla mm-pcpspinirq-v5r16 Hmean faults/sec-1 869275.7381 ( 0.00%) 874597.5167 * 0.61%* Hmean faults/sec-3 2370266.6681 ( 0.00%) 2379802.0362 * 0.40%* Hmean faults/sec-5 2701099.7019 ( 0.00%) 2664889.7003 * -1.34%* Hmean faults/sec-7 3517170.9157 ( 0.00%) 3491122.8242 * -0.74%* Hmean faults/sec-8 3965729.6187 ( 0.00%) 3939727.0243 * -0.66%* There is a small hit in the number of faults per second but given that the results are more stable, it's borderline noise. [akpm@linux-foundation.org: add missing local_unlock_irqrestore() on contention path] Link: https://lkml.kernel.org/r/20220624125423.6126-6-mgorman@techsingularity.net Signed-off-by: Mel Gorman Tested-by: Yu Zhao Reviewed-by: Nicolas Saenz Julienne Tested-by: Nicolas Saenz Julienne Acked-by: Vlastimil Babka Cc: Hugh Dickins Cc: Marcelo Tosatti Cc: Marek Szyprowski Cc: Michal Hocko Cc: Minchan Kim Signed-off-by: Andrew Morton --- include/linux/mmzone.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 041136b5628a..578247a341b2 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -382,6 +382,7 @@ enum zone_watermarks { /* Fields and list protected by pagesets local_lock in page_alloc.c */ struct per_cpu_pages { + spinlock_t lock; /* Protects lists field */ int count; /* number of pages in the list */ int high; /* high watermark, emptying needed */ int batch; /* chunk size for buddy add/remove */ -- cgit From 840532711d7299d7e937952482ec899d4622c452 Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Mon, 11 Jul 2022 12:35:35 +0530 Subject: mm/mmap: build protect protection_map[] with __P000 Patch series "mm/mmap: Drop __SXXX/__PXXX macros from across platforms", v7. __SXXX/__PXXX macros are unnecessary abstraction layer in creating the generic protection_map[] array which is used for vm_get_page_prot(). This abstraction layer can be avoided, if the platforms just define the array protection_map[] for all possible vm_flags access permission combinations and also export vm_get_page_prot() implementation. This series drops __SXXX/__PXXX macros from across platforms in the tree. First it build protects generic protection_map[] array with '#ifdef __P000' and moves it inside platforms which enable ARCH_HAS_VM_GET_PAGE_PROT. Later this build protects same array with '#ifdef ARCH_HAS_VM_GET_PAGE_PROT' and moves inside remaining platforms while enabling ARCH_HAS_VM_GET_PAGE_PROT. This adds a new macro DECLARE_VM_GET_PAGE_PROT defining the current generic vm_get_page_prot(), in order for it to be reused on platforms that do not require custom implementation. Finally, ARCH_HAS_VM_GET_PAGE_PROT can just be dropped, as all platforms now define and export vm_get_page_prot(), via looking up a private and static protection_map[] array. protection_map[] data type has been changed as 'static const' on all platforms that do not change it during boot. This patch (of 26): Build protect generic protection_map[] array with __P000, so that it can be moved inside all the platforms one after the other. Otherwise there will be build failures during this process. CONFIG_ARCH_HAS_VM_GET_PAGE_PROT cannot be used for this purpose as only certain platforms enable this config now. Link: https://lkml.kernel.org/r/20220711070600.2378316-1-anshuman.khandual@arm.com Link: https://lkml.kernel.org/r/20220711070600.2378316-2-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual Reviewed-by: Christoph Hellwig Reviewed-by: Christophe Leroy Suggested-by: Christophe Leroy Cc: Arnd Bergmann Cc: Brian Cain Cc: Catalin Marinas Cc: Christoph Hellwig Cc: Chris Zankel Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Guo Ren Cc: Heiko Carstens Cc: Huacai Chen Cc: Ingo Molnar Cc: "James E.J. Bottomley" Cc: Jeff Dike Cc: Jonas Bonn Cc: Michael Ellerman Cc: Michal Simek Cc: Nicholas Piggin Cc: Palmer Dabbelt Cc: Paul Mackerras Cc: Paul Walmsley Cc: Richard Henderson Cc: Rich Felker Cc: Russell King Cc: Sam Ravnborg Cc: Stafford Horne Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vineet Gupta Cc: WANG Xuerui Cc: Will Deacon Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/mm.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index d4ebfc206e2b..1a435ce146a2 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -425,7 +425,9 @@ extern unsigned int kobjsize(const void *objp); * mapping from the currently active vm_flags protection bits (the * low four bits) to a page protection mask.. */ +#ifdef __P000 extern pgprot_t protection_map[16]; +#endif /* * The default fault flags that should be used by most of the -- cgit From 43957b5d11037a651d162f65c682ec3c76777fc8 Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Mon, 11 Jul 2022 12:35:36 +0530 Subject: mm/mmap: define DECLARE_VM_GET_PAGE_PROT This just converts the generic vm_get_page_prot() implementation into a new macro i.e DECLARE_VM_GET_PAGE_PROT which later can be used across platforms when enabling them with ARCH_HAS_VM_GET_PAGE_PROT. This does not create any functional change. Link: https://lkml.kernel.org/r/20220711070600.2378316-3-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual Reviewed-by: Christophe Leroy Suggested-by: Christoph Hellwig Cc: Arnd Bergmann Cc: Brian Cain Cc: Catalin Marinas Cc: Christoph Hellwig Cc: Chris Zankel Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Guo Ren Cc: Heiko Carstens Cc: Huacai Chen Cc: Ingo Molnar Cc: "James E.J. Bottomley" Cc: Jeff Dike Cc: Jonas Bonn Cc: Michael Ellerman Cc: Michal Simek Cc: Nicholas Piggin Cc: Palmer Dabbelt Cc: Paul Mackerras Cc: Paul Walmsley Cc: Richard Henderson Cc: Rich Felker Cc: Russell King Cc: Sam Ravnborg Cc: Stafford Horne Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vineet Gupta Cc: WANG Xuerui Cc: Will Deacon Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 3cdc16cfd867..014ee8f0fbaa 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1689,4 +1689,32 @@ typedef unsigned int pgtbl_mod_mask; #define MAX_PTRS_PER_P4D PTRS_PER_P4D #endif +/* description of effects of mapping type and prot in current implementation. + * this is due to the limited x86 page protection hardware. The expected + * behavior is in parens: + * + * map_type prot + * PROT_NONE PROT_READ PROT_WRITE PROT_EXEC + * MAP_SHARED r: (no) no r: (yes) yes r: (no) yes r: (no) yes + * w: (no) no w: (no) no w: (yes) yes w: (no) no + * x: (no) no x: (no) yes x: (no) yes x: (yes) yes + * + * MAP_PRIVATE r: (no) no r: (yes) yes r: (no) yes r: (no) yes + * w: (no) no w: (no) no w: (copy) copy w: (no) no + * x: (no) no x: (no) yes x: (no) yes x: (yes) yes + * + * On arm64, PROT_EXEC has the following behaviour for both MAP_SHARED and + * MAP_PRIVATE (with Enhanced PAN supported): + * r: (no) no + * w: (no) no + * x: (yes) yes + */ +#define DECLARE_VM_GET_PAGE_PROT \ +pgprot_t vm_get_page_prot(unsigned long vm_flags) \ +{ \ + return protection_map[vm_flags & \ + (VM_READ | VM_WRITE | VM_EXEC | VM_SHARED)]; \ +} \ +EXPORT_SYMBOL(vm_get_page_prot); + #endif /* _LINUX_PGTABLE_H */ -- cgit From 09095f74130dfb2110ef2bcdd9ad0d42addaa1d5 Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Mon, 11 Jul 2022 12:35:41 +0530 Subject: mm/mmap: build protect protection_map[] with ARCH_HAS_VM_GET_PAGE_PROT Now that protection_map[] has been moved inside those platforms that enable ARCH_HAS_VM_GET_PAGE_PROT. Hence generic protection_map[] array now can be protected with CONFIG_ARCH_HAS_VM_GET_PAGE_PROT intead of __P000. Link: https://lkml.kernel.org/r/20220711070600.2378316-8-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual Reviewed-by: Christophe Leroy Cc: Arnd Bergmann Cc: Brian Cain Cc: Catalin Marinas Cc: Christoph Hellwig Cc: Christoph Hellwig Cc: Chris Zankel Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Geert Uytterhoeven Cc: Guo Ren Cc: Heiko Carstens Cc: Huacai Chen Cc: Ingo Molnar Cc: "James E.J. Bottomley" Cc: Jeff Dike Cc: Jonas Bonn Cc: Michael Ellerman Cc: Michal Simek Cc: Nicholas Piggin Cc: Palmer Dabbelt Cc: Paul Mackerras Cc: Paul Walmsley Cc: Richard Henderson Cc: Rich Felker Cc: Russell King Cc: Sam Ravnborg Cc: Stafford Horne Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vineet Gupta Cc: WANG Xuerui Cc: Will Deacon Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 1a435ce146a2..4b4dc93f9bc3 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -425,7 +425,7 @@ extern unsigned int kobjsize(const void *objp); * mapping from the currently active vm_flags protection bits (the * low four bits) to a page protection mask.. */ -#ifdef __P000 +#ifndef CONFIG_ARCH_HAS_VM_GET_PAGE_PROT extern pgprot_t protection_map[16]; #endif -- cgit From 3d923c5f1e21ad491acd4c0d62bf2481ce94016c Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Mon, 11 Jul 2022 12:36:00 +0530 Subject: mm/mmap: drop ARCH_HAS_VM_GET_PAGE_PROT Now all the platforms enable ARCH_HAS_GET_PAGE_PROT. They define and export own vm_get_page_prot() whether custom or standard DECLARE_VM_GET_PAGE_PROT. Hence there is no need for default generic fallback for vm_get_page_prot(). Just drop this fallback and also ARCH_HAS_GET_PAGE_PROT mechanism. Link: https://lkml.kernel.org/r/20220711070600.2378316-27-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual Reviewed-by: Geert Uytterhoeven Reviewed-by: Christoph Hellwig Reviewed-by: Christophe Leroy Acked-by: Geert Uytterhoeven Cc: Arnd Bergmann Cc: Brian Cain Cc: Catalin Marinas Cc: Christoph Hellwig Cc: Chris Zankel Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Guo Ren Cc: Heiko Carstens Cc: Huacai Chen Cc: Ingo Molnar Cc: "James E.J. Bottomley" Cc: Jeff Dike Cc: Jonas Bonn Cc: Michael Ellerman Cc: Michal Simek Cc: Nicholas Piggin Cc: Palmer Dabbelt Cc: Paul Mackerras Cc: Paul Walmsley Cc: Richard Henderson Cc: Rich Felker Cc: Russell King Cc: Sam Ravnborg Cc: Stafford Horne Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vineet Gupta Cc: WANG Xuerui Cc: Will Deacon Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/mm.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 4b4dc93f9bc3..61e3101c44ea 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -425,9 +425,6 @@ extern unsigned int kobjsize(const void *objp); * mapping from the currently active vm_flags protection bits (the * low four bits) to a page protection mask.. */ -#ifndef CONFIG_ARCH_HAS_VM_GET_PAGE_PROT -extern pgprot_t protection_map[16]; -#endif /* * The default fault flags that should be used by most of the -- cgit From 3ce4fee4401206cf5a2c476ec0ee6c90191dfade Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Mon, 4 Jul 2022 21:21:55 +0800 Subject: mm/huge_memory: check pmd_present first in is_huge_zero_pmd When pmd is non-present, pmd_pfn returns an insane value. So we should check pmd_present first to avoid acquiring such insane value and also avoid touching possible cold huge_zero_pfn cache line when pmd isn't present. Link: https://lkml.kernel.org/r/20220704132201.14611-11-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: Muchun Song Cc: Matthew Wilcox Cc: Yang Shi Cc: Zach O'Keefe Signed-off-by: Andrew Morton --- include/linux/huge_mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index ae3d8e2fd9e2..12b297f9951d 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -273,7 +273,7 @@ static inline bool is_huge_zero_page(struct page *page) static inline bool is_huge_zero_pmd(pmd_t pmd) { - return READ_ONCE(huge_zero_pfn) == pmd_pfn(pmd) && pmd_present(pmd); + return pmd_present(pmd) && READ_ONCE(huge_zero_pfn) == pmd_pfn(pmd); } static inline bool is_huge_zero_pud(pud_t pud) -- cgit From 121c1781aeb00475d163246d9ae7d8746e377040 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Mon, 4 Jul 2022 21:21:58 +0800 Subject: mm/huge_memory: fix comment of page_deferred_list The current comment is confusing because if global or memcg deferred list in the second tail page is occupied by compound_head, why we still use page[2].deferred_list here? I think it wants to say that Global or memcg deferred list in the first tail page is occupied by compound_mapcount and compound_pincount so we use the second tail page's deferred_list instead. Link: https://lkml.kernel.org/r/20220704132201.14611-14-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: Muchun Song Cc: Matthew Wilcox Cc: Yang Shi Cc: Zach O'Keefe Signed-off-by: Andrew Morton --- include/linux/huge_mm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 12b297f9951d..37f2f11a6d7e 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -294,8 +294,8 @@ static inline bool thp_migration_supported(void) static inline struct list_head *page_deferred_list(struct page *page) { /* - * Global or memcg deferred list in the second tail pages is - * occupied by compound_head. + * See organization of tail pages of compound page in + * "struct page" definition. */ return &page[2].deferred_list; } -- cgit From dcadcf1c30619ead2f3280bfb7f74de8304be2bb Mon Sep 17 00:00:00 2001 From: Gang Li Date: Wed, 6 Jul 2022 11:46:54 +0800 Subject: mm, hugetlb: skip irrelevant nodes in show_free_areas() show_free_areas() allows to filter out node specific data which is irrelevant to the allocation request. But hugetlb_show_meminfo() still shows hugetlb on all nodes, which is redundant and unnecessary. Use show_mem_node_skip() to skip irrelevant nodes. And replace hugetlb_show_meminfo() with hugetlb_show_meminfo_node(nid). before-and-after sample output of OOM: before: ``` [ 214.362453] Node 1 active_anon:148kB inactive_anon:4050920kB active_file:112kB inactive_file:100kB [ 214.375429] Node 1 Normal free:45100kB boost:0kB min:45576kB low:56968kB high:68360kB reserved_hig [ 214.388334] lowmem_reserve[]: 0 0 0 0 0 [ 214.390251] Node 1 Normal: 423*4kB (UE) 320*8kB (UME) 187*16kB (UE) 117*32kB (UE) 57*64kB (UME) 20 [ 214.397626] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 214.401518] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB ``` after: ``` [ 145.069705] Node 1 active_anon:128kB inactive_anon:4049412kB active_file:56kB inactive_file:84kB u [ 145.110319] Node 1 Normal free:45424kB boost:0kB min:45576kB low:56968kB high:68360kB reserved_hig [ 145.152315] lowmem_reserve[]: 0 0 0 0 0 [ 145.155244] Node 1 Normal: 470*4kB (UME) 373*8kB (UME) 247*16kB (UME) 168*32kB (UE) 86*64kB (UME) [ 145.164119] Node 1 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB ``` Link: https://lkml.kernel.org/r/20220706034655.1834-1-ligang.bdlg@bytedance.com Signed-off-by: Gang Li Reviewed-by: Mike Kravetz Cc: Muchun Song Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 75ee739d815b..4cdfce976644 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -152,7 +152,7 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb, struct page *ref_page, zap_flags_t zap_flags); void hugetlb_report_meminfo(struct seq_file *); int hugetlb_report_node_meminfo(char *buf, int len, int nid); -void hugetlb_show_meminfo(void); +void hugetlb_show_meminfo_node(int nid); unsigned long hugetlb_total_pages(void); vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, unsigned int flags); @@ -298,7 +298,7 @@ static inline int hugetlb_report_node_meminfo(char *buf, int len, int nid) return 0; } -static inline void hugetlb_show_meminfo(void) +static inline void hugetlb_show_meminfo_node(int nid) { } -- cgit From 0d8bc0b10aeab543bdccb86180f58db1f79f7cee Mon Sep 17 00:00:00 2001 From: Xiu Jianfeng Date: Wed, 13 Jul 2022 20:53:14 +0800 Subject: writeback: cleanup bdi_sched_wait() bdi_sched_wait() is no longer used since commit 839a8e8660b6 ("writeback: replace custom worker pool implementation with unbound workqueue"), so remove it. Link: https://lkml.kernel.org/r/20220713125314.171345-1-xiujianfeng@huawei.com Signed-off-by: Xiu Jianfeng Reviewed-by: Jan Kara Reviewed-by: Johannes Thumshirn Acked-by: Jens Axboe Signed-off-by: Andrew Morton --- include/linux/backing-dev.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index d452071db572..e84b745a6811 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -140,12 +140,6 @@ static inline bool mapping_can_writeback(struct address_space *mapping) return inode_to_bdi(mapping->host)->capabilities & BDI_CAP_WRITEBACK; } -static inline int bdi_sched_wait(void *word) -{ - schedule(); - return 0; -} - #ifdef CONFIG_CGROUP_WRITEBACK struct bdi_writeback *wb_get_lookup(struct backing_dev_info *bdi, -- cgit From 62df90b53e6f332bb69b73621998826c49a17323 Mon Sep 17 00:00:00 2001 From: wuchi Date: Sun, 19 Jun 2022 15:46:41 +0800 Subject: net, lib/once: remove {net_}get_random_once_wait macro DO_ONCE(func, ...) will call func with spinlock which acquired by spin_lock_irqsave in __do_once_start. But the get_random_once_wait will sleep in get_random_bytes_wait -> wait_for_random_bytes. Fortunately, there is no place to use {net_}get_random_once_wait, so we could remove them simply. Link: https://lkml.kernel.org/r/20220619074641.40916-1-wuchi.zero@gmail.com Signed-off-by: wuchi Acked-by: Jakub Kicinski Cc: David S. Miller Cc: Eric Dumazet Cc: Paolo Abeni Signed-off-by: Andrew Morton --- include/linux/net.h | 2 -- include/linux/once.h | 2 -- 2 files changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/net.h b/include/linux/net.h index 12093f4db50c..8613772a1f58 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -303,8 +303,6 @@ do { \ #define net_get_random_once(buf, nbytes) \ get_random_once((buf), (nbytes)) -#define net_get_random_once_wait(buf, nbytes) \ - get_random_once_wait((buf), (nbytes)) /* * E.g. XFS meta- & log-data is in slab pages, or bcache meta diff --git a/include/linux/once.h b/include/linux/once.h index f54523052bbc..b14d8b309d52 100644 --- a/include/linux/once.h +++ b/include/linux/once.h @@ -54,7 +54,5 @@ void __do_once_done(bool *done, struct static_key_true *once_key, #define get_random_once(buf, nbytes) \ DO_ONCE(get_random_bytes, (buf), (nbytes)) -#define get_random_once_wait(buf, nbytes) \ - DO_ONCE(get_random_bytes_wait, (buf), (nbytes)) \ #endif /* _LINUX_ONCE_H */ -- cgit From 43c249ea0b1e10baac4a1264a25d69723ce5d2c2 Mon Sep 17 00:00:00 2001 From: Uros Bizjak Date: Fri, 24 Jun 2022 16:14:12 +0200 Subject: compiler-gcc.h: remove ancient workaround for gcc PR 58670 The workaround for 'asm goto' miscompilation introduces a compiler barrier quirk that inhibits many useful compiler optimizations. For example, __try_cmpxchg_user compiles to: 11375: 41 8b 4d 00 mov 0x0(%r13),%ecx 11379: 41 8b 02 mov (%r10),%eax 1137c: f0 0f b1 0a lock cmpxchg %ecx,(%rdx) 11380: 0f 94 c2 sete %dl 11383: 84 d2 test %dl,%dl 11385: 75 c4 jne 1134b <...> 11387: 41 89 02 mov %eax,(%r10) where the barrier inhibits flags propagation from asm when compiled with gcc-12. When the mentioned quirk is removed, the following code is generated: 11553: 41 8b 4d 00 mov 0x0(%r13),%ecx 11557: 41 8b 02 mov (%r10),%eax 1155a: f0 0f b1 0a lock cmpxchg %ecx,(%rdx) 1155e: 74 c9 je 11529 <...> 11560: 41 89 02 mov %eax,(%r10) The refered compiler bug: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 was fixed for gcc-4.8.2. Current minimum required version of GCC is version 5.1 which has the above 'asm goto' miscompilation fixed, so remove the workaround. Link: https://lkml.kernel.org/r/20220624141412.72274-1-ubizjak@gmail.com Signed-off-by: Uros Bizjak Cc: Ingo Molnar Cc: "H. Peter Anvin" Cc: Thomas Gleixner Signed-off-by: Andrew Morton --- include/linux/compiler-gcc.h | 11 ----------- 1 file changed, 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index a0c55eeaeaf1..9b157b71036f 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -66,17 +66,6 @@ __builtin_unreachable(); \ } while (0) -/* - * GCC 'asm goto' miscompiles certain code sequences: - * - * http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670 - * - * Work it around via a compiler barrier quirk suggested by Jakub Jelinek. - * - * (asm goto is automatically volatile - the naming reflects this.) - */ -#define asm_volatile_goto(x...) do { asm goto(x); asm (""); } while (0) - #if defined(CONFIG_ARCH_USE_BUILTIN_BSWAP) #define __HAVE_BUILTIN_BSWAP32__ #define __HAVE_BUILTIN_BSWAP64__ -- cgit From 045ed31e23aea840648c290dbde04797064960db Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Fri, 24 Jun 2022 08:30:04 +0300 Subject: kfifo: fix kfifo_to_user() return type The kfifo_to_user() macro is supposed to return zero for success or negative error codes. Unfortunately, there is a signedness bug so it returns unsigned int. This only affects callers which try to save the result in ssize_t and as far as I can see the only place which does that is line6_hwdep_read(). TL;DR: s/_uint/_int/. Link: https://lkml.kernel.org/r/YrVL3OJVLlNhIMFs@kili Fixes: 144ecf310eb5 ("kfifo: fix kfifo_alloc() to return a signed int value") Signed-off-by: Dan Carpenter Cc: Stefani Seibold Cc: Randy Dunlap Signed-off-by: Andrew Morton --- include/linux/kfifo.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kfifo.h b/include/linux/kfifo.h index 86249476b57f..0b35a41440ff 100644 --- a/include/linux/kfifo.h +++ b/include/linux/kfifo.h @@ -688,7 +688,7 @@ __kfifo_uint_must_check_helper( \ * writer, you don't need extra locking to use these macro. */ #define kfifo_to_user(fifo, to, len, copied) \ -__kfifo_uint_must_check_helper( \ +__kfifo_int_must_check_helper( \ ({ \ typeof((fifo) + 1) __tmp = (fifo); \ void __user *__to = (to); \ -- cgit From 4f09903078eeb9138cddce8db06100b82f8620e8 Mon Sep 17 00:00:00 2001 From: Sander Vanheule Date: Sat, 2 Jul 2022 18:08:27 +0200 Subject: cpumask: add UP optimised for_each_*_cpu versions On uniprocessor builds, the following loops will always run over a mask that contains one enabled CPU (cpu0): - for_each_possible_cpu - for_each_online_cpu - for_each_present_cpu Provide uniprocessor-specific macros for these loops, that always run exactly once. Link: https://lkml.kernel.org/r/3a92869b902a075b97be5d1452c9c6badbbff0df.1656777646.git.sander@svanheule.net Signed-off-by: Sander Vanheule Acked-by: Yury Norov Cc: Andy Shevchenko Cc: Borislav Petkov Cc: Dave Hansen Cc: Greg Kroah-Hartman Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Marco Elver Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Valentin Schneider Signed-off-by: Andrew Morton --- include/linux/cpumask.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index fe29ac7cc469..533612770bc0 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -811,9 +811,16 @@ extern const DECLARE_BITMAP(cpu_all_bits, NR_CPUS); /* First bits of cpu_bit_bitmap are in fact unset. */ #define cpu_none_mask to_cpumask(cpu_bit_bitmap[0]) +#if NR_CPUS == 1 +/* Uniprocessor: the possible/online/present masks are always "1" */ +#define for_each_possible_cpu(cpu) for ((cpu) = 0; (cpu) < 1; (cpu)++) +#define for_each_online_cpu(cpu) for ((cpu) = 0; (cpu) < 1; (cpu)++) +#define for_each_present_cpu(cpu) for ((cpu) = 0; (cpu) < 1; (cpu)++) +#else #define for_each_possible_cpu(cpu) for_each_cpu((cpu), cpu_possible_mask) #define for_each_online_cpu(cpu) for_each_cpu((cpu), cpu_online_mask) #define for_each_present_cpu(cpu) for_each_cpu((cpu), cpu_present_mask) +#endif /* Wrappers for arch boot code to manipulate normally-constant masks */ void init_cpu_present(const struct cpumask *src); -- cgit From b81dce77cedcea6f00292f02d4b1ebbfc2c5988d Mon Sep 17 00:00:00 2001 From: Sander Vanheule Date: Sat, 2 Jul 2022 18:08:25 +0200 Subject: cpumask: Fix invalid uniprocessor mask assumption On uniprocessor builds, any CPU mask is assumed to contain exactly one CPU (cpu0). This assumption ignores the existence of empty masks, resulting in incorrect behaviour. cpumask_first_zero(), cpumask_next_zero(), and for_each_cpu_not() don't provide behaviour matching the assumption that a UP mask is always "1", and instead provide behaviour matching the empty mask. Drop the incorrectly optimised code and use the generic implementations in all cases. Link: https://lkml.kernel.org/r/86bf3f005abba2d92120ddd0809235cab4f759a6.1656777646.git.sander@svanheule.net Signed-off-by: Sander Vanheule Suggested-by: Yury Norov Cc: Andy Shevchenko Cc: Borislav Petkov Cc: Dave Hansen Cc: Greg Kroah-Hartman Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Marco Elver Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Valentin Schneider Signed-off-by: Andrew Morton --- include/linux/cpumask.h | 99 ++++++++++--------------------------------------- 1 file changed, 19 insertions(+), 80 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 533612770bc0..6c5b4ee000f2 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -116,85 +116,6 @@ static __always_inline unsigned int cpumask_check(unsigned int cpu) return cpu; } -#if NR_CPUS == 1 -/* Uniprocessor. Assume all masks are "1". */ -static inline unsigned int cpumask_first(const struct cpumask *srcp) -{ - return 0; -} - -static inline unsigned int cpumask_first_zero(const struct cpumask *srcp) -{ - return 0; -} - -static inline unsigned int cpumask_first_and(const struct cpumask *srcp1, - const struct cpumask *srcp2) -{ - return 0; -} - -static inline unsigned int cpumask_last(const struct cpumask *srcp) -{ - return 0; -} - -/* Valid inputs for n are -1 and 0. */ -static inline unsigned int cpumask_next(int n, const struct cpumask *srcp) -{ - return n+1; -} - -static inline unsigned int cpumask_next_zero(int n, const struct cpumask *srcp) -{ - return n+1; -} - -static inline unsigned int cpumask_next_and(int n, - const struct cpumask *srcp, - const struct cpumask *andp) -{ - return n+1; -} - -static inline unsigned int cpumask_next_wrap(int n, const struct cpumask *mask, - int start, bool wrap) -{ - /* cpu0 unless stop condition, wrap and at cpu0, then nr_cpumask_bits */ - return (wrap && n == 0); -} - -/* cpu must be a valid cpu, ie 0, so there's no other choice. */ -static inline unsigned int cpumask_any_but(const struct cpumask *mask, - unsigned int cpu) -{ - return 1; -} - -static inline unsigned int cpumask_local_spread(unsigned int i, int node) -{ - return 0; -} - -static inline int cpumask_any_and_distribute(const struct cpumask *src1p, - const struct cpumask *src2p) { - return cpumask_first_and(src1p, src2p); -} - -static inline int cpumask_any_distribute(const struct cpumask *srcp) -{ - return cpumask_first(srcp); -} - -#define for_each_cpu(cpu, mask) \ - for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask) -#define for_each_cpu_not(cpu, mask) \ - for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask) -#define for_each_cpu_wrap(cpu, mask, start) \ - for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask, (void)(start)) -#define for_each_cpu_and(cpu, mask1, mask2) \ - for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)mask1, (void)mask2) -#else /** * cpumask_first - get the first cpu in a cpumask * @srcp: the cpumask pointer @@ -260,10 +181,29 @@ static inline unsigned int cpumask_next_zero(int n, const struct cpumask *srcp) int __pure cpumask_next_and(int n, const struct cpumask *, const struct cpumask *); int __pure cpumask_any_but(const struct cpumask *mask, unsigned int cpu); + +#if NR_CPUS == 1 +/* Uniprocessor: there is only one valid CPU */ +static inline unsigned int cpumask_local_spread(unsigned int i, int node) +{ + return 0; +} + +static inline int cpumask_any_and_distribute(const struct cpumask *src1p, + const struct cpumask *src2p) { + return cpumask_first_and(src1p, src2p); +} + +static inline int cpumask_any_distribute(const struct cpumask *srcp) +{ + return cpumask_first(srcp); +} +#else unsigned int cpumask_local_spread(unsigned int i, int node); int cpumask_any_and_distribute(const struct cpumask *src1p, const struct cpumask *src2p); int cpumask_any_distribute(const struct cpumask *srcp); +#endif /* NR_CPUS */ /** * for_each_cpu - iterate over every cpu in a mask @@ -324,7 +264,6 @@ extern int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool for ((cpu) = -1; \ (cpu) = cpumask_next_and((cpu), (mask1), (mask2)), \ (cpu) < nr_cpu_ids;) -#endif /* SMP */ #define CPU_BITS_NONE \ { \ -- cgit From 953257a9252a9b3c58ca68fc5bf26fc65e5b1cb8 Mon Sep 17 00:00:00 2001 From: Sander Vanheule Date: Sat, 2 Jul 2022 18:08:28 +0200 Subject: cpumask: update cpumask_next_wrap() signature The extern specifier is not needed for this declaration, so drop it. The function also depends only on the input parameters, and has no side effects, so it can be marked __pure like other functions in cpumask.h. Link: https://lkml.kernel.org/r/72ab755695b74bb5fbaa756ae4c0edd708d172f1.1656777646.git.sander@svanheule.net Signed-off-by: Sander Vanheule Reviewed-by: Andy Shevchenko Cc: Borislav Petkov Cc: Dave Hansen Cc: Greg Kroah-Hartman Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Marco Elver Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Valentin Schneider Cc: Yury Norov Signed-off-by: Andrew Morton --- include/linux/cpumask.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 6c5b4ee000f2..523857884ae4 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -229,7 +229,7 @@ int cpumask_any_distribute(const struct cpumask *srcp); (cpu) = cpumask_next_zero((cpu), (mask)), \ (cpu) < nr_cpu_ids;) -extern int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool wrap); +int __pure cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool wrap); /** * for_each_cpu_wrap - iterate over every cpu in a mask, starting at a specified location -- cgit From 91561d4ecb755f056f8ff04f9dcaec210140e55c Mon Sep 17 00:00:00 2001 From: Chao Gao Date: Fri, 15 Jul 2022 18:45:33 +0800 Subject: swiotlb: remove unused fields in io_tlb_mem Commit 20347fca71a3 ("swiotlb: split up the global swiotlb lock") splits io_tlb_mem into multiple areas. Each area has its own lock and index. The global ones are not used so remove them. Signed-off-by: Chao Gao Signed-off-by: Christoph Hellwig --- include/linux/swiotlb.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index f65ff1930120..d3ae03edbbd2 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -79,11 +79,8 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys, * @used: The number of used IO TLB block. * @list: The free list describing the number of free entries available * from each index. - * @index: The index to start searching in the next round. * @orig_addr: The original address corresponding to a mapped entry. * @alloc_size: Size of the allocated buffer. - * @lock: The lock to protect the above data structures in the map and - * unmap calls. * @debugfs: The dentry to debugfs. * @late_alloc: %true if allocated using the page allocator * @force_bounce: %true if swiotlb bouncing is forced @@ -97,8 +94,6 @@ struct io_tlb_mem { void *vaddr; unsigned long nslabs; unsigned long used; - unsigned int index; - spinlock_t lock; struct dentry *debugfs; bool late_alloc; bool force_bounce; -- cgit From 942a8186eb4451700dadd1d60a2e43ce67605991 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 12 Jul 2022 08:43:07 +0200 Subject: swiotlb: move struct io_tlb_slot to swiotlb.c No need to expose this structure definition in the header. Signed-off-by: Christoph Hellwig --- include/linux/swiotlb.h | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h index d3ae03edbbd2..35bc4e281c21 100644 --- a/include/linux/swiotlb.h +++ b/include/linux/swiotlb.h @@ -101,11 +101,7 @@ struct io_tlb_mem { unsigned int nareas; unsigned int area_nslabs; struct io_tlb_area *areas; - struct io_tlb_slot { - phys_addr_t orig_addr; - size_t alloc_size; - unsigned int list; - } *slots; + struct io_tlb_slot *slots; }; extern struct io_tlb_mem io_tlb_default_mem; -- cgit From 76c16d3e19446deea98b7883f261758b96b8781a Mon Sep 17 00:00:00 2001 From: Wong Vee Khee Date: Thu, 14 Jul 2022 15:54:27 +0800 Subject: net: stmmac: switch to use interrupt for hw crosstimestamping Using current implementation of polling mode, there is high chances we will hit into timeout error when running phc2sys. Hence, update the implementation of hardware crosstimestamping to use the MAC interrupt service routine instead of polling for TSIS bit in the MAC Timestamp Interrupt Status register to be set. Cc: Richard Cochran Signed-off-by: Wong Vee Khee Signed-off-by: David S. Miller --- include/linux/stmmac.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 29917850f079..8df475db88c0 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -260,6 +260,7 @@ struct plat_stmmacenet_data { bool has_crossts; int int_snapshot_num; int ext_snapshot_num; + bool int_snapshot_en; bool ext_snapshot_en; bool multi_msi_en; int msi_mac_vec; -- cgit From 9592eef7c16ec5fb9f36c4d9abe8eeffc2e1d2f3 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Tue, 5 Jul 2022 20:48:41 +0200 Subject: random: remove CONFIG_ARCH_RANDOM When RDRAND was introduced, there was much discussion on whether it should be trusted and how the kernel should handle that. Initially, two mechanisms cropped up, CONFIG_ARCH_RANDOM, a compile time switch, and "nordrand", a boot-time switch. Later the thinking evolved. With a properly designed RNG, using RDRAND values alone won't harm anything, even if the outputs are malicious. Rather, the issue is whether those values are being *trusted* to be good or not. And so a new set of options were introduced as the real ones that people use -- CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu". With these options, RDRAND is used, but it's not always credited. So in the worst case, it does nothing, and in the best case, maybe it helps. Along the way, CONFIG_ARCH_RANDOM's meaning got sort of pulled into the center and became something certain platforms force-select. The old options don't really help with much, and it's a bit odd to have special handling for these instructions when the kernel can deal fine with the existence or untrusted existence or broken existence or non-existence of that CPU capability. Simplify the situation by removing CONFIG_ARCH_RANDOM and using the ordinary asm-generic fallback pattern instead, keeping the two options that are actually used. For now it leaves "nordrand" for now, as the removal of that will take a different route. Acked-by: Michael Ellerman Acked-by: Catalin Marinas Acked-by: Borislav Petkov Acked-by: Heiko Carstens Acked-by: Greg Kroah-Hartman Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index 20e389a14e5c..865770e29f3e 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -106,14 +106,7 @@ declare_get_random_var_wait(long, unsigned long) */ #include -#ifdef CONFIG_ARCH_RANDOM -# include -#else -static inline bool __must_check arch_get_random_long(unsigned long *v) { return false; } -static inline bool __must_check arch_get_random_int(unsigned int *v) { return false; } -static inline bool __must_check arch_get_random_seed_long(unsigned long *v) { return false; } -static inline bool __must_check arch_get_random_seed_int(unsigned int *v) { return false; } -#endif +#include /* * Called from the boot CPU during startup; not valid to call once -- cgit From 0b4ae3f6d1210c11f9baf159009c7227eacf90f2 Mon Sep 17 00:00:00 2001 From: Gwendal Grignou Date: Mon, 11 Jul 2022 07:47:16 -0700 Subject: iio: cros: Register FIFO callback after sensor is registered Instead of registering callback to process sensor events right at initialization time, wait for the sensor to be register in the iio subsystem. Events can come at probe time (in case the kernel rebooted abruptly without switching the sensor off for instance), and be sent to IIO core before the sensor is fully registered. Fixes: aa984f1ba4a4 ("iio: cros_ec: Register to cros_ec_sensorhub when EC supports FIFO") Reported-by: Douglas Anderson Signed-off-by: Gwendal Grignou Reviewed-by: Douglas Anderson Link: https://lore.kernel.org/r/20220711144716.642617-1-gwendal@chromium.org Signed-off-by: Jonathan Cameron --- include/linux/iio/common/cros_ec_sensors_core.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iio/common/cros_ec_sensors_core.h b/include/linux/iio/common/cros_ec_sensors_core.h index a8259c8822f5..e72167b96d27 100644 --- a/include/linux/iio/common/cros_ec_sensors_core.h +++ b/include/linux/iio/common/cros_ec_sensors_core.h @@ -93,8 +93,11 @@ int cros_ec_sensors_read_cmd(struct iio_dev *indio_dev, unsigned long scan_mask, struct platform_device; int cros_ec_sensors_core_init(struct platform_device *pdev, struct iio_dev *indio_dev, bool physical_device, - cros_ec_sensors_capture_t trigger_capture, - cros_ec_sensorhub_push_data_cb_t push_data); + cros_ec_sensors_capture_t trigger_capture); + +int cros_ec_sensors_core_register(struct device *dev, + struct iio_dev *indio_dev, + cros_ec_sensorhub_push_data_cb_t push_data); irqreturn_t cros_ec_sensors_capture(int irq, void *p); int cros_ec_sensors_push_data(struct iio_dev *indio_dev, -- cgit From 0b8613a21d9c52ccde18264b69de9f46faa362df Mon Sep 17 00:00:00 2001 From: Christian König Date: Tue, 12 Jul 2022 14:59:36 +0200 Subject: dma-buf/dma_resv_usage: update explicit sync documentation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Make it clear that DMA_RESV_USAGE_BOOKKEEP can be used for explicit synced user space submissions as well and document the rules around adding the same fence with different usages. Signed-off-by: Christian König Reviewed-by: Bas Nieuwenhuizen Acked-by: Alex Deucher Link: https://patchwork.freedesktop.org/patch/msgid/20220712131201.131475-1-christian.koenig@amd.com --- include/linux/dma-resv.h | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h index c8ccbc94d5d2..0637659a702c 100644 --- a/include/linux/dma-resv.h +++ b/include/linux/dma-resv.h @@ -62,6 +62,11 @@ struct dma_resv_list; * For example when asking for WRITE fences then the KERNEL fences are returned * as well. Similar when asked for READ fences then both WRITE and KERNEL * fences are returned as well. + * + * Already used fences can be promoted in the sense that a fence with + * DMA_RESV_USAGE_BOOKKEEP could become DMA_RESV_USAGE_READ by adding it again + * with this usage. But fences can never be degraded in the sense that a fence + * with DMA_RESV_USAGE_WRITE could become DMA_RESV_USAGE_READ. */ enum dma_resv_usage { /** @@ -98,10 +103,15 @@ enum dma_resv_usage { * @DMA_RESV_USAGE_BOOKKEEP: No implicit sync. * * This should be used by submissions which don't want to participate in - * implicit synchronization. + * any implicit synchronization. + * + * The most common case are preemption fences, page table updates, TLB + * flushes as well as explicit synced user submissions. * - * The most common case are preemption fences as well as page table - * updates and their TLB flushes. + * Explicit synced user user submissions can be promoted to + * DMA_RESV_USAGE_READ or DMA_RESV_USAGE_WRITE as needed using + * dma_buf_import_sync_file() when implicit synchronization should + * become necessary after initial adding of the fence. */ DMA_RESV_USAGE_BOOKKEEP }; -- cgit From f4f451a16dd1f478fdb966bcbb612c1e4ce6b962 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 5 Jul 2022 20:35:32 +0800 Subject: mm: fix missing wake-up event for FSDAX pages FSDAX page refcounts are 1-based, rather than 0-based: if refcount is 1, then the page is freed. The FSDAX pages can be pinned through GUP, then they will be unpinned via unpin_user_page() using a folio variant to put the page, however, folio variants did not consider this special case, the result will be to miss a wakeup event (like the user of __fuse_dax_break_layouts()). This results in a task being permanently stuck in TASK_INTERRUPTIBLE state. Since FSDAX pages are only possibly obtained by GUP users, so fix GUP instead of folio_put() to lower overhead. Link: https://lkml.kernel.org/r/20220705123532.283-1-songmuchun@bytedance.com Fixes: d8ddc099c6b3 ("mm/gup: Add gup_put_folio()") Signed-off-by: Muchun Song Suggested-by: Matthew Wilcox Cc: Jason Gunthorpe Cc: John Hubbard Cc: William Kucharski Cc: Dan Williams Cc: Jan Kara Cc: Signed-off-by: Andrew Morton --- include/linux/mm.h | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index cf3d0d673f6b..7898e29bcfb5 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1130,23 +1130,27 @@ static inline bool is_zone_movable_page(const struct page *page) #if defined(CONFIG_ZONE_DEVICE) && defined(CONFIG_FS_DAX) DECLARE_STATIC_KEY_FALSE(devmap_managed_key); -bool __put_devmap_managed_page(struct page *page); -static inline bool put_devmap_managed_page(struct page *page) +bool __put_devmap_managed_page_refs(struct page *page, int refs); +static inline bool put_devmap_managed_page_refs(struct page *page, int refs) { if (!static_branch_unlikely(&devmap_managed_key)) return false; if (!is_zone_device_page(page)) return false; - return __put_devmap_managed_page(page); + return __put_devmap_managed_page_refs(page, refs); } - #else /* CONFIG_ZONE_DEVICE && CONFIG_FS_DAX */ -static inline bool put_devmap_managed_page(struct page *page) +static inline bool put_devmap_managed_page_refs(struct page *page, int refs) { return false; } #endif /* CONFIG_ZONE_DEVICE && CONFIG_FS_DAX */ +static inline bool put_devmap_managed_page(struct page *page) +{ + return put_devmap_managed_page_refs(page, 1); +} + /* 127: arbitrary random number, small enough to assemble well */ #define folio_ref_zero_or_close_to_overflow(folio) \ ((unsigned int) folio_ref_count(folio) + 127u <= 127u) -- cgit From 2e07a521e1e424787af3bfc59615de4220856c35 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Tue, 12 Jul 2022 21:52:28 +0100 Subject: skbuff: add SKBFL_DONT_ORPHAN flag We don't want to list every single ubuf_info callback in skb_orphan_frags(), add a flag controlling the behaviour. Signed-off-by: Pavel Begunkov Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index d3d10556f0fa..8e12b3b9ad6c 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -686,10 +686,13 @@ enum { * charged to the kernel memory. */ SKBFL_PURE_ZEROCOPY = BIT(2), + + SKBFL_DONT_ORPHAN = BIT(3), }; #define SKBFL_ZEROCOPY_FRAG (SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG) -#define SKBFL_ALL_ZEROCOPY (SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY) +#define SKBFL_ALL_ZEROCOPY (SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \ + SKBFL_DONT_ORPHAN) /* * The callback notifies userspace to release buffers when skb DMA is done in @@ -3182,8 +3185,7 @@ static inline int skb_orphan_frags(struct sk_buff *skb, gfp_t gfp_mask) { if (likely(!skb_zcopy(skb))) return 0; - if (!skb_zcopy_is_nouarg(skb) && - skb_uarg(skb)->callback == msg_zerocopy_callback) + if (skb_shinfo(skb)->flags & SKBFL_DONT_ORPHAN) return 0; return skb_copy_ubufs(skb, gfp_mask); } -- cgit From a229cc14f3395311b899e5e582b71efa8dd01df0 Mon Sep 17 00:00:00 2001 From: John Garry Date: Thu, 14 Jul 2022 19:15:24 +0800 Subject: dma-mapping: add dma_opt_mapping_size() Streaming DMA mapping involving an IOMMU may be much slower for larger total mapping size. This is because every IOMMU DMA mapping requires an IOVA to be allocated and freed. IOVA sizes above a certain limit are not cached, which can have a big impact on DMA mapping performance. Provide an API for device drivers to know this "optimal" limit, such that they may try to produce mapping which don't exceed it. Signed-off-by: John Garry Reviewed-by: Damien Le Moal Acked-by: Martin K. Petersen Signed-off-by: Christoph Hellwig --- include/linux/dma-map-ops.h | 1 + include/linux/dma-mapping.h | 5 +++++ 2 files changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index 0d5b06b3a4a6..98ceba6fa848 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -69,6 +69,7 @@ struct dma_map_ops { int (*dma_supported)(struct device *dev, u64 mask); u64 (*get_required_mask)(struct device *dev); size_t (*max_mapping_size)(struct device *dev); + size_t (*opt_mapping_size)(void); unsigned long (*get_merge_boundary)(struct device *dev); }; diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index dca2b1355bb1..fe3849434b2a 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -144,6 +144,7 @@ int dma_set_mask(struct device *dev, u64 mask); int dma_set_coherent_mask(struct device *dev, u64 mask); u64 dma_get_required_mask(struct device *dev); size_t dma_max_mapping_size(struct device *dev); +size_t dma_opt_mapping_size(struct device *dev); bool dma_need_sync(struct device *dev, dma_addr_t dma_addr); unsigned long dma_get_merge_boundary(struct device *dev); struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size, @@ -266,6 +267,10 @@ static inline size_t dma_max_mapping_size(struct device *dev) { return 0; } +static inline size_t dma_opt_mapping_size(struct device *dev) +{ + return 0; +} static inline bool dma_need_sync(struct device *dev, dma_addr_t dma_addr) { return false; -- cgit From 6d9870b7e5def2450e21316515b9efc0529204dd Mon Sep 17 00:00:00 2001 From: John Garry Date: Thu, 14 Jul 2022 19:15:25 +0800 Subject: dma-iommu: add iommu_dma_opt_mapping_size() Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which allows the drivers to know the optimal mapping limit and thus limit the requested IOVA lengths. This value is based on the IOVA rcache range limit, as IOVAs allocated above this limit must always be newly allocated, which may be quite slow. Signed-off-by: John Garry Reviewed-by: Damien Le Moal Acked-by: Robin Murphy Acked-by: Martin K. Petersen Signed-off-by: Christoph Hellwig --- include/linux/iova.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iova.h b/include/linux/iova.h index 320a70e40233..c6ba6d95d79c 100644 --- a/include/linux/iova.h +++ b/include/linux/iova.h @@ -79,6 +79,8 @@ static inline unsigned long iova_pfn(struct iova_domain *iovad, dma_addr_t iova) int iova_cache_get(void); void iova_cache_put(void); +unsigned long iova_rcache_range(void); + void free_iova(struct iova_domain *iovad, unsigned long pfn); void __free_iova(struct iova_domain *iovad, struct iova *iova); struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size, -- cgit From 2b038e786f8338a3bc22d791000753e0ec113e00 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 22 Jun 2022 20:28:42 +0300 Subject: gpiolib: devres: Get rid of unused devm_gpio_free() The last user, which in fact was a dead code, has gone a year ago, previous one 3 years ago. On top of that we want to drop away the legacy GPIO APIs in the kernel, so take a chance to get rid of unused devm_gpio_free() and accompanying stuff. Signed-off-by: Andy Shevchenko Signed-off-by: Bartosz Golaszewski --- include/linux/gpio.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gpio.h b/include/linux/gpio.h index 008ad3ee56b7..a370387fa406 100644 --- a/include/linux/gpio.h +++ b/include/linux/gpio.h @@ -95,7 +95,6 @@ struct device; int devm_gpio_request(struct device *dev, unsigned gpio, const char *label); int devm_gpio_request_one(struct device *dev, unsigned gpio, unsigned long flags, const char *label); -void devm_gpio_free(struct device *dev, unsigned int gpio); #else /* ! CONFIG_GPIOLIB */ @@ -240,11 +239,6 @@ static inline int devm_gpio_request_one(struct device *dev, unsigned gpio, return -EINVAL; } -static inline void devm_gpio_free(struct device *dev, unsigned int gpio) -{ - WARN_ON(1); -} - #endif /* ! CONFIG_GPIOLIB */ #endif /* __LINUX_GPIO_H */ -- cgit From 2a1192ff0835cb4b0ccd6f6e85c93aa0dc6f66b7 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Tue, 14 Jun 2022 17:23:38 +0200 Subject: gpio: twl4030: Drop platform teardown callback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There is no machine providing a teardown callback, so drop the unused code. This is a preparation for making platform remove callbacks return void. Signed-off-by: Uwe Kleine-König Signed-off-by: Bartosz Golaszewski --- include/linux/mfd/twl.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mfd/twl.h b/include/linux/mfd/twl.h index 8871cc5188a0..c8cd31756037 100644 --- a/include/linux/mfd/twl.h +++ b/include/linux/mfd/twl.h @@ -594,8 +594,6 @@ struct twl4030_gpio_platform_data { int (*setup)(struct device *dev, unsigned gpio, unsigned ngpio); - int (*teardown)(struct device *dev, - unsigned gpio, unsigned ngpio); }; struct twl4030_madc_platform_data { -- cgit From 7e55b33d3f18fde5c7a57b6c52d80499485c737f Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Tue, 14 Jun 2022 21:48:02 +0200 Subject: gpio: ucb1400: Remove platform setup and teardown support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There is no user of these callbacks. The motivation for this change is to stop returning an error code from the remove callback. This is a preparation for making platform remove callbacks return void. Signed-off-by: Uwe Kleine-König Signed-off-by: Bartosz Golaszewski --- include/linux/ucb1400.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ucb1400.h b/include/linux/ucb1400.h index 0968ef458447..22345391350b 100644 --- a/include/linux/ucb1400.h +++ b/include/linux/ucb1400.h @@ -84,8 +84,6 @@ struct ucb1400_gpio { struct gpio_chip gc; struct snd_ac97 *ac97; int gpio_offset; - int (*gpio_setup)(struct device *dev, int ngpio); - int (*gpio_teardown)(struct device *dev, int ngpio); }; struct ucb1400_ts { -- cgit From c269df8c5ad316eac47a9078e7a2339e1956637a Mon Sep 17 00:00:00 2001 From: Nuno Sá Date: Wed, 13 Jul 2022 15:14:18 +0200 Subject: gpiolib: add support for bias pull disable MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This change prepares the gpio core to look at firmware flags and set 'FLAG_BIAS_DISABLE' if necessary. It works in similar way to 'GPIO_PULL_DOWN' and 'GPIO_PULL_UP'. Signed-off-by: Nuno Sá Signed-off-by: Bartosz Golaszewski --- include/linux/gpio/machine.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/gpio/machine.h b/include/linux/gpio/machine.h index 4d55da28e664..0b619eb7ae83 100644 --- a/include/linux/gpio/machine.h +++ b/include/linux/gpio/machine.h @@ -14,6 +14,7 @@ enum gpio_lookup_flags { GPIO_TRANSITORY = (1 << 3), GPIO_PULL_UP = (1 << 4), GPIO_PULL_DOWN = (1 << 5), + GPIO_PULL_DISABLE = (1 << 6), GPIO_LOOKUP_FLAGS_DEFAULT = GPIO_ACTIVE_HIGH | GPIO_PERSISTENT, }; -- cgit From 31bea23119cda87088c6bd4085a1e442c6c5974c Mon Sep 17 00:00:00 2001 From: Nuno Sá Date: Wed, 13 Jul 2022 15:14:19 +0200 Subject: gpiolib: of: support bias pull disable MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit On top of looking at PULL_UP and PULL_DOWN flags, also look at PULL_DISABLE and set the appropriate GPIO flag. The GPIO core will then pass down this to controllers that support it. Signed-off-by: Nuno Sá Reviewed-by: Linus Walleij Signed-off-by: Bartosz Golaszewski --- include/linux/of_gpio.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/of_gpio.h b/include/linux/of_gpio.h index 8bf2ea859653..a5166eb93437 100644 --- a/include/linux/of_gpio.h +++ b/include/linux/of_gpio.h @@ -29,6 +29,7 @@ enum of_gpio_flags { OF_GPIO_TRANSITORY = 0x8, OF_GPIO_PULL_UP = 0x10, OF_GPIO_PULL_DOWN = 0x20, + OF_GPIO_PULL_DISABLE = 0x40, }; #ifdef CONFIG_OF_GPIO -- cgit From 62fa5c9800a0b9aecf9ce0743f12a16057c9400d Mon Sep 17 00:00:00 2001 From: Luca Ceresoli Date: Fri, 3 Jun 2022 17:57:25 +0200 Subject: mfd: max77714: Update Luca Ceresoli's e-mail address My Bootlin address is preferred from now on. Signed-off-by: Luca Ceresoli Signed-off-by: Luca Ceresoli Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220603155727.1232061-4-luca@lucaceresoli.net --- include/linux/mfd/max77714.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mfd/max77714.h b/include/linux/mfd/max77714.h index a970dc455426..7947e0d697a5 100644 --- a/include/linux/mfd/max77714.h +++ b/include/linux/mfd/max77714.h @@ -3,7 +3,7 @@ * Maxim MAX77714 Register and data structures definition. * * Copyright (C) 2022 Luca Ceresoli - * Author: Luca Ceresoli + * Author: Luca Ceresoli */ #ifndef __LINUX_MFD_MAX77714_H_ -- cgit From 128ac294e1b437cb8a7f2ff8ede1cde9082bddbe Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Mon, 30 May 2022 21:24:28 +0200 Subject: mfd: t7l66xb: Drop platform disable callback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit None of the in-tree instantiations of struct t7l66xb_platform_data provides a disable callback. So better don't dereference this function pointer unconditionally. As there is no user, drop it completely instead of calling it conditional. This is a preparation for making platform remove callbacks return void. Fixes: 1f192015ca5b ("mfd: driver for the T7L66XB TMIO SoC") Signed-off-by: Uwe Kleine-König Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220530192430.2108217-3-u.kleine-koenig@pengutronix.de --- include/linux/mfd/t7l66xb.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mfd/t7l66xb.h b/include/linux/mfd/t7l66xb.h index 69632c1b07bd..ae3e7a5c5219 100644 --- a/include/linux/mfd/t7l66xb.h +++ b/include/linux/mfd/t7l66xb.h @@ -12,7 +12,6 @@ struct t7l66xb_platform_data { int (*enable)(struct platform_device *dev); - int (*disable)(struct platform_device *dev); int (*suspend)(struct platform_device *dev); int (*resume)(struct platform_device *dev); -- cgit From 6e1f1b1c93ceec1d9bfa9213775abc19161de5f1 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Mon, 30 May 2022 21:24:29 +0200 Subject: mfd: tc6387xb: Drop disable callback that is never called MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The driver never calls the disable callback, so drop the member from the platform struct and all callbacks from the actual platform datas. Signed-off-by: Uwe Kleine-König Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220530192430.2108217-4-u.kleine-koenig@pengutronix.de --- include/linux/mfd/tc6387xb.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mfd/tc6387xb.h b/include/linux/mfd/tc6387xb.h index b4888209494a..aacf1dcc86b9 100644 --- a/include/linux/mfd/tc6387xb.h +++ b/include/linux/mfd/tc6387xb.h @@ -12,7 +12,6 @@ struct tc6387xb_platform_data { int (*enable)(struct platform_device *dev); - int (*disable)(struct platform_device *dev); int (*suspend)(struct platform_device *dev); int (*resume)(struct platform_device *dev); }; -- cgit From de58cee8c6b803dda3304eace346919fe880a40a Mon Sep 17 00:00:00 2001 From: Fabien Parent Date: Tue, 31 May 2022 14:49:56 +0200 Subject: mfd: mt6397-core: Add MT6357 PMIC support Adds support for PMIC keys, Regulator, and RTC for the MT6357 PMIC. Signed-off-by: Fabien Parent Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220531124959.202787-5-fparent@baylibre.com --- include/linux/mfd/mt6357/core.h | 119 +++ include/linux/mfd/mt6357/registers.h | 1574 ++++++++++++++++++++++++++++++++++ 2 files changed, 1693 insertions(+) create mode 100644 include/linux/mfd/mt6357/core.h create mode 100644 include/linux/mfd/mt6357/registers.h (limited to 'include/linux') diff --git a/include/linux/mfd/mt6357/core.h b/include/linux/mfd/mt6357/core.h new file mode 100644 index 000000000000..2441611264fd --- /dev/null +++ b/include/linux/mfd/mt6357/core.h @@ -0,0 +1,119 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2022 BayLibre, SAS + * Author: Fabien Parent + */ + +#ifndef __MFD_MT6357_CORE_H__ +#define __MFD_MT6357_CORE_H__ + +enum mt6357_irq_top_status_shift { + MT6357_BUCK_TOP = 0, + MT6357_LDO_TOP, + MT6357_PSC_TOP, + MT6357_SCK_TOP, + MT6357_BM_TOP, + MT6357_HK_TOP, + MT6357_XPP_TOP, + MT6357_AUD_TOP, + MT6357_MISC_TOP, +}; + +enum mt6357_irq_numbers { + MT6357_IRQ_VPROC_OC = 0, + MT6357_IRQ_VCORE_OC, + MT6357_IRQ_VMODEM_OC, + MT6357_IRQ_VS1_OC, + MT6357_IRQ_VPA_OC, + MT6357_IRQ_VCORE_PREOC, + MT6357_IRQ_VFE28_OC = 16, + MT6357_IRQ_VXO22_OC, + MT6357_IRQ_VRF18_OC, + MT6357_IRQ_VRF12_OC, + MT6357_IRQ_VEFUSE_OC, + MT6357_IRQ_VCN33_OC, + MT6357_IRQ_VCN28_OC, + MT6357_IRQ_VCN18_OC, + MT6357_IRQ_VCAMA_OC, + MT6357_IRQ_VCAMD_OC, + MT6357_IRQ_VCAMIO_OC, + MT6357_IRQ_VLDO28_OC, + MT6357_IRQ_VUSB33_OC, + MT6357_IRQ_VAUX18_OC, + MT6357_IRQ_VAUD28_OC, + MT6357_IRQ_VIO28_OC, + MT6357_IRQ_VIO18_OC, + MT6357_IRQ_VSRAM_PROC_OC, + MT6357_IRQ_VSRAM_OTHERS_OC, + MT6357_IRQ_VIBR_OC, + MT6357_IRQ_VDRAM_OC, + MT6357_IRQ_VMC_OC, + MT6357_IRQ_VMCH_OC, + MT6357_IRQ_VEMC_OC, + MT6357_IRQ_VSIM1_OC, + MT6357_IRQ_VSIM2_OC, + MT6357_IRQ_PWRKEY = 48, + MT6357_IRQ_HOMEKEY, + MT6357_IRQ_PWRKEY_R, + MT6357_IRQ_HOMEKEY_R, + MT6357_IRQ_NI_LBAT_INT, + MT6357_IRQ_CHRDET, + MT6357_IRQ_CHRDET_EDGE, + MT6357_IRQ_VCDT_HV_DET, + MT6357_IRQ_WATCHDOG, + MT6357_IRQ_VBATON_UNDET, + MT6357_IRQ_BVALID_DET, + MT6357_IRQ_OV, + MT6357_IRQ_RTC = 64, + MT6357_IRQ_FG_BAT0_H = 80, + MT6357_IRQ_FG_BAT0_L, + MT6357_IRQ_FG_CUR_H, + MT6357_IRQ_FG_CUR_L, + MT6357_IRQ_FG_ZCV, + MT6357_IRQ_BATON_LV = 96, + MT6357_IRQ_BATON_HT, + MT6357_IRQ_BAT_H = 112, + MT6357_IRQ_BAT_L, + MT6357_IRQ_AUXADC_IMP, + MT6357_IRQ_NAG_C_DLTV, + MT6357_IRQ_AUDIO = 128, + MT6357_IRQ_ACCDET = 133, + MT6357_IRQ_ACCDET_EINT0, + MT6357_IRQ_ACCDET_EINT1, + MT6357_IRQ_SPI_CMD_ALERT = 144, + MT6357_IRQ_NR, +}; + +#define MT6357_IRQ_BUCK_BASE MT6357_IRQ_VPROC_OC +#define MT6357_IRQ_LDO_BASE MT6357_IRQ_VFE28_OC +#define MT6357_IRQ_PSC_BASE MT6357_IRQ_PWRKEY +#define MT6357_IRQ_SCK_BASE MT6357_IRQ_RTC +#define MT6357_IRQ_BM_BASE MT6357_IRQ_FG_BAT0_H +#define MT6357_IRQ_HK_BASE MT6357_IRQ_BAT_H +#define MT6357_IRQ_AUD_BASE MT6357_IRQ_AUDIO +#define MT6357_IRQ_MISC_BASE MT6357_IRQ_SPI_CMD_ALERT + +#define MT6357_IRQ_BUCK_BITS (MT6357_IRQ_VCORE_PREOC - MT6357_IRQ_BUCK_BASE + 1) +#define MT6357_IRQ_LDO_BITS (MT6357_IRQ_VSIM2_OC - MT6357_IRQ_LDO_BASE + 1) +#define MT6357_IRQ_PSC_BITS (MT6357_IRQ_VCDT_HV_DET - MT6357_IRQ_PSC_BASE + 1) +#define MT6357_IRQ_SCK_BITS (MT6357_IRQ_RTC - MT6357_IRQ_SCK_BASE + 1) +#define MT6357_IRQ_BM_BITS (MT6357_IRQ_BATON_HT - MT6357_IRQ_BM_BASE + 1) +#define MT6357_IRQ_HK_BITS (MT6357_IRQ_NAG_C_DLTV - MT6357_IRQ_HK_BASE + 1) +#define MT6357_IRQ_AUD_BITS (MT6357_IRQ_ACCDET_EINT1 - MT6357_IRQ_AUD_BASE + 1) +#define MT6357_IRQ_MISC_BITS \ + (MT6357_IRQ_SPI_CMD_ALERT - MT6357_IRQ_MISC_BASE + 1) + +#define MT6357_TOP_GEN(sp) \ +{ \ + .hwirq_base = MT6357_IRQ_##sp##_BASE, \ + .num_int_regs = \ + ((MT6357_IRQ_##sp##_BITS - 1) / \ + MTK_PMIC_REG_WIDTH) + 1, \ + .en_reg = MT6357_##sp##_TOP_INT_CON0, \ + .en_reg_shift = 0x6, \ + .sta_reg = MT6357_##sp##_TOP_INT_STATUS0, \ + .sta_reg_shift = 0x2, \ + .top_offset = MT6357_##sp##_TOP, \ +} + +#endif /* __MFD_MT6357_CORE_H__ */ diff --git a/include/linux/mfd/mt6357/registers.h b/include/linux/mfd/mt6357/registers.h new file mode 100644 index 000000000000..e24af83b618d --- /dev/null +++ b/include/linux/mfd/mt6357/registers.h @@ -0,0 +1,1574 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (c) 2022 MediaTek Inc. + */ + +#ifndef __MFD_MT6357_REGISTERS_H__ +#define __MFD_MT6357_REGISTERS_H__ + +/* PMIC Registers */ +#define MT6357_TOP0_ID 0x0 +#define MT6357_TOP0_REV0 0x2 +#define MT6357_TOP0_DSN_DBI 0x4 +#define MT6357_TOP0_DSN_DXI 0x6 +#define MT6357_HWCID 0x8 +#define MT6357_SWCID 0xa +#define MT6357_PONSTS 0xc +#define MT6357_POFFSTS 0xe +#define MT6357_PSTSCTL 0x10 +#define MT6357_PG_DEB_STS0 0x12 +#define MT6357_PG_SDN_STS0 0x14 +#define MT6357_OC_SDN_STS0 0x16 +#define MT6357_THERMALSTATUS 0x18 +#define MT6357_TOP_CON 0x1a +#define MT6357_TEST_OUT 0x1c +#define MT6357_TEST_CON0 0x1e +#define MT6357_TEST_CON1 0x20 +#define MT6357_TESTMODE_SW 0x22 +#define MT6357_TOPSTATUS 0x24 +#define MT6357_TDSEL_CON 0x26 +#define MT6357_RDSEL_CON 0x28 +#define MT6357_SMT_CON0 0x2a +#define MT6357_SMT_CON1 0x2c +#define MT6357_TOP_RSV0 0x2e +#define MT6357_TOP_RSV1 0x30 +#define MT6357_DRV_CON0 0x32 +#define MT6357_DRV_CON1 0x34 +#define MT6357_DRV_CON2 0x36 +#define MT6357_DRV_CON3 0x38 +#define MT6357_FILTER_CON0 0x3a +#define MT6357_FILTER_CON1 0x3c +#define MT6357_FILTER_CON2 0x3e +#define MT6357_FILTER_CON3 0x40 +#define MT6357_TOP_STATUS 0x42 +#define MT6357_TOP_STATUS_SET 0x44 +#define MT6357_TOP_STATUS_CLR 0x46 +#define MT6357_TOP_TRAP 0x48 +#define MT6357_TOP1_ID 0x80 +#define MT6357_TOP1_REV0 0x82 +#define MT6357_TOP1_DSN_DBI 0x84 +#define MT6357_TOP1_DSN_DXI 0x86 +#define MT6357_GPIO_DIR0 0x88 +#define MT6357_GPIO_DIR0_SET 0x8a +#define MT6357_GPIO_DIR0_CLR 0x8c +#define MT6357_GPIO_PULLEN0 0x8e +#define MT6357_GPIO_PULLEN0_SET 0x90 +#define MT6357_GPIO_PULLEN0_CLR 0x92 +#define MT6357_GPIO_PULLSEL0 0x94 +#define MT6357_GPIO_PULLSEL0_SET 0x96 +#define MT6357_GPIO_PULLSEL0_CLR 0x98 +#define MT6357_GPIO_DINV0 0x9a +#define MT6357_GPIO_DINV0_SET 0x9c +#define MT6357_GPIO_DINV0_CLR 0x9e +#define MT6357_GPIO_DOUT0 0xa0 +#define MT6357_GPIO_DOUT0_SET 0xa2 +#define MT6357_GPIO_DOUT0_CLR 0xa4 +#define MT6357_GPIO_PI0 0xa6 +#define MT6357_GPIO_POE0 0xa8 +#define MT6357_GPIO_MODE0 0xaa +#define MT6357_GPIO_MODE0_SET 0xac +#define MT6357_GPIO_MODE0_CLR 0xae +#define MT6357_GPIO_MODE1 0xb0 +#define MT6357_GPIO_MODE1_SET 0xb2 +#define MT6357_GPIO_MODE1_CLR 0xb4 +#define MT6357_GPIO_MODE2 0xb6 +#define MT6357_GPIO_MODE2_SET 0xb8 +#define MT6357_GPIO_MODE2_CLR 0xba +#define MT6357_GPIO_MODE3 0xbc +#define MT6357_GPIO_MODE3_SET 0xbe +#define MT6357_GPIO_MODE3_CLR 0xc0 +#define MT6357_GPIO_RSV 0xc2 +#define MT6357_TOP2_ID 0x100 +#define MT6357_TOP2_REV0 0x102 +#define MT6357_TOP2_DSN_DBI 0x104 +#define MT6357_TOP2_DSN_DXI 0x106 +#define MT6357_TOP_PAM0 0x108 +#define MT6357_TOP_PAM1 0x10a +#define MT6357_TOP_CKPDN_CON0 0x10c +#define MT6357_TOP_CKPDN_CON0_SET 0x10e +#define MT6357_TOP_CKPDN_CON0_CLR 0x110 +#define MT6357_TOP_CKPDN_CON1 0x112 +#define MT6357_TOP_CKPDN_CON1_SET 0x114 +#define MT6357_TOP_CKPDN_CON1_CLR 0x116 +#define MT6357_TOP_CKSEL_CON0 0x118 +#define MT6357_TOP_CKSEL_CON0_SET 0x11a +#define MT6357_TOP_CKSEL_CON0_CLR 0x11c +#define MT6357_TOP_CKSEL_CON1 0x11e +#define MT6357_TOP_CKSEL_CON1_SET 0x120 +#define MT6357_TOP_CKSEL_CON1_CLR 0x122 +#define MT6357_TOP_CKDIVSEL_CON0 0x124 +#define MT6357_TOP_CKDIVSEL_CON0_SET 0x126 +#define MT6357_TOP_CKDIVSEL_CON0_CLR 0x128 +#define MT6357_TOP_CKHWEN_CON0 0x12a +#define MT6357_TOP_CKHWEN_CON0_SET 0x12c +#define MT6357_TOP_CKHWEN_CON0_CLR 0x12e +#define MT6357_TOP_CKTST_CON0 0x130 +#define MT6357_TOP_CKTST_CON1 0x132 +#define MT6357_TOP_CLK_CON0 0x134 +#define MT6357_TOP_CLK_CON0_SET 0x136 +#define MT6357_TOP_CLK_CON0_CLR 0x138 +#define MT6357_TOP_DCM_CON0 0x13a +#define MT6357_TOP_HANDOVER_DEBUG0 0x13c +#define MT6357_TOP_RST_CON0 0x13e +#define MT6357_TOP_RST_CON0_SET 0x140 +#define MT6357_TOP_RST_CON0_CLR 0x142 +#define MT6357_TOP_RST_CON1 0x144 +#define MT6357_TOP_RST_CON1_SET 0x146 +#define MT6357_TOP_RST_CON1_CLR 0x148 +#define MT6357_TOP_RST_CON2 0x14a +#define MT6357_TOP_RST_MISC 0x14c +#define MT6357_TOP_RST_MISC_SET 0x14e +#define MT6357_TOP_RST_MISC_CLR 0x150 +#define MT6357_TOP_RST_STATUS 0x152 +#define MT6357_TOP_RST_STATUS_SET 0x154 +#define MT6357_TOP_RST_STATUS_CLR 0x156 +#define MT6357_TOP2_ELR_NUM 0x158 +#define MT6357_TOP2_ELR0 0x15a +#define MT6357_TOP2_ELR1 0x15c +#define MT6357_TOP3_ID 0x180 +#define MT6357_TOP3_REV0 0x182 +#define MT6357_TOP3_DSN_DBI 0x184 +#define MT6357_TOP3_DSN_DXI 0x186 +#define MT6357_MISC_TOP_INT_CON0 0x188 +#define MT6357_MISC_TOP_INT_CON0_SET 0x18a +#define MT6357_MISC_TOP_INT_CON0_CLR 0x18c +#define MT6357_MISC_TOP_INT_MASK_CON0 0x18e +#define MT6357_MISC_TOP_INT_MASK_CON0_SET 0x190 +#define MT6357_MISC_TOP_INT_MASK_CON0_CLR 0x192 +#define MT6357_MISC_TOP_INT_STATUS0 0x194 +#define MT6357_MISC_TOP_INT_RAW_STATUS0 0x196 +#define MT6357_TOP_INT_MASK_CON0 0x198 +#define MT6357_TOP_INT_MASK_CON0_SET 0x19a +#define MT6357_TOP_INT_MASK_CON0_CLR 0x19c +#define MT6357_TOP_INT_STATUS0 0x19e +#define MT6357_TOP_INT_RAW_STATUS0 0x1a0 +#define MT6357_TOP_INT_CON0 0x1a2 +#define MT6357_PLT0_ID 0x380 +#define MT6357_PLT0_REV0 0x382 +#define MT6357_PLT0_REV1 0x384 +#define MT6357_PLT0_DSN_DXI 0x386 +#define MT6357_FQMTR_CON0 0x388 +#define MT6357_FQMTR_CON1 0x38a +#define MT6357_FQMTR_CON2 0x38c +#define MT6357_TOP_CLK_TRIM 0x38e +#define MT6357_OTP_CON0 0x390 +#define MT6357_OTP_CON1 0x392 +#define MT6357_OTP_CON2 0x394 +#define MT6357_OTP_CON3 0x396 +#define MT6357_OTP_CON4 0x398 +#define MT6357_OTP_CON5 0x39a +#define MT6357_OTP_CON6 0x39c +#define MT6357_OTP_CON7 0x39e +#define MT6357_OTP_CON8 0x3a0 +#define MT6357_OTP_CON9 0x3a2 +#define MT6357_OTP_CON10 0x3a4 +#define MT6357_OTP_CON11 0x3a6 +#define MT6357_OTP_CON12 0x3a8 +#define MT6357_OTP_CON13 0x3aa +#define MT6357_OTP_CON14 0x3ac +#define MT6357_TOP_TMA_KEY 0x3ae +#define MT6357_TOP_MDB_CONF0 0x3b0 +#define MT6357_TOP_MDB_CONF1 0x3b2 +#define MT6357_TOP_MDB_CONF2 0x3b4 +#define MT6357_PLT0_ELR_NUM 0x3b6 +#define MT6357_PLT0_ELR0 0x3b8 +#define MT6357_PLT0_ELR1 0x3ba +#define MT6357_SPISLV_ID 0x400 +#define MT6357_SPISLV_REV0 0x402 +#define MT6357_SPISLV_REV1 0x404 +#define MT6357_SPISLV_DSN_DXI 0x406 +#define MT6357_RG_SPI_CON0 0x408 +#define MT6357_DEW_DIO_EN 0x40a +#define MT6357_DEW_READ_TEST 0x40c +#define MT6357_DEW_WRITE_TEST 0x40e +#define MT6357_DEW_CRC_SWRST 0x410 +#define MT6357_DEW_CRC_EN 0x412 +#define MT6357_DEW_CRC_VAL 0x414 +#define MT6357_DEW_DBG_MON_SEL 0x416 +#define MT6357_DEW_CIPHER_KEY_SEL 0x418 +#define MT6357_DEW_CIPHER_IV_SEL 0x41a +#define MT6357_DEW_CIPHER_EN 0x41c +#define MT6357_DEW_CIPHER_RDY 0x41e +#define MT6357_DEW_CIPHER_MODE 0x420 +#define MT6357_DEW_CIPHER_SWRST 0x422 +#define MT6357_DEW_RDDMY_NO 0x424 +#define MT6357_INT_TYPE_CON0 0x426 +#define MT6357_INT_TYPE_CON0_SET 0x428 +#define MT6357_INT_TYPE_CON0_CLR 0x42a +#define MT6357_INT_STA 0x42c +#define MT6357_RG_SPI_CON1 0x42e +#define MT6357_RG_SPI_CON2 0x430 +#define MT6357_RG_SPI_CON3 0x432 +#define MT6357_RG_SPI_CON4 0x434 +#define MT6357_RG_SPI_CON5 0x436 +#define MT6357_RG_SPI_CON6 0x438 +#define MT6357_RG_SPI_CON7 0x43a +#define MT6357_RG_SPI_CON8 0x43c +#define MT6357_RG_SPI_CON9 0x43e +#define MT6357_RG_SPI_CON10 0x440 +#define MT6357_RG_SPI_CON11 0x442 +#define MT6357_RG_SPI_CON12 0x444 +#define MT6357_RG_SPI_CON13 0x446 +#define MT6357_TOP_SPI_CON0 0x448 +#define MT6357_TOP_SPI_CON1 0x44a +#define MT6357_SCK_TOP_DSN_ID 0x500 +#define MT6357_SCK_TOP_DSN_REV0 0x502 +#define MT6357_SCK_TOP_DBI 0x504 +#define MT6357_SCK_TOP_DXI 0x506 +#define MT6357_SCK_TOP_TPM0 0x508 +#define MT6357_SCK_TOP_TPM1 0x50a +#define MT6357_SCK_TOP_CON0 0x50c +#define MT6357_SCK_TOP_CON1 0x50e +#define MT6357_SCK_TOP_TEST_OUT 0x510 +#define MT6357_SCK_TOP_TEST_CON0 0x512 +#define MT6357_SCK_TOP_CKPDN_CON0 0x514 +#define MT6357_SCK_TOP_CKPDN_CON0_SET 0x516 +#define MT6357_SCK_TOP_CKPDN_CON0_CLR 0x518 +#define MT6357_SCK_TOP_CKHWEN_CON0 0x51a +#define MT6357_SCK_TOP_CKHWEN_CON0_SET 0x51c +#define MT6357_SCK_TOP_CKHWEN_CON0_CLR 0x51e +#define MT6357_SCK_TOP_CKTST_CON 0x520 +#define MT6357_SCK_TOP_RST_CON0 0x522 +#define MT6357_SCK_TOP_RST_CON0_SET 0x524 +#define MT6357_SCK_TOP_RST_CON0_CLR 0x526 +#define MT6357_SCK_TOP_INT_CON0 0x528 +#define MT6357_SCK_TOP_INT_CON0_SET 0x52a +#define MT6357_SCK_TOP_INT_CON0_CLR 0x52c +#define MT6357_SCK_TOP_INT_MASK_CON0 0x52e +#define MT6357_SCK_TOP_INT_MASK_CON0_SET 0x530 +#define MT6357_SCK_TOP_INT_MASK_CON0_CLR 0x532 +#define MT6357_SCK_TOP_INT_STATUS0 0x534 +#define MT6357_SCK_TOP_INT_RAW_STATUS0 0x536 +#define MT6357_SCK_TOP_INT_MISC_CON 0x538 +#define MT6357_EOSC_CALI_CON0 0x53a +#define MT6357_EOSC_CALI_CON1 0x53c +#define MT6357_RTC_MIX_CON0 0x53e +#define MT6357_RTC_MIX_CON1 0x540 +#define MT6357_RTC_MIX_CON2 0x542 +#define MT6357_RTC_DSN_ID 0x580 +#define MT6357_RTC_DSN_REV0 0x582 +#define MT6357_RTC_DBI 0x584 +#define MT6357_RTC_DXI 0x586 +#define MT6357_RTC_BBPU 0x588 +#define MT6357_RTC_IRQ_STA 0x58a +#define MT6357_RTC_IRQ_EN 0x58c +#define MT6357_RTC_CII_EN 0x58e +#define MT6357_RTC_AL_MASK 0x590 +#define MT6357_RTC_TC_SEC 0x592 +#define MT6357_RTC_TC_MIN 0x594 +#define MT6357_RTC_TC_HOU 0x596 +#define MT6357_RTC_TC_DOM 0x598 +#define MT6357_RTC_TC_DOW 0x59a +#define MT6357_RTC_TC_MTH 0x59c +#define MT6357_RTC_TC_YEA 0x59e +#define MT6357_RTC_AL_SEC 0x5a0 +#define MT6357_RTC_AL_MIN 0x5a2 +#define MT6357_RTC_AL_HOU 0x5a4 +#define MT6357_RTC_AL_DOM 0x5a6 +#define MT6357_RTC_AL_DOW 0x5a8 +#define MT6357_RTC_AL_MTH 0x5aa +#define MT6357_RTC_AL_YEA 0x5ac +#define MT6357_RTC_OSC32CON 0x5ae +#define MT6357_RTC_POWERKEY1 0x5b0 +#define MT6357_RTC_POWERKEY2 0x5b2 +#define MT6357_RTC_PDN1 0x5b4 +#define MT6357_RTC_PDN2 0x5b6 +#define MT6357_RTC_SPAR0 0x5b8 +#define MT6357_RTC_SPAR1 0x5ba +#define MT6357_RTC_PROT 0x5bc +#define MT6357_RTC_DIFF 0x5be +#define MT6357_RTC_CALI 0x5c0 +#define MT6357_RTC_WRTGR 0x5c2 +#define MT6357_RTC_CON 0x5c4 +#define MT6357_RTC_SEC_CTRL 0x5c6 +#define MT6357_RTC_INT_CNT 0x5c8 +#define MT6357_RTC_SEC_DAT0 0x5ca +#define MT6357_RTC_SEC_DAT1 0x5cc +#define MT6357_RTC_SEC_DAT2 0x5ce +#define MT6357_RTC_SEC_DSN_ID 0x600 +#define MT6357_RTC_SEC_DSN_REV0 0x602 +#define MT6357_RTC_SEC_DBI 0x604 +#define MT6357_RTC_SEC_DXI 0x606 +#define MT6357_RTC_TC_SEC_SEC 0x608 +#define MT6357_RTC_TC_MIN_SEC 0x60a +#define MT6357_RTC_TC_HOU_SEC 0x60c +#define MT6357_RTC_TC_DOM_SEC 0x60e +#define MT6357_RTC_TC_DOW_SEC 0x610 +#define MT6357_RTC_TC_MTH_SEC 0x612 +#define MT6357_RTC_TC_YEA_SEC 0x614 +#define MT6357_RTC_SEC_CK_PDN 0x616 +#define MT6357_RTC_SEC_WRTGR 0x618 +#define MT6357_DCXO_DSN_ID 0x780 +#define MT6357_DCXO_DSN_REV0 0x782 +#define MT6357_DCXO_DSN_DBI 0x784 +#define MT6357_DCXO_DSN_DXI 0x786 +#define MT6357_DCXO_CW00 0x788 +#define MT6357_DCXO_CW00_SET 0x78a +#define MT6357_DCXO_CW00_CLR 0x78c +#define MT6357_DCXO_CW01 0x78e +#define MT6357_DCXO_CW02 0x790 +#define MT6357_DCXO_CW03 0x792 +#define MT6357_DCXO_CW04 0x794 +#define MT6357_DCXO_CW05 0x796 +#define MT6357_DCXO_CW06 0x798 +#define MT6357_DCXO_CW07 0x79a +#define MT6357_DCXO_CW08 0x79c +#define MT6357_DCXO_CW09 0x79e +#define MT6357_DCXO_CW10 0x7a0 +#define MT6357_DCXO_CW11 0x7a2 +#define MT6357_DCXO_CW11_SET 0x7a4 +#define MT6357_DCXO_CW11_CLR 0x7a6 +#define MT6357_DCXO_CW12 0x7a8 +#define MT6357_DCXO_CW13 0x7aa +#define MT6357_DCXO_CW14 0x7ac +#define MT6357_DCXO_CW15 0x7ae +#define MT6357_DCXO_CW16 0x7b0 +#define MT6357_DCXO_CW17 0x7b2 +#define MT6357_DCXO_CW18 0x7b4 +#define MT6357_DCXO_CW19 0x7b6 +#define MT6357_DCXO_CW20 0x7b8 +#define MT6357_DCXO_CW21 0x7ba +#define MT6357_DCXO_CW22 0x7bc +#define MT6357_DCXO_ELR_NUM 0x7be +#define MT6357_DCXO_ELR0 0x7c0 +#define MT6357_PSC_TOP_ID 0x900 +#define MT6357_PSC_TOP_REV0 0x902 +#define MT6357_PSC_TOP_DBI 0x904 +#define MT6357_PSC_TOP_DXI 0x906 +#define MT6357_PSC_TPM0 0x908 +#define MT6357_PSC_TPM1 0x90a +#define MT6357_PSC_TOP_RSTCTL_0 0x90c +#define MT6357_PSC_TOP_INT_CON0 0x90e +#define MT6357_PSC_TOP_INT_CON0_SET 0x910 +#define MT6357_PSC_TOP_INT_CON0_CLR 0x912 +#define MT6357_PSC_TOP_INT_MASK_CON0 0x914 +#define MT6357_PSC_TOP_INT_MASK_CON0_SET 0x916 +#define MT6357_PSC_TOP_INT_MASK_CON0_CLR 0x918 +#define MT6357_PSC_TOP_INT_STATUS0 0x91a +#define MT6357_PSC_TOP_INT_RAW_STATUS0 0x91c +#define MT6357_PSC_TOP_INT_MISC_CON 0x91e +#define MT6357_PSC_TOP_INT_MISC_CON_SET 0x920 +#define MT6357_PSC_TOP_INT_MISC_CON_CLR 0x922 +#define MT6357_PSC_TOP_MON_CTL 0x924 +#define MT6357_STRUP_ID 0x980 +#define MT6357_STRUP_REV0 0x982 +#define MT6357_STRUP_DBI 0x984 +#define MT6357_STRUP_DXI 0x986 +#define MT6357_STRUP_ANA_CON0 0x988 +#define MT6357_STRUP_ANA_CON1 0x98a +#define MT6357_STRUP_ANA_CON2 0x98c +#define MT6357_STRUP_ELR_NUM 0x98e +#define MT6357_STRUP_ELR_0 0x990 +#define MT6357_PSEQ_ID 0xa00 +#define MT6357_PSEQ_REV0 0xa02 +#define MT6357_PSEQ_DBI 0xa04 +#define MT6357_PSEQ_DXI 0xa06 +#define MT6357_PPCCTL0 0xa08 +#define MT6357_PPCCTL1 0xa0a +#define MT6357_PPCCTL2 0xa0c +#define MT6357_PPCCFG0 0xa0e +#define MT6357_PPCTST0 0xa10 +#define MT6357_PORFLAG 0xa12 +#define MT6357_STRUP_CON0 0xa14 +#define MT6357_STRUP_CON1 0xa16 +#define MT6357_STRUP_CON2 0xa18 +#define MT6357_STRUP_CON3 0xa1a +#define MT6357_STRUP_CON4 0xa1c +#define MT6357_STRUP_CON5 0xa1e +#define MT6357_STRUP_CON6 0xa20 +#define MT6357_STRUP_CON7 0xa22 +#define MT6357_CPSCFG0 0xa24 +#define MT6357_STRUP_CON9 0xa26 +#define MT6357_STRUP_CON10 0xa28 +#define MT6357_STRUP_CON11 0xa2a +#define MT6357_STRUP_CON12 0xa2c +#define MT6357_STRUP_CON13 0xa2e +#define MT6357_STRUP_CON14 0xa30 +#define MT6357_STRUP_CON15 0xa32 +#define MT6357_STRUP_CON16 0xa34 +#define MT6357_STRUP_CON19 0xa36 +#define MT6357_PSEQ_ELR_NUM 0xa38 +#define MT6357_PSEQ_ELR7 0xa3a +#define MT6357_PSEQ_ELR8 0xa3c +#define MT6357_PCHR_DIG_DSN_ID 0xa80 +#define MT6357_PCHR_DIG_DSN_REV0 0xa82 +#define MT6357_PCHR_DIG_DSN_DBI 0xa84 +#define MT6357_PCHR_DIG_DSN_DXI 0xa86 +#define MT6357_CHR_TOP_CON0 0xa88 +#define MT6357_CHR_TOP_CON1 0xa8a +#define MT6357_CHR_TOP_CON2 0xa8c +#define MT6357_CHR_TOP_CON3 0xa8e +#define MT6357_CHR_TOP_CON4 0xa90 +#define MT6357_CHR_TOP_CON5 0xa92 +#define MT6357_CHR_TOP_CON6 0xa94 +#define MT6357_PCHR_DIG_ELR_NUM 0xa96 +#define MT6357_PCHR_ELR0 0xa98 +#define MT6357_PCHR_ELR1 0xa9a +#define MT6357_PCHR_MACRO_DSN_ID 0xb80 +#define MT6357_PCHR_MACRO_DSN_REV0 0xb82 +#define MT6357_PCHR_MACRO_DSN_DBI 0xb84 +#define MT6357_PCHR_MACRO_DSN_DXI 0xb86 +#define MT6357_CHR_CON0 0xb88 +#define MT6357_CHR_CON1 0xb8a +#define MT6357_CHR_CON2 0xb8c +#define MT6357_CHR_CON3 0xb8e +#define MT6357_CHR_CON4 0xb90 +#define MT6357_CHR_CON5 0xb92 +#define MT6357_CHR_CON6 0xb94 +#define MT6357_CHR_CON7 0xb96 +#define MT6357_CHR_CON8 0xb98 +#define MT6357_CHR_CON9 0xb9a +#define MT6357_BM_TOP_DSN_ID 0xc00 +#define MT6357_BM_TOP_DSN_REV0 0xc02 +#define MT6357_BM_TOP_DBI 0xc04 +#define MT6357_BM_TOP_DXI 0xc06 +#define MT6357_BM_TPM0 0xc08 +#define MT6357_BM_TPM1 0xc0a +#define MT6357_BM_TOP_CKPDN_CON0 0xc0c +#define MT6357_BM_TOP_CKPDN_CON0_SET 0xc0e +#define MT6357_BM_TOP_CKPDN_CON0_CLR 0xc10 +#define MT6357_BM_TOP_CKSEL_CON0 0xc12 +#define MT6357_BM_TOP_CKSEL_CON0_SET 0xc14 +#define MT6357_BM_TOP_CKSEL_CON0_CLR 0xc16 +#define MT6357_BM_TOP_CKTST_CON0 0xc18 +#define MT6357_BM_TOP_RST_CON0 0xc1a +#define MT6357_BM_TOP_RST_CON0_SET 0xc1c +#define MT6357_BM_TOP_RST_CON0_CLR 0xc1e +#define MT6357_BM_TOP_INT_CON0 0xc20 +#define MT6357_BM_TOP_INT_CON0_SET 0xc22 +#define MT6357_BM_TOP_INT_CON0_CLR 0xc24 +#define MT6357_BM_TOP_INT_CON1 0xc26 +#define MT6357_BM_TOP_INT_CON1_SET 0xc28 +#define MT6357_BM_TOP_INT_CON1_CLR 0xc2a +#define MT6357_BM_TOP_INT_MASK_CON0 0xc2c +#define MT6357_BM_TOP_INT_MASK_CON0_SET 0xc2e +#define MT6357_BM_TOP_INT_MASK_CON0_CLR 0xc30 +#define MT6357_BM_TOP_INT_MASK_CON1 0xc32 +#define MT6357_BM_TOP_INT_MASK_CON1_SET 0xc34 +#define MT6357_BM_TOP_INT_MASK_CON1_CLR 0xc36 +#define MT6357_BM_TOP_INT_STATUS0 0xc38 +#define MT6357_BM_TOP_INT_STATUS1 0xc3a +#define MT6357_BM_TOP_INT_RAW_STATUS0 0xc3c +#define MT6357_BM_TOP_INT_RAW_STATUS1 0xc3e +#define MT6357_BM_TOP_INT_MISC_CON 0xc40 +#define MT6357_BM_TOP_DBG_CON 0xc42 +#define MT6357_BM_TOP_RSV0 0xc44 +#define MT6357_FGADC_ANA_DSN_ID 0xc80 +#define MT6357_FGADC_ANA_DSN_REV0 0xc82 +#define MT6357_FGADC_ANA_DSN_DBI 0xc84 +#define MT6357_FGADC_ANA_DSN_DXI 0xc86 +#define MT6357_FGADC_ANA_CON0 0xc88 +#define MT6357_FGADC_ANA_TEST_CON0 0xc8a +#define MT6357_FGADC_ANA_ELR_NUM 0xc8c +#define MT6357_FGADC_ANA_ELR0 0xc8e +#define MT6357_FGADC_ANA_ELR1 0xc90 +#define MT6357_FGADC0_DSN_ID 0xd00 +#define MT6357_FGADC0_DSN_REV0 0xd02 +#define MT6357_FGADC0_DSN_DBI 0xd04 +#define MT6357_FGADC0_DSN_DXI 0xd06 +#define MT6357_FGADC_CON0 0xd08 +#define MT6357_FGADC_CON1 0xd0a +#define MT6357_FGADC_CON2 0xd0c +#define MT6357_FGADC_CON3 0xd0e +#define MT6357_FGADC_CON4 0xd10 +#define MT6357_FGADC_CAR_CON0 0xd12 +#define MT6357_FGADC_CAR_CON1 0xd14 +#define MT6357_FGADC_CAR_CON2 0xd16 +#define MT6357_FGADC_CARTH_CON0 0xd18 +#define MT6357_FGADC_CARTH_CON1 0xd1a +#define MT6357_FGADC_CARTH_CON2 0xd1c +#define MT6357_FGADC_CARTH_CON3 0xd1e +#define MT6357_FGADC_NTER_CON0 0xd20 +#define MT6357_FGADC_NTER_CON1 0xd22 +#define MT6357_FGADC_NTER_CON2 0xd24 +#define MT6357_FGADC_SON_CON0 0xd26 +#define MT6357_FGADC_SON_CON1 0xd28 +#define MT6357_FGADC_SON_CON2 0xd2a +#define MT6357_FGADC_SON_CON3 0xd2c +#define MT6357_FGADC_ZCV_CON0 0xd2e +#define MT6357_FGADC_ZCV_CON1 0xd30 +#define MT6357_FGADC_ZCV_CON2 0xd32 +#define MT6357_FGADC_ZCV_CON3 0xd34 +#define MT6357_FGADC_ZCV_CON4 0xd36 +#define MT6357_FGADC_ZCVTH_CON0 0xd38 +#define MT6357_FGADC_ZCVTH_CON1 0xd3a +#define MT6357_FGADC_ZCVTH_CON2 0xd3c +#define MT6357_FGADC1_DSN_ID 0xd80 +#define MT6357_FGADC1_DSN_REV0 0xd82 +#define MT6357_FGADC1_DSN_DBI 0xd84 +#define MT6357_FGADC1_DSN_DXI 0xd86 +#define MT6357_FGADC_R_CON0 0xd88 +#define MT6357_FGADC_CUR_CON0 0xd8a +#define MT6357_FGADC_CUR_CON1 0xd8c +#define MT6357_FGADC_CUR_CON2 0xd8e +#define MT6357_FGADC_CUR_CON3 0xd90 +#define MT6357_FGADC_OFFSET_CON0 0xd92 +#define MT6357_FGADC_OFFSET_CON1 0xd94 +#define MT6357_FGADC_GAIN_CON0 0xd96 +#define MT6357_FGADC_TEST_CON0 0xd98 +#define MT6357_SYSTEM_INFO_CON0 0xd9a +#define MT6357_SYSTEM_INFO_CON1 0xd9c +#define MT6357_SYSTEM_INFO_CON2 0xd9e +#define MT6357_SYSTEM_INFO_CON3 0xda0 +#define MT6357_SYSTEM_INFO_CON4 0xda2 +#define MT6357_BATON_ANA_DSN_ID 0xe00 +#define MT6357_BATON_ANA_DSN_REV0 0xe02 +#define MT6357_BATON_ANA_DSN_DBI 0xe04 +#define MT6357_BATON_ANA_DSN_DXI 0xe06 +#define MT6357_BATON_ANA_CON0 0xe08 +#define MT6357_BATON_ANA_ELR_NUM 0xe0a +#define MT6357_BATON_ANA_ELR0 0xe0c +#define MT6357_HK_TOP_ID 0xf80 +#define MT6357_HK_TOP_REV0 0xf82 +#define MT6357_HK_TOP_DBI 0xf84 +#define MT6357_HK_TOP_DXI 0xf86 +#define MT6357_HK_TPM0 0xf88 +#define MT6357_HK_TPM1 0xf8a +#define MT6357_HK_TOP_CLK_CON0 0xf8c +#define MT6357_HK_TOP_CLK_CON1 0xf8e +#define MT6357_HK_TOP_RST_CON0 0xf90 +#define MT6357_HK_TOP_INT_CON0 0xf92 +#define MT6357_HK_TOP_INT_CON0_SET 0xf94 +#define MT6357_HK_TOP_INT_CON0_CLR 0xf96 +#define MT6357_HK_TOP_INT_MASK_CON0 0xf98 +#define MT6357_HK_TOP_INT_MASK_CON0_SET 0xf9a +#define MT6357_HK_TOP_INT_MASK_CON0_CLR 0xf9c +#define MT6357_HK_TOP_INT_STATUS0 0xf9e +#define MT6357_HK_TOP_INT_RAW_STATUS0 0xfa0 +#define MT6357_HK_TOP_MON_CON0 0xfa2 +#define MT6357_HK_TOP_MON_CON1 0xfa4 +#define MT6357_HK_TOP_MON_CON2 0xfa6 +#define MT6357_AUXADC_DSN_ID 0x1000 +#define MT6357_AUXADC_DSN_REV0 0x1002 +#define MT6357_AUXADC_DSN_DBI 0x1004 +#define MT6357_AUXADC_DSN_DXI 0x1006 +#define MT6357_AUXADC_ANA_CON0 0x1008 +#define MT6357_AUXADC_DIG_1_DSN_ID 0x1080 +#define MT6357_AUXADC_DIG_1_DSN_REV0 0x1082 +#define MT6357_AUXADC_DIG_1_DSN_DBI 0x1084 +#define MT6357_AUXADC_DIG_1_DSN_DXI 0x1086 +#define MT6357_AUXADC_ADC0 0x1088 +#define MT6357_AUXADC_ADC1 0x108a +#define MT6357_AUXADC_ADC2 0x108c +#define MT6357_AUXADC_ADC3 0x108e +#define MT6357_AUXADC_ADC4 0x1090 +#define MT6357_AUXADC_ADC5 0x1092 +#define MT6357_AUXADC_ADC6 0x1094 +#define MT6357_AUXADC_ADC7 0x1096 +#define MT6357_AUXADC_ADC8 0x1098 +#define MT6357_AUXADC_ADC9 0x109a +#define MT6357_AUXADC_ADC10 0x109c +#define MT6357_AUXADC_ADC11 0x109e +#define MT6357_AUXADC_ADC12 0x10a0 +#define MT6357_AUXADC_ADC14 0x10a2 +#define MT6357_AUXADC_ADC16 0x10a4 +#define MT6357_AUXADC_ADC17 0x10a6 +#define MT6357_AUXADC_ADC18 0x10a8 +#define MT6357_AUXADC_ADC19 0x10aa +#define MT6357_AUXADC_ADC20 0x10ac +#define MT6357_AUXADC_ADC21 0x10ae +#define MT6357_AUXADC_ADC22 0x10b0 +#define MT6357_AUXADC_ADC23 0x10b2 +#define MT6357_AUXADC_ADC24 0x10b4 +#define MT6357_AUXADC_ADC25 0x10b6 +#define MT6357_AUXADC_ADC26 0x10b8 +#define MT6357_AUXADC_ADC27 0x10ba +#define MT6357_AUXADC_ADC29 0x10bc +#define MT6357_AUXADC_ADC30 0x10be +#define MT6357_AUXADC_ADC31 0x10c0 +#define MT6357_AUXADC_ADC32 0x10c2 +#define MT6357_AUXADC_ADC33 0x10c4 +#define MT6357_AUXADC_ADC34 0x10c6 +#define MT6357_AUXADC_ADC35 0x10c8 +#define MT6357_AUXADC_ADC36 0x10ca +#define MT6357_AUXADC_ADC38 0x10cc +#define MT6357_AUXADC_ADC39 0x10ce +#define MT6357_AUXADC_ADC40 0x10d0 +#define MT6357_AUXADC_ADC41 0x10d2 +#define MT6357_AUXADC_ADC42 0x10d4 +#define MT6357_AUXADC_ADC43 0x10d6 +#define MT6357_AUXADC_ADC46 0x10d8 +#define MT6357_AUXADC_ADC47 0x10da +#define MT6357_AUXADC_DIG_1_ELR_NUM 0x10dc +#define MT6357_AUXADC_DIG_1_ELR0 0x10de +#define MT6357_AUXADC_DIG_1_ELR1 0x10e0 +#define MT6357_AUXADC_DIG_2_DSN_ID 0x1100 +#define MT6357_AUXADC_DIG_2_DSN_REV0 0x1102 +#define MT6357_AUXADC_DIG_2_DSN_DBI 0x1104 +#define MT6357_AUXADC_DIG_2_DSN_DXI 0x1106 +#define MT6357_AUXADC_STA0 0x1108 +#define MT6357_AUXADC_STA1 0x110a +#define MT6357_AUXADC_STA2 0x110c +#define MT6357_AUXADC_RQST0 0x110e +#define MT6357_AUXADC_RQST0_SET 0x1110 +#define MT6357_AUXADC_RQST0_CLR 0x1112 +#define MT6357_AUXADC_RQST2 0x1114 +#define MT6357_AUXADC_RQST2_SET 0x1116 +#define MT6357_AUXADC_RQST2_CLR 0x1118 +#define MT6357_AUXADC_RQST1 0x111a +#define MT6357_AUXADC_RQST1_SET 0x111c +#define MT6357_AUXADC_RQST1_CLR 0x111e +#define MT6357_AUXADC_CON0 0x1120 +#define MT6357_AUXADC_CON0_SET 0x1122 +#define MT6357_AUXADC_CON0_CLR 0x1124 +#define MT6357_AUXADC_CON1 0x1126 +#define MT6357_AUXADC_CON2 0x1128 +#define MT6357_AUXADC_CON3 0x112a +#define MT6357_AUXADC_CON4 0x112c +#define MT6357_AUXADC_CON5 0x112e +#define MT6357_AUXADC_CON6 0x1130 +#define MT6357_AUXADC_CON7 0x1132 +#define MT6357_AUXADC_CON8 0x1134 +#define MT6357_AUXADC_CON9 0x1136 +#define MT6357_AUXADC_CON10 0x1138 +#define MT6357_AUXADC_CON11 0x113a +#define MT6357_AUXADC_CON12 0x113c +#define MT6357_AUXADC_CON13 0x113e +#define MT6357_AUXADC_CON14 0x1140 +#define MT6357_AUXADC_CON15 0x1142 +#define MT6357_AUXADC_CON16 0x1144 +#define MT6357_AUXADC_CON17 0x1146 +#define MT6357_AUXADC_CON18 0x1148 +#define MT6357_AUXADC_CON19 0x114a +#define MT6357_AUXADC_CON20 0x114c +#define MT6357_AUXADC_DIG_3_DSN_ID 0x1180 +#define MT6357_AUXADC_DIG_3_DSN_REV0 0x1182 +#define MT6357_AUXADC_DIG_3_DSN_DBI 0x1184 +#define MT6357_AUXADC_DIG_3_DSN_DXI 0x1186 +#define MT6357_AUXADC_AUTORPT0 0x1188 +#define MT6357_AUXADC_LBAT0 0x118a +#define MT6357_AUXADC_LBAT1 0x118c +#define MT6357_AUXADC_LBAT2 0x118e +#define MT6357_AUXADC_LBAT3 0x1190 +#define MT6357_AUXADC_LBAT4 0x1192 +#define MT6357_AUXADC_LBAT5 0x1194 +#define MT6357_AUXADC_LBAT6 0x1196 +#define MT6357_AUXADC_ACCDET 0x1198 +#define MT6357_AUXADC_DBG0 0x119a +#define MT6357_AUXADC_IMP0 0x119c +#define MT6357_AUXADC_IMP1 0x119e +#define MT6357_AUXADC_DIG_3_ELR_NUM 0x11a0 +#define MT6357_AUXADC_DIG_3_ELR0 0x11a2 +#define MT6357_AUXADC_DIG_3_ELR1 0x11a4 +#define MT6357_AUXADC_DIG_3_ELR2 0x11a6 +#define MT6357_AUXADC_DIG_3_ELR3 0x11a8 +#define MT6357_AUXADC_DIG_3_ELR4 0x11aa +#define MT6357_AUXADC_DIG_3_ELR5 0x11ac +#define MT6357_AUXADC_DIG_3_ELR6 0x11ae +#define MT6357_AUXADC_DIG_3_ELR7 0x11b0 +#define MT6357_AUXADC_DIG_3_ELR8 0x11b2 +#define MT6357_AUXADC_DIG_3_ELR9 0x11b4 +#define MT6357_AUXADC_DIG_3_ELR10 0x11b6 +#define MT6357_AUXADC_DIG_3_ELR11 0x11b8 +#define MT6357_AUXADC_DIG_4_DSN_ID 0x1200 +#define MT6357_AUXADC_DIG_4_DSN_REV0 0x1202 +#define MT6357_AUXADC_DIG_4_DSN_DBI 0x1204 +#define MT6357_AUXADC_DIG_4_DSN_DXI 0x1206 +#define MT6357_AUXADC_MDRT_0 0x1208 +#define MT6357_AUXADC_MDRT_1 0x120a +#define MT6357_AUXADC_MDRT_2 0x120c +#define MT6357_AUXADC_MDRT_3 0x120e +#define MT6357_AUXADC_MDRT_4 0x1210 +#define MT6357_AUXADC_DCXO_MDRT_0 0x1212 +#define MT6357_AUXADC_DCXO_MDRT_1 0x1214 +#define MT6357_AUXADC_DCXO_MDRT_2 0x1216 +#define MT6357_AUXADC_NAG_0 0x1218 +#define MT6357_AUXADC_NAG_1 0x121a +#define MT6357_AUXADC_NAG_2 0x121c +#define MT6357_AUXADC_NAG_3 0x121e +#define MT6357_AUXADC_NAG_4 0x1220 +#define MT6357_AUXADC_NAG_5 0x1222 +#define MT6357_AUXADC_NAG_6 0x1224 +#define MT6357_AUXADC_NAG_7 0x1226 +#define MT6357_AUXADC_NAG_8 0x1228 +#define MT6357_AUXADC_RSV_1 0x122a +#define MT6357_AUXADC_ANA_0 0x122c +#define MT6357_AUXADC_IMP_CG0 0x122e +#define MT6357_AUXADC_LBAT_CG0 0x1230 +#define MT6357_AUXADC_NAG_CG0 0x1232 +#define MT6357_AUXADC_PRI_NEW 0x1234 +#define MT6357_AUXADC_CHR_TOP_CON2 0x1236 +#define MT6357_BUCK_TOP_DSN_ID 0x1400 +#define MT6357_BUCK_TOP_DSN_REV0 0x1402 +#define MT6357_BUCK_TOP_DBI 0x1404 +#define MT6357_BUCK_TOP_DXI 0x1406 +#define MT6357_BUCK_TOP_PAM0 0x1408 +#define MT6357_BUCK_TOP_PAM1 0x140a +#define MT6357_BUCK_TOP_CLK_CON0 0x140c +#define MT6357_BUCK_TOP_CLK_CON0_SET 0x140e +#define MT6357_BUCK_TOP_CLK_CON0_CLR 0x1410 +#define MT6357_BUCK_TOP_CLK_HWEN_CON0 0x1412 +#define MT6357_BUCK_TOP_CLK_HWEN_CON0_SET 0x1414 +#define MT6357_BUCK_TOP_CLK_HWEN_CON0_CLR 0x1416 +#define MT6357_BUCK_TOP_CLK_MISC_CON0 0x1418 +#define MT6357_BUCK_TOP_INT_CON0 0x141a +#define MT6357_BUCK_TOP_INT_CON0_SET 0x141c +#define MT6357_BUCK_TOP_INT_CON0_CLR 0x141e +#define MT6357_BUCK_TOP_INT_MASK_CON0 0x1420 +#define MT6357_BUCK_TOP_INT_MASK_CON0_SET 0x1422 +#define MT6357_BUCK_TOP_INT_MASK_CON0_CLR 0x1424 +#define MT6357_BUCK_TOP_INT_STATUS0 0x1426 +#define MT6357_BUCK_TOP_INT_RAW_STATUS0 0x1428 +#define MT6357_BUCK_TOP_STB_CON 0x142a +#define MT6357_BUCK_TOP_SLP_CON0 0x142c +#define MT6357_BUCK_TOP_SLP_CON1 0x142e +#define MT6357_BUCK_TOP_SLP_CON2 0x1430 +#define MT6357_BUCK_TOP_MINFREQ_CON 0x1432 +#define MT6357_BUCK_TOP_OC_CON0 0x1434 +#define MT6357_BUCK_TOP_K_CON0 0x1436 +#define MT6357_BUCK_TOP_K_CON1 0x1438 +#define MT6357_BUCK_TOP_K_CON2 0x143a +#define MT6357_BUCK_TOP_WDTDBG0 0x143c +#define MT6357_BUCK_TOP_WDTDBG1 0x143e +#define MT6357_BUCK_TOP_WDTDBG2 0x1440 +#define MT6357_BUCK_TOP_ELR_NUM 0x1442 +#define MT6357_BUCK_TOP_ELR0 0x1444 +#define MT6357_BUCK_TOP_ELR1 0x1446 +#define MT6357_BUCK_VPROC_DSN_ID 0x1480 +#define MT6357_BUCK_VPROC_DSN_REV0 0x1482 +#define MT6357_BUCK_VPROC_DSN_DBI 0x1484 +#define MT6357_BUCK_VPROC_DSN_DXI 0x1486 +#define MT6357_BUCK_VPROC_CON0 0x1488 +#define MT6357_BUCK_VPROC_CON1 0x148a +#define MT6357_BUCK_VPROC_CFG0 0x148c +#define MT6357_BUCK_VPROC_CFG1 0x148e +#define MT6357_BUCK_VPROC_OP_EN 0x1490 +#define MT6357_BUCK_VPROC_OP_EN_SET 0x1492 +#define MT6357_BUCK_VPROC_OP_EN_CLR 0x1494 +#define MT6357_BUCK_VPROC_OP_CFG 0x1496 +#define MT6357_BUCK_VPROC_OP_CFG_SET 0x1498 +#define MT6357_BUCK_VPROC_OP_CFG_CLR 0x149a +#define MT6357_BUCK_VPROC_SP_CON 0x149c +#define MT6357_BUCK_VPROC_SP_CFG 0x149e +#define MT6357_BUCK_VPROC_OC_CFG 0x14a0 +#define MT6357_BUCK_VPROC_DBG0 0x14a2 +#define MT6357_BUCK_VPROC_DBG1 0x14a4 +#define MT6357_BUCK_VPROC_DBG2 0x14a6 +#define MT6357_BUCK_VPROC_ELR_NUM 0x14a8 +#define MT6357_BUCK_VPROC_ELR0 0x14aa +#define MT6357_BUCK_VCORE_DSN_ID 0x1500 +#define MT6357_BUCK_VCORE_DSN_REV0 0x1502 +#define MT6357_BUCK_VCORE_DSN_DBI 0x1504 +#define MT6357_BUCK_VCORE_DSN_DXI 0x1506 +#define MT6357_BUCK_VCORE_CON0 0x1508 +#define MT6357_BUCK_VCORE_CON1 0x150a +#define MT6357_BUCK_VCORE_CFG0 0x150c +#define MT6357_BUCK_VCORE_CFG1 0x150e +#define MT6357_BUCK_VCORE_OP_EN 0x1510 +#define MT6357_BUCK_VCORE_OP_EN_SET 0x1512 +#define MT6357_BUCK_VCORE_OP_EN_CLR 0x1514 +#define MT6357_BUCK_VCORE_OP_CFG 0x1516 +#define MT6357_BUCK_VCORE_OP_CFG_SET 0x1518 +#define MT6357_BUCK_VCORE_OP_CFG_CLR 0x151a +#define MT6357_BUCK_VCORE_SP_CON 0x151c +#define MT6357_BUCK_VCORE_SP_CFG 0x151e +#define MT6357_BUCK_VCORE_OC_CFG 0x1520 +#define MT6357_BUCK_VCORE_DBG0 0x1522 +#define MT6357_BUCK_VCORE_DBG1 0x1524 +#define MT6357_BUCK_VCORE_DBG2 0x1526 +#define MT6357_BUCK_VCORE_ELR_NUM 0x1528 +#define MT6357_BUCK_VCORE_ELR0 0x152a +#define MT6357_BUCK_VMODEM_DSN_ID 0x1580 +#define MT6357_BUCK_VMODEM_DSN_REV0 0x1582 +#define MT6357_BUCK_VMODEM_DSN_DBI 0x1584 +#define MT6357_BUCK_VMODEM_DSN_DXI 0x1586 +#define MT6357_BUCK_VMODEM_CON0 0x1588 +#define MT6357_BUCK_VMODEM_CON1 0x158a +#define MT6357_BUCK_VMODEM_CFG0 0x158c +#define MT6357_BUCK_VMODEM_CFG1 0x158e +#define MT6357_BUCK_VMODEM_OP_EN 0x1590 +#define MT6357_BUCK_VMODEM_OP_EN_SET 0x1592 +#define MT6357_BUCK_VMODEM_OP_EN_CLR 0x1594 +#define MT6357_BUCK_VMODEM_OP_CFG 0x1596 +#define MT6357_BUCK_VMODEM_OP_CFG_SET 0x1598 +#define MT6357_BUCK_VMODEM_OP_CFG_CLR 0x159a +#define MT6357_BUCK_VMODEM_SP_CON 0x159c +#define MT6357_BUCK_VMODEM_SP_CFG 0x159e +#define MT6357_BUCK_VMODEM_OC_CFG 0x15a0 +#define MT6357_BUCK_VMODEM_DBG0 0x15a2 +#define MT6357_BUCK_VMODEM_DBG1 0x15a4 +#define MT6357_BUCK_VMODEM_DBG2 0x15a6 +#define MT6357_BUCK_VMODEM_ELR_NUM 0x15a8 +#define MT6357_BUCK_VMODEM_ELR0 0x15aa +#define MT6357_BUCK_VS1_DSN_ID 0x1600 +#define MT6357_BUCK_VS1_DSN_REV0 0x1602 +#define MT6357_BUCK_VS1_DSN_DBI 0x1604 +#define MT6357_BUCK_VS1_DSN_DXI 0x1606 +#define MT6357_BUCK_VS1_CON0 0x1608 +#define MT6357_BUCK_VS1_CON1 0x160a +#define MT6357_BUCK_VS1_CFG0 0x160c +#define MT6357_BUCK_VS1_CFG1 0x160e +#define MT6357_BUCK_VS1_OP_EN 0x1610 +#define MT6357_BUCK_VS1_OP_EN_SET 0x1612 +#define MT6357_BUCK_VS1_OP_EN_CLR 0x1614 +#define MT6357_BUCK_VS1_OP_CFG 0x1616 +#define MT6357_BUCK_VS1_OP_CFG_SET 0x1618 +#define MT6357_BUCK_VS1_OP_CFG_CLR 0x161a +#define MT6357_BUCK_VS1_SP_CON 0x161c +#define MT6357_BUCK_VS1_SP_CFG 0x161e +#define MT6357_BUCK_VS1_OC_CFG 0x1620 +#define MT6357_BUCK_VS1_DBG0 0x1622 +#define MT6357_BUCK_VS1_DBG1 0x1624 +#define MT6357_BUCK_VS1_DBG2 0x1626 +#define MT6357_BUCK_VS1_VOTER 0x1628 +#define MT6357_BUCK_VS1_VOTER_SET 0x162a +#define MT6357_BUCK_VS1_VOTER_CLR 0x162c +#define MT6357_BUCK_VS1_VOTER_CFG 0x162e +#define MT6357_BUCK_VS1_ELR_NUM 0x1630 +#define MT6357_BUCK_VS1_ELR0 0x1632 +#define MT6357_BUCK_VPA_DSN_ID 0x1680 +#define MT6357_BUCK_VPA_DSN_REV0 0x1682 +#define MT6357_BUCK_VPA_DSN_DBI 0x1684 +#define MT6357_BUCK_VPA_DSN_DXI 0x1686 +#define MT6357_BUCK_VPA_CON0 0x1688 +#define MT6357_BUCK_VPA_CON1 0x168a +#define MT6357_BUCK_VPA_CFG0 0x168c +#define MT6357_BUCK_VPA_CFG1 0x168e +#define MT6357_BUCK_VPA_OC_CFG 0x1690 +#define MT6357_BUCK_VPA_DBG0 0x1692 +#define MT6357_BUCK_VPA_DBG1 0x1694 +#define MT6357_BUCK_VPA_DBG2 0x1696 +#define MT6357_BUCK_VPA_DLC_CON0 0x1698 +#define MT6357_BUCK_VPA_DLC_CON1 0x169a +#define MT6357_BUCK_VPA_DLC_CON2 0x169c +#define MT6357_BUCK_VPA_MSFG_CON0 0x169e +#define MT6357_BUCK_VPA_MSFG_CON1 0x16a0 +#define MT6357_BUCK_VPA_MSFG_RRATE0 0x16a2 +#define MT6357_BUCK_VPA_MSFG_RRATE1 0x16a4 +#define MT6357_BUCK_VPA_MSFG_RRATE2 0x16a6 +#define MT6357_BUCK_VPA_MSFG_RTHD0 0x16a8 +#define MT6357_BUCK_VPA_MSFG_RTHD1 0x16aa +#define MT6357_BUCK_VPA_MSFG_RTHD2 0x16ac +#define MT6357_BUCK_VPA_MSFG_FRATE0 0x16ae +#define MT6357_BUCK_VPA_MSFG_FRATE1 0x16b0 +#define MT6357_BUCK_VPA_MSFG_FRATE2 0x16b2 +#define MT6357_BUCK_VPA_MSFG_FTHD0 0x16b4 +#define MT6357_BUCK_VPA_MSFG_FTHD1 0x16b6 +#define MT6357_BUCK_VPA_MSFG_FTHD2 0x16b8 +#define MT6357_BUCK_ANA_DSN_ID 0x1700 +#define MT6357_BUCK_ANA_DSN_REV0 0x1702 +#define MT6357_BUCK_ANA_DSN_DBI 0x1704 +#define MT6357_BUCK_ANA_DSN_FPI 0x1706 +#define MT6357_SMPS_ANA_CON0 0x1708 +#define MT6357_SMPS_ANA_CON1 0x170a +#define MT6357_SMPS_ANA_CON2 0x170c +#define MT6357_VCORE_VPROC_ANA_CON0 0x170e +#define MT6357_VCORE_VPROC_ANA_CON1 0x1710 +#define MT6357_VCORE_VPROC_ANA_CON2 0x1712 +#define MT6357_VCORE_VPROC_ANA_CON3 0x1714 +#define MT6357_VCORE_VPROC_ANA_CON4 0x1716 +#define MT6357_VCORE_VPROC_ANA_CON5 0x1718 +#define MT6357_VCORE_VPROC_ANA_CON6 0x171a +#define MT6357_VCORE_VPROC_ANA_CON7 0x171c +#define MT6357_VCORE_VPROC_ANA_CON8 0x171e +#define MT6357_VCORE_VPROC_ANA_CON9 0x1720 +#define MT6357_VCORE_VPROC_ANA_CON10 0x1722 +#define MT6357_VCORE_VPROC_ANA_CON11 0x1724 +#define MT6357_VMODEM_ANA_CON0 0x1726 +#define MT6357_VMODEM_ANA_CON1 0x1728 +#define MT6357_VMODEM_ANA_CON2 0x172a +#define MT6357_VMODEM_ANA_CON3 0x172c +#define MT6357_VMODEM_ANA_CON4 0x172e +#define MT6357_VMODEM_ANA_CON5 0x1730 +#define MT6357_VS1_ANA_CON0 0x1732 +#define MT6357_VS1_ANA_CON1 0x1734 +#define MT6357_VS1_ANA_CON2 0x1736 +#define MT6357_VS1_ANA_CON3 0x1738 +#define MT6357_VS1_ANA_CON4 0x173a +#define MT6357_VS1_ANA_CON5 0x173c +#define MT6357_VPA_ANA_CON0 0x173e +#define MT6357_VPA_ANA_CON1 0x1740 +#define MT6357_VPA_ANA_CON2 0x1742 +#define MT6357_VPA_ANA_CON3 0x1744 +#define MT6357_VPA_ANA_CON4 0x1746 +#define MT6357_VPA_ANA_CON5 0x1748 +#define MT6357_BUCK_ANA_ELR_NUM 0x174a +#define MT6357_SMPS_ELR_0 0x174c +#define MT6357_SMPS_ELR_1 0x174e +#define MT6357_SMPS_ELR_2 0x1750 +#define MT6357_SMPS_ELR_3 0x1752 +#define MT6357_SMPS_ELR_4 0x1754 +#define MT6357_SMPS_ELR_5 0x1756 +#define MT6357_VCORE_VPROC_ELR_0 0x1758 +#define MT6357_VCORE_VPROC_ELR_1 0x175a +#define MT6357_VCORE_VPROC_ELR_2 0x175c +#define MT6357_VCORE_VPROC_ELR_3 0x175e +#define MT6357_VCORE_VPROC_ELR_4 0x1760 +#define MT6357_VMODEM_ELR_0 0x1762 +#define MT6357_VMODEM_ELR_1 0x1764 +#define MT6357_VMODEM_ELR_2 0x1766 +#define MT6357_VS1_ELR_0 0x1768 +#define MT6357_VS1_ELR_1 0x176a +#define MT6357_VPA_ELR_0 0x176c +#define MT6357_LDO_TOP_ID 0x1880 +#define MT6357_LDO_TOP_REV0 0x1882 +#define MT6357_LDO_TOP_DBI 0x1884 +#define MT6357_LDO_TOP_DXI 0x1886 +#define MT6357_LDO_TPM0 0x1888 +#define MT6357_LDO_TPM1 0x188a +#define MT6357_LDO_TOP_CLK_DCM_CON0 0x188c +#define MT6357_LDO_TOP_CLK_VIO28_CON0 0x188e +#define MT6357_LDO_TOP_CLK_VIO18_CON0 0x1890 +#define MT6357_LDO_TOP_CLK_VAUD28_CON0 0x1892 +#define MT6357_LDO_TOP_CLK_VDRAM_CON0 0x1894 +#define MT6357_LDO_TOP_CLK_VSRAM_PROC_CON0 0x1896 +#define MT6357_LDO_TOP_CLK_VSRAM_OTHERS_CON0 0x1898 +#define MT6357_LDO_TOP_CLK_VAUX18_CON0 0x189a +#define MT6357_LDO_TOP_CLK_VUSB33_CON0 0x189c +#define MT6357_LDO_TOP_CLK_VEMC_CON0 0x189e +#define MT6357_LDO_TOP_CLK_VXO22_CON0 0x18a0 +#define MT6357_LDO_TOP_CLK_VSIM1_CON0 0x18a2 +#define MT6357_LDO_TOP_CLK_VSIM2_CON0 0x18a4 +#define MT6357_LDO_TOP_CLK_VCAMD_CON0 0x18a6 +#define MT6357_LDO_TOP_CLK_VCAMIO_CON0 0x18a8 +#define MT6357_LDO_TOP_CLK_VEFUSE_CON0 0x18aa +#define MT6357_LDO_TOP_CLK_VCN33_CON0 0x18ac +#define MT6357_LDO_TOP_CLK_VCN18_CON0 0x18ae +#define MT6357_LDO_TOP_CLK_VCN28_CON0 0x18b0 +#define MT6357_LDO_TOP_CLK_VIBR_CON0 0x18b2 +#define MT6357_LDO_TOP_CLK_VFE28_CON0 0x18b4 +#define MT6357_LDO_TOP_CLK_VMCH_CON0 0x18b6 +#define MT6357_LDO_TOP_CLK_VMC_CON0 0x18b8 +#define MT6357_LDO_TOP_CLK_VRF18_CON0 0x18ba +#define MT6357_LDO_TOP_CLK_VLDO28_CON0 0x18bc +#define MT6357_LDO_TOP_CLK_VRF12_CON0 0x18be +#define MT6357_LDO_TOP_CLK_VCAMA_CON0 0x18c0 +#define MT6357_LDO_TOP_CLK_TREF_CON0 0x18c2 +#define MT6357_LDO_TOP_INT_CON0 0x18c4 +#define MT6357_LDO_TOP_INT_CON0_SET 0x18c6 +#define MT6357_LDO_TOP_INT_CON0_CLR 0x18c8 +#define MT6357_LDO_TOP_INT_CON1 0x18ca +#define MT6357_LDO_TOP_INT_CON1_SET 0x18cc +#define MT6357_LDO_TOP_INT_CON1_CLR 0x18ce +#define MT6357_LDO_TOP_INT_MASK_CON0 0x18d0 +#define MT6357_LDO_TOP_INT_MASK_CON0_SET 0x18d2 +#define MT6357_LDO_TOP_INT_MASK_CON0_CLR 0x18d4 +#define MT6357_LDO_TOP_INT_MASK_CON1 0x18d6 +#define MT6357_LDO_TOP_INT_MASK_CON1_SET 0x18d8 +#define MT6357_LDO_TOP_INT_MASK_CON1_CLR 0x18da +#define MT6357_LDO_TOP_INT_STATUS0 0x18dc +#define MT6357_LDO_TOP_INT_STATUS1 0x18de +#define MT6357_LDO_TOP_INT_RAW_STATUS0 0x18e0 +#define MT6357_LDO_TOP_INT_RAW_STATUS1 0x18e2 +#define MT6357_LDO_TEST_CON0 0x18e4 +#define MT6357_LDO_TOP_WDT_CON0 0x18e6 +#define MT6357_LDO_TOP_RSV_CON0 0x18e8 +#define MT6357_LDO_TOP_RSV_CON1 0x18ea +#define MT6357_LDO_OCFB0 0x18ec +#define MT6357_LDO_LP_PROTECTION 0x18ee +#define MT6357_LDO_DUMMY_LOAD_GATED 0x18f0 +#define MT6357_LDO_GON0_DSN_ID 0x1900 +#define MT6357_LDO_GON0_DSN_REV0 0x1902 +#define MT6357_LDO_GON0_DSN_DBI 0x1904 +#define MT6357_LDO_GON0_DSN_DXI 0x1906 +#define MT6357_LDO_VXO22_CON0 0x1908 +#define MT6357_LDO_VXO22_OP_EN 0x190a +#define MT6357_LDO_VXO22_OP_EN_SET 0x190c +#define MT6357_LDO_VXO22_OP_EN_CLR 0x190e +#define MT6357_LDO_VXO22_OP_CFG 0x1910 +#define MT6357_LDO_VXO22_OP_CFG_SET 0x1912 +#define MT6357_LDO_VXO22_OP_CFG_CLR 0x1914 +#define MT6357_LDO_VXO22_CON1 0x1916 +#define MT6357_LDO_VXO22_CON2 0x1918 +#define MT6357_LDO_VXO22_CON3 0x191a +#define MT6357_LDO_VAUX18_CON0 0x191c +#define MT6357_LDO_VAUX18_OP_EN 0x191e +#define MT6357_LDO_VAUX18_OP_EN_SET 0x1920 +#define MT6357_LDO_VAUX18_OP_EN_CLR 0x1922 +#define MT6357_LDO_VAUX18_OP_CFG 0x1924 +#define MT6357_LDO_VAUX18_OP_CFG_SET 0x1926 +#define MT6357_LDO_VAUX18_OP_CFG_CLR 0x1928 +#define MT6357_LDO_VAUX18_CON1 0x192a +#define MT6357_LDO_VAUX18_CON2 0x192c +#define MT6357_LDO_VAUX18_CON3 0x192e +#define MT6357_LDO_VAUD28_CON0 0x1930 +#define MT6357_LDO_VAUD28_OP_EN 0x1932 +#define MT6357_LDO_VAUD28_OP_EN_SET 0x1934 +#define MT6357_LDO_VAUD28_OP_EN_CLR 0x1936 +#define MT6357_LDO_VAUD28_OP_CFG 0x1938 +#define MT6357_LDO_VAUD28_OP_CFG_SET 0x193a +#define MT6357_LDO_VAUD28_OP_CFG_CLR 0x193c +#define MT6357_LDO_VAUD28_CON1 0x193e +#define MT6357_LDO_VAUD28_CON2 0x1940 +#define MT6357_LDO_VAUD28_CON3 0x1942 +#define MT6357_LDO_VIO28_CON0 0x1944 +#define MT6357_LDO_VIO28_OP_EN 0x1946 +#define MT6357_LDO_VIO28_OP_EN_SET 0x1948 +#define MT6357_LDO_VIO28_OP_EN_CLR 0x194a +#define MT6357_LDO_VIO28_OP_CFG 0x194c +#define MT6357_LDO_VIO28_OP_CFG_SET 0x194e +#define MT6357_LDO_VIO28_OP_CFG_CLR 0x1950 +#define MT6357_LDO_VIO28_CON1 0x1952 +#define MT6357_LDO_VIO28_CON2 0x1954 +#define MT6357_LDO_VIO28_CON3 0x1956 +#define MT6357_LDO_VIO18_CON0 0x1958 +#define MT6357_LDO_VIO18_OP_EN 0x195a +#define MT6357_LDO_VIO18_OP_EN_SET 0x195c +#define MT6357_LDO_VIO18_OP_EN_CLR 0x195e +#define MT6357_LDO_VIO18_OP_CFG 0x1960 +#define MT6357_LDO_VIO18_OP_CFG_SET 0x1962 +#define MT6357_LDO_VIO18_OP_CFG_CLR 0x1964 +#define MT6357_LDO_VIO18_CON1 0x1966 +#define MT6357_LDO_VIO18_CON2 0x1968 +#define MT6357_LDO_VIO18_CON3 0x196a +#define MT6357_LDO_VDRAM_CON0 0x196c +#define MT6357_LDO_VDRAM_OP_EN 0x196e +#define MT6357_LDO_VDRAM_OP_EN_SET 0x1970 +#define MT6357_LDO_VDRAM_OP_EN_CLR 0x1972 +#define MT6357_LDO_VDRAM_OP_CFG 0x1974 +#define MT6357_LDO_VDRAM_OP_CFG_SET 0x1976 +#define MT6357_LDO_VDRAM_OP_CFG_CLR 0x1978 +#define MT6357_LDO_VDRAM_CON1 0x197a +#define MT6357_LDO_VDRAM_CON2 0x197c +#define MT6357_LDO_VDRAM_CON3 0x197e +#define MT6357_LDO_GON1_DSN_ID 0x1980 +#define MT6357_LDO_GON1_DSN_REV0 0x1982 +#define MT6357_LDO_GON1_DSN_DBI 0x1984 +#define MT6357_LDO_GON1_DSN_DXI 0x1986 +#define MT6357_LDO_VEMC_CON0 0x1988 +#define MT6357_LDO_VEMC_OP_EN 0x198a +#define MT6357_LDO_VEMC_OP_EN_SET 0x198c +#define MT6357_LDO_VEMC_OP_EN_CLR 0x198e +#define MT6357_LDO_VEMC_OP_CFG 0x1990 +#define MT6357_LDO_VEMC_OP_CFG_SET 0x1992 +#define MT6357_LDO_VEMC_OP_CFG_CLR 0x1994 +#define MT6357_LDO_VEMC_CON1 0x1996 +#define MT6357_LDO_VEMC_CON2 0x1998 +#define MT6357_LDO_VEMC_CON3 0x199a +#define MT6357_LDO_VUSB33_CON0_0 0x199c +#define MT6357_LDO_VUSB33_OP_EN 0x199e +#define MT6357_LDO_VUSB33_OP_EN_SET 0x19a0 +#define MT6357_LDO_VUSB33_OP_EN_CLR 0x19a2 +#define MT6357_LDO_VUSB33_OP_CFG 0x19a4 +#define MT6357_LDO_VUSB33_OP_CFG_SET 0x19a6 +#define MT6357_LDO_VUSB33_OP_CFG_CLR 0x19a8 +#define MT6357_LDO_VUSB33_CON0_1 0x19aa +#define MT6357_LDO_VUSB33_CON1 0x19ac +#define MT6357_LDO_VUSB33_CON2 0x19ae +#define MT6357_LDO_VUSB33_CON3 0x19b0 +#define MT6357_LDO_VSRAM_PROC_CON0 0x19b2 +#define MT6357_LDO_VSRAM_PROC_CON2 0x19b4 +#define MT6357_LDO_VSRAM_PROC_CFG0 0x19b6 +#define MT6357_LDO_VSRAM_PROC_CFG1 0x19b8 +#define MT6357_LDO_VSRAM_PROC_OP_EN 0x19ba +#define MT6357_LDO_VSRAM_PROC_OP_EN_SET 0x19bc +#define MT6357_LDO_VSRAM_PROC_OP_EN_CLR 0x19be +#define MT6357_LDO_VSRAM_PROC_OP_CFG 0x19c0 +#define MT6357_LDO_VSRAM_PROC_OP_CFG_SET 0x19c2 +#define MT6357_LDO_VSRAM_PROC_OP_CFG_CLR 0x19c4 +#define MT6357_LDO_VSRAM_PROC_CON3 0x19c6 +#define MT6357_LDO_VSRAM_PROC_CON4 0x19c8 +#define MT6357_LDO_VSRAM_PROC_CON5 0x19ca +#define MT6357_LDO_VSRAM_PROC_DBG0 0x19cc +#define MT6357_LDO_VSRAM_PROC_DBG1 0x19ce +#define MT6357_LDO_VSRAM_OTHERS_CON0 0x19d0 +#define MT6357_LDO_VSRAM_OTHERS_CON2 0x19d2 +#define MT6357_LDO_VSRAM_OTHERS_CFG0 0x19d4 +#define MT6357_LDO_VSRAM_OTHERS_CFG1 0x19d6 +#define MT6357_LDO_VSRAM_OTHERS_OP_EN 0x19d8 +#define MT6357_LDO_VSRAM_OTHERS_OP_EN_SET 0x19da +#define MT6357_LDO_VSRAM_OTHERS_OP_EN_CLR 0x19dc +#define MT6357_LDO_VSRAM_OTHERS_OP_CFG 0x19de +#define MT6357_LDO_VSRAM_OTHERS_OP_CFG_SET 0x19e0 +#define MT6357_LDO_VSRAM_OTHERS_OP_CFG_CLR 0x19e2 +#define MT6357_LDO_VSRAM_OTHERS_CON3 0x19e4 +#define MT6357_LDO_VSRAM_OTHERS_CON4 0x19e6 +#define MT6357_LDO_VSRAM_OTHERS_CON5 0x19e8 +#define MT6357_LDO_VSRAM_OTHERS_DBG0 0x19ea +#define MT6357_LDO_VSRAM_OTHERS_DBG1 0x19ec +#define MT6357_LDO_VSRAM_PROC_SP 0x19ee +#define MT6357_LDO_VSRAM_OTHERS_SP 0x19f0 +#define MT6357_LDO_VSRAM_PROC_R2R_PDN_DIS 0x19f2 +#define MT6357_LDO_VSRAM_OTHERS_R2R_PDN_DIS 0x19f4 +#define MT6357_LDO_VSRAM_WDT_DBG0 0x19f6 +#define MT6357_LDO_GON1_ELR_NUM 0x19f8 +#define MT6357_LDO_VSRAM_CON0 0x19fa +#define MT6357_LDO_VSRAM_CON1 0x19fc +#define MT6357_LDO_VSRAM_CON2 0x19fe +#define MT6357_LDO_GOFF0_DSN_ID 0x1a00 +#define MT6357_LDO_GOFF0_DSN_REV0 0x1a02 +#define MT6357_LDO_GOFF0_DSN_DBI 0x1a04 +#define MT6357_LDO_GOFF0_DSN_DXI 0x1a06 +#define MT6357_LDO_VFE28_CON0 0x1a08 +#define MT6357_LDO_VFE28_OP_EN 0x1a0a +#define MT6357_LDO_VFE28_OP_EN_SET 0x1a0c +#define MT6357_LDO_VFE28_OP_EN_CLR 0x1a0e +#define MT6357_LDO_VFE28_OP_CFG 0x1a10 +#define MT6357_LDO_VFE28_OP_CFG_SET 0x1a12 +#define MT6357_LDO_VFE28_OP_CFG_CLR 0x1a14 +#define MT6357_LDO_VFE28_CON1 0x1a16 +#define MT6357_LDO_VFE28_CON2 0x1a18 +#define MT6357_LDO_VFE28_CON3 0x1a1a +#define MT6357_LDO_VRF18_CON0 0x1a1c +#define MT6357_LDO_VRF18_OP_EN 0x1a1e +#define MT6357_LDO_VRF18_OP_EN_SET 0x1a20 +#define MT6357_LDO_VRF18_OP_EN_CLR 0x1a22 +#define MT6357_LDO_VRF18_OP_CFG 0x1a24 +#define MT6357_LDO_VRF18_OP_CFG_SET 0x1a26 +#define MT6357_LDO_VRF18_OP_CFG_CLR 0x1a28 +#define MT6357_LDO_VRF18_CON1 0x1a2a +#define MT6357_LDO_VRF18_CON2 0x1a2c +#define MT6357_LDO_VRF18_CON3 0x1a2e +#define MT6357_LDO_VRF12_CON0 0x1a30 +#define MT6357_LDO_VRF12_OP_EN 0x1a32 +#define MT6357_LDO_VRF12_OP_EN_SET 0x1a34 +#define MT6357_LDO_VRF12_OP_EN_CLR 0x1a36 +#define MT6357_LDO_VRF12_OP_CFG 0x1a38 +#define MT6357_LDO_VRF12_OP_CFG_SET 0x1a3a +#define MT6357_LDO_VRF12_OP_CFG_CLR 0x1a3c +#define MT6357_LDO_VRF12_CON1 0x1a3e +#define MT6357_LDO_VRF12_CON2 0x1a40 +#define MT6357_LDO_VRF12_CON3 0x1a42 +#define MT6357_LDO_VEFUSE_CON0 0x1a44 +#define MT6357_LDO_VEFUSE_OP_EN 0x1a46 +#define MT6357_LDO_VEFUSE_OP_EN_SET 0x1a48 +#define MT6357_LDO_VEFUSE_OP_EN_CLR 0x1a4a +#define MT6357_LDO_VEFUSE_OP_CFG 0x1a4c +#define MT6357_LDO_VEFUSE_OP_CFG_SET 0x1a4e +#define MT6357_LDO_VEFUSE_OP_CFG_CLR 0x1a50 +#define MT6357_LDO_VEFUSE_CON1 0x1a52 +#define MT6357_LDO_VEFUSE_CON2 0x1a54 +#define MT6357_LDO_VEFUSE_CON3 0x1a56 +#define MT6357_LDO_VCN18_CON0 0x1a58 +#define MT6357_LDO_VCN18_OP_EN 0x1a5a +#define MT6357_LDO_VCN18_OP_EN_SET 0x1a5c +#define MT6357_LDO_VCN18_OP_EN_CLR 0x1a5e +#define MT6357_LDO_VCN18_OP_CFG 0x1a60 +#define MT6357_LDO_VCN18_OP_CFG_SET 0x1a62 +#define MT6357_LDO_VCN18_OP_CFG_CLR 0x1a64 +#define MT6357_LDO_VCN18_CON1 0x1a66 +#define MT6357_LDO_VCN18_CON2 0x1a68 +#define MT6357_LDO_VCN18_CON3 0x1a6a +#define MT6357_LDO_VCAMA_CON0 0x1a6c +#define MT6357_LDO_VCAMA_OP_EN 0x1a6e +#define MT6357_LDO_VCAMA_OP_EN_SET 0x1a70 +#define MT6357_LDO_VCAMA_OP_EN_CLR 0x1a72 +#define MT6357_LDO_VCAMA_OP_CFG 0x1a74 +#define MT6357_LDO_VCAMA_OP_CFG_SET 0x1a76 +#define MT6357_LDO_VCAMA_OP_CFG_CLR 0x1a78 +#define MT6357_LDO_VCAMA_CON1 0x1a7a +#define MT6357_LDO_VCAMA_CON2 0x1a7c +#define MT6357_LDO_VCAMA_CON3 0x1a7e +#define MT6357_LDO_GOFF1_DSN_ID 0x1a80 +#define MT6357_LDO_GOFF1_DSN_REV0 0x1a82 +#define MT6357_LDO_GOFF1_DSN_DBI 0x1a84 +#define MT6357_LDO_GOFF1_DSN_DXI 0x1a86 +#define MT6357_LDO_VCAMD_CON0 0x1a88 +#define MT6357_LDO_VCAMD_OP_EN 0x1a8a +#define MT6357_LDO_VCAMD_OP_EN_SET 0x1a8c +#define MT6357_LDO_VCAMD_OP_EN_CLR 0x1a8e +#define MT6357_LDO_VCAMD_OP_CFG 0x1a90 +#define MT6357_LDO_VCAMD_OP_CFG_SET 0x1a92 +#define MT6357_LDO_VCAMD_OP_CFG_CLR 0x1a94 +#define MT6357_LDO_VCAMD_CON1 0x1a96 +#define MT6357_LDO_VCAMD_CON2 0x1a98 +#define MT6357_LDO_VCAMD_CON3 0x1a9a +#define MT6357_LDO_VCAMIO_CON0 0x1a9c +#define MT6357_LDO_VCAMIO_OP_EN 0x1a9e +#define MT6357_LDO_VCAMIO_OP_EN_SET 0x1aa0 +#define MT6357_LDO_VCAMIO_OP_EN_CLR 0x1aa2 +#define MT6357_LDO_VCAMIO_OP_CFG 0x1aa4 +#define MT6357_LDO_VCAMIO_OP_CFG_SET 0x1aa6 +#define MT6357_LDO_VCAMIO_OP_CFG_CLR 0x1aa8 +#define MT6357_LDO_VCAMIO_CON1 0x1aaa +#define MT6357_LDO_VCAMIO_CON2 0x1aac +#define MT6357_LDO_VCAMIO_CON3 0x1aae +#define MT6357_LDO_VMC_CON0 0x1ab0 +#define MT6357_LDO_VMC_OP_EN 0x1ab2 +#define MT6357_LDO_VMC_OP_EN_SET 0x1ab4 +#define MT6357_LDO_VMC_OP_EN_CLR 0x1ab6 +#define MT6357_LDO_VMC_OP_CFG 0x1ab8 +#define MT6357_LDO_VMC_OP_CFG_SET 0x1aba +#define MT6357_LDO_VMC_OP_CFG_CLR 0x1abc +#define MT6357_LDO_VMC_CON1 0x1abe +#define MT6357_LDO_VMC_CON2 0x1ac0 +#define MT6357_LDO_VMC_CON3 0x1ac2 +#define MT6357_LDO_VMCH_CON0 0x1ac4 +#define MT6357_LDO_VMCH_OP_EN 0x1ac6 +#define MT6357_LDO_VMCH_OP_EN_SET 0x1ac8 +#define MT6357_LDO_VMCH_OP_EN_CLR 0x1aca +#define MT6357_LDO_VMCH_OP_CFG 0x1acc +#define MT6357_LDO_VMCH_OP_CFG_SET 0x1ace +#define MT6357_LDO_VMCH_OP_CFG_CLR 0x1ad0 +#define MT6357_LDO_VMCH_CON1 0x1ad2 +#define MT6357_LDO_VMCH_CON2 0x1ad4 +#define MT6357_LDO_VMCH_CON3 0x1ad6 +#define MT6357_LDO_VSIM1_CON0 0x1ad8 +#define MT6357_LDO_VSIM1_OP_EN 0x1ada +#define MT6357_LDO_VSIM1_OP_EN_SET 0x1adc +#define MT6357_LDO_VSIM1_OP_EN_CLR 0x1ade +#define MT6357_LDO_VSIM1_OP_CFG 0x1ae0 +#define MT6357_LDO_VSIM1_OP_CFG_SET 0x1ae2 +#define MT6357_LDO_VSIM1_OP_CFG_CLR 0x1ae4 +#define MT6357_LDO_VSIM1_CON1 0x1ae6 +#define MT6357_LDO_VSIM1_CON2 0x1ae8 +#define MT6357_LDO_VSIM1_CON3 0x1aea +#define MT6357_LDO_VSIM2_CON0 0x1aec +#define MT6357_LDO_VSIM2_OP_EN 0x1aee +#define MT6357_LDO_VSIM2_OP_EN_SET 0x1af0 +#define MT6357_LDO_VSIM2_OP_EN_CLR 0x1af2 +#define MT6357_LDO_VSIM2_OP_CFG 0x1af4 +#define MT6357_LDO_VSIM2_OP_CFG_SET 0x1af6 +#define MT6357_LDO_VSIM2_OP_CFG_CLR 0x1af8 +#define MT6357_LDO_VSIM2_CON1 0x1afa +#define MT6357_LDO_VSIM2_CON2 0x1afc +#define MT6357_LDO_VSIM2_CON3 0x1afe +#define MT6357_LDO_GOFF2_DSN_ID 0x1b00 +#define MT6357_LDO_GOFF2_DSN_REV0 0x1b02 +#define MT6357_LDO_GOFF2_DSN_DBI 0x1b04 +#define MT6357_LDO_GOFF2_DSN_DXI 0x1b06 +#define MT6357_LDO_VIBR_CON0 0x1b08 +#define MT6357_LDO_VIBR_OP_EN 0x1b0a +#define MT6357_LDO_VIBR_OP_EN_SET 0x1b0c +#define MT6357_LDO_VIBR_OP_EN_CLR 0x1b0e +#define MT6357_LDO_VIBR_OP_CFG 0x1b10 +#define MT6357_LDO_VIBR_OP_CFG_SET 0x1b12 +#define MT6357_LDO_VIBR_OP_CFG_CLR 0x1b14 +#define MT6357_LDO_VIBR_CON1 0x1b16 +#define MT6357_LDO_VIBR_CON2 0x1b18 +#define MT6357_LDO_VIBR_CON3 0x1b1a +#define MT6357_LDO_VCN33_CON0_0 0x1b1c +#define MT6357_LDO_VCN33_OP_EN 0x1b1e +#define MT6357_LDO_VCN33_OP_EN_SET 0x1b20 +#define MT6357_LDO_VCN33_OP_EN_CLR 0x1b22 +#define MT6357_LDO_VCN33_OP_CFG 0x1b24 +#define MT6357_LDO_VCN33_OP_CFG_SET 0x1b26 +#define MT6357_LDO_VCN33_OP_CFG_CLR 0x1b28 +#define MT6357_LDO_VCN33_CON0_1 0x1b2a +#define MT6357_LDO_VCN33_CON1 0x1b2c +#define MT6357_LDO_VCN33_CON2 0x1b2e +#define MT6357_LDO_VCN33_CON3 0x1b30 +#define MT6357_LDO_VLDO28_CON0_0 0x1b32 +#define MT6357_LDO_VLDO28_OP_EN 0x1b34 +#define MT6357_LDO_VLDO28_OP_EN_SET 0x1b36 +#define MT6357_LDO_VLDO28_OP_EN_CLR 0x1b38 +#define MT6357_LDO_VLDO28_OP_CFG 0x1b3a +#define MT6357_LDO_VLDO28_OP_CFG_SET 0x1b3c +#define MT6357_LDO_VLDO28_OP_CFG_CLR 0x1b3e +#define MT6357_LDO_VLDO28_CON0_1 0x1b40 +#define MT6357_LDO_VLDO28_CON1 0x1b42 +#define MT6357_LDO_VLDO28_CON2 0x1b44 +#define MT6357_LDO_VLDO28_CON3 0x1b46 +#define MT6357_LDO_GOFF2_RSV_CON0 0x1b48 +#define MT6357_LDO_GOFF2_RSV_CON1 0x1b4a +#define MT6357_LDO_GOFF3_DSN_ID 0x1b80 +#define MT6357_LDO_GOFF3_DSN_REV0 0x1b82 +#define MT6357_LDO_GOFF3_DSN_DBI 0x1b84 +#define MT6357_LDO_GOFF3_DSN_DXI 0x1b86 +#define MT6357_LDO_VCN28_CON0 0x1b88 +#define MT6357_LDO_VCN28_OP_EN 0x1b8a +#define MT6357_LDO_VCN28_OP_EN_SET 0x1b8c +#define MT6357_LDO_VCN28_OP_EN_CLR 0x1b8e +#define MT6357_LDO_VCN28_OP_CFG 0x1b90 +#define MT6357_LDO_VCN28_OP_CFG_SET 0x1b92 +#define MT6357_LDO_VCN28_OP_CFG_CLR 0x1b94 +#define MT6357_LDO_VCN28_CON1 0x1b96 +#define MT6357_LDO_VCN28_CON2 0x1b98 +#define MT6357_LDO_VCN28_CON3 0x1b9a +#define MT6357_VRTC_CON0 0x1b9c +#define MT6357_LDO_TREF_CON0 0x1b9e +#define MT6357_LDO_TREF_OP_EN 0x1ba0 +#define MT6357_LDO_TREF_OP_EN_SET 0x1ba2 +#define MT6357_LDO_TREF_OP_EN_CLR 0x1ba4 +#define MT6357_LDO_TREF_OP_CFG 0x1ba6 +#define MT6357_LDO_TREF_OP_CFG_SET 0x1ba8 +#define MT6357_LDO_TREF_OP_CFG_CLR 0x1baa +#define MT6357_LDO_TREF_CON1 0x1bac +#define MT6357_LDO_GOFF3_RSV_CON0 0x1bae +#define MT6357_LDO_GOFF3_RSV_CON1 0x1bb0 +#define MT6357_LDO_ANA0_DSN_ID 0x1c00 +#define MT6357_LDO_ANA0_DSN_REV0 0x1c02 +#define MT6357_LDO_ANA0_DSN_DBI 0x1c04 +#define MT6357_LDO_ANA0_DSN_DXI 0x1c06 +#define MT6357_VFE28_ANA_CON0 0x1c08 +#define MT6357_VFE28_ANA_CON1 0x1c0a +#define MT6357_VCN28_ANA_CON0 0x1c0c +#define MT6357_VCN28_ANA_CON1 0x1c0e +#define MT6357_VAUD28_ANA_CON0 0x1c10 +#define MT6357_VAUD28_ANA_CON1 0x1c12 +#define MT6357_VAUX18_ANA_CON0 0x1c14 +#define MT6357_VAUX18_ANA_CON1 0x1c16 +#define MT6357_VXO22_ANA_CON0 0x1c18 +#define MT6357_VXO22_ANA_CON1 0x1c1a +#define MT6357_VCN33_ANA_CON0 0x1c1c +#define MT6357_VCN33_ANA_CON1 0x1c1e +#define MT6357_VEMC_ANA_CON0 0x1c20 +#define MT6357_VEMC_ANA_CON1 0x1c22 +#define MT6357_VLDO28_ANA_CON0 0x1c24 +#define MT6357_VLDO28_ANA_CON1 0x1c26 +#define MT6357_VIO28_ANA_CON0 0x1c28 +#define MT6357_VIO28_ANA_CON1 0x1c2a +#define MT6357_VIBR_ANA_CON0 0x1c2c +#define MT6357_VIBR_ANA_CON1 0x1c2e +#define MT6357_VSIM1_ANA_CON0 0x1c30 +#define MT6357_VSIM1_ANA_CON1 0x1c32 +#define MT6357_VSIM2_ANA_CON0 0x1c34 +#define MT6357_VSIM2_ANA_CON1 0x1c36 +#define MT6357_VMCH_ANA_CON0 0x1c38 +#define MT6357_VMCH_ANA_CON1 0x1c3a +#define MT6357_VMC_ANA_CON0 0x1c3c +#define MT6357_VMC_ANA_CON1 0x1c3e +#define MT6357_VCAMIO_ANA_CON0 0x1c40 +#define MT6357_VCAMIO_ANA_CON1 0x1c42 +#define MT6357_VCN18_ANA_CON0 0x1c44 +#define MT6357_VCN18_ANA_CON1 0x1c46 +#define MT6357_VRF18_ANA_CON0 0x1c48 +#define MT6357_VRF18_ANA_CON1 0x1c4a +#define MT6357_VIO18_ANA_CON0 0x1c4c +#define MT6357_VIO18_ANA_CON1 0x1c4e +#define MT6357_VDRAM_ANA_CON1 0x1c50 +#define MT6357_VRF12_ANA_CON0 0x1c52 +#define MT6357_VRF12_ANA_CON1 0x1c54 +#define MT6357_VSRAM_PROC_ANA_CON0 0x1c56 +#define MT6357_VSRAM_OTHERS_ANA_CON0 0x1c58 +#define MT6357_LDO_ANA0_ELR_NUM 0x1c5a +#define MT6357_VFE28_ELR_0 0x1c5c +#define MT6357_VCN28_ELR_0 0x1c5e +#define MT6357_VAUD28_ELR_0 0x1c60 +#define MT6357_VAUX18_ELR_0 0x1c62 +#define MT6357_VXO22_ELR_0 0x1c64 +#define MT6357_VCN33_ELR_0 0x1c66 +#define MT6357_VEMC_ELR_0 0x1c68 +#define MT6357_VLDO28_ELR_0 0x1c6a +#define MT6357_VIO28_ELR_0 0x1c6c +#define MT6357_VIBR_ELR_0 0x1c6e +#define MT6357_VSIM1_ELR_0 0x1c70 +#define MT6357_VSIM2_ELR_0 0x1c72 +#define MT6357_VMCH_ELR_0 0x1c74 +#define MT6357_VMC_ELR_0 0x1c76 +#define MT6357_VCAMIO_ELR_0 0x1c78 +#define MT6357_VCN18_ELR_0 0x1c7a +#define MT6357_VRF18_ELR_0 0x1c7c +#define MT6357_LDO_ANA1_DSN_ID 0x1c80 +#define MT6357_LDO_ANA1_DSN_REV0 0x1c82 +#define MT6357_LDO_ANA1_DSN_DBI 0x1c84 +#define MT6357_LDO_ANA1_DSN_DXI 0x1c86 +#define MT6357_VUSB33_ANA_CON0 0x1c88 +#define MT6357_VUSB33_ANA_CON1 0x1c8a +#define MT6357_VCAMA_ANA_CON0 0x1c8c +#define MT6357_VCAMA_ANA_CON1 0x1c8e +#define MT6357_VEFUSE_ANA_CON0 0x1c90 +#define MT6357_VEFUSE_ANA_CON1 0x1c92 +#define MT6357_VCAMD_ANA_CON0 0x1c94 +#define MT6357_VCAMD_ANA_CON1 0x1c96 +#define MT6357_LDO_ANA1_ELR_NUM 0x1c98 +#define MT6357_VUSB33_ELR_0 0x1c9a +#define MT6357_VCAMA_ELR_0 0x1c9c +#define MT6357_VEFUSE_ELR_0 0x1c9e +#define MT6357_VCAMD_ELR_0 0x1ca0 +#define MT6357_VIO18_ELR_0 0x1ca2 +#define MT6357_VDRAM_ELR_0 0x1ca4 +#define MT6357_VRF12_ELR_0 0x1ca6 +#define MT6357_VRTC_ELR_0 0x1ca8 +#define MT6357_VDRAM_ELR_1 0x1caa +#define MT6357_VDRAM_ELR_2 0x1cac +#define MT6357_XPP_TOP_ID 0x1e00 +#define MT6357_XPP_TOP_REV0 0x1e02 +#define MT6357_XPP_TOP_DBI 0x1e04 +#define MT6357_XPP_TOP_DXI 0x1e06 +#define MT6357_XPP_TPM0 0x1e08 +#define MT6357_XPP_TPM1 0x1e0a +#define MT6357_XPP_TOP_TEST_OUT 0x1e0c +#define MT6357_XPP_TOP_TEST_CON0 0x1e0e +#define MT6357_XPP_TOP_CKPDN_CON0 0x1e10 +#define MT6357_XPP_TOP_CKPDN_CON0_SET 0x1e12 +#define MT6357_XPP_TOP_CKPDN_CON0_CLR 0x1e14 +#define MT6357_XPP_TOP_CKSEL_CON0 0x1e16 +#define MT6357_XPP_TOP_CKSEL_CON0_SET 0x1e18 +#define MT6357_XPP_TOP_CKSEL_CON0_CLR 0x1e1a +#define MT6357_XPP_TOP_RST_CON0 0x1e1c +#define MT6357_XPP_TOP_RST_CON0_SET 0x1e1e +#define MT6357_XPP_TOP_RST_CON0_CLR 0x1e20 +#define MT6357_XPP_TOP_RST_BANK_CON0 0x1e22 +#define MT6357_XPP_TOP_RST_BANK_CON0_SET 0x1e24 +#define MT6357_XPP_TOP_RST_BANK_CON0_CLR 0x1e26 +#define MT6357_DRIVER_BL_DSN_ID 0x1e80 +#define MT6357_DRIVER_BL_DSN_REV0 0x1e82 +#define MT6357_DRIVER_BL_DSN_DBI 0x1e84 +#define MT6357_DRIVER_BL_DSN_DXI 0x1e86 +#define MT6357_ISINK1_CON0 0x1e88 +#define MT6357_ISINK1_CON1 0x1e8a +#define MT6357_ISINK1_CON2 0x1e8c +#define MT6357_ISINK1_CON3 0x1e8e +#define MT6357_ISINK_ANA1 0x1e90 +#define MT6357_ISINK_PHASE_DLY 0x1e92 +#define MT6357_ISINK_SFSTR 0x1e94 +#define MT6357_ISINK_EN_CTRL 0x1e96 +#define MT6357_ISINK_MODE_CTRL 0x1e98 +#define MT6357_DRIVER_ANA_CON0 0x1e9a +#define MT6357_ISINK_ANA_CON0 0x1e9c +#define MT6357_ISINK_ANA_CON1 0x1e9e +#define MT6357_DRIVER_BL_ELR_NUM 0x1ea0 +#define MT6357_DRIVER_BL_ELR_0 0x1ea2 +#define MT6357_DRIVER_CI_DSN_ID 0x1f00 +#define MT6357_DRIVER_CI_DSN_REV0 0x1f02 +#define MT6357_DRIVER_CI_DSN_DBI 0x1f04 +#define MT6357_DRIVER_CI_DSN_DXI 0x1f06 +#define MT6357_CHRIND_CON0 0x1f08 +#define MT6357_CHRIND_CON1 0x1f0a +#define MT6357_CHRIND_CON2 0x1f0c +#define MT6357_CHRIND_CON3 0x1f0e +#define MT6357_CHRIND_CON4 0x1f10 +#define MT6357_CHRIND_EN_CTRL 0x1f12 +#define MT6357_CHRIND_ANA_CON0 0x1f14 +#define MT6357_DRIVER_DL_DSN_ID 0x1f80 +#define MT6357_DRIVER_DL_DSN_REV0 0x1f82 +#define MT6357_DRIVER_DL_DSN_DBI 0x1f84 +#define MT6357_DRIVER_DL_DSN_DXI 0x1f86 +#define MT6357_ISINK2_CON0 0x1f88 +#define MT6357_ISINK3_CON0 0x1f8a +#define MT6357_ISINK_EN_CTRL_SMPL 0x1f8c +#define MT6357_AUD_TOP_ID 0x2080 +#define MT6357_AUD_TOP_REV0 0x2082 +#define MT6357_AUD_TOP_DBI 0x2084 +#define MT6357_AUD_TOP_DXI 0x2086 +#define MT6357_AUD_TOP_CKPDN_TPM0 0x2088 +#define MT6357_AUD_TOP_CKPDN_TPM1 0x208a +#define MT6357_AUD_TOP_CKPDN_CON0 0x208c +#define MT6357_AUD_TOP_CKPDN_CON0_SET 0x208e +#define MT6357_AUD_TOP_CKPDN_CON0_CLR 0x2090 +#define MT6357_AUD_TOP_CKSEL_CON0 0x2092 +#define MT6357_AUD_TOP_CKSEL_CON0_SET 0x2094 +#define MT6357_AUD_TOP_CKSEL_CON0_CLR 0x2096 +#define MT6357_AUD_TOP_CKTST_CON0 0x2098 +#define MT6357_AUD_TOP_RST_CON0 0x209a +#define MT6357_AUD_TOP_RST_CON0_SET 0x209c +#define MT6357_AUD_TOP_RST_CON0_CLR 0x209e +#define MT6357_AUD_TOP_RST_BANK_CON0 0x20a0 +#define MT6357_AUD_TOP_INT_CON0 0x20a2 +#define MT6357_AUD_TOP_INT_CON0_SET 0x20a4 +#define MT6357_AUD_TOP_INT_CON0_CLR 0x20a6 +#define MT6357_AUD_TOP_INT_MASK_CON0 0x20a8 +#define MT6357_AUD_TOP_INT_MASK_CON0_SET 0x20aa +#define MT6357_AUD_TOP_INT_MASK_CON0_CLR 0x20ac +#define MT6357_AUD_TOP_INT_STATUS0 0x20ae +#define MT6357_AUD_TOP_INT_RAW_STATUS0 0x20b0 +#define MT6357_AUD_TOP_INT_MISC_CON0 0x20b2 +#define MT6357_AUDNCP_CLKDIV_CON0 0x20b4 +#define MT6357_AUDNCP_CLKDIV_CON1 0x20b6 +#define MT6357_AUDNCP_CLKDIV_CON2 0x20b8 +#define MT6357_AUDNCP_CLKDIV_CON3 0x20ba +#define MT6357_AUDNCP_CLKDIV_CON4 0x20bc +#define MT6357_AUD_TOP_MON_CON0 0x20be +#define MT6357_AUDIO_DIG_DSN_ID 0x2100 +#define MT6357_AUDIO_DIG_DSN_REV0 0x2102 +#define MT6357_AUDIO_DIG_DSN_DBI 0x2104 +#define MT6357_AUDIO_DIG_DSN_DXI 0x2106 +#define MT6357_AFE_UL_DL_CON0 0x2108 +#define MT6357_AFE_DL_SRC2_CON0_L 0x210a +#define MT6357_AFE_UL_SRC_CON0_H 0x210c +#define MT6357_AFE_UL_SRC_CON0_L 0x210e +#define MT6357_AFE_TOP_CON0 0x2110 +#define MT6357_AUDIO_TOP_CON0 0x2112 +#define MT6357_AFE_MON_DEBUG0 0x2114 +#define MT6357_AFUNC_AUD_CON0 0x2116 +#define MT6357_AFUNC_AUD_CON1 0x2118 +#define MT6357_AFUNC_AUD_CON2 0x211a +#define MT6357_AFUNC_AUD_CON3 0x211c +#define MT6357_AFUNC_AUD_CON4 0x211e +#define MT6357_AFUNC_AUD_CON5 0x2120 +#define MT6357_AFUNC_AUD_CON6 0x2122 +#define MT6357_AFUNC_AUD_MON0 0x2124 +#define MT6357_AUDRC_TUNE_MON0 0x2126 +#define MT6357_AFE_ADDA_MTKAIF_FIFO_CFG0 0x2128 +#define MT6357_AFE_ADDA_MTKAIF_FIFO_LOG_MON1 0x212a +#define MT6357_AFE_ADDA_MTKAIF_MON0 0x212c +#define MT6357_AFE_ADDA_MTKAIF_MON1 0x212e +#define MT6357_AFE_ADDA_MTKAIF_MON2 0x2130 +#define MT6357_AFE_ADDA_MTKAIF_MON3 0x2132 +#define MT6357_AFE_ADDA_MTKAIF_CFG0 0x2134 +#define MT6357_AFE_ADDA_MTKAIF_RX_CFG0 0x2136 +#define MT6357_AFE_ADDA_MTKAIF_RX_CFG1 0x2138 +#define MT6357_AFE_ADDA_MTKAIF_RX_CFG2 0x213a +#define MT6357_AFE_ADDA_MTKAIF_RX_CFG3 0x213c +#define MT6357_AFE_ADDA_MTKAIF_TX_CFG1 0x213e +#define MT6357_AFE_SGEN_CFG0 0x2140 +#define MT6357_AFE_SGEN_CFG1 0x2142 +#define MT6357_AFE_ADC_ASYNC_FIFO_CFG 0x2144 +#define MT6357_AFE_DCCLK_CFG0 0x2146 +#define MT6357_AFE_DCCLK_CFG1 0x2148 +#define MT6357_AUDIO_DIG_CFG 0x214a +#define MT6357_AFE_AUD_PAD_TOP 0x214c +#define MT6357_AFE_AUD_PAD_TOP_MON 0x214e +#define MT6357_AFE_AUD_PAD_TOP_MON1 0x2150 +#define MT6357_AUDENC_DSN_ID 0x2180 +#define MT6357_AUDENC_DSN_REV0 0x2182 +#define MT6357_AUDENC_DSN_DBI 0x2184 +#define MT6357_AUDENC_DSN_FPI 0x2186 +#define MT6357_AUDENC_ANA_CON0 0x2188 +#define MT6357_AUDENC_ANA_CON1 0x218a +#define MT6357_AUDENC_ANA_CON2 0x218c +#define MT6357_AUDENC_ANA_CON3 0x218e +#define MT6357_AUDENC_ANA_CON4 0x2190 +#define MT6357_AUDENC_ANA_CON5 0x2192 +#define MT6357_AUDENC_ANA_CON6 0x2194 +#define MT6357_AUDENC_ANA_CON7 0x2196 +#define MT6357_AUDENC_ANA_CON8 0x2198 +#define MT6357_AUDENC_ANA_CON9 0x219a +#define MT6357_AUDENC_ANA_CON10 0x219c +#define MT6357_AUDENC_ANA_CON11 0x219e +#define MT6357_AUDDEC_DSN_ID 0x2200 +#define MT6357_AUDDEC_DSN_REV0 0x2202 +#define MT6357_AUDDEC_DSN_DBI 0x2204 +#define MT6357_AUDDEC_DSN_FPI 0x2206 +#define MT6357_AUDDEC_ANA_CON0 0x2208 +#define MT6357_AUDDEC_ANA_CON1 0x220a +#define MT6357_AUDDEC_ANA_CON2 0x220c +#define MT6357_AUDDEC_ANA_CON3 0x220e +#define MT6357_AUDDEC_ANA_CON4 0x2210 +#define MT6357_AUDDEC_ANA_CON5 0x2212 +#define MT6357_AUDDEC_ANA_CON6 0x2214 +#define MT6357_AUDDEC_ANA_CON7 0x2216 +#define MT6357_AUDDEC_ANA_CON8 0x2218 +#define MT6357_AUDDEC_ANA_CON9 0x221a +#define MT6357_AUDDEC_ANA_CON10 0x221c +#define MT6357_AUDDEC_ANA_CON11 0x221e +#define MT6357_AUDDEC_ANA_CON12 0x2220 +#define MT6357_AUDDEC_ANA_CON13 0x2222 +#define MT6357_AUDDEC_ELR_NUM 0x2224 +#define MT6357_AUDDEC_ELR_0 0x2226 +#define MT6357_AUDZCD_DSN_ID 0x2280 +#define MT6357_AUDZCD_DSN_REV0 0x2282 +#define MT6357_AUDZCD_DSN_DBI 0x2284 +#define MT6357_AUDZCD_DSN_FPI 0x2286 +#define MT6357_ZCD_CON0 0x2288 +#define MT6357_ZCD_CON1 0x228a +#define MT6357_ZCD_CON2 0x228c +#define MT6357_ZCD_CON3 0x228e +#define MT6357_ZCD_CON4 0x2290 +#define MT6357_ZCD_CON5 0x2292 +#define MT6357_ACCDET_DSN_DIG_ID 0x2300 +#define MT6357_ACCDET_DSN_DIG_REV0 0x2302 +#define MT6357_ACCDET_DSN_DBI 0x2304 +#define MT6357_ACCDET_DSN_FPI 0x2306 +#define MT6357_ACCDET_CON0 0x2308 +#define MT6357_ACCDET_CON1 0x230a +#define MT6357_ACCDET_CON2 0x230c +#define MT6357_ACCDET_CON3 0x230e +#define MT6357_ACCDET_CON4 0x2310 +#define MT6357_ACCDET_CON5 0x2312 +#define MT6357_ACCDET_CON6 0x2314 +#define MT6357_ACCDET_CON7 0x2316 +#define MT6357_ACCDET_CON8 0x2318 +#define MT6357_ACCDET_CON9 0x231a +#define MT6357_ACCDET_CON10 0x231c +#define MT6357_ACCDET_CON11 0x231e +#define MT6357_ACCDET_CON12 0x2320 +#define MT6357_ACCDET_CON13 0x2322 +#define MT6357_ACCDET_CON14 0x2324 +#define MT6357_ACCDET_CON15 0x2326 +#define MT6357_ACCDET_CON16 0x2328 +#define MT6357_ACCDET_CON17 0x232a +#define MT6357_ACCDET_CON18 0x232c +#define MT6357_ACCDET_CON19 0x232e +#define MT6357_ACCDET_CON20 0x2330 +#define MT6357_ACCDET_CON21 0x2332 +#define MT6357_ACCDET_CON22 0x2334 +#define MT6357_ACCDET_CON23 0x2336 +#define MT6357_ACCDET_CON24 0x2338 +#define MT6357_ACCDET_CON25 0x233a +#define MT6357_ACCDET_CON26 0x233c +#define MT6357_ACCDET_CON27 0x233e +#define MT6357_ACCDET_CON28 0x2340 + +#endif /* __MFD_MT6357_REGISTERS_H__ */ -- cgit From 738654be3cf7af4a0a131bd92142db045f0660dc Mon Sep 17 00:00:00 2001 From: Fabien Parent Date: Tue, 31 May 2022 14:49:57 +0200 Subject: mfd: mt6358-irq: Add MT6357 PMIC support Add MT6357 PMIC IRQ support. Signed-off-by: Fabien Parent Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220531124959.202787-6-fparent@baylibre.com --- include/linux/mfd/mt6397/core.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mfd/mt6397/core.h b/include/linux/mfd/mt6397/core.h index 1cf78726503b..3fecaffe5019 100644 --- a/include/linux/mfd/mt6397/core.h +++ b/include/linux/mfd/mt6397/core.h @@ -12,6 +12,7 @@ enum chip_id { MT6323_CHIP_ID = 0x23, + MT6357_CHIP_ID = 0x57, MT6358_CHIP_ID = 0x58, MT6359_CHIP_ID = 0x59, MT6366_CHIP_ID = 0x66, -- cgit From 66ee379d743c69c726b61d078119a34d5be96a35 Mon Sep 17 00:00:00 2001 From: Tinghan Shen Date: Wed, 1 Jun 2022 19:22:01 +0800 Subject: mfd: cros_ec: Add SCP Core-1 as a new CrOS EC MCU MT8195 System Companion Processors(SCP) is a dual-core RISC-V MCU. Add a new CrOS feature ID to represent the SCP's 2nd core. The 1st core is referred to as 'core 0', and the 2nd core is referred to as 'core 1'. Signed-off-by: Tinghan Shen Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220601112201.15510-16-tinghan.shen@mediatek.com --- include/linux/platform_data/cros_ec_commands.h | 2 ++ include/linux/platform_data/cros_ec_proto.h | 1 + 2 files changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/cros_ec_commands.h b/include/linux/platform_data/cros_ec_commands.h index 8cfa8cfca77e..9fbf1c5eb8d3 100644 --- a/include/linux/platform_data/cros_ec_commands.h +++ b/include/linux/platform_data/cros_ec_commands.h @@ -1300,6 +1300,8 @@ enum ec_feature_code { * mux. */ EC_FEATURE_TYPEC_MUX_REQUIRE_AP_ACK = 43, + /* The MCU is a System Companion Processor (SCP) 2nd Core. */ + EC_FEATURE_SCP_C1 = 45, }; #define EC_FEATURE_MASK_0(event_code) BIT(event_code % 32) diff --git a/include/linux/platform_data/cros_ec_proto.h b/include/linux/platform_data/cros_ec_proto.h index 138fd912c808..da06dc7cf1cb 100644 --- a/include/linux/platform_data/cros_ec_proto.h +++ b/include/linux/platform_data/cros_ec_proto.h @@ -19,6 +19,7 @@ #define CROS_EC_DEV_ISH_NAME "cros_ish" #define CROS_EC_DEV_PD_NAME "cros_pd" #define CROS_EC_DEV_SCP_NAME "cros_scp" +#define CROS_EC_DEV_SCP_C1_NAME "cros_scp_c1" #define CROS_EC_DEV_TP_NAME "cros_tp" /* -- cgit From 4a346a03a63cb45f7766d9d6559cf3fee783e926 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Tue, 14 Jun 2022 17:21:48 +0200 Subject: mfd: twl: Remove platform data support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There is no in-tree machine that provides a struct twl4030_platform_data since commit e92fc4f04a34 ("ARM: OMAP2+: Drop legacy board file for LDP"). So assume dev_get_platdata() returns NULL in twl_probe() and simplify accordingly. Signed-off-by: Uwe Kleine-König Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220614152148.252820-1-u.kleine-koenig@pengutronix.de --- include/linux/mfd/twl.h | 55 ------------------------------------------------- 1 file changed, 55 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mfd/twl.h b/include/linux/mfd/twl.h index 8871cc5188a0..e426938eafd5 100644 --- a/include/linux/mfd/twl.h +++ b/include/linux/mfd/twl.h @@ -694,61 +694,6 @@ struct twl4030_audio_data { unsigned int irq_base; }; -struct twl4030_platform_data { - struct twl4030_clock_init_data *clock; - struct twl4030_bci_platform_data *bci; - struct twl4030_gpio_platform_data *gpio; - struct twl4030_madc_platform_data *madc; - struct twl4030_keypad_data *keypad; - struct twl4030_usb_data *usb; - struct twl4030_power_data *power; - struct twl4030_audio_data *audio; - - /* Common LDO regulators for TWL4030/TWL6030 */ - struct regulator_init_data *vdac; - struct regulator_init_data *vaux1; - struct regulator_init_data *vaux2; - struct regulator_init_data *vaux3; - struct regulator_init_data *vdd1; - struct regulator_init_data *vdd2; - struct regulator_init_data *vdd3; - /* TWL4030 LDO regulators */ - struct regulator_init_data *vpll1; - struct regulator_init_data *vpll2; - struct regulator_init_data *vmmc1; - struct regulator_init_data *vmmc2; - struct regulator_init_data *vsim; - struct regulator_init_data *vaux4; - struct regulator_init_data *vio; - struct regulator_init_data *vintana1; - struct regulator_init_data *vintana2; - struct regulator_init_data *vintdig; - /* TWL6030 LDO regulators */ - struct regulator_init_data *vmmc; - struct regulator_init_data *vpp; - struct regulator_init_data *vusim; - struct regulator_init_data *vana; - struct regulator_init_data *vcxio; - struct regulator_init_data *vusb; - struct regulator_init_data *clk32kg; - struct regulator_init_data *v1v8; - struct regulator_init_data *v2v1; - /* TWL6032 LDO regulators */ - struct regulator_init_data *ldo1; - struct regulator_init_data *ldo2; - struct regulator_init_data *ldo3; - struct regulator_init_data *ldo4; - struct regulator_init_data *ldo5; - struct regulator_init_data *ldo6; - struct regulator_init_data *ldo7; - struct regulator_init_data *ldoln; - struct regulator_init_data *ldousb; - /* TWL6032 DCDC regulators */ - struct regulator_init_data *smps3; - struct regulator_init_data *smps4; - struct regulator_init_data *vio6025; -}; - struct twl_regulator_driver_data { int (*set_voltage)(void *data, int target_uV); int (*get_voltage)(void *data); -- cgit From c55333064d6ea9a26c7e8cfbe2ba1fa77c3a8e48 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Sun, 19 Jun 2022 10:26:55 +0200 Subject: mfd: tc6393xb: Make disable callback return void MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit All implementations return 0, so simplify accordingly. This is a preparation for making platform remove callbacks return void. Signed-off-by: Uwe Kleine-König Reported-by: kernel test robot Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220619082655.53728-1-u.kleine-koenig@pengutronix.de --- include/linux/mfd/tc6393xb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mfd/tc6393xb.h b/include/linux/mfd/tc6393xb.h index d336c541b7df..d17807f2d0c9 100644 --- a/include/linux/mfd/tc6393xb.h +++ b/include/linux/mfd/tc6393xb.h @@ -22,7 +22,7 @@ struct tc6393xb_platform_data { u16 scr_gper; /* GP Enable */ int (*enable)(struct platform_device *dev); - int (*disable)(struct platform_device *dev); + void (*disable)(struct platform_device *dev); int (*suspend)(struct platform_device *dev); int (*resume)(struct platform_device *dev); -- cgit From 79f821b5a3bf46d2d5ee2dc0e2b2428b1062a040 Mon Sep 17 00:00:00 2001 From: Zhang Jiaming Date: Thu, 23 Jun 2022 15:33:33 +0800 Subject: mfd: ipaq-micro: Fix spelling mistake of "receive{d}" Change 'receieved' to 'received' and 'recieve' to 'receive'. Signed-off-by: Zhang Jiaming Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220623073333.5675-1-jiaming@nfschina.com --- include/linux/mfd/ipaq-micro.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mfd/ipaq-micro.h b/include/linux/mfd/ipaq-micro.h index ee48a4321c57..d5caa4c86ecc 100644 --- a/include/linux/mfd/ipaq-micro.h +++ b/include/linux/mfd/ipaq-micro.h @@ -75,8 +75,8 @@ struct ipaq_micro_rxdev { * @id: 4-bit ID of the message * @tx_len: length of TX data * @tx_data: TX data to send - * @rx_len: length of receieved RX data - * @rx_data: RX data to recieve + * @rx_len: length of received RX data + * @rx_data: RX data to receive * @ack: a completion that will be completed when RX is complete * @node: list node if message gets queued */ -- cgit From d9cd0bc604705eb01497c3e483f8820a9677c682 Mon Sep 17 00:00:00 2001 From: AngeloGioacchino Del Regno Date: Mon, 27 Jun 2022 14:39:54 +0200 Subject: mfd: mt6397: Add basic support for MT6331+MT6332 PMIC Add support for the MT6331 PMIC with MT6332 Companion PMIC, found in MT6795 Helio X10 smartphone platforms. This combo has support for multiple devices but, for a start, only the following have been implemented: - Regulators (two instances, one in MT6331, one in MT6332) - RTC (MT6331) - Keys (MT6331) - Interrupts (MT6331 also dispatches MT6332's interrupts) There's more to be implemented, especially for MT6332, which will come at a later stage. Signed-off-by: AngeloGioacchino Del Regno Reviewed-by: Matthias Brugger Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220627123954.64299-1-angelogioacchino.delregno@collabora.com --- include/linux/mfd/mt6331/core.h | 40 +++ include/linux/mfd/mt6331/registers.h | 584 +++++++++++++++++++++++++++++++ include/linux/mfd/mt6332/core.h | 65 ++++ include/linux/mfd/mt6332/registers.h | 642 +++++++++++++++++++++++++++++++++++ include/linux/mfd/mt6397/core.h | 2 + 5 files changed, 1333 insertions(+) create mode 100644 include/linux/mfd/mt6331/core.h create mode 100644 include/linux/mfd/mt6331/registers.h create mode 100644 include/linux/mfd/mt6332/core.h create mode 100644 include/linux/mfd/mt6332/registers.h (limited to 'include/linux') diff --git a/include/linux/mfd/mt6331/core.h b/include/linux/mfd/mt6331/core.h new file mode 100644 index 000000000000..df8e6b1e4bc1 --- /dev/null +++ b/include/linux/mfd/mt6331/core.h @@ -0,0 +1,40 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2022 AngeloGioacchino Del Regno + */ + +#ifndef __MFD_MT6331_CORE_H__ +#define __MFD_MT6331_CORE_H__ + +enum mt6331_irq_status_numbers { + MT6331_IRQ_STATUS_PWRKEY = 0, + MT6331_IRQ_STATUS_HOMEKEY, + MT6331_IRQ_STATUS_CHRDET, + MT6331_IRQ_STATUS_THR_H, + MT6331_IRQ_STATUS_THR_L, + MT6331_IRQ_STATUS_BAT_H, + MT6331_IRQ_STATUS_BAT_L, + MT6331_IRQ_STATUS_RTC, + MT6331_IRQ_STATUS_AUDIO, + MT6331_IRQ_STATUS_MAD, + MT6331_IRQ_STATUS_ACCDET, + MT6331_IRQ_STATUS_ACCDET_EINT, + MT6331_IRQ_STATUS_ACCDET_NEGV = 12, + MT6331_IRQ_STATUS_VDVFS11_OC = 16, + MT6331_IRQ_STATUS_VDVFS12_OC, + MT6331_IRQ_STATUS_VDVFS13_OC, + MT6331_IRQ_STATUS_VDVFS14_OC, + MT6331_IRQ_STATUS_GPU_OC, + MT6331_IRQ_STATUS_VCORE1_OC, + MT6331_IRQ_STATUS_VCORE2_OC, + MT6331_IRQ_STATUS_VIO18_OC, + MT6331_IRQ_STATUS_LDO_OC, + MT6331_IRQ_STATUS_NR, +}; + +#define MT6331_IRQ_CON0_BASE MT6331_IRQ_STATUS_PWRKEY +#define MT6331_IRQ_CON0_BITS (MT6331_IRQ_STATUS_ACCDET_NEGV + 1) +#define MT6331_IRQ_CON1_BASE MT6331_IRQ_STATUS_VDVFS11_OC +#define MT6331_IRQ_CON1_BITS (MT6331_IRQ_STATUS_LDO_OC - MT6331_IRQ_STATUS_VDFS11_OC + 1) + +#endif /* __MFD_MT6331_CORE_H__ */ diff --git a/include/linux/mfd/mt6331/registers.h b/include/linux/mfd/mt6331/registers.h new file mode 100644 index 000000000000..e2be6bccd1a7 --- /dev/null +++ b/include/linux/mfd/mt6331/registers.h @@ -0,0 +1,584 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2022 AngeloGioacchino Del Regno + */ + +#ifndef __MFD_MT6331_REGISTERS_H__ +#define __MFD_MT6331_REGISTERS_H__ + +/* PMIC Registers */ +#define MT6331_STRUP_CON0 0x0 +#define MT6331_STRUP_CON2 0x2 +#define MT6331_STRUP_CON3 0x4 +#define MT6331_STRUP_CON4 0x6 +#define MT6331_STRUP_CON5 0x8 +#define MT6331_STRUP_CON6 0xA +#define MT6331_STRUP_CON7 0xC +#define MT6331_STRUP_CON8 0xE +#define MT6331_STRUP_CON9 0x10 +#define MT6331_STRUP_CON10 0x12 +#define MT6331_STRUP_CON11 0x14 +#define MT6331_STRUP_CON12 0x16 +#define MT6331_STRUP_CON13 0x18 +#define MT6331_STRUP_CON14 0x1A +#define MT6331_STRUP_CON15 0x1C +#define MT6331_STRUP_CON16 0x1E +#define MT6331_STRUP_CON17 0x20 +#define MT6331_STRUP_CON18 0x22 +#define MT6331_HWCID 0x100 +#define MT6331_SWCID 0x102 +#define MT6331_EXT_PMIC_STATUS 0x104 +#define MT6331_TOP_CON 0x106 +#define MT6331_TEST_OUT 0x108 +#define MT6331_TEST_CON0 0x10A +#define MT6331_TEST_CON1 0x10C +#define MT6331_TESTMODE_SW 0x10E +#define MT6331_EN_STATUS0 0x110 +#define MT6331_EN_STATUS1 0x112 +#define MT6331_EN_STATUS2 0x114 +#define MT6331_OCSTATUS0 0x116 +#define MT6331_OCSTATUS1 0x118 +#define MT6331_OCSTATUS2 0x11A +#define MT6331_PGSTATUS 0x11C +#define MT6331_TOPSTATUS 0x11E +#define MT6331_TDSEL_CON 0x120 +#define MT6331_RDSEL_CON 0x122 +#define MT6331_SMT_CON0 0x124 +#define MT6331_SMT_CON1 0x126 +#define MT6331_SMT_CON2 0x128 +#define MT6331_DRV_CON0 0x12A +#define MT6331_DRV_CON1 0x12C +#define MT6331_DRV_CON2 0x12E +#define MT6331_DRV_CON3 0x130 +#define MT6331_TOP_STATUS 0x132 +#define MT6331_TOP_STATUS_SET 0x134 +#define MT6331_TOP_STATUS_CLR 0x136 +#define MT6331_TOP_CKPDN_CON0 0x138 +#define MT6331_TOP_CKPDN_CON0_SET 0x13A +#define MT6331_TOP_CKPDN_CON0_CLR 0x13C +#define MT6331_TOP_CKPDN_CON1 0x13E +#define MT6331_TOP_CKPDN_CON1_SET 0x140 +#define MT6331_TOP_CKPDN_CON1_CLR 0x142 +#define MT6331_TOP_CKPDN_CON2 0x144 +#define MT6331_TOP_CKPDN_CON2_SET 0x146 +#define MT6331_TOP_CKPDN_CON2_CLR 0x148 +#define MT6331_TOP_CKSEL_CON 0x14A +#define MT6331_TOP_CKSEL_CON_SET 0x14C +#define MT6331_TOP_CKSEL_CON_CLR 0x14E +#define MT6331_TOP_CKHWEN_CON 0x150 +#define MT6331_TOP_CKHWEN_CON_SET 0x152 +#define MT6331_TOP_CKHWEN_CON_CLR 0x154 +#define MT6331_TOP_CKTST_CON0 0x156 +#define MT6331_TOP_CKTST_CON1 0x158 +#define MT6331_TOP_CLKSQ 0x15A +#define MT6331_TOP_CLKSQ_SET 0x15C +#define MT6331_TOP_CLKSQ_CLR 0x15E +#define MT6331_TOP_RST_CON 0x160 +#define MT6331_TOP_RST_CON_SET 0x162 +#define MT6331_TOP_RST_CON_CLR 0x164 +#define MT6331_TOP_RST_MISC 0x166 +#define MT6331_TOP_RST_MISC_SET 0x168 +#define MT6331_TOP_RST_MISC_CLR 0x16A +#define MT6331_INT_CON0 0x16C +#define MT6331_INT_CON0_SET 0x16E +#define MT6331_INT_CON0_CLR 0x170 +#define MT6331_INT_CON1 0x172 +#define MT6331_INT_CON1_SET 0x174 +#define MT6331_INT_CON1_CLR 0x176 +#define MT6331_INT_MISC_CON 0x178 +#define MT6331_INT_MISC_CON_SET 0x17A +#define MT6331_INT_MISC_CON_CLR 0x17C +#define MT6331_INT_STATUS_CON0 0x17E +#define MT6331_INT_STATUS_CON1 0x180 +#define MT6331_OC_GEAR_0 0x182 +#define MT6331_FQMTR_CON0 0x184 +#define MT6331_FQMTR_CON1 0x186 +#define MT6331_FQMTR_CON2 0x188 +#define MT6331_RG_SPI_CON 0x18A +#define MT6331_DEW_DIO_EN 0x18C +#define MT6331_DEW_READ_TEST 0x18E +#define MT6331_DEW_WRITE_TEST 0x190 +#define MT6331_DEW_CRC_SWRST 0x192 +#define MT6331_DEW_CRC_EN 0x194 +#define MT6331_DEW_CRC_VAL 0x196 +#define MT6331_DEW_DBG_MON_SEL 0x198 +#define MT6331_DEW_CIPHER_KEY_SEL 0x19A +#define MT6331_DEW_CIPHER_IV_SEL 0x19C +#define MT6331_DEW_CIPHER_EN 0x19E +#define MT6331_DEW_CIPHER_RDY 0x1A0 +#define MT6331_DEW_CIPHER_MODE 0x1A2 +#define MT6331_DEW_CIPHER_SWRST 0x1A4 +#define MT6331_DEW_RDDMY_NO 0x1A6 +#define MT6331_INT_TYPE_CON0 0x1A8 +#define MT6331_INT_TYPE_CON0_SET 0x1AA +#define MT6331_INT_TYPE_CON0_CLR 0x1AC +#define MT6331_INT_TYPE_CON1 0x1AE +#define MT6331_INT_TYPE_CON1_SET 0x1B0 +#define MT6331_INT_TYPE_CON1_CLR 0x1B2 +#define MT6331_INT_STA 0x1B4 +#define MT6331_BUCK_ALL_CON0 0x200 +#define MT6331_BUCK_ALL_CON1 0x202 +#define MT6331_BUCK_ALL_CON2 0x204 +#define MT6331_BUCK_ALL_CON3 0x206 +#define MT6331_BUCK_ALL_CON4 0x208 +#define MT6331_BUCK_ALL_CON5 0x20A +#define MT6331_BUCK_ALL_CON6 0x20C +#define MT6331_BUCK_ALL_CON7 0x20E +#define MT6331_BUCK_ALL_CON8 0x210 +#define MT6331_BUCK_ALL_CON9 0x212 +#define MT6331_BUCK_ALL_CON10 0x214 +#define MT6331_BUCK_ALL_CON11 0x216 +#define MT6331_BUCK_ALL_CON12 0x218 +#define MT6331_BUCK_ALL_CON13 0x21A +#define MT6331_BUCK_ALL_CON14 0x21C +#define MT6331_BUCK_ALL_CON15 0x21E +#define MT6331_BUCK_ALL_CON16 0x220 +#define MT6331_BUCK_ALL_CON17 0x222 +#define MT6331_BUCK_ALL_CON18 0x224 +#define MT6331_BUCK_ALL_CON19 0x226 +#define MT6331_BUCK_ALL_CON20 0x228 +#define MT6331_BUCK_ALL_CON21 0x22A +#define MT6331_BUCK_ALL_CON22 0x22C +#define MT6331_BUCK_ALL_CON23 0x22E +#define MT6331_BUCK_ALL_CON24 0x230 +#define MT6331_BUCK_ALL_CON25 0x232 +#define MT6331_BUCK_ALL_CON26 0x234 +#define MT6331_VDVFS11_CON0 0x236 +#define MT6331_VDVFS11_CON1 0x238 +#define MT6331_VDVFS11_CON2 0x23A +#define MT6331_VDVFS11_CON3 0x23C +#define MT6331_VDVFS11_CON4 0x23E +#define MT6331_VDVFS11_CON5 0x240 +#define MT6331_VDVFS11_CON6 0x242 +#define MT6331_VDVFS11_CON7 0x244 +#define MT6331_VDVFS11_CON8 0x246 +#define MT6331_VDVFS11_CON9 0x248 +#define MT6331_VDVFS11_CON10 0x24A +#define MT6331_VDVFS11_CON11 0x24C +#define MT6331_VDVFS11_CON12 0x24E +#define MT6331_VDVFS11_CON13 0x250 +#define MT6331_VDVFS11_CON14 0x252 +#define MT6331_VDVFS11_CON18 0x25A +#define MT6331_VDVFS11_CON19 0x25C +#define MT6331_VDVFS11_CON20 0x25E +#define MT6331_VDVFS11_CON21 0x260 +#define MT6331_VDVFS11_CON22 0x262 +#define MT6331_VDVFS11_CON23 0x264 +#define MT6331_VDVFS11_CON24 0x266 +#define MT6331_VDVFS11_CON25 0x268 +#define MT6331_VDVFS11_CON26 0x26A +#define MT6331_VDVFS11_CON27 0x26C +#define MT6331_VDVFS12_CON0 0x26E +#define MT6331_VDVFS12_CON1 0x270 +#define MT6331_VDVFS12_CON2 0x272 +#define MT6331_VDVFS12_CON3 0x274 +#define MT6331_VDVFS12_CON4 0x276 +#define MT6331_VDVFS12_CON5 0x278 +#define MT6331_VDVFS12_CON6 0x27A +#define MT6331_VDVFS12_CON7 0x27C +#define MT6331_VDVFS12_CON8 0x27E +#define MT6331_VDVFS12_CON9 0x280 +#define MT6331_VDVFS12_CON10 0x282 +#define MT6331_VDVFS12_CON11 0x284 +#define MT6331_VDVFS12_CON12 0x286 +#define MT6331_VDVFS12_CON13 0x288 +#define MT6331_VDVFS12_CON14 0x28A +#define MT6331_VDVFS12_CON18 0x292 +#define MT6331_VDVFS12_CON19 0x294 +#define MT6331_VDVFS12_CON20 0x296 +#define MT6331_VDVFS13_CON0 0x298 +#define MT6331_VDVFS13_CON1 0x29A +#define MT6331_VDVFS13_CON2 0x29C +#define MT6331_VDVFS13_CON3 0x29E +#define MT6331_VDVFS13_CON4 0x2A0 +#define MT6331_VDVFS13_CON5 0x2A2 +#define MT6331_VDVFS13_CON6 0x2A4 +#define MT6331_VDVFS13_CON7 0x2A6 +#define MT6331_VDVFS13_CON8 0x2A8 +#define MT6331_VDVFS13_CON9 0x2AA +#define MT6331_VDVFS13_CON10 0x2AC +#define MT6331_VDVFS13_CON11 0x2AE +#define MT6331_VDVFS13_CON12 0x2B0 +#define MT6331_VDVFS13_CON13 0x2B2 +#define MT6331_VDVFS13_CON14 0x2B4 +#define MT6331_VDVFS13_CON18 0x2BC +#define MT6331_VDVFS13_CON19 0x2BE +#define MT6331_VDVFS13_CON20 0x2C0 +#define MT6331_VDVFS14_CON0 0x2C2 +#define MT6331_VDVFS14_CON1 0x2C4 +#define MT6331_VDVFS14_CON2 0x2C6 +#define MT6331_VDVFS14_CON3 0x2C8 +#define MT6331_VDVFS14_CON4 0x2CA +#define MT6331_VDVFS14_CON5 0x2CC +#define MT6331_VDVFS14_CON6 0x2CE +#define MT6331_VDVFS14_CON7 0x2D0 +#define MT6331_VDVFS14_CON8 0x2D2 +#define MT6331_VDVFS14_CON9 0x2D4 +#define MT6331_VDVFS14_CON10 0x2D6 +#define MT6331_VDVFS14_CON11 0x2D8 +#define MT6331_VDVFS14_CON12 0x2DA +#define MT6331_VDVFS14_CON13 0x2DC +#define MT6331_VDVFS14_CON14 0x2DE +#define MT6331_VDVFS14_CON18 0x2E6 +#define MT6331_VDVFS14_CON19 0x2E8 +#define MT6331_VDVFS14_CON20 0x2EA +#define MT6331_VGPU_CON0 0x300 +#define MT6331_VGPU_CON1 0x302 +#define MT6331_VGPU_CON2 0x304 +#define MT6331_VGPU_CON3 0x306 +#define MT6331_VGPU_CON4 0x308 +#define MT6331_VGPU_CON5 0x30A +#define MT6331_VGPU_CON6 0x30C +#define MT6331_VGPU_CON7 0x30E +#define MT6331_VGPU_CON8 0x310 +#define MT6331_VGPU_CON9 0x312 +#define MT6331_VGPU_CON10 0x314 +#define MT6331_VGPU_CON11 0x316 +#define MT6331_VGPU_CON12 0x318 +#define MT6331_VGPU_CON13 0x31A +#define MT6331_VGPU_CON14 0x31C +#define MT6331_VGPU_CON15 0x31E +#define MT6331_VGPU_CON16 0x320 +#define MT6331_VGPU_CON17 0x322 +#define MT6331_VGPU_CON18 0x324 +#define MT6331_VGPU_CON19 0x326 +#define MT6331_VGPU_CON20 0x328 +#define MT6331_VCORE1_CON0 0x32A +#define MT6331_VCORE1_CON1 0x32C +#define MT6331_VCORE1_CON2 0x32E +#define MT6331_VCORE1_CON3 0x330 +#define MT6331_VCORE1_CON4 0x332 +#define MT6331_VCORE1_CON5 0x334 +#define MT6331_VCORE1_CON6 0x336 +#define MT6331_VCORE1_CON7 0x338 +#define MT6331_VCORE1_CON8 0x33A +#define MT6331_VCORE1_CON9 0x33C +#define MT6331_VCORE1_CON10 0x33E +#define MT6331_VCORE1_CON11 0x340 +#define MT6331_VCORE1_CON12 0x342 +#define MT6331_VCORE1_CON13 0x344 +#define MT6331_VCORE1_CON14 0x346 +#define MT6331_VCORE1_CON15 0x348 +#define MT6331_VCORE1_CON16 0x34A +#define MT6331_VCORE1_CON17 0x34C +#define MT6331_VCORE1_CON18 0x34E +#define MT6331_VCORE1_CON19 0x350 +#define MT6331_VCORE1_CON20 0x352 +#define MT6331_VCORE2_CON0 0x354 +#define MT6331_VCORE2_CON1 0x356 +#define MT6331_VCORE2_CON2 0x358 +#define MT6331_VCORE2_CON3 0x35A +#define MT6331_VCORE2_CON4 0x35C +#define MT6331_VCORE2_CON5 0x35E +#define MT6331_VCORE2_CON6 0x360 +#define MT6331_VCORE2_CON7 0x362 +#define MT6331_VCORE2_CON8 0x364 +#define MT6331_VCORE2_CON9 0x366 +#define MT6331_VCORE2_CON10 0x368 +#define MT6331_VCORE2_CON11 0x36A +#define MT6331_VCORE2_CON12 0x36C +#define MT6331_VCORE2_CON13 0x36E +#define MT6331_VCORE2_CON14 0x370 +#define MT6331_VCORE2_CON15 0x372 +#define MT6331_VCORE2_CON16 0x374 +#define MT6331_VCORE2_CON17 0x376 +#define MT6331_VCORE2_CON18 0x378 +#define MT6331_VCORE2_CON19 0x37A +#define MT6331_VCORE2_CON20 0x37C +#define MT6331_VCORE2_CON21 0x37E +#define MT6331_VIO18_CON0 0x380 +#define MT6331_VIO18_CON1 0x382 +#define MT6331_VIO18_CON2 0x384 +#define MT6331_VIO18_CON3 0x386 +#define MT6331_VIO18_CON4 0x388 +#define MT6331_VIO18_CON5 0x38A +#define MT6331_VIO18_CON6 0x38C +#define MT6331_VIO18_CON7 0x38E +#define MT6331_VIO18_CON8 0x390 +#define MT6331_VIO18_CON9 0x392 +#define MT6331_VIO18_CON10 0x394 +#define MT6331_VIO18_CON11 0x396 +#define MT6331_VIO18_CON12 0x398 +#define MT6331_VIO18_CON13 0x39A +#define MT6331_VIO18_CON14 0x39C +#define MT6331_VIO18_CON15 0x39E +#define MT6331_VIO18_CON16 0x3A0 +#define MT6331_VIO18_CON17 0x3A2 +#define MT6331_VIO18_CON18 0x3A4 +#define MT6331_VIO18_CON19 0x3A6 +#define MT6331_VIO18_CON20 0x3A8 +#define MT6331_BUCK_K_CON0 0x3AA +#define MT6331_BUCK_K_CON1 0x3AC +#define MT6331_BUCK_K_CON2 0x3AE +#define MT6331_BUCK_K_CON3 0x3B0 +#define MT6331_ZCD_CON0 0x400 +#define MT6331_ZCD_CON1 0x402 +#define MT6331_ZCD_CON2 0x404 +#define MT6331_ZCD_CON3 0x406 +#define MT6331_ZCD_CON4 0x408 +#define MT6331_ZCD_CON5 0x40A +#define MT6331_ISINK0_CON0 0x40C +#define MT6331_ISINK0_CON1 0x40E +#define MT6331_ISINK0_CON2 0x410 +#define MT6331_ISINK0_CON3 0x412 +#define MT6331_ISINK0_CON4 0x414 +#define MT6331_ISINK1_CON0 0x416 +#define MT6331_ISINK1_CON1 0x418 +#define MT6331_ISINK1_CON2 0x41A +#define MT6331_ISINK1_CON3 0x41C +#define MT6331_ISINK1_CON4 0x41E +#define MT6331_ISINK2_CON0 0x420 +#define MT6331_ISINK2_CON1 0x422 +#define MT6331_ISINK2_CON2 0x424 +#define MT6331_ISINK2_CON3 0x426 +#define MT6331_ISINK2_CON4 0x428 +#define MT6331_ISINK3_CON0 0x42A +#define MT6331_ISINK3_CON1 0x42C +#define MT6331_ISINK3_CON2 0x42E +#define MT6331_ISINK3_CON3 0x430 +#define MT6331_ISINK3_CON4 0x432 +#define MT6331_ISINK_ANA0 0x434 +#define MT6331_ISINK_ANA1 0x436 +#define MT6331_ISINK_PHASE_DLY 0x438 +#define MT6331_ISINK_EN_CTRL 0x43A +#define MT6331_ANALDO_CON0 0x500 +#define MT6331_ANALDO_CON1 0x502 +#define MT6331_ANALDO_CON2 0x504 +#define MT6331_ANALDO_CON3 0x506 +#define MT6331_ANALDO_CON4 0x508 +#define MT6331_ANALDO_CON5 0x50A +#define MT6331_ANALDO_CON6 0x50C +#define MT6331_ANALDO_CON7 0x50E +#define MT6331_ANALDO_CON8 0x510 +#define MT6331_ANALDO_CON9 0x512 +#define MT6331_ANALDO_CON10 0x514 +#define MT6331_ANALDO_CON11 0x516 +#define MT6331_ANALDO_CON12 0x518 +#define MT6331_ANALDO_CON13 0x51A +#define MT6331_SYSLDO_CON0 0x51C +#define MT6331_SYSLDO_CON1 0x51E +#define MT6331_SYSLDO_CON2 0x520 +#define MT6331_SYSLDO_CON3 0x522 +#define MT6331_SYSLDO_CON4 0x524 +#define MT6331_SYSLDO_CON5 0x526 +#define MT6331_SYSLDO_CON6 0x528 +#define MT6331_SYSLDO_CON7 0x52A +#define MT6331_SYSLDO_CON8 0x52C +#define MT6331_SYSLDO_CON9 0x52E +#define MT6331_SYSLDO_CON10 0x530 +#define MT6331_SYSLDO_CON11 0x532 +#define MT6331_SYSLDO_CON12 0x534 +#define MT6331_SYSLDO_CON13 0x536 +#define MT6331_SYSLDO_CON14 0x538 +#define MT6331_SYSLDO_CON15 0x53A +#define MT6331_SYSLDO_CON16 0x53C +#define MT6331_SYSLDO_CON17 0x53E +#define MT6331_SYSLDO_CON18 0x540 +#define MT6331_SYSLDO_CON19 0x542 +#define MT6331_SYSLDO_CON20 0x544 +#define MT6331_SYSLDO_CON21 0x546 +#define MT6331_DIGLDO_CON0 0x548 +#define MT6331_DIGLDO_CON1 0x54A +#define MT6331_DIGLDO_CON2 0x54C +#define MT6331_DIGLDO_CON3 0x54E +#define MT6331_DIGLDO_CON4 0x550 +#define MT6331_DIGLDO_CON5 0x552 +#define MT6331_DIGLDO_CON6 0x554 +#define MT6331_DIGLDO_CON7 0x556 +#define MT6331_DIGLDO_CON8 0x558 +#define MT6331_DIGLDO_CON9 0x55A +#define MT6331_DIGLDO_CON10 0x55C +#define MT6331_DIGLDO_CON11 0x55E +#define MT6331_DIGLDO_CON12 0x560 +#define MT6331_DIGLDO_CON13 0x562 +#define MT6331_DIGLDO_CON14 0x564 +#define MT6331_DIGLDO_CON15 0x566 +#define MT6331_DIGLDO_CON16 0x568 +#define MT6331_DIGLDO_CON17 0x56A +#define MT6331_DIGLDO_CON18 0x56C +#define MT6331_DIGLDO_CON19 0x56E +#define MT6331_DIGLDO_CON20 0x570 +#define MT6331_DIGLDO_CON21 0x572 +#define MT6331_DIGLDO_CON22 0x574 +#define MT6331_DIGLDO_CON23 0x576 +#define MT6331_DIGLDO_CON24 0x578 +#define MT6331_DIGLDO_CON25 0x57A +#define MT6331_DIGLDO_CON26 0x57C +#define MT6331_DIGLDO_CON27 0x57E +#define MT6331_DIGLDO_CON28 0x580 +#define MT6331_OTP_CON0 0x600 +#define MT6331_OTP_CON1 0x602 +#define MT6331_OTP_CON2 0x604 +#define MT6331_OTP_CON3 0x606 +#define MT6331_OTP_CON4 0x608 +#define MT6331_OTP_CON5 0x60A +#define MT6331_OTP_CON6 0x60C +#define MT6331_OTP_CON7 0x60E +#define MT6331_OTP_CON8 0x610 +#define MT6331_OTP_CON9 0x612 +#define MT6331_OTP_CON10 0x614 +#define MT6331_OTP_CON11 0x616 +#define MT6331_OTP_CON12 0x618 +#define MT6331_OTP_CON13 0x61A +#define MT6331_OTP_CON14 0x61C +#define MT6331_OTP_DOUT_0_15 0x61E +#define MT6331_OTP_DOUT_16_31 0x620 +#define MT6331_OTP_DOUT_32_47 0x622 +#define MT6331_OTP_DOUT_48_63 0x624 +#define MT6331_OTP_DOUT_64_79 0x626 +#define MT6331_OTP_DOUT_80_95 0x628 +#define MT6331_OTP_DOUT_96_111 0x62A +#define MT6331_OTP_DOUT_112_127 0x62C +#define MT6331_OTP_DOUT_128_143 0x62E +#define MT6331_OTP_DOUT_144_159 0x630 +#define MT6331_OTP_DOUT_160_175 0x632 +#define MT6331_OTP_DOUT_176_191 0x634 +#define MT6331_OTP_DOUT_192_207 0x636 +#define MT6331_OTP_DOUT_208_223 0x638 +#define MT6331_OTP_DOUT_224_239 0x63A +#define MT6331_OTP_DOUT_240_255 0x63C +#define MT6331_OTP_VAL_0_15 0x63E +#define MT6331_OTP_VAL_16_31 0x640 +#define MT6331_OTP_VAL_32_47 0x642 +#define MT6331_OTP_VAL_48_63 0x644 +#define MT6331_OTP_VAL_64_79 0x646 +#define MT6331_OTP_VAL_80_95 0x648 +#define MT6331_OTP_VAL_96_111 0x64A +#define MT6331_OTP_VAL_112_127 0x64C +#define MT6331_OTP_VAL_128_143 0x64E +#define MT6331_OTP_VAL_144_159 0x650 +#define MT6331_OTP_VAL_160_175 0x652 +#define MT6331_OTP_VAL_176_191 0x654 +#define MT6331_OTP_VAL_192_207 0x656 +#define MT6331_OTP_VAL_208_223 0x658 +#define MT6331_OTP_VAL_224_239 0x65A +#define MT6331_OTP_VAL_240_255 0x65C +#define MT6331_RTC_MIX_CON0 0x65E +#define MT6331_RTC_MIX_CON1 0x660 +#define MT6331_AUDDAC_CFG0 0x662 +#define MT6331_AUDBUF_CFG0 0x664 +#define MT6331_AUDBUF_CFG1 0x666 +#define MT6331_AUDBUF_CFG2 0x668 +#define MT6331_AUDBUF_CFG3 0x66A +#define MT6331_AUDBUF_CFG4 0x66C +#define MT6331_AUDBUF_CFG5 0x66E +#define MT6331_AUDBUF_CFG6 0x670 +#define MT6331_AUDBUF_CFG7 0x672 +#define MT6331_AUDBUF_CFG8 0x674 +#define MT6331_IBIASDIST_CFG0 0x676 +#define MT6331_AUDCLKGEN_CFG0 0x678 +#define MT6331_AUDLDO_CFG0 0x67A +#define MT6331_AUDDCDC_CFG0 0x67C +#define MT6331_AUDDCDC_CFG1 0x67E +#define MT6331_AUDNVREGGLB_CFG0 0x680 +#define MT6331_AUD_NCP0 0x682 +#define MT6331_AUD_ZCD_CFG0 0x684 +#define MT6331_AUDPREAMP_CFG0 0x686 +#define MT6331_AUDPREAMP_CFG1 0x688 +#define MT6331_AUDPREAMP_CFG2 0x68A +#define MT6331_AUDADC_CFG0 0x68C +#define MT6331_AUDADC_CFG1 0x68E +#define MT6331_AUDADC_CFG2 0x690 +#define MT6331_AUDADC_CFG3 0x692 +#define MT6331_AUDADC_CFG4 0x694 +#define MT6331_AUDADC_CFG5 0x696 +#define MT6331_AUDDIGMI_CFG0 0x698 +#define MT6331_AUDDIGMI_CFG1 0x69A +#define MT6331_AUDMICBIAS_CFG0 0x69C +#define MT6331_AUDMICBIAS_CFG1 0x69E +#define MT6331_AUDENCSPARE_CFG0 0x6A0 +#define MT6331_AUDPREAMPGAIN_CFG0 0x6A2 +#define MT6331_AUDMADPLL_CFG0 0x6A4 +#define MT6331_AUDMADPLL_CFG1 0x6A6 +#define MT6331_AUDMADPLL_CFG2 0x6A8 +#define MT6331_AUDLDO_NVREG_CFG0 0x6AA +#define MT6331_AUDLDO_NVREG_CFG1 0x6AC +#define MT6331_AUDLDO_NVREG_CFG2 0x6AE +#define MT6331_AUXADC_ADC0 0x700 +#define MT6331_AUXADC_ADC1 0x702 +#define MT6331_AUXADC_ADC2 0x704 +#define MT6331_AUXADC_ADC3 0x706 +#define MT6331_AUXADC_ADC4 0x708 +#define MT6331_AUXADC_ADC5 0x70A +#define MT6331_AUXADC_ADC6 0x70C +#define MT6331_AUXADC_ADC7 0x70E +#define MT6331_AUXADC_ADC8 0x710 +#define MT6331_AUXADC_ADC9 0x712 +#define MT6331_AUXADC_ADC10 0x714 +#define MT6331_AUXADC_ADC11 0x716 +#define MT6331_AUXADC_ADC12 0x718 +#define MT6331_AUXADC_ADC13 0x71A +#define MT6331_AUXADC_ADC14 0x71C +#define MT6331_AUXADC_ADC15 0x71E +#define MT6331_AUXADC_ADC16 0x720 +#define MT6331_AUXADC_ADC17 0x722 +#define MT6331_AUXADC_ADC18 0x724 +#define MT6331_AUXADC_ADC19 0x726 +#define MT6331_AUXADC_STA0 0x728 +#define MT6331_AUXADC_STA1 0x72A +#define MT6331_AUXADC_RQST0 0x72C +#define MT6331_AUXADC_RQST0_SET 0x72E +#define MT6331_AUXADC_RQST0_CLR 0x730 +#define MT6331_AUXADC_RQST1 0x732 +#define MT6331_AUXADC_RQST1_SET 0x734 +#define MT6331_AUXADC_RQST1_CLR 0x736 +#define MT6331_AUXADC_CON0 0x738 +#define MT6331_AUXADC_CON1 0x73A +#define MT6331_AUXADC_CON2 0x73C +#define MT6331_AUXADC_CON3 0x73E +#define MT6331_AUXADC_CON4 0x740 +#define MT6331_AUXADC_CON5 0x742 +#define MT6331_AUXADC_CON6 0x744 +#define MT6331_AUXADC_CON7 0x746 +#define MT6331_AUXADC_CON8 0x748 +#define MT6331_AUXADC_CON9 0x74A +#define MT6331_AUXADC_CON10 0x74C +#define MT6331_AUXADC_CON11 0x74E +#define MT6331_AUXADC_CON12 0x750 +#define MT6331_AUXADC_CON13 0x752 +#define MT6331_AUXADC_CON14 0x754 +#define MT6331_AUXADC_CON15 0x756 +#define MT6331_AUXADC_CON16 0x758 +#define MT6331_AUXADC_CON17 0x75A +#define MT6331_AUXADC_CON18 0x75C +#define MT6331_AUXADC_CON19 0x75E +#define MT6331_AUXADC_CON20 0x760 +#define MT6331_AUXADC_CON21 0x762 +#define MT6331_AUXADC_CON22 0x764 +#define MT6331_AUXADC_CON23 0x766 +#define MT6331_AUXADC_CON24 0x768 +#define MT6331_AUXADC_CON25 0x76A +#define MT6331_AUXADC_CON26 0x76C +#define MT6331_AUXADC_CON27 0x76E +#define MT6331_AUXADC_CON28 0x770 +#define MT6331_AUXADC_CON29 0x772 +#define MT6331_AUXADC_CON30 0x774 +#define MT6331_AUXADC_CON31 0x776 +#define MT6331_AUXADC_CON32 0x778 +#define MT6331_ACCDET_CON0 0x77A +#define MT6331_ACCDET_CON1 0x77C +#define MT6331_ACCDET_CON2 0x77E +#define MT6331_ACCDET_CON3 0x780 +#define MT6331_ACCDET_CON4 0x782 +#define MT6331_ACCDET_CON5 0x784 +#define MT6331_ACCDET_CON6 0x786 +#define MT6331_ACCDET_CON7 0x788 +#define MT6331_ACCDET_CON8 0x78A +#define MT6331_ACCDET_CON9 0x78C +#define MT6331_ACCDET_CON10 0x78E +#define MT6331_ACCDET_CON11 0x790 +#define MT6331_ACCDET_CON12 0x792 +#define MT6331_ACCDET_CON13 0x794 +#define MT6331_ACCDET_CON14 0x796 +#define MT6331_ACCDET_CON15 0x798 +#define MT6331_ACCDET_CON16 0x79A +#define MT6331_ACCDET_CON17 0x79C +#define MT6331_ACCDET_CON18 0x79E +#define MT6331_ACCDET_CON19 0x7A0 +#define MT6331_ACCDET_CON20 0x7A2 +#define MT6331_ACCDET_CON21 0x7A4 +#define MT6331_ACCDET_CON22 0x7A6 +#define MT6331_ACCDET_CON23 0x7A8 +#define MT6331_ACCDET_CON24 0x7AA + +#endif /* __MFD_MT6331_REGISTERS_H__ */ diff --git a/include/linux/mfd/mt6332/core.h b/include/linux/mfd/mt6332/core.h new file mode 100644 index 000000000000..cd6013eb82d9 --- /dev/null +++ b/include/linux/mfd/mt6332/core.h @@ -0,0 +1,65 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2022 AngeloGioacchino Del Regno + */ + +#ifndef __MFD_MT6332_CORE_H__ +#define __MFD_MT6332_CORE_H__ + +enum mt6332_irq_status_numbers { + MT6332_IRQ_STATUS_CHR_COMPLETE = 0, + MT6332_IRQ_STATUS_THERMAL_SD, + MT6332_IRQ_STATUS_THERMAL_REG_IN, + MT6332_IRQ_STATUS_THERMAL_REG_OUT, + MT6332_IRQ_STATUS_OTG_OC, + MT6332_IRQ_STATUS_CHR_OC, + MT6332_IRQ_STATUS_OTG_THERMAL, + MT6332_IRQ_STATUS_CHRIN_SHORT, + MT6332_IRQ_STATUS_DRVCDT_SHORT, + MT6332_IRQ_STATUS_PLUG_IN_FLASH, + MT6332_IRQ_STATUS_CHRWDT_FLAG, + MT6332_IRQ_STATUS_FLASH_EN_TIMEOUT, + MT6332_IRQ_STATUS_FLASH_VLED1_SHORT, + MT6332_IRQ_STATUS_FLASH_VLED1_OPEN = 13, + MT6332_IRQ_STATUS_OV = 16, + MT6332_IRQ_STATUS_BVALID_DET, + MT6332_IRQ_STATUS_VBATON_UNDET, + MT6332_IRQ_STATUS_CHR_PLUG_IN, + MT6332_IRQ_STATUS_CHR_PLUG_OUT, + MT6332_IRQ_STATUS_BC11_TIMEOUT, + MT6332_IRQ_STATUS_FLASH_VLED2_SHORT, + MT6332_IRQ_STATUS_FLASH_VLED2_OPEN = 23, + MT6332_IRQ_STATUS_THR_H = 32, + MT6332_IRQ_STATUS_THR_L, + MT6332_IRQ_STATUS_BAT_H, + MT6332_IRQ_STATUS_BAT_L, + MT6332_IRQ_STATUS_M3_H, + MT6332_IRQ_STATUS_M3_L, + MT6332_IRQ_STATUS_FG_BAT_H, + MT6332_IRQ_STATUS_FG_BAT_L, + MT6332_IRQ_STATUS_FG_CUR_H, + MT6332_IRQ_STATUS_FG_CUR_L, + MT6332_IRQ_STATUS_SPKL_D, + MT6332_IRQ_STATUS_SPKL_AB, + MT6332_IRQ_STATUS_BIF, + MT6332_IRQ_STATUS_VWLED_OC = 45, + MT6332_IRQ_STATUS_VDRAM_OC = 48, + MT6332_IRQ_STATUS_VDVFS2_OC, + MT6332_IRQ_STATUS_VRF1_OC, + MT6332_IRQ_STATUS_VRF2_OC, + MT6332_IRQ_STATUS_VPA_OC, + MT6332_IRQ_STATUS_VSBST_OC, + MT6332_IRQ_STATUS_LDO_OC, + MT6332_IRQ_STATUS_NR, +}; + +#define MT6332_IRQ_CON0_BASE MT6332_IRQ_STATUS_CHR_COMPLETE +#define MT6332_IRQ_CON0_BITS (MT6332_IRQ_STATUS_FLASH_VLED1_OPEN + 1) +#define MT6332_IRQ_CON1_BASE MT6332_IRQ_STATUS_OV +#define MT6332_IRQ_CON1_BITS (MT6332_IRQ_STATUS_FLASH_VLED2_OPEN - MT6332_IRQ_STATUS_OV + 1) +#define MT6332_IRQ_CON2_BASE MT6332_IRQ_STATUS_THR_H +#define MT6332_IRQ_CON2_BITS (MT6332_IRQ_STATUS_VWLED_OC - MT6332_IRQ_STATUS_THR_H + 1) +#define MT6332_IRQ_CON3_BASE MT6332_IRQ_STATUS_VDRAM_OC +#define MT6332_IRQ_CON3_BITS (MT6332_IRQ_STATUS_LDO_OC - MT6332_IRQ_STATUS_VDRAM_OC + 1) + +#endif /* __MFD_MT6332_CORE_H__ */ diff --git a/include/linux/mfd/mt6332/registers.h b/include/linux/mfd/mt6332/registers.h new file mode 100644 index 000000000000..65e0b86fceac --- /dev/null +++ b/include/linux/mfd/mt6332/registers.h @@ -0,0 +1,642 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2022 AngeloGioacchino Del Regno + */ + +#ifndef __MFD_MT6332_REGISTERS_H__ +#define __MFD_MT6332_REGISTERS_H__ + +/* PMIC Registers */ +#define MT6332_HWCID 0x8000 +#define MT6332_SWCID 0x8002 +#define MT6332_TOP_CON 0x8004 +#define MT6332_DDR_VREF_AP_CON 0x8006 +#define MT6332_DDR_VREF_DQ_CON 0x8008 +#define MT6332_DDR_VREF_CA_CON 0x800A +#define MT6332_TEST_OUT 0x800C +#define MT6332_TEST_CON0 0x800E +#define MT6332_TEST_CON1 0x8010 +#define MT6332_TESTMODE_SW 0x8012 +#define MT6332_TESTMODE_ANA 0x8014 +#define MT6332_TDSEL_CON 0x8016 +#define MT6332_RDSEL_CON 0x8018 +#define MT6332_SMT_CON0 0x801A +#define MT6332_SMT_CON1 0x801C +#define MT6332_DRV_CON0 0x801E +#define MT6332_DRV_CON1 0x8020 +#define MT6332_DRV_CON2 0x8022 +#define MT6332_EN_STATUS0 0x8024 +#define MT6332_OCSTATUS0 0x8026 +#define MT6332_TOP_STATUS 0x8028 +#define MT6332_TOP_STATUS_SET 0x802A +#define MT6332_TOP_STATUS_CLR 0x802C +#define MT6332_FLASH_CON0 0x802E +#define MT6332_FLASH_CON1 0x8030 +#define MT6332_FLASH_CON2 0x8032 +#define MT6332_CORE_CON0 0x8034 +#define MT6332_CORE_CON1 0x8036 +#define MT6332_CORE_CON2 0x8038 +#define MT6332_CORE_CON3 0x803A +#define MT6332_CORE_CON4 0x803C +#define MT6332_CORE_CON5 0x803E +#define MT6332_CORE_CON6 0x8040 +#define MT6332_CORE_CON7 0x8042 +#define MT6332_CORE_CON8 0x8044 +#define MT6332_CORE_CON9 0x8046 +#define MT6332_CORE_CON10 0x8048 +#define MT6332_CORE_CON11 0x804A +#define MT6332_CORE_CON12 0x804C +#define MT6332_CORE_CON13 0x804E +#define MT6332_CORE_CON14 0x8050 +#define MT6332_CORE_CON15 0x8052 +#define MT6332_STA_CON0 0x8054 +#define MT6332_STA_CON1 0x8056 +#define MT6332_STA_CON2 0x8058 +#define MT6332_STA_CON3 0x805A +#define MT6332_STA_CON4 0x805C +#define MT6332_STA_CON5 0x805E +#define MT6332_STA_CON6 0x8060 +#define MT6332_STA_CON7 0x8062 +#define MT6332_CHR_CON0 0x8064 +#define MT6332_CHR_CON1 0x8066 +#define MT6332_CHR_CON2 0x8068 +#define MT6332_CHR_CON3 0x806A +#define MT6332_CHR_CON4 0x806C +#define MT6332_CHR_CON5 0x806E +#define MT6332_CHR_CON6 0x8070 +#define MT6332_CHR_CON7 0x8072 +#define MT6332_CHR_CON8 0x8074 +#define MT6332_CHR_CON9 0x8076 +#define MT6332_CHR_CON10 0x8078 +#define MT6332_CHR_CON11 0x807A +#define MT6332_CHR_CON12 0x807C +#define MT6332_CHR_CON13 0x807E +#define MT6332_CHR_CON14 0x8080 +#define MT6332_CHR_CON15 0x8082 +#define MT6332_BOOST_CON0 0x8084 +#define MT6332_BOOST_CON1 0x8086 +#define MT6332_BOOST_CON2 0x8088 +#define MT6332_BOOST_CON3 0x808A +#define MT6332_BOOST_CON4 0x808C +#define MT6332_BOOST_CON5 0x808E +#define MT6332_BOOST_CON6 0x8090 +#define MT6332_BOOST_CON7 0x8092 +#define MT6332_TOP_CKPDN_CON0 0x8094 +#define MT6332_TOP_CKPDN_CON0_SET 0x8096 +#define MT6332_TOP_CKPDN_CON0_CLR 0x8098 +#define MT6332_TOP_CKPDN_CON1 0x809A +#define MT6332_TOP_CKPDN_CON1_SET 0x809C +#define MT6332_TOP_CKPDN_CON1_CLR 0x809E +#define MT6332_TOP_CKPDN_CON2 0x80A0 +#define MT6332_TOP_CKPDN_CON2_SET 0x80A2 +#define MT6332_TOP_CKPDN_CON2_CLR 0x80A4 +#define MT6332_TOP_CKSEL_CON0 0x80A6 +#define MT6332_TOP_CKSEL_CON0_SET 0x80A8 +#define MT6332_TOP_CKSEL_CON0_CLR 0x80AA +#define MT6332_TOP_CKSEL_CON1 0x80AC +#define MT6332_TOP_CKSEL_CON1_SET 0x80AE +#define MT6332_TOP_CKSEL_CON1_CLR 0x80B0 +#define MT6332_TOP_CKHWEN_CON 0x80B2 +#define MT6332_TOP_CKHWEN_CON_SET 0x80B4 +#define MT6332_TOP_CKHWEN_CON_CLR 0x80B6 +#define MT6332_TOP_CKTST_CON0 0x80B8 +#define MT6332_TOP_CKTST_CON1 0x80BA +#define MT6332_TOP_RST_CON 0x80BC +#define MT6332_TOP_RST_CON_SET 0x80BE +#define MT6332_TOP_RST_CON_CLR 0x80C0 +#define MT6332_TOP_RST_MISC 0x80C2 +#define MT6332_TOP_RST_MISC_SET 0x80C4 +#define MT6332_TOP_RST_MISC_CLR 0x80C6 +#define MT6332_INT_CON0 0x80C8 +#define MT6332_INT_CON0_SET 0x80CA +#define MT6332_INT_CON0_CLR 0x80CC +#define MT6332_INT_CON1 0x80CE +#define MT6332_INT_CON1_SET 0x80D0 +#define MT6332_INT_CON1_CLR 0x80D2 +#define MT6332_INT_CON2 0x80D4 +#define MT6332_INT_CON2_SET 0x80D6 +#define MT6332_INT_CON2_CLR 0x80D8 +#define MT6332_INT_CON3 0x80DA +#define MT6332_INT_CON3_SET 0x80DC +#define MT6332_INT_CON3_CLR 0x80DE +#define MT6332_CHRWDT_CON0 0x80E0 +#define MT6332_CHRWDT_STATUS0 0x80E2 +#define MT6332_INT_STATUS0 0x80E4 +#define MT6332_INT_STATUS1 0x80E6 +#define MT6332_INT_STATUS2 0x80E8 +#define MT6332_INT_STATUS3 0x80EA +#define MT6332_OC_GEAR_0 0x80EC +#define MT6332_OC_GEAR_1 0x80EE +#define MT6332_OC_GEAR_2 0x80F0 +#define MT6332_INT_MISC_CON 0x80F2 +#define MT6332_RG_SPI_CON 0x80F4 +#define MT6332_DEW_DIO_EN 0x80F6 +#define MT6332_DEW_READ_TEST 0x80F8 +#define MT6332_DEW_WRITE_TEST 0x80FA +#define MT6332_DEW_CRC_SWRST 0x80FC +#define MT6332_DEW_CRC_EN 0x80FE +#define MT6332_DEW_CRC_VAL 0x8100 +#define MT6332_DEW_DBG_MON_SEL 0x8102 +#define MT6332_DEW_CIPHER_KEY_SEL 0x8104 +#define MT6332_DEW_CIPHER_IV_SEL 0x8106 +#define MT6332_DEW_CIPHER_EN 0x8108 +#define MT6332_DEW_CIPHER_RDY 0x810A +#define MT6332_DEW_CIPHER_MODE 0x810C +#define MT6332_DEW_CIPHER_SWRST 0x810E +#define MT6332_DEW_RDDMY_NO 0x8110 +#define MT6332_INT_STA 0x8112 +#define MT6332_BIF_CON0 0x8114 +#define MT6332_BIF_CON1 0x8116 +#define MT6332_BIF_CON2 0x8118 +#define MT6332_BIF_CON3 0x811A +#define MT6332_BIF_CON4 0x811C +#define MT6332_BIF_CON5 0x811E +#define MT6332_BIF_CON6 0x8120 +#define MT6332_BIF_CON7 0x8122 +#define MT6332_BIF_CON8 0x8124 +#define MT6332_BIF_CON9 0x8126 +#define MT6332_BIF_CON10 0x8128 +#define MT6332_BIF_CON11 0x812A +#define MT6332_BIF_CON12 0x812C +#define MT6332_BIF_CON13 0x812E +#define MT6332_BIF_CON14 0x8130 +#define MT6332_BIF_CON15 0x8132 +#define MT6332_BIF_CON16 0x8134 +#define MT6332_BIF_CON17 0x8136 +#define MT6332_BIF_CON18 0x8138 +#define MT6332_BIF_CON19 0x813A +#define MT6332_BIF_CON20 0x813C +#define MT6332_BIF_CON21 0x813E +#define MT6332_BIF_CON22 0x8140 +#define MT6332_BIF_CON23 0x8142 +#define MT6332_BIF_CON24 0x8144 +#define MT6332_BIF_CON25 0x8146 +#define MT6332_BIF_CON26 0x8148 +#define MT6332_BIF_CON27 0x814A +#define MT6332_BIF_CON28 0x814C +#define MT6332_BIF_CON29 0x814E +#define MT6332_BIF_CON30 0x8150 +#define MT6332_BIF_CON31 0x8152 +#define MT6332_BIF_CON32 0x8154 +#define MT6332_BIF_CON33 0x8156 +#define MT6332_BIF_CON34 0x8158 +#define MT6332_BIF_CON35 0x815A +#define MT6332_BIF_CON36 0x815C +#define MT6332_BATON_CON0 0x815E +#define MT6332_BIF_CON37 0x8160 +#define MT6332_BIF_CON38 0x8162 +#define MT6332_CHR_CON16 0x8164 +#define MT6332_CHR_CON17 0x8166 +#define MT6332_CHR_CON18 0x8168 +#define MT6332_CHR_CON19 0x816A +#define MT6332_CHR_CON20 0x816C +#define MT6332_CHR_CON21 0x816E +#define MT6332_CHR_CON22 0x8170 +#define MT6332_CHR_CON23 0x8172 +#define MT6332_CHR_CON24 0x8174 +#define MT6332_CHR_CON25 0x8176 +#define MT6332_STA_CON8 0x8178 +#define MT6332_BUCK_ALL_CON0 0x8400 +#define MT6332_BUCK_ALL_CON1 0x8402 +#define MT6332_BUCK_ALL_CON2 0x8404 +#define MT6332_BUCK_ALL_CON3 0x8406 +#define MT6332_BUCK_ALL_CON4 0x8408 +#define MT6332_BUCK_ALL_CON5 0x840A +#define MT6332_BUCK_ALL_CON6 0x840C +#define MT6332_BUCK_ALL_CON7 0x840E +#define MT6332_BUCK_ALL_CON8 0x8410 +#define MT6332_BUCK_ALL_CON9 0x8412 +#define MT6332_BUCK_ALL_CON10 0x8414 +#define MT6332_BUCK_ALL_CON11 0x8416 +#define MT6332_BUCK_ALL_CON12 0x8418 +#define MT6332_BUCK_ALL_CON13 0x841A +#define MT6332_BUCK_ALL_CON14 0x841C +#define MT6332_BUCK_ALL_CON15 0x841E +#define MT6332_BUCK_ALL_CON16 0x8420 +#define MT6332_BUCK_ALL_CON17 0x8422 +#define MT6332_BUCK_ALL_CON18 0x8424 +#define MT6332_BUCK_ALL_CON19 0x8426 +#define MT6332_BUCK_ALL_CON20 0x8428 +#define MT6332_BUCK_ALL_CON21 0x842A +#define MT6332_BUCK_ALL_CON22 0x842C +#define MT6332_BUCK_ALL_CON23 0x842E +#define MT6332_BUCK_ALL_CON24 0x8430 +#define MT6332_BUCK_ALL_CON25 0x8432 +#define MT6332_BUCK_ALL_CON26 0x8434 +#define MT6332_BUCK_ALL_CON27 0x8436 +#define MT6332_VDRAM_CON0 0x8438 +#define MT6332_VDRAM_CON1 0x843A +#define MT6332_VDRAM_CON2 0x843C +#define MT6332_VDRAM_CON3 0x843E +#define MT6332_VDRAM_CON4 0x8440 +#define MT6332_VDRAM_CON5 0x8442 +#define MT6332_VDRAM_CON6 0x8444 +#define MT6332_VDRAM_CON7 0x8446 +#define MT6332_VDRAM_CON8 0x8448 +#define MT6332_VDRAM_CON9 0x844A +#define MT6332_VDRAM_CON10 0x844C +#define MT6332_VDRAM_CON11 0x844E +#define MT6332_VDRAM_CON12 0x8450 +#define MT6332_VDRAM_CON13 0x8452 +#define MT6332_VDRAM_CON14 0x8454 +#define MT6332_VDRAM_CON15 0x8456 +#define MT6332_VDRAM_CON16 0x8458 +#define MT6332_VDRAM_CON17 0x845A +#define MT6332_VDRAM_CON18 0x845C +#define MT6332_VDRAM_CON19 0x845E +#define MT6332_VDRAM_CON20 0x8460 +#define MT6332_VDRAM_CON21 0x8462 +#define MT6332_VDVFS2_CON0 0x8464 +#define MT6332_VDVFS2_CON1 0x8466 +#define MT6332_VDVFS2_CON2 0x8468 +#define MT6332_VDVFS2_CON3 0x846A +#define MT6332_VDVFS2_CON4 0x846C +#define MT6332_VDVFS2_CON5 0x846E +#define MT6332_VDVFS2_CON6 0x8470 +#define MT6332_VDVFS2_CON7 0x8472 +#define MT6332_VDVFS2_CON8 0x8474 +#define MT6332_VDVFS2_CON9 0x8476 +#define MT6332_VDVFS2_CON10 0x8478 +#define MT6332_VDVFS2_CON11 0x847A +#define MT6332_VDVFS2_CON12 0x847C +#define MT6332_VDVFS2_CON13 0x847E +#define MT6332_VDVFS2_CON14 0x8480 +#define MT6332_VDVFS2_CON15 0x8482 +#define MT6332_VDVFS2_CON16 0x8484 +#define MT6332_VDVFS2_CON17 0x8486 +#define MT6332_VDVFS2_CON18 0x8488 +#define MT6332_VDVFS2_CON19 0x848A +#define MT6332_VDVFS2_CON20 0x848C +#define MT6332_VDVFS2_CON21 0x848E +#define MT6332_VDVFS2_CON22 0x8490 +#define MT6332_VDVFS2_CON23 0x8492 +#define MT6332_VDVFS2_CON24 0x8494 +#define MT6332_VDVFS2_CON25 0x8496 +#define MT6332_VDVFS2_CON26 0x8498 +#define MT6332_VDVFS2_CON27 0x849A +#define MT6332_VRF1_CON0 0x849C +#define MT6332_VRF1_CON1 0x849E +#define MT6332_VRF1_CON2 0x84A0 +#define MT6332_VRF1_CON3 0x84A2 +#define MT6332_VRF1_CON4 0x84A4 +#define MT6332_VRF1_CON5 0x84A6 +#define MT6332_VRF1_CON6 0x84A8 +#define MT6332_VRF1_CON7 0x84AA +#define MT6332_VRF1_CON8 0x84AC +#define MT6332_VRF1_CON9 0x84AE +#define MT6332_VRF1_CON10 0x84B0 +#define MT6332_VRF1_CON11 0x84B2 +#define MT6332_VRF1_CON12 0x84B4 +#define MT6332_VRF1_CON13 0x84B6 +#define MT6332_VRF1_CON14 0x84B8 +#define MT6332_VRF1_CON15 0x84BA +#define MT6332_VRF1_CON16 0x84BC +#define MT6332_VRF1_CON17 0x84BE +#define MT6332_VRF1_CON18 0x84C0 +#define MT6332_VRF1_CON19 0x84C2 +#define MT6332_VRF1_CON20 0x84C4 +#define MT6332_VRF1_CON21 0x84C6 +#define MT6332_VRF2_CON0 0x84C8 +#define MT6332_VRF2_CON1 0x84CA +#define MT6332_VRF2_CON2 0x84CC +#define MT6332_VRF2_CON3 0x84CE +#define MT6332_VRF2_CON4 0x84D0 +#define MT6332_VRF2_CON5 0x84D2 +#define MT6332_VRF2_CON6 0x84D4 +#define MT6332_VRF2_CON7 0x84D6 +#define MT6332_VRF2_CON8 0x84D8 +#define MT6332_VRF2_CON9 0x84DA +#define MT6332_VRF2_CON10 0x84DC +#define MT6332_VRF2_CON11 0x84DE +#define MT6332_VRF2_CON12 0x84E0 +#define MT6332_VRF2_CON13 0x84E2 +#define MT6332_VRF2_CON14 0x84E4 +#define MT6332_VRF2_CON15 0x84E6 +#define MT6332_VRF2_CON16 0x84E8 +#define MT6332_VRF2_CON17 0x84EA +#define MT6332_VRF2_CON18 0x84EC +#define MT6332_VRF2_CON19 0x84EE +#define MT6332_VRF2_CON20 0x84F0 +#define MT6332_VRF2_CON21 0x84F2 +#define MT6332_VPA_CON0 0x84F4 +#define MT6332_VPA_CON1 0x84F6 +#define MT6332_VPA_CON2 0x84F8 +#define MT6332_VPA_CON3 0x84FC +#define MT6332_VPA_CON4 0x84FE +#define MT6332_VPA_CON5 0x8500 +#define MT6332_VPA_CON6 0x8502 +#define MT6332_VPA_CON7 0x8504 +#define MT6332_VPA_CON8 0x8506 +#define MT6332_VPA_CON9 0x8508 +#define MT6332_VPA_CON10 0x850A +#define MT6332_VPA_CON11 0x850C +#define MT6332_VPA_CON12 0x850E +#define MT6332_VPA_CON13 0x8510 +#define MT6332_VPA_CON14 0x8512 +#define MT6332_VPA_CON15 0x8514 +#define MT6332_VPA_CON16 0x8516 +#define MT6332_VPA_CON17 0x8518 +#define MT6332_VPA_CON18 0x851A +#define MT6332_VPA_CON19 0x851C +#define MT6332_VPA_CON20 0x851E +#define MT6332_VPA_CON21 0x8520 +#define MT6332_VPA_CON22 0x8522 +#define MT6332_VPA_CON23 0x8524 +#define MT6332_VPA_CON24 0x8526 +#define MT6332_VPA_CON25 0x8528 +#define MT6332_VSBST_CON0 0x852A +#define MT6332_VSBST_CON1 0x852C +#define MT6332_VSBST_CON2 0x852E +#define MT6332_VSBST_CON3 0x8530 +#define MT6332_VSBST_CON4 0x8532 +#define MT6332_VSBST_CON5 0x8534 +#define MT6332_VSBST_CON6 0x8536 +#define MT6332_VSBST_CON7 0x8538 +#define MT6332_VSBST_CON8 0x853A +#define MT6332_VSBST_CON9 0x853C +#define MT6332_VSBST_CON10 0x853E +#define MT6332_VSBST_CON11 0x8540 +#define MT6332_VSBST_CON12 0x8542 +#define MT6332_VSBST_CON13 0x8544 +#define MT6332_VSBST_CON14 0x8546 +#define MT6332_VSBST_CON15 0x8548 +#define MT6332_VSBST_CON16 0x854A +#define MT6332_VSBST_CON17 0x854C +#define MT6332_VSBST_CON18 0x854E +#define MT6332_VSBST_CON19 0x8550 +#define MT6332_VSBST_CON20 0x8552 +#define MT6332_VSBST_CON21 0x8554 +#define MT6332_BUCK_K_CON0 0x8556 +#define MT6332_BUCK_K_CON1 0x8558 +#define MT6332_BUCK_K_CON2 0x855A +#define MT6332_BUCK_K_CON3 0x855C +#define MT6332_BUCK_K_CON4 0x855E +#define MT6332_BUCK_K_CON5 0x8560 +#define MT6332_AUXADC_ADC0 0x8800 +#define MT6332_AUXADC_ADC1 0x8802 +#define MT6332_AUXADC_ADC2 0x8804 +#define MT6332_AUXADC_ADC3 0x8806 +#define MT6332_AUXADC_ADC4 0x8808 +#define MT6332_AUXADC_ADC5 0x880A +#define MT6332_AUXADC_ADC6 0x880C +#define MT6332_AUXADC_ADC7 0x880E +#define MT6332_AUXADC_ADC8 0x8810 +#define MT6332_AUXADC_ADC9 0x8812 +#define MT6332_AUXADC_ADC10 0x8814 +#define MT6332_AUXADC_ADC11 0x8816 +#define MT6332_AUXADC_ADC12 0x8818 +#define MT6332_AUXADC_ADC13 0x881A +#define MT6332_AUXADC_ADC14 0x881C +#define MT6332_AUXADC_ADC15 0x881E +#define MT6332_AUXADC_ADC16 0x8820 +#define MT6332_AUXADC_ADC17 0x8822 +#define MT6332_AUXADC_ADC18 0x8824 +#define MT6332_AUXADC_ADC19 0x8826 +#define MT6332_AUXADC_ADC20 0x8828 +#define MT6332_AUXADC_ADC21 0x882A +#define MT6332_AUXADC_ADC22 0x882C +#define MT6332_AUXADC_ADC23 0x882E +#define MT6332_AUXADC_ADC24 0x8830 +#define MT6332_AUXADC_ADC25 0x8832 +#define MT6332_AUXADC_ADC26 0x8834 +#define MT6332_AUXADC_ADC27 0x8836 +#define MT6332_AUXADC_ADC28 0x8838 +#define MT6332_AUXADC_ADC29 0x883A +#define MT6332_AUXADC_ADC30 0x883C +#define MT6332_AUXADC_ADC31 0x883E +#define MT6332_AUXADC_ADC32 0x8840 +#define MT6332_AUXADC_ADC33 0x8842 +#define MT6332_AUXADC_ADC34 0x8844 +#define MT6332_AUXADC_ADC35 0x8846 +#define MT6332_AUXADC_ADC36 0x8848 +#define MT6332_AUXADC_ADC37 0x884A +#define MT6332_AUXADC_ADC38 0x884C +#define MT6332_AUXADC_ADC39 0x884E +#define MT6332_AUXADC_ADC40 0x8850 +#define MT6332_AUXADC_ADC41 0x8852 +#define MT6332_AUXADC_ADC42 0x8854 +#define MT6332_AUXADC_ADC43 0x8856 +#define MT6332_AUXADC_STA0 0x8858 +#define MT6332_AUXADC_STA1 0x885A +#define MT6332_AUXADC_RQST0 0x885C +#define MT6332_AUXADC_RQST0_SET 0x885E +#define MT6332_AUXADC_RQST0_CLR 0x8860 +#define MT6332_AUXADC_RQST1 0x8862 +#define MT6332_AUXADC_RQST1_SET 0x8864 +#define MT6332_AUXADC_RQST1_CLR 0x8866 +#define MT6332_AUXADC_CON0 0x8868 +#define MT6332_AUXADC_CON1 0x886A +#define MT6332_AUXADC_CON2 0x886C +#define MT6332_AUXADC_CON3 0x886E +#define MT6332_AUXADC_CON4 0x8870 +#define MT6332_AUXADC_CON5 0x8872 +#define MT6332_AUXADC_CON6 0x8874 +#define MT6332_AUXADC_CON7 0x8876 +#define MT6332_AUXADC_CON8 0x8878 +#define MT6332_AUXADC_CON9 0x887A +#define MT6332_AUXADC_CON10 0x887C +#define MT6332_AUXADC_CON11 0x887E +#define MT6332_AUXADC_CON12 0x8880 +#define MT6332_AUXADC_CON13 0x8882 +#define MT6332_AUXADC_CON14 0x8884 +#define MT6332_AUXADC_CON15 0x8886 +#define MT6332_AUXADC_CON16 0x8888 +#define MT6332_AUXADC_CON17 0x888A +#define MT6332_AUXADC_CON18 0x888C +#define MT6332_AUXADC_CON19 0x888E +#define MT6332_AUXADC_CON20 0x8890 +#define MT6332_AUXADC_CON21 0x8892 +#define MT6332_AUXADC_CON22 0x8894 +#define MT6332_AUXADC_CON23 0x8896 +#define MT6332_AUXADC_CON24 0x8898 +#define MT6332_AUXADC_CON25 0x889A +#define MT6332_AUXADC_CON26 0x889C +#define MT6332_AUXADC_CON27 0x889E +#define MT6332_AUXADC_CON28 0x88A0 +#define MT6332_AUXADC_CON29 0x88A2 +#define MT6332_AUXADC_CON30 0x88A4 +#define MT6332_AUXADC_CON31 0x88A6 +#define MT6332_AUXADC_CON32 0x88A8 +#define MT6332_AUXADC_CON33 0x88AA +#define MT6332_AUXADC_CON34 0x88AC +#define MT6332_AUXADC_CON35 0x88AE +#define MT6332_AUXADC_CON36 0x88B0 +#define MT6332_AUXADC_CON37 0x88B2 +#define MT6332_AUXADC_CON38 0x88B4 +#define MT6332_AUXADC_CON39 0x88B6 +#define MT6332_AUXADC_CON40 0x88B8 +#define MT6332_AUXADC_CON41 0x88BA +#define MT6332_AUXADC_CON42 0x88BC +#define MT6332_AUXADC_CON43 0x88BE +#define MT6332_AUXADC_CON44 0x88C0 +#define MT6332_AUXADC_CON45 0x88C2 +#define MT6332_AUXADC_CON46 0x88C4 +#define MT6332_AUXADC_CON47 0x88C6 +#define MT6332_STRUP_CONA0 0x8C00 +#define MT6332_STRUP_CONA1 0x8C02 +#define MT6332_STRUP_CONA2 0x8C04 +#define MT6332_STRUP_CON0 0x8C06 +#define MT6332_STRUP_CON2 0x8C08 +#define MT6332_STRUP_CON3 0x8C0A +#define MT6332_STRUP_CON4 0x8C0C +#define MT6332_STRUP_CON5 0x8C0E +#define MT6332_STRUP_CON6 0x8C10 +#define MT6332_STRUP_CON7 0x8C12 +#define MT6332_STRUP_CON8 0x8C14 +#define MT6332_STRUP_CON9 0x8C16 +#define MT6332_STRUP_CON10 0x8C18 +#define MT6332_STRUP_CON11 0x8C1A +#define MT6332_STRUP_CON12 0x8C1C +#define MT6332_STRUP_CON13 0x8C1E +#define MT6332_STRUP_CON14 0x8C20 +#define MT6332_STRUP_CON15 0x8C22 +#define MT6332_STRUP_CON16 0x8C24 +#define MT6332_STRUP_CON17 0x8C26 +#define MT6332_FGADC_CON0 0x8C28 +#define MT6332_FGADC_CON1 0x8C2A +#define MT6332_FGADC_CON2 0x8C2C +#define MT6332_FGADC_CON3 0x8C2E +#define MT6332_FGADC_CON4 0x8C30 +#define MT6332_FGADC_CON5 0x8C32 +#define MT6332_FGADC_CON6 0x8C34 +#define MT6332_FGADC_CON7 0x8C36 +#define MT6332_FGADC_CON8 0x8C38 +#define MT6332_FGADC_CON9 0x8C3A +#define MT6332_FGADC_CON10 0x8C3C +#define MT6332_FGADC_CON11 0x8C3E +#define MT6332_FGADC_CON12 0x8C40 +#define MT6332_FGADC_CON13 0x8C42 +#define MT6332_FGADC_CON14 0x8C44 +#define MT6332_FGADC_CON15 0x8C46 +#define MT6332_FGADC_CON16 0x8C48 +#define MT6332_FGADC_CON17 0x8C4A +#define MT6332_FGADC_CON18 0x8C4C +#define MT6332_FGADC_CON19 0x8C4E +#define MT6332_FGADC_CON20 0x8C50 +#define MT6332_FGADC_CON21 0x8C52 +#define MT6332_FGADC_CON22 0x8C54 +#define MT6332_OTP_CON0 0x8C56 +#define MT6332_OTP_CON1 0x8C58 +#define MT6332_OTP_CON2 0x8C5A +#define MT6332_OTP_CON3 0x8C5C +#define MT6332_OTP_CON4 0x8C5E +#define MT6332_OTP_CON5 0x8C60 +#define MT6332_OTP_CON6 0x8C62 +#define MT6332_OTP_CON7 0x8C64 +#define MT6332_OTP_CON8 0x8C66 +#define MT6332_OTP_CON9 0x8C68 +#define MT6332_OTP_CON10 0x8C6A +#define MT6332_OTP_CON11 0x8C6C +#define MT6332_OTP_CON12 0x8C6E +#define MT6332_OTP_CON13 0x8C70 +#define MT6332_OTP_CON14 0x8C72 +#define MT6332_OTP_DOUT_0_15 0x8C74 +#define MT6332_OTP_DOUT_16_31 0x8C76 +#define MT6332_OTP_DOUT_32_47 0x8C78 +#define MT6332_OTP_DOUT_48_63 0x8C7A +#define MT6332_OTP_DOUT_64_79 0x8C7C +#define MT6332_OTP_DOUT_80_95 0x8C7E +#define MT6332_OTP_DOUT_96_111 0x8C80 +#define MT6332_OTP_DOUT_112_127 0x8C82 +#define MT6332_OTP_DOUT_128_143 0x8C84 +#define MT6332_OTP_DOUT_144_159 0x8C86 +#define MT6332_OTP_DOUT_160_175 0x8C88 +#define MT6332_OTP_DOUT_176_191 0x8C8A +#define MT6332_OTP_DOUT_192_207 0x8C8C +#define MT6332_OTP_DOUT_208_223 0x8C8E +#define MT6332_OTP_DOUT_224_239 0x8C90 +#define MT6332_OTP_DOUT_240_255 0x8C92 +#define MT6332_OTP_VAL_0_15 0x8C94 +#define MT6332_OTP_VAL_16_31 0x8C96 +#define MT6332_OTP_VAL_32_47 0x8C98 +#define MT6332_OTP_VAL_48_63 0x8C9A +#define MT6332_OTP_VAL_64_79 0x8C9C +#define MT6332_OTP_VAL_80_95 0x8C9E +#define MT6332_OTP_VAL_96_111 0x8CA0 +#define MT6332_OTP_VAL_112_127 0x8CA2 +#define MT6332_OTP_VAL_128_143 0x8CA4 +#define MT6332_OTP_VAL_144_159 0x8CA6 +#define MT6332_OTP_VAL_160_175 0x8CA8 +#define MT6332_OTP_VAL_176_191 0x8CAA +#define MT6332_OTP_VAL_192_207 0x8CAC +#define MT6332_OTP_VAL_208_223 0x8CAE +#define MT6332_OTP_VAL_224_239 0x8CB0 +#define MT6332_OTP_VAL_240_255 0x8CB2 +#define MT6332_LDO_CON0 0x8CB4 +#define MT6332_LDO_CON1 0x8CB6 +#define MT6332_LDO_CON2 0x8CB8 +#define MT6332_LDO_CON3 0x8CBA +#define MT6332_LDO_CON5 0x8CBC +#define MT6332_LDO_CON6 0x8CBE +#define MT6332_LDO_CON7 0x8CC0 +#define MT6332_LDO_CON8 0x8CC2 +#define MT6332_LDO_CON9 0x8CC4 +#define MT6332_LDO_CON10 0x8CC6 +#define MT6332_LDO_CON11 0x8CC8 +#define MT6332_LDO_CON12 0x8CCA +#define MT6332_LDO_CON13 0x8CCC +#define MT6332_FQMTR_CON0 0x8CCE +#define MT6332_FQMTR_CON1 0x8CD0 +#define MT6332_FQMTR_CON2 0x8CD2 +#define MT6332_IWLED_CON0 0x8CD4 +#define MT6332_IWLED_DEG 0x8CD6 +#define MT6332_IWLED_STATUS 0x8CD8 +#define MT6332_IWLED_EN_CTRL 0x8CDA +#define MT6332_IWLED_CON1 0x8CDC +#define MT6332_IWLED_CON2 0x8CDE +#define MT6332_IWLED_TRIM0 0x8CE0 +#define MT6332_IWLED_TRIM1 0x8CE2 +#define MT6332_IWLED_CON3 0x8CE4 +#define MT6332_IWLED_CON4 0x8CE6 +#define MT6332_IWLED_CON5 0x8CE8 +#define MT6332_IWLED_CON6 0x8CEA +#define MT6332_IWLED_CON7 0x8CEC +#define MT6332_IWLED_CON8 0x8CEE +#define MT6332_IWLED_CON9 0x8CF0 +#define MT6332_SPK_CON0 0x8CF2 +#define MT6332_SPK_CON1 0x8CF4 +#define MT6332_SPK_CON2 0x8CF6 +#define MT6332_SPK_CON3 0x8CF8 +#define MT6332_SPK_CON4 0x8CFA +#define MT6332_SPK_CON5 0x8CFC +#define MT6332_SPK_CON6 0x8CFE +#define MT6332_SPK_CON7 0x8D00 +#define MT6332_SPK_CON8 0x8D02 +#define MT6332_SPK_CON9 0x8D04 +#define MT6332_SPK_CON10 0x8D06 +#define MT6332_SPK_CON11 0x8D08 +#define MT6332_SPK_CON12 0x8D0A +#define MT6332_SPK_CON13 0x8D0C +#define MT6332_SPK_CON14 0x8D0E +#define MT6332_SPK_CON15 0x8D10 +#define MT6332_SPK_CON16 0x8D12 +#define MT6332_TESTI_CON0 0x8D14 +#define MT6332_TESTI_CON1 0x8D16 +#define MT6332_TESTI_CON2 0x8D18 +#define MT6332_TESTI_CON3 0x8D1A +#define MT6332_TESTI_CON4 0x8D1C +#define MT6332_TESTI_CON5 0x8D1E +#define MT6332_TESTI_CON6 0x8D20 +#define MT6332_TESTI_MUX_CON0 0x8D22 +#define MT6332_TESTI_MUX_CON1 0x8D24 +#define MT6332_TESTI_MUX_CON2 0x8D26 +#define MT6332_TESTI_MUX_CON3 0x8D28 +#define MT6332_TESTI_MUX_CON4 0x8D2A +#define MT6332_TESTI_MUX_CON5 0x8D2C +#define MT6332_TESTI_MUX_CON6 0x8D2E +#define MT6332_TESTO_CON0 0x8D30 +#define MT6332_TESTO_CON1 0x8D32 +#define MT6332_TEST_OMUX_CON0 0x8D34 +#define MT6332_TEST_OMUX_CON1 0x8D36 +#define MT6332_DEBUG_CON0 0x8D38 +#define MT6332_DEBUG_CON1 0x8D3A +#define MT6332_DEBUG_CON2 0x8D3C +#define MT6332_FGADC_CON23 0x8D3E +#define MT6332_FGADC_CON24 0x8D40 +#define MT6332_FGADC_CON25 0x8D42 +#define MT6332_TOP_RST_STATUS 0x8D44 +#define MT6332_TOP_RST_STATUS_SET 0x8D46 +#define MT6332_TOP_RST_STATUS_CLR 0x8D48 +#define MT6332_VDVFS2_CON28 0x8D4A + +#endif /* __MFD_MT6332_REGISTERS_H__ */ diff --git a/include/linux/mfd/mt6397/core.h b/include/linux/mfd/mt6397/core.h index 3fecaffe5019..627487e26287 100644 --- a/include/linux/mfd/mt6397/core.h +++ b/include/linux/mfd/mt6397/core.h @@ -12,6 +12,8 @@ enum chip_id { MT6323_CHIP_ID = 0x23, + MT6331_CHIP_ID = 0x20, + MT6332_CHIP_ID = 0x20, MT6357_CHIP_ID = 0x57, MT6358_CHIP_ID = 0x58, MT6359_CHIP_ID = 0x59, -- cgit From 9d69ef1838150c7d87afc1a87aa658c637217585 Mon Sep 17 00:00:00 2001 From: Thomas Zimmermann Date: Mon, 18 Jul 2022 09:23:15 +0200 Subject: fbdev/core: Remove remove_conflicting_pci_framebuffers() Remove remove_conflicting_pci_framebuffers() and implement similar functionality in aperture_remove_conflicting_pci_device(), which was the only caller. Removes an otherwise unused interface and streamlines the aperture helper. No functional changes. Signed-off-by: Thomas Zimmermann Reviewed-by: Javier Martinez Canillas Link: https://patchwork.freedesktop.org/patch/msgid/20220718072322.8927-5-tzimmermann@suse.de --- include/linux/fb.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fb.h b/include/linux/fb.h index 07fcd0e56682..b91c77016560 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -615,8 +615,6 @@ extern ssize_t fb_sys_write(struct fb_info *info, const char __user *buf, /* drivers/video/fbmem.c */ extern int register_framebuffer(struct fb_info *fb_info); extern void unregister_framebuffer(struct fb_info *fb_info); -extern int remove_conflicting_pci_framebuffers(struct pci_dev *pdev, - const char *name); extern int remove_conflicting_framebuffers(struct apertures_struct *a, const char *name, bool primary); extern int fb_prepare_logo(struct fb_info *fb_info, int rotate); -- cgit From 15fced5b051e6e22c228a521a5894b12c2ba0892 Mon Sep 17 00:00:00 2001 From: Thomas Zimmermann Date: Mon, 18 Jul 2022 09:23:22 +0200 Subject: fbdev: Remove conflict-handling code Remove the call to do_remove_conflicting_framebuffers() from the framebuffer registration. Aperture helpers take care of removing conflicting devices. With all ownership information stored in the aperture datastrcutures, remove remove_conflicting_framebuffers() entirely. This change also rectifies DRM generic-framebuffer registration, which tried to unregister conflicting framebuffers, even though it's entirely build on top of DRM. v2: * remove internal aperture-overlap helpers, which are now unused Signed-off-by: Thomas Zimmermann Reviewed-by: Javier Martinez Canillas Link: https://patchwork.freedesktop.org/patch/msgid/20220718072322.8927-12-tzimmermann@suse.de --- include/linux/fb.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fb.h b/include/linux/fb.h index b91c77016560..453c3b2b6b8e 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -615,8 +615,6 @@ extern ssize_t fb_sys_write(struct fb_info *info, const char __user *buf, /* drivers/video/fbmem.c */ extern int register_framebuffer(struct fb_info *fb_info); extern void unregister_framebuffer(struct fb_info *fb_info); -extern int remove_conflicting_framebuffers(struct apertures_struct *a, - const char *name, bool primary); extern int fb_prepare_logo(struct fb_info *fb_info, int rotate); extern int fb_show_logo(struct fb_info *fb_info, int rotate); extern char* fb_get_buffer_offset(struct fb_info *info, struct fb_pixmap *buf, u32 size); -- cgit From 32f02a211b0a192553075803b0b819027f96bb43 Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Tue, 19 Jul 2022 13:57:48 +0200 Subject: Revert "platform/chrome: Add Type-C mux set command definitions" This reverts commit 28a6ed8e39f77f6ac613ec9b7461aa75e85fa79a. The chrome platform driver changes need to come in through the platform tree due to some api changes that showed up there that cause build errors in linux-next Reported-by: Stephen Rothwell Link: https://lore.kernel.org/r/20220719160821.5e68e30b@oak.ozlabs.ibm.com Cc: Prashant Malani Signed-off-by: Greg Kroah-Hartman --- include/linux/platform_data/cros_ec_commands.h | 18 ------------------ 1 file changed, 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/cros_ec_commands.h b/include/linux/platform_data/cros_ec_commands.h index a3945c5e7f50..8cfa8cfca77e 100644 --- a/include/linux/platform_data/cros_ec_commands.h +++ b/include/linux/platform_data/cros_ec_commands.h @@ -5722,21 +5722,8 @@ enum typec_control_command { TYPEC_CONTROL_COMMAND_EXIT_MODES, TYPEC_CONTROL_COMMAND_CLEAR_EVENTS, TYPEC_CONTROL_COMMAND_ENTER_MODE, - TYPEC_CONTROL_COMMAND_TBT_UFP_REPLY, - TYPEC_CONTROL_COMMAND_USB_MUX_SET, }; -/* Replies the AP may specify to the TBT EnterMode command as a UFP */ -enum typec_tbt_ufp_reply { - TYPEC_TBT_UFP_REPLY_NAK, - TYPEC_TBT_UFP_REPLY_ACK, -}; - -struct typec_usb_mux_set { - uint8_t mux_index; /* Index of the mux to set in the chain */ - uint8_t mux_flags; /* USB_PD_MUX_*-encoded USB mux state to set */ -} __ec_align1; - struct ec_params_typec_control { uint8_t port; uint8_t command; /* enum typec_control_command */ @@ -5750,8 +5737,6 @@ struct ec_params_typec_control { union { uint32_t clear_events_mask; uint8_t mode_to_enter; /* enum typec_mode */ - uint8_t tbt_ufp_reply; /* enum typec_tbt_ufp_reply */ - struct typec_usb_mux_set mux_params; uint8_t placeholder[128]; }; } __ec_align1; @@ -5830,9 +5815,6 @@ enum tcpc_cc_polarity { #define PD_STATUS_EVENT_SOP_DISC_DONE BIT(0) #define PD_STATUS_EVENT_SOP_PRIME_DISC_DONE BIT(1) #define PD_STATUS_EVENT_HARD_RESET BIT(2) -#define PD_STATUS_EVENT_DISCONNECTED BIT(3) -#define PD_STATUS_EVENT_MUX_0_SET_DONE BIT(4) -#define PD_STATUS_EVENT_MUX_1_SET_DONE BIT(5) struct ec_params_typec_status { uint8_t port; -- cgit From 2b3416ceff5e6bd4922f6d1c61fb68113dd82302 Mon Sep 17 00:00:00 2001 From: Yang Xu Date: Thu, 14 Jul 2022 14:11:25 +0800 Subject: fs: add mode_strip_sgid() helper Add a dedicated helper to handle the setgid bit when creating a new file in a setgid directory. This is a preparatory patch for moving setgid stripping into the vfs. The patch contains no functional changes. Currently the setgid stripping logic is open-coded directly in inode_init_owner() and the individual filesystems are responsible for handling setgid inheritance. Since this has proven to be brittle as evidenced by old issues we uncovered over the last months (see [1] to [3] below) we will try to move this logic into the vfs. Link: e014f37db1a2 ("xfs: use setattr_copy to set vfs inode attributes") [1] Link: 01ea173e103e ("xfs: fix up non-directory creation in SGID directories") [2] Link: fd84bfdddd16 ("ceph: fix up non-directory creation in SGID directories") [3] Link: https://lore.kernel.org/r/1657779088-2242-1-git-send-email-xuyang2018.jy@fujitsu.com Reviewed-by: Darrick J. Wong Reviewed-by: Christian Brauner (Microsoft) Reviewed-and-Tested-by: Jeff Layton Signed-off-by: Yang Xu Signed-off-by: Christian Brauner (Microsoft) --- include/linux/fs.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 9ad5e3520fae..50642668c60f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1903,6 +1903,8 @@ extern long compat_ptr_ioctl(struct file *file, unsigned int cmd, void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode, const struct inode *dir, umode_t mode); extern bool may_open_dev(const struct path *path); +umode_t mode_strip_sgid(struct user_namespace *mnt_userns, + const struct inode *dir, umode_t mode); /* * This is the "filldir" function type, used by readdir() to let -- cgit From fd1894224407c484f652ad456e1ce423e89bb3eb Mon Sep 17 00:00:00 2001 From: Zhengchao Shao Date: Fri, 15 Jul 2022 19:55:59 +0800 Subject: bpf: Don't redirect packets with invalid pkt_len Syzbot found an issue [1]: fq_codel_drop() try to drop a flow whitout any skbs, that is, the flow->head is null. The root cause, as the [2] says, is because that bpf_prog_test_run_skb() run a bpf prog which redirects empty skbs. So we should determine whether the length of the packet modified by bpf prog or others like bpf_prog_test is valid before forwarding it directly. LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5 LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html Reported-by: syzbot+7a12909485b94426aceb@syzkaller.appspotmail.com Signed-off-by: Zhengchao Shao Reviewed-by: Stanislav Fomichev Link: https://lore.kernel.org/r/20220715115559.139691-1-shaozhengchao@huawei.com Signed-off-by: Alexei Starovoitov --- include/linux/skbuff.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index f6a27ab19202..82e8368ba6e6 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2459,6 +2459,14 @@ static inline void skb_set_tail_pointer(struct sk_buff *skb, const int offset) #endif /* NET_SKBUFF_DATA_USES_OFFSET */ +static inline void skb_assert_len(struct sk_buff *skb) +{ +#ifdef CONFIG_DEBUG_NET + if (WARN_ONCE(!skb->len, "%s\n", __func__)) + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); +#endif /* CONFIG_DEBUG_NET */ +} + /* * Add data to an sk_buff */ -- cgit From 800d6acf40e5ce676b53a1259fb4e93e56279367 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Fri, 27 May 2022 17:07:45 +0200 Subject: rcu: tiny: Record kvfree_call_rcu() call stack for KASAN When running KASAN with Tiny RCU (e.g. under ARCH=um, where a working KASAN patch is now available), we don't get any information on the original kfree_rcu() (or similar) caller when a problem is reported, as Tiny RCU doesn't record this. Add the recording, which required pulling kvfree_call_rcu() out of line for the KASAN case since the recording function (kasan_record_aux_stack_noalloc) is neither exported, nor can we include kasan.h into rcutiny.h. without KASAN, the patch has no size impact (ARCH=um kernel): text data bss dec hex filename 6151515 4423154 33148520 43723189 29b29b5 linux 6151515 4423154 33148520 43723189 29b29b5 linux + patch with KASAN, the impact on my build was minimal: text data bss dec hex filename 13915539 7388050 33282304 54585893 340ea25 linux 13911266 7392114 33282304 54585684 340e954 linux + patch -4273 +4064 +-0 -209 Acked-by: Dmitry Vyukov Signed-off-by: Johannes Berg Signed-off-by: Paul E. McKenney --- include/linux/rcutiny.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index 5fed476f977f..d84e13f2c384 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -38,7 +38,7 @@ static inline void synchronize_rcu_expedited(void) */ extern void kvfree(const void *addr); -static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func) +static inline void __kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func) { if (head) { call_rcu(head, func); @@ -51,6 +51,15 @@ static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func) kvfree((void *) func); } +#ifdef CONFIG_KASAN_GENERIC +void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func); +#else +static inline void kvfree_call_rcu(struct rcu_head *head, rcu_callback_t func) +{ + __kvfree_call_rcu(head, func); +} +#endif + void rcu_qs(void); static inline void rcu_softirq_qs(void) -- cgit From 2e5e4185ff89ea0fc44c9dead8dccf306a3b3bbb Mon Sep 17 00:00:00 2001 From: Aya Levin Date: Mon, 4 Jul 2022 19:34:08 +0300 Subject: net/mlx5: Expose ts_cqe_metadata_size2wqe_counter Add capability field which indicates the mask for wqe_counter which connects between loopback CQE and the original WQE. With this connection the driver can identify lost of the loopback CQE and reply PTP synchronization with timestamp given in the original CQE. Signed-off-by: Aya Levin Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 254cc22f5eec..51b4e71017ee 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1833,7 +1833,11 @@ struct mlx5_ifc_cmd_hca_cap_2_bits { u8 sw_vhca_id[0xe]; u8 reserved_at_230[0x10]; - u8 reserved_at_240[0x5c0]; + u8 reserved_at_240[0xb]; + u8 ts_cqe_metadata_size2wqe_counter[0x5]; + u8 reserved_at_250[0x10]; + + u8 reserved_at_260[0x5a0]; }; enum mlx5_ifc_flow_destination_type { -- cgit From 7c701d92b2b5e5175dbfec875816474b802b0c45 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Tue, 12 Jul 2022 21:52:29 +0100 Subject: skbuff: carry external ubuf_info in msghdr Make possible for network in-kernel callers like io_uring to pass in a custom ubuf_info by setting it in a new field of struct msghdr. Signed-off-by: Pavel Begunkov Signed-off-by: Jakub Kicinski --- include/linux/socket.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/socket.h b/include/linux/socket.h index 17311ad9f9af..7bac9fc1cee0 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -69,6 +69,7 @@ struct msghdr { unsigned int msg_flags; /* flags on received message */ __kernel_size_t msg_controllen; /* ancillary data buffer length */ struct kiocb *msg_iocb; /* ptr to iocb for async requests */ + struct ubuf_info *msg_ubuf; }; struct user_msghdr { -- cgit From ebe73a284f4de8c5d401adeccd9b8fe3183b6e95 Mon Sep 17 00:00:00 2001 From: David Ahern Date: Tue, 12 Jul 2022 21:52:30 +0100 Subject: net: Allow custom iter handler in msghdr Add support for custom iov_iter handling to msghdr. The idea is that in-kernel subsystems want control over how an SG is split. Signed-off-by: David Ahern [pavel: move callback into msghdr] Signed-off-by: Pavel Begunkov Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 7 ++++--- include/linux/socket.h | 4 ++++ 2 files changed, 8 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 8e12b3b9ad6c..a8a2dd4cfdfd 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1776,13 +1776,14 @@ void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref); void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg, bool success); -int __zerocopy_sg_from_iter(struct sock *sk, struct sk_buff *skb, - struct iov_iter *from, size_t length); +int __zerocopy_sg_from_iter(struct msghdr *msg, struct sock *sk, + struct sk_buff *skb, struct iov_iter *from, + size_t length); static inline int skb_zerocopy_iter_dgram(struct sk_buff *skb, struct msghdr *msg, int len) { - return __zerocopy_sg_from_iter(skb->sk, skb, &msg->msg_iter, len); + return __zerocopy_sg_from_iter(msg, skb->sk, skb, &msg->msg_iter, len); } int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb, diff --git a/include/linux/socket.h b/include/linux/socket.h index 7bac9fc1cee0..3c11ef18a9cf 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -14,6 +14,8 @@ struct file; struct pid; struct cred; struct socket; +struct sock; +struct sk_buff; #define __sockaddr_check_size(size) \ BUILD_BUG_ON(((size) > sizeof(struct __kernel_sockaddr_storage))) @@ -70,6 +72,8 @@ struct msghdr { __kernel_size_t msg_controllen; /* ancillary data buffer length */ struct kiocb *msg_iocb; /* ptr to iocb for async requests */ struct ubuf_info *msg_ubuf; + int (*sg_from_iter)(struct sock *sk, struct sk_buff *skb, + struct iov_iter *from, size_t length); }; struct user_msghdr { -- cgit From 753f1ca4e1e50248a1b760c9774d6d6b354562cc Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Tue, 12 Jul 2022 21:52:31 +0100 Subject: net: introduce managed frags infrastructure Some users like io_uring can do page pinning more efficiently, so we want a way to delegate referencing to other subsystems. For that add a new flag called SKBFL_MANAGED_FRAG_REFS. When set, skb doesn't hold page references and upper layers are responsivle to managing page lifetime. It's allowed to convert skbs from managed to normal by calling skb_zcopy_downgrade_managed(). The function will take all needed page references and clear the flag. It's needed, for instance, to avoid mixing managed modes. Signed-off-by: Pavel Begunkov Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index a8a2dd4cfdfd..07004593d7ca 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -688,11 +688,16 @@ enum { SKBFL_PURE_ZEROCOPY = BIT(2), SKBFL_DONT_ORPHAN = BIT(3), + + /* page references are managed by the ubuf_info, so it's safe to + * use frags only up until ubuf_info is released + */ + SKBFL_MANAGED_FRAG_REFS = BIT(4), }; #define SKBFL_ZEROCOPY_FRAG (SKBFL_ZEROCOPY_ENABLE | SKBFL_SHARED_FRAG) #define SKBFL_ALL_ZEROCOPY (SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \ - SKBFL_DONT_ORPHAN) + SKBFL_DONT_ORPHAN | SKBFL_MANAGED_FRAG_REFS) /* * The callback notifies userspace to release buffers when skb DMA is done in @@ -1810,6 +1815,11 @@ static inline bool skb_zcopy_pure(const struct sk_buff *skb) return skb_shinfo(skb)->flags & SKBFL_PURE_ZEROCOPY; } +static inline bool skb_zcopy_managed(const struct sk_buff *skb) +{ + return skb_shinfo(skb)->flags & SKBFL_MANAGED_FRAG_REFS; +} + static inline bool skb_pure_zcopy_same(const struct sk_buff *skb1, const struct sk_buff *skb2) { @@ -1884,6 +1894,14 @@ static inline void skb_zcopy_clear(struct sk_buff *skb, bool zerocopy_success) } } +void __skb_zcopy_downgrade_managed(struct sk_buff *skb); + +static inline void skb_zcopy_downgrade_managed(struct sk_buff *skb) +{ + if (unlikely(skb_zcopy_managed(skb))) + __skb_zcopy_downgrade_managed(skb); +} + static inline void skb_mark_not_on_list(struct sk_buff *skb) { skb->next = NULL; @@ -3499,7 +3517,10 @@ static inline void __skb_frag_unref(skb_frag_t *frag, bool recycle) */ static inline void skb_frag_unref(struct sk_buff *skb, int f) { - __skb_frag_unref(&skb_shinfo(skb)->frags[f], skb->pp_recycle); + struct skb_shared_info *shinfo = skb_shinfo(skb); + + if (!skb_zcopy_managed(skb)) + __skb_frag_unref(&shinfo->frags[f], skb->pp_recycle); } /** -- cgit From 84ce071e38a6e25ea3ea91188e5482ac1f17b3af Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Tue, 12 Jul 2022 21:52:32 +0100 Subject: net: introduce __skb_fill_page_desc_noacc Managed pages contain pinned userspace pages and controlled by upper layers, there is no need in tracking skb->pfmemalloc for them. Introduce a helper for filling frags but ignoring page tracking, it'll be needed later. Signed-off-by: Pavel Begunkov Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 28 +++++++++++++++++----------- 1 file changed, 17 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 07004593d7ca..1111adefd906 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2550,6 +2550,22 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb) return skb_headlen(skb) + __skb_pagelen(skb); } +static inline void __skb_fill_page_desc_noacc(struct skb_shared_info *shinfo, + int i, struct page *page, + int off, int size) +{ + skb_frag_t *frag = &shinfo->frags[i]; + + /* + * Propagate page pfmemalloc to the skb if we can. The problem is + * that not all callers have unique ownership of the page but rely + * on page_is_pfmemalloc doing the right thing(tm). + */ + frag->bv_page = page; + frag->bv_offset = off; + skb_frag_size_set(frag, size); +} + /** * __skb_fill_page_desc - initialise a paged fragment in an skb * @skb: buffer containing fragment to be initialised @@ -2566,17 +2582,7 @@ static inline unsigned int skb_pagelen(const struct sk_buff *skb) static inline void __skb_fill_page_desc(struct sk_buff *skb, int i, struct page *page, int off, int size) { - skb_frag_t *frag = &skb_shinfo(skb)->frags[i]; - - /* - * Propagate page pfmemalloc to the skb if we can. The problem is - * that not all callers have unique ownership of the page but rely - * on page_is_pfmemalloc doing the right thing(tm). - */ - frag->bv_page = page; - frag->bv_offset = off; - skb_frag_size_set(frag, size); - + __skb_fill_page_desc_noacc(skb_shinfo(skb), i, page, off, size); page = compound_head(page); if (page_is_pfmemalloc(page)) skb->pfmemalloc = true; -- cgit From e6b8a0a5e7f688e092d1c639d438ccd0b323213c Mon Sep 17 00:00:00 2001 From: Jonathan Cameron Date: Tue, 19 Jul 2022 13:52:44 -0700 Subject: PCI: Add vendor ID for the PCI SIG This ID is used in DOE headers to identify protocols that are defined within the PCI Express Base Specification, PCIe r6.0, sec 6.30.1.1 table 6-32. Acked-by: Bjorn Helgaas Reviewed-by: Davidlohr Bueso Reviewed-by: Dan Williams Signed-off-by: Jonathan Cameron Link: https://lore.kernel.org/r/20220719205249.566684-2-ira.weiny@intel.com Signed-off-by: Dan Williams --- include/linux/pci_ids.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index 0178823ce8c2..8af3b86206b1 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -151,6 +151,7 @@ #define PCI_CLASS_OTHERS 0xff /* Vendors and devices. Sort key: vendor first, device next. */ +#define PCI_VENDOR_ID_PCI_SIG 0x0001 #define PCI_VENDOR_ID_LOONGSON 0x0014 -- cgit From 9d24322e887b6a3d3f9f9c3e76937a646102c8c1 Mon Sep 17 00:00:00 2001 From: Jonathan Cameron Date: Tue, 19 Jul 2022 13:52:46 -0700 Subject: PCI/DOE: Add DOE mailbox support functions Introduced in a PCIe r6.0, sec 6.30, DOE provides a config space based mailbox with standard protocol discovery. Each mailbox is accessed through a DOE Extended Capability. Each DOE mailbox must support the DOE discovery protocol in addition to any number of additional protocols. Define core PCIe functionality to manage a single PCIe DOE mailbox at a defined config space offset. Functionality includes iterating, creating, query of supported protocol, and task submission. Destruction of the mailboxes is device managed. Cc: "Li, Ming" Cc: Bjorn Helgaas Cc: Matthew Wilcox Acked-by: Bjorn Helgaas Signed-off-by: Jonathan Cameron Co-developed-by: Ira Weiny Signed-off-by: Ira Weiny Link: https://lore.kernel.org/r/20220719205249.566684-4-ira.weiny@intel.com Signed-off-by: Dan Williams --- include/linux/pci-doe.h | 77 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 77 insertions(+) create mode 100644 include/linux/pci-doe.h (limited to 'include/linux') diff --git a/include/linux/pci-doe.h b/include/linux/pci-doe.h new file mode 100644 index 000000000000..ed9b4df792b8 --- /dev/null +++ b/include/linux/pci-doe.h @@ -0,0 +1,77 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Data Object Exchange + * PCIe r6.0, sec 6.30 DOE + * + * Copyright (C) 2021 Huawei + * Jonathan Cameron + * + * Copyright (C) 2022 Intel Corporation + * Ira Weiny + */ + +#ifndef LINUX_PCI_DOE_H +#define LINUX_PCI_DOE_H + +struct pci_doe_protocol { + u16 vid; + u8 type; +}; + +struct pci_doe_mb; + +/** + * struct pci_doe_task - represents a single query/response + * + * @prot: DOE Protocol + * @request_pl: The request payload + * @request_pl_sz: Size of the request payload (bytes) + * @response_pl: The response payload + * @response_pl_sz: Size of the response payload (bytes) + * @rv: Return value. Length of received response or error (bytes) + * @complete: Called when task is complete + * @private: Private data for the consumer + * @work: Used internally by the mailbox + * @doe_mb: Used internally by the mailbox + * + * The payload sizes and rv are specified in bytes with the following + * restrictions concerning the protocol. + * + * 1) The request_pl_sz must be a multiple of double words (4 bytes) + * 2) The response_pl_sz must be >= a single double word (4 bytes) + * 3) rv is returned as bytes but it will be a multiple of double words + * + * NOTE there is no need for the caller to initialize work or doe_mb. + */ +struct pci_doe_task { + struct pci_doe_protocol prot; + u32 *request_pl; + size_t request_pl_sz; + u32 *response_pl; + size_t response_pl_sz; + int rv; + void (*complete)(struct pci_doe_task *task); + void *private; + + /* No need for the user to initialize these fields */ + struct work_struct work; + struct pci_doe_mb *doe_mb; +}; + +/** + * pci_doe_for_each_off - Iterate each DOE capability + * @pdev: struct pci_dev to iterate + * @off: u16 of config space offset of each mailbox capability found + */ +#define pci_doe_for_each_off(pdev, off) \ + for (off = pci_find_next_ext_capability(pdev, off, \ + PCI_EXT_CAP_ID_DOE); \ + off > 0; \ + off = pci_find_next_ext_capability(pdev, off, \ + PCI_EXT_CAP_ID_DOE)) + +struct pci_doe_mb *pcim_doe_create_mb(struct pci_dev *pdev, u16 cap_offset); +bool pci_doe_supports_prot(struct pci_doe_mb *doe_mb, u16 vid, u8 type); +int pci_doe_submit_task(struct pci_doe_mb *doe_mb, struct pci_doe_task *task); + +#endif -- cgit From 9d6794feeb90903b10c34bddd9c74c992447ce83 Mon Sep 17 00:00:00 2001 From: Ira Weiny Date: Tue, 19 Jul 2022 13:52:48 -0700 Subject: driver-core: Introduce BIN_ATTR_ADMIN_{RO,RW} MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Many binary attributes need to limit access to CAP_SYS_ADMIN only; ie many binary attributes specify is_visible with 0400 or 0600. Make setting the permissions of such attributes more explicit by defining BIN_ATTR_ADMIN_{RO,RW}. Cc: Bjorn Helgaas Suggested-by: Dan Williams Suggested-by: Krzysztof Wilczyński Reviewed-by: Jonathan Cameron Reviewed-by: Greg Kroah-Hartman Signed-off-by: Ira Weiny Link: https://lore.kernel.org/r/20220719205249.566684-6-ira.weiny@intel.com Signed-off-by: Dan Williams --- include/linux/sysfs.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sysfs.h b/include/linux/sysfs.h index e3f1e8ac1f85..fd3fe5c8c17f 100644 --- a/include/linux/sysfs.h +++ b/include/linux/sysfs.h @@ -235,6 +235,22 @@ struct bin_attribute bin_attr_##_name = __BIN_ATTR_WO(_name, _size) #define BIN_ATTR_RW(_name, _size) \ struct bin_attribute bin_attr_##_name = __BIN_ATTR_RW(_name, _size) + +#define __BIN_ATTR_ADMIN_RO(_name, _size) { \ + .attr = { .name = __stringify(_name), .mode = 0400 }, \ + .read = _name##_read, \ + .size = _size, \ +} + +#define __BIN_ATTR_ADMIN_RW(_name, _size) \ + __BIN_ATTR(_name, 0600, _name##_read, _name##_write, _size) + +#define BIN_ATTR_ADMIN_RO(_name, _size) \ +struct bin_attribute bin_attr_##_name = __BIN_ATTR_ADMIN_RO(_name, _size) + +#define BIN_ATTR_ADMIN_RW(_name, _size) \ +struct bin_attribute bin_attr_##_name = __BIN_ATTR_ADMIN_RW(_name, _size) + struct sysfs_ops { ssize_t (*show)(struct kobject *, struct attribute *, char *); ssize_t (*store)(struct kobject *, struct attribute *, const char *, size_t); -- cgit From d0637c505f8a1d8c4088642f1f3e9e3b22da14f6 Mon Sep 17 00:00:00 2001 From: Barry Song Date: Wed, 20 Jul 2022 21:37:37 +1200 Subject: arm64: enable THP_SWAP for arm64 THP_SWAP has been proven to improve the swap throughput significantly on x86_64 according to commit bd4c82c22c367e ("mm, THP, swap: delay splitting THP after swapped out"). As long as arm64 uses 4K page size, it is quite similar with x86_64 by having 2MB PMD THP. THP_SWAP is architecture-independent, thus, enabling it on arm64 will benefit arm64 as well. A corner case is that MTE has an assumption that only base pages can be swapped. We won't enable THP_SWAP for ARM64 hardware with MTE support until MTE is reworked to coexist with THP_SWAP. A micro-benchmark is written to measure thp swapout throughput as below, unsigned long long tv_to_ms(struct timeval tv) { return tv.tv_sec * 1000 + tv.tv_usec / 1000; } main() { struct timeval tv_b, tv_e;; #define SIZE 400*1024*1024 volatile void *p = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (!p) { perror("fail to get memory"); exit(-1); } madvise(p, SIZE, MADV_HUGEPAGE); memset(p, 0x11, SIZE); /* write to get mem */ gettimeofday(&tv_b, NULL); madvise(p, SIZE, MADV_PAGEOUT); gettimeofday(&tv_e, NULL); printf("swp out bandwidth: %ld bytes/ms\n", SIZE/(tv_to_ms(tv_e) - tv_to_ms(tv_b))); } Testing is done on rk3568 64bit Quad Core Cortex-A55 platform - ROCK 3A. thp swp throughput w/o patch: 2734bytes/ms (mean of 10 tests) thp swp throughput w/ patch: 3331bytes/ms (mean of 10 tests) Cc: "Huang, Ying" Cc: Minchan Kim Cc: Johannes Weiner Cc: Hugh Dickins Cc: Andrea Arcangeli Cc: Steven Price Cc: Yang Shi Reviewed-by: Anshuman Khandual Signed-off-by: Barry Song Link: https://lore.kernel.org/r/20220720093737.133375-1-21cnbao@gmail.com Signed-off-by: Will Deacon --- include/linux/huge_mm.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index de29821231c9..4ddaf6ad73ef 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -461,4 +461,16 @@ static inline int split_folio_to_list(struct folio *folio, return split_huge_page_to_list(&folio->page, list); } +/* + * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to + * limitations in the implementation like arm64 MTE can override this to + * false + */ +#ifndef arch_thp_swp_supported +static inline bool arch_thp_swp_supported(void) +{ + return true; +} +#endif + #endif /* _LINUX_HUGE_MM_H */ -- cgit From 7327b16f5f56741960e11ae4d7ef0ffdff5fd252 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Wed, 20 Jul 2022 18:51:21 +0800 Subject: APCI: irq: Add support for multiple GSI domains In an unfortunate departure from the ACPI spec, the LoongArch architecture split its GSI space across multiple interrupt controllers. In order to be able to reuse the core code and prevent architectures from reinventing an already square wheel, offer the arch code the ability to register a dispatcher function that will return the domain fwnode for a given GSI. The ARM GIC drivers are updated to support this (with a single domain, as intended). Signed-off-by: Marc Zyngier Cc: Hanjun Guo Cc: Lorenzo Pieralisi Signed-off-by: Jianmin Lv Tested-by: Hanjun Guo Reviewed-by: Hanjun Guo Link: https://lore.kernel.org/r/1658314292-35346-3-git-send-email-lvjianmin@loongson.cn --- include/linux/acpi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 4f82a5bc6d98..957e23f727ea 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -356,7 +356,7 @@ int acpi_gsi_to_irq (u32 gsi, unsigned int *irq); int acpi_isa_irq_to_gsi (unsigned isa_irq, u32 *gsi); void acpi_set_irq_model(enum acpi_irq_model_id model, - struct fwnode_handle *fwnode); + struct fwnode_handle *(*)(u32)); struct irq_domain *acpi_irq_create_hierarchy(unsigned int flags, unsigned int size, -- cgit From 744b9a0c3c8334d705dea6af3645ea30d597c360 Mon Sep 17 00:00:00 2001 From: Marc Zyngier Date: Wed, 20 Jul 2022 18:51:22 +0800 Subject: ACPI: irq: Allow acpi_gsi_to_irq() to have an arch-specific fallback It appears that the generic version of acpi_gsi_to_irq() doesn't fallback to establishing a mapping if there is no pre-existing one while the x86 version does. While arm64 seems unaffected by it, LoongArch is relying on the x86 behaviour. In an effort to prevent new architectures from reinventing the proverbial wheel, provide an optional callback that the arch code can set to restore the x86 behaviour. Hopefully we can eventually get rid of this in the future once the expected behaviour has been clarified. Reported-by: Jianmin Lv Signed-off-by: Marc Zyngier Signed-off-by: Jianmin Lv Tested-by: Hanjun Guo Reviewed-by: Hanjun Guo Link: https://lore.kernel.org/r/1658314292-35346-4-git-send-email-lvjianmin@loongson.cn --- include/linux/acpi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 957e23f727ea..e2b60d53ca60 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -357,6 +357,7 @@ int acpi_isa_irq_to_gsi (unsigned isa_irq, u32 *gsi); void acpi_set_irq_model(enum acpi_irq_model_id model, struct fwnode_handle *(*)(u32)); +void acpi_set_gsi_to_irq_fallback(u32 (*)(u32)); struct irq_domain *acpi_irq_create_hierarchy(unsigned int flags, unsigned int size, -- cgit From d319a299f4066685a787cfb89ad36fd78bb830ed Mon Sep 17 00:00:00 2001 From: Jianmin Lv Date: Wed, 20 Jul 2022 18:51:23 +0800 Subject: genirq/generic_chip: Export irq_unmap_generic_chip Some irq controllers have to re-implement a private version for irq_generic_chip_ops, because they have a different xlate to translate hwirq. Export irq_unmap_generic_chip to allow reusing in drivers. Signed-off-by: Jianmin Lv Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/1658314292-35346-5-git-send-email-lvjianmin@loongson.cn --- include/linux/irq.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/irq.h b/include/linux/irq.h index 505308253d23..83a4574308fa 100644 --- a/include/linux/irq.h +++ b/include/linux/irq.h @@ -1121,6 +1121,7 @@ int irq_gc_set_wake(struct irq_data *d, unsigned int on); /* Setup functions for irq_chip_generic */ int irq_map_generic_chip(struct irq_domain *d, unsigned int virq, irq_hw_number_t hw_irq); +void irq_unmap_generic_chip(struct irq_domain *d, unsigned int virq); struct irq_chip_generic * irq_alloc_generic_chip(const char *name, int nr_ct, unsigned int irq_base, void __iomem *reg_base, irq_flow_handler_t handler); -- cgit From dd281e1a1a937ee2f13bd0db5be78e5f5b811ca7 Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Wed, 20 Jul 2022 18:51:30 +0800 Subject: irqchip: Add Loongson Extended I/O interrupt controller support EIOINTC stands for "Extended I/O Interrupts" that described in Section 11.2 of "Loongson 3A5000 Processor Reference Manual". For more information please refer Documentation/loongarch/irq-chip-model.rst. Loongson-3A5000 has 4 cores per NUMA node, and each NUMA node has an EIOINTC; while Loongson-3C5000 has 16 cores per NUMA node, and each NUMA node has 4 EIOINTCs. In other words, 16 cores of one NUMA node in Loongson-3C5000 are organized in 4 groups, each group connects to an EIOINTC. We call the "group" here as an EIOINTC node, so each EIOINTC node always includes 4 cores (both in Loongson-3A5000 and Loongson- 3C5000). Co-developed-by: Jianmin Lv Signed-off-by: Jianmin Lv Signed-off-by: Huacai Chen Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/1658314292-35346-12-git-send-email-lvjianmin@loongson.cn --- include/linux/cpuhotplug.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index 19f0dbfdd7fe..de662f3a6cee 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -151,6 +151,7 @@ enum cpuhp_state { CPUHP_AP_IRQ_BCM2836_STARTING, CPUHP_AP_IRQ_MIPS_GIC_STARTING, CPUHP_AP_IRQ_RISCV_STARTING, + CPUHP_AP_IRQ_LOONGARCH_STARTING, CPUHP_AP_IRQ_SIFIVE_PLIC_STARTING, CPUHP_AP_ARM_MVEBU_COHERENCY, CPUHP_AP_MICROCODE_LOADER, -- cgit From e8bba72b396cef7c919c73710f3c5884521adb4e Mon Sep 17 00:00:00 2001 From: Jianmin Lv Date: Wed, 20 Jul 2022 18:51:32 +0800 Subject: irqchip / ACPI: Introduce ACPI_IRQ_MODEL_LPIC for LoongArch For LoongArch, ACPI_IRQ_MODEL_LPIC is introduced, and then the callback acpi_get_gsi_domain_id and acpi_gsi_to_irq_fallback are implemented. The acpi_get_gsi_domain_id callback returns related fwnode handle of irqdomain for different GSI range. The acpi_gsi_to_irq_fallback will create new mapping for gsi when the mapping of it is not found. Signed-off-by: Jianmin Lv Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/1658314292-35346-14-git-send-email-lvjianmin@loongson.cn --- include/linux/acpi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index e2b60d53ca60..76520f379313 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -105,6 +105,7 @@ enum acpi_irq_model_id { ACPI_IRQ_MODEL_IOSAPIC, ACPI_IRQ_MODEL_PLATFORM, ACPI_IRQ_MODEL_GIC, + ACPI_IRQ_MODEL_LPIC, ACPI_IRQ_MODEL_COUNT }; -- cgit From d80f7d7b2c75c5954d335dffbccca62a5002c3e0 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Tue, 21 Jun 2022 14:38:52 -0500 Subject: signal: Guarantee that SIGNAL_GROUP_EXIT is set on process exit Track how many threads have not started exiting and when the last thread starts exiting set SIGNAL_GROUP_EXIT. This guarantees that SIGNAL_GROUP_EXIT will get set when a process exits. In practice this achieves nothing as glibc's implementation of _exit calls sys_group_exit then sys_exit. While glibc's implemenation of pthread_exit calls exit (which cleansup and calls _exit) if it is the last thread and sys_exit if it is the last thread. This means the only way the kernel might observe a process that does not set call exit_group is if the language runtime does not use glibc. With more cleanups I hope to move the decrement of quick_threads earlier. Link: https://lkml.kernel.org/r/87bkukd4tc.fsf_-_@email.froward.int.ebiederm.org Signed-off-by: "Eric W. Biederman" --- include/linux/sched/signal.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h index cafbe03eed01..20099268fa25 100644 --- a/include/linux/sched/signal.h +++ b/include/linux/sched/signal.h @@ -94,6 +94,7 @@ struct signal_struct { refcount_t sigcnt; atomic_t live; int nr_threads; + int quick_threads; struct list_head thread_head; wait_queue_head_t wait_chldexit; /* for wait4() */ -- cgit From f8faf3496633b302a6591fda599540a0b53a35bb Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Tue, 19 Jul 2022 14:52:51 -0500 Subject: x86/amd_nb: Add AMD PCI IDs for SMN communication Add support for SMN communication on family 17h model A0h and family 19h models 60h-70h. [ bp: Merge into a single patch. ] Signed-off-by: Mario Limonciello Signed-off-by: Borislav Petkov Reviewed-by: Yazen Ghannam Acked-by: Bjorn Helgaas # pci_ids.h Acked-by: Guenter Roeck Link: https://lore.kernel.org/r/20220719195256.1516-1-mario.limonciello@amd.com --- include/linux/pci_ids.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index 0178823ce8c2..7fa460ccf7fa 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -556,10 +556,13 @@ #define PCI_DEVICE_ID_AMD_17H_M30H_DF_F3 0x1493 #define PCI_DEVICE_ID_AMD_17H_M60H_DF_F3 0x144b #define PCI_DEVICE_ID_AMD_17H_M70H_DF_F3 0x1443 +#define PCI_DEVICE_ID_AMD_17H_MA0H_DF_F3 0x1727 #define PCI_DEVICE_ID_AMD_19H_DF_F3 0x1653 #define PCI_DEVICE_ID_AMD_19H_M10H_DF_F3 0x14b0 #define PCI_DEVICE_ID_AMD_19H_M40H_DF_F3 0x167c #define PCI_DEVICE_ID_AMD_19H_M50H_DF_F3 0x166d +#define PCI_DEVICE_ID_AMD_19H_M60H_DF_F3 0x14e3 +#define PCI_DEVICE_ID_AMD_19H_M70H_DF_F3 0x14f3 #define PCI_DEVICE_ID_AMD_CNB17H_F3 0x1703 #define PCI_DEVICE_ID_AMD_LANCE 0x2000 #define PCI_DEVICE_ID_AMD_LANCE_HOME 0x2001 -- cgit From ce4b4657ff18925c315855aa290e93c5fa652d96 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Tue, 19 Jul 2022 21:02:48 -0300 Subject: vfio: Replace the DMA unmapping notifier with a callback Instead of having drivers register the notifier with explicit code just have them provide a dma_unmap callback op in their driver ops and rely on the core code to wire it up. Suggested-by: Christoph Hellwig Reviewed-by: Christoph Hellwig Reviewed-by: Kevin Tian Reviewed-by: Tony Krowiak Reviewed-by: Eric Farman Reviewed-by: Zhenyu Wang Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/1-v4-681e038e30fd+78-vfio_unmap_notif_jgg@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 21 ++++----------------- 1 file changed, 4 insertions(+), 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 4d26e149db81..1f9fc7a9be9e 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -49,6 +49,7 @@ struct vfio_device { unsigned int open_count; struct completion comp; struct list_head group_next; + struct notifier_block iommu_nb; }; /** @@ -65,6 +66,8 @@ struct vfio_device { * @match: Optional device name match callback (return: 0 for no-match, >0 for * match, -errno for abort (ex. match with insufficient or incorrect * additional args) + * @dma_unmap: Called when userspace unmaps IOVA from the container + * this device is attached to. * @device_feature: Optional, fill in the VFIO_DEVICE_FEATURE ioctl */ struct vfio_device_ops { @@ -80,6 +83,7 @@ struct vfio_device_ops { int (*mmap)(struct vfio_device *vdev, struct vm_area_struct *vma); void (*request)(struct vfio_device *vdev, unsigned int count); int (*match)(struct vfio_device *vdev, char *buf); + void (*dma_unmap)(struct vfio_device *vdev, u64 iova, u64 length); int (*device_feature)(struct vfio_device *device, u32 flags, void __user *arg, size_t argsz); }; @@ -164,23 +168,6 @@ int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn, int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, void *data, size_t len, bool write); -/* each type has independent events */ -enum vfio_notify_type { - VFIO_IOMMU_NOTIFY = 0, -}; - -/* events for VFIO_IOMMU_NOTIFY */ -#define VFIO_IOMMU_NOTIFY_DMA_UNMAP BIT(0) - -int vfio_register_notifier(struct vfio_device *device, - enum vfio_notify_type type, - unsigned long *required_events, - struct notifier_block *nb); -int vfio_unregister_notifier(struct vfio_device *device, - enum vfio_notify_type type, - struct notifier_block *nb); - - /* * Sub-module helpers */ -- cgit From 8cfc5b60751bcf9b4c6bbab3f6a72d59e0156a89 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Tue, 19 Jul 2022 21:02:49 -0300 Subject: vfio: Replace the iommu notifier with a device list Instead of bouncing the function call to the driver op through a blocking notifier just have the iommu layer call it directly. Register each device that is being attached to the iommu with the lower driver which then threads them on a linked list and calls the appropriate driver op at the right time. Currently the only use is if dma_unmap() is defined. Also, fully lock all the debugging tests on the pinning path that a dma_unmap is registered. Reviewed-by: Christoph Hellwig Reviewed-by: Kevin Tian Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/2-v4-681e038e30fd+78-vfio_unmap_notif_jgg@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 1f9fc7a9be9e..19cefbaa3d06 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -49,7 +49,7 @@ struct vfio_device { unsigned int open_count; struct completion comp; struct list_head group_next; - struct notifier_block iommu_nb; + struct list_head iommu_entry; }; /** -- cgit From 9cb61fda8c71baaff423fe5be2e609100860fa78 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Wed, 20 Jul 2022 08:52:20 -0700 Subject: bpf: Fix bpf_trampoline_{,un}link_cgroup_shim ifdef guards They were updated in kernel/bpf/trampoline.c to fix another build issue. We should to do the same for include/linux/bpf.h header. Fixes: 3908fcddc65d ("bpf: fix lsm_cgroup build errors on esoteric configs") Reported-by: Stephen Rothwell Signed-off-by: Stanislav Fomichev Signed-off-by: Daniel Borkmann Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/bpf/20220720155220.4087433-1-sdf@google.com --- include/linux/bpf.h | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a5bf00649995..11950029284f 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1254,9 +1254,6 @@ struct bpf_dummy_ops { int bpf_struct_ops_test_run(struct bpf_prog *prog, const union bpf_attr *kattr, union bpf_attr __user *uattr); #endif -int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog, - int cgroup_atype); -void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog); #else static inline const struct bpf_struct_ops *bpf_struct_ops_find(u32 type_id) { @@ -1280,6 +1277,13 @@ static inline int bpf_struct_ops_map_sys_lookup_elem(struct bpf_map *map, { return -EINVAL; } +#endif + +#if defined(CONFIG_CGROUP_BPF) && defined(CONFIG_BPF_LSM) +int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog, + int cgroup_atype); +void bpf_trampoline_unlink_cgroup_shim(struct bpf_prog *prog); +#else static inline int bpf_trampoline_link_cgroup_shim(struct bpf_prog *prog, int cgroup_atype) { -- cgit From a1a5482a2c6e38a3ebed32e571625c56a8cc41a6 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 17 Jun 2022 16:52:06 +0200 Subject: x86/extable: Fix ex_handler_msr() print condition On Fri, Jun 17, 2022 at 02:08:52PM +0300, Stephane Eranian wrote: > Some changes to the way invalid MSR accesses are reported by the > kernel is causing some problems with messages printed on the > console. > > We have seen several cases of ex_handler_msr() printing invalid MSR > accesses once but the callstack multiple times causing confusion on > the console. > The problem here is that another earlier commit (5.13): > > a358f40600b3 ("once: implement DO_ONCE_LITE for non-fast-path "do once" functionality") > > Modifies all the pr_*_once() calls to always return true claiming > that no caller is ever checking the return value of the functions. > > This is why we are seeing the callstack printed without the > associated printk() msg. Extract the ONCE_IF(cond) part into __ONCE_LTE_IF() and use that to implement DO_ONCE_LITE_IF() and fix the extable code. Fixes: a358f40600b3 ("once: implement DO_ONCE_LITE for non-fast-path "do once" functionality") Reported-by: Stephane Eranian Signed-off-by: Peter Zijlstra (Intel) Tested-by: Stephane Eranian Link: https://lkml.kernel.org/r/YqyVFsbviKjVGGZ9@worktop.programming.kicks-ass.net --- include/linux/once_lite.h | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/once_lite.h b/include/linux/once_lite.h index 861e606b820f..b7bce4983638 100644 --- a/include/linux/once_lite.h +++ b/include/linux/once_lite.h @@ -9,15 +9,27 @@ */ #define DO_ONCE_LITE(func, ...) \ DO_ONCE_LITE_IF(true, func, ##__VA_ARGS__) -#define DO_ONCE_LITE_IF(condition, func, ...) \ + +#define __ONCE_LITE_IF(condition) \ ({ \ static bool __section(".data.once") __already_done; \ - bool __ret_do_once = !!(condition); \ + bool __ret_cond = !!(condition); \ + bool __ret_once = false; \ \ - if (unlikely(__ret_do_once && !__already_done)) { \ + if (unlikely(__ret_cond && !__already_done)) { \ __already_done = true; \ - func(__VA_ARGS__); \ + __ret_once = true; \ } \ + unlikely(__ret_once); \ + }) + +#define DO_ONCE_LITE_IF(condition, func, ...) \ + ({ \ + bool __ret_do_once = !!(condition); \ + \ + if (__ONCE_LITE_IF(__ret_do_once)) \ + func(__VA_ARGS__); \ + \ unlikely(__ret_do_once); \ }) -- cgit From 5588d628027092e66195097bdf6835ddf64418b3 Mon Sep 17 00:00:00 2001 From: Łukasz Spintzyk Date: Wed, 20 Jul 2022 08:05:18 +0200 Subject: net/cdc_ncm: Increase NTB max RX/TX values to 64kb MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit DisplayLink ethernet devices require NTB buffers larger then 32kb in order to run with highest performance. This patch is changing upper limit of the rx and tx buffers. Those buffers are initialized with CDC_NCM_NTB_DEF_SIZE_RX and CDC_NCM_NTB_DEF_SIZE_TX which is 16kb so by default no device is affected by increased limit. Rx and tx buffer is increased under two conditions: - Device need to advertise that it supports higher buffer size in dwNtbMaxInMaxSize and dwNtbMaxOutMaxSize. - cdc_ncm/rx_max and cdc_ncm/tx_max driver parameters must be adjusted with udev rule or ethtool. Summary of testing and performance results: Tests were performed on following devices: - DisplayLink DL-3xxx family device - DisplayLink DL-6xxx family device - ASUS USB-C2500 2.5G USB3 ethernet adapter - Plugable USB3 1G USB3 ethernet adapter - EDIMAX EU-4307 USB-C ethernet adapter - Dell DBQBCBC064 USB-C ethernet adapter Performance measurements were done with: - iperf3 between two linux boxes - http://openspeedtest.com/ instance running on local test machine Insights from tests results: - All except one from third party usb adapters were not affected by increased buffer size to their advertised dwNtbOutMaxSize and dwNtbInMaxSize. Devices were generally reaching 912-940Mbps both download and upload. Only EDIMAX adapter experienced decreased download size from 929Mbps to 827Mbps with iper3, with openspeedtest decrease was from 968Mbps to 886Mbps. - DisplayLink DL-3xxx family devices experienced performance increase with iperf3 download from 300Mbps to 870Mbps and upload from 782Mbps to 844Mbps. With openspeedtest download increased from 556Mbps to 873Mbps and upload from 727Mbps to 973Mbps - DiplayLink DL-6xxx family devices are not affected by increased buffer size. Signed-off-by: Łukasz Spintzyk Acked-by: Greg Kroah-Hartman Link: https://lore.kernel.org/r/20220720060518.541-2-lukasz.spintzyk@synaptics.com Signed-off-by: Paolo Abeni --- include/linux/usb/cdc_ncm.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/usb/cdc_ncm.h b/include/linux/usb/cdc_ncm.h index f7cb3ddce7fb..2d207cb4837d 100644 --- a/include/linux/usb/cdc_ncm.h +++ b/include/linux/usb/cdc_ncm.h @@ -53,8 +53,8 @@ #define USB_CDC_NCM_NDP32_LENGTH_MIN 0x20 /* Maximum NTB length */ -#define CDC_NCM_NTB_MAX_SIZE_TX 32768 /* bytes */ -#define CDC_NCM_NTB_MAX_SIZE_RX 32768 /* bytes */ +#define CDC_NCM_NTB_MAX_SIZE_TX 65536 /* bytes */ +#define CDC_NCM_NTB_MAX_SIZE_RX 65536 /* bytes */ /* Initial NTB length */ #define CDC_NCM_NTB_DEF_SIZE_TX 16384 /* bytes */ -- cgit From e0c7ea83f006ce8c3264ef8b6508a891d886ad4f Mon Sep 17 00:00:00 2001 From: Shengjiu Wang Date: Thu, 7 Jul 2022 11:00:29 +0800 Subject: dmaengine: imx-sdma: Add FIFO stride support for multi FIFO script The peripheral may have several FIFOs, but some case just select some FIFOs from them for data transfer, which means FIFO0 and FIFO2 may be selected. So add FIFO address stride support, 0 means all FIFOs are continuous, 1 means 1 word stride between FIFOs. All stride between FIFOs should be same. Another option words_per_fifo means how many audio channel data copied to one FIFO one time, 1 means one channel per FIFO, 2 means 2 channels per FIFO. If 'n_fifos_src = 4' and 'words_per_fifo = 2', it means the first two words(channels) fetch from FIFO0 and then jump to FIFO1 for next two words, and so on after the last FIFO3 fetched, roll back to FIFO0. Signed-off-by: Joy Zou Signed-off-by: Shengjiu Wang Link: https://lore.kernel.org/r/1657162829-9273-1-git-send-email-shengjiu.wang@nxp.com Signed-off-by: Vinod Koul --- include/linux/dma/imx-dma.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dma/imx-dma.h b/include/linux/dma/imx-dma.h index 8887762360d4..f487a4fa103a 100644 --- a/include/linux/dma/imx-dma.h +++ b/include/linux/dma/imx-dma.h @@ -70,6 +70,16 @@ static inline int imx_dma_is_general_purpose(struct dma_chan *chan) * struct sdma_peripheral_config - SDMA config for audio * @n_fifos_src: Number of FIFOs for recording * @n_fifos_dst: Number of FIFOs for playback + * @stride_fifos_src: FIFO address stride for recording, 0 means all FIFOs are + * continuous, 1 means 1 word stride between FIFOs. All stride + * between FIFOs should be same. + * @stride_fifos_dst: FIFO address stride for playback + * @words_per_fifo: numbers of words per FIFO fetch/fill, 1 means + * one channel per FIFO, 2 means 2 channels per FIFO.. + * If 'n_fifos_src = 4' and 'words_per_fifo = 2', it + * means the first two words(channels) fetch from FIFO0 + * and then jump to FIFO1 for next two words, and so on + * after the last FIFO3 fetched, roll back to FIFO0. * @sw_done: Use software done. Needed for PDM (micfil) * * Some i.MX Audio devices (SAI, micfil) have multiple successive FIFO @@ -82,6 +92,9 @@ static inline int imx_dma_is_general_purpose(struct dma_chan *chan) struct sdma_peripheral_config { int n_fifos_src; int n_fifos_dst; + int stride_fifos_src; + int stride_fifos_dst; + int words_per_fifo; bool sw_done; }; -- cgit From 974854ab0728532600c72e41a44d6ce1cf8f20a4 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Tue, 12 Jul 2022 18:37:54 -0700 Subject: cxl/acpi: Track CXL resources in iomem_resource Recall that CXL capable address ranges, on ACPI platforms, are published in the CEDT.CFMWS (CXL Early Discovery Table: CXL Fixed Memory Window Structures). These windows represent both the actively mapped capacity and the potential address space that can be dynamically assigned to a new CXL decode configuration (region / interleave-set). CXL endpoints like DDR DIMMs can be mapped at any physical address including 0 and legacy ranges. There is an expectation and requirement that the /proc/iomem interface and the iomem_resource tree in the kernel reflect the full set of platform address ranges. I.e. that every address range that platform firmware and bus drivers enumerate be reflected as an iomem_resource entry. The hard requirement to do this for CXL arises from the fact that facilities like CONFIG_DEVICE_PRIVATE expect to be able to treat empty iomem_resource ranges as free for software to use as proxy address space. Without CXL publishing its potential address ranges in iomem_resource, the CONFIG_DEVICE_PRIVATE mechanism may inadvertently steal capacity reserved for runtime provisioning of new CXL regions. So, iomem_resource needs to know about both active and potential CXL resource ranges. The active CXL resources might already be reflected in iomem_resource as "System RAM". insert_resource_expand_to_fit() handles re-parenting "System RAM" underneath a CXL window. The "_expand_to_fit()" behavior handles cases where a CXL window is not a strict superset of an existing entry in the iomem_resource tree. The "_expand_to_fit()" behavior is acceptable from the perspective of resource allocation. The expansion happens because a conflicting resource range is already populated, which means the resource boundary expansion does not result in any additional free CXL address space being made available. CXL address space allocation is always bounded by the orginal unexpanded address range. However, the potential for expansion does mean that something like walk_iomem_res_desc(IORES_DESC_CXL...) can only return fuzzy answers on corner case platforms that cause the resource tree to expand a CXL window resource over a range that is not decoded by CXL. This would be an odd platform configuration, but if it becomes a problem in practice the CXL subsytem could just publish an API that returns definitive answers. Cc: Andrew Morton Cc: David Hildenbrand Cc: Jason Gunthorpe Cc: Tony Luck Cc: Christoph Hellwig Reviewed-by: Jonathan Cameron Acked-by: Greg Kroah-Hartman Link: https://lore.kernel.org/r/165784325943.1758207.5310344844375305118.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams --- include/linux/ioport.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/ioport.h b/include/linux/ioport.h index ec5f71f7135b..79d1ad6d6275 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -141,6 +141,7 @@ enum { IORES_DESC_DEVICE_PRIVATE_MEMORY = 6, IORES_DESC_RESERVED = 7, IORES_DESC_SOFT_RESERVED = 8, + IORES_DESC_CXL = 9, }; /* -- cgit From 14b80582c43e4f550acfd93c2b2cadbe36ea0874 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Fri, 20 May 2022 13:41:24 -0700 Subject: resource: Introduce alloc_free_mem_region() The core of devm_request_free_mem_region() is a helper that searches for free space in iomem_resource and performs __request_region_locked() on the result of that search. The policy choices of the implementation conform to what CONFIG_DEVICE_PRIVATE users want which is memory that is immediately marked busy, and a preference to search for the first-fit free range in descending order from the top of the physical address space. CXL has a need for a similar allocator, but with the following tweaks: 1/ Search for free space in ascending order 2/ Search for free space relative to a given CXL window 3/ 'insert' rather than 'request' the new resource given downstream drivers from the CXL Region driver (like the pmem or dax drivers) are responsible for request_mem_region() when they activate the memory range. Rework __request_free_mem_region() into get_free_mem_region() which takes a set of GFR_* (Get Free Region) flags to control the allocation policy (ascending vs descending), and "busy" policy (insert_resource() vs request_region()). As part of the consolidation of the legacy GFR_REQUEST_REGION case with the new default of just inserting a new resource into the free space some minor cleanups like not checking for NULL before calling devres_free() (which does its own check) is included. Suggested-by: Jason Gunthorpe Link: https://lore.kernel.org/linux-cxl/20220420143406.GY2120790@nvidia.com/ Cc: Matthew Wilcox Cc: Christoph Hellwig Reviewed-by: Jonathan Cameron Link: https://lore.kernel.org/r/165784333333.1758207.13703329337805274043.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams --- include/linux/ioport.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 79d1ad6d6275..616b683563a9 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -330,6 +330,8 @@ struct resource *devm_request_free_mem_region(struct device *dev, struct resource *base, unsigned long size); struct resource *request_free_mem_region(struct resource *base, unsigned long size, const char *name); +struct resource *alloc_free_mem_region(struct resource *base, + unsigned long size, unsigned long align, const char *name); static inline void irqresource_disabled(struct resource *res, u32 irq) { -- cgit From d96c52fe4907c68adc5e61a0bef7aec0933223d5 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Fri, 15 Apr 2022 10:55:42 -0700 Subject: rcu: Add polled expedited grace-period primitives This commit adds expedited grace-period functionality to RCU's polled grace-period API, adding start_poll_synchronize_rcu_expedited() and cond_synchronize_rcu_expedited(), which are similar to the existing start_poll_synchronize_rcu() and cond_synchronize_rcu() functions, respectively. Note that although start_poll_synchronize_rcu_expedited() can be invoked very early, the resulting expedited grace periods are not guaranteed to start until after workqueues are fully initialized. On the other hand, both synchronize_rcu() and synchronize_rcu_expedited() can also be invoked very early, and the resulting grace periods will be taken into account as they occur. [ paulmck: Apply feedback from Neeraj Upadhyay. ] Link: https://lore.kernel.org/all/20220121142454.1994916-1-bfoster@redhat.com/ Link: https://docs.google.com/document/d/1RNKWW9jQyfjxw2E8dsXVTdvZYh0HnYeSHDKog9jhdN8/edit?usp=sharing Cc: Brian Foster Cc: Dave Chinner Cc: Al Viro Cc: Ian Kent Signed-off-by: Paul E. McKenney --- include/linux/rcutiny.h | 10 ++++++++++ include/linux/rcutree.h | 2 ++ 2 files changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index 5fed476f977f..ab7e20dfb07b 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -23,6 +23,16 @@ static inline void cond_synchronize_rcu(unsigned long oldstate) might_sleep(); } +static inline unsigned long start_poll_synchronize_rcu_expedited(void) +{ + return start_poll_synchronize_rcu(); +} + +static inline void cond_synchronize_rcu_expedited(unsigned long oldstate) +{ + cond_synchronize_rcu(oldstate); +} + extern void rcu_barrier(void); static inline void synchronize_rcu_expedited(void) diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 9c6cfb742504..20dbaa9a3882 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -40,6 +40,8 @@ bool rcu_eqs_special_set(int cpu); void rcu_momentary_dyntick_idle(void); void kfree_rcu_scheduler_running(void); bool rcu_gp_might_be_stalled(void); +unsigned long start_poll_synchronize_rcu_expedited(void); +void cond_synchronize_rcu_expedited(unsigned long oldstate); unsigned long get_state_synchronize_rcu(void); unsigned long start_poll_synchronize_rcu(void); bool poll_state_synchronize_rcu(unsigned long oldstate); -- cgit From ab21d6063c01180a8e9b22a37b847e5819525d9f Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Thu, 21 Jul 2022 15:42:33 +0200 Subject: bpf: Introduce 8-byte BTF set Introduce support for defining flags for kfuncs using a new set of macros, BTF_SET8_START/BTF_SET8_END, which define a set which contains 8 byte elements (each of which consists of a pair of BTF ID and flags), using a new BTF_ID_FLAGS macro. This will be used to tag kfuncs registered for a certain program type as acquire, release, sleepable, ret_null, etc. without having to create more and more sets which was proving to be an unscalable solution. Now, when looking up whether a kfunc is allowed for a certain program, we can also obtain its kfunc flags in the same call and avoid further lookups. The resolve_btfids change is split into a separate patch. Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220721134245.2450-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/btf_ids.h | 68 ++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 64 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h index 252a4befeab1..3cb0741e71d7 100644 --- a/include/linux/btf_ids.h +++ b/include/linux/btf_ids.h @@ -8,6 +8,15 @@ struct btf_id_set { u32 ids[]; }; +struct btf_id_set8 { + u32 cnt; + u32 flags; + struct { + u32 id; + u32 flags; + } pairs[]; +}; + #ifdef CONFIG_DEBUG_INFO_BTF #include /* for __PASTE */ @@ -25,7 +34,7 @@ struct btf_id_set { #define BTF_IDS_SECTION ".BTF_ids" -#define ____BTF_ID(symbol) \ +#define ____BTF_ID(symbol, word) \ asm( \ ".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \ ".local " #symbol " ; \n" \ @@ -33,10 +42,11 @@ asm( \ ".size " #symbol ", 4; \n" \ #symbol ": \n" \ ".zero 4 \n" \ +word \ ".popsection; \n"); -#define __BTF_ID(symbol) \ - ____BTF_ID(symbol) +#define __BTF_ID(symbol, word) \ + ____BTF_ID(symbol, word) #define __ID(prefix) \ __PASTE(prefix, __COUNTER__) @@ -46,7 +56,14 @@ asm( \ * to 4 zero bytes. */ #define BTF_ID(prefix, name) \ - __BTF_ID(__ID(__BTF_ID__##prefix##__##name##__)) + __BTF_ID(__ID(__BTF_ID__##prefix##__##name##__), "") + +#define ____BTF_ID_FLAGS(prefix, name, flags) \ + __BTF_ID(__ID(__BTF_ID__##prefix##__##name##__), ".long " #flags "\n") +#define __BTF_ID_FLAGS(prefix, name, flags, ...) \ + ____BTF_ID_FLAGS(prefix, name, flags) +#define BTF_ID_FLAGS(prefix, name, ...) \ + __BTF_ID_FLAGS(prefix, name, ##__VA_ARGS__, 0) /* * The BTF_ID_LIST macro defines pure (unsorted) list @@ -145,10 +162,51 @@ asm( \ ".popsection; \n"); \ extern struct btf_id_set name; +/* + * The BTF_SET8_START/END macros pair defines sorted list of + * BTF IDs and their flags plus its members count, with the + * following layout: + * + * BTF_SET8_START(list) + * BTF_ID_FLAGS(type1, name1, flags) + * BTF_ID_FLAGS(type2, name2, flags) + * BTF_SET8_END(list) + * + * __BTF_ID__set8__list: + * .zero 8 + * list: + * __BTF_ID__type1__name1__3: + * .zero 4 + * .word (1 << 0) | (1 << 2) + * __BTF_ID__type2__name2__5: + * .zero 4 + * .word (1 << 3) | (1 << 1) | (1 << 2) + * + */ +#define __BTF_SET8_START(name, scope) \ +asm( \ +".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \ +"." #scope " __BTF_ID__set8__" #name "; \n" \ +"__BTF_ID__set8__" #name ":; \n" \ +".zero 8 \n" \ +".popsection; \n"); + +#define BTF_SET8_START(name) \ +__BTF_ID_LIST(name, local) \ +__BTF_SET8_START(name, local) + +#define BTF_SET8_END(name) \ +asm( \ +".pushsection " BTF_IDS_SECTION ",\"a\"; \n" \ +".size __BTF_ID__set8__" #name ", .-" #name " \n" \ +".popsection; \n"); \ +extern struct btf_id_set8 name; + #else #define BTF_ID_LIST(name) static u32 __maybe_unused name[5]; #define BTF_ID(prefix, name) +#define BTF_ID_FLAGS(prefix, name, flags) #define BTF_ID_UNUSED #define BTF_ID_LIST_GLOBAL(name, n) u32 __maybe_unused name[n]; #define BTF_ID_LIST_SINGLE(name, prefix, typename) static u32 __maybe_unused name[1]; @@ -156,6 +214,8 @@ extern struct btf_id_set name; #define BTF_SET_START(name) static struct btf_id_set __maybe_unused name = { 0 }; #define BTF_SET_START_GLOBAL(name) static struct btf_id_set __maybe_unused name = { 0 }; #define BTF_SET_END(name) +#define BTF_SET8_START(name) static struct btf_id_set8 __maybe_unused name = { 0 }; +#define BTF_SET8_END(name) static struct btf_id_set8 __maybe_unused name = { 0 }; #endif /* CONFIG_DEBUG_INFO_BTF */ -- cgit From a4703e3184320d6e15e2bc81d2ccf1c8c883f9d1 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Thu, 21 Jul 2022 15:42:35 +0200 Subject: bpf: Switch to new kfunc flags infrastructure Instead of populating multiple sets to indicate some attribute and then researching the same BTF ID in them, prepare a single unified BTF set which indicates whether a kfunc is allowed to be called, and also its attributes if any at the same time. Now, only one call is needed to perform the lookup for both kfunc availability and its attributes. Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220721134245.2450-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 3 ++- include/linux/btf.h | 33 ++++++++++----------------------- 2 files changed, 12 insertions(+), 24 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 11950029284f..a97751d845c9 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1924,7 +1924,8 @@ int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog, struct bpf_reg_state *regs); int btf_check_kfunc_arg_match(struct bpf_verifier_env *env, const struct btf *btf, u32 func_id, - struct bpf_reg_state *regs); + struct bpf_reg_state *regs, + u32 kfunc_flags); int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog, struct bpf_reg_state *reg); int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog, diff --git a/include/linux/btf.h b/include/linux/btf.h index 1bfed7fa0428..6dfc6eaf7f8c 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -12,14 +12,11 @@ #define BTF_TYPE_EMIT(type) ((void)(type *)0) #define BTF_TYPE_EMIT_ENUM(enum_val) ((void)enum_val) -enum btf_kfunc_type { - BTF_KFUNC_TYPE_CHECK, - BTF_KFUNC_TYPE_ACQUIRE, - BTF_KFUNC_TYPE_RELEASE, - BTF_KFUNC_TYPE_RET_NULL, - BTF_KFUNC_TYPE_KPTR_ACQUIRE, - BTF_KFUNC_TYPE_MAX, -}; +/* These need to be macros, as the expressions are used in assembler input */ +#define KF_ACQUIRE (1 << 0) /* kfunc is an acquire function */ +#define KF_RELEASE (1 << 1) /* kfunc is a release function */ +#define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */ +#define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */ struct btf; struct btf_member; @@ -30,16 +27,7 @@ struct btf_id_set; struct btf_kfunc_id_set { struct module *owner; - union { - struct { - struct btf_id_set *check_set; - struct btf_id_set *acquire_set; - struct btf_id_set *release_set; - struct btf_id_set *ret_null_set; - struct btf_id_set *kptr_acquire_set; - }; - struct btf_id_set *sets[BTF_KFUNC_TYPE_MAX]; - }; + struct btf_id_set8 *set; }; struct btf_id_dtor_kfunc { @@ -378,9 +366,9 @@ const struct btf_type *btf_type_by_id(const struct btf *btf, u32 type_id); const char *btf_name_by_offset(const struct btf *btf, u32 offset); struct btf *btf_parse_vmlinux(void); struct btf *bpf_prog_get_target_btf(const struct bpf_prog *prog); -bool btf_kfunc_id_set_contains(const struct btf *btf, +u32 *btf_kfunc_id_set_contains(const struct btf *btf, enum bpf_prog_type prog_type, - enum btf_kfunc_type type, u32 kfunc_btf_id); + u32 kfunc_btf_id); int register_btf_kfunc_id_set(enum bpf_prog_type prog_type, const struct btf_kfunc_id_set *s); s32 btf_find_dtor_kfunc(struct btf *btf, u32 btf_id); @@ -397,12 +385,11 @@ static inline const char *btf_name_by_offset(const struct btf *btf, { return NULL; } -static inline bool btf_kfunc_id_set_contains(const struct btf *btf, +static inline u32 *btf_kfunc_id_set_contains(const struct btf *btf, enum bpf_prog_type prog_type, - enum btf_kfunc_type type, u32 kfunc_btf_id) { - return false; + return NULL; } static inline int register_btf_kfunc_id_set(enum bpf_prog_type prog_type, const struct btf_kfunc_id_set *s) -- cgit From 56e948ffc098a780fefb6c1784a3a2c7b81100a1 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Thu, 21 Jul 2022 15:42:36 +0200 Subject: bpf: Add support for forcing kfunc args to be trusted Teach the verifier to detect a new KF_TRUSTED_ARGS kfunc flag, which means each pointer argument must be trusted, which we define as a pointer that is referenced (has non-zero ref_obj_id) and also needs to have its offset unchanged, similar to how release functions expect their argument. This allows a kfunc to receive pointer arguments unchanged from the result of the acquire kfunc. This is required to ensure that kfunc that operate on some object only work on acquired pointers and not normal PTR_TO_BTF_ID with same type which can be obtained by pointer walking. The restrictions applied to release arguments also apply to trusted arguments. This implies that strict type matching (not deducing type by recursively following members at offset) and OBJ_RELEASE offset checks (ensuring they are zero) are used for trusted pointer arguments. Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220721134245.2450-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/btf.h | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) (limited to 'include/linux') diff --git a/include/linux/btf.h b/include/linux/btf.h index 6dfc6eaf7f8c..cdb376d53238 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -17,6 +17,38 @@ #define KF_RELEASE (1 << 1) /* kfunc is a release function */ #define KF_RET_NULL (1 << 2) /* kfunc returns a pointer that may be NULL */ #define KF_KPTR_GET (1 << 3) /* kfunc returns reference to a kptr */ +/* Trusted arguments are those which are meant to be referenced arguments with + * unchanged offset. It is used to enforce that pointers obtained from acquire + * kfuncs remain unmodified when being passed to helpers taking trusted args. + * + * Consider + * struct foo { + * int data; + * struct foo *next; + * }; + * + * struct bar { + * int data; + * struct foo f; + * }; + * + * struct foo *f = alloc_foo(); // Acquire kfunc + * struct bar *b = alloc_bar(); // Acquire kfunc + * + * If a kfunc set_foo_data() wants to operate only on the allocated object, it + * will set the KF_TRUSTED_ARGS flag, which will prevent unsafe usage like: + * + * set_foo_data(f, 42); // Allowed + * set_foo_data(f->next, 42); // Rejected, non-referenced pointer + * set_foo_data(&f->next, 42);// Rejected, referenced, but wrong type + * set_foo_data(&b->f, 42); // Rejected, referenced, but bad offset + * + * In the final case, usually for the purposes of type matching, it is deduced + * by looking at the type of the member at the offset, but due to the + * requirement of trusted argument, this deduction will be strict and not done + * for this case. + */ +#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ struct btf; struct btf_member; -- cgit From 949d6b405e6160ae44baea39192d67b39cb7eeac Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 20 Jul 2022 16:57:58 -0700 Subject: net: add missing includes and forward declarations under net/ This patch adds missing includes to headers under include/net. All these problems are currently masked by the existing users including the missing dependency before the broken header. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/lapb.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/lapb.h b/include/linux/lapb.h index eb56472f23b2..b5333f9413dc 100644 --- a/include/linux/lapb.h +++ b/include/linux/lapb.h @@ -6,6 +6,11 @@ #ifndef LAPB_KERNEL_H #define LAPB_KERNEL_H +#include +#include + +struct net_device; + #define LAPB_OK 0 #define LAPB_BADTOKEN 1 #define LAPB_INVALUE 2 -- cgit From 0903f899418ee0235e4ad756f5502162f29b49b9 Mon Sep 17 00:00:00 2001 From: Avraham Stern Date: Thu, 27 Jan 2022 14:39:46 +0200 Subject: wifi: ieee80211: add helper functions for detecting TM/FTM frames Add helper functions for detection timing measurement and fine timing measurement frames. Signed-off-by: Avraham Stern Signed-off-by: Johannes Berg --- include/linux/ieee80211.h | 54 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index cca564372d16..55e6f4ad0ca6 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -1332,6 +1332,15 @@ struct ieee80211_mgmt { u8 action_code; u8 variable[]; } __packed s1g; + struct { + u8 action_code; + u8 dialog_token; + u8 follow_up; + u32 tod; + u32 toa; + u8 max_tod_error; + u8 max_toa_error; + } __packed wnm_timing_msr; } u; } __packed action; } u; @@ -3522,6 +3531,17 @@ enum ieee80211_mesh_actioncode { WLAN_MESH_ACTION_TBTT_ADJUSTMENT_RESPONSE, }; +/* Unprotected WNM action codes */ +enum ieee80211_unprotected_wnm_actioncode { + WLAN_UNPROTECTED_WNM_ACTION_TIM = 0, + WLAN_UNPROTECTED_WNM_ACTION_TIMING_MEASUREMENT_RESPONSE = 1, +}; + +/* Public action codes */ +enum ieee80211_public_actioncode { + WLAN_PUBLIC_ACTION_FTM_RESPONSE = 33, +}; + /* Security key length */ enum ieee80211_key_len { WLAN_KEY_LEN_WEP40 = 5, @@ -4311,6 +4331,40 @@ static inline bool ieee80211_action_contains_tpc(struct sk_buff *skb) return true; } +static inline bool ieee80211_is_timing_measurement(struct sk_buff *skb) +{ + struct ieee80211_mgmt *mgmt = (void *)skb->data; + + if (skb->len < IEEE80211_MIN_ACTION_SIZE) + return false; + + if (!ieee80211_is_action(mgmt->frame_control)) + return false; + + if (mgmt->u.action.category == WLAN_CATEGORY_WNM_UNPROTECTED && + mgmt->u.action.u.wnm_timing_msr.action_code == + WLAN_UNPROTECTED_WNM_ACTION_TIMING_MEASUREMENT_RESPONSE && + skb->len >= offsetofend(typeof(*mgmt), u.action.u.wnm_timing_msr)) + return true; + + return false; +} + +static inline bool ieee80211_is_ftm(struct sk_buff *skb) +{ + struct ieee80211_mgmt *mgmt = (void *)skb->data; + + if (!ieee80211_is_public_action((void *)mgmt, skb->len)) + return false; + + if (mgmt->u.action.u.ftm.action_code == + WLAN_PUBLIC_ACTION_FTM_RESPONSE && + skb->len >= offsetofend(typeof(*mgmt), u.action.u.ftm)) + return true; + + return false; +} + struct element { u8 id; u8 datalen; -- cgit From dea997733575c5793ca77e166bbbf89097987eb4 Mon Sep 17 00:00:00 2001 From: Charles Keepax Date: Fri, 22 Jul 2022 10:48:50 +0100 Subject: firmware: cs_dsp: Add pre_stop callback The code already has a post_stop callback, add a matching pre_stop callback to the client_ops that is called before execution is stopped. This callback provides a convenient place for the client code to communicate with the DSP before it is stopped. Signed-off-by: Charles Keepax Link: https://lore.kernel.org/r/20220722094851.92521-1-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown --- include/linux/firmware/cirrus/cs_dsp.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/firmware/cirrus/cs_dsp.h b/include/linux/firmware/cirrus/cs_dsp.h index 30055706cce2..6ab230218df0 100644 --- a/include/linux/firmware/cirrus/cs_dsp.h +++ b/include/linux/firmware/cirrus/cs_dsp.h @@ -189,7 +189,8 @@ struct cs_dsp { * @control_remove: Called under the pwr_lock when a control is destroyed * @pre_run: Called under the pwr_lock by cs_dsp_run() before the core is started * @post_run: Called under the pwr_lock by cs_dsp_run() after the core is started - * @post_stop: Called under the pwr_lock by cs_dsp_stop() + * @pre_stop: Called under the pwr_lock by cs_dsp_stop() before the core is stopped + * @post_stop: Called under the pwr_lock by cs_dsp_stop() after the core is stopped * @watchdog_expired: Called when a watchdog expiry is detected * * These callbacks give the cs_dsp client an opportunity to respond to events @@ -200,6 +201,7 @@ struct cs_dsp_client_ops { void (*control_remove)(struct cs_dsp_coeff_ctl *ctl); int (*pre_run)(struct cs_dsp *dsp); int (*post_run)(struct cs_dsp *dsp); + void (*pre_stop)(struct cs_dsp *dsp); void (*post_stop)(struct cs_dsp *dsp); void (*watchdog_expired)(struct cs_dsp *dsp); }; -- cgit From a4b976552f122ea851f556769874022cf097741e Mon Sep 17 00:00:00 2001 From: Charles Keepax Date: Fri, 22 Jul 2022 10:48:51 +0100 Subject: firmware: cs_dsp: Add memory chunk helpers Add helpers that can be layered on top of a buffer read from or to be written to the DSP to faciliate accessing datastructures within the DSP memory. These functions handle adding the padding bytes for the DSP, converting to big endian, and packing arbitrary length data. Signed-off-by: Charles Keepax Link: https://lore.kernel.org/r/20220722094851.92521-2-ckeepax@opensource.cirrus.com Signed-off-by: Mark Brown --- include/linux/firmware/cirrus/cs_dsp.h | 73 ++++++++++++++++++++++++++++++++++ 1 file changed, 73 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firmware/cirrus/cs_dsp.h b/include/linux/firmware/cirrus/cs_dsp.h index 6ab230218df0..cad828e21c72 100644 --- a/include/linux/firmware/cirrus/cs_dsp.h +++ b/include/linux/firmware/cirrus/cs_dsp.h @@ -11,6 +11,7 @@ #ifndef __CS_DSP_H #define __CS_DSP_H +#include #include #include #include @@ -34,6 +35,7 @@ #define CS_ADSP2_REGION_ALL (CS_ADSP2_REGION_0 | CS_ADSP2_REGION_1_9) #define CS_DSP_DATA_WORD_SIZE 3 +#define CS_DSP_DATA_WORD_BITS (3 * BITS_PER_BYTE) #define CS_DSP_ACKED_CTL_TIMEOUT_MS 100 #define CS_DSP_ACKED_CTL_N_QUICKPOLLS 10 @@ -252,4 +254,75 @@ struct cs_dsp_alg_region *cs_dsp_find_alg_region(struct cs_dsp *dsp, const char *cs_dsp_mem_region_name(unsigned int type); +/** + * struct cs_dsp_chunk - Describes a buffer holding data formatted for the DSP + * @data: Pointer to underlying buffer memory + * @max: Pointer to end of the buffer memory + * @bytes: Number of bytes read/written into the memory chunk + * @cache: Temporary holding data as it is formatted + * @cachebits: Number of bits of data currently in cache + */ +struct cs_dsp_chunk { + u8 *data; + u8 *max; + int bytes; + + u32 cache; + int cachebits; +}; + +/** + * cs_dsp_chunk() - Create a DSP memory chunk + * @data: Pointer to the buffer that will be used to store data + * @size: Size of the buffer in bytes + * + * Return: A cs_dsp_chunk structure + */ +static inline struct cs_dsp_chunk cs_dsp_chunk(void *data, int size) +{ + struct cs_dsp_chunk ch = { + .data = data, + .max = data + size, + }; + + return ch; +} + +/** + * cs_dsp_chunk_end() - Check if a DSP memory chunk is full + * @ch: Pointer to the chunk structure + * + * Return: True if the whole buffer has been read/written + */ +static inline bool cs_dsp_chunk_end(struct cs_dsp_chunk *ch) +{ + return ch->data == ch->max; +} + +/** + * cs_dsp_chunk_bytes() - Number of bytes written/read from a DSP memory chunk + * @ch: Pointer to the chunk structure + * + * Return: Number of bytes read/written to the buffer + */ +static inline int cs_dsp_chunk_bytes(struct cs_dsp_chunk *ch) +{ + return ch->bytes; +} + +/** + * cs_dsp_chunk_valid_addr() - Check if an address is in a DSP memory chunk + * @ch: Pointer to the chunk structure + * + * Return: True if the given address is within the buffer + */ +static inline bool cs_dsp_chunk_valid_addr(struct cs_dsp_chunk *ch, void *addr) +{ + return (u8 *)addr >= ch->data && (u8 *)addr < ch->max; +} + +int cs_dsp_chunk_write(struct cs_dsp_chunk *ch, int nbits, u32 val); +int cs_dsp_chunk_flush(struct cs_dsp_chunk *ch); +int cs_dsp_chunk_read(struct cs_dsp_chunk *ch, int nbits); + #endif -- cgit From e423414375866a399ebbe55ed044acb39846e8bf Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Fri, 22 Jul 2022 13:36:05 +0200 Subject: bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF BTF_ID_FLAGS macro needs to be able to take 0 or 1 args, so make it a variable argument. BTF_SET8_END is incorrect, it should just be empty. Reported-by: kernel test robot Fixes: ab21d6063c01 ("bpf: Introduce 8-byte BTF set") Signed-off-by: Kumar Kartikeya Dwivedi Acked-by: Jiri Olsa Link: https://lore.kernel.org/r/20220722113605.6513-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/btf_ids.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/btf_ids.h b/include/linux/btf_ids.h index 3cb0741e71d7..2aea877d644f 100644 --- a/include/linux/btf_ids.h +++ b/include/linux/btf_ids.h @@ -206,7 +206,7 @@ extern struct btf_id_set8 name; #define BTF_ID_LIST(name) static u32 __maybe_unused name[5]; #define BTF_ID(prefix, name) -#define BTF_ID_FLAGS(prefix, name, flags) +#define BTF_ID_FLAGS(prefix, name, ...) #define BTF_ID_UNUSED #define BTF_ID_LIST_GLOBAL(name, n) u32 __maybe_unused name[n]; #define BTF_ID_LIST_SINGLE(name, prefix, typename) static u32 __maybe_unused name[1]; @@ -215,7 +215,7 @@ extern struct btf_id_set8 name; #define BTF_SET_START_GLOBAL(name) static struct btf_id_set __maybe_unused name = { 0 }; #define BTF_SET_END(name) #define BTF_SET8_START(name) static struct btf_id_set8 __maybe_unused name = { 0 }; -#define BTF_SET8_END(name) static struct btf_id_set8 __maybe_unused name = { 0 }; +#define BTF_SET8_END(name) #endif /* CONFIG_DEBUG_INFO_BTF */ -- cgit From 478af190cb6c501efaa8de2b9c9418ece2e4d0bd Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 22 Jul 2022 10:59:17 -0700 Subject: iomap: remove iomap_writepage Unused now. Signed-off-by: Christoph Hellwig Reviewed-by: Damien Le Moal Reviewed-by: Chaitanya Kulkarni Reviewed-by: Darrick J. Wong Signed-off-by: Darrick J. Wong --- include/linux/iomap.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iomap.h b/include/linux/iomap.h index e552097c67e0..911888560d3e 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -303,9 +303,6 @@ void iomap_finish_ioends(struct iomap_ioend *ioend, int error); void iomap_ioend_try_merge(struct iomap_ioend *ioend, struct list_head *more_ioends); void iomap_sort_ioends(struct list_head *ioend_list); -int iomap_writepage(struct page *page, struct writeback_control *wbc, - struct iomap_writepage_ctx *wpc, - const struct iomap_writeback_ops *ops); int iomap_writepages(struct address_space *mapping, struct writeback_control *wbc, struct iomap_writepage_ctx *wpc, const struct iomap_writeback_ops *ops); -- cgit From f96f644ab97abeed3f7007c953836a574ce928cc Mon Sep 17 00:00:00 2001 From: Song Liu Date: Tue, 19 Jul 2022 17:21:23 -0700 Subject: ftrace: Add modify_ftrace_direct_multi_nolock This is similar to modify_ftrace_direct_multi, but does not acquire direct_mutex. This is useful when direct_mutex is already locked by the user. Signed-off-by: Song Liu Signed-off-by: Daniel Borkmann Reviewed-by: Steven Rostedt (Google) Link: https://lore.kernel.org/bpf/20220720002126.803253-2-song@kernel.org --- include/linux/ftrace.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 979f6bfa2c25..acb35243ce5d 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -340,6 +340,7 @@ unsigned long ftrace_find_rec_direct(unsigned long ip); int register_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr); int unregister_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr); int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned long addr); +int modify_ftrace_direct_multi_nolock(struct ftrace_ops *ops, unsigned long addr); #else struct ftrace_ops; @@ -384,6 +385,10 @@ static inline int modify_ftrace_direct_multi(struct ftrace_ops *ops, unsigned lo { return -ENODEV; } +static inline int modify_ftrace_direct_multi_nolock(struct ftrace_ops *ops, unsigned long addr) +{ + return -ENODEV; +} #endif /* CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS */ #ifndef CONFIG_HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS -- cgit From 53cd885bc5c3ea283cc9c00ca6446c778f00bfba Mon Sep 17 00:00:00 2001 From: Song Liu Date: Tue, 19 Jul 2022 17:21:24 -0700 Subject: ftrace: Allow IPMODIFY and DIRECT ops on the same function IPMODIFY (livepatch) and DIRECT (bpf trampoline) ops are both important users of ftrace. It is necessary to allow them work on the same function at the same time. First, DIRECT ops no longer specify IPMODIFY flag. Instead, DIRECT flag is handled together with IPMODIFY flag in __ftrace_hash_update_ipmodify(). Then, a callback function, ops_func, is added to ftrace_ops. This is used by ftrace core code to understand whether the DIRECT ops can share with an IPMODIFY ops. To share with IPMODIFY ops, the DIRECT ops need to implement the callback function and adjust the direct trampoline accordingly. If DIRECT ops is attached before the IPMODIFY ops, ftrace core code calls ENABLE_SHARE_IPMODIFY_PEER on the DIRECT ops before registering the IPMODIFY ops. If IPMODIFY ops is attached before the DIRECT ops, ftrace core code calls ENABLE_SHARE_IPMODIFY_SELF in __ftrace_hash_update_ipmodify. Owner of the DIRECT ops may return 0 if the DIRECT trampoline can share with IPMODIFY, so error code otherwise. The error code is propagated to register_ftrace_direct_multi so that onwer of the DIRECT trampoline can handle it properly. For more details, please refer to comment before enum ftrace_ops_cmd. Signed-off-by: Song Liu Signed-off-by: Daniel Borkmann Reviewed-by: Steven Rostedt (Google) Link: https://lore.kernel.org/all/20220602193706.2607681-2-song@kernel.org/ Link: https://lore.kernel.org/all/20220718055449.3960512-1-song@kernel.org/ Link: https://lore.kernel.org/bpf/20220720002126.803253-3-song@kernel.org --- include/linux/ftrace.h | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index acb35243ce5d..0b61371e287b 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -208,6 +208,43 @@ enum { FTRACE_OPS_FL_DIRECT = BIT(17), }; +/* + * FTRACE_OPS_CMD_* commands allow the ftrace core logic to request changes + * to a ftrace_ops. Note, the requests may fail. + * + * ENABLE_SHARE_IPMODIFY_SELF - enable a DIRECT ops to work on the same + * function as an ops with IPMODIFY. Called + * when the DIRECT ops is being registered. + * This is called with both direct_mutex and + * ftrace_lock are locked. + * + * ENABLE_SHARE_IPMODIFY_PEER - enable a DIRECT ops to work on the same + * function as an ops with IPMODIFY. Called + * when the other ops (the one with IPMODIFY) + * is being registered. + * This is called with direct_mutex locked. + * + * DISABLE_SHARE_IPMODIFY_PEER - disable a DIRECT ops to work on the same + * function as an ops with IPMODIFY. Called + * when the other ops (the one with IPMODIFY) + * is being unregistered. + * This is called with direct_mutex locked. + */ +enum ftrace_ops_cmd { + FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_SELF, + FTRACE_OPS_CMD_ENABLE_SHARE_IPMODIFY_PEER, + FTRACE_OPS_CMD_DISABLE_SHARE_IPMODIFY_PEER, +}; + +/* + * For most ftrace_ops_cmd, + * Returns: + * 0 - Success. + * Negative on failure. The return value is dependent on the + * callback. + */ +typedef int (*ftrace_ops_func_t)(struct ftrace_ops *op, enum ftrace_ops_cmd cmd); + #ifdef CONFIG_DYNAMIC_FTRACE /* The hash used to know what functions callbacks trace */ struct ftrace_ops_hash { @@ -250,6 +287,7 @@ struct ftrace_ops { unsigned long trampoline; unsigned long trampoline_size; struct list_head list; + ftrace_ops_func_t ops_func; #endif }; -- cgit From 316cba62dfb7878b7353177e6a7da9cc0c979cde Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Tue, 19 Jul 2022 17:21:25 -0700 Subject: bpf, x64: Allow to use caller address from stack Currently we call the original function by using the absolute address given at the JIT generation. That's not usable when having trampoline attached to multiple functions, or the target address changes dynamically (in case of live patch). In such cases we need to take the return address from the stack. Adding support to retrieve the original function address from the stack by adding new BPF_TRAMP_F_ORIG_STACK flag for arch_prepare_bpf_trampoline function. Basically we take the return address of the 'fentry' call: function + 0: call fentry # stores 'function + 5' address on stack function + 5: ... The 'function + 5' address will be used as the address for the original function to call. Signed-off-by: Jiri Olsa Signed-off-by: Song Liu Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20220720002126.803253-4-song@kernel.org --- include/linux/bpf.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a97751d845c9..1d22df8bf306 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -751,6 +751,11 @@ struct btf_func_model { /* Return the return value of fentry prog. Only used by bpf_struct_ops. */ #define BPF_TRAMP_F_RET_FENTRY_RET BIT(4) +/* Get original function from stack instead of from provided direct address. + * Makes sense for trampolines with fexit or fmod_ret programs. + */ +#define BPF_TRAMP_F_ORIG_STACK BIT(5) + /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50 * bytes on x86. */ -- cgit From 00963a2e75a872e5fce4d0115ac2786ec86b57a6 Mon Sep 17 00:00:00 2001 From: Song Liu Date: Tue, 19 Jul 2022 17:21:26 -0700 Subject: bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch) When tracing a function with IPMODIFY ftrace_ops (livepatch), the bpf trampoline must follow the instruction pointer saved on stack. This needs extra handling for bpf trampolines with BPF_TRAMP_F_CALL_ORIG flag. Implement bpf_tramp_ftrace_ops_func and use it for the ftrace_ops used by BPF trampoline. This enables tracing functions with livepatch. This also requires moving bpf trampoline to *_ftrace_direct_mult APIs. Signed-off-by: Song Liu Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/all/20220602193706.2607681-2-song@kernel.org/ Link: https://lore.kernel.org/bpf/20220720002126.803253-5-song@kernel.org --- include/linux/bpf.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 1d22df8bf306..20c26aed7896 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -47,6 +47,7 @@ struct kobject; struct mem_cgroup; struct module; struct bpf_func_state; +struct ftrace_ops; extern struct idr btf_idr; extern spinlock_t btf_idr_lock; @@ -756,6 +757,11 @@ struct btf_func_model { */ #define BPF_TRAMP_F_ORIG_STACK BIT(5) +/* This trampoline is on a function with another ftrace_ops with IPMODIFY, + * e.g., a live patch. This flag is set and cleared by ftrace call backs, + */ +#define BPF_TRAMP_F_SHARE_IPMODIFY BIT(6) + /* Each call __bpf_prog_enter + call bpf_func + call __bpf_prog_exit is ~50 * bytes on x86. */ @@ -838,9 +844,11 @@ struct bpf_tramp_image { struct bpf_trampoline { /* hlist for trampoline_table */ struct hlist_node hlist; + struct ftrace_ops *fops; /* serializes access to fields of this trampoline */ struct mutex mutex; refcount_t refcnt; + u32 flags; u64 key; struct { struct btf_func_model model; -- cgit From 189c6c33ff421def040b904fb14ef76c5bf5af4c Mon Sep 17 00:00:00 2001 From: Niklas Schnelle Date: Tue, 28 Jun 2022 16:30:59 +0200 Subject: PCI: Extend isolated function probing to s390 Like the jailhouse hypervisor, s390's PCI architecture allows passing isolated PCI functions to a guest OS instance. As of now this is was not utilized even with multi-function support as the s390 PCI code makes sure that only virtual PCI busses including a function with devfn 0 are presented to the PCI subsystem. A subsequent change will remove this restriction. Allow probing such functions by replacing the existing check for jailhouse_paravirt() with a new hypervisor_isolated_pci_functions() helper. Link: https://lore.kernel.org/r/20220628143100.3228092-5-schnelle@linux.ibm.com Signed-off-by: Niklas Schnelle Signed-off-by: Bjorn Helgaas Reviewed-by: Pierre Morel Cc: Jan Kiszka --- include/linux/hypervisor.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hypervisor.h b/include/linux/hypervisor.h index fc08b433c856..9efbc54e35e5 100644 --- a/include/linux/hypervisor.h +++ b/include/linux/hypervisor.h @@ -32,4 +32,12 @@ static inline bool jailhouse_paravirt(void) #endif /* !CONFIG_X86 */ +static inline bool hypervisor_isolated_pci_functions(void) +{ + if (IS_ENABLED(CONFIG_S390)) + return true; + + return jailhouse_paravirt(); +} + #endif /* __LINUX_HYPEVISOR_H */ -- cgit From abb4970ac33514c84b143516583eaf8cc47abd67 Mon Sep 17 00:00:00 2001 From: Stafford Horne Date: Sat, 23 Jul 2022 06:49:42 +0900 Subject: PCI: Move isa_dma_bridge_buggy out of asm/dma.h The isa_dma_bridge_buggy symbol is only used for x86_32, and only x86_32 platforms or quirks ever set it. Add a new linux/isa-dma.h header that #defines isa_dma_bridge_buggy to 0 except on x86_32, where we keep it as a variable, and remove all the arch- specific definitions. [bhelgaas: commit log] Suggested-by: Arnd Bergmann Suggested-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220722214944.831438-3-shorne@gmail.com Signed-off-by: Stafford Horne Signed-off-by: Bjorn Helgaas Reviewed-by: Christoph Hellwig Acked-by: Geert Uytterhoeven --- include/linux/isa-dma.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) create mode 100644 include/linux/isa-dma.h (limited to 'include/linux') diff --git a/include/linux/isa-dma.h b/include/linux/isa-dma.h new file mode 100644 index 000000000000..61504a8c1b9e --- /dev/null +++ b/include/linux/isa-dma.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __LINUX_ISA_DMA_H +#define __LINUX_ISA_DMA_H + +#include + +#if defined(CONFIG_PCI) && defined(CONFIG_X86_32) +extern int isa_dma_bridge_buggy; +#else +#define isa_dma_bridge_buggy (0) +#endif + +#endif /* __LINUX_ISA_DMA_H */ -- cgit From e8f90717ed3b58e81c480b3aa38e641c0da5a456 Mon Sep 17 00:00:00 2001 From: Nicolin Chen Date: Fri, 22 Jul 2022 19:02:47 -0700 Subject: vfio: Make vfio_unpin_pages() return void There's only one caller that checks its return value with a WARN_ON_ONCE, while all other callers don't check the return value at all. Above that, an undo function should not fail. So, simplify the API to return void by embedding similar WARN_ONs. Also for users to pinpoint which condition fails, separate WARN_ON lines, yet remove the "driver->ops->unpin_pages" check, since it's unreasonable for callers to unpin on something totally random that wasn't even pinned. And remove NULL pointer checks for they would trigger oops vs. warnings. Note that npage is already validated in the vfio core, thus drop the same check in the type1 code. Suggested-by: Christoph Hellwig Reviewed-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: Kirti Wankhede Tested-by: Terrence Xu Signed-off-by: Nicolin Chen Link: https://lore.kernel.org/r/20220723020256.30081-2-nicolinc@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 19cefbaa3d06..9f7d74c24925 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -163,8 +163,8 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device); int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn, int npage, int prot, unsigned long *phys_pfn); -int vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn, - int npage); +void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn, + int npage); int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, void *data, size_t len, bool write); -- cgit From 6a010a49b63ac8465851a79185d8deff966f8e1a Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Sat, 23 Jul 2022 04:28:28 -1000 Subject: cgroup: Make !percpu threadgroup_rwsem operations optional MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit 3942a9bd7b58 ("locking, rcu, cgroup: Avoid synchronize_sched() in __cgroup_procs_write()") disabled percpu operations on threadgroup_rwsem because the impiled synchronize_rcu() on write locking was pushing up the latencies too much for android which constantly moves processes between cgroups. This makes the hotter paths - fork and exit - slower as they're always forced into the slow path. There is no reason to force this on everyone especially given that more common static usage pattern can now completely avoid write-locking the rwsem. Write-locking is elided when turning on and off controllers on empty sub-trees and CLONE_INTO_CGROUP enables seeding a cgroup without grabbing the rwsem. Restore the default percpu operations and introduce the mount option "favordynmods" and config option CGROUP_FAVOR_DYNMODS for users who need lower latencies for the dynamic operations. Signed-off-by: Tejun Heo Cc: Christian Brauner Cc: Michal Koutn� Cc: Peter Zijlstra Cc: John Stultz Cc: Dmitry Shmidt Cc: Oleg Nesterov --- include/linux/cgroup-defs.h | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 672de25e3ec8..63bf43c7ca3b 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -88,20 +88,33 @@ enum { */ CGRP_ROOT_NS_DELEGATE = (1 << 3), + /* + * Reduce latencies on dynamic cgroup modifications such as task + * migrations and controller on/offs by disabling percpu operation on + * cgroup_threadgroup_rwsem. This makes hot path operations such as + * forks and exits into the slow path and more expensive. + * + * The static usage pattern of creating a cgroup, enabling controllers, + * and then seeding it with CLONE_INTO_CGROUP doesn't require write + * locking cgroup_threadgroup_rwsem and thus doesn't benefit from + * favordynmod. + */ + CGRP_ROOT_FAVOR_DYNMODS = (1 << 4), + /* * Enable cpuset controller in v1 cgroup to use v2 behavior. */ - CGRP_ROOT_CPUSET_V2_MODE = (1 << 4), + CGRP_ROOT_CPUSET_V2_MODE = (1 << 16), /* * Enable legacy local memory.events. */ - CGRP_ROOT_MEMORY_LOCAL_EVENTS = (1 << 5), + CGRP_ROOT_MEMORY_LOCAL_EVENTS = (1 << 17), /* * Enable recursive subtree protection */ - CGRP_ROOT_MEMORY_RECURSIVE_PROT = (1 << 6), + CGRP_ROOT_MEMORY_RECURSIVE_PROT = (1 << 18), }; /* cftype->flags */ -- cgit From ba8ec7a607e98e8491a1fcf924a2e6c96ac9d413 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Sat, 23 Jul 2022 14:47:28 -0400 Subject: SUNRPC: Shrink size of struct rpc_task Move the field 'tk_rpc_status' so that we eliminate a 4 byte hole in the structure. For x86_64, this shrinks the size of the struct by 8 bytes. 'pahole' output before the change: /* size: 232, cachelines: 4, members: 27 */ /* sum members: 222, holes: 1, sum holes: 4 */ /* sum bitfield members: 8 bits (1 bytes) */ /* padding: 5 */ /* last cacheline: 40 bytes */ 'pahole' output after the change: /* size: 224, cachelines: 4, members: 27 */ /* padding: 1 */ /* last cacheline: 32 bytes */ Signed-off-by: Trond Myklebust --- include/linux/sunrpc/sched.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index 1d7a3e51b795..acc62647317c 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -61,8 +61,6 @@ struct rpc_task { struct rpc_wait tk_wait; /* RPC wait */ } u; - int tk_rpc_status; /* Result of last RPC operation */ - /* * RPC call state */ @@ -82,6 +80,8 @@ struct rpc_task { ktime_t tk_start; /* RPC task init timestamp */ pid_t tk_owner; /* Process id for batching tasks */ + + int tk_rpc_status; /* Result of last RPC operation */ unsigned short tk_flags; /* misc flags */ unsigned short tk_timeouts; /* maj timeouts */ -- cgit From 69d966510d9f5de81588b37d23a9ee8ccc477b23 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 22 Jul 2022 14:12:20 -0400 Subject: nfs: only issue commit in DIO codepath if we have uncommitted data Currently, we try to determine whether to issue a commit based on nfs_write_need_commit which looks at the current verifier. In the case where we got a short write and then tried to follow it up with one that failed, the verifier can't be trusted. What we really want to know is whether the pgio request had any successful writes that came back as UNSTABLE. Add a new flag to the pgio request, and use that to indicate that we've had a successful unstable write. Only issue a commit if that flag is set. Signed-off-by: Jeff Layton Signed-off-by: Trond Myklebust --- include/linux/nfs_xdr.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h index 0e3aa0f5f324..e86cf6642d21 100644 --- a/include/linux/nfs_xdr.h +++ b/include/linux/nfs_xdr.h @@ -1600,6 +1600,7 @@ enum { NFS_IOHDR_STAT, NFS_IOHDR_RESEND_PNFS, NFS_IOHDR_RESEND_MDS, + NFS_IOHDR_UNSTABLE_WRITES, }; struct nfs_io_completion; -- cgit From 4f5f3b6028343d687d0533329b130e4b8280ab32 Mon Sep 17 00:00:00 2001 From: Anna Schumaker Date: Thu, 21 Jul 2022 14:21:31 -0400 Subject: SUNRPC: Introduce xdr_stream_move_subsegment() I do this by creating an xdr subsegment for the range we will be operating over. This lets me shift data to the correct place without potentially overwriting anything already there. Signed-off-by: Anna Schumaker Signed-off-by: Trond Myklebust --- include/linux/sunrpc/xdr.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 5860f32e3958..7dcc6c31fe29 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -262,6 +262,8 @@ extern unsigned int xdr_align_data(struct xdr_stream *, unsigned int offset, uns extern unsigned int xdr_expand_hole(struct xdr_stream *, unsigned int offset, unsigned int length); extern bool xdr_stream_subsegment(struct xdr_stream *xdr, struct xdr_buf *subbuf, unsigned int len); +extern unsigned int xdr_stream_move_subsegment(struct xdr_stream *xdr, unsigned int offset, + unsigned int target, unsigned int length); /** * xdr_set_scratch_buffer - Attach a scratch buffer for decoding data. -- cgit From 7c4cd5f4d2dd4a028a46bfb696b0cd387caadf33 Mon Sep 17 00:00:00 2001 From: Anna Schumaker Date: Thu, 21 Jul 2022 14:21:32 -0400 Subject: SUNRPC: Add a function for directly setting the xdr page len We need to do this step during READ_PLUS decoding so that we know pages are the right length and any extra data has been preserved in the tail. Signed-off-by: Anna Schumaker Signed-off-by: Trond Myklebust --- include/linux/sunrpc/xdr.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 7dcc6c31fe29..8cd38a9994ca 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -258,6 +258,7 @@ extern __be32 *xdr_inline_decode(struct xdr_stream *xdr, size_t nbytes); extern unsigned int xdr_read_pages(struct xdr_stream *xdr, unsigned int len); extern void xdr_enter_page(struct xdr_stream *xdr, unsigned int len); extern int xdr_process_buf(const struct xdr_buf *buf, unsigned int offset, unsigned int len, int (*actor)(struct scatterlist *, void *), void *data); +extern void xdr_set_pagelen(struct xdr_stream *, unsigned int len); extern unsigned int xdr_align_data(struct xdr_stream *, unsigned int offset, unsigned int length); extern unsigned int xdr_expand_hole(struct xdr_stream *, unsigned int offset, unsigned int length); extern bool xdr_stream_subsegment(struct xdr_stream *xdr, struct xdr_buf *subbuf, -- cgit From e1bd87608d4b6f87813f79b91e834de610f1049b Mon Sep 17 00:00:00 2001 From: Anna Schumaker Date: Thu, 21 Jul 2022 14:21:33 -0400 Subject: SUNRPC: Add a function for zeroing out a portion of an xdr_stream This will be used during READ_PLUS decoding for handling HOLE segments. Signed-off-by: Anna Schumaker Signed-off-by: Trond Myklebust --- include/linux/sunrpc/xdr.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 8cd38a9994ca..f0ab06acab61 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -265,6 +265,8 @@ extern bool xdr_stream_subsegment(struct xdr_stream *xdr, struct xdr_buf *subbuf unsigned int len); extern unsigned int xdr_stream_move_subsegment(struct xdr_stream *xdr, unsigned int offset, unsigned int target, unsigned int length); +extern unsigned int xdr_stream_zero(struct xdr_stream *xdr, unsigned int offset, + unsigned int length); /** * xdr_set_scratch_buffer - Attach a scratch buffer for decoding data. -- cgit From 29946fbcb2c31a2a367887dc58a2e7e5b012e285 Mon Sep 17 00:00:00 2001 From: Anna Schumaker Date: Thu, 21 Jul 2022 14:21:35 -0400 Subject: SUNRPC: Remove xdr_align_data() and xdr_expand_hole() These functions are no longer needed now that the NFS client places data and hole segments directly. Signed-off-by: Anna Schumaker Signed-off-by: Trond Myklebust --- include/linux/sunrpc/xdr.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index f0ab06acab61..f38c97f45354 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -259,8 +259,6 @@ extern unsigned int xdr_read_pages(struct xdr_stream *xdr, unsigned int len); extern void xdr_enter_page(struct xdr_stream *xdr, unsigned int len); extern int xdr_process_buf(const struct xdr_buf *buf, unsigned int offset, unsigned int len, int (*actor)(struct scatterlist *, void *), void *data); extern void xdr_set_pagelen(struct xdr_stream *, unsigned int len); -extern unsigned int xdr_align_data(struct xdr_stream *, unsigned int offset, unsigned int length); -extern unsigned int xdr_expand_hole(struct xdr_stream *, unsigned int offset, unsigned int length); extern bool xdr_stream_subsegment(struct xdr_stream *xdr, struct xdr_buf *subbuf, unsigned int len); extern unsigned int xdr_stream_move_subsegment(struct xdr_stream *xdr, unsigned int offset, -- cgit From ab1c84d855cf2c284fa0f4b17fc04063659c54a1 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Thu, 16 Jun 2022 13:57:19 +0100 Subject: io_uring: make io_uring_types.h public Move io_uring types to linux/include, need them public so tracing can see the definitions and we can clean trace/events/io_uring.h Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/a15f12e8cb7289b2de0deaddcc7518d98a132d17.1655384063.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 554 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 554 insertions(+) create mode 100644 include/linux/io_uring_types.h (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h new file mode 100644 index 000000000000..779c72da5b8f --- /dev/null +++ b/include/linux/io_uring_types.h @@ -0,0 +1,554 @@ +#ifndef IO_URING_TYPES_H +#define IO_URING_TYPES_H + +#include +#include +#include +#include + +struct io_wq_work_node { + struct io_wq_work_node *next; +}; + +struct io_wq_work_list { + struct io_wq_work_node *first; + struct io_wq_work_node *last; +}; + +struct io_wq_work { + struct io_wq_work_node list; + unsigned flags; + /* place it here instead of io_kiocb as it fills padding and saves 4B */ + int cancel_seq; +}; + +struct io_fixed_file { + /* file * with additional FFS_* flags */ + unsigned long file_ptr; +}; + +struct io_file_table { + struct io_fixed_file *files; + unsigned long *bitmap; + unsigned int alloc_hint; +}; + +struct io_hash_bucket { + spinlock_t lock; + struct hlist_head list; +} ____cacheline_aligned_in_smp; + +struct io_hash_table { + struct io_hash_bucket *hbs; + unsigned hash_bits; +}; + +struct io_uring { + u32 head ____cacheline_aligned_in_smp; + u32 tail ____cacheline_aligned_in_smp; +}; + +/* + * This data is shared with the application through the mmap at offsets + * IORING_OFF_SQ_RING and IORING_OFF_CQ_RING. + * + * The offsets to the member fields are published through struct + * io_sqring_offsets when calling io_uring_setup. + */ +struct io_rings { + /* + * Head and tail offsets into the ring; the offsets need to be + * masked to get valid indices. + * + * The kernel controls head of the sq ring and the tail of the cq ring, + * and the application controls tail of the sq ring and the head of the + * cq ring. + */ + struct io_uring sq, cq; + /* + * Bitmasks to apply to head and tail offsets (constant, equals + * ring_entries - 1) + */ + u32 sq_ring_mask, cq_ring_mask; + /* Ring sizes (constant, power of 2) */ + u32 sq_ring_entries, cq_ring_entries; + /* + * Number of invalid entries dropped by the kernel due to + * invalid index stored in array + * + * Written by the kernel, shouldn't be modified by the + * application (i.e. get number of "new events" by comparing to + * cached value). + * + * After a new SQ head value was read by the application this + * counter includes all submissions that were dropped reaching + * the new SQ head (and possibly more). + */ + u32 sq_dropped; + /* + * Runtime SQ flags + * + * Written by the kernel, shouldn't be modified by the + * application. + * + * The application needs a full memory barrier before checking + * for IORING_SQ_NEED_WAKEUP after updating the sq tail. + */ + atomic_t sq_flags; + /* + * Runtime CQ flags + * + * Written by the application, shouldn't be modified by the + * kernel. + */ + u32 cq_flags; + /* + * Number of completion events lost because the queue was full; + * this should be avoided by the application by making sure + * there are not more requests pending than there is space in + * the completion queue. + * + * Written by the kernel, shouldn't be modified by the + * application (i.e. get number of "new events" by comparing to + * cached value). + * + * As completion events come in out of order this counter is not + * ordered with any other data. + */ + u32 cq_overflow; + /* + * Ring buffer of completion events. + * + * The kernel writes completion events fresh every time they are + * produced, so the application is allowed to modify pending + * entries. + */ + struct io_uring_cqe cqes[] ____cacheline_aligned_in_smp; +}; + +struct io_restriction { + DECLARE_BITMAP(register_op, IORING_REGISTER_LAST); + DECLARE_BITMAP(sqe_op, IORING_OP_LAST); + u8 sqe_flags_allowed; + u8 sqe_flags_required; + bool registered; +}; + +struct io_submit_link { + struct io_kiocb *head; + struct io_kiocb *last; +}; + +struct io_submit_state { + /* inline/task_work completion list, under ->uring_lock */ + struct io_wq_work_node free_list; + /* batch completion logic */ + struct io_wq_work_list compl_reqs; + struct io_submit_link link; + + bool plug_started; + bool need_plug; + bool flush_cqes; + unsigned short submit_nr; + struct blk_plug plug; +}; + +struct io_ev_fd { + struct eventfd_ctx *cq_ev_fd; + unsigned int eventfd_async: 1; + struct rcu_head rcu; +}; + +struct io_ring_ctx { + /* const or read-mostly hot data */ + struct { + struct percpu_ref refs; + + struct io_rings *rings; + unsigned int flags; + enum task_work_notify_mode notify_method; + unsigned int compat: 1; + unsigned int drain_next: 1; + unsigned int restricted: 1; + unsigned int off_timeout_used: 1; + unsigned int drain_active: 1; + unsigned int drain_disabled: 1; + unsigned int has_evfd: 1; + unsigned int syscall_iopoll: 1; + } ____cacheline_aligned_in_smp; + + /* submission data */ + struct { + struct mutex uring_lock; + + /* + * Ring buffer of indices into array of io_uring_sqe, which is + * mmapped by the application using the IORING_OFF_SQES offset. + * + * This indirection could e.g. be used to assign fixed + * io_uring_sqe entries to operations and only submit them to + * the queue when needed. + * + * The kernel modifies neither the indices array nor the entries + * array. + */ + u32 *sq_array; + struct io_uring_sqe *sq_sqes; + unsigned cached_sq_head; + unsigned sq_entries; + + /* + * Fixed resources fast path, should be accessed only under + * uring_lock, and updated through io_uring_register(2) + */ + struct io_rsrc_node *rsrc_node; + int rsrc_cached_refs; + atomic_t cancel_seq; + struct io_file_table file_table; + unsigned nr_user_files; + unsigned nr_user_bufs; + struct io_mapped_ubuf **user_bufs; + + struct io_submit_state submit_state; + + struct io_buffer_list *io_bl; + struct xarray io_bl_xa; + struct list_head io_buffers_cache; + + struct io_hash_table cancel_table_locked; + struct list_head cq_overflow_list; + struct list_head apoll_cache; + struct xarray personalities; + u32 pers_next; + } ____cacheline_aligned_in_smp; + + /* IRQ completion list, under ->completion_lock */ + struct io_wq_work_list locked_free_list; + unsigned int locked_free_nr; + + const struct cred *sq_creds; /* cred used for __io_sq_thread() */ + struct io_sq_data *sq_data; /* if using sq thread polling */ + + struct wait_queue_head sqo_sq_wait; + struct list_head sqd_list; + + unsigned long check_cq; + + struct { + /* + * We cache a range of free CQEs we can use, once exhausted it + * should go through a slower range setup, see __io_get_cqe() + */ + struct io_uring_cqe *cqe_cached; + struct io_uring_cqe *cqe_sentinel; + + unsigned cached_cq_tail; + unsigned cq_entries; + struct io_ev_fd __rcu *io_ev_fd; + struct wait_queue_head cq_wait; + unsigned cq_extra; + } ____cacheline_aligned_in_smp; + + struct { + spinlock_t completion_lock; + + /* + * ->iopoll_list is protected by the ctx->uring_lock for + * io_uring instances that don't use IORING_SETUP_SQPOLL. + * For SQPOLL, only the single threaded io_sq_thread() will + * manipulate the list, hence no extra locking is needed there. + */ + struct io_wq_work_list iopoll_list; + struct io_hash_table cancel_table; + bool poll_multi_queue; + + struct list_head io_buffers_comp; + } ____cacheline_aligned_in_smp; + + /* timeouts */ + struct { + spinlock_t timeout_lock; + atomic_t cq_timeouts; + struct list_head timeout_list; + struct list_head ltimeout_list; + unsigned cq_last_tm_flush; + } ____cacheline_aligned_in_smp; + + /* Keep this last, we don't need it for the fast path */ + + struct io_restriction restrictions; + struct task_struct *submitter_task; + + /* slow path rsrc auxilary data, used by update/register */ + struct io_rsrc_node *rsrc_backup_node; + struct io_mapped_ubuf *dummy_ubuf; + struct io_rsrc_data *file_data; + struct io_rsrc_data *buf_data; + + struct delayed_work rsrc_put_work; + struct llist_head rsrc_put_llist; + struct list_head rsrc_ref_list; + spinlock_t rsrc_ref_lock; + + struct list_head io_buffers_pages; + + #if defined(CONFIG_UNIX) + struct socket *ring_sock; + #endif + /* hashed buffered write serialization */ + struct io_wq_hash *hash_map; + + /* Only used for accounting purposes */ + struct user_struct *user; + struct mm_struct *mm_account; + + /* ctx exit and cancelation */ + struct llist_head fallback_llist; + struct delayed_work fallback_work; + struct work_struct exit_work; + struct list_head tctx_list; + struct completion ref_comp; + + /* io-wq management, e.g. thread count */ + u32 iowq_limits[2]; + bool iowq_limits_set; + + struct list_head defer_list; + unsigned sq_thread_idle; +}; + +enum { + REQ_F_FIXED_FILE_BIT = IOSQE_FIXED_FILE_BIT, + REQ_F_IO_DRAIN_BIT = IOSQE_IO_DRAIN_BIT, + REQ_F_LINK_BIT = IOSQE_IO_LINK_BIT, + REQ_F_HARDLINK_BIT = IOSQE_IO_HARDLINK_BIT, + REQ_F_FORCE_ASYNC_BIT = IOSQE_ASYNC_BIT, + REQ_F_BUFFER_SELECT_BIT = IOSQE_BUFFER_SELECT_BIT, + REQ_F_CQE_SKIP_BIT = IOSQE_CQE_SKIP_SUCCESS_BIT, + + /* first byte is taken by user flags, shift it to not overlap */ + REQ_F_FAIL_BIT = 8, + REQ_F_INFLIGHT_BIT, + REQ_F_CUR_POS_BIT, + REQ_F_NOWAIT_BIT, + REQ_F_LINK_TIMEOUT_BIT, + REQ_F_NEED_CLEANUP_BIT, + REQ_F_POLLED_BIT, + REQ_F_BUFFER_SELECTED_BIT, + REQ_F_BUFFER_RING_BIT, + REQ_F_REISSUE_BIT, + REQ_F_CREDS_BIT, + REQ_F_REFCOUNT_BIT, + REQ_F_ARM_LTIMEOUT_BIT, + REQ_F_ASYNC_DATA_BIT, + REQ_F_SKIP_LINK_CQES_BIT, + REQ_F_SINGLE_POLL_BIT, + REQ_F_DOUBLE_POLL_BIT, + REQ_F_PARTIAL_IO_BIT, + REQ_F_CQE32_INIT_BIT, + REQ_F_APOLL_MULTISHOT_BIT, + REQ_F_CLEAR_POLLIN_BIT, + REQ_F_HASH_LOCKED_BIT, + /* keep async read/write and isreg together and in order */ + REQ_F_SUPPORT_NOWAIT_BIT, + REQ_F_ISREG_BIT, + + /* not a real bit, just to check we're not overflowing the space */ + __REQ_F_LAST_BIT, +}; + +enum { + /* ctx owns file */ + REQ_F_FIXED_FILE = BIT(REQ_F_FIXED_FILE_BIT), + /* drain existing IO first */ + REQ_F_IO_DRAIN = BIT(REQ_F_IO_DRAIN_BIT), + /* linked sqes */ + REQ_F_LINK = BIT(REQ_F_LINK_BIT), + /* doesn't sever on completion < 0 */ + REQ_F_HARDLINK = BIT(REQ_F_HARDLINK_BIT), + /* IOSQE_ASYNC */ + REQ_F_FORCE_ASYNC = BIT(REQ_F_FORCE_ASYNC_BIT), + /* IOSQE_BUFFER_SELECT */ + REQ_F_BUFFER_SELECT = BIT(REQ_F_BUFFER_SELECT_BIT), + /* IOSQE_CQE_SKIP_SUCCESS */ + REQ_F_CQE_SKIP = BIT(REQ_F_CQE_SKIP_BIT), + + /* fail rest of links */ + REQ_F_FAIL = BIT(REQ_F_FAIL_BIT), + /* on inflight list, should be cancelled and waited on exit reliably */ + REQ_F_INFLIGHT = BIT(REQ_F_INFLIGHT_BIT), + /* read/write uses file position */ + REQ_F_CUR_POS = BIT(REQ_F_CUR_POS_BIT), + /* must not punt to workers */ + REQ_F_NOWAIT = BIT(REQ_F_NOWAIT_BIT), + /* has or had linked timeout */ + REQ_F_LINK_TIMEOUT = BIT(REQ_F_LINK_TIMEOUT_BIT), + /* needs cleanup */ + REQ_F_NEED_CLEANUP = BIT(REQ_F_NEED_CLEANUP_BIT), + /* already went through poll handler */ + REQ_F_POLLED = BIT(REQ_F_POLLED_BIT), + /* buffer already selected */ + REQ_F_BUFFER_SELECTED = BIT(REQ_F_BUFFER_SELECTED_BIT), + /* buffer selected from ring, needs commit */ + REQ_F_BUFFER_RING = BIT(REQ_F_BUFFER_RING_BIT), + /* caller should reissue async */ + REQ_F_REISSUE = BIT(REQ_F_REISSUE_BIT), + /* supports async reads/writes */ + REQ_F_SUPPORT_NOWAIT = BIT(REQ_F_SUPPORT_NOWAIT_BIT), + /* regular file */ + REQ_F_ISREG = BIT(REQ_F_ISREG_BIT), + /* has creds assigned */ + REQ_F_CREDS = BIT(REQ_F_CREDS_BIT), + /* skip refcounting if not set */ + REQ_F_REFCOUNT = BIT(REQ_F_REFCOUNT_BIT), + /* there is a linked timeout that has to be armed */ + REQ_F_ARM_LTIMEOUT = BIT(REQ_F_ARM_LTIMEOUT_BIT), + /* ->async_data allocated */ + REQ_F_ASYNC_DATA = BIT(REQ_F_ASYNC_DATA_BIT), + /* don't post CQEs while failing linked requests */ + REQ_F_SKIP_LINK_CQES = BIT(REQ_F_SKIP_LINK_CQES_BIT), + /* single poll may be active */ + REQ_F_SINGLE_POLL = BIT(REQ_F_SINGLE_POLL_BIT), + /* double poll may active */ + REQ_F_DOUBLE_POLL = BIT(REQ_F_DOUBLE_POLL_BIT), + /* request has already done partial IO */ + REQ_F_PARTIAL_IO = BIT(REQ_F_PARTIAL_IO_BIT), + /* fast poll multishot mode */ + REQ_F_APOLL_MULTISHOT = BIT(REQ_F_APOLL_MULTISHOT_BIT), + /* ->extra1 and ->extra2 are initialised */ + REQ_F_CQE32_INIT = BIT(REQ_F_CQE32_INIT_BIT), + /* recvmsg special flag, clear EPOLLIN */ + REQ_F_CLEAR_POLLIN = BIT(REQ_F_CLEAR_POLLIN_BIT), + /* hashed into ->cancel_hash_locked, protected by ->uring_lock */ + REQ_F_HASH_LOCKED = BIT(REQ_F_HASH_LOCKED_BIT), +}; + +typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked); + +struct io_task_work { + union { + struct io_wq_work_node node; + struct llist_node fallback_node; + }; + io_req_tw_func_t func; +}; + +struct io_cqe { + __u64 user_data; + __s32 res; + /* fd initially, then cflags for completion */ + union { + __u32 flags; + int fd; + }; +}; + +/* + * Each request type overlays its private data structure on top of this one. + * They must not exceed this one in size. + */ +struct io_cmd_data { + struct file *file; + /* each command gets 56 bytes of data */ + __u8 data[56]; +}; + +#define io_kiocb_to_cmd(req) ((void *) &(req)->cmd) +#define cmd_to_io_kiocb(ptr) ((struct io_kiocb *) ptr) + +struct io_kiocb { + union { + /* + * NOTE! Each of the io_kiocb union members has the file pointer + * as the first entry in their struct definition. So you can + * access the file pointer through any of the sub-structs, + * or directly as just 'file' in this struct. + */ + struct file *file; + struct io_cmd_data cmd; + }; + + u8 opcode; + /* polled IO has completed */ + u8 iopoll_completed; + /* + * Can be either a fixed buffer index, or used with provided buffers. + * For the latter, before issue it points to the buffer group ID, + * and after selection it points to the buffer ID itself. + */ + u16 buf_index; + unsigned int flags; + + struct io_cqe cqe; + + struct io_ring_ctx *ctx; + struct task_struct *task; + + struct io_rsrc_node *rsrc_node; + + union { + /* store used ubuf, so we can prevent reloading */ + struct io_mapped_ubuf *imu; + + /* stores selected buf, valid IFF REQ_F_BUFFER_SELECTED is set */ + struct io_buffer *kbuf; + + /* + * stores buffer ID for ring provided buffers, valid IFF + * REQ_F_BUFFER_RING is set. + */ + struct io_buffer_list *buf_list; + }; + + union { + /* used by request caches, completion batching and iopoll */ + struct io_wq_work_node comp_list; + /* cache ->apoll->events */ + __poll_t apoll_events; + }; + atomic_t refs; + atomic_t poll_refs; + struct io_task_work io_task_work; + /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */ + union { + struct hlist_node hash_node; + struct { + u64 extra1; + u64 extra2; + }; + }; + /* internal polling, see IORING_FEAT_FAST_POLL */ + struct async_poll *apoll; + /* opcode allocated if it needs to store data for async defer */ + void *async_data; + /* linked requests, IFF REQ_F_HARDLINK or REQ_F_LINK are set */ + struct io_kiocb *link; + /* custom credentials, valid IFF REQ_F_CREDS is set */ + const struct cred *creds; + struct io_wq_work work; +}; + +struct io_cancel_data { + struct io_ring_ctx *ctx; + union { + u64 data; + struct file *file; + }; + u32 flags; + int seq; +}; + +struct io_overflow_cqe { + struct list_head list; + struct io_uring_cqe cqe; +}; + +struct io_mapped_ubuf { + u64 ubuf; + u64 ubuf_end; + unsigned int nr_bvecs; + unsigned long acct_pages; + struct bio_vec bvec[]; +}; + +#endif -- cgit From ad163a7e2562230c77102c60f668bac440e60cce Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Sat, 18 Jun 2022 19:44:33 -0600 Subject: io_uring: move a few private types to local headers Commit 3a3d47fa9cfd ("io_uring: make io_uring_types.h public") moved a bunch of io_uring types to a kernel wide header, so we could make tracing a bit saner rather than pass in a ton of arguments. However, there are a few types in there that are not really needed to be system wide. Move the cancel data and mapped buffers back to the appropriate io_uring local headers. Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 18 ------------------ 1 file changed, 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 779c72da5b8f..2015f3ea7cb7 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -528,27 +528,9 @@ struct io_kiocb { struct io_wq_work work; }; -struct io_cancel_data { - struct io_ring_ctx *ctx; - union { - u64 data; - struct file *file; - }; - u32 flags; - int seq; -}; - struct io_overflow_cqe { struct list_head list; struct io_uring_cqe cqe; }; -struct io_mapped_ubuf { - u64 ubuf; - u64 ubuf_end; - unsigned int nr_bvecs; - unsigned long acct_pages; - struct bio_vec bvec[]; -}; - #endif -- cgit From d9dee4302a7cbd6c0142dbdf6d150acc7459de0d Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Sun, 19 Jun 2022 12:26:08 +0100 Subject: io_uring: remove ->flush_cqes optimisation It's not clear how widely used IOSQE_CQE_SKIP_SUCCESS is, and how often ->flush_cqes flag prevents from completion being flushed. Sometimes it's high level of concurrency that enables it at least for one CQE, but sometimes it doesn't save much because nobody waiting on the CQ. Remove ->flush_cqes flag and the optimisation, it should benefit the normal use case. Note, that there is no spurious eventfd problem with that as checks for spuriousness were incorporated into io_eventfd_signal(). Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/692e81eeddccc096f449a7960365fa7b4a18f8e6.1655637157.git.asml.silence@gmail.com [axboe: remove now dead state->flush_cqes variable] Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 2015f3ea7cb7..6bcd7bff6479 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -148,7 +148,6 @@ struct io_submit_state { bool plug_started; bool need_plug; - bool flush_cqes; unsigned short submit_nr; struct blk_plug plug; }; -- cgit From 305bef98870816ae58357d647521891ec558a92e Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Mon, 20 Jun 2022 01:25:55 +0100 Subject: io_uring: hide eventfd assumptions in eventfd paths Some io_uring-eventfd users assume that there won't be spurious wakeups. That assumption has to be honoured by all io_cqring_ev_posted() callers, which is inconvenient and from time to time leads to problems but should be maintained to not break the userspace. Instead of making the callers track whether a CQE was posted or not, hide it inside io_eventfd_signal(). It saves ->cached_cq_tail it saw last time and triggers the eventfd only when ->cached_cq_tail changed since then. Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/0ffc66bae37a2513080b601e4370e147faaa72c5.1655684496.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 6bcd7bff6479..5987f8acca38 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -314,6 +314,8 @@ struct io_ring_ctx { struct list_head defer_list; unsigned sq_thread_idle; + /* protected by ->completion_lock */ + unsigned evfd_last_cq_tail; }; enum { -- cgit From f88262e60bb9cb5740891672ce9f405e7f9393e5 Mon Sep 17 00:00:00 2001 From: Dylan Yudaken Date: Wed, 22 Jun 2022 06:40:23 -0700 Subject: io_uring: lockless task list With networking use cases we see contention on the spinlock used to protect the task_list when multiple threads try and add completions at once. Instead we can use a lockless list, and assume that the first caller to add to the list is responsible for kicking off task work. Signed-off-by: Dylan Yudaken Link: https://lore.kernel.org/r/20220622134028.2013417-4-dylany@fb.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 5987f8acca38..918165a20053 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -428,7 +428,7 @@ typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked); struct io_task_work { union { - struct io_wq_work_node node; + struct llist_node node; struct llist_node fallback_node; }; io_req_tw_func_t func; -- cgit From 3218e5d32dbcf1b9c6dc589eca21deebb14215fa Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Sat, 25 Jun 2022 11:52:59 +0100 Subject: io_uring: fuse fallback_node and normal tw node Now as both normal and fallback paths use llist, just keep one node head in struct io_task_work and kill off ->fallback_node. Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/d04ebde409f7b162fe247b361b4486b193293e46.1656153285.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 918165a20053..3ca8f363f504 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -427,10 +427,7 @@ enum { typedef void (*io_req_tw_func_t)(struct io_kiocb *req, bool *locked); struct io_task_work { - union { - struct llist_node node; - struct llist_node fallback_node; - }; + struct llist_node node; io_req_tw_func_t func; }; -- cgit From 6e73dffbb93cb8797cd4e42e98d837edf0f1a967 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Sat, 25 Jun 2022 11:55:38 +0100 Subject: io_uring: let to set a range for file slot allocation From recently io_uring provides an option to allocate a file index for operation registering fixed files. However, it's utterly unusable with mixed approaches when for a part of files the userspace knows better where to place it, as it may race and users don't have any sane way to pick a slot and hoping it will not be taken. Let the userspace to register a range of fixed file slots in which the auto-allocation happens. The use case is splittting the fixed table in two parts, where on of them is used for auto-allocation and another for slot-specified operations. Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/66ab0394e436f38437cf7c44676e1920d09687ad.1656154403.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 3ca8f363f504..26ef11e978d4 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -233,6 +233,9 @@ struct io_ring_ctx { unsigned long check_cq; + unsigned int file_alloc_start; + unsigned int file_alloc_end; + struct { /* * We cache a range of free CQEs we can use, once exhausted it -- cgit From 9b797a37c4bd83b03cedcfbd15852b836f5e562c Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Thu, 7 Jul 2022 14:16:20 -0600 Subject: io_uring: add abstraction around apoll cache In preparation for adding limits, and one more user, abstract out the core bits of the allocation+free cache. Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 26ef11e978d4..b548da03b563 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -158,6 +158,10 @@ struct io_ev_fd { struct rcu_head rcu; }; +struct io_alloc_cache { + struct hlist_head list; +}; + struct io_ring_ctx { /* const or read-mostly hot data */ struct { @@ -216,7 +220,7 @@ struct io_ring_ctx { struct io_hash_table cancel_table_locked; struct list_head cq_overflow_list; - struct list_head apoll_cache; + struct io_alloc_cache apoll_cache; struct xarray personalities; u32 pers_next; } ____cacheline_aligned_in_smp; -- cgit From 9731bc9855dc169f27433fef3c4d0ff3496c512d Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Thu, 7 Jul 2022 14:20:54 -0600 Subject: io_uring: impose max limit on apoll cache Caches like this tend to grow to the peak size, and then never get any smaller. Impose a max limit on the size, to prevent it from growing too big. A somewhat randomly chosen 512 is the max size we'll allow the cache to get. If a batch of frees come in and would bring it over that, we simply start kfree'ing the surplus. Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index b548da03b563..bf8f95332eda 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -160,6 +160,7 @@ struct io_ev_fd { struct io_alloc_cache { struct hlist_head list; + unsigned int nr_cached; }; struct io_ring_ctx { -- cgit From 43e0bbbd0b0e30d232fd8e9e908125b5c49a9fbc Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Thu, 7 Jul 2022 14:30:09 -0600 Subject: io_uring: add netmsg cache For recvmsg/sendmsg, if they don't complete inline, we currently need to allocate a struct io_async_msghdr for each request. This is a somewhat large struct. Hook up sendmsg/recvmsg to use the io_alloc_cache. This reduces the alloc + free overhead considerably, yielding 4-5% of extra performance running netbench. Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index bf8f95332eda..d54b8b7e0746 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -222,8 +222,7 @@ struct io_ring_ctx { struct io_hash_table cancel_table_locked; struct list_head cq_overflow_list; struct io_alloc_cache apoll_cache; - struct xarray personalities; - u32 pers_next; + struct io_alloc_cache netmsg_cache; } ____cacheline_aligned_in_smp; /* IRQ completion list, under ->completion_lock */ @@ -241,6 +240,9 @@ struct io_ring_ctx { unsigned int file_alloc_start; unsigned int file_alloc_end; + struct xarray personalities; + u32 pers_next; + struct { /* * We cache a range of free CQEs we can use, once exhausted it -- cgit From 7fa875b8e53c288d616234b9daf417b0650ce1cc Mon Sep 17 00:00:00 2001 From: Dylan Yudaken Date: Thu, 14 Jul 2022 04:02:56 -0700 Subject: net: copy from user before calling __copy_msghdr this is in preparation for multishot receive from io_uring, where it needs to have access to the original struct user_msghdr. functionally this should be a no-op. Acked-by: Paolo Abeni Signed-off-by: Dylan Yudaken Link: https://lore.kernel.org/r/20220714110258.1336200-2-dylany@fb.com Signed-off-by: Jens Axboe --- include/linux/socket.h | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/socket.h b/include/linux/socket.h index 17311ad9f9af..be24f1c8568a 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -416,10 +416,9 @@ extern int recvmsg_copy_msghdr(struct msghdr *msg, struct user_msghdr __user *umsg, unsigned flags, struct sockaddr __user **uaddr, struct iovec **iov); -extern int __copy_msghdr_from_user(struct msghdr *kmsg, - struct user_msghdr __user *umsg, - struct sockaddr __user **save_addr, - struct iovec __user **uiov, size_t *nsegs); +extern int __copy_msghdr(struct msghdr *kmsg, + struct user_msghdr *umsg, + struct sockaddr __user **save_addr); /* helpers which do the actual work for syscalls */ extern int __sys_recvfrom(int fd, void __user *ubuf, size_t size, -- cgit From fe6c9c6e3e3e332b998393d214fba9d09ab0acb0 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Thu, 23 Jun 2022 10:51:46 -0700 Subject: mm: Add balance_dirty_pages_ratelimited_flags() function This adds the helper function balance_dirty_pages_ratelimited_flags(). It adds the parameter flags to balance_dirty_pages_ratelimited(). The flags parameter is passed to balance_dirty_pages(). For async buffered writes the flag value will be BDP_ASYNC. If balance_dirty_pages() gets called for async buffered write, we don't want to wait. Instead we need to indicate to the caller that throttling is needed so that it can stop writing and offload the rest of the write to a context that can block. The new helper function is also used by balance_dirty_pages_ratelimited(). Signed-off-by: Jan Kara Signed-off-by: Stefan Roesch Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220623175157.1715274-4-shr@fb.com [axboe: fix kerneltest bot 'ret' issue] Signed-off-by: Jens Axboe --- include/linux/writeback.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/writeback.h b/include/linux/writeback.h index da21d63f70e2..b8c9610c2313 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -364,7 +364,14 @@ void global_dirty_limits(unsigned long *pbackground, unsigned long *pdirty); unsigned long wb_calc_thresh(struct bdi_writeback *wb, unsigned long thresh); void wb_update_bandwidth(struct bdi_writeback *wb); + +/* Invoke balance dirty pages in async mode. */ +#define BDP_ASYNC 0x0001 + void balance_dirty_pages_ratelimited(struct address_space *mapping); +int balance_dirty_pages_ratelimited_flags(struct address_space *mapping, + unsigned int flags); + bool wb_over_bg_thresh(struct bdi_writeback *wb); typedef int (*writepage_t)(struct page *page, struct writeback_control *wbc, -- cgit From 8017553980d0bbfef3e66c583363828565afd6da Mon Sep 17 00:00:00 2001 From: Stefan Roesch Date: Thu, 23 Jun 2022 10:51:50 -0700 Subject: fs: add a FMODE_BUF_WASYNC flags for f_mode This introduces the flag FMODE_BUF_WASYNC. If devices support async buffered writes, this flag can be set. It also modifies the check in generic_write_checks to take async buffered writes into consideration. Signed-off-by: Stefan Roesch Reviewed-by: Christoph Hellwig Reviewed-by: Christian Brauner (Microsoft) Reviewed-by: Darrick J. Wong Link: https://lore.kernel.org/r/20220623175157.1715274-8-shr@fb.com Signed-off-by: Jens Axboe --- include/linux/fs.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 9ad5e3520fae..bc84847c201e 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -180,6 +180,9 @@ typedef int (dio_iodone_t)(struct kiocb *iocb, loff_t offset, /* File supports async buffered reads */ #define FMODE_BUF_RASYNC ((__force fmode_t)0x40000000) +/* File supports async nowait buffered writes */ +#define FMODE_BUF_WASYNC ((__force fmode_t)0x80000000) + /* * Attribute flags. These should be or-ed together to figure out what * has been changed! -- cgit From 66fa3cedf16abc82d19b943e3289c82e685419d5 Mon Sep 17 00:00:00 2001 From: Stefan Roesch Date: Thu, 23 Jun 2022 10:51:53 -0700 Subject: fs: Add async write file modification handling. This adds a file_modified_async() function to return -EAGAIN if the request either requires to remove privileges or needs to update the file modification time. This is required for async buffered writes, so the request gets handled in the io worker of io-uring. Signed-off-by: Stefan Roesch Reviewed-by: Christoph Hellwig Reviewed-by: Jan Kara Reviewed-by: Christian Brauner (Microsoft) Reviewed-by: Darrick J. Wong Link: https://lore.kernel.org/r/20220623175157.1715274-11-shr@fb.com Signed-off-by: Jens Axboe --- include/linux/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index bc84847c201e..c0d99b5a166b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2390,6 +2390,7 @@ static inline void file_accessed(struct file *file) } extern int file_modified(struct file *file); +int kiocb_modified(struct kiocb *iocb); int sync_inode_metadata(struct inode *inode, int wait); -- cgit From e70cb60893ca64b7df06864aa16c1cf6d6c671db Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Tue, 12 Jul 2022 21:52:37 +0100 Subject: io_uring: export io_put_task() Make io_put_task() available to non-core parts of io_uring, we'll need it for notification infrastructure. Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/3686807d4c03b72e389947b0e8692d4d44334ef0.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index d54b8b7e0746..368c34d14b13 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -4,6 +4,7 @@ #include #include #include +#include #include struct io_wq_work_node { @@ -43,6 +44,30 @@ struct io_hash_table { unsigned hash_bits; }; +/* + * Arbitrary limit, can be raised if need be + */ +#define IO_RINGFD_REG_MAX 16 + +struct io_uring_task { + /* submission side */ + int cached_refs; + const struct io_ring_ctx *last; + struct io_wq *io_wq; + struct file *registered_rings[IO_RINGFD_REG_MAX]; + + struct xarray xa; + struct wait_queue_head wait; + atomic_t in_idle; + atomic_t inflight_tracked; + struct percpu_counter inflight; + + struct { /* task_work */ + struct llist_head task_list; + struct callback_head task_work; + } ____cacheline_aligned_in_smp; +}; + struct io_uring { u32 head ____cacheline_aligned_in_smp; u32 tail ____cacheline_aligned_in_smp; -- cgit From eb42cebb2cf24c48f60c32856a4bba93d42659c8 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Tue, 12 Jul 2022 21:52:38 +0100 Subject: io_uring: add zc notification infrastructure Add internal part of send zerocopy notifications. There are two main structures, the first one is struct io_notif, which carries inside struct ubuf_info and maps 1:1 to it. io_uring will be binding a number of zerocopy send requests to it and ask to complete (aka flush) it. When flushed and all attached requests and skbs complete, it'll generate one and only one CQE. There are intended to be passed into the network layer as struct msghdr::msg_ubuf. The second concept is notification slots. The userspace will be able to register an array of slots and subsequently addressing them by the index in the array. Slots are independent of each other. Each slot can have only one notifier at a time (called active notifier) but many notifiers during the lifetime. When active, a notifier not going to post any completion but the userspace can attach requests to it by specifying the corresponding slot while issueing send zc requests. Eventually, the userspace will want to "flush" the notifier losing any way to attach new requests to it, however it can use the next atomatically added notifier of this slot or of any other slot. When the network layer is done with all enqueued skbs attached to a notifier and doesn't need the specified in them user data, the flushed notifier will post a CQE. Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/3ecf54c31a85762bf679b0a432c9f43ecf7e61cc.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 368c34d14b13..f7fab3758cb9 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -34,6 +34,9 @@ struct io_file_table { unsigned int alloc_hint; }; +struct io_notif; +struct io_notif_slot; + struct io_hash_bucket { spinlock_t lock; struct hlist_head list; @@ -237,6 +240,8 @@ struct io_ring_ctx { unsigned nr_user_files; unsigned nr_user_bufs; struct io_mapped_ubuf **user_bufs; + struct io_notif_slot *notif_slots; + unsigned nr_notif_slots; struct io_submit_state submit_state; -- cgit From eb4a299b2f95437af6183946c2a2e850621cefdb Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Tue, 12 Jul 2022 21:52:39 +0100 Subject: io_uring: cache struct io_notif kmalloc'ing struct io_notif is too expensive when done frequently, cache them as many other resources in io_uring. Keep two list, the first one is from where we're getting notifiers, it's protected by ->uring_lock. The second is protected by ->completion_lock, to which we queue released notifiers. Then we splice one list into another when needed. Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/9dec18f7fcbab9f4bd40b96e5ae158b119945230.1657643355.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index f7fab3758cb9..144493cbadb5 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -249,6 +249,9 @@ struct io_ring_ctx { struct xarray io_bl_xa; struct list_head io_buffers_cache; + /* struct io_notif cache, protected by uring_lock */ + struct list_head notif_list; + struct io_hash_table cancel_table_locked; struct list_head cq_overflow_list; struct io_alloc_cache apoll_cache; @@ -259,6 +262,10 @@ struct io_ring_ctx { struct io_wq_work_list locked_free_list; unsigned int locked_free_nr; + /* struct io_notif cache protected by completion_lock */ + struct list_head notif_list_locked; + unsigned int notif_locked_nr; + const struct cred *sq_creds; /* cred used for __io_sq_thread() */ struct io_sq_data *sq_data; /* if using sq thread polling */ -- cgit From 9d9b010f12cc4e254bbba32d79c965b564a8957f Mon Sep 17 00:00:00 2001 From: Ben Dooks Date: Sun, 24 Jul 2022 23:21:52 +0100 Subject: irqchip/mmp: Declare init functions in common header file The functions icu_init_irq and mmp2_init_icu are exported from this code, so declare them in the header file to avoid the following sparse warnings: drivers/irqchip/irq-mmp.c:248:13: warning: symbol 'icu_init_irq' was not declared. Should it be static? drivers/irqchip/irq-mmp.c:271:13: warning: symbol 'mmp2_init_icu' was not declared. Should it be static? Signed-off-by: Ben Dooks [maz: fixup commit message] Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20220724222152.551850-1-ben-linux@fluff.org --- include/linux/irqchip/mmp.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/irqchip/mmp.h b/include/linux/irqchip/mmp.h index cb8455c87c8a..aa1813749a4f 100644 --- a/include/linux/irqchip/mmp.h +++ b/include/linux/irqchip/mmp.h @@ -4,4 +4,7 @@ extern struct irq_chip icu_irq_chip; +extern void icu_init_irq(void); +extern void mmp2_init_icu(void); + #endif /* __IRQCHIP_MMP_H */ -- cgit From d349ab99eec7ab0f977fc4aac27aa476907acf90 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Sun, 17 Jul 2022 12:35:24 +0200 Subject: random: handle archrandom with multiple longs The archrandom interface was originally designed for x86, which supplies RDRAND/RDSEED for receiving random words into registers, resulting in one function to generate an int and another to generate a long. However, other architectures don't follow this. On arm64, the SMCCC TRNG interface can return between one and three longs. On s390, the CPACF TRNG interface can return arbitrary amounts, with four longs having the same cost as one. On UML, the os_getrandom() interface can return arbitrary amounts. So change the api signature to take a "max_longs" parameter designating the maximum number of longs requested, and then return the number of longs generated. Since callers need to check this return value and loop anyway, each arch implementation does not bother implementing its own loop to try again to fill the maximum number of longs. Additionally, all existing callers pass in a constant max_longs parameter. Taken together, these two things mean that the codegen doesn't really change much for one-word-at-a-time platforms, while performance is greatly improved on platforms such as s390. Acked-by: Heiko Carstens Acked-by: Catalin Marinas Acked-by: Mark Rutland Acked-by: Michael Ellerman Acked-by: Borislav Petkov Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index 865770e29f3e..3fec206487f6 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -112,19 +112,19 @@ declare_get_random_var_wait(long, unsigned long) * Called from the boot CPU during startup; not valid to call once * secondary CPUs are up and preemption is possible. */ -#ifndef arch_get_random_seed_long_early -static inline bool __init arch_get_random_seed_long_early(unsigned long *v) +#ifndef arch_get_random_seed_longs_early +static inline size_t __init arch_get_random_seed_longs_early(unsigned long *v, size_t max_longs) { WARN_ON(system_state != SYSTEM_BOOTING); - return arch_get_random_seed_long(v); + return arch_get_random_seed_longs(v, max_longs); } #endif -#ifndef arch_get_random_long_early -static inline bool __init arch_get_random_long_early(unsigned long *v) +#ifndef arch_get_random_longs_early +static inline bool __init arch_get_random_longs_early(unsigned long *v, size_t max_longs) { WARN_ON(system_state != SYSTEM_BOOTING); - return arch_get_random_long(v); + return arch_get_random_longs(v, max_longs); } #endif -- cgit From 7ffcdaa670164a2ad3844a5ef6df5423782ba290 Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia Date: Mon, 25 Jul 2022 09:32:21 -0400 Subject: SUNRPC expose functions for offline remote xprt functionality Re-arrange the code that make offline transport and delete transport callable functions. Signed-off-by: Olga Kornievskaia Signed-off-by: Trond Myklebust --- include/linux/sunrpc/xprt.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index 522bbf937957..0d51b9f9ea37 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -505,4 +505,7 @@ static inline int xprt_test_and_set_binding(struct rpc_xprt *xprt) return test_and_set_bit(XPRT_BINDING, &xprt->state); } +void xprt_set_offline_locked(struct rpc_xprt *xprt, struct rpc_xprt_switch *xps); +void xprt_set_online_locked(struct rpc_xprt *xprt, struct rpc_xprt_switch *xps); +void xprt_delete_locked(struct rpc_xprt *xprt, struct rpc_xprt_switch *xps); #endif /* _LINUX_SUNRPC_XPRT_H */ -- cgit From 895245ccea251ff54ea19bc364c9a49007918115 Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia Date: Mon, 25 Jul 2022 09:32:22 -0400 Subject: SUNRPC add function to offline remove trunkable transports Iterate thru available transports in the xprt_switch for all trunkable transports offline and possibly remote them as well. Signed-off-by: Olga Kornievskaia Signed-off-by: Trond Myklebust --- include/linux/sunrpc/clnt.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h index 90501404fa49..d14333f4947a 100644 --- a/include/linux/sunrpc/clnt.h +++ b/include/linux/sunrpc/clnt.h @@ -234,6 +234,7 @@ int rpc_clnt_setup_test_and_add_xprt(struct rpc_clnt *, struct rpc_xprt_switch *, struct rpc_xprt *, void *); +void rpc_clnt_manage_trunked_xprts(struct rpc_clnt *); const char *rpc_proc_name(const struct rpc_task *task); -- cgit From 95d0d30c66b855f614e677b8cd0455eed0765a6f Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia Date: Mon, 25 Jul 2022 09:32:24 -0400 Subject: SUNRPC create an iterator to list only OFFLINE xprts Create a new iterator helper that will go thru the all the transports in the switch and return transports that are marked OFFLINE. Signed-off-by: Olga Kornievskaia Signed-off-by: Trond Myklebust --- include/linux/sunrpc/xprtmultipath.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xprtmultipath.h b/include/linux/sunrpc/xprtmultipath.h index bbb8a5fa0816..688ca7eb1d01 100644 --- a/include/linux/sunrpc/xprtmultipath.h +++ b/include/linux/sunrpc/xprtmultipath.h @@ -63,6 +63,9 @@ extern void xprt_iter_init(struct rpc_xprt_iter *xpi, extern void xprt_iter_init_listall(struct rpc_xprt_iter *xpi, struct rpc_xprt_switch *xps); +extern void xprt_iter_init_listoffline(struct rpc_xprt_iter *xpi, + struct rpc_xprt_switch *xps); + extern void xprt_iter_destroy(struct rpc_xprt_iter *xpi); extern struct rpc_xprt_switch *xprt_iter_xchg_switch( -- cgit From 9368fd6c75053630e95a6dbd17c9522e82101276 Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia Date: Mon, 25 Jul 2022 09:32:25 -0400 Subject: SUNRPC enable back offline transports in trunking discovery When we are adding a transport to a xprt_switch that's already on the list but has been marked OFFLINE, then make the state ONLINE since it's been tested now. Signed-off-by: Olga Kornievskaia Signed-off-by: Trond Myklebust --- include/linux/sunrpc/clnt.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h index d14333f4947a..71a3a1dd7e81 100644 --- a/include/linux/sunrpc/clnt.h +++ b/include/linux/sunrpc/clnt.h @@ -242,6 +242,7 @@ void rpc_clnt_xprt_switch_put(struct rpc_clnt *); void rpc_clnt_xprt_switch_add_xprt(struct rpc_clnt *, struct rpc_xprt *); bool rpc_clnt_xprt_switch_has_addr(struct rpc_clnt *clnt, const struct sockaddr *sap); +void rpc_clnt_xprt_set_online(struct rpc_clnt *clnt, struct rpc_xprt *xprt); void rpc_cleanup_clids(void); static inline int rpc_reply_expected(struct rpc_task *task) -- cgit From 497e6464d6adcee64f071b18fc826e63cfd2f0a5 Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia Date: Mon, 25 Jul 2022 09:32:26 -0400 Subject: SUNRPC create an rpc function that allows xprt removal from rpc_clnt Expose a function that allows a removal of xprt from the rpc_clnt. When called from NFS that's running a trunked transport then don't decrement the active transport counter. Signed-off-by: Olga Kornievskaia Signed-off-by: Trond Myklebust --- include/linux/sunrpc/clnt.h | 1 + include/linux/sunrpc/xprtmultipath.h | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h index 71a3a1dd7e81..7a43fd514398 100644 --- a/include/linux/sunrpc/clnt.h +++ b/include/linux/sunrpc/clnt.h @@ -240,6 +240,7 @@ const char *rpc_proc_name(const struct rpc_task *task); void rpc_clnt_xprt_switch_put(struct rpc_clnt *); void rpc_clnt_xprt_switch_add_xprt(struct rpc_clnt *, struct rpc_xprt *); +void rpc_clnt_xprt_switch_remove_xprt(struct rpc_clnt *, struct rpc_xprt *); bool rpc_clnt_xprt_switch_has_addr(struct rpc_clnt *clnt, const struct sockaddr *sap); void rpc_clnt_xprt_set_online(struct rpc_clnt *clnt, struct rpc_xprt *xprt); diff --git a/include/linux/sunrpc/xprtmultipath.h b/include/linux/sunrpc/xprtmultipath.h index 688ca7eb1d01..9fff0768d942 100644 --- a/include/linux/sunrpc/xprtmultipath.h +++ b/include/linux/sunrpc/xprtmultipath.h @@ -55,7 +55,7 @@ extern void rpc_xprt_switch_set_roundrobin(struct rpc_xprt_switch *xps); extern void rpc_xprt_switch_add_xprt(struct rpc_xprt_switch *xps, struct rpc_xprt *xprt); extern void rpc_xprt_switch_remove_xprt(struct rpc_xprt_switch *xps, - struct rpc_xprt *xprt); + struct rpc_xprt *xprt, bool offline); extern void xprt_iter_init(struct rpc_xprt_iter *xpi, struct rpc_xprt_switch *xps); -- cgit From 273d6aed9e5a1859dda15256f45561315c3d237a Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia Date: Mon, 25 Jul 2022 09:32:29 -0400 Subject: SUNRPC export xprt_iter_rewind function Make xprt_iter_rewind callable outside of xprtmultipath.c Signed-off-by: Olga Kornievskaia Signed-off-by: Trond Myklebust --- include/linux/sunrpc/xprtmultipath.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xprtmultipath.h b/include/linux/sunrpc/xprtmultipath.h index 9fff0768d942..c0514c684b2c 100644 --- a/include/linux/sunrpc/xprtmultipath.h +++ b/include/linux/sunrpc/xprtmultipath.h @@ -68,6 +68,8 @@ extern void xprt_iter_init_listoffline(struct rpc_xprt_iter *xpi, extern void xprt_iter_destroy(struct rpc_xprt_iter *xpi); +extern void xprt_iter_rewind(struct rpc_xprt_iter *xpi); + extern struct rpc_xprt_switch *xprt_iter_xchg_switch( struct rpc_xprt_iter *xpi, struct rpc_xprt_switch *newswitch); -- cgit From 92cc04f60ab4ae199eee507e5cd4d5aa6c722e9c Mon Sep 17 00:00:00 2001 From: Olga Kornievskaia Date: Mon, 25 Jul 2022 09:32:30 -0400 Subject: SUNRPC create a function that probes only offline transports For only offline transports, attempt to check connectivity via a NULL call and, if that succeeds, call a provided session trunking detection function. Signed-off-by: Olga Kornievskaia Signed-off-by: Trond Myklebust --- include/linux/sunrpc/clnt.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h index 7a43fd514398..75eea5ebb179 100644 --- a/include/linux/sunrpc/clnt.h +++ b/include/linux/sunrpc/clnt.h @@ -235,6 +235,8 @@ int rpc_clnt_setup_test_and_add_xprt(struct rpc_clnt *, struct rpc_xprt *, void *); void rpc_clnt_manage_trunked_xprts(struct rpc_clnt *); +void rpc_clnt_probe_trunked_xprts(struct rpc_clnt *, + struct rpc_add_xprt_test *); const char *rpc_proc_name(const struct rpc_task *task); -- cgit From 39ade048a32ea653b94dcfbf816b0b13a6be8a33 Mon Sep 17 00:00:00 2001 From: "Fabio M. De Francesco" Date: Wed, 6 Jul 2022 13:15:19 +0200 Subject: highmem: Make __kunmap_{local,atomic}() take const void pointer __kunmap_ {local,atomic}() currently take pointers to void. However, this is semantically incorrect, since these functions do not change the memory their arguments point to. Therefore, make this semantics explicit by modifying the __kunmap_{local,atomic}() prototypes to take pointers to const void. As a side effect, compilers may produce more efficient code. Acked-by: Andrew Morton Acked-by: Helge Deller # parisc Suggested-by: David Sterba Suggested-by: Ira Weiny Reviewed-by: Ira Weiny Signed-off-by: Fabio M. De Francesco Reviewed-by: David Sterba Signed-off-by: David Sterba --- include/linux/highmem-internal.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/highmem-internal.h b/include/linux/highmem-internal.h index cddb42ff0473..034b1106d022 100644 --- a/include/linux/highmem-internal.h +++ b/include/linux/highmem-internal.h @@ -8,7 +8,7 @@ #ifdef CONFIG_KMAP_LOCAL void *__kmap_local_pfn_prot(unsigned long pfn, pgprot_t prot); void *__kmap_local_page_prot(struct page *page, pgprot_t prot); -void kunmap_local_indexed(void *vaddr); +void kunmap_local_indexed(const void *vaddr); void kmap_local_fork(struct task_struct *tsk); void __kmap_local_sched_out(void); void __kmap_local_sched_in(void); @@ -89,7 +89,7 @@ static inline void *kmap_local_pfn(unsigned long pfn) return __kmap_local_pfn_prot(pfn, kmap_prot); } -static inline void __kunmap_local(void *vaddr) +static inline void __kunmap_local(const void *vaddr) { kunmap_local_indexed(vaddr); } @@ -121,7 +121,7 @@ static inline void *kmap_atomic_pfn(unsigned long pfn) return __kmap_local_pfn_prot(pfn, kmap_prot); } -static inline void __kunmap_atomic(void *addr) +static inline void __kunmap_atomic(const void *addr) { kunmap_local_indexed(addr); pagefault_enable(); @@ -197,7 +197,7 @@ static inline void *kmap_local_pfn(unsigned long pfn) return kmap_local_page(pfn_to_page(pfn)); } -static inline void __kunmap_local(void *addr) +static inline void __kunmap_local(const void *addr) { #ifdef ARCH_HAS_FLUSH_ON_KUNMAP kunmap_flush_on_unmap(addr); @@ -224,7 +224,7 @@ static inline void *kmap_atomic_pfn(unsigned long pfn) return kmap_atomic(pfn_to_page(pfn)); } -static inline void __kunmap_atomic(void *addr) +static inline void __kunmap_atomic(const void *addr) { #ifdef ARCH_HAS_FLUSH_ON_KUNMAP kunmap_flush_on_unmap(addr); -- cgit From 65ea1b66482f415d51cd46515b02477257330339 Mon Sep 17 00:00:00 2001 From: Naohiro Aota Date: Sat, 9 Jul 2022 08:18:38 +0900 Subject: block: add bdev_max_segments() helper Add bdev_max_segments() like other queue parameters. Reviewed-by: Johannes Thumshirn Reviewed-by: Jens Axboe Reviewed-by: Christoph Hellwig Signed-off-by: Naohiro Aota Signed-off-by: David Sterba --- include/linux/blkdev.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 2f7b43444c5f..62e3ff52ab03 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1206,6 +1206,11 @@ bdev_max_zone_append_sectors(struct block_device *bdev) return queue_max_zone_append_sectors(bdev_get_queue(bdev)); } +static inline unsigned int bdev_max_segments(struct block_device *bdev) +{ + return queue_max_segments(bdev_get_queue(bdev)); +} + static inline unsigned queue_logical_block_size(const struct request_queue *q) { int retval = 512; -- cgit From 44abdd1646e1fbfb781972c0bffc90b4eb3e87b3 Mon Sep 17 00:00:00 2001 From: Nicolin Chen Date: Fri, 22 Jul 2022 19:02:51 -0700 Subject: vfio: Pass in starting IOVA to vfio_pin/unpin_pages API The vfio_pin/unpin_pages() so far accepted arrays of PFNs of user IOVA. Among all three callers, there was only one caller possibly passing in a non-contiguous PFN list, which is now ensured to have contiguous PFN inputs too. Pass in the starting address with "iova" alone to simplify things, so callers no longer need to maintain a PFN list or to pin/unpin one page at a time. This also allows VFIO to use more efficient implementations of pin/unpin_pages. For now, also update vfio_iommu_type1 to fit this new parameter too, while keeping its input intact (being user_iova) since we don't want to spend too much effort swapping its parameters and local variables at that level. Reviewed-by: Christoph Hellwig Reviewed-by: Kirti Wankhede Reviewed-by: Jason Gunthorpe Reviewed-by: Kevin Tian Reviewed-by: Tony Krowiak Acked-by: Eric Farman Tested-by: Terrence Xu Tested-by: Eric Farman Signed-off-by: Nicolin Chen Link: https://lore.kernel.org/r/20220723020256.30081-6-nicolinc@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 9f7d74c24925..9e3b6abcf890 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -161,10 +161,9 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device); #define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long)) -int vfio_pin_pages(struct vfio_device *device, unsigned long *user_pfn, +int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova, int npage, int prot, unsigned long *phys_pfn); -void vfio_unpin_pages(struct vfio_device *device, unsigned long *user_pfn, - int npage); +void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage); int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, void *data, size_t len, bool write); -- cgit From 8561aa4fb7d72011c2352af2b1b9caf588942181 Mon Sep 17 00:00:00 2001 From: Nicolin Chen Date: Fri, 22 Jul 2022 19:02:54 -0700 Subject: vfio: Rename user_iova of vfio_dma_rw() Following the updated vfio_pin/unpin_pages(), use the simpler "iova". Reviewed-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: Kevin Tian Tested-by: Terrence Xu Tested-by: Eric Farman Signed-off-by: Nicolin Chen Link: https://lore.kernel.org/r/20220723020256.30081-9-nicolinc@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 9e3b6abcf890..acefd663e63b 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -164,7 +164,7 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device); int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova, int npage, int prot, unsigned long *phys_pfn); void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage); -int vfio_dma_rw(struct vfio_device *device, dma_addr_t user_iova, +int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data, size_t len, bool write); /* -- cgit From 34a255e67615995f729254307a0581c143e03752 Mon Sep 17 00:00:00 2001 From: Nicolin Chen Date: Fri, 22 Jul 2022 19:02:56 -0700 Subject: vfio: Replace phys_pfn with pages for vfio_pin_pages() Most of the callers of vfio_pin_pages() want "struct page *" and the low-level mm code to pin pages returns a list of "struct page *" too. So there's no gain in converting "struct page *" to PFN in between. Replace the output parameter "phys_pfn" list with a "pages" list, to simplify callers. This also allows us to replace the vfio_iommu_type1 implementation with a more efficient one. And drop the pfn_valid check in the gvt code, as there is no need to do such a check at a page-backed struct page pointer. For now, also update vfio_iommu_type1 to fit this new parameter too. Reviewed-by: Christoph Hellwig Reviewed-by: Kirti Wankhede Reviewed-by: Jason Gunthorpe Reviewed-by: Kevin Tian Acked-by: Eric Farman Tested-by: Terrence Xu Tested-by: Eric Farman Signed-off-by: Nicolin Chen Link: https://lore.kernel.org/r/20220723020256.30081-11-nicolinc@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index acefd663e63b..e05ddc6fe6a5 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -162,7 +162,7 @@ bool vfio_file_has_dev(struct file *file, struct vfio_device *device); #define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long)) int vfio_pin_pages(struct vfio_device *device, dma_addr_t iova, - int npage, int prot, unsigned long *phys_pfn); + int npage, int prot, struct page **pages); void vfio_unpin_pages(struct vfio_device *device, dma_addr_t iova, int npage); int vfio_dma_rw(struct vfio_device *device, dma_addr_t iova, void *data, size_t len, bool write); -- cgit From e948d32c54fa45cc1601c98956bdbbf5f17a3db2 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Sat, 23 Jul 2022 19:57:17 +0200 Subject: video: fbdev: imxfb: Drop platform data support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit No source file but the driver itself includes the header containing the platform data definition. The last user is gone since commit 8485adf17a15 ("ARM: imx: Remove imx device directory"). So we can safely drop platform data support. Signed-off-by: Uwe Kleine-König Signed-off-by: Helge Deller --- include/linux/platform_data/video-imxfb.h | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/video-imxfb.h b/include/linux/platform_data/video-imxfb.h index 02812651af7d..b80a156a6617 100644 --- a/include/linux/platform_data/video-imxfb.h +++ b/include/linux/platform_data/video-imxfb.h @@ -55,16 +55,4 @@ struct imx_fb_videomode { unsigned char bpp; }; -struct imx_fb_platform_data { - struct imx_fb_videomode *mode; - int num_modes; - - u_int pwmr; - u_int lscr1; - u_int dmacr; - - int (*init)(struct platform_device *); - void (*exit)(struct platform_device *); -}; - #endif /* ifndef __MACH_IMXFB_H__ */ -- cgit From e2279cc92919f37b4af985cb28ae350bb3e62e71 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Sat, 23 Jul 2022 19:57:18 +0200 Subject: video: fbdev: imxfb: Drop unused symbols from header MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The only file that includes is the imxfb driver. Drop all symbols that are unused there. Signed-off-by: Uwe Kleine-König Signed-off-by: Helge Deller --- include/linux/platform_data/video-imxfb.h | 35 ------------------------------- 1 file changed, 35 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/video-imxfb.h b/include/linux/platform_data/video-imxfb.h index b80a156a6617..a16837c5e43c 100644 --- a/include/linux/platform_data/video-imxfb.h +++ b/include/linux/platform_data/video-imxfb.h @@ -8,45 +8,10 @@ #include #define PCR_TFT (1 << 31) -#define PCR_COLOR (1 << 30) -#define PCR_PBSIZ_1 (0 << 28) -#define PCR_PBSIZ_2 (1 << 28) -#define PCR_PBSIZ_4 (2 << 28) -#define PCR_PBSIZ_8 (3 << 28) -#define PCR_BPIX_1 (0 << 25) -#define PCR_BPIX_2 (1 << 25) -#define PCR_BPIX_4 (2 << 25) #define PCR_BPIX_8 (3 << 25) #define PCR_BPIX_12 (4 << 25) #define PCR_BPIX_16 (5 << 25) #define PCR_BPIX_18 (6 << 25) -#define PCR_PIXPOL (1 << 24) -#define PCR_FLMPOL (1 << 23) -#define PCR_LPPOL (1 << 22) -#define PCR_CLKPOL (1 << 21) -#define PCR_OEPOL (1 << 20) -#define PCR_SCLKIDLE (1 << 19) -#define PCR_END_SEL (1 << 18) -#define PCR_END_BYTE_SWAP (1 << 17) -#define PCR_REV_VS (1 << 16) -#define PCR_ACD_SEL (1 << 15) -#define PCR_ACD(x) (((x) & 0x7f) << 8) -#define PCR_SCLK_SEL (1 << 7) -#define PCR_SHARP (1 << 6) -#define PCR_PCD(x) ((x) & 0x3f) - -#define PWMR_CLS(x) (((x) & 0x1ff) << 16) -#define PWMR_LDMSK (1 << 15) -#define PWMR_SCR1 (1 << 10) -#define PWMR_SCR0 (1 << 9) -#define PWMR_CC_EN (1 << 8) -#define PWMR_PW(x) ((x) & 0xff) - -#define LSCR1_PS_RISE_DELAY(x) (((x) & 0x7f) << 26) -#define LSCR1_CLS_RISE_DELAY(x) (((x) & 0x3f) << 16) -#define LSCR1_REV_TOGGLE_DELAY(x) (((x) & 0xf) << 8) -#define LSCR1_GRAY2(x) (((x) & 0xf) << 4) -#define LSCR1_GRAY1(x) (((x) & 0xf)) struct imx_fb_videomode { struct fb_videomode mode; -- cgit From ded77a74ee6bc3dea72ad41129823a812e4b64f3 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Sat, 23 Jul 2022 19:57:19 +0200 Subject: video: fbdev: imxfb: Fold into only user MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit No source file but the driver itself includes the header containing the platform data definition. The last user is gone since commit 8485adf17a15 ("ARM: imx: Remove imx device directory"). Move the remaining symbols directly into the driver and remove the then unused header file. Signed-off-by: Uwe Kleine-König Signed-off-by: Helge Deller --- include/linux/platform_data/video-imxfb.h | 23 ----------------------- 1 file changed, 23 deletions(-) delete mode 100644 include/linux/platform_data/video-imxfb.h (limited to 'include/linux') diff --git a/include/linux/platform_data/video-imxfb.h b/include/linux/platform_data/video-imxfb.h deleted file mode 100644 index a16837c5e43c..000000000000 --- a/include/linux/platform_data/video-imxfb.h +++ /dev/null @@ -1,23 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -/* - * This structure describes the machine which we are running on. - */ -#ifndef __MACH_IMXFB_H__ -#define __MACH_IMXFB_H__ - -#include - -#define PCR_TFT (1 << 31) -#define PCR_BPIX_8 (3 << 25) -#define PCR_BPIX_12 (4 << 25) -#define PCR_BPIX_16 (5 << 25) -#define PCR_BPIX_18 (6 << 25) - -struct imx_fb_videomode { - struct fb_videomode mode; - u32 pcr; - bool aus_mode; - unsigned char bpp; -}; - -#endif /* ifndef __MACH_IMXFB_H__ */ -- cgit From 42399301203e3cddef36cde457228f9247618313 Mon Sep 17 00:00:00 2001 From: Logan Gunthorpe Date: Fri, 8 Jul 2022 10:50:52 -0600 Subject: lib/scatterlist: add flag for indicating P2PDMA segments in an SGL Introduce a dma_flags field in struct scatterlist. These flags will be used by dma_[un]map_sg_p2pdma() to determine when a given SGL segments dma_address points to a PCI bus address. dma_unmap_sg_p2pdma() will need to perform different cleanup when a segment is marked as a bus address. The dma_flags field will fit in the existing padding on 64BIT systems (assuming CONFIG_NEED_SG_DMA_LENGTH is also set). The new bit will only be used when CONFIG_PCI_P2PDMA is set; this means PCI P2PDMA will require CONFIG_64BIT. This should be acceptable as the majority of P2PDMA use cases are restricted to newer root complexes and roughly require the extra address space for memory BARs used in the transactions. Signed-off-by: Logan Gunthorpe Signed-off-by: Christoph Hellwig --- include/linux/scatterlist.h | 69 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 69 insertions(+) (limited to 'include/linux') diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h index 7ff9d6386c12..375a5e90d86a 100644 --- a/include/linux/scatterlist.h +++ b/include/linux/scatterlist.h @@ -16,6 +16,9 @@ struct scatterlist { #ifdef CONFIG_NEED_SG_DMA_LENGTH unsigned int dma_length; #endif +#ifdef CONFIG_PCI_P2PDMA + unsigned int dma_flags; +#endif }; /* @@ -245,6 +248,72 @@ static inline void sg_unmark_end(struct scatterlist *sg) sg->page_link &= ~SG_END; } +/* + * CONFGI_PCI_P2PDMA depends on CONFIG_64BIT which means there is 4 bytes + * in struct scatterlist (assuming also CONFIG_NEED_SG_DMA_LENGTH is set). + * Use this padding for DMA flags bits to indicate when a specific + * dma address is a bus address. + */ +#ifdef CONFIG_PCI_P2PDMA + +#define SG_DMA_BUS_ADDRESS (1 << 0) + +/** + * sg_dma_is_bus address - Return whether a given segment was marked + * as a bus address + * @sg: SG entry + * + * Description: + * Returns true if sg_dma_mark_bus_address() has been called on + * this segment. + **/ +static inline bool sg_is_dma_bus_address(struct scatterlist *sg) +{ + return sg->dma_flags & SG_DMA_BUS_ADDRESS; +} + +/** + * sg_dma_mark_bus address - Mark the scatterlist entry as a bus address + * @sg: SG entry + * + * Description: + * Marks the passed in sg entry to indicate that the dma_address is + * a bus address and doesn't need to be unmapped. This should only be + * used by dma_map_sg() implementations to mark bus addresses + * so they can be properly cleaned up in dma_unmap_sg(). + **/ +static inline void sg_dma_mark_bus_address(struct scatterlist *sg) +{ + sg->dma_flags |= SG_DMA_BUS_ADDRESS; +} + +/** + * sg_unmark_bus_address - Unmark the scatterlist entry as a bus address + * @sg: SG entry + * + * Description: + * Clears the bus address mark. + **/ +static inline void sg_dma_unmark_bus_address(struct scatterlist *sg) +{ + sg->dma_flags &= ~SG_DMA_BUS_ADDRESS; +} + +#else + +static inline bool sg_is_dma_bus_address(struct scatterlist *sg) +{ + return false; +} +static inline void sg_dma_mark_bus_address(struct scatterlist *sg) +{ +} +static inline void sg_dma_unmark_bus_address(struct scatterlist *sg) +{ +} + +#endif + /** * sg_phys - Return physical address of an sg entry * @sg: SG entry -- cgit From 5e180ff326b43bd3fb72892e1083b2707159aa6e Mon Sep 17 00:00:00 2001 From: Logan Gunthorpe Date: Fri, 8 Jul 2022 10:50:54 -0600 Subject: PCI/P2PDMA: Introduce helpers for dma_map_sg implementations Add pci_p2pdma_map_segment() as a helper for dma_map_sg() implementations. It takes an scatterlist segment that must point to a pci_p2pdma struct page and will map it if the mapping requires a bus address. The return value indicates whether the mapping required a bus address or whether the caller still needs to map the segment normally. If the segment should not be mapped, -EREMOTEIO is returned. This helper uses a state structure to track the changes to the pgmap across calls and avoid needing to lookup into the xarray for every page. The prototype for the helper is added to dma-map-ops.h as it is only useful to dma map implementations and don't need to pollute the public pci-p2pdma header. Signed-off-by: Logan Gunthorpe Acked-by: Bjorn Helgaas Reviewed-by: Christoph Hellwig Signed-off-by: Christoph Hellwig --- include/linux/dma-map-ops.h | 53 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index 98ceba6fa848..99cec59dbfcb 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -380,4 +380,57 @@ static inline void debug_dma_dump_mappings(struct device *dev) extern const struct dma_map_ops dma_dummy_ops; +enum pci_p2pdma_map_type { + /* + * PCI_P2PDMA_MAP_UNKNOWN: Used internally for indicating the mapping + * type hasn't been calculated yet. Functions that return this enum + * never return this value. + */ + PCI_P2PDMA_MAP_UNKNOWN = 0, + + /* + * PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will + * traverse the host bridge and the host bridge is not in the + * allowlist. DMA Mapping routines should return an error when + * this is returned. + */ + PCI_P2PDMA_MAP_NOT_SUPPORTED, + + /* + * PCI_P2PDMA_BUS_ADDR: Indicates that two devices can talk to + * each other directly through a PCI switch and the transaction will + * not traverse the host bridge. Such a mapping should program + * the DMA engine with PCI bus addresses. + */ + PCI_P2PDMA_MAP_BUS_ADDR, + + /* + * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk + * to each other, but the transaction traverses a host bridge on the + * allowlist. In this case, a normal mapping either with CPU physical + * addresses (in the case of dma-direct) or IOVA addresses (in the + * case of IOMMUs) should be used to program the DMA engine. + */ + PCI_P2PDMA_MAP_THRU_HOST_BRIDGE, +}; + +struct pci_p2pdma_map_state { + struct dev_pagemap *pgmap; + int map; + u64 bus_off; +}; + +#ifdef CONFIG_PCI_P2PDMA +enum pci_p2pdma_map_type +pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev, + struct scatterlist *sg); +#else /* CONFIG_PCI_P2PDMA */ +static inline enum pci_p2pdma_map_type +pci_p2pdma_map_segment(struct pci_p2pdma_map_state *state, struct device *dev, + struct scatterlist *sg) +{ + return PCI_P2PDMA_MAP_NOT_SUPPORTED; +} +#endif /* CONFIG_PCI_P2PDMA */ + #endif /* _LINUX_DMA_MAP_OPS_H */ -- cgit From 159bf19270e80b5bc4b13aa88072dcb390b4d297 Mon Sep 17 00:00:00 2001 From: Logan Gunthorpe Date: Fri, 8 Jul 2022 10:50:57 -0600 Subject: dma-mapping: add flags to dma_map_ops to indicate PCI P2PDMA support Add a flags member to the dma_map_ops structure with one flag to indicate support for PCI P2PDMA. Also, add a helper to check if a device supports PCI P2PDMA. Signed-off-by: Logan Gunthorpe Reviewed-by: Jason Gunthorpe Reviewed-by: Christoph Hellwig Signed-off-by: Christoph Hellwig --- include/linux/dma-map-ops.h | 10 ++++++++++ include/linux/dma-mapping.h | 5 +++++ 2 files changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h index 99cec59dbfcb..010df04358aa 100644 --- a/include/linux/dma-map-ops.h +++ b/include/linux/dma-map-ops.h @@ -11,7 +11,17 @@ struct cma; +/* + * Values for struct dma_map_ops.flags: + * + * DMA_F_PCI_P2PDMA_SUPPORTED: Indicates the dma_map_ops implementation can + * handle PCI P2PDMA pages in the map_sg/unmap_sg operation. + */ +#define DMA_F_PCI_P2PDMA_SUPPORTED (1 << 0) + struct dma_map_ops { + unsigned int flags; + void *(*alloc)(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs); diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index fe3849434b2a..25a30906289d 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -140,6 +140,7 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma, unsigned long attrs); bool dma_can_mmap(struct device *dev); int dma_supported(struct device *dev, u64 mask); +bool dma_pci_p2pdma_supported(struct device *dev); int dma_set_mask(struct device *dev, u64 mask); int dma_set_coherent_mask(struct device *dev, u64 mask); u64 dma_get_required_mask(struct device *dev); @@ -251,6 +252,10 @@ static inline int dma_supported(struct device *dev, u64 mask) { return 0; } +static inline bool dma_pci_p2pdma_supported(struct device *dev) +{ + return false; +} static inline int dma_set_mask(struct device *dev, u64 mask) { return -EIO; -- cgit From 0d06132fc84bd91379300768c75a19474e06d0c5 Mon Sep 17 00:00:00 2001 From: Logan Gunthorpe Date: Fri, 8 Jul 2022 10:51:04 -0600 Subject: PCI/P2PDMA: Remove pci_p2pdma_[un]map_sg() This interface is superseded by support in dma_map_sg() which now supports heterogeneous scatterlists. There are no longer any users, so remove it. Signed-off-by: Logan Gunthorpe Acked-by: Bjorn Helgaas Reviewed-by: Jason Gunthorpe Reviewed-by: Max Gurtovoy Reviewed-by: Christoph Hellwig Signed-off-by: Christoph Hellwig --- include/linux/pci-p2pdma.h | 27 --------------------------- 1 file changed, 27 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pci-p2pdma.h b/include/linux/pci-p2pdma.h index 8318a97c9c61..2c07aa6b7665 100644 --- a/include/linux/pci-p2pdma.h +++ b/include/linux/pci-p2pdma.h @@ -30,10 +30,6 @@ struct scatterlist *pci_p2pmem_alloc_sgl(struct pci_dev *pdev, unsigned int *nents, u32 length); void pci_p2pmem_free_sgl(struct pci_dev *pdev, struct scatterlist *sgl); void pci_p2pmem_publish(struct pci_dev *pdev, bool publish); -int pci_p2pdma_map_sg_attrs(struct device *dev, struct scatterlist *sg, - int nents, enum dma_data_direction dir, unsigned long attrs); -void pci_p2pdma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg, - int nents, enum dma_data_direction dir, unsigned long attrs); int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev, bool *use_p2pdma); ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev, @@ -83,17 +79,6 @@ static inline void pci_p2pmem_free_sgl(struct pci_dev *pdev, static inline void pci_p2pmem_publish(struct pci_dev *pdev, bool publish) { } -static inline int pci_p2pdma_map_sg_attrs(struct device *dev, - struct scatterlist *sg, int nents, enum dma_data_direction dir, - unsigned long attrs) -{ - return 0; -} -static inline void pci_p2pdma_unmap_sg_attrs(struct device *dev, - struct scatterlist *sg, int nents, enum dma_data_direction dir, - unsigned long attrs) -{ -} static inline int pci_p2pdma_enable_store(const char *page, struct pci_dev **p2p_dev, bool *use_p2pdma) { @@ -119,16 +104,4 @@ static inline struct pci_dev *pci_p2pmem_find(struct device *client) return pci_p2pmem_find_many(&client, 1); } -static inline int pci_p2pdma_map_sg(struct device *dev, struct scatterlist *sg, - int nents, enum dma_data_direction dir) -{ - return pci_p2pdma_map_sg_attrs(dev, sg, nents, dir, 0); -} - -static inline void pci_p2pdma_unmap_sg(struct device *dev, - struct scatterlist *sg, int nents, enum dma_data_direction dir) -{ - pci_p2pdma_unmap_sg_attrs(dev, sg, nents, dir, 0); -} - #endif /* _LINUX_PCI_P2P_H */ -- cgit From 019e442bb0d5e7f25ada8eb4252e22e62d17a95e Mon Sep 17 00:00:00 2001 From: Axe Yang Date: Tue, 26 Jul 2022 14:28:41 +0800 Subject: mmc: core: Add support for SDIO wakeup interrupt If wakeup-source flag is set in host dts node, parse EAI information from SDIO CCCR interrupt externsion segment for in-band wakeup. If async interrupt is supported by SDIO card then enable it and set enable_async_irq flag in sdio_cccr structure to 1. The parse flow is implemented in sdio_read_cccr(). Reviewed-by: AngeloGioacchino Del Regno Signed-off-by: Axe Yang Link: https://lore.kernel.org/r/20220726062842.18846-3-axe.yang@mediatek.com Signed-off-by: Ulf Hansson --- include/linux/mmc/card.h | 8 +++++++- include/linux/mmc/sdio.h | 5 +++++ 2 files changed, 12 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h index 156a7b673a28..8a30de08e913 100644 --- a/include/linux/mmc/card.h +++ b/include/linux/mmc/card.h @@ -219,7 +219,8 @@ struct sdio_cccr { wide_bus:1, high_power:1, high_speed:1, - disable_cd:1; + disable_cd:1, + enable_async_irq:1; }; struct sdio_cis { @@ -343,6 +344,11 @@ static inline bool mmc_large_sector(struct mmc_card *card) return card->ext_csd.data_sector_size == 4096; } +static inline int mmc_card_enable_async_irq(struct mmc_card *card) +{ + return card->cccr.enable_async_irq; +} + bool mmc_card_is_blockaddr(struct mmc_card *card); #define mmc_card_mmc(c) ((c)->type == MMC_TYPE_MMC) diff --git a/include/linux/mmc/sdio.h b/include/linux/mmc/sdio.h index 2a05d1ac4f0e..1ef400f28642 100644 --- a/include/linux/mmc/sdio.h +++ b/include/linux/mmc/sdio.h @@ -159,6 +159,11 @@ #define SDIO_DTSx_SET_TYPE_A (1 << SDIO_DRIVE_DTSx_SHIFT) #define SDIO_DTSx_SET_TYPE_C (2 << SDIO_DRIVE_DTSx_SHIFT) #define SDIO_DTSx_SET_TYPE_D (3 << SDIO_DRIVE_DTSx_SHIFT) + +#define SDIO_CCCR_INTERRUPT_EXT 0x16 +#define SDIO_INTERRUPT_EXT_SAI (1 << 0) +#define SDIO_INTERRUPT_EXT_EAI (1 << 1) + /* * Function Basic Registers (FBR) */ -- cgit From 46126db9c86110e5fc1e369b9bb89735ddefdae4 Mon Sep 17 00:00:00 2001 From: Wojciech Drewek Date: Mon, 18 Jul 2022 14:18:10 +0200 Subject: flow_dissector: Add PPPoE dissectors Allow to dissect PPPoE specific fields which are: - session ID (16 bits) - ppp protocol (16 bits) - type (16 bits) - this is PPPoE ethertype, for now only ETH_P_PPP_SES is supported, possible ETH_P_PPP_DISC in the future The goal is to make the following TC command possible: # tc filter add dev ens6f0 ingress prio 1 protocol ppp_ses \ flower \ pppoe_sid 12 \ ppp_proto ip \ action drop Note that only PPPoE Session is supported. Signed-off-by: Wojciech Drewek Acked-by: Guillaume Nault Signed-off-by: Tony Nguyen --- include/linux/ppp_defs.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ppp_defs.h b/include/linux/ppp_defs.h index 9d2b388fae1a..b7e57fdbd413 100644 --- a/include/linux/ppp_defs.h +++ b/include/linux/ppp_defs.h @@ -11,4 +11,18 @@ #include #define PPP_FCS(fcs, c) crc_ccitt_byte(fcs, c) + +/** + * ppp_proto_is_valid - checks if PPP protocol is valid + * @proto: PPP protocol + * + * Assumes proto is not compressed. + * Protocol is valid if the value is odd and the least significant bit of the + * most significant octet is 0 (see RFC 1661, section 2). + */ +static inline bool ppp_proto_is_valid(u16 proto) +{ + return !!((proto & 0x0101) == 0x0001); +} + #endif /* _PPP_DEFS_H_ */ -- cgit From 04ad63f086d1a9649b8b082748cbc7a570ade461 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Tue, 11 Jan 2022 08:06:40 -0800 Subject: cxl/region: Introduce cxl_pmem_region objects The LIBNVDIMM subsystem is a platform agnostic representation of system NVDIMM / persistent memory resources. To date, the CXL subsystem's interaction with LIBNVDIMM has been to register an nvdimm-bridge device and cxl_nvdimm objects to proxy CXL capabilities into existing LIBNVDIMM subsystem mechanics. With regions the approach is the same. Create a new cxl_pmem_region object to proxy CXL region details into a LIBNVDIMM definition. With this enabling LIBNVDIMM can partition CXL persistent memory regions with legacy namespace labels. A follow-on patch will add CXL region label and CXL namespace label support to persist region configurations across driver reload / system-reset events. Co-developed-by: Ben Widawsky Signed-off-by: Ben Widawsky Reviewed-by: Jonathan Cameron Link: https://lore.kernel.org/r/165784340111.1758207.3036498385188290968.stgit@dwillia2-xfh.jf.intel.com Signed-off-by: Dan Williams --- include/linux/libnvdimm.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h index 0d61e07b6827..c74acfa1a3fe 100644 --- a/include/linux/libnvdimm.h +++ b/include/linux/libnvdimm.h @@ -59,6 +59,9 @@ enum { /* Platform provides asynchronous flush mechanism */ ND_REGION_ASYNC = 3, + /* Region was created by CXL subsystem */ + ND_REGION_CXL = 4, + /* mark newly adjusted resources as requiring a label update */ DPA_RESOURCE_ADJUSTED = 1 << 0, }; @@ -122,6 +125,7 @@ struct nd_region_desc { int numa_node; int target_node; unsigned long flags; + int memregion; struct device_node *of_node; int (*flush)(struct nd_region *nd_region, struct bio *bio); }; @@ -259,6 +263,7 @@ static inline struct nvdimm *nvdimm_create(struct nvdimm_bus *nvdimm_bus, cmd_mask, num_flush, flush_wpq, NULL, NULL, NULL); } void nvdimm_delete(struct nvdimm *nvdimm); +void nvdimm_region_delete(struct nd_region *nd_region); const struct nd_cmd_desc *nd_cmd_dimm_desc(int cmd); const struct nd_cmd_desc *nd_cmd_bus_desc(int cmd); -- cgit From a11821495fd4d9b5c97945db929e02c473b7a5d9 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Sun, 24 Jul 2022 20:46:28 +0200 Subject: i2c: extend documentation about retvals of master_xfer functions It was stated how the error codes should be. It was not stated what the regular case should return. Add this. Signed-off-by: Wolfram Sang Signed-off-by: Wolfram Sang --- include/linux/i2c.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/i2c.h b/include/linux/i2c.h index fbda5ada2afc..8eab5017bff3 100644 --- a/include/linux/i2c.h +++ b/include/linux/i2c.h @@ -537,7 +537,8 @@ i2c_register_board_info(int busnum, struct i2c_board_info const *info, * * The return codes from the ``master_xfer{_atomic}`` fields should indicate the * type of error code that occurred during the transfer, as documented in the - * Kernel Documentation file Documentation/i2c/fault-codes.rst. + * Kernel Documentation file Documentation/i2c/fault-codes.rst. Otherwise, the + * number of messages executed should be returned. */ struct i2c_algorithm { /* -- cgit From 5727dcfd8486399c40e39d2c08fe36fedab29d99 Mon Sep 17 00:00:00 2001 From: Daniel Vetter Date: Mon, 25 Jul 2022 09:54:00 +0200 Subject: fbdev: Make registered_fb[] private to fbmem.c No driver access this anymore, except for the olpc dcon fbdev driver but that has been marked as broken anyways by commit de0952f267ff ("staging: olpc_dcon: mark driver as broken"). Signed-off-by: Daniel Vetter Signed-off-by: Daniel Vetter Reviewed-by: Javier Martinez Canillas Signed-off-by: Javier Martinez Canillas Signed-off-by: Thomas Zimmermann Acked-by: Thomas Zimmermann Link: https://patchwork.freedesktop.org/patch/msgid/20220725075400.68478-1-javierm@redhat.com --- include/linux/fb.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fb.h b/include/linux/fb.h index 453c3b2b6b8e..0aff76bcbb00 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -627,16 +627,10 @@ extern int fb_get_color_depth(struct fb_var_screeninfo *var, extern int fb_get_options(const char *name, char **option); extern int fb_new_modelist(struct fb_info *info); -extern struct fb_info *registered_fb[FB_MAX]; -extern int num_registered_fb; extern bool fb_center_logo; extern int fb_logo_count; extern struct class *fb_class; -#define for_each_registered_fb(i) \ - for (i = 0; i < FB_MAX; i++) \ - if (!registered_fb[i]) {} else - static inline void lock_fb_info(struct fb_info *info) { mutex_lock(&info->lock); -- cgit From 04b9407197789c81fffac52921e703cb47967d6a Mon Sep 17 00:00:00 2001 From: Daniil Lunev Date: Wed, 27 Jul 2022 16:44:24 +1000 Subject: vfs: function to prevent re-use of block-device-based superblocks The function is to be called from filesystem-specific code to mark a superblock to be ignored by superblock test and thus never re-used. The function also unregisters bdi if the bdi is per-superblock to avoid collision if a new superblock is created to represent the filesystem. generic_shutdown_super() skips unregistering bdi for a retired superlock as it assumes retire function has already done it. This patch adds the functionality only for the block-device-based supers, since the primary use case of the feature is to gracefully handle force unmount of external devices, mounted with FUSE. This can be further extended to cover all superblocks, if the need arises. Signed-off-by: Daniil Lunev Reviewed-by: Christoph Hellwig Signed-off-by: Miklos Szeredi --- include/linux/fs.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 9ad5e3520fae..246120ee0573 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1412,6 +1412,7 @@ extern int send_sigurg(struct fown_struct *fown); #define SB_I_SKIP_SYNC 0x00000100 /* Skip superblock at global sync */ #define SB_I_PERSB_BDI 0x00000200 /* has a per-sb bdi */ #define SB_I_TS_EXPIRY_WARNED 0x00000400 /* warned about timestamp range expiry */ +#define SB_I_RETIRED 0x00000800 /* superblock shouldn't be reused */ /* Possible states of 'frozen' field */ enum { @@ -2432,6 +2433,7 @@ extern struct dentry *mount_nodev(struct file_system_type *fs_type, int flags, void *data, int (*fill_super)(struct super_block *, void *, int)); extern struct dentry *mount_subtree(struct vfsmount *mnt, const char *path); +void retire_super(struct super_block *sb); void generic_shutdown_super(struct super_block *sb); void kill_block_super(struct super_block *sb); void kill_anon_super(struct super_block *sb); -- cgit From 7c56a8733d0a2a4be2438a7512566e5ce552fccf Mon Sep 17 00:00:00 2001 From: Laurent Dufour Date: Wed, 13 Jul 2022 17:47:27 +0200 Subject: watchdog: export lockup_detector_reconfigure In some circumstances it may be interesting to reconfigure the watchdog from inside the kernel. On PowerPC, this may helpful before and after a LPAR migration (LPM) is initiated, because it implies some latencies, watchdog, and especially NMI watchdog is expected to be triggered during this operation. Reconfiguring the watchdog with a factor, would prevent it to happen too frequently during LPM. Rename lockup_detector_reconfigure() as __lockup_detector_reconfigure() and create a new function lockup_detector_reconfigure() calling __lockup_detector_reconfigure() under the protection of watchdog_mutex. Signed-off-by: Laurent Dufour [mpe: Squash in build fix from Laurent, reported by Sachin] Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20220713154729.80789-3-ldufour@linux.ibm.com --- include/linux/nmi.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/nmi.h b/include/linux/nmi.h index 750c7f395ca9..f700ff2df074 100644 --- a/include/linux/nmi.h +++ b/include/linux/nmi.h @@ -122,6 +122,8 @@ int watchdog_nmi_probe(void); int watchdog_nmi_enable(unsigned int cpu); void watchdog_nmi_disable(unsigned int cpu); +void lockup_detector_reconfigure(void); + /** * touch_nmi_watchdog - restart NMI watchdog timeout. * -- cgit From 9f8267b9b29937d997c3b4cec426252aae6311b5 Mon Sep 17 00:00:00 2001 From: Borislav Petkov Date: Wed, 27 Jul 2022 13:49:48 +0200 Subject: misc: Mark MICROCODE_MINOR unused This one is unused since 181b6f40e9ea ("x86/microcode: Rip out the OLD_INTERFACE") so comment it out. Link: https://lore.kernel.org/r/20220525161232.14924-1-bp@alien8.de Cc: Arnd Bergmann Cc: Greg Kroah-Hartman Signed-off-by: Borislav Petkov Link: https://lore.kernel.org/r/20220727114948.30123-1-bp@alien8.de Signed-off-by: Greg Kroah-Hartman --- include/linux/miscdevice.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h index 0676f18093f9..c0fea6ca5076 100644 --- a/include/linux/miscdevice.h +++ b/include/linux/miscdevice.h @@ -44,7 +44,7 @@ #define AGPGART_MINOR 175 #define TOSH_MINOR_DEV 181 #define HWRNG_MINOR 183 -#define MICROCODE_MINOR 184 +/*#define MICROCODE_MINOR 184 unused */ #define KEYPAD_MINOR 185 #define IRNET_MINOR 187 #define D7S_MINOR 193 -- cgit From 26c6c2f8a907c9e3a2f24990552a4d77235791e6 Mon Sep 17 00:00:00 2001 From: Weitao Wang Date: Tue, 26 Jul 2022 15:49:18 +0800 Subject: USB: HCD: Fix URB giveback issue in tasklet function Usb core introduce the mechanism of giveback of URB in tasklet context to reduce hardware interrupt handling time. On some test situation(such as FIO with 4KB block size), when tasklet callback function called to giveback URB, interrupt handler add URB node to the bh->head list also. If check bh->head list again after finish all URB giveback of local_list, then it may introduce a "dynamic balance" between giveback URB and add URB to bh->head list. This tasklet callback function may not exit for a long time, which will cause other tasklet function calls to be delayed. Some real-time applications(such as KB and Mouse) will see noticeable lag. In order to prevent the tasklet function from occupying the cpu for a long time at a time, new URBS will not be added to the local_list even though the bh->head list is not empty. But also need to ensure the left URB giveback to be processed in time, so add a member high_prio for structure giveback_urb_bh to prioritize tasklet and schelule this tasklet again if bh->head list is not empty. At the same time, we are able to prioritize tasklet through structure member high_prio. So, replace the local high_prio_bh variable with this structure member in usb_hcd_giveback_urb. Fixes: 94dfd7edfd5c ("USB: HCD: support giveback of URB in tasklet context") Cc: stable Reviewed-by: Alan Stern Signed-off-by: Weitao Wang Link: https://lore.kernel.org/r/20220726074918.5114-1-WeitaoWang-oc@zhaoxin.com Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/hcd.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h index 2c1fc9212cf2..98d1921f02b1 100644 --- a/include/linux/usb/hcd.h +++ b/include/linux/usb/hcd.h @@ -66,6 +66,7 @@ struct giveback_urb_bh { bool running; + bool high_prio; spinlock_t lock; struct list_head head; struct tasklet_struct bh; -- cgit From 6eabfc018e8d1033e7fc1efce30a872e2dccb537 Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Tue, 26 Jul 2022 10:38:21 -0700 Subject: regulator: core: Allow specifying an initial load w/ the bulk API There are a number of drivers that follow a pattern that looks like this: 1. Use the regulator bulk API to get a bunch of regulators. 2. Set the load on each of the regulators to use whenever the regulators are enabled. Let's make this easier by just allowing the drivers to pass the load in. As part of this change we need to move the error printing in regulator_bulk_get() around; let's switch to the new dev_err_probe() to simplify it. Signed-off-by: Douglas Anderson Link: https://lore.kernel.org/r/20220726103631.v2.4.Ie85f68215ada39f502a96dcb8a1f3ad977e3f68a@changeid Signed-off-by: Mark Brown --- include/linux/regulator/consumer.h | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/regulator/consumer.h b/include/linux/regulator/consumer.h index bbf6590a6dec..5779f4466e62 100644 --- a/include/linux/regulator/consumer.h +++ b/include/linux/regulator/consumer.h @@ -171,10 +171,13 @@ struct regulator; /** * struct regulator_bulk_data - Data used for bulk regulator operations. * - * @supply: The name of the supply. Initialised by the user before - * using the bulk regulator APIs. - * @consumer: The regulator consumer for the supply. This will be managed - * by the bulk API. + * @supply: The name of the supply. Initialised by the user before + * using the bulk regulator APIs. + * @init_load_uA: After getting the regulator, regulator_set_load() will be + * called with this load. Initialised by the user before + * using the bulk regulator APIs. + * @consumer: The regulator consumer for the supply. This will be managed + * by the bulk API. * * The regulator APIs provide a series of regulator_bulk_() API calls as * a convenience to consumers which require multiple supplies. This @@ -182,6 +185,7 @@ struct regulator; */ struct regulator_bulk_data { const char *supply; + int init_load_uA; struct regulator *consumer; /* private: Internal use */ -- cgit From 1de452a0edda26f1483d1d934f692eab13ba669a Mon Sep 17 00:00:00 2001 From: Douglas Anderson Date: Tue, 26 Jul 2022 10:38:23 -0700 Subject: regulator: core: Allow drivers to define their init data as const Drivers tend to want to define the names of their regulators somewhere in their source file as "static const". This means, inevitable, that every driver out there open codes something like this: static const char * const supply_names[] = { "vcc", "vccl", }; static int get_regulators(struct my_data *data) { int i; data->supplies = devm_kzalloc(...) if (!data->supplies) return -ENOMEM; for (i = 0; i < ARRAY_SIZE(supply_names); i++) data->supplies[i].supply = supply_names[i]; return devm_regulator_bulk_get(data->dev, ARRAY_SIZE(supply_names), data->supplies); } Let's make this more convenient by doing providing a helper that does the copy. I have chosen to have the "const" input structure here be the exact same structure as the normal one passed to devm_regulator_bulk_get(). This is slightly inefficent since the input data can't possibly have anything useful for "ret" or consumer and thus we waste 8 bytes per structure. This seems an OK tradeoff for not introducing an extra structure. Signed-off-by: Douglas Anderson Link: https://lore.kernel.org/r/20220726103631.v2.6.I38fc508a73135a5c1b873851f3553ff2a3a625f5@changeid Signed-off-by: Mark Brown --- include/linux/regulator/consumer.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/regulator/consumer.h b/include/linux/regulator/consumer.h index 5779f4466e62..bc6cda706d1f 100644 --- a/include/linux/regulator/consumer.h +++ b/include/linux/regulator/consumer.h @@ -244,6 +244,10 @@ int __must_check regulator_bulk_get(struct device *dev, int num_consumers, struct regulator_bulk_data *consumers); int __must_check devm_regulator_bulk_get(struct device *dev, int num_consumers, struct regulator_bulk_data *consumers); +int __must_check devm_regulator_bulk_get_const( + struct device *dev, int num_consumers, + const struct regulator_bulk_data *in_consumers, + struct regulator_bulk_data **out_consumers); int __must_check regulator_bulk_enable(int num_consumers, struct regulator_bulk_data *consumers); int regulator_bulk_disable(int num_consumers, -- cgit From 14b146b688ad9593f5eee93d51a34d09a47e50b5 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Wed, 27 Jul 2022 10:30:41 +0100 Subject: io_uring: notification completion optimisation We want to use all optimisations that we have for io_uring requests like completion batching, memory caching and more but for zc notifications. Fortunately, notification perfectly fit the request model so we can overlay them onto struct io_kiocb and use all the infratructure. Most of the fields of struct io_notif natively fits into io_kiocb, so we replace struct io_notif with struct io_kiocb carrying struct io_notif_data in the cmd cache line. Then we adapt io_alloc_notif() to use io_alloc_req()/io_alloc_req_refill(), and kill leftovers of hand coded caching. __io_notif_complete_tw() is converted to use io_uring's tw infra. Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/9e010125175e80baf51f0ca63bdc7cc6a4a9fa56.1658913593.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 144493cbadb5..f7fab3758cb9 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -249,9 +249,6 @@ struct io_ring_ctx { struct xarray io_bl_xa; struct list_head io_buffers_cache; - /* struct io_notif cache, protected by uring_lock */ - struct list_head notif_list; - struct io_hash_table cancel_table_locked; struct list_head cq_overflow_list; struct io_alloc_cache apoll_cache; @@ -262,10 +259,6 @@ struct io_ring_ctx { struct io_wq_work_list locked_free_list; unsigned int locked_free_nr; - /* struct io_notif cache protected by completion_lock */ - struct list_head notif_list_locked; - unsigned int notif_locked_nr; - const struct cred *sq_creds; /* cred used for __io_sq_thread() */ struct io_sq_data *sq_data; /* if using sq thread polling */ -- cgit From 0113780870b1597ae49f30abfa4957c239f913d3 Mon Sep 17 00:00:00 2001 From: Aharon Landau Date: Tue, 26 Jul 2022 10:19:11 +0300 Subject: RDMA/mlx5: Rename the mkey cache variables and functions After replacing the MR cache with an Mkey cache, rename the variables and functions to fit the new meaning. Link: https://lore.kernel.org/r/20220726071911.122765-6-michaelgur@nvidia.com Signed-off-by: Aharon Landau Signed-off-by: Leon Romanovsky Signed-off-by: Jason Gunthorpe --- include/linux/mlx5/driver.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 76d7661e3e63..220597c2f436 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -728,10 +728,10 @@ enum { }; enum { - MR_CACHE_LAST_STD_ENTRY = 20, + MKEY_CACHE_LAST_STD_ENTRY = 20, MLX5_IMR_MTT_CACHE_ENTRY, MLX5_IMR_KSM_CACHE_ENTRY, - MAX_MR_CACHE_ENTRIES + MAX_MKEY_CACHE_ENTRIES }; struct mlx5_profile { @@ -740,7 +740,7 @@ struct mlx5_profile { struct { int size; int limit; - } mr_cache[MAX_MR_CACHE_ENTRIES]; + } mr_cache[MAX_MKEY_CACHE_ENTRIES]; }; struct mlx5_hca_cap { -- cgit From 103e10c69c611efabccf57d799c4b191d53ee765 Mon Sep 17 00:00:00 2001 From: Sakari Ailus Date: Fri, 22 Jul 2022 09:57:46 +0300 Subject: ACPI: property: Add support for parsing buffer property UUID Add support for newly added buffer property UUID, as defined in the DSD guide section 3.3 [1] Link: https://github.com/UEFI/DSD-Guide/blob/main/src/dsd-guide.adoc#buffer-data-extension-uuid # [1] Signed-off-by: Sakari Ailus Signed-off-by: Rafael J. Wysocki --- include/linux/acpi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 44975c1bbe12..7c46f152106b 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -1243,7 +1243,7 @@ static inline bool acpi_dev_has_props(const struct acpi_device *adev) struct acpi_device_properties * acpi_data_add_props(struct acpi_device_data *data, const guid_t *guid, - const union acpi_object *properties); + union acpi_object *properties); int acpi_node_prop_get(const struct fwnode_handle *fwnode, const char *propname, void **valptr); -- cgit From 72691a269f0baad6d5f4aa7af97c29081b86d70f Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Wed, 27 Jul 2022 13:02:27 -0400 Subject: SUNRPC: Don't reuse bvec on retransmission of the request If a request is re-encoded and then retransmitted, we need to make sure that we also re-encode the bvec, in case the page lists have changed. Fixes: ff053dbbaffe ("SUNRPC: Move the call to xprt_send_pagedata() out of xprt_sock_sendmsg()") Signed-off-by: Trond Myklebust --- include/linux/sunrpc/xprt.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index 0d51b9f9ea37..b9f59aabee53 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -144,7 +144,8 @@ struct rpc_xprt_ops { unsigned short (*get_srcport)(struct rpc_xprt *xprt); int (*buf_alloc)(struct rpc_task *task); void (*buf_free)(struct rpc_task *task); - int (*prepare_request)(struct rpc_rqst *req); + int (*prepare_request)(struct rpc_rqst *req, + struct xdr_buf *buf); int (*send_request)(struct rpc_rqst *req); void (*wait_for_reply_request)(struct rpc_task *task); void (*timer)(struct rpc_xprt *xprt, struct rpc_task *task); -- cgit From c452d49849d48bd37ae97fc2bc92c6435707c35f Mon Sep 17 00:00:00 2001 From: Tudor Ambarus Date: Mon, 25 Jul 2022 12:24:59 +0300 Subject: mtd: spi-nor: s/addr_width/addr_nbytes Address width was an unfortunate name, as it means the number of IO lines used for the address, whereas in the code it is used as the number of address bytes. s/addr_width/addr_nbytes throughout the entire SPI NOR framework. Signed-off-by: Tudor Ambarus Reviewed-by: Michael Walle Acked-by: Pratyush Yadav Link: https://lore.kernel.org/r/20220725092505.446315-2-tudor.ambarus@microchip.com --- include/linux/mtd/spi-nor.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mtd/spi-nor.h b/include/linux/mtd/spi-nor.h index 1ede4c89805a..42218a1164f6 100644 --- a/include/linux/mtd/spi-nor.h +++ b/include/linux/mtd/spi-nor.h @@ -351,7 +351,7 @@ struct spi_nor_flash_parameter; * @bouncebuf_size: size of the bounce buffer * @info: SPI NOR part JEDEC MFR ID and other info * @manufacturer: SPI NOR manufacturer - * @addr_width: number of address bytes + * @addr_nbytes: number of address bytes * @erase_opcode: the opcode for erasing a sector * @read_opcode: the read opcode * @read_dummy: the dummy needed by the read operation @@ -381,7 +381,7 @@ struct spi_nor { size_t bouncebuf_size; const struct flash_info *info; const struct spi_nor_manufacturer *manufacturer; - u8 addr_width; + u8 addr_nbytes; u8 erase_opcode; u8 read_opcode; u8 read_dummy; -- cgit From e60a7233684aa8bbe9090537720fe6a1e901d823 Mon Sep 17 00:00:00 2001 From: Jiri Slaby Date: Thu, 28 Jul 2022 08:10:51 +0200 Subject: Documentation: serial: move uart_ops documentation to the struct While it's a lot of text, it always helps to keep it up to date when it's by the source. (And not in a separate file.) The documentation tooling also makes sure that all members of the structure are documented. (If not, it complains loudly.) Finally, there needs to be no comments inlined in the structure, so they are dropped as they are superfluous now. The compilation time of this header (tested with serial_core.c) didn't change in my testing at all. Signed-off-by: Jiri Slaby Link: https://lore.kernel.org/r/20220728061056.20799-1-jslaby@suse.cz Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 345 ++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 329 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index a6fa7c40c330..35d2f9e8027d 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -31,9 +31,336 @@ struct serial_struct; struct device; struct gpio_desc; -/* +/** + * struct uart_ops -- interface between serial_core and the driver + * * This structure describes all the operations that can be done on the - * physical hardware. See Documentation/driver-api/serial/driver.rst for details. + * physical hardware. + * + * @tx_empty: ``unsigned int ()(struct uart_port *port)`` + * + * This function tests whether the transmitter fifo and shifter for the + * @port is empty. If it is empty, this function should return + * %TIOCSER_TEMT, otherwise return 0. If the port does not support this + * operation, then it should return %TIOCSER_TEMT. + * + * Locking: none. + * Interrupts: caller dependent. + * This call must not sleep + * + * @set_mctrl: ``void ()(struct uart_port *port, unsigned int mctrl)`` + * + * This function sets the modem control lines for @port to the state + * described by @mctrl. The relevant bits of @mctrl are: + * + * - %TIOCM_RTS RTS signal. + * - %TIOCM_DTR DTR signal. + * - %TIOCM_OUT1 OUT1 signal. + * - %TIOCM_OUT2 OUT2 signal. + * - %TIOCM_LOOP Set the port into loopback mode. + * + * If the appropriate bit is set, the signal should be driven + * active. If the bit is clear, the signal should be driven + * inactive. + * + * Locking: @port->lock taken. + * Interrupts: locally disabled. + * This call must not sleep + * + * @get_mctrl: ``unsigned int ()(struct uart_port *port)`` + * + * Returns the current state of modem control inputs of @port. The state + * of the outputs should not be returned, since the core keeps track of + * their state. The state information should include: + * + * - %TIOCM_CAR state of DCD signal + * - %TIOCM_CTS state of CTS signal + * - %TIOCM_DSR state of DSR signal + * - %TIOCM_RI state of RI signal + * + * The bit is set if the signal is currently driven active. If + * the port does not support CTS, DCD or DSR, the driver should + * indicate that the signal is permanently active. If RI is + * not available, the signal should not be indicated as active. + * + * Locking: @port->lock taken. + * Interrupts: locally disabled. + * This call must not sleep + * + * @stop_tx: ``void ()(struct uart_port *port)`` + * + * Stop transmitting characters. This might be due to the CTS line + * becoming inactive or the tty layer indicating we want to stop + * transmission due to an %XOFF character. + * + * The driver should stop transmitting characters as soon as possible. + * + * Locking: @port->lock taken. + * Interrupts: locally disabled. + * This call must not sleep + * + * @start_tx: ``void ()(struct uart_port *port)`` + * + * Start transmitting characters. + * + * Locking: @port->lock taken. + * Interrupts: locally disabled. + * This call must not sleep + * + * @throttle: ``void ()(struct uart_port *port)`` + * + * Notify the serial driver that input buffers for the line discipline are + * close to full, and it should somehow signal that no more characters + * should be sent to the serial port. + * This will be called only if hardware assisted flow control is enabled. + * + * Locking: serialized with @unthrottle() and termios modification by the + * tty layer. + * + * @unthrottle: ``void ()(struct uart_port *port)`` + * + * Notify the serial driver that characters can now be sent to the serial + * port without fear of overrunning the input buffers of the line + * disciplines. + * + * This will be called only if hardware assisted flow control is enabled. + * + * Locking: serialized with @throttle() and termios modification by the + * tty layer. + * + * @send_xchar: ``void ()(struct uart_port *port, char ch)`` + * + * Transmit a high priority character, even if the port is stopped. This + * is used to implement XON/XOFF flow control and tcflow(). If the serial + * driver does not implement this function, the tty core will append the + * character to the circular buffer and then call start_tx() / stop_tx() + * to flush the data out. + * + * Do not transmit if @ch == '\0' (%__DISABLED_CHAR). + * + * Locking: none. + * Interrupts: caller dependent. + * + * @stop_rx: ``void ()(struct uart_port *port)`` + * + * Stop receiving characters; the @port is in the process of being closed. + * + * Locking: @port->lock taken. + * Interrupts: locally disabled. + * This call must not sleep + * + * @enable_ms: ``void ()(struct uart_port *port)`` + * + * Enable the modem status interrupts. + * + * This method may be called multiple times. Modem status interrupts + * should be disabled when the @shutdown() method is called. + * + * Locking: @port->lock taken. + * Interrupts: locally disabled. + * This call must not sleep + * + * @break_ctl: ``void ()(struct uart_port *port, int ctl)`` + * + * Control the transmission of a break signal. If @ctl is nonzero, the + * break signal should be transmitted. The signal should be terminated + * when another call is made with a zero @ctl. + * + * Locking: caller holds tty_port->mutex + * + * @startup: ``int ()(struct uart_port *port)`` + * + * Grab any interrupt resources and initialise any low level driver state. + * Enable the port for reception. It should not activate RTS nor DTR; + * this will be done via a separate call to @set_mctrl(). + * + * This method will only be called when the port is initially opened. + * + * Locking: port_sem taken. + * Interrupts: globally disabled. + * + * @shutdown: ``void ()(struct uart_port *port)`` + * + * Disable the @port, disable any break condition that may be in effect, + * and free any interrupt resources. It should not disable RTS nor DTR; + * this will have already been done via a separate call to @set_mctrl(). + * + * Drivers must not access @port->state once this call has completed. + * + * This method will only be called when there are no more users of this + * @port. + * + * Locking: port_sem taken. + * Interrupts: caller dependent. + * + * @flush_buffer: ``void ()(struct uart_port *port)`` + * + * Flush any write buffers, reset any DMA state and stop any ongoing DMA + * transfers. + * + * This will be called whenever the @port->state->xmit circular buffer is + * cleared. + * + * Locking: @port->lock taken. + * Interrupts: locally disabled. + * This call must not sleep + * + * @set_termios: ``void ()(struct uart_port *port, struct ktermios *new, + * struct ktermios *old)`` + * + * Change the @port parameters, including word length, parity, stop bits. + * Update @port->read_status_mask and @port->ignore_status_mask to + * indicate the types of events we are interested in receiving. Relevant + * ktermios::c_cflag bits are: + * + * - %CSIZE - word size + * - %CSTOPB - 2 stop bits + * - %PARENB - parity enable + * - %PARODD - odd parity (when %PARENB is in force) + * - %ADDRB - address bit (changed through uart_port::rs485_config()). + * - %CREAD - enable reception of characters (if not set, still receive + * characters from the port, but throw them away). + * - %CRTSCTS - if set, enable CTS status change reporting. + * - %CLOCAL - if not set, enable modem status change reporting. + * + * Relevant ktermios::c_iflag bits are: + * + * - %INPCK - enable frame and parity error events to be passed to the TTY + * layer. + * - %BRKINT / %PARMRK - both of these enable break events to be passed to + * the TTY layer. + * - %IGNPAR - ignore parity and framing errors. + * - %IGNBRK - ignore break errors. If %IGNPAR is also set, ignore overrun + * errors as well. + * + * The interaction of the ktermios::c_iflag bits is as follows (parity + * error given as an example): + * + * ============ ======= ======= ========================================= + * Parity error INPCK IGNPAR + * ============ ======= ======= ========================================= + * n/a 0 n/a character received, marked as %TTY_NORMAL + * None 1 n/a character received, marked as %TTY_NORMAL + * Yes 1 0 character received, marked as %TTY_PARITY + * Yes 1 1 character discarded + * ============ ======= ======= ========================================= + * + * Other flags may be used (eg, xon/xoff characters) if your hardware + * supports hardware "soft" flow control. + * + * Locking: caller holds tty_port->mutex + * Interrupts: caller dependent. + * This call must not sleep + * + * @set_ldisc: ``void ()(struct uart_port *port, struct ktermios *termios)`` + * + * Notifier for discipline change. See + * Documentation/driver-api/tty/tty_ldisc.rst. + * + * Locking: caller holds tty_port->mutex + * + * @pm: ``void ()(struct uart_port *port, unsigned int state, + * unsigned int oldstate)`` + * + * Perform any power management related activities on the specified @port. + * @state indicates the new state (defined by enum uart_pm_state), + * @oldstate indicates the previous state. + * + * This function should not be used to grab any resources. + * + * This will be called when the @port is initially opened and finally + * closed, except when the @port is also the system console. This will + * occur even if %CONFIG_PM is not set. + * + * Locking: none. + * Interrupts: caller dependent. + * + * @type: ``const char *()(struct uart_port *port)`` + * + * Return a pointer to a string constant describing the specified @port, + * or return %NULL, in which case the string 'unknown' is substituted. + * + * Locking: none. + * Interrupts: caller dependent. + * + * @release_port: ``void ()(struct uart_port *port)`` + * + * Release any memory and IO region resources currently in use by the + * @port. + * + * Locking: none. + * Interrupts: caller dependent. + * + * @request_port: ``int ()(struct uart_port *port)`` + * + * Request any memory and IO region resources required by the port. If any + * fail, no resources should be registered when this function returns, and + * it should return -%EBUSY on failure. + * + * Locking: none. + * Interrupts: caller dependent. + * + * @config_port: ``void ()(struct uart_port *port, int type)`` + * + * Perform any autoconfiguration steps required for the @port. @type + * contains a bit mask of the required configuration. %UART_CONFIG_TYPE + * indicates that the port requires detection and identification. + * @port->type should be set to the type found, or %PORT_UNKNOWN if no + * port was detected. + * + * %UART_CONFIG_IRQ indicates autoconfiguration of the interrupt signal, + * which should be probed using standard kernel autoprobing techniques. + * This is not necessary on platforms where ports have interrupts + * internally hard wired (eg, system on a chip implementations). + * + * Locking: none. + * Interrupts: caller dependent. + * + * @verify_port: ``int ()(struct uart_port *port, + * struct serial_struct *serinfo)`` + * + * Verify the new serial port information contained within @serinfo is + * suitable for this port type. + * + * Locking: none. + * Interrupts: caller dependent. + * + * @ioctl: ``int ()(struct uart_port *port, unsigned int cmd, + * unsigned long arg)`` + * + * Perform any port specific IOCTLs. IOCTL commands must be defined using + * the standard numbering system found in . + * + * Locking: none. + * Interrupts: caller dependent. + * + * @poll_init: ``int ()(struct uart_port *port)`` + * + * Called by kgdb to perform the minimal hardware initialization needed to + * support @poll_put_char() and @poll_get_char(). Unlike @startup(), this + * should not request interrupts. + * + * Locking: %tty_mutex and tty_port->mutex taken. + * Interrupts: n/a. + * + * @poll_put_char: ``void ()(struct uart_port *port, unsigned char ch)`` + * + * Called by kgdb to write a single character @ch directly to the serial + * @port. It can and should block until there is space in the TX FIFO. + * + * Locking: none. + * Interrupts: caller dependent. + * This call must not sleep + * + * @poll_get_char: ``int ()(struct uart_port *port)`` + * + * Called by kgdb to read a single character directly from the serial + * port. If data is available, it should be returned; otherwise the + * function should return %NO_POLL_CHAR immediately. + * + * Locking: none. + * Interrupts: caller dependent. + * This call must not sleep */ struct uart_ops { unsigned int (*tx_empty)(struct uart_port *); @@ -56,22 +383,8 @@ struct uart_ops { void (*set_ldisc)(struct uart_port *, struct ktermios *); void (*pm)(struct uart_port *, unsigned int state, unsigned int oldstate); - - /* - * Return a string describing the type of the port - */ const char *(*type)(struct uart_port *); - - /* - * Release IO and memory resources used by the port. - * This includes iounmap if necessary. - */ void (*release_port)(struct uart_port *); - - /* - * Request IO and memory resources used by the port. - * This includes iomapping the port if necessary. - */ int (*request_port)(struct uart_port *); void (*config_port)(struct uart_port *, int); int (*verify_port)(struct uart_port *, struct serial_struct *); -- cgit From 5f10376b6bc1e2773f56977980ab08c9e4fa91a7 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Tue, 26 Jul 2022 14:56:52 -0700 Subject: add missing includes and forward declarations to networking includes under linux/ Similarly to a recent include/net/ cleanup, this patch adds missing includes to networking headers under include/linux. All these problems are currently masked by the existing users including the missing dependency before the broken header. Link: https://lore.kernel.org/all/20220723045755.2676857-1-kuba@kernel.org/ v1 Signed-off-by: Jakub Kicinski Link: https://lore.kernel.org/r/20220726215652.158167-1-kuba@kernel.org Signed-off-by: Paolo Abeni --- include/linux/atm_tcp.h | 2 ++ include/linux/dsa/tag_qca.h | 5 +++++ include/linux/hippidevice.h | 4 ++++ include/linux/if_eql.h | 1 + include/linux/if_hsr.h | 4 ++++ include/linux/if_rmnet.h | 2 ++ include/linux/if_tap.h | 11 ++++++----- include/linux/mdio/mdio-xgene.h | 4 ++++ include/linux/nl802154.h | 2 ++ include/linux/phy_fixed.h | 3 +++ include/linux/ppp-comp.h | 2 +- include/linux/ppp_channel.h | 2 ++ include/linux/ptp_kvm.h | 2 ++ include/linux/ptp_pch.h | 4 ++++ include/linux/seq_file_net.h | 1 + include/linux/sungem_phy.h | 2 ++ include/linux/usb/usbnet.h | 6 ++++++ 17 files changed, 51 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/atm_tcp.h b/include/linux/atm_tcp.h index c8ecf6f68fb5..2558439d849b 100644 --- a/include/linux/atm_tcp.h +++ b/include/linux/atm_tcp.h @@ -9,6 +9,8 @@ #include +struct atm_vcc; +struct module; struct atm_tcp_ops { int (*attach)(struct atm_vcc *vcc,int itf); diff --git a/include/linux/dsa/tag_qca.h b/include/linux/dsa/tag_qca.h index 4359fb0221cf..50be7cbd93a5 100644 --- a/include/linux/dsa/tag_qca.h +++ b/include/linux/dsa/tag_qca.h @@ -3,6 +3,11 @@ #ifndef __TAG_QCA_H #define __TAG_QCA_H +#include + +struct dsa_switch; +struct sk_buff; + #define QCA_HDR_LEN 2 #define QCA_HDR_VERSION 0x2 diff --git a/include/linux/hippidevice.h b/include/linux/hippidevice.h index 9dc01f7ab5b4..07414c241e65 100644 --- a/include/linux/hippidevice.h +++ b/include/linux/hippidevice.h @@ -23,6 +23,10 @@ #ifdef __KERNEL__ +struct neigh_parms; +struct net_device; +struct sk_buff; + struct hippi_cb { __u32 ifield; }; diff --git a/include/linux/if_eql.h b/include/linux/if_eql.h index d75601d613cc..07f9b660b741 100644 --- a/include/linux/if_eql.h +++ b/include/linux/if_eql.h @@ -21,6 +21,7 @@ #include #include +#include #include typedef struct slave { diff --git a/include/linux/if_hsr.h b/include/linux/if_hsr.h index 408539d5ea5f..0404f5bf4f30 100644 --- a/include/linux/if_hsr.h +++ b/include/linux/if_hsr.h @@ -2,6 +2,10 @@ #ifndef _LINUX_IF_HSR_H_ #define _LINUX_IF_HSR_H_ +#include + +struct net_device; + /* used to differentiate various protocols */ enum hsr_version { HSR_V0 = 0, diff --git a/include/linux/if_rmnet.h b/include/linux/if_rmnet.h index 10e7521ecb6c..839d1e48b85e 100644 --- a/include/linux/if_rmnet.h +++ b/include/linux/if_rmnet.h @@ -5,6 +5,8 @@ #ifndef _LINUX_IF_RMNET_H_ #define _LINUX_IF_RMNET_H_ +#include + struct rmnet_map_header { u8 flags; /* MAP_CMD_FLAG, MAP_PAD_LEN_MASK */ u8 mux_id; diff --git a/include/linux/if_tap.h b/include/linux/if_tap.h index 915a187cfabd..553552fa635c 100644 --- a/include/linux/if_tap.h +++ b/include/linux/if_tap.h @@ -2,14 +2,18 @@ #ifndef _LINUX_IF_TAP_H_ #define _LINUX_IF_TAP_H_ +#include +#include + +struct file; +struct socket; + #if IS_ENABLED(CONFIG_TAP) struct socket *tap_get_socket(struct file *); struct ptr_ring *tap_get_ptr_ring(struct file *file); #else #include #include -struct file; -struct socket; static inline struct socket *tap_get_socket(struct file *f) { return ERR_PTR(-EINVAL); @@ -20,9 +24,6 @@ static inline struct ptr_ring *tap_get_ptr_ring(struct file *f) } #endif /* CONFIG_TAP */ -#include -#include - /* * Maximum times a tap device can be opened. This can be used to * configure the number of receive queue, e.g. for multiqueue virtio. diff --git a/include/linux/mdio/mdio-xgene.h b/include/linux/mdio/mdio-xgene.h index 8af93ada8b64..9e588965dc83 100644 --- a/include/linux/mdio/mdio-xgene.h +++ b/include/linux/mdio/mdio-xgene.h @@ -8,6 +8,10 @@ #ifndef __MDIO_XGENE_H__ #define __MDIO_XGENE_H__ +#include +#include +#include + #define BLOCK_XG_MDIO_CSR_OFFSET 0x5000 #define BLOCK_DIAG_CSR_OFFSET 0xd000 #define XGENET_CONFIG_REG_ADDR 0x20 diff --git a/include/linux/nl802154.h b/include/linux/nl802154.h index b22782225f27..cbe5fd1dd2e7 100644 --- a/include/linux/nl802154.h +++ b/include/linux/nl802154.h @@ -8,6 +8,8 @@ #ifndef NL802154_H #define NL802154_H +#include + #define IEEE802154_NL_NAME "802.15.4 MAC" #define IEEE802154_MCAST_COORD_NAME "coordinator" #define IEEE802154_MCAST_BEACON_NAME "beacon" diff --git a/include/linux/phy_fixed.h b/include/linux/phy_fixed.h index 52bc8e487ef7..1acafd86ab13 100644 --- a/include/linux/phy_fixed.h +++ b/include/linux/phy_fixed.h @@ -2,6 +2,8 @@ #ifndef __PHY_FIXED_H #define __PHY_FIXED_H +#include + struct fixed_phy_status { int link; int speed; @@ -12,6 +14,7 @@ struct fixed_phy_status { struct device_node; struct gpio_desc; +struct net_device; #if IS_ENABLED(CONFIG_FIXED_PHY) extern int fixed_phy_change_carrier(struct net_device *dev, bool new_carrier); diff --git a/include/linux/ppp-comp.h b/include/linux/ppp-comp.h index 9d3ffc8f5ea6..fb847e47f148 100644 --- a/include/linux/ppp-comp.h +++ b/include/linux/ppp-comp.h @@ -9,7 +9,7 @@ #include - +struct compstat; struct module; /* diff --git a/include/linux/ppp_channel.h b/include/linux/ppp_channel.h index 91f9a928344e..45e6e427ceb8 100644 --- a/include/linux/ppp_channel.h +++ b/include/linux/ppp_channel.h @@ -20,6 +20,8 @@ #include #include +struct net_device_path; +struct net_device_path_ctx; struct ppp_channel; struct ppp_channel_ops { diff --git a/include/linux/ptp_kvm.h b/include/linux/ptp_kvm.h index f960a719f0d5..c2e28deef33a 100644 --- a/include/linux/ptp_kvm.h +++ b/include/linux/ptp_kvm.h @@ -8,6 +8,8 @@ #ifndef _PTP_KVM_H_ #define _PTP_KVM_H_ +#include + struct timespec64; struct clocksource; diff --git a/include/linux/ptp_pch.h b/include/linux/ptp_pch.h index 51818198c292..7ba643b62c15 100644 --- a/include/linux/ptp_pch.h +++ b/include/linux/ptp_pch.h @@ -10,6 +10,10 @@ #ifndef _PTP_PCH_H_ #define _PTP_PCH_H_ +#include + +struct pci_dev; + void pch_ch_control_write(struct pci_dev *pdev, u32 val); u32 pch_ch_event_read(struct pci_dev *pdev); void pch_ch_event_write(struct pci_dev *pdev, u32 val); diff --git a/include/linux/seq_file_net.h b/include/linux/seq_file_net.h index b97912fdbae7..79638395bc32 100644 --- a/include/linux/seq_file_net.h +++ b/include/linux/seq_file_net.h @@ -3,6 +3,7 @@ #define __SEQ_FILE_NET_H__ #include +#include struct net; extern struct net init_net; diff --git a/include/linux/sungem_phy.h b/include/linux/sungem_phy.h index 3a11fa41a131..c505f30e8b68 100644 --- a/include/linux/sungem_phy.h +++ b/include/linux/sungem_phy.h @@ -2,6 +2,8 @@ #ifndef __SUNGEM_PHY_H__ #define __SUNGEM_PHY_H__ +#include + struct mii_phy; /* Operations supported by any kind of PHY */ diff --git a/include/linux/usb/usbnet.h b/include/linux/usb/usbnet.h index 1b4d72d5e891..b42b72391a8d 100644 --- a/include/linux/usb/usbnet.h +++ b/include/linux/usb/usbnet.h @@ -23,6 +23,12 @@ #ifndef __LINUX_USB_USBNET_H #define __LINUX_USB_USBNET_H +#include +#include +#include +#include +#include + /* interface from usbnet core to each USB networking link we handle */ struct usbnet { /* housekeeping */ -- cgit From 7fb48d25b5ce3bc488dbb019bf1736248181de9a Mon Sep 17 00:00:00 2001 From: Vincent Mailhol Date: Wed, 27 Jul 2022 19:16:34 +0900 Subject: can: dev: add generic function can_ethtool_op_get_ts_info_hwts() Add function can_ethtool_op_get_ts_info_hwts(). This function will be used by CAN devices with hardware TX/RX timestamping support to implement ethtool_ops::get_ts_info. This function does not offer support to activate/deactivate hardware timestamps at device level nor support the filter options (which is currently the case for all CAN devices with hardware timestamping support). The fact that hardware timestamp can not be deactivated at hardware level does not impact the userland. As long as the user do not set SO_TIMESTAMPING using a setsockopt() or ioctl(), the kernel will not emit TX timestamps (RX timestamps will still be reproted as it is the case currently). Drivers which need more fine grained control remains free to implement their own function, but we foresee that the generic function introduced here will be sufficient for the majority. Signed-off-by: Vincent Mailhol Link: https://lore.kernel.org/all/20220727101641.198847-8-mailhol.vincent@wanadoo.fr Signed-off-by: Marc Kleine-Budde --- include/linux/can/dev.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h index e22dc03c850e..752bd45d8ebf 100644 --- a/include/linux/can/dev.h +++ b/include/linux/can/dev.h @@ -20,6 +20,7 @@ #include #include #include +#include #include /* @@ -162,6 +163,8 @@ struct can_priv *safe_candev_priv(struct net_device *dev); int open_candev(struct net_device *dev); void close_candev(struct net_device *dev); int can_change_mtu(struct net_device *dev, int new_mtu); +int can_ethtool_op_get_ts_info_hwts(struct net_device *dev, + struct ethtool_ts_info *info); int register_candev(struct net_device *dev); void unregister_candev(struct net_device *dev); -- cgit From 90f942c5a6d775bad1be33ba214755314105da4a Mon Sep 17 00:00:00 2001 From: Vincent Mailhol Date: Wed, 27 Jul 2022 19:16:35 +0900 Subject: can: dev: add generic function can_eth_ioctl_hwts() Tools based on libpcap (such as tcpdump) expect the SIOCSHWTSTAMP ioctl call to be supported. This is also specified in the kernel doc [1]. The purpose of this ioctl is to toggle the hardware timestamps. Currently, CAN devices which support hardware timestamping have those always activated. can_eth_ioctl_hwts() is a dumb function that will always succeed when requested to set tx_type to HWTSTAMP_TX_ON or rx_filter to HWTSTAMP_FILTER_ALL. [1] Kernel doc: Timestamping, section 3.1 "Hardware Timestamping Implementation: Device Drivers" Link: https://docs.kernel.org/networking/timestamping.html#hardware-timestamping-implementation-device-drivers Signed-off-by: Vincent Mailhol Link: https://lore.kernel.org/all/20220727101641.198847-9-mailhol.vincent@wanadoo.fr Signed-off-by: Marc Kleine-Budde --- include/linux/can/dev.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h index 752bd45d8ebf..c3e50e537e39 100644 --- a/include/linux/can/dev.h +++ b/include/linux/can/dev.h @@ -163,6 +163,7 @@ struct can_priv *safe_candev_priv(struct net_device *dev); int open_candev(struct net_device *dev); void close_candev(struct net_device *dev); int can_change_mtu(struct net_device *dev, int new_mtu); +int can_eth_ioctl_hwts(struct net_device *netdev, struct ifreq *ifr, int cmd); int can_ethtool_op_get_ts_info_hwts(struct net_device *dev, struct ethtool_ts_info *info); -- cgit From cceeeb6a6d02e7b9a74ddd27a3225013b34174aa Mon Sep 17 00:00:00 2001 From: Juri Lelli Date: Mon, 27 Jun 2022 11:50:51 +0200 Subject: wait: Fix __wait_event_hrtimeout for RT/DL tasks Changes to hrtimer mode (potentially made by __hrtimer_init_sleeper on PREEMPT_RT) are not visible to hrtimer_start_range_ns, thus not accounted for by hrtimer_start_expires call paths. In particular, __wait_event_hrtimeout suffers from this problem as we have, for example: fs/aio.c::read_events wait_event_interruptible_hrtimeout __wait_event_hrtimeout hrtimer_init_sleeper_on_stack <- this might "mode |= HRTIMER_MODE_HARD" on RT if task runs at RT/DL priority hrtimer_start_range_ns WARN_ON_ONCE(!(mode & HRTIMER_MODE_HARD) ^ !timer->is_hard) fires since the latter doesn't see the change of mode done by init_sleeper Fix it by making __wait_event_hrtimeout call hrtimer_sleeper_start_expires, which is aware of the special RT/DL case, instead of hrtimer_start_range_ns. Reported-by: Bruno Goncalves Signed-off-by: Juri Lelli Signed-off-by: Thomas Gleixner Reviewed-by: Daniel Bristot de Oliveira Reviewed-by: Valentin Schneider Link: https://lore.kernel.org/r/20220627095051.42470-1-juri.lelli@redhat.com --- include/linux/wait.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/wait.h b/include/linux/wait.h index 851e07da2583..58cfbf81447c 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -544,10 +544,11 @@ do { \ \ hrtimer_init_sleeper_on_stack(&__t, CLOCK_MONOTONIC, \ HRTIMER_MODE_REL); \ - if ((timeout) != KTIME_MAX) \ - hrtimer_start_range_ns(&__t.timer, timeout, \ - current->timer_slack_ns, \ - HRTIMER_MODE_REL); \ + if ((timeout) != KTIME_MAX) { \ + hrtimer_set_expires_range_ns(&__t.timer, timeout, \ + current->timer_slack_ns); \ + hrtimer_sleeper_start_expires(&__t, HRTIMER_MODE_REL); \ + } \ \ __ret = ___wait_event(wq_head, condition, state, 0, 0, \ if (!__t.task) { \ -- cgit From 4102c4042a33c682021683ec26c2dca3fd9d7cc2 Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Wed, 29 Jun 2022 17:10:12 +0200 Subject: thermal/core: Remove DROP_FULL and RAISE_FULL The trends DROP_FULL and RAISE_FULL are not used and were never used in the past AFAICT. Remove these conditions as they seems to not be handled anywhere. Signed-off-by: Daniel Lezcano Link: https://lore.kernel.org/r/20220629151012.3115773-2-daniel.lezcano@linaro.org Signed-off-by: Daniel Lezcano --- include/linux/thermal.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index 365733b428d8..231bac2768fb 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -40,8 +40,6 @@ enum thermal_trend { THERMAL_TREND_STABLE, /* temperature is stable */ THERMAL_TREND_RAISING, /* temperature is raising */ THERMAL_TREND_DROPPING, /* temperature is dropping */ - THERMAL_TREND_RAISE_FULL, /* apply highest cooling action */ - THERMAL_TREND_DROP_FULL, /* apply lowest cooling action */ }; /* Thermal notification reason */ -- cgit From 646274ddaf756dc549fc98cc581fc1ce4fef517a Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Fri, 22 Jul 2022 22:00:01 +0200 Subject: thermal/of: Move thermal_trip structure to thermal.h The structure thermal_trip is now generic and will be usable by the different sensor drivers in place of their own structure. Move its definition to thermal.h to make it accessible. Cc: Alexandre Bailon Cc: Kevin Hilman Signed-off-by: Daniel Lezcano Reviewed-by: Lukasz Luba Link: https://lore.kernel.org/r/20220722200007.1839356-5-daniel.lezcano@linexp.org Signed-off-by: Daniel Lezcano --- include/linux/thermal.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index 231bac2768fb..7e66970f0464 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -78,6 +78,18 @@ struct thermal_zone_device_ops { void (*critical)(struct thermal_zone_device *); }; +/** + * struct thermal_trip - representation of a point in temperature domain + * @temperature: temperature value in miliCelsius + * @hysteresis: relative hysteresis in miliCelsius + * @type: trip point type + */ +struct thermal_trip { + int temperature; + int hysteresis; + enum thermal_trip_type type; +}; + struct thermal_cooling_device_ops { int (*get_max_state) (struct thermal_cooling_device *, unsigned long *); int (*get_cur_state) (struct thermal_cooling_device *, unsigned long *); -- cgit From e5bfcd30f88fdb0ce830229e7ccdeddcb7a59b04 Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Fri, 22 Jul 2022 22:00:04 +0200 Subject: thermal/core: Rename 'trips' to 'num_trips' In order to use thermal trips defined in the thermal structure, rename the 'trips' field to 'num_trips' to have the 'trips' field containing the thermal trip points. Cc: Alexandre Bailon Cc: Kevin Hilman Signed-off-by: Daniel Lezcano Link: https://lore.kernel.org/r/20220722200007.1839356-8-daniel.lezcano@linexp.org Signed-off-by: Daniel Lezcano --- include/linux/thermal.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index 7e66970f0464..ae579a70cc1a 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -123,7 +123,7 @@ struct thermal_cooling_device { * @trip_hyst_attrs: attributes for trip points for sysfs: trip hysteresis * @mode: current mode of this thermal zone * @devdata: private pointer for device private data - * @trips: number of trip points the thermal zone supports + * @num_trips: number of trip points the thermal zone supports * @trips_disabled; bitmap for disabled trips * @passive_delay_jiffies: number of jiffies to wait between polls when * performing passive cooling. @@ -163,7 +163,7 @@ struct thermal_zone_device { struct thermal_attr *trip_hyst_attrs; enum thermal_device_mode mode; void *devdata; - int trips; + int num_trips; unsigned long trips_disabled; /* bitmap for disabled trips */ unsigned long passive_delay_jiffies; unsigned long polling_delay_jiffies; -- cgit From fae11de507f0e420ba3733729317559e7cea6274 Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Fri, 22 Jul 2022 22:00:05 +0200 Subject: thermal/core: Add thermal_trip in thermal_zone The thermal trip points are properties of a thermal zone and the different sub systems should be able to save them in the thermal zone structure instead of having their own definition. Give the opportunity to the drivers to create a thermal zone with thermal trips which will be accessible directly from the thermal core framework. As we added the thermal trip points structure in the thermal zone, let's extend the thermal zone register function to have the thermal trip structures as a parameter and store it in the 'trips' field of the thermal zone structure. The thermal zone contains the trip point, we can store them directly when registering the thermal zone. That will allow another step forward to remove the duplicate thermal zone structure we find in the thermal_of code. Cc: Alexandre Bailon Cc: Kevin Hilman Signed-off-by: Daniel Lezcano Link: https://lore.kernel.org/r/20220722200007.1839356-9-daniel.lezcano@linexp.org Signed-off-by: Daniel Lezcano --- include/linux/thermal.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index ae579a70cc1a..1386c713885d 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -123,6 +123,7 @@ struct thermal_cooling_device { * @trip_hyst_attrs: attributes for trip points for sysfs: trip hysteresis * @mode: current mode of this thermal zone * @devdata: private pointer for device private data + * @trips: an array of struct thermal_trip * @num_trips: number of trip points the thermal zone supports * @trips_disabled; bitmap for disabled trips * @passive_delay_jiffies: number of jiffies to wait between polls when @@ -163,6 +164,7 @@ struct thermal_zone_device { struct thermal_attr *trip_hyst_attrs; enum thermal_device_mode mode; void *devdata; + struct thermal_trip *trips; int num_trips; unsigned long trips_disabled; /* bitmap for disabled trips */ unsigned long passive_delay_jiffies; @@ -376,8 +378,14 @@ void devm_thermal_zone_of_sensor_unregister(struct device *dev, struct thermal_zone_device *thermal_zone_device_register(const char *, int, int, void *, struct thermal_zone_device_ops *, struct thermal_zone_params *, int, int); + void thermal_zone_device_unregister(struct thermal_zone_device *); +struct thermal_zone_device * +thermal_zone_device_register_with_trips(const char *, struct thermal_trip *, int, int, + void *, struct thermal_zone_device_ops *, + struct thermal_zone_params *, int, int); + int thermal_zone_bind_cooling_device(struct thermal_zone_device *, int, struct thermal_cooling_device *, unsigned long, unsigned long, -- cgit From 6dd71251b9aeedd540fd7003bc5f73d59dd6dcb2 Mon Sep 17 00:00:00 2001 From: Xin Gao Date: Fri, 22 Jul 2022 10:23:37 +0800 Subject: platform/x86: pmc_atom: Fix comment typo The double `of' is duplicated in line 50, remove one. Signed-off-by: Xin Gao Link: https://lore.kernel.org/r/20220722022337.15903-1-gaoxin@cdjrlc.com Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/pmc_atom.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/pmc_atom.h b/include/linux/platform_data/x86/pmc_atom.h index 6807839c718b..3edfb6d4e67a 100644 --- a/include/linux/platform_data/x86/pmc_atom.h +++ b/include/linux/platform_data/x86/pmc_atom.h @@ -47,7 +47,7 @@ #define PMC_S0I2_TMR 0x88 #define PMC_S0I3_TMR 0x8C #define PMC_S0_TMR 0x90 -/* Sleep state counter is in units of of 32us */ +/* Sleep state counter is in units of 32us */ #define PMC_TMR_SHIFT 5 /* Power status of power islands */ -- cgit From 9dd1cd3220eca534f2d47afad7ce85f4c40118d8 Mon Sep 17 00:00:00 2001 From: Mike Snitzer Date: Wed, 20 Jul 2022 13:58:04 -0400 Subject: dm: fix dm-raid crash if md_handle_request() splits bio Commit ca522482e3eaf ("dm: pass NULL bdev to bio_alloc_clone") introduced the optimization to _not_ perform bio_associate_blkg()'s relatively costly work when DM core clones its bio. But in doing so it exposed the possibility for DM's cloned bio to alter DM target behavior (e.g. crash) if a target were to issue IO without first calling bio_set_dev(). The DM raid target can trigger an MD crash due to its need to split the DM bio that is passed to md_handle_request(). The split will recurse to submit_bio_noacct() using a bio with an uninitialized ->bi_blkg. This NULL bio->bi_blkg causes blk_throtl_bio() to dereference a NULL blkg_to_tg(bio->bi_blkg). Fix this in DM core by adding a new 'needs_bio_set_dev' target flag that will make alloc_tio() call bio_set_dev() on behalf of the target. dm-raid is the only target that requires this flag. bio_set_dev() initializes the DM cloned bio's ->bi_blkg, using bio_associate_blkg, before passing the bio to md_handle_request(). Long-term fix would be to audit and refactor MD code to rely on DM to split its bio, using dm_accept_partial_bio(), but there are MD raid personalities (e.g. raid1 and raid10) whose implementation are tightly coupled to handling the bio splitting inline. Fixes: ca522482e3eaf ("dm: pass NULL bdev to bio_alloc_clone") Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer --- include/linux/device-mapper.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h index 920085dd7f3b..04c6acf7faaa 100644 --- a/include/linux/device-mapper.h +++ b/include/linux/device-mapper.h @@ -373,6 +373,12 @@ struct dm_target { * after returning DM_MAPIO_SUBMITTED from its map function. */ bool accounts_remapped_io:1; + + /* + * Set if the target will submit the DM bio without first calling + * bio_set_dev(). NOTE: ideally a target should _not_ need this. + */ + bool needs_bio_set_dev:1; }; void *dm_per_bio_data(struct bio *bio, size_t data_size); -- cgit From 0fcb100d50835d6823723ef0898cd565b3796e0a Mon Sep 17 00:00:00 2001 From: Nathan Huckleberry Date: Fri, 22 Jul 2022 09:38:21 +0000 Subject: dm bufio: Add flags argument to dm_bufio_client_create Add a flags argument to dm_bufio_client_create and update all the callers. This is in preparation to add the DM_BUFIO_NO_SLEEP flag. Signed-off-by: Nathan Huckleberry Signed-off-by: Mike Snitzer --- include/linux/dm-bufio.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dm-bufio.h b/include/linux/dm-bufio.h index 90bd558a17f5..e21480715255 100644 --- a/include/linux/dm-bufio.h +++ b/include/linux/dm-bufio.h @@ -24,7 +24,8 @@ struct dm_bufio_client * dm_bufio_client_create(struct block_device *bdev, unsigned block_size, unsigned reserved_buffers, unsigned aux_size, void (*alloc_callback)(struct dm_buffer *), - void (*write_callback)(struct dm_buffer *)); + void (*write_callback)(struct dm_buffer *), + unsigned int flags); /* * Release a buffered IO cache. -- cgit From b32d45824aa7e07a0c3257a16e3a2a691b11b39a Mon Sep 17 00:00:00 2001 From: Nathan Huckleberry Date: Fri, 22 Jul 2022 09:38:22 +0000 Subject: dm bufio: Add DM_BUFIO_CLIENT_NO_SLEEP flag Add an optional flag that ensures dm_bufio_client does not sleep (primary focus is to service dm_bufio_get without sleeping). This allows the dm-bufio cache to be queried from interrupt context. To ensure that dm-bufio does not sleep, dm-bufio must use a spinlock instead of a mutex. Additionally, to avoid deadlocks, special care must be taken so that dm-bufio does not sleep while holding the spinlock. But again: the scope of this no_sleep is initially confined to dm_bufio_get, so __alloc_buffer_wait_no_callback is _not_ changed to avoid sleeping because __bufio_new avoids allocation for NF_GET. Signed-off-by: Nathan Huckleberry Signed-off-by: Mike Snitzer --- include/linux/dm-bufio.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dm-bufio.h b/include/linux/dm-bufio.h index e21480715255..15d9e15ca830 100644 --- a/include/linux/dm-bufio.h +++ b/include/linux/dm-bufio.h @@ -17,6 +17,11 @@ struct dm_bufio_client; struct dm_buffer; +/* + * Flags for dm_bufio_client_create + */ +#define DM_BUFIO_CLIENT_NO_SLEEP 0x1 + /* * Create a buffered IO cache on a given device */ -- cgit From 71610ab1d017e131a9888ef8acd035284fb0e1dd Mon Sep 17 00:00:00 2001 From: Bibo Mao Date: Wed, 20 Jul 2022 15:21:51 +0800 Subject: LoongArch: Remove clock setting during cpu hotplug stage On physical machine we can save power by disabling clock of hot removed cpu. However as different platforms require different methods to configure clocks, the code is platform-specific, and probably belongs to firmware/pmu or cpu regulator, rather than generic arch/loongarch code. Also, there is no such register on QEMU virt machine since the clock/frequency regulation is not emulated. This patch removes the hard-coded clock register accesses in generic LoongArch cpu hotplug flow. Reviewed-by: WANG Xuerui Signed-off-by: Bibo Mao Signed-off-by: Huacai Chen --- include/linux/cpuhotplug.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h index 19f0dbfdd7fe..b66c5f389159 100644 --- a/include/linux/cpuhotplug.h +++ b/include/linux/cpuhotplug.h @@ -130,7 +130,6 @@ enum cpuhp_state { CPUHP_ZCOMP_PREPARE, CPUHP_TIMERS_PREPARE, CPUHP_MIPS_SOC_PREPARE, - CPUHP_LOONGARCH_SOC_PREPARE, CPUHP_BP_PREPARE_DYN, CPUHP_BP_PREPARE_DYN_END = CPUHP_BP_PREPARE_DYN + 20, CPUHP_BRINGUP_CPU, -- cgit From 4ab0e470c06dc741f6583578da44961669331b78 Mon Sep 17 00:00:00 2001 From: Anup Patel Date: Fri, 29 Jul 2022 17:15:00 +0530 Subject: KVM: Add gfp_custom flag in struct kvm_mmu_memory_cache The kvm_mmu_topup_memory_cache() always uses GFP_KERNEL_ACCOUNT for memory allocation which prevents it's use in atomic context. To address this limitation of kvm_mmu_topup_memory_cache(), we add gfp_custom flag in struct kvm_mmu_memory_cache. When the gfp_custom flag is set to some GFP_xyz flags, the kvm_mmu_topup_memory_cache() will use that instead of GFP_KERNEL_ACCOUNT. Signed-off-by: Anup Patel Reviewed-by: Atish Patra Signed-off-by: Anup Patel --- include/linux/kvm_types.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index ac1ebb37a0ff..1dcfba68076a 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ -87,6 +87,7 @@ struct gfn_to_pfn_cache { struct kvm_mmu_memory_cache { int nobjs; gfp_t gfp_zero; + gfp_t gfp_custom; struct kmem_cache *kmem_cache; void *objects[KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE]; }; -- cgit From 0ad722f159e44983ddea1929ffd90d0c20a86f24 Mon Sep 17 00:00:00 2001 From: Arnd Bergmann Date: Fri, 15 Jul 2022 17:36:16 +0200 Subject: PCI: Remove pci_mmap_page_range() wrapper The ARCH_GENERIC_PCI_MMAP_RESOURCE symbol came up in a recent discussion, and I noticed that this was left behind by an unfinished cleanup from 2017. The only architecture that still relies on providing its own pci_mmap_page_range() helper instead of using the generic pci_mmap_resource_range() is sparc. Presumably the reasons for this have not changed, but at least this can be simplified by converting sparc to use the same interface as the others. The only difference between the two is the device-specific offset that gets added to or subtracted from vma->vm_pgoff. Change the only caller of pci_mmap_page_range() in common code to subtract this offset and call the modern interface, while adding it back in the sparc implementation to preserve the existing behavior. This removes the complexities of the dual interfaces from the common code, and keeps it all specific to the sparc architecture code. According to David Miller, the sparc code lets user space poke into the VGA I/O port registers by mmapping the I/O space of the parent bridge device, which is something that the generic pci_mmap_resource_range() code apparently does not. Link: https://lore.kernel.org/lkml/1519887203.622.3.camel@infradead.org/t/ Link: https://lore.kernel.org/lkml/20220714214657.2402250-3-shorne@gmail.com/ Link: https://lore.kernel.org/r/20220715153617.3393420-1-arnd@kernel.org Signed-off-by: Arnd Bergmann Signed-off-by: Bjorn Helgaas Cc: David Woodhouse Cc: David S. Miller Cc: Stafford Horne --- include/linux/pci.h | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pci.h b/include/linux/pci.h index 81a57b498f22..060af91bafcd 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1909,24 +1909,14 @@ pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs, #include -/* These two functions provide almost identical functionality. Depending - * on the architecture, one will be implemented as a wrapper around the - * other (in drivers/pci/mmap.c). - * +/* * pci_mmap_resource_range() maps a specific BAR, and vm->vm_pgoff * is expected to be an offset within that region. * - * pci_mmap_page_range() is the legacy architecture-specific interface, - * which accepts a "user visible" resource address converted by - * pci_resource_to_user(), as used in the legacy mmap() interface in - * /proc/bus/pci/. */ int pci_mmap_resource_range(struct pci_dev *dev, int bar, struct vm_area_struct *vma, enum pci_mmap_state mmap_state, int write_combine); -int pci_mmap_page_range(struct pci_dev *pdev, int bar, - struct vm_area_struct *vma, - enum pci_mmap_state mmap_state, int write_combine); #ifndef arch_can_pci_mmap_wc #define arch_can_pci_mmap_wc() 0 -- cgit From 909fcb195201f7c3aa81523cc182a4c9b52210e5 Mon Sep 17 00:00:00 2001 From: Marijn Suijten Date: Thu, 30 Jun 2022 00:53:21 +0200 Subject: clk: divider: Introduce devm_clk_hw_register_divider_parent_hw() Add the devres variant of clk_hw_register_divider_parent_hw() for registering a divider clock with clk_hw parent pointer instead of parent name. Signed-off-by: Marijn Suijten Reviewed-by: Dmitry Baryshkov Link: https://lore.kernel.org/r/20220629225331.357308-2-marijn.suijten@somainline.org Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) (limited to 'include/linux') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index c10dc4c659e2..4e07621849e6 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -831,6 +831,25 @@ struct clk *clk_register_divider_table(struct device *dev, const char *name, __devm_clk_hw_register_divider((dev), NULL, (name), (parent_name), NULL, \ NULL, (flags), (reg), (shift), (width), \ (clk_divider_flags), NULL, (lock)) +/** + * devm_clk_hw_register_divider_parent_hw - register a divider clock with the clock framework + * @dev: device registering this clock + * @name: name of this clock + * @parent_hw: pointer to parent clk + * @flags: framework-specific flags + * @reg: register address to adjust divider + * @shift: number of bits to shift the bitfield + * @width: width of the bitfield + * @clk_divider_flags: divider-specific flags for this clock + * @lock: shared register lock for this clock + */ +#define devm_clk_hw_register_divider_parent_hw(dev, name, parent_hw, flags, \ + reg, shift, width, \ + clk_divider_flags, lock) \ + __devm_clk_hw_register_divider((dev), NULL, (name), NULL, \ + (parent_hw), NULL, (flags), (reg), \ + (shift), (width), (clk_divider_flags), \ + NULL, (lock)) /** * devm_clk_hw_register_divider_table - register a table based divider clock * with the clock framework (devres variant) -- cgit From df63af17f3375239e0be124844f304b4a4b60665 Mon Sep 17 00:00:00 2001 From: Marijn Suijten Date: Thu, 30 Jun 2022 00:53:22 +0200 Subject: clk: mux: Introduce devm_clk_hw_register_mux_parent_hws() Add the devres variant of clk_hw_register_mux_hws() for registering a mux clock with clk_hw parent pointers instead of parent names. Signed-off-by: Marijn Suijten Reviewed-by: Dmitry Baryshkov Link: https://lore.kernel.org/r/20220629225331.357308-3-marijn.suijten@somainline.org Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index 4e07621849e6..316c7e082934 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -980,6 +980,13 @@ struct clk *clk_register_mux_table(struct device *dev, const char *name, (parent_names), NULL, NULL, (flags), (reg), \ (shift), BIT((width)) - 1, (clk_mux_flags), \ NULL, (lock)) +#define devm_clk_hw_register_mux_parent_hws(dev, name, parent_hws, \ + num_parents, flags, reg, shift, \ + width, clk_mux_flags, lock) \ + __devm_clk_hw_register_mux((dev), NULL, (name), (num_parents), NULL, \ + (parent_hws), NULL, (flags), (reg), \ + (shift), BIT((width)) - 1, \ + (clk_mux_flags), NULL, (lock)) int clk_mux_val_to_index(struct clk_hw *hw, const u32 *table, unsigned int flags, unsigned int val); -- cgit From 6ebd5247ad2aa210b3ff4481c6ed8aa32a032b12 Mon Sep 17 00:00:00 2001 From: Marijn Suijten Date: Thu, 30 Jun 2022 00:53:23 +0200 Subject: clk: fixed-factor: Introduce *clk_hw_register_fixed_factor_parent_hw() Add the devres and non-devres variant of clk_hw_register_fixed_factor_parent_hw() for registering a fixed factor clock with clk_hw parent pointer instead of parent name. Signed-off-by: Marijn Suijten Link: https://lore.kernel.org/r/20220629225331.357308-4-marijn.suijten@somainline.org Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index 316c7e082934..94458cb669f0 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -1032,6 +1032,14 @@ struct clk_hw *devm_clk_hw_register_fixed_factor(struct device *dev, struct clk_hw *devm_clk_hw_register_fixed_factor_index(struct device *dev, const char *name, unsigned int index, unsigned long flags, unsigned int mult, unsigned int div); + +struct clk_hw *devm_clk_hw_register_fixed_factor_parent_hw(struct device *dev, + const char *name, const struct clk_hw *parent_hw, + unsigned long flags, unsigned int mult, unsigned int div); + +struct clk_hw *clk_hw_register_fixed_factor_parent_hw(struct device *dev, + const char *name, const struct clk_hw *parent_hw, + unsigned long flags, unsigned int mult, unsigned int div); /** * struct clk_fractional_divider - adjustable fractional divider clock * -- cgit From c770f31d8f580ed4b965c64f924ec1cc50e41734 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Tue, 19 Jul 2022 09:18:35 -0400 Subject: SUNRPC: Fix xdr_encode_bool() I discovered that xdr_encode_bool() was returning the same address that was passed in the @p parameter. The documenting comment states that the intent is to return the address of the next buffer location, just like the other "xdr_encode_*" helpers. The result was the encoded results of NFSv3 PATHCONF operations were not formed correctly. Fixes: ded04a587f6c ("NFSD: Update the NFSv3 PATHCONF3res encoder to use struct xdr_stream") Signed-off-by: Chuck Lever Reviewed-by: Jeff Layton --- include/linux/sunrpc/xdr.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 5860f32e3958..986c8a17ca5e 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -419,8 +419,8 @@ static inline int xdr_stream_encode_item_absent(struct xdr_stream *xdr) */ static inline __be32 *xdr_encode_bool(__be32 *p, u32 n) { - *p = n ? xdr_one : xdr_zero; - return p++; + *p++ = n ? xdr_one : xdr_zero; + return p; } /** -- cgit From 184cefbe62627730c30282df12bcff9aae4816ea Mon Sep 17 00:00:00 2001 From: Benjamin Coddington Date: Mon, 13 Jun 2022 09:40:06 -0400 Subject: NLM: Defend against file_lock changes after vfs_test_lock() Instead of trusting that struct file_lock returns completely unchanged after vfs_test_lock() when there's no conflicting lock, stash away our nlm_lockowner reference so we can properly release it for all cases. This defends against another file_lock implementation overwriting fl_owner when the return type is F_UNLCK. Reported-by: Roberto Bergantinos Corpas Tested-by: Roberto Bergantinos Corpas Signed-off-by: Benjamin Coddington Signed-off-by: Chuck Lever --- include/linux/lockd/lockd.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h index fcef192e5e45..70ce419e2709 100644 --- a/include/linux/lockd/lockd.h +++ b/include/linux/lockd/lockd.h @@ -292,6 +292,7 @@ void nlmsvc_locks_init_private(struct file_lock *, struct nlm_host *, pid_t); __be32 nlm_lookup_file(struct svc_rqst *, struct nlm_file **, struct nlm_lock *); void nlm_release_file(struct nlm_file *); +void nlmsvc_put_lockowner(struct nlm_lockowner *); void nlmsvc_release_lockowner(struct nlm_lock *); void nlmsvc_mark_resources(struct net *); void nlmsvc_free_host_resources(struct nlm_host *); -- cgit From 5304877936c0a67e1a01464d113bae4c81eacdb6 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Wed, 27 Jul 2022 14:40:03 -0400 Subject: NFSD: Fix strncpy() fortify warning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In function ‘strncpy’, inlined from ‘nfsd4_ssc_setup_dul’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1392:3, inlined from ‘nfsd4_interssc_connect’ at /home/cel/src/linux/manet/fs/nfsd/nfs4proc.c:1489:11: /home/cel/src/linux/manet/include/linux/fortify-string.h:52:33: warning: ‘__builtin_strncpy’ specified bound 63 equals destination size [-Wstringop-truncation] 52 | #define __underlying_strncpy __builtin_strncpy | ^ /home/cel/src/linux/manet/include/linux/fortify-string.h:89:16: note: in expansion of macro ‘__underlying_strncpy’ 89 | return __underlying_strncpy(p, q, size); | ^~~~~~~~~~~~~~~~~~~~ Signed-off-by: Chuck Lever --- include/linux/nfs_ssc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/nfs_ssc.h b/include/linux/nfs_ssc.h index 222ae8883e85..75843c00f326 100644 --- a/include/linux/nfs_ssc.h +++ b/include/linux/nfs_ssc.h @@ -64,7 +64,7 @@ struct nfsd4_ssc_umount_item { refcount_t nsui_refcnt; unsigned long nsui_expire; struct vfsmount *nsui_vfsmount; - char nsui_ipaddr[RPC_MAX_ADDRBUFLEN]; + char nsui_ipaddr[RPC_MAX_ADDRBUFLEN + 1]; }; #endif -- cgit From fef3e9066d19230f661048ca86937d954c12cd50 Mon Sep 17 00:00:00 2001 From: Xiu Jianfeng Date: Thu, 14 Jul 2022 16:41:47 +0800 Subject: writeback: remove inode_to_wb_is_valid() inode_to_wb_is_valid() is no longer used since commit fe55d563d417 ("remove inode_congested()"), remove it. Link: https://lkml.kernel.org/r/20220714084147.140324-1-xiujianfeng@huawei.com Signed-off-by: Xiu Jianfeng Reviewed-by: Johannes Thumshirn Reviewed-by: Jan Kara Signed-off-by: Andrew Morton --- include/linux/backing-dev.h | 17 ----------------- 1 file changed, 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h index e84b745a6811..439815cc1ab9 100644 --- a/include/linux/backing-dev.h +++ b/include/linux/backing-dev.h @@ -229,18 +229,6 @@ wb_get_create_current(struct backing_dev_info *bdi, gfp_t gfp) return wb; } -/** - * inode_to_wb_is_valid - test whether an inode has a wb associated - * @inode: inode of interest - * - * Returns %true if @inode has a wb associated. May be called without any - * locking. - */ -static inline bool inode_to_wb_is_valid(struct inode *inode) -{ - return inode->i_wb; -} - /** * inode_to_wb - determine the wb of an inode * @inode: inode of interest @@ -339,11 +327,6 @@ wb_get_create_current(struct backing_dev_info *bdi, gfp_t gfp) return &bdi->wb; } -static inline bool inode_to_wb_is_valid(struct inode *inode) -{ - return true; -} - static inline struct bdi_writeback *inode_to_wb(struct inode *inode) { return &inode_to_bdi(inode)->wb; -- cgit From 73b73bac90d97400e29e585c678c4d0ebfd2680d Mon Sep 17 00:00:00 2001 From: Yosry Ahmed Date: Thu, 14 Jul 2022 06:49:18 +0000 Subject: mm: vmpressure: don't count proactive reclaim in vmpressure memory.reclaim is a cgroup v2 interface that allows users to proactively reclaim memory from a memcg, without real memory pressure. Reclaim operations invoke vmpressure, which is used: (a) To notify userspace of reclaim efficiency in cgroup v1, and (b) As a signal for a memcg being under memory pressure for networking (see mem_cgroup_under_socket_pressure()). For (a), vmpressure notifications in v1 are not affected by this change since memory.reclaim is a v2 feature. For (b), the effects of the vmpressure signal (according to Shakeel [1]) are as follows: 1. Reducing send and receive buffers of the current socket. 2. May drop packets on the rx path. 3. May throttle current thread on the tx path. Since proactive reclaim is invoked directly by userspace, not by memory pressure, it makes sense not to throttle networking. Hence, this change makes sure that proactive reclaim caused by memory.reclaim does not trigger vmpressure. [1] https://lore.kernel.org/lkml/CALvZod68WdrXEmBpOkadhB5GPYmCXaDZzXH=yyGOCAjFRn4NDQ@mail.gmail.com/ [yosryahmed@google.com: update documentation] Link: https://lkml.kernel.org/r/20220721173015.2643248-1-yosryahmed@google.com Link: https://lkml.kernel.org/r/20220714064918.2576464-1-yosryahmed@google.com Signed-off-by: Yosry Ahmed Acked-by: Shakeel Butt Acked-by: Michal Hocko Acked-by: David Rientjes Cc: Johannes Weiner Cc: Roman Gushchin Cc: Muchun Song Cc: Matthew Wilcox Cc: Vlastimil Babka Cc: David Hildenbrand Cc: Miaohe Lin Cc: NeilBrown Cc: Alistair Popple Cc: Suren Baghdasaryan Cc: Peter Xu Signed-off-by: Andrew Morton --- include/linux/swap.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 6d11c51b2b62..ea895b40e6ff 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -411,10 +411,13 @@ extern void lru_cache_add_inactive_or_unevictable(struct page *page, extern unsigned long zone_reclaimable_pages(struct zone *zone); extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order, gfp_t gfp_mask, nodemask_t *mask); + +#define MEMCG_RECLAIM_MAY_SWAP (1 << 1) +#define MEMCG_RECLAIM_PROACTIVE (1 << 2) extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, unsigned long nr_pages, gfp_t gfp_mask, - bool may_swap); + unsigned int reclaim_options); extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem, gfp_t gfp_mask, bool noswap, pg_data_t *pgdat, -- cgit From e408e695f5f1f60d784913afc45ff2c387a5aeb8 Mon Sep 17 00:00:00 2001 From: Theodore Ts'o Date: Thu, 14 Jul 2022 21:59:12 -0400 Subject: mm/shmem: support FS_IOC_[SG]ETFLAGS in tmpfs This allows userspace to set flags like FS_APPEND_FL, FS_IMMUTABLE_FL, FS_NODUMP_FL, etc., like all other standard Linux file systems. [akpm@linux-foundation.org: fix CONFIG_TMPFS_XATTR=n warnings] Link: https://lkml.kernel.org/r/20220715015912.2560575-1-tytso@mit.edu Signed-off-by: Theodore Ts'o Cc: Hugh Dickins Signed-off-by: Andrew Morton --- include/linux/shmem_fs.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index a68f982f22d1..1b6c4013f691 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -25,9 +25,20 @@ struct shmem_inode_info { struct simple_xattrs xattrs; /* list of xattrs */ atomic_t stop_eviction; /* hold when working on inode */ struct timespec64 i_crtime; /* file creation time */ + unsigned int fsflags; /* flags for FS_IOC_[SG]ETFLAGS */ struct inode vfs_inode; }; +#define SHMEM_FL_USER_VISIBLE FS_FL_USER_VISIBLE +#define SHMEM_FL_USER_MODIFIABLE FS_FL_USER_MODIFIABLE +#define SHMEM_FL_INHERITED FS_FL_USER_MODIFIABLE + +/* Flags that are appropriate for regular files (all but dir-specific ones). */ +#define SHMEM_REG_FLMASK (~(FS_DIRSYNC_FL | FS_TOPDIR_FL)) + +/* Flags that are appropriate for non-directories/regular files. */ +#define SHMEM_OTHER_FLMASK (FS_NODUMP_FL | FS_NOATIME_FL) + struct shmem_sb_info { unsigned long max_blocks; /* How many blocks are allowed */ struct percpu_counter used_blocks; /* How many are allocated */ -- cgit From bb077c3ffd5362a6d9e60574e1bcc83fe8e3fb27 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Tue, 26 Jul 2022 21:18:16 +0800 Subject: mm: cleanup is_highmem() It is unnecessary to add CONFIG_HIGHMEM check in is_highmem(), which has been done in is_highmem_idx(), and move is_highmem() close to is_highmem_idx(). This has no functional impact. Link: https://lkml.kernel.org/r/20220726131816.149075-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Signed-off-by: Andrew Morton --- include/linux/mmzone.h | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 578247a341b2..e24b40c52468 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1137,15 +1137,6 @@ static inline int is_highmem_idx(enum zone_type idx) #endif } -#ifdef CONFIG_ZONE_DMA -bool has_managed_dma(void); -#else -static inline bool has_managed_dma(void) -{ - return false; -} -#endif - /** * is_highmem - helper function to quickly check if a struct zone is a * highmem zone or not. This is an attempt to keep references @@ -1155,12 +1146,17 @@ static inline bool has_managed_dma(void) */ static inline int is_highmem(struct zone *zone) { -#ifdef CONFIG_HIGHMEM return is_highmem_idx(zone_idx(zone)); +} + +#ifdef CONFIG_ZONE_DMA +bool has_managed_dma(void); #else - return 0; -#endif +static inline bool has_managed_dma(void) +{ + return false; } +#endif /* These two functions are used to setup the per zone pages min values */ struct ctl_table; -- cgit From fa7d574ba4f4f3f4f78d432c8545d9045daa89b1 Mon Sep 17 00:00:00 2001 From: Xiu Jianfeng Date: Tue, 19 Jul 2022 16:33:49 +0800 Subject: bdi: remove enum wb_congested_state enum wb_congested_state and the member 'congested' in bdi_writeback are useless since commit a88f2096d5a2 ("remove congestion tracking framework"), so remove it. Link: https://lkml.kernel.org/r/20220719083349.87547-1-xiujianfeng@huawei.com Signed-off-by: Xiu Jianfeng Reviewed-by: Jan Kara Cc: NeilBrown Signed-off-by: Andrew Morton --- include/linux/backing-dev-defs.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h index e863c88df95f..ae12696ec492 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -28,11 +28,6 @@ enum wb_state { WB_start_all, /* nr_pages == 0 (all) work pending */ }; -enum wb_congested_state { - WB_async_congested, /* The async (write) queue is getting full */ - WB_sync_congested, /* The sync queue is getting full */ -}; - enum wb_stat_item { WB_RECLAIMABLE, WB_WRITEBACK, @@ -122,8 +117,6 @@ struct bdi_writeback { atomic_t writeback_inodes; /* number of inodes under writeback */ struct percpu_counter stat[NR_WB_STAT_ITEMS]; - unsigned long congested; /* WB_[a]sync_congested flags */ - unsigned long bw_time_stamp; /* last time write bw is updated */ unsigned long dirtied_stamp; unsigned long written_stamp; /* pages written at bw_time_stamp */ -- cgit From 45f78b0a2743c4fd71b73400bd5d5339628bf538 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Wed, 27 Jul 2022 13:49:03 +0200 Subject: fs/dcache: Move the wakeup from __d_lookup_done() to the caller. __d_lookup_done() wakes waiters on dentry->d_wait. On PREEMPT_RT we are not allowed to do that with preemption disabled, since the wakeup acquired wait_queue_head::lock, which is a "sleeping" spinlock on RT. Calling it under dentry->d_lock is not a problem, since that is also a "sleeping" spinlock on the same configs. Unfortunately, two of its callers (__d_add() and __d_move()) are holding more than just ->d_lock and that needs to be dealt with. The key observation is that wakeup can be moved to any point before dropping ->d_lock. As a first step to solve this, move the wake up outside of the hlist_bl_lock() held section. This is safe because: Waiters get inserted into ->d_wait only after they'd taken ->d_lock and observed DCACHE_PAR_LOOKUP in flags. As long as they are woken up (and evicted from the queue) between the moment __d_lookup_done() has removed DCACHE_PAR_LOOKUP and dropping ->d_lock, we are safe, since the waitqueue ->d_wait points to won't get destroyed without having __d_lookup_done(dentry) called (under ->d_lock). ->d_wait is set only by d_alloc_parallel() and only in case when it returns a freshly allocated in-lookup dentry. Whenever that happens, we are guaranteed that __d_lookup_done() will be called for resulting dentry (under ->d_lock) before the wq in question gets destroyed. With two exceptions wq lives in call frame of the caller of d_alloc_parallel() and we have an explicit d_lookup_done() on the resulting in-lookup dentry before we leave that frame. One of those exceptions is nfs_call_unlink(), where wq is embedded into (dynamically allocated) struct nfs_unlinkdata. It is destroyed in nfs_async_unlink_release() after an explicit d_lookup_done() on the dentry wq went into. Remaining exception is d_add_ci(). There wq is what we'd found in ->d_wait of d_add_ci() argument. Callers of d_add_ci() are two instances of ->d_lookup() and they must have been given an in-lookup dentry. Which means that they'd been called by __lookup_slow() or lookup_open(), with wq in the call frame of one of those. Result of d_alloc_parallel() in d_add_ci() is fed to d_splice_alias(), which either returns non-NULL (and d_add_ci() does d_lookup_done()) or feeds dentry to __d_add() that will do __d_lookup_done() under ->d_lock. That concludes the analysis. Let __d_lookup_unhash(): 1) Lock the lookup hash and clear DCACHE_PAR_LOOKUP 2) Unhash the dentry 3) Retrieve and clear dentry::d_wait 4) Unlock the hash and return the retrieved waitqueue head pointer 5) Let the caller handle the wake up. 6) Rename __d_lookup_done() to __d_lookup_unhash_wake() to enforce build failures for OOT code that used __d_lookup_done() and is not aware of the new return value. This does not yet solve the PREEMPT_RT problem completely because preemption is still disabled due to i_dir_seq being held for write. This will be addressed in subsequent steps. An alternative solution would be to switch the waitqueue to a simple waitqueue, but aside of Linus not being a fan of them, moving the wake up closer to the place where dentry::lock is unlocked reduces lock contention time for the woken up waiter. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Signed-off-by: Sebastian Andrzej Siewior Link: https://lkml.kernel.org/r/20220613140712.77932-3-bigeasy@linutronix.de Signed-off-by: Al Viro --- include/linux/dcache.h | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dcache.h b/include/linux/dcache.h index f5bba51480b2..c73e5e327e76 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -349,7 +349,7 @@ static inline void dont_mount(struct dentry *dentry) spin_unlock(&dentry->d_lock); } -extern void __d_lookup_done(struct dentry *); +extern void __d_lookup_unhash_wake(struct dentry *dentry); static inline int d_in_lookup(const struct dentry *dentry) { @@ -358,11 +358,8 @@ static inline int d_in_lookup(const struct dentry *dentry) static inline void d_lookup_done(struct dentry *dentry) { - if (unlikely(d_in_lookup(dentry))) { - spin_lock(&dentry->d_lock); - __d_lookup_done(dentry); - spin_unlock(&dentry->d_lock); - } + if (unlikely(d_in_lookup(dentry))) + __d_lookup_unhash_wake(dentry); } extern void dput(struct dentry *); -- cgit From 102227b970a15256f5ffd12a6a276ddf978e6caf Mon Sep 17 00:00:00 2001 From: Daniel Bristot de Oliveira Date: Fri, 29 Jul 2022 11:38:40 +0200 Subject: rv: Add Runtime Verification (RV) interface RV is a lightweight (yet rigorous) method that complements classical exhaustive verification techniques (such as model checking and theorem proving) with a more practical approach to complex systems. RV works by analyzing the trace of the system's actual execution, comparing it against a formal specification of the system behavior. RV can give precise information on the runtime behavior of the monitored system while enabling the reaction for unexpected events, avoiding, for example, the propagation of a failure on safety-critical systems. The development of this interface roots in the development of the paper: De Oliveira, Daniel Bristot; Cucinotta, Tommaso; De Oliveira, Romulo Silva. Efficient formal verification for the Linux kernel. In: International Conference on Software Engineering and Formal Methods. Springer, Cham, 2019. p. 315-332. And: De Oliveira, Daniel Bristot. Automata-based formal analysis and verification of the real-time Linux kernel. PhD Thesis, 2020. The RV interface resembles the tracing/ interface on purpose. The current path for the RV interface is /sys/kernel/tracing/rv/. It presents these files: "available_monitors" - List the available monitors, one per line. For example: # cat available_monitors wip wwnr "enabled_monitors" - Lists the enabled monitors, one per line; - Writing to it enables a given monitor; - Writing a monitor name with a '!' prefix disables it; - Truncating the file disables all enabled monitors. For example: # cat enabled_monitors # echo wip > enabled_monitors # echo wwnr >> enabled_monitors # cat enabled_monitors wip wwnr # echo '!wip' >> enabled_monitors # cat enabled_monitors wwnr # echo > enabled_monitors # cat enabled_monitors # Note that more than one monitor can be enabled concurrently. "monitoring_on" - It is an on/off general switcher for monitoring. Note that it does not disable enabled monitors or detach events, but stop the per-entity monitors of monitoring the events received from the system. It resembles the "tracing_on" switcher. "monitors/" Each monitor will have its one directory inside "monitors/". There the monitor specific files will be presented. The "monitors/" directory resembles the "events" directory on tracefs. For example: # cd monitors/wip/ # ls desc enable # cat desc wakeup in preemptive per-cpu testing monitor. # cat enable 0 For further information, see the comments in the header of kernel/trace/rv/rv.c from this patch. Link: https://lkml.kernel.org/r/a4bfe038f50cb047bfb343ad0e12b0e646ab308b.1659052063.git.bristot@kernel.org Cc: Wim Van Sebroeck Cc: Guenter Roeck Cc: Jonathan Corbet Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Will Deacon Cc: Catalin Marinas Cc: Marco Elver Cc: Dmitry Vyukov Cc: "Paul E. McKenney" Cc: Shuah Khan Cc: Gabriele Paoloni Cc: Juri Lelli Cc: Clark Williams Cc: Tao Zhou Cc: Randy Dunlap Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira Signed-off-by: Steven Rostedt (Google) --- include/linux/rv.h | 43 +++++++++++++++++++++++++++++++++++++++++++ include/linux/sched.h | 11 +++++++++++ 2 files changed, 54 insertions(+) create mode 100644 include/linux/rv.h (limited to 'include/linux') diff --git a/include/linux/rv.h b/include/linux/rv.h new file mode 100644 index 000000000000..d8fa9e8be94a --- /dev/null +++ b/include/linux/rv.h @@ -0,0 +1,43 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Runtime Verification. + * + * For futher information, see: kernel/trace/rv/rv.c. + */ +#ifndef _LINUX_RV_H +#define _LINUX_RV_H + +#ifdef CONFIG_RV + +/* + * Per-task RV monitors count. Nowadays fixed in RV_PER_TASK_MONITORS. + * If we find justification for more monitors, we can think about + * adding more or developing a dynamic method. So far, none of + * these are justified. + */ +#define RV_PER_TASK_MONITORS 1 +#define RV_PER_TASK_MONITOR_INIT (RV_PER_TASK_MONITORS) + +/* + * Futher monitor types are expected, so make this a union. + */ +union rv_task_monitor { +}; + +struct rv_monitor { + const char *name; + const char *description; + bool enabled; + int (*enable)(void); + void (*disable)(void); + void (*reset)(void); +}; + +bool rv_monitoring_on(void); +int rv_unregister_monitor(struct rv_monitor *monitor); +int rv_register_monitor(struct rv_monitor *monitor); +int rv_get_task_monitor_slot(void); +void rv_put_task_monitor_slot(int slot); + +#endif /* CONFIG_RV */ +#endif /* _LINUX_RV_H */ diff --git a/include/linux/sched.h b/include/linux/sched.h index c46f3a63b758..b5da3e7c3a04 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -34,6 +34,7 @@ #include #include #include +#include #include /* task_struct member predeclarations (sorted alphabetically): */ @@ -1500,6 +1501,16 @@ struct task_struct { struct callback_head l1d_flush_kill; #endif +#ifdef CONFIG_RV + /* + * Per-task RV monitor. Nowadays fixed in RV_PER_TASK_MONITORS. + * If we find justification for more monitors, we can think + * about adding more or developing a dynamic method. So far, + * none of these are justified. + */ + union rv_task_monitor rv[RV_PER_TASK_MONITORS]; +#endif + /* * New fields for task_struct should be added above here, so that * they are included in the randomized portion of task_struct. -- cgit From 04acadcb4453cf8011dd3d4ce8d97fecac42d325 Mon Sep 17 00:00:00 2001 From: Daniel Bristot de Oliveira Date: Fri, 29 Jul 2022 11:38:41 +0200 Subject: rv: Add runtime reactors interface A runtime monitor can cause a reaction to the detection of an exception on the model's execution. By default, the monitors have tracing reactions, printing the monitor output via tracepoints. But other reactions can be added (on-demand) via this interface. The user interface resembles the kernel tracing interface and presents these files: "available_reactors" - Reading shows the available reactors, one per line. For example: # cat available_reactors nop panic printk "reacting_on" - It is an on/off general switch for reactors, disabling all reactions. "monitors/MONITOR/reactors" - List available reactors, with the select reaction for the given MONITOR inside []. The default one is the nop (no operation) reactor. - Writing the name of a reactor enables it to the given MONITOR. For example: # cat monitors/wip/reactors [nop] panic printk # echo panic > monitors/wip/reactors # cat monitors/wip/reactors nop [panic] printk Link: https://lkml.kernel.org/r/1794eb994637457bdeaa6bad0b8263d2f7eece0c.1659052063.git.bristot@kernel.org Cc: Wim Van Sebroeck Cc: Guenter Roeck Cc: Jonathan Corbet Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Will Deacon Cc: Catalin Marinas Cc: Marco Elver Cc: Dmitry Vyukov Cc: "Paul E. McKenney" Cc: Shuah Khan Cc: Gabriele Paoloni Cc: Juri Lelli Cc: Clark Williams Cc: Tao Zhou Cc: Randy Dunlap Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira Signed-off-by: Steven Rostedt (Google) --- include/linux/rv.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rv.h b/include/linux/rv.h index d8fa9e8be94a..dcce56987cf7 100644 --- a/include/linux/rv.h +++ b/include/linux/rv.h @@ -24,6 +24,14 @@ union rv_task_monitor { }; +#ifdef CONFIG_RV_REACTORS +struct rv_reactor { + const char *name; + const char *description; + void (*react)(char *msg); +}; +#endif + struct rv_monitor { const char *name; const char *description; @@ -31,6 +39,9 @@ struct rv_monitor { int (*enable)(void); void (*disable)(void); void (*reset)(void); +#ifdef CONFIG_RV_REACTORS + void (*react)(char *msg); +#endif }; bool rv_monitoring_on(void); @@ -39,5 +50,11 @@ int rv_register_monitor(struct rv_monitor *monitor); int rv_get_task_monitor_slot(void); void rv_put_task_monitor_slot(int slot); +#ifdef CONFIG_RV_REACTORS +bool rv_reacting_on(void); +int rv_unregister_reactor(struct rv_reactor *reactor); +int rv_register_reactor(struct rv_reactor *reactor); +#endif /* CONFIG_RV_REACTORS */ + #endif /* CONFIG_RV */ #endif /* _LINUX_RV_H */ -- cgit From 792575348ff70e05c6040d02fce38e949ef92c37 Mon Sep 17 00:00:00 2001 From: Daniel Bristot de Oliveira Date: Fri, 29 Jul 2022 11:38:43 +0200 Subject: rv/include: Add deterministic automata monitor definition via C macros In Linux terms, the runtime verification monitors are encapsulated inside the "RV monitor" abstraction. The "RV monitor" includes a set of instances of the monitor (per-cpu monitor, per-task monitor, and so on), the helper functions that glue the monitor to the system reference model, and the trace output as a reaction for event parsing and exceptions, as depicted below: Linux +----- RV Monitor ----------------------------------+ Formal Realm | | Realm +-------------------+ +----------------+ +-----------------+ | Linux kernel | | Monitor | | Reference | | Tracing | -> | Instance(s) | <- | Model | | (instrumentation) | | (verification) | | (specification) | +-------------------+ +----------------+ +-----------------+ | | | | V | | +----------+ | | | Reaction | | | +--+--+--+-+ | | | | | | | | | +-> trace output ? | +------------------------|--|----------------------+ | +----> panic ? +-------> Add the rv/da_monitor.h, enabling automatic code generation for the *Monitor Instance(s)* using C macros, and code to support it. The benefits of the usage of macro for monitor synthesis are 3-fold as it: - Reduces the code duplication; - Facilitates the bug fix/improvement; - Avoids the case of developers changing the core of the monitor code to manipulate the model in a (let's say) non-standard way. This initial implementation presents three different types of monitor instances: - DECLARE_DA_MON_GLOBAL(name, type) - DECLARE_DA_MON_PER_CPU(name, type) - DECLARE_DA_MON_PER_TASK(name, type) The first declares the functions for a global deterministic automata monitor, the second for monitors with per-cpu instances, and the third with per-task instances. Link: https://lkml.kernel.org/r/51b0bf425a281e226dfeba7401d2115d6091f84e.1659052063.git.bristot@kernel.org Cc: Wim Van Sebroeck Cc: Guenter Roeck Cc: Jonathan Corbet Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Peter Zijlstra Cc: Will Deacon Cc: Catalin Marinas Cc: Marco Elver Cc: Dmitry Vyukov Cc: "Paul E. McKenney" Cc: Shuah Khan Cc: Gabriele Paoloni Cc: Juri Lelli Cc: Clark Williams Cc: Tao Zhou Cc: Randy Dunlap Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira Signed-off-by: Steven Rostedt (Google) --- include/linux/rv.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rv.h b/include/linux/rv.h index dcce56987cf7..8883b41d88ec 100644 --- a/include/linux/rv.h +++ b/include/linux/rv.h @@ -7,7 +7,16 @@ #ifndef _LINUX_RV_H #define _LINUX_RV_H +#define MAX_DA_NAME_LEN 24 + #ifdef CONFIG_RV +/* + * Deterministic automaton per-object variables. + */ +struct da_monitor { + bool monitoring; + unsigned int curr_state; +}; /* * Per-task RV monitors count. Nowadays fixed in RV_PER_TASK_MONITORS. @@ -22,6 +31,7 @@ * Futher monitor types are expected, so make this a union. */ union rv_task_monitor { + struct da_monitor da_mon; }; #ifdef CONFIG_RV_REACTORS -- cgit From a603002eea8213eec5211be5a85db8340aea06d0 Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Wed, 22 Jun 2022 08:38:36 +0200 Subject: virtio: replace restricted mem access flag with callback Instead of having a global flag to require restricted memory access for all virtio devices, introduce a callback which can select that requirement on a per-device basis. For convenience add a common function returning always true, which can be used for use cases like SEV. Per default use a callback always returning false. As the callback needs to be set in early init code already, add a virtio anchor which is builtin in case virtio is enabled. Signed-off-by: Juergen Gross Tested-by: Oleksandr Tyshchenko # Arm64 guest using Xen Reviewed-by: Stefano Stabellini Link: https://lore.kernel.org/r/20220622063838.8854-2-jgross@suse.com Signed-off-by: Juergen Gross --- include/linux/platform-feature.h | 6 +----- include/linux/virtio_anchor.h | 19 +++++++++++++++++++ 2 files changed, 20 insertions(+), 5 deletions(-) create mode 100644 include/linux/virtio_anchor.h (limited to 'include/linux') diff --git a/include/linux/platform-feature.h b/include/linux/platform-feature.h index b2f48be999fa..6ed859928b97 100644 --- a/include/linux/platform-feature.h +++ b/include/linux/platform-feature.h @@ -6,11 +6,7 @@ #include /* The platform features are starting with the architecture specific ones. */ - -/* Used to enable platform specific DMA handling for virtio devices. */ -#define PLATFORM_VIRTIO_RESTRICTED_MEM_ACCESS (0 + PLATFORM_ARCH_FEAT_N) - -#define PLATFORM_FEAT_N (1 + PLATFORM_ARCH_FEAT_N) +#define PLATFORM_FEAT_N (0 + PLATFORM_ARCH_FEAT_N) void platform_set(unsigned int feature); void platform_clear(unsigned int feature); diff --git a/include/linux/virtio_anchor.h b/include/linux/virtio_anchor.h new file mode 100644 index 000000000000..432e6c00b3ca --- /dev/null +++ b/include/linux/virtio_anchor.h @@ -0,0 +1,19 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_VIRTIO_ANCHOR_H +#define _LINUX_VIRTIO_ANCHOR_H + +#ifdef CONFIG_VIRTIO_ANCHOR +struct virtio_device; + +bool virtio_require_restricted_mem_acc(struct virtio_device *dev); +extern bool (*virtio_check_mem_acc_cb)(struct virtio_device *dev); + +static inline void virtio_set_mem_acc_cb(bool (*func)(struct virtio_device *)) +{ + virtio_check_mem_acc_cb = func; +} +#else +#define virtio_set_mem_acc_cb(func) do { } while (0) +#endif + +#endif /* _LINUX_VIRTIO_ANCHOR_H */ -- cgit From a870544ca9d215449e91ebc01e35d80b23151c78 Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Wed, 22 Jun 2022 08:38:37 +0200 Subject: kernel: remove platform_has() infrastructure The only use case of the platform_has() infrastructure has been removed again, so remove the whole feature. Signed-off-by: Juergen Gross Tested-by: Oleksandr Tyshchenko # Arm64 guest using Xen Reviewed-by: Stefano Stabellini Link: https://lore.kernel.org/r/20220622063838.8854-3-jgross@suse.com Signed-off-by: Juergen Gross --- include/linux/platform-feature.h | 15 --------------- 1 file changed, 15 deletions(-) delete mode 100644 include/linux/platform-feature.h (limited to 'include/linux') diff --git a/include/linux/platform-feature.h b/include/linux/platform-feature.h deleted file mode 100644 index 6ed859928b97..000000000000 --- a/include/linux/platform-feature.h +++ /dev/null @@ -1,15 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _PLATFORM_FEATURE_H -#define _PLATFORM_FEATURE_H - -#include -#include - -/* The platform features are starting with the architecture specific ones. */ -#define PLATFORM_FEAT_N (0 + PLATFORM_ARCH_FEAT_N) - -void platform_set(unsigned int feature); -void platform_clear(unsigned int feature); -bool platform_has(unsigned int feature); - -#endif /* _PLATFORM_FEATURE_H */ -- cgit From 36d4b36b69590fed99356a4426c940a253a93800 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Mon, 25 Jul 2022 09:39:17 -0700 Subject: lib/nodemask: inline next_node_in() and node_random() The functions are pretty thin wrappers around find_bit engine, and keeping them in c-file prevents compiler from small_const_nbits() optimization, which must take place for all systems with MAX_NUMNODES less than BITS_PER_LONG (default is 16 for me). Moving them to header file doesn't blow up the kernel size: add/remove: 1/2 grow/shrink: 9/5 up/down: 968/-88 (880) CC: Andy Shevchenko CC: Benjamin Herrenschmidt CC: Michael Ellerman CC: Paul Mackerras CC: Rasmus Villemoes CC: Stephen Rothwell CC: linuxppc-dev@lists.ozlabs.org Signed-off-by: Yury Norov --- include/linux/nodemask.h | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 0f233b76c9ce..4b71a96190a8 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -94,6 +94,7 @@ #include #include #include +#include typedef struct { DECLARE_BITMAP(bits, MAX_NUMNODES); } nodemask_t; extern nodemask_t _unused_nodemask_arg_; @@ -276,7 +277,14 @@ static inline unsigned int __next_node(int n, const nodemask_t *srcp) * the first node in src if needed. Returns MAX_NUMNODES if src is empty. */ #define next_node_in(n, src) __next_node_in((n), &(src)) -unsigned int __next_node_in(int node, const nodemask_t *srcp); +static inline unsigned int __next_node_in(int node, const nodemask_t *srcp) +{ + unsigned int ret = __next_node(node, srcp); + + if (ret == MAX_NUMNODES) + ret = __first_node(srcp); + return ret; +} static inline void init_nodemask_of_node(nodemask_t *mask, int node) { @@ -493,14 +501,20 @@ static inline int num_node_state(enum node_states state) #endif +static inline int node_random(const nodemask_t *maskp) +{ #if defined(CONFIG_NUMA) && (MAX_NUMNODES > 1) -extern int node_random(const nodemask_t *maskp); + int w, bit = NUMA_NO_NODE; + + w = nodes_weight(*maskp); + if (w) + bit = bitmap_ord_to_pos(maskp->bits, + get_random_int() % w, MAX_NUMNODES); + return bit; #else -static inline int node_random(const nodemask_t *mask) -{ return 0; -} #endif +} #define node_online_map node_states[N_ONLINE] #define node_possible_map node_states[N_POSSIBLE] -- cgit From 68f2736a858324c3ec852f6c2cddd9d1c777357d Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 7 Jun 2022 15:38:48 -0400 Subject: mm: Convert all PageMovable users to movable_operations These drivers are rather uncomfortably hammered into the address_space_operations hole. They aren't filesystems and don't behave like filesystems. They just need their own movable_operations structure, which we can point to directly from page->mapping. Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/balloon_compaction.h | 6 ++-- include/linux/fs.h | 2 -- include/linux/migrate.h | 56 ++++++++++++++++++++++++++++++++++---- include/linux/page-flags.h | 2 +- 4 files changed, 54 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/balloon_compaction.h b/include/linux/balloon_compaction.h index edb7f6d41faa..5ca2d5699620 100644 --- a/include/linux/balloon_compaction.h +++ b/include/linux/balloon_compaction.h @@ -57,7 +57,6 @@ struct balloon_dev_info { struct list_head pages; /* Pages enqueued & handled to Host */ int (*migratepage)(struct balloon_dev_info *, struct page *newpage, struct page *page, enum migrate_mode mode); - struct inode *inode; }; extern struct page *balloon_page_alloc(void); @@ -75,11 +74,10 @@ static inline void balloon_devinfo_init(struct balloon_dev_info *balloon) spin_lock_init(&balloon->pages_lock); INIT_LIST_HEAD(&balloon->pages); balloon->migratepage = NULL; - balloon->inode = NULL; } #ifdef CONFIG_BALLOON_COMPACTION -extern const struct address_space_operations balloon_aops; +extern const struct movable_operations balloon_mops; /* * balloon_page_insert - insert a page into the balloon's page list and make @@ -94,7 +92,7 @@ static inline void balloon_page_insert(struct balloon_dev_info *balloon, struct page *page) { __SetPageOffline(page); - __SetPageMovable(page, balloon->inode->i_mapping); + __SetPageMovable(page, &balloon_mops); set_page_private(page, (unsigned long)balloon); list_add(&page->lru, &balloon->pages); } diff --git a/include/linux/fs.h b/include/linux/fs.h index 9ad5e3520fae..5d8ee3155ca2 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -367,8 +367,6 @@ struct address_space_operations { */ int (*migratepage) (struct address_space *, struct page *, struct page *, enum migrate_mode); - bool (*isolate_page)(struct page *, isolate_mode_t); - void (*putback_page)(struct page *); int (*launder_folio)(struct folio *); bool (*is_partially_uptodate) (struct folio *, size_t from, size_t count); diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 069a89e847f3..82c735ba6109 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -19,6 +19,43 @@ struct migration_target_control; */ #define MIGRATEPAGE_SUCCESS 0 +/** + * struct movable_operations - Driver page migration + * @isolate_page: + * The VM calls this function to prepare the page to be moved. The page + * is locked and the driver should not unlock it. The driver should + * return ``true`` if the page is movable and ``false`` if it is not + * currently movable. After this function returns, the VM uses the + * page->lru field, so the driver must preserve any information which + * is usually stored here. + * + * @migrate_page: + * After isolation, the VM calls this function with the isolated + * @src page. The driver should copy the contents of the + * @src page to the @dst page and set up the fields of @dst page. + * Both pages are locked. + * If page migration is successful, the driver should call + * __ClearPageMovable(@src) and return MIGRATEPAGE_SUCCESS. + * If the driver cannot migrate the page at the moment, it can return + * -EAGAIN. The VM interprets this as a temporary migration failure and + * will retry it later. Any other error value is a permanent migration + * failure and migration will not be retried. + * The driver shouldn't touch the @src->lru field while in the + * migrate_page() function. It may write to @dst->lru. + * + * @putback_page: + * If migration fails on the isolated page, the VM informs the driver + * that the page is no longer a candidate for migration by calling + * this function. The driver should put the isolated page back into + * its own data structure. + */ +struct movable_operations { + bool (*isolate_page)(struct page *, isolate_mode_t); + int (*migrate_page)(struct page *dst, struct page *src, + enum migrate_mode); + void (*putback_page)(struct page *); +}; + /* Defined in mm/debug.c: */ extern const char *migrate_reason_names[MR_TYPES]; @@ -91,13 +128,13 @@ static inline int next_demotion_node(int node) #endif #ifdef CONFIG_COMPACTION -extern int PageMovable(struct page *page); -extern void __SetPageMovable(struct page *page, struct address_space *mapping); -extern void __ClearPageMovable(struct page *page); +bool PageMovable(struct page *page); +void __SetPageMovable(struct page *page, const struct movable_operations *ops); +void __ClearPageMovable(struct page *page); #else -static inline int PageMovable(struct page *page) { return 0; } +static inline bool PageMovable(struct page *page) { return false; } static inline void __SetPageMovable(struct page *page, - struct address_space *mapping) + const struct movable_operations *ops) { } static inline void __ClearPageMovable(struct page *page) @@ -110,6 +147,15 @@ static inline bool folio_test_movable(struct folio *folio) return PageMovable(&folio->page); } +static inline +const struct movable_operations *page_movable_ops(struct page *page) +{ + VM_BUG_ON(!__PageMovable(page)); + + return (const struct movable_operations *) + ((unsigned long)page->mapping - PAGE_MAPPING_MOVABLE); +} + #ifdef CONFIG_NUMA_BALANCING extern int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, int node); diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index e66f7aa3191d..3f5490f6f038 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -639,7 +639,7 @@ __PAGEFLAG(Reported, reported, PF_NO_COMPOUND) * structure which KSM associates with that merged page. See ksm.h. * * PAGE_MAPPING_KSM without PAGE_MAPPING_ANON is used for non-lru movable - * page and then page->mapping points a struct address_space. + * page and then page->mapping points to a struct movable_operations. * * Please note that, confusingly, "page_mapping" refers to the inode * address_space which maps the page from disk; whereas "page_mapped" -- cgit From 5490da4f06d182ba944706875029e98fe7f6b821 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 6 Jun 2022 09:00:16 -0400 Subject: fs: Add aops->migrate_folio Provide a folio-based replacement for aops->migratepage. Update the documentation to document migrate_folio instead of migratepage. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/fs.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 5d8ee3155ca2..47431cf8fbb3 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -362,9 +362,11 @@ struct address_space_operations { void (*free_folio)(struct folio *folio); ssize_t (*direct_IO)(struct kiocb *, struct iov_iter *iter); /* - * migrate the contents of a page to the specified target. If + * migrate the contents of a folio to the specified target. If * migrate_mode is MIGRATE_ASYNC, it must not block. */ + int (*migrate_folio)(struct address_space *, struct folio *dst, + struct folio *src, enum migrate_mode); int (*migratepage) (struct address_space *, struct page *, struct page *, enum migrate_mode); int (*launder_folio)(struct folio *); -- cgit From 67235182a41c1bd6b32806a1556a1d299b84212b Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 6 Jun 2022 10:20:31 -0400 Subject: mm/migrate: Convert buffer_migrate_page() to buffer_migrate_folio() Use a folio throughout __buffer_migrate_folio(), add kernel-doc for buffer_migrate_folio() and buffer_migrate_folio_norefs(), move their declarations to buffer.h and switch all filesystems that have wired them up. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/buffer_head.h | 10 ++++++++++ include/linux/fs.h | 12 ------------ 2 files changed, 10 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index c9d1463bb20f..b0366c89d6a4 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -267,6 +267,16 @@ int nobh_truncate_page(struct address_space *, loff_t, get_block_t *); int nobh_writepage(struct page *page, get_block_t *get_block, struct writeback_control *wbc); +#ifdef CONFIG_MIGRATION +extern int buffer_migrate_folio(struct address_space *, + struct folio *dst, struct folio *src, enum migrate_mode); +extern int buffer_migrate_folio_norefs(struct address_space *, + struct folio *dst, struct folio *src, enum migrate_mode); +#else +#define buffer_migrate_folio NULL +#define buffer_migrate_folio_norefs NULL +#endif + void buffer_init(void); /* diff --git a/include/linux/fs.h b/include/linux/fs.h index 47431cf8fbb3..9e6b17da4e11 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -3215,18 +3215,6 @@ extern int generic_check_addressable(unsigned, u64); extern void generic_set_encrypted_ci_d_ops(struct dentry *dentry); -#ifdef CONFIG_MIGRATION -extern int buffer_migrate_page(struct address_space *, - struct page *, struct page *, - enum migrate_mode); -extern int buffer_migrate_page_norefs(struct address_space *, - struct page *, struct page *, - enum migrate_mode); -#else -#define buffer_migrate_page NULL -#define buffer_migrate_page_norefs NULL -#endif - int may_setattr(struct user_namespace *mnt_userns, struct inode *inode, unsigned int ia_valid); int setattr_prepare(struct user_namespace *, struct dentry *, struct iattr *); -- cgit From 541846502f4fe826cd7c16e4784695ac90736585 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 6 Jun 2022 10:27:41 -0400 Subject: mm/migrate: Convert migrate_page() to migrate_folio() Convert all callers to pass a folio. Most have the folio already available. Switch all users from aops->migratepage to aops->migrate_folio. Also turn the documentation into kerneldoc. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Acked-by: David Sterba --- include/linux/migrate.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 82c735ba6109..c9986d5da335 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -62,9 +62,8 @@ extern const char *migrate_reason_names[MR_TYPES]; #ifdef CONFIG_MIGRATION extern void putback_movable_pages(struct list_head *l); -extern int migrate_page(struct address_space *mapping, - struct page *newpage, struct page *page, - enum migrate_mode mode); +int migrate_folio(struct address_space *mapping, struct folio *dst, + struct folio *src, enum migrate_mode mode); extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free, unsigned long private, enum migrate_mode mode, int reason, unsigned int *ret_succeeded); -- cgit From 2ec810d59602f0e08847f986ef8e16469722496f Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 6 Jun 2022 12:55:08 -0400 Subject: mm/migrate: Add filemap_migrate_folio() There is nothing iomap-specific about iomap_migratepage(), and it fits a pattern used by several other filesystems, so move it to mm/migrate.c, convert it to be filemap_migrate_folio() and convert the iomap filesystems to use it. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong --- include/linux/iomap.h | 6 ------ include/linux/pagemap.h | 6 ++++++ 2 files changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iomap.h b/include/linux/iomap.h index e552097c67e0..758a1125e72f 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -231,12 +231,6 @@ void iomap_readahead(struct readahead_control *, const struct iomap_ops *ops); bool iomap_is_partially_uptodate(struct folio *, size_t from, size_t count); bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags); void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len); -#ifdef CONFIG_MIGRATION -int iomap_migrate_page(struct address_space *mapping, struct page *newpage, - struct page *page, enum migrate_mode mode); -#else -#define iomap_migrate_page NULL -#endif int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len, const struct iomap_ops *ops); int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 87d4ea571240..cc9adbaddb59 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -1078,6 +1078,12 @@ static inline int __must_check write_one_page(struct page *page) int __set_page_dirty_nobuffers(struct page *page); bool noop_dirty_folio(struct address_space *mapping, struct folio *folio); +#ifdef CONFIG_MIGRATION +int filemap_migrate_folio(struct address_space *mapping, struct folio *dst, + struct folio *src, enum migrate_mode mode); +#else +#define filemap_migrate_folio NULL +#endif void page_endio(struct page *page, bool is_write, int err); void folio_end_private_2(struct folio *folio); -- cgit From b890ec2a2c2d962f71ba31ae291f8fd252b46258 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 6 Jun 2022 10:47:21 -0400 Subject: hugetlb: Convert to migrate_folio This involves converting migrate_huge_page_move_mapping(). We also need a folio variant of hugetlb_set_page_subpool(), but that's for a later patch. Signed-off-by: Matthew Wilcox (Oracle) Acked-by: Muchun Song Reviewed-by: Mike Kravetz --- include/linux/migrate.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/migrate.h b/include/linux/migrate.h index c9986d5da335..13f793309b75 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -72,8 +72,8 @@ extern int isolate_movable_page(struct page *page, isolate_mode_t mode); extern void migrate_page_states(struct page *newpage, struct page *page); extern void migrate_page_copy(struct page *newpage, struct page *page); -extern int migrate_huge_page_move_mapping(struct address_space *mapping, - struct page *newpage, struct page *page); +int migrate_huge_page_move_mapping(struct address_space *mapping, + struct folio *dst, struct folio *src); extern int migrate_page_move_mapping(struct address_space *mapping, struct page *newpage, struct page *page, int extra_count); void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep, @@ -104,7 +104,7 @@ static inline void migrate_page_copy(struct page *newpage, struct page *page) {} static inline int migrate_huge_page_move_mapping(struct address_space *mapping, - struct page *newpage, struct page *page) + struct folio *dst, struct folio *src) { return -ENOSYS; } -- cgit From 9d0ddc0cb575fd41ff16131b06e08e1feac43b81 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 6 Jun 2022 11:53:31 -0400 Subject: fs: Remove aops->migratepage() With all users converted to migrate_folio(), remove this operation. Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/fs.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 9e6b17da4e11..7e06919b8f60 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -367,8 +367,6 @@ struct address_space_operations { */ int (*migrate_folio)(struct address_space *, struct folio *dst, struct folio *src, enum migrate_mode); - int (*migratepage) (struct address_space *, - struct page *, struct page *, enum migrate_mode); int (*launder_folio)(struct folio *); bool (*is_partially_uptodate) (struct folio *, size_t from, size_t count); -- cgit From 9800562f2ab41656b0bdc2a41c77ab3f6dfdd6fc Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Mon, 6 Jun 2022 13:29:10 -0400 Subject: mm/folio-compat: Remove migration compatibility functions migrate_page_move_mapping(), migrate_page_copy() and migrate_page_states() are all now unused after converting all the filesystems from aops->migratepage() to aops->migrate_folio(). Signed-off-by: Matthew Wilcox (Oracle) Reviewed-by: Christoph Hellwig --- include/linux/migrate.h | 11 ----------- 1 file changed, 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 13f793309b75..ae5bb67a9ba1 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -70,12 +70,8 @@ extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free, extern struct page *alloc_migration_target(struct page *page, unsigned long private); extern int isolate_movable_page(struct page *page, isolate_mode_t mode); -extern void migrate_page_states(struct page *newpage, struct page *page); -extern void migrate_page_copy(struct page *newpage, struct page *page); int migrate_huge_page_move_mapping(struct address_space *mapping, struct folio *dst, struct folio *src); -extern int migrate_page_move_mapping(struct address_space *mapping, - struct page *newpage, struct page *page, int extra_count); void migration_entry_wait_on_locked(swp_entry_t entry, pte_t *ptep, spinlock_t *ptl); void folio_migrate_flags(struct folio *newfolio, struct folio *folio); @@ -96,13 +92,6 @@ static inline struct page *alloc_migration_target(struct page *page, static inline int isolate_movable_page(struct page *page, isolate_mode_t mode) { return -EBUSY; } -static inline void migrate_page_states(struct page *newpage, struct page *page) -{ -} - -static inline void migrate_page_copy(struct page *newpage, - struct page *page) {} - static inline int migrate_huge_page_move_mapping(struct address_space *mapping, struct folio *dst, struct folio *src) { -- cgit From cc9cf350d100f4c336edb228cbd078003430cfe7 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 13 Jun 2022 07:37:13 +0200 Subject: fs: remove the nobh helpers All callers are gone, so remove the now dead code. Signed-off-by: Christoph Hellwig Reviewed-by: Jan Kara Signed-off-by: Matthew Wilcox (Oracle) --- include/linux/buffer_head.h | 8 -------- include/linux/mpage.h | 2 -- 2 files changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index b0366c89d6a4..61afb81cfdae 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -258,14 +258,6 @@ static inline vm_fault_t block_page_mkwrite_return(int err) } sector_t generic_block_bmap(struct address_space *, sector_t, get_block_t *); int block_truncate_page(struct address_space *, loff_t, get_block_t *); -int nobh_write_begin(struct address_space *, loff_t, unsigned len, - struct page **, void **, get_block_t*); -int nobh_write_end(struct file *, struct address_space *, - loff_t, unsigned, unsigned, - struct page *, void *); -int nobh_truncate_page(struct address_space *, loff_t, get_block_t *); -int nobh_writepage(struct page *page, get_block_t *get_block, - struct writeback_control *wbc); #ifdef CONFIG_MIGRATION extern int buffer_migrate_folio(struct address_space *, diff --git a/include/linux/mpage.h b/include/linux/mpage.h index 43986f7ec4dd..1bdc39daac0a 100644 --- a/include/linux/mpage.h +++ b/include/linux/mpage.h @@ -19,7 +19,5 @@ void mpage_readahead(struct readahead_control *, get_block_t get_block); int mpage_read_folio(struct folio *folio, get_block_t get_block); int mpage_writepages(struct address_space *mapping, struct writeback_control *wbc, get_block_t get_block); -int mpage_writepage(struct page *page, get_block_t *get_block, - struct writeback_control *wbc); #endif -- cgit From 170ab26b01d7f445c46261eff69341dab7e50a24 Mon Sep 17 00:00:00 2001 From: Li zeming Date: Thu, 21 Jul 2022 16:19:04 +0800 Subject: tracepoints: It is CONFIG_TRACEPOINTS not CONFIG_TRACEPOINT When reading this note, CONFIG_TRACEPOINT searches my configuration file, and the result is CONFIG_TRACEPOINTS, the search results are consistent with the following macro definitions. I think it should be repaired. Link: https://lkml.kernel.org/r/20220721081904.3798-1-zeming@nfschina.com Signed-off-by: Li zeming Signed-off-by: Steven Rostedt (Google) --- include/linux/tracepoint.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 28031b15f878..2908cc5ed70e 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -151,7 +151,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) /* * Individual subsystem my have a separate configuration to * enable their tracepoints. By default, this file will create - * the tracepoints if CONFIG_TRACEPOINT is defined. If a subsystem + * the tracepoints if CONFIG_TRACEPOINTS is defined. If a subsystem * wants to be able to disable its tracepoints from being created * it can define NOTRACE before including the tracepoint headers. */ -- cgit From d9c26e0a58b0039d1ad4340d826fe8db866e9455 Mon Sep 17 00:00:00 2001 From: Chun-Kuang Hu Date: Wed, 8 Jun 2022 22:40:55 +0800 Subject: mailbox: mtk-cmdq: Remove proprietary cmdq_task_cb rx_callback is a standard mailbox callback mechanism and could cover the function of proprietary cmdq_task_cb, so use the standard one instead of the proprietary one. Client driver has changed to use standard rx_callback, so remove proprietary cmdq_task_cb. Signed-off-by: Chun-Kuang Hu Reviewed-by: Matthias Brugger Signed-off-by: Jassi Brar --- include/linux/mailbox/mtk-cmdq-mailbox.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mailbox/mtk-cmdq-mailbox.h b/include/linux/mailbox/mtk-cmdq-mailbox.h index 44365aab043c..a8f0070c7aa9 100644 --- a/include/linux/mailbox/mtk-cmdq-mailbox.h +++ b/include/linux/mailbox/mtk-cmdq-mailbox.h @@ -67,24 +67,14 @@ enum cmdq_code { struct cmdq_cb_data { int sta; - void *data; struct cmdq_pkt *pkt; }; -typedef void (*cmdq_async_flush_cb)(struct cmdq_cb_data data); - -struct cmdq_task_cb { - cmdq_async_flush_cb cb; - void *data; -}; - struct cmdq_pkt { void *va_base; dma_addr_t pa_base; size_t cmd_buf_size; /* command occupied size */ size_t buf_size; /* real buffer size */ - struct cmdq_task_cb cb; - struct cmdq_task_cb async_cb; void *cl; }; -- cgit From d3e94fdc4ef476ca1edd468cc11badf2dbbb3c00 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 8 Jan 2021 15:34:38 -0500 Subject: fscrypt: export fscrypt_fname_encrypt and fscrypt_fname_encrypted_size For ceph, we want to use our own scheme for handling filenames that are are longer than NAME_MAX after encryption and Base64 encoding. This allows us to have a consistent view of the encrypted filenames for clients that don't support fscrypt and clients that do but that don't have the key. Currently, fs/crypto only supports encrypting filenames using fscrypt_setup_filename, but that also handles encoding nokey names. Ceph can't use that because it handles nokey names in a different way. Export fscrypt_fname_encrypt. Rename fscrypt_fname_encrypted_size to __fscrypt_fname_encrypted_size and add a new wrapper called fscrypt_fname_encrypted_size that takes an inode argument rather than a pointer to a fscrypt_policy union. Signed-off-by: Jeff Layton Reviewed-by: Xiubo Li Acked-by: Eric Biggers Signed-off-by: Ilya Dryomov --- include/linux/fscrypt.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h index e60d57c99cb6..5926a4081c6d 100644 --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -327,6 +327,10 @@ void fscrypt_free_inode(struct inode *inode); int fscrypt_drop_inode(struct inode *inode); /* fname.c */ +int fscrypt_fname_encrypt(const struct inode *inode, const struct qstr *iname, + u8 *out, unsigned int olen); +bool fscrypt_fname_encrypted_size(const struct inode *inode, u32 orig_len, + u32 max_len, u32 *encrypted_len_ret); int fscrypt_setup_filename(struct inode *inode, const struct qstr *iname, int lookup, struct fscrypt_name *fname); -- cgit From 637fa738b590ec0e3414931d1e07c4f195eb5215 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Tue, 1 Sep 2020 12:56:42 -0400 Subject: fscrypt: add fscrypt_context_for_new_inode Most filesystems just call fscrypt_set_context on new inodes, which usually causes a setxattr. That's a bit late for ceph, which can send along a full set of attributes with the create request. Doing so allows it to avoid race windows that where the new inode could be seen by other clients without the crypto context attached. It also avoids the separate round trip to the server. Refactor the fscrypt code a bit to allow us to create a new crypto context, attach it to the inode, and write it to the buffer, but without calling set_context on it. ceph can later use this to marshal the context into the attributes we send along with the create request. Signed-off-by: Jeff Layton Reviewed-by: Xiubo Li Acked-by: Eric Biggers Signed-off-by: Ilya Dryomov --- include/linux/fscrypt.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h index 5926a4081c6d..7d2f1e0f23b1 100644 --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -284,6 +284,7 @@ int fscrypt_ioctl_get_policy(struct file *filp, void __user *arg); int fscrypt_ioctl_get_policy_ex(struct file *filp, void __user *arg); int fscrypt_ioctl_get_nonce(struct file *filp, void __user *arg); int fscrypt_has_permitted_context(struct inode *parent, struct inode *child); +int fscrypt_context_for_new_inode(void *ctx, struct inode *inode); int fscrypt_set_context(struct inode *inode, void *fs_data); struct fscrypt_dummy_policy { -- cgit From 4f48d5da81ee7004a789c8aac2d0dfb2514c37f1 Mon Sep 17 00:00:00 2001 From: Xiubo Li Date: Mon, 16 May 2022 11:23:19 +0800 Subject: fs/dcache: export d_same_name() helper Compare dentry name with case-exact name, return true if names are same, or false. Signed-off-by: Xiubo Li Reviewed-by: Jeff Layton Reviewed-by: Luis Chamberlain Signed-off-by: Ilya Dryomov --- include/linux/dcache.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dcache.h b/include/linux/dcache.h index f5bba51480b2..bb72361834de 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -233,6 +233,8 @@ extern struct dentry * d_alloc_parallel(struct dentry *, const struct qstr *, wait_queue_head_t *); extern struct dentry * d_splice_alias(struct inode *, struct dentry *); extern struct dentry * d_add_ci(struct dentry *, struct inode *, struct qstr *); +extern bool d_same_name(const struct dentry *dentry, const struct dentry *parent, + const struct qstr *name); extern struct dentry * d_exact_alias(struct dentry *, struct inode *); extern struct dentry *d_find_any_alias(struct inode *inode); extern struct dentry * d_obtain_alias(struct inode *); -- cgit From d93231a6bc8a452323d5fef16cca7107ce483a27 Mon Sep 17 00:00:00 2001 From: Luís Henriques Date: Fri, 3 Jun 2022 14:29:09 +0100 Subject: ceph: prevent a client from exceeding the MDS maximum xattr size MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The MDS tries to enforce a limit on the total key/values in extended attributes. However, this limit is enforced only if doing a synchronous operation (MDS_OP_SETXATTR) -- if we're buffering the xattrs, the MDS doesn't have a chance to enforce these limits. This patch adds support for decoding the xattrs maximum size setting that is distributed in the mdsmap. Then, when setting an xattr, the kernel client will revert to do a synchronous operation if that maximum size is exceeded. While there, fix a dout() that would trigger a printk warning: [ 98.718078] ------------[ cut here ]------------ [ 98.719012] precision 65536 too large [ 98.719039] WARNING: CPU: 1 PID: 3755 at lib/vsprintf.c:2703 vsnprintf+0x5e3/0x600 ... Link: https://tracker.ceph.com/issues/55725 Signed-off-by: Luís Henriques Reviewed-by: Xiubo Li Signed-off-by: Ilya Dryomov --- include/linux/ceph/mdsmap.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/ceph/mdsmap.h b/include/linux/ceph/mdsmap.h index 523fd0452856..4c3e0648dc27 100644 --- a/include/linux/ceph/mdsmap.h +++ b/include/linux/ceph/mdsmap.h @@ -25,6 +25,7 @@ struct ceph_mdsmap { u32 m_session_timeout; /* seconds */ u32 m_session_autoclose; /* seconds */ u64 m_max_file_size; + u64 m_max_xattr_size; /* maximum size for xattrs blob */ u32 m_max_mds; /* expected up:active mds number */ u32 m_num_active_mds; /* actual up:active mds number */ u32 possible_max_rank; /* possible max rank index */ -- cgit From 1b7587d69ea7b8c350195392bfd30a0f31374ae4 Mon Sep 17 00:00:00 2001 From: Xiubo Li Date: Mon, 6 Jun 2022 20:15:33 +0800 Subject: ceph: fix the incorrect comment for the ceph_mds_caps struct The incorrect comment is misleading. Acutally the last members in ceph_mds_caps strcut is a union for none export and export bodies. Signed-off-by: Xiubo Li Reviewed-by: Jeff Layton Signed-off-by: Ilya Dryomov --- include/linux/ceph/ceph_fs.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h index 86bf82dbd8b8..40740e234ce1 100644 --- a/include/linux/ceph/ceph_fs.h +++ b/include/linux/ceph/ceph_fs.h @@ -768,7 +768,7 @@ struct ceph_mds_caps { __le32 xattr_len; __le64 xattr_version; - /* filelock */ + /* a union of non-export and export bodies. */ __le64 size, max_size, truncate_size; __le32 truncate_seq; struct ceph_timespec mtime, atime, ctime; -- cgit From 020bc44a9fbf6946f42db503d11c9811f26dd9fd Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Fri, 10 Jun 2022 11:40:13 -0400 Subject: ceph: switch back to testing for NULL folio->private in ceph_dirty_folio Willy requested that we change this back to warning on folio->private being non-NULl. He's trying to kill off the PG_private flag, and so we'd like to catch where it's non-NULL. Add a VM_WARN_ON_FOLIO (since it doesn't exist yet) and change over to using that instead of VM_BUG_ON_FOLIO along with testing the ->private pointer. [ xiubli: define VM_WARN_ON_FOLIO macro in case DEBUG_VM is disabled reported by kernel test robot ] Cc: Matthew Wilcox Signed-off-by: Jeff Layton Signed-off-by: Xiubo Li Signed-off-by: Ilya Dryomov --- include/linux/mmdebug.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h index d7285f8148a3..15ae78cd2853 100644 --- a/include/linux/mmdebug.h +++ b/include/linux/mmdebug.h @@ -54,6 +54,15 @@ void dump_mm(const struct mm_struct *mm); } \ unlikely(__ret_warn_once); \ }) +#define VM_WARN_ON_FOLIO(cond, folio) ({ \ + int __ret_warn = !!(cond); \ + \ + if (unlikely(__ret_warn)) { \ + dump_page(&folio->page, "VM_WARN_ON_FOLIO(" __stringify(cond)")");\ + WARN_ON(1); \ + } \ + unlikely(__ret_warn); \ +}) #define VM_WARN_ON_ONCE_FOLIO(cond, folio) ({ \ static bool __section(".data.once") __warned; \ int __ret_warn_once = !!(cond); \ @@ -79,6 +88,7 @@ void dump_mm(const struct mm_struct *mm); #define VM_WARN_ON(cond) BUILD_BUG_ON_INVALID(cond) #define VM_WARN_ON_ONCE(cond) BUILD_BUG_ON_INVALID(cond) #define VM_WARN_ON_ONCE_PAGE(cond, page) BUILD_BUG_ON_INVALID(cond) +#define VM_WARN_ON_FOLIO(cond, folio) BUILD_BUG_ON_INVALID(cond) #define VM_WARN_ON_ONCE_FOLIO(cond, folio) BUILD_BUG_ON_INVALID(cond) #define VM_WARN_ONCE(cond, format...) BUILD_BUG_ON_INVALID(cond) #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond) -- cgit From b53aca4b460ae2ece453963ef01667ee8ee78f52 Mon Sep 17 00:00:00 2001 From: Xiubo Li Date: Thu, 9 Jun 2022 15:21:37 +0800 Subject: ceph: fix incorrect old_size length in ceph_mds_request_args The 'old_size' is a __le64 type since birth, not sure why the kclient incorrectly switched it to __le32. This change is okay won't break anything because union will always allocate more memory than the 'open' member needed. Rename 'file_replication' to 'pool' as ceph did. Though this 'open' struct may never be used in kclient in future, it's confusing when going through the ceph code. Signed-off-by: Xiubo Li Reviewed-by: Ilya Dryomov Signed-off-by: Ilya Dryomov --- include/linux/ceph/ceph_fs.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ceph/ceph_fs.h b/include/linux/ceph/ceph_fs.h index 40740e234ce1..49586ff26152 100644 --- a/include/linux/ceph/ceph_fs.h +++ b/include/linux/ceph/ceph_fs.h @@ -433,9 +433,9 @@ union ceph_mds_request_args { __le32 stripe_unit; /* layout for newly created file */ __le32 stripe_count; /* ... */ __le32 object_size; - __le32 file_replication; - __le32 mask; /* CEPH_CAP_* */ - __le32 old_size; + __le32 pool; + __le32 mask; /* CEPH_CAP_* */ + __le64 old_size; } __attribute__ ((packed)) open; struct { __le32 flags; -- cgit From 2c61c97fb12b806e1c8eb15f04c277ad097ec95e Mon Sep 17 00:00:00 2001 From: Michael Kelley Date: Wed, 8 Jun 2022 11:52:21 -0700 Subject: nvme: handle the persistent internal error AER In the NVM Express Revision 1.4 spec, Figure 145 describes possible values for an AER with event type "Error" (value 000b). For a Persistent Internal Error (value 03h), the host should perform a controller reset. Add support for this error using code that already exists for doing a controller reset. As part of this support, introduce two utility functions for parsing the AER type and subtype. This new support was tested in a lab environment where we can generate the persistent internal error on demand, and observe both the Linux side and NVMe controller side to see that the controller reset has been done. Signed-off-by: Michael Kelley Signed-off-by: Christoph Hellwig Signed-off-by: Jens Axboe --- include/linux/nvme.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/nvme.h b/include/linux/nvme.h index 07cfc922f8e4..3b8fc6c69529 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -711,6 +711,10 @@ enum { NVME_AER_VS = 7, }; +enum { + NVME_AER_ERROR_PERSIST_INT_ERR = 0x03, +}; + enum { NVME_AER_NOTICE_NS_CHANGED = 0x00, NVME_AER_NOTICE_FW_ACT_STARTING = 0x01, -- cgit From a116e1cdc64a743c36c40e6fa639dec991c157a3 Mon Sep 17 00:00:00 2001 From: Hannes Reinecke Date: Mon, 27 Jun 2022 11:51:59 +0200 Subject: lib/base64: RFC4648-compliant base64 encoding Add RFC4648-compliant base64 encoding and decoding routines, based on the base64url encoding in fs/crypto/fname.c. Signed-off-by: Hannes Reinecke Reviewed-by: Himanshu Madhani Reviewed-by: Sagi Grimberg Cc: Eric Biggers Signed-off-by: Christoph Hellwig Signed-off-by: Jens Axboe --- include/linux/base64.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) create mode 100644 include/linux/base64.h (limited to 'include/linux') diff --git a/include/linux/base64.h b/include/linux/base64.h new file mode 100644 index 000000000000..660d4cb1ef31 --- /dev/null +++ b/include/linux/base64.h @@ -0,0 +1,16 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * base64 encoding, lifted from fs/crypto/fname.c. + */ + +#ifndef _LINUX_BASE64_H +#define _LINUX_BASE64_H + +#include + +#define BASE64_CHARS(nbytes) DIV_ROUND_UP((nbytes) * 4, 3) + +int base64_encode(const u8 *src, int len, char *dst); +int base64_decode(const char *src, int len, u8 *dst); + +#endif /* _LINUX_BASE64_H */ -- cgit From 88b140fec07307f825170a45562013a80842cc93 Mon Sep 17 00:00:00 2001 From: Hannes Reinecke Date: Mon, 27 Jun 2022 11:52:00 +0200 Subject: nvme: add definitions for NVMe In-Band authentication Add new definitions for NVMe In-band authentication as defined in the NVMe Base Specification v2.0. Signed-off-by: Hannes Reinecke Reviewed-by: Sagi Grimberg Reviewed-by: Himanshu Madhani Reviewed-by: Chaitanya Kulkarni Signed-off-by: Christoph Hellwig Signed-off-by: Jens Axboe --- include/linux/nvme.h | 209 ++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 208 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/nvme.h b/include/linux/nvme.h index 3b8fc6c69529..ae53d74f3696 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -19,6 +19,7 @@ #define NVMF_TRSVCID_SIZE 32 #define NVMF_TRADDR_SIZE 256 #define NVMF_TSAS_SIZE 256 +#define NVMF_AUTH_HASH_LEN 64 #define NVME_DISC_SUBSYS_NAME "nqn.2014-08.org.nvmexpress.discovery" @@ -1373,6 +1374,8 @@ enum nvmf_capsule_command { nvme_fabrics_type_property_set = 0x00, nvme_fabrics_type_connect = 0x01, nvme_fabrics_type_property_get = 0x04, + nvme_fabrics_type_auth_send = 0x05, + nvme_fabrics_type_auth_receive = 0x06, }; #define nvme_fabrics_type_name(type) { type, #type } @@ -1380,7 +1383,9 @@ enum nvmf_capsule_command { __print_symbolic(type, \ nvme_fabrics_type_name(nvme_fabrics_type_property_set), \ nvme_fabrics_type_name(nvme_fabrics_type_connect), \ - nvme_fabrics_type_name(nvme_fabrics_type_property_get)) + nvme_fabrics_type_name(nvme_fabrics_type_property_get), \ + nvme_fabrics_type_name(nvme_fabrics_type_auth_send), \ + nvme_fabrics_type_name(nvme_fabrics_type_auth_receive)) /* * If not fabrics command, fctype will be ignored. @@ -1476,6 +1481,11 @@ struct nvmf_connect_command { __u8 resv4[12]; }; +enum { + NVME_CONNECT_AUTHREQ_ASCR = (1 << 2), + NVME_CONNECT_AUTHREQ_ATR = (1 << 1), +}; + struct nvmf_connect_data { uuid_t hostid; __le16 cntlid; @@ -1510,6 +1520,200 @@ struct nvmf_property_get_command { __u8 resv4[16]; }; +struct nvmf_auth_common_command { + __u8 opcode; + __u8 resv1; + __u16 command_id; + __u8 fctype; + __u8 resv2[19]; + union nvme_data_ptr dptr; + __u8 resv3; + __u8 spsp0; + __u8 spsp1; + __u8 secp; + __le32 al_tl; + __u8 resv4[16]; +}; + +struct nvmf_auth_send_command { + __u8 opcode; + __u8 resv1; + __u16 command_id; + __u8 fctype; + __u8 resv2[19]; + union nvme_data_ptr dptr; + __u8 resv3; + __u8 spsp0; + __u8 spsp1; + __u8 secp; + __le32 tl; + __u8 resv4[16]; +}; + +struct nvmf_auth_receive_command { + __u8 opcode; + __u8 resv1; + __u16 command_id; + __u8 fctype; + __u8 resv2[19]; + union nvme_data_ptr dptr; + __u8 resv3; + __u8 spsp0; + __u8 spsp1; + __u8 secp; + __le32 al; + __u8 resv4[16]; +}; + +/* Value for secp */ +enum { + NVME_AUTH_DHCHAP_PROTOCOL_IDENTIFIER = 0xe9, +}; + +/* Defined value for auth_type */ +enum { + NVME_AUTH_COMMON_MESSAGES = 0x00, + NVME_AUTH_DHCHAP_MESSAGES = 0x01, +}; + +/* Defined messages for auth_id */ +enum { + NVME_AUTH_DHCHAP_MESSAGE_NEGOTIATE = 0x00, + NVME_AUTH_DHCHAP_MESSAGE_CHALLENGE = 0x01, + NVME_AUTH_DHCHAP_MESSAGE_REPLY = 0x02, + NVME_AUTH_DHCHAP_MESSAGE_SUCCESS1 = 0x03, + NVME_AUTH_DHCHAP_MESSAGE_SUCCESS2 = 0x04, + NVME_AUTH_DHCHAP_MESSAGE_FAILURE2 = 0xf0, + NVME_AUTH_DHCHAP_MESSAGE_FAILURE1 = 0xf1, +}; + +struct nvmf_auth_dhchap_protocol_descriptor { + __u8 authid; + __u8 rsvd; + __u8 halen; + __u8 dhlen; + __u8 idlist[60]; +}; + +enum { + NVME_AUTH_DHCHAP_AUTH_ID = 0x01, +}; + +/* Defined hash functions for DH-HMAC-CHAP authentication */ +enum { + NVME_AUTH_HASH_SHA256 = 0x01, + NVME_AUTH_HASH_SHA384 = 0x02, + NVME_AUTH_HASH_SHA512 = 0x03, + NVME_AUTH_HASH_INVALID = 0xff, +}; + +/* Defined Diffie-Hellman group identifiers for DH-HMAC-CHAP authentication */ +enum { + NVME_AUTH_DHGROUP_NULL = 0x00, + NVME_AUTH_DHGROUP_2048 = 0x01, + NVME_AUTH_DHGROUP_3072 = 0x02, + NVME_AUTH_DHGROUP_4096 = 0x03, + NVME_AUTH_DHGROUP_6144 = 0x04, + NVME_AUTH_DHGROUP_8192 = 0x05, + NVME_AUTH_DHGROUP_INVALID = 0xff, +}; + +union nvmf_auth_protocol { + struct nvmf_auth_dhchap_protocol_descriptor dhchap; +}; + +struct nvmf_auth_dhchap_negotiate_data { + __u8 auth_type; + __u8 auth_id; + __le16 rsvd; + __le16 t_id; + __u8 sc_c; + __u8 napd; + union nvmf_auth_protocol auth_protocol[]; +}; + +struct nvmf_auth_dhchap_challenge_data { + __u8 auth_type; + __u8 auth_id; + __u16 rsvd1; + __le16 t_id; + __u8 hl; + __u8 rsvd2; + __u8 hashid; + __u8 dhgid; + __le16 dhvlen; + __le32 seqnum; + /* 'hl' bytes of challenge value */ + __u8 cval[]; + /* followed by 'dhvlen' bytes of DH value */ +}; + +struct nvmf_auth_dhchap_reply_data { + __u8 auth_type; + __u8 auth_id; + __le16 rsvd1; + __le16 t_id; + __u8 hl; + __u8 rsvd2; + __u8 cvalid; + __u8 rsvd3; + __le16 dhvlen; + __le32 seqnum; + /* 'hl' bytes of response data */ + __u8 rval[]; + /* followed by 'hl' bytes of Challenge value */ + /* followed by 'dhvlen' bytes of DH value */ +}; + +enum { + NVME_AUTH_DHCHAP_RESPONSE_VALID = (1 << 0), +}; + +struct nvmf_auth_dhchap_success1_data { + __u8 auth_type; + __u8 auth_id; + __le16 rsvd1; + __le16 t_id; + __u8 hl; + __u8 rsvd2; + __u8 rvalid; + __u8 rsvd3[7]; + /* 'hl' bytes of response value if 'rvalid' is set */ + __u8 rval[]; +}; + +struct nvmf_auth_dhchap_success2_data { + __u8 auth_type; + __u8 auth_id; + __le16 rsvd1; + __le16 t_id; + __u8 rsvd2[10]; +}; + +struct nvmf_auth_dhchap_failure_data { + __u8 auth_type; + __u8 auth_id; + __le16 rsvd1; + __le16 t_id; + __u8 rescode; + __u8 rescode_exp; +}; + +enum { + NVME_AUTH_DHCHAP_FAILURE_REASON_FAILED = 0x01, +}; + +enum { + NVME_AUTH_DHCHAP_FAILURE_FAILED = 0x01, + NVME_AUTH_DHCHAP_FAILURE_NOT_USABLE = 0x02, + NVME_AUTH_DHCHAP_FAILURE_CONCAT_MISMATCH = 0x03, + NVME_AUTH_DHCHAP_FAILURE_HASH_UNUSABLE = 0x04, + NVME_AUTH_DHCHAP_FAILURE_DHGROUP_UNUSABLE = 0x05, + NVME_AUTH_DHCHAP_FAILURE_INCORRECT_PAYLOAD = 0x06, + NVME_AUTH_DHCHAP_FAILURE_INCORRECT_MESSAGE = 0x07, +}; + + struct nvme_dbbuf { __u8 opcode; __u8 flags; @@ -1553,6 +1757,9 @@ struct nvme_command { struct nvmf_connect_command connect; struct nvmf_property_set_command prop_set; struct nvmf_property_get_command prop_get; + struct nvmf_auth_common_command auth_common; + struct nvmf_auth_send_command auth_send; + struct nvmf_auth_receive_command auth_receive; struct nvme_dbbuf dbbuf; struct nvme_directive_cmd directive; }; -- cgit From f50fff73d620cd6e8f48bc58d4f1c944615a3fea Mon Sep 17 00:00:00 2001 From: Hannes Reinecke Date: Mon, 27 Jun 2022 11:52:02 +0200 Subject: nvme: implement In-Band authentication Implement NVMe-oF In-Band authentication according to NVMe TPAR 8006. This patch adds two new fabric options 'dhchap_secret' to specify the pre-shared key (in ASCII respresentation according to NVMe 2.0 section 8.13.5.8 'Secret representation') and 'dhchap_ctrl_secret' to specify the pre-shared controller key for bi-directional authentication of both the host and the controller. Re-authentication can be triggered by writing the PSK into the new controller sysfs attribute 'dhchap_secret' or 'dhchap_ctrl_secret'. Signed-off-by: Hannes Reinecke Reviewed-by: Sagi Grimberg Signed-off-by: Christoph Hellwig [axboe: fold in clang build fix] Signed-off-by: Jens Axboe --- include/linux/nvme-auth.h | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) create mode 100644 include/linux/nvme-auth.h (limited to 'include/linux') diff --git a/include/linux/nvme-auth.h b/include/linux/nvme-auth.h new file mode 100644 index 000000000000..354456826221 --- /dev/null +++ b/include/linux/nvme-auth.h @@ -0,0 +1,33 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (c) 2021 Hannes Reinecke, SUSE Software Solutions + */ + +#ifndef _NVME_AUTH_H +#define _NVME_AUTH_H + +#include + +struct nvme_dhchap_key { + u8 *key; + size_t len; + u8 hash; +}; + +u32 nvme_auth_get_seqnum(void); +const char *nvme_auth_dhgroup_name(u8 dhgroup_id); +const char *nvme_auth_dhgroup_kpp(u8 dhgroup_id); +u8 nvme_auth_dhgroup_id(const char *dhgroup_name); + +const char *nvme_auth_hmac_name(u8 hmac_id); +const char *nvme_auth_digest_name(u8 hmac_id); +size_t nvme_auth_hmac_hash_len(u8 hmac_id); +u8 nvme_auth_hmac_id(const char *hmac_name); + +struct nvme_dhchap_key *nvme_auth_extract_key(unsigned char *secret, + u8 key_hash); +void nvme_auth_free_key(struct nvme_dhchap_key *key); +u8 *nvme_auth_transform_key(struct nvme_dhchap_key *key, char *nqn); +int nvme_auth_generate_key(u8 *secret, struct nvme_dhchap_key **ret_key); + +#endif /* _NVME_AUTH_H */ -- cgit From b61775d185a395f26fecdc7898e39de677a6c3dd Mon Sep 17 00:00:00 2001 From: Hannes Reinecke Date: Mon, 27 Jun 2022 11:52:03 +0200 Subject: nvme-auth: Diffie-Hellman key exchange support Implement Diffie-Hellman key exchange using FFDHE groups for NVMe In-Band Authentication. Signed-off-by: Hannes Reinecke Reviewed-by: Sagi Grimberg Signed-off-by: Christoph Hellwig Signed-off-by: Jens Axboe --- include/linux/nvme-auth.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/nvme-auth.h b/include/linux/nvme-auth.h index 354456826221..dcb8030062dd 100644 --- a/include/linux/nvme-auth.h +++ b/include/linux/nvme-auth.h @@ -29,5 +29,13 @@ struct nvme_dhchap_key *nvme_auth_extract_key(unsigned char *secret, void nvme_auth_free_key(struct nvme_dhchap_key *key); u8 *nvme_auth_transform_key(struct nvme_dhchap_key *key, char *nqn); int nvme_auth_generate_key(u8 *secret, struct nvme_dhchap_key **ret_key); +int nvme_auth_augmented_challenge(u8 hmac_id, u8 *skey, size_t skey_len, + u8 *challenge, u8 *aug, size_t hlen); +int nvme_auth_gen_privkey(struct crypto_kpp *dh_tfm, u8 dh_gid); +int nvme_auth_gen_pubkey(struct crypto_kpp *dh_tfm, + u8 *host_key, size_t host_key_len); +int nvme_auth_gen_shared_secret(struct crypto_kpp *dh_tfm, + u8 *ctrl_key, size_t ctrl_key_len, + u8 *sess_key, size_t sess_key_len); #endif /* _NVME_AUTH_H */ -- cgit From 5a97806f7dc069d9561d9930a2ae108700e222ab Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 27 Jul 2022 12:22:55 -0400 Subject: block: change the blk_queue_split calling convention The double indirect bio leads to somewhat suboptimal code generation. Instead return the (original or split) bio, and make sure the request_queue arguments to the lower level helpers is passed after the bio to avoid constant reshuffling of the argument passing registers. Also give it and the helpers used to implement it more descriptive names. Signed-off-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220727162300.3089193-2-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index d04bdf549efa..5eef8d2eddc1 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -864,9 +864,9 @@ void blk_request_module(dev_t devt); extern int blk_register_queue(struct gendisk *disk); extern void blk_unregister_queue(struct gendisk *disk); void submit_bio_noacct(struct bio *bio); +struct bio *bio_split_to_limits(struct bio *bio); extern int blk_lld_busy(struct request_queue *q); -extern void blk_queue_split(struct bio **); extern int blk_queue_enter(struct request_queue *q, blk_mq_req_flags_t flags); extern void blk_queue_exit(struct request_queue *q); extern void blk_sync_queue(struct request_queue *q); -- cgit From 46754bd056053785d7079f1d48f7f571728dcb47 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 27 Jul 2022 12:22:57 -0400 Subject: block: move ->bio_split to the gendisk Only non-passthrough requests are split by the block layer and use the ->bio_split bio_set. Move it from the request_queue to the gendisk. Signed-off-by: Christoph Hellwig Reviewed-by: Damien Le Moal Reviewed-by: Johannes Thumshirn Link: https://lore.kernel.org/r/20220727162300.3089193-4-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 5eef8d2eddc1..49dcd31e283e 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -140,6 +140,8 @@ struct gendisk { struct request_queue *queue; void *private_data; + struct bio_set bio_split; + int flags; unsigned long state; #define GD_NEED_PART_SCAN 0 @@ -531,7 +533,6 @@ struct request_queue { struct blk_mq_tag_set *tag_set; struct list_head tag_set_list; - struct bio_set bio_split; struct dentry *debugfs_dir; struct dentry *sched_debugfs_dir; -- cgit From cb3b3bf22cf33707d684e74207908ba0ef3b6467 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 8 Jun 2022 13:23:48 +0200 Subject: jbd2: rename jbd_debug() to jbd2_debug() The name of jbd_debug() is confusing as all functions inside jbd2 have jbd2_ prefix. Rename jbd_debug() to jbd2_debug(). No functional changes. Signed-off-by: Jan Kara Reviewed-by: Lukas Czerner Link: https://lore.kernel.org/r/20220608112355.4397-2-jack@suse.cz Signed-off-by: Theodore Ts'o --- include/linux/jbd2.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index e79d6e0b14e8..d4d59e43769f 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -58,10 +58,10 @@ extern ushort jbd2_journal_enable_debug; void __jbd2_debug(int level, const char *file, const char *func, unsigned int line, const char *fmt, ...); -#define jbd_debug(n, fmt, a...) \ +#define jbd2_debug(n, fmt, a...) \ __jbd2_debug((n), __FILE__, __func__, __LINE__, (fmt), ##a) #else -#define jbd_debug(n, fmt, a...) no_printk(fmt, ##a) +#define jbd2_debug(n, fmt, a...) no_printk(fmt, ##a) #endif extern void *jbd2_alloc(size_t size, gfp_t flags); -- cgit From 68af74e92a86984ca6d5f7b19ee8f8a30a550873 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 8 Jun 2022 13:23:49 +0200 Subject: jbd2: remove unused exports for jbd2 debugging Jbd2 exports jbd2_journal_enable_debug and __jbd2_debug() depite the first is used only in fs/jbd2/journal.c and the second only within jbd2 code. Remove the pointless exports make jbd2_journal_enable_debug static. Signed-off-by: Jan Kara Reviewed-by: Lukas Czerner Link: https://lore.kernel.org/r/20220608112355.4397-3-jack@suse.cz Signed-off-by: Theodore Ts'o --- include/linux/jbd2.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index d4d59e43769f..6c2aa61e0f73 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -54,7 +54,6 @@ * CONFIG_JBD2_DEBUG is on. */ #define JBD2_EXPENSIVE_CHECKING -extern ushort jbd2_journal_enable_debug; void __jbd2_debug(int level, const char *file, const char *func, unsigned int line, const char *fmt, ...); -- cgit From d1324958567da957385d8d555a8b840b3bf8e6e3 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Wed, 8 Jun 2022 13:23:50 +0200 Subject: jbd2: unexport jbd2_log_start_commit() jbd2_log_start_commit() is not used outside of jbd2 so unexport it. Also make __jbd2_log_start_commit() static when we are at it. Signed-off-by: Jan Kara Reviewed-by: Lukas Czerner Link: https://lore.kernel.org/r/20220608112355.4397-4-jack@suse.cz Signed-off-by: Theodore Ts'o --- include/linux/jbd2.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h index 6c2aa61e0f73..164ddf1211c0 100644 --- a/include/linux/jbd2.h +++ b/include/linux/jbd2.h @@ -1646,7 +1646,6 @@ extern void jbd2_clear_buffer_revoked_flags(journal_t *journal); */ int jbd2_log_start_commit(journal_t *journal, tid_t tid); -int __jbd2_log_start_commit(journal_t *journal, tid_t tid); int jbd2_journal_start_commit(journal_t *journal, tid_t *tid); int jbd2_log_wait_commit(journal_t *journal, tid_t tid); int jbd2_transaction_committed(journal_t *journal, tid_t tid); -- cgit From 3dc96bba65f53daa217f0a8f43edad145286a8f5 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 12 Jul 2022 12:54:21 +0200 Subject: mbcache: add functions to delete entry if unused Add function mb_cache_entry_delete_or_get() to delete mbcache entry if it is unused and also add a function to wait for entry to become unused - mb_cache_entry_wait_unused(). We do not share code between the two deleting function as one of them will go away soon. CC: stable@vger.kernel.org Fixes: 82939d7999df ("ext4: convert to mbcache2") Signed-off-by: Jan Kara Link: https://lore.kernel.org/r/20220712105436.32204-2-jack@suse.cz Signed-off-by: Theodore Ts'o --- include/linux/mbcache.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h index 20f1e3ff6013..8eca7f25c432 100644 --- a/include/linux/mbcache.h +++ b/include/linux/mbcache.h @@ -30,15 +30,23 @@ void mb_cache_destroy(struct mb_cache *cache); int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, u64 value, bool reusable); void __mb_cache_entry_free(struct mb_cache_entry *entry); +void mb_cache_entry_wait_unused(struct mb_cache_entry *entry); static inline int mb_cache_entry_put(struct mb_cache *cache, struct mb_cache_entry *entry) { - if (!atomic_dec_and_test(&entry->e_refcnt)) + unsigned int cnt = atomic_dec_return(&entry->e_refcnt); + + if (cnt > 0) { + if (cnt <= 3) + wake_up_var(&entry->e_refcnt); return 0; + } __mb_cache_entry_free(entry); return 1; } +struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache, + u32 key, u64 value); void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value); struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key, u64 value); -- cgit From 75896339e43176af078509c1fce94ee6df9ca1a7 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 12 Jul 2022 12:54:28 +0200 Subject: mbcache: Remove mb_cache_entry_delete() Nobody uses mb_cache_entry_delete() anymore. Remove it. Signed-off-by: Jan Kara Link: https://lore.kernel.org/r/20220712105436.32204-9-jack@suse.cz Signed-off-by: Theodore Ts'o --- include/linux/mbcache.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h index 8eca7f25c432..452b579856d4 100644 --- a/include/linux/mbcache.h +++ b/include/linux/mbcache.h @@ -47,7 +47,6 @@ static inline int mb_cache_entry_put(struct mb_cache *cache, struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache, u32 key, u64 value); -void mb_cache_entry_delete(struct mb_cache *cache, u32 key, u64 value); struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key, u64 value); struct mb_cache_entry *mb_cache_entry_find_first(struct mb_cache *cache, -- cgit From 307af6c879377c1c63e71cbdd978201f9c7ee8df Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 12 Jul 2022 12:54:29 +0200 Subject: mbcache: automatically delete entries from cache on freeing Use the fact that entries with elevated refcount are not removed from the hash and just move removal of the entry from the hash to the entry freeing time. When doing this we also change the generic code to hold one reference to the cache entry, not two of them, which makes code somewhat more obvious. Signed-off-by: Jan Kara Link: https://lore.kernel.org/r/20220712105436.32204-10-jack@suse.cz Signed-off-by: Theodore Ts'o --- include/linux/mbcache.h | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h index 452b579856d4..2da63fd7b98f 100644 --- a/include/linux/mbcache.h +++ b/include/linux/mbcache.h @@ -13,8 +13,16 @@ struct mb_cache; struct mb_cache_entry { /* List of entries in cache - protected by cache->c_list_lock */ struct list_head e_list; - /* Hash table list - protected by hash chain bitlock */ + /* + * Hash table list - protected by hash chain bitlock. The entry is + * guaranteed to be hashed while e_refcnt > 0. + */ struct hlist_bl_node e_hash_list; + /* + * Entry refcount. Once it reaches zero, entry is unhashed and freed. + * While refcount > 0, the entry is guaranteed to stay in the hash and + * e.g. mb_cache_entry_try_delete() will fail. + */ atomic_t e_refcnt; /* Key in hash - stable during lifetime of the entry */ u32 e_key; @@ -29,20 +37,20 @@ void mb_cache_destroy(struct mb_cache *cache); int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, u64 value, bool reusable); -void __mb_cache_entry_free(struct mb_cache_entry *entry); +void __mb_cache_entry_free(struct mb_cache *cache, + struct mb_cache_entry *entry); void mb_cache_entry_wait_unused(struct mb_cache_entry *entry); -static inline int mb_cache_entry_put(struct mb_cache *cache, - struct mb_cache_entry *entry) +static inline void mb_cache_entry_put(struct mb_cache *cache, + struct mb_cache_entry *entry) { unsigned int cnt = atomic_dec_return(&entry->e_refcnt); if (cnt > 0) { - if (cnt <= 3) + if (cnt <= 2) wake_up_var(&entry->e_refcnt); - return 0; + return; } - __mb_cache_entry_free(entry); - return 1; + __mb_cache_entry_free(cache, entry); } struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache, -- cgit From b6e8d40d43ae4dec00c8fea2593eeea3114b8f44 Mon Sep 17 00:00:00 2001 From: Waiman Long Date: Tue, 2 Aug 2022 21:54:51 -0400 Subject: sched, cpuset: Fix dl_cpu_busy() panic due to empty cs->cpus_allowed With cgroup v2, the cpuset's cpus_allowed mask can be empty indicating that the cpuset will just use the effective CPUs of its parent. So cpuset_can_attach() can call task_can_attach() with an empty mask. This can lead to cpumask_any_and() returns nr_cpu_ids causing the call to dl_bw_of() to crash due to percpu value access of an out of bound CPU value. For example: [80468.182258] BUG: unable to handle page fault for address: ffffffff8b6648b0 : [80468.191019] RIP: 0010:dl_cpu_busy+0x30/0x2b0 : [80468.207946] Call Trace: [80468.208947] cpuset_can_attach+0xa0/0x140 [80468.209953] cgroup_migrate_execute+0x8c/0x490 [80468.210931] cgroup_update_dfl_csses+0x254/0x270 [80468.211898] cgroup_subtree_control_write+0x322/0x400 [80468.212854] kernfs_fop_write_iter+0x11c/0x1b0 [80468.213777] new_sync_write+0x11f/0x1b0 [80468.214689] vfs_write+0x1eb/0x280 [80468.215592] ksys_write+0x5f/0xe0 [80468.216463] do_syscall_64+0x5c/0x80 [80468.224287] entry_SYSCALL_64_after_hwframe+0x44/0xae Fix that by using effective_cpus instead. For cgroup v1, effective_cpus is the same as cpus_allowed. For v2, effective_cpus is the real cpumask to be used by tasks within the cpuset anyway. Also update task_can_attach()'s 2nd argument name to cs_effective_cpus to reflect the change. In addition, a check is added to task_can_attach() to guard against the possibility that cpumask_any_and() may return a value >= nr_cpu_ids. Fixes: 7f51412a415d ("sched/deadline: Fix bandwidth check/update when migrating tasks between exclusive cpusets") Signed-off-by: Waiman Long Signed-off-by: Ingo Molnar Acked-by: Juri Lelli Link: https://lore.kernel.org/r/20220803015451.2219567-1-longman@redhat.com --- include/linux/sched.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 88b8817b827d..6a060160f0db 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1813,7 +1813,7 @@ current_restore_flags(unsigned long orig_flags, unsigned long flags) } extern int cpuset_cpumask_can_shrink(const struct cpumask *cur, const struct cpumask *trial); -extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_cpus_allowed); +extern int task_can_attach(struct task_struct *p, const struct cpumask *cs_effective_cpus); #ifdef CONFIG_SMP extern void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask); extern int set_cpus_allowed_ptr(struct task_struct *p, const struct cpumask *new_mask); -- cgit From a8af0d682ae0c9cf62dd0ad6afdb1480951d6a10 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Thu, 30 Jun 2022 16:21:50 -0400 Subject: libceph: clean up ceph_osdc_start_request prototype This function always returns 0, and ignores the nofail boolean. Drop the nofail argument, make the function void return and fix up the callers. Signed-off-by: Jeff Layton Reviewed-by: Ilya Dryomov Signed-off-by: Ilya Dryomov --- include/linux/ceph/osd_client.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ceph/osd_client.h b/include/linux/ceph/osd_client.h index cba8a6ffc329..fb6be72104df 100644 --- a/include/linux/ceph/osd_client.h +++ b/include/linux/ceph/osd_client.h @@ -507,9 +507,8 @@ extern struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *, extern void ceph_osdc_get_request(struct ceph_osd_request *req); extern void ceph_osdc_put_request(struct ceph_osd_request *req); -extern int ceph_osdc_start_request(struct ceph_osd_client *osdc, - struct ceph_osd_request *req, - bool nofail); +void ceph_osdc_start_request(struct ceph_osd_client *osdc, + struct ceph_osd_request *req); extern void ceph_osdc_cancel_request(struct ceph_osd_request *req); extern int ceph_osdc_wait_request(struct ceph_osd_client *osdc, struct ceph_osd_request *req); -- cgit From bed4593645366ad7362a3aa7bc0d100d8d8236a8 Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Mon, 11 Jul 2022 09:17:38 +0800 Subject: tpm: eventlog: Fix section mismatch for DEBUG_SECTION_MISMATCH If DEBUG_SECTION_MISMATCH enabled, __calc_tpm2_event_size() will not be inlined, this cause section mismatch like this: WARNING: modpost: vmlinux.o(.text.unlikely+0xe30c): Section mismatch in reference from the variable L0 to the function .init.text:early_ioremap() The function L0() references the function __init early_memremap(). This is often because L0 lacks a __init annotation or the annotation of early_ioremap is wrong. Fix it by using __always_inline instead of inline for the called-once function __calc_tpm2_event_size(). Fixes: 44038bc514a2 ("tpm: Abstract crypto agile event size calculations") Cc: stable@vger.kernel.org # v5.3 Reported-by: WANG Xuerui Signed-off-by: Huacai Chen Signed-off-by: Jarkko Sakkinen --- include/linux/tpm_eventlog.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/tpm_eventlog.h b/include/linux/tpm_eventlog.h index 739ba9a03ec1..20c0ff54b7a0 100644 --- a/include/linux/tpm_eventlog.h +++ b/include/linux/tpm_eventlog.h @@ -157,7 +157,7 @@ struct tcg_algorithm_info { * Return: size of the event on success, 0 on failure */ -static inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *event, +static __always_inline int __calc_tpm2_event_size(struct tcg_pcr_event2_head *event, struct tcg_pcr_event *event_header, bool do_mapping) { -- cgit From 6930bcbfb6ceda63e298c6af6d733ecdf6bd4cde Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Mon, 1 Aug 2022 15:57:26 -0400 Subject: lockd: detect and reject lock arguments that overflow lockd doesn't currently vet the start and length in nlm4 requests like it should, and can end up generating lock requests with arguments that overflow when passed to the filesystem. The NLM4 protocol uses unsigned 64-bit arguments for both start and length, whereas struct file_lock tracks the start and end as loff_t values. By the time we get around to calling nlm4svc_retrieve_args, we've lost the information that would allow us to determine if there was an overflow. Start tracking the actual start and len for NLM4 requests in the nlm_lock. In nlm4svc_retrieve_args, vet these values to ensure they won't cause an overflow, and return NLM4_FBIG if they do. Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=392 Reported-by: Jan Kasiak Signed-off-by: Jeff Layton Signed-off-by: Chuck Lever Cc: # 5.14+ --- include/linux/lockd/xdr.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/lockd/xdr.h b/include/linux/lockd/xdr.h index 398f70093cd3..67e4a2c5500b 100644 --- a/include/linux/lockd/xdr.h +++ b/include/linux/lockd/xdr.h @@ -41,6 +41,8 @@ struct nlm_lock { struct nfs_fh fh; struct xdr_netobj oh; u32 svid; + u64 lock_start; + u64 lock_len; struct file_lock fl; }; -- cgit From f482aa98652795846cc55da98ebe331eb74f3d0b Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Wed, 3 Aug 2022 15:23:43 -0700 Subject: audit, io_uring, io-wq: Fix memory leak in io_sq_thread() and io_wqe_worker() Currently @audit_context is allocated twice for io_uring workers: 1. copy_process() calls audit_alloc(); 2. io_sq_thread() or io_wqe_worker() calls audit_alloc_kernel() (which is effectively audit_alloc()) and overwrites @audit_context, causing: BUG: memory leak unreferenced object 0xffff888144547400 (size 1024): <...> hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [] audit_alloc+0x133/0x210 [] copy_process+0xcd3/0x2340 [] create_io_thread+0x63/0x90 [] create_io_worker+0xb4/0x230 [] io_wqe_enqueue+0x248/0x3b0 [] io_queue_iowq+0xba/0x200 [] io_queue_async+0x113/0x180 [] io_req_task_submit+0x18f/0x1a0 [] io_apoll_task_func+0xdd/0x120 [] tctx_task_work+0x11f/0x570 [] task_work_run+0x7e/0xc0 [] get_signal+0xc18/0xf10 [] arch_do_signal_or_restart+0x2b/0x730 [] exit_to_user_mode_prepare+0x5e/0x180 [] syscall_exit_to_user_mode+0x12/0x20 [] do_syscall_64+0x40/0x80 Then, 3. io_sq_thread() or io_wqe_worker() frees @audit_context using audit_free(); 4. do_exit() eventually calls audit_free() again, which is okay because audit_free() does a NULL check. As suggested by Paul Moore, fix it by deleting audit_alloc_kernel() and redundant audit_free() calls. Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring") Suggested-by: Paul Moore Cc: stable@vger.kernel.org Signed-off-by: Peilin Ye Acked-by: Paul Moore Link: https://lore.kernel.org/r/20220803222343.31673-1-yepeilin.cs@gmail.com Signed-off-by: Jens Axboe --- include/linux/audit.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/audit.h b/include/linux/audit.h index 00f7a80f1a3e..3608992848d3 100644 --- a/include/linux/audit.h +++ b/include/linux/audit.h @@ -285,7 +285,6 @@ static inline int audit_signal_info(int sig, struct task_struct *t) /* These are defined in auditsc.c */ /* Public API */ extern int audit_alloc(struct task_struct *task); -extern int audit_alloc_kernel(struct task_struct *task); extern void __audit_free(struct task_struct *task); extern void __audit_uring_entry(u8 op); extern void __audit_uring_exit(int success, long code); @@ -578,10 +577,6 @@ static inline int audit_alloc(struct task_struct *task) { return 0; } -static inline int audit_alloc_kernel(struct task_struct *task) -{ - return 0; -} static inline void audit_free(struct task_struct *task) { } static inline void audit_uring_entry(u8 op) -- cgit From e2dcac2f58f5a95ab092d1da237ffdc0da1832cf Mon Sep 17 00:00:00 2001 From: Jinghao Jia Date: Fri, 29 Jul 2022 20:17:13 +0000 Subject: BPF: Fix potential bad pointer dereference in bpf_sys_bpf() The bpf_sys_bpf() helper function allows an eBPF program to load another eBPF program from within the kernel. In this case the argument union bpf_attr pointer (as well as the insns and license pointers inside) is a kernel address instead of a userspace address (which is the case of a usual bpf() syscall). To make the memory copying process in the syscall work in both cases, bpfptr_t was introduced to wrap around the pointer and distinguish its origin. Specifically, when copying memory contents from a bpfptr_t, a copy_from_user() is performed in case of a userspace address and a memcpy() is performed for a kernel address. This can lead to problems because the in-kernel pointer is never checked for validity. The problem happens when an eBPF syscall program tries to call bpf_sys_bpf() to load a program but provides a bad insns pointer -- say 0xdeadbeef -- in the bpf_attr union. The helper calls __sys_bpf() which would then call bpf_prog_load() to load the program. bpf_prog_load() is responsible for copying the eBPF instructions to the newly allocated memory for the program; it creates a kernel bpfptr_t for insns and invokes copy_from_bpfptr(). Internally, all bpfptr_t operations are backed by the corresponding sockptr_t operations, which performs direct memcpy() on kernel pointers for copy_from/strncpy_from operations. Therefore, the code is always happy to dereference the bad pointer to trigger a un-handle-able page fault and in turn an oops. However, this is not supposed to happen because at that point the eBPF program is already verified and should not cause a memory error. Sample KASAN trace: [ 25.685056][ T228] ================================================================== [ 25.685680][ T228] BUG: KASAN: user-memory-access in copy_from_bpfptr+0x21/0x30 [ 25.686210][ T228] Read of size 80 at addr 00000000deadbeef by task poc/228 [ 25.686732][ T228] [ 25.686893][ T228] CPU: 3 PID: 228 Comm: poc Not tainted 5.19.0-rc7 #7 [ 25.687375][ T228] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS d55cb5a 04/01/2014 [ 25.687991][ T228] Call Trace: [ 25.688223][ T228] [ 25.688429][ T228] dump_stack_lvl+0x73/0x9e [ 25.688747][ T228] print_report+0xea/0x200 [ 25.689061][ T228] ? copy_from_bpfptr+0x21/0x30 [ 25.689401][ T228] ? _printk+0x54/0x6e [ 25.689693][ T228] ? _raw_spin_lock_irqsave+0x70/0xd0 [ 25.690071][ T228] ? copy_from_bpfptr+0x21/0x30 [ 25.690412][ T228] kasan_report+0xb5/0xe0 [ 25.690716][ T228] ? copy_from_bpfptr+0x21/0x30 [ 25.691059][ T228] kasan_check_range+0x2bd/0x2e0 [ 25.691405][ T228] ? copy_from_bpfptr+0x21/0x30 [ 25.691734][ T228] memcpy+0x25/0x60 [ 25.692000][ T228] copy_from_bpfptr+0x21/0x30 [ 25.692328][ T228] bpf_prog_load+0x604/0x9e0 [ 25.692653][ T228] ? cap_capable+0xb4/0xe0 [ 25.692956][ T228] ? security_capable+0x4f/0x70 [ 25.693324][ T228] __sys_bpf+0x3af/0x580 [ 25.693635][ T228] bpf_sys_bpf+0x45/0x240 [ 25.693937][ T228] bpf_prog_f0ec79a5a3caca46_bpf_func1+0xa2/0xbd [ 25.694394][ T228] bpf_prog_run_pin_on_cpu+0x2f/0xb0 [ 25.694756][ T228] bpf_prog_test_run_syscall+0x146/0x1c0 [ 25.695144][ T228] bpf_prog_test_run+0x172/0x190 [ 25.695487][ T228] __sys_bpf+0x2c5/0x580 [ 25.695776][ T228] __x64_sys_bpf+0x3a/0x50 [ 25.696084][ T228] do_syscall_64+0x60/0x90 [ 25.696393][ T228] ? fpregs_assert_state_consistent+0x50/0x60 [ 25.696815][ T228] ? exit_to_user_mode_prepare+0x36/0xa0 [ 25.697202][ T228] ? syscall_exit_to_user_mode+0x20/0x40 [ 25.697586][ T228] ? do_syscall_64+0x6e/0x90 [ 25.697899][ T228] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 25.698312][ T228] RIP: 0033:0x7f6d543fb759 [ 25.698624][ T228] Code: 08 5b 89 e8 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 a6 0e 00 f7 d8 64 89 01 48 [ 25.699946][ T228] RSP: 002b:00007ffc3df78468 EFLAGS: 00000287 ORIG_RAX: 0000000000000141 [ 25.700526][ T228] RAX: ffffffffffffffda RBX: 00007ffc3df78628 RCX: 00007f6d543fb759 [ 25.701071][ T228] RDX: 0000000000000090 RSI: 00007ffc3df78478 RDI: 000000000000000a [ 25.701636][ T228] RBP: 00007ffc3df78510 R08: 0000000000000000 R09: 0000000000300000 [ 25.702191][ T228] R10: 0000000000000005 R11: 0000000000000287 R12: 0000000000000000 [ 25.702736][ T228] R13: 00007ffc3df78638 R14: 000055a1584aca68 R15: 00007f6d5456a000 [ 25.703282][ T228] [ 25.703490][ T228] ================================================================== [ 25.704050][ T228] Disabling lock debugging due to kernel taint Update copy_from_bpfptr() and strncpy_from_bpfptr() so that: - for a kernel pointer, it uses the safe copy_from_kernel_nofault() and strncpy_from_kernel_nofault() functions. - for a userspace pointer, it performs copy_from_user() and strncpy_from_user(). Fixes: af2ac3e13e45 ("bpf: Prepare bpf syscall to be used from kernel and user space.") Link: https://lore.kernel.org/bpf/20220727132905.45166-1-jinghao@linux.ibm.com/ Signed-off-by: Jinghao Jia Acked-by: Yonghong Song Link: https://lore.kernel.org/r/20220729201713.88688-1-jinghao@linux.ibm.com Signed-off-by: Alexei Starovoitov --- include/linux/bpfptr.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpfptr.h b/include/linux/bpfptr.h index 46e1757d06a3..79b2f78eec1a 100644 --- a/include/linux/bpfptr.h +++ b/include/linux/bpfptr.h @@ -49,7 +49,9 @@ static inline void bpfptr_add(bpfptr_t *bpfptr, size_t val) static inline int copy_from_bpfptr_offset(void *dst, bpfptr_t src, size_t offset, size_t size) { - return copy_from_sockptr_offset(dst, (sockptr_t) src, offset, size); + if (!bpfptr_is_kernel(src)) + return copy_from_user(dst, src.user + offset, size); + return copy_from_kernel_nofault(dst, src.kernel + offset, size); } static inline int copy_from_bpfptr(void *dst, bpfptr_t src, size_t size) @@ -78,7 +80,9 @@ static inline void *kvmemdup_bpfptr(bpfptr_t src, size_t len) static inline long strncpy_from_bpfptr(char *dst, bpfptr_t src, size_t count) { - return strncpy_from_sockptr(dst, (sockptr_t) src, count); + if (bpfptr_is_kernel(src)) + return strncpy_from_kernel_nofault(dst, src.kernel, count); + return strncpy_from_user(dst, src.user, count); } #endif /* _LINUX_BPFPTR_H */ -- cgit From 3c59366c207e4c6c6569524af606baf017a55c61 Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 1 Aug 2022 10:33:34 +1000 Subject: NFS: don't unhash dentry during unlink/rename NFS unlink() (and rename over existing target) must determine if the file is open, and must perform a "silly rename" instead of an unlink (or before rename) if it is. Otherwise the client might hold a file open which has been removed on the server. Consequently if it determines that the file isn't open, it must block any subsequent opens until the unlink/rename has been completed on the server. This is currently achieved by unhashing the dentry. This forces any open attempt to the slow-path for lookup which will block on i_rwsem on the directory until the unlink/rename completes. A future patch will change the VFS to only get a shared lock on i_rwsem for unlink, so this will no longer work. Instead we introduce an explicit interlock. A special value is stored in dentry->d_fsdata while the unlink/rename is running and ->d_revalidate blocks while that value is present. When ->d_revalidate unblocks, the dentry will be invalid. This closes the race without requiring exclusion on i_rwsem. d_fsdata is already used in two different ways. 1/ an IS_ROOT directory dentry might have a "devname" stored in d_fsdata. Such a dentry doesn't have a name and so cannot be the target of unlink or rename. For safety we check if an old devname is still stored, and remove it if it is. 2/ a dentry with DCACHE_NFSFS_RENAMED set will have a 'struct nfs_unlinkdata' stored in d_fsdata. While this is set maydelete() will fail, so an unlink or rename will never proceed on such a dentry. Neither of these can be in effect when a dentry is the target of unlink or rename. So we can expect d_fsdata to be NULL, and store a special value ((void*)1) which is given the name NFS_FSDATA_BLOCKED to indicate that any lookup will be blocked. The d_count() is incremented under d_lock() when a lookup finds the dentry, so we check d_count() is low, and set NFS_FSDATA_BLOCKED under the same lock to avoid any races. Signed-off-by: NeilBrown Signed-off-by: Trond Myklebust --- include/linux/nfs_fs.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index a17c337dbdf1..b32ed68e7dc4 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -617,6 +617,15 @@ nfs_fileid_to_ino_t(u64 fileid) #define NFS_JUKEBOX_RETRY_TIME (5 * HZ) +/* We need to block new opens while a file is being unlinked. + * If it is opened *before* we decide to unlink, we will silly-rename + * instead. If it is opened *after*, then we need to create or will fail. + * If we allow the two to race, we could end up with a file that is open + * but deleted on the server resulting in ESTALE. + * So use ->d_fsdata to record when the unlink is happening + * and block dentry revalidation while it is set. + */ +#define NFS_FSDATA_BLOCKED ((void*)1) # undef ifdebug # ifdef NFS_DEBUG -- cgit From 2da1c30929a28c3c6b01d9c16c4216037be95597 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 28 Jun 2022 17:22:28 +0800 Subject: mm: hugetlb_vmemmap: delete hugetlb_optimize_vmemmap_enabled() Patch series "Simplify hugetlb vmemmap and improve its readability", v2. This series aims to simplify hugetlb vmemmap and improve its readability. This patch (of 8): The name hugetlb_optimize_vmemmap_enabled() a bit confusing as it tests two conditions (enabled and pages in use). Instead of coming up to an appropriate name, we could just delete it. There is already a discussion about deleting it in thread [1]. There is only one user of hugetlb_optimize_vmemmap_enabled() outside of hugetlb_vmemmap, that is flush_dcache_page() in arch/arm64/mm/flush.c. However, it does not need to call hugetlb_optimize_vmemmap_enabled() in flush_dcache_page() since HugeTLB pages are always fully mapped and only head page will be set PG_dcache_clean meaning only head page's flag may need to be cleared (see commit cf5a501d985b). So it is easy to remove hugetlb_optimize_vmemmap_enabled(). Link: https://lore.kernel.org/all/c77c61c8-8a5a-87e8-db89-d04d8aaab4cc@oracle.com/ [1] Link: https://lkml.kernel.org/r/20220628092235.91270-2-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: Oscar Salvador Reviewed-by: Mike Kravetz Reviewed-by: Catalin Marinas Cc: Will Deacon Cc: Anshuman Khandual Cc: David Hildenbrand Cc: Jonathan Corbet Cc: Xiongchun Duan Signed-off-by: Andrew Morton --- include/linux/page-flags.h | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index ea19528564d1..2455405ab82b 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -208,12 +208,6 @@ enum pageflags { DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON, hugetlb_optimize_vmemmap_key); -static __always_inline bool hugetlb_optimize_vmemmap_enabled(void) -{ - return static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON, - &hugetlb_optimize_vmemmap_key); -} - /* * If the feature of optimizing vmemmap pages associated with each HugeTLB * page is enabled, the head vmemmap page frame is reused and all of the tail @@ -232,7 +226,8 @@ static __always_inline bool hugetlb_optimize_vmemmap_enabled(void) */ static __always_inline const struct page *page_fixed_fake_head(const struct page *page) { - if (!hugetlb_optimize_vmemmap_enabled()) + if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON, + &hugetlb_optimize_vmemmap_key)) return page; /* @@ -260,11 +255,6 @@ static inline const struct page *page_fixed_fake_head(const struct page *page) { return page; } - -static inline bool hugetlb_optimize_vmemmap_enabled(void) -{ - return false; -} #endif static __always_inline int page_is_fake_head(struct page *page) -- cgit From cf5472e561133888df81d2e48f7da9ebd3299459 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 28 Jun 2022 17:22:29 +0800 Subject: mm: hugetlb_vmemmap: optimize vmemmap_optimize_mode handling We hold an another reference to hugetlb_optimize_vmemmap_key when making vmemmap_optimize_mode on, because we use static_key to tell memory_hotplug that memory_hotplug.memmap_on_memory should be overridden. However, this rule has gone when we have introduced PageVmemmapSelfHosted. Therefore, we could simplify vmemmap_optimize_mode handling by not holding an another reference to hugetlb_optimize_vmemmap_key. This also means that we not incur the extra page_fixed_fake_head checks if there are no vmemmap optinmized hugetlb pages after this change. Link: https://lkml.kernel.org/r/20220628092235.91270-3-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: Oscar Salvador Reviewed-by: Mike Kravetz Cc: Anshuman Khandual Cc: Catalin Marinas Cc: David Hildenbrand Cc: Jonathan Corbet Cc: Will Deacon Cc: Xiongchun Duan Signed-off-by: Andrew Morton --- include/linux/page-flags.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 2455405ab82b..b44cc24d7496 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -205,8 +205,7 @@ enum pageflags { #ifndef __GENERATING_BOUNDS_H #ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP -DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON, - hugetlb_optimize_vmemmap_key); +DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key); /* * If the feature of optimizing vmemmap pages associated with each HugeTLB @@ -226,8 +225,7 @@ DECLARE_STATIC_KEY_MAYBE(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON, */ static __always_inline const struct page *page_fixed_fake_head(const struct page *page) { - if (!static_branch_maybe(CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON, - &hugetlb_optimize_vmemmap_key)) + if (!static_branch_unlikely(&hugetlb_optimize_vmemmap_key)) return page; /* -- cgit From dff033818a06e7d0bf79271e34bda11c2d9d98d0 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 28 Jun 2022 17:22:30 +0800 Subject: mm: hugetlb_vmemmap: introduce the name HVO It it inconvenient to mention the feature of optimizing vmemmap pages associated with HugeTLB pages when communicating with others since there is no specific or abbreviated name for it when it is first introduced. Let us give it a name HVO (HugeTLB Vmemmap Optimization) from now. This commit also updates the document about "hugetlb_free_vmemmap" by the way discussed in thread [1]. Link: https://lore.kernel.org/all/21aae898-d54d-cc4b-a11f-1bb7fddcfffa@redhat.com/ [1] Link: https://lkml.kernel.org/r/20220628092235.91270-4-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: Oscar Salvador Reviewed-by: Mike Kravetz Cc: Anshuman Khandual Cc: Catalin Marinas Cc: David Hildenbrand Cc: Jonathan Corbet Cc: Will Deacon Cc: Xiongchun Duan Signed-off-by: Andrew Morton --- include/linux/page-flags.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index b44cc24d7496..78ed46ae6ee5 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -208,8 +208,7 @@ enum pageflags { DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key); /* - * If the feature of optimizing vmemmap pages associated with each HugeTLB - * page is enabled, the head vmemmap page frame is reused and all of the tail + * If HVO is enabled, the head vmemmap page frame is reused and all of the tail * vmemmap addresses map to the head vmemmap page frame (furture details can * refer to the figure at the head of the mm/hugetlb_vmemmap.c). In other * words, there are more than one page struct with PG_head associated with each -- cgit From 998a2997885f73e5cc732ac6d661dfa6e0f50654 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 28 Jun 2022 17:22:31 +0800 Subject: mm: hugetlb_vmemmap: move vmemmap code related to HugeTLB to hugetlb_vmemmap.c When I first introduced vmemmap manipulation functions related to HugeTLB, I thought those functions may be reused by other modules (e.g. using similar approach to optimize vmemmap pages, unfortunately, the DAX used the same approach but does not use those functions). After two years, we didn't see any other users. So move those functions to hugetlb_vmemmap.c. Code movement without any functional change. Link: https://lkml.kernel.org/r/20220628092235.91270-5-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: Oscar Salvador Reviewed-by: Mike Kravetz Cc: Anshuman Khandual Cc: Catalin Marinas Cc: David Hildenbrand Cc: Jonathan Corbet Cc: Will Deacon Cc: Xiongchun Duan Signed-off-by: Andrew Morton --- include/linux/mm.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 18e01474cf6b..e6e201a4ce05 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3142,13 +3142,6 @@ static inline void print_vma_addr(char *prefix, unsigned long rip) } #endif -#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP -int vmemmap_remap_free(unsigned long start, unsigned long end, - unsigned long reuse); -int vmemmap_remap_alloc(unsigned long start, unsigned long end, - unsigned long reuse, gfp_t gfp_mask); -#endif - void *sparse_buffer_alloc(unsigned long size); struct page * __populate_section_memmap(unsigned long pfn, unsigned long nr_pages, int nid, struct vmem_altmap *altmap, -- cgit From 6213834c10de954470b7195cf0cdbda858edf0ee Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 28 Jun 2022 17:22:33 +0800 Subject: mm: hugetlb_vmemmap: improve hugetlb_vmemmap code readability There is a discussion about the name of hugetlb_vmemmap_alloc/free in thread [1]. The suggestion suggested by David is rename "alloc/free" to "optimize/restore" to make functionalities clearer to users, "optimize" means the function will optimize vmemmap pages, while "restore" means restoring its vmemmap pages discared before. This commit does this. Another discussion is the confusion RESERVE_VMEMMAP_NR isn't used explicitly for vmemmap_addr but implicitly for vmemmap_end in hugetlb_vmemmap_alloc/free. David suggested we can compute what hugetlb_vmemmap_init() does now at runtime. We do not need to worry for the overhead of computing at runtime since the calculation is simple enough and those functions are not in a hot path. This commit has the following improvements: 1) The function suffixed name ("optimize/restore") is more expressive. 2) The logic becomes less weird in hugetlb_vmemmap_optimize/restore(). 3) The hugetlb_vmemmap_init() does not need to be exported anymore. 4) A ->optimize_vmemmap_pages field in struct hstate is killed. 5) There is only one place where checks is_power_of_2(sizeof(struct page)) instead of two places. 6) Add more comments for hugetlb_vmemmap_optimize/restore(). 7) For external users, hugetlb_optimize_vmemmap_pages() is used for detecting if the HugeTLB's vmemmap pages is optimizable originally. In this commit, it is killed and we introduce a new helper hugetlb_vmemmap_optimizable() to replace it. The name is more expressive. Link: https://lore.kernel.org/all/20220404074652.68024-2-songmuchun@bytedance.com/ [1] Link: https://lkml.kernel.org/r/20220628092235.91270-7-songmuchun@bytedance.com Signed-off-by: Muchun Song Reviewed-by: Mike Kravetz Cc: Anshuman Khandual Cc: Catalin Marinas Cc: David Hildenbrand Cc: Jonathan Corbet Cc: Oscar Salvador Cc: Will Deacon Cc: Xiongchun Duan Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 7 ++----- include/linux/sysctl.h | 4 ++++ 2 files changed, 6 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 4cdfce976644..6d0620edf0a6 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -638,9 +638,6 @@ struct hstate { unsigned int nr_huge_pages_node[MAX_NUMNODES]; unsigned int free_huge_pages_node[MAX_NUMNODES]; unsigned int surplus_huge_pages_node[MAX_NUMNODES]; -#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP - unsigned int optimize_vmemmap_pages; -#endif #ifdef CONFIG_CGROUP_HUGETLB /* cgroup control files */ struct cftype cgroup_files_dfl[8]; @@ -716,7 +713,7 @@ static inline struct hstate *hstate_vma(struct vm_area_struct *vma) return hstate_file(vma->vm_file); } -static inline unsigned long huge_page_size(struct hstate *h) +static inline unsigned long huge_page_size(const struct hstate *h) { return (unsigned long)PAGE_SIZE << h->order; } @@ -745,7 +742,7 @@ static inline bool hstate_is_gigantic(struct hstate *h) return huge_page_order(h) >= MAX_ORDER; } -static inline unsigned int pages_per_huge_page(struct hstate *h) +static inline unsigned int pages_per_huge_page(const struct hstate *h) { return 1 << h->order; } diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 17b42ce89d3e..780690dc08cd 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -268,6 +268,10 @@ static inline struct ctl_table_header *register_sysctl_table(struct ctl_table * return NULL; } +static inline void register_sysctl_init(const char *path, struct ctl_table *table) +{ +} + static inline struct ctl_table_header *register_sysctl_mount_point(const char *path) { return NULL; -- cgit From 838691a1c0ec44739db558834e6954d62577d6b8 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Tue, 28 Jun 2022 17:22:34 +0800 Subject: mm: hugetlb_vmemmap: move code comments to vmemmap_dedup.rst All the comments which explains how HVO works are moved to vmemmap_dedup.rst since commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory savings for compound devmaps") except some comments above page_fixed_fake_head(). This commit moves those comments to vmemmap_dedup.rst and improve vmemmap_dedup.rst as well. Link: https://lkml.kernel.org/r/20220628092235.91270-8-songmuchun@bytedance.com Signed-off-by: Muchun Song Cc: Anshuman Khandual Cc: Catalin Marinas Cc: David Hildenbrand Cc: Jonathan Corbet Cc: Mike Kravetz Cc: Oscar Salvador Cc: Will Deacon Cc: Xiongchun Duan Signed-off-by: Andrew Morton --- include/linux/page-flags.h | 15 ++------------- 1 file changed, 2 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 78ed46ae6ee5..465ff35a8c00 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -208,19 +208,8 @@ enum pageflags { DECLARE_STATIC_KEY_FALSE(hugetlb_optimize_vmemmap_key); /* - * If HVO is enabled, the head vmemmap page frame is reused and all of the tail - * vmemmap addresses map to the head vmemmap page frame (furture details can - * refer to the figure at the head of the mm/hugetlb_vmemmap.c). In other - * words, there are more than one page struct with PG_head associated with each - * HugeTLB page. We __know__ that there is only one head page struct, the tail - * page structs with PG_head are fake head page structs. We need an approach - * to distinguish between those two different types of page structs so that - * compound_head() can return the real head page struct when the parameter is - * the tail page struct but with PG_head. - * - * The page_fixed_fake_head() returns the real head page struct if the @page is - * fake page head, otherwise, returns @page which can either be a true page - * head or tail. + * Return the real head page struct iff the @page is a fake head page, otherwise + * return the @page itself. See Documentation/mm/vmemmap_dedup.rst. */ static __always_inline const struct page *page_fixed_fake_head(const struct page *page) { -- cgit From 161df60e9e89651c9aa3ae0edc9aae3a8a2d21e7 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Thu, 14 Jul 2022 13:24:15 +0900 Subject: mm, hwpoison, hugetlb: support saving mechanism of raw error pages When handling memory error on a hugetlb page, the error handler tries to dissolve and turn it into 4kB pages. If it's successfully dissolved, PageHWPoison flag is moved to the raw error page, so that's all right. However, dissolve sometimes fails, then the error page is left as hwpoisoned hugepage. It's useful if we can retry to dissolve it to save healthy pages, but that's not possible now because the information about where the raw error pages is lost. Use the private field of a few tail pages to keep that information. The code path of shrinking hugepage pool uses this info to try delayed dissolve. In order to remember multiple errors in a hugepage, a singly-linked list originated from SUBPAGE_INDEX_HWPOISON-th tail page is constructed. Only simple operations (adding an entry or clearing all) are required and the list is assumed not to be very long, so this simple data structure should be enough. If we failed to save raw error info, the hwpoison hugepage has errors on unknown subpage, then this new saving mechanism does not work any more, so disable saving new raw error info and freeing hwpoison hugepages. Link: https://lkml.kernel.org/r/20220714042420.1847125-4-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi Reported-by: kernel test robot Reviewed-by: Miaohe Lin Cc: David Hildenbrand Cc: Liu Shixin Cc: Mike Kravetz Cc: Muchun Song Cc: Oscar Salvador Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 6d0620edf0a6..3ec981a0d8b3 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -42,6 +42,9 @@ enum { SUBPAGE_INDEX_CGROUP, /* reuse page->private */ SUBPAGE_INDEX_CGROUP_RSVD, /* reuse page->private */ __MAX_CGROUP_SUBPAGE_INDEX = SUBPAGE_INDEX_CGROUP_RSVD, +#endif +#ifdef CONFIG_MEMORY_FAILURE + SUBPAGE_INDEX_HWPOISON, #endif __NR_USED_SUBPAGE, }; @@ -551,7 +554,7 @@ generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr, * Synchronization: Initially set after new page allocation with no * locking. When examined and modified during migration processing * (isolate, migrate, putback) the hugetlb_lock is held. - * HPG_temporary - - Set on a page that is temporarily allocated from the buddy + * HPG_temporary - Set on a page that is temporarily allocated from the buddy * allocator. Typically used for migration target pages when no pages * are available in the pool. The hugetlb free page path will * immediately free pages with this flag set to the buddy allocator. @@ -561,6 +564,8 @@ generic_hugetlb_get_unmapped_area(struct file *file, unsigned long addr, * HPG_freed - Set when page is on the free lists. * Synchronization: hugetlb_lock held for examination and modification. * HPG_vmemmap_optimized - Set when the vmemmap pages of the page are freed. + * HPG_raw_hwp_unreliable - Set when the hugetlb page has a hwpoison sub-page + * that is not tracked by raw_hwp_page list. */ enum hugetlb_page_flags { HPG_restore_reserve = 0, @@ -568,6 +573,7 @@ enum hugetlb_page_flags { HPG_temporary, HPG_freed, HPG_vmemmap_optimized, + HPG_raw_hwp_unreliable, __NR_HPAGEFLAGS, }; @@ -614,6 +620,7 @@ HPAGEFLAG(Migratable, migratable) HPAGEFLAG(Temporary, temporary) HPAGEFLAG(Freed, freed) HPAGEFLAG(VmemmapOptimized, vmemmap_optimized) +HPAGEFLAG(RawHwpUnreliable, raw_hwp_unreliable) #ifdef CONFIG_HUGETLB_PAGE @@ -796,6 +803,14 @@ extern int dissolve_free_huge_page(struct page *page); extern int dissolve_free_huge_pages(unsigned long start_pfn, unsigned long end_pfn); +#ifdef CONFIG_MEMORY_FAILURE +extern void hugetlb_clear_page_hwpoison(struct page *hpage); +#else +static inline void hugetlb_clear_page_hwpoison(struct page *hpage) +{ +} +#endif + #ifdef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION #ifndef arch_hugetlb_migration_supported static inline bool arch_hugetlb_migration_supported(struct hstate *h) -- cgit From ac5fcde0a96a18773f06b7c00c5ea081bbdc64b3 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Thu, 14 Jul 2022 13:24:16 +0900 Subject: mm, hwpoison: make unpoison aware of raw error info in hwpoisoned hugepage Raw error info list needs to be removed when hwpoisoned hugetlb is unpoisoned. And unpoison handler needs to know how many errors there are in the target hugepage. So add them. HPageVmemmapOptimized(hpage) and HPageRawHwpUnreliable(hpage)) sometimes can't be unpoisoned, so skip them. Link: https://lkml.kernel.org/r/20220714042420.1847125-5-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi Reported-by: kernel test robot Reviewed-by: Miaohe Lin Cc: David Hildenbrand Cc: Liu Shixin Cc: Mike Kravetz Cc: Muchun Song Cc: Oscar Salvador Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/swapops.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/swapops.h b/include/linux/swapops.h index bb7afd03a324..a3d435bf9f97 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -490,6 +490,11 @@ static inline void num_poisoned_pages_dec(void) atomic_long_dec(&num_poisoned_pages); } +static inline void num_poisoned_pages_sub(long i) +{ + atomic_long_sub(i, &num_poisoned_pages); +} + #else static inline swp_entry_t make_hwpoison_entry(struct page *page) @@ -505,6 +510,10 @@ static inline int is_hwpoison_entry(swp_entry_t swp) static inline void num_poisoned_pages_inc(void) { } + +static inline void num_poisoned_pages_sub(long i) +{ +} #endif static inline int non_swap_entry(swp_entry_t entry) -- cgit From 38f6d29397ccb9c191c4c91103e8123f518fdc10 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Thu, 14 Jul 2022 13:24:17 +0900 Subject: mm, hwpoison: set PG_hwpoison for busy hugetlb pages If memory_failure() fails to grab page refcount on a hugetlb page because it's busy, it returns without setting PG_hwpoison on it. This not only loses a chance of error containment, but breaks the rule that action_result() should be called only when memory_failure() do any of handling work (even if that's just setting PG_hwpoison). This inconsistency could harm code maintainability. So set PG_hwpoison and call hugetlb_set_page_hwpoison() for such a case. Link: https://lkml.kernel.org/r/20220714042420.1847125-6-naoya.horiguchi@linux.dev Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()") Signed-off-by: Naoya Horiguchi Reviewed-by: Miaohe Lin Cc: David Hildenbrand Cc: kernel test robot Cc: Liu Shixin Cc: Mike Kravetz Cc: Muchun Song Cc: Oscar Salvador Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/mm.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index e6e201a4ce05..0345b8c30394 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3176,6 +3176,7 @@ enum mf_flags { MF_SOFT_OFFLINE = 1 << 3, MF_UNPOISON = 1 << 4, MF_SW_SIMULATED = 1 << 5, + MF_NO_RETRY = 1 << 6, }; int mf_dax_kill_procs(struct address_space *mapping, pgoff_t index, unsigned long count, int mf_flags); -- cgit From 6f4614886baa59b6ae014093300482c1da4d3c93 Mon Sep 17 00:00:00 2001 From: Naoya Horiguchi Date: Thu, 14 Jul 2022 13:24:20 +0900 Subject: mm, hwpoison: enable memory error handling on 1GB hugepage Now error handling code is prepared, so remove the blocking code and enable memory error handling on 1GB hugepage. Link: https://lkml.kernel.org/r/20220714042420.1847125-9-naoya.horiguchi@linux.dev Signed-off-by: Naoya Horiguchi Reviewed-by: Miaohe Lin Cc: David Hildenbrand Cc: kernel test robot Cc: Liu Shixin Cc: Mike Kravetz Cc: Muchun Song Cc: Oscar Salvador Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/mm.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 0345b8c30394..3bedc449c14d 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3229,7 +3229,6 @@ enum mf_action_page_type { MF_MSG_DIFFERENT_COMPOUND, MF_MSG_HUGE, MF_MSG_FREE_HUGE, - MF_MSG_NON_PMD_HUGE, MF_MSG_UNMAP_FAILED, MF_MSG_DIRTY_SWAPCACHE, MF_MSG_CLEAN_SWAPCACHE, -- cgit From 729337bc20876af348b363b3e35fb19be71ba793 Mon Sep 17 00:00:00 2001 From: "Fabio M. De Francesco" Date: Thu, 28 Jul 2022 17:48:38 +0200 Subject: highmem: remove unneeded spaces in kmap_local_page() kdocs Patch series "highmem: Extend kmap_local_page() documentation", v2. The Highmem interface is evolving and the current documentation does not reflect the intended uses of each of the calls. Furthermore, after a recent series of reworks, the differences of the calls can still be confusing and may lead to the expanded use of calls which are deprecated. This series is the second round of changes towards an enhanced documentation of the Highmem's interface; at this stage the patches are only focused to kmap_local_page(). In addition it also contains some minor clean ups. This patch (of 7): In the kdocs of kmap_local_page(), the description of @page starts after several unnecessary spaces. Therefore, remove those spaces. Link: https://lkml.kernel.org/r/20220728154844.10874-1-fmdefrancesco@gmail.com Link: https://lkml.kernel.org/r/20220728154844.10874-2-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco Suggested-by: Ira Weiny Reviewed-by: Ira Weiny Cc: Matthew Wilcox (Oracle) Cc: Mike Rapoport Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Cc: Catalin Marinas Cc: Will Deacon Cc: Peter Collingbourne Cc: Vlastimil Babka Cc: Jonathan Corbet Signed-off-by: Andrew Morton --- include/linux/highmem.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/highmem.h b/include/linux/highmem.h index 177b07944640..0bd5ed4ac391 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -60,7 +60,7 @@ static inline void kmap_flush_unused(void); /** * kmap_local_page - Map a page for temporary usage - * @page: Pointer to the page to be mapped + * @page: Pointer to the page to be mapped * * Returns: The virtual address of the mapping * -- cgit From 383bbef283920411379c5c93829102ff7859fea5 Mon Sep 17 00:00:00 2001 From: "Fabio M. De Francesco" Date: Thu, 28 Jul 2022 17:48:39 +0200 Subject: highmem: specify that kmap_local_page() is callable from interrupts In a recent thread about converting kmap() to kmap_local_page(), the safety of calling kmap_local_page() was questioned.[1] "any context" should probably be enough detail for users who want to know whether or not kmap_local_page() can be called from interrupts. However, Linux still has kmap_atomic() which might make users think they must use the latter in interrupts. Add "including interrupts" for better clarity. [1] https://lore.kernel.org/lkml/3187836.aeNJFYEL58@opensuse/ Link: https://lkml.kernel.org/r/20220728154844.10874-3-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco Suggested-by: Ira Weiny Cc: Matthew Wilcox (Oracle) Cc: Mike Rapoport Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Cc: Catalin Marinas Cc: Jonathan Corbet Cc: Peter Collingbourne Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/highmem.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/highmem.h b/include/linux/highmem.h index 0bd5ed4ac391..4a46d95ff6c8 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -64,7 +64,7 @@ static inline void kmap_flush_unused(void); * * Returns: The virtual address of the mapping * - * Can be invoked from any context. + * Can be invoked from any context, including interrupts. * * Requires careful handling when nesting multiple mappings because the map * management is stack based. The unmap has to be in the reverse order of -- cgit From 72f1c55adf70fd08ceac6b67455238db2014894a Mon Sep 17 00:00:00 2001 From: "Fabio M. De Francesco" Date: Thu, 28 Jul 2022 17:48:43 +0200 Subject: highmem: delete a sentence from kmap_local_page() kdocs kmap_local_page() should always be preferred in place of kmap() and kmap_atomic(). "Only use when really necessary." is not consistent with the Documentation/mm/highmem.rst and these kdocs it embeds. Therefore, delete the above-mentioned sentence from kdocs. Link: https://lkml.kernel.org/r/20220728154844.10874-7-fmdefrancesco@gmail.com Signed-off-by: Fabio M. De Francesco Suggested-by: Ira Weiny Reviewed-by: Ira Weiny Cc: Matthew Wilcox (Oracle) Cc: Mike Rapoport Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Cc: Catalin Marinas Cc: Jonathan Corbet Cc: Peter Collingbourne Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/highmem.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/highmem.h b/include/linux/highmem.h index 4a46d95ff6c8..25679035ca28 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -86,8 +86,7 @@ static inline void kmap_flush_unused(void); * temporarily mapped. * * While it is significantly faster than kmap() for the higmem case it - * comes with restrictions about the pointer validity. Only use when really - * necessary. + * comes with restrictions about the pointer validity. * * On HIGHMEM enabled systems mapping a highmem page has the side effect of * disabling migration in order to keep the virtual address stable across -- cgit From fcb14cb1bdacec5b4374fe161e83fb8208164a85 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 22 May 2022 14:59:25 -0400 Subject: new iov_iter flavour - ITER_UBUF Equivalent of single-segment iovec. Initialized by iov_iter_ubuf(), checked for by iter_is_ubuf(), otherwise behaves like ITER_IOVEC ones. We are going to expose the things like ->write_iter() et.al. to those in subsequent commits. New predicate (user_backed_iter()) that is true for ITER_IOVEC and ITER_UBUF; places like direct-IO handling should use that for checking that pages we modify after getting them from iov_iter_get_pages() would need to be dirtied. DO NOT assume that replacing iter_is_iovec() with user_backed_iter() will solve all problems - there's code that uses iter_is_iovec() to decide how to poke around in iov_iter guts and for that the predicate replacement obviously won't suffice. Signed-off-by: Al Viro --- include/linux/uio.h | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'include/linux') diff --git a/include/linux/uio.h b/include/linux/uio.h index 9a2dc496d535..85bef84fd294 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -26,6 +26,7 @@ enum iter_type { ITER_PIPE, ITER_XARRAY, ITER_DISCARD, + ITER_UBUF, }; struct iov_iter_state { @@ -38,6 +39,7 @@ struct iov_iter { u8 iter_type; bool nofault; bool data_source; + bool user_backed; size_t iov_offset; size_t count; union { @@ -46,6 +48,7 @@ struct iov_iter { const struct bio_vec *bvec; struct xarray *xarray; struct pipe_inode_info *pipe; + void __user *ubuf; }; union { unsigned long nr_segs; @@ -70,6 +73,11 @@ static inline void iov_iter_save_state(struct iov_iter *iter, state->nr_segs = iter->nr_segs; } +static inline bool iter_is_ubuf(const struct iov_iter *i) +{ + return iov_iter_type(i) == ITER_UBUF; +} + static inline bool iter_is_iovec(const struct iov_iter *i) { return iov_iter_type(i) == ITER_IOVEC; @@ -105,6 +113,11 @@ static inline unsigned char iov_iter_rw(const struct iov_iter *i) return i->data_source ? WRITE : READ; } +static inline bool user_backed_iter(const struct iov_iter *i) +{ + return i->user_backed; +} + /* * Total number of bytes covered by an iovec. * @@ -322,4 +335,17 @@ ssize_t __import_iovec(int type, const struct iovec __user *uvec, int import_single_range(int type, void __user *buf, size_t len, struct iovec *iov, struct iov_iter *i); +static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction, + void __user *buf, size_t count) +{ + WARN_ON(direction & ~(READ | WRITE)); + *i = (struct iov_iter) { + .iter_type = ITER_UBUF, + .user_backed = true, + .data_source = direction, + .ubuf = buf, + .count = count + }; +} + #endif -- cgit From 10f525a8cd7a525e9fc73288bb35428c9cad5e63 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Wed, 15 Jun 2022 02:02:51 -0400 Subject: ITER_PIPE: cache the type of last buffer We often need to find whether the last buffer is anon or not, and currently it's rather clumsy: check if ->iov_offset is non-zero (i.e. that pipe is not empty) if so, get the corresponding pipe_buffer and check its ->ops if it's &default_pipe_buf_ops, we have an anon buffer. Let's replace the use of ->iov_offset (which is nowhere near similar to its role for other flavours) with signed field (->last_offset), with the following rules: empty, no buffers occupied: 0 anon, with bytes up to N-1 filled: N zero-copy, with bytes up to N-1 filled: -N That way abs(i->last_offset) is equal to what used to be in i->iov_offset and empty vs. anon vs. zero-copy can be distinguished by the sign of i->last_offset. Checks for "should we extend the last buffer or should we start a new one?" become easier to follow that way. Note that most of the operations can only be done in a sane state - i.e. when the pipe has nothing past the current position of iterator. About the only thing that could be done outside of that state is iov_iter_advance(), which transitions to the sane state by truncating the pipe. There are only two cases where we leave the sane state: 1) iov_iter_get_pages()/iov_iter_get_pages_alloc(). Will be dealt with later, when we make get_pages advancing - the callers are actually happier that way. 2) iov_iter copied, then something is put into the copy. Since they share the underlying pipe, the original gets behind. When we decide that we are done with the copy (original is not usable until then) we advance the original. direct_io used to be done that way; nowadays it operates on the original and we do iov_iter_revert() to discard the excessive data. At the moment there's nothing in the kernel that could do that to ITER_PIPE iterators, so this reason for insane state is theoretical right now. Signed-off-by: Al Viro --- include/linux/uio.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/uio.h b/include/linux/uio.h index 85bef84fd294..e7fc29b5ad19 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -40,7 +40,10 @@ struct iov_iter { bool nofault; bool data_source; bool user_backed; - size_t iov_offset; + union { + size_t iov_offset; + int last_offset; + }; size_t count; union { const struct iovec *iov; -- cgit From 12d426ab64a1c75f1b2ee5c33e933a4c16004049 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Wed, 15 Jun 2022 09:44:38 -0400 Subject: ITER_PIPE: fold data_start() and pipe_space_for_user() together All their callers are next to each other; all of them want the total amount of pages and, possibly, the offset in the partial final buffer. Combine into a new helper (pipe_npages()), fix the bogosity in pipe_space_for_user(), while we are at it. Signed-off-by: Al Viro --- include/linux/pipe_fs_i.h | 20 -------------------- 1 file changed, 20 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index 4ea496924106..6cb65df3e3ba 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -156,26 +156,6 @@ static inline bool pipe_full(unsigned int head, unsigned int tail, return pipe_occupancy(head, tail) >= limit; } -/** - * pipe_space_for_user - Return number of slots available to userspace - * @head: The pipe ring head pointer - * @tail: The pipe ring tail pointer - * @pipe: The pipe info structure - */ -static inline unsigned int pipe_space_for_user(unsigned int head, unsigned int tail, - struct pipe_inode_info *pipe) -{ - unsigned int p_occupancy, p_space; - - p_occupancy = pipe_occupancy(head, tail); - if (p_occupancy >= pipe->max_usage) - return 0; - p_space = pipe->ring_size - p_occupancy; - if (p_space > pipe->max_usage) - p_space = pipe->max_usage; - return p_space; -} - /** * pipe_buf_get - get a reference to a pipe_buffer * @pipe: the pipe that the buffer belongs to -- cgit From 1ef255e257173f4bc44317ef2076e7e0de688fdf Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 9 Jun 2022 10:28:36 -0400 Subject: iov_iter: advancing variants of iov_iter_get_pages{,_alloc}() Most of the users immediately follow successful iov_iter_get_pages() with advancing by the amount it had returned. Provide inline wrappers doing that, convert trivial open-coded uses of those. BTW, iov_iter_get_pages() never returns more than it had been asked to; such checks in cifs ought to be removed someday... Reviewed-by: Jeff Layton Signed-off-by: Al Viro --- include/linux/uio.h | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) (limited to 'include/linux') diff --git a/include/linux/uio.h b/include/linux/uio.h index e7fc29b5ad19..b70d28693400 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -351,4 +351,24 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction, }; } +static inline ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages, + size_t maxsize, unsigned maxpages, size_t *start) +{ + ssize_t res = iov_iter_get_pages(i, pages, maxsize, maxpages, start); + + if (res >= 0) + iov_iter_advance(i, res); + return res; +} + +static inline ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages, + size_t maxsize, size_t *start) +{ + ssize_t res = iov_iter_get_pages_alloc(i, pages, maxsize, start); + + if (res >= 0) + iov_iter_advance(i, res); + return res; +} + #endif -- cgit From eba2d3d798295dc43cae8fade102f9d083a2a741 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Fri, 10 Jun 2022 13:05:12 -0400 Subject: get rid of non-advancing variants mechanical change; will be further massaged in subsequent commits Reviewed-by: Jeff Layton Signed-off-by: Al Viro --- include/linux/uio.h | 24 ++---------------------- 1 file changed, 2 insertions(+), 22 deletions(-) (limited to 'include/linux') diff --git a/include/linux/uio.h b/include/linux/uio.h index b70d28693400..5896af36199c 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -247,9 +247,9 @@ void iov_iter_pipe(struct iov_iter *i, unsigned int direction, struct pipe_inode void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count); void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray, loff_t start, size_t count); -ssize_t iov_iter_get_pages(struct iov_iter *i, struct page **pages, +ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages, size_t maxsize, unsigned maxpages, size_t *start); -ssize_t iov_iter_get_pages_alloc(struct iov_iter *i, struct page ***pages, +ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages, size_t maxsize, size_t *start); int iov_iter_npages(const struct iov_iter *i, int maxpages); void iov_iter_restore(struct iov_iter *i, struct iov_iter_state *state); @@ -351,24 +351,4 @@ static inline void iov_iter_ubuf(struct iov_iter *i, unsigned int direction, }; } -static inline ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages, - size_t maxsize, unsigned maxpages, size_t *start) -{ - ssize_t res = iov_iter_get_pages(i, pages, maxsize, maxpages, start); - - if (res >= 0) - iov_iter_advance(i, res); - return res; -} - -static inline ssize_t iov_iter_get_pages_alloc2(struct iov_iter *i, struct page ***pages, - size_t maxsize, size_t *start) -{ - ssize_t res = iov_iter_get_pages_alloc(i, pages, maxsize, start); - - if (res >= 0) - iov_iter_advance(i, res); - return res; -} - #endif -- cgit From fa96b24204af42274ec13dfb2f2e6990d7510e55 Mon Sep 17 00:00:00 2001 From: Benjamin Tissoires Date: Fri, 5 Aug 2022 14:48:14 -0700 Subject: btf: Add a new kfunc flag which allows to mark a function to be sleepable This allows to declare a kfunc as sleepable and prevents its use in a non sleepable program. Signed-off-by: Benjamin Tissoires Co-developed-by: Yosry Ahmed Signed-off-by: Yosry Ahmed Signed-off-by: Hao Luo Link: https://lore.kernel.org/r/20220805214821.1058337-2-haoluo@google.com Signed-off-by: Alexei Starovoitov --- include/linux/btf.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/btf.h b/include/linux/btf.h index cdb376d53238..976cbdd2981f 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -49,6 +49,7 @@ * for this case. */ #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ +#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ struct btf; struct btf_member; -- cgit From c8996c98f703b09afe77a1d247dae691c9849dc1 Mon Sep 17 00:00:00 2001 From: Jesper Dangaard Brouer Date: Tue, 9 Aug 2022 08:08:02 +0200 Subject: bpf: Add BPF-helper for accessing CLOCK_TAI Commit 3dc6ffae2da2 ("timekeeping: Introduce fast accessor to clock tai") introduced a fast and NMI-safe accessor for CLOCK_TAI. Especially in time sensitive networks (TSN), where all nodes are synchronized by Precision Time Protocol (PTP), it's helpful to have the possibility to generate timestamps based on CLOCK_TAI instead of CLOCK_MONOTONIC. With a BPF helper for TAI in place, it becomes very convenient to correlate activity across different machines in the network. Use cases for such a BPF helper include functionalities such as Tx launch time (e.g. ETF and TAPRIO Qdiscs) and timestamping. Note: CLOCK_TAI is nothing new per se, only the NMI-safe variant of it is. Signed-off-by: Jesper Dangaard Brouer [Kurt: Wrote changelog and renamed helper] Signed-off-by: Kurt Kanzenbach Link: https://lore.kernel.org/r/20220809060803.5773-2-kurt@linutronix.de Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 20c26aed7896..a627a02cf8ab 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2349,6 +2349,7 @@ extern const struct bpf_func_proto bpf_get_numa_node_id_proto; extern const struct bpf_func_proto bpf_tail_call_proto; extern const struct bpf_func_proto bpf_ktime_get_ns_proto; extern const struct bpf_func_proto bpf_ktime_get_boot_ns_proto; +extern const struct bpf_func_proto bpf_ktime_get_tai_ns_proto; extern const struct bpf_func_proto bpf_get_current_pid_tgid_proto; extern const struct bpf_func_proto bpf_get_current_uid_gid_proto; extern const struct bpf_func_proto bpf_get_current_comm_proto; -- cgit From 46dae32fe625a75f549c3a70edc77b778197bb05 Mon Sep 17 00:00:00 2001 From: Youngmin Nam Date: Tue, 12 Jul 2022 18:47:15 +0900 Subject: time: Correct the prototype of ns_to_kernel_old_timeval and ns_to_timespec64 In ns_to_kernel_old_timeval() definition, the function argument is defined with const identifier in kernel/time/time.c, but the prototype in include/linux/time32.h looks different. - The function is defined in kernel/time/time.c as below: struct __kernel_old_timeval ns_to_kernel_old_timeval(const s64 nsec) - The function is decalared in include/linux/time32.h as below: extern struct __kernel_old_timeval ns_to_kernel_old_timeval(s64 nsec); Because the variable of arithmethic types isn't modified in the calling scope, there's no need to mark arguments as const, which was already mentioned during review (Link[1) of the original patch. Likewise remove the "const" keyword in both definition and declaration of ns_to_timespec64() as requested by Arnd (Link[2]). Fixes: a84d1169164b ("y2038: Introduce struct __kernel_old_timeval") Signed-off-by: Youngmin Nam Signed-off-by: Thomas Gleixner Reviewed-by: Arnd Bergmann Link: https://lore.kernel.org/all/20220712094715.2918823-1-youngmin.nam@samsung.com Link[1]: https://lore.kernel.org/all/20180310081123.thin6wphgk7tongy@gmail.com/ Link[2]: https://lore.kernel.org/all/CAK8P3a3nknJgEDESGdJH91jMj6R_xydFqWASd8r5BbesdvMBgA@mail.gmail.com/ --- include/linux/time64.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/time64.h b/include/linux/time64.h index 2fb8232cff1d..f1bcea8c124a 100644 --- a/include/linux/time64.h +++ b/include/linux/time64.h @@ -145,7 +145,7 @@ static inline s64 timespec64_to_ns(const struct timespec64 *ts) * * Returns the timespec64 representation of the nsec parameter. */ -extern struct timespec64 ns_to_timespec64(const s64 nsec); +extern struct timespec64 ns_to_timespec64(s64 nsec); /** * timespec64_add_ns - Adds nanoseconds to a timespec64 -- cgit From af887e437bb298752b2edc5834048b8151b8aea0 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Tue, 9 Aug 2022 12:50:28 -0400 Subject: NFS: Improve write error tracing Don't leak request pointers, but use the "device:inode" labelling that is used by all the other trace points. Furthermore, replace use of page indexes with an offset, again in order to align behaviour with other NFS trace points. Signed-off-by: Trond Myklebust --- include/linux/nfs_page.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h index f0373a6cb5fb..ba7e2e4b0926 100644 --- a/include/linux/nfs_page.h +++ b/include/linux/nfs_page.h @@ -202,8 +202,7 @@ nfs_list_entry(struct list_head *head) return list_entry(head, struct nfs_page, wb_list); } -static inline -loff_t req_offset(struct nfs_page *req) +static inline loff_t req_offset(const struct nfs_page *req) { return (((loff_t)req->wb_index) << PAGE_SHIFT) + req->wb_offset; } -- cgit From d4252071b97d2027d246f6a82cbee4d52f618b47 Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Tue, 9 Aug 2022 14:32:13 -0400 Subject: add barriers to buffer_uptodate and set_buffer_uptodate Let's have a look at this piece of code in __bread_slow: get_bh(bh); bh->b_end_io = end_buffer_read_sync; submit_bh(REQ_OP_READ, 0, bh); wait_on_buffer(bh); if (buffer_uptodate(bh)) return bh; Neither wait_on_buffer nor buffer_uptodate contain any memory barrier. Consequently, if someone calls sb_bread and then reads the buffer data, the read of buffer data may be executed before wait_on_buffer(bh) on architectures with weak memory ordering and it may return invalid data. Fix this bug by adding a memory barrier to set_buffer_uptodate and an acquire barrier to buffer_uptodate (in a similar way as folio_test_uptodate and folio_mark_uptodate). Signed-off-by: Mikulas Patocka Reviewed-by: Matthew Wilcox (Oracle) Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds --- include/linux/buffer_head.h | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 307445d1c69e..def8b8d30ccc 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -118,7 +118,6 @@ static __always_inline int test_clear_buffer_##name(struct buffer_head *bh) \ * of the form "mark_buffer_foo()". These are higher-level functions which * do something in addition to setting a b_state bit. */ -BUFFER_FNS(Uptodate, uptodate) BUFFER_FNS(Dirty, dirty) TAS_BUFFER_FNS(Dirty, dirty) BUFFER_FNS(Lock, locked) @@ -136,6 +135,30 @@ BUFFER_FNS(Meta, meta) BUFFER_FNS(Prio, prio) BUFFER_FNS(Defer_Completion, defer_completion) +static __always_inline void set_buffer_uptodate(struct buffer_head *bh) +{ + /* + * make it consistent with folio_mark_uptodate + * pairs with smp_load_acquire in buffer_uptodate + */ + smp_mb__before_atomic(); + set_bit(BH_Uptodate, &bh->b_state); +} + +static __always_inline void clear_buffer_uptodate(struct buffer_head *bh) +{ + clear_bit(BH_Uptodate, &bh->b_state); +} + +static __always_inline int buffer_uptodate(const struct buffer_head *bh) +{ + /* + * make it consistent with folio_test_uptodate + * pairs with smp_mb__before_atomic in set_buffer_uptodate + */ + return (smp_load_acquire(&bh->b_state) & (1UL << BH_Uptodate)) != 0; +} + #define bh_offset(bh) ((unsigned long)(bh)->b_data & ~PAGE_MASK) /* If we *know* page->private refers to buffer_heads */ -- cgit From 116d902fa9ff1f559b66bafad0f0cad90df95d21 Mon Sep 17 00:00:00 2001 From: Thomas Zimmermann Date: Mon, 8 Aug 2022 14:53:53 +0200 Subject: iosys-map: Add IOSYS_MAP_INIT_VADDR_IOMEM() Add IOSYS_MAP_INIT_VADDR_IOMEM() for static init of variables of type struct iosys_map. Signed-off-by: Thomas Zimmermann Reviewed-by: Lucas De Marchi Reviewed-by: Sam Ravnborg Link: https://patchwork.freedesktop.org/patch/msgid/20220808125406.20752-2-tzimmermann@suse.de --- include/linux/iosys-map.h | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/iosys-map.h b/include/linux/iosys-map.h index a533cae189d7..cb71aa616bd3 100644 --- a/include/linux/iosys-map.h +++ b/include/linux/iosys-map.h @@ -46,10 +46,13 @@ * * iosys_map_set_vaddr(&map, 0xdeadbeaf); * - * To set an address in I/O memory, use iosys_map_set_vaddr_iomem(). + * To set an address in I/O memory, use IOSYS_MAP_INIT_VADDR_IOMEM() or + * iosys_map_set_vaddr_iomem(). * * .. code-block:: c * + * struct iosys_map map = IOSYS_MAP_INIT_VADDR_IOMEM(0xdeadbeaf); + * * iosys_map_set_vaddr_iomem(&map, 0xdeadbeaf); * * Instances of struct iosys_map do not have to be cleaned up, but @@ -121,6 +124,16 @@ struct iosys_map { .is_iomem = false, \ } +/** + * IOSYS_MAP_INIT_VADDR_IOMEM - Initializes struct iosys_map to an address in I/O memory + * @vaddr_iomem_: An I/O-memory address + */ +#define IOSYS_MAP_INIT_VADDR_IOMEM(vaddr_iomem_) \ + { \ + .vaddr_iomem = (vaddr_iomem_), \ + .is_iomem = true, \ + } + /** * IOSYS_MAP_INIT_OFFSET - Initializes struct iosys_map from another iosys_map * @map_: The dma-buf mapping structure to copy from -- cgit From 4dd48c6f1f83290d4bc61b43e61d86f8bc6c310e Mon Sep 17 00:00:00 2001 From: Artem Savkov Date: Wed, 10 Aug 2022 08:59:03 +0200 Subject: bpf: add destructive kfunc flag Add KF_DESTRUCTIVE flag for destructive functions. Functions with this flag set will require CAP_SYS_BOOT capabilities. Signed-off-by: Artem Savkov Link: https://lore.kernel.org/r/20220810065905.475418-2-asavkov@redhat.com Signed-off-by: Alexei Starovoitov --- include/linux/btf.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/btf.h b/include/linux/btf.h index 976cbdd2981f..ad93c2d9cc1c 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -49,7 +49,8 @@ * for this case. */ #define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */ -#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ +#define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ +#define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ struct btf; struct btf_member; -- cgit From 2a0133723f9ebeb751cfce19f74ec07e108bef1f Mon Sep 17 00:00:00 2001 From: Hawkins Jiawei Date: Fri, 5 Aug 2022 15:48:34 +0800 Subject: net: fix refcount bug in sk_psock_get (2) Syzkaller reports refcount bug as follows: ------------[ cut here ]------------ refcount_t: saturated; leaking memory. WARNING: CPU: 1 PID: 3605 at lib/refcount.c:19 refcount_warn_saturate+0xf4/0x1e0 lib/refcount.c:19 Modules linked in: CPU: 1 PID: 3605 Comm: syz-executor208 Not tainted 5.18.0-syzkaller-03023-g7e062cda7d90 #0 __refcount_add_not_zero include/linux/refcount.h:163 [inline] __refcount_inc_not_zero include/linux/refcount.h:227 [inline] refcount_inc_not_zero include/linux/refcount.h:245 [inline] sk_psock_get+0x3bc/0x410 include/linux/skmsg.h:439 tls_data_ready+0x6d/0x1b0 net/tls/tls_sw.c:2091 tcp_data_ready+0x106/0x520 net/ipv4/tcp_input.c:4983 tcp_data_queue+0x25f2/0x4c90 net/ipv4/tcp_input.c:5057 tcp_rcv_state_process+0x1774/0x4e80 net/ipv4/tcp_input.c:6659 tcp_v4_do_rcv+0x339/0x980 net/ipv4/tcp_ipv4.c:1682 sk_backlog_rcv include/net/sock.h:1061 [inline] __release_sock+0x134/0x3b0 net/core/sock.c:2849 release_sock+0x54/0x1b0 net/core/sock.c:3404 inet_shutdown+0x1e0/0x430 net/ipv4/af_inet.c:909 __sys_shutdown_sock net/socket.c:2331 [inline] __sys_shutdown_sock net/socket.c:2325 [inline] __sys_shutdown+0xf1/0x1b0 net/socket.c:2343 __do_sys_shutdown net/socket.c:2351 [inline] __se_sys_shutdown net/socket.c:2349 [inline] __x64_sys_shutdown+0x50/0x70 net/socket.c:2349 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 During SMC fallback process in connect syscall, kernel will replaces TCP with SMC. In order to forward wakeup smc socket waitqueue after fallback, kernel will sets clcsk->sk_user_data to origin smc socket in smc_fback_replace_callbacks(). Later, in shutdown syscall, kernel will calls sk_psock_get(), which treats the clcsk->sk_user_data as psock type, triggering the refcnt warning. So, the root cause is that smc and psock, both will use sk_user_data field. So they will mismatch this field easily. This patch solves it by using another bit(defined as SK_USER_DATA_PSOCK) in PTRMASK, to mark whether sk_user_data points to a psock object or not. This patch depends on a PTRMASK introduced in commit f1ff5ce2cd5e ("net, sk_msg: Clear sk_user_data pointer on clone if tagged"). For there will possibly be more flags in the sk_user_data field, this patch also refactor sk_user_data flags code to be more generic to improve its maintainability. Reported-and-tested-by: syzbot+5f26f85569bd179c18ce@syzkaller.appspotmail.com Suggested-by: Jakub Kicinski Acked-by: Wen Gu Signed-off-by: Hawkins Jiawei Reviewed-by: Jakub Sitnicki Signed-off-by: Jakub Kicinski --- include/linux/skmsg.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 153b6dec9b6a..48f4b645193b 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -278,7 +278,8 @@ static inline void sk_msg_sg_copy_clear(struct sk_msg *msg, u32 start) static inline struct sk_psock *sk_psock(const struct sock *sk) { - return rcu_dereference_sk_user_data(sk); + return __rcu_dereference_sk_user_data_with_flags(sk, + SK_USER_DATA_PSOCK); } static inline void sk_psock_set_state(struct sk_psock *psock, -- cgit From c2a052a4a949df53f50a5024843432d2234cb824 Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Fri, 24 Jun 2022 10:55:41 +0800 Subject: remoteproc: rename len of rpoc_vring to num Rename the member len in the structure rpoc_vring to num. And remove 'in bytes' from the comment of it. This is misleading. Because this actually refers to the size of the virtio vring to be created. The unit is not bytes. Signed-off-by: Xuan Zhuo Message-Id: <20220624025621.128843-2-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/remoteproc.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h index 7c943f0a2fc4..aea79c77db0f 100644 --- a/include/linux/remoteproc.h +++ b/include/linux/remoteproc.h @@ -597,7 +597,7 @@ struct rproc_subdev { /** * struct rproc_vring - remoteproc vring state * @va: virtual address - * @len: length, in bytes + * @num: vring size * @da: device address * @align: vring alignment * @notifyid: rproc-specific unique vring index @@ -606,7 +606,7 @@ struct rproc_subdev { */ struct rproc_vring { void *va; - int len; + int num; u32 da; u32 align; int notifyid; -- cgit From da802961832f9852886304290135457519815497 Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Mon, 1 Aug 2022 14:38:21 +0800 Subject: virtio: record the maximum queue num supported by the device. virtio-net can display the maximum (supported by hardware) ring size in ethtool -g eth0. When the subsequent patch implements vring reset, it can judge whether the ring size passed by the driver is legal based on this. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang Message-Id: <20220801063902.129329-2-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/virtio.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio.h b/include/linux/virtio.h index d8fdf170637c..129bde7521e3 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -19,6 +19,7 @@ * @priv: a pointer for the virtqueue implementation to use. * @index: the zero-based ordinal number for this queue. * @num_free: number of elements we expect to be able to fit. + * @num_max: the maximum number of elements supported by the device. * * A note on @num_free: with indirect buffers, each buffer needs one * element in the queue, otherwise a buffer will need one element per @@ -31,6 +32,7 @@ struct virtqueue { struct virtio_device *vdev; unsigned int index; unsigned int num_free; + unsigned int num_max; void *priv; }; -- cgit From 3086e9fc9173166774652a488467e4176ee1c81b Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Mon, 1 Aug 2022 14:38:22 +0800 Subject: virtio: struct virtio_config_ops add callbacks for queue_reset reset can be divided into the following four steps (example): 1. transport: notify the device to reset the queue 2. vring: recycle the buffer submitted 3. vring: reset/resize the vring (may re-alloc) 4. transport: mmap vring to device, and enable the queue In order to support queue reset, add two callbacks in struct virtio_config_ops to implement steps 1 and 4. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang Message-Id: <20220801063902.129329-3-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/virtio_config.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index b47c2e7ed0ee..36ec7be1f480 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -78,6 +78,18 @@ struct virtio_shm_region { * @set_vq_affinity: set the affinity for a virtqueue (optional). * @get_vq_affinity: get the affinity for a virtqueue (optional). * @get_shm_region: get a shared memory region based on the index. + * @disable_vq_and_reset: reset a queue individually (optional). + * vq: the virtqueue + * Returns 0 on success or error status + * disable_vq_and_reset will guarantee that the callbacks are disabled and + * synchronized. + * Except for the callback, the caller should guarantee that the vring is + * not accessed by any functions of virtqueue. + * @enable_vq_after_reset: enable a reset queue + * vq: the virtqueue + * Returns 0 on success or error status + * If disable_vq_and_reset is set, then enable_vq_after_reset must also be + * set. */ typedef void vq_callback_t(struct virtqueue *); struct virtio_config_ops { @@ -104,6 +116,8 @@ struct virtio_config_ops { int index); bool (*get_shm_region)(struct virtio_device *vdev, struct virtio_shm_region *region, u8 id); + int (*disable_vq_and_reset)(struct virtqueue *vq); + int (*enable_vq_after_reset)(struct virtqueue *vq); }; /* If driver didn't advertise the feature, it will never appear. */ -- cgit From 07d9629d49584b6f79faa6158cd7aef7e6919703 Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Mon, 1 Aug 2022 14:38:27 +0800 Subject: virtio_ring: split: stop __vring_new_virtqueue as export symbol There is currently only one place to reference __vring_new_virtqueue() directly from the outside of virtio core. And here vring_new_virtqueue() can be used instead. Subsequent patches will modify __vring_new_virtqueue, so stop it as an export symbol for now. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang Message-Id: <20220801063902.129329-8-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/virtio_ring.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/virtio_ring.h b/include/linux/virtio_ring.h index b485b13fa50b..8b8af1a38991 100644 --- a/include/linux/virtio_ring.h +++ b/include/linux/virtio_ring.h @@ -76,16 +76,6 @@ struct virtqueue *vring_create_virtqueue(unsigned int index, void (*callback)(struct virtqueue *vq), const char *name); -/* Creates a virtqueue with a custom layout. */ -struct virtqueue *__vring_new_virtqueue(unsigned int index, - struct vring vring, - struct virtio_device *vdev, - bool weak_barriers, - bool ctx, - bool (*notify)(struct virtqueue *), - void (*callback)(struct virtqueue *), - const char *name); - /* * Creates a virtqueue with a standard layout but a caller-allocated * ring. -- cgit From c790e8e1817f1a17c05e64f1c4f16f231b8529d5 Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Mon, 1 Aug 2022 14:38:44 +0800 Subject: virtio_ring: introduce virtqueue_resize() Introduce virtqueue_resize() to implement the resize of vring. Based on these, the driver can dynamically adjust the size of the vring. For example: ethtool -G. virtqueue_resize() implements resize based on the vq reset function. In case of failure to allocate a new vring, it will give up resize and use the original vring. During this process, if the re-enable reset vq fails, the vq can no longer be used. Although the probability of this situation is not high. The parameter recycle is used to recycle the buffer that is no longer used. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang Message-Id: <20220801063902.129329-25-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/virtio.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 129bde7521e3..62e31bca5602 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -91,6 +91,9 @@ dma_addr_t virtqueue_get_desc_addr(struct virtqueue *vq); dma_addr_t virtqueue_get_avail_addr(struct virtqueue *vq); dma_addr_t virtqueue_get_used_addr(struct virtqueue *vq); +int virtqueue_resize(struct virtqueue *vq, u32 num, + void (*recycle)(struct virtqueue *vq, void *buf)); + /** * virtio_device - representation of a device using virtio * @index: unique position on the virtio bus -- cgit From ea024594b1dc5b6719c1400ae154690f5c203996 Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Mon, 1 Aug 2022 14:38:45 +0800 Subject: virtio_pci: struct virtio_pci_common_cfg add queue_notify_data Add queue_notify_data in struct virtio_pci_common_cfg, which comes from here https://github.com/oasis-tcs/virtio-spec/issues/89 In order not to affect the API, add a dedicated structure struct virtio_pci_modern_common_cfg to virtio_pci_modern.h. Since I want to add queue_reset after queue_notify_data, I submitted this patch first. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang Message-Id: <20220801063902.129329-26-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/virtio_pci_modern.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h index eb2bd9b4077d..41f5a018bd94 100644 --- a/include/linux/virtio_pci_modern.h +++ b/include/linux/virtio_pci_modern.h @@ -5,6 +5,13 @@ #include #include +struct virtio_pci_modern_common_cfg { + struct virtio_pci_common_cfg cfg; + + __le16 queue_notify_data; /* read-write */ + __le16 padding; +}; + struct virtio_pci_modern_device { struct pci_dev *pci_dev; -- cgit From 3251063155032729b8793ac3957136ae25c0bafa Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Mon, 1 Aug 2022 14:38:46 +0800 Subject: virtio: allow to unbreak/break virtqueue individually This patch allows the new introduced __virtqueue_break()/__virtqueue_unbreak() to break/unbreak the virtqueue. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang Message-Id: <20220801063902.129329-27-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/virtio.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio.h b/include/linux/virtio.h index 62e31bca5602..d45ee82a4470 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -138,6 +138,9 @@ bool is_virtio_device(struct device *dev); void virtio_break_device(struct virtio_device *dev); void __virtio_unbreak_device(struct virtio_device *dev); +void __virtqueue_break(struct virtqueue *_vq); +void __virtqueue_unbreak(struct virtqueue *_vq); + void virtio_config_changed(struct virtio_device *dev); #ifdef CONFIG_PM_SLEEP int virtio_device_freeze(struct virtio_device *dev); -- cgit From 4913e85441b40386c4bb093f188b955d8165f1b7 Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Mon, 1 Aug 2022 14:38:48 +0800 Subject: virtio_ring: struct virtqueue introduce reset Introduce a new member reset to the structure virtqueue to determine whether the current vq is in the reset state. Subsequent patches will use it. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang Message-Id: <20220801063902.129329-29-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/virtio.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio.h b/include/linux/virtio.h index d45ee82a4470..a3f73bb6733e 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -20,6 +20,7 @@ * @index: the zero-based ordinal number for this queue. * @num_free: number of elements we expect to be able to fit. * @num_max: the maximum number of elements supported by the device. + * @reset: vq is in reset state or not. * * A note on @num_free: with indirect buffers, each buffer needs one * element in the queue, otherwise a buffer will need one element per @@ -34,6 +35,7 @@ struct virtqueue { unsigned int num_free; unsigned int num_max; void *priv; + bool reset; }; int virtqueue_add_outbuf(struct virtqueue *vq, -- cgit From 0cdd450e70510c9e13af8099e9f6c1467e6a0b91 Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Mon, 1 Aug 2022 14:38:49 +0800 Subject: virtio_pci: struct virtio_pci_common_cfg add queue_reset Add queue_reset in virtio_pci_modern_common_cfg. https://github.com/oasis-tcs/virtio-spec/issues/124 https://github.com/oasis-tcs/virtio-spec/issues/139 Signed-off-by: Xuan Zhuo Acked-by: Jason Wang Message-Id: <20220801063902.129329-30-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/virtio_pci_modern.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h index 41f5a018bd94..05123b9a606f 100644 --- a/include/linux/virtio_pci_modern.h +++ b/include/linux/virtio_pci_modern.h @@ -9,7 +9,7 @@ struct virtio_pci_modern_common_cfg { struct virtio_pci_common_cfg cfg; __le16 queue_notify_data; /* read-write */ - __le16 padding; + __le16 queue_reset; /* read-write */ }; struct virtio_pci_modern_device { -- cgit From 0b50cece0b7857732d2055f2c77f8730c10f9196 Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Mon, 1 Aug 2022 14:38:50 +0800 Subject: virtio_pci: introduce helper to get/set queue reset Introduce new helpers to implement queue reset and get queue reset status. https://github.com/oasis-tcs/virtio-spec/issues/124 https://github.com/oasis-tcs/virtio-spec/issues/139 Signed-off-by: Xuan Zhuo Acked-by: Jason Wang Message-Id: <20220801063902.129329-31-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/virtio_pci_modern.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio_pci_modern.h b/include/linux/virtio_pci_modern.h index 05123b9a606f..c4eeb79b0139 100644 --- a/include/linux/virtio_pci_modern.h +++ b/include/linux/virtio_pci_modern.h @@ -113,4 +113,6 @@ void __iomem * vp_modern_map_vq_notify(struct virtio_pci_modern_device *mdev, u16 index, resource_size_t *pa); int vp_modern_probe(struct virtio_pci_modern_device *mdev); void vp_modern_remove(struct virtio_pci_modern_device *mdev); +int vp_modern_get_queue_reset(struct virtio_pci_modern_device *mdev, u16 index); +void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index); #endif -- cgit From a10fba0377145fccefea4dc4dd5915b7ed87e546 Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Mon, 1 Aug 2022 14:38:53 +0800 Subject: virtio: find_vqs() add arg sizes find_vqs() adds a new parameter sizes to specify the size of each vq vring. NULL as sizes means that all queues in find_vqs() use the maximum size. A value in the array is 0, which means that the corresponding queue uses the maximum size. In the split scenario, the meaning of size is the largest size, because it may be limited by memory, the virtio core will try a smaller size. And the size is power of 2. Signed-off-by: Xuan Zhuo Acked-by: Hans de Goede Reviewed-by: Mathieu Poirier Acked-by: Jason Wang Message-Id: <20220801063902.129329-34-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/virtio_config.h | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index 36ec7be1f480..888f7e96f0c7 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -55,6 +55,7 @@ struct virtio_shm_region { * include a NULL entry for vqs that do not need a callback * names: array of virtqueue names (mainly for debugging) * include a NULL entry for vqs unused by driver + * sizes: array of virtqueue sizes * Returns 0 on success or error status * @del_vqs: free virtqueues found by find_vqs(). * @synchronize_cbs: synchronize with the virtqueue callbacks (optional) @@ -103,7 +104,9 @@ struct virtio_config_ops { void (*reset)(struct virtio_device *vdev); int (*find_vqs)(struct virtio_device *, unsigned nvqs, struct virtqueue *vqs[], vq_callback_t *callbacks[], - const char * const names[], const bool *ctx, + const char * const names[], + u32 sizes[], + const bool *ctx, struct irq_affinity *desc); void (*del_vqs)(struct virtio_device *); void (*synchronize_cbs)(struct virtio_device *); @@ -212,7 +215,7 @@ struct virtqueue *virtio_find_single_vq(struct virtio_device *vdev, const char *names[] = { n }; struct virtqueue *vq; int err = vdev->config->find_vqs(vdev, 1, &vq, callbacks, names, NULL, - NULL); + NULL, NULL); if (err < 0) return ERR_PTR(err); return vq; @@ -224,7 +227,8 @@ int virtio_find_vqs(struct virtio_device *vdev, unsigned nvqs, const char * const names[], struct irq_affinity *desc) { - return vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names, NULL, desc); + return vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names, NULL, + NULL, desc); } static inline @@ -233,8 +237,8 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, unsigned nvqs, const char * const names[], const bool *ctx, struct irq_affinity *desc) { - return vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names, ctx, - desc); + return vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names, NULL, + ctx, desc); } /** -- cgit From fe3dc04e31aa51f91dc7f741a5f76cc4817eb5b4 Mon Sep 17 00:00:00 2001 From: Xuan Zhuo Date: Mon, 1 Aug 2022 14:38:56 +0800 Subject: virtio: add helper virtio_find_vqs_ctx_size() Introduce helper virtio_find_vqs_ctx_size() to call find_vqs and specify the maximum size of each vq ring. Signed-off-by: Xuan Zhuo Acked-by: Jason Wang Message-Id: <20220801063902.129329-37-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin --- include/linux/virtio_config.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index 888f7e96f0c7..6adff09f7170 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -241,6 +241,18 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, unsigned nvqs, ctx, desc); } +static inline +int virtio_find_vqs_ctx_size(struct virtio_device *vdev, u32 nvqs, + struct virtqueue *vqs[], + vq_callback_t *callbacks[], + const char * const names[], + u32 sizes[], + const bool *ctx, struct irq_affinity *desc) +{ + return vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names, sizes, + ctx, desc); +} + /** * virtio_synchronize_cbs - synchronize with virtqueue callbacks * @vdev: the device -- cgit From cae15c2ed8e6e058bd5e32de292ab7982640161e Mon Sep 17 00:00:00 2001 From: Eli Cohen Date: Thu, 14 Jul 2022 14:39:26 +0300 Subject: vdpa/mlx5: Implement susupend virtqueue callback Implement the suspend callback allowing to suspend the virtqueues so they stop processing descriptors. This is required to allow to query a consistent state of the virtqueue while live migration is taking place. Signed-off-by: Eli Cohen Message-Id: <20220714113927.85729-2-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin --- include/linux/mlx5/mlx5_ifc_vdpa.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h b/include/linux/mlx5/mlx5_ifc_vdpa.h index 4414ed5b6ed2..9becdc3fa503 100644 --- a/include/linux/mlx5/mlx5_ifc_vdpa.h +++ b/include/linux/mlx5/mlx5_ifc_vdpa.h @@ -150,6 +150,14 @@ enum { MLX5_VIRTIO_NET_Q_OBJECT_STATE_ERR = 0x3, }; +/* This indicates that the object was not created or has already + * been desroyed. It is very safe to assume that this object will never + * have so many states + */ +enum { + MLX5_VIRTIO_NET_Q_OBJECT_NONE = 0xffffffff +}; + enum { MLX5_RQTC_LIST_Q_TYPE_RQ = 0x0, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q = 0x1, -- cgit From 848ecea184e1253758423b37cbfc1ed732ccf5b4 Mon Sep 17 00:00:00 2001 From: Eugenio Pérez Date: Wed, 10 Aug 2022 19:15:09 +0200 Subject: vdpa: Add suspend operation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This operation is optional: It it's not implemented, backend feature bit will not be exposed. Signed-off-by: Eugenio Pérez Message-Id: <20220810171512.2343333-2-eperezma@redhat.com> Signed-off-by: Michael S. Tsirkin --- include/linux/vdpa.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index 7b4a13d3bd91..d282f464d2f1 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -218,6 +218,9 @@ struct vdpa_map_file { * @reset: Reset device * @vdev: vdpa device * Returns integer: success (0) or error (< 0) + * @suspend: Suspend or resume the device (optional) + * @vdev: vdpa device + * Returns integer: success (0) or error (< 0) * @get_config_size: Get the size of the configuration space includes * fields that are conditional on feature bits. * @vdev: vdpa device @@ -319,6 +322,7 @@ struct vdpa_config_ops { u8 (*get_status)(struct vdpa_device *vdev); void (*set_status)(struct vdpa_device *vdev, u8 status); int (*reset)(struct vdpa_device *vdev); + int (*suspend)(struct vdpa_device *vdev); size_t (*get_config_size)(struct vdpa_device *vdev); void (*get_config)(struct vdpa_device *vdev, unsigned int offset, void *buf, unsigned int len); -- cgit From fb12ad5e3179c9667dfcbafe2c5da91f9e700e53 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Thu, 11 Aug 2022 16:11:36 -0700 Subject: Input: bma150 - fix a typo in some comments Remove some extra '0' s/BMA0150_RANGE_xxx/BMA150_RANGE_xxx/ s/BMA0150_BW_xxx/BMA150_BW_xxx Signed-off-by: Christophe JAILLET Link: https://lore.kernel.org/r/a331a6244a1dfbf34dc85f1be6995fa91500c801.1659802757.git.christophe.jaillet@wanadoo.fr Signed-off-by: Dmitry Torokhov --- include/linux/bma150.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bma150.h b/include/linux/bma150.h index 31c9e323a391..4d4a62d49341 100644 --- a/include/linux/bma150.h +++ b/include/linux/bma150.h @@ -33,8 +33,8 @@ struct bma150_cfg { unsigned char lg_hyst; /* Low-G hysterisis */ unsigned char lg_dur; /* Low-G duration */ unsigned char lg_thres; /* Low-G threshold */ - unsigned char range; /* one of BMA0150_RANGE_xxx */ - unsigned char bandwidth; /* one of BMA0150_BW_xxx */ + unsigned char range; /* one of BMA150_RANGE_xxx */ + unsigned char bandwidth; /* one of BMA150_BW_xxx */ }; struct bma150_platform_data { -- cgit From addebd9ac9ca0ef8b3764907bf8018e48caffc64 Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Fri, 12 Aug 2022 15:56:33 -0700 Subject: fs: don't randomize struct kiocb fields This is a size sensitive structure and randomizing can introduce extra padding that breaks io_uring's fixed size expectations. There are few fields here as it is, half of which need a fixed order to optimally pack, so the randomization isn't providing much. Suggested-by: Linus Torvalds Signed-off-by: Keith Busch Link: https://lore.kernel.org/io-uring/b6f508ca-b1b2-5f40-7998-e4cff1cf7212@kernel.dk/ Signed-off-by: Jens Axboe --- include/linux/fs.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 9f131e559d05..daf69a6504b6 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -339,17 +339,12 @@ enum rw_hint { struct kiocb { struct file *ki_filp; - - /* The 'ki_filp' pointer is shared in a union for aio */ - randomized_struct_fields_start - loff_t ki_pos; void (*ki_complete)(struct kiocb *iocb, long ret); void *private; int ki_flags; u16 ki_ioprio; /* See linux/ioprio.h */ struct wait_page_queue *ki_waitq; /* for async buffered IO */ - randomized_struct_fields_end }; static inline bool is_sync_kiocb(struct kiocb *kiocb) -- cgit From f2ccb5aed7bce1d8b3ed5b3385759a5509663028 Mon Sep 17 00:00:00 2001 From: Stefan Metzmacher Date: Thu, 11 Aug 2022 09:11:15 +0200 Subject: io_uring: make io_kiocb_to_cmd() typesafe We need to make sure (at build time) that struct io_cmd_data is not casted to a structure that's larger. Signed-off-by: Stefan Metzmacher Link: https://lore.kernel.org/r/c024cdf25ae19fc0319d4180e2298bade8ed17b8.1660201408.git.metze@samba.org Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index f7fab3758cb9..677a25d44d7f 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -491,7 +491,14 @@ struct io_cmd_data { __u8 data[56]; }; -#define io_kiocb_to_cmd(req) ((void *) &(req)->cmd) +static inline void io_kiocb_cmd_sz_check(size_t cmd_sz) +{ + BUILD_BUG_ON(cmd_sz > sizeof(struct io_cmd_data)); +} +#define io_kiocb_to_cmd(req, cmd_type) ( \ + io_kiocb_cmd_sz_check(sizeof(cmd_type)) , \ + ((cmd_type *)&(req)->cmd) \ +) #define cmd_to_io_kiocb(ptr) ((struct io_kiocb *) ptr) struct io_kiocb { -- cgit From 67f4b5dc49913abcdb5cc736e73674e2f352f81d Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Sat, 13 Aug 2022 08:22:25 -0400 Subject: NFS: Fix another fsync() issue after a server reboot Currently, when the writeback code detects a server reboot, it redirties any pages that were not committed to disk, and it sets the flag NFS_CONTEXT_RESEND_WRITES in the nfs_open_context of the file descriptor that dirtied the file. While this allows the file descriptor in question to redrive its own writes, it violates the fsync() requirement that we should be synchronising all writes to disk. While the problem is infrequent, we do see corner cases where an untimely server reboot causes the fsync() call to abandon its attempt to sync data to disk and causing data corruption issues due to missed error conditions or similar. In order to tighted up the client's ability to deal with this situation without introducing livelocks, add a counter that records the number of times pages are redirtied due to a server reboot-like condition, and use that in fsync() to redrive the sync to disk. Fixes: 2197e9b06c22 ("NFS: Fix up fsync() when the server rebooted") Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust --- include/linux/nfs_fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index b32ed68e7dc4..f08e581f0161 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -182,6 +182,7 @@ struct nfs_inode { /* Regular file */ struct { atomic_long_t nrequests; + atomic_long_t redirtied_pages; struct nfs_mds_commit_info commit_info; struct mutex commit_mutex; }; -- cgit From 5f6277a0c15e1ea54b6fd3d78c9fff7bfe42556c Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Sat, 13 Aug 2022 08:51:45 -0400 Subject: NFS: Cleanup to remove unused flag NFS_CONTEXT_RESEND_WRITES Signed-off-by: Trond Myklebust --- include/linux/nfs_fs.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h index f08e581f0161..7931fa472561 100644 --- a/include/linux/nfs_fs.h +++ b/include/linux/nfs_fs.h @@ -83,7 +83,6 @@ struct nfs_open_context { fmode_t mode; unsigned long flags; -#define NFS_CONTEXT_RESEND_WRITES (1) #define NFS_CONTEXT_BAD (2) #define NFS_CONTEXT_UNLOCK (3) #define NFS_CONTEXT_FILE_OPEN (4) -- cgit From 9f162193d6e48eb4ff51c2ea3612f1daebca1b7e Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Thu, 11 Aug 2022 22:34:25 -0700 Subject: radix-tree: replace gfp.h inclusion with gfp_types.h Radix tree header includes gfp.h for __GFP_BITS_SHIFT only. Now we have gfp_types.h for this. Fixes powerpc allmodconfig build: In file included from include/linux/nodemask.h:97, from include/linux/mmzone.h:17, from include/linux/gfp.h:7, from include/linux/radix-tree.h:12, from include/linux/idr.h:15, from include/linux/kernfs.h:12, from include/linux/sysfs.h:16, from include/linux/kobject.h:20, from include/linux/pci.h:35, from arch/powerpc/kernel/prom_init.c:24: include/linux/random.h: In function 'add_latent_entropy': >> include/linux/random.h:25:46: error: 'latent_entropy' undeclared (first use in this function); did you mean 'add_latent_entropy'? 25 | add_device_randomness((const void *)&latent_entropy, sizeof(latent_entropy)); | ^~~~~~~~~~~~~~ | add_latent_entropy include/linux/random.h:25:46: note: each undeclared identifier is reported only once for each function it appears in Reported-by: kernel test robot CC: Andy Shevchenko CC: Andrew Morton CC: Jason A. Donenfeld Signed-off-by: Yury Norov Signed-off-by: Linus Torvalds --- include/linux/radix-tree.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h index f7c1d21c2f39..eae67015ce51 100644 --- a/include/linux/radix-tree.h +++ b/include/linux/radix-tree.h @@ -9,7 +9,7 @@ #define _LINUX_RADIX_TREE_H #include -#include +#include #include #include #include -- cgit From 93ce557679e1cf7742ad327d40a1499e7d8535b7 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 8 Aug 2022 23:33:59 +0300 Subject: regmap: mmio: Introduce IO accessors that can talk to IO port Some users may use regmap MMIO for IO ports, and this can be done by assigning ioreadXX()/iowriteXX() and their Big Endian counterparts to the regmap context. Add IO port support with a corresponding flag added. While doing that, make sure that user won't select relaxed MMIO access along with IO port because the latter have no relaxed variants. Signed-off-by: Andy Shevchenko Acked-by: William Breathitt Gray Link: https://lore.kernel.org/r/20220808203401.35153-4-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown --- include/linux/regmap.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/regmap.h b/include/linux/regmap.h index 7cf2157134ac..8cccc247cd37 100644 --- a/include/linux/regmap.h +++ b/include/linux/regmap.h @@ -311,6 +311,8 @@ typedef void (*regmap_unlock)(void *); * This field is a duplicate of a similar file in * 'struct regmap_bus' and serves exact same purpose. * Use it only for "no-bus" cases. + * @io_port: Support IO port accessors. Makes sense only when MMIO vs. IO port + * access can be distinguished. * @max_register: Optional, specifies the maximum valid register address. * @wr_table: Optional, points to a struct regmap_access_table specifying * valid ranges for write access. @@ -399,6 +401,7 @@ struct regmap_config { size_t max_raw_write; bool fast_io; + bool io_port; unsigned int max_register; const struct regmap_access_table *wr_table; -- cgit From 0a90ed8d0cfa29735a221eba14d9cb6c735d35b6 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 1 Aug 2022 14:37:31 +0300 Subject: platform/x86: pmc_atom: Fix SLP_TYPx bitfield mask On Intel hardware the SLP_TYPx bitfield occupies bits 10-12 as per ACPI specification (see Table 4.13 "PM1 Control Registers Fixed Hardware Feature Control Bits" for the details). Fix the mask and other related definitions accordingly. Fixes: 93e5eadd1f6e ("x86/platform: New Intel Atom SOC power management controller driver") Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220801113734.36131-1-andriy.shevchenko@linux.intel.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/pmc_atom.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/pmc_atom.h b/include/linux/platform_data/x86/pmc_atom.h index 3edfb6d4e67a..dd81f510e4cf 100644 --- a/include/linux/platform_data/x86/pmc_atom.h +++ b/include/linux/platform_data/x86/pmc_atom.h @@ -7,6 +7,8 @@ #ifndef PMC_ATOM_H #define PMC_ATOM_H +#include + /* ValleyView Power Control Unit PCI Device ID */ #define PCI_DEVICE_ID_VLV_PMC 0x0F1C /* CherryTrail Power Control Unit PCI Device ID */ @@ -139,9 +141,9 @@ #define ACPI_MMIO_REG_LEN 0x100 #define PM1_CNT 0x4 -#define SLEEP_TYPE_MASK 0xFFFFECFF +#define SLEEP_TYPE_MASK GENMASK(12, 10) #define SLEEP_TYPE_S5 0x1C00 -#define SLEEP_ENABLE 0x2000 +#define SLEEP_ENABLE BIT(13) extern int pmc_atom_read(int offset, u32 *value); -- cgit From be599244865cb788abe2c6f59ccb05d7240448e4 Mon Sep 17 00:00:00 2001 From: Sander Vanheule Date: Tue, 9 Aug 2022 19:36:33 +0200 Subject: cpumask: align signatures of UP implementations Between the generic version, and their uniprocessor optimised implementations, the return types of cpumask_any_and_distribute() and cpumask_any_distribute() are not identical. Change the UP versions to 'unsigned int', to match the generic versions. Suggested-by: Yury Norov Signed-off-by: Sander Vanheule Signed-off-by: Yury Norov --- include/linux/cpumask.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 0d435d0edbcb..d8c2a40f8beb 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -202,12 +202,13 @@ static inline unsigned int cpumask_local_spread(unsigned int i, int node) return 0; } -static inline int cpumask_any_and_distribute(const struct cpumask *src1p, - const struct cpumask *src2p) { +static inline unsigned int cpumask_any_and_distribute(const struct cpumask *src1p, + const struct cpumask *src2p) +{ return cpumask_first_and(src1p, src2p); } -static inline int cpumask_any_distribute(const struct cpumask *srcp) +static inline unsigned int cpumask_any_distribute(const struct cpumask *srcp) { return cpumask_first(srcp); } -- cgit From 2248ccd80124e61c2c84a22b22409bab452e1f0c Mon Sep 17 00:00:00 2001 From: Sander Vanheule Date: Tue, 9 Aug 2022 19:36:34 +0200 Subject: lib/cpumask: add inline cpumask_next_wrap() for UP In the uniprocessor case, cpumask_next_wrap() can be simplified, as the number of valid argument combinations is limited: - 'start' can only be 0 - 'n' can only be -1 or 0 The only valid CPU that can then be returned, if any, will be the first one set in the provided 'mask'. For NR_CPUS == 1, include/linux/cpumask.h now provides an inline definition of cpumask_next_wrap(), which will conflict with the one provided by lib/cpumask.c. Make building of lib/cpumask.o again depend on CONFIG_SMP=y (i.e. NR_CPUS > 1) to avoid the re-definition. Suggested-by: Yury Norov Signed-off-by: Sander Vanheule Signed-off-by: Yury Norov --- include/linux/cpumask.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index d8c2a40f8beb..bd047864c7ac 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -262,7 +262,26 @@ unsigned int cpumask_next_and(int n, const struct cpumask *src1p, (cpu) = cpumask_next_zero((cpu), (mask)), \ (cpu) < nr_cpu_ids;) +#if NR_CPUS == 1 +static inline +unsigned int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool wrap) +{ + cpumask_check(start); + if (n != -1) + cpumask_check(n); + + /* + * Return the first available CPU when wrapping, or when starting before cpu0, + * since there is only one valid option. + */ + if (wrap && n >= 0) + return nr_cpumask_bits; + + return cpumask_first(mask); +} +#else unsigned int __pure cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool wrap); +#endif /** * for_each_cpu_wrap - iterate over every cpu in a mask, starting at a specified location -- cgit From 7f203bc89eb66d6afde7eae91347fc0352090cc3 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Fri, 29 Jul 2022 13:10:16 -1000 Subject: cgroup: Replace cgroup->ancestor_ids[] with ->ancestors[] Every cgroup knows all its ancestors through its ->ancestor_ids[]. There's no advantage to remembering the IDs instead of the pointers directly and this makes the array useless for finding an actual ancestor cgroup forcing cgroup_ancestor() to iteratively walk up the hierarchy instead. Let's replace cgroup->ancestor_ids[] with ->ancestors[] and remove the walking-up from cgroup_ancestor(). While at it, improve comments around cgroup_root->cgrp_ancestor_storage. This patch shouldn't cause user-visible behavior differences. v2: Update cgroup_ancestor() to use ->ancestors[]. v3: cgroup_root->cgrp_ancestor_storage's type is updated to match cgroup->ancestors[]. Better comments. Signed-off-by: Tejun Heo Acked-by: Namhyung Kim --- include/linux/cgroup-defs.h | 16 ++++++++++------ include/linux/cgroup.h | 8 +++----- 2 files changed, 13 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 4bcf56b3491c..1283993d7ea8 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -384,7 +384,7 @@ struct cgroup { /* * The depth this cgroup is at. The root is at depth zero and each * step down the hierarchy increments the level. This along with - * ancestor_ids[] can determine whether a given cgroup is a + * ancestors[] can determine whether a given cgroup is a * descendant of another without traversing the hierarchy. */ int level; @@ -504,8 +504,8 @@ struct cgroup { /* Used to store internal freezer state */ struct cgroup_freezer_state freezer; - /* ids of the ancestors at each level including self */ - u64 ancestor_ids[]; + /* All ancestors including self */ + struct cgroup *ancestors[]; }; /* @@ -522,11 +522,15 @@ struct cgroup_root { /* Unique id for this hierarchy. */ int hierarchy_id; - /* The root cgroup. Root is destroyed on its release. */ + /* + * The root cgroup. The containing cgroup_root will be destroyed on its + * release. cgrp->ancestors[0] will be used overflowing into the + * following field. cgrp_ancestor_storage must immediately follow. + */ struct cgroup cgrp; - /* for cgrp->ancestor_ids[0] */ - u64 cgrp_ancestor_id_storage; + /* must follow cgrp for cgrp->ancestors[0], see above */ + struct cgroup *cgrp_ancestor_storage; /* Number of cgroups in the hierarchy, used only for /proc/cgroups */ atomic_t nr_cgrps; diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index ed53bfe7c46c..4d143729b246 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -574,7 +574,7 @@ static inline bool cgroup_is_descendant(struct cgroup *cgrp, { if (cgrp->root != ancestor->root || cgrp->level < ancestor->level) return false; - return cgrp->ancestor_ids[ancestor->level] == cgroup_id(ancestor); + return cgrp->ancestors[ancestor->level] == ancestor; } /** @@ -591,11 +591,9 @@ static inline bool cgroup_is_descendant(struct cgroup *cgrp, static inline struct cgroup *cgroup_ancestor(struct cgroup *cgrp, int ancestor_level) { - if (cgrp->level < ancestor_level) + if (ancestor_level < 0 || ancestor_level > cgrp->level) return NULL; - while (cgrp && cgrp->level > ancestor_level) - cgrp = cgroup_parent(cgrp); - return cgrp; + return cgrp->ancestors[ancestor_level]; } /** -- cgit From 1e64b9c5f9a01f1a752438724bc83180c451e1c7 Mon Sep 17 00:00:00 2001 From: Nuno Sá Date: Fri, 15 Jul 2022 14:28:53 +0200 Subject: iio: inkern: move to fwnode properties MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This moves the IIO in kernel interface to use fwnode properties and thus be firmware agnostic. Note that the interface is still not firmware agnostic. At this point we have both OF and fwnode interfaces so that we don't break any user. On top of this we also want to have a per driver conversion and that is the main reason we have both of_xlate() and fwnode_xlate() support. Signed-off-by: Nuno Sá Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220715122903.332535-6-nuno.sa@analog.com Signed-off-by: Jonathan Cameron --- include/linux/iio/consumer.h | 36 ++++++++++++++++++++---------------- include/linux/iio/iio.h | 5 +++++ 2 files changed, 25 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iio/consumer.h b/include/linux/iio/consumer.h index 5fa5957586cf..2adb1306da3e 100644 --- a/include/linux/iio/consumer.h +++ b/include/linux/iio/consumer.h @@ -7,6 +7,7 @@ #ifndef _IIO_INKERN_CONSUMER_H_ #define _IIO_INKERN_CONSUMER_H_ +#include #include #include @@ -14,6 +15,7 @@ struct iio_dev; struct iio_chan_spec; struct device; struct device_node; +struct fwnode_handle; /** * struct iio_channel - everything needed for a consumer to use a channel @@ -99,26 +101,20 @@ void iio_channel_release_all(struct iio_channel *chan); struct iio_channel *devm_iio_channel_get_all(struct device *dev); /** - * of_iio_channel_get_by_name() - get description of all that is needed to access channel. - * @np: Pointer to consumer device tree node + * fwnode_iio_channel_get_by_name() - get description of all that is needed to access channel. + * @fwnode: Pointer to consumer Firmware node * @consumer_channel: Unique name to identify the channel on the consumer * side. This typically describes the channels use within * the consumer. E.g. 'battery_voltage' */ -#ifdef CONFIG_OF -struct iio_channel *of_iio_channel_get_by_name(struct device_node *np, const char *name); -#else -static inline struct iio_channel * -of_iio_channel_get_by_name(struct device_node *np, const char *name) -{ - return NULL; -} -#endif +struct iio_channel *fwnode_iio_channel_get_by_name(struct fwnode_handle *fwnode, + const char *name); /** - * devm_of_iio_channel_get_by_name() - Resource managed version of of_iio_channel_get_by_name(). + * devm_fwnode_iio_channel_get_by_name() - Resource managed version of + * fwnode_iio_channel_get_by_name(). * @dev: Pointer to consumer device. - * @np: Pointer to consumer device tree node + * @fwnode: Pointer to consumer Firmware node * @consumer_channel: Unique name to identify the channel on the consumer * side. This typically describes the channels use within * the consumer. E.g. 'battery_voltage' @@ -129,9 +125,17 @@ of_iio_channel_get_by_name(struct device_node *np, const char *name) * The allocated iio channel is automatically released when the device is * unbound. */ -struct iio_channel *devm_of_iio_channel_get_by_name(struct device *dev, - struct device_node *np, - const char *consumer_channel); +struct iio_channel *devm_fwnode_iio_channel_get_by_name(struct device *dev, + struct fwnode_handle *fwnode, + const char *consumer_channel); + +static inline struct iio_channel +*devm_of_iio_channel_get_by_name(struct device *dev, struct device_node *np, + const char *consumer_channel) +{ + return devm_fwnode_iio_channel_get_by_name(dev, of_fwnode_handle(np), + consumer_channel); +} struct iio_cb_buffer; /** diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h index 5dfbfc991c69..6002f8a0baf1 100644 --- a/include/linux/iio/iio.h +++ b/include/linux/iio/iio.h @@ -18,6 +18,7 @@ */ struct of_phandle_args; +struct fwnode_reference_args; enum iio_shared_by { IIO_SEPARATE, @@ -429,6 +430,8 @@ struct iio_trigger; /* forward declaration */ * provide a custom of_xlate function that reads the * *args* and returns the appropriate index in registered * IIO channels array. + * @fwnode_xlate: fwnode based function pointer to obtain channel specifier index. + * Functionally the same as @of_xlate. * @hwfifo_set_watermark: function pointer to set the current hardware * fifo watermark level; see hwfifo_* entries in * Documentation/ABI/testing/sysfs-bus-iio for details on @@ -510,6 +513,8 @@ struct iio_info { unsigned *readval); int (*of_xlate)(struct iio_dev *indio_dev, const struct of_phandle_args *iiospec); + int (*fwnode_xlate)(struct iio_dev *indio_dev, + const struct fwnode_reference_args *iiospec); int (*hwfifo_set_watermark)(struct iio_dev *indio_dev, unsigned val); int (*hwfifo_flush_to_buffer)(struct iio_dev *indio_dev, unsigned count); -- cgit From b22bc4d6072e74b768c4c19e2ff3585ba5927904 Mon Sep 17 00:00:00 2001 From: Nuno Sá Date: Fri, 15 Jul 2022 14:29:02 +0200 Subject: iio: inkern: remove OF dependencies MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Since all users of the OF dependendent API are now converted to use the firmware agnostic alternative, we can drop OF dependencies from the IIO in kernel interface. Signed-off-by: Nuno Sá Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220715122903.332535-15-nuno.sa@analog.com Signed-off-by: Jonathan Cameron --- include/linux/iio/consumer.h | 10 ---------- include/linux/iio/iio.h | 3 --- 2 files changed, 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iio/consumer.h b/include/linux/iio/consumer.h index 2adb1306da3e..6802596b017c 100644 --- a/include/linux/iio/consumer.h +++ b/include/linux/iio/consumer.h @@ -7,14 +7,12 @@ #ifndef _IIO_INKERN_CONSUMER_H_ #define _IIO_INKERN_CONSUMER_H_ -#include #include #include struct iio_dev; struct iio_chan_spec; struct device; -struct device_node; struct fwnode_handle; /** @@ -129,14 +127,6 @@ struct iio_channel *devm_fwnode_iio_channel_get_by_name(struct device *dev, struct fwnode_handle *fwnode, const char *consumer_channel); -static inline struct iio_channel -*devm_of_iio_channel_get_by_name(struct device *dev, struct device_node *np, - const char *consumer_channel) -{ - return devm_fwnode_iio_channel_get_by_name(dev, of_fwnode_handle(np), - consumer_channel); -} - struct iio_cb_buffer; /** * iio_channel_get_all_cb() - register callback for triggered capture diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h index 6002f8a0baf1..f0ec8a5e5a7a 100644 --- a/include/linux/iio/iio.h +++ b/include/linux/iio/iio.h @@ -17,7 +17,6 @@ * Currently assumes nano seconds. */ -struct of_phandle_args; struct fwnode_reference_args; enum iio_shared_by { @@ -511,8 +510,6 @@ struct iio_info { int (*debugfs_reg_access)(struct iio_dev *indio_dev, unsigned reg, unsigned writeval, unsigned *readval); - int (*of_xlate)(struct iio_dev *indio_dev, - const struct of_phandle_args *iiospec); int (*fwnode_xlate)(struct iio_dev *indio_dev, const struct fwnode_reference_args *iiospec); int (*hwfifo_set_watermark)(struct iio_dev *indio_dev, unsigned val); -- cgit From 5c64990b99aa91622cf044a9a2f7a78de8f770a2 Mon Sep 17 00:00:00 2001 From: Jonathan Cameron Date: Sun, 26 Jun 2022 13:29:32 +0100 Subject: iio: core: Introduce _zeropoint for differential channels Address an ABI gap for device where the offset of both lines in a differential pair may be controlled so as to allow a wider range of inputs, but without having any direct effect of the differential measurement. _offset cannot be used as to remain in line with existing usage, userspace would be expected to apply it as (_raw + _offset) * _scale whereas _zeropoint is not. i.e. If we were computing the differential in software it would be. ((postive_raw + _zeropoint) - (negative_raw + zeropoint) + _offset) * _scale = ((postive_raw - negative_raw) + _offset) * _scale = (differential_raw + _offset) * _scale Similarly calibbias is expected to tweak the measurement seen, not the adjust the two lines of the differential pair. Needed for in_capacitanceX-capacitanceY_zeropoint for the AD7746 CDC driver. Signed-off-by: Jonathan Cameron Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220626122938.582107-12-jic23@kernel.org --- include/linux/iio/types.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/iio/types.h b/include/linux/iio/types.h index a7aa91f3a8dc..27143b03909d 100644 --- a/include/linux/iio/types.h +++ b/include/linux/iio/types.h @@ -63,6 +63,7 @@ enum iio_chan_info_enum { IIO_CHAN_INFO_OVERSAMPLING_RATIO, IIO_CHAN_INFO_THERMOCOUPLE_TYPE, IIO_CHAN_INFO_CALIBAMBIENT, + IIO_CHAN_INFO_ZEROPOINT, }; #endif /* _IIO_TYPES_H_ */ -- cgit From 76b079ef4cc954fc2c2e0333a01855b0b2b6bdee Mon Sep 17 00:00:00 2001 From: Hao Jia Date: Sat, 6 Aug 2022 20:05:09 +0800 Subject: sched/psi: Remove unused parameter nbytes of psi_trigger_create() psi_trigger_create()'s 'nbytes' parameter is not used, so we can remove it. Signed-off-by: Hao Jia Reviewed-by: Ingo Molnar Acked-by: Johannes Weiner Signed-off-by: Tejun Heo --- include/linux/psi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/psi.h b/include/linux/psi.h index 89784763d19e..dd74411ac21d 100644 --- a/include/linux/psi.h +++ b/include/linux/psi.h @@ -27,7 +27,7 @@ void psi_memstall_leave(unsigned long *flags); int psi_show(struct seq_file *s, struct psi_group *group, enum psi_res res); struct psi_trigger *psi_trigger_create(struct psi_group *group, - char *buf, size_t nbytes, enum psi_res res); + char *buf, enum psi_res res); void psi_trigger_destroy(struct psi_trigger *t); __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file, -- cgit From d7ae5818c3fa3007dee13f9d99832e7f26b8bc44 Mon Sep 17 00:00:00 2001 From: Hao Jia Date: Sat, 6 Aug 2022 20:05:10 +0800 Subject: sched/psi: Remove redundant cgroup_psi() when !CONFIG_CGROUPS cgroup_psi() is only called under CONFIG_CGROUPS. We don't need cgroup_psi() when !CONFIG_CGROUPS, so we can remove it in this case. Signed-off-by: Hao Jia Reviewed-by: Ingo Molnar Acked-by: Johannes Weiner Signed-off-by: Tejun Heo --- include/linux/cgroup.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index ed53bfe7c46c..ac5d0515680e 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -734,11 +734,6 @@ static inline struct cgroup *cgroup_parent(struct cgroup *cgrp) return NULL; } -static inline struct psi_group *cgroup_psi(struct cgroup *cgrp) -{ - return NULL; -} - static inline bool cgroup_psi_enabled(void) { return false; -- cgit From 484b9fa4886bd9377969aad5e9ea17efda4ecda6 Mon Sep 17 00:00:00 2001 From: "Michael S. Tsirkin" Date: Tue, 16 Aug 2022 01:36:31 -0400 Subject: virtio: Revert "virtio: add helper virtio_find_vqs_ctx_size()" This reverts commit fe3dc04e31aa51f91dc7f741a5f76cc4817eb5b4: the API is now unused and in fact can't be implemented on top of a legacy device. Fixes: fe3dc04e31aa ("virtio: add helper virtio_find_vqs_ctx_size()") Cc: "Xuan Zhuo" Signed-off-by: Michael S. Tsirkin Message-Id: <20220816053602.173815-3-mst@redhat.com> --- include/linux/virtio_config.h | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index 6adff09f7170..888f7e96f0c7 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -241,18 +241,6 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, unsigned nvqs, ctx, desc); } -static inline -int virtio_find_vqs_ctx_size(struct virtio_device *vdev, u32 nvqs, - struct virtqueue *vqs[], - vq_callback_t *callbacks[], - const char * const names[], - u32 sizes[], - const bool *ctx, struct irq_affinity *desc) -{ - return vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names, sizes, - ctx, desc); -} - /** * virtio_synchronize_cbs - synchronize with virtqueue callbacks * @vdev: the device -- cgit From 9993a4f989c7ca5e227329b2878f65d05c9fc20f Mon Sep 17 00:00:00 2001 From: "Michael S. Tsirkin" Date: Tue, 16 Aug 2022 01:36:58 -0400 Subject: virtio: Revert "virtio: find_vqs() add arg sizes" This reverts commit a10fba0377145fccefea4dc4dd5915b7ed87e546: the proposed API isn't supported on all transports but no effort was made to address this. It might not be hard to fix if we want to: maybe just rename size to size_hint and make sure legacy transports ignore the hint. But it's not sure what the benefit is in any case, so let's drop it. Fixes: a10fba037714 ("virtio: find_vqs() add arg sizes") Signed-off-by: Michael S. Tsirkin Message-Id: <20220816053602.173815-8-mst@redhat.com> --- include/linux/virtio_config.h | 14 +++++--------- 1 file changed, 5 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index 888f7e96f0c7..36ec7be1f480 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -55,7 +55,6 @@ struct virtio_shm_region { * include a NULL entry for vqs that do not need a callback * names: array of virtqueue names (mainly for debugging) * include a NULL entry for vqs unused by driver - * sizes: array of virtqueue sizes * Returns 0 on success or error status * @del_vqs: free virtqueues found by find_vqs(). * @synchronize_cbs: synchronize with the virtqueue callbacks (optional) @@ -104,9 +103,7 @@ struct virtio_config_ops { void (*reset)(struct virtio_device *vdev); int (*find_vqs)(struct virtio_device *, unsigned nvqs, struct virtqueue *vqs[], vq_callback_t *callbacks[], - const char * const names[], - u32 sizes[], - const bool *ctx, + const char * const names[], const bool *ctx, struct irq_affinity *desc); void (*del_vqs)(struct virtio_device *); void (*synchronize_cbs)(struct virtio_device *); @@ -215,7 +212,7 @@ struct virtqueue *virtio_find_single_vq(struct virtio_device *vdev, const char *names[] = { n }; struct virtqueue *vq; int err = vdev->config->find_vqs(vdev, 1, &vq, callbacks, names, NULL, - NULL, NULL); + NULL); if (err < 0) return ERR_PTR(err); return vq; @@ -227,8 +224,7 @@ int virtio_find_vqs(struct virtio_device *vdev, unsigned nvqs, const char * const names[], struct irq_affinity *desc) { - return vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names, NULL, - NULL, desc); + return vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names, NULL, desc); } static inline @@ -237,8 +233,8 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, unsigned nvqs, const char * const names[], const bool *ctx, struct irq_affinity *desc) { - return vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names, NULL, - ctx, desc); + return vdev->config->find_vqs(vdev, nvqs, vqs, callbacks, names, ctx, + desc); } /** -- cgit From 5c669c4a4c6aa0489848093c93b8029f5c5c75ec Mon Sep 17 00:00:00 2001 From: Ricardo Cañuelo Date: Wed, 10 Aug 2022 11:40:03 +0200 Subject: virtio: kerneldocs fixes and enhancements MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix variable names in some kerneldocs, naming in others. Add kerneldocs for struct vring_desc and vring_interrupt. Signed-off-by: Ricardo Cañuelo Message-Id: <20220810094004.1250-2-ricardo.canuelo@collabora.com> Signed-off-by: Michael S. Tsirkin Reviewed-by: Cornelia Huck --- include/linux/virtio.h | 6 +++--- include/linux/virtio_config.h | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/virtio.h b/include/linux/virtio.h index a3f73bb6733e..dcab9c7e8784 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -11,7 +11,7 @@ #include /** - * virtqueue - a queue to register buffers for sending or receiving. + * struct virtqueue - a queue to register buffers for sending or receiving. * @list: the chain of virtqueues for this device * @callback: the function to call when buffers are consumed (can be NULL). * @name: the name of this virtqueue (mainly for debugging) @@ -97,7 +97,7 @@ int virtqueue_resize(struct virtqueue *vq, u32 num, void (*recycle)(struct virtqueue *vq, void *buf)); /** - * virtio_device - representation of a device using virtio + * struct virtio_device - representation of a device using virtio * @index: unique position on the virtio bus * @failed: saved value for VIRTIO_CONFIG_S_FAILED bit (for restore) * @config_enabled: configuration change reporting enabled @@ -156,7 +156,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev); list_for_each_entry(vq, &vdev->vqs, list) /** - * virtio_driver - operations for a virtio I/O driver + * struct virtio_driver - operations for a virtio I/O driver * @driver: underlying device driver (populate name and owner). * @id_table: the ids serviced by this driver. * @feature_table: an array of feature numbers supported by this driver. diff --git a/include/linux/virtio_config.h b/include/linux/virtio_config.h index 36ec7be1f480..4b517649cfe8 100644 --- a/include/linux/virtio_config.h +++ b/include/linux/virtio_config.h @@ -239,7 +239,7 @@ int virtio_find_vqs_ctx(struct virtio_device *vdev, unsigned nvqs, /** * virtio_synchronize_cbs - synchronize with virtqueue callbacks - * @vdev: the device + * @dev: the virtio device */ static inline void virtio_synchronize_cbs(struct virtio_device *dev) @@ -258,7 +258,7 @@ void virtio_synchronize_cbs(struct virtio_device *dev) /** * virtio_device_ready - enable vq use in probe function - * @vdev: the device + * @dev: the virtio device * * Driver must call this to use vqs in the probe function. * @@ -306,7 +306,7 @@ const char *virtio_bus_name(struct virtio_device *vdev) /** * virtqueue_set_affinity - setting affinity for a virtqueue * @vq: the virtqueue - * @cpu: the cpu no. + * @cpu_mask: the cpu mask * * Pay attention the function are best-effort: the affinity hint may not be set * due to config support, irq type and sharing. -- cgit From 6a8f359c3132e4f51bdb263ad74ec632c65e55fd Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Mon, 15 Aug 2022 10:02:29 +0200 Subject: gpio: pca953x: Make platform teardown callback return void MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit All platforms that provide a teardown callback return 0. New users are supposed to not make use of platform support, so there is no functionality lost. This patch is a preparation for making i2c remove callbacks return void. Reviewed-by: Andy Shevchenko Signed-off-by: Uwe Kleine-König Acked-by: Bartosz Golaszewski Signed-off-by: Wolfram Sang --- include/linux/platform_data/pca953x.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/pca953x.h b/include/linux/platform_data/pca953x.h index 4eb53e023997..96c1a14ab365 100644 --- a/include/linux/platform_data/pca953x.h +++ b/include/linux/platform_data/pca953x.h @@ -22,7 +22,7 @@ struct pca953x_platform_data { int (*setup)(struct i2c_client *client, unsigned gpio, unsigned ngpio, void *context); - int (*teardown)(struct i2c_client *client, + void (*teardown)(struct i2c_client *client, unsigned gpio, unsigned ngpio, void *context); const char *const *names; -- cgit From ed5c2f5fd10dda07263f79f338a512c0f49f76f5 Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Mon, 15 Aug 2022 10:02:30 +0200 Subject: i2c: Make remove callback return void MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The value returned by an i2c driver's remove function is mostly ignored. (Only an error message is printed if the value is non-zero that the error is ignored.) So change the prototype of the remove function to return no value. This way driver authors are not tempted to assume that passing an error to the upper layer is a good idea. All drivers are adapted accordingly. There is no intended change of behaviour, all callbacks were prepared to return 0 before. Reviewed-by: Peter Senna Tschudin Reviewed-by: Jeremy Kerr Reviewed-by: Benjamin Mugnier Reviewed-by: Javier Martinez Canillas Reviewed-by: Crt Mori Reviewed-by: Heikki Krogerus Acked-by: Greg Kroah-Hartman Acked-by: Marek Behún # for leds-turris-omnia Acked-by: Andy Shevchenko Reviewed-by: Petr Machata # for mlxsw Reviewed-by: Maximilian Luz # for surface3_power Acked-by: Srinivas Pandruvada # for bmc150-accel-i2c + kxcjk-1013 Reviewed-by: Hans Verkuil # for media/* + staging/media/* Acked-by: Miguel Ojeda # for auxdisplay/ht16k33 + auxdisplay/lcd2s Reviewed-by: Luca Ceresoli # for versaclock5 Reviewed-by: Ajay Gupta # for ucsi_ccg Acked-by: Jonathan Cameron # for iio Acked-by: Peter Rosin # for i2c-mux-*, max9860 Acked-by: Adrien Grassein # for lontium-lt8912b Reviewed-by: Jean Delvare # for hwmon, i2c-core and i2c/muxes Acked-by: Corey Minyard # for IPMI Reviewed-by: Vladimir Oltean Acked-by: Dmitry Torokhov Acked-by: Sebastian Reichel # for drivers/power Acked-by: Krzysztof Hałasa Signed-off-by: Uwe Kleine-König Signed-off-by: Wolfram Sang --- include/linux/i2c.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/i2c.h b/include/linux/i2c.h index 8eab5017bff3..f7c49bbdb8a1 100644 --- a/include/linux/i2c.h +++ b/include/linux/i2c.h @@ -273,7 +273,7 @@ struct i2c_driver { /* Standard driver model interfaces */ int (*probe)(struct i2c_client *client, const struct i2c_device_id *id); - int (*remove)(struct i2c_client *client); + void (*remove)(struct i2c_client *client); /* New driver model interface to aid the seamless removal of the * current probe()'s, more commonly unused than used second parameter. -- cgit From 680f8666baf6b4a6cc368dfff6614010ec23c51d Mon Sep 17 00:00:00 2001 From: Uwe Kleine-König Date: Mon, 18 Jul 2022 14:14:08 +0200 Subject: interconnect: Make icc_provider_del() return void MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit All users ignore the return value of icc_provider_del(). Consequently make it not return an error code. Signed-off-by: Uwe Kleine-König Link: https://lore.kernel.org/r/20220718121409.171773-8-u.kleine-koenig@pengutronix.de Signed-off-by: Georgi Djakov --- include/linux/interconnect-provider.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/interconnect-provider.h b/include/linux/interconnect-provider.h index 6bd01f7159c6..cd5c5a27557f 100644 --- a/include/linux/interconnect-provider.h +++ b/include/linux/interconnect-provider.h @@ -123,7 +123,7 @@ void icc_node_add(struct icc_node *node, struct icc_provider *provider); void icc_node_del(struct icc_node *node); int icc_nodes_remove(struct icc_provider *provider); int icc_provider_add(struct icc_provider *provider); -int icc_provider_del(struct icc_provider *provider); +void icc_provider_del(struct icc_provider *provider); struct icc_node_data *of_icc_get_from_provider(struct of_phandle_args *spec); void icc_sync_state(struct device *dev); @@ -172,9 +172,8 @@ static inline int icc_provider_add(struct icc_provider *provider) return -ENOTSUPP; } -static inline int icc_provider_del(struct icc_provider *provider) +static inline void icc_provider_del(struct icc_provider *provider) { - return -ENOTSUPP; } static inline struct icc_node_data *of_icc_get_from_provider(struct of_phandle_args *spec) -- cgit From 7cd4c5c2101cb092db00f61f69d24380cf7a0ee8 Mon Sep 17 00:00:00 2001 From: Frederick Lawler Date: Mon, 15 Aug 2022 11:20:25 -0500 Subject: security, lsm: Introduce security_create_user_ns() User namespaces are an effective tool to allow programs to run with permission without requiring the need for a program to run as root. User namespaces may also be used as a sandboxing technique. However, attackers sometimes leverage user namespaces as an initial attack vector to perform some exploit. [1,2,3] While it is not the unprivileged user namespace functionality, which causes the kernel to be exploitable, users/administrators might want to more granularly limit or at least monitor how various processes use this functionality, while vulnerable kernel subsystems are being patched. Preventing user namespace already creation comes in a few of forms in order of granularity: 1. /proc/sys/user/max_user_namespaces sysctl 2. Distro specific patch(es) 3. CONFIG_USER_NS To block a task based on its attributes, the LSM hook cred_prepare is a decent candidate for use because it provides more granular control, and it is called before create_user_ns(): cred = prepare_creds() security_prepare_creds() call_int_hook(cred_prepare, ... if (cred) create_user_ns(cred) Since security_prepare_creds() is meant for LSMs to copy and prepare credentials, access control is an unintended use of the hook. [4] Further, security_prepare_creds() will always return a ENOMEM if the hook returns any non-zero error code. This hook also does not handle the clone3 case which requires us to access a user space pointer to know if we're in the CLONE_NEW_USER call path which may be subject to a TOCTTOU attack. Lastly, cred_prepare is called in many call paths, and a targeted hook further limits the frequency of calls which is a beneficial outcome. Therefore introduce a new function security_create_user_ns() with an accompanying userns_create LSM hook. With the new userns_create hook, users will have more control over the observability and access control over user namespace creation. Users should expect that normal operation of user namespaces will behave as usual, and only be impacted when controls are implemented by users or administrators. This hook takes the prepared creds for LSM authors to write policy against. On success, the new namespace is applied to credentials, otherwise an error is returned. Links: 1. https://nvd.nist.gov/vuln/detail/CVE-2022-0492 2. https://nvd.nist.gov/vuln/detail/CVE-2022-25636 3. https://nvd.nist.gov/vuln/detail/CVE-2022-34918 4. https://lore.kernel.org/all/1c4b1c0d-12f6-6e9e-a6a3-cdce7418110c@schaufler-ca.com/ Reviewed-by: Christian Brauner (Microsoft) Reviewed-by: KP Singh Signed-off-by: Frederick Lawler Signed-off-by: Paul Moore --- include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 4 ++++ include/linux/security.h | 6 ++++++ 3 files changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index 806448173033..aa7272e83626 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -224,6 +224,7 @@ LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5) LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p, struct inode *inode) +LSM_HOOK(int, 0, userns_create, const struct cred *cred) LSM_HOOK(int, 0, ipc_permission, struct kern_ipc_perm *ipcp, short flag) LSM_HOOK(void, LSM_RET_VOID, ipc_getsecid, struct kern_ipc_perm *ipcp, u32 *secid) diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 84a0d7e02176..2e11a2a22ed1 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -806,6 +806,10 @@ * security attributes, e.g. for /proc/pid inodes. * @p contains the task_struct for the task. * @inode contains the inode structure for the inode. + * @userns_create: + * Check permission prior to creating a new user namespace. + * @cred points to prepared creds. + * Return 0 if successful, otherwise < 0 error code. * * Security hooks for Netlink messaging. * diff --git a/include/linux/security.h b/include/linux/security.h index 1bc362cb413f..767802fe9bfa 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -437,6 +437,7 @@ int security_task_kill(struct task_struct *p, struct kernel_siginfo *info, int security_task_prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5); void security_task_to_inode(struct task_struct *p, struct inode *inode); +int security_create_user_ns(const struct cred *cred); int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag); void security_ipc_getsecid(struct kern_ipc_perm *ipcp, u32 *secid); int security_msg_msg_alloc(struct msg_msg *msg); @@ -1194,6 +1195,11 @@ static inline int security_task_prctl(int option, unsigned long arg2, static inline void security_task_to_inode(struct task_struct *p, struct inode *inode) { } +static inline int security_create_user_ns(const struct cred *cred) +{ + return 0; +} + static inline int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag) { -- cgit From 0630f64d25a0f0a8c6a9ce9fde8750b3b561e6f5 Mon Sep 17 00:00:00 2001 From: Florian Fainelli Date: Mon, 15 Aug 2022 12:07:47 -0700 Subject: net: phy: broadcom: Implement suspend/resume for AC131 and BCM5241 Implement the suspend/resume procedure for the Broadcom AC131 and BCM5241 type of PHYs (10/100 only) by entering the standard power down followed by the proprietary standby mode in the auxiliary mode 4 shadow register. On resume, the PHY software reset is enough to make it come out of standby mode so we can utilize brcm_fet_config_init() as the resume hook. Signed-off-by: Florian Fainelli Signed-off-by: David S. Miller --- include/linux/brcmphy.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h index 6ff567ece34a..9e77165f3ef6 100644 --- a/include/linux/brcmphy.h +++ b/include/linux/brcmphy.h @@ -293,6 +293,7 @@ #define MII_BRCM_FET_SHDW_MC_FAME 0x4000 /* Force Auto MDIX enable */ #define MII_BRCM_FET_SHDW_AUXMODE4 0x1a /* Auxiliary mode 4 */ +#define MII_BRCM_FET_SHDW_AM4_STANDBY 0x0008 /* Standby enable */ #define MII_BRCM_FET_SHDW_AM4_LED_MASK 0x0003 #define MII_BRCM_FET_SHDW_AM4_LED_MODE1 0x0001 -- cgit From c20cc099b30abd50f563e422aa72edcd7f92da55 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Tue, 16 Aug 2022 22:48:31 +0200 Subject: regmap: Support accelerated noinc operations Several architectures have accelerated operations for MMIO operations writing to a single register, such as writesb, writesw, writesl, writesq, readsb, readsw, readsl and readsq but regmap currently cannot use them because we have no hooks for providing an accelerated noinc back-end for MMIO. Solve this by providing reg_[read/write]_noinc callbacks for the bus abstraction, so that the regmap-mmio bus can use this. Currently I do not see a need to support this for custom regmaps so it is only added to the bus. Callbacks are passed a void * with the array of values and a count which is the number of items of the byte chunk size for the specific register width. Signed-off-by: Linus Walleij Link: https://lore.kernel.org/r/20220816204832.265837-1-linus.walleij@linaro.org Signed-off-by: Mark Brown --- include/linux/regmap.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/regmap.h b/include/linux/regmap.h index 8cccc247cd37..ca3434dca3a0 100644 --- a/include/linux/regmap.h +++ b/include/linux/regmap.h @@ -492,8 +492,12 @@ typedef int (*regmap_hw_read)(void *context, void *val_buf, size_t val_size); typedef int (*regmap_hw_reg_read)(void *context, unsigned int reg, unsigned int *val); +typedef int (*regmap_hw_reg_noinc_read)(void *context, unsigned int reg, + void *val, size_t val_count); typedef int (*regmap_hw_reg_write)(void *context, unsigned int reg, unsigned int val); +typedef int (*regmap_hw_reg_noinc_write)(void *context, unsigned int reg, + const void *val, size_t val_count); typedef int (*regmap_hw_reg_update_bits)(void *context, unsigned int reg, unsigned int mask, unsigned int val); typedef struct regmap_async *(*regmap_hw_async_alloc)(void); @@ -514,6 +518,8 @@ typedef void (*regmap_hw_free_context)(void *context); * must serialise with respect to non-async I/O. * @reg_write: Write a single register value to the given register address. This * write operation has to complete when returning from the function. + * @reg_write_noinc: Write multiple register value to the same register. This + * write operation has to complete when returning from the function. * @reg_update_bits: Update bits operation to be used against volatile * registers, intended for devices supporting some mechanism * for setting clearing bits without having to @@ -541,9 +547,11 @@ struct regmap_bus { regmap_hw_gather_write gather_write; regmap_hw_async_write async_write; regmap_hw_reg_write reg_write; + regmap_hw_reg_noinc_write reg_noinc_write; regmap_hw_reg_update_bits reg_update_bits; regmap_hw_read read; regmap_hw_reg_read reg_read; + regmap_hw_reg_noinc_read reg_noinc_read; regmap_hw_free_context free_context; regmap_hw_async_alloc async_alloc; u8 read_flag_mask; -- cgit From 874de459488b8741afc0e9888d39f2e15a962b3d Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Thu, 14 Jul 2022 09:10:40 +0800 Subject: soundwire: add read_ping_status helper definition in manager ops The existing manager ops provide callbacks to transfer read/write commands, but don't allow for direct access to PING status register. This is accessible in all existing IP, and would help diagnose timeouts or resume issues by reporting the 'true' status instead of the internal status reported by the IP. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Acked-By: Vinod Koul Link: https://lore.kernel.org/r/20220714011043.46059-2-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown --- include/linux/soundwire/sdw.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index 39058c841469..eb840a07c25c 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -839,6 +839,8 @@ struct sdw_defer { * @set_bus_conf: Set the bus configuration * @pre_bank_switch: Callback for pre bank switch * @post_bank_switch: Callback for post bank switch + * @read_ping_status: Read status from PING frames, reported with two bits per Device. + * Bits 31:24 are reserved. */ struct sdw_master_ops { int (*read_prop)(struct sdw_bus *bus); @@ -855,6 +857,7 @@ struct sdw_master_ops { struct sdw_bus_params *params); int (*pre_bank_switch)(struct sdw_bus *bus); int (*post_bank_switch)(struct sdw_bus *bus); + u32 (*read_ping_status)(struct sdw_bus *bus); }; -- cgit From 79fe02cb7547fcc09e83b850cfd32896d7dc6289 Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Thu, 14 Jul 2022 09:10:42 +0800 Subject: soundwire: add sdw_show_ping_status() helper This helper provides an optional delay parameter to wait for devices to resync in case of errors, and checks that devices are indeed attached on the bus. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Acked-By: Vinod Koul Link: https://lore.kernel.org/r/20220714011043.46059-4-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown --- include/linux/soundwire/sdw.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index eb840a07c25c..822599957b35 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -922,6 +922,8 @@ int sdw_bus_master_add(struct sdw_bus *bus, struct device *parent, struct fwnode_handle *fwnode); void sdw_bus_master_delete(struct sdw_bus *bus); +void sdw_show_ping_status(struct sdw_bus *bus, bool sync_delay); + /** * sdw_port_config: Master or Slave Port configuration * -- cgit From 3fd6d6e2b4e80fe45bfd1c8f01dff7d30a0f9b53 Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Fri, 5 Aug 2022 00:43:17 +0200 Subject: thermal/of: Rework the thermal device tree initialization The following changes are reworking entirely the thermal device tree initialization. The old version is kept until the different drivers using it are converted to the new API. The old approach creates the different actors independently. This approach is the source of the code duplication in the thermal OF because a thermal zone is created but a sensor is registered after. The thermal zones are created unconditionnaly with a fake sensor at init time, thus forcing to provide fake ops and store all the thermal zone related information in duplicated structures. Then the sensor is initialized and the code looks up the thermal zone name using the device tree. Then the sensor is associated to the thermal zone, and the sensor specific ops are called with a second level of indirection from the thermal zone ops. When a sensor is removed (with a module unload), the thermal zone stays there with the fake sensor. The cooling device associated with a thermal zone and a trip point is stored in a list, again duplicating information, using the node name of the device tree to match afterwards the cooling devices. The new approach is simpler, it creates a thermal zone when the sensor is registered and destroys it when the sensor is removed. All the matching between the cooling device, trip points and thermal zones are done using the device tree, as well as bindings. The ops are no longer specific but uses the generic ones provided by the thermal framework. When the old code won't have any users, it can be removed and the remaining thermal OF code will be much simpler. Signed-off-by: Daniel Lezcano Link: https://lore.kernel.org/r/20220804224349.1926752-2-daniel.lezcano@linexp.org Signed-off-by: Daniel Lezcano --- include/linux/thermal.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index 1386c713885d..e2ac9d473bd6 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -325,6 +325,16 @@ struct thermal_zone_of_device_ops { /* Function declarations */ #ifdef CONFIG_THERMAL_OF +struct thermal_zone_device *thermal_of_zone_register(struct device_node *sensor, int id, void *data, + const struct thermal_zone_device_ops *ops); + +struct thermal_zone_device *devm_thermal_of_zone_register(struct device *dev, int id, void *data, + const struct thermal_zone_device_ops *ops); + +void thermal_of_zone_unregister(struct thermal_zone_device *tz); + +void devm_thermal_of_zone_unregister(struct device *dev, struct thermal_zone_device *tz); + int thermal_zone_of_get_sensor_id(struct device_node *tz_np, struct device_node *sensor_np, u32 *id); @@ -366,6 +376,14 @@ static inline struct thermal_zone_device *devm_thermal_zone_of_sensor_register( return ERR_PTR(-ENODEV); } +static inline void thermal_of_zone_unregister(struct thermal_zone_device *tz) +{ +} + +static inline void devm_thermal_of_zone_unregister(struct device *dev, struct thermal_zone_device *tz) +{ +} + static inline void devm_thermal_zone_of_sensor_unregister(struct device *dev, struct thermal_zone_device *tz) -- cgit From f59ac19b7f44cab23df84810e2da5f57bdd3a7e7 Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Fri, 5 Aug 2022 00:43:49 +0200 Subject: thermal/of: Remove old OF code All the drivers are converted to the new OF API, remove the old OF code. Signed-off-by: Daniel Lezcano Link: https://lore.kernel.org/r/20220804224349.1926752-34-daniel.lezcano@linexp.org Signed-off-by: Daniel Lezcano --- include/linux/thermal.h | 77 ++++++++++--------------------------------------- 1 file changed, 15 insertions(+), 62 deletions(-) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index e2ac9d473bd6..86c24ddd5985 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -296,33 +296,6 @@ struct thermal_zone_params { int offset; }; -/** - * struct thermal_zone_of_device_ops - callbacks for handling DT based zones - * - * Mandatory: - * @get_temp: a pointer to a function that reads the sensor temperature. - * - * Optional: - * @get_trend: a pointer to a function that reads the sensor temperature trend. - * @set_trips: a pointer to a function that sets a temperature window. When - * this window is left the driver must inform the thermal core via - * thermal_zone_device_update. - * @set_emul_temp: a pointer to a function that sets sensor emulated - * temperature. - * @set_trip_temp: a pointer to a function that sets the trip temperature on - * hardware. - * @change_mode: a pointer to a function that notifies the thermal zone - * mode change. - */ -struct thermal_zone_of_device_ops { - int (*get_temp)(void *, int *); - int (*get_trend)(void *, int, enum thermal_trend *); - int (*set_trips)(void *, int, int); - int (*set_emul_temp)(void *, int); - int (*set_trip_temp)(void *, int, int); - int (*change_mode) (void *, enum thermal_device_mode); -}; - /* Function declarations */ #ifdef CONFIG_THERMAL_OF struct thermal_zone_device *thermal_of_zone_register(struct device_node *sensor, int id, void *data, @@ -335,61 +308,41 @@ void thermal_of_zone_unregister(struct thermal_zone_device *tz); void devm_thermal_of_zone_unregister(struct device *dev, struct thermal_zone_device *tz); +void thermal_of_zone_unregister(struct thermal_zone_device *tz); + int thermal_zone_of_get_sensor_id(struct device_node *tz_np, struct device_node *sensor_np, u32 *id); -struct thermal_zone_device * -thermal_zone_of_sensor_register(struct device *dev, int id, void *data, - const struct thermal_zone_of_device_ops *ops); -void thermal_zone_of_sensor_unregister(struct device *dev, - struct thermal_zone_device *tz); -struct thermal_zone_device *devm_thermal_zone_of_sensor_register( - struct device *dev, int id, void *data, - const struct thermal_zone_of_device_ops *ops); -void devm_thermal_zone_of_sensor_unregister(struct device *dev, - struct thermal_zone_device *tz); #else - -static inline int thermal_zone_of_get_sensor_id(struct device_node *tz_np, - struct device_node *sensor_np, - u32 *id) -{ - return -ENOENT; -} -static inline struct thermal_zone_device * -thermal_zone_of_sensor_register(struct device *dev, int id, void *data, - const struct thermal_zone_of_device_ops *ops) -{ - return ERR_PTR(-ENODEV); -} - static inline -void thermal_zone_of_sensor_unregister(struct device *dev, - struct thermal_zone_device *tz) +struct thermal_zone_device *thermal_of_zone_register(struct device_node *sensor, int id, void *data, + const struct thermal_zone_device_ops *ops) { + return ERR_PTR(-ENOTSUPP); } -static inline struct thermal_zone_device *devm_thermal_zone_of_sensor_register( - struct device *dev, int id, void *data, - const struct thermal_zone_of_device_ops *ops) +static inline +struct thermal_zone_device *devm_thermal_of_zone_register(struct device *dev, int id, void *data, + const struct thermal_zone_device_ops *ops) { - return ERR_PTR(-ENODEV); + return ERR_PTR(-ENOTSUPP); } static inline void thermal_of_zone_unregister(struct thermal_zone_device *tz) { } -static inline void devm_thermal_of_zone_unregister(struct device *dev, struct thermal_zone_device *tz) +static inline void devm_thermal_of_zone_unregister(struct device *dev, + struct thermal_zone_device *tz) { } -static inline -void devm_thermal_zone_of_sensor_unregister(struct device *dev, - struct thermal_zone_device *tz) +static inline int thermal_zone_of_get_sensor_id(struct device_node *tz_np, + struct device_node *sensor_np, + u32 *id) { + return -ENOENT; } - #endif #ifdef CONFIG_THERMAL -- cgit From 25885a35a72007cf28ec5f9ba7169c5c798f7167 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Tue, 16 Aug 2022 11:57:56 -0400 Subject: Change calling conventions for filldir_t filldir_t instances (directory iterators callbacks) used to return 0 for "OK, keep going" or -E... for "stop". Note that it's *NOT* how the error values are reported - the rules for those are callback-dependent and ->iterate{,_shared}() instances only care about zero vs. non-zero (look at emit_dir() and friends). So let's just return bool ("should we keep going?") - it's less confusing that way. The choice between "true means keep going" and "true means stop" is bikesheddable; we have two groups of callbacks - do something for everything in directory, until we run into problem and find an entry in directory and do something to it. The former tended to use 0/-E... conventions - -E on failure. The latter tended to use 0/1, 1 being "stop, we are done". The callers treated anything non-zero as "stop", ignoring which non-zero value did they get. "true means stop" would be more natural for the second group; "true means keep going" - for the first one. I tried both variants and the things like if allocation failed something = -ENOMEM; return true; just looked unnatural and asking for trouble. [folded suggestion from Matthew Wilcox ] Acked-by: Christian Brauner (Microsoft) Signed-off-by: Al Viro --- include/linux/fs.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 9eced4cc286e..c40f68f62941 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2038,9 +2038,10 @@ umode_t mode_strip_sgid(struct user_namespace *mnt_userns, * the kernel specify what kind of dirent layout it wants to have. * This allows the kernel to read directories into kernel space or * to have different dirent layouts depending on the binary type. + * Return 'true' to keep going and 'false' if there are no more entries. */ struct dir_context; -typedef int (*filldir_t)(struct dir_context *, const char *, int, loff_t, u64, +typedef bool (*filldir_t)(struct dir_context *, const char *, int, loff_t, u64, unsigned); struct dir_context { @@ -3540,17 +3541,17 @@ static inline bool dir_emit(struct dir_context *ctx, const char *name, int namelen, u64 ino, unsigned type) { - return ctx->actor(ctx, name, namelen, ctx->pos, ino, type) == 0; + return ctx->actor(ctx, name, namelen, ctx->pos, ino, type); } static inline bool dir_emit_dot(struct file *file, struct dir_context *ctx) { return ctx->actor(ctx, ".", 1, ctx->pos, - file->f_path.dentry->d_inode->i_ino, DT_DIR) == 0; + file->f_path.dentry->d_inode->i_ino, DT_DIR); } static inline bool dir_emit_dotdot(struct file *file, struct dir_context *ctx) { return ctx->actor(ctx, "..", 2, ctx->pos, - parent_ino(file->f_path.dentry), DT_DIR) == 0; + parent_ino(file->f_path.dentry), DT_DIR); } static inline bool dir_emit_dots(struct file *file, struct dir_context *ctx) { -- cgit From e34cfee65ec891a319ce79797dda18083af33a76 Mon Sep 17 00:00:00 2001 From: Wong Vee Khee Date: Wed, 17 Aug 2022 14:43:24 +0800 Subject: stmmac: intel: remove unused 'has_crossts' flag The 'has_crossts' flag was not used anywhere in the stmmac driver, removing it from both header file and dwmac-intel driver. Signed-off-by: Wong Vee Khee Reviewed-by: Kurt Kanzenbach Link: https://lore.kernel.org/r/20220817064324.10025-1-veekhee@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/stmmac.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index 8df475db88c0..fb2e88614f5d 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -257,7 +257,6 @@ struct plat_stmmacenet_data { u8 vlan_fail_q; unsigned int eee_usecs_rate; struct pci_dev *pdev; - bool has_crossts; int int_snapshot_num; int ext_snapshot_num; bool int_snapshot_en; -- cgit From da279e6965b3838e99e5c0ab8f76b87bf86b31a5 Mon Sep 17 00:00:00 2001 From: Matti Vaittinen Date: Fri, 12 Aug 2022 13:10:37 +0300 Subject: regulator: Add devm helpers for get and enable A few regulator consumer drivers seem to be just getting a regulator, enabling it and registering a devm-action to disable the regulator at the driver detach and then forget about it. We can simplify this a bit by adding a devm-helper for this pattern. Add devm_regulator_get_enable() and devm_regulator_get_enable_optional() Signed-off-by: Matti Vaittinen Link: https://lore.kernel.org/r/ed7b8841193bb9749d426f3cb3b199c9460794cd.1660292316.git.mazziesaccount@gmail.com Signed-off-by: Mark Brown --- include/linux/regulator/consumer.h | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) (limited to 'include/linux') diff --git a/include/linux/regulator/consumer.h b/include/linux/regulator/consumer.h index bc6cda706d1f..ee3b4a014611 100644 --- a/include/linux/regulator/consumer.h +++ b/include/linux/regulator/consumer.h @@ -207,6 +207,8 @@ struct regulator *__must_check regulator_get_optional(struct device *dev, const char *id); struct regulator *__must_check devm_regulator_get_optional(struct device *dev, const char *id); +int devm_regulator_get_enable(struct device *dev, const char *id); +int devm_regulator_get_enable_optional(struct device *dev, const char *id); void regulator_put(struct regulator *regulator); void devm_regulator_put(struct regulator *regulator); @@ -244,12 +246,15 @@ int __must_check regulator_bulk_get(struct device *dev, int num_consumers, struct regulator_bulk_data *consumers); int __must_check devm_regulator_bulk_get(struct device *dev, int num_consumers, struct regulator_bulk_data *consumers); +void devm_regulator_bulk_put(struct regulator_bulk_data *consumers); int __must_check devm_regulator_bulk_get_const( struct device *dev, int num_consumers, const struct regulator_bulk_data *in_consumers, struct regulator_bulk_data **out_consumers); int __must_check regulator_bulk_enable(int num_consumers, struct regulator_bulk_data *consumers); +int devm_regulator_bulk_get_enable(struct device *dev, int num_consumers, + const char * const *id); int regulator_bulk_disable(int num_consumers, struct regulator_bulk_data *consumers); int regulator_bulk_force_disable(int num_consumers, @@ -354,6 +359,17 @@ devm_regulator_get_exclusive(struct device *dev, const char *id) return ERR_PTR(-ENODEV); } +static inline int devm_regulator_get_enable(struct device *dev, const char *id) +{ + return -ENODEV; +} + +static inline int devm_regulator_get_enable_optional(struct device *dev, + const char *id) +{ + return -ENODEV; +} + static inline struct regulator *__must_check regulator_get_optional(struct device *dev, const char *id) { @@ -375,6 +391,10 @@ static inline void devm_regulator_put(struct regulator *regulator) { } +static inline void devm_regulator_bulk_put(struct regulator_bulk_data *consumers) +{ +} + static inline int regulator_register_supply_alias(struct device *dev, const char *id, struct device *alias_dev, @@ -465,6 +485,13 @@ static inline int regulator_bulk_enable(int num_consumers, return 0; } +static inline int devm_regulator_bulk_get_enable(struct device *dev, + int num_consumers, + const char * const *id) +{ + return 0; +} + static inline int regulator_bulk_disable(int num_consumers, struct regulator_bulk_data *consumers) { -- cgit From a8239f0342bae5a51acca967ba95b9a8ad56dd62 Mon Sep 17 00:00:00 2001 From: Yu Kuai Date: Thu, 18 Aug 2022 14:35:55 +0800 Subject: blk-mq: remove unused function blk_mq_queue_stopped() blk_mq_queue_stopped() doesn't have any caller, which was found by code coverage test, thus remove it. Signed-off-by: Yu Kuai Link: https://lore.kernel.org/r/20220818063555.3741222-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index effee1dc715a..92294a5fb083 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -857,7 +857,6 @@ void blk_mq_kick_requeue_list(struct request_queue *q); void blk_mq_delay_kick_requeue_list(struct request_queue *q, unsigned long msecs); void blk_mq_complete_request(struct request *rq); bool blk_mq_complete_request_remote(struct request *rq); -bool blk_mq_queue_stopped(struct request_queue *q); void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx); void blk_mq_start_hw_queue(struct blk_mq_hw_ctx *hctx); void blk_mq_stop_hw_queues(struct request_queue *q); -- cgit From b5a5b9d5f28d23b84f06b45c61dcad95b07d41bc Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Thu, 18 Aug 2022 15:38:58 +0200 Subject: serial: document start_rx member at struct uart_ops Fix this doc build warning: ./include/linux/serial_core.h:397: warning: Function parameter or member 'start_rx' not described in 'uart_ops' Signed-off-by: Mauro Carvalho Chehab Link: https://lore.kernel.org/r/5d07ae2eec8fbad87e623160f9926b178bef2744.1660829433.git.mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index aef3145f2032..6e4f4765d209 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -141,6 +141,14 @@ struct gpio_desc; * Locking: none. * Interrupts: caller dependent. * + * @start_rx: ``void ()(struct uart_port *port)`` + * + * Start receiving characters. + * + * Locking: @port->lock taken. + * Interrupts: locally disabled. + * This call must not sleep + * * @stop_rx: ``void ()(struct uart_port *port)`` * * Stop receiving characters; the @port is in the process of being closed. -- cgit From c1e5c2f0cb8a22ec2e14af92afc7006491bebabb Mon Sep 17 00:00:00 2001 From: Pablo Sun Date: Thu, 4 Aug 2022 11:48:03 +0800 Subject: usb: typec: altmodes/displayport: correct pin assignment for UFP receptacles Fix incorrect pin assignment values when connecting to a monitor with Type-C receptacle instead of a plug. According to specification, an UFP_D receptacle's pin assignment should came from the UFP_D pin assignments field (bit 23:16), while an UFP_D plug's assignments are described in the DFP_D pin assignments (bit 15:8) during Mode Discovery. For example the LG 27 UL850-W is a monitor with Type-C receptacle. The monitor responds to MODE DISCOVERY command with following DisplayPort Capability flag: dp->alt->vdo=0x140045 The existing logic only take cares of UPF_D plug case, and would take the bit 15:8 for this 0x140045 case. This results in an non-existing pin assignment 0x0 in dp_altmode_configure. To fix this problem a new set of macros are introduced to take plug/receptacle differences into consideration. Fixes: 0e3bb7d6894d ("usb: typec: Add driver for DisplayPort alternate mode") Cc: stable@vger.kernel.org Co-developed-by: Pablo Sun Co-developed-by: Macpaul Lin Reviewed-by: Guillaume Ranquet Reviewed-by: Heikki Krogerus Signed-off-by: Pablo Sun Signed-off-by: Macpaul Lin Link: https://lore.kernel.org/r/20220804034803.19486-1-macpaul.lin@mediatek.com Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/typec_dp.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/usb/typec_dp.h b/include/linux/usb/typec_dp.h index cfb916cccd31..8d09c2f0a9b8 100644 --- a/include/linux/usb/typec_dp.h +++ b/include/linux/usb/typec_dp.h @@ -73,6 +73,11 @@ enum { #define DP_CAP_USB BIT(7) #define DP_CAP_DFP_D_PIN_ASSIGN(_cap_) (((_cap_) & GENMASK(15, 8)) >> 8) #define DP_CAP_UFP_D_PIN_ASSIGN(_cap_) (((_cap_) & GENMASK(23, 16)) >> 16) +/* Get pin assignment taking plug & receptacle into consideration */ +#define DP_CAP_PIN_ASSIGN_UFP_D(_cap_) ((_cap_ & DP_CAP_RECEPTACLE) ? \ + DP_CAP_UFP_D_PIN_ASSIGN(_cap_) : DP_CAP_DFP_D_PIN_ASSIGN(_cap_)) +#define DP_CAP_PIN_ASSIGN_DFP_D(_cap_) ((_cap_ & DP_CAP_RECEPTACLE) ? \ + DP_CAP_DFP_D_PIN_ASSIGN(_cap_) : DP_CAP_UFP_D_PIN_ASSIGN(_cap_)) /* DisplayPort Status Update VDO bits */ #define DP_STATUS_CONNECTION(_status_) ((_status_) & 3) -- cgit From 77947238dad3b83bbe085ebcd576ea55afb70425 Mon Sep 17 00:00:00 2001 From: Prashant Malani Date: Tue, 16 Aug 2022 21:48:29 +0000 Subject: platform/chrome: Add Type-C mux set command definitions Copy EC header definitions for the USB Type-C Mux control command from the EC code base. Also pull in "TBT_UFP_REPLY" definitions, since that is the prior entry in the enum. These headers are already present in the EC code base. [1] [1] https://chromium.googlesource.com/chromiumos/platform/ec/+/b80f85a94a423273c1638ef7b662c56931a138dd/include/ec_commands.h Signed-off-by: Prashant Malani Reviewed-by: Tzung-Bi Shih Link: https://lore.kernel.org/r/20220816214857.2088914-2-pmalani@chromium.org --- include/linux/platform_data/cros_ec_commands.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/cros_ec_commands.h b/include/linux/platform_data/cros_ec_commands.h index 8b1b795867a1..5744a2d746aa 100644 --- a/include/linux/platform_data/cros_ec_commands.h +++ b/include/linux/platform_data/cros_ec_commands.h @@ -5724,8 +5724,21 @@ enum typec_control_command { TYPEC_CONTROL_COMMAND_EXIT_MODES, TYPEC_CONTROL_COMMAND_CLEAR_EVENTS, TYPEC_CONTROL_COMMAND_ENTER_MODE, + TYPEC_CONTROL_COMMAND_TBT_UFP_REPLY, + TYPEC_CONTROL_COMMAND_USB_MUX_SET, }; +/* Replies the AP may specify to the TBT EnterMode command as a UFP */ +enum typec_tbt_ufp_reply { + TYPEC_TBT_UFP_REPLY_NAK, + TYPEC_TBT_UFP_REPLY_ACK, +}; + +struct typec_usb_mux_set { + uint8_t mux_index; /* Index of the mux to set in the chain */ + uint8_t mux_flags; /* USB_PD_MUX_*-encoded USB mux state to set */ +} __ec_align1; + struct ec_params_typec_control { uint8_t port; uint8_t command; /* enum typec_control_command */ @@ -5739,6 +5752,8 @@ struct ec_params_typec_control { union { uint32_t clear_events_mask; uint8_t mode_to_enter; /* enum typec_mode */ + uint8_t tbt_ufp_reply; /* enum typec_tbt_ufp_reply */ + struct typec_usb_mux_set mux_params; uint8_t placeholder[128]; }; } __ec_align1; @@ -5817,6 +5832,9 @@ enum tcpc_cc_polarity { #define PD_STATUS_EVENT_SOP_DISC_DONE BIT(0) #define PD_STATUS_EVENT_SOP_PRIME_DISC_DONE BIT(1) #define PD_STATUS_EVENT_HARD_RESET BIT(2) +#define PD_STATUS_EVENT_DISCONNECTED BIT(3) +#define PD_STATUS_EVENT_MUX_0_SET_DONE BIT(4) +#define PD_STATUS_EVENT_MUX_1_SET_DONE BIT(5) struct ec_params_typec_status { uint8_t port; -- cgit From 24426654ed3ae83d1127511891fb782c54f49203 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Tue, 16 Aug 2022 23:17:17 -0700 Subject: bpf: net: Avoid sk_setsockopt() taking sk lock when called from bpf Most of the code in bpf_setsockopt(SOL_SOCKET) are duplicated from the sk_setsockopt(). The number of supported optnames are increasing ever and so as the duplicated code. One issue in reusing sk_setsockopt() is that the bpf prog has already acquired the sk lock. This patch adds a has_current_bpf_ctx() to tell if the sk_setsockopt() is called from a bpf prog. The bpf prog calling bpf_setsockopt() is either running in_task() or in_serving_softirq(). Both cases have the current->bpf_ctx initialized. Thus, the has_current_bpf_ctx() only needs to test !!current->bpf_ctx. This patch also adds sockopt_{lock,release}_sock() helpers for sk_setsockopt() to use. These helpers will test has_current_bpf_ctx() before acquiring/releasing the lock. They are in EXPORT_SYMBOL for the ipv6 module to use in a latter patch. Note on the change in sock_setbindtodevice(). sockopt_lock_sock() is done in sock_setbindtodevice() instead of doing the lock_sock in sock_bindtoindex(..., lock_sk = true). Reviewed-by: Stanislav Fomichev Signed-off-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220817061717.4175589-1-kafai@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a627a02cf8ab..39bd36359c1e 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1966,6 +1966,15 @@ static inline bool unprivileged_ebpf_enabled(void) return !sysctl_unprivileged_bpf_disabled; } +/* Not all bpf prog type has the bpf_ctx. + * For the bpf prog type that has initialized the bpf_ctx, + * this function can be used to decide if a kernel function + * is called by a bpf program. + */ +static inline bool has_current_bpf_ctx(void) +{ + return !!current->bpf_ctx; +} #else /* !CONFIG_BPF_SYSCALL */ static inline struct bpf_prog *bpf_prog_get(u32 ufd) { @@ -2175,6 +2184,10 @@ static inline bool unprivileged_ebpf_enabled(void) return false; } +static inline bool has_current_bpf_ctx(void) +{ + return false; +} #endif /* CONFIG_BPF_SYSCALL */ void __bpf_free_used_btfs(struct bpf_prog_aux *aux, -- cgit From bdd1c37a315bc50ab14066c4852bc8dcf070451e Mon Sep 17 00:00:00 2001 From: Chao Peng Date: Tue, 16 Aug 2022 20:53:21 +0800 Subject: KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTS KVM_INTERNAL_MEM_SLOTS better reflects the fact those slots are KVM internally used (invisible to userspace) and avoids confusion to future private slots that can have different meaning. Signed-off-by: Chao Peng Message-Id: <20220816125322.1110439-2-chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 1c480b1821e1..fd5c3b4715f8 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -656,12 +656,12 @@ struct kvm_irq_routing_table { }; #endif -#ifndef KVM_PRIVATE_MEM_SLOTS -#define KVM_PRIVATE_MEM_SLOTS 0 +#ifndef KVM_INTERNAL_MEM_SLOTS +#define KVM_INTERNAL_MEM_SLOTS 0 #endif #define KVM_MEM_SLOTS_NUM SHRT_MAX -#define KVM_USER_MEM_SLOTS (KVM_MEM_SLOTS_NUM - KVM_PRIVATE_MEM_SLOTS) +#define KVM_USER_MEM_SLOTS (KVM_MEM_SLOTS_NUM - KVM_INTERNAL_MEM_SLOTS) #ifndef __KVM_VCPU_MULTIPLE_ADDRESS_SPACE static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu) -- cgit From 20ec3ebd707c77fb9b11b37193449193d4649f33 Mon Sep 17 00:00:00 2001 From: Chao Peng Date: Tue, 16 Aug 2022 20:53:22 +0800 Subject: KVM: Rename mmu_notifier_* to mmu_invalidate_* The motivation of this renaming is to make these variables and related helper functions less mmu_notifier bound and can also be used for non mmu_notifier based page invalidation. mmu_invalidate_* was chosen to better describe the purpose of 'invalidating' a page that those variables are used for. - mmu_notifier_seq/range_start/range_end are renamed to mmu_invalidate_seq/range_start/range_end. - mmu_notifier_retry{_hva} helper functions are renamed to mmu_invalidate_retry{_hva}. - mmu_notifier_count is renamed to mmu_invalidate_in_progress to avoid confusion with mn_active_invalidate_count. - While here, also update kvm_inc/dec_notifier_count() to kvm_mmu_invalidate_begin/end() to match the change for mmu_notifier_count. No functional change intended. Signed-off-by: Chao Peng Message-Id: <20220816125322.1110439-3-chao.p.peng@linux.intel.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 60 +++++++++++++++++++++++++----------------------- 1 file changed, 31 insertions(+), 29 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index fd5c3b4715f8..f4519d3689e1 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -765,10 +765,10 @@ struct kvm { #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) struct mmu_notifier mmu_notifier; - unsigned long mmu_notifier_seq; - long mmu_notifier_count; - unsigned long mmu_notifier_range_start; - unsigned long mmu_notifier_range_end; + unsigned long mmu_invalidate_seq; + long mmu_invalidate_in_progress; + unsigned long mmu_invalidate_range_start; + unsigned long mmu_invalidate_range_end; #endif struct list_head devices; u64 manual_dirty_log_protect; @@ -1357,10 +1357,10 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc); void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc); #endif -void kvm_inc_notifier_count(struct kvm *kvm, unsigned long start, - unsigned long end); -void kvm_dec_notifier_count(struct kvm *kvm, unsigned long start, - unsigned long end); +void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start, + unsigned long end); +void kvm_mmu_invalidate_end(struct kvm *kvm, unsigned long start, + unsigned long end); long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg); @@ -1907,42 +1907,44 @@ extern const struct kvm_stats_header kvm_vcpu_stats_header; extern const struct _kvm_stats_desc kvm_vcpu_stats_desc[]; #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER) -static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq) +static inline int mmu_invalidate_retry(struct kvm *kvm, unsigned long mmu_seq) { - if (unlikely(kvm->mmu_notifier_count)) + if (unlikely(kvm->mmu_invalidate_in_progress)) return 1; /* - * Ensure the read of mmu_notifier_count happens before the read - * of mmu_notifier_seq. This interacts with the smp_wmb() in - * mmu_notifier_invalidate_range_end to make sure that the caller - * either sees the old (non-zero) value of mmu_notifier_count or - * the new (incremented) value of mmu_notifier_seq. - * PowerPC Book3s HV KVM calls this under a per-page lock - * rather than under kvm->mmu_lock, for scalability, so - * can't rely on kvm->mmu_lock to keep things ordered. + * Ensure the read of mmu_invalidate_in_progress happens before + * the read of mmu_invalidate_seq. This interacts with the + * smp_wmb() in mmu_notifier_invalidate_range_end to make sure + * that the caller either sees the old (non-zero) value of + * mmu_invalidate_in_progress or the new (incremented) value of + * mmu_invalidate_seq. + * + * PowerPC Book3s HV KVM calls this under a per-page lock rather + * than under kvm->mmu_lock, for scalability, so can't rely on + * kvm->mmu_lock to keep things ordered. */ smp_rmb(); - if (kvm->mmu_notifier_seq != mmu_seq) + if (kvm->mmu_invalidate_seq != mmu_seq) return 1; return 0; } -static inline int mmu_notifier_retry_hva(struct kvm *kvm, - unsigned long mmu_seq, - unsigned long hva) +static inline int mmu_invalidate_retry_hva(struct kvm *kvm, + unsigned long mmu_seq, + unsigned long hva) { lockdep_assert_held(&kvm->mmu_lock); /* - * If mmu_notifier_count is non-zero, then the range maintained by - * kvm_mmu_notifier_invalidate_range_start contains all addresses that - * might be being invalidated. Note that it may include some false + * If mmu_invalidate_in_progress is non-zero, then the range maintained + * by kvm_mmu_notifier_invalidate_range_start contains all addresses + * that might be being invalidated. Note that it may include some false * positives, due to shortcuts when handing concurrent invalidations. */ - if (unlikely(kvm->mmu_notifier_count) && - hva >= kvm->mmu_notifier_range_start && - hva < kvm->mmu_notifier_range_end) + if (unlikely(kvm->mmu_invalidate_in_progress) && + hva >= kvm->mmu_invalidate_range_start && + hva < kvm->mmu_invalidate_range_end) return 1; - if (kvm->mmu_notifier_seq != mmu_seq) + if (kvm->mmu_invalidate_seq != mmu_seq) return 1; return 0; } -- cgit From 2c8cc0946c14c39b02748fba34325ecae636530a Mon Sep 17 00:00:00 2001 From: Gene Chen Date: Fri, 5 Aug 2022 15:17:12 +0800 Subject: usb: typec: tcpci: Move function "tcpci_to_typec_cc" to common Move transition function "tcpci_to_typec_cc" to common header Reviewed-by: Guenter Roeck Acked-by: Heikki Krogerus Signed-off-by: Gene Chen Link: https://lore.kernel.org/r/20220805071714.150882-7-gene.chen.richtek@gmail.com Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/tcpci.h | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) (limited to 'include/linux') diff --git a/include/linux/usb/tcpci.h b/include/linux/usb/tcpci.h index 20c0bedb8ec8..17657451c762 100644 --- a/include/linux/usb/tcpci.h +++ b/include/linux/usb/tcpci.h @@ -167,6 +167,11 @@ /* I2C_WRITE_BYTE_COUNT + 1 when TX_BUF_BYTE_x is only accessible I2C_WRITE_BYTE_COUNT */ #define TCPC_TRANSMIT_BUFFER_MAX_LEN 31 +#define tcpc_presenting_rd(reg, cc) \ + (!(TCPC_ROLE_CTRL_DRP & (reg)) && \ + (((reg) & (TCPC_ROLE_CTRL_## cc ##_MASK << TCPC_ROLE_CTRL_## cc ##_SHIFT)) == \ + (TCPC_ROLE_CTRL_CC_RD << TCPC_ROLE_CTRL_## cc ##_SHIFT))) + struct tcpci; /* @@ -207,4 +212,21 @@ irqreturn_t tcpci_irq(struct tcpci *tcpci); struct tcpm_port; struct tcpm_port *tcpci_get_tcpm_port(struct tcpci *tcpci); + +static inline enum typec_cc_status tcpci_to_typec_cc(unsigned int cc, bool sink) +{ + switch (cc) { + case 0x1: + return sink ? TYPEC_CC_RP_DEF : TYPEC_CC_RA; + case 0x2: + return sink ? TYPEC_CC_RP_1_5 : TYPEC_CC_RD; + case 0x3: + if (sink) + return TYPEC_CC_RP_3_0; + fallthrough; + case 0x0: + default: + return TYPEC_CC_OPEN; + } +} #endif /* __LINUX_USB_TCPCI_H */ -- cgit From 77bfa0fc7536e8fa7dc6f12081827e0edd75b0f9 Mon Sep 17 00:00:00 2001 From: Jim Lin Date: Tue, 16 Aug 2022 16:23:52 +0800 Subject: phy: tegra: xusb: add utmi pad power on/down ops Add utmi_pad_power_on/down ops for each SOC instead of exporting tegra_phy_xusb_utmi_pad_power_on/down directly for Tegra186 chip. Signed-off-by: BH Hsieh Signed-off-by: Jim Lin Link: https://lore.kernel.org/r/20220816082353.13390-2-jilin@nvidia.com Signed-off-by: Greg Kroah-Hartman --- include/linux/phy/tegra/xusb.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/phy/tegra/xusb.h b/include/linux/phy/tegra/xusb.h index 3a35e74cdc61..70998e6dd6fd 100644 --- a/include/linux/phy/tegra/xusb.h +++ b/include/linux/phy/tegra/xusb.h @@ -1,6 +1,6 @@ /* SPDX-License-Identifier: GPL-2.0-only */ /* - * Copyright (c) 2016-2020, NVIDIA CORPORATION. All rights reserved. + * Copyright (c) 2016-2022, NVIDIA CORPORATION. All rights reserved. */ #ifndef PHY_TEGRA_XUSB_H @@ -21,6 +21,8 @@ int tegra_xusb_padctl_usb3_set_lfps_detect(struct tegra_xusb_padctl *padctl, unsigned int port, bool enable); int tegra_xusb_padctl_set_vbus_override(struct tegra_xusb_padctl *padctl, bool val); +void tegra_phy_xusb_utmi_pad_power_on(struct phy *phy); +void tegra_phy_xusb_utmi_pad_power_down(struct phy *phy); int tegra_phy_xusb_utmi_port_reset(struct phy *phy); int tegra_xusb_padctl_get_usb3_companion(struct tegra_xusb_padctl *padctl, unsigned int port); -- cgit From 36cb6494429bd64b27b7ff8b4af56f8e526da2b4 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Thu, 28 Jul 2022 18:22:20 +0800 Subject: hwrng: core - let sleep be interrupted when unregistering hwrng MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There are two deadlock scenarios that need addressing, which cause problems when the computer goes to sleep, the interface is set down, and hwrng_unregister() is called. When the deadlock is hit, sleep is delayed for tens of seconds, causing it to fail. These scenarios are: 1) The hwrng kthread can't be stopped while it's sleeping, because it uses msleep_interruptible() which does not react to kthread_stop. 2) A normal user thread can't be interrupted by hwrng_unregister() while it's sleeping, because hwrng_unregister() is called from elsewhere. We solve both issues by add a completion object called dying that fulfils waiters once we have started the process in hwrng_unregister. At the same time, we should cleanup a common and useless dmesg splat in the same area. Cc: Reported-by: Gregory Erwin Fixes: fcd09c90c3c5 ("ath9k: use hw_random API instead of directly dumping into random.c") Link: https://lore.kernel.org/all/CAO+Okf6ZJC5-nTE_EJUGQtd8JiCkiEHytGgDsFGTEjs0c00giw@mail.gmail.com/ Link: https://lore.kernel.org/lkml/CAO+Okf5k+C+SE6pMVfPf-d8MfVPVq4PO7EY8Hys_DVXtent3HA@mail.gmail.com/ Link: https://bugs.archlinux.org/task/75138 Signed-off-by: Jason A. Donenfeld Signed-off-by: Herbert Xu Acked-by: Toke Høiland-Jørgensen Acked-by: Kalle Valo Signed-off-by: Herbert Xu --- include/linux/hw_random.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hw_random.h b/include/linux/hw_random.h index aa1d4da03538..77c2885c4c13 100644 --- a/include/linux/hw_random.h +++ b/include/linux/hw_random.h @@ -50,6 +50,7 @@ struct hwrng { struct list_head list; struct kref ref; struct completion cleanup_done; + struct completion dying; }; struct device; @@ -61,4 +62,6 @@ extern int devm_hwrng_register(struct device *dev, struct hwrng *rng); extern void hwrng_unregister(struct hwrng *rng); extern void devm_hwrng_unregister(struct device *dve, struct hwrng *rng); +extern long hwrng_msleep(struct hwrng *rng, unsigned int msecs); + #endif /* LINUX_HWRANDOM_H_ */ -- cgit From 0f60d28828dd94779c6527440289e1c36a05115a Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 30 Jan 2022 15:03:49 -0500 Subject: dynamic_dname(): drop unused dentry argument Signed-off-by: Al Viro --- include/linux/dcache.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dcache.h b/include/linux/dcache.h index 92c78ed02b54..54d46518c481 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -287,8 +287,8 @@ static inline unsigned d_count(const struct dentry *dentry) /* * helper function for dentry_operations.d_dname() members */ -extern __printf(4, 5) -char *dynamic_dname(struct dentry *, char *, int, const char *, ...); +extern __printf(3, 4) +char *dynamic_dname(char *, int, const char *, ...); extern char *__d_path(const struct path *, const struct path *, char *, int); extern char *d_absolute_path(const struct path *, char *, int); -- cgit From a357f7b4583ebf81d19c95aef57497ae81c5f63c Mon Sep 17 00:00:00 2001 From: John Garry Date: Wed, 17 Aug 2022 23:20:08 +0800 Subject: ata: libata: Set __ATA_BASE_SHT max_sectors Commit 0568e6122574 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors") inadvertently capped the max_sectors value for some SATA disks to a value which is lower than we would want. For a device which supports LBA48, we would previously have request queue max_sectors_kb and max_hw_sectors_kb values of 1280 and 32767 respectively. For AHCI controllers, the value chosen for shost max sectors comes from the minimum of the SCSI host default max sectors in SCSI_DEFAULT_MAX_SECTORS (1024) and the shost DMA device mapping limit. This means that we would now set the max_sectors_kb and max_hw_sectors_kb values for a disk which supports LBA48 at 512, ignoring DMA mapping limit. As report by Oliver at [0], this caused a performance regression. Fix by picking a large enough max sectors value for ATA host controllers such that we don't needlessly reduce max_sectors_kb for LBA48 disks. [0] https://lore.kernel.org/linux-ide/YvsGbidf3na5FpGb@xsang-OptiPlex-9020/T/#m22d9fc5ad15af66066dd9fecf3d50f1b1ef11da3 Fixes: 0568e6122574 ("ata: libata-scsi: cap ata_device->max_sectors according to shost->max_sectors") Reported-by: Oliver Sang Signed-off-by: John Garry Signed-off-by: Damien Le Moal --- include/linux/libata.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index 0269ff114f5a..698032e5ef2d 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -1382,7 +1382,8 @@ extern const struct attribute_group *ata_common_sdev_groups[]; .proc_name = drv_name, \ .slave_destroy = ata_scsi_slave_destroy, \ .bios_param = ata_std_bios_param, \ - .unlock_native_capacity = ata_scsi_unlock_native_capacity + .unlock_native_capacity = ata_scsi_unlock_native_capacity,\ + .max_sectors = ATA_MAX_SECTORS_LBA48 #define ATA_SUBBASE_SHT(drv_name) \ __ATA_BASE_SHT(drv_name), \ -- cgit From 5535be3099717646781ce1540cf725965d680e7b Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Tue, 9 Aug 2022 22:56:40 +0200 Subject: mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW Ever since the Dirty COW (CVE-2016-5195) security issue happened, we know that FOLL_FORCE can be possibly dangerous, especially if there are races that can be exploited by user space. Right now, it would be sufficient to have some code that sets a PTE of a R/O-mapped shared page dirty, in order for it to erroneously become writable by FOLL_FORCE. The implications of setting a write-protected PTE dirty might not be immediately obvious to everyone. And in fact ever since commit 9ae0f87d009c ("mm/shmem: unconditionally set pte dirty in mfill_atomic_install_pte"), we can use UFFDIO_CONTINUE to map a shmem page R/O while marking the pte dirty. This can be used by unprivileged user space to modify tmpfs/shmem file content even if the user does not have write permissions to the file, and to bypass memfd write sealing -- Dirty COW restricted to tmpfs/shmem (CVE-2022-2590). To fix such security issues for good, the insight is that we really only need that fancy retry logic (FOLL_COW) for COW mappings that are not writable (!VM_WRITE). And in a COW mapping, we really only broke COW if we have an exclusive anonymous page mapped. If we have something else mapped, or the mapped anonymous page might be shared (!PageAnonExclusive), we have to trigger a write fault to break COW. If we don't find an exclusive anonymous page when we retry, we have to trigger COW breaking once again because something intervened. Let's move away from this mandatory-retry + dirty handling and rely on our PageAnonExclusive() flag for making a similar decision, to use the same COW logic as in other kernel parts here as well. In case we stumble over a PTE in a COW mapping that does not map an exclusive anonymous page, COW was not properly broken and we have to trigger a fake write-fault to break COW. Just like we do in can_change_pte_writable() added via commit 64fe24a3e05e ("mm/mprotect: try avoiding write faults for exclusive anonymous pages when changing protection") and commit 76aefad628aa ("mm/mprotect: fix soft-dirty check in can_change_pte_writable()"), take care of softdirty and uffd-wp manually. For example, a write() via /proc/self/mem to a uffd-wp-protected range has to fail instead of silently granting write access and bypassing the userspace fault handler. Note that FOLL_FORCE is not only used for debug access, but also triggered by applications without debug intentions, for example, when pinning pages via RDMA. This fixes CVE-2022-2590. Note that only x86_64 and aarch64 are affected, because only those support CONFIG_HAVE_ARCH_USERFAULTFD_MINOR. Fortunately, FOLL_COW is no longer required to handle FOLL_FORCE. So let's just get rid of it. Thanks to Nadav Amit for pointing out that the pte_dirty() check in FOLL_FORCE code is problematic and might be exploitable. Note 1: We don't check for the PTE being dirty because it doesn't matter for making a "was COWed" decision anymore, and whoever modifies the page has to set the page dirty either way. Note 2: Kernels before extended uffd-wp support and before PageAnonExclusive (< 5.19) can simply revert the problematic commit instead and be safe regarding UFFDIO_CONTINUE. A backport to v5.19 requires minor adjustments due to lack of vma_soft_dirty_enabled(). Link: https://lkml.kernel.org/r/20220809205640.70916-1-david@redhat.com Fixes: 9ae0f87d009c ("mm/shmem: unconditionally set pte dirty in mfill_atomic_install_pte") Signed-off-by: David Hildenbrand Cc: Greg Kroah-Hartman Cc: Axel Rasmussen Cc: Nadav Amit Cc: Peter Xu Cc: Hugh Dickins Cc: Andrea Arcangeli Cc: Matthew Wilcox Cc: Vlastimil Babka Cc: John Hubbard Cc: Jason Gunthorpe Cc: David Laight Cc: [5.16] Signed-off-by: Andrew Morton --- include/linux/mm.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 3bedc449c14d..982f2607180b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2885,7 +2885,6 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, #define FOLL_MIGRATION 0x400 /* wait for page to replace migration entry */ #define FOLL_TRIED 0x800 /* a retry, previous pass started an IO */ #define FOLL_REMOTE 0x2000 /* we are working on non-current tsk/mm */ -#define FOLL_COW 0x4000 /* internal GUP flag */ #define FOLL_ANON 0x8000 /* don't do file mappings */ #define FOLL_LONGTERM 0x10000 /* mapping lifetime is indefinite: see below */ #define FOLL_SPLIT_PMD 0x20000 /* split huge pmd before returning */ -- cgit From a39c5d3ce03dd890ab6a9be44b21177cec32da55 Mon Sep 17 00:00:00 2001 From: Hao Lee Date: Sun, 7 Aug 2022 15:44:42 +0000 Subject: mm: add DEVICE_ZONE to FOR_ALL_ZONES FOR_ALL_ZONES should be consistent with enum zone_type. Otherwise, __count_zid_vm_events have the potential to add count to wrong item when zid is ZONE_DEVICE. Link: https://lkml.kernel.org/r/20220807154442.GA18167@haolee.io Signed-off-by: Hao Lee Cc: David Hildenbrand Cc: Johannes Weiner Signed-off-by: Andrew Morton --- include/linux/vm_event_item.h | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index 404024486fa5..f3fc36cd2276 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -20,12 +20,19 @@ #define HIGHMEM_ZONE(xx) #endif -#define FOR_ALL_ZONES(xx) DMA_ZONE(xx) DMA32_ZONE(xx) xx##_NORMAL, HIGHMEM_ZONE(xx) xx##_MOVABLE +#ifdef CONFIG_ZONE_DEVICE +#define DEVICE_ZONE(xx) xx##_DEVICE, +#else +#define DEVICE_ZONE(xx) +#endif + +#define FOR_ALL_ZONES(xx) DMA_ZONE(xx) DMA32_ZONE(xx) xx##_NORMAL, \ + HIGHMEM_ZONE(xx) xx##_MOVABLE, DEVICE_ZONE(xx) enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, - FOR_ALL_ZONES(PGALLOC), - FOR_ALL_ZONES(ALLOCSTALL), - FOR_ALL_ZONES(PGSCAN_SKIP), + FOR_ALL_ZONES(PGALLOC) + FOR_ALL_ZONES(ALLOCSTALL) + FOR_ALL_ZONES(PGSCAN_SKIP) PGFREE, PGACTIVATE, PGDEACTIVATE, PGLAZYFREE, PGFAULT, PGMAJFAULT, PGLAZYFREED, -- cgit From f369b07c861435bd812a9d14493f71b34132ed6f Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 11 Aug 2022 16:13:40 -0400 Subject: mm/uffd: reset write protection when unregister with wp-mode The motivation of this patch comes from a recent report and patchfix from David Hildenbrand on hugetlb shared handling of wr-protected page [1]. With the reproducer provided in commit message of [1], one can leverage the uffd-wp lazy-reset of ptes to trigger a hugetlb issue which can affect not only the attacker process, but also the whole system. The lazy-reset mechanism of uffd-wp was used to make unregister faster, meanwhile it has an assumption that any leftover pgtable entries should only affect the process on its own, so not only the user should be aware of anything it does, but also it should not affect outside of the process. But it seems that this is not true, and it can also be utilized to make some exploit easier. So far there's no clue showing that the lazy-reset is important to any userfaultfd users because normally the unregister will only happen once for a specific range of memory of the lifecycle of the process. Considering all above, what this patch proposes is to do explicit pte resets when unregister an uffd region with wr-protect mode enabled. It should be the same as calling ioctl(UFFDIO_WRITEPROTECT, wp=false) right before ioctl(UFFDIO_UNREGISTER) for the user. So potentially it'll make the unregister slower. From that pov it's a very slight abi change, but hopefully nothing should break with this change either. Regarding to the change itself - core of uffd write [un]protect operation is moved into a separate function (uffd_wp_range()) and it is reused in the unregister code path. Note that the new function will not check for anything, e.g. ranges or memory types, because they should have been checked during the previous UFFDIO_REGISTER or it should have failed already. It also doesn't check mmap_changing because we're with mmap write lock held anyway. I added a Fixes upon introducing of uffd-wp shmem+hugetlbfs because that's the only issue reported so far and that's the commit David's reproducer will start working (v5.19+). But the whole idea actually applies to not only file memories but also anonymous. It's just that we don't need to fix anonymous prior to v5.19- because there's no known way to exploit. IOW, this patch can also fix the issue reported in [1] as the patch 2 does. [1] https://lore.kernel.org/all/20220811103435.188481-3-david@redhat.com/ Link: https://lkml.kernel.org/r/20220811201340.39342-1-peterx@redhat.com Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs") Signed-off-by: Peter Xu Cc: David Hildenbrand Cc: Mike Rapoport Cc: Mike Kravetz Cc: Andrea Arcangeli Cc: Nadav Amit Cc: Axel Rasmussen Cc: Signed-off-by: Andrew Morton --- include/linux/userfaultfd_k.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 732b522bacb7..e1b8a915e9e9 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -73,6 +73,8 @@ extern ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long dst_start, extern int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, unsigned long len, bool enable_wp, atomic_t *mmap_changing); +extern void uffd_wp_range(struct mm_struct *dst_mm, struct vm_area_struct *vma, + unsigned long start, unsigned long len, bool enable_wp); /* mm helpers */ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, -- cgit From cb241339b9d020c758a6647c69f8e42538c5cf88 Mon Sep 17 00:00:00 2001 From: Hugh Dickins Date: Wed, 10 Aug 2022 21:51:09 -0700 Subject: mm/shmem: fix chattr fsflags support in tmpfs ext[234] have always allowed unimplemented chattr flags to be set, but other filesystems have tended to be stricter. Follow the stricter approach for tmpfs: I don't want to have to explain why csu attributes don't actually work, and we won't need to update the chattr(1) manpage; and it's never wrong to start off strict, relaxing later if persuaded. Allow only a (append only) i (immutable) A (no atime) and d (no dump). Although lsattr showed 'A' inherited, the NOATIME behavior was not being inherited: because nothing sync'ed FS_NOATIME_FL to S_NOATIME. Add shmem_set_inode_flags() to sync the flags, using inode_set_flags() to avoid that instant of lost immutablility during fileattr_set(). But that change switched generic/079 from passing to failing: because FS_IMMUTABLE_FL and FS_APPEND_FL had been unconventionally included in the INHERITED fsflags: remove them and generic/079 is back to passing. Link: https://lkml.kernel.org/r/2961dcb0-ddf3-b9f0-3268-12a4ff996856@google.com Fixes: e408e695f5f1 ("mm/shmem: support FS_IOC_[SG]ETFLAGS in tmpfs") Signed-off-by: Hugh Dickins Cc: "Theodore Ts'o" Cc: Radoslaw Burny Cc: "Darrick J. Wong" Cc: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/shmem_fs.h | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 1b6c4013f691..ff0b990de83d 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -29,15 +29,10 @@ struct shmem_inode_info { struct inode vfs_inode; }; -#define SHMEM_FL_USER_VISIBLE FS_FL_USER_VISIBLE -#define SHMEM_FL_USER_MODIFIABLE FS_FL_USER_MODIFIABLE -#define SHMEM_FL_INHERITED FS_FL_USER_MODIFIABLE - -/* Flags that are appropriate for regular files (all but dir-specific ones). */ -#define SHMEM_REG_FLMASK (~(FS_DIRSYNC_FL | FS_TOPDIR_FL)) - -/* Flags that are appropriate for non-directories/regular files. */ -#define SHMEM_OTHER_FLMASK (FS_NODUMP_FL | FS_NOATIME_FL) +#define SHMEM_FL_USER_VISIBLE FS_FL_USER_VISIBLE +#define SHMEM_FL_USER_MODIFIABLE \ + (FS_IMMUTABLE_FL | FS_APPEND_FL | FS_NODUMP_FL | FS_NOATIME_FL) +#define SHMEM_FL_INHERITED (FS_NODUMP_FL | FS_NOATIME_FL) struct shmem_sb_info { unsigned long max_blocks; /* How many blocks are allowed */ -- cgit From 5e61fe157a27afc7c0d4f7bcbceefdca536c015f Mon Sep 17 00:00:00 2001 From: Maxime Chevallier Date: Wed, 17 Aug 2022 14:32:52 +0200 Subject: net: phy: Introduce QUSGMII PHY mode The QUSGMII mode is a derivative of Cisco's USXGMII standard. This standard is pretty similar to SGMII, but allows for faster speeds, and has the build-in bits for Quad and Octa variants (like QSGMII). The main difference with SGMII/QSGMII is that USXGMII/QUSGMII re-uses the preamble to carry various information, named 'Extensions'. As of today, the USXGMII standard only mentions the "PCH" extension, which is used to convey timestamps, allowing in-band signaling of PTP timestamps without having to modify the frame itself. This commit adds support for that mode. When no extension is in use, it behaves exactly like QSGMII, although it's not compatible with QSGMII. Signed-off-by: Maxime Chevallier Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- include/linux/phy.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 87638c55d844..9eeab9b9a74c 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -115,6 +115,7 @@ extern const int phy_10gbit_features_array[1]; * @PHY_INTERFACE_MODE_25GBASER: 25G BaseR * @PHY_INTERFACE_MODE_USXGMII: Universal Serial 10GE MII * @PHY_INTERFACE_MODE_10GKR: 10GBASE-KR - with Clause 73 AN + * @PHY_INTERFACE_MODE_QUSGMII: Quad Universal SGMII * @PHY_INTERFACE_MODE_MAX: Book keeping * * Describes the interface between the MAC and PHY. @@ -152,6 +153,7 @@ typedef enum { PHY_INTERFACE_MODE_USXGMII, /* 10GBASE-KR - with Clause 73 AN */ PHY_INTERFACE_MODE_10GKR, + PHY_INTERFACE_MODE_QUSGMII, PHY_INTERFACE_MODE_MAX, } phy_interface_t; @@ -267,6 +269,8 @@ static inline const char *phy_modes(phy_interface_t interface) return "10gbase-kr"; case PHY_INTERFACE_MODE_100BASEX: return "100base-x"; + case PHY_INTERFACE_MODE_QUSGMII: + return "qusgmii"; default: return "unknown"; } -- cgit From c04ade27cb7b952b6b9b9a0efa0a6129cc63f2ae Mon Sep 17 00:00:00 2001 From: Maxime Chevallier Date: Wed, 17 Aug 2022 14:32:54 +0200 Subject: net: phy: Add helper to derive the number of ports from a phy mode Some phy modes such as QSGMII multiplex several MAC<->PHY links on one single physical interface. QSGMII used to be the only one supported, but other modes such as QUSGMII also carry multiple links. This helper allows getting the number of links that are multiplexed on a given interface. Signed-off-by: Maxime Chevallier Reviewed-by: Andrew Lunn Signed-off-by: David S. Miller --- include/linux/phy.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 9eeab9b9a74c..7c49ab95441b 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -968,6 +968,8 @@ struct phy_fixup { const char *phy_speed_to_str(int speed); const char *phy_duplex_to_str(unsigned int duplex); +int phy_interface_num_ports(phy_interface_t interface); + /* A structure for mapping a particular speed and duplex * combination to a particular SUPPORTED and ADVERTISED value */ -- cgit From 1202cdd665315c525b5237e96e0bedc76d7e754f Mon Sep 17 00:00:00 2001 From: Stephen Hemminger Date: Wed, 17 Aug 2022 17:43:21 -0700 Subject: Remove DECnet support from kernel DECnet is an obsolete network protocol that receives more attention from kernel janitors than users. It belongs in computer protocol history museum not in Linux kernel. It has been "Orphaned" in kernel since 2010. The iproute2 support for DECnet was dropped in 5.0 release. The documentation link on Sourceforge says it is abandoned there as well. Leave the UAPI alone to keep userspace programs compiling. This means that there is still an empty neighbour table for AF_DECNET. The table of /proc/sys/net entries was updated to match current directories and reformatted to be alphabetical. Signed-off-by: Stephen Hemminger Acked-by: David Ahern Acked-by: Nikolay Aleksandrov Signed-off-by: David S. Miller --- include/linux/netdevice.h | 4 ---- include/linux/netfilter.h | 5 ----- include/linux/netfilter_defs.h | 8 -------- 3 files changed, 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 1a3cb93c3dcc..64e8662632f8 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1837,7 +1837,6 @@ enum netdev_ml_priv_type { * @tipc_ptr: TIPC specific data * @atalk_ptr: AppleTalk link * @ip_ptr: IPv4 specific data - * @dn_ptr: DECnet specific data * @ip6_ptr: IPv6 specific data * @ax25_ptr: AX.25 specific data * @ieee80211_ptr: IEEE 802.11 specific data, assign before registering @@ -2133,9 +2132,6 @@ struct net_device { #if IS_ENABLED(CONFIG_ATALK) void *atalk_ptr; #endif -#if IS_ENABLED(CONFIG_DECNET) - struct dn_dev __rcu *dn_ptr; -#endif #if IS_ENABLED(CONFIG_AX25) void *ax25_ptr; #endif diff --git a/include/linux/netfilter.h b/include/linux/netfilter.h index c2c6f332fb90..d8817d381c14 100644 --- a/include/linux/netfilter.h +++ b/include/linux/netfilter.h @@ -243,11 +243,6 @@ static inline int nf_hook(u_int8_t pf, unsigned int hook, struct net *net, hook_head = rcu_dereference(net->nf.hooks_bridge[hook]); #endif break; -#if IS_ENABLED(CONFIG_DECNET) - case NFPROTO_DECNET: - hook_head = rcu_dereference(net->nf.hooks_decnet[hook]); - break; -#endif default: WARN_ON_ONCE(1); break; diff --git a/include/linux/netfilter_defs.h b/include/linux/netfilter_defs.h index 8dddfb151f00..a5f7bef1b3a4 100644 --- a/include/linux/netfilter_defs.h +++ b/include/linux/netfilter_defs.h @@ -7,14 +7,6 @@ /* in/out/forward only */ #define NF_ARP_NUMHOOKS 3 -/* max hook is NF_DN_ROUTE (6), also see uapi/linux/netfilter_decnet.h */ -#define NF_DN_NUMHOOKS 7 - -#if IS_ENABLED(CONFIG_DECNET) -/* Largest hook number + 1, see uapi/linux/netfilter_decnet.h */ -#define NF_MAX_HOOKS NF_DN_NUMHOOKS -#else #define NF_MAX_HOOKS NF_INET_NUMHOOKS -#endif #endif -- cgit From c6ea70604249bc357ce09e9f8e16c29df0fb2fa2 Mon Sep 17 00:00:00 2001 From: "dougmill@linux.vnet.ibm.com" Date: Tue, 16 Aug 2022 15:07:13 +0100 Subject: block: sed-opal: Add ioctl to return device status Provide a mechanism to retrieve basic status information about the device, including the "supported" flag indicating whether SED-OPAL is supported. The information returned is from the various feature descriptors received during the discovery0 step, and so this ioctl does nothing more than perform the discovery0 step and then save the information received. See "struct opal_status" and OPAL_FL_* bits for the status information currently returned. This is necessary to be able to check whether a device is OPAL enabled, set up, locked or unlocked from userspace programs like systemd-cryptsetup and libcryptsetup. Right now we just have to assume the user 'knows' or blindly attempt setup/lock/unlock operations. Signed-off-by: Douglas Miller Tested-by: Luca Boccassi Reviewed-by: Scott Bauer Acked-by: Christian Brauner (Microsoft) Link: https://lore.kernel.org/r/20220816140713.84893-1-luca.boccassi@gmail.com Signed-off-by: Jens Axboe --- include/linux/sed-opal.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sed-opal.h b/include/linux/sed-opal.h index 1ac0d712a9c3..6f837bb6c715 100644 --- a/include/linux/sed-opal.h +++ b/include/linux/sed-opal.h @@ -43,6 +43,7 @@ static inline bool is_sed_ioctl(unsigned int cmd) case IOC_OPAL_MBR_DONE: case IOC_OPAL_WRITE_SHADOW_MBR: case IOC_OPAL_GENERIC_TABLE_RW: + case IOC_OPAL_GET_STATUS: return true; } return false; -- cgit From a4e1d0b76e7b32c0839e72679c530445172a2564 Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Mon, 15 Aug 2022 10:00:43 -0700 Subject: block: Change the return type of blk_mq_map_queues() into void Since blk_mq_map_queues() and the .map_queues() callbacks always return 0, change their return type into void. Most callers ignore the returned value anyway. Cc: Christoph Hellwig Cc: Jason Wang Cc: Keith Busch Cc: Martin K. Petersen Cc: Doug Gilbert Cc: Michael S. Tsirkin Signed-off-by: Bart Van Assche Reviewed-by: John Garry Acked-by: Md Haris Iqbal Reviewed-by: Sagi Grimberg Link: https://lore.kernel.org/r/20220815170043.19489-3-bvanassche@acm.org [axboe: fold in fix from Bart] Signed-off-by: Jens Axboe --- include/linux/blk-mq-pci.h | 4 ++-- include/linux/blk-mq-rdma.h | 2 +- include/linux/blk-mq-virtio.h | 2 +- include/linux/blk-mq.h | 4 ++-- 4 files changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq-pci.h b/include/linux/blk-mq-pci.h index 0b1f45c62623..ca544e1d3508 100644 --- a/include/linux/blk-mq-pci.h +++ b/include/linux/blk-mq-pci.h @@ -5,7 +5,7 @@ struct blk_mq_queue_map; struct pci_dev; -int blk_mq_pci_map_queues(struct blk_mq_queue_map *qmap, struct pci_dev *pdev, - int offset); +void blk_mq_pci_map_queues(struct blk_mq_queue_map *qmap, struct pci_dev *pdev, + int offset); #endif /* _LINUX_BLK_MQ_PCI_H */ diff --git a/include/linux/blk-mq-rdma.h b/include/linux/blk-mq-rdma.h index 5cc5f0f36218..53b58c610e76 100644 --- a/include/linux/blk-mq-rdma.h +++ b/include/linux/blk-mq-rdma.h @@ -5,7 +5,7 @@ struct blk_mq_tag_set; struct ib_device; -int blk_mq_rdma_map_queues(struct blk_mq_queue_map *map, +void blk_mq_rdma_map_queues(struct blk_mq_queue_map *map, struct ib_device *dev, int first_vec); #endif /* _LINUX_BLK_MQ_RDMA_H */ diff --git a/include/linux/blk-mq-virtio.h b/include/linux/blk-mq-virtio.h index 687ae287e1dc..13226e9b22dd 100644 --- a/include/linux/blk-mq-virtio.h +++ b/include/linux/blk-mq-virtio.h @@ -5,7 +5,7 @@ struct blk_mq_queue_map; struct virtio_device; -int blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap, +void blk_mq_virtio_map_queues(struct blk_mq_queue_map *qmap, struct virtio_device *vdev, int first_vec); #endif /* _LINUX_BLK_MQ_VIRTIO_H */ diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 92294a5fb083..c38575209d51 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -630,7 +630,7 @@ struct blk_mq_ops { * @map_queues: This allows drivers specify their own queue mapping by * overriding the setup-time function that builds the mq_map. */ - int (*map_queues)(struct blk_mq_tag_set *set); + void (*map_queues)(struct blk_mq_tag_set *set); #ifdef CONFIG_BLK_DEBUG_FS /** @@ -880,7 +880,7 @@ void blk_mq_freeze_queue_wait(struct request_queue *q); int blk_mq_freeze_queue_wait_timeout(struct request_queue *q, unsigned long timeout); -int blk_mq_map_queues(struct blk_mq_queue_map *qmap); +void blk_mq_map_queues(struct blk_mq_queue_map *qmap); void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues); void blk_mq_quiesce_queue_nowait(struct request_queue *q); -- cgit From f5d632d15e9e0a037339601680d82bb840f85d10 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Fri, 5 Aug 2022 16:39:04 -0600 Subject: block: shrink rq_map_data a bit We don't need full ints for several of these members. Change the page_order and nr_entries to unsigned shorts, and the true/false from_user and null_mapped to booleans. This shrinks the struct from 32 to 24 bytes on 64-bit archs. Reviewed-by: Chaitanya Kulkarni Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index c38575209d51..74b99d716b0b 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -963,11 +963,11 @@ blk_status_t blk_insert_cloned_request(struct request *rq); struct rq_map_data { struct page **pages; - int page_order; - int nr_entries; unsigned long offset; - int null_mapped; - int from_user; + unsigned short page_order; + unsigned short nr_entries; + bool null_mapped; + bool from_user; }; int blk_rq_map_user(struct request_queue *, struct request *, -- cgit From 1ecb7d27b1af6705e9a4e94415b4d8cc8cf2fbfb Mon Sep 17 00:00:00 2001 From: Cristian Marussi Date: Wed, 17 Aug 2022 18:27:27 +0100 Subject: firmware: arm_scmi: Improve checks in the info_get operations SCMI protocols abstract and expose a number of protocol specific resources like clocks, sensors and so on. Information about such specific domain resources are generally exposed via an `info_get` protocol operation. Improve the sanity check on these operations where needed. Link: https://lore.kernel.org/r/20220817172731.1185305-3-cristian.marussi@arm.com Signed-off-by: Cristian Marussi Signed-off-by: Sudeep Holla --- include/linux/scmi_protocol.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/scmi_protocol.h b/include/linux/scmi_protocol.h index a193884ecf2b..4f765bc788ff 100644 --- a/include/linux/scmi_protocol.h +++ b/include/linux/scmi_protocol.h @@ -84,7 +84,7 @@ struct scmi_protocol_handle; struct scmi_clk_proto_ops { int (*count_get)(const struct scmi_protocol_handle *ph); - const struct scmi_clock_info *(*info_get) + const struct scmi_clock_info __must_check *(*info_get) (const struct scmi_protocol_handle *ph, u32 clk_id); int (*rate_get)(const struct scmi_protocol_handle *ph, u32 clk_id, u64 *rate); @@ -466,7 +466,7 @@ enum scmi_sensor_class { */ struct scmi_sensor_proto_ops { int (*count_get)(const struct scmi_protocol_handle *ph); - const struct scmi_sensor_info *(*info_get) + const struct scmi_sensor_info __must_check *(*info_get) (const struct scmi_protocol_handle *ph, u32 sensor_id); int (*trip_point_config)(const struct scmi_protocol_handle *ph, u32 sensor_id, u8 trip_id, u64 trip_value); -- cgit From d59b73a66e5e0682442b6d7b4965364e57078b80 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Wed, 3 Aug 2022 10:49:23 +0300 Subject: net/mlx5: Avoid false positive lockdep warning by adding lock_class_key Add a lock_class_key per mlx5 device to avoid a false positive "possible circular locking dependency" warning by lockdep, on flows which lock more than one mlx5 device, such as adding SF. kernel log: ====================================================== WARNING: possible circular locking dependency detected 5.19.0-rc8+ #2 Not tainted ------------------------------------------------------ kworker/u20:0/8 is trying to acquire lock: ffff88812dfe0d98 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_init_one+0x2e/0x490 [mlx5_core] but task is already holding lock: ffff888101aa7898 (&(¬ifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(¬ifier->n_head)->rwsem){++++}-{3:3}: down_write+0x90/0x150 blocking_notifier_chain_register+0x53/0xa0 mlx5_sf_table_init+0x369/0x4a0 [mlx5_core] mlx5_init_one+0x261/0x490 [mlx5_core] probe_one+0x430/0x680 [mlx5_core] local_pci_probe+0xd6/0x170 work_for_cpu_fn+0x4e/0xa0 process_one_work+0x7c2/0x1340 worker_thread+0x6f6/0xec0 kthread+0x28f/0x330 ret_from_fork+0x1f/0x30 -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}: __lock_acquire+0x2fc7/0x6720 lock_acquire+0x1c1/0x550 __mutex_lock+0x12c/0x14b0 mlx5_init_one+0x2e/0x490 [mlx5_core] mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core] auxiliary_bus_probe+0x9d/0xe0 really_probe+0x1e0/0xaa0 __driver_probe_device+0x219/0x480 driver_probe_device+0x49/0x130 __device_attach_driver+0x1b8/0x280 bus_for_each_drv+0x123/0x1a0 __device_attach+0x1a3/0x460 bus_probe_device+0x1a2/0x260 device_add+0x9b1/0x1b40 __auxiliary_device_add+0x88/0xc0 mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core] blocking_notifier_call_chain+0xd5/0x130 mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core] process_one_work+0x7c2/0x1340 worker_thread+0x59d/0xec0 kthread+0x28f/0x330 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(¬ifier->n_head)->rwsem); lock(&dev->intf_state_mutex); lock(&(¬ifier->n_head)->rwsem); lock(&dev->intf_state_mutex); *** DEADLOCK *** 4 locks held by kworker/u20:0/8: #0: ffff888150612938 ((wq_completion)mlx5_events){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340 #1: ffff888100cafdb8 ((work_completion)(&work->work)#3){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340 #2: ffff888101aa7898 (&(¬ifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130 #3: ffff88813682d0e8 (&dev->mutex){....}-{3:3}, at:__device_attach+0x76/0x460 stack backtrace: CPU: 6 PID: 8 Comm: kworker/u20:0 Not tainted 5.19.0-rc8+ Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core] Call Trace: dump_stack_lvl+0x57/0x7d check_noncircular+0x278/0x300 ? print_circular_bug+0x460/0x460 ? lock_chain_count+0x20/0x20 ? register_lock_class+0x1880/0x1880 __lock_acquire+0x2fc7/0x6720 ? register_lock_class+0x1880/0x1880 ? register_lock_class+0x1880/0x1880 lock_acquire+0x1c1/0x550 ? mlx5_init_one+0x2e/0x490 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 __mutex_lock+0x12c/0x14b0 ? mlx5_init_one+0x2e/0x490 [mlx5_core] ? mlx5_init_one+0x2e/0x490 [mlx5_core] ? _raw_read_unlock+0x1f/0x30 ? mutex_lock_io_nested+0x1320/0x1320 ? __ioremap_caller.constprop.0+0x306/0x490 ? mlx5_sf_dev_probe+0x269/0x370 [mlx5_core] ? iounmap+0x160/0x160 mlx5_init_one+0x2e/0x490 [mlx5_core] mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core] ? mlx5_sf_dev_remove+0x130/0x130 [mlx5_core] auxiliary_bus_probe+0x9d/0xe0 really_probe+0x1e0/0xaa0 __driver_probe_device+0x219/0x480 ? auxiliary_match_id+0xe9/0x140 driver_probe_device+0x49/0x130 __device_attach_driver+0x1b8/0x280 ? driver_allows_async_probing+0x140/0x140 bus_for_each_drv+0x123/0x1a0 ? bus_for_each_dev+0x1a0/0x1a0 ? lockdep_hardirqs_on_prepare+0x286/0x400 ? trace_hardirqs_on+0x2d/0x100 __device_attach+0x1a3/0x460 ? device_driver_attach+0x1e0/0x1e0 ? kobject_uevent_env+0x22d/0xf10 bus_probe_device+0x1a2/0x260 device_add+0x9b1/0x1b40 ? dev_set_name+0xab/0xe0 ? __fw_devlink_link_to_suppliers+0x260/0x260 ? memset+0x20/0x40 ? lockdep_init_map_type+0x21a/0x7d0 __auxiliary_device_add+0x88/0xc0 ? auxiliary_device_init+0x86/0xa0 mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core] blocking_notifier_call_chain+0xd5/0x130 mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core] ? mlx5_vhca_event_arm+0x100/0x100 [mlx5_core] ? lock_downgrade+0x6e0/0x6e0 ? lockdep_hardirqs_on_prepare+0x286/0x400 process_one_work+0x7c2/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? pwq_dec_nr_in_flight+0x230/0x230 ? rwlock_bug.part.0+0x90/0x90 worker_thread+0x59d/0xec0 ? process_one_work+0x1340/0x1340 kthread+0x28f/0x330 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 Fixes: 6a3273217469 ("net/mlx5: SF, Port function state change support") Signed-off-by: Moshe Shemesh Reviewed-by: Shay Drory Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 96b16fbe1aa4..7b7ce602c808 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -779,6 +779,7 @@ struct mlx5_core_dev { enum mlx5_device_state state; /* sync interface state */ struct mutex intf_state_mutex; + struct lock_class_key lock_key; unsigned long intf_state; struct mlx5_priv priv; struct mlx5_profile profile; -- cgit From 272ac1500372183ffd54b0c9f43f52afc482e610 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 15 Aug 2022 11:45:01 -0700 Subject: fscrypt: remove fscrypt_set_test_dummy_encryption() Now that all its callers have been converted to fscrypt_parse_test_dummy_encryption() and fscrypt_add_test_dummy_key() instead, fscrypt_set_test_dummy_encryption() can be removed. Signed-off-by: Eric Biggers Link: https://lore.kernel.org/r/20220513231605.175121-6-ebiggers@kernel.org --- include/linux/fscrypt.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h index 7d2f1e0f23b1..b95b8601b9c1 100644 --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -295,8 +295,6 @@ int fscrypt_parse_test_dummy_encryption(const struct fs_parameter *param, struct fscrypt_dummy_policy *dummy_policy); bool fscrypt_dummy_policies_equal(const struct fscrypt_dummy_policy *p1, const struct fscrypt_dummy_policy *p2); -int fscrypt_set_test_dummy_encryption(struct super_block *sb, const char *arg, - struct fscrypt_dummy_policy *dummy_policy); void fscrypt_show_test_dummy_encryption(struct seq_file *seq, char sep, struct super_block *sb); static inline bool -- cgit From a9da7251ac8bcc2f2358513868f1903ac2809b3d Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Mon, 22 Aug 2022 17:14:58 -0700 Subject: Input: gameport - move from strlcpy with unused retval to strscpy Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Signed-off-by: Wolfram Sang Link: https://lore.kernel.org/r/20220818210156.8143-1-wsa+renesas@sang-engineering.com Signed-off-by: Dmitry Torokhov --- include/linux/gameport.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/gameport.h b/include/linux/gameport.h index 69081d899492..8c2f00018e89 100644 --- a/include/linux/gameport.h +++ b/include/linux/gameport.h @@ -110,7 +110,7 @@ static inline void gameport_free_port(struct gameport *gameport) static inline void gameport_set_name(struct gameport *gameport, const char *name) { - strlcpy(gameport->name, name, sizeof(gameport->name)); + strscpy(gameport->name, name, sizeof(gameport->name)); } /* -- cgit From 13a8e0f6b01b14b2e28ba144e112c883f03a3db2 Mon Sep 17 00:00:00 2001 From: Saravana Kannan Date: Fri, 19 Aug 2022 15:16:11 -0700 Subject: Revert "driver core: Delete driver_deferred_probe_check_state()" This reverts commit 9cbffc7a59561be950ecc675d19a3d2b45202b2b. There are a few more issues to fix that have been reported in the thread for the original series [1]. We'll need to fix those before this will work. So, revert it for now. [1] - https://lore.kernel.org/lkml/20220601070707.3946847-1-saravanak@google.com/ Fixes: 9cbffc7a5956 ("driver core: Delete driver_deferred_probe_check_state()") Tested-by: Tony Lindgren Tested-by: Peng Fan Tested-by: Douglas Anderson Tested-by: Alexander Stein Reviewed-by: Tony Lindgren Signed-off-by: Saravana Kannan Link: https://lore.kernel.org/r/20220819221616.2107893-2-saravanak@google.com Signed-off-by: Greg Kroah-Hartman --- include/linux/device/driver.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h index 7acaabde5396..2114d65b862f 100644 --- a/include/linux/device/driver.h +++ b/include/linux/device/driver.h @@ -242,6 +242,7 @@ driver_find_device_by_acpi_dev(struct device_driver *drv, const void *adev) extern int driver_deferred_probe_timeout; void driver_deferred_probe_add(struct device *dev); +int driver_deferred_probe_check_state(struct device *dev); void driver_init(void); /** -- cgit From 7997eff82828304b780dc0a39707e1946d6f1ebf Mon Sep 17 00:00:00 2001 From: Florian Westphal Date: Sat, 20 Aug 2022 17:38:37 +0200 Subject: netfilter: ebtables: reject blobs that don't provide all entry points Harshit Mogalapalli says: In ebt_do_table() function dereferencing 'private->hook_entry[hook]' can lead to NULL pointer dereference. [..] Kernel panic: general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] [..] RIP: 0010:ebt_do_table+0x1dc/0x1ce0 Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 5c 16 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b 6c df 08 48 8d 7d 2c 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 88 [..] Call Trace: nf_hook_slow+0xb1/0x170 __br_forward+0x289/0x730 maybe_deliver+0x24b/0x380 br_flood+0xc6/0x390 br_dev_xmit+0xa2e/0x12c0 For some reason ebtables rejects blobs that provide entry points that are not supported by the table, but what it should instead reject is the opposite: blobs that DO NOT provide an entry point supported by the table. t->valid_hooks is the bitmask of hooks (input, forward ...) that will see packets. Providing an entry point that is not support is harmless (never called/used), but the inverse isn't: it results in a crash because the ebtables traverser doesn't expect a NULL blob for a location its receiving packets for. Instead of fixing all the individual checks, do what iptables is doing and reject all blobs that differ from the expected hooks. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Harshit Mogalapalli Reported-by: syzkaller Signed-off-by: Florian Westphal --- include/linux/netfilter_bridge/ebtables.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h index a13296d6c7ce..fd533552a062 100644 --- a/include/linux/netfilter_bridge/ebtables.h +++ b/include/linux/netfilter_bridge/ebtables.h @@ -94,10 +94,6 @@ struct ebt_table { struct ebt_replace_kernel *table; unsigned int valid_hooks; rwlock_t lock; - /* e.g. could be the table explicitly only allows certain - * matches, targets, ... 0 == let it in */ - int (*check)(const struct ebt_table_info *info, - unsigned int valid_hooks); /* the data used by the kernel */ struct ebt_table_info *private; struct nf_hook_ops *ops; -- cgit From 12cda13cfd5310bbfefdfe32a82489228e2e0381 Mon Sep 17 00:00:00 2001 From: Alexander Aring Date: Mon, 15 Aug 2022 15:43:25 -0400 Subject: fs: dlm: remove DLM_LSFL_FS from uapi The DLM_LSFL_FS flag is set in lockspaces created directly for a kernel user, as opposed to those lockspaces created for user space applications. The user space libdlm allowed this flag to be set for lockspaces created from user space, but then used by a kernel user. No kernel user has ever used this method, so remove the ability to do it. Signed-off-by: Alexander Aring Signed-off-by: David Teigland --- include/linux/dlm.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dlm.h b/include/linux/dlm.h index ff951e9f6f20..f5f55c2138ae 100644 --- a/include/linux/dlm.h +++ b/include/linux/dlm.h @@ -56,9 +56,6 @@ struct dlm_lockspace_ops { * DLM_LSFL_TIMEWARN * The dlm should emit netlink messages if locks have been waiting * for a configurable amount of time. (Unused.) - * DLM_LSFL_FS - * The lockspace user is in the kernel (i.e. filesystem). Enables - * direct bast/cast callbacks. * DLM_LSFL_NEWEXCL * dlm_new_lockspace() should return -EEXIST if the lockspace exists. * -- cgit From 56171e0db23a5e0edce1596dd2792b95ffe57bd3 Mon Sep 17 00:00:00 2001 From: Alexander Aring Date: Mon, 15 Aug 2022 15:43:27 -0400 Subject: fs: dlm: const void resource name parameter The resource name parameter should never be changed by DLM so we declare it as const. At some point it is handled as a char pointer, a resource name can be a non printable ascii string as well. This patch change it to handle it as void pointer as it is offered by DLM API. Signed-off-by: Alexander Aring Signed-off-by: David Teigland --- include/linux/dlm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dlm.h b/include/linux/dlm.h index f5f55c2138ae..c6bc2b5ee7e6 100644 --- a/include/linux/dlm.h +++ b/include/linux/dlm.h @@ -131,7 +131,7 @@ int dlm_lock(dlm_lockspace_t *lockspace, int mode, struct dlm_lksb *lksb, uint32_t flags, - void *name, + const void *name, unsigned int namelen, uint32_t parent_lkid, void (*lockast) (void *astarg), -- cgit From 0ba985024ae7db226776725d9aa436b5c1c9fca2 Mon Sep 17 00:00:00 2001 From: Shmulik Ladkani Date: Sun, 21 Aug 2022 14:35:16 +0300 Subject: flow_dissector: Make 'bpf_flow_dissect' return the bpf program retcode Let 'bpf_flow_dissect' callers know the BPF program's retcode and act accordingly. Signed-off-by: Shmulik Ladkani Signed-off-by: Daniel Borkmann Reviewed-by: Stanislav Fomichev Acked-by: John Fastabend Link: https://lore.kernel.org/bpf/20220821113519.116765-2-shmulik.ladkani@gmail.com --- include/linux/skbuff.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index ca8afa382bf2..87921996175c 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1460,8 +1460,8 @@ void skb_flow_dissector_init(struct flow_dissector *flow_dissector, unsigned int key_count); struct bpf_flow_dissector; -bool bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx, - __be16 proto, int nhoff, int hlen, unsigned int flags); +u32 bpf_flow_dissect(struct bpf_prog *prog, struct bpf_flow_dissector *ctx, + __be16 proto, int nhoff, int hlen, unsigned int flags); bool __skb_flow_dissect(const struct net *net, const struct sk_buff *skb, -- cgit From dea6a4e17013382b20717664ebf3d7cc405e0952 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Tue, 23 Aug 2022 15:25:51 -0700 Subject: bpf: Introduce cgroup_{common,current}_func_proto Split cgroup_base_func_proto into the following: * cgroup_common_func_proto - common helpers for all cgroup hooks * cgroup_current_func_proto - common helpers for all cgroup hooks running in the process context (== have meaningful 'current'). Move bpf_{g,s}et_retval and other cgroup-related helpers into kernel/bpf/cgroup.c so they closer to where they are being used. Signed-off-by: Stanislav Fomichev Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220823222555.523590-2-sdf@google.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf-cgroup.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 2bd1b5f8de9b..57e9e109257e 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -414,6 +414,11 @@ int cgroup_bpf_prog_detach(const union bpf_attr *attr, int cgroup_bpf_link_attach(const union bpf_attr *attr, struct bpf_prog *prog); int cgroup_bpf_prog_query(const union bpf_attr *attr, union bpf_attr __user *uattr); + +const struct bpf_func_proto * +cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog); +const struct bpf_func_proto * +cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog); #else static inline int cgroup_bpf_inherit(struct cgroup *cgrp) { return 0; } @@ -444,6 +449,18 @@ static inline int cgroup_bpf_prog_query(const union bpf_attr *attr, return -EINVAL; } +static inline const struct bpf_func_proto * +cgroup_common_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) +{ + return NULL; +} + +static inline const struct bpf_func_proto * +cgroup_current_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog) +{ + return NULL; +} + static inline int bpf_cgroup_storage_assign(struct bpf_prog_aux *aux, struct bpf_map *map) { return 0; } static inline struct bpf_cgroup_storage *bpf_cgroup_storage_alloc( -- cgit From bed89185af0de0d417e29ca1798df50f161b0231 Mon Sep 17 00:00:00 2001 From: Stanislav Fomichev Date: Tue, 23 Aug 2022 15:25:52 -0700 Subject: bpf: Use cgroup_{common,current}_func_proto in more hooks The following hooks are per-cgroup hooks but they are not using cgroup_{common,current}_func_proto, fix it: * BPF_PROG_TYPE_CGROUP_SKB (cg_skb) * BPF_PROG_TYPE_CGROUP_SOCK_ADDR (cg_sock_addr) * BPF_PROG_TYPE_CGROUP_SOCK (cg_sock) * BPF_PROG_TYPE_LSM+BPF_LSM_CGROUP Also: * move common func_proto's into cgroup func_proto handlers * make sure bpf_{g,s}et_retval are not accessible from recvmsg, getpeername and getsockname (return/errno is ignored in these places) * as a side effect, expose get_current_pid_tgid, get_current_comm_proto, get_current_ancestor_cgroup_id, get_cgroup_classid to more cgroup hooks Acked-by: Martin KaFai Lau Signed-off-by: Stanislav Fomichev Link: https://lore.kernel.org/r/20220823222555.523590-3-sdf@google.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 39bd36359c1e..99fc7a64564f 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2375,6 +2375,7 @@ extern const struct bpf_func_proto bpf_sock_map_update_proto; extern const struct bpf_func_proto bpf_sock_hash_update_proto; extern const struct bpf_func_proto bpf_get_current_cgroup_id_proto; extern const struct bpf_func_proto bpf_get_current_ancestor_cgroup_id_proto; +extern const struct bpf_func_proto bpf_get_cgroup_classid_curr_proto; extern const struct bpf_func_proto bpf_msg_redirect_hash_proto; extern const struct bpf_func_proto bpf_msg_redirect_map_proto; extern const struct bpf_func_proto bpf_sk_redirect_hash_proto; -- cgit From e8bf17d58a4db4b4f38617925414097f12e0d509 Mon Sep 17 00:00:00 2001 From: Evan Green Date: Mon, 22 Aug 2022 14:40:40 -0700 Subject: platform/chrome: cros_ec: Expose suspend_timeout_ms in debugfs In modern Chromebooks, the embedded controller has a mechanism where it will watch a hardware-controlled line that toggles in suspend, and wake the system up if an expected sleep transition didn't occur. This can be very useful for detecting power management issues where the system appears to suspend, but doesn't actually reach its lowest expected power states. Sometimes it's useful in debug and test scenarios to be able to control the duration of that timeout, or even disable the EC timeout mechanism altogether. Add a debugfs control to set the timeout to values other than the EC-defined default, for more convenient debug and development iteration. Signed-off-by: Evan Green Reviewed-by: Prashant Malani Reviewed-by: Stephen Boyd Reviewed-by: Guenter Roeck Link: https://lore.kernel.org/r/20220822144026.v3.1.Idd188ff3f9caddebc17ac357a13005f93333c21f@changeid [tzungbi: fix one nit in Documentation/ABI/testing/debugfs-cros-ec.] Signed-off-by: Tzung-Bi Shih --- include/linux/platform_data/cros_ec_proto.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/cros_ec_proto.h b/include/linux/platform_data/cros_ec_proto.h index 408b29ca4004..e43107e0bee1 100644 --- a/include/linux/platform_data/cros_ec_proto.h +++ b/include/linux/platform_data/cros_ec_proto.h @@ -169,6 +169,7 @@ struct cros_ec_device { int event_size; u32 host_event_wake_mask; u32 last_resume_result; + u16 suspend_timeout_ms; ktime_t last_event_time; struct notifier_block notifier_ready; -- cgit From c205cc7534a97f2d6fbd2a23a94ed7c036c6e2aa Mon Sep 17 00:00:00 2001 From: Menglong Dong Date: Sun, 21 Aug 2022 13:18:58 +0800 Subject: net: skb: prevent the split of kfree_skb_reason() by gcc Sometimes, gcc will optimize the function by spliting it to two or more functions. In this case, kfree_skb_reason() is splited to kfree_skb_reason and kfree_skb_reason.part.0. However, the function/tracepoint trace_kfree_skb() in it needs the return address of kfree_skb_reason(). This split makes the call chains becomes: kfree_skb_reason() -> kfree_skb_reason.part.0 -> trace_kfree_skb() which makes the return address that passed to trace_kfree_skb() be kfree_skb(). Therefore, introduce '__fix_address', which is the combination of '__noclone' and 'noinline', and apply it to kfree_skb_reason() to prevent to from being splited or made inline. (Is it better to simply apply '__noclone oninline' to kfree_skb_reason? I'm thinking maybe other functions have the same problems) Meanwhile, wrap 'skb_unref()' with 'unlikely()', as the compiler thinks it is likely return true and splits kfree_skb_reason(). Signed-off-by: Menglong Dong Signed-off-by: David S. Miller --- include/linux/compiler_attributes.h | 7 +++++++ include/linux/skbuff.h | 3 ++- 2 files changed, 9 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h index 445e80517cab..fc93c9488c76 100644 --- a/include/linux/compiler_attributes.h +++ b/include/linux/compiler_attributes.h @@ -371,4 +371,11 @@ */ #define __weak __attribute__((__weak__)) +/* + * Used by functions that use '__builtin_return_address'. These function + * don't want to be splited or made inline, which can make + * the '__builtin_return_address' get unexpected address. + */ +#define __fix_address noinline __noclone + #endif /* __LINUX_COMPILER_ATTRIBUTES_H */ diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index ca8afa382bf2..b51d07a727c9 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -1195,7 +1195,8 @@ static inline bool skb_unref(struct sk_buff *skb) return true; } -void kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason); +void __fix_address +kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason); /** * kfree_skb - free an sk_buff with 'NOT_SPECIFIED' reason -- cgit From af67508ea6cbf0e4ea27f8120056fa2efce127dd Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 23 Aug 2022 10:46:56 -0700 Subject: net: Fix data-races around sysctl_fb_tunnels_only_for_init_net. While reading sysctl_fb_tunnels_only_for_init_net, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- include/linux/netdevice.h | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 1a3cb93c3dcc..6d3a33fd0cdb 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -640,9 +640,14 @@ extern int sysctl_devconf_inherit_init_net; */ static inline bool net_has_fallback_tunnels(const struct net *net) { - return !IS_ENABLED(CONFIG_SYSCTL) || - !sysctl_fb_tunnels_only_for_init_net || - (net == &init_net && sysctl_fb_tunnels_only_for_init_net == 1); +#if IS_ENABLED(CONFIG_SYSCTL) + int fb_tunnels_only_for_init_net = READ_ONCE(sysctl_fb_tunnels_only_for_init_net); + + return !fb_tunnels_only_for_init_net || + (net_eq(net, &init_net) && fb_tunnels_only_for_init_net == 1); +#else + return true; +#endif } static inline int netdev_queue_numa_node_read(const struct netdev_queue *q) -- cgit From a5612ca10d1aa05624ebe72633e0c8c792970833 Mon Sep 17 00:00:00 2001 From: Kuniyuki Iwashima Date: Tue, 23 Aug 2022 10:46:57 -0700 Subject: net: Fix data-races around sysctl_devconf_inherit_init_net. While reading sysctl_devconf_inherit_init_net, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers. Fixes: 856c395cfa63 ("net: introduce a knob to control whether to inherit devconf config") Signed-off-by: Kuniyuki Iwashima Signed-off-by: David S. Miller --- include/linux/netdevice.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 6d3a33fd0cdb..05d6f3facd5a 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -650,6 +650,15 @@ static inline bool net_has_fallback_tunnels(const struct net *net) #endif } +static inline int net_inherit_devconf(void) +{ +#if IS_ENABLED(CONFIG_SYSCTL) + return READ_ONCE(sysctl_devconf_inherit_init_net); +#else + return 0; +#endif +} + static inline int netdev_queue_numa_node_read(const struct netdev_queue *q) { #if defined(CONFIG_XPS) && defined(CONFIG_NUMA) -- cgit From f78a03f6e28be0283f73d3c18b54837b638a8ccf Mon Sep 17 00:00:00 2001 From: Hyeonggon Yoo <42.hyeyoo@gmail.com> Date: Wed, 17 Aug 2022 19:18:12 +0900 Subject: mm/slab_common: remove CONFIG_NUMA ifdefs for common kmalloc functions Now that slab_alloc_node() is available for SLAB when CONFIG_NUMA=n, remove CONFIG_NUMA ifdefs for common kmalloc functions. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka Signed-off-by: Vlastimil Babka --- include/linux/slab.h | 28 ---------------------------- 1 file changed, 28 deletions(-) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 0fefdf528e0d..4754c834b0e3 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -456,38 +456,18 @@ static __always_inline void kfree_bulk(size_t size, void **p) kmem_cache_free_bulk(NULL, size, p); } -#ifdef CONFIG_NUMA void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignment __alloc_size(1); void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment __malloc; -#else -static __always_inline __alloc_size(1) void *__kmalloc_node(size_t size, gfp_t flags, int node) -{ - return __kmalloc(size, flags); -} - -static __always_inline void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) -{ - return kmem_cache_alloc(s, flags); -} -#endif #ifdef CONFIG_TRACING extern void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t flags, size_t size) __assume_slab_alignment __alloc_size(3); -#ifdef CONFIG_NUMA extern void *kmem_cache_alloc_node_trace(struct kmem_cache *s, gfp_t gfpflags, int node, size_t size) __assume_slab_alignment __alloc_size(4); -#else -static __always_inline __alloc_size(4) void *kmem_cache_alloc_node_trace(struct kmem_cache *s, - gfp_t gfpflags, int node, size_t size) -{ - return kmem_cache_alloc_trace(s, gfpflags, size); -} -#endif /* CONFIG_NUMA */ #else /* CONFIG_TRACING */ static __always_inline __alloc_size(3) void *kmem_cache_alloc_trace(struct kmem_cache *s, @@ -701,20 +681,12 @@ static inline __alloc_size(1, 2) void *kcalloc_node(size_t n, size_t size, gfp_t } -#ifdef CONFIG_NUMA extern void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node, unsigned long caller) __alloc_size(1); #define kmalloc_node_track_caller(size, flags, node) \ __kmalloc_node_track_caller(size, flags, node, \ _RET_IP_) -#else /* CONFIG_NUMA */ - -#define kmalloc_node_track_caller(size, flags, node) \ - kmalloc_track_caller(size, flags) - -#endif /* CONFIG_NUMA */ - /* * Shortcuts */ -- cgit From c45248db04f8e3aca4798d67a394fb9cc2168118 Mon Sep 17 00:00:00 2001 From: Hyeonggon Yoo <42.hyeyoo@gmail.com> Date: Wed, 17 Aug 2022 19:18:13 +0900 Subject: mm/slab_common: cleanup kmalloc_track_caller() Make kmalloc_track_caller() wrapper of kmalloc_node_track_caller(). Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka Signed-off-by: Vlastimil Babka --- include/linux/slab.h | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 4754c834b0e3..a0e57df3d5a4 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -651,6 +651,12 @@ static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flag return kmalloc_array(n, size, flags | __GFP_ZERO); } +void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node, + unsigned long caller) __alloc_size(1); +#define kmalloc_node_track_caller(size, flags, node) \ + __kmalloc_node_track_caller(size, flags, node, \ + _RET_IP_) + /* * kmalloc_track_caller is a special version of kmalloc that records the * calling function of the routine calling it for slab leak tracking instead @@ -659,9 +665,9 @@ static inline __alloc_size(1, 2) void *kcalloc(size_t n, size_t size, gfp_t flag * allocator where we care about the real place the memory allocation * request comes from. */ -extern void *__kmalloc_track_caller(size_t size, gfp_t flags, unsigned long caller); #define kmalloc_track_caller(size, flags) \ - __kmalloc_track_caller(size, flags, _RET_IP_) + __kmalloc_node_track_caller(size, flags, \ + NUMA_NO_NODE, _RET_IP_) static inline __alloc_size(1, 2) void *kmalloc_array_node(size_t n, size_t size, gfp_t flags, int node) @@ -680,13 +686,6 @@ static inline __alloc_size(1, 2) void *kcalloc_node(size_t n, size_t size, gfp_t return kmalloc_array_node(n, size, flags | __GFP_ZERO, node); } - -extern void *__kmalloc_node_track_caller(size_t size, gfp_t flags, int node, - unsigned long caller) __alloc_size(1); -#define kmalloc_node_track_caller(size, flags, node) \ - __kmalloc_node_track_caller(size, flags, node, \ - _RET_IP_) - /* * Shortcuts */ -- cgit From e4c98d68959e51646c379e157bad36ef0d7bf467 Mon Sep 17 00:00:00 2001 From: Hyeonggon Yoo <42.hyeyoo@gmail.com> Date: Wed, 17 Aug 2022 19:18:15 +0900 Subject: mm/slab_common: fold kmalloc_order_trace() into kmalloc_large() There is no caller of kmalloc_order_trace() except kmalloc_large(). Fold it into kmalloc_large() and remove kmalloc_order{,_trace}(). Also add tracepoint in kmalloc_large() that was previously in kmalloc_order_trace(). Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka Signed-off-by: Vlastimil Babka --- include/linux/slab.h | 22 ++-------------------- 1 file changed, 2 insertions(+), 20 deletions(-) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index a0e57df3d5a4..15a4c59da59e 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -489,26 +489,8 @@ static __always_inline void *kmem_cache_alloc_node_trace(struct kmem_cache *s, g } #endif /* CONFIG_TRACING */ -extern void *kmalloc_order(size_t size, gfp_t flags, unsigned int order) __assume_page_alignment - __alloc_size(1); - -#ifdef CONFIG_TRACING -extern void *kmalloc_order_trace(size_t size, gfp_t flags, unsigned int order) - __assume_page_alignment __alloc_size(1); -#else -static __always_inline __alloc_size(1) void *kmalloc_order_trace(size_t size, gfp_t flags, - unsigned int order) -{ - return kmalloc_order(size, flags, order); -} -#endif - -static __always_inline __alloc_size(1) void *kmalloc_large(size_t size, gfp_t flags) -{ - unsigned int order = get_order(size); - return kmalloc_order_trace(size, flags, order); -} - +void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment + __alloc_size(1); /** * kmalloc - allocate memory * @size: how many bytes of memory are required. -- cgit From a0c3b940023eef3fa005b2bc37d9312712331dcb Mon Sep 17 00:00:00 2001 From: Hyeonggon Yoo <42.hyeyoo@gmail.com> Date: Wed, 17 Aug 2022 19:18:16 +0900 Subject: mm/slub: move kmalloc_large_node() to slab_common.c In later patch SLAB will also pass requests larger than order-1 page to page allocator. Move kmalloc_large_node() to slab_common.c. Fold kmalloc_large_node_hook() into kmalloc_large_node() as there is no other caller. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka Signed-off-by: Vlastimil Babka --- include/linux/slab.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 15a4c59da59e..082499306098 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -491,6 +491,10 @@ static __always_inline void *kmem_cache_alloc_node_trace(struct kmem_cache *s, g void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment __alloc_size(1); + +void *kmalloc_large_node(size_t size, gfp_t flags, int node) __assume_page_alignment + __alloc_size(1); + /** * kmalloc - allocate memory * @size: how many bytes of memory are required. -- cgit From bf37d791022ecfb1279ac88c5448a53f1ae40a59 Mon Sep 17 00:00:00 2001 From: Hyeonggon Yoo <42.hyeyoo@gmail.com> Date: Wed, 17 Aug 2022 19:18:17 +0900 Subject: mm/slab_common: kmalloc_node: pass large requests to page allocator Now that kmalloc_large_node() is in common code, pass large requests to page allocator in kmalloc_node() using kmalloc_large_node(). One problem is that currently there is no tracepoint in kmalloc_large_node(). Instead of simply putting tracepoint in it, use kmalloc_large_node{,_notrace} depending on its caller to show useful address for both inlined kmalloc_node() and __kmalloc_node_track_caller() when large objects are allocated. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka Signed-off-by: Vlastimil Babka --- include/linux/slab.h | 26 +++++++++++++++++++------- 1 file changed, 19 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 082499306098..fd2e129fc813 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -571,23 +571,35 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags) return __kmalloc(size, flags); } +#ifndef CONFIG_SLOB static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node) { -#ifndef CONFIG_SLOB - if (__builtin_constant_p(size) && - size <= KMALLOC_MAX_CACHE_SIZE) { - unsigned int i = kmalloc_index(size); + if (__builtin_constant_p(size)) { + unsigned int index; - if (!i) + if (size > KMALLOC_MAX_CACHE_SIZE) + return kmalloc_large_node(size, flags, node); + + index = kmalloc_index(size); + + if (!index) return ZERO_SIZE_PTR; return kmem_cache_alloc_node_trace( - kmalloc_caches[kmalloc_type(flags)][i], + kmalloc_caches[kmalloc_type(flags)][index], flags, node, size); } -#endif return __kmalloc_node(size, flags, node); } +#else +static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t flags, int node) +{ + if (__builtin_constant_p(size) && size > KMALLOC_MAX_CACHE_SIZE) + return kmalloc_large_node(size, flags, node); + + return __kmalloc_node(size, flags, node); +} +#endif /** * kmalloc_array - allocate memory for an array. -- cgit From d6a71648dbc0ca5520cba16a8fdce8d37ae74218 Mon Sep 17 00:00:00 2001 From: Hyeonggon Yoo <42.hyeyoo@gmail.com> Date: Wed, 17 Aug 2022 19:18:19 +0900 Subject: mm/slab: kmalloc: pass requests larger than order-1 page to page allocator There is not much benefit for serving large objects in kmalloc(). Let's pass large requests to page allocator like SLUB for better maintenance of common code. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka Signed-off-by: Vlastimil Babka --- include/linux/slab.h | 23 +++++------------------ 1 file changed, 5 insertions(+), 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index fd2e129fc813..4ee5b2fed164 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -243,27 +243,17 @@ static inline unsigned int arch_slab_minalign(void) #ifdef CONFIG_SLAB /* - * The largest kmalloc size supported by the SLAB allocators is - * 32 megabyte (2^25) or the maximum allocatable page order if that is - * less than 32 MB. - * - * WARNING: Its not easy to increase this value since the allocators have - * to do various tricks to work around compiler limitations in order to - * ensure proper constant folding. + * SLAB and SLUB directly allocates requests fitting in to an order-1 page + * (PAGE_SIZE*2). Larger requests are passed to the page allocator. */ -#define KMALLOC_SHIFT_HIGH ((MAX_ORDER + PAGE_SHIFT - 1) <= 25 ? \ - (MAX_ORDER + PAGE_SHIFT - 1) : 25) -#define KMALLOC_SHIFT_MAX KMALLOC_SHIFT_HIGH +#define KMALLOC_SHIFT_HIGH (PAGE_SHIFT + 1) +#define KMALLOC_SHIFT_MAX (MAX_ORDER + PAGE_SHIFT - 1) #ifndef KMALLOC_SHIFT_LOW #define KMALLOC_SHIFT_LOW 5 #endif #endif #ifdef CONFIG_SLUB -/* - * SLUB directly allocates requests fitting in to an order-1 page - * (PAGE_SIZE*2). Larger requests are passed to the page allocator. - */ #define KMALLOC_SHIFT_HIGH (PAGE_SHIFT + 1) #define KMALLOC_SHIFT_MAX (MAX_ORDER + PAGE_SHIFT - 1) #ifndef KMALLOC_SHIFT_LOW @@ -415,10 +405,6 @@ static __always_inline unsigned int __kmalloc_index(size_t size, if (size <= 512 * 1024) return 19; if (size <= 1024 * 1024) return 20; if (size <= 2 * 1024 * 1024) return 21; - if (size <= 4 * 1024 * 1024) return 22; - if (size <= 8 * 1024 * 1024) return 23; - if (size <= 16 * 1024 * 1024) return 24; - if (size <= 32 * 1024 * 1024) return 25; if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()"); @@ -428,6 +414,7 @@ static __always_inline unsigned int __kmalloc_index(size_t size, /* Will never be reached. Needed because the compiler may complain */ return -1; } +static_assert(PAGE_SHIFT <= 20); #define kmalloc_index(s) __kmalloc_index(s, true) #endif /* !CONFIG_SLOB */ -- cgit From ebc97a52b5d6cd5fb0c15a3fc9cdd6eb924646a1 Mon Sep 17 00:00:00 2001 From: Yosry Ahmed Date: Tue, 23 Aug 2022 00:46:36 +0000 Subject: mm: add NR_SECONDARY_PAGETABLE to count secondary page table uses. We keep track of several kernel memory stats (total kernel memory, page tables, stack, vmalloc, etc) on multiple levels (global, per-node, per-memcg, etc). These stats give insights to users to how much memory is used by the kernel and for what purposes. Currently, memory used by KVM mmu is not accounted in any of those kernel memory stats. This patch series accounts the memory pages used by KVM for page tables in those stats in a new NR_SECONDARY_PAGETABLE stat. This stat can be later extended to account for other types of secondary pages tables (e.g. iommu page tables). KVM has a decent number of large allocations that aren't for page tables, but for most of them, the number/size of those allocations scales linearly with either the number of vCPUs or the amount of memory assigned to the VM. KVM's secondary page table allocations do not scale linearly, especially when nested virtualization is in use. From a KVM perspective, NR_SECONDARY_PAGETABLE will scale with KVM's per-VM pages_{4k,2m,1g} stats unless the guest is doing something bizarre (e.g. accessing only 4kb chunks of 2mb pages so that KVM is forced to allocate a large number of page tables even though the guest isn't accessing that much memory). However, someone would need to either understand how KVM works to make that connection, or know (or be told) to go look at KVM's stats if they're running VMs to better decipher the stats. Furthermore, having NR_PAGETABLE side-by-side with NR_SECONDARY_PAGETABLE is informative. For example, when backing a VM with THP vs. HugeTLB, NR_SECONDARY_PAGETABLE is roughly the same, but NR_PAGETABLE is an order of magnitude higher with THP. So having this stat will at the very least prove to be useful for understanding tradeoffs between VM backing types, and likely even steer folks towards potential optimizations. The original discussion with more details about the rationale: https://lore.kernel.org/all/87ilqoi77b.wl-maz@kernel.org This stat will be used by subsequent patches to count KVM mmu memory usage. Signed-off-by: Yosry Ahmed Acked-by: Shakeel Butt Acked-by: Marc Zyngier Link: https://lore.kernel.org/r/20220823004639.2387269-2-yosryahmed@google.com Signed-off-by: Sean Christopherson --- include/linux/mmzone.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index e24b40c52468..355d842d2731 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -216,6 +216,7 @@ enum node_stat_item { NR_KERNEL_SCS_KB, /* measured in KiB */ #endif NR_PAGETABLE, /* used for pagetables */ + NR_SECONDARY_PAGETABLE, /* secondary pagetables, e.g. KVM pagetables */ #ifdef CONFIG_SWAP NR_SWAPCACHE, #endif -- cgit From 9d9d00ac29d0ef7ce426964de46fa6b380357d0a Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Tue, 23 Aug 2022 03:31:25 +0200 Subject: bpf: Fix reference state management for synchronous callbacks Currently, verifier verifies callback functions (sync and async) as if they will be executed once, (i.e. it explores execution state as if the function was being called once). The next insn to explore is set to start of subprog and the exit from nested frame is handled using curframe > 0 and prepare_func_exit. In case of async callback it uses a customized variant of push_stack simulating a kind of branch to set up custom state and execution context for the async callback. While this approach is simple and works when callback really will be executed only once, it is unsafe for all of our current helpers which are for_each style, i.e. they execute the callback multiple times. A callback releasing acquired references of the caller may do so multiple times, but currently verifier sees it as one call inside the frame, which then returns to caller. Hence, it thinks it released some reference that the cb e.g. got access through callback_ctx (register filled inside cb from spilled typed register on stack). Similarly, it may see that an acquire call is unpaired inside the callback, so the caller will copy the reference state of callback and then will have to release the register with new ref_obj_ids. But again, the callback may execute multiple times, but the verifier will only account for acquired references for a single symbolic execution of the callback, which will cause leaks. Note that for async callback case, things are different. While currently we have bpf_timer_set_callback which only executes it once, even for multiple executions it would be safe, as reference state is NULL and check_reference_leak would force program to release state before BPF_EXIT. The state is also unaffected by analysis for the caller frame. Hence async callback is safe. Since we want the reference state to be accessible, e.g. for pointers loaded from stack through callback_ctx's PTR_TO_STACK, we still have to copy caller's reference_state to callback's bpf_func_state, but we enforce that whatever references it adds to that reference_state has been released before it hits BPF_EXIT. This requires introducing a new callback_ref member in the reference state to distinguish between caller vs callee references. Hence, check_reference_leak now errors out if it sees we are in callback_fn and we have not released callback_ref refs. Since there can be multiple nested callbacks, like frame 0 -> cb1 -> cb2 etc. we need to also distinguish between whether this particular ref belongs to this callback frame or parent, and only error for our own, so we store state->frameno (which is always non-zero for callbacks). In short, callbacks can read parent reference_state, but cannot mutate it, to be able to use pointers acquired by the caller. They must only undo their changes (by releasing their own acquired_refs before BPF_EXIT) on top of caller reference_state before returning (at which point the caller and callback state will match anyway, so no need to copy it back to caller). Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220823013125.24938-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf_verifier.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 2e3bad8640dc..1fdddbf3546b 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -212,6 +212,17 @@ struct bpf_reference_state { * is used purely to inform the user of a reference leak. */ int insn_idx; + /* There can be a case like: + * main (frame 0) + * cb (frame 1) + * func (frame 3) + * cb (frame 4) + * Hence for frame 4, if callback_ref just stored boolean, it would be + * impossible to distinguish nested callback refs. Hence store the + * frameno and compare that to callback_ref in check_reference_leak when + * exiting a callback function. + */ + int callback_ref; }; /* state of the program: -- cgit From ea5cba269fb1fe22b84f4d01bb3d56320e6ffa3e Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Tue, 16 Aug 2022 11:26:23 +0200 Subject: wifi: cfg80211/mac80211: check EHT capability size correctly For AP/non-AP the EHT MCS/NSS subfield size differs, the 4-octet subfield is only used for 20 MHz-only non-AP STA. Pass an argument around everywhere to be able to parse it properly. Signed-off-by: Johannes Berg --- include/linux/ieee80211.h | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index 55e6f4ad0ca6..6f70394417ac 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -2886,7 +2886,8 @@ ieee80211_he_spr_size(const u8 *he_spr_ie) /* Calculate 802.11be EHT capabilities IE Tx/Rx EHT MCS NSS Support Field size */ static inline u8 ieee80211_eht_mcs_nss_size(const struct ieee80211_he_cap_elem *he_cap, - const struct ieee80211_eht_cap_elem_fixed *eht_cap) + const struct ieee80211_eht_cap_elem_fixed *eht_cap, + bool from_ap) { u8 count = 0; @@ -2907,7 +2908,10 @@ ieee80211_eht_mcs_nss_size(const struct ieee80211_he_cap_elem *he_cap, if (eht_cap->phy_cap_info[0] & IEEE80211_EHT_PHY_CAP0_320MHZ_IN_6GHZ) count += 3; - return count ? count : 4; + if (count) + return count; + + return from_ap ? 3 : 4; } /* 802.11be EHT PPE Thresholds */ @@ -2943,7 +2947,8 @@ ieee80211_eht_ppe_size(u16 ppe_thres_hdr, const u8 *phy_cap_info) } static inline bool -ieee80211_eht_capa_size_ok(const u8 *he_capa, const u8 *data, u8 len) +ieee80211_eht_capa_size_ok(const u8 *he_capa, const u8 *data, u8 len, + bool from_ap) { const struct ieee80211_eht_cap_elem_fixed *elem = (const void *)data; u8 needed = sizeof(struct ieee80211_eht_cap_elem_fixed); @@ -2952,7 +2957,8 @@ ieee80211_eht_capa_size_ok(const u8 *he_capa, const u8 *data, u8 len) return false; needed += ieee80211_eht_mcs_nss_size((const void *)he_capa, - (const void *)data); + (const void *)data, + from_ap); if (len < needed) return false; -- cgit From d8c04e27d93eb0afd750d5ea60119a92b146a755 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 1 Aug 2022 14:37:31 +0300 Subject: platform/x86: pmc_atom: Fix SLP_TYPx bitfield mask On Intel hardware the SLP_TYPx bitfield occupies bits 10-12 as per ACPI specification (see Table 4.13 "PM1 Control Registers Fixed Hardware Feature Control Bits" for the details). Fix the mask and other related definitions accordingly. Fixes: 93e5eadd1f6e ("x86/platform: New Intel Atom SOC power management controller driver") Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220801113734.36131-1-andriy.shevchenko@linux.intel.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/pmc_atom.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/pmc_atom.h b/include/linux/platform_data/x86/pmc_atom.h index 3edfb6d4e67a..dd81f510e4cf 100644 --- a/include/linux/platform_data/x86/pmc_atom.h +++ b/include/linux/platform_data/x86/pmc_atom.h @@ -7,6 +7,8 @@ #ifndef PMC_ATOM_H #define PMC_ATOM_H +#include + /* ValleyView Power Control Unit PCI Device ID */ #define PCI_DEVICE_ID_VLV_PMC 0x0F1C /* CherryTrail Power Control Unit PCI Device ID */ @@ -139,9 +141,9 @@ #define ACPI_MMIO_REG_LEN 0x100 #define PM1_CNT 0x4 -#define SLEEP_TYPE_MASK 0xFFFFECFF +#define SLEEP_TYPE_MASK GENMASK(12, 10) #define SLEEP_TYPE_S5 0x1C00 -#define SLEEP_ENABLE 0x2000 +#define SLEEP_ENABLE BIT(13) extern int pmc_atom_read(int offset, u32 *value); -- cgit From 5a88ace44d0f99e2bac55d827f4cd6e8bc07e00b Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 1 Aug 2022 14:37:34 +0300 Subject: platform/x86: pmc_atom: Amend comment style and grammar The style of the comments is not uniform, make it so and fix a few grammar issues. While at it, update Copyright years. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220801113734.36131-4-andriy.shevchenko@linux.intel.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/pmc_atom.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/pmc_atom.h b/include/linux/platform_data/x86/pmc_atom.h index dd81f510e4cf..b8a701c77fd0 100644 --- a/include/linux/platform_data/x86/pmc_atom.h +++ b/include/linux/platform_data/x86/pmc_atom.h @@ -1,7 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0-only */ /* - * Intel Atom SOC Power Management Controller Header File - * Copyright (c) 2014, Intel Corporation. + * Intel Atom SoC Power Management Controller Header File + * Copyright (c) 2014-2015,2022 Intel Corporation. */ #ifndef PMC_ATOM_H -- cgit From 01ef026ab36357a818c7d8324a36dbb8beff6ff5 Mon Sep 17 00:00:00 2001 From: "Luke D. Jones" Date: Sat, 13 Aug 2022 21:26:24 +1200 Subject: platform/x86: asus-wmi: Support the hardware GPU MUX on some laptops Support the hardware GPU MUX switch available on some models. This switch can toggle the MUX between: - 0, Dedicated mode - 1, Optimus mode Optimus mode is the regular iGPU + dGPU available, while dedicated mode switches the system to have only the dGPU available. Signed-off-by: Luke D. Jones Link: https://lore.kernel.org/r/20220813092624.6228-1-luke@ljones.dev Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/asus-wmi.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h index 98f2b2f20f3e..70d2347bf6ca 100644 --- a/include/linux/platform_data/x86/asus-wmi.h +++ b/include/linux/platform_data/x86/asus-wmi.h @@ -99,6 +99,9 @@ /* dgpu on/off */ #define ASUS_WMI_DEVID_DGPU 0x00090020 +/* gpu mux switch, 0 = dGPU, 1 = Optimus */ +#define ASUS_WMI_DEVID_GPU_MUX 0x00090016 + /* DSTS masks */ #define ASUS_WMI_DSTS_STATUS_BIT 0x00000001 #define ASUS_WMI_DSTS_UNKNOWN_BIT 0x00000002 -- cgit From e397c3c460bf3849384f2f55516d1887617cfca9 Mon Sep 17 00:00:00 2001 From: "Luke D. Jones" Date: Sat, 13 Aug 2022 21:27:53 +1200 Subject: platform/x86: asus-wmi: Add support for ROG X13 tablet mode Add quirk for ASUS ROG X13 Flow 2-in-1 to enable tablet mode with lid flip (all screen rotations). Signed-off-by: Luke D. Jones Link: https://lore.kernel.org/r/20220813092753.6635-2-luke@ljones.dev Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/asus-wmi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h index 70d2347bf6ca..6e8a95c10d17 100644 --- a/include/linux/platform_data/x86/asus-wmi.h +++ b/include/linux/platform_data/x86/asus-wmi.h @@ -65,6 +65,7 @@ #define ASUS_WMI_DEVID_PANEL_OD 0x00050019 #define ASUS_WMI_DEVID_CAMERA 0x00060013 #define ASUS_WMI_DEVID_LID_FLIP 0x00060062 +#define ASUS_WMI_DEVID_LID_FLIP_ROG 0x00060077 /* Storage */ #define ASUS_WMI_DEVID_CARDREADER 0x00080013 -- cgit From d4ccaf58a8472123ac97e6db03932c375b5c45ba Mon Sep 17 00:00:00 2001 From: Hao Luo Date: Wed, 24 Aug 2022 16:31:13 -0700 Subject: bpf: Introduce cgroup iter Cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes: - walking a cgroup's descendants in pre-order. - walking a cgroup's descendants in post-order. - walking a cgroup's ancestors. - process only the given cgroup. When attaching cgroup_iter, one can set a cgroup to the iter_link created from attaching. This cgroup is passed as a file descriptor or cgroup id and serves as the starting point of the walk. If no cgroup is specified, the starting point will be the root cgroup v2. For walking descendants, one can specify the order: either pre-order or post-order. For walking ancestors, the walk starts at the specified cgroup and ends at the root. One can also terminate the walk early by returning 1 from the iter program. Note that because walking cgroup hierarchy holds cgroup_mutex, the iter program is called with cgroup_mutex held. Currently only one session is supported, which means, depending on the volume of data bpf program intends to send to user space, the number of cgroups that can be walked is limited. For example, given the current buffer size is 8 * PAGE_SIZE, if the program sends 64B data for each cgroup, assuming PAGE_SIZE is 4kb, the total number of cgroups that can be walked is 512. This is a limitation of cgroup_iter. If the output data is larger than the kernel buffer size, after all data in the kernel buffer is consumed by user space, the subsequent read() syscall will signal EOPNOTSUPP. In order to work around, the user may have to update their program to reduce the volume of data sent to output. For example, skip some uninteresting cgroups. In future, we may extend bpf_iter flags to allow customizing buffer size. Acked-by: Yonghong Song Acked-by: Tejun Heo Signed-off-by: Hao Luo Link: https://lore.kernel.org/r/20220824233117.1312810-2-haoluo@google.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 99fc7a64564f..9c1674973e03 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -48,6 +48,7 @@ struct mem_cgroup; struct module; struct bpf_func_state; struct ftrace_ops; +struct cgroup; extern struct idr btf_idr; extern spinlock_t btf_idr_lock; @@ -1730,7 +1731,14 @@ int bpf_obj_get_user(const char __user *pathname, int flags); int __init bpf_iter_ ## target(args) { return 0; } struct bpf_iter_aux_info { + /* for map_elem iter */ struct bpf_map *map; + + /* for cgroup iter */ + struct { + struct cgroup *start; /* starting cgroup */ + enum bpf_cgroup_iter_order order; + } cgroup; }; typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog, -- cgit From 40bfe7a86d84cf08ac6a8fe2f0c8bf7a43edd110 Mon Sep 17 00:00:00 2001 From: Thierry Reding Date: Wed, 24 Aug 2022 17:32:56 +0200 Subject: of/device: Fix up of_dma_configure_id() stub Since the stub version of of_dma_configure_id() was added in commit a081bd4af4ce ("of/device: Add input id to of_dma_configure()"), it has not matched the signature of the full function, leading to build failure reports when code using this function is built on !OF configurations. Fixes: a081bd4af4ce ("of/device: Add input id to of_dma_configure()") Cc: stable@vger.kernel.org Signed-off-by: Thierry Reding Reviewed-by: Frank Rowand Acked-by: Lorenzo Pieralisi Link: https://lore.kernel.org/r/20220824153256.1437483-1-thierry.reding@gmail.com Signed-off-by: Rob Herring --- include/linux/of_device.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/of_device.h b/include/linux/of_device.h index 1d7992a02e36..1a803e4335d3 100644 --- a/include/linux/of_device.h +++ b/include/linux/of_device.h @@ -101,8 +101,9 @@ static inline struct device_node *of_cpu_device_node_get(int cpu) } static inline int of_dma_configure_id(struct device *dev, - struct device_node *np, - bool force_dma) + struct device_node *np, + bool force_dma, + const u32 *id) { return 0; } -- cgit From b9030780971b56c0c455c3b66244efd96608846d Mon Sep 17 00:00:00 2001 From: Uros Bizjak Date: Mon, 22 Aug 2022 16:32:43 +0200 Subject: netdev: Use try_cmpxchg in napi_if_scheduled_mark_missed Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in napi_if_scheduled_mark_missed. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails, enabling further code simplifications. Cc: Eric Dumazet Cc: Paolo Abeni Signed-off-by: Uros Bizjak Link: https://lore.kernel.org/r/20220822143243.2798-1-ubizjak@gmail.com Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 64e8662632f8..8839abc1f941 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -546,8 +546,8 @@ static inline bool napi_if_scheduled_mark_missed(struct napi_struct *n) { unsigned long val, new; + val = READ_ONCE(n->state); do { - val = READ_ONCE(n->state); if (val & NAPIF_STATE_DISABLE) return true; @@ -555,7 +555,7 @@ static inline bool napi_if_scheduled_mark_missed(struct napi_struct *n) return false; new = val | NAPIF_STATE_MISSED; - } while (cmpxchg(&n->state, val, new) != val); + } while (!try_cmpxchg(&n->state, &val, new)); return true; } -- cgit From e00923c59e68b63c998a0fef4145b5279331968a Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Thu, 9 Jun 2022 12:33:13 +0900 Subject: ata: libata: Rename ATA_DFLAG_NCQ_PRIO_ENABLE Rename ATA_DFLAG_NCQ_PRIO_ENABLE to ATA_DFLAG_NCQ_PRIO_ENABLED to match the fact that this flags indicates if NCQ priority use is enabled by the user. Signed-off-by: Damien Le Moal --- include/linux/libata.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index 0269ff114f5a..a505cfb92ab3 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -101,7 +101,7 @@ enum { ATA_DFLAG_UNLOCK_HPA = (1 << 18), /* unlock HPA */ ATA_DFLAG_NCQ_SEND_RECV = (1 << 19), /* device supports NCQ SEND and RECV */ ATA_DFLAG_NCQ_PRIO = (1 << 20), /* device supports NCQ priority */ - ATA_DFLAG_NCQ_PRIO_ENABLE = (1 << 21), /* Priority cmds sent to dev */ + ATA_DFLAG_NCQ_PRIO_ENABLED = (1 << 21), /* Priority cmds sent to dev */ ATA_DFLAG_INIT_MASK = (1 << 24) - 1, ATA_DFLAG_DETACH = (1 << 24), -- cgit From 12ff4c803d23218f9afea9f82019a5c9619f6744 Mon Sep 17 00:00:00 2001 From: "Luke D. Jones" Date: Fri, 26 Aug 2022 12:42:10 +1200 Subject: platform/x86: asus-wmi: Support the GPU fan on TUF laptops Add support for TUF laptops which have the ability to control the GPU fan. This will show as a second fan in hwmon, and has the ability to run as boost (fullspeed), or auto. Signed-off-by: Luke D. Jones Link: https://lore.kernel.org/r/20220826004210.356534-3-luke@ljones.dev Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/asus-wmi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h index 6e8a95c10d17..31d810e6569d 100644 --- a/include/linux/platform_data/x86/asus-wmi.h +++ b/include/linux/platform_data/x86/asus-wmi.h @@ -79,6 +79,7 @@ #define ASUS_WMI_DEVID_THERMAL_CTRL 0x00110011 #define ASUS_WMI_DEVID_FAN_CTRL 0x00110012 /* deprecated */ #define ASUS_WMI_DEVID_CPU_FAN_CTRL 0x00110013 +#define ASUS_WMI_DEVID_GPU_FAN_CTRL 0x00110014 #define ASUS_WMI_DEVID_CPU_FAN_CURVE 0x00110024 #define ASUS_WMI_DEVID_GPU_FAN_CURVE 0x00110025 -- cgit From e305a71cea37a64c7558b8b979f6f08f657d0c3d Mon Sep 17 00:00:00 2001 From: "Luke D. Jones" Date: Fri, 26 Aug 2022 11:22:50 +1200 Subject: platform/x86: asus-wmi: Implement TUF laptop keyboard LED modes Adds support for changing the laptop keyboard LED mode and colour. The modes are visible effects such as static, rainbow, pulsing, colour cycles. These sysfs attributes are added to asus::kbd_backlight: - kbd_rgb_mode - kbd_rgb_mode_index Signed-off-by: Luke D. Jones Link: https://lore.kernel.org/r/20220825232251.345893-2-luke@ljones.dev Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/asus-wmi.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h index 31d810e6569d..09e5477f1aea 100644 --- a/include/linux/platform_data/x86/asus-wmi.h +++ b/include/linux/platform_data/x86/asus-wmi.h @@ -104,6 +104,9 @@ /* gpu mux switch, 0 = dGPU, 1 = Optimus */ #define ASUS_WMI_DEVID_GPU_MUX 0x00090016 +/* TUF laptop RGB modes/colours */ +#define ASUS_WMI_DEVID_TUF_RGB_MODE 0x00100056 + /* DSTS masks */ #define ASUS_WMI_DSTS_STATUS_BIT 0x00000001 #define ASUS_WMI_DSTS_UNKNOWN_BIT 0x00000002 -- cgit From 61f64515299e5f4885093656087ec1c0df8109c5 Mon Sep 17 00:00:00 2001 From: "Luke D. Jones" Date: Fri, 26 Aug 2022 11:22:51 +1200 Subject: platform/x86: asus-wmi: Implement TUF laptop keyboard power states Adds support for setting various power states of TUF keyboards. These states are combinations of: - boot, set if a boot animation is shown on keyboard - awake, set if the keyboard LEDs are visible while laptop is on - sleep, set if an animation is displayed while the laptop is suspended - keyboard (unknown effect) Adds two sysfs attributes to asus::kbd_backlight: - kbd_rgb_state - kbd_rgb_state_index Signed-off-by: Luke D. Jones Link: https://lore.kernel.org/r/20220825232251.345893-3-luke@ljones.dev Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/asus-wmi.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/asus-wmi.h b/include/linux/platform_data/x86/asus-wmi.h index 09e5477f1aea..28234dc9fa6a 100644 --- a/include/linux/platform_data/x86/asus-wmi.h +++ b/include/linux/platform_data/x86/asus-wmi.h @@ -107,6 +107,9 @@ /* TUF laptop RGB modes/colours */ #define ASUS_WMI_DEVID_TUF_RGB_MODE 0x00100056 +/* TUF laptop RGB power/state */ +#define ASUS_WMI_DEVID_TUF_RGB_STATE 0x00100057 + /* DSTS masks */ #define ASUS_WMI_DSTS_STATUS_BIT 0x00000001 #define ASUS_WMI_DSTS_UNKNOWN_BIT 0x00000002 -- cgit From 2a5840124009f133bd09fd855963551fb2cefe22 Mon Sep 17 00:00:00 2001 From: Luis Chamberlain Date: Fri, 15 Jul 2022 12:16:22 -0700 Subject: lsm,io_uring: add LSM hooks for the new uring_cmd file op io-uring cmd support was added through ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd"), this extended the struct file_operations to allow a new command which each subsystem can use to enable command passthrough. Add an LSM specific for the command passthrough which enables LSMs to inspect the command details. This was discussed long ago without no clear pointer for something conclusive, so this enables LSMs to at least reject this new file operation. [0] https://lkml.kernel.org/r/8adf55db-7bab-f59d-d612-ed906b948d19@schaufler-ca.com Cc: stable@vger.kernel.org Fixes: ee692a21e9bf ("fs,io_uring: add infrastructure for uring-cmd") Signed-off-by: Luis Chamberlain Acked-by: Jens Axboe Signed-off-by: Paul Moore --- include/linux/lsm_hook_defs.h | 1 + include/linux/lsm_hooks.h | 3 +++ include/linux/security.h | 5 +++++ 3 files changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index 806448173033..60fff133c0b1 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -407,4 +407,5 @@ LSM_HOOK(int, 0, perf_event_write, struct perf_event *event) #ifdef CONFIG_IO_URING LSM_HOOK(int, 0, uring_override_creds, const struct cred *new) LSM_HOOK(int, 0, uring_sqpoll, void) +LSM_HOOK(int, 0, uring_cmd, struct io_uring_cmd *ioucmd) #endif /* CONFIG_IO_URING */ diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h index 84a0d7e02176..3aa6030302f5 100644 --- a/include/linux/lsm_hooks.h +++ b/include/linux/lsm_hooks.h @@ -1582,6 +1582,9 @@ * Check whether the current task is allowed to spawn a io_uring polling * thread (IORING_SETUP_SQPOLL). * + * @uring_cmd: + * Check whether the file_operations uring_cmd is allowed to run. + * */ union security_list_options { #define LSM_HOOK(RET, DEFAULT, NAME, ...) RET (*NAME)(__VA_ARGS__); diff --git a/include/linux/security.h b/include/linux/security.h index 1bc362cb413f..7bd0c490703d 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -2060,6 +2060,7 @@ static inline int security_perf_event_write(struct perf_event *event) #ifdef CONFIG_SECURITY extern int security_uring_override_creds(const struct cred *new); extern int security_uring_sqpoll(void); +extern int security_uring_cmd(struct io_uring_cmd *ioucmd); #else static inline int security_uring_override_creds(const struct cred *new) { @@ -2069,6 +2070,10 @@ static inline int security_uring_sqpoll(void) { return 0; } +static inline int security_uring_cmd(struct io_uring_cmd *ioucmd) +{ + return 0; +} #endif /* CONFIG_SECURITY */ #endif /* CONFIG_IO_URING */ -- cgit From 8238b4579866b7c1bb99883cfe102a43db5506ff Mon Sep 17 00:00:00 2001 From: Mikulas Patocka Date: Fri, 26 Aug 2022 09:17:08 -0400 Subject: wait_on_bit: add an acquire memory barrier There are several places in the kernel where wait_on_bit is not followed by a memory barrier (for example, in drivers/md/dm-bufio.c:new_read). On architectures with weak memory ordering, it may happen that memory accesses that follow wait_on_bit are reordered before wait_on_bit and they may return invalid data. Fix this class of bugs by introducing a new function "test_bit_acquire" that works like test_bit, but has acquire memory ordering semantics. Signed-off-by: Mikulas Patocka Acked-by: Will Deacon Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds --- include/linux/bitops.h | 1 + include/linux/buffer_head.h | 2 +- include/linux/wait_bit.h | 8 ++++---- 3 files changed, 6 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bitops.h b/include/linux/bitops.h index cf9bf65039f2..3b89c64bcfd8 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -59,6 +59,7 @@ extern unsigned long __sw_hweight64(__u64 w); #define __test_and_clear_bit(nr, addr) bitop(___test_and_clear_bit, nr, addr) #define __test_and_change_bit(nr, addr) bitop(___test_and_change_bit, nr, addr) #define test_bit(nr, addr) bitop(_test_bit, nr, addr) +#define test_bit_acquire(nr, addr) bitop(_test_bit_acquire, nr, addr) /* * Include this here because some architectures need generic_ffs/fls in diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index def8b8d30ccc..089c9ade4325 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -156,7 +156,7 @@ static __always_inline int buffer_uptodate(const struct buffer_head *bh) * make it consistent with folio_test_uptodate * pairs with smp_mb__before_atomic in set_buffer_uptodate */ - return (smp_load_acquire(&bh->b_state) & (1UL << BH_Uptodate)) != 0; + return test_bit_acquire(BH_Uptodate, &bh->b_state); } #define bh_offset(bh) ((unsigned long)(bh)->b_data & ~PAGE_MASK) diff --git a/include/linux/wait_bit.h b/include/linux/wait_bit.h index 7dec36aecbd9..7725b7579b78 100644 --- a/include/linux/wait_bit.h +++ b/include/linux/wait_bit.h @@ -71,7 +71,7 @@ static inline int wait_on_bit(unsigned long *word, int bit, unsigned mode) { might_sleep(); - if (!test_bit(bit, word)) + if (!test_bit_acquire(bit, word)) return 0; return out_of_line_wait_on_bit(word, bit, bit_wait, @@ -96,7 +96,7 @@ static inline int wait_on_bit_io(unsigned long *word, int bit, unsigned mode) { might_sleep(); - if (!test_bit(bit, word)) + if (!test_bit_acquire(bit, word)) return 0; return out_of_line_wait_on_bit(word, bit, bit_wait_io, @@ -123,7 +123,7 @@ wait_on_bit_timeout(unsigned long *word, int bit, unsigned mode, unsigned long timeout) { might_sleep(); - if (!test_bit(bit, word)) + if (!test_bit_acquire(bit, word)) return 0; return out_of_line_wait_on_bit_timeout(word, bit, bit_wait_timeout, @@ -151,7 +151,7 @@ wait_on_bit_action(unsigned long *word, int bit, wait_bit_action_f *action, unsigned mode) { might_sleep(); - if (!test_bit(bit, word)) + if (!test_bit_acquire(bit, word)) return 0; return out_of_line_wait_on_bit(word, bit, action, mode); } -- cgit From fa7e439cf90ba23ea473d0b7d85efd02ae6ccf94 Mon Sep 17 00:00:00 2001 From: Michal Koutný Date: Fri, 26 Aug 2022 18:52:37 +0200 Subject: cgroup: Homogenize cgroup_get_from_id() return value MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cgroup id is user provided datum hence extend its return domain to include possible error reason (similar to cgroup_get_from_fd()). This change also fixes commit d4ccaf58a847 ("bpf: Introduce cgroup iter") that would use NULL instead of proper error handling in d4ccaf58a847 ("bpf: Introduce cgroup iter"). Additionally, neither of: fc_appid_store, bpf_iter_attach_cgroup, mem_cgroup_get_from_ino (callers of cgroup_get_from_fd) is built without CONFIG_CGROUPS (depends via CONFIG_BLK_CGROUP, direct, transitive CONFIG_MEMCG respectively) transitive, so drop the singular definition not needed with !CONFIG_CGROUPS. Fixes: d4ccaf58a847 ("bpf: Introduce cgroup iter") Signed-off-by: Michal Koutný Signed-off-by: Tejun Heo --- include/linux/cgroup.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 4d143729b246..30ee8ce2665e 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -750,11 +750,6 @@ static inline bool task_under_cgroup_hierarchy(struct task_struct *task, static inline void cgroup_path_from_kernfs_id(u64 id, char *buf, size_t buflen) {} - -static inline struct cgroup *cgroup_get_from_id(u64 id) -{ - return NULL; -} #endif /* !CONFIG_CGROUPS */ #ifdef CONFIG_CGROUPS -- cgit From 93315e46b000fc80fff5d53c3f444417fb3df6de Mon Sep 17 00:00:00 2001 From: Sandipan Das Date: Thu, 11 Aug 2022 18:00:00 +0530 Subject: perf/core: Add speculation info to branch entries Add a new "spec" bitfield to branch entries for providing speculation information. This will be populated using hints provided by branch sampling features on supported hardware. The following cases are covered: * No branch speculation information is available * Branch is speculative but taken on the wrong path * Branch is non-speculative but taken on the correct path * Branch is speculative and taken on the correct path Suggested-by: Stephane Eranian Signed-off-by: Sandipan Das Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/834088c302faf21c7b665031dd111f424e509a64.1660211399.git.sandipan.das@amd.com --- include/linux/perf_event.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index ee8b9ecdc03b..ae30c61957d2 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1078,6 +1078,7 @@ static inline void perf_clear_branch_entry_bitfields(struct perf_branch_entry *b br->abort = 0; br->cycles = 0; br->type = 0; + br->spec = PERF_BR_SPEC_NA; br->reserved = 0; } -- cgit From d2a4cbcb8bdc0e3d1cf85bf47a670695da0bc27a Mon Sep 17 00:00:00 2001 From: Dmitry Rokosov Date: Fri, 12 Aug 2022 16:52:26 +0000 Subject: units: complement the set of Hz units Currently, Hz units do not have milli, micro and nano Hz coefficients. Some drivers (IIO especially) use their analogues to calculate appropriate Hz values. This patch includes them to units.h definitions, so they can be used from different kernel places. Signed-off-by: Dmitry Rokosov Link: https://lore.kernel.org/r/20220812165243.22177-3-ddrokosov@sberdevices.ru Signed-off-by: Jonathan Cameron --- include/linux/units.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/units.h b/include/linux/units.h index 681fc652e3d7..2793a41e73a2 100644 --- a/include/linux/units.h +++ b/include/linux/units.h @@ -20,6 +20,9 @@ #define PICO 1000000000000ULL #define FEMTO 1000000000000000ULL +#define NANOHZ_PER_HZ 1000000000UL +#define MICROHZ_PER_HZ 1000000UL +#define MILLIHZ_PER_HZ 1000UL #define HZ_PER_KHZ 1000UL #define KHZ_PER_MHZ 1000UL #define HZ_PER_MHZ 1000000UL -- cgit From 1f5d7ea73c4b630dbb2c90818cb9fc0be54d2fe3 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Mon, 22 Aug 2022 17:49:23 +0000 Subject: lib/string_helpers: Add str_read_write() helper Add str_read_write() helper to return 'read' or 'write' string literal. Signed-off-by: Andy Shevchenko Signed-off-by: Dmitry Rokosov Link: https://lore.kernel.org/r/20220822175011.2886-2-ddrokosov@sberdevices.ru Signed-off-by: Jonathan Cameron --- include/linux/string_helpers.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/string_helpers.h b/include/linux/string_helpers.h index 4d72258d42fd..9e22cd78f3b8 100644 --- a/include/linux/string_helpers.h +++ b/include/linux/string_helpers.h @@ -126,4 +126,9 @@ static inline const char *str_enabled_disabled(bool v) return v ? "enabled" : "disabled"; } +static inline const char *str_read_write(bool v) +{ + return v ? "read" : "write"; +} + #endif -- cgit From fcab34b433e2c13e333b2f53c4a8409eadc432c7 Mon Sep 17 00:00:00 2001 From: Alex Williamson Date: Wed, 10 Aug 2022 10:53:59 -0600 Subject: mm: re-allow pinning of zero pfns (again) The below referenced commit makes the same error as 1c563432588d ("mm: fix is_pinnable_page against a cma page"), re-interpreting the logic to exclude pinning of the zero page, which breaks device assignment with vfio. To avoid further subtle mistakes, split the logic into discrete tests. [akpm@linux-foundation.org: simplify comment, per John] Link: https://lkml.kernel.org/r/166015037385.760108.16881097713975517242.stgit@omen Link: https://lore.kernel.org/all/165490039431.944052.12458624139225785964.stgit@omen Fixes: f25cbb7a95a2 ("mm: add zone device coherent type memory support") Signed-off-by: Alex Williamson Suggested-by: Matthew Wilcox Suggested-by: Felix Kuehling Tested-by: Slawomir Laba Reviewed-by: John Hubbard Cc: Alex Sierra Cc: Christoph Hellwig Cc: Alistair Popple Signed-off-by: Andrew Morton --- include/linux/mm.h | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 982f2607180b..21f8b27bd9fd 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1544,9 +1544,16 @@ static inline bool is_longterm_pinnable_page(struct page *page) if (mt == MIGRATE_CMA || mt == MIGRATE_ISOLATE) return false; #endif - return !(is_device_coherent_page(page) || - is_zone_movable_page(page) || - is_zero_pfn(page_to_pfn(page))); + /* The zero page may always be pinned */ + if (is_zero_pfn(page_to_pfn(page))) + return true; + + /* Coherent device memory must always allow eviction. */ + if (is_device_coherent_page(page)) + return false; + + /* Otherwise, non-movable zone pages can be pinned. */ + return !is_zone_movable_page(page); } #else static inline bool is_longterm_pinnable_page(struct page *page) -- cgit From dbb16df6443c59e8a1ef21c2272fcf387d600ddf Mon Sep 17 00:00:00 2001 From: Shakeel Butt Date: Wed, 17 Aug 2022 17:21:39 +0000 Subject: Revert "memcg: cleanup racy sum avoidance code" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This reverts commit 96e51ccf1af33e82f429a0d6baebba29c6448d0f. Recently we started running the kernel with rstat infrastructure on production traffic and begin to see negative memcg stats values. Particularly the 'sock' stat is the one which we observed having negative value. $ grep "sock " /mnt/memory/job/memory.stat sock 253952 total_sock 18446744073708724224 Re-run after couple of seconds $ grep "sock " /mnt/memory/job/memory.stat sock 253952 total_sock 53248 For now we are only seeing this issue on large machines (256 CPUs) and only with 'sock' stat. I think the networking stack increase the stat on one cpu and decrease it on another cpu much more often. So, this negative sock is due to rstat flusher flushing the stats on the CPU that has seen the decrement of sock but missed the CPU that has increments. A typical race condition. For easy stable backport, revert is the most simple solution. For long term solution, I am thinking of two directions. First is just reduce the race window by optimizing the rstat flusher. Second is if the reader sees a negative stat value, force flush and restart the stat collection. Basically retry but limited. Link: https://lkml.kernel.org/r/20220817172139.3141101-1-shakeelb@google.com Fixes: 96e51ccf1af33e8 ("memcg: cleanup racy sum avoidance code") Signed-off-by: Shakeel Butt Cc: "Michal Koutný" Cc: Johannes Weiner Cc: Michal Hocko Cc: Roman Gushchin Cc: Muchun Song Cc: David Hildenbrand Cc: Yosry Ahmed Cc: Greg Thelen Cc: [5.15] Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 4d31ce55b1c0..6257867fbf95 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -987,19 +987,30 @@ static inline void mod_memcg_page_state(struct page *page, static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx) { - return READ_ONCE(memcg->vmstats.state[idx]); + long x = READ_ONCE(memcg->vmstats.state[idx]); +#ifdef CONFIG_SMP + if (x < 0) + x = 0; +#endif + return x; } static inline unsigned long lruvec_page_state(struct lruvec *lruvec, enum node_stat_item idx) { struct mem_cgroup_per_node *pn; + long x; if (mem_cgroup_disabled()) return node_page_state(lruvec_pgdat(lruvec), idx); pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec); - return READ_ONCE(pn->lruvec_stats.state[idx]); + x = READ_ONCE(pn->lruvec_stats.state[idx]); +#ifdef CONFIG_SMP + if (x < 0) + x = 0; +#endif + return x; } static inline unsigned long lruvec_page_state_local(struct lruvec *lruvec, -- cgit From bc12b70f7d216b36bd87701349374a13e486f8eb Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 25 Aug 2022 13:36:40 +0200 Subject: x86/earlyprintk: Clean up pciserial While working on a GRUB patch to support PCI-serial, a number of cleanups were suggested that apply to the code I took inspiration from. Signed-off-by: Peter Zijlstra (Intel) Acked-by: Bjorn Helgaas # pci_ids.h Reviewed-by: Greg Kroah-Hartman Link: https://lkml.kernel.org/r/YwdeyCEtW+wa+QhH@worktop.programming.kicks-ass.net --- include/linux/pci_ids.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index 6feade66efdb..41b3fffdbb8e 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -75,6 +75,9 @@ #define PCI_CLASS_COMMUNICATION_MODEM 0x0703 #define PCI_CLASS_COMMUNICATION_OTHER 0x0780 +/* Interface for SERIAL/MODEM */ +#define PCI_SERIAL_16550_COMPATIBLE 0x02 + #define PCI_BASE_CLASS_SYSTEM 0x08 #define PCI_CLASS_SYSTEM_PIC 0x0800 #define PCI_CLASS_SYSTEM_PIC_IOAPIC 0x080010 -- cgit From 9c5d03d362519f36cd551aec596388f895c93d2d Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 24 Aug 2022 17:18:30 -0700 Subject: genetlink: start to validate reserved header bytes We had historically not checked that genlmsghdr.reserved is 0 on input which prevents us from using those precious bytes in the future. One use case would be to extend the cmd field, which is currently just 8 bits wide and 256 is not a lot of commands for some core families. To make sure that new families do the right thing by default put the onus of opting out of validation on existing families. Signed-off-by: Jakub Kicinski Acked-by: Paul Moore (NetLabel) Signed-off-by: David S. Miller --- include/linux/genl_magic_func.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/genl_magic_func.h b/include/linux/genl_magic_func.h index 939b1a8f571b..4a4b387181ad 100644 --- a/include/linux/genl_magic_func.h +++ b/include/linux/genl_magic_func.h @@ -294,6 +294,7 @@ static struct genl_family ZZZ_genl_family __ro_after_init = { .ops = ZZZ_genl_ops, .n_ops = ARRAY_SIZE(ZZZ_genl_ops), .mcgrps = ZZZ_genl_mcgrps, + .resv_start_op = 42, /* drbd is currently the only user */ .n_mcgrps = ARRAY_SIZE(ZZZ_genl_mcgrps), .module = THIS_MODULE, }; -- cgit From e8013f8edaa30e17fd583a326daff1fa86b75c82 Mon Sep 17 00:00:00 2001 From: Casper Andersson Date: Thu, 25 Aug 2022 11:28:35 +0200 Subject: ethernet: Add helpers to recognize addresses mapped to IP multicast IP multicast must sometimes be discriminated from non-IP multicast, e.g. when determining the forwarding behavior of a given group in the presence of multicast router ports on an offloaded bridge. Therefore, provide helpers to identify these groups. Signed-off-by: Casper Andersson Signed-off-by: David S. Miller --- include/linux/etherdevice.h | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) (limited to 'include/linux') diff --git a/include/linux/etherdevice.h b/include/linux/etherdevice.h index 92b10e67d5f8..a541f0c4f146 100644 --- a/include/linux/etherdevice.h +++ b/include/linux/etherdevice.h @@ -428,6 +428,28 @@ static inline bool ether_addr_equal_masked(const u8 *addr1, const u8 *addr2, return true; } +static inline bool ether_addr_is_ipv4_mcast(const u8 *addr) +{ + u8 base[ETH_ALEN] = { 0x01, 0x00, 0x5e, 0x00, 0x00, 0x00 }; + u8 mask[ETH_ALEN] = { 0xff, 0xff, 0xff, 0x80, 0x00, 0x00 }; + + return ether_addr_equal_masked(addr, base, mask); +} + +static inline bool ether_addr_is_ipv6_mcast(const u8 *addr) +{ + u8 base[ETH_ALEN] = { 0x33, 0x33, 0x00, 0x00, 0x00, 0x00 }; + u8 mask[ETH_ALEN] = { 0xff, 0xff, 0x00, 0x00, 0x00, 0x00 }; + + return ether_addr_equal_masked(addr, base, mask); +} + +static inline bool ether_addr_is_ip_mcast(const u8 *addr) +{ + return ether_addr_is_ipv4_mcast(addr) || + ether_addr_is_ipv6_mcast(addr); +} + /** * ether_addr_to_u64 - Convert an Ethernet address into a u64 value. * @addr: Pointer to a six-byte array containing the Ethernet address -- cgit From dcf8e5633e2e69ad60b730ab5905608b756a032f Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Tue, 23 Aug 2022 12:59:25 -0700 Subject: tracing: Define the is_signed_type() macro once There are two definitions of the is_signed_type() macro: one in and a second definition in . As suggested by Linus, move the definition of the is_signed_type() macro into the header file. Change the definition of the is_signed_type() macro to make sure that it does not trigger any sparse warnings with future versions of sparse for bitwise types. Link: https://lore.kernel.org/all/CAHk-=whjH6p+qzwUdx5SOVVHjS3WvzJQr6mDUwhEyTf6pJWzaQ@mail.gmail.com/ Link: https://lore.kernel.org/all/CAHk-=wjQGnVfb4jehFR0XyZikdQvCZouE96xR_nnf5kqaM5qqQ@mail.gmail.com/ Cc: Rasmus Villemoes Cc: Steven Rostedt Acked-by: Kees Cook Signed-off-by: Bart Van Assche Signed-off-by: Linus Torvalds --- include/linux/compiler.h | 6 ++++++ include/linux/overflow.h | 1 - include/linux/trace_events.h | 2 -- 3 files changed, 6 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 01ce94b58b42..7713d7bcdaea 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -239,6 +239,12 @@ static inline void *offset_to_ptr(const int *off) /* &a[0] degrades to a pointer: a different type from an array */ #define __must_be_array(a) BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0])) +/* + * Whether 'type' is a signed type or an unsigned type. Supports scalar types, + * bool and also pointer types. + */ +#define is_signed_type(type) (((type)(-1)) < (__force type)1) + /* * This is needed in functions which generate the stack canary, see * arch/x86/kernel/smpboot.c::start_secondary() for an example. diff --git a/include/linux/overflow.h b/include/linux/overflow.h index f1221d11f8e5..0eb3b192f07a 100644 --- a/include/linux/overflow.h +++ b/include/linux/overflow.h @@ -30,7 +30,6 @@ * https://mail-index.netbsd.org/tech-misc/2007/02/05/0000.html - * credit to Christian Biere. */ -#define is_signed_type(type) (((type)(-1)) < (type)1) #define __type_half_max(type) ((type)1 << (8*sizeof(type) - 1 - is_signed_type(type))) #define type_max(T) ((T)((__type_half_max(T) - 1) + __type_half_max(T))) #define type_min(T) ((T)((T)-type_max(T)-(T)1)) diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index b18759a673c6..8401dec93c15 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -814,8 +814,6 @@ extern int trace_add_event_call(struct trace_event_call *call); extern int trace_remove_event_call(struct trace_event_call *call); extern int trace_event_get_offsets(struct trace_event_call *call); -#define is_signed_type(type) (((type)(-1)) < (type)1) - int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set); int trace_set_clr_event(const char *system, const char *event, int set); int trace_array_set_clr_event(struct trace_array *tr, const char *system, -- cgit From ff6d365898d4d31bd557954c7fc53f38977b491c Mon Sep 17 00:00:00 2001 From: Jeff Johnson Date: Mon, 22 Aug 2022 08:34:35 -0700 Subject: soc: qcom: qmi: use const for struct qmi_elem_info Currently all usage of struct qmi_elem_info, which is used to define the QMI message encoding/decoding rules, does not use const. This prevents clients from registering const arrays. Since these arrays are always pre-defined, they should be const, so add the const qualifier to all places in the QMI interface where struct qmi_elem_info is used. Once this patch is in place, clients can independently update their pre-defined arrays to be const, as demonstrated in the QMI sample code. Signed-off-by: Jeff Johnson Signed-off-by: Bjorn Andersson Link: https://lore.kernel.org/r/20220822153435.7856-1-quic_jjohnson@quicinc.com --- include/linux/soc/qcom/qmi.h | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soc/qcom/qmi.h b/include/linux/soc/qcom/qmi.h index b1f80e756d2a..469e02d2aa0d 100644 --- a/include/linux/soc/qcom/qmi.h +++ b/include/linux/soc/qcom/qmi.h @@ -75,7 +75,7 @@ struct qmi_elem_info { enum qmi_array_type array_type; u8 tlv_type; u32 offset; - struct qmi_elem_info *ei_array; + const struct qmi_elem_info *ei_array; }; #define QMI_RESULT_SUCCESS_V01 0 @@ -102,7 +102,7 @@ struct qmi_response_type_v01 { u16 error; }; -extern struct qmi_elem_info qmi_response_type_v01_ei[]; +extern const struct qmi_elem_info qmi_response_type_v01_ei[]; /** * struct qmi_service - context to track lookup-results @@ -173,7 +173,7 @@ struct qmi_txn { struct completion completion; int result; - struct qmi_elem_info *ei; + const struct qmi_elem_info *ei; void *dest; }; @@ -189,7 +189,7 @@ struct qmi_msg_handler { unsigned int type; unsigned int msg_id; - struct qmi_elem_info *ei; + const struct qmi_elem_info *ei; size_t decoded_size; void (*fn)(struct qmi_handle *qmi, struct sockaddr_qrtr *sq, @@ -249,23 +249,23 @@ void qmi_handle_release(struct qmi_handle *qmi); ssize_t qmi_send_request(struct qmi_handle *qmi, struct sockaddr_qrtr *sq, struct qmi_txn *txn, int msg_id, size_t len, - struct qmi_elem_info *ei, const void *c_struct); + const struct qmi_elem_info *ei, const void *c_struct); ssize_t qmi_send_response(struct qmi_handle *qmi, struct sockaddr_qrtr *sq, struct qmi_txn *txn, int msg_id, size_t len, - struct qmi_elem_info *ei, const void *c_struct); + const struct qmi_elem_info *ei, const void *c_struct); ssize_t qmi_send_indication(struct qmi_handle *qmi, struct sockaddr_qrtr *sq, - int msg_id, size_t len, struct qmi_elem_info *ei, + int msg_id, size_t len, const struct qmi_elem_info *ei, const void *c_struct); void *qmi_encode_message(int type, unsigned int msg_id, size_t *len, - unsigned int txn_id, struct qmi_elem_info *ei, + unsigned int txn_id, const struct qmi_elem_info *ei, const void *c_struct); int qmi_decode_message(const void *buf, size_t len, - struct qmi_elem_info *ei, void *c_struct); + const struct qmi_elem_info *ei, void *c_struct); int qmi_txn_init(struct qmi_handle *qmi, struct qmi_txn *txn, - struct qmi_elem_info *ei, void *c_struct); + const struct qmi_elem_info *ei, void *c_struct); int qmi_txn_wait(struct qmi_txn *txn, unsigned long timeout); void qmi_txn_cancel(struct qmi_txn *txn); -- cgit From c13d7d261e361dbaf5adbdc216ee4a1204c48001 Mon Sep 17 00:00:00 2001 From: Manivannan Sadhasivam Date: Thu, 25 Aug 2022 10:08:56 +0530 Subject: soc: qcom: llcc: Pass LLCC version based register offsets to EDAC driver The LLCC EDAC register offsets varies between each SoCs. Until now, the EDAC driver used the hardcoded register offsets. But this caused crash on SM8450 SoC where the register offsets has been changed. So to avoid this crash and also to make it easy to accommodate changes for new SoCs, let's pass the LLCC version specific register offsets to the EDAC driver. Currently, two set of offsets are used. One is starting from LLCC version v1.0.0 used by all SoCs other than SM8450. For SM8450, LLCC version starting from v2.1.0 is used. Signed-off-by: Manivannan Sadhasivam Reviewed-by: Sai Prakash Ranjan Signed-off-by: Bjorn Andersson Link: https://lore.kernel.org/r/20220825043859.30066-3-manivannan.sadhasivam@linaro.org --- include/linux/soc/qcom/llcc-qcom.h | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/qcom/llcc-qcom.h b/include/linux/soc/qcom/llcc-qcom.h index 9ed5384c5ca1..bc2fb8343a94 100644 --- a/include/linux/soc/qcom/llcc-qcom.h +++ b/include/linux/soc/qcom/llcc-qcom.h @@ -78,11 +78,40 @@ struct llcc_edac_reg_data { u8 ways_shift; }; +struct llcc_edac_reg_offset { + /* LLCC TRP registers */ + u32 trp_ecc_error_status0; + u32 trp_ecc_error_status1; + u32 trp_ecc_sb_err_syn0; + u32 trp_ecc_db_err_syn0; + u32 trp_ecc_error_cntr_clear; + u32 trp_interrupt_0_status; + u32 trp_interrupt_0_clear; + u32 trp_interrupt_0_enable; + + /* LLCC Common registers */ + u32 cmn_status0; + u32 cmn_interrupt_0_enable; + u32 cmn_interrupt_2_enable; + + /* LLCC DRP registers */ + u32 drp_ecc_error_cfg; + u32 drp_ecc_error_cntr_clear; + u32 drp_interrupt_status; + u32 drp_interrupt_clear; + u32 drp_interrupt_enable; + u32 drp_ecc_error_status0; + u32 drp_ecc_error_status1; + u32 drp_ecc_sb_err_syn0; + u32 drp_ecc_db_err_syn0; +}; + /** * struct llcc_drv_data - Data associated with the llcc driver * @regmap: regmap associated with the llcc device * @bcast_regmap: regmap associated with llcc broadcast offset * @cfg: pointer to the data structure for slice configuration + * @edac_reg_offset: Offset of the LLCC EDAC registers * @lock: mutex associated with each slice * @cfg_size: size of the config data table * @max_slices: max slices as read from device tree @@ -96,6 +125,7 @@ struct llcc_drv_data { struct regmap *regmap; struct regmap *bcast_regmap; const struct llcc_slice_config *cfg; + const struct llcc_edac_reg_offset *edac_reg_offset; struct mutex lock; u32 cfg_size; u32 max_slices; -- cgit From c60561014257699d81ab392eb6cb6389ad8907df Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Tue, 23 Aug 2022 12:50:03 +0800 Subject: soundwire: bus: allow device number to be unique at system level The SoundWire specification allows the device number to be allocated at will. When a system includes multiple SoundWire links, the device number scope is limited to the link to which the device is attached. However, for integration/debug it can be convenient to have a unique device number across the system. This patch adds a 'dev_num_ida_min' field at the bus level, which when set will be used to allocate an IDA. The allocation happens when a hardware device reports as ATTACHED. If any error happens during the enumeration, the allocated IDA is not freed - the device number will be reused if/when the device re-joins the bus. The IDA is only freed when the Linux device is unregistered. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Ranjani Sridharan Reviewed-by: Rander Wang Signed-off-by: Bard Liao Link: https://lore.kernel.org/r/20220823045004.2670658-3-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index 39058c841469..a2b31d25ea27 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -889,6 +889,9 @@ struct sdw_master_ops { * meaningful if multi_link is set. If set to 1, hardware-based * synchronization will be used even if a stream only uses a single * SoundWire segment. + * @dev_num_ida_min: if set, defines the minimum values for the IDA + * used to allocate system-unique device numbers. This value needs to be + * identical across all SoundWire bus in the system. */ struct sdw_bus { struct device *dev; @@ -913,6 +916,7 @@ struct sdw_bus { u32 bank_switch_timeout; bool multi_link; int hw_sync_min_links; + int dev_num_ida_min; }; int sdw_bus_master_add(struct sdw_bus *bus, struct device *parent, -- cgit From c5b81449f915a28bb9c7725e53aebab3ba39b4a2 Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Mon, 29 Aug 2022 14:47:07 +0200 Subject: perf/hw_breakpoint: Provide hw_breakpoint_is_used() and use in test Provide hw_breakpoint_is_used() to check if breakpoints are in use on the system. Use it in the KUnit test to verify the global state before and after a test case. Signed-off-by: Marco Elver Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Dmitry Vyukov Acked-by: Ian Rogers Link: https://lore.kernel.org/r/20220829124719.675715-3-elver@google.com --- include/linux/hw_breakpoint.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h index 78dd7035d1e5..a3fb846705eb 100644 --- a/include/linux/hw_breakpoint.h +++ b/include/linux/hw_breakpoint.h @@ -74,6 +74,7 @@ register_wide_hw_breakpoint(struct perf_event_attr *attr, extern int register_perf_hw_breakpoint(struct perf_event *bp); extern void unregister_hw_breakpoint(struct perf_event *bp); extern void unregister_wide_hw_breakpoint(struct perf_event * __percpu *cpu_events); +extern bool hw_breakpoint_is_used(void); extern int dbg_reserve_bp_slot(struct perf_event *bp); extern int dbg_release_bp_slot(struct perf_event *bp); @@ -121,6 +122,8 @@ register_perf_hw_breakpoint(struct perf_event *bp) { return -ENOSYS; } static inline void unregister_hw_breakpoint(struct perf_event *bp) { } static inline void unregister_wide_hw_breakpoint(struct perf_event * __percpu *cpu_events) { } +static inline bool hw_breakpoint_is_used(void) { return false; } + static inline int reserve_bp_slot(struct perf_event *bp) {return -ENOSYS; } static inline void release_bp_slot(struct perf_event *bp) { } -- cgit From 0370dc314df35579b751d1b77c9169f071444962 Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Mon, 29 Aug 2022 14:47:09 +0200 Subject: perf/hw_breakpoint: Optimize list of per-task breakpoints On a machine with 256 CPUs, running the recently added perf breakpoint benchmark results in: | $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64 | # Running 'breakpoint/thread' benchmark: | # Created/joined 30 threads with 4 breakpoints and 64 parallelism | Total time: 236.418 [sec] | | 123134.794271 usecs/op | 7880626.833333 usecs/op/cpu The benchmark tests inherited breakpoint perf events across many threads. Looking at a perf profile, we can see that the majority of the time is spent in various hw_breakpoint.c functions, which execute within the 'nr_bp_mutex' critical sections which then results in contention on that mutex as well: 37.27% [kernel] [k] osq_lock 34.92% [kernel] [k] mutex_spin_on_owner 12.15% [kernel] [k] toggle_bp_slot 11.90% [kernel] [k] __reserve_bp_slot The culprit here is task_bp_pinned(), which has a runtime complexity of O(#tasks) due to storing all task breakpoints in the same list and iterating through that list looking for a matching task. Clearly, this does not scale to thousands of tasks. Instead, make use of the "rhashtable" variant "rhltable" which stores multiple items with the same key in a list. This results in average runtime complexity of O(1) for task_bp_pinned(). With the optimization, the benchmark shows: | $> perf bench -r 30 breakpoint thread -b 4 -p 64 -t 64 | # Running 'breakpoint/thread' benchmark: | # Created/joined 30 threads with 4 breakpoints and 64 parallelism | Total time: 0.208 [sec] | | 108.422396 usecs/op | 6939.033333 usecs/op/cpu On this particular setup that's a speedup of ~1135x. While one option would be to make task_struct a breakpoint list node, this would only further bloat task_struct for infrequently used data. Furthermore, after all optimizations in this series, there's no evidence it would result in better performance: later optimizations make the time spent looking up entries in the hash table negligible (we'll reach the theoretical ideal performance i.e. no constraints). Signed-off-by: Marco Elver Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Dmitry Vyukov Acked-by: Ian Rogers Link: https://lore.kernel.org/r/20220829124719.675715-5-elver@google.com --- include/linux/perf_event.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index ae30c61957d2..1999408a9cbb 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -36,6 +36,7 @@ struct perf_guest_info_callbacks { }; #ifdef CONFIG_HAVE_HW_BREAKPOINT +#include #include #endif @@ -178,7 +179,7 @@ struct hw_perf_event { * creation and event initalization. */ struct arch_hw_breakpoint info; - struct list_head bp_list; + struct rhlist_head bp_list; }; #endif struct { /* amd_iommu */ -- cgit From 9caf87be118f4639537404eeb67dd444a3716e9a Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Mon, 29 Aug 2022 14:47:12 +0200 Subject: perf/hw_breakpoint: Make hw_breakpoint_weight() inlinable Due to being a __weak function, hw_breakpoint_weight() will cause the compiler to always emit a call to it. This generates unnecessarily bad code (register spills etc.) for no good reason; in fact it appears in profiles of `perf bench -r 100 breakpoint thread -b 4 -p 128 -t 512`: ... 0.70% [kernel] [k] hw_breakpoint_weight ... While a small percentage, no architecture defines its own hw_breakpoint_weight() nor are there users outside hw_breakpoint.c, which makes the fact it is currently __weak a poor choice. Change hw_breakpoint_weight()'s definition to follow a similar protocol to hw_breakpoint_slots(), such that if defines hw_breakpoint_weight(), we'll use it instead. The result is that it is inlined and no longer shows up in profiles. Signed-off-by: Marco Elver Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Dmitry Vyukov Acked-by: Ian Rogers Link: https://lore.kernel.org/r/20220829124719.675715-8-elver@google.com --- include/linux/hw_breakpoint.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h index a3fb846705eb..f319bd26b030 100644 --- a/include/linux/hw_breakpoint.h +++ b/include/linux/hw_breakpoint.h @@ -80,7 +80,6 @@ extern int dbg_reserve_bp_slot(struct perf_event *bp); extern int dbg_release_bp_slot(struct perf_event *bp); extern int reserve_bp_slot(struct perf_event *bp); extern void release_bp_slot(struct perf_event *bp); -int hw_breakpoint_weight(struct perf_event *bp); int arch_reserve_bp_slot(struct perf_event *bp); void arch_release_bp_slot(struct perf_event *bp); void arch_unregister_hw_breakpoint(struct perf_event *bp); -- cgit From 01fe8a3f818e1074a9a95d624be4549ee7ea2b2b Mon Sep 17 00:00:00 2001 From: Marco Elver Date: Mon, 29 Aug 2022 14:47:15 +0200 Subject: locking/percpu-rwsem: Add percpu_is_write_locked() and percpu_is_read_locked() Implement simple accessors to probe percpu-rwsem's locked state: percpu_is_write_locked(), percpu_is_read_locked(). Signed-off-by: Marco Elver Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Dmitry Vyukov Acked-by: Ian Rogers Link: https://lore.kernel.org/r/20220829124719.675715-11-elver@google.com --- include/linux/percpu-rwsem.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h index 5fda40f97fe9..36b942b67b7d 100644 --- a/include/linux/percpu-rwsem.h +++ b/include/linux/percpu-rwsem.h @@ -121,9 +121,15 @@ static inline void percpu_up_read(struct percpu_rw_semaphore *sem) preempt_enable(); } +extern bool percpu_is_read_locked(struct percpu_rw_semaphore *); extern void percpu_down_write(struct percpu_rw_semaphore *); extern void percpu_up_write(struct percpu_rw_semaphore *); +static inline bool percpu_is_write_locked(struct percpu_rw_semaphore *sem) +{ + return atomic_read(&sem->block); +} + extern int __percpu_init_rwsem(struct percpu_rw_semaphore *, const char *, struct lock_class_key *); -- cgit From 25af7406df5915f04d5f1c8f081dabb0ead1cdcc Mon Sep 17 00:00:00 2001 From: Isaac Manjarres Date: Thu, 18 Aug 2022 18:28:51 +0100 Subject: ARM: 9229/1: amba: Fix use-after-free in amba_read_periphid() After commit f2d3b9a46e0e ("ARM: 9220/1: amba: Remove deferred device addition"), it became possible for amba_read_periphid() to be invoked concurrently from two threads for a particular AMBA device. Consider the case where a thread (T0) is registering an AMBA driver, and searching for all of the devices it can match with on the AMBA bus. Suppose that another thread (T1) is executing the deferred probe work, and is searching through all of the AMBA drivers on the bus for a driver that matches a particular AMBA device. Assume that both threads begin operating on the same AMBA device and the device's peripheral ID is still unknown. In this scenario, the amba_match() function will be invoked for the same AMBA device by both threads, which means amba_read_periphid() can also be invoked by both threads, and both threads will be able to manipulate the AMBA device's pclk pointer without any synchronization. It's possible that one thread will initialize the pclk pointer, then the other thread will re-initialize it, overwriting the previous value, and both will race to free the same pclk, resulting in a use-after-free for whichever thread frees the pclk last. Add a lock per AMBA device to synchronize the handling with detecting the peripheral ID to avoid the use-after-free scenario. The following KFENCE bug report helped detect this problem: ================================================================== BUG: KFENCE: use-after-free read in clk_disable+0x14/0x34 Use-after-free read at 0x(ptrval) (in kfence-#19): clk_disable+0x14/0x34 amba_read_periphid+0xdc/0x134 amba_match+0x3c/0x84 __driver_attach+0x20/0x158 bus_for_each_dev+0x74/0xc0 bus_add_driver+0x154/0x1e8 driver_register+0x88/0x11c do_one_initcall+0x8c/0x2fc kernel_init_freeable+0x190/0x220 kernel_init+0x10/0x108 ret_from_fork+0x14/0x3c 0x0 kfence-#19: 0x(ptrval)-0x(ptrval), size=36, cache=kmalloc-64 allocated by task 8 on cpu 0 at 11.629931s: clk_hw_create_clk+0x38/0x134 amba_get_enable_pclk+0x10/0x68 amba_read_periphid+0x28/0x134 amba_match+0x3c/0x84 __device_attach_driver+0x2c/0xc4 bus_for_each_drv+0x80/0xd0 __device_attach+0xb0/0x1f0 bus_probe_device+0x88/0x90 deferred_probe_work_func+0x8c/0xc0 process_one_work+0x23c/0x690 worker_thread+0x34/0x488 kthread+0xd4/0xfc ret_from_fork+0x14/0x3c 0x0 freed by task 8 on cpu 0 at 11.630095s: amba_read_periphid+0xec/0x134 amba_match+0x3c/0x84 __device_attach_driver+0x2c/0xc4 bus_for_each_drv+0x80/0xd0 __device_attach+0xb0/0x1f0 bus_probe_device+0x88/0x90 deferred_probe_work_func+0x8c/0xc0 process_one_work+0x23c/0x690 worker_thread+0x34/0x488 kthread+0xd4/0xfc ret_from_fork+0x14/0x3c 0x0 Cc: Saravana Kannan Cc: patches@armlinux.org.uk Fixes: f2d3b9a46e0e ("ARM: 9220/1: amba: Remove deferred device addition") Reported-by: Guenter Roeck Tested-by: Guenter Roeck Signed-off-by: Isaac J. Manjarres Signed-off-by: Russell King (Oracle) --- include/linux/amba/bus.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/amba/bus.h b/include/linux/amba/bus.h index e94cdf235f1d..5001e14c5c06 100644 --- a/include/linux/amba/bus.h +++ b/include/linux/amba/bus.h @@ -67,6 +67,7 @@ struct amba_device { struct clk *pclk; struct device_dma_parameters dma_parms; unsigned int periphid; + struct mutex periphid_lock; unsigned int cid; struct amba_cs_uci_id uci; unsigned int irq[AMBA_NR_IRQS]; -- cgit From 690252f19f0e486abb8590b3a7a03d4e065d93d4 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 25 Aug 2022 20:09:31 -0700 Subject: netlink: add support for ext_ack missing attributes There is currently no way to report via extack in a structured way that an attribute is missing. This leads to families resorting to string messages. Add a pair of attributes - @offset and @type for machine-readable way of reporting missing attributes. The @offset points to the nest which should have contained the attribute, @type is the expected nla_type. The offset will be skipped if the attribute is missing at the message level rather than inside a nest. User space should be able to figure out which attribute enum (AKA attribute space AKA attribute set) the nest pointed to by @offset is using. Reviewed-by: Johannes Berg Signed-off-by: Jakub Kicinski Signed-off-by: Paolo Abeni --- include/linux/netlink.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netlink.h b/include/linux/netlink.h index bda1c385cffb..1619221c415c 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -71,6 +71,8 @@ netlink_kernel_create(struct net *net, int unit, struct netlink_kernel_cfg *cfg) * %NL_SET_ERR_MSG * @bad_attr: attribute with error * @policy: policy for a bad attribute + * @miss_type: attribute type which was missing + * @miss_nest: nest missing an attribute (%NULL if missing top level attr) * @cookie: cookie data to return to userspace (for success) * @cookie_len: actual cookie data length */ @@ -78,6 +80,8 @@ struct netlink_ext_ack { const char *_msg; const struct nlattr *bad_attr; const struct nla_policy *policy; + const struct nlattr *miss_nest; + u16 miss_type; u8 cookie[NETLINK_MAX_COOKIE_LEN]; u8 cookie_len; }; @@ -126,6 +130,15 @@ struct netlink_ext_ack { #define NL_SET_ERR_MSG_ATTR(extack, attr, msg) \ NL_SET_ERR_MSG_ATTR_POL(extack, attr, NULL, msg) +#define NL_SET_ERR_ATTR_MISS(extack, nest, type) do { \ + struct netlink_ext_ack *__extack = (extack); \ + \ + if (__extack) { \ + __extack->miss_nest = (nest); \ + __extack->miss_type = (type); \ + } \ +} while (0) + static inline void nl_set_extack_cookie_u64(struct netlink_ext_ack *extack, u64 cookie) { -- cgit From 45dca15759643806660e9285e6af8a1ba3c76c82 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Thu, 25 Aug 2022 20:09:32 -0700 Subject: netlink: add helpers for extack attr presence checking Being able to check attribute presence and set extack if not on one line is handy, add helpers. Reviewed-by: Johannes Berg Signed-off-by: Jakub Kicinski Signed-off-by: Paolo Abeni --- include/linux/netlink.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netlink.h b/include/linux/netlink.h index 1619221c415c..d51e041d2242 100644 --- a/include/linux/netlink.h +++ b/include/linux/netlink.h @@ -139,6 +139,17 @@ struct netlink_ext_ack { } \ } while (0) +#define NL_REQ_ATTR_CHECK(extack, nest, tb, type) ({ \ + struct nlattr **__tb = (tb); \ + u32 __attr = (type); \ + int __retval; \ + \ + __retval = !__tb[__attr]; \ + if (__retval) \ + NL_SET_ERR_ATTR_MISS((extack), (nest), __attr); \ + __retval; \ +}) + static inline void nl_set_extack_cookie_u64(struct netlink_ext_ack *extack, u64 cookie) { -- cgit From 87888fb9ac0c71cdc1edd0779382052475dadb84 Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Tue, 16 Aug 2022 14:57:32 +0300 Subject: tty: Remove baudrate dead code & make ktermios params const MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit With the architectures currently in-tree, either: 1) CBAUDEX is zero 2) The earlier BOTHER if check covers cbaud < 1 case 3) All CBAUD bits are covered by the baud_table Thus, the check for cbaud being out-of-range for CBAUDEX case cannot ever be true. The ktermios parameters can now be made const. Reviewed-by: Andy Shevchenko Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220816115739.10928-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/tty.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/tty.h b/include/linux/tty.h index 7b0a5d478ef6..cf5ab26de73d 100644 --- a/include/linux/tty.h +++ b/include/linux/tty.h @@ -434,7 +434,7 @@ int tty_hung_up_p(struct file *filp); void do_SAK(struct tty_struct *tty); void __do_SAK(struct tty_struct *tty); void no_tty(void); -speed_t tty_termios_baud_rate(struct ktermios *termios); +speed_t tty_termios_baud_rate(const struct ktermios *termios); void tty_termios_encode_baud_rate(struct ktermios *termios, speed_t ibaud, speed_t obaud); void tty_encode_baud_rate(struct tty_struct *tty, speed_t ibaud, -- cgit From d15f89d997d995c9a0bf5be5de4c1c405886f21c Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Tue, 16 Aug 2022 14:57:35 +0300 Subject: tty: Make tty_termios_copy_hw() old ktermios const MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There should be no reason to adjust old ktermios which is going to get discarded anyway. Reviewed-by: Andy Shevchenko Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220816115739.10928-5-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/tty.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/tty.h b/include/linux/tty.h index cf5ab26de73d..ae41893f8653 100644 --- a/include/linux/tty.h +++ b/include/linux/tty.h @@ -458,7 +458,7 @@ static inline speed_t tty_get_baud_rate(struct tty_struct *tty) unsigned char tty_get_char_size(unsigned int cflag); unsigned char tty_get_frame_size(unsigned int cflag); -void tty_termios_copy_hw(struct ktermios *new, struct ktermios *old); +void tty_termios_copy_hw(struct ktermios *new, const struct ktermios *old); int tty_termios_hw_change(const struct ktermios *a, const struct ktermios *b); int tty_set_termios(struct tty_struct *tty, struct ktermios *kt); -- cgit From 8b7d2d95cf82f8ca034ac65d0f39a2b3359b680f Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Tue, 16 Aug 2022 14:57:36 +0300 Subject: tty: Make ldisc ->set_termios() old ktermios const MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There should be no reason to adjust old ktermios which is going to get discarded anyway. Reviewed-by: Andy Shevchenko Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220816115739.10928-6-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_ldisc.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_ldisc.h b/include/linux/tty_ldisc.h index ede6f2157f32..dcb61ec11424 100644 --- a/include/linux/tty_ldisc.h +++ b/include/linux/tty_ldisc.h @@ -130,7 +130,7 @@ int ldsem_down_write_nested(struct ld_semaphore *sem, int subclass, * a pointer to wordsize-sensitive structure belongs here, but most of * ldiscs will happily leave it %NULL. * - * @set_termios: [TTY] ``void ()(struct tty_struct *tty, struct ktermios *old)`` + * @set_termios: [TTY] ``void ()(struct tty_struct *tty, const struct ktermios *old)`` * * This function notifies the line discpline that a change has been made * to the termios structure. @@ -227,7 +227,7 @@ struct tty_ldisc_ops { unsigned long arg); int (*compat_ioctl)(struct tty_struct *tty, unsigned int cmd, unsigned long arg); - void (*set_termios)(struct tty_struct *tty, struct ktermios *old); + void (*set_termios)(struct tty_struct *tty, const struct ktermios *old); __poll_t (*poll)(struct tty_struct *tty, struct file *file, struct poll_table_struct *wait); void (*hangup)(struct tty_struct *tty); -- cgit From bec5b814d46c2a704c3c8148752e62a33e9fa6dc Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Tue, 16 Aug 2022 14:57:37 +0300 Subject: serial: Make ->set_termios() old ktermios const MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There should be no reason to adjust old ktermios which is going to get discarded anyway. Reviewed-by: Andy Shevchenko Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220816115739.10928-7-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_8250.h | 4 ++-- include/linux/serial_core.h | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h index 8c7b793aa4d7..43edd10e8c96 100644 --- a/include/linux/serial_8250.h +++ b/include/linux/serial_8250.h @@ -32,7 +32,7 @@ struct plat_serial8250_port { void (*serial_out)(struct uart_port *, int, int); void (*set_termios)(struct uart_port *, struct ktermios *new, - struct ktermios *old); + const struct ktermios *old); void (*set_ldisc)(struct uart_port *, struct ktermios *); unsigned int (*get_mctrl)(struct uart_port *); @@ -157,7 +157,7 @@ extern int early_serial8250_setup(struct earlycon_device *device, extern void serial8250_update_uartclk(struct uart_port *port, unsigned int uartclk); extern void serial8250_do_set_termios(struct uart_port *port, - struct ktermios *termios, struct ktermios *old); + struct ktermios *termios, const struct ktermios *old); extern void serial8250_do_set_ldisc(struct uart_port *port, struct ktermios *termios); extern unsigned int serial8250_do_get_mctrl(struct uart_port *port); diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index aef3145f2032..0e9bc943bfa8 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -379,7 +379,7 @@ struct uart_ops { void (*shutdown)(struct uart_port *); void (*flush_buffer)(struct uart_port *); void (*set_termios)(struct uart_port *, struct ktermios *new, - struct ktermios *old); + const struct ktermios *old); void (*set_ldisc)(struct uart_port *, struct ktermios *); void (*pm)(struct uart_port *, unsigned int state, unsigned int oldstate); @@ -425,7 +425,7 @@ struct uart_port { void (*serial_out)(struct uart_port *, int, int); void (*set_termios)(struct uart_port *, struct ktermios *new, - struct ktermios *old); + const struct ktermios *old); void (*set_ldisc)(struct uart_port *, struct ktermios *); unsigned int (*get_mctrl)(struct uart_port *); @@ -644,7 +644,7 @@ void uart_write_wakeup(struct uart_port *port); void uart_update_timeout(struct uart_port *port, unsigned int cflag, unsigned int baud); unsigned int uart_get_baud_rate(struct uart_port *port, struct ktermios *termios, - struct ktermios *old, unsigned int min, + const struct ktermios *old, unsigned int min, unsigned int max); unsigned int uart_get_divisor(struct uart_port *port, unsigned int baud); -- cgit From f6d47fe5921a6d3f5a1a3d0b9a3dd34b8e295722 Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Tue, 16 Aug 2022 14:57:38 +0300 Subject: usb: serial: Make ->set_termios() old ktermios const MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There should be no reason to adjust old ktermios which is going to get discarded anyway. Reviewed-by: Andy Shevchenko Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220816115739.10928-8-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/serial.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/usb/serial.h b/include/linux/usb/serial.h index 8ea319f89e1f..f7bfedb740f5 100644 --- a/include/linux/usb/serial.h +++ b/include/linux/usb/serial.h @@ -276,8 +276,8 @@ struct usb_serial_driver { unsigned int cmd, unsigned long arg); void (*get_serial)(struct tty_struct *tty, struct serial_struct *ss); int (*set_serial)(struct tty_struct *tty, struct serial_struct *ss); - void (*set_termios)(struct tty_struct *tty, - struct usb_serial_port *port, struct ktermios *old); + void (*set_termios)(struct tty_struct *tty, struct usb_serial_port *port, + const struct ktermios *old); void (*break_ctl)(struct tty_struct *tty, int break_state); unsigned int (*chars_in_buffer)(struct tty_struct *tty); void (*wait_until_sent)(struct tty_struct *tty, long timeout); -- cgit From a8c11c1520347be74b02312d10ef686b01b525f1 Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Tue, 16 Aug 2022 14:57:39 +0300 Subject: tty: Make ->set_termios() old ktermios const MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There should be no reason to adjust old ktermios which is going to get discarded anyway. Reviewed-by: Andy Shevchenko Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220816115739.10928-9-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_driver.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_driver.h b/include/linux/tty_driver.h index 4841d8069c07..b2456b545ba0 100644 --- a/include/linux/tty_driver.h +++ b/include/linux/tty_driver.h @@ -141,7 +141,7 @@ struct serial_struct; * * Optional. * - * @set_termios: ``void ()(struct tty_struct *tty, struct ktermios *old)`` + * @set_termios: ``void ()(struct tty_struct *tty, const struct ktermios *old)`` * * This routine allows the @tty driver to be notified when device's * termios settings have changed. New settings are in @tty->termios. @@ -365,7 +365,7 @@ struct tty_operations { unsigned int cmd, unsigned long arg); long (*compat_ioctl)(struct tty_struct *tty, unsigned int cmd, unsigned long arg); - void (*set_termios)(struct tty_struct *tty, struct ktermios * old); + void (*set_termios)(struct tty_struct *tty, const struct ktermios *old); void (*throttle)(struct tty_struct * tty); void (*unthrottle)(struct tty_struct * tty); void (*stop)(struct tty_struct *tty); -- cgit From 9c6d778800b921bde3bff3cff5003d1650f942d1 Mon Sep 17 00:00:00 2001 From: Alan Stern Date: Fri, 26 Aug 2022 15:31:32 -0400 Subject: USB: core: Prevent nested device-reset calls Automatic kernel fuzzing revealed a recursive locking violation in usb-storage: ============================================ WARNING: possible recursive locking detected 5.18.0 #3 Not tainted -------------------------------------------- kworker/1:3/1205 is trying to acquire lock: ffff888018638db8 (&us_interface_key[i]){+.+.}-{3:3}, at: usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230 but task is already holding lock: ffff888018638db8 (&us_interface_key[i]){+.+.}-{3:3}, at: usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230 ... stack backtrace: CPU: 1 PID: 1205 Comm: kworker/1:3 Not tainted 5.18.0 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Workqueue: usb_hub_wq hub_event Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_deadlock_bug kernel/locking/lockdep.c:2988 [inline] check_deadlock kernel/locking/lockdep.c:3031 [inline] validate_chain kernel/locking/lockdep.c:3816 [inline] __lock_acquire.cold+0x152/0x3ca kernel/locking/lockdep.c:5053 lock_acquire kernel/locking/lockdep.c:5665 [inline] lock_acquire+0x1ab/0x520 kernel/locking/lockdep.c:5630 __mutex_lock_common kernel/locking/mutex.c:603 [inline] __mutex_lock+0x14f/0x1610 kernel/locking/mutex.c:747 usb_stor_pre_reset+0x35/0x40 drivers/usb/storage/usb.c:230 usb_reset_device+0x37d/0x9a0 drivers/usb/core/hub.c:6109 r871xu_dev_remove+0x21a/0x270 drivers/staging/rtl8712/usb_intf.c:622 usb_unbind_interface+0x1bd/0x890 drivers/usb/core/driver.c:458 device_remove drivers/base/dd.c:545 [inline] device_remove+0x11f/0x170 drivers/base/dd.c:537 __device_release_driver drivers/base/dd.c:1222 [inline] device_release_driver_internal+0x1a7/0x2f0 drivers/base/dd.c:1248 usb_driver_release_interface+0x102/0x180 drivers/usb/core/driver.c:627 usb_forced_unbind_intf+0x4d/0xa0 drivers/usb/core/driver.c:1118 usb_reset_device+0x39b/0x9a0 drivers/usb/core/hub.c:6114 This turned out not to be an error in usb-storage but rather a nested device reset attempt. That is, as the rtl8712 driver was being unbound from a composite device in preparation for an unrelated USB reset (that driver does not have pre_reset or post_reset callbacks), its ->remove routine called usb_reset_device() -- thus nesting one reset call within another. Performing a reset as part of disconnect processing is a questionable practice at best. However, the bug report points out that the USB core does not have any protection against nested resets. Adding a reset_in_progress flag and testing it will prevent such errors in the future. Link: https://lore.kernel.org/all/CAB7eexKUpvX-JNiLzhXBDWgfg2T9e9_0Tw4HQ6keN==voRbP0g@mail.gmail.com/ Cc: stable@vger.kernel.org Reported-and-tested-by: Rondreis Signed-off-by: Alan Stern Link: https://lore.kernel.org/r/YwkflDxvg0KWqyZK@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman --- include/linux/usb.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/usb.h b/include/linux/usb.h index f7a9914fc97f..9ff1ad4dfad1 100644 --- a/include/linux/usb.h +++ b/include/linux/usb.h @@ -575,6 +575,7 @@ struct usb3_lpm_parameters { * @devaddr: device address, XHCI: assigned by HW, others: same as devnum * @can_submit: URBs may be submitted * @persist_enabled: USB_PERSIST enabled for this device + * @reset_in_progress: the device is being reset * @have_langid: whether string_langid is valid * @authorized: policy has said we can use it; * (user space) policy determines if we authorize this device to be @@ -662,6 +663,7 @@ struct usb_device { unsigned can_submit:1; unsigned persist_enabled:1; + unsigned reset_in_progress:1; unsigned have_langid:1; unsigned authorized:1; unsigned authenticated:1; -- cgit From 43a063cab325ee7cc50349967e536b3cd4e57f03 Mon Sep 17 00:00:00 2001 From: Yosry Ahmed Date: Tue, 23 Aug 2022 00:46:37 +0000 Subject: KVM: x86/mmu: count KVM mmu usage in secondary pagetable stats. Count the pages used by KVM mmu on x86 in memory stats under secondary pagetable stats (e.g. "SecPageTables" in /proc/meminfo) to give better visibility into the memory consumption of KVM mmu in a similar way to how normal user page tables are accounted. Add the inner helper in common KVM, ARM will also use it to count stats in a future commit. Signed-off-by: Yosry Ahmed Reviewed-by: Sean Christopherson Acked-by: Marc Zyngier # generic KVM changes Link: https://lore.kernel.org/r/20220823004639.2387269-3-yosryahmed@google.com Link: https://lore.kernel.org/r/20220823004639.2387269-4-yosryahmed@google.com [sean: squash x86 usage to workaround modpost issues] Signed-off-by: Sean Christopherson --- include/linux/kvm_host.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index f4519d3689e1..04c7e5f2f727 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2247,6 +2247,19 @@ static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu) } #endif /* CONFIG_KVM_XFER_TO_GUEST_WORK */ +/* + * If more than one page is being (un)accounted, @virt must be the address of + * the first page of a block of pages what were allocated together (i.e + * accounted together). + * + * kvm_account_pgtable_pages() is thread-safe because mod_lruvec_page_state() + * is thread-safe. + */ +static inline void kvm_account_pgtable_pages(void *virt, int nr) +{ + mod_lruvec_page_state(virt_to_page(virt), NR_SECONDARY_PAGETABLE, nr); +} + /* * This defines how many reserved entries we want to keep before we * kick the vcpu to the userspace to avoid dirty ring full. This -- cgit From 4e508b259ed02f5fa608cdd83b817a7f49c22271 Mon Sep 17 00:00:00 2001 From: "Chengci.Xu" Date: Wed, 17 Aug 2022 20:46:07 +0800 Subject: memory: mtk-smi: Add enable IOMMU SMC command for MM master For concerns about security, the register to enable/disable IOMMU of SMI LARB should only be configured in secure world. Thus, we add some SMC command for multimedia master to enable/disable MM IOMMU in ATF by setting the register of SMI LARB. This function is prepared for MT8188. Signed-off-by: Chengci.Xu Signed-off-by: Krzysztof Kozlowski Link: https://lore.kernel.org/r/20220817124608.10062-4-chengci.xu@mediatek.com --- include/linux/soc/mediatek/mtk_sip_svc.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/mediatek/mtk_sip_svc.h b/include/linux/soc/mediatek/mtk_sip_svc.h index 082398e0cfb1..0761128b4354 100644 --- a/include/linux/soc/mediatek/mtk_sip_svc.h +++ b/include/linux/soc/mediatek/mtk_sip_svc.h @@ -22,4 +22,7 @@ ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, MTK_SIP_SMC_CONVENTION, \ ARM_SMCCC_OWNER_SIP, fn_id) +/* IOMMU related SMC call */ +#define MTK_SIP_KERNEL_IOMMU_CONTROL MTK_SIP_SMC_CMD(0x514) + #endif -- cgit From 9d2b2e83ef277b9c7b8852e8717140daa373ccf1 Mon Sep 17 00:00:00 2001 From: Nuno Sá Date: Tue, 30 Aug 2022 20:54:10 -0700 Subject: Input: adp5588-keys - support gpi key events as 'gpio keys' MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This change replaces the support for GPIs as key event generators. Instead of reporting the events directly, we add a gpio based irqchip so that these events can be consumed by keys defined in the gpio-keys driver (as it's goal is indeed for keys on GPIOs capable of generating interrupts). With this, the gpio-adp5588 driver can also be dropped. The basic idea is that all the pins that are not being used as part of the keymap matrix can be possibly requested as GPIOs by gpio-keys (it's also fine to use these pins as plain interrupts though that's not really the point). Since the gpiochip now also has irqchip capabilities, we should only remove it after we free the device interrupt (otherwise we could, in theory, be handling GPIs interrupts while the gpiochip is concurrently removed). Thus the call 'adp5588_gpio_add()' is moved and since the setup phase also needs to come before making the gpios visible, we also need to move 'adp5588_setup()'. While at it, always select GPIOLIB so that we don't need to use #ifdef guards. Signed-off-by: Nuno Sá Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220829131553.690063-2-nuno.sa@analog.com Signed-off-by: Dmitry Torokhov --- include/linux/platform_data/adp5588.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/adp5588.h b/include/linux/platform_data/adp5588.h index 6d3f7d911a92..82170ec8c266 100644 --- a/include/linux/platform_data/adp5588.h +++ b/include/linux/platform_data/adp5588.h @@ -147,8 +147,6 @@ struct adp5588_kpad_platform_data { unsigned en_keylock:1; /* Enable Key Lock feature */ unsigned short unlock_key1; /* Unlock Key 1 */ unsigned short unlock_key2; /* Unlock Key 2 */ - const struct adp5588_gpi_map *gpimap; - unsigned short gpimapsize; const struct adp5588_gpio_platform_data *gpio_data; }; -- cgit From 6704a86283b7e79ff7ae36d388466428f6672962 Mon Sep 17 00:00:00 2001 From: Nuno Sá Date: Tue, 30 Aug 2022 21:00:14 -0700 Subject: Input: adp5588-keys - add support for fw properties MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Use firmware properties (eg: OF) to get the device specific configuration. This change just replaces the platform data since there was no platform using it and so, it makes no sense having both. Special note to the PULL-UP disable setting that is now supported as part of the gpio subsystem (using 'set_config()' callback). Signed-off-by: Nuno Sá Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220829131553.690063-5-nuno.sa@analog.com Signed-off-by: Dmitry Torokhov --- include/linux/platform_data/adp5588.h | 169 ---------------------------------- 1 file changed, 169 deletions(-) delete mode 100644 include/linux/platform_data/adp5588.h (limited to 'include/linux') diff --git a/include/linux/platform_data/adp5588.h b/include/linux/platform_data/adp5588.h deleted file mode 100644 index 82170ec8c266..000000000000 --- a/include/linux/platform_data/adp5588.h +++ /dev/null @@ -1,169 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-or-later */ -/* - * Analog Devices ADP5588 I/O Expander and QWERTY Keypad Controller - * - * Copyright 2009-2010 Analog Devices Inc. - */ - -#ifndef _ADP5588_H -#define _ADP5588_H - -#define DEV_ID 0x00 /* Device ID */ -#define CFG 0x01 /* Configuration Register1 */ -#define INT_STAT 0x02 /* Interrupt Status Register */ -#define KEY_LCK_EC_STAT 0x03 /* Key Lock and Event Counter Register */ -#define Key_EVENTA 0x04 /* Key Event Register A */ -#define Key_EVENTB 0x05 /* Key Event Register B */ -#define Key_EVENTC 0x06 /* Key Event Register C */ -#define Key_EVENTD 0x07 /* Key Event Register D */ -#define Key_EVENTE 0x08 /* Key Event Register E */ -#define Key_EVENTF 0x09 /* Key Event Register F */ -#define Key_EVENTG 0x0A /* Key Event Register G */ -#define Key_EVENTH 0x0B /* Key Event Register H */ -#define Key_EVENTI 0x0C /* Key Event Register I */ -#define Key_EVENTJ 0x0D /* Key Event Register J */ -#define KP_LCK_TMR 0x0E /* Keypad Lock1 to Lock2 Timer */ -#define UNLOCK1 0x0F /* Unlock Key1 */ -#define UNLOCK2 0x10 /* Unlock Key2 */ -#define GPIO_INT_STAT1 0x11 /* GPIO Interrupt Status */ -#define GPIO_INT_STAT2 0x12 /* GPIO Interrupt Status */ -#define GPIO_INT_STAT3 0x13 /* GPIO Interrupt Status */ -#define GPIO_DAT_STAT1 0x14 /* GPIO Data Status, Read twice to clear */ -#define GPIO_DAT_STAT2 0x15 /* GPIO Data Status, Read twice to clear */ -#define GPIO_DAT_STAT3 0x16 /* GPIO Data Status, Read twice to clear */ -#define GPIO_DAT_OUT1 0x17 /* GPIO DATA OUT */ -#define GPIO_DAT_OUT2 0x18 /* GPIO DATA OUT */ -#define GPIO_DAT_OUT3 0x19 /* GPIO DATA OUT */ -#define GPIO_INT_EN1 0x1A /* GPIO Interrupt Enable */ -#define GPIO_INT_EN2 0x1B /* GPIO Interrupt Enable */ -#define GPIO_INT_EN3 0x1C /* GPIO Interrupt Enable */ -#define KP_GPIO1 0x1D /* Keypad or GPIO Selection */ -#define KP_GPIO2 0x1E /* Keypad or GPIO Selection */ -#define KP_GPIO3 0x1F /* Keypad or GPIO Selection */ -#define GPI_EM1 0x20 /* GPI Event Mode 1 */ -#define GPI_EM2 0x21 /* GPI Event Mode 2 */ -#define GPI_EM3 0x22 /* GPI Event Mode 3 */ -#define GPIO_DIR1 0x23 /* GPIO Data Direction */ -#define GPIO_DIR2 0x24 /* GPIO Data Direction */ -#define GPIO_DIR3 0x25 /* GPIO Data Direction */ -#define GPIO_INT_LVL1 0x26 /* GPIO Edge/Level Detect */ -#define GPIO_INT_LVL2 0x27 /* GPIO Edge/Level Detect */ -#define GPIO_INT_LVL3 0x28 /* GPIO Edge/Level Detect */ -#define Debounce_DIS1 0x29 /* Debounce Disable */ -#define Debounce_DIS2 0x2A /* Debounce Disable */ -#define Debounce_DIS3 0x2B /* Debounce Disable */ -#define GPIO_PULL1 0x2C /* GPIO Pull Disable */ -#define GPIO_PULL2 0x2D /* GPIO Pull Disable */ -#define GPIO_PULL3 0x2E /* GPIO Pull Disable */ -#define CMP_CFG_STAT 0x30 /* Comparator Configuration and Status Register */ -#define CMP_CONFG_SENS1 0x31 /* Sensor1 Comparator Configuration Register */ -#define CMP_CONFG_SENS2 0x32 /* L2 Light Sensor Reference Level, Output Falling for Sensor 1 */ -#define CMP1_LVL2_TRIP 0x33 /* L2 Light Sensor Hysteresis (Active when Output Rising) for Sensor 1 */ -#define CMP1_LVL2_HYS 0x34 /* L3 Light Sensor Reference Level, Output Falling For Sensor 1 */ -#define CMP1_LVL3_TRIP 0x35 /* L3 Light Sensor Hysteresis (Active when Output Rising) For Sensor 1 */ -#define CMP1_LVL3_HYS 0x36 /* Sensor 2 Comparator Configuration Register */ -#define CMP2_LVL2_TRIP 0x37 /* L2 Light Sensor Reference Level, Output Falling for Sensor 2 */ -#define CMP2_LVL2_HYS 0x38 /* L2 Light Sensor Hysteresis (Active when Output Rising) for Sensor 2 */ -#define CMP2_LVL3_TRIP 0x39 /* L3 Light Sensor Reference Level, Output Falling For Sensor 2 */ -#define CMP2_LVL3_HYS 0x3A /* L3 Light Sensor Hysteresis (Active when Output Rising) For Sensor 2 */ -#define CMP1_ADC_DAT_R1 0x3B /* Comparator 1 ADC data Register1 */ -#define CMP1_ADC_DAT_R2 0x3C /* Comparator 1 ADC data Register2 */ -#define CMP2_ADC_DAT_R1 0x3D /* Comparator 2 ADC data Register1 */ -#define CMP2_ADC_DAT_R2 0x3E /* Comparator 2 ADC data Register2 */ - -#define ADP5588_DEVICE_ID_MASK 0xF - - /* Configuration Register1 */ -#define ADP5588_AUTO_INC (1 << 7) -#define ADP5588_GPIEM_CFG (1 << 6) -#define ADP5588_OVR_FLOW_M (1 << 5) -#define ADP5588_INT_CFG (1 << 4) -#define ADP5588_OVR_FLOW_IEN (1 << 3) -#define ADP5588_K_LCK_IM (1 << 2) -#define ADP5588_GPI_IEN (1 << 1) -#define ADP5588_KE_IEN (1 << 0) - -/* Interrupt Status Register */ -#define ADP5588_CMP2_INT (1 << 5) -#define ADP5588_CMP1_INT (1 << 4) -#define ADP5588_OVR_FLOW_INT (1 << 3) -#define ADP5588_K_LCK_INT (1 << 2) -#define ADP5588_GPI_INT (1 << 1) -#define ADP5588_KE_INT (1 << 0) - -/* Key Lock and Event Counter Register */ -#define ADP5588_K_LCK_EN (1 << 6) -#define ADP5588_LCK21 0x30 -#define ADP5588_KEC 0xF - -#define ADP5588_MAXGPIO 18 -#define ADP5588_BANK(offs) ((offs) >> 3) -#define ADP5588_BIT(offs) (1u << ((offs) & 0x7)) - -/* Put one of these structures in i2c_board_info platform_data */ - -#define ADP5588_KEYMAPSIZE 80 - -#define GPI_PIN_ROW0 97 -#define GPI_PIN_ROW1 98 -#define GPI_PIN_ROW2 99 -#define GPI_PIN_ROW3 100 -#define GPI_PIN_ROW4 101 -#define GPI_PIN_ROW5 102 -#define GPI_PIN_ROW6 103 -#define GPI_PIN_ROW7 104 -#define GPI_PIN_COL0 105 -#define GPI_PIN_COL1 106 -#define GPI_PIN_COL2 107 -#define GPI_PIN_COL3 108 -#define GPI_PIN_COL4 109 -#define GPI_PIN_COL5 110 -#define GPI_PIN_COL6 111 -#define GPI_PIN_COL7 112 -#define GPI_PIN_COL8 113 -#define GPI_PIN_COL9 114 - -#define GPI_PIN_ROW_BASE GPI_PIN_ROW0 -#define GPI_PIN_ROW_END GPI_PIN_ROW7 -#define GPI_PIN_COL_BASE GPI_PIN_COL0 -#define GPI_PIN_COL_END GPI_PIN_COL9 - -#define GPI_PIN_BASE GPI_PIN_ROW_BASE -#define GPI_PIN_END GPI_PIN_COL_END - -#define ADP5588_GPIMAPSIZE_MAX (GPI_PIN_END - GPI_PIN_BASE + 1) - -struct adp5588_gpi_map { - unsigned short pin; - unsigned short sw_evt; -}; - -struct adp5588_kpad_platform_data { - int rows; /* Number of rows */ - int cols; /* Number of columns */ - const unsigned short *keymap; /* Pointer to keymap */ - unsigned short keymapsize; /* Keymap size */ - unsigned repeat:1; /* Enable key repeat */ - unsigned en_keylock:1; /* Enable Key Lock feature */ - unsigned short unlock_key1; /* Unlock Key 1 */ - unsigned short unlock_key2; /* Unlock Key 2 */ - const struct adp5588_gpio_platform_data *gpio_data; -}; - -struct i2c_client; /* forward declaration */ - -struct adp5588_gpio_platform_data { - int gpio_start; /* GPIO Chip base # */ - const char *const *names; - unsigned irq_base; /* interrupt base # */ - unsigned pullup_dis_mask; /* Pull-Up Disable Mask */ - int (*setup)(struct i2c_client *client, - unsigned gpio, unsigned ngpio, - void *context); - int (*teardown)(struct i2c_client *client, - unsigned gpio, unsigned ngpio, - void *context); - void *context; -}; - -#endif -- cgit From 6b70fe0601adb1396ad0b85cdf05d217500b49e7 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Mon, 29 Aug 2022 14:38:42 +0200 Subject: acl: add vfs_set_acl_prepare() Various filesystems store POSIX ACLs on the backing store in their uapi format. Such filesystems need to translate from the uapi POSIX ACL format into the VFS format during i_op->get_acl(). The VFS provides the posix_acl_from_xattr() helper for this task. But the usage of posix_acl_from_xattr() is currently ambiguous. It is intended to transform from a uapi POSIX ACL to the VFS represenation. For example, when retrieving POSIX ACLs for permission checking during lookup or when calling getxattr() to retrieve system.posix_acl_{access,default}. Calling posix_acl_from_xattr() during i_op->get_acl() will map the raw {g,u}id values stored as ACL_{GROUP,USER} entries in the uapi POSIX ACL format into k{g,u}id_t in the filesystem's idmapping and return a struct posix_acl ready to be returned to the VFS for caching and to perform permission checks on. However, posix_acl_from_xattr() is also called during setxattr() for all filesystems that rely on VFS provides posix_acl_{access,default}_xattr_handler. The posix_acl_xattr_set() handler which is used for the ->set() method of posix_acl_{access,default}_xattr_handler uses posix_acl_from_xattr() to translate from the uapi POSIX ACL format to the VFS format so that it can be passed to the i_op->set_acl() handler of the filesystem or for direct caching in case no i_op->set_acl() handler is defined. During setxattr() the {g,u}id values stored as ACL_{GROUP,USER} entries in the uapi POSIX ACL format aren't raw {g,u}id values that need to be mapped according to the filesystem's idmapping. Instead they are {g,u}id values in the caller's idmapping which have been generated during posix_acl_fix_xattr_from_user(). In other words, they are k{g,u}id_t which are passed as raw {g,u}id values abusing the uapi POSIX ACL format (Please note that this type safety violation has existed since the introduction of k{g,u}id_t. Please see [1] for more details.). So when posix_acl_from_xattr() is called in posix_acl_xattr_set() the filesystem idmapping is completely irrelevant. Instead, we abuse the initial idmapping to recover the k{g,u}id_t base on the value stored in raw {g,u}id as ACL_{GROUP,USER} in the uapi POSIX ACL format. We need to clearly distinguish betweeen these two operations as it is really easy to confuse for filesystems as can be seen in ntfs3. In order to do this we factor out make_posix_acl() which takes callbacks allowing callers to pass dedicated methods to generate the correct k{g,u}id_t. This is just an internal static helper which is not exposed to any filesystems but it neatly encapsulates the basic logic of walking through a uapi POSIX ACL and returning an allocated VFS POSIX ACL with the correct k{g,u}id_t values. The posix_acl_from_xattr() helper can then be implemented as a simple call to make_posix_acl() with callbacks that generate the correct k{g,u}id_t from the raw {g,u}id values in ACL_{GROUP,USER} entries in the uapi POSIX ACL format as read from the backing store. For setxattr() we add a new helper vfs_set_acl_prepare() which has callbacks to map the POSIX ACLs from the uapi format with the k{g,u}id_t values stored in raw {g,u}id format in ACL_{GROUP,USER} entries into the correct k{g,u}id_t values in the filesystem idmapping. In contrast to posix_acl_from_xattr() the vfs_set_acl_prepare() helper needs to take the mount idmapping into account. The differences are explained in more detail in the kernel doc for the new functions. In follow up patches we will remove all abuses of posix_acl_from_xattr() for setxattr() operations and replace it with calls to vfs_set_acl_prepare(). The new vfs_set_acl_prepare() helper allows us to deal with the ambiguity in how the POSI ACL uapi struct stores {g,u}id values depending on whether this is a getxattr() or setxattr() operation. This also allows us to remove the posix_acl_setxattr_idmapped_mnt() helper reducing the abuse of the POSIX ACL uapi format to pass values that should be distinct types in {g,u}id values stored as ACL_{GROUP,USER} entries. The removal of posix_acl_setxattr_idmapped_mnt() in turn allows us to re-constify the value parameter of vfs_setxattr() which in turn allows us to avoid the nasty cast from a const void pointer to a non-const void pointer on ovl_do_setxattr(). Ultimately, the plan is to get rid of the type violations completely and never pass the values from k{g,u}id_t as raw {g,u}id in ACL_{GROUP,USER} entries in uapi POSIX ACL format. But that's a longer way to go and this is a preparatory step. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1] Co-Developed-by: Seth Forshee Signed-off-by: Christian Brauner (Microsoft) --- include/linux/posix_acl_xattr.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/posix_acl_xattr.h b/include/linux/posix_acl_xattr.h index b6bd3eac2bcc..47eca15fd842 100644 --- a/include/linux/posix_acl_xattr.h +++ b/include/linux/posix_acl_xattr.h @@ -66,6 +66,9 @@ struct posix_acl *posix_acl_from_xattr(struct user_namespace *user_ns, const void *value, size_t size); int posix_acl_to_xattr(struct user_namespace *user_ns, const struct posix_acl *acl, void *buffer, size_t size); +struct posix_acl *vfs_set_acl_prepare(struct user_namespace *mnt_userns, + struct user_namespace *fs_userns, + const void *value, size_t size); extern const struct xattr_handler posix_acl_access_xattr_handler; extern const struct xattr_handler posix_acl_default_xattr_handler; -- cgit From 66d1c8021e1d9c97101d58fa09c33c002d96747a Mon Sep 17 00:00:00 2001 From: Piyush Mehta Date: Mon, 22 Aug 2022 11:10:51 +0530 Subject: usb: chipidea: Add support for VBUS control with PHY Some platforms make use of VBUS control over PHY which means controller driver has to access PHY registers to turn on/off VBUS line.This patch adds support for such platforms in chipidea. Flag 'CI_HDRC_PHY_VBUS_CONTROL' added to support VBus control feature. Acked-by: Peter Chen Signed-off-by: Piyush Mehta Signed-off-by: Piyush Mehta Link: https://lore.kernel.org/r/20220822054051.2941282-1-piyush.mehta@amd.com Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/chipidea.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/usb/chipidea.h b/include/linux/usb/chipidea.h index edf3342507f1..ee38835ed77c 100644 --- a/include/linux/usb/chipidea.h +++ b/include/linux/usb/chipidea.h @@ -62,6 +62,7 @@ struct ci_hdrc_platform_data { #define CI_HDRC_REQUIRES_ALIGNED_DMA BIT(13) #define CI_HDRC_IMX_IS_HSIC BIT(14) #define CI_HDRC_PMQOS BIT(15) +#define CI_HDRC_PHY_VBUS_CONTROL BIT(16) enum usb_dr_mode dr_mode; #define CI_HDRC_CONTROLLER_RESET_EVENT 0 #define CI_HDRC_CONTROLLER_STOPPED_EVENT 1 -- cgit From 66df18b3bd74107dd7c196e75ce00d64d7553152 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Wed, 31 Aug 2022 10:47:50 +0200 Subject: gpio: ucb1400: Use proper header The UCB1400 implements a GPIO driver so it needs to include the header, not the legacy header. Compile tested on pxa_defconfig. Signed-off-by: Linus Walleij Signed-off-by: Bartosz Golaszewski --- include/linux/ucb1400.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ucb1400.h b/include/linux/ucb1400.h index 22345391350b..2516082cd3a9 100644 --- a/include/linux/ucb1400.h +++ b/include/linux/ucb1400.h @@ -23,7 +23,7 @@ #include #include #include -#include +#include /* * UCB1400 AC-link registers -- cgit From d8f3f5834febb74d18b5b2098f67d9db740f3e30 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 3 Aug 2022 10:36:32 -0700 Subject: rcu: Update rcu_access_pointer() header for rcu_dereference_protected() The rcu_access_pointer() docbook header correctly notes that it may be used during post-grace-period teardown. However, it is usually better to use rcu_dereference_protected() for this purpose. This commit therefore calls out this preferred usage. Reported-by: Maxim Mikityanskiy Signed-off-by: Paul E. McKenney --- include/linux/rcupdate.h | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index f527f27e6438..61a1a85c720c 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -496,13 +496,21 @@ do { \ * against NULL. Although rcu_access_pointer() may also be used in cases * where update-side locks prevent the value of the pointer from changing, * you should instead use rcu_dereference_protected() for this use case. + * Within an RCU read-side critical section, there is little reason to + * use rcu_access_pointer(). + * + * It is usually best to test the rcu_access_pointer() return value + * directly in order to avoid accidental dereferences being introduced + * by later inattentive changes. In other words, assigning the + * rcu_access_pointer() return value to a local variable results in an + * accident waiting to happen. * * It is also permissible to use rcu_access_pointer() when read-side - * access to the pointer was removed at least one grace period ago, as - * is the case in the context of the RCU callback that is freeing up - * the data, or after a synchronize_rcu() returns. This can be useful - * when tearing down multi-linked structures after a grace period - * has elapsed. + * access to the pointer was removed at least one grace period ago, as is + * the case in the context of the RCU callback that is freeing up the data, + * or after a synchronize_rcu() returns. This can be useful when tearing + * down multi-linked structures after a grace period has elapsed. However, + * rcu_dereference_protected() is normally preferred for this use case. */ #define rcu_access_pointer(p) __rcu_access_pointer((p), __UNIQUE_ID(rcu), __rcu) -- cgit From 91a967fd6934abc0c7e4b0d26728e38674278707 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 28 Jul 2022 15:37:05 -0700 Subject: rcu: Add full-sized polling for get_completed*() and poll_state*() The get_completed_synchronize_rcu() and poll_state_synchronize_rcu() APIs compress the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds the first members of the full-state RCU grace-period polling API, namely the get_completed_synchronize_rcu_full() and poll_state_synchronize_rcu_full() functions. These use up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but which are guaranteed not to miss grace periods, at least in situations where the single-CPU grace-period optimization does not apply. Signed-off-by: Paul E. McKenney --- include/linux/rcupdate.h | 3 +++ include/linux/rcutiny.h | 9 +++++++++ include/linux/rcutree.h | 8 ++++++++ 3 files changed, 20 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index f527f27e6438..faaa174dfb27 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -42,7 +42,10 @@ void call_rcu(struct rcu_head *head, rcu_callback_t func); void rcu_barrier_tasks(void); void rcu_barrier_tasks_rude(void); void synchronize_rcu(void); + +struct rcu_gp_oldstate; unsigned long get_completed_synchronize_rcu(void); +void get_completed_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp); #ifdef CONFIG_PREEMPT_RCU diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index 62815c0a2dce..1fbff8600d92 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -14,10 +14,19 @@ #include /* for HZ */ +struct rcu_gp_oldstate { + unsigned long rgos_norm; +}; + unsigned long get_state_synchronize_rcu(void); unsigned long start_poll_synchronize_rcu(void); bool poll_state_synchronize_rcu(unsigned long oldstate); +static inline bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) +{ + return poll_state_synchronize_rcu(rgosp->rgos_norm); +} + static inline void cond_synchronize_rcu(unsigned long oldstate) { might_sleep(); diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 47eaa4cb0df7..4ccbc3aa9dc2 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -40,11 +40,19 @@ bool rcu_eqs_special_set(int cpu); void rcu_momentary_dyntick_idle(void); void kfree_rcu_scheduler_running(void); bool rcu_gp_might_be_stalled(void); + +struct rcu_gp_oldstate { + unsigned long rgos_norm; + unsigned long rgos_exp; + unsigned long rgos_polled; +}; + unsigned long start_poll_synchronize_rcu_expedited(void); void cond_synchronize_rcu_expedited(unsigned long oldstate); unsigned long get_state_synchronize_rcu(void); unsigned long start_poll_synchronize_rcu(void); bool poll_state_synchronize_rcu(unsigned long oldstate); +bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp); void cond_synchronize_rcu(unsigned long oldstate); bool rcu_is_idle_cpu(int cpu); -- cgit From 3fdefca9b42c8bebe3beea5c1a067c9718ca0fc7 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 28 Jul 2022 19:58:13 -0700 Subject: rcu: Add full-sized polling for get_state() The get_state_synchronize_rcu() API compresses the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds the next member of the full-state RCU grace-period polling API, namely the get_state_synchronize_rcu_full() function. This uses up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but is guaranteed not to miss grace periods. Signed-off-by: Paul E. McKenney --- include/linux/rcutiny.h | 6 ++++++ include/linux/rcutree.h | 1 + 2 files changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index 1fbff8600d92..6e299955c4e9 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -19,6 +19,12 @@ struct rcu_gp_oldstate { }; unsigned long get_state_synchronize_rcu(void); + +static inline void get_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) +{ + rgosp->rgos_norm = get_state_synchronize_rcu(); +} + unsigned long start_poll_synchronize_rcu(void); bool poll_state_synchronize_rcu(unsigned long oldstate); diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 4ccbc3aa9dc2..7b769f1b417a 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -50,6 +50,7 @@ struct rcu_gp_oldstate { unsigned long start_poll_synchronize_rcu_expedited(void); void cond_synchronize_rcu_expedited(unsigned long oldstate); unsigned long get_state_synchronize_rcu(void); +void get_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp); unsigned long start_poll_synchronize_rcu(void); bool poll_state_synchronize_rcu(unsigned long oldstate); bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp); -- cgit From 76ea364161e72b1878126edf8d507d2a62fb47b0 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 2 Aug 2022 17:04:54 -0700 Subject: rcu: Add full-sized polling for start_poll() The start_poll_synchronize_rcu() API compresses the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds the next member of the full-state RCU grace-period polling API, namely the start_poll_synchronize_rcu_full() function. This uses up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but is guaranteed not to miss grace periods. Signed-off-by: Paul E. McKenney --- include/linux/rcutiny.h | 6 ++++++ include/linux/rcutree.h | 1 + 2 files changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index 6e299955c4e9..6bc30e46a819 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -26,6 +26,12 @@ static inline void get_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) } unsigned long start_poll_synchronize_rcu(void); + +static inline void start_poll_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) +{ + rgosp->rgos_norm = start_poll_synchronize_rcu(); +} + bool poll_state_synchronize_rcu(unsigned long oldstate); static inline bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 7b769f1b417a..8f2e0f0b26f6 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -52,6 +52,7 @@ void cond_synchronize_rcu_expedited(unsigned long oldstate); unsigned long get_state_synchronize_rcu(void); void get_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp); unsigned long start_poll_synchronize_rcu(void); +void start_poll_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp); bool poll_state_synchronize_rcu(unsigned long oldstate); bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp); void cond_synchronize_rcu(unsigned long oldstate); -- cgit From 6c502b14ba66da0670a59e20354469fa56eab26c Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 3 Aug 2022 12:38:51 -0700 Subject: rcu: Add full-sized polling for start_poll_expedited() The start_poll_synchronize_rcu_expedited() API compresses the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds yet another member of the full-state RCU grace-period polling API, which is the start_poll_synchronize_rcu_expedited_full() function. This uses up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but is guaranteed not to miss grace periods. [ paulmck: Apply feedback from kernel test robot and Julia Lawall. ] Signed-off-by: Paul E. McKenney --- include/linux/rcutiny.h | 5 +++++ include/linux/rcutree.h | 1 + 2 files changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index 6bc30e46a819..653e35777a99 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -49,6 +49,11 @@ static inline unsigned long start_poll_synchronize_rcu_expedited(void) return start_poll_synchronize_rcu(); } +static inline void start_poll_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp) +{ + rgosp->rgos_norm = start_poll_synchronize_rcu_expedited(); +} + static inline void cond_synchronize_rcu_expedited(unsigned long oldstate) { cond_synchronize_rcu(oldstate); diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 8f2e0f0b26f6..7151fd861736 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -48,6 +48,7 @@ struct rcu_gp_oldstate { }; unsigned long start_poll_synchronize_rcu_expedited(void); +void start_poll_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp); void cond_synchronize_rcu_expedited(unsigned long oldstate); unsigned long get_state_synchronize_rcu(void); void get_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp); -- cgit From b6fe4917ae4353b397079902cb024ae01f20dfb2 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 4 Aug 2022 13:46:05 -0700 Subject: rcu: Add full-sized polling for cond_sync_full() The cond_synchronize_rcu() API compresses the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds yet another member of the full-state RCU grace-period polling API, which is the cond_synchronize_rcu_full() function. This uses up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but is guaranteed not to miss grace periods. [ paulmck: Apply feedback from kernel test robot and Julia Lawall. ] Signed-off-by: Paul E. McKenney --- include/linux/rcutiny.h | 5 +++++ include/linux/rcutree.h | 1 + 2 files changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index 653e35777a99..3bee97f76bf4 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -44,6 +44,11 @@ static inline void cond_synchronize_rcu(unsigned long oldstate) might_sleep(); } +static inline void cond_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) +{ + cond_synchronize_rcu(rgosp->rgos_norm); +} + static inline unsigned long start_poll_synchronize_rcu_expedited(void) { return start_poll_synchronize_rcu(); diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 7151fd861736..1b44288c027d 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -57,6 +57,7 @@ void start_poll_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp); bool poll_state_synchronize_rcu(unsigned long oldstate); bool poll_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp); void cond_synchronize_rcu(unsigned long oldstate); +void cond_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp); bool rcu_is_idle_cpu(int cpu); -- cgit From 8df13f01608ea48712956c0b1afce35bdba5a1c5 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 4 Aug 2022 15:23:26 -0700 Subject: rcu: Add full-sized polling for cond_sync_exp_full() The cond_synchronize_rcu_expedited() API compresses the combined expedited and normal grace-period states into a single unsigned long, which conserves storage, but can miss grace periods in certain cases involving overlapping normal and expedited grace periods. Missing the occasional grace period is usually not a problem, but there are use cases that care about each and every grace period. This commit therefore adds yet another member of the full-state RCU grace-period polling API, which is the cond_synchronize_rcu_exp_full() function. This uses up to three times the storage (rcu_gp_oldstate structure instead of unsigned long), but is guaranteed not to miss grace periods. Signed-off-by: Paul E. McKenney --- include/linux/rcutiny.h | 5 +++++ include/linux/rcutree.h | 1 + 2 files changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index 3bee97f76bf4..4405e9112cee 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -64,6 +64,11 @@ static inline void cond_synchronize_rcu_expedited(unsigned long oldstate) cond_synchronize_rcu(oldstate); } +static inline void cond_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp) +{ + cond_synchronize_rcu_expedited(rgosp->rgos_norm); +} + extern void rcu_barrier(void); static inline void synchronize_rcu_expedited(void) diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 1b44288c027d..755b082f4ec6 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -50,6 +50,7 @@ struct rcu_gp_oldstate { unsigned long start_poll_synchronize_rcu_expedited(void); void start_poll_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp); void cond_synchronize_rcu_expedited(unsigned long oldstate); +void cond_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp); unsigned long get_state_synchronize_rcu(void); void get_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp); unsigned long start_poll_synchronize_rcu(void); -- cgit From 7ecef0871dd9a879038dbe8a681ab48bd0c92988 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Thu, 4 Aug 2022 17:54:53 -0700 Subject: rcu: Remove ->rgos_polled field from rcu_gp_oldstate structure Because both normal and expedited grace periods increment their respective counters on their pre-scheduler early boot fastpaths, the rcu_gp_oldstate structure no longer needs its ->rgos_polled field. This commit therefore removes this field, shrinking this structure so that it is the same size as an rcu_head structure. Signed-off-by: Paul E. McKenney --- include/linux/rcutree.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 755b082f4ec6..455a03bdce15 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -44,7 +44,6 @@ bool rcu_gp_might_be_stalled(void); struct rcu_gp_oldstate { unsigned long rgos_norm; unsigned long rgos_exp; - unsigned long rgos_polled; }; unsigned long start_poll_synchronize_rcu_expedited(void); -- cgit From 18538248e5486b0f0e8581083de275176674cd1f Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Wed, 24 Aug 2022 15:39:09 -0700 Subject: rcu: Add functions to compare grace-period state values This commit adds same_state_synchronize_rcu() and same_state_synchronize_rcu_full() functions to compare grace-period state values, for example, those obtained from get_state_synchronize_rcu() and get_state_synchronize_rcu_full(). These functions allow small structures to omit these state values by placing them in list headers for lists containing structures with the same token value. Presumably the per-structure list pointers are the same ones used to link the structures into whatever reader-accessible data structure was used. This commit also adds both NUM_ACTIVE_RCU_POLL_OLDSTATE and NUM_ACTIVE_RCU_POLL_FULL_OLDSTATE, which define the maximum number of distinct unsigned long values and rcu_gp_oldstate values, respectively, corresponding to not-yet-completed grace periods. These values can be used to size arrays of the list headers described above. Signed-off-by: Paul E. McKenney --- include/linux/rcupdate.h | 21 +++++++++++++++++++++ include/linux/rcutiny.h | 14 ++++++++++++++ include/linux/rcutree.h | 28 ++++++++++++++++++++++++++++ 3 files changed, 63 insertions(+) (limited to 'include/linux') diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index faaa174dfb27..9941d5c3d5e1 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -47,6 +47,27 @@ struct rcu_gp_oldstate; unsigned long get_completed_synchronize_rcu(void); void get_completed_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp); +// Maximum number of unsigned long values corresponding to +// not-yet-completed RCU grace periods. +#define NUM_ACTIVE_RCU_POLL_OLDSTATE 2 + +/** + * same_state_synchronize_rcu - Are two old-state values identical? + * @oldstate1: First old-state value. + * @oldstate2: Second old-state value. + * + * The two old-state values must have been obtained from either + * get_state_synchronize_rcu(), start_poll_synchronize_rcu(), or + * get_completed_synchronize_rcu(). Returns @true if the two values are + * identical and @false otherwise. This allows structures whose lifetimes + * are tracked by old-state values to push these values to a list header, + * allowing those structures to be slightly smaller. + */ +static inline bool same_state_synchronize_rcu(unsigned long oldstate1, unsigned long oldstate2) +{ + return oldstate1 == oldstate2; +} + #ifdef CONFIG_PREEMPT_RCU void __rcu_read_lock(void); diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index 4405e9112cee..768196a5f39d 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -18,6 +18,20 @@ struct rcu_gp_oldstate { unsigned long rgos_norm; }; +// Maximum number of rcu_gp_oldstate values corresponding to +// not-yet-completed RCU grace periods. +#define NUM_ACTIVE_RCU_POLL_FULL_OLDSTATE 2 + +/* + * Are the two oldstate values the same? See the Tree RCU version for + * docbook header. + */ +static inline bool same_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp1, + struct rcu_gp_oldstate *rgosp2) +{ + return rgosp1->rgos_norm == rgosp2->rgos_norm; +} + unsigned long get_state_synchronize_rcu(void); static inline void get_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp) diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 455a03bdce15..5efb51486e8a 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -46,6 +46,34 @@ struct rcu_gp_oldstate { unsigned long rgos_exp; }; +// Maximum number of rcu_gp_oldstate values corresponding to +// not-yet-completed RCU grace periods. +#define NUM_ACTIVE_RCU_POLL_FULL_OLDSTATE 4 + +/** + * same_state_synchronize_rcu_full - Are two old-state values identical? + * @rgosp1: First old-state value. + * @rgosp2: Second old-state value. + * + * The two old-state values must have been obtained from either + * get_state_synchronize_rcu_full(), start_poll_synchronize_rcu_full(), + * or get_completed_synchronize_rcu_full(). Returns @true if the two + * values are identical and @false otherwise. This allows structures + * whose lifetimes are tracked by old-state values to push these values + * to a list header, allowing those structures to be slightly smaller. + * + * Note that equality is judged on a bitwise basis, so that an + * @rcu_gp_oldstate structure with an already-completed state in one field + * will compare not-equal to a structure with an already-completed state + * in the other field. After all, the @rcu_gp_oldstate structure is opaque + * so how did such a situation come to pass in the first place? + */ +static inline bool same_state_synchronize_rcu_full(struct rcu_gp_oldstate *rgosp1, + struct rcu_gp_oldstate *rgosp2) +{ + return rgosp1->rgos_norm == rgosp2->rgos_norm && rgosp1->rgos_exp == rgosp2->rgos_exp; +} + unsigned long start_poll_synchronize_rcu_expedited(void); void start_poll_synchronize_rcu_expedited_full(struct rcu_gp_oldstate *rgosp); void cond_synchronize_rcu_expedited(unsigned long oldstate); -- cgit From d66e4cf974a53c1195f1f5a96387ee5dbad2bdf2 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 2 Aug 2022 10:18:23 -0700 Subject: srcu: Add GP and maximum requested GP to Tiny SRCU rcutorture output This commit adds the ->srcu_idx and ->srcu_max_idx fields to the Tiny SRCU rcutorture output for additional diagnostics. Signed-off-by: Paul E. McKenney --- include/linux/srcutiny.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h index 6cfaa0a9a9b9..4fcec6f5af90 100644 --- a/include/linux/srcutiny.h +++ b/include/linux/srcutiny.h @@ -82,10 +82,12 @@ static inline void srcu_torture_stats_print(struct srcu_struct *ssp, int idx; idx = ((data_race(READ_ONCE(ssp->srcu_idx)) + 1) & 0x2) >> 1; - pr_alert("%s%s Tiny SRCU per-CPU(idx=%d): (%hd,%hd)\n", + pr_alert("%s%s Tiny SRCU per-CPU(idx=%d): (%hd,%hd) gp: %hu->%hu\n", tt, tf, idx, data_race(READ_ONCE(ssp->srcu_lock_nesting[!idx])), - data_race(READ_ONCE(ssp->srcu_lock_nesting[idx]))); + data_race(READ_ONCE(ssp->srcu_lock_nesting[idx])), + data_race(READ_ONCE(ssp->srcu_idx)), + data_race(READ_ONCE(ssp->srcu_idx_max))); } #endif -- cgit From 5fe89191e43fb37e874b8e5177fb2a5c72379b06 Mon Sep 17 00:00:00 2001 From: "Paul E. McKenney" Date: Tue, 2 Aug 2022 15:32:47 -0700 Subject: srcu: Make Tiny SRCU use full-sized grace-period counters This commit makes Tiny SRCU use full-sized grace-period counters to further avoid counter-wrap issues when using polled grace-period APIs. Signed-off-by: Paul E. McKenney --- include/linux/srcutiny.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/srcutiny.h b/include/linux/srcutiny.h index 4fcec6f5af90..5aa5e0faf6a1 100644 --- a/include/linux/srcutiny.h +++ b/include/linux/srcutiny.h @@ -15,10 +15,10 @@ struct srcu_struct { short srcu_lock_nesting[2]; /* srcu_read_lock() nesting depth. */ - unsigned short srcu_idx; /* Current reader array element in bit 0x2. */ - unsigned short srcu_idx_max; /* Furthest future srcu_idx request. */ u8 srcu_gp_running; /* GP workqueue running? */ u8 srcu_gp_waiting; /* GP waiting for readers? */ + unsigned long srcu_idx; /* Current reader array element in bit 0x2. */ + unsigned long srcu_idx_max; /* Furthest future srcu_idx request. */ struct swait_queue_head srcu_wq; /* Last srcu_read_unlock() wakes GP. */ struct rcu_head *srcu_cb_head; /* Pending callbacks: Head. */ @@ -82,7 +82,7 @@ static inline void srcu_torture_stats_print(struct srcu_struct *ssp, int idx; idx = ((data_race(READ_ONCE(ssp->srcu_idx)) + 1) & 0x2) >> 1; - pr_alert("%s%s Tiny SRCU per-CPU(idx=%d): (%hd,%hd) gp: %hu->%hu\n", + pr_alert("%s%s Tiny SRCU per-CPU(idx=%d): (%hd,%hd) gp: %lu->%lu\n", tt, tf, idx, data_race(READ_ONCE(ssp->srcu_lock_nesting[!idx])), data_race(READ_ONCE(ssp->srcu_lock_nesting[idx])), -- cgit From f9cad07b840ec8a8eb54928908d694b6e262631c Mon Sep 17 00:00:00 2001 From: Mika Westerberg Date: Tue, 30 Aug 2022 18:32:47 +0300 Subject: thunderbolt: Show link type for XDomain connections too Following what we do for routers already, extend this to XDomain connections as well. This will show in sysfs whether the link is in USB4 or Thunderbolt mode. Signed-off-by: Mika Westerberg Signed-off-by: David S. Miller --- include/linux/thunderbolt.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/thunderbolt.h b/include/linux/thunderbolt.h index 9f442d73f3df..90cd08ab2f5d 100644 --- a/include/linux/thunderbolt.h +++ b/include/linux/thunderbolt.h @@ -187,6 +187,7 @@ void tb_unregister_property_dir(const char *key, struct tb_property_dir *dir); * @device_name: Name of the device (or %NULL if not known) * @link_speed: Speed of the link in Gb/s * @link_width: Width of the link (1 or 2) + * @link_usb4: Downstream link is USB4 * @is_unplugged: The XDomain is unplugged * @needs_uuid: If the XDomain does not have @remote_uuid it will be * queried first @@ -234,6 +235,7 @@ struct tb_xdomain { const char *device_name; unsigned int link_speed; unsigned int link_width; + bool link_usb4; bool is_unplugged; bool needs_uuid; struct ida service_ids; -- cgit From ec1bd37123c607ca6485beb4542a792a4db765aa Mon Sep 17 00:00:00 2001 From: Khalid Masum Date: Thu, 18 Aug 2022 10:07:38 +0600 Subject: fscache: fix misdocumented parameter This patch fixes two warnings generated by make docs. The functions fscache_use_cookie and fscache_unuse_cookie, both have a parameter named cookie. But they are documented with the name "object" with unclear description. Which generates the warning when creating docs. This commit will replace the currently misdocumented parameter names with the correct ones while adding proper descriptions. CC: Randy Dunlap Signed-off-by: Khalid Masum Signed-off-by: David Howells Link: https://lore.kernel.org/r/20220521142446.4746-1-khalid.masum.92@gmail.com/ # v1 Link: https://lore.kernel.org/r/20220818040738.12036-1-khalid.masum.92@gmail.com/ # v2 Link: https://lore.kernel.org/r/880d7d25753fb326ee17ac08005952112fcf9bdb.1657360984.git.mchehab@kernel.org/ # Mauro's version --- include/linux/fscache.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fscache.h b/include/linux/fscache.h index 720874e6ee94..36e5dd84cf59 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -258,7 +258,7 @@ struct fscache_cookie *fscache_acquire_cookie(struct fscache_volume *volume, /** * fscache_use_cookie - Request usage of cookie attached to an object - * @object: Object description + * @cookie: The cookie representing the cache object * @will_modify: If cache is expected to be modified locally * * Request usage of the cookie attached to an object. The caller should tell @@ -274,7 +274,7 @@ static inline void fscache_use_cookie(struct fscache_cookie *cookie, /** * fscache_unuse_cookie - Cease usage of cookie attached to an object - * @object: Object description + * @cookie: The cookie representing the cache object * @aux_data: Updated auxiliary data (or NULL) * @object_size: Revised size of the object (or NULL) * -- cgit From 52edb4080eb9606536c34d5d642ccd9d35ad5d08 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Mon, 29 Aug 2022 14:38:43 +0200 Subject: acl: move idmapping handling into posix_acl_xattr_set() The uapi POSIX ACL struct passed through the value argument during setxattr() contains {g,u}id values encoded via ACL_{GROUP,USER} entries that should actually be stored in the form of k{g,u}id_t (See [1] for a long explanation of the issue.). In 0c5fd887d2bb ("acl: move idmapped mount fixup into vfs_{g,s}etxattr()") we took the mount's idmapping into account in order to let overlayfs handle POSIX ACLs on idmapped layers correctly. The fixup is currently performed directly in vfs_setxattr() which piles on top of the earlier hackiness by handling the mount's idmapping and stuff the vfs{g,u}id_t values into the uapi struct as well. While that is all correct and works fine it's just ugly. Now that we have introduced vfs_make_posix_acl() earlier move handling idmapped mounts out of vfs_setxattr() and into the POSIX ACL handler where it belongs. Note that we also need to call vfs_make_posix_acl() for EVM which interpretes POSIX ACLs during security_inode_setxattr(). Leave them a longer comment for future reference. All filesystems that support idmapped mounts via FS_ALLOW_IDMAP use the standard POSIX ACL xattr handlers and are covered by this change. This includes overlayfs which simply calls vfs_{g,s}etxattr(). The following filesystems use custom POSIX ACL xattr handlers: 9p, cifs, ecryptfs, and ntfs3 (and overlayfs but we've covered that in the paragraph above) and none of them support idmapped mounts yet. Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org/ [1] Signed-off-by: Christian Brauner (Microsoft) Reviewed-by: Seth Forshee (DigitalOcean) --- include/linux/posix_acl_xattr.h | 9 --------- 1 file changed, 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/posix_acl_xattr.h b/include/linux/posix_acl_xattr.h index 47eca15fd842..8163dd48c430 100644 --- a/include/linux/posix_acl_xattr.h +++ b/include/linux/posix_acl_xattr.h @@ -38,9 +38,6 @@ void posix_acl_fix_xattr_to_user(void *value, size_t size); void posix_acl_getxattr_idmapped_mnt(struct user_namespace *mnt_userns, const struct inode *inode, void *value, size_t size); -void posix_acl_setxattr_idmapped_mnt(struct user_namespace *mnt_userns, - const struct inode *inode, - void *value, size_t size); #else static inline void posix_acl_fix_xattr_from_user(void *value, size_t size) { @@ -54,12 +51,6 @@ posix_acl_getxattr_idmapped_mnt(struct user_namespace *mnt_userns, size_t size) { } -static inline void -posix_acl_setxattr_idmapped_mnt(struct user_namespace *mnt_userns, - const struct inode *inode, void *value, - size_t size) -{ -} #endif struct posix_acl *posix_acl_from_xattr(struct user_namespace *user_ns, -- cgit From 6344e66970c619a1623f457910e78819076e9104 Mon Sep 17 00:00:00 2001 From: Christian Brauner Date: Mon, 29 Aug 2022 14:38:45 +0200 Subject: xattr: constify value argument in vfs_setxattr() Now that we don't perform translations directly in vfs_setxattr() anymore we can constify the @value argument in vfs_setxattr(). This also allows us to remove the hack to cast from a const in ovl_do_setxattr(). Signed-off-by: Christian Brauner (Microsoft) Reviewed-by: Seth Forshee (DigitalOcean) --- include/linux/xattr.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/xattr.h b/include/linux/xattr.h index 979a9d3e5bfb..4c379d23ec6e 100644 --- a/include/linux/xattr.h +++ b/include/linux/xattr.h @@ -61,7 +61,7 @@ int __vfs_setxattr_locked(struct user_namespace *, struct dentry *, const char *, const void *, size_t, int, struct inode **); int vfs_setxattr(struct user_namespace *, struct dentry *, const char *, - void *, size_t, int); + const void *, size_t, int); int __vfs_removexattr(struct user_namespace *, struct dentry *, const char *); int __vfs_removexattr_locked(struct user_namespace *, struct dentry *, const char *, struct inode **); -- cgit From b6df1cbb415e543f2908f9c59a8fb20714b86879 Mon Sep 17 00:00:00 2001 From: James Clark Date: Tue, 30 Aug 2022 18:26:10 +0100 Subject: coresight: Simplify sysfs accessors by using csdev_access abstraction The coresight_device struct is available in the sysfs accessor, and this contains a csdev_access struct which can be used to access registers. Use this instead of passing in the type of each drvdata so that a common function can be shared between all the cs drivers. No functional changes. Signed-off-by: James Clark Reviewed-by: Mike Leach Link: https://lore.kernel.org/r/20220830172614.340962-3-james.clark@arm.com Signed-off-by: Mathieu Poirier --- include/linux/coresight.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) (limited to 'include/linux') diff --git a/include/linux/coresight.h b/include/linux/coresight.h index 9f445f09fcfe..a47dd1f62216 100644 --- a/include/linux/coresight.h +++ b/include/linux/coresight.h @@ -372,6 +372,24 @@ static inline u32 csdev_access_relaxed_read32(struct csdev_access *csa, return csa->read(offset, true, false); } +static inline u64 csdev_access_relaxed_read_pair(struct csdev_access *csa, + s32 lo_offset, s32 hi_offset) +{ + u64 val; + + if (likely(csa->io_mem)) { + val = readl_relaxed(csa->base + lo_offset); + val |= (hi_offset < 0) ? 0 : + (u64)readl_relaxed(csa->base + hi_offset) << 32; + return val; + } + + val = csa->read(lo_offset, true, false); + val |= (hi_offset < 0) ? 0 : + (u64)csa->read(hi_offset, true, false) << 32; + return val; +} + static inline u32 csdev_access_read32(struct csdev_access *csa, u32 offset) { if (likely(csa->io_mem)) -- cgit From 0a98181f805058773961c5ab3172ecf1bf1ed0e1 Mon Sep 17 00:00:00 2001 From: James Clark Date: Tue, 30 Aug 2022 18:26:13 +0100 Subject: coresight: Make new csdev_access offsets unsigned New csdev_access functions were added as part of the previous refactor. In order to make them more consistent with the existing ones, change any signed offset types to be unsigned. Now that they are unsigned, stop using hi_off = -1 to signify a single 32bit access. Instead just call the existing 32bit accessors. This is also applied to other parts of the codebase, and the coresight_{read,write}_reg_pair() functions can be deleted. Signed-off-by: James Clark Reviewed-by: Mike Leach Link: https://lore.kernel.org/r/20220830172614.340962-6-james.clark@arm.com Signed-off-by: Mathieu Poirier --- include/linux/coresight.h | 27 ++++++++++++++++----------- 1 file changed, 16 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/coresight.h b/include/linux/coresight.h index a47dd1f62216..1554021231f9 100644 --- a/include/linux/coresight.h +++ b/include/linux/coresight.h @@ -373,21 +373,26 @@ static inline u32 csdev_access_relaxed_read32(struct csdev_access *csa, } static inline u64 csdev_access_relaxed_read_pair(struct csdev_access *csa, - s32 lo_offset, s32 hi_offset) + u32 lo_offset, u32 hi_offset) { - u64 val; - if (likely(csa->io_mem)) { - val = readl_relaxed(csa->base + lo_offset); - val |= (hi_offset < 0) ? 0 : - (u64)readl_relaxed(csa->base + hi_offset) << 32; - return val; + return readl_relaxed(csa->base + lo_offset) | + ((u64)readl_relaxed(csa->base + hi_offset) << 32); } - val = csa->read(lo_offset, true, false); - val |= (hi_offset < 0) ? 0 : - (u64)csa->read(hi_offset, true, false) << 32; - return val; + return csa->read(lo_offset, true, false) | (csa->read(hi_offset, true, false) << 32); +} + +static inline void csdev_access_relaxed_write_pair(struct csdev_access *csa, u64 val, + u32 lo_offset, u32 hi_offset) +{ + if (likely(csa->io_mem)) { + writel_relaxed((u32)val, csa->base + lo_offset); + writel_relaxed((u32)(val >> 32), csa->base + hi_offset); + } else { + csa->write((u32)val, lo_offset, true, false); + csa->write((u32)(val >> 32), hi_offset, true, false); + } } static inline u32 csdev_access_read32(struct csdev_access *csa, u32 offset) -- cgit From 92d23c6e94157739b997cacce151586a0d07bb8a Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Fri, 26 Aug 2022 09:21:16 -0700 Subject: overflow, tracing: Define the is_signed_type() macro once There are two definitions of the is_signed_type() macro: one in and a second definition in . As suggested by Linus Torvalds, move the definition of the is_signed_type() macro into the header file. Change the definition of the is_signed_type() macro to make sure that it does not trigger any sparse warnings with future versions of sparse for bitwise types. See also: https://lore.kernel.org/all/CAHk-=whjH6p+qzwUdx5SOVVHjS3WvzJQr6mDUwhEyTf6pJWzaQ@mail.gmail.com/ https://lore.kernel.org/all/CAHk-=wjQGnVfb4jehFR0XyZikdQvCZouE96xR_nnf5kqaM5qqQ@mail.gmail.com/ Cc: Andrew Morton Cc: Arnd Bergmann Cc: Dan Williams Cc: Eric Dumazet Cc: Ingo Molnar Cc: Isabella Basso Cc: "Jason A. Donenfeld" Cc: Josh Poimboeuf Cc: Luc Van Oostenryck Cc: Masami Hiramatsu Cc: Nathan Chancellor Cc: Peter Zijlstra Cc: Rasmus Villemoes Cc: Sander Vanheule Cc: Steven Rostedt Cc: Vlastimil Babka Cc: Yury Norov Signed-off-by: Bart Van Assche Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220826162116.1050972-3-bvanassche@acm.org --- include/linux/compiler.h | 6 ++++++ include/linux/overflow.h | 1 - include/linux/trace_events.h | 2 -- 3 files changed, 6 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 01ce94b58b42..7713d7bcdaea 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -239,6 +239,12 @@ static inline void *offset_to_ptr(const int *off) /* &a[0] degrades to a pointer: a different type from an array */ #define __must_be_array(a) BUILD_BUG_ON_ZERO(__same_type((a), &(a)[0])) +/* + * Whether 'type' is a signed type or an unsigned type. Supports scalar types, + * bool and also pointer types. + */ +#define is_signed_type(type) (((type)(-1)) < (__force type)1) + /* * This is needed in functions which generate the stack canary, see * arch/x86/kernel/smpboot.c::start_secondary() for an example. diff --git a/include/linux/overflow.h b/include/linux/overflow.h index f1221d11f8e5..0eb3b192f07a 100644 --- a/include/linux/overflow.h +++ b/include/linux/overflow.h @@ -30,7 +30,6 @@ * https://mail-index.netbsd.org/tech-misc/2007/02/05/0000.html - * credit to Christian Biere. */ -#define is_signed_type(type) (((type)(-1)) < (type)1) #define __type_half_max(type) ((type)1 << (8*sizeof(type) - 1 - is_signed_type(type))) #define type_max(T) ((T)((__type_half_max(T) - 1) + __type_half_max(T))) #define type_min(T) ((T)((T)-type_max(T)-(T)1)) diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index b18759a673c6..8401dec93c15 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -814,8 +814,6 @@ extern int trace_add_event_call(struct trace_event_call *call); extern int trace_remove_event_call(struct trace_event_call *call); extern int trace_event_get_offsets(struct trace_event_call *call); -#define is_signed_type(type) (((type)(-1)) < (type)1) - int ftrace_set_clr_event(struct trace_array *tr, char *buf, int set); int trace_set_clr_event(const char *system, const char *event, int set); int trace_array_set_clr_event(struct trace_array *tr, const char *system, -- cgit From bd8092def983f567800f764a5d23b2dca98f078c Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Thu, 18 Aug 2022 23:01:56 +0200 Subject: PM: suspend: move from strlcpy() with unused retval to strscpy() Follow the advice of the below link and prefer 'strscpy' in this subsystem. Conversion is 1:1 because the return value is not used. Generated by a coccinelle script. Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/ Signed-off-by: Wolfram Sang Signed-off-by: Rafael J. Wysocki --- include/linux/suspend.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/suspend.h b/include/linux/suspend.h index 70f2921e2e70..23a253df7f6b 100644 --- a/include/linux/suspend.h +++ b/include/linux/suspend.h @@ -75,7 +75,7 @@ extern struct suspend_stats suspend_stats; static inline void dpm_save_failed_dev(const char *name) { - strlcpy(suspend_stats.failed_devs[suspend_stats.last_failed_dev], + strscpy(suspend_stats.failed_devs[suspend_stats.last_failed_dev], name, sizeof(suspend_stats.failed_devs[0])); suspend_stats.last_failed_dev++; -- cgit From 21370ecddfe1ff6fb826faedb601cfbb7adcf4ff Mon Sep 17 00:00:00 2001 From: Allen-KH Cheng Date: Thu, 1 Sep 2022 01:21:51 +0800 Subject: soc: mediatek: mutex: Add mt8186 mutex mod settings for mdp3 Add mt8186 mutex mod settings for mdp3. Co-developed-by: Xiandong Wang Signed-off-by: Xiandong Wang Signed-off-by: Allen-KH Cheng Reviewed-by: AngeloGioacchino Del Regno Link: https://lore.kernel.org/r/20220831172151.10215-3-allen-kh.cheng@mediatek.com Signed-off-by: Matthias Brugger --- include/linux/soc/mediatek/mtk-mutex.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/mediatek/mtk-mutex.h b/include/linux/soc/mediatek/mtk-mutex.h index a0f4f51a3b45..b335c2837cd8 100644 --- a/include/linux/soc/mediatek/mtk-mutex.h +++ b/include/linux/soc/mediatek/mtk-mutex.h @@ -20,6 +20,8 @@ enum mtk_mutex_mod_index { MUTEX_MOD_IDX_MDP_WDMA, MUTEX_MOD_IDX_MDP_AAL0, MUTEX_MOD_IDX_MDP_CCORR0, + MUTEX_MOD_IDX_MDP_HDR0, + MUTEX_MOD_IDX_MDP_COLOR0, MUTEX_MOD_IDX_MAX /* ALWAYS keep at the end */ }; -- cgit From 2555283eb40df89945557273121e9393ef9b542b Mon Sep 17 00:00:00 2001 From: Jann Horn Date: Wed, 31 Aug 2022 19:06:00 +0200 Subject: mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse anon_vma->degree tracks the combined number of child anon_vmas and VMAs that use the anon_vma as their ->anon_vma. anon_vma_clone() then assumes that for any anon_vma attached to src->anon_vma_chain other than src->anon_vma, it is impossible for it to be a leaf node of the VMA tree, meaning that for such VMAs ->degree is elevated by 1 because of a child anon_vma, meaning that if ->degree equals 1 there are no VMAs that use the anon_vma as their ->anon_vma. This assumption is wrong because the ->degree optimization leads to leaf nodes being abandoned on anon_vma_clone() - an existing anon_vma is reused and no new parent-child relationship is created. So it is possible to reuse an anon_vma for one VMA while it is still tied to another VMA. This is an issue because is_mergeable_anon_vma() and its callers assume that if two VMAs have the same ->anon_vma, the list of anon_vmas attached to the VMAs is guaranteed to be the same. When this assumption is violated, vma_merge() can merge pages into a VMA that is not attached to the corresponding anon_vma, leading to dangling page->mapping pointers that will be dereferenced during rmap walks. Fix it by separately tracking the number of child anon_vmas and the number of VMAs using the anon_vma as their ->anon_vma. Fixes: 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy") Cc: stable@kernel.org Acked-by: Michal Hocko Acked-by: Vlastimil Babka Signed-off-by: Jann Horn Signed-off-by: Linus Torvalds --- include/linux/rmap.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index bf80adca980b..b89b4b86951f 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -41,12 +41,15 @@ struct anon_vma { atomic_t refcount; /* - * Count of child anon_vmas and VMAs which points to this anon_vma. + * Count of child anon_vmas. Equals to the count of all anon_vmas that + * have ->parent pointing to this one, including itself. * * This counter is used for making decision about reusing anon_vma * instead of forking new one. See comments in function anon_vma_clone. */ - unsigned degree; + unsigned long num_children; + /* Count of VMAs whose ->anon_vma pointer points to this object. */ + unsigned long num_active_vmas; struct anon_vma *parent; /* Parent of this anon_vma */ -- cgit From 12198d9179aaa53d0a4026318d8b73b146d89729 Mon Sep 17 00:00:00 2001 From: Lukas Bulwahn Date: Wed, 20 Jul 2022 10:29:34 +0200 Subject: clk: davinci: remove PLL and PSC clocks for DaVinci DM644x and DM646x Commit 7dd33764486d ("ARM: davinci: Delete DM644x board files") and commit b4aed01de486 ("ARM: davinci: Delete DM646x board files") removes the support for DaVinci DM644x and DM646x boards. Hence, remove the PLL and PSC clock descriptions for those boards as well. Signed-off-by: Lukas Bulwahn Link: https://lore.kernel.org/r/20220720082934.17741-1-lukas.bulwahn@gmail.com Reviewed-by: David Lechner Reviewed-by: Linus Walleij Signed-off-by: Stephen Boyd --- include/linux/clk/davinci.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/clk/davinci.h b/include/linux/clk/davinci.h index 8a7b5cd7eac0..f6ebab6228c2 100644 --- a/include/linux/clk/davinci.h +++ b/include/linux/clk/davinci.h @@ -28,13 +28,5 @@ int dm365_pll1_init(struct device *dev, void __iomem *base, struct regmap *cfgch int dm365_pll2_init(struct device *dev, void __iomem *base, struct regmap *cfgchip); int dm365_psc_init(struct device *dev, void __iomem *base); #endif -#ifdef CONFIG_ARCH_DAVINCI_DM644x -int dm644x_pll1_init(struct device *dev, void __iomem *base, struct regmap *cfgchip); -int dm644x_psc_init(struct device *dev, void __iomem *base); -#endif -#ifdef CONFIG_ARCH_DAVINCI_DM646x -int dm646x_pll1_init(struct device *dev, void __iomem *base, struct regmap *cfgchip); -int dm646x_psc_init(struct device *dev, void __iomem *base); -#endif #endif /* __LINUX_CLK_DAVINCI_PLL_H___ */ -- cgit From 9a1043d43a9ab75d28b8ec54512ea13ec33bc910 Mon Sep 17 00:00:00 2001 From: Serge Semin Date: Mon, 22 Aug 2022 22:07:22 +0300 Subject: EDAC/mc: Replace spaces with tabs in memtype flags definition Currently, the memory type macros are partly defined with multiple spaces between the macro name and its definition. Replace the spaces with tabs as the kernel coding style requires. Signed-off-by: Serge Semin Signed-off-by: Borislav Petkov Link: https://lore.kernel.org/r/20220822190730.27277-13-Sergey.Semin@baikalelectronics.ru --- include/linux/edac.h | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) (limited to 'include/linux') diff --git a/include/linux/edac.h b/include/linux/edac.h index e730b3468719..fa4bda2a70f6 100644 --- a/include/linux/edac.h +++ b/include/linux/edac.h @@ -231,21 +231,21 @@ enum mem_type { #define MEM_FLAG_DDR BIT(MEM_DDR) #define MEM_FLAG_RDDR BIT(MEM_RDDR) #define MEM_FLAG_RMBS BIT(MEM_RMBS) -#define MEM_FLAG_DDR2 BIT(MEM_DDR2) -#define MEM_FLAG_FB_DDR2 BIT(MEM_FB_DDR2) -#define MEM_FLAG_RDDR2 BIT(MEM_RDDR2) -#define MEM_FLAG_XDR BIT(MEM_XDR) -#define MEM_FLAG_DDR3 BIT(MEM_DDR3) -#define MEM_FLAG_RDDR3 BIT(MEM_RDDR3) -#define MEM_FLAG_LPDDR3 BIT(MEM_LPDDR3) -#define MEM_FLAG_DDR4 BIT(MEM_DDR4) -#define MEM_FLAG_RDDR4 BIT(MEM_RDDR4) -#define MEM_FLAG_LRDDR4 BIT(MEM_LRDDR4) -#define MEM_FLAG_LPDDR4 BIT(MEM_LPDDR4) -#define MEM_FLAG_DDR5 BIT(MEM_DDR5) -#define MEM_FLAG_RDDR5 BIT(MEM_RDDR5) -#define MEM_FLAG_LRDDR5 BIT(MEM_LRDDR5) -#define MEM_FLAG_NVDIMM BIT(MEM_NVDIMM) +#define MEM_FLAG_DDR2 BIT(MEM_DDR2) +#define MEM_FLAG_FB_DDR2 BIT(MEM_FB_DDR2) +#define MEM_FLAG_RDDR2 BIT(MEM_RDDR2) +#define MEM_FLAG_XDR BIT(MEM_XDR) +#define MEM_FLAG_DDR3 BIT(MEM_DDR3) +#define MEM_FLAG_RDDR3 BIT(MEM_RDDR3) +#define MEM_FLAG_LPDDR3 BIT(MEM_LPDDR3) +#define MEM_FLAG_DDR4 BIT(MEM_DDR4) +#define MEM_FLAG_RDDR4 BIT(MEM_RDDR4) +#define MEM_FLAG_LRDDR4 BIT(MEM_LRDDR4) +#define MEM_FLAG_LPDDR4 BIT(MEM_LPDDR4) +#define MEM_FLAG_DDR5 BIT(MEM_DDR5) +#define MEM_FLAG_RDDR5 BIT(MEM_RDDR5) +#define MEM_FLAG_LRDDR5 BIT(MEM_LRDDR5) +#define MEM_FLAG_NVDIMM BIT(MEM_NVDIMM) #define MEM_FLAG_WIO2 BIT(MEM_WIO2) #define MEM_FLAG_HBM2 BIT(MEM_HBM2) -- cgit From 26a40990ba052e6f553256f9d0f112452b992a38 Mon Sep 17 00:00:00 2001 From: Hyeonggon Yoo <42.hyeyoo@gmail.com> Date: Wed, 17 Aug 2022 19:18:22 +0900 Subject: mm/sl[au]b: cleanup kmem_cache_alloc[_node]_trace() Despite its name, kmem_cache_alloc[_node]_trace() is hook for inlined kmalloc. So rename it to kmalloc[_node]_trace(). Move its implementation to slab_common.c by using __kmem_cache_alloc_node(), but keep CONFIG_TRACING=n varients to save a function call when CONFIG_TRACING=n. Use __assume_kmalloc_alignment for kmalloc[_node]_trace instead of __assume_slab_alignement. Generally kmalloc has larger alignment requirements. Suggested-by: Vlastimil Babka Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka Signed-off-by: Vlastimil Babka --- include/linux/slab.h | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 4ee5b2fed164..c8e485ce8815 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -449,16 +449,16 @@ void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assum __malloc; #ifdef CONFIG_TRACING -extern void *kmem_cache_alloc_trace(struct kmem_cache *s, gfp_t flags, size_t size) - __assume_slab_alignment __alloc_size(3); - -extern void *kmem_cache_alloc_node_trace(struct kmem_cache *s, gfp_t gfpflags, - int node, size_t size) __assume_slab_alignment - __alloc_size(4); +void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size) + __assume_kmalloc_alignment __alloc_size(3); +void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags, + int node, size_t size) __assume_kmalloc_alignment + __alloc_size(4); #else /* CONFIG_TRACING */ -static __always_inline __alloc_size(3) void *kmem_cache_alloc_trace(struct kmem_cache *s, - gfp_t flags, size_t size) +/* Save a function call when CONFIG_TRACING=n */ +static __always_inline __alloc_size(3) +void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size) { void *ret = kmem_cache_alloc(s, flags); @@ -466,8 +466,9 @@ static __always_inline __alloc_size(3) void *kmem_cache_alloc_trace(struct kmem_ return ret; } -static __always_inline void *kmem_cache_alloc_node_trace(struct kmem_cache *s, gfp_t gfpflags, - int node, size_t size) +static __always_inline __alloc_size(4) +void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags, + int node, size_t size) { void *ret = kmem_cache_alloc_node(s, gfpflags, node); @@ -550,7 +551,7 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags) if (!index) return ZERO_SIZE_PTR; - return kmem_cache_alloc_trace( + return kmalloc_trace( kmalloc_caches[kmalloc_type(flags)][index], flags, size); #endif @@ -572,9 +573,9 @@ static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla if (!index) return ZERO_SIZE_PTR; - return kmem_cache_alloc_node_trace( + return kmalloc_node_trace( kmalloc_caches[kmalloc_type(flags)][index], - flags, node, size); + flags, node, size); } return __kmalloc_node(size, flags, node); } -- cgit From 7f8170686d1733321841b21da8c2cd09ddb922b7 Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Tue, 23 Aug 2022 13:38:36 +0800 Subject: soundwire: intel: cleanup definition of LCOUNT Add definition in header file rather than hidden in code. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Link: https://lore.kernel.org/r/20220823053846.2684635-2-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw_intel.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index ec16ae49e6a4..49a3c265529b 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -15,7 +15,10 @@ #define SDW_LINK_SIZE 0x10000 /* Intel SHIM Registers Definition */ +/* LCAP */ #define SDW_SHIM_LCAP 0x0 +#define SDW_SHIM_LCAP_LCOUNT_MASK GENMASK(2, 0) + #define SDW_SHIM_LCTL 0x4 #define SDW_SHIM_IPPTR 0x8 #define SDW_SHIM_SYNC 0xC -- cgit From feaa24aa408f117dbf074b580d38131dc819505a Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Tue, 23 Aug 2022 13:38:37 +0800 Subject: soundwire: intel: regroup definitions for LCTL No functionality change, just regroup offset and bitfield definitions. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Link: https://lore.kernel.org/r/20220823053846.2684635-3-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw_intel.h | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index 49a3c265529b..d9f51f43e42c 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -19,7 +19,14 @@ #define SDW_SHIM_LCAP 0x0 #define SDW_SHIM_LCAP_LCOUNT_MASK GENMASK(2, 0) +/* LCTL */ #define SDW_SHIM_LCTL 0x4 + +#define SDW_SHIM_LCTL_SPA BIT(0) +#define SDW_SHIM_LCTL_SPA_MASK GENMASK(3, 0) +#define SDW_SHIM_LCTL_CPA BIT(8) +#define SDW_SHIM_LCTL_CPA_MASK GENMASK(11, 8) + #define SDW_SHIM_IPPTR 0x8 #define SDW_SHIM_SYNC 0xC @@ -39,11 +46,6 @@ #define SDW_SHIM_WAKEEN 0x190 #define SDW_SHIM_WAKESTS 0x192 -#define SDW_SHIM_LCTL_SPA BIT(0) -#define SDW_SHIM_LCTL_SPA_MASK GENMASK(3, 0) -#define SDW_SHIM_LCTL_CPA BIT(8) -#define SDW_SHIM_LCTL_CPA_MASK GENMASK(11, 8) - #define SDW_SHIM_SYNC_SYNCPRD_VAL_24 (24000 / SDW_CADENCE_GSYNC_KHZ - 1) #define SDW_SHIM_SYNC_SYNCPRD_VAL_38_4 (38400 / SDW_CADENCE_GSYNC_KHZ - 1) #define SDW_SHIM_SYNC_SYNCPRD GENMASK(14, 0) -- cgit From c36b610047463a37203e1158aeaf90f0a82f45dc Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Tue, 23 Aug 2022 13:38:38 +0800 Subject: soundwire: intel: remove IPPTR unused definition This read-only register only defines an offset which is known already and a version which isn't used. Remove unused definition. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Link: https://lore.kernel.org/r/20220823053846.2684635-4-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw_intel.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index d9f51f43e42c..581a9ba32f82 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -27,7 +27,6 @@ #define SDW_SHIM_LCTL_CPA BIT(8) #define SDW_SHIM_LCTL_CPA_MASK GENMASK(11, 8) -#define SDW_SHIM_IPPTR 0x8 #define SDW_SHIM_SYNC 0xC #define SDW_SHIM_CTLSCAP(x) (0x010 + 0x60 * (x)) -- cgit From ca33a58d12d3c75a3b28cc97037563bb6e03be7c Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Tue, 23 Aug 2022 13:38:39 +0800 Subject: soundwire: intel: cleanup SHIM SYNC Regroup offset and bitfields, no functionality change Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Link: https://lore.kernel.org/r/20220823053846.2684635-5-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw_intel.h | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index 581a9ba32f82..66503cf29f48 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -27,8 +27,17 @@ #define SDW_SHIM_LCTL_CPA BIT(8) #define SDW_SHIM_LCTL_CPA_MASK GENMASK(11, 8) +/* SYNC */ #define SDW_SHIM_SYNC 0xC +#define SDW_SHIM_SYNC_SYNCPRD_VAL_24 (24000 / SDW_CADENCE_GSYNC_KHZ - 1) +#define SDW_SHIM_SYNC_SYNCPRD_VAL_38_4 (38400 / SDW_CADENCE_GSYNC_KHZ - 1) +#define SDW_SHIM_SYNC_SYNCPRD GENMASK(14, 0) +#define SDW_SHIM_SYNC_SYNCCPU BIT(15) +#define SDW_SHIM_SYNC_CMDSYNC_MASK GENMASK(19, 16) +#define SDW_SHIM_SYNC_CMDSYNC BIT(16) +#define SDW_SHIM_SYNC_SYNCGO BIT(24) + #define SDW_SHIM_CTLSCAP(x) (0x010 + 0x60 * (x)) #define SDW_SHIM_CTLS0CM(x) (0x012 + 0x60 * (x)) #define SDW_SHIM_CTLS1CM(x) (0x014 + 0x60 * (x)) @@ -45,14 +54,6 @@ #define SDW_SHIM_WAKEEN 0x190 #define SDW_SHIM_WAKESTS 0x192 -#define SDW_SHIM_SYNC_SYNCPRD_VAL_24 (24000 / SDW_CADENCE_GSYNC_KHZ - 1) -#define SDW_SHIM_SYNC_SYNCPRD_VAL_38_4 (38400 / SDW_CADENCE_GSYNC_KHZ - 1) -#define SDW_SHIM_SYNC_SYNCPRD GENMASK(14, 0) -#define SDW_SHIM_SYNC_SYNCCPU BIT(15) -#define SDW_SHIM_SYNC_CMDSYNC_MASK GENMASK(19, 16) -#define SDW_SHIM_SYNC_CMDSYNC BIT(16) -#define SDW_SHIM_SYNC_SYNCGO BIT(24) - #define SDW_SHIM_PCMSCAP_ISS GENMASK(3, 0) #define SDW_SHIM_PCMSCAP_OSS GENMASK(7, 4) #define SDW_SHIM_PCMSCAP_BSS GENMASK(12, 8) -- cgit From c27ce5c9dd178f7218de6fd29651ad04bd496d5c Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Tue, 23 Aug 2022 13:38:40 +0800 Subject: soundwire: intel: remove unused PDM capabilities We removed PDM support a long time ago but kept the definitions. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Link: https://lore.kernel.org/r/20220823053846.2684635-6-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw_intel.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index 66503cf29f48..34af5bc010b3 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -47,7 +47,6 @@ #define SDW_SHIM_PCMSYCHM(x, y) (0x022 + (0x60 * (x)) + (0x2 * (y))) #define SDW_SHIM_PCMSYCHC(x, y) (0x042 + (0x60 * (x)) + (0x2 * (y))) -#define SDW_SHIM_PDMSCAP(x) (0x062 + 0x60 * (x)) #define SDW_SHIM_IOCTL(x) (0x06C + 0x60 * (x)) #define SDW_SHIM_CTMCTL(x) (0x06E + 0x60 * (x)) @@ -63,11 +62,6 @@ #define SDW_SHIM_PCMSYCM_STREAM GENMASK(13, 8) #define SDW_SHIM_PCMSYCM_DIR BIT(15) -#define SDW_SHIM_PDMSCAP_ISS GENMASK(3, 0) -#define SDW_SHIM_PDMSCAP_OSS GENMASK(7, 4) -#define SDW_SHIM_PDMSCAP_BSS GENMASK(12, 8) -#define SDW_SHIM_PDMSCAP_CPSS GENMASK(15, 13) - #define SDW_SHIM_IOCTL_MIF BIT(0) #define SDW_SHIM_IOCTL_CO BIT(1) #define SDW_SHIM_IOCTL_COE BIT(2) -- cgit From bd45a65dad8e13ba47f8fb3d6a6c68adecc96db7 Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Tue, 23 Aug 2022 13:38:41 +0800 Subject: soundwire: intel: add comment for control stream cap/chmap add comment and newline to mark out control stream capabilities and chmap. These registers are unused by the driver, only dumped in debugfs. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Link: https://lore.kernel.org/r/20220823053846.2684635-7-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw_intel.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index 34af5bc010b3..7c1d111c5f3f 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -38,11 +38,13 @@ #define SDW_SHIM_SYNC_CMDSYNC BIT(16) #define SDW_SHIM_SYNC_SYNCGO BIT(24) +/* Control stream capabililities and channel mask */ #define SDW_SHIM_CTLSCAP(x) (0x010 + 0x60 * (x)) #define SDW_SHIM_CTLS0CM(x) (0x012 + 0x60 * (x)) #define SDW_SHIM_CTLS1CM(x) (0x014 + 0x60 * (x)) #define SDW_SHIM_CTLS2CM(x) (0x016 + 0x60 * (x)) #define SDW_SHIM_CTLS3CM(x) (0x018 + 0x60 * (x)) + #define SDW_SHIM_PCMSCAP(x) (0x020 + 0x60 * (x)) #define SDW_SHIM_PCMSYCHM(x, y) (0x022 + (0x60 * (x)) + (0x2 * (y))) -- cgit From 40f7a3ddf4e4eff8868d0b0e4ed072f41ab6a528 Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Tue, 23 Aug 2022 13:38:42 +0800 Subject: soundwire: intel: cleanup PCM stream capabilities Regroup offset and bitfields Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Link: https://lore.kernel.org/r/20220823053846.2684635-8-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw_intel.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index 7c1d111c5f3f..958628e936ea 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -45,8 +45,13 @@ #define SDW_SHIM_CTLS2CM(x) (0x016 + 0x60 * (x)) #define SDW_SHIM_CTLS3CM(x) (0x018 + 0x60 * (x)) +/* PCM Stream capabilities */ #define SDW_SHIM_PCMSCAP(x) (0x020 + 0x60 * (x)) +#define SDW_SHIM_PCMSCAP_ISS GENMASK(3, 0) +#define SDW_SHIM_PCMSCAP_OSS GENMASK(7, 4) +#define SDW_SHIM_PCMSCAP_BSS GENMASK(12, 8) + #define SDW_SHIM_PCMSYCHM(x, y) (0x022 + (0x60 * (x)) + (0x2 * (y))) #define SDW_SHIM_PCMSYCHC(x, y) (0x042 + (0x60 * (x)) + (0x2 * (y))) #define SDW_SHIM_IOCTL(x) (0x06C + 0x60 * (x)) @@ -55,10 +60,6 @@ #define SDW_SHIM_WAKEEN 0x190 #define SDW_SHIM_WAKESTS 0x192 -#define SDW_SHIM_PCMSCAP_ISS GENMASK(3, 0) -#define SDW_SHIM_PCMSCAP_OSS GENMASK(7, 4) -#define SDW_SHIM_PCMSCAP_BSS GENMASK(12, 8) - #define SDW_SHIM_PCMSYCM_LCHN GENMASK(3, 0) #define SDW_SHIM_PCMSYCM_HCHN GENMASK(7, 4) #define SDW_SHIM_PCMSYCM_STREAM GENMASK(13, 8) -- cgit From 5c0d256201ec0e47e42ca8cafd3bee54f1c923f0 Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Tue, 23 Aug 2022 13:38:43 +0800 Subject: soundwire: intel: cleanup PCM Stream channel map and channel count Regroup offset and bitfield definitions. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Link: https://lore.kernel.org/r/20220823053846.2684635-9-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw_intel.h | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index 958628e936ea..a1810701642e 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -52,19 +52,23 @@ #define SDW_SHIM_PCMSCAP_OSS GENMASK(7, 4) #define SDW_SHIM_PCMSCAP_BSS GENMASK(12, 8) +/* PCM Stream Channel Map */ #define SDW_SHIM_PCMSYCHM(x, y) (0x022 + (0x60 * (x)) + (0x2 * (y))) -#define SDW_SHIM_PCMSYCHC(x, y) (0x042 + (0x60 * (x)) + (0x2 * (y))) -#define SDW_SHIM_IOCTL(x) (0x06C + 0x60 * (x)) -#define SDW_SHIM_CTMCTL(x) (0x06E + 0x60 * (x)) -#define SDW_SHIM_WAKEEN 0x190 -#define SDW_SHIM_WAKESTS 0x192 +/* PCM Stream Channel Count */ +#define SDW_SHIM_PCMSYCHC(x, y) (0x042 + (0x60 * (x)) + (0x2 * (y))) #define SDW_SHIM_PCMSYCM_LCHN GENMASK(3, 0) #define SDW_SHIM_PCMSYCM_HCHN GENMASK(7, 4) #define SDW_SHIM_PCMSYCM_STREAM GENMASK(13, 8) #define SDW_SHIM_PCMSYCM_DIR BIT(15) +#define SDW_SHIM_IOCTL(x) (0x06C + 0x60 * (x)) +#define SDW_SHIM_CTMCTL(x) (0x06E + 0x60 * (x)) + +#define SDW_SHIM_WAKEEN 0x190 +#define SDW_SHIM_WAKESTS 0x192 + #define SDW_SHIM_IOCTL_MIF BIT(0) #define SDW_SHIM_IOCTL_CO BIT(1) #define SDW_SHIM_IOCTL_COE BIT(2) -- cgit From 3ea29d33651db069080d1e3b12aea5ef36d766d1 Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Tue, 23 Aug 2022 13:38:44 +0800 Subject: soundwire: intel: cleanup IO control Regroup offset and bitfield definitions. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Link: https://lore.kernel.org/r/20220823053846.2684635-10-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw_intel.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index a1810701642e..a2e1927ac8ac 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -63,11 +63,8 @@ #define SDW_SHIM_PCMSYCM_STREAM GENMASK(13, 8) #define SDW_SHIM_PCMSYCM_DIR BIT(15) +/* IO control */ #define SDW_SHIM_IOCTL(x) (0x06C + 0x60 * (x)) -#define SDW_SHIM_CTMCTL(x) (0x06E + 0x60 * (x)) - -#define SDW_SHIM_WAKEEN 0x190 -#define SDW_SHIM_WAKESTS 0x192 #define SDW_SHIM_IOCTL_MIF BIT(0) #define SDW_SHIM_IOCTL_CO BIT(1) @@ -79,6 +76,11 @@ #define SDW_SHIM_IOCTL_CIBD BIT(8) #define SDW_SHIM_IOCTL_DIBD BIT(9) +#define SDW_SHIM_CTMCTL(x) (0x06E + 0x60 * (x)) + +#define SDW_SHIM_WAKEEN 0x190 +#define SDW_SHIM_WAKESTS 0x192 + #define SDW_SHIM_CTMCTL_DACTQE BIT(0) #define SDW_SHIM_CTMCTL_DODS BIT(1) #define SDW_SHIM_CTMCTL_DOAIS GENMASK(4, 3) -- cgit From bc7b9595392407549158651bc760c9bda2ab8b5d Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Tue, 23 Aug 2022 13:38:45 +0800 Subject: soundwire: intel: cleanup AC Timing Control Regroup offset and bitfield definitions Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Link: https://lore.kernel.org/r/20220823053846.2684635-11-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw_intel.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index a2e1927ac8ac..3a56fd5a6331 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -76,11 +76,12 @@ #define SDW_SHIM_IOCTL_CIBD BIT(8) #define SDW_SHIM_IOCTL_DIBD BIT(9) -#define SDW_SHIM_CTMCTL(x) (0x06E + 0x60 * (x)) - #define SDW_SHIM_WAKEEN 0x190 #define SDW_SHIM_WAKESTS 0x192 +/* AC Timing control */ +#define SDW_SHIM_CTMCTL(x) (0x06E + 0x60 * (x)) + #define SDW_SHIM_CTMCTL_DACTQE BIT(0) #define SDW_SHIM_CTMCTL_DODS BIT(1) #define SDW_SHIM_CTMCTL_DOAIS GENMASK(4, 3) -- cgit From 279e46bc298629b26436c90bd4b5d104cc1e0fb2 Mon Sep 17 00:00:00 2001 From: Pierre-Louis Bossart Date: Tue, 23 Aug 2022 13:38:46 +0800 Subject: soundwire: intel: cleanup WakeEnable and WakeStatus Regroup offset and bitfield definitions. Signed-off-by: Pierre-Louis Bossart Reviewed-by: Rander Wang Signed-off-by: Bard Liao Link: https://lore.kernel.org/r/20220823053846.2684635-12-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/soundwire/sdw_intel.h | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/soundwire/sdw_intel.h b/include/linux/soundwire/sdw_intel.h index 3a56fd5a6331..2e9fd91572d4 100644 --- a/include/linux/soundwire/sdw_intel.h +++ b/include/linux/soundwire/sdw_intel.h @@ -76,9 +76,16 @@ #define SDW_SHIM_IOCTL_CIBD BIT(8) #define SDW_SHIM_IOCTL_DIBD BIT(9) +/* Wake Enable*/ #define SDW_SHIM_WAKEEN 0x190 + +#define SDW_SHIM_WAKEEN_ENABLE BIT(0) + +/* Wake Status */ #define SDW_SHIM_WAKESTS 0x192 +#define SDW_SHIM_WAKESTS_STATUS BIT(0) + /* AC Timing control */ #define SDW_SHIM_CTMCTL(x) (0x06E + 0x60 * (x)) @@ -86,9 +93,6 @@ #define SDW_SHIM_CTMCTL_DODS BIT(1) #define SDW_SHIM_CTMCTL_DOAIS GENMASK(4, 3) -#define SDW_SHIM_WAKEEN_ENABLE BIT(0) -#define SDW_SHIM_WAKESTS_STATUS BIT(0) - /* Intel ALH Register definitions */ #define SDW_ALH_STRMZCFG(x) (0x000 + (0x4 * (x))) #define SDW_ALH_NUM_STREAMS 64 -- cgit From 8dfa9d554061873f96335730fb1d403698b2b1b4 Mon Sep 17 00:00:00 2001 From: Hyeonggon Yoo <42.hyeyoo@gmail.com> Date: Wed, 17 Aug 2022 19:18:25 +0900 Subject: mm/slab_common: move declaration of __ksize() to mm/slab.h __ksize() is only called by KASAN. Remove export symbol and move declaration to mm/slab.h as we don't want to grow its callers. Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Reviewed-by: Vlastimil Babka Signed-off-by: Vlastimil Babka --- include/linux/slab.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index c8e485ce8815..9b592e611cb1 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -187,7 +187,6 @@ int kmem_cache_shrink(struct kmem_cache *s); void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __alloc_size(2); void kfree(const void *objp); void kfree_sensitive(const void *objp); -size_t __ksize(const void *objp); size_t ksize(const void *objp); #ifdef CONFIG_PRINTK bool kmem_valid_obj(void *object); -- cgit From ac56a0b48da86fd1b4389632fb7c4c8a5d86eefa Mon Sep 17 00:00:00 2001 From: David Howells Date: Fri, 26 Aug 2022 15:39:28 +0100 Subject: rxrpc: Fix ICMP/ICMP6 error handling Because rxrpc pretends to be a tunnel on top of a UDP/UDP6 socket, allowing it to siphon off UDP packets early in the handling of received UDP packets thereby avoiding the packet going through the UDP receive queue, it doesn't get ICMP packets through the UDP ->sk_error_report() callback. In fact, it doesn't appear that there's any usable option for getting hold of ICMP packets. Fix this by adding a new UDP encap hook to distribute error messages for UDP tunnels. If the hook is set, then the tunnel driver will be able to see ICMP packets. The hook provides the offset into the packet of the UDP header of the original packet that caused the notification. An alternative would be to call the ->error_handler() hook - but that requires that the skbuff be cloned (as ip_icmp_error() or ipv6_cmp_error() do, though isn't really necessary or desirable in rxrpc's case is we want to parse them there and then, not queue them). Changes ======= ver #3) - Fixed an uninitialised variable. ver #2) - Fixed some missing CONFIG_AF_RXRPC_IPV6 conditionals. Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook") Signed-off-by: David Howells --- include/linux/udp.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/udp.h b/include/linux/udp.h index 254a2654400f..e96da4157d04 100644 --- a/include/linux/udp.h +++ b/include/linux/udp.h @@ -70,6 +70,7 @@ struct udp_sock { * For encapsulation sockets. */ int (*encap_rcv)(struct sock *sk, struct sk_buff *skb); + void (*encap_err_rcv)(struct sock *sk, struct sk_buff *skb, unsigned int udp_offset); int (*encap_err_lookup)(struct sock *sk, struct sk_buff *skb); void (*encap_destroy)(struct sock *sk); -- cgit From 4e55e22d3d9aa50ef1ba059bf3a53aa61109c179 Mon Sep 17 00:00:00 2001 From: Heikki Krogerus Date: Wed, 31 Aug 2022 15:50:52 +0300 Subject: USB: hcd-pci: Drop the unused id parameter from usb_hcd_pci_probe() Since the HC driver is now passed to the function with its own parameter (it used to be picked from id->driver_data), the id parameter serves no purpose. Signed-off-by: Heikki Krogerus Link: https://lore.kernel.org/r/20220831125052.71584-1-heikki.krogerus@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/usb/hcd.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h index 67f8713d3fa3..78cd566ee238 100644 --- a/include/linux/usb/hcd.h +++ b/include/linux/usb/hcd.h @@ -479,7 +479,6 @@ static inline int ehset_single_step_set_feature(struct usb_hcd *hcd, int port) struct pci_dev; struct pci_device_id; extern int usb_hcd_pci_probe(struct pci_dev *dev, - const struct pci_device_id *id, const struct hc_driver *driver); extern void usb_hcd_pci_remove(struct pci_dev *dev); extern void usb_hcd_pci_shutdown(struct pci_dev *dev); -- cgit From a97126265dfe10d3321c0fde4708a6cea49b19ed Mon Sep 17 00:00:00 2001 From: Henning Schild Date: Thu, 25 Aug 2022 12:44:20 +0200 Subject: leds: simatic-ipc-leds-gpio: add new model 227G This adds support of the Siemens Simatic IPC227G. Its LEDs are connected to GPIO pins provided by the gpio-f7188x module. We make sure that gets loaded, if not enabled in the kernel config no LED support will be available. Reviewed-by: Andy Shevchenko Reviewed-by: Hans de Goede Signed-off-by: Henning Schild Link: https://lore.kernel.org/r/20220825104422.14156-6-henning.schild@siemens.com Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/simatic-ipc-base.h | 1 + include/linux/platform_data/x86/simatic-ipc.h | 1 + 2 files changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/simatic-ipc-base.h b/include/linux/platform_data/x86/simatic-ipc-base.h index 39fefd48cf4d..57d6a10dfc9e 100644 --- a/include/linux/platform_data/x86/simatic-ipc-base.h +++ b/include/linux/platform_data/x86/simatic-ipc-base.h @@ -19,6 +19,7 @@ #define SIMATIC_IPC_DEVICE_427E 2 #define SIMATIC_IPC_DEVICE_127E 3 #define SIMATIC_IPC_DEVICE_227E 4 +#define SIMATIC_IPC_DEVICE_227G 5 struct simatic_ipc_platform { u8 devmode; diff --git a/include/linux/platform_data/x86/simatic-ipc.h b/include/linux/platform_data/x86/simatic-ipc.h index f3b76b39776b..7a2e79f3be0b 100644 --- a/include/linux/platform_data/x86/simatic-ipc.h +++ b/include/linux/platform_data/x86/simatic-ipc.h @@ -31,6 +31,7 @@ enum simatic_ipc_station_ids { SIMATIC_IPC_IPC427E = 0x00000A01, SIMATIC_IPC_IPC477E = 0x00000A02, SIMATIC_IPC_IPC127E = 0x00000D01, + SIMATIC_IPC_IPC227G = 0x00000F01, }; static inline u32 simatic_ipc_get_station_id(u8 *data, int max_len) -- cgit From 8f5c9858c5db129359b5de2f60f5f034bf5d56c0 Mon Sep 17 00:00:00 2001 From: Henning Schild Date: Thu, 25 Aug 2022 12:44:22 +0200 Subject: platform/x86: simatic-ipc: add new model 427G This adds support for the Siemens Simatic IPC427G. A board which basically works like the 227G added in a previous patch. So all there is to do is to add the station_id and make it take all the 227G branches. Reviewed-by: Andy Shevchenko Signed-off-by: Henning Schild Link: https://lore.kernel.org/r/20220825104422.14156-8-henning.schild@siemens.com Reviewed-by: Hans de Goede Signed-off-by: Hans de Goede --- include/linux/platform_data/x86/simatic-ipc.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/simatic-ipc.h b/include/linux/platform_data/x86/simatic-ipc.h index 7a2e79f3be0b..632320ec8f08 100644 --- a/include/linux/platform_data/x86/simatic-ipc.h +++ b/include/linux/platform_data/x86/simatic-ipc.h @@ -32,6 +32,7 @@ enum simatic_ipc_station_ids { SIMATIC_IPC_IPC477E = 0x00000A02, SIMATIC_IPC_IPC127E = 0x00000D01, SIMATIC_IPC_IPC227G = 0x00000F01, + SIMATIC_IPC_IPC427G = 0x00001001, }; static inline u32 simatic_ipc_get_station_id(u8 *data, int max_len) -- cgit From 0a64ce6e5442bbd96cbe9057d9ba1edab244f25b Mon Sep 17 00:00:00 2001 From: Daniel Vetter Date: Tue, 30 Aug 2022 16:50:04 +0200 Subject: kernel/panic: Drop unblank_screen call MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit console_unblank() does this too (called in both places right after), and with a lot more confidence inspiring approach to locking. Reconstructing this story is very strange: In b61312d353da ("oops handling: ensure that any oops is flushed to the mtdoops console") it is claimed that a printk(" "); flushed out the console buffer, which was removed in e3e8a75d2acf ("[PATCH] Extract and use wake_up_klogd()"). In todays kernels this is done way earlier in console_flush_on_panic with some really nasty tricks. I didn't bother to fully reconstruct this all, least because the call to bust_spinlock(0); gets moved every few years, depending upon how the wind blows (or well, who screamed loudest about the various issue each call site caused). Before that commit the only calls to console_unblank() where in s390 arch code. The other side here is the console->unblank callback, which was introduced in 2.1.31 for the vt driver. Which predates the console_unblank() function by a lot, which was added (without users) in 2.4.14.3. So pretty much impossible to guess at any motivation here. Also afaict the vt driver is the only (and always was the only) console driver implementing the unblank callback, so no idea why a call to console_unblank() was added for the mtdooops driver - the action actually flushing out the console buffers is done from console_unlock() only. Note that as prep for the s390 users the locking was adjusted in 2.5.22 (I couldn't figure out how to properly reference the BK commit from the historical git trees) from a normal semaphore to a trylock. Note that a copy of the direct unblank_screen() call was added to panic() in c7c3f05e341a ("panic: avoid deadlocks in re-entrant console drivers"), which partially inlined the bust_spinlocks(0); call. Long story short, I have no idea why the direct call to unblank_screen survived for so long (the infrastructure to do it properly existed for years), nor why it wasn't removed when the console_unblank() call was finally added. But it makes a ton more sense to finally do that than not - it's just better encapsulation to go through the console functions instead of doing a direct call, so let's dare. Plus it really does not make much sense to call the only unblank implementation there is twice, once without, and once with appropriate locking. Cc: Greg Kroah-Hartman Cc: Jiri Slaby Cc: Daniel Vetter Cc: "Ilpo Järvinen" Cc: Tetsuo Handa Cc: Xuezhi Zhang Cc: Yangxi Xiang Cc: nick black Cc: Petr Mladek Cc: Andrew Morton Cc: Luis Chamberlain Cc: "Guilherme G. Piccoli" Cc: Marco Elver Cc: John Ogness Cc: Sebastian Andrzej Siewior Cc: David Gow Cc: tangmeng Cc: Tiezhu Yang Cc: Chris Wilson Reviewed-by: Petr Mladek Acked-by: Sebastian Andrzej Siewior Signed-off-by: Daniel Vetter Signed-off-by: Andrew Morton Link: https://lore.kernel.org/r/20220830145004.430545-1-daniel.vetter@ffwll.ch Signed-off-by: Greg Kroah-Hartman --- include/linux/vt_kern.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/vt_kern.h b/include/linux/vt_kern.h index b5ab452fca5b..c1f5aebef170 100644 --- a/include/linux/vt_kern.h +++ b/include/linux/vt_kern.h @@ -30,7 +30,6 @@ struct vc_data *vc_deallocate(unsigned int console); void reset_palette(struct vc_data *vc); void do_blank_screen(int entering_gfx); void do_unblank_screen(int leaving_gfx); -void unblank_screen(void); void poke_blanked_console(void); int con_font_op(struct vc_data *vc, struct console_font_op *op); int con_set_cmap(unsigned char __user *cmap); -- cgit From 3954cf4338becb1f140bb6fa4f5e9a42f2529b86 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Mon, 22 Aug 2022 08:14:24 +0200 Subject: devres: remove devm_ioremap_np devm_ioremap_np has never been used anywhere since it was added in early 2021, so remove it. Signed-off-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220822061424.151819-1-hch@lst.de Signed-off-by: Greg Kroah-Hartman --- include/linux/io.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/io.h b/include/linux/io.h index 5fc800390fe4..308f4f0cfb93 100644 --- a/include/linux/io.h +++ b/include/linux/io.h @@ -59,8 +59,6 @@ void __iomem *devm_ioremap_uc(struct device *dev, resource_size_t offset, resource_size_t size); void __iomem *devm_ioremap_wc(struct device *dev, resource_size_t offset, resource_size_t size); -void __iomem *devm_ioremap_np(struct device *dev, resource_size_t offset, - resource_size_t size); void devm_iounmap(struct device *dev, void __iomem *addr); int check_signature(const volatile void __iomem *io_addr, const unsigned char *signature, int length); -- cgit From c25491747b21536bd56dccb82a109754bbc8d52c Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Sat, 27 Aug 2022 19:04:37 -1000 Subject: kernfs: Add KERNFS_REMOVING flags KERNFS_ACTIVATED tracks whether a given node has ever been activated. As a node was only deactivated on removal, this was used for 1. Drain optimization (removed by the previous patch). 2. To hide !activated nodes 3. To avoid double activations 4. Reject adding children to a node being removed 5. Skip activaing a node which is being removed. We want to decouple deactivation from removal so that nodes can be deactivated and hidden dynamically, which makes KERNFS_ACTIVATED useless for all of the above purposes. #1 is already gone. #2 and #3 can instead test whether the node is currently active. A new flag KERNFS_REMOVING is added to explicitly mark nodes which are being removed for #4 and #5. While this leaves KERNFS_ACTIVATED with no users, leave it be as it will be used in a following patch. Cc: Chengming Zhou Tested-by: Chengming Zhou Reviewed-by: Chengming Zhou Signed-off-by: Tejun Heo Link: https://lore.kernel.org/r/20220828050440.734579-7-tj@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/kernfs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 367044d7708c..b77d257c1f7e 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -112,6 +112,7 @@ enum kernfs_node_flag { KERNFS_SUICIDED = 0x0800, KERNFS_EMPTY_DIR = 0x1000, KERNFS_HAS_RELEASE = 0x2000, + KERNFS_REMOVING = 0x4000, }; /* @flags for kernfs_create_root() */ -- cgit From 783bd07d095b722108725af11113795ee046ed0e Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Sat, 27 Aug 2022 19:04:39 -1000 Subject: kernfs: Implement kernfs_show() Currently, kernfs nodes can be created hidden and activated later by calling kernfs_activate() to allow creation of multiple nodes to succeed or fail as a unit. This is an one-way one-time-only transition. This patch introduces kernfs_show() which can toggle visibility dynamically. As the currently proposed use - toggling the cgroup pressure files - only requires operating on leaf nodes, for the sake of simplicity, restrict it as such for now. Hiding uses the same mechanism as deactivation and likewise guarantees that there are no in-flight operations on completion. KERNFS_ACTIVATED and KERNFS_HIDDEN are used to manage the interactions between activations and show/hide operations. A node is visible iff both activated & !hidden. Cc: Chengming Zhou Cc: Johannes Weiner Tested-by: Chengming Zhou Reviewed-by: Chengming Zhou Signed-off-by: Tejun Heo Link: https://lore.kernel.org/r/20220828050440.734579-9-tj@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/kernfs.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index b77d257c1f7e..73f5c120def8 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -108,6 +108,7 @@ enum kernfs_node_flag { KERNFS_HAS_SEQ_SHOW = 0x0040, KERNFS_HAS_MMAP = 0x0080, KERNFS_LOCKDEP = 0x0100, + KERNFS_HIDDEN = 0x0200, KERNFS_SUICIDAL = 0x0400, KERNFS_SUICIDED = 0x0800, KERNFS_EMPTY_DIR = 0x1000, @@ -430,6 +431,7 @@ struct kernfs_node *kernfs_create_link(struct kernfs_node *parent, const char *name, struct kernfs_node *target); void kernfs_activate(struct kernfs_node *kn); +void kernfs_show(struct kernfs_node *kn, bool show); void kernfs_remove(struct kernfs_node *kn); void kernfs_break_active_protection(struct kernfs_node *kn); void kernfs_unbreak_active_protection(struct kernfs_node *kn); -- cgit From e2691f6b44ed2135bfd005ad5fbabac4f433a7a1 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Sat, 27 Aug 2022 19:04:40 -1000 Subject: cgroup: Implement cgroup_file_show() Add cgroup_file_show() which allows toggling visibility of a cgroup file using the new kernfs_show(). This will be used to hide psi interface files on cgroups where it's disabled. Cc: Chengming Zhou Cc: Johannes Weiner Tested-by: Chengming Zhou Reviewed-by: Chengming Zhou Signed-off-by: Tejun Heo Link: https://lore.kernel.org/r/20220828050440.734579-10-tj@kernel.org Signed-off-by: Greg Kroah-Hartman --- include/linux/cgroup.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index ed53bfe7c46c..f61dd135efbe 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -114,6 +114,7 @@ int cgroup_add_dfl_cftypes(struct cgroup_subsys *ss, struct cftype *cfts); int cgroup_add_legacy_cftypes(struct cgroup_subsys *ss, struct cftype *cfts); int cgroup_rm_cftypes(struct cftype *cfts); void cgroup_file_notify(struct cgroup_file *cfile); +void cgroup_file_show(struct cgroup_file *cfile, bool show); int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen); int cgroupstats_build(struct cgroupstats *stats, struct dentry *dentry); -- cgit From 16ede66973c84f890c03584f79158dd5b2d725f5 Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Thu, 25 Aug 2022 07:53:12 -0700 Subject: sbitmap: fix batched wait_cnt accounting Batched completions can clear multiple bits, but we're only decrementing the wait_cnt by one each time. This can cause waiters to never be woken, stalling IO. Use the batched count instead. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679 Signed-off-by: Keith Busch Link: https://lore.kernel.org/r/20220825145312.1217900-1-kbusch@fb.com Signed-off-by: Jens Axboe --- include/linux/sbitmap.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h index 8f5a86e210b9..4d2d5205ab58 100644 --- a/include/linux/sbitmap.h +++ b/include/linux/sbitmap.h @@ -575,8 +575,9 @@ void sbitmap_queue_wake_all(struct sbitmap_queue *sbq); * sbitmap_queue_wake_up() - Wake up some of waiters in one waitqueue * on a &struct sbitmap_queue. * @sbq: Bitmap queue to wake up. + * @nr: Number of bits cleared. */ -void sbitmap_queue_wake_up(struct sbitmap_queue *sbq); +void sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr); /** * sbitmap_queue_show() - Dump &struct sbitmap_queue information to a &struct -- cgit From e34a0425b8ef524355811e7408dc1d53d08dc538 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Fri, 26 Aug 2022 16:34:01 -0300 Subject: vfio/pci: Split linux/vfio_pci_core.h The header in include/linux should have only the exported interface for other vfio_pci modules to use. Internal definitions for vfio_pci.ko should be in a "priv" header along side the .c files. Move the internal declarations out of vfio_pci_core.h. They either move to vfio_pci_priv.h or to the C file that is the only user. Reviewed-by: Kevin Tian Signed-off-by: Jason Gunthorpe Reviewed-by: Cornelia Huck Link: https://lore.kernel.org/r/1-v2-1bd95d72f298+e0e-vfio_pci_priv_jgg@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio_pci_core.h | 134 +----------------------------------------- 1 file changed, 1 insertion(+), 133 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index 5579ece4347b..9d18b832e61a 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -20,39 +20,10 @@ #define VFIO_PCI_CORE_H #define VFIO_PCI_OFFSET_SHIFT 40 - #define VFIO_PCI_OFFSET_TO_INDEX(off) (off >> VFIO_PCI_OFFSET_SHIFT) #define VFIO_PCI_INDEX_TO_OFFSET(index) ((u64)(index) << VFIO_PCI_OFFSET_SHIFT) #define VFIO_PCI_OFFSET_MASK (((u64)(1) << VFIO_PCI_OFFSET_SHIFT) - 1) -/* Special capability IDs predefined access */ -#define PCI_CAP_ID_INVALID 0xFF /* default raw access */ -#define PCI_CAP_ID_INVALID_VIRT 0xFE /* default virt access */ - -/* Cap maximum number of ioeventfds per device (arbitrary) */ -#define VFIO_PCI_IOEVENTFD_MAX 1000 - -struct vfio_pci_ioeventfd { - struct list_head next; - struct vfio_pci_core_device *vdev; - struct virqfd *virqfd; - void __iomem *addr; - uint64_t data; - loff_t pos; - int bar; - int count; - bool test_mem; -}; - -struct vfio_pci_irq_ctx { - struct eventfd_ctx *trigger; - struct virqfd *unmask; - struct virqfd *mask; - char *name; - bool masked; - struct irq_bypass_producer producer; -}; - struct vfio_pci_core_device; struct vfio_pci_region; @@ -78,23 +49,6 @@ struct vfio_pci_region { u32 flags; }; -struct vfio_pci_dummy_resource { - struct resource resource; - int index; - struct list_head res_next; -}; - -struct vfio_pci_vf_token { - struct mutex lock; - uuid_t uuid; - int users; -}; - -struct vfio_pci_mmap_vma { - struct vm_area_struct *vma; - struct list_head vma_next; -}; - struct vfio_pci_core_device { struct vfio_device vdev; struct pci_dev *pdev; @@ -141,92 +95,11 @@ struct vfio_pci_core_device { struct rw_semaphore memory_lock; }; -#define is_intx(vdev) (vdev->irq_type == VFIO_PCI_INTX_IRQ_INDEX) -#define is_msi(vdev) (vdev->irq_type == VFIO_PCI_MSI_IRQ_INDEX) -#define is_msix(vdev) (vdev->irq_type == VFIO_PCI_MSIX_IRQ_INDEX) -#define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev))) -#define irq_is(vdev, type) (vdev->irq_type == type) - -void vfio_pci_intx_mask(struct vfio_pci_core_device *vdev); -void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev); - -int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, - uint32_t flags, unsigned index, - unsigned start, unsigned count, void *data); - -ssize_t vfio_pci_config_rw(struct vfio_pci_core_device *vdev, - char __user *buf, size_t count, - loff_t *ppos, bool iswrite); - -ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf, - size_t count, loff_t *ppos, bool iswrite); - -#ifdef CONFIG_VFIO_PCI_VGA -ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, char __user *buf, - size_t count, loff_t *ppos, bool iswrite); -#else -static inline ssize_t vfio_pci_vga_rw(struct vfio_pci_core_device *vdev, - char __user *buf, size_t count, - loff_t *ppos, bool iswrite) -{ - return -EINVAL; -} -#endif - -long vfio_pci_ioeventfd(struct vfio_pci_core_device *vdev, loff_t offset, - uint64_t data, int count, int fd); - -int vfio_pci_init_perm_bits(void); -void vfio_pci_uninit_perm_bits(void); - -int vfio_config_init(struct vfio_pci_core_device *vdev); -void vfio_config_free(struct vfio_pci_core_device *vdev); - +/* Will be exported for vfio pci drivers usage */ int vfio_pci_register_dev_region(struct vfio_pci_core_device *vdev, unsigned int type, unsigned int subtype, const struct vfio_pci_regops *ops, size_t size, u32 flags, void *data); - -int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, - pci_power_t state); - -bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev); -void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device *vdev); -u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_core_device *vdev); -void vfio_pci_memory_unlock_and_restore(struct vfio_pci_core_device *vdev, - u16 cmd); - -#ifdef CONFIG_VFIO_PCI_IGD -int vfio_pci_igd_init(struct vfio_pci_core_device *vdev); -#else -static inline int vfio_pci_igd_init(struct vfio_pci_core_device *vdev) -{ - return -ENODEV; -} -#endif - -#ifdef CONFIG_VFIO_PCI_ZDEV_KVM -int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev, - struct vfio_info_cap *caps); -int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev); -void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev); -#else -static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_core_device *vdev, - struct vfio_info_cap *caps) -{ - return -ENODEV; -} - -static inline int vfio_pci_zdev_open_device(struct vfio_pci_core_device *vdev) -{ - return 0; -} - -static inline void vfio_pci_zdev_close_device(struct vfio_pci_core_device *vdev) -{} -#endif - -/* Will be exported for vfio pci drivers usage */ void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga, bool is_disable_idle_d3); void vfio_pci_core_close_device(struct vfio_device *core_vdev); @@ -256,9 +129,4 @@ void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev); pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev, pci_channel_state_t state); -static inline bool vfio_pci_is_vga(struct pci_dev *pdev) -{ - return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA; -} - #endif /* VFIO_PCI_CORE_H */ -- cgit From 1e979ef5df8b7b604a625343a179b812a7984068 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Fri, 26 Aug 2022 16:34:02 -0300 Subject: vfio/pci: Rename vfio_pci_register_dev_region() As this is part of the vfio_pci_core component it should be called vfio_pci_core_register_dev_region() like everything else exported from this module. Suggested-by: Kevin Tian Signed-off-by: Jason Gunthorpe Reviewed-by: Kevin Tian Reviewed-by: Cornelia Huck Link: https://lore.kernel.org/r/2-v2-1bd95d72f298+e0e-vfio_pci_priv_jgg@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio_pci_core.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index 9d18b832e61a..e5cf0d3313a6 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -96,10 +96,10 @@ struct vfio_pci_core_device { }; /* Will be exported for vfio pci drivers usage */ -int vfio_pci_register_dev_region(struct vfio_pci_core_device *vdev, - unsigned int type, unsigned int subtype, - const struct vfio_pci_regops *ops, - size_t size, u32 flags, void *data); +int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev, + unsigned int type, unsigned int subtype, + const struct vfio_pci_regops *ops, + size_t size, u32 flags, void *data); void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga, bool is_disable_idle_d3); void vfio_pci_core_close_device(struct vfio_device *core_vdev); -- cgit From 4813724c4b76b62f8e6b60dd5655633d0db1c9a8 Mon Sep 17 00:00:00 2001 From: Abhishek Sahu Date: Mon, 29 Aug 2022 17:18:48 +0530 Subject: vfio/pci: Mask INTx during runtime suspend This patch adds INTx handling during runtime suspend/resume. All the suspend/resume related code for the user to put the device into the low power state will be added in subsequent patches. The INTx lines may be shared among devices. Whenever any INTx interrupt comes for the VFIO devices, then vfio_intx_handler() will be called for each device sharing the interrupt. Inside vfio_intx_handler(), it calls pci_check_and_mask_intx() and checks if the interrupt has been generated for the current device. Now, if the device is already in the D3cold state, then the config space can not be read. Attempt to read config space in D3cold state can cause system unresponsiveness in a few systems. To prevent this, mask INTx in runtime suspend callback, and unmask the same in runtime resume callback. If INTx has been already masked, then no handling is needed in runtime suspend/resume callbacks. 'pm_intx_masked' tracks this, and vfio_pci_intx_mask() has been updated to return true if the INTx vfio_pci_irq_ctx.masked value is changed inside this function. For the runtime suspend which is triggered for the no user of VFIO device, the 'irq_type' will be VFIO_PCI_NUM_IRQS and these callbacks won't do anything. The MSI/MSI-X are not shared so similar handling should not be needed for MSI/MSI-X. vfio_msihandler() triggers eventfd_signal() without doing any device-specific config access. When the user performs any config access or IOCTL after receiving the eventfd notification, then the device will be moved to the D0 state first before servicing any request. Another option was to check this flag 'pm_intx_masked' inside vfio_intx_handler() instead of masking the interrupts. This flag is being set inside the runtime_suspend callback but the device can be in non-D3cold state (for example, if the user has disabled D3cold explicitly by sysfs, the D3cold is not supported in the platform, etc.). Also, in D3cold supported case, the device will be in D0 till the PCI core moves the device into D3cold. In this case, there is a possibility that the device can generate an interrupt. Adding check in the IRQ handler will not clear the IRQ status and the interrupt line will still be asserted. This can cause interrupt flooding. Signed-off-by: Abhishek Sahu Link: https://lore.kernel.org/r/20220829114850.4341-4-abhsahu@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio_pci_core.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index e5cf0d3313a6..a0f1f36e42a2 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -78,6 +78,7 @@ struct vfio_pci_core_device { bool needs_reset; bool nointx; bool needs_pm_restore; + bool pm_intx_masked; struct pci_saved_state *pci_saved_state; struct pci_saved_state *pm_save; int ioeventfds_nr; -- cgit From cc2742fe3660cc6500021d3da8f937d326392dbd Mon Sep 17 00:00:00 2001 From: Abhishek Sahu Date: Mon, 29 Aug 2022 17:18:49 +0530 Subject: vfio/pci: Implement VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY/EXIT Currently, if the runtime power management is enabled for vfio-pci based devices in the guest OS, then the guest OS will do the register write for PCI_PM_CTRL register. This write request will be handled in vfio_pm_config_write() where it will do the actual register write of PCI_PM_CTRL register. With this, the maximum D3hot state can be achieved for low power. If we can use the runtime PM framework, then we can achieve the D3cold state (on the supported systems) which will help in saving maximum power. 1. D3cold state can't be achieved by writing PCI standard PM config registers. This patch implements the following newly added low power related device features: - VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY - VFIO_DEVICE_FEATURE_LOW_POWER_EXIT The VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY feature will allow the device to make use of low power platform states on the host while the VFIO_DEVICE_FEATURE_LOW_POWER_EXIT will prevent further use of those power states. 2. The vfio-pci driver uses runtime PM framework for low power entry and exit. On the platforms where D3cold state is supported, the runtime PM framework will put the device into D3cold otherwise, D3hot or some other power state will be used. There are various cases where the device will not go into the runtime suspended state. For example, - The runtime power management is disabled on the host side for the device. - The user keeps the device busy after calling LOW_POWER_ENTRY. - There are dependent devices that are still in runtime active state. For these cases, the device will be in the same power state that has been configured by the user through PCI_PM_CTRL register. 3. The hypervisors can implement virtual ACPI methods. For example, in guest linux OS if PCI device ACPI node has _PR3 and _PR0 power resources with _ON/_OFF method, then guest linux OS invokes the _OFF method during D3cold transition and then _ON during D0 transition. The hypervisor can tap these virtual ACPI calls and then call the low power device feature IOCTL. 4. The 'pm_runtime_engaged' flag tracks the entry and exit to runtime PM. This flag is protected with 'memory_lock' semaphore. 5. All the config and other region access are wrapped under pm_runtime_resume_and_get() and pm_runtime_put(). So, if any device access happens while the device is in the runtime suspended state, then the device will be resumed first before access. Once the access has been finished, then the device will again go into the runtime suspended state. 6. The memory region access through mmap will not be allowed in the low power state. Since __vfio_pci_memory_enabled() is a common function, so check for 'pm_runtime_engaged' has been added explicitly in vfio_pci_mmap_fault() to block only mmap'ed access. Signed-off-by: Abhishek Sahu Link: https://lore.kernel.org/r/20220829114850.4341-5-abhsahu@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio_pci_core.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index a0f1f36e42a2..1025d53fde0b 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -79,6 +79,7 @@ struct vfio_pci_core_device { bool nointx; bool needs_pm_restore; bool pm_intx_masked; + bool pm_runtime_engaged; struct pci_saved_state *pci_saved_state; struct pci_saved_state *pm_save; int ioeventfds_nr; -- cgit From 453e6c98fd2bdae0df9adbc86af8d8bf1164edd5 Mon Sep 17 00:00:00 2001 From: Abhishek Sahu Date: Mon, 29 Aug 2022 17:18:50 +0530 Subject: vfio/pci: Implement VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP This patch implements VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP device feature. In the VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY, if there is any access for the VFIO device on the host side, then the device will be moved out of the low power state without the user's guest driver involvement. Once the device access has been finished, then the host can move the device again into low power state. With the low power entry happened through VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP, the device will not be moved back into the low power state and a notification will be sent to the user by triggering wakeup eventfd. vfio_pci_core_pm_entry() will be called for both the variants of low power feature entry so add an extra argument for wakeup eventfd context and store locally in 'struct vfio_pci_core_device'. For the entry happened without wakeup eventfd, all the exit related handling will be done by the LOW_POWER_EXIT device feature only. When the LOW_POWER_EXIT will be called, then the vfio core layer vfio_device_pm_runtime_get() will increment the usage count and will resume the device. In the driver runtime_resume callback, the 'pm_wake_eventfd_ctx' will be NULL. Then vfio_pci_core_pm_exit() will call vfio_pci_runtime_pm_exit() and all the exit related handling will be done. For the entry happened with wakeup eventfd, in the driver resume callback, eventfd will be triggered and all the exit related handling will be done. When vfio_pci_runtime_pm_exit() will be called by vfio_pci_core_pm_exit(), then it will return early. But if the runtime suspend has not happened on the host side, then all the exit related handling will be done in vfio_pci_core_pm_exit() only. Signed-off-by: Abhishek Sahu Link: https://lore.kernel.org/r/20220829114850.4341-6-abhsahu@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio_pci_core.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index 1025d53fde0b..089b603bcfdc 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -85,6 +85,7 @@ struct vfio_pci_core_device { int ioeventfds_nr; struct eventfd_ctx *err_trigger; struct eventfd_ctx *req_trigger; + struct eventfd_ctx *pm_wake_eventfd_ctx; struct list_head dummy_resources_list; struct mutex ioeventfds_lock; struct list_head ioeventfds_list; -- cgit From c8e477c649b40c1a073b7a843d89e51dc0037db7 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 30 Jan 2022 19:57:52 -0500 Subject: ->getprocattr(): attribute name is const char *, TYVM... cast of ->d_name.name to char * is completely wrong - nothing is allowed to modify its contents. Reviewed-by: Christian Brauner (Microsoft) Acked-by: Paul Moore Acked-by: Casey Schaufler Signed-off-by: Al Viro --- include/linux/lsm_hook_defs.h | 2 +- include/linux/security.h | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h index 806448173033..03360d27bedf 100644 --- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -253,7 +253,7 @@ LSM_HOOK(int, 0, sem_semop, struct kern_ipc_perm *perm, struct sembuf *sops, LSM_HOOK(int, 0, netlink_send, struct sock *sk, struct sk_buff *skb) LSM_HOOK(void, LSM_RET_VOID, d_instantiate, struct dentry *dentry, struct inode *inode) -LSM_HOOK(int, -EINVAL, getprocattr, struct task_struct *p, char *name, +LSM_HOOK(int, -EINVAL, getprocattr, struct task_struct *p, const char *name, char **value) LSM_HOOK(int, -EINVAL, setprocattr, const char *name, void *value, size_t size) LSM_HOOK(int, 0, ismaclabel, const char *name) diff --git a/include/linux/security.h b/include/linux/security.h index 1bc362cb413f..93488c01d9bd 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -461,7 +461,7 @@ int security_sem_semctl(struct kern_ipc_perm *sma, int cmd); int security_sem_semop(struct kern_ipc_perm *sma, struct sembuf *sops, unsigned nsops, int alter); void security_d_instantiate(struct dentry *dentry, struct inode *inode); -int security_getprocattr(struct task_struct *p, const char *lsm, char *name, +int security_getprocattr(struct task_struct *p, const char *lsm, const char *name, char **value); int security_setprocattr(const char *lsm, const char *name, void *value, size_t size); @@ -1301,7 +1301,7 @@ static inline void security_d_instantiate(struct dentry *dentry, { } static inline int security_getprocattr(struct task_struct *p, const char *lsm, - char *name, char **value) + const char *name, char **value) { return -EINVAL; } -- cgit From ea4af4aa03c3966c63231b4191da94497de8f034 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 4 Aug 2022 13:19:18 -0400 Subject: nd_jump_link(): constify path Reviewed-by: Christian Brauner (Microsoft) Signed-off-by: Al Viro --- include/linux/namei.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/namei.h b/include/linux/namei.h index caeb08a98536..00fee52df842 100644 --- a/include/linux/namei.h +++ b/include/linux/namei.h @@ -83,7 +83,7 @@ extern int follow_up(struct path *); extern struct dentry *lock_rename(struct dentry *, struct dentry *); extern void unlock_rename(struct dentry *, struct dentry *); -extern int __must_check nd_jump_link(struct path *path); +extern int __must_check nd_jump_link(const struct path *path); static inline void nd_terminate_link(void *name, size_t len, size_t maxlen) { -- cgit From 977f1aa5e4d1942bc8012bf3f0e695008979ff4e Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 31 Aug 2022 18:44:27 +0000 Subject: net: bql: add more documentation Add some documentation for netdev_tx_sent_queue() and netdev_tx_completed_queue() Stating that netdev_tx_completed_queue() must be called once per TX completion round is apparently not obvious for everybody. Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/netdevice.h | 32 ++++++++++++++++++++++++++------ 1 file changed, 26 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index eec35f9b6616..cfe41c053e11 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3353,6 +3353,16 @@ static inline void netdev_txq_bql_complete_prefetchw(struct netdev_queue *dev_qu #endif } +/** + * netdev_tx_sent_queue - report the number of bytes queued to a given tx queue + * @dev_queue: network device queue + * @bytes: number of bytes queued to the device queue + * + * Report the number of bytes queued for sending/completion to the network + * device hardware queue. @bytes should be a good approximation and should + * exactly match netdev_completed_queue() @bytes. + * This is typically called once per packet, from ndo_start_xmit(). + */ static inline void netdev_tx_sent_queue(struct netdev_queue *dev_queue, unsigned int bytes) { @@ -3398,13 +3408,14 @@ static inline bool __netdev_tx_sent_queue(struct netdev_queue *dev_queue, } /** - * netdev_sent_queue - report the number of bytes queued to hardware - * @dev: network device - * @bytes: number of bytes queued to the hardware device queue + * netdev_sent_queue - report the number of bytes queued to hardware + * @dev: network device + * @bytes: number of bytes queued to the hardware device queue * - * Report the number of bytes queued for sending/completion to the network - * device hardware queue. @bytes should be a good approximation and should - * exactly match netdev_completed_queue() @bytes + * Report the number of bytes queued for sending/completion to the network + * device hardware queue#0. @bytes should be a good approximation and should + * exactly match netdev_completed_queue() @bytes. + * This is typically called once per packet, from ndo_start_xmit(). */ static inline void netdev_sent_queue(struct net_device *dev, unsigned int bytes) { @@ -3419,6 +3430,15 @@ static inline bool __netdev_sent_queue(struct net_device *dev, xmit_more); } +/** + * netdev_tx_completed_queue - report number of packets/bytes at TX completion. + * @dev_queue: network device queue + * @pkts: number of packets (currently ignored) + * @bytes: number of bytes dequeued from the device queue + * + * Must be called at most once per TX completion round (and not per + * individual packet), so that BQL can adjust its limits appropriately. + */ static inline void netdev_tx_completed_queue(struct netdev_queue *dev_queue, unsigned int pkts, unsigned int bytes) { -- cgit From 3261400639463a853ba2b3be8bd009c2a8089775 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Wed, 31 Aug 2022 23:38:09 +0000 Subject: tcp: TX zerocopy should not sense pfmemalloc status We got a recent syzbot report [1] showing a possible misuse of pfmemalloc page status in TCP zerocopy paths. Indeed, for pages coming from user space or other layers, using page_is_pfmemalloc() is moot, and possibly could give false positives. There has been attempts to make page_is_pfmemalloc() more robust, but not using it in the first place in this context is probably better, removing cpu cycles. Note to stable teams : You need to backport 84ce071e38a6 ("net: introduce __skb_fill_page_desc_noacc") as a prereq. Race is more probable after commit c07aea3ef4d4 ("mm: add a signature in struct page") because page_is_pfmemalloc() is now using low order bit from page->lru.next, which can change more often than page->index. Low order bit should never be set for lru.next (when used as an anchor in LRU list), so KCSAN report is mostly a false positive. Backporting to older kernel versions seems not necessary. [1] BUG: KCSAN: data-race in lru_add_fn / tcp_build_frag write to 0xffffea0004a1d2c8 of 8 bytes by task 18600 on cpu 0: __list_add include/linux/list.h:73 [inline] list_add include/linux/list.h:88 [inline] lruvec_add_folio include/linux/mm_inline.h:105 [inline] lru_add_fn+0x440/0x520 mm/swap.c:228 folio_batch_move_lru+0x1e1/0x2a0 mm/swap.c:246 folio_batch_add_and_move mm/swap.c:263 [inline] folio_add_lru+0xf1/0x140 mm/swap.c:490 filemap_add_folio+0xf8/0x150 mm/filemap.c:948 __filemap_get_folio+0x510/0x6d0 mm/filemap.c:1981 pagecache_get_page+0x26/0x190 mm/folio-compat.c:104 grab_cache_page_write_begin+0x2a/0x30 mm/folio-compat.c:116 ext4_da_write_begin+0x2dd/0x5f0 fs/ext4/inode.c:2988 generic_perform_write+0x1d4/0x3f0 mm/filemap.c:3738 ext4_buffered_write_iter+0x235/0x3e0 fs/ext4/file.c:270 ext4_file_write_iter+0x2e3/0x1210 call_write_iter include/linux/fs.h:2187 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x468/0x760 fs/read_write.c:578 ksys_write+0xe8/0x1a0 fs/read_write.c:631 __do_sys_write fs/read_write.c:643 [inline] __se_sys_write fs/read_write.c:640 [inline] __x64_sys_write+0x3e/0x50 fs/read_write.c:640 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffffea0004a1d2c8 of 8 bytes by task 18611 on cpu 1: page_is_pfmemalloc include/linux/mm.h:1740 [inline] __skb_fill_page_desc include/linux/skbuff.h:2422 [inline] skb_fill_page_desc include/linux/skbuff.h:2443 [inline] tcp_build_frag+0x613/0xb20 net/ipv4/tcp.c:1018 do_tcp_sendpages+0x3e8/0xaf0 net/ipv4/tcp.c:1075 tcp_sendpage_locked net/ipv4/tcp.c:1140 [inline] tcp_sendpage+0x89/0xb0 net/ipv4/tcp.c:1150 inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833 kernel_sendpage+0x184/0x300 net/socket.c:3561 sock_sendpage+0x5a/0x70 net/socket.c:1054 pipe_to_sendpage+0x128/0x160 fs/splice.c:361 splice_from_pipe_feed fs/splice.c:415 [inline] __splice_from_pipe+0x222/0x4d0 fs/splice.c:559 splice_from_pipe fs/splice.c:594 [inline] generic_splice_sendpage+0x89/0xc0 fs/splice.c:743 do_splice_from fs/splice.c:764 [inline] direct_splice_actor+0x80/0xa0 fs/splice.c:931 splice_direct_to_actor+0x305/0x620 fs/splice.c:886 do_splice_direct+0xfb/0x180 fs/splice.c:974 do_sendfile+0x3bf/0x910 fs/read_write.c:1249 __do_sys_sendfile64 fs/read_write.c:1317 [inline] __se_sys_sendfile64 fs/read_write.c:1303 [inline] __x64_sys_sendfile64+0x10c/0x150 fs/read_write.c:1303 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x0000000000000000 -> 0xffffea0004a1d288 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 18611 Comm: syz-executor.4 Not tainted 6.0.0-rc2-syzkaller-00248-ge022620b5d05-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022 Fixes: c07aea3ef4d4 ("mm: add a signature in struct page") Reported-by: syzbot Signed-off-by: Eric Dumazet Cc: Shakeel Butt Reviewed-by: Shakeel Butt Signed-off-by: David S. Miller --- include/linux/skbuff.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index ca8afa382bf2..18e163a3460d 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2444,6 +2444,27 @@ static inline void skb_fill_page_desc(struct sk_buff *skb, int i, skb_shinfo(skb)->nr_frags = i + 1; } +/** + * skb_fill_page_desc_noacc - initialise a paged fragment in an skb + * @skb: buffer containing fragment to be initialised + * @i: paged fragment index to initialise + * @page: the page to use for this fragment + * @off: the offset to the data with @page + * @size: the length of the data + * + * Variant of skb_fill_page_desc() which does not deal with + * pfmemalloc, if page is not owned by us. + */ +static inline void skb_fill_page_desc_noacc(struct sk_buff *skb, int i, + struct page *page, int off, + int size) +{ + struct skb_shared_info *shinfo = skb_shinfo(skb); + + __skb_fill_page_desc_noacc(shinfo, i, page, off, size); + shinfo->nr_frags = i + 1; +} + void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off, int size, unsigned int truesize); -- cgit From c3f760ef128789252e7c4f10d3c1721422dceba9 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Wed, 31 Aug 2022 17:00:58 -0700 Subject: net: remove netif_tx_napi_add() All callers are now gone. Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/netdevice.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index cfe41c053e11..f0068c1ff1df 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2569,8 +2569,6 @@ netif_napi_add_tx_weight(struct net_device *dev, netif_napi_add_weight(dev, napi, poll, weight); } -#define netif_tx_napi_add netif_napi_add_tx_weight - /** * netif_napi_add_tx() - initialize a NAPI context to be used for Tx only * @dev: network device -- cgit From b30f7c8eb0780e1479a9882526e838664271f4c9 Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Thu, 1 Sep 2022 13:07:32 +0100 Subject: spi: mux: Fix mux interaction with fast path optimisations The spi-mux driver is rather too clever and attempts to resubmit any message that is submitted to it to the parent controller with some adjusted callbacks. This does not play at all nicely with the fast path which now sets flags on the message indicating that it's being handled through the fast path, we see async messages flagged as being on the fast path. Ideally the spi-mux code would duplicate the message but that's rather invasive and a bit fragile in that it relies on the mux knowing which fields in the message to copy. Instead teach the core that there are controllers which can't cope with the fast path and have the mux flag itself as being such a controller, ensuring that messages going via the mux don't get partially handled via the fast path. This will reduce the performance of any spi-mux connected device since we'll now always use the thread for both the actual controller and the mux controller instead of just the actual controller but given that we were always hitting the slow path anyway it's hopefully not too much of an additional cost and it allows us to keep the fast path. Fixes: ae7d2346dc89 ("spi: Don't use the message queue if possible in spi_sync") Reported-by: Casper Andersson Tested-by: Casper Andersson Signed-off-by: Mark Brown Link: https://lore.kernel.org/r/20220901120732.49245-1-broonie@kernel.org Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index e6c73d5ff1a8..f089ee1ead58 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -469,6 +469,7 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch * SPI_TRANS_FAIL_NO_START. * @queue_empty: signal green light for opportunistically skipping the queue * for spi_sync transfers. + * @must_async: disable all fast paths in the core * * Each SPI controller can communicate with one or more @spi_device * children. These make a small bus, sharing MOSI, MISO and SCK signals @@ -690,6 +691,7 @@ struct spi_controller { /* Flag for enabling opportunistic skipping of the queue in spi_sync */ bool queue_empty; + bool must_async; }; static inline void *spi_controller_get_devdata(struct spi_controller *ctlr) -- cgit From dc84dbbcc97bfb47e0f2b175d816e601b2890c91 Mon Sep 17 00:00:00 2001 From: Shung-Hsi Yu Date: Wed, 31 Aug 2022 11:19:06 +0800 Subject: bpf, tnums: Warn against the usage of tnum_in(tnum_range(), ...) Commit a657182a5c51 ("bpf: Don't use tnum_range on array range checking for poke descriptors") has shown that using tnum_range() as argument to tnum_in() can lead to misleading code that looks like tight bound check when in fact the actual allowed range is much wider. Document such behavior to warn against its usage in general, and suggest some scenario where result can be trusted. Signed-off-by: Shung-Hsi Yu Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/984b37f9fdf7ac36831d2137415a4a915744c1b6.1661462653.git.daniel@iogearbox.net Link: https://www.openwall.com/lists/oss-security/2022/08/26/1 Link: https://lore.kernel.org/bpf/20220831031907.16133-3-shung-hsi.yu@suse.com Link: https://lore.kernel.org/bpf/20220831031907.16133-2-shung-hsi.yu@suse.com --- include/linux/tnum.h | 20 ++++++++++++++++++-- 1 file changed, 18 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tnum.h b/include/linux/tnum.h index 498dbcedb451..1c3948a1d6ad 100644 --- a/include/linux/tnum.h +++ b/include/linux/tnum.h @@ -21,7 +21,12 @@ struct tnum { struct tnum tnum_const(u64 value); /* A completely unknown value */ extern const struct tnum tnum_unknown; -/* A value that's unknown except that @min <= value <= @max */ +/* An unknown value that is a superset of @min <= value <= @max. + * + * Could include values outside the range of [@min, @max]. + * For example tnum_range(0, 2) is represented by {0, 1, 2, *3*}, + * rather than the intended set of {0, 1, 2}. + */ struct tnum tnum_range(u64 min, u64 max); /* Arithmetic and logical ops */ @@ -73,7 +78,18 @@ static inline bool tnum_is_unknown(struct tnum a) */ bool tnum_is_aligned(struct tnum a, u64 size); -/* Returns true if @b represents a subset of @a. */ +/* Returns true if @b represents a subset of @a. + * + * Note that using tnum_range() as @a requires extra cautions as tnum_in() may + * return true unexpectedly due to tnum limited ability to represent tight + * range, e.g. + * + * tnum_in(tnum_range(0, 2), tnum_const(3)) == true + * + * As a rule of thumb, if @a is explicitly coded rather than coming from + * reg->var_off, it should be in form of tnum_const(), tnum_range(0, 2**n - 1), + * or tnum_range(2**n, 2**(n+1) - 1). + */ bool tnum_in(struct tnum a, struct tnum b); /* Formatting functions. These have snprintf-like semantics: they will write -- cgit From 4ff09db1b79b98b4a2a7511571c640b76cab3beb Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Thu, 1 Sep 2022 17:28:02 -0700 Subject: bpf: net: Change sk_getsockopt() to take the sockptr_t argument This patch changes sk_getsockopt() to take the sockptr_t argument such that it can be used by bpf_getsockopt(SOL_SOCKET) in a latter patch. security_socket_getpeersec_stream() is not changed. It stays with the __user ptr (optval.user and optlen.user) to avoid changes to other security hooks. bpf_getsockopt(SOL_SOCKET) also does not support SO_PEERSEC. Signed-off-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220902002802.2888419-1-kafai@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/filter.h | 3 +-- include/linux/sockptr.h | 5 +++++ 2 files changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/filter.h b/include/linux/filter.h index a5f21dc3c432..527ae1d64e27 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -900,8 +900,7 @@ int sk_reuseport_attach_filter(struct sock_fprog *fprog, struct sock *sk); int sk_reuseport_attach_bpf(u32 ufd, struct sock *sk); void sk_reuseport_prog_free(struct bpf_prog *prog); int sk_detach_filter(struct sock *sk); -int sk_get_filter(struct sock *sk, struct sock_filter __user *filter, - unsigned int len); +int sk_get_filter(struct sock *sk, sockptr_t optval, unsigned int len); bool sk_filter_charge(struct sock *sk, struct sk_filter *fp); void sk_filter_uncharge(struct sock *sk, struct sk_filter *fp); diff --git a/include/linux/sockptr.h b/include/linux/sockptr.h index d45902fb4cad..bae5e2369b4f 100644 --- a/include/linux/sockptr.h +++ b/include/linux/sockptr.h @@ -64,6 +64,11 @@ static inline int copy_to_sockptr_offset(sockptr_t dst, size_t offset, return 0; } +static inline int copy_to_sockptr(sockptr_t dst, const void *src, size_t size) +{ + return copy_to_sockptr_offset(dst, 0, src, size); +} + static inline void *memdup_sockptr(sockptr_t src, size_t len) { void *p = kmalloc_track_caller(len, GFP_USER | __GFP_NOWARN); -- cgit From 728f064cd7ebea8c182e99e6f152c8b4a0a6b071 Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Thu, 1 Sep 2022 17:28:28 -0700 Subject: bpf: net: Change do_ip_getsockopt() to take the sockptr_t argument Similar to the earlier patch that changes sk_getsockopt() to take the sockptr_t argument. This patch also changes do_ip_getsockopt() to take the sockptr_t argument such that a latter patch can make bpf_getsockopt(SOL_IP) to reuse do_ip_getsockopt(). Note on the change in ip_mc_gsfget(). This function is to return an array of sockaddr_storage in optval. This function is shared between ip_get_mcast_msfilter() and compat_ip_get_mcast_msfilter(). However, the sockaddr_storage is stored at different offset of the optval because of the difference between group_filter and compat_group_filter. Thus, a new 'ss_offset' argument is added to ip_mc_gsfget(). Signed-off-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220902002828.2890585-1-kafai@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/igmp.h | 4 ++-- include/linux/mroute.h | 6 +++--- 2 files changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/igmp.h b/include/linux/igmp.h index 93c262ecbdc9..78890143f079 100644 --- a/include/linux/igmp.h +++ b/include/linux/igmp.h @@ -118,9 +118,9 @@ extern int ip_mc_source(int add, int omode, struct sock *sk, struct ip_mreq_source *mreqs, int ifindex); extern int ip_mc_msfilter(struct sock *sk, struct ip_msfilter *msf,int ifindex); extern int ip_mc_msfget(struct sock *sk, struct ip_msfilter *msf, - struct ip_msfilter __user *optval, int __user *optlen); + sockptr_t optval, sockptr_t optlen); extern int ip_mc_gsfget(struct sock *sk, struct group_filter *gsf, - struct sockaddr_storage __user *p); + sockptr_t optval, size_t offset); extern int ip_mc_sf_allow(struct sock *sk, __be32 local, __be32 rmt, int dif, int sdif); extern void ip_mc_init_dev(struct in_device *); diff --git a/include/linux/mroute.h b/include/linux/mroute.h index 6cbbfe94348c..80b8400ab8b2 100644 --- a/include/linux/mroute.h +++ b/include/linux/mroute.h @@ -17,7 +17,7 @@ static inline int ip_mroute_opt(int opt) } int ip_mroute_setsockopt(struct sock *, int, sockptr_t, unsigned int); -int ip_mroute_getsockopt(struct sock *, int, char __user *, int __user *); +int ip_mroute_getsockopt(struct sock *, int, sockptr_t, sockptr_t); int ipmr_ioctl(struct sock *sk, int cmd, void __user *arg); int ipmr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg); int ip_mr_init(void); @@ -29,8 +29,8 @@ static inline int ip_mroute_setsockopt(struct sock *sock, int optname, return -ENOPROTOOPT; } -static inline int ip_mroute_getsockopt(struct sock *sock, int optname, - char __user *optval, int __user *optlen) +static inline int ip_mroute_getsockopt(struct sock *sk, int optname, + sockptr_t optval, sockptr_t optlen) { return -ENOPROTOOPT; } -- cgit From 6dadbe4bac68309eb46ab0f30e8ff47a789df49a Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Thu, 1 Sep 2022 17:28:53 -0700 Subject: bpf: net: Change do_ipv6_getsockopt() to take the sockptr_t argument Similar to the earlier patch that changes sk_getsockopt() to take the sockptr_t argument . This patch also changes do_ipv6_getsockopt() to take the sockptr_t argument such that a latter patch can make bpf_getsockopt(SOL_IPV6) to reuse do_ipv6_getsockopt(). Note on the change in ip6_mc_msfget(). This function is to return an array of sockaddr_storage in optval. This function is shared between ipv6_get_msfilter() and compat_ipv6_get_msfilter(). However, the sockaddr_storage is stored at different offset of the optval because of the difference between group_filter and compat_group_filter. Thus, a new 'ss_offset' argument is added to ip6_mc_msfget(). Signed-off-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220902002853.2892532-1-kafai@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/mroute6.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mroute6.h b/include/linux/mroute6.h index bc351a85ce9b..8f2b307fb124 100644 --- a/include/linux/mroute6.h +++ b/include/linux/mroute6.h @@ -27,7 +27,7 @@ struct sock; #ifdef CONFIG_IPV6_MROUTE extern int ip6_mroute_setsockopt(struct sock *, int, sockptr_t, unsigned int); -extern int ip6_mroute_getsockopt(struct sock *, int, char __user *, int __user *); +extern int ip6_mroute_getsockopt(struct sock *, int, sockptr_t, sockptr_t); extern int ip6_mr_input(struct sk_buff *skb); extern int ip6mr_ioctl(struct sock *sk, int cmd, void __user *arg); extern int ip6mr_compat_ioctl(struct sock *sk, unsigned int cmd, void __user *arg); @@ -42,7 +42,7 @@ static inline int ip6_mroute_setsockopt(struct sock *sock, int optname, static inline int ip6_mroute_getsockopt(struct sock *sock, - int optname, char __user *optval, int __user *optlen) + int optname, sockptr_t optval, sockptr_t optlen) { return -ENOPROTOOPT; } -- cgit From 3db2aeb121b9c54f52c399b8905c2bbbf201b7b5 Mon Sep 17 00:00:00 2001 From: Hans de Goede Date: Wed, 17 Aug 2022 14:03:32 +0200 Subject: platform/x86: nvidia-wmi-ec-backlight: Move fw interface definitions to a header (v2) Move the WMI interface definitions to a header, so that the definitions can be shared with drivers/acpi/video_detect.c . Changes in v2: - Add missing Nvidia copyright header - Move WMI_BRIGHTNESS_GUID to nvidia-wmi-ec-backlight.h as well Suggested-by: Daniel Dadap Reviewed-by: Daniel Dadap Signed-off-by: Hans de Goede --- .../platform_data/x86/nvidia-wmi-ec-backlight.h | 76 ++++++++++++++++++++++ 1 file changed, 76 insertions(+) create mode 100644 include/linux/platform_data/x86/nvidia-wmi-ec-backlight.h (limited to 'include/linux') diff --git a/include/linux/platform_data/x86/nvidia-wmi-ec-backlight.h b/include/linux/platform_data/x86/nvidia-wmi-ec-backlight.h new file mode 100644 index 000000000000..23d60130272c --- /dev/null +++ b/include/linux/platform_data/x86/nvidia-wmi-ec-backlight.h @@ -0,0 +1,76 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved. + */ + +#ifndef __PLATFORM_DATA_X86_NVIDIA_WMI_EC_BACKLIGHT_H +#define __PLATFORM_DATA_X86_NVIDIA_WMI_EC_BACKLIGHT_H + +#define WMI_BRIGHTNESS_GUID "603E9613-EF25-4338-A3D0-C46177516DB7" + +/** + * enum wmi_brightness_method - WMI method IDs + * @WMI_BRIGHTNESS_METHOD_LEVEL: Get/Set EC brightness level status + * @WMI_BRIGHTNESS_METHOD_SOURCE: Get/Set EC Brightness Source + */ +enum wmi_brightness_method { + WMI_BRIGHTNESS_METHOD_LEVEL = 1, + WMI_BRIGHTNESS_METHOD_SOURCE = 2, + WMI_BRIGHTNESS_METHOD_MAX +}; + +/** + * enum wmi_brightness_mode - Operation mode for WMI-wrapped method + * @WMI_BRIGHTNESS_MODE_GET: Get the current brightness level/source. + * @WMI_BRIGHTNESS_MODE_SET: Set the brightness level. + * @WMI_BRIGHTNESS_MODE_GET_MAX_LEVEL: Get the maximum brightness level. This + * is only valid when the WMI method is + * %WMI_BRIGHTNESS_METHOD_LEVEL. + */ +enum wmi_brightness_mode { + WMI_BRIGHTNESS_MODE_GET = 0, + WMI_BRIGHTNESS_MODE_SET = 1, + WMI_BRIGHTNESS_MODE_GET_MAX_LEVEL = 2, + WMI_BRIGHTNESS_MODE_MAX +}; + +/** + * enum wmi_brightness_source - Backlight brightness control source selection + * @WMI_BRIGHTNESS_SOURCE_GPU: Backlight brightness is controlled by the GPU. + * @WMI_BRIGHTNESS_SOURCE_EC: Backlight brightness is controlled by the + * system's Embedded Controller (EC). + * @WMI_BRIGHTNESS_SOURCE_AUX: Backlight brightness is controlled over the + * DisplayPort AUX channel. + */ +enum wmi_brightness_source { + WMI_BRIGHTNESS_SOURCE_GPU = 1, + WMI_BRIGHTNESS_SOURCE_EC = 2, + WMI_BRIGHTNESS_SOURCE_AUX = 3, + WMI_BRIGHTNESS_SOURCE_MAX +}; + +/** + * struct wmi_brightness_args - arguments for the WMI-wrapped ACPI method + * @mode: Pass in an &enum wmi_brightness_mode value to select between + * getting or setting a value. + * @val: In parameter for value to set when using %WMI_BRIGHTNESS_MODE_SET + * mode. Not used in conjunction with %WMI_BRIGHTNESS_MODE_GET or + * %WMI_BRIGHTNESS_MODE_GET_MAX_LEVEL mode. + * @ret: Out parameter returning retrieved value when operating in + * %WMI_BRIGHTNESS_MODE_GET or %WMI_BRIGHTNESS_MODE_GET_MAX_LEVEL + * mode. Not used in %WMI_BRIGHTNESS_MODE_SET mode. + * @ignored: Padding; not used. The ACPI method expects a 24 byte params struct. + * + * This is the parameters structure for the WmiBrightnessNotify ACPI method as + * wrapped by WMI. The value passed in to @val or returned by @ret will be a + * brightness value when the WMI method ID is %WMI_BRIGHTNESS_METHOD_LEVEL, or + * an &enum wmi_brightness_source value with %WMI_BRIGHTNESS_METHOD_SOURCE. + */ +struct wmi_brightness_args { + u32 mode; + u32 val; + u32 ret; + u32 ignored[3]; +}; + +#endif -- cgit From 2aec909912da55a6e469fd6ee8412080a5433ed2 Mon Sep 17 00:00:00 2001 From: Johannes Berg Date: Mon, 29 Aug 2022 11:46:38 +0200 Subject: wifi: use struct_group to copy addresses We sometimes copy all the addresses from the 802.11 header for the AAD, which may cause complaints from fortify checks. Use struct_group() to avoid the compiler warnings/errors. Change-Id: Ic3ea389105e7813b22095b295079eecdabde5045 Signed-off-by: Johannes Berg --- include/linux/ieee80211.h | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ieee80211.h b/include/linux/ieee80211.h index 55e6f4ad0ca6..b6e6d5b40774 100644 --- a/include/linux/ieee80211.h +++ b/include/linux/ieee80211.h @@ -310,9 +310,11 @@ static inline u16 ieee80211_sn_sub(u16 sn1, u16 sn2) struct ieee80211_hdr { __le16 frame_control; __le16 duration_id; - u8 addr1[ETH_ALEN]; - u8 addr2[ETH_ALEN]; - u8 addr3[ETH_ALEN]; + struct_group(addrs, + u8 addr1[ETH_ALEN]; + u8 addr2[ETH_ALEN]; + u8 addr3[ETH_ALEN]; + ); __le16 seq_ctrl; u8 addr4[ETH_ALEN]; } __packed __aligned(2); -- cgit From f89aa0b6db18dea3c3c8ef266cc6c9fd8dff2d72 Mon Sep 17 00:00:00 2001 From: Markus Schneider-Pargmann Date: Thu, 1 Sep 2022 12:41:41 +0800 Subject: video/hdmi: Add audio_infoframe packing for DP Similar to HDMI, DP uses audio infoframes as well which are structured very similar to the HDMI ones. This patch adds a helper function to pack the HDMI audio infoframe for DP, called hdmi_audio_infoframe_pack_for_dp(). hdmi_audio_infoframe_pack_only() is split into two parts. One of them packs the payload only and can be used for HDMI and DP. Also constify the frame parameter in hdmi_audio_infoframe_check() as it is passed to hdmi_audio_infoframe_check_only() which expects a const. Signed-off-by: Markus Schneider-Pargmann Signed-off-by: Guillaume Ranquet Signed-off-by: Bo-Chen Chen Reviewed-by: AngeloGioacchino Del Regno Signed-off-by: Dmitry Osipenko Link: https://patchwork.freedesktop.org/patch/msgid/20220901044149.16782-3-rex-bc.chen@mediatek.com --- include/linux/hdmi.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hdmi.h b/include/linux/hdmi.h index c8ec982ff498..2f4dcc8d060e 100644 --- a/include/linux/hdmi.h +++ b/include/linux/hdmi.h @@ -336,7 +336,12 @@ ssize_t hdmi_audio_infoframe_pack(struct hdmi_audio_infoframe *frame, void *buffer, size_t size); ssize_t hdmi_audio_infoframe_pack_only(const struct hdmi_audio_infoframe *frame, void *buffer, size_t size); -int hdmi_audio_infoframe_check(struct hdmi_audio_infoframe *frame); +int hdmi_audio_infoframe_check(const struct hdmi_audio_infoframe *frame); + +struct dp_sdp; +ssize_t +hdmi_audio_infoframe_pack_for_dp(const struct hdmi_audio_infoframe *frame, + struct dp_sdp *sdp, u8 dp_version); enum hdmi_3d_structure { HDMI_3D_STRUCTURE_INVALID = -1, -- cgit From bce1b56c73826fec8caf6187f0c922ede397a5a8 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Sun, 4 Sep 2022 06:39:25 -0600 Subject: Revert "sbitmap: fix batched wait_cnt accounting" This reverts commit 16ede66973c84f890c03584f79158dd5b2d725f5. This is causing issues with CPU stalls on my test box, revert it for now until we understand what is going on. It looks like infinite looping off sbitmap_queue_wake_up(), but hard to tell with a lot of CPUs hitting this issue and the console scrolling infinitely. Link: https://lore.kernel.org/linux-block/e742813b-ce5c-0d58-205b-1626f639b1bd@kernel.dk/ Signed-off-by: Jens Axboe --- include/linux/sbitmap.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h index 4d2d5205ab58..8f5a86e210b9 100644 --- a/include/linux/sbitmap.h +++ b/include/linux/sbitmap.h @@ -575,9 +575,8 @@ void sbitmap_queue_wake_all(struct sbitmap_queue *sbq); * sbitmap_queue_wake_up() - Wake up some of waiters in one waitqueue * on a &struct sbitmap_queue. * @sbq: Bitmap queue to wake up. - * @nr: Number of bits cleared. */ -void sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr); +void sbitmap_queue_wake_up(struct sbitmap_queue *sbq); /** * sbitmap_queue_show() - Dump &struct sbitmap_queue information to a &struct -- cgit From 2e9bffc4f713db465177238f6033f7d367d6f151 Mon Sep 17 00:00:00 2001 From: Shawn Lin Date: Thu, 25 Aug 2022 21:38:34 +0200 Subject: phy: rockchip: Support PCIe v3 RK3568 supports PCIe v3 using not Combphy like PCIe v2 on rk3566. It use a dedicated PCIe-phy. Add support for this. Initial support by Shawn Lin, modifications by Peter Geis and Frank Wunderlich. Add data-lanes property for splitting pcie-lanes across controllers. The data-lanes is an array where x=0 means lane is disabled and x > 0 means controller x is assigned to phy lane. Signed-off-by: Shawn Lin Suggested-by: Peter Geis Signed-off-by: Frank Wunderlich Link: https://lore.kernel.org/r/20220825193836.54262-4-linux@fw-web.de Signed-off-by: Vinod Koul --- include/linux/phy/pcie.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) create mode 100644 include/linux/phy/pcie.h (limited to 'include/linux') diff --git a/include/linux/phy/pcie.h b/include/linux/phy/pcie.h new file mode 100644 index 000000000000..e7ac81764576 --- /dev/null +++ b/include/linux/phy/pcie.h @@ -0,0 +1,12 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (c) 2022 Rockchip Electronics Co., Ltd. + */ +#ifndef __PHY_PCIE_H +#define __PHY_PCIE_H + +#define PHY_MODE_PCIE_RC 20 +#define PHY_MODE_PCIE_EP 21 +#define PHY_MODE_PCIE_BIFURCATION 22 + +#endif -- cgit From 9c06002682ae0cdd01caf011899224acbc6582b2 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Wed, 13 Jul 2022 20:22:35 +0300 Subject: dmaengine: hsu: Include headers we are direct user of For the sake of integrity, include headers we are direct user of. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220713172235.22611-4-andriy.shevchenko@linux.intel.com Signed-off-by: Vinod Koul --- include/linux/dma/hsu.h | 6 ++++-- include/linux/platform_data/dma-hsu.h | 2 +- 2 files changed, 5 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma/hsu.h b/include/linux/dma/hsu.h index a6b7bc707356..77ea602c287c 100644 --- a/include/linux/dma/hsu.h +++ b/include/linux/dma/hsu.h @@ -8,11 +8,13 @@ #ifndef _DMA_HSU_H #define _DMA_HSU_H -#include -#include +#include +#include +#include #include +struct device; struct hsu_dma; /** diff --git a/include/linux/platform_data/dma-hsu.h b/include/linux/platform_data/dma-hsu.h index c65b412b2b33..611bae193c1c 100644 --- a/include/linux/platform_data/dma-hsu.h +++ b/include/linux/platform_data/dma-hsu.h @@ -8,7 +8,7 @@ #ifndef _PLATFORM_DATA_DMA_HSU_H #define _PLATFORM_DATA_DMA_HSU_H -#include +struct device; struct hsu_dma_slave { struct device *dma_dev; -- cgit From 0eadd36d9123745f70e233a9d93951d05ca1916a Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Sat, 3 Sep 2022 23:18:47 -0700 Subject: gpiolib: make fwnode_get_named_gpiod() static There are no external users of fwnode_get_named_gpiod() anymore, so let's stop exporting it and mark it as static. Signed-off-by: Dmitry Torokhov Reviewed-by: Andy Shevchenko Signed-off-by: Bartosz Golaszewski --- include/linux/gpio/consumer.h | 13 ------------- 1 file changed, 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gpio/consumer.h b/include/linux/gpio/consumer.h index fe0f460d9a3b..36460ced060b 100644 --- a/include/linux/gpio/consumer.h +++ b/include/linux/gpio/consumer.h @@ -174,10 +174,6 @@ int desc_to_gpio(const struct gpio_desc *desc); /* Child properties interface */ struct fwnode_handle; -struct gpio_desc *fwnode_get_named_gpiod(struct fwnode_handle *fwnode, - const char *propname, int index, - enum gpiod_flags dflags, - const char *label); struct gpio_desc *fwnode_gpiod_get_index(struct fwnode_handle *fwnode, const char *con_id, int index, enum gpiod_flags flags, @@ -553,15 +549,6 @@ static inline int desc_to_gpio(const struct gpio_desc *desc) /* Child properties interface */ struct fwnode_handle; -static inline -struct gpio_desc *fwnode_get_named_gpiod(struct fwnode_handle *fwnode, - const char *propname, int index, - enum gpiod_flags dflags, - const char *label) -{ - return ERR_PTR(-ENOSYS); -} - static inline struct gpio_desc *fwnode_gpiod_get_index(struct fwnode_handle *fwnode, const char *con_id, int index, -- cgit From 4a502cf4d77e12119e7061a05d5789cd3129d185 Mon Sep 17 00:00:00 2001 From: Maxime Chevallier Date: Fri, 2 Sep 2022 10:32:03 +0200 Subject: net: pcs: add new PCS driver for altera TSE PCS The Altera Triple Speed Ethernet has a SGMII/1000BaseC PCS that can be integrated in several ways. It can either be part of the TSE MAC's address space, accessed through 32 bits accesses on the mapped mdio device 0, or through a dedicated 16 bits register set. This driver allows using the TSE PCS outside of altera TSE's driver, since it can be used standalone by other MACs. Signed-off-by: Maxime Chevallier Signed-off-by: David S. Miller --- include/linux/pcs-altera-tse.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) create mode 100644 include/linux/pcs-altera-tse.h (limited to 'include/linux') diff --git a/include/linux/pcs-altera-tse.h b/include/linux/pcs-altera-tse.h new file mode 100644 index 000000000000..92ab9f08e835 --- /dev/null +++ b/include/linux/pcs-altera-tse.h @@ -0,0 +1,17 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) 2022 Bootlin + * + * Maxime Chevallier + */ + +#ifndef __LINUX_PCS_ALTERA_TSE_H +#define __LINUX_PCS_ALTERA_TSE_H + +struct phylink_pcs; +struct net_device; + +struct phylink_pcs *alt_tse_pcs_create(struct net_device *ndev, + void __iomem *pcs_base, int reg_width); + +#endif /* __LINUX_PCS_ALTERA_TSE_H */ -- cgit From dec9b2f1e0455a151a7293c367da22ab973f713e Mon Sep 17 00:00:00 2001 From: Greg Kroah-Hartman Date: Fri, 2 Sep 2022 16:59:15 +0200 Subject: debugfs: add debugfs_lookup_and_remove() There is a very common pattern of using debugfs_remove(debufs_lookup(..)) which results in a dentry leak of the dentry that was looked up. Instead of having to open-code the correct pattern of calling dput() on the dentry, create debugfs_lookup_and_remove() to handle this pattern automatically and properly without any memory leaks. Cc: stable Reported-by: Kuyo Chang Tested-by: Kuyo Chang Link: https://lore.kernel.org/r/YxIaQ8cSinDR881k@kroah.com Signed-off-by: Greg Kroah-Hartman --- include/linux/debugfs.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h index c869f1e73d75..f60674692d36 100644 --- a/include/linux/debugfs.h +++ b/include/linux/debugfs.h @@ -91,6 +91,8 @@ struct dentry *debugfs_create_automount(const char *name, void debugfs_remove(struct dentry *dentry); #define debugfs_remove_recursive debugfs_remove +void debugfs_lookup_and_remove(const char *name, struct dentry *parent); + const struct file_operations *debugfs_real_fops(const struct file *filp); int debugfs_file_get(struct dentry *dentry); @@ -225,6 +227,10 @@ static inline void debugfs_remove(struct dentry *dentry) static inline void debugfs_remove_recursive(struct dentry *dentry) { } +static inline void debugfs_lookup_and_remove(const char *name, + struct dentry *parent) +{ } + const struct file_operations *debugfs_real_fops(const struct file *filp); static inline int debugfs_file_get(struct dentry *dentry) -- cgit From 9ca05b0f27de928be121cccf07735819dc9e1ed3 Mon Sep 17 00:00:00 2001 From: Maher Sanalla Date: Mon, 29 Aug 2022 12:02:27 +0300 Subject: RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profile When the RDMA auxiliary driver probes, it sets its profile based on devlink driverinit value. The latter might not be in sync with FW yet (In case devlink reload is not performed), thus causing a mismatch between RDMA driver and FW. This results in the following FW syndrome when the RDMA driver tries to adjust RoCE state, which fails the probe: "0xC1F678 | modify_nic_vport_context: roce_en set on a vport that doesn't support roce" To prevent this, select the PF profile based on FW RoCE capability instead of relying on devlink driverinit value. To provide backward compatibility of the RoCE disable feature, on older FW's where roce_rw is not set (FW RoCE capability is read-only), keep the current behavior e.g., rely on devlink driverinit value. Fixes: fbfa97b4d79f ("net/mlx5: Disable roce at HCA level") Reviewed-by: Shay Drory Reviewed-by: Michael Guralnik Reviewed-by: Saeed Mahameed Signed-off-by: Maher Sanalla Link: https://lore.kernel.org/r/cb34ce9a1df4a24c135cb804db87f7d2418bd6cc.1661763459.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky --- include/linux/mlx5/driver.h | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 96b16fbe1aa4..d6338fb449c8 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1279,16 +1279,17 @@ enum { MLX5_TRIGGERED_CMD_COMP = (u64)1 << 32, }; -static inline bool mlx5_is_roce_init_enabled(struct mlx5_core_dev *dev) +bool mlx5_is_roce_on(struct mlx5_core_dev *dev); + +static inline bool mlx5_get_roce_state(struct mlx5_core_dev *dev) { - struct devlink *devlink = priv_to_devlink(dev); - union devlink_param_value val; - int err; - - err = devlink_param_driverinit_value_get(devlink, - DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE, - &val); - return err ? MLX5_CAP_GEN(dev, roce) : val.vbool; + if (MLX5_CAP_GEN(dev, roce_rw_supported)) + return MLX5_CAP_GEN(dev, roce); + + /* If RoCE cap is read-only in FW, get RoCE state from devlink + * in order to support RoCE enable/disable feature + */ + return mlx5_is_roce_on(dev); } #endif /* MLX5_DRIVER_H */ -- cgit From 8a2dd123f12f69e5373d3103da2c97fc36223e0c Mon Sep 17 00:00:00 2001 From: Chris Mi Date: Mon, 29 Aug 2022 12:06:04 +0300 Subject: RDMA/mlx5: Move function mlx5_core_query_ib_ppcnt() to mlx5_ib This patch doesn't change any functionality, but move one function to mlx5_ib because it is not used by mlx5_core. The actual fix is in the next patch. Reviewed-by: Roi Dayan Signed-off-by: Chris Mi Link: https://lore.kernel.org/r/fd47b9138412bd94ed30f838026cbb4cf3878150.1661763871.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky --- include/linux/mlx5/driver.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 76d7661e3e63..432bd202e2fe 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1079,8 +1079,6 @@ int mlx5_core_destroy_psv(struct mlx5_core_dev *dev, int psv_num); void mlx5_core_put_rsc(struct mlx5_core_rsc_common *common); int mlx5_query_odp_caps(struct mlx5_core_dev *dev, struct mlx5_odp_caps *odp_caps); -int mlx5_core_query_ib_ppcnt(struct mlx5_core_dev *dev, - u8 port_num, void *out, size_t sz); int mlx5_init_rl_table(struct mlx5_core_dev *dev); void mlx5_cleanup_rl_table(struct mlx5_core_dev *dev); -- cgit From 05ad5d4581c3c1cc724fe50d4652833fb9f3037b Mon Sep 17 00:00:00 2001 From: Sean Anderson Date: Fri, 2 Sep 2022 18:02:39 -0400 Subject: net: phy: Add 1000BASE-KX interface mode Add 1000BASE-KX interface mode. This 1G backplane ethernet as described in clause 70. Clause 73 autonegotiation is mandatory, and only full duplex operation is supported. Although at the PMA level this interface mode is identical to 1000BASE-X, it uses a different form of in-band autonegation. This justifies a separate interface mode, since the interface mode (along with the MLO_AN_* autonegotiation mode) sets the type of autonegotiation which will be used on a link. This results in more than just electrical differences between the link modes. With regard to 1000BASE-X, 1000BASE-KX holds a similar position to SGMII: same signaling, but different autonegotiation. PCS drivers (which typically handle in-band autonegotiation) may only support 1000BASE-X, and not 1000BASE-KX. Similarly, the phy mode is used to configure serdes phys with phy_set_mode_ext. Due to the different electrical standards (SFI or XFI vs Clause 70), they will likely want to use different configuration. Adding a phy interface mode for 1000BASE-KX helps simplify configuration in these areas. Signed-off-by: Sean Anderson Signed-off-by: David S. Miller --- include/linux/phy.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 7c49ab95441b..337230c135f7 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -116,6 +116,7 @@ extern const int phy_10gbit_features_array[1]; * @PHY_INTERFACE_MODE_USXGMII: Universal Serial 10GE MII * @PHY_INTERFACE_MODE_10GKR: 10GBASE-KR - with Clause 73 AN * @PHY_INTERFACE_MODE_QUSGMII: Quad Universal SGMII + * @PHY_INTERFACE_MODE_1000BASEKX: 1000Base-KX - with Clause 73 AN * @PHY_INTERFACE_MODE_MAX: Book keeping * * Describes the interface between the MAC and PHY. @@ -154,6 +155,7 @@ typedef enum { /* 10GBASE-KR - with Clause 73 AN */ PHY_INTERFACE_MODE_10GKR, PHY_INTERFACE_MODE_QUSGMII, + PHY_INTERFACE_MODE_1000BASEKX, PHY_INTERFACE_MODE_MAX, } phy_interface_t; @@ -251,6 +253,8 @@ static inline const char *phy_modes(phy_interface_t interface) return "trgmii"; case PHY_INTERFACE_MODE_1000BASEX: return "1000base-x"; + case PHY_INTERFACE_MODE_1000BASEKX: + return "1000base-kx"; case PHY_INTERFACE_MODE_2500BASEX: return "2500base-x"; case PHY_INTERFACE_MODE_5GBASER: -- cgit From 7c8199e24fa09d2344ae0204527d55d7803e8409 Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov Date: Fri, 2 Sep 2022 14:10:43 -0700 Subject: bpf: Introduce any context BPF specific memory allocator. Tracing BPF programs can attach to kprobe and fentry. Hence they run in unknown context where calling plain kmalloc() might not be safe. Front-end kmalloc() with minimal per-cpu cache of free elements. Refill this cache asynchronously from irq_work. BPF programs always run with migration disabled. It's safe to allocate from cache of the current cpu with irqs disabled. Free-ing is always done into bucket of the current cpu as well. irq_work trims extra free elements from buckets with kfree and refills them with kmalloc, so global kmalloc logic takes care of freeing objects allocated by one cpu and freed on another. struct bpf_mem_alloc supports two modes: - When size != 0 create kmem_cache and bpf_mem_cache for each cpu. This is typical bpf hash map use case when all elements have equal size. - When size == 0 allocate 11 bpf_mem_cache-s for each cpu, then rely on kmalloc/kfree. Max allocation size is 4096 in this case. This is bpf_dynptr and bpf_kptr use case. bpf_mem_alloc/bpf_mem_free are bpf specific 'wrappers' of kmalloc/kfree. bpf_mem_cache_alloc/bpf_mem_cache_free are 'wrappers' of kmem_cache_alloc/kmem_cache_free. The allocators are NMI-safe from bpf programs only. They are not NMI-safe in general. Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Acked-by: Kumar Kartikeya Dwivedi Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20220902211058.60789-2-alexei.starovoitov@gmail.com --- include/linux/bpf_mem_alloc.h | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) create mode 100644 include/linux/bpf_mem_alloc.h (limited to 'include/linux') diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h new file mode 100644 index 000000000000..804733070f8d --- /dev/null +++ b/include/linux/bpf_mem_alloc.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright (c) 2022 Meta Platforms, Inc. and affiliates. */ +#ifndef _BPF_MEM_ALLOC_H +#define _BPF_MEM_ALLOC_H +#include + +struct bpf_mem_cache; +struct bpf_mem_caches; + +struct bpf_mem_alloc { + struct bpf_mem_caches __percpu *caches; + struct bpf_mem_cache __percpu *cache; +}; + +int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size); +void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma); + +/* kmalloc/kfree equivalent: */ +void *bpf_mem_alloc(struct bpf_mem_alloc *ma, size_t size); +void bpf_mem_free(struct bpf_mem_alloc *ma, void *ptr); + +/* kmem_cache_alloc/free equivalent: */ +void *bpf_mem_cache_alloc(struct bpf_mem_alloc *ma); +void bpf_mem_cache_free(struct bpf_mem_alloc *ma, void *ptr); + +#endif /* _BPF_MEM_ALLOC_H */ -- cgit From 4ab67149f3c6e97c5c506a726f0ebdec38241679 Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov Date: Fri, 2 Sep 2022 14:10:52 -0700 Subject: bpf: Add percpu allocation support to bpf_mem_alloc. Extend bpf_mem_alloc to cache free list of fixed size per-cpu allocations. Once such cache is created bpf_mem_cache_alloc() will return per-cpu objects. bpf_mem_cache_free() will free them back into global per-cpu pool after observing RCU grace period. per-cpu flavor of bpf_mem_alloc is going to be used by per-cpu hash maps. The free list cache consists of tuples { llist_node, per-cpu pointer } Unlike alloc_percpu() that returns per-cpu pointer the bpf_mem_cache_alloc() returns a pointer to per-cpu pointer and bpf_mem_cache_free() expects to receive it back. Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Acked-by: Kumar Kartikeya Dwivedi Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20220902211058.60789-11-alexei.starovoitov@gmail.com --- include/linux/bpf_mem_alloc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h index 804733070f8d..653ed1584a03 100644 --- a/include/linux/bpf_mem_alloc.h +++ b/include/linux/bpf_mem_alloc.h @@ -12,7 +12,7 @@ struct bpf_mem_alloc { struct bpf_mem_cache __percpu *cache; }; -int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size); +int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu); void bpf_mem_alloc_destroy(struct bpf_mem_alloc *ma); /* kmalloc/kfree equivalent: */ -- cgit From 9f2c6e96c65e6fa1aebef546be0c30a5895fcb37 Mon Sep 17 00:00:00 2001 From: Alexei Starovoitov Date: Fri, 2 Sep 2022 14:10:58 -0700 Subject: bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc. User space might be creating and destroying a lot of hash maps. Synchronous rcu_barrier-s in a destruction path of hash map delay freeing of hash buckets and other map memory and may cause artificial OOM situation under stress. Optimize rcu_barrier usage between bpf hash map and bpf_mem_alloc: - remove rcu_barrier from hash map, since htab doesn't use call_rcu directly and there are no callback to wait for. - bpf_mem_alloc has call_rcu_in_progress flag that indicates pending callbacks. Use it to avoid barriers in fast path. - When barriers are needed copy bpf_mem_alloc into temp structure and wait for rcu barrier-s in the worker to let the rest of hash map freeing to proceed. Signed-off-by: Alexei Starovoitov Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20220902211058.60789-17-alexei.starovoitov@gmail.com --- include/linux/bpf_mem_alloc.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf_mem_alloc.h b/include/linux/bpf_mem_alloc.h index 653ed1584a03..3e164b8efaa9 100644 --- a/include/linux/bpf_mem_alloc.h +++ b/include/linux/bpf_mem_alloc.h @@ -3,6 +3,7 @@ #ifndef _BPF_MEM_ALLOC_H #define _BPF_MEM_ALLOC_H #include +#include struct bpf_mem_cache; struct bpf_mem_caches; @@ -10,6 +11,7 @@ struct bpf_mem_caches; struct bpf_mem_alloc { struct bpf_mem_caches __percpu *caches; struct bpf_mem_cache __percpu *cache; + struct work_struct work; }; int bpf_mem_alloc_init(struct bpf_mem_alloc *ma, int size, bool percpu); -- cgit From f0b933236ec97de5ee49c60aae57a9c5c4dadc87 Mon Sep 17 00:00:00 2001 From: Cezary Rojewski Date: Sun, 4 Sep 2022 12:28:39 +0200 Subject: lib/string_helpers: Introduce parse_int_array_user() Add new helper function to allow for splitting specified user string into a sequence of integers. Internally it makes use of get_options() so the returned sequence contains the integers extracted plus an additional element that begins the sequence and specifies the integers count. Suggested-by: Andy Shevchenko Reviewed-by: Andy Shevchenko Signed-off-by: Cezary Rojewski Link: https://lore.kernel.org/r/20220904102840.862395-2-cezary.rojewski@intel.com Signed-off-by: Mark Brown --- include/linux/string_helpers.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/string_helpers.h b/include/linux/string_helpers.h index 4d72258d42fd..dc2e726fd820 100644 --- a/include/linux/string_helpers.h +++ b/include/linux/string_helpers.h @@ -21,6 +21,8 @@ enum string_size_units { void string_get_size(u64 size, u64 blk_size, enum string_size_units units, char *buf, int len); +int parse_int_array_user(const char __user *from, size_t count, int **array); + #define UNESCAPE_SPACE BIT(0) #define UNESCAPE_OCTAL BIT(1) #define UNESCAPE_HEX BIT(2) -- cgit From 8409fe92d88c332923130149fe209d1c882b286e Mon Sep 17 00:00:00 2001 From: Vitaly Kuznetsov Date: Sat, 27 Aug 2022 15:03:43 +0200 Subject: PCI: Move PCI_VENDOR_ID_MICROSOFT/PCI_DEVICE_ID_HYPERV_VIDEO definitions to pci_ids.h There are already three places in kernel which define PCI_VENDOR_ID_MICROSOFT and two for PCI_DEVICE_ID_HYPERV_VIDEO and there's a need to use these from core VMBus code. Move the defines where they belong. No functional change. Reviewed-by: Michael Kelley Acked-by: Bjorn Helgaas # pci_ids.h Signed-off-by: Vitaly Kuznetsov Link: https://lore.kernel.org/r/20220827130345.1320254-2-vkuznets@redhat.com Signed-off-by: Wei Liu --- include/linux/pci_ids.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h index 6feade66efdb..15b49e655ce3 100644 --- a/include/linux/pci_ids.h +++ b/include/linux/pci_ids.h @@ -2079,6 +2079,9 @@ #define PCI_DEVICE_ID_ICE_1712 0x1712 #define PCI_DEVICE_ID_VT1724 0x1724 +#define PCI_VENDOR_ID_MICROSOFT 0x1414 +#define PCI_DEVICE_ID_HYPERV_VIDEO 0x5353 + #define PCI_VENDOR_ID_OXSEMI 0x1415 #define PCI_DEVICE_ID_OXSEMI_12PCI840 0x8403 #define PCI_DEVICE_ID_OXSEMI_PCIe840 0xC000 -- cgit From 2bc9cd66eb25d0fefbb081421d6586495e25840e Mon Sep 17 00:00:00 2001 From: Vincent Whitchurch Date: Mon, 29 Aug 2022 11:18:40 +0200 Subject: iio: Use per-device lockdep class for mlock If an IIO driver uses callbacks from another IIO driver and calls iio_channel_start_all_cb() from one of its buffer setup ops, then lockdep complains due to the lock nesting, as in the below example with lmp91000. Since the locks are being taken on different IIO devices, there is no actual deadlock. Fix the warning by telling lockdep to use a different class for each iio_device. ============================================ WARNING: possible recursive locking detected -------------------------------------------- python3/23 is trying to acquire lock: (&indio_dev->mlock){+.+.}-{3:3}, at: iio_update_buffers but task is already holding lock: (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&indio_dev->mlock); lock(&indio_dev->mlock); *** DEADLOCK *** May be due to missing lock nesting notation 5 locks held by python3/23: #0: (sb_writers#5){.+.+}-{0:0}, at: ksys_write #1: (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter #2: (kn->active#14){.+.+}-{0:0}, at: kernfs_fop_write_iter #3: (&indio_dev->mlock){+.+.}-{3:3}, at: enable_store #4: (&iio_dev_opaque->info_exist_lock){+.+.}-{3:3}, at: iio_update_buffers Call Trace: __mutex_lock iio_update_buffers iio_channel_start_all_cb lmp91000_buffer_postenable __iio_update_buffers enable_store Fixes: 67e17300dc1d76 ("iio: potentiostat: add LMP91000 support") Signed-off-by: Vincent Whitchurch Reviewed-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220829091840.2791846-1-vincent.whitchurch@axis.com Signed-off-by: Jonathan Cameron --- include/linux/iio/iio-opaque.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iio/iio-opaque.h b/include/linux/iio/iio-opaque.h index 6b3586b3f952..d1f8b30a7c8b 100644 --- a/include/linux/iio/iio-opaque.h +++ b/include/linux/iio/iio-opaque.h @@ -11,6 +11,7 @@ * checked by device drivers but should be considered * read-only as this is a core internal bit * @driver_module: used to make it harder to undercut users + * @mlock_key: lockdep class for iio_dev lock * @info_exist_lock: lock to prevent use during removal * @trig_readonly: mark the current trigger immutable * @event_interface: event chrdevs associated with interrupt lines @@ -42,6 +43,7 @@ struct iio_dev_opaque { int currentmode; int id; struct module *driver_module; + struct lock_class_key mlock_key; struct mutex info_exist_lock; bool trig_readonly; struct iio_event_interface *event_interface; -- cgit From 835e699ef82adfc85ac4cc3f1f237c1adfdefd20 Mon Sep 17 00:00:00 2001 From: Jagath Jog J Date: Wed, 31 Aug 2022 12:01:16 +0530 Subject: iio: Add new event type gesture and use direction for single and double tap Add new event type for tap called gesture and the direction can be used to differentiate single and double tap. This may be used by accelerometer sensors to express single and double tap events. For directional tap, modifiers like IIO_MOD_(X/Y/Z) can be used along with singletap and doubletap direction. Signed-off-by: Jagath Jog J Link: https://lore.kernel.org/r/20220831063117.4141-2-jagathjog1996@gmail.com Signed-off-by: Jonathan Cameron --- include/linux/iio/types.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/iio/types.h b/include/linux/iio/types.h index 27143b03909d..82faa98c719a 100644 --- a/include/linux/iio/types.h +++ b/include/linux/iio/types.h @@ -17,6 +17,8 @@ enum iio_event_info { IIO_EV_INFO_HIGH_PASS_FILTER_3DB, IIO_EV_INFO_LOW_PASS_FILTER_3DB, IIO_EV_INFO_TIMEOUT, + IIO_EV_INFO_RESET_TIMEOUT, + IIO_EV_INFO_TAP2_MIN_DELAY, }; #define IIO_VAL_INT 1 -- cgit From ead384d956345681e1ddf97890d5e15ded015f07 Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Fri, 19 Aug 2022 18:20:37 +0800 Subject: efi/loongarch: Add efistub booting support This patch adds efistub booting support, which is the standard UEFI boot protocol for LoongArch to use. We use generic efistub, which means we can pass boot information (i.e., system table, memory map, kernel command line, initrd) via a light FDT and drop a lot of non-standard code. We use a flat mapping to map the efi runtime in the kernel's address space. In efi, VA = PA; in kernel, VA = PA + PAGE_OFFSET. As a result, flat mapping is not identity mapping, SetVirtualAddressMap() is still needed for the efi runtime. Tested-by: Xi Ruoyao Signed-off-by: Huacai Chen [ardb: change fpic to fpie as suggested by Xi Ruoyao] Signed-off-by: Ard Biesheuvel --- include/linux/pe.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pe.h b/include/linux/pe.h index daf09ffffe38..1d3836ef9d92 100644 --- a/include/linux/pe.h +++ b/include/linux/pe.h @@ -65,6 +65,8 @@ #define IMAGE_FILE_MACHINE_SH5 0x01a8 #define IMAGE_FILE_MACHINE_THUMB 0x01c2 #define IMAGE_FILE_MACHINE_WCEMIPSV2 0x0169 +#define IMAGE_FILE_MACHINE_LOONGARCH32 0x6232 +#define IMAGE_FILE_MACHINE_LOONGARCH64 0x6264 /* flags */ #define IMAGE_FILE_RELOCS_STRIPPED 0x0001 -- cgit From 3aac580d5cc3001ca1627725b3b61edb529f341d Mon Sep 17 00:00:00 2001 From: Kan Liang Date: Thu, 1 Sep 2022 06:09:54 -0700 Subject: perf: Add sample_flags to indicate the PMU-filled sample data On some platforms, some data e.g., timestamps, can be retrieved from the PMU driver. Usually, the data from the PMU driver is more accurate. The current perf kernel should output the PMU-filled sample data if it's available. To check the availability of the PMU-filled sample data, the current perf kernel initializes the related fields in the perf_sample_data_init(). When outputting a sample, the perf checks whether the field is updated by the PMU driver. If yes, the updated value will be output. If not, the perf uses an SW way to calculate the value or just outputs the initialized value if an SW way is unavailable either. With more and more data being provided by the PMU driver, more fields has to be initialized in the perf_sample_data_init(). That will increase the number of cache lines touched in perf_sample_data_init() and be harmful to the performance. Add new "sample_flags" to indicate the PMU-filled sample data. The PMU driver should set the corresponding PERF_SAMPLE_ flag when the field is updated. The initialization of the corresponding field is not required anymore. The following patches will make use of it and remove the corresponding fields from the perf_sample_data_init(), which will further minimize the number of cache lines touched. Only clear the sample flags that have already been done by the PMU driver in the perf_prepare_sample() for the PERF_RECORD_SAMPLE. For the other PERF_RECORD_ event type, the sample data is not available. Suggested-by: Peter Zijlstra (Intel) Signed-off-by: Kan Liang Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220901130959.1285717-2-kan.liang@linux.intel.com --- include/linux/perf_event.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 1999408a9cbb..0978165a2d87 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1008,6 +1008,7 @@ struct perf_sample_data { * Fields set by perf_sample_data_init(), group so as to * minimize the cachelines touched. */ + u64 sample_flags; u64 addr; struct perf_raw_record *raw; struct perf_branch_stack *br_stack; @@ -1057,6 +1058,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data, u64 addr, u64 period) { /* remaining struct members initialized in perf_prepare_sample() */ + data->sample_flags = 0; data->addr = addr; data->raw = NULL; data->br_stack = NULL; -- cgit From a9a931e2666878343782c82d7d55cc173ddeb3e9 Mon Sep 17 00:00:00 2001 From: Kan Liang Date: Thu, 1 Sep 2022 06:09:56 -0700 Subject: perf: Use sample_flags for branch stack Use the new sample_flags to indicate whether the branch stack is filled by the PMU driver. Remove the br_stack from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220901130959.1285717-4-kan.liang@linux.intel.com --- include/linux/perf_event.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 0978165a2d87..1e12e79454e0 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1011,7 +1011,6 @@ struct perf_sample_data { u64 sample_flags; u64 addr; struct perf_raw_record *raw; - struct perf_branch_stack *br_stack; u64 period; union perf_sample_weight weight; u64 txn; @@ -1021,6 +1020,8 @@ struct perf_sample_data { * The other fields, optionally {set,used} by * perf_{prepare,output}_sample(). */ + struct perf_branch_stack *br_stack; + u64 type; u64 ip; struct { @@ -1061,7 +1062,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data, data->sample_flags = 0; data->addr = addr; data->raw = NULL; - data->br_stack = NULL; data->period = period; data->weight.full = 0; data->data_src.val = PERF_MEM_NA; -- cgit From 2abe681da0a192ab850a5271d838a7817b469fca Mon Sep 17 00:00:00 2001 From: Kan Liang Date: Thu, 1 Sep 2022 06:09:57 -0700 Subject: perf: Use sample_flags for weight Use the new sample_flags to indicate whether the weight field is filled by the PMU driver. Remove the weight field from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220901130959.1285717-5-kan.liang@linux.intel.com --- include/linux/perf_event.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 1e12e79454e0..06a587b5faa9 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1012,7 +1012,6 @@ struct perf_sample_data { u64 addr; struct perf_raw_record *raw; u64 period; - union perf_sample_weight weight; u64 txn; union perf_mem_data_src data_src; @@ -1021,6 +1020,7 @@ struct perf_sample_data { * perf_{prepare,output}_sample(). */ struct perf_branch_stack *br_stack; + union perf_sample_weight weight; u64 type; u64 ip; @@ -1063,7 +1063,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data, data->addr = addr; data->raw = NULL; data->period = period; - data->weight.full = 0; data->data_src.val = PERF_MEM_NA; data->txn = 0; } -- cgit From e16fd7f2cb1a65555cfe76f983eaefb1eab7471f Mon Sep 17 00:00:00 2001 From: Kan Liang Date: Thu, 1 Sep 2022 06:09:58 -0700 Subject: perf: Use sample_flags for data_src Use the new sample_flags to indicate whether the data_src field is filled by the PMU driver. Remove the data_src field from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220901130959.1285717-6-kan.liang@linux.intel.com --- include/linux/perf_event.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 06a587b5faa9..6849f10dfc7e 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1013,7 +1013,6 @@ struct perf_sample_data { struct perf_raw_record *raw; u64 period; u64 txn; - union perf_mem_data_src data_src; /* * The other fields, optionally {set,used} by @@ -1021,6 +1020,7 @@ struct perf_sample_data { */ struct perf_branch_stack *br_stack; union perf_sample_weight weight; + union perf_mem_data_src data_src; u64 type; u64 ip; @@ -1063,7 +1063,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data, data->addr = addr; data->raw = NULL; data->period = period; - data->data_src.val = PERF_MEM_NA; data->txn = 0; } -- cgit From ee9db0e14b0575aa827579dc2471a29ec5fc6877 Mon Sep 17 00:00:00 2001 From: Kan Liang Date: Thu, 1 Sep 2022 06:09:59 -0700 Subject: perf: Use sample_flags for txn Use the new sample_flags to indicate whether the txn field is filled by the PMU driver. Remove the txn field from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Kan Liang Signed-off-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220901130959.1285717-7-kan.liang@linux.intel.com --- include/linux/perf_event.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 6849f10dfc7e..581880ddb9ef 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1012,7 +1012,6 @@ struct perf_sample_data { u64 addr; struct perf_raw_record *raw; u64 period; - u64 txn; /* * The other fields, optionally {set,used} by @@ -1021,6 +1020,7 @@ struct perf_sample_data { struct perf_branch_stack *br_stack; union perf_sample_weight weight; union perf_mem_data_src data_src; + u64 txn; u64 type; u64 ip; @@ -1063,7 +1063,6 @@ static inline void perf_sample_data_init(struct perf_sample_data *data, data->addr = addr; data->raw = NULL; data->period = period; - data->txn = 0; } /* -- cgit From 0083d27b21dd2a598df8275b98f89ced532e2e53 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Tue, 6 Sep 2022 09:38:42 -1000 Subject: cgroup: Improve cftype add/rm error handling Let's track whether a cftype is currently added or not using a new flag __CFTYPE_ADDED so that duplicate operations can be failed safely and consistently allow using empty cftypes. Signed-off-by: Tejun Heo --- include/linux/cgroup-defs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 1283993d7ea8..9fa1652d0493 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -131,6 +131,7 @@ enum { /* internal flags, do not use outside cgroup core proper */ __CFTYPE_ONLY_ON_DFL = (1 << 16), /* only on default hierarchy */ __CFTYPE_NOT_ON_DFL = (1 << 17), /* not on default hierarchy */ + __CFTYPE_ADDED = (1 << 18), }; /* -- cgit From 8a693f7766f9e27c390c5fec8a5db1f9d1d8177e Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Tue, 6 Sep 2022 09:38:55 -1000 Subject: cgroup: Remove CFTYPE_PRESSURE CFTYPE_PRESSURE is used to flag PSI related files so that they are not created if PSI is disabled during boot. It's a bit weird to use a generic flag to mark a specific file type. Let's instead move the PSI files into its own cftypes array and add/rm them conditionally. This is a bit more code but cleaner. No userland visible changes. Signed-off-by: Tejun Heo Cc: Johannes Weiner --- include/linux/cgroup-defs.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 9fa1652d0493..8f481d1b159a 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -126,7 +126,6 @@ enum { CFTYPE_NO_PREFIX = (1 << 3), /* (DON'T USE FOR NEW FILES) no subsys prefix */ CFTYPE_WORLD_WRITABLE = (1 << 4), /* (DON'T USE FOR NEW FILES) S_IWUGO */ CFTYPE_DEBUG = (1 << 5), /* create when cgroup_debug */ - CFTYPE_PRESSURE = (1 << 6), /* only if pressure feature is enabled */ /* internal flags, do not use outside cgroup core proper */ __CFTYPE_ONLY_ON_DFL = (1 << 16), /* only on default hierarchy */ -- cgit From 14db0b3c7b837f4edeb7c1794290c2f345c7f627 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Mon, 15 Aug 2022 16:50:51 -0700 Subject: fscrypt: stop using PG_error to track error status As a step towards freeing the PG_error flag for other uses, change ext4 and f2fs to stop using PG_error to track decryption errors. Instead, if a decryption error occurs, just mark the whole bio as failed. The coarser granularity isn't really a problem since it isn't any worse than what the block layer provides, and errors from a multi-page readahead aren't reported to applications unless a single-page read fails too. Signed-off-by: Eric Biggers Reviewed-by: Chao Yu # for f2fs part Link: https://lore.kernel.org/r/20220815235052.86545-2-ebiggers@kernel.org --- include/linux/fscrypt.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h index b95b8601b9c1..488fd8c8f8af 100644 --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -351,7 +351,7 @@ u64 fscrypt_fname_siphash(const struct inode *dir, const struct qstr *name); int fscrypt_d_revalidate(struct dentry *dentry, unsigned int flags); /* bio.c */ -void fscrypt_decrypt_bio(struct bio *bio); +bool fscrypt_decrypt_bio(struct bio *bio); int fscrypt_zeroout_range(const struct inode *inode, pgoff_t lblk, sector_t pblk, unsigned int len); @@ -644,8 +644,9 @@ static inline int fscrypt_d_revalidate(struct dentry *dentry, } /* bio.c */ -static inline void fscrypt_decrypt_bio(struct bio *bio) +static inline bool fscrypt_decrypt_bio(struct bio *bio) { + return true; } static inline int fscrypt_zeroout_range(const struct inode *inode, pgoff_t lblk, -- cgit From 720e6a435194fb5237833a4a7ec6aa60a78964a8 Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Wed, 31 Aug 2022 08:26:46 -0700 Subject: bpf: Allow struct argument in trampoline based programs Allow struct argument in trampoline based programs where the struct size should be <= 16 bytes. In such cases, the argument will be put into up to 2 registers for bpf, x86_64 and arm64 architectures. To support arch-specific trampoline manipulation, add arg_flags for additional struct information about arguments in btf_func_model. Such information will be used in arch specific function arch_prepare_bpf_trampoline() to prepare argument access properly in trampoline. Signed-off-by: Yonghong Song Link: https://lore.kernel.org/r/20220831152646.2078089-1-yhs@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 9c1674973e03..4d32f125f4af 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -727,10 +727,14 @@ enum bpf_cgroup_storage_type { */ #define MAX_BPF_FUNC_REG_ARGS 5 +/* The argument is a structure. */ +#define BTF_FMODEL_STRUCT_ARG BIT(0) + struct btf_func_model { u8 ret_size; u8 nr_args; u8 arg_size[MAX_BPF_FUNC_ARGS]; + u8 arg_flags[MAX_BPF_FUNC_ARGS]; }; /* Restore arguments before returning from trampoline to let original function -- cgit From be376df724aa3b7abdf79390eaab60c58a92f4f0 Mon Sep 17 00:00:00 2001 From: Marek Vasut Date: Sat, 27 Aug 2022 04:49:03 +0200 Subject: wifi: brcmfmac: add 43439 SDIO ids and initialization Add HW and SDIO ids for use with the muRata 1YN (Cypress CYW43439). Add the firmware mapping structures for the CYW43439 chipset. The 43439 needs some things setup similar to the 43430 chipset. Signed-off-by: Marek Vasut Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/20220827024903.617294-1-marex@denx.de --- include/linux/mmc/sdio_ids.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mmc/sdio_ids.h b/include/linux/mmc/sdio_ids.h index 53f0efa0bccf..74f9d9a6d330 100644 --- a/include/linux/mmc/sdio_ids.h +++ b/include/linux/mmc/sdio_ids.h @@ -74,6 +74,7 @@ #define SDIO_DEVICE_ID_BROADCOM_43362 0xa962 #define SDIO_DEVICE_ID_BROADCOM_43364 0xa9a4 #define SDIO_DEVICE_ID_BROADCOM_43430 0xa9a6 +#define SDIO_DEVICE_ID_BROADCOM_CYPRESS_43439 0xa9af #define SDIO_DEVICE_ID_BROADCOM_43455 0xa9bf #define SDIO_DEVICE_ID_BROADCOM_CYPRESS_43752 0xaae8 -- cgit From 9fc18f6d56d5b79d527c17a8100a0965d18345cf Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Sun, 21 Aug 2022 16:06:44 +0200 Subject: dma-mapping: mark dma_supported static Now that the remaining users in drivers are gone, this function can be marked static. Signed-off-by: Christoph Hellwig --- include/linux/dma-mapping.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 25a30906289d..0ee20b764000 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -139,7 +139,6 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs); bool dma_can_mmap(struct device *dev); -int dma_supported(struct device *dev, u64 mask); bool dma_pci_p2pdma_supported(struct device *dev); int dma_set_mask(struct device *dev, u64 mask); int dma_set_coherent_mask(struct device *dev, u64 mask); @@ -248,10 +247,6 @@ static inline bool dma_can_mmap(struct device *dev) { return false; } -static inline int dma_supported(struct device *dev, u64 mask) -{ - return 0; -} static inline bool dma_pci_p2pdma_supported(struct device *dev) { return false; -- cgit From 283945017cbf685546ba7d065f254ad77eb888b1 Mon Sep 17 00:00:00 2001 From: Yuan Can Date: Mon, 15 Aug 2022 01:33:39 +0000 Subject: iommu: Remove comment of dev_has_feat in struct doc dev_has_feat has been removed from iommu_ops in commit 309c56e84602 ("iommu: remove the unused dev_has_feat method"), remove its description in the struct doc. Signed-off-by: Yuan Can Link: https://lore.kernel.org/r/20220815013339.2552-1-yuancan@huawei.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index ea30f00dc145..a004e87e0e1d 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -212,7 +212,7 @@ struct iommu_iotlb_gather { * @of_xlate: add OF master IDs to iommu grouping * @is_attach_deferred: Check if domain attach should be deferred from iommu * driver init to device driver init (default no) - * @dev_has/enable/disable_feat: per device entries to check/enable/disable + * @dev_enable/disable_feat: per device entries to enable/disable * iommu specific features. * @sva_bind: Bind process address space to device * @sva_unbind: Unbind process address space from device -- cgit From a1be74c5384c4a47d91ab4af2ed854d3b7eaf763 Mon Sep 17 00:00:00 2001 From: Yishai Hadas Date: Mon, 5 Sep 2022 13:58:43 +0300 Subject: net/mlx5: Introduce ifc bits for page tracker Introduce ifc related stuff to enable using page tracker. A page tracker is a dirty page tracking object used by the device to report the tracking log. Signed-off-by: Yishai Hadas Link: https://lore.kernel.org/r/20220905105852.26398-2-yishaih@nvidia.com Signed-off-by: Leon Romanovsky --- include/linux/mlx5/mlx5_ifc.h | 83 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 82 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 4acd5610e96b..06eab92b9fb3 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -89,6 +89,7 @@ enum { MLX5_OBJ_TYPE_VIRTIO_NET_Q = 0x000d, MLX5_OBJ_TYPE_VIRTIO_Q_COUNTERS = 0x001c, MLX5_OBJ_TYPE_MATCH_DEFINER = 0x0018, + MLX5_OBJ_TYPE_PAGE_TRACK = 0x46, MLX5_OBJ_TYPE_MKEY = 0xff01, MLX5_OBJ_TYPE_QP = 0xff02, MLX5_OBJ_TYPE_PSV = 0xff03, @@ -1733,7 +1734,9 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 max_geneve_tlv_options[0x8]; u8 reserved_at_568[0x3]; u8 max_geneve_tlv_option_data_len[0x5]; - u8 reserved_at_570[0x10]; + u8 reserved_at_570[0x9]; + u8 adv_virtualization[0x1]; + u8 reserved_at_57a[0x6]; u8 reserved_at_580[0xb]; u8 log_max_dci_stream_channels[0x5]; @@ -11818,4 +11821,82 @@ struct mlx5_ifc_load_vhca_state_out_bits { u8 reserved_at_40[0x40]; }; +struct mlx5_ifc_adv_virtualization_cap_bits { + u8 reserved_at_0[0x3]; + u8 pg_track_log_max_num[0x5]; + u8 pg_track_max_num_range[0x8]; + u8 pg_track_log_min_addr_space[0x8]; + u8 pg_track_log_max_addr_space[0x8]; + + u8 reserved_at_20[0x3]; + u8 pg_track_log_min_msg_size[0x5]; + u8 reserved_at_28[0x3]; + u8 pg_track_log_max_msg_size[0x5]; + u8 reserved_at_30[0x3]; + u8 pg_track_log_min_page_size[0x5]; + u8 reserved_at_38[0x3]; + u8 pg_track_log_max_page_size[0x5]; + + u8 reserved_at_40[0x7c0]; +}; + +struct mlx5_ifc_page_track_report_entry_bits { + u8 dirty_address_high[0x20]; + + u8 dirty_address_low[0x20]; +}; + +enum { + MLX5_PAGE_TRACK_STATE_TRACKING, + MLX5_PAGE_TRACK_STATE_REPORTING, + MLX5_PAGE_TRACK_STATE_ERROR, +}; + +struct mlx5_ifc_page_track_range_bits { + u8 start_address[0x40]; + + u8 length[0x40]; +}; + +struct mlx5_ifc_page_track_bits { + u8 modify_field_select[0x40]; + + u8 reserved_at_40[0x10]; + u8 vhca_id[0x10]; + + u8 reserved_at_60[0x20]; + + u8 state[0x4]; + u8 track_type[0x4]; + u8 log_addr_space_size[0x8]; + u8 reserved_at_90[0x3]; + u8 log_page_size[0x5]; + u8 reserved_at_98[0x3]; + u8 log_msg_size[0x5]; + + u8 reserved_at_a0[0x8]; + u8 reporting_qpn[0x18]; + + u8 reserved_at_c0[0x18]; + u8 num_ranges[0x8]; + + u8 reserved_at_e0[0x20]; + + u8 range_start_address[0x40]; + + u8 length[0x40]; + + struct mlx5_ifc_page_track_range_bits track_range[0]; +}; + +struct mlx5_ifc_create_page_track_obj_in_bits { + struct mlx5_ifc_general_obj_in_cmd_hdr_bits general_obj_in_cmd_hdr; + struct mlx5_ifc_page_track_bits obj_context; +}; + +struct mlx5_ifc_modify_page_track_obj_in_bits { + struct mlx5_ifc_general_obj_in_cmd_hdr_bits general_obj_in_cmd_hdr; + struct mlx5_ifc_page_track_bits obj_context; +}; + #endif /* MLX5_IFC_H */ -- cgit From 939838632b9119614128028eaea3b1d7bf29f16f Mon Sep 17 00:00:00 2001 From: Yishai Hadas Date: Mon, 5 Sep 2022 13:58:44 +0300 Subject: net/mlx5: Query ADV_VIRTUALIZATION capabilities Query ADV_VIRTUALIZATION capabilities which provide information for advanced virtualization related features. Current capabilities refer to the page tracker object which is used for tracking the pages that are dirtied by the device. Signed-off-by: Yishai Hadas Link: https://lore.kernel.org/r/20220905105852.26398-3-yishaih@nvidia.com Signed-off-by: Leon Romanovsky --- include/linux/mlx5/device.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index b5f58fd37a0f..5b41b9fb3d48 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1200,6 +1200,7 @@ enum mlx5_cap_type { MLX5_CAP_DEV_SHAMPO = 0x1d, MLX5_CAP_GENERAL_2 = 0x20, MLX5_CAP_PORT_SELECTION = 0x25, + MLX5_CAP_ADV_VIRTUALIZATION = 0x26, /* NUM OF CAP Types */ MLX5_CAP_NUM }; @@ -1365,6 +1366,14 @@ enum mlx5_qcam_feature_groups { MLX5_GET(port_selection_cap, \ mdev->caps.hca[MLX5_CAP_PORT_SELECTION]->max, cap) +#define MLX5_CAP_ADV_VIRTUALIZATION(mdev, cap) \ + MLX5_GET(adv_virtualization_cap, \ + mdev->caps.hca[MLX5_CAP_ADV_VIRTUALIZATION]->cur, cap) + +#define MLX5_CAP_ADV_VIRTUALIZATION_MAX(mdev, cap) \ + MLX5_GET(adv_virtualization_cap, \ + mdev->caps.hca[MLX5_CAP_ADV_VIRTUALIZATION]->max, cap) + #define MLX5_CAP_FLOWTABLE_PORT_SELECTION(mdev, cap) \ MLX5_CAP_PORT_SELECTION(mdev, flow_table_properties_port_selection.cap) -- cgit From 2d4697375dea514be202abf563a9419e74489c25 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Thu, 1 Sep 2022 00:27:42 +0300 Subject: swab: Add array operations For now, some simple array operations to swab. Signed-off-by: Andy Shevchenko Reviewed-by: Linus Walleij Link: https://lore.kernel.org/r/20220831212744.56435-1-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown --- include/linux/swab.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) (limited to 'include/linux') diff --git a/include/linux/swab.h b/include/linux/swab.h index bcff5149861a..9b804dbb0d79 100644 --- a/include/linux/swab.h +++ b/include/linux/swab.h @@ -20,4 +20,29 @@ # define swab64s __swab64s # define swahw32s __swahw32s # define swahb32s __swahb32s + +static inline void swab16_array(u16 *buf, unsigned int words) +{ + while (words--) { + swab16s(buf); + buf++; + } +} + +static inline void swab32_array(u32 *buf, unsigned int words) +{ + while (words--) { + swab32s(buf); + buf++; + } +} + +static inline void swab64_array(u64 *buf, unsigned int words) +{ + while (words--) { + swab64s(buf); + buf++; + } +} + #endif /* _LINUX_SWAB_H */ -- cgit From 359ad15763762c713a51300134e784a72eb9cb80 Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Mon, 15 Aug 2022 16:26:49 +0100 Subject: iommu: Retire iommu_capable() With all callers now converted to the device-specific version, retire the old bus-based interface, and give drivers the chance to indicate accurate per-instance capabilities. Signed-off-by: Robin Murphy Reviewed-by: Lu Baolu Link: https://lore.kernel.org/r/d8bd8777d06929ad8f49df7fc80e1b9af32a41b5.1660574547.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index a004e87e0e1d..0013a1befe7b 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -227,7 +227,7 @@ struct iommu_iotlb_gather { * @owner: Driver module providing these ops */ struct iommu_ops { - bool (*capable)(enum iommu_cap); + bool (*capable)(struct device *dev, enum iommu_cap); /* Domain allocation and freeing by the iommu driver */ struct iommu_domain *(*domain_alloc)(unsigned iommu_domain_type); @@ -420,7 +420,6 @@ extern int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops); extern int bus_iommu_probe(struct bus_type *bus); extern bool iommu_present(struct bus_type *bus); extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap); -extern bool iommu_capable(struct bus_type *bus, enum iommu_cap cap); extern struct iommu_domain *iommu_domain_alloc(struct bus_type *bus); extern struct iommu_group *iommu_group_get_by_id(int id); extern void iommu_domain_free(struct iommu_domain *domain); @@ -697,11 +696,6 @@ static inline bool device_iommu_capable(struct device *dev, enum iommu_cap cap) return false; } -static inline bool iommu_capable(struct bus_type *bus, enum iommu_cap cap) -{ - return false; -} - static inline struct iommu_domain *iommu_domain_alloc(struct bus_type *bus) { return NULL; -- cgit From 29e932295bfaba792d29e66e8be0637ff3994724 Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Mon, 15 Aug 2022 17:20:17 +0100 Subject: iommu: Clean up bus_set_iommu() Clean up the remaining trivial bus_set_iommu() callsites along with the implementation. Now drivers only have to know and care about iommu_device instances, phew! Reviewed-by: Kevin Tian Tested-by: Matthew Rosato # s390 Tested-by: Niklas Schnelle # s390 Signed-off-by: Robin Murphy Link: https://lore.kernel.org/r/ea383d5f4d74ffe200ab61248e5de6e95846180a.1660572783.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 0013a1befe7b..35810f87ae74 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -416,7 +416,6 @@ static inline const struct iommu_ops *dev_iommu_ops(struct device *dev) return dev->iommu->iommu_dev->ops; } -extern int bus_set_iommu(struct bus_type *bus, const struct iommu_ops *ops); extern int bus_iommu_probe(struct bus_type *bus); extern bool iommu_present(struct bus_type *bus); extern bool device_iommu_capable(struct device *dev, enum iommu_cap cap); -- cgit From fa49364cd5e65797014688cba623541083b3e5b0 Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Tue, 16 Aug 2022 18:28:04 +0100 Subject: iommu/dma: Move public interfaces to linux/iommu.h The iommu-dma layer is now mostly encapsulated by iommu_dma_ops, with only a couple more public interfaces left pertaining to MSI integration. Since these depend on the main IOMMU API header anyway, move their declarations there, taking the opportunity to update the half-baked comments to proper kerneldoc along the way. Signed-off-by: Robin Murphy Acked-by: Catalin Marinas Acked-by: Marc Zyngier Link: https://lore.kernel.org/r/9cd99738f52094e6bed44bfee03fa4f288d20695.1660668998.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel --- include/linux/dma-iommu.h | 40 ---------------------------------------- include/linux/iommu.h | 36 ++++++++++++++++++++++++++++++++++++ 2 files changed, 36 insertions(+), 40 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h index 24607dc3c2ac..e83de4f1f3d6 100644 --- a/include/linux/dma-iommu.h +++ b/include/linux/dma-iommu.h @@ -15,27 +15,10 @@ /* Domain management interface for IOMMU drivers */ int iommu_get_dma_cookie(struct iommu_domain *domain); -int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base); void iommu_put_dma_cookie(struct iommu_domain *domain); -/* Setup call for arch DMA mapping code */ -void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit); int iommu_dma_init_fq(struct iommu_domain *domain); -/* The DMA API isn't _quite_ the whole story, though... */ -/* - * iommu_dma_prepare_msi() - Map the MSI page in the IOMMU device - * - * The MSI page will be stored in @desc. - * - * Return: 0 on success otherwise an error describing the failure. - */ -int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr); - -/* Update the MSI message if required. */ -void iommu_dma_compose_msi_msg(struct msi_desc *desc, - struct msi_msg *msg); - void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list); void iommu_dma_free_cpu_cached_iovas(unsigned int cpu, @@ -46,15 +29,8 @@ extern bool iommu_dma_forcedac; #else /* CONFIG_IOMMU_DMA */ struct iommu_domain; -struct msi_desc; -struct msi_msg; struct device; -static inline void iommu_setup_dma_ops(struct device *dev, u64 dma_base, - u64 dma_limit) -{ -} - static inline int iommu_dma_init_fq(struct iommu_domain *domain) { return -EINVAL; @@ -65,26 +41,10 @@ static inline int iommu_get_dma_cookie(struct iommu_domain *domain) return -ENODEV; } -static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base) -{ - return -ENODEV; -} - static inline void iommu_put_dma_cookie(struct iommu_domain *domain) { } -static inline int iommu_dma_prepare_msi(struct msi_desc *desc, - phys_addr_t msi_addr) -{ - return 0; -} - -static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc, - struct msi_msg *msg) -{ -} - static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list) { } diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 35810f87ae74..a325532aeab5 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -1063,4 +1063,40 @@ void iommu_debugfs_setup(void); static inline void iommu_debugfs_setup(void) {} #endif +#ifdef CONFIG_IOMMU_DMA +#include + +/* Setup call for arch DMA mapping code */ +void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit); + +int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base); + +int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr); +void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg); + +#else /* CONFIG_IOMMU_DMA */ + +struct msi_desc; +struct msi_msg; + +static inline void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit) +{ +} + +static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base) +{ + return -ENODEV; +} + +static inline int iommu_dma_prepare_msi(struct msi_desc *desc, phys_addr_t msi_addr) +{ + return 0; +} + +static inline void iommu_dma_compose_msi_msg(struct msi_desc *desc, struct msi_msg *msg) +{ +} + +#endif /* CONFIG_IOMMU_DMA */ + #endif /* __LINUX_IOMMU_H */ -- cgit From d1b2234b7fbf94348901bed4ea8d015c8a819d02 Mon Sep 17 00:00:00 2001 From: Lior Nahmanson Date: Mon, 5 Sep 2022 22:21:16 -0700 Subject: net/mlx5: Removed esp_id from struct mlx5_flow_act esp_id is no longer in used Signed-off-by: Lior Nahmanson Signed-off-by: Saeed Mahameed Signed-off-by: David S. Miller --- include/linux/mlx5/fs.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 8e73c377da2c..920cbc9524ad 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -245,7 +245,6 @@ struct mlx5_flow_act { struct mlx5_pkt_reformat *pkt_reformat; union { u32 ipsec_obj_id; - uintptr_t esp_id; }; u32 flags; struct mlx5_fs_vlan vlan[MLX5_FS_VLAN_DEPTH]; -- cgit From e227ee990bf974f655ec3132b496409990f6634b Mon Sep 17 00:00:00 2001 From: Lior Nahmanson Date: Mon, 5 Sep 2022 22:21:17 -0700 Subject: net/mlx5: Generalize Flow Context for new crypto fields In order to support MACsec offload (and maybe some other crypto features in the future), generalize flow action parameters / defines to be used by crypto offlaods other than IPsec. The following changes made: ipsec_obj_id field at flow action context was changed to crypto_obj_id, intreduced a new crypto_type field where IPsec is the default zero type for backward compatibility. Action ipsec_decrypt was changed to crypto_decrypt. Action ipsec_encrypt was changed to crypto_encrypt. IPsec offload code was updated accordingly for backward compatibility. Signed-off-by: Lior Nahmanson Reviewed-by: Raed Salem Signed-off-by: Raed Salem Signed-off-by: Saeed Mahameed Signed-off-by: David S. Miller --- include/linux/mlx5/fs.h | 7 ++++--- include/linux/mlx5/mlx5_ifc.h | 12 ++++++++---- 2 files changed, 12 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 920cbc9524ad..e62d50acb6bd 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -243,9 +243,10 @@ struct mlx5_flow_act { u32 action; struct mlx5_modify_hdr *modify_hdr; struct mlx5_pkt_reformat *pkt_reformat; - union { - u32 ipsec_obj_id; - }; + struct mlx5_flow_act_crypto_params { + u8 type; + u32 obj_id; + } crypto; u32 flags; struct mlx5_fs_vlan vlan[MLX5_FS_VLAN_DEPTH]; struct ib_counters *counters; diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 4acd5610e96b..5758218cb3fa 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -3310,8 +3310,8 @@ enum { MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH = 0x100, MLX5_FLOW_CONTEXT_ACTION_VLAN_POP_2 = 0x400, MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH_2 = 0x800, - MLX5_FLOW_CONTEXT_ACTION_IPSEC_DECRYPT = 0x1000, - MLX5_FLOW_CONTEXT_ACTION_IPSEC_ENCRYPT = 0x2000, + MLX5_FLOW_CONTEXT_ACTION_CRYPTO_DECRYPT = 0x1000, + MLX5_FLOW_CONTEXT_ACTION_CRYPTO_ENCRYPT = 0x2000, MLX5_FLOW_CONTEXT_ACTION_EXECUTE_ASO = 0x4000, }; @@ -3321,6 +3321,10 @@ enum { MLX5_FLOW_CONTEXT_FLOW_SOURCE_LOCAL_VPORT = 0x2, }; +enum { + MLX5_FLOW_CONTEXT_ENCRYPT_DECRYPT_TYPE_IPSEC = 0x0, +}; + struct mlx5_ifc_vlan_bits { u8 ethtype[0x10]; u8 prio[0x3]; @@ -3374,7 +3378,7 @@ struct mlx5_ifc_flow_context_bits { u8 extended_destination[0x1]; u8 reserved_at_81[0x1]; u8 flow_source[0x2]; - u8 reserved_at_84[0x4]; + u8 encrypt_decrypt_type[0x4]; u8 destination_list_size[0x18]; u8 reserved_at_a0[0x8]; @@ -3386,7 +3390,7 @@ struct mlx5_ifc_flow_context_bits { struct mlx5_ifc_vlan_bits push_vlan_2; - u8 ipsec_obj_id[0x20]; + u8 encrypt_decrypt_obj_id[0x20]; u8 reserved_at_140[0xc0]; struct mlx5_ifc_fte_match_param_bits match_value; -- cgit From 8385c51ff5bceb92f7401138e16b0a617ca2ab02 Mon Sep 17 00:00:00 2001 From: Lior Nahmanson Date: Mon, 5 Sep 2022 22:21:18 -0700 Subject: net/mlx5: Introduce MACsec Connect-X offload hardware bits and structures Add MACsec offload related IFC structs, layouts and enumerations. Signed-off-by: Lior Nahmanson Reviewed-by: Raed Salem Signed-off-by: Saeed Mahameed Signed-off-by: David S. Miller --- include/linux/mlx5/device.h | 4 ++ include/linux/mlx5/mlx5_ifc.h | 99 ++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 101 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index b5f58fd37a0f..2927810f172b 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1198,6 +1198,7 @@ enum mlx5_cap_type { MLX5_CAP_DEV_EVENT = 0x14, MLX5_CAP_IPSEC, MLX5_CAP_DEV_SHAMPO = 0x1d, + MLX5_CAP_MACSEC = 0x1f, MLX5_CAP_GENERAL_2 = 0x20, MLX5_CAP_PORT_SELECTION = 0x25, /* NUM OF CAP Types */ @@ -1446,6 +1447,9 @@ enum mlx5_qcam_feature_groups { #define MLX5_CAP_DEV_SHAMPO(mdev, cap)\ MLX5_GET(shampo_cap, mdev->caps.hca_cur[MLX5_CAP_DEV_SHAMPO], cap) +#define MLX5_CAP_MACSEC(mdev, cap)\ + MLX5_GET(macsec_cap, (mdev)->caps.hca[MLX5_CAP_MACSEC]->cur, cap) + enum { MLX5_CMD_STAT_OK = 0x0, MLX5_CMD_STAT_INT_ERR = 0x1, diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 5758218cb3fa..8decbf9a7bdd 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -82,6 +82,7 @@ enum { MLX5_GENERAL_OBJ_TYPES_CAP_SW_ICM = (1ULL << MLX5_OBJ_TYPE_SW_ICM), MLX5_GENERAL_OBJ_TYPES_CAP_GENEVE_TLV_OPT = (1ULL << 11), MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q = (1ULL << 13), + MLX5_GENERAL_OBJ_TYPES_CAP_MACSEC_OFFLOAD = (1ULL << 39), }; enum { @@ -449,7 +450,12 @@ struct mlx5_ifc_flow_table_prop_layout_bits { u8 reserved_at_60[0x2]; u8 reformat_insert[0x1]; u8 reformat_remove[0x1]; - u8 reserver_at_64[0x14]; + u8 macsec_encrypt[0x1]; + u8 macsec_decrypt[0x1]; + u8 reserved_at_66[0x2]; + u8 reformat_add_macsec[0x1]; + u8 reformat_remove_macsec[0x1]; + u8 reserved_at_6a[0xe]; u8 log_max_ft_num[0x8]; u8 reserved_at_80[0x10]; @@ -611,7 +617,11 @@ struct mlx5_ifc_fte_match_set_misc2_bits { u8 metadata_reg_a[0x20]; - u8 reserved_at_1a0[0x60]; + u8 reserved_at_1a0[0x8]; + + u8 macsec_syndrome[0x8]; + + u8 reserved_at_1b0[0x50]; }; struct mlx5_ifc_fte_match_set_misc3_bits { @@ -1276,6 +1286,24 @@ struct mlx5_ifc_ipsec_cap_bits { u8 reserved_at_30[0x7d0]; }; +struct mlx5_ifc_macsec_cap_bits { + u8 macsec_epn[0x1]; + u8 reserved_at_1[0x2]; + u8 macsec_crypto_esp_aes_gcm_256_encrypt[0x1]; + u8 macsec_crypto_esp_aes_gcm_128_encrypt[0x1]; + u8 macsec_crypto_esp_aes_gcm_256_decrypt[0x1]; + u8 macsec_crypto_esp_aes_gcm_128_decrypt[0x1]; + u8 reserved_at_7[0x4]; + u8 log_max_macsec_offload[0x5]; + u8 reserved_at_10[0x10]; + + u8 min_log_macsec_full_replay_window[0x8]; + u8 max_log_macsec_full_replay_window[0x8]; + u8 reserved_at_30[0x10]; + + u8 reserved_at_40[0x7c0]; +}; + enum { MLX5_WQ_TYPE_LINKED_LIST = 0x0, MLX5_WQ_TYPE_CYCLIC = 0x1, @@ -3295,6 +3323,7 @@ union mlx5_ifc_hca_cap_union_bits { struct mlx5_ifc_device_mem_cap_bits device_mem_cap; struct mlx5_ifc_virtio_emulation_cap_bits virtio_emulation_cap; struct mlx5_ifc_shampo_cap_bits shampo_cap; + struct mlx5_ifc_macsec_cap_bits macsec_cap; u8 reserved_at_0[0x8000]; }; @@ -3323,6 +3352,7 @@ enum { enum { MLX5_FLOW_CONTEXT_ENCRYPT_DECRYPT_TYPE_IPSEC = 0x0, + MLX5_FLOW_CONTEXT_ENCRYPT_DECRYPT_TYPE_MACSEC = 0x1, }; struct mlx5_ifc_vlan_bits { @@ -6320,6 +6350,8 @@ enum mlx5_reformat_ctx_type { MLX5_REFORMAT_TYPE_L2_TO_L3_TUNNEL = 0x4, MLX5_REFORMAT_TYPE_INSERT_HDR = 0xf, MLX5_REFORMAT_TYPE_REMOVE_HDR = 0x10, + MLX5_REFORMAT_TYPE_ADD_MACSEC = 0x11, + MLX5_REFORMAT_TYPE_DEL_MACSEC = 0x12, }; struct mlx5_ifc_alloc_packet_reformat_context_in_bits { @@ -11475,6 +11507,7 @@ enum { MLX5_GENERAL_OBJECT_TYPES_IPSEC = 0x13, MLX5_GENERAL_OBJECT_TYPES_SAMPLER = 0x20, MLX5_GENERAL_OBJECT_TYPES_FLOW_METER_ASO = 0x24, + MLX5_GENERAL_OBJECT_TYPES_MACSEC = 0x27, }; enum { @@ -11525,6 +11558,67 @@ struct mlx5_ifc_modify_ipsec_obj_in_bits { struct mlx5_ifc_ipsec_obj_bits ipsec_object; }; +struct mlx5_ifc_macsec_aso_bits { + u8 valid[0x1]; + u8 reserved_at_1[0x1]; + u8 mode[0x2]; + u8 window_size[0x2]; + u8 soft_lifetime_arm[0x1]; + u8 hard_lifetime_arm[0x1]; + u8 remove_flow_enable[0x1]; + u8 epn_event_arm[0x1]; + u8 reserved_at_a[0x16]; + + u8 remove_flow_packet_count[0x20]; + + u8 remove_flow_soft_lifetime[0x20]; + + u8 reserved_at_60[0x80]; + + u8 mode_parameter[0x20]; + + u8 replay_protection_window[8][0x20]; +}; + +struct mlx5_ifc_macsec_offload_obj_bits { + u8 modify_field_select[0x40]; + + u8 confidentiality_en[0x1]; + u8 reserved_at_41[0x1]; + u8 esn_en[0x1]; + u8 esn_overlap[0x1]; + u8 reserved_at_44[0x2]; + u8 confidentiality_offset[0x2]; + u8 reserved_at_48[0x4]; + u8 aso_return_reg[0x4]; + u8 reserved_at_50[0x10]; + + u8 esn_msb[0x20]; + + u8 reserved_at_80[0x8]; + u8 dekn[0x18]; + + u8 reserved_at_a0[0x20]; + + u8 sci[0x40]; + + u8 reserved_at_100[0x8]; + u8 macsec_aso_access_pd[0x18]; + + u8 reserved_at_120[0x60]; + + u8 salt[3][0x20]; + + u8 reserved_at_1e0[0x20]; + + struct mlx5_ifc_macsec_aso_bits macsec_aso; +}; + +struct mlx5_ifc_create_macsec_obj_in_bits { + struct mlx5_ifc_general_obj_in_cmd_hdr_bits general_obj_in_cmd_hdr; + struct mlx5_ifc_macsec_offload_obj_bits macsec_object; +}; + struct mlx5_ifc_encryption_key_obj_bits { u8 modify_field_select[0x40]; @@ -11642,6 +11736,7 @@ enum { enum { MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_TYPE_TLS = 0x1, MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_TYPE_IPSEC = 0x2, + MLX5_GENERAL_OBJECT_TYPE_ENCRYPTION_KEY_TYPE_MACSEC = 0x4, }; struct mlx5_ifc_tls_static_params_bits { -- cgit From ee534d7f81ba9cec028580f91429b3dc29b90c7f Mon Sep 17 00:00:00 2001 From: Lior Nahmanson Date: Mon, 5 Sep 2022 22:21:20 -0700 Subject: net/mlx5: Add MACsec Tx tables support to fs_core Changed EGRESS_KERNEL namespace to EGRESS_IPSEC and add new namespace for MACsec TX. This namespace should be the last namespace for transmitted packets. Signed-off-by: Lior Nahmanson Reviewed-by: Raed Salem Signed-off-by: Saeed Mahameed Signed-off-by: David S. Miller --- include/linux/mlx5/fs.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index e62d50acb6bd..53d186774206 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -92,7 +92,8 @@ enum mlx5_flow_namespace_type { MLX5_FLOW_NAMESPACE_SNIFFER_RX, MLX5_FLOW_NAMESPACE_SNIFFER_TX, MLX5_FLOW_NAMESPACE_EGRESS, - MLX5_FLOW_NAMESPACE_EGRESS_KERNEL, + MLX5_FLOW_NAMESPACE_EGRESS_IPSEC, + MLX5_FLOW_NAMESPACE_EGRESS_MACSEC, MLX5_FLOW_NAMESPACE_RDMA_RX, MLX5_FLOW_NAMESPACE_RDMA_RX_KERNEL, MLX5_FLOW_NAMESPACE_RDMA_TX, -- cgit From e467b283ffd50cf15b84c73eef68787e257eaed5 Mon Sep 17 00:00:00 2001 From: Lior Nahmanson Date: Mon, 5 Sep 2022 22:21:21 -0700 Subject: net/mlx5e: Add MACsec TX steering rules Tx flow steering consists of two flow tables (FTs). The first FT (crypto table) has two fixed rules: One default miss rule so non MACsec offloaded packets bypass the MACSec tables, another rule to make sure that MACsec key exchange (MKE) traffic passes unencrypted as expected (matched of ethertype). On each new MACsec offload flow, a new MACsec rule is added. This rule is matched on metadata_reg_a (which contains the id of the flow) and invokes the MACsec offload action on match. The second FT (check table) has two fixed rules: One rule for verifying that the previous offload actions were finished successfully and packet need to be transmitted. Another default rule for dropping packets that were failed in the offload actions. The MACsec FTs should be created on demand when the first MACsec rule is added and destroyed when the last MACsec rule is deleted. Signed-off-by: Lior Nahmanson Reviewed-by: Raed Salem Signed-off-by: Raed Salem Signed-off-by: Saeed Mahameed Signed-off-by: David S. Miller --- include/linux/mlx5/qp.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h index 8bda3ba5b109..be640c749d5e 100644 --- a/include/linux/mlx5/qp.h +++ b/include/linux/mlx5/qp.h @@ -252,6 +252,7 @@ enum { enum { MLX5_ETH_WQE_FT_META_IPSEC = BIT(0), + MLX5_ETH_WQE_FT_META_MACSEC = BIT(1), }; struct mlx5_wqe_eth_seg { -- cgit From 15d187e285b3d5f4fe0328c09566355c1e387ff6 Mon Sep 17 00:00:00 2001 From: Lior Nahmanson Date: Mon, 5 Sep 2022 22:21:24 -0700 Subject: net/mlx5: Add MACsec Rx tables support to fs_core Add new namespace for MACsec RX flows. Encrypted MACsec packets should be first decrypted and stripped from MACsec header and then continues with the kernel's steering pipeline. Signed-off-by: Lior Nahmanson Reviewed-by: Raed Salem Signed-off-by: Saeed Mahameed Signed-off-by: David S. Miller --- include/linux/mlx5/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 53d186774206..c7a91981cd5a 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -79,6 +79,7 @@ static inline void build_leftovers_ft_param(int *priority, enum mlx5_flow_namespace_type { MLX5_FLOW_NAMESPACE_BYPASS, + MLX5_FLOW_NAMESPACE_KERNEL_RX_MACSEC, MLX5_FLOW_NAMESPACE_LAG, MLX5_FLOW_NAMESPACE_OFFLOADS, MLX5_FLOW_NAMESPACE_ETHTOOL, -- cgit From aaac38f614871df252aa7459647bf68d42f7c3e7 Mon Sep 17 00:00:00 2001 From: Vasant Hegde Date: Thu, 25 Aug 2022 06:39:36 +0000 Subject: iommu/amd: Initial support for AMD IOMMU v2 page table Introduce IO page table framework support for AMD IOMMU v2 page table. This patch implements 4 level page table within iommu amd driver and supports 4K/2M/1G page sizes. Signed-off-by: Vasant Hegde Link: https://lore.kernel.org/r/20220825063939.8360-7-vasant.hegde@amd.com Signed-off-by: Joerg Roedel --- include/linux/io-pgtable.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index ca98aeadcc80..ffe616db9409 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -16,6 +16,7 @@ enum io_pgtable_fmt { ARM_V7S, ARM_MALI_LPAE, AMD_IOMMU_V1, + AMD_IOMMU_V2, APPLE_DART, IO_PGTABLE_NUM_FMTS, }; @@ -260,6 +261,7 @@ extern struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns; extern struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns; extern struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns; extern struct io_pgtable_init_fns io_pgtable_amd_iommu_v1_init_fns; +extern struct io_pgtable_init_fns io_pgtable_amd_iommu_v2_init_fns; extern struct io_pgtable_init_fns io_pgtable_apple_dart_init_fns; #endif /* __IO_PGTABLE_H */ -- cgit From 5e0531f6b90ac096fedaf5bd0eae0bb4e5a39da5 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Wed, 7 Sep 2022 16:11:25 +0200 Subject: spi: Add capability to perform some transfer with chipselect off Some components require a few clock cycles with chipselect off before or/and after the data transfer done with CS on. Typically IDT 801034 QUAD PCM CODEC datasheet states "Note *: CCLK should have one cycle before CS goes low, and two cycles after CS goes high". The cycles "before" are implicitely provided by all previous activity on the SPI bus. But the cycles "after" must be provided in order to terminate the SPI transfer. In order to use that kind of component, add a cs_off flag to spi_transfer struct. When this flag is set, the transfer is performed with chipselect off. This allows consummer to add a dummy transfer at the end of the transfer list which is performed with chipselect OFF, providing the required additional clock cycles. Signed-off-by: Christophe Leroy Link: https://lore.kernel.org/r/434165c46f06d802690208a11e7ea2500e8da4c7.1662558898.git.christophe.leroy@csgroup.eu Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index e6c73d5ff1a8..6e6c62c59957 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -847,6 +847,7 @@ struct spi_res { * for this transfer. If 0 the default (from @spi_device) is used. * @dummy_data: indicates transfer is dummy bytes transfer. * @cs_change: affects chipselect after this transfer completes + * @cs_off: performs the transfer with chipselect off. * @cs_change_delay: delay between cs deassert and assert when * @cs_change is set and @spi_transfer is not the last in @spi_message * @delay: delay to be introduced after this transfer before @@ -959,6 +960,7 @@ struct spi_transfer { unsigned cs_change:1; unsigned tx_nbits:3; unsigned rx_nbits:3; + unsigned cs_off:1; #define SPI_NBITS_SINGLE 0x01 /* 1bit transfer */ #define SPI_NBITS_DUAL 0x02 /* 2bits transfer */ #define SPI_NBITS_QUAD 0x04 /* 4bits transfer */ -- cgit From 787f51f210ebe5d7d5e4c8101b148f0018e9c409 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Thu, 1 Sep 2022 10:16:49 +0200 Subject: USB/ARM: Switch S3C2410 UDC to GPIO descriptors This converts the S3C2410 UDC USB device controller to use GPIO descriptor tables and modern GPIO. Cc: Krzysztof Kozlowski Cc: Alim Akhtar Cc: linux-arm-kernel@lists.infradead.org Cc: linux-samsung-soc@vger.kernel.org Acked-by: Krzysztof Kozlowski Signed-off-by: Linus Walleij Link: https://lore.kernel.org/r/20220901081649.564348-1-linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/platform_data/usb-s3c2410_udc.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/usb-s3c2410_udc.h b/include/linux/platform_data/usb-s3c2410_udc.h index 07394819d03b..c0fbe1fe3426 100644 --- a/include/linux/platform_data/usb-s3c2410_udc.h +++ b/include/linux/platform_data/usb-s3c2410_udc.h @@ -22,12 +22,6 @@ enum s3c2410_udc_cmd_e { struct s3c2410_udc_mach_info { void (*udc_command)(enum s3c2410_udc_cmd_e); void (*vbus_draw)(unsigned int ma); - - unsigned int pullup_pin; - unsigned int pullup_pin_inverted; - - unsigned int vbus_pin; - unsigned char vbus_pin_inverted; }; extern void __init s3c24xx_udc_set_platdata(struct s3c2410_udc_mach_info *); -- cgit From e77cab77f2cb3a1ca2ba8df4af45bb35617ac16d Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Thu, 1 Sep 2022 17:39:32 +0300 Subject: serial: Create uart_xmit_advance() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit A very common pattern in the drivers is to advance xmit tail index and do bookkeeping of Tx'ed characters. Create uart_xmit_advance() to handle it. Reviewed-by: Andy Shevchenko Cc: stable Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220901143934.8850-2-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index 6e4f4765d209..1eaea9fe44d8 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -624,6 +624,23 @@ struct uart_state { /* number of characters left in xmit buffer before we ask for more */ #define WAKEUP_CHARS 256 +/** + * uart_xmit_advance - Advance xmit buffer and account Tx'ed chars + * @up: uart_port structure describing the port + * @chars: number of characters sent + * + * This function advances the tail of circular xmit buffer by the number of + * @chars transmitted and handles accounting of transmitted bytes (into + * @up's icount.tx). + */ +static inline void uart_xmit_advance(struct uart_port *up, unsigned int chars) +{ + struct circ_buf *xmit = &up->state->xmit; + + xmit->tail = (xmit->tail + chars) & (UART_XMIT_SIZE - 1); + up->icount.tx += chars; +} + struct module; struct tty_driver; -- cgit From 85d6b66d31c35158364058ee98fb69ab5bb6a6b1 Mon Sep 17 00:00:00 2001 From: Jim Cromie Date: Sun, 4 Sep 2022 15:40:39 -0600 Subject: dyndbg: fix module.dyndbg handling For CONFIG_DYNAMIC_DEBUG=N, the ddebug_dyndbg_module_param_cb() stub-fn is too permissive: bash-5.1# modprobe drm JUNKdyndbg bash-5.1# modprobe drm dyndbgJUNK [ 42.933220] dyndbg param is supported only in CONFIG_DYNAMIC_DEBUG builds [ 42.937484] ACPI: bus type drm_connector registered This caused no ill effects, because unknown parameters are either ignored by default with an "unknown parameter" warning, or ignored because dyndbg allows its no-effect use on non-dyndbg builds. But since the code has an explicit feedback message, it should be issued accurately. Fix with strcmp for exact param-name match. Fixes: b48420c1d301 dynamic_debug: make dynamic-debug work for module initialization Reported-by: Rasmus Villemoes Acked-by: Jason Baron Acked-by: Daniel Vetter Signed-off-by: Jim Cromie Link: https://lore.kernel.org/r/20220904214134.408619-3-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman --- include/linux/dynamic_debug.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h index dce631e678dd..f30b01aa9fa4 100644 --- a/include/linux/dynamic_debug.h +++ b/include/linux/dynamic_debug.h @@ -201,7 +201,7 @@ static inline int ddebug_remove_module(const char *mod) static inline int ddebug_dyndbg_module_param_cb(char *param, char *val, const char *modname) { - if (strstr(param, "dyndbg")) { + if (!strcmp(param, "dyndbg")) { /* avoid pr_warn(), which wants pr_fmt() fully defined */ printk(KERN_WARNING "dyndbg param is supported only in " "CONFIG_DYNAMIC_DEBUG builds\n"); -- cgit From e26ef3af964acfea311403126acee8c56c89e26b Mon Sep 17 00:00:00 2001 From: Jim Cromie Date: Sun, 4 Sep 2022 15:40:46 -0600 Subject: dyndbg: drop EXPORTed dynamic_debug_exec_queries This exported fn is unused, and will not be needed. Lets dump it. The export was added to let drm control pr_debugs, as part of using them to avoid drm_debug_enabled overheads. But its better to just implement the drm.debug bitmap interface, then its available for everyone. Fixes: a2d375eda771 ("dyndbg: refine export, rename to dynamic_debug_exec_queries()") Fixes: 4c0d77828d4f ("dyndbg: export ddebug_exec_queries") Acked-by: Jason Baron Acked-by: Daniel Vetter Signed-off-by: Jim Cromie Link: https://lore.kernel.org/r/20220904214134.408619-10-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman --- include/linux/dynamic_debug.h | 9 --------- 1 file changed, 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h index f30b01aa9fa4..8d9eec5f6d8b 100644 --- a/include/linux/dynamic_debug.h +++ b/include/linux/dynamic_debug.h @@ -55,9 +55,6 @@ struct _ddebug { #if defined(CONFIG_DYNAMIC_DEBUG_CORE) -/* exported for module authors to exercise >control */ -int dynamic_debug_exec_queries(const char *query, const char *modname); - int ddebug_add_module(struct _ddebug *tab, unsigned int n, const char *modname); extern int ddebug_remove_module(const char *mod_name); @@ -221,12 +218,6 @@ static inline int ddebug_dyndbg_module_param_cb(char *param, char *val, rowsize, groupsize, buf, len, ascii); \ } while (0) -static inline int dynamic_debug_exec_queries(const char *query, const char *modname) -{ - pr_warn("kernel not built with CONFIG_DYNAMIC_DEBUG_CORE\n"); - return 0; -} - #endif /* !CONFIG_DYNAMIC_DEBUG_CORE */ #endif -- cgit From b7b4eebdba7b6aea6b34dc29691b71c39d1dbd6a Mon Sep 17 00:00:00 2001 From: Jim Cromie Date: Sun, 4 Sep 2022 15:40:48 -0600 Subject: dyndbg: gather __dyndbg[] state into struct _ddebug_info This new struct composes the linker provided (vector,len) section, and provides a place to add other __dyndbg[] state-data later: descs - the vector of descriptors in __dyndbg section. num_descs - length of the data/section. Use it, in several different ways, as follows: In lib/dynamic_debug.c: ddebug_add_module(): Alter params-list, replacing 2 args (array,index) with a struct _ddebug_info * containing them both, with room for expansion. This helps future-proof the function prototype against the looming addition of class-map info into the dyndbg-state, by providing a place to add more member fields later. NB: later add static struct _ddebug_info builtins_state declaration, not needed yet. ddebug_add_module() is called in 2 contexts: In dynamic_debug_init(), declare, init a struct _ddebug_info di auto-var to use as a cursor. Then iterate over the prdbg blocks of the builtin modules, and update the di cursor before calling _add_module for each. Its called from kernel/module/main.c:load_info() for each loaded module: In internal.h, alter struct load_info, replacing the dyndbg array,len fields with an embedded _ddebug_info containing them both; and populate its members in find_module_sections(). The 2 calling contexts differ in that _init deals with contiguous subranges of __dyndbgs[] section, packed together, while loadable modules are added one at a time. So rename ddebug_add_module() into outer/__inner fns, call __inner from _init, and provide the offset into the builtin __dyndbgs[] where the module's prdbgs reside. The cursor provides start, len of the subrange for each. The offset will be used later to pack the results of builtin __dyndbg_sites[] de-duplication, and is 0 and unneeded for loadable modules, Note: kernel/module/main.c includes for struct _ddeubg_info. This might be prone to include loops, since its also included by printk.h. Nothing has broken in robot-land on this. cc: Luis Chamberlain Signed-off-by: Jim Cromie Link: https://lore.kernel.org/r/20220904214134.408619-12-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman --- include/linux/dynamic_debug.h | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h index 8d9eec5f6d8b..6a2001250da1 100644 --- a/include/linux/dynamic_debug.h +++ b/include/linux/dynamic_debug.h @@ -51,12 +51,16 @@ struct _ddebug { #endif } __attribute__((aligned(8))); - +/* encapsulate linker provided built-in (or module) dyndbg data */ +struct _ddebug_info { + struct _ddebug *descs; + unsigned int num_descs; +}; #if defined(CONFIG_DYNAMIC_DEBUG_CORE) -int ddebug_add_module(struct _ddebug *tab, unsigned int n, - const char *modname); +int ddebug_add_module(struct _ddebug_info *dyndbg, const char *modname); + extern int ddebug_remove_module(const char *mod_name); extern __printf(2, 3) void __dynamic_pr_debug(struct _ddebug *descriptor, const char *fmt, ...); @@ -184,8 +188,7 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor, #include #include -static inline int ddebug_add_module(struct _ddebug *tab, unsigned int n, - const char *modname) +static inline int ddebug_add_module(struct _ddebug_info *dinfo, const char *modname) { return 0; } -- cgit From ca90fca7f7b51830dfb95bf655210a1c84588f15 Mon Sep 17 00:00:00 2001 From: Jim Cromie Date: Sun, 4 Sep 2022 15:40:49 -0600 Subject: dyndbg: add class_id to pr_debug callsites DRM issues ~10 exclusive categories of debug messages; to represent this directly in dyndbg, add a new 6-bit field: struct _ddebug.class_id. This gives us 64 classes, which should be more than enough. #> echo 0x012345678 > /sys/module/drm/parameters/debug All existing callsites are initialized with _DPRINTK_CLASS_DFLT, which is 2^6-1. This reserves 0-62 for use in new categorized/class'd pr_debugs, which fits perfectly with natural enums (ints: 0..N). Thats done by extending the init macro: DEFINE_DYNAMIC_DEBUG_METADATA() with _CLS(cls, ...) suffix, and redefing old name using extended name. Then extend the factory macro callchain with _cls() versions to provide the callsite.class_id, with old-names passing _DPRINTK_CLASS_DFLT. This sets us up to create class'd prdebug callsites (class'd callsites are those with .class_id != _DPRINTK_CLASS_DFLT). No behavior change. cc: Rasmus Villemoes Signed-off-by: Jim Cromie Link: https://lore.kernel.org/r/20220904214134.408619-13-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman --- include/linux/dynamic_debug.h | 71 +++++++++++++++++++++++++++++++++---------- 1 file changed, 55 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h index 6a2001250da1..633f4e463766 100644 --- a/include/linux/dynamic_debug.h +++ b/include/linux/dynamic_debug.h @@ -6,6 +6,8 @@ #include #endif +#include + /* * An instance of this structure is created in a special * ELF section at every dynamic debug callsite. At runtime, @@ -21,6 +23,9 @@ struct _ddebug { const char *filename; const char *format; unsigned int lineno:18; +#define CLS_BITS 6 + unsigned int class_id:CLS_BITS; +#define _DPRINTK_CLASS_DFLT ((1 << CLS_BITS) - 1) /* * The flags field controls the behaviour at the callsite. * The bits here are changed dynamically when the user @@ -88,7 +93,7 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor, const struct ib_device *ibdev, const char *fmt, ...); -#define DEFINE_DYNAMIC_DEBUG_METADATA(name, fmt) \ +#define DEFINE_DYNAMIC_DEBUG_METADATA_CLS(name, cls, fmt) \ static struct _ddebug __aligned(8) \ __section("__dyndbg") name = { \ .modname = KBUILD_MODNAME, \ @@ -97,8 +102,14 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor, .format = (fmt), \ .lineno = __LINE__, \ .flags = _DPRINTK_FLAGS_DEFAULT, \ + .class_id = cls, \ _DPRINTK_KEY_INIT \ - } + }; \ + BUILD_BUG_ON_MSG(cls > _DPRINTK_CLASS_DFLT, \ + "classid value overflow") + +#define DEFINE_DYNAMIC_DEBUG_METADATA(name, fmt) \ + DEFINE_DYNAMIC_DEBUG_METADATA_CLS(name, _DPRINTK_CLASS_DFLT, fmt) #ifdef CONFIG_JUMP_LABEL @@ -129,17 +140,34 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor, #endif /* CONFIG_JUMP_LABEL */ -#define __dynamic_func_call(id, fmt, func, ...) do { \ - DEFINE_DYNAMIC_DEBUG_METADATA(id, fmt); \ - if (DYNAMIC_DEBUG_BRANCH(id)) \ - func(&id, ##__VA_ARGS__); \ -} while (0) - -#define __dynamic_func_call_no_desc(id, fmt, func, ...) do { \ - DEFINE_DYNAMIC_DEBUG_METADATA(id, fmt); \ +/* + * Factory macros: ($prefix)dynamic_func_call($suffix) + * + * Lower layer (with __ prefix) gets the callsite metadata, and wraps + * the func inside a debug-branch/static-key construct. Upper layer + * (with _ prefix) does the UNIQUE_ID once, so that lower can ref the + * name/label multiple times, and tie the elements together. + * Multiple flavors: + * (|_cls): adds in _DPRINT_CLASS_DFLT as needed + * (|_no_desc): former gets callsite descriptor as 1st arg (for prdbgs) + */ +#define __dynamic_func_call_cls(id, cls, fmt, func, ...) do { \ + DEFINE_DYNAMIC_DEBUG_METADATA_CLS(id, cls, fmt); \ if (DYNAMIC_DEBUG_BRANCH(id)) \ - func(__VA_ARGS__); \ + func(&id, ##__VA_ARGS__); \ } while (0) +#define __dynamic_func_call(id, fmt, func, ...) \ + __dynamic_func_call_cls(id, _DPRINTK_CLASS_DFLT, fmt, \ + func, ##__VA_ARGS__) + +#define __dynamic_func_call_cls_no_desc(id, cls, fmt, func, ...) do { \ + DEFINE_DYNAMIC_DEBUG_METADATA_CLS(id, cls, fmt); \ + if (DYNAMIC_DEBUG_BRANCH(id)) \ + func(__VA_ARGS__); \ +} while (0) +#define __dynamic_func_call_no_desc(id, fmt, func, ...) \ + __dynamic_func_call_cls_no_desc(id, _DPRINTK_CLASS_DFLT, \ + fmt, func, ##__VA_ARGS__) /* * "Factory macro" for generating a call to func, guarded by a @@ -149,22 +177,33 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor, * the varargs. Note that fmt is repeated in invocations of this * macro. */ +#define _dynamic_func_call_cls(cls, fmt, func, ...) \ + __dynamic_func_call_cls(__UNIQUE_ID(ddebug), cls, fmt, func, ##__VA_ARGS__) #define _dynamic_func_call(fmt, func, ...) \ - __dynamic_func_call(__UNIQUE_ID(ddebug), fmt, func, ##__VA_ARGS__) + _dynamic_func_call_cls(_DPRINTK_CLASS_DFLT, fmt, func, ##__VA_ARGS__) + /* * A variant that does the same, except that the descriptor is not * passed as the first argument to the function; it is only called * with precisely the macro's varargs. */ -#define _dynamic_func_call_no_desc(fmt, func, ...) \ - __dynamic_func_call_no_desc(__UNIQUE_ID(ddebug), fmt, func, ##__VA_ARGS__) +#define _dynamic_func_call_cls_no_desc(cls, fmt, func, ...) \ + __dynamic_func_call_cls_no_desc(__UNIQUE_ID(ddebug), cls, fmt, \ + func, ##__VA_ARGS__) +#define _dynamic_func_call_no_desc(fmt, func, ...) \ + _dynamic_func_call_cls_no_desc(_DPRINTK_CLASS_DFLT, fmt, \ + func, ##__VA_ARGS__) + +#define dynamic_pr_debug_cls(cls, fmt, ...) \ + _dynamic_func_call_cls(cls, fmt, __dynamic_pr_debug, \ + pr_fmt(fmt), ##__VA_ARGS__) #define dynamic_pr_debug(fmt, ...) \ - _dynamic_func_call(fmt, __dynamic_pr_debug, \ + _dynamic_func_call(fmt, __dynamic_pr_debug, \ pr_fmt(fmt), ##__VA_ARGS__) #define dynamic_dev_dbg(dev, fmt, ...) \ - _dynamic_func_call(fmt,__dynamic_dev_dbg, \ + _dynamic_func_call(fmt, __dynamic_dev_dbg, \ dev, fmt, ##__VA_ARGS__) #define dynamic_netdev_dbg(dev, fmt, ...) \ -- cgit From 3fc95d80a536e49e38ba4f79ca60cb4e64f99b3b Mon Sep 17 00:00:00 2001 From: Jim Cromie Date: Sun, 4 Sep 2022 15:40:50 -0600 Subject: dyndbg: add __pr_debug_cls for testing For selftest purposes, add __pr_debug_cls(class, fmt, ...) I didn't think we'd need to define this, since DRM effectively has it already in drm_dbg, drm_devdbg. But test_dynamic_debug needs it in order to demonstrate all the moving parts. Note the __ prefix; its not intended for general use, at least until a need emerges. ISTM the drm.debug model (macro wrappers inserting enum const 1st arg) is the baseline approach. That said, nouveau might want it for easy use in its debug macros. TBD. NB: it does require a builtin-constant class, __pr_debug_cls(i++, ...) is disallowed by compiler. Signed-off-by: Jim Cromie Link: https://lore.kernel.org/r/20220904214134.408619-14-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman --- include/linux/dynamic_debug.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h index 633f4e463766..3c9690da44d9 100644 --- a/include/linux/dynamic_debug.h +++ b/include/linux/dynamic_debug.h @@ -221,6 +221,13 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor, KERN_DEBUG, prefix_str, prefix_type, \ rowsize, groupsize, buf, len, ascii) +/* for test only, generally expect drm.debug style macro wrappers */ +#define __pr_debug_cls(cls, fmt, ...) do { \ + BUILD_BUG_ON_MSG(!__builtin_constant_p(cls), \ + "expecting constant class int/enum"); \ + dynamic_pr_debug_cls(cls, fmt, ##__VA_ARGS__); \ + } while (0) + #else /* !CONFIG_DYNAMIC_DEBUG_CORE */ #include -- cgit From aad0214f30264c19044a77fc4776def349b76fc4 Mon Sep 17 00:00:00 2001 From: Jim Cromie Date: Sun, 4 Sep 2022 15:40:51 -0600 Subject: dyndbg: add DECLARE_DYNDBG_CLASSMAP macro Using DECLARE_DYNDBG_CLASSMAP, modules can declare up to 31 classnames. By doing so, they authorize dyndbg to manipulate class'd prdbgs (ie: __pr_debug_cls, and soon drm_*dbg), ala:: :#> echo class DRM_UT_KMS +p > /proc/dynamic_debug/control The macro declares and initializes a static struct ddebug_class_map:: - maps approved class-names to class_ids used in module, by array order. forex: DRM_UT_* - class-name vals allow validation of "class FOO" queries using macro is opt-in - enum class_map_type - determines interface, behavior Each module has its own class-type and class_id space, and only known class-names will be authorized for a manipulation. Only DRM modules should know and respont to this: :#> echo class DRM_UT_CORE +p > control # across all modules pr_debugs (with default class_id) are still controllable as before. DECLARE_DYNDBG_CLASSMAP(_var, _maptype, _base, classes...) is:: _var: name of the static struct var. user passes to module_param_cb() if they want a sysfs node. _maptype: this is hard-coded to DD_CLASS_TYPE_DISJOINT_BITS for now. _base: usually 0, it allows splitting 31 classes into subranges, so that multiple classes / sysfs-nodes can share the module's class-id space. classes: list of class_name strings, these are mapped to class-ids starting at _base. This class-names list must have a corresponding ENUM, with SYMBOLS that match the literals, and 1st enum val = _base. enum class_map_type has 4 values, on 2 factors:: - classes are disjoint/independent vs relative/xcontrol interface doesn't enforce the LEVELS relationship, so you could confusingly have V3 enabled, but V1 disabled. OTOH, the control iface already allows infinite tweaking of the underlying callsites; sysfs node readback can only tell the user what they previously wrote. 2. All dyndbg >control reduces to a query/command, includes +/-, which is at-root a kernel patching operation with +/- semantics. And the _NAMES handling exposes it to the user, making it API-adjacent. And its not just >control where +/- gets used (which is settled), the new place is with sysfs-nodes exposing _*_NAMES classes, and here its subtly different. _DISJOINT_NAMES: is simple, independent _LEVEL_NAMES: masks-on bits 0 .. N-1, N..max off # turn on L3,L2,L1 others off echo +L3 > /sys/module/test_dynamic_debug/parameters/p_level_names # turn on L2,L1 others off echo -L3 > /sys/module/test_dynamic_debug/parameters/p_level_names IOW, the - changes the threshold-on bitpos by 1. Alternatively, we could treat the +/- as half-duplex, where -L3 turns off L>2 (and ignores L1), and +L2 would turn on L<=2 (and ignore others). Signed-off-by: Jim Cromie Link: https://lore.kernel.org/r/20220904214134.408619-15-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman --- include/linux/dynamic_debug.h | 55 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 55 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h index 3c9690da44d9..98dbf1d49984 100644 --- a/include/linux/dynamic_debug.h +++ b/include/linux/dynamic_debug.h @@ -56,6 +56,61 @@ struct _ddebug { #endif } __attribute__((aligned(8))); +enum class_map_type { + DD_CLASS_TYPE_DISJOINT_BITS, + /** + * DD_CLASS_TYPE_DISJOINT_BITS: classes are independent, one per bit. + * expecting hex input. Built for drm.debug, basis for other types. + */ + DD_CLASS_TYPE_LEVEL_NUM, + /** + * DD_CLASS_TYPE_LEVEL_NUM: input is numeric level, 0-N. + * N turns on just bits N-1 .. 0, so N=0 turns all bits off. + */ + DD_CLASS_TYPE_DISJOINT_NAMES, + /** + * DD_CLASS_TYPE_DISJOINT_NAMES: input is a CSV of [+-]CLASS_NAMES, + * classes are independent, like _DISJOINT_BITS. + */ + DD_CLASS_TYPE_LEVEL_NAMES, + /** + * DD_CLASS_TYPE_LEVEL_NAMES: input is a CSV of [+-]CLASS_NAMES, + * intended for names like: INFO,DEBUG,TRACE, with a module prefix + * avoid EMERG,ALERT,CRIT,ERR,WARNING: they're not debug + */ +}; + +struct ddebug_class_map { + struct list_head link; + struct module *mod; + const char *mod_name; /* needed for builtins */ + const char **class_names; + const int length; + const int base; /* index of 1st .class_id, allows split/shared space */ + enum class_map_type map_type; +}; + +/** + * DECLARE_DYNDBG_CLASSMAP - declare classnames known by a module + * @_var: a struct ddebug_class_map, passed to module_param_cb + * @_type: enum class_map_type, chooses bits/verbose, numeric/symbolic + * @_base: offset of 1st class-name. splits .class_id space + * @classes: class-names used to control class'd prdbgs + */ +#define DECLARE_DYNDBG_CLASSMAP(_var, _maptype, _base, ...) \ + static const char *_var##_classnames[] = { __VA_ARGS__ }; \ + static struct ddebug_class_map __aligned(8) __used \ + __section("__dyndbg_classes") _var = { \ + .mod = THIS_MODULE, \ + .mod_name = KBUILD_MODNAME, \ + .base = _base, \ + .map_type = _maptype, \ + .length = NUM_TYPE_ARGS(char*, __VA_ARGS__), \ + .class_names = _var##_classnames, \ + } +#define NUM_TYPE_ARGS(eltype, ...) \ + (sizeof((eltype[]){__VA_ARGS__}) / sizeof(eltype)) + /* encapsulate linker provided built-in (or module) dyndbg data */ struct _ddebug_info { struct _ddebug *descs; -- cgit From 66f4006b6ace1a1a1a1dca4225972f79a298e251 Mon Sep 17 00:00:00 2001 From: Jim Cromie Date: Sun, 4 Sep 2022 15:40:52 -0600 Subject: kernel/module: add __dyndbg_classes section Add __dyndbg_classes section, using __dyndbg as a model. Use it: vmlinux.lds.h: KEEP the new section, which also silences orphan section warning on loadable modules. Add (__start_/__stop_)__dyndbg_classes linker symbols for the c externs (below). kernel/module/main.c: - fill new fields in find_module_sections(), using section_objs() - extend callchain prototypes to pass classes, length load_module(): pass new info to dynamic_debug_setup() dynamic_debug_setup(): new params, pass through to ddebug_add_module() dynamic_debug.c: - add externs to the linker symbols. ddebug_add_module(): - It currently builds a debug_table, and *will* find and attach classes. dynamic_debug_init(): - add class fields to the _ddebug_info cursor var: di. Signed-off-by: Jim Cromie Link: https://lore.kernel.org/r/20220904214134.408619-16-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman --- include/linux/dynamic_debug.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h index 98dbf1d49984..9073a43a2039 100644 --- a/include/linux/dynamic_debug.h +++ b/include/linux/dynamic_debug.h @@ -114,7 +114,9 @@ struct ddebug_class_map { /* encapsulate linker provided built-in (or module) dyndbg data */ struct _ddebug_info { struct _ddebug *descs; + struct ddebug_class_map *classes; unsigned int num_descs; + unsigned int num_classes; }; #if defined(CONFIG_DYNAMIC_DEBUG_CORE) -- cgit From b9400852c0801aebee4fc8b62e6b7cc69c7fcbda Mon Sep 17 00:00:00 2001 From: Jim Cromie Date: Sun, 4 Sep 2022 15:40:57 -0600 Subject: dyndbg: add drm.debug style (drm/parameters/debug) bitmap support Add kernel_param_ops and callbacks to use a class-map to validate and apply input to a sysfs-node, which allows users to control classes defined in that class-map. This supports uses like: echo 0x3 > /sys/module/drm/parameters/debug IE add these: - int param_set_dyndbg_classes() - int param_get_dyndbg_classes() - struct kernel_param_ops param_ops_dyndbg_classes Following the model of kernel/params.c STANDARD_PARAM_DEFS, these are non-static and exported. This might be unnecessary here. get/set use an augmented kernel_param; the arg refs a new struct ddebug_class_param, which contains: - A ptr to user's state-store; a union of &ulong for drm.debug, &int for nouveau level debug. By ref'g the client's bit-state _var, code coordinates with existing code (like drm_debug_enabled) which uses it, so existing/remaining calls can work unchanged. Changing drm.debug to a ulong allows use of BIT() etc. - FLAGS: dyndbg.flags toggled by changes to bitmap. Usually just "p". - MAP: a pointer to struct ddebug_classes_map, which maps those class-names to .class_ids 0..N that the module is using. This class-map is declared & initialized by DECLARE_DYNDBG_CLASSMAP. - map-type: 4 enums DD_CLASS_TYPE_* select 2 input forms and 2 meanings. numeric input: DD_CLASS_TYPE_DISJOINT_BITS integer input, independent bits. ie: drm.debug DD_CLASS_TYPE_LEVEL_NUM integer input, 0..N levels classnames-list (comma separated) input: DD_CLASS_TYPE_DISJOINT_NAMES each name affects a bit, others preserved DD_CLASS_TYPE_LEVEL_NAMES names have level meanings, like kern_levels.h _NAMES - comma-separated classnames (with optional +-) _NUM - numeric input, 0-N expected _BITS - numeric input, 0x1F bitmap form expected _DISJOINT - bits are independent _LEVEL - (x /sys/module/drm/parameters/debug_catnames A naive _LEVEL_NAMES use, with one class, that sets all in the class-map according to (x /sys/module/test_dynamic_debug/parameters/p_level_names : problem solved echo -L1 > /sys/module/test_dynamic_debug/parameters/p_level_names Note this artifact: : this is same as prev cmd (due to +/-) echo L0 > /sys/module/test_dynamic_debug/parameters/p_level_names : this is "even-more" off, but same wo __pr_debug_class(L0, ".."). echo -L0 > /sys/module/test_dynamic_debug/parameters/p_level_names A stress-test/make-work usage (kid toggling a light switch): echo +L7,L0,L7,L0,L7,L0,L7,L0,L7,L0,L7,L0,L7 \ > /sys/module/test_dynamic_debug/parameters/p_level_names ddebug_apply_class_bitmap(): inside-fn, works on bitmaps, receives new-bits, finds diffs vs client-bitvector holding "current" state, and issues exec_query to commit the adjustment. param_set_dyndbg_classes(): interface fn, sends _NAMES to param_set_dyndbg_classnames() and returns, falls thru to handle _BITS, _NUM internally, and calls ddebug_apply_class_bitmap(). Finishes by updating state. param_set_dyndbg_classnames(): handles classnames-list in loop, calls ddebug_apply_class_bitmap for each, then updates state. NOTES: _LEVEL_ is overlay on _DISJOINT_; inputs are converted to a bitmask, by the callbacks. IOW this is possible, and possibly confusing: echo class V3 +p > control echo class V1 -p > control IMO thats ok, relative verbosity is an interface property. _LEVEL_NUM maps still need class-names, even though the names are not usable at the sysfs interface (unlike with _NAMES style). The names are the only way to >control the classes. - It must have a "V0" name, something below "V1" to turn "V1" off. __pr_debug_cls(V0,..) is printk, don't do that. - "class names" is required at the >control interface. - relative levels are not enforced at >control _LEVEL_NAMES bear +/- signs, which alters the on-bit-pos by 1. IOW, +L2 means L0,L1,L2, and -L2 means just L0,L1. This kinda spoils the readback fidelity, since the L0 bit gets turned on by any use of any L*, except "-L0". All the interface uncertainty here pertains to the _NAMES features. Nobody has actually asked for this, so its practical (if a little tedious) to split it out. Signed-off-by: Jim Cromie Link: https://lore.kernel.org/r/20220904214134.408619-21-jim.cromie@gmail.com Signed-off-by: Greg Kroah-Hartman --- include/linux/dynamic_debug.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) (limited to 'include/linux') diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h index 9073a43a2039..41682278d2e8 100644 --- a/include/linux/dynamic_debug.h +++ b/include/linux/dynamic_debug.h @@ -119,6 +119,15 @@ struct _ddebug_info { unsigned int num_classes; }; +struct ddebug_class_param { + union { + unsigned long *bits; + unsigned int *lvl; + }; + char flags[8]; + const struct ddebug_class_map *map; +}; + #if defined(CONFIG_DYNAMIC_DEBUG_CORE) int ddebug_add_module(struct _ddebug_info *dyndbg, const char *modname); @@ -278,6 +287,10 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor, KERN_DEBUG, prefix_str, prefix_type, \ rowsize, groupsize, buf, len, ascii) +struct kernel_param; +int param_set_dyndbg_classes(const char *instr, const struct kernel_param *kp); +int param_get_dyndbg_classes(char *buffer, const struct kernel_param *kp); + /* for test only, generally expect drm.debug style macro wrappers */ #define __pr_debug_cls(cls, fmt, ...) do { \ BUILD_BUG_ON_MSG(!__builtin_constant_p(cls), \ @@ -324,6 +337,14 @@ static inline int ddebug_dyndbg_module_param_cb(char *param, char *val, rowsize, groupsize, buf, len, ascii); \ } while (0) +struct kernel_param; +static inline int param_set_dyndbg_classes(const char *instr, const struct kernel_param *kp) +{ return 0; } +static inline int param_get_dyndbg_classes(char *buffer, const struct kernel_param *kp) +{ return 0; } + #endif /* !CONFIG_DYNAMIC_DEBUG_CORE */ +extern const struct kernel_param_ops param_ops_dyndbg_classes; + #endif -- cgit From 95f2f26f3cac06cfc046d2b29e60719d7848ea54 Mon Sep 17 00:00:00 2001 From: Benjamin Tissoires Date: Tue, 6 Sep 2022 17:12:58 +0200 Subject: bpf: split btf_check_subprog_arg_match in two btf_check_subprog_arg_match() was used twice in verifier.c: - when checking for the type mismatches between a (sub)prog declaration and BTF - when checking the call of a subprog to see if the provided arguments are correct and valid This is problematic when we check if the first argument of a program (pointer to ctx) is correctly accessed: To be able to ensure we access a valid memory in the ctx, the verifier assumes the pointer to context is not null. This has the side effect of marking the program accessing the entire context, even if the context is never dereferenced. For example, by checking the context access with the current code, the following eBPF program would fail with -EINVAL if the ctx is set to null from the userspace: ``` SEC("syscall") int prog(struct my_ctx *args) { return 0; } ``` In that particular case, we do not want to actually check that the memory is correct while checking for the BTF validity, but we just want to ensure that the (sub)prog definition matches the BTF we have. So split btf_check_subprog_arg_match() in two so we can actually check for the memory used when in a call, and ignore that part when not. Note that a further patch is in preparation to disentangled btf_check_func_arg_match() from these two purposes, and so right now we just add a new hack around that by adding a boolean to this function. Signed-off-by: Benjamin Tissoires Acked-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220906151303.2780789-3-benjamin.tissoires@redhat.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 4d32f125f4af..3cf161cfd396 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1947,6 +1947,8 @@ int btf_distill_func_proto(struct bpf_verifier_log *log, struct bpf_reg_state; int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog, struct bpf_reg_state *regs); +int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog, + struct bpf_reg_state *regs); int btf_check_kfunc_arg_match(struct bpf_verifier_env *env, const struct btf *btf, u32 func_id, struct bpf_reg_state *regs, -- cgit From eb1f7f71c126c8fd50ea81af98f97c4b581ea4ae Mon Sep 17 00:00:00 2001 From: Benjamin Tissoires Date: Tue, 6 Sep 2022 17:13:02 +0200 Subject: bpf/verifier: allow kfunc to return an allocated mem For drivers (outside of network), the incoming data is not statically defined in a struct. Most of the time the data buffer is kzalloc-ed and thus we can not rely on eBPF and BTF to explore the data. This commit allows to return an arbitrary memory, previously allocated by the driver. An interesting extra point is that the kfunc can mark the exported memory region as read only or read/write. So, when a kfunc is not returning a pointer to a struct but to a plain type, we can consider it is a valid allocated memory assuming that: - one of the arguments is either called rdonly_buf_size or rdwr_buf_size - and this argument is a const from the caller point of view We can then use this parameter as the size of the allocated memory. The memory is either read-only or read-write based on the name of the size parameter. Acked-by: Kumar Kartikeya Dwivedi Signed-off-by: Benjamin Tissoires Link: https://lore.kernel.org/r/20220906151303.2780789-7-benjamin.tissoires@redhat.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 9 ++++++++- include/linux/bpf_verifier.h | 2 ++ include/linux/btf.h | 10 ++++++++++ 3 files changed, 20 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 3cf161cfd396..79883f883ff3 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1944,6 +1944,13 @@ int btf_distill_func_proto(struct bpf_verifier_log *log, const char *func_name, struct btf_func_model *m); +struct bpf_kfunc_arg_meta { + u64 r0_size; + bool r0_rdonly; + int ref_obj_id; + u32 flags; +}; + struct bpf_reg_state; int btf_check_subprog_arg_match(struct bpf_verifier_env *env, int subprog, struct bpf_reg_state *regs); @@ -1952,7 +1959,7 @@ int btf_check_subprog_call(struct bpf_verifier_env *env, int subprog, int btf_check_kfunc_arg_match(struct bpf_verifier_env *env, const struct btf *btf, u32 func_id, struct bpf_reg_state *regs, - u32 kfunc_flags); + struct bpf_kfunc_arg_meta *meta); int btf_prepare_func_args(struct bpf_verifier_env *env, int subprog, struct bpf_reg_state *reg); int btf_check_type_match(struct bpf_verifier_log *log, const struct bpf_prog *prog, diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 1fdddbf3546b..8fbc1d05281e 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -598,6 +598,8 @@ int bpf_check_attach_target(struct bpf_verifier_log *log, struct bpf_attach_target_info *tgt_info); void bpf_free_kfunc_btf_tab(struct bpf_kfunc_btf_tab *tab); +int mark_chain_precision(struct bpf_verifier_env *env, int regno); + #define BPF_BASE_TYPE_MASK GENMASK(BPF_BASE_TYPE_BITS - 1, 0) /* extract base type from bpf_{arg, return, reg}_type. */ diff --git a/include/linux/btf.h b/include/linux/btf.h index ad93c2d9cc1c..1fcc833a8690 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -441,4 +441,14 @@ static inline int register_btf_id_dtor_kfuncs(const struct btf_id_dtor_kfunc *dt } #endif +static inline bool btf_type_is_struct_ptr(struct btf *btf, const struct btf_type *t) +{ + if (!btf_type_is_ptr(t)) + return false; + + t = btf_type_skip_modifiers(btf, t->type, NULL); + + return btf_type_is_struct(t); +} + #endif -- cgit From 448325199f574d33824dbf9121efb03558412966 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Sun, 4 Sep 2022 22:41:14 +0200 Subject: bpf: Add copy_map_value_long to copy to remote percpu memory bpf_long_memcpy is used while copying to remote percpu regions from BPF syscall and helpers, so that the copy is atomic at word size granularity. This might not be possible when you copy from map value hosting kptrs from or to percpu maps, as the alignment or size in disjoint regions may not be multiple of word size. Hence, to avoid complicating the copy loop, we only use bpf_long_memcpy when special fields are not present, otherwise use normal memcpy to copy the disjoint regions. Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220904204145.3089-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 52 +++++++++++++++++++++++++++++++++------------------- 1 file changed, 33 insertions(+), 19 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 79883f883ff3..6a73e94821c4 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -280,14 +280,33 @@ static inline void check_and_init_map_value(struct bpf_map *map, void *dst) } } -/* copy everything but bpf_spin_lock and bpf_timer. There could be one of each. */ -static inline void copy_map_value(struct bpf_map *map, void *dst, void *src) +/* memcpy that is used with 8-byte aligned pointers, power-of-8 size and + * forced to use 'long' read/writes to try to atomically copy long counters. + * Best-effort only. No barriers here, since it _will_ race with concurrent + * updates from BPF programs. Called from bpf syscall and mostly used with + * size 8 or 16 bytes, so ask compiler to inline it. + */ +static inline void bpf_long_memcpy(void *dst, const void *src, u32 size) +{ + const long *lsrc = src; + long *ldst = dst; + + size /= sizeof(long); + while (size--) + *ldst++ = *lsrc++; +} + +/* copy everything but bpf_spin_lock, bpf_timer, and kptrs. There could be one of each. */ +static inline void __copy_map_value(struct bpf_map *map, void *dst, void *src, bool long_memcpy) { u32 curr_off = 0; int i; if (likely(!map->off_arr)) { - memcpy(dst, src, map->value_size); + if (long_memcpy) + bpf_long_memcpy(dst, src, round_up(map->value_size, 8)); + else + memcpy(dst, src, map->value_size); return; } @@ -299,6 +318,17 @@ static inline void copy_map_value(struct bpf_map *map, void *dst, void *src) } memcpy(dst + curr_off, src + curr_off, map->value_size - curr_off); } + +static inline void copy_map_value(struct bpf_map *map, void *dst, void *src) +{ + __copy_map_value(map, dst, src, false); +} + +static inline void copy_map_value_long(struct bpf_map *map, void *dst, void *src) +{ + __copy_map_value(map, dst, src, true); +} + void copy_map_value_locked(struct bpf_map *map, void *dst, void *src, bool lock_src); void bpf_timer_cancel_and_free(void *timer); @@ -1827,22 +1857,6 @@ int bpf_get_file_flag(int flags); int bpf_check_uarg_tail_zero(bpfptr_t uaddr, size_t expected_size, size_t actual_size); -/* memcpy that is used with 8-byte aligned pointers, power-of-8 size and - * forced to use 'long' read/writes to try to atomically copy long counters. - * Best-effort only. No barriers here, since it _will_ race with concurrent - * updates from BPF programs. Called from bpf syscall and mostly used with - * size 8 or 16 bytes, so ask compiler to inline it. - */ -static inline void bpf_long_memcpy(void *dst, const void *src, u32 size) -{ - const long *lsrc = src; - long *ldst = dst; - - size /= sizeof(long); - while (size--) - *ldst++ = *lsrc++; -} - /* verify correctness of eBPF program */ int bpf_check(struct bpf_prog **fp, union bpf_attr *attr, bpfptr_t uattr); -- cgit From cc48755808c646666436745b35629c3f0d05e165 Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Sun, 4 Sep 2022 22:41:16 +0200 Subject: bpf: Add zero_map_value to zero map value with special fields We need this helper to skip over special fields (bpf_spin_lock, bpf_timer, kptrs) while zeroing a map value. Use the same logic as copy_map_value but memset instead of memcpy. Currently, the code zeroing map value memory does not have to deal with special fields, hence this is a prerequisite for introducing such support. Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220904204145.3089-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 6a73e94821c4..48ae05099f36 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -329,6 +329,25 @@ static inline void copy_map_value_long(struct bpf_map *map, void *dst, void *src __copy_map_value(map, dst, src, true); } +static inline void zero_map_value(struct bpf_map *map, void *dst) +{ + u32 curr_off = 0; + int i; + + if (likely(!map->off_arr)) { + memset(dst, 0, map->value_size); + return; + } + + for (i = 0; i < map->off_arr->cnt; i++) { + u32 next_off = map->off_arr->field_off[i]; + + memset(dst + curr_off, 0, next_off - curr_off); + curr_off += map->off_arr->field_sz[i]; + } + memset(dst + curr_off, 0, map->value_size - curr_off); +} + void copy_map_value_locked(struct bpf_map *map, void *dst, void *src, bool lock_src); void bpf_timer_cancel_and_free(void *timer); -- cgit From 5950e5d574c636a07dd21a872c2f8b41f6d20c55 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 22 Aug 2022 13:18:17 +0200 Subject: freezer: Have {,un}lock_system_sleep() save/restore flags Rafael explained that the reason for having both PF_NOFREEZE and PF_FREEZER_SKIP is that {,un}lock_system_sleep() is callable from kthread context that has previously called set_freezable(). In preparation of merging the flags, have {,un}lock_system_slee() save and restore current->flags. Signed-off-by: Peter Zijlstra (Intel) Acked-by: Rafael J. Wysocki Link: https://lore.kernel.org/r/20220822114648.725003428@infradead.org --- include/linux/suspend.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/suspend.h b/include/linux/suspend.h index 70f2921e2e70..34fb72c0db85 100644 --- a/include/linux/suspend.h +++ b/include/linux/suspend.h @@ -510,8 +510,8 @@ extern bool pm_save_wakeup_count(unsigned int count); extern void pm_wakep_autosleep_enabled(bool set); extern void pm_print_active_wakeup_sources(void); -extern void lock_system_sleep(void); -extern void unlock_system_sleep(void); +extern unsigned int lock_system_sleep(void); +extern void unlock_system_sleep(unsigned int); #else /* !CONFIG_PM_SLEEP */ @@ -534,8 +534,8 @@ static inline void pm_system_wakeup(void) {} static inline void pm_wakeup_clear(bool reset) {} static inline void pm_system_irq_wakeup(unsigned int irq_number) {} -static inline void lock_system_sleep(void) {} -static inline void unlock_system_sleep(void) {} +static inline unsigned int lock_system_sleep(void) { return 0; } +static inline void unlock_system_sleep(unsigned int flags) {} #endif /* !CONFIG_PM_SLEEP */ -- cgit From 1fbcaa923ce2d7e6de17abd74fa076dc1e0be1a2 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 22 Aug 2022 13:18:18 +0200 Subject: freezer,umh: Clean up freezer/initrd interaction handle_initrd() marks itself as PF_FREEZER_SKIP in order to ensure that the UMH, which is going to freeze the system, doesn't indefinitely wait for it's caller. Rework things by adding UMH_FREEZABLE to indicate the completion is freezable. Signed-off-by: Peter Zijlstra (Intel) Acked-by: Rafael J. Wysocki Link: https://lore.kernel.org/r/20220822114648.791019324@infradead.org --- include/linux/umh.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/umh.h b/include/linux/umh.h index 244aff638220..5d1f6129b847 100644 --- a/include/linux/umh.h +++ b/include/linux/umh.h @@ -11,10 +11,11 @@ struct cred; struct file; -#define UMH_NO_WAIT 0 /* don't wait at all */ -#define UMH_WAIT_EXEC 1 /* wait for the exec, but not the process */ -#define UMH_WAIT_PROC 2 /* wait for the process to complete */ -#define UMH_KILLABLE 4 /* wait for EXEC/PROC killable */ +#define UMH_NO_WAIT 0x00 /* don't wait at all */ +#define UMH_WAIT_EXEC 0x01 /* wait for the exec, but not the process */ +#define UMH_WAIT_PROC 0x02 /* wait for the process to complete */ +#define UMH_KILLABLE 0x04 /* wait for EXEC/PROC killable */ +#define UMH_FREEZABLE 0x08 /* wait for EXEC/PROC freezable */ struct subprocess_info { struct work_struct work; -- cgit From f9fc8cad9728124cefe8844fb53d1814c92c6bfc Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 6 Sep 2022 12:39:55 +0200 Subject: sched: Add TASK_ANY for wait_task_inactive() Now that wait_task_inactive()'s @match_state argument is a mask (like ttwu()) it is possible to replace the special !match_state case with an 'all-states' value such that any blocked state will match. Suggested-by: Ingo Molnar (mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/YxhkzfuFTvRnpUaH@hirez.programming.kicks-ass.net --- include/linux/sched.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index e7b2f8a5c711..0facaadbae71 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -101,6 +101,8 @@ struct task_group; #define TASK_RTLOCK_WAIT 0x1000 #define TASK_STATE_MAX 0x2000 +#define TASK_ANY (TASK_STATE_MAX-1) + /* Convenience macros for the sake of set_current_state: */ #define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE) #define TASK_STOPPED (TASK_WAKEKILL | __TASK_STOPPED) -- cgit From 929659acea03db6411a32de9037abab9f856f586 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 22 Aug 2022 13:18:20 +0200 Subject: sched/completion: Add wait_for_completion_state() Allows waiting with a custom @state. Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Ingo Molnar Link: https://lore.kernel.org/r/20220822114648.922711674@infradead.org --- include/linux/completion.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/completion.h b/include/linux/completion.h index 51d9ab079629..62b32b19e0a8 100644 --- a/include/linux/completion.h +++ b/include/linux/completion.h @@ -103,6 +103,7 @@ extern void wait_for_completion(struct completion *); extern void wait_for_completion_io(struct completion *); extern int wait_for_completion_interruptible(struct completion *x); extern int wait_for_completion_killable(struct completion *x); +extern int wait_for_completion_state(struct completion *x, unsigned int state); extern unsigned long wait_for_completion_timeout(struct completion *x, unsigned long timeout); extern unsigned long wait_for_completion_io_timeout(struct completion *x, -- cgit From 3f884a10ab41efe2d495bf8ec8d443d70c500a60 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 22 Aug 2022 13:18:21 +0200 Subject: sched/wait: Add wait_event_state() Allows waiting with a custom @state. Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Ingo Molnar Link: https://lore.kernel.org/r/20220822114648.989212021@infradead.org --- include/linux/wait.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) (limited to 'include/linux') diff --git a/include/linux/wait.h b/include/linux/wait.h index 58cfbf81447c..b926eb9f3e26 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -932,6 +932,34 @@ extern int do_wait_intr_irq(wait_queue_head_t *, wait_queue_entry_t *); __ret; \ }) +#define __wait_event_state(wq, condition, state) \ + ___wait_event(wq, condition, state, 0, 0, schedule()) + +/** + * wait_event_state - sleep until a condition gets true + * @wq_head: the waitqueue to wait on + * @condition: a C expression for the event to wait for + * @state: state to sleep in + * + * The process is put to sleep (@state) until the @condition evaluates to true + * or a signal is received (when allowed by @state). The @condition is checked + * each time the waitqueue @wq_head is woken up. + * + * wake_up() has to be called after changing any variable that could + * change the result of the wait condition. + * + * The function will return -ERESTARTSYS if it was interrupted by a signal + * (when allowed by @state) and 0 if @condition evaluated to true. + */ +#define wait_event_state(wq_head, condition, state) \ +({ \ + int __ret = 0; \ + might_sleep(); \ + if (!(condition)) \ + __ret = __wait_event_state(wq_head, condition, state); \ + __ret; \ +}) + #define __wait_event_killable_timeout(wq_head, condition, timeout) \ ___wait_event(wq_head, ___wait_cond_timeout(condition), \ TASK_KILLABLE, 0, timeout, \ -- cgit From 9963e444f71e671bcbc30d61cf23a2c686ac7d05 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 6 Sep 2022 13:11:38 +0200 Subject: sched: Widen TAKS_state literals In preparation of adding more states, add a few 0s to the literals as we've just about ran out. Suggested-by: Ingo Molnar Signed-off-by: Peter Zijlstra (Intel) --- include/linux/sched.h | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 0facaadbae71..9fc23a2d9813 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -81,25 +81,25 @@ struct task_group; */ /* Used in tsk->state: */ -#define TASK_RUNNING 0x0000 -#define TASK_INTERRUPTIBLE 0x0001 -#define TASK_UNINTERRUPTIBLE 0x0002 -#define __TASK_STOPPED 0x0004 -#define __TASK_TRACED 0x0008 +#define TASK_RUNNING 0x00000000 +#define TASK_INTERRUPTIBLE 0x00000001 +#define TASK_UNINTERRUPTIBLE 0x00000002 +#define __TASK_STOPPED 0x00000004 +#define __TASK_TRACED 0x00000008 /* Used in tsk->exit_state: */ -#define EXIT_DEAD 0x0010 -#define EXIT_ZOMBIE 0x0020 +#define EXIT_DEAD 0x00000010 +#define EXIT_ZOMBIE 0x00000020 #define EXIT_TRACE (EXIT_ZOMBIE | EXIT_DEAD) /* Used in tsk->state again: */ -#define TASK_PARKED 0x0040 -#define TASK_DEAD 0x0080 -#define TASK_WAKEKILL 0x0100 -#define TASK_WAKING 0x0200 -#define TASK_NOLOAD 0x0400 -#define TASK_NEW 0x0800 +#define TASK_PARKED 0x00000040 +#define TASK_DEAD 0x00000080 +#define TASK_WAKEKILL 0x00000100 +#define TASK_WAKING 0x00000200 +#define TASK_NOLOAD 0x00000400 +#define TASK_NEW 0x00000800 /* RT specific auxilliary flag to mark RT lock waiters */ -#define TASK_RTLOCK_WAIT 0x1000 -#define TASK_STATE_MAX 0x2000 +#define TASK_RTLOCK_WAIT 0x00001000 +#define TASK_STATE_MAX 0x00002000 #define TASK_ANY (TASK_STATE_MAX-1) -- cgit From f5d39b020809146cc28e6e73369bf8065e0310aa Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 22 Aug 2022 13:18:22 +0200 Subject: freezer,sched: Rewrite core freezer logic Rewrite the core freezer to behave better wrt thawing and be simpler in general. By replacing PF_FROZEN with TASK_FROZEN, a special block state, it is ensured frozen tasks stay frozen until thawed and don't randomly wake up early, as is currently possible. As such, it does away with PF_FROZEN and PF_FREEZER_SKIP, freeing up two PF_flags (yay!). Specifically; the current scheme works a little like: freezer_do_not_count(); schedule(); freezer_count(); And either the task is blocked, or it lands in try_to_freezer() through freezer_count(). Now, when it is blocked, the freezer considers it frozen and continues. However, on thawing, once pm_freezing is cleared, freezer_count() stops working, and any random/spurious wakeup will let a task run before its time. That is, thawing tries to thaw things in explicit order; kernel threads and workqueues before doing bringing SMP back before userspace etc.. However due to the above mentioned races it is entirely possible for userspace tasks to thaw (by accident) before SMP is back. This can be a fatal problem in asymmetric ISA architectures (eg ARMv9) where the userspace task requires a special CPU to run. As said; replace this with a special task state TASK_FROZEN and add the following state transitions: TASK_FREEZABLE -> TASK_FROZEN __TASK_STOPPED -> TASK_FROZEN __TASK_TRACED -> TASK_FROZEN The new TASK_FREEZABLE can be set on any state part of TASK_NORMAL (IOW. TASK_INTERRUPTIBLE and TASK_UNINTERRUPTIBLE) -- any such state is already required to deal with spurious wakeups and the freezer causes one such when thawing the task (since the original state is lost). The special __TASK_{STOPPED,TRACED} states *can* be restored since their canonical state is in ->jobctl. With this, frozen tasks need an explicit TASK_FROZEN wakeup and are free of undue (early / spurious) wakeups. Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Ingo Molnar Acked-by: Rafael J. Wysocki Link: https://lore.kernel.org/r/20220822114649.055452969@infradead.org --- include/linux/freezer.h | 245 ++----------------------------------------- include/linux/sched.h | 13 ++- include/linux/sunrpc/sched.h | 7 +- include/linux/wait.h | 12 +-- 4 files changed, 26 insertions(+), 251 deletions(-) (limited to 'include/linux') diff --git a/include/linux/freezer.h b/include/linux/freezer.h index 0621c5f86c39..b303472255be 100644 --- a/include/linux/freezer.h +++ b/include/linux/freezer.h @@ -8,9 +8,11 @@ #include #include #include +#include #ifdef CONFIG_FREEZER -extern atomic_t system_freezing_cnt; /* nr of freezing conds in effect */ +DECLARE_STATIC_KEY_FALSE(freezer_active); + extern bool pm_freezing; /* PM freezing in effect */ extern bool pm_nosig_freezing; /* PM nosig freezing in effect */ @@ -22,10 +24,7 @@ extern unsigned int freeze_timeout_msecs; /* * Check if a process has been frozen */ -static inline bool frozen(struct task_struct *p) -{ - return p->flags & PF_FROZEN; -} +extern bool frozen(struct task_struct *p); extern bool freezing_slow_path(struct task_struct *p); @@ -34,9 +33,10 @@ extern bool freezing_slow_path(struct task_struct *p); */ static inline bool freezing(struct task_struct *p) { - if (likely(!atomic_read(&system_freezing_cnt))) - return false; - return freezing_slow_path(p); + if (static_branch_unlikely(&freezer_active)) + return freezing_slow_path(p); + + return false; } /* Takes and releases task alloc lock using task_lock() */ @@ -48,23 +48,14 @@ extern int freeze_kernel_threads(void); extern void thaw_processes(void); extern void thaw_kernel_threads(void); -/* - * DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION - * If try_to_freeze causes a lockdep warning it means the caller may deadlock - */ -static inline bool try_to_freeze_unsafe(void) +static inline bool try_to_freeze(void) { might_sleep(); if (likely(!freezing(current))) return false; - return __refrigerator(false); -} - -static inline bool try_to_freeze(void) -{ if (!(current->flags & PF_NOFREEZE)) debug_check_no_locks_held(); - return try_to_freeze_unsafe(); + return __refrigerator(false); } extern bool freeze_task(struct task_struct *p); @@ -79,195 +70,6 @@ static inline bool cgroup_freezing(struct task_struct *task) } #endif /* !CONFIG_CGROUP_FREEZER */ -/* - * The PF_FREEZER_SKIP flag should be set by a vfork parent right before it - * calls wait_for_completion(&vfork) and reset right after it returns from this - * function. Next, the parent should call try_to_freeze() to freeze itself - * appropriately in case the child has exited before the freezing of tasks is - * complete. However, we don't want kernel threads to be frozen in unexpected - * places, so we allow them to block freeze_processes() instead or to set - * PF_NOFREEZE if needed. Fortunately, in the ____call_usermodehelper() case the - * parent won't really block freeze_processes(), since ____call_usermodehelper() - * (the child) does a little before exec/exit and it can't be frozen before - * waking up the parent. - */ - - -/** - * freezer_do_not_count - tell freezer to ignore %current - * - * Tell freezers to ignore the current task when determining whether the - * target frozen state is reached. IOW, the current task will be - * considered frozen enough by freezers. - * - * The caller shouldn't do anything which isn't allowed for a frozen task - * until freezer_cont() is called. Usually, freezer[_do_not]_count() pair - * wrap a scheduling operation and nothing much else. - */ -static inline void freezer_do_not_count(void) -{ - current->flags |= PF_FREEZER_SKIP; -} - -/** - * freezer_count - tell freezer to stop ignoring %current - * - * Undo freezer_do_not_count(). It tells freezers that %current should be - * considered again and tries to freeze if freezing condition is already in - * effect. - */ -static inline void freezer_count(void) -{ - current->flags &= ~PF_FREEZER_SKIP; - /* - * If freezing is in progress, the following paired with smp_mb() - * in freezer_should_skip() ensures that either we see %true - * freezing() or freezer_should_skip() sees !PF_FREEZER_SKIP. - */ - smp_mb(); - try_to_freeze(); -} - -/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */ -static inline void freezer_count_unsafe(void) -{ - current->flags &= ~PF_FREEZER_SKIP; - smp_mb(); - try_to_freeze_unsafe(); -} - -/** - * freezer_should_skip - whether to skip a task when determining frozen - * state is reached - * @p: task in quesion - * - * This function is used by freezers after establishing %true freezing() to - * test whether a task should be skipped when determining the target frozen - * state is reached. IOW, if this function returns %true, @p is considered - * frozen enough. - */ -static inline bool freezer_should_skip(struct task_struct *p) -{ - /* - * The following smp_mb() paired with the one in freezer_count() - * ensures that either freezer_count() sees %true freezing() or we - * see cleared %PF_FREEZER_SKIP and return %false. This makes it - * impossible for a task to slip frozen state testing after - * clearing %PF_FREEZER_SKIP. - */ - smp_mb(); - return p->flags & PF_FREEZER_SKIP; -} - -/* - * These functions are intended to be used whenever you want allow a sleeping - * task to be frozen. Note that neither return any clear indication of - * whether a freeze event happened while in this function. - */ - -/* Like schedule(), but should not block the freezer. */ -static inline void freezable_schedule(void) -{ - freezer_do_not_count(); - schedule(); - freezer_count(); -} - -/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */ -static inline void freezable_schedule_unsafe(void) -{ - freezer_do_not_count(); - schedule(); - freezer_count_unsafe(); -} - -/* - * Like schedule_timeout(), but should not block the freezer. Do not - * call this with locks held. - */ -static inline long freezable_schedule_timeout(long timeout) -{ - long __retval; - freezer_do_not_count(); - __retval = schedule_timeout(timeout); - freezer_count(); - return __retval; -} - -/* - * Like schedule_timeout_interruptible(), but should not block the freezer. Do not - * call this with locks held. - */ -static inline long freezable_schedule_timeout_interruptible(long timeout) -{ - long __retval; - freezer_do_not_count(); - __retval = schedule_timeout_interruptible(timeout); - freezer_count(); - return __retval; -} - -/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */ -static inline long freezable_schedule_timeout_interruptible_unsafe(long timeout) -{ - long __retval; - - freezer_do_not_count(); - __retval = schedule_timeout_interruptible(timeout); - freezer_count_unsafe(); - return __retval; -} - -/* Like schedule_timeout_killable(), but should not block the freezer. */ -static inline long freezable_schedule_timeout_killable(long timeout) -{ - long __retval; - freezer_do_not_count(); - __retval = schedule_timeout_killable(timeout); - freezer_count(); - return __retval; -} - -/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */ -static inline long freezable_schedule_timeout_killable_unsafe(long timeout) -{ - long __retval; - freezer_do_not_count(); - __retval = schedule_timeout_killable(timeout); - freezer_count_unsafe(); - return __retval; -} - -/* - * Like schedule_hrtimeout_range(), but should not block the freezer. Do not - * call this with locks held. - */ -static inline int freezable_schedule_hrtimeout_range(ktime_t *expires, - u64 delta, const enum hrtimer_mode mode) -{ - int __retval; - freezer_do_not_count(); - __retval = schedule_hrtimeout_range(expires, delta, mode); - freezer_count(); - return __retval; -} - -/* - * Freezer-friendly wrappers around wait_event_interruptible(), - * wait_event_killable() and wait_event_interruptible_timeout(), originally - * defined in - */ - -/* DO NOT ADD ANY NEW CALLERS OF THIS FUNCTION */ -#define wait_event_freezekillable_unsafe(wq, condition) \ -({ \ - int __retval; \ - freezer_do_not_count(); \ - __retval = wait_event_killable(wq, (condition)); \ - freezer_count_unsafe(); \ - __retval; \ -}) - #else /* !CONFIG_FREEZER */ static inline bool frozen(struct task_struct *p) { return false; } static inline bool freezing(struct task_struct *p) { return false; } @@ -281,35 +83,8 @@ static inline void thaw_kernel_threads(void) {} static inline bool try_to_freeze(void) { return false; } -static inline void freezer_do_not_count(void) {} -static inline void freezer_count(void) {} -static inline int freezer_should_skip(struct task_struct *p) { return 0; } static inline void set_freezable(void) {} -#define freezable_schedule() schedule() - -#define freezable_schedule_unsafe() schedule() - -#define freezable_schedule_timeout(timeout) schedule_timeout(timeout) - -#define freezable_schedule_timeout_interruptible(timeout) \ - schedule_timeout_interruptible(timeout) - -#define freezable_schedule_timeout_interruptible_unsafe(timeout) \ - schedule_timeout_interruptible(timeout) - -#define freezable_schedule_timeout_killable(timeout) \ - schedule_timeout_killable(timeout) - -#define freezable_schedule_timeout_killable_unsafe(timeout) \ - schedule_timeout_killable(timeout) - -#define freezable_schedule_hrtimeout_range(expires, delta, mode) \ - schedule_hrtimeout_range(expires, delta, mode) - -#define wait_event_freezekillable_unsafe(wq, condition) \ - wait_event_killable(wq, condition) - #endif /* !CONFIG_FREEZER */ #endif /* FREEZER_H_INCLUDED */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 9fc23a2d9813..4c91efd6df82 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -97,12 +97,19 @@ struct task_group; #define TASK_WAKING 0x00000200 #define TASK_NOLOAD 0x00000400 #define TASK_NEW 0x00000800 -/* RT specific auxilliary flag to mark RT lock waiters */ #define TASK_RTLOCK_WAIT 0x00001000 -#define TASK_STATE_MAX 0x00002000 +#define TASK_FREEZABLE 0x00002000 +#define __TASK_FREEZABLE_UNSAFE (0x00004000 * IS_ENABLED(CONFIG_LOCKDEP)) +#define TASK_FROZEN 0x00008000 +#define TASK_STATE_MAX 0x00010000 #define TASK_ANY (TASK_STATE_MAX-1) +/* + * DO NOT ADD ANY NEW USERS ! + */ +#define TASK_FREEZABLE_UNSAFE (TASK_FREEZABLE | __TASK_FREEZABLE_UNSAFE) + /* Convenience macros for the sake of set_current_state: */ #define TASK_KILLABLE (TASK_WAKEKILL | TASK_UNINTERRUPTIBLE) #define TASK_STOPPED (TASK_WAKEKILL | __TASK_STOPPED) @@ -1716,7 +1723,6 @@ extern struct pid *cad_pid; #define PF_NPROC_EXCEEDED 0x00001000 /* set_user() noticed that RLIMIT_NPROC was exceeded */ #define PF_USED_MATH 0x00002000 /* If unset the fpu must be initialized before use */ #define PF_NOFREEZE 0x00008000 /* This thread should not be frozen */ -#define PF_FROZEN 0x00010000 /* Frozen for system suspend */ #define PF_KSWAPD 0x00020000 /* I am kswapd */ #define PF_MEMALLOC_NOFS 0x00040000 /* All allocation requests will inherit GFP_NOFS */ #define PF_MEMALLOC_NOIO 0x00080000 /* All allocation requests will inherit GFP_NOIO */ @@ -1727,7 +1733,6 @@ extern struct pid *cad_pid; #define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */ #define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */ #define PF_MEMALLOC_PIN 0x10000000 /* Allocation context constrained to zones which allow long term pinning. */ -#define PF_FREEZER_SKIP 0x40000000 /* Freezer should not count it as freezable */ #define PF_SUSPEND_TASK 0x80000000 /* This thread called freeze_processes() and should not be frozen */ /* diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index acc62647317c..baeca2f564dc 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -252,7 +252,7 @@ int rpc_malloc(struct rpc_task *); void rpc_free(struct rpc_task *); int rpciod_up(void); void rpciod_down(void); -int __rpc_wait_for_completion_task(struct rpc_task *task, wait_bit_action_f *); +int rpc_wait_for_completion_task(struct rpc_task *task); #if IS_ENABLED(CONFIG_SUNRPC_DEBUG) struct net; void rpc_show_tasks(struct net *); @@ -264,11 +264,6 @@ extern struct workqueue_struct *xprtiod_workqueue; void rpc_prepare_task(struct rpc_task *task); gfp_t rpc_task_gfp_mask(void); -static inline int rpc_wait_for_completion_task(struct rpc_task *task) -{ - return __rpc_wait_for_completion_task(task, NULL); -} - #if IS_ENABLED(CONFIG_SUNRPC_DEBUG) || IS_ENABLED(CONFIG_TRACEPOINTS) static inline const char * rpc_qname(const struct rpc_wait_queue *q) { diff --git a/include/linux/wait.h b/include/linux/wait.h index b926eb9f3e26..14ad8a0e9fac 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -361,8 +361,8 @@ do { \ } while (0) #define __wait_event_freezable(wq_head, condition) \ - ___wait_event(wq_head, condition, TASK_INTERRUPTIBLE, 0, 0, \ - freezable_schedule()) + ___wait_event(wq_head, condition, (TASK_INTERRUPTIBLE|TASK_FREEZABLE), \ + 0, 0, schedule()) /** * wait_event_freezable - sleep (or freeze) until a condition gets true @@ -420,8 +420,8 @@ do { \ #define __wait_event_freezable_timeout(wq_head, condition, timeout) \ ___wait_event(wq_head, ___wait_cond_timeout(condition), \ - TASK_INTERRUPTIBLE, 0, timeout, \ - __ret = freezable_schedule_timeout(__ret)) + (TASK_INTERRUPTIBLE|TASK_FREEZABLE), 0, timeout, \ + __ret = schedule_timeout(__ret)) /* * like wait_event_timeout() -- except it uses TASK_INTERRUPTIBLE to avoid @@ -642,8 +642,8 @@ do { \ #define __wait_event_freezable_exclusive(wq, condition) \ - ___wait_event(wq, condition, TASK_INTERRUPTIBLE, 1, 0, \ - freezable_schedule()) + ___wait_event(wq, condition, (TASK_INTERRUPTIBLE|TASK_FREEZABLE), 1, 0,\ + schedule()) #define wait_event_freezable_exclusive(wq, condition) \ ({ \ -- cgit From fb04563d1cae6f361892b4a339ad92100b1eb0d0 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 6 Sep 2022 13:27:32 +0200 Subject: sched: Show PF_flag holes Put placeholders in PF_flag so we can see how holey it is. Suggested-by: Ingo Molnar Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Ingo Molnar Link: https://lkml.kernel.org/r/YxctoffFFPXONESt@hirez.programming.kicks-ass.net --- include/linux/sched.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 4c91efd6df82..15e3bd96e4ce 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1722,7 +1722,9 @@ extern struct pid *cad_pid; #define PF_MEMALLOC 0x00000800 /* Allocating memory */ #define PF_NPROC_EXCEEDED 0x00001000 /* set_user() noticed that RLIMIT_NPROC was exceeded */ #define PF_USED_MATH 0x00002000 /* If unset the fpu must be initialized before use */ +#define PF__HOLE__00004000 0x00004000 #define PF_NOFREEZE 0x00008000 /* This thread should not be frozen */ +#define PF__HOLE__00010000 0x00010000 #define PF_KSWAPD 0x00020000 /* I am kswapd */ #define PF_MEMALLOC_NOFS 0x00040000 /* All allocation requests will inherit GFP_NOFS */ #define PF_MEMALLOC_NOIO 0x00080000 /* All allocation requests will inherit GFP_NOIO */ @@ -1730,9 +1732,14 @@ extern struct pid *cad_pid; * I am cleaning dirty pages from some other bdi. */ #define PF_KTHREAD 0x00200000 /* I am a kernel thread */ #define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */ +#define PF__HOLE__00800000 0x00800000 +#define PF__HOLE__01000000 0x01000000 +#define PF__HOLE__02000000 0x02000000 #define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with cpus_mask */ #define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */ #define PF_MEMALLOC_PIN 0x10000000 /* Allocation context constrained to zones which allow long term pinning. */ +#define PF__HOLE__20000000 0x20000000 +#define PF__HOLE__40000000 0x40000000 #define PF_SUSPEND_TASK 0x80000000 /* This thread called freeze_processes() and should not be frozen */ /* -- cgit From 03b02db93be407103c385814033633364674a6f6 Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Tue, 6 Sep 2022 14:14:14 +0530 Subject: perf: Consolidate branch sample filter helpers Besides the branch type filtering requests, 'event.attr.branch_sample_type' also contains various flags indicating which additional information should be captured, along with the base branch record. These flags help configure the underlying hardware, and capture the branch records appropriately when required e.g after PMU interrupt. But first, this moves an existing helper perf_sample_save_hw_index() into the header before adding some more helpers for other branch sample filter flags. Signed-off-by: Anshuman Khandual Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20220906084414.396220-1-anshuman.khandual@arm.com --- include/linux/perf_event.h | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 581880ddb9ef..a627528e5f3a 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1685,4 +1685,30 @@ static inline void perf_lopwr_cb(bool mode) } #endif +#ifdef CONFIG_PERF_EVENTS +static inline bool branch_sample_no_flags(const struct perf_event *event) +{ + return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_NO_FLAGS; +} + +static inline bool branch_sample_no_cycles(const struct perf_event *event) +{ + return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_NO_CYCLES; +} + +static inline bool branch_sample_type(const struct perf_event *event) +{ + return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_TYPE_SAVE; +} + +static inline bool branch_sample_hw_index(const struct perf_event *event) +{ + return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX; +} + +static inline bool branch_sample_priv(const struct perf_event *event) +{ + return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_PRIV_SAVE; +} +#endif /* CONFIG_PERF_EVENTS */ #endif /* _LINUX_PERF_EVENT_H */ -- cgit From 7517f08b9a5eef0fa683b976c97d6178d00e6a3d Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Wed, 7 Sep 2022 14:49:21 +0530 Subject: perf/core: Expand PERF_EVENT_FLAG_ARCH Two hardware event flags on x86 platform has overshot PERF_EVENT_FLAG_ARCH (0x0000ffff). These flags are PERF_X86_EVENT_PEBS_LAT_HYBRID (0x20000) and PERF_X86_EVENT_AMD_BRS (0x10000). Lets expand PERF_EVENT_FLAG_ARCH mask to accommodate those flags, and also create room for two more in the future. Signed-off-by: Anshuman Khandual Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: James Clark Link: https://lkml.kernel.org/r/20220907091924.439193-2-anshuman.khandual@arm.com --- include/linux/perf_event.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index a627528e5f3a..3e3c07512b75 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -138,7 +138,7 @@ struct hw_perf_event_extra { * PERF_EVENT_FLAG_ARCH bits are reserved for architecture-specific * usage. */ -#define PERF_EVENT_FLAG_ARCH 0x0000ffff +#define PERF_EVENT_FLAG_ARCH 0x000fffff #define PERF_EVENT_FLAG_USER_READ_CNT 0x80000000 /** -- cgit From f67dd218fafd9de9a13d095e775b621db76a058f Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Wed, 7 Sep 2022 14:49:22 +0530 Subject: perf/core: Assert PERF_EVENT_FLAG_ARCH does not overlap with generic flags This just ensures that PERF_EVENT_FLAG_ARCH does not overlap with generic hardware event flags. Signed-off-by: Anshuman Khandual Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: James Clark Link: https://lkml.kernel.org/r/20220907091924.439193-3-anshuman.khandual@arm.com --- include/linux/perf_event.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 3e3c07512b75..f88cb31eaf75 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -141,6 +141,8 @@ struct hw_perf_event_extra { #define PERF_EVENT_FLAG_ARCH 0x000fffff #define PERF_EVENT_FLAG_USER_READ_CNT 0x80000000 +static_assert((PERF_EVENT_FLAG_USER_READ_CNT & PERF_EVENT_FLAG_ARCH) == 0); + /** * struct hw_perf_event - performance event hardware details: */ -- cgit From 91207f62616f9f51b52436364e6d064f002e9112 Mon Sep 17 00:00:00 2001 From: Anshuman Khandual Date: Wed, 7 Sep 2022 14:49:23 +0530 Subject: arm64/perf: Assert all platform event flags are within PERF_EVENT_FLAG_ARCH Ensure all platform specific event flags are within PERF_EVENT_FLAG_ARCH. Signed-off-by: Anshuman Khandual Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: James Clark Link: https://lkml.kernel.org/r/20220907091924.439193-4-anshuman.khandual@arm.com --- include/linux/perf/arm_pmu.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index 0407a38b470a..0356cb6a215d 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -24,10 +24,11 @@ /* * ARM PMU hw_event flags */ -/* Event uses a 64bit counter */ -#define ARMPMU_EVT_64BIT 1 -/* Event uses a 47bit counter */ -#define ARMPMU_EVT_47BIT 2 +#define ARMPMU_EVT_64BIT 0x00001 /* Event uses a 64bit counter */ +#define ARMPMU_EVT_47BIT 0x00002 /* Event uses a 47bit counter */ + +static_assert((PERF_EVENT_FLAG_ARCH & ARMPMU_EVT_64BIT) == ARMPMU_EVT_64BIT); +static_assert((PERF_EVENT_FLAG_ARCH & ARMPMU_EVT_47BIT) == ARMPMU_EVT_47BIT); #define HW_OP_UNSUPPORTED 0xFFFF #define C(_x) PERF_COUNT_HW_CACHE_##_x -- cgit From f3c0eba287049237b23d1300376768293eb89e69 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Fri, 2 Sep 2022 18:48:55 +0200 Subject: perf: Add a few assertions While auditing 6b959ba22d34 ("perf/core: Fix reentry problem in perf_output_read_group()") a few spots were found that wanted assertions. Notable for_each_sibling_event() relies on exclusion from modification. This would normally be holding either ctx->lock or ctx->mutex, however due to how things are constructed disabling IRQs is a valid and sufficient substitute for ctx->lock. Another possible site to add assertions would be the various pmu::{add,del,read,..}() methods, but that's not trivially expressable in C -- the best option is wrappers, but those are easy enough to forget. Signed-off-by: Peter Zijlstra (Intel) --- include/linux/perf_event.h | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index f88cb31eaf75..368bdc4f563f 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -61,6 +61,7 @@ struct perf_guest_info_callbacks { #include #include #include +#include #include struct perf_callchain_entry { @@ -634,7 +635,23 @@ struct pmu_event_list { struct list_head list; }; +/* + * event->sibling_list is modified whole holding both ctx->lock and ctx->mutex + * as such iteration must hold either lock. However, since ctx->lock is an IRQ + * safe lock, and is only held by the CPU doing the modification, having IRQs + * disabled is sufficient since it will hold-off the IPIs. + */ +#ifdef CONFIG_PROVE_LOCKING +#define lockdep_assert_event_ctx(event) \ + WARN_ON_ONCE(__lockdep_enabled && \ + (this_cpu_read(hardirqs_enabled) || \ + lockdep_is_held(&(event)->ctx->mutex) != LOCK_STATE_HELD)) +#else +#define lockdep_assert_event_ctx(event) +#endif + #define for_each_sibling_event(sibling, event) \ + lockdep_assert_event_ctx(event); \ if ((event)->group_leader == (event)) \ list_for_each_entry((sibling), &(event)->sibling_list, sibling_list) -- cgit From d219d2a9a92e39aa92799efe8f2aa21259b6dd82 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 29 Aug 2022 13:37:17 -0700 Subject: overflow: Allow mixed type arguments When the check_[op]_overflow() helpers were introduced, all arguments were required to be the same type to make the fallback macros simpler. However, now that the fallback macros have been removed[1], it is fine to allow mixed types, which makes using the helpers much more useful, as they can be used to test for type-based overflows (e.g. adding two large ints but storing into a u8), as would be handy in the drm core[2]. Remove the restriction, and add additional self-tests that exercise some of the mixed-type overflow cases, and double-check for accidental macro side-effects. [1] https://git.kernel.org/linus/4eb6bd55cfb22ffc20652732340c4962f3ac9a91 [2] https://lore.kernel.org/lkml/20220824084514.2261614-2-gwan-gyeong.mun@intel.com Cc: Rasmus Villemoes Cc: Gwan-gyeong Mun Cc: "Gustavo A. R. Silva" Cc: Nick Desaulniers Cc: linux-hardening@vger.kernel.org Reviewed-by: Andrzej Hajda Reviewed-by: Gwan-gyeong Mun Tested-by: Gwan-gyeong Mun Signed-off-by: Kees Cook --- include/linux/overflow.h | 72 +++++++++++++++++++++++++++--------------------- 1 file changed, 41 insertions(+), 31 deletions(-) (limited to 'include/linux') diff --git a/include/linux/overflow.h b/include/linux/overflow.h index 0eb3b192f07a..19dfdd74835e 100644 --- a/include/linux/overflow.h +++ b/include/linux/overflow.h @@ -51,40 +51,50 @@ static inline bool __must_check __must_check_overflow(bool overflow) return unlikely(overflow); } -/* - * For simplicity and code hygiene, the fallback code below insists on - * a, b and *d having the same type (similar to the min() and max() - * macros), whereas gcc's type-generic overflow checkers accept - * different types. Hence we don't just make check_add_overflow an - * alias for __builtin_add_overflow, but add type checks similar to - * below. +/** check_add_overflow() - Calculate addition with overflow checking + * + * @a: first addend + * @b: second addend + * @d: pointer to store sum + * + * Returns 0 on success. + * + * *@d holds the results of the attempted addition, but is not considered + * "safe for use" on a non-zero return value, which indicates that the + * sum has overflowed or been truncated. */ -#define check_add_overflow(a, b, d) __must_check_overflow(({ \ - typeof(a) __a = (a); \ - typeof(b) __b = (b); \ - typeof(d) __d = (d); \ - (void) (&__a == &__b); \ - (void) (&__a == __d); \ - __builtin_add_overflow(__a, __b, __d); \ -})) +#define check_add_overflow(a, b, d) \ + __must_check_overflow(__builtin_add_overflow(a, b, d)) -#define check_sub_overflow(a, b, d) __must_check_overflow(({ \ - typeof(a) __a = (a); \ - typeof(b) __b = (b); \ - typeof(d) __d = (d); \ - (void) (&__a == &__b); \ - (void) (&__a == __d); \ - __builtin_sub_overflow(__a, __b, __d); \ -})) +/** check_sub_overflow() - Calculate subtraction with overflow checking + * + * @a: minuend; value to subtract from + * @b: subtrahend; value to subtract from @a + * @d: pointer to store difference + * + * Returns 0 on success. + * + * *@d holds the results of the attempted subtraction, but is not considered + * "safe for use" on a non-zero return value, which indicates that the + * difference has underflowed or been truncated. + */ +#define check_sub_overflow(a, b, d) \ + __must_check_overflow(__builtin_sub_overflow(a, b, d)) -#define check_mul_overflow(a, b, d) __must_check_overflow(({ \ - typeof(a) __a = (a); \ - typeof(b) __b = (b); \ - typeof(d) __d = (d); \ - (void) (&__a == &__b); \ - (void) (&__a == __d); \ - __builtin_mul_overflow(__a, __b, __d); \ -})) +/** check_mul_overflow() - Calculate multiplication with overflow checking + * + * @a: first factor + * @b: second factor + * @d: pointer to store product + * + * Returns 0 on success. + * + * *@d holds the results of the attempted multiplication, but is not + * considered "safe for use" on a non-zero return value, which indicates + * that the product has overflowed or been truncated. + */ +#define check_mul_overflow(a, b, d) \ + __must_check_overflow(__builtin_mul_overflow(a, b, d)) /** check_shl_overflow() - Calculate a left-shifted value and check overflow * -- cgit From dfbafa70bde26c40615f8c538ce68dac82a64fb4 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 26 Aug 2022 11:04:43 -0700 Subject: string: Introduce strtomem() and strtomem_pad() One of the "legitimate" uses of strncpy() is copying a NUL-terminated string into a fixed-size non-NUL-terminated character array. To avoid the weaknesses and ambiguity of intent when using strncpy(), provide replacement functions that explicitly distinguish between trailing padding and not, and require the destination buffer size be discoverable by the compiler. For example: struct obj { int foo; char small[4] __nonstring; char big[8] __nonstring; int bar; }; struct obj p; /* This will truncate to 4 chars with no trailing NUL */ strncpy(p.small, "hello", sizeof(p.small)); /* p.small contains 'h', 'e', 'l', 'l' */ /* This will NUL pad to 8 chars. */ strncpy(p.big, "hello", sizeof(p.big)); /* p.big contains 'h', 'e', 'l', 'l', 'o', '\0', '\0', '\0' */ When the "__nonstring" attributes are missing, the intent of the programmer becomes ambiguous for whether the lack of a trailing NUL in the p.small copy is a bug. Additionally, it's not clear whether the trailing padding in the p.big copy is _needed_. Both cases become unambiguous with: strtomem(p.small, "hello"); strtomem_pad(p.big, "hello", 0); See also https://github.com/KSPP/linux/issues/90 Expand the memcpy KUnit tests to include these functions. Cc: Wolfram Sang Cc: Nick Desaulniers Cc: Geert Uytterhoeven Cc: Guenter Roeck Signed-off-by: Kees Cook --- include/linux/fortify-string.h | 32 +++++++++++++++++++++++++++++++ include/linux/string.h | 43 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 75 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index 3b401fa0f374..8e8c2c87b1d5 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -77,6 +77,38 @@ extern char *__underlying_strncpy(char *p, const char *q, __kernel_size_t size) #define POS __pass_object_size(1) #define POS0 __pass_object_size(0) +/** + * strncpy - Copy a string to memory with non-guaranteed NUL padding + * + * @p: pointer to destination of copy + * @q: pointer to NUL-terminated source string to copy + * @size: bytes to write at @p + * + * If strlen(@q) >= @size, the copy of @q will stop after @size bytes, + * and @p will NOT be NUL-terminated + * + * If strlen(@q) < @size, following the copy of @q, trailing NUL bytes + * will be written to @p until @size total bytes have been written. + * + * Do not use this function. While FORTIFY_SOURCE tries to avoid + * over-reads of @q, it cannot defend against writing unterminated + * results to @p. Using strncpy() remains ambiguous and fragile. + * Instead, please choose an alternative, so that the expectation + * of @p's contents is unambiguous: + * + * +--------------------+-----------------+------------+ + * | @p needs to be: | padded to @size | not padded | + * +====================+=================+============+ + * | NUL-terminated | strscpy_pad() | strscpy() | + * +--------------------+-----------------+------------+ + * | not NUL-terminated | strtomem_pad() | strtomem() | + * +--------------------+-----------------+------------+ + * + * Note strscpy*()'s differing return values for detecting truncation, + * and strtomem*()'s expectation that the destination is marked with + * __nonstring when it is a character array. + * + */ __FORTIFY_INLINE __diagnose_as(__builtin_strncpy, 1, 2, 3) char *strncpy(char * const POS p, const char *q, __kernel_size_t size) { diff --git a/include/linux/string.h b/include/linux/string.h index 61ec7e4f6311..cf7607b32102 100644 --- a/include/linux/string.h +++ b/include/linux/string.h @@ -260,6 +260,49 @@ static inline const char *kbasename(const char *path) void memcpy_and_pad(void *dest, size_t dest_len, const void *src, size_t count, int pad); +/** + * strtomem_pad - Copy NUL-terminated string to non-NUL-terminated buffer + * + * @dest: Pointer of destination character array (marked as __nonstring) + * @src: Pointer to NUL-terminated string + * @pad: Padding character to fill any remaining bytes of @dest after copy + * + * This is a replacement for strncpy() uses where the destination is not + * a NUL-terminated string, but with bounds checking on the source size, and + * an explicit padding character. If padding is not required, use strtomem(). + * + * Note that the size of @dest is not an argument, as the length of @dest + * must be discoverable by the compiler. + */ +#define strtomem_pad(dest, src, pad) do { \ + const size_t _dest_len = __builtin_object_size(dest, 1); \ + \ + BUILD_BUG_ON(!__builtin_constant_p(_dest_len) || \ + _dest_len == (size_t)-1); \ + memcpy_and_pad(dest, _dest_len, src, strnlen(src, _dest_len), pad); \ +} while (0) + +/** + * strtomem - Copy NUL-terminated string to non-NUL-terminated buffer + * + * @dest: Pointer of destination character array (marked as __nonstring) + * @src: Pointer to NUL-terminated string + * + * This is a replacement for strncpy() uses where the destination is not + * a NUL-terminated string, but with bounds checking on the source size, and + * without trailing padding. If padding is required, use strtomem_pad(). + * + * Note that the size of @dest is not an argument, as the length of @dest + * must be discoverable by the compiler. + */ +#define strtomem(dest, src) do { \ + const size_t _dest_len = __builtin_object_size(dest, 1); \ + \ + BUILD_BUG_ON(!__builtin_constant_p(_dest_len) || \ + _dest_len == (size_t)-1); \ + memcpy(dest, src, min(_dest_len, strnlen(src, _dest_len))); \ +} while (0) + /** * memset_after - Set a value after a struct member to the end of a struct * -- cgit From d07c0acb4f41cc42a0d97530946965b3e4fa68c1 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 2 Sep 2022 13:02:26 -0700 Subject: fortify: Fix __compiletime_strlen() under UBSAN_BOUNDS_LOCAL With CONFIG_FORTIFY=y and CONFIG_UBSAN_LOCAL_BOUNDS=y enabled, we observe a runtime panic while running Android's Compatibility Test Suite's (CTS) android.hardware.input.cts.tests. This is stemming from a strlen() call in hidinput_allocate(). __compiletime_strlen() is implemented in terms of __builtin_object_size(), then does an array access to check for NUL-termination. A quirk of __builtin_object_size() is that for strings whose values are runtime dependent, __builtin_object_size(str, 1 or 0) returns the maximum size of possible values when those sizes are determinable at compile time. Example: static const char *v = "FOO BAR"; static const char *y = "FOO BA"; unsigned long x (int z) { // Returns 8, which is: // max(__builtin_object_size(v, 1), __builtin_object_size(y, 1)) return __builtin_object_size(z ? v : y, 1); } So when FORTIFY_SOURCE is enabled, the current implementation of __compiletime_strlen() will try to access beyond the end of y at runtime using the size of v. Mixed with UBSAN_LOCAL_BOUNDS we get a fault. hidinput_allocate() has a local C string whose value is control flow dependent on a switch statement, so __builtin_object_size(str, 1) evaluates to the maximum string length, making all other cases fault on the last character check. hidinput_allocate() could be cleaned up to avoid runtime calls to strlen() since the local variable can only have literal values, so there's no benefit to trying to fortify the strlen call site there. Perform a __builtin_constant_p() check against index 0 earlier in the macro to filter out the control-flow-dependant case. Add a KUnit test for checking the expected behavioral characteristics of FORTIFY_SOURCE internals. Cc: Nathan Chancellor Cc: Tom Rix Cc: Andrew Morton Cc: Vlastimil Babka Cc: "Steven Rostedt (Google)" Cc: David Gow Cc: Yury Norov Cc: Masami Hiramatsu Cc: Sander Vanheule Cc: linux-hardening@vger.kernel.org Cc: llvm@lists.linux.dev Reviewed-by: Nick Desaulniers Tested-by: Android Treehugger Robot Link: https://android-review.googlesource.com/c/kernel/common/+/2206839 Co-developed-by: Nick Desaulniers Signed-off-by: Nick Desaulniers Signed-off-by: Kees Cook --- include/linux/fortify-string.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index 8e8c2c87b1d5..be264091f7a7 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -19,7 +19,8 @@ void __write_overflow_field(size_t avail, size_t wanted) __compiletime_warning(" unsigned char *__p = (unsigned char *)(p); \ size_t __ret = (size_t)-1; \ size_t __p_size = __builtin_object_size(p, 1); \ - if (__p_size != (size_t)-1) { \ + if (__p_size != (size_t)-1 && \ + __builtin_constant_p(*__p)) { \ size_t __p_len = __p_size - 1; \ if (__builtin_constant_p(__p[__p_len]) && \ __p[__p_len] == '\0') \ -- cgit From 311fb40aa0569abacc430b0d66ee41470803111f Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 2 Sep 2022 13:23:06 -0700 Subject: fortify: Use SIZE_MAX instead of (size_t)-1 Clean up uses of "(size_t)-1" in favor of SIZE_MAX. Cc: linux-hardening@vger.kernel.org Suggested-by: Nick Desaulniers Signed-off-by: Kees Cook --- include/linux/fortify-string.h | 29 +++++++++++++++-------------- 1 file changed, 15 insertions(+), 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index be264091f7a7..e46af17d23d0 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -3,6 +3,7 @@ #define _LINUX_FORTIFY_STRING_H_ #include +#include #define __FORTIFY_INLINE extern __always_inline __gnu_inline __overloadable #define __RENAME(x) __asm__(#x) @@ -17,9 +18,9 @@ void __write_overflow_field(size_t avail, size_t wanted) __compiletime_warning(" #define __compiletime_strlen(p) \ ({ \ unsigned char *__p = (unsigned char *)(p); \ - size_t __ret = (size_t)-1; \ + size_t __ret = SIZE_MAX; \ size_t __p_size = __builtin_object_size(p, 1); \ - if (__p_size != (size_t)-1 && \ + if (__p_size != SIZE_MAX && \ __builtin_constant_p(*__p)) { \ size_t __p_len = __p_size - 1; \ if (__builtin_constant_p(__p[__p_len]) && \ @@ -127,7 +128,7 @@ char *strcat(char * const POS p, const char *q) { size_t p_size = __builtin_object_size(p, 1); - if (p_size == (size_t)-1) + if (p_size == SIZE_MAX) return __underlying_strcat(p, q); if (strlcat(p, q, p_size) >= p_size) fortify_panic(__func__); @@ -142,7 +143,7 @@ __FORTIFY_INLINE __kernel_size_t strnlen(const char * const POS p, __kernel_size size_t ret; /* We can take compile-time actions when maxlen is const. */ - if (__builtin_constant_p(maxlen) && p_len != (size_t)-1) { + if (__builtin_constant_p(maxlen) && p_len != SIZE_MAX) { /* If p is const, we can use its compile-time-known len. */ if (maxlen >= p_size) return p_len; @@ -170,7 +171,7 @@ __kernel_size_t __fortify_strlen(const char * const POS p) size_t p_size = __builtin_object_size(p, 1); /* Give up if we don't know how large p is. */ - if (p_size == (size_t)-1) + if (p_size == SIZE_MAX) return __underlying_strlen(p); ret = strnlen(p, p_size); if (p_size <= ret) @@ -187,7 +188,7 @@ __FORTIFY_INLINE size_t strlcpy(char * const POS p, const char * const POS q, si size_t q_len; /* Full count of source string length. */ size_t len; /* Count of characters going into destination. */ - if (p_size == (size_t)-1 && q_size == (size_t)-1) + if (p_size == SIZE_MAX && q_size == SIZE_MAX) return __real_strlcpy(p, q, size); q_len = strlen(q); len = (q_len >= size) ? size - 1 : q_len; @@ -215,7 +216,7 @@ __FORTIFY_INLINE ssize_t strscpy(char * const POS p, const char * const POS q, s size_t q_size = __builtin_object_size(q, 1); /* If we cannot get size of p and q default to call strscpy. */ - if (p_size == (size_t) -1 && q_size == (size_t) -1) + if (p_size == SIZE_MAX && q_size == SIZE_MAX) return __real_strscpy(p, q, size); /* @@ -260,7 +261,7 @@ char *strncat(char * const POS p, const char * const POS q, __kernel_size_t coun size_t p_size = __builtin_object_size(p, 1); size_t q_size = __builtin_object_size(q, 1); - if (p_size == (size_t)-1 && q_size == (size_t)-1) + if (p_size == SIZE_MAX && q_size == SIZE_MAX) return __underlying_strncat(p, q, count); p_len = strlen(p); copy_len = strnlen(q, count); @@ -301,10 +302,10 @@ __FORTIFY_INLINE void fortify_memset_chk(__kernel_size_t size, /* * Always stop accesses beyond the struct that contains the * field, when the buffer's remaining size is known. - * (The -1 test is to optimize away checks where the buffer + * (The SIZE_MAX test is to optimize away checks where the buffer * lengths are unknown.) */ - if (p_size != (size_t)(-1) && p_size < size) + if (p_size != SIZE_MAX && p_size < size) fortify_panic("memset"); } @@ -395,11 +396,11 @@ __FORTIFY_INLINE void fortify_memcpy_chk(__kernel_size_t size, /* * Always stop accesses beyond the struct that contains the * field, when the buffer's remaining size is known. - * (The -1 test is to optimize away checks where the buffer + * (The SIZE_MAX test is to optimize away checks where the buffer * lengths are unknown.) */ - if ((p_size != (size_t)(-1) && p_size < size) || - (q_size != (size_t)(-1) && q_size < size)) + if ((p_size != SIZE_MAX && p_size < size) || + (q_size != SIZE_MAX && q_size < size)) fortify_panic(func); } @@ -498,7 +499,7 @@ char *strcpy(char * const POS p, const char * const POS q) size_t size; /* If neither buffer size is known, immediately give up. */ - if (p_size == (size_t)-1 && q_size == (size_t)-1) + if (p_size == SIZE_MAX && q_size == SIZE_MAX) return __underlying_strcpy(p, q); size = strlen(q) + 1; /* Compile-time check for const size overflow. */ -- cgit From 54d9469bc515dc5fcbc20eecbe19cea868b70d68 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Thu, 24 Jun 2021 15:39:26 -0700 Subject: fortify: Add run-time WARN for cross-field memcpy() Enable run-time checking of dynamic memcpy() and memmove() lengths, issuing a WARN when a write would exceed the size of the target struct member, when built with CONFIG_FORTIFY_SOURCE=y. This would have caught all of the memcpy()-based buffer overflows in the last 3 years, specifically covering all the cases where the destination buffer size is known at compile time. This change ONLY adds a run-time warning. As false positives are currently still expected, this will not block the overflow. The new warnings will look like this: memcpy: detected field-spanning write (size N) of single field "var->dest" (size M) WARNING: CPU: n PID: pppp at source/file/path.c:nr function+0xXX/0xXX [module] There may be false positives in the kernel where intentional field-spanning writes are happening. These need to be addressed similarly to how the compile-time cases were addressed: add a struct_group(), split the memcpy(), or some other refactoring. In order to make counting/investigating instances of added runtime checks easier, each instance includes the destination variable name as a WARN argument, prefixed with 'field "'. Therefore, on an x86_64 defconfig build, it is trivial to inspect the build artifacts to find instances. For example on an x86_64 defconfig build, there are 78 new run-time memcpy() bounds checks added: $ for i in vmlinux $(find . -name '*.ko'); do \ strings "$i" | grep '^field "'; done | wc -l 78 Simple cases where a destination buffer is known to be a dynamic size do not generate a WARN. For example: struct normal_flex_array { void *a; int b; u32 c; size_t array_size; u8 flex_array[]; }; struct normal_flex_array *instance; ... /* These will be ignored for run-time bounds checking. */ memcpy(instance, src, len); memcpy(instance->flex_array, src, len); However, one of the dynamic-sized destination cases is irritatingly unable to be detected by the compiler: when using memcpy() to target a composite struct member which contains a trailing flexible array struct. For example: struct wrapper { int foo; char bar; struct normal_flex_array embedded; }; struct wrapper *instance; ... /* This will incorrectly WARN when len > sizeof(instance->embedded) */ memcpy(&instance->embedded, src, len); These cases end up appearing to the compiler to be sized as if the flexible array had 0 elements. :( For more details see: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832 https://godbolt.org/z/vW6x8vh4P These "composite flexible array structure destination" cases will be need to be flushed out and addressed on a case-by-case basis. Regardless, for the general case of using memcpy() on flexible array destinations, future APIs will be created to handle common cases. Those can be used to migrate away from open-coded memcpy() so that proper error handling (instead of trapping) can be used. As mentioned, none of these bounds checks block any overflows currently. For users that have tested their workloads, do not encounter any warnings, and wish to make these checks stop any overflows, they can use a big hammer and set the sysctl panic_on_warn=1. Signed-off-by: Kees Cook --- include/linux/fortify-string.h | 70 ++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 67 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index e46af17d23d0..ff879efe94ed 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -2,6 +2,7 @@ #ifndef _LINUX_FORTIFY_STRING_H_ #define _LINUX_FORTIFY_STRING_H_ +#include #include #include @@ -353,7 +354,7 @@ __FORTIFY_INLINE void fortify_memset_chk(__kernel_size_t size, * V = vulnerable to run-time overflow (will need refactoring to solve) * */ -__FORTIFY_INLINE void fortify_memcpy_chk(__kernel_size_t size, +__FORTIFY_INLINE bool fortify_memcpy_chk(__kernel_size_t size, const size_t p_size, const size_t q_size, const size_t p_size_field, @@ -402,16 +403,79 @@ __FORTIFY_INLINE void fortify_memcpy_chk(__kernel_size_t size, if ((p_size != SIZE_MAX && p_size < size) || (q_size != SIZE_MAX && q_size < size)) fortify_panic(func); + + /* + * Warn when writing beyond destination field size. + * + * We must ignore p_size_field == 0 for existing 0-element + * fake flexible arrays, until they are all converted to + * proper flexible arrays. + * + * The implementation of __builtin_object_size() behaves + * like sizeof() when not directly referencing a flexible + * array member, which means there will be many bounds checks + * that will appear at run-time, without a way for them to be + * detected at compile-time (as can be done when the destination + * is specifically the flexible array member). + * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101832 + */ + if (p_size_field != 0 && p_size_field != SIZE_MAX && + p_size != p_size_field && p_size_field < size) + return true; + + return false; } #define __fortify_memcpy_chk(p, q, size, p_size, q_size, \ p_size_field, q_size_field, op) ({ \ size_t __fortify_size = (size_t)(size); \ - fortify_memcpy_chk(__fortify_size, p_size, q_size, \ - p_size_field, q_size_field, #op); \ + WARN_ONCE(fortify_memcpy_chk(__fortify_size, p_size, q_size, \ + p_size_field, q_size_field, #op), \ + #op ": detected field-spanning write (size %zu) of single %s (size %zu)\n", \ + __fortify_size, \ + "field \"" #p "\" at " __FILE__ ":" __stringify(__LINE__), \ + p_size_field); \ __underlying_##op(p, q, __fortify_size); \ }) +/* + * Notes about compile-time buffer size detection: + * + * With these types... + * + * struct middle { + * u16 a; + * u8 middle_buf[16]; + * int b; + * }; + * struct end { + * u16 a; + * u8 end_buf[16]; + * }; + * struct flex { + * int a; + * u8 flex_buf[]; + * }; + * + * void func(TYPE *ptr) { ... } + * + * Cases where destination size cannot be currently detected: + * - the size of ptr's object (seemingly by design, gcc & clang fail): + * __builtin_object_size(ptr, 1) == SIZE_MAX + * - the size of flexible arrays in ptr's obj (by design, dynamic size): + * __builtin_object_size(ptr->flex_buf, 1) == SIZE_MAX + * - the size of ANY array at the end of ptr's obj (gcc and clang bug): + * __builtin_object_size(ptr->end_buf, 1) == SIZE_MAX + * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836 + * + * Cases where destination size is currently detected: + * - the size of non-array members within ptr's object: + * __builtin_object_size(ptr->a, 1) == 2 + * - the size of non-flexible-array in the middle of ptr's obj: + * __builtin_object_size(ptr->middle_buf, 1) == 16 + * + */ + /* * __builtin_object_size() must be captured here to avoid evaluating argument * side-effects further into the macro layers. -- cgit From b239da34203f49c40b5d656220c39647c3ff0b3c Mon Sep 17 00:00:00 2001 From: Kumar Kartikeya Dwivedi Date: Sun, 4 Sep 2022 22:41:28 +0200 Subject: bpf: Add helper macro bpf_for_each_reg_in_vstate For a lot of use cases in future patches, we will want to modify the state of registers part of some same 'group' (e.g. same ref_obj_id). It won't just be limited to releasing reference state, but setting a type flag dynamically based on certain actions, etc. Hence, we need a way to easily pass a callback to the function that iterates over all registers in current bpf_verifier_state in all frames upto (and including) the curframe. While in C++ we would be able to easily use a lambda to pass state and the callback together, sadly we aren't using C++ in the kernel. The next best thing to avoid defining a function for each case seems like statement expressions in GNU C. The kernel already uses them heavily, hence they can passed to the macro in the style of a lambda. The statement expression will then be substituted in the for loop bodies. Variables __state and __reg are set to current bpf_func_state and reg for each invocation of the expression inside the passed in verifier state. Then, convert mark_ptr_or_null_regs, clear_all_pkt_pointers, release_reference, find_good_pkt_pointers, find_equal_scalars to use bpf_for_each_reg_in_vstate. Signed-off-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220904204145.3089-16-memxor@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf_verifier.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 8fbc1d05281e..b49a349cc6ae 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -348,6 +348,27 @@ struct bpf_verifier_state { iter < frame->allocated_stack / BPF_REG_SIZE; \ iter++, reg = bpf_get_spilled_reg(iter, frame)) +/* Invoke __expr over regsiters in __vst, setting __state and __reg */ +#define bpf_for_each_reg_in_vstate(__vst, __state, __reg, __expr) \ + ({ \ + struct bpf_verifier_state *___vstate = __vst; \ + int ___i, ___j; \ + for (___i = 0; ___i <= ___vstate->curframe; ___i++) { \ + struct bpf_reg_state *___regs; \ + __state = ___vstate->frame[___i]; \ + ___regs = __state->regs; \ + for (___j = 0; ___j < MAX_BPF_REG; ___j++) { \ + __reg = &___regs[___j]; \ + (void)(__expr); \ + } \ + bpf_for_each_spilled_reg(___j, __state, __reg) { \ + if (!__reg) \ + continue; \ + (void)(__expr); \ + } \ + } \ + }) + /* linked list of verifier states used to prune search */ struct bpf_verifier_state_list { struct bpf_verifier_state state; -- cgit From d01387fc16421cbbf95d1fda8fe1258195396c64 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Wed, 7 Sep 2022 15:52:31 +0100 Subject: firmware: arm_ffa: Add pointer to the ffa_dev_ops in struct ffa_dev Currently ffa_dev_ops_get() is the way to fetch the ffa_dev_ops pointer. It checks if the ffa_dev structure pointer is valid before returning the ffa_dev_ops pointer. Instead, the pointer can be made part of the ffa_dev structure and since the core driver is incharge of creating ffa_device for each identified partition, there is no need to check for the validity explicitly if the pointer is embedded in the structure. Add the pointer to the ffa_dev_ops in the ffa_dev structure itself and initialise the same as part of creation of the device. Link: https://lore.kernel.org/r/20220907145240.1683088-2-sudeep.holla@arm.com Reviewed-by: Jens Wiklander Signed-off-by: Sudeep Holla --- include/linux/arm_ffa.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/arm_ffa.h b/include/linux/arm_ffa.h index e5c76c1ef9ed..91b47e42b73d 100644 --- a/include/linux/arm_ffa.h +++ b/include/linux/arm_ffa.h @@ -17,6 +17,7 @@ struct ffa_device { bool mode_32bit; uuid_t uuid; struct device dev; + const struct ffa_dev_ops *ops; }; #define to_ffa_dev(d) container_of(d, struct ffa_device, dev) @@ -47,7 +48,8 @@ static inline void *ffa_dev_get_drvdata(struct ffa_device *fdev) } #if IS_REACHABLE(CONFIG_ARM_FFA_TRANSPORT) -struct ffa_device *ffa_device_register(const uuid_t *uuid, int vm_id); +struct ffa_device *ffa_device_register(const uuid_t *uuid, int vm_id, + const struct ffa_dev_ops *ops); void ffa_device_unregister(struct ffa_device *ffa_dev); int ffa_driver_register(struct ffa_driver *driver, struct module *owner, const char *mod_name); @@ -57,7 +59,8 @@ const struct ffa_dev_ops *ffa_dev_ops_get(struct ffa_device *dev); #else static inline -struct ffa_device *ffa_device_register(const uuid_t *uuid, int vm_id) +struct ffa_device *ffa_device_register(const uuid_t *uuid, int vm_id, + const struct ffa_dev_ops *ops) { return NULL; } -- cgit From 55bf84fd0a76894ae29c69b3552e073fa37818be Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Wed, 7 Sep 2022 15:52:33 +0100 Subject: firmware: arm_ffa: Remove ffa_dev_ops_get() The only user of this exported ffa_dev_ops_get() was OPTEE driver which now uses ffa_dev->ops directly, there are no other users for this. Also, since any ffa driver can use ffa_dev->ops directly, there will be no need for ffa_dev_ops_get(), so just remove ffa_dev_ops_get(). Link: https://lore.kernel.org/r/20220907145240.1683088-4-sudeep.holla@arm.com Reviewed-by: Jens Wiklander Signed-off-by: Sudeep Holla --- include/linux/arm_ffa.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/arm_ffa.h b/include/linux/arm_ffa.h index 91b47e42b73d..556f50f27fb1 100644 --- a/include/linux/arm_ffa.h +++ b/include/linux/arm_ffa.h @@ -55,7 +55,6 @@ int ffa_driver_register(struct ffa_driver *driver, struct module *owner, const char *mod_name); void ffa_driver_unregister(struct ffa_driver *driver); bool ffa_device_is_valid(struct ffa_device *ffa_dev); -const struct ffa_dev_ops *ffa_dev_ops_get(struct ffa_device *dev); #else static inline @@ -79,11 +78,6 @@ static inline void ffa_driver_unregister(struct ffa_driver *driver) {} static inline bool ffa_device_is_valid(struct ffa_device *ffa_dev) { return false; } -static inline -const struct ffa_dev_ops *ffa_dev_ops_get(struct ffa_device *dev) -{ - return NULL; -} #endif /* CONFIG_ARM_FFA_TRANSPORT */ #define ffa_register(driver) \ -- cgit From 8c3812c8f74f050278d734ec4b90149d84bdbefb Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Wed, 7 Sep 2022 15:52:36 +0100 Subject: firmware: arm_ffa: Make memory apis ffa_device independent There is a requirement to make memory APIs independent of the ffa_device. One of the use-case is to have a common memory driver that manages the memory for all the ffa_devices. That common memory driver won't be a ffa_driver or won't have any ffa_device associated with it. So having these memory APIs accessible without a ffa_device is needed and should be possible as most of these are handled by the partition manager(SPM or hypervisor). Drop the ffa_device argument to the memory APIs and make them ffa_device independent. Link: https://lore.kernel.org/r/20220907145240.1683088-7-sudeep.holla@arm.com Acked-by: Jens Wiklander Signed-off-by: Sudeep Holla --- include/linux/arm_ffa.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/arm_ffa.h b/include/linux/arm_ffa.h index 556f50f27fb1..eafab07c9f58 100644 --- a/include/linux/arm_ffa.h +++ b/include/linux/arm_ffa.h @@ -262,10 +262,8 @@ struct ffa_dev_ops { int (*sync_send_receive)(struct ffa_device *dev, struct ffa_send_direct_data *data); int (*memory_reclaim)(u64 g_handle, u32 flags); - int (*memory_share)(struct ffa_device *dev, - struct ffa_mem_ops_args *args); - int (*memory_lend)(struct ffa_device *dev, - struct ffa_mem_ops_args *args); + int (*memory_share)(struct ffa_mem_ops_args *args); + int (*memory_lend)(struct ffa_mem_ops_args *args); }; #endif /* _LINUX_ARM_FFA_H */ -- cgit From 7aa7a97989557011f762a4b7c2e4e3b061b638e4 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Wed, 7 Sep 2022 15:52:37 +0100 Subject: firmware: arm_ffa: Rename ffa_dev_ops as ffa_ops Except the message APIs, all other APIs are ffa_device independent and can be used without any associated ffa_device from a non ffa_driver. In order to reflect the same, just rename ffa_dev_ops as ffa_ops to avoid any confusion or to keep it simple. Link: https://lore.kernel.org/r/20220907145240.1683088-8-sudeep.holla@arm.com Suggested-by: Sumit Garg Reviewed-by: Sumit Garg Reviewed-by: Jens Wiklander Signed-off-by: Sudeep Holla --- include/linux/arm_ffa.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/arm_ffa.h b/include/linux/arm_ffa.h index eafab07c9f58..4c4b06783035 100644 --- a/include/linux/arm_ffa.h +++ b/include/linux/arm_ffa.h @@ -17,7 +17,7 @@ struct ffa_device { bool mode_32bit; uuid_t uuid; struct device dev; - const struct ffa_dev_ops *ops; + const struct ffa_ops *ops; }; #define to_ffa_dev(d) container_of(d, struct ffa_device, dev) @@ -49,7 +49,7 @@ static inline void *ffa_dev_get_drvdata(struct ffa_device *fdev) #if IS_REACHABLE(CONFIG_ARM_FFA_TRANSPORT) struct ffa_device *ffa_device_register(const uuid_t *uuid, int vm_id, - const struct ffa_dev_ops *ops); + const struct ffa_ops *ops); void ffa_device_unregister(struct ffa_device *ffa_dev); int ffa_driver_register(struct ffa_driver *driver, struct module *owner, const char *mod_name); @@ -59,7 +59,7 @@ bool ffa_device_is_valid(struct ffa_device *ffa_dev); #else static inline struct ffa_device *ffa_device_register(const uuid_t *uuid, int vm_id, - const struct ffa_dev_ops *ops) + const struct ffa_ops *ops) { return NULL; } @@ -254,7 +254,7 @@ struct ffa_mem_ops_args { struct ffa_mem_region_attributes *attrs; }; -struct ffa_dev_ops { +struct ffa_ops { u32 (*api_version_get)(void); int (*partition_info_get)(const char *uuid_str, struct ffa_partition_info *buffer); -- cgit From bb1be749850055d88d839eff0962e5915788f228 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Wed, 7 Sep 2022 15:52:38 +0100 Subject: firmware: arm_ffa: Add v1.1 get_partition_info support FF-A v1.1 adds support to discovery the UUIDs of the partitions that was missing in v1.0 and which the driver workarounds by using UUIDs supplied by the ffa_drivers. Add the v1.1 get_partition_info support and disable the workaround if the detected FF-A version is greater than v1.0. Link: https://lore.kernel.org/r/20220907145240.1683088-9-sudeep.holla@arm.com Reviewed-by: Jens Wiklander Signed-off-by: Sudeep Holla --- include/linux/arm_ffa.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/arm_ffa.h b/include/linux/arm_ffa.h index 4c4b06783035..09567ffd1f49 100644 --- a/include/linux/arm_ffa.h +++ b/include/linux/arm_ffa.h @@ -107,6 +107,7 @@ struct ffa_partition_info { /* partition can send and receive indirect messages. */ #define FFA_PARTITION_INDIRECT_MSG BIT(2) u32 properties; + u32 uuid[4]; }; /* For use with FFA_MSG_SEND_DIRECT_{REQ,RESP} which pass data via registers */ -- cgit From 106b11b1ccd5a43432d9517f4a26629a1658cfe6 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Wed, 7 Sep 2022 15:52:39 +0100 Subject: firmware: arm_ffa: Set up 32bit execution mode flag using partiion property FF-A v1.1 adds a flag in the partition properties to indicate if the partition runs in the AArch32 or AArch64 execution state. Use the same to set-up the 32-bit execution flag mode in the ffa_dev automatically if the detected firmware version is above v1.0 and ignore any requests to do the same from the ffa_driver. Link: https://lore.kernel.org/r/20220907145240.1683088-10-sudeep.holla@arm.com Signed-off-by: Sudeep Holla --- include/linux/arm_ffa.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/arm_ffa.h b/include/linux/arm_ffa.h index 09567ffd1f49..5964b6104996 100644 --- a/include/linux/arm_ffa.h +++ b/include/linux/arm_ffa.h @@ -106,6 +106,8 @@ struct ffa_partition_info { #define FFA_PARTITION_DIRECT_SEND BIT(1) /* partition can send and receive indirect messages. */ #define FFA_PARTITION_INDIRECT_MSG BIT(2) +/* partition runs in the AArch64 execution state. */ +#define FFA_PARTITION_AARCH64_EXEC BIT(8) u32 properties; u32 uuid[4]; }; -- cgit From 5b0c6328e47dccf552996ca711005ca3f44034e9 Mon Sep 17 00:00:00 2001 From: Sudeep Holla Date: Wed, 7 Sep 2022 15:52:40 +0100 Subject: firmware: arm_ffa: Split up ffa_ops into info, message and memory operations In preparation to make memory operations accessible for a non ffa_driver/device, it is better to split the ffa_ops into different categories of operations: info, message and memory. The info and memory are ffa_device independent and can be used without any associated ffa_device from a non ffa_driver. However, we don't export these info and memory APIs yet without the user. The first users of these APIs can export them. Link: https://lore.kernel.org/r/20220907145240.1683088-11-sudeep.holla@arm.com Reviewed-by: Jens Wiklander Signed-off-by: Sudeep Holla --- include/linux/arm_ffa.h | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/arm_ffa.h b/include/linux/arm_ffa.h index 5964b6104996..5f02d2e6b9d9 100644 --- a/include/linux/arm_ffa.h +++ b/include/linux/arm_ffa.h @@ -257,16 +257,28 @@ struct ffa_mem_ops_args { struct ffa_mem_region_attributes *attrs; }; -struct ffa_ops { +struct ffa_info_ops { u32 (*api_version_get)(void); int (*partition_info_get)(const char *uuid_str, struct ffa_partition_info *buffer); +}; + +struct ffa_msg_ops { void (*mode_32bit_set)(struct ffa_device *dev); int (*sync_send_receive)(struct ffa_device *dev, struct ffa_send_direct_data *data); +}; + +struct ffa_mem_ops { int (*memory_reclaim)(u64 g_handle, u32 flags); int (*memory_share)(struct ffa_mem_ops_args *args); int (*memory_lend)(struct ffa_mem_ops_args *args); }; +struct ffa_ops { + const struct ffa_info_ops *info_ops; + const struct ffa_msg_ops *msg_ops; + const struct ffa_mem_ops *mem_ops; +}; + #endif /* _LINUX_ARM_FFA_H */ -- cgit From 2f79cdfe58c13949bbbb65ba5926abfe9561d0ec Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Wed, 31 Aug 2022 09:46:12 -0700 Subject: fs: only do a memory barrier for the first set_buffer_uptodate() Commit d4252071b97d ("add barriers to buffer_uptodate and set_buffer_uptodate") added proper memory barriers to the buffer head BH_Uptodate bit, so that anybody who tests a buffer for being up-to-date will be guaranteed to actually see initialized state. However, that commit didn't _just_ add the memory barrier, it also ended up dropping the "was it already set" logic that the BUFFER_FNS() macro had. That's conceptually the right thing for a generic "this is a memory barrier" operation, but in the case of the buffer contents, we really only care about the memory barrier for the _first_ time we set the bit, in that the only memory ordering protection we need is to avoid anybody seeing uninitialized memory contents. Any other access ordering wouldn't be about the BH_Uptodate bit anyway, and would require some other proper lock (typically BH_Lock or the folio lock). A reader that races with somebody invalidating the buffer head isn't an issue wrt the memory ordering, it's a serialization issue. Now, you'd think that the buffer head operations don't matter in this day and age (and I certainly thought so), but apparently some loads still end up being heavy users of buffer heads. In particular, the kernel test robot reported that not having this bit access optimization in place caused a noticeable direct IO performance regression on ext4: fxmark.ssd_ext4_no_jnl_DWTL_54_directio.works/sec -26.5% regression although you presumably need a fast disk and a lot of cores to actually notice. Link: https://lore.kernel.org/all/Yw8L7HTZ%2FdE2%2Fo9C@xsang-OptiPlex-9020/ Reported-by: kernel test robot Tested-by: Fengwei Yin Cc: Mikulas Patocka Cc: Matthew Wilcox (Oracle) Cc: stable@kernel.org Signed-off-by: Linus Torvalds --- include/linux/buffer_head.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 089c9ade4325..df518c429667 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -137,6 +137,17 @@ BUFFER_FNS(Defer_Completion, defer_completion) static __always_inline void set_buffer_uptodate(struct buffer_head *bh) { + /* + * If somebody else already set this uptodate, they will + * have done the memory barrier, and a reader will thus + * see *some* valid buffer state. + * + * Any other serialization (with IO errors or whatever that + * might clear the bit) has to come from other state (eg BH_Lock). + */ + if (test_bit(BH_Uptodate, &bh->b_state)) + return; + /* * make it consistent with folio_mark_uptodate * pairs with smp_load_acquire in buffer_uptodate -- cgit From 86432b7f8f92b784c2e4af5b02766fb44052abf7 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Thu, 8 Sep 2022 16:05:18 +0300 Subject: spi: Group cs_change and cs_off flags together in struct spi_transfer The commit 5e0531f6b90a ("spi: Add capability to perform some transfer with chipselect off") added a new flag but squeezed it into a wrong group of struct spi_transfer members (note that SPI_NBITS_* are macros for easier interpretation of the tx_nbits and rx_nbits bitfields). Group cs_change and cs_off flags together and their doc strings. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220908130518.32186-1-andriy.shevchenko@linux.intel.com Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 6e6c62c59957..19acd4c9c7e3 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -846,8 +846,8 @@ struct spi_res { * @bits_per_word: select a bits_per_word other than the device default * for this transfer. If 0 the default (from @spi_device) is used. * @dummy_data: indicates transfer is dummy bytes transfer. - * @cs_change: affects chipselect after this transfer completes * @cs_off: performs the transfer with chipselect off. + * @cs_change: affects chipselect after this transfer completes * @cs_change_delay: delay between cs deassert and assert when * @cs_change is set and @spi_transfer is not the last in @spi_message * @delay: delay to be introduced after this transfer before @@ -957,10 +957,10 @@ struct spi_transfer { struct sg_table rx_sg; unsigned dummy_data:1; + unsigned cs_off:1; unsigned cs_change:1; unsigned tx_nbits:3; unsigned rx_nbits:3; - unsigned cs_off:1; #define SPI_NBITS_SINGLE 0x01 /* 1bit transfer */ #define SPI_NBITS_DUAL 0x02 /* 2bits transfer */ #define SPI_NBITS_QUAD 0x04 /* 4bits transfer */ -- cgit From 58ccf0190d19d9a8a41f8a02b9e06742b58df4a1 Mon Sep 17 00:00:00 2001 From: Joao Martins Date: Thu, 8 Sep 2022 21:34:42 +0300 Subject: vfio: Add an IOVA bitmap support The new facility adds a bunch of wrappers that abstract how an IOVA range is represented in a bitmap that is granulated by a given page_size. So it translates all the lifting of dealing with user pointers into its corresponding kernel addresses backing said user memory into doing finally the (non-atomic) bitmap ops to change various bits. The formula for the bitmap is: data[(iova / page_size) / 64] & (1ULL << (iova % 64)) Where 64 is the number of bits in a unsigned long (depending on arch) It introduces an IOVA iterator that uses a windowing scheme to minimize the pinning overhead, as opposed to pinning it on demand 4K at a time. Assuming a 4K kernel page and 4K requested page size, we can use a single kernel page to hold 512 page pointers, mapping 2M of bitmap, representing 64G of IOVA space. An example usage of these helpers for a given @base_iova, @page_size, @length and __user @data: bitmap = iova_bitmap_alloc(base_iova, page_size, length, data); if (IS_ERR(bitmap)) return -ENOMEM; ret = iova_bitmap_for_each(bitmap, arg, dirty_reporter_fn); iova_bitmap_free(bitmap); Each iteration of the @dirty_reporter_fn is called with a unique @iova and @length argument, indicating the current range available through the iova_bitmap. The @dirty_reporter_fn uses iova_bitmap_set() to mark dirty areas (@iova_length) within that provided range, as following: iova_bitmap_set(bitmap, iova, iova_length); The facility is intended to be used for user bitmaps representing dirtied IOVAs by IOMMU (via IOMMUFD) and PCI Devices (via vfio-pci). Signed-off-by: Joao Martins Signed-off-by: Yishai Hadas Link: https://lore.kernel.org/r/20220908183448.195262-5-yishaih@nvidia.com Signed-off-by: Alex Williamson --- include/linux/iova_bitmap.h | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) create mode 100644 include/linux/iova_bitmap.h (limited to 'include/linux') diff --git a/include/linux/iova_bitmap.h b/include/linux/iova_bitmap.h new file mode 100644 index 000000000000..c006cf0a25f3 --- /dev/null +++ b/include/linux/iova_bitmap.h @@ -0,0 +1,26 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (c) 2022, Oracle and/or its affiliates. + * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved + */ +#ifndef _IOVA_BITMAP_H_ +#define _IOVA_BITMAP_H_ + +#include + +struct iova_bitmap; + +typedef int (*iova_bitmap_fn_t)(struct iova_bitmap *bitmap, + unsigned long iova, size_t length, + void *opaque); + +struct iova_bitmap *iova_bitmap_alloc(unsigned long iova, size_t length, + unsigned long page_size, + u64 __user *data); +void iova_bitmap_free(struct iova_bitmap *bitmap); +int iova_bitmap_for_each(struct iova_bitmap *bitmap, void *opaque, + iova_bitmap_fn_t fn); +void iova_bitmap_set(struct iova_bitmap *bitmap, + unsigned long iova, size_t length); + +#endif -- cgit From 80c4b92a2dc48cce82a0348add48533db7e07314 Mon Sep 17 00:00:00 2001 From: Yishai Hadas Date: Thu, 8 Sep 2022 21:34:43 +0300 Subject: vfio: Introduce the DMA logging feature support Introduce the DMA logging feature support in the vfio core layer. It includes the processing of the device start/stop/report DMA logging UAPIs and calling the relevant driver 'op' to do the work. Specifically, Upon start, the core translates the given input ranges into an interval tree, checks for unexpected overlapping, non aligned ranges and then pass the translated input to the driver for start tracking the given ranges. Upon report, the core translates the given input user space bitmap and page size into an IOVA kernel bitmap iterator. Then it iterates it and call the driver to set the corresponding bits for the dirtied pages in a specific IOVA range. Upon stop, the driver is called to stop the previous started tracking. The next patches from the series will introduce the mlx5 driver implementation for the logging ops. Signed-off-by: Yishai Hadas Link: https://lore.kernel.org/r/20220908183448.195262-6-yishaih@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index e05ddc6fe6a5..0e2826559091 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -14,6 +14,7 @@ #include #include #include +#include struct kvm; @@ -33,10 +34,11 @@ struct vfio_device { struct device *dev; const struct vfio_device_ops *ops; /* - * mig_ops is a static property of the vfio_device which must be set - * prior to registering the vfio_device. + * mig_ops/log_ops is a static property of the vfio_device which must + * be set prior to registering the vfio_device. */ const struct vfio_migration_ops *mig_ops; + const struct vfio_log_ops *log_ops; struct vfio_group *group; struct vfio_device_set *dev_set; struct list_head dev_set_list; @@ -108,6 +110,28 @@ struct vfio_migration_ops { enum vfio_device_mig_state *curr_state); }; +/** + * @log_start: Optional callback to ask the device start DMA logging. + * @log_stop: Optional callback to ask the device stop DMA logging. + * @log_read_and_clear: Optional callback to ask the device read + * and clear the dirty DMAs in some given range. + * + * The vfio core implementation of the DEVICE_FEATURE_DMA_LOGGING_ set + * of features does not track logging state relative to the device, + * therefore the device implementation of vfio_log_ops must handle + * arbitrary user requests. This includes rejecting subsequent calls + * to log_start without an intervening log_stop, as well as graceful + * handling of log_stop and log_read_and_clear from invalid states. + */ +struct vfio_log_ops { + int (*log_start)(struct vfio_device *device, + struct rb_root_cached *ranges, u32 nnodes, u64 *page_size); + int (*log_stop)(struct vfio_device *device); + int (*log_read_and_clear)(struct vfio_device *device, + unsigned long iova, unsigned long length, + struct iova_bitmap *dirty); +}; + /** * vfio_check_feature - Validate user input for the VFIO_DEVICE_FEATURE ioctl * @flags: Arg from the device_feature op -- cgit From 1537bf26e212ffcf007d0590958025f6bfdd4ac8 Mon Sep 17 00:00:00 2001 From: Sergey Matyukevich Date: Tue, 30 Aug 2022 18:53:05 +0300 Subject: perf: RISC-V: exclude invalid pmu counters from SBI calls SBI firmware may not provide information for some counters in response to SBI_EXT_PMU_COUNTER_GET_INFO call. Exclude such counters from the subsequent SBI requests. For this purpose use global mask to keep track of fully specified counters. Signed-off-by: Sergey Matyukevich Reviewed-by: Atish Patra Link: https://lore.kernel.org/r/20220830155306.301714-3-geomatsi@gmail.com Signed-off-by: Palmer Dabbelt --- include/linux/perf/riscv_pmu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/perf/riscv_pmu.h b/include/linux/perf/riscv_pmu.h index bf66fe011fa8..e17e86ad6f3a 100644 --- a/include/linux/perf/riscv_pmu.h +++ b/include/linux/perf/riscv_pmu.h @@ -45,7 +45,7 @@ struct riscv_pmu { irqreturn_t (*handle_irq)(int irq_num, void *dev); - int num_counters; + unsigned long cmask; u64 (*ctr_read)(struct perf_event *event); int (*ctr_get_idx)(struct perf_event *event); int (*ctr_get_width)(int idx); -- cgit From bb5721f063ba63cd54cc5916881a706c21e6abc6 Mon Sep 17 00:00:00 2001 From: Colin Foster Date: Mon, 5 Sep 2022 09:21:25 -0700 Subject: mfd: ocelot: Add helper to get regmap from a resource Several ocelot-related modules are designed for MMIO / regmaps. As such, they often use a combination of devm_platform_get_and_ioremap_resource() and devm_regmap_init_mmio(). Operating in an MFD might be different, in that it could be memory mapped, or it could be SPI, I2C... In these cases a fallback to use IORESOURCE_REG instead of IORESOURCE_MEM becomes necessary. When this happens, there's redundant logic that needs to be implemented in every driver. In order to avoid this redundancy, utilize a single function that, if the MFD scenario is enabled, will perform this fallback logic. Signed-off-by: Colin Foster Reviewed-by: Vladimir Oltean Reviewed-by: Andy Shevchenko Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220905162132.2943088-2-colin.foster@in-advantage.com --- include/linux/mfd/ocelot.h | 62 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 62 insertions(+) create mode 100644 include/linux/mfd/ocelot.h (limited to 'include/linux') diff --git a/include/linux/mfd/ocelot.h b/include/linux/mfd/ocelot.h new file mode 100644 index 000000000000..dd72073d2d4f --- /dev/null +++ b/include/linux/mfd/ocelot.h @@ -0,0 +1,62 @@ +/* SPDX-License-Identifier: GPL-2.0 OR MIT */ +/* Copyright 2022 Innovative Advantage Inc. */ + +#ifndef _LINUX_MFD_OCELOT_H +#define _LINUX_MFD_OCELOT_H + +#include +#include +#include +#include +#include +#include + +struct resource; + +static inline struct regmap * +ocelot_regmap_from_resource_optional(struct platform_device *pdev, + unsigned int index, + const struct regmap_config *config) +{ + struct device *dev = &pdev->dev; + struct resource *res; + void __iomem *regs; + + /* + * Don't use _get_and_ioremap_resource() here, since that will invoke + * prints of "invalid resource" which will simply add confusion. + */ + res = platform_get_resource(pdev, IORESOURCE_MEM, index); + if (res) { + regs = devm_ioremap_resource(dev, res); + if (IS_ERR(regs)) + return ERR_CAST(regs); + return devm_regmap_init_mmio(dev, regs, config); + } + + /* + * Fall back to using REG and getting the resource from the parent + * device, which is possible in an MFD configuration + */ + if (dev->parent) { + res = platform_get_resource(pdev, IORESOURCE_REG, index); + if (!res) + return NULL; + + return dev_get_regmap(dev->parent, res->name); + } + + return NULL; +} + +static inline struct regmap * +ocelot_regmap_from_resource(struct platform_device *pdev, unsigned int index, + const struct regmap_config *config) +{ + struct regmap *map; + + map = ocelot_regmap_from_resource_optional(pdev, index, config); + return map ?: ERR_PTR(-ENOENT); +} + +#endif -- cgit From 39f7d0832c28a6a31abb5b7363e7b1ba0714fe18 Mon Sep 17 00:00:00 2001 From: Colin Foster Date: Mon, 5 Sep 2022 09:21:30 -0700 Subject: resource: add define macro for register address resources DEFINE_RES_ macros have been created for the commonly used resource types, but not IORESOURCE_REG. Add the macro so it can be used in a similar manner to all other resource types. Signed-off-by: Colin Foster Reviewed-by: Vladimir Oltean Reviewed-by: Andy Shevchenko Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220905162132.2943088-7-colin.foster@in-advantage.com --- include/linux/ioport.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 616b683563a9..8a76dca9deee 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -172,6 +172,11 @@ enum { #define DEFINE_RES_MEM(_start, _size) \ DEFINE_RES_MEM_NAMED((_start), (_size), NULL) +#define DEFINE_RES_REG_NAMED(_start, _size, _name) \ + DEFINE_RES_NAMED((_start), (_size), (_name), IORESOURCE_REG) +#define DEFINE_RES_REG(_start, _size) \ + DEFINE_RES_REG_NAMED((_start), (_size), NULL) + #define DEFINE_RES_IRQ_NAMED(_irq, _name) \ DEFINE_RES_NAMED((_irq), 1, (_name), IORESOURCE_IRQ) #define DEFINE_RES_IRQ(_irq) \ -- cgit From f2042ed21da7f8886c93efefff61f93e6d57e9bd Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Tue, 16 Aug 2022 18:28:05 +0100 Subject: iommu/dma: Make header private Now that dma-iommu.h only contains internal interfaces, make it private to the IOMMU subsytem. Signed-off-by: Robin Murphy Link: https://lore.kernel.org/r/b237e06c56a101f77af142a54b629b27aa179d22.1660668998.git.robin.murphy@arm.com [ joro : re-add stub for iommu_dma_get_resv_regions ] Signed-off-by: Joerg Roedel --- include/linux/dma-iommu.h | 53 ----------------------------------------------- 1 file changed, 53 deletions(-) delete mode 100644 include/linux/dma-iommu.h (limited to 'include/linux') diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h deleted file mode 100644 index e83de4f1f3d6..000000000000 --- a/include/linux/dma-iommu.h +++ /dev/null @@ -1,53 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Copyright (C) 2014-2015 ARM Ltd. - */ -#ifndef __DMA_IOMMU_H -#define __DMA_IOMMU_H - -#include -#include - -#ifdef CONFIG_IOMMU_DMA -#include -#include -#include - -/* Domain management interface for IOMMU drivers */ -int iommu_get_dma_cookie(struct iommu_domain *domain); -void iommu_put_dma_cookie(struct iommu_domain *domain); - -int iommu_dma_init_fq(struct iommu_domain *domain); - -void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list); - -void iommu_dma_free_cpu_cached_iovas(unsigned int cpu, - struct iommu_domain *domain); - -extern bool iommu_dma_forcedac; - -#else /* CONFIG_IOMMU_DMA */ - -struct iommu_domain; -struct device; - -static inline int iommu_dma_init_fq(struct iommu_domain *domain) -{ - return -EINVAL; -} - -static inline int iommu_get_dma_cookie(struct iommu_domain *domain) -{ - return -ENODEV; -} - -static inline void iommu_put_dma_cookie(struct iommu_domain *domain) -{ -} - -static inline void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list) -{ -} - -#endif /* CONFIG_IOMMU_DMA */ -#endif /* __DMA_IOMMU_H */ -- cgit From c9874d3ffeaf8ee215187692ed918b3031d996d1 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 16 Aug 2018 11:47:53 -0400 Subject: termios: start unifying non-UAPI parts of asm/termios.h * new header (linut/termios_internal.h), pulled by the users of those suckers * defaults for INIT_C_CC and externs for conversion helpers moved over there * remove termios-base.h (empty now) Signed-off-by: Al Viro Link: https://lore.kernel.org/r/YxDmptU7dNGZ+/Hn@ZenIV Signed-off-by: Greg Kroah-Hartman --- include/linux/termios_internal.h | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+) create mode 100644 include/linux/termios_internal.h (limited to 'include/linux') diff --git a/include/linux/termios_internal.h b/include/linux/termios_internal.h new file mode 100644 index 000000000000..103ca0370948 --- /dev/null +++ b/include/linux/termios_internal.h @@ -0,0 +1,30 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_TERMIOS_CONV_H +#define _LINUX_TERMIOS_CONV_H + +#include +#include + +#ifndef INIT_C_CC +/* intr=^C quit=^\ erase=del kill=^U + eof=^D vtime=\0 vmin=\1 sxtc=\0 + start=^Q stop=^S susp=^Z eol=\0 + reprint=^R discard=^U werase=^W lnext=^V + eol2=\0 +*/ +#define INIT_C_CC "\003\034\177\025\004\0\1\0\021\023\032\0\022\017\027\026\0" +#endif + +int user_termio_to_kernel_termios(struct ktermios *, struct termio __user *); +int kernel_termios_to_user_termio(struct termio __user *, struct ktermios *); +#ifdef TCGETS2 +int user_termios_to_kernel_termios(struct ktermios *, struct termios2 __user *); +int kernel_termios_to_user_termios(struct termios2 __user *, struct ktermios *); +int user_termios_to_kernel_termios_1(struct ktermios *, struct termios __user *); +int kernel_termios_to_user_termios_1(struct termios __user *, struct ktermios *); +#else /* TCGETS2 */ +int user_termios_to_kernel_termios(struct ktermios *, struct termios __user *); +int kernel_termios_to_user_termios(struct termios __user *, struct ktermios *); +#endif /* TCGETS2 */ + +#endif /* _LINUX_TERMIOS_CONV_H */ -- cgit From 38fc315a73f7a1b2d19eb32dd55de089b78b2a54 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Tue, 9 Aug 2022 17:17:04 -0400 Subject: termios: consolidate values for VDISCARD in INIT_C_CC On old systems it used to be ^O. Linux had never actually used the value, but INIT_C_CC (on i386) did initialize it to ^O; unfortunately, it had a typo in the comment claiming that to be ^U. Most of the architectures copied the (correct) definition along with mistaken comment. alpha, powerpc and sparc tried to make the definition match comment. However, util-linux still resets it to ^O on any architecture, ^O is the historical value, kernel ignores it anyway and finally, Linus said "Just change everybody to do the same, nobody cares about VDISCARD". Signed-off-by: Al Viro Link: https://lore.kernel.org/r/YxDmy//MKzs3ye7l@ZenIV Signed-off-by: Greg Kroah-Hartman --- include/linux/termios_internal.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/termios_internal.h b/include/linux/termios_internal.h index 103ca0370948..7eb3598c109d 100644 --- a/include/linux/termios_internal.h +++ b/include/linux/termios_internal.h @@ -9,7 +9,7 @@ /* intr=^C quit=^\ erase=del kill=^U eof=^D vtime=\0 vmin=\1 sxtc=\0 start=^Q stop=^S susp=^Z eol=\0 - reprint=^R discard=^U werase=^W lnext=^V + reprint=^R discard=^O werase=^W lnext=^V eol2=\0 */ #define INIT_C_CC "\003\034\177\025\004\0\1\0\021\023\032\0\022\017\027\026\0" -- cgit From d04f9915fa44b52d7a91080677381a082238e9c4 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 9 Sep 2018 17:57:42 -0400 Subject: make generic INIT_C_CC a bit more generic turn it into an array initializer; then alpha, mips and powerpc variants fold into it. Signed-off-by: Al Viro Link: https://lore.kernel.org/r/YxDm7M6M91gC2RPL@ZenIV Signed-off-by: Greg Kroah-Hartman --- include/linux/termios_internal.h | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/termios_internal.h b/include/linux/termios_internal.h index 7eb3598c109d..8a53141ab44a 100644 --- a/include/linux/termios_internal.h +++ b/include/linux/termios_internal.h @@ -12,7 +12,20 @@ reprint=^R discard=^O werase=^W lnext=^V eol2=\0 */ -#define INIT_C_CC "\003\034\177\025\004\0\1\0\021\023\032\0\022\017\027\026\0" +#define INIT_C_CC { \ + [VINTR] = 'C'-0x40, \ + [VQUIT] = '\\'-0x40, \ + [VERASE] = '\177', \ + [VKILL] = 'U'-0x40, \ + [VEOF] = 'D'-0x40, \ + [VSTART] = 'Q'-0x40, \ + [VSTOP] = 'S'-0x40, \ + [VSUSP] = 'Z'-0x40, \ + [VREPRINT] = 'R'-0x40, \ + [VDISCARD] = 'O'-0x40, \ + [VWERASE] = 'W'-0x40, \ + [VLNEXT] = 'V'-0x40, \ + [VMIN] = 1 } #endif int user_termio_to_kernel_termios(struct ktermios *, struct termio __user *); -- cgit From e7b4c812b9685e22753d6355e53fdeaaa22862dd Mon Sep 17 00:00:00 2001 From: Al Viro Date: Tue, 9 Aug 2022 19:41:40 -0400 Subject: termios: convert the last (sparc) INIT_C_CC to array Signed-off-by: Al Viro Link: https://lore.kernel.org/r/YxDnDCR2VRTA3Etp@ZenIV Signed-off-by: Greg Kroah-Hartman --- include/linux/termios_internal.h | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/termios_internal.h b/include/linux/termios_internal.h index 8a53141ab44a..d77f29e5e2b7 100644 --- a/include/linux/termios_internal.h +++ b/include/linux/termios_internal.h @@ -5,13 +5,19 @@ #include #include -#ifndef INIT_C_CC /* intr=^C quit=^\ erase=del kill=^U eof=^D vtime=\0 vmin=\1 sxtc=\0 start=^Q stop=^S susp=^Z eol=\0 reprint=^R discard=^O werase=^W lnext=^V eol2=\0 */ + +#ifdef VDSUSP +#define INIT_C_CC_VDSUSP_EXTRA [VDSUSP] = 'Y'-0x40, +#else +#define INIT_C_CC_VDSUSP_EXTRA +#endif + #define INIT_C_CC { \ [VINTR] = 'C'-0x40, \ [VQUIT] = '\\'-0x40, \ @@ -25,8 +31,8 @@ [VDISCARD] = 'O'-0x40, \ [VWERASE] = 'W'-0x40, \ [VLNEXT] = 'V'-0x40, \ + INIT_C_CC_VDSUSP_EXTRA \ [VMIN] = 1 } -#endif int user_termio_to_kernel_termios(struct ktermios *, struct termio __user *); int kernel_termios_to_user_termio(struct termio __user *, struct ktermios *); -- cgit From 89bbeb7e3199e1514729aa6de8057289e6375fe6 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Tue, 9 Aug 2022 19:41:40 -0400 Subject: termios: get rid of non-UAPI asm/termios.h All non-UAPI asm/termios.h consist of include of UAPI counterpart and, possibly, include of linux/uaccess.h The latter can't be simply removed, even though nothing in linux/termios.h doesn't depend upon it anymore - there are several places that rely upon that indirect chain of includes to pull linux/uaccess.h. So the include needs to be lifted out of there - we lift into tty_driver.h, serdev.h and places that pull asm/termios.h, but none of * linux/uaccess.h (obvious) * net/sock.h (pulls uaccess.h) * linux/{tty,tty_driver,serdev}.h (tty.h pulls tty_driver.h) That leaves us just with the include of UAPI asm/termios.h, which is what will resolve to if we simply remove non-UAPI header. Signed-off-by: Al Viro Link: https://lore.kernel.org/r/YxDnKvYCHn/ogBUv@ZenIV Signed-off-by: Greg Kroah-Hartman --- include/linux/serdev.h | 1 + include/linux/tty_driver.h | 1 + 2 files changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/serdev.h b/include/linux/serdev.h index 3368c261ab62..66f624fc618c 100644 --- a/include/linux/serdev.h +++ b/include/linux/serdev.h @@ -7,6 +7,7 @@ #include #include +#include #include #include diff --git a/include/linux/tty_driver.h b/include/linux/tty_driver.h index b2456b545ba0..f961164a5274 100644 --- a/include/linux/tty_driver.h +++ b/include/linux/tty_driver.h @@ -7,6 +7,7 @@ #include #include #include +#include #include #include -- cgit From d79ddb069c5257a924456eb99b53fc1ea715c0a3 Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Fri, 26 Aug 2022 00:41:05 +0800 Subject: sched/psi: Move private helpers to sched/stats.h This patch move psi_task_change/psi_task_switch declarations out of PSI public header, since they are only needed for implementing the PSI stats tracking in sched/stats.h psi_task_switch is obvious, psi_task_change can't be public helper since it doesn't check psi_disabled static key. And there is no any user now, so put it in sched/stats.h too. Signed-off-by: Chengming Zhou Signed-off-by: Peter Zijlstra (Intel) Acked-by: Johannes Weiner Link: https://lore.kernel.org/r/20220825164111.29534-5-zhouchengming@bytedance.com --- include/linux/psi.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/psi.h b/include/linux/psi.h index dd74411ac21d..fffd229fbf19 100644 --- a/include/linux/psi.h +++ b/include/linux/psi.h @@ -18,10 +18,6 @@ extern struct psi_group psi_system; void psi_init(void); -void psi_task_change(struct task_struct *task, int clear, int set); -void psi_task_switch(struct task_struct *prev, struct task_struct *next, - bool sleep); - void psi_memstall_enter(unsigned long *flags); void psi_memstall_leave(unsigned long *flags); -- cgit From 71dbdde7914d32e86f01ac1f6e54e964c9dfdbd9 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Fri, 26 Aug 2022 00:41:07 +0800 Subject: sched/psi: Remove NR_ONCPU task accounting We put all fields updated by the scheduler in the first cacheline of struct psi_group_cpu for performance. Since we want add another PSI_IRQ_FULL to track IRQ/SOFTIRQ pressure, we need to reclaim space first. This patch remove NR_ONCPU task accounting in struct psi_group_cpu, use one bit in state_mask to track instead. Signed-off-by: Johannes Weiner Signed-off-by: Chengming Zhou Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Chengming Zhou Tested-by: Chengming Zhou Link: https://lore.kernel.org/r/20220825164111.29534-7-zhouchengming@bytedance.com --- include/linux/psi_types.h | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h index c7fe7c089718..54cb74946db4 100644 --- a/include/linux/psi_types.h +++ b/include/linux/psi_types.h @@ -15,13 +15,6 @@ enum psi_task_count { NR_IOWAIT, NR_MEMSTALL, NR_RUNNING, - /* - * This can't have values other than 0 or 1 and could be - * implemented as a bit flag. But for now we still have room - * in the first cacheline of psi_group_cpu, and this way we - * don't have to special case any state tracking for it. - */ - NR_ONCPU, /* * For IO and CPU stalls the presence of running/oncpu tasks * in the domain means a partial rather than a full stall. @@ -32,16 +25,18 @@ enum psi_task_count { * threads and memstall ones. */ NR_MEMSTALL_RUNNING, - NR_PSI_TASK_COUNTS = 5, + NR_PSI_TASK_COUNTS = 4, }; /* Task state bitmasks */ #define TSK_IOWAIT (1 << NR_IOWAIT) #define TSK_MEMSTALL (1 << NR_MEMSTALL) #define TSK_RUNNING (1 << NR_RUNNING) -#define TSK_ONCPU (1 << NR_ONCPU) #define TSK_MEMSTALL_RUNNING (1 << NR_MEMSTALL_RUNNING) +/* Only one task can be scheduled, no corresponding task count */ +#define TSK_ONCPU (1 << NR_PSI_TASK_COUNTS) + /* Resources that workloads could be stalled on */ enum psi_res { PSI_IO, @@ -68,6 +63,9 @@ enum psi_states { NR_PSI_STATES = 7, }; +/* Use one bit in the state mask to track TSK_ONCPU */ +#define PSI_ONCPU (1 << NR_PSI_STATES) + enum psi_aggregators { PSI_AVGS = 0, PSI_POLL, -- cgit From 52b1364ba0b105122d6de0e719b36db705011ac1 Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Fri, 26 Aug 2022 00:41:08 +0800 Subject: sched/psi: Add PSI_IRQ to track IRQ/SOFTIRQ pressure Now PSI already tracked workload pressure stall information for CPU, memory and IO. Apart from these, IRQ/SOFTIRQ could have obvious impact on some workload productivity, such as web service workload. When CONFIG_IRQ_TIME_ACCOUNTING, we can get IRQ/SOFTIRQ delta time from update_rq_clock_task(), in which we can record that delta to CPU curr task's cgroups as PSI_IRQ_FULL status. Note we don't use PSI_IRQ_SOME since IRQ/SOFTIRQ always happen in the current task on the CPU, make nothing productive could run even if it were runnable, so we only use PSI_IRQ_FULL. Signed-off-by: Chengming Zhou Signed-off-by: Peter Zijlstra (Intel) Acked-by: Johannes Weiner Link: https://lore.kernel.org/r/20220825164111.29534-8-zhouchengming@bytedance.com --- include/linux/psi_types.h | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h index 54cb74946db4..40c28171cd91 100644 --- a/include/linux/psi_types.h +++ b/include/linux/psi_types.h @@ -42,7 +42,10 @@ enum psi_res { PSI_IO, PSI_MEM, PSI_CPU, - NR_PSI_RESOURCES = 3, +#ifdef CONFIG_IRQ_TIME_ACCOUNTING + PSI_IRQ, +#endif + NR_PSI_RESOURCES, }; /* @@ -58,9 +61,12 @@ enum psi_states { PSI_MEM_FULL, PSI_CPU_SOME, PSI_CPU_FULL, +#ifdef CONFIG_IRQ_TIME_ACCOUNTING + PSI_IRQ_FULL, +#endif /* Only per-CPU, to weigh the CPU in the global average: */ PSI_NONIDLE, - NR_PSI_STATES = 7, + NR_PSI_STATES, }; /* Use one bit in the state mask to track TSK_ONCPU */ -- cgit From 57899a6610e67ba26fa3251ebbef4a5ed21efc5d Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Fri, 26 Aug 2022 00:41:09 +0800 Subject: sched/psi: Consolidate cgroup_psi() cgroup_psi() can't return psi_group for root cgroup, so we have many open code "psi = cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi". This patch move cgroup_psi() definition to , in which we can return psi_system for root cgroup, so can handle all cgroups. Signed-off-by: Chengming Zhou Signed-off-by: Peter Zijlstra (Intel) Acked-by: Johannes Weiner Link: https://lore.kernel.org/r/20220825164111.29534-9-zhouchengming@bytedance.com --- include/linux/cgroup.h | 5 ----- include/linux/psi.h | 6 ++++++ 2 files changed, 6 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index b0914aa26506..80cb970257be 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -673,11 +673,6 @@ static inline void pr_cont_cgroup_path(struct cgroup *cgrp) pr_cont_kernfs_path(cgrp->kn); } -static inline struct psi_group *cgroup_psi(struct cgroup *cgrp) -{ - return cgrp->psi; -} - bool cgroup_psi_enabled(void); static inline void cgroup_init_kthreadd(void) diff --git a/include/linux/psi.h b/include/linux/psi.h index fffd229fbf19..362a74ca1d3b 100644 --- a/include/linux/psi.h +++ b/include/linux/psi.h @@ -7,6 +7,7 @@ #include #include #include +#include struct seq_file; struct css_set; @@ -30,6 +31,11 @@ __poll_t psi_trigger_poll(void **trigger_ptr, struct file *file, poll_table *wait); #ifdef CONFIG_CGROUPS +static inline struct psi_group *cgroup_psi(struct cgroup *cgrp) +{ + return cgroup_ino(cgrp) == 1 ? &psi_system : cgrp->psi; +} + int psi_cgroup_alloc(struct cgroup *cgrp); void psi_cgroup_free(struct cgroup *cgrp); void cgroup_move_task(struct task_struct *p, struct css_set *to); -- cgit From dc86aba751e2867244411adda1562f6664747019 Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Fri, 26 Aug 2022 00:41:10 +0800 Subject: sched/psi: Cache parent psi_group to speed up group iteration We use iterate_groups() to iterate each level psi_group to update PSI stats, which is a very hot path. In current code, iterate_groups() have to use multiple branches and cgroup_parent() to get parent psi_group for each level, which is not very efficient. This patch cache parent psi_group in struct psi_group, only need to get psi_group of task itself first, then just use group->parent to iterate. Signed-off-by: Chengming Zhou Signed-off-by: Peter Zijlstra (Intel) Acked-by: Johannes Weiner Link: https://lore.kernel.org/r/20220825164111.29534-10-zhouchengming@bytedance.com --- include/linux/psi_types.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h index 40c28171cd91..a0b746258c68 100644 --- a/include/linux/psi_types.h +++ b/include/linux/psi_types.h @@ -151,6 +151,8 @@ struct psi_trigger { }; struct psi_group { + struct psi_group *parent; + /* Protects data used by the aggregator */ struct mutex avgs_lock; -- cgit From 34f26a15611afb03c33df6819359d36f5b382589 Mon Sep 17 00:00:00 2001 From: Chengming Zhou Date: Wed, 7 Sep 2022 17:03:32 +0800 Subject: sched/psi: Per-cgroup PSI accounting disable/re-enable interface PSI accounts stalls for each cgroup separately and aggregates it at each level of the hierarchy. This may cause non-negligible overhead for some workloads when under deep level of the hierarchy. commit 3958e2d0c34e ("cgroup: make per-cgroup pressure stall tracking configurable") make PSI to skip per-cgroup stall accounting, only account system-wide to avoid this each level overhead. But for our use case, we also want leaf cgroup PSI stats accounted for userspace adjustment on that cgroup, apart from only system-wide adjustment. So this patch introduce a per-cgroup PSI accounting disable/re-enable interface "cgroup.pressure", which is a read-write single value file that allowed values are "0" and "1", the defaults is "1" so per-cgroup PSI stats is enabled by default. Implementation details: It should be relatively straight-forward to disable and re-enable state aggregation, time tracking, averaging on a per-cgroup level, if we can live with losing history from while it was disabled. I.e. the avgs will restart from 0, total= will have gaps. But it's hard or complex to stop/restart groupc->tasks[] updates, which is not implemented in this patch. So we always update groupc->tasks[] and PSI_ONCPU bit in psi_group_change() even when the cgroup PSI stats is disabled. Suggested-by: Johannes Weiner Suggested-by: Tejun Heo Signed-off-by: Chengming Zhou Signed-off-by: Peter Zijlstra (Intel) Acked-by: Johannes Weiner Link: https://lkml.kernel.org/r/20220907090332.2078-1-zhouchengming@bytedance.com --- include/linux/cgroup-defs.h | 3 +++ include/linux/psi.h | 2 ++ include/linux/psi_types.h | 3 +++ 3 files changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cgroup-defs.h b/include/linux/cgroup-defs.h index 4bcf56b3491c..7df76b318245 100644 --- a/include/linux/cgroup-defs.h +++ b/include/linux/cgroup-defs.h @@ -428,6 +428,9 @@ struct cgroup { struct cgroup_file procs_file; /* handle for "cgroup.procs" */ struct cgroup_file events_file; /* handle for "cgroup.events" */ + /* handles for "{cpu,memory,io,irq}.pressure" */ + struct cgroup_file psi_files[NR_PSI_RESOURCES]; + /* * The bitmask of subsystems enabled on the child cgroups. * ->subtree_control is the one configured through diff --git a/include/linux/psi.h b/include/linux/psi.h index 362a74ca1d3b..b029a847def1 100644 --- a/include/linux/psi.h +++ b/include/linux/psi.h @@ -39,6 +39,7 @@ static inline struct psi_group *cgroup_psi(struct cgroup *cgrp) int psi_cgroup_alloc(struct cgroup *cgrp); void psi_cgroup_free(struct cgroup *cgrp); void cgroup_move_task(struct task_struct *p, struct css_set *to); +void psi_cgroup_restart(struct psi_group *group); #endif #else /* CONFIG_PSI */ @@ -60,6 +61,7 @@ static inline void cgroup_move_task(struct task_struct *p, struct css_set *to) { rcu_assign_pointer(p->cgroups, to); } +static inline void psi_cgroup_restart(struct psi_group *group) {} #endif #endif /* CONFIG_PSI */ diff --git a/include/linux/psi_types.h b/include/linux/psi_types.h index a0b746258c68..6e4372735068 100644 --- a/include/linux/psi_types.h +++ b/include/linux/psi_types.h @@ -152,6 +152,7 @@ struct psi_trigger { struct psi_group { struct psi_group *parent; + bool enabled; /* Protects data used by the aggregator */ struct mutex avgs_lock; @@ -194,6 +195,8 @@ struct psi_group { #else /* CONFIG_PSI */ +#define NR_PSI_RESOURCES 0 + struct psi_group { }; #endif /* CONFIG_PSI */ -- cgit From b9be19eef35587b15da0a5d887d6e04c0c6cefab Mon Sep 17 00:00:00 2001 From: Phil Auld Date: Tue, 6 Sep 2022 16:35:42 -0400 Subject: drivers/base: Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES As PAGE_SIZE is unsigned long, -1 > PAGE_SIZE when NR_CPUS <= 3. This leads to very large file sizes: topology$ ls -l total 0 -r--r--r-- 1 root root 18446744073709551615 Sep 5 11:59 core_cpus -r--r--r-- 1 root root 4096 Sep 5 11:59 core_cpus_list -r--r--r-- 1 root root 4096 Sep 5 10:58 core_id -r--r--r-- 1 root root 18446744073709551615 Sep 5 10:10 core_siblings -r--r--r-- 1 root root 4096 Sep 5 11:59 core_siblings_list -r--r--r-- 1 root root 18446744073709551615 Sep 5 11:59 die_cpus -r--r--r-- 1 root root 4096 Sep 5 11:59 die_cpus_list -r--r--r-- 1 root root 4096 Sep 5 11:59 die_id -r--r--r-- 1 root root 18446744073709551615 Sep 5 11:59 package_cpus -r--r--r-- 1 root root 4096 Sep 5 11:59 package_cpus_list -r--r--r-- 1 root root 4096 Sep 5 10:58 physical_package_id -r--r--r-- 1 root root 18446744073709551615 Sep 5 10:10 thread_siblings -r--r--r-- 1 root root 4096 Sep 5 11:59 thread_siblings_list Adjust the inequality to catch the case when NR_CPUS is configured to a small value. Cc: Greg Kroah-Hartman Cc: "Rafael J. Wysocki" Cc: Yury Norov Cc: stable@vger.kernel.org Cc: feng xiangjun Fixes: 7ee951acd31a ("drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist") Reported-by: feng xiangjun Signed-off-by: Phil Auld Signed-off-by: Yury Norov --- include/linux/cpumask.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index bd047864c7ac..e8ad12b5b9d2 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -1127,9 +1127,10 @@ cpumap_print_list_to_buf(char *buf, const struct cpumask *mask, * cover a worst-case of every other cpu being on one of two nodes for a * very large NR_CPUS. * - * Use PAGE_SIZE as a minimum for smaller configurations. + * Use PAGE_SIZE as a minimum for smaller configurations while avoiding + * unsigned comparison to -1. */ -#define CPUMAP_FILE_MAX_BYTES ((((NR_CPUS * 9)/32 - 1) > PAGE_SIZE) \ +#define CPUMAP_FILE_MAX_BYTES (((NR_CPUS * 9)/32 > PAGE_SIZE) \ ? (NR_CPUS * 9)/32 - 1 : PAGE_SIZE) #define CPULIST_FILE_MAX_BYTES (((NR_CPUS * 7)/2 > PAGE_SIZE) ? (NR_CPUS * 7)/2 : PAGE_SIZE) -- cgit From 811d59fdf56a17c66742578fe8be8a6a841af448 Mon Sep 17 00:00:00 2001 From: Mario Limonciello Date: Mon, 29 Aug 2022 11:29:49 -0500 Subject: ACPI: s2idle: Add a new ->check() callback for platform_s2idle_ops On some platforms it is found that Linux more aggressively enters s2idle than Windows enters Modern Standby and this uncovers some synchronization issues for the platform. To aid in debugging this class of problems in the future, add support for an extra optional callback intended for drivers to emit extra debugging. Signed-off-by: Mario Limonciello Acked-by: Rafael J. Wysocki Link: https://lore.kernel.org/r/20220829162953.5947-2-mario.limonciello@amd.com Signed-off-by: Hans de Goede --- include/linux/acpi.h | 1 + include/linux/suspend.h | 1 + 2 files changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 6f64b2f3dc54..acaa2ddc067d 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -1075,6 +1075,7 @@ acpi_status acpi_os_prepare_extended_sleep(u8 sleep_state, struct acpi_s2idle_dev_ops { struct list_head list_node; void (*prepare)(void); + void (*check)(void); void (*restore)(void); }; int acpi_register_lps0_dev(struct acpi_s2idle_dev_ops *arg); diff --git a/include/linux/suspend.h b/include/linux/suspend.h index 70f2921e2e70..03ed42ed2c7f 100644 --- a/include/linux/suspend.h +++ b/include/linux/suspend.h @@ -191,6 +191,7 @@ struct platform_s2idle_ops { int (*begin)(void); int (*prepare)(void); int (*prepare_late)(void); + void (*check)(void); bool (*wake)(void); void (*restore_early)(void); void (*restore)(void); -- cgit From 6bb057bfd9d509755349cd2a6ca5e5e6e6071304 Mon Sep 17 00:00:00 2001 From: Heikki Krogerus Date: Tue, 16 Aug 2022 13:16:26 +0300 Subject: ACPI: resource: Add helper function acpi_dev_get_memory_resources() Wrapper function that finds all memory type resources by using acpi_dev_get_resources(). It removes the need for the drivers to check the resource data type separately. Signed-off-by: Heikki Krogerus Signed-off-by: Rafael J. Wysocki --- include/linux/acpi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 6f64b2f3dc54..ed4aa395cc49 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -506,6 +506,7 @@ int acpi_dev_get_resources(struct acpi_device *adev, struct list_head *list, void *preproc_data); int acpi_dev_get_dma_resources(struct acpi_device *adev, struct list_head *list); +int acpi_dev_get_memory_resources(struct acpi_device *adev, struct list_head *list); int acpi_dev_filter_resource_type(struct acpi_resource *ares, unsigned long types); -- cgit From d4f7bdb2ed7bf320a8772258fdc257655d225afb Mon Sep 17 00:00:00 2001 From: Daniel Xu Date: Wed, 7 Sep 2022 10:40:37 -0600 Subject: bpf: Add stub for btf_struct_access() Add corresponding unimplemented stub for when CONFIG_BPF_SYSCALL=n Signed-off-by: Daniel Xu Acked-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/4021398e884433b1fef57a4d28361bb9fcf1bd05.1662568410.git.dxu@dxuuu.xyz Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 48ae05099f36..54178b9e9c3a 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2211,6 +2211,15 @@ static inline struct bpf_prog *bpf_prog_by_id(u32 id) return ERR_PTR(-ENOTSUPP); } +static inline int btf_struct_access(struct bpf_verifier_log *log, + const struct btf *btf, + const struct btf_type *t, int off, int size, + enum bpf_access_type atype, + u32 *next_btf_id, enum bpf_type_flag *flag) +{ + return -EACCES; +} + static inline const struct bpf_func_proto * bpf_base_func_proto(enum bpf_func_id func_id) { -- cgit From 1bfe26fb082724be453e4d7fd9bb358e3ba669b2 Mon Sep 17 00:00:00 2001 From: Dave Marchevsky Date: Thu, 8 Sep 2022 16:07:16 -0700 Subject: bpf: Add verifier support for custom callback return range Verifier logic to confirm that a callback function returns 0 or 1 was added in commit 69c087ba6225b ("bpf: Add bpf_for_each_map_elem() helper"). At the time, callback return value was only used to continue or stop iteration. In order to support callbacks with a broader return value range, such as those added in rbtree series[0] and others, add a callback_ret_range to bpf_func_state. Verifier's helpers which set in_callback_fn will also set the new field, which the verifier will later use to check return value bounds. Default to tnum_range(0, 0) instead of using tnum_unknown as a sentinel value as the latter would prevent the valid range (0, U64_MAX) being used. Previous global default tnum_range(0, 1) is explicitly set for extant callback helpers. The change to global default was made after discussion around this patch in rbtree series [1], goal here is to make it more obvious that callback_ret_range should be explicitly set. [0]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com/ [1]: lore.kernel.org/bpf/20220830172759.4069786-2-davemarchevsky@fb.com/ Signed-off-by: Dave Marchevsky Reviewed-by: Stanislav Fomichev Link: https://lore.kernel.org/r/20220908230716.2751723-1-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf_verifier.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index b49a349cc6ae..e197f8fb27e2 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -248,6 +248,7 @@ struct bpf_func_state { */ u32 async_entry_cnt; bool in_callback_fn; + struct tnum callback_ret_range; bool in_async_callback_fn; /* The following fields should be last. See copy_func_state() */ -- cgit From 9cd4f1434479f1ac25c440c421fbf52069079914 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Sun, 11 Sep 2022 11:18:45 +0800 Subject: iommu/vt-d: Fix possible recursive locking in intel_iommu_init() The global rwsem dmar_global_lock was introduced by commit 3a5670e8ac932 ("iommu/vt-d: Introduce a rwsem to protect global data structures"). It is used to protect DMAR related global data from DMAR hotplug operations. The dmar_global_lock used in the intel_iommu_init() might cause recursive locking issue, for example, intel_iommu_get_resv_regions() is taking the dmar_global_lock from within a section where intel_iommu_init() already holds it via probe_acpi_namespace_devices(). Using dmar_global_lock in intel_iommu_init() could be relaxed since it is unlikely that any IO board must be hot added before the IOMMU subsystem is initialized. This eliminates the possible recursive locking issue by moving down DMAR hotplug support after the IOMMU is initialized and removing the uses of dmar_global_lock in intel_iommu_init(). Fixes: d5692d4af08cd ("iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()") Reported-by: Robin Murphy Signed-off-by: Lu Baolu Reviewed-by: Kevin Tian Link: https://lore.kernel.org/r/894db0ccae854b35c73814485569b634237b5538.1657034828.git.robin.murphy@arm.com Link: https://lore.kernel.org/r/20220718235325.3952426-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/dmar.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dmar.h b/include/linux/dmar.h index d81a51978d01..8917a32173c4 100644 --- a/include/linux/dmar.h +++ b/include/linux/dmar.h @@ -65,6 +65,7 @@ struct dmar_pci_notify_info { extern struct rw_semaphore dmar_global_lock; extern struct list_head dmar_drhd_units; +extern int intel_iommu_enabled; #define for_each_drhd_unit(drhd) \ list_for_each_entry_rcu(drhd, &dmar_drhd_units, list, \ @@ -88,7 +89,8 @@ extern struct list_head dmar_drhd_units; static inline bool dmar_rcu_check(void) { return rwsem_is_locked(&dmar_global_lock) || - system_state == SYSTEM_BOOTING; + system_state == SYSTEM_BOOTING || + (IS_ENABLED(CONFIG_INTEL_IOMMU) && !intel_iommu_enabled); } #define dmar_rcu_dereference(p) rcu_dereference_check((p), dmar_rcu_check()) -- cgit From a1c7c1a40478db4edef8ad0a0c6975c8921088b7 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Tue, 19 Jul 2022 13:41:31 +0200 Subject: power: supply: Explain maintenance charging In order for everyone to understand clearly why we want to use maintenance charging for batteries, expand the description with two diagrams and some text. Signed-off-by: Linus Walleij Reviewed-by: Matti Vaittinen Signed-off-by: Sebastian Reichel --- include/linux/power_supply.h | 48 ++++++++++++++++++++++++++++++++++++++------ 1 file changed, 42 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/power_supply.h b/include/linux/power_supply.h index cb380c1d9459..aa2c4a7c4826 100644 --- a/include/linux/power_supply.h +++ b/include/linux/power_supply.h @@ -374,9 +374,37 @@ struct power_supply_vbat_ri_table { * These timers should be chosen to align with the typical discharge curve * for the battery. * - * When the main CC/CV charging is complete the battery can optionally be - * maintenance charged at the voltages from this table: a table of settings is - * traversed using a slightly lower current and voltage than what is used for + * Ordinary CC/CV charging will stop charging when the charge current goes + * below charge_term_current_ua, and then restart it (if the device is still + * plugged into the charger) at charge_restart_voltage_uv. This happens in most + * consumer products because the power usage while connected to a charger is + * not zero, and devices are not manufactured to draw power directly from the + * charger: instead they will at all times dissipate the battery a little, like + * the power used in standby mode. This will over time give a charge graph + * such as this: + * + * Energy + * ^ ... ... ... ... ... ... ... + * | . . . . . . . . . . . . . + * | .. . .. . .. . .. . .. . .. . .. + * |. .. .. .. .. .. .. + * +-------------------------------------------------------------------> t + * + * Practically this means that the Li-ions are wandering back and forth in the + * battery and this causes degeneration of the battery anode and cathode. + * To prolong the life of the battery, maintenance charging is applied after + * reaching charge_term_current_ua to hold up the charge in the battery while + * consuming power, thus lowering the wear on the battery: + * + * Energy + * ^ ....................................... + * | . ...................... + * | .. + * |. + * +-------------------------------------------------------------------> t + * + * Maintenance charging uses the voltages from this table: a table of settings + * is traversed using a slightly lower current and voltage than what is used for * CC/CV charging. The maintenance charging will for safety reasons not go on * indefinately: we lower the current and voltage with successive maintenance * settings, then disable charging completely after we reach the last one, @@ -385,14 +413,22 @@ struct power_supply_vbat_ri_table { * ordinary CC/CV charging from there. * * As an example, a Samsung EB425161LA Lithium-Ion battery is CC/CV charged - * at 900mA to 4340mV, then maintenance charged at 600mA and 4150mV for - * 60 hours, then maintenance charged at 600mA and 4100mV for 200 hours. + * at 900mA to 4340mV, then maintenance charged at 600mA and 4150mV for up to + * 60 hours, then maintenance charged at 600mA and 4100mV for up to 200 hours. * After this the charge cycle is restarted waiting for * charge_restart_voltage_uv. * * For most mobile electronics this type of maintenance charging is enough for * the user to disconnect the device and make use of it before both maintenance - * charging cycles are complete. + * charging cycles are complete, if the current and voltage has been chosen + * appropriately. These need to be determined from battery discharge curves + * and expected standby current. + * + * If the voltage anyway drops to charge_restart_voltage_uv during maintenance + * charging, ordinary CC/CV charging is restarted. This can happen if the + * device is e.g. actively used during charging, so more current is drawn than + * the expected stand-by current. Also overvoltage protection will be applied + * as usual. */ struct power_supply_maintenance_charge_table { int charge_current_max_ua; -- cgit From 65d3440e8d2e9e024ef2357f8ff021611b068c99 Mon Sep 17 00:00:00 2001 From: Dan Williams Date: Fri, 26 Aug 2022 10:18:07 -0700 Subject: mm/memory-failure: fix detection of memory_failure() handlers Some pagemap types, like MEMORY_DEVICE_GENERIC (device-dax) do not even have pagemap ops which results in crash signatures like this: BUG: kernel NULL pointer dereference, address: 0000000000000010 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 8000000205073067 P4D 8000000205073067 PUD 2062b3067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 22 PID: 4535 Comm: device-dax Tainted: G OE N 6.0.0-rc2+ #59 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:memory_failure+0x667/0xba0 [..] Call Trace: ? _printk+0x58/0x73 do_madvise.part.0.cold+0xaf/0xc5 Check for ops before checking if the ops have a memory_failure() handler. Link: https://lkml.kernel.org/r/166153428781.2758201.1990616683438224741.stgit@dwillia2-xfh.jf.intel.com Fixes: 33a8f7f2b3a3 ("pagemap,pmem: introduce ->memory_failure()") Signed-off-by: Dan Williams Acked-by: Naoya Horiguchi Reviewed-by: Miaohe Lin Reviewed-by: Christoph Hellwig Cc: Shiyang Ruan Cc: Darrick J. Wong Cc: Al Viro Cc: Dave Chinner Cc: Goldwyn Rodrigues Cc: Jane Chu Cc: Matthew Wilcox Cc: Ritesh Harjani Signed-off-by: Andrew Morton --- include/linux/memremap.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/memremap.h b/include/linux/memremap.h index 19010491a603..c3b4cc84877b 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -139,6 +139,11 @@ struct dev_pagemap { }; }; +static inline bool pgmap_has_memory_failure(struct dev_pagemap *pgmap) +{ + return pgmap->ops && pgmap->ops->memory_failure; +} + static inline struct vmem_altmap *pgmap_altmap(struct dev_pagemap *pgmap) { if (pgmap->flags & PGMAP_ALTMAP_VALID) -- cgit From 825cf206ed510c4a1758bef8957e2b039253e2e3 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Fri, 26 Aug 2022 23:58:44 -0700 Subject: statx: add direct I/O alignment information Traditionally, the conditions for when DIO (direct I/O) is supported were fairly simple. For both block devices and regular files, DIO had to be aligned to the logical block size of the block device. However, due to filesystem features that have been added over time (e.g. multi-device support, data journalling, inline data, encryption, verity, compression, checkpoint disabling, log-structured mode), the conditions for when DIO is allowed on a regular file have gotten increasingly complex. Whether a particular regular file supports DIO, and with what alignment, can depend on various file attributes and filesystem mount options, as well as which block device(s) the file's data is located on. Moreover, the general rule of DIO needing to be aligned to the block device's logical block size was recently relaxed to allow user buffers (but not file offsets) aligned to the DMA alignment instead. See commit bf8d08532bc1 ("iomap: add support for dma aligned direct-io"). XFS has an ioctl XFS_IOC_DIOINFO that exposes DIO alignment information. Uplifting this to the VFS is one possibility. However, as discussed (https://lore.kernel.org/linux-fsdevel/20220120071215.123274-1-ebiggers@kernel.org/T/#u), this ioctl is rarely used and not known to be used outside of XFS-specific code. It was also never intended to indicate when a file doesn't support DIO at all, nor was it intended for block devices. Therefore, let's expose this information via statx(). Add the STATX_DIOALIGN flag and two new statx fields associated with it: * stx_dio_mem_align: the alignment (in bytes) required for user memory buffers for DIO, or 0 if DIO is not supported on the file. * stx_dio_offset_align: the alignment (in bytes) required for file offsets and I/O segment lengths for DIO, or 0 if DIO is not supported on the file. This will only be nonzero if stx_dio_mem_align is nonzero, and vice versa. Note that as with other statx() extensions, if STATX_DIOALIGN isn't set in the returned statx struct, then these new fields won't be filled in. This will happen if the file is neither a regular file nor a block device, or if the file is a regular file and the filesystem doesn't support STATX_DIOALIGN. It might also happen if the caller didn't include STATX_DIOALIGN in the request mask, since statx() isn't required to return unrequested information. This commit only adds the VFS-level plumbing for STATX_DIOALIGN. For regular files, individual filesystems will still need to add code to support it. For block devices, a separate commit will wire it up too. Reviewed-by: Christoph Hellwig Reviewed-by: Darrick J. Wong Reviewed-by: Martin K. Petersen Reviewed-by: Christian Brauner (Microsoft) Signed-off-by: Eric Biggers Link: https://lore.kernel.org/r/20220827065851.135710-2-ebiggers@kernel.org --- include/linux/stat.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/stat.h b/include/linux/stat.h index 7df06931f25d..ff277ced50e9 100644 --- a/include/linux/stat.h +++ b/include/linux/stat.h @@ -50,6 +50,8 @@ struct kstat { struct timespec64 btime; /* File creation time */ u64 blocks; u64 mnt_id; + u32 dio_mem_align; + u32 dio_offset_align; }; #endif -- cgit From 2d985f8c6b91b5007a16e640bb9c038c5fb2839b Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Fri, 26 Aug 2022 23:58:45 -0700 Subject: vfs: support STATX_DIOALIGN on block devices Add support for STATX_DIOALIGN to block devices, so that direct I/O alignment restrictions are exposed to userspace in a generic way. Note that this breaks the tradition of stat operating only on the block device node, not the block device itself. However, it was felt that doing this is preferable, in order to make the interface useful and avoid needing separate interfaces for regular files and block devices. Signed-off-by: Eric Biggers Reviewed-by: Christoph Hellwig Reviewed-by: Christian Brauner (Microsoft) Link: https://lore.kernel.org/r/20220827065851.135710-3-ebiggers@kernel.org --- include/linux/blkdev.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 84b13fdd34a7..8038c5fbde40 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1498,6 +1498,7 @@ int sync_blockdev(struct block_device *bdev); int sync_blockdev_range(struct block_device *bdev, loff_t lstart, loff_t lend); int sync_blockdev_nowait(struct block_device *bdev); void sync_bdevs(bool wait); +void bdev_statx_dioalign(struct inode *inode, struct kstat *stat); void printk_all_partitions(void); #else static inline void invalidate_bdev(struct block_device *bdev) @@ -1514,6 +1515,9 @@ static inline int sync_blockdev_nowait(struct block_device *bdev) static inline void sync_bdevs(bool wait) { } +static inline void bdev_statx_dioalign(struct inode *inode, struct kstat *stat) +{ +} static inline void printk_all_partitions(void) { } -- cgit From 53dd3f802a6e269868cb599609287a841e65a996 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Fri, 26 Aug 2022 23:58:46 -0700 Subject: fscrypt: change fscrypt_dio_supported() to prepare for STATX_DIOALIGN To prepare for STATX_DIOALIGN support, make two changes to fscrypt_dio_supported(). First, remove the filesystem-block-alignment check and make the filesystems handle it instead. It previously made sense to have it in fs/crypto/; however, to support STATX_DIOALIGN the alignment restriction would have to be returned to filesystems. It ends up being simpler if filesystems handle this part themselves, especially for f2fs which only allows fs-block-aligned DIO in the first place. Second, make fscrypt_dio_supported() work on inodes whose encryption key hasn't been set up yet, by making it set up the key if needed. This is required for statx(), since statx() doesn't require a file descriptor. Reviewed-by: Christoph Hellwig Signed-off-by: Eric Biggers Link: https://lore.kernel.org/r/20220827065851.135710-4-ebiggers@kernel.org --- include/linux/fscrypt.h | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h index 7d2f1e0f23b1..13598859d5b3 100644 --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -768,7 +768,7 @@ bool fscrypt_mergeable_bio(struct bio *bio, const struct inode *inode, bool fscrypt_mergeable_bio_bh(struct bio *bio, const struct buffer_head *next_bh); -bool fscrypt_dio_supported(struct kiocb *iocb, struct iov_iter *iter); +bool fscrypt_dio_supported(struct inode *inode); u64 fscrypt_limit_io_blocks(const struct inode *inode, u64 lblk, u64 nr_blocks); @@ -801,11 +801,8 @@ static inline bool fscrypt_mergeable_bio_bh(struct bio *bio, return true; } -static inline bool fscrypt_dio_supported(struct kiocb *iocb, - struct iov_iter *iter) +static inline bool fscrypt_dio_supported(struct inode *inode) { - const struct inode *inode = file_inode(iocb->ki_filp); - return !fscrypt_needs_contents_encryption(inode); } -- cgit From a7f4e6e4c47c41869fe5bea17e013b5557c57ed3 Mon Sep 17 00:00:00 2001 From: Zach O'Keefe Date: Wed, 6 Jul 2022 16:59:25 -0700 Subject: mm/thp: add flag to enforce sysfs THP in hugepage_vma_check() MADV_COLLAPSE is not coupled to the kernel-oriented sysfs THP settings[1]. hugepage_vma_check() is the authority on determining if a VMA is eligible for THP allocation/collapse, and currently enforces the sysfs THP settings. Add a flag to disable these checks. For now, only apply this arg to anon and file, which use /sys/kernel/transparent_hugepage/enabled. We can expand this to shmem, which uses /sys/kernel/transparent_hugepage/shmem_enabled, later. Use this flag in collapse_pte_mapped_thp() where previously the VMA flags passed to hugepage_vma_check() were OR'd with VM_HUGEPAGE to elide the VM_HUGEPAGE check in "madvise" THP mode. Prior to "mm: khugepaged: check THP flag in hugepage_vma_check()", this check also didn't check "never" THP mode. As such, this restores the previous behavior of collapse_pte_mapped_thp() where sysfs THP settings are ignored. See comment in code for justification why this is OK. [1] https://lore.kernel.org/linux-mm/CAAa6QmQxay1_=Pmt8oCX2-Va18t44FV-Vs-WsQt_6+qBks4nZA@mail.gmail.com/ Link: https://lkml.kernel.org/r/20220706235936.2197195-8-zokeefe@google.com Signed-off-by: Zach O'Keefe Reviewed-by: Yang Shi Cc: Alex Shi Cc: Andrea Arcangeli Cc: Arnd Bergmann Cc: Axel Rasmussen Cc: Chris Kennelly Cc: Chris Zankel Cc: David Hildenbrand Cc: David Rientjes Cc: Helge Deller Cc: Hugh Dickins Cc: Ivan Kokshaysky Cc: James Bottomley Cc: Jens Axboe Cc: "Kirill A. Shutemov" Cc: Matthew Wilcox Cc: Matt Turner Cc: Max Filippov Cc: Miaohe Lin Cc: Michal Hocko Cc: Minchan Kim Cc: Pasha Tatashin Cc: Pavel Begunkov Cc: Peter Xu Cc: Rongwei Wang Cc: SeongJae Park Cc: Song Liu Cc: Thomas Bogendoerfer Cc: Vlastimil Babka Cc: Zi Yan Cc: Dan Carpenter Cc: "Souptick Joarder (HPE)" Signed-off-by: Andrew Morton --- include/linux/huge_mm.h | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 768e5261fdae..b0e80cc72b0c 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -168,9 +168,8 @@ static inline bool file_thp_enabled(struct vm_area_struct *vma) !inode_is_open_for_write(inode) && S_ISREG(inode->i_mode); } -bool hugepage_vma_check(struct vm_area_struct *vma, - unsigned long vm_flags, - bool smaps, bool in_pf); +bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags, + bool smaps, bool in_pf, bool enforce_sysfs); #define transparent_hugepage_use_zero_page() \ (transparent_hugepage_flags & \ @@ -321,8 +320,8 @@ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, } static inline bool hugepage_vma_check(struct vm_area_struct *vma, - unsigned long vm_flags, - bool smaps, bool in_pf) + unsigned long vm_flags, bool smaps, + bool in_pf, bool enforce_sysfs) { return false; } -- cgit From 7d8faaf155454f8798ec56404faca29a82689c77 Mon Sep 17 00:00:00 2001 From: Zach O'Keefe Date: Wed, 6 Jul 2022 16:59:27 -0700 Subject: mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse This idea was introduced by David Rientjes[1]. Introduce a new madvise mode, MADV_COLLAPSE, that allows users to request a synchronous collapse of memory at their own expense. The benefits of this approach are: * CPU is charged to the process that wants to spend the cycles for the THP * Avoid unpredictable timing of khugepaged collapse Semantics This call is independent of the system-wide THP sysfs settings, but will fail for memory marked VM_NOHUGEPAGE. If the ranges provided span multiple VMAs, the semantics of the collapse over each VMA is independent from the others. This implies a hugepage cannot cross a VMA boundary. If collapse of a given hugepage-aligned/sized region fails, the operation may continue to attempt collapsing the remainder of memory specified. The memory ranges provided must be page-aligned, but are not required to be hugepage-aligned. If the memory ranges are not hugepage-aligned, the start/end of the range will be clamped to the first/last hugepage-aligned address covered by said range. The memory ranges must span at least one hugepage-sized region. All non-resident pages covered by the range will first be swapped/faulted-in, before being internally copied onto a freshly allocated hugepage. Unmapped pages will have their data directly initialized to 0 in the new hugepage. However, for every eligible hugepage aligned/sized region to-be collapsed, at least one page must currently be backed by memory (a PMD covering the address range must already exist). Allocation for the new hugepage may enter direct reclaim and/or compaction, regardless of VMA flags. When the system has multiple NUMA nodes, the hugepage will be allocated from the node providing the most native pages. This operation operates on the current state of the specified process and makes no persistent changes or guarantees on how pages will be mapped, constructed, or faulted in the future Return Value If all hugepage-sized/aligned regions covered by the provided range were either successfully collapsed, or were already PMD-mapped THPs, this operation will be deemed successful. On success, process_madvise(2) returns the number of bytes advised, and madvise(2) returns 0. Else, -1 is returned and errno is set to indicate the error for the most-recently attempted hugepage collapse. Note that many failures might have occurred, since the operation may continue to collapse in the event a single hugepage-sized/aligned region fails. ENOMEM Memory allocation failed or VMA not found EBUSY Memcg charging failed EAGAIN Required resource temporarily unavailable. Try again might succeed. EINVAL Other error: No PMD found, subpage doesn't have Present bit set, "Special" page no backed by struct page, VMA incorrectly sized, address not page-aligned, ... Most notable here is ENOMEM and EBUSY (new to madvise) which are intended to provide the caller with actionable feedback so they may take an appropriate fallback measure. Use Cases An immediate user of this new functionality are malloc() implementations that manage memory in hugepage-sized chunks, but sometimes subrelease memory back to the system in native-sized chunks via MADV_DONTNEED; zapping the pmd. Later, when the memory is hot, the implementation could madvise(MADV_COLLAPSE) to re-back the memory by THPs to regain hugepage coverage and dTLB performance. TCMalloc is such an implementation that could benefit from this[2]. Only privately-mapped anon memory is supported for now, but additional support for file, shmem, and HugeTLB high-granularity mappings[2] is expected. File and tmpfs/shmem support would permit: * Backing executable text by THPs. Current support provided by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large system which might impair services from serving at their full rated load after (re)starting. Tricks like mremap(2)'ing text onto anonymous memory to immediately realize iTLB performance prevents page sharing and demand paging, both of which increase steady state memory footprint. With MADV_COLLAPSE, we get the best of both worlds: Peak upfront performance and lower RAM footprints. * Backing guest memory by hugapages after the memory contents have been migrated in native-page-sized chunks to a new host, in a userfaultfd-based live-migration stack. [1] https://lore.kernel.org/linux-mm/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/ [2] https://github.com/google/tcmalloc/tree/master/tcmalloc [jrdr.linux@gmail.com: avoid possible memory leak in failure path] Link: https://lkml.kernel.org/r/20220713024109.62810-1-jrdr.linux@gmail.com [zokeefe@google.com add missing kfree() to madvise_collapse()] Link: https://lore.kernel.org/linux-mm/20220713024109.62810-1-jrdr.linux@gmail.com/ Link: https://lkml.kernel.org/r/20220713161851.1879439-1-zokeefe@google.com [zokeefe@google.com: delay computation of hpage boundaries until use]] Link: https://lkml.kernel.org/r/20220720140603.1958773-4-zokeefe@google.com Link: https://lkml.kernel.org/r/20220706235936.2197195-10-zokeefe@google.com Signed-off-by: Zach O'Keefe Signed-off-by: "Souptick Joarder (HPE)" Suggested-by: David Rientjes Cc: Alex Shi Cc: Andrea Arcangeli Cc: Arnd Bergmann Cc: Axel Rasmussen Cc: Chris Kennelly Cc: Chris Zankel Cc: David Hildenbrand Cc: Helge Deller Cc: Hugh Dickins Cc: Ivan Kokshaysky Cc: James Bottomley Cc: Jens Axboe Cc: "Kirill A. Shutemov" Cc: Matthew Wilcox Cc: Matt Turner Cc: Max Filippov Cc: Miaohe Lin Cc: Michal Hocko Cc: Minchan Kim Cc: Pasha Tatashin Cc: Pavel Begunkov Cc: Peter Xu Cc: Rongwei Wang Cc: SeongJae Park Cc: Song Liu Cc: Thomas Bogendoerfer Cc: Vlastimil Babka Cc: Yang Shi Cc: Zi Yan Cc: Dan Carpenter Signed-off-by: Andrew Morton --- include/linux/huge_mm.h | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index b0e80cc72b0c..38265f9f782e 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -218,6 +218,9 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice); +int madvise_collapse(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end); void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start, unsigned long end, long adjust_next); spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma); @@ -361,9 +364,16 @@ static inline void split_huge_pmd_address(struct vm_area_struct *vma, static inline int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice) { - BUG(); - return 0; + return -EINVAL; } + +static inline int madvise_collapse(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end) +{ + return -EINVAL; +} + static inline void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start, unsigned long end, -- cgit From e9c2dbc8bf71a5039604a1dc45b10f24a2098f3b Mon Sep 17 00:00:00 2001 From: Yang Yang Date: Mon, 8 Aug 2022 00:56:45 +0000 Subject: mm/vmscan: define macros for refaults in struct lruvec The magic number 0 and 1 are used in several places in vmscan.c. Define macros for them to improve code readability. Link: https://lkml.kernel.org/r/20220808005644.1721066-1-yang.yang29@zte.com.cn Signed-off-by: Yang Yang Cc: Johannes Weiner Signed-off-by: Andrew Morton --- include/linux/mmzone.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index e24b40c52468..8f571dc7c524 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -306,6 +306,8 @@ static inline bool is_active_lru(enum lru_list lru) return (lru == LRU_ACTIVE_ANON || lru == LRU_ACTIVE_FILE); } +#define WORKINGSET_ANON 0 +#define WORKINGSET_FILE 1 #define ANON_AND_FILE 2 enum lruvec_flags { -- cgit From d2226ebd5484afcf9f9b71b394ec1567a7730eb1 Mon Sep 17 00:00:00 2001 From: Feng Tang Date: Fri, 5 Aug 2022 08:59:03 +0800 Subject: mm/hugetlb: add dedicated func to get 'allowed' nodemask for current process Muchun Song found that after MPOL_PREFERRED_MANY policy was introduced in commit b27abaccf8e8 ("mm/mempolicy: add MPOL_PREFERRED_MANY for multiple preferred nodes"), the policy_nodemask_current()'s semantics for this new policy has been changed, which returns 'preferred' nodes instead of 'allowed' nodes. With the changed semantic of policy_nodemask_current, a task with MPOL_PREFERRED_MANY policy could fail to get its reservation even though it can fall back to other nodes (either defined by cpusets or all online nodes) for that reservation failing mmap calles unnecessarily early. The fix is to not consider MPOL_PREFERRED_MANY for reservations at all because they, unlike MPOL_MBIND, do not pose any actual hard constrain. Michal suggested the policy_nodemask_current() is only used by hugetlb, and could be moved to hugetlb code with more explicit name to enforce the 'allowed' semantics for which only MPOL_BIND policy matters. apply_policy_zone() is made extern to be called in hugetlb code and its return value is changed to bool. [1]. https://lore.kernel.org/lkml/20220801084207.39086-1-songmuchun@bytedance.com/t/ Link: https://lkml.kernel.org/r/20220805005903.95563-1-feng.tang@intel.com Fixes: b27abaccf8e8 ("mm/mempolicy: add MPOL_PREFERRED_MANY for multiple preferred nodes") Signed-off-by: Feng Tang Reported-by: Muchun Song Suggested-by: Michal Hocko Acked-by: Michal Hocko Reviewed-by: Muchun Song Cc: Mike Kravetz Cc: Dave Hansen Cc: Ben Widawsky Signed-off-by: Andrew Morton --- include/linux/mempolicy.h | 13 +------------ 1 file changed, 1 insertion(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 668389b4b53d..d232de7cdc56 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -151,13 +151,6 @@ extern bool mempolicy_in_oom_domain(struct task_struct *tsk, const nodemask_t *mask); extern nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy); -static inline nodemask_t *policy_nodemask_current(gfp_t gfp) -{ - struct mempolicy *mpol = get_task_policy(current); - - return policy_nodemask(gfp, mpol); -} - extern unsigned int mempolicy_slab_node(void); extern enum zone_type policy_zone; @@ -189,6 +182,7 @@ static inline bool mpol_is_preferred_many(struct mempolicy *pol) return (pol->mode == MPOL_PREFERRED_MANY); } +extern bool apply_policy_zone(struct mempolicy *policy, enum zone_type zone); #else @@ -294,11 +288,6 @@ static inline void mpol_put_task_policy(struct task_struct *task) { } -static inline nodemask_t *policy_nodemask_current(gfp_t gfp) -{ - return NULL; -} - static inline bool mpol_is_preferred_many(struct mempolicy *pol) { return false; -- cgit From b84e04f1baeebe6872b22a027cfc558621e842d4 Mon Sep 17 00:00:00 2001 From: Imran Khan Date: Mon, 15 Aug 2022 05:53:53 +1000 Subject: kfence: add sysfs interface to disable kfence for selected slabs. By default kfence allocation can happen for any slab object, whose size is up to PAGE_SIZE, as long as that allocation is the first allocation after expiration of kfence sample interval. But in certain debugging scenarios we may be interested in debugging corruptions involving some specific slub objects like dentry or ext4_* etc. In such cases limiting kfence for allocations involving only specific slub objects will increase the probablity of catching the issue since kfence pool will not be consumed by other slab objects. This patch introduces a sysfs interface '/sys/kernel/slab//skip_kfence' to disable kfence for specific slabs. Having the interface work in this way does not impact current/default behavior of kfence and allows us to use kfence for specific slabs (when needed) as well. The decision to skip/use kfence is taken depending on whether kmem_cache.flags has (newly introduced) SLAB_SKIP_KFENCE flag set or not. Link: https://lkml.kernel.org/r/20220814195353.2540848-1-imran.f.khan@oracle.com Signed-off-by: Imran Khan Reviewed-by: Vlastimil Babka Reviewed-by: Marco Elver Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Alexander Potapenko Cc: Dmitry Vyukov Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Roman Gushchin Signed-off-by: Andrew Morton --- include/linux/slab.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 0fefdf528e0d..352e3f082acc 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -119,6 +119,12 @@ */ #define SLAB_NO_USER_FLAGS ((slab_flags_t __force)0x10000000U) +#ifdef CONFIG_KFENCE +#define SLAB_SKIP_KFENCE ((slab_flags_t __force)0x20000000U) +#else +#define SLAB_SKIP_KFENCE 0 +#endif + /* The following flags affect the page allocator grouping pages by mobility */ /* Objects are reclaimable */ #define SLAB_RECLAIM_ACCOUNT ((slab_flags_t __force)0x00020000U) -- cgit From 736a8ccce99c633b2457996bb9bf2cefff0db43d Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Fri, 29 Jul 2022 16:01:04 +0800 Subject: hugetlb_cgroup: remove unneeded return value The return value of set_hugetlb_cgroup and set_hugetlb_cgroup_rsvd are always ignored. Remove them to clean up the code. Link: https://lkml.kernel.org/r/20220729080106.12752-4-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: Mike Kravetz Cc: Mina Almasry Signed-off-by: Andrew Morton --- include/linux/hugetlb_cgroup.h | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb_cgroup.h b/include/linux/hugetlb_cgroup.h index 379344828e78..630cd255d0cf 100644 --- a/include/linux/hugetlb_cgroup.h +++ b/include/linux/hugetlb_cgroup.h @@ -90,32 +90,31 @@ hugetlb_cgroup_from_page_rsvd(struct page *page) return __hugetlb_cgroup_from_page(page, true); } -static inline int __set_hugetlb_cgroup(struct page *page, +static inline void __set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg, bool rsvd) { VM_BUG_ON_PAGE(!PageHuge(page), page); if (compound_order(page) < HUGETLB_CGROUP_MIN_ORDER) - return -1; + return; if (rsvd) set_page_private(page + SUBPAGE_INDEX_CGROUP_RSVD, (unsigned long)h_cg); else set_page_private(page + SUBPAGE_INDEX_CGROUP, (unsigned long)h_cg); - return 0; } -static inline int set_hugetlb_cgroup(struct page *page, +static inline void set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) { - return __set_hugetlb_cgroup(page, h_cg, false); + __set_hugetlb_cgroup(page, h_cg, false); } -static inline int set_hugetlb_cgroup_rsvd(struct page *page, +static inline void set_hugetlb_cgroup_rsvd(struct page *page, struct hugetlb_cgroup *h_cg) { - return __set_hugetlb_cgroup(page, h_cg, true); + __set_hugetlb_cgroup(page, h_cg, true); } static inline bool hugetlb_cgroup_disabled(void) @@ -199,16 +198,14 @@ hugetlb_cgroup_from_page_rsvd(struct page *page) return NULL; } -static inline int set_hugetlb_cgroup(struct page *page, +static inline void set_hugetlb_cgroup(struct page *page, struct hugetlb_cgroup *h_cg) { - return 0; } -static inline int set_hugetlb_cgroup_rsvd(struct page *page, +static inline void set_hugetlb_cgroup_rsvd(struct page *page, struct hugetlb_cgroup *h_cg) { - return 0; } static inline bool hugetlb_cgroup_disabled(void) -- cgit From 4d86d4f7227c6f2acfbbbe0623d49865aa71b756 Mon Sep 17 00:00:00 2001 From: Peter Collingbourne Date: Tue, 26 Jul 2022 16:02:41 -0700 Subject: mm: add more BUILD_BUG_ONs to gfp_migratetype() gfp_migratetype() also expects GFP_RECLAIMABLE and GFP_MOVABLE|GFP_RECLAIMABLE to be shiftable into MIGRATE_* enum values, so add some more BUILD_BUG_ONs to reflect this assumption. Link: https://linux-review.googlesource.com/id/Iae64e2182f75c3aca776a486b71a72571d66d83e Link: https://lkml.kernel.org/r/20220726230241.3770532-1-pcc@google.com Signed-off-by: Peter Collingbourne Signed-off-by: Andrew Morton --- include/linux/gfp.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/gfp.h b/include/linux/gfp.h index f314be58fa77..ea6cb9399152 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -18,6 +18,9 @@ static inline int gfp_migratetype(const gfp_t gfp_flags) VM_WARN_ON((gfp_flags & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK); BUILD_BUG_ON((1UL << GFP_MOVABLE_SHIFT) != ___GFP_MOVABLE); BUILD_BUG_ON((___GFP_MOVABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_MOVABLE); + BUILD_BUG_ON((___GFP_RECLAIMABLE >> GFP_MOVABLE_SHIFT) != MIGRATE_RECLAIMABLE); + BUILD_BUG_ON(((___GFP_MOVABLE | ___GFP_RECLAIMABLE) >> + GFP_MOVABLE_SHIFT) != MIGRATE_HIGHATOMIC); if (unlikely(page_group_by_mobility_disabled)) return MIGRATE_UNMOVABLE; -- cgit From 33024536bafd9129f1d16ade0974671c648700ac Mon Sep 17 00:00:00 2001 From: Huang Ying Date: Wed, 13 Jul 2022 16:39:51 +0800 Subject: memory tiering: hot page selection with hint page fault latency Patch series "memory tiering: hot page selection", v4. To optimize page placement in a memory tiering system with NUMA balancing, the hot pages in the slow memory nodes need to be identified. Essentially, the original NUMA balancing implementation selects the mostly recently accessed (MRU) pages to promote. But this isn't a perfect algorithm to identify the hot pages. Because the pages with quite low access frequency may be accessed eventually given the NUMA balancing page table scanning period could be quite long (e.g. 60 seconds). So in this patchset, we implement a new hot page identification algorithm based on the latency between NUMA balancing page table scanning and hint page fault. Which is a kind of mostly frequently accessed (MFU) algorithm. In NUMA balancing memory tiering mode, if there are hot pages in slow memory node and cold pages in fast memory node, we need to promote/demote hot/cold pages between the fast and cold memory nodes. A choice is to promote/demote as fast as possible. But the CPU cycles and memory bandwidth consumed by the high promoting/demoting throughput will hurt the latency of some workload because of accessing inflating and slow memory bandwidth contention. A way to resolve this issue is to restrict the max promoting/demoting throughput. It will take longer to finish the promoting/demoting. But the workload latency will be better. This is implemented in this patchset as the page promotion rate limit mechanism. The promotion hot threshold is workload and system configuration dependent. So in this patchset, a method to adjust the hot threshold automatically is implemented. The basic idea is to control the number of the candidate promotion pages to match the promotion rate limit. We used the pmbench memory accessing benchmark tested the patchset on a 2-socket server system with DRAM and PMEM installed. The test results are as follows, pmbench score promote rate (accesses/s) MB/s ------------- ------------ base 146887704.1 725.6 hot selection 165695601.2 544.0 rate limit 162814569.8 165.2 auto adjustment 170495294.0 136.9 From the results above, With hot page selection patch [1/3], the pmbench score increases about 12.8%, and promote rate (overhead) decreases about 25.0%, compared with base kernel. With rate limit patch [2/3], pmbench score decreases about 1.7%, and promote rate decreases about 69.6%, compared with hot page selection patch. With threshold auto adjustment patch [3/3], pmbench score increases about 4.7%, and promote rate decrease about 17.1%, compared with rate limit patch. Baolin helped to test the patchset with MySQL on a machine which contains 1 DRAM node (30G) and 1 PMEM node (126G). sysbench /usr/share/sysbench/oltp_read_write.lua \ ...... --tables=200 \ --table-size=1000000 \ --report-interval=10 \ --threads=16 \ --time=120 The tps can be improved about 5%. This patch (of 3): To optimize page placement in a memory tiering system with NUMA balancing, the hot pages in the slow memory node need to be identified. Essentially, the original NUMA balancing implementation selects the mostly recently accessed (MRU) pages to promote. But this isn't a perfect algorithm to identify the hot pages. Because the pages with quite low access frequency may be accessed eventually given the NUMA balancing page table scanning period could be quite long (e.g. 60 seconds). The most frequently accessed (MFU) algorithm is better. So, in this patch we implemented a better hot page selection algorithm. Which is based on NUMA balancing page table scanning and hint page fault as follows, - When the page tables of the processes are scanned to change PTE/PMD to be PROT_NONE, the current time is recorded in struct page as scan time. - When the page is accessed, hint page fault will occur. The scan time is gotten from the struct page. And The hint page fault latency is defined as hint page fault time - scan time The shorter the hint page fault latency of a page is, the higher the probability of their access frequency to be higher. So the hint page fault latency is a better estimation of the page hot/cold. It's hard to find some extra space in struct page to hold the scan time. Fortunately, we can reuse some bits used by the original NUMA balancing. NUMA balancing uses some bits in struct page to store the page accessing CPU and PID (referring to page_cpupid_xchg_last()). Which is used by the multi-stage node selection algorithm to avoid to migrate pages shared accessed by the NUMA nodes back and forth. But for pages in the slow memory node, even if they are shared accessed by multiple NUMA nodes, as long as the pages are hot, they need to be promoted to the fast memory node. So the accessing CPU and PID information are unnecessary for the slow memory pages. We can reuse these bits in struct page to record the scan time. For the fast memory pages, these bits are used as before. For the hot threshold, the default value is 1 second, which works well in our performance test. All pages with hint page fault latency < hot threshold will be considered hot. It's hard for users to determine the hot threshold. So we don't provide a kernel ABI to set it, just provide a debugfs interface for advanced users to experiment. We will continue to work on a hot threshold automatic adjustment mechanism. The downside of the above method is that the response time to the workload hot spot changing may be much longer. For example, - A previous cold memory area becomes hot - The hint page fault will be triggered. But the hint page fault latency isn't shorter than the hot threshold. So the pages will not be promoted. - When the memory area is scanned again, maybe after a scan period, the hint page fault latency measured will be shorter than the hot threshold and the pages will be promoted. To mitigate this, if there are enough free space in the fast memory node, the hot threshold will not be used, all pages will be promoted upon the hint page fault for fast response. Thanks Zhong Jiang reported and tested the fix for a bug when disabling memory tiering mode dynamically. Link: https://lkml.kernel.org/r/20220713083954.34196-1-ying.huang@intel.com Link: https://lkml.kernel.org/r/20220713083954.34196-2-ying.huang@intel.com Signed-off-by: "Huang, Ying" Reviewed-by: Baolin Wang Tested-by: Baolin Wang Cc: Johannes Weiner Cc: Michal Hocko Cc: Rik van Riel Cc: Mel Gorman Cc: Peter Zijlstra Cc: Dave Hansen Cc: Yang Shi Cc: Zi Yan Cc: Wei Xu Cc: osalvador Cc: Shakeel Butt Cc: Zhong Jiang Cc: Oscar Salvador Signed-off-by: Andrew Morton --- include/linux/mm.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 21f8b27bd9fd..27839b158ca4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1255,6 +1255,18 @@ static inline int folio_nid(const struct folio *folio) } #ifdef CONFIG_NUMA_BALANCING +/* page access time bits needs to hold at least 4 seconds */ +#define PAGE_ACCESS_TIME_MIN_BITS 12 +#if LAST_CPUPID_SHIFT < PAGE_ACCESS_TIME_MIN_BITS +#define PAGE_ACCESS_TIME_BUCKETS \ + (PAGE_ACCESS_TIME_MIN_BITS - LAST_CPUPID_SHIFT) +#else +#define PAGE_ACCESS_TIME_BUCKETS 0 +#endif + +#define PAGE_ACCESS_TIME_MASK \ + (LAST_CPUPID_MASK << PAGE_ACCESS_TIME_BUCKETS) + static inline int cpu_pid_to_cpupid(int cpu, int pid) { return ((cpu & LAST__CPU_MASK) << LAST__PID_SHIFT) | (pid & LAST__PID_MASK); @@ -1318,12 +1330,25 @@ static inline void page_cpupid_reset_last(struct page *page) page->flags |= LAST_CPUPID_MASK << LAST_CPUPID_PGSHIFT; } #endif /* LAST_CPUPID_NOT_IN_PAGE_FLAGS */ + +static inline int xchg_page_access_time(struct page *page, int time) +{ + int last_time; + + last_time = page_cpupid_xchg_last(page, time >> PAGE_ACCESS_TIME_BUCKETS); + return last_time << PAGE_ACCESS_TIME_BUCKETS; +} #else /* !CONFIG_NUMA_BALANCING */ static inline int page_cpupid_xchg_last(struct page *page, int cpupid) { return page_to_nid(page); /* XXX */ } +static inline int xchg_page_access_time(struct page *page, int time) +{ + return 0; +} + static inline int page_cpupid_last(struct page *page) { return page_to_nid(page); /* XXX */ -- cgit From c6833e10008f976a173dd5abdf992e492cbc3bcf Mon Sep 17 00:00:00 2001 From: Huang Ying Date: Wed, 13 Jul 2022 16:39:52 +0800 Subject: memory tiering: rate limit NUMA migration throughput In NUMA balancing memory tiering mode, if there are hot pages in slow memory node and cold pages in fast memory node, we need to promote/demote hot/cold pages between the fast and cold memory nodes. A choice is to promote/demote as fast as possible. But the CPU cycles and memory bandwidth consumed by the high promoting/demoting throughput will hurt the latency of some workload because of accessing inflating and slow memory bandwidth contention. A way to resolve this issue is to restrict the max promoting/demoting throughput. It will take longer to finish the promoting/demoting. But the workload latency will be better. This is implemented in this patch as the page promotion rate limit mechanism. The number of the candidate pages to be promoted to the fast memory node via NUMA balancing is counted, if the count exceeds the limit specified by the users, the NUMA balancing promotion will be stopped until the next second. A new sysctl knob kernel.numa_balancing_promote_rate_limit_MBps is added for the users to specify the limit. Link: https://lkml.kernel.org/r/20220713083954.34196-3-ying.huang@intel.com Signed-off-by: "Huang, Ying" Reviewed-by: Baolin Wang Tested-by: Baolin Wang Cc: Dave Hansen Cc: Johannes Weiner Cc: Mel Gorman Cc: Michal Hocko Cc: osalvador Cc: Peter Zijlstra Cc: Rik van Riel Cc: Shakeel Butt Cc: Wei Xu Cc: Yang Shi Cc: Zhong Jiang Cc: Zi Yan Signed-off-by: Andrew Morton --- include/linux/mmzone.h | 7 +++++++ include/linux/sched/sysctl.h | 1 + 2 files changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 8f571dc7c524..a0003eaa751f 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -221,6 +221,7 @@ enum node_stat_item { #endif #ifdef CONFIG_NUMA_BALANCING PGPROMOTE_SUCCESS, /* promote successfully */ + PGPROMOTE_CANDIDATE, /* candidate pages to promote */ #endif NR_VM_NODE_STAT_ITEMS }; @@ -998,6 +999,12 @@ typedef struct pglist_data { struct deferred_split deferred_split_queue; #endif +#ifdef CONFIG_NUMA_BALANCING + /* start time in ms of current promote rate limit period */ + unsigned int nbp_rl_start; + /* number of promote candidate pages at start time of current rate limit period */ + unsigned long nbp_rl_nr_cand; +#endif /* Fields commonly accessed by the page reclaim scanner */ /* diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index e650946816d0..303ee7dd0c7e 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -27,6 +27,7 @@ enum sched_tunable_scaling { #ifdef CONFIG_NUMA_BALANCING extern int sysctl_numa_balancing_mode; +extern unsigned int sysctl_numa_balancing_promote_rate_limit; #else #define sysctl_numa_balancing_mode 0 #endif -- cgit From c959924b0dc53bf6252793f41480bc01b9792570 Mon Sep 17 00:00:00 2001 From: Huang Ying Date: Wed, 13 Jul 2022 16:39:53 +0800 Subject: memory tiering: adjust hot threshold automatically The promotion hot threshold is workload and system configuration dependent. So in this patch, a method to adjust the hot threshold automatically is implemented. The basic idea is to control the number of the candidate promotion pages to match the promotion rate limit. If the hint page fault latency of a page is less than the hot threshold, we will try to promote the page, and the page is called the candidate promotion page. If the number of the candidate promotion pages in the statistics interval is much more than the promotion rate limit, the hot threshold will be decreased to reduce the number of the candidate promotion pages. Otherwise, the hot threshold will be increased to increase the number of the candidate promotion pages. To make the above method works, in each statistics interval, the total number of the pages to check (on which the hint page faults occur) and the hot/cold distribution need to be stable. Because the page tables are scanned linearly in NUMA balancing, but the hot/cold distribution isn't uniform along the address usually, the statistics interval should be larger than the NUMA balancing scan period. So in the patch, the max scan period is used as statistics interval and it works well in our tests. Link: https://lkml.kernel.org/r/20220713083954.34196-4-ying.huang@intel.com Signed-off-by: "Huang, Ying" Reviewed-by: Baolin Wang Tested-by: Baolin Wang Cc: Dave Hansen Cc: Johannes Weiner Cc: Mel Gorman Cc: Michal Hocko Cc: osalvador Cc: Peter Zijlstra Cc: Rik van Riel Cc: Shakeel Butt Cc: Wei Xu Cc: Yang Shi Cc: Zhong Jiang Cc: Zi Yan Signed-off-by: Andrew Morton --- include/linux/mmzone.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index a0003eaa751f..025754b0bc09 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1004,6 +1004,15 @@ typedef struct pglist_data { unsigned int nbp_rl_start; /* number of promote candidate pages at start time of current rate limit period */ unsigned long nbp_rl_nr_cand; + /* promote threshold in ms */ + unsigned int nbp_threshold; + /* start time in ms of current promote threshold adjustment period */ + unsigned int nbp_th_start; + /* + * number of promote candidate pages at stat time of current promote + * threshold adjustment period + */ + unsigned long nbp_th_nr_cand; #endif /* Fields commonly accessed by the page reclaim scanner */ -- cgit From 0192445cb2f7ed1cd7a95a0fc8c7645480baba25 Mon Sep 17 00:00:00 2001 From: Zi Yan Date: Mon, 15 Aug 2022 10:39:59 -0400 Subject: arch: mm: rename FORCE_MAX_ZONEORDER to ARCH_FORCE_MAX_ORDER This Kconfig option is used by individual arch to set its desired MAX_ORDER. Rename it to reflect its actual use. Link: https://lkml.kernel.org/r/20220815143959.1511278-1-zi.yan@sent.com Acked-by: Mike Rapoport Signed-off-by: Zi Yan Acked-by: Guo Ren [csky] Acked-by: Arnd Bergmann Acked-by: Catalin Marinas [arm64] Acked-by: Huacai Chen [LoongArch] Acked-by: Michael Ellerman [powerpc] Cc: Vineet Gupta Cc: Taichi Sugaya Cc: Neil Armstrong Cc: Qin Jian Cc: Guo Ren Cc: Geert Uytterhoeven Cc: Thomas Bogendoerfer Cc: Dinh Nguyen Cc: Christophe Leroy Cc: Yoshinori Sato Cc: "David S. Miller" Cc: Chris Zankel Cc: Ley Foon Tan Signed-off-by: Andrew Morton --- include/linux/mmzone.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 025754b0bc09..fd61347b4b1f 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -24,10 +24,10 @@ #include /* Free memory management - zoned buddy allocator. */ -#ifndef CONFIG_FORCE_MAX_ZONEORDER +#ifndef CONFIG_ARCH_FORCE_MAX_ORDER #define MAX_ORDER 11 #else -#define MAX_ORDER CONFIG_FORCE_MAX_ZONEORDER +#define MAX_ORDER CONFIG_ARCH_FORCE_MAX_ORDER #endif #define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1)) -- cgit From fb70c4878d6b3001ef40fa39432a38d8cabdcbf7 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Mon, 15 Aug 2022 19:10:17 +0800 Subject: mm: kill find_min_pfn_with_active_regions() find_min_pfn_with_active_regions() is only called from free_area_init(). Open-code the PHYS_PFN(memblock_start_of_DRAM()) into free_area_init(), and kill find_min_pfn_with_active_regions(). Link: https://lkml.kernel.org/r/20220815111017.39341-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Signed-off-by: Andrew Morton --- include/linux/mm.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 27839b158ca4..e98ef2cb1176 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2520,7 +2520,6 @@ extern unsigned long absent_pages_in_range(unsigned long start_pfn, unsigned long end_pfn); extern void get_pfn_range_for_nid(unsigned int nid, unsigned long *start_pfn, unsigned long *end_pfn); -extern unsigned long find_min_pfn_with_active_regions(void); #ifndef CONFIG_NUMA static inline int early_pfn_to_nid(unsigned long pfn) -- cgit From b1d5488a252dc9c0d9574100d0b8d807bf154603 Mon Sep 17 00:00:00 2001 From: Charan Teja Kalla Date: Thu, 18 Aug 2022 19:20:00 +0530 Subject: mm: fix use-after free of page_ext after race with memory-offline The below is one path where race between page_ext and offline of the respective memory blocks will cause use-after-free on the access of page_ext structure. process1 process2 --------- --------- a)doing /proc/page_owner doing memory offline through offline_pages. b) PageBuddy check is failed thus proceed to get the page_owner information through page_ext access. page_ext = lookup_page_ext(page); migrate_pages(); ................. Since all pages are successfully migrated as part of the offline operation,send MEM_OFFLINE notification where for page_ext it calls: offline_page_ext()--> __free_page_ext()--> free_page_ext()--> vfree(ms->page_ext) mem_section->page_ext = NULL c) Check for the PAGE_EXT flags in the page_ext->flags access results into the use-after-free (leading to the translation faults). As mentioned above, there is really no synchronization between page_ext access and its freeing in the memory_offline. The memory offline steps(roughly) on a memory block is as below: 1) Isolate all the pages 2) while(1) try free the pages to buddy.(->free_list[MIGRATE_ISOLATE]) 3) delete the pages from this buddy list. 4) Then free page_ext.(Note: The struct page is still alive as it is freed only during hot remove of the memory which frees the memmap, which steps the user might not perform). This design leads to the state where struct page is alive but the struct page_ext is freed, where the later is ideally part of the former which just representing the page_flags (check [3] for why this design is chosen). The abovementioned race is just one example __but the problem persists in the other paths too involving page_ext->flags access(eg: page_is_idle())__. Fix all the paths where offline races with page_ext access by maintaining synchronization with rcu lock and is achieved in 3 steps: 1) Invalidate all the page_ext's of the sections of a memory block by storing a flag in the LSB of mem_section->page_ext. 2) Wait until all the existing readers to finish working with the ->page_ext's with synchronize_rcu(). Any parallel process that starts after this call will not get page_ext, through lookup_page_ext(), for the block parallel offline operation is being performed. 3) Now safely free all sections ->page_ext's of the block on which offline operation is being performed. Note: If synchronize_rcu() takes time then optimizations can be done in this path through call_rcu()[2]. Thanks to David Hildenbrand for his views/suggestions on the initial discussion[1] and Pavan kondeti for various inputs on this patch. [1] https://lore.kernel.org/linux-mm/59edde13-4167-8550-86f0-11fc67882107@quicinc.com/ [2] https://lore.kernel.org/all/a26ce299-aed1-b8ad-711e-a49e82bdd180@quicinc.com/T/#u [3] https://lore.kernel.org/all/6fa6b7aa-731e-891c-3efb-a03d6a700efa@redhat.com/ [quic_charante@quicinc.com: rename label `loop' to `ext_put_continue' per David] Link: https://lkml.kernel.org/r/1661496993-11473-1-git-send-email-quic_charante@quicinc.com Link: https://lkml.kernel.org/r/1660830600-9068-1-git-send-email-quic_charante@quicinc.com Signed-off-by: Charan Teja Kalla Suggested-by: David Hildenbrand Suggested-by: Michal Hocko Acked-by: Michal Hocko Acked-by: David Hildenbrand Cc: Fernand Sieber Cc: Minchan Kim Cc: Pasha Tatashin Cc: Pavan Kondeti Cc: SeongJae Park Cc: Shakeel Butt Cc: Vlastimil Babka Cc: William Kucharski Signed-off-by: Andrew Morton --- include/linux/page_ext.h | 17 +++++++++++------ include/linux/page_idle.h | 34 ++++++++++++++++++++++++---------- 2 files changed, 35 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index fabb2e1e087f..ed27198cdaf4 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -55,7 +55,8 @@ static inline void page_ext_init(void) } #endif -struct page_ext *lookup_page_ext(const struct page *page); +extern struct page_ext *page_ext_get(struct page *page); +extern void page_ext_put(struct page_ext *page_ext); static inline struct page_ext *page_ext_next(struct page_ext *curr) { @@ -71,11 +72,6 @@ static inline void pgdat_page_ext_init(struct pglist_data *pgdat) { } -static inline struct page_ext *lookup_page_ext(const struct page *page) -{ - return NULL; -} - static inline void page_ext_init(void) { } @@ -87,5 +83,14 @@ static inline void page_ext_init_flatmem_late(void) static inline void page_ext_init_flatmem(void) { } + +static inline struct page_ext *page_ext_get(struct page *page) +{ + return NULL; +} + +static inline void page_ext_put(struct page_ext *page_ext) +{ +} #endif /* CONFIG_PAGE_EXTENSION */ #endif /* __LINUX_PAGE_EXT_H */ diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h index 4663dfed1293..5cb7bd2078ec 100644 --- a/include/linux/page_idle.h +++ b/include/linux/page_idle.h @@ -13,65 +13,79 @@ * If there is not enough space to store Idle and Young bits in page flags, use * page ext flags instead. */ - static inline bool folio_test_young(struct folio *folio) { - struct page_ext *page_ext = lookup_page_ext(&folio->page); + struct page_ext *page_ext = page_ext_get(&folio->page); + bool page_young; if (unlikely(!page_ext)) return false; - return test_bit(PAGE_EXT_YOUNG, &page_ext->flags); + page_young = test_bit(PAGE_EXT_YOUNG, &page_ext->flags); + page_ext_put(page_ext); + + return page_young; } static inline void folio_set_young(struct folio *folio) { - struct page_ext *page_ext = lookup_page_ext(&folio->page); + struct page_ext *page_ext = page_ext_get(&folio->page); if (unlikely(!page_ext)) return; set_bit(PAGE_EXT_YOUNG, &page_ext->flags); + page_ext_put(page_ext); } static inline bool folio_test_clear_young(struct folio *folio) { - struct page_ext *page_ext = lookup_page_ext(&folio->page); + struct page_ext *page_ext = page_ext_get(&folio->page); + bool page_young; if (unlikely(!page_ext)) return false; - return test_and_clear_bit(PAGE_EXT_YOUNG, &page_ext->flags); + page_young = test_and_clear_bit(PAGE_EXT_YOUNG, &page_ext->flags); + page_ext_put(page_ext); + + return page_young; } static inline bool folio_test_idle(struct folio *folio) { - struct page_ext *page_ext = lookup_page_ext(&folio->page); + struct page_ext *page_ext = page_ext_get(&folio->page); + bool page_idle; if (unlikely(!page_ext)) return false; - return test_bit(PAGE_EXT_IDLE, &page_ext->flags); + page_idle = test_bit(PAGE_EXT_IDLE, &page_ext->flags); + page_ext_put(page_ext); + + return page_idle; } static inline void folio_set_idle(struct folio *folio) { - struct page_ext *page_ext = lookup_page_ext(&folio->page); + struct page_ext *page_ext = page_ext_get(&folio->page); if (unlikely(!page_ext)) return; set_bit(PAGE_EXT_IDLE, &page_ext->flags); + page_ext_put(page_ext); } static inline void folio_clear_idle(struct folio *folio) { - struct page_ext *page_ext = lookup_page_ext(&folio->page); + struct page_ext *page_ext = page_ext_get(&folio->page); if (unlikely(!page_ext)) return; clear_bit(PAGE_EXT_IDLE, &page_ext->flags); + page_ext_put(page_ext); } #endif /* !CONFIG_64BIT */ -- cgit From e2f8f44b7686bf65747f485ac7c9c43de8b8de09 Mon Sep 17 00:00:00 2001 From: Rolf Eike Beer Date: Mon, 22 Aug 2022 15:01:32 +0200 Subject: mm: pagewalk: fix documentation of PTE hole handling Empty PTEs are passed to the pte_entry callback, not to pte_hole. Link: https://lkml.kernel.org/r/3695521.kQq0lBPeGt@devpool047 Signed-off-by: Rolf Eike Beer Signed-off-by: Andrew Morton --- include/linux/pagewalk.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h index ac7b38ad5903..f3fafb731ffd 100644 --- a/include/linux/pagewalk.h +++ b/include/linux/pagewalk.h @@ -15,12 +15,12 @@ struct mm_walk; * this handler is required to be able to handle * pmd_trans_huge() pmds. They may simply choose to * split_huge_page() instead of handling it explicitly. - * @pte_entry: if set, called for each non-empty PTE (lowest-level) - * entry + * @pte_entry: if set, called for each PTE (lowest-level) entry, + * including empty ones * @pte_hole: if set, called for each hole at all levels, - * depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD - * 4:PTE. Any folded depths (where PTRS_PER_P?D is equal - * to 1) are skipped. + * depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD. + * Any folded depths (where PTRS_PER_P?D is equal to 1) + * are skipped. * @hugetlb_entry: if set, called for each hugetlb entry * @test_walk: caller specific callback function to determine whether * we walk over the current vma or not. Returning 0 means -- cgit From 408587baee39753304dd572dacf374f75545d25a Mon Sep 17 00:00:00 2001 From: Shakeel Butt Date: Thu, 25 Aug 2022 00:05:05 +0000 Subject: mm: page_counter: rearrange struct page_counter fields MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit With memcg v2 enabled, memcg->memory.usage is a very hot member for the workloads doing memcg charging on multiple CPUs concurrently. Particularly the network intensive workloads. In addition, there is a false cache sharing between memory.usage and memory.high on the charge path. This patch moves the usage into a separate cacheline and move all the read most fields into separate cacheline. To evaluate the impact of this optimization, on a 72 CPUs machine, we ran the following workload in a three level of cgroup hierarchy. $ netserver -6 # 36 instances of netperf with following params $ netperf -6 -H ::1 -l 60 -t TCP_SENDFILE -- -m 10K Results (average throughput of netperf): Without (6.0-rc1) 10482.7 Mbps With patch 12413.7 Mbps (18.4% improvement) With the patch, the throughput improved by 18.4%. One side-effect of this patch is the increase in the size of struct mem_cgroup. For example with this patch on 64 bit build, the size of struct mem_cgroup increased from 4032 bytes to 4416 bytes. However for the performance improvement, this additional size is worth it. In addition there are opportunities to reduce the size of struct mem_cgroup like deprecation of kmem and tcpmem page counters and better packing. Link: https://lkml.kernel.org/r/20220825000506.239406-3-shakeelb@google.com Signed-off-by: Shakeel Butt Reported-by: kernel test robot Reviewed-by: Feng Tang Acked-by: Soheil Hassas Yeganeh Acked-by: Roman Gushchin Acked-by: Michal Hocko Cc: Eric Dumazet Cc: Johannes Weiner Cc: "Michal Koutný" Cc: Muchun Song Signed-off-by: Andrew Morton --- include/linux/page_counter.h | 35 +++++++++++++++++++++++------------ 1 file changed, 23 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h index 679591301994..78a1c934e416 100644 --- a/include/linux/page_counter.h +++ b/include/linux/page_counter.h @@ -3,15 +3,26 @@ #define _LINUX_PAGE_COUNTER_H #include +#include #include #include +#if defined(CONFIG_SMP) +struct pc_padding { + char x[0]; +} ____cacheline_internodealigned_in_smp; +#define PC_PADDING(name) struct pc_padding name +#else +#define PC_PADDING(name) +#endif + struct page_counter { + /* + * Make sure 'usage' does not share cacheline with any other field. The + * memcg->memory.usage is a hot member of struct mem_cgroup. + */ atomic_long_t usage; - unsigned long min; - unsigned long low; - unsigned long high; - unsigned long max; + PC_PADDING(_pad1_); /* effective memory.min and memory.min usage tracking */ unsigned long emin; @@ -23,18 +34,18 @@ struct page_counter { atomic_long_t low_usage; atomic_long_t children_low_usage; - /* legacy */ unsigned long watermark; unsigned long failcnt; - /* - * 'parent' is placed here to be far from 'usage' to reduce - * cache false sharing, as 'usage' is written mostly while - * parent is frequently read for cgroup's hierarchical - * counting nature. - */ + /* Keep all the read most fields in a separete cacheline. */ + PC_PADDING(_pad2_); + + unsigned long min; + unsigned long low; + unsigned long high; + unsigned long max; struct page_counter *parent; -}; +} ____cacheline_internodealigned_in_smp; #if BITS_PER_LONG == 32 #define PAGE_COUNTER_MAX LONG_MAX -- cgit From 1813e51eece0ad6f4aacaeb738e7cced46feb470 Mon Sep 17 00:00:00 2001 From: Shakeel Butt Date: Thu, 25 Aug 2022 00:05:06 +0000 Subject: memcg: increase MEMCG_CHARGE_BATCH to 64 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit For several years, MEMCG_CHARGE_BATCH was kept at 32 but with bigger machines and the network intensive workloads requiring througput in Gbps, 32 is too small and makes the memcg charging path a bottleneck. For now, increase it to 64 for easy acceptance to 6.0. We will need to revisit this in future for ever increasing demand of higher performance. Please note that the memcg charge path drain the per-cpu memcg charge stock, so there should not be any oom behavior change. Though it does have impact on rstat flushing and high limit reclaim backoff. To evaluate the impact of this optimization, on a 72 CPUs machine, we ran the following workload in a three level of cgroup hierarchy. $ netserver -6 # 36 instances of netperf with following params $ netperf -6 -H ::1 -l 60 -t TCP_SENDFILE -- -m 10K Results (average throughput of netperf): Without (6.0-rc1) 10482.7 Mbps With patch 17064.7 Mbps (62.7% improvement) With the patch, the throughput improved by 62.7%. Link: https://lkml.kernel.org/r/20220825000506.239406-4-shakeelb@google.com Signed-off-by: Shakeel Butt Reported-by: kernel test robot Acked-by: Soheil Hassas Yeganeh Reviewed-by: Feng Tang Acked-by: Roman Gushchin Acked-by: Muchun Song Acked-by: Michal Hocko Cc: Eric Dumazet Cc: Johannes Weiner Cc: "Michal Koutný" Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 6257867fbf95..a2461f9a8738 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -354,10 +354,11 @@ struct mem_cgroup { }; /* - * size of first charge trial. "32" comes from vmscan.c's magic value. - * TODO: maybe necessary to use big numbers in big irons. + * size of first charge trial. + * TODO: maybe necessary to use big numbers in big irons or dynamic based of the + * workload. */ -#define MEMCG_CHARGE_BATCH 32U +#define MEMCG_CHARGE_BATCH 64U extern struct mem_cgroup *root_mem_cgroup; -- cgit From c4f20f1479c456d9dd1c1e6d8bf956a25de742dc Mon Sep 17 00:00:00 2001 From: Li Zhe Date: Thu, 25 Aug 2022 18:27:14 +0800 Subject: page_ext: introduce boot parameter 'early_page_ext' In commit 2f1ee0913ce5 ("Revert "mm: use early_pfn_to_nid in page_ext_init""), we call page_ext_init() after page_alloc_init_late() to avoid some panic problem. It seems that we cannot track early page allocations in current kernel even if page structure has been initialized early. This patch introduces a new boot parameter 'early_page_ext' to resolve this problem. If we pass it to the kernel, page_ext_init() will be moved up and the feature 'deferred initialization of struct pages' will be disabled to initialize the page allocator early and prevent the panic problem above. It can help us to catch early page allocations. This is useful especially when we find that the free memory value is not the same right after different kernel booting. [akpm@linux-foundation.org: fix section issue by removing __meminitdata] Link: https://lkml.kernel.org/r/20220825102714.669-1-lizhe.67@bytedance.com Signed-off-by: Li Zhe Suggested-by: Michal Hocko Acked-by: Michal Hocko Acked-by: Vlastimil Babka Cc: Jason A. Donenfeld Cc: Jonathan Corbet Cc: Kees Cook Cc: Mark-PK Tsai Cc: Masami Hiramatsu (Google) Cc: Steven Rostedt Signed-off-by: Andrew Morton --- include/linux/page_ext.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index ed27198cdaf4..22be4582faae 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -36,9 +36,15 @@ struct page_ext { unsigned long flags; }; +extern bool early_page_ext; extern unsigned long page_ext_size; extern void pgdat_page_ext_init(struct pglist_data *pgdat); +static inline bool early_page_ext_enabled(void) +{ + return early_page_ext; +} + #ifdef CONFIG_SPARSEMEM static inline void page_ext_init_flatmem(void) { @@ -68,6 +74,11 @@ static inline struct page_ext *page_ext_next(struct page_ext *curr) #else /* !CONFIG_PAGE_EXTENSION */ struct page_ext; +static inline bool early_page_ext_enabled(void) +{ + return false; +} + static inline void pgdat_page_ext_init(struct pglist_data *pgdat) { } -- cgit From 35b471467f88b8d8b52ba5b006a29b9d057d09bf Mon Sep 17 00:00:00 2001 From: "Vishal Moola (Oracle)" Date: Tue, 23 Aug 2022 17:40:17 -0700 Subject: filemap: add filemap_get_folios_contig() Patch series "Convert to filemap_get_folios_contig()", v3. This patch series replaces find_get_pages_contig() with filemap_get_folios_contig(). This patch (of 7): This function is meant to replace find_get_pages_contig(). Unlike find_get_pages_contig(), filemap_get_folios_contig() no longer takes in a target number of pages to find - It returns up to 15 contiguous folios. To be more consistent with filemap_get_folios(), filemap_get_folios_contig() now also updates the start index passed in, and takes an end index. Link: https://lkml.kernel.org/r/20220824004023.77310-1-vishal.moola@gmail.com Link: https://lkml.kernel.org/r/20220824004023.77310-2-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) Cc: Chris Mason Cc: Josef Bacik Cc: David Sterba Cc: Ryusuke Konishi Cc: Al Viro Cc: Matthew Wilcox Cc: David Sterba Signed-off-by: Andrew Morton --- include/linux/pagemap.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 0178b2040ea3..8689c32d628b 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -718,6 +718,8 @@ static inline struct page *find_subpage(struct page *head, pgoff_t index) unsigned filemap_get_folios(struct address_space *mapping, pgoff_t *start, pgoff_t end, struct folio_batch *fbatch); +unsigned filemap_get_folios_contig(struct address_space *mapping, + pgoff_t *start, pgoff_t end, struct folio_batch *fbatch); unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t start, unsigned int nr_pages, struct page **pages); unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index, -- cgit From 48658d8509d2db3391c95aa74308a2b1fc8e8461 Mon Sep 17 00:00:00 2001 From: "Vishal Moola (Oracle)" Date: Tue, 23 Aug 2022 17:40:23 -0700 Subject: filemap: remove find_get_pages_contig() All callers of find_get_pages_contig() have been removed, so it is no longer needed. Link: https://lkml.kernel.org/r/20220824004023.77310-8-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) Cc: Al Viro Cc: Chris Mason Cc: David Sterba Cc: David Sterba Cc: Josef Bacik Cc: Matthew Wilcox Cc: Ryusuke Konishi Signed-off-by: Andrew Morton --- include/linux/pagemap.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 8689c32d628b..09de43e36a64 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -720,8 +720,6 @@ unsigned filemap_get_folios(struct address_space *mapping, pgoff_t *start, pgoff_t end, struct folio_batch *fbatch); unsigned filemap_get_folios_contig(struct address_space *mapping, pgoff_t *start, pgoff_t end, struct folio_batch *fbatch); -unsigned find_get_pages_contig(struct address_space *mapping, pgoff_t start, - unsigned int nr_pages, struct page **pages); unsigned find_get_pages_range_tag(struct address_space *mapping, pgoff_t *index, pgoff_t end, xa_mark_t tag, unsigned int nr_pages, struct page **pages); -- cgit From 639118d1571f70b1157b4bb5ac574b0ab0f38099 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Sat, 27 Aug 2022 19:20:43 +0800 Subject: mm: kill is_memblock_offlined() Directly check state of struct memory_block, no need a single function. Link: https://lkml.kernel.org/r/20220827112043.187028-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Reviewed-by: David Hildenbrand Reviewed-by: Oscar Salvador Reviewed-by: Anshuman Khandual Signed-off-by: Andrew Morton --- include/linux/memory_hotplug.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index e0b2209ab71c..54675791bc50 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -11,7 +11,6 @@ struct page; struct zone; struct pglist_data; struct mem_section; -struct memory_block; struct memory_group; struct resource; struct vmem_altmap; @@ -333,7 +332,6 @@ extern void move_pfn_range_to_zone(struct zone *zone, unsigned long start_pfn, extern void remove_pfn_range_from_zone(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages); -extern bool is_memblock_offlined(struct memory_block *mem); extern int sparse_add_section(int nid, unsigned long pfn, unsigned long nr_pages, struct vmem_altmap *altmap, struct dev_pagemap *pgmap); -- cgit From b4a0215e11dcfe23a48c65c6d6c82c0c2c551a48 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Sat, 27 Aug 2022 19:19:59 +0800 Subject: mm: fix null-ptr-deref in kswapd_is_running() kswapd_run/stop() will set pgdat->kswapd to NULL, which could race with kswapd_is_running() in kcompactd(), kswapd_run/stop() kcompactd() kswapd_is_running() pgdat->kswapd // error or nomal ptr verify pgdat->kswapd // load non-NULL pgdat->kswapd pgdat->kswapd = NULL task_is_running(pgdat->kswapd) // Null pointer derefence KASAN reports the null-ptr-deref shown below, vmscan: Failed to start kswapd on node 0 ... BUG: KASAN: null-ptr-deref in kcompactd+0x440/0x504 Read of size 8 at addr 0000000000000024 by task kcompactd0/37 CPU: 0 PID: 37 Comm: kcompactd0 Kdump: loaded Tainted: G OE 5.10.60 #1 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: dump_backtrace+0x0/0x394 show_stack+0x34/0x4c dump_stack+0x158/0x1e4 __kasan_report+0x138/0x140 kasan_report+0x44/0xdc __asan_load8+0x94/0xd0 kcompactd+0x440/0x504 kthread+0x1a4/0x1f0 ret_from_fork+0x10/0x18 At present kswapd/kcompactd_run() and kswapd/kcompactd_stop() are protected by mem_hotplug_begin/done(), but without kcompactd(). There is no need to involve memory hotplug lock in kcompactd(), so let's add a new mutex to protect pgdat->kswapd accesses. Also, because the kcompactd task will check the state of kswapd task, it's better to call kcompactd_stop() before kswapd_stop() to reduce lock conflicts. [akpm@linux-foundation.org: add comments] Link: https://lkml.kernel.org/r/20220827111959.186838-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Cc: David Hildenbrand Cc: Muchun Song Signed-off-by: Andrew Morton --- include/linux/memory_hotplug.h | 20 ++++++++++++++++++++ include/linux/mmzone.h | 6 ++++-- 2 files changed, 24 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 54675791bc50..51052969dbfe 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -215,6 +215,22 @@ void put_online_mems(void); void mem_hotplug_begin(void); void mem_hotplug_done(void); +/* See kswapd_is_running() */ +static inline void pgdat_kswapd_lock(pg_data_t *pgdat) +{ + mutex_lock(&pgdat->kswapd_lock); +} + +static inline void pgdat_kswapd_unlock(pg_data_t *pgdat) +{ + mutex_unlock(&pgdat->kswapd_lock); +} + +static inline void pgdat_kswapd_lock_init(pg_data_t *pgdat) +{ + mutex_init(&pgdat->kswapd_lock); +} + #else /* ! CONFIG_MEMORY_HOTPLUG */ #define pfn_to_online_page(pfn) \ ({ \ @@ -251,6 +267,10 @@ static inline bool movable_node_is_enabled(void) { return false; } + +static inline void pgdat_kswapd_lock(pg_data_t *pgdat) {} +static inline void pgdat_kswapd_unlock(pg_data_t *pgdat) {} +static inline void pgdat_kswapd_lock_init(pg_data_t *pgdat) {} #endif /* ! CONFIG_MEMORY_HOTPLUG */ /* diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index fd61347b4b1f..18cf0fc5ce67 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -956,8 +956,10 @@ typedef struct pglist_data { atomic_t nr_writeback_throttled;/* nr of writeback-throttled tasks */ unsigned long nr_reclaim_start; /* nr pages written while throttled * when throttling started. */ - struct task_struct *kswapd; /* Protected by - mem_hotplug_begin/done() */ +#ifdef CONFIG_MEMORY_HOTPLUG + struct mutex kswapd_lock; +#endif + struct task_struct *kswapd; /* Protected by kswapd_lock */ int kswapd_order; enum zone_type kswapd_highest_zoneidx; -- cgit From a38c94ed59fc312b51a33d867444ce73c6805ff6 Mon Sep 17 00:00:00 2001 From: Liu Shixin Date: Mon, 29 Aug 2022 17:57:09 +0800 Subject: mm/thp: simplify has_transparent_hugepage by using IS_BUILTIN Simplify code of has_transparent_hugepage define by using IS_BUILTIN. No functional change. Link: https://lkml.kernel.org/r/20220829095709.3287462-1-liushixin2@huawei.com Signed-off-by: Liu Shixin Reviewed-by: David Hildenbrand Cc: Hugh Dickins Cc: Kefeng Wang Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 014ee8f0fbaa..5911761487de 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1598,11 +1598,7 @@ typedef unsigned int pgtbl_mod_mask; #endif #ifndef has_transparent_hugepage -#ifdef CONFIG_TRANSPARENT_HUGEPAGE -#define has_transparent_hugepage() 1 -#else -#define has_transparent_hugepage() 0 -#endif +#define has_transparent_hugepage() IS_BUILTIN(CONFIG_TRANSPARENT_HUGEPAGE) #endif /* -- cgit From bcd0dea5f4fb3d098b03ef3064fe6c8201d7c515 Mon Sep 17 00:00:00 2001 From: Liu Shixin Date: Mon, 29 Aug 2022 17:51:25 +0800 Subject: mm/thp: remove redundant CONFIG_TRANSPARENT_HUGEPAGE Simplify code by removing redundant CONFIG_TRANSPARENT_HUGEPAGE judgment. No functional change. Link: https://lkml.kernel.org/r/20220829095125.3284567-1-liushixin2@huawei.com Signed-off-by: Liu Shixin Cc: Kefeng Wang Cc: Matthew Wilcox Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 5911761487de..d13b4f7cc5be 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1276,8 +1276,7 @@ static inline int pgd_devmap(pgd_t pgd) #endif #if !defined(CONFIG_TRANSPARENT_HUGEPAGE) || \ - (defined(CONFIG_TRANSPARENT_HUGEPAGE) && \ - !defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD)) + !defined(CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD) static inline int pud_trans_huge(pud_t pud) { return 0; -- cgit From 214f8796907b8015b778badf4710a4701472779a Mon Sep 17 00:00:00 2001 From: Zhang Yi Date: Thu, 1 Sep 2022 21:34:52 +0800 Subject: fs/buffer: remove __breadahead_gfp() Patch series "fs/buffer: remove ll_rw_block()", v2. ll_rw_block() will skip locked buffer before submitting IO, it assumes that locked buffer means it is under IO. This assumption is not always true because we cannot guarantee every buffer lock path would submit IO. After commit 88dbcbb3a484 ("blkdev: avoid migration stalls for blkdev pages"), buffer_migrate_folio_norefs() becomes one exceptional case, and there may be others. So ll_rw_block() is not safe on the sync read path, we could get false positive EIO return value when filesystem reading metadata. It seems that it could be only used on the readahead path. Unfortunately, many filesystem misuse the ll_rw_block() on the sync read path. This patch set just remove ll_rw_block() and add new friendly helpers, which could prevent false positive EIO on the read metadata path. Thanks for the suggestion from Jan, the original discussion is at [1]. patch 1: remove unused helpers in fs/buffer.c patch 2: add new bh_read_[*] helpers patch 3-11: remove all ll_rw_block() calls in filesystems patch 12-14: do some leftover cleanups. [1]. https://lore.kernel.org/linux-mm/20220825080146.2021641-1-chengzhihao1@huawei.com/ This patch (of 14): No one use __breadahead_gfp() and sb_breadahead_unmovable() any more, remove them. Link: https://lkml.kernel.org/r/20220901133505.2510834-1-yi.zhang@huawei.com Link: https://lkml.kernel.org/r/20220901133505.2510834-2-yi.zhang@huawei.com Signed-off-by: Zhang Yi Reviewed-by: Jan Kara Reviewed-by: Christoph Hellwig Cc: Alexander Viro Cc: Andreas Gruenbacher Cc: Bob Peterson Cc: Evgeniy Dushistov Cc: Heming Zhao Cc: Jens Axboe Cc: Konstantin Komarov Cc: Mark Fasheh Cc: Theodore Ts'o Cc: Yu Kuai Cc: Zhihao Cheng Signed-off-by: Andrew Morton --- include/linux/buffer_head.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 089c9ade4325..c3863c417b00 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -214,8 +214,6 @@ struct buffer_head *__getblk_gfp(struct block_device *bdev, sector_t block, void __brelse(struct buffer_head *); void __bforget(struct buffer_head *); void __breadahead(struct block_device *, sector_t block, unsigned int size); -void __breadahead_gfp(struct block_device *, sector_t block, unsigned int size, - gfp_t gfp); struct buffer_head *__bread_gfp(struct block_device *, sector_t block, unsigned size, gfp_t gfp); void invalidate_bh_lrus(void); @@ -340,12 +338,6 @@ sb_breadahead(struct super_block *sb, sector_t block) __breadahead(sb->s_bdev, block, sb->s_blocksize); } -static inline void -sb_breadahead_unmovable(struct super_block *sb, sector_t block) -{ - __breadahead_gfp(sb->s_bdev, block, sb->s_blocksize, 0); -} - static inline struct buffer_head * sb_getblk(struct super_block *sb, sector_t block) { -- cgit From fdee117ee86479fd2644bcd9ac2b2469e55722d1 Mon Sep 17 00:00:00 2001 From: Zhang Yi Date: Thu, 1 Sep 2022 21:34:53 +0800 Subject: fs/buffer: add some new buffer read helpers Current ll_rw_block() helper is fragile because it assumes that locked buffer means it's under IO which is submitted by some other who holds the lock, it skip buffer if it failed to get the lock, so it's only safe on the readahead path. Unfortunately, now that most filesystems still use this helper mistakenly on the sync metadata read path. There is no guarantee that the one who holds the buffer lock always submit IO (e.g. buffer_migrate_folio_norefs() after commit 88dbcbb3a484 ("blkdev: avoid migration stalls for blkdev pages"), it could lead to false positive -EIO when submitting reading IO. This patch add some friendly buffer read helpers to prepare replacing ll_rw_block() and similar calls. We can only call bh_readahead_[] helpers for the readahead paths. Link: https://lkml.kernel.org/r/20220901133505.2510834-3-yi.zhang@huawei.com Signed-off-by: Zhang Yi Reviewed-by: Jan Kara Reviewed-by: Christoph Hellwig Signed-off-by: Andrew Morton --- include/linux/buffer_head.h | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index c3863c417b00..6d09785bed9f 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -232,6 +232,9 @@ void write_boundary_block(struct block_device *bdev, sector_t bblock, unsigned blocksize); int bh_uptodate_or_lock(struct buffer_head *bh); int bh_submit_read(struct buffer_head *bh); +int __bh_read(struct buffer_head *bh, blk_opf_t op_flags, bool wait); +void __bh_read_batch(int nr, struct buffer_head *bhs[], + blk_opf_t op_flags, bool force_lock); extern int buffer_heads_over_limit; @@ -399,6 +402,41 @@ static inline struct buffer_head *__getblk(struct block_device *bdev, return __getblk_gfp(bdev, block, size, __GFP_MOVABLE); } +static inline void bh_readahead(struct buffer_head *bh, blk_opf_t op_flags) +{ + if (!buffer_uptodate(bh) && trylock_buffer(bh)) { + if (!buffer_uptodate(bh)) + __bh_read(bh, op_flags, false); + else + unlock_buffer(bh); + } +} + +static inline void bh_read_nowait(struct buffer_head *bh, blk_opf_t op_flags) +{ + if (!bh_uptodate_or_lock(bh)) + __bh_read(bh, op_flags, false); +} + +/* Returns 1 if buffer uptodated, 0 on success, and -EIO on error. */ +static inline int bh_read(struct buffer_head *bh, blk_opf_t op_flags) +{ + if (bh_uptodate_or_lock(bh)) + return 1; + return __bh_read(bh, op_flags, true); +} + +static inline void bh_read_batch(int nr, struct buffer_head *bhs[]) +{ + __bh_read_batch(nr, bhs, 0, true); +} + +static inline void bh_readahead_batch(int nr, struct buffer_head *bhs[], + blk_opf_t op_flags) +{ + __bh_read_batch(nr, bhs, op_flags, false); +} + /** * __bread() - reads a specified block and returns the bh * @bdev: the block_device to read from -- cgit From 79f5978420691fb84593f98557ea56f8b32228f2 Mon Sep 17 00:00:00 2001 From: Zhang Yi Date: Thu, 1 Sep 2022 21:35:03 +0800 Subject: fs/buffer: remove ll_rw_block() helper Now that all ll_rw_block() users has been replaced to new safe helpers, we just remove it here. Link: https://lkml.kernel.org/r/20220901133505.2510834-13-yi.zhang@huawei.com Signed-off-by: Zhang Yi Reviewed-by: Jan Kara Reviewed-by: Christoph Hellwig Signed-off-by: Andrew Morton --- include/linux/buffer_head.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 6d09785bed9f..b415d8bc2a09 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -223,7 +223,6 @@ struct buffer_head *alloc_buffer_head(gfp_t gfp_flags); void free_buffer_head(struct buffer_head * bh); void unlock_buffer(struct buffer_head *bh); void __lock_buffer(struct buffer_head *bh); -void ll_rw_block(blk_opf_t, int, struct buffer_head * bh[]); int sync_dirty_buffer(struct buffer_head *bh); int __sync_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags); void write_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags); -- cgit From 454552d0145486cf82572e73cca266ab6a56e86b Mon Sep 17 00:00:00 2001 From: Zhang Yi Date: Thu, 1 Sep 2022 21:35:05 +0800 Subject: fs/buffer: remove bh_submit_read() helper bh_submit_read() has no user anymore, just remove it. Link: https://lkml.kernel.org/r/20220901133505.2510834-15-yi.zhang@huawei.com Signed-off-by: Zhang Yi Reviewed-by: Jan Kara Reviewed-by: Christoph Hellwig Signed-off-by: Andrew Morton --- include/linux/buffer_head.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index b415d8bc2a09..9b6556d3f110 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -230,7 +230,6 @@ int submit_bh(blk_opf_t, struct buffer_head *); void write_boundary_block(struct block_device *bdev, sector_t bblock, unsigned blocksize); int bh_uptodate_or_lock(struct buffer_head *bh); -int bh_submit_read(struct buffer_head *bh); int __bh_read(struct buffer_head *bh, blk_opf_t op_flags, bool wait); void __bh_read_batch(int nr, struct buffer_head *bhs[], blk_opf_t op_flags, bool force_lock); -- cgit From 263b899802fc43cd0e6979f819271dfbb93c94af Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Thu, 1 Sep 2022 20:00:21 +0800 Subject: hugetlb: make hugetlb_cma_check() static Patch series "A few cleanup patches for hugetlb", v2. This series contains a few cleanup patches to use helper functions to simplify the codes, remove unneeded nid parameter and so on. More details can be found in the respective changelogs. This patch (of 10): Make hugetlb_cma_check() static as it's only used inside mm/hugetlb.c. Link: https://lkml.kernel.org/r/20220901120030.63318-1-linmiaohe@huawei.com Link: https://lkml.kernel.org/r/20220901120030.63318-2-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: Muchun Song Cc: Mike Kravetz Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 3ec981a0d8b3..57e72954a482 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1123,14 +1123,10 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h, #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA) extern void __init hugetlb_cma_reserve(int order); -extern void __init hugetlb_cma_check(void); #else static inline __init void hugetlb_cma_reserve(int order) { } -static inline __init void hugetlb_cma_check(void) -{ -} #endif bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr); -- cgit From 088b8aa537c2c767765f1c19b555f21ffe555786 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Thu, 1 Sep 2022 10:35:59 +0200 Subject: mm: fix PageAnonExclusive clearing racing with concurrent RCU GUP-fast commit 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive") made sure that when PageAnonExclusive() has to be cleared during temporary unmapping of a page, that the PTE is cleared/invalidated and that the TLB is flushed. What we want to achieve in all cases is that we cannot end up with a pin on an anonymous page that may be shared, because such pins would be unreliable and could result in memory corruptions when the mapped page and the pin go out of sync due to a write fault. That TLB flush handling was inspired by an outdated comment in mm/ksm.c:write_protect_page(), which similarly required the TLB flush in the past to synchronize with GUP-fast. However, ever since general RCU GUP fast was introduced in commit 2667f50e8b81 ("mm: introduce a general RCU get_user_pages_fast()"), a TLB flush is no longer sufficient to handle concurrent GUP-fast in all cases -- it only handles traditional IPI-based GUP-fast correctly. Peter Xu (thankfully) questioned whether that TLB flush is really required. On architectures that send an IPI broadcast on TLB flush, it works as expected. To synchronize with RCU GUP-fast properly, we're conceptually fine, however, we have to enforce a certain memory order and are missing memory barriers. Let's document that, avoid the TLB flush where possible and use proper explicit memory barriers where required. We shouldn't really care about the additional memory barriers here, as we're not on extremely hot paths -- and we're getting rid of some TLB flushes. We use a smp_mb() pair for handling concurrent pinning and a smp_rmb()/smp_wmb() pair for handling the corner case of only temporary PTE changes but permanent PageAnonExclusive changes. One extreme example, whereby GUP-fast takes a R/O pin and KSM wants to convert an exclusive anonymous page to a KSM page, and that page is already mapped write-protected (-> no PTE change) would be: Thread 0 (KSM) Thread 1 (GUP-fast) (B1) Read the PTE # (B2) skipped without FOLL_WRITE (A1) Clear PTE smp_mb() (A2) Check pinned (B3) Pin the mapped page smp_mb() (A3) Clear PageAnonExclusive smp_wmb() (A4) Restore PTE (B4) Check if the PTE changed smp_rmb() (B5) Check PageAnonExclusive Thread 1 will properly detect that PageAnonExclusive was cleared and back off. Note that we don't need a memory barrier between checking if the page is pinned and clearing PageAnonExclusive, because stores are not speculated. The possible issues due to reordering are of theoretical nature so far and attempts to reproduce the race failed. Especially the "no PTE change" case isn't the common case, because we'd need an exclusive anonymous page that's mapped R/O and the PTE is clean in KSM code -- and using KSM with page pinning isn't extremely common. Further, the clear+TLB flush we used for now implies a memory barrier. So the problematic missing part should be the missing memory barrier after pinning but before checking if the PTE changed. Link: https://lkml.kernel.org/r/20220901083559.67446-1-david@redhat.com Fixes: 6c287605fd56 ("mm: remember exclusively mapped anonymous pages with PG_anon_exclusive") Signed-off-by: David Hildenbrand Cc: Jason Gunthorpe Cc: John Hubbard Cc: Andrea Arcangeli Cc: Hugh Dickins Cc: Peter Xu Cc: Alistair Popple Cc: Nadav Amit Cc: Yang Shi Cc: Vlastimil Babka Cc: Michal Hocko Cc: Mike Kravetz Cc: Andrea Parri Cc: Will Deacon Cc: Peter Zijlstra Cc: "Paul E. McKenney" Cc: Christoph von Recklinghausen Cc: Don Dutile Signed-off-by: Andrew Morton --- include/linux/mm.h | 9 +++++-- include/linux/rmap.h | 66 ++++++++++++++++++++++++++++++++++++++++++++++++---- 2 files changed, 68 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index e98ef2cb1176..8a5ad9d050bf 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2999,8 +2999,8 @@ static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags) * PageAnonExclusive() has to protect against concurrent GUP: * * Ordinary GUP: Using the PT lock * * GUP-fast and fork(): mm->write_protect_seq - * * GUP-fast and KSM or temporary unmapping (swap, migration): - * clear/invalidate+flush of the page table entry + * * GUP-fast and KSM or temporary unmapping (swap, migration): see + * page_try_share_anon_rmap() * * Must be called with the (sub)page that's actually referenced via the * page table entry, which might not necessarily be the head page for a @@ -3021,6 +3021,11 @@ static inline bool gup_must_unshare(unsigned int flags, struct page *page) */ if (!PageAnon(page)) return false; + + /* Paired with a memory barrier in page_try_share_anon_rmap(). */ + if (IS_ENABLED(CONFIG_HAVE_FAST_GUP)) + smp_rmb(); + /* * Note that PageKsm() pages cannot be exclusive, and consequently, * cannot get pinned. diff --git a/include/linux/rmap.h b/include/linux/rmap.h index bf80adca980b..72b2bcc37f73 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -267,7 +267,7 @@ dup: * @page: the exclusive anonymous page to try marking possibly shared * * The caller needs to hold the PT lock and has to have the page table entry - * cleared/invalidated+flushed, to properly sync against GUP-fast. + * cleared/invalidated. * * This is similar to page_try_dup_anon_rmap(), however, not used during fork() * to duplicate a mapping, but instead to prepare for KSM or temporarily @@ -283,12 +283,68 @@ static inline int page_try_share_anon_rmap(struct page *page) { VM_BUG_ON_PAGE(!PageAnon(page) || !PageAnonExclusive(page), page); - /* See page_try_dup_anon_rmap(). */ - if (likely(!is_device_private_page(page) && - unlikely(page_maybe_dma_pinned(page)))) - return -EBUSY; + /* device private pages cannot get pinned via GUP. */ + if (unlikely(is_device_private_page(page))) { + ClearPageAnonExclusive(page); + return 0; + } + /* + * We have to make sure that when we clear PageAnonExclusive, that + * the page is not pinned and that concurrent GUP-fast won't succeed in + * concurrently pinning the page. + * + * Conceptually, PageAnonExclusive clearing consists of: + * (A1) Clear PTE + * (A2) Check if the page is pinned; back off if so. + * (A3) Clear PageAnonExclusive + * (A4) Restore PTE (optional, but certainly not writable) + * + * When clearing PageAnonExclusive, we cannot possibly map the page + * writable again, because anon pages that may be shared must never + * be writable. So in any case, if the PTE was writable it cannot + * be writable anymore afterwards and there would be a PTE change. Only + * if the PTE wasn't writable, there might not be a PTE change. + * + * Conceptually, GUP-fast pinning of an anon page consists of: + * (B1) Read the PTE + * (B2) FOLL_WRITE: check if the PTE is not writable; back off if so. + * (B3) Pin the mapped page + * (B4) Check if the PTE changed by re-reading it; back off if so. + * (B5) If the original PTE is not writable, check if + * PageAnonExclusive is not set; back off if so. + * + * If the PTE was writable, we only have to make sure that GUP-fast + * observes a PTE change and properly backs off. + * + * If the PTE was not writable, we have to make sure that GUP-fast either + * detects a (temporary) PTE change or that PageAnonExclusive is cleared + * and properly backs off. + * + * Consequently, when clearing PageAnonExclusive(), we have to make + * sure that (A1), (A2)/(A3) and (A4) happen in the right memory + * order. In GUP-fast pinning code, we have to make sure that (B3),(B4) + * and (B5) happen in the right memory order. + * + * We assume that there might not be a memory barrier after + * clearing/invalidating the PTE (A1) and before restoring the PTE (A4), + * so we use explicit ones here. + */ + + /* Paired with the memory barrier in try_grab_folio(). */ + if (IS_ENABLED(CONFIG_HAVE_FAST_GUP)) + smp_mb(); + + if (unlikely(page_maybe_dma_pinned(page))) + return -EBUSY; ClearPageAnonExclusive(page); + + /* + * This is conceptually a smp_wmb() paired with the smp_rmb() in + * gup_must_unshare(). + */ + if (IS_ENABLED(CONFIG_HAVE_FAST_GUP)) + smp_mb__after_atomic(); return 0; } -- cgit From 7bb5da0d490b2d836c5218f5186ee588d2145310 Mon Sep 17 00:00:00 2001 From: Valentin Schneider Date: Thu, 30 Jun 2022 23:32:57 +0100 Subject: kexec: turn all kexec_mutex acquisitions into trylocks Patch series "kexec, panic: Making crash_kexec() NMI safe", v4. This patch (of 2): Most acquistions of kexec_mutex are done via mutex_trylock() - those were a direct "translation" from: 8c5a1cf0ad3a ("kexec: use a mutex for locking rather than xchg()") there have however been two additions since then that use mutex_lock(): crash_get_memory_size() and crash_shrink_memory(). A later commit will replace said mutex with an atomic variable, and locking operations will become atomic_cmpxchg(). Rather than having those mutex_lock() become while (atomic_cmpxchg(&lock, 0, 1)), turn them into trylocks that can return -EBUSY on acquisition failure. This does halve the printable size of the crash kernel, but that's still neighbouring 2G for 32bit kernels which should be ample enough. Link: https://lkml.kernel.org/r/20220630223258.4144112-1-vschneid@redhat.com Link: https://lkml.kernel.org/r/20220630223258.4144112-2-vschneid@redhat.com Signed-off-by: Valentin Schneider Cc: Arnd Bergmann Cc: "Eric W . Biederman" Cc: Juri Lelli Cc: Luis Claudio R. Goncalves Cc: Miaohe Lin Cc: Petr Mladek Cc: Sebastian Andrzej Siewior Cc: Thomas Gleixner Cc: Baoquan He Signed-off-by: Andrew Morton --- include/linux/kexec.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 13e6c4b58f07..41a686996aaa 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -427,7 +427,7 @@ extern int kexec_load_disabled; extern bool kexec_in_progress; int crash_shrink_memory(unsigned long new_size); -size_t crash_get_memory_size(void); +ssize_t crash_get_memory_size(void); #ifndef arch_kexec_protect_crashkres /* -- cgit From 2be9880dc87342dc7ae459c9ea5c9ee2a45b33d8 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Fri, 19 Aug 2022 09:44:06 +0800 Subject: kernel: exit: cleanup release_thread() Only x86 has own release_thread(), introduce a new weak release_thread() function to clean empty definitions in other ARCHs. Link: https://lkml.kernel.org/r/20220819014406.32266-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Acked-by: Guo Ren [csky] Acked-by: Russell King (Oracle) Acked-by: Geert Uytterhoeven Acked-by: Brian Cain Acked-by: Michael Ellerman [powerpc] Acked-by: Stafford Horne [openrisc] Acked-by: Catalin Marinas [arm64] Acked-by: Huacai Chen [LoongArch] Cc: Alexander Gordeev Cc: Anton Ivanov Cc: Borislav Petkov Cc: Christian Borntraeger Cc: Christophe Leroy Cc: Chris Zankel Cc: Dave Hansen Cc: "David S. Miller" Cc: Dinh Nguyen Cc: Guo Ren [csky] Cc: Heiko Carstens Cc: Helge Deller Cc: Ingo Molnar Cc: Ivan Kokshaysky Cc: James Bottomley Cc: Johannes Berg Cc: Jonas Bonn Cc: Matt Turner Cc: Max Filippov Cc: Michal Simek Cc: Nicholas Piggin Cc: Palmer Dabbelt Cc: Paul Walmsley Cc: Richard Henderson Cc: Richard Weinberger Cc: Rich Felker Cc: Stefan Kristiansson Cc: Sven Schnelle Cc: Thomas Bogendoerfer Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vineet Gupta Cc: Will Deacon Cc: Xuerui Wang Cc: Yoshinori Sato Signed-off-by: Andrew Morton --- include/linux/sched/task.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sched/task.h b/include/linux/sched/task.h index 81cab4b01edc..d6c48163c6de 100644 --- a/include/linux/sched/task.h +++ b/include/linux/sched/task.h @@ -127,6 +127,9 @@ static inline void put_task_struct_many(struct task_struct *t, int nr) void put_task_struct_rcu_user(struct task_struct *task); +/* Free all architecture-specific resources held by a thread. */ +void release_thread(struct task_struct *dead_task); + #ifdef CONFIG_ARCH_WANTS_DYNAMIC_TASK_STRUCT extern int arch_task_struct_size __read_mostly; #else -- cgit From da3f52ba359590d8eb465ae0d8b6e11d6fd9432f Mon Sep 17 00:00:00 2001 From: Uros Bizjak Date: Sun, 21 Aug 2022 21:30:11 +0200 Subject: iversion: use atomic64_try_cmpxchg) Use atomic64_try_cmpxchg instead of atomic64_cmpxchg (*ptr, old, new) == old in inode_set_max_iversion_raw, inode_maybe_inc_version and inode_query_iversion. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails, enabling further code simplifications. The loop in inode_maybe_inc_iversion improves from: 5563: 48 89 ca mov %rcx,%rdx 5566: 48 89 c8 mov %rcx,%rax 5569: 48 83 e2 fe and $0xfffffffffffffffe,%rdx 556d: 48 83 c2 02 add $0x2,%rdx 5571: f0 48 0f b1 16 lock cmpxchg %rdx,(%rsi) 5576: 48 39 c1 cmp %rax,%rcx 5579: 0f 84 85 fc ff ff je 5204 <...> 557f: 48 89 c1 mov %rax,%rcx 5582: eb df jmp 5563 <...> to: 5563: 48 89 c2 mov %rax,%rdx 5566: 48 83 e2 fe and $0xfffffffffffffffe,%rdx 556a: 48 83 c2 02 add $0x2,%rdx 556e: f0 48 0f b1 11 lock cmpxchg %rdx,(%rcx) 5573: 0f 84 8b fc ff ff je 5204 <...> 5579: eb e8 jmp 5563 <...> Link: https://lkml.kernel.org/r/20220821193011.88208-1-ubizjak@gmail.com Signed-off-by: Uros Bizjak Signed-off-by: Andrew Morton --- include/linux/iversion.h | 32 +++++++++----------------------- 1 file changed, 9 insertions(+), 23 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iversion.h b/include/linux/iversion.h index 3bfebde5a1a6..eb5a15810169 100644 --- a/include/linux/iversion.h +++ b/include/linux/iversion.h @@ -123,17 +123,12 @@ inode_peek_iversion_raw(const struct inode *inode) static inline void inode_set_max_iversion_raw(struct inode *inode, u64 val) { - u64 cur, old; + u64 cur = inode_peek_iversion_raw(inode); - cur = inode_peek_iversion_raw(inode); - for (;;) { + do { if (cur > val) break; - old = atomic64_cmpxchg(&inode->i_version, cur, val); - if (likely(old == cur)) - break; - cur = old; - } + } while (!atomic64_try_cmpxchg(&inode->i_version, &cur, val)); } /** @@ -197,7 +192,7 @@ inode_set_iversion_queried(struct inode *inode, u64 val) static inline bool inode_maybe_inc_iversion(struct inode *inode, bool force) { - u64 cur, old, new; + u64 cur, new; /* * The i_version field is not strictly ordered with any other inode @@ -211,19 +206,14 @@ inode_maybe_inc_iversion(struct inode *inode, bool force) */ smp_mb(); cur = inode_peek_iversion_raw(inode); - for (;;) { + do { /* If flag is clear then we needn't do anything */ if (!force && !(cur & I_VERSION_QUERIED)) return false; /* Since lowest bit is flag, add 2 to avoid it */ new = (cur & ~I_VERSION_QUERIED) + I_VERSION_INCREMENT; - - old = atomic64_cmpxchg(&inode->i_version, cur, new); - if (likely(old == cur)) - break; - cur = old; - } + } while (!atomic64_try_cmpxchg(&inode->i_version, &cur, new)); return true; } @@ -304,10 +294,10 @@ inode_peek_iversion(const struct inode *inode) static inline u64 inode_query_iversion(struct inode *inode) { - u64 cur, old, new; + u64 cur, new; cur = inode_peek_iversion_raw(inode); - for (;;) { + do { /* If flag is already set, then no need to swap */ if (cur & I_VERSION_QUERIED) { /* @@ -320,11 +310,7 @@ inode_query_iversion(struct inode *inode) } new = cur | I_VERSION_QUERIED; - old = atomic64_cmpxchg(&inode->i_version, cur, new); - if (likely(old == cur)) - break; - cur = old; - } + } while (!atomic64_try_cmpxchg(&inode->i_version, &cur, new)); return cur >> I_VERSION_QUERIED_SHIFT; } -- cgit From e1d7c7609ae0933b59840390c1b207ac0a925c8b Mon Sep 17 00:00:00 2001 From: Uros Bizjak Date: Mon, 22 Aug 2022 16:38:51 +0200 Subject: bitops: use try_cmpxchg in set_mask_bits and bit_clear_unless Use try_cmpxchg instead of cmpxchg (*ptr, old, new) == old in set_mask_bits and bit_clear_unless. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after cmpxchg (and related move instruction in front of cmpxchg). Also, try_cmpxchg implicitly assigns old *ptr value to "old" when cmpxchg fails, enabling further code simplifications. Link: https://lkml.kernel.org/r/20220822143851.3290-1-ubizjak@gmail.com Signed-off-by: Uros Bizjak Signed-off-by: Andrew Morton --- include/linux/bitops.h | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bitops.h b/include/linux/bitops.h index 3b89c64bcfd8..fb24a513d917 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -328,10 +328,10 @@ static __always_inline void __assign_bit(long nr, volatile unsigned long *addr, const typeof(*(ptr)) mask__ = (mask), bits__ = (bits); \ typeof(*(ptr)) old__, new__; \ \ + old__ = READ_ONCE(*(ptr)); \ do { \ - old__ = READ_ONCE(*(ptr)); \ new__ = (old__ & ~mask__) | bits__; \ - } while (cmpxchg(ptr, old__, new__) != old__); \ + } while (!try_cmpxchg(ptr, &old__, new__)); \ \ old__; \ }) @@ -343,11 +343,12 @@ static __always_inline void __assign_bit(long nr, volatile unsigned long *addr, const typeof(*(ptr)) clear__ = (clear), test__ = (test);\ typeof(*(ptr)) old__, new__; \ \ + old__ = READ_ONCE(*(ptr)); \ do { \ - old__ = READ_ONCE(*(ptr)); \ + if (old__ & test__) \ + break; \ new__ = old__ & ~clear__; \ - } while (!(old__ & test__) && \ - cmpxchg(ptr, old__, new__) != old__); \ + } while (!try_cmpxchg(ptr, &old__, new__)); \ \ !(old__ & test__); \ }) -- cgit From 4acb83417cadfdcbe64215f9d0ddcf3132af808e Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Fri, 9 Sep 2022 11:40:22 -0700 Subject: sbitmap: fix batched wait_cnt accounting Batched completions can clear multiple bits, but we're only decrementing the wait_cnt by one each time. This can cause waiters to never be woken, stalling IO. Use the batched count instead. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215679 Signed-off-by: Keith Busch Link: https://lore.kernel.org/r/20220909184022.1709476-1-kbusch@fb.com Signed-off-by: Jens Axboe --- include/linux/sbitmap.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sbitmap.h b/include/linux/sbitmap.h index 8f5a86e210b9..4d2d5205ab58 100644 --- a/include/linux/sbitmap.h +++ b/include/linux/sbitmap.h @@ -575,8 +575,9 @@ void sbitmap_queue_wake_all(struct sbitmap_queue *sbq); * sbitmap_queue_wake_up() - Wake up some of waiters in one waitqueue * on a &struct sbitmap_queue. * @sbq: Bitmap queue to wake up. + * @nr: Number of bits cleared. */ -void sbitmap_queue_wake_up(struct sbitmap_queue *sbq); +void sbitmap_queue_wake_up(struct sbitmap_queue *sbq, int nr); /** * sbitmap_queue_show() - Dump &struct sbitmap_queue information to a &struct -- cgit From 320fb0f91e55ba248d4bad106b408e59099cfa89 Mon Sep 17 00:00:00 2001 From: Yu Kuai Date: Mon, 29 Aug 2022 10:22:37 +0800 Subject: blk-throttle: fix that io throttle can only work for single bio Test scripts: cd /sys/fs/cgroup/blkio/ echo "8:0 1024" > blkio.throttle.write_bps_device echo $$ > cgroup.procs dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct & dd if=/dev/zero of=/dev/sda bs=10k count=1 oflag=direct & Test result: 10240 bytes (10 kB, 10 KiB) copied, 10.0134 s, 1.0 kB/s 10240 bytes (10 kB, 10 KiB) copied, 10.0135 s, 1.0 kB/s The problem is that the second bio is finished after 10s instead of 20s. Root cause: 1) second bio will be flagged: __blk_throtl_bio while (true) { ... if (sq->nr_queued[rw]) -> some bio is throttled already break }; bio_set_flag(bio, BIO_THROTTLED); -> flag the bio 2) flagged bio will be dispatched without waiting: throtl_dispatch_tg tg_may_dispatch tg_with_in_bps_limit if (bps_limit == U64_MAX || bio_flagged(bio, BIO_THROTTLED)) *wait = 0; -> wait time is zero return true; commit 9f5ede3c01f9 ("block: throttle split bio in case of iops limit") support to count split bios for iops limit, thus it adds flagged bio checking in tg_with_in_bps_limit() so that split bios will only count once for bps limit, however, it introduce a new problem that io throttle won't work if multiple bios are throttled. In order to fix the problem, handle iops/bps limit in different ways: 1) for iops limit, there is no flag to record if the bio is throttled, and iops is always applied. 2) for bps limit, original bio will be flagged with BIO_BPS_THROTTLED, and io throttle will ignore bio with the flag. Noted this patch also remove the code to set flag in __bio_clone(), it's introduced in commit 111be8839817 ("block-throttle: avoid double charge"), and author thinks split bio can be resubmited and throttled again, which is wrong because split bio will continue to dispatch from caller. Fixes: 9f5ede3c01f9 ("block: throttle split bio in case of iops limit") Cc: Signed-off-by: Yu Kuai Acked-by: Tejun Heo Link: https://lore.kernel.org/r/20220829022240.3348319-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe --- include/linux/bio.h | 2 +- include/linux/blk_types.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bio.h b/include/linux/bio.h index ca22b06700a9..2c5806997bbf 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -509,7 +509,7 @@ static inline void bio_set_dev(struct bio *bio, struct block_device *bdev) { bio_clear_flag(bio, BIO_REMAPPED); if (bio->bi_bdev != bdev) - bio_clear_flag(bio, BIO_THROTTLED); + bio_clear_flag(bio, BIO_BPS_THROTTLED); bio->bi_bdev = bdev; bio_associate_blkg(bio); } diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 1ef99790f6ed..41afb4cfb9b0 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -325,7 +325,7 @@ enum { BIO_QUIET, /* Make BIO Quiet */ BIO_CHAIN, /* chained bio, ->bi_remaining in effect */ BIO_REFFED, /* bio has elevated ->bi_cnt */ - BIO_THROTTLED, /* This bio has already been subjected to + BIO_BPS_THROTTLED, /* This bio has already been subjected to * throttling rules. Don't do it again. */ BIO_TRACE_COMPLETION, /* bio_endio() should trace the final completion * of this bio. */ -- cgit From fd72cb1bb5c71d2738a17cbdee6280365a451706 Mon Sep 17 00:00:00 2001 From: Tomas Winkler Date: Thu, 8 Sep 2022 00:50:59 +0300 Subject: mei: add kdoc for struct mei_aux_device struct mei_aux_device is an interface structure requires proper documenation. Signed-off-by: Tomas Winkler Reviewed-by: Daniele Ceraolo Spurio Signed-off-by: Daniele Ceraolo Spurio Link: https://patchwork.freedesktop.org/patch/msgid/20220907215113.1596567-3-tomas.winkler@intel.com Acked-by: Greg Kroah-Hartman Signed-off-by: Joonas Lahtinen --- include/linux/mei_aux.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mei_aux.h b/include/linux/mei_aux.h index 587f25128848..a0cb587006d5 100644 --- a/include/linux/mei_aux.h +++ b/include/linux/mei_aux.h @@ -7,6 +7,12 @@ #include +/** + * struct mei_aux_device - mei auxiliary device + * @aux_dev: - auxiliary device object + * @irq: interrupt driving the mei auxiliary device + * @bar: mmio resource bar reserved to mei auxiliary device + */ struct mei_aux_device { struct auxiliary_device aux_dev; int irq; -- cgit From ed57967ab64f1dcdb9efb78a3f4a950dfdccf544 Mon Sep 17 00:00:00 2001 From: Tomas Winkler Date: Thu, 8 Sep 2022 00:51:00 +0300 Subject: mei: add slow_firmware flag to the mei auxiliary device Add slow_firmware flag to the mei auxiliary device info to inform the mei driver about slow underlying firmware. Such firmware will require to use larger operation timeouts. Signed-off-by: Alexander Usyskin Signed-off-by: Tomas Winkler Reviewed-by: Daniele Ceraolo Spurio Signed-off-by: Daniele Ceraolo Spurio Link: https://patchwork.freedesktop.org/patch/msgid/20220907215113.1596567-4-tomas.winkler@intel.com Acked-by: Greg Kroah-Hartman Signed-off-by: Joonas Lahtinen --- include/linux/mei_aux.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mei_aux.h b/include/linux/mei_aux.h index a0cb587006d5..4894d8bf4159 100644 --- a/include/linux/mei_aux.h +++ b/include/linux/mei_aux.h @@ -12,11 +12,14 @@ * @aux_dev: - auxiliary device object * @irq: interrupt driving the mei auxiliary device * @bar: mmio resource bar reserved to mei auxiliary device + * @slow_firmware: The device has slow underlying firmware. + * Such firmware will require to use larger operation timeouts. */ struct mei_aux_device { struct auxiliary_device aux_dev; int irq; struct resource bar; + bool slow_firmware; }; #define auxiliary_dev_to_mei_aux_dev(auxiliary_dev) \ -- cgit From 342e4c7e2d38e421f99898b629b5d82e5ae90323 Mon Sep 17 00:00:00 2001 From: Tomas Winkler Date: Thu, 8 Sep 2022 00:51:08 +0300 Subject: mei: gsc: setup gsc extended operational memory 1. Retrieve extended operational memory physical pointers from the auxiliary device info. 2. Setup memory registers. 3. Notify firmware that the memory is ready by sending the memory ready command. 4. Disable PXP device if GSC is not in PXP mode. CC: Daniele Ceraolo Spurio Signed-off-by: Tomas Winkler Signed-off-by: Alexander Usyskin Reviewed-by: Daniele Ceraolo Spurio Signed-off-by: Daniele Ceraolo Spurio Link: https://patchwork.freedesktop.org/patch/msgid/20220907215113.1596567-12-tomas.winkler@intel.com Acked-by: Greg Kroah-Hartman Signed-off-by: Joonas Lahtinen Acked-by: Greg Kroah-Hartman Signed-off-by: Joonas Lahtinen --- include/linux/mei_aux.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mei_aux.h b/include/linux/mei_aux.h index 4894d8bf4159..506912ad363b 100644 --- a/include/linux/mei_aux.h +++ b/include/linux/mei_aux.h @@ -12,6 +12,8 @@ * @aux_dev: - auxiliary device object * @irq: interrupt driving the mei auxiliary device * @bar: mmio resource bar reserved to mei auxiliary device + * @ext_op_mem: resource for extend operational memory + * used in graphics PXP mode. * @slow_firmware: The device has slow underlying firmware. * Such firmware will require to use larger operation timeouts. */ @@ -19,6 +21,7 @@ struct mei_aux_device { struct auxiliary_device aux_dev; int irq; struct resource bar; + struct resource ext_op_mem; bool slow_firmware; }; -- cgit From a47126ec29f538e1197862919f94d3b6668144a4 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Fri, 9 Sep 2022 15:24:57 -0500 Subject: PCI/PTM: Cache PTM Capability offset Cache the PTM Capability offset instead of searching for it every time we enable/disable PTM or save/restore PTM state. No functional change intended. Link: https://lore.kernel.org/r/20220909202505.314195-2-helgaas@kernel.org Tested-by: Rajvi Jingar Signed-off-by: Bjorn Helgaas Reviewed-by: Kuppuswamy Sathyanarayanan Reviewed-by: Mika Westerberg --- include/linux/pci.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/pci.h b/include/linux/pci.h index 060af91bafcd..54be939023a3 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -475,6 +475,7 @@ struct pci_dev { unsigned int broken_cmd_compl:1; /* No compl for some cmds */ #endif #ifdef CONFIG_PCIE_PTM + u16 ptm_cap; /* PTM Capability */ unsigned int ptm_root:1; unsigned int ptm_enabled:1; u8 ptm_granularity; -- cgit From e8bdc5ea481638e0a4fd5639050d2b170417f493 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Fri, 9 Sep 2022 15:25:00 -0500 Subject: PCI/PTM: Add pci_suspend_ptm() and pci_resume_ptm() We disable PTM during suspend because that allows some Root Ports to enter lower-power PM states, which means we also need to disable PTM for all downstream devices. Add pci_suspend_ptm() and pci_resume_ptm() for this purpose. pci_enable_ptm() and pci_disable_ptm() are for drivers to use to enable or disable PTM. They use dev->ptm_enabled to keep track of whether PTM should be enabled. pci_suspend_ptm() and pci_resume_ptm() are PCI core-internal functions to temporarily disable PTM during suspend and (depending on dev->ptm_enabled) re-enable PTM during resume. Enable/disable/suspend/resume all use internal __pci_enable_ptm() and __pci_disable_ptm() functions that only update the PTM Control register. Outline: pci_enable_ptm(struct pci_dev *dev) { __pci_enable_ptm(dev); dev->ptm_enabled = 1; pci_ptm_info(dev); } pci_disable_ptm(struct pci_dev *dev) { if (dev->ptm_enabled) { __pci_disable_ptm(dev); dev->ptm_enabled = 0; } } pci_suspend_ptm(struct pci_dev *dev) { if (dev->ptm_enabled) __pci_disable_ptm(dev); } pci_resume_ptm(struct pci_dev *dev) { if (dev->ptm_enabled) __pci_enable_ptm(dev); } Nothing currently calls pci_resume_ptm(); the suspend path saves the PTM state before disabling PTM, so the PTM state restore in the resume path implicitly re-enables it. A future change will use pci_resume_ptm() to fix some problems with this approach. Link: https://lore.kernel.org/r/20220909202505.314195-5-helgaas@kernel.org Tested-by: Rajvi Jingar Signed-off-by: Bjorn Helgaas Reviewed-by: Mika Westerberg --- include/linux/pci.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pci.h b/include/linux/pci.h index 54be939023a3..cb5f796e3319 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1678,10 +1678,12 @@ bool pci_ats_disabled(void); #ifdef CONFIG_PCIE_PTM int pci_enable_ptm(struct pci_dev *dev, u8 *granularity); +void pci_disable_ptm(struct pci_dev *dev); bool pcie_ptm_enabled(struct pci_dev *dev); #else static inline int pci_enable_ptm(struct pci_dev *dev, u8 *granularity) { return -EINVAL; } +static inline void pci_disable_ptm(struct pci_dev *dev) { } static inline bool pcie_ptm_enabled(struct pci_dev *dev) { return false; } #endif -- cgit From 4b9d1bc7911c9d9159c4881455c584cde99fbb19 Mon Sep 17 00:00:00 2001 From: Jiangshan Yi Date: Tue, 6 Sep 2022 13:49:01 +0800 Subject: Input: hp_sdc: fix spelling typo in comment Fix spelling typo in comment. Reported-by: k2ci Signed-off-by: Jiangshan Yi Signed-off-by: Helge Deller --- include/linux/hp_sdc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hp_sdc.h b/include/linux/hp_sdc.h index 6f1dee7e67e0..9be8704e2d38 100644 --- a/include/linux/hp_sdc.h +++ b/include/linux/hp_sdc.h @@ -180,7 +180,7 @@ switch (val) { \ #define HP_SDC_CMD_SET_IM 0x40 /* 010xxxxx == set irq mask */ -/* The documents provided do not explicitly state that all registers betweem +/* The documents provided do not explicitly state that all registers between * 0x01 and 0x1f inclusive can be read by sending their register index as a * command, but this is implied and appears to be the case. */ -- cgit From 6f7a71f804287a7566314ab1a73d8ca2c18ca0d7 Mon Sep 17 00:00:00 2001 From: AngeloGioacchino Del Regno Date: Tue, 13 Sep 2022 14:34:54 +0200 Subject: regulator: Add driver for MT6331 PMIC regulators Add a driver for the regulators found in the MT6331 PMIC. This PMIC features six buck and 21 Low DropOut (LDO) regulators. Signed-off-by: AngeloGioacchino Del Regno Link: https://lore.kernel.org/r/20220913123456.384513-3-angelogioacchino.delregno@collabora.com Signed-off-by: Mark Brown --- include/linux/regulator/mt6331-regulator.h | 46 ++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) create mode 100644 include/linux/regulator/mt6331-regulator.h (limited to 'include/linux') diff --git a/include/linux/regulator/mt6331-regulator.h b/include/linux/regulator/mt6331-regulator.h new file mode 100644 index 000000000000..2801a9879c14 --- /dev/null +++ b/include/linux/regulator/mt6331-regulator.h @@ -0,0 +1,46 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2022 Collabora Ltd. + * Author: AngeloGioacchino Del Regno + */ + +#ifndef __LINUX_REGULATOR_MT6331_H +#define __LINUX_REGULATOR_MT6331_H + +enum { + /* BUCK */ + MT6331_ID_VDVFS11 = 0, + MT6331_ID_VDVFS12, + MT6331_ID_VDVFS13, + MT6331_ID_VDVFS14, + MT6331_ID_VCORE2, + MT6331_ID_VIO18, + /* LDO */ + MT6331_ID_VTCXO1, + MT6331_ID_VTCXO2, + MT6331_ID_AVDD32_AUD, + MT6331_ID_VAUXA32, + MT6331_ID_VCAMA, + MT6331_ID_VIO28, + MT6331_ID_VCAM_AF, + MT6331_ID_VMC, + MT6331_ID_VMCH, + MT6331_ID_VEMC33, + MT6331_ID_VGP1, + MT6331_ID_VSIM1, + MT6331_ID_VSIM2, + MT6331_ID_VMIPI, + MT6331_ID_VIBR, + MT6331_ID_VGP4, + MT6331_ID_VCAMD, + MT6331_ID_VUSB10, + MT6331_ID_VCAM_IO, + MT6331_ID_VSRAM_DVFS1, + MT6331_ID_VGP2, + MT6331_ID_VGP3, + MT6331_ID_VRTC, + MT6331_ID_VDIG18, + MT6331_ID_VREG_MAX +}; + +#endif /* __LINUX_REGULATOR_MT6331_H */ -- cgit From 1cc5a52e873a4f9725eafe5aa9cd213b7b58e29e Mon Sep 17 00:00:00 2001 From: AngeloGioacchino Del Regno Date: Tue, 13 Sep 2022 14:34:56 +0200 Subject: regulator: Add driver for MT6332 PMIC regulators Add a driver for the regulators found in the MT6332 PMICs, including six buck and four LDO regulators. Signed-off-by: AngeloGioacchino Del Regno Link: https://lore.kernel.org/r/20220913123456.384513-5-angelogioacchino.delregno@collabora.com Signed-off-by: Mark Brown --- include/linux/regulator/mt6332-regulator.h | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) create mode 100644 include/linux/regulator/mt6332-regulator.h (limited to 'include/linux') diff --git a/include/linux/regulator/mt6332-regulator.h b/include/linux/regulator/mt6332-regulator.h new file mode 100644 index 000000000000..af5e3ed31029 --- /dev/null +++ b/include/linux/regulator/mt6332-regulator.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2022 Collabora Ltd. + * Author: AngeloGioacchino Del Regno + */ + +#ifndef __LINUX_REGULATOR_MT6332_H +#define __LINUX_REGULATOR_MT6332_H + +enum { + /* BUCK */ + MT6332_ID_VDRAM = 0, + MT6332_ID_VDVFS2, + MT6332_ID_VPA, + MT6332_ID_VRF1, + MT6332_ID_VRF2, + MT6332_ID_VSBST, + /* LDO */ + MT6332_ID_VAUXB32, + MT6332_ID_VBIF28, + MT6332_ID_VDIG18, + MT6332_ID_VSRAM_DVFS2, + MT6332_ID_VUSB33, + MT6332_ID_VREG_MAX +}; + +#endif /* __LINUX_REGULATOR_MT6332_H */ -- cgit From 1dd611a9c55f6657328429b988e302323691c3dc Mon Sep 17 00:00:00 2001 From: Heiner Kallweit Date: Fri, 19 Aug 2022 23:26:17 +0200 Subject: mmc: core: Switch to basic workqueue API for sdio_irq_work The delay parameter isn't set by any user, therefore simplify the code and switch to the basic workqueue API w/o delay support. This also reduces the size of struct mmc_host. Signed-off-by: Heiner Kallweit Link: https://lore.kernel.org/r/13d8200a-e2a8-d907-38ce-a16fc5ce14aa@gmail.com Signed-off-by: Ulf Hansson --- include/linux/mmc/host.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h index eb8bc5b9b0b7..8fdd3cf971a3 100644 --- a/include/linux/mmc/host.h +++ b/include/linux/mmc/host.h @@ -476,7 +476,7 @@ struct mmc_host { unsigned int sdio_irqs; struct task_struct *sdio_irq_thread; - struct delayed_work sdio_irq_work; + struct work_struct sdio_irq_work; bool sdio_irq_pending; atomic_t sdio_irq_thread_abort; -- cgit From 96a7457a14d9cf98cf58de1e26c03180e0f28141 Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp Date: Mon, 12 Sep 2022 19:07:19 +0200 Subject: can: skb: unify skb CAN frame identification helpers Replace open coded checks for sk_buffs containing Classical CAN and CAN FD frame structures as a preparation for CAN XL support. With the added length check the unintended processing of CAN XL frames having the CANXL_XLF bit set can be suppressed even when the skb->len fits to non CAN XL frames. The CAN_RAW socket needs a rework to use these helpers. Therefore the use of these helpers is postponed to the CAN_RAW CAN XL integration. The J1939 protocol gets a check for Classical CAN frames too. Acked-by: Vincent Mailhol Signed-off-by: Oliver Hartkopp Link: https://lore.kernel.org/all/20220912170725.120748-2-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde --- include/linux/can/skb.h | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/can/skb.h b/include/linux/can/skb.h index 182749e858b3..27ebfc28510f 100644 --- a/include/linux/can/skb.h +++ b/include/linux/can/skb.h @@ -97,10 +97,20 @@ static inline struct sk_buff *can_create_echo_skb(struct sk_buff *skb) return nskb; } +static inline bool can_is_can_skb(const struct sk_buff *skb) +{ + struct can_frame *cf = (struct can_frame *)skb->data; + + /* the CAN specific type of skb is identified by its data length */ + return (skb->len == CAN_MTU && cf->len <= CAN_MAX_DLEN); +} + static inline bool can_is_canfd_skb(const struct sk_buff *skb) { + struct canfd_frame *cfd = (struct canfd_frame *)skb->data; + /* the CAN specific type of skb is identified by its data length */ - return skb->len == CANFD_MTU; + return (skb->len == CANFD_MTU && cfd->len <= CANFD_MAX_DLEN); } #endif /* !_CAN_SKB_H */ -- cgit From 467ef4c7b9d1c22ee64342804bf92549d765df14 Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp Date: Mon, 12 Sep 2022 19:07:20 +0200 Subject: can: skb: add skb CAN frame data length helpers Add two helpers to retrieve the data length from CAN sk_buffs and prepare the length information to be a uint16 value for the CAN XL support. Acked-by: Vincent Mailhol Signed-off-by: Oliver Hartkopp Link: https://lore.kernel.org/all/20220912170725.120748-3-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde --- include/linux/can/skb.h | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/can/skb.h b/include/linux/can/skb.h index 27ebfc28510f..ddffc2fc008c 100644 --- a/include/linux/can/skb.h +++ b/include/linux/can/skb.h @@ -20,7 +20,8 @@ void can_flush_echo_skb(struct net_device *dev); int can_put_echo_skb(struct sk_buff *skb, struct net_device *dev, unsigned int idx, unsigned int frame_len); struct sk_buff *__can_get_echo_skb(struct net_device *dev, unsigned int idx, - u8 *len_ptr, unsigned int *frame_len_ptr); + unsigned int *len_ptr, + unsigned int *frame_len_ptr); unsigned int __must_check can_get_echo_skb(struct net_device *dev, unsigned int idx, unsigned int *frame_len_ptr); @@ -113,4 +114,25 @@ static inline bool can_is_canfd_skb(const struct sk_buff *skb) return (skb->len == CANFD_MTU && cfd->len <= CANFD_MAX_DLEN); } +/* get length element value from can[fd]_frame structure */ +static inline unsigned int can_skb_get_len_val(struct sk_buff *skb) +{ + const struct canfd_frame *cfd = (struct canfd_frame *)skb->data; + + return cfd->len; +} + +/* get needed data length inside CAN frame for all frame types (RTR aware) */ +static inline unsigned int can_skb_get_data_len(struct sk_buff *skb) +{ + unsigned int len = can_skb_get_len_val(skb); + const struct can_frame *cf = (struct can_frame *)skb->data; + + /* RTR frames have an actual length of zero */ + if (can_is_can_skb(skb) && cf->can_id & CAN_RTR_FLAG) + return 0; + + return len; +} + #endif /* !_CAN_SKB_H */ -- cgit From fb08cba12b52cba4366e858932307649dc5304e2 Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp Date: Mon, 12 Sep 2022 19:07:23 +0200 Subject: can: canxl: update CAN infrastructure for CAN XL frames - add new ETH_P_CANXL ethernet protocol type - update skb checks for CAN XL - add alloc_canxl_skb() which now needs a data length parameter - introduce init_can_skb_reserve() to reduce code duplication Acked-by: Vincent Mailhol Signed-off-by: Oliver Hartkopp Link: https://lore.kernel.org/all/20220912170725.120748-6-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde --- include/linux/can/skb.h | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/can/skb.h b/include/linux/can/skb.h index ddffc2fc008c..1abc25a8d144 100644 --- a/include/linux/can/skb.h +++ b/include/linux/can/skb.h @@ -30,6 +30,9 @@ void can_free_echo_skb(struct net_device *dev, unsigned int idx, struct sk_buff *alloc_can_skb(struct net_device *dev, struct can_frame **cf); struct sk_buff *alloc_canfd_skb(struct net_device *dev, struct canfd_frame **cfd); +struct sk_buff *alloc_canxl_skb(struct net_device *dev, + struct canxl_frame **cxl, + unsigned int data_len); struct sk_buff *alloc_can_err_skb(struct net_device *dev, struct can_frame **cf); bool can_dropped_invalid_skb(struct net_device *dev, struct sk_buff *skb); @@ -114,11 +117,29 @@ static inline bool can_is_canfd_skb(const struct sk_buff *skb) return (skb->len == CANFD_MTU && cfd->len <= CANFD_MAX_DLEN); } -/* get length element value from can[fd]_frame structure */ +static inline bool can_is_canxl_skb(const struct sk_buff *skb) +{ + const struct canxl_frame *cxl = (struct canxl_frame *)skb->data; + + if (skb->len < CANXL_HDR_SIZE + CANXL_MIN_DLEN || skb->len > CANXL_MTU) + return false; + + /* this also checks valid CAN XL data length boundaries */ + if (skb->len != CANXL_HDR_SIZE + cxl->len) + return false; + + return cxl->flags & CANXL_XLF; +} + +/* get length element value from can[|fd|xl]_frame structure */ static inline unsigned int can_skb_get_len_val(struct sk_buff *skb) { + const struct canxl_frame *cxl = (struct canxl_frame *)skb->data; const struct canfd_frame *cfd = (struct canfd_frame *)skb->data; + if (can_is_canxl_skb(skb)) + return cxl->len; + return cfd->len; } -- cgit From ebf87fc728502244550eaf8819fc785e2014b2ad Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp Date: Mon, 12 Sep 2022 19:07:24 +0200 Subject: can: dev: add CAN XL support to virtual CAN Make use of new can_skb_get_data_len() helper. Add support for variable CANXL MTU using the new can_is_canxl_dev_mtu(). Acked-by: Vincent Mailhol Signed-off-by: Oliver Hartkopp Link: https://lore.kernel.org/all/20220912170725.120748-7-socketcan@hartkopp.net Signed-off-by: Marc Kleine-Budde --- include/linux/can/dev.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h index c3e50e537e39..58f5431a5559 100644 --- a/include/linux/can/dev.h +++ b/include/linux/can/dev.h @@ -147,6 +147,11 @@ static inline u32 can_get_static_ctrlmode(struct can_priv *priv) return priv->ctrlmode & ~priv->ctrlmode_supported; } +static inline bool can_is_canxl_dev_mtu(unsigned int mtu) +{ + return (mtu >= CANXL_MIN_MTU && mtu <= CANXL_MAX_MTU); +} + void can_setup(struct net_device *dev); struct net_device *alloc_candev_mqs(int sizeof_priv, unsigned int echo_skb_max, -- cgit From 85ebe0afd3f8377a9b506321314f4b8344caa8ec Mon Sep 17 00:00:00 2001 From: William Breathitt Gray Date: Thu, 18 Aug 2022 12:28:10 -0400 Subject: isa: Introduce the module_isa_driver_with_irq helper macro Several ISA drivers feature IRQ support that can configured via an "irq" array module parameter. This array typically matches directly with the respective "base" array module parameter. To reduce code repetition, a module_isa_driver_with_irq helper macro is introduced to provide a check ensuring that the number of "irq" passed to the module matches with the respective number of "base". Signed-off-by: William Breathitt Gray Acked-by: William Breathitt Gray Signed-off-by: Bartosz Golaszewski --- include/linux/isa.h | 52 ++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 42 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/isa.h b/include/linux/isa.h index e30963190968..4fbbf5e36e08 100644 --- a/include/linux/isa.h +++ b/include/linux/isa.h @@ -38,6 +38,32 @@ static inline void isa_unregister_driver(struct isa_driver *d) } #endif +#define module_isa_driver_init(__isa_driver, __num_isa_dev) \ +static int __init __isa_driver##_init(void) \ +{ \ + return isa_register_driver(&(__isa_driver), __num_isa_dev); \ +} \ +module_init(__isa_driver##_init) + +#define module_isa_driver_with_irq_init(__isa_driver, __num_isa_dev, __num_irq) \ +static int __init __isa_driver##_init(void) \ +{ \ + if (__num_irq != __num_isa_dev) { \ + pr_err("%s: Number of irq (%u) does not match number of base (%u)\n", \ + __isa_driver.driver.name, __num_irq, __num_isa_dev); \ + return -EINVAL; \ + } \ + return isa_register_driver(&(__isa_driver), __num_isa_dev); \ +} \ +module_init(__isa_driver##_init) + +#define module_isa_driver_exit(__isa_driver) \ +static void __exit __isa_driver##_exit(void) \ +{ \ + isa_unregister_driver(&(__isa_driver)); \ +} \ +module_exit(__isa_driver##_exit) + /** * module_isa_driver() - Helper macro for registering a ISA driver * @__isa_driver: isa_driver struct @@ -48,16 +74,22 @@ static inline void isa_unregister_driver(struct isa_driver *d) * use this macro once, and calling it replaces module_init and module_exit. */ #define module_isa_driver(__isa_driver, __num_isa_dev) \ -static int __init __isa_driver##_init(void) \ -{ \ - return isa_register_driver(&(__isa_driver), __num_isa_dev); \ -} \ -module_init(__isa_driver##_init); \ -static void __exit __isa_driver##_exit(void) \ -{ \ - isa_unregister_driver(&(__isa_driver)); \ -} \ -module_exit(__isa_driver##_exit); +module_isa_driver_init(__isa_driver, __num_isa_dev); \ +module_isa_driver_exit(__isa_driver) + +/** + * module_isa_driver_with_irq() - Helper macro for registering an ISA driver with irq + * @__isa_driver: isa_driver struct + * @__num_isa_dev: number of devices to register + * @__num_irq: number of IRQ to register + * + * Helper macro for ISA drivers with irq that do not do anything special in + * module init/exit. Each module may only use this macro once, and calling it + * replaces module_init and module_exit. + */ +#define module_isa_driver_with_irq(__isa_driver, __num_isa_dev, __num_irq) \ +module_isa_driver_with_irq_init(__isa_driver, __num_isa_dev, __num_irq); \ +module_isa_driver_exit(__isa_driver) /** * max_num_isa_dev() - Maximum possible number registered of an ISA device -- cgit From 47e34cb74d376ddfeaef94abb1d6dfb3c905ee51 Mon Sep 17 00:00:00 2001 From: Dave Marchevsky Date: Mon, 12 Sep 2022 08:45:44 -0700 Subject: bpf: Add verifier check for BPF_PTR_POISON retval and arg BPF_PTR_POISON was added in commit c0a5a21c25f37 ("bpf: Allow storing referenced kptr in map") to denote a bpf_func_proto btf_id which the verifier will replace with a dynamically-determined btf_id at verification time. This patch adds verifier 'poison' functionality to BPF_PTR_POISON in order to prepare for expanded use of the value to poison ret- and arg-btf_id in ongoing work, namely rbtree and linked list patchsets [0, 1]. Specifically, when the verifier checks helper calls, it assumes that BPF_PTR_POISON'ed ret type will be replaced with a valid type before - or in lieu of - the default ret_btf_id logic. Similarly for arg btf_id. If poisoned btf_id reaches default handling block for either, consider this a verifier internal error and fail verification. Otherwise a helper w/ poisoned btf_id but no verifier logic replacing the type will cause a crash as the invalid pointer is dereferenced. Also move BPF_PTR_POISON to existing include/linux/posion.h header and remove unnecessary shift. [0]: lore.kernel.org/bpf/20220830172759.4069786-1-davemarchevsky@fb.com [1]: lore.kernel.org/bpf/20220904204145.3089-1-memxor@gmail.com Signed-off-by: Dave Marchevsky Acked-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220912154544.1398199-1-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov --- include/linux/poison.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/poison.h b/include/linux/poison.h index d62ef5a6b4e9..2d3249eb0e62 100644 --- a/include/linux/poison.h +++ b/include/linux/poison.h @@ -81,4 +81,7 @@ /********** net/core/page_pool.c **********/ #define PP_SIGNATURE (0x40 + POISON_POINTER_DELTA) +/********** kernel/bpf/ **********/ +#define BPF_PTR_POISON ((void *)(0xeB9FUL + POISON_POINTER_DELTA)) + #endif -- cgit From 2747b93ebbede2af2d7bb088b9ddae3193ceede8 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Thu, 25 Aug 2022 18:08:56 +0200 Subject: locking: Detect includes rwlock.h outside of spinlock.h From: Michael S. Tsirkin The check for __LINUX_SPINLOCK_H within rwlock.h (and other files) detects the direct include of the header file if it is at the very beginning of the include section. If it is listed later then chances are high that spinlock.h was already included (including rwlock.h) and the additional listing of rwlock.h will not cause any failure. On PREEMPT_RT this additional rwlock.h will lead to compile failures since it uses a different rwlock implementation. Add __LINUX_INSIDE_SPINLOCK_H to spinlock.h and check for this instead of __LINUX_SPINLOCK_H to detect wrong includes. This will help detect direct includes of rwlock.h with without PREEMPT_RT enabled. [ bigeasy: add remaining __LINUX_SPINLOCK_H user and rewrite commit description. ] Signed-off-by: Michael S. Tsirkin Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/YweemHxJx7O8rjBx@linutronix.de --- include/linux/rwlock.h | 2 +- include/linux/spinlock.h | 2 ++ include/linux/spinlock_api_smp.h | 2 +- include/linux/spinlock_api_up.h | 2 +- include/linux/spinlock_rt.h | 2 +- include/linux/spinlock_up.h | 2 +- 6 files changed, 7 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rwlock.h b/include/linux/rwlock.h index 8f416c5e929e..c0ef596f340b 100644 --- a/include/linux/rwlock.h +++ b/include/linux/rwlock.h @@ -1,7 +1,7 @@ #ifndef __LINUX_RWLOCK_H #define __LINUX_RWLOCK_H -#ifndef __LINUX_SPINLOCK_H +#ifndef __LINUX_INSIDE_SPINLOCK_H # error "please don't include this file directly" #endif diff --git a/include/linux/spinlock.h b/include/linux/spinlock.h index 5c0c5174155d..1341f7d62da4 100644 --- a/include/linux/spinlock.h +++ b/include/linux/spinlock.h @@ -1,6 +1,7 @@ /* SPDX-License-Identifier: GPL-2.0 */ #ifndef __LINUX_SPINLOCK_H #define __LINUX_SPINLOCK_H +#define __LINUX_INSIDE_SPINLOCK_H /* * include/linux/spinlock.h - generic spinlock/rwlock declarations @@ -492,4 +493,5 @@ int __alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *lock_mask, void free_bucket_spinlocks(spinlock_t *locks); +#undef __LINUX_INSIDE_SPINLOCK_H #endif /* __LINUX_SPINLOCK_H */ diff --git a/include/linux/spinlock_api_smp.h b/include/linux/spinlock_api_smp.h index 51fa0dab68c4..89eb6f4c659c 100644 --- a/include/linux/spinlock_api_smp.h +++ b/include/linux/spinlock_api_smp.h @@ -1,7 +1,7 @@ #ifndef __LINUX_SPINLOCK_API_SMP_H #define __LINUX_SPINLOCK_API_SMP_H -#ifndef __LINUX_SPINLOCK_H +#ifndef __LINUX_INSIDE_SPINLOCK_H # error "please don't include this file directly" #endif diff --git a/include/linux/spinlock_api_up.h b/include/linux/spinlock_api_up.h index b8ba00ccccde..819aeba1c87e 100644 --- a/include/linux/spinlock_api_up.h +++ b/include/linux/spinlock_api_up.h @@ -1,7 +1,7 @@ #ifndef __LINUX_SPINLOCK_API_UP_H #define __LINUX_SPINLOCK_API_UP_H -#ifndef __LINUX_SPINLOCK_H +#ifndef __LINUX_INSIDE_SPINLOCK_H # error "please don't include this file directly" #endif diff --git a/include/linux/spinlock_rt.h b/include/linux/spinlock_rt.h index 835aedaf68ac..61c49b16f69a 100644 --- a/include/linux/spinlock_rt.h +++ b/include/linux/spinlock_rt.h @@ -2,7 +2,7 @@ #ifndef __LINUX_SPINLOCK_RT_H #define __LINUX_SPINLOCK_RT_H -#ifndef __LINUX_SPINLOCK_H +#ifndef __LINUX_INSIDE_SPINLOCK_H #error Do not include directly. Use spinlock.h #endif diff --git a/include/linux/spinlock_up.h b/include/linux/spinlock_up.h index 16521074b6f7..c87204247592 100644 --- a/include/linux/spinlock_up.h +++ b/include/linux/spinlock_up.h @@ -1,7 +1,7 @@ #ifndef __LINUX_SPINLOCK_UP_H #define __LINUX_SPINLOCK_UP_H -#ifndef __LINUX_SPINLOCK_H +#ifndef __LINUX_INSIDE_SPINLOCK_H # error "please don't include this file directly" #endif -- cgit From f24a0b1c22c2e90abb4ee1f7a3b0f0d8fc2ede5f Mon Sep 17 00:00:00 2001 From: Maxime Ripard Date: Tue, 16 Aug 2022 13:25:09 +0200 Subject: clk: Mention that .recalc_rate can return 0 on error Multiple platforms (amlogic, imx8) return 0 when the clock rate cannot be determined properly by the recalc_rate hook. Mention in the documentation that the framework is ok with that. Signed-off-by: Maxime Ripard Link: https://lore.kernel.org/r/20220816112530.1837489-5-maxime@cerno.tech Tested-by: Linux Kernel Functional Testing Tested-by: Naresh Kamboju Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index 1615010aa0ec..9a14cfa0d201 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -118,8 +118,9 @@ struct clk_duty { * * @recalc_rate Recalculate the rate of this clock, by querying hardware. The * parent rate is an input parameter. It is up to the caller to - * ensure that the prepare_mutex is held across this call. - * Returns the calculated rate. Optional, but recommended - if + * ensure that the prepare_mutex is held across this call. If the + * driver cannot figure out a rate for this clock, it must return + * 0. Returns the calculated rate. Optional, but recommended - if * this op is not set then clock rate will be initialized to 0. * * @round_rate: Given a target rate as input, returns the closest rate actually -- cgit From c35e84b0977617f96fb1dbef3fb8d71a861325c0 Mon Sep 17 00:00:00 2001 From: Maxime Ripard Date: Tue, 16 Aug 2022 13:25:21 +0200 Subject: clk: Introduce clk_hw_init_rate_request() clk-divider instantiates clk_rate_request internally for its round_rate implementations to share the code with its determine_rate implementations. However, it's missing a few fields (min_rate, max_rate) that would be initialized properly if it was using clk_core_init_rate_req(). Let's create the clk_hw_init_rate_request() function for clock providers to be able to share the code to instation clk_rate_requests with the framework. This will also be useful for some tests introduced in later patches. Tested-by: Alexander Stein # imx8mp Tested-by: Marek Szyprowski # exynos4210, meson g12b Signed-off-by: Maxime Ripard Link: https://lore.kernel.org/r/20220816112530.1837489-17-maxime@cerno.tech Tested-by: Linux Kernel Functional Testing Tested-by: Naresh Kamboju Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index 9a14cfa0d201..d857717aa386 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -42,6 +42,8 @@ struct dentry; * struct clk_rate_request - Structure encoding the clk constraints that * a clock user might require. * + * Should be initialized by calling clk_hw_init_rate_request(). + * * @rate: Requested clock rate. This field will be adjusted by * clock drivers according to hardware capabilities. * @min_rate: Minimum rate imposed by clk users. @@ -60,6 +62,10 @@ struct clk_rate_request { struct clk_hw *best_parent_hw; }; +void clk_hw_init_rate_request(const struct clk_hw *hw, + struct clk_rate_request *req, + unsigned long rate); + /** * struct clk_duty - Struture encoding the duty cycle ratio of a clock * -- cgit From 22fb0e284fbc3c1b85d24c5a1df8ea3ac82ab1d1 Mon Sep 17 00:00:00 2001 From: Maxime Ripard Date: Tue, 16 Aug 2022 13:25:25 +0200 Subject: clk: Constify clk_has_parent() clk_has_parent() doesn't modify the clocks being passed, so let's make it const. Suggested-by: Stephen Boyd Signed-off-by: Maxime Ripard Link: https://lore.kernel.org/r/20220816112530.1837489-21-maxime@cerno.tech Tested-by: Linux Kernel Functional Testing Tested-by: Naresh Kamboju Signed-off-by: Stephen Boyd --- include/linux/clk.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/clk.h b/include/linux/clk.h index c13061cabdfc..1ef013324237 100644 --- a/include/linux/clk.h +++ b/include/linux/clk.h @@ -799,7 +799,7 @@ int clk_set_rate_exclusive(struct clk *clk, unsigned long rate); * * Returns true if @parent is a possible parent for @clk, false otherwise. */ -bool clk_has_parent(struct clk *clk, struct clk *parent); +bool clk_has_parent(const struct clk *clk, const struct clk *parent); /** * clk_set_rate_range - set a rate range for a clock source -- cgit From 262ca38f4b6eb418b20b8e1d6d8495c6a98727c1 Mon Sep 17 00:00:00 2001 From: Maxime Ripard Date: Tue, 16 Aug 2022 13:25:26 +0200 Subject: clk: Stop forwarding clk_rate_requests to the parent If the clock cannot modify its rate and has CLK_SET_RATE_PARENT, clk_mux_determine_rate_flags(), clk_core_round_rate_nolock() and a number of drivers will forward the clk_rate_request to the parent clock. clk_core_round_rate_nolock() will pass the pointer directly, which means that we pass a clk_rate_request to the parent that has the rate, min_rate and max_rate of the child, and the best_parent_rate and best_parent_hw fields will be relative to the child as well, so will point to our current clock and its rate. The most common case for CLK_SET_RATE_PARENT is that the child and parent clock rates will be equal, so the rate field isn't a worry, but the other fields are. Similarly, if the parent clock driver ever modifies the best_parent_rate or best_parent_hw, this will be applied to the child once the call to clk_core_round_rate_nolock() is done. best_parent_hw is probably not going to be a valid parent, and best_parent_rate might lead to a parent rate change different to the one that was initially computed. clk_mux_determine_rate_flags() and the affected drivers will copy the request before forwarding it to the parents, so they won't be affected by the latter issue, but the former is still going to be there and will lead to erroneous data and context being passed to the various clock drivers in the same sub-tree. Let's create two new functions, clk_core_forward_rate_req() and clk_hw_forward_rate_request() for the framework and the clock providers that will copy a request from a child clock and update the context to match the parent's. We also update the relevant call sites in the framework and drivers to use that new function. Let's also add a test to make sure we avoid regressions there. Tested-by: Alexander Stein # imx8mp Tested-by: Marek Szyprowski # exynos4210, meson g12b Signed-off-by: Maxime Ripard Link: https://lore.kernel.org/r/20220816112530.1837489-22-maxime@cerno.tech Tested-by: Linux Kernel Functional Testing Tested-by: Naresh Kamboju Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index d857717aa386..8bce6c524f29 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -65,6 +65,11 @@ struct clk_rate_request { void clk_hw_init_rate_request(const struct clk_hw *hw, struct clk_rate_request *req, unsigned long rate); +void clk_hw_forward_rate_request(const struct clk_hw *core, + const struct clk_rate_request *old_req, + const struct clk_hw *parent, + struct clk_rate_request *req, + unsigned long parent_rate); /** * struct clk_duty - Struture encoding the duty cycle ratio of a clock -- cgit From 253993253466ba7187730b196174146d5247e97b Mon Sep 17 00:00:00 2001 From: Maxime Ripard Date: Tue, 16 Aug 2022 13:25:28 +0200 Subject: clk: Introduce the clk_hw_get_rate_range function Some clock providers are hand-crafting their clk_rate_request, and need to figure out the current boundaries of their clk_hw to fill it properly. Let's create such a function for clock providers. Signed-off-by: Maxime Ripard Link: https://lore.kernel.org/r/20220816112530.1837489-24-maxime@cerno.tech Tested-by: Linux Kernel Functional Testing Tested-by: Naresh Kamboju Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index 8bce6c524f29..8724a3547a79 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -1267,6 +1267,8 @@ int clk_mux_determine_rate_flags(struct clk_hw *hw, struct clk_rate_request *req, unsigned long flags); void clk_hw_reparent(struct clk_hw *hw, struct clk_hw *new_parent); +void clk_hw_get_rate_range(struct clk_hw *hw, unsigned long *min_rate, + unsigned long *max_rate); void clk_hw_set_rate_range(struct clk_hw *hw, unsigned long min_rate, unsigned long max_rate); -- cgit From b404cb45990bf24d41c29fe856aafb0746a7b81f Mon Sep 17 00:00:00 2001 From: Xinlei Lee Date: Wed, 14 Sep 2022 21:21:00 +0800 Subject: soc: mediatek: Add mmsys func to adapt to dpi output for MT8186 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add mmsys func to manipulate dpi output format config for MT8186. Co-developed-by: Jitao Shi Signed-off-by: Jitao Shi Signed-off-by: Xinlei Lee Reviewed-by: Nís F. R. A. Prado Link: https://lore.kernel.org/all/1663161662-1598-2-git-send-email-xinlei.lee@mediatek.com/ Signed-off-by: Matthias Brugger --- include/linux/soc/mediatek/mtk-mmsys.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/mediatek/mtk-mmsys.h b/include/linux/soc/mediatek/mtk-mmsys.h index 59117d970daf..d2b02bb43768 100644 --- a/include/linux/soc/mediatek/mtk-mmsys.h +++ b/include/linux/soc/mediatek/mtk-mmsys.h @@ -65,4 +65,6 @@ void mtk_mmsys_ddp_disconnect(struct device *dev, enum mtk_ddp_comp_id cur, enum mtk_ddp_comp_id next); +void mtk_mmsys_ddp_dpi_fmt_config(struct device *dev, u32 val); + #endif /* __MTK_MMSYS_H */ -- cgit From 7187440dd7c45fd8b6dd4f3ff56b03ca4aa1bbd2 Mon Sep 17 00:00:00 2001 From: Dan Carpenter Date: Thu, 8 Sep 2022 15:20:23 +0300 Subject: iov_iter: use "maxpages" parameter This was intended to be "maxpages" instead of INT_MAX. There is only one caller and it passes INT_MAX so this does not affect runtime. Fixes: b93235e68921 ("tls: cap the output scatter list to something reasonable") Signed-off-by: Dan Carpenter Signed-off-by: David S. Miller --- include/linux/uio.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/uio.h b/include/linux/uio.h index 5896af36199c..2e3134b14ffd 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -298,7 +298,7 @@ iov_iter_npages_cap(struct iov_iter *i, int maxpages, size_t max_bytes) shorted = iov_iter_count(i) - max_bytes; iov_iter_truncate(i, max_bytes); } - npages = iov_iter_npages(i, INT_MAX); + npages = iov_iter_npages(i, maxpages); if (shorted) iov_iter_reexpand(i, iov_iter_count(i) + shorted); -- cgit From 82f00b24f532557fb0e15a6a2747859e4b70c4bd Mon Sep 17 00:00:00 2001 From: Weili Qian Date: Fri, 9 Sep 2022 17:46:55 +0800 Subject: crypto: hisilicon/qm - get hardware features from hardware registers Before hardware V3, hardwares do not provide the feature registers, driver resolves hardware differences based on the hardware version. As a result, the driver does not support the new hardware. Hardware V3 and later versions support to obtain hardware features, such as power-gating management and doorbell isolation, through the hardware registers. To be compatible with later hardware versions, the features of the current device is obtained by reading the hardware registers instead of the hardware version. Signed-off-by: Weili Qian Signed-off-by: Herbert Xu --- include/linux/hisi_acc_qm.h | 31 +++++++++++++++++++++++++++++-- 1 file changed, 29 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h index 116e8bd68c99..851c962ba473 100644 --- a/include/linux/hisi_acc_qm.h +++ b/include/linux/hisi_acc_qm.h @@ -168,6 +168,15 @@ enum qm_vf_state { QM_NOT_READY, }; +enum qm_cap_bits { + QM_SUPPORT_DB_ISOLATION = 0x0, + QM_SUPPORT_FUNC_QOS, + QM_SUPPORT_STOP_QP, + QM_SUPPORT_MB_COMMAND, + QM_SUPPORT_SVA_PREFETCH, + QM_SUPPORT_RPM, +}; + struct dfx_diff_registers { u32 *regs; u32 reg_offset; @@ -258,6 +267,18 @@ struct hisi_qm_err_ini { void (*err_info_init)(struct hisi_qm *qm); }; +struct hisi_qm_cap_info { + u32 type; + /* Register offset */ + u32 offset; + /* Bit offset in register */ + u32 shift; + u32 mask; + u32 v1_val; + u32 v2_val; + u32 v3_val; +}; + struct hisi_qm_list { struct mutex lock; struct list_head list; @@ -278,6 +299,9 @@ struct hisi_qm { struct pci_dev *pdev; void __iomem *io_base; void __iomem *db_io_base; + + /* Capbility version, 0: not supports */ + u32 cap_ver; u32 sqe_size; u32 qp_base; u32 qp_num; @@ -304,6 +328,8 @@ struct hisi_qm { struct hisi_qm_err_info err_info; struct hisi_qm_err_status err_status; unsigned long misc_ctl; /* driver removing and reset sched */ + /* Device capability bit */ + unsigned long caps; struct rw_semaphore qps_lock; struct idr qp_idr; @@ -326,8 +352,6 @@ struct hisi_qm { bool use_sva; bool is_frozen; - /* doorbell isolation enable */ - bool use_db_isolation; resource_size_t phys_base; resource_size_t db_phys_base; struct uacce_device *uacce; @@ -501,6 +525,9 @@ void hisi_qm_pm_init(struct hisi_qm *qm); int hisi_qm_get_dfx_access(struct hisi_qm *qm); void hisi_qm_put_dfx_access(struct hisi_qm *qm); void hisi_qm_regs_dump(struct seq_file *s, struct debugfs_regset32 *regset); +u32 hisi_qm_get_hw_info(struct hisi_qm *qm, + const struct hisi_qm_cap_info *info_table, + u32 index, bool is_read); /* Used by VFIO ACC live migration driver */ struct pci_driver *hisi_sec_get_pf_driver(void); -- cgit From 129a9f340172b4f3857260a7a7bb9d7b3496ba50 Mon Sep 17 00:00:00 2001 From: Weili Qian Date: Fri, 9 Sep 2022 17:46:56 +0800 Subject: crypto: hisilicon/qm - get qp num and depth from hardware registers Hardware V3 and later versions can obtain qp num and depth supported by the hardware from registers. To be compatible with later hardware versions, get qp num and depth from registers instead of fixed marcos. Signed-off-by: Weili Qian Signed-off-by: Herbert Xu --- include/linux/hisi_acc_qm.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h index 851c962ba473..371c0e7ffd27 100644 --- a/include/linux/hisi_acc_qm.h +++ b/include/linux/hisi_acc_qm.h @@ -109,7 +109,6 @@ QM_MAILBOX_TIMEOUT | QM_FLR_TIMEOUT) #define QM_BASE_CE QM_ECC_1BIT -#define QM_Q_DEPTH 1024 #define QM_MIN_QNUM 2 #define HISI_ACC_SGL_SGE_NR_MAX 255 #define QM_SHAPER_CFG 0x100164 @@ -310,6 +309,8 @@ struct hisi_qm { u32 max_qp_num; u32 vfs_num; u32 db_interval; + u16 eq_depth; + u16 aeq_depth; struct list_head list; struct hisi_qm_list *qm_list; @@ -375,6 +376,8 @@ struct hisi_qp_ops { struct hisi_qp { u32 qp_id; + u16 sq_depth; + u16 cq_depth; u8 alg_type; u8 req_type; -- cgit From d90fab0deb8e580aa001f6876e4436c21e944f27 Mon Sep 17 00:00:00 2001 From: Weili Qian Date: Fri, 9 Sep 2022 17:46:58 +0800 Subject: crypto: hisilicon/qm - get error type from hardware registers Hardware V3 and later versions support get error type from registers. To be compatible with later hardware versions, get error type from registers instead of fixed marco. Signed-off-by: Weili Qian Signed-off-by: Herbert Xu --- include/linux/hisi_acc_qm.h | 27 ++++----------------------- 1 file changed, 4 insertions(+), 23 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hisi_acc_qm.h b/include/linux/hisi_acc_qm.h index 371c0e7ffd27..e230c7c46110 100644 --- a/include/linux/hisi_acc_qm.h +++ b/include/linux/hisi_acc_qm.h @@ -87,28 +87,6 @@ #define PEH_AXUSER_CFG 0x401001 #define PEH_AXUSER_CFG_ENABLE 0xffffffff -#define QM_AXI_RRESP BIT(0) -#define QM_AXI_BRESP BIT(1) -#define QM_ECC_MBIT BIT(2) -#define QM_ECC_1BIT BIT(3) -#define QM_ACC_GET_TASK_TIMEOUT BIT(4) -#define QM_ACC_DO_TASK_TIMEOUT BIT(5) -#define QM_ACC_WB_NOT_READY_TIMEOUT BIT(6) -#define QM_SQ_CQ_VF_INVALID BIT(7) -#define QM_CQ_VF_INVALID BIT(8) -#define QM_SQ_VF_INVALID BIT(9) -#define QM_DB_TIMEOUT BIT(10) -#define QM_OF_FIFO_OF BIT(11) -#define QM_DB_RANDOM_INVALID BIT(12) -#define QM_MAILBOX_TIMEOUT BIT(13) -#define QM_FLR_TIMEOUT BIT(14) - -#define QM_BASE_NFE (QM_AXI_RRESP | QM_AXI_BRESP | QM_ECC_MBIT | \ - QM_ACC_GET_TASK_TIMEOUT | QM_DB_TIMEOUT | \ - QM_OF_FIFO_OF | QM_DB_RANDOM_INVALID | \ - QM_MAILBOX_TIMEOUT | QM_FLR_TIMEOUT) -#define QM_BASE_CE QM_ECC_1BIT - #define QM_MIN_QNUM 2 #define HISI_ACC_SGL_SGE_NR_MAX 255 #define QM_SHAPER_CFG 0x100164 @@ -240,7 +218,10 @@ struct hisi_qm_err_info { char *acpi_rst; u32 msi_wr_port; u32 ecc_2bits_mask; - u32 dev_ce_mask; + u32 qm_shutdown_mask; + u32 dev_shutdown_mask; + u32 qm_reset_mask; + u32 dev_reset_mask; u32 ce; u32 nfe; u32 fe; -- cgit From 882597aff2d442a52ca173c293939232c2e1ebea Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Wed, 14 Sep 2022 07:14:24 -0700 Subject: Input: auo-pixcir-ts - drop support for platform data Currently there are no users of auo_pixcir_ts_platdata in the mainline, and having it (with legacy gpio numbers) prevents us from converting the driver to gpiod API, so let's drop it. If, in the future, someone wants to use this driver on non-device tree, non-ACPI system, they should use static device properties instead of platform data. Reviewed-by: Heiko Stuebner Link: https://lore.kernel.org/r/20220914141428.2201784-1-dmitry.torokhov@gmail.com Signed-off-by: Dmitry Torokhov --- include/linux/input/auo-pixcir-ts.h | 44 ------------------------------------- 1 file changed, 44 deletions(-) delete mode 100644 include/linux/input/auo-pixcir-ts.h (limited to 'include/linux') diff --git a/include/linux/input/auo-pixcir-ts.h b/include/linux/input/auo-pixcir-ts.h deleted file mode 100644 index ed0776997a7a..000000000000 --- a/include/linux/input/auo-pixcir-ts.h +++ /dev/null @@ -1,44 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * Driver for AUO in-cell touchscreens - * - * Copyright (c) 2011 Heiko Stuebner - * - * based on auo_touch.h from Dell Streak kernel - * - * Copyright (c) 2008 QUALCOMM Incorporated. - * Copyright (c) 2008 QUALCOMM USA, INC. - */ - -#ifndef __AUO_PIXCIR_TS_H__ -#define __AUO_PIXCIR_TS_H__ - -/* - * Interrupt modes: - * periodical: interrupt is asserted periodicaly - * compare coordinates: interrupt is asserted when coordinates change - * indicate touch: interrupt is asserted during touch - */ -#define AUO_PIXCIR_INT_PERIODICAL 0x00 -#define AUO_PIXCIR_INT_COMP_COORD 0x01 -#define AUO_PIXCIR_INT_TOUCH_IND 0x02 - -/* - * @gpio_int interrupt gpio - * @int_setting one of AUO_PIXCIR_INT_* - * @init_hw hardwarespecific init - * @exit_hw hardwarespecific shutdown - * @x_max x-resolution - * @y_max y-resolution - */ -struct auo_pixcir_ts_platdata { - int gpio_int; - int gpio_rst; - - int int_setting; - - unsigned int x_max; - unsigned int y_max; -}; - -#endif -- cgit From f67f12ff57bcfcd7d64280f748787793217faeaf Mon Sep 17 00:00:00 2001 From: Serge Semin Date: Fri, 9 Sep 2022 22:36:08 +0300 Subject: ata: libahci_platform: Introduce reset assertion/deassertion methods Currently the ACHI-platform library supports only the assert and deassert reset signals and ignores the platforms with self-deasserting reset lines. That prone to having the platforms with self-deasserting reset method misbehaviour when it comes to resuming from sleep state after the clocks have been fully disabled. For such cases the controller needs to be fully reset all over after the reference clocks are enabled and stable, otherwise the controller state machine might be in an undetermined state. The best solution would be to auto-detect which reset method is supported by the particular platform and use it implicitly in the framework of the ahci_platform_enable_resources()/ahci_platform_disable_resources() methods. Alas it can't be implemented due to the AHCI-platform library already supporting the shared reset control lines. As [1] says in such case we have to use only one of the next methods: + reset_control_assert()/reset_control_deassert(); + reset_control_reset()/reset_control_rearm(). If the driver had an exclusive control over the reset lines we could have been able to manipulate the lines with no much limitation and just used the combination of the methods above to cover all the possible reset-control cases. Since the shared reset control has already been advertised and couldn't be changed with no risk to breaking the platforms relying on it, we have no choice but to make the platform drivers to determine which reset methods the platform reset system supports. In order to implement both types of reset control support we suggest to introduce the new AHCI-platform flag: AHCI_PLATFORM_RST_TRIGGER, which when passed to the ahci_platform_get_resources() method together with the AHCI_PLATFORM_GET_RESETS flag will indicate that the reset lines are self-deasserting thus the reset_control_reset()/reset_control_rearm() will be used to control the reset state. Otherwise the reset_control_deassert()/reset_control_assert() methods will be utilized. [1] Documentation/driver-api/reset.rst Signed-off-by: Serge Semin Reviewed-by: Hannes Reinecke Signed-off-by: Damien Le Moal --- include/linux/ahci_platform.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ahci_platform.h b/include/linux/ahci_platform.h index 49e5383d4222..6d7dd472d370 100644 --- a/include/linux/ahci_platform.h +++ b/include/linux/ahci_platform.h @@ -23,6 +23,8 @@ int ahci_platform_enable_phys(struct ahci_host_priv *hpriv); void ahci_platform_disable_phys(struct ahci_host_priv *hpriv); int ahci_platform_enable_clks(struct ahci_host_priv *hpriv); void ahci_platform_disable_clks(struct ahci_host_priv *hpriv); +int ahci_platform_deassert_rsts(struct ahci_host_priv *hpriv); +int ahci_platform_assert_rsts(struct ahci_host_priv *hpriv); int ahci_platform_enable_regulators(struct ahci_host_priv *hpriv); void ahci_platform_disable_regulators(struct ahci_host_priv *hpriv); int ahci_platform_enable_resources(struct ahci_host_priv *hpriv); @@ -41,6 +43,7 @@ int ahci_platform_resume_host(struct device *dev); int ahci_platform_suspend(struct device *dev); int ahci_platform_resume(struct device *dev); -#define AHCI_PLATFORM_GET_RESETS 0x01 +#define AHCI_PLATFORM_GET_RESETS BIT(0) +#define AHCI_PLATFORM_RST_TRIGGER BIT(1) #endif /* _AHCI_PLATFORM_H */ -- cgit From 6ce73f3a6fc0ad6af56ba4c14ab7a410876f8286 Mon Sep 17 00:00:00 2001 From: Serge Semin Date: Fri, 9 Sep 2022 22:36:16 +0300 Subject: ata: libahci_platform: Add function returning a clock-handle by id Since all the clocks are retrieved by the method ahci_platform_get_resources() there is no need for the LLD (glue) drivers to be looking for some particular of them in the kernel clocks table again. Instead we suggest to add a simple method returning a device-specific clock with passed connection ID if it is managed to be found. Otherwise the function will return NULL. Thus the glue-drivers won't need to either manually touching the hpriv->clks array or calling clk_get()-friends. The AHCI platform drivers will be able to use the new function right after the ahci_platform_get_resources() method invocation and up to the device removal. Note the method is left unused here, but will be utilized in the framework of the DWC AHCI SATA driver being added in the next commit. Signed-off-by: Serge Semin Signed-off-by: Damien Le Moal --- include/linux/ahci_platform.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/ahci_platform.h b/include/linux/ahci_platform.h index 6d7dd472d370..17fa26215292 100644 --- a/include/linux/ahci_platform.h +++ b/include/linux/ahci_platform.h @@ -13,6 +13,7 @@ #include +struct clk; struct device; struct ata_port_info; struct ahci_host_priv; @@ -21,6 +22,8 @@ struct scsi_host_template; int ahci_platform_enable_phys(struct ahci_host_priv *hpriv); void ahci_platform_disable_phys(struct ahci_host_priv *hpriv); +struct clk *ahci_platform_find_clk(struct ahci_host_priv *hpriv, + const char *con_id); int ahci_platform_enable_clks(struct ahci_host_priv *hpriv); void ahci_platform_disable_clks(struct ahci_host_priv *hpriv); int ahci_platform_deassert_rsts(struct ahci_host_priv *hpriv); -- cgit From bfeb7e399bacae4ee46ad978f5fce3e47f0978d6 Mon Sep 17 00:00:00 2001 From: Yauheni Kaliuta Date: Mon, 5 Sep 2022 12:01:49 +0300 Subject: bpf: Use bpf_capable() instead of CAP_SYS_ADMIN for blinding decision The full CAP_SYS_ADMIN requirement for blinding looks too strict nowadays. These days given unprivileged BPF is disabled by default, the main users for constant blinding coming from unprivileged in particular via cBPF -> eBPF migration (e.g. old-style socket filters). Signed-off-by: Yauheni Kaliuta Signed-off-by: Daniel Borkmann Link: https://lore.kernel.org/bpf/20220831090655.156434-1-ykaliuta@redhat.com Link: https://lore.kernel.org/bpf/20220905090149.61221-1-ykaliuta@redhat.com --- include/linux/filter.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/filter.h b/include/linux/filter.h index 527ae1d64e27..75335432fcbc 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1099,7 +1099,7 @@ static inline bool bpf_jit_blinding_enabled(struct bpf_prog *prog) return false; if (!bpf_jit_harden) return false; - if (bpf_jit_harden == 1 && capable(CAP_SYS_ADMIN)) + if (bpf_jit_harden == 1 && bpf_capable()) return false; return true; -- cgit From ceea991a019c57a1fb0edd12a5f836a0fa431aee Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Sat, 3 Sep 2022 15:11:54 +0200 Subject: bpf: Move bpf_dispatcher function out of ftrace locations The dispatcher function is attached/detached to trampoline by dispatcher update function. At the same time it's available as ftrace attachable function. After discussion [1] the proposed solution is to use compiler attributes to alter bpf_dispatcher_##name##_func function: - remove it from being instrumented with __no_instrument_function__ attribute, so ftrace has no track of it - but still generate 5 nop instructions with patchable_function_entry(5) attribute, which are expected by bpf_arch_text_poke used by dispatcher update function Enabling HAVE_DYNAMIC_FTRACE_NO_PATCHABLE option for x86, so __patchable_function_entries functions are not part of ftrace/mcount locations. Adding attributes to bpf_dispatcher_XXX function on x86_64 so it's kept out of ftrace locations and has 5 byte nop generated at entry. These attributes need to be arch specific as pointed out by Ilya Leoshkevic in here [2]. The dispatcher image is generated only for x86_64 arch, so the code can stay as is for other archs. [1] https://lore.kernel.org/bpf/20220722110811.124515-1-jolsa@kernel.org/ [2] https://lore.kernel.org/bpf/969a14281a7791c334d476825863ee449964dd0c.camel@linux.ibm.com/ Suggested-by: Peter Zijlstra (Intel) Signed-off-by: Jiri Olsa Signed-off-by: Daniel Borkmann Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/bpf/20220903131154.420467-3-jolsa@kernel.org --- include/linux/bpf.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 54178b9e9c3a..e0dbe0c0a17e 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -977,7 +977,14 @@ int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs); }, \ } +#ifdef CONFIG_X86_64 +#define BPF_DISPATCHER_ATTRIBUTES __attribute__((patchable_function_entry(5))) +#else +#define BPF_DISPATCHER_ATTRIBUTES +#endif + #define DEFINE_BPF_DISPATCHER(name) \ + notrace BPF_DISPATCHER_ATTRIBUTES \ noinline __nocfi unsigned int bpf_dispatcher_##name##_func( \ const void *ctx, \ const struct bpf_insn *insnsi, \ -- cgit From c2f2e2c3aecdbabf822272a4b6e7d91537633cd9 Mon Sep 17 00:00:00 2001 From: ChiaEn Wu Date: Thu, 15 Sep 2022 17:47:32 +0800 Subject: lib: add linear range index macro Add linear_range_idx macro for declaring the linear_range struct simply. Reviewed-by: Matti Vaittinen Signed-off-by: ChiaEn Wu Signed-off-by: Sebastian Reichel --- include/linux/linear_range.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/linear_range.h b/include/linux/linear_range.h index fd3d0b358f22..2e4f4c3539c0 100644 --- a/include/linux/linear_range.h +++ b/include/linux/linear_range.h @@ -26,6 +26,17 @@ struct linear_range { unsigned int step; }; +#define LINEAR_RANGE(_min, _min_sel, _max_sel, _step) \ + { \ + .min = _min, \ + .min_sel = _min_sel, \ + .max_sel = _max_sel, \ + .step = _step, \ + } + +#define LINEAR_RANGE_IDX(_idx, _min, _min_sel, _max_sel, _step) \ + [_idx] = LINEAR_RANGE(_min, _min_sel, _max_sel, _step) + unsigned int linear_range_values_in_range(const struct linear_range *r); unsigned int linear_range_values_in_range_array(const struct linear_range *r, int ranges); -- cgit From c7007d9f19527b47992ff78a088e8697a9e9d5f5 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Mon, 2 May 2022 00:55:49 +0200 Subject: efi/libstub: add some missing EFI prototypes Define the correct prototypes for the load_image, start_image and unload_image boot service pointers so we can call them from the EFI zboot code. Also add some prototypes related to installation and deinstallation of protocols in to the EFI protocol database, including some definitions related to device paths. Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index d2b84c2fec39..af90f7989f80 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -368,6 +368,9 @@ void efi_native_runtime_setup(void); #define UV_SYSTEM_TABLE_GUID EFI_GUID(0x3b13a7d4, 0x633e, 0x11dd, 0x93, 0xec, 0xda, 0x25, 0x56, 0xd8, 0x95, 0x93) #define LINUX_EFI_CRASH_GUID EFI_GUID(0xcfc8fc79, 0xbe2e, 0x4ddc, 0x97, 0xf0, 0x9f, 0x98, 0xbf, 0xe2, 0x98, 0xa0) #define LOADED_IMAGE_PROTOCOL_GUID EFI_GUID(0x5b1b31a1, 0x9562, 0x11d2, 0x8e, 0x3f, 0x00, 0xa0, 0xc9, 0x69, 0x72, 0x3b) +#define LOADED_IMAGE_DEVICE_PATH_PROTOCOL_GUID EFI_GUID(0xbc62157e, 0x3e33, 0x4fec, 0x99, 0x20, 0x2d, 0x3b, 0x36, 0xd7, 0x50, 0xdf) +#define EFI_DEVICE_PATH_PROTOCOL_GUID EFI_GUID(0x09576e91, 0x6d3f, 0x11d2, 0x8e, 0x39, 0x00, 0xa0, 0xc9, 0x69, 0x72, 0x3b) +#define EFI_DEVICE_PATH_TO_TEXT_PROTOCOL_GUID EFI_GUID(0x8b843e20, 0x8132, 0x4852, 0x90, 0xcc, 0x55, 0x1a, 0x4e, 0x4a, 0x7f, 0x1c) #define EFI_GRAPHICS_OUTPUT_PROTOCOL_GUID EFI_GUID(0x9042a9de, 0x23dc, 0x4a38, 0x96, 0xfb, 0x7a, 0xde, 0xd0, 0x80, 0x51, 0x6a) #define EFI_UGA_PROTOCOL_GUID EFI_GUID(0x982c298b, 0xf4fa, 0x41cb, 0xb8, 0x38, 0x77, 0xaa, 0x68, 0x8f, 0xb8, 0x39) #define EFI_PCI_IO_PROTOCOL_GUID EFI_GUID(0x4cf5b200, 0x68b8, 0x4ca5, 0x9e, 0xec, 0xb2, 0x3e, 0x3f, 0x50, 0x02, 0x9a) @@ -952,6 +955,7 @@ extern int efi_status_to_err(efi_status_t status); #define EFI_DEV_MEDIA_VENDOR 3 #define EFI_DEV_MEDIA_FILE 4 #define EFI_DEV_MEDIA_PROTOCOL 5 +#define EFI_DEV_MEDIA_REL_OFFSET 8 #define EFI_DEV_BIOS_BOOT 0x05 #define EFI_DEV_END_PATH 0x7F #define EFI_DEV_END_PATH2 0xFF @@ -982,12 +986,20 @@ struct efi_vendor_dev_path { u8 vendordata[]; } __packed; +struct efi_rel_offset_dev_path { + struct efi_generic_dev_path header; + u32 reserved; + u64 starting_offset; + u64 ending_offset; +} __packed; + struct efi_dev_path { union { struct efi_generic_dev_path header; struct efi_acpi_dev_path acpi; struct efi_pci_dev_path pci; struct efi_vendor_dev_path vendor; + struct efi_rel_offset_dev_path rel_offset; }; } __packed; -- cgit From f5a5e83379b537f6252526bb4d285b771f6f0b89 Mon Sep 17 00:00:00 2001 From: Hector Martin Date: Wed, 14 Sep 2022 09:34:31 +0100 Subject: soc: apple: rtkit: Add apple_rtkit_poll This allows a client to receive messages in atomic context, by polling. Signed-off-by: Hector Martin Signed-off-by: Russell King (Oracle) Reviewed-by: Sven Peter Reviewed-by: Eric Curtin Reviewed-by: Linus Walleij Signed-off-by: Arnd Bergmann --- include/linux/soc/apple/rtkit.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/apple/rtkit.h b/include/linux/soc/apple/rtkit.h index 88eb832eac7b..c9cabb679cd1 100644 --- a/include/linux/soc/apple/rtkit.h +++ b/include/linux/soc/apple/rtkit.h @@ -152,4 +152,16 @@ int apple_rtkit_send_message(struct apple_rtkit *rtk, u8 ep, u64 message, int apple_rtkit_send_message_wait(struct apple_rtkit *rtk, u8 ep, u64 message, unsigned long timeout, bool atomic); +/* + * Process incoming messages in atomic context. + * This only guarantees that messages arrive as far as the recv_message_early + * callback; drivers expecting to handle incoming messages synchronously + * by calling this function must do it that way. + * Will return 1 if some data was processed, 0 if none was, or a + * negative error code on failure. + * + * @rtk: RTKit reference + */ +int apple_rtkit_poll(struct apple_rtkit *rtk); + #endif /* _LINUX_APPLE_RTKIT_H_ */ -- cgit From 460d9cb62f7fe82cc835d6b3924a8d06fd2d510a Mon Sep 17 00:00:00 2001 From: Samuel Holland Date: Sun, 14 Aug 2022 23:12:44 -0500 Subject: soc: sunxi: sram: Return void from the release function There is no point in returning an error here, as the caller can do nothing about it. In fact, all callers already ignore the return value. Acked-by: Jernej Skrabec Signed-off-by: Samuel Holland Reviewed-by: Heiko Stuebner Tested-by: Heiko Stuebner Link: https://lore.kernel.org/r/20220815041248.53268-8-samuel@sholland.org Signed-off-by: Jernej Skrabec --- include/linux/soc/sunxi/sunxi_sram.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/soc/sunxi/sunxi_sram.h b/include/linux/soc/sunxi/sunxi_sram.h index c5f663bba9c2..60e274d1b821 100644 --- a/include/linux/soc/sunxi/sunxi_sram.h +++ b/include/linux/soc/sunxi/sunxi_sram.h @@ -14,6 +14,6 @@ #define _SUNXI_SRAM_H_ int sunxi_sram_claim(struct device *dev); -int sunxi_sram_release(struct device *dev); +void sunxi_sram_release(struct device *dev); #endif /* _SUNXI_SRAM_H_ */ -- cgit From e63efbcaba7d6f6846b1c2ec5cf259249b9ed88f Mon Sep 17 00:00:00 2001 From: Hector Martin Date: Fri, 16 Sep 2022 17:02:46 +0100 Subject: wifi: brcmfmac: pcie: Read Apple OTP information On Apple platforms, the One Time Programmable ROM in the Broadcom chips contains information about the specific board design (module, vendor, version) that is required to select the correct NVRAM file. Parse this OTP ROM and extract the required strings. Note that the user OTP offset/size is per-chip. This patch does not add any chips yet. Reviewed-by: Arend van Spriel Signed-off-by: Hector Martin Signed-off-by: Russell King (Oracle) Signed-off-by: Kalle Valo Link: https://lore.kernel.org/r/E1oZDni-0077aM-I6@rmk-PC.armlinux.org.uk --- include/linux/bcma/bcma_driver_chipcommon.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/bcma/bcma_driver_chipcommon.h b/include/linux/bcma/bcma_driver_chipcommon.h index e3314f746bfa..2d94c30ed439 100644 --- a/include/linux/bcma/bcma_driver_chipcommon.h +++ b/include/linux/bcma/bcma_driver_chipcommon.h @@ -271,6 +271,7 @@ #define BCMA_CC_SROM_CONTROL_OP_WRDIS 0x40000000 #define BCMA_CC_SROM_CONTROL_OP_WREN 0x60000000 #define BCMA_CC_SROM_CONTROL_OTPSEL 0x00000010 +#define BCMA_CC_SROM_CONTROL_OTP_PRESENT 0x00000020 #define BCMA_CC_SROM_CONTROL_LOCK 0x00000008 #define BCMA_CC_SROM_CONTROL_SIZE_MASK 0x00000006 #define BCMA_CC_SROM_CONTROL_SIZE_1K 0x00000000 -- cgit From 555bb4ccd1dd78d0263eae31629fe1fdd65c1fb5 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Thu, 25 Aug 2022 18:41:24 +0200 Subject: preempt: Provide preempt_[dis|en]able_nested() On PREEMPT_RT enabled kernels, spinlocks and rwlocks are neither disabling preemption nor interrupts. Though there are a few places which depend on the implicit preemption/interrupt disable of those locks, e.g. seqcount write sections, per CPU statistics updates etc. To avoid sprinkling CONFIG_PREEMPT_RT conditionals all over the place, add preempt_disable_nested() and preempt_enable_nested() which should be descriptive enough. Add a lockdep assertion for the !PREEMPT_RT case to catch callers which do not have preemption disabled. Suggested-by: Linus Torvalds Signed-off-by: Thomas Gleixner Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220825164131.402717-2-bigeasy@linutronix.de --- include/linux/preempt.h | 42 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 42 insertions(+) (limited to 'include/linux') diff --git a/include/linux/preempt.h b/include/linux/preempt.h index b4381f255a5c..0df425bf9bd7 100644 --- a/include/linux/preempt.h +++ b/include/linux/preempt.h @@ -421,4 +421,46 @@ static inline void migrate_enable(void) { } #endif /* CONFIG_SMP */ +/** + * preempt_disable_nested - Disable preemption inside a normally preempt disabled section + * + * Use for code which requires preemption protection inside a critical + * section which has preemption disabled implicitly on non-PREEMPT_RT + * enabled kernels, by e.g.: + * - holding a spinlock/rwlock + * - soft interrupt context + * - regular interrupt handlers + * + * On PREEMPT_RT enabled kernels spinlock/rwlock held sections, soft + * interrupt context and regular interrupt handlers are preemptible and + * only prevent migration. preempt_disable_nested() ensures that preemption + * is disabled for cases which require CPU local serialization even on + * PREEMPT_RT. For non-PREEMPT_RT kernels this is a NOP. + * + * The use cases are code sequences which are not serialized by a + * particular lock instance, e.g.: + * - seqcount write side critical sections where the seqcount is not + * associated to a particular lock and therefore the automatic + * protection mechanism does not work. This prevents a live lock + * against a preempting high priority reader. + * - RMW per CPU variable updates like vmstat. + */ +/* Macro to avoid header recursion hell vs. lockdep */ +#define preempt_disable_nested() \ +do { \ + if (IS_ENABLED(CONFIG_PREEMPT_RT)) \ + preempt_disable(); \ + else \ + lockdep_assert_preemption_disabled(); \ +} while (0) + +/** + * preempt_enable_nested - Undo the effect of preempt_disable_nested() + */ +static __always_inline void preempt_enable_nested(void) +{ + if (IS_ENABLED(CONFIG_PREEMPT_RT)) + preempt_enable(); +} + #endif /* __LINUX_PREEMPT_H */ -- cgit From a738e9bad6030d4fd33bfd7db3399a250b7e94d8 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Thu, 25 Aug 2022 18:41:27 +0200 Subject: mm/debug: Provide VM_WARN_ON_IRQS_ENABLED() Some places in the VM code expect interrupts disabled, which is a valid expectation on non-PREEMPT_RT kernels, but does not hold on RT kernels in some places because the RT spinlock substitution does not disable interrupts. To avoid sprinkling CONFIG_PREEMPT_RT conditionals into those places, provide VM_WARN_ON_IRQS_ENABLED() which is only enabled when VM_DEBUG=y and PREEMPT_RT=n. Signed-off-by: Thomas Gleixner Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Acked-by: Michal Hocko Link: https://lore.kernel.org/r/20220825164131.402717-5-bigeasy@linutronix.de --- include/linux/mmdebug.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h index 15ae78cd2853..b8728d11c949 100644 --- a/include/linux/mmdebug.h +++ b/include/linux/mmdebug.h @@ -94,6 +94,12 @@ void dump_mm(const struct mm_struct *mm); #define VM_WARN(cond, format...) BUILD_BUG_ON_INVALID(cond) #endif +#ifdef CONFIG_DEBUG_VM_IRQSOFF +#define VM_WARN_ON_IRQS_ENABLED() WARN_ON_ONCE(!irqs_disabled()) +#else +#define VM_WARN_ON_IRQS_ENABLED() do { } while (0) +#endif + #ifdef CONFIG_DEBUG_VIRTUAL #define VIRTUAL_BUG_ON(cond) BUG_ON(cond) #else -- cgit From 44b0c2957adc62b86fcd51adeaf8e993171bc319 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Thu, 25 Aug 2022 18:41:31 +0200 Subject: u64_stats: Streamline the implementation The u64 stats code handles 3 different cases: - 32bit UP - 32bit SMP - 64bit with an unreadable #ifdef maze, which was recently expanded with PREEMPT_RT conditionals. Reduce it to two cases (32bit and 64bit) and drop the optimization for 32bit UP as suggested by Linus. Use the new preempt_disable/enable_nested() helpers to get rid of the CONFIG_PREEMPT_RT conditionals. Signed-off-by: Thomas Gleixner Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20220825164131.402717-9-bigeasy@linutronix.de --- include/linux/u64_stats_sync.h | 145 ++++++++++++++++++----------------------- 1 file changed, 64 insertions(+), 81 deletions(-) (limited to 'include/linux') diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h index 6ad4e9032d53..46040d66334a 100644 --- a/include/linux/u64_stats_sync.h +++ b/include/linux/u64_stats_sync.h @@ -8,7 +8,7 @@ * * Key points : * - * - Use a seqcount on 32-bit SMP, only disable preemption for 32-bit UP. + * - Use a seqcount on 32-bit * - The whole thing is a no-op on 64-bit architectures. * * Usage constraints: @@ -20,7 +20,8 @@ * writer and also spin forever. * * 3) Write side must use the _irqsave() variant if other writers, or a reader, - * can be invoked from an IRQ context. + * can be invoked from an IRQ context. On 64bit systems this variant does not + * disable interrupts. * * 4) If reader fetches several counters, there is no guarantee the whole values * are consistent w.r.t. each other (remember point #2: seqcounts are not @@ -29,11 +30,6 @@ * 5) Readers are allowed to sleep or be preempted/interrupted: they perform * pure reads. * - * 6) Readers must use both u64_stats_fetch_{begin,retry}_irq() if the stats - * might be updated from a hardirq or softirq context (remember point #1: - * seqcounts are not used for UP kernels). 32-bit UP stat readers could read - * corrupted 64-bit values otherwise. - * * Usage : * * Stats producer (writer) should use following template granted it already got @@ -66,7 +62,7 @@ #include struct u64_stats_sync { -#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)) +#if BITS_PER_LONG == 32 seqcount_t seq; #endif }; @@ -98,7 +94,22 @@ static inline void u64_stats_inc(u64_stats_t *p) local64_inc(&p->v); } -#else +static inline void u64_stats_init(struct u64_stats_sync *syncp) { } +static inline void __u64_stats_update_begin(struct u64_stats_sync *syncp) { } +static inline void __u64_stats_update_end(struct u64_stats_sync *syncp) { } +static inline unsigned long __u64_stats_irqsave(void) { return 0; } +static inline void __u64_stats_irqrestore(unsigned long flags) { } +static inline unsigned int __u64_stats_fetch_begin(const struct u64_stats_sync *syncp) +{ + return 0; +} +static inline bool __u64_stats_fetch_retry(const struct u64_stats_sync *syncp, + unsigned int start) +{ + return false; +} + +#else /* 64 bit */ typedef struct { u64 v; @@ -123,123 +134,95 @@ static inline void u64_stats_inc(u64_stats_t *p) { p->v++; } -#endif -#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)) -#define u64_stats_init(syncp) seqcount_init(&(syncp)->seq) -#else static inline void u64_stats_init(struct u64_stats_sync *syncp) { + seqcount_init(&syncp->seq); } -#endif -static inline void u64_stats_update_begin(struct u64_stats_sync *syncp) +static inline void __u64_stats_update_begin(struct u64_stats_sync *syncp) { -#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)) - if (IS_ENABLED(CONFIG_PREEMPT_RT)) - preempt_disable(); + preempt_disable_nested(); write_seqcount_begin(&syncp->seq); -#endif } -static inline void u64_stats_update_end(struct u64_stats_sync *syncp) +static inline void __u64_stats_update_end(struct u64_stats_sync *syncp) { -#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)) write_seqcount_end(&syncp->seq); - if (IS_ENABLED(CONFIG_PREEMPT_RT)) - preempt_enable(); -#endif + preempt_enable_nested(); } -static inline unsigned long -u64_stats_update_begin_irqsave(struct u64_stats_sync *syncp) +static inline unsigned long __u64_stats_irqsave(void) { - unsigned long flags = 0; + unsigned long flags; -#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)) - if (IS_ENABLED(CONFIG_PREEMPT_RT)) - preempt_disable(); - else - local_irq_save(flags); - write_seqcount_begin(&syncp->seq); -#endif + local_irq_save(flags); return flags; } -static inline void -u64_stats_update_end_irqrestore(struct u64_stats_sync *syncp, - unsigned long flags) +static inline void __u64_stats_irqrestore(unsigned long flags) { -#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)) - write_seqcount_end(&syncp->seq); - if (IS_ENABLED(CONFIG_PREEMPT_RT)) - preempt_enable(); - else - local_irq_restore(flags); -#endif + local_irq_restore(flags); } static inline unsigned int __u64_stats_fetch_begin(const struct u64_stats_sync *syncp) { -#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)) return read_seqcount_begin(&syncp->seq); -#else - return 0; -#endif } -static inline unsigned int u64_stats_fetch_begin(const struct u64_stats_sync *syncp) +static inline bool __u64_stats_fetch_retry(const struct u64_stats_sync *syncp, + unsigned int start) { -#if BITS_PER_LONG == 32 && (!defined(CONFIG_SMP) && !defined(CONFIG_PREEMPT_RT)) - preempt_disable(); -#endif - return __u64_stats_fetch_begin(syncp); + return read_seqcount_retry(&syncp->seq, start); } +#endif /* !64 bit */ -static inline bool __u64_stats_fetch_retry(const struct u64_stats_sync *syncp, - unsigned int start) +static inline void u64_stats_update_begin(struct u64_stats_sync *syncp) { -#if BITS_PER_LONG == 32 && (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)) - return read_seqcount_retry(&syncp->seq, start); -#else - return false; -#endif + __u64_stats_update_begin(syncp); +} + +static inline void u64_stats_update_end(struct u64_stats_sync *syncp) +{ + __u64_stats_update_end(syncp); +} + +static inline unsigned long u64_stats_update_begin_irqsave(struct u64_stats_sync *syncp) +{ + unsigned long flags = __u64_stats_irqsave(); + + __u64_stats_update_begin(syncp); + return flags; +} + +static inline void u64_stats_update_end_irqrestore(struct u64_stats_sync *syncp, + unsigned long flags) +{ + __u64_stats_update_end(syncp); + __u64_stats_irqrestore(flags); +} + +static inline unsigned int u64_stats_fetch_begin(const struct u64_stats_sync *syncp) +{ + return __u64_stats_fetch_begin(syncp); } static inline bool u64_stats_fetch_retry(const struct u64_stats_sync *syncp, unsigned int start) { -#if BITS_PER_LONG == 32 && (!defined(CONFIG_SMP) && !defined(CONFIG_PREEMPT_RT)) - preempt_enable(); -#endif return __u64_stats_fetch_retry(syncp, start); } -/* - * In case irq handlers can update u64 counters, readers can use following helpers - * - SMP 32bit arches use seqcount protection, irq safe. - * - UP 32bit must disable irqs. - * - 64bit have no problem atomically reading u64 values, irq safe. - */ +/* Obsolete interfaces */ static inline unsigned int u64_stats_fetch_begin_irq(const struct u64_stats_sync *syncp) { -#if BITS_PER_LONG == 32 && defined(CONFIG_PREEMPT_RT) - preempt_disable(); -#elif BITS_PER_LONG == 32 && !defined(CONFIG_SMP) - local_irq_disable(); -#endif - return __u64_stats_fetch_begin(syncp); + return u64_stats_fetch_begin(syncp); } static inline bool u64_stats_fetch_retry_irq(const struct u64_stats_sync *syncp, unsigned int start) { -#if BITS_PER_LONG == 32 && defined(CONFIG_PREEMPT_RT) - preempt_enable(); -#elif BITS_PER_LONG == 32 && !defined(CONFIG_SMP) - local_irq_enable(); -#endif - return __u64_stats_fetch_retry(syncp, start); + return u64_stats_fetch_retry(syncp, start); } #endif /* _LINUX_U64_STATS_SYNC_H */ -- cgit From 6a164c646999847b843e651f71c53dfaceb2c2b4 Mon Sep 17 00:00:00 2001 From: Sebastian Andrzej Siewior Date: Mon, 9 May 2022 16:04:08 +0200 Subject: genirq: Provide generic_handle_domain_irq_safe(). commit 509853f9e1e7b ("genirq: Provide generic_handle_irq_safe()") addressed the problem of demultiplexing interrupt handlers which are force threaded on PREEMPT_RT enabled kernels which means that the demultiplexed handler is invoked with interrupts enabled which triggers a lockdep warning due to a non-irq safe lock acquisition. The same problem exists for the irq domain based interrupt handling via generic_handle_domain_irq() which has been reported against the AMD pin-ctrl driver. Provide generic_handle_domain_irq_safe() which can used from any context. [ tglx: Split the usage sites out and massaged changelog ] Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Thomas Gleixner Link: https://lore.kernel.org/r/YnkfWFzvusFFktSt@linutronix.de Link: https://bugzilla.kernel.org/show_bug.cgi?id=215954 --- include/linux/irqdesc.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/irqdesc.h b/include/linux/irqdesc.h index 1cd4e36890fb..844a8e30e6de 100644 --- a/include/linux/irqdesc.h +++ b/include/linux/irqdesc.h @@ -169,6 +169,7 @@ int generic_handle_irq_safe(unsigned int irq); * conversion failed. */ int generic_handle_domain_irq(struct irq_domain *domain, unsigned int hwirq); +int generic_handle_domain_irq_safe(struct irq_domain *domain, unsigned int hwirq); int generic_handle_domain_nmi(struct irq_domain *domain, unsigned int hwirq); #endif -- cgit From b88c48bfdd85820b19f0c0295a88d1596876e7c8 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Fri, 26 Aug 2022 20:26:41 +0300 Subject: pwm: core: Get rid of unused devm_of_pwm_get() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The devm_of_pwm_get() has recently lost its single user, drop the dead API as well. Note, the new code should use either plain pwm_get() or managed devm_pwm_get() or devm_fwnode_pwm_get() APIs. Signed-off-by: Andy Shevchenko Acked-by: Uwe Kleine-König Link: https://lore.kernel.org/r/20220826172642.16404-2-andriy.shevchenko@linux.intel.com Signed-off-by: Guenter Roeck --- include/linux/pwm.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pwm.h b/include/linux/pwm.h index 9429930c5566..572ba92e4206 100644 --- a/include/linux/pwm.h +++ b/include/linux/pwm.h @@ -408,8 +408,6 @@ struct pwm_device *of_pwm_get(struct device *dev, struct device_node *np, void pwm_put(struct pwm_device *pwm); struct pwm_device *devm_pwm_get(struct device *dev, const char *con_id); -struct pwm_device *devm_of_pwm_get(struct device *dev, struct device_node *np, - const char *con_id); struct pwm_device *devm_fwnode_pwm_get(struct device *dev, struct fwnode_handle *fwnode, const char *con_id); @@ -517,14 +515,6 @@ static inline struct pwm_device *devm_pwm_get(struct device *dev, return ERR_PTR(-ENODEV); } -static inline struct pwm_device *devm_of_pwm_get(struct device *dev, - struct device_node *np, - const char *con_id) -{ - might_sleep(); - return ERR_PTR(-ENODEV); -} - static inline struct pwm_device * devm_fwnode_pwm_get(struct device *dev, struct fwnode_handle *fwnode, const char *con_id) -- cgit From b5ae0ad56455dfb277b165b9270382f0e1c1d943 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Fri, 26 Aug 2022 20:26:42 +0300 Subject: pwm: core: Make of_pwm_get() static MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There are no users outside of PWM core of the of_pwm_get(). Make it static. Signed-off-by: Andy Shevchenko Acked-by: Uwe Kleine-König Link: https://lore.kernel.org/r/20220826172642.16404-3-andriy.shevchenko@linux.intel.com Signed-off-by: Guenter Roeck --- include/linux/pwm.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pwm.h b/include/linux/pwm.h index 572ba92e4206..d70c6e5a839d 100644 --- a/include/linux/pwm.h +++ b/include/linux/pwm.h @@ -403,8 +403,6 @@ struct pwm_device *of_pwm_single_xlate(struct pwm_chip *pc, const struct of_phandle_args *args); struct pwm_device *pwm_get(struct device *dev, const char *con_id); -struct pwm_device *of_pwm_get(struct device *dev, struct device_node *np, - const char *con_id); void pwm_put(struct pwm_device *pwm); struct pwm_device *devm_pwm_get(struct device *dev, const char *con_id); @@ -495,14 +493,6 @@ static inline struct pwm_device *pwm_get(struct device *dev, return ERR_PTR(-ENODEV); } -static inline struct pwm_device *of_pwm_get(struct device *dev, - struct device_node *np, - const char *con_id) -{ - might_sleep(); - return ERR_PTR(-ENODEV); -} - static inline void pwm_put(struct pwm_device *pwm) { might_sleep(); -- cgit From 41929b72eb5dab26280dfc1e7c1523f0f4853dde Mon Sep 17 00:00:00 2001 From: Michael Shych Date: Wed, 10 Aug 2022 20:15:50 +0300 Subject: platform_data/emc2305: define platform data for EMC2305 driver Introduce platform data structure for EM2305 driver to allow configuration device PWMs and thermal zones by passing required platform data to the driver. If no platform data is provided, the driver is supposed to work with default settings. Signed-off-by: Michael Shych Reviewed-by: Vadim Pasternak Link: https://lore.kernel.org/r/20220810171552.56417-2-michaelsh@nvidia.com Signed-off-by: Guenter Roeck --- include/linux/platform_data/emc2305.h | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) create mode 100644 include/linux/platform_data/emc2305.h (limited to 'include/linux') diff --git a/include/linux/platform_data/emc2305.h b/include/linux/platform_data/emc2305.h new file mode 100644 index 000000000000..54d672dd6f7d --- /dev/null +++ b/include/linux/platform_data/emc2305.h @@ -0,0 +1,22 @@ +/* SPDX-License-Identifier: GPL-2.0 */ + +#ifndef __LINUX_PLATFORM_DATA_EMC2305__ +#define __LINUX_PLATFORM_DATA_EMC2305__ + +#define EMC2305_PWM_MAX 5 + +/** + * struct emc2305_platform_data - EMC2305 driver platform data + * @max_state: maximum cooling state of the cooling device; + * @pwm_num: number of active channels; + * @pwm_separate: separate PWM settings for every channel; + * @pwm_min: array of minimum PWM per channel; + */ +struct emc2305_platform_data { + u8 max_state; + u8 pwm_num; + bool pwm_separate; + u8 pwm_min[EMC2305_PWM_MAX]; +}; + +#endif -- cgit From 5db72fdb74983a1e81331aadf99ae2305f277562 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Tue, 13 Sep 2022 19:31:40 +0300 Subject: ACPI: utils: Add acpi_dev_uid_to_integer() helper to get _UID as integer Some users interpret _UID only as integer and for them it's easier to have an integer representation of _UID. Add respective helper for that. Signed-off-by: Andy Shevchenko Reviewed-by: Hans de Goede Signed-off-by: Rafael J. Wysocki --- include/linux/acpi.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 6f64b2f3dc54..9434db02cb60 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -798,6 +798,11 @@ acpi_dev_hid_uid_match(struct acpi_device *adev, const char *hid2, const char *u return false; } +static inline int acpi_dev_uid_to_integer(struct acpi_device *adev, u64 *integer) +{ + return -ENODEV; +} + static inline struct acpi_device * acpi_dev_get_first_match_dev(const char *hid, const char *uid, s64 hrv) { -- cgit From 38bef8e57f2149acd2c910a98f57dd6291d2e0ec Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Mon, 5 Sep 2022 16:08:17 -0700 Subject: smp: add set_nr_cpu_ids() In preparation to support compile-time nr_cpu_ids, add a setter for the variable. This is a no-op for all arches. Signed-off-by: Yury Norov --- include/linux/cpumask.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index e8ad12b5b9d2..24763997894d 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -39,6 +39,11 @@ typedef struct cpumask { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t; #define nr_cpu_ids 1U #else extern unsigned int nr_cpu_ids; + +static inline void set_nr_cpu_ids(unsigned int nr) +{ + nr_cpu_ids = nr; +} #endif #ifdef CONFIG_CPUMASK_OFFSTACK -- cgit From 7102b3bb070fdf4580a05cbfc5ad3c0691dc4bf9 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Mon, 5 Sep 2022 16:08:18 -0700 Subject: lib/cpumask: delete misleading comment The comment says that HOTPLUG config option enables all cpus in cpu_possible_mask up to NR_CPUs. This is wrong. Even if HOTPLUG is enabled, the mask is populated on boot with respect to ACPI/DT records. Signed-off-by: Yury Norov --- include/linux/cpumask.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 24763997894d..9d2f0e3e927e 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -72,10 +72,6 @@ static inline void set_nr_cpu_ids(unsigned int nr) * cpu_online_mask is the dynamic subset of cpu_present_mask, * indicating those CPUs available for scheduling. * - * If HOTPLUG is enabled, then cpu_possible_mask is forced to have - * all NR_CPUS bits set, otherwise it is just the set of CPUs that - * ACPI reports present at boot. - * * If HOTPLUG is enabled, then cpu_present_mask varies dynamically, * depending on what ACPI reports as currently plugged in, otherwise * cpu_present_mask is just a copy of cpu_possible_mask. -- cgit From aa47a7c215e79a2ade6916f163c5a17b561bce4f Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Mon, 5 Sep 2022 16:08:19 -0700 Subject: lib/cpumask: deprecate nr_cpumask_bits Cpumask code is written in assumption that when CONFIG_CPUMASK_OFFSTACK is enabled, all cpumasks have boot-time defined size, otherwise the size is always NR_CPUS. The latter is wrong because the number of possible cpus is always calculated on boot, and it may be less than NR_CPUS. On my 4-cpu arm64 VM the nr_cpu_ids is 4, as expected, and nr_cpumask_bits is 256, which corresponds to NR_CPUS. This not only leads to useless traversing of cpumask bits greater than 4, this also makes some cpumask routines fail. For example, cpumask_full(0b1111000..000) would erroneously return false in the example above because tail bits in the mask are all unset. This patch deprecates nr_cpumask_bits and wires it to nr_cpu_ids unconditionally, so that cpumask routines will not waste time traversing unused part of cpu masks. It also fixes cpumask_full() and similar routines. As a side effect, because now a length of cpumasks is defined at run-time even if CPUMASK_OFFSTACK is disabled, compiler can't optimize corresponding functions. It increases kernel size by ~2.5KB if OFFSTACK is off. This is addressed in the following patch. Signed-off-by: Yury Norov --- include/linux/cpumask.h | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 9d2f0e3e927e..2f6622cead1f 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -46,13 +46,8 @@ static inline void set_nr_cpu_ids(unsigned int nr) } #endif -#ifdef CONFIG_CPUMASK_OFFSTACK -/* Assuming NR_CPUS is huge, a runtime limit is more efficient. Also, - * not all bits may be allocated. */ +/* Deprecated. Always use nr_cpu_ids. */ #define nr_cpumask_bits nr_cpu_ids -#else -#define nr_cpumask_bits ((unsigned int)NR_CPUS) -#endif /* * The following particular system cpumasks and operations manage -- cgit From a050910972bb25152b42ad2e544652117c5ad915 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Mon, 2 May 2022 01:08:16 +0200 Subject: efi/libstub: implement generic EFI zboot Implement a minimal EFI app that decompresses the real kernel image and launches it using the firmware's LoadImage and StartImage boot services. This removes the need for any arch-specific hacks. Note that on systems that have UEFI secure boot policies enabled, LoadImage/StartImage require images to be signed, or their hashes known a priori, in order to be permitted to boot. There are various possible strategies to work around this requirement, but they all rely either on overriding internal PI/DXE protocols (which are not part of the EFI spec) or omitting the firmware provided LoadImage() and StartImage() boot services, which is also undesirable, given that they encapsulate platform specific policies related to secure boot and measured boot, but also related to memory permissions (whether or not and which types of heap allocations have both write and execute permissions.) The only generic and truly portable way around this is to simply sign both the inner and the outer image with the same key/cert pair, so this is what is implemented here. Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index af90f7989f80..5efc3105f8e0 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -411,6 +411,7 @@ void efi_native_runtime_setup(void); #define LINUX_EFI_TPM_FINAL_LOG_GUID EFI_GUID(0x1e2ed096, 0x30e2, 0x4254, 0xbd, 0x89, 0x86, 0x3b, 0xbe, 0xf8, 0x23, 0x25) #define LINUX_EFI_MEMRESERVE_TABLE_GUID EFI_GUID(0x888eb0c6, 0x8ede, 0x4ff5, 0xa8, 0xf0, 0x9a, 0xee, 0x5c, 0xb9, 0x77, 0xc2) #define LINUX_EFI_INITRD_MEDIA_GUID EFI_GUID(0x5568e427, 0x68fc, 0x4f3d, 0xac, 0x74, 0xca, 0x55, 0x52, 0x31, 0xcc, 0x68) +#define LINUX_EFI_ZBOOT_MEDIA_GUID EFI_GUID(0xe565a30d, 0x47da, 0x4dbd, 0xb3, 0x54, 0x9b, 0xb5, 0xc8, 0x4f, 0x8b, 0xe2) #define LINUX_EFI_MOK_VARIABLE_TABLE_GUID EFI_GUID(0xc451ed2b, 0x9694, 0x45d3, 0xba, 0xba, 0xed, 0x9f, 0x89, 0x88, 0xa3, 0x89) #define LINUX_EFI_COCO_SECRET_AREA_GUID EFI_GUID(0xadf956ad, 0xe98c, 0x484c, 0xae, 0x11, 0xb5, 0x1c, 0x7d, 0x33, 0x64, 0x47) -- cgit From db01868bf2e919a10551066d54cf4bef5dd5a01e Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Sun, 11 Sep 2022 04:06:57 +0300 Subject: net: introduce iterators over synced hw addresses Some network drivers use __dev_mc_sync()/__dev_uc_sync() and therefore program the hardware only with addresses with a non-zero sync_cnt. Some of the above drivers also need to save/restore the address filtering lists when certain events happen, and they need to walk through the struct net_device :: uc and struct net_device :: mc lists. But these lists contain unsynced addresses too. To keep the appearance of an elementary form of data encapsulation, provide iterators through these lists that only look at entries with a non-zero sync_cnt, instead of filtering entries out from device drivers. Signed-off-by: Vladimir Oltean Reviewed-by: Florian Fainelli Signed-off-by: Paolo Abeni --- include/linux/netdevice.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index f0068c1ff1df..9f42fc871c3b 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -253,11 +253,17 @@ struct netdev_hw_addr_list { #define netdev_uc_empty(dev) netdev_hw_addr_list_empty(&(dev)->uc) #define netdev_for_each_uc_addr(ha, dev) \ netdev_hw_addr_list_for_each(ha, &(dev)->uc) +#define netdev_for_each_synced_uc_addr(_ha, _dev) \ + netdev_for_each_uc_addr((_ha), (_dev)) \ + if ((_ha)->sync_cnt) #define netdev_mc_count(dev) netdev_hw_addr_list_count(&(dev)->mc) #define netdev_mc_empty(dev) netdev_hw_addr_list_empty(&(dev)->mc) #define netdev_for_each_mc_addr(ha, dev) \ netdev_hw_addr_list_for_each(ha, &(dev)->mc) +#define netdev_for_each_synced_mc_addr(_ha, _dev) \ + netdev_for_each_mc_addr((_ha), (_dev)) \ + if ((_ha)->sync_cnt) struct hh_cache { unsigned int hh_len; -- cgit From 1e839143d674603b0bbbc4c513bca35404967dbc Mon Sep 17 00:00:00 2001 From: Benjamin Tissoires Date: Fri, 2 Sep 2022 15:29:23 +0200 Subject: HID: core: store the unique system identifier in hid_device This unique identifier is currently used only for ensuring uniqueness in sysfs. However, this could be handful for userspace to refer to a specific hid_device by this id. 2 use cases are in my mind: LEDs (and their naming convention), and HID-BPF. Reviewed-by: Greg Kroah-Hartman Signed-off-by: Benjamin Tissoires Link: https://lore.kernel.org/r/20220902132938.2409206-9-benjamin.tissoires@redhat.com --- include/linux/hid.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hid.h b/include/linux/hid.h index 4363a63b9775..a43dd17bc78f 100644 --- a/include/linux/hid.h +++ b/include/linux/hid.h @@ -658,6 +658,8 @@ struct hid_device { /* device report descriptor */ struct list_head debug_list; spinlock_t debug_list_lock; wait_queue_head_t debug_wait; + + unsigned int id; /* system unique id */ }; #define to_hid_device(pdev) \ -- cgit From ead77b65aef430d3bfe63524c243a60a29eb8d90 Mon Sep 17 00:00:00 2001 From: Benjamin Tissoires Date: Fri, 2 Sep 2022 15:29:24 +0200 Subject: HID: export hid_report_type to uapi When we are dealing with eBPF, we need to have access to the report type. Currently our implementation differs from the USB standard, making it impossible for users to know the exact value besides hardcoding it themselves. And instead of a blank define, convert it as an enum. Note that we need to also do change in the ll_driver API, but given that this will have a wider impact outside of this tree, we leave this as a TODO for the future. Reviewed-by: Greg Kroah-Hartman Signed-off-by: Benjamin Tissoires Link: https://lore.kernel.org/r/20220902132938.2409206-10-benjamin.tissoires@redhat.com --- include/linux/hid.h | 24 ++++++++---------------- 1 file changed, 8 insertions(+), 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hid.h b/include/linux/hid.h index a43dd17bc78f..b1a33dbbc78e 100644 --- a/include/linux/hid.h +++ b/include/linux/hid.h @@ -314,15 +314,6 @@ struct hid_item { #define HID_BAT_ABSOLUTESTATEOFCHARGE 0x00850065 #define HID_VD_ASUS_CUSTOM_MEDIA_KEYS 0xff310076 -/* - * HID report types --- Ouch! HID spec says 1 2 3! - */ - -#define HID_INPUT_REPORT 0 -#define HID_OUTPUT_REPORT 1 -#define HID_FEATURE_REPORT 2 - -#define HID_REPORT_TYPES 3 /* * HID connect requests @@ -509,7 +500,7 @@ struct hid_report { struct list_head hidinput_list; struct list_head field_entry_list; /* ordered list of input fields */ unsigned int id; /* id of this report */ - unsigned int type; /* report type */ + enum hid_report_type type; /* report type */ unsigned int application; /* application usage for this report */ struct hid_field *field[HID_MAX_FIELDS]; /* fields of the report */ struct hid_field_entry *field_entries; /* allocated memory of input field_entry */ @@ -926,7 +917,8 @@ extern int hidinput_connect(struct hid_device *hid, unsigned int force); extern void hidinput_disconnect(struct hid_device *); int hid_set_field(struct hid_field *, unsigned, __s32); -int hid_input_report(struct hid_device *, int type, u8 *, u32, int); +int hid_input_report(struct hid_device *hid, enum hid_report_type type, u8 *data, u32 size, + int interrupt); struct hid_field *hidinput_get_led_field(struct hid_device *hid); unsigned int hidinput_count_leds(struct hid_device *hid); __s32 hidinput_calc_abs_res(const struct hid_field *field, __u16 code); @@ -935,11 +927,11 @@ int __hid_request(struct hid_device *hid, struct hid_report *rep, int reqtype); u8 *hid_alloc_report_buf(struct hid_report *report, gfp_t flags); struct hid_device *hid_allocate_device(void); struct hid_report *hid_register_report(struct hid_device *device, - unsigned int type, unsigned int id, + enum hid_report_type type, unsigned int id, unsigned int application); int hid_parse_report(struct hid_device *hid, __u8 *start, unsigned size); struct hid_report *hid_validate_values(struct hid_device *hid, - unsigned int type, unsigned int id, + enum hid_report_type type, unsigned int id, unsigned int field_index, unsigned int report_counts); @@ -1111,7 +1103,7 @@ void hid_hw_request(struct hid_device *hdev, struct hid_report *report, int reqtype); int hid_hw_raw_request(struct hid_device *hdev, unsigned char reportnum, __u8 *buf, - size_t len, unsigned char rtype, int reqtype); + size_t len, enum hid_report_type rtype, int reqtype); int hid_hw_output_report(struct hid_device *hdev, __u8 *buf, size_t len); /** @@ -1184,8 +1176,8 @@ static inline u32 hid_report_len(struct hid_report *report) return DIV_ROUND_UP(report->size, 8) + (report->id > 0); } -int hid_report_raw_event(struct hid_device *hid, int type, u8 *data, u32 size, - int interrupt); +int hid_report_raw_event(struct hid_device *hid, enum hid_report_type type, u8 *data, u32 size, + int interrupt); /* HID quirks API */ unsigned long hid_lookup_quirk(const struct hid_device *hdev); -- cgit From 735e1bb1b8067e209941a6bdfde23214696ff47e Mon Sep 17 00:00:00 2001 From: Benjamin Tissoires Date: Fri, 2 Sep 2022 15:29:25 +0200 Subject: HID: convert defines of HID class requests into a proper enum This allows to export the type in BTF and so in the automatically generated vmlinux.h. It will also add some static checks on the users when we change the ll driver API (see not below). Note that we need to also do change in the ll_driver API, but given that this will have a wider impact outside of this tree, we leave this as a TODO for the future. Reviewed-by: Greg Kroah-Hartman Signed-off-by: Benjamin Tissoires Link: https://lore.kernel.org/r/20220902132938.2409206-11-benjamin.tissoires@redhat.com --- include/linux/hid.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hid.h b/include/linux/hid.h index b1a33dbbc78e..8677ae38599e 100644 --- a/include/linux/hid.h +++ b/include/linux/hid.h @@ -923,7 +923,7 @@ struct hid_field *hidinput_get_led_field(struct hid_device *hid); unsigned int hidinput_count_leds(struct hid_device *hid); __s32 hidinput_calc_abs_res(const struct hid_field *field, __u16 code); void hid_output_report(struct hid_report *report, __u8 *data); -int __hid_request(struct hid_device *hid, struct hid_report *rep, int reqtype); +int __hid_request(struct hid_device *hid, struct hid_report *rep, enum hid_class_request reqtype); u8 *hid_alloc_report_buf(struct hid_report *report, gfp_t flags); struct hid_device *hid_allocate_device(void); struct hid_report *hid_register_report(struct hid_device *device, @@ -1100,10 +1100,11 @@ void hid_hw_stop(struct hid_device *hdev); int __must_check hid_hw_open(struct hid_device *hdev); void hid_hw_close(struct hid_device *hdev); void hid_hw_request(struct hid_device *hdev, - struct hid_report *report, int reqtype); + struct hid_report *report, enum hid_class_request reqtype); int hid_hw_raw_request(struct hid_device *hdev, unsigned char reportnum, __u8 *buf, - size_t len, enum hid_report_type rtype, int reqtype); + size_t len, enum hid_report_type rtype, + enum hid_class_request reqtype); int hid_hw_output_report(struct hid_device *hdev, __u8 *buf, size_t len); /** @@ -1131,7 +1132,7 @@ static inline int hid_hw_power(struct hid_device *hdev, int level) * @reqtype: hid request type */ static inline int hid_hw_idle(struct hid_device *hdev, int report, int idle, - int reqtype) + enum hid_class_request reqtype) { if (hdev->ll_driver->idle) return hdev->ll_driver->idle(hdev, report, idle, reqtype); -- cgit From 176042404ee6a96ba7e9054e1bda6220360a26ad Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 15 Sep 2022 10:41:56 +0100 Subject: mm: add PSI accounting around ->read_folio and ->readahead calls PSI tries to account for the cost of bringing back in pages discarded by the MM LRU management. Currently the prime place for that is hooked into the bio submission path, which is a rather bad place: - it does not actually account I/O for non-block file systems, of which we have many - it adds overhead and a layering violation to the block layer Add the accounting into the two places in the core MM code that read pages into an address space by calling into ->read_folio and ->readahead so that the entire file system operations are covered, to broaden the coverage and allow removing the accounting in the block layer going forward. As psi_memstall_enter can deal with nested calls this will not lead to double accounting even while the bio annotations are still present. Signed-off-by: Christoph Hellwig Acked-by: Johannes Weiner Link: https://lore.kernel.org/r/20220915094200.139713-2-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/pagemap.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 0178b2040ea3..201dc7281640 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -1173,6 +1173,8 @@ struct readahead_control { pgoff_t _index; unsigned int _nr_pages; unsigned int _batch_count; + bool _workingset; + unsigned long _pflags; }; #define DEFINE_READAHEAD(ractl, f, r, m, i) \ -- cgit From 118f3663fbc658e9ad6165e129076981c7b685c5 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 15 Sep 2022 10:42:00 +0100 Subject: block: remove PSI accounting from the bio layer PSI accounting is now done by the VM code, where it should have been since the beginning. Signed-off-by: Christoph Hellwig Acked-by: Johannes Weiner Link: https://lore.kernel.org/r/20220915094200.139713-6-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blk_types.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 41afb4cfb9b0..e0b098089ef2 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -321,7 +321,6 @@ enum { BIO_NO_PAGE_REF, /* don't put release vec pages */ BIO_CLONED, /* doesn't own data */ BIO_BOUNCED, /* bio is a bounce bio */ - BIO_WORKINGSET, /* contains userspace workingset pages */ BIO_QUIET, /* Make BIO Quiet */ BIO_CHAIN, /* chained bio, ->bi_remaining in effect */ BIO_REFFED, /* bio has elevated ->bi_cnt */ -- cgit From 256dea9134c38fcc409ec7ea6a1eb67829b563bc Mon Sep 17 00:00:00 2001 From: Ronak Jain Date: Wed, 14 Sep 2022 18:03:15 +0530 Subject: firmware: xilinx: add support for sd/gem config Add new APIs in firmware to configure SD/GEM registers. Internally it calls PM IOCTL for below SD/GEM register configuration: - SD/EMMC select - SD slot type - SD base clock - SD 8 bit support - SD fixed config - GEM SGMII Mode - GEM fixed config Signed-off-by: Ronak Jain Signed-off-by: Radhey Shyam Pandey Reviewed-by: Claudiu Beznea Signed-off-by: Jakub Kicinski --- include/linux/firmware/xlnx-zynqmp.h | 45 ++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) (limited to 'include/linux') diff --git a/include/linux/firmware/xlnx-zynqmp.h b/include/linux/firmware/xlnx-zynqmp.h index 9f50dacbf7d6..76d2b3ebad84 100644 --- a/include/linux/firmware/xlnx-zynqmp.h +++ b/include/linux/firmware/xlnx-zynqmp.h @@ -153,6 +153,9 @@ enum pm_ioctl_id { /* Runtime feature configuration */ IOCTL_SET_FEATURE_CONFIG = 26, IOCTL_GET_FEATURE_CONFIG = 27, + /* Dynamic SD/GEM configuration */ + IOCTL_SET_SD_CONFIG = 30, + IOCTL_SET_GEM_CONFIG = 31, }; enum pm_query_id { @@ -399,6 +402,30 @@ enum pm_feature_config_id { PM_FEATURE_EXTWDT_VALUE = 4, }; +/** + * enum pm_sd_config_type - PM SD configuration. + * @SD_CONFIG_EMMC_SEL: To set SD_EMMC_SEL in CTRL_REG_SD and SD_SLOTTYPE + * @SD_CONFIG_BASECLK: To set SD_BASECLK in SD_CONFIG_REG1 + * @SD_CONFIG_8BIT: To set SD_8BIT in SD_CONFIG_REG2 + * @SD_CONFIG_FIXED: To set fixed config registers + */ +enum pm_sd_config_type { + SD_CONFIG_EMMC_SEL = 1, + SD_CONFIG_BASECLK = 2, + SD_CONFIG_8BIT = 3, + SD_CONFIG_FIXED = 4, +}; + +/** + * enum pm_gem_config_type - PM GEM configuration. + * @GEM_CONFIG_SGMII_MODE: To set GEM_SGMII_MODE in GEM_CLK_CTRL register + * @GEM_CONFIG_FIXED: To set fixed config registers + */ +enum pm_gem_config_type { + GEM_CONFIG_SGMII_MODE = 1, + GEM_CONFIG_FIXED = 2, +}; + /** * struct zynqmp_pm_query_data - PM query data * @qid: query ID @@ -475,6 +502,9 @@ int zynqmp_pm_is_function_supported(const u32 api_id, const u32 id); int zynqmp_pm_set_feature_config(enum pm_feature_config_id id, u32 value); int zynqmp_pm_get_feature_config(enum pm_feature_config_id id, u32 *payload); int zynqmp_pm_register_sgi(u32 sgi_num, u32 reset); +int zynqmp_pm_set_sd_config(u32 node, enum pm_sd_config_type config, u32 value); +int zynqmp_pm_set_gem_config(u32 node, enum pm_gem_config_type config, + u32 value); #else static inline int zynqmp_pm_get_api_version(u32 *version) { @@ -745,6 +775,21 @@ static inline int zynqmp_pm_register_sgi(u32 sgi_num, u32 reset) { return -ENODEV; } + +static inline int zynqmp_pm_set_sd_config(u32 node, + enum pm_sd_config_type config, + u32 value) +{ + return -ENODEV; +} + +static inline int zynqmp_pm_set_gem_config(u32 node, + enum pm_gem_config_type config, + u32 value) +{ + return -ENODEV; +} + #endif #endif /* __FIRMWARE_ZYNQMP_H__ */ -- cgit From 17df341d35265c8e14d71a48886fc7dcf92ef8a1 Mon Sep 17 00:00:00 2001 From: Christophe JAILLET Date: Sun, 11 Sep 2022 13:39:01 +0200 Subject: headers: Remove some left-over license text Remove a left-over from commit 2874c5fd2842 ("treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 152") There is no need for an empty "License:". Signed-off-by: Christophe JAILLET Link: https://lore.kernel.org/r/0e5ff727626b748238f4b78932f81572143d8f0b.1662896317.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski --- include/linux/if_pppol2tp.h | 2 -- include/linux/if_pppox.h | 2 -- 2 files changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/if_pppol2tp.h b/include/linux/if_pppol2tp.h index 96d40942e5a3..c87efd333faa 100644 --- a/include/linux/if_pppol2tp.h +++ b/include/linux/if_pppol2tp.h @@ -4,8 +4,6 @@ * * This file supplies definitions required by the PPP over L2TP driver * (l2tp_ppp.c). All version information wrt this file is located in l2tp_ppp.c - * - * License: */ #ifndef __LINUX_IF_PPPOL2TP_H #define __LINUX_IF_PPPOL2TP_H diff --git a/include/linux/if_pppox.h b/include/linux/if_pppox.h index 69e813bcb947..ff3beda1312c 100644 --- a/include/linux/if_pppox.h +++ b/include/linux/if_pppox.h @@ -5,8 +5,6 @@ * * This file supplies definitions required by the PPP over Ethernet driver * (pppox.c). All version information wrt this file is located in pppox.c - * - * License: */ #ifndef __LINUX_IF_PPPOX_H #define __LINUX_IF_PPPOX_H -- cgit From fdf214978a71b2749d26f6da2b1d51d9ac23831d Mon Sep 17 00:00:00 2001 From: Daniel Xu Date: Tue, 20 Sep 2022 08:15:24 -0600 Subject: bpf: Move nf_conn extern declarations to filter.h We're seeing the following new warnings on netdev/build_32bit and netdev/build_allmodconfig_warn CI jobs: ../net/core/filter.c:8608:1: warning: symbol 'nf_conn_btf_access_lock' was not declared. Should it be static? ../net/core/filter.c:8611:5: warning: symbol 'nfct_bsa' was not declared. Should it be static? Fix by ensuring extern declaration is present while compiling filter.o. Fixes: 864b656f82cc ("bpf: Add support for writing to nf_conn:mark") Signed-off-by: Daniel Xu Link: https://lore.kernel.org/r/2bd2e0283df36d8a4119605878edb1838d144174.1663683114.git.dxu@dxuuu.xyz Signed-off-by: Martin KaFai Lau --- include/linux/filter.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/filter.h b/include/linux/filter.h index 75335432fcbc..98e28126c24b 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -567,6 +567,12 @@ struct sk_filter { DECLARE_STATIC_KEY_FALSE(bpf_stats_enabled_key); +extern struct mutex nf_conn_btf_access_lock; +extern int (*nfct_btf_struct_access)(struct bpf_verifier_log *log, const struct btf *btf, + const struct btf_type *t, int off, int size, + enum bpf_access_type atype, u32 *next_btf_id, + enum bpf_type_flag *flag); + typedef unsigned int (*bpf_dispatcher_fn)(const void *ctx, const struct bpf_insn *insnsi, unsigned int (*bpf_func)(const void *, -- cgit From 6f9c07be9d020489326098801f0661f754c7c865 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Mon, 5 Sep 2022 16:08:20 -0700 Subject: lib/cpumask: add FORCE_NR_CPUS config option The size of cpumasks is hard-limited by compile-time parameter NR_CPUS, but defined at boot-time when kernel parses ACPI/DT tables, and stored in nr_cpu_ids. In many practical cases, number of CPUs for a target is known at compile time, and can be provided with NR_CPUS. In that case, compiler may be instructed to rely on NR_CPUS as on actual number of CPUs, not an upper limit. It allows to optimize many cpumask routines and significantly shrink size of the kernel image. This patch adds FORCE_NR_CPUS option to teach the compiler to rely on NR_CPUS and enable corresponding optimizations. If FORCE_NR_CPUS=y, kernel will not set nr_cpu_ids at boot, but only check that the actual number of possible CPUs is equal to NR_CPUS, and WARN if that doesn't hold. The new option is especially useful in embedded applications because kernel configurations are unique for each SoC, the number of CPUs is constant and known well, and memory limitations are typically harder. For my 4-CPU ARM64 build with NR_CPUS=4, FORCE_NR_CPUS=y saves 46KB: add/remove: 3/4 grow/shrink: 46/729 up/down: 652/-46952 (-46300) Signed-off-by: Yury Norov --- include/linux/cpumask.h | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 2f6622cead1f..1b442fb2001f 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -35,16 +35,20 @@ typedef struct cpumask { DECLARE_BITMAP(bits, NR_CPUS); } cpumask_t; */ #define cpumask_pr_args(maskp) nr_cpu_ids, cpumask_bits(maskp) -#if NR_CPUS == 1 -#define nr_cpu_ids 1U +#if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS) +#define nr_cpu_ids ((unsigned int)NR_CPUS) #else extern unsigned int nr_cpu_ids; +#endif static inline void set_nr_cpu_ids(unsigned int nr) { +#if (NR_CPUS == 1) || defined(CONFIG_FORCE_NR_CPUS) + WARN_ON(nr != nr_cpu_ids); +#else nr_cpu_ids = nr; -} #endif +} /* Deprecated. Always use nr_cpu_ids. */ #define nr_cpumask_bits nr_cpu_ids -- cgit From 690aa8c3ae308bc696ec8b1b357b995193927083 Mon Sep 17 00:00:00 2001 From: Niklas Cassel Date: Fri, 16 Sep 2022 14:28:32 +0200 Subject: ata: fix ata_id_sense_reporting_enabled() and ata_id_has_sense_reporting() ACS-5 section 7.13.6.41 Words 85..87, 120: Commands and feature sets supported or enabled states that: If bit 15 of word 86 is set to one, bit 14 of word 119 is set to one, and bit 15 of word 119 is cleared to zero, then word 119 is valid. If bit 15 of word 86 is set to one, bit 14 of word 120 is set to one, and bit 15 of word 120 is cleared to zero, then word 120 is valid. (This text also exists in really old ACS standards, e.g. ACS-3.) Currently, ata_id_sense_reporting_enabled() and ata_id_has_sense_reporting() both check bit 15 of word 86, but neither of them check that bit 14 of word 119 is set to one, or that bit 15 of word 119 is cleared to zero. Additionally, make ata_id_sense_reporting_enabled() return false if !ata_id_has_sense_reporting(), similar to how e.g. ata_id_flush_ext_enabled() returns false if !ata_id_has_flush_ext(). Fixes: e87fd28cf9a2 ("libata: Implement support for sense data reporting") Signed-off-by: Niklas Cassel Signed-off-by: Damien Le Moal --- include/linux/ata.h | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ata.h b/include/linux/ata.h index 21292b5bbb55..868bfd503aee 100644 --- a/include/linux/ata.h +++ b/include/linux/ata.h @@ -771,16 +771,21 @@ static inline bool ata_id_has_read_log_dma_ext(const u16 *id) static inline bool ata_id_has_sense_reporting(const u16 *id) { - if (!(id[ATA_ID_CFS_ENABLE_2] & (1 << 15))) + if (!(id[ATA_ID_CFS_ENABLE_2] & BIT(15))) + return false; + if ((id[ATA_ID_COMMAND_SET_3] & (BIT(15) | BIT(14))) != BIT(14)) return false; - return id[ATA_ID_COMMAND_SET_3] & (1 << 6); + return id[ATA_ID_COMMAND_SET_3] & BIT(6); } static inline bool ata_id_sense_reporting_enabled(const u16 *id) { - if (!(id[ATA_ID_CFS_ENABLE_2] & (1 << 15))) + if (!ata_id_has_sense_reporting(id)) + return false; + /* ata_id_has_sense_reporting() == true, word 86 must have bit 15 set */ + if ((id[ATA_ID_COMMAND_SET_4] & (BIT(15) | BIT(14))) != BIT(14)) return false; - return id[ATA_ID_COMMAND_SET_4] & (1 << 6); + return id[ATA_ID_COMMAND_SET_4] & BIT(6); } /** -- cgit From 9c6e09a434e1317e09b78b3b69cd384022ec9a03 Mon Sep 17 00:00:00 2001 From: Niklas Cassel Date: Fri, 16 Sep 2022 14:28:33 +0200 Subject: ata: fix ata_id_has_devslp() ACS-5 section 7.13.6.36 Word 78: Serial ATA features supported states that: If word 76 is not 0000h or FFFFh, word 78 reports the features supported by the device. If this word is not supported, the word shall be cleared to zero. (This text also exists in really old ACS standards, e.g. ACS-3.) Additionally, move the macro to the other ATA_ID_FEATURE_SUPP macros (which already have this check), thus making it more likely that the next ATA_ID_FEATURE_SUPP macro that is added will include this check. Fixes: 65fe1f0f66a5 ("ahci: implement aggressive SATA device sleep support") Signed-off-by: Niklas Cassel Signed-off-by: Damien Le Moal --- include/linux/ata.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ata.h b/include/linux/ata.h index 868bfd503aee..bc136a43689f 100644 --- a/include/linux/ata.h +++ b/include/linux/ata.h @@ -566,6 +566,10 @@ struct ata_bmdma_prd { ((((id)[ATA_ID_SATA_CAPABILITY] != 0x0000) && \ ((id)[ATA_ID_SATA_CAPABILITY] != 0xffff)) && \ ((id)[ATA_ID_FEATURE_SUPP] & (1 << 2))) +#define ata_id_has_devslp(id) \ + ((((id)[ATA_ID_SATA_CAPABILITY] != 0x0000) && \ + ((id)[ATA_ID_SATA_CAPABILITY] != 0xffff)) && \ + ((id)[ATA_ID_FEATURE_SUPP] & (1 << 8))) #define ata_id_iordy_disable(id) ((id)[ATA_ID_CAPABILITY] & (1 << 10)) #define ata_id_has_iordy(id) ((id)[ATA_ID_CAPABILITY] & (1 << 11)) #define ata_id_u32(id,n) \ @@ -578,7 +582,6 @@ struct ata_bmdma_prd { #define ata_id_cdb_intr(id) (((id)[ATA_ID_CONFIG] & 0x60) == 0x20) #define ata_id_has_da(id) ((id)[ATA_ID_SATA_CAPABILITY_2] & (1 << 4)) -#define ata_id_has_devslp(id) ((id)[ATA_ID_FEATURE_SUPP] & (1 << 8)) #define ata_id_has_ncq_autosense(id) \ ((id)[ATA_ID_FEATURE_SUPP] & (1 << 7)) -- cgit From a5fb6bf853148974dbde092ec1bde553bea5e49f Mon Sep 17 00:00:00 2001 From: Niklas Cassel Date: Fri, 16 Sep 2022 14:28:34 +0200 Subject: ata: fix ata_id_has_ncq_autosense() ACS-5 section 7.13.6.36 Word 78: Serial ATA features supported states that: If word 76 is not 0000h or FFFFh, word 78 reports the features supported by the device. If this word is not supported, the word shall be cleared to zero. (This text also exists in really old ACS standards, e.g. ACS-3.) Additionally, move the macro to the other ATA_ID_FEATURE_SUPP macros (which already have this check), thus making it more likely that the next ATA_ID_FEATURE_SUPP macro that is added will include this check. Fixes: 5b01e4b9efa0 ("libata: Implement NCQ autosense") Signed-off-by: Niklas Cassel Signed-off-by: Damien Le Moal --- include/linux/ata.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ata.h b/include/linux/ata.h index bc136a43689f..4845443e0f08 100644 --- a/include/linux/ata.h +++ b/include/linux/ata.h @@ -570,6 +570,10 @@ struct ata_bmdma_prd { ((((id)[ATA_ID_SATA_CAPABILITY] != 0x0000) && \ ((id)[ATA_ID_SATA_CAPABILITY] != 0xffff)) && \ ((id)[ATA_ID_FEATURE_SUPP] & (1 << 8))) +#define ata_id_has_ncq_autosense(id) \ + ((((id)[ATA_ID_SATA_CAPABILITY] != 0x0000) && \ + ((id)[ATA_ID_SATA_CAPABILITY] != 0xffff)) && \ + ((id)[ATA_ID_FEATURE_SUPP] & (1 << 7))) #define ata_id_iordy_disable(id) ((id)[ATA_ID_CAPABILITY] & (1 << 10)) #define ata_id_has_iordy(id) ((id)[ATA_ID_CAPABILITY] & (1 << 11)) #define ata_id_u32(id,n) \ @@ -582,8 +586,6 @@ struct ata_bmdma_prd { #define ata_id_cdb_intr(id) (((id)[ATA_ID_CONFIG] & 0x60) == 0x20) #define ata_id_has_da(id) ((id)[ATA_ID_SATA_CAPABILITY_2] & (1 << 4)) -#define ata_id_has_ncq_autosense(id) \ - ((id)[ATA_ID_FEATURE_SUPP] & (1 << 7)) static inline bool ata_id_has_hipm(const u16 *id) { -- cgit From 630624cb1b5826d753ac8e01a0e42de43d66dedf Mon Sep 17 00:00:00 2001 From: Niklas Cassel Date: Fri, 16 Sep 2022 14:28:35 +0200 Subject: ata: fix ata_id_has_dipm() ACS-5 section 7.13.6.36 Word 78: Serial ATA features supported states that: If word 76 is not 0000h or FFFFh, word 78 reports the features supported by the device. If this word is not supported, the word shall be cleared to zero. (This text also exists in really old ACS standards, e.g. ACS-3.) The problem with ata_id_has_dipm() is that the while it performs a check against 0 and 0xffff, it performs the check against ATA_ID_FEATURE_SUPP (word 78), the same word where the feature bit is stored. Fix this by performing the check against ATA_ID_SATA_CAPABILITY (word 76), like required by the spec. The feature bit check itself is of course still performed against ATA_ID_FEATURE_SUPP (word 78). Additionally, move the macro to the other ATA_ID_FEATURE_SUPP macros (which already have this check), thus making it more likely that the next ATA_ID_FEATURE_SUPP macro that is added will include this check. Fixes: ca77329fb713 ("[libata] Link power management infrastructure") Signed-off-by: Niklas Cassel Signed-off-by: Damien Le Moal --- include/linux/ata.h | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ata.h b/include/linux/ata.h index 4845443e0f08..e3050e153a71 100644 --- a/include/linux/ata.h +++ b/include/linux/ata.h @@ -574,6 +574,10 @@ struct ata_bmdma_prd { ((((id)[ATA_ID_SATA_CAPABILITY] != 0x0000) && \ ((id)[ATA_ID_SATA_CAPABILITY] != 0xffff)) && \ ((id)[ATA_ID_FEATURE_SUPP] & (1 << 7))) +#define ata_id_has_dipm(id) \ + ((((id)[ATA_ID_SATA_CAPABILITY] != 0x0000) && \ + ((id)[ATA_ID_SATA_CAPABILITY] != 0xffff)) && \ + ((id)[ATA_ID_FEATURE_SUPP] & (1 << 3))) #define ata_id_iordy_disable(id) ((id)[ATA_ID_CAPABILITY] & (1 << 10)) #define ata_id_has_iordy(id) ((id)[ATA_ID_CAPABILITY] & (1 << 11)) #define ata_id_u32(id,n) \ @@ -597,17 +601,6 @@ static inline bool ata_id_has_hipm(const u16 *id) return val & (1 << 9); } -static inline bool ata_id_has_dipm(const u16 *id) -{ - u16 val = id[ATA_ID_FEATURE_SUPP]; - - if (val == 0 || val == 0xffff) - return false; - - return val & (1 << 3); -} - - static inline bool ata_id_has_fua(const u16 *id) { if ((id[ATA_ID_CFSSE] & 0xC000) != 0x4000) -- cgit From 7ebb5f8e001037efbdf18afabbbebf71bd98825e Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Wed, 21 Sep 2022 10:40:53 +0800 Subject: Revert "iommu/vt-d: Fix possible recursive locking in intel_iommu_init()" This reverts commit 9cd4f1434479f1ac25c440c421fbf52069079914. Some issues were reported on the original commit. Some thunderbolt devices don't work anymore due to the following DMA fault. DMAR: DRHD: handling fault status reg 2 DMAR: [INTR-REMAP] Request device [09:00.0] fault index 0x8080 [fault reason 0x25] Blocked a compatibility format interrupt request Bring it back for now to avoid functional regression. Fixes: 9cd4f1434479f ("iommu/vt-d: Fix possible recursive locking in intel_iommu_init()") Link: https://lore.kernel.org/linux-iommu/485A6EA5-6D58-42EA-B298-8571E97422DE@getmailspring.com/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=216497 Cc: Mika Westerberg Cc: # 5.19.x Reported-and-tested-by: George Hilliard Signed-off-by: Lu Baolu Link: https://lore.kernel.org/r/20220920081701.3453504-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/dmar.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dmar.h b/include/linux/dmar.h index 8917a32173c4..d81a51978d01 100644 --- a/include/linux/dmar.h +++ b/include/linux/dmar.h @@ -65,7 +65,6 @@ struct dmar_pci_notify_info { extern struct rw_semaphore dmar_global_lock; extern struct list_head dmar_drhd_units; -extern int intel_iommu_enabled; #define for_each_drhd_unit(drhd) \ list_for_each_entry_rcu(drhd, &dmar_drhd_units, list, \ @@ -89,8 +88,7 @@ extern int intel_iommu_enabled; static inline bool dmar_rcu_check(void) { return rwsem_is_locked(&dmar_global_lock) || - system_state == SYSTEM_BOOTING || - (IS_ENABLED(CONFIG_INTEL_IOMMU) && !intel_iommu_enabled); + system_state == SYSTEM_BOOTING; } #define dmar_rcu_dereference(p) rcu_dereference_check((p), dmar_rcu_check()) -- cgit From 65394169bdae073bfb2c6816f5bf095bd7d53e61 Mon Sep 17 00:00:00 2001 From: Michał Kępień Date: Wed, 29 Jun 2022 14:57:34 +0200 Subject: mtd: track maximum number of bitflips for each read request MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit mtd_read_oob() callers are currently oblivious to the details of ECC errors detected during the read operation - they only learn (through the return value) whether any corrected bitflips or uncorrectable errors occurred. More detailed ECC information can be useful to user-space applications for making better-informed choices about moving data around. Extend struct mtd_oob_ops with a pointer to a newly-introduced struct mtd_req_stats and set its 'max_bitflips' field to the maximum number of bitflips found in a single ECC step during the read operation performed by mtd_read_oob(). This is a prerequisite for ultimately passing that value back to user space. Suggested-by: Boris Brezillon Signed-off-by: Michał Kępień Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20220629125737.14418-2-kernel@kempniu.pl --- include/linux/mtd/mtd.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h index 955aee14b0f7..fccad1766458 100644 --- a/include/linux/mtd/mtd.h +++ b/include/linux/mtd/mtd.h @@ -40,6 +40,10 @@ struct mtd_erase_region_info { unsigned long *lockmap; /* If keeping bitmap of locks */ }; +struct mtd_req_stats { + unsigned int max_bitflips; +}; + /** * struct mtd_oob_ops - oob operation operands * @mode: operation mode @@ -70,6 +74,7 @@ struct mtd_oob_ops { uint32_t ooboffs; uint8_t *datbuf; uint8_t *oobbuf; + struct mtd_req_stats *stats; }; /** -- cgit From 7bea6056927727f98f4efdd338f112f7517f05b5 Mon Sep 17 00:00:00 2001 From: Michał Kępień Date: Wed, 29 Jun 2022 14:57:36 +0200 Subject: mtd: add ECC error accounting for each read request MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Extend struct mtd_req_stats with two new fields holding the number of corrected bitflips and uncorrectable errors detected during a read operation. This is a prerequisite for ultimately passing those counters to user space, where they can be useful to applications for making better-informed choices about moving data around. Unlike 'max_bitflips' (which is set - in a common code path - to the return value of a function called while the MTD device's mutex is held), these counters have to be maintained in each MTD driver which defines the '_read_oob' callback because the statistics need to be calculated while the MTD device's mutex is held. Suggested-by: Boris Brezillon Signed-off-by: Michał Kępień Signed-off-by: Miquel Raynal Link: https://lore.kernel.org/linux-mtd/20220629125737.14418-4-kernel@kempniu.pl --- include/linux/mtd/mtd.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h index fccad1766458..c12a5930f32c 100644 --- a/include/linux/mtd/mtd.h +++ b/include/linux/mtd/mtd.h @@ -41,6 +41,8 @@ struct mtd_erase_region_info { }; struct mtd_req_stats { + unsigned int uncorrectable_errors; + unsigned int corrected_bitflips; unsigned int max_bitflips; }; -- cgit From b2bed51a5261f4266ecb857bba680a7f668d3ddf Mon Sep 17 00:00:00 2001 From: Bart Van Assche Date: Tue, 20 Sep 2022 13:06:26 -0700 Subject: block: Fix the enum blk_eh_timer_return documentation The documentation of the blk_eh_timer_return enumeration values does not reflect correctly how e.g. the SCSI core uses these values. Fix the documentation. Cc: Christoph Hellwig Cc: Ming Lei Cc: Hannes Reinecke Cc: Damien Le Moal Cc: Johannes Thumshirn Fixes: 88b0cfad2888 ("block: document the blk_eh_timer_return values") Signed-off-by: Bart Van Assche Reviewed-by: Johannes Thumshirn Reviewed-by: Damien Le Moal Link: https://lore.kernel.org/r/20220920200626.3422296-1-bvanassche@acm.org Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 74b99d716b0b..e21ff85751cf 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -268,9 +268,16 @@ static inline void rq_list_move(struct request **src, struct request **dst, rq_list_add(dst, rq); } +/** + * enum blk_eh_timer_return - How the timeout handler should proceed + * @BLK_EH_DONE: The block driver completed the command or will complete it at + * a later time. + * @BLK_EH_RESET_TIMER: Reset the request timer and continue waiting for the + * request to complete. + */ enum blk_eh_timer_return { - BLK_EH_DONE, /* drivers has completed the command */ - BLK_EH_RESET_TIMER, /* reset timer and try again */ + BLK_EH_DONE, + BLK_EH_RESET_TIMER, }; #define BLK_TAG_ALLOC_FIFO 0 /* allocate starting from 0 */ -- cgit From 9f0deaa12d832f488500a5afe9b912e9b3cfc432 Mon Sep 17 00:00:00 2001 From: Dylan Yudaken Date: Tue, 16 Aug 2022 06:59:59 -0700 Subject: eventfd: guard wake_up in eventfd fs calls as well Guard wakeups that the user can trigger, and that may end up triggering a call back into eventfd_signal. This is in addition to the current approach that only guards in eventfd_signal. Rename in_eventfd_signal -> in_eventfd at the same time to reflect this. Without this there would be a deadlock in the following code using libaio: int main() { struct io_context *ctx = NULL; struct iocb iocb; struct iocb *iocbs[] = { &iocb }; int evfd; uint64_t val = 1; evfd = eventfd(0, EFD_CLOEXEC); assert(!io_setup(2, &ctx)); io_prep_poll(&iocb, evfd, POLLIN); io_set_eventfd(&iocb, evfd); assert(1 == io_submit(ctx, 1, iocbs)); write(evfd, &val, 8); } Signed-off-by: Dylan Yudaken Reviewed-by: Jens Axboe Link: https://lore.kernel.org/r/20220816135959.1490641-1-dylany@fb.com Signed-off-by: Jens Axboe --- include/linux/eventfd.h | 2 +- include/linux/sched.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h index 305d5f19093b..30eb30d6909b 100644 --- a/include/linux/eventfd.h +++ b/include/linux/eventfd.h @@ -46,7 +46,7 @@ void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt); static inline bool eventfd_signal_allowed(void) { - return !current->in_eventfd_signal; + return !current->in_eventfd; } #else /* CONFIG_EVENTFD */ diff --git a/include/linux/sched.h b/include/linux/sched.h index e7b2f8a5c711..8d82d6d32670 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -936,7 +936,7 @@ struct task_struct { #endif #ifdef CONFIG_EVENTFD /* Recursion prevention for eventfd_signal() */ - unsigned in_eventfd_signal:1; + unsigned in_eventfd:1; #endif #ifdef CONFIG_IOMMU_SVA unsigned pasid_activated:1; -- cgit From c0e0d6ba25f180ab76d3c18f8b360a119dffa634 Mon Sep 17 00:00:00 2001 From: Dylan Yudaken Date: Tue, 30 Aug 2022 05:50:10 -0700 Subject: io_uring: add IORING_SETUP_DEFER_TASKRUN Allow deferring async tasks until the user calls io_uring_enter(2) with the IORING_ENTER_GETEVENTS flag. Enable this mode with a flag at io_uring_setup time. This functionality requires that the later io_uring_enter will be called from the same submission task, and therefore restrict this flag to work only when IORING_SETUP_SINGLE_ISSUER is also set. Being able to hand pick when tasks are run prevents the problem where there is current work to be done, however task work runs anyway. For example, a common workload would obtain a batch of CQEs, and process each one. Interrupting this to additional taskwork would add latency but not gain anything. If instead task work is deferred to just before more CQEs are obtained then no additional latency is added. The way this is implemented is by trying to keep task work local to a io_ring_ctx, rather than to the submission task. This is required, as the application will want to wake up only a single io_ring_ctx at a time to process work, and so the lists of work have to be kept separate. This has some other benefits like not having to check the task continually in handle_tw_list (and potentially unlocking/locking those), and reducing locks in the submit & process completions path. There are networking cases where using this option can reduce request latency by 50%. For example a contrived example using [1] where the client sends 2k data and receives the same data back while doing some system calls (to trigger task work) shows this reduction. The reason ends up being that if sending responses is delayed by processing task work, then the client side sits idle. Whereas reordering the sends first means that the client runs it's workload in parallel with the local task work. [1]: Using https://github.com/DylanZA/netbench/tree/defer_run Client: ./netbench --client_only 1 --control_port 10000 --host --tx "epoll --threads 16 --per_thread 1 --size 2048 --resp 2048 --workload 1000" Server: ./netbench --server_only 1 --control_port 10000 --rx "io_uring --defer_taskrun 0 --workload 100" --rx "io_uring --defer_taskrun 1 --workload 100" Signed-off-by: Dylan Yudaken Link: https://lore.kernel.org/r/20220830125013.570060-5-dylany@fb.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 677a25d44d7f..d56ff2185168 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -301,6 +301,8 @@ struct io_ring_ctx { struct io_hash_table cancel_table; bool poll_multi_queue; + struct llist_head work_llist; + struct list_head io_buffers_comp; } ____cacheline_aligned_in_smp; -- cgit From 21a091b970cdbcf3e8ff829234b51be6f9192766 Mon Sep 17 00:00:00 2001 From: Dylan Yudaken Date: Tue, 30 Aug 2022 05:50:12 -0700 Subject: io_uring: signal registered eventfd to process deferred task work Some workloads rely on a registered eventfd (via io_uring_register_eventfd(3)) in order to wake up and process the io_uring. In the case of a ring setup with IORING_SETUP_DEFER_TASKRUN, that eventfd also needs to be signalled when there are tasks to run. This changes an old behaviour which assumed 1 eventfd signal implied at least 1 CQE, however only when this new flag is set (and so old users will not notice). This should be expected with the IORING_SETUP_DEFER_TASKRUN flag as it is not guaranteed that every task will result in a CQE. Signed-off-by: Dylan Yudaken Link: https://lore.kernel.org/r/20220830125013.570060-7-dylany@fb.com [axboe: fold in call_rcu() serialization fix] Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index d56ff2185168..aa4d90a53866 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -184,6 +184,8 @@ struct io_ev_fd { struct eventfd_ctx *cq_ev_fd; unsigned int eventfd_async: 1; struct rcu_head rcu; + atomic_t refs; + atomic_t ops; }; struct io_alloc_cache { -- cgit From de27e18e86173b704beaa19f0ee376f3305c4794 Mon Sep 17 00:00:00 2001 From: Kanchan Joshi Date: Tue, 23 Aug 2022 21:44:40 +0530 Subject: fs: add file_operations->uring_cmd_iopoll io_uring will invoke this to do completion polling on uring-cmd operations. Signed-off-by: Kanchan Joshi Link: https://lore.kernel.org/r/20220823161443.49436-2-joshi.k@samsung.com Signed-off-by: Jens Axboe --- include/linux/fs.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 9eced4cc286e..d6badd19784f 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2132,6 +2132,7 @@ struct file_operations { loff_t len, unsigned int remap_flags); int (*fadvise)(struct file *, loff_t, loff_t, int); int (*uring_cmd)(struct io_uring_cmd *ioucmd, unsigned int issue_flags); + int (*uring_cmd_iopoll)(struct io_uring_cmd *ioucmd); } __randomize_layout; struct inode_operations { -- cgit From 5756a3a7e713bcab705a5f0c810a2b1f7f4ecfaa Mon Sep 17 00:00:00 2001 From: Kanchan Joshi Date: Tue, 23 Aug 2022 21:44:41 +0530 Subject: io_uring: add iopoll infrastructure for io_uring_cmd Put this up in the same way as iopoll is done for regular read/write IO. Make place for storing a cookie into struct io_uring_cmd on submission. Perform the completion using the ->uring_cmd_iopoll handler. Signed-off-by: Kanchan Joshi Signed-off-by: Pankaj Raghav Link: https://lore.kernel.org/r/20220823161443.49436-3-joshi.k@samsung.com Signed-off-by: Jens Axboe --- include/linux/io_uring.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index 4a2f6cc5a492..58676c0a398f 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -20,8 +20,12 @@ enum io_uring_cmd_flags { struct io_uring_cmd { struct file *file; const void *cmd; - /* callback to defer completions to task context */ - void (*task_work_cb)(struct io_uring_cmd *cmd); + union { + /* callback to defer completions to task context */ + void (*task_work_cb)(struct io_uring_cmd *cmd); + /* used for polled completion */ + void *cookie; + }; u32 cmd_op; u32 pad; u8 pdu[32]; /* available inline for free use */ -- cgit From c6e99ea482e2a9e1fef2488891242f9749584225 Mon Sep 17 00:00:00 2001 From: Kanchan Joshi Date: Tue, 23 Aug 2022 21:44:42 +0530 Subject: block: export blk_rq_is_poll This is in preparation to support iopoll for nvme passthrough. Signed-off-by: Kanchan Joshi Link: https://lore.kernel.org/r/20220823161443.49436-4-joshi.k@samsung.com Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 92294a5fb083..de384f5d2c37 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -980,6 +980,7 @@ int blk_rq_map_kern(struct request_queue *, struct request *, void *, int blk_rq_append_bio(struct request *rq, struct bio *bio); void blk_execute_rq_nowait(struct request *rq, bool at_head); blk_status_t blk_execute_rq(struct request *rq, bool at_head); +bool blk_rq_is_poll(struct request *rq); struct req_iterator { struct bvec_iter iter; -- cgit From de97fcb30316410a2c46be102f074a454ecc6cf1 Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Fri, 2 Sep 2022 15:18:05 -0600 Subject: fs: add batch and poll flags to the uring_cmd_iopoll() handler We need the poll_flags to know how to poll for the IO, and we should have the batch structure in preparation for supporting batched completions with iopoll. Signed-off-by: Jens Axboe --- include/linux/fs.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index d6badd19784f..01681d061a6a 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2132,7 +2132,8 @@ struct file_operations { loff_t len, unsigned int remap_flags); int (*fadvise)(struct file *, loff_t, loff_t, int); int (*uring_cmd)(struct io_uring_cmd *ioucmd, unsigned int issue_flags); - int (*uring_cmd_iopoll)(struct io_uring_cmd *ioucmd); + int (*uring_cmd_iopoll)(struct io_uring_cmd *, struct io_comp_batch *, + unsigned int poll_flags); } __randomize_layout; struct inode_operations { -- cgit From 1d7b61c06dc310421911dac7c5d2d15b754c8b63 Mon Sep 17 00:00:00 2001 From: Arnaud Pouliquen Date: Wed, 21 Sep 2022 15:50:44 +0200 Subject: remoteproc: virtio: Create platform device for the remoteproc_virtio Define a platform driver to manage the remoteproc virtio device as a platform devices. The platform device allows to pass rproc_vdev_data platform data to specify properties that are stored in the rproc_vdev structure. Such approach will allow to preserve legacy remoteproc virtio device creation but also to probe the device using device tree mechanism. remoteproc_virtio.c update: - Add rproc_virtio_driver platform driver. The probe ops replaces the rproc_rvdev_add_device function. - All reference to the rvdev->dev has been updated to rvdev-pdev->dev. - rproc_rvdev_release is removed as associated to the rvdev device. - The use of rvdev->kref counter is replaced by get/put_device on the remoteproc virtio platform device. - The vdev device no longer increments rproc device counter. increment/decrement is done in rproc_virtio_probe/rproc_virtio_remove function in charge of the vrings allocation/free. remoteproc_core.c update: Migrate from the rvdev device to the rvdev platform device. From this patch, when a vdev resource is found in the resource table the remoteproc core register a platform device. Signed-off-by: Arnaud Pouliquen Link: https://lore.kernel.org/r/20220921135044.917140-5-arnaud.pouliquen@foss.st.com Signed-off-by: Mathieu Poirier --- include/linux/remoteproc.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h index aea79c77db0f..1abf56ad02da 100644 --- a/include/linux/remoteproc.h +++ b/include/linux/remoteproc.h @@ -616,9 +616,8 @@ struct rproc_vring { /** * struct rproc_vdev - remoteproc state for a supported virtio device - * @refcount: reference counter for the vdev and vring allocations * @subdev: handle for registering the vdev as a rproc subdevice - * @dev: device struct used for reference count semantics + * @pdev: remoteproc virtio platform device * @id: virtio device id (as in virtio_ids.h) * @node: list node * @rproc: the rproc handle @@ -627,10 +626,9 @@ struct rproc_vring { * @index: vdev position versus other vdev declared in resource table */ struct rproc_vdev { - struct kref refcount; struct rproc_subdev subdev; - struct device dev; + struct platform_device *pdev; unsigned int id; struct list_head node; -- cgit From 14a99e130f2758bc826a7e7a8bdf6f7400b54f0f Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Wed, 14 Sep 2022 19:07:28 -0700 Subject: lib/find_bit: create find_first_zero_bit_le() find_first_zero_bit_le() is an alias to find_next_zero_bit_le(), despite that 'next' is known to be slower than 'first' version. Now that we have common FIND_FIRST_BIT() macro helper, it's trivial to implement find_first_zero_bit_le() as a real function. Reviewed-by: Valentin Schneider Signed-off-by: Yury Norov --- include/linux/find.h | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/find.h b/include/linux/find.h index 424ef67d4a42..2464bff5de04 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -17,6 +17,10 @@ extern unsigned long _find_first_and_bit(const unsigned long *addr1, extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size); extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size); +#ifdef __BIG_ENDIAN +unsigned long _find_first_zero_bit_le(const unsigned long *addr, unsigned long size); +#endif + #ifndef find_next_bit /** * find_next_bit - find the next set bit in a memory region @@ -251,6 +255,20 @@ unsigned long find_next_zero_bit_le(const void *addr, unsigned } #endif +#ifndef find_first_zero_bit_le +static inline +unsigned long find_first_zero_bit_le(const void *addr, unsigned long size) +{ + if (small_const_nbits(size)) { + unsigned long val = swab(*(const unsigned long *)addr) | ~GENMASK(size - 1, 0); + + return val == ~0UL ? size : ffz(val); + } + + return _find_first_zero_bit_le(addr, size); +} +#endif + #ifndef find_next_bit_le static inline unsigned long find_next_bit_le(const void *addr, unsigned @@ -270,11 +288,6 @@ unsigned long find_next_bit_le(const void *addr, unsigned } #endif -#ifndef find_first_zero_bit_le -#define find_first_zero_bit_le(addr, size) \ - find_next_zero_bit_le((addr), (size), 0) -#endif - #else #error "Please fix " #endif -- cgit From e79864f3164c573afce09ec4b72b75ebe061c14d Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Wed, 14 Sep 2022 19:07:29 -0700 Subject: lib/find_bit: optimize find_next_bit() functions Over the past couple years, the function _find_next_bit() was extended with parameters that modify its behavior to implement and- zero- and le- flavors. The parameters are passed at compile time, but current design prevents a compiler from optimizing out the conditionals. As find_next_bit() API grows, I expect that more parameters will be added. Current design would require more conditional code in _find_next_bit(), which would bloat the helper even more and make it barely readable. This patch replaces _find_next_bit() with a macro FIND_NEXT_BIT, and adds a set of wrappers, so that the compile-time optimizations become possible. The common logic is moved to the new macro, and all flavors may be generated by providing a FETCH macro parameter, like in this example: #define FIND_NEXT_BIT(FETCH, MUNGE, size, start) ... find_next_xornot_and_bit(addr1, addr2, addr3, size, start) { return FIND_NEXT_BIT(addr1[idx] ^ ~addr2[idx] & addr3[idx], /* nop */, size, start); } The FETCH may be of any complexity, as soon as it only refers the bitmap(s) and an iterator idx. MUNGE is here to support _le code generation for BE builds. May be empty. I ran find_bit_benchmark 16 times on top of 6.0-rc2 and 16 times on top of 6.0-rc2 + this series. The results for kvm/x86_64 are: v6.0-rc2 Optimized Difference Z-score Random dense bitmap ns ns ns % find_next_bit: 787735 670546 117189 14.9 3.97 find_next_zero_bit: 777492 664208 113284 14.6 10.51 find_last_bit: 830925 687573 143352 17.3 2.35 find_first_bit: 3874366 3306635 567731 14.7 1.84 find_first_and_bit: 40677125 37739887 2937238 7.2 1.36 find_next_and_bit: 347865 304456 43409 12.5 1.35 Random sparse bitmap find_next_bit: 19816 14021 5795 29.2 6.10 find_next_zero_bit: 1318901 1223794 95107 7.2 1.41 find_last_bit: 14573 13514 1059 7.3 6.92 find_first_bit: 1313321 1249024 64297 4.9 1.53 find_first_and_bit: 8921 8098 823 9.2 4.56 find_next_and_bit: 9796 7176 2620 26.7 5.39 Where the statistics is significant (z-score > 3), the improvement is ~15%. According to the bloat-o-meter, the Image size is 10-11K less: x86_64/defconfig: add/remove: 32/14 grow/shrink: 61/782 up/down: 6344/-16521 (-10177) arm64/defconfig: add/remove: 3/2 grow/shrink: 50/714 up/down: 608/-11556 (-10948) Suggested-by: Linus Torvalds Signed-off-by: Yury Norov --- include/linux/find.h | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/find.h b/include/linux/find.h index 2464bff5de04..dead6f53a97b 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -8,9 +8,12 @@ #include -extern unsigned long _find_next_bit(const unsigned long *addr1, - const unsigned long *addr2, unsigned long nbits, - unsigned long start, unsigned long invert, unsigned long le); +unsigned long _find_next_bit(const unsigned long *addr1, unsigned long nbits, + unsigned long start); +unsigned long _find_next_and_bit(const unsigned long *addr1, const unsigned long *addr2, + unsigned long nbits, unsigned long start); +unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long nbits, + unsigned long start); extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size); extern unsigned long _find_first_and_bit(const unsigned long *addr1, const unsigned long *addr2, unsigned long size); @@ -19,6 +22,10 @@ extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long siz #ifdef __BIG_ENDIAN unsigned long _find_first_zero_bit_le(const unsigned long *addr, unsigned long size); +unsigned long _find_next_zero_bit_le(const unsigned long *addr, unsigned + long size, unsigned long offset); +unsigned long _find_next_bit_le(const unsigned long *addr, unsigned + long size, unsigned long offset); #endif #ifndef find_next_bit @@ -45,7 +52,7 @@ unsigned long find_next_bit(const unsigned long *addr, unsigned long size, return val ? __ffs(val) : size; } - return _find_next_bit(addr, NULL, size, offset, 0UL, 0); + return _find_next_bit(addr, size, offset); } #endif @@ -75,7 +82,7 @@ unsigned long find_next_and_bit(const unsigned long *addr1, return val ? __ffs(val) : size; } - return _find_next_bit(addr1, addr2, size, offset, 0UL, 0); + return _find_next_and_bit(addr1, addr2, size, offset); } #endif @@ -103,7 +110,7 @@ unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size, return val == ~0UL ? size : ffz(val); } - return _find_next_bit(addr, NULL, size, offset, ~0UL, 0); + return _find_next_zero_bit(addr, size, offset); } #endif @@ -251,7 +258,7 @@ unsigned long find_next_zero_bit_le(const void *addr, unsigned return val == ~0UL ? size : ffz(val); } - return _find_next_bit(addr, NULL, size, offset, ~0UL, 1); + return _find_next_zero_bit_le(addr, size, offset); } #endif @@ -284,7 +291,7 @@ unsigned long find_next_bit_le(const void *addr, unsigned return val ? __ffs(val) : size; } - return _find_next_bit(addr, NULL, size, offset, 0UL, 1); + return _find_next_bit_le(addr, size, offset); } #endif -- cgit From cb9ff3f3b84c95867856c3be42de73972feb1249 Mon Sep 17 00:00:00 2001 From: Kevin Tian Date: Wed, 21 Sep 2022 18:43:47 +0800 Subject: vfio: Add helpers for unifying vfio_device life cycle The idea is to let vfio core manage the vfio_device life cycle instead of duplicating the logic cross drivers. This is also a preparatory step for adding struct device into vfio_device. New pair of helpers together with a kref in vfio_device: - vfio_alloc_device() - vfio_put_device() Drivers can register @init/@release callbacks to manage any private state wrapping the vfio_device. However vfio-ccw doesn't fit this model due to a life cycle mess that its private structure mixes both parent and mdev info hence must be allocated/freed outside of the life cycle of vfio device. Per prior discussions this won't be fixed in short term by IBM folks. Instead of waiting for those modifications introduce another helper vfio_init_device() so ccw can call it to initialize a pre-allocated vfio_device. Further implication of the ccw trick is that vfio_device cannot be freed uniformly in vfio core. Instead, require *EVERY* driver to implement @release and free vfio_device inside. Then ccw can choose to delay the free at its own discretion. Another trick down the road is that kvzalloc() is used to accommodate the need of gvt which uses vzalloc() while all others use kzalloc(). So drivers should call a helper vfio_free_device() to free the vfio_device instead of assuming that kfree() or vfree() is appliable. Later once the ccw mess is fixed we can remove those tricks and fully handle structure alloc/free in vfio core. Existing vfio_{un}init_group_dev() will be deprecated after all existing usages are converted to the new model. Suggested-by: Jason Gunthorpe Co-developed-by: Yi Liu Signed-off-by: Yi Liu Signed-off-by: Kevin Tian Reviewed-by: Tony Krowiak Reviewed-by: Jason Gunthorpe Reviewed-by: Eric Auger Link: https://lore.kernel.org/r/20220921104401.38898-2-kevin.tian@intel.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 0e2826559091..f67cac700e6f 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -47,7 +47,8 @@ struct vfio_device { struct kvm *kvm; /* Members below here are private, not for driver use */ - refcount_t refcount; + struct kref kref; /* object life cycle */ + refcount_t refcount; /* user count on registered device*/ unsigned int open_count; struct completion comp; struct list_head group_next; @@ -57,6 +58,8 @@ struct vfio_device { /** * struct vfio_device_ops - VFIO bus driver device callbacks * + * @init: initialize private fields in device structure + * @release: Reclaim private fields in device structure * @open_device: Called when the first file descriptor is opened for this device * @close_device: Opposite of open_device * @read: Perform read(2) on device file descriptor @@ -74,6 +77,8 @@ struct vfio_device { */ struct vfio_device_ops { char *name; + int (*init)(struct vfio_device *vdev); + void (*release)(struct vfio_device *vdev); int (*open_device)(struct vfio_device *vdev); void (*close_device)(struct vfio_device *vdev); ssize_t (*read)(struct vfio_device *vdev, char __user *buf, @@ -161,6 +166,24 @@ static inline int vfio_check_feature(u32 flags, size_t argsz, u32 supported_ops, return 1; } +struct vfio_device *_vfio_alloc_device(size_t size, struct device *dev, + const struct vfio_device_ops *ops); +#define vfio_alloc_device(dev_struct, member, dev, ops) \ + container_of(_vfio_alloc_device(sizeof(struct dev_struct) + \ + BUILD_BUG_ON_ZERO(offsetof( \ + struct dev_struct, member)), \ + dev, ops), \ + struct dev_struct, member) + +int vfio_init_device(struct vfio_device *device, struct device *dev, + const struct vfio_device_ops *ops); +void vfio_free_device(struct vfio_device *device); +void vfio_device_release(struct kref *kref); +static inline void vfio_put_device(struct vfio_device *device) +{ + kref_put(&device->kref, vfio_device_release); +} + void vfio_init_group_dev(struct vfio_device *device, struct device *dev, const struct vfio_device_ops *ops); void vfio_uninit_group_dev(struct vfio_device *device); -- cgit From 63d7c77989de98d3e92611dbb858028b74dca377 Mon Sep 17 00:00:00 2001 From: Yi Liu Date: Wed, 21 Sep 2022 18:43:48 +0800 Subject: vfio/pci: Use the new device life cycle helpers Also introduce two pci core helpers as @init/@release for pci drivers: - vfio_pci_core_init_dev() - vfio_pci_core_release_dev() Signed-off-by: Yi Liu Signed-off-by: Kevin Tian Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20220921104401.38898-3-kevin.tian@intel.com Signed-off-by: Alex Williamson --- include/linux/vfio_pci_core.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index 089b603bcfdc..0499ea836058 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -109,6 +109,8 @@ void vfio_pci_core_close_device(struct vfio_device *core_vdev); void vfio_pci_core_init_device(struct vfio_pci_core_device *vdev, struct pci_dev *pdev, const struct vfio_device_ops *vfio_pci_ops); +int vfio_pci_core_init_dev(struct vfio_device *core_vdev); +void vfio_pci_core_release_dev(struct vfio_device *core_vdev); int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev); void vfio_pci_core_uninit_device(struct vfio_pci_core_device *vdev); void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev); -- cgit From 27aeb915595b87165a3004aab05b2b837d01e6ed Mon Sep 17 00:00:00 2001 From: Yi Liu Date: Wed, 21 Sep 2022 18:43:50 +0800 Subject: vfio/hisi_acc: Use the new device life cycle helpers Tidy up @probe so all migration specific initialization logic is moved to migration specific @init callback. Remove vfio_pci_core_{un}init_device() given no user now. Signed-off-by: Yi Liu Signed-off-by: Kevin Tian Reviewed-by: Jason Gunthorpe Reviewed-by: Shameer Kolothum Link: https://lore.kernel.org/r/20220921104401.38898-5-kevin.tian@intel.com Signed-off-by: Alex Williamson --- include/linux/vfio_pci_core.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio_pci_core.h b/include/linux/vfio_pci_core.h index 0499ea836058..367fd79226a3 100644 --- a/include/linux/vfio_pci_core.h +++ b/include/linux/vfio_pci_core.h @@ -106,13 +106,9 @@ int vfio_pci_core_register_dev_region(struct vfio_pci_core_device *vdev, void vfio_pci_core_set_params(bool nointxmask, bool is_disable_vga, bool is_disable_idle_d3); void vfio_pci_core_close_device(struct vfio_device *core_vdev); -void vfio_pci_core_init_device(struct vfio_pci_core_device *vdev, - struct pci_dev *pdev, - const struct vfio_device_ops *vfio_pci_ops); int vfio_pci_core_init_dev(struct vfio_device *core_vdev); void vfio_pci_core_release_dev(struct vfio_device *core_vdev); int vfio_pci_core_register_device(struct vfio_pci_core_device *vdev); -void vfio_pci_core_uninit_device(struct vfio_pci_core_device *vdev); void vfio_pci_core_unregister_device(struct vfio_pci_core_device *vdev); extern const struct pci_error_handlers vfio_pci_core_err_handlers; int vfio_pci_core_sriov_configure(struct vfio_pci_core_device *vdev, -- cgit From ebb72b765fb49685c4603d2bff47a4ab5d2580a9 Mon Sep 17 00:00:00 2001 From: Kevin Tian Date: Wed, 21 Sep 2022 18:43:59 +0800 Subject: vfio/ccw: Use the new device life cycle helpers ccw is the only exception which cannot use vfio_alloc_device() because its private device structure is designed to serve both mdev and parent. Life cycle of the parent is managed by css_driver so vfio_ccw_private must be allocated/freed in css_driver probe/remove path instead of conforming to vfio core life cycle for mdev. Given that use a wait/completion scheme so the mdev remove path waits after vfio_put_device() until receiving a completion notification from @release. The completion indicates that all active references on vfio_device have been released. After that point although free of vfio_ccw_private is delayed to css_driver it's at least guaranteed to have no parallel reference on released vfio device part from other code paths. memset() in @probe is removed. vfio_device is either already cleared when probed for the first time or cleared in @release from last probe. The right fix is to introduce separate structures for mdev and parent, but this won't happen in short term per prior discussions. Remove vfio_init/uninit_group_dev() as no user now. Suggested-by: Jason Gunthorpe Signed-off-by: Kevin Tian Reviewed-by: Jason Gunthorpe Reviewed-by: Eric Farman Link: https://lore.kernel.org/r/20220921104401.38898-14-kevin.tian@intel.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index f67cac700e6f..3cf857b1eec7 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -184,9 +184,6 @@ static inline void vfio_put_device(struct vfio_device *device) kref_put(&device->kref, vfio_device_release); } -void vfio_init_group_dev(struct vfio_device *device, struct device *dev, - const struct vfio_device_ops *ops); -void vfio_uninit_group_dev(struct vfio_device *device); int vfio_register_group_dev(struct vfio_device *device); int vfio_register_emulated_iommu_dev(struct vfio_device *device); void vfio_unregister_group_dev(struct vfio_device *device); -- cgit From 3c28a76124b25882411f005924be73795b6ef078 Mon Sep 17 00:00:00 2001 From: Yi Liu Date: Wed, 21 Sep 2022 18:44:01 +0800 Subject: vfio: Add struct device to vfio_device and replace kref. With it a 'vfio-dev/vfioX' node is created under the sysfs path of the parent, indicating the device is bound to a vfio driver, e.g.: /sys/devices/pci0000\:6f/0000\:6f\:01.0/vfio-dev/vfio0 It is also a preparatory step toward adding cdev for supporting future device-oriented uAPI. Add Documentation/ABI/testing/sysfs-devices-vfio-dev. Suggested-by: Jason Gunthorpe Signed-off-by: Yi Liu Signed-off-by: Kevin Tian Reviewed-by: Jason Gunthorpe Link: https://lore.kernel.org/r/20220921104401.38898-16-kevin.tian@intel.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index 3cf857b1eec7..ee399a768070 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -47,7 +47,8 @@ struct vfio_device { struct kvm *kvm; /* Members below here are private, not for driver use */ - struct kref kref; /* object life cycle */ + unsigned int index; + struct device device; /* device.kref covers object life circle */ refcount_t refcount; /* user count on registered device*/ unsigned int open_count; struct completion comp; @@ -178,10 +179,9 @@ struct vfio_device *_vfio_alloc_device(size_t size, struct device *dev, int vfio_init_device(struct vfio_device *device, struct device *dev, const struct vfio_device_ops *ops); void vfio_free_device(struct vfio_device *device); -void vfio_device_release(struct kref *kref); static inline void vfio_put_device(struct vfio_device *device) { - kref_put(&device->kref, vfio_device_release); + put_device(&device->device); } int vfio_register_group_dev(struct vfio_device *device); -- cgit From 583c1f420173f7d84413a1a1fbf5109d798b4faa Mon Sep 17 00:00:00 2001 From: David Vernet Date: Mon, 19 Sep 2022 19:00:57 -0500 Subject: bpf: Define new BPF_MAP_TYPE_USER_RINGBUF map type We want to support a ringbuf map type where samples are published from user-space, to be consumed by BPF programs. BPF currently supports a kernel -> user-space circular ring buffer via the BPF_MAP_TYPE_RINGBUF map type. We'll need to define a new map type for user-space -> kernel, as none of the helpers exported for BPF_MAP_TYPE_RINGBUF will apply to a user-space producer ring buffer, and we'll want to add one or more helper functions that would not apply for a kernel-producer ring buffer. This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type definition. The map type is useless in its current form, as there is no way to access or use it for anything until we one or more BPF helpers. A follow-on patch will therefore add a new helper function that allows BPF programs to run callbacks on samples that are published to the ring buffer. Signed-off-by: David Vernet Signed-off-by: Andrii Nakryiko Acked-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20220920000100.477320-2-void@manifault.com --- include/linux/bpf_types.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h index 2b9112b80171..2c6a4f2562a7 100644 --- a/include/linux/bpf_types.h +++ b/include/linux/bpf_types.h @@ -126,6 +126,7 @@ BPF_MAP_TYPE(BPF_MAP_TYPE_STRUCT_OPS, bpf_struct_ops_map_ops) #endif BPF_MAP_TYPE(BPF_MAP_TYPE_RINGBUF, ringbuf_map_ops) BPF_MAP_TYPE(BPF_MAP_TYPE_BLOOM_FILTER, bloom_filter_map_ops) +BPF_MAP_TYPE(BPF_MAP_TYPE_USER_RINGBUF, user_ringbuf_map_ops) BPF_LINK_TYPE(BPF_LINK_TYPE_RAW_TRACEPOINT, raw_tracepoint) BPF_LINK_TYPE(BPF_LINK_TYPE_TRACING, tracing) -- cgit From 20571567384428dfc9fe5cf9f2e942e1df13c2dd Mon Sep 17 00:00:00 2001 From: David Vernet Date: Mon, 19 Sep 2022 19:00:58 -0500 Subject: bpf: Add bpf_user_ringbuf_drain() helper In a prior change, we added a new BPF_MAP_TYPE_USER_RINGBUF map type which will allow user-space applications to publish messages to a ring buffer that is consumed by a BPF program in kernel-space. In order for this map-type to be useful, it will require a BPF helper function that BPF programs can invoke to drain samples from the ring buffer, and invoke callbacks on those samples. This change adds that capability via a new BPF helper function: bpf_user_ringbuf_drain(struct bpf_map *map, void *callback_fn, void *ctx, u64 flags) BPF programs may invoke this function to run callback_fn() on a series of samples in the ring buffer. callback_fn() has the following signature: long callback_fn(struct bpf_dynptr *dynptr, void *context); Samples are provided to the callback in the form of struct bpf_dynptr *'s, which the program can read using BPF helper functions for querying struct bpf_dynptr's. In order to support bpf_ringbuf_drain(), a new PTR_TO_DYNPTR register type is added to the verifier to reflect a dynptr that was allocated by a helper function and passed to a BPF program. Unlike PTR_TO_STACK dynptrs which are allocated on the stack by a BPF program, PTR_TO_DYNPTR dynptrs need not use reference tracking, as the BPF helper is trusted to properly free the dynptr before returning. The verifier currently only supports PTR_TO_DYNPTR registers that are also DYNPTR_TYPE_LOCAL. Note that while the corresponding user-space libbpf logic will be added in a subsequent patch, this patch does contain an implementation of the .map_poll() callback for BPF_MAP_TYPE_USER_RINGBUF maps. This .map_poll() callback guarantees that an epoll-waiting user-space producer will receive at least one event notification whenever at least one sample is drained in an invocation of bpf_user_ringbuf_drain(), provided that the function is not invoked with the BPF_RB_NO_WAKEUP flag. If the BPF_RB_FORCE_WAKEUP flag is provided, a wakeup notification is sent even if no sample was drained. Signed-off-by: David Vernet Signed-off-by: Andrii Nakryiko Link: https://lore.kernel.org/bpf/20220920000100.477320-3-void@manifault.com --- include/linux/bpf.h | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index e0dbe0c0a17e..33e543b86e1a 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -451,7 +451,7 @@ enum bpf_type_flag { /* DYNPTR points to memory local to the bpf program. */ DYNPTR_TYPE_LOCAL = BIT(8 + BPF_BASE_TYPE_BITS), - /* DYNPTR points to a ringbuf record. */ + /* DYNPTR points to a kernel-produced ringbuf record. */ DYNPTR_TYPE_RINGBUF = BIT(9 + BPF_BASE_TYPE_BITS), /* Size is known at compile time. */ @@ -656,6 +656,7 @@ enum bpf_reg_type { PTR_TO_MEM, /* reg points to valid memory region */ PTR_TO_BUF, /* reg points to a read/write buffer */ PTR_TO_FUNC, /* reg points to a bpf program function */ + PTR_TO_DYNPTR, /* reg points to a dynptr */ __BPF_REG_TYPE_MAX, /* Extended reg_types. */ @@ -1394,6 +1395,11 @@ struct bpf_array { #define BPF_MAP_CAN_READ BIT(0) #define BPF_MAP_CAN_WRITE BIT(1) +/* Maximum number of user-producer ring buffer samples that can be drained in + * a call to bpf_user_ringbuf_drain(). + */ +#define BPF_MAX_USER_RINGBUF_SAMPLES (128 * 1024) + static inline u32 bpf_map_flags_to_cap(struct bpf_map *map) { u32 access_flags = map->map_flags & (BPF_F_RDONLY_PROG | BPF_F_WRONLY_PROG); @@ -2495,6 +2501,7 @@ extern const struct bpf_func_proto bpf_loop_proto; extern const struct bpf_func_proto bpf_copy_from_user_task_proto; extern const struct bpf_func_proto bpf_set_retval_proto; extern const struct bpf_func_proto bpf_get_retval_proto; +extern const struct bpf_func_proto bpf_user_ringbuf_drain_proto; const struct bpf_func_proto *tracing_prog_func_proto( enum bpf_func_id func_id, const struct bpf_prog *prog); @@ -2639,7 +2646,7 @@ enum bpf_dynptr_type { BPF_DYNPTR_TYPE_INVALID, /* Points to memory that is local to the bpf program */ BPF_DYNPTR_TYPE_LOCAL, - /* Underlying data is a ringbuf record */ + /* Underlying data is a kernel-produced ringbuf record */ BPF_DYNPTR_TYPE_RINGBUF, }; -- cgit From b8d31762a0ae6861e1115302ee338560d853e317 Mon Sep 17 00:00:00 2001 From: Roberto Sassu Date: Tue, 20 Sep 2022 09:59:42 +0200 Subject: btf: Allow dynamic pointer parameters in kfuncs Allow dynamic pointers (struct bpf_dynptr_kern *) to be specified as parameters in kfuncs. Also, ensure that dynamic pointers passed as argument are valid and initialized, are a pointer to the stack, and of the type local. More dynamic pointer types can be supported in the future. To properly detect whether a parameter is of the desired type, introduce the stringify_struct() macro to compare the returned structure name with the desired name. In addition, protect against structure renames, by halting the build with BUILD_BUG_ON(), so that developers have to revisit the code. To check if a dynamic pointer passed to the kfunc is valid and initialized, and if its type is local, export the existing functions is_dynptr_reg_valid_init() and is_dynptr_type_expected(). Cc: Joanne Koong Cc: Kumar Kartikeya Dwivedi Signed-off-by: Roberto Sassu Acked-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220920075951.929132-5-roberto.sassu@huaweicloud.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf_verifier.h | 5 +++++ include/linux/btf.h | 9 +++++++++ 2 files changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index e197f8fb27e2..9e1e6965f407 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -593,6 +593,11 @@ int check_kfunc_mem_size_reg(struct bpf_verifier_env *env, struct bpf_reg_state u32 regno); int check_mem_reg(struct bpf_verifier_env *env, struct bpf_reg_state *reg, u32 regno, u32 mem_size); +bool is_dynptr_reg_valid_init(struct bpf_verifier_env *env, + struct bpf_reg_state *reg); +bool is_dynptr_type_expected(struct bpf_verifier_env *env, + struct bpf_reg_state *reg, + enum bpf_arg_type arg_type); /* this lives here instead of in bpf.h because it needs to dereference tgt_prog */ static inline u64 bpf_trampoline_compute_key(const struct bpf_prog *tgt_prog, diff --git a/include/linux/btf.h b/include/linux/btf.h index 1fcc833a8690..f9aababc5d78 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -52,6 +52,15 @@ #define KF_SLEEPABLE (1 << 5) /* kfunc may sleep */ #define KF_DESTRUCTIVE (1 << 6) /* kfunc performs destructive actions */ +/* + * Return the name of the passed struct, if exists, or halt the build if for + * example the structure gets renamed. In this way, developers have to revisit + * the code using that structure name, and update it accordingly. + */ +#define stringify_struct(x) \ + ({ BUILD_BUG_ON(sizeof(struct x) < 0); \ + __stringify(x); }) + struct btf; struct btf_member; struct btf_type; -- cgit From 51df4865718540f51bb5d3e552c50dc88e1333d6 Mon Sep 17 00:00:00 2001 From: Roberto Sassu Date: Tue, 20 Sep 2022 09:59:43 +0200 Subject: bpf: Export bpf_dynptr_get_size() Export bpf_dynptr_get_size(), so that kernel code dealing with eBPF dynamic pointers can obtain the real size of data carried by this data structure. Signed-off-by: Roberto Sassu Reviewed-by: Joanne Koong Acked-by: KP Singh Acked-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/r/20220920075951.929132-6-roberto.sassu@huaweicloud.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 33e543b86e1a..6535fb1e21b9 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2654,6 +2654,7 @@ void bpf_dynptr_init(struct bpf_dynptr_kern *ptr, void *data, enum bpf_dynptr_type type, u32 offset, u32 size); void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr); int bpf_dynptr_check_size(u32 size); +u32 bpf_dynptr_get_size(struct bpf_dynptr_kern *ptr); #ifdef CONFIG_BPF_LSM void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype); -- cgit From 90fd8f26edd47942203639bf3a5dde8fa1931a0e Mon Sep 17 00:00:00 2001 From: Roberto Sassu Date: Tue, 20 Sep 2022 09:59:44 +0200 Subject: KEYS: Move KEY_LOOKUP_ to include/linux/key.h and define KEY_LOOKUP_ALL In preparation for the patch that introduces the bpf_lookup_user_key() eBPF kfunc, move KEY_LOOKUP_ definitions to include/linux/key.h, to be able to validate the kfunc parameters. Add them to enum key_lookup_flag, so that all the current ones and the ones defined in the future are automatically exported through BTF and available to eBPF programs. Also, add KEY_LOOKUP_ALL to the enum, with the logical OR of currently defined flags as value, to facilitate checking whether a variable contains only those flags. Signed-off-by: Roberto Sassu Acked-by: Jarkko Sakkinen Link: https://lore.kernel.org/r/20220920075951.929132-7-roberto.sassu@huaweicloud.com Signed-off-by: Alexei Starovoitov --- include/linux/key.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/key.h b/include/linux/key.h index 7febc4881363..d27477faf00d 100644 --- a/include/linux/key.h +++ b/include/linux/key.h @@ -88,6 +88,12 @@ enum key_need_perm { KEY_DEFER_PERM_CHECK, /* Special: permission check is deferred */ }; +enum key_lookup_flag { + KEY_LOOKUP_CREATE = 0x01, + KEY_LOOKUP_PARTIAL = 0x02, + KEY_LOOKUP_ALL = (KEY_LOOKUP_CREATE | KEY_LOOKUP_PARTIAL), +}; + struct seq_file; struct user_struct; struct signal_struct; -- cgit From f3cf4134c5c6c47b9b5c7aa3cb2d67e107887a7b Mon Sep 17 00:00:00 2001 From: Roberto Sassu Date: Tue, 20 Sep 2022 09:59:45 +0200 Subject: bpf: Add bpf_lookup_*_key() and bpf_key_put() kfuncs Add the bpf_lookup_user_key(), bpf_lookup_system_key() and bpf_key_put() kfuncs, to respectively search a key with a given key handle serial number and flags, obtain a key from a pre-determined ID defined in include/linux/verification.h, and cleanup. Introduce system_keyring_id_check() to validate the keyring ID parameter of bpf_lookup_system_key(). Signed-off-by: Roberto Sassu Acked-by: Kumar Kartikeya Dwivedi Acked-by: Song Liu Link: https://lore.kernel.org/r/20220920075951.929132-8-roberto.sassu@huaweicloud.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 8 ++++++++ include/linux/verification.h | 8 ++++++++ 2 files changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 6535fb1e21b9..a1435b019aca 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2664,4 +2664,12 @@ static inline void bpf_cgroup_atype_get(u32 attach_btf_id, int cgroup_atype) {} static inline void bpf_cgroup_atype_put(int cgroup_atype) {} #endif /* CONFIG_BPF_LSM */ +struct key; + +#ifdef CONFIG_KEYS +struct bpf_key { + struct key *key; + bool has_ref; +}; +#endif /* CONFIG_KEYS */ #endif /* _LINUX_BPF_H */ diff --git a/include/linux/verification.h b/include/linux/verification.h index a655923335ae..f34e50ebcf60 100644 --- a/include/linux/verification.h +++ b/include/linux/verification.h @@ -17,6 +17,14 @@ #define VERIFY_USE_SECONDARY_KEYRING ((struct key *)1UL) #define VERIFY_USE_PLATFORM_KEYRING ((struct key *)2UL) +static inline int system_keyring_id_check(u64 id) +{ + if (id > (unsigned long)VERIFY_USE_PLATFORM_KEYRING) + return -EINVAL; + + return 0; +} + /* * The use to which an asymmetric key is being put. */ -- cgit From 05b24ff9b2cfabfcfd951daaa915a036ab53c9e1 Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Fri, 16 Sep 2022 09:19:14 +0200 Subject: bpf: Prevent bpf program recursion for raw tracepoint probes We got report from sysbot [1] about warnings that were caused by bpf program attached to contention_begin raw tracepoint triggering the same tracepoint by using bpf_trace_printk helper that takes trace_printk_lock lock. Call Trace: ? trace_event_raw_event_bpf_trace_printk+0x5f/0x90 bpf_trace_printk+0x2b/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 __unfreeze_partials+0x5b/0x160 ... The can be reproduced by attaching bpf program as raw tracepoint on contention_begin tracepoint. The bpf prog calls bpf_trace_printk helper. Then by running perf bench the spin lock code is forced to take slow path and call contention_begin tracepoint. Fixing this by skipping execution of the bpf program if it's already running, Using bpf prog 'active' field, which is being currently used by trampoline programs for the same reason. Moving bpf_prog_inc_misses_counter to syscall.c because trampoline.c is compiled in just for CONFIG_BPF_JIT option. Reviewed-by: Stanislav Fomichev Reported-by: syzbot+2251879aa068ad9c960d@syzkaller.appspotmail.com [1] https://lore.kernel.org/bpf/YxhFe3EwqchC%2FfYf@krava/T/#t Signed-off-by: Jiri Olsa Link: https://lore.kernel.org/r/20220916071914.7156-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index a1435b019aca..edd43edb27d6 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2042,6 +2042,8 @@ static inline bool has_current_bpf_ctx(void) { return !!current->bpf_ctx; } + +void notrace bpf_prog_inc_misses_counter(struct bpf_prog *prog); #else /* !CONFIG_BPF_SYSCALL */ static inline struct bpf_prog *bpf_prog_get(u32 ufd) { @@ -2264,6 +2266,10 @@ static inline bool has_current_bpf_ctx(void) { return false; } + +static inline void bpf_prog_inc_misses_counter(struct bpf_prog *prog) +{ +} #endif /* CONFIG_BPF_SYSCALL */ void __bpf_free_used_btfs(struct bpf_prog_aux *aux, -- cgit From d7e7b9af104c7b389a0c21eb26532511bce4b510 Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Thu, 1 Sep 2022 12:32:06 -0700 Subject: fscrypt: stop using keyrings subsystem for fscrypt_master_key The approach of fs/crypto/ internally managing the fscrypt_master_key structs as the payloads of "struct key" objects contained in a "struct key" keyring has outlived its usefulness. The original idea was to simplify the code by reusing code from the keyrings subsystem. However, several issues have arisen that can't easily be resolved: - When a master key struct is destroyed, blk_crypto_evict_key() must be called on any per-mode keys embedded in it. (This started being the case when inline encryption support was added.) Yet, the keyrings subsystem can arbitrarily delay the destruction of keys, even past the time the filesystem was unmounted. Therefore, currently there is no easy way to call blk_crypto_evict_key() when a master key is destroyed. Currently, this is worked around by holding an extra reference to the filesystem's request_queue(s). But it was overlooked that the request_queue reference is *not* guaranteed to pin the corresponding blk_crypto_profile too; for device-mapper devices that support inline crypto, it doesn't. This can cause a use-after-free. - When the last inode that was using an incompletely-removed master key is evicted, the master key removal is completed by removing the key struct from the keyring. Currently this is done via key_invalidate(). Yet, key_invalidate() takes the key semaphore. This can deadlock when called from the shrinker, since in fscrypt_ioctl_add_key(), memory is allocated with GFP_KERNEL under the same semaphore. - More generally, the fact that the keyrings subsystem can arbitrarily delay the destruction of keys (via garbage collection delay, or via random processes getting temporary key references) is undesirable, as it means we can't strictly guarantee that all secrets are ever wiped. - Doing the master key lookups via the keyrings subsystem results in the key_permission LSM hook being called. fscrypt doesn't want this, as all access control for encrypted files is designed to happen via the files themselves, like any other files. The workaround which SELinux users are using is to change their SELinux policy to grant key search access to all domains. This works, but it is an odd extra step that shouldn't really have to be done. The fix for all these issues is to change the implementation to what I should have done originally: don't use the keyrings subsystem to keep track of the filesystem's fscrypt_master_key structs. Instead, just store them in a regular kernel data structure, and rework the reference counting, locking, and lifetime accordingly. Retain support for RCU-mode key lookups by using a hash table. Replace fscrypt_sb_free() with fscrypt_sb_delete(), which releases the keys synchronously and runs a bit earlier during unmount, so that block devices are still available. A side effect of this patch is that neither the master keys themselves nor the filesystem keyrings will be listed in /proc/keys anymore. ("Master key users" and the master key users keyrings will still be listed.) However, this was mostly an implementation detail, and it was intended just for debugging purposes. I don't know of anyone using it. This patch does *not* change how "master key users" (->mk_users) works; that still uses the keyrings subsystem. That is still needed for key quotas, and changing that isn't necessary to solve the issues listed above. If we decide to change that too, it would be a separate patch. I've marked this as fixing the original commit that added the fscrypt keyring, but as noted above the most important issue that this patch fixes wasn't introduced until the addition of inline encryption support. Fixes: 22d94f493bfb ("fscrypt: add FS_IOC_ADD_ENCRYPTION_KEY ioctl") Signed-off-by: Eric Biggers Link: https://lore.kernel.org/r/20220901193208.138056-2-ebiggers@kernel.org --- include/linux/fs.h | 2 +- include/linux/fscrypt.h | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 9eced4cc286e..0830486f47ef 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1472,7 +1472,7 @@ struct super_block { const struct xattr_handler **s_xattr; #ifdef CONFIG_FS_ENCRYPTION const struct fscrypt_operations *s_cop; - struct key *s_master_keys; /* master crypto keys in use */ + struct fscrypt_keyring *s_master_keys; /* master crypto keys in use */ #endif #ifdef CONFIG_FS_VERITY const struct fsverity_operations *s_vop; diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h index 488fd8c8f8af..db5bb5650bf2 100644 --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -310,7 +310,7 @@ fscrypt_free_dummy_policy(struct fscrypt_dummy_policy *dummy_policy) } /* keyring.c */ -void fscrypt_sb_free(struct super_block *sb); +void fscrypt_sb_delete(struct super_block *sb); int fscrypt_ioctl_add_key(struct file *filp, void __user *arg); int fscrypt_add_test_dummy_key(struct super_block *sb, const struct fscrypt_dummy_policy *dummy_policy); @@ -524,7 +524,7 @@ fscrypt_free_dummy_policy(struct fscrypt_dummy_policy *dummy_policy) } /* keyring.c */ -static inline void fscrypt_sb_free(struct super_block *sb) +static inline void fscrypt_sb_delete(struct super_block *sb) { } -- cgit From 0e91fc1e0f5c70ce575451103ec66c2ec21f1a6e Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Thu, 1 Sep 2022 12:32:08 -0700 Subject: fscrypt: work on block_devices instead of request_queues request_queues are a block layer implementation detail that should not leak into file systems. Change the fscrypt inline crypto code to retrieve block devices instead of request_queues from the file system. As part of that, clean up the interaction with multi-device file systems by returning both the number of devices and the actual device array in a single method call. Signed-off-by: Christoph Hellwig [ebiggers: bug fixes and minor tweaks] Signed-off-by: Eric Biggers Link: https://lore.kernel.org/r/20220901193208.138056-4-ebiggers@kernel.org --- include/linux/fscrypt.h | 21 +++++++++------------ 1 file changed, 9 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h index db5bb5650bf2..1f12ebb4a69d 100644 --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -161,24 +161,21 @@ struct fscrypt_operations { int *ino_bits_ret, int *lblk_bits_ret); /* - * Return the number of block devices to which the filesystem may write - * encrypted file contents. + * Return an array of pointers to the block devices to which the + * filesystem may write encrypted file contents, NULL if the filesystem + * only has a single such block device, or an ERR_PTR() on error. + * + * On successful non-NULL return, *num_devs is set to the number of + * devices in the returned array. The caller must free the returned + * array using kfree(). * * If the filesystem can use multiple block devices (other than block * devices that aren't used for encrypted file contents, such as * external journal devices), and wants to support inline encryption, * then it must implement this function. Otherwise it's not needed. */ - int (*get_num_devices)(struct super_block *sb); - - /* - * If ->get_num_devices() returns a value greater than 1, then this - * function is called to get the array of request_queues that the - * filesystem is using -- one per block device. (There may be duplicate - * entries in this array, as block devices can share a request_queue.) - */ - void (*get_devices)(struct super_block *sb, - struct request_queue **devs); + struct block_device **(*get_devices)(struct super_block *sb, + unsigned int *num_devs); }; static inline struct fscrypt_info *fscrypt_get_info(const struct inode *inode) -- cgit From d7f06bdd6ee87fbefa05af5f57361d85e7715b11 Mon Sep 17 00:00:00 2001 From: Phil Auld Date: Tue, 6 Sep 2022 16:35:42 -0400 Subject: drivers/base: Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES As PAGE_SIZE is unsigned long, -1 > PAGE_SIZE when NR_CPUS <= 3. This leads to very large file sizes: topology$ ls -l total 0 -r--r--r-- 1 root root 18446744073709551615 Sep 5 11:59 core_cpus -r--r--r-- 1 root root 4096 Sep 5 11:59 core_cpus_list -r--r--r-- 1 root root 4096 Sep 5 10:58 core_id -r--r--r-- 1 root root 18446744073709551615 Sep 5 10:10 core_siblings -r--r--r-- 1 root root 4096 Sep 5 11:59 core_siblings_list -r--r--r-- 1 root root 18446744073709551615 Sep 5 11:59 die_cpus -r--r--r-- 1 root root 4096 Sep 5 11:59 die_cpus_list -r--r--r-- 1 root root 4096 Sep 5 11:59 die_id -r--r--r-- 1 root root 18446744073709551615 Sep 5 11:59 package_cpus -r--r--r-- 1 root root 4096 Sep 5 11:59 package_cpus_list -r--r--r-- 1 root root 4096 Sep 5 10:58 physical_package_id -r--r--r-- 1 root root 18446744073709551615 Sep 5 10:10 thread_siblings -r--r--r-- 1 root root 4096 Sep 5 11:59 thread_siblings_list Adjust the inequality to catch the case when NR_CPUS is configured to a small value. Fixes: 7ee951acd31a ("drivers/base: fix userspace break from using bin_attributes for cpumap and cpulist") Cc: Greg Kroah-Hartman Cc: "Rafael J. Wysocki" Cc: Yury Norov Cc: stable@vger.kernel.org Cc: feng xiangjun Reported-by: feng xiangjun Signed-off-by: Phil Auld Signed-off-by: Yury Norov Link: https://lore.kernel.org/r/20220906203542.1796629-1-pauld@redhat.com Signed-off-by: Greg Kroah-Hartman --- include/linux/cpumask.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index bd047864c7ac..e8ad12b5b9d2 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -1127,9 +1127,10 @@ cpumap_print_list_to_buf(char *buf, const struct cpumask *mask, * cover a worst-case of every other cpu being on one of two nodes for a * very large NR_CPUS. * - * Use PAGE_SIZE as a minimum for smaller configurations. + * Use PAGE_SIZE as a minimum for smaller configurations while avoiding + * unsigned comparison to -1. */ -#define CPUMAP_FILE_MAX_BYTES ((((NR_CPUS * 9)/32 - 1) > PAGE_SIZE) \ +#define CPUMAP_FILE_MAX_BYTES (((NR_CPUS * 9)/32 > PAGE_SIZE) \ ? (NR_CPUS * 9)/32 - 1 : PAGE_SIZE) #define CPULIST_FILE_MAX_BYTES (((NR_CPUS * 7)/2 > PAGE_SIZE) ? (NR_CPUS * 7)/2 : PAGE_SIZE) -- cgit From cca1fd41ab2862465d75443822d751e4f9a112ee Mon Sep 17 00:00:00 2001 From: William Breathitt Gray Date: Thu, 22 Sep 2022 07:20:57 -0400 Subject: counter: Realign counter_comp comment block to 80 characters The member documentation comment lines for struct counter_comp extend past the 80-characters column boundary due to extra identation at the start of each section. This patch realigns the comment block within the 80-characters boundary by removing these superfluous indents. Reviewed-by: Yanteng Si Link: https://lore.kernel.org/r/20220902120839.4260-1-william.gray@linaro.org/ Signed-off-by: William Breathitt Gray Link: https://lore.kernel.org/r/8294b04153c33602e9c3dd21ac90c1e99bd0fdaf.1663844776.git.william.gray@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/counter.h | 40 ++++++++++++++++++++-------------------- 1 file changed, 20 insertions(+), 20 deletions(-) (limited to 'include/linux') diff --git a/include/linux/counter.h b/include/linux/counter.h index 1fe17f5adb09..a81234bc8ea8 100644 --- a/include/linux/counter.h +++ b/include/linux/counter.h @@ -38,64 +38,64 @@ enum counter_comp_type { * @type: Counter component data type * @name: device-specific component name * @priv: component-relevant data - * @action_read: Synapse action mode read callback. The read value of the + * @action_read: Synapse action mode read callback. The read value of the * respective Synapse action mode should be passed back via * the action parameter. - * @device_u8_read: Device u8 component read callback. The read value of the + * @device_u8_read: Device u8 component read callback. The read value of the * respective Device u8 component should be passed back via * the val parameter. - * @count_u8_read: Count u8 component read callback. The read value of the + * @count_u8_read: Count u8 component read callback. The read value of the * respective Count u8 component should be passed back via * the val parameter. - * @signal_u8_read: Signal u8 component read callback. The read value of the + * @signal_u8_read: Signal u8 component read callback. The read value of the * respective Signal u8 component should be passed back via * the val parameter. - * @device_u32_read: Device u32 component read callback. The read value of + * @device_u32_read: Device u32 component read callback. The read value of * the respective Device u32 component should be passed * back via the val parameter. - * @count_u32_read: Count u32 component read callback. The read value of the + * @count_u32_read: Count u32 component read callback. The read value of the * respective Count u32 component should be passed back via * the val parameter. - * @signal_u32_read: Signal u32 component read callback. The read value of + * @signal_u32_read: Signal u32 component read callback. The read value of * the respective Signal u32 component should be passed * back via the val parameter. - * @device_u64_read: Device u64 component read callback. The read value of + * @device_u64_read: Device u64 component read callback. The read value of * the respective Device u64 component should be passed * back via the val parameter. - * @count_u64_read: Count u64 component read callback. The read value of the + * @count_u64_read: Count u64 component read callback. The read value of the * respective Count u64 component should be passed back via * the val parameter. - * @signal_u64_read: Signal u64 component read callback. The read value of + * @signal_u64_read: Signal u64 component read callback. The read value of * the respective Signal u64 component should be passed * back via the val parameter. - * @action_write: Synapse action mode write callback. The write value of + * @action_write: Synapse action mode write callback. The write value of * the respective Synapse action mode is passed via the * action parameter. - * @device_u8_write: Device u8 component write callback. The write value of + * @device_u8_write: Device u8 component write callback. The write value of * the respective Device u8 component is passed via the val * parameter. - * @count_u8_write: Count u8 component write callback. The write value of + * @count_u8_write: Count u8 component write callback. The write value of * the respective Count u8 component is passed via the val * parameter. - * @signal_u8_write: Signal u8 component write callback. The write value of + * @signal_u8_write: Signal u8 component write callback. The write value of * the respective Signal u8 component is passed via the val * parameter. - * @device_u32_write: Device u32 component write callback. The write value of + * @device_u32_write: Device u32 component write callback. The write value of * the respective Device u32 component is passed via the * val parameter. - * @count_u32_write: Count u32 component write callback. The write value of + * @count_u32_write: Count u32 component write callback. The write value of * the respective Count u32 component is passed via the val * parameter. - * @signal_u32_write: Signal u32 component write callback. The write value of + * @signal_u32_write: Signal u32 component write callback. The write value of * the respective Signal u32 component is passed via the * val parameter. - * @device_u64_write: Device u64 component write callback. The write value of + * @device_u64_write: Device u64 component write callback. The write value of * the respective Device u64 component is passed via the * val parameter. - * @count_u64_write: Count u64 component write callback. The write value of + * @count_u64_write: Count u64 component write callback. The write value of * the respective Count u64 component is passed via the val * parameter. - * @signal_u64_write: Signal u64 component write callback. The write value of + * @signal_u64_write: Signal u64 component write callback. The write value of * the respective Signal u64 component is passed via the * val parameter. */ -- cgit From 4d269ed485298e8a09485a664e7b35b370ab4ada Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 2 Sep 2022 15:48:09 +0000 Subject: x86/resctrl: Kill off alloc_enabled rdt_resources_all[] used to have extra entries for L2CODE/L2DATA. These were hidden from resctrl by the alloc_enabled value. Now that the L2/L2CODE/L2DATA resources have been merged together, alloc_enabled doesn't mean anything, it always has the same value as alloc_capable which indicates allocation is supported by this resource. Remove alloc_enabled and its helpers. Signed-off-by: James Morse Signed-off-by: Borislav Petkov Reviewed-by: Jamie Iles Reviewed-by: Shaopeng Tan Reviewed-by: Reinette Chatre Tested-by: Xin Hao Tested-by: Shaopeng Tan Tested-by: Cristian Marussi Link: https://lore.kernel.org/r/20220902154829.30399-2-james.morse@arm.com --- include/linux/resctrl.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index 21deb5212bbd..386ab3a41500 100644 --- a/include/linux/resctrl.h +++ b/include/linux/resctrl.h @@ -130,7 +130,6 @@ struct resctrl_schema; /** * struct rdt_resource - attributes of a resctrl resource * @rid: The index of the resource - * @alloc_enabled: Is allocation enabled on this machine * @mon_enabled: Is monitoring enabled for this feature * @alloc_capable: Is allocation available on this machine * @mon_capable: Is monitor feature available on this machine @@ -150,7 +149,6 @@ struct resctrl_schema; */ struct rdt_resource { int rid; - bool alloc_enabled; bool mon_enabled; bool alloc_capable; bool mon_capable; -- cgit From bab6ee736873becc0216ba5fd159394e272d01b2 Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 2 Sep 2022 15:48:10 +0000 Subject: x86/resctrl: Merge mon_capable and mon_enabled mon_enabled and mon_capable are always set as a pair by rdt_get_mon_l3_config(). There is no point having two values. Merge them together. Signed-off-by: James Morse Signed-off-by: Borislav Petkov Reviewed-by: Jamie Iles Reviewed-by: Shaopeng Tan Reviewed-by: Reinette Chatre Tested-by: Xin Hao Tested-by: Shaopeng Tan Tested-by: Cristian Marussi Link: https://lore.kernel.org/r/20220902154829.30399-3-james.morse@arm.com --- include/linux/resctrl.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index 386ab3a41500..8180c539800d 100644 --- a/include/linux/resctrl.h +++ b/include/linux/resctrl.h @@ -130,7 +130,6 @@ struct resctrl_schema; /** * struct rdt_resource - attributes of a resctrl resource * @rid: The index of the resource - * @mon_enabled: Is monitoring enabled for this feature * @alloc_capable: Is allocation available on this machine * @mon_capable: Is monitor feature available on this machine * @num_rmid: Number of RMIDs available @@ -149,7 +148,6 @@ struct resctrl_schema; */ struct rdt_resource { int rid; - bool mon_enabled; bool alloc_capable; bool mon_capable; int num_rmid; -- cgit From de84a090d99a3b991bd89cd86a94b65d15bd1bbe Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Tue, 20 Sep 2022 12:11:21 +0200 Subject: net: ethernet: mtk_eth_wed: add wed support for mt7986 chipset Introduce Wireless Etherne Dispatcher support on transmission side for mt7986 chipset Tested-by: Daniel Golle Co-developed-by: Bo Jiao Signed-off-by: Bo Jiao Co-developed-by: Sujuan Chen Signed-off-by: Sujuan Chen Signed-off-by: Lorenzo Bianconi Signed-off-by: Paolo Abeni --- include/linux/soc/mediatek/mtk_wed.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/soc/mediatek/mtk_wed.h b/include/linux/soc/mediatek/mtk_wed.h index 7e00cca06709..592221a7149b 100644 --- a/include/linux/soc/mediatek/mtk_wed.h +++ b/include/linux/soc/mediatek/mtk_wed.h @@ -14,6 +14,7 @@ struct mtk_wdma_desc; struct mtk_wed_ring { struct mtk_wdma_desc *desc; dma_addr_t desc_phys; + u32 desc_size; int size; u32 reg_base; @@ -45,10 +46,17 @@ struct mtk_wed_device { struct pci_dev *pci_dev; u32 wpdma_phys; + u32 wpdma_int; + u32 wpdma_mask; + u32 wpdma_tx; + u32 wpdma_txfree; u16 token_start; unsigned int nbuf; + u8 tx_tbit[MTK_WED_TX_QUEUES]; + u8 txfree_tbit; + u32 (*init_buf)(void *ptr, dma_addr_t phys, int token_id); int (*offload_enable)(struct mtk_wed_device *wed); void (*offload_disable)(struct mtk_wed_device *wed); -- cgit From 2b2ba3ecb2411c5e2a0d670be5e9ded2c93351e9 Mon Sep 17 00:00:00 2001 From: Lorenzo Bianconi Date: Tue, 20 Sep 2022 12:11:22 +0200 Subject: net: ethernet: mtk_eth_wed: add axi bus support Other than pcie bus, introduce support for axi bus to mtk wed driver. Axi bus is used to connect mt7986-wmac soc chip available on mt7986 device. Tested-by: Daniel Golle Co-developed-by: Bo Jiao Signed-off-by: Bo Jiao Co-developed-by: Sujuan Chen Signed-off-by: Sujuan Chen Signed-off-by: Lorenzo Bianconi Signed-off-by: Paolo Abeni --- include/linux/soc/mediatek/mtk_wed.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/soc/mediatek/mtk_wed.h b/include/linux/soc/mediatek/mtk_wed.h index 592221a7149b..4450c8b7a1cb 100644 --- a/include/linux/soc/mediatek/mtk_wed.h +++ b/include/linux/soc/mediatek/mtk_wed.h @@ -11,6 +11,11 @@ struct mtk_wed_hw; struct mtk_wdma_desc; +enum mtk_wed_bus_tye { + MTK_WED_BUS_PCIE, + MTK_WED_BUS_AXI, +}; + struct mtk_wed_ring { struct mtk_wdma_desc *desc; dma_addr_t desc_phys; @@ -43,7 +48,11 @@ struct mtk_wed_device { /* filled by driver: */ struct { - struct pci_dev *pci_dev; + union { + struct platform_device *platform_dev; + struct pci_dev *pci_dev; + }; + enum mtk_wed_bus_tye bus_type; u32 wpdma_phys; u32 wpdma_int; -- cgit From 3a7232cdf19e39e7f24c493117b373788b348af2 Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 2 Sep 2022 15:48:11 +0000 Subject: x86/resctrl: Add domain online callback for resctrl work Because domains are exposed to user-space via resctrl, the filesystem must update its state when CPU hotplug callbacks are triggered. Some of this work is common to any architecture that would support resctrl, but the work is tied up with the architecture code to allocate the memory. Move domain_setup_mon_state(), the monitor subdir creation call and the mbm/limbo workers into a new resctrl_online_domain() call. These bits are not specific to the architecture. Grouping them in one function allows that code to be moved to /fs/ and re-used by another architecture. Signed-off-by: James Morse Signed-off-by: Borislav Petkov Reviewed-by: Jamie Iles Reviewed-by: Shaopeng Tan Reviewed-by: Reinette Chatre Tested-by: Xin Hao Tested-by: Shaopeng Tan Tested-by: Cristian Marussi Link: https://lore.kernel.org/r/20220902154829.30399-4-james.morse@arm.com --- include/linux/resctrl.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index 8180c539800d..d512455b4c3a 100644 --- a/include/linux/resctrl.h +++ b/include/linux/resctrl.h @@ -192,5 +192,6 @@ u32 resctrl_arch_get_num_closid(struct rdt_resource *r); int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid); u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d, u32 closid, enum resctrl_conf_type type); +int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d); #endif /* _RESCTRL_H */ -- cgit From 798fd4b9ac37fec571f55fb8592497b0dd5f7a73 Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 2 Sep 2022 15:48:13 +0000 Subject: x86/resctrl: Add domain offline callback for resctrl work Because domains are exposed to user-space via resctrl, the filesystem must update its state when CPU hotplug callbacks are triggered. Some of this work is common to any architecture that would support resctrl, but the work is tied up with the architecture code to free the memory. Move the monitor subdir removal and the cancelling of the mbm/limbo works into a new resctrl_offline_domain() call. These bits are not specific to the architecture. Grouping them in one function allows that code to be moved to /fs/ and re-used by another architecture. Signed-off-by: James Morse Signed-off-by: Borislav Petkov Reviewed-by: Jamie Iles Reviewed-by: Shaopeng Tan Reviewed-by: Reinette Chatre Tested-by: Xin Hao Tested-by: Shaopeng Tan Tested-by: Cristian Marussi Link: https://lore.kernel.org/r/20220902154829.30399-6-james.morse@arm.com --- include/linux/resctrl.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index d512455b4c3a..5d283bdd6162 100644 --- a/include/linux/resctrl.h +++ b/include/linux/resctrl.h @@ -193,5 +193,6 @@ int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid); u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d, u32 closid, enum resctrl_conf_type type); int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d); +void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d); #endif /* _RESCTRL_H */ -- cgit From 7a4e0d2c7fb8e28bb8ce0687925c9cf91d65f2a0 Mon Sep 17 00:00:00 2001 From: наб Date: Fri, 16 Sep 2022 03:54:59 +0200 Subject: tty: remove TTY_MAGIC MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit According to Greg, in the context of magic numbers as defined in magic-number.rst, "the tty layer should not need this and I'll gladly take patches" Acked-by: Jiri Slaby Signed-off-by: Ahelenia Ziemiańska Ref: https://lore.kernel.org/linux-doc/YyMlovoskUcHLEb7@kroah.com/ Link: https://lore.kernel.org/r/476d024cd6b04160a5de381ea2b9856b60088cbd.1663288066.git.nabijaczleweli@nabijaczleweli.xyz Signed-off-by: Greg Kroah-Hartman --- include/linux/tty.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty.h b/include/linux/tty.h index ae41893f8653..730c3301d710 100644 --- a/include/linux/tty.h +++ b/include/linux/tty.h @@ -122,8 +122,6 @@ struct tty_operations; /** * struct tty_struct - state associated with a tty while open * - * @magic: magic value set early in @alloc_tty_struct to %TTY_MAGIC, for - * debugging purposes * @kref: reference counting by tty_kref_get() and tty_kref_put(), reaching zero * frees the structure * @dev: class device or %NULL (e.g. ptys, serdev) @@ -193,7 +191,6 @@ struct tty_operations; * &struct tty_port. */ struct tty_struct { - int magic; struct kref kref; struct device *dev; struct tty_driver *driver; @@ -260,9 +257,6 @@ struct tty_file_private { struct list_head list; }; -/* tty magic number */ -#define TTY_MAGIC 0x5401 - /** * DOC: TTY Struct Flags * -- cgit From 5052df99d3bc3cd281222bbcba44323b2d0937d2 Mon Sep 17 00:00:00 2001 From: наб Date: Fri, 16 Sep 2022 03:55:05 +0200 Subject: tty: remove TTY_DRIVER_MAGIC MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit According to Greg, in the context of magic numbers as defined in magic-number.rst, "the tty layer should not need this and I'll gladly take patches" Acked-by: Jiri Slaby Signed-off-by: Ahelenia Ziemiańska Ref: https://lore.kernel.org/linux-doc/YyMlovoskUcHLEb7@kroah.com/ Link: https://lore.kernel.org/r/723478a270a3858f27843cbec621df4d5d44efcc.1663288066.git.nabijaczleweli@nabijaczleweli.xyz Signed-off-by: Greg Kroah-Hartman --- include/linux/tty_driver.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/tty_driver.h b/include/linux/tty_driver.h index f961164a5274..e00034118c7b 100644 --- a/include/linux/tty_driver.h +++ b/include/linux/tty_driver.h @@ -397,7 +397,6 @@ struct tty_operations { /** * struct tty_driver -- driver for TTY devices * - * @magic: set to %TTY_DRIVER_MAGIC in __tty_alloc_driver() * @kref: reference counting. Reaching zero frees all the internals and the * driver. * @cdevs: allocated/registered character /dev devices @@ -433,7 +432,6 @@ struct tty_operations { * @driver_name, @name, @type, @subtype, @init_termios, and @ops. */ struct tty_driver { - int magic; struct kref kref; struct cdev **cdevs; struct module *owner; @@ -490,9 +488,6 @@ static inline void tty_set_operations(struct tty_driver *driver, driver->ops = op; } -/* tty driver magic number */ -#define TTY_DRIVER_MAGIC 0x5402 - /** * DOC: TTY Driver Flags * -- cgit From 9906890c89e4dbd900ed87ad3040080339a7f411 Mon Sep 17 00:00:00 2001 From: "Maciej W. Rozycki" Date: Wed, 21 Sep 2022 00:35:32 +0100 Subject: serial: 8250: Let drivers request full 16550A feature probing A SERIAL_8250_16550A_VARIANTS configuration option has been recently defined that lets one request the 8250 driver not to probe for 16550A device features so as to reduce the driver's device startup time in virtual machines. Some actual hardware devices require these features to have been fully determined however for their driver to work correctly, so define a flag to let drivers request full 16550A feature probing on a device-by-device basis if required regardless of the SERIAL_8250_16550A_VARIANTS option setting chosen. Fixes: dc56ecb81a0a ("serial: 8250: Support disabling mdelay-filled probes of 16550A variants") Cc: stable@vger.kernel.org # v5.6+ Reported-by: Anders Blomdell Signed-off-by: Maciej W. Rozycki Link: https://lore.kernel.org/r/alpine.DEB.2.21.2209202357520.41633@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index 02a4299b7d42..2ae182f0f1de 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -422,7 +422,7 @@ struct uart_icount { __u32 buf_overrun; }; -typedef unsigned int __bitwise upf_t; +typedef u64 __bitwise upf_t; typedef unsigned int __bitwise upstat_t; struct uart_port { @@ -530,6 +530,7 @@ struct uart_port { #define UPF_FIXED_PORT ((__force upf_t) (1 << 29)) #define UPF_DEAD ((__force upf_t) (1 << 30)) #define UPF_IOREMAP ((__force upf_t) (1 << 31)) +#define UPF_FULL_PROBE ((__force upf_t) (1ULL << 32)) #define __UPF_CHANGE_MASK 0x17fff #define UPF_CHANGE_MASK ((__force upf_t) __UPF_CHANGE_MASK) -- cgit From 46a8973c4d9d7b12e0e4dd9f589d08d420fb6c0d Mon Sep 17 00:00:00 2001 From: "Maciej W. Rozycki" Date: Wed, 21 Sep 2022 00:35:42 +0100 Subject: serial: 8250: Switch UART port flags to using BIT_ULL Use BIT_ULL rather than encoding bits explicitly where applicable with UART port flags. This makes a (__force upf_t) cast redundant, but keep it for visual consistency with the flags defined in terms of userspace macros. Signed-off-by: Maciej W. Rozycki Link: https://lore.kernel.org/r/alpine.DEB.2.21.2209210007030.41633@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index 2ae182f0f1de..9e0a9d379390 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -513,24 +513,24 @@ struct uart_port { #define UPF_BUGGY_UART ((__force upf_t) ASYNC_BUGGY_UART /* 14 */ ) #define UPF_MAGIC_MULTIPLIER ((__force upf_t) ASYNC_MAGIC_MULTIPLIER /* 16 */ ) -#define UPF_NO_THRE_TEST ((__force upf_t) (1 << 19)) +#define UPF_NO_THRE_TEST ((__force upf_t) BIT_ULL(19)) /* Port has hardware-assisted h/w flow control */ -#define UPF_AUTO_CTS ((__force upf_t) (1 << 20)) -#define UPF_AUTO_RTS ((__force upf_t) (1 << 21)) +#define UPF_AUTO_CTS ((__force upf_t) BIT_ULL(20)) +#define UPF_AUTO_RTS ((__force upf_t) BIT_ULL(21)) #define UPF_HARD_FLOW ((__force upf_t) (UPF_AUTO_CTS | UPF_AUTO_RTS)) /* Port has hardware-assisted s/w flow control */ -#define UPF_SOFT_FLOW ((__force upf_t) (1 << 22)) -#define UPF_CONS_FLOW ((__force upf_t) (1 << 23)) -#define UPF_SHARE_IRQ ((__force upf_t) (1 << 24)) -#define UPF_EXAR_EFR ((__force upf_t) (1 << 25)) -#define UPF_BUG_THRE ((__force upf_t) (1 << 26)) +#define UPF_SOFT_FLOW ((__force upf_t) BIT_ULL(22)) +#define UPF_CONS_FLOW ((__force upf_t) BIT_ULL(23)) +#define UPF_SHARE_IRQ ((__force upf_t) BIT_ULL(24)) +#define UPF_EXAR_EFR ((__force upf_t) BIT_ULL(25)) +#define UPF_BUG_THRE ((__force upf_t) BIT_ULL(26)) /* The exact UART type is known and should not be probed. */ -#define UPF_FIXED_TYPE ((__force upf_t) (1 << 27)) -#define UPF_BOOT_AUTOCONF ((__force upf_t) (1 << 28)) -#define UPF_FIXED_PORT ((__force upf_t) (1 << 29)) -#define UPF_DEAD ((__force upf_t) (1 << 30)) -#define UPF_IOREMAP ((__force upf_t) (1 << 31)) -#define UPF_FULL_PROBE ((__force upf_t) (1ULL << 32)) +#define UPF_FIXED_TYPE ((__force upf_t) BIT_ULL(27)) +#define UPF_BOOT_AUTOCONF ((__force upf_t) BIT_ULL(28)) +#define UPF_FIXED_PORT ((__force upf_t) BIT_ULL(29)) +#define UPF_DEAD ((__force upf_t) BIT_ULL(30)) +#define UPF_IOREMAP ((__force upf_t) BIT_ULL(31)) +#define UPF_FULL_PROBE ((__force upf_t) BIT_ULL(32)) #define __UPF_CHANGE_MASK 0x17fff #define UPF_CHANGE_MASK ((__force upf_t) __UPF_CHANGE_MASK) -- cgit From 039d4926379b1d1c17b51cf21c500a5eed86899e Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Thu, 22 Sep 2022 10:00:05 +0300 Subject: serial: 8250: Toggle IER bits on only after irq has been set up MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Invoking TIOCVHANGUP on 8250_mid port on Ice Lake-D and then reopening the port triggers these faults during serial8250_do_startup(): DMAR: DRHD: handling fault status reg 3 DMAR: [DMA Write NO_PASID] Request device [00:1a.0] fault addr 0x0 [fault reason 0x05] PTE Write access is not set If the IRQ hasn't been set up yet, the UART will have zeroes in its MSI address/data registers. Disabling the IRQ at the interrupt controller won't stop the UART from performing a DMA write to the address programmed in its MSI address register (zero) when it wants to signal an interrupt. The UARTs (in Ice Lake-D) implement PCI 2.1 style MSI without masking capability, so there is no way to mask the interrupt at the source PCI function level, except disabling the MSI capability entirely, but that would cause it to fall back to INTx# assertion, and the PCI specification prohibits disabling the MSI capability as a way to mask a function's interrupt service request. The MSI address register is zeroed by the hangup as the irq is freed. The interrupt is signalled during serial8250_do_startup() performing a THRE test that temporarily toggles THRI in IER. The THRE test currently occurs before UART's irq (and MSI address) is properly set up. Refactor serial8250_do_startup() such that irq is set up before the THRE test. The current irq setup code is intermixed with the timer setup code. As THRE test must be performed prior to the timer setup, extract it into own function and call it only after the THRE test. The ->setup_timer() needs to be part of the struct uart_8250_ops in order to not create circular dependency between 8250 and 8250_base modules. Fixes: 40b36daad0ac ("[PATCH] 8250 UART backup timer") Reported-by: Lennert Buytenhek Tested-by: Lennert Buytenhek Reviewed-by: Andy Shevchenko Signed-off-by: Ilpo Järvinen Link: https://lore.kernel.org/r/20220922070005.2965-1-ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_8250.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h index 43edd10e8c96..19376bee9667 100644 --- a/include/linux/serial_8250.h +++ b/include/linux/serial_8250.h @@ -74,6 +74,7 @@ struct uart_8250_port; struct uart_8250_ops { int (*setup_irq)(struct uart_8250_port *); void (*release_irq)(struct uart_8250_port *); + void (*setup_timer)(struct uart_8250_port *); }; struct uart_8250_em485 { -- cgit From 781096d971dfe3c5f9401a300bdb0b148a600584 Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 2 Sep 2022 15:48:16 +0000 Subject: x86/resctrl: Create mba_sc configuration in the rdt_domain To support resctrl's MBA software controller, the architecture must provide a second configuration array to hold the mbps_val[] from user-space. This complicates the interface between the architecture specific code and the filesystem portions of resctrl that will move to /fs/, to allow multiple architectures to support resctrl. Make the filesystem parts of resctrl create an array for the mba_sc values. The software controller can be changed to use this, allowing the architecture code to only consider the values configured in hardware. Signed-off-by: James Morse Signed-off-by: Borislav Petkov Reviewed-by: Jamie Iles Reviewed-by: Shaopeng Tan Reviewed-by: Reinette Chatre Tested-by: Xin Hao Tested-by: Shaopeng Tan Tested-by: Cristian Marussi Link: https://lore.kernel.org/r/20220902154829.30399-9-james.morse@arm.com --- include/linux/resctrl.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index 5d283bdd6162..93dfe553b364 100644 --- a/include/linux/resctrl.h +++ b/include/linux/resctrl.h @@ -15,6 +15,9 @@ int proc_resctrl_show(struct seq_file *m, #endif +/* max value for struct rdt_domain's mbps_val */ +#define MBA_MAX_MBPS U32_MAX + /** * enum resctrl_conf_type - The type of configuration. * @CDP_NONE: No prioritisation, both code and data are controlled or monitored. @@ -53,6 +56,9 @@ struct resctrl_staged_config { * @cqm_work_cpu: worker CPU for CQM h/w counters * @plr: pseudo-locked region (if any) associated with domain * @staged_config: parsed configuration to be applied + * @mbps_val: When mba_sc is enabled, this holds the array of user + * specified control values for mba_sc in MBps, indexed + * by closid */ struct rdt_domain { struct list_head list; @@ -67,6 +73,7 @@ struct rdt_domain { int cqm_work_cpu; struct pseudo_lock_region *plr; struct resctrl_staged_config staged_config[CDP_NUM_TYPES]; + u32 *mbps_val; }; /** -- cgit From ff6357bb50023af2a1dc8f113930082c5252c753 Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 2 Sep 2022 15:48:19 +0000 Subject: x86/resctrl: Allow update_mba_bw() to update controls directly update_mba_bw() calculates a new control value for the MBA resource based on the user provided mbps_val and the current measured bandwidth. Some control values need remapping by delay_bw_map(). It does this by calling wrmsrl() directly. This needs splitting up to be done by an architecture specific helper, so that the remainder can eventually be moved to /fs/. Add resctrl_arch_update_one() to apply one configuration value to the provided resource and domain. This avoids the staging and cross-calling that is only needed with changes made by user-space. delay_bw_map() moves to be part of the arch code, to maintain the 'percentage control' view of MBA resources in resctrl. Signed-off-by: James Morse Signed-off-by: Borislav Petkov Reviewed-by: Jamie Iles Reviewed-by: Shaopeng Tan Reviewed-by: Reinette Chatre Tested-by: Xin Hao Tested-by: Shaopeng Tan Tested-by: Cristian Marussi Link: https://lore.kernel.org/r/20220902154829.30399-12-james.morse@arm.com --- include/linux/resctrl.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index 93dfe553b364..f4c9101df461 100644 --- a/include/linux/resctrl.h +++ b/include/linux/resctrl.h @@ -197,6 +197,14 @@ struct resctrl_schema { /* The number of closid supported by this resource regardless of CDP */ u32 resctrl_arch_get_num_closid(struct rdt_resource *r); int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid); + +/* + * Update the ctrl_val and apply this config right now. + * Must be called on one of the domain's CPUs. + */ +int resctrl_arch_update_one(struct rdt_resource *r, struct rdt_domain *d, + u32 closid, enum resctrl_conf_type t, u32 cfg_val); + u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d, u32 closid, enum resctrl_conf_type type); int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d); -- cgit From 21803630c4ffb433bab2951fbe0ee7b8dbcc8bcb Mon Sep 17 00:00:00 2001 From: Emeel Hakim Date: Wed, 21 Sep 2022 11:10:46 -0700 Subject: net/mlx5: Fix fields name prefix in MACsec Fix ifc fields name to be consistent with the device spec document. Fixes: 8385c51ff5bc ("net/mlx5: Introduce MACsec Connect-X offload hardware bits and structures") Signed-off-by: Emeel Hakim Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed Signed-off-by: Jakub Kicinski --- include/linux/mlx5/mlx5_ifc.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 8decbf9a7bdd..da0ed11fcebd 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -11585,15 +11585,15 @@ struct mlx5_ifc_macsec_offload_obj_bits { u8 confidentiality_en[0x1]; u8 reserved_at_41[0x1]; - u8 esn_en[0x1]; - u8 esn_overlap[0x1]; + u8 epn_en[0x1]; + u8 epn_overlap[0x1]; u8 reserved_at_44[0x2]; u8 confidentiality_offset[0x2]; u8 reserved_at_48[0x4]; u8 aso_return_reg[0x4]; u8 reserved_at_50[0x10]; - u8 esn_msb[0x20]; + u8 epn_msb[0x20]; u8 reserved_at_80[0x8]; u8 dekn[0x18]; -- cgit From 23cc83c6ca87c9a4da62b9ddf4d0fc09f3054f81 Mon Sep 17 00:00:00 2001 From: Emeel Hakim Date: Wed, 21 Sep 2022 11:10:49 -0700 Subject: net/mlx5: Add ifc bits for MACsec extended packet number (EPN) and replay protection Add ifc bits related to advanced steering operations (ASO) and general object modify for macsec to use as part of offloading EPN and replay protection features. Reviewed-by: Raed Salem Signed-off-by: Emeel Hakim Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed Signed-off-by: Jakub Kicinski --- include/linux/mlx5/mlx5_ifc.h | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index da0ed11fcebd..bd577b99b146 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -11558,6 +11558,20 @@ struct mlx5_ifc_modify_ipsec_obj_in_bits { struct mlx5_ifc_ipsec_obj_bits ipsec_object; }; +enum { + MLX5_MACSEC_ASO_REPLAY_PROTECTION = 0x1, +}; + +enum { + MLX5_MACSEC_ASO_REPLAY_WIN_32BIT = 0x0, + MLX5_MACSEC_ASO_REPLAY_WIN_64BIT = 0x1, + MLX5_MACSEC_ASO_REPLAY_WIN_128BIT = 0x2, + MLX5_MACSEC_ASO_REPLAY_WIN_256BIT = 0x3, +}; + +#define MLX5_MACSEC_ASO_INC_SN 0x2 +#define MLX5_MACSEC_ASO_REG_C_4_5 0x2 + struct mlx5_ifc_macsec_aso_bits { u8 valid[0x1]; u8 reserved_at_1[0x1]; @@ -11619,6 +11633,21 @@ struct mlx5_ifc_create_macsec_obj_in_bits { struct mlx5_ifc_macsec_offload_obj_bits macsec_object; }; +struct mlx5_ifc_modify_macsec_obj_in_bits { + struct mlx5_ifc_general_obj_in_cmd_hdr_bits general_obj_in_cmd_hdr; + struct mlx5_ifc_macsec_offload_obj_bits macsec_object; +}; + +enum { + MLX5_MODIFY_MACSEC_BITMASK_EPN_OVERLAP = BIT(0), + MLX5_MODIFY_MACSEC_BITMASK_EPN_MSB = BIT(1), +}; + +struct mlx5_ifc_query_macsec_obj_out_bits { + struct mlx5_ifc_general_obj_out_cmd_hdr_bits general_obj_out_cmd_hdr; + struct mlx5_ifc_macsec_offload_obj_bits macsec_object; +}; + struct mlx5_ifc_encryption_key_obj_bits { u8 modify_field_select[0x40]; -- cgit From 4411a6c0abd3e55b4a4fb9432b3a0553f12337c2 Mon Sep 17 00:00:00 2001 From: Emeel Hakim Date: Wed, 21 Sep 2022 11:10:53 -0700 Subject: net/mlx5e: Support MACsec offload extended packet number (EPN) MACsec EPN splits the packet number (PN) into two 32-bits fields, epn_lsb (32 least significant bits (LSBs) of PN) and epn_msb (32 most significant bits (MSBs) of PN). Epn_msb bits are managed by SW and for that HW is required to send an object change event of type EPN event notifying the SW to update the epn_msb in addition, once epn_msb is updated SW update HW with the new epn_msb value for HW to perform replay protection. To prevent HW from stopping while handling the event, SW manages another bit for HW called epn_overlap, HW uses the latter to get an indication regarding how to read the epn_msb value correctly while still receiving packets. Add epn event handling that updates the epn_overlap and epn_msb for every 2^31 packets according to the following logic: if epn_lsb crosses 2^31 (half sequence number wraparound) upon HW relevant event, SW updates the esn_overlap value to OLD (value = 1). When the epn_lsb crosses 2^32 (full sequence number wraparound) upon HW relevant event, SW updates the esn_overlap to NEW (value = 0) and increment the esn_msb. When using MACsec EPN a salt and short secure channel id (ssci) needs to be provided by the user, when offloading EPN need to pass this salt and ssci to the HW to be used in the initial vector (IV) calculations. Reviewed-by: Raed Salem Signed-off-by: Emeel Hakim Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed Reported-by: kernel test robot Signed-off-by: Saeed Mahameed Signed-off-by: Jakub Kicinski --- include/linux/mlx5/device.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 2927810f172b..dcd60fb9e6b4 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -325,6 +325,7 @@ enum mlx5_event { MLX5_EVENT_TYPE_WQ_INVAL_REQ_ERROR = 0x10, MLX5_EVENT_TYPE_WQ_ACCESS_ERROR = 0x11, MLX5_EVENT_TYPE_SRQ_CATAS_ERROR = 0x12, + MLX5_EVENT_TYPE_OBJECT_CHANGE = 0x27, MLX5_EVENT_TYPE_INTERNAL_ERROR = 0x08, MLX5_EVENT_TYPE_PORT_CHANGE = 0x09, @@ -699,6 +700,12 @@ struct mlx5_eqe_temp_warning { __be64 sensor_warning_lsb; } __packed; +struct mlx5_eqe_obj_change { + u8 rsvd0[2]; + __be16 obj_type; + __be32 obj_id; +} __packed; + #define SYNC_RST_STATE_MASK 0xf enum sync_rst_state_type { @@ -737,6 +744,7 @@ union ev_data { struct mlx5_eqe_xrq_err xrq_err; struct mlx5_eqe_sync_fw_update sync_fw_update; struct mlx5_eqe_vhca_state vhca_state; + struct mlx5_eqe_obj_change obj_change; } __packed; struct mlx5_eqe { -- cgit From 6edf2576a6cc46460c164831517a36064eb8109c Mon Sep 17 00:00:00 2001 From: Feng Tang Date: Tue, 13 Sep 2022 14:54:20 +0800 Subject: mm/slub: enable debugging memory wasting of kmalloc kmalloc's API family is critical for mm, with one nature that it will round up the request size to a fixed one (mostly power of 2). Say when user requests memory for '2^n + 1' bytes, actually 2^(n+1) bytes could be allocated, so in worst case, there is around 50% memory space waste. The wastage is not a big issue for requests that get allocated/freed quickly, but may cause problems with objects that have longer life time. We've met a kernel boot OOM panic (v5.10), and from the dumped slab info: [ 26.062145] kmalloc-2k 814056KB 814056KB From debug we found there are huge number of 'struct iova_magazine', whose size is 1032 bytes (1024 + 8), so each allocation will waste 1016 bytes. Though the issue was solved by giving the right (bigger) size of RAM, it is still nice to optimize the size (either use a kmalloc friendly size or create a dedicated slab for it). And from lkml archive, there was another crash kernel OOM case [1] back in 2019, which seems to be related with the similar slab waste situation, as the log is similar: [ 4.332648] iommu: Adding device 0000:20:02.0 to group 16 [ 4.338946] swapper/0 invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=0, oom_score_adj=0 ... [ 4.857565] kmalloc-2048 59164KB 59164KB The crash kernel only has 256M memory, and 59M is pretty big here. (Note: the related code has been changed and optimised in recent kernel [2], these logs are just picked to demo the problem, also a patch changing its size to 1024 bytes has been merged) So add an way to track each kmalloc's memory waste info, and leverage the existing SLUB debug framework (specifically SLUB_STORE_USER) to show its call stack of original allocation, so that user can evaluate the waste situation, identify some hot spots and optimize accordingly, for a better utilization of memory. The waste info is integrated into existing interface: '/sys/kernel/debug/slab/kmalloc-xx/alloc_traces', one example of 'kmalloc-4k' after boot is: 126 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] waste=233856/1856 age=280763/281414/282065 pid=1330 cpus=32 nodes=1 __kmem_cache_alloc_node+0x11f/0x4e0 __kmalloc_node+0x4e/0x140 ixgbe_alloc_q_vector+0xbe/0x830 [ixgbe] ixgbe_init_interrupt_scheme+0x2ae/0xc90 [ixgbe] ixgbe_probe+0x165f/0x1d20 [ixgbe] local_pci_probe+0x78/0xc0 work_for_cpu_fn+0x26/0x40 ... which means in 'kmalloc-4k' slab, there are 126 requests of 2240 bytes which got a 4KB space (wasting 1856 bytes each and 233856 bytes in total), from ixgbe_alloc_q_vector(). And when system starts some real workload like multiple docker instances, there could are more severe waste. [1]. https://lkml.org/lkml/2019/8/12/266 [2]. https://lore.kernel.org/lkml/2920df89-9975-5785-f79b-257d3052dfaf@huawei.com/ [Thanks Hyeonggon for pointing out several bugs about sorting/format] [Thanks Vlastimil for suggesting way to reduce memory usage of orig_size and keep it only for kmalloc objects] Signed-off-by: Feng Tang Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Robin Murphy Cc: John Garry Cc: Kefeng Wang Signed-off-by: Vlastimil Babka --- include/linux/slab.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 0fefdf528e0d..a713b0e5bbcd 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -29,6 +29,8 @@ #define SLAB_RED_ZONE ((slab_flags_t __force)0x00000400U) /* DEBUG: Poison objects */ #define SLAB_POISON ((slab_flags_t __force)0x00000800U) +/* Indicate a kmalloc slab */ +#define SLAB_KMALLOC ((slab_flags_t __force)0x00001000U) /* Align objs on cache lines */ #define SLAB_HWCACHE_ALIGN ((slab_flags_t __force)0x00002000U) /* Use GFP_DMA memory */ -- cgit From fea62d370d7a1ba288d71d0cae7ad47c2a02b839 Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 2 Sep 2022 15:48:22 +0000 Subject: x86/resctrl: Allow per-rmid arch private storage to be reset To abstract the rmid counters into a helper that returns the number of bytes counted, architecture specific per-rmid state is needed. It needs to be possible to reset this hidden state, as the values may outlive the life of an rmid, or the mount time of the filesystem. mon_event_read() is called with first = true when an rmid is first allocated in mkdir_mondata_subdir(). Add resctrl_arch_reset_rmid() and call it from __mon_event_count()'s rr->first check. Signed-off-by: James Morse Signed-off-by: Borislav Petkov Reviewed-by: Jamie Iles Reviewed-by: Shaopeng Tan Reviewed-by: Reinette Chatre Tested-by: Xin Hao Tested-by: Shaopeng Tan Tested-by: Cristian Marussi Link: https://lore.kernel.org/r/20220902154829.30399-15-james.morse@arm.com --- include/linux/resctrl.h | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) (limited to 'include/linux') diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index f4c9101df461..818456770176 100644 --- a/include/linux/resctrl.h +++ b/include/linux/resctrl.h @@ -32,6 +32,16 @@ enum resctrl_conf_type { #define CDP_NUM_TYPES (CDP_DATA + 1) +/* + * Event IDs, the values match those used to program IA32_QM_EVTSEL before + * reading IA32_QM_CTR on RDT systems. + */ +enum resctrl_event_id { + QOS_L3_OCCUP_EVENT_ID = 0x01, + QOS_L3_MBM_TOTAL_EVENT_ID = 0x02, + QOS_L3_MBM_LOCAL_EVENT_ID = 0x03, +}; + /** * struct resctrl_staged_config - parsed configuration to be applied * @new_ctrl: new ctrl value to be loaded @@ -210,4 +220,17 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d, int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d); void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d); +/** + * resctrl_arch_reset_rmid() - Reset any private state associated with rmid + * and eventid. + * @r: The domain's resource. + * @d: The rmid's domain. + * @rmid: The rmid whose counter values should be reset. + * @eventid: The eventid whose counter values should be reset. + * + * This can be called from any CPU. + */ +void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d, + u32 rmid, enum resctrl_event_id eventid); + #endif /* _RESCTRL_H */ -- cgit From 72bc36956f73ac54f19a7ac7302fb274069bec18 Mon Sep 17 00:00:00 2001 From: Sean Anderson Date: Tue, 20 Sep 2022 18:12:28 -0400 Subject: net: phylink: Document MAC_(A)SYM_PAUSE This documents the possible MLO_PAUSE_* settings which can result from different combinations of MAC_(A)SYM_PAUSE. Special note is paid to settings which can result from user configuration (MLO_PAUSE_AN). The autonegotiation results are more-or-less a direct consequence of IEEE 802.3 Table 28B-2. Signed-off-by: Sean Anderson Signed-off-by: David S. Miller --- include/linux/phylink.h | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 6d06896fc20d..1f997e14bf80 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -21,6 +21,35 @@ enum { MLO_AN_FIXED, /* Fixed-link mode */ MLO_AN_INBAND, /* In-band protocol */ + /* MAC_SYM_PAUSE and MAC_ASYM_PAUSE are used when configuring our + * autonegotiation advertisement. They correspond to the PAUSE and + * ASM_DIR bits defined by 802.3, respectively. + * + * The following table lists the values of tx_pause and rx_pause which + * might be requested in mac_link_up. The exact values depend on either + * the results of autonegotation (if MLO_PAUSE_AN is set) or user + * configuration (if MLO_PAUSE_AN is not set). + * + * MAC_SYM_PAUSE MAC_ASYM_PAUSE MLO_PAUSE_AN tx_pause/rx_pause + * ============= ============== ============ ================== + * 0 0 0 0/0 + * 0 0 1 0/0 + * 0 1 0 0/0, 0/1, 1/0, 1/1 + * 0 1 1 0/0, 1/0 + * 1 0 0 0/0, 1/1 + * 1 0 1 0/0, 1/1 + * 1 1 0 0/0, 0/1, 1/0, 1/1 + * 1 1 1 0/0, 0/1, 1/1 + * + * If you set MAC_ASYM_PAUSE, the user may request any combination of + * tx_pause and rx_pause. You do not have to support these + * combinations. + * + * However, you should support combinations of tx_pause and rx_pause + * which might be the result of autonegotation. For example, don't set + * MAC_SYM_PAUSE unless your device can support tx_pause and rx_pause + * at the same time. + */ MAC_SYM_PAUSE = BIT(0), MAC_ASYM_PAUSE = BIT(1), MAC_10HD = BIT(2), -- cgit From 606116529ab2d12e93bf751f74ed50a621b46846 Mon Sep 17 00:00:00 2001 From: Sean Anderson Date: Tue, 20 Sep 2022 18:12:29 -0400 Subject: net: phylink: Export phylink_caps_to_linkmodes This function is convenient for MAC drivers. They can use it to add or remove particular link modes based on capabilities (such as if half duplex is not supported for a particular interface mode). Signed-off-by: Sean Anderson Signed-off-by: David S. Miller --- include/linux/phylink.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 1f997e14bf80..7cf26d7a522d 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -547,6 +547,7 @@ void pcs_link_up(struct phylink_pcs *pcs, unsigned int mode, phy_interface_t interface, int speed, int duplex); #endif +void phylink_caps_to_linkmodes(unsigned long *linkmodes, unsigned long caps); void phylink_get_linkmodes(unsigned long *linkmodes, phy_interface_t interface, unsigned long mac_capabilities); void phylink_generic_validate(struct phylink_config *config, -- cgit From 3e6eab8f3ef93cd78cd4b67f497ef6035eb073ad Mon Sep 17 00:00:00 2001 From: Sean Anderson Date: Tue, 20 Sep 2022 18:12:30 -0400 Subject: net: phylink: Generate caps and convert to linkmodes separately If we call phylink_caps_to_linkmodes directly from phylink_get_linkmodes, it is difficult to re-use this functionality in MAC drivers. This is because MAC drivers must then work with an ethtool linkmode bitmap, instead of with mac capabilities. Instead, let the caller of phylink_get_linkmodes do the conversion. To reflect this change, rename the function to phylink_get_capabilities. Signed-off-by: Sean Anderson Signed-off-by: David S. Miller --- include/linux/phylink.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 7cf26d7a522d..cc039ae7e80c 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -548,8 +548,8 @@ void pcs_link_up(struct phylink_pcs *pcs, unsigned int mode, #endif void phylink_caps_to_linkmodes(unsigned long *linkmodes, unsigned long caps); -void phylink_get_linkmodes(unsigned long *linkmodes, phy_interface_t interface, - unsigned long mac_capabilities); +unsigned long phylink_get_capabilities(phy_interface_t interface, + unsigned long mac_capabilities); void phylink_generic_validate(struct phylink_config *config, unsigned long *supported, struct phylink_link_state *state); -- cgit From 0c3e10cb44232833a50cb8e3e784c432906a60c1 Mon Sep 17 00:00:00 2001 From: Sean Anderson Date: Tue, 20 Sep 2022 18:12:31 -0400 Subject: net: phy: Add support for rate matching This adds support for rate matching (also known as rate adaptation) to the phy subsystem. The general idea is that the phy interface runs at one speed, and the MAC throttles the rate at which it sends packets to the link speed. There's a good overview of several techniques for achieving this at [1]. This patch adds support for three: pause-frame based (such as in Aquantia phys), CRS-based (such as in 10PASS-TS and 2BASE-TL), and open-loop-based (such as in 10GBASE-W). This patch makes a few assumptions and a few non assumptions about the types of rate matching available. First, it assumes that different phys may use different forms of rate matching. Second, it assumes that phys can use rate matching for any of their supported link speeds (e.g. if a phy supports 10BASE-T and XGMII, then it can adapt XGMII to 10BASE-T). Third, it does not assume that all interface modes will use the same form of rate matching. Fourth, it does not assume that all phy devices will support rate matching (even if some do). Relaxing or strengthening these (non-)assumptions could result in a different API. For example, if all interface modes were assumed to use the same form of rate matching, then a bitmask of interface modes supportting rate matching would suffice. For some better visibility into the process, the current rate matching mode is exposed as part of the ethtool ksettings. For the moment, only read access is supported. I'm not sure what userspace might want to configure yet (disable it altogether, disable just one mode, specify the mode to use, etc.). For the moment, since only pause-based rate adaptation support is added in the next few commits, rate matching can be disabled altogether by adjusting the advertisement. 802.3 calls this feature "rate adaptation" in clause 49 (10GBASE-R) and "rate matching" in clause 61 (10PASS-TL and 2BASE-TS). Aquantia also calls this feature "rate adaptation". I chose "rate matching" because it is shorter, and because Russell doesn't think "adaptation" is correct in this context. Signed-off-by: Sean Anderson Signed-off-by: David S. Miller --- include/linux/phy.h | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 337230c135f7..9c66f357f489 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -280,7 +280,6 @@ static inline const char *phy_modes(phy_interface_t interface) } } - #define PHY_INIT_TIMEOUT 100000 #define PHY_FORCE_TIMEOUT 10 @@ -574,6 +573,7 @@ struct macsec_ops; * @lp_advertising: Current link partner advertised linkmodes * @eee_broken_modes: Energy efficient ethernet modes which should be prohibited * @autoneg: Flag autoneg being used + * @rate_matching: Current rate matching mode * @link: Current link state * @autoneg_complete: Flag auto negotiation of the link has completed * @mdix: Current crossover @@ -641,6 +641,8 @@ struct phy_device { unsigned irq_suspended:1; unsigned irq_rerun:1; + int rate_matching; + enum phy_state state; u32 dev_flags; @@ -805,6 +807,21 @@ struct phy_driver { */ int (*get_features)(struct phy_device *phydev); + /** + * @get_rate_matching: Get the supported type of rate matching for a + * particular phy interface. This is used by phy consumers to determine + * whether to advertise lower-speed modes for that interface. It is + * assumed that if a rate matching mode is supported on an interface, + * then that interface's rate can be adapted to all slower link speeds + * supported by the phy. If iface is %PHY_INTERFACE_MODE_NA, and the phy + * supports any kind of rate matching for any interface, then it must + * return that rate matching mode (preferring %RATE_MATCH_PAUSE to + * %RATE_MATCH_CRS). If the interface is not supported, this should + * return %RATE_MATCH_NONE. + */ + int (*get_rate_matching)(struct phy_device *phydev, + phy_interface_t iface); + /* PHY Power Management */ /** @suspend: Suspend the hardware, saving state if needed */ int (*suspend)(struct phy_device *phydev); @@ -971,6 +988,7 @@ struct phy_fixup { const char *phy_speed_to_str(int speed); const char *phy_duplex_to_str(unsigned int duplex); +const char *phy_rate_matching_to_str(int rate_matching); int phy_interface_num_ports(phy_interface_t interface); @@ -1687,6 +1705,8 @@ int phy_disable_interrupts(struct phy_device *phydev); void phy_request_interrupt(struct phy_device *phydev); void phy_free_interrupt(struct phy_device *phydev); void phy_print_status(struct phy_device *phydev); +int phy_get_rate_matching(struct phy_device *phydev, + phy_interface_t iface); void phy_set_max_speed(struct phy_device *phydev, u32 max_speed); void phy_remove_link_mode(struct phy_device *phydev, u32 link_mode); void phy_advertise_supported(struct phy_device *phydev); -- cgit From ae0e4bb2a0e0e434e9f98fb4994093ec2ee71997 Mon Sep 17 00:00:00 2001 From: Sean Anderson Date: Tue, 20 Sep 2022 18:12:32 -0400 Subject: net: phylink: Adjust link settings based on rate matching If the phy is configured to use pause-based rate matching, ensure that the link is full duplex with pause frame reception enabled. As suggested, if pause-based rate matching is enabled by the phy, then pause reception is unconditionally enabled. The interface duplex is determined based on the rate matching type. When rate matching is enabled, so is the speed. We assume the maximum interface speed is used. This is only relevant for MLO_AN_PHY. For MLO_AN_INBAND, the MAC/PCS's view of the interface speed will be used. Although there are no RATE_ADAPT_CRS phys in-tree, it has been added for comparison (and the implementation is quite simple). Co-developed-by: Russell King Signed-off-by: Sean Anderson Signed-off-by: David S. Miller --- include/linux/phylink.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index cc039ae7e80c..5c99c21e42b5 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -88,6 +88,10 @@ static inline bool phylink_autoneg_inband(unsigned int mode) * @speed: link speed, one of the SPEED_* constants. * @duplex: link duplex mode, one of DUPLEX_* constants. * @pause: link pause state, described by MLO_PAUSE_* constants. + * @rate_matching: rate matching being performed, one of the RATE_MATCH_* + * constants. If rate matching is taking place, then the speed/duplex of + * the medium link mode (@speed and @duplex) and the speed/duplex of the phy + * interface mode (@interface) are different. * @link: true if the link is up. * @an_enabled: true if autonegotiation is enabled/desired. * @an_complete: true if autonegotiation has completed. @@ -99,6 +103,7 @@ struct phylink_link_state { int speed; int duplex; int pause; + int rate_matching; unsigned int link:1; unsigned int an_enabled:1; unsigned int an_complete:1; -- cgit From b7e9294885b610791fcebc799bf2a9e219446fd6 Mon Sep 17 00:00:00 2001 From: Sean Anderson Date: Tue, 20 Sep 2022 18:12:33 -0400 Subject: net: phylink: Adjust advertisement based on rate matching This adds support for adjusting the advertisement for pause-based rate matching. This may result in a lossy link, since the final link settings are not adjusted. Asymmetric pause support is necessary. It would be possible for a MAC supporting only symmetric pause to use pause-based rate adaptation, but only if pause reception was enabled as well. Signed-off-by: Sean Anderson Signed-off-by: David S. Miller --- include/linux/phylink.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 5c99c21e42b5..664dd409feb9 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -554,7 +554,8 @@ void pcs_link_up(struct phylink_pcs *pcs, unsigned int mode, void phylink_caps_to_linkmodes(unsigned long *linkmodes, unsigned long caps); unsigned long phylink_get_capabilities(phy_interface_t interface, - unsigned long mac_capabilities); + unsigned long mac_capabilities, + int rate_matching); void phylink_generic_validate(struct phylink_config *config, unsigned long *supported, struct phylink_link_state *state); -- cgit From 4d044c521a63b2cd394ea6e3547012032145e47e Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 2 Sep 2022 15:48:23 +0000 Subject: x86/resctrl: Abstract __rmid_read() __rmid_read() selects the specified eventid and returns the counter value from the MSR. The error handling is architecture specific, and handled by the callers, rdtgroup_mondata_show() and __mon_event_count(). Error handling should be handled by architecture specific code, as a different architecture may have different requirements. MPAM's counters can report that they are 'not ready', requiring a second read after a short delay. This should be hidden from resctrl. Make __rmid_read() the architecture specific function for reading a counter. Rename it resctrl_arch_rmid_read() and move the error handling into it. A read from a counter that hardware supports but resctrl does not now returns -EINVAL instead of -EIO from the default case in __mon_event_count(). It isn't possible for user-space to see this change as resctrl doesn't expose counters it doesn't support. Signed-off-by: James Morse Signed-off-by: Borislav Petkov Reviewed-by: Jamie Iles Reviewed-by: Shaopeng Tan Reviewed-by: Reinette Chatre Tested-by: Xin Hao Tested-by: Shaopeng Tan Tested-by: Cristian Marussi Link: https://lore.kernel.org/r/20220902154829.30399-16-james.morse@arm.com --- include/linux/resctrl.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index 818456770176..efe60dd7fd21 100644 --- a/include/linux/resctrl.h +++ b/include/linux/resctrl.h @@ -219,6 +219,7 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d, u32 closid, enum resctrl_conf_type type); int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d); void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d); +int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *res); /** * resctrl_arch_reset_rmid() - Reset any private state associated with rmid -- cgit From 8286618aca331bf17323ff3023ca831ac6e4b86f Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 2 Sep 2022 15:48:24 +0000 Subject: x86/resctrl: Pass the required parameters into resctrl_arch_rmid_read() resctrl_arch_rmid_read() is intended as the function that an architecture agnostic resctrl filesystem driver can use to read a value in bytes from a hardware register. Currently the function returns the MBM values in chunks directly from hardware. To convert this to bytes, some correction and overflow calculations are needed. These depend on the resource and domain structures. Overflow detection requires the old chunks value. None of this is available to resctrl_arch_rmid_read(). MPAM requires the resource and domain structures to find the MMIO device that holds the registers. Pass the resource and domain to resctrl_arch_rmid_read(). This makes rmid_dirty() too big. Instead merge it with its only caller, and the name is kept as a local variable. Signed-off-by: James Morse Signed-off-by: Borislav Petkov Reviewed-by: Jamie Iles Reviewed-by: Shaopeng Tan Reviewed-by: Reinette Chatre Tested-by: Xin Hao Tested-by: Shaopeng Tan Tested-by: Cristian Marussi Link: https://lore.kernel.org/r/20220902154829.30399-17-james.morse@arm.com --- include/linux/resctrl.h | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index efe60dd7fd21..7ccfa0d1bb34 100644 --- a/include/linux/resctrl.h +++ b/include/linux/resctrl.h @@ -219,7 +219,23 @@ u32 resctrl_arch_get_config(struct rdt_resource *r, struct rdt_domain *d, u32 closid, enum resctrl_conf_type type); int resctrl_online_domain(struct rdt_resource *r, struct rdt_domain *d); void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d); -int resctrl_arch_rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *res); + +/** + * resctrl_arch_rmid_read() - Read the eventid counter corresponding to rmid + * for this resource and domain. + * @r: resource that the counter should be read from. + * @d: domain that the counter should be read from. + * @rmid: rmid of the counter to read. + * @eventid: eventid to read, e.g. L3 occupancy. + * @val: result of the counter read in chunks. + * + * Call from process context on a CPU that belongs to domain @d. + * + * Return: + * 0 on success, or -EIO, -EINVAL etc on error. + */ +int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d, + u32 rmid, enum resctrl_event_id eventid, u64 *val); /** * resctrl_arch_reset_rmid() - Reset any private state associated with rmid -- cgit From ae2328b52962531c2d7c6b531022a3eb2d680f17 Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 2 Sep 2022 15:48:27 +0000 Subject: x86/resctrl: Rename and change the units of resctrl_cqm_threshold resctrl_cqm_threshold is stored in a hardware specific chunk size, but exposed to user-space as bytes. This means the filesystem parts of resctrl need to know how the hardware counts, to convert the user provided byte value to chunks. The interface between the architecture's resctrl code and the filesystem ought to treat everything as bytes. Change the unit of resctrl_cqm_threshold to bytes. resctrl_arch_rmid_read() still returns its value in chunks, so this needs converting to bytes. As all the users have been touched, rename the variable to resctrl_rmid_realloc_threshold, which describes what the value is for. Neither r->num_rmid nor hw_res->mon_scale are guaranteed to be a power of 2, so the existing code introduces a rounding error from resctrl's theoretical fraction of the cache usage. This behaviour is kept as it ensures the user visible value matches the value read from hardware when the rmid will be reallocated. Signed-off-by: James Morse Signed-off-by: Borislav Petkov Reviewed-by: Jamie Iles Reviewed-by: Shaopeng Tan Reviewed-by: Reinette Chatre Tested-by: Xin Hao Tested-by: Shaopeng Tan Tested-by: Cristian Marussi Link: https://lore.kernel.org/r/20220902154829.30399-20-james.morse@arm.com --- include/linux/resctrl.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index 7ccfa0d1bb34..9995d043650a 100644 --- a/include/linux/resctrl.h +++ b/include/linux/resctrl.h @@ -250,4 +250,6 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain *d, void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d, u32 rmid, enum resctrl_event_id eventid); +extern unsigned int resctrl_rmid_realloc_threshold; + #endif /* _RESCTRL_H */ -- cgit From d80975e264c8f01518890f3d91ab5bada8fa7f5e Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 2 Sep 2022 15:48:28 +0000 Subject: x86/resctrl: Add resctrl_rmid_realloc_limit to abstract x86's boot_cpu_data resctrl_rmid_realloc_threshold can be set by user-space. The maximum value is specified by the architecture. Currently max_threshold_occ_write() reads the maximum value from boot_cpu_data.x86_cache_size, which is not portable to another architecture. Add resctrl_rmid_realloc_limit to describe the maximum size in bytes that user-space can set the threshold to. Signed-off-by: James Morse Signed-off-by: Borislav Petkov Reviewed-by: Jamie Iles Reviewed-by: Shaopeng Tan Reviewed-by: Reinette Chatre Tested-by: Xin Hao Tested-by: Shaopeng Tan Tested-by: Cristian Marussi Link: https://lore.kernel.org/r/20220902154829.30399-21-james.morse@arm.com --- include/linux/resctrl.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index 9995d043650a..cb857f753322 100644 --- a/include/linux/resctrl.h +++ b/include/linux/resctrl.h @@ -251,5 +251,6 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_domain *d, u32 rmid, enum resctrl_event_id eventid); extern unsigned int resctrl_rmid_realloc_threshold; +extern unsigned int resctrl_rmid_realloc_limit; #endif /* _RESCTRL_H */ -- cgit From f7b1843eca6fe295ba0c71fc02a3291954078f2b Mon Sep 17 00:00:00 2001 From: James Morse Date: Fri, 2 Sep 2022 15:48:29 +0000 Subject: x86/resctrl: Make resctrl_arch_rmid_read() return values in bytes resctrl_arch_rmid_read() returns a value in chunks, as read from the hardware. This needs scaling to bytes by mon_scale, as provided by the architecture code. Now that resctrl_arch_rmid_read() performs the overflow and corrections itself, it may as well return a value in bytes directly. This allows the accesses to the architecture specific 'hw' structure to be removed. Move the mon_scale conversion into resctrl_arch_rmid_read(). mbm_bw_count() is updated to calculate bandwidth from bytes. Signed-off-by: James Morse Signed-off-by: Borislav Petkov Reviewed-by: Jamie Iles Reviewed-by: Shaopeng Tan Reviewed-by: Reinette Chatre Tested-by: Xin Hao Tested-by: Shaopeng Tan Tested-by: Cristian Marussi Link: https://lore.kernel.org/r/20220902154829.30399-22-james.morse@arm.com --- include/linux/resctrl.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h index cb857f753322..0cf5b20c6ddf 100644 --- a/include/linux/resctrl.h +++ b/include/linux/resctrl.h @@ -227,7 +227,7 @@ void resctrl_offline_domain(struct rdt_resource *r, struct rdt_domain *d); * @d: domain that the counter should be read from. * @rmid: rmid of the counter to read. * @eventid: eventid to read, e.g. L3 occupancy. - * @val: result of the counter read in chunks. + * @val: result of the counter read in bytes. * * Call from process context on a CPU that belongs to domain @d. * -- cgit From 1e1f26635e5459db4134952369b76b8d59c50438 Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Mon, 19 Sep 2022 19:58:03 -0700 Subject: ASoC: ssm2518: drop support for platform data There are currently no users of this driver's platform data in the mainline kernel, so let's drop it. Newer devices should use DT, ACPI, or static software properties to describe the hardware. Signed-off-by: Dmitry Torokhov Link: https://lore.kernel.org/r/20220920025804.1788667-1-dmitry.torokhov@gmail.com Signed-off-by: Mark Brown --- include/linux/platform_data/ssm2518.h | 21 --------------------- 1 file changed, 21 deletions(-) delete mode 100644 include/linux/platform_data/ssm2518.h (limited to 'include/linux') diff --git a/include/linux/platform_data/ssm2518.h b/include/linux/platform_data/ssm2518.h deleted file mode 100644 index 3f9e632d6f63..000000000000 --- a/include/linux/platform_data/ssm2518.h +++ /dev/null @@ -1,21 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0-only */ -/* - * SSM2518 amplifier audio driver - * - * Copyright 2013 Analog Devices Inc. - * Author: Lars-Peter Clausen - */ - -#ifndef __LINUX_PLATFORM_DATA_SSM2518_H__ -#define __LINUX_PLATFORM_DATA_SSM2518_H__ - -/** - * struct ssm2518_platform_data - Platform data for the ssm2518 driver - * @enable_gpio: GPIO connected to the nSD pin. Set to -1 if the nSD pin is - * hardwired. - */ -struct ssm2518_platform_data { - int enable_gpio; -}; - -#endif -- cgit From 22873deac9e7b273bbf17eee515c8170510d861a Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Sat, 24 Sep 2022 06:59:59 +0200 Subject: vfs: add vfs_tmpfile_open() helper This helper unifies tmpfile creation with opening. Existing vfs_tmpfile() callers outside of fs/namei.c will be converted to using this helper. There are two such callers: cachefile and overlayfs. The cachefiles code currently uses the open_with_fake_path() helper to open the tmpfile, presumably to disable accounting of the open file. Overlayfs uses tmpfile for copy_up, which means these struct file instances will be short lived, hence it doesn't really matter if they are accounted or not. Disable accounting in this helper too, which should be okay for both callers. Add MAY_OPEN permission checking for consistency. Like for create(2) read/write permissions are not checked. Reviewed-by: Christian Brauner (Microsoft) Signed-off-by: Miklos Szeredi --- include/linux/fs.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 9eced4cc286e..15fafda95dd3 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2007,6 +2007,10 @@ static inline int vfs_whiteout(struct user_namespace *mnt_userns, struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns, struct dentry *dentry, umode_t mode, int open_flag); +struct file *vfs_tmpfile_open(struct user_namespace *mnt_userns, + const struct path *parentpath, + umode_t mode, int open_flag, const struct cred *cred); + int vfs_mkobj(struct dentry *, umode_t, int (*f)(struct dentry *, umode_t, void *), void *); -- cgit From 3e9d4c593558ea86f49e10e62373a54c7f5a63e4 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Sat, 24 Sep 2022 07:00:00 +0200 Subject: vfs: make vfs_tmpfile() static No callers outside of fs/namei.c anymore. Reviewed-by: Christian Brauner (Microsoft) Signed-off-by: Miklos Szeredi --- include/linux/fs.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 15fafda95dd3..02646542f6bb 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2004,9 +2004,6 @@ static inline int vfs_whiteout(struct user_namespace *mnt_userns, WHITEOUT_DEV); } -struct dentry *vfs_tmpfile(struct user_namespace *mnt_userns, - struct dentry *dentry, umode_t mode, int open_flag); - struct file *vfs_tmpfile_open(struct user_namespace *mnt_userns, const struct path *parentpath, umode_t mode, int open_flag, const struct cred *cred); -- cgit From 9751b338656f05a0ce918befd5118fcd970c71c6 Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Sat, 24 Sep 2022 07:00:00 +0200 Subject: vfs: move open right after ->tmpfile() Create a helper finish_open_simple() that opens the file with the original dentry. Handle the error case here as well to simplify callers. Call this helper right after ->tmpfile() is called. Next patch will change the tmpfile API and move this call into tmpfile instances. Signed-off-by: Miklos Szeredi --- include/linux/fs.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 02646542f6bb..a3c50869e79b 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2780,6 +2780,15 @@ extern int finish_open(struct file *file, struct dentry *dentry, int (*open)(struct inode *, struct file *)); extern int finish_no_open(struct file *file, struct dentry *dentry); +/* Helper for the simple case when original dentry is used */ +static inline int finish_open_simple(struct file *file, int error) +{ + if (error) + return error; + + return finish_open(file, file->f_path.dentry, NULL); +} + /* fs/dcache.c */ extern void __init vfs_caches_init_early(void); extern void __init vfs_caches_init(void); -- cgit From 863f144f12add1f4eab80b70561a90857c524a8b Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Sat, 24 Sep 2022 07:00:00 +0200 Subject: vfs: open inside ->tmpfile() This is in preparation for adding tmpfile support to fuse, which requires that the tmpfile creation and opening are done as a single operation. Replace the 'struct dentry *' argument of i_op->tmpfile with 'struct file *'. Call finish_open_simple() as the last thing in ->tmpfile() instances (may be omitted in the error case). Change d_tmpfile() argument to 'struct file *' as well to make callers more readable. Reviewed-by: Christian Brauner (Microsoft) Signed-off-by: Miklos Szeredi --- include/linux/dcache.h | 3 ++- include/linux/fs.h | 2 +- 2 files changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dcache.h b/include/linux/dcache.h index 92c78ed02b54..bde9f8ff8869 100644 --- a/include/linux/dcache.h +++ b/include/linux/dcache.h @@ -16,6 +16,7 @@ #include struct path; +struct file; struct vfsmount; /* @@ -250,7 +251,7 @@ extern struct dentry * d_make_root(struct inode *); /* - the ramfs-type tree */ extern void d_genocide(struct dentry *); -extern void d_tmpfile(struct dentry *, struct inode *); +extern void d_tmpfile(struct file *, struct inode *); extern struct dentry *d_find_alias(struct inode *); extern void d_prune_aliases(struct inode *); diff --git a/include/linux/fs.h b/include/linux/fs.h index a3c50869e79b..8218d9964ff8 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2168,7 +2168,7 @@ struct inode_operations { struct file *, unsigned open_flag, umode_t create_mode); int (*tmpfile) (struct user_namespace *, struct inode *, - struct dentry *, umode_t); + struct file *, umode_t); int (*set_acl)(struct user_namespace *, struct inode *, struct posix_acl *, int); int (*fileattr_set)(struct user_namespace *mnt_userns, -- cgit From 4a575865c1ea67018d96acda9b43e5d3d25b2366 Mon Sep 17 00:00:00 2001 From: Rafał Miłecki Date: Fri, 16 Sep 2022 13:20:49 +0100 Subject: mtd: allow getting MTD device associated with a specific DT node MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MTD subsystem API allows interacting with MTD devices (e.g. reading, writing, handling bad blocks). So far a random driver could get MTD device only by its name (get_mtd_device_nm()). This change allows getting them also by a DT node. This API is required for drivers handling DT defined MTD partitions in a specific way (e.g. U-Boot (sub)partition with environment variables). Acked-by: Miquel Raynal Signed-off-by: Rafał Miłecki Signed-off-by: Srinivas Kandagatla Link: https://lore.kernel.org/r/20220916122100.170016-3-srinivas.kandagatla@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/mtd/mtd.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mtd/mtd.h b/include/linux/mtd/mtd.h index 955aee14b0f7..6fc841ceef31 100644 --- a/include/linux/mtd/mtd.h +++ b/include/linux/mtd/mtd.h @@ -677,6 +677,7 @@ extern int mtd_device_unregister(struct mtd_info *master); extern struct mtd_info *get_mtd_device(struct mtd_info *mtd, int num); extern int __get_mtd_device(struct mtd_info *mtd); extern void __put_mtd_device(struct mtd_info *mtd); +extern struct mtd_info *of_get_mtd_device_by_node(struct device_node *np); extern struct mtd_info *get_mtd_device_nm(const char *name); extern void put_mtd_device(struct mtd_info *mtd); -- cgit From aade55c86033bee868a93e4bf3843c9c99e84526 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Thu, 22 Sep 2022 16:54:10 +0300 Subject: device property: Add const qualifier to device_get_match_data() parameter Add const qualifier to the device_get_match_data() parameter. Some of the future users may utilize this function without forcing the type. All the same, dev_fwnode() may be used with a const qualifier. Reported-by: kernel test robot Acked-by: Heikki Krogerus Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220922135410.49694-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman --- include/linux/property.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/property.h b/include/linux/property.h index a5b429d623f6..117cc200c656 100644 --- a/include/linux/property.h +++ b/include/linux/property.h @@ -32,7 +32,7 @@ enum dev_dma_attr { DEV_DMA_COHERENT, }; -struct fwnode_handle *dev_fwnode(struct device *dev); +struct fwnode_handle *dev_fwnode(const struct device *dev); bool device_property_present(struct device *dev, const char *propname); int device_property_read_u8_array(struct device *dev, const char *propname, @@ -387,7 +387,7 @@ bool device_dma_supported(struct device *dev); enum dev_dma_attr device_get_dma_attr(struct device *dev); -const void *device_get_match_data(struct device *dev); +const void *device_get_match_data(const struct device *dev); int device_get_phy_mode(struct device *dev); int fwnode_get_phy_mode(struct fwnode_handle *fwnode); -- cgit From bf2ee8d0c385f883a00473768b67faf2189b2410 Mon Sep 17 00:00:00 2001 From: Jianmin Lv Date: Sun, 11 Sep 2022 17:06:34 +0800 Subject: ACPI: scan: Support multiple DMA windows with different offsets In DT systems configurations, of_dma_get_range() returns struct bus_dma_region DMA regions; they are used to set-up devices DMA windows with different offset available for translation between DMA address and CPU address. In ACPI systems configuration, acpi_dma_get_range() does not return DMA regions yet and that precludes setting up the dev->dma_range_map pointer and therefore DMA regions with multiple offsets. Update acpi_dma_get_range() to return struct bus_dma_region DMA regions like of_dma_get_range() does. After updating acpi_dma_get_range(), acpi_arch_dma_setup() is changed for ARM64, where the original dma_addr and size are removed as these arguments are now redundant, and pass 0 and U64_MAX for dma_base and size of arch_setup_dma_ops; this is a simplification consistent with what other ACPI architectures also pass to iommu_setup_dma_ops(). Reviewed-by: Robin Murphy Signed-off-by: Jianmin Lv Reviewed-by: Lorenzo Pieralisi Signed-off-by: Rafael J. Wysocki --- include/linux/acpi.h | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 6f64b2f3dc54..bb41623dab77 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -281,12 +281,12 @@ void acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa); #ifdef CONFIG_ARM64 void acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa); -void acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size); +void acpi_arch_dma_setup(struct device *dev); #else static inline void acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa) { } static inline void -acpi_arch_dma_setup(struct device *dev, u64 *dma_addr, u64 *dma_size) { } +acpi_arch_dma_setup(struct device *dev) { } #endif int acpi_numa_memory_affinity_init (struct acpi_srat_mem_affinity *ma); @@ -977,8 +977,7 @@ static inline enum dev_dma_attr acpi_get_dma_attr(struct acpi_device *adev) return DEV_DMA_NOT_SUPPORTED; } -static inline int acpi_dma_get_range(struct device *dev, u64 *dma_addr, - u64 *offset, u64 *size) +static inline int acpi_dma_get_range(struct device *dev, const struct bus_dma_region **map) { return -ENODEV; } -- cgit From c78c43fe7d42524c8f364aaf95ef3652e7f1186b Mon Sep 17 00:00:00 2001 From: Jianmin Lv Date: Sun, 11 Sep 2022 17:06:35 +0800 Subject: LoongArch: Use acpi_arch_dma_setup() and remove ARCH_HAS_PHYS_TO_DMA Use _DMA defined in ACPI spec for translation between DMA address and CPU address, and implement acpi_arch_dma_setup for initializing dev->dma_range_map, where acpi_dma_get_range is called for parsing _DMA. e.g. If we have two dma ranges: cpu address dma address size offset 0x200080000000 0x2080000000 0x400000000 0x1fe000000000 0x400080000000 0x4080000000 0x400000000 0x3fc000000000 _DMA for pci devices should be declared in host bridge as flowing: Name (_DMA, ResourceTemplate() { QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, NonCacheable, ReadWrite, 0x0, 0x4080000000, 0x447fffffff, 0x3fc000000000, 0x400000000, , , ) QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, NonCacheable, ReadWrite, 0x0, 0x2080000000, 0x247fffffff, 0x1fe000000000, 0x400000000, , , ) }) Acked-by: Huacai Chen Signed-off-by: Jianmin Lv Signed-off-by: Rafael J. Wysocki --- include/linux/acpi.h | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index bb41623dab77..a71d73a0d43e 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -279,14 +279,17 @@ acpi_numa_processor_affinity_init(struct acpi_srat_cpu_affinity *pa) { } void acpi_numa_x2apic_affinity_init(struct acpi_srat_x2apic_cpu_affinity *pa); +#if defined(CONFIG_ARM64) || defined(CONFIG_LOONGARCH) +void acpi_arch_dma_setup(struct device *dev); +#else +static inline void acpi_arch_dma_setup(struct device *dev) { } +#endif + #ifdef CONFIG_ARM64 void acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa); -void acpi_arch_dma_setup(struct device *dev); #else static inline void acpi_numa_gicc_affinity_init(struct acpi_srat_gicc_affinity *pa) { } -static inline void -acpi_arch_dma_setup(struct device *dev) { } #endif int acpi_numa_memory_affinity_init (struct acpi_srat_mem_affinity *ma); -- cgit From 43cf36974d760a3d1c705a83de89ac58059e5f0b Mon Sep 17 00:00:00 2001 From: Daniel Scally Date: Thu, 22 Sep 2022 00:04:37 +0100 Subject: platform/x86: int3472: Support multiple clock consumers At present, the tps68470.c only supports a single clock consumer when passing platform data to the clock driver. In some devices multiple sensors depend on the clock provided by a single TPS68470 and so all need to be able to acquire the clock. Support passing multiple consumers as platform data. Reviewed-by: Hans de Goede Signed-off-by: Daniel Scally Reviewed-by: Stephen Boyd Acked-by: Stephen Boyd Signed-off-by: Rafael J. Wysocki --- include/linux/platform_data/tps68470.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/platform_data/tps68470.h b/include/linux/platform_data/tps68470.h index 126d082c3f2e..e605a2cab07f 100644 --- a/include/linux/platform_data/tps68470.h +++ b/include/linux/platform_data/tps68470.h @@ -27,9 +27,14 @@ struct tps68470_regulator_platform_data { const struct regulator_init_data *reg_init_data[TPS68470_NUM_REGULATORS]; }; -struct tps68470_clk_platform_data { +struct tps68470_clk_consumer { const char *consumer_dev_name; const char *consumer_con_id; }; +struct tps68470_clk_platform_data { + unsigned int n_consumers; + struct tps68470_clk_consumer consumers[]; +}; + #endif -- cgit From 7c7f9bc986e698873b489c371a08f206979d06b7 Mon Sep 17 00:00:00 2001 From: Lukas Wunner Date: Thu, 22 Sep 2022 18:27:33 +0200 Subject: serial: Deassert Transmit Enable on probe in driver-specific way MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit When a UART port is newly registered, uart_configure_port() seeks to deassert RS485 Transmit Enable by setting the RTS bit in port->mctrl. However a number of UART drivers interpret a set RTS bit as *assertion* instead of deassertion: Affected drivers include those using serial8250_em485_config() (except 8250_bcm2835aux.c) and some using mctrl_gpio (e.g. imx.c). Since the interpretation of the RTS bit is driver-specific, it is not suitable as a means to centrally deassert Transmit Enable in the serial core. Instead, the serial core must call on drivers to deassert it in their driver-specific way. One way to achieve that is to call ->rs485_config(). It implicitly deasserts Transmit Enable. So amend uart_configure_port() and uart_resume_port() to invoke uart_rs485_config(). That allows removing calls to uart_rs485_config() from drivers' ->probe() hooks and declaring the function static. Skip any invocation of ->set_mctrl() if RS485 is enabled. RS485 has no hardware flow control, so the modem control lines are irrelevant and need not be touched. When leaving RS485 mode, reset the modem control lines to the state stored in port->mctrl. That way, UARTs which are muxed between RS485 and RS232 transceivers drive the lines correctly when switched to RS232. (serial8250_do_startup() historically raises the OUT1 modem signal because otherwise interrupts are not signaled on ancient PC UARTs, but I believe that no longer applies to modern, RS485-capable UARTs and is thus safe to be skipped.) imx.c modifies port->mctrl whenever Transmit Enable is asserted and deasserted. Stop it from doing that so port->mctrl reflects the RS232 line state. 8250_omap.c deasserts Transmit Enable on ->runtime_resume() by calling ->set_mctrl(). Because that is now a no-op in RS485 mode, amend the function to call serial8250_em485_stop_tx(). fsl_lpuart.c retrieves and applies the RS485 device tree properties after registering the UART port. Because applying now happens on registration in uart_configure_port(), move retrieval of the properties ahead of uart_add_one_port(). Link: https://lore.kernel.org/all/20220329085050.311408-1-matthias.schiffer@ew.tq-group.com/ Link: https://lore.kernel.org/all/8f538a8903795f22f9acc94a9a31b03c9c4ccacb.camel@ginzinger.com/ Fixes: d3b3404df318 ("serial: Fix incorrect rs485 polarity on uart open") Cc: stable@vger.kernel.org # v4.14+ Reported-by: Matthias Schiffer Reported-by: Roosen Henri Tested-by: Matthias Schiffer Reviewed-by: Ilpo Järvinen Signed-off-by: Lukas Wunner Link: https://lore.kernel.org/r/2de36eba3fbe11278d5002e4e501afe0ceaca039.1663863805.git.lukas@wunner.de Signed-off-by: Greg Kroah-Hartman --- include/linux/serial_core.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index b4c21f84ad79..d657f2a42a7b 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -951,5 +951,4 @@ static inline int uart_handle_break(struct uart_port *port) !((cflag) & CLOCAL)) int uart_get_rs485_mode(struct uart_port *port); -int uart_rs485_config(struct uart_port *port); #endif /* LINUX_SERIAL_CORE_H */ -- cgit From 1a77dd1c2bb5d4a58c16d198cf593720787c02e4 Mon Sep 17 00:00:00 2001 From: Arun Easi Date: Wed, 7 Sep 2022 16:33:08 -0700 Subject: scsi: tracing: Fix compile error in trace_array calls when TRACING is disabled Fix this compilation error seen when CONFIG_TRACING is not enabled: drivers/scsi/qla2xxx/qla_os.c: In function 'qla_trace_init': drivers/scsi/qla2xxx/qla_os.c:2854:25: error: implicit declaration of function 'trace_array_get_by_name'; did you mean 'trace_array_set_clr_event'? [-Werror=implicit-function-declaration] 2854 | qla_trc_array = trace_array_get_by_name("qla2xxx"); | ^~~~~~~~~~~~~~~~~~~~~~~ | trace_array_set_clr_event drivers/scsi/qla2xxx/qla_os.c: In function 'qla_trace_uninit': drivers/scsi/qla2xxx/qla_os.c:2869:9: error: implicit declaration of function 'trace_array_put' [-Werror=implicit-function-declaration] 2869 | trace_array_put(qla_trc_array); | ^~~~~~~~~~~~~~~ Link: https://lore.kernel.org/r/20220907233308.4153-2-aeasi@marvell.com Reported-by: kernel test robot Reviewed-by: Steven Rostedt (Google) Signed-off-by: Arun Easi Signed-off-by: Martin K. Petersen --- include/linux/trace.h | 36 ++++++++++++++++++++++++++++++++++-- 1 file changed, 34 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/trace.h b/include/linux/trace.h index bf169612ffe1..b5e16e438448 100644 --- a/include/linux/trace.h +++ b/include/linux/trace.h @@ -2,8 +2,6 @@ #ifndef _LINUX_TRACE_H #define _LINUX_TRACE_H -#ifdef CONFIG_TRACING - #define TRACE_EXPORT_FUNCTION BIT(0) #define TRACE_EXPORT_EVENT BIT(1) #define TRACE_EXPORT_MARKER BIT(2) @@ -28,6 +26,8 @@ struct trace_export { int flags; }; +#ifdef CONFIG_TRACING + int register_ftrace_export(struct trace_export *export); int unregister_ftrace_export(struct trace_export *export); @@ -48,6 +48,38 @@ void osnoise_arch_unregister(void); void osnoise_trace_irq_entry(int id); void osnoise_trace_irq_exit(int id, const char *desc); +#else /* CONFIG_TRACING */ +static inline int register_ftrace_export(struct trace_export *export) +{ + return -EINVAL; +} +static inline int unregister_ftrace_export(struct trace_export *export) +{ + return 0; +} +static inline void trace_printk_init_buffers(void) +{ +} +static inline int trace_array_printk(struct trace_array *tr, unsigned long ip, + const char *fmt, ...) +{ + return 0; +} +static inline int trace_array_init_printk(struct trace_array *tr) +{ + return -EINVAL; +} +static inline void trace_array_put(struct trace_array *tr) +{ +} +static inline struct trace_array *trace_array_get_by_name(const char *name) +{ + return NULL; +} +static inline int trace_array_destroy(struct trace_array *tr) +{ + return 0; +} #endif /* CONFIG_TRACING */ #endif /* _LINUX_TRACE_H */ -- cgit From 38622010a6de3a62cc72688348548854ed82dcf5 Mon Sep 17 00:00:00 2001 From: Boris Burkov Date: Mon, 15 Aug 2022 13:54:28 -0700 Subject: btrfs: send: add support for fs-verity Preserve the fs-verity status of a btrfs file across send/recv. There is no facility for installing the Merkle tree contents directly on the receiving filesystem, so we package up the parameters used to enable verity found in the verity descriptor. This gives the receive side enough information to properly enable verity again. Note that this means that receive will have to re-compute the whole Merkle tree, similar to how compression worked before encoded_write. Since the file becomes read-only after verity is enabled, it is important that verity is added to the send stream after any file writes. Therefore, when we process a verity item, merely note that it happened, then actually create the command in the send stream during 'finish_inode_if_needed'. This also creates V3 of the send stream format, without any format changes besides adding the new commands and attributes. Signed-off-by: Boris Burkov Signed-off-by: David Sterba --- include/linux/fsverity.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fsverity.h b/include/linux/fsverity.h index 7af030fa3c36..40f14e5fed9d 100644 --- a/include/linux/fsverity.h +++ b/include/linux/fsverity.h @@ -22,6 +22,9 @@ */ #define FS_VERITY_MAX_DIGEST_SIZE SHA512_DIGEST_SIZE +/* Arbitrary limit to bound the kmalloc() size. Can be changed. */ +#define FS_VERITY_MAX_DESCRIPTOR_SIZE 16384 + /* Verity operations for filesystems */ struct fsverity_operations { -- cgit From 691cdf016d3be6f66a3ea384809be229e0f9c590 Mon Sep 17 00:00:00 2001 From: Christophe Leroy Date: Wed, 7 Sep 2022 11:34:45 +0200 Subject: powerpc: Rely on generic definition of hugepd_t and is_hugepd when unused CONFIG_ARCH_HAS_HUGEPD is used to tell core mm when huge page directories are used. When they are not used, no need to provide hugepd_t or is_hugepd(), just rely on the core mm fallback definition. For that, change core mm behaviour so that CONFIG_ARCH_HAS_HUGEPD is used instead of indirect is_hugepd macro existence. powerpc being the only user of huge page directories, there is no impact on other architectures. Signed-off-by: Christophe Leroy Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/da81462d93069bb90fe5e762dd3283a644318937.1662543243.git.christophe.leroy@csgroup.eu --- include/linux/hugetlb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 3ec981a0d8b3..1ec1535be04f 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -17,7 +17,7 @@ struct ctl_table; struct user_struct; struct mmu_gather; -#ifndef is_hugepd +#ifndef CONFIG_ARCH_HAS_HUGEPD typedef struct { unsigned long pd; } hugepd_t; #define is_hugepd(hugepd) (0) #define __hugepd(x) ((hugepd_t) { (x) }) -- cgit From 4f58330fcc8482aa90674e1f40f601e82f18ed4a Mon Sep 17 00:00:00 2001 From: Robin Murphy Date: Tue, 13 Sep 2022 12:47:20 +0100 Subject: iommu/iova: Fix module config properly IOMMU_IOVA is intended to be an optional library for users to select as and when they desire. Since it can be a module now, this means that built-in code which has chosen not to select it should not fail to link if it happens to have selected as a module by someone else. Replace IS_ENABLED() with IS_REACHABLE() to do the right thing. CC: Thierry Reding Reported-by: John Garry Fixes: 15bbdec3931e ("iommu: Make the iova library a module") Signed-off-by: Robin Murphy Reviewed-by: Thierry Reding Link: https://lore.kernel.org/r/548c2f683ca379aface59639a8f0cccc3a1ac050.1663069227.git.robin.murphy@arm.com Signed-off-by: Joerg Roedel --- include/linux/iova.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/iova.h b/include/linux/iova.h index c6ba6d95d79c..83c00fac2acb 100644 --- a/include/linux/iova.h +++ b/include/linux/iova.h @@ -75,7 +75,7 @@ static inline unsigned long iova_pfn(struct iova_domain *iovad, dma_addr_t iova) return iova >> iova_shift(iovad); } -#if IS_ENABLED(CONFIG_IOMMU_IOVA) +#if IS_REACHABLE(CONFIG_IOMMU_IOVA) int iova_cache_get(void); void iova_cache_put(void); -- cgit From dc09fe1c5edd9c27a52cb6dc5a7bb4452d45c71c Mon Sep 17 00:00:00 2001 From: Sven Peter Date: Fri, 16 Sep 2022 11:41:51 +0200 Subject: iommu/io-pgtable-dart: Add DART PTE support for t6000 The DARTs present in the M1 Pro/Max/Ultra SoC use a diffent PTE format. They support a 42bit physical address space by shifting the paddr and extending its mask inside the PTE. They also come with mandatory sub-page protection now which we just configure to always allow access to the entire page. This feature is already present but optional on the previous DARTs which allows to unconditionally configure it. Signed-off-by: Sven Peter Co-developed-by: Janne Grunau Signed-off-by: Janne Grunau Reviewed-by: Rob Herring Acked-by: Hector Martin Link: https://lore.kernel.org/r/20220916094152.87137-5-j@jannau.net Signed-off-by: Joerg Roedel --- include/linux/io-pgtable.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index ca98aeadcc80..b768937382cd 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -17,6 +17,7 @@ enum io_pgtable_fmt { ARM_MALI_LPAE, AMD_IOMMU_V1, APPLE_DART, + APPLE_DART2, IO_PGTABLE_NUM_FMTS, }; -- cgit From c59fb127583869350256656b7ed848c398bef879 Mon Sep 17 00:00:00 2001 From: Paolo Bonzini Date: Wed, 21 Sep 2022 00:32:01 +0000 Subject: KVM: remove KVM_REQ_UNHALT KVM_REQ_UNHALT is now unnecessary because it is replaced by the return value of kvm_vcpu_block/kvm_vcpu_halt. Remove it. No functional change intended. Signed-off-by: Paolo Bonzini Signed-off-by: Sean Christopherson Acked-by: Marc Zyngier Message-Id: <20220921003201.1441511-13-seanjc@google.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 04c7e5f2f727..32f259fa5801 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -151,12 +151,11 @@ static inline bool is_error_page(struct page *page) #define KVM_REQUEST_NO_ACTION BIT(10) /* * Architecture-independent vcpu->requests bit members - * Bits 4-7 are reserved for more arch-independent bits. + * Bits 3-7 are reserved for more arch-independent bits. */ #define KVM_REQ_TLB_FLUSH (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQ_VM_DEAD (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP) #define KVM_REQ_UNBLOCK 2 -#define KVM_REQ_UNHALT 3 #define KVM_REQUEST_ARCH_BASE 8 /* -- cgit From 9fca7115827b2e5f48d84e50bceb4edfd4cb6375 Mon Sep 17 00:00:00 2001 From: Sami Tolvanen Date: Thu, 8 Sep 2022 14:54:45 -0700 Subject: cfi: Remove CONFIG_CFI_CLANG_SHADOW In preparation to switching to -fsanitize=kcfi, remove support for the CFI module shadow that will no longer be needed. Signed-off-by: Sami Tolvanen Reviewed-by: Kees Cook Tested-by: Kees Cook Tested-by: Nathan Chancellor Acked-by: Peter Zijlstra (Intel) Tested-by: Peter Zijlstra (Intel) Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220908215504.3686827-4-samitolvanen@google.com --- include/linux/cfi.h | 12 ------------ 1 file changed, 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cfi.h b/include/linux/cfi.h index c6dfc1ed0626..4ab51c067007 100644 --- a/include/linux/cfi.h +++ b/include/linux/cfi.h @@ -20,18 +20,6 @@ extern void __cfi_check(uint64_t id, void *ptr, void *diag); #define __CFI_ADDRESSABLE(fn, __attr) \ const void *__cfi_jt_ ## fn __visible __attr = (void *)&fn -#ifdef CONFIG_CFI_CLANG_SHADOW - -extern void cfi_module_add(struct module *mod, unsigned long base_addr); -extern void cfi_module_remove(struct module *mod, unsigned long base_addr); - -#else - -static inline void cfi_module_add(struct module *mod, unsigned long base_addr) {} -static inline void cfi_module_remove(struct module *mod, unsigned long base_addr) {} - -#endif /* CONFIG_CFI_CLANG_SHADOW */ - #else /* !CONFIG_CFI_CLANG */ #ifdef CONFIG_X86_KERNEL_IBT -- cgit From 92efda8eb15295a07f450828b2db14485bfc09c2 Mon Sep 17 00:00:00 2001 From: Sami Tolvanen Date: Thu, 8 Sep 2022 14:54:46 -0700 Subject: cfi: Drop __CFI_ADDRESSABLE The __CFI_ADDRESSABLE macro is used for init_module and cleanup_module to ensure we have the address of the CFI jump table, and with CONFIG_X86_KERNEL_IBT to ensure LTO won't optimize away the symbols. As __CFI_ADDRESSABLE is no longer necessary with -fsanitize=kcfi, add a more flexible version of the __ADDRESSABLE macro and always ensure these symbols won't be dropped. Signed-off-by: Sami Tolvanen Reviewed-by: Kees Cook Tested-by: Kees Cook Tested-by: Nathan Chancellor Acked-by: Peter Zijlstra (Intel) Tested-by: Peter Zijlstra (Intel) Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220908215504.3686827-5-samitolvanen@google.com --- include/linux/cfi.h | 20 -------------------- include/linux/compiler.h | 6 ++++-- include/linux/module.h | 4 ++-- 3 files changed, 6 insertions(+), 24 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cfi.h b/include/linux/cfi.h index 4ab51c067007..2cdbc0fbd0ab 100644 --- a/include/linux/cfi.h +++ b/include/linux/cfi.h @@ -13,26 +13,6 @@ typedef void (*cfi_check_fn)(uint64_t id, void *ptr, void *diag); /* Compiler-generated function in each module, and the kernel */ extern void __cfi_check(uint64_t id, void *ptr, void *diag); -/* - * Force the compiler to generate a CFI jump table entry for a function - * and store the jump table address to __cfi_jt_. - */ -#define __CFI_ADDRESSABLE(fn, __attr) \ - const void *__cfi_jt_ ## fn __visible __attr = (void *)&fn - -#else /* !CONFIG_CFI_CLANG */ - -#ifdef CONFIG_X86_KERNEL_IBT - -#define __CFI_ADDRESSABLE(fn, __attr) \ - const void *__cfi_jt_ ## fn __visible __attr = (void *)&fn - -#endif /* CONFIG_X86_KERNEL_IBT */ - #endif /* CONFIG_CFI_CLANG */ -#ifndef __CFI_ADDRESSABLE -#define __CFI_ADDRESSABLE(fn, __attr) -#endif - #endif /* _LINUX_CFI_H */ diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 7713d7bcdaea..7bfafc69172a 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -221,9 +221,11 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, * otherwise, or eliminated entirely due to lack of references that are * visible to the compiler. */ -#define __ADDRESSABLE(sym) \ - static void * __section(".discard.addressable") __used \ +#define ___ADDRESSABLE(sym, __attrs) \ + static void * __used __attrs \ __UNIQUE_ID(__PASTE(__addressable_,sym)) = (void *)&sym; +#define __ADDRESSABLE(sym) \ + ___ADDRESSABLE(sym, __section(".discard.addressable")) /** * offset_to_ptr - convert a relative memory offset to an absolute pointer diff --git a/include/linux/module.h b/include/linux/module.h index 518296ea7f73..8937b020ec04 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -132,7 +132,7 @@ extern void cleanup_module(void); { return initfn; } \ int init_module(void) __copy(initfn) \ __attribute__((alias(#initfn))); \ - __CFI_ADDRESSABLE(init_module, __initdata); + ___ADDRESSABLE(init_module, __initdata); /* This is only required if you want to be unloadable. */ #define module_exit(exitfn) \ @@ -140,7 +140,7 @@ extern void cleanup_module(void); { return exitfn; } \ void cleanup_module(void) __copy(exitfn) \ __attribute__((alias(#exitfn))); \ - __CFI_ADDRESSABLE(cleanup_module, __exitdata); + ___ADDRESSABLE(cleanup_module, __exitdata); #endif -- cgit From 89245600941e4e0f87d77f60ee269b5e61ef4e49 Mon Sep 17 00:00:00 2001 From: Sami Tolvanen Date: Thu, 8 Sep 2022 14:54:47 -0700 Subject: cfi: Switch to -fsanitize=kcfi Switch from Clang's original forward-edge control-flow integrity implementation to -fsanitize=kcfi, which is better suited for the kernel, as it doesn't require LTO, doesn't use a jump table that requires altering function references, and won't break cross-module function address equality. Signed-off-by: Sami Tolvanen Reviewed-by: Kees Cook Tested-by: Kees Cook Tested-by: Nathan Chancellor Acked-by: Peter Zijlstra (Intel) Tested-by: Peter Zijlstra (Intel) Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220908215504.3686827-6-samitolvanen@google.com --- include/linux/cfi.h | 29 +++++++++++++++++++++++++---- include/linux/compiler-clang.h | 14 +++----------- include/linux/module.h | 6 +++--- 3 files changed, 31 insertions(+), 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cfi.h b/include/linux/cfi.h index 2cdbc0fbd0ab..5e134f4ce8b7 100644 --- a/include/linux/cfi.h +++ b/include/linux/cfi.h @@ -2,17 +2,38 @@ /* * Clang Control Flow Integrity (CFI) support. * - * Copyright (C) 2021 Google LLC + * Copyright (C) 2022 Google LLC */ #ifndef _LINUX_CFI_H #define _LINUX_CFI_H +#include +#include + #ifdef CONFIG_CFI_CLANG -typedef void (*cfi_check_fn)(uint64_t id, void *ptr, void *diag); +enum bug_trap_type report_cfi_failure(struct pt_regs *regs, unsigned long addr, + unsigned long *target, u32 type); -/* Compiler-generated function in each module, and the kernel */ -extern void __cfi_check(uint64_t id, void *ptr, void *diag); +static inline enum bug_trap_type report_cfi_failure_noaddr(struct pt_regs *regs, + unsigned long addr) +{ + return report_cfi_failure(regs, addr, NULL, 0); +} +#ifdef CONFIG_ARCH_USES_CFI_TRAPS +bool is_cfi_trap(unsigned long addr); +#endif #endif /* CONFIG_CFI_CLANG */ +#ifdef CONFIG_MODULES +#ifdef CONFIG_ARCH_USES_CFI_TRAPS +void module_cfi_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs, + struct module *mod); +#else +static inline void module_cfi_finalize(const Elf_Ehdr *hdr, + const Elf_Shdr *sechdrs, + struct module *mod) {} +#endif /* CONFIG_ARCH_USES_CFI_TRAPS */ +#endif /* CONFIG_MODULES */ + #endif /* _LINUX_CFI_H */ diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index c84fec767445..42e55579d649 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -66,17 +66,9 @@ # define __noscs __attribute__((__no_sanitize__("shadow-call-stack"))) #endif -#define __nocfi __attribute__((__no_sanitize__("cfi"))) -#define __cficanonical __attribute__((__cfi_canonical_jump_table__)) - -#if defined(CONFIG_CFI_CLANG) -/* - * With CONFIG_CFI_CLANG, the compiler replaces function address - * references with the address of the function's CFI jump table - * entry. The function_nocfi macro always returns the address of the - * actual function instead. - */ -#define function_nocfi(x) __builtin_function_start(x) +#if __has_feature(kcfi) +/* Disable CFI checking inside a function. */ +#define __nocfi __attribute__((__no_sanitize__("kcfi"))) #endif /* diff --git a/include/linux/module.h b/include/linux/module.h index 8937b020ec04..ec61fb53979a 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -27,7 +27,6 @@ #include #include #include -#include #include #include @@ -387,8 +386,9 @@ struct module { const s32 *crcs; unsigned int num_syms; -#ifdef CONFIG_CFI_CLANG - cfi_check_fn cfi_check; +#ifdef CONFIG_ARCH_USES_CFI_TRAPS + s32 *kcfi_traps; + s32 *kcfi_traps_end; #endif /* Kernel parameters. */ -- cgit From e84e008e7b02c015047e76261726da1550130a59 Mon Sep 17 00:00:00 2001 From: Sami Tolvanen Date: Thu, 8 Sep 2022 14:54:48 -0700 Subject: cfi: Add type helper macros With CONFIG_CFI_CLANG, assembly functions called indirectly from C code must be annotated with type identifiers to pass CFI checking. In order to make this easier, the compiler emits a __kcfi_typeid_ symbol for each address-taken function declaration in C, which contains the expected type identifier that we can refer to in assembly code. Add a typed version of SYM_FUNC_START, which emits the type identifier before the function. Architectures that support KCFI can define their own __CFI_TYPE macro to override the default preamble format. As an example, for the x86_64 blowfish_dec_blk function, the compiler emits the following type symbol: $ readelf -sW vmlinux | grep __kcfi_typeid_blowfish_dec_blk 120204: 00000000ef478db5 0 NOTYPE WEAK DEFAULT ABS __kcfi_typeid_blowfish_dec_blk And SYM_TYPED_FUNC_START will generate the following preamble based on the __CFI_TYPE definition for the architecture: $ objdump -dr arch/x86/crypto/blowfish-x86_64-asm_64.o ... 0000000000000400 <__cfi_blowfish_dec_blk>: ... 40b: b8 00 00 00 00 mov $0x0,%eax 40c: R_X86_64_32 __kcfi_typeid_blowfish_dec_blk 0000000000000410 : ... Note that the address of all assembly functions annotated with SYM_TYPED_FUNC_START must be taken in C code that's linked into the binary or the missing __kcfi_typeid_ symbol will result in a linker error with CONFIG_CFI_CLANG. If the code that contains the indirect call is not always compiled in, __ADDRESSABLE(functionname) can be used to ensure that the __kcfi_typeid_ symbol is emitted. Signed-off-by: Sami Tolvanen Reviewed-by: Kees Cook Tested-by: Kees Cook Tested-by: Nathan Chancellor Acked-by: Peter Zijlstra (Intel) Tested-by: Peter Zijlstra (Intel) Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220908215504.3686827-7-samitolvanen@google.com --- include/linux/cfi_types.h | 45 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 45 insertions(+) create mode 100644 include/linux/cfi_types.h (limited to 'include/linux') diff --git a/include/linux/cfi_types.h b/include/linux/cfi_types.h new file mode 100644 index 000000000000..6b8713675765 --- /dev/null +++ b/include/linux/cfi_types.h @@ -0,0 +1,45 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Clang Control Flow Integrity (CFI) type definitions. + */ +#ifndef _LINUX_CFI_TYPES_H +#define _LINUX_CFI_TYPES_H + +#ifdef __ASSEMBLY__ +#include + +#ifdef CONFIG_CFI_CLANG +/* + * Use the __kcfi_typeid_ type identifier symbol to + * annotate indirectly called assembly functions. The compiler emits + * these symbols for all address-taken function declarations in C + * code. + */ +#ifndef __CFI_TYPE +#define __CFI_TYPE(name) \ + .4byte __kcfi_typeid_##name +#endif + +#define SYM_TYPED_ENTRY(name, linkage, align...) \ + linkage(name) ASM_NL \ + align ASM_NL \ + __CFI_TYPE(name) ASM_NL \ + name: + +#define SYM_TYPED_START(name, linkage, align...) \ + SYM_TYPED_ENTRY(name, linkage, align) + +#else /* CONFIG_CFI_CLANG */ + +#define SYM_TYPED_START(name, linkage, align...) \ + SYM_START(name, linkage, align) + +#endif /* CONFIG_CFI_CLANG */ + +#ifndef SYM_TYPED_FUNC_START +#define SYM_TYPED_FUNC_START(name) \ + SYM_TYPED_START(name, SYM_L_GLOBAL, SYM_A_ALIGN) +#endif + +#endif /* __ASSEMBLY__ */ +#endif /* _LINUX_CFI_TYPES_H */ -- cgit From 5dbbb3eaa2a784342f3206b77381b516181d089c Mon Sep 17 00:00:00 2001 From: Sami Tolvanen Date: Thu, 8 Sep 2022 14:54:54 -0700 Subject: init: Drop __nocfi from __init It's no longer necessary to disable CFI checking for all __init functions. Drop the __nocfi attribute from __init. Signed-off-by: Sami Tolvanen Reviewed-by: Kees Cook Tested-by: Kees Cook Tested-by: Nathan Chancellor Acked-by: Peter Zijlstra (Intel) Tested-by: Peter Zijlstra (Intel) Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220908215504.3686827-13-samitolvanen@google.com --- include/linux/init.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/init.h b/include/linux/init.h index baf0b29a7010..88f2964097f5 100644 --- a/include/linux/init.h +++ b/include/linux/init.h @@ -47,7 +47,7 @@ /* These are for everybody (although not all archs will actually discard it in modules) */ -#define __init __section(".init.text") __cold __latent_entropy __noinitretpoline __nocfi +#define __init __section(".init.text") __cold __latent_entropy __noinitretpoline #define __initdata __section(".init.data") #define __initconst __section(".init.rodata") #define __exitdata __section(".exit.data") -- cgit From 607289a7cd7a3ca42b8a6877fcb6072e6eb20c34 Mon Sep 17 00:00:00 2001 From: Sami Tolvanen Date: Thu, 8 Sep 2022 14:54:55 -0700 Subject: treewide: Drop function_nocfi With -fsanitize=kcfi, we no longer need function_nocfi() as the compiler won't change function references to point to a jump table. Remove all implementations and uses of the macro. Signed-off-by: Sami Tolvanen Reviewed-by: Kees Cook Tested-by: Kees Cook Tested-by: Nathan Chancellor Acked-by: Peter Zijlstra (Intel) Tested-by: Peter Zijlstra (Intel) Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220908215504.3686827-14-samitolvanen@google.com --- include/linux/compiler.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/compiler.h b/include/linux/compiler.h index 7bfafc69172a..973a1bfd7ef5 100644 --- a/include/linux/compiler.h +++ b/include/linux/compiler.h @@ -203,16 +203,6 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val, __v; \ }) -/* - * With CONFIG_CFI_CLANG, the compiler replaces function addresses in - * instrumented C code with jump table addresses. Architectures that - * support CFI can define this macro to return the actual function address - * when needed. - */ -#ifndef function_nocfi -#define function_nocfi(x) (x) -#endif - #endif /* __KERNEL__ */ /* -- cgit From 5659b598b4dcb352b1a567c55fc5a658bc80076c Mon Sep 17 00:00:00 2001 From: Sami Tolvanen Date: Thu, 8 Sep 2022 14:54:57 -0700 Subject: treewide: Drop __cficanonical CONFIG_CFI_CLANG doesn't use a jump table anymore and therefore, won't change function references to point elsewhere. Remove the __cficanonical attribute and all uses of it. Note that the Clang definition of the attribute was removed earlier, just clean up the no-op definition and users. Signed-off-by: Sami Tolvanen Reviewed-by: Kees Cook Tested-by: Kees Cook Tested-by: Nathan Chancellor Acked-by: Peter Zijlstra (Intel) Tested-by: Peter Zijlstra (Intel) Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220908215504.3686827-16-samitolvanen@google.com --- include/linux/compiler_types.h | 4 ---- include/linux/init.h | 4 ++-- include/linux/pci.h | 4 ++-- 3 files changed, 4 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h index 4f2a819fd60a..6f2ec0976e2d 100644 --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -265,10 +265,6 @@ struct ftrace_likely_data { # define __nocfi #endif -#ifndef __cficanonical -# define __cficanonical -#endif - /* * Any place that could be marked with the "alloc_size" attribute is also * a place to be marked with the "malloc" attribute. Do this as part of the diff --git a/include/linux/init.h b/include/linux/init.h index 88f2964097f5..a0a90cd73ebe 100644 --- a/include/linux/init.h +++ b/include/linux/init.h @@ -220,8 +220,8 @@ extern bool initcall_debug; __initcall_name(initstub, __iid, id) #define __define_initcall_stub(__stub, fn) \ - int __init __cficanonical __stub(void); \ - int __init __cficanonical __stub(void) \ + int __init __stub(void); \ + int __init __stub(void) \ { \ return fn(); \ } \ diff --git a/include/linux/pci.h b/include/linux/pci.h index 060af91bafcd..5da0846aa3c1 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2019,8 +2019,8 @@ enum pci_fixup_pass { #ifdef CONFIG_LTO_CLANG #define __DECLARE_PCI_FIXUP_SECTION(sec, name, vendor, device, class, \ class_shift, hook, stub) \ - void __cficanonical stub(struct pci_dev *dev); \ - void __cficanonical stub(struct pci_dev *dev) \ + void stub(struct pci_dev *dev); \ + void stub(struct pci_dev *dev) \ { \ hook(dev); \ } \ -- cgit From fa35198f39571bbdae53c5b321020021eaad6bd2 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 19 Sep 2022 16:33:33 -0700 Subject: fortify: Explicitly check bounds are compile-time constants In preparation for replacing __builtin_object_size() with __builtin_dynamic_object_size(), all the compile-time size checks need to check that the bounds comparisons are, in fact, known at compile-time. Enforce what was guaranteed with __bos(). In other words, since all uses of __bos() were constant expressions, it was not required to test for this. When these change to __bdos(), they _may_ be constant expressions, and the checks are only valid when the prior condition holds. This results in no binary differences. Cc: linux-hardening@vger.kernel.org Link: https://lore.kernel.org/lkml/20220920192202.190793-3-keescook@chromium.org Signed-off-by: Kees Cook --- include/linux/fortify-string.h | 49 ++++++++++++++++++++++++++---------------- 1 file changed, 30 insertions(+), 19 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index ff879efe94ed..1c582224c525 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -80,6 +80,11 @@ extern char *__underlying_strncpy(char *p, const char *q, __kernel_size_t size) #define POS __pass_object_size(1) #define POS0 __pass_object_size(0) +#define __compiletime_lessthan(bounds, length) ( \ + __builtin_constant_p((bounds) < (length)) && \ + (bounds) < (length) \ +) + /** * strncpy - Copy a string to memory with non-guaranteed NUL padding * @@ -117,7 +122,7 @@ char *strncpy(char * const POS p, const char *q, __kernel_size_t size) { size_t p_size = __builtin_object_size(p, 1); - if (__builtin_constant_p(size) && p_size < size) + if (__compiletime_lessthan(p_size, size)) __write_overflow(); if (p_size < size) fortify_panic(__func__); @@ -224,7 +229,7 @@ __FORTIFY_INLINE ssize_t strscpy(char * const POS p, const char * const POS q, s * If size can be known at compile time and is greater than * p_size, generate a compile time write overflow error. */ - if (__builtin_constant_p(size) && size > p_size) + if (__compiletime_lessthan(p_size, size)) __write_overflow(); /* @@ -281,15 +286,16 @@ __FORTIFY_INLINE void fortify_memset_chk(__kernel_size_t size, /* * Length argument is a constant expression, so we * can perform compile-time bounds checking where - * buffer sizes are known. + * buffer sizes are also known at compile time. */ /* Error when size is larger than enclosing struct. */ - if (p_size > p_size_field && p_size < size) + if (__compiletime_lessthan(p_size_field, p_size) && + __compiletime_lessthan(p_size, size)) __write_overflow(); /* Warn when write size is larger than dest field. */ - if (p_size_field < size) + if (__compiletime_lessthan(p_size_field, size)) __write_overflow_field(p_size_field, size); } /* @@ -365,25 +371,28 @@ __FORTIFY_INLINE bool fortify_memcpy_chk(__kernel_size_t size, /* * Length argument is a constant expression, so we * can perform compile-time bounds checking where - * buffer sizes are known. + * buffer sizes are also known at compile time. */ /* Error when size is larger than enclosing struct. */ - if (p_size > p_size_field && p_size < size) + if (__compiletime_lessthan(p_size_field, p_size) && + __compiletime_lessthan(p_size, size)) __write_overflow(); - if (q_size > q_size_field && q_size < size) + if (__compiletime_lessthan(q_size_field, q_size) && + __compiletime_lessthan(q_size, size)) __read_overflow2(); /* Warn when write size argument larger than dest field. */ - if (p_size_field < size) + if (__compiletime_lessthan(p_size_field, size)) __write_overflow_field(p_size_field, size); /* * Warn for source field over-read when building with W=1 * or when an over-write happened, so both can be fixed at * the same time. */ - if ((IS_ENABLED(KBUILD_EXTRA_WARN1) || p_size_field < size) && - q_size_field < size) + if ((IS_ENABLED(KBUILD_EXTRA_WARN1) || + __compiletime_lessthan(p_size_field, size)) && + __compiletime_lessthan(q_size_field, size)) __read_overflow2_field(q_size_field, size); } /* @@ -494,7 +503,7 @@ __FORTIFY_INLINE void *memscan(void * const POS0 p, int c, __kernel_size_t size) { size_t p_size = __builtin_object_size(p, 0); - if (__builtin_constant_p(size) && p_size < size) + if (__compiletime_lessthan(p_size, size)) __read_overflow(); if (p_size < size) fortify_panic(__func__); @@ -508,9 +517,9 @@ int memcmp(const void * const POS0 p, const void * const POS0 q, __kernel_size_t size_t q_size = __builtin_object_size(q, 0); if (__builtin_constant_p(size)) { - if (p_size < size) + if (__compiletime_lessthan(p_size, size)) __read_overflow(); - if (q_size < size) + if (__compiletime_lessthan(q_size, size)) __read_overflow2(); } if (p_size < size || q_size < size) @@ -523,7 +532,7 @@ void *memchr(const void * const POS0 p, int c, __kernel_size_t size) { size_t p_size = __builtin_object_size(p, 0); - if (__builtin_constant_p(size) && p_size < size) + if (__compiletime_lessthan(p_size, size)) __read_overflow(); if (p_size < size) fortify_panic(__func__); @@ -535,7 +544,7 @@ __FORTIFY_INLINE void *memchr_inv(const void * const POS0 p, int c, size_t size) { size_t p_size = __builtin_object_size(p, 0); - if (__builtin_constant_p(size) && p_size < size) + if (__compiletime_lessthan(p_size, size)) __read_overflow(); if (p_size < size) fortify_panic(__func__); @@ -547,7 +556,7 @@ __FORTIFY_INLINE void *kmemdup(const void * const POS0 p, size_t size, gfp_t gfp { size_t p_size = __builtin_object_size(p, 0); - if (__builtin_constant_p(size) && p_size < size) + if (__compiletime_lessthan(p_size, size)) __read_overflow(); if (p_size < size) fortify_panic(__func__); @@ -563,11 +572,13 @@ char *strcpy(char * const POS p, const char * const POS q) size_t size; /* If neither buffer size is known, immediately give up. */ - if (p_size == SIZE_MAX && q_size == SIZE_MAX) + if (__builtin_constant_p(p_size) && + __builtin_constant_p(q_size) && + p_size == SIZE_MAX && q_size == SIZE_MAX) return __underlying_strcpy(p, q); size = strlen(q) + 1; /* Compile-time check for const size overflow. */ - if (__builtin_constant_p(size) && p_size < size) + if (__compiletime_lessthan(p_size, size)) __write_overflow(); /* Run-time check for dynamic size overflow. */ if (p_size < size) -- cgit From 9f7d69c5cd23904a29178a7ecc4eee9c1cfba04b Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 19 Sep 2022 19:50:32 -0700 Subject: fortify: Convert to struct vs member helpers In preparation for adding support for __builtin_dynamic_object_size(), wrap each instance of __builtin_object_size(p, N) with either the new __struct_size(p) as __bos(p, 0), or __member_size(p) as __bos(p, 1). This will allow us to replace the definitions with __bdos() next. There are no binary differences from this change. Cc: Nathan Chancellor Cc: Nick Desaulniers Cc: Tom Rix Cc: linux-hardening@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/lkml/20220920192202.190793-4-keescook@chromium.org Signed-off-by: Kees Cook --- include/linux/fortify-string.h | 68 ++++++++++++++++++++++-------------------- 1 file changed, 35 insertions(+), 33 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index 1c582224c525..b62c90cfafaf 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -20,7 +20,7 @@ void __write_overflow_field(size_t avail, size_t wanted) __compiletime_warning(" ({ \ unsigned char *__p = (unsigned char *)(p); \ size_t __ret = SIZE_MAX; \ - size_t __p_size = __builtin_object_size(p, 1); \ + size_t __p_size = __member_size(p); \ if (__p_size != SIZE_MAX && \ __builtin_constant_p(*__p)) { \ size_t __p_len = __p_size - 1; \ @@ -72,13 +72,15 @@ extern char *__underlying_strncpy(char *p, const char *q, __kernel_size_t size) __underlying_memcpy(dst, src, bytes) /* - * Clang's use of __builtin_object_size() within inlines needs hinting via - * __pass_object_size(). The preference is to only ever use type 1 (member + * Clang's use of __builtin_*object_size() within inlines needs hinting via + * __pass_*object_size(). The preference is to only ever use type 1 (member * size, rather than struct size), but there remain some stragglers using * type 0 that will be converted in the future. */ -#define POS __pass_object_size(1) -#define POS0 __pass_object_size(0) +#define POS __pass_object_size(1) +#define POS0 __pass_object_size(0) +#define __struct_size(p) __builtin_object_size(p, 0) +#define __member_size(p) __builtin_object_size(p, 1) #define __compiletime_lessthan(bounds, length) ( \ __builtin_constant_p((bounds) < (length)) && \ @@ -120,7 +122,7 @@ extern char *__underlying_strncpy(char *p, const char *q, __kernel_size_t size) __FORTIFY_INLINE __diagnose_as(__builtin_strncpy, 1, 2, 3) char *strncpy(char * const POS p, const char *q, __kernel_size_t size) { - size_t p_size = __builtin_object_size(p, 1); + size_t p_size = __member_size(p); if (__compiletime_lessthan(p_size, size)) __write_overflow(); @@ -132,7 +134,7 @@ char *strncpy(char * const POS p, const char *q, __kernel_size_t size) __FORTIFY_INLINE __diagnose_as(__builtin_strcat, 1, 2) char *strcat(char * const POS p, const char *q) { - size_t p_size = __builtin_object_size(p, 1); + size_t p_size = __member_size(p); if (p_size == SIZE_MAX) return __underlying_strcat(p, q); @@ -144,7 +146,7 @@ char *strcat(char * const POS p, const char *q) extern __kernel_size_t __real_strnlen(const char *, __kernel_size_t) __RENAME(strnlen); __FORTIFY_INLINE __kernel_size_t strnlen(const char * const POS p, __kernel_size_t maxlen) { - size_t p_size = __builtin_object_size(p, 1); + size_t p_size = __member_size(p); size_t p_len = __compiletime_strlen(p); size_t ret; @@ -174,7 +176,7 @@ __FORTIFY_INLINE __diagnose_as(__builtin_strlen, 1) __kernel_size_t __fortify_strlen(const char * const POS p) { __kernel_size_t ret; - size_t p_size = __builtin_object_size(p, 1); + size_t p_size = __member_size(p); /* Give up if we don't know how large p is. */ if (p_size == SIZE_MAX) @@ -189,8 +191,8 @@ __kernel_size_t __fortify_strlen(const char * const POS p) extern size_t __real_strlcpy(char *, const char *, size_t) __RENAME(strlcpy); __FORTIFY_INLINE size_t strlcpy(char * const POS p, const char * const POS q, size_t size) { - size_t p_size = __builtin_object_size(p, 1); - size_t q_size = __builtin_object_size(q, 1); + size_t p_size = __member_size(p); + size_t q_size = __member_size(q); size_t q_len; /* Full count of source string length. */ size_t len; /* Count of characters going into destination. */ @@ -218,8 +220,8 @@ __FORTIFY_INLINE ssize_t strscpy(char * const POS p, const char * const POS q, s { size_t len; /* Use string size rather than possible enclosing struct size. */ - size_t p_size = __builtin_object_size(p, 1); - size_t q_size = __builtin_object_size(q, 1); + size_t p_size = __member_size(p); + size_t q_size = __member_size(q); /* If we cannot get size of p and q default to call strscpy. */ if (p_size == SIZE_MAX && q_size == SIZE_MAX) @@ -264,8 +266,8 @@ __FORTIFY_INLINE __diagnose_as(__builtin_strncat, 1, 2, 3) char *strncat(char * const POS p, const char * const POS q, __kernel_size_t count) { size_t p_len, copy_len; - size_t p_size = __builtin_object_size(p, 1); - size_t q_size = __builtin_object_size(q, 1); + size_t p_size = __member_size(p); + size_t q_size = __member_size(q); if (p_size == SIZE_MAX && q_size == SIZE_MAX) return __underlying_strncat(p, q, count); @@ -323,11 +325,11 @@ __FORTIFY_INLINE void fortify_memset_chk(__kernel_size_t size, }) /* - * __builtin_object_size() must be captured here to avoid evaluating argument - * side-effects further into the macro layers. + * __struct_size() vs __member_size() must be captured here to avoid + * evaluating argument side-effects further into the macro layers. */ #define memset(p, c, s) __fortify_memset_chk(p, c, s, \ - __builtin_object_size(p, 0), __builtin_object_size(p, 1)) + __struct_size(p), __member_size(p)) /* * To make sure the compiler can enforce protection against buffer overflows, @@ -420,7 +422,7 @@ __FORTIFY_INLINE bool fortify_memcpy_chk(__kernel_size_t size, * fake flexible arrays, until they are all converted to * proper flexible arrays. * - * The implementation of __builtin_object_size() behaves + * The implementation of __builtin_*object_size() behaves * like sizeof() when not directly referencing a flexible * array member, which means there will be many bounds checks * that will appear at run-time, without a way for them to be @@ -486,22 +488,22 @@ __FORTIFY_INLINE bool fortify_memcpy_chk(__kernel_size_t size, */ /* - * __builtin_object_size() must be captured here to avoid evaluating argument - * side-effects further into the macro layers. + * __struct_size() vs __member_size() must be captured here to avoid + * evaluating argument side-effects further into the macro layers. */ #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \ - __builtin_object_size(p, 0), __builtin_object_size(q, 0), \ - __builtin_object_size(p, 1), __builtin_object_size(q, 1), \ + __struct_size(p), __struct_size(q), \ + __member_size(p), __member_size(q), \ memcpy) #define memmove(p, q, s) __fortify_memcpy_chk(p, q, s, \ - __builtin_object_size(p, 0), __builtin_object_size(q, 0), \ - __builtin_object_size(p, 1), __builtin_object_size(q, 1), \ + __struct_size(p), __struct_size(q), \ + __member_size(p), __member_size(q), \ memmove) extern void *__real_memscan(void *, int, __kernel_size_t) __RENAME(memscan); __FORTIFY_INLINE void *memscan(void * const POS0 p, int c, __kernel_size_t size) { - size_t p_size = __builtin_object_size(p, 0); + size_t p_size = __struct_size(p); if (__compiletime_lessthan(p_size, size)) __read_overflow(); @@ -513,8 +515,8 @@ __FORTIFY_INLINE void *memscan(void * const POS0 p, int c, __kernel_size_t size) __FORTIFY_INLINE __diagnose_as(__builtin_memcmp, 1, 2, 3) int memcmp(const void * const POS0 p, const void * const POS0 q, __kernel_size_t size) { - size_t p_size = __builtin_object_size(p, 0); - size_t q_size = __builtin_object_size(q, 0); + size_t p_size = __struct_size(p); + size_t q_size = __struct_size(q); if (__builtin_constant_p(size)) { if (__compiletime_lessthan(p_size, size)) @@ -530,7 +532,7 @@ int memcmp(const void * const POS0 p, const void * const POS0 q, __kernel_size_t __FORTIFY_INLINE __diagnose_as(__builtin_memchr, 1, 2, 3) void *memchr(const void * const POS0 p, int c, __kernel_size_t size) { - size_t p_size = __builtin_object_size(p, 0); + size_t p_size = __struct_size(p); if (__compiletime_lessthan(p_size, size)) __read_overflow(); @@ -542,7 +544,7 @@ void *memchr(const void * const POS0 p, int c, __kernel_size_t size) void *__real_memchr_inv(const void *s, int c, size_t n) __RENAME(memchr_inv); __FORTIFY_INLINE void *memchr_inv(const void * const POS0 p, int c, size_t size) { - size_t p_size = __builtin_object_size(p, 0); + size_t p_size = __struct_size(p); if (__compiletime_lessthan(p_size, size)) __read_overflow(); @@ -554,7 +556,7 @@ __FORTIFY_INLINE void *memchr_inv(const void * const POS0 p, int c, size_t size) extern void *__real_kmemdup(const void *src, size_t len, gfp_t gfp) __RENAME(kmemdup); __FORTIFY_INLINE void *kmemdup(const void * const POS0 p, size_t size, gfp_t gfp) { - size_t p_size = __builtin_object_size(p, 0); + size_t p_size = __struct_size(p); if (__compiletime_lessthan(p_size, size)) __read_overflow(); @@ -567,8 +569,8 @@ __FORTIFY_INLINE void *kmemdup(const void * const POS0 p, size_t size, gfp_t gfp __FORTIFY_INLINE __diagnose_as(__builtin_strcpy, 1, 2) char *strcpy(char * const POS p, const char * const POS q) { - size_t p_size = __builtin_object_size(p, 1); - size_t q_size = __builtin_object_size(q, 1); + size_t p_size = __member_size(p); + size_t q_size = __member_size(q); size_t size; /* If neither buffer size is known, immediately give up. */ -- cgit From 90bfc37b5ab91c1a6165e3e5cfc49bf04571b762 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 1 Sep 2022 15:09:53 -0400 Subject: SUNRPC: Fix svcxdr_init_decode's end-of-buffer calculation Ensure that stream-based argument decoding can't go past the actual end of the receive buffer. xdr_init_decode's calculation of the value of xdr->end over-estimates the end of the buffer because the Linux kernel RPC server code does not remove the size of the RPC header from rqstp->rq_arg before calling the upper layer's dispatcher. The server-side still uses the svc_getnl() macros to decode the RPC call header. These macros reduce the length of the head iov but do not update the total length of the message in the buffer (buf->len). A proper fix for this would be to replace the use of svc_getnl() and friends in the RPC header decoder, but that would be a large and invasive change that would be difficult to backport. Fixes: 5191955d6fc6 ("SUNRPC: Prepare for xdr_stream-style decoding on the server-side") Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index daecb009c05b..5a830b66f059 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -544,16 +544,27 @@ static inline void svc_reserve_auth(struct svc_rqst *rqstp, int space) } /** - * svcxdr_init_decode - Prepare an xdr_stream for svc Call decoding + * svcxdr_init_decode - Prepare an xdr_stream for Call decoding * @rqstp: controlling server RPC transaction context * + * This function currently assumes the RPC header in rq_arg has + * already been decoded. Upon return, xdr->p points to the + * location of the upper layer header. */ static inline void svcxdr_init_decode(struct svc_rqst *rqstp) { struct xdr_stream *xdr = &rqstp->rq_arg_stream; - struct kvec *argv = rqstp->rq_arg.head; + struct xdr_buf *buf = &rqstp->rq_arg; + struct kvec *argv = buf->head; - xdr_init_decode(xdr, &rqstp->rq_arg, argv->iov_base, NULL); + /* + * svc_getnl() and friends do not keep the xdr_buf's ::len + * field up to date. Refresh that field before initializing + * the argument decoding stream. + */ + buf->len = buf->head->iov_len + buf->page_len + buf->tail->iov_len; + + xdr_init_decode(xdr, buf, argv->iov_base, NULL); xdr_set_scratch_page(xdr, rqstp->rq_scratch_page); } -- cgit From 1242a87da0d8cd2a428e96ca68e7ea899b0f4624 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Thu, 1 Sep 2022 15:09:59 -0400 Subject: SUNRPC: Fix svcxdr_init_encode's buflen calculation Commit 2825a7f90753 ("nfsd4: allow encoding across page boundaries") added an explicit computation of the remaining length in the rq_res XDR buffer. The computation appears to suffer from an "off-by-one" bug. Because buflen is too large by one page, XDR encoding can run off the end of the send buffer by eventually trying to use the struct page address in rq_page_end, which always contains NULL. Fixes: bddfdbcddbe2 ("NFSD: Extract the svcxdr_init_encode() helper") Reviewed-by: Jeff Layton Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 5a830b66f059..0ca8a8ffb47e 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -587,7 +587,7 @@ static inline void svcxdr_init_encode(struct svc_rqst *rqstp) xdr->end = resv->iov_base + PAGE_SIZE - rqstp->rq_auth_slack; buf->len = resv->iov_len; xdr->page_ptr = buf->pages - 1; - buf->buflen = PAGE_SIZE * (1 + rqstp->rq_page_end - buf->pages); + buf->buflen = PAGE_SIZE * (rqstp->rq_page_end - buf->pages); buf->buflen -= rqstp->rq_auth_slack; xdr->rqst = NULL; } -- cgit From 103cc1fafee48adb91fca0e19deb869fd23e46ab Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 12 Sep 2022 17:22:38 -0400 Subject: SUNRPC: Parametrize how much of argsize should be zeroed Currently, SUNRPC clears the whole of .pc_argsize before processing each incoming RPC transaction. Add an extra parameter to struct svc_procedure to enable upper layers to reduce the amount of each operation's argument structure that is zeroed by SUNRPC. The size of struct nfsd4_compoundargs, in particular, is a lot to clear on each incoming RPC Call. A subsequent patch will cut this down to something closer to what NFSv2 and NFSv3 uses. This patch should cause no behavior changes. Signed-off-by: Chuck Lever --- include/linux/sunrpc/svc.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h index 0ca8a8ffb47e..88de45491376 100644 --- a/include/linux/sunrpc/svc.h +++ b/include/linux/sunrpc/svc.h @@ -472,6 +472,7 @@ struct svc_procedure { /* XDR free result: */ void (*pc_release)(struct svc_rqst *); unsigned int pc_argsize; /* argument struct size */ + unsigned int pc_argzero; /* how much of argument to clear */ unsigned int pc_ressize; /* result struct size */ unsigned int pc_cachetype; /* cache info (NFS) */ unsigned int pc_xdrressize; /* maximum size of XDR reply */ -- cgit From 98124f5bd6c76699d514fbe491dd95265369cc99 Mon Sep 17 00:00:00 2001 From: Chuck Lever Date: Mon, 12 Sep 2022 17:22:56 -0400 Subject: NFSD: Refactor common code out of dirlist helpers The dust has settled a bit and it's become obvious what code is totally common between nfsd_init_dirlist_pages() and nfsd3_init_dirlist_pages(). Move that common code to SUNRPC. The new helper brackets the existing xdr_init_decode_pages() API. Signed-off-by: Chuck Lever --- include/linux/sunrpc/xdr.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/xdr.h b/include/linux/sunrpc/xdr.h index 69175029abbb..f84e2a1358e1 100644 --- a/include/linux/sunrpc/xdr.h +++ b/include/linux/sunrpc/xdr.h @@ -240,6 +240,8 @@ typedef int (*kxdrdproc_t)(struct rpc_rqst *rqstp, struct xdr_stream *xdr, extern void xdr_init_encode(struct xdr_stream *xdr, struct xdr_buf *buf, __be32 *p, struct rpc_rqst *rqst); +extern void xdr_init_encode_pages(struct xdr_stream *xdr, struct xdr_buf *buf, + struct page **pages, struct rpc_rqst *rqst); extern __be32 *xdr_reserve_space(struct xdr_stream *xdr, size_t nbytes); extern int xdr_reserve_space_vec(struct xdr_stream *xdr, struct kvec *vec, size_t nbytes); -- cgit From 24291caf8447f6fc060c8d00136bdc30ee207f38 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Sat, 17 Sep 2022 20:07:12 -0700 Subject: lib/bitmap: add bitmap_weight_and() The function calculates Hamming weight of (bitmap1 & bitmap2). Now we have to do like this: tmp = bitmap_alloc(nbits); bitmap_and(tmp, map1, map2, nbits); weight = bitmap_weight(tmp, nbits); bitmap_free(tmp); This requires additional memory, adds pressure on alloc subsystem, and way less cache-friendly than just: weight = bitmap_weight_and(map1, map2, nbits); The following patches apply it for cpumask functions. Signed-off-by: Yury Norov --- include/linux/bitmap.h | 12 ++++++++++++ include/linux/cpumask.h | 11 +++++++++++ 2 files changed, 23 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index f65410a49fda..b2aef45af0db 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -51,6 +51,7 @@ struct device; * bitmap_empty(src, nbits) Are all bits zero in *src? * bitmap_full(src, nbits) Are all bits set in *src? * bitmap_weight(src, nbits) Hamming Weight: number set bits + * bitmap_weight_and(src1, src2, nbits) Hamming Weight of and'ed bitmap * bitmap_set(dst, pos, nbits) Set specified bit area * bitmap_clear(dst, pos, nbits) Clear specified bit area * bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area @@ -164,6 +165,8 @@ bool __bitmap_intersects(const unsigned long *bitmap1, bool __bitmap_subset(const unsigned long *bitmap1, const unsigned long *bitmap2, unsigned int nbits); unsigned int __bitmap_weight(const unsigned long *bitmap, unsigned int nbits); +unsigned int __bitmap_weight_and(const unsigned long *bitmap1, + const unsigned long *bitmap2, unsigned int nbits); void __bitmap_set(unsigned long *map, unsigned int start, int len); void __bitmap_clear(unsigned long *map, unsigned int start, int len); @@ -439,6 +442,15 @@ unsigned int bitmap_weight(const unsigned long *src, unsigned int nbits) return __bitmap_weight(src, nbits); } +static __always_inline +unsigned long bitmap_weight_and(const unsigned long *src1, + const unsigned long *src2, unsigned int nbits) +{ + if (small_const_nbits(nbits)) + return hweight_long(*src1 & *src2 & BITMAP_LAST_WORD_MASK(nbits)); + return __bitmap_weight_and(src1, src2, nbits); +} + static __always_inline void bitmap_set(unsigned long *map, unsigned int start, unsigned int nbits) { diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 1b442fb2001f..9a71af8097ca 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -586,6 +586,17 @@ static inline unsigned int cpumask_weight(const struct cpumask *srcp) return bitmap_weight(cpumask_bits(srcp), nr_cpumask_bits); } +/** + * cpumask_weight_and - Count of bits in (*srcp1 & *srcp2) + * @srcp1: the cpumask to count bits (< nr_cpu_ids) in. + * @srcp2: the cpumask to count bits (< nr_cpu_ids) in. + */ +static inline unsigned int cpumask_weight_and(const struct cpumask *srcp1, + const struct cpumask *srcp2) +{ + return bitmap_weight_and(cpumask_bits(srcp1), cpumask_bits(srcp2), nr_cpumask_bits); +} + /** * cpumask_shift_right - *dstp = *srcp >> n * @dstp: the cpumask result -- cgit From 3cea8d475327756066e2a54f0b651bb7284dd448 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Sat, 17 Sep 2022 20:07:13 -0700 Subject: lib: add find_nth{,_and,_andnot}_bit() Kernel lacks for a function that searches for Nth bit in a bitmap. Usually people do it like this: for_each_set_bit(bit, mask, size) if (n-- == 0) return bit; We can do it more efficiently, if we: 1. find a word containing Nth bit, using hweight(); and 2. find the bit, using a helper fns(), that works similarly to __ffs() and ffz(). fns() is implemented as a simple loop. For x86_64, there's PDEP instruction to do that: ret = clz(pdep(1 << idx, num)). However, for large bitmaps the most of improvement comes from using hweight(), so I kept fns() simple. New find_nth_bit() is ~70 times faster on x86_64/kvm in find_bit benchmark: find_nth_bit: 7154190 ns, 16411 iterations for_each_bit: 505493126 ns, 16315 iterations With all that, a family of 3 new functions is added, and used where appropriate in the following patches. Signed-off-by: Yury Norov --- include/linux/bitops.h | 19 +++++++++++ include/linux/find.h | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 105 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bitops.h b/include/linux/bitops.h index 3b89c64bcfd8..d7dd83fafeba 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -247,6 +247,25 @@ static inline unsigned long __ffs64(u64 word) return __ffs((unsigned long)word); } +/** + * fns - find N'th set bit in a word + * @word: The word to search + * @n: Bit to find + */ +static inline unsigned long fns(unsigned long word, unsigned int n) +{ + unsigned int bit; + + while (word) { + bit = __ffs(word); + if (n-- == 0) + return bit; + __clear_bit(bit, &word); + } + + return BITS_PER_LONG; +} + /** * assign_bit - Assign value to a bit in memory * @nr: the bit to set diff --git a/include/linux/find.h b/include/linux/find.h index dead6f53a97b..b100944daba0 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -15,6 +15,11 @@ unsigned long _find_next_and_bit(const unsigned long *addr1, const unsigned long unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long nbits, unsigned long start); extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size); +unsigned long __find_nth_bit(const unsigned long *addr, unsigned long size, unsigned long n); +unsigned long __find_nth_and_bit(const unsigned long *addr1, const unsigned long *addr2, + unsigned long size, unsigned long n); +unsigned long __find_nth_andnot_bit(const unsigned long *addr1, const unsigned long *addr2, + unsigned long size, unsigned long n); extern unsigned long _find_first_and_bit(const unsigned long *addr1, const unsigned long *addr2, unsigned long size); extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size); @@ -136,6 +141,87 @@ unsigned long find_first_bit(const unsigned long *addr, unsigned long size) } #endif +/** + * find_nth_bit - find N'th set bit in a memory region + * @addr: The address to start the search at + * @size: The maximum number of bits to search + * @n: The number of set bit, which position is needed, counting from 0 + * + * The following is semantically equivalent: + * idx = find_nth_bit(addr, size, 0); + * idx = find_first_bit(addr, size); + * + * Returns the bit number of the N'th set bit. + * If no such, returns @size. + */ +static inline +unsigned long find_nth_bit(const unsigned long *addr, unsigned long size, unsigned long n) +{ + if (n >= size) + return size; + + if (small_const_nbits(size)) { + unsigned long val = *addr & GENMASK(size - 1, 0); + + return val ? fns(val, n) : size; + } + + return __find_nth_bit(addr, size, n); +} + +/** + * find_nth_and_bit - find N'th set bit in 2 memory regions + * @addr1: The 1st address to start the search at + * @addr2: The 2nd address to start the search at + * @size: The maximum number of bits to search + * @n: The number of set bit, which position is needed, counting from 0 + * + * Returns the bit number of the N'th set bit. + * If no such, returns @size. + */ +static inline +unsigned long find_nth_and_bit(const unsigned long *addr1, const unsigned long *addr2, + unsigned long size, unsigned long n) +{ + if (n >= size) + return size; + + if (small_const_nbits(size)) { + unsigned long val = *addr1 & *addr2 & GENMASK(size - 1, 0); + + return val ? fns(val, n) : size; + } + + return __find_nth_and_bit(addr1, addr2, size, n); +} + +/** + * find_nth_andnot_bit - find N'th set bit in 2 memory regions, + * flipping bits in 2nd region + * @addr1: The 1st address to start the search at + * @addr2: The 2nd address to start the search at + * @size: The maximum number of bits to search + * @n: The number of set bit, which position is needed, counting from 0 + * + * Returns the bit number of the N'th set bit. + * If no such, returns @size. + */ +static inline +unsigned long find_nth_andnot_bit(const unsigned long *addr1, const unsigned long *addr2, + unsigned long size, unsigned long n) +{ + if (n >= size) + return size; + + if (small_const_nbits(size)) { + unsigned long val = *addr1 & (~*addr2) & GENMASK(size - 1, 0); + + return val ? fns(val, n) : size; + } + + return __find_nth_andnot_bit(addr1, addr2, size, n); +} + #ifndef find_first_and_bit /** * find_first_and_bit - find the first set bit in both memory regions -- cgit From 97848c10f9f8a8ce4296b149d06cab424eba05b3 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Sat, 17 Sep 2022 20:07:15 -0700 Subject: lib/bitmap: remove bitmap_ord_to_pos Now that we have find_nth_bit(), we can drop bitmap_ord_to_pos(). Signed-off-by: Yury Norov --- include/linux/bitmap.h | 1 - include/linux/nodemask.h | 3 +-- 2 files changed, 1 insertion(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index b2aef45af0db..7d6d73b78147 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -225,7 +225,6 @@ void bitmap_copy_le(unsigned long *dst, const unsigned long *src, unsigned int n #else #define bitmap_copy_le bitmap_copy #endif -unsigned int bitmap_ord_to_pos(const unsigned long *bitmap, unsigned int ord, unsigned int nbits); int bitmap_print_to_pagebuf(bool list, char *buf, const unsigned long *maskp, int nmaskbits); diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 4b71a96190a8..0c45fb066caa 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -508,8 +508,7 @@ static inline int node_random(const nodemask_t *maskp) w = nodes_weight(*maskp); if (w) - bit = bitmap_ord_to_pos(maskp->bits, - get_random_int() % w, MAX_NUMNODES); + bit = find_nth_bit(maskp->bits, MAX_NUMNODES, get_random_int() % w); return bit; #else return 0; -- cgit From 944c417daeb63fa345fe0f754c57a5a23ca6d701 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Sat, 17 Sep 2022 20:07:16 -0700 Subject: cpumask: add cpumask_nth_{,and,andnot} Add cpumask_nth_{,and,andnot} as wrappers around corresponding find functions, and use it in cpumask_local_spread(). Signed-off-by: Yury Norov --- include/linux/cpumask.h | 44 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 44 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 9a71af8097ca..e4f9136a4a63 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -337,6 +337,50 @@ unsigned int cpumask_any_but(const struct cpumask *mask, unsigned int cpu) return i; } +/** + * cpumask_nth - get the first cpu in a cpumask + * @srcp: the cpumask pointer + * @cpu: the N'th cpu to find, starting from 0 + * + * Returns >= nr_cpu_ids if such cpu doesn't exist. + */ +static inline unsigned int cpumask_nth(unsigned int cpu, const struct cpumask *srcp) +{ + return find_nth_bit(cpumask_bits(srcp), nr_cpumask_bits, cpumask_check(cpu)); +} + +/** + * cpumask_nth_and - get the first cpu in 2 cpumasks + * @srcp1: the cpumask pointer + * @srcp2: the cpumask pointer + * @cpu: the N'th cpu to find, starting from 0 + * + * Returns >= nr_cpu_ids if such cpu doesn't exist. + */ +static inline +unsigned int cpumask_nth_and(unsigned int cpu, const struct cpumask *srcp1, + const struct cpumask *srcp2) +{ + return find_nth_and_bit(cpumask_bits(srcp1), cpumask_bits(srcp2), + nr_cpumask_bits, cpumask_check(cpu)); +} + +/** + * cpumask_nth_andnot - get the first cpu set in 1st cpumask, and clear in 2nd. + * @srcp1: the cpumask pointer + * @srcp2: the cpumask pointer + * @cpu: the N'th cpu to find, starting from 0 + * + * Returns >= nr_cpu_ids if such cpu doesn't exist. + */ +static inline +unsigned int cpumask_nth_andnot(unsigned int cpu, const struct cpumask *srcp1, + const struct cpumask *srcp2) +{ + return find_nth_andnot_bit(cpumask_bits(srcp1), cpumask_bits(srcp2), + nr_cpumask_bits, cpumask_check(cpu)); +} + #define CPU_BITS_NONE \ { \ [0 ... BITS_TO_LONGS(NR_CPUS)-1] = 0UL \ -- cgit From eab3126571ed1e3e57ce0f066b566af472ebc47a Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Fri, 3 Jun 2022 15:29:22 +0200 Subject: efi: libstub: simplify efi_get_memory_map() and struct efi_boot_memmap Currently, struct efi_boot_memmap is a struct that is passed around between callers of efi_get_memory_map() and the users of the resulting data, and which carries pointers to various variables whose values are provided by the EFI GetMemoryMap() boot service. This is overly complex, and it is much easier to carry these values in the struct itself. So turn the struct into one that carries these data items directly, including a flex array for the variable number of EFI memory descriptors that the boot service may return. Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index d2b84c2fec39..f1b3e0d1b3fa 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -518,6 +518,15 @@ typedef union { efi_system_table_32_t mixed_mode; } efi_system_table_t; +struct efi_boot_memmap { + unsigned long map_size; + unsigned long desc_size; + u32 desc_ver; + unsigned long map_key; + unsigned long buff_size; + efi_memory_desc_t map[]; +}; + /* * Architecture independent structure for describing a memory map for the * benefit of efi_memmap_init_early(), and for passing context between -- cgit From de185b56e8a62822d4e1cdb3e068b38ca709aa47 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Wed, 21 Sep 2022 20:05:00 +0200 Subject: blk-cgroup: pass a gendisk to blkcg_schedule_throttle Pass the gendisk to blkcg_schedule_throttle as part of moving the blk-cgroup infrastructure to be gendisk based. Remove the unused !BLK_CGROUP stub while we're at it. Signed-off-by: Christoph Hellwig Reviewed-by: Andreas Herrmann Acked-by: Tejun Heo Link: https://lore.kernel.org/r/20220921180501.1539876-17-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blk-cgroup.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blk-cgroup.h b/include/linux/blk-cgroup.h index 9f40dbc65f82..dd5841a42c33 100644 --- a/include/linux/blk-cgroup.h +++ b/include/linux/blk-cgroup.h @@ -18,14 +18,14 @@ struct bio; struct cgroup_subsys_state; -struct request_queue; +struct gendisk; #define FC_APPID_LEN 129 #ifdef CONFIG_BLK_CGROUP extern struct cgroup_subsys_state * const blkcg_root_css; -void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay); +void blkcg_schedule_throttle(struct gendisk *disk, bool use_memdelay); void blkcg_maybe_throttle_current(void); bool blk_cgroup_congested(void); void blkcg_pin_online(struct cgroup_subsys_state *blkcg_css); @@ -39,7 +39,6 @@ struct cgroup_subsys_state *bio_blkcg_css(struct bio *bio); static inline void blkcg_maybe_throttle_current(void) { } static inline bool blk_cgroup_congested(void) { return false; } -static inline void blkcg_schedule_throttle(struct request_queue *q, bool use_memdelay) { } static inline struct cgroup_subsys_state *bio_blkcg_css(struct bio *bio) { return NULL; -- cgit From 21c9e90ab9a4c991d21dd15cc5163c99a885d4a8 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Tue, 30 Aug 2022 20:36:01 +0800 Subject: mm, hwpoison: use num_poisoned_pages_sub() to decrease num_poisoned_pages Use num_poisoned_pages_sub() to combine multiple atomic ops into one. Also num_poisoned_pages_dec() can be killed as there's no caller now. Link: https://lkml.kernel.org/r/20220830123604.25763-4-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Acked-by: Naoya Horiguchi Signed-off-by: Andrew Morton --- include/linux/swapops.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swapops.h b/include/linux/swapops.h index a3d435bf9f97..88825d1785d2 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -485,11 +485,6 @@ static inline void num_poisoned_pages_inc(void) atomic_long_inc(&num_poisoned_pages); } -static inline void num_poisoned_pages_dec(void) -{ - atomic_long_dec(&num_poisoned_pages); -} - static inline void num_poisoned_pages_sub(long i) { atomic_long_sub(i, &num_poisoned_pages); -- cgit From eba4d770efc86a3710e36b828190858abfa3bb74 Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 11 Aug 2022 12:13:26 -0400 Subject: mm/swap: comment all the ifdef in swapops.h swapops.h contains quite a few layers of ifdef, some of the "else" and "endif" doesn't get proper comment on the macro so it's hard to follow on what are they referring to. Add the comments. Link: https://lkml.kernel.org/r/20220811161331.37055-3-peterx@redhat.com Signed-off-by: Peter Xu Suggested-by: Nadav Amit Reviewed-by: Huang Ying Reviewed-by: Alistair Popple Cc: Andi Kleen Cc: Andrea Arcangeli Cc: David Hildenbrand Cc: Hugh Dickins Cc: "Kirill A . Shutemov" Cc: Minchan Kim Cc: Vlastimil Babka Cc: Dave Hansen Signed-off-by: Andrew Morton --- include/linux/swapops.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 88825d1785d2..7d1b74046520 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -247,8 +247,8 @@ extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, #ifdef CONFIG_HUGETLB_PAGE extern void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl); extern void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte); -#endif -#else +#endif /* CONFIG_HUGETLB_PAGE */ +#else /* CONFIG_MIGRATION */ static inline swp_entry_t make_readable_migration_entry(pgoff_t offset) { return swp_entry(0, 0); @@ -276,7 +276,7 @@ static inline void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, #ifdef CONFIG_HUGETLB_PAGE static inline void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl) { } static inline void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { } -#endif +#endif /* CONFIG_HUGETLB_PAGE */ static inline int is_writable_migration_entry(swp_entry_t entry) { return 0; @@ -286,7 +286,7 @@ static inline int is_readable_migration_entry(swp_entry_t entry) return 0; } -#endif +#endif /* CONFIG_MIGRATION */ typedef unsigned long pte_marker; @@ -426,7 +426,7 @@ static inline int is_pmd_migration_entry(pmd_t pmd) { return is_swap_pmd(pmd) && is_migration_entry(pmd_to_swp_entry(pmd)); } -#else +#else /* CONFIG_ARCH_ENABLE_THP_MIGRATION */ static inline int set_pmd_migration_entry(struct page_vma_mapped_walk *pvmw, struct page *page) { @@ -455,7 +455,7 @@ static inline int is_pmd_migration_entry(pmd_t pmd) { return 0; } -#endif +#endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */ #ifdef CONFIG_MEMORY_FAILURE @@ -490,7 +490,7 @@ static inline void num_poisoned_pages_sub(long i) atomic_long_sub(i, &num_poisoned_pages); } -#else +#else /* CONFIG_MEMORY_FAILURE */ static inline swp_entry_t make_hwpoison_entry(struct page *page) { @@ -509,7 +509,7 @@ static inline void num_poisoned_pages_inc(void) static inline void num_poisoned_pages_sub(long i) { } -#endif +#endif /* CONFIG_MEMORY_FAILURE */ static inline int non_swap_entry(swp_entry_t entry) { -- cgit From 0d206b5d2e0d7d7f09ac9540e3ab3e35a34f536e Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 11 Aug 2022 12:13:27 -0400 Subject: mm/swap: add swp_offset_pfn() to fetch PFN from swap entry We've got a bunch of special swap entries that stores PFN inside the swap offset fields. To fetch the PFN, normally the user just calls swp_offset() assuming that'll be the PFN. Add a helper swp_offset_pfn() to fetch the PFN instead, fetching only the max possible length of a PFN on the host, meanwhile doing proper check with MAX_PHYSMEM_BITS to make sure the swap offsets can actually store the PFNs properly always using the BUILD_BUG_ON() in is_pfn_swap_entry(). One reason to do so is we never tried to sanitize whether swap offset can really fit for storing PFN. At the meantime, this patch also prepares us with the future possibility to store more information inside the swp offset field, so assuming "swp_offset(entry)" to be the PFN will not stand any more very soon. Replace many of the swp_offset() callers to use swp_offset_pfn() where proper. Note that many of the existing users are not candidates for the replacement, e.g.: (1) When the swap entry is not a pfn swap entry at all, or, (2) when we wanna keep the whole swp_offset but only change the swp type. For the latter, it can happen when fork() triggered on a write-migration swap entry pte, we may want to only change the migration type from write->read but keep the rest, so it's not "fetching PFN" but "changing swap type only". They're left aside so that when there're more information within the swp offset they'll be carried over naturally in those cases. Since at it, dropping hwpoison_entry_to_pfn() because that's exactly what the new swp_offset_pfn() is about. Link: https://lkml.kernel.org/r/20220811161331.37055-4-peterx@redhat.com Signed-off-by: Peter Xu Reviewed-by: "Huang, Ying" Cc: Alistair Popple Cc: Andi Kleen Cc: Andrea Arcangeli Cc: David Hildenbrand Cc: Hugh Dickins Cc: "Kirill A . Shutemov" Cc: Minchan Kim Cc: Nadav Amit Cc: Vlastimil Babka Cc: Dave Hansen Signed-off-by: Andrew Morton --- include/linux/swapops.h | 35 +++++++++++++++++++++++++++++------ 1 file changed, 29 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 7d1b74046520..578212fbf2be 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -23,6 +23,20 @@ #define SWP_TYPE_SHIFT (BITS_PER_XA_VALUE - MAX_SWAPFILES_SHIFT) #define SWP_OFFSET_MASK ((1UL << SWP_TYPE_SHIFT) - 1) +/* + * Definitions only for PFN swap entries (see is_pfn_swap_entry()). To + * store PFN, we only need SWP_PFN_BITS bits. Each of the pfn swap entries + * can use the extra bits to store other information besides PFN. + */ +#ifdef MAX_PHYSMEM_BITS +#define SWP_PFN_BITS (MAX_PHYSMEM_BITS - PAGE_SHIFT) +#else /* MAX_PHYSMEM_BITS */ +#define SWP_PFN_BITS (BITS_PER_LONG - PAGE_SHIFT) +#endif /* MAX_PHYSMEM_BITS */ +#define SWP_PFN_MASK (BIT(SWP_PFN_BITS) - 1) + +static inline bool is_pfn_swap_entry(swp_entry_t entry); + /* Clear all flags but only keep swp_entry_t related information */ static inline pte_t pte_swp_clear_flags(pte_t pte) { @@ -64,6 +78,17 @@ static inline pgoff_t swp_offset(swp_entry_t entry) return entry.val & SWP_OFFSET_MASK; } +/* + * This should only be called upon a pfn swap entry to get the PFN stored + * in the swap entry. Please refers to is_pfn_swap_entry() for definition + * of pfn swap entry. + */ +static inline unsigned long swp_offset_pfn(swp_entry_t entry) +{ + VM_BUG_ON(!is_pfn_swap_entry(entry)); + return swp_offset(entry) & SWP_PFN_MASK; +} + /* check whether a pte points to a swap entry */ static inline int is_swap_pte(pte_t pte) { @@ -369,7 +394,7 @@ static inline int pte_none_mostly(pte_t pte) static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry) { - struct page *p = pfn_to_page(swp_offset(entry)); + struct page *p = pfn_to_page(swp_offset_pfn(entry)); /* * Any use of migration entries may only occur while the @@ -387,6 +412,9 @@ static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry) */ static inline bool is_pfn_swap_entry(swp_entry_t entry) { + /* Make sure the swp offset can always store the needed fields */ + BUILD_BUG_ON(SWP_TYPE_SHIFT < SWP_PFN_BITS); + return is_migration_entry(entry) || is_device_private_entry(entry) || is_device_exclusive_entry(entry); } @@ -475,11 +503,6 @@ static inline int is_hwpoison_entry(swp_entry_t entry) return swp_type(entry) == SWP_HWPOISON; } -static inline unsigned long hwpoison_entry_to_pfn(swp_entry_t entry) -{ - return swp_offset(entry); -} - static inline void num_poisoned_pages_inc(void) { atomic_long_inc(&num_poisoned_pages); -- cgit From 2e3468778dbe3ec389a10c21a703bb8e5be5cfbc Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 11 Aug 2022 12:13:29 -0400 Subject: mm: remember young/dirty bit for page migrations When page migration happens, we always ignore the young/dirty bit settings in the old pgtable, and marking the page as old in the new page table using either pte_mkold() or pmd_mkold(), and keeping the pte clean. That's fine from functional-wise, but that's not friendly to page reclaim because the moving page can be actively accessed within the procedure. Not to mention hardware setting the young bit can bring quite some overhead on some systems, e.g. x86_64 needs a few hundreds nanoseconds to set the bit. The same slowdown problem to dirty bits when the memory is first written after page migration happened. Actually we can easily remember the A/D bit configuration and recover the information after the page is migrated. To achieve it, define a new set of bits in the migration swap offset field to cache the A/D bits for old pte. Then when removing/recovering the migration entry, we can recover the A/D bits even if the page changed. One thing to mention is that here we used max_swapfile_size() to detect how many swp offset bits we have, and we'll only enable this feature if we know the swp offset is big enough to store both the PFN value and the A/D bits. Otherwise the A/D bits are dropped like before. Link: https://lkml.kernel.org/r/20220811161331.37055-6-peterx@redhat.com Signed-off-by: Peter Xu Reviewed-by: "Huang, Ying" Cc: Alistair Popple Cc: Andi Kleen Cc: Andrea Arcangeli Cc: David Hildenbrand Cc: Hugh Dickins Cc: "Kirill A . Shutemov" Cc: Minchan Kim Cc: Nadav Amit Cc: Vlastimil Babka Cc: Dave Hansen Signed-off-by: Andrew Morton --- include/linux/swapops.h | 99 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 99 insertions(+) (limited to 'include/linux') diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 578212fbf2be..11b874f212a2 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -8,6 +8,10 @@ #ifdef CONFIG_MMU +#ifdef CONFIG_SWAP +#include +#endif /* CONFIG_SWAP */ + /* * swapcache pages are stored in the swapper_space radix tree. We want to * get good packing density in that tree, so the index should be dense in @@ -35,6 +39,31 @@ #endif /* MAX_PHYSMEM_BITS */ #define SWP_PFN_MASK (BIT(SWP_PFN_BITS) - 1) +/** + * Migration swap entry specific bitfield definitions. Layout: + * + * |----------+--------------------| + * | swp_type | swp_offset | + * |----------+--------+-+-+-------| + * | | resv |D|A| PFN | + * |----------+--------+-+-+-------| + * + * @SWP_MIG_YOUNG_BIT: Whether the page used to have young bit set (bit A) + * @SWP_MIG_DIRTY_BIT: Whether the page used to have dirty bit set (bit D) + * + * Note: A/D bits will be stored in migration entries iff there're enough + * free bits in arch specific swp offset. By default we'll ignore A/D bits + * when migrating a page. Please refer to migration_entry_supports_ad() + * for more information. If there're more bits besides PFN and A/D bits, + * they should be reserved and always be zeros. + */ +#define SWP_MIG_YOUNG_BIT (SWP_PFN_BITS) +#define SWP_MIG_DIRTY_BIT (SWP_PFN_BITS + 1) +#define SWP_MIG_TOTAL_BITS (SWP_PFN_BITS + 2) + +#define SWP_MIG_YOUNG BIT(SWP_MIG_YOUNG_BIT) +#define SWP_MIG_DIRTY BIT(SWP_MIG_DIRTY_BIT) + static inline bool is_pfn_swap_entry(swp_entry_t entry); /* Clear all flags but only keep swp_entry_t related information */ @@ -265,6 +294,57 @@ static inline swp_entry_t make_writable_migration_entry(pgoff_t offset) return swp_entry(SWP_MIGRATION_WRITE, offset); } +/* + * Returns whether the host has large enough swap offset field to support + * carrying over pgtable A/D bits for page migrations. The result is + * pretty much arch specific. + */ +static inline bool migration_entry_supports_ad(void) +{ + /* + * max_swapfile_size() returns the max supported swp-offset plus 1. + * We can support the migration A/D bits iff the pfn swap entry has + * the offset large enough to cover all of them (PFN, A & D bits). + */ +#ifdef CONFIG_SWAP + return max_swapfile_size() >= (1UL << SWP_MIG_TOTAL_BITS); +#else /* CONFIG_SWAP */ + return false; +#endif /* CONFIG_SWAP */ +} + +static inline swp_entry_t make_migration_entry_young(swp_entry_t entry) +{ + if (migration_entry_supports_ad()) + return swp_entry(swp_type(entry), + swp_offset(entry) | SWP_MIG_YOUNG); + return entry; +} + +static inline bool is_migration_entry_young(swp_entry_t entry) +{ + if (migration_entry_supports_ad()) + return swp_offset(entry) & SWP_MIG_YOUNG; + /* Keep the old behavior of aging page after migration */ + return false; +} + +static inline swp_entry_t make_migration_entry_dirty(swp_entry_t entry) +{ + if (migration_entry_supports_ad()) + return swp_entry(swp_type(entry), + swp_offset(entry) | SWP_MIG_DIRTY); + return entry; +} + +static inline bool is_migration_entry_dirty(swp_entry_t entry) +{ + if (migration_entry_supports_ad()) + return swp_offset(entry) & SWP_MIG_DIRTY; + /* Keep the old behavior of clean page after migration */ + return false; +} + extern void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep, spinlock_t *ptl); extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, @@ -311,6 +391,25 @@ static inline int is_readable_migration_entry(swp_entry_t entry) return 0; } +static inline swp_entry_t make_migration_entry_young(swp_entry_t entry) +{ + return entry; +} + +static inline bool is_migration_entry_young(swp_entry_t entry) +{ + return false; +} + +static inline swp_entry_t make_migration_entry_dirty(swp_entry_t entry) +{ + return entry; +} + +static inline bool is_migration_entry_dirty(swp_entry_t entry) +{ + return false; +} #endif /* CONFIG_MIGRATION */ typedef unsigned long pte_marker; -- cgit From be45a4902c7caa717fee6b2f671e59b396ed395c Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 11 Aug 2022 12:13:30 -0400 Subject: mm/swap: cache maximum swapfile size when init swap We used to have swapfile_maximum_size() fetching a maximum value of swapfile size per-arch. As the caller of max_swapfile_size() grows, this patch introduce a variable "swapfile_maximum_size" and cache the value of old max_swapfile_size(), so that we don't need to calculate the value every time. Caching the value in swapfile_init() is safe because when reaching the phase we should have initialized all the relevant information. Here the major arch to take care of is x86, which defines the max swapfile size based on L1TF mitigation. Here both X86_BUG_L1TF or l1tf_mitigation should have been setup properly when reaching swapfile_init(). As a reference, the code path looks like this for x86: - start_kernel - setup_arch - early_cpu_init - early_identify_cpu --> setup X86_BUG_L1TF - parse_early_param - l1tf_cmdline --> set l1tf_mitigation - check_bugs - l1tf_select_mitigation --> set l1tf_mitigation - arch_call_rest_init - rest_init - kernel_init - kernel_init_freeable - do_basic_setup - do_initcalls --> calls swapfile_init() (initcall level 4) The swapfile size only depends on swp pte format on non-x86 archs, so caching it is safe too. Since at it, rename max_swapfile_size() to arch_max_swapfile_size() because arch can define its own function, so it's more straightforward to have "arch_" as its prefix. At the meantime, export swapfile_maximum_size to replace the old usages of max_swapfile_size(). [peterx@redhat.com: declare arch_max_swapfile_size) in swapfile.h] Link: https://lkml.kernel.org/r/YxTh1GuC6ro5fKL5@xz-m1.local Link: https://lkml.kernel.org/r/20220811161331.37055-7-peterx@redhat.com Signed-off-by: Peter Xu Reviewed-by: "Huang, Ying" Cc: Alistair Popple Cc: Andi Kleen Cc: Andrea Arcangeli Cc: David Hildenbrand Cc: Hugh Dickins Cc: "Kirill A . Shutemov" Cc: Minchan Kim Cc: Nadav Amit Cc: Vlastimil Babka Cc: Dave Hansen Signed-off-by: Andrew Morton --- include/linux/swapfile.h | 5 ++++- include/linux/swapops.h | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swapfile.h b/include/linux/swapfile.h index 54078542134c..e2d11ae4e73d 100644 --- a/include/linux/swapfile.h +++ b/include/linux/swapfile.h @@ -8,6 +8,9 @@ */ extern struct swap_info_struct *swap_info[]; extern unsigned long generic_max_swapfile_size(void); -extern unsigned long max_swapfile_size(void); +unsigned long arch_max_swapfile_size(void); + +/* Maximum swapfile size supported for the arch (not inclusive). */ +extern unsigned long swapfile_maximum_size; #endif /* _LINUX_SWAPFILE_H */ diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 11b874f212a2..027b4095e132 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -307,7 +307,7 @@ static inline bool migration_entry_supports_ad(void) * the offset large enough to cover all of them (PFN, A & D bits). */ #ifdef CONFIG_SWAP - return max_swapfile_size() >= (1UL << SWP_MIG_TOTAL_BITS); + return swapfile_maximum_size >= (1UL << SWP_MIG_TOTAL_BITS); #else /* CONFIG_SWAP */ return false; #endif /* CONFIG_SWAP */ -- cgit From 5154e607967d3f587fda84a40abbf900275016c9 Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Thu, 11 Aug 2022 12:13:31 -0400 Subject: mm/swap: cache swap migration A/D bits support Introduce a variable swap_migration_ad_supported to cache whether the arch supports swap migration A/D bits. Here one thing to mention is that SWP_MIG_TOTAL_BITS will internally reference the other macro MAX_PHYSMEM_BITS, which is a function call on x86 (constant on all the rest of archs). It's safe to reference it in swapfile_init() because when reaching here we're already during initcalls level 4 so we must have initialized 5-level pgtable for x86_64 (right after early_identify_cpu() finishes). - start_kernel - setup_arch - early_cpu_init - get_cpu_cap --> fetch from CPUID (including X86_FEATURE_LA57) - early_identify_cpu --> clear X86_FEATURE_LA57 (if early lvl5 not enabled (USE_EARLY_PGTABLE_L5)) - arch_call_rest_init - rest_init - kernel_init - kernel_init_freeable - do_basic_setup - do_initcalls --> calls swapfile_init() (initcall level 4) This should slightly speed up the migration swap entry handlings. Link: https://lkml.kernel.org/r/20220811161331.37055-8-peterx@redhat.com Signed-off-by: Peter Xu Cc: Alistair Popple Cc: Andi Kleen Cc: Andrea Arcangeli Cc: David Hildenbrand Cc: Huang Ying Cc: Hugh Dickins Cc: "Kirill A . Shutemov" Cc: Minchan Kim Cc: Nadav Amit Cc: Vlastimil Babka Cc: Dave Hansen Signed-off-by: Andrew Morton --- include/linux/swapfile.h | 2 ++ include/linux/swapops.h | 7 +------ 2 files changed, 3 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swapfile.h b/include/linux/swapfile.h index e2d11ae4e73d..7ed529a77c5b 100644 --- a/include/linux/swapfile.h +++ b/include/linux/swapfile.h @@ -12,5 +12,7 @@ unsigned long arch_max_swapfile_size(void); /* Maximum swapfile size supported for the arch (not inclusive). */ extern unsigned long swapfile_maximum_size; +/* Whether swap migration entry supports storing A/D bits for the arch */ +extern bool swap_migration_ad_supported; #endif /* _LINUX_SWAPFILE_H */ diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 027b4095e132..86b95ccb81bb 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -301,13 +301,8 @@ static inline swp_entry_t make_writable_migration_entry(pgoff_t offset) */ static inline bool migration_entry_supports_ad(void) { - /* - * max_swapfile_size() returns the max supported swp-offset plus 1. - * We can support the migration A/D bits iff the pfn swap entry has - * the offset large enough to cover all of them (PFN, A & D bits). - */ #ifdef CONFIG_SWAP - return swapfile_maximum_size >= (1UL << SWP_MIG_TOTAL_BITS); + return swap_migration_ad_supported; #else /* CONFIG_SWAP */ return false; #endif /* CONFIG_SWAP */ -- cgit From aa1cf99b87e934e761b46ce2b925335a398980da Mon Sep 17 00:00:00 2001 From: Yang Yang Date: Mon, 15 Aug 2022 07:11:35 +0000 Subject: delayacct: support re-entrance detection of thrashing accounting Once upon a time, we only support accounting thrashing of page cache. Then Joonsoo introduced workingset detection for anonymous pages and we gained the ability to account thrashing of them[1]. For page cache thrashing accounting, there is no suitable place to do it in fs level likes swap_readpage(). So we have to do it in folio_wait_bit_common(). Then for anonymous pages thrashing accounting, we have to do it in both swap_readpage() and folio_wait_bit_common(). This likes PSI, so we should let thrashing accounting supports re-entrance detection. This patch is to prepare complete thrashing accounting, and is based on patch "filemap: make the accounting of thrashing more consistent". [1] commit aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU") Link: https://lkml.kernel.org/r/20220815071134.74551-1-yang.yang29@zte.com.cn Signed-off-by: Yang Yang Signed-off-by: CGEL ZTE Reviewed-by: Ran Xiaokai Reviewed-by: wangyong Acked-by: Joonsoo Kim Signed-off-by: Andrew Morton --- include/linux/delayacct.h | 16 ++++++++-------- include/linux/sched.h | 4 ++++ 2 files changed, 12 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/delayacct.h b/include/linux/delayacct.h index 58aea2d7385c..0da97dba9ef8 100644 --- a/include/linux/delayacct.h +++ b/include/linux/delayacct.h @@ -73,8 +73,8 @@ extern int delayacct_add_tsk(struct taskstats *, struct task_struct *); extern __u64 __delayacct_blkio_ticks(struct task_struct *); extern void __delayacct_freepages_start(void); extern void __delayacct_freepages_end(void); -extern void __delayacct_thrashing_start(void); -extern void __delayacct_thrashing_end(void); +extern void __delayacct_thrashing_start(bool *in_thrashing); +extern void __delayacct_thrashing_end(bool *in_thrashing); extern void __delayacct_swapin_start(void); extern void __delayacct_swapin_end(void); extern void __delayacct_compact_start(void); @@ -143,22 +143,22 @@ static inline void delayacct_freepages_end(void) __delayacct_freepages_end(); } -static inline void delayacct_thrashing_start(void) +static inline void delayacct_thrashing_start(bool *in_thrashing) { if (!static_branch_unlikely(&delayacct_key)) return; if (current->delays) - __delayacct_thrashing_start(); + __delayacct_thrashing_start(in_thrashing); } -static inline void delayacct_thrashing_end(void) +static inline void delayacct_thrashing_end(bool *in_thrashing) { if (!static_branch_unlikely(&delayacct_key)) return; if (current->delays) - __delayacct_thrashing_end(); + __delayacct_thrashing_end(in_thrashing); } static inline void delayacct_swapin_start(void) @@ -237,9 +237,9 @@ static inline void delayacct_freepages_start(void) {} static inline void delayacct_freepages_end(void) {} -static inline void delayacct_thrashing_start(void) +static inline void delayacct_thrashing_start(bool *in_thrashing) {} -static inline void delayacct_thrashing_end(void) +static inline void delayacct_thrashing_end(bool *in_thrashing) {} static inline void delayacct_swapin_start(void) {} diff --git a/include/linux/sched.h b/include/linux/sched.h index e7b2f8a5c711..d9a2466664f7 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -944,6 +944,10 @@ struct task_struct { #ifdef CONFIG_CPU_SUP_INTEL unsigned reported_split_lock:1; #endif +#ifdef CONFIG_TASK_DELAY_ACCT + /* delay due to memory thrashing */ + unsigned in_thrashing:1; +#endif unsigned long atomic_flags; /* Flags requiring atomic access. */ -- cgit From e1fd09e3d1dd4a1a8b3b33bc1fd647eee9f4e475 Mon Sep 17 00:00:00 2001 From: Yu Zhao Date: Sun, 18 Sep 2022 01:59:58 -0600 Subject: mm: x86, arm64: add arch_has_hw_pte_young() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Patch series "Multi-Gen LRU Framework", v14. What's new ========== 1. OpenWrt, in addition to Android, Arch Linux Zen, Armbian, ChromeOS, Liquorix, post-factum and XanMod, is now shipping MGLRU on 5.15. 2. Fixed long-tailed direct reclaim latency seen on high-memory (TBs) machines. The old direct reclaim backoff, which tries to enforce a minimum fairness among all eligible memcgs, over-swapped by about (total_mem>>DEF_PRIORITY)-nr_to_reclaim. The new backoff, which pulls the plug on swapping once the target is met, trades some fairness for curtailed latency: https://lore.kernel.org/r/20220918080010.2920238-10-yuzhao@google.com/ 3. Fixed minior build warnings and conflicts. More comments and nits. TLDR ==== The current page reclaim is too expensive in terms of CPU usage and it often makes poor choices about what to evict. This patchset offers an alternative solution that is performant, versatile and straightforward. Patchset overview ================= The design and implementation overview is in patch 14: https://lore.kernel.org/r/20220918080010.2920238-15-yuzhao@google.com/ 01. mm: x86, arm64: add arch_has_hw_pte_young() 02. mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG Take advantage of hardware features when trying to clear the accessed bit in many PTEs. 03. mm/vmscan.c: refactor shrink_node() 04. Revert "include/linux/mm_inline.h: fold __update_lru_size() into its sole caller" Minor refactors to improve readability for the following patches. 05. mm: multi-gen LRU: groundwork Adds the basic data structure and the functions that insert pages to and remove pages from the multi-gen LRU (MGLRU) lists. 06. mm: multi-gen LRU: minimal implementation A minimal implementation without optimizations. 07. mm: multi-gen LRU: exploit locality in rmap Exploits spatial locality to improve efficiency when using the rmap. 08. mm: multi-gen LRU: support page table walks Further exploits spatial locality by optionally scanning page tables. 09. mm: multi-gen LRU: optimize multiple memcgs Optimizes the overall performance for multiple memcgs running mixed types of workloads. 10. mm: multi-gen LRU: kill switch Adds a kill switch to enable or disable MGLRU at runtime. 11. mm: multi-gen LRU: thrashing prevention 12. mm: multi-gen LRU: debugfs interface Provide userspace with features like thrashing prevention, working set estimation and proactive reclaim. 13. mm: multi-gen LRU: admin guide 14. mm: multi-gen LRU: design doc Add an admin guide and a design doc. Benchmark results ================= Independent lab results ----------------------- Based on the popularity of searches [01] and the memory usage in Google's public cloud, the most popular open-source memory-hungry applications, in alphabetical order, are: Apache Cassandra Memcached Apache Hadoop MongoDB Apache Spark PostgreSQL MariaDB (MySQL) Redis An independent lab evaluated MGLRU with the most widely used benchmark suites for the above applications. They posted 960 data points along with kernel metrics and perf profiles collected over more than 500 hours of total benchmark time. Their final reports show that, with 95% confidence intervals (CIs), the above applications all performed significantly better for at least part of their benchmark matrices. On 5.14: 1. Apache Spark [02] took 95% CIs [9.28, 11.19]% and [12.20, 14.93]% less wall time to sort three billion random integers, respectively, under the medium- and the high-concurrency conditions, when overcommitting memory. There were no statistically significant changes in wall time for the rest of the benchmark matrix. 2. MariaDB [03] achieved 95% CIs [5.24, 10.71]% and [20.22, 25.97]% more transactions per minute (TPM), respectively, under the medium- and the high-concurrency conditions, when overcommitting memory. There were no statistically significant changes in TPM for the rest of the benchmark matrix. 3. Memcached [04] achieved 95% CIs [23.54, 32.25]%, [20.76, 41.61]% and [21.59, 30.02]% more operations per second (OPS), respectively, for sequential access, random access and Gaussian (distribution) access, when THP=always; 95% CIs [13.85, 15.97]% and [23.94, 29.92]% more OPS, respectively, for random access and Gaussian access, when THP=never. There were no statistically significant changes in OPS for the rest of the benchmark matrix. 4. MongoDB [05] achieved 95% CIs [2.23, 3.44]%, [6.97, 9.73]% and [2.16, 3.55]% more operations per second (OPS), respectively, for exponential (distribution) access, random access and Zipfian (distribution) access, when underutilizing memory; 95% CIs [8.83, 10.03]%, [21.12, 23.14]% and [5.53, 6.46]% more OPS, respectively, for exponential access, random access and Zipfian access, when overcommitting memory. On 5.15: 5. Apache Cassandra [06] achieved 95% CIs [1.06, 4.10]%, [1.94, 5.43]% and [4.11, 7.50]% more operations per second (OPS), respectively, for exponential (distribution) access, random access and Zipfian (distribution) access, when swap was off; 95% CIs [0.50, 2.60]%, [6.51, 8.77]% and [3.29, 6.75]% more OPS, respectively, for exponential access, random access and Zipfian access, when swap was on. 6. Apache Hadoop [07] took 95% CIs [5.31, 9.69]% and [2.02, 7.86]% less average wall time to finish twelve parallel TeraSort jobs, respectively, under the medium- and the high-concurrency conditions, when swap was on. There were no statistically significant changes in average wall time for the rest of the benchmark matrix. 7. PostgreSQL [08] achieved 95% CI [1.75, 6.42]% more transactions per minute (TPM) under the high-concurrency condition, when swap was off; 95% CIs [12.82, 18.69]% and [22.70, 46.86]% more TPM, respectively, under the medium- and the high-concurrency conditions, when swap was on. There were no statistically significant changes in TPM for the rest of the benchmark matrix. 8. Redis [09] achieved 95% CIs [0.58, 5.94]%, [6.55, 14.58]% and [11.47, 19.36]% more total operations per second (OPS), respectively, for sequential access, random access and Gaussian (distribution) access, when THP=always; 95% CIs [1.27, 3.54]%, [10.11, 14.81]% and [8.75, 13.64]% more total OPS, respectively, for sequential access, random access and Gaussian access, when THP=never. Our lab results --------------- To supplement the above results, we ran the following benchmark suites on 5.16-rc7 and found no regressions [10]. fs_fio_bench_hdd_mq pft fs_lmbench pgsql-hammerdb fs_parallelio redis fs_postmark stream hackbench sysbenchthread kernbench tpcc_spark memcached unixbench multichase vm-scalability mutilate will-it-scale nginx [01] https://trends.google.com [02] https://lore.kernel.org/r/20211102002002.92051-1-bot@edi.works/ [03] https://lore.kernel.org/r/20211009054315.47073-1-bot@edi.works/ [04] https://lore.kernel.org/r/20211021194103.65648-1-bot@edi.works/ [05] https://lore.kernel.org/r/20211109021346.50266-1-bot@edi.works/ [06] https://lore.kernel.org/r/20211202062806.80365-1-bot@edi.works/ [07] https://lore.kernel.org/r/20211209072416.33606-1-bot@edi.works/ [08] https://lore.kernel.org/r/20211218071041.24077-1-bot@edi.works/ [09] https://lore.kernel.org/r/20211122053248.57311-1-bot@edi.works/ [10] https://lore.kernel.org/r/20220104202247.2903702-1-yuzhao@google.com/ Read-world applications ======================= Third-party testimonials ------------------------ Konstantin reported [11]: I have Archlinux with 8G RAM + zswap + swap. While developing, I have lots of apps opened such as multiple LSP-servers for different langs, chats, two browsers, etc... Usually, my system gets quickly to a point of SWAP-storms, where I have to kill LSP-servers, restart browsers to free memory, etc, otherwise the system lags heavily and is barely usable. 1.5 day ago I migrated from 5.11.15 kernel to 5.12 + the LRU patchset, and I started up by opening lots of apps to create memory pressure, and worked for a day like this. Till now I had not a single SWAP-storm, and mind you I got 3.4G in SWAP. I was never getting to the point of 3G in SWAP before without a single SWAP-storm. Vaibhav from IBM reported [12]: In a synthetic MongoDB Benchmark, seeing an average of ~19% throughput improvement on POWER10(Radix MMU + 64K Page Size) with MGLRU patches on top of 5.16 kernel for MongoDB + YCSB across three different request distributions, namely, Exponential, Uniform and Zipfan. Shuang from U of Rochester reported [13]: With the MGLRU, fio achieved 95% CIs [38.95, 40.26]%, [4.12, 6.64]% and [9.26, 10.36]% higher throughput, respectively, for random access, Zipfian (distribution) access and Gaussian (distribution) access, when the average number of jobs per CPU is 1; 95% CIs [42.32, 49.15]%, [9.44, 9.89]% and [20.99, 22.86]% higher throughput, respectively, for random access, Zipfian access and Gaussian access, when the average number of jobs per CPU is 2. Daniel from Michigan Tech reported [14]: With Memcached allocating ~100GB of byte-addressable Optante, performance improvement in terms of throughput (measured as queries per second) was about 10% for a series of workloads. Large-scale deployments ----------------------- We've rolled out MGLRU to tens of millions of ChromeOS users and about a million Android users. Google's fleetwide profiling [15] shows an overall 40% decrease in kswapd CPU usage, in addition to improvements in other UX metrics, e.g., an 85% decrease in the number of low-memory kills at the 75th percentile and an 18% decrease in app launch time at the 50th percentile. The downstream kernels that have been using MGLRU include: 1. Android [16] 2. Arch Linux Zen [17] 3. Armbian [18] 4. ChromeOS [19] 5. Liquorix [20] 6. OpenWrt [21] 7. post-factum [22] 8. XanMod [23] [11] https://lore.kernel.org/r/140226722f2032c86301fbd326d91baefe3d7d23.camel@yandex.ru/ [12] https://lore.kernel.org/r/87czj3mux0.fsf@vajain21.in.ibm.com/ [13] https://lore.kernel.org/r/20220105024423.26409-1-szhai2@cs.rochester.edu/ [14] https://lore.kernel.org/r/CA+4-3vksGvKd18FgRinxhqHetBS1hQekJE2gwco8Ja-bJWKtFw@mail.gmail.com/ [15] https://dl.acm.org/doi/10.1145/2749469.2750392 [16] https://android.com [17] https://archlinux.org [18] https://armbian.com [19] https://chromium.org [20] https://liquorix.net [21] https://openwrt.org [22] https://codeberg.org/pf-kernel [23] https://xanmod.org Summary ======= The facts are: 1. The independent lab results and the real-world applications indicate substantial improvements; there are no known regressions. 2. Thrashing prevention, working set estimation and proactive reclaim work out of the box; there are no equivalent solutions. 3. There is a lot of new code; no smaller changes have been demonstrated similar effects. Our options, accordingly, are: 1. Given the amount of evidence, the reported improvements will likely materialize for a wide range of workloads. 2. Gauging the interest from the past discussions, the new features will likely be put to use for both personal computers and data centers. 3. Based on Google's track record, the new code will likely be well maintained in the long term. It'd be more difficult if not impossible to achieve similar effects with other approaches. This patch (of 14): Some architectures automatically set the accessed bit in PTEs, e.g., x86 and arm64 v8.2. On architectures that do not have this capability, clearing the accessed bit in a PTE usually triggers a page fault following the TLB miss of this PTE (to emulate the accessed bit). Being aware of this capability can help make better decisions, e.g., whether to spread the work out over a period of time to reduce bursty page faults when trying to clear the accessed bit in many PTEs. Note that theoretically this capability can be unreliable, e.g., hotplugged CPUs might be different from builtin ones. Therefore it should not be used in architecture-independent code that involves correctness, e.g., to determine whether TLB flushes are required (in combination with the accessed bit). Link: https://lkml.kernel.org/r/20220918080010.2920238-1-yuzhao@google.com Link: https://lkml.kernel.org/r/20220918080010.2920238-2-yuzhao@google.com Signed-off-by: Yu Zhao Reviewed-by: Barry Song Acked-by: Brian Geffon Acked-by: Jan Alexander Steffens (heftig) Acked-by: Oleksandr Natalenko Acked-by: Steven Barrett Acked-by: Suleiman Souhlal Acked-by: Will Deacon Tested-by: Daniel Byrne Tested-by: Donald Carr Tested-by: Holger Hoffstätte Tested-by: Konstantin Kharlamov Tested-by: Shuang Zhai Tested-by: Sofia Trinh Tested-by: Vaibhav Jain Cc: Andi Kleen Cc: Aneesh Kumar K.V Cc: Catalin Marinas Cc: Dave Hansen Cc: Hillf Danton Cc: Jens Axboe Cc: Johannes Weiner Cc: Jonathan Corbet Cc: Linus Torvalds Cc: linux-arm-kernel@lists.infradead.org Cc: Matthew Wilcox Cc: Mel Gorman Cc: Michael Larabel Cc: Michal Hocko Cc: Mike Rapoport Cc: Peter Zijlstra Cc: Tejun Heo Cc: Vlastimil Babka Cc: Miaohe Lin Cc: Mike Rapoport Cc: Qi Zheng Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index d13b4f7cc5be..375e8e7e64f4 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -260,6 +260,19 @@ static inline int pmdp_clear_flush_young(struct vm_area_struct *vma, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif +#ifndef arch_has_hw_pte_young +/* + * Return whether the accessed bit is supported on the local CPU. + * + * This stub assumes accessing through an old PTE triggers a page fault. + * Architectures that automatically set the access bit should overwrite it. + */ +static inline bool arch_has_hw_pte_young(void) +{ + return false; +} +#endif + #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long address, -- cgit From eed9a328aa1ae6ac1edaa026957e6882f57de0dd Mon Sep 17 00:00:00 2001 From: Yu Zhao Date: Sun, 18 Sep 2022 01:59:59 -0600 Subject: mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Some architectures support the accessed bit in non-leaf PMD entries, e.g., x86 sets the accessed bit in a non-leaf PMD entry when using it as part of linear address translation [1]. Page table walkers that clear the accessed bit may use this capability to reduce their search space. Note that: 1. Although an inline function is preferable, this capability is added as a configuration option for consistency with the existing macros. 2. Due to the little interest in other varieties, this capability was only tested on Intel and AMD CPUs. Thanks to the following developers for their efforts [2][3]. Randy Dunlap Stephen Rothwell [1]: Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3 (June 2021), section 4.8 [2] https://lore.kernel.org/r/bfdcc7c8-922f-61a9-aa15-7e7250f04af7@infradead.org/ [3] https://lore.kernel.org/r/20220413151513.5a0d7a7e@canb.auug.org.au/ Link: https://lkml.kernel.org/r/20220918080010.2920238-3-yuzhao@google.com Signed-off-by: Yu Zhao Reviewed-by: Barry Song Acked-by: Brian Geffon Acked-by: Jan Alexander Steffens (heftig) Acked-by: Oleksandr Natalenko Acked-by: Steven Barrett Acked-by: Suleiman Souhlal Tested-by: Daniel Byrne Tested-by: Donald Carr Tested-by: Holger Hoffstätte Tested-by: Konstantin Kharlamov Tested-by: Shuang Zhai Tested-by: Sofia Trinh Tested-by: Vaibhav Jain Cc: Andi Kleen Cc: Aneesh Kumar K.V Cc: Catalin Marinas Cc: Dave Hansen Cc: Hillf Danton Cc: Jens Axboe Cc: Johannes Weiner Cc: Jonathan Corbet Cc: Linus Torvalds Cc: Matthew Wilcox Cc: Mel Gorman Cc: Miaohe Lin Cc: Michael Larabel Cc: Michal Hocko Cc: Mike Rapoport Cc: Mike Rapoport Cc: Peter Zijlstra Cc: Qi Zheng Cc: Tejun Heo Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 375e8e7e64f4..a108b60a6962 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -213,7 +213,7 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, #endif #ifndef __HAVE_ARCH_PMDP_TEST_AND_CLEAR_YOUNG -#ifdef CONFIG_TRANSPARENT_HUGEPAGE +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG) static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma, unsigned long address, pmd_t *pmdp) @@ -234,7 +234,7 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma, BUILD_BUG(); return 0; } -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG */ #endif #ifndef __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH -- cgit From aa1b67903a19e026d1749241fad177f6185c2d42 Mon Sep 17 00:00:00 2001 From: Yu Zhao Date: Sun, 18 Sep 2022 02:00:01 -0600 Subject: Revert "include/linux/mm_inline.h: fold __update_lru_size() into its sole caller" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This patch undoes the following refactor: commit 289ccba18af4 ("include/linux/mm_inline.h: fold __update_lru_size() into its sole caller") The upcoming changes to include/linux/mm_inline.h will reuse __update_lru_size(). Link: https://lkml.kernel.org/r/20220918080010.2920238-5-yuzhao@google.com Signed-off-by: Yu Zhao Reviewed-by: Miaohe Lin Acked-by: Brian Geffon Acked-by: Jan Alexander Steffens (heftig) Acked-by: Oleksandr Natalenko Acked-by: Steven Barrett Acked-by: Suleiman Souhlal Tested-by: Daniel Byrne Tested-by: Donald Carr Tested-by: Holger Hoffstätte Tested-by: Konstantin Kharlamov Tested-by: Shuang Zhai Tested-by: Sofia Trinh Tested-by: Vaibhav Jain Cc: Andi Kleen Cc: Aneesh Kumar K.V Cc: Barry Song Cc: Catalin Marinas Cc: Dave Hansen Cc: Hillf Danton Cc: Jens Axboe Cc: Johannes Weiner Cc: Jonathan Corbet Cc: Linus Torvalds Cc: Matthew Wilcox Cc: Mel Gorman Cc: Michael Larabel Cc: Michal Hocko Cc: Mike Rapoport Cc: Mike Rapoport Cc: Peter Zijlstra Cc: Qi Zheng Cc: Tejun Heo Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm_inline.h | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index 7b25b53c474a..fb8aadb81cd6 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -34,7 +34,7 @@ static inline int page_is_file_lru(struct page *page) return folio_is_file_lru(page_folio(page)); } -static __always_inline void update_lru_size(struct lruvec *lruvec, +static __always_inline void __update_lru_size(struct lruvec *lruvec, enum lru_list lru, enum zone_type zid, long nr_pages) { @@ -43,6 +43,13 @@ static __always_inline void update_lru_size(struct lruvec *lruvec, __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages); __mod_zone_page_state(&pgdat->node_zones[zid], NR_ZONE_LRU_BASE + lru, nr_pages); +} + +static __always_inline void update_lru_size(struct lruvec *lruvec, + enum lru_list lru, enum zone_type zid, + long nr_pages) +{ + __update_lru_size(lruvec, lru, zid, nr_pages); #ifdef CONFIG_MEMCG mem_cgroup_update_lru_size(lruvec, lru, zid, nr_pages); #endif -- cgit From ec1c86b25f4bdd9dce6436c0539d2a6ae676e1c4 Mon Sep 17 00:00:00 2001 From: Yu Zhao Date: Sun, 18 Sep 2022 02:00:02 -0600 Subject: mm: multi-gen LRU: groundwork MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Evictable pages are divided into multiple generations for each lruvec. The youngest generation number is stored in lrugen->max_seq for both anon and file types as they are aged on an equal footing. The oldest generation numbers are stored in lrugen->min_seq[] separately for anon and file types as clean file pages can be evicted regardless of swap constraints. These three variables are monotonically increasing. Generation numbers are truncated into order_base_2(MAX_NR_GENS+1) bits in order to fit into the gen counter in folio->flags. Each truncated generation number is an index to lrugen->lists[]. The sliding window technique is used to track at least MIN_NR_GENS and at most MAX_NR_GENS generations. The gen counter stores a value within [1, MAX_NR_GENS] while a page is on one of lrugen->lists[]. Otherwise it stores 0. There are two conceptually independent procedures: "the aging", which produces young generations, and "the eviction", which consumes old generations. They form a closed-loop system, i.e., "the page reclaim". Both procedures can be invoked from userspace for the purposes of working set estimation and proactive reclaim. These techniques are commonly used to optimize job scheduling (bin packing) in data centers [1][2]. To avoid confusion, the terms "hot" and "cold" will be applied to the multi-gen LRU, as a new convention; the terms "active" and "inactive" will be applied to the active/inactive LRU, as usual. The protection of hot pages and the selection of cold pages are based on page access channels and patterns. There are two access channels: one through page tables and the other through file descriptors. The protection of the former channel is by design stronger because: 1. The uncertainty in determining the access patterns of the former channel is higher due to the approximation of the accessed bit. 2. The cost of evicting the former channel is higher due to the TLB flushes required and the likelihood of encountering the dirty bit. 3. The penalty of underprotecting the former channel is higher because applications usually do not prepare themselves for major page faults like they do for blocked I/O. E.g., GUI applications commonly use dedicated I/O threads to avoid blocking rendering threads. There are also two access patterns: one with temporal locality and the other without. For the reasons listed above, the former channel is assumed to follow the former pattern unless VM_SEQ_READ or VM_RAND_READ is present; the latter channel is assumed to follow the latter pattern unless outlying refaults have been observed [3][4]. The next patch will address the "outlying refaults". Three macros, i.e., LRU_REFS_WIDTH, LRU_REFS_PGOFF and LRU_REFS_MASK, used later are added in this patch to make the entire patchset less diffy. A page is added to the youngest generation on faulting. The aging needs to check the accessed bit at least twice before handing this page over to the eviction. The first check takes care of the accessed bit set on the initial fault; the second check makes sure this page has not been used since then. This protocol, AKA second chance, requires a minimum of two generations, hence MIN_NR_GENS. [1] https://dl.acm.org/doi/10.1145/3297858.3304053 [2] https://dl.acm.org/doi/10.1145/3503222.3507731 [3] https://lwn.net/Articles/495543/ [4] https://lwn.net/Articles/815342/ Link: https://lkml.kernel.org/r/20220918080010.2920238-6-yuzhao@google.com Signed-off-by: Yu Zhao Acked-by: Brian Geffon Acked-by: Jan Alexander Steffens (heftig) Acked-by: Oleksandr Natalenko Acked-by: Steven Barrett Acked-by: Suleiman Souhlal Tested-by: Daniel Byrne Tested-by: Donald Carr Tested-by: Holger Hoffstätte Tested-by: Konstantin Kharlamov Tested-by: Shuang Zhai Tested-by: Sofia Trinh Tested-by: Vaibhav Jain Cc: Andi Kleen Cc: Aneesh Kumar K.V Cc: Barry Song Cc: Catalin Marinas Cc: Dave Hansen Cc: Hillf Danton Cc: Jens Axboe Cc: Johannes Weiner Cc: Jonathan Corbet Cc: Linus Torvalds Cc: Matthew Wilcox Cc: Mel Gorman Cc: Miaohe Lin Cc: Michael Larabel Cc: Michal Hocko Cc: Mike Rapoport Cc: Mike Rapoport Cc: Peter Zijlstra Cc: Qi Zheng Cc: Tejun Heo Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm_inline.h | 175 ++++++++++++++++++++++++++++++++++++++ include/linux/mmzone.h | 102 ++++++++++++++++++++++ include/linux/page-flags-layout.h | 13 +-- include/linux/page-flags.h | 4 +- include/linux/sched.h | 4 + 5 files changed, 291 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index fb8aadb81cd6..2ff703900fd0 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -40,6 +40,9 @@ static __always_inline void __update_lru_size(struct lruvec *lruvec, { struct pglist_data *pgdat = lruvec_pgdat(lruvec); + lockdep_assert_held(&lruvec->lru_lock); + WARN_ON_ONCE(nr_pages != (int)nr_pages); + __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages); __mod_zone_page_state(&pgdat->node_zones[zid], NR_ZONE_LRU_BASE + lru, nr_pages); @@ -101,11 +104,177 @@ static __always_inline enum lru_list folio_lru_list(struct folio *folio) return lru; } +#ifdef CONFIG_LRU_GEN + +static inline bool lru_gen_enabled(void) +{ + return true; +} + +static inline bool lru_gen_in_fault(void) +{ + return current->in_lru_fault; +} + +static inline int lru_gen_from_seq(unsigned long seq) +{ + return seq % MAX_NR_GENS; +} + +static inline int folio_lru_gen(struct folio *folio) +{ + unsigned long flags = READ_ONCE(folio->flags); + + return ((flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1; +} + +static inline bool lru_gen_is_active(struct lruvec *lruvec, int gen) +{ + unsigned long max_seq = lruvec->lrugen.max_seq; + + VM_WARN_ON_ONCE(gen >= MAX_NR_GENS); + + /* see the comment on MIN_NR_GENS */ + return gen == lru_gen_from_seq(max_seq) || gen == lru_gen_from_seq(max_seq - 1); +} + +static inline void lru_gen_update_size(struct lruvec *lruvec, struct folio *folio, + int old_gen, int new_gen) +{ + int type = folio_is_file_lru(folio); + int zone = folio_zonenum(folio); + int delta = folio_nr_pages(folio); + enum lru_list lru = type * LRU_INACTIVE_FILE; + struct lru_gen_struct *lrugen = &lruvec->lrugen; + + VM_WARN_ON_ONCE(old_gen != -1 && old_gen >= MAX_NR_GENS); + VM_WARN_ON_ONCE(new_gen != -1 && new_gen >= MAX_NR_GENS); + VM_WARN_ON_ONCE(old_gen == -1 && new_gen == -1); + + if (old_gen >= 0) + WRITE_ONCE(lrugen->nr_pages[old_gen][type][zone], + lrugen->nr_pages[old_gen][type][zone] - delta); + if (new_gen >= 0) + WRITE_ONCE(lrugen->nr_pages[new_gen][type][zone], + lrugen->nr_pages[new_gen][type][zone] + delta); + + /* addition */ + if (old_gen < 0) { + if (lru_gen_is_active(lruvec, new_gen)) + lru += LRU_ACTIVE; + __update_lru_size(lruvec, lru, zone, delta); + return; + } + + /* deletion */ + if (new_gen < 0) { + if (lru_gen_is_active(lruvec, old_gen)) + lru += LRU_ACTIVE; + __update_lru_size(lruvec, lru, zone, -delta); + return; + } +} + +static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming) +{ + unsigned long seq; + unsigned long flags; + int gen = folio_lru_gen(folio); + int type = folio_is_file_lru(folio); + int zone = folio_zonenum(folio); + struct lru_gen_struct *lrugen = &lruvec->lrugen; + + VM_WARN_ON_ONCE_FOLIO(gen != -1, folio); + + if (folio_test_unevictable(folio)) + return false; + /* + * There are three common cases for this page: + * 1. If it's hot, e.g., freshly faulted in or previously hot and + * migrated, add it to the youngest generation. + * 2. If it's cold but can't be evicted immediately, i.e., an anon page + * not in swapcache or a dirty page pending writeback, add it to the + * second oldest generation. + * 3. Everything else (clean, cold) is added to the oldest generation. + */ + if (folio_test_active(folio)) + seq = lrugen->max_seq; + else if ((type == LRU_GEN_ANON && !folio_test_swapcache(folio)) || + (folio_test_reclaim(folio) && + (folio_test_dirty(folio) || folio_test_writeback(folio)))) + seq = lrugen->min_seq[type] + 1; + else + seq = lrugen->min_seq[type]; + + gen = lru_gen_from_seq(seq); + flags = (gen + 1UL) << LRU_GEN_PGOFF; + /* see the comment on MIN_NR_GENS about PG_active */ + set_mask_bits(&folio->flags, LRU_GEN_MASK | BIT(PG_active), flags); + + lru_gen_update_size(lruvec, folio, -1, gen); + /* for folio_rotate_reclaimable() */ + if (reclaiming) + list_add_tail(&folio->lru, &lrugen->lists[gen][type][zone]); + else + list_add(&folio->lru, &lrugen->lists[gen][type][zone]); + + return true; +} + +static inline bool lru_gen_del_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming) +{ + unsigned long flags; + int gen = folio_lru_gen(folio); + + if (gen < 0) + return false; + + VM_WARN_ON_ONCE_FOLIO(folio_test_active(folio), folio); + VM_WARN_ON_ONCE_FOLIO(folio_test_unevictable(folio), folio); + + /* for folio_migrate_flags() */ + flags = !reclaiming && lru_gen_is_active(lruvec, gen) ? BIT(PG_active) : 0; + flags = set_mask_bits(&folio->flags, LRU_GEN_MASK, flags); + gen = ((flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1; + + lru_gen_update_size(lruvec, folio, gen, -1); + list_del(&folio->lru); + + return true; +} + +#else /* !CONFIG_LRU_GEN */ + +static inline bool lru_gen_enabled(void) +{ + return false; +} + +static inline bool lru_gen_in_fault(void) +{ + return false; +} + +static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming) +{ + return false; +} + +static inline bool lru_gen_del_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming) +{ + return false; +} + +#endif /* CONFIG_LRU_GEN */ + static __always_inline void lruvec_add_folio(struct lruvec *lruvec, struct folio *folio) { enum lru_list lru = folio_lru_list(folio); + if (lru_gen_add_folio(lruvec, folio, false)) + return; + update_lru_size(lruvec, lru, folio_zonenum(folio), folio_nr_pages(folio)); if (lru != LRU_UNEVICTABLE) @@ -123,6 +292,9 @@ void lruvec_add_folio_tail(struct lruvec *lruvec, struct folio *folio) { enum lru_list lru = folio_lru_list(folio); + if (lru_gen_add_folio(lruvec, folio, true)) + return; + update_lru_size(lruvec, lru, folio_zonenum(folio), folio_nr_pages(folio)); /* This is not expected to be used on LRU_UNEVICTABLE */ @@ -140,6 +312,9 @@ void lruvec_del_folio(struct lruvec *lruvec, struct folio *folio) { enum lru_list lru = folio_lru_list(folio); + if (lru_gen_del_folio(lruvec, folio, false)) + return; + if (lru != LRU_UNEVICTABLE) list_del(&folio->lru); update_lru_size(lruvec, lru, folio_zonenum(folio), diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 18cf0fc5ce67..6f4ea078d90f 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -317,6 +317,102 @@ enum lruvec_flags { */ }; +#endif /* !__GENERATING_BOUNDS_H */ + +/* + * Evictable pages are divided into multiple generations. The youngest and the + * oldest generation numbers, max_seq and min_seq, are monotonically increasing. + * They form a sliding window of a variable size [MIN_NR_GENS, MAX_NR_GENS]. An + * offset within MAX_NR_GENS, i.e., gen, indexes the LRU list of the + * corresponding generation. The gen counter in folio->flags stores gen+1 while + * a page is on one of lrugen->lists[]. Otherwise it stores 0. + * + * A page is added to the youngest generation on faulting. The aging needs to + * check the accessed bit at least twice before handing this page over to the + * eviction. The first check takes care of the accessed bit set on the initial + * fault; the second check makes sure this page hasn't been used since then. + * This process, AKA second chance, requires a minimum of two generations, + * hence MIN_NR_GENS. And to maintain ABI compatibility with the active/inactive + * LRU, e.g., /proc/vmstat, these two generations are considered active; the + * rest of generations, if they exist, are considered inactive. See + * lru_gen_is_active(). + * + * PG_active is always cleared while a page is on one of lrugen->lists[] so that + * the aging needs not to worry about it. And it's set again when a page + * considered active is isolated for non-reclaiming purposes, e.g., migration. + * See lru_gen_add_folio() and lru_gen_del_folio(). + * + * MAX_NR_GENS is set to 4 so that the multi-gen LRU can support twice the + * number of categories of the active/inactive LRU when keeping track of + * accesses through page tables. This requires order_base_2(MAX_NR_GENS+1) bits + * in folio->flags. + */ +#define MIN_NR_GENS 2U +#define MAX_NR_GENS 4U + +#ifndef __GENERATING_BOUNDS_H + +struct lruvec; + +#define LRU_GEN_MASK ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF) +#define LRU_REFS_MASK ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF) + +#ifdef CONFIG_LRU_GEN + +enum { + LRU_GEN_ANON, + LRU_GEN_FILE, +}; + +/* + * The youngest generation number is stored in max_seq for both anon and file + * types as they are aged on an equal footing. The oldest generation numbers are + * stored in min_seq[] separately for anon and file types as clean file pages + * can be evicted regardless of swap constraints. + * + * Normally anon and file min_seq are in sync. But if swapping is constrained, + * e.g., out of swap space, file min_seq is allowed to advance and leave anon + * min_seq behind. + * + * The number of pages in each generation is eventually consistent and therefore + * can be transiently negative. + */ +struct lru_gen_struct { + /* the aging increments the youngest generation number */ + unsigned long max_seq; + /* the eviction increments the oldest generation numbers */ + unsigned long min_seq[ANON_AND_FILE]; + /* the multi-gen LRU lists, lazily sorted on eviction */ + struct list_head lists[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]; + /* the multi-gen LRU sizes, eventually consistent */ + long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]; +}; + +void lru_gen_init_lruvec(struct lruvec *lruvec); + +#ifdef CONFIG_MEMCG +void lru_gen_init_memcg(struct mem_cgroup *memcg); +void lru_gen_exit_memcg(struct mem_cgroup *memcg); +#endif + +#else /* !CONFIG_LRU_GEN */ + +static inline void lru_gen_init_lruvec(struct lruvec *lruvec) +{ +} + +#ifdef CONFIG_MEMCG +static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) +{ +} + +static inline void lru_gen_exit_memcg(struct mem_cgroup *memcg) +{ +} +#endif + +#endif /* CONFIG_LRU_GEN */ + struct lruvec { struct list_head lists[NR_LRU_LISTS]; /* per lruvec lru_lock for memcg */ @@ -334,6 +430,10 @@ struct lruvec { unsigned long refaults[ANON_AND_FILE]; /* Various lruvec state flags (enum lruvec_flags) */ unsigned long flags; +#ifdef CONFIG_LRU_GEN + /* evictable pages divided into generations */ + struct lru_gen_struct lrugen; +#endif #ifdef CONFIG_MEMCG struct pglist_data *pgdat; #endif @@ -749,6 +849,8 @@ static inline bool zone_is_empty(struct zone *zone) #define ZONES_PGOFF (NODES_PGOFF - ZONES_WIDTH) #define LAST_CPUPID_PGOFF (ZONES_PGOFF - LAST_CPUPID_WIDTH) #define KASAN_TAG_PGOFF (LAST_CPUPID_PGOFF - KASAN_TAG_WIDTH) +#define LRU_GEN_PGOFF (KASAN_TAG_PGOFF - LRU_GEN_WIDTH) +#define LRU_REFS_PGOFF (LRU_GEN_PGOFF - LRU_REFS_WIDTH) /* * Define the bit shifts to access each section. For non-existent diff --git a/include/linux/page-flags-layout.h b/include/linux/page-flags-layout.h index ef1e3e736e14..240905407a18 100644 --- a/include/linux/page-flags-layout.h +++ b/include/linux/page-flags-layout.h @@ -55,7 +55,8 @@ #define SECTIONS_WIDTH 0 #endif -#if ZONES_WIDTH + SECTIONS_WIDTH + NODES_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS +#if ZONES_WIDTH + LRU_GEN_WIDTH + SECTIONS_WIDTH + NODES_SHIFT \ + <= BITS_PER_LONG - NR_PAGEFLAGS #define NODES_WIDTH NODES_SHIFT #elif defined(CONFIG_SPARSEMEM_VMEMMAP) #error "Vmemmap: No space for nodes field in page flags" @@ -89,8 +90,8 @@ #define LAST_CPUPID_SHIFT 0 #endif -#if ZONES_WIDTH + SECTIONS_WIDTH + NODES_WIDTH + KASAN_TAG_WIDTH + LAST_CPUPID_SHIFT \ - <= BITS_PER_LONG - NR_PAGEFLAGS +#if ZONES_WIDTH + LRU_GEN_WIDTH + SECTIONS_WIDTH + NODES_WIDTH + \ + KASAN_TAG_WIDTH + LAST_CPUPID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS #define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT #else #define LAST_CPUPID_WIDTH 0 @@ -100,10 +101,12 @@ #define LAST_CPUPID_NOT_IN_PAGE_FLAGS #endif -#if ZONES_WIDTH + SECTIONS_WIDTH + NODES_WIDTH + KASAN_TAG_WIDTH + LAST_CPUPID_WIDTH \ - > BITS_PER_LONG - NR_PAGEFLAGS +#if ZONES_WIDTH + LRU_GEN_WIDTH + SECTIONS_WIDTH + NODES_WIDTH + \ + KASAN_TAG_WIDTH + LAST_CPUPID_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS #error "Not enough bits in page flags" #endif +#define LRU_REFS_WIDTH 0 + #endif #endif /* _LINUX_PAGE_FLAGS_LAYOUT */ diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 465ff35a8c00..0b0ae5084e60 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -1058,7 +1058,7 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page) 1UL << PG_private | 1UL << PG_private_2 | \ 1UL << PG_writeback | 1UL << PG_reserved | \ 1UL << PG_slab | 1UL << PG_active | \ - 1UL << PG_unevictable | __PG_MLOCKED) + 1UL << PG_unevictable | __PG_MLOCKED | LRU_GEN_MASK) /* * Flags checked when a page is prepped for return by the page allocator. @@ -1069,7 +1069,7 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page) * alloc-free cycle to prevent from reusing the page. */ #define PAGE_FLAGS_CHECK_AT_PREP \ - (PAGEFLAGS_MASK & ~__PG_HWPOISON) + ((PAGEFLAGS_MASK & ~__PG_HWPOISON) | LRU_GEN_MASK | LRU_REFS_MASK) #define PAGE_FLAGS_PRIVATE \ (1UL << PG_private | 1UL << PG_private_2) diff --git a/include/linux/sched.h b/include/linux/sched.h index d9a2466664f7..a2dcfb91df03 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -914,6 +914,10 @@ struct task_struct { #ifdef CONFIG_MEMCG unsigned in_user_fault:1; #endif +#ifdef CONFIG_LRU_GEN + /* whether the LRU algorithm may apply to this access */ + unsigned in_lru_fault:1; +#endif #ifdef CONFIG_COMPAT_BRK unsigned brk_randomized:1; #endif -- cgit From ac35a490237446b71e3b4b782b1596967edd0aa8 Mon Sep 17 00:00:00 2001 From: Yu Zhao Date: Sun, 18 Sep 2022 02:00:03 -0600 Subject: mm: multi-gen LRU: minimal implementation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit To avoid confusion, the terms "promotion" and "demotion" will be applied to the multi-gen LRU, as a new convention; the terms "activation" and "deactivation" will be applied to the active/inactive LRU, as usual. The aging produces young generations. Given an lruvec, it increments max_seq when max_seq-min_seq+1 approaches MIN_NR_GENS. The aging promotes hot pages to the youngest generation when it finds them accessed through page tables; the demotion of cold pages happens consequently when it increments max_seq. Promotion in the aging path does not involve any LRU list operations, only the updates of the gen counter and lrugen->nr_pages[]; demotion, unless as the result of the increment of max_seq, requires LRU list operations, e.g., lru_deactivate_fn(). The aging has the complexity O(nr_hot_pages), since it is only interested in hot pages. The eviction consumes old generations. Given an lruvec, it increments min_seq when lrugen->lists[] indexed by min_seq%MAX_NR_GENS becomes empty. A feedback loop modeled after the PID controller monitors refaults over anon and file types and decides which type to evict when both types are available from the same generation. The protection of pages accessed multiple times through file descriptors takes place in the eviction path. Each generation is divided into multiple tiers. A page accessed N times through file descriptors is in tier order_base_2(N). Tiers do not have dedicated lrugen->lists[], only bits in folio->flags. The aforementioned feedback loop also monitors refaults over all tiers and decides when to protect pages in which tiers (N>1), using the first tier (N=0,1) as a baseline. The first tier contains single-use unmapped clean pages, which are most likely the best choices. In contrast to promotion in the aging path, the protection of a page in the eviction path is achieved by moving this page to the next generation, i.e., min_seq+1, if the feedback loop decides so. This approach has the following advantages: 1. It removes the cost of activation in the buffered access path by inferring whether pages accessed multiple times through file descriptors are statistically hot and thus worth protecting in the eviction path. 2. It takes pages accessed through page tables into account and avoids overprotecting pages accessed multiple times through file descriptors. (Pages accessed through page tables are in the first tier, since N=0.) 3. More tiers provide better protection for pages accessed more than twice through file descriptors, when under heavy buffered I/O workloads. Server benchmark results: Single workload: fio (buffered I/O): +[30, 32]% IOPS BW 5.19-rc1: 2673k 10.2GiB/s patch1-6: 3491k 13.3GiB/s Single workload: memcached (anon): -[4, 6]% Ops/sec KB/sec 5.19-rc1: 1161501.04 45177.25 patch1-6: 1106168.46 43025.04 Configurations: CPU: two Xeon 6154 Mem: total 256G Node 1 was only used as a ram disk to reduce the variance in the results. patch drivers/block/brd.c < gfp_flags = GFP_NOIO | __GFP_ZERO | __GFP_HIGHMEM | __GFP_THISNODE; > page = alloc_pages_node(1, gfp_flags, 0); EOF cat >>/etc/systemd/system.conf <>/etc/memcached.conf </sys/fs/cgroup/user.slice/test/memory.max echo $$ >/sys/fs/cgroup/user.slice/test/cgroup.procs fio -name=mglru --numjobs=72 --directory=/mnt --size=1408m \ --buffered=1 --ioengine=io_uring --iodepth=128 \ --iodepth_batch_submit=32 --iodepth_batch_complete=32 \ --rw=randread --random_distribution=random --norandommap \ --time_based --ramp_time=10m --runtime=5m --group_reporting cat memcached.sh modprobe brd rd_nr=1 rd_size=113246208 swapoff -a mkswap /dev/ram0 swapon /dev/ram0 memtier_benchmark -S /var/run/memcached/memcached.sock \ -P memcache_binary -n allkeys --key-minimum=1 \ --key-maximum=65000000 --key-pattern=P:P -c 1 -t 36 \ --ratio 1:0 --pipeline 8 -d 2000 memtier_benchmark -S /var/run/memcached/memcached.sock \ -P memcache_binary -n allkeys --key-minimum=1 \ --key-maximum=65000000 --key-pattern=R:R -c 1 -t 36 \ --ratio 0:1 --pipeline 8 --randomize --distinct-client-seed Client benchmark results: kswapd profiles: 5.19-rc1 40.33% page_vma_mapped_walk (overhead) 21.80% lzo1x_1_do_compress (real work) 7.53% do_raw_spin_lock 3.95% _raw_spin_unlock_irq 2.52% vma_interval_tree_iter_next 2.37% folio_referenced_one 2.28% vma_interval_tree_subtree_search 1.97% anon_vma_interval_tree_iter_first 1.60% ptep_clear_flush 1.06% __zram_bvec_write patch1-6 39.03% lzo1x_1_do_compress (real work) 18.47% page_vma_mapped_walk (overhead) 6.74% _raw_spin_unlock_irq 3.97% do_raw_spin_lock 2.49% ptep_clear_flush 2.48% anon_vma_interval_tree_iter_first 1.92% folio_referenced_one 1.88% __zram_bvec_write 1.48% memmove 1.31% vma_interval_tree_iter_next Configurations: CPU: single Snapdragon 7c Mem: total 4G ChromeOS MemoryPressure [1] [1] https://chromium.googlesource.com/chromiumos/platform/tast-tests/ Link: https://lkml.kernel.org/r/20220918080010.2920238-7-yuzhao@google.com Signed-off-by: Yu Zhao Acked-by: Brian Geffon Acked-by: Jan Alexander Steffens (heftig) Acked-by: Oleksandr Natalenko Acked-by: Steven Barrett Acked-by: Suleiman Souhlal Tested-by: Daniel Byrne Tested-by: Donald Carr Tested-by: Holger Hoffstätte Tested-by: Konstantin Kharlamov Tested-by: Shuang Zhai Tested-by: Sofia Trinh Tested-by: Vaibhav Jain Cc: Andi Kleen Cc: Aneesh Kumar K.V Cc: Barry Song Cc: Catalin Marinas Cc: Dave Hansen Cc: Hillf Danton Cc: Jens Axboe Cc: Johannes Weiner Cc: Jonathan Corbet Cc: Linus Torvalds Cc: Matthew Wilcox Cc: Mel Gorman Cc: Miaohe Lin Cc: Michael Larabel Cc: Michal Hocko Cc: Mike Rapoport Cc: Mike Rapoport Cc: Peter Zijlstra Cc: Qi Zheng Cc: Tejun Heo Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm_inline.h | 36 ++++++++++++++++++++++++++++++++++ include/linux/mmzone.h | 41 +++++++++++++++++++++++++++++++++++++++ include/linux/page-flags-layout.h | 5 ++++- 3 files changed, 81 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index 2ff703900fd0..f2b2296a42f9 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -121,6 +121,33 @@ static inline int lru_gen_from_seq(unsigned long seq) return seq % MAX_NR_GENS; } +static inline int lru_hist_from_seq(unsigned long seq) +{ + return seq % NR_HIST_GENS; +} + +static inline int lru_tier_from_refs(int refs) +{ + VM_WARN_ON_ONCE(refs > BIT(LRU_REFS_WIDTH)); + + /* see the comment in folio_lru_refs() */ + return order_base_2(refs + 1); +} + +static inline int folio_lru_refs(struct folio *folio) +{ + unsigned long flags = READ_ONCE(folio->flags); + bool workingset = flags & BIT(PG_workingset); + + /* + * Return the number of accesses beyond PG_referenced, i.e., N-1 if the + * total number of accesses is N>1, since N=0,1 both map to the first + * tier. lru_tier_from_refs() will account for this off-by-one. Also see + * the comment on MAX_NR_TIERS. + */ + return ((flags & LRU_REFS_MASK) >> LRU_REFS_PGOFF) + workingset; +} + static inline int folio_lru_gen(struct folio *folio) { unsigned long flags = READ_ONCE(folio->flags); @@ -173,6 +200,15 @@ static inline void lru_gen_update_size(struct lruvec *lruvec, struct folio *foli __update_lru_size(lruvec, lru, zone, -delta); return; } + + /* promotion */ + if (!lru_gen_is_active(lruvec, old_gen) && lru_gen_is_active(lruvec, new_gen)) { + __update_lru_size(lruvec, lru, zone, -delta); + __update_lru_size(lruvec, lru + LRU_ACTIVE, zone, delta); + } + + /* demotion requires isolation, e.g., lru_deactivate_fn() */ + VM_WARN_ON_ONCE(lru_gen_is_active(lruvec, old_gen) && !lru_gen_is_active(lruvec, new_gen)); } static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, bool reclaiming) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 6f4ea078d90f..7e343420bfb1 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -350,6 +350,28 @@ enum lruvec_flags { #define MIN_NR_GENS 2U #define MAX_NR_GENS 4U +/* + * Each generation is divided into multiple tiers. A page accessed N times + * through file descriptors is in tier order_base_2(N). A page in the first tier + * (N=0,1) is marked by PG_referenced unless it was faulted in through page + * tables or read ahead. A page in any other tier (N>1) is marked by + * PG_referenced and PG_workingset. This implies a minimum of two tiers is + * supported without using additional bits in folio->flags. + * + * In contrast to moving across generations which requires the LRU lock, moving + * across tiers only involves atomic operations on folio->flags and therefore + * has a negligible cost in the buffered access path. In the eviction path, + * comparisons of refaulted/(evicted+protected) from the first tier and the + * rest infer whether pages accessed multiple times through file descriptors + * are statistically hot and thus worth protecting. + * + * MAX_NR_TIERS is set to 4 so that the multi-gen LRU can support twice the + * number of categories of the active/inactive LRU when keeping track of + * accesses through file descriptors. This uses MAX_NR_TIERS-2 spare bits in + * folio->flags. + */ +#define MAX_NR_TIERS 4U + #ifndef __GENERATING_BOUNDS_H struct lruvec; @@ -364,6 +386,16 @@ enum { LRU_GEN_FILE, }; +#define MIN_LRU_BATCH BITS_PER_LONG +#define MAX_LRU_BATCH (MIN_LRU_BATCH * 64) + +/* whether to keep historical stats from evicted generations */ +#ifdef CONFIG_LRU_GEN_STATS +#define NR_HIST_GENS MAX_NR_GENS +#else +#define NR_HIST_GENS 1U +#endif + /* * The youngest generation number is stored in max_seq for both anon and file * types as they are aged on an equal footing. The oldest generation numbers are @@ -386,6 +418,15 @@ struct lru_gen_struct { struct list_head lists[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]; /* the multi-gen LRU sizes, eventually consistent */ long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]; + /* the exponential moving average of refaulted */ + unsigned long avg_refaulted[ANON_AND_FILE][MAX_NR_TIERS]; + /* the exponential moving average of evicted+protected */ + unsigned long avg_total[ANON_AND_FILE][MAX_NR_TIERS]; + /* the first tier doesn't need protection, hence the minus one */ + unsigned long protected[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS - 1]; + /* can be modified without holding the LRU lock */ + atomic_long_t evicted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS]; + atomic_long_t refaulted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS]; }; void lru_gen_init_lruvec(struct lruvec *lruvec); diff --git a/include/linux/page-flags-layout.h b/include/linux/page-flags-layout.h index 240905407a18..7d79818dc065 100644 --- a/include/linux/page-flags-layout.h +++ b/include/linux/page-flags-layout.h @@ -106,7 +106,10 @@ #error "Not enough bits in page flags" #endif -#define LRU_REFS_WIDTH 0 +/* see the comment on MAX_NR_TIERS */ +#define LRU_REFS_WIDTH min(__LRU_REFS_WIDTH, BITS_PER_LONG - NR_PAGEFLAGS - \ + ZONES_WIDTH - LRU_GEN_WIDTH - SECTIONS_WIDTH - \ + NODES_WIDTH - KASAN_TAG_WIDTH - LAST_CPUPID_WIDTH) #endif #endif /* _LINUX_PAGE_FLAGS_LAYOUT */ -- cgit From 018ee47f14893d500131dfca2ff9f3ff8ebd4ed2 Mon Sep 17 00:00:00 2001 From: Yu Zhao Date: Sun, 18 Sep 2022 02:00:04 -0600 Subject: mm: multi-gen LRU: exploit locality in rmap MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Searching the rmap for PTEs mapping each page on an LRU list (to test and clear the accessed bit) can be expensive because pages from different VMAs (PA space) are not cache friendly to the rmap (VA space). For workloads mostly using mapped pages, searching the rmap can incur the highest CPU cost in the reclaim path. This patch exploits spatial locality to reduce the trips into the rmap. When shrink_page_list() walks the rmap and finds a young PTE, a new function lru_gen_look_around() scans at most BITS_PER_LONG-1 adjacent PTEs. On finding another young PTE, it clears the accessed bit and updates the gen counter of the page mapped by this PTE to (max_seq%MAX_NR_GENS)+1. Server benchmark results: Single workload: fio (buffered I/O): no change Single workload: memcached (anon): +[3, 5]% Ops/sec KB/sec patch1-6: 1106168.46 43025.04 patch1-7: 1147696.57 44640.29 Configurations: no change Client benchmark results: kswapd profiles: patch1-6 39.03% lzo1x_1_do_compress (real work) 18.47% page_vma_mapped_walk (overhead) 6.74% _raw_spin_unlock_irq 3.97% do_raw_spin_lock 2.49% ptep_clear_flush 2.48% anon_vma_interval_tree_iter_first 1.92% folio_referenced_one 1.88% __zram_bvec_write 1.48% memmove 1.31% vma_interval_tree_iter_next patch1-7 48.16% lzo1x_1_do_compress (real work) 8.20% page_vma_mapped_walk (overhead) 7.06% _raw_spin_unlock_irq 2.92% ptep_clear_flush 2.53% __zram_bvec_write 2.11% do_raw_spin_lock 2.02% memmove 1.93% lru_gen_look_around 1.56% free_unref_page_list 1.40% memset Configurations: no change Link: https://lkml.kernel.org/r/20220918080010.2920238-8-yuzhao@google.com Signed-off-by: Yu Zhao Acked-by: Barry Song Acked-by: Brian Geffon Acked-by: Jan Alexander Steffens (heftig) Acked-by: Oleksandr Natalenko Acked-by: Steven Barrett Acked-by: Suleiman Souhlal Tested-by: Daniel Byrne Tested-by: Donald Carr Tested-by: Holger Hoffstätte Tested-by: Konstantin Kharlamov Tested-by: Shuang Zhai Tested-by: Sofia Trinh Tested-by: Vaibhav Jain Cc: Andi Kleen Cc: Aneesh Kumar K.V Cc: Catalin Marinas Cc: Dave Hansen Cc: Hillf Danton Cc: Jens Axboe Cc: Johannes Weiner Cc: Jonathan Corbet Cc: Linus Torvalds Cc: Matthew Wilcox Cc: Mel Gorman Cc: Miaohe Lin Cc: Michael Larabel Cc: Michal Hocko Cc: Mike Rapoport Cc: Mike Rapoport Cc: Peter Zijlstra Cc: Qi Zheng Cc: Tejun Heo Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 31 +++++++++++++++++++++++++++++++ include/linux/mm.h | 5 +++++ include/linux/mmzone.h | 6 ++++++ 3 files changed, 42 insertions(+) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index a2461f9a8738..9b8ab121d948 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -445,6 +445,7 @@ static inline struct obj_cgroup *__folio_objcg(struct folio *folio) * - LRU isolation * - lock_page_memcg() * - exclusive reference + * - mem_cgroup_trylock_pages() * * For a kmem folio a caller should hold an rcu read lock to protect memcg * associated with a kmem folio from being released. @@ -506,6 +507,7 @@ static inline struct mem_cgroup *folio_memcg_rcu(struct folio *folio) * - LRU isolation * - lock_page_memcg() * - exclusive reference + * - mem_cgroup_trylock_pages() * * For a kmem page a caller should hold an rcu read lock to protect memcg * associated with a kmem page from being released. @@ -960,6 +962,23 @@ void unlock_page_memcg(struct page *page); void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val); +/* try to stablize folio_memcg() for all the pages in a memcg */ +static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg) +{ + rcu_read_lock(); + + if (mem_cgroup_disabled() || !atomic_read(&memcg->moving_account)) + return true; + + rcu_read_unlock(); + return false; +} + +static inline void mem_cgroup_unlock_pages(void) +{ + rcu_read_unlock(); +} + /* idx can be of type enum memcg_stat_item or node_stat_item */ static inline void mod_memcg_state(struct mem_cgroup *memcg, int idx, int val) @@ -1434,6 +1453,18 @@ static inline void folio_memcg_unlock(struct folio *folio) { } +static inline bool mem_cgroup_trylock_pages(struct mem_cgroup *memcg) +{ + /* to match folio_memcg_rcu() */ + rcu_read_lock(); + return true; +} + +static inline void mem_cgroup_unlock_pages(void) +{ + rcu_read_unlock(); +} + static inline void mem_cgroup_handle_over_high(void) { } diff --git a/include/linux/mm.h b/include/linux/mm.h index 8a5ad9d050bf..7cc9ffc19e7f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1490,6 +1490,11 @@ static inline unsigned long folio_pfn(struct folio *folio) return page_to_pfn(&folio->page); } +static inline struct folio *pfn_folio(unsigned long pfn) +{ + return page_folio(pfn_to_page(pfn)); +} + static inline atomic_t *folio_pincount_ptr(struct folio *folio) { return &folio_page(folio, 1)->compound_pincount; diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 7e343420bfb1..9ef5aa37c60c 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -375,6 +375,7 @@ enum lruvec_flags { #ifndef __GENERATING_BOUNDS_H struct lruvec; +struct page_vma_mapped_walk; #define LRU_GEN_MASK ((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF) #define LRU_REFS_MASK ((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF) @@ -430,6 +431,7 @@ struct lru_gen_struct { }; void lru_gen_init_lruvec(struct lruvec *lruvec); +void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); #ifdef CONFIG_MEMCG void lru_gen_init_memcg(struct mem_cgroup *memcg); @@ -442,6 +444,10 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec) { } +static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +{ +} + #ifdef CONFIG_MEMCG static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) { -- cgit From bd74fdaea146029e4fa12c6de89adbe0779348a9 Mon Sep 17 00:00:00 2001 From: Yu Zhao Date: Sun, 18 Sep 2022 02:00:05 -0600 Subject: mm: multi-gen LRU: support page table walks MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit To further exploit spatial locality, the aging prefers to walk page tables to search for young PTEs and promote hot pages. A kill switch will be added in the next patch to disable this behavior. When disabled, the aging relies on the rmap only. NB: this behavior has nothing similar with the page table scanning in the 2.4 kernel [1], which searches page tables for old PTEs, adds cold pages to swapcache and unmaps them. To avoid confusion, the term "iteration" specifically means the traversal of an entire mm_struct list; the term "walk" will be applied to page tables and the rmap, as usual. An mm_struct list is maintained for each memcg, and an mm_struct follows its owner task to the new memcg when this task is migrated. Given an lruvec, the aging iterates lruvec_memcg()->mm_list and calls walk_page_range() with each mm_struct on this list to promote hot pages before it increments max_seq. When multiple page table walkers iterate the same list, each of them gets a unique mm_struct; therefore they can run concurrently. Page table walkers ignore any misplaced pages, e.g., if an mm_struct was migrated, pages it left in the previous memcg will not be promoted when its current memcg is under reclaim. Similarly, page table walkers will not promote pages from nodes other than the one under reclaim. This patch uses the following optimizations when walking page tables: 1. It tracks the usage of mm_struct's between context switches so that page table walkers can skip processes that have been sleeping since the last iteration. 2. It uses generational Bloom filters to record populated branches so that page table walkers can reduce their search space based on the query results, e.g., to skip page tables containing mostly holes or misplaced pages. 3. It takes advantage of the accessed bit in non-leaf PMD entries when CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y. 4. It does not zigzag between a PGD table and the same PMD table spanning multiple VMAs. IOW, it finishes all the VMAs within the range of the same PMD table before it returns to a PGD table. This improves the cache performance for workloads that have large numbers of tiny VMAs [2], especially when CONFIG_PGTABLE_LEVELS=5. Server benchmark results: Single workload: fio (buffered I/O): no change Single workload: memcached (anon): +[8, 10]% Ops/sec KB/sec patch1-7: 1147696.57 44640.29 patch1-8: 1245274.91 48435.66 Configurations: no change Client benchmark results: kswapd profiles: patch1-7 48.16% lzo1x_1_do_compress (real work) 8.20% page_vma_mapped_walk (overhead) 7.06% _raw_spin_unlock_irq 2.92% ptep_clear_flush 2.53% __zram_bvec_write 2.11% do_raw_spin_lock 2.02% memmove 1.93% lru_gen_look_around 1.56% free_unref_page_list 1.40% memset patch1-8 49.44% lzo1x_1_do_compress (real work) 6.19% page_vma_mapped_walk (overhead) 5.97% _raw_spin_unlock_irq 3.13% get_pfn_folio 2.85% ptep_clear_flush 2.42% __zram_bvec_write 2.08% do_raw_spin_lock 1.92% memmove 1.44% alloc_zspage 1.36% memset Configurations: no change Thanks to the following developers for their efforts [3]. kernel test robot [1] https://lwn.net/Articles/23732/ [2] https://llvm.org/docs/ScudoHardenedAllocator.html [3] https://lore.kernel.org/r/202204160827.ekEARWQo-lkp@intel.com/ Link: https://lkml.kernel.org/r/20220918080010.2920238-9-yuzhao@google.com Signed-off-by: Yu Zhao Acked-by: Brian Geffon Acked-by: Jan Alexander Steffens (heftig) Acked-by: Oleksandr Natalenko Acked-by: Steven Barrett Acked-by: Suleiman Souhlal Tested-by: Daniel Byrne Tested-by: Donald Carr Tested-by: Holger Hoffstätte Tested-by: Konstantin Kharlamov Tested-by: Shuang Zhai Tested-by: Sofia Trinh Tested-by: Vaibhav Jain Cc: Andi Kleen Cc: Aneesh Kumar K.V Cc: Barry Song Cc: Catalin Marinas Cc: Dave Hansen Cc: Hillf Danton Cc: Jens Axboe Cc: Johannes Weiner Cc: Jonathan Corbet Cc: Linus Torvalds Cc: Matthew Wilcox Cc: Mel Gorman Cc: Miaohe Lin Cc: Michael Larabel Cc: Michal Hocko Cc: Mike Rapoport Cc: Mike Rapoport Cc: Peter Zijlstra Cc: Qi Zheng Cc: Tejun Heo Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 5 +++ include/linux/mm_types.h | 76 ++++++++++++++++++++++++++++++++++++++++++++++ include/linux/mmzone.h | 56 +++++++++++++++++++++++++++++++++- include/linux/swap.h | 4 +++ 4 files changed, 140 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 9b8ab121d948..344022f102c2 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -350,6 +350,11 @@ struct mem_cgroup { struct deferred_split deferred_split_queue; #endif +#ifdef CONFIG_LRU_GEN + /* per-memcg mm_struct list */ + struct lru_gen_mm_list mm_list; +#endif + struct mem_cgroup_per_node *nodeinfo[]; }; diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index cf97f3884fda..e1797813cc2c 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -672,6 +672,22 @@ struct mm_struct { */ unsigned long ksm_merging_pages; #endif +#ifdef CONFIG_LRU_GEN + struct { + /* this mm_struct is on lru_gen_mm_list */ + struct list_head list; + /* + * Set when switching to this mm_struct, as a hint of + * whether it has been used since the last time per-node + * page table walkers cleared the corresponding bits. + */ + unsigned long bitmap; +#ifdef CONFIG_MEMCG + /* points to the memcg of "owner" above */ + struct mem_cgroup *memcg; +#endif + } lru_gen; +#endif /* CONFIG_LRU_GEN */ } __randomize_layout; /* @@ -698,6 +714,66 @@ static inline cpumask_t *mm_cpumask(struct mm_struct *mm) return (struct cpumask *)&mm->cpu_bitmap; } +#ifdef CONFIG_LRU_GEN + +struct lru_gen_mm_list { + /* mm_struct list for page table walkers */ + struct list_head fifo; + /* protects the list above */ + spinlock_t lock; +}; + +void lru_gen_add_mm(struct mm_struct *mm); +void lru_gen_del_mm(struct mm_struct *mm); +#ifdef CONFIG_MEMCG +void lru_gen_migrate_mm(struct mm_struct *mm); +#endif + +static inline void lru_gen_init_mm(struct mm_struct *mm) +{ + INIT_LIST_HEAD(&mm->lru_gen.list); + mm->lru_gen.bitmap = 0; +#ifdef CONFIG_MEMCG + mm->lru_gen.memcg = NULL; +#endif +} + +static inline void lru_gen_use_mm(struct mm_struct *mm) +{ + /* + * When the bitmap is set, page reclaim knows this mm_struct has been + * used since the last time it cleared the bitmap. So it might be worth + * walking the page tables of this mm_struct to clear the accessed bit. + */ + WRITE_ONCE(mm->lru_gen.bitmap, -1); +} + +#else /* !CONFIG_LRU_GEN */ + +static inline void lru_gen_add_mm(struct mm_struct *mm) +{ +} + +static inline void lru_gen_del_mm(struct mm_struct *mm) +{ +} + +#ifdef CONFIG_MEMCG +static inline void lru_gen_migrate_mm(struct mm_struct *mm) +{ +} +#endif + +static inline void lru_gen_init_mm(struct mm_struct *mm) +{ +} + +static inline void lru_gen_use_mm(struct mm_struct *mm) +{ +} + +#endif /* CONFIG_LRU_GEN */ + struct mmu_gather; extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm); extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm); diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9ef5aa37c60c..b1635c4020dc 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -408,7 +408,7 @@ enum { * min_seq behind. * * The number of pages in each generation is eventually consistent and therefore - * can be transiently negative. + * can be transiently negative when reset_batch_size() is pending. */ struct lru_gen_struct { /* the aging increments the youngest generation number */ @@ -430,6 +430,53 @@ struct lru_gen_struct { atomic_long_t refaulted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS]; }; +enum { + MM_LEAF_TOTAL, /* total leaf entries */ + MM_LEAF_OLD, /* old leaf entries */ + MM_LEAF_YOUNG, /* young leaf entries */ + MM_NONLEAF_TOTAL, /* total non-leaf entries */ + MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */ + MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */ + NR_MM_STATS +}; + +/* double-buffering Bloom filters */ +#define NR_BLOOM_FILTERS 2 + +struct lru_gen_mm_state { + /* set to max_seq after each iteration */ + unsigned long seq; + /* where the current iteration continues (inclusive) */ + struct list_head *head; + /* where the last iteration ended (exclusive) */ + struct list_head *tail; + /* to wait for the last page table walker to finish */ + struct wait_queue_head wait; + /* Bloom filters flip after each iteration */ + unsigned long *filters[NR_BLOOM_FILTERS]; + /* the mm stats for debugging */ + unsigned long stats[NR_HIST_GENS][NR_MM_STATS]; + /* the number of concurrent page table walkers */ + int nr_walkers; +}; + +struct lru_gen_mm_walk { + /* the lruvec under reclaim */ + struct lruvec *lruvec; + /* unstable max_seq from lru_gen_struct */ + unsigned long max_seq; + /* the next address within an mm to scan */ + unsigned long next_addr; + /* to batch promoted pages */ + int nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]; + /* to batch the mm stats */ + int mm_stats[NR_MM_STATS]; + /* total batched items */ + int batched; + bool can_swap; + bool force_scan; +}; + void lru_gen_init_lruvec(struct lruvec *lruvec); void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); @@ -480,6 +527,8 @@ struct lruvec { #ifdef CONFIG_LRU_GEN /* evictable pages divided into generations */ struct lru_gen_struct lrugen; + /* to concurrently iterate lru_gen_mm_list */ + struct lru_gen_mm_state mm_state; #endif #ifdef CONFIG_MEMCG struct pglist_data *pgdat; @@ -1176,6 +1225,11 @@ typedef struct pglist_data { unsigned long flags; +#ifdef CONFIG_LRU_GEN + /* kswap mm walk data */ + struct lru_gen_mm_walk mm_walk; +#endif + ZONE_PADDING(_pad2_) /* Per-node vmstats */ diff --git a/include/linux/swap.h b/include/linux/swap.h index 43150b9bbc5c..6308150b234a 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -162,6 +162,10 @@ union swap_header { */ struct reclaim_state { unsigned long reclaimed_slab; +#ifdef CONFIG_LRU_GEN + /* per-thread mm walk data */ + struct lru_gen_mm_walk *mm_walk; +#endif }; #ifdef __KERNEL__ -- cgit From 354ed597442952fb680c9cafc7e4eb8a76f9514c Mon Sep 17 00:00:00 2001 From: Yu Zhao Date: Sun, 18 Sep 2022 02:00:07 -0600 Subject: mm: multi-gen LRU: kill switch MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add /sys/kernel/mm/lru_gen/enabled as a kill switch. Components that can be disabled include: 0x0001: the multi-gen LRU core 0x0002: walking page table, when arch_has_hw_pte_young() returns true 0x0004: clearing the accessed bit in non-leaf PMD entries, when CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG=y [yYnN]: apply to all the components above E.g., echo y >/sys/kernel/mm/lru_gen/enabled cat /sys/kernel/mm/lru_gen/enabled 0x0007 echo 5 >/sys/kernel/mm/lru_gen/enabled cat /sys/kernel/mm/lru_gen/enabled 0x0005 NB: the page table walks happen on the scale of seconds under heavy memory pressure, in which case the mmap_lock contention is a lesser concern, compared with the LRU lock contention and the I/O congestion. So far the only well-known case of the mmap_lock contention happens on Android, due to Scudo [1] which allocates several thousand VMAs for merely a few hundred MBs. The SPF and the Maple Tree also have provided their own assessments [2][3]. However, if walking page tables does worsen the mmap_lock contention, the kill switch can be used to disable it. In this case the multi-gen LRU will suffer a minor performance degradation, as shown previously. Clearing the accessed bit in non-leaf PMD entries can also be disabled, since this behavior was not tested on x86 varieties other than Intel and AMD. [1] https://source.android.com/devices/tech/debug/scudo [2] https://lore.kernel.org/r/20220128131006.67712-1-michel@lespinasse.org/ [3] https://lore.kernel.org/r/20220426150616.3937571-1-Liam.Howlett@oracle.com/ Link: https://lkml.kernel.org/r/20220918080010.2920238-11-yuzhao@google.com Signed-off-by: Yu Zhao Acked-by: Brian Geffon Acked-by: Jan Alexander Steffens (heftig) Acked-by: Oleksandr Natalenko Acked-by: Steven Barrett Acked-by: Suleiman Souhlal Tested-by: Daniel Byrne Tested-by: Donald Carr Tested-by: Holger Hoffstätte Tested-by: Konstantin Kharlamov Tested-by: Shuang Zhai Tested-by: Sofia Trinh Tested-by: Vaibhav Jain Cc: Andi Kleen Cc: Aneesh Kumar K.V Cc: Barry Song Cc: Catalin Marinas Cc: Dave Hansen Cc: Hillf Danton Cc: Jens Axboe Cc: Johannes Weiner Cc: Jonathan Corbet Cc: Linus Torvalds Cc: Matthew Wilcox Cc: Mel Gorman Cc: Miaohe Lin Cc: Michael Larabel Cc: Michal Hocko Cc: Mike Rapoport Cc: Mike Rapoport Cc: Peter Zijlstra Cc: Qi Zheng Cc: Tejun Heo Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/cgroup.h | 15 ++++++++++++++- include/linux/mm_inline.h | 15 +++++++++++++-- include/linux/mmzone.h | 9 +++++++++ 3 files changed, 36 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index ac5d0515680e..9179463c3c9f 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -432,6 +432,18 @@ static inline void cgroup_put(struct cgroup *cgrp) css_put(&cgrp->self); } +extern struct mutex cgroup_mutex; + +static inline void cgroup_lock(void) +{ + mutex_lock(&cgroup_mutex); +} + +static inline void cgroup_unlock(void) +{ + mutex_unlock(&cgroup_mutex); +} + /** * task_css_set_check - obtain a task's css_set with extra access conditions * @task: the task to obtain css_set for @@ -446,7 +458,6 @@ static inline void cgroup_put(struct cgroup *cgrp) * as locks used during the cgroup_subsys::attach() methods. */ #ifdef CONFIG_PROVE_RCU -extern struct mutex cgroup_mutex; extern spinlock_t css_set_lock; #define task_css_set_check(task, __c) \ rcu_dereference_check((task)->cgroups, \ @@ -708,6 +719,8 @@ struct cgroup; static inline u64 cgroup_id(const struct cgroup *cgrp) { return 1; } static inline void css_get(struct cgroup_subsys_state *css) {} static inline void css_put(struct cgroup_subsys_state *css) {} +static inline void cgroup_lock(void) {} +static inline void cgroup_unlock(void) {} static inline int cgroup_attach_task_all(struct task_struct *from, struct task_struct *t) { return 0; } static inline int cgroupstats_build(struct cgroupstats *stats, diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index f2b2296a42f9..4949eda9a9a2 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -106,10 +106,21 @@ static __always_inline enum lru_list folio_lru_list(struct folio *folio) #ifdef CONFIG_LRU_GEN +#ifdef CONFIG_LRU_GEN_ENABLED static inline bool lru_gen_enabled(void) { - return true; + DECLARE_STATIC_KEY_TRUE(lru_gen_caps[NR_LRU_GEN_CAPS]); + + return static_branch_likely(&lru_gen_caps[LRU_GEN_CORE]); +} +#else +static inline bool lru_gen_enabled(void) +{ + DECLARE_STATIC_KEY_FALSE(lru_gen_caps[NR_LRU_GEN_CAPS]); + + return static_branch_unlikely(&lru_gen_caps[LRU_GEN_CORE]); } +#endif static inline bool lru_gen_in_fault(void) { @@ -222,7 +233,7 @@ static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *folio, VM_WARN_ON_ONCE_FOLIO(gen != -1, folio); - if (folio_test_unevictable(folio)) + if (folio_test_unevictable(folio) || !lrugen->enabled) return false; /* * There are three common cases for this page: diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index b1635c4020dc..95c58c7fbdff 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -387,6 +387,13 @@ enum { LRU_GEN_FILE, }; +enum { + LRU_GEN_CORE, + LRU_GEN_MM_WALK, + LRU_GEN_NONLEAF_YOUNG, + NR_LRU_GEN_CAPS +}; + #define MIN_LRU_BATCH BITS_PER_LONG #define MAX_LRU_BATCH (MIN_LRU_BATCH * 64) @@ -428,6 +435,8 @@ struct lru_gen_struct { /* can be modified without holding the LRU lock */ atomic_long_t evicted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS]; atomic_long_t refaulted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS]; + /* whether the multi-gen LRU is enabled */ + bool enabled; }; enum { -- cgit From 1332a809d95a4fc763cabe5ecb6d4fb6a6d941b2 Mon Sep 17 00:00:00 2001 From: Yu Zhao Date: Sun, 18 Sep 2022 02:00:08 -0600 Subject: mm: multi-gen LRU: thrashing prevention MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add /sys/kernel/mm/lru_gen/min_ttl_ms for thrashing prevention, as requested by many desktop users [1]. When set to value N, it prevents the working set of N milliseconds from getting evicted. The OOM killer is triggered if this working set cannot be kept in memory. Based on the average human detectable lag (~100ms), N=1000 usually eliminates intolerable lags due to thrashing. Larger values like N=3000 make lags less noticeable at the risk of premature OOM kills. Compared with the size-based approach [2], this time-based approach has the following advantages: 1. It is easier to configure because it is agnostic to applications and memory sizes. 2. It is more reliable because it is directly wired to the OOM killer. [1] https://lore.kernel.org/r/Ydza%2FzXKY9ATRoh6@google.com/ [2] https://lore.kernel.org/r/20101028191523.GA14972@google.com/ Link: https://lkml.kernel.org/r/20220918080010.2920238-12-yuzhao@google.com Signed-off-by: Yu Zhao Acked-by: Brian Geffon Acked-by: Jan Alexander Steffens (heftig) Acked-by: Oleksandr Natalenko Acked-by: Steven Barrett Acked-by: Suleiman Souhlal Tested-by: Daniel Byrne Tested-by: Donald Carr Tested-by: Holger Hoffstätte Tested-by: Konstantin Kharlamov Tested-by: Shuang Zhai Tested-by: Sofia Trinh Tested-by: Vaibhav Jain Cc: Andi Kleen Cc: Aneesh Kumar K.V Cc: Barry Song Cc: Catalin Marinas Cc: Dave Hansen Cc: Hillf Danton Cc: Jens Axboe Cc: Johannes Weiner Cc: Jonathan Corbet Cc: Linus Torvalds Cc: Matthew Wilcox Cc: Mel Gorman Cc: Miaohe Lin Cc: Michael Larabel Cc: Michal Hocko Cc: Mike Rapoport Cc: Mike Rapoport Cc: Peter Zijlstra Cc: Qi Zheng Cc: Tejun Heo Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mmzone.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 95c58c7fbdff..87347945270b 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -422,6 +422,8 @@ struct lru_gen_struct { unsigned long max_seq; /* the eviction increments the oldest generation numbers */ unsigned long min_seq[ANON_AND_FILE]; + /* the birth time of each generation in jiffies */ + unsigned long timestamps[MAX_NR_GENS]; /* the multi-gen LRU lists, lazily sorted on eviction */ struct list_head lists[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES]; /* the multi-gen LRU sizes, eventually consistent */ -- cgit From d6c3af7d8a2ba5602c28841248c551a712ac50f5 Mon Sep 17 00:00:00 2001 From: Yu Zhao Date: Sun, 18 Sep 2022 02:00:09 -0600 Subject: mm: multi-gen LRU: debugfs interface MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add /sys/kernel/debug/lru_gen for working set estimation and proactive reclaim. These techniques are commonly used to optimize job scheduling (bin packing) in data centers [1][2]. Compared with the page table-based approach and the PFN-based approach, this lruvec-based approach has the following advantages: 1. It offers better choices because it is aware of memcgs, NUMA nodes, shared mappings and unmapped page cache. 2. It is more scalable because it is O(nr_hot_pages), whereas the PFN-based approach is O(nr_total_pages). Add /sys/kernel/debug/lru_gen_full for debugging. [1] https://dl.acm.org/doi/10.1145/3297858.3304053 [2] https://dl.acm.org/doi/10.1145/3503222.3507731 Link: https://lkml.kernel.org/r/20220918080010.2920238-13-yuzhao@google.com Signed-off-by: Yu Zhao Reviewed-by: Qi Zheng Acked-by: Brian Geffon Acked-by: Jan Alexander Steffens (heftig) Acked-by: Oleksandr Natalenko Acked-by: Steven Barrett Acked-by: Suleiman Souhlal Tested-by: Daniel Byrne Tested-by: Donald Carr Tested-by: Holger Hoffstätte Tested-by: Konstantin Kharlamov Tested-by: Shuang Zhai Tested-by: Sofia Trinh Tested-by: Vaibhav Jain Cc: Andi Kleen Cc: Aneesh Kumar K.V Cc: Barry Song Cc: Catalin Marinas Cc: Dave Hansen Cc: Hillf Danton Cc: Jens Axboe Cc: Johannes Weiner Cc: Jonathan Corbet Cc: Linus Torvalds Cc: Matthew Wilcox Cc: Mel Gorman Cc: Miaohe Lin Cc: Michael Larabel Cc: Michal Hocko Cc: Mike Rapoport Cc: Mike Rapoport Cc: Peter Zijlstra Cc: Tejun Heo Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/nodemask.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 4b71a96190a8..3a0eec9f2faa 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -493,6 +493,7 @@ static inline int num_node_state(enum node_states state) #define first_online_node 0 #define first_memory_node 0 #define next_online_node(nid) (MAX_NUMNODES) +#define next_memory_node(nid) (MAX_NUMNODES) #define nr_node_ids 1U #define nr_online_nodes 1U -- cgit From 992bf77591cb7e696fcc59aa7e64d1200b673513 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Thu, 18 Aug 2022 18:40:33 +0530 Subject: mm/demotion: add support for explicit memory tiers Patch series "mm/demotion: Memory tiers and demotion", v15. The current kernel has the basic memory tiering support: Inactive pages on a higher tier NUMA node can be migrated (demoted) to a lower tier NUMA node to make room for new allocations on the higher tier NUMA node. Frequently accessed pages on a lower tier NUMA node can be migrated (promoted) to a higher tier NUMA node to improve the performance. In the current kernel, memory tiers are defined implicitly via a demotion path relationship between NUMA nodes, which is created during the kernel initialization and updated when a NUMA node is hot-added or hot-removed. The current implementation puts all nodes with CPU into the highest tier, and builds the tier hierarchy tier-by-tier by establishing the per-node demotion targets based on the distances between nodes. This current memory tier kernel implementation needs to be improved for several important use cases: * The current tier initialization code always initializes each memory-only NUMA node into a lower tier. But a memory-only NUMA node may have a high performance memory device (e.g. a DRAM-backed memory-only node on a virtual machine) and that should be put into a higher tier. * The current tier hierarchy always puts CPU nodes into the top tier. But on a system with HBM (e.g. GPU memory) devices, these memory-only HBM NUMA nodes should be in the top tier, and DRAM nodes with CPUs are better to be placed into the next lower tier. * Also because the current tier hierarchy always puts CPU nodes into the top tier, when a CPU is hot-added (or hot-removed) and triggers a memory node from CPU-less into a CPU node (or vice versa), the memory tier hierarchy gets changed, even though no memory node is added or removed. This can make the tier hierarchy unstable and make it difficult to support tier-based memory accounting. * A higher tier node can only be demoted to nodes with shortest distance on the next lower tier as defined by the demotion path, not any other node from any lower tier. This strict, demotion order does not work in all use cases (e.g. some use cases may want to allow cross-socket demotion to another node in the same demotion tier as a fallback when the preferred demotion node is out of space), and has resulted in the feature request for an interface to override the system-wide, per-node demotion order from the userspace. This demotion order is also inconsistent with the page allocation fallback order when all the nodes in a higher tier are out of space: The page allocation can fall back to any node from any lower tier, whereas the demotion order doesn't allow that. This patch series make the creation of memory tiers explicit under the control of device driver. Memory Tier Initialization ========================== Linux kernel presents memory devices as NUMA nodes and each memory device is of a specific type. The memory type of a device is represented by its abstract distance. A memory tier corresponds to a range of abstract distance. This allows for classifying memory devices with a specific performance range into a memory tier. By default, all memory nodes are assigned to the default tier with abstract distance 512. A device driver can move its memory nodes from the default tier. For example, PMEM can move its memory nodes below the default tier, whereas GPU can move its memory nodes above the default tier. The kernel initialization code makes the decision on which exact tier a memory node should be assigned to based on the requests from the device drivers as well as the memory device hardware information provided by the firmware. Hot-adding/removing CPUs doesn't affect memory tier hierarchy. This patch (of 10): In the current kernel, memory tiers are defined implicitly via a demotion path relationship between NUMA nodes, which is created during the kernel initialization and updated when a NUMA node is hot-added or hot-removed. The current implementation puts all nodes with CPU into the highest tier, and builds the tier hierarchy by establishing the per-node demotion targets based on the distances between nodes. This current memory tier kernel implementation needs to be improved for several important use cases, The current tier initialization code always initializes each memory-only NUMA node into a lower tier. But a memory-only NUMA node may have a high performance memory device (e.g. a DRAM-backed memory-only node on a virtual machine) that should be put into a higher tier. The current tier hierarchy always puts CPU nodes into the top tier. But on a system with HBM or GPU devices, the memory-only NUMA nodes mapping these devices should be in the top tier, and DRAM nodes with CPUs are better to be placed into the next lower tier. With current kernel higher tier node can only be demoted to nodes with shortest distance on the next lower tier as defined by the demotion path, not any other node from any lower tier. This strict, demotion order does not work in all use cases (e.g. some use cases may want to allow cross-socket demotion to another node in the same demotion tier as a fallback when the preferred demotion node is out of space), This demotion order is also inconsistent with the page allocation fallback order when all the nodes in a higher tier are out of space: The page allocation can fall back to any node from any lower tier, whereas the demotion order doesn't allow that. This patch series address the above by defining memory tiers explicitly. Linux kernel presents memory devices as NUMA nodes and each memory device is of a specific type. The memory type of a device is represented by its abstract distance. A memory tier corresponds to a range of abstract distance. This allows for classifying memory devices with a specific performance range into a memory tier. This patch configures the range/chunk size to be 128. The default DRAM abstract distance is 512. We can have 4 memory tiers below the default DRAM with abstract distance range 0 - 127, 127 - 255, 256- 383, 384 - 511. Faster memory devices can be placed in these faster(higher) memory tiers. Slower memory devices like persistent memory will have abstract distance higher than the default DRAM level. [akpm@linux-foundation.org: fix comment, per Aneesh] Link: https://lkml.kernel.org/r/20220818131042.113280-1-aneesh.kumar@linux.ibm.com Link: https://lkml.kernel.org/r/20220818131042.113280-2-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Reviewed-by: "Huang, Ying" Acked-by: Wei Xu Cc: Alistair Popple Cc: Bharata B Rao Cc: Dan Williams Cc: Dave Hansen Cc: Davidlohr Bueso Cc: Hesham Almatary Cc: Johannes Weiner Cc: Jonathan Cameron Cc: Michal Hocko Cc: Tim Chen Cc: Yang Shi Cc: Jagdish Gediya Cc: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/memory-tiers.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) create mode 100644 include/linux/memory-tiers.h (limited to 'include/linux') diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h new file mode 100644 index 000000000000..ecada7bf4091 --- /dev/null +++ b/include/linux/memory-tiers.h @@ -0,0 +1,18 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_MEMORY_TIERS_H +#define _LINUX_MEMORY_TIERS_H + +/* + * Each tier cover a abstrace distance chunk size of 128 + */ +#define MEMTIER_CHUNK_BITS 7 +#define MEMTIER_CHUNK_SIZE (1 << MEMTIER_CHUNK_BITS) +/* + * Smaller abstract distance values imply faster (higher) memory tiers. Offset + * the DRAM adistance so that we can accommodate devices with a slightly lower + * adistance value (slightly faster) than default DRAM adistance to be part of + * the same memory tier. + */ +#define MEMTIER_ADISTANCE_DRAM ((4 * MEMTIER_CHUNK_SIZE) + (MEMTIER_CHUNK_SIZE >> 1)) + +#endif /* _LINUX_MEMORY_TIERS_H */ -- cgit From 9195244022788935eac0df16132394ffa5613542 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Thu, 18 Aug 2022 18:40:34 +0530 Subject: mm/demotion: move memory demotion related code This moves memory demotion related code to mm/memory-tiers.c. No functional change in this patch. Link: https://lkml.kernel.org/r/20220818131042.113280-3-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Reviewed-by: "Huang, Ying" Acked-by: Wei Xu Cc: Alistair Popple Cc: Bharata B Rao Cc: Dan Williams Cc: Dave Hansen Cc: Davidlohr Bueso Cc: Hesham Almatary Cc: Jagdish Gediya Cc: Johannes Weiner Cc: Jonathan Cameron Cc: Michal Hocko Cc: Tim Chen Cc: Yang Shi Cc: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/memory-tiers.h | 8 ++++++++ include/linux/migrate.h | 2 -- 2 files changed, 8 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index ecada7bf4091..5f24396da76c 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -15,4 +15,12 @@ */ #define MEMTIER_ADISTANCE_DRAM ((4 * MEMTIER_CHUNK_SIZE) + (MEMTIER_CHUNK_SIZE >> 1)) +#ifdef CONFIG_NUMA +#include +extern bool numa_demotion_enabled; + +#else + +#define numa_demotion_enabled false +#endif /* CONFIG_NUMA */ #endif /* _LINUX_MEMORY_TIERS_H */ diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 22c0a0cf5e0c..96f8c84413fe 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -103,7 +103,6 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping, #if defined(CONFIG_MIGRATION) && defined(CONFIG_NUMA) extern void set_migration_target_nodes(void); extern void migrate_on_reclaim_init(void); -extern bool numa_demotion_enabled; extern int next_demotion_node(int node); #else static inline void set_migration_target_nodes(void) {} @@ -112,7 +111,6 @@ static inline int next_demotion_node(int node) { return NUMA_NO_NODE; } -#define numa_demotion_enabled false #endif #ifdef CONFIG_COMPACTION -- cgit From c6123a19c9f040e597f55f856c679651c26b31d1 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Thu, 18 Aug 2022 18:40:35 +0530 Subject: mm/demotion: add hotplug callbacks to handle new numa node onlined If the new NUMA node onlined doesn't have a abstract distance assigned, the kernel adds the NUMA node to default memory tier. [aneesh.kumar@linux.ibm.com: fix kernel error with memory hotplug] Link: https://lkml.kernel.org/r/20220825092019.379069-1-aneesh.kumar@linux.ibm.com Link: https://lkml.kernel.org/r/20220818131042.113280-4-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Reviewed-by: "Huang, Ying" Acked-by: Wei Xu Cc: Alistair Popple Cc: Bharata B Rao Cc: Dan Williams Cc: Dave Hansen Cc: Davidlohr Bueso Cc: Hesham Almatary Cc: Jagdish Gediya Cc: Johannes Weiner Cc: Jonathan Cameron Cc: Michal Hocko Cc: Tim Chen Cc: Yang Shi Cc: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/memory-tiers.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index 5f24396da76c..7ae6308b2191 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -14,6 +14,7 @@ * the same memory tier. */ #define MEMTIER_ADISTANCE_DRAM ((4 * MEMTIER_CHUNK_SIZE) + (MEMTIER_CHUNK_SIZE >> 1)) +#define MEMTIER_HOTPLUG_PRIO 100 #ifdef CONFIG_NUMA #include -- cgit From 7b88bda3761b95856cf97822efe8281c8100067b Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Thu, 18 Aug 2022 18:40:36 +0530 Subject: mm/demotion/dax/kmem: set node's abstract distance to MEMTIER_DEFAULT_DAX_ADISTANCE By default, all nodes are assigned to the default memory tier which is the memory tier designated for nodes with DRAM Set dax kmem device node's tier to slower memory tier by assigning abstract distance to MEMTIER_DEFAULT_DAX_ADISTANCE. Low-level drivers like papr_scm or ACPI NFIT can initialize memory device type to a more accurate value based on device tree details or HMAT. If the kernel doesn't find the memory type initialized, a default slower memory type is assigned by the kmem driver. [aneesh.kumar@linux.ibm.com: assign correct memory type for multiple dax devices with the same node affinity] Link: https://lkml.kernel.org/r/20220826100224.542312-1-aneesh.kumar@linux.ibm.com Link: https://lkml.kernel.org/r/20220818131042.113280-5-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Reviewed-by: "Huang, Ying" Acked-by: Wei Xu Cc: Alistair Popple Cc: Bharata B Rao Cc: Dan Williams Cc: Dave Hansen Cc: Davidlohr Bueso Cc: Hesham Almatary Cc: Jagdish Gediya Cc: Johannes Weiner Cc: Jonathan Cameron Cc: Michal Hocko Cc: Tim Chen Cc: Yang Shi Cc: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/memory-tiers.h | 42 +++++++++++++++++++++++++++++++++++++++++- 1 file changed, 41 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index 7ae6308b2191..49d281866dca 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -2,6 +2,9 @@ #ifndef _LINUX_MEMORY_TIERS_H #define _LINUX_MEMORY_TIERS_H +#include +#include +#include /* * Each tier cover a abstrace distance chunk size of 128 */ @@ -16,12 +19,49 @@ #define MEMTIER_ADISTANCE_DRAM ((4 * MEMTIER_CHUNK_SIZE) + (MEMTIER_CHUNK_SIZE >> 1)) #define MEMTIER_HOTPLUG_PRIO 100 +struct memory_tier; +struct memory_dev_type { + /* list of memory types that are part of same tier as this type */ + struct list_head tier_sibiling; + /* abstract distance for this specific memory type */ + int adistance; + /* Nodes of same abstract distance */ + nodemask_t nodes; + struct kref kref; + struct memory_tier *memtier; +}; + #ifdef CONFIG_NUMA -#include extern bool numa_demotion_enabled; +struct memory_dev_type *alloc_memory_type(int adistance); +void destroy_memory_type(struct memory_dev_type *memtype); +void init_node_memory_type(int node, struct memory_dev_type *default_type); +void clear_node_memory_type(int node, struct memory_dev_type *memtype); #else #define numa_demotion_enabled false +/* + * CONFIG_NUMA implementation returns non NULL error. + */ +static inline struct memory_dev_type *alloc_memory_type(int adistance) +{ + return NULL; +} + +static inline void destroy_memory_type(struct memory_dev_type *memtype) +{ + +} + +static inline void init_node_memory_type(int node, struct memory_dev_type *default_type) +{ + +} + +static inline void clear_node_memory_type(int node, struct memory_dev_type *memtype) +{ + +} #endif /* CONFIG_NUMA */ #endif /* _LINUX_MEMORY_TIERS_H */ -- cgit From 6c542ab75714fe90dae292aeb3e91ac53f5ff599 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Thu, 18 Aug 2022 18:40:37 +0530 Subject: mm/demotion: build demotion targets based on explicit memory tiers This patch switch the demotion target building logic to use memory tiers instead of NUMA distance. All N_MEMORY NUMA nodes will be placed in the default memory tier and additional memory tiers will be added by drivers like dax kmem. This patch builds the demotion target for a NUMA node by looking at all memory tiers below the tier to which the NUMA node belongs. The closest node in the immediately following memory tier is used as a demotion target. Since we are now only building demotion target for N_MEMORY NUMA nodes the CPU hotplug calls are removed in this patch. Link: https://lkml.kernel.org/r/20220818131042.113280-6-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Reviewed-by: "Huang, Ying" Acked-by: Wei Xu Cc: Alistair Popple Cc: Bharata B Rao Cc: Dan Williams Cc: Dave Hansen Cc: Davidlohr Bueso Cc: Hesham Almatary Cc: Jagdish Gediya Cc: Johannes Weiner Cc: Jonathan Cameron Cc: Michal Hocko Cc: Tim Chen Cc: Yang Shi Cc: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/memory-tiers.h | 13 +++++++++++++ include/linux/migrate.h | 13 ------------- 2 files changed, 13 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index 49d281866dca..ce9c6bac6725 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -37,6 +37,14 @@ struct memory_dev_type *alloc_memory_type(int adistance); void destroy_memory_type(struct memory_dev_type *memtype); void init_node_memory_type(int node, struct memory_dev_type *default_type); void clear_node_memory_type(int node, struct memory_dev_type *memtype); +#ifdef CONFIG_MIGRATION +int next_demotion_node(int node); +#else +static inline int next_demotion_node(int node) +{ + return NUMA_NO_NODE; +} +#endif #else @@ -63,5 +71,10 @@ static inline void clear_node_memory_type(int node, struct memory_dev_type *memt { } + +static inline int next_demotion_node(int node) +{ + return NUMA_NO_NODE; +} #endif /* CONFIG_NUMA */ #endif /* _LINUX_MEMORY_TIERS_H */ diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 96f8c84413fe..704a04f5a074 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -100,19 +100,6 @@ static inline int migrate_huge_page_move_mapping(struct address_space *mapping, #endif /* CONFIG_MIGRATION */ -#if defined(CONFIG_MIGRATION) && defined(CONFIG_NUMA) -extern void set_migration_target_nodes(void); -extern void migrate_on_reclaim_init(void); -extern int next_demotion_node(int node); -#else -static inline void set_migration_target_nodes(void) {} -static inline void migrate_on_reclaim_init(void) {} -static inline int next_demotion_node(int node) -{ - return NUMA_NO_NODE; -} -#endif - #ifdef CONFIG_COMPACTION bool PageMovable(struct page *page); void __SetPageMovable(struct page *page, const struct movable_operations *ops); -- cgit From 7766cf7a7e7545ab434a16c6f9531b09efe14dc1 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Thu, 18 Aug 2022 18:40:38 +0530 Subject: mm/demotion: add pg_data_t member to track node memory tier details Also update different helpes to use NODE_DATA()->memtier. Since node specific memtier can change based on the reassignment of NUMA node to a different memory tiers, accessing NODE_DATA()->memtier needs to happen under an rcu read lock or memory_tier_lock. Link: https://lkml.kernel.org/r/20220818131042.113280-7-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Reviewed-by: "Huang, Ying" Acked-by: Wei Xu Cc: Alistair Popple Cc: Bharata B Rao Cc: Dan Williams Cc: Dave Hansen Cc: Davidlohr Bueso Cc: Hesham Almatary Cc: Jagdish Gediya Cc: Johannes Weiner Cc: Jonathan Cameron Cc: Michal Hocko Cc: Tim Chen Cc: Yang Shi Cc: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/mmzone.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 87347945270b..e335a492c2eb 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1246,6 +1246,9 @@ typedef struct pglist_data { /* Per-node vmstats */ struct per_cpu_nodestat __percpu *per_cpu_nodestats; atomic_long_t vm_stat[NR_VM_NODE_STAT_ITEMS]; +#ifdef CONFIG_NUMA + struct memory_tier __rcu *memtier; +#endif } pg_data_t; #define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages) -- cgit From b26ac6f3ba38fac83db2d72551e6d994d0e0516f Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Thu, 18 Aug 2022 18:40:39 +0530 Subject: mm/demotion: drop memtier from memtype Now that we track node-specific memtier in pg_data_t, we can drop memtier from memtype. Link: https://lkml.kernel.org/r/20220818131042.113280-8-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Reviewed-by: "Huang, Ying" Acked-by: Wei Xu Cc: Alistair Popple Cc: Bharata B Rao Cc: Dan Williams Cc: Dave Hansen Cc: Davidlohr Bueso Cc: Hesham Almatary Cc: Jagdish Gediya Cc: Johannes Weiner Cc: Jonathan Cameron Cc: Michal Hocko Cc: Tim Chen Cc: Yang Shi Cc: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/memory-tiers.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index ce9c6bac6725..7ca52ad2789f 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -28,7 +28,6 @@ struct memory_dev_type { /* Nodes of same abstract distance */ nodemask_t nodes; struct kref kref; - struct memory_tier *memtier; }; #ifdef CONFIG_NUMA -- cgit From 32008027289239100d8d2876f50b15d92bde1855 Mon Sep 17 00:00:00 2001 From: Jagdish Gediya Date: Thu, 18 Aug 2022 18:40:40 +0530 Subject: mm/demotion: demote pages according to allocation fallback order Currently, a higher tier node can only be demoted to selected nodes on the next lower tier as defined by the demotion path. This strict demotion order does not work in all use cases (e.g. some use cases may want to allow cross-socket demotion to another node in the same demotion tier as a fallback when the preferred demotion node is out of space). This demotion order is also inconsistent with the page allocation fallback order when all the nodes in a higher tier are out of space: The page allocation can fall back to any node from any lower tier, whereas the demotion order doesn't allow that currently. This patch adds support to get all the allowed demotion targets for a memory tier. demote_page_list() function is now modified to utilize this allowed node mask as the fallback allocation mask. Link: https://lkml.kernel.org/r/20220818131042.113280-9-aneesh.kumar@linux.ibm.com Signed-off-by: Jagdish Gediya Signed-off-by: Aneesh Kumar K.V Reviewed-by: "Huang, Ying" Acked-by: Wei Xu Cc: Alistair Popple Cc: Bharata B Rao Cc: Dan Williams Cc: Dave Hansen Cc: Davidlohr Bueso Cc: Hesham Almatary Cc: Johannes Weiner Cc: Jonathan Cameron Cc: Michal Hocko Cc: Tim Chen Cc: Yang Shi Cc: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/memory-tiers.h | 12 ++++++++++++ 1 file changed, 12 insertions(+) (limited to 'include/linux') diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index 7ca52ad2789f..42791554b9b9 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -5,6 +5,7 @@ #include #include #include +#include /* * Each tier cover a abstrace distance chunk size of 128 */ @@ -38,11 +39,17 @@ void init_node_memory_type(int node, struct memory_dev_type *default_type); void clear_node_memory_type(int node, struct memory_dev_type *memtype); #ifdef CONFIG_MIGRATION int next_demotion_node(int node); +void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets); #else static inline int next_demotion_node(int node) { return NUMA_NO_NODE; } + +static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets) +{ + *targets = NODE_MASK_NONE; +} #endif #else @@ -75,5 +82,10 @@ static inline int next_demotion_node(int node) { return NUMA_NO_NODE; } + +static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets) +{ + *targets = NODE_MASK_NONE; +} #endif /* CONFIG_NUMA */ #endif /* _LINUX_MEMORY_TIERS_H */ -- cgit From 467b171af881282fc627328e6c164f044a6df888 Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Thu, 18 Aug 2022 18:40:41 +0530 Subject: mm/demotion: update node_is_toptier to work with memory tiers With memory tier support we can have memory only NUMA nodes in the top tier from which we want to avoid promotion tracking NUMA faults. Update node_is_toptier to work with memory tiers. All NUMA nodes are by default top tier nodes. With lower(slower) memory tiers added we consider all memory tiers above a memory tier having CPU NUMA nodes as a top memory tier [sj@kernel.org: include missed header file, memory-tiers.h] Link: https://lkml.kernel.org/r/20220820190720.248704-1-sj@kernel.org [akpm@linux-foundation.org: mm/memory.c needs linux/memory-tiers.h] [aneesh.kumar@linux.ibm.com: make toptier_distance inclusive upper bound of toptiers] Link: https://lkml.kernel.org/r/20220830081457.118960-1-aneesh.kumar@linux.ibm.com Link: https://lkml.kernel.org/r/20220818131042.113280-10-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Reviewed-by: "Huang, Ying" Acked-by: Wei Xu Cc: Alistair Popple Cc: Bharata B Rao Cc: Dan Williams Cc: Dave Hansen Cc: Davidlohr Bueso Cc: Hesham Almatary Cc: Jagdish Gediya Cc: Johannes Weiner Cc: Jonathan Cameron Cc: Michal Hocko Cc: Tim Chen Cc: Yang Shi Cc: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/memory-tiers.h | 11 +++++++++++ include/linux/node.h | 5 ----- 2 files changed, 11 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index 42791554b9b9..965009aa01d7 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -40,6 +40,7 @@ void clear_node_memory_type(int node, struct memory_dev_type *memtype); #ifdef CONFIG_MIGRATION int next_demotion_node(int node); void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets); +bool node_is_toptier(int node); #else static inline int next_demotion_node(int node) { @@ -50,6 +51,11 @@ static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *target { *targets = NODE_MASK_NONE; } + +static inline bool node_is_toptier(int node) +{ + return true; +} #endif #else @@ -87,5 +93,10 @@ static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *target { *targets = NODE_MASK_NONE; } + +static inline bool node_is_toptier(int node) +{ + return true; +} #endif /* CONFIG_NUMA */ #endif /* _LINUX_MEMORY_TIERS_H */ diff --git a/include/linux/node.h b/include/linux/node.h index 40d641a8bfb0..9ec680dd607f 100644 --- a/include/linux/node.h +++ b/include/linux/node.h @@ -185,9 +185,4 @@ static inline void register_hugetlbfs_with_node(node_registration_func_t reg, #define to_node(device) container_of(device, struct node, dev) -static inline bool node_is_toptier(int node) -{ - return node_state(node, N_CPU); -} - #endif /* _LINUX_NODE_H_ */ -- cgit From 3e061d924fe9c7b487ca45935f92fcd414ce6f1a Mon Sep 17 00:00:00 2001 From: "Aneesh Kumar K.V" Date: Thu, 18 Aug 2022 18:40:42 +0530 Subject: lib/nodemask: optimize node_random for nodemask with single NUMA node The most common case for certain node_random usage (demotion nodemask) is with nodemask weight 1. We can avoid calling get_random_init() in that case and always return the only node set in the nodemask. A simple test as below before = rdtsc_ordered(); for (i= 0; i < 100; i++) { rand = node_random(&nmask); } after = rdtsc_ordered(); Without fix after - before : 16438 With fix after - before : 816 Link: https://lkml.kernel.org/r/20220818131042.113280-11-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V Reviewed-by: "Huang, Ying" Acked-by: Wei Xu Cc: Alistair Popple Cc: Bharata B Rao Cc: Dan Williams Cc: Dave Hansen Cc: Davidlohr Bueso Cc: Hesham Almatary Cc: Jagdish Gediya Cc: Johannes Weiner Cc: Jonathan Cameron Cc: Michal Hocko Cc: Tim Chen Cc: Yang Shi Cc: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/nodemask.h | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 3a0eec9f2faa..e66742db741c 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -505,12 +505,21 @@ static inline int num_node_state(enum node_states state) static inline int node_random(const nodemask_t *maskp) { #if defined(CONFIG_NUMA) && (MAX_NUMNODES > 1) - int w, bit = NUMA_NO_NODE; + int w, bit; w = nodes_weight(*maskp); - if (w) + switch (w) { + case 0: + bit = NUMA_NO_NODE; + break; + case 1: + bit = first_node(*maskp); + break; + default: bit = bitmap_ord_to_pos(maskp->bits, - get_random_int() % w, MAX_NUMNODES); + get_random_int() % w, MAX_NUMNODES); + break; + } return bit; #else return 0; -- cgit From 54a611b605901c7d5d05b6b8f5d04a6ceb0962aa Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Tue, 6 Sep 2022 19:48:39 +0000 Subject: Maple Tree: add new data structure Patch series "Introducing the Maple Tree" The maple tree is an RCU-safe range based B-tree designed to use modern processor cache efficiently. There are a number of places in the kernel that a non-overlapping range-based tree would be beneficial, especially one with a simple interface. If you use an rbtree with other data structures to improve performance or an interval tree to track non-overlapping ranges, then this is for you. The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf nodes. With the increased branching factor, it is significantly shorter than the rbtree so it has fewer cache misses. The removal of the linked list between subsequent entries also reduces the cache misses and the need to pull in the previous and next VMA during many tree alterations. The first user that is covered in this patch set is the vm_area_struct, where three data structures are replaced by the maple tree: the augmented rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The long term goal is to reduce or remove the mmap_lock contention. The plan is to get to the point where we use the maple tree in RCU mode. Readers will not block for writers. A single write operation will be allowed at a time. A reader re-walks if stale data is encountered. VMAs would be RCU enabled and this mode would be entered once multiple tasks are using the mm_struct. Davidlor said : Yes I like the maple tree, and at this stage I don't think we can ask for : more from this series wrt the MM - albeit there seems to still be some : folks reporting breakage. Fundamentally I see Liam's work to (re)move : complexity out of the MM (not to say that the actual maple tree is not : complex) by consolidating the three complimentary data structures very : much worth it considering performance does not take a hit. This was very : much a turn off with the range locking approach, which worst case scenario : incurred in prohibitive overhead. Also as Liam and Matthew have : mentioned, RCU opens up a lot of nice performance opportunities, and in : addition academia[1] has shown outstanding scalability of address spaces : with the foundation of replacing the locked rbtree with RCU aware trees. A similar work has been discovered in the academic press https://pdos.csail.mit.edu/papers/rcuvm:asplos12.pdf Sheer coincidence. We designed our tree with the intention of solving the hardest problem first. Upon settling on a b-tree variant and a rough outline, we researched ranged based b-trees and RCU b-trees and did find that article. So it was nice to find reassurances that we were on the right path, but our design choice of using ranges made that paper unusable for us. This patch (of 70): The maple tree is an RCU-safe range based B-tree designed to use modern processor cache efficiently. There are a number of places in the kernel that a non-overlapping range-based tree would be beneficial, especially one with a simple interface. If you use an rbtree with other data structures to improve performance or an interval tree to track non-overlapping ranges, then this is for you. The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf nodes. With the increased branching factor, it is significantly shorter than the rbtree so it has fewer cache misses. The removal of the linked list between subsequent entries also reduces the cache misses and the need to pull in the previous and next VMA during many tree alterations. The first user that is covered in this patch set is the vm_area_struct, where three data structures are replaced by the maple tree: the augmented rbtree, the vma cache, and the linked list of VMAs in the mm_struct. The long term goal is to reduce or remove the mmap_lock contention. The plan is to get to the point where we use the maple tree in RCU mode. Readers will not block for writers. A single write operation will be allowed at a time. A reader re-walks if stale data is encountered. VMAs would be RCU enabled and this mode would be entered once multiple tasks are using the mm_struct. There is additional BUG_ON() calls added within the tree, most of which are in debug code. These will be replaced with a WARN_ON() call in the future. There is also additional BUG_ON() calls within the code which will also be reduced in number at a later date. These exist to catch things such as out-of-range accesses which would crash anyways. Link: https://lkml.kernel.org/r/20220906194824.2110408-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20220906194824.2110408-2-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Signed-off-by: Matthew Wilcox (Oracle) Tested-by: David Howells Tested-by: Sven Schnelle Tested-by: Yu Zhao Cc: Vlastimil Babka Cc: David Hildenbrand Cc: Davidlohr Bueso Cc: Catalin Marinas Cc: SeongJae Park Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/maple_tree.h | 685 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 685 insertions(+) create mode 100644 include/linux/maple_tree.h (limited to 'include/linux') diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h new file mode 100644 index 000000000000..2effab72add1 --- /dev/null +++ b/include/linux/maple_tree.h @@ -0,0 +1,685 @@ +/* SPDX-License-Identifier: GPL-2.0+ */ +#ifndef _LINUX_MAPLE_TREE_H +#define _LINUX_MAPLE_TREE_H +/* + * Maple Tree - An RCU-safe adaptive tree for storing ranges + * Copyright (c) 2018-2022 Oracle + * Authors: Liam R. Howlett + * Matthew Wilcox + */ + +#include +#include +#include +/* #define CONFIG_MAPLE_RCU_DISABLED */ +/* #define CONFIG_DEBUG_MAPLE_TREE_VERBOSE */ + +/* + * Allocated nodes are mutable until they have been inserted into the tree, + * at which time they cannot change their type until they have been removed + * from the tree and an RCU grace period has passed. + * + * Removed nodes have their ->parent set to point to themselves. RCU readers + * check ->parent before relying on the value that they loaded from the + * slots array. This lets us reuse the slots array for the RCU head. + * + * Nodes in the tree point to their parent unless bit 0 is set. + */ +#if defined(CONFIG_64BIT) || defined(BUILD_VDSO32_64) +/* 64bit sizes */ +#define MAPLE_NODE_SLOTS 31 /* 256 bytes including ->parent */ +#define MAPLE_RANGE64_SLOTS 16 /* 256 bytes */ +#define MAPLE_ARANGE64_SLOTS 10 /* 240 bytes */ +#define MAPLE_ARANGE64_META_MAX 15 /* Out of range for metadata */ +#define MAPLE_ALLOC_SLOTS (MAPLE_NODE_SLOTS - 1) +#else +/* 32bit sizes */ +#define MAPLE_NODE_SLOTS 63 /* 256 bytes including ->parent */ +#define MAPLE_RANGE64_SLOTS 32 /* 256 bytes */ +#define MAPLE_ARANGE64_SLOTS 21 /* 240 bytes */ +#define MAPLE_ARANGE64_META_MAX 31 /* Out of range for metadata */ +#define MAPLE_ALLOC_SLOTS (MAPLE_NODE_SLOTS - 2) +#endif /* defined(CONFIG_64BIT) || defined(BUILD_VDSO32_64) */ + +#define MAPLE_NODE_MASK 255UL + +/* + * The node->parent of the root node has bit 0 set and the rest of the pointer + * is a pointer to the tree itself. No more bits are available in this pointer + * (on m68k, the data structure may only be 2-byte aligned). + * + * Internal non-root nodes can only have maple_range_* nodes as parents. The + * parent pointer is 256B aligned like all other tree nodes. When storing a 32 + * or 64 bit values, the offset can fit into 4 bits. The 16 bit values need an + * extra bit to store the offset. This extra bit comes from a reuse of the last + * bit in the node type. This is possible by using bit 1 to indicate if bit 2 + * is part of the type or the slot. + * + * Once the type is decided, the decision of an allocation range type or a range + * type is done by examining the immutable tree flag for the MAPLE_ALLOC_RANGE + * flag. + * + * Node types: + * 0x??1 = Root + * 0x?00 = 16 bit nodes + * 0x010 = 32 bit nodes + * 0x110 = 64 bit nodes + * + * Slot size and location in the parent pointer: + * type : slot location + * 0x??1 : Root + * 0x?00 : 16 bit values, type in 0-1, slot in 2-6 + * 0x010 : 32 bit values, type in 0-2, slot in 3-6 + * 0x110 : 64 bit values, type in 0-2, slot in 3-6 + */ + +/* + * This metadata is used to optimize the gap updating code and in reverse + * searching for gaps or any other code that needs to find the end of the data. + */ +struct maple_metadata { + unsigned char end; + unsigned char gap; +}; + +/* + * Leaf nodes do not store pointers to nodes, they store user data. Users may + * store almost any bit pattern. As noted above, the optimisation of storing an + * entry at 0 in the root pointer cannot be done for data which have the bottom + * two bits set to '10'. We also reserve values with the bottom two bits set to + * '10' which are below 4096 (ie 2, 6, 10 .. 4094) for internal use. Some APIs + * return errnos as a negative errno shifted right by two bits and the bottom + * two bits set to '10', and while choosing to store these values in the array + * is not an error, it may lead to confusion if you're testing for an error with + * mas_is_err(). + * + * Non-leaf nodes store the type of the node pointed to (enum maple_type in bits + * 3-6), bit 2 is reserved. That leaves bits 0-1 unused for now. + * + * In regular B-Tree terms, pivots are called keys. The term pivot is used to + * indicate that the tree is specifying ranges, Pivots may appear in the + * subtree with an entry attached to the value whereas keys are unique to a + * specific position of a B-tree. Pivot values are inclusive of the slot with + * the same index. + */ + +struct maple_range_64 { + struct maple_pnode *parent; + unsigned long pivot[MAPLE_RANGE64_SLOTS - 1]; + union { + void __rcu *slot[MAPLE_RANGE64_SLOTS]; + struct { + void __rcu *pad[MAPLE_RANGE64_SLOTS - 1]; + struct maple_metadata meta; + }; + }; +}; + +/* + * At tree creation time, the user can specify that they're willing to trade off + * storing fewer entries in a tree in return for storing more information in + * each node. + * + * The maple tree supports recording the largest range of NULL entries available + * in this node, also called gaps. This optimises the tree for allocating a + * range. + */ +struct maple_arange_64 { + struct maple_pnode *parent; + unsigned long pivot[MAPLE_ARANGE64_SLOTS - 1]; + void __rcu *slot[MAPLE_ARANGE64_SLOTS]; + unsigned long gap[MAPLE_ARANGE64_SLOTS]; + struct maple_metadata meta; +}; + +struct maple_alloc { + unsigned long total; + unsigned char node_count; + unsigned int request_count; + struct maple_alloc *slot[MAPLE_ALLOC_SLOTS]; +}; + +struct maple_topiary { + struct maple_pnode *parent; + struct maple_enode *next; /* Overlaps the pivot */ +}; + +enum maple_type { + maple_dense, + maple_leaf_64, + maple_range_64, + maple_arange_64, +}; + + +/** + * DOC: Maple tree flags + * + * * MT_FLAGS_ALLOC_RANGE - Track gaps in this tree + * * MT_FLAGS_USE_RCU - Operate in RCU mode + * * MT_FLAGS_HEIGHT_OFFSET - The position of the tree height in the flags + * * MT_FLAGS_HEIGHT_MASK - The mask for the maple tree height value + * * MT_FLAGS_LOCK_MASK - How the mt_lock is used + * * MT_FLAGS_LOCK_IRQ - Acquired irq-safe + * * MT_FLAGS_LOCK_BH - Acquired bh-safe + * * MT_FLAGS_LOCK_EXTERN - mt_lock is not used + * + * MAPLE_HEIGHT_MAX The largest height that can be stored + */ +#define MT_FLAGS_ALLOC_RANGE 0x01 +#define MT_FLAGS_USE_RCU 0x02 +#define MT_FLAGS_HEIGHT_OFFSET 0x02 +#define MT_FLAGS_HEIGHT_MASK 0x7C +#define MT_FLAGS_LOCK_MASK 0x300 +#define MT_FLAGS_LOCK_IRQ 0x100 +#define MT_FLAGS_LOCK_BH 0x200 +#define MT_FLAGS_LOCK_EXTERN 0x300 + +#define MAPLE_HEIGHT_MAX 31 + + +#define MAPLE_NODE_TYPE_MASK 0x0F +#define MAPLE_NODE_TYPE_SHIFT 0x03 + +#define MAPLE_RESERVED_RANGE 4096 + +#ifdef CONFIG_LOCKDEP +typedef struct lockdep_map *lockdep_map_p; +#define mt_lock_is_held(mt) lock_is_held(mt->ma_external_lock) +#define mt_set_external_lock(mt, lock) \ + (mt)->ma_external_lock = &(lock)->dep_map +#else +typedef struct { /* nothing */ } lockdep_map_p; +#define mt_lock_is_held(mt) 1 +#define mt_set_external_lock(mt, lock) do { } while (0) +#endif + +/* + * If the tree contains a single entry at index 0, it is usually stored in + * tree->ma_root. To optimise for the page cache, an entry which ends in '00', + * '01' or '11' is stored in the root, but an entry which ends in '10' will be + * stored in a node. Bits 3-6 are used to store enum maple_type. + * + * The flags are used both to store some immutable information about this tree + * (set at tree creation time) and dynamic information set under the spinlock. + * + * Another use of flags are to indicate global states of the tree. This is the + * case with the MAPLE_USE_RCU flag, which indicates the tree is currently in + * RCU mode. This mode was added to allow the tree to reuse nodes instead of + * re-allocating and RCU freeing nodes when there is a single user. + */ +struct maple_tree { + union { + spinlock_t ma_lock; + lockdep_map_p ma_external_lock; + }; + void __rcu *ma_root; + unsigned int ma_flags; +}; + +/** + * MTREE_INIT() - Initialize a maple tree + * @name: The maple tree name + * @__flags: The maple tree flags + * + */ +#define MTREE_INIT(name, __flags) { \ + .ma_lock = __SPIN_LOCK_UNLOCKED((name).ma_lock), \ + .ma_flags = __flags, \ + .ma_root = NULL, \ +} + +/** + * MTREE_INIT_EXT() - Initialize a maple tree with an external lock. + * @name: The tree name + * @__flags: The maple tree flags + * @__lock: The external lock + */ +#ifdef CONFIG_LOCKDEP +#define MTREE_INIT_EXT(name, __flags, __lock) { \ + .ma_external_lock = &(__lock).dep_map, \ + .ma_flags = (__flags), \ + .ma_root = NULL, \ +} +#else +#define MTREE_INIT_EXT(name, __flags, __lock) MTREE_INIT(name, __flags) +#endif + +#define DEFINE_MTREE(name) \ + struct maple_tree name = MTREE_INIT(name, 0) + +#define mtree_lock(mt) spin_lock((&(mt)->ma_lock)) +#define mtree_unlock(mt) spin_unlock((&(mt)->ma_lock)) + +/* + * The Maple Tree squeezes various bits in at various points which aren't + * necessarily obvious. Usually, this is done by observing that pointers are + * N-byte aligned and thus the bottom log_2(N) bits are available for use. We + * don't use the high bits of pointers to store additional information because + * we don't know what bits are unused on any given architecture. + * + * Nodes are 256 bytes in size and are also aligned to 256 bytes, giving us 8 + * low bits for our own purposes. Nodes are currently of 4 types: + * 1. Single pointer (Range is 0-0) + * 2. Non-leaf Allocation Range nodes + * 3. Non-leaf Range nodes + * 4. Leaf Range nodes All nodes consist of a number of node slots, + * pivots, and a parent pointer. + */ + +struct maple_node { + union { + struct { + struct maple_pnode *parent; + void __rcu *slot[MAPLE_NODE_SLOTS]; + }; + struct { + void *pad; + struct rcu_head rcu; + struct maple_enode *piv_parent; + unsigned char parent_slot; + enum maple_type type; + unsigned char slot_len; + unsigned int ma_flags; + }; + struct maple_range_64 mr64; + struct maple_arange_64 ma64; + struct maple_alloc alloc; + }; +}; + +/* + * More complicated stores can cause two nodes to become one or three and + * potentially alter the height of the tree. Either half of the tree may need + * to be rebalanced against the other. The ma_topiary struct is used to track + * which nodes have been 'cut' from the tree so that the change can be done + * safely at a later date. This is done to support RCU. + */ +struct ma_topiary { + struct maple_enode *head; + struct maple_enode *tail; + struct maple_tree *mtree; +}; + +void *mtree_load(struct maple_tree *mt, unsigned long index); + +int mtree_insert(struct maple_tree *mt, unsigned long index, + void *entry, gfp_t gfp); +int mtree_insert_range(struct maple_tree *mt, unsigned long first, + unsigned long last, void *entry, gfp_t gfp); +int mtree_alloc_range(struct maple_tree *mt, unsigned long *startp, + void *entry, unsigned long size, unsigned long min, + unsigned long max, gfp_t gfp); +int mtree_alloc_rrange(struct maple_tree *mt, unsigned long *startp, + void *entry, unsigned long size, unsigned long min, + unsigned long max, gfp_t gfp); + +int mtree_store_range(struct maple_tree *mt, unsigned long first, + unsigned long last, void *entry, gfp_t gfp); +int mtree_store(struct maple_tree *mt, unsigned long index, + void *entry, gfp_t gfp); +void *mtree_erase(struct maple_tree *mt, unsigned long index); + +void mtree_destroy(struct maple_tree *mt); +void __mt_destroy(struct maple_tree *mt); + +/** + * mtree_empty() - Determine if a tree has any present entries. + * @mt: Maple Tree. + * + * Context: Any context. + * Return: %true if the tree contains only NULL pointers. + */ +static inline bool mtree_empty(const struct maple_tree *mt) +{ + return mt->ma_root == NULL; +} + +/* Advanced API */ + +/* + * The maple state is defined in the struct ma_state and is used to keep track + * of information during operations, and even between operations when using the + * advanced API. + * + * If state->node has bit 0 set then it references a tree location which is not + * a node (eg the root). If bit 1 is set, the rest of the bits are a negative + * errno. Bit 2 (the 'unallocated slots' bit) is clear. Bits 3-6 indicate the + * node type. + * + * state->alloc either has a request number of nodes or an allocated node. If + * stat->alloc has a requested number of nodes, the first bit will be set (0x1) + * and the remaining bits are the value. If state->alloc is a node, then the + * node will be of type maple_alloc. maple_alloc has MAPLE_NODE_SLOTS - 1 for + * storing more allocated nodes, a total number of nodes allocated, and the + * node_count in this node. node_count is the number of allocated nodes in this + * node. The scaling beyond MAPLE_NODE_SLOTS - 1 is handled by storing further + * nodes into state->alloc->slot[0]'s node. Nodes are taken from state->alloc + * by removing a node from the state->alloc node until state->alloc->node_count + * is 1, when state->alloc is returned and the state->alloc->slot[0] is promoted + * to state->alloc. Nodes are pushed onto state->alloc by putting the current + * state->alloc into the pushed node's slot[0]. + * + * The state also contains the implied min/max of the state->node, the depth of + * this search, and the offset. The implied min/max are either from the parent + * node or are 0-oo for the root node. The depth is incremented or decremented + * every time a node is walked down or up. The offset is the slot/pivot of + * interest in the node - either for reading or writing. + * + * When returning a value the maple state index and last respectively contain + * the start and end of the range for the entry. Ranges are inclusive in the + * Maple Tree. + */ +struct ma_state { + struct maple_tree *tree; /* The tree we're operating in */ + unsigned long index; /* The index we're operating on - range start */ + unsigned long last; /* The last index we're operating on - range end */ + struct maple_enode *node; /* The node containing this entry */ + unsigned long min; /* The minimum index of this node - implied pivot min */ + unsigned long max; /* The maximum index of this node - implied pivot max */ + struct maple_alloc *alloc; /* Allocated nodes for this operation */ + unsigned char depth; /* depth of tree descent during write */ + unsigned char offset; + unsigned char mas_flags; +}; + +struct ma_wr_state { + struct ma_state *mas; + struct maple_node *node; /* Decoded mas->node */ + unsigned long r_min; /* range min */ + unsigned long r_max; /* range max */ + enum maple_type type; /* mas->node type */ + unsigned char offset_end; /* The offset where the write ends */ + unsigned char node_end; /* mas->node end */ + unsigned long *pivots; /* mas->node->pivots pointer */ + unsigned long end_piv; /* The pivot at the offset end */ + void __rcu **slots; /* mas->node->slots pointer */ + void *entry; /* The entry to write */ + void *content; /* The existing entry that is being overwritten */ +}; + +#define mas_lock(mas) spin_lock(&((mas)->tree->ma_lock)) +#define mas_unlock(mas) spin_unlock(&((mas)->tree->ma_lock)) + + +/* + * Special values for ma_state.node. + * MAS_START means we have not searched the tree. + * MAS_ROOT means we have searched the tree and the entry we found lives in + * the root of the tree (ie it has index 0, length 1 and is the only entry in + * the tree). + * MAS_NONE means we have searched the tree and there is no node in the + * tree for this entry. For example, we searched for index 1 in an empty + * tree. Or we have a tree which points to a full leaf node and we + * searched for an entry which is larger than can be contained in that + * leaf node. + * MA_ERROR represents an errno. After dropping the lock and attempting + * to resolve the error, the walk would have to be restarted from the + * top of the tree as the tree may have been modified. + */ +#define MAS_START ((struct maple_enode *)1UL) +#define MAS_ROOT ((struct maple_enode *)5UL) +#define MAS_NONE ((struct maple_enode *)9UL) +#define MAS_PAUSE ((struct maple_enode *)17UL) +#define MA_ERROR(err) \ + ((struct maple_enode *)(((unsigned long)err << 2) | 2UL)) + +#define MA_STATE(name, mt, first, end) \ + struct ma_state name = { \ + .tree = mt, \ + .index = first, \ + .last = end, \ + .node = MAS_START, \ + .min = 0, \ + .max = ULONG_MAX, \ + .alloc = NULL, \ + } + +#define MA_WR_STATE(name, ma_state, wr_entry) \ + struct ma_wr_state name = { \ + .mas = ma_state, \ + .content = NULL, \ + .entry = wr_entry, \ + } + +#define MA_TOPIARY(name, tree) \ + struct ma_topiary name = { \ + .head = NULL, \ + .tail = NULL, \ + .mtree = tree, \ + } + +void *mas_walk(struct ma_state *mas); +void *mas_store(struct ma_state *mas, void *entry); +void *mas_erase(struct ma_state *mas); +int mas_store_gfp(struct ma_state *mas, void *entry, gfp_t gfp); +void mas_store_prealloc(struct ma_state *mas, void *entry); +void *mas_find(struct ma_state *mas, unsigned long max); +void *mas_find_rev(struct ma_state *mas, unsigned long min); +int mas_preallocate(struct ma_state *mas, void *entry, gfp_t gfp); +bool mas_is_err(struct ma_state *mas); + +bool mas_nomem(struct ma_state *mas, gfp_t gfp); +void mas_pause(struct ma_state *mas); +void maple_tree_init(void); +void mas_destroy(struct ma_state *mas); +int mas_expected_entries(struct ma_state *mas, unsigned long nr_entries); + +void *mas_prev(struct ma_state *mas, unsigned long min); +void *mas_next(struct ma_state *mas, unsigned long max); + +int mas_empty_area(struct ma_state *mas, unsigned long min, unsigned long max, + unsigned long size); + +/* Checks if a mas has not found anything */ +static inline bool mas_is_none(struct ma_state *mas) +{ + return mas->node == MAS_NONE; +} + +/* Checks if a mas has been paused */ +static inline bool mas_is_paused(struct ma_state *mas) +{ + return mas->node == MAS_PAUSE; +} + +void mas_dup_tree(struct ma_state *oldmas, struct ma_state *mas); +void mas_dup_store(struct ma_state *mas, void *entry); + +/* + * This finds an empty area from the highest address to the lowest. + * AKA "Topdown" version, + */ +int mas_empty_area_rev(struct ma_state *mas, unsigned long min, + unsigned long max, unsigned long size); +/** + * mas_reset() - Reset a Maple Tree operation state. + * @mas: Maple Tree operation state. + * + * Resets the error or walk state of the @mas so future walks of the + * array will start from the root. Use this if you have dropped the + * lock and want to reuse the ma_state. + * + * Context: Any context. + */ +static inline void mas_reset(struct ma_state *mas) +{ + mas->node = MAS_START; +} + +/** + * mas_for_each() - Iterate over a range of the maple tree. + * @__mas: Maple Tree operation state (maple_state) + * @__entry: Entry retrieved from the tree + * @__max: maximum index to retrieve from the tree + * + * When returned, mas->index and mas->last will hold the entire range for the + * entry. + * + * Note: may return the zero entry. + * + */ +#define mas_for_each(__mas, __entry, __max) \ + while (((__entry) = mas_find((__mas), (__max))) != NULL) + + +/** + * mas_set_range() - Set up Maple Tree operation state for a different index. + * @mas: Maple Tree operation state. + * @start: New start of range in the Maple Tree. + * @last: New end of range in the Maple Tree. + * + * Move the operation state to refer to a different range. This will + * have the effect of starting a walk from the top; see mas_next() + * to move to an adjacent index. + */ +static inline +void mas_set_range(struct ma_state *mas, unsigned long start, unsigned long last) +{ + mas->index = start; + mas->last = last; + mas->node = MAS_START; +} + +/** + * mas_set() - Set up Maple Tree operation state for a different index. + * @mas: Maple Tree operation state. + * @index: New index into the Maple Tree. + * + * Move the operation state to refer to a different index. This will + * have the effect of starting a walk from the top; see mas_next() + * to move to an adjacent index. + */ +static inline void mas_set(struct ma_state *mas, unsigned long index) +{ + + mas_set_range(mas, index, index); +} + +static inline bool mt_external_lock(const struct maple_tree *mt) +{ + return (mt->ma_flags & MT_FLAGS_LOCK_MASK) == MT_FLAGS_LOCK_EXTERN; +} + +/** + * mt_init_flags() - Initialise an empty maple tree with flags. + * @mt: Maple Tree + * @flags: maple tree flags. + * + * If you need to initialise a Maple Tree with special flags (eg, an + * allocation tree), use this function. + * + * Context: Any context. + */ +static inline void mt_init_flags(struct maple_tree *mt, unsigned int flags) +{ + mt->ma_flags = flags; + if (!mt_external_lock(mt)) + spin_lock_init(&mt->ma_lock); + rcu_assign_pointer(mt->ma_root, NULL); +} + +/** + * mt_init() - Initialise an empty maple tree. + * @mt: Maple Tree + * + * An empty Maple Tree. + * + * Context: Any context. + */ +static inline void mt_init(struct maple_tree *mt) +{ + mt_init_flags(mt, 0); +} + +static inline bool mt_in_rcu(struct maple_tree *mt) +{ +#ifdef CONFIG_MAPLE_RCU_DISABLED + return false; +#endif + return mt->ma_flags & MT_FLAGS_USE_RCU; +} + +/** + * mt_clear_in_rcu() - Switch the tree to non-RCU mode. + * @mt: The Maple Tree + */ +static inline void mt_clear_in_rcu(struct maple_tree *mt) +{ + if (!mt_in_rcu(mt)) + return; + + if (mt_external_lock(mt)) { + BUG_ON(!mt_lock_is_held(mt)); + mt->ma_flags &= ~MT_FLAGS_USE_RCU; + } else { + mtree_lock(mt); + mt->ma_flags &= ~MT_FLAGS_USE_RCU; + mtree_unlock(mt); + } +} + +/** + * mt_set_in_rcu() - Switch the tree to RCU safe mode. + * @mt: The Maple Tree + */ +static inline void mt_set_in_rcu(struct maple_tree *mt) +{ + if (mt_in_rcu(mt)) + return; + + if (mt_external_lock(mt)) { + BUG_ON(!mt_lock_is_held(mt)); + mt->ma_flags |= MT_FLAGS_USE_RCU; + } else { + mtree_lock(mt); + mt->ma_flags |= MT_FLAGS_USE_RCU; + mtree_unlock(mt); + } +} + +void *mt_find(struct maple_tree *mt, unsigned long *index, unsigned long max); +void *mt_find_after(struct maple_tree *mt, unsigned long *index, + unsigned long max); +void *mt_prev(struct maple_tree *mt, unsigned long index, unsigned long min); +void *mt_next(struct maple_tree *mt, unsigned long index, unsigned long max); + +/** + * mt_for_each - Iterate over each entry starting at index until max. + * @__tree: The Maple Tree + * @__entry: The current entry + * @__index: The index to update to track the location in the tree + * @__max: The maximum limit for @index + * + * Note: Will not return the zero entry. + */ +#define mt_for_each(__tree, __entry, __index, __max) \ + for (__entry = mt_find(__tree, &(__index), __max); \ + __entry; __entry = mt_find_after(__tree, &(__index), __max)) + + +#ifdef CONFIG_DEBUG_MAPLE_TREE +extern atomic_t maple_tree_tests_run; +extern atomic_t maple_tree_tests_passed; + +void mt_dump(const struct maple_tree *mt); +void mt_validate(struct maple_tree *mt); +#define MT_BUG_ON(__tree, __x) do { \ + atomic_inc(&maple_tree_tests_run); \ + if (__x) { \ + pr_info("BUG at %s:%d (%u)\n", \ + __func__, __LINE__, __x); \ + mt_dump(__tree); \ + pr_info("Pass: %u Run:%u\n", \ + atomic_read(&maple_tree_tests_passed), \ + atomic_read(&maple_tree_tests_run)); \ + dump_stack(); \ + } else { \ + atomic_inc(&maple_tree_tests_passed); \ + } \ +} while (0) +#else +#define MT_BUG_ON(__tree, __x) BUG_ON(__x) +#endif /* CONFIG_DEBUG_MAPLE_TREE */ + +#endif /*_LINUX_MAPLE_TREE_H */ -- cgit From d4af56c5c7c6781ca6ca8075e2cf5bc119ed33d1 Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Tue, 6 Sep 2022 19:48:45 +0000 Subject: mm: start tracking VMAs with maple tree Start tracking the VMAs with the new maple tree structure in parallel with the rb_tree. Add debug and trace events for maple tree operations and duplicate the rb_tree that is created on forks into the maple tree. The maple tree is added to the mm_struct including the mm_init struct, added support in required mm/mmap functions, added tracking in kernel/fork for process forking, and used to find the unmapped_area and checked against what the rbtree finds. This also moves the mmap_lock() in exit_mmap() since the oom reaper call does walk the VMAs. Otherwise lockdep will be unhappy if oom happens. When splitting a vma fails due to allocations of the maple tree nodes, the error path in __split_vma() calls new->vm_ops->close(new). The page accounting for hugetlb is actually in the close() operation, so it accounts for the removal of 1/2 of the VMA which was not adjusted. This results in a negative exit value. To avoid the negative charge, set vm_start = vm_end and vm_pgoff = 0. There is also a potential accounting issue in special mappings from insert_vm_struct() failing to allocate, so reverse the charge there in the failure scenario. Link: https://lkml.kernel.org/r/20220906194824.2110408-9-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Signed-off-by: Matthew Wilcox (Oracle) Tested-by: Yu Zhao Cc: Catalin Marinas Cc: David Hildenbrand Cc: David Howells Cc: Davidlohr Bueso Cc: SeongJae Park Cc: Sven Schnelle Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm.h | 5 +++++ include/linux/mm_types.h | 3 +++ 2 files changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 7cc9ffc19e7f..896d04248e66 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2567,6 +2567,8 @@ extern bool arch_has_descending_max_zone_pfns(void); /* nommu.c */ extern atomic_long_t mmap_pages_allocated; extern int nommu_shrink_inode_mappings(struct inode *, size_t, size_t); +/* mmap.c */ +void vma_mas_store(struct vm_area_struct *vma, struct ma_state *mas); /* interval_tree.c */ void vma_interval_tree_insert(struct vm_area_struct *node, @@ -2630,6 +2632,9 @@ extern struct vm_area_struct *copy_vma(struct vm_area_struct **, bool *need_rmap_locks); extern void exit_mmap(struct mm_struct *); +void vma_mas_store(struct vm_area_struct *vma, struct ma_state *mas); +void vma_mas_remove(struct vm_area_struct *vma, struct ma_state *mas); + static inline int check_data_rlimit(unsigned long rlim, unsigned long new, unsigned long start, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index e1797813cc2c..425bc5f7d477 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -486,6 +487,7 @@ struct kioctx_table; struct mm_struct { struct { struct vm_area_struct *mmap; /* list of VMAs */ + struct maple_tree mm_mt; struct rb_root mm_rb; u64 vmacache_seqnum; /* per-thread vmacache */ #ifdef CONFIG_MMU @@ -697,6 +699,7 @@ struct mm_struct { unsigned long cpu_bitmap[]; }; +#define MM_MT_FLAGS (MT_FLAGS_ALLOC_RANGE | MT_FLAGS_LOCK_EXTERN) extern struct mm_struct init_mm; /* Pointer magic because the dynamic array size confuses some compilers. */ -- cgit From f39af05949a4280b9f04d5dd0f606b81aac3dae8 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Tue, 6 Sep 2022 19:48:46 +0000 Subject: mm: add VMA iterator This thin layer of abstraction over the maple tree state is for iterating over VMAs. You can go forwards, go backwards or ask where the iterator is. Rename the existing vma_next() to __vma_next() -- it will be removed by the end of this series. Link: https://lkml.kernel.org/r/20220906194824.2110408-10-Liam.Howlett@oracle.com Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Liam R. Howlett Acked-by: Vlastimil Babka Reviewed-by: David Hildenbrand Reviewed-by: Davidlohr Bueso Tested-by: Yu Zhao Cc: Catalin Marinas Cc: David Howells Cc: SeongJae Park Cc: Sven Schnelle Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm.h | 32 ++++++++++++++++++++++++++++++++ include/linux/mm_types.h | 21 +++++++++++++++++++++ 2 files changed, 53 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 896d04248e66..3701da1fac5f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -661,6 +661,38 @@ static inline bool vma_is_accessible(struct vm_area_struct *vma) return vma->vm_flags & VM_ACCESS_FLAGS; } +static inline +struct vm_area_struct *vma_find(struct vma_iterator *vmi, unsigned long max) +{ + return mas_find(&vmi->mas, max); +} + +static inline struct vm_area_struct *vma_next(struct vma_iterator *vmi) +{ + /* + * Uses vma_find() to get the first VMA when the iterator starts. + * Calling mas_next() could skip the first entry. + */ + return vma_find(vmi, ULONG_MAX); +} + +static inline struct vm_area_struct *vma_prev(struct vma_iterator *vmi) +{ + return mas_prev(&vmi->mas, 0); +} + +static inline unsigned long vma_iter_addr(struct vma_iterator *vmi) +{ + return vmi->mas.index; +} + +#define for_each_vma(__vmi, __vma) \ + while (((__vma) = vma_next(&(__vmi))) != NULL) + +/* The MM code likes to work with exclusive end addresses */ +#define for_each_vma_range(__vmi, __vma, __end) \ + while (((__vma) = vma_find(&(__vmi), (__end) - 1)) != NULL) + #ifdef CONFIG_SHMEM /* * The vma_is_shmem is not inline because it is used only by slow diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 425bc5f7d477..d0b51fbdf5d4 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -777,6 +777,27 @@ static inline void lru_gen_use_mm(struct mm_struct *mm) #endif /* CONFIG_LRU_GEN */ +struct vma_iterator { + struct ma_state mas; +}; + +#define VMA_ITERATOR(name, __mm, __addr) \ + struct vma_iterator name = { \ + .mas = { \ + .tree = &(__mm)->mm_mt, \ + .index = __addr, \ + .node = MAS_START, \ + }, \ + } + +static inline void vma_iter_init(struct vma_iterator *vmi, + struct mm_struct *mm, unsigned long addr) +{ + vmi->mas.tree = &mm->mm_mt; + vmi->mas.index = addr; + vmi->mas.node = MAS_START; +} + struct mmu_gather; extern void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm); extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct *mm); -- cgit From c9dbe82cb99db5b6029c6bc43fcf7881d3f50268 Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Tue, 6 Sep 2022 19:48:47 +0000 Subject: kernel/fork: use maple tree for dup_mmap() during forking The maple tree was already tracking VMAs in this function by an earlier commit, but the rbtree iterator was being used to iterate the list. Change the iterator to use a maple tree native iterator and switch to the maple tree advanced API to avoid multiple walks of the tree during insert operations. Unexport the now-unused vma_store() function. For performance reasons we bulk allocate the maple tree nodes. The node calculations are done internally to the tree and use the VMA count and assume the worst-case node requirements. The VM_DONT_COPY flag does not allow for the most efficient copy method of the tree and so a bulk loading algorithm is used. Link: https://lkml.kernel.org/r/20220906194824.2110408-15-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Signed-off-by: Matthew Wilcox (Oracle) Acked-by: Vlastimil Babka Tested-by: Yu Zhao Cc: Catalin Marinas Cc: David Hildenbrand Cc: David Howells Cc: Davidlohr Bueso Cc: SeongJae Park Cc: Sven Schnelle Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 3701da1fac5f..646ea4d3bd74 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2599,8 +2599,6 @@ extern bool arch_has_descending_max_zone_pfns(void); /* nommu.c */ extern atomic_long_t mmap_pages_allocated; extern int nommu_shrink_inode_mappings(struct inode *, size_t, size_t); -/* mmap.c */ -void vma_mas_store(struct vm_area_struct *vma, struct ma_state *mas); /* interval_tree.c */ void vma_interval_tree_insert(struct vm_area_struct *node, -- cgit From 524e00b36e8c547f5582eef3fb645a8d9fc5e3df Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Tue, 6 Sep 2022 19:48:48 +0000 Subject: mm: remove rb tree. Remove the RB tree and start using the maple tree for vm_area_struct tracking. Drop validate_mm() calls in expand_upwards() and expand_downwards() as the lock is not held. Link: https://lkml.kernel.org/r/20220906194824.2110408-18-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Tested-by: Yu Zhao Cc: Catalin Marinas Cc: David Hildenbrand Cc: David Howells Cc: Davidlohr Bueso Cc: "Matthew Wilcox (Oracle)" Cc: SeongJae Park Cc: Sven Schnelle Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm.h | 2 -- include/linux/mm_types.h | 14 -------------- 2 files changed, 16 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 646ea4d3bd74..dfce1aaa7a64 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2654,8 +2654,6 @@ extern int __split_vma(struct mm_struct *, struct vm_area_struct *, extern int split_vma(struct mm_struct *, struct vm_area_struct *, unsigned long addr, int new_below); extern int insert_vm_struct(struct mm_struct *, struct vm_area_struct *); -extern void __vma_link_rb(struct mm_struct *, struct vm_area_struct *, - struct rb_node **, struct rb_node *); extern void unlink_file_vma(struct vm_area_struct *); extern struct vm_area_struct *copy_vma(struct vm_area_struct **, unsigned long addr, unsigned long len, pgoff_t pgoff, diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index d0b51fbdf5d4..ac747273c4d6 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -410,19 +410,6 @@ struct vm_area_struct { /* linked list of VM areas per task, sorted by address */ struct vm_area_struct *vm_next, *vm_prev; - - struct rb_node vm_rb; - - /* - * Largest free memory gap in bytes to the left of this VMA. - * Either between this VMA and vma->vm_prev, or between one of the - * VMAs below us in the VMA rbtree and its ->vm_prev. This helps - * get_unmapped_area find a free area of the right size. - */ - unsigned long rb_subtree_gap; - - /* Second cache line starts here. */ - struct mm_struct *vm_mm; /* The address space we belong to. */ /* @@ -488,7 +475,6 @@ struct mm_struct { struct { struct vm_area_struct *mmap; /* list of VMAs */ struct maple_tree mm_mt; - struct rb_root mm_rb; u64 vmacache_seqnum; /* per-thread vmacache */ #ifdef CONFIG_MMU unsigned long (*get_unmapped_area) (struct file *filp, -- cgit From dc8635b25e87232f62276c02899b9d21dd0793c2 Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Tue, 6 Sep 2022 19:48:49 +0000 Subject: mm: optimize find_exact_vma() to use vma_lookup() Use vma_lookup() to walk the tree to the start value requested. If the vma at the start does not match, then the answer is NULL and there is no need to look at the next vma the way that find_vma() would. Link: https://lkml.kernel.org/r/20220906194824.2110408-21-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Reviewed-by: Vlastimil Babka Tested-by: Yu Zhao Cc: Catalin Marinas Cc: David Hildenbrand Cc: David Howells Cc: Davidlohr Bueso Cc: "Matthew Wilcox (Oracle)" Cc: SeongJae Park Cc: Sven Schnelle Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index dfce1aaa7a64..a80083091f53 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2850,7 +2850,7 @@ static inline unsigned long vma_pages(struct vm_area_struct *vma) static inline struct vm_area_struct *find_exact_vma(struct mm_struct *mm, unsigned long vm_start, unsigned long vm_end) { - struct vm_area_struct *vma = find_vma(mm, vm_start); + struct vm_area_struct *vma = vma_lookup(mm, vm_start); if (vma && (vma->vm_start != vm_start || vma->vm_end != vm_end)) vma = NULL; -- cgit From abdba2dda0c477ca708a939b02f9b2e74666ed2d Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Tue, 6 Sep 2022 19:48:50 +0000 Subject: mm: use maple tree operations for find_vma_intersection() Move find_vma_intersection() to mmap.c and change implementation to maple tree. When searching for a vma within a range, it is easier to use the maple tree interface. Exported find_vma_intersection() for kvm module. Link: https://lkml.kernel.org/r/20220906194824.2110408-24-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Tested-by: Yu Zhao Cc: Catalin Marinas Cc: David Hildenbrand Cc: David Howells Cc: Davidlohr Bueso Cc: "Matthew Wilcox (Oracle)" Cc: SeongJae Park Cc: Sven Schnelle Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm.h | 22 ++++------------------ 1 file changed, 4 insertions(+), 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index a80083091f53..06a6b8db75b7 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2778,26 +2778,12 @@ extern struct vm_area_struct * find_vma(struct mm_struct * mm, unsigned long add extern struct vm_area_struct * find_vma_prev(struct mm_struct * mm, unsigned long addr, struct vm_area_struct **pprev); -/** - * find_vma_intersection() - Look up the first VMA which intersects the interval - * @mm: The process address space. - * @start_addr: The inclusive start user address. - * @end_addr: The exclusive end user address. - * - * Returns: The first VMA within the provided range, %NULL otherwise. Assumes - * start_addr < end_addr. +/* + * Look up the first VMA which intersects the interval [start_addr, end_addr) + * NULL if none. Assume start_addr < end_addr. */ -static inline struct vm_area_struct *find_vma_intersection(struct mm_struct *mm, - unsigned long start_addr, - unsigned long end_addr) -{ - struct vm_area_struct *vma = find_vma(mm, start_addr); - - if (vma && end_addr <= vma->vm_start) - vma = NULL; - return vma; -} + unsigned long start_addr, unsigned long end_addr); /** * vma_lookup() - Find a VMA at a specific address -- cgit From 7964cf8caa4dfa42c4149f3833d3878713cda3dc Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Tue, 6 Sep 2022 19:48:51 +0000 Subject: mm: remove vmacache By using the maple tree and the maple tree state, the vmacache is no longer beneficial and is complicating the VMA code. Remove the vmacache to reduce the work in keeping it up to date and code complexity. Link: https://lkml.kernel.org/r/20220906194824.2110408-26-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Acked-by: Vlastimil Babka Tested-by: Yu Zhao Cc: Catalin Marinas Cc: David Hildenbrand Cc: David Howells Cc: Davidlohr Bueso Cc: "Matthew Wilcox (Oracle)" Cc: SeongJae Park Cc: Sven Schnelle Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm_types.h | 1 - include/linux/mm_types_task.h | 12 ------------ include/linux/sched.h | 1 - include/linux/vm_event_item.h | 4 ---- include/linux/vmacache.h | 28 ---------------------------- include/linux/vmstat.h | 6 ------ 6 files changed, 52 deletions(-) delete mode 100644 include/linux/vmacache.h (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index ac747273c4d6..4541b74b1bdb 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -475,7 +475,6 @@ struct mm_struct { struct { struct vm_area_struct *mmap; /* list of VMAs */ struct maple_tree mm_mt; - u64 vmacache_seqnum; /* per-thread vmacache */ #ifdef CONFIG_MMU unsigned long (*get_unmapped_area) (struct file *filp, unsigned long addr, unsigned long len, diff --git a/include/linux/mm_types_task.h b/include/linux/mm_types_task.h index c1bc6731125c..0bb4b6da9993 100644 --- a/include/linux/mm_types_task.h +++ b/include/linux/mm_types_task.h @@ -24,18 +24,6 @@ IS_ENABLED(CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK)) #define ALLOC_SPLIT_PTLOCKS (SPINLOCK_SIZE > BITS_PER_LONG/8) -/* - * The per task VMA cache array: - */ -#define VMACACHE_BITS 2 -#define VMACACHE_SIZE (1U << VMACACHE_BITS) -#define VMACACHE_MASK (VMACACHE_SIZE - 1) - -struct vmacache { - u64 seqnum; - struct vm_area_struct *vmas[VMACACHE_SIZE]; -}; - /* * When updating this, please also update struct resident_page_types[] in * kernel/fork.c diff --git a/include/linux/sched.h b/include/linux/sched.h index a2dcfb91df03..fbac3c19fe35 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -861,7 +861,6 @@ struct task_struct { struct mm_struct *active_mm; /* Per-thread vma caching: */ - struct vmacache vmacache; #ifdef SPLIT_RSS_COUNTING struct task_rss_stat rss_stat; diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index f3fc36cd2276..3518dba1e02f 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -129,10 +129,6 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, NR_TLB_LOCAL_FLUSH_ALL, NR_TLB_LOCAL_FLUSH_ONE, #endif /* CONFIG_DEBUG_TLBFLUSH */ -#ifdef CONFIG_DEBUG_VM_VMACACHE - VMACACHE_FIND_CALLS, - VMACACHE_FIND_HITS, -#endif #ifdef CONFIG_SWAP SWAP_RA, SWAP_RA_HIT, diff --git a/include/linux/vmacache.h b/include/linux/vmacache.h deleted file mode 100644 index 6fce268a4588..000000000000 --- a/include/linux/vmacache.h +++ /dev/null @@ -1,28 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __LINUX_VMACACHE_H -#define __LINUX_VMACACHE_H - -#include -#include - -static inline void vmacache_flush(struct task_struct *tsk) -{ - memset(tsk->vmacache.vmas, 0, sizeof(tsk->vmacache.vmas)); -} - -extern void vmacache_update(unsigned long addr, struct vm_area_struct *newvma); -extern struct vm_area_struct *vmacache_find(struct mm_struct *mm, - unsigned long addr); - -#ifndef CONFIG_MMU -extern struct vm_area_struct *vmacache_find_exact(struct mm_struct *mm, - unsigned long start, - unsigned long end); -#endif - -static inline void vmacache_invalidate(struct mm_struct *mm) -{ - mm->vmacache_seqnum++; -} - -#endif /* __LINUX_VMACACHE_H */ diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index bfe38869498d..19cf5b6892ce 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -125,12 +125,6 @@ static inline void vm_events_fold_cpu(int cpu) #define count_vm_tlb_events(x, y) do { (void)(y); } while (0) #endif -#ifdef CONFIG_DEBUG_VM_VMACACHE -#define count_vm_vmacache_event(x) count_vm_event(x) -#else -#define count_vm_vmacache_event(x) do {} while (0) -#endif - #define __count_zid_vm_events(item, zid, delta) \ __count_vm_events(item##_NORMAL - ZONE_NORMAL + zid, delta) -- cgit From d7c62295570f012e1d386ae6ed472b36baf037ad Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Tue, 6 Sep 2022 19:48:51 +0000 Subject: mm: convert vma_lookup() to use mtree_load() Unlike the rbtree, the Maple Tree will return a NULL if there's nothing at a particular address. Since the previous commit dropped the vmacache, it is now possible to consult the tree directly. Link: https://lkml.kernel.org/r/20220906194824.2110408-27-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Signed-off-by: Matthew Wilcox (Oracle) Acked-by: Vlastimil Babka Tested-by: Yu Zhao Cc: Catalin Marinas Cc: David Hildenbrand Cc: David Howells Cc: Davidlohr Bueso Cc: SeongJae Park Cc: Sven Schnelle Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm.h | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 06a6b8db75b7..49a58807719b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2795,12 +2795,7 @@ struct vm_area_struct *find_vma_intersection(struct mm_struct *mm, static inline struct vm_area_struct *vma_lookup(struct mm_struct *mm, unsigned long addr) { - struct vm_area_struct *vma = find_vma(mm, addr); - - if (vma && addr < vma->vm_start) - vma = NULL; - - return vma; + return mtree_load(&mm->mm_mt, addr); } static inline unsigned long vm_start_gap(struct vm_area_struct *vma) -- cgit From 11f9a21ab65542189372b7d64bb2d2937dfdc9dc Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Tue, 6 Sep 2022 19:48:52 +0000 Subject: mm/mmap: reorganize munmap to use maple states Remove __do_munmap() in favour of do_munmap(), do_mas_munmap(), and do_mas_align_munmap(). do_munmap() is a wrapper to create a maple state for any callers that have not been converted to the maple tree. do_mas_munmap() takes a maple state to mumap a range. This is just a small function which checks for error conditions and aligns the end of the range. do_mas_align_munmap() uses the aligned range to mumap a range. do_mas_align_munmap() starts with the first VMA in the range, then finds the last VMA in the range. Both start and end are split if necessary. Then the VMAs are removed from the linked list and the mm mlock count is updated at the same time. Followed by a single tree operation of overwriting the area in with a NULL. Finally, the detached list is unmapped and freed. By reorganizing the munmap calls as outlined, it is now possible to avoid extra work of aligning pre-aligned callers which are known to be safe, avoid extra VMA lookups or tree walks for modifications. detach_vmas_to_be_unmapped() is no longer used, so drop this code. vm_brk_flags() can just call the do_mas_munmap() as it checks for intersecting VMAs directly. Link: https://lkml.kernel.org/r/20220906194824.2110408-29-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Tested-by: Yu Zhao Cc: Catalin Marinas Cc: David Hildenbrand Cc: David Howells Cc: Davidlohr Bueso Cc: "Matthew Wilcox (Oracle)" Cc: SeongJae Park Cc: Sven Schnelle Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 49a58807719b..579449d6c23b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2710,8 +2710,9 @@ extern unsigned long mmap_region(struct file *file, unsigned long addr, extern unsigned long do_mmap(struct file *file, unsigned long addr, unsigned long len, unsigned long prot, unsigned long flags, unsigned long pgoff, unsigned long *populate, struct list_head *uf); -extern int __do_munmap(struct mm_struct *, unsigned long, size_t, - struct list_head *uf, bool downgrade); +extern int do_mas_munmap(struct ma_state *mas, struct mm_struct *mm, + unsigned long start, size_t len, struct list_head *uf, + bool downgrade); extern int do_munmap(struct mm_struct *, unsigned long, size_t, struct list_head *uf); extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior); -- cgit From 69dbe6daf1041e32e003f966d71f70f20c63af53 Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Tue, 6 Sep 2022 19:48:57 +0000 Subject: userfaultfd: use maple tree iterator to iterate VMAs Don't use the mm_struct linked list or the vma->vm_next in prep for removal. Link: https://lkml.kernel.org/r/20220906194824.2110408-45-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Tested-by: Yu Zhao Cc: Catalin Marinas Cc: David Hildenbrand Cc: David Howells Cc: Davidlohr Bueso Cc: "Matthew Wilcox (Oracle)" Cc: SeongJae Park Cc: Sven Schnelle Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/userfaultfd_k.h | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index e1b8a915e9e9..f07e6998bb68 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -175,9 +175,8 @@ extern bool userfaultfd_remove(struct vm_area_struct *vma, unsigned long start, unsigned long end); -extern int userfaultfd_unmap_prep(struct vm_area_struct *vma, - unsigned long start, unsigned long end, - struct list_head *uf); +extern int userfaultfd_unmap_prep(struct mm_struct *mm, unsigned long start, + unsigned long end, struct list_head *uf); extern void userfaultfd_unmap_complete(struct mm_struct *mm, struct list_head *uf); @@ -258,7 +257,7 @@ static inline bool userfaultfd_remove(struct vm_area_struct *vma, return true; } -static inline int userfaultfd_unmap_prep(struct vm_area_struct *vma, +static inline int userfaultfd_unmap_prep(struct mm_struct *mm, unsigned long start, unsigned long end, struct list_head *uf) { -- cgit From 763ecb035029f500d7e6dc99acd1ad299b7726a1 Mon Sep 17 00:00:00 2001 From: "Liam R. Howlett" Date: Tue, 6 Sep 2022 19:49:06 +0000 Subject: mm: remove the vma linked list Replace any vm_next use with vma_find(). Update free_pgtables(), unmap_vmas(), and zap_page_range() to use the maple tree. Use the new free_pgtables() and unmap_vmas() in do_mas_align_munmap(). At the same time, alter the loop to be more compact. Now that free_pgtables() and unmap_vmas() take a maple tree as an argument, rearrange do_mas_align_munmap() to use the new tree to hold the vmas to remove. Remove __vma_link_list() and __vma_unlink_list() as they are exclusively used to update the linked list. Drop linked list update from __insert_vm_struct(). Rework validation of tree as it was depending on the linked list. [yang.lee@linux.alibaba.com: fix one kernel-doc comment] Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=1949 Link: https://lkml.kernel.org/r/20220824021918.94116-1-yang.lee@linux.alibaba.comLink: https://lkml.kernel.org/r/20220906194824.2110408-69-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Signed-off-by: Yang Li Tested-by: Yu Zhao Cc: Catalin Marinas Cc: David Hildenbrand Cc: David Howells Cc: Davidlohr Bueso Cc: "Matthew Wilcox (Oracle)" Cc: SeongJae Park Cc: Sven Schnelle Cc: Vlastimil Babka Cc: Will Deacon Signed-off-by: Andrew Morton --- include/linux/mm.h | 5 +++-- include/linux/mm_types.h | 4 ---- 2 files changed, 3 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 579449d6c23b..37384a84f71a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1857,8 +1857,9 @@ void zap_vma_ptes(struct vm_area_struct *vma, unsigned long address, unsigned long size); void zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size); -void unmap_vmas(struct mmu_gather *tlb, struct vm_area_struct *start_vma, - unsigned long start, unsigned long end); +void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt, + struct vm_area_struct *start_vma, unsigned long start, + unsigned long end); struct mmu_notifier_range; diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 4541b74b1bdb..5e32211cb5a9 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -408,8 +408,6 @@ struct vm_area_struct { unsigned long vm_end; /* The first byte after our end address within vm_mm. */ - /* linked list of VM areas per task, sorted by address */ - struct vm_area_struct *vm_next, *vm_prev; struct mm_struct *vm_mm; /* The address space we belong to. */ /* @@ -473,7 +471,6 @@ struct vm_area_struct { struct kioctx_table; struct mm_struct { struct { - struct vm_area_struct *mmap; /* list of VMAs */ struct maple_tree mm_mt; #ifdef CONFIG_MMU unsigned long (*get_unmapped_area) (struct file *filp, @@ -488,7 +485,6 @@ struct mm_struct { unsigned long mmap_compat_legacy_base; #endif unsigned long task_size; /* size of task vm space */ - unsigned long highest_vm_end; /* highest vma end address */ pgd_t * pgd; #ifdef CONFIG_MEMBARRIER -- cgit From bf3980c85212fc71512d27a46f5aab66f46ca284 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Tue, 31 May 2022 15:30:59 -0700 Subject: mm: drop oom code from exit_mmap The primary reason to invoke the oom reaper from the exit_mmap path used to be a prevention of an excessive oom killing if the oom victim exit races with the oom reaper (see [1] for more details). The invocation has moved around since then because of the interaction with the munlock logic but the underlying reason has remained the same (see [2]). Munlock code is no longer a problem since [3] and there shouldn't be any blocking operation before the memory is unmapped by exit_mmap so the oom reaper invocation can be dropped. The unmapping part can be done with the non-exclusive mmap_sem and the exclusive one is only required when page tables are freed. Remove the oom_reaper from exit_mmap which will make the code easier to read. This is really unlikely to make any observable difference although some microbenchmarks could benefit from one less branch that needs to be evaluated even though it almost never is true. [1] 212925802454 ("mm: oom: let oom_reap_task and exit_mmap run concurrently") [2] 27ae357fa82b ("mm, oom: fix concurrent munlock and oom reaper unmap, v3") [3] a213e5cf71cb ("mm/munlock: delete munlock_vma_pages_all(), allow oomreap") [akpm@linux-foundation.org: restore Suren's mmap_read_lock() optimization] Link: https://lkml.kernel.org/r/20220531223100.510392-1-surenb@google.com Signed-off-by: Suren Baghdasaryan Acked-by: Michal Hocko Cc: Andrea Arcangeli Cc: Christian Brauner (Microsoft) Cc: Christoph Hellwig Cc: David Hildenbrand Cc: David Rientjes Cc: Jann Horn Cc: Johannes Weiner Cc: John Hubbard Cc: "Kirill A . Shutemov" Cc: Liam Howlett Cc: Matthew Wilcox Cc: Minchan Kim Cc: Oleg Nesterov Cc: Peter Xu Cc: Roman Gushchin Cc: Shakeel Butt Cc: Shuah Khan Signed-off-by: Andrew Morton --- include/linux/oom.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/oom.h b/include/linux/oom.h index 02d1e7bbd8cd..6cdde62b078b 100644 --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -106,8 +106,6 @@ static inline vm_fault_t check_stable_address_space(struct mm_struct *mm) return 0; } -bool __oom_reap_task_mm(struct mm_struct *mm); - long oom_badness(struct task_struct *p, unsigned long totalpages); -- cgit From b3541d912a84dc40cabb516f2deeac9ae6fa30da Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Tue, 31 May 2022 15:31:00 -0700 Subject: mm: delete unused MMF_OOM_VICTIM flag With the last usage of MMF_OOM_VICTIM in exit_mmap gone, this flag is now unused and can be removed. [akpm@linux-foundation.org: remove comment about now-removed mm_is_oom_victim()] Link: https://lkml.kernel.org/r/20220531223100.510392-2-surenb@google.com Signed-off-by: Suren Baghdasaryan Acked-by: Michal Hocko Cc: David Rientjes Cc: Matthew Wilcox Cc: Johannes Weiner Cc: Roman Gushchin Cc: Minchan Kim Cc: "Kirill A . Shutemov" Cc: Andrea Arcangeli Cc: Christian Brauner (Microsoft) Cc: Christoph Hellwig Cc: Oleg Nesterov Cc: David Hildenbrand Cc: Jann Horn Cc: Shakeel Butt Cc: Peter Xu Cc: John Hubbard Cc: Shuah Khan Cc: Liam Howlett Signed-off-by: Andrew Morton --- include/linux/oom.h | 9 --------- include/linux/sched/coredump.h | 7 +++---- 2 files changed, 3 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/oom.h b/include/linux/oom.h index 6cdde62b078b..7d0c9c48a0c5 100644 --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -77,15 +77,6 @@ static inline bool tsk_is_oom_victim(struct task_struct * tsk) return tsk->signal->oom_mm; } -/* - * Use this helper if tsk->mm != mm and the victim mm needs a special - * handling. This is guaranteed to stay true after once set. - */ -static inline bool mm_is_oom_victim(struct mm_struct *mm) -{ - return test_bit(MMF_OOM_VICTIM, &mm->flags); -} - /* * Checks whether a page fault on the given mm is still reliable. * This is no longer true if the oom reaper started to reap the diff --git a/include/linux/sched/coredump.h b/include/linux/sched/coredump.h index 4d0a5be28b70..8270ad7ae14c 100644 --- a/include/linux/sched/coredump.h +++ b/include/linux/sched/coredump.h @@ -71,9 +71,8 @@ static inline int get_dumpable(struct mm_struct *mm) #define MMF_UNSTABLE 22 /* mm is unstable for copy_from_user */ #define MMF_HUGE_ZERO_PAGE 23 /* mm has ever used the global huge zero page */ #define MMF_DISABLE_THP 24 /* disable THP for all VMAs */ -#define MMF_OOM_VICTIM 25 /* mm is the oom victim */ -#define MMF_OOM_REAP_QUEUED 26 /* mm was queued for oom_reaper */ -#define MMF_MULTIPROCESS 27 /* mm is shared between processes */ +#define MMF_OOM_REAP_QUEUED 25 /* mm was queued for oom_reaper */ +#define MMF_MULTIPROCESS 26 /* mm is shared between processes */ /* * MMF_HAS_PINNED: Whether this mm has pinned any pages. This can be either * replaced in the future by mm.pinned_vm when it becomes stable, or grow into @@ -81,7 +80,7 @@ static inline int get_dumpable(struct mm_struct *mm) * pinned pages were unpinned later on, we'll still keep this bit set for the * lifecycle of this mm, just for simplicity. */ -#define MMF_HAS_PINNED 28 /* FOLL_PIN has run, never cleared */ +#define MMF_HAS_PINNED 27 /* FOLL_PIN has run, never cleared */ #define MMF_DISABLE_THP_MASK (1 << MMF_DISABLE_THP) #define MMF_INIT_MASK (MMF_DUMPABLE_MASK | MMF_DUMP_FILTER_MASK |\ -- cgit From 474098edac262ae26bfab1c48445877075a31cbd Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Thu, 25 Aug 2022 18:46:57 +0200 Subject: mm/gup: replace FOLL_NUMA by gup_can_follow_protnone() Patch series "mm: minor cleanups around NUMA hinting". Working on some GUP cleanups (e.g., getting rid of some FOLL_ flags) and preparing for other GUP changes (getting rid of FOLL_FORCE|FOLL_WRITE for for taking a R/O longterm pin), this is something I can easily send out independently. Get rid of FOLL_NUMA, allow FOLL_FORCE access to PROT_NONE mapped pages in GUP-fast, and fixup some documentation around NUMA hinting. This patch (of 3): No need for a special flag that is not even properly documented to be internal-only. Let's just factor this check out and get rid of this flag. The separate function has the nice benefit that we can centralize comments. Link: https://lkml.kernel.org/r/20220825164659.89824-2-david@redhat.com Link: https://lkml.kernel.org/r/20220825164659.89824-1-david@redhat.com Signed-off-by: David Hildenbrand Cc: Andrea Arcangeli Cc: Hugh Dickins Cc: Jason Gunthorpe Cc: John Hubbard Cc: Matthew Wilcox Cc: Mel Gorman Cc: Peter Xu Signed-off-by: Andrew Morton --- include/linux/mm.h | 16 +++++++++++++++- 1 file changed, 15 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 37384a84f71a..eb25cae06c55 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2933,7 +2933,6 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address, * and return without waiting upon it */ #define FOLL_NOFAULT 0x80 /* do not fault in pages */ #define FOLL_HWPOISON 0x100 /* check page is hwpoisoned */ -#define FOLL_NUMA 0x200 /* force NUMA hinting page fault */ #define FOLL_MIGRATION 0x400 /* wait for page to replace migration entry */ #define FOLL_TRIED 0x800 /* a retry, previous pass started an IO */ #define FOLL_REMOTE 0x2000 /* we are working on non-current tsk/mm */ @@ -3054,6 +3053,21 @@ static inline bool gup_must_unshare(unsigned int flags, struct page *page) return !PageAnonExclusive(page); } +/* + * Indicates whether GUP can follow a PROT_NONE mapped page, or whether + * a (NUMA hinting) fault is required. + */ +static inline bool gup_can_follow_protnone(unsigned int flags) +{ + /* + * FOLL_FORCE has to be able to make progress even if the VMA is + * inaccessible. Further, FOLL_FORCE access usually does not represent + * application behaviour and we should avoid triggering NUMA hinting + * faults. + */ + return flags & FOLL_FORCE; +} + typedef int (*pte_fn_t)(pte_t *pte, unsigned long addr, void *data); extern int apply_to_page_range(struct mm_struct *mm, unsigned long address, unsigned long size, pte_fn_t fn, void *data); -- cgit From 7014887a01587d8c50871d5985cd572ca08b29c0 Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Thu, 25 Aug 2022 18:46:59 +0200 Subject: mm: fixup documentation regarding pte_numa() and PROT_NUMA pte_numa() no longer exists -- replaced by pte_protnone() -- and PROT_NUMA probably never existed: MM_CP_PROT_NUMA also ends up using PROT_NONE. Let's fixup the doc. Link: https://lkml.kernel.org/r/20220825164659.89824-4-david@redhat.com Signed-off-by: David Hildenbrand Cc: Andrea Arcangeli Cc: Hugh Dickins Cc: Jason Gunthorpe Cc: John Hubbard Cc: Matthew Wilcox Cc: Mel Gorman Cc: Peter Xu Signed-off-by: Andrew Morton --- include/linux/mm_types.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5e32211cb5a9..26573ba485f3 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -614,22 +614,22 @@ struct mm_struct { #endif #ifdef CONFIG_NUMA_BALANCING /* - * numa_next_scan is the next time that the PTEs will be marked - * pte_numa. NUMA hinting faults will gather statistics and - * migrate pages to new nodes if necessary. + * numa_next_scan is the next time that PTEs will be remapped + * PROT_NONE to trigger NUMA hinting faults; such faults gather + * statistics and migrate pages to new nodes if necessary. */ unsigned long numa_next_scan; - /* Restart point for scanning and setting pte_numa */ + /* Restart point for scanning and remapping PTEs. */ unsigned long numa_scan_offset; - /* numa_scan_seq prevents two threads setting pte_numa */ + /* numa_scan_seq prevents two threads remapping PTEs. */ int numa_scan_seq; #endif /* * An operation with batched TLB flushing is going on. Anything * that can move process memory needs to flush the TLB when - * moving a PROT_NONE or PROT_NUMA mapped page. + * moving a PROT_NONE mapped page. */ atomic_t tlb_flush_pending; #ifdef CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH -- cgit From 974f4367dd315acc15ad4a6453f8304aea60dfbd Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Tue, 23 Aug 2022 11:22:30 +0200 Subject: mm: reduce noise in show_mem for lowmem allocations While discussing early DMA pool pre-allocation failure with Christoph [1] I have realized that the allocation failure warning is rather noisy for constrained allocations like GFP_DMA{32}. Those zones are usually not populated on all nodes very often as their memory ranges are constrained. This is an attempt to reduce the ballast that doesn't provide any relevant information for those allocation failures investigation. Please note that I have only compile tested it (in my default config setup) and I am throwing it mostly to see what people think about it. [1] http://lkml.kernel.org/r/20220817060647.1032426-1-hch@lst.de [mhocko@suse.com: update] Link: https://lkml.kernel.org/r/Yw29bmJTIkKogTiW@dhcp22.suse.cz [mhocko@suse.com: fix build] [akpm@linux-foundation.org: fix it for mapletree] [akpm@linux-foundation.org: update it for Michal's update] [mhocko@suse.com: fix arch/powerpc/xmon/xmon.c] Link: https://lkml.kernel.org/r/Ywh3C4dKB9B93jIy@dhcp22.suse.cz [akpm@linux-foundation.org: fix arch/sparc/kernel/setup_32.c] Link: https://lkml.kernel.org/r/YwScVmVofIZkopkF@dhcp22.suse.cz Signed-off-by: Michal Hocko Acked-by: Johannes Weiner Acked-by: Vlastimil Babka Cc: Christoph Hellwig Cc: Mel Gorman Cc: Dan Carpenter Signed-off-by: Andrew Morton --- include/linux/mm.h | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index eb25cae06c55..e56dd8f7eae1 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1838,7 +1838,11 @@ extern void pagefault_out_of_memory(void); */ #define SHOW_MEM_FILTER_NODES (0x0001u) /* disallowed nodes */ -extern void show_free_areas(unsigned int flags, nodemask_t *nodemask); +extern void __show_free_areas(unsigned int flags, nodemask_t *nodemask, int max_zone_idx); +static void __maybe_unused show_free_areas(unsigned int flags, nodemask_t *nodemask) +{ + __show_free_areas(flags, nodemask, MAX_NR_ZONES - 1); +} #ifdef CONFIG_MMU extern bool can_do_mlock(void); @@ -2578,7 +2582,12 @@ extern void calculate_min_free_kbytes(void); extern int __meminit init_per_zone_wmark_min(void); extern void mem_init(void); extern void __init mmap_init(void); -extern void show_mem(unsigned int flags, nodemask_t *nodemask); + +extern void __show_mem(unsigned int flags, nodemask_t *nodemask, int max_zone_idx); +static inline void show_mem(unsigned int flags, nodemask_t *nodemask) +{ + __show_mem(flags, nodemask, MAX_NR_ZONES - 1); +} extern long si_mem_available(void); extern void si_meminfo(struct sysinfo * val); extern void si_meminfo_node(struct sysinfo *val, int nid); -- cgit From e6ad640bc404eb298dd1880113131768ddf5c6a8 Mon Sep 17 00:00:00 2001 From: Shakeel Butt Date: Fri, 26 Aug 2022 23:06:42 +0000 Subject: mm: deduplicate cacheline padding code There are three users (mmzone.h, memcontrol.h, page_counter.h) using similar code for forcing cacheline padding between fields of different structures. Dedup that code. Link: https://lkml.kernel.org/r/20220826230642.566725-1-shakeelb@google.com Signed-off-by: Shakeel Butt Suggested-by: Feng Tang Reviewed-by: Feng Tang Acked-by: Michal Hocko Signed-off-by: Andrew Morton --- include/linux/cache.h | 13 +++++++++++++ include/linux/memcontrol.h | 13 ++----------- include/linux/mmzone.h | 24 +++++------------------- include/linux/page_counter.h | 13 ++----------- 4 files changed, 22 insertions(+), 41 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cache.h b/include/linux/cache.h index d742c57eaee5..5da1bbd96154 100644 --- a/include/linux/cache.h +++ b/include/linux/cache.h @@ -85,4 +85,17 @@ #define cache_line_size() L1_CACHE_BYTES #endif +/* + * Helper to add padding within a struct to ensure data fall into separate + * cachelines. + */ +#if defined(CONFIG_SMP) +struct cacheline_padding { + char x[0]; +} ____cacheline_internodealigned_in_smp; +#define CACHELINE_PADDING(name) struct cacheline_padding name +#else +#define CACHELINE_PADDING(name) +#endif + #endif /* __LINUX_CACHE_H */ diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 344022f102c2..60545e4a1c03 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -185,15 +185,6 @@ struct mem_cgroup_thresholds { struct mem_cgroup_threshold_ary *spare; }; -#if defined(CONFIG_SMP) -struct memcg_padding { - char x[0]; -} ____cacheline_internodealigned_in_smp; -#define MEMCG_PADDING(name) struct memcg_padding name -#else -#define MEMCG_PADDING(name) -#endif - /* * Remember four most recent foreign writebacks with dirty pages in this * cgroup. Inode sharing is expected to be uncommon and, even if we miss @@ -304,7 +295,7 @@ struct mem_cgroup { spinlock_t move_lock; unsigned long move_lock_flags; - MEMCG_PADDING(_pad1_); + CACHELINE_PADDING(_pad1_); /* memory.stat */ struct memcg_vmstats vmstats; @@ -326,7 +317,7 @@ struct mem_cgroup { struct list_head objcg_list; #endif - MEMCG_PADDING(_pad2_); + CACHELINE_PADDING(_pad2_); /* * set > 0 if pages under this cgroup are moving to other cgroup. diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index e335a492c2eb..c69c08156822 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -121,20 +121,6 @@ static inline bool free_area_empty(struct free_area *area, int migratetype) struct pglist_data; -/* - * Add a wild amount of padding here to ensure data fall into separate - * cachelines. There are very few zone structures in the machine, so space - * consumption is not a concern here. - */ -#if defined(CONFIG_SMP) -struct zone_padding { - char x[0]; -} ____cacheline_internodealigned_in_smp; -#define ZONE_PADDING(name) struct zone_padding name; -#else -#define ZONE_PADDING(name) -#endif - #ifdef CONFIG_NUMA enum numa_stat_item { NUMA_HIT, /* allocated in intended node */ @@ -837,7 +823,7 @@ struct zone { int initialized; /* Write-intensive fields used from the page allocator */ - ZONE_PADDING(_pad1_) + CACHELINE_PADDING(_pad1_); /* free areas of different sizes */ struct free_area free_area[MAX_ORDER]; @@ -849,7 +835,7 @@ struct zone { spinlock_t lock; /* Write-intensive fields used by compaction and vmstats. */ - ZONE_PADDING(_pad2_) + CACHELINE_PADDING(_pad2_); /* * When free pages are below this point, additional steps are taken @@ -886,7 +872,7 @@ struct zone { bool contiguous; - ZONE_PADDING(_pad3_) + CACHELINE_PADDING(_pad3_); /* Zone statistics */ atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS]; atomic_long_t vm_numa_event[NR_VM_NUMA_EVENT_ITEMS]; @@ -1196,7 +1182,7 @@ typedef struct pglist_data { #endif /* CONFIG_NUMA */ /* Write-intensive fields used by page reclaim */ - ZONE_PADDING(_pad1_) + CACHELINE_PADDING(_pad1_); #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT /* @@ -1241,7 +1227,7 @@ typedef struct pglist_data { struct lru_gen_mm_walk mm_walk; #endif - ZONE_PADDING(_pad2_) + CACHELINE_PADDING(_pad2_); /* Per-node vmstats */ struct per_cpu_nodestat __percpu *per_cpu_nodestats; diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h index 78a1c934e416..c141ea9a95ef 100644 --- a/include/linux/page_counter.h +++ b/include/linux/page_counter.h @@ -7,22 +7,13 @@ #include #include -#if defined(CONFIG_SMP) -struct pc_padding { - char x[0]; -} ____cacheline_internodealigned_in_smp; -#define PC_PADDING(name) struct pc_padding name -#else -#define PC_PADDING(name) -#endif - struct page_counter { /* * Make sure 'usage' does not share cacheline with any other field. The * memcg->memory.usage is a hot member of struct mem_cgroup. */ atomic_long_t usage; - PC_PADDING(_pad1_); + CACHELINE_PADDING(_pad1_); /* effective memory.min and memory.min usage tracking */ unsigned long emin; @@ -38,7 +29,7 @@ struct page_counter { unsigned long failcnt; /* Keep all the read most fields in a separete cacheline. */ - PC_PADDING(_pad2_); + CACHELINE_PADDING(_pad2_); unsigned long min; unsigned long low; -- cgit From cb4df4cae4f2bd8cf7a32eff81178fce31600f7c Mon Sep 17 00:00:00 2001 From: xu xin Date: Tue, 30 Aug 2022 14:38:38 +0000 Subject: ksm: count allocated ksm rmap_items for each process Patch series "ksm: count allocated rmap_items and update documentation", v5. KSM can save memory by merging identical pages, but also can consume additional memory, because it needs to generate rmap_items to save each scanned page's brief rmap information. To determine how beneficial the ksm-policy (like madvise), they are using brings, so we add a new interface /proc//ksm_stat for each process The value "ksm_rmap_items" in it indicates the total allocated ksm rmap_items of this process. The detailed description can be seen in the following patches' commit message. This patch (of 2): KSM can save memory by merging identical pages, but also can consume additional memory, because it needs to generate rmap_items to save each scanned page's brief rmap information. Some of these pages may be merged, but some may not be abled to be merged after being checked several times, which are unprofitable memory consumed. The information about whether KSM save memory or consume memory in system-wide range can be determined by the comprehensive calculation of pages_sharing, pages_shared, pages_unshared and pages_volatile. A simple approximate calculation: profit =~ pages_sharing * sizeof(page) - (all_rmap_items) * sizeof(rmap_item); where all_rmap_items equals to the sum of pages_sharing, pages_shared, pages_unshared and pages_volatile. But we cannot calculate this kind of ksm profit inner single-process wide because the information of ksm rmap_item's number of a process is lacked. For user applications, if this kind of information could be obtained, it helps upper users know how beneficial the ksm-policy (like madvise) they are using brings, and then optimize their app code. For example, one application madvise 1000 pages as MERGEABLE, while only a few pages are really merged, then it's not cost-efficient. So we add a new interface /proc//ksm_stat for each process in which the value of ksm_rmap_itmes is only shown now and so more values can be added in future. So similarly, we can calculate the ksm profit approximately for a single process by: profit =~ ksm_merging_pages * sizeof(page) - ksm_rmap_items * sizeof(rmap_item); where ksm_merging_pages is shown at /proc//ksm_merging_pages, and ksm_rmap_items is shown in /proc//ksm_stat. Link: https://lkml.kernel.org/r/20220830143731.299702-1-xu.xin16@zte.com.cn Link: https://lkml.kernel.org/r/20220830143838.299758-1-xu.xin16@zte.com.cn Signed-off-by: xu xin Reviewed-by: Xiaokai Ran Reviewed-by: Yang Yang Signed-off-by: CGEL ZTE Cc: Alexey Dobriyan Cc: Bagas Sanjaya Cc: Hugh Dickins Cc: Izik Eidus Cc: Matthew Wilcox Signed-off-by: Andrew Morton --- include/linux/mm_types.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 26573ba485f3..8f30f262431c 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -654,6 +654,11 @@ struct mm_struct { * merging. */ unsigned long ksm_merging_pages; + /* + * Represent how many pages are checked for ksm merging + * including merged and not merged. + */ + unsigned long ksm_rmap_items; #endif #ifdef CONFIG_LRU_GEN struct { -- cgit From bf7a87f1075f67c286f794519f0fedfa8b0b18cc Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Mon, 26 Sep 2022 17:33:35 +0200 Subject: kprobes: Add new KPROBE_FLAG_ON_FUNC_ENTRY kprobe flag Adding KPROBE_FLAG_ON_FUNC_ENTRY kprobe flag to indicate that attach address is on function entry. This is used in following changes in get_func_ip helper to return correct function address. Acked-by: Masami Hiramatsu (Google) Signed-off-by: Jiri Olsa Link: https://lore.kernel.org/r/20220926153340.1621984-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov --- include/linux/kprobes.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index 55041d2f884d..a0b92be98984 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -103,6 +103,7 @@ struct kprobe { * this flag is only for optimized_kprobe. */ #define KPROBE_FLAG_FTRACE 8 /* probe is using ftrace */ +#define KPROBE_FLAG_ON_FUNC_ENTRY 16 /* probe is on the function entry */ /* Has this kprobe gone ? */ static inline bool kprobe_gone(struct kprobe *p) -- cgit From 19c02415da2345d0dda2b5c4495bc17cc14b18b5 Mon Sep 17 00:00:00 2001 From: Song Liu Date: Mon, 26 Sep 2022 11:47:38 -0700 Subject: bpf: use bpf_prog_pack for bpf_dispatcher Allocate bpf_dispatcher with bpf_prog_pack_alloc so that bpf_dispatcher can share pages with bpf programs. arch_prepare_bpf_dispatcher() is updated to provide a RW buffer as working area for arch code to write to. This also fixes CPA W^X warnning like: CPA refuse W^X violation: 8000000000000163 -> 0000000000000163 range: ... Signed-off-by: Song Liu Link: https://lore.kernel.org/r/20220926184739.3512547-2-song@kernel.org Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 3 ++- include/linux/filter.h | 5 +++++ 2 files changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index edd43edb27d6..9ae155c75014 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -946,6 +946,7 @@ struct bpf_dispatcher { struct bpf_dispatcher_prog progs[BPF_DISPATCHER_MAX]; int num_progs; void *image; + void *rw_image; u32 image_off; struct bpf_ksym ksym; }; @@ -964,7 +965,7 @@ int bpf_trampoline_unlink_prog(struct bpf_tramp_link *link, struct bpf_trampolin struct bpf_trampoline *bpf_trampoline_get(u64 key, struct bpf_attach_target_info *tgt_info); void bpf_trampoline_put(struct bpf_trampoline *tr); -int arch_prepare_bpf_dispatcher(void *image, s64 *funcs, int num_funcs); +int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_funcs); #define BPF_DISPATCHER_INIT(_name) { \ .mutex = __MUTEX_INITIALIZER(_name.mutex), \ .func = &_name##_func, \ diff --git a/include/linux/filter.h b/include/linux/filter.h index 98e28126c24b..efc42a6e3aed 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1023,6 +1023,8 @@ extern long bpf_jit_limit_max; typedef void (*bpf_jit_fill_hole_t)(void *area, unsigned int size); +void bpf_jit_fill_hole_with_zero(void *area, unsigned int size); + struct bpf_binary_header * bpf_jit_binary_alloc(unsigned int proglen, u8 **image_ptr, unsigned int alignment, @@ -1035,6 +1037,9 @@ void bpf_jit_free(struct bpf_prog *fp); struct bpf_binary_header * bpf_jit_binary_pack_hdr(const struct bpf_prog *fp); +void *bpf_prog_pack_alloc(u32 size, bpf_jit_fill_hole_t bpf_fill_ill_insns); +void bpf_prog_pack_free(struct bpf_binary_header *hdr); + static inline bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp) { return list_empty(&fp->aux->ksym.lnode) || -- cgit From 5b0d1c7bd5722467960829af51d523f5a6ffd848 Mon Sep 17 00:00:00 2001 From: Song Liu Date: Mon, 26 Sep 2022 11:47:39 -0700 Subject: bpf: Enforce W^X for bpf trampoline Mark the trampoline as RO+X after arch_prepare_bpf_trampoline, so that the trampoine follows W^X rule strictly. This will turn off warnings like CPA refuse W^X violation: 8000000000000163 -> 0000000000000163 range: ... Also remove bpf_jit_alloc_exec_page(), since it is not used any more. Signed-off-by: Song Liu Link: https://lore.kernel.org/r/20220926184739.3512547-3-song@kernel.org Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 9ae155c75014..5161fac0513f 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1008,7 +1008,6 @@ int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_func void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from, struct bpf_prog *to); /* Called only from JIT-enabled code, so there's no need for stubs. */ -void *bpf_jit_alloc_exec_page(void); void bpf_image_ksym_add(void *data, struct bpf_ksym *ksym); void bpf_image_ksym_del(struct bpf_ksym *ksym); void bpf_ksym_add(struct bpf_ksym *ksym); -- cgit From 1c32a8012b7fabe469b6af826edfd4ae2a6201d3 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 20 Sep 2022 15:38:58 +0200 Subject: nvme: improve the NVME_CONNECT_AUTHREQ* definitions Mark them as unsigned so that we don't need extra casts, and define them relative to cdword0 instead of requiring extra shifts. Signed-off-by: Christoph Hellwig Reviewed-by: Sagi Grimberg Reviewed-by: Hannes Reinecke --- include/linux/nvme.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/nvme.h b/include/linux/nvme.h index ae53d74f3696..050d7d0cd81b 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -1482,8 +1482,8 @@ struct nvmf_connect_command { }; enum { - NVME_CONNECT_AUTHREQ_ASCR = (1 << 2), - NVME_CONNECT_AUTHREQ_ATR = (1 << 1), + NVME_CONNECT_AUTHREQ_ASCR = (1U << 18), + NVME_CONNECT_AUTHREQ_ATR = (1U << 17), }; struct nvmf_connect_data { -- cgit From f4dc7fffa9873db50ec25624572f8217a6225de8 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Fri, 16 Sep 2022 14:03:06 +0200 Subject: efi: libstub: unify initrd loading between architectures Use a EFI configuration table to pass the initrd to the core kernel, instead of per-arch methods. This cleans up the code considerably, and should make it easier for architectures to get rid of their reliance on DT for doing EFI boot in the future. Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index f1b3e0d1b3fa..778ddb22f7da 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1330,6 +1330,11 @@ struct linux_efi_coco_secret_area { u64 size; }; +struct linux_efi_initrd { + unsigned long base; + unsigned long size; +}; + /* Header of a populated EFI secret area */ #define EFI_SECRET_TABLE_HEADER_GUID EFI_GUID(0x1e74f542, 0x71dd, 0x4d66, 0x96, 0x3e, 0xef, 0x42, 0x87, 0xff, 0x17, 0x3b) -- cgit From 171539f5a90e3fdf7d17f5396fac79d7e44ad68e Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 15 Sep 2022 23:20:06 +0200 Subject: efi: libstub: install boot-time memory map as config table Expose the EFI boot time memory map to the kernel via a configuration table. This is arch agnostic and enables future changes that remove the dependency on DT on architectures that don't otherwise rely on it. Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index 778ddb22f7da..252b9b328577 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -410,6 +410,7 @@ void efi_native_runtime_setup(void); #define LINUX_EFI_INITRD_MEDIA_GUID EFI_GUID(0x5568e427, 0x68fc, 0x4f3d, 0xac, 0x74, 0xca, 0x55, 0x52, 0x31, 0xcc, 0x68) #define LINUX_EFI_MOK_VARIABLE_TABLE_GUID EFI_GUID(0xc451ed2b, 0x9694, 0x45d3, 0xba, 0xba, 0xed, 0x9f, 0x89, 0x88, 0xa3, 0x89) #define LINUX_EFI_COCO_SECRET_AREA_GUID EFI_GUID(0xadf956ad, 0xe98c, 0x484c, 0xae, 0x11, 0xb5, 0x1c, 0x7d, 0x33, 0x64, 0x47) +#define LINUX_EFI_BOOT_MEMMAP_GUID EFI_GUID(0x800f683f, 0xd08b, 0x423a, 0xa2, 0x93, 0x96, 0x5c, 0x3c, 0x6f, 0xe2, 0xb4) #define RISCV_EFI_BOOT_PROTOCOL_GUID EFI_GUID(0xccd15fec, 0x6f73, 0x4eec, 0x83, 0x95, 0x3e, 0x69, 0xe4, 0xb9, 0x40, 0xbf) -- cgit From 3c6edd9034240ce9582be3392112321336bd25bb Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 22 Sep 2022 12:03:52 +0200 Subject: efi: zboot: create MemoryMapped() device path for the parent if needed LoadImage() is supposed to install an instance of the protocol EFI_LOADED_IMAGE_DEVICE_PATH_PROTOCOL onto the loaded image's handle so that the program can figure out where it was loaded from. The reference implementation even does this (with a NULL protocol pointer) if the call to LoadImage() used the source buffer and size arguments, and passed NULL for the image device path. Hand rolled implementations of LoadImage may behave differently, though, and so it is better to tolerate situations where the protocol is missing. And actually, concatenating an Offset() node to a NULL device path (as we do currently) is not great either. So in cases where the protocol is absent, or when it points to NULL, construct a MemoryMapped() device node as the base node that describes the parent image's footprint in memory. Cc: Daan De Meyer Cc: Jeremy Linton Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index 89f16ec3ebab..da3974bf05d3 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1004,6 +1004,13 @@ struct efi_rel_offset_dev_path { u64 ending_offset; } __packed; +struct efi_mem_mapped_dev_path { + struct efi_generic_dev_path header; + u32 memory_type; + u64 starting_addr; + u64 ending_addr; +} __packed; + struct efi_dev_path { union { struct efi_generic_dev_path header; -- cgit From 4bf207d7a54d49637da94dbc00d2c025b74764d1 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Thu, 1 Sep 2022 11:20:53 -0300 Subject: net/mlx5: Add IFC bits for mkey ATS Allows telling a mkey to use PCI ATS for DMA that flows through it. Link: https://lore.kernel.org/r/1-v1-bd147097458e+ede-umem_dmabuf_jgg@nvidia.com Signed-off-by: Jason Gunthorpe --- include/linux/mlx5/mlx5_ifc.h | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 4acd5610e96b..92602e33a82c 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1707,7 +1707,9 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 steering_format_version[0x4]; u8 create_qp_start_hint[0x18]; - u8 reserved_at_460[0x3]; + u8 reserved_at_460[0x1]; + u8 ats[0x1]; + u8 reserved_at_462[0x1]; u8 log_max_uctx[0x5]; u8 reserved_at_468[0x2]; u8 ipsec_offload[0x1]; @@ -3873,7 +3875,9 @@ struct mlx5_ifc_mkc_bits { u8 lw[0x1]; u8 lr[0x1]; u8 access_mode_1_0[0x2]; - u8 reserved_at_18[0x8]; + u8 reserved_at_18[0x2]; + u8 ma_translation_mode[0x2]; + u8 reserved_at_1c[0x4]; u8 qpn[0x18]; u8 mkey_7_0[0x8]; @@ -11134,7 +11138,8 @@ struct mlx5_ifc_dealloc_memic_out_bits { struct mlx5_ifc_umem_bits { u8 reserved_at_0[0x80]; - u8 reserved_at_80[0x1b]; + u8 ats[0x1]; + u8 reserved_at_81[0x1a]; u8 log_page_size[0x5]; u8 page_offset[0x20]; -- cgit From 987f20a9dcce3989e48d87cff3952c095c994445 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Mon, 26 Sep 2022 17:15:32 -0500 Subject: a.out: Remove the a.out implementation In commit 19e8b701e258 ("a.out: Stop building a.out/osf1 support on alpha and m68k") the last users of a.out were disabled. As nothing has turned up to cause this change to be reverted, let's remove the code implementing a.out support as well. There may be userspace users of the uapi bits left so the uapi headers have been left untouched. Signed-off-by: "Eric W. Biederman" Acked-by: Arnd Bergmann # arm defconfigs Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/871qrx3hq3.fsf@email.froward.int.ebiederm.org --- include/linux/a.out.h | 18 ------------------ 1 file changed, 18 deletions(-) delete mode 100644 include/linux/a.out.h (limited to 'include/linux') diff --git a/include/linux/a.out.h b/include/linux/a.out.h deleted file mode 100644 index 600cf45645c6..000000000000 --- a/include/linux/a.out.h +++ /dev/null @@ -1,18 +0,0 @@ -/* SPDX-License-Identifier: GPL-2.0 */ -#ifndef __A_OUT_GNU_H__ -#define __A_OUT_GNU_H__ - -#include - -#ifndef __ASSEMBLY__ -#ifdef linux -#include -#if defined(__i386__) || defined(__mc68000__) -#else -#ifndef SEGMENT_SIZE -#define SEGMENT_SIZE PAGE_SIZE -#endif -#endif -#endif -#endif /*__ASSEMBLY__ */ -#endif /* __A_OUT_GNU_H__ */ -- cgit From 568ec936bf1384fc15873908c96a9aeb62536edb Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 27 Sep 2022 09:58:15 +0200 Subject: block: replace blk_queue_nowait with bdev_nowait Replace blk_queue_nowait with a bdev_nowait helpers that takes the block_device given that the I/O submission path should not have to look into the request_queue. Signed-off-by: Christoph Hellwig Reviewed-by: Pankaj Raghav Link: https://lore.kernel.org/r/20220927075815.269694-1-hch@lst.de Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 84b13fdd34a7..4750772ef228 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -618,7 +618,6 @@ bool blk_queue_flag_test_and_set(unsigned int flag, struct request_queue *q); #define blk_queue_quiesced(q) test_bit(QUEUE_FLAG_QUIESCED, &(q)->queue_flags) #define blk_queue_pm_only(q) atomic_read(&(q)->pm_only) #define blk_queue_registered(q) test_bit(QUEUE_FLAG_REGISTERED, &(q)->queue_flags) -#define blk_queue_nowait(q) test_bit(QUEUE_FLAG_NOWAIT, &(q)->queue_flags) #define blk_queue_sq_sched(q) test_bit(QUEUE_FLAG_SQ_SCHED, &(q)->queue_flags) extern void blk_set_pm_only(struct request_queue *q); @@ -1280,6 +1279,11 @@ static inline bool bdev_fua(struct block_device *bdev) return test_bit(QUEUE_FLAG_FUA, &bdev_get_queue(bdev)->queue_flags); } +static inline bool bdev_nowait(struct block_device *bdev) +{ + return test_bit(QUEUE_FLAG_NOWAIT, &bdev_get_queue(bdev)->queue_flags); +} + static inline enum blk_zoned_model bdev_zoned_model(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); -- cgit From 644c4229559257cadc4267fc36c2dc22ee9c040f Mon Sep 17 00:00:00 2001 From: Konrad Dybcio Date: Wed, 21 Sep 2022 02:44:58 +0200 Subject: clk: qcom: smd: Add SM6375 clocks Add support for controlling SMD RPM clocks on SM6375. Signed-off-by: Konrad Dybcio Signed-off-by: Bjorn Andersson Link: https://lore.kernel.org/r/20220921004458.151842-3-konrad.dybcio@somainline.org --- include/linux/soc/qcom/smd-rpm.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/soc/qcom/smd-rpm.h b/include/linux/soc/qcom/smd-rpm.h index 82c9d489833a..3ab8c07f71c0 100644 --- a/include/linux/soc/qcom/smd-rpm.h +++ b/include/linux/soc/qcom/smd-rpm.h @@ -41,6 +41,7 @@ struct qcom_smd_rpm; #define QCOM_SMD_RPM_HWKM_CLK 0x6d6b7768 #define QCOM_SMD_RPM_PKA_CLK 0x616b70 #define QCOM_SMD_RPM_MCFG_CLK 0x6766636d +#define QCOM_SMD_RPM_MMXI_CLK 0x69786d6d int qcom_rpm_smd_write(struct qcom_smd_rpm *rpm, int state, -- cgit From 3008119a3dd8052b7e5c07488e20a7abb0d287f2 Mon Sep 17 00:00:00 2001 From: Gaosheng Cui Date: Fri, 23 Sep 2022 17:00:12 +0800 Subject: ftrace: Remove obsoleted code from ftrace and task_struct The trace of "struct task_struct" was no longer used since commit 345ddcc882d8 ("ftrace: Have set_ftrace_pid use the bitmap like events do"), and the functions about flags for current->trace is useless, so remove them. Link: https://lkml.kernel.org/r/20220923090012.505990-1-cuigaosheng1@huawei.com Signed-off-by: Gaosheng Cui Signed-off-by: Steven Rostedt (Google) --- include/linux/ftrace.h | 41 ----------------------------------------- include/linux/sched.h | 3 --- 2 files changed, 44 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 0b61371e287b..62557d4bffc2 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -1122,47 +1122,6 @@ static inline void unpause_graph_tracing(void) { } #endif /* CONFIG_FUNCTION_GRAPH_TRACER */ #ifdef CONFIG_TRACING - -/* flags for current->trace */ -enum { - TSK_TRACE_FL_TRACE_BIT = 0, - TSK_TRACE_FL_GRAPH_BIT = 1, -}; -enum { - TSK_TRACE_FL_TRACE = 1 << TSK_TRACE_FL_TRACE_BIT, - TSK_TRACE_FL_GRAPH = 1 << TSK_TRACE_FL_GRAPH_BIT, -}; - -static inline void set_tsk_trace_trace(struct task_struct *tsk) -{ - set_bit(TSK_TRACE_FL_TRACE_BIT, &tsk->trace); -} - -static inline void clear_tsk_trace_trace(struct task_struct *tsk) -{ - clear_bit(TSK_TRACE_FL_TRACE_BIT, &tsk->trace); -} - -static inline int test_tsk_trace_trace(struct task_struct *tsk) -{ - return tsk->trace & TSK_TRACE_FL_TRACE; -} - -static inline void set_tsk_trace_graph(struct task_struct *tsk) -{ - set_bit(TSK_TRACE_FL_GRAPH_BIT, &tsk->trace); -} - -static inline void clear_tsk_trace_graph(struct task_struct *tsk) -{ - clear_bit(TSK_TRACE_FL_GRAPH_BIT, &tsk->trace); -} - -static inline int test_tsk_trace_graph(struct task_struct *tsk) -{ - return tsk->trace & TSK_TRACE_FL_GRAPH; -} - enum ftrace_dump_mode; extern enum ftrace_dump_mode ftrace_dump_on_oops; diff --git a/include/linux/sched.h b/include/linux/sched.h index e7b2f8a5c711..c7ee04e9147a 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1381,9 +1381,6 @@ struct task_struct { #endif #ifdef CONFIG_TRACING - /* State flags for use by tracers: */ - unsigned long trace; - /* Bitmask and counter of trace recursion: */ unsigned long trace_recursion; #endif /* CONFIG_TRACING */ -- cgit From 976a859c9c68fdf160379bfc154431489f318292 Mon Sep 17 00:00:00 2001 From: Aya Levin Date: Wed, 7 Sep 2022 16:36:23 -0700 Subject: net/mlx5: Expose NPPS related registers Add management capability bits indicating firmware may support N pulses per second. Add corresponding fields in MTPPS register. Signed-off-by: Aya Levin Reviewed-by: Eran Ben Elisha Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 06eab92b9fb3..e929b0dcd985 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -9792,7 +9792,9 @@ struct mlx5_ifc_pcam_reg_bits { struct mlx5_ifc_mcam_enhanced_features_bits { u8 reserved_at_0[0x5d]; u8 mcia_32dwords[0x1]; - u8 reserved_at_5e[0xc]; + u8 out_pulse_duration_ns[0x1]; + u8 npps_period[0x1]; + u8 reserved_at_60[0xa]; u8 reset_state[0x1]; u8 ptpcyc2realtime_modify[0x1]; u8 reserved_at_6c[0x2]; @@ -10292,7 +10294,12 @@ struct mlx5_ifc_mtpps_reg_bits { u8 reserved_at_18[0x4]; u8 cap_max_num_of_pps_out_pins[0x4]; - u8 reserved_at_20[0x24]; + u8 reserved_at_20[0x13]; + u8 cap_log_min_npps_period[0x5]; + u8 reserved_at_38[0x3]; + u8 cap_log_min_out_pulse_duration_ns[0x5]; + + u8 reserved_at_40[0x4]; u8 cap_pin_3_mode[0x4]; u8 reserved_at_48[0x4]; u8 cap_pin_2_mode[0x4]; @@ -10311,7 +10318,9 @@ struct mlx5_ifc_mtpps_reg_bits { u8 cap_pin_4_mode[0x4]; u8 field_select[0x20]; - u8 reserved_at_a0[0x60]; + u8 reserved_at_a0[0x20]; + + u8 npps_period[0x40]; u8 enable[0x1]; u8 reserved_at_101[0xb]; @@ -10320,7 +10329,8 @@ struct mlx5_ifc_mtpps_reg_bits { u8 pin_mode[0x4]; u8 pin[0x8]; - u8 reserved_at_120[0x20]; + u8 reserved_at_120[0x2]; + u8 out_pulse_duration_ns[0x1e]; u8 time_stamp[0x40]; -- cgit From f0462bc3e9024e9738fcd1a026456238317d84c7 Mon Sep 17 00:00:00 2001 From: Aya Levin Date: Wed, 7 Sep 2022 16:36:24 -0700 Subject: net/mlx5: Add support for NPPS with real time mode Add support for setting NPPS. NPPS is currently available in REAL_TIME_CLOCK mode only. In addition allow the user to set the pulse duration. When NPPS pulse duration is not set explicitly by the user, driver set it to 50% of the NPPS period. Signed-off-by: Aya Levin Reviewed-by: Eran Ben Elisha Reviewed-by: Saeed Mahameed Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 954af9cafb1d..481506376344 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -698,6 +698,8 @@ struct mlx5_pps { struct work_struct out_work; u64 start[MAX_PIN_NUM]; u8 enabled; + u64 min_npps_period; + u64 min_out_pulse_duration_ns; }; struct mlx5_timer { -- cgit From 8d1ac895fff96a228db20db92243e93687659ef7 Mon Sep 17 00:00:00 2001 From: "Liu, Changcheng" Date: Wed, 7 Sep 2022 16:36:25 -0700 Subject: net/mlx5: add IFC bits for bypassing port select flow table port_select_flow_table_bypass - When set, device supports bypass port select flow table. active_port - Bitmask indicates the current active ports in PORT_SELECT_FT LAG. MLX5_SET_HCA_CAP_OP_MODE_PORT_SELECTION - op_mod to operate PORT_SELECTION_Capabilities. Signed-off-by: Liu, Changcheng Reviewed-by: Mark Bloch Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index e929b0dcd985..b7f4e93df0f2 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -68,6 +68,7 @@ enum { MLX5_SET_HCA_CAP_OP_MOD_ODP = 0x2, MLX5_SET_HCA_CAP_OP_MOD_ATOMIC = 0x3, MLX5_SET_HCA_CAP_OP_MOD_ROCE = 0x4, + MLX5_SET_HCA_CAP_OP_MODE_PORT_SELECTION = 0x25, }; enum { @@ -814,7 +815,9 @@ struct mlx5_ifc_flow_table_nic_cap_bits { struct mlx5_ifc_port_selection_cap_bits { u8 reserved_at_0[0x10]; u8 port_select_flow_table[0x1]; - u8 reserved_at_11[0xf]; + u8 reserved_at_11[0x1]; + u8 port_select_flow_table_bypass[0x1]; + u8 reserved_at_13[0xd]; u8 reserved_at_20[0x1e0]; @@ -10933,7 +10936,9 @@ struct mlx5_ifc_lagc_bits { u8 reserved_at_18[0x5]; u8 lag_state[0x3]; - u8 reserved_at_20[0x14]; + u8 reserved_at_20[0xc]; + u8 active_port[0x4]; + u8 reserved_at_30[0x4]; u8 tx_remap_affinity_2[0x4]; u8 reserved_at_38[0x4]; u8 tx_remap_affinity_1[0x4]; -- cgit From a83bb5df2ac604ab418fbe0a8720f55de46652eb Mon Sep 17 00:00:00 2001 From: "Liu, Changcheng" Date: Wed, 7 Sep 2022 16:36:26 -0700 Subject: RDMA/mlx5: Don't set tx affinity when lag is in hash mode In hash mode, without setting tx affinity explicitly, the port select flow table decides which port is used for the traffic. If port_select_flow_table_bypass capability is supported and tx affinity is set explicitly for QP/TIS, they will be added into the explicit affinity table in FW to check which port is used for the traffic. 1. The overloaded explicit affinity table may affect performance. To avoid this, do not set tx affinity explicitly by default. 2. The packets of the same flow need to be transmitted on the same port. Because the packets of the same flow use different QPs in slow & fast path, it shouldn't set tx affinity explicitly for these QPs. Signed-off-by: Liu, Changcheng Reviewed-by: Mark Bloch Reviewed-by: Vlad Buslov Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 481506376344..9114caceb2b5 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1153,6 +1153,7 @@ int mlx5_cmd_destroy_vport_lag(struct mlx5_core_dev *dev); bool mlx5_lag_is_roce(struct mlx5_core_dev *dev); bool mlx5_lag_is_sriov(struct mlx5_core_dev *dev); bool mlx5_lag_is_active(struct mlx5_core_dev *dev); +bool mlx5_lag_mode_is_hash(struct mlx5_core_dev *dev); bool mlx5_lag_is_master(struct mlx5_core_dev *dev); bool mlx5_lag_is_shared_fdb(struct mlx5_core_dev *dev); struct net_device *mlx5_lag_get_roce_netdev(struct mlx5_core_dev *dev); -- cgit From 66af4fe37119dbf6a82078949a82e625155777ef Mon Sep 17 00:00:00 2001 From: Gal Pressman Date: Wed, 7 Sep 2022 16:36:30 -0700 Subject: net/mlx5: Remove unused functions Remove functions which are no longer used in the driver: mlx5e_ipsec_is_tx_flow mlx5_health_flush get_cqe_enhanced_num_mini_cqes get_cqe_l3_hdr_type mlx5_health_flush mlx5_fs_is_ipsec_flow _mlx5_fs_is_outer_ipproto_flow mlx5_fs_is_outer_tcp_flow mlx5_fs_is_outer_udp_flow mlx5_fs_is_vxlan_flow mlx5_fs_is_outer_ipsec_flow Signed-off-by: Gal Pressman Reviewed-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- include/linux/mlx5/device.h | 11 ---------- include/linux/mlx5/driver.h | 1 - include/linux/mlx5/fs_helpers.h | 48 ----------------------------------------- 3 files changed, 60 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 5b41b9fb3d48..753753f3ce12 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -874,12 +874,6 @@ static inline u8 get_cqe_opcode(struct mlx5_cqe64 *cqe) return cqe->op_own >> 4; } -static inline u8 get_cqe_enhanced_num_mini_cqes(struct mlx5_cqe64 *cqe) -{ - /* num_of_mini_cqes is zero based */ - return get_cqe_opcode(cqe) + 1; -} - static inline u8 get_cqe_lro_tcppsh(struct mlx5_cqe64 *cqe) { return (cqe->lro.tcppsh_abort_dupack >> 6) & 1; @@ -890,11 +884,6 @@ static inline u8 get_cqe_l4_hdr_type(struct mlx5_cqe64 *cqe) return (cqe->l4_l3_hdr_type >> 4) & 0x7; } -static inline u8 get_cqe_l3_hdr_type(struct mlx5_cqe64 *cqe) -{ - return (cqe->l4_l3_hdr_type >> 2) & 0x3; -} - static inline bool cqe_is_tunneled(struct mlx5_cqe64 *cqe) { return cqe->tls_outer_l3_tunneled & 0x1; diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 9114caceb2b5..52f34fe47049 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1018,7 +1018,6 @@ int mlx5_cmd_exec_polling(struct mlx5_core_dev *dev, void *in, int in_size, bool mlx5_cmd_is_down(struct mlx5_core_dev *dev); int mlx5_core_get_caps(struct mlx5_core_dev *dev, enum mlx5_cap_type cap_type); -void mlx5_health_flush(struct mlx5_core_dev *dev); void mlx5_health_cleanup(struct mlx5_core_dev *dev); int mlx5_health_init(struct mlx5_core_dev *dev); void mlx5_start_health_poll(struct mlx5_core_dev *dev); diff --git a/include/linux/mlx5/fs_helpers.h b/include/linux/mlx5/fs_helpers.h index 9db21cd0e92c..bc5125bc0561 100644 --- a/include/linux/mlx5/fs_helpers.h +++ b/include/linux/mlx5/fs_helpers.h @@ -38,46 +38,6 @@ #define MLX5_FS_IPV4_VERSION 4 #define MLX5_FS_IPV6_VERSION 6 -static inline bool mlx5_fs_is_ipsec_flow(const u32 *match_c) -{ - void *misc_params_c = MLX5_ADDR_OF(fte_match_param, match_c, - misc_parameters); - - return MLX5_GET(fte_match_set_misc, misc_params_c, outer_esp_spi); -} - -static inline bool _mlx5_fs_is_outer_ipproto_flow(const u32 *match_c, - const u32 *match_v, u8 match) -{ - const void *headers_c = MLX5_ADDR_OF(fte_match_param, match_c, - outer_headers); - const void *headers_v = MLX5_ADDR_OF(fte_match_param, match_v, - outer_headers); - - return MLX5_GET(fte_match_set_lyr_2_4, headers_c, ip_protocol) == 0xff && - MLX5_GET(fte_match_set_lyr_2_4, headers_v, ip_protocol) == match; -} - -static inline bool mlx5_fs_is_outer_tcp_flow(const u32 *match_c, - const u32 *match_v) -{ - return _mlx5_fs_is_outer_ipproto_flow(match_c, match_v, IPPROTO_TCP); -} - -static inline bool mlx5_fs_is_outer_udp_flow(const u32 *match_c, - const u32 *match_v) -{ - return _mlx5_fs_is_outer_ipproto_flow(match_c, match_v, IPPROTO_UDP); -} - -static inline bool mlx5_fs_is_vxlan_flow(const u32 *match_c) -{ - void *misc_params_c = MLX5_ADDR_OF(fte_match_param, match_c, - misc_parameters); - - return MLX5_GET(fte_match_set_misc, misc_params_c, vxlan_vni); -} - static inline bool _mlx5_fs_is_outer_ipv_flow(struct mlx5_core_dev *mdev, const u32 *match_c, const u32 *match_v, int version) @@ -131,12 +91,4 @@ mlx5_fs_is_outer_ipv6_flow(struct mlx5_core_dev *mdev, const u32 *match_c, MLX5_FS_IPV6_VERSION); } -static inline bool mlx5_fs_is_outer_ipsec_flow(const u32 *match_c) -{ - void *misc_params_c = - MLX5_ADDR_OF(fte_match_param, match_c, misc_parameters); - - return MLX5_GET(fte_match_set_misc, misc_params_c, outer_esp_spi); -} - #endif -- cgit From b53ff37fcd5c7038d9dd000dcf682575a8f47999 Mon Sep 17 00:00:00 2001 From: Gal Pressman Date: Wed, 7 Sep 2022 16:36:31 -0700 Subject: net/mlx5: Remove unused structs Remove structs which are no longer used in the driver: mlx5dr_cmd_qp_create_attr mlx5_fs_dr_ns mlx5_pas Signed-off-by: Gal Pressman Reviewed-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 52f34fe47049..949feb7f889f 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -857,11 +857,6 @@ struct mlx5_cmd_work_ent { refcount_t refcnt; }; -struct mlx5_pas { - u64 pa; - u8 log_sz; -}; - enum phy_port_state { MLX5_AAA_111 }; -- cgit From 9175d8103780084e70bc89f2040ea62dc30f6f60 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky Date: Wed, 7 Sep 2022 16:36:32 -0700 Subject: net/mlx5: Remove from FPGA IFC file not-needed definitions Move IP layout bits definitions to be close to the place that actually uses it, together with removal extra defines that not in-use. Reviewed-by: Raed Salem Signed-off-by: Leon Romanovsky Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 16 ++++++++++++++++ include/linux/mlx5/mlx5_ifc_fpga.h | 24 ------------------------ 2 files changed, 16 insertions(+), 24 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index b7f4e93df0f2..c25f2ce75734 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -478,6 +478,22 @@ struct mlx5_ifc_odp_per_transport_service_cap_bits { u8 reserved_at_6[0x1a]; }; +struct mlx5_ifc_ipv4_layout_bits { + u8 reserved_at_0[0x60]; + + u8 ipv4[0x20]; +}; + +struct mlx5_ifc_ipv6_layout_bits { + u8 ipv6[16][0x8]; +}; + +union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits { + struct mlx5_ifc_ipv6_layout_bits ipv6_layout; + struct mlx5_ifc_ipv4_layout_bits ipv4_layout; + u8 reserved_at_0[0x80]; +}; + struct mlx5_ifc_fte_match_set_lyr_2_4_bits { u8 smac_47_16[0x20]; diff --git a/include/linux/mlx5/mlx5_ifc_fpga.h b/include/linux/mlx5/mlx5_ifc_fpga.h index 45c7c0d67635..0596472923ad 100644 --- a/include/linux/mlx5/mlx5_ifc_fpga.h +++ b/include/linux/mlx5/mlx5_ifc_fpga.h @@ -32,30 +32,6 @@ #ifndef MLX5_IFC_FPGA_H #define MLX5_IFC_FPGA_H -struct mlx5_ifc_ipv4_layout_bits { - u8 reserved_at_0[0x60]; - - u8 ipv4[0x20]; -}; - -struct mlx5_ifc_ipv6_layout_bits { - u8 ipv6[16][0x8]; -}; - -union mlx5_ifc_ipv6_layout_ipv4_layout_auto_bits { - struct mlx5_ifc_ipv6_layout_bits ipv6_layout; - struct mlx5_ifc_ipv4_layout_bits ipv4_layout; - u8 reserved_at_0[0x80]; -}; - -enum { - MLX5_FPGA_CAP_SANDBOX_VENDOR_ID_MLNX = 0x2c9, -}; - -enum { - MLX5_FPGA_CAP_SANDBOX_PRODUCT_ID_IPSEC = 0x2, -}; - struct mlx5_ifc_fpga_shell_caps_bits { u8 max_num_qps[0x10]; u8 reserved_at_10[0x8]; -- cgit From 7b084630153152239d84990ac4540c2dd360186f Mon Sep 17 00:00:00 2001 From: Namhyung Kim Date: Wed, 21 Sep 2022 15:00:31 -0700 Subject: perf: Use sample_flags for addr Use the new sample_flags to indicate whether the addr field is filled by the PMU driver. As most PMU drivers pass 0, it can set the flag only if it has a non-zero value. And use 0 in perf_sample_output() if it's not filled already. Signed-off-by: Namhyung Kim Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20220921220032.2858517-1-namhyung@kernel.org --- include/linux/perf_event.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 368bdc4f563f..f4a13579b0e8 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1028,7 +1028,6 @@ struct perf_sample_data { * minimize the cachelines touched. */ u64 sample_flags; - u64 addr; struct perf_raw_record *raw; u64 period; @@ -1040,6 +1039,7 @@ struct perf_sample_data { union perf_sample_weight weight; union perf_mem_data_src data_src; u64 txn; + u64 addr; u64 type; u64 ip; @@ -1079,9 +1079,13 @@ static inline void perf_sample_data_init(struct perf_sample_data *data, { /* remaining struct members initialized in perf_prepare_sample() */ data->sample_flags = 0; - data->addr = addr; data->raw = NULL; data->period = period; + + if (addr) { + data->addr = addr; + data->sample_flags |= PERF_SAMPLE_ADDR; + } } /* -- cgit From 838d9bb62d132ec3baf1b5aba2e95ef9a7a9a3cd Mon Sep 17 00:00:00 2001 From: Namhyung Kim Date: Wed, 21 Sep 2022 15:00:32 -0700 Subject: perf: Use sample_flags for raw_data Use the new sample_flags to indicate whether the raw data field is filled by the PMU driver. Although it could check with the NULL, follow the same rule with other fields. Remove the raw field from the perf_sample_data_init() to minimize the number of cache lines touched. Signed-off-by: Namhyung Kim Signed-off-by: Peter Zijlstra (Intel) Link: https://lkml.kernel.org/r/20220921220032.2858517-2-namhyung@kernel.org --- include/linux/perf_event.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index f4a13579b0e8..e9b151cde491 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1028,7 +1028,6 @@ struct perf_sample_data { * minimize the cachelines touched. */ u64 sample_flags; - struct perf_raw_record *raw; u64 period; /* @@ -1040,6 +1039,7 @@ struct perf_sample_data { union perf_mem_data_src data_src; u64 txn; u64 addr; + struct perf_raw_record *raw; u64 type; u64 ip; @@ -1078,8 +1078,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data, u64 addr, u64 period) { /* remaining struct members initialized in perf_prepare_sample() */ - data->sample_flags = 0; - data->raw = NULL; + data->sample_flags = PERF_SAMPLE_PERIOD; data->period = period; if (addr) { -- cgit From b8a94bfb33952bb17fbc65f8903d242a721c533d Mon Sep 17 00:00:00 2001 From: Miguel Ojeda Date: Mon, 5 Apr 2021 05:03:50 +0200 Subject: kallsyms: increase maximum kernel symbol length to 512 Rust symbols can become quite long due to namespacing introduced by modules, types, traits, generics, etc. For instance, the following code: pub mod my_module { pub struct MyType; pub struct MyGenericType(T); pub trait MyTrait { fn my_method() -> u32; } impl MyTrait for MyGenericType { fn my_method() -> u32 { 42 } } } generates a symbol of length 96 when using the upcoming v0 mangling scheme: _RNvXNtCshGpAVYOtgW1_7example9my_moduleINtB2_13MyGenericTypeNtB2_6MyTypeENtB2_7MyTrait9my_method At the moment, Rust symbols may reach up to 300 in length. Setting 512 as the maximum seems like a reasonable choice to keep some headroom. Reviewed-by: Kees Cook Reviewed-by: Petr Mladek Reviewed-by: Greg Kroah-Hartman Co-developed-by: Alex Gaynor Signed-off-by: Alex Gaynor Co-developed-by: Wedson Almeida Filho Signed-off-by: Wedson Almeida Filho Co-developed-by: Gary Guo Signed-off-by: Gary Guo Co-developed-by: Boqun Feng Signed-off-by: Boqun Feng Signed-off-by: Miguel Ojeda --- include/linux/kallsyms.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/kallsyms.h b/include/linux/kallsyms.h index ad39636e0c3f..649faac31ddb 100644 --- a/include/linux/kallsyms.h +++ b/include/linux/kallsyms.h @@ -15,7 +15,7 @@ #include -#define KSYM_NAME_LEN 128 +#define KSYM_NAME_LEN 512 #define KSYM_SYMBOL_LEN (sizeof("%s+%#lx/%#lx [%s %s]") + \ (KSYM_NAME_LEN - 1) + \ 2*(BITS_PER_LONG*3/10) + (MODULE_NAME_LEN - 1) + \ -- cgit From 2f7ab1267dc9b2d1f29695aff3211c87483480f3 Mon Sep 17 00:00:00 2001 From: Miguel Ojeda Date: Sat, 3 Jul 2021 16:42:57 +0200 Subject: Kbuild: add Rust support MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Having most of the new files in place, we now enable Rust support in the build system, including `Kconfig` entries related to Rust, the Rust configuration printer and a few other bits. Reviewed-by: Kees Cook Reviewed-by: Nick Desaulniers Tested-by: Nick Desaulniers Reviewed-by: Greg Kroah-Hartman Co-developed-by: Alex Gaynor Signed-off-by: Alex Gaynor Co-developed-by: Finn Behrens Signed-off-by: Finn Behrens Co-developed-by: Adam Bratschi-Kaye Signed-off-by: Adam Bratschi-Kaye Co-developed-by: Wedson Almeida Filho Signed-off-by: Wedson Almeida Filho Co-developed-by: Michael Ellerman Signed-off-by: Michael Ellerman Co-developed-by: Sven Van Asbroeck Signed-off-by: Sven Van Asbroeck Co-developed-by: Gary Guo Signed-off-by: Gary Guo Co-developed-by: Boris-Chengbiao Zhou Signed-off-by: Boris-Chengbiao Zhou Co-developed-by: Boqun Feng Signed-off-by: Boqun Feng Co-developed-by: Douglas Su Signed-off-by: Douglas Su Co-developed-by: Dariusz Sosnowski Signed-off-by: Dariusz Sosnowski Co-developed-by: Antonio Terceiro Signed-off-by: Antonio Terceiro Co-developed-by: Daniel Xu Signed-off-by: Daniel Xu Co-developed-by: Björn Roy Baron Signed-off-by: Björn Roy Baron Co-developed-by: Martin Rodriguez Reboredo Signed-off-by: Martin Rodriguez Reboredo Signed-off-by: Miguel Ojeda --- include/linux/compiler_types.h | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h index 4f2a819fd60a..50b3f6b9502e 100644 --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -4,8 +4,12 @@ #ifndef __ASSEMBLY__ +/* + * Skipped when running bindgen due to a libclang issue; + * see https://github.com/rust-lang/rust-bindgen/issues/2244. + */ #if defined(CONFIG_DEBUG_INFO_BTF) && defined(CONFIG_PAHOLE_HAS_BTF_TAG) && \ - __has_attribute(btf_type_tag) + __has_attribute(btf_type_tag) && !defined(__BINDGEN__) # define BTF_TYPE_TAG(value) __attribute__((btf_type_tag(#value))) #else # define BTF_TYPE_TAG(value) /* nothing */ -- cgit From 5aec788aeb8eb74282b75ac1b317beb0fbb69a42 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 27 Sep 2022 21:02:34 +0200 Subject: sched: Fix TASK_state comparisons Task state is fundamentally a bitmask; direct comparisons are probably not working as intended. Specifically the normal wait-state have a number of possible modifiers: TASK_UNINTERRUPTIBLE: TASK_WAKEKILL, TASK_NOLOAD, TASK_FREEZABLE TASK_INTERRUPTIBLE: TASK_FREEZABLE Specifically, the addition of TASK_FREEZABLE wrecked __wait_is_interruptible(). This however led to an audit of direct comparisons yielding the rest of the changes. Fixes: f5d39b020809 ("freezer,sched: Rewrite core freezer logic") Reported-by: Christian Borntraeger Debugged-by: Christian Borntraeger Signed-off-by: Peter Zijlstra (Intel) Tested-by: Christian Borntraeger --- include/linux/wait.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/wait.h b/include/linux/wait.h index 14ad8a0e9fac..7f5a51aae0a7 100644 --- a/include/linux/wait.h +++ b/include/linux/wait.h @@ -281,7 +281,7 @@ static inline void wake_up_pollfree(struct wait_queue_head *wq_head) #define ___wait_is_interruptible(state) \ (!__builtin_constant_p(state) || \ - state == TASK_INTERRUPTIBLE || state == TASK_KILLABLE) \ + (state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL))) extern void init_wait_entry(struct wait_queue_entry *wq_entry, int flags); -- cgit From 99df7a2810b6d24651d4887ab61a142e042fb235 Mon Sep 17 00:00:00 2001 From: Nathan Lynch Date: Mon, 26 Sep 2022 08:16:42 -0500 Subject: powerpc/pseries: block untrusted device tree changes when locked down The /proc/powerpc/ofdt interface allows the root user to freely alter the in-kernel device tree, enabling arbitrary physical address writes via drivers that could bind to malicious device nodes, thus making it possible to disable lockdown. Historically this interface has been used on the pseries platform to facilitate the runtime addition and removal of processor, memory, and device resources (aka Dynamic Logical Partitioning or DLPAR). Years ago, the processor and memory use cases were migrated to designs that happen to be lockdown-friendly: device tree updates are communicated directly to the kernel from firmware without passing through untrusted user space. I/O device DLPAR via the "drmgr" command in powerpc-utils remains the sole legitimate user of /proc/powerpc/ofdt, but it is already broken in lockdown since it uses /dev/mem to allocate argument buffers for the rtas syscall. So only illegitimate uses of the interface should see a behavior change when running on a locked down kernel. Signed-off-by: Nathan Lynch Acked-by: Paul Moore (LSM) Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20220926131643.146502-2-nathanl@linux.ibm.com --- include/linux/security.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/security.h b/include/linux/security.h index 1bc362cb413f..7da801ceb5a4 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -114,6 +114,7 @@ enum lockdown_reason { LOCKDOWN_IOPORT, LOCKDOWN_MSR, LOCKDOWN_ACPI_TABLES, + LOCKDOWN_DEVICE_TREE, LOCKDOWN_PCMCIA_CIS, LOCKDOWN_TIOCSSERIAL, LOCKDOWN_MODULE_PARAMETERS, -- cgit From b8f3e48834fe8c86b4f21739c6effd160e2c2c19 Mon Sep 17 00:00:00 2001 From: Nathan Lynch Date: Mon, 26 Sep 2022 08:16:43 -0500 Subject: powerpc/rtas: block error injection when locked down The error injection facility on pseries VMs allows corruption of arbitrary guest memory, potentially enabling a sufficiently privileged user to disable lockdown or perform other modifications of the running kernel via the rtas syscall. Block the PAPR error injection facility from being opened or called when locked down. Signed-off-by: Nathan Lynch Acked-by: Paul Moore (LSM) Signed-off-by: Michael Ellerman Link: https://lore.kernel.org/r/20220926131643.146502-3-nathanl@linux.ibm.com --- include/linux/security.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/security.h b/include/linux/security.h index 7da801ceb5a4..a6d67600759e 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -123,6 +123,7 @@ enum lockdown_reason { LOCKDOWN_XMON_WR, LOCKDOWN_BPF_WRITE_USER, LOCKDOWN_DBG_WRITE_KERNEL, + LOCKDOWN_RTAS_ERROR_INJECTION, LOCKDOWN_INTEGRITY_MAX, LOCKDOWN_KCORE, LOCKDOWN_KPROBES, -- cgit From 141f3d6256e58103ece1c3dd2835e871f1dde240 Mon Sep 17 00:00:00 2001 From: Damien Le Moal Date: Sat, 24 Sep 2022 15:18:26 +0900 Subject: ata: libata-sata: Fix device queue depth control The function __ata_change_queue_depth() uses the helper ata_scsi_find_dev() to get the ata_device structure of a scsi device and set that device maximum queue depth. However, when the ata device is managed by libsas, ata_scsi_find_dev() returns NULL, turning __ata_change_queue_depth() into a nop, which prevents the user from setting the maximum queue depth of ATA devices used with libsas based HBAs. Fix this by renaming __ata_change_queue_depth() to ata_change_queue_depth() and adding a pointer to the ata_device structure of the target device as argument. This pointer is provided by ata_scsi_change_queue_depth() using ata_scsi_find_dev() in the case of a libata managed device and by sas_change_queue_depth() using sas_to_ata_dev() in the case of a libsas managed ata device. Signed-off-by: Damien Le Moal Tested-by: John Garry --- include/linux/libata.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/libata.h b/include/linux/libata.h index 698032e5ef2d..20765d1c5f80 100644 --- a/include/linux/libata.h +++ b/include/linux/libata.h @@ -1136,8 +1136,8 @@ extern int ata_scsi_slave_config(struct scsi_device *sdev); extern void ata_scsi_slave_destroy(struct scsi_device *sdev); extern int ata_scsi_change_queue_depth(struct scsi_device *sdev, int queue_depth); -extern int __ata_change_queue_depth(struct ata_port *ap, struct scsi_device *sdev, - int queue_depth); +extern int ata_change_queue_depth(struct ata_port *ap, struct ata_device *dev, + struct scsi_device *sdev, int queue_depth); extern struct ata_device *ata_dev_pair(struct ata_device *adev); extern int ata_do_set_mode(struct ata_link *link, struct ata_device **r_failed_dev); extern void ata_scsi_port_error_handler(struct Scsi_Host *host, struct ata_port *ap); -- cgit From f25723dcef4a38f6a39e17afeabd1adf6402230e Mon Sep 17 00:00:00 2001 From: Vincent Whitchurch Date: Tue, 27 Sep 2022 13:21:14 +0200 Subject: spi: Save current RX and TX DMA devices Save the current RX and TX DMA devices to avoid having to duplicate the logic to pick them, since we'll need access to them in some more functions to fix a bug in the cache handling. Signed-off-by: Vincent Whitchurch Link: https://lore.kernel.org/r/20220927112117.77599-2-vincent.whitchurch@axis.com Signed-off-by: Mark Brown --- include/linux/spi/spi.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index 19acd4c9c7e3..4b4041e9895a 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -378,6 +378,8 @@ extern struct spi_device *spi_new_ancillary_device(struct spi_device *spi, u8 ch * @cleanup: frees controller-specific state * @can_dma: determine whether this controller supports DMA * @dma_map_dev: device which can be used for DMA mapping + * @cur_rx_dma_dev: device which is currently used for RX DMA mapping + * @cur_tx_dma_dev: device which is currently used for TX DMA mapping * @queued: whether this controller is providing an internal message queue * @kworker: pointer to thread struct for message pump * @pump_messages: work struct for scheduling work to the message pump @@ -609,6 +611,8 @@ struct spi_controller { struct spi_device *spi, struct spi_transfer *xfer); struct device *dma_map_dev; + struct device *cur_rx_dma_dev; + struct device *cur_tx_dma_dev; /* * These hooks are for drivers that want to use the generic -- cgit From 612d5494aef9bd2ab68d585a8c0ac2b16d12d520 Mon Sep 17 00:00:00 2001 From: Huacai Chen Date: Tue, 27 Sep 2022 20:45:57 +0800 Subject: irqchip: Make irqchip_init() usable on pure ACPI systems Pure ACPI systems (e.g., LoongArch) do not need OF_IRQ, but still require irqchip_init() to perform the ACPI irqchip probing, even when OF_IRQ isn't selected. Relax the dependency to enable the generic irqchip support when ACPI_GENERIC_GSI is configured. Signed-off-by: Huacai Chen Tested-by: Tiezhu Yang [maz: revamped commit message] Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20220927124557.3246737-1-chenhuacai@loongson.cn --- include/linux/of_irq.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/of_irq.h b/include/linux/of_irq.h index 83fccd0c9bba..d6d3eae2f145 100644 --- a/include/linux/of_irq.h +++ b/include/linux/of_irq.h @@ -37,9 +37,8 @@ extern unsigned int irq_create_of_mapping(struct of_phandle_args *irq_data); extern int of_irq_to_resource(struct device_node *dev, int index, struct resource *r); -extern void of_irq_init(const struct of_device_id *matches); - #ifdef CONFIG_OF_IRQ +extern void of_irq_init(const struct of_device_id *matches); extern int of_irq_parse_one(struct device_node *device, int index, struct of_phandle_args *out_irq); extern int of_irq_count(struct device_node *dev); @@ -57,6 +56,9 @@ extern struct irq_domain *of_msi_map_get_device_domain(struct device *dev, extern void of_msi_configure(struct device *dev, struct device_node *np); u32 of_msi_map_id(struct device *dev, struct device_node *msi_np, u32 id_in); #else +static inline void of_irq_init(const struct of_device_id *matches) +{ +} static inline int of_irq_parse_one(struct device_node *device, int index, struct of_phandle_args *out_irq) { -- cgit From 334f7d42db3eb0274aa6b4aba7ce14d87df3fef0 Mon Sep 17 00:00:00 2001 From: Frank Li Date: Thu, 22 Sep 2022 11:12:42 -0500 Subject: irqchip: Allow extra fields to be passed to IRQCHIP_PLATFORM_DRIVER_END IRQCHIP_PLATFORM_DRIVER_* doesn't allow some fields (such as .pm) to be set in the platform_driver structure. Make IRQCHIP_PLATFORM_DRIVER_END variadic so that .pm or another field can be set if needed. Signed-off-by: Frank Li [maz: revamped commit message] Signed-off-by: Marc Zyngier Link: https://lore.kernel.org/r/20220922161246.20586-3-Frank.Li@nxp.com --- include/linux/irqchip.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/irqchip.h b/include/linux/irqchip.h index 3a091d0710ae..d5e6024cb2a8 100644 --- a/include/linux/irqchip.h +++ b/include/linux/irqchip.h @@ -44,7 +44,8 @@ static const struct of_device_id drv_name##_irqchip_match_table[] = { #define IRQCHIP_MATCH(compat, fn) { .compatible = compat, \ .data = typecheck_irq_init_cb(fn), }, -#define IRQCHIP_PLATFORM_DRIVER_END(drv_name) \ + +#define IRQCHIP_PLATFORM_DRIVER_END(drv_name, ...) \ {}, \ }; \ MODULE_DEVICE_TABLE(of, drv_name##_irqchip_match_table); \ @@ -56,6 +57,7 @@ static struct platform_driver drv_name##_driver = { \ .owner = THIS_MODULE, \ .of_match_table = drv_name##_irqchip_match_table, \ .suppress_bind_attrs = true, \ + __VA_ARGS__ \ }, \ }; \ builtin_platform_driver(drv_name##_driver) -- cgit From 2d48bfca42a6d2b5f92d24ceae4425fc4fae14ab Mon Sep 17 00:00:00 2001 From: Chris Morgan Date: Mon, 8 Aug 2022 12:38:07 -0500 Subject: mfd: rk808: Add Rockchip rk817 battery charger support Add rk817 charger support cell to rk808 mfd driver. Signed-off-by: Chris Morgan Signed-off-by: Maya Matuszczyk Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220808173809.11320-3-macroalpha82@gmail.com --- include/linux/mfd/rk808.h | 91 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 91 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mfd/rk808.h b/include/linux/mfd/rk808.h index 58602032e642..9af1f3105f80 100644 --- a/include/linux/mfd/rk808.h +++ b/include/linux/mfd/rk808.h @@ -519,6 +519,77 @@ enum rk809_reg_id { #define MIC_DIFF_DIS (0x0 << 7) #define MIC_DIFF_EN (0x1 << 7) +/* RK817 Battery Registers */ +#define RK817_GAS_GAUGE_ADC_CONFIG0 0x50 +#define RK817_GG_EN (0x1 << 7) +#define RK817_SYS_VOL_ADC_EN (0x1 << 6) +#define RK817_TS_ADC_EN (0x1 << 5) +#define RK817_USB_VOL_ADC_EN (0x1 << 4) +#define RK817_BAT_VOL_ADC_EN (0x1 << 3) +#define RK817_BAT_CUR_ADC_EN (0x1 << 2) + +#define RK817_GAS_GAUGE_ADC_CONFIG1 0x55 + +#define RK817_VOL_CUR_CALIB_UPD BIT(7) + +#define RK817_GAS_GAUGE_GG_CON 0x56 +#define RK817_GAS_GAUGE_GG_STS 0x57 + +#define RK817_BAT_CON (0x1 << 4) +#define RK817_RELAX_VOL_UPD (0x3 << 2) +#define RK817_RELAX_STS (0x1 << 1) + +#define RK817_GAS_GAUGE_RELAX_THRE_H 0x58 +#define RK817_GAS_GAUGE_RELAX_THRE_L 0x59 +#define RK817_GAS_GAUGE_OCV_THRE_VOL 0x62 +#define RK817_GAS_GAUGE_OCV_VOL_H 0x63 +#define RK817_GAS_GAUGE_OCV_VOL_L 0x64 +#define RK817_GAS_GAUGE_PWRON_VOL_H 0x6b +#define RK817_GAS_GAUGE_PWRON_VOL_L 0x6c +#define RK817_GAS_GAUGE_PWRON_CUR_H 0x6d +#define RK817_GAS_GAUGE_PWRON_CUR_L 0x6e +#define RK817_GAS_GAUGE_OFF_CNT 0x6f +#define RK817_GAS_GAUGE_Q_INIT_H3 0x70 +#define RK817_GAS_GAUGE_Q_INIT_H2 0x71 +#define RK817_GAS_GAUGE_Q_INIT_L1 0x72 +#define RK817_GAS_GAUGE_Q_INIT_L0 0x73 +#define RK817_GAS_GAUGE_Q_PRES_H3 0x74 +#define RK817_GAS_GAUGE_Q_PRES_H2 0x75 +#define RK817_GAS_GAUGE_Q_PRES_L1 0x76 +#define RK817_GAS_GAUGE_Q_PRES_L0 0x77 +#define RK817_GAS_GAUGE_BAT_VOL_H 0x78 +#define RK817_GAS_GAUGE_BAT_VOL_L 0x79 +#define RK817_GAS_GAUGE_BAT_CUR_H 0x7a +#define RK817_GAS_GAUGE_BAT_CUR_L 0x7b +#define RK817_GAS_GAUGE_USB_VOL_H 0x7e +#define RK817_GAS_GAUGE_USB_VOL_L 0x7f +#define RK817_GAS_GAUGE_SYS_VOL_H 0x80 +#define RK817_GAS_GAUGE_SYS_VOL_L 0x81 +#define RK817_GAS_GAUGE_Q_MAX_H3 0x82 +#define RK817_GAS_GAUGE_Q_MAX_H2 0x83 +#define RK817_GAS_GAUGE_Q_MAX_L1 0x84 +#define RK817_GAS_GAUGE_Q_MAX_L0 0x85 +#define RK817_GAS_GAUGE_SLEEP_CON_SAMP_CUR_H 0x8f +#define RK817_GAS_GAUGE_SLEEP_CON_SAMP_CUR_L 0x90 +#define RK817_GAS_GAUGE_CAL_OFFSET_H 0x91 +#define RK817_GAS_GAUGE_CAL_OFFSET_L 0x92 +#define RK817_GAS_GAUGE_VCALIB0_H 0x93 +#define RK817_GAS_GAUGE_VCALIB0_L 0x94 +#define RK817_GAS_GAUGE_VCALIB1_H 0x95 +#define RK817_GAS_GAUGE_VCALIB1_L 0x96 +#define RK817_GAS_GAUGE_IOFFSET_H 0x97 +#define RK817_GAS_GAUGE_IOFFSET_L 0x98 +#define RK817_GAS_GAUGE_BAT_R1 0x9a +#define RK817_GAS_GAUGE_BAT_R2 0x9b +#define RK817_GAS_GAUGE_BAT_R3 0x9c +#define RK817_GAS_GAUGE_DATA0 0x9d +#define RK817_GAS_GAUGE_DATA1 0x9e +#define RK817_GAS_GAUGE_DATA2 0x9f +#define RK817_GAS_GAUGE_DATA3 0xa0 +#define RK817_GAS_GAUGE_DATA4 0xa1 +#define RK817_GAS_GAUGE_DATA5 0xa2 +#define RK817_GAS_GAUGE_CUR_ADC_K0 0xb0 + #define RK817_POWER_EN_REG(i) (0xb1 + (i)) #define RK817_POWER_SLP_EN_REG(i) (0xb5 + (i)) @@ -544,10 +615,30 @@ enum rk809_reg_id { #define RK817_LDO_ON_VSEL_REG(idx) (0xcc + (idx) * 2) #define RK817_BOOST_OTG_CFG (0xde) +#define RK817_PMIC_CHRG_OUT 0xe4 +#define RK817_CHRG_VOL_SEL (0x07 << 4) +#define RK817_CHRG_CUR_SEL (0x07 << 0) + +#define RK817_PMIC_CHRG_IN 0xe5 +#define RK817_USB_VLIM_EN (0x01 << 7) +#define RK817_USB_VLIM_SEL (0x07 << 4) +#define RK817_USB_ILIM_EN (0x01 << 3) +#define RK817_USB_ILIM_SEL (0x07 << 0) +#define RK817_PMIC_CHRG_TERM 0xe6 +#define RK817_CHRG_TERM_ANA_DIG (0x01 << 2) +#define RK817_CHRG_TERM_ANA_SEL (0x03 << 0) +#define RK817_CHRG_EN (0x01 << 6) + +#define RK817_PMIC_CHRG_STS 0xeb +#define RK817_BAT_EXS BIT(7) +#define RK817_CHG_STS (0x07 << 4) + #define RK817_ID_MSB 0xed #define RK817_ID_LSB 0xee #define RK817_SYS_STS 0xf0 +#define RK817_PLUG_IN_STS (0x1 << 6) + #define RK817_SYS_CFG(i) (0xf1 + (i)) #define RK817_ON_SOURCE_REG 0xf5 -- cgit From a47137a5134be2f0b4724fe548362a253727d8b1 Mon Sep 17 00:00:00 2001 From: Linus Walleij Date: Mon, 5 Sep 2022 13:58:10 +0200 Subject: mfd/omap1: htc-i2cpld: Convert to a pure GPIO driver Instead of passing GPIO numbers pertaining to ourselves through platform data, just request GPIO descriptors from our own GPIO chips and use them, and cut down on the unnecessary complexity. Cc: Aaro Koskinen Cc: Janusz Krzysztofik Cc: Tony Lindgren Cc: Cory Maccarrone Cc: linux-omap@vger.kernel.org Signed-off-by: Linus Walleij Signed-off-by: Lee Jones Link: https://lore.kernel.org/r/20220905115810.5987-1-linus.walleij@linaro.org --- include/linux/htcpld.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/htcpld.h b/include/linux/htcpld.h index 842fce69ac06..5f8ac9b1d724 100644 --- a/include/linux/htcpld.h +++ b/include/linux/htcpld.h @@ -13,8 +13,6 @@ struct htcpld_chip_platform_data { }; struct htcpld_core_platform_data { - unsigned int int_reset_gpio_hi; - unsigned int int_reset_gpio_lo; unsigned int i2c_adapter_id; struct htcpld_chip_platform_data *chip; -- cgit From b1cab78ba392162a5ef11173d02db6e182e04edd Mon Sep 17 00:00:00 2001 From: Shakeel Butt Date: Mon, 26 Sep 2022 19:59:31 +0000 Subject: Revert "net: set proper memcg for net_init hooks allocations" MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This reverts commit 1d0403d20f6c281cb3d14c5f1db5317caeec48e9. Anatoly Pugachev reported that the commit 1d0403d20f6c ("net: set proper memcg for net_init hooks allocations") is somehow causing the sparc64 VMs failed to boot and the VMs boot fine with that patch reverted. So, revert the patch for now and later we can debug the issue. Link: https://lore.kernel.org/all/20220918092849.GA10314@u164.east.ru/ Reported-by: Anatoly Pugachev Signed-off-by: Shakeel Butt Cc: Vasily Averin Cc: Jakub Kicinski Cc: Michal Koutný Cc: Andrew Morton Cc: cgroups@vger.kernel.org Cc: sparclinux@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org Tested-by: Anatoly Pugachev Acked-by: Johannes Weiner Fixes: 1d0403d20f6c ("net: set proper memcg for net_init hooks allocations") Reviewed-by: Muchun Song Acked-by: Roman Gushchin Signed-off-by: Linus Torvalds --- include/linux/memcontrol.h | 45 --------------------------------------------- 1 file changed, 45 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 6257867fbf95..567f12323f55 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1788,42 +1788,6 @@ static inline void count_objcg_event(struct obj_cgroup *objcg, rcu_read_unlock(); } -/** - * get_mem_cgroup_from_obj - get a memcg associated with passed kernel object. - * @p: pointer to object from which memcg should be extracted. It can be NULL. - * - * Retrieves the memory group into which the memory of the pointed kernel - * object is accounted. If memcg is found, its reference is taken. - * If a passed kernel object is uncharged, or if proper memcg cannot be found, - * as well as if mem_cgroup is disabled, NULL is returned. - * - * Return: valid memcg pointer with taken reference or NULL. - */ -static inline struct mem_cgroup *get_mem_cgroup_from_obj(void *p) -{ - struct mem_cgroup *memcg; - - rcu_read_lock(); - do { - memcg = mem_cgroup_from_obj(p); - } while (memcg && !css_tryget(&memcg->css)); - rcu_read_unlock(); - return memcg; -} - -/** - * mem_cgroup_or_root - always returns a pointer to a valid memory cgroup. - * @memcg: pointer to a valid memory cgroup or NULL. - * - * If passed argument is not NULL, returns it without any additional checks - * and changes. Otherwise, root_mem_cgroup is returned. - * - * NOTE: root_mem_cgroup can be NULL during early boot. - */ -static inline struct mem_cgroup *mem_cgroup_or_root(struct mem_cgroup *memcg) -{ - return memcg ? memcg : root_mem_cgroup; -} #else static inline bool mem_cgroup_kmem_disabled(void) { @@ -1880,15 +1844,6 @@ static inline void count_objcg_event(struct obj_cgroup *objcg, { } -static inline struct mem_cgroup *get_mem_cgroup_from_obj(void *p) -{ - return NULL; -} - -static inline struct mem_cgroup *mem_cgroup_or_root(struct mem_cgroup *memcg) -{ - return NULL; -} #endif /* CONFIG_MEMCG_KMEM */ #if defined(CONFIG_MEMCG_KMEM) && defined(CONFIG_ZSWAP) -- cgit From 49f27f2b4bfa8b6e26f02df615e544f52648bfb2 Mon Sep 17 00:00:00 2001 From: Peng Fan Date: Wed, 28 Sep 2022 14:47:55 +0800 Subject: remoteproc: Introduce rproc features remote processor may support: - boot recovery with help from main processor - self recovery without help from main processor - iommu - etc Introduce rproc features could simplify code to avoid adding more bool flags Acked-by: Arnaud Pouliquen Signed-off-by: Peng Fan Link: https://lore.kernel.org/r/20220928064756.4059662-2-peng.fan@oss.nxp.com Signed-off-by: Mathieu Poirier --- include/linux/remoteproc.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/remoteproc.h b/include/linux/remoteproc.h index 1abf56ad02da..fe8978eb69f1 100644 --- a/include/linux/remoteproc.h +++ b/include/linux/remoteproc.h @@ -489,6 +489,20 @@ struct rproc_dump_segment { loff_t offset; }; +/** + * enum rproc_features - features supported + * + * @RPROC_FEAT_ATTACH_ON_RECOVERY: The remote processor does not need help + * from Linux to recover, such as firmware + * loading. Linux just needs to attach after + * recovery. + */ + +enum rproc_features { + RPROC_FEAT_ATTACH_ON_RECOVERY, + RPROC_MAX_FEATURES, +}; + /** * struct rproc - represents a physical remote processor device * @node: list node of this rproc object @@ -530,6 +544,7 @@ struct rproc_dump_segment { * @elf_machine: firmware ELF machine * @cdev: character device of the rproc * @cdev_put_on_release: flag to indicate if remoteproc should be shutdown on @char_dev release + * @features: indicate remoteproc features */ struct rproc { struct list_head node; @@ -570,6 +585,7 @@ struct rproc { u16 elf_machine; struct cdev cdev; bool cdev_put_on_release; + DECLARE_BITMAP(features, RPROC_MAX_FEATURES); }; /** -- cgit From f3304ecd7f060db1d4197fbdce5a503259f770d3 Mon Sep 17 00:00:00 2001 From: Masahiro Yamada Date: Fri, 16 Sep 2022 15:29:53 +0900 Subject: linux/export: use inline assembler to populate symbol CRCs Since commit 7b4537199a4a ("kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS"), the module versioning on the (non-upstreamed-yet) kvx Linux port is broken due to unexpected padding for __crc_* symbols. The kvx GCC adds padding so u32 gets 8-byte alignment instead of 4. I do not know if this happens for upstream architectures in general, but any compiler has the freedom to insert padding for faster access. Use the inline assembler to directly specify the wanted data layout. This is how we previously did before the breakage. Link: https://lore.kernel.org/lkml/20220817161438.32039-1-ysionneau@kalray.eu/ Link: https://lore.kernel.org/linux-kbuild/31ce5305-a76b-13d7-ea55-afca82c46cf2@kalray.eu/ Fixes: 7b4537199a4a ("kbuild: link symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS") Reported-by: Yann Sionneau Signed-off-by: Masahiro Yamada Tested-by: Yann Sionneau --- include/linux/export-internal.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/export-internal.h b/include/linux/export-internal.h index c2b1d4fd5987..fe7e6ba918f1 100644 --- a/include/linux/export-internal.h +++ b/include/linux/export-internal.h @@ -10,8 +10,10 @@ #include #include -/* __used is needed to keep __crc_* for LTO */ #define SYMBOL_CRC(sym, crc, sec) \ - u32 __section("___kcrctab" sec "+" #sym) __used __crc_##sym = crc + asm(".section \"___kcrctab" sec "+" #sym "\",\"a\"" "\n" \ + "__crc_" #sym ":" "\n" \ + ".long " #crc "\n" \ + ".previous" "\n") #endif /* __LINUX_EXPORT_INTERNAL_H__ */ -- cgit From f0d74c4da1f060d2a66976193712a5e6abd361f5 Mon Sep 17 00:00:00 2001 From: Kui-Feng Lee Date: Mon, 26 Sep 2022 11:49:53 -0700 Subject: bpf: Parameterize task iterators. Allow creating an iterator that loops through resources of one thread/process. People could only create iterators to loop through all resources of files, vma, and tasks in the system, even though they were interested in only the resources of a specific task or process. Passing the additional parameters, people can now create an iterator to go through all resources or only the resources of a task. Signed-off-by: Kui-Feng Lee Signed-off-by: Andrii Nakryiko Acked-by: Yonghong Song Acked-by: Martin KaFai Lau Link: https://lore.kernel.org/bpf/20220926184957.208194-2-kuifeng@fb.com --- include/linux/bpf.h | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 5161fac0513f..0f3eaf3ed98c 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1796,6 +1796,27 @@ int bpf_obj_get_user(const char __user *pathname, int flags); extern int bpf_iter_ ## target(args); \ int __init bpf_iter_ ## target(args) { return 0; } +/* + * The task type of iterators. + * + * For BPF task iterators, they can be parameterized with various + * parameters to visit only some of tasks. + * + * BPF_TASK_ITER_ALL (default) + * Iterate over resources of every task. + * + * BPF_TASK_ITER_TID + * Iterate over resources of a task/tid. + * + * BPF_TASK_ITER_TGID + * Iterate over resources of every task of a process / task group. + */ +enum bpf_iter_task_type { + BPF_TASK_ITER_ALL = 0, + BPF_TASK_ITER_TID, + BPF_TASK_ITER_TGID, +}; + struct bpf_iter_aux_info { /* for map_elem iter */ struct bpf_map *map; @@ -1805,6 +1826,10 @@ struct bpf_iter_aux_info { struct cgroup *start; /* starting cgroup */ enum bpf_cgroup_iter_order order; } cgroup; + struct { + enum bpf_iter_task_type type; + u32 pid; + } task; }; typedef int (*bpf_iter_attach_target_t)(struct bpf_prog *prog, -- cgit From 7e9fbbb1b776d8d7969551565bc246f74ec53b27 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Google)" Date: Wed, 28 Sep 2022 13:39:38 -0400 Subject: ring-buffer: Add ring_buffer_wake_waiters() On closing of a file that represents a ring buffer or flushing the file, there may be waiters on the ring buffer that needs to be woken up and exit the ring_buffer_wait() function. Add ring_buffer_wake_waiters() to wake up the waiters on the ring buffer and allow them to exit the wait loop. Link: https://lkml.kernel.org/r/20220928133938.28dc2c27@gandalf.local.home Cc: stable@vger.kernel.org Cc: Ingo Molnar Cc: Andrew Morton Fixes: 15693458c4bc0 ("tracing/ring-buffer: Move poll wake ups into ring buffer code") Signed-off-by: Steven Rostedt (Google) --- include/linux/ring_buffer.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h index dac53fd3afea..2504df9a0453 100644 --- a/include/linux/ring_buffer.h +++ b/include/linux/ring_buffer.h @@ -101,7 +101,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full); __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu, struct file *filp, poll_table *poll_table); - +void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu); #define RING_BUFFER_ALL_CPUS -1 -- cgit From f3ddb74ad0790030c9592229fb14d8c451f4e9a8 Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Google)" Date: Tue, 27 Sep 2022 19:15:27 -0400 Subject: tracing: Wake up ring buffer waiters on closing of the file When the file that represents the ring buffer is closed, there may be waiters waiting on more input from the ring buffer. Call ring_buffer_wake_waiters() to wake up any waiters when the file is closed. Link: https://lkml.kernel.org/r/20220927231825.182416969@goodmis.org Cc: stable@vger.kernel.org Cc: Ingo Molnar Cc: Andrew Morton Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice") Signed-off-by: Steven Rostedt (Google) --- include/linux/trace_events.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 8401dec93c15..20749bd9db71 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -92,6 +92,7 @@ struct trace_iterator { unsigned int temp_size; char *fmt; /* modified format holder */ unsigned int fmt_size; + long wait_index; /* trace_seq for __print_flags() and __print_symbolic() etc. */ struct trace_seq tmp_seq; -- cgit From 6eaab4dfdd30ea8fb0dd4ee04940676c12b728e8 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Fri, 23 Sep 2022 17:39:01 +0100 Subject: net: introduce struct ubuf_info_msgzc We're going to split struct ubuf_info and leave there only mandatory fields. Users are free to extend it. Add struct ubuf_info_msgzc, which will be an extended version for MSG_ZEROCOPY and some other users. It duplicates of struct ubuf_info for now and will be removed in a couple of patches. Signed-off-by: Pavel Begunkov Acked-by: Paolo Abeni Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index f15d5b62539b..fd7dcb977fdf 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -554,7 +554,28 @@ struct ubuf_info { } mmp; }; +struct ubuf_info_msgzc { + struct ubuf_info ubuf; + + union { + struct { + unsigned long desc; + void *ctx; + }; + struct { + u32 id; + u16 len; + u16 zerocopy:1; + u32 bytelen; + }; + }; + + struct mmpin mmp; +}; + #define skb_uarg(SKB) ((struct ubuf_info *)(skb_shinfo(SKB)->destructor_arg)) +#define uarg_to_msgzc(ubuf_ptr) container_of((ubuf_ptr), struct ubuf_info_msgzc, \ + ubuf) int mm_account_pinned_pages(struct mmpin *mmp, size_t size); void mm_unaccount_pinned_pages(struct mmpin *mmp); -- cgit From e7d2b510165fff6bedc9cca88c071ad846850c74 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Fri, 23 Sep 2022 17:39:04 +0100 Subject: net: shrink struct ubuf_info We can benefit from a smaller struct ubuf_info, so leave only mandatory fields and let users to decide how they want to extend it. Convert MSG_ZEROCOPY to struct ubuf_info_msgzc and remove duplicated fields. This reduces the size from 48 bytes to just 16. Signed-off-by: Pavel Begunkov Acked-by: Paolo Abeni Signed-off-by: Jakub Kicinski --- include/linux/skbuff.h | 22 ++++------------------ 1 file changed, 4 insertions(+), 18 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index fd7dcb977fdf..920eb6413fee 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -533,25 +533,8 @@ enum { struct ubuf_info { void (*callback)(struct sk_buff *, struct ubuf_info *, bool zerocopy_success); - union { - struct { - unsigned long desc; - void *ctx; - }; - struct { - u32 id; - u16 len; - u16 zerocopy:1; - u32 bytelen; - }; - }; refcount_t refcnt; u8 flags; - - struct mmpin { - struct user_struct *user; - unsigned int num_pg; - } mmp; }; struct ubuf_info_msgzc { @@ -570,7 +553,10 @@ struct ubuf_info_msgzc { }; }; - struct mmpin mmp; + struct mmpin { + struct user_struct *user; + unsigned int num_pg; + } mmp; }; #define skb_uarg(SKB) ((struct ubuf_info *)(skb_shinfo(SKB)->destructor_arg)) -- cgit From b48b89f9c189d24eb5e2b4a0ac067da5a24ee86d Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Tue, 27 Sep 2022 06:27:53 -0700 Subject: net: drop the weight argument from netif_napi_add We tell driver developers to always pass NAPI_POLL_WEIGHT as the weight to netif_napi_add(). This may be confusing to newcomers, drop the weight argument, those who really need to tweak the weight can use netif_napi_add_weight(). Acked-by: Marc Kleine-Budde # for CAN Link: https://lore.kernel.org/r/20220927132753.750069-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 9f42fc871c3b..7a084a1c14e9 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -2553,16 +2553,15 @@ void netif_napi_add_weight(struct net_device *dev, struct napi_struct *napi, * @dev: network device * @napi: NAPI context * @poll: polling function - * @weight: default weight * * netif_napi_add() must be used to initialize a NAPI context prior to calling * *any* of the other NAPI-related functions. */ static inline void netif_napi_add(struct net_device *dev, struct napi_struct *napi, - int (*poll)(struct napi_struct *, int), int weight) + int (*poll)(struct napi_struct *, int)) { - netif_napi_add_weight(dev, napi, poll, weight); + netif_napi_add_weight(dev, napi, poll, NAPI_POLL_WEIGHT); } static inline void -- cgit From 40b72108f9c609c5043903efb70c52d46b8aa871 Mon Sep 17 00:00:00 2001 From: Maxim Mikityanskiy Date: Tue, 27 Sep 2022 13:35:56 -0700 Subject: net/mlx5: Add the log_min_mkey_entity_size capability Add the capability that will allow the driver to determine the minimal MTT page size to be able to map the smallest possible pages in XSK. The older firmwares that don't have this capability default to 12 (i.e. 4096-byte pages). Signed-off-by: Maxim Mikityanskiy Reviewed-by: Tariq Toukan Reviewed-by: Saeed Mahameed Signed-off-by: Saeed Mahameed Signed-off-by: Jakub Kicinski --- include/linux/mlx5/mlx5_ifc.h | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 3d963c0e47e4..1ad762e22d86 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1878,7 +1878,13 @@ struct mlx5_ifc_cmd_hca_cap_2_bits { u8 max_reformat_remove_size[0x8]; u8 max_reformat_remove_offset[0x8]; - u8 reserved_at_c0[0x160]; + u8 reserved_at_c0[0xe0]; + + u8 reserved_at_1a0[0xb]; + u8 log_min_mkey_entity_size[0x5]; + u8 reserved_at_1b0[0x10]; + + u8 reserved_at_1c0[0x60]; u8 reserved_at_220[0x1]; u8 sw_vhca_id_valid[0x1]; -- cgit From 9ed9cac1850a2a55674b4a17100c50b46f645921 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 23 Sep 2022 13:28:07 -0700 Subject: slab: Remove __malloc attribute from realloc functions The __malloc attribute should not be applied to "realloc" functions, as the returned pointer may alias the storage of the prior pointer. Instead of splitting __malloc from __alloc_size, which would be a huge amount of churn, just create __realloc_size for the few cases where it is needed. Thanks to Geert Uytterhoeven for reporting build failures with gcc-8 in earlier version which tried to remove the #ifdef. While the "alloc_size" attribute is available on all GCC versions, I forgot that it gets disabled explicitly by the kernel in GCC < 9.1 due to misbehaviors. Add a note to the compiler_attributes.h entry for it. Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Andrew Morton Cc: Vlastimil Babka Cc: Roman Gushchin Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Marco Elver Cc: linux-mm@kvack.org Signed-off-by: Kees Cook Signed-off-by: Vlastimil Babka --- include/linux/compiler_attributes.h | 3 ++- include/linux/compiler_types.h | 8 +++++--- include/linux/slab.h | 12 ++++++------ 3 files changed, 13 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/compiler_attributes.h b/include/linux/compiler_attributes.h index 445e80517cab..96a4ed11b4be 100644 --- a/include/linux/compiler_attributes.h +++ b/include/linux/compiler_attributes.h @@ -35,7 +35,8 @@ /* * Note: do not use this directly. Instead, use __alloc_size() since it is conditionally - * available and includes other attributes. + * available and includes other attributes. For GCC < 9.1, __alloc_size__ gets undefined + * in compiler-gcc.h, due to misbehaviors. * * gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute * clang: https://clang.llvm.org/docs/AttributeReference.html#alloc-size diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h index 4f2a819fd60a..0717534f8364 100644 --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -271,14 +271,16 @@ struct ftrace_likely_data { /* * Any place that could be marked with the "alloc_size" attribute is also - * a place to be marked with the "malloc" attribute. Do this as part of the - * __alloc_size macro to avoid redundant attributes and to avoid missing a - * __malloc marking. + * a place to be marked with the "malloc" attribute, except those that may + * be performing a _reallocation_, as that may alias the existing pointer. + * For these, use __realloc_size(). */ #ifdef __alloc_size__ # define __alloc_size(x, ...) __alloc_size__(x, ## __VA_ARGS__) __malloc +# define __realloc_size(x, ...) __alloc_size__(x, ## __VA_ARGS__) #else # define __alloc_size(x, ...) __malloc +# define __realloc_size(x, ...) #endif #ifndef asm_volatile_goto diff --git a/include/linux/slab.h b/include/linux/slab.h index 0fefdf528e0d..41bd036e7551 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -184,7 +184,7 @@ int kmem_cache_shrink(struct kmem_cache *s); /* * Common kmalloc functions provided by all allocators */ -void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __alloc_size(2); +void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __realloc_size(2); void kfree(const void *objp); void kfree_sensitive(const void *objp); size_t __ksize(const void *objp); @@ -647,10 +647,10 @@ static inline __alloc_size(1, 2) void *kmalloc_array(size_t n, size_t size, gfp_ * @new_size: new size of a single member of the array * @flags: the type of memory to allocate (see kmalloc) */ -static inline __alloc_size(2, 3) void * __must_check krealloc_array(void *p, - size_t new_n, - size_t new_size, - gfp_t flags) +static inline __realloc_size(2, 3) void * __must_check krealloc_array(void *p, + size_t new_n, + size_t new_size, + gfp_t flags) { size_t bytes; @@ -774,7 +774,7 @@ static inline __alloc_size(1, 2) void *kvcalloc(size_t n, size_t size, gfp_t fla } extern void *kvrealloc(const void *p, size_t oldsize, size_t newsize, gfp_t flags) - __alloc_size(3); + __realloc_size(3); extern void kvfree(const void *addr); extern void kvfree_sensitive(const void *addr, size_t len); -- cgit From 05a940656e1eb2026d9ee31019d5b47e9545124d Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 23 Sep 2022 13:28:08 -0700 Subject: slab: Introduce kmalloc_size_roundup() In the effort to help the compiler reason about buffer sizes, the __alloc_size attribute was added to allocators. This improves the scope of the compiler's ability to apply CONFIG_UBSAN_BOUNDS and (in the near future) CONFIG_FORTIFY_SOURCE. For most allocations, this works well, as the vast majority of callers are not expecting to use more memory than what they asked for. There is, however, one common exception to this: anticipatory resizing of kmalloc allocations. These cases all use ksize() to determine the actual bucket size of a given allocation (e.g. 128 when 126 was asked for). This comes in two styles in the kernel: 1) An allocation has been determined to be too small, and needs to be resized. Instead of the caller choosing its own next best size, it wants to minimize the number of calls to krealloc(), so it just uses ksize() plus some additional bytes, forcing the realloc into the next bucket size, from which it can learn how large it is now. For example: data = krealloc(data, ksize(data) + 1, gfp); data_len = ksize(data); 2) The minimum size of an allocation is calculated, but since it may grow in the future, just use all the space available in the chosen bucket immediately, to avoid needing to reallocate later. A good example of this is skbuff's allocators: data = kmalloc_reserve(size, gfp_mask, node, &pfmemalloc); ... /* kmalloc(size) might give us more room than requested. * Put skb_shared_info exactly at the end of allocated zone, * to allow max possible filling before reallocation. */ osize = ksize(data); size = SKB_WITH_OVERHEAD(osize); In both cases, the "how much was actually allocated?" question is answered _after_ the allocation, where the compiler hinting is not in an easy place to make the association any more. This mismatch between the compiler's view of the buffer length and the code's intention about how much it is going to actually use has already caused problems[1]. It is possible to fix this by reordering the use of the "actual size" information. We can serve the needs of users of ksize() and still have accurate buffer length hinting for the compiler by doing the bucket size calculation _before_ the allocation. Code can instead ask "how large an allocation would I get for a given size?". Introduce kmalloc_size_roundup(), to serve this function so we can start replacing the "anticipatory resizing" uses of ksize(). [1] https://github.com/ClangBuiltLinux/linux/issues/1599 https://github.com/KSPP/linux/issues/183 [ vbabka@suse.cz: add SLOB version ] Cc: Vlastimil Babka Cc: Christoph Lameter Cc: Pekka Enberg Cc: David Rientjes Cc: Joonsoo Kim Cc: Andrew Morton Cc: linux-mm@kvack.org Signed-off-by: Kees Cook Signed-off-by: Vlastimil Babka --- include/linux/slab.h | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 41bd036e7551..727640173568 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -188,7 +188,21 @@ void * __must_check krealloc(const void *objp, size_t new_size, gfp_t flags) __r void kfree(const void *objp); void kfree_sensitive(const void *objp); size_t __ksize(const void *objp); + +/** + * ksize - Report actual allocation size of associated object + * + * @objp: Pointer returned from a prior kmalloc()-family allocation. + * + * This should not be used for writing beyond the originally requested + * allocation size. Either use krealloc() or round up the allocation size + * with kmalloc_size_roundup() prior to allocation. If this is used to + * access beyond the originally requested allocation size, UBSAN_BOUNDS + * and/or FORTIFY_SOURCE may trip, since they only know about the + * originally allocated size via the __alloc_size attribute. + */ size_t ksize(const void *objp); + #ifdef CONFIG_PRINTK bool kmem_valid_obj(void *object); void kmem_dump_obj(void *object); @@ -779,6 +793,23 @@ extern void kvfree(const void *addr); extern void kvfree_sensitive(const void *addr, size_t len); unsigned int kmem_cache_size(struct kmem_cache *s); + +/** + * kmalloc_size_roundup - Report allocation bucket size for the given size + * + * @size: Number of bytes to round up from. + * + * This returns the number of bytes that would be available in a kmalloc() + * allocation of @size bytes. For example, a 126 byte request would be + * rounded up to the next sized kmalloc bucket, 128 bytes. (This is strictly + * for the general-purpose kmalloc()-based allocations, and is not for the + * pre-sized kmem_cache_alloc()-based allocations.) + * + * Use this to kmalloc() the full bucket size ahead of time instead of using + * ksize() to query the size after an allocation. + */ +size_t kmalloc_size_roundup(size_t size); + void __init kmem_cache_init_late(void); #if defined(CONFIG_SMP) && defined(CONFIG_SLAB) -- cgit From c60ba2d34608dc6e224440267ed392adf65e05f2 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sat, 24 Sep 2022 02:10:37 +0206 Subject: printk: Make pr_flush() static No user outside the printk code and no reason to export this. Signed-off-by: Thomas Gleixner Signed-off-by: John Ogness Reviewed-by: Sergey Senozhatsky Reviewed-by: Petr Mladek Reviewed-by: Greg Kroah-Hartman Signed-off-by: Petr Mladek Link: https://lore.kernel.org/r/20220924000454.3319186-2-john.ogness@linutronix.de --- include/linux/printk.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/printk.h b/include/linux/printk.h index 091fba7283e1..b70a42f94031 100644 --- a/include/linux/printk.h +++ b/include/linux/printk.h @@ -170,8 +170,6 @@ extern void __printk_safe_exit(void); #define printk_deferred_enter __printk_safe_enter #define printk_deferred_exit __printk_safe_exit -extern bool pr_flush(int timeout_ms, bool reset_on_progress); - /* * Please don't use printk_ratelimit(), because it shares ratelimiting state * with all other unrelated printk_ratelimit() callsites. Instead use @@ -222,11 +220,6 @@ static inline void printk_deferred_exit(void) { } -static inline bool pr_flush(int timeout_ms, bool reset_on_progress) -{ - return true; -} - static inline int printk_ratelimit(void) { return 0; -- cgit From e3f12f0602833c88ad2cbe3aff397acd0cfd2695 Mon Sep 17 00:00:00 2001 From: Thomas Gleixner Date: Sat, 24 Sep 2022 02:10:38 +0206 Subject: printk: Declare log_wait properly kernel/printk/printk.c:365:1: warning: symbol 'log_wait' was not declared. Should it be static? Signed-off-by: Thomas Gleixner Signed-off-by: John Ogness Reviewed-by: Sergey Senozhatsky Reviewed-by: Petr Mladek Reviewed-by: Greg Kroah-Hartman Signed-off-by: Petr Mladek Link: https://lore.kernel.org/r/20220924000454.3319186-3-john.ogness@linutronix.de --- include/linux/syslog.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/syslog.h b/include/linux/syslog.h index 86af908e2663..955f80e34d4f 100644 --- a/include/linux/syslog.h +++ b/include/linux/syslog.h @@ -8,6 +8,8 @@ #ifndef _LINUX_SYSLOG_H #define _LINUX_SYSLOG_H +#include + /* Close the log. Currently a NOP. */ #define SYSLOG_ACTION_CLOSE 0 /* Open the log. Currently a NOP. */ @@ -35,5 +37,6 @@ #define SYSLOG_FROM_PROC 1 int do_syslog(int type, char __user *buf, int count, int source); +extern wait_queue_head_t log_wait; #endif /* _LINUX_SYSLOG_H */ -- cgit From 8cafdb5ab94cda3ebb0975be16e2d564a05132ea Mon Sep 17 00:00:00 2001 From: Pankaj Raghav Date: Thu, 29 Sep 2022 09:47:44 +0200 Subject: block: adapt blk_mq_plug() to not plug for writes that require a zone lock The current implementation of blk_mq_plug() disables plugging for all operations that involves a transfer to the device as we just check if the last bit in op_is_write() function. Modify blk_mq_plug() to disable plugging only for REQ_OP_WRITE and REQ_OP_WRITE_ZEROS as they might require a zone lock. Suggested-by: Christoph Hellwig Suggested-by: Damien Le Moal Reviewed-by: Christoph Hellwig Signed-off-by: Pankaj Raghav Reviewed-by: Damien Le Moal Reviewed-by: Johannes Thumshirn Link: https://lore.kernel.org/r/20220929074745.103073-2-p.raghav@samsung.com Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 4750772ef228..49373d002631 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -1304,6 +1304,15 @@ static inline bool bdev_is_zoned(struct block_device *bdev) return false; } +static inline bool bdev_op_is_zoned_write(struct block_device *bdev, + blk_opf_t op) +{ + if (!bdev_is_zoned(bdev)) + return false; + + return op == REQ_OP_WRITE || op == REQ_OP_WRITE_ZEROES; +} + static inline sector_t bdev_zone_sectors(struct block_device *bdev) { struct request_queue *q = bdev_get_queue(bdev); -- cgit From 39d6d08b2edf99c4b39a689a70bf0adee065b357 Mon Sep 17 00:00:00 2001 From: Beau Belgrave Date: Thu, 28 Jul 2022 16:33:08 -0700 Subject: tracing/user_events: Use bits vs bytes for enabled status page data User processes may require many events and when they do the cache performance of a byte index status check is less ideal than a bit index. The previous event limit per-page was 4096, the new limit is 32,768. This change adds a bitwise index to the user_reg struct. Programs check that the bit at status_bit has a bit set within the status page(s). Link: https://lkml.kernel.org/r/20220728233309.1896-6-beaub@linux.microsoft.com Link: https://lore.kernel.org/all/2059213643.196683.1648499088753.JavaMail.zimbra@efficios.com/ Suggested-by: Mathieu Desnoyers Signed-off-by: Beau Belgrave Signed-off-by: Steven Rostedt (Google) --- include/linux/user_events.h | 15 +++------------ 1 file changed, 3 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/user_events.h b/include/linux/user_events.h index 736e05603463..592a3fbed98e 100644 --- a/include/linux/user_events.h +++ b/include/linux/user_events.h @@ -20,15 +20,6 @@ #define USER_EVENTS_SYSTEM "user_events" #define USER_EVENTS_PREFIX "u:" -/* Bits 0-6 are for known probe types, Bit 7 is for unknown probes */ -#define EVENT_BIT_FTRACE 0 -#define EVENT_BIT_PERF 1 -#define EVENT_BIT_OTHER 7 - -#define EVENT_STATUS_FTRACE (1 << EVENT_BIT_FTRACE) -#define EVENT_STATUS_PERF (1 << EVENT_BIT_PERF) -#define EVENT_STATUS_OTHER (1 << EVENT_BIT_OTHER) - /* Create dynamic location entry within a 32-bit value */ #define DYN_LOC(offset, size) ((size) << 16 | (offset)) @@ -45,12 +36,12 @@ struct user_reg { /* Input: Pointer to string with event name, description and flags */ __u64 name_args; - /* Output: Byte index of the event within the status page */ - __u32 status_index; + /* Output: Bitwise index of the event within the status page */ + __u32 status_bit; /* Output: Index of the event to use when writing data */ __u32 write_index; -}; +} __attribute__((__packed__)); #define DIAG_IOC_MAGIC '*' -- cgit From adfdfcbdbd32b356323a3db6d3a683270051a7e6 Mon Sep 17 00:00:00 2001 From: Jerome Neanne Date: Thu, 29 Sep 2022 15:25:25 +0200 Subject: regulator: gpio: Add input_supply support in gpio_regulator_config This is simillar as fixed-regulator. Used to extract regulator parent from the device tree. Without that property used, the parent regulator can be shut down (if not an always on). Thus leading to inappropriate behavior: On am62-SP-SK this fix is required to avoid tps65219 ldo1 (SDMMC rail) to be shut down after boot completion. Signed-off-by: Jerome Neanne Link: https://lore.kernel.org/r/20220929132526.29427-2-jneanne@baylibre.com Signed-off-by: Mark Brown --- include/linux/regulator/gpio-regulator.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/regulator/gpio-regulator.h b/include/linux/regulator/gpio-regulator.h index fdeb312cdabd..c223e50ff9f7 100644 --- a/include/linux/regulator/gpio-regulator.h +++ b/include/linux/regulator/gpio-regulator.h @@ -42,6 +42,7 @@ struct gpio_regulator_state { /** * struct gpio_regulator_config - config structure * @supply_name: Name of the regulator supply + * @input_supply: Name of the input regulator supply * @enabled_at_boot: Whether regulator has been enabled at * boot or not. 1 = Yes, 0 = No * This is used to keep the regulator at @@ -62,6 +63,7 @@ struct gpio_regulator_state { */ struct gpio_regulator_config { const char *supply_name; + const char *input_supply; unsigned enabled_at_boot:1; unsigned startup_delay; -- cgit From 64696c40d03c01e0ea2e3e9aa1c490a7b6a1b6be Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Thu, 29 Sep 2022 00:04:03 -0700 Subject: bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline The struct_ops prog is to allow using bpf to implement the functions in a struct (eg. kernel module). The current usage is to implement the tcp_congestion. The kernel does not call the tcp-cc's ops (ie. the bpf prog) in a recursive way. The struct_ops is sharing the tracing-trampoline's enter/exit function which tracks prog->active to avoid recursion. It is needed for tracing prog. However, it turns out the struct_ops bpf prog will hit this prog->active and unnecessarily skipped running the struct_ops prog. eg. The '.ssthresh' may run in_task() and then interrupted by softirq that runs the same '.ssthresh'. Skip running the '.ssthresh' will end up returning random value to the caller. The patch adds __bpf_prog_{enter,exit}_struct_ops for the struct_ops trampoline. They do not track the prog->active to detect recursion. One exception is when the tcp_congestion's '.init' ops is doing bpf_setsockopt(TCP_CONGESTION) and then recurs to the same '.init' ops. This will be addressed in the following patches. Fixes: ca06f55b9002 ("bpf: Add per-program recursion prevention mechanism") Signed-off-by: Martin KaFai Lau Link: https://lore.kernel.org/r/20220929070407.965581-2-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 0f3eaf3ed98c..9e7d46d16032 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -864,6 +864,10 @@ u64 notrace __bpf_prog_enter_lsm_cgroup(struct bpf_prog *prog, struct bpf_tramp_run_ctx *run_ctx); void notrace __bpf_prog_exit_lsm_cgroup(struct bpf_prog *prog, u64 start, struct bpf_tramp_run_ctx *run_ctx); +u64 notrace __bpf_prog_enter_struct_ops(struct bpf_prog *prog, + struct bpf_tramp_run_ctx *run_ctx); +void notrace __bpf_prog_exit_struct_ops(struct bpf_prog *prog, u64 start, + struct bpf_tramp_run_ctx *run_ctx); void notrace __bpf_tramp_enter(struct bpf_tramp_image *tr); void notrace __bpf_tramp_exit(struct bpf_tramp_image *tr); -- cgit From 061ff040710e9f6f043d1fa80b1b362d2845b17a Mon Sep 17 00:00:00 2001 From: Martin KaFai Lau Date: Thu, 29 Sep 2022 00:04:06 -0700 Subject: bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself When a bad bpf prog '.init' calls bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop: .init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ... ... => .init => bpf_setsockopt(tcp_cc). It was prevented by the prog->active counter before but the prog->active detection cannot be used in struct_ops as explained in the earlier patch of the set. In this patch, the second bpf_setsockopt(tcp_cc) is not allowed in order to break the loop. This is done by using a bit of an existing 1 byte hole in tcp_sock to check if there is on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock. Note that this essentially limits only the first '.init' can call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer does not support ECN) and the second '.init' cannot fallback to another cc. This applies even the second bpf_setsockopt(TCP_CONGESTION) will not cause a loop. Signed-off-by: Martin KaFai Lau Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/20220929070407.965581-5-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/tcp.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index a9fbe22732c3..3bdf687e2fb3 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -388,6 +388,12 @@ struct tcp_sock { u8 bpf_sock_ops_cb_flags; /* Control calling BPF programs * values defined in uapi/linux/tcp.h */ + u8 bpf_chg_cc_inprogress:1; /* In the middle of + * bpf_setsockopt(TCP_CONGESTION), + * it is to avoid the bpf_tcp_cc->init() + * to recur itself by calling + * bpf_setsockopt(TCP_CONGESTION, "itself"). + */ #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) (TP->bpf_sock_ops_cb_flags & ARG) #else #define BPF_SOCK_OPS_TEST_FLAG(TP, ARG) 0 -- cgit From f62384995e4cb7703e5295779c44135c5311770d Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Mon, 26 Sep 2022 17:43:14 +0200 Subject: random: split initialization into early step and later step The full RNG initialization relies on some timestamps, made possible with initialization functions like time_init() and timekeeping_init(). However, these are only available rather late in initialization. Meanwhile, other things, such as memory allocator functions, make use of the RNG much earlier. So split RNG initialization into two phases. We can provide arch randomness very early on, and then later, after timekeeping and such are available, initialize the rest. This ensures that, for example, slabs are properly randomized if RDRAND is available. Without this, CONFIG_SLAB_FREELIST_RANDOM=y loses a degree of its security, because its random seed is potentially deterministic, since it hasn't yet incorporated RDRAND. It also makes it possible to use a better seed in kfence, which currently relies on only the cycle counter. Another positive consequence is that on systems with RDRAND, running with CONFIG_WARN_ALL_UNSEEDED_RANDOM=y results in no warnings at all. One subtle side effect of this change is that on systems with no RDRAND, RDTSC is now only queried by random_init() once, committing the moment of the function call, instead of multiple times as before. This is intentional, as the multiple RDTSCs in a loop before weren't accomplishing very much, with jitter being better provided by try_to_generate_entropy(). Plus, filling blocks with RDTSC is still being done in extract_entropy(), which is necessarily called before random bytes are served anyway. Cc: Andrew Morton Reviewed-by: Kees Cook Reviewed-by: Dominik Brodowski Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index 3fec206487f6..a9e6e16f9774 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -72,7 +72,8 @@ static inline unsigned long get_random_canary(void) return get_random_long() & CANARY_MASK; } -int __init random_init(const char *command_line); +void __init random_init_early(const char *command_line); +void __init random_init(void); bool rng_is_initialized(void); int wait_for_random_bytes(void); -- cgit From 585cd5fe9f7378601b1d4915ad6e9088333b5e5e Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Wed, 28 Sep 2022 18:47:30 +0200 Subject: random: add 8-bit and 16-bit batches There are numerous places in the kernel that would be sped up by having smaller batches. Currently those callsites do `get_random_u32() & 0xff` or similar. Since these are pretty spread out, and will require patches to multiple different trees, let's get ahead of the curve and lay the foundation for `get_random_u8()` and `get_random_u16()`, so that it's then possible to start submitting conversion patches leisurely. Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index a9e6e16f9774..2c130f8f18e5 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -38,6 +38,8 @@ static inline int unregister_random_vmfork_notifier(struct notifier_block *nb) { #endif void get_random_bytes(void *buf, size_t len); +u8 get_random_u8(void); +u16 get_random_u16(void); u32 get_random_u32(void); u64 get_random_u64(void); static inline unsigned int get_random_int(void) -- cgit From 4c95236a335d8b62aa8dbd587bed6a5f30d8265a Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Thu, 29 Sep 2022 11:49:35 +0200 Subject: prandom: make use of smaller types in prandom_u32_max When possible at compile-time, make use of smaller types in prandom_u32_max(), so that we can use smaller batches from random.c, which in turn leads to a 2x or 4x performance boost. This makes a difference, for example, in kfence, which needs a fast stream of small numbers (booleans). At the same time, we use the occasion to update the old documentation on these functions. prandom_u32() and prandom_bytes() have direct replacements now in random.h, while prandom_u32_max() remains useful as a prandom.h function, since it's not cryptographically secure by virtue of not being evenly distributed. Cc: Dominik Brodowski Acked-by: Marco Elver Signed-off-by: Jason A. Donenfeld --- include/linux/prandom.h | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/prandom.h b/include/linux/prandom.h index deace5fb4e62..78db003bc290 100644 --- a/include/linux/prandom.h +++ b/include/linux/prandom.h @@ -12,11 +12,13 @@ #include #include +/* Deprecated: use get_random_u32 instead. */ static inline u32 prandom_u32(void) { return get_random_u32(); } +/* Deprecated: use get_random_bytes instead. */ static inline void prandom_bytes(void *buf, size_t nbytes) { return get_random_bytes(buf, nbytes); @@ -37,17 +39,20 @@ void prandom_seed_full_state(struct rnd_state __percpu *pcpu_state); * prandom_u32_max - returns a pseudo-random number in interval [0, ep_ro) * @ep_ro: right open interval endpoint * - * Returns a pseudo-random number that is in interval [0, ep_ro). Note - * that the result depends on PRNG being well distributed in [0, ~0U] - * u32 space. Here we use maximally equidistributed combined Tausworthe - * generator, that is, prandom_u32(). This is useful when requesting a - * random index of an array containing ep_ro elements, for example. + * Returns a pseudo-random number that is in interval [0, ep_ro). This is + * useful when requesting a random index of an array containing ep_ro elements, + * for example. The result is somewhat biased when ep_ro is not a power of 2, + * so do not use this for cryptographic purposes. * * Returns: pseudo-random number in interval [0, ep_ro) */ static inline u32 prandom_u32_max(u32 ep_ro) { - return (u32)(((u64) prandom_u32() * ep_ro) >> 32); + if (__builtin_constant_p(ep_ro <= 1U << 8) && ep_ro <= 1U << 8) + return (get_random_u8() * ep_ro) >> 8; + if (__builtin_constant_p(ep_ro <= 1U << 16) && ep_ro <= 1U << 16) + return (get_random_u16() * ep_ro) >> 16; + return ((u64)get_random_u32() * ep_ro) >> 32; } /* -- cgit From 9f4beead610c83065cc0410bfe97ff51d8e9578d Mon Sep 17 00:00:00 2001 From: Lukas Bulwahn Date: Thu, 29 Sep 2022 22:39:03 +0200 Subject: binfmt: remove taso from linux_binprm struct With commit 987f20a9dcce ("a.out: Remove the a.out implementation"), the use of the special taso flag for alpha architectures in the linux_binprm struct is gone. Remove the definition of taso in the linux_binprm struct. No functional change. Signed-off-by: Lukas Bulwahn Reviewed-by: "Eric W. Biederman" Signed-off-by: Kees Cook Link: https://lore.kernel.org/r/20220929203903.9475-1-lukas.bulwahn@gmail.com --- include/linux/binfmts.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index 3dc20c4f394c..8d51f69f9f5e 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -43,9 +43,6 @@ struct linux_binprm { * original userspace. */ point_of_no_return:1; -#ifdef __alpha__ - unsigned int taso:1; -#endif struct file *executable; /* Executable to pass to the interpreter */ struct file *interpreter; struct file *file; -- cgit From f5290d8e4f0caa81a491448a27dd70e726095d07 Mon Sep 17 00:00:00 2001 From: Dmitry Baryshkov Date: Fri, 16 Sep 2022 09:17:38 +0300 Subject: clk: asm9260: use parent index to link the reference clock Rewrite clk-asm9260 to use parent index to use the reference clock. During this rework two helpers are added: - clk_hw_register_mux_table_parent_data() to supplement clk_hw_register_mux_table() but using parent_data instead of parent_names - clk_hw_register_fixed_rate_parent_accuracy() to be used instead of directly calling __clk_hw_register_fixed_rate(). The later function is an internal API, which is better not to be called directly. Signed-off-by: Dmitry Baryshkov Link: https://lore.kernel.org/r/20220916061740.87167-2-dmitry.baryshkov@linaro.org Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) (limited to 'include/linux') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index 1615010aa0ec..86140ac2f9a5 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -439,6 +439,20 @@ struct clk *clk_register_fixed_rate(struct device *dev, const char *name, __clk_hw_register_fixed_rate((dev), NULL, (name), NULL, NULL, \ (parent_data), NULL, (flags), \ (fixed_rate), (fixed_accuracy), 0) +/** + * clk_hw_register_fixed_rate_parent_accuracy - register fixed-rate clock with + * the clock framework + * @dev: device that is registering this clock + * @name: name of this clock + * @parent_name: name of clock's parent + * @flags: framework-specific flags + * @fixed_rate: non-adjustable clock rate + */ +#define clk_hw_register_fixed_rate_parent_accuracy(dev, name, parent_data, \ + flags, fixed_rate) \ + __clk_hw_register_fixed_rate((dev), NULL, (name), NULL, NULL, \ + (parent_data), (flags), (fixed_rate), 0, \ + CLK_FIXED_RATE_PARENT_ACCURACY) void clk_unregister_fixed_rate(struct clk *clk); void clk_hw_unregister_fixed_rate(struct clk_hw *hw); @@ -957,6 +971,13 @@ struct clk *clk_register_mux_table(struct device *dev, const char *name, (parent_names), NULL, NULL, (flags), (reg), \ (shift), (mask), (clk_mux_flags), (table), \ (lock)) +#define clk_hw_register_mux_table_parent_data(dev, name, parent_data, \ + num_parents, flags, reg, shift, mask, \ + clk_mux_flags, table, lock) \ + __clk_hw_register_mux((dev), NULL, (name), (num_parents), \ + NULL, NULL, (parent_data), (flags), (reg), \ + (shift), (mask), (clk_mux_flags), (table), \ + (lock)) #define clk_hw_register_mux(dev, name, parent_names, num_parents, flags, reg, \ shift, width, clk_mux_flags, lock) \ __clk_hw_register_mux((dev), NULL, (name), (num_parents), \ -- cgit From 1d7d20658534c7d36fe6f4252f6f1a27d9631a99 Mon Sep 17 00:00:00 2001 From: Dmitry Baryshkov Date: Fri, 16 Sep 2022 09:17:39 +0300 Subject: clk: fixed-rate: add devm_clk_hw_register_fixed_rate Add devm_clk_hw_register_fixed_rate(), devres-managed helper to register fixed-rate clock. Signed-off-by: Dmitry Baryshkov Link: https://lore.kernel.org/r/20220916061740.87167-3-dmitry.baryshkov@linaro.org Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 29 +++++++++++++++++++++-------- 1 file changed, 21 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index 86140ac2f9a5..49405b336cad 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -350,7 +350,7 @@ struct clk_hw *__clk_hw_register_fixed_rate(struct device *dev, const char *parent_name, const struct clk_hw *parent_hw, const struct clk_parent_data *parent_data, unsigned long flags, unsigned long fixed_rate, unsigned long fixed_accuracy, - unsigned long clk_fixed_flags); + unsigned long clk_fixed_flags, bool devm); struct clk *clk_register_fixed_rate(struct device *dev, const char *name, const char *parent_name, unsigned long flags, unsigned long fixed_rate); @@ -365,7 +365,20 @@ struct clk *clk_register_fixed_rate(struct device *dev, const char *name, */ #define clk_hw_register_fixed_rate(dev, name, parent_name, flags, fixed_rate) \ __clk_hw_register_fixed_rate((dev), NULL, (name), (parent_name), NULL, \ - NULL, (flags), (fixed_rate), 0, 0) + NULL, (flags), (fixed_rate), 0, 0, false) + +/** + * devm_clk_hw_register_fixed_rate - register fixed-rate clock with the clock + * framework + * @dev: device that is registering this clock + * @name: name of this clock + * @parent_name: name of clock's parent + * @flags: framework-specific flags + * @fixed_rate: non-adjustable clock rate + */ +#define devm_clk_hw_register_fixed_rate(dev, name, parent_name, flags, fixed_rate) \ + __clk_hw_register_fixed_rate((dev), NULL, (name), (parent_name), NULL, \ + NULL, (flags), (fixed_rate), 0, 0, true) /** * clk_hw_register_fixed_rate_parent_hw - register fixed-rate clock with * the clock framework @@ -378,7 +391,7 @@ struct clk *clk_register_fixed_rate(struct device *dev, const char *name, #define clk_hw_register_fixed_rate_parent_hw(dev, name, parent_hw, flags, \ fixed_rate) \ __clk_hw_register_fixed_rate((dev), NULL, (name), NULL, (parent_hw), \ - NULL, (flags), (fixed_rate), 0, 0) + NULL, (flags), (fixed_rate), 0, 0, false) /** * clk_hw_register_fixed_rate_parent_data - register fixed-rate clock with * the clock framework @@ -392,7 +405,7 @@ struct clk *clk_register_fixed_rate(struct device *dev, const char *name, fixed_rate) \ __clk_hw_register_fixed_rate((dev), NULL, (name), NULL, NULL, \ (parent_data), (flags), (fixed_rate), 0, \ - 0) + 0, false) /** * clk_hw_register_fixed_rate_with_accuracy - register fixed-rate clock with * the clock framework @@ -408,7 +421,7 @@ struct clk *clk_register_fixed_rate(struct device *dev, const char *name, fixed_accuracy) \ __clk_hw_register_fixed_rate((dev), NULL, (name), (parent_name), \ NULL, NULL, (flags), (fixed_rate), \ - (fixed_accuracy), 0) + (fixed_accuracy), 0, false) /** * clk_hw_register_fixed_rate_with_accuracy_parent_hw - register fixed-rate * clock with the clock framework @@ -423,7 +436,7 @@ struct clk *clk_register_fixed_rate(struct device *dev, const char *name, parent_hw, flags, fixed_rate, fixed_accuracy) \ __clk_hw_register_fixed_rate((dev), NULL, (name), NULL, (parent_hw) \ NULL, NULL, (flags), (fixed_rate), \ - (fixed_accuracy), 0) + (fixed_accuracy), 0, false) /** * clk_hw_register_fixed_rate_with_accuracy_parent_data - register fixed-rate * clock with the clock framework @@ -438,7 +451,7 @@ struct clk *clk_register_fixed_rate(struct device *dev, const char *name, parent_data, flags, fixed_rate, fixed_accuracy) \ __clk_hw_register_fixed_rate((dev), NULL, (name), NULL, NULL, \ (parent_data), NULL, (flags), \ - (fixed_rate), (fixed_accuracy), 0) + (fixed_rate), (fixed_accuracy), 0, false) /** * clk_hw_register_fixed_rate_parent_accuracy - register fixed-rate clock with * the clock framework @@ -452,7 +465,7 @@ struct clk *clk_register_fixed_rate(struct device *dev, const char *name, flags, fixed_rate) \ __clk_hw_register_fixed_rate((dev), NULL, (name), NULL, NULL, \ (parent_data), (flags), (fixed_rate), 0, \ - CLK_FIXED_RATE_PARENT_ACCURACY) + CLK_FIXED_RATE_PARENT_ACCURACY, false) void clk_unregister_fixed_rate(struct clk *clk); void clk_hw_unregister_fixed_rate(struct clk_hw *hw); -- cgit From dbae2b062824fc2d35ae2d5df2f500626c758e80 Mon Sep 17 00:00:00 2001 From: Paolo Abeni Date: Wed, 28 Sep 2022 10:43:09 +0200 Subject: net: skb: introduce and use a single page frag cache After commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs") we are observing 10-20% regressions in performance tests with small packets. The perf trace points to high pressure on the slab allocator. This change tries to improve the allocation schema for small packets using an idea originally suggested by Eric: a new per CPU page frag is introduced and used in __napi_alloc_skb to cope with small allocation requests. To ensure that the above does not lead to excessive truesize underestimation, the frag size for small allocation is inflated to 1K and all the above is restricted to build with 4K page size. Note that we need to update accordingly the run-time check introduced with commit fd9ea57f4e95 ("net: add napi_get_frags_check() helper"). Alex suggested a smart page refcount schema to reduce the number of atomic operations and deal properly with pfmemalloc pages. Under small packet UDP flood, I measure a 15% peak tput increases. Suggested-by: Eric Dumazet Suggested-by: Alexander H Duyck Signed-off-by: Paolo Abeni Reviewed-by: Eric Dumazet Reviewed-by: Alexander Duyck Link: https://lore.kernel.org/r/6b6f65957c59f86a353fc09a5127e83a32ab5999.1664350652.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 7a084a1c14e9..64c3a5864969 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3821,6 +3821,7 @@ void netif_receive_skb_list(struct list_head *head); gro_result_t napi_gro_receive(struct napi_struct *napi, struct sk_buff *skb); void napi_gro_flush(struct napi_struct *napi, bool flush_old); struct sk_buff *napi_get_frags(struct napi_struct *napi); +void napi_get_frags_check(struct napi_struct *napi); gro_result_t napi_gro_frags(struct napi_struct *napi); struct packet_offload *gro_find_receive_by_type(__be16 type); struct packet_offload *gro_find_complete_by_type(__be16 type); -- cgit From aac4daa8941ea6566563ac001e9e5d4e54a674e2 Mon Sep 17 00:00:00 2001 From: Vladimir Oltean Date: Wed, 28 Sep 2022 12:51:57 +0300 Subject: net/sched: query offload capabilities through ndo_setup_tc() When adding optional new features to Qdisc offloads, existing drivers must reject the new configuration until they are coded up to act on it. Since modifying all drivers in lockstep with the changes in the Qdisc can create problems of its own, it would be nice if there existed an automatic opt-in mechanism for offloading optional features. Jakub proposes that we multiplex one more kind of call through ndo_setup_tc(): one where the driver populates a Qdisc-specific capability structure. First user will be taprio in further changes. Here we are introducing the definitions for the base functionality. Link: https://patchwork.kernel.org/project/netdevbpf/patch/20220923163310.3192733-3-vladimir.oltean@nxp.com/ Suggested-by: Jakub Kicinski Co-developed-by: Jakub Kicinski Signed-off-by: Vladimir Oltean Signed-off-by: Jakub Kicinski --- include/linux/netdevice.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 64c3a5864969..eddf8ee270e7 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -940,6 +940,7 @@ struct net_device_path_ctx { }; enum tc_setup_type { + TC_QUERY_CAPS, TC_SETUP_QDISC_MQPRIO, TC_SETUP_CLSU32, TC_SETUP_CLSFLOWER, -- cgit From 5bdf402a05fafc55c876b9993b4c948bb2bc2361 Mon Sep 17 00:00:00 2001 From: "Ritesh Harjani (IBM)" Date: Thu, 18 Aug 2022 10:34:40 +0530 Subject: fs/buffer: make submit_bh & submit_bh_wbc return type as void submit_bh/submit_bh_wbc are non-blocking functions which just submit the bio and return. The caller of submit_bh/submit_bh_wbc needs to wait on buffer till I/O completion and then check buffer head's b_state field to know if there was any I/O error. Hence there is no need for these functions to have any return type. Even now they always returns 0. Hence drop the return value and make their return type as void to avoid any confusion. Reviewed-by: Christoph Hellwig Reviewed-by: Jan Kara Signed-off-by: Ritesh Harjani (IBM) Link: https://lore.kernel.org/r/cb66ef823374cdd94d2d03083ce13de844fffd41.1660788334.git.ritesh.list@gmail.com Signed-off-by: Theodore Ts'o --- include/linux/buffer_head.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h index 089c9ade4325..dcc0e90d8979 100644 --- a/include/linux/buffer_head.h +++ b/include/linux/buffer_head.h @@ -229,7 +229,7 @@ void ll_rw_block(blk_opf_t, int, struct buffer_head * bh[]); int sync_dirty_buffer(struct buffer_head *bh); int __sync_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags); void write_dirty_buffer(struct buffer_head *bh, blk_opf_t op_flags); -int submit_bh(blk_opf_t, struct buffer_head *); +void submit_bh(blk_opf_t, struct buffer_head *); void write_boundary_block(struct block_device *bdev, sector_t bblock, unsigned blocksize); int bh_uptodate_or_lock(struct buffer_head *bh); -- cgit From cbfecb927f429a6fa613d74b998496bd71e4438a Mon Sep 17 00:00:00 2001 From: Lukas Czerner Date: Thu, 25 Aug 2022 12:06:57 +0200 Subject: fs: record I_DIRTY_TIME even if inode already has I_DIRTY_INODE Currently the I_DIRTY_TIME will never get set if the inode already has I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME. That's true, however ext4 will only update the on-disk inode in ->dirty_inode(), not on actual writeback. As a result if the inode already has I_DIRTY_INODE state by the time we get to __mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled into on-disk inode and will not get updated until the next I_DIRTY_INODE update, which might never come if we crash or get a power failure. The problem can be reproduced on ext4 by running xfstest generic/622 with -o iversion mount option. Fix it by allowing I_DIRTY_TIME to be set even if the inode already has I_DIRTY_INODE. Also make sure that the case is properly handled in writeback_single_inode() as well. Additionally changes in xfs_fs_dirty_inode() was made to accommodate for I_DIRTY_TIME in flag. Thanks Jan Kara for suggestions on how to make this work properly. Cc: Dave Chinner Cc: Christoph Hellwig Cc: stable@kernel.org Signed-off-by: Lukas Czerner Suggested-by: Jan Kara Reviewed-by: Jan Kara Link: https://lore.kernel.org/r/20220825100657.44217-1-lczerner@redhat.com Signed-off-by: Theodore Ts'o --- include/linux/fs.h | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index 9eced4cc286e..56a4b4b02477 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2371,13 +2371,14 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src, * don't have to write inode on fdatasync() when only * e.g. the timestamps have changed. * I_DIRTY_PAGES Inode has dirty pages. Inode itself may be clean. - * I_DIRTY_TIME The inode itself only has dirty timestamps, and the + * I_DIRTY_TIME The inode itself has dirty timestamps, and the * lazytime mount option is enabled. We keep track of this * separately from I_DIRTY_SYNC in order to implement * lazytime. This gets cleared if I_DIRTY_INODE - * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. I.e. - * either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in - * i_state, but not both. I_DIRTY_PAGES may still be set. + * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But + * I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already + * in place because writeback might already be in progress + * and we don't want to lose the time update * I_NEW Serves as both a mutex and completion notification. * New inodes set I_NEW. If two processes both create * the same inode, one of them will release its inode and -- cgit From d427c8999b071af2003203b42a007c4a80156c18 Mon Sep 17 00:00:00 2001 From: Richard Gobert Date: Wed, 28 Sep 2022 14:55:31 +0200 Subject: net-next: skbuff: refactor pskb_pull pskb_may_pull already contains all of the checks performed by pskb_pull. Use pskb_may_pull for validation in pskb_pull, eliminating the duplication and making __pskb_pull obsolete. Replace __pskb_pull with pskb_pull where applicable. Signed-off-by: Richard Gobert Signed-off-by: David S. Miller --- include/linux/skbuff.h | 23 +++++++++-------------- 1 file changed, 9 insertions(+), 14 deletions(-) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 920eb6413fee..9fcf534f2d92 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -2616,20 +2616,6 @@ void *skb_pull_data(struct sk_buff *skb, size_t len); void *__pskb_pull_tail(struct sk_buff *skb, int delta); -static inline void *__pskb_pull(struct sk_buff *skb, unsigned int len) -{ - if (len > skb_headlen(skb) && - !__pskb_pull_tail(skb, len - skb_headlen(skb))) - return NULL; - skb->len -= len; - return skb->data += len; -} - -static inline void *pskb_pull(struct sk_buff *skb, unsigned int len) -{ - return unlikely(len > skb->len) ? NULL : __pskb_pull(skb, len); -} - static inline bool pskb_may_pull(struct sk_buff *skb, unsigned int len) { if (likely(len <= skb_headlen(skb))) @@ -2639,6 +2625,15 @@ static inline bool pskb_may_pull(struct sk_buff *skb, unsigned int len) return __pskb_pull_tail(skb, len - skb_headlen(skb)) != NULL; } +static inline void *pskb_pull(struct sk_buff *skb, unsigned int len) +{ + if (!pskb_may_pull(skb, len)) + return NULL; + + skb->len -= len; + return skb->data += len; +} + void skb_condense(struct sk_buff *skb); /** -- cgit From f4ce91ce12a7c6ead19b128ffa8cff6e3ded2a14 Mon Sep 17 00:00:00 2001 From: Neal Cardwell Date: Wed, 28 Sep 2022 16:03:31 -0400 Subject: tcp: fix tcp_cwnd_validate() to not forget is_cwnd_limited This commit fixes a bug in the tracking of max_packets_out and is_cwnd_limited. This bug can cause the connection to fail to remember that is_cwnd_limited is true, causing the connection to fail to grow cwnd when it should, causing throughput to be lower than it should be. The following event sequence is an example that triggers the bug: (a) The connection is cwnd_limited, but packets_out is not at its peak due to TSO deferral deciding not to send another skb yet. In such cases the connection can advance max_packets_seq and set tp->is_cwnd_limited to true and max_packets_out to a small number. (b) Then later in the round trip the connection is pacing-limited (not cwnd-limited), and packets_out is larger. In such cases the connection would raise max_packets_out to a bigger number but (unexpectedly) flip tp->is_cwnd_limited from true to false. This commit fixes that bug. One straightforward fix would be to separately track (a) the next window after max_packets_out reaches a maximum, and (b) the next window after tp->is_cwnd_limited is set to true. But this would require consuming an extra u32 sequence number. Instead, to save space we track only the most important information. Specifically, we track the strongest available signal of the degree to which the cwnd is fully utilized: (1) If the connection is cwnd-limited then we remember that fact for the current window. (2) If the connection not cwnd-limited then we track the maximum number of outstanding packets in the current window. In particular, note that the new logic cannot trigger the buggy (a)/(b) sequence above because with the new logic a condition where tp->packets_out > tp->max_packets_out can only trigger an update of tp->is_cwnd_limited if tp->is_cwnd_limited is false. This first showed up in a testing of a BBRv2 dev branch, but this buggy behavior highlighted a general issue with the tcp_cwnd_validate() logic that can cause cwnd to fail to increase at the proper rate for any TCP congestion control, including Reno or CUBIC. Fixes: ca8a22634381 ("tcp: make cwnd-limited checks measurement-based, and gentler") Signed-off-by: Neal Cardwell Signed-off-by: Kevin(Yudong) Yang Signed-off-by: Yuchung Cheng Signed-off-by: Eric Dumazet Signed-off-by: David S. Miller --- include/linux/tcp.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/tcp.h b/include/linux/tcp.h index a9fbe22732c3..4791fd801945 100644 --- a/include/linux/tcp.h +++ b/include/linux/tcp.h @@ -295,7 +295,7 @@ struct tcp_sock { u32 packets_out; /* Packets which are "in flight" */ u32 retrans_out; /* Retransmitted packets out */ u32 max_packets_out; /* max packets_out in last window */ - u32 max_packets_seq; /* right edge of max_packets_out flight */ + u32 cwnd_usage_seq; /* right edge of cwnd usage tracking flight */ u16 urg_data; /* Saved octet of OOB data and control flags */ u8 ecn_flags; /* ECN status bits. */ -- cgit From 650ae67bbf7ba5ac193f053969612fbb93247b64 Mon Sep 17 00:00:00 2001 From: William Breathitt Gray Date: Tue, 27 Sep 2022 18:53:38 -0400 Subject: counter: Introduce the Signal polarity component The Signal polarity component represents the active level of a respective Signal. There are two possible states: positive (rising edge) and negative (falling edge); enum counter_signal_polarity represents these states. A convenience macro COUNTER_COMP_POLARITY() is provided for driver authors to declare a Signal polarity component. Cc: Julien Panis Link: https://lore.kernel.org/r/8f47d6e1db71a11bb1e2666f8e2a6e9d256d4131.1664204990.git.william.gray@linaro.org/ Signed-off-by: William Breathitt Gray Link: https://lore.kernel.org/r/b6e53438badcb6318997d13dd2fc052f97d808ac.1664318353.git.william.gray@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/counter.h | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'include/linux') diff --git a/include/linux/counter.h b/include/linux/counter.h index a81234bc8ea8..b3fb6b68881a 100644 --- a/include/linux/counter.h +++ b/include/linux/counter.h @@ -31,6 +31,7 @@ enum counter_comp_type { COUNTER_COMP_ENUM, COUNTER_COMP_COUNT_DIRECTION, COUNTER_COMP_COUNT_MODE, + COUNTER_COMP_SIGNAL_POLARITY, }; /** @@ -477,6 +478,15 @@ struct counter_available { #define COUNTER_COMP_FLOOR(_read, _write) \ COUNTER_COMP_COUNT_U64("floor", _read, _write) +#define COUNTER_COMP_POLARITY(_read, _write, _available) \ +{ \ + .type = COUNTER_COMP_SIGNAL_POLARITY, \ + .name = "polarity", \ + .signal_u32_read = (_read), \ + .signal_u32_write = (_write), \ + .priv = &(_available), \ +} + #define COUNTER_COMP_PRESET(_read, _write) \ COUNTER_COMP_COUNT_U64("preset", _read, _write) -- cgit From 45d2918520b2d8e640e4fb3fbf664dfb823dc520 Mon Sep 17 00:00:00 2001 From: William Breathitt Gray Date: Tue, 27 Sep 2022 18:53:40 -0400 Subject: counter: Introduce the Count capture component Some devices provide a latch function to save historic Count values. This patch standardizes exposure of such functionality as Count capture components. A COUNTER_COMP_CAPTURE macro is provided for driver authors to define a capture component. A new event COUNTER_EVENT_CAPTURE is introduced to represent Count value capture events. Cc: Julien Panis Link: https://lore.kernel.org/r/c239572ab4208d0d6728136e82a88ad464369a7a.1664204990.git.william.gray@linaro.org/ Signed-off-by: William Breathitt Gray Link: https://lore.kernel.org/r/3cebaa0b807a225eb277d771504fe6dba7269ffd.1664318353.git.william.gray@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/counter.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/counter.h b/include/linux/counter.h index b3fb6b68881a..e160197971dd 100644 --- a/include/linux/counter.h +++ b/include/linux/counter.h @@ -453,6 +453,9 @@ struct counter_available { .priv = &(_available), \ } +#define COUNTER_COMP_CAPTURE(_read, _write) \ + COUNTER_COMP_COUNT_U64("capture", _read, _write) + #define COUNTER_COMP_CEILING(_read, _write) \ COUNTER_COMP_COUNT_U64("ceiling", _read, _write) -- cgit From d2011be1e22f7769c7c71d6d7f777ffcc544808d Mon Sep 17 00:00:00 2001 From: William Breathitt Gray Date: Tue, 27 Sep 2022 18:53:42 -0400 Subject: counter: Introduce the COUNTER_COMP_ARRAY component type The COUNTER_COMP_ARRAY Counter component type is introduced to enable support for Counter array components. With Counter array components, exposure for buffers on counter devices can be defined via new Counter array component macros. This should simplify code for driver authors who would otherwise need to define individual Counter components for each array element. Eight Counter array component macros are introduced:: DEFINE_COUNTER_ARRAY_U64(_name, _length) DEFINE_COUNTER_ARRAY_CAPTURE(_name, _length) DEFINE_COUNTER_ARRAY_POLARITY(_name, _enums, _length) COUNTER_COMP_DEVICE_ARRAY_U64(_name, _read, _write, _array) COUNTER_COMP_COUNT_ARRAY_U64(_name, _read, _write, _array) COUNTER_COMP_SIGNAL_ARRAY_U64(_name, _read, _write, _array) COUNTER_COMP_ARRAY_CAPTURE(_read, _write, _array) COUNTER_COMP_ARRAY_POLARITY(_read, _write, _array) Eight Counter array callbacks are introduced as well:: int (*signal_array_u32_read)(struct counter_device *counter, struct counter_signal *signal, size_t idx, u32 *val); int (*signal_array_u32_write)(struct counter_device *counter, struct counter_signal *signal, size_t idx, u32 val); int (*device_array_u64_read)(struct counter_device *counter, size_t idx, u64 *val); int (*count_array_u64_read)(struct counter_device *counter, struct counter_count *count, size_t idx, u64 *val); int (*signal_array_u64_read)(struct counter_device *counter, struct counter_signal *signal, size_t idx, u64 *val); int (*device_array_u64_write)(struct counter_device *counter, size_t idx, u64 val); int (*count_array_u64_write)(struct counter_device *counter, struct counter_count *count, size_t idx, u64 val); int (*signal_array_u64_write)(struct counter_device *counter, struct counter_signal *signal, size_t idx, u64 val); Driver authors can handle reads/writes for an array component by receiving an element index via the `idx` parameter and processing the respective value via the `val` parameter. For example, suppose a driver wants to expose a Count's read-only capture buffer of four elements using a callback `foobar_capture_read()`:: DEFINE_COUNTER_ARRAY_CAPTURE(foobar_capture_array, 4); COUNTER_COMP_ARRAY_CAPTURE(foobar_capture_read, NULL, foobar_capture_array) Respective sysfs attributes for each array element would appear for the respective Count: * /sys/bus/counter/devices/counterX/countY/capture0 * /sys/bus/counter/devices/counterX/countY/capture1 * /sys/bus/counter/devices/counterX/countY/capture2 * /sys/bus/counter/devices/counterX/countY/capture3 If a user tries to read _capture2_ for example, `idx` will be `2` when passed to the `foobar_capture_read()` callback, and thus the driver knows which array element to handle. Counter arrays for polarity elements can be defined in a similar manner as u64 elements:: const enum counter_signal_polarity foobar_polarity_states[] = { COUNTER_SIGNAL_POLARITY_POSITIVE, COUNTER_SIGNAL_POLARITY_NEGATIVE, }; DEFINE_COUNTER_ARRAY_POLARITY(foobar_polarity_array, foobar_polarity_states, 4); COUNTER_COMP_ARRAY_POLARITY(foobar_polarity_read, foobar_polarity_write, foobar_polarity_array) Tested-by: Julien Panis Link: https://lore.kernel.org/r/5310c22520aeae65b1b74952419f49ac4c8e1ec1.1664204990.git.william.gray@linaro.org/ Signed-off-by: William Breathitt Gray Link: https://lore.kernel.org/r/a51fd608704bdfc5a0efa503fc5481df34241e0a.1664318353.git.william.gray@linaro.org Signed-off-by: Greg Kroah-Hartman --- include/linux/counter.h | 134 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 134 insertions(+) (limited to 'include/linux') diff --git a/include/linux/counter.h b/include/linux/counter.h index e160197971dd..c41fa602ed28 100644 --- a/include/linux/counter.h +++ b/include/linux/counter.h @@ -32,6 +32,7 @@ enum counter_comp_type { COUNTER_COMP_COUNT_DIRECTION, COUNTER_COMP_COUNT_MODE, COUNTER_COMP_SIGNAL_POLARITY, + COUNTER_COMP_ARRAY, }; /** @@ -69,6 +70,30 @@ enum counter_comp_type { * @signal_u64_read: Signal u64 component read callback. The read value of * the respective Signal u64 component should be passed * back via the val parameter. + * @signal_array_u32_read: Signal u32 array component read callback. The + * index of the respective Count u32 array + * component element is passed via the idx + * parameter. The read value of the respective + * Count u32 array component element should be + * passed back via the val parameter. + * @device_array_u64_read: Device u64 array component read callback. The + * index of the respective Device u64 array + * component element is passed via the idx + * parameter. The read value of the respective + * Device u64 array component element should be + * passed back via the val parameter. + * @count_array_u64_read: Count u64 array component read callback. The + * index of the respective Count u64 array + * component element is passed via the idx + * parameter. The read value of the respective + * Count u64 array component element should be + * passed back via the val parameter. + * @signal_array_u64_read: Signal u64 array component read callback. The + * index of the respective Count u64 array + * component element is passed via the idx + * parameter. The read value of the respective + * Count u64 array component element should be + * passed back via the val parameter. * @action_write: Synapse action mode write callback. The write value of * the respective Synapse action mode is passed via the * action parameter. @@ -99,6 +124,30 @@ enum counter_comp_type { * @signal_u64_write: Signal u64 component write callback. The write value of * the respective Signal u64 component is passed via the * val parameter. + * @signal_array_u32_write: Signal u32 array component write callback. The + * index of the respective Signal u32 array + * component element is passed via the idx + * parameter. The write value of the respective + * Signal u32 array component element is passed via + * the val parameter. + * @device_array_u64_write: Device u64 array component write callback. The + * index of the respective Device u64 array + * component element is passed via the idx + * parameter. The write value of the respective + * Device u64 array component element is passed via + * the val parameter. + * @count_array_u64_write: Count u64 array component write callback. The + * index of the respective Count u64 array + * component element is passed via the idx + * parameter. The write value of the respective + * Count u64 array component element is passed via + * the val parameter. + * @signal_array_u64_write: Signal u64 array component write callback. The + * index of the respective Signal u64 array + * component element is passed via the idx + * parameter. The write value of the respective + * Signal u64 array component element is passed via + * the val parameter. */ struct counter_comp { enum counter_comp_type type; @@ -126,6 +175,17 @@ struct counter_comp { struct counter_count *count, u64 *val); int (*signal_u64_read)(struct counter_device *counter, struct counter_signal *signal, u64 *val); + int (*signal_array_u32_read)(struct counter_device *counter, + struct counter_signal *signal, + size_t idx, u32 *val); + int (*device_array_u64_read)(struct counter_device *counter, + size_t idx, u64 *val); + int (*count_array_u64_read)(struct counter_device *counter, + struct counter_count *count, + size_t idx, u64 *val); + int (*signal_array_u64_read)(struct counter_device *counter, + struct counter_signal *signal, + size_t idx, u64 *val); }; union { int (*action_write)(struct counter_device *counter, @@ -149,6 +209,17 @@ struct counter_comp { struct counter_count *count, u64 val); int (*signal_u64_write)(struct counter_device *counter, struct counter_signal *signal, u64 val); + int (*signal_array_u32_write)(struct counter_device *counter, + struct counter_signal *signal, + size_t idx, u32 val); + int (*device_array_u64_write)(struct counter_device *counter, + size_t idx, u64 val); + int (*count_array_u64_write)(struct counter_device *counter, + struct counter_count *count, + size_t idx, u64 val); + int (*signal_array_u64_write)(struct counter_device *counter, + struct counter_signal *signal, + size_t idx, u64 val); }; }; @@ -453,6 +524,57 @@ struct counter_available { .priv = &(_available), \ } +struct counter_array { + enum counter_comp_type type; + const struct counter_available *avail; + union { + size_t length; + size_t idx; + }; +}; + +#define DEFINE_COUNTER_ARRAY_U64(_name, _length) \ + struct counter_array _name = { \ + .type = COUNTER_COMP_U64, \ + .length = (_length), \ + } + +#define DEFINE_COUNTER_ARRAY_CAPTURE(_name, _length) \ + DEFINE_COUNTER_ARRAY_U64(_name, _length) + +#define DEFINE_COUNTER_ARRAY_POLARITY(_name, _enums, _length) \ + DEFINE_COUNTER_AVAILABLE(_name##_available, _enums); \ + struct counter_array _name = { \ + .type = COUNTER_COMP_SIGNAL_POLARITY, \ + .avail = &(_name##_available), \ + .length = (_length), \ + } + +#define COUNTER_COMP_DEVICE_ARRAY_U64(_name, _read, _write, _array) \ +{ \ + .type = COUNTER_COMP_ARRAY, \ + .name = (_name), \ + .device_array_u64_read = (_read), \ + .device_array_u64_write = (_write), \ + .priv = &(_array), \ +} +#define COUNTER_COMP_COUNT_ARRAY_U64(_name, _read, _write, _array) \ +{ \ + .type = COUNTER_COMP_ARRAY, \ + .name = (_name), \ + .count_array_u64_read = (_read), \ + .count_array_u64_write = (_write), \ + .priv = &(_array), \ +} +#define COUNTER_COMP_SIGNAL_ARRAY_U64(_name, _read, _write, _array) \ +{ \ + .type = COUNTER_COMP_ARRAY, \ + .name = (_name), \ + .signal_array_u64_read = (_read), \ + .signal_array_u64_write = (_write), \ + .priv = &(_array), \ +} + #define COUNTER_COMP_CAPTURE(_read, _write) \ COUNTER_COMP_COUNT_U64("capture", _read, _write) @@ -496,4 +618,16 @@ struct counter_available { #define COUNTER_COMP_PRESET_ENABLE(_read, _write) \ COUNTER_COMP_COUNT_BOOL("preset_enable", _read, _write) +#define COUNTER_COMP_ARRAY_CAPTURE(_read, _write, _array) \ + COUNTER_COMP_COUNT_ARRAY_U64("capture", _read, _write, _array) + +#define COUNTER_COMP_ARRAY_POLARITY(_read, _write, _array) \ +{ \ + .type = COUNTER_COMP_ARRAY, \ + .name = "polarity", \ + .signal_array_u32_read = (_read), \ + .signal_array_u32_write = (_write), \ + .priv = &(_array), \ +} + #endif /* _COUNTER_H_ */ -- cgit From de671d6116b5210097cd6fbb877bac92536f265b Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Wed, 21 Sep 2022 15:19:54 -0600 Subject: block: change request end_io handler to pass back a return value Everything is just converted to returning RQ_END_IO_NONE, and there should be no functional changes with this patch. In preparation for allowing the end_io handler to pass ownership back to the block layer, rather than retain ownership of the request. Reviewed-by: Keith Busch Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 00a15808c137..e6fa49dd6196 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -14,7 +14,12 @@ struct blk_flush_queue; #define BLKDEV_MIN_RQ 4 #define BLKDEV_DEFAULT_RQ 128 -typedef void (rq_end_io_fn)(struct request *, blk_status_t); +enum rq_end_io_ret { + RQ_END_IO_NONE, + RQ_END_IO_FREE, +}; + +typedef enum rq_end_io_ret (rq_end_io_fn)(struct request *, blk_status_t); /* * request flags */ -- cgit From ab3e1d3bbab9e973aeb4dd4603251578658a47ff Mon Sep 17 00:00:00 2001 From: Jens Axboe Date: Wed, 21 Sep 2022 08:24:16 -0600 Subject: block: allow end_io based requests in the completion batch handling With end_io handlers now being able to potentially pass ownership of the request upon completion, we can allow requests with end_io handlers in the batch completion handling. Reviewed-by: Anuj Gupta Reviewed-by: Keith Busch Co-developed-by: Stefan Roesch Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index e6fa49dd6196..50811d0fb143 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -853,8 +853,9 @@ static inline bool blk_mq_add_to_batch(struct request *req, struct io_comp_batch *iob, int ioerror, void (*complete)(struct io_comp_batch *)) { - if (!iob || (req->rq_flags & RQF_ELV) || req->end_io || ioerror) + if (!iob || (req->rq_flags & RQF_ELV) || ioerror) return false; + if (!iob->complete) iob->complete = complete; else if (iob->complete != complete) -- cgit From a9216fac3ed8819cbbda5d39dd5fcaa43dfd35d8 Mon Sep 17 00:00:00 2001 From: Anuj Gupta Date: Fri, 30 Sep 2022 11:57:38 +0530 Subject: io_uring: add io_uring_cmd_import_fixed This is a new helper that callers can use to obtain a bvec iterator for the previously mapped buffer. This is preparatory work to enable fixed-buffer support for io_uring_cmd. Signed-off-by: Anuj Gupta Signed-off-by: Kanchan Joshi Link: https://lore.kernel.org/r/20220930062749.152261-2-anuj20.g@samsung.com Signed-off-by: Jens Axboe --- include/linux/io_uring.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index 58676c0a398f..1dbf51115c30 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -4,6 +4,7 @@ #include #include +#include enum io_uring_cmd_flags { IO_URING_F_COMPLETE_DEFER = 1, @@ -32,6 +33,8 @@ struct io_uring_cmd { }; #if defined(CONFIG_IO_URING) +int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw, + struct iov_iter *iter, void *ioucmd); void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t res2); void io_uring_cmd_complete_in_task(struct io_uring_cmd *ioucmd, void (*task_work_cb)(struct io_uring_cmd *)); @@ -59,6 +62,11 @@ static inline void io_uring_free(struct task_struct *tsk) __io_uring_free(tsk); } #else +static int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw, + struct iov_iter *iter, void *ioucmd) +{ + return -EOPNOTSUPP; +} static inline void io_uring_cmd_done(struct io_uring_cmd *cmd, ssize_t ret, ssize_t ret2) { -- cgit From 9cda70f622cdcf049521a9c2886e5fd8a90a0591 Mon Sep 17 00:00:00 2001 From: Anuj Gupta Date: Fri, 30 Sep 2022 11:57:39 +0530 Subject: io_uring: introduce fixed buffer support for io_uring_cmd Add IORING_URING_CMD_FIXED flag that is to be used for sending io_uring command with previously registered buffers. User-space passes the buffer index in sqe->buf_index, same as done in read/write variants that uses fixed buffers. Signed-off-by: Anuj Gupta Signed-off-by: Kanchan Joshi Link: https://lore.kernel.org/r/20220930062749.152261-3-anuj20.g@samsung.com [axboe: shuffle valid flags check before acting on it] Signed-off-by: Jens Axboe --- include/linux/io_uring.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index 1dbf51115c30..e10c5cc81082 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -28,7 +28,7 @@ struct io_uring_cmd { void *cookie; }; u32 cmd_op; - u32 pad; + u32 flags; u8 pdu[32]; /* available inline for free use */ }; -- cgit From 557654025df5706785d395558244890dc4b2c875 Mon Sep 17 00:00:00 2001 From: Anuj Gupta Date: Fri, 30 Sep 2022 11:57:40 +0530 Subject: block: add blk_rq_map_user_io Create a helper blk_rq_map_user_io for mapping of vectored as well as non-vectored requests. This will help in saving dupilcation of code at few places in scsi and nvme. Signed-off-by: Anuj Gupta Suggested-by: Christoph Hellwig Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20220930062749.152261-4-anuj20.g@samsung.com Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index 50811d0fb143..ba18e9bdb799 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -985,6 +985,8 @@ struct rq_map_data { int blk_rq_map_user(struct request_queue *, struct request *, struct rq_map_data *, void __user *, unsigned long, gfp_t); +int blk_rq_map_user_io(struct request *, struct rq_map_data *, + void __user *, unsigned long, gfp_t, bool, int, bool, int); int blk_rq_map_user_iov(struct request_queue *, struct request *, struct rq_map_data *, const struct iov_iter *, gfp_t); int blk_rq_unmap_user(struct bio *); -- cgit From e5a3cc83d54019a90119ce0cf17b5e21729b79bf Mon Sep 17 00:00:00 2001 From: Maxim Mikityanskiy Date: Thu, 29 Sep 2022 00:21:42 -0700 Subject: net/mlx5e: Use runtime page_shift for striding RQ This commit allows striding RQ to determine MTT page size at runtime, instead of sticking to the compile-time PAGE_SIZE. This functionality will be used by a following commit that adjusts the MTT page size to the XSK frame size. Stick with PAGE_SIZE for XSK on legacy RQ, as frag_stride is not used in data path, it only helps calculate how pages are partitioned into fragments, and PAGE_SIZE will ensure each fragment starts at the beginning of a new allocation unit (XSK frame). Signed-off-by: Maxim Mikityanskiy Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed Signed-off-by: Jakub Kicinski --- include/linux/mlx5/qp.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h index be640c749d5e..afac93f9552c 100644 --- a/include/linux/mlx5/qp.h +++ b/include/linux/mlx5/qp.h @@ -162,6 +162,8 @@ enum { MLX5_SEND_WQE_MAX_WQEBBS = 16, }; +#define MLX5_SEND_WQE_MAX_SIZE (MLX5_SEND_WQE_MAX_WQEBBS * MLX5_SEND_WQE_BB) + enum { MLX5_WQE_FMR_PERM_LOCAL_READ = 1 << 27, MLX5_WQE_FMR_PERM_LOCAL_WRITE = 1 << 28, -- cgit From 6470d2e7e8ed8e9dd560d8dc3e09d1100a17ee26 Mon Sep 17 00:00:00 2001 From: Maxim Mikityanskiy Date: Thu, 29 Sep 2022 00:21:46 -0700 Subject: net/mlx5e: xsk: Use KSM for unaligned XSK UMR MTTs used in striding RQ have certain alignment requirements. While it's guaranteed to work when UMR pages are aligned to the UMR page size, in practice it works then UMR pages are aligned to 8 bytes. However, it's still not enough flexibility for the unaligned mode of XSK. This patch leverages KSM to map UMR pages without alignment requirements, when unaligned XSK is active. The downside is that KSM entries are twice as big as MTTs, which limits the maximum WQE size, so regular RQs and aligned XSK continue using MTTs. Signed-off-by: Maxim Mikityanskiy Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed Signed-off-by: Jakub Kicinski --- include/linux/mlx5/qp.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/qp.h b/include/linux/mlx5/qp.h index afac93f9552c..4657d5c54abe 100644 --- a/include/linux/mlx5/qp.h +++ b/include/linux/mlx5/qp.h @@ -478,6 +478,12 @@ struct mlx5_klm { __be64 va; }; +struct mlx5_ksm { + __be32 reserved; + __be32 key; + __be64 va; +}; + struct mlx5_stride_block_entry { __be16 stride; __be16 bcount; -- cgit From 82b1ec794d701478381482264f3bfada3a7bf2d9 Mon Sep 17 00:00:00 2001 From: Sumeet Pawnikar Date: Tue, 27 Sep 2022 21:17:09 +0530 Subject: thermal: core: Increase maximum number of trip points On one of the Chrome system, if we define more than 12 trip points, probe for thermal sensor fails with "int3403 thermal: probe of INTC1046:03 failed with error -22" and throws an error as "thermal_sys: Error: Incorrect number of thermal trips". The thermal_zone_device_register() interface needs maximum number of trip points supported in a zone as an argument. This number can't exceed THERMAL_MAX_TRIPS, which is currently set to 12. To address this issue, THERMAL_MAX_TRIPS value has to be increased. This interface also has an argument to specify a mask of trips which are writable. This mask is defined as an int. This mask sets the ceiling for increasing maximum number of supported trips. With the current implementation, maximum number of trips can be supported is 31. Also, THERMAL_MAX_TRIPS macro is used in one place only. So, remove THERMAL_MAX_TRIPS macro and compare num_trips directly with using a macro BITS_PER_TYPE(int)-1. Signed-off-by: Sumeet Pawnikar Reviewed-by: Andy Shevchenko Signed-off-by: Rafael J. Wysocki --- include/linux/thermal.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index 86c24ddd5985..6f1ec4fb7ef8 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -17,8 +17,6 @@ #include #include -#define THERMAL_MAX_TRIPS 12 - /* invalid cooling state */ #define THERMAL_CSTATE_INVALID -1UL -- cgit From 88269151be679b9accb2f1e73487eaeb9eae9e39 Mon Sep 17 00:00:00 2001 From: Dmitry Torokhov Date: Thu, 29 Sep 2022 17:32:49 -0700 Subject: of: base: make of_device_compatible_match() accept const device node of_device_is_compatible() accepts const device node pointer, there is no reason why of_device_compatible_match() can't do the same. Signed-off-by: Dmitry Torokhov Link: https://lore.kernel.org/r/YzY5MaU5N4A2st5R@google.com Signed-off-by: Rob Herring --- include/linux/of.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/of.h b/include/linux/of.h index 766d002bddb9..6b79ef9a6541 100644 --- a/include/linux/of.h +++ b/include/linux/of.h @@ -342,7 +342,7 @@ extern int of_property_read_string_helper(const struct device_node *np, const char **out_strs, size_t sz, int index); extern int of_device_is_compatible(const struct device_node *device, const char *); -extern int of_device_compatible_match(struct device_node *device, +extern int of_device_compatible_match(const struct device_node *device, const char *const *compat); extern bool of_device_is_available(const struct device_node *device); extern bool of_device_is_big_endian(const struct device_node *device); @@ -562,7 +562,7 @@ static inline int of_device_is_compatible(const struct device_node *device, return 0; } -static inline int of_device_compatible_match(struct device_node *device, +static inline int of_device_compatible_match(const struct device_node *device, const char *const *compat) { return 0; -- cgit From 1c8934b4802d2744c97e7c97d244af967f4bf141 Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Thu, 23 Jun 2022 14:57:17 +0300 Subject: clk: Remove never used devm_of_clk_del_provider() For the entire history of the devm_of_clk_del_provider) existence (since 2017) it was never used. Remove it for good. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220623115719.52683-1-andriy.shevchenko@linux.intel.com Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index 1615010aa0ec..305e18eb23cf 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -1454,7 +1454,7 @@ int devm_of_clk_add_hw_provider(struct device *dev, void *data), void *data); void of_clk_del_provider(struct device_node *np); -void devm_of_clk_del_provider(struct device *dev); + struct clk *of_clk_src_simple_get(struct of_phandle_args *clkspec, void *data); struct clk_hw *of_clk_hw_simple_get(struct of_phandle_args *clkspec, @@ -1491,7 +1491,7 @@ static inline int devm_of_clk_add_hw_provider(struct device *dev, return 0; } static inline void of_clk_del_provider(struct device_node *np) {} -static inline void devm_of_clk_del_provider(struct device *dev) {} + static inline struct clk *of_clk_src_simple_get( struct of_phandle_args *clkspec, void *data) { -- cgit From 07bdf48d3fee12268bf9a179821aee9b80f5239c Mon Sep 17 00:00:00 2001 From: Andy Shevchenko Date: Thu, 23 Jun 2022 14:57:18 +0300 Subject: clkdev: Remove never used devm_clk_release_clkdev() For the entire history of the devm_clk_release_clkdev() existence (since 2018) it was never used. Remove it for good. Signed-off-by: Andy Shevchenko Link: https://lore.kernel.org/r/20220623115719.52683-2-andriy.shevchenko@linux.intel.com Signed-off-by: Stephen Boyd --- include/linux/clkdev.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/clkdev.h b/include/linux/clkdev.h index 8a8423eb8e9a..45570bc21a43 100644 --- a/include/linux/clkdev.h +++ b/include/linux/clkdev.h @@ -46,6 +46,4 @@ int clk_hw_register_clkdev(struct clk_hw *, const char *, const char *); int devm_clk_hw_register_clkdev(struct device *dev, struct clk_hw *hw, const char *con_id, const char *dev_id); -void devm_clk_release_clkdev(struct device *dev, const char *con_id, - const char *dev_id); #endif -- cgit From 854701ba4c39afae2362ba19a580c461cb183e4f Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Mon, 19 Sep 2022 14:05:54 -0700 Subject: net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and} The functions require to be passed with a cpu index prior to one that is the first to start search, so the valid input range is [-1, nr_cpu_ids-1). However, the code checks against [-1, nr_cpu_ids). Acked-by: Jakub Kicinski Signed-off-by: Yury Norov --- include/linux/netdevice.h | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 05d6f3facd5a..4d6d5a2dd82e 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3643,9 +3643,8 @@ static inline bool netif_attr_test_online(unsigned long j, static inline unsigned int netif_attrmask_next(int n, const unsigned long *srcp, unsigned int nr_bits) { - /* -1 is a legal arg here. */ - if (n != -1) - cpu_max_bits_warn(n, nr_bits); + /* n is a prior cpu */ + cpu_max_bits_warn(n + 1, nr_bits); if (srcp) return find_next_bit(srcp, nr_bits, n + 1); @@ -3666,9 +3665,8 @@ static inline int netif_attrmask_next_and(int n, const unsigned long *src1p, const unsigned long *src2p, unsigned int nr_bits) { - /* -1 is a legal arg here. */ - if (n != -1) - cpu_max_bits_warn(n, nr_bits); + /* n is a prior cpu */ + cpu_max_bits_warn(n + 1, nr_bits); if (src1p && src2p) return find_next_and_bit(src1p, src2p, nr_bits, n + 1); -- cgit From 33e67710beda78aed38a2fe10be6088d4aeb1c53 Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Mon, 19 Sep 2022 14:05:55 -0700 Subject: cpumask: switch for_each_cpu{,_not} to use for_each_bit() The difference between for_each_cpu() and for_each_set_bit() is that the latter uses cpumask_next() instead of find_next_bit(), and so calls cpumask_check(). This check is useless because the iterator value is not provided by user. It generates false-positives for the very last iteration of for_each_cpu(). Signed-off-by: Yury Norov --- include/linux/cpumask.h | 12 +++--------- include/linux/find.h | 5 +++++ 2 files changed, 8 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index e4f9136a4a63..9c85774d45a4 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -246,9 +246,7 @@ unsigned int cpumask_next_and(int n, const struct cpumask *src1p, * After the loop, cpu is >= nr_cpu_ids. */ #define for_each_cpu(cpu, mask) \ - for ((cpu) = -1; \ - (cpu) = cpumask_next((cpu), (mask)), \ - (cpu) < nr_cpu_ids;) + for_each_set_bit(cpu, cpumask_bits(mask), nr_cpumask_bits) /** * for_each_cpu_not - iterate over every cpu in a complemented mask @@ -258,9 +256,7 @@ unsigned int cpumask_next_and(int n, const struct cpumask *src1p, * After the loop, cpu is >= nr_cpu_ids. */ #define for_each_cpu_not(cpu, mask) \ - for ((cpu) = -1; \ - (cpu) = cpumask_next_zero((cpu), (mask)), \ - (cpu) < nr_cpu_ids;) + for_each_clear_bit(cpu, cpumask_bits(mask), nr_cpumask_bits) #if NR_CPUS == 1 static inline @@ -313,9 +309,7 @@ unsigned int __pure cpumask_next_wrap(int n, const struct cpumask *mask, int sta * After the loop, cpu is >= nr_cpu_ids. */ #define for_each_cpu_and(cpu, mask1, mask2) \ - for ((cpu) = -1; \ - (cpu) = cpumask_next_and((cpu), (mask1), (mask2)), \ - (cpu) < nr_cpu_ids;) + for_each_and_bit(cpu, cpumask_bits(mask1), cpumask_bits(mask2), nr_cpumask_bits) /** * cpumask_any_but - return a "random" in a cpumask, but not this one. diff --git a/include/linux/find.h b/include/linux/find.h index b100944daba0..128615a3f93e 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -390,6 +390,11 @@ unsigned long find_next_bit_le(const void *addr, unsigned (bit) < (size); \ (bit) = find_next_bit((addr), (size), (bit) + 1)) +#define for_each_and_bit(bit, addr1, addr2, size) \ + for ((bit) = find_next_and_bit((addr1), (addr2), (size), 0); \ + (bit) < (size); \ + (bit) = find_next_and_bit((addr1), (addr2), (size), (bit) + 1)) + /* same as for_each_set_bit() but use bit as value to start with */ #define for_each_set_bit_from(bit, addr, size) \ for ((bit) = find_next_bit((addr), (size), (bit)); \ -- cgit From 6cc18331a987c4a29d66b9c4fd292587fba4d7bd Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Mon, 19 Sep 2022 14:05:56 -0700 Subject: lib/find_bit: add find_next{,_and}_bit_wrap The helper is better optimized for the worst case: in case of empty cpumask, current code traverses 2 * size: next = cpumask_next_and(prev, src1p, src2p); if (next >= nr_cpu_ids) next = cpumask_first_and(src1p, src2p); At bitmap level we can stop earlier after checking 'size + offset' bits. Signed-off-by: Yury Norov --- include/linux/find.h | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) (limited to 'include/linux') diff --git a/include/linux/find.h b/include/linux/find.h index 128615a3f93e..77c087b7a451 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -290,6 +290,52 @@ unsigned long find_last_bit(const unsigned long *addr, unsigned long size) } #endif +/** + * find_next_and_bit_wrap - find the next set bit in both memory regions + * @addr1: The first address to base the search on + * @addr2: The second address to base the search on + * @size: The bitmap size in bits + * @offset: The bitnumber to start searching at + * + * Returns the bit number for the next set bit, or first set bit up to @offset + * If no bits are set, returns @size. + */ +static inline +unsigned long find_next_and_bit_wrap(const unsigned long *addr1, + const unsigned long *addr2, + unsigned long size, unsigned long offset) +{ + unsigned long bit = find_next_and_bit(addr1, addr2, size, offset); + + if (bit < size) + return bit; + + bit = find_first_and_bit(addr1, addr2, offset); + return bit < offset ? bit : size; +} + +/** + * find_next_bit_wrap - find the next set bit in both memory regions + * @addr: The first address to base the search on + * @size: The bitmap size in bits + * @offset: The bitnumber to start searching at + * + * Returns the bit number for the next set bit, or first set bit up to @offset + * If no bits are set, returns @size. + */ +static inline +unsigned long find_next_bit_wrap(const unsigned long *addr, + unsigned long size, unsigned long offset) +{ + unsigned long bit = find_next_bit(addr, size, offset); + + if (bit < size) + return bit; + + bit = find_first_bit(addr, offset); + return bit < offset ? bit : size; +} + /** * find_next_clump8 - find next 8-bit clump with set bits in a memory region * @clump: location to store copy of found clump -- cgit From 4fe49b3b97c2640147c46519c2a6fdb06df34f5f Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Mon, 19 Sep 2022 14:05:57 -0700 Subject: lib/bitmap: introduce for_each_set_bit_wrap() macro Add for_each_set_bit_wrap() macro and use it in for_each_cpu_wrap(). The new macro is based on __for_each_wrap() iterator, which is simpler and smaller than cpumask_next_wrap(). Signed-off-by: Yury Norov --- include/linux/cpumask.h | 6 ++---- include/linux/find.h | 39 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 41 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 9c85774d45a4..13b32dd9803b 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -289,10 +289,8 @@ unsigned int __pure cpumask_next_wrap(int n, const struct cpumask *mask, int sta * * After the loop, cpu is >= nr_cpu_ids. */ -#define for_each_cpu_wrap(cpu, mask, start) \ - for ((cpu) = cpumask_next_wrap((start)-1, (mask), (start), false); \ - (cpu) < nr_cpumask_bits; \ - (cpu) = cpumask_next_wrap((cpu), (mask), (start), true)) +#define for_each_cpu_wrap(cpu, mask, start) \ + for_each_set_bit_wrap(cpu, cpumask_bits(mask), nr_cpumask_bits, start) /** * for_each_cpu_and - iterate over every cpu in both masks diff --git a/include/linux/find.h b/include/linux/find.h index 77c087b7a451..3b746a183216 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -336,6 +336,32 @@ unsigned long find_next_bit_wrap(const unsigned long *addr, return bit < offset ? bit : size; } +/* + * Helper for for_each_set_bit_wrap(). Make sure you're doing right thing + * before using it alone. + */ +static inline +unsigned long __for_each_wrap(const unsigned long *bitmap, unsigned long size, + unsigned long start, unsigned long n) +{ + unsigned long bit; + + /* If not wrapped around */ + if (n > start) { + /* and have a bit, just return it. */ + bit = find_next_bit(bitmap, size, n); + if (bit < size) + return bit; + + /* Otherwise, wrap around and ... */ + n = 0; + } + + /* Search the other part. */ + bit = find_next_bit(bitmap, start, n); + return bit < start ? bit : size; +} + /** * find_next_clump8 - find next 8-bit clump with set bits in a memory region * @clump: location to store copy of found clump @@ -514,6 +540,19 @@ unsigned long find_next_bit_le(const void *addr, unsigned (b) = find_next_zero_bit((addr), (size), (e) + 1), \ (e) = find_next_bit((addr), (size), (b) + 1)) +/** + * for_each_set_bit_wrap - iterate over all set bits starting from @start, and + * wrapping around the end of bitmap. + * @bit: offset for current iteration + * @addr: bitmap address to base the search on + * @size: bitmap size in number of bits + * @start: Starting bit for bitmap traversing, wrapping around the bitmap end + */ +#define for_each_set_bit_wrap(bit, addr, size, start) \ + for ((bit) = find_next_bit_wrap((addr), (size), (start)); \ + (bit) < (size); \ + (bit) = __for_each_wrap((addr), (size), (start), (bit) + 1)) + /** * for_each_set_clump8 - iterate over bitmap for each 8-bit clump with set bits * @start: bit offset to start search and to store the current iteration offset -- cgit From fdae96a3fc7f70eb8ff9619550d7fa604719626a Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Mon, 19 Sep 2022 14:05:58 -0700 Subject: lib/find: optimize for_each() macros Moving an iterator of the macros inside conditional part of for-loop helps to generate a better code. It had been first implemented in commit 7baac8b91f9871ba ("cpumask: make for_each_cpu_mask a bit smaller"). Now that cpumask for-loops are the aliases to bitmap loops, it's worth to optimize them the same way. Bloat-o-meter says: add/remove: 8/12 grow/shrink: 147/592 up/down: 4876/-24416 (-19540) Signed-off-by: Yury Norov --- include/linux/find.h | 56 +++++++++++++++++++++++----------------------------- 1 file changed, 25 insertions(+), 31 deletions(-) (limited to 'include/linux') diff --git a/include/linux/find.h b/include/linux/find.h index 3b746a183216..0cdfab9734a6 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -458,31 +458,25 @@ unsigned long find_next_bit_le(const void *addr, unsigned #endif #define for_each_set_bit(bit, addr, size) \ - for ((bit) = find_next_bit((addr), (size), 0); \ - (bit) < (size); \ - (bit) = find_next_bit((addr), (size), (bit) + 1)) + for ((bit) = 0; (bit) = find_next_bit((addr), (size), (bit)), (bit) < (size); (bit)++) #define for_each_and_bit(bit, addr1, addr2, size) \ - for ((bit) = find_next_and_bit((addr1), (addr2), (size), 0); \ - (bit) < (size); \ - (bit) = find_next_and_bit((addr1), (addr2), (size), (bit) + 1)) + for ((bit) = 0; \ + (bit) = find_next_and_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\ + (bit)++) /* same as for_each_set_bit() but use bit as value to start with */ #define for_each_set_bit_from(bit, addr, size) \ - for ((bit) = find_next_bit((addr), (size), (bit)); \ - (bit) < (size); \ - (bit) = find_next_bit((addr), (size), (bit) + 1)) + for (; (bit) = find_next_bit((addr), (size), (bit)), (bit) < (size); (bit)++) #define for_each_clear_bit(bit, addr, size) \ - for ((bit) = find_next_zero_bit((addr), (size), 0); \ - (bit) < (size); \ - (bit) = find_next_zero_bit((addr), (size), (bit) + 1)) + for ((bit) = 0; \ + (bit) = find_next_zero_bit((addr), (size), (bit)), (bit) < (size); \ + (bit)++) /* same as for_each_clear_bit() but use bit as value to start with */ #define for_each_clear_bit_from(bit, addr, size) \ - for ((bit) = find_next_zero_bit((addr), (size), (bit)); \ - (bit) < (size); \ - (bit) = find_next_zero_bit((addr), (size), (bit) + 1)) + for (; (bit) = find_next_zero_bit((addr), (size), (bit)), (bit) < (size); (bit)++) /** * for_each_set_bitrange - iterate over all set bit ranges [b; e) @@ -492,11 +486,11 @@ unsigned long find_next_bit_le(const void *addr, unsigned * @size: bitmap size in number of bits */ #define for_each_set_bitrange(b, e, addr, size) \ - for ((b) = find_next_bit((addr), (size), 0), \ - (e) = find_next_zero_bit((addr), (size), (b) + 1); \ + for ((b) = 0; \ + (b) = find_next_bit((addr), (size), b), \ + (e) = find_next_zero_bit((addr), (size), (b) + 1), \ (b) < (size); \ - (b) = find_next_bit((addr), (size), (e) + 1), \ - (e) = find_next_zero_bit((addr), (size), (b) + 1)) + (b) = (e) + 1) /** * for_each_set_bitrange_from - iterate over all set bit ranges [b; e) @@ -506,11 +500,11 @@ unsigned long find_next_bit_le(const void *addr, unsigned * @size: bitmap size in number of bits */ #define for_each_set_bitrange_from(b, e, addr, size) \ - for ((b) = find_next_bit((addr), (size), (b)), \ - (e) = find_next_zero_bit((addr), (size), (b) + 1); \ + for (; \ + (b) = find_next_bit((addr), (size), (b)), \ + (e) = find_next_zero_bit((addr), (size), (b) + 1), \ (b) < (size); \ - (b) = find_next_bit((addr), (size), (e) + 1), \ - (e) = find_next_zero_bit((addr), (size), (b) + 1)) + (b) = (e) + 1) /** * for_each_clear_bitrange - iterate over all unset bit ranges [b; e) @@ -520,11 +514,11 @@ unsigned long find_next_bit_le(const void *addr, unsigned * @size: bitmap size in number of bits */ #define for_each_clear_bitrange(b, e, addr, size) \ - for ((b) = find_next_zero_bit((addr), (size), 0), \ - (e) = find_next_bit((addr), (size), (b) + 1); \ + for ((b) = 0; \ + (b) = find_next_zero_bit((addr), (size), (b)), \ + (e) = find_next_bit((addr), (size), (b) + 1), \ (b) < (size); \ - (b) = find_next_zero_bit((addr), (size), (e) + 1), \ - (e) = find_next_bit((addr), (size), (b) + 1)) + (b) = (e) + 1) /** * for_each_clear_bitrange_from - iterate over all unset bit ranges [b; e) @@ -534,11 +528,11 @@ unsigned long find_next_bit_le(const void *addr, unsigned * @size: bitmap size in number of bits */ #define for_each_clear_bitrange_from(b, e, addr, size) \ - for ((b) = find_next_zero_bit((addr), (size), (b)), \ - (e) = find_next_bit((addr), (size), (b) + 1); \ + for (; \ + (b) = find_next_zero_bit((addr), (size), (b)), \ + (e) = find_next_bit((addr), (size), (b) + 1), \ (b) < (size); \ - (b) = find_next_zero_bit((addr), (size), (e) + 1), \ - (e) = find_next_bit((addr), (size), (b) + 1)) + (b) = (e) + 1) /** * for_each_set_bit_wrap - iterate over all set bits starting from @start, and -- cgit From 78e5a3399421ad79fc024e6d78e2deb7809d26af Mon Sep 17 00:00:00 2001 From: Yury Norov Date: Mon, 19 Sep 2022 14:05:53 -0700 Subject: cpumask: fix checking valid cpu range The range of valid CPUs is [0, nr_cpu_ids). Some cpumask functions are passed with a shifted CPU index, and for them, the valid range is [-1, nr_cpu_ids-1). Currently for those functions, we check the index against [-1, nr_cpu_ids), which is wrong. Signed-off-by: Yury Norov --- include/linux/cpumask.h | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 13b32dd9803b..286804bfe3b7 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -174,9 +174,8 @@ static inline unsigned int cpumask_last(const struct cpumask *srcp) static inline unsigned int cpumask_next(int n, const struct cpumask *srcp) { - /* -1 is a legal arg here. */ - if (n != -1) - cpumask_check(n); + /* n is a prior cpu */ + cpumask_check(n + 1); return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1); } @@ -189,9 +188,8 @@ unsigned int cpumask_next(int n, const struct cpumask *srcp) */ static inline unsigned int cpumask_next_zero(int n, const struct cpumask *srcp) { - /* -1 is a legal arg here. */ - if (n != -1) - cpumask_check(n); + /* n is a prior cpu */ + cpumask_check(n + 1); return find_next_zero_bit(cpumask_bits(srcp), nr_cpumask_bits, n+1); } @@ -231,9 +229,8 @@ static inline unsigned int cpumask_next_and(int n, const struct cpumask *src1p, const struct cpumask *src2p) { - /* -1 is a legal arg here. */ - if (n != -1) - cpumask_check(n); + /* n is a prior cpu */ + cpumask_check(n + 1); return find_next_and_bit(cpumask_bits(src1p), cpumask_bits(src2p), nr_cpumask_bits, n + 1); } @@ -263,8 +260,8 @@ static inline unsigned int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool wrap) { cpumask_check(start); - if (n != -1) - cpumask_check(n); + /* n is a prior cpu */ + cpumask_check(n + 1); /* * Return the first available CPU when wrapping, or when starting before cpu0, -- cgit From fd580c9830316edad6f8b1d9f542563730658efe Mon Sep 17 00:00:00 2001 From: Russell King Date: Fri, 30 Sep 2022 16:21:00 +0200 Subject: net: sfp: augment SFP parsing with phy_interface_t bitmap MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit We currently parse the SFP EEPROM to a bitmap of ethtool link modes, and then attempt to convert the link modes to a PHY interface mode. While this works at present, there are cases where this is sub-optimal. For example, where a module can operate with several different PHY interface modes. To start addressing this, arrange for the SFP EEPROM parsing to also provide a bitmap of the possible PHY interface modes. Signed-off-by: Russell King Signed-off-by: Marek Behún Signed-off-by: David S. Miller --- include/linux/sfp.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sfp.h b/include/linux/sfp.h index 302094b855fb..d1f343853b6c 100644 --- a/include/linux/sfp.h +++ b/include/linux/sfp.h @@ -535,7 +535,7 @@ int sfp_parse_port(struct sfp_bus *bus, const struct sfp_eeprom_id *id, unsigned long *support); bool sfp_may_have_phy(struct sfp_bus *bus, const struct sfp_eeprom_id *id); void sfp_parse_support(struct sfp_bus *bus, const struct sfp_eeprom_id *id, - unsigned long *support); + unsigned long *support, unsigned long *interfaces); phy_interface_t sfp_select_interface(struct sfp_bus *bus, unsigned long *link_modes); @@ -568,7 +568,8 @@ static inline bool sfp_may_have_phy(struct sfp_bus *bus, static inline void sfp_parse_support(struct sfp_bus *bus, const struct sfp_eeprom_id *id, - unsigned long *support) + unsigned long *support, + unsigned long *interfaces) { } -- cgit From eca68a3c7d05b38b4e728cead0c49718f2bc1d5a Mon Sep 17 00:00:00 2001 From: Marek Behún Date: Fri, 30 Sep 2022 16:21:03 +0200 Subject: net: phylink: pass supported host PHY interface modes to phylib for SFP's PHYs MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Pass the supported PHY interface types to phylib if the PHY we are connecting is inside a SFP, so that the PHY driver can select an appropriate host configuration mode for their interface according to the host capabilities. For example the Marvell 88X3310 PHY inside RollBall SFP modules defaults to 10gbase-r mode on host's side, and the marvell10g driver currently does not change this setting. But a host may not support 10gbase-r. For example Turris Omnia only supports sgmii, 1000base-x and 2500base-x modes. The PHY can be configured to use those modes, but in order for the PHY driver to do that, it needs to know which modes are supported. Signed-off-by: Marek Behún Signed-off-by: David S. Miller --- include/linux/phy.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index 9c66f357f489..d65fc76fe0ae 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -571,6 +571,7 @@ struct macsec_ops; * @advertising: Currently advertised linkmodes * @adv_old: Saved advertised while power saving for WoL * @lp_advertising: Current link partner advertised linkmodes + * @host_interfaces: PHY interface modes supported by host * @eee_broken_modes: Energy efficient ethernet modes which should be prohibited * @autoneg: Flag autoneg being used * @rate_matching: Current rate matching mode @@ -670,6 +671,9 @@ struct phy_device { /* used with phy_speed_down */ __ETHTOOL_DECLARE_LINK_MODE_MASK(adv_old); + /* Host supported PHY interface types. Should be ignored if empty. */ + DECLARE_PHY_INTERFACE_MASK(host_interfaces); + /* Energy efficient ethernet modes which should be prohibited */ u32 eee_broken_modes; -- cgit From e85b1347ace677c3822c12d9332dfaaffe594da6 Mon Sep 17 00:00:00 2001 From: Marek Behún Date: Fri, 30 Sep 2022 16:21:08 +0200 Subject: net: sfp: create/destroy I2C mdiobus before PHY probe/after PHY release MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Instead of configuring the I2C mdiobus when SFP driver is probed, create/destroy the mdiobus before the PHY is probed for/after it is released. This way we can tell the mdio-i2c code which protocol to use for each SFP transceiver. Move the code that determines MDIO I2C protocol from sfp_sm_probe_for_phy() to sfp_sm_mod_probe(), where most of the SFP ID parsing is done. Don't allocate I2C bus if no PHY is expected. Signed-off-by: Marek Behún Signed-off-by: David S. Miller --- include/linux/mdio/mdio-i2c.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mdio/mdio-i2c.h b/include/linux/mdio/mdio-i2c.h index b1d27f7cd23f..3bde1a555a49 100644 --- a/include/linux/mdio/mdio-i2c.h +++ b/include/linux/mdio/mdio-i2c.h @@ -11,6 +11,12 @@ struct device; struct i2c_adapter; struct mii_bus; +enum mdio_i2c_proto { + MDIO_I2C_NONE, + MDIO_I2C_MARVELL_C22, + MDIO_I2C_C45, +}; + struct mii_bus *mdio_i2c_alloc(struct device *parent, struct i2c_adapter *i2c); #endif -- cgit From 09bbedac72d5a9267088c15d1a71c8c3a8fb47e7 Mon Sep 17 00:00:00 2001 From: Marek Behún Date: Fri, 30 Sep 2022 16:21:09 +0200 Subject: net: phy: mdio-i2c: support I2C MDIO protocol for RollBall SFP modules MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Some multigig SFPs from RollBall and Hilink do not expose functional MDIO access to the internal PHY of the SFP via I2C address 0x56 (although there seems to be read-only clause 22 access on this address). Instead these SFPs PHY can be accessed via I2C via the SFP Enhanced Digital Diagnostic Interface - I2C address 0x51. The SFP_PAGE has to be selected to 3 and the password must be filled with 0xff bytes for this PHY communication to work. This extends the mdio-i2c driver to support this protocol by adding a special parameter to mdio_i2c_alloc function via which this RollBall protocol can be selected. Signed-off-by: Marek Behún Cc: Andrew Lunn Cc: Russell King Signed-off-by: David S. Miller --- include/linux/mdio/mdio-i2c.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mdio/mdio-i2c.h b/include/linux/mdio/mdio-i2c.h index 3bde1a555a49..65b550a6fc32 100644 --- a/include/linux/mdio/mdio-i2c.h +++ b/include/linux/mdio/mdio-i2c.h @@ -15,8 +15,10 @@ enum mdio_i2c_proto { MDIO_I2C_NONE, MDIO_I2C_MARVELL_C22, MDIO_I2C_C45, + MDIO_I2C_ROLLBALL, }; -struct mii_bus *mdio_i2c_alloc(struct device *parent, struct i2c_adapter *i2c); +struct mii_bus *mdio_i2c_alloc(struct device *parent, struct i2c_adapter *i2c, + enum mdio_i2c_proto protocol); #endif -- cgit From 62c07983bef9d3e78e71189441e1a470f0d1e653 Mon Sep 17 00:00:00 2001 From: Eric Dumazet Date: Sat, 1 Oct 2022 13:51:02 -0700 Subject: once: add DO_ONCE_SLOW() for sleepable contexts Christophe Leroy reported a ~80ms latency spike happening at first TCP connect() time. This is because __inet_hash_connect() uses get_random_once() to populate a perturbation table which became quite big after commit 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16") get_random_once() uses DO_ONCE(), which block hard irqs for the duration of the operation. This patch adds DO_ONCE_SLOW() which uses a mutex instead of a spinlock for operations where we prefer to stay in process context. Then __inet_hash_connect() can use get_random_slow_once() to populate its perturbation table. Fixes: 4c2c8f03a5ab ("tcp: increase source port perturb table to 2^16") Fixes: 190cc82489f4 ("tcp: change source port randomizarion at connect() time") Reported-by: Christophe Leroy Link: https://lore.kernel.org/netdev/CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com/T/#t Signed-off-by: Eric Dumazet Cc: Willy Tarreau Tested-by: Christophe Leroy Signed-off-by: David S. Miller --- include/linux/once.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) (limited to 'include/linux') diff --git a/include/linux/once.h b/include/linux/once.h index b14d8b309d52..176ab75b42df 100644 --- a/include/linux/once.h +++ b/include/linux/once.h @@ -5,10 +5,18 @@ #include #include +/* Helpers used from arbitrary contexts. + * Hard irqs are blocked, be cautious. + */ bool __do_once_start(bool *done, unsigned long *flags); void __do_once_done(bool *done, struct static_key_true *once_key, unsigned long *flags, struct module *mod); +/* Variant for process contexts only. */ +bool __do_once_slow_start(bool *done); +void __do_once_slow_done(bool *done, struct static_key_true *once_key, + struct module *mod); + /* Call a function exactly once. The idea of DO_ONCE() is to perform * a function call such as initialization of random seeds, etc, only * once, where DO_ONCE() can live in the fast-path. After @func has @@ -52,7 +60,27 @@ void __do_once_done(bool *done, struct static_key_true *once_key, ___ret; \ }) +/* Variant of DO_ONCE() for process/sleepable contexts. */ +#define DO_ONCE_SLOW(func, ...) \ + ({ \ + bool ___ret = false; \ + static bool __section(".data.once") ___done = false; \ + static DEFINE_STATIC_KEY_TRUE(___once_key); \ + if (static_branch_unlikely(&___once_key)) { \ + ___ret = __do_once_slow_start(&___done); \ + if (unlikely(___ret)) { \ + func(__VA_ARGS__); \ + __do_once_slow_done(&___done, &___once_key, \ + THIS_MODULE); \ + } \ + } \ + ___ret; \ + }) + #define get_random_once(buf, nbytes) \ DO_ONCE(get_random_bytes, (buf), (nbytes)) +#define get_random_slow_once(buf, nbytes) \ + DO_ONCE_SLOW(get_random_bytes, (buf), (nbytes)) + #endif /* _LINUX_ONCE_H */ -- cgit From 79e1119b7e0099c6c9379ca3129ffb7aa2a1c249 Mon Sep 17 00:00:00 2001 From: Qi Zheng Date: Wed, 31 Aug 2022 11:19:47 +0800 Subject: ksm: remove redundant declarations in ksm.h Currently, for struct stable_node, no one uses it in both the include/linux/ksm.h file and the file that contains it. For struct mem_cgroup, it's also not used in ksm.h. So they're all redundant, just remove them. Link: https://lkml.kernel.org/r/20220831031951.43152-4-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng Cc: Johannes Weiner Cc: Matthew Wilcox Cc: Mike Rapoport Cc: Minchan Kim Cc: Vlastimil Babka Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/ksm.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ksm.h b/include/linux/ksm.h index 0b4f17418f64..7e232ba59b86 100644 --- a/include/linux/ksm.h +++ b/include/linux/ksm.h @@ -15,9 +15,6 @@ #include #include -struct stable_node; -struct mem_cgroup; - #ifdef CONFIG_KSM int ksm_madvise(struct vm_area_struct *vma, unsigned long start, unsigned long end, int advice, unsigned long *vm_flags); -- cgit From 379708ffde1b049bc41084e0a0572c44c8a1d2c4 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 2 Sep 2022 20:45:58 +0100 Subject: mm: add the first tail page to struct folio Some of the static checkers get confused by extracting the page from the folio and referring to fields in the first tail page. Adding these fields to struct folio lets us avoid doing that. It has the risk that people will refer to those fields without checking that the folio is actually a large folio, so prefix them with underscores and document the preferred function to use instead. Link: https://lkml.kernel.org/r/20220902194653.1739778-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/mm_types.h | 30 +++++++++++++++++++++++++++++- 1 file changed, 29 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 8f30f262431c..5c87d0f292a2 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -245,6 +245,13 @@ struct page { * @_refcount: Do not access this member directly. Use folio_ref_count() * to find how many references there are to this folio. * @memcg_data: Memory Control Group data. + * @_flags_1: For large folios, additional page flags. + * @__head: Points to the folio. Do not use. + * @_folio_dtor: Which destructor to use for this folio. + * @_folio_order: Do not use directly, call folio_order(). + * @_total_mapcount: Do not use directly, call folio_entire_mapcount(). + * @_pincount: Do not use directly, call folio_maybe_dma_pinned(). + * @_folio_nr_pages: Do not use directly, call folio_nr_pages(). * * A folio is a physically, virtually and logically contiguous set * of bytes. It is a power-of-two in size, and it is aligned to that @@ -283,9 +290,17 @@ struct folio { }; struct page page; }; + unsigned long _flags_1; + unsigned long __head; + unsigned char _folio_dtor; + unsigned char _folio_order; + atomic_t _total_mapcount; + atomic_t _pincount; +#ifdef CONFIG_64BIT + unsigned int _folio_nr_pages; +#endif }; -static_assert(sizeof(struct page) == sizeof(struct folio)); #define FOLIO_MATCH(pg, fl) \ static_assert(offsetof(struct page, pg) == offsetof(struct folio, fl)) FOLIO_MATCH(flags, flags); @@ -300,6 +315,19 @@ FOLIO_MATCH(_refcount, _refcount); FOLIO_MATCH(memcg_data, memcg_data); #endif #undef FOLIO_MATCH +#define FOLIO_MATCH(pg, fl) \ + static_assert(offsetof(struct folio, fl) == \ + offsetof(struct page, pg) + sizeof(struct page)) +FOLIO_MATCH(flags, _flags_1); +FOLIO_MATCH(compound_head, __head); +FOLIO_MATCH(compound_dtor, _folio_dtor); +FOLIO_MATCH(compound_order, _folio_order); +FOLIO_MATCH(compound_mapcount, _total_mapcount); +FOLIO_MATCH(compound_pincount, _pincount); +#ifdef CONFIG_64BIT +FOLIO_MATCH(compound_nr, _folio_nr_pages); +#endif +#undef FOLIO_MATCH static inline atomic_t *folio_mapcount_ptr(struct folio *folio) { -- cgit From c3a15bff46cb5149aeae4c8ae69443d791fa6578 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 2 Sep 2022 20:45:59 +0100 Subject: mm: reimplement folio_order() and folio_nr_pages() Instead of calling compound_order() and compound_nr_pages(), use the folio directly. Saves 1905 bytes from mm/filemap.o due to folio_test_large() now being a cheaper check than PageHead(). Link: https://lkml.kernel.org/r/20220902194653.1739778-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/mm.h | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index e56dd8f7eae1..a37c8a29c49b 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -729,7 +729,9 @@ static inline unsigned int compound_order(struct page *page) */ static inline unsigned int folio_order(struct folio *folio) { - return compound_order(&folio->page); + if (!folio_test_large(folio)) + return 0; + return folio->_folio_order; } #include @@ -1659,7 +1661,13 @@ static inline void set_page_links(struct page *page, enum zone_type zone, */ static inline long folio_nr_pages(struct folio *folio) { - return compound_nr(&folio->page); + if (!folio_test_large(folio)) + return 1; +#ifdef CONFIG_64BIT + return folio->_folio_nr_pages; +#else + return 1L << folio->_folio_order; +#endif } /** -- cgit From d788f5b374c2ba204fed57e39acf2452acc24812 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 2 Sep 2022 20:46:00 +0100 Subject: mm: add split_folio() This wrapper removes a need to use split_huge_page(&folio->page). Convert two callers. Link: https://lkml.kernel.org/r/20220902194653.1739778-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/huge_mm.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 38265f9f782e..a1341fdcf666 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -444,6 +444,11 @@ static inline int split_folio_to_list(struct folio *folio, return split_huge_page_to_list(&folio->page, list); } +static inline int split_folio(struct folio *folio) +{ + return split_folio_to_list(folio, NULL); +} + /* * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to * limitations in the implementation like arm64 MTE can override this to -- cgit From 681ecf6301786cb06942b57f0ef7103b07ae6813 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 2 Sep 2022 20:46:01 +0100 Subject: mm: add folio_add_lru_vma() Convert lru_cache_add_inactive_or_unevictable() to folio_add_lru_vma() and add a compatibility wrapper. Link: https://lkml.kernel.org/r/20220902194653.1739778-6-willy@infradead.org Signed-off-by: "Matthew Wilcox (Oracle)" Signed-off-by: Andrew Morton --- include/linux/swap.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 6308150b234a..2ede1e3695d9 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -379,11 +379,11 @@ extern unsigned long totalreserve_pages; /* linux/mm/swap.c */ -extern void lru_note_cost(struct lruvec *lruvec, bool file, - unsigned int nr_pages); -extern void lru_note_cost_folio(struct folio *); -extern void folio_add_lru(struct folio *); -extern void lru_cache_add(struct page *); +void lru_note_cost(struct lruvec *lruvec, bool file, unsigned int nr_pages); +void lru_note_cost_folio(struct folio *); +void folio_add_lru(struct folio *); +void folio_add_lru_vma(struct folio *, struct vm_area_struct *); +void lru_cache_add(struct page *); void mark_page_accessed(struct page *); void folio_mark_accessed(struct folio *); -- cgit From 907ea17eb2b436f07332c935476d77893abae735 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 2 Sep 2022 20:46:04 +0100 Subject: shmem: convert shmem_replace_page() to use folios throughout Introduce folio_set_swap_entry() to abstract how both folio->private and swp_entry_t work. Use swap_address_space() directly instead of indirecting through folio_mapping(). Include an assertion that the old folio is not large as we only allocate a single-page folio to replace it. Use folio_put_refs() instead of calling folio_put() twice. Link: https://lkml.kernel.org/r/20220902194653.1739778-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/swap.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 2ede1e3695d9..61e13d1a4cab 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -355,6 +355,11 @@ static inline swp_entry_t folio_swap_entry(struct folio *folio) return entry; } +static inline void folio_set_swap_entry(struct folio *folio, swp_entry_t entry) +{ + folio->private = (void *)entry.val; +} + /* linux/mm/workingset.c */ void workingset_age_nonresident(struct lruvec *lruvec, unsigned long nr_pages); void *workingset_eviction(struct folio *folio, struct mem_cgroup *target_memcg); -- cgit From bdb0ed54a4768dc3c2613d4c45f94c887d43cd7a Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 2 Sep 2022 20:46:06 +0100 Subject: mm/swapfile: convert try_to_free_swap() to folio_free_swap() Add kernel-doc for folio_free_swap() and make it return bool. Add a try_to_free_swap() compatibility wrapper. Link: https://lkml.kernel.org/r/20220902194653.1739778-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/swap.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 61e13d1a4cab..dac6308d878e 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -490,6 +490,7 @@ static inline long get_nr_swap_pages(void) extern void si_swapinfo(struct sysinfo *); swp_entry_t folio_alloc_swap(struct folio *folio); +bool folio_free_swap(struct folio *folio); extern void put_swap_page(struct page *page, swp_entry_t entry); extern swp_entry_t get_swap_page_of_type(int); extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size); @@ -606,6 +607,11 @@ static inline swp_entry_t folio_alloc_swap(struct folio *folio) return entry; } +static inline bool folio_free_swap(struct folio *folio) +{ + return false; +} + static inline int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page, unsigned long nr_pages, sector_t start_block) -- cgit From 4081f7446d95a9d3ced12dc04ff02c187a761e90 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 2 Sep 2022 20:46:09 +0100 Subject: mm/swap: convert put_swap_page() to put_swap_folio() With all callers now using a folio, we can convert this function. Link: https://lkml.kernel.org/r/20220902194653.1739778-14-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/swap.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index dac6308d878e..42cbef554de6 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -491,7 +491,7 @@ static inline long get_nr_swap_pages(void) extern void si_swapinfo(struct sysinfo *); swp_entry_t folio_alloc_swap(struct folio *folio); bool folio_free_swap(struct folio *folio); -extern void put_swap_page(struct page *page, swp_entry_t entry); +void put_swap_folio(struct folio *folio, swp_entry_t entry); extern swp_entry_t get_swap_page_of_type(int); extern int get_swap_pages(int n, swp_entry_t swp_entries[], int entry_size); extern int add_swap_count_continuation(swp_entry_t, gfp_t); @@ -576,7 +576,7 @@ static inline void swap_free(swp_entry_t swp) { } -static inline void put_swap_page(struct page *page, swp_entry_t swp) +static inline void put_swap_folio(struct folio *folio, swp_entry_t swp) { } -- cgit From 6599591816f522c1cc8ec4eb5cea75738963756a Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 2 Sep 2022 20:46:12 +0100 Subject: memcg: convert mem_cgroup_swapin_charge_page() to mem_cgroup_swapin_charge_folio() All callers now have a folio, so pass it in here and remove an unnecessary call to page_folio(). Link: https://lkml.kernel.org/r/20220902194653.1739778-17-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 60545e4a1c03..ca0df42662ad 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -688,7 +688,7 @@ static inline int mem_cgroup_charge(struct folio *folio, struct mm_struct *mm, return __mem_cgroup_charge(folio, mm, gfp); } -int mem_cgroup_swapin_charge_page(struct page *page, struct mm_struct *mm, +int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm, gfp_t gfp, swp_entry_t entry); void mem_cgroup_swapin_uncharge_swap(swp_entry_t entry); @@ -1254,7 +1254,7 @@ static inline int mem_cgroup_charge(struct folio *folio, return 0; } -static inline int mem_cgroup_swapin_charge_page(struct page *page, +static inline int mem_cgroup_swapin_charge_folio(struct folio *folio, struct mm_struct *mm, gfp_t gfp, swp_entry_t entry) { return 0; -- cgit From 4e1fc793ad9892cec67b40c9f67583160e08f695 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 2 Sep 2022 20:46:20 +0100 Subject: shmem: add shmem_get_folio() With no remaining callers of shmem_getpage_gfp(), add shmem_get_folio() and reimplement shmem_getpage() as a call to shmem_get_folio(). Link: https://lkml.kernel.org/r/20220902194653.1739778-25-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/shmem_fs.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index ff0b990de83d..f4bd50b08a91 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -113,6 +113,8 @@ enum sgp_type { extern int shmem_getpage(struct inode *inode, pgoff_t index, struct page **pagep, enum sgp_type sgp); +int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop, + enum sgp_type sgp); static inline struct page *shmem_read_mapping_page( struct address_space *mapping, pgoff_t index) -- cgit From 923e2f0e7c30db5c1ee5d680050ab781e6c114fb Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 2 Sep 2022 20:46:29 +0100 Subject: shmem: remove shmem_getpage() With all callers removed, remove this wrapper function. The flags are now mysteriously called SGP, but I think we can live with that. Link: https://lkml.kernel.org/r/20220902194653.1739778-34-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/shmem_fs.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index f4bd50b08a91..f24071e3c826 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -102,7 +102,7 @@ extern unsigned long shmem_swap_usage(struct vm_area_struct *vma); extern unsigned long shmem_partial_swap_usage(struct address_space *mapping, pgoff_t start, pgoff_t end); -/* Flag allocation requirements to shmem_getpage */ +/* Flag allocation requirements to shmem_get_folio */ enum sgp_type { SGP_READ, /* don't exceed i_size, don't allocate page */ SGP_NOALLOC, /* similar, but fail on hole or use fallocated page */ @@ -111,8 +111,6 @@ enum sgp_type { SGP_FALLOC, /* like SGP_WRITE, but make existing page Uptodate */ }; -extern int shmem_getpage(struct inode *inode, pgoff_t index, - struct page **pagep, enum sgp_type sgp); int shmem_get_folio(struct inode *inode, pgoff_t index, struct folio **foliop, enum sgp_type sgp); -- cgit From 9202d527b715f67bcdccbb9b712b65fe053f8109 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 2 Sep 2022 20:46:43 +0100 Subject: memcg: convert mem_cgroup_swap_full() to take a folio All callers now have a folio, so convert the function to take a folio. Saves a couple of calls to compound_head(). Link: https://lkml.kernel.org/r/20220902194653.1739778-48-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/swap.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index 42cbef554de6..d8bd6401c3e7 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -692,7 +692,7 @@ static inline void mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_p } extern long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg); -extern bool mem_cgroup_swap_full(struct page *page); +extern bool mem_cgroup_swap_full(struct folio *folio); #else static inline void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry) { @@ -714,7 +714,7 @@ static inline long mem_cgroup_get_nr_swap_pages(struct mem_cgroup *memcg) return get_nr_swap_pages(); } -static inline bool mem_cgroup_swap_full(struct page *page) +static inline bool mem_cgroup_swap_full(struct folio *folio) { return vm_swap_full(); } -- cgit From 3b344157c0c15b8f9588e3021dfb22ee25f4508a Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 2 Sep 2022 20:46:44 +0100 Subject: mm: remove try_to_free_swap() All callers have now been converted to folio_free_swap() and we can remove this wrapper. Link: https://lkml.kernel.org/r/20220902194653.1739778-49-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/swap.h | 6 ------ 1 file changed, 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index d8bd6401c3e7..fc8d98660326 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -510,7 +510,6 @@ extern int __swp_swapcount(swp_entry_t entry); extern int swp_swapcount(swp_entry_t entry); extern struct swap_info_struct *page_swap_info(struct page *); extern struct swap_info_struct *swp_swap_info(swp_entry_t entry); -extern int try_to_free_swap(struct page *); struct backing_dev_info; extern int init_swap_address_space(unsigned int type, unsigned long nr_pages); extern void exit_swap_address_space(unsigned int type); @@ -595,11 +594,6 @@ static inline int swp_swapcount(swp_entry_t entry) return 0; } -static inline int try_to_free_swap(struct page *page) -{ - return 0; -} - static inline swp_entry_t folio_alloc_swap(struct folio *folio) { swp_entry_t entry; -- cgit From 29eea9b5a9c9ecf21164a082a42bfabe06fdcb30 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 2 Sep 2022 20:46:50 +0100 Subject: mm: convert page_get_anon_vma() to folio_get_anon_vma() With all callers now passing in a folio, rename the function and convert all callers. Removes a couple of calls to compound_head() and a reference to page->mapping. Link: https://lkml.kernel.org/r/20220902194653.1739778-55-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/rmap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 72b2bcc37f73..3d56e3712bb2 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -163,7 +163,7 @@ static inline void anon_vma_merge(struct vm_area_struct *vma, unlink_anon_vmas(next); } -struct anon_vma *page_get_anon_vma(struct page *page); +struct anon_vma *folio_get_anon_vma(struct folio *folio); /* RMAP flags, currently only relevant for some anon rmap operations. */ typedef int __bitwise rmap_t; -- cgit From 0c826c0b6a176b9ed5ace7106fd1770bb48f1898 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 2 Sep 2022 20:46:51 +0100 Subject: rmap: remove page_unlock_anon_vma_read() This was simply an alias for anon_vma_unlock_read() since 2011. Link: https://lkml.kernel.org/r/20220902194653.1739778-56-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/rmap.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 3d56e3712bb2..ca3e4ba6c58c 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -458,13 +458,8 @@ struct rmap_walk_control { void rmap_walk(struct folio *folio, struct rmap_walk_control *rwc); void rmap_walk_locked(struct folio *folio, struct rmap_walk_control *rwc); - -/* - * Called by memory-failure.c to kill processes. - */ struct anon_vma *folio_lock_anon_vma_read(struct folio *folio, struct rmap_walk_control *rwc); -void page_unlock_anon_vma_read(struct anon_vma *anon_vma); #else /* !CONFIG_MMU */ -- cgit From 19672a9e4a75252871cba319f4e3b859b8fdf671 Mon Sep 17 00:00:00 2001 From: "Matthew Wilcox (Oracle)" Date: Fri, 2 Sep 2022 20:46:53 +0100 Subject: mm: convert lock_page_or_retry() to folio_lock_or_retry() Remove a call to compound_head() in each of the two callers. Link: https://lkml.kernel.org/r/20220902194653.1739778-58-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: Andrew Morton --- include/linux/pagemap.h | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 09de43e36a64..32846b6306db 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -989,19 +989,16 @@ static inline int lock_page_killable(struct page *page) } /* - * lock_page_or_retry - Lock the page, unless this would block and the + * folio_lock_or_retry - Lock the folio, unless this would block and the * caller indicated that it can handle a retry. * * Return value and mmap_lock implications depend on flags; see * __folio_lock_or_retry(). */ -static inline bool lock_page_or_retry(struct page *page, struct mm_struct *mm, - unsigned int flags) +static inline bool folio_lock_or_retry(struct folio *folio, + struct mm_struct *mm, unsigned int flags) { - struct folio *folio; might_sleep(); - - folio = page_folio(page); return folio_trylock(folio) || __folio_lock_or_retry(folio, mm, flags); } -- cgit From f372bde922e2ced8e0b5a928887b4cf587cc4453 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Mon, 5 Sep 2022 23:05:29 +0200 Subject: kasan: only define kasan_metadata_size for Generic mode KASAN provides a helper for calculating the size of per-object metadata stored in the redzone. As now only the Generic mode uses per-object metadata, only define kasan_metadata_size() for this mode. Link: https://lkml.kernel.org/r/8f81d4938b80446bc72538a08217009f328a3e23.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Reviewed-by: Marco Elver Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Peter Collingbourne Signed-off-by: Andrew Morton --- include/linux/kasan.h | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index b092277bf48d..027df7599573 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -150,14 +150,6 @@ static __always_inline void kasan_cache_create_kmalloc(struct kmem_cache *cache) __kasan_cache_create_kmalloc(cache); } -size_t __kasan_metadata_size(struct kmem_cache *cache); -static __always_inline size_t kasan_metadata_size(struct kmem_cache *cache) -{ - if (kasan_enabled()) - return __kasan_metadata_size(cache); - return 0; -} - void __kasan_poison_slab(struct slab *slab); static __always_inline void kasan_poison_slab(struct slab *slab) { @@ -282,7 +274,6 @@ static inline void kasan_cache_create(struct kmem_cache *cache, unsigned int *size, slab_flags_t *flags) {} static inline void kasan_cache_create_kmalloc(struct kmem_cache *cache) {} -static inline size_t kasan_metadata_size(struct kmem_cache *cache) { return 0; } static inline void kasan_poison_slab(struct slab *slab) {} static inline void kasan_unpoison_object_data(struct kmem_cache *cache, void *object) {} @@ -333,6 +324,8 @@ static inline void kasan_unpoison_task_stack(struct task_struct *task) {} #ifdef CONFIG_KASAN_GENERIC +size_t kasan_metadata_size(struct kmem_cache *cache); + void kasan_cache_shrink(struct kmem_cache *cache); void kasan_cache_shutdown(struct kmem_cache *cache); void kasan_record_aux_stack(void *ptr); @@ -340,6 +333,12 @@ void kasan_record_aux_stack_noalloc(void *ptr); #else /* CONFIG_KASAN_GENERIC */ +/* Tag-based KASAN modes do not use per-object metadata. */ +static inline size_t kasan_metadata_size(struct kmem_cache *cache) +{ + return 0; +} + static inline void kasan_cache_shrink(struct kmem_cache *cache) {} static inline void kasan_cache_shutdown(struct kmem_cache *cache) {} static inline void kasan_record_aux_stack(void *ptr) {} -- cgit From 3b7f8813e9ecf7fe91f2f8dc3b581a111cd374a5 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Mon, 5 Sep 2022 23:05:30 +0200 Subject: kasan: only define kasan_never_merge for Generic mode KASAN prevents merging of slab caches whose objects have per-object metadata stored in redzones. As now only the Generic mode uses per-object metadata, define kasan_never_merge() only for this mode. Link: https://lkml.kernel.org/r/81ed01f29ff3443580b7e2fe362a8b47b1e8006d.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Reviewed-by: Marco Elver Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Peter Collingbourne Signed-off-by: Andrew Morton --- include/linux/kasan.h | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index 027df7599573..9743d4b3a918 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -103,14 +103,6 @@ struct kasan_cache { bool is_kmalloc; }; -slab_flags_t __kasan_never_merge(void); -static __always_inline slab_flags_t kasan_never_merge(void) -{ - if (kasan_enabled()) - return __kasan_never_merge(); - return 0; -} - void __kasan_unpoison_range(const void *addr, size_t size); static __always_inline void kasan_unpoison_range(const void *addr, size_t size) { @@ -261,10 +253,6 @@ static __always_inline bool kasan_check_byte(const void *addr) #else /* CONFIG_KASAN */ -static inline slab_flags_t kasan_never_merge(void) -{ - return 0; -} static inline void kasan_unpoison_range(const void *address, size_t size) {} static inline void kasan_poison_pages(struct page *page, unsigned int order, bool init) {} @@ -325,6 +313,7 @@ static inline void kasan_unpoison_task_stack(struct task_struct *task) {} #ifdef CONFIG_KASAN_GENERIC size_t kasan_metadata_size(struct kmem_cache *cache); +slab_flags_t kasan_never_merge(void); void kasan_cache_shrink(struct kmem_cache *cache); void kasan_cache_shutdown(struct kmem_cache *cache); @@ -338,6 +327,11 @@ static inline size_t kasan_metadata_size(struct kmem_cache *cache) { return 0; } +/* And thus nothing prevents cache merging. */ +static inline slab_flags_t kasan_never_merge(void) +{ + return 0; +} static inline void kasan_cache_shrink(struct kmem_cache *cache) {} static inline void kasan_cache_shutdown(struct kmem_cache *cache) {} -- cgit From 26f21f3ac76df6cf3b447e8231f8754991165475 Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Mon, 5 Sep 2022 23:05:31 +0200 Subject: kasan: only define metadata offsets for Generic mode Hide the definitions of alloc_meta_offset and free_meta_offset under an ifdef CONFIG_KASAN_GENERIC check, as these fields are now only used when the Generic mode is enabled. Link: https://lkml.kernel.org/r/d4bafa0534facafd1a23c465a94261e64f366493.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Reviewed-by: Marco Elver Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Peter Collingbourne Signed-off-by: Andrew Morton --- include/linux/kasan.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index 9743d4b3a918..a212c2e3f32d 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -98,8 +98,10 @@ static inline bool kasan_has_integrated_init(void) #ifdef CONFIG_KASAN struct kasan_cache { +#ifdef CONFIG_KASAN_GENERIC int alloc_meta_offset; int free_meta_offset; +#endif bool is_kmalloc; }; -- cgit From 682ed08924407b719fa0b1123a26971748d76ace Mon Sep 17 00:00:00 2001 From: Andrey Konovalov Date: Mon, 5 Sep 2022 23:05:33 +0200 Subject: kasan: only define kasan_cache_create for Generic mode Right now, kasan_cache_create() assigns SLAB_KASAN for all KASAN modes and then sets up metadata-related cache parameters for the Generic mode. SLAB_KASAN is used in two places: 1. In slab_ksize() to account for per-object metadata when calculating the size of the accessible memory within the object. 2. In slab_common.c via kasan_never_merge() to prevent merging of caches with per-object metadata. Both cases are only relevant when per-object metadata is present, which is only the case with the Generic mode. Thus, assign SLAB_KASAN and define kasan_cache_create() only for the Generic mode. Also update the SLAB_KASAN-related comment. Link: https://lkml.kernel.org/r/61faa2aa1906e2d02c97d00ddf99ce8911dda095.1662411799.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov Reviewed-by: Marco Elver Cc: Alexander Potapenko Cc: Andrey Ryabinin Cc: Dmitry Vyukov Cc: Evgenii Stepanov Cc: Peter Collingbourne Signed-off-by: Andrew Morton --- include/linux/kasan.h | 18 ++++++------------ include/linux/slab.h | 2 +- 2 files changed, 7 insertions(+), 13 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kasan.h b/include/linux/kasan.h index a212c2e3f32d..d811b3d7d2a1 100644 --- a/include/linux/kasan.h +++ b/include/linux/kasan.h @@ -128,15 +128,6 @@ static __always_inline void kasan_unpoison_pages(struct page *page, __kasan_unpoison_pages(page, order, init); } -void __kasan_cache_create(struct kmem_cache *cache, unsigned int *size, - slab_flags_t *flags); -static __always_inline void kasan_cache_create(struct kmem_cache *cache, - unsigned int *size, slab_flags_t *flags) -{ - if (kasan_enabled()) - __kasan_cache_create(cache, size, flags); -} - void __kasan_cache_create_kmalloc(struct kmem_cache *cache); static __always_inline void kasan_cache_create_kmalloc(struct kmem_cache *cache) { @@ -260,9 +251,6 @@ static inline void kasan_poison_pages(struct page *page, unsigned int order, bool init) {} static inline void kasan_unpoison_pages(struct page *page, unsigned int order, bool init) {} -static inline void kasan_cache_create(struct kmem_cache *cache, - unsigned int *size, - slab_flags_t *flags) {} static inline void kasan_cache_create_kmalloc(struct kmem_cache *cache) {} static inline void kasan_poison_slab(struct slab *slab) {} static inline void kasan_unpoison_object_data(struct kmem_cache *cache, @@ -316,6 +304,8 @@ static inline void kasan_unpoison_task_stack(struct task_struct *task) {} size_t kasan_metadata_size(struct kmem_cache *cache); slab_flags_t kasan_never_merge(void); +void kasan_cache_create(struct kmem_cache *cache, unsigned int *size, + slab_flags_t *flags); void kasan_cache_shrink(struct kmem_cache *cache); void kasan_cache_shutdown(struct kmem_cache *cache); @@ -334,6 +324,10 @@ static inline slab_flags_t kasan_never_merge(void) { return 0; } +/* And no cache-related metadata initialization is required. */ +static inline void kasan_cache_create(struct kmem_cache *cache, + unsigned int *size, + slab_flags_t *flags) {} static inline void kasan_cache_shrink(struct kmem_cache *cache) {} static inline void kasan_cache_shutdown(struct kmem_cache *cache) {} diff --git a/include/linux/slab.h b/include/linux/slab.h index 352e3f082acc..617a39f7db46 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -106,7 +106,7 @@ # define SLAB_ACCOUNT 0 #endif -#ifdef CONFIG_KASAN +#ifdef CONFIG_KASAN_GENERIC #define SLAB_KASAN ((slab_flags_t __force)0x08000000U) #else #define SLAB_KASAN 0 -- cgit From 36001cba4f728e7fa2a58bc69fece22eaeef5cca Mon Sep 17 00:00:00 2001 From: Kaixu Xia Date: Tue, 6 Sep 2022 23:18:47 +0800 Subject: mm/damon/core: iterate the regions list from current point in damon_set_regions() We iterate the whole regions list every time to get the first/last regions intersecting with the specific range in damon_set_regions(), in order to add new region or resize existing regions to fit in the specific range. Actually, it is unnecessary to iterate the new added regions and the front regions that have been checked. Just iterate the regions list from the current point using list_for_each_entry_from() every time to improve performance. The kunit tests passed: [PASSED] damon_test_apply_three_regions1 [PASSED] damon_test_apply_three_regions2 [PASSED] damon_test_apply_three_regions3 [PASSED] damon_test_apply_three_regions4 Link: https://lkml.kernel.org/r/1662477527-13003-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Kaixu Xia Reviewed-by: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/damon.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 7b1f4a488230..d54acec048d6 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -463,9 +463,17 @@ static inline struct damon_region *damon_last_region(struct damon_target *t) return list_last_entry(&t->regions_list, struct damon_region, list); } +static inline struct damon_region *damon_first_region(struct damon_target *t) +{ + return list_first_entry(&t->regions_list, struct damon_region, list); +} + #define damon_for_each_region(r, t) \ list_for_each_entry(r, &t->regions_list, list) +#define damon_for_each_region_from(r, t) \ + list_for_each_entry_from(r, &t->regions_list, list) + #define damon_for_each_region_safe(r, next, t) \ list_for_each_entry_safe(r, next, &t->regions_list, list) -- cgit From 4f9bc69ac5ce34071a9a51343bc81ca76cb2e3f1 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Wed, 7 Sep 2022 14:08:42 +0800 Subject: mm: reuse pageblock_start/end_pfn() macro Move pageblock_start_pfn/pageblock_end_pfn() into pageblock-flags.h, then they could be used somewhere else, not only in compaction, also use ALIGN_DOWN() instead of round_down() to be pair with ALIGN(), which should be same for pageblock usage. Link: https://lkml.kernel.org/r/20220907060844.126891-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Acked-by: Mike Rapoport Reviewed-by: David Hildenbrand Cc: Oscar Salvador Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/pageblock-flags.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h index 83c7248053a1..a09b7fe6bbf8 100644 --- a/include/linux/pageblock-flags.h +++ b/include/linux/pageblock-flags.h @@ -53,6 +53,8 @@ extern unsigned int pageblock_order; #endif /* CONFIG_HUGETLB_PAGE */ #define pageblock_nr_pages (1UL << pageblock_order) +#define pageblock_start_pfn(pfn) ALIGN_DOWN((pfn), pageblock_nr_pages) +#define pageblock_end_pfn(pfn) ALIGN((pfn) + 1, pageblock_nr_pages) /* Forward declaration */ struct page; -- cgit From 5f7fa13fa858c17580ed513bd5e0a4b36d68fdd6 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Wed, 7 Sep 2022 14:08:43 +0800 Subject: mm: add pageblock_align() macro Add pageblock_align() macro and use it to simplify code. Link: https://lkml.kernel.org/r/20220907060844.126891-2-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Acked-by: Mike Rapoport Reviewed-by: David Hildenbrand Cc: Oscar Salvador Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/pageblock-flags.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h index a09b7fe6bbf8..293c76630fa8 100644 --- a/include/linux/pageblock-flags.h +++ b/include/linux/pageblock-flags.h @@ -53,6 +53,7 @@ extern unsigned int pageblock_order; #endif /* CONFIG_HUGETLB_PAGE */ #define pageblock_nr_pages (1UL << pageblock_order) +#define pageblock_align(pfn) ALIGN((pfn), pageblock_nr_pages) #define pageblock_start_pfn(pfn) ALIGN_DOWN((pfn), pageblock_nr_pages) #define pageblock_end_pfn(pfn) ALIGN((pfn) + 1, pageblock_nr_pages) -- cgit From ee0913c4719610204315a0d8a35122c6233249e0 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Wed, 7 Sep 2022 14:08:44 +0800 Subject: mm: add pageblock_aligned() macro Add pageblock_aligned() and use it to simplify code. Link: https://lkml.kernel.org/r/20220907060844.126891-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang Acked-by: Mike Rapoport Cc: David Hildenbrand Cc: Oscar Salvador Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/pageblock-flags.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h index 293c76630fa8..5f1ae07d724b 100644 --- a/include/linux/pageblock-flags.h +++ b/include/linux/pageblock-flags.h @@ -54,6 +54,7 @@ extern unsigned int pageblock_order; #define pageblock_nr_pages (1UL << pageblock_order) #define pageblock_align(pfn) ALIGN((pfn), pageblock_nr_pages) +#define pageblock_aligned(pfn) IS_ALIGNED((pfn), pageblock_nr_pages) #define pageblock_start_pfn(pfn) ALIGN_DOWN((pfn), pageblock_nr_pages) #define pageblock_end_pfn(pfn) ALIGN((pfn) + 1, pageblock_nr_pages) -- cgit From 410f8e82689e1e66044fea51ef852054a09502b7 Mon Sep 17 00:00:00 2001 From: Shakeel Butt Date: Wed, 7 Sep 2022 04:35:35 +0000 Subject: memcg: extract memcg_vmstats from struct mem_cgroup Patch series "memcg: reduce memory overhead of memory cgroups". Currently a lot of memory is wasted to maintain the vmevents for memory cgroups as we have multiple arrays of size NR_VM_EVENT_ITEMS which can be as large as 110. However memcg code uses small portion of those entries. This patch series eliminate this overhead by removing the unneeded vmevent entries from memory cgroup data structures. This patch (of 3): This is a preparatory patch to reduce the memory overhead of memory cgroup. The struct memcg_vmstats is the largest object embedded into the struct mem_cgroup. This patch extracts struct memcg_vmstats from struct mem_cgroup to ease the following patches in reducing the size of struct memcg_vmstats. Link: https://lkml.kernel.org/r/20220907043537.3457014-1-shakeelb@google.com Link: https://lkml.kernel.org/r/20220907043537.3457014-2-shakeelb@google.com Signed-off-by: Shakeel Butt Acked-by: Roman Gushchin Cc: Johannes Weiner Cc: Michal Hocko Cc: Muchun Song Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 37 ++++--------------------------------- 1 file changed, 4 insertions(+), 33 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index ca0df42662ad..dc7d40e575d5 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -80,29 +80,8 @@ enum mem_cgroup_events_target { MEM_CGROUP_NTARGETS, }; -struct memcg_vmstats_percpu { - /* Local (CPU and cgroup) page state & events */ - long state[MEMCG_NR_STAT]; - unsigned long events[NR_VM_EVENT_ITEMS]; - - /* Delta calculation for lockless upward propagation */ - long state_prev[MEMCG_NR_STAT]; - unsigned long events_prev[NR_VM_EVENT_ITEMS]; - - /* Cgroup1: threshold notifications & softlimit tree updates */ - unsigned long nr_page_events; - unsigned long targets[MEM_CGROUP_NTARGETS]; -}; - -struct memcg_vmstats { - /* Aggregated (CPU and subtree) page state & events */ - long state[MEMCG_NR_STAT]; - unsigned long events[NR_VM_EVENT_ITEMS]; - - /* Pending child counts during tree propagation */ - long state_pending[MEMCG_NR_STAT]; - unsigned long events_pending[NR_VM_EVENT_ITEMS]; -}; +struct memcg_vmstats_percpu; +struct memcg_vmstats; struct mem_cgroup_reclaim_iter { struct mem_cgroup *position; @@ -298,7 +277,7 @@ struct mem_cgroup { CACHELINE_PADDING(_pad1_); /* memory.stat */ - struct memcg_vmstats vmstats; + struct memcg_vmstats *vmstats; /* memory.events */ atomic_long_t memory_events[MEMCG_NR_MEMORY_EVENTS]; @@ -1001,15 +980,7 @@ static inline void mod_memcg_page_state(struct page *page, rcu_read_unlock(); } -static inline unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx) -{ - long x = READ_ONCE(memcg->vmstats.state[idx]); -#ifdef CONFIG_SMP - if (x < 0) - x = 0; -#endif - return x; -} +unsigned long memcg_page_state(struct mem_cgroup *memcg, int idx); static inline unsigned long lruvec_page_state(struct lruvec *lruvec, enum node_stat_item idx) -- cgit From f5a79d7c0c87c8d88bb5e3f3c898258fdf1b3b05 Mon Sep 17 00:00:00 2001 From: Yajun Deng Date: Thu, 8 Sep 2022 19:14:43 +0000 Subject: mm/damon: introduce struct damos_access_pattern damon_new_scheme() has too many parameters, so introduce struct damos_access_pattern to simplify it. In additon, we can't use a bpf trace kprobe that has more than 5 parameters. Link: https://lkml.kernel.org/r/20220908191443.129534-1-sj@kernel.org Signed-off-by: Yajun Deng Signed-off-by: SeongJae Park Reviewed-by: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/damon.h | 37 ++++++++++++++++++++----------------- 1 file changed, 20 insertions(+), 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index d54acec048d6..90f20675da22 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -216,13 +216,26 @@ struct damos_stat { }; /** - * struct damos - Represents a Data Access Monitoring-based Operation Scheme. + * struct damos_access_pattern - Target access pattern of the given scheme. * @min_sz_region: Minimum size of target regions. * @max_sz_region: Maximum size of target regions. * @min_nr_accesses: Minimum ``->nr_accesses`` of target regions. * @max_nr_accesses: Maximum ``->nr_accesses`` of target regions. * @min_age_region: Minimum age of target regions. * @max_age_region: Maximum age of target regions. + */ +struct damos_access_pattern { + unsigned long min_sz_region; + unsigned long max_sz_region; + unsigned int min_nr_accesses; + unsigned int max_nr_accesses; + unsigned int min_age_region; + unsigned int max_age_region; +}; + +/** + * struct damos - Represents a Data Access Monitoring-based Operation Scheme. + * @pattern: Access pattern of target regions. * @action: &damo_action to be applied to the target regions. * @quota: Control the aggressiveness of this scheme. * @wmarks: Watermarks for automated (in)activation of this scheme. @@ -230,10 +243,8 @@ struct damos_stat { * @list: List head for siblings. * * For each aggregation interval, DAMON finds regions which fit in the - * condition (&min_sz_region, &max_sz_region, &min_nr_accesses, - * &max_nr_accesses, &min_age_region, &max_age_region) and applies &action to - * those. To avoid consuming too much CPU time or IO resources for the - * &action, "a is used. + * &pattern and applies &action to those. To avoid consuming too much + * CPU time or IO resources for the &action, "a is used. * * To do the work only when needed, schemes can be activated for specific * system situations using &wmarks. If all schemes that registered to the @@ -248,12 +259,7 @@ struct damos_stat { * &action is applied. */ struct damos { - unsigned long min_sz_region; - unsigned long max_sz_region; - unsigned int min_nr_accesses; - unsigned int max_nr_accesses; - unsigned int min_age_region; - unsigned int max_age_region; + struct damos_access_pattern pattern; enum damos_action action; struct damos_quota quota; struct damos_watermarks wmarks; @@ -509,12 +515,9 @@ void damon_destroy_region(struct damon_region *r, struct damon_target *t); int damon_set_regions(struct damon_target *t, struct damon_addr_range *ranges, unsigned int nr_ranges); -struct damos *damon_new_scheme( - unsigned long min_sz_region, unsigned long max_sz_region, - unsigned int min_nr_accesses, unsigned int max_nr_accesses, - unsigned int min_age_region, unsigned int max_age_region, - enum damos_action action, struct damos_quota *quota, - struct damos_watermarks *wmarks); +struct damos *damon_new_scheme(struct damos_access_pattern *pattern, + enum damos_action action, struct damos_quota *quota, + struct damos_watermarks *wmarks); void damon_add_scheme(struct damon_ctx *ctx, struct damos *s); void damon_destroy_scheme(struct damos *s); -- cgit From 0d83b2d89dbfad17b62d4e7fb8f0b0525ba1a204 Mon Sep 17 00:00:00 2001 From: Xin Hao Date: Fri, 9 Sep 2022 21:36:06 +0000 Subject: mm/damon: remove duplicate get_monitoring_region() definitions In lru_sort.c and reclaim.c, they are all defining get_monitoring_region() function, there is no need to define it separately. As 'get_monitoring_region()' is not a 'static' function anymore, we try to use a prefix to distinguish with other functions, so there rename it to 'damon_find_biggest_system_ram'. Link: https://lkml.kernel.org/r/20220909213606.136221-1-sj@kernel.org Signed-off-by: Xin Hao Signed-off-by: SeongJae Park Suggested-by: SeongJae Park Reviewed-by: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/damon.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 90f20675da22..016b6c9c03d6 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -549,6 +549,8 @@ static inline bool damon_target_has_pid(const struct damon_ctx *ctx) int damon_start(struct damon_ctx **ctxs, int nr_ctxs, bool exclusive); int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); +bool damon_find_biggest_system_ram(unsigned long *start, unsigned long *end); + #endif /* CONFIG_DAMON */ #endif /* _DAMON_H */ -- cgit From 13cc378403a83e70430ae9bad53fd65199f21fe1 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Fri, 9 Sep 2022 10:57:11 +0800 Subject: writeback: remove unused macro DIRTY_FULL_SCOPE It's introduced but never used. Remove it. Link: https://lkml.kernel.org/r/20220909025711.32012-1-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: Jan Kara Acked-by: Jens Axboe Cc: Bart Van Assche Cc: David Howells Cc: Matthew Wilcox Cc: NeilBrown Cc: Vlastimil Babka Cc: zhanglianjie Signed-off-by: Andrew Morton --- include/linux/writeback.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/writeback.h b/include/linux/writeback.h index 3f045f6d6c4f..06f9291b6fd5 100644 --- a/include/linux/writeback.h +++ b/include/linux/writeback.h @@ -17,20 +17,12 @@ struct bio; DECLARE_PER_CPU(int, dirty_throttle_leaks); /* - * The 1/4 region under the global dirty thresh is for smooth dirty throttling: - * - * (thresh - thresh/DIRTY_FULL_SCOPE, thresh) - * - * Further beyond, all dirtier tasks will enter a loop waiting (possibly long - * time) for the dirty pages to drop, unless written enough pages. - * * The global dirty threshold is normally equal to the global dirty limit, * except when the system suddenly allocates a lot of anonymous memory and * knocks down the global dirty threshold quickly, in which case the global * dirty limit will follow down slowly to prevent livelocking all dirtier tasks. */ #define DIRTY_SCOPE 8 -#define DIRTY_FULL_SCOPE (DIRTY_SCOPE / 2) struct backing_dev_info; -- cgit From cbeaa77b044938cfe91818821ece6b0b1511e967 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Tue, 13 Sep 2022 17:44:32 +0000 Subject: mm/damon/core: use a dedicated struct for monitoring attributes DAMON monitoring attributes are directly defined as fields of 'struct damon_ctx'. This makes 'struct damon_ctx' a little long and complicated. This commit defines and uses a struct, 'struct damon_attrs', which is dedicated for only the monitoring attributes to make the purpose of the five values clearer and simplify 'struct damon_ctx'. Link: https://lkml.kernel.org/r/20220913174449.50645-6-sj@kernel.org Signed-off-by: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/damon.h | 30 ++++++++++++++++++++---------- 1 file changed, 20 insertions(+), 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 016b6c9c03d6..2ceee8b07726 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -389,13 +389,15 @@ struct damon_callback { }; /** - * struct damon_ctx - Represents a context for each monitoring. This is the - * main interface that allows users to set the attributes and get the results - * of the monitoring. + * struct damon_attrs - Monitoring attributes for accuracy/overhead control. * * @sample_interval: The time between access samplings. * @aggr_interval: The time between monitor results aggregations. * @ops_update_interval: The time between monitoring operations updates. + * @min_nr_regions: The minimum number of adaptive monitoring + * regions. + * @max_nr_regions: The maximum number of adaptive monitoring + * regions. * * For each @sample_interval, DAMON checks whether each region is accessed or * not. It aggregates and keeps the access information (number of accesses to @@ -405,7 +407,21 @@ struct damon_callback { * @ops_update_interval. All time intervals are in micro-seconds. * Please refer to &struct damon_operations and &struct damon_callback for more * detail. + */ +struct damon_attrs { + unsigned long sample_interval; + unsigned long aggr_interval; + unsigned long ops_update_interval; + unsigned long min_nr_regions; + unsigned long max_nr_regions; +}; + +/** + * struct damon_ctx - Represents a context for each monitoring. This is the + * main interface that allows users to set the attributes and get the results + * of the monitoring. * + * @attrs: Monitoring attributes for accuracy/overhead control. * @kdamond: Kernel thread who does the monitoring. * @kdamond_lock: Mutex for the synchronizations with @kdamond. * @@ -427,15 +443,11 @@ struct damon_callback { * @ops: Set of monitoring operations for given use cases. * @callback: Set of callbacks for monitoring events notifications. * - * @min_nr_regions: The minimum number of adaptive monitoring regions. - * @max_nr_regions: The maximum number of adaptive monitoring regions. * @adaptive_targets: Head of monitoring targets (&damon_target) list. * @schemes: Head of schemes (&damos) list. */ struct damon_ctx { - unsigned long sample_interval; - unsigned long aggr_interval; - unsigned long ops_update_interval; + struct damon_attrs attrs; /* private: internal use only */ struct timespec64 last_aggregation; @@ -448,8 +460,6 @@ struct damon_ctx { struct damon_operations ops; struct damon_callback callback; - unsigned long min_nr_regions; - unsigned long max_nr_regions; struct list_head adaptive_targets; struct list_head schemes; }; -- cgit From bead3b00088eb8016b32cafa7e0701b3283e68a4 Mon Sep 17 00:00:00 2001 From: SeongJae Park Date: Tue, 13 Sep 2022 17:44:33 +0000 Subject: mm/damon/core: reduce parameters for damon_set_attrs() Number of parameters for 'damon_set_attrs()' is six. As it could be confusing and verbose, this commit reduces the number by receiving single pointer to a 'struct damon_attrs'. Link: https://lkml.kernel.org/r/20220913174449.50645-7-sj@kernel.org Signed-off-by: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/damon.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 2ceee8b07726..c5dc0c77c772 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -540,9 +540,7 @@ unsigned int damon_nr_regions(struct damon_target *t); struct damon_ctx *damon_new_ctx(void); void damon_destroy_ctx(struct damon_ctx *ctx); -int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, - unsigned long aggr_int, unsigned long ops_upd_int, - unsigned long min_nr_reg, unsigned long max_nr_reg); +int damon_set_attrs(struct damon_ctx *ctx, struct damon_attrs *attrs); int damon_set_schemes(struct damon_ctx *ctx, struct damos **schemes, ssize_t nr_schemes); int damon_nr_running_ctxs(void); -- cgit From b958d4d08fbfe938af24ea06ebbf839b48fa18a9 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Wed, 14 Sep 2022 15:26:02 +0800 Subject: mm: hugetlb: simplify per-node sysfs creation and removal Patch series "simplify handling of per-node sysfs creation and removal", v4. This patch (of 2): The following commit offload per-node sysfs creation and removal to a kworker and did not say why it is needed. And it also said "I don't know that this is absolutely required". It seems like the author was not sure as well. Since it only complicates the code, this patch will revert the changes to simplify the code. 39da08cb074c ("hugetlb: offload per node attribute registrations") We could use memory hotplug notifier to do per-node sysfs creation and removal instead of inserting those operations to node registration and unregistration. Then, it can reduce the code coupling between node.c and hugetlb.c. Also, it can simplify the code. Link: https://lkml.kernel.org/r/20220914072603.60293-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20220914072603.60293-2-songmuchun@bytedance.com Signed-off-by: Muchun Song Acked-by: Mike Kravetz Acked-by: David Hildenbrand Cc: Andi Kleen Cc: Greg Kroah-Hartman Cc: Muchun Song Cc: Oscar Salvador Cc: Rafael J. Wysocki Signed-off-by: Andrew Morton --- include/linux/node.h | 24 ++++-------------------- 1 file changed, 4 insertions(+), 20 deletions(-) (limited to 'include/linux') diff --git a/include/linux/node.h b/include/linux/node.h index 9ec680dd607f..427a5975cf40 100644 --- a/include/linux/node.h +++ b/include/linux/node.h @@ -2,15 +2,15 @@ /* * include/linux/node.h - generic node definition * - * This is mainly for topological representation. We define the - * basic 'struct node' here, which can be embedded in per-arch + * This is mainly for topological representation. We define the + * basic 'struct node' here, which can be embedded in per-arch * definitions of processors. * * Basic handling of the devices is done in drivers/base/node.c - * and system devices are handled in drivers/base/sys.c. + * and system devices are handled in drivers/base/sys.c. * * Nodes are exported via driverfs in the class/node/devices/ - * directory. + * directory. */ #ifndef _LINUX_NODE_H_ #define _LINUX_NODE_H_ @@ -18,7 +18,6 @@ #include #include #include -#include /** * struct node_hmem_attrs - heterogeneous memory performance attributes @@ -84,10 +83,6 @@ static inline void node_set_perf_attrs(unsigned int nid, struct node { struct device dev; struct list_head access_list; - -#if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_HUGETLBFS) - struct work_struct node_work; -#endif #ifdef CONFIG_HMEM_REPORTING struct list_head cache_attrs; struct device *cache_dev; @@ -96,7 +91,6 @@ struct node { struct memory_block; extern struct node *node_devices[]; -typedef void (*node_registration_func_t)(struct node *); #if defined(CONFIG_MEMORY_HOTPLUG) && defined(CONFIG_NUMA) void register_memory_blocks_under_node(int nid, unsigned long start_pfn, @@ -144,11 +138,6 @@ extern void unregister_memory_block_under_nodes(struct memory_block *mem_blk); extern int register_memory_node_under_compute_node(unsigned int mem_nid, unsigned int cpu_nid, unsigned access); - -#ifdef CONFIG_HUGETLBFS -extern void register_hugetlbfs_with_node(node_registration_func_t doregister, - node_registration_func_t unregister); -#endif #else static inline void node_dev_init(void) { @@ -176,11 +165,6 @@ static inline int unregister_cpu_under_node(unsigned int cpu, unsigned int nid) static inline void unregister_memory_block_under_nodes(struct memory_block *mem_blk) { } - -static inline void register_hugetlbfs_with_node(node_registration_func_t reg, - node_registration_func_t unreg) -{ -} #endif #define to_node(device) container_of(device, struct node, dev) -- cgit From a4a00b451ef5e1deb959088e25e248f4ee399792 Mon Sep 17 00:00:00 2001 From: Muchun Song Date: Wed, 14 Sep 2022 15:26:03 +0800 Subject: mm: hugetlb: eliminate memory-less nodes handling The memory-notify-based approach aims to handle meory-less nodes, however, it just adds the complexity of code as pointed by David in thread [1]. The handling of memory-less nodes is introduced by commit 4faf8d950ec4 ("hugetlb: handle memory hot-plug events"). >From its commit message, we cannot find any necessity of handling this case. So, we can simply register/unregister sysfs entries in register_node/unregister_node to simlify the code. BTW, hotplug callback added because in hugetlb_register_all_nodes() we register sysfs nodes only for N_MEMORY nodes, seeing commit 9b5e5d0fdc91, which said it was a preparation for handling memory-less nodes via memory hotplug. Since we want to remove memory hotplug, so make sure we only register per-node sysfs for online (N_ONLINE) nodes in hugetlb_register_all_nodes(). https://lore.kernel.org/linux-mm/60933ffc-b850-976c-78a0-0ee6e0ea9ef0@redhat.com/ [1] Link: https://lkml.kernel.org/r/20220914072603.60293-3-songmuchun@bytedance.com Suggested-by: David Hildenbrand Signed-off-by: Muchun Song Acked-by: David Hildenbrand Cc: Andi Kleen Cc: Greg Kroah-Hartman Cc: Mike Kravetz Cc: Oscar Salvador Cc: Rafael J. Wysocki Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 57e72954a482..6d7f39754060 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -16,6 +16,7 @@ struct ctl_table; struct user_struct; struct mmu_gather; +struct node; #ifndef is_hugepd typedef struct { unsigned long pd; } hugepd_t; @@ -935,6 +936,11 @@ static inline void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, } #endif +#ifdef CONFIG_NUMA +void hugetlb_register_node(struct node *node); +void hugetlb_unregister_node(struct node *node); +#endif + #else /* CONFIG_HUGETLB_PAGE */ struct hstate {}; @@ -1109,6 +1115,14 @@ static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte) { } + +static inline void hugetlb_register_node(struct node *node) +{ +} + +static inline void hugetlb_unregister_node(struct node *node) +{ +} #endif /* CONFIG_HUGETLB_PAGE */ static inline spinlock_t *huge_pte_lock(struct hstate *h, -- cgit From c195c3215741746b1eb7ab7980b926ddc37a4be3 Mon Sep 17 00:00:00 2001 From: Ke Sun Date: Wed, 14 Sep 2022 10:17:38 +0800 Subject: mm/filemap: make folio_put_wait_locked static It's only used in mm/filemap.c, since commit ("mm/migrate.c: rework migration_entry_wait() to not take a pageref"). Make it static. Link: https://lkml.kernel.org/r/20220914021738.3228011-1-sunke@kylinos.cn Signed-off-by: Ke Sun Reported-by: k2ci Signed-off-by: Andrew Morton --- include/linux/pagemap.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index 32846b6306db..23125ab87ded 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -1039,7 +1039,6 @@ static inline int wait_on_page_locked_killable(struct page *page) return folio_wait_locked_killable(page_folio(page)); } -int folio_put_wait_locked(struct folio *folio, int state); void wait_on_page_writeback(struct page *page); void folio_wait_writeback(struct folio *folio); int folio_wait_writeback_killable(struct folio *folio); -- cgit From 7e1813d48dd30e6c6f235f6661d1bc108fcab528 Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Wed, 14 Sep 2022 15:18:04 -0700 Subject: hugetlb: rename remove_huge_page to hugetlb_delete_from_page_cache remove_huge_page removes a hugetlb page from the page cache. Change to hugetlb_delete_from_page_cache as it is a more descriptive name. huge_add_to_page_cache is global in scope, but only deals with hugetlb pages. For consistency and clarity, rename to hugetlb_add_to_page_cache. Link: https://lkml.kernel.org/r/20220914221810.95771-4-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz Reviewed-by: Miaohe Lin Cc: Andrea Arcangeli Cc: "Aneesh Kumar K.V" Cc: Axel Rasmussen Cc: David Hildenbrand Cc: Davidlohr Bueso Cc: James Houghton Cc: "Kirill A. Shutemov" Cc: Michal Hocko Cc: Mina Almasry Cc: Muchun Song Cc: Naoya Horiguchi Cc: Pasha Tatashin Cc: Peter Xu Cc: Prakash Sangappa Cc: Sven Schnelle Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 6d7f39754060..4893d6d07099 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -666,7 +666,7 @@ struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid, nodemask_t *nmask, gfp_t gfp_mask); struct page *alloc_huge_page_vma(struct hstate *h, struct vm_area_struct *vma, unsigned long address); -int huge_add_to_page_cache(struct page *page, struct address_space *mapping, +int hugetlb_add_to_page_cache(struct page *page, struct address_space *mapping, pgoff_t idx); void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma, unsigned long address, struct page *page); -- cgit From 8d9bfb2608145cf3e408428c224099e1585471af Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Wed, 14 Sep 2022 15:18:07 -0700 Subject: hugetlb: add vma based lock for pmd sharing Allocate a new hugetlb_vma_lock structure and hang off vm_private_data for synchronization use by vmas that could be involved in pmd sharing. This data structure contains a rw semaphore that is the primary tool used for synchronization. This new structure is ref counted, so that it can exist when NOT attached to a vma. This is only helpful in resolving lock ordering issues where code may need to obtain the vma_lock while there are no guarantees the vma may go away. By obtaining a ref on the structure, it can be guaranteed that at least the rw semaphore will not go away. Only add infrastructure for the new lock here. Actual use will be added in subsequent patches. [mike.kravetz@oracle.com: fix build issue for missing hugetlb_vma_lock_release] Link: https://lkml.kernel.org/r/YyNUtA1vRASOE4+M@monkey Link: https://lkml.kernel.org/r/20220914221810.95771-7-mike.kravetz@oracle.com Signed-off-by: Mike Kravetz Reviewed-by: Miaohe Lin Cc: Andrea Arcangeli Cc: "Aneesh Kumar K.V" Cc: Axel Rasmussen Cc: David Hildenbrand Cc: Davidlohr Bueso Cc: James Houghton Cc: "Kirill A. Shutemov" Cc: Michal Hocko Cc: Mina Almasry Cc: Muchun Song Cc: Naoya Horiguchi Cc: Pasha Tatashin Cc: Peter Xu Cc: Prakash Sangappa Cc: Sven Schnelle Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 43 +++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 41 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 4893d6d07099..7b70aa931729 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -115,6 +115,12 @@ struct file_region { #endif }; +struct hugetlb_vma_lock { + struct kref refs; + struct rw_semaphore rw_sema; + struct vm_area_struct *vma; +}; + extern struct resv_map *resv_map_alloc(void); void resv_map_release(struct kref *ref); @@ -127,7 +133,7 @@ struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages, long min_hpages); void hugepage_put_subpool(struct hugepage_subpool *spool); -void reset_vma_resv_huge_pages(struct vm_area_struct *vma); +void hugetlb_dup_vma_private(struct vm_area_struct *vma); void clear_vma_resv_huge_pages(struct vm_area_struct *vma); int hugetlb_sysctl_handler(struct ctl_table *, int, void *, size_t *, loff_t *); int hugetlb_overcommit_handler(struct ctl_table *, int, void *, size_t *, @@ -215,6 +221,14 @@ struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address, struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address, pgd_t *pgd, int flags); +void hugetlb_vma_lock_read(struct vm_area_struct *vma); +void hugetlb_vma_unlock_read(struct vm_area_struct *vma); +void hugetlb_vma_lock_write(struct vm_area_struct *vma); +void hugetlb_vma_unlock_write(struct vm_area_struct *vma); +int hugetlb_vma_trylock_write(struct vm_area_struct *vma); +void hugetlb_vma_assert_locked(struct vm_area_struct *vma); +void hugetlb_vma_lock_release(struct kref *kref); + int pmd_huge(pmd_t pmd); int pud_huge(pud_t pud); unsigned long hugetlb_change_protection(struct vm_area_struct *vma, @@ -226,7 +240,7 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma); #else /* !CONFIG_HUGETLB_PAGE */ -static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma) +static inline void hugetlb_dup_vma_private(struct vm_area_struct *vma) { } @@ -337,6 +351,31 @@ static inline int prepare_hugepage_range(struct file *file, return -EINVAL; } +static inline void hugetlb_vma_lock_read(struct vm_area_struct *vma) +{ +} + +static inline void hugetlb_vma_unlock_read(struct vm_area_struct *vma) +{ +} + +static inline void hugetlb_vma_lock_write(struct vm_area_struct *vma) +{ +} + +static inline void hugetlb_vma_unlock_write(struct vm_area_struct *vma) +{ +} + +static inline int hugetlb_vma_trylock_write(struct vm_area_struct *vma) +{ + return 1; +} + +static inline void hugetlb_vma_assert_locked(struct vm_area_struct *vma) +{ +} + static inline int pmd_huge(pmd_t pmd) { return 0; -- cgit From 83a4f1ef45a90d740bc6edf6a2533b14a3e5d183 Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 15 Sep 2022 17:03:36 +0200 Subject: stackdepot: reserve 5 extra bits in depot_stack_handle_t Some users (currently only KMSAN) may want to use spare bits in depot_stack_handle_t. Let them do so by adding @extra_bits to __stack_depot_save() to store arbitrary flags, and providing stack_depot_get_extra_bits() to retrieve those flags. Also adapt KASAN to the new prototype by passing extra_bits=0, as KASAN does not intend to store additional information in the stack handle. Link: https://lkml.kernel.org/r/20220915150417.722975-3-glider@google.com Signed-off-by: Alexander Potapenko Reviewed-by: Marco Elver Cc: Alexander Viro Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Rientjes Cc: Dmitry Vyukov Cc: Eric Biggers Cc: Eric Biggers Cc: Eric Dumazet Cc: Greg Kroah-Hartman Cc: Herbert Xu Cc: Ilya Leoshkevich Cc: Ingo Molnar Cc: Jens Axboe Cc: Joonsoo Kim Cc: Kees Cook Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael S. Tsirkin Cc: Pekka Enberg Cc: Peter Zijlstra Cc: Petr Mladek Cc: Stephen Rothwell Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vegard Nossum Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/stackdepot.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h index bc2797955de9..9ca7798d7a31 100644 --- a/include/linux/stackdepot.h +++ b/include/linux/stackdepot.h @@ -14,9 +14,15 @@ #include typedef u32 depot_stack_handle_t; +/* + * Number of bits in the handle that stack depot doesn't use. Users may store + * information in them. + */ +#define STACK_DEPOT_EXTRA_BITS 5 depot_stack_handle_t __stack_depot_save(unsigned long *entries, unsigned int nr_entries, + unsigned int extra_bits, gfp_t gfp_flags, bool can_alloc); /* @@ -59,6 +65,8 @@ depot_stack_handle_t stack_depot_save(unsigned long *entries, unsigned int stack_depot_fetch(depot_stack_handle_t handle, unsigned long **entries); +unsigned int stack_depot_get_extra_bits(depot_stack_handle_t handle); + int stack_depot_snprint(depot_stack_handle_t handle, char *buf, size_t size, int spaces); -- cgit From 33b75c1d884e81ec97525e0a6fdcb187adf273f4 Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 15 Sep 2022 17:03:37 +0200 Subject: instrumented.h: allow instrumenting both sides of copy_from_user() Introduce instrument_copy_from_user_before() and instrument_copy_from_user_after() hooks to be invoked before and after the call to copy_from_user(). KASAN and KCSAN will be only using instrument_copy_from_user_before(), but for KMSAN we'll need to insert code after copy_from_user(). Link: https://lkml.kernel.org/r/20220915150417.722975-4-glider@google.com Signed-off-by: Alexander Potapenko Reviewed-by: Marco Elver Cc: Alexander Viro Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Rientjes Cc: Dmitry Vyukov Cc: Eric Biggers Cc: Eric Biggers Cc: Eric Dumazet Cc: Greg Kroah-Hartman Cc: Herbert Xu Cc: Ilya Leoshkevich Cc: Ingo Molnar Cc: Jens Axboe Cc: Joonsoo Kim Cc: Kees Cook Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael S. Tsirkin Cc: Pekka Enberg Cc: Peter Zijlstra Cc: Petr Mladek Cc: Stephen Rothwell Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vegard Nossum Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/instrumented.h | 21 +++++++++++++++++++-- include/linux/uaccess.h | 19 ++++++++++++++----- 2 files changed, 33 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h index 42faebbaa202..ee8f7d17d34f 100644 --- a/include/linux/instrumented.h +++ b/include/linux/instrumented.h @@ -120,7 +120,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n) } /** - * instrument_copy_from_user - instrument writes of copy_from_user + * instrument_copy_from_user_before - add instrumentation before copy_from_user * * Instrument writes to kernel memory, that are due to copy_from_user (and * variants). The instrumentation should be inserted before the accesses. @@ -130,10 +130,27 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n) * @n number of bytes to copy */ static __always_inline void -instrument_copy_from_user(const void *to, const void __user *from, unsigned long n) +instrument_copy_from_user_before(const void *to, const void __user *from, unsigned long n) { kasan_check_write(to, n); kcsan_check_write(to, n); } +/** + * instrument_copy_from_user_after - add instrumentation after copy_from_user + * + * Instrument writes to kernel memory, that are due to copy_from_user (and + * variants). The instrumentation should be inserted after the accesses. + * + * @to destination address + * @from source address + * @n number of bytes to copy + * @left number of bytes not copied (as returned by copy_from_user) + */ +static __always_inline void +instrument_copy_from_user_after(const void *to, const void __user *from, + unsigned long n, unsigned long left) +{ +} + #endif /* _LINUX_INSTRUMENTED_H */ diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h index 47e5d374c7eb..afb18f198843 100644 --- a/include/linux/uaccess.h +++ b/include/linux/uaccess.h @@ -58,20 +58,28 @@ static __always_inline __must_check unsigned long __copy_from_user_inatomic(void *to, const void __user *from, unsigned long n) { - instrument_copy_from_user(to, from, n); + unsigned long res; + + instrument_copy_from_user_before(to, from, n); check_object_size(to, n, false); - return raw_copy_from_user(to, from, n); + res = raw_copy_from_user(to, from, n); + instrument_copy_from_user_after(to, from, n, res); + return res; } static __always_inline __must_check unsigned long __copy_from_user(void *to, const void __user *from, unsigned long n) { + unsigned long res; + might_fault(); + instrument_copy_from_user_before(to, from, n); if (should_fail_usercopy()) return n; - instrument_copy_from_user(to, from, n); check_object_size(to, n, false); - return raw_copy_from_user(to, from, n); + res = raw_copy_from_user(to, from, n); + instrument_copy_from_user_after(to, from, n, res); + return res; } /** @@ -115,8 +123,9 @@ _copy_from_user(void *to, const void __user *from, unsigned long n) unsigned long res = n; might_fault(); if (!should_fail_usercopy() && likely(access_ok(from, n))) { - instrument_copy_from_user(to, from, n); + instrument_copy_from_user_before(to, from, n); res = raw_copy_from_user(to, from, n); + instrument_copy_from_user_after(to, from, n, res); } if (unlikely(res)) memset(to + (n - res), 0, res); -- cgit From 888f84a6da4d17e453058169fa7b235fff34f5bf Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 15 Sep 2022 17:03:38 +0200 Subject: x86: asm: instrument usercopy in get_user() and put_user() Use hooks from instrumented.h to notify bug detection tools about usercopy events in variations of get_user() and put_user(). Link: https://lkml.kernel.org/r/20220915150417.722975-5-glider@google.com Signed-off-by: Alexander Potapenko Cc: Alexander Viro Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Rientjes Cc: Dmitry Vyukov Cc: Eric Biggers Cc: Eric Biggers Cc: Eric Dumazet Cc: Greg Kroah-Hartman Cc: Herbert Xu Cc: Ilya Leoshkevich Cc: Ingo Molnar Cc: Jens Axboe Cc: Joonsoo Kim Cc: Kees Cook Cc: Marco Elver Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael S. Tsirkin Cc: Pekka Enberg Cc: Peter Zijlstra Cc: Petr Mladek Cc: Stephen Rothwell Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vegard Nossum Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/instrumented.h | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+) (limited to 'include/linux') diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h index ee8f7d17d34f..9f1dba8f717b 100644 --- a/include/linux/instrumented.h +++ b/include/linux/instrumented.h @@ -153,4 +153,32 @@ instrument_copy_from_user_after(const void *to, const void __user *from, { } +/** + * instrument_get_user() - add instrumentation to get_user()-like macros + * + * get_user() and friends are fragile, so it may depend on the implementation + * whether the instrumentation happens before or after the data is copied from + * the userspace. + * + * @to destination variable, may not be address-taken + */ +#define instrument_get_user(to) \ +({ \ +}) + +/** + * instrument_put_user() - add instrumentation to put_user()-like macros + * + * put_user() and friends are fragile, so it may depend on the implementation + * whether the instrumentation happens before or after the data is copied from + * the userspace. + * + * @from source address + * @ptr userspace pointer to copy to + * @size number of bytes to copy + */ +#define instrument_put_user(from, ptr, size) \ +({ \ +}) + #endif /* _LINUX_INSTRUMENTED_H */ -- cgit From 9b448bc25b776daab3215393c3ce6953dd3bb8ad Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 15 Sep 2022 17:03:41 +0200 Subject: kmsan: introduce __no_sanitize_memory and __no_kmsan_checks __no_sanitize_memory is a function attribute that instructs KMSAN to skip a function during instrumentation. This is needed to e.g. implement the noinstr functions. __no_kmsan_checks is a function attribute that makes KMSAN ignore the uninitialized values coming from the function's inputs, and initialize the function's outputs. Functions marked with this attribute can't be inlined into functions not marked with it, and vice versa. This behavior is overridden by __always_inline. __SANITIZE_MEMORY__ is a macro that's defined iff the file is instrumented with KMSAN. This is not the same as CONFIG_KMSAN, which is defined for every file. Link: https://lkml.kernel.org/r/20220915150417.722975-8-glider@google.com Signed-off-by: Alexander Potapenko Reviewed-by: Marco Elver Cc: Alexander Viro Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Rientjes Cc: Dmitry Vyukov Cc: Eric Biggers Cc: Eric Biggers Cc: Eric Dumazet Cc: Greg Kroah-Hartman Cc: Herbert Xu Cc: Ilya Leoshkevich Cc: Ingo Molnar Cc: Jens Axboe Cc: Joonsoo Kim Cc: Kees Cook Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael S. Tsirkin Cc: Pekka Enberg Cc: Peter Zijlstra Cc: Petr Mladek Cc: Stephen Rothwell Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vegard Nossum Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/compiler-clang.h | 23 +++++++++++++++++++++++ include/linux/compiler-gcc.h | 6 ++++++ 2 files changed, 29 insertions(+) (limited to 'include/linux') diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h index c84fec767445..4fa0cc4cbd2c 100644 --- a/include/linux/compiler-clang.h +++ b/include/linux/compiler-clang.h @@ -51,6 +51,29 @@ #define __no_sanitize_undefined #endif +#if __has_feature(memory_sanitizer) +#define __SANITIZE_MEMORY__ +/* + * Unlike other sanitizers, KMSAN still inserts code into functions marked with + * no_sanitize("kernel-memory"). Using disable_sanitizer_instrumentation + * provides the behavior consistent with other __no_sanitize_ attributes, + * guaranteeing that __no_sanitize_memory functions remain uninstrumented. + */ +#define __no_sanitize_memory __disable_sanitizer_instrumentation + +/* + * The __no_kmsan_checks attribute ensures that a function does not produce + * false positive reports by: + * - initializing all local variables and memory stores in this function; + * - skipping all shadow checks; + * - passing initialized arguments to this function's callees. + */ +#define __no_kmsan_checks __attribute__((no_sanitize("kernel-memory"))) +#else +#define __no_sanitize_memory +#define __no_kmsan_checks +#endif + /* * Support for __has_feature(coverage_sanitizer) was added in Clang 13 together * with no_sanitize("coverage"). Prior versions of Clang support coverage diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h index 9b157b71036f..f55a37efdb97 100644 --- a/include/linux/compiler-gcc.h +++ b/include/linux/compiler-gcc.h @@ -114,6 +114,12 @@ #define __SANITIZE_ADDRESS__ #endif +/* + * GCC does not support KMSAN. + */ +#define __no_sanitize_memory +#define __no_kmsan_checks + /* * Turn individual warnings and errors on and off locally, depending * on version. -- cgit From 5de0ce85f5a4d2883eae6f48eb015bc5dfbd91e9 Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 15 Sep 2022 17:03:42 +0200 Subject: kmsan: mark noinstr as __no_sanitize_memory noinstr functions should never be instrumented, so make KMSAN skip them by applying the __no_sanitize_memory attribute. Link: https://lkml.kernel.org/r/20220915150417.722975-9-glider@google.com Signed-off-by: Alexander Potapenko Reviewed-by: Marco Elver Cc: Alexander Viro Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Rientjes Cc: Dmitry Vyukov Cc: Eric Biggers Cc: Eric Biggers Cc: Eric Dumazet Cc: Greg Kroah-Hartman Cc: Herbert Xu Cc: Ilya Leoshkevich Cc: Ingo Molnar Cc: Jens Axboe Cc: Joonsoo Kim Cc: Kees Cook Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael S. Tsirkin Cc: Pekka Enberg Cc: Peter Zijlstra Cc: Petr Mladek Cc: Stephen Rothwell Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vegard Nossum Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/compiler_types.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h index 4f2a819fd60a..015207a6e2bf 100644 --- a/include/linux/compiler_types.h +++ b/include/linux/compiler_types.h @@ -229,7 +229,8 @@ struct ftrace_likely_data { /* Section for code which can't be instrumented at all */ #define noinstr \ noinline notrace __attribute((__section__(".noinstr.text"))) \ - __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage + __no_kcsan __no_sanitize_address __no_profile __no_sanitize_coverage \ + __no_sanitize_memory #endif /* __KERNEL__ */ -- cgit From f80be4571b19b9fd8dd1528cd2a2f123aff51f70 Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 15 Sep 2022 17:03:45 +0200 Subject: kmsan: add KMSAN runtime core MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit For each memory location KernelMemorySanitizer maintains two types of metadata: 1. The so-called shadow of that location - а byte:byte mapping describing whether or not individual bits of memory are initialized (shadow is 0) or not (shadow is 1). 2. The origins of that location - а 4-byte:4-byte mapping containing 4-byte IDs of the stack traces where uninitialized values were created. Each struct page now contains pointers to two struct pages holding KMSAN metadata (shadow and origins) for the original struct page. Utility routines in mm/kmsan/core.c and mm/kmsan/shadow.c handle the metadata creation, addressing, copying and checking. mm/kmsan/report.c performs error reporting in the cases an uninitialized value is used in a way that leads to undefined behavior. KMSAN compiler instrumentation is responsible for tracking the metadata along with the kernel memory. mm/kmsan/instrumentation.c provides the implementation for instrumentation hooks that are called from files compiled with -fsanitize=kernel-memory. To aid parameter passing (also done at instrumentation level), each task_struct now contains a struct kmsan_task_state used to track the metadata of function parameters and return values for that task. Finally, this patch provides CONFIG_KMSAN that enables KMSAN, and declares CFLAGS_KMSAN, which are applied to files compiled with KMSAN. The KMSAN_SANITIZE:=n Makefile directive can be used to completely disable KMSAN instrumentation for certain files. Similarly, KMSAN_ENABLE_CHECKS:=n disables KMSAN checks and makes newly created stack memory initialized. Users can also use functions from include/linux/kmsan-checks.h to mark certain memory regions as uninitialized or initialized (this is called "poisoning" and "unpoisoning") or check that a particular region is initialized. Link: https://lkml.kernel.org/r/20220915150417.722975-12-glider@google.com Signed-off-by: Alexander Potapenko Acked-by: Marco Elver Cc: Alexander Viro Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Rientjes Cc: Dmitry Vyukov Cc: Eric Biggers Cc: Eric Biggers Cc: Eric Dumazet Cc: Greg Kroah-Hartman Cc: Herbert Xu Cc: Ilya Leoshkevich Cc: Ingo Molnar Cc: Jens Axboe Cc: Joonsoo Kim Cc: Kees Cook Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael S. Tsirkin Cc: Pekka Enberg Cc: Peter Zijlstra Cc: Petr Mladek Cc: Stephen Rothwell Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vegard Nossum Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/kmsan-checks.h | 64 ++++++++++++++++++++++++++++++++++++++++++++ include/linux/kmsan_types.h | 35 ++++++++++++++++++++++++ include/linux/mm_types.h | 12 +++++++++ include/linux/sched.h | 5 ++++ 4 files changed, 116 insertions(+) create mode 100644 include/linux/kmsan-checks.h create mode 100644 include/linux/kmsan_types.h (limited to 'include/linux') diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h new file mode 100644 index 000000000000..a6522a0c28df --- /dev/null +++ b/include/linux/kmsan-checks.h @@ -0,0 +1,64 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * KMSAN checks to be used for one-off annotations in subsystems. + * + * Copyright (C) 2017-2022 Google LLC + * Author: Alexander Potapenko + * + */ + +#ifndef _LINUX_KMSAN_CHECKS_H +#define _LINUX_KMSAN_CHECKS_H + +#include + +#ifdef CONFIG_KMSAN + +/** + * kmsan_poison_memory() - Mark the memory range as uninitialized. + * @address: address to start with. + * @size: size of buffer to poison. + * @flags: GFP flags for allocations done by this function. + * + * Until other data is written to this range, KMSAN will treat it as + * uninitialized. Error reports for this memory will reference the call site of + * kmsan_poison_memory() as origin. + */ +void kmsan_poison_memory(const void *address, size_t size, gfp_t flags); + +/** + * kmsan_unpoison_memory() - Mark the memory range as initialized. + * @address: address to start with. + * @size: size of buffer to unpoison. + * + * Until other data is written to this range, KMSAN will treat it as + * initialized. + */ +void kmsan_unpoison_memory(const void *address, size_t size); + +/** + * kmsan_check_memory() - Check the memory range for being initialized. + * @address: address to start with. + * @size: size of buffer to check. + * + * If any piece of the given range is marked as uninitialized, KMSAN will report + * an error. + */ +void kmsan_check_memory(const void *address, size_t size); + +#else + +static inline void kmsan_poison_memory(const void *address, size_t size, + gfp_t flags) +{ +} +static inline void kmsan_unpoison_memory(const void *address, size_t size) +{ +} +static inline void kmsan_check_memory(const void *address, size_t size) +{ +} + +#endif + +#endif /* _LINUX_KMSAN_CHECKS_H */ diff --git a/include/linux/kmsan_types.h b/include/linux/kmsan_types.h new file mode 100644 index 000000000000..8bfa6c98176d --- /dev/null +++ b/include/linux/kmsan_types.h @@ -0,0 +1,35 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * A minimal header declaring types added by KMSAN to existing kernel structs. + * + * Copyright (C) 2017-2022 Google LLC + * Author: Alexander Potapenko + * + */ +#ifndef _LINUX_KMSAN_TYPES_H +#define _LINUX_KMSAN_TYPES_H + +/* These constants are defined in the MSan LLVM instrumentation pass. */ +#define KMSAN_RETVAL_SIZE 800 +#define KMSAN_PARAM_SIZE 800 + +struct kmsan_context_state { + char param_tls[KMSAN_PARAM_SIZE]; + char retval_tls[KMSAN_RETVAL_SIZE]; + char va_arg_tls[KMSAN_PARAM_SIZE]; + char va_arg_origin_tls[KMSAN_PARAM_SIZE]; + u64 va_arg_overflow_size_tls; + char param_origin_tls[KMSAN_PARAM_SIZE]; + u32 retval_origin_tls; +}; + +#undef KMSAN_PARAM_SIZE +#undef KMSAN_RETVAL_SIZE + +struct kmsan_ctx { + struct kmsan_context_state cstate; + int kmsan_in_runtime; + bool allow_reporting; +}; + +#endif /* _LINUX_KMSAN_TYPES_H */ diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5c87d0f292a2..500e536796ca 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -224,6 +224,18 @@ struct page { not kmapped, ie. highmem) */ #endif /* WANT_PAGE_VIRTUAL */ +#ifdef CONFIG_KMSAN + /* + * KMSAN metadata for this page: + * - shadow page: every bit indicates whether the corresponding + * bit of the original page is initialized (0) or not (1); + * - origin page: every 4 bytes contain an id of the stack trace + * where the uninitialized value was created. + */ + struct page *kmsan_shadow; + struct page *kmsan_origin; +#endif + #ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS int _last_cpupid; #endif diff --git a/include/linux/sched.h b/include/linux/sched.h index fbac3c19fe35..88a043f7235e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -14,6 +14,7 @@ #include #include #include +#include #include #include #include @@ -1362,6 +1363,10 @@ struct task_struct { #endif #endif +#ifdef CONFIG_KMSAN + struct kmsan_ctx kmsan_ctx; +#endif + #if IS_ENABLED(CONFIG_KUNIT) struct kunit *kunit_test; #endif -- cgit From b073d7f8aee4ebf05d10e3380df377b73120cf16 Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 15 Sep 2022 17:03:48 +0200 Subject: mm: kmsan: maintain KMSAN metadata for page operations Insert KMSAN hooks that make the necessary bookkeeping changes: - poison page shadow and origins in alloc_pages()/free_page(); - clear page shadow and origins in clear_page(), copy_user_highpage(); - copy page metadata in copy_highpage(), wp_page_copy(); - handle vmap()/vunmap()/iounmap(); Link: https://lkml.kernel.org/r/20220915150417.722975-15-glider@google.com Signed-off-by: Alexander Potapenko Cc: Alexander Viro Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Rientjes Cc: Dmitry Vyukov Cc: Eric Biggers Cc: Eric Biggers Cc: Eric Dumazet Cc: Greg Kroah-Hartman Cc: Herbert Xu Cc: Ilya Leoshkevich Cc: Ingo Molnar Cc: Jens Axboe Cc: Joonsoo Kim Cc: Kees Cook Cc: Marco Elver Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael S. Tsirkin Cc: Pekka Enberg Cc: Peter Zijlstra Cc: Petr Mladek Cc: Stephen Rothwell Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vegard Nossum Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/highmem.h | 3 + include/linux/kmsan.h | 145 ++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 148 insertions(+) create mode 100644 include/linux/kmsan.h (limited to 'include/linux') diff --git a/include/linux/highmem.h b/include/linux/highmem.h index 25679035ca28..e9912da5441b 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -6,6 +6,7 @@ #include #include #include +#include #include #include #include @@ -311,6 +312,7 @@ static inline void copy_user_highpage(struct page *to, struct page *from, vfrom = kmap_local_page(from); vto = kmap_local_page(to); copy_user_page(vto, vfrom, vaddr, to); + kmsan_unpoison_memory(page_address(to), PAGE_SIZE); kunmap_local(vto); kunmap_local(vfrom); } @@ -326,6 +328,7 @@ static inline void copy_highpage(struct page *to, struct page *from) vfrom = kmap_local_page(from); vto = kmap_local_page(to); copy_page(vto, vfrom); + kmsan_copy_page_meta(to, from); kunmap_local(vto); kunmap_local(vfrom); } diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h new file mode 100644 index 000000000000..b36bf3db835e --- /dev/null +++ b/include/linux/kmsan.h @@ -0,0 +1,145 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * KMSAN API for subsystems. + * + * Copyright (C) 2017-2022 Google LLC + * Author: Alexander Potapenko + * + */ +#ifndef _LINUX_KMSAN_H +#define _LINUX_KMSAN_H + +#include +#include +#include + +struct page; + +#ifdef CONFIG_KMSAN + +/** + * kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call. + * @page: struct page pointer returned by alloc_pages(). + * @order: order of allocated struct page. + * @flags: GFP flags used by alloc_pages() + * + * KMSAN marks 1<<@order pages starting at @page as uninitialized, unless + * @flags contain __GFP_ZERO. + */ +void kmsan_alloc_page(struct page *page, unsigned int order, gfp_t flags); + +/** + * kmsan_free_page() - Notify KMSAN about a free_pages() call. + * @page: struct page pointer passed to free_pages(). + * @order: order of deallocated struct page. + * + * KMSAN marks freed memory as uninitialized. + */ +void kmsan_free_page(struct page *page, unsigned int order); + +/** + * kmsan_copy_page_meta() - Copy KMSAN metadata between two pages. + * @dst: destination page. + * @src: source page. + * + * KMSAN copies the contents of metadata pages for @src into the metadata pages + * for @dst. If @dst has no associated metadata pages, nothing happens. + * If @src has no associated metadata pages, @dst metadata pages are unpoisoned. + */ +void kmsan_copy_page_meta(struct page *dst, struct page *src); + +/** + * kmsan_map_kernel_range_noflush() - Notify KMSAN about a vmap. + * @start: start of vmapped range. + * @end: end of vmapped range. + * @prot: page protection flags used for vmap. + * @pages: array of pages. + * @page_shift: page_shift passed to vmap_range_noflush(). + * + * KMSAN maps shadow and origin pages of @pages into contiguous ranges in + * vmalloc metadata address range. + */ +void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end, + pgprot_t prot, struct page **pages, + unsigned int page_shift); + +/** + * kmsan_vunmap_kernel_range_noflush() - Notify KMSAN about a vunmap. + * @start: start of vunmapped range. + * @end: end of vunmapped range. + * + * KMSAN unmaps the contiguous metadata ranges created by + * kmsan_map_kernel_range_noflush(). + */ +void kmsan_vunmap_range_noflush(unsigned long start, unsigned long end); + +/** + * kmsan_ioremap_page_range() - Notify KMSAN about a ioremap_page_range() call. + * @addr: range start. + * @end: range end. + * @phys_addr: physical range start. + * @prot: page protection flags used for ioremap_page_range(). + * @page_shift: page_shift argument passed to vmap_range_noflush(). + * + * KMSAN creates new metadata pages for the physical pages mapped into the + * virtual memory. + */ +void kmsan_ioremap_page_range(unsigned long addr, unsigned long end, + phys_addr_t phys_addr, pgprot_t prot, + unsigned int page_shift); + +/** + * kmsan_iounmap_page_range() - Notify KMSAN about a iounmap_page_range() call. + * @start: range start. + * @end: range end. + * + * KMSAN unmaps the metadata pages for the given range and, unlike for + * vunmap_page_range(), also deallocates them. + */ +void kmsan_iounmap_page_range(unsigned long start, unsigned long end); + +#else + +static inline int kmsan_alloc_page(struct page *page, unsigned int order, + gfp_t flags) +{ + return 0; +} + +static inline void kmsan_free_page(struct page *page, unsigned int order) +{ +} + +static inline void kmsan_copy_page_meta(struct page *dst, struct page *src) +{ +} + +static inline void kmsan_vmap_pages_range_noflush(unsigned long start, + unsigned long end, + pgprot_t prot, + struct page **pages, + unsigned int page_shift) +{ +} + +static inline void kmsan_vunmap_range_noflush(unsigned long start, + unsigned long end) +{ +} + +static inline void kmsan_ioremap_page_range(unsigned long start, + unsigned long end, + phys_addr_t phys_addr, + pgprot_t prot, + unsigned int page_shift) +{ +} + +static inline void kmsan_iounmap_page_range(unsigned long start, + unsigned long end) +{ +} + +#endif + +#endif /* _LINUX_KMSAN_H */ -- cgit From 68ef169a1dd20df5cfa5a161b7304ad9fdd14c36 Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 15 Sep 2022 17:03:49 +0200 Subject: mm: kmsan: call KMSAN hooks from SLUB code In order to report uninitialized memory coming from heap allocations KMSAN has to poison them unless they're created with __GFP_ZERO. It's handy that we need KMSAN hooks in the places where init_on_alloc/init_on_free initialization is performed. In addition, we apply __no_kmsan_checks to get_freepointer_safe() to suppress reports when accessing freelist pointers that reside in freed objects. Link: https://lkml.kernel.org/r/20220915150417.722975-16-glider@google.com Signed-off-by: Alexander Potapenko Reviewed-by: Marco Elver Cc: Alexander Viro Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Rientjes Cc: Dmitry Vyukov Cc: Eric Biggers Cc: Eric Biggers Cc: Eric Dumazet Cc: Greg Kroah-Hartman Cc: Herbert Xu Cc: Ilya Leoshkevich Cc: Ingo Molnar Cc: Jens Axboe Cc: Joonsoo Kim Cc: Kees Cook Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael S. Tsirkin Cc: Pekka Enberg Cc: Peter Zijlstra Cc: Petr Mladek Cc: Stephen Rothwell Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vegard Nossum Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/kmsan.h | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 57 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h index b36bf3db835e..5c4e0079054e 100644 --- a/include/linux/kmsan.h +++ b/include/linux/kmsan.h @@ -14,6 +14,7 @@ #include struct page; +struct kmem_cache; #ifdef CONFIG_KMSAN @@ -48,6 +49,44 @@ void kmsan_free_page(struct page *page, unsigned int order); */ void kmsan_copy_page_meta(struct page *dst, struct page *src); +/** + * kmsan_slab_alloc() - Notify KMSAN about a slab allocation. + * @s: slab cache the object belongs to. + * @object: object pointer. + * @flags: GFP flags passed to the allocator. + * + * Depending on cache flags and GFP flags, KMSAN sets up the metadata of the + * newly created object, marking it as initialized or uninitialized. + */ +void kmsan_slab_alloc(struct kmem_cache *s, void *object, gfp_t flags); + +/** + * kmsan_slab_free() - Notify KMSAN about a slab deallocation. + * @s: slab cache the object belongs to. + * @object: object pointer. + * + * KMSAN marks the freed object as uninitialized. + */ +void kmsan_slab_free(struct kmem_cache *s, void *object); + +/** + * kmsan_kmalloc_large() - Notify KMSAN about a large slab allocation. + * @ptr: object pointer. + * @size: object size. + * @flags: GFP flags passed to the allocator. + * + * Similar to kmsan_slab_alloc(), but for large allocations. + */ +void kmsan_kmalloc_large(const void *ptr, size_t size, gfp_t flags); + +/** + * kmsan_kfree_large() - Notify KMSAN about a large slab deallocation. + * @ptr: object pointer. + * + * Similar to kmsan_slab_free(), but for large allocations. + */ +void kmsan_kfree_large(const void *ptr); + /** * kmsan_map_kernel_range_noflush() - Notify KMSAN about a vmap. * @start: start of vmapped range. @@ -114,6 +153,24 @@ static inline void kmsan_copy_page_meta(struct page *dst, struct page *src) { } +static inline void kmsan_slab_alloc(struct kmem_cache *s, void *object, + gfp_t flags) +{ +} + +static inline void kmsan_slab_free(struct kmem_cache *s, void *object) +{ +} + +static inline void kmsan_kmalloc_large(const void *ptr, size_t size, + gfp_t flags) +{ +} + +static inline void kmsan_kfree_large(const void *ptr) +{ +} + static inline void kmsan_vmap_pages_range_noflush(unsigned long start, unsigned long end, pgprot_t prot, -- cgit From 50b5e49ca694a60f84a2a12d62b6cb6ec8e3649f Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 15 Sep 2022 17:03:50 +0200 Subject: kmsan: handle task creation and exiting Tell KMSAN that a new task is created, so the tool creates a backing metadata structure for that task. Link: https://lkml.kernel.org/r/20220915150417.722975-17-glider@google.com Signed-off-by: Alexander Potapenko Cc: Alexander Viro Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Rientjes Cc: Dmitry Vyukov Cc: Eric Biggers Cc: Eric Biggers Cc: Eric Dumazet Cc: Greg Kroah-Hartman Cc: Herbert Xu Cc: Ilya Leoshkevich Cc: Ingo Molnar Cc: Jens Axboe Cc: Joonsoo Kim Cc: Kees Cook Cc: Marco Elver Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael S. Tsirkin Cc: Pekka Enberg Cc: Peter Zijlstra Cc: Petr Mladek Cc: Stephen Rothwell Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vegard Nossum Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/kmsan.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h index 5c4e0079054e..354aee6f7b1a 100644 --- a/include/linux/kmsan.h +++ b/include/linux/kmsan.h @@ -15,9 +15,22 @@ struct page; struct kmem_cache; +struct task_struct; #ifdef CONFIG_KMSAN +/** + * kmsan_task_create() - Initialize KMSAN state for the task. + * @task: task to initialize. + */ +void kmsan_task_create(struct task_struct *task); + +/** + * kmsan_task_exit() - Notify KMSAN that a task has exited. + * @task: task about to finish. + */ +void kmsan_task_exit(struct task_struct *task); + /** * kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call. * @page: struct page pointer returned by alloc_pages(). @@ -139,6 +152,14 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end); #else +static inline void kmsan_task_create(struct task_struct *task) +{ +} + +static inline void kmsan_task_exit(struct task_struct *task) +{ +} + static inline int kmsan_alloc_page(struct page *page, unsigned int order, gfp_t flags) { -- cgit From 3c206509826094e85ead0b056f484db96829248d Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 15 Sep 2022 17:03:51 +0200 Subject: init: kmsan: call KMSAN initialization routines kmsan_init_shadow() scans the mappings created at boot time and creates metadata pages for those mappings. When the memblock allocator returns pages to pagealloc, we reserve 2/3 of those pages and use them as metadata for the remaining 1/3. Once KMSAN starts, every page allocated by pagealloc has its associated shadow and origin pages. kmsan_initialize() initializes the bookkeeping for init_task and enables KMSAN. Link: https://lkml.kernel.org/r/20220915150417.722975-18-glider@google.com Signed-off-by: Alexander Potapenko Cc: Alexander Viro Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Rientjes Cc: Dmitry Vyukov Cc: Eric Biggers Cc: Eric Biggers Cc: Eric Dumazet Cc: Greg Kroah-Hartman Cc: Herbert Xu Cc: Ilya Leoshkevich Cc: Ingo Molnar Cc: Jens Axboe Cc: Joonsoo Kim Cc: Kees Cook Cc: Marco Elver Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael S. Tsirkin Cc: Pekka Enberg Cc: Peter Zijlstra Cc: Petr Mladek Cc: Stephen Rothwell Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vegard Nossum Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/kmsan.h | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h index 354aee6f7b1a..e00de976ee43 100644 --- a/include/linux/kmsan.h +++ b/include/linux/kmsan.h @@ -31,6 +31,28 @@ void kmsan_task_create(struct task_struct *task); */ void kmsan_task_exit(struct task_struct *task); +/** + * kmsan_init_shadow() - Initialize KMSAN shadow at boot time. + * + * Allocate and initialize KMSAN metadata for early allocations. + */ +void __init kmsan_init_shadow(void); + +/** + * kmsan_init_runtime() - Initialize KMSAN state and enable KMSAN. + */ +void __init kmsan_init_runtime(void); + +/** + * kmsan_memblock_free_pages() - handle freeing of memblock pages. + * @page: struct page to free. + * @order: order of @page. + * + * Freed pages are either returned to buddy allocator or held back to be used + * as metadata pages. + */ +bool __init kmsan_memblock_free_pages(struct page *page, unsigned int order); + /** * kmsan_alloc_page() - Notify KMSAN about an alloc_pages() call. * @page: struct page pointer returned by alloc_pages(). @@ -152,6 +174,20 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end); #else +static inline void kmsan_init_shadow(void) +{ +} + +static inline void kmsan_init_runtime(void) +{ +} + +static inline bool kmsan_memblock_free_pages(struct page *page, + unsigned int order) +{ + return true; +} + static inline void kmsan_task_create(struct task_struct *task) { } -- cgit From 75cf0290271bf6dae9dee982aef15242dadf97e4 Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 15 Sep 2022 17:03:52 +0200 Subject: instrumented.h: add KMSAN support To avoid false positives, KMSAN needs to unpoison the data copied from the userspace. To detect infoleaks - check the memory buffer passed to copy_to_user(). Link: https://lkml.kernel.org/r/20220915150417.722975-19-glider@google.com Signed-off-by: Alexander Potapenko Reviewed-by: Marco Elver Cc: Alexander Viro Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Rientjes Cc: Dmitry Vyukov Cc: Eric Biggers Cc: Eric Biggers Cc: Eric Dumazet Cc: Greg Kroah-Hartman Cc: Herbert Xu Cc: Ilya Leoshkevich Cc: Ingo Molnar Cc: Jens Axboe Cc: Joonsoo Kim Cc: Kees Cook Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael S. Tsirkin Cc: Pekka Enberg Cc: Peter Zijlstra Cc: Petr Mladek Cc: Stephen Rothwell Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vegard Nossum Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/instrumented.h | 18 +++++++++++++----- include/linux/kmsan-checks.h | 19 +++++++++++++++++++ 2 files changed, 32 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/instrumented.h b/include/linux/instrumented.h index 9f1dba8f717b..501fa8486749 100644 --- a/include/linux/instrumented.h +++ b/include/linux/instrumented.h @@ -2,7 +2,7 @@ /* * This header provides generic wrappers for memory access instrumentation that - * the compiler cannot emit for: KASAN, KCSAN. + * the compiler cannot emit for: KASAN, KCSAN, KMSAN. */ #ifndef _LINUX_INSTRUMENTED_H #define _LINUX_INSTRUMENTED_H @@ -10,6 +10,7 @@ #include #include #include +#include #include /** @@ -117,6 +118,7 @@ instrument_copy_to_user(void __user *to, const void *from, unsigned long n) { kasan_check_read(from, n); kcsan_check_read(from, n); + kmsan_copy_to_user(to, from, n, 0); } /** @@ -151,6 +153,7 @@ static __always_inline void instrument_copy_from_user_after(const void *to, const void __user *from, unsigned long n, unsigned long left) { + kmsan_unpoison_memory(to, n - left); } /** @@ -162,10 +165,14 @@ instrument_copy_from_user_after(const void *to, const void __user *from, * * @to destination variable, may not be address-taken */ -#define instrument_get_user(to) \ -({ \ +#define instrument_get_user(to) \ +({ \ + u64 __tmp = (u64)(to); \ + kmsan_unpoison_memory(&__tmp, sizeof(__tmp)); \ + to = __tmp; \ }) + /** * instrument_put_user() - add instrumentation to put_user()-like macros * @@ -177,8 +184,9 @@ instrument_copy_from_user_after(const void *to, const void __user *from, * @ptr userspace pointer to copy to * @size number of bytes to copy */ -#define instrument_put_user(from, ptr, size) \ -({ \ +#define instrument_put_user(from, ptr, size) \ +({ \ + kmsan_copy_to_user(ptr, &from, sizeof(from), 0); \ }) #endif /* _LINUX_INSTRUMENTED_H */ diff --git a/include/linux/kmsan-checks.h b/include/linux/kmsan-checks.h index a6522a0c28df..c4cae333deec 100644 --- a/include/linux/kmsan-checks.h +++ b/include/linux/kmsan-checks.h @@ -46,6 +46,21 @@ void kmsan_unpoison_memory(const void *address, size_t size); */ void kmsan_check_memory(const void *address, size_t size); +/** + * kmsan_copy_to_user() - Notify KMSAN about a data transfer to userspace. + * @to: destination address in the userspace. + * @from: source address in the kernel. + * @to_copy: number of bytes to copy. + * @left: number of bytes not copied. + * + * If this is a real userspace data transfer, KMSAN checks the bytes that were + * actually copied to ensure there was no information leak. If @to belongs to + * the kernel space (which is possible for compat syscalls), KMSAN just copies + * the metadata. + */ +void kmsan_copy_to_user(void __user *to, const void *from, size_t to_copy, + size_t left); + #else static inline void kmsan_poison_memory(const void *address, size_t size, @@ -58,6 +73,10 @@ static inline void kmsan_unpoison_memory(const void *address, size_t size) static inline void kmsan_check_memory(const void *address, size_t size) { } +static inline void kmsan_copy_to_user(void __user *to, const void *from, + size_t to_copy, size_t left) +{ +} #endif -- cgit From 7ade4f10779cb46f5c29ced9b7a41f68501cf0ed Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 15 Sep 2022 17:03:55 +0200 Subject: dma: kmsan: unpoison DMA mappings KMSAN doesn't know about DMA memory writes performed by devices. We unpoison such memory when it's mapped to avoid false positive reports. Link: https://lkml.kernel.org/r/20220915150417.722975-22-glider@google.com Signed-off-by: Alexander Potapenko Cc: Alexander Viro Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Rientjes Cc: Dmitry Vyukov Cc: Eric Biggers Cc: Eric Biggers Cc: Eric Dumazet Cc: Greg Kroah-Hartman Cc: Herbert Xu Cc: Ilya Leoshkevich Cc: Ingo Molnar Cc: Jens Axboe Cc: Joonsoo Kim Cc: Kees Cook Cc: Marco Elver Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael S. Tsirkin Cc: Pekka Enberg Cc: Peter Zijlstra Cc: Petr Mladek Cc: Stephen Rothwell Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vegard Nossum Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/kmsan.h | 41 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h index e00de976ee43..dac296da45c5 100644 --- a/include/linux/kmsan.h +++ b/include/linux/kmsan.h @@ -9,6 +9,7 @@ #ifndef _LINUX_KMSAN_H #define _LINUX_KMSAN_H +#include #include #include #include @@ -16,6 +17,7 @@ struct page; struct kmem_cache; struct task_struct; +struct scatterlist; #ifdef CONFIG_KMSAN @@ -172,6 +174,35 @@ void kmsan_ioremap_page_range(unsigned long addr, unsigned long end, */ void kmsan_iounmap_page_range(unsigned long start, unsigned long end); +/** + * kmsan_handle_dma() - Handle a DMA data transfer. + * @page: first page of the buffer. + * @offset: offset of the buffer within the first page. + * @size: buffer size. + * @dir: one of possible dma_data_direction values. + * + * Depending on @direction, KMSAN: + * * checks the buffer, if it is copied to device; + * * initializes the buffer, if it is copied from device; + * * does both, if this is a DMA_BIDIRECTIONAL transfer. + */ +void kmsan_handle_dma(struct page *page, size_t offset, size_t size, + enum dma_data_direction dir); + +/** + * kmsan_handle_dma_sg() - Handle a DMA transfer using scatterlist. + * @sg: scatterlist holding DMA buffers. + * @nents: number of scatterlist entries. + * @dir: one of possible dma_data_direction values. + * + * Depending on @direction, KMSAN: + * * checks the buffers in the scatterlist, if they are copied to device; + * * initializes the buffers, if they are copied from device; + * * does both, if this is a DMA_BIDIRECTIONAL transfer. + */ +void kmsan_handle_dma_sg(struct scatterlist *sg, int nents, + enum dma_data_direction dir); + #else static inline void kmsan_init_shadow(void) @@ -254,6 +285,16 @@ static inline void kmsan_iounmap_page_range(unsigned long start, { } +static inline void kmsan_handle_dma(struct page *page, size_t offset, + size_t size, enum dma_data_direction dir) +{ +} + +static inline void kmsan_handle_dma_sg(struct scatterlist *sg, int nents, + enum dma_data_direction dir) +{ +} + #endif #endif /* _LINUX_KMSAN_H */ -- cgit From 553a80188a5d7164d2b0688b06bf3fe297023bfe Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 15 Sep 2022 17:03:57 +0200 Subject: kmsan: handle memory sent to/from USB Depending on the value of is_out kmsan_handle_urb() KMSAN either marks the data copied to the kernel from a USB device as initialized, or checks the data sent to the device for being initialized. Link: https://lkml.kernel.org/r/20220915150417.722975-24-glider@google.com Signed-off-by: Alexander Potapenko Cc: Alexander Viro Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Rientjes Cc: Dmitry Vyukov Cc: Eric Biggers Cc: Eric Biggers Cc: Eric Dumazet Cc: Greg Kroah-Hartman Cc: Herbert Xu Cc: Ilya Leoshkevich Cc: Ingo Molnar Cc: Jens Axboe Cc: Joonsoo Kim Cc: Kees Cook Cc: Marco Elver Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael S. Tsirkin Cc: Pekka Enberg Cc: Peter Zijlstra Cc: Petr Mladek Cc: Stephen Rothwell Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vegard Nossum Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/kmsan.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h index dac296da45c5..c473e0e21683 100644 --- a/include/linux/kmsan.h +++ b/include/linux/kmsan.h @@ -18,6 +18,7 @@ struct page; struct kmem_cache; struct task_struct; struct scatterlist; +struct urb; #ifdef CONFIG_KMSAN @@ -203,6 +204,16 @@ void kmsan_handle_dma(struct page *page, size_t offset, size_t size, void kmsan_handle_dma_sg(struct scatterlist *sg, int nents, enum dma_data_direction dir); +/** + * kmsan_handle_urb() - Handle a USB data transfer. + * @urb: struct urb pointer. + * @is_out: data transfer direction (true means output to hardware). + * + * If @is_out is true, KMSAN checks the transfer buffer of @urb. Otherwise, + * KMSAN initializes the transfer buffer. + */ +void kmsan_handle_urb(const struct urb *urb, bool is_out); + #else static inline void kmsan_init_shadow(void) @@ -295,6 +306,10 @@ static inline void kmsan_handle_dma_sg(struct scatterlist *sg, int nents, { } +static inline void kmsan_handle_urb(const struct urb *urb, bool is_out) +{ +} + #endif #endif /* _LINUX_KMSAN_H */ -- cgit From ff901d80fff6d65ada6f2a60a1f7d180ee2e0416 Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 15 Sep 2022 17:04:09 +0200 Subject: x86: kmsan: use __msan_ string functions where possible. Unless stated otherwise (by explicitly calling __memcpy(), __memset() or __memmove()) we want all string functions to call their __msan_ versions (e.g. __msan_memcpy() instead of memcpy()), so that shadow and origin values are updated accordingly. Bootloader must still use the default string functions to avoid crashes. Link: https://lkml.kernel.org/r/20220915150417.722975-36-glider@google.com Signed-off-by: Alexander Potapenko Cc: Alexander Viro Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Rientjes Cc: Dmitry Vyukov Cc: Eric Biggers Cc: Eric Biggers Cc: Eric Dumazet Cc: Greg Kroah-Hartman Cc: Herbert Xu Cc: Ilya Leoshkevich Cc: Ingo Molnar Cc: Jens Axboe Cc: Joonsoo Kim Cc: Kees Cook Cc: Marco Elver Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael S. Tsirkin Cc: Pekka Enberg Cc: Peter Zijlstra Cc: Petr Mladek Cc: Stephen Rothwell Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vegard Nossum Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/fortify-string.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index 3b401fa0f374..6c8a1a29d0b6 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -285,8 +285,10 @@ __FORTIFY_INLINE void fortify_memset_chk(__kernel_size_t size, * __builtin_object_size() must be captured here to avoid evaluating argument * side-effects further into the macro layers. */ +#ifndef CONFIG_KMSAN #define memset(p, c, s) __fortify_memset_chk(p, c, s, \ __builtin_object_size(p, 0), __builtin_object_size(p, 1)) +#endif /* * To make sure the compiler can enforce protection against buffer overflows, -- cgit From 6cae637fa26df867449c6bc20ea8bc693abe49b0 Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Thu, 15 Sep 2022 17:04:14 +0200 Subject: entry: kmsan: introduce kmsan_unpoison_entry_regs() struct pt_regs passed into IRQ entry code is set up by uninstrumented asm functions, therefore KMSAN may not notice the registers are initialized. kmsan_unpoison_entry_regs() unpoisons the contents of struct pt_regs, preventing potential false positives. Unlike kmsan_unpoison_memory(), it can be called under kmsan_in_runtime(), which is often the case in IRQ entry code. Link: https://lkml.kernel.org/r/20220915150417.722975-41-glider@google.com Signed-off-by: Alexander Potapenko Cc: Alexander Viro Cc: Alexei Starovoitov Cc: Andrey Konovalov Cc: Andrey Konovalov Cc: Andy Lutomirski Cc: Arnd Bergmann Cc: Borislav Petkov Cc: Christoph Hellwig Cc: Christoph Lameter Cc: David Rientjes Cc: Dmitry Vyukov Cc: Eric Biggers Cc: Eric Biggers Cc: Eric Dumazet Cc: Greg Kroah-Hartman Cc: Herbert Xu Cc: Ilya Leoshkevich Cc: Ingo Molnar Cc: Jens Axboe Cc: Joonsoo Kim Cc: Kees Cook Cc: Marco Elver Cc: Mark Rutland Cc: Matthew Wilcox Cc: Michael S. Tsirkin Cc: Pekka Enberg Cc: Peter Zijlstra Cc: Petr Mladek Cc: Stephen Rothwell Cc: Steven Rostedt Cc: Thomas Gleixner Cc: Vasily Gorbik Cc: Vegard Nossum Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/kmsan.h | 15 +++++++++++++++ 1 file changed, 15 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h index c473e0e21683..e38ae3c34618 100644 --- a/include/linux/kmsan.h +++ b/include/linux/kmsan.h @@ -214,6 +214,17 @@ void kmsan_handle_dma_sg(struct scatterlist *sg, int nents, */ void kmsan_handle_urb(const struct urb *urb, bool is_out); +/** + * kmsan_unpoison_entry_regs() - Handle pt_regs in low-level entry code. + * @regs: struct pt_regs pointer received from assembly code. + * + * KMSAN unpoisons the contents of the passed pt_regs, preventing potential + * false positive reports. Unlike kmsan_unpoison_memory(), + * kmsan_unpoison_entry_regs() can be called from the regions where + * kmsan_in_runtime() returns true, which is the case in early entry code. + */ +void kmsan_unpoison_entry_regs(const struct pt_regs *regs); + #else static inline void kmsan_init_shadow(void) @@ -310,6 +321,10 @@ static inline void kmsan_handle_urb(const struct urb *urb, bool is_out) { } +static inline void kmsan_unpoison_entry_regs(const struct pt_regs *regs) +{ +} + #endif #endif /* _LINUX_KMSAN_H */ -- cgit From 16bc1b0f0269b6110f6d25880b52947d354e2980 Mon Sep 17 00:00:00 2001 From: Kaixu Xia Date: Thu, 15 Sep 2022 19:33:41 +0800 Subject: mm/damon: use 'struct damon_target *' instead of 'void *' in target_valid() We could use 'struct damon_target *' directly instead of 'void *' in target_valid() operation to make code simple. Link: https://lkml.kernel.org/r/1663241621-13293-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Kaixu Xia Reviewed-by: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/damon.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index c5dc0c77c772..1dda8d0068e5 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -346,7 +346,7 @@ struct damon_operations { unsigned long (*apply_scheme)(struct damon_ctx *context, struct damon_target *t, struct damon_region *r, struct damos *scheme); - bool (*target_valid)(void *target); + bool (*target_valid)(struct damon_target *t); void (*cleanup)(struct damon_ctx *context); }; -- cgit From cc713520bdc1b84fc5394f6ac8649b93ad2c5dde Mon Sep 17 00:00:00 2001 From: Kaixu Xia Date: Fri, 16 Sep 2022 23:20:35 +0800 Subject: mm/damon: return void from damon_set_schemes() There is no point in returning an int from damon_set_schemes(). It always returns 0 which is meaningless for the caller, so change it to return void directly. Link: https://lkml.kernel.org/r/1663341635-12675-1-git-send-email-kaixuxia@tencent.com Signed-off-by: Kaixu Xia Reviewed-by: SeongJae Park Reviewed-by: Muchun Song Signed-off-by: Andrew Morton --- include/linux/damon.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index 1dda8d0068e5..e7808a84675f 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -541,7 +541,7 @@ unsigned int damon_nr_regions(struct damon_target *t); struct damon_ctx *damon_new_ctx(void); void damon_destroy_ctx(struct damon_ctx *ctx); int damon_set_attrs(struct damon_ctx *ctx, struct damon_attrs *attrs); -int damon_set_schemes(struct damon_ctx *ctx, +void damon_set_schemes(struct damon_ctx *ctx, struct damos **schemes, ssize_t nr_schemes); int damon_nr_running_ctxs(void); bool damon_is_registered_ops(enum damon_ops_id id); -- cgit From 638a9ae97ab596f1f7b7522dad709e69cb5b4e9d Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Fri, 16 Sep 2022 15:22:44 +0800 Subject: mm: remove obsolete macro NR_PCP_ORDER_MASK and NR_PCP_ORDER_WIDTH Since commit 8b10b465d0e1 ("mm/page_alloc: free pages in a single pass during bulk free"), they're not used anymore. Remove them. Link: https://lkml.kernel.org/r/20220916072257.9639-4-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: David Hildenbrand Reviewed-by: Anshuman Khandual Reviewed-by: Oscar Salvador Cc: Matthew Wilcox Signed-off-by: Andrew Morton --- include/linux/mmzone.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index c69c08156822..3ff1e757d5aa 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -564,13 +564,6 @@ enum zone_watermarks { #define NR_LOWORDER_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1)) #define NR_PCP_LISTS (NR_LOWORDER_PCP_LISTS + NR_PCP_THP) -/* - * Shift to encode migratetype and order in the same integer, with order - * in the least significant bits. - */ -#define NR_PCP_ORDER_WIDTH 8 -#define NR_PCP_ORDER_MASK ((1<_watermark[WMARK_MIN] + z->watermark_boost) #define low_wmark_pages(z) (z->_watermark[WMARK_LOW] + z->watermark_boost) #define high_wmark_pages(z) (z->_watermark[WMARK_HIGH] + z->watermark_boost) -- cgit From 5749fcc5f04cef4091dea0c2ba6b5c5f5e05a549 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Fri, 16 Sep 2022 15:22:46 +0800 Subject: mm/page_alloc: add __init annotations to init_mem_debugging_and_hardening() It's only called by mm_init(). Add __init annotations to it. Link: https://lkml.kernel.org/r/20220916072257.9639-6-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: David Hildenbrand Reviewed-by: Anshuman Khandual Reviewed-by: Oscar Salvador Cc: Matthew Wilcox Signed-off-by: Andrew Morton --- include/linux/mm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index a37c8a29c49b..8bbcccbc5565 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -3092,7 +3092,7 @@ extern int apply_to_existing_page_range(struct mm_struct *mm, unsigned long address, unsigned long size, pte_fn_t fn, void *data); -extern void init_mem_debugging_and_hardening(void); +extern void __init init_mem_debugging_and_hardening(void); #ifdef CONFIG_PAGE_POISONING extern void __kernel_poison_pages(struct page *page, int numpages); extern void __kernel_unpoison_pages(struct page *page, int numpages); -- cgit From 30e3b5d7c82f78c63c53197b5d8b99636bb60d56 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Fri, 16 Sep 2022 15:22:48 +0800 Subject: mm: remove obsolete pgdat_is_empty() There's no caller. Remove it. Link: https://lkml.kernel.org/r/20220916072257.9639-8-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: David Hildenbrand Reviewed-by: Anshuman Khandual Reviewed-by: Oscar Salvador Cc: Matthew Wilcox Signed-off-by: Andrew Morton --- include/linux/mmzone.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 3ff1e757d5aa..4c8510f26b02 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1241,11 +1241,6 @@ static inline unsigned long pgdat_end_pfn(pg_data_t *pgdat) return pgdat->node_start_pfn + pgdat->node_spanned_pages; } -static inline bool pgdat_is_empty(pg_data_t *pgdat) -{ - return !pgdat->node_start_pfn && !pgdat->node_spanned_pages; -} - #include void build_all_zonelists(pg_data_t *pgdat); -- cgit From f774a6a6fd39e1b5677bdf71f6813b382faddeeb Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Fri, 16 Sep 2022 15:22:51 +0800 Subject: mm, memory_hotplug: remove obsolete generic_free_nodedata() Commit 390511e1476e ("mm, memory_hotplug: drop arch_free_nodedata") drops the last caller of generic_free_nodedata(). Remove it too. Link: https://lkml.kernel.org/r/20220916072257.9639-11-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: David Hildenbrand Reviewed-by: Anshuman Khandual Reviewed-by: Oscar Salvador Cc: Matthew Wilcox Signed-off-by: Andrew Morton --- include/linux/memory_hotplug.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 51052969dbfe..9fcbf5706595 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -43,11 +43,6 @@ extern void arch_refresh_nodedata(int nid, pg_data_t *pgdat); ({ \ memblock_alloc(sizeof(*pgdat), SMP_CACHE_BYTES); \ }) -/* - * This definition is just for error path in node hotadd. - * For node hotremove, we have to replace this. - */ -#define generic_free_nodedata(pgdat) kfree(pgdat) extern pg_data_t *node_data[]; static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat) @@ -63,9 +58,6 @@ static inline pg_data_t *generic_alloc_nodedata(int nid) BUG(); return NULL; } -static inline void generic_free_nodedata(pg_data_t *pgdat) -{ -} static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat) { } -- cgit From def76fd549c513bb90278a8d6d0fe3ef3faa20a7 Mon Sep 17 00:00:00 2001 From: Miaohe Lin Date: Fri, 16 Sep 2022 15:22:56 +0800 Subject: mm/page_alloc: remove obsolete gfpflags_normal_context() Since commit dacb5d8875cc ("tcp: fix page frag corruption on page fault"), there's no caller of gfpflags_normal_context(). Remove it as this helper is strictly tied to the sk page frag usage and there won't be other user in the future. [linmiaohe@huawei.com: fix htmldocs] Link: https://lkml.kernel.org/r/1bc55727-9b66-0e9e-c306-f10c4716ea89@huawei.com Link: https://lkml.kernel.org/r/20220916072257.9639-16-linmiaohe@huawei.com Signed-off-by: Miaohe Lin Reviewed-by: David Hildenbrand Reviewed-by: Anshuman Khandual Reviewed-by: Oscar Salvador Cc: Matthew Wilcox Signed-off-by: Andrew Morton --- include/linux/gfp.h | 23 ----------------------- 1 file changed, 23 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gfp.h b/include/linux/gfp.h index ea6cb9399152..ef4aea3b356e 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -36,29 +36,6 @@ static inline bool gfpflags_allow_blocking(const gfp_t gfp_flags) return !!(gfp_flags & __GFP_DIRECT_RECLAIM); } -/** - * gfpflags_normal_context - is gfp_flags a normal sleepable context? - * @gfp_flags: gfp_flags to test - * - * Test whether @gfp_flags indicates that the allocation is from the - * %current context and allowed to sleep. - * - * An allocation being allowed to block doesn't mean it owns the %current - * context. When direct reclaim path tries to allocate memory, the - * allocation context is nested inside whatever %current was doing at the - * time of the original allocation. The nested allocation may be allowed - * to block but modifying anything %current owns can corrupt the outer - * context's expectations. - * - * %true result from this function indicates that the allocation context - * can sleep and use anything that's associated with %current. - */ -static inline bool gfpflags_normal_context(const gfp_t gfp_flags) -{ - return (gfp_flags & (__GFP_DIRECT_RECLAIM | __GFP_MEMALLOC)) == - __GFP_DIRECT_RECLAIM; -} - #ifdef CONFIG_HIGHMEM #define OPT_ZONE_HIGHMEM ZONE_HIGHMEM #else -- cgit From 233f0b31bd9503ce2be7be0bde69c67287c8a741 Mon Sep 17 00:00:00 2001 From: Kaixu Xia Date: Tue, 20 Sep 2022 16:53:22 +0000 Subject: mm/damon: deduplicate damon_{reclaim,lru_sort}_apply_parameters() The bodies of damon_{reclaim,lru_sort}_apply_parameters() contain duplicates. This commit adds a common function damon_set_region_biggest_system_ram_default() to remove the duplicates. Link: https://lkml.kernel.org/r/6329f00d.a70a0220.9bb29.3678SMTPIN_ADDED_BROKEN@mx.google.com Signed-off-by: Kaixu Xia Suggested-by: SeongJae Park Reviewed-by: SeongJae Park Signed-off-by: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/damon.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index e7808a84675f..ed5470f50bab 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -557,7 +557,8 @@ static inline bool damon_target_has_pid(const struct damon_ctx *ctx) int damon_start(struct damon_ctx **ctxs, int nr_ctxs, bool exclusive); int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); -bool damon_find_biggest_system_ram(unsigned long *start, unsigned long *end); +int damon_set_region_biggest_system_ram_default(struct damon_target *t, + unsigned long *start, unsigned long *end); #endif /* CONFIG_DAMON */ -- cgit From 2eb989195d9a361d13d66ffb8738847649e080ad Mon Sep 17 00:00:00 2001 From: Kairui Song Date: Tue, 20 Sep 2022 02:06:33 +0800 Subject: mm: memcontrol: use memcg_kmem_enabled in count_objcg_event Patch series "mm: memcontrol: cleanup and optimize for two accounting params", v2. This patch (of 2): There are currently two helpers for checking if cgroup kmem accounting is enabled: - mem_cgroup_kmem_disabled - memcg_kmem_enabled mem_cgroup_kmem_disabled is a simple helper that returns true if cgroup.memory=nokmem is specified, otherwise returns false. memcg_kmem_enabled is a bit different, it returns true if cgroup.memory=nokmem is not specified and there was at least one non-root memory control enabled cgroup ever created. This help improve performance when kmem accounting was not actually activated. And it's optimized with static branch. The usage of mem_cgroup_kmem_disabled is for sub-systems that need to preallocate data for kmem accounting since they could be initialized before kmem accounting is activated. But count_objcg_event doesn't need that, so using memcg_kmem_enabled is better here. Link: https://lkml.kernel.org/r/20220919180634.45958-1-ryncsn@gmail.com Link: https://lkml.kernel.org/r/20220919180634.45958-2-ryncsn@gmail.com Signed-off-by: Kairui Song Acked-by: Shakeel Butt Acked-by: Roman Gushchin Acked-by: Muchun Song Cc: Johannes Weiner Cc: Michal Hocko Signed-off-by: Andrew Morton --- include/linux/memcontrol.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index dc7d40e575d5..ef479e554253 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -1778,7 +1778,7 @@ static inline void count_objcg_event(struct obj_cgroup *objcg, { struct mem_cgroup *memcg; - if (mem_cgroup_kmem_disabled()) + if (!memcg_kmem_enabled()) return; rcu_read_lock(); -- cgit From 7c6c6cc4d3a213e7303ef06ff40f6193df01839c Mon Sep 17 00:00:00 2001 From: Zach O'Keefe Date: Thu, 22 Sep 2022 15:40:37 -0700 Subject: mm/shmem: add flag to enforce shmem THP in hugepage_vma_check() Patch series "mm: add file/shmem support to MADV_COLLAPSE", v4. This series builds on top of the previous "mm: userspace hugepage collapse" series which introduced the MADV_COLLAPSE madvise mode and added support for private, anonymous mappings[2], by adding support for file and shmem backed memory to CONFIG_READ_ONLY_THP_FOR_FS=y kernels. File and shmem support have been added with effort to align with existing MADV_COLLAPSE semantics and policy decisions[3]. Collapse of shmem-backed memory ignores kernel-guiding directives and heuristics including all sysfs settings (transparent_hugepage/shmem_enabled), and tmpfs huge= mount options (shmem always supports large folios). Like anonymous mappings, on successful return of MADV_COLLAPSE on file/shmem memory, the contents of memory mapped by the addresses provided will be synchronously pmd-mapped THPs. This functionality unlocks two important uses: (1) Immediately back executable text by THPs. Current support provided by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large system which might impair services from serving at their full rated load after (re)starting. Tricks like mremap(2)'ing text onto anonymous memory to immediately realize iTLB performance prevents page sharing and demand paging, both of which increase steady state memory footprint. Now, we can have the best of both worlds: Peak upfront performance and lower RAM footprints. (2) userfaultfd-based live migration of virtual machines satisfy UFFD faults by fetching native-sized pages over the network (to avoid latency of transferring an entire hugepage). However, after guest memory has been fully copied to the new host, MADV_COLLAPSE can be used to immediately increase guest performance. khugepaged has received a small improvement by association and can now detect and collapse pte-mapped THPs. However, there is still work to be done along the file collapse path. Compound pages of arbitrary order still needs to be supported and THP collapse needs to be converted to using folios in general. Eventually, we'd like to move away from the read-only and executable-mapped constraints currently imposed on eligible files and support any inode claiming huge folio support. That said, I think the series as-is covers enough to claim that MADV_COLLAPSE supports file/shmem memory. Patches 1-3 Implement the guts of the series. Patch 4 Is a tracepoint for debugging. Patches 5-9 Refactor existing khugepaged selftests to work with new memory types + new collapse tests. Patch 10 Adds a userfaultfd selftest mode to mimic a functional test of UFFDIO_REGISTER_MODE_MINOR+MADV_COLLAPSE live migration. (v4 note: "userfaultfd shmem" selftest is failing as of Sep 22 mm-unstable) [1] https://lore.kernel.org/linux-mm/YyiK8YvVcrtZo0z3@google.com/ [2] https://lore.kernel.org/linux-mm/20220706235936.2197195-1-zokeefe@google.com/ [3] https://lore.kernel.org/linux-mm/YtBmhaiPHUTkJml8@google.com/ [4] https://lore.kernel.org/linux-mm/20220922222731.1124481-1-zokeefe@google.com/ [5] https://lore.kernel.org/linux-mm/20220922184651.1016461-1-zokeefe@google.com/ This patch (of 10): Extend 'mm/thp: add flag to enforce sysfs THP in hugepage_vma_check()' to shmem, allowing callers to ignore /sys/kernel/transparent_hugepage/shmem_enabled and tmpfs huge= mount. This is intended to be used by MADV_COLLAPSE, and the rationale is analogous to the anon/file case: MADV_COLLAPSE is not coupled to directives that advise the kernel's decisions on when THPs should be considered eligible. shmem/tmpfs always claims large folio support, regardless of sysfs or mount options. [shy828301@gmail.com: test shmem_huge_force explicitly] Link: https://lore.kernel.org/linux-mm/CAHbLzko3A5-TpS0BgBeKkx5cuOkWgLvWXQH=TdgW-baO4rPtdg@mail.gmail.com/ Link: https://lkml.kernel.org/r/20220922224046.1143204-1-zokeefe@google.com Link: https://lkml.kernel.org/r/20220907144521.3115321-2-zokeefe@google.com Link: https://lkml.kernel.org/r/20220922224046.1143204-2-zokeefe@google.com Signed-off-by: Zach O'Keefe Reviewed-by: Yang Shi Cc: Axel Rasmussen Cc: Chris Kennelly Cc: David Hildenbrand Cc: David Rientjes Cc: Hugh Dickins Cc: James Houghton Cc: "Kirill A. Shutemov" Cc: Matthew Wilcox Cc: Miaohe Lin Cc: Minchan Kim Cc: Pasha Tatashin Cc: Peter Xu Cc: Rongwei Wang Cc: SeongJae Park Cc: Song Liu Cc: Vlastimil Babka Signed-off-by: Andrew Morton --- include/linux/shmem_fs.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index f24071e3c826..d500ea967dc7 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -92,11 +92,13 @@ extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping, extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end); int shmem_unuse(unsigned int type); -extern bool shmem_is_huge(struct vm_area_struct *vma, - struct inode *inode, pgoff_t index); -static inline bool shmem_huge_enabled(struct vm_area_struct *vma) +extern bool shmem_is_huge(struct vm_area_struct *vma, struct inode *inode, + pgoff_t index, bool shmem_huge_force); +static inline bool shmem_huge_enabled(struct vm_area_struct *vma, + bool shmem_huge_force) { - return shmem_is_huge(vma, file_inode(vma->vm_file), vma->vm_pgoff); + return shmem_is_huge(vma, file_inode(vma->vm_file), vma->vm_pgoff, + shmem_huge_force); } extern unsigned long shmem_swap_usage(struct vm_area_struct *vma); extern unsigned long shmem_partial_swap_usage(struct address_space *mapping, -- cgit From 34488399fa08faaf664743fa54b271eb6f9e1321 Mon Sep 17 00:00:00 2001 From: Zach O'Keefe Date: Thu, 22 Sep 2022 15:40:39 -0700 Subject: mm/madvise: add file and shmem support to MADV_COLLAPSE Add support for MADV_COLLAPSE to collapse shmem-backed and file-backed memory into THPs (requires CONFIG_READ_ONLY_THP_FOR_FS=y). On success, the backing memory will be a hugepage. For the memory range and process provided, the page tables will synchronously have a huge pmd installed, mapping the THP. Other mappings of the file extent mapped by the memory range may be added to a set of entries that khugepaged will later process and attempt update their page tables to map the THP by a pmd. This functionality unlocks two important uses: (1) Immediately back executable text by THPs. Current support provided by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large system which might impair services from serving at their full rated load after (re)starting. Tricks like mremap(2)'ing text onto anonymous memory to immediately realize iTLB performance prevents page sharing and demand paging, both of which increase steady state memory footprint. Now, we can have the best of both worlds: Peak upfront performance and lower RAM footprints. (2) userfaultfd-based live migration of virtual machines satisfy UFFD faults by fetching native-sized pages over the network (to avoid latency of transferring an entire hugepage). However, after guest memory has been fully copied to the new host, MADV_COLLAPSE can be used to immediately increase guest performance. Since khugepaged is single threaded, this change now introduces possibility of collapse contexts racing in file collapse path. There a important few places to consider: (1) hpage_collapse_scan_file(), when we xas_pause() and drop RCU. We could have the memory collapsed out from under us, but the next xas_for_each() iteration will correctly pick up the hugepage. The hugepage might not be up to date (insofar as copying of small page contents might not have completed - the page still may be locked), but regardless what small page index we were iterating over, we'll find the hugepage and identify it as a suitably aligned compound page of order HPAGE_PMD_ORDER. In khugepaged path, we locklessly check the value of the pmd, and only add it to deferred collapse array if we find pmd mapping pte table. This is fine, since other values that could have raced in right afterwards denote failure, or that the memory was successfully collapsed, so we don't need further processing. In madvise path, we'll take mmap_lock() in write to serialize against page table updates and will know what to do based on the true value of the pmd: recheck all ptes if we point to a pte table, directly install the pmd, if the pmd has been cleared, but memory not yet faulted, or nothing at all if we find a huge pmd. It's worth putting emphasis here on how we treat the none pmd here. If khugepaged has processed this mm's page tables already, it will have left the pmd cleared (ready for refault by the process). Depending on the VMA flags and sysfs settings, amount of RAM on the machine, and the current load, could be a relatively common occurrence - and as such is one we'd like to handle successfully in MADV_COLLAPSE. When we see the none pmd in collapse_pte_mapped_thp(), we've locked mmap_lock in write and checked (a) huepaged_vma_check() to see if the backing memory is appropriate still, along with VMA sizing and appropriate hugepage alignment within the file, and (b) we've found a hugepage head of order HPAGE_PMD_ORDER at the offset in the file mapped by our hugepage-aligned virtual address. Even though the common-case is likely race with khugepaged, given these checks (regardless how we got here - we could be operating on a completely different file than originally checked in hpage_collapse_scan_file() for all we know) it should be safe to directly make the pmd a huge pmd pointing to this hugepage. (2) collapse_file() is mostly serialized on the same file extent by lock sequence: | lock hupepage | lock mapping->i_pages | lock 1st page | unlock mapping->i_pages | | lock mapping->i_pages | page_ref_freeze(3) | xas_store(hugepage) | unlock mapping->i_pages | page_ref_unfreeze(1) | unlock 1st page V unlock hugepage Once a context (who already has their fresh hugepage locked) locks mapping->i_pages exclusively, it will hold said lock until it locks the first page, and it will hold that lock until the after the hugepage has been added to the page cache (and will unlock the hugepage after page table update, though that isn't important here). A racing context that loses the race for mapping->i_pages will then lose the race to locking the first page. Here - depending on how far the other racing context has gotten - we might find the new hugepage (in which case we'll exit cleanly when we check PageTransCompound()), or we'll find the "old" 1st small page (in which we'll exit cleanly when we discover unexpected refcount of 2 after isolate_lru_page()). This is assuming we are able to successfully lock the page we find - in shmem path, we could just fail the trylock and exit cleanly anyways. Failure path in collapse_file() is similar: once we hold lock on 1st small page, we are serialized against other collapse contexts. Before the 1st small page is unlocked, we add it back to the pagecache and unfreeze the refcount appropriately. Contexts who lost the race to the 1st small page will then find the same 1st small page with the correct refcount and will be able to proceed. [zokeefe@google.com: don't check pmd value twice in collapse_pte_mapped_thp()] Link: https://lkml.kernel.org/r/20220927033854.477018-1-zokeefe@google.com [shy828301@gmail.com: Delete hugepage_vma_revalidate_anon(), remove check for multi-add in khugepaged_add_pte_mapped_thp()] Link: https://lore.kernel.org/linux-mm/CAHbLzkrtpM=ic7cYAHcqkubah5VTR8N5=k5RT8MTvv5rN1Y91w@mail.gmail.com/ Link: https://lkml.kernel.org/r/20220907144521.3115321-4-zokeefe@google.com Link: https://lkml.kernel.org/r/20220922224046.1143204-4-zokeefe@google.com Signed-off-by: Zach O'Keefe Cc: Axel Rasmussen Cc: Chris Kennelly Cc: David Hildenbrand Cc: David Rientjes Cc: Hugh Dickins Cc: James Houghton Cc: "Kirill A. Shutemov" Cc: Matthew Wilcox Cc: Miaohe Lin Cc: Minchan Kim Cc: Pasha Tatashin Cc: Peter Xu Cc: Rongwei Wang Cc: SeongJae Park Cc: Song Liu Cc: Vlastimil Babka Cc: Yang Shi Signed-off-by: Andrew Morton --- include/linux/khugepaged.h | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h index 384f034ae947..70162d707caf 100644 --- a/include/linux/khugepaged.h +++ b/include/linux/khugepaged.h @@ -16,11 +16,13 @@ extern void khugepaged_enter_vma(struct vm_area_struct *vma, unsigned long vm_flags); extern void khugepaged_min_free_kbytes_update(void); #ifdef CONFIG_SHMEM -extern void collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr); +extern int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr, + bool install_pmd); #else -static inline void collapse_pte_mapped_thp(struct mm_struct *mm, - unsigned long addr) +static inline int collapse_pte_mapped_thp(struct mm_struct *mm, + unsigned long addr, bool install_pmd) { + return 0; } #endif @@ -46,9 +48,10 @@ static inline void khugepaged_enter_vma(struct vm_area_struct *vma, unsigned long vm_flags) { } -static inline void collapse_pte_mapped_thp(struct mm_struct *mm, - unsigned long addr) +static inline int collapse_pte_mapped_thp(struct mm_struct *mm, + unsigned long addr, bool install_pmd) { + return 0; } static inline void khugepaged_min_free_kbytes_update(void) -- cgit From 6b91e5dfb3c7ef485587e7ab494dcb47bcdadce3 Mon Sep 17 00:00:00 2001 From: Gaosheng Cui Date: Thu, 22 Sep 2022 19:09:35 +0800 Subject: mm: remove unused inline functions from include/linux/mm_inline.h Remove the following unused inline functions from mm_inline.h: 1. All uses of add_page_to_lru_list_tail() have been removed since commit 7a3dbfe8a52b ("mm/swap: convert lru_deactivate_file to a folio_batch"), and it can be replaced by lruvec_add_folio_tail(). 2. All uses of __clear_page_lru_flags() have been removed since commit 188e8caee968 ("mm/swap: convert __page_cache_release() to use a folio"), and it can be replaced by __folio_clear_lru_flags(). They are useless, so remove them. Link: https://lkml.kernel.org/r/20220922110935.1495099-1-cuigaosheng1@huawei.com Signed-off-by: Gaosheng Cui Signed-off-by: Andrew Morton --- include/linux/mm_inline.h | 11 ----------- 1 file changed, 11 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h index 4949eda9a9a2..e8ed225d8f7c 100644 --- a/include/linux/mm_inline.h +++ b/include/linux/mm_inline.h @@ -76,11 +76,6 @@ static __always_inline void __folio_clear_lru_flags(struct folio *folio) __folio_clear_unevictable(folio); } -static __always_inline void __clear_page_lru_flags(struct page *page) -{ - __folio_clear_lru_flags(page_folio(page)); -} - /** * folio_lru_list - Which LRU list should a folio be on? * @folio: The folio to test. @@ -348,12 +343,6 @@ void lruvec_add_folio_tail(struct lruvec *lruvec, struct folio *folio) list_add_tail(&folio->lru, &lruvec->lists[lru]); } -static __always_inline void add_page_to_lru_list_tail(struct page *page, - struct lruvec *lruvec) -{ - lruvec_add_folio_tail(lruvec, page_folio(page)); -} - static __always_inline void lruvec_del_folio(struct lruvec *lruvec, struct folio *folio) { -- cgit From e55b9f96860f6c6026cff97966a740576285e07b Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Mon, 26 Sep 2022 09:57:04 -0400 Subject: mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol Since 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control"), CONFIG_MEMCG_SWAP hasn't been a user-visible config option anymore, it just means CONFIG_MEMCG && CONFIG_SWAP. Update the sites accordingly and drop the symbol. [ While touching the docs, remove two references to CONFIG_MEMCG_KMEM, which hasn't been a user-visible symbol for over half a decade. ] Link: https://lkml.kernel.org/r/20220926135704.400818-5-hannes@cmpxchg.org Signed-off-by: Johannes Weiner Acked-by: Shakeel Butt Cc: Hugh Dickins Cc: Michal Hocko Cc: Roman Gushchin Signed-off-by: Andrew Morton --- include/linux/swap.h | 2 +- include/linux/swap_cgroup.h | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swap.h b/include/linux/swap.h index fc8d98660326..a18cf4b7c724 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -666,7 +666,7 @@ static inline void folio_throttle_swaprate(struct folio *folio, gfp_t gfp) cgroup_throttle_swaprate(&folio->page, gfp); } -#ifdef CONFIG_MEMCG_SWAP +#if defined(CONFIG_MEMCG) && defined(CONFIG_SWAP) void mem_cgroup_swapout(struct folio *folio, swp_entry_t entry); int __mem_cgroup_try_charge_swap(struct folio *folio, swp_entry_t entry); static inline int mem_cgroup_try_charge_swap(struct folio *folio, diff --git a/include/linux/swap_cgroup.h b/include/linux/swap_cgroup.h index a12dd1c3966c..ae73a87775b3 100644 --- a/include/linux/swap_cgroup.h +++ b/include/linux/swap_cgroup.h @@ -4,7 +4,7 @@ #include -#ifdef CONFIG_MEMCG_SWAP +#if defined(CONFIG_MEMCG) && defined(CONFIG_SWAP) extern unsigned short swap_cgroup_cmpxchg(swp_entry_t ent, unsigned short old, unsigned short new); @@ -40,6 +40,6 @@ static inline void swap_cgroup_swapoff(int type) return; } -#endif /* CONFIG_MEMCG_SWAP */ +#endif #endif /* __LINUX_SWAP_CGROUP_H */ -- cgit From 8f824b4abd31c5ea32ae1d6725c47bdb247d18da Mon Sep 17 00:00:00 2001 From: Jiangshan Yi Date: Mon, 5 Sep 2022 10:10:34 +0800 Subject: init.h: fix spelling typo in comment Fix spelling typo in comment. Link: https://lkml.kernel.org/r/20220905021034.947701-1-13667453960@163.com Signed-off-by: Jiangshan Yi Reported-by: k2ci Signed-off-by: Andrew Morton --- include/linux/init.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/init.h b/include/linux/init.h index baf0b29a7010..3254766ebbf2 100644 --- a/include/linux/init.h +++ b/include/linux/init.h @@ -134,7 +134,7 @@ static inline initcall_t initcall_from_entry(initcall_entry_t *entry) extern initcall_entry_t __con_initcall_start[], __con_initcall_end[]; -/* Used for contructor calls. */ +/* Used for constructor calls. */ typedef void (*ctor_fn_t)(void); struct file_system_type; -- cgit From 5ca14835dc429c09fefb290f60343fe266382760 Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Fri, 9 Sep 2022 13:57:41 -0700 Subject: fs: uninline inode_maybe_inc_iversion() It has many callsites and is large. text data bss dec hex filename 91796 15984 512 108292 1a704 mm/shmem.o-before 91180 15984 512 107676 1a49c mm/shmem.o-after Acked-by: Jeff Layton Cc: Chuck Lever Cc: Alexander Viro Cc: Hugh Dickins Signed-off-by: Andrew Morton --- include/linux/iversion.h | 46 +--------------------------------------------- 1 file changed, 1 insertion(+), 45 deletions(-) (limited to 'include/linux') diff --git a/include/linux/iversion.h b/include/linux/iversion.h index eb5a15810169..e27bd4f55d84 100644 --- a/include/linux/iversion.h +++ b/include/linux/iversion.h @@ -172,51 +172,7 @@ inode_set_iversion_queried(struct inode *inode, u64 val) I_VERSION_QUERIED); } -/** - * inode_maybe_inc_iversion - increments i_version - * @inode: inode with the i_version that should be updated - * @force: increment the counter even if it's not necessary? - * - * Every time the inode is modified, the i_version field must be seen to have - * changed by any observer. - * - * If "force" is set or the QUERIED flag is set, then ensure that we increment - * the value, and clear the queried flag. - * - * In the common case where neither is set, then we can return "false" without - * updating i_version. - * - * If this function returns false, and no other metadata has changed, then we - * can avoid logging the metadata. - */ -static inline bool -inode_maybe_inc_iversion(struct inode *inode, bool force) -{ - u64 cur, new; - - /* - * The i_version field is not strictly ordered with any other inode - * information, but the legacy inode_inc_iversion code used a spinlock - * to serialize increments. - * - * Here, we add full memory barriers to ensure that any de-facto - * ordering with other info is preserved. - * - * This barrier pairs with the barrier in inode_query_iversion() - */ - smp_mb(); - cur = inode_peek_iversion_raw(inode); - do { - /* If flag is clear then we needn't do anything */ - if (!force && !(cur & I_VERSION_QUERIED)) - return false; - - /* Since lowest bit is flag, add 2 to avoid it */ - new = (cur & ~I_VERSION_QUERIED) + I_VERSION_INCREMENT; - } while (!atomic64_try_cmpxchg(&inode->i_version, &cur, new)); - return true; -} - +bool inode_maybe_inc_iversion(struct inode *inode, bool force); /** * inode_inc_iversion - forcibly increment i_version -- cgit From 5d0ce3595ab75330a15cec914096efbbb8b41e4a Mon Sep 17 00:00:00 2001 From: Jiebin Sun Date: Wed, 14 Sep 2022 03:25:37 +0800 Subject: percpu: add percpu_counter_add_local and percpu_counter_sub_local Patch series "/msg: mitigate the lock contention in ipc/msg", v6. Here are two patches to mitigate the lock contention in ipc/msg. The 1st patch is to add the new interface percpu_counter_add_local and percpu_counter_sub_local. The batch size in percpu_counter_add_batch should be very large in heavy writing and rare reading case. Add the "_local" version, and mostly it will do local adding, reduce the global updating and mitigate lock contention in writing. The 2nd patch is to use percpu_counter instead of atomic update in ipc/msg. The msg_bytes and msg_hdrs atomic counters are frequently updated when IPC msg queue is in heavy use, causing heavy cache bounce and overhead. Change them to percpu_counter greatly improve the performance. Since there is one percpu struct per namespace, additional memory cost is minimal. Reading of the count done in msgctl call, which is infrequent. So the need to sum up the counts in each CPU is infrequent. This patch (of 2): The batch size in percpu_counter_add_batch should be very large in heavy writing and rare reading case. Add the "_local" version, and mostly it will do local adding, reduce the global updating and mitigate lock contention in writing. Link: https://lkml.kernel.org/r/20220913192538.3023708-1-jiebin.sun@intel.com Link: https://lkml.kernel.org/r/20220913192538.3023708-2-jiebin.sun@intel.com Signed-off-by: Jiebin Sun Reviewed-by: Tim Chen Cc: Alexander Mikhalitsyn Cc: Alexey Gladkov Cc: Christoph Lameter Cc: Dennis Zhou Cc: "Eric W . Biederman" Cc: Manfred Spraul Cc: Shakeel Butt Cc: Tejun Heo Cc: Vasily Averin Cc: Davidlohr Bueso Signed-off-by: Andrew Morton --- include/linux/percpu_counter.h | 32 ++++++++++++++++++++++++++++++++ 1 file changed, 32 insertions(+) (limited to 'include/linux') diff --git a/include/linux/percpu_counter.h b/include/linux/percpu_counter.h index 01861eebed79..8ed5fba6d156 100644 --- a/include/linux/percpu_counter.h +++ b/include/linux/percpu_counter.h @@ -15,6 +15,9 @@ #include #include +/* percpu_counter batch for local add or sub */ +#define PERCPU_COUNTER_LOCAL_BATCH INT_MAX + #ifdef CONFIG_SMP struct percpu_counter { @@ -56,6 +59,22 @@ static inline void percpu_counter_add(struct percpu_counter *fbc, s64 amount) percpu_counter_add_batch(fbc, amount, percpu_counter_batch); } +/* + * With percpu_counter_add_local() and percpu_counter_sub_local(), counts + * are accumulated in local per cpu counter and not in fbc->count until + * local count overflows PERCPU_COUNTER_LOCAL_BATCH. This makes counter + * write efficient. + * But percpu_counter_sum(), instead of percpu_counter_read(), needs to be + * used to add up the counts from each CPU to account for all the local + * counts. So percpu_counter_add_local() and percpu_counter_sub_local() + * should be used when a counter is updated frequently and read rarely. + */ +static inline void +percpu_counter_add_local(struct percpu_counter *fbc, s64 amount) +{ + percpu_counter_add_batch(fbc, amount, PERCPU_COUNTER_LOCAL_BATCH); +} + static inline s64 percpu_counter_sum_positive(struct percpu_counter *fbc) { s64 ret = __percpu_counter_sum(fbc); @@ -138,6 +157,13 @@ percpu_counter_add(struct percpu_counter *fbc, s64 amount) preempt_enable(); } +/* non-SMP percpu_counter_add_local is the same with percpu_counter_add */ +static inline void +percpu_counter_add_local(struct percpu_counter *fbc, s64 amount) +{ + percpu_counter_add(fbc, amount); +} + static inline void percpu_counter_add_batch(struct percpu_counter *fbc, s64 amount, s32 batch) { @@ -193,4 +219,10 @@ static inline void percpu_counter_sub(struct percpu_counter *fbc, s64 amount) percpu_counter_add(fbc, -amount); } +static inline void +percpu_counter_sub_local(struct percpu_counter *fbc, s64 amount) +{ + percpu_counter_add_local(fbc, -amount); +} + #endif /* _LINUX_PERCPU_COUNTER_H */ -- cgit From 72d1e611082eda18689106a0c192f2827072713c Mon Sep 17 00:00:00 2001 From: Jiebin Sun Date: Wed, 14 Sep 2022 03:25:38 +0800 Subject: ipc/msg: mitigate the lock contention with percpu counter The msg_bytes and msg_hdrs atomic counters are frequently updated when IPC msg queue is in heavy use, causing heavy cache bounce and overhead. Change them to percpu_counter greatly improve the performance. Since there is one percpu struct per namespace, additional memory cost is minimal. Reading of the count done in msgctl call, which is infrequent. So the need to sum up the counts in each CPU is infrequent. Apply the patch and test the pts/stress-ng-1.4.0 -- system v message passing (160 threads). Score gain: 3.99x CPU: ICX 8380 x 2 sockets Core number: 40 x 2 physical cores Benchmark: pts/stress-ng-1.4.0 -- system v message passing (160 threads) [akpm@linux-foundation.org: coding-style cleanups] [jiebin.sun@intel.com: avoid negative value by overflow in msginfo] Link: https://lkml.kernel.org/r/20220920150809.4014944-1-jiebin.sun@intel.com [akpm@linux-foundation.org: fix min() warnings] Link: https://lkml.kernel.org/r/20220913192538.3023708-3-jiebin.sun@intel.com Signed-off-by: Jiebin Sun Reviewed-by: Tim Chen Cc: Alexander Mikhalitsyn Cc: Alexey Gladkov Cc: Christoph Lameter Cc: Davidlohr Bueso Cc: Dennis Zhou Cc: "Eric W . Biederman" Cc: Manfred Spraul Cc: Shakeel Butt Cc: Tejun Heo Cc: Vasily Averin Signed-off-by: Andrew Morton --- include/linux/ipc_namespace.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ipc_namespace.h b/include/linux/ipc_namespace.h index e3e8c8662b49..e8240cf2611a 100644 --- a/include/linux/ipc_namespace.h +++ b/include/linux/ipc_namespace.h @@ -11,6 +11,7 @@ #include #include #include +#include struct user_namespace; @@ -36,8 +37,8 @@ struct ipc_namespace { unsigned int msg_ctlmax; unsigned int msg_ctlmnb; unsigned int msg_ctlmni; - atomic_t msg_bytes; - atomic_t msg_hdrs; + struct percpu_counter percpu_msg_bytes; + struct percpu_counter percpu_msg_hdrs; size_t shm_ctlmax; size_t shm_ctlall; -- cgit From 168723c1f8d6e1e823d4c6ad3cf64478cf58330a Mon Sep 17 00:00:00 2001 From: Maxim Mikityanskiy Date: Sat, 1 Oct 2022 21:56:22 -0700 Subject: net/mlx5e: xsk: Use umr_mode to calculate striding RQ parameters Instead of passing the unaligned flag, pass an enum that indicates the UMR mode. The next commit will add the third mode (KLM for certain configurations of XSK), which will be added to this enum instead of adding another bool flag everywhere. Signed-off-by: Maxim Mikityanskiy Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed Signed-off-by: Jakub Kicinski --- include/linux/mlx5/driver.h | 4 ++++ 1 file changed, 4 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index f8ecb33105d3..285f301a6390 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1288,4 +1288,8 @@ static inline bool mlx5_get_roce_state(struct mlx5_core_dev *dev) return mlx5_is_roce_on(dev); } +enum { + MLX5_OCTWORD = 16, +}; + #endif /* MLX5_DRIVER_H */ -- cgit From 16ab85e78439bab1201ff26ba430231d1574b4ae Mon Sep 17 00:00:00 2001 From: Gal Pressman Date: Sat, 1 Oct 2022 21:56:27 -0700 Subject: net/mlx5e: Expose rx_oversize_pkts_buffer counter Add the rx_oversize_pkts_buffer counter to ethtool statistics. This counter exposes the number of dropped received packets due to length which arrived to RQ and exceed software buffer size allocated by the device for incoming traffic. It might imply that the device MTU is larger than the software buffers size. Signed-off-by: Gal Pressman Reviewed-by: Tariq Toukan Signed-off-by: Saeed Mahameed Signed-off-by: Jakub Kicinski --- include/linux/mlx5/mlx5_ifc.h | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 1ad762e22d86..06574d430ff5 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -1491,7 +1491,9 @@ struct mlx5_ifc_cmd_hca_cap_bits { u8 reserved_at_120[0xa]; u8 log_max_ra_req_dc[0x6]; - u8 reserved_at_130[0x9]; + u8 reserved_at_130[0x2]; + u8 eth_wqe_too_small[0x1]; + u8 reserved_at_133[0x6]; u8 vnic_env_cq_overrun[0x1]; u8 log_max_ra_res_dc[0x6]; @@ -3537,7 +3539,9 @@ struct mlx5_ifc_vnic_diagnostic_statistics_bits { u8 cq_overrun[0x20]; - u8 reserved_at_220[0xde0]; + u8 eth_wqe_too_small[0x20]; + + u8 reserved_at_220[0xdc0]; }; struct mlx5_ifc_traffic_counter_bits { -- cgit From 9b98d395b85dd042fe83fb696b1ac02e6c93a520 Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Sat, 1 Oct 2022 21:56:28 -0700 Subject: net/mlx5: Start health poll at earlier stage of driver load Start health poll at earlier stage, so if fw fatal issue occurred before or during initialization commands such as init_hca or set_hca_cap the poll health can detect and indicate that the driver is already in error state. Signed-off-by: Moshe Shemesh Signed-off-by: Saeed Mahameed Signed-off-by: Jakub Kicinski --- include/linux/mlx5/driver.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index 285f301a6390..a12929bc31b2 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -1017,6 +1017,7 @@ void mlx5_health_cleanup(struct mlx5_core_dev *dev); int mlx5_health_init(struct mlx5_core_dev *dev); void mlx5_start_health_poll(struct mlx5_core_dev *dev); void mlx5_stop_health_poll(struct mlx5_core_dev *dev, bool disable_health); +void mlx5_start_health_fw_log_up(struct mlx5_core_dev *dev); void mlx5_drain_health_wq(struct mlx5_core_dev *dev); void mlx5_trigger_health_work(struct mlx5_core_dev *dev); int mlx5_frag_buf_alloc_node(struct mlx5_core_dev *dev, int size, -- cgit From 3114b075eb2531dea31a961944309485d6a53040 Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Mon, 3 Oct 2022 08:51:57 +0200 Subject: net: add framework to support Ethernet PSE and PDs devices This framework was create with intention to provide support for Ethernet PSE (Power Sourcing Equipment) and PDs (Powered Device). At current step this patch implements generic PSE support for PoDL (Power over Data Lines 802.3bu) specification with reserving name space for PD devices as well. This framework can be extended to support 802.3af and 802.3at "Power via the Media Dependent Interface" (or PoE/Power over Ethernet) Signed-off-by: Oleksij Rempel Reviewed-by: Andrew Lunn Signed-off-by: Jakub Kicinski --- include/linux/pse-pd/pse.h | 67 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 67 insertions(+) create mode 100644 include/linux/pse-pd/pse.h (limited to 'include/linux') diff --git a/include/linux/pse-pd/pse.h b/include/linux/pse-pd/pse.h new file mode 100644 index 000000000000..3ba787a48b15 --- /dev/null +++ b/include/linux/pse-pd/pse.h @@ -0,0 +1,67 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* +// Copyright (c) 2022 Pengutronix, Oleksij Rempel + */ +#ifndef _LINUX_PSE_CONTROLLER_H +#define _LINUX_PSE_CONTROLLER_H + +#include +#include +#include + +struct module; +struct device_node; +struct of_phandle_args; +struct pse_control; + +/** + * struct pse_controller_dev - PSE controller entity that might + * provide multiple PSE controls + * @ops: a pointer to device specific struct pse_controller_ops + * @owner: kernel module of the PSE controller driver + * @list: internal list of PSE controller devices + * @pse_control_head: head of internal list of requested PSE controls + * @dev: corresponding driver model device struct + * @of_pse_n_cells: number of cells in PSE line specifiers + * @of_xlate: translation function to translate from specifier as found in the + * device tree to id as given to the PSE control ops + * @nr_lines: number of PSE controls in this controller device + * @lock: Mutex for serialization access to the PSE controller + */ +struct pse_controller_dev { + const struct pse_controller_ops *ops; + struct module *owner; + struct list_head list; + struct list_head pse_control_head; + struct device *dev; + int of_pse_n_cells; + int (*of_xlate)(struct pse_controller_dev *pcdev, + const struct of_phandle_args *pse_spec); + unsigned int nr_lines; + struct mutex lock; +}; + +#if IS_ENABLED(CONFIG_PSE_CONTROLLER) +int pse_controller_register(struct pse_controller_dev *pcdev); +void pse_controller_unregister(struct pse_controller_dev *pcdev); +struct device; +int devm_pse_controller_register(struct device *dev, + struct pse_controller_dev *pcdev); + +struct pse_control *of_pse_control_get(struct device_node *node); +void pse_control_put(struct pse_control *psec); + +#else + +static inline struct pse_control *of_pse_control_get(struct device_node *node) +{ + return ERR_PTR(-ENOENT); +} + +static inline void pse_control_put(struct pse_control *psec) +{ +} + +#endif + +#endif -- cgit From 5e82147de1cbd758bb280908daa39d95ed467538 Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Mon, 3 Oct 2022 08:51:59 +0200 Subject: net: mdiobus: search for PSE nodes by parsing PHY nodes. Some PHYs can be linked with PSE (Power Sourcing Equipment), so search for related nodes and attach it to the phydev. Signed-off-by: Oleksij Rempel Reviewed-by: Andrew Lunn Signed-off-by: Jakub Kicinski --- include/linux/phy.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phy.h b/include/linux/phy.h index d65fc76fe0ae..ddf66198f751 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -597,6 +597,7 @@ struct macsec_ops; * @master_slave_get: Current master/slave advertisement * @master_slave_state: Current master/slave configuration * @mii_ts: Pointer to time stamper callbacks + * @psec: Pointer to Power Sourcing Equipment control struct * @lock: Mutex for serialization access to PHY * @state_queue: Work queue for state machine * @shared: Pointer to private data shared by phys in one package @@ -715,6 +716,7 @@ struct phy_device { struct phylink *phylink; struct net_device *attached_dev; struct mii_timestamper *mii_ts; + struct pse_control *psec; u8 mdix; u8 mdix_ctrl; -- cgit From 18ff0bcda6d1dd3d53b4ce3f03e61bf1a648f960 Mon Sep 17 00:00:00 2001 From: Oleksij Rempel Date: Mon, 3 Oct 2022 08:52:00 +0200 Subject: ethtool: add interface to interact with Ethernet Power Equipment Add interface to support Power Sourcing Equipment. At current step it provides generic way to address all variants of PSE devices as defined in IEEE 802.3-2018 but support only objects specified for IEEE 802.3-2018 104.4 PoDL Power Sourcing Equipment (PSE). Currently supported and mandatory objects are: IEEE 802.3-2018 30.15.1.1.3 aPoDLPSEPowerDetectionStatus IEEE 802.3-2018 30.15.1.1.2 aPoDLPSEAdminState IEEE 802.3-2018 30.15.1.2.1 acPoDLPSEAdminControl This is minimal interface needed to control PSE on each separate ethernet port but it provides not all mandatory objects specified in IEEE 802.3-2018. Since "PoDL PSE" and "PSE" have similar names, but some different values I decide to not merge them and keep separate naming schema. This should allow as to be as close to IEEE 802.3 spec as possible and avoid name conflicts in the future. This implementation is connected to PHYs instead of MACs because PSE auto classification can potentially interfere with PHY auto negotiation. So, may be some extra PHY related initialization will be needed. With WIP version of ethtools interaction with PSE capable link looks as following: $ ip l ... 5: t1l1@eth0: .. ... $ ethtool --show-pse t1l1 PSE attributs for t1l1: PoDL PSE Admin State: disabled PoDL PSE Power Detection Status: disabled $ ethtool --set-pse t1l1 podl-pse-admin-control enable $ ethtool --show-pse t1l1 PSE attributs for t1l1: PoDL PSE Admin State: enabled PoDL PSE Power Detection Status: delivering power Signed-off-by: kernel test robot Signed-off-by: Oleksij Rempel Reviewed-by: Bagas Sanjaya Reviewed-by: Andrew Lunn Signed-off-by: Jakub Kicinski --- include/linux/pse-pd/pse.h | 62 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 62 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pse-pd/pse.h b/include/linux/pse-pd/pse.h index 3ba787a48b15..fd1a916eeeba 100644 --- a/include/linux/pse-pd/pse.h +++ b/include/linux/pse-pd/pse.h @@ -9,6 +9,47 @@ #include #include +struct phy_device; +struct pse_controller_dev; + +/** + * struct pse_control_config - PSE control/channel configuration. + * + * @admin_cotrol: set PoDL PSE admin control as described in + * IEEE 802.3-2018 30.15.1.2.1 acPoDLPSEAdminControl + */ +struct pse_control_config { + enum ethtool_podl_pse_admin_state admin_cotrol; +}; + +/** + * struct pse_control_status - PSE control/channel status. + * + * @podl_admin_state: operational state of the PoDL PSE + * functions. IEEE 802.3-2018 30.15.1.1.2 aPoDLPSEAdminState + * @podl_pw_status: power detection status of the PoDL PSE. + * IEEE 802.3-2018 30.15.1.1.3 aPoDLPSEPowerDetectionStatus: + */ +struct pse_control_status { + enum ethtool_podl_pse_admin_state podl_admin_state; + enum ethtool_podl_pse_pw_d_status podl_pw_status; +}; + +/** + * struct pse_controller_ops - PSE controller driver callbacks + * + * @ethtool_get_status: get PSE control status for ethtool interface + * @ethtool_set_config: set PSE control configuration over ethtool interface + */ +struct pse_controller_ops { + int (*ethtool_get_status)(struct pse_controller_dev *pcdev, + unsigned long id, struct netlink_ext_ack *extack, + struct pse_control_status *status); + int (*ethtool_set_config)(struct pse_controller_dev *pcdev, + unsigned long id, struct netlink_ext_ack *extack, + const struct pse_control_config *config); +}; + struct module; struct device_node; struct of_phandle_args; @@ -51,6 +92,13 @@ int devm_pse_controller_register(struct device *dev, struct pse_control *of_pse_control_get(struct device_node *node); void pse_control_put(struct pse_control *psec); +int pse_ethtool_get_status(struct pse_control *psec, + struct netlink_ext_ack *extack, + struct pse_control_status *status); +int pse_ethtool_set_config(struct pse_control *psec, + struct netlink_ext_ack *extack, + const struct pse_control_config *config); + #else static inline struct pse_control *of_pse_control_get(struct device_node *node) @@ -62,6 +110,20 @@ static inline void pse_control_put(struct pse_control *psec) { } +int pse_ethtool_get_status(struct pse_control *psec, + struct netlink_ext_ack *extack, + struct pse_control_status *status) +{ + return -ENOTSUPP; +} + +int pse_ethtool_set_config(struct pse_control *psec, + struct netlink_ext_ack *extack, + const struct pse_control_config *config) +{ + return -ENOTSUPP; +} + #endif #endif -- cgit From 2a4187f4406ec3236f8b9d0d5150d2bf8d021b68 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Mon, 3 Oct 2022 20:14:13 +0200 Subject: once: rename _SLOW to _SLEEPABLE The _SLOW designation wasn't really descriptive of anything. This is meant to be called from process context when it's possible to sleep. So name this more aptly _SLEEPABLE, which better fits its intended use. Fixes: 62c07983bef9 ("once: add DO_ONCE_SLOW() for sleepable contexts") Cc: Christophe Leroy Signed-off-by: Jason A. Donenfeld Reviewed-by: Eric Dumazet Link: https://lore.kernel.org/r/20221003181413.1221968-1-Jason@zx2c4.com Signed-off-by: Jakub Kicinski --- include/linux/once.h | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-) (limited to 'include/linux') diff --git a/include/linux/once.h b/include/linux/once.h index 176ab75b42df..bc714d414448 100644 --- a/include/linux/once.h +++ b/include/linux/once.h @@ -13,9 +13,9 @@ void __do_once_done(bool *done, struct static_key_true *once_key, unsigned long *flags, struct module *mod); /* Variant for process contexts only. */ -bool __do_once_slow_start(bool *done); -void __do_once_slow_done(bool *done, struct static_key_true *once_key, - struct module *mod); +bool __do_once_sleepable_start(bool *done); +void __do_once_sleepable_done(bool *done, struct static_key_true *once_key, + struct module *mod); /* Call a function exactly once. The idea of DO_ONCE() is to perform * a function call such as initialization of random seeds, etc, only @@ -61,26 +61,26 @@ void __do_once_slow_done(bool *done, struct static_key_true *once_key, }) /* Variant of DO_ONCE() for process/sleepable contexts. */ -#define DO_ONCE_SLOW(func, ...) \ - ({ \ - bool ___ret = false; \ - static bool __section(".data.once") ___done = false; \ - static DEFINE_STATIC_KEY_TRUE(___once_key); \ - if (static_branch_unlikely(&___once_key)) { \ - ___ret = __do_once_slow_start(&___done); \ - if (unlikely(___ret)) { \ - func(__VA_ARGS__); \ - __do_once_slow_done(&___done, &___once_key, \ - THIS_MODULE); \ - } \ - } \ - ___ret; \ +#define DO_ONCE_SLEEPABLE(func, ...) \ + ({ \ + bool ___ret = false; \ + static bool __section(".data.once") ___done = false; \ + static DEFINE_STATIC_KEY_TRUE(___once_key); \ + if (static_branch_unlikely(&___once_key)) { \ + ___ret = __do_once_sleepable_start(&___done); \ + if (unlikely(___ret)) { \ + func(__VA_ARGS__); \ + __do_once_sleepable_done(&___done, &___once_key,\ + THIS_MODULE); \ + } \ + } \ + ___ret; \ }) #define get_random_once(buf, nbytes) \ DO_ONCE(get_random_bytes, (buf), (nbytes)) -#define get_random_slow_once(buf, nbytes) \ - DO_ONCE_SLOW(get_random_bytes, (buf), (nbytes)) +#define get_random_sleepable_once(buf, nbytes) \ + DO_ONCE_SLEEPABLE(get_random_bytes, (buf), (nbytes)) #endif /* _LINUX_ONCE_H */ -- cgit From d7915651fe9a6474e06161c34c637e6aeb811e9d Mon Sep 17 00:00:00 2001 From: Christian Marangi Date: Wed, 14 Sep 2022 16:47:42 +0200 Subject: clk: introduce (devm_)hw_register_mux_parent_data_table API Introduce (devm_)hw_register_mux_parent_data_table new API. We have basic support for clk_register_mux using parent_data but we lack any API to provide a custom parent_map. Add these 2 new API to correctly handle these special configuration instead of using the generic __(devm_)clk_hw_register_mux API. Signed-off-by: Christian Marangi Link: https://lore.kernel.org/r/20220914144743.17369-1-ansuelsmth@gmail.com Signed-off-by: Stephen Boyd --- include/linux/clk-provider.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/clk-provider.h b/include/linux/clk-provider.h index 1615010aa0ec..65b70f0d62c5 100644 --- a/include/linux/clk-provider.h +++ b/include/linux/clk-provider.h @@ -974,6 +974,13 @@ struct clk *clk_register_mux_table(struct device *dev, const char *name, __clk_hw_register_mux((dev), NULL, (name), (num_parents), NULL, NULL, \ (parent_data), (flags), (reg), (shift), \ BIT((width)) - 1, (clk_mux_flags), NULL, (lock)) +#define clk_hw_register_mux_parent_data_table(dev, name, parent_data, \ + num_parents, flags, reg, shift, \ + width, clk_mux_flags, table, \ + lock) \ + __clk_hw_register_mux((dev), NULL, (name), (num_parents), NULL, NULL, \ + (parent_data), (flags), (reg), (shift), \ + BIT((width)) - 1, (clk_mux_flags), table, (lock)) #define devm_clk_hw_register_mux(dev, name, parent_names, num_parents, flags, reg, \ shift, width, clk_mux_flags, lock) \ __devm_clk_hw_register_mux((dev), NULL, (name), (num_parents), \ @@ -987,6 +994,13 @@ struct clk *clk_register_mux_table(struct device *dev, const char *name, (parent_hws), NULL, (flags), (reg), \ (shift), BIT((width)) - 1, \ (clk_mux_flags), NULL, (lock)) +#define devm_clk_hw_register_mux_parent_data_table(dev, name, parent_data, \ + num_parents, flags, reg, shift, \ + width, clk_mux_flags, table, \ + lock) \ + __devm_clk_hw_register_mux((dev), NULL, (name), (num_parents), NULL, \ + NULL, (parent_data), (flags), (reg), (shift), \ + BIT((width)) - 1, (clk_mux_flags), table, (lock)) int clk_mux_val_to_index(struct clk_hw *hw, const u32 *table, unsigned int flags, unsigned int val); -- cgit From 681bf011b9b5989c6e9db6beb64494918aab9a43 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Mon, 3 Oct 2022 21:03:27 -0700 Subject: eth: pse: add missing static inlines build bot reports missing 'static inline' qualifiers in the header. Reported-by: kernel test robot Fixes: 18ff0bcda6d1 ("ethtool: add interface to interact with Ethernet Power Equipment") Reviewed-by: Oleksij Rempel Link: https://lore.kernel.org/r/20221004040327.2034878-1-kuba@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/pse-pd/pse.h | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pse-pd/pse.h b/include/linux/pse-pd/pse.h index fd1a916eeeba..fb724c65c77b 100644 --- a/include/linux/pse-pd/pse.h +++ b/include/linux/pse-pd/pse.h @@ -110,16 +110,16 @@ static inline void pse_control_put(struct pse_control *psec) { } -int pse_ethtool_get_status(struct pse_control *psec, - struct netlink_ext_ack *extack, - struct pse_control_status *status) +static inline int pse_ethtool_get_status(struct pse_control *psec, + struct netlink_ext_ack *extack, + struct pse_control_status *status) { return -ENOTSUPP; } -int pse_ethtool_set_config(struct pse_control *psec, - struct netlink_ext_ack *extack, - const struct pse_control_config *config) +static inline int pse_ethtool_set_config(struct pse_control *psec, + struct netlink_ext_ack *extack, + const struct pse_control_config *config) { return -ENOTSUPP; } -- cgit From 5d10f480f77b67332e4835ad565bfe4cb8528159 Mon Sep 17 00:00:00 2001 From: Daniel Lezcano Date: Thu, 18 Aug 2022 10:23:16 +0200 Subject: thermal/of: Remove the thermal_zone_of_get_sensor_id() function The function thermal_zone_of_get_sensor_id() is no longer used anywhere, remove it. Signed-off-by: Daniel Lezcano Link: https://lore.kernel.org/r/20220818082316.2717095-2-daniel.lezcano@linaro.org --- include/linux/thermal.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/linux') diff --git a/include/linux/thermal.h b/include/linux/thermal.h index 6f1ec4fb7ef8..9ecc128944a1 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -308,9 +308,6 @@ void devm_thermal_of_zone_unregister(struct device *dev, struct thermal_zone_dev void thermal_of_zone_unregister(struct thermal_zone_device *tz); -int thermal_zone_of_get_sensor_id(struct device_node *tz_np, - struct device_node *sensor_np, - u32 *id); #else static inline struct thermal_zone_device *thermal_of_zone_register(struct device_node *sensor, int id, void *data, @@ -334,13 +331,6 @@ static inline void devm_thermal_of_zone_unregister(struct device *dev, struct thermal_zone_device *tz) { } - -static inline int thermal_zone_of_get_sensor_id(struct device_node *tz_np, - struct device_node *sensor_np, - u32 *id) -{ - return -ENOENT; -} #endif #ifdef CONFIG_THERMAL -- cgit From 0ce38047e82a02017839b6cae837f13a1383a3a0 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Tue, 4 Oct 2022 10:46:58 +0200 Subject: perf: Fix lockdep_assert_event_ctx() I'm a flamin' moron; because even after Mark told me it should be '&&' I still got it wrong in the final commit. Fixes: f3c0eba28704 ("perf: Add a few assertions") Reported-by: Borislav Petkov Reported-by: Mark Rutland Signed-off-by: Peter Zijlstra (Intel) Tested-by: Borislav Petkov Link: https://lkml.kernel.org/r/YvvIWmDBWdIUCMZj@FVFF77S0Q05N --- include/linux/perf_event.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index e9b151cde491..853f64b6c8c2 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -644,7 +644,7 @@ struct pmu_event_list { #ifdef CONFIG_PROVE_LOCKING #define lockdep_assert_event_ctx(event) \ WARN_ON_ONCE(__lockdep_enabled && \ - (this_cpu_read(hardirqs_enabled) || \ + (this_cpu_read(hardirqs_enabled) && \ lockdep_is_held(&(event)->ctx->mutex) != LOCK_STATE_HELD)) #else #define lockdep_assert_event_ctx(event) -- cgit From 4c99256013fa4e0fe9733ca1bab2b5684ccc02a1 Mon Sep 17 00:00:00 2001 From: Raul E Rangel Date: Thu, 29 Sep 2022 10:19:09 -0600 Subject: gpiolib: acpi: Add wake_capable variants of acpi_dev_gpio_irq_get The ACPI spec defines the SharedAndWake and ExclusiveAndWake share type keywords. This is an indication that the GPIO IRQ can also be used as a wake source. This change exposes the wake_capable bit so drivers can correctly enable wake functionality instead of making an assumption. Signed-off-by: Raul E Rangel Reviewed-by: Mika Westerberg Reviewed-by: Andy Shevchenko Signed-off-by: Rafael J. Wysocki --- include/linux/acpi.h | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index 6f64b2f3dc54..cd7371a5f283 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -1202,7 +1202,8 @@ bool acpi_gpio_get_irq_resource(struct acpi_resource *ares, struct acpi_resource_gpio **agpio); bool acpi_gpio_get_io_resource(struct acpi_resource *ares, struct acpi_resource_gpio **agpio); -int acpi_dev_gpio_irq_get_by(struct acpi_device *adev, const char *name, int index); +int acpi_dev_gpio_irq_wake_get_by(struct acpi_device *adev, const char *name, int index, + bool *wake_capable); #else static inline bool acpi_gpio_get_irq_resource(struct acpi_resource *ares, struct acpi_resource_gpio **agpio) @@ -1214,16 +1215,28 @@ static inline bool acpi_gpio_get_io_resource(struct acpi_resource *ares, { return false; } -static inline int acpi_dev_gpio_irq_get_by(struct acpi_device *adev, - const char *name, int index) +static inline int acpi_dev_gpio_irq_wake_get_by(struct acpi_device *adev, const char *name, + int index, bool *wake_capable) { return -ENXIO; } #endif +static inline int acpi_dev_gpio_irq_wake_get(struct acpi_device *adev, int index, + bool *wake_capable) +{ + return acpi_dev_gpio_irq_wake_get_by(adev, NULL, index, wake_capable); +} + +static inline int acpi_dev_gpio_irq_get_by(struct acpi_device *adev, const char *name, + int index) +{ + return acpi_dev_gpio_irq_wake_get_by(adev, name, index, NULL); +} + static inline int acpi_dev_gpio_irq_get(struct acpi_device *adev, int index) { - return acpi_dev_gpio_irq_get_by(adev, NULL, index); + return acpi_dev_gpio_irq_wake_get_by(adev, NULL, index, NULL); } /* Device properties */ -- cgit From 5ff811604f93bdd2650beed80b48c2ca16c6fba6 Mon Sep 17 00:00:00 2001 From: Raul E Rangel Date: Thu, 29 Sep 2022 10:19:10 -0600 Subject: ACPI: resources: Add wake_capable parameter to acpi_dev_irq_flags ACPI IRQ/Interrupt resources contain a bit that describes if the interrupt should wake the system. This change exposes that bit via a new IORESOURCE_IRQ_WAKECAPABLE flag. Drivers should check this flag before arming an IRQ to wake the system. Signed-off-by: Raul E Rangel Reviewed-by: Andy Shevchenko Signed-off-by: Rafael J. Wysocki --- include/linux/acpi.h | 2 +- include/linux/ioport.h | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/acpi.h b/include/linux/acpi.h index cd7371a5f283..ea2efbdbeee5 100644 --- a/include/linux/acpi.h +++ b/include/linux/acpi.h @@ -495,7 +495,7 @@ bool acpi_dev_resource_address_space(struct acpi_resource *ares, struct resource_win *win); bool acpi_dev_resource_ext_address_space(struct acpi_resource *ares, struct resource_win *win); -unsigned long acpi_dev_irq_flags(u8 triggering, u8 polarity, u8 shareable); +unsigned long acpi_dev_irq_flags(u8 triggering, u8 polarity, u8 shareable, u8 wake_capable); unsigned int acpi_dev_get_irq_type(int triggering, int polarity); bool acpi_dev_resource_interrupt(struct acpi_resource *ares, int index, struct resource *res); diff --git a/include/linux/ioport.h b/include/linux/ioport.h index 616b683563a9..3baeea4d903b 100644 --- a/include/linux/ioport.h +++ b/include/linux/ioport.h @@ -79,7 +79,8 @@ struct resource { #define IORESOURCE_IRQ_HIGHLEVEL (1<<2) #define IORESOURCE_IRQ_LOWLEVEL (1<<3) #define IORESOURCE_IRQ_SHAREABLE (1<<4) -#define IORESOURCE_IRQ_OPTIONAL (1<<5) +#define IORESOURCE_IRQ_OPTIONAL (1<<5) +#define IORESOURCE_IRQ_WAKECAPABLE (1<<6) /* PnP DMA specific bits (IORESOURCE_BITS) */ #define IORESOURCE_DMA_TYPE_MASK (3<<0) -- cgit From e7fd8b68404f3d8bc03f85bc3c078d67096be06f Mon Sep 17 00:00:00 2001 From: Kai-Heng Feng Date: Thu, 29 Sep 2022 15:05:23 +0800 Subject: kernel/reboot: Add SYS_OFF_MODE_RESTART_PREPARE mode Add SYS_OFF_MODE_RESTART_PREPARE callbacks to be invoked before a system restart. Suggested-by: Dmitry Osipenko Reviewed-by: Dmitry Osipenko Signed-off-by: Kai-Heng Feng [ rjw: Changelog edits ] Signed-off-by: Rafael J. Wysocki --- include/linux/reboot.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/reboot.h b/include/linux/reboot.h index e5d9ef886179..2b6bb593be5b 100644 --- a/include/linux/reboot.h +++ b/include/linux/reboot.h @@ -105,6 +105,14 @@ enum sys_off_mode { */ SYS_OFF_MODE_POWER_OFF, + /** + * @SYS_OFF_MODE_RESTART_PREPARE: + * + * Handlers prepare system to be restarted. Handlers are + * allowed to sleep. + */ + SYS_OFF_MODE_RESTART_PREPARE, + /** * @SYS_OFF_MODE_RESTART: * -- cgit From 0e0abad2a71bcd7ba0f30e7975f5b4199ade4e60 Mon Sep 17 00:00:00 2001 From: Geert Uytterhoeven Date: Tue, 4 Oct 2022 14:39:10 +0200 Subject: io_uring: Add missing inline to io_uring_cmd_import_fixed() dummy MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit If CONFIG_IO_URING is not set: include/linux/io_uring.h:65:12: error: ‘io_uring_cmd_import_fixed’ defined but not used [-Werror=unused-function] 65 | static int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw, | ^~~~~~~~~~~~~~~~~~~~~~~~~ Fix this by adding the missing "inline" keyword. Fixes: a9216fac3ed8819c ("io_uring: add io_uring_cmd_import_fixed") Signed-off-by: Geert Uytterhoeven Link: https://lore.kernel.org/r/7404b4a696f64e33e5ef3c5bd3754d4f26d13e50.1664887093.git.geert+renesas@glider.be Signed-off-by: Jens Axboe --- include/linux/io_uring.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index e10c5cc81082..43bc8a2edccf 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -62,7 +62,7 @@ static inline void io_uring_free(struct task_struct *tsk) __io_uring_free(tsk); } #else -static int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw, +static inline int io_uring_cmd_import_fixed(u64 ubuf, unsigned long len, int rw, struct iov_iter *iter, void *ioucmd) { return -EOPNOTSUPP; -- cgit From da4ab869e37cf81f93333ba74b16e0ea6d322e15 Mon Sep 17 00:00:00 2001 From: Jeff Layton Date: Wed, 25 May 2022 06:11:00 -0400 Subject: libceph: drop last_piece flag from ceph_msg_data_cursor ceph_msg_data_next is always passed a NULL pointer for this field. Some of the "next" operations look at it in order to determine the length, but we can just take the min of the data on the page or cursor->resid. Signed-off-by: Jeff Layton Reviewed-by: Xiubo Li Reviewed-by: Ilya Dryomov Signed-off-by: Ilya Dryomov --- include/linux/ceph/messenger.h | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/ceph/messenger.h b/include/linux/ceph/messenger.h index e7f2fb2fc207..99c1726be6ee 100644 --- a/include/linux/ceph/messenger.h +++ b/include/linux/ceph/messenger.h @@ -207,7 +207,6 @@ struct ceph_msg_data_cursor { struct ceph_msg_data *data; /* current data item */ size_t resid; /* bytes not yet consumed */ - bool last_piece; /* current is last piece */ bool need_crc; /* crc update needed */ union { #ifdef CONFIG_BLOCK @@ -498,8 +497,7 @@ void ceph_con_discard_requeued(struct ceph_connection *con, u64 reconnect_seq); void ceph_msg_data_cursor_init(struct ceph_msg_data_cursor *cursor, struct ceph_msg *msg, size_t length); struct page *ceph_msg_data_next(struct ceph_msg_data_cursor *cursor, - size_t *page_offset, size_t *length, - bool *last_piece); + size_t *page_offset, size_t *length); void ceph_msg_data_advance(struct ceph_msg_data_cursor *cursor, size_t bytes); u32 ceph_crc32c_page(u32 crc, struct page *page, unsigned int page_offset, -- cgit From bdef2b7896df293736330eb6eb0f43947049b828 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 23 Sep 2022 11:26:41 +0200 Subject: vfio/mdev: make mdev.h standalone includable Include and so that users of this headers don't need to do that and remove those includes that aren't needed any more. Signed-off-by: Christoph Hellwig Reviewed-by: Eric Farman Reviewed-by: Jason Gunthorpe Reviewed-by: Tony Krowiak Reviewed-by: Kevin Tian Reviewed-by: Kirti Wankhede Link: https://lore.kernel.org/r/20220923092652.100656-4-hch@lst.de Signed-off-by: Alex Williamson --- include/linux/mdev.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index 47ad3b104d9e..a5d8ae6132a2 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -10,6 +10,9 @@ #ifndef MDEV_H #define MDEV_H +#include +#include + struct mdev_type; struct mdev_device { -- cgit From 89345d5177aa0f6d678251e1e0870b0eeb1ab510 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 23 Sep 2022 11:26:42 +0200 Subject: vfio/mdev: embedd struct mdev_parent in the parent data structure Simplify mdev_{un}register_device by requiring the caller to pass in a structure allocate as part of the parent device structure. This removes the need for a list of parents and the separate mdev_parent refcount as we can simplify rely on the reference to the parent device. Signed-off-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: Tony Krowiak Reviewed-by: Kevin Tian Reviewed-by: Kirti Wankhede Reviewed-by: Eric Farman Link: https://lore.kernel.org/r/20220923092652.100656-5-hch@lst.de Signed-off-by: Alex Williamson --- include/linux/mdev.h | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index a5d8ae6132a2..262512c2a8ff 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -23,6 +23,16 @@ struct mdev_device { bool active; }; +/* embedded into the struct device that the mdev devices hang off */ +struct mdev_parent { + struct device *dev; + struct mdev_driver *mdev_driver; + struct kset *mdev_types_kset; + struct list_head type_list; + /* Synchronize device creation/removal with parent unregistration */ + struct rw_semaphore unreg_sem; +}; + static inline struct mdev_device *to_mdev_device(struct device *dev) { return container_of(dev, struct mdev_device, dev); @@ -70,8 +80,9 @@ struct mdev_driver { extern struct bus_type mdev_bus_type; -int mdev_register_device(struct device *dev, struct mdev_driver *mdev_driver); -void mdev_unregister_device(struct device *dev); +int mdev_register_parent(struct mdev_parent *parent, struct device *dev, + struct mdev_driver *mdev_driver); +void mdev_unregister_parent(struct mdev_parent *parent); int mdev_register_driver(struct mdev_driver *drv); void mdev_unregister_driver(struct mdev_driver *drv); -- cgit From da44c340c4fe9d9653ae84fa6a60f406bafcffce Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 23 Sep 2022 11:26:43 +0200 Subject: vfio/mdev: simplify mdev_type handling Instead of abusing struct attribute_group to control initialization of struct mdev_type, just define the actual attributes in the mdev_driver, allocate the mdev_type structures in the caller and pass them to mdev_register_parent. This allows the caller to use container_of to get at the containing structure and thus significantly simplify the code. Signed-off-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: Tony Krowiak Reviewed-by: Kevin Tian Reviewed-by: Kirti Wankhede Reviewed-by: Eric Farman Link: https://lore.kernel.org/r/20220923092652.100656-6-hch@lst.de Signed-off-by: Alex Williamson --- include/linux/mdev.h | 26 ++++++++++++++++++-------- 1 file changed, 18 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index 262512c2a8ff..19bc93c10e8c 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -23,14 +23,27 @@ struct mdev_device { bool active; }; +struct mdev_type { + /* set by the driver before calling mdev_register parent: */ + const char *sysfs_name; + + /* set by the core, can be used drivers */ + struct mdev_parent *parent; + + /* internal only */ + struct kobject kobj; + struct kobject *devices_kobj; +}; + /* embedded into the struct device that the mdev devices hang off */ struct mdev_parent { struct device *dev; struct mdev_driver *mdev_driver; struct kset *mdev_types_kset; - struct list_head type_list; /* Synchronize device creation/removal with parent unregistration */ struct rw_semaphore unreg_sem; + struct mdev_type **types; + unsigned int nr_types; }; static inline struct mdev_device *to_mdev_device(struct device *dev) @@ -38,8 +51,6 @@ static inline struct mdev_device *to_mdev_device(struct device *dev) return container_of(dev, struct mdev_device, dev); } -unsigned int mdev_get_type_group_id(struct mdev_device *mdev); -unsigned int mtype_get_type_group_id(struct mdev_type *mtype); struct device *mtype_get_parent_dev(struct mdev_type *mtype); /* interface for exporting mdev supported type attributes */ @@ -66,22 +77,21 @@ struct mdev_type_attribute mdev_type_attr_##_name = \ * struct mdev_driver - Mediated device driver * @probe: called when new device created * @remove: called when device removed - * @supported_type_groups: Attributes to define supported types. It is mandatory - * to provide supported types. + * @types_attrs: attributes to the type kobjects. * @driver: device driver structure - * **/ struct mdev_driver { int (*probe)(struct mdev_device *dev); void (*remove)(struct mdev_device *dev); - struct attribute_group **supported_type_groups; + const struct attribute * const *types_attrs; struct device_driver driver; }; extern struct bus_type mdev_bus_type; int mdev_register_parent(struct mdev_parent *parent, struct device *dev, - struct mdev_driver *mdev_driver); + struct mdev_driver *mdev_driver, struct mdev_type **types, + unsigned int nr_types); void mdev_unregister_parent(struct mdev_parent *parent); int mdev_register_driver(struct mdev_driver *drv); -- cgit From cbf3bb28aaeaee425ca7b9c537a3efff1f8c98ae Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 23 Sep 2022 11:26:44 +0200 Subject: vfio/mdev: remove mdev_from_dev Just open code it in the only caller. Signed-off-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: Kevin Tian Reviewed-by: Kirti Wankhede Link: https://lore.kernel.org/r/20220923092652.100656-7-hch@lst.de Signed-off-by: Alex Williamson --- include/linux/mdev.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index 19bc93c10e8c..4f558de52fd9 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -102,9 +102,5 @@ static inline struct device *mdev_dev(struct mdev_device *mdev) { return &mdev->dev; } -static inline struct mdev_device *mdev_from_dev(struct device *dev) -{ - return dev->bus == &mdev_bus_type ? to_mdev_device(dev) : NULL; -} #endif /* MDEV_H */ -- cgit From 2815fe149ffa8e1a022b2830ab62999135c00a4e Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 23 Sep 2022 11:26:45 +0200 Subject: vfio/mdev: unexport mdev_bus_type mdev_bus_type is only used in mdev.ko now, so unexport it. Signed-off-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: Kevin Tian Reviewed-by: Kirti Wankhede Link: https://lore.kernel.org/r/20220923092652.100656-8-hch@lst.de Signed-off-by: Alex Williamson --- include/linux/mdev.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index 4f558de52fd9..6c179d2b8927 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -87,8 +87,6 @@ struct mdev_driver { struct device_driver driver; }; -extern struct bus_type mdev_bus_type; - int mdev_register_parent(struct mdev_parent *parent, struct device *dev, struct mdev_driver *mdev_driver, struct mdev_type **types, unsigned int nr_types); -- cgit From 062e720cd209d8091c4f3d118d93973f02209aca Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 23 Sep 2022 11:26:46 +0200 Subject: vfio/mdev: remove mdev_parent_dev Just open code the dereferences in the only user. Signed-off-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: Kevin Tian Reviewed-by: Kirti Wankhede Link: https://lore.kernel.org/r/20220923092652.100656-9-hch@lst.de Signed-off-by: Alex Williamson --- include/linux/mdev.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index 6c179d2b8927..bbedffcb38d4 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -95,7 +95,6 @@ void mdev_unregister_parent(struct mdev_parent *parent); int mdev_register_driver(struct mdev_driver *drv); void mdev_unregister_driver(struct mdev_driver *drv); -struct device *mdev_parent_dev(struct mdev_device *mdev); static inline struct device *mdev_dev(struct mdev_device *mdev) { return &mdev->dev; -- cgit From c7c1f38f6cba7e3249866c06639ea62755f0a24e Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 23 Sep 2022 11:26:47 +0200 Subject: vfio/mdev: remove mtype_get_parent_dev Just open code the dereferences in the only user. Signed-off-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: Jason J. Herne Reviewed-by: Kevin Tian Reviewed-by: Kirti Wankhede Reviewed-by: Eric Farman Link: https://lore.kernel.org/r/20220923092652.100656-10-hch@lst.de Signed-off-by: Alex Williamson --- include/linux/mdev.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index bbedffcb38d4..e445f809ceca 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -51,8 +51,6 @@ static inline struct mdev_device *to_mdev_device(struct device *dev) return container_of(dev, struct mdev_device, dev); } -struct device *mtype_get_parent_dev(struct mdev_type *mtype); - /* interface for exporting mdev supported type attributes */ struct mdev_type_attribute { struct attribute attr; -- cgit From 290aac5df88a83e264b3a73ec146e5e5b3c45793 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Fri, 23 Sep 2022 11:26:48 +0200 Subject: vfio/mdev: consolidate all the device_api sysfs into the core code Every driver just emits a static string, simply feed it through the ops and provide a standard sysfs show function. Signed-off-by: Jason Gunthorpe Signed-off-by: Christoph Hellwig Reviewed-by: Tony Krowiak Reviewed-by: Kevin Tian Reviewed-by: Kirti Wankhede Reviewed-by: Eric Farman Link: https://lore.kernel.org/r/20220923092652.100656-11-hch@lst.de Signed-off-by: Alex Williamson --- include/linux/mdev.h | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index e445f809ceca..af1ff0165b8d 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -61,11 +61,6 @@ struct mdev_type_attribute { size_t count); }; -#define MDEV_TYPE_ATTR(_name, _mode, _show, _store) \ -struct mdev_type_attribute mdev_type_attr_##_name = \ - __ATTR(_name, _mode, _show, _store) -#define MDEV_TYPE_ATTR_RW(_name) \ - struct mdev_type_attribute mdev_type_attr_##_name = __ATTR_RW(_name) #define MDEV_TYPE_ATTR_RO(_name) \ struct mdev_type_attribute mdev_type_attr_##_name = __ATTR_RO(_name) #define MDEV_TYPE_ATTR_WO(_name) \ @@ -73,12 +68,14 @@ struct mdev_type_attribute mdev_type_attr_##_name = \ /** * struct mdev_driver - Mediated device driver + * @device_api: string to return for the device_api sysfs * @probe: called when new device created * @remove: called when device removed * @types_attrs: attributes to the type kobjects. * @driver: device driver structure **/ struct mdev_driver { + const char *device_api; int (*probe)(struct mdev_device *dev); void (*remove)(struct mdev_device *dev); const struct attribute * const *types_attrs; -- cgit From 0bc79069ccbdbe26492493dd0c4e38b7cadf8ad5 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 23 Sep 2022 11:26:49 +0200 Subject: vfio/mdev: consolidate all the name sysfs into the core code Every driver just emits a static string, simply add a field to the mdev_type for the driver to fill out or fall back to the sysfs name and provide a standard sysfs show function. Signed-off-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: Kevin Tian Reviewed-by: Kirti Wankhede Reviewed-by: Eric Farman Link: https://lore.kernel.org/r/20220923092652.100656-12-hch@lst.de Signed-off-by: Alex Williamson --- include/linux/mdev.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index af1ff0165b8d..4bb8a58b577b 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -26,6 +26,7 @@ struct mdev_device { struct mdev_type { /* set by the driver before calling mdev_register parent: */ const char *sysfs_name; + const char *pretty_name; /* set by the core, can be used drivers */ struct mdev_parent *parent; -- cgit From f2fbc72e6da4f8e01fe5fe3d6871a791e76271c3 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 23 Sep 2022 11:26:50 +0200 Subject: vfio/mdev: consolidate all the available_instance sysfs into the core code Every driver just print a number, simply add a method to the mdev_driver to return it and provide a standard sysfs show function. Signed-off-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: Kevin Tian Reviewed-by: Kirti Wankhede Reviewed-by: Eric Farman Link: https://lore.kernel.org/r/20220923092652.100656-13-hch@lst.de Signed-off-by: Alex Williamson --- include/linux/mdev.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index 4bb8a58b577b..d39e08a1824c 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -72,6 +72,7 @@ struct mdev_type_attribute { * @device_api: string to return for the device_api sysfs * @probe: called when new device created * @remove: called when device removed + * @get_available: Return the max number of instances that can be created * @types_attrs: attributes to the type kobjects. * @driver: device driver structure **/ @@ -79,6 +80,7 @@ struct mdev_driver { const char *device_api; int (*probe)(struct mdev_device *dev); void (*remove)(struct mdev_device *dev); + unsigned int (*get_available)(struct mdev_type *mtype); const struct attribute * const *types_attrs; struct device_driver driver; }; -- cgit From 685a1537f4c603cfcaf4b9be56ff6a571f7ddd08 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Fri, 23 Sep 2022 11:26:51 +0200 Subject: vfio/mdev: consolidate all the description sysfs into the core code Every driver just emits a string, simply add a method to the mdev_driver to return it and provide a standard sysfs show function. Remove the now unused types_attrs field in struct mdev_driver and the support code for it. Signed-off-by: Christoph Hellwig Reviewed-by: Jason Gunthorpe Reviewed-by: Kevin Tian Reviewed-by: Kirti Wankhede Link: https://lore.kernel.org/r/20220923092652.100656-14-hch@lst.de Signed-off-by: Alex Williamson --- include/linux/mdev.h | 19 ++----------------- 1 file changed, 2 insertions(+), 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index d39e08a1824c..33674cb5ed5d 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -52,28 +52,13 @@ static inline struct mdev_device *to_mdev_device(struct device *dev) return container_of(dev, struct mdev_device, dev); } -/* interface for exporting mdev supported type attributes */ -struct mdev_type_attribute { - struct attribute attr; - ssize_t (*show)(struct mdev_type *mtype, - struct mdev_type_attribute *attr, char *buf); - ssize_t (*store)(struct mdev_type *mtype, - struct mdev_type_attribute *attr, const char *buf, - size_t count); -}; - -#define MDEV_TYPE_ATTR_RO(_name) \ - struct mdev_type_attribute mdev_type_attr_##_name = __ATTR_RO(_name) -#define MDEV_TYPE_ATTR_WO(_name) \ - struct mdev_type_attribute mdev_type_attr_##_name = __ATTR_WO(_name) - /** * struct mdev_driver - Mediated device driver * @device_api: string to return for the device_api sysfs * @probe: called when new device created * @remove: called when device removed * @get_available: Return the max number of instances that can be created - * @types_attrs: attributes to the type kobjects. + * @show_description: Print a description of the mtype * @driver: device driver structure **/ struct mdev_driver { @@ -81,7 +66,7 @@ struct mdev_driver { int (*probe)(struct mdev_device *dev); void (*remove)(struct mdev_device *dev); unsigned int (*get_available)(struct mdev_type *mtype); - const struct attribute * const *types_attrs; + ssize_t (*show_description)(struct mdev_type *mtype, char *buf); struct device_driver driver; }; -- cgit From 9c799c224d6ebc5be51065bd3217a2d7eea23b8f Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Fri, 23 Sep 2022 11:26:52 +0200 Subject: vfio/mdev: add mdev available instance checking to the core Many of the mdev drivers use a simple counter for keeping track of the available instances. Move this code to the core code and store the counter in the mdev_parent. Implement it using correct locking, fixing mdpy. Drivers just provide the value in the mdev_driver at registration time and the core code takes care of maintaining it and exposing the value in sysfs. [hch: count instances per-parent instead of per-type, use an atomic_t to avoid taking mdev_list_lock in the show method] Signed-off-by: Jason Gunthorpe Signed-off-by: Christoph Hellwig Reviewed-by: Kevin Tian Reviewed-by: Kirti Wankhede Reviewed-by: Eric Farman Link: https://lore.kernel.org/r/20220923092652.100656-15-hch@lst.de Signed-off-by: Alex Williamson --- include/linux/mdev.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mdev.h b/include/linux/mdev.h index 33674cb5ed5d..139d05b26f82 100644 --- a/include/linux/mdev.h +++ b/include/linux/mdev.h @@ -45,6 +45,7 @@ struct mdev_parent { struct rw_semaphore unreg_sem; struct mdev_type **types; unsigned int nr_types; + atomic_t available_instances; }; static inline struct mdev_device *to_mdev_device(struct device *dev) @@ -55,6 +56,7 @@ static inline struct mdev_device *to_mdev_device(struct device *dev) /** * struct mdev_driver - Mediated device driver * @device_api: string to return for the device_api sysfs + * @max_instances: maximum number of instances supported (optional) * @probe: called when new device created * @remove: called when device removed * @get_available: Return the max number of instances that can be created @@ -63,6 +65,7 @@ static inline struct mdev_device *to_mdev_device(struct device *dev) **/ struct mdev_driver { const char *device_api; + unsigned int max_instances; int (*probe)(struct mdev_device *dev); void (*remove)(struct mdev_device *dev); unsigned int (*get_available)(struct mdev_type *mtype); -- cgit From 34e1ed189fab1b7533befb266b96051104c1deb6 Mon Sep 17 00:00:00 2001 From: Paul Cercueil Date: Mon, 8 Aug 2022 19:40:38 +0200 Subject: PM: Improve EXPORT_*_DEV_PM_OPS macros Update the _EXPORT_DEV_PM_OPS() internal macro. It was not used anywhere outside pm.h and pm_runtime.h, so it is safe to update it. Before, this macro would take a few parameters to be used as sleep and runtime callbacks. This made it unsuitable to use with different callbacks, for instance the "noirq" ones. It is now semantically different: instead of creating a conditionally exported dev_pm_ops structure, it only contains part of the definition. This macro should however never be used directly (hence the trailing underscore). Instead, the following four macros are provided: - EXPORT_DEV_PM_OPS(name) - EXPORT_GPL_DEV_PM_OPS(name) - EXPORT_NS_DEV_PM_OPS(name, ns) - EXPORT_NS_GPL_DEV_PM_OPS(name, ns) For instance, it is now possible to conditionally export noirq suspend/resume PM functions like this: EXPORT_GPL_DEV_PM_OPS(foo_pm_ops) = { NOIRQ_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) }; The existing helper macros EXPORT_*_SIMPLE_DEV_PM_OPS() and EXPORT_*_RUNTIME_DEV_PM_OPS() have been updated to use these new macros. Signed-off-by: Paul Cercueil Reviewed-by: Jonathan Cameron Signed-off-by: Rafael J. Wysocki --- include/linux/pm.h | 37 +++++++++++++++++++++++-------------- include/linux/pm_runtime.h | 20 ++++++++++++-------- 2 files changed, 35 insertions(+), 22 deletions(-) (limited to 'include/linux') diff --git a/include/linux/pm.h b/include/linux/pm.h index 871c9c49ec9d..93cd34f00822 100644 --- a/include/linux/pm.h +++ b/include/linux/pm.h @@ -375,19 +375,20 @@ const struct dev_pm_ops name = { \ } #ifdef CONFIG_PM -#define _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, runtime_suspend_fn, \ - runtime_resume_fn, idle_fn, sec, ns) \ - _DEFINE_DEV_PM_OPS(name, suspend_fn, resume_fn, runtime_suspend_fn, \ - runtime_resume_fn, idle_fn); \ - __EXPORT_SYMBOL(name, sec, ns) +#define _EXPORT_DEV_PM_OPS(name, sec, ns) \ + const struct dev_pm_ops name; \ + __EXPORT_SYMBOL(name, sec, ns); \ + const struct dev_pm_ops name #else -#define _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, runtime_suspend_fn, \ - runtime_resume_fn, idle_fn, sec, ns) \ -static __maybe_unused _DEFINE_DEV_PM_OPS(__static_##name, suspend_fn, \ - resume_fn, runtime_suspend_fn, \ - runtime_resume_fn, idle_fn) +#define _EXPORT_DEV_PM_OPS(name, sec, ns) \ + static __maybe_unused const struct dev_pm_ops __static_##name #endif +#define EXPORT_DEV_PM_OPS(name) _EXPORT_DEV_PM_OPS(name, "", "") +#define EXPORT_GPL_DEV_PM_OPS(name) _EXPORT_DEV_PM_OPS(name, "_gpl", "") +#define EXPORT_NS_DEV_PM_OPS(name, ns) _EXPORT_DEV_PM_OPS(name, "", #ns) +#define EXPORT_NS_GPL_DEV_PM_OPS(name, ns) _EXPORT_DEV_PM_OPS(name, "_gpl", #ns) + /* * Use this if you want to use the same suspend and resume callbacks for suspend * to RAM and hibernation. @@ -399,13 +400,21 @@ static __maybe_unused _DEFINE_DEV_PM_OPS(__static_##name, suspend_fn, \ _DEFINE_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL) #define EXPORT_SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \ - _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL, "", "") + EXPORT_DEV_PM_OPS(name) = { \ + SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \ + } #define EXPORT_GPL_SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \ - _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL, "_gpl", "") + EXPORT_GPL_DEV_PM_OPS(name) = { \ + SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \ + } #define EXPORT_NS_SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn, ns) \ - _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL, "", #ns) + EXPORT_NS_DEV_PM_OPS(name, ns) = { \ + SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \ + } #define EXPORT_NS_GPL_SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn, ns) \ - _EXPORT_DEV_PM_OPS(name, suspend_fn, resume_fn, NULL, NULL, NULL, "_gpl", #ns) + EXPORT_NS_GPL_DEV_PM_OPS(name, ns) = { \ + SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \ + } /* Deprecated. Use DEFINE_SIMPLE_DEV_PM_OPS() instead. */ #define SIMPLE_DEV_PM_OPS(name, suspend_fn, resume_fn) \ diff --git a/include/linux/pm_runtime.h b/include/linux/pm_runtime.h index 0a41b2dcccad..9a8151a2bdea 100644 --- a/include/linux/pm_runtime.h +++ b/include/linux/pm_runtime.h @@ -40,17 +40,21 @@ resume_fn, idle_fn) #define EXPORT_RUNTIME_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn) \ - _EXPORT_DEV_PM_OPS(name, pm_runtime_force_suspend, pm_runtime_force_resume, \ - suspend_fn, resume_fn, idle_fn, "", "") + EXPORT_DEV_PM_OPS(name) = { \ + RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \ + } #define EXPORT_GPL_RUNTIME_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn) \ - _EXPORT_DEV_PM_OPS(name, pm_runtime_force_suspend, pm_runtime_force_resume, \ - suspend_fn, resume_fn, idle_fn, "_gpl", "") + EXPORT_GPL_DEV_PM_OPS(name) = { \ + RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \ + } #define EXPORT_NS_RUNTIME_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn, ns) \ - _EXPORT_DEV_PM_OPS(name, pm_runtime_force_suspend, pm_runtime_force_resume, \ - suspend_fn, resume_fn, idle_fn, "", #ns) + EXPORT_NS_DEV_PM_OPS(name, ns) = { \ + RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \ + } #define EXPORT_NS_GPL_RUNTIME_DEV_PM_OPS(name, suspend_fn, resume_fn, idle_fn, ns) \ - _EXPORT_DEV_PM_OPS(name, pm_runtime_force_suspend, pm_runtime_force_resume, \ - suspend_fn, resume_fn, idle_fn, "_gpl", #ns) + EXPORT_NS_GPL_DEV_PM_OPS(name, ns) = { \ + RUNTIME_PM_OPS(suspend_fn, resume_fn, idle_fn) \ + } #ifdef CONFIG_PM extern struct workqueue_struct *pm_wq; -- cgit From a9cfee0ef98e99c8b1951dfd1d57a88580354d0d Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Wed, 28 Sep 2022 23:38:53 +0800 Subject: f2fs: support recording stop_checkpoint reason into super_block This patch supports to record stop_checkpoint error into f2fs_super_block.s_stop_reason[]. Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim --- include/linux/f2fs_fs.h | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h index d445150c5350..5dd1e52b8997 100644 --- a/include/linux/f2fs_fs.h +++ b/include/linux/f2fs_fs.h @@ -73,6 +73,20 @@ struct f2fs_device { __le32 total_segments; } __packed; +/* reason of stop_checkpoint */ +enum stop_cp_reason { + STOP_CP_REASON_SHUTDOWN, + STOP_CP_REASON_FAULT_INJECT, + STOP_CP_REASON_META_PAGE, + STOP_CP_REASON_WRITE_FAIL, + STOP_CP_REASON_CORRUPTED_SUMMARY, + STOP_CP_REASON_UPDATE_INODE, + STOP_CP_REASON_FLUSH_FAIL, + STOP_CP_REASON_MAX, +}; + +#define MAX_STOP_REASON 32 + struct f2fs_super_block { __le32 magic; /* Magic Number */ __le16 major_ver; /* Major Version */ @@ -116,7 +130,8 @@ struct f2fs_super_block { __u8 hot_ext_count; /* # of hot file extension */ __le16 s_encoding; /* Filename charset encoding */ __le16 s_encoding_flags; /* Filename charset encoding flags */ - __u8 reserved[306]; /* valid reserved region */ + __u8 s_stop_reason[MAX_STOP_REASON]; /* stop checkpoint reason */ + __u8 reserved[274]; /* valid reserved region */ __le32 crc; /* checksum of superblock */ } __packed; -- cgit From 95fa90c9e5a7f14c2497d5b032544478c9377c3a Mon Sep 17 00:00:00 2001 From: Chao Yu Date: Wed, 28 Sep 2022 23:38:54 +0800 Subject: f2fs: support recording errors into superblock This patch supports to record detail reason of FSCORRUPTED error into f2fs_super_block.s_errors[]. Signed-off-by: Chao Yu Signed-off-by: Jaegeuk Kim --- include/linux/f2fs_fs.h | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/f2fs_fs.h b/include/linux/f2fs_fs.h index 5dd1e52b8997..ee0d75d9a302 100644 --- a/include/linux/f2fs_fs.h +++ b/include/linux/f2fs_fs.h @@ -87,6 +87,28 @@ enum stop_cp_reason { #define MAX_STOP_REASON 32 +/* detail reason for EFSCORRUPTED */ +enum f2fs_error { + ERROR_CORRUPTED_CLUSTER, + ERROR_FAIL_DECOMPRESSION, + ERROR_INVALID_BLKADDR, + ERROR_CORRUPTED_DIRENT, + ERROR_CORRUPTED_INODE, + ERROR_INCONSISTENT_SUMMARY, + ERROR_INCONSISTENT_FOOTER, + ERROR_INCONSISTENT_SUM_TYPE, + ERROR_CORRUPTED_JOURNAL, + ERROR_INCONSISTENT_NODE_COUNT, + ERROR_INCONSISTENT_BLOCK_COUNT, + ERROR_INVALID_CURSEG, + ERROR_INCONSISTENT_SIT, + ERROR_CORRUPTED_VERITY_XATTR, + ERROR_CORRUPTED_XATTR, + ERROR_MAX, +}; + +#define MAX_F2FS_ERRORS 16 + struct f2fs_super_block { __le32 magic; /* Magic Number */ __le16 major_ver; /* Major Version */ @@ -131,7 +153,8 @@ struct f2fs_super_block { __le16 s_encoding; /* Filename charset encoding */ __le16 s_encoding_flags; /* Filename charset encoding flags */ __u8 s_stop_reason[MAX_STOP_REASON]; /* stop checkpoint reason */ - __u8 reserved[274]; /* valid reserved region */ + __u8 s_errors[MAX_F2FS_ERRORS]; /* reason of image corrupts */ + __u8 reserved[258]; /* valid reserved region */ __le32 crc; /* checksum of superblock */ } __packed; -- cgit From f1375ec1df09fb09fecfdf9418d3784e7ba12469 Mon Sep 17 00:00:00 2001 From: Meng Li Date: Wed, 17 Aug 2022 11:46:27 +0800 Subject: cpufreq: amd-pstate: Expose struct amd_cpudata Expose struct amd_cpudata to AMD P-State unit test module. This data struct will be used on the following AMD P-State unit test (amd-pstate-ut) module. The amd-pstate-ut module can get some AMD infomations by this data struct. For example: highest perf, nominal perf, boost supported etc. Signed-off-by: Meng Li Acked-by: Huang Rui Acked-by: Shuah Khan Signed-off-by: Shuah Khan --- include/linux/amd-pstate.h | 77 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 77 insertions(+) create mode 100644 include/linux/amd-pstate.h (limited to 'include/linux') diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h new file mode 100644 index 000000000000..1c4b8659f171 --- /dev/null +++ b/include/linux/amd-pstate.h @@ -0,0 +1,77 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * linux/include/linux/amd-pstate.h + * + * Copyright (C) 2022 Advanced Micro Devices, Inc. + * + * Author: Meng Li + */ + +#ifndef _LINUX_AMD_PSTATE_H +#define _LINUX_AMD_PSTATE_H + +#include + +/********************************************************************* + * AMD P-state INTERFACE * + *********************************************************************/ +/** + * struct amd_aperf_mperf + * @aperf: actual performance frequency clock count + * @mperf: maximum performance frequency clock count + * @tsc: time stamp counter + */ +struct amd_aperf_mperf { + u64 aperf; + u64 mperf; + u64 tsc; +}; + +/** + * struct amd_cpudata - private CPU data for AMD P-State + * @cpu: CPU number + * @req: constraint request to apply + * @cppc_req_cached: cached performance request hints + * @highest_perf: the maximum performance an individual processor may reach, + * assuming ideal conditions + * @nominal_perf: the maximum sustained performance level of the processor, + * assuming ideal operating conditions + * @lowest_nonlinear_perf: the lowest performance level at which nonlinear power + * savings are achieved + * @lowest_perf: the absolute lowest performance level of the processor + * @max_freq: the frequency that mapped to highest_perf + * @min_freq: the frequency that mapped to lowest_perf + * @nominal_freq: the frequency that mapped to nominal_perf + * @lowest_nonlinear_freq: the frequency that mapped to lowest_nonlinear_perf + * @cur: Difference of Aperf/Mperf/tsc count between last and current sample + * @prev: Last Aperf/Mperf/tsc count value read from register + * @freq: current cpu frequency value + * @boost_supported: check whether the Processor or SBIOS supports boost mode + * + * The amd_cpudata is key private data for each CPU thread in AMD P-State, and + * represents all the attributes and goals that AMD P-State requests at runtime. + */ +struct amd_cpudata { + int cpu; + + struct freq_qos_request req[2]; + u64 cppc_req_cached; + + u32 highest_perf; + u32 nominal_perf; + u32 lowest_nonlinear_perf; + u32 lowest_perf; + + u32 max_freq; + u32 min_freq; + u32 nominal_freq; + u32 lowest_nonlinear_freq; + + struct amd_aperf_mperf cur; + struct amd_aperf_mperf prev; + + u64 freq; + bool boost_supported; +}; + +#endif /* _LINUX_AMD_PSTATE_H */ -- cgit From 07d2872bf4c864eb83d034263c155746a2fb7a3b Mon Sep 17 00:00:00 2001 From: Avri Altman Date: Wed, 28 Sep 2022 12:57:44 +0300 Subject: mmc: core: Add SD card quirk for broken discard Some SD-cards from Sandisk that are SDA-6.0 compliant reports they supports discard, while they actually don't. This might cause mk2fs to fail while trying to format the card and revert it to a read-only mode. To fix this problem, let's add a card quirk (MMC_QUIRK_BROKEN_SD_DISCARD) to indicate that we shall fall-back to use the legacy erase command instead. Signed-off-by: Avri Altman Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220928095744.16455-1-avri.altman@wdc.com [Ulf: Updated the commit message] Signed-off-by: Ulf Hansson --- include/linux/mmc/card.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h index 8a30de08e913..c726ea781255 100644 --- a/include/linux/mmc/card.h +++ b/include/linux/mmc/card.h @@ -293,6 +293,7 @@ struct mmc_card { #define MMC_QUIRK_BROKEN_IRQ_POLLING (1<<11) /* Polling SDIO_CCCR_INTx could create a fake interrupt */ #define MMC_QUIRK_TRIM_BROKEN (1<<12) /* Skip trim */ #define MMC_QUIRK_BROKEN_HPI (1<<13) /* Disable broken HPI support */ +#define MMC_QUIRK_BROKEN_SD_DISCARD (1<<14) /* Disable broken SD discard support */ bool reenable_cmdq; /* Re-enable Command Queue */ -- cgit From 90d482908eedd56f01a707325aa541cf9c40f936 Mon Sep 17 00:00:00 2001 From: Valentin Schneider Date: Mon, 3 Oct 2022 16:34:17 +0100 Subject: lib/find_bit: Introduce find_next_andnot_bit() In preparation of introducing for_each_cpu_andnot(), add a variant of find_next_bit() that negate the bits in @addr2 when ANDing them with the bits in @addr1. Signed-off-by: Valentin Schneider --- include/linux/find.h | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) (limited to 'include/linux') diff --git a/include/linux/find.h b/include/linux/find.h index 0cdfab9734a6..2235af19e7e1 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -12,6 +12,8 @@ unsigned long _find_next_bit(const unsigned long *addr1, unsigned long nbits, unsigned long start); unsigned long _find_next_and_bit(const unsigned long *addr1, const unsigned long *addr2, unsigned long nbits, unsigned long start); +unsigned long _find_next_andnot_bit(const unsigned long *addr1, const unsigned long *addr2, + unsigned long nbits, unsigned long start); unsigned long _find_next_zero_bit(const unsigned long *addr, unsigned long nbits, unsigned long start); extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size); @@ -91,6 +93,37 @@ unsigned long find_next_and_bit(const unsigned long *addr1, } #endif +#ifndef find_next_andnot_bit +/** + * find_next_andnot_bit - find the next set bit in *addr1 excluding all the bits + * in *addr2 + * @addr1: The first address to base the search on + * @addr2: The second address to base the search on + * @size: The bitmap size in bits + * @offset: The bitnumber to start searching at + * + * Returns the bit number for the next set bit + * If no bits are set, returns @size. + */ +static inline +unsigned long find_next_andnot_bit(const unsigned long *addr1, + const unsigned long *addr2, unsigned long size, + unsigned long offset) +{ + if (small_const_nbits(size)) { + unsigned long val; + + if (unlikely(offset >= size)) + return size; + + val = *addr1 & ~*addr2 & GENMASK(size - 1, offset); + return val ? __ffs(val) : size; + } + + return _find_next_andnot_bit(addr1, addr2, size, offset); +} +#endif + #ifndef find_next_zero_bit /** * find_next_zero_bit - find the next cleared bit in a memory region -- cgit From 5f75ff295c662c1c8fb9e1737e9dc3b9a1e7fb29 Mon Sep 17 00:00:00 2001 From: Valentin Schneider Date: Mon, 3 Oct 2022 16:34:18 +0100 Subject: cpumask: Introduce for_each_cpu_andnot() for_each_cpu_and() is very convenient as it saves having to allocate a temporary cpumask to store the result of cpumask_and(). The same issue applies to cpumask_andnot() which doesn't actually need temporary storage for iteration purposes. Following what has been done for for_each_cpu_and(), introduce for_each_cpu_andnot(). Signed-off-by: Valentin Schneider --- include/linux/cpumask.h | 18 ++++++++++++++++++ include/linux/find.h | 5 +++++ 2 files changed, 23 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 286804bfe3b7..2f065ad97541 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -306,6 +306,24 @@ unsigned int __pure cpumask_next_wrap(int n, const struct cpumask *mask, int sta #define for_each_cpu_and(cpu, mask1, mask2) \ for_each_and_bit(cpu, cpumask_bits(mask1), cpumask_bits(mask2), nr_cpumask_bits) +/** + * for_each_cpu_andnot - iterate over every cpu present in one mask, excluding + * those present in another. + * @cpu: the (optionally unsigned) integer iterator + * @mask1: the first cpumask pointer + * @mask2: the second cpumask pointer + * + * This saves a temporary CPU mask in many places. It is equivalent to: + * struct cpumask tmp; + * cpumask_andnot(&tmp, &mask1, &mask2); + * for_each_cpu(cpu, &tmp) + * ... + * + * After the loop, cpu is >= nr_cpu_ids. + */ +#define for_each_cpu_andnot(cpu, mask1, mask2) \ + for_each_andnot_bit(cpu, cpumask_bits(mask1), cpumask_bits(mask2), nr_cpumask_bits) + /** * cpumask_any_but - return a "random" in a cpumask, but not this one. * @mask: the cpumask to search diff --git a/include/linux/find.h b/include/linux/find.h index 2235af19e7e1..ccaf61a0f5fd 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -498,6 +498,11 @@ unsigned long find_next_bit_le(const void *addr, unsigned (bit) = find_next_and_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\ (bit)++) +#define for_each_andnot_bit(bit, addr1, addr2, size) \ + for ((bit) = 0; \ + (bit) = find_next_andnot_bit((addr1), (addr2), (size), (bit)), (bit) < (size);\ + (bit)++) + /* same as for_each_set_bit() but use bit as value to start with */ #define for_each_set_bit_from(bit, addr, size) \ for (; (bit) = find_next_bit((addr), (size), (bit)), (bit) < (size); (bit)++) -- cgit From 39494194f93bed7926d4b3bd03a6a76ba23e612b Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Wed, 5 Oct 2022 15:57:35 -0400 Subject: SUNRPC: Fix races with rpc_killall_tasks() Ensure that we immediately call rpc_exit_task() after waking up, and that the tk_rpc_status cannot get clobbered by some other function. Signed-off-by: Trond Myklebust Signed-off-by: Anna Schumaker --- include/linux/sunrpc/sched.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index acc62647317c..647247040ef9 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -209,6 +209,7 @@ struct rpc_task *rpc_run_task(const struct rpc_task_setup *); struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req); void rpc_put_task(struct rpc_task *); void rpc_put_task_async(struct rpc_task *); +bool rpc_task_set_rpc_status(struct rpc_task *task, int rpc_status); void rpc_signal_task(struct rpc_task *); void rpc_exit_task(struct rpc_task *); void rpc_exit(struct rpc_task *, int); -- cgit From f8423909ecca208834a9d704e58409800f8b5f21 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Wed, 5 Oct 2022 15:57:36 -0400 Subject: SUNRPC: Add a helper to allow pNFS drivers to selectively cancel RPC calls Add the helper rpc_cancel_tasks(), which uses a caller-defined selection function to define a set of in-flight RPC calls to cancel. This is mainly intended for pNFS drivers which are subject to a layout recall, and which may therefore want to cancel all pending I/O using that layout in order to redrive it after the layout recall has been satisfied. Signed-off-by: Trond Myklebust Signed-off-by: Anna Schumaker --- include/linux/sunrpc/sched.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h index 647247040ef9..cdcf0fe56a6f 100644 --- a/include/linux/sunrpc/sched.h +++ b/include/linux/sunrpc/sched.h @@ -210,11 +210,16 @@ struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req); void rpc_put_task(struct rpc_task *); void rpc_put_task_async(struct rpc_task *); bool rpc_task_set_rpc_status(struct rpc_task *task, int rpc_status); +void rpc_task_try_cancel(struct rpc_task *task, int error); void rpc_signal_task(struct rpc_task *); void rpc_exit_task(struct rpc_task *); void rpc_exit(struct rpc_task *, int); void rpc_release_calldata(const struct rpc_call_ops *, void *); void rpc_killall_tasks(struct rpc_clnt *); +unsigned long rpc_cancel_tasks(struct rpc_clnt *clnt, int error, + bool (*fnmatch)(const struct rpc_task *, + const void *), + const void *data); void rpc_execute(struct rpc_task *); void rpc_init_priority_wait_queue(struct rpc_wait_queue *, const char *); void rpc_init_wait_queue(struct rpc_wait_queue *, const char *); -- cgit From dc4c4304855a5721d214e2a53e17df5152dd5f34 Mon Sep 17 00:00:00 2001 From: Trond Myklebust Date: Wed, 5 Oct 2022 15:57:37 -0400 Subject: SUNRPC: Add API to force the client to disconnect Allow the caller to force a disconnection of the RPC client so that we can clear any pending requests that are buffered in the socket. Signed-off-by: Trond Myklebust Signed-off-by: Anna Schumaker --- include/linux/sunrpc/clnt.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h index 75eea5ebb179..770ef2cb5775 100644 --- a/include/linux/sunrpc/clnt.h +++ b/include/linux/sunrpc/clnt.h @@ -246,6 +246,7 @@ void rpc_clnt_xprt_switch_remove_xprt(struct rpc_clnt *, struct rpc_xprt *); bool rpc_clnt_xprt_switch_has_addr(struct rpc_clnt *clnt, const struct sockaddr *sap); void rpc_clnt_xprt_set_online(struct rpc_clnt *clnt, struct rpc_xprt *xprt); +void rpc_clnt_disconnect(struct rpc_clnt *clnt); void rpc_cleanup_clids(void); static inline int rpc_reply_expected(struct rpc_task *task) -- cgit From a890d1c657ecba73a7b28591c92587aef1be1888 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Wed, 5 Oct 2022 12:54:38 +0200 Subject: random: clear new batches when bringing new CPUs online The commit that added the new get_random_{u8,u16}() functions neglected to update the code that clears the batches when bringing up a new CPU. It also forgot a few comments and helper defines, so add those in too. Fixes: 585cd5fe9f73 ("random: add 8-bit and 16-bit batches") Signed-off-by: Jason A. Donenfeld --- include/linux/random.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/random.h b/include/linux/random.h index 2c130f8f18e5..08322f700cdc 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -96,6 +96,8 @@ static inline int get_random_bytes_wait(void *buf, size_t nbytes) *out = get_random_ ## name(); \ return 0; \ } +declare_get_random_var_wait(u8, u8) +declare_get_random_var_wait(u16, u16) declare_get_random_var_wait(u32, u32) declare_get_random_var_wait(u64, u32) declare_get_random_var_wait(int, unsigned int) -- cgit From e3e6e1d16a4cf7b63159ec71774e822194071954 Mon Sep 17 00:00:00 2001 From: Hawkins Jiawei Date: Tue, 27 Sep 2022 07:34:59 +0800 Subject: wifi: wext: use flex array destination for memcpy() Syzkaller reports buffer overflow false positive as follows: ------------[ cut here ]------------ memcpy: detected field-spanning write (size 8) of single field "&compat_event->pointer" at net/wireless/wext-core.c:623 (size 4) WARNING: CPU: 0 PID: 3607 at net/wireless/wext-core.c:623 wireless_send_event+0xab5/0xca0 net/wireless/wext-core.c:623 Modules linked in: CPU: 1 PID: 3607 Comm: syz-executor659 Not tainted 6.0.0-rc6-next-20220921-syzkaller #0 [...] Call Trace: ioctl_standard_call+0x155/0x1f0 net/wireless/wext-core.c:1022 wireless_process_ioctl+0xc8/0x4c0 net/wireless/wext-core.c:955 wext_ioctl_dispatch net/wireless/wext-core.c:988 [inline] wext_ioctl_dispatch net/wireless/wext-core.c:976 [inline] wext_handle_ioctl+0x26b/0x280 net/wireless/wext-core.c:1049 sock_ioctl+0x285/0x640 net/socket.c:1220 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] Wireless events will be sent on the appropriate channels in wireless_send_event(). Different wireless events may have different payload structure and size, so kernel uses **len** and **cmd** field in struct __compat_iw_event as wireless event common LCP part, uses **pointer** as a label to mark the position of remaining different part. Yet the problem is that, **pointer** is a compat_caddr_t type, which may be smaller than the relative structure at the same position. So during wireless_send_event() tries to parse the wireless events payload, it may trigger the memcpy() run-time destination buffer bounds checking when the relative structure's data is copied to the position marked by **pointer**. This patch solves it by introducing flexible-array field **ptr_bytes**, to mark the position of the wireless events remaining part next to LCP part. What's more, this patch also adds **ptr_len** variable in wireless_send_event() to improve its maintainability. Reported-and-tested-by: syzbot+473754e5af963cf014cf@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/00000000000070db2005e95a5984@google.com/ Suggested-by: Kees Cook Reviewed-by: Kees Cook Signed-off-by: Hawkins Jiawei Signed-off-by: Johannes Berg --- include/linux/wireless.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/wireless.h b/include/linux/wireless.h index 2d1b54556eff..e6e34d74dda0 100644 --- a/include/linux/wireless.h +++ b/include/linux/wireless.h @@ -26,7 +26,15 @@ struct compat_iw_point { struct __compat_iw_event { __u16 len; /* Real length of this stuff */ __u16 cmd; /* Wireless IOCTL */ - compat_caddr_t pointer; + + union { + compat_caddr_t pointer; + + /* we need ptr_bytes to make memcpy() run-time destination + * buffer bounds checking happy, nothing special + */ + DECLARE_FLEX_ARRAY(__u8, ptr_bytes); + }; }; #define IW_EV_COMPAT_LCP_LEN offsetof(struct __compat_iw_event, pointer) #define IW_EV_COMPAT_POINT_OFF offsetof(struct compat_iw_point, length) -- cgit From cdbd952bb7b5fca36676b3d318796b196b127397 Mon Sep 17 00:00:00 2001 From: "Michael S. Tsirkin" Date: Mon, 15 Aug 2022 18:04:20 -0400 Subject: virtio: drop vp_legacy_set_queue_size There's actually no way to set queue size on legacy virtio pci. Signed-off-by: Michael S. Tsirkin Message-Id: <20220815220447.155860-1-mst@redhat.com> --- include/linux/virtio_pci_legacy.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/virtio_pci_legacy.h b/include/linux/virtio_pci_legacy.h index e5d665faf00e..a8dc757d0367 100644 --- a/include/linux/virtio_pci_legacy.h +++ b/include/linux/virtio_pci_legacy.h @@ -32,8 +32,6 @@ void vp_legacy_set_queue_address(struct virtio_pci_legacy_device *ldev, u16 index, u32 queue_pfn); bool vp_legacy_get_queue_enable(struct virtio_pci_legacy_device *ldev, u16 idx); -void vp_legacy_set_queue_size(struct virtio_pci_legacy_device *ldev, - u16 idx, u16 size); u16 vp_legacy_get_queue_size(struct virtio_pci_legacy_device *ldev, u16 idx); int vp_legacy_probe(struct virtio_pci_legacy_device *ldev); -- cgit From 90fea5a800c3dd80fb8ad9a02929bcef5fde42b8 Mon Sep 17 00:00:00 2001 From: Jason Wang Date: Tue, 27 Sep 2022 15:48:08 +0800 Subject: vdpa: device feature provisioning This patch allows the device features to be provisioned through netlink. A new attribute is introduced to allow the userspace to pass a 64bit device features during device adding. This provides several advantages: - Allow to provision a subset of the features to ease the cross vendor live migration. - Better debug-ability for vDPA framework and parent. Reviewed-by: Eli Cohen Signed-off-by: Jason Wang Message-Id: <20220927074810.28627-2-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin --- include/linux/vdpa.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h index d282f464d2f1..6d0f5e4e82c2 100644 --- a/include/linux/vdpa.h +++ b/include/linux/vdpa.h @@ -104,6 +104,7 @@ struct vdpa_iova_range { }; struct vdpa_dev_set_config { + u64 device_features; struct { u8 mac[ETH_ALEN]; u16 mtu; -- cgit From 4b22ef042d6f54a6e5899555f2db71749133eca8 Mon Sep 17 00:00:00 2001 From: Jason Gunthorpe Date: Fri, 7 Oct 2022 11:04:39 -0300 Subject: vfio: Add vfio_file_is_group() This replaces uses of vfio_file_iommu_group() which were only detecting if the file is a VFIO file with no interest in the actual group. The only remaning user of vfio_file_iommu_group() is in KVM for the SPAPR stuff. It passes the iommu_group into the arch code through kvm for some reason. Tested-by: Matthew Rosato Tested-by: Christian Borntraeger Tested-by: Eric Farman Signed-off-by: Jason Gunthorpe Link: https://lore.kernel.org/r/1-v2-15417f29324e+1c-vfio_group_disassociate_jgg@nvidia.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index ee399a768070..e7cebeb875dd 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -199,6 +199,7 @@ int vfio_mig_get_next_state(struct vfio_device *device, * External user API */ struct iommu_group *vfio_file_iommu_group(struct file *file); +bool vfio_file_is_group(struct file *file); bool vfio_file_enforced_coherent(struct file *file); void vfio_file_set_kvm(struct file *file, struct kvm *kvm); bool vfio_file_has_dev(struct file *file, struct vfio_device *device); -- cgit From 9afd20a5a1b3558950b5e12f3ed057ef93c64a05 Mon Sep 17 00:00:00 2001 From: Viresh Kumar Date: Mon, 10 Oct 2022 09:52:37 +0530 Subject: clk: spear: Move prototype to accessible header Fixes the following W=1 kernel build warning(s): drivers/clk/spear/spear6xx_clock.c:116:13: warning: no previous prototype for function 'spear6xx_clk_init' [-Wmissing-prototypes] Reported-by: kernel test robot Signed-off-by: Viresh Kumar Signed-off-by: Arnd Bergmann --- include/linux/clk/spear.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include/linux') diff --git a/include/linux/clk/spear.h b/include/linux/clk/spear.h index a64d034ceddd..eaf95ca656f8 100644 --- a/include/linux/clk/spear.h +++ b/include/linux/clk/spear.h @@ -8,6 +8,20 @@ #ifndef __LINUX_CLK_SPEAR_H #define __LINUX_CLK_SPEAR_H +#ifdef CONFIG_ARCH_SPEAR3XX +void __init spear3xx_clk_init(void __iomem *misc_base, + void __iomem *soc_config_base); +#else +static inline void __init spear3xx_clk_init(void __iomem *misc_base, + void __iomem *soc_config_base) {} +#endif + +#ifdef CONFIG_ARCH_SPEAR6XX +void __init spear6xx_clk_init(void __iomem *misc_base); +#else +static inline void __init spear6xx_clk_init(void __iomem *misc_base) {} +#endif + #ifdef CONFIG_MACH_SPEAR1310 void __init spear1310_clk_init(void __iomem *misc_base, void __iomem *ras_base); #else -- cgit From b994d8f0773cf3b01129c094d00050710f2c422b Mon Sep 17 00:00:00 2001 From: Jonathan Neuschäfer Date: Sat, 8 Oct 2022 17:14:59 +0200 Subject: spi: spi-mem: Fix typo (of -> or) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit In this instance, "or" makes more sense than "of", so I guess that "or" was intended and "of" was a typo. Signed-off-by: Jonathan Neuschäfer Link: https://lore.kernel.org/r/20221008151459.1421406-1-j.neuschaefer@gmx.net Signed-off-by: Mark Brown --- include/linux/spi/spi-mem.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/spi/spi-mem.h b/include/linux/spi/spi-mem.h index 2ba044d0d5e5..8e984d75f5b6 100644 --- a/include/linux/spi/spi-mem.h +++ b/include/linux/spi/spi-mem.h @@ -225,7 +225,7 @@ static inline void *spi_mem_get_drvdata(struct spi_mem *mem) /** * struct spi_controller_mem_ops - SPI memory operations * @adjust_op_size: shrink the data xfer of an operation to match controller's - * limitations (can be alignment of max RX/TX size + * limitations (can be alignment or max RX/TX size * limitations) * @supports_op: check if an operation is supported by the controller * @exec_op: execute a SPI memory operation -- cgit From ca5eebda3e1c1a58a1c5a337da393ed6734593e3 Mon Sep 17 00:00:00 2001 From: Brian Foster Date: Mon, 3 Oct 2022 09:35:34 -0400 Subject: block: avoid sign extend problem with default queue flags mask request_queue->queue_flags is unsigned long, which is 8-bytes on 64-bit architectures. Most queue flag modifications occur through bit field helpers, but default flags can be logically OR'd via the QUEUE_FLAG_MQ_DEFAULT mask. If this mask happens to include bit 31, the assignment can sign extend the field and set all upper 32 bits. This exact problem has been observed on a downstream kernel that happens to use bit 31 for QUEUE_FLAG_NOWAIT. This is not an immediate problem for current upstream because bit 31 is not included in the default flag assignment (and is not used at all, actually). Regardless, fix up the QUEUE_FLAG_MQ_DEFAULT mask definition to avoid the landmine in the future. Signed-off-by: Brian Foster Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20221003133534.1075582-1-bfoster@redhat.com Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 49373d002631..e4fb47753177 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -580,9 +580,9 @@ struct request_queue { #define QUEUE_FLAG_NOWAIT 29 /* device supports NOWAIT */ #define QUEUE_FLAG_SQ_SCHED 30 /* single queue style io dispatch */ -#define QUEUE_FLAG_MQ_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \ - (1 << QUEUE_FLAG_SAME_COMP) | \ - (1 << QUEUE_FLAG_NOWAIT)) +#define QUEUE_FLAG_MQ_DEFAULT ((1UL << QUEUE_FLAG_IO_STAT) | \ + (1UL << QUEUE_FLAG_SAME_COMP) | \ + (1UL << QUEUE_FLAG_NOWAIT)) void blk_queue_flag_set(unsigned int flag, struct request_queue *q); void blk_queue_flag_clear(unsigned int flag, struct request_queue *q); -- cgit From a6d1ce5951185ee91bbe6909fe2758f3625561b0 Mon Sep 17 00:00:00 2001 From: Yosry Ahmed Date: Tue, 11 Oct 2022 00:33:58 +0000 Subject: cgroup: add cgroup_v1v2_get_from_[fd/file]() Add cgroup_v1v2_get_from_fd() and cgroup_v1v2_get_from_file() that support both cgroup1 and cgroup2. Signed-off-by: Yosry Ahmed Signed-off-by: Tejun Heo --- include/linux/cgroup.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 398f0bce7c21..a88de5bdeaa9 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -106,6 +106,7 @@ struct cgroup_subsys_state *css_tryget_online_from_dir(struct dentry *dentry, struct cgroup *cgroup_get_from_path(const char *path); struct cgroup *cgroup_get_from_fd(int fd); +struct cgroup *cgroup_v1v2_get_from_fd(int fd); int cgroup_attach_task_all(struct task_struct *from, struct task_struct *); int cgroup_transfer_tasks(struct cgroup *to, struct cgroup *from); -- cgit From 81895a65ec63ee1daec3255dc1a06675d2fbe915 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Wed, 5 Oct 2022 16:43:38 +0200 Subject: treewide: use prandom_u32_max() when possible, part 1 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Rather than incurring a division or requesting too many random bytes for the given range, use the prandom_u32_max() function, which only takes the minimum required bytes from the RNG and avoids divisions. This was done mechanically with this coccinelle script: @basic@ expression E; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; typedef u64; @@ ( - ((T)get_random_u32() % (E)) + prandom_u32_max(E) | - ((T)get_random_u32() & ((E) - 1)) + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2) | - ((u64)(E) * get_random_u32() >> 32) + prandom_u32_max(E) | - ((T)get_random_u32() & ~PAGE_MASK) + prandom_u32_max(PAGE_SIZE) ) @multi_line@ identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; identifier RAND; expression E; @@ - RAND = get_random_u32(); ... when != RAND - RAND %= (E); + RAND = prandom_u32_max(E); // Find a potential literal @literal_mask@ expression LITERAL; type T; identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32"; position p; @@ ((T)get_random_u32()@p & (LITERAL)) // Add one to the literal. @script:python add_one@ literal << literal_mask.LITERAL; RESULT; @@ value = None if literal.startswith('0x'): value = int(literal, 16) elif literal[0] in '123456789': value = int(literal, 10) if value is None: print("I don't know how to handle %s" % (literal)) cocci.include_match(False) elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1: print("Skipping 0x%x for cleanup elsewhere" % (value)) cocci.include_match(False) elif value & (value + 1) != 0: print("Skipping 0x%x because it's not a power of two minus one" % (value)) cocci.include_match(False) elif literal.startswith('0x'): coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1)) else: coccinelle.RESULT = cocci.make_expr("%d" % (value + 1)) // Replace the literal mask with the calculated result. @plus_one@ expression literal_mask.LITERAL; position literal_mask.p; expression add_one.RESULT; identifier FUNC; @@ - (FUNC()@p & (LITERAL)) + prandom_u32_max(RESULT) @collapse_ret@ type T; identifier VAR; expression E; @@ { - T VAR; - VAR = (E); - return VAR; + return E; } @drop_var@ type T; identifier VAR; @@ { - T VAR; ... when != VAR } Reviewed-by: Greg Kroah-Hartman Reviewed-by: Kees Cook Reviewed-by: Yury Norov Reviewed-by: KP Singh Reviewed-by: Jan Kara # for ext4 and sbitmap Reviewed-by: Christoph Böhmwalder # for drbd Acked-by: Jakub Kicinski Acked-by: Heiko Carstens # for s390 Acked-by: Ulf Hansson # for mmc Acked-by: Darrick J. Wong # for xfs Signed-off-by: Jason A. Donenfeld --- include/linux/nodemask.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 378956c93c94..efef68c9352a 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -516,7 +516,7 @@ static inline int node_random(const nodemask_t *maskp) bit = first_node(*maskp); break; default: - bit = find_nth_bit(maskp->bits, MAX_NUMNODES, get_random_int() % w); + bit = find_nth_bit(maskp->bits, MAX_NUMNODES, prandom_u32_max(w)); break; } return bit; -- cgit From de492c83cae0af72de370b9404aacda93dafcad5 Mon Sep 17 00:00:00 2001 From: "Jason A. Donenfeld" Date: Wed, 5 Oct 2022 17:50:20 +0200 Subject: prandom: remove unused functions With no callers left of prandom_u32() and prandom_bytes(), as well as get_random_int(), remove these deprecated wrappers, in favor of get_random_u32() and get_random_bytes(). Reviewed-by: Greg Kroah-Hartman Reviewed-by: Kees Cook Reviewed-by: Yury Norov Acked-by: Jakub Kicinski Signed-off-by: Jason A. Donenfeld --- include/linux/prandom.h | 12 ------------ include/linux/random.h | 5 ----- 2 files changed, 17 deletions(-) (limited to 'include/linux') diff --git a/include/linux/prandom.h b/include/linux/prandom.h index 78db003bc290..e0a0759dd09c 100644 --- a/include/linux/prandom.h +++ b/include/linux/prandom.h @@ -12,18 +12,6 @@ #include #include -/* Deprecated: use get_random_u32 instead. */ -static inline u32 prandom_u32(void) -{ - return get_random_u32(); -} - -/* Deprecated: use get_random_bytes instead. */ -static inline void prandom_bytes(void *buf, size_t nbytes) -{ - return get_random_bytes(buf, nbytes); -} - struct rnd_state { __u32 s1, s2, s3, s4; }; diff --git a/include/linux/random.h b/include/linux/random.h index 08322f700cdc..147a5e0d0b8e 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -42,10 +42,6 @@ u8 get_random_u8(void); u16 get_random_u16(void); u32 get_random_u32(void); u64 get_random_u64(void); -static inline unsigned int get_random_int(void) -{ - return get_random_u32(); -} static inline unsigned long get_random_long(void) { #if BITS_PER_LONG == 64 @@ -100,7 +96,6 @@ declare_get_random_var_wait(u8, u8) declare_get_random_var_wait(u16, u16) declare_get_random_var_wait(u32, u32) declare_get_random_var_wait(u64, u32) -declare_get_random_var_wait(int, unsigned int) declare_get_random_var_wait(long, unsigned long) #undef declare_get_random_var -- cgit From 6a961bffd1c3505c13b4d33bbb8385fe08239cb8 Mon Sep 17 00:00:00 2001 From: Tiezhu Yang Date: Fri, 2 Sep 2022 11:41:46 +0800 Subject: include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype The argument has_signal of arch_do_signal_or_restart() has been removed in commit 8ba62d37949e ("task_work: Call tracehook_notify_signal from get_signal on all architectures"), let us remove the related comment. Link: https://lkml.kernel.org/r/1662090106-5545-1-git-send-email-yangtiezhu@loongson.cn Fixes: 8ba62d37949e ("task_work: Call tracehook_notify_signal from get_signal on all architectures") Signed-off-by: Tiezhu Yang Reviewed-by: Kees Cook Signed-off-by: Andrew Morton --- include/linux/entry-common.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index 84a466b176cf..d95ab85f96ba 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -253,7 +253,6 @@ static __always_inline void arch_exit_to_user_mode(void) { } /** * arch_do_signal_or_restart - Architecture specific signal delivery function * @regs: Pointer to currents pt_regs - * @has_signal: actual signal to handle * * Invoked from exit_to_user_mode_loop(). */ -- cgit From fac35ba763ed07ba93154c95ffc0c4a55023707f Mon Sep 17 00:00:00 2001 From: Baolin Wang Date: Thu, 1 Sep 2022 18:41:31 +0800 Subject: mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page On some architectures (like ARM64), it can support CONT-PTE/PMD size hugetlb, which means it can support not only PMD/PUD size hugetlb (2M and 1G), but also CONT-PTE/PMD size(64K and 32M) if a 4K page size specified. So when looking up a CONT-PTE size hugetlb page by follow_page(), it will use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size hugetlb in follow_page_pte(). However this pte entry lock is incorrect for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get the correct lock, which is mm->page_table_lock. That means the pte entry of the CONT-PTE size hugetlb under current pte lock is unstable in follow_page_pte(), we can continue to migrate or poison the pte entry of the CONT-PTE size hugetlb, which can cause some potential race issues, even though they are under the 'pte lock'. For example, suppose thread A is trying to look up a CONT-PTE size hugetlb page by move_pages() syscall under the lock, however antoher thread B can migrate the CONT-PTE hugetlb page at the same time, which will cause thread A to get an incorrect page, if thread A also wants to do page migration, then data inconsistency error occurs. Moreover we have the same issue for CONT-PMD size hugetlb in follow_huge_pmd(). To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte() to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to get the correct pte entry lock to make the pte entry stable. Mike said: Support for CONT_PMD/_PTE was added with bb9dd3df8ee9 ("arm64: hugetlb: refactor find_num_contig()"). Patch series "Support for contiguous pte hugepages", v4. However, I do not believe these code paths were executed until migration support was added with 5480280d3f2d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages") I would go with 5480280d3f2d for the Fixes: targe. Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.1662017562.git.baolin.wang@linux.alibaba.com Fixes: 5480280d3f2d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages") Signed-off-by: Baolin Wang Suggested-by: Mike Kravetz Reviewed-by: Mike Kravetz Cc: David Hildenbrand Cc: Muchun Song Cc: Signed-off-by: Andrew Morton --- include/linux/hugetlb.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 3ec981a0d8b3..67c88b82fc32 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -207,8 +207,8 @@ struct page *follow_huge_addr(struct mm_struct *mm, unsigned long address, struct page *follow_huge_pd(struct vm_area_struct *vma, unsigned long address, hugepd_t hpd, int flags, int pdshift); -struct page *follow_huge_pmd(struct mm_struct *mm, unsigned long address, - pmd_t *pmd, int flags); +struct page *follow_huge_pmd_pte(struct vm_area_struct *vma, unsigned long address, + int flags); struct page *follow_huge_pud(struct mm_struct *mm, unsigned long address, pud_t *pud, int flags); struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address, @@ -312,8 +312,8 @@ static inline struct page *follow_huge_pd(struct vm_area_struct *vma, return NULL; } -static inline struct page *follow_huge_pmd(struct mm_struct *mm, - unsigned long address, pmd_t *pmd, int flags) +static inline struct page *follow_huge_pmd_pte(struct vm_area_struct *vma, + unsigned long address, int flags) { return NULL; } -- cgit From 0091bfc81741b8d3aeb3b7ab8636f911b2de6e80 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Mon, 3 Oct 2022 13:59:47 +0100 Subject: io_uring/af_unix: defer registered files gc to io_uring release Instead of putting io_uring's registered files in unix_gc() we want it to be done by io_uring itself. The trick here is to consider io_uring registered files for cycle detection but not actually putting them down. Because io_uring can't register other ring instances, this will remove all refs to the ring file triggering the ->release path and clean up with io_ring_ctx_free(). Cc: stable@vger.kernel.org Fixes: 6b06314c47e1 ("io_uring: add file set registration") Reported-and-tested-by: David Bouman Signed-off-by: Pavel Begunkov Signed-off-by: Thadeu Lima de Souza Cascardo [axboe: add kerneldoc comment to skb, fold in skb leak fix] Signed-off-by: Jens Axboe --- include/linux/skbuff.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 9fcf534f2d92..7be5bb4c94b6 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -803,6 +803,7 @@ typedef unsigned char *sk_buff_data_t; * @csum_level: indicates the number of consecutive checksums found in * the packet minus one that have been verified as * CHECKSUM_UNNECESSARY (max 3) + * @scm_io_uring: SKB holds io_uring registered files * @dst_pending_confirm: need to confirm neighbour * @decrypted: Decrypted SKB * @slow_gro: state present at GRO time, slower prepare step required @@ -982,6 +983,7 @@ struct sk_buff { #endif __u8 slow_gro:1; __u8 csum_not_inet:1; + __u8 scm_io_uring:1; #ifdef CONFIG_NET_SCHED __u16 tc_index; /* traffic control index */ -- cgit From b7a817752efc850603c4c23ed78da2b990a6a34a Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Tue, 4 Oct 2022 03:19:25 +0100 Subject: io_uring: remove notif leftovers Notifications were killed but there is a couple of fields and struct declarations left, remove them. Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/8df8877d677be5a2b43afd936d600e60105ea960.1664849941.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring_types.h | 5 ----- 1 file changed, 5 deletions(-) (limited to 'include/linux') diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index aa4d90a53866..f5b687a787a3 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -34,9 +34,6 @@ struct io_file_table { unsigned int alloc_hint; }; -struct io_notif; -struct io_notif_slot; - struct io_hash_bucket { spinlock_t lock; struct hlist_head list; @@ -242,8 +239,6 @@ struct io_ring_ctx { unsigned nr_user_files; unsigned nr_user_bufs; struct io_mapped_ubuf **user_bufs; - struct io_notif_slot *notif_slots; - unsigned nr_notif_slots; struct io_submit_state submit_state; -- cgit From 7be1c1a3c7b13fb259bb5159662a7b83622013b8 Mon Sep 17 00:00:00 2001 From: Alexey Dobriyan Date: Tue, 11 Oct 2022 20:55:31 +0300 Subject: mm: more vma cache removal Link: https://lkml.kernel.org/r/Y0WuE3Riv4iy5Jx8@localhost.localdomain Fixes: 7964cf8caa4d ("mm: remove vmacache") Signed-off-by: Alexey Dobriyan Acked-by: Liam Howlett Signed-off-by: Andrew Morton --- include/linux/sched.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/sched.h b/include/linux/sched.h index 88a043f7235e..e0bb85cf8bdd 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -861,8 +861,6 @@ struct task_struct { struct mm_struct *mm; struct mm_struct *active_mm; - /* Per-thread vma caching: */ - #ifdef SPLIT_RSS_COUNTING struct task_rss_stat rss_stat; #endif -- cgit From 652e04464d3944226052c827bdaaf5113b072870 Mon Sep 17 00:00:00 2001 From: Xin Hao Date: Tue, 27 Sep 2022 08:19:45 +0800 Subject: mm/damon: move sz_damon_region to damon_sz_region Rename sz_damon_region() to damon_sz_region(), and move it to "include/linux/damon.h", because in many places, we can to use this func. Link: https://lkml.kernel.org/r/20220927001946.85375-1-xhao@linux.alibaba.com Signed-off-by: Xin Hao Suggested-by: SeongJae Park Reviewed-by: SeongJae Park Signed-off-by: Andrew Morton --- include/linux/damon.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/damon.h b/include/linux/damon.h index ed5470f50bab..620ada094c3b 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -484,6 +484,12 @@ static inline struct damon_region *damon_first_region(struct damon_target *t) return list_first_entry(&t->regions_list, struct damon_region, list); } +static inline unsigned long damon_sz_region(struct damon_region *r) +{ + return r->ar.end - r->ar.start; +} + + #define damon_for_each_region(r, t) \ list_for_each_entry(r, &t->regions_list, list) -- cgit From 16ce101db85db694a91380aa4c89b25530871d33 Mon Sep 17 00:00:00 2001 From: Alistair Popple Date: Wed, 28 Sep 2022 22:01:15 +1000 Subject: mm/memory.c: fix race when faulting a device private page MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Patch series "Fix several device private page reference counting issues", v2 This series aims to fix a number of page reference counting issues in drivers dealing with device private ZONE_DEVICE pages. These result in use-after-free type bugs, either from accessing a struct page which no longer exists because it has been removed or accessing fields within the struct page which are no longer valid because the page has been freed. During normal usage it is unlikely these will cause any problems. However without these fixes it is possible to crash the kernel from userspace. These crashes can be triggered either by unloading the kernel module or unbinding the device from the driver prior to a userspace task exiting. In modules such as Nouveau it is also possible to trigger some of these issues by explicitly closing the device file-descriptor prior to the task exiting and then accessing device private memory. This involves some minor changes to both PowerPC and AMD GPU code. Unfortunately I lack hardware to test either of those so any help there would be appreciated. The changes mimic what is done in for both Nouveau and hmm-tests though so I doubt they will cause problems. This patch (of 8): When the CPU tries to access a device private page the migrate_to_ram() callback associated with the pgmap for the page is called. However no reference is taken on the faulting page. Therefore a concurrent migration of the device private page can free the page and possibly the underlying pgmap. This results in a race which can crash the kernel due to the migrate_to_ram() function pointer becoming invalid. It also means drivers can't reliably read the zone_device_data field because the page may have been freed with memunmap_pages(). Close the race by getting a reference on the page while holding the ptl to ensure it has not been freed. Unfortunately the elevated reference count will cause the migration required to handle the fault to fail. To avoid this failure pass the faulting page into the migrate_vma functions so that if an elevated reference count is found it can be checked to see if it's expected or not. [mpe@ellerman.id.au: fix build] Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple Acked-by: Felix Kuehling Cc: Jason Gunthorpe Cc: John Hubbard Cc: Ralph Campbell Cc: Michael Ellerman Cc: Lyude Paul Cc: Alex Deucher Cc: Alex Sierra Cc: Ben Skeggs Cc: Christian König Cc: Dan Williams Cc: David Hildenbrand Cc: "Huang, Ying" Cc: Matthew Wilcox Cc: Yang Shi Cc: Zi Yan Signed-off-by: Andrew Morton --- include/linux/migrate.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 704a04f5a074..52090d1f9230 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -62,6 +62,8 @@ extern const char *migrate_reason_names[MR_TYPES]; #ifdef CONFIG_MIGRATION extern void putback_movable_pages(struct list_head *l); +int migrate_folio_extra(struct address_space *mapping, struct folio *dst, + struct folio *src, enum migrate_mode mode, int extra_count); int migrate_folio(struct address_space *mapping, struct folio *dst, struct folio *src, enum migrate_mode mode); extern int migrate_pages(struct list_head *l, new_page_t new, free_page_t free, @@ -197,6 +199,12 @@ struct migrate_vma { */ void *pgmap_owner; unsigned long flags; + + /* + * Set to vmf->page if this is being called to migrate a page as part of + * a migrate_to_ram() callback. + */ + struct page *fault_page; }; int migrate_vma_setup(struct migrate_vma *args); -- cgit From ef233450898f8893dafa193a9f3211fa077a3d05 Mon Sep 17 00:00:00 2001 From: Alistair Popple Date: Wed, 28 Sep 2022 22:01:16 +1000 Subject: mm: free device private pages have zero refcount MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Since 27674ef6c73f ("mm: remove the extra ZONE_DEVICE struct page refcount") device private pages have no longer had an extra reference count when the page is in use. However before handing them back to the owning device driver we add an extra reference count such that free pages have a reference count of one. This makes it difficult to tell if a page is free or not because both free and in use pages will have a non-zero refcount. Instead we should return pages to the drivers page allocator with a zero reference count. Kernel code can then safely use kernel functions such as get_page_unless_zero(). Link: https://lkml.kernel.org/r/cf70cf6f8c0bdb8aaebdbfb0d790aea4c683c3c6.1664366292.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple Acked-by: Felix Kuehling Cc: Jason Gunthorpe Cc: Michael Ellerman Cc: Alex Deucher Cc: Christian König Cc: Ben Skeggs Cc: Lyude Paul Cc: Ralph Campbell Cc: Alex Sierra Cc: John Hubbard Cc: Dan Williams Cc: David Hildenbrand Cc: "Huang, Ying" Cc: Matthew Wilcox Cc: Yang Shi Cc: Zi Yan Signed-off-by: Andrew Morton --- include/linux/memremap.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/memremap.h b/include/linux/memremap.h index c3b4cc84877b..7fcaf3180a5b 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -187,6 +187,7 @@ static inline bool folio_is_device_coherent(const struct folio *folio) } #ifdef CONFIG_ZONE_DEVICE +void zone_device_page_init(struct page *page); void *memremap_pages(struct dev_pagemap *pgmap, int nid); void memunmap_pages(struct dev_pagemap *pgmap); void *devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap); -- cgit From e778406b40dbb1342a1888cd751ca9d2982a12e2 Mon Sep 17 00:00:00 2001 From: Alistair Popple Date: Wed, 28 Sep 2022 22:01:19 +1000 Subject: mm/migrate_device.c: add migrate_device_range() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Device drivers can use the migrate_vma family of functions to migrate existing private anonymous mappings to device private pages. These pages are backed by memory on the device with drivers being responsible for copying data to and from device memory. Device private pages are freed via the pgmap->page_free() callback when they are unmapped and their refcount drops to zero. Alternatively they may be freed indirectly via migration back to CPU memory in response to a pgmap->migrate_to_ram() callback called whenever the CPU accesses an address mapped to a device private page. In other words drivers cannot control the lifetime of data allocated on the devices and must wait until these pages are freed from userspace. This causes issues when memory needs to reclaimed on the device, either because the device is going away due to a ->release() callback or because another user needs to use the memory. Drivers could use the existing migrate_vma functions to migrate data off the device. However this would require them to track the mappings of each page which is both complicated and not always possible. Instead drivers need to be able to migrate device pages directly so they can free up device memory. To allow that this patch introduces the migrate_device family of functions which are functionally similar to migrate_vma but which skips the initial lookup based on mapping. Link: https://lkml.kernel.org/r/868116aab70b0c8ee467d62498bb2cf0ef907295.1664366292.git-series.apopple@nvidia.com Signed-off-by: Alistair Popple Cc: "Huang, Ying" Cc: Zi Yan Cc: Matthew Wilcox Cc: Yang Shi Cc: David Hildenbrand Cc: Ralph Campbell Cc: John Hubbard Cc: Alex Deucher Cc: Alex Sierra Cc: Ben Skeggs Cc: Christian König Cc: Dan Williams Cc: Felix Kuehling Cc: Jason Gunthorpe Cc: Lyude Paul Cc: Michael Ellerman Signed-off-by: Andrew Morton --- include/linux/migrate.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/migrate.h b/include/linux/migrate.h index 52090d1f9230..3ef77f52a4f0 100644 --- a/include/linux/migrate.h +++ b/include/linux/migrate.h @@ -210,6 +210,13 @@ struct migrate_vma { int migrate_vma_setup(struct migrate_vma *args); void migrate_vma_pages(struct migrate_vma *migrate); void migrate_vma_finalize(struct migrate_vma *migrate); +int migrate_device_range(unsigned long *src_pfns, unsigned long start, + unsigned long npages); +void migrate_device_pages(unsigned long *src_pfns, unsigned long *dst_pfns, + unsigned long npages); +void migrate_device_finalize(unsigned long *src_pfns, + unsigned long *dst_pfns, unsigned long npages); + #endif /* CONFIG_MIGRATION */ #endif /* _LINUX_MIGRATE_H */ -- cgit From a2550d3ce53c68f54042bc5e468c4d07491ffe0e Mon Sep 17 00:00:00 2001 From: Christian Marangi Date: Wed, 12 Oct 2022 19:18:36 +0200 Subject: net: dsa: qca8k: fix inband mgmt for big-endian systems The header and the data of the skb for the inband mgmt requires to be in little-endian. This is problematic for big-endian system as the mgmt header is written in the cpu byte order. Fix this by converting each value for the mgmt header and data to little-endian, and convert to cpu byte order the mgmt header and data sent by the switch. Fixes: 5950c7c0a68c ("net: dsa: qca8k: add support for mgmt read/write in Ethernet packet") Tested-by: Pawel Dembicki Tested-by: Lech Perczak Signed-off-by: Christian Marangi Reviewed-by: Lech Perczak Signed-off-by: David S. Miller --- include/linux/dsa/tag_qca.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/dsa/tag_qca.h b/include/linux/dsa/tag_qca.h index 50be7cbd93a5..0e176da1e43f 100644 --- a/include/linux/dsa/tag_qca.h +++ b/include/linux/dsa/tag_qca.h @@ -61,9 +61,9 @@ struct sk_buff; /* Special struct emulating a Ethernet header */ struct qca_mgmt_ethhdr { - u32 command; /* command bit 31:0 */ - u32 seq; /* seq 63:32 */ - u32 mdio_data; /* first 4byte mdio */ + __le32 command; /* command bit 31:0 */ + __le32 seq; /* seq 63:32 */ + __le32 mdio_data; /* first 4byte mdio */ __be16 hdr; /* qca hdr */ } __packed; -- cgit From 0d4636f7d72df3179b20a2d32b647881917a5e2a Mon Sep 17 00:00:00 2001 From: Christian Marangi Date: Wed, 12 Oct 2022 19:18:37 +0200 Subject: net: dsa: qca8k: fix ethtool autocast mib for big-endian systems The switch sends autocast mib in little-endian. This is problematic for big-endian system as the values needs to be converted. Fix this by converting each mib value to cpu byte order. Fixes: 5c957c7ca78c ("net: dsa: qca8k: add support for mib autocast in Ethernet packet") Tested-by: Pawel Dembicki Tested-by: Lech Perczak Signed-off-by: Christian Marangi Signed-off-by: David S. Miller --- include/linux/dsa/tag_qca.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/dsa/tag_qca.h b/include/linux/dsa/tag_qca.h index 0e176da1e43f..b1b5720d89a5 100644 --- a/include/linux/dsa/tag_qca.h +++ b/include/linux/dsa/tag_qca.h @@ -73,7 +73,7 @@ enum mdio_cmd { }; struct mib_ethhdr { - u32 data[3]; /* first 3 mib counter */ + __le32 data[3]; /* first 3 mib counter */ __be16 hdr; /* qca hdr */ } __packed; -- cgit From 57d849636a04a12713dd3a10a97cb9658ec7edf6 Mon Sep 17 00:00:00 2001 From: Kefeng Wang Date: Wed, 12 Oct 2022 11:06:35 +0800 Subject: clk: at91: fix the build with binutils 2.27 There is an issue when build with older versions of binutils 2.27.0, arch/arm/mach-at91/pm_suspend.S: Assembler messages: arch/arm/mach-at91/pm_suspend.S:1086: Error: garbage following instruction -- `ldr tmp1,=0x00020010UL' Use UL() macro to fix the issue in assembly file. Fixes: 4fd36e458392 ("ARM: at91: pm: add plla disable/enable support for sam9x60") Signed-off-by: Kefeng Wang Link: https://lore.kernel.org/r/20221012030635.13140-1-wangkefeng.wang@huawei.com Signed-off-by: Stephen Boyd --- include/linux/clk/at91_pmc.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/clk/at91_pmc.h b/include/linux/clk/at91_pmc.h index 3484309b59bf..7af499bdbecb 100644 --- a/include/linux/clk/at91_pmc.h +++ b/include/linux/clk/at91_pmc.h @@ -12,6 +12,8 @@ #ifndef AT91_PMC_H #define AT91_PMC_H +#include + #define AT91_PMC_V1 (1) /* PMC version 1 */ #define AT91_PMC_V2 (2) /* PMC version 2 [SAM9X60] */ @@ -45,8 +47,8 @@ #define AT91_PMC_PCSR 0x18 /* Peripheral Clock Status Register */ #define AT91_PMC_PLL_ACR 0x18 /* PLL Analog Control Register [for SAM9X60] */ -#define AT91_PMC_PLL_ACR_DEFAULT_UPLL 0x12020010UL /* Default PLL ACR value for UPLL */ -#define AT91_PMC_PLL_ACR_DEFAULT_PLLA 0x00020010UL /* Default PLL ACR value for PLLA */ +#define AT91_PMC_PLL_ACR_DEFAULT_UPLL UL(0x12020010) /* Default PLL ACR value for UPLL */ +#define AT91_PMC_PLL_ACR_DEFAULT_PLLA UL(0x00020010) /* Default PLL ACR value for PLLA */ #define AT91_PMC_PLL_ACR_UTMIVR (1 << 12) /* UPLL Voltage regulator Control */ #define AT91_PMC_PLL_ACR_UTMIBG (1 << 13) /* UPLL Bandgap Control */ -- cgit From fc8695eb11f07d936a4a9dbd15d7797986bc8b89 Mon Sep 17 00:00:00 2001 From: Jakub Kicinski Date: Fri, 14 Oct 2022 09:07:46 -0700 Subject: Revert "net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}" This reverts commit 854701ba4c39afae2362ba19a580c461cb183e4f. We have more violations around, which leads to: WARNING: CPU: 2 PID: 1 at include/linux/cpumask.h:110 __netif_set_xps_queue+0x14e/0x770 Let's back this out and retry with a larger clean up in -next. Fixes: 854701ba4c39 ("net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and}") Link: https://lore.kernel.org/all/20221014030459.3272206-2-guoren@kernel.org/ Signed-off-by: Jakub Kicinski Signed-off-by: David S. Miller --- include/linux/netdevice.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index a36edb0ec199..eddf8ee270e7 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -3663,8 +3663,9 @@ static inline bool netif_attr_test_online(unsigned long j, static inline unsigned int netif_attrmask_next(int n, const unsigned long *srcp, unsigned int nr_bits) { - /* n is a prior cpu */ - cpu_max_bits_warn(n + 1, nr_bits); + /* -1 is a legal arg here. */ + if (n != -1) + cpu_max_bits_warn(n, nr_bits); if (srcp) return find_next_bit(srcp, nr_bits, n + 1); @@ -3685,8 +3686,9 @@ static inline int netif_attrmask_next_and(int n, const unsigned long *src1p, const unsigned long *src2p, unsigned int nr_bits) { - /* n is a prior cpu */ - cpu_max_bits_warn(n + 1, nr_bits); + /* -1 is a legal arg here. */ + if (n != -1) + cpu_max_bits_warn(n, nr_bits); if (src1p && src2p) return find_next_and_bit(src1p, src2p, nr_bits, n + 1); -- cgit From 96de900ae78e7dbedc937fd91bafe2934579c65a Mon Sep 17 00:00:00 2001 From: Shenwei Wang Date: Fri, 14 Oct 2022 09:47:28 -0500 Subject: net: phylink: add mac_managed_pm in phylink_config structure The recent commit 'commit 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state")' requires the MAC driver explicitly tell the phy driver who is managing the PM, otherwise you will see warning during resume stage. Add a boolean property in the phylink_config structure so that the MAC driver can use it to tell the PHY driver if it wants to manage the PM. Fixes: 744d23c71af3 ("net: phy: Warn about incorrect mdio_bus_phy_resume() state") Signed-off-by: Shenwei Wang Acked-by: Florian Fainelli Reviewed-by: Russell King (Oracle) Signed-off-by: David S. Miller --- include/linux/phylink.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/phylink.h b/include/linux/phylink.h index 664dd409feb9..3f01ac8017e0 100644 --- a/include/linux/phylink.h +++ b/include/linux/phylink.h @@ -122,6 +122,7 @@ enum phylink_op_type { * (See commit 7cceb599d15d ("net: phylink: avoid mac_config calls") * @poll_fixed_state: if true, starts link_poll, * if MAC link is at %MLO_AN_FIXED mode. + * @mac_managed_pm: if true, indicate the MAC driver is responsible for PHY PM. * @ovr_an_inband: if true, override PCS to MLO_AN_INBAND * @get_fixed_state: callback to execute to determine the fixed link state, * if MAC link is at %MLO_AN_FIXED mode. @@ -134,6 +135,7 @@ struct phylink_config { enum phylink_op_type type; bool legacy_pre_march2020; bool poll_fixed_state; + bool mac_managed_pm; bool ovr_an_inband; void (*get_fixed_state)(struct phylink_config *config, struct phylink_link_state *state); -- cgit From e36ce448a08d43de69e7449eb225805a7a8addf8 Mon Sep 17 00:00:00 2001 From: Hyeonggon Yoo <42.hyeyoo@gmail.com> Date: Sat, 15 Oct 2022 13:34:29 +0900 Subject: mm/slab: use kmalloc_node() for off slab freelist_idx_t array allocation After commit d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator"), SLAB passes large ( > PAGE_SIZE * 2) requests to buddy like SLUB does. SLAB has been using kmalloc caches to allocate freelist_idx_t array for off slab caches. But after the commit, freelist_size can be bigger than KMALLOC_MAX_CACHE_SIZE. Instead of using pointer to kmalloc cache, use kmalloc_node() and only check if the kmalloc cache is off slab during calculate_slab_order(). If freelist_size > KMALLOC_MAX_CACHE_SIZE, no looping condition happens as it allocates freelist_idx_t array directly from buddy. Link: https://lore.kernel.org/all/20221014205818.GA1428667@roeck-us.net/ Reported-and-tested-by: Guenter Roeck Fixes: d6a71648dbc0 ("mm/slab: kmalloc: pass requests larger than order-1 page to page allocator") Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka --- include/linux/slab_def.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h index e24c9aff6fed..f0ffad6a3365 100644 --- a/include/linux/slab_def.h +++ b/include/linux/slab_def.h @@ -33,7 +33,6 @@ struct kmem_cache { size_t colour; /* cache colouring range */ unsigned int colour_off; /* colour offset */ - struct kmem_cache *freelist_cache; unsigned int freelist_size; /* constructor func */ -- cgit From 80493877d7d0ae0cbe62921d748682811c58026f Mon Sep 17 00:00:00 2001 From: Tetsuo Handa Date: Sun, 16 Oct 2022 00:53:51 +0900 Subject: Revert "cpumask: fix checking valid cpu range". This reverts commit 78e5a3399421 ("cpumask: fix checking valid cpu range"). syzbot is hitting WARN_ON_ONCE(cpu >= nr_cpumask_bits) warning at cpu_max_bits_warn() [1], for commit 78e5a3399421 ("cpumask: fix checking valid cpu range") is broken. Obviously that patch hits WARN_ON_ONCE() when e.g. reading /proc/cpuinfo because passing "cpu + 1" instead of "cpu" will trivially hit cpu == nr_cpumask_bits condition. Although syzbot found this problem in linux-next.git on 2022/09/27 [2], this problem was not fixed immediately. As a result, that patch was sent to linux.git before the patch author recognizes this problem, and syzbot started failing to test changes in linux.git since 2022/10/10 [3]. Andrew Jones proposed a fix for x86 and riscv architectures [4]. But [2] and [5] indicate that affected locations are not limited to arch code. More delay before we find and fix affected locations, less tested kernel (and more difficult to bisect and fix) before release. We should have inspected and fixed basically all cpumask users before applying that patch. We should not crash kernels in order to ask existing cpumask users to update their code, even if limited to CONFIG_DEBUG_PER_CPU_MAPS=y case. Link: https://syzkaller.appspot.com/bug?extid=d0fd2bf0dd6da72496dd [1] Link: https://syzkaller.appspot.com/bug?extid=21da700f3c9f0bc40150 [2] Link: https://syzkaller.appspot.com/bug?extid=51a652e2d24d53e75734 [3] Link: https://lkml.kernel.org/r/20221014155845.1986223-1-ajones@ventanamicro.com [4] Link: https://syzkaller.appspot.com/bug?extid=4d46c43d81c3bd155060 [5] Reported-by: Andrew Jones Reported-by: syzbot+d0fd2bf0dd6da72496dd@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa Cc: Yury Norov Cc: Borislav Petkov Signed-off-by: Linus Torvalds --- include/linux/cpumask.h | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 2f065ad97541..c2aa0aa26b45 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -174,8 +174,9 @@ static inline unsigned int cpumask_last(const struct cpumask *srcp) static inline unsigned int cpumask_next(int n, const struct cpumask *srcp) { - /* n is a prior cpu */ - cpumask_check(n + 1); + /* -1 is a legal arg here. */ + if (n != -1) + cpumask_check(n); return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1); } @@ -188,8 +189,9 @@ unsigned int cpumask_next(int n, const struct cpumask *srcp) */ static inline unsigned int cpumask_next_zero(int n, const struct cpumask *srcp) { - /* n is a prior cpu */ - cpumask_check(n + 1); + /* -1 is a legal arg here. */ + if (n != -1) + cpumask_check(n); return find_next_zero_bit(cpumask_bits(srcp), nr_cpumask_bits, n+1); } @@ -229,8 +231,9 @@ static inline unsigned int cpumask_next_and(int n, const struct cpumask *src1p, const struct cpumask *src2p) { - /* n is a prior cpu */ - cpumask_check(n + 1); + /* -1 is a legal arg here. */ + if (n != -1) + cpumask_check(n); return find_next_and_bit(cpumask_bits(src1p), cpumask_bits(src2p), nr_cpumask_bits, n + 1); } @@ -260,8 +263,8 @@ static inline unsigned int cpumask_next_wrap(int n, const struct cpumask *mask, int start, bool wrap) { cpumask_check(start); - /* n is a prior cpu */ - cpumask_check(n + 1); + if (n != -1) + cpumask_check(n); /* * Return the first available CPU when wrapping, or when starting before cpu0, -- cgit From 25b72d530e7aa185955196b63f53c38f751f1632 Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Wed, 12 Oct 2022 12:18:54 -0700 Subject: fbdev: MIPS supports iomem addresses Add MIPS to fb_* helpers list for iomem addresses. This silences Sparse warnings about lacking __iomem address space casts: drivers/video/fbdev/pvr2fb.c:800:9: sparse: sparse: incorrect type in argument 1 (different address spaces) drivers/video/fbdev/pvr2fb.c:800:9: sparse: expected void const * drivers/video/fbdev/pvr2fb.c:800:9: sparse: got char [noderef] __iomem *screen_base Reported-by: kernel test robot Link: https://lore.kernel.org/lkml/202210100209.tR2Iqbqk-lkp@intel.com/ Cc: Helge Deller Cc: linux-fbdev@vger.kernel.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Kees Cook Signed-off-by: Helge Deller --- include/linux/fb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fb.h b/include/linux/fb.h index 0aff76bcbb00..bcb8658f5b64 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -555,7 +555,7 @@ static inline struct apertures_struct *alloc_apertures(unsigned int max_num) { #elif defined(__i386__) || defined(__alpha__) || defined(__x86_64__) || \ defined(__hppa__) || defined(__sh__) || defined(__powerpc__) || \ - defined(__arm__) || defined(__aarch64__) + defined(__arm__) || defined(__aarch64__) || defined(__mips__) #define fb_readb __raw_readb #define fb_readw __raw_readw -- cgit From 472a1482325b3a285e0bcf82c0b0edc689b7e8cd Mon Sep 17 00:00:00 2001 From: William Breathitt Gray Date: Sun, 2 Oct 2022 08:04:19 -0400 Subject: counter: Reduce DEFINE_COUNTER_ARRAY_POLARITY() to defining counter_array A spare warning was reported for drivers/counter/ti-ecap-capture.c:: sparse warnings: (new ones prefixed by >>) >> drivers/counter/ti-ecap-capture.c:380:8: sparse: sparse: symbol 'ecap_cnt_pol_array' was not declared. Should it be static? vim +/ecap_cnt_pol_array +380 drivers/counter/ti-ecap-capture.c 379 > 380 static DEFINE_COUNTER_ARRAY_POLARITY(ecap_cnt_pol_array, ecap_cnt_pol_avail, ECAP_NB_CEVT); 381 The first argument to the DEFINE_COUNTER_ARRAY_POLARITY() macro is a token serving as the symbol name in the definition of a new struct counter_array structure. However, this macro actually expands to two statements:: #define DEFINE_COUNTER_ARRAY_POLARITY(_name, _enums, _length) \ DEFINE_COUNTER_AVAILABLE(_name##_available, _enums); \ struct counter_array _name = { \ .type = COUNTER_COMP_SIGNAL_POLARITY, \ .avail = &(_name##_available), \ .length = (_length), \ } Because of this, the "static" on line 380 only applies to the first statement. This patch splits out the DEFINE_COUNTER_AVAILABLE() line and leaves DEFINE_COUNTER_ARRAY_POLARITY() as a simple structure definition to avoid issues like this. Reported-by: kernel test robot Link: https://lore.kernel.org/all/202210020619.NQbyomII-lkp@intel.com/ Cc: Julien Panis Signed-off-by: William Breathitt Gray --- include/linux/counter.h | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/counter.h b/include/linux/counter.h index c41fa602ed28..b63746637de2 100644 --- a/include/linux/counter.h +++ b/include/linux/counter.h @@ -542,11 +542,10 @@ struct counter_array { #define DEFINE_COUNTER_ARRAY_CAPTURE(_name, _length) \ DEFINE_COUNTER_ARRAY_U64(_name, _length) -#define DEFINE_COUNTER_ARRAY_POLARITY(_name, _enums, _length) \ - DEFINE_COUNTER_AVAILABLE(_name##_available, _enums); \ +#define DEFINE_COUNTER_ARRAY_POLARITY(_name, _available, _length) \ struct counter_array _name = { \ .type = COUNTER_COMP_SIGNAL_POLARITY, \ - .avail = &(_name##_available), \ + .avail = &(_available), \ .length = (_length), \ } -- cgit From ca6c21327c6af02b7eec31ce4b9a740a18c6c13f Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 6 Oct 2022 15:00:39 +0200 Subject: perf: Fix missing SIGTRAPs Marco reported: Due to the implementation of how SIGTRAP are delivered if perf_event_attr::sigtrap is set, we've noticed 3 issues: 1. Missing SIGTRAP due to a race with event_sched_out() (more details below). 2. Hardware PMU events being disabled due to returning 1 from perf_event_overflow(). The only way to re-enable the event is for user space to first "properly" disable the event and then re-enable it. 3. The inability to automatically disable an event after a specified number of overflows via PERF_EVENT_IOC_REFRESH. The worst of the 3 issues is problem (1), which occurs when a pending_disable is "consumed" by a racing event_sched_out(), observed as follows: CPU0 | CPU1 --------------------------------+--------------------------- __perf_event_overflow() | perf_event_disable_inatomic() | pending_disable = CPU0 | ... | _perf_event_enable() | event_function_call() | task_function_call() | /* sends IPI to CPU0 */ | ... __perf_event_enable() +--------------------------- ctx_resched() task_ctx_sched_out() ctx_sched_out() group_sched_out() event_sched_out() pending_disable = -1 perf_pending_event() perf_pending_event_disable() /* Fails to send SIGTRAP because no pending_disable! */ In the above case, not only is that particular SIGTRAP missed, but also all future SIGTRAPs because 'event_limit' is not reset back to 1. To fix, rework pending delivery of SIGTRAP via IRQ-work by introduction of a separate 'pending_sigtrap', no longer using 'event_limit' and 'pending_disable' for its delivery. Additionally; and different to Marco's proposed patch: - recognise that pending_disable effectively duplicates oncpu for the case where it is set. As such, change the irq_work handler to use ->oncpu to target the event and use pending_* as boolean toggles. - observe that SIGTRAP targets the ctx->task, so the context switch optimization that carries contexts between tasks is invalid. If the irq_work were delayed enough to hit after a context switch the SIGTRAP would be delivered to the wrong task. - observe that if the event gets scheduled out (rotation/migration/context-switch/...) the irq-work would be insufficient to deliver the SIGTRAP when the event gets scheduled back in (the irq-work might still be pending on the old CPU). Therefore have event_sched_out() convert the pending sigtrap into a task_work which will deliver the signal at return_to_user. Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Reported-by: Dmitry Vyukov Debugged-by: Dmitry Vyukov Reported-by: Marco Elver Debugged-by: Marco Elver Signed-off-by: Peter Zijlstra (Intel) Reviewed-by: Marco Elver Tested-by: Marco Elver --- include/linux/perf_event.h | 19 +++++++++++++++---- 1 file changed, 15 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 853f64b6c8c2..0031f7b4d9ab 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -756,11 +756,14 @@ struct perf_event { struct fasync_struct *fasync; /* delayed work for NMIs and such */ - int pending_wakeup; - int pending_kill; - int pending_disable; + unsigned int pending_wakeup; + unsigned int pending_kill; + unsigned int pending_disable; + unsigned int pending_sigtrap; unsigned long pending_addr; /* SIGTRAP */ - struct irq_work pending; + struct irq_work pending_irq; + struct callback_head pending_task; + unsigned int pending_work; atomic_t event_limit; @@ -877,6 +880,14 @@ struct perf_event_context { #endif void *task_ctx_data; /* pmu specific data */ struct rcu_head rcu_head; + + /* + * Sum (event->pending_sigtrap + event->pending_work) + * + * The SIGTRAP is targeted at ctx->task, as such it won't do changing + * that until the signal is delivered. + */ + local_t nr_pending; }; /* -- cgit From ccd30a476f8e864732de220bd50e6f372f5ebcab Mon Sep 17 00:00:00 2001 From: Eric Biggers Date: Tue, 11 Oct 2022 14:38:38 -0700 Subject: fscrypt: fix keyring memory leak on mount failure Commit d7e7b9af104c ("fscrypt: stop using keyrings subsystem for fscrypt_master_key") moved the keyring destruction from __put_super() to generic_shutdown_super() so that the filesystem's block device(s) are still available. Unfortunately, this causes a memory leak in the case where a mount is attempted with the test_dummy_encryption mount option, but the mount fails after the option has already been processed. To fix this, attempt the keyring destruction in both places. Reported-by: syzbot+104c2a89561289cec13e@syzkaller.appspotmail.com Fixes: d7e7b9af104c ("fscrypt: stop using keyrings subsystem for fscrypt_master_key") Signed-off-by: Eric Biggers Reviewed-by: Christian Brauner (Microsoft) Link: https://lore.kernel.org/r/20221011213838.209879-1-ebiggers@kernel.org --- include/linux/fscrypt.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h index cad78b569c7e..4f5f8a651213 100644 --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -307,7 +307,7 @@ fscrypt_free_dummy_policy(struct fscrypt_dummy_policy *dummy_policy) } /* keyring.c */ -void fscrypt_sb_delete(struct super_block *sb); +void fscrypt_destroy_keyring(struct super_block *sb); int fscrypt_ioctl_add_key(struct file *filp, void __user *arg); int fscrypt_add_test_dummy_key(struct super_block *sb, const struct fscrypt_dummy_policy *dummy_policy); @@ -521,7 +521,7 @@ fscrypt_free_dummy_policy(struct fscrypt_dummy_policy *dummy_policy) } /* keyring.c */ -static inline void fscrypt_sb_delete(struct super_block *sb) +static inline void fscrypt_destroy_keyring(struct super_block *sb) { } -- cgit From dbe69b29988465b011f198f2797b1c2b6980b50e Mon Sep 17 00:00:00 2001 From: Jiri Olsa Date: Tue, 18 Oct 2022 09:59:34 +0200 Subject: bpf: Fix dispatcher patchable function entry to 5 bytes nop The patchable_function_entry(5) might output 5 single nop instructions (depends on toolchain), which will clash with bpf_arch_text_poke check for 5 bytes nop instruction. Adding early init call for dispatcher that checks and change the patchable entry into expected 5 nop instruction if needed. There's no need to take text_mutex, because we are using it in early init call which is called at pre-smp time. Fixes: ceea991a019c ("bpf: Move bpf_dispatcher function out of ftrace locations") Signed-off-by: Jiri Olsa Acked-by: Peter Zijlstra (Intel) Link: https://lore.kernel.org/r/20221018075934.574415-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 9e7d46d16032..0566705c1d4e 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -27,6 +27,7 @@ #include #include #include +#include struct bpf_verifier_env; struct bpf_verifier_log; @@ -970,6 +971,8 @@ struct bpf_trampoline *bpf_trampoline_get(u64 key, struct bpf_attach_target_info *tgt_info); void bpf_trampoline_put(struct bpf_trampoline *tr); int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_funcs); +int __init bpf_arch_init_dispatcher_early(void *ip); + #define BPF_DISPATCHER_INIT(_name) { \ .mutex = __MUTEX_INITIALIZER(_name.mutex), \ .func = &_name##_func, \ @@ -983,6 +986,13 @@ int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_func }, \ } +#define BPF_DISPATCHER_INIT_CALL(_name) \ + static int __init _name##_init(void) \ + { \ + return bpf_arch_init_dispatcher_early(_name##_func); \ + } \ + early_initcall(_name##_init) + #ifdef CONFIG_X86_64 #define BPF_DISPATCHER_ATTRIBUTES __attribute__((patchable_function_entry(5))) #else @@ -1000,7 +1010,9 @@ int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_func } \ EXPORT_SYMBOL(bpf_dispatcher_##name##_func); \ struct bpf_dispatcher bpf_dispatcher_##name = \ - BPF_DISPATCHER_INIT(bpf_dispatcher_##name); + BPF_DISPATCHER_INIT(bpf_dispatcher_##name); \ + BPF_DISPATCHER_INIT_CALL(bpf_dispatcher_##name); + #define DECLARE_BPF_DISPATCHER(name) \ unsigned int bpf_dispatcher_##name##_func( \ const void *ctx, \ -- cgit From 0251d0107cfb0bb5ab2d3f97710487b9522db020 Mon Sep 17 00:00:00 2001 From: Lu Baolu Date: Wed, 19 Oct 2022 08:44:44 +0800 Subject: iommu: Add gfp parameter to iommu_alloc_resv_region Add gfp parameter to iommu_alloc_resv_region() for the callers to specify the memory allocation behavior. Thus iommu_alloc_resv_region() could also be available in critical contexts. Signed-off-by: Lu Baolu Tested-by: Alex Williamson Link: https://lore.kernel.org/r/20220927053109.4053662-2-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel --- include/linux/iommu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/iommu.h b/include/linux/iommu.h index a325532aeab5..3c9da1f8979e 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -455,7 +455,7 @@ extern void iommu_set_default_translated(bool cmd_line); extern bool iommu_default_passthrough(void); extern struct iommu_resv_region * iommu_alloc_resv_region(phys_addr_t start, size_t length, int prot, - enum iommu_resv_type type); + enum iommu_resv_type type, gfp_t gfp); extern int iommu_get_group_resv_regions(struct iommu_group *group, struct list_head *head); -- cgit From 8a254d90a77580244ec57e82bca7eb65656cc167 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Wed, 19 Oct 2022 23:29:58 +0200 Subject: efi: efivars: Fix variable writes without query_variable_store() Commit bbc6d2c6ef22 ("efi: vars: Switch to new wrapper layer") refactored the efivars layer so that the 'business logic' related to which UEFI variables affect the boot flow in which way could be moved out of it, and into the efivarfs driver. This inadvertently broke setting variables on firmware implementations that lack the QueryVariableInfo() boot service, because we no longer tolerate a EFI_UNSUPPORTED result from check_var_size() when calling efivar_entry_set_get_size(), which now ends up calling check_var_size() a second time inadvertently. If QueryVariableInfo() is missing, we support writes of up to 64k - let's move that logic into check_var_size(), and drop the redundant call. Cc: # v6.0 Fixes: bbc6d2c6ef22 ("efi: vars: Switch to new wrapper layer") Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 3 --- 1 file changed, 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index da3974bf05d3..80f3c1c7827d 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1085,9 +1085,6 @@ efi_status_t efivar_set_variable_locked(efi_char16_t *name, efi_guid_t *vendor, efi_status_t efivar_set_variable(efi_char16_t *name, efi_guid_t *vendor, u32 attr, unsigned long data_size, void *data); -efi_status_t check_var_size(u32 attributes, unsigned long size); -efi_status_t check_var_size_nonblocking(u32 attributes, unsigned long size); - #if IS_ENABLED(CONFIG_EFI_CAPSULE_LOADER) extern bool efi_capsule_pending(int *reset_type); -- cgit From ed51862f2f57cbce6fed2d4278cfe70a490899fd Mon Sep 17 00:00:00 2001 From: Alexander Graf Date: Mon, 17 Oct 2022 20:45:39 +0200 Subject: kvm: Add support for arch compat vm ioctls We will introduce the first architecture specific compat vm ioctl in the next patch. Add all necessary boilerplate to allow architectures to override compat vm ioctls when necessary. Signed-off-by: Alexander Graf Message-Id: <20221017184541.2658-2-graf@amazon.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 32f259fa5801..00c3448ba7f8 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1390,6 +1390,8 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, struct kvm_enable_cap *cap); long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg); +long kvm_arch_vm_compat_ioctl(struct file *filp, unsigned int ioctl, + unsigned long arg); int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu); int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu); -- cgit From e993ffe3da4bcddea0536b03be1031bf35cd8d85 Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Fri, 21 Oct 2022 11:16:39 +0100 Subject: net: flag sockets supporting msghdr originated zerocopy We need an efficient way in io_uring to check whether a socket supports zerocopy with msghdr provided ubuf_info. Add a new flag into the struct socket flags fields. Cc: # 6.0 Signed-off-by: Pavel Begunkov Acked-by: Jakub Kicinski Link: https://lore.kernel.org/r/3dafafab822b1c66308bb58a0ac738b1e3f53f74.1666346426.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/net.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/net.h b/include/linux/net.h index 711c3593c3b8..18d942bbdf6e 100644 --- a/include/linux/net.h +++ b/include/linux/net.h @@ -41,6 +41,7 @@ struct net; #define SOCK_NOSPACE 2 #define SOCK_PASSCRED 3 #define SOCK_PASSSEC 4 +#define SOCK_SUPPORT_ZC 5 #ifndef ARCH_HAS_SOCKET_TYPES /** -- cgit From 52826d3b2d1d8e1180a84bef7d72596d6a024a38 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 23 Oct 2022 12:01:01 -0700 Subject: kernel/utsname_sysctl.c: Fix hostname polling Commit bfca3dd3d068 ("kernel/utsname_sysctl.c: print kernel arch") added a new entry to the uts_kern_table[] array, but didn't update the UTS_PROC_xyz enumerators of older entries, breaking anything that used them. Which is admittedly not many cases: it's really just the two uses of uts_proc_notify() in kernel/sys.c. But apparently journald-systemd actually uses this to detect hostname changes. Reported-by: Torsten Hilbrich Fixes: bfca3dd3d068 ("kernel/utsname_sysctl.c: print kernel arch") Link: https://lore.kernel.org/lkml/0c2b92a6-0f25-9538-178f-eee3b06da23f@secunet.com/ Link: https://linux-regtracking.leemhuis.info/regzbot/regression/0c2b92a6-0f25-9538-178f-eee3b06da23f@secunet.com/ Cc: Petr Vorel Signed-off-by: Linus Torvalds --- include/linux/utsname.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/utsname.h b/include/linux/utsname.h index 2b1737c9b244..bf7613ba412b 100644 --- a/include/linux/utsname.h +++ b/include/linux/utsname.h @@ -10,6 +10,7 @@ #include enum uts_proc { + UTS_PROC_ARCH, UTS_PROC_OSTYPE, UTS_PROC_OSRELEASE, UTS_PROC_VERSION, -- cgit From 161a438d730dade2ba2b1bf8785f0759aba4ca5f Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 20 Oct 2022 10:39:08 +0200 Subject: efi: random: reduce seed size to 32 bytes We no longer need at least 64 bytes of random seed to permit the early crng init to complete. The RNG is now based on Blake2s, so reduce the EFI seed size to the Blake2s hash size, which is sufficient for our purposes. While at it, drop the READ_ONCE(), which was supposed to prevent size from being evaluated after seed was unmapped. However, this cannot actually happen, so READ_ONCE() is unnecessary here. Cc: # v4.14+ Signed-off-by: Ard Biesheuvel Reviewed-by: Jason A. Donenfeld Acked-by: Ilias Apalodimas --- include/linux/efi.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index 80f3c1c7827d..929d559ad41d 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1222,7 +1222,7 @@ efi_status_t efi_random_get_seed(void); arch_efi_call_virt_teardown(); \ }) -#define EFI_RANDOM_SEED_SIZE 64U +#define EFI_RANDOM_SEED_SIZE 32U // BLAKE2S_HASH_SIZE struct linux_efi_random_seed { u32 size; -- cgit From 31970608a6d3796c3adbfbfd379fa3092de65c5d Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Mon, 26 Sep 2022 12:45:32 -0700 Subject: overflow: Fix kern-doc markup for functions Fix the kern-doc markings for several of the overflow helpers and move their location into the core kernel API documentation, where it belongs (it's not driver-specific). Cc: Jonathan Corbet Cc: linux-doc@vger.kernel.org Cc: linux-hardening@vger.kernel.org Reviewed-by: Akira Yokosawa Signed-off-by: Kees Cook --- include/linux/overflow.h | 38 ++++++++++++++++---------------------- 1 file changed, 16 insertions(+), 22 deletions(-) (limited to 'include/linux') diff --git a/include/linux/overflow.h b/include/linux/overflow.h index 19dfdd74835e..1d3be1a2204c 100644 --- a/include/linux/overflow.h +++ b/include/linux/overflow.h @@ -51,8 +51,8 @@ static inline bool __must_check __must_check_overflow(bool overflow) return unlikely(overflow); } -/** check_add_overflow() - Calculate addition with overflow checking - * +/** + * check_add_overflow() - Calculate addition with overflow checking * @a: first addend * @b: second addend * @d: pointer to store sum @@ -66,8 +66,8 @@ static inline bool __must_check __must_check_overflow(bool overflow) #define check_add_overflow(a, b, d) \ __must_check_overflow(__builtin_add_overflow(a, b, d)) -/** check_sub_overflow() - Calculate subtraction with overflow checking - * +/** + * check_sub_overflow() - Calculate subtraction with overflow checking * @a: minuend; value to subtract from * @b: subtrahend; value to subtract from @a * @d: pointer to store difference @@ -81,8 +81,8 @@ static inline bool __must_check __must_check_overflow(bool overflow) #define check_sub_overflow(a, b, d) \ __must_check_overflow(__builtin_sub_overflow(a, b, d)) -/** check_mul_overflow() - Calculate multiplication with overflow checking - * +/** + * check_mul_overflow() - Calculate multiplication with overflow checking * @a: first factor * @b: second factor * @d: pointer to store product @@ -96,23 +96,24 @@ static inline bool __must_check __must_check_overflow(bool overflow) #define check_mul_overflow(a, b, d) \ __must_check_overflow(__builtin_mul_overflow(a, b, d)) -/** check_shl_overflow() - Calculate a left-shifted value and check overflow - * +/** + * check_shl_overflow() - Calculate a left-shifted value and check overflow * @a: Value to be shifted * @s: How many bits left to shift * @d: Pointer to where to store the result * * Computes *@d = (@a << @s) * - * Returns true if '*d' cannot hold the result or when 'a << s' doesn't + * Returns true if '*@d' cannot hold the result or when '@a << @s' doesn't * make sense. Example conditions: - * - 'a << s' causes bits to be lost when stored in *d. - * - 's' is garbage (e.g. negative) or so large that the result of - * 'a << s' is guaranteed to be 0. - * - 'a' is negative. - * - 'a << s' sets the sign bit, if any, in '*d'. * - * '*d' will hold the results of the attempted shift, but is not + * - '@a << @s' causes bits to be lost when stored in *@d. + * - '@s' is garbage (e.g. negative) or so large that the result of + * '@a << @s' is guaranteed to be 0. + * - '@a' is negative. + * - '@a << @s' sets the sign bit, if any, in '*@d'. + * + * '*@d' will hold the results of the attempted shift, but is not * considered "safe for use" if true is returned. */ #define check_shl_overflow(a, s, d) __must_check_overflow(({ \ @@ -129,7 +130,6 @@ static inline bool __must_check __must_check_overflow(bool overflow) /** * size_mul() - Calculate size_t multiplication with saturation at SIZE_MAX - * * @factor1: first factor * @factor2: second factor * @@ -149,7 +149,6 @@ static inline size_t __must_check size_mul(size_t factor1, size_t factor2) /** * size_add() - Calculate size_t addition with saturation at SIZE_MAX - * * @addend1: first addend * @addend2: second addend * @@ -169,7 +168,6 @@ static inline size_t __must_check size_add(size_t addend1, size_t addend2) /** * size_sub() - Calculate size_t subtraction with saturation at SIZE_MAX - * * @minuend: value to subtract from * @subtrahend: value to subtract from @minuend * @@ -192,7 +190,6 @@ static inline size_t __must_check size_sub(size_t minuend, size_t subtrahend) /** * array_size() - Calculate size of 2-dimensional array. - * * @a: dimension one * @b: dimension two * @@ -205,7 +202,6 @@ static inline size_t __must_check size_sub(size_t minuend, size_t subtrahend) /** * array3_size() - Calculate size of 3-dimensional array. - * * @a: dimension one * @b: dimension two * @c: dimension three @@ -220,7 +216,6 @@ static inline size_t __must_check size_sub(size_t minuend, size_t subtrahend) /** * flex_array_size() - Calculate size of a flexible array member * within an enclosing structure. - * * @p: Pointer to the structure. * @member: Name of the flexible array member. * @count: Number of elements in the array. @@ -237,7 +232,6 @@ static inline size_t __must_check size_sub(size_t minuend, size_t subtrahend) /** * struct_size() - Calculate size of structure with trailing flexible array. - * * @p: Pointer to the structure. * @member: Name of the array member. * @count: Number of elements in the array. -- cgit From 52491a38b2c2411f3f0229dc6ad610349c704a41 Mon Sep 17 00:00:00 2001 From: Michal Luczaj Date: Thu, 13 Oct 2022 21:12:19 +0000 Subject: KVM: Initialize gfn_to_pfn_cache locks in dedicated helper Move the gfn_to_pfn_cache lock initialization to another helper and call the new helper during VM/vCPU creation. There are race conditions possible due to kvm_gfn_to_pfn_cache_init()'s ability to re-initialize the cache's locks. For example: a race between ioctl(KVM_XEN_HVM_EVTCHN_SEND) and kvm_gfn_to_pfn_cache_init() leads to a corrupted shinfo gpc lock. (thread 1) | (thread 2) | kvm_xen_set_evtchn_fast | read_lock_irqsave(&gpc->lock, ...) | | kvm_gfn_to_pfn_cache_init | rwlock_init(&gpc->lock) read_unlock_irqrestore(&gpc->lock, ...) | Rename "cache_init" and "cache_destroy" to activate+deactivate to avoid implying that the cache really is destroyed/freed. Note, there more races in the newly named kvm_gpc_activate() that will be addressed separately. Fixes: 982ed0de4753 ("KVM: Reinstate gfn_to_pfn_cache with invalidation support") Cc: stable@vger.kernel.org Suggested-by: Sean Christopherson Signed-off-by: Michal Luczaj [sean: call out that this is a bug fix] Signed-off-by: Sean Christopherson Message-Id: <20221013211234.1318131-2-seanjc@google.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 24 +++++++++++++++++------- 1 file changed, 17 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 00c3448ba7f8..18592bdf4c1b 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1240,8 +1240,18 @@ int kvm_vcpu_write_guest(struct kvm_vcpu *vcpu, gpa_t gpa, const void *data, void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn); /** - * kvm_gfn_to_pfn_cache_init - prepare a cached kernel mapping and HPA for a - * given guest physical address. + * kvm_gpc_init - initialize gfn_to_pfn_cache. + * + * @gpc: struct gfn_to_pfn_cache object. + * + * This sets up a gfn_to_pfn_cache by initializing locks. Note, the cache must + * be zero-allocated (or zeroed by the caller before init). + */ +void kvm_gpc_init(struct gfn_to_pfn_cache *gpc); + +/** + * kvm_gpc_activate - prepare a cached kernel mapping and HPA for a given guest + * physical address. * * @kvm: pointer to kvm instance. * @gpc: struct gfn_to_pfn_cache object. @@ -1265,9 +1275,9 @@ void kvm_vcpu_mark_page_dirty(struct kvm_vcpu *vcpu, gfn_t gfn); * kvm_gfn_to_pfn_cache_check() to ensure that the cache is valid before * accessing the target page. */ -int kvm_gfn_to_pfn_cache_init(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, - struct kvm_vcpu *vcpu, enum pfn_cache_usage usage, - gpa_t gpa, unsigned long len); +int kvm_gpc_activate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, + struct kvm_vcpu *vcpu, enum pfn_cache_usage usage, + gpa_t gpa, unsigned long len); /** * kvm_gfn_to_pfn_cache_check - check validity of a gfn_to_pfn_cache. @@ -1324,7 +1334,7 @@ int kvm_gfn_to_pfn_cache_refresh(struct kvm *kvm, struct gfn_to_pfn_cache *gpc, void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc); /** - * kvm_gfn_to_pfn_cache_destroy - destroy and unlink a gfn_to_pfn_cache. + * kvm_gpc_deactivate - deactivate and unlink a gfn_to_pfn_cache. * * @kvm: pointer to kvm instance. * @gpc: struct gfn_to_pfn_cache object. @@ -1332,7 +1342,7 @@ void kvm_gfn_to_pfn_cache_unmap(struct kvm *kvm, struct gfn_to_pfn_cache *gpc); * This removes a cache from the @kvm's list to be processed on MMU notifier * invocation. */ -void kvm_gfn_to_pfn_cache_destroy(struct kvm *kvm, struct gfn_to_pfn_cache *gpc); +void kvm_gpc_deactivate(struct kvm *kvm, struct gfn_to_pfn_cache *gpc); void kvm_sigset_activate(struct kvm_vcpu *vcpu); void kvm_sigset_deactivate(struct kvm_vcpu *vcpu); -- cgit From 2d87d455ead2cbdee7e60463cddc5bff3f98c912 Mon Sep 17 00:00:00 2001 From: Ming Lei Date: Thu, 27 Oct 2022 16:57:09 +0800 Subject: blk-mq: don't add non-pt request with ->end_io to batch dm-rq implements ->end_io callback for request issued to underlying queue, and it isn't passthrough request. Commit ab3e1d3bbab9 ("block: allow end_io based requests in the completion batch handling") doesn't clear rq->bio and rq->__data_len for request with ->end_io in blk_mq_end_request_batch(), and this way is actually dangerous, but so far it is only for nvme passthrough request. dm-rq needs to clean up remained bios in case of partial completion, and req->bio is required, then use-after-free is triggered, so the underlying clone request can't be completed in blk_mq_end_request_batch. Fix panic by not adding such request into batch list, and the issue can be triggered simply by exposing nvme pci to dm-mpath simply. Fixes: ab3e1d3bbab9 ("block: allow end_io based requests in the completion batch handling") Cc: dm-devel@redhat.com Cc: Mike Snitzer Reported-by: Changhui Zhong Signed-off-by: Ming Lei Link: https://lore.kernel.org/r/20221027085709.513175-1-ming.lei@redhat.com Signed-off-by: Jens Axboe --- include/linux/blk-mq.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h index ba18e9bdb799..d6119c5d1069 100644 --- a/include/linux/blk-mq.h +++ b/include/linux/blk-mq.h @@ -853,7 +853,8 @@ static inline bool blk_mq_add_to_batch(struct request *req, struct io_comp_batch *iob, int ioerror, void (*complete)(struct io_comp_batch *)) { - if (!iob || (req->rq_flags & RQF_ELV) || ioerror) + if (!iob || (req->rq_flags & RQF_ELV) || ioerror || + (req->end_io && !blk_rq_is_passthrough(req))) return false; if (!iob->complete) -- cgit From bacd22df95147ed673bec4692ab2d4d585935241 Mon Sep 17 00:00:00 2001 From: Tariq Toukan Date: Wed, 26 Oct 2022 14:51:45 +0100 Subject: net/mlx5: Fix possible use-after-free in async command interface mlx5_cmd_cleanup_async_ctx should return only after all its callback handlers were completed. Before this patch, the below race between mlx5_cmd_cleanup_async_ctx and mlx5_cmd_exec_cb_handler was possible and lead to a use-after-free: 1. mlx5_cmd_cleanup_async_ctx is called while num_inflight is 2 (i.e. elevated by 1, a single inflight callback). 2. mlx5_cmd_cleanup_async_ctx decreases num_inflight to 1. 3. mlx5_cmd_exec_cb_handler is called, decreases num_inflight to 0 and is about to call wake_up(). 4. mlx5_cmd_cleanup_async_ctx calls wait_event, which returns immediately as the condition (num_inflight == 0) holds. 5. mlx5_cmd_cleanup_async_ctx returns. 6. The caller of mlx5_cmd_cleanup_async_ctx frees the mlx5_async_ctx object. 7. mlx5_cmd_exec_cb_handler goes on and calls wake_up() on the freed object. Fix it by syncing using a completion object. Mark it completed when num_inflight reaches 0. Trace: BUG: KASAN: use-after-free in do_raw_spin_lock+0x23d/0x270 Read of size 4 at addr ffff888139cd12f4 by task swapper/5/0 CPU: 5 PID: 0 Comm: swapper/5 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack_lvl+0x57/0x7d print_report.cold+0x2d5/0x684 ? do_raw_spin_lock+0x23d/0x270 kasan_report+0xb1/0x1a0 ? do_raw_spin_lock+0x23d/0x270 do_raw_spin_lock+0x23d/0x270 ? rwlock_bug.part.0+0x90/0x90 ? __delete_object+0xb8/0x100 ? lock_downgrade+0x6e0/0x6e0 _raw_spin_lock_irqsave+0x43/0x60 ? __wake_up_common_lock+0xb9/0x140 __wake_up_common_lock+0xb9/0x140 ? __wake_up_common+0x650/0x650 ? destroy_tis_callback+0x53/0x70 [mlx5_core] ? kasan_set_track+0x21/0x30 ? destroy_tis_callback+0x53/0x70 [mlx5_core] ? kfree+0x1ba/0x520 ? do_raw_spin_unlock+0x54/0x220 mlx5_cmd_exec_cb_handler+0x136/0x1a0 [mlx5_core] ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core] ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core] mlx5_cmd_comp_handler+0x65a/0x12b0 [mlx5_core] ? dump_command+0xcc0/0xcc0 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 ? cmd_comp_notifier+0x7e/0xb0 [mlx5_core] cmd_comp_notifier+0x7e/0xb0 [mlx5_core] atomic_notifier_call_chain+0xd7/0x1d0 mlx5_eq_async_int+0x3ce/0xa20 [mlx5_core] atomic_notifier_call_chain+0xd7/0x1d0 ? irq_release+0x140/0x140 [mlx5_core] irq_int_handler+0x19/0x30 [mlx5_core] __handle_irq_event_percpu+0x1f2/0x620 handle_irq_event+0xb2/0x1d0 handle_edge_irq+0x21e/0xb00 __common_interrupt+0x79/0x1a0 common_interrupt+0x78/0xa0 asm_common_interrupt+0x22/0x40 RIP: 0010:default_idle+0x42/0x60 Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 eb 47 22 02 85 c0 7e 07 0f 00 2d e0 9f 48 00 fb f4 48 c7 c7 80 08 7f 85 e8 d1 d3 3e fe eb de 66 66 2e 0f 1f 84 00 RSP: 0018:ffff888100dbfdf0 EFLAGS: 00000242 RAX: 0000000000000001 RBX: ffffffff84ecbd48 RCX: 1ffffffff0afe110 RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835cc9bc RBP: 0000000000000005 R08: 0000000000000001 R09: ffff88881dec4ac3 R10: ffffed1103bd8958 R11: 0000017d0ca571c9 R12: 0000000000000005 R13: ffffffff84f024e0 R14: 0000000000000000 R15: dffffc0000000000 ? default_idle_call+0xcc/0x450 default_idle_call+0xec/0x450 do_idle+0x394/0x450 ? arch_cpu_idle_exit+0x40/0x40 ? do_idle+0x17/0x450 cpu_startup_entry+0x19/0x20 start_secondary+0x221/0x2b0 ? set_cpu_sibling_map+0x2070/0x2070 secondary_startup_64_no_verify+0xcd/0xdb Allocated by task 49502: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 kvmalloc_node+0x48/0xe0 mlx5e_bulk_async_init+0x35/0x110 [mlx5_core] mlx5e_tls_priv_tx_list_cleanup+0x84/0x3e0 [mlx5_core] mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core] mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] mlx5e_suspend+0xdb/0x140 [mlx5_core] mlx5e_remove+0x89/0x190 [mlx5_core] auxiliary_bus_remove+0x52/0x70 device_release_driver_internal+0x40f/0x650 driver_detach+0xc1/0x180 bus_remove_driver+0x125/0x2f0 auxiliary_driver_unregister+0x16/0x50 mlx5e_cleanup+0x26/0x30 [mlx5_core] cleanup+0xc/0x4e [mlx5_core] __x64_sys_delete_module+0x2b5/0x450 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 49502: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0x11d/0x1b0 kfree+0x1ba/0x520 mlx5e_tls_priv_tx_list_cleanup+0x2e7/0x3e0 [mlx5_core] mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core] mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core] mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core] mlx5e_suspend+0xdb/0x140 [mlx5_core] mlx5e_remove+0x89/0x190 [mlx5_core] auxiliary_bus_remove+0x52/0x70 device_release_driver_internal+0x40f/0x650 driver_detach+0xc1/0x180 bus_remove_driver+0x125/0x2f0 auxiliary_driver_unregister+0x16/0x50 mlx5e_cleanup+0x26/0x30 [mlx5_core] cleanup+0xc/0x4e [mlx5_core] __x64_sys_delete_module+0x2b5/0x450 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: e355477ed9e4 ("net/mlx5: Make mlx5_cmd_exec_cb() a safe API") Signed-off-by: Tariq Toukan Reviewed-by: Moshe Shemesh Signed-off-by: Saeed Mahameed Link: https://lore.kernel.org/r/20221026135153.154807-8-saeed@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/mlx5/driver.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index a12929bc31b2..af2ceb4160bc 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -970,7 +970,7 @@ void mlx5_cmd_allowed_opcode(struct mlx5_core_dev *dev, u16 opcode); struct mlx5_async_ctx { struct mlx5_core_dev *dev; atomic_t num_inflight; - struct wait_queue_head wait; + struct completion inflight_done; }; struct mlx5_async_work; -- cgit From 67eae54bc227b30dedcce9db68b063ba1adb7838 Mon Sep 17 00:00:00 2001 From: Peter Xu Date: Mon, 24 Oct 2022 15:33:35 -0400 Subject: mm/uffd: fix vma check on userfault for wp We used to have a report that pte-marker code can be reached even when uffd-wp is not compiled in for file memories, here: https://lore.kernel.org/all/YzeR+R6b4bwBlBHh@x1n/T/#u I just got time to revisit this and found that the root cause is we simply messed up with the vma check, so that for !PTE_MARKER_UFFD_WP system, we will allow UFFDIO_REGISTER of MINOR & WP upon shmem as the check was wrong: if (vm_flags & VM_UFFD_MINOR) return is_vm_hugetlb_page(vma) || vma_is_shmem(vma); Where we'll allow anything to pass on shmem as long as minor mode is requested. Axel did it right when introducing minor mode but I messed it up in b1f9e876862d when moving code around. Fix it. Link: https://lkml.kernel.org/r/20221024193336.1233616-1-peterx@redhat.com Link: https://lkml.kernel.org/r/20221024193336.1233616-2-peterx@redhat.com Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs") Signed-off-by: Peter Xu Cc: Axel Rasmussen Cc: Andrea Arcangeli Cc: Nadav Amit Cc: Signed-off-by: Andrew Morton --- include/linux/userfaultfd_k.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index f07e6998bb68..9df0b9a762cc 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -146,9 +146,9 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma) static inline bool vma_can_userfault(struct vm_area_struct *vma, unsigned long vm_flags) { - if (vm_flags & VM_UFFD_MINOR) - return is_vm_hugetlb_page(vma) || vma_is_shmem(vma); - + if ((vm_flags & VM_UFFD_MINOR) && + (!is_vm_hugetlb_page(vma) && !vma_is_shmem(vma))) + return false; #ifndef CONFIG_PTE_MARKER_UFFD_WP /* * If user requested uffd-wp but not enabled pte markers for -- cgit From 78a498c3a227f2ac773a8234b2ce092a4403f2c3 Mon Sep 17 00:00:00 2001 From: Alexander Potapenko Date: Mon, 24 Oct 2022 23:21:44 +0200 Subject: x86: fortify: kmsan: fix KMSAN fortify builds Ensure that KMSAN builds replace memset/memcpy/memmove calls with the respective __msan_XXX functions, and that none of the macros are redefined twice. This should allow building kernel with both CONFIG_KMSAN and CONFIG_FORTIFY_SOURCE. Link: https://lkml.kernel.org/r/20221024212144.2852069-5-glider@google.com Link: https://github.com/google/kmsan/issues/89 Signed-off-by: Alexander Potapenko Reported-by: Tamas K Lengyel Cc: Nathan Chancellor Cc: Nick Desaulniers Cc: Kees Cook Signed-off-by: Andrew Morton --- include/linux/fortify-string.h | 17 +++++++++++++++-- include/linux/kmsan_string.h | 21 +++++++++++++++++++++ 2 files changed, 36 insertions(+), 2 deletions(-) create mode 100644 include/linux/kmsan_string.h (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index 4029fe368a4f..18a31b125f9d 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -43,11 +43,24 @@ extern __kernel_size_t __underlying_strlen(const char *p) __RENAME(strlen); extern char *__underlying_strncat(char *p, const char *q, __kernel_size_t count) __RENAME(strncat); extern char *__underlying_strncpy(char *p, const char *q, __kernel_size_t size) __RENAME(strncpy); #else -#define __underlying_memchr __builtin_memchr -#define __underlying_memcmp __builtin_memcmp + +#if defined(__SANITIZE_MEMORY__) +/* + * For KMSAN builds all memcpy/memset/memmove calls should be replaced by the + * corresponding __msan_XXX functions. + */ +#include +#define __underlying_memcpy __msan_memcpy +#define __underlying_memmove __msan_memmove +#define __underlying_memset __msan_memset +#else #define __underlying_memcpy __builtin_memcpy #define __underlying_memmove __builtin_memmove #define __underlying_memset __builtin_memset +#endif + +#define __underlying_memchr __builtin_memchr +#define __underlying_memcmp __builtin_memcmp #define __underlying_strcat __builtin_strcat #define __underlying_strcpy __builtin_strcpy #define __underlying_strlen __builtin_strlen diff --git a/include/linux/kmsan_string.h b/include/linux/kmsan_string.h new file mode 100644 index 000000000000..7287da6f52ef --- /dev/null +++ b/include/linux/kmsan_string.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * KMSAN string functions API used in other headers. + * + * Copyright (C) 2022 Google LLC + * Author: Alexander Potapenko + * + */ +#ifndef _LINUX_KMSAN_STRING_H +#define _LINUX_KMSAN_STRING_H + +/* + * KMSAN overrides the default memcpy/memset/memmove implementations in the + * kernel, which requires having __msan_XXX function prototypes in several other + * headers. Keep them in one place instead of open-coding. + */ +void *__msan_memcpy(void *dst, const void *src, size_t size); +void *__msan_memset(void *s, int c, size_t n); +void *__msan_memmove(void *dest, const void *src, size_t len); + +#endif /* _LINUX_KMSAN_STRING_H */ -- cgit From 6f7630b1b5bc672b54c1285ee6aba752b446672c Mon Sep 17 00:00:00 2001 From: Kees Cook Date: Fri, 28 Oct 2022 15:32:07 -0700 Subject: fortify: Capture __bos() results in const temp vars In two recent run-time memcpy() bound checking bug reports (NFS[1] and JFS[2]), the _detection_ was working correctly (in the sense that the requested copy size was larger than the destination field size), but the _warning text_ was showing the destination field size as SIZE_MAX ("unknown size"). This should be impossible, since the detection function will explicitly give up if the destination field size is unknown. For example, the JFS warning was: memcpy: detected field-spanning write (size 132) of single field "ip->i_link" at fs/jfs/namei.c:950 (size 18446744073709551615) Other cases of this warning (e.g.[3]) have reported correctly, and the reproducer only happens under GCC (at least 10.2 and 12.1), so this currently appears to be a GCC bug. Explicitly capturing the __builtin_object_size() results in const temporary variables fixes the report. For example, the JFS reproducer now correctly reports the field size (128): memcpy: detected field-spanning write (size 132) of single field "ip->i_link" at fs/jfs/namei.c:950 (size 128) Examination of the .text delta (which is otherwise identical), shows the literal value used in the report changing: - mov $0xffffffffffffffff,%rcx + mov $0x80,%ecx [1] https://lore.kernel.org/lkml/Y0zEzZwhOxTDcBTB@codemonkey.org.uk/ [2] https://syzkaller.appspot.com/bug?id=23d613df5259b977dac1696bec77f61a85890e3d [3] https://lore.kernel.org/all/202210110948.26b43120-yujie.liu@intel.com/ Cc: "Dr. David Alan Gilbert" Cc: llvm@lists.linux.dev Cc: linux-hardening@vger.kernel.org Signed-off-by: Kees Cook --- include/linux/fortify-string.h | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fortify-string.h b/include/linux/fortify-string.h index 4029fe368a4f..0f00a551939a 100644 --- a/include/linux/fortify-string.h +++ b/include/linux/fortify-string.h @@ -441,13 +441,18 @@ __FORTIFY_INLINE bool fortify_memcpy_chk(__kernel_size_t size, #define __fortify_memcpy_chk(p, q, size, p_size, q_size, \ p_size_field, q_size_field, op) ({ \ - size_t __fortify_size = (size_t)(size); \ - WARN_ONCE(fortify_memcpy_chk(__fortify_size, p_size, q_size, \ - p_size_field, q_size_field, #op), \ + const size_t __fortify_size = (size_t)(size); \ + const size_t __p_size = (p_size); \ + const size_t __q_size = (q_size); \ + const size_t __p_size_field = (p_size_field); \ + const size_t __q_size_field = (q_size_field); \ + WARN_ONCE(fortify_memcpy_chk(__fortify_size, __p_size, \ + __q_size, __p_size_field, \ + __q_size_field, #op), \ #op ": detected field-spanning write (size %zu) of single %s (size %zu)\n", \ __fortify_size, \ "field \"" #p "\" at " __FILE__ ":" __stringify(__LINE__), \ - p_size_field); \ + __p_size_field); \ __underlying_##op(p, q, __fortify_size); \ }) -- cgit From 8bbabb3fddcd0f858be69ed5abc9b470a239d6f2 Mon Sep 17 00:00:00 2001 From: Cong Wang Date: Tue, 1 Nov 2022 21:34:17 -0700 Subject: bpf, sock_map: Move cancel_work_sync() out of sock lock Stanislav reported a lockdep warning, which is caused by the cancel_work_sync() called inside sock_map_close(), as analyzed below by Jakub: psock->work.func = sk_psock_backlog() ACQUIRE psock->work_mutex sk_psock_handle_skb() skb_send_sock() __skb_send_sock() sendpage_unlocked() kernel_sendpage() sock->ops->sendpage = inet_sendpage() sk->sk_prot->sendpage = tcp_sendpage() ACQUIRE sk->sk_lock tcp_sendpage_locked() RELEASE sk->sk_lock RELEASE psock->work_mutex sock_map_close() ACQUIRE sk->sk_lock sk_psock_stop() sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED) cancel_work_sync() __cancel_work_timer() __flush_work() // wait for psock->work to finish RELEASE sk->sk_lock We can move the cancel_work_sync() out of the sock lock protection, but still before saved_close() was called. Fixes: 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()") Reported-by: Stanislav Fomichev Signed-off-by: Cong Wang Signed-off-by: Daniel Borkmann Tested-by: Jakub Sitnicki Acked-by: John Fastabend Acked-by: Jakub Sitnicki Link: https://lore.kernel.org/bpf/20221102043417.279409-1-xiyou.wangcong@gmail.com --- include/linux/skmsg.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index 48f4b645193b..70d6cb94e580 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -376,7 +376,7 @@ static inline void sk_psock_report_error(struct sk_psock *psock, int err) } struct sk_psock *sk_psock_init(struct sock *sk, int node); -void sk_psock_stop(struct sk_psock *psock, bool wait); +void sk_psock_stop(struct sk_psock *psock); #if IS_ENABLED(CONFIG_BPF_STREAM_PARSER) int sk_psock_init_strp(struct sock *sk, struct sk_psock *psock); -- cgit From eb4940d4adf590590a9d0c47e38d2799c2ff9670 Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Fri, 4 Nov 2022 13:57:11 +0100 Subject: mm/slab: remove !CONFIG_TRACING variants of kmalloc_[node_]trace() For !CONFIG_TRACING kernels, the kmalloc() implementation tries (in cases where the allocation size is build-time constant) to save a function call, by inlining kmalloc_trace() to a kmem_cache_alloc() call. However since commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc") this path now fails to pass the original request size to be eventually recorded (for kmalloc caches with debugging enabled). We could adjust the code to call __kmem_cache_alloc_node() as the CONFIG_TRACING variant, but that would as a result inline a call with 5 parameters, bloating the kmalloc() call sites. The cost of extra function call (to kmalloc_trace()) seems like a lesser evil. It also appears that the !CONFIG_TRACING variant is incompatible with upcoming hardening efforts [1] so it's easier if we just remove it now. Kernels with no tracing are rare these days and the benefit is dubious anyway. [1] https://lore.kernel.org/linux-mm/20221101222520.never.109-kees@kernel.org/T/#m20ecf14390e406247bde0ea9cce368f469c539ed Link: https://lore.kernel.org/all/097d8fba-bd10-a312-24a3-a4068c4f424c@suse.cz/ Suggested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> Signed-off-by: Vlastimil Babka --- include/linux/slab.h | 23 ----------------------- 1 file changed, 23 deletions(-) (limited to 'include/linux') diff --git a/include/linux/slab.h b/include/linux/slab.h index 90877fcde70b..45efc6c553b8 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -470,35 +470,12 @@ void *__kmalloc_node(size_t size, gfp_t flags, int node) __assume_kmalloc_alignm void *kmem_cache_alloc_node(struct kmem_cache *s, gfp_t flags, int node) __assume_slab_alignment __malloc; -#ifdef CONFIG_TRACING void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size) __assume_kmalloc_alignment __alloc_size(3); void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags, int node, size_t size) __assume_kmalloc_alignment __alloc_size(4); -#else /* CONFIG_TRACING */ -/* Save a function call when CONFIG_TRACING=n */ -static __always_inline __alloc_size(3) -void *kmalloc_trace(struct kmem_cache *s, gfp_t flags, size_t size) -{ - void *ret = kmem_cache_alloc(s, flags); - - ret = kasan_kmalloc(s, ret, size, flags); - return ret; -} - -static __always_inline __alloc_size(4) -void *kmalloc_node_trace(struct kmem_cache *s, gfp_t gfpflags, - int node, size_t size) -{ - void *ret = kmem_cache_alloc_node(s, gfpflags, node); - - ret = kasan_kmalloc(s, ret, size, gfpflags); - return ret; -} -#endif /* CONFIG_TRACING */ - void *kmalloc_large(size_t size, gfp_t flags) __assume_page_alignment __alloc_size(1); -- cgit From 18acb7fac22ff7b36c7ea5a76b12996e7b7dbaba Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 3 Nov 2022 13:00:13 +0100 Subject: bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop") MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Because __attribute__((patchable_function_entry)) is only available since GCC-8 this solution fails to build on the minimum required GCC version. Undo these changes so we might try again -- without cluttering up the patches with too many changes. This is an almost complete revert of: dbe69b299884 ("bpf: Fix dispatcher patchable function entry to 5 bytes nop") ceea991a019c ("bpf: Move bpf_dispatcher function out of ftrace locations") (notably the arch/x86/Kconfig hunk is kept). Reported-by: David Laight Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Daniel Borkmann Tested-by: Björn Töpel Tested-by: Jiri Olsa Acked-by: Björn Töpel Acked-by: Jiri Olsa Link: https://lkml.kernel.org/r/439d8dc735bb4858875377df67f1b29a@AcuMS.aculab.com Link: https://lore.kernel.org/bpf/20221103120647.728830733@infradead.org --- include/linux/bpf.h | 21 +-------------------- 1 file changed, 1 insertion(+), 20 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 0566705c1d4e..5cd95716b441 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -27,7 +27,6 @@ #include #include #include -#include struct bpf_verifier_env; struct bpf_verifier_log; @@ -971,8 +970,6 @@ struct bpf_trampoline *bpf_trampoline_get(u64 key, struct bpf_attach_target_info *tgt_info); void bpf_trampoline_put(struct bpf_trampoline *tr); int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_funcs); -int __init bpf_arch_init_dispatcher_early(void *ip); - #define BPF_DISPATCHER_INIT(_name) { \ .mutex = __MUTEX_INITIALIZER(_name.mutex), \ .func = &_name##_func, \ @@ -986,21 +983,7 @@ int __init bpf_arch_init_dispatcher_early(void *ip); }, \ } -#define BPF_DISPATCHER_INIT_CALL(_name) \ - static int __init _name##_init(void) \ - { \ - return bpf_arch_init_dispatcher_early(_name##_func); \ - } \ - early_initcall(_name##_init) - -#ifdef CONFIG_X86_64 -#define BPF_DISPATCHER_ATTRIBUTES __attribute__((patchable_function_entry(5))) -#else -#define BPF_DISPATCHER_ATTRIBUTES -#endif - #define DEFINE_BPF_DISPATCHER(name) \ - notrace BPF_DISPATCHER_ATTRIBUTES \ noinline __nocfi unsigned int bpf_dispatcher_##name##_func( \ const void *ctx, \ const struct bpf_insn *insnsi, \ @@ -1010,9 +993,7 @@ int __init bpf_arch_init_dispatcher_early(void *ip); } \ EXPORT_SYMBOL(bpf_dispatcher_##name##_func); \ struct bpf_dispatcher bpf_dispatcher_##name = \ - BPF_DISPATCHER_INIT(bpf_dispatcher_##name); \ - BPF_DISPATCHER_INIT_CALL(bpf_dispatcher_##name); - + BPF_DISPATCHER_INIT(bpf_dispatcher_##name); #define DECLARE_BPF_DISPATCHER(name) \ unsigned int bpf_dispatcher_##name##_func( \ const void *ctx, \ -- cgit From c86df29d11dfba27c0a1f5039cd6fe387fbf4239 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 3 Nov 2022 13:00:14 +0100 Subject: bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The dispatcher function is currently abusing the ftrace __fentry__ call location for its own purposes -- this obviously gives trouble when the dispatcher and ftrace are both in use. A previous solution tried using __attribute__((patchable_function_entry())) which works, except it is GCC-8+ only, breaking the build on the earlier still supported compilers. Instead use static_call() -- which has its own annotations and does not conflict with ftrace -- to rewrite the dispatch function. By using: return static_call()(ctx, insni, bpf_func) you get a perfect forwarding tail call as function body (iow a single jmp instruction). By having the default static_call() target be bpf_dispatcher_nop_func() it retains the default behaviour (an indirect call to the argument function). Only once a dispatcher program is attached is the target rewritten to directly call the JIT'ed image. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Daniel Borkmann Tested-by: Björn Töpel Tested-by: Jiri Olsa Acked-by: Björn Töpel Acked-by: Jiri Olsa Link: https://lkml.kernel.org/r/Y1/oBlK0yFk5c/Im@hirez.programming.kicks-ass.net Link: https://lore.kernel.org/bpf/20221103120647.796772565@infradead.org --- include/linux/bpf.h | 39 ++++++++++++++++++++++++++++++++++++++- 1 file changed, 38 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 5cd95716b441..74c6f449d81e 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -27,6 +27,7 @@ #include #include #include +#include struct bpf_verifier_env; struct bpf_verifier_log; @@ -953,6 +954,10 @@ struct bpf_dispatcher { void *rw_image; u32 image_off; struct bpf_ksym ksym; +#ifdef CONFIG_HAVE_STATIC_CALL + struct static_call_key *sc_key; + void *sc_tramp; +#endif }; static __always_inline __nocfi unsigned int bpf_dispatcher_nop_func( @@ -970,6 +975,34 @@ struct bpf_trampoline *bpf_trampoline_get(u64 key, struct bpf_attach_target_info *tgt_info); void bpf_trampoline_put(struct bpf_trampoline *tr); int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_funcs); + +/* + * When the architecture supports STATIC_CALL replace the bpf_dispatcher_fn + * indirection with a direct call to the bpf program. If the architecture does + * not have STATIC_CALL, avoid a double-indirection. + */ +#ifdef CONFIG_HAVE_STATIC_CALL + +#define __BPF_DISPATCHER_SC_INIT(_name) \ + .sc_key = &STATIC_CALL_KEY(_name), \ + .sc_tramp = STATIC_CALL_TRAMP_ADDR(_name), + +#define __BPF_DISPATCHER_SC(name) \ + DEFINE_STATIC_CALL(bpf_dispatcher_##name##_call, bpf_dispatcher_nop_func) + +#define __BPF_DISPATCHER_CALL(name) \ + static_call(bpf_dispatcher_##name##_call)(ctx, insnsi, bpf_func) + +#define __BPF_DISPATCHER_UPDATE(_d, _new) \ + __static_call_update((_d)->sc_key, (_d)->sc_tramp, (_new)) + +#else +#define __BPF_DISPATCHER_SC_INIT(name) +#define __BPF_DISPATCHER_SC(name) +#define __BPF_DISPATCHER_CALL(name) bpf_func(ctx, insnsi) +#define __BPF_DISPATCHER_UPDATE(_d, _new) +#endif + #define BPF_DISPATCHER_INIT(_name) { \ .mutex = __MUTEX_INITIALIZER(_name.mutex), \ .func = &_name##_func, \ @@ -981,25 +1014,29 @@ int arch_prepare_bpf_dispatcher(void *image, void *buf, s64 *funcs, int num_func .name = #_name, \ .lnode = LIST_HEAD_INIT(_name.ksym.lnode), \ }, \ + __BPF_DISPATCHER_SC_INIT(_name##_call) \ } #define DEFINE_BPF_DISPATCHER(name) \ + __BPF_DISPATCHER_SC(name); \ noinline __nocfi unsigned int bpf_dispatcher_##name##_func( \ const void *ctx, \ const struct bpf_insn *insnsi, \ bpf_func_t bpf_func) \ { \ - return bpf_func(ctx, insnsi); \ + return __BPF_DISPATCHER_CALL(name); \ } \ EXPORT_SYMBOL(bpf_dispatcher_##name##_func); \ struct bpf_dispatcher bpf_dispatcher_##name = \ BPF_DISPATCHER_INIT(bpf_dispatcher_##name); + #define DECLARE_BPF_DISPATCHER(name) \ unsigned int bpf_dispatcher_##name##_func( \ const void *ctx, \ const struct bpf_insn *insnsi, \ bpf_func_t bpf_func); \ extern struct bpf_dispatcher bpf_dispatcher_##name; + #define BPF_DISPATCHER_FUNC(name) bpf_dispatcher_##name##_func #define BPF_DISPATCHER_PTR(name) (&bpf_dispatcher_##name) void bpf_dispatcher_change_prog(struct bpf_dispatcher *d, struct bpf_prog *from, -- cgit From ae64438be1923e3c1102d90fd41db7afcfaf54cc Mon Sep 17 00:00:00 2001 From: Oliver Hartkopp Date: Wed, 2 Nov 2022 10:54:31 +0100 Subject: can: dev: fix skb drop check In commit a6d190f8c767 ("can: skb: drop tx skb if in listen only mode") the priv->ctrlmode element is read even on virtual CAN interfaces that do not create the struct can_priv at startup. This out-of-bounds read may lead to CAN frame drops for virtual CAN interfaces like vcan and vxcan. This patch mainly reverts the original commit and adds a new helper for CAN interface drivers that provide the required information in struct can_priv. Fixes: a6d190f8c767 ("can: skb: drop tx skb if in listen only mode") Reported-by: Dariusz Stojaczyk Cc: Vincent Mailhol Cc: Max Staudt Signed-off-by: Oliver Hartkopp Acked-by: Vincent Mailhol Link: https://lore.kernel.org/all/20221102095431.36831-1-socketcan@hartkopp.net Cc: stable@vger.kernel.org # 6.0.x [mkl: patch pch_can, too] Signed-off-by: Marc Kleine-Budde --- include/linux/can/dev.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) (limited to 'include/linux') diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h index 58f5431a5559..982ba245eb41 100644 --- a/include/linux/can/dev.h +++ b/include/linux/can/dev.h @@ -152,6 +152,22 @@ static inline bool can_is_canxl_dev_mtu(unsigned int mtu) return (mtu >= CANXL_MIN_MTU && mtu <= CANXL_MAX_MTU); } +/* drop skb if it does not contain a valid CAN frame for sending */ +static inline bool can_dev_dropped_skb(struct net_device *dev, struct sk_buff *skb) +{ + struct can_priv *priv = netdev_priv(dev); + + if (priv->ctrlmode & CAN_CTRLMODE_LISTENONLY) { + netdev_info_once(dev, + "interface in listen only mode, dropping skb\n"); + kfree_skb(skb); + dev->stats.tx_dropped++; + return true; + } + + return can_dropped_invalid_skb(dev, skb); +} + void can_setup(struct net_device *dev); struct net_device *alloc_candev_mqs(int sizeof_priv, unsigned int echo_skb_max, -- cgit From 120b116208a0877227fc82e3f0df81e7a3ed4ab1 Mon Sep 17 00:00:00 2001 From: Liam Howlett Date: Fri, 28 Oct 2022 18:04:30 +0000 Subject: maple_tree: reorganize testing to restore module testing Along the development cycle, the testing code support for module/in-kernel compiles was removed. Restore this functionality by moving any internal API tests to the userspace side, as well as threading tests. Fix the lockdep issues and add a way to reduce memory usage so the tests can complete with KASAN + memleak detection. Make the tests work on 32 bit hosts where possible and detect 32 bit hosts in the radix test suite. [akpm@linux-foundation.org: fix module export] [akpm@linux-foundation.org: fix it some more] [liam.howlett@oracle.com: fix compile warnings on 32bit build in check_find()] Link: https://lkml.kernel.org/r/20221107203816.1260327-1-Liam.Howlett@oracle.com Link: https://lkml.kernel.org/r/20221028180415.3074673-1-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett Signed-off-by: Andrew Morton --- include/linux/maple_tree.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h index 2effab72add1..e594db58a0f1 100644 --- a/include/linux/maple_tree.h +++ b/include/linux/maple_tree.h @@ -638,6 +638,12 @@ static inline void mt_set_in_rcu(struct maple_tree *mt) } } +static inline unsigned int mt_height(const struct maple_tree *mt) + +{ + return (mt->ma_flags & MT_FLAGS_HEIGHT_MASK) >> MT_FLAGS_HEIGHT_OFFSET; +} + void *mt_find(struct maple_tree *mt, unsigned long *index, unsigned long max); void *mt_find_after(struct maple_tree *mt, unsigned long *index, unsigned long max); @@ -664,6 +670,7 @@ extern atomic_t maple_tree_tests_passed; void mt_dump(const struct maple_tree *mt); void mt_validate(struct maple_tree *mt); +void mt_cache_shrink(void); #define MT_BUG_ON(__tree, __x) do { \ atomic_inc(&maple_tree_tests_run); \ if (__x) { \ -- cgit From 5cd189e410debedda416fecfc12f4716b5829845 Mon Sep 17 00:00:00 2001 From: Anthony DeRossi Date: Wed, 9 Nov 2022 17:40:26 -0800 Subject: vfio: Export the device set open count The open count of a device set is the sum of the open counts of all devices in the set. Drivers can use this value to determine whether shared resources are in use without tracking them manually or accessing the private open_count in vfio_device. Signed-off-by: Anthony DeRossi Reviewed-by: Jason Gunthorpe Reviewed-by: Kevin Tian Reviewed-by: Yi Liu Link: https://lore.kernel.org/r/20221110014027.28780-3-ajderossi@gmail.com Signed-off-by: Alex Williamson --- include/linux/vfio.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/vfio.h b/include/linux/vfio.h index e7cebeb875dd..fdd393f70b19 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h @@ -189,6 +189,7 @@ int vfio_register_emulated_iommu_dev(struct vfio_device *device); void vfio_unregister_group_dev(struct vfio_device *device); int vfio_assign_device_set(struct vfio_device *device, void *set_id); +unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set); int vfio_mig_get_next_state(struct vfio_device *device, enum vfio_device_mig_state cur_fsm, -- cgit From 550b33cfd445296868a478e8413ffb2e963eed32 Mon Sep 17 00:00:00 2001 From: Ard Biesheuvel Date: Thu, 10 Nov 2022 10:36:20 +0100 Subject: arm64: efi: Force the use of SetVirtualAddressMap() on Altra machines Ampere Altra machines are reported to misbehave when the SetTime() EFI runtime service is called after ExitBootServices() but before calling SetVirtualAddressMap(). Given that the latter is horrid, pointless and explicitly documented as optional by the EFI spec, we no longer invoke it at boot if the configured size of the VA space guarantees that the EFI runtime memory regions can remain mapped 1:1 like they are at boot time. On Ampere Altra machines, this results in SetTime() calls issued by the rtc-efi driver triggering synchronous exceptions during boot. We can now recover from those without bringing down the system entirely, due to commit 23715a26c8d81291 ("arm64: efi: Recover from synchronous exceptions occurring in firmware"). However, it would be better to avoid the issue entirely, given that the firmware appears to remain in a funny state after this. So attempt to identify these machines based on the 'family' field in the type #1 SMBIOS record, and call SetVirtualAddressMap() unconditionally in that case. Tested-by: Alexandru Elisei Signed-off-by: Ard Biesheuvel --- include/linux/efi.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/efi.h b/include/linux/efi.h index 929d559ad41d..7603fc58c47c 100644 --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -389,6 +389,7 @@ void efi_native_runtime_setup(void); #define EFI_LOAD_FILE2_PROTOCOL_GUID EFI_GUID(0x4006c0c1, 0xfcb3, 0x403e, 0x99, 0x6d, 0x4a, 0x6c, 0x87, 0x24, 0xe0, 0x6d) #define EFI_RT_PROPERTIES_TABLE_GUID EFI_GUID(0xeb66918a, 0x7eef, 0x402a, 0x84, 0x2e, 0x93, 0x1d, 0x21, 0xc3, 0x8a, 0xe9) #define EFI_DXE_SERVICES_TABLE_GUID EFI_GUID(0x05ad34ba, 0x6f02, 0x4214, 0x95, 0x2e, 0x4d, 0xa0, 0x39, 0x8e, 0x2b, 0xb9) +#define EFI_SMBIOS_PROTOCOL_GUID EFI_GUID(0x03583ff6, 0xcb36, 0x4940, 0x94, 0x7e, 0xb9, 0xb3, 0x9f, 0x4a, 0xfa, 0xf7) #define EFI_IMAGE_SECURITY_DATABASE_GUID EFI_GUID(0xd719b2cb, 0x3d3a, 0x4596, 0xa3, 0xbc, 0xda, 0xd0, 0x0e, 0x67, 0x65, 0x6f) #define EFI_SHIM_LOCK_GUID EFI_GUID(0x605dab50, 0xe046, 0x4300, 0xab, 0xb6, 0x3d, 0xd8, 0x10, 0xdd, 0x8b, 0x23) -- cgit From 1f6e04a1c7b85da3b765ca9f46029e5d1826d839 Mon Sep 17 00:00:00 2001 From: Xu Kuohai Date: Fri, 11 Nov 2022 07:56:20 -0500 Subject: bpf: Fix offset calculation error in __copy_map_value and zero_map_value Function __copy_map_value and zero_map_value miscalculated copy offset, resulting in possible copy of unwanted data to user or kernel. Fix it. Fixes: cc48755808c6 ("bpf: Add zero_map_value to zero map value with special fields") Fixes: 4d7d7f69f4b1 ("bpf: Adapt copy_map_value for multiple offset case") Signed-off-by: Xu Kuohai Signed-off-by: Andrii Nakryiko Acked-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/bpf/20221111125620.754855-1-xukuohai@huaweicloud.com --- include/linux/bpf.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 74c6f449d81e..c1bd1bd10506 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -315,7 +315,7 @@ static inline void __copy_map_value(struct bpf_map *map, void *dst, void *src, b u32 next_off = map->off_arr->field_off[i]; memcpy(dst + curr_off, src + curr_off, next_off - curr_off); - curr_off += map->off_arr->field_sz[i]; + curr_off = next_off + map->off_arr->field_sz[i]; } memcpy(dst + curr_off, src + curr_off, map->value_size - curr_off); } @@ -344,7 +344,7 @@ static inline void zero_map_value(struct bpf_map *map, void *dst) u32 next_off = map->off_arr->field_off[i]; memset(dst + curr_off, 0, next_off - curr_off); - curr_off += map->off_arr->field_sz[i]; + curr_off = next_off + map->off_arr->field_sz[i]; } memset(dst + curr_off, 0, map->value_size - curr_off); } -- cgit From 42fb0a1e84ff525ebe560e2baf9451ab69127e2b Mon Sep 17 00:00:00 2001 From: "Steven Rostedt (Google)" Date: Thu, 20 Oct 2022 23:14:27 -0400 Subject: tracing/ring-buffer: Have polling block on watermark Currently the way polling works on the ring buffer is broken. It will return immediately if there's any data in the ring buffer whereas a read will block until the watermark (defined by the tracefs buffer_percent file) is hit. That is, a select() or poll() will return as if there's data available, but then the following read will block. This is broken for the way select()s and poll()s are supposed to work. Have the polling on the ring buffer also block the same way reads and splice does on the ring buffer. Link: https://lkml.kernel.org/r/20221020231427.41be3f26@gandalf.local.home Cc: Linux Trace Kernel Cc: Masami Hiramatsu Cc: Mathieu Desnoyers Cc: Primiano Tucci Cc: stable@vger.kernel.org Fixes: 1e0d6714aceb7 ("ring-buffer: Do not wake up a splice waiter when page is not full") Signed-off-by: Steven Rostedt (Google) --- include/linux/ring_buffer.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h index 2504df9a0453..3c7d295746f6 100644 --- a/include/linux/ring_buffer.h +++ b/include/linux/ring_buffer.h @@ -100,7 +100,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full); __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu, - struct file *filp, poll_table *poll_table); + struct file *filp, poll_table *poll_table, int full); void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu); #define RING_BUFFER_ALL_CPUS -1 -- cgit From c964d62f5cab7b43dd0534f22a96eab386c6ec5d Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Thu, 10 Nov 2022 10:44:57 -0800 Subject: block: make dma_alignment a stacking queue_limit Device mappers had always been getting the default 511 dma mask, but the underlying device might have a larger alignment requirement. Since this value is used to determine alloweable direct-io alignment, this needs to be a stackable limit. Signed-off-by: Keith Busch Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20221110184501.2451620-2-kbusch@meta.com Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 50e358a19d98..9898e34b2c2d 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -311,6 +311,13 @@ struct queue_limits { unsigned char discard_misaligned; unsigned char raid_partial_stripes_expensive; enum blk_zoned_model zoned; + + /* + * Drivers that set dma_alignment to less than 511 must be prepared to + * handle individual bvec's that are not a multiple of a SECTOR_SIZE + * due to possible offsets. + */ + unsigned int dma_alignment; }; typedef int (*report_zones_cb)(struct blk_zone *zone, unsigned int idx, @@ -456,12 +463,6 @@ struct request_queue { unsigned long nr_requests; /* Max # of requests */ unsigned int dma_pad_mask; - /* - * Drivers that set dma_alignment to less than 511 must be prepared to - * handle individual bvec's that are not a multiple of a SECTOR_SIZE - * due to possible offsets. - */ - unsigned int dma_alignment; #ifdef CONFIG_BLK_INLINE_ENCRYPTION struct blk_crypto_profile *crypto_profile; @@ -1324,7 +1325,7 @@ static inline sector_t bdev_zone_sectors(struct block_device *bdev) static inline int queue_dma_alignment(const struct request_queue *q) { - return q ? q->dma_alignment : 511; + return q ? q->limits.dma_alignment : 511; } static inline unsigned int bdev_dma_alignment(struct block_device *bdev) -- cgit From b3228254bb6e91e57f920227f72a1a7d81925d81 Mon Sep 17 00:00:00 2001 From: Keith Busch Date: Thu, 10 Nov 2022 10:44:59 -0800 Subject: block: make blk_set_default_limits() private There are no external users of this function. Signed-off-by: Keith Busch Reviewed-by: Christoph Hellwig Link: https://lore.kernel.org/r/20221110184501.2451620-4-kbusch@meta.com Signed-off-by: Jens Axboe --- include/linux/blkdev.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 9898e34b2c2d..891f8cbcd043 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -945,7 +945,6 @@ extern void blk_queue_io_min(struct request_queue *q, unsigned int min); extern void blk_limits_io_opt(struct queue_limits *limits, unsigned int opt); extern void blk_queue_io_opt(struct request_queue *q, unsigned int opt); extern void blk_set_queue_depth(struct request_queue *q, unsigned int depth); -extern void blk_set_default_limits(struct queue_limits *lim); extern void blk_set_stacking_limits(struct queue_limits *lim); extern int blk_stack_limits(struct queue_limits *t, struct queue_limits *b, sector_t offset); -- cgit From bedf06833b1f63c2627bd5634602e05592129d7a Mon Sep 17 00:00:00 2001 From: Aashish Sharma Date: Mon, 7 Nov 2022 21:35:56 +0530 Subject: tracing: Fix warning on variable 'struct trace_array' Move the declaration of 'struct trace_array' out of #ifdef CONFIG_TRACING block, to fix the following warning when CONFIG_TRACING is not set: >> include/linux/trace.h:63:45: warning: 'struct trace_array' declared inside parameter list will not be visible outside of this definition or declaration Link: https://lkml.kernel.org/r/20221107160556.2139463-1-shraash@google.com Fixes: 1a77dd1c2bb5 ("scsi: tracing: Fix compile error in trace_array calls when TRACING is disabled") Cc: "Martin K. Petersen" Cc: Arun Easi Acked-by: Masami Hiramatsu (Google) Reviewed-by: Guenter Roeck Signed-off-by: Aashish Sharma Signed-off-by: Steven Rostedt (Google) --- include/linux/trace.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/trace.h b/include/linux/trace.h index b5e16e438448..80ffda871749 100644 --- a/include/linux/trace.h +++ b/include/linux/trace.h @@ -26,13 +26,13 @@ struct trace_export { int flags; }; +struct trace_array; + #ifdef CONFIG_TRACING int register_ftrace_export(struct trace_export *export); int unregister_ftrace_export(struct trace_export *export); -struct trace_array; - void trace_printk_init_buffers(void); __printf(3, 4) int trace_array_printk(struct trace_array *tr, unsigned long ip, -- cgit From 9eb8ca049c23afb0410f4a7b9a7158f1a0a3ad0e Mon Sep 17 00:00:00 2001 From: David Matlack Date: Wed, 16 Nov 2022 16:16:57 -0800 Subject: KVM: Obey kvm.halt_poll_ns in VMs not using KVM_CAP_HALT_POLL Obey kvm.halt_poll_ns in VMs not using KVM_CAP_HALT_POLL on every halt, rather than just sampling the module parameter when the VM is first created. This restore the original behavior of kvm.halt_poll_ns for VMs that have not opted into KVM_CAP_HALT_POLL. Notably, this change restores the ability for admins to disable or change the maximum halt-polling time system wide for VMs not using KVM_CAP_HALT_POLL. Reported-by: Christian Borntraeger Fixes: acd05785e48c ("kvm: add capability for halt polling") Signed-off-by: David Matlack Message-Id: <20221117001657.1067231-4-dmatlack@google.com> Signed-off-by: Paolo Bonzini --- include/linux/kvm_host.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 18592bdf4c1b..637a60607c7d 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -776,6 +776,7 @@ struct kvm { struct srcu_struct srcu; struct srcu_struct irq_srcu; pid_t userspace_pid; + bool override_halt_poll_ns; unsigned int max_halt_poll_ns; u32 dirty_ring_size; bool vm_bugged; -- cgit From 91482864768a874c4290ef93b84a78f4f1dac51b Mon Sep 17 00:00:00 2001 From: Pavel Begunkov Date: Thu, 17 Nov 2022 18:40:16 +0000 Subject: io_uring: fix multishot accept request leaks Having REQ_F_POLLED set doesn't guarantee that the request is executed as a multishot from the polling path. Fortunately for us, if the code thinks it's multishot issue when it's not, it can only ask to skip completion so leaking the request. Use issue_flags to mark multipoll issues. Cc: stable@vger.kernel.org Fixes: 390ed29b5e425 ("io_uring: add IORING_ACCEPT_MULTISHOT for accept") Signed-off-by: Pavel Begunkov Link: https://lore.kernel.org/r/7700ac57653f2823e30b34dc74da68678c0c5f13.1668710222.git.asml.silence@gmail.com Signed-off-by: Jens Axboe --- include/linux/io_uring.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/io_uring.h b/include/linux/io_uring.h index 43bc8a2edccf..0ded9e271523 100644 --- a/include/linux/io_uring.h +++ b/include/linux/io_uring.h @@ -16,6 +16,9 @@ enum io_uring_cmd_flags { IO_URING_F_SQE128 = 4, IO_URING_F_CQE32 = 8, IO_URING_F_IOPOLL = 16, + + /* the request is executed from poll, it should not be freed */ + IO_URING_F_MULTISHOT = 32, }; struct io_uring_cmd { -- cgit From 489d144563f23911262a652234b80c70c89c978b Mon Sep 17 00:00:00 2001 From: Christian Löhle Date: Thu, 17 Nov 2022 14:42:09 +0000 Subject: mmc: core: Fix ambiguous TRIM and DISCARD arg Clean up the MMC_TRIM_ARGS define that became ambiguous with DISCARD introduction. While at it, let's fix one usage where MMC_TRIM_ARGS falsely included DISCARD too. Fixes: b3bf915308ca ("mmc: core: new discard feature support at eMMC v4.5") Signed-off-by: Christian Loehle Acked-by: Adrian Hunter Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/11376b5714964345908f3990f17e0701@hyperstone.com Signed-off-by: Ulf Hansson --- include/linux/mmc/mmc.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/mmc/mmc.h b/include/linux/mmc/mmc.h index 9c50bc40f8ff..6f7993803ee7 100644 --- a/include/linux/mmc/mmc.h +++ b/include/linux/mmc/mmc.h @@ -451,7 +451,7 @@ static inline bool mmc_ready_for_data(u32 status) #define MMC_SECURE_TRIM1_ARG 0x80000001 #define MMC_SECURE_TRIM2_ARG 0x80008000 #define MMC_SECURE_ARGS 0x80000000 -#define MMC_TRIM_ARGS 0x00008001 +#define MMC_TRIM_OR_DISCARD_ARGS 0x00008003 #define mmc_driver_type_mask(n) (1 << (n)) -- cgit From 870c2481174b839e7159555127bc8b5a5d0699ba Mon Sep 17 00:00:00 2001 From: Moshe Shemesh Date: Tue, 31 May 2022 09:14:03 +0300 Subject: net/mlx5: cmdif, Print info on any firmware cmd failure to tracepoint While moving to new CMD API (quiet API), some pre-existing flows may call the new API function that in case of error, returns the error instead of printing it as previously done. For such flows we bring back the print but to tracepoint this time for sys admins to have the ability to check for errors especially for commands using the new quiet API. Tracepoint output example: devlink-1333 [001] ..... 822.746922: mlx5_cmd: ACCESS_REG(0x805) op_mod(0x0) failed, status bad resource(0x5), syndrome (0xb06e1f), err(-22) Fixes: f23519e542e5 ("net/mlx5: cmdif, Add new api for command execution") Signed-off-by: Moshe Shemesh Reviewed-by: Shay Drory Reviewed-by: Maor Gottlieb Signed-off-by: Saeed Mahameed --- include/linux/mlx5/driver.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h index af2ceb4160bc..06cbad166225 100644 --- a/include/linux/mlx5/driver.h +++ b/include/linux/mlx5/driver.h @@ -981,6 +981,7 @@ struct mlx5_async_work { struct mlx5_async_ctx *ctx; mlx5_async_cbk_t user_callback; u16 opcode; /* cmd opcode */ + u16 op_mod; /* cmd op_mod */ void *out; /* pointer to the cmd output buffer */ }; -- cgit From 50c697215a8cc22f0e58c88f06f2716c05a26e85 Mon Sep 17 00:00:00 2001 From: Sam James Date: Wed, 16 Nov 2022 18:26:34 +0000 Subject: kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible Add missing include for strcmp. Clang 16 makes -Wimplicit-function-declaration an error by default. Unfortunately, out of tree modules may use this in configure scripts, which means failure might cause silent miscompilation or misconfiguration. For more information, see LWN.net [0] or LLVM's Discourse [1], gentoo-dev@ [2], or the (new) c-std-porting mailing list [3]. [0] https://lwn.net/Articles/913505/ [1] https://discourse.llvm.org/t/configure-script-breakage-with-the-new-werror-implicit-function-declaration/65213 [2] https://archives.gentoo.org/gentoo-dev/message/dd9f2d3082b8b6f8dfbccb0639e6e240 [3] hosted at lists.linux.dev. [akpm@linux-foundation.org: remember "linux/"] Link: https://lkml.kernel.org/r/20221116182634.2823136-1-sam@gentoo.org Signed-off-by: Sam James Cc: Signed-off-by: Andrew Morton --- include/linux/license.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/license.h b/include/linux/license.h index 7cce390f120b..ad937f57f2cb 100644 --- a/include/linux/license.h +++ b/include/linux/license.h @@ -2,6 +2,8 @@ #ifndef __LICENSE_H #define __LICENSE_H +#include + static inline int license_is_gpl_compatible(const char *license) { return (strcmp(license, "GPL") == 0 -- cgit From ea4452de2ae987342fadbdd2c044034e6480daad Mon Sep 17 00:00:00 2001 From: Qi Zheng Date: Fri, 18 Nov 2022 18:00:11 +0800 Subject: mm: fix unexpected changes to {failslab|fail_page_alloc}.attr When we specify __GFP_NOWARN, we only expect that no warnings will be issued for current caller. But in the __should_failslab() and __should_fail_alloc_page(), the local GFP flags alter the global {failslab|fail_page_alloc}.attr, which is persistent and shared by all tasks. This is not what we expected, let's fix it. [akpm@linux-foundation.org: unexport should_fail_ex()] Link: https://lkml.kernel.org/r/20221118100011.2634-1-zhengqi.arch@bytedance.com Fixes: 3f913fc5f974 ("mm: fix missing handler for __GFP_NOWARN") Signed-off-by: Qi Zheng Reported-by: Dmitry Vyukov Reviewed-by: Akinobu Mita Reviewed-by: Jason Gunthorpe Cc: Akinobu Mita Cc: Matthew Wilcox Cc: Signed-off-by: Andrew Morton --- include/linux/fault-inject.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/fault-inject.h b/include/linux/fault-inject.h index 9f6e25467844..444236dadcf0 100644 --- a/include/linux/fault-inject.h +++ b/include/linux/fault-inject.h @@ -20,7 +20,6 @@ struct fault_attr { atomic_t space; unsigned long verbose; bool task_filter; - bool no_warn; unsigned long stacktrace_depth; unsigned long require_start; unsigned long require_end; @@ -32,6 +31,10 @@ struct fault_attr { struct dentry *dname; }; +enum fault_flags { + FAULT_NOWARN = 1 << 0, +}; + #define FAULT_ATTR_INITIALIZER { \ .interval = 1, \ .times = ATOMIC_INIT(1), \ @@ -40,11 +43,11 @@ struct fault_attr { .ratelimit_state = RATELIMIT_STATE_INIT_DISABLED, \ .verbose = 2, \ .dname = NULL, \ - .no_warn = false, \ } #define DECLARE_FAULT_ATTR(name) struct fault_attr name = FAULT_ATTR_INITIALIZER int setup_fault_attr(struct fault_attr *attr, char *str); +bool should_fail_ex(struct fault_attr *attr, ssize_t size, int flags); bool should_fail(struct fault_attr *attr, ssize_t size); #ifdef CONFIG_FAULT_INJECTION_DEBUG_FS -- cgit From 9f0933ac026f7e54fe096797af9de20724e79097 Mon Sep 17 00:00:00 2001 From: David Howells Date: Mon, 21 Nov 2022 16:31:34 +0000 Subject: fscache: fix OOB Read in __fscache_acquire_volume The type of a->key[0] is char in fscache_volume_same(). If the length of cache volume key is greater than 127, the value of a->key[0] is less than 0. In this case, klen becomes much larger than 255 after type conversion, because the type of klen is size_t. As a result, memcmp() is read out of bounds. This causes a slab-out-of-bounds Read in __fscache_acquire_volume(), as reported by Syzbot. Fix this by changing the type of the stored key to "u8 *" rather than "char *" (it isn't a simple string anyway). Also put in a check that the volume name doesn't exceed NAME_MAX. BUG: KASAN: slab-out-of-bounds in memcmp+0x16f/0x1c0 lib/string.c:757 Read of size 8 at addr ffff888016f3aa90 by task syz-executor344/3613 Call Trace: memcmp+0x16f/0x1c0 lib/string.c:757 memcmp include/linux/fortify-string.h:420 [inline] fscache_volume_same fs/fscache/volume.c:133 [inline] fscache_hash_volume fs/fscache/volume.c:171 [inline] __fscache_acquire_volume+0x76c/0x1080 fs/fscache/volume.c:328 fscache_acquire_volume include/linux/fscache.h:204 [inline] v9fs_cache_session_get_cookie+0x143/0x240 fs/9p/cache.c:34 v9fs_session_init+0x1166/0x1810 fs/9p/v9fs.c:473 v9fs_mount+0xba/0xc90 fs/9p/vfs_super.c:126 legacy_get_tree+0x105/0x220 fs/fs_context.c:610 vfs_get_tree+0x89/0x2f0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x1326/0x1e20 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3568 Fixes: 62ab63352350 ("fscache: Implement volume registration") Reported-by: syzbot+a76f6a6e524cf2080aa3@syzkaller.appspotmail.com Signed-off-by: David Howells Reviewed-by: Zhang Peng Reviewed-by: Jingbo Xu cc: Dominique Martinet cc: Jeff Layton cc: v9fs-developer@lists.sourceforge.net cc: linux-cachefs@redhat.com Link: https://lore.kernel.org/r/Y3OH+Dmi0QIOK18n@codewreck.org/ # Zhang Peng's v1 fix Link: https://lore.kernel.org/r/20221115140447.2971680-1-zhangpeng362@huawei.com/ # Zhang Peng's v2 fix Link: https://lore.kernel.org/r/166869954095.3793579.8500020902371015443.stgit@warthog.procyon.org.uk/ # v1 Signed-off-by: Linus Torvalds --- include/linux/fscache.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/fscache.h b/include/linux/fscache.h index 36e5dd84cf59..8e312c8323a8 100644 --- a/include/linux/fscache.h +++ b/include/linux/fscache.h @@ -75,7 +75,7 @@ struct fscache_volume { atomic_t n_accesses; /* Number of cache accesses in progress */ unsigned int debug_id; unsigned int key_hash; /* Hash of key string */ - char *key; /* Volume ID, eg. "afs@example.com@1234" */ + u8 *key; /* Volume ID, eg. "afs@example.com@1234" */ struct list_head proc_link; /* Link in /proc/fs/fscache/volumes */ struct hlist_bl_node hash_link; /* Link in hash table */ struct work_struct work; -- cgit From c0071be0e16c461680d87b763ba1ee5e46548fde Mon Sep 17 00:00:00 2001 From: Emeel Hakim Date: Mon, 31 Oct 2022 11:07:59 +0200 Subject: net/mlx5e: MACsec, remove replay window size limitation in offload path Currently offload path limits replay window size to 32/64/128/256 bits, such a limitation should not exist since software allows it. Remove such limitation. Fixes: eb43846b43c3 ("net/mlx5e: Support MACsec offload replay window") Signed-off-by: Emeel Hakim Reviewed-by: Raed Salem Signed-off-by: Saeed Mahameed --- include/linux/mlx5/mlx5_ifc.h | 7 ------- 1 file changed, 7 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 5a4e914e2a6f..981fc7dfa408 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -11611,13 +11611,6 @@ enum { MLX5_MACSEC_ASO_REPLAY_PROTECTION = 0x1, }; -enum { - MLX5_MACSEC_ASO_REPLAY_WIN_32BIT = 0x0, - MLX5_MACSEC_ASO_REPLAY_WIN_64BIT = 0x1, - MLX5_MACSEC_ASO_REPLAY_WIN_128BIT = 0x2, - MLX5_MACSEC_ASO_REPLAY_WIN_256BIT = 0x3, -}; - #define MLX5_MACSEC_ASO_INC_SN 0x2 #define MLX5_MACSEC_ASO_REG_C_4_5 0x2 -- cgit From 26e8f6a75248247982458e8237b98c9fb2ffcf9d Mon Sep 17 00:00:00 2001 From: Heiko Schocher Date: Wed, 23 Nov 2022 08:16:36 +0100 Subject: can: sja1000: fix size of OCR_MODE_MASK define bitfield mode in ocr register has only 2 bits not 3, so correct the OCR_MODE_MASK define. Signed-off-by: Heiko Schocher Link: https://lore.kernel.org/all/20221123071636.2407823-1-hs@denx.de Signed-off-by: Marc Kleine-Budde --- include/linux/can/platform/sja1000.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/can/platform/sja1000.h b/include/linux/can/platform/sja1000.h index 5755ae5a4712..6a869682c120 100644 --- a/include/linux/can/platform/sja1000.h +++ b/include/linux/can/platform/sja1000.h @@ -14,7 +14,7 @@ #define OCR_MODE_TEST 0x01 #define OCR_MODE_NORMAL 0x02 #define OCR_MODE_CLOCK 0x03 -#define OCR_MODE_MASK 0x07 +#define OCR_MODE_MASK 0x03 #define OCR_TX0_INVERT 0x04 #define OCR_TX0_PULLDOWN 0x08 #define OCR_TX0_PULLUP 0x10 -- cgit From 10bc8e4af65946b727728d7479c028742321b60a Mon Sep 17 00:00:00 2001 From: Amir Goldstein Date: Thu, 17 Nov 2022 22:52:49 +0200 Subject: vfs: fix copy_file_range() averts filesystem freeze protection Commit 868f9f2f8e00 ("vfs: fix copy_file_range() regression in cross-fs copies") removed fallback to generic_copy_file_range() for cross-fs cases inside vfs_copy_file_range(). To preserve behavior of nfsd and ksmbd server-side-copy, the fallback to generic_copy_file_range() was added in nfsd and ksmbd code, but that call is missing sb_start_write(), fsnotify hooks and more. Ideally, nfsd and ksmbd would pass a flag to vfs_copy_file_range() that will take care of the fallback, but that code would be subtle and we got vfs_copy_file_range() logic wrong too many times already. Instead, add a flag to explicitly request vfs_copy_file_range() to perform only generic_copy_file_range() and let nfsd and ksmbd use this flag only in the fallback path. This choise keeps the logic changes to minimum in the non-nfsd/ksmbd code paths to reduce the risk of further regressions. Fixes: 868f9f2f8e00 ("vfs: fix copy_file_range() regression in cross-fs copies") Tested-by: Namjae Jeon Tested-by: Luis Henriques Signed-off-by: Amir Goldstein Signed-off-by: Al Viro --- include/linux/fs.h | 8 ++++++++ 1 file changed, 8 insertions(+) (limited to 'include/linux') diff --git a/include/linux/fs.h b/include/linux/fs.h index e654435f1651..59ae95ddb679 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2089,6 +2089,14 @@ struct dir_context { */ #define REMAP_FILE_ADVISORY (REMAP_FILE_CAN_SHORTEN) +/* + * These flags control the behavior of vfs_copy_file_range(). + * They are not available to the user via syscall. + * + * COPY_FILE_SPLICE: call splice direct instead of fs clone/copy ops + */ +#define COPY_FILE_SPLICE (1 << 0) + struct iov_iter; struct io_uring_cmd; -- cgit From dda3bbbb26c823cd54d5bf211df7db12147c9392 Mon Sep 17 00:00:00 2001 From: Saeed Mahameed Date: Tue, 29 Nov 2022 01:30:05 -0800 Subject: Revert "net/mlx5e: MACsec, remove replay window size limitation in offload path" This reverts commit c0071be0e16c461680d87b763ba1ee5e46548fde. The cited commit removed the validity checks which initialized the window_sz and never removed the use of the now uninitialized variable, so now we are left with wrong value in the window size and the following clang warning: [-Wuninitialized] drivers/net/ethernet/mellanox/mlx5/core/en_accel/macsec.c:232:45: warning: variable 'window_sz' is uninitialized when used here MLX5_SET(macsec_aso, aso_ctx, window_size, window_sz); Revet at this time to address the clang issue due to lack of time to test the proper solution. Fixes: c0071be0e16c ("net/mlx5e: MACsec, remove replay window size limitation in offload path") Signed-off-by: Saeed Mahameed Reported-by: Jacob Keller Link: https://lore.kernel.org/r/20221129093006.378840-1-saeed@kernel.org Signed-off-by: Jakub Kicinski --- include/linux/mlx5/mlx5_ifc.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 981fc7dfa408..5a4e914e2a6f 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -11611,6 +11611,13 @@ enum { MLX5_MACSEC_ASO_REPLAY_PROTECTION = 0x1, }; +enum { + MLX5_MACSEC_ASO_REPLAY_WIN_32BIT = 0x0, + MLX5_MACSEC_ASO_REPLAY_WIN_64BIT = 0x1, + MLX5_MACSEC_ASO_REPLAY_WIN_128BIT = 0x2, + MLX5_MACSEC_ASO_REPLAY_WIN_256BIT = 0x3, +}; + #define MLX5_MACSEC_ASO_INC_SN 0x2 #define MLX5_MACSEC_ASO_REG_C_4_5 0x2 -- cgit From dec1d352de5c6e2bcdbb03a2e7a84d85ad2e4f14 Mon Sep 17 00:00:00 2001 From: Yang Shi Date: Tue, 8 Nov 2022 10:43:57 -0800 Subject: mm: replace VM_WARN_ON to pr_warn if the node is offline with __GFP_THISNODE Syzbot reported the below splat: WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 __alloc_pages_node include/linux/gfp.h:221 [inline] WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 hpage_collapse_alloc_page mm/khugepaged.c:807 [inline] WARNING: CPU: 1 PID: 3646 at include/linux/gfp.h:221 alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963 Modules linked in: CPU: 1 PID: 3646 Comm: syz-executor210 Not tainted 6.1.0-rc1-syzkaller-00454-ga70385240892 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/11/2022 RIP: 0010:__alloc_pages_node include/linux/gfp.h:221 [inline] RIP: 0010:hpage_collapse_alloc_page mm/khugepaged.c:807 [inline] RIP: 0010:alloc_charge_hpage+0x802/0xaa0 mm/khugepaged.c:963 Code: e5 01 4c 89 ee e8 6e f9 ae ff 4d 85 ed 0f 84 28 fc ff ff e8 70 fc ae ff 48 8d 6b ff 4c 8d 63 07 e9 16 fc ff ff e8 5e fc ae ff <0f> 0b e9 96 fa ff ff 41 bc 1a 00 00 00 e9 86 fd ff ff e8 47 fc ae RSP: 0018:ffffc90003fdf7d8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff888077f457c0 RSI: ffffffff81cd8f42 RDI: 0000000000000001 RBP: ffff888079388c0c R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: dffffc0000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f6b48ccf700(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f6b48a819f0 CR3: 00000000171e7000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: collapse_file+0x1ca/0x5780 mm/khugepaged.c:1715 hpage_collapse_scan_file+0xd6c/0x17a0 mm/khugepaged.c:2156 madvise_collapse+0x53a/0xb40 mm/khugepaged.c:2611 madvise_vma_behavior+0xd0a/0x1cc0 mm/madvise.c:1066 madvise_walk_vmas+0x1c7/0x2b0 mm/madvise.c:1240 do_madvise.part.0+0x24a/0x340 mm/madvise.c:1419 do_madvise mm/madvise.c:1432 [inline] __do_sys_madvise mm/madvise.c:1432 [inline] __se_sys_madvise mm/madvise.c:1430 [inline] __x64_sys_madvise+0x113/0x150 mm/madvise.c:1430 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f6b48a4eef9 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 b1 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f6b48ccf318 EFLAGS: 00000246 ORIG_RAX: 000000000000001c RAX: ffffffffffffffda RBX: 00007f6b48af0048 RCX: 00007f6b48a4eef9 RDX: 0000000000000019 RSI: 0000000000600003 RDI: 0000000020000000 RBP: 00007f6b48af0040 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f6b48aa53a4 R13: 00007f6b48bffcbf R14: 00007f6b48ccf400 R15: 0000000000022000 It is because khugepaged allocates pages with __GFP_THISNODE, but the preferred node is bogus. The previous patch fixed the khugepaged code to avoid allocating page from non-existing node. But it is still racy against memory hotremove. There is no synchronization with the memory hotplug so it is possible that memory gets offline during a longer taking scanning. So this warning still seems not quite helpful because: * There is no guarantee the node is online for __GFP_THISNODE context for all the callsites. * Kernel just fails the allocation regardless the warning, and it looks all callsites handle the allocation failure gracefully. Although while the warning has helped to identify a buggy code, it is not safe in general and this warning could panic the system with panic-on-warn configuration which tends to be used surprisingly often. So replace VM_WARN_ON to pr_warn(). And the warning will be triggered if __GFP_NOWARN is set since the allocator would print out warning for such case if __GFP_NOWARN is not set. [shy828301@gmail.com: rename nid to this_node and gfp to warn_gfp] Link: https://lkml.kernel.org/r/20221123193014.153983-1-shy828301@gmail.com [akpm@linux-foundation.org: fix whitespace] [akpm@linux-foundation.org: print gfp_mask instead of warn_gfp, per Michel] Link: https://lkml.kernel.org/r/20221108184357.55614-3-shy828301@gmail.com Fixes: 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse") Signed-off-by: Yang Shi Reported-by: Suggested-by: Michal Hocko Acked-by: Michal Hocko Cc: Zach O'Keefe Signed-off-by: Andrew Morton --- include/linux/gfp.h | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/gfp.h b/include/linux/gfp.h index ef4aea3b356e..65a78773dcca 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -210,6 +210,20 @@ alloc_pages_bulk_array_node(gfp_t gfp, int nid, unsigned long nr_pages, struct p return __alloc_pages_bulk(gfp, nid, NULL, nr_pages, NULL, page_array); } +static inline void warn_if_node_offline(int this_node, gfp_t gfp_mask) +{ + gfp_t warn_gfp = gfp_mask & (__GFP_THISNODE|__GFP_NOWARN); + + if (warn_gfp != (__GFP_THISNODE|__GFP_NOWARN)) + return; + + if (node_online(this_node)) + return; + + pr_warn("%pGg allocation from offline node %d\n", &gfp_mask, this_node); + dump_stack(); +} + /* * Allocate pages, preferring the node given as nid. The node must be valid and * online. For more general interface, see alloc_pages_node(). @@ -218,7 +232,7 @@ static inline struct page * __alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order) { VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES); - VM_WARN_ON((gfp_mask & __GFP_THISNODE) && !node_online(nid)); + warn_if_node_offline(nid, gfp_mask); return __alloc_pages(gfp_mask, order, nid, NULL); } @@ -227,7 +241,7 @@ static inline struct folio *__folio_alloc_node(gfp_t gfp, unsigned int order, int nid) { VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES); - VM_WARN_ON((gfp & __GFP_THISNODE) && !node_online(nid)); + warn_if_node_offline(nid, gfp); return __folio_alloc(gfp, order, nid, NULL); } -- cgit From 21b85b09527c28e242db55c1b751f7f7549b830c Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Mon, 14 Nov 2022 15:55:05 -0800 Subject: madvise: use zap_page_range_single for madvise dontneed This series addresses the issue first reported in [1], and fully described in patch 2. Patches 1 and 2 address the user visible issue and are tagged for stable backports. While exploring solutions to this issue, related problems with mmu notification calls were discovered. This is addressed in the patch "hugetlb: remove duplicate mmu notifications:". Since there are no user visible effects, this third is not tagged for stable backports. Previous discussions suggested further cleanup by removing the routine zap_page_range. This is possible because zap_page_range_single is now exported, and all callers of zap_page_range pass ranges entirely within a single vma. This work will be done in a later patch so as not to distract from this bug fix. [1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6JudD0g@mail.gmail.com/ This patch (of 2): Expose the routine zap_page_range_single to zap a range within a single vma. The madvise routine madvise_dontneed_single_vma can use this routine as it explicitly operates on a single vma. Also, update the mmu notification range in zap_page_range_single to take hugetlb pmd sharing into account. This is required as MADV_DONTNEED supports hugetlb vmas. Link: https://lkml.kernel.org/r/20221114235507.294320-1-mike.kravetz@oracle.com Link: https://lkml.kernel.org/r/20221114235507.294320-2-mike.kravetz@oracle.com Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings") Signed-off-by: Mike Kravetz Reported-by: Wei Chen Cc: Axel Rasmussen Cc: David Hildenbrand Cc: Matthew Wilcox Cc: Mina Almasry Cc: Nadav Amit Cc: Naoya Horiguchi Cc: Peter Xu Cc: Rik van Riel Cc: Vlastimil Babka Cc: Signed-off-by: Andrew Morton --- include/linux/mm.h | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index 8bbcccbc5565..cbfb489d381c 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1852,6 +1852,23 @@ static void __maybe_unused show_free_areas(unsigned int flags, nodemask_t *nodem __show_free_areas(flags, nodemask, MAX_NR_ZONES - 1); } +/* + * Parameter block passed down to zap_pte_range in exceptional cases. + */ +struct zap_details { + struct folio *single_folio; /* Locked folio to be unmapped */ + bool even_cows; /* Zap COWed private pages too? */ + zap_flags_t zap_flags; /* Extra flags for zapping */ +}; + +/* + * Whether to drop the pte markers, for example, the uffd-wp information for + * file-backed memory. This should only be specified when we will completely + * drop the page in the mm, either by truncation or unmapping of the vma. By + * default, the flag is not set. + */ +#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0)) + #ifdef CONFIG_MMU extern bool can_do_mlock(void); #else @@ -1869,6 +1886,8 @@ void zap_vma_ptes(struct vm_area_struct *vma, unsigned long address, unsigned long size); void zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size); +void zap_page_range_single(struct vm_area_struct *vma, unsigned long address, + unsigned long size, struct zap_details *details); void unmap_vmas(struct mmu_gather *tlb, struct maple_tree *mt, struct vm_area_struct *start_vma, unsigned long start, unsigned long end); @@ -3467,12 +3486,4 @@ madvise_set_anon_name(struct mm_struct *mm, unsigned long start, } #endif -/* - * Whether to drop the pte markers, for example, the uffd-wp information for - * file-backed memory. This should only be specified when we will completely - * drop the page in the mm, either by truncation or unmapping of the vma. By - * default, the flag is not set. - */ -#define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0)) - #endif /* _LINUX_MM_H */ -- cgit From 04ada095dcfc4ae359418053c0be94453bdf1e84 Mon Sep 17 00:00:00 2001 From: Mike Kravetz Date: Mon, 14 Nov 2022 15:55:06 -0800 Subject: hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear page tables associated with the address range. For hugetlb vmas, zap_page_range will call __unmap_hugepage_range_final. However, __unmap_hugepage_range_final assumes the passed vma is about to be removed and deletes the vma_lock to prevent pmd sharing as the vma is on the way out. In the case of madvise(MADV_DONTNEED) the vma remains, but the missing vma_lock prevents pmd sharing and could potentially lead to issues with truncation/fault races. This issue was originally reported here [1] as a BUG triggered in page_try_dup_anon_rmap. Prior to the introduction of the hugetlb vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to prevent pmd sharing. Subsequent faults on this vma were confused as VM_MAYSHARE indicates a sharable vma, but was not set so page_mapping was not set in new pages added to the page table. This resulted in pages that appeared anonymous in a VM_SHARED vma and triggered the BUG. Address issue by adding a new zap flag ZAP_FLAG_UNMAP to indicate an unmap call from unmap_vmas(). This is used to indicate the 'final' unmapping of a hugetlb vma. When called via MADV_DONTNEED, this flag is not set and the vm_lock is not deleted. [1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6JudD0g@mail.gmail.com/ Link: https://lkml.kernel.org/r/20221114235507.294320-3-mike.kravetz@oracle.com Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings") Signed-off-by: Mike Kravetz Reported-by: Wei Chen Cc: Axel Rasmussen Cc: David Hildenbrand Cc: Matthew Wilcox Cc: Mina Almasry Cc: Nadav Amit Cc: Naoya Horiguchi Cc: Peter Xu Cc: Rik van Riel Cc: Vlastimil Babka Cc: Signed-off-by: Andrew Morton --- include/linux/mm.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/mm.h b/include/linux/mm.h index cbfb489d381c..974ccca609d2 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1868,6 +1868,8 @@ struct zap_details { * default, the flag is not set. */ #define ZAP_FLAG_DROP_MARKER ((__force zap_flags_t) BIT(0)) +/* Set in unmap_vmas() to indicate a final unmap call. Only used by hugetlb */ +#define ZAP_FLAG_UNMAP ((__force zap_flags_t) BIT(1)) #ifdef CONFIG_MMU extern bool can_do_mlock(void); -- cgit From 6617da8fb565445e0be4a4885443006374943d09 Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Wed, 30 Nov 2022 14:49:41 -0800 Subject: mm: add dummy pmd_young() for architectures not having it In order to avoid #ifdeffery add a dummy pmd_young() implementation as a fallback. This is required for the later patch "mm: introduce arch_has_hw_nonleaf_pmd_young()". Link: https://lkml.kernel.org/r/fd3ac3cd-7349-6bbd-890a-71a9454ca0b3@suse.com Signed-off-by: Juergen Gross Acked-by: Yu Zhao Cc: Borislav Petkov Cc: Dave Hansen Cc: Geert Uytterhoeven Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: Sander Eikelenboom Cc: Thomas Gleixner Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index a108b60a6962..6b0d59269b33 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -165,6 +165,13 @@ static inline pte_t *virt_to_kpte(unsigned long vaddr) return pmd_none(*pmd) ? NULL : pte_offset_kernel(pmd, vaddr); } +#ifndef pmd_young +static inline int pmd_young(pmd_t pmd) +{ + return 0; +} +#endif + #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS extern int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address, pte_t *ptep, -- cgit From 4aaf269c768dbacd6268af73fda2ffccaa3f1d88 Mon Sep 17 00:00:00 2001 From: Juergen Gross Date: Wed, 23 Nov 2022 07:45:10 +0100 Subject: mm: introduce arch_has_hw_nonleaf_pmd_young() When running as a Xen PV guests commit eed9a328aa1a ("mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG") can cause a protection violation in pmdp_test_and_clear_young(): BUG: unable to handle page fault for address: ffff8880083374d0 #PF: supervisor write access in kernel mode #PF: error_code(0x0003) - permissions violation PGD 3026067 P4D 3026067 PUD 3027067 PMD 7fee5067 PTE 8010000008337065 Oops: 0003 [#1] PREEMPT SMP NOPTI CPU: 7 PID: 158 Comm: kswapd0 Not tainted 6.1.0-rc5-20221118-doflr+ #1 RIP: e030:pmdp_test_and_clear_young+0x25/0x40 This happens because the Xen hypervisor can't emulate direct writes to page table entries other than PTEs. This can easily be fixed by introducing arch_has_hw_nonleaf_pmd_young() similar to arch_has_hw_pte_young() and test that instead of CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG. Link: https://lkml.kernel.org/r/20221123064510.16225-1-jgross@suse.com Fixes: eed9a328aa1a ("mm: x86: add CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG") Signed-off-by: Juergen Gross Reported-by: Sander Eikelenboom Acked-by: Yu Zhao Tested-by: Sander Eikelenboom Acked-by: David Hildenbrand [core changes] Signed-off-by: Andrew Morton --- include/linux/pgtable.h | 11 +++++++++++ 1 file changed, 11 insertions(+) (limited to 'include/linux') diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 6b0d59269b33..5f0d7d0b9471 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -267,6 +267,17 @@ static inline int pmdp_clear_flush_young(struct vm_area_struct *vma, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif +#ifndef arch_has_hw_nonleaf_pmd_young +/* + * Return whether the accessed bit in non-leaf PMD entries is supported on the + * local CPU. + */ +static inline bool arch_has_hw_nonleaf_pmd_young(void) +{ + return IS_ENABLED(CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG); +} +#endif + #ifndef arch_has_hw_pte_young /* * Return whether the accessed bit is supported on the local CPU. -- cgit From 1d351f1894342c378b96bb9ed89f8debb1e24e9f Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Mon, 28 Nov 2022 13:21:40 -0800 Subject: revert "kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible" It causes build failures with unusual CC/HOSTCC combinations. Quoting https://lkml.kernel.org/r/A222B1E6-69B8-4085-AD1B-27BDB72CA971@goldelico.com: HOSTCC scripts/mod/modpost.o - due to target missing In file included from include/linux/string.h:5, from scripts/mod/../../include/linux/license.h:5, from scripts/mod/modpost.c:24: include/linux/compiler.h:246:10: fatal error: asm/rwonce.h: No such file or directory 246 | #include | ^~~~~~~~~~~~~~ compilation terminated. ... The problem is that HOSTCC is not necessarily the same compiler or even architecture as CC and pulling in or files indirectly isn't a good idea then. My toolchain is providing HOSTCC = gcc (MacPorts) and CC = arm-linux-gnueabihf (built from gcc source) and all running on Darwin. If I change the include to I can then "HOSTCC scripts/mod/modpost.c" but then it fails for "CC kernel/module/main.c" not finding : CC kernel/module/main.o - due to target missing In file included from kernel/module/main.c:43:0: ./include/linux/license.h:5:20: fatal error: string.h: No such file or directory #include ^ compilation terminated. Reported-by: "H. Nikolaus Schaller" Cc: Sam James Signed-off-by: Andrew Morton --- include/linux/license.h | 2 -- 1 file changed, 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/license.h b/include/linux/license.h index ad937f57f2cb..7cce390f120b 100644 --- a/include/linux/license.h +++ b/include/linux/license.h @@ -2,8 +2,6 @@ #ifndef __LICENSE_H #define __LICENSE_H -#include - static inline int license_is_gpl_compatible(const char *license) { return (strcmp(license, "GPL") == 0 -- cgit From fbf8321238bac04368f57af572e05a9c01347a0b Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Wed, 7 Dec 2022 16:53:15 -1000 Subject: memcg: Fix possible use-after-free in memcg_write_event_control() memcg_write_event_control() accesses the dentry->d_name of the specified control fd to route the write call. As a cgroup interface file can't be renamed, it's safe to access d_name as long as the specified file is a regular cgroup file. Also, as these cgroup interface files can't be removed before the directory, it's safe to access the parent too. Prior to 347c4a874710 ("memcg: remove cgroup_event->cft"), there was a call to __file_cft() which verified that the specified file is a regular cgroupfs file before further accesses. The cftype pointer returned from __file_cft() was no longer necessary and the commit inadvertently dropped the file type check with it allowing any file to slip through. With the invarients broken, the d_name and parent accesses can now race against renames and removals of arbitrary files and cause use-after-free's. Fix the bug by resurrecting the file type check in __file_cft(). Now that cgroupfs is implemented through kernfs, checking the file operations needs to go through a layer of indirection. Instead, let's check the superblock and dentry type. Signed-off-by: Tejun Heo Fixes: 347c4a874710 ("memcg: remove cgroup_event->cft") Cc: stable@kernel.org # v3.14+ Reported-by: Jann Horn Acked-by: Johannes Weiner Acked-by: Roman Gushchin Signed-off-by: Linus Torvalds --- include/linux/cgroup.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 528bd44b59e2..2b7d077de7ef 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -68,6 +68,7 @@ struct css_task_iter { struct list_head iters_node; /* css_set->task_iters */ }; +extern struct file_system_type cgroup_fs_type; extern struct cgroup_root cgrp_dfl_root; extern struct css_set init_css_set; -- cgit From 630dc25e43dacfb5af94cd41532e77c47ec1caff Mon Sep 17 00:00:00 2001 From: David Hildenbrand Date: Mon, 5 Dec 2022 16:08:57 +0100 Subject: mm/swap: fix SWP_PFN_BITS with CONFIG_PHYS_ADDR_T_64BIT on 32bit We use "unsigned long" to store a PFN in the kernel and phys_addr_t to store a physical address. On a 64bit system, both are 64bit wide. However, on a 32bit system, the latter might be 64bit wide. This is, for example, the case on x86 with PAE: phys_addr_t and PTEs are 64bit wide, while "unsigned long" only spans 32bit. The current definition of SWP_PFN_BITS without MAX_PHYSMEM_BITS misses that case, and assumes that the maximum PFN is limited by an 32bit phys_addr_t. This implies, that SWP_PFN_BITS will currently only be able to cover 4 GiB - 1 on any 32bit system with 4k page size, which is wrong. Let's rely on the number of bits in phys_addr_t instead, but make sure to not exceed the maximum swap offset, to not make the BUILD_BUG_ON() in is_pfn_swap_entry() unhappy. Note that swp_entry_t is effectively an unsigned long and the maximum swap offset shares that value with the swap type. For example, on an 8 GiB x86 PAE system with a kernel config based on Debian 11.5 (-> CONFIG_FLATMEM=y, CONFIG_X86_PAE=y), we will currently fail removing migration entries (remove_migration_ptes()), because mm/page_vma_mapped.c:check_pte() will fail to identify a PFN match as swp_offset_pfn() wrongly masks off PFN bits. For example, split_huge_page_to_list()->...->remap_page() will leave migration entries in place and continue to unlock the page. Later, when we stumble over these migration entries (e.g., via /proc/self/pagemap), pfn_swap_entry_to_page() will BUG_ON() because these migration entries shouldn't exist anymore and the page was unlocked. [ 33.067591] kernel BUG at include/linux/swapops.h:497! [ 33.067597] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 33.067602] CPU: 3 PID: 742 Comm: cow Tainted: G E 6.1.0-rc8+ #16 [ 33.067605] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014 [ 33.067606] EIP: pagemap_pmd_range+0x644/0x650 [ 33.067612] Code: 00 00 00 00 66 90 89 ce b9 00 f0 ff ff e9 ff fb ff ff 89 d8 31 db e8 48 c6 52 00 e9 23 fb ff ff e8 61 83 56 00 e9 b6 fe ff ff <0f> 0b bf 00 f0 ff ff e9 38 fa ff ff 3e 8d 74 26 00 55 89 e5 57 31 [ 33.067615] EAX: ee394000 EBX: 00000002 ECX: ee394000 EDX: 00000000 [ 33.067617] ESI: c1b0ded4 EDI: 00024a00 EBP: c1b0ddb4 ESP: c1b0dd68 [ 33.067619] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010246 [ 33.067624] CR0: 80050033 CR2: b7a00000 CR3: 01bbbd20 CR4: 00350ef0 [ 33.067625] Call Trace: [ 33.067628] ? madvise_free_pte_range+0x720/0x720 [ 33.067632] ? smaps_pte_range+0x4b0/0x4b0 [ 33.067634] walk_pgd_range+0x325/0x720 [ 33.067637] ? mt_find+0x1d6/0x3a0 [ 33.067641] ? mt_find+0x1d6/0x3a0 [ 33.067643] __walk_page_range+0x164/0x170 [ 33.067646] walk_page_range+0xf9/0x170 [ 33.067648] ? __kmem_cache_alloc_node+0x2a8/0x340 [ 33.067653] pagemap_read+0x124/0x280 [ 33.067658] ? default_llseek+0x101/0x160 [ 33.067662] ? smaps_account+0x1d0/0x1d0 [ 33.067664] vfs_read+0x90/0x290 [ 33.067667] ? do_madvise.part.0+0x24b/0x390 [ 33.067669] ? debug_smp_processor_id+0x12/0x20 [ 33.067673] ksys_pread64+0x58/0x90 [ 33.067675] __ia32_sys_ia32_pread64+0x1b/0x20 [ 33.067680] __do_fast_syscall_32+0x4c/0xc0 [ 33.067683] do_fast_syscall_32+0x29/0x60 [ 33.067686] do_SYSENTER_32+0x15/0x20 [ 33.067689] entry_SYSENTER_32+0x98/0xf1 Decrease the indentation level of SWP_PFN_BITS and SWP_PFN_MASK to keep it readable and consistent. [david@redhat.com: rely on sizeof(phys_addr_t) and min_t() instead] Link: https://lkml.kernel.org/r/20221206105737.69478-1-david@redhat.com [david@redhat.com: use "int" for comparison, as we're only comparing numbers < 64] Link: https://lkml.kernel.org/r/1f157500-2676-7cef-a84e-9224ed64e540@redhat.com Link: https://lkml.kernel.org/r/20221205150857.167583-1-david@redhat.com Fixes: 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry") Signed-off-by: David Hildenbrand Acked-by: Peter Xu Reviewed-by: Yang Shi Cc: Hugh Dickins Cc: Andrea Arcangeli Signed-off-by: Andrew Morton --- include/linux/swapops.h | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) (limited to 'include/linux') diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 86b95ccb81bb..b07b277d6a16 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -33,11 +33,13 @@ * can use the extra bits to store other information besides PFN. */ #ifdef MAX_PHYSMEM_BITS -#define SWP_PFN_BITS (MAX_PHYSMEM_BITS - PAGE_SHIFT) +#define SWP_PFN_BITS (MAX_PHYSMEM_BITS - PAGE_SHIFT) #else /* MAX_PHYSMEM_BITS */ -#define SWP_PFN_BITS (BITS_PER_LONG - PAGE_SHIFT) +#define SWP_PFN_BITS min_t(int, \ + sizeof(phys_addr_t) * 8 - PAGE_SHIFT, \ + SWP_TYPE_SHIFT) #endif /* MAX_PHYSMEM_BITS */ -#define SWP_PFN_MASK (BIT(SWP_PFN_BITS) - 1) +#define SWP_PFN_MASK (BIT(SWP_PFN_BITS) - 1) /** * Migration swap entry specific bitfield definitions. Layout: -- cgit From 4a7ba45b1a435e7097ca0f79a847d0949d0eb088 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Wed, 7 Dec 2022 16:53:15 -1000 Subject: memcg: fix possible use-after-free in memcg_write_event_control() memcg_write_event_control() accesses the dentry->d_name of the specified control fd to route the write call. As a cgroup interface file can't be renamed, it's safe to access d_name as long as the specified file is a regular cgroup file. Also, as these cgroup interface files can't be removed before the directory, it's safe to access the parent too. Prior to 347c4a874710 ("memcg: remove cgroup_event->cft"), there was a call to __file_cft() which verified that the specified file is a regular cgroupfs file before further accesses. The cftype pointer returned from __file_cft() was no longer necessary and the commit inadvertently dropped the file type check with it allowing any file to slip through. With the invarients broken, the d_name and parent accesses can now race against renames and removals of arbitrary files and cause use-after-free's. Fix the bug by resurrecting the file type check in __file_cft(). Now that cgroupfs is implemented through kernfs, checking the file operations needs to go through a layer of indirection. Instead, let's check the superblock and dentry type. Link: https://lkml.kernel.org/r/Y5FRm/cfcKPGzWwl@slm.duckdns.org Fixes: 347c4a874710 ("memcg: remove cgroup_event->cft") Signed-off-by: Tejun Heo Reported-by: Jann Horn Acked-by: Roman Gushchin Acked-by: Johannes Weiner Cc: Linus Torvalds Cc: Michal Hocko Cc: Muchun Song Cc: Shakeel Butt Cc: [3.14+] Signed-off-by: Andrew Morton --- include/linux/cgroup.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h index 528bd44b59e2..2b7d077de7ef 100644 --- a/include/linux/cgroup.h +++ b/include/linux/cgroup.h @@ -68,6 +68,7 @@ struct css_task_iter { struct list_head iters_node; /* css_set->task_iters */ }; +extern struct file_system_type cgroup_fs_type; extern struct cgroup_root cgrp_dfl_root; extern struct css_set init_css_set; -- cgit