git.armlinux.org.uk/linux.git - Linus' kernel tree

Age	Commit message (Collapse)	Author
2024-04-19	crypto: x86/sha256-ni - convert to use rounds macros	Eric Biggers
	To avoid source code duplication, do the SHA-256 rounds using macros. This reduces the length of sha256_ni_asm.S by 153 lines while still producing the exact same object file. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-19	crypto: x86/aes-xts - access round keys using single-byte offsets	Eric Biggers
	Access the AES round keys using offsets -716 through 716, instead of 016 through 1416. This allows VEX-encoded instructions to address all round keys using 1-byte offsets, whereas before some needed 4-byte offsets. This decreases the code size of aes-xts-avx-x86_64.o by 4.2%. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12	crypto: x86/aes-xts - make non-AVX implementation use new glue code	Eric Biggers
	Make the non-AVX implementation of AES-XTS (xts-aes-aesni) use the new glue code that was introduced for the AVX implementations of AES-XTS. This reduces code size, and it improves the performance of xts-aes-aesni due to the optimization for messages that don't span page boundaries. This required moving the new glue functions higher up in the file and allowing the IV encryption function to be specified by the caller. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12	crypto: x86/sha512-avx2 - add missing vzeroupper	Eric Biggers
	Since sha512_transform_rorx() uses ymm registers, execute vzeroupper before returning from it. This is necessary to avoid reducing the performance of SSE code. Fixes: e01d69cb0195 ("crypto: sha512 - Optimized SHA512 x86_64 assembly routine using AVX instructions.") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12	crypto: x86/sha256-avx2 - add missing vzeroupper	Eric Biggers
	Since sha256_transform_rorx() uses ymm registers, execute vzeroupper before returning from it. This is necessary to avoid reducing the performance of SSE code. Fixes: d34a460092d8 ("crypto: sha256 - Optimized sha256 x86_64 routine using AVX2's RORX instructions") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-12	crypto: x86/nh-avx2 - add missing vzeroupper	Eric Biggers
	Since nh_avx2() uses ymm registers, execute vzeroupper before returning from it. This is necessary to avoid reducing the performance of SSE code. Fixes: 0f961f9f670e ("crypto: x86/nhpoly1305 - add AVX2 accelerated NHPoly1305") Signed-off-by: Eric Biggers <ebiggers@google.com> Acked-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05	crypto: x86/aes-xts - wire up VAES + AVX10/512 implementation	Eric Biggers
	Add an AES-XTS implementation "xts-aes-vaes-avx10_512" for x86_64 CPUs with the VAES, VPCLMULQDQ, and either AVX10/512 or AVX512BW + AVX512VL extensions. This implementation uses zmm registers to operate on four AES blocks at a time. The assembly code is instantiated using a macro so that most of the source code is shared with other implementations. To avoid downclocking on older Intel CPU models, an exclusion list is used to prevent this 512-bit implementation from being used by default on some CPU models. They will use xts-aes-vaes-avx10_256 instead. For now, this exclusion list is simply coded into aesni-intel_glue.c. It may make sense to eventually move it into a more central location. xts-aes-vaes-avx10_512 is slightly faster than xts-aes-vaes-avx10_256 on some current CPUs. E.g., on AMD Zen 4, AES-256-XTS decryption throughput increases by 13% with 4096-byte inputs, or 14% with 512-byte inputs. On Intel Sapphire Rapids, AES-256-XTS decryption throughput increases by 2% with 4096-byte inputs, or 3% with 512-byte inputs. Future CPUs may provide stronger 512-bit support, in which case a larger benefit should be seen. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05	crypto: x86/aes-xts - wire up VAES + AVX10/256 implementation	Eric Biggers
	Add an AES-XTS implementation "xts-aes-vaes-avx10_256" for x86_64 CPUs with the VAES, VPCLMULQDQ, and either AVX10/256 or AVX512BW + AVX512VL extensions. This implementation avoids using zmm registers, instead using ymm registers to operate on two AES blocks at a time. The assembly code is instantiated using a macro so that most of the source code is shared with other implementations. This is the optimal implementation on CPUs that support VAES and AVX512 but where the zmm registers should not be used due to downclocking effects, for example Intel's Ice Lake. It should also be the optimal implementation on future CPUs that support AVX10/256 but not AVX10/512. The performance is slightly better than that of xts-aes-vaes-avx2, which uses the same 256-bit vector length, due to factors such as being able to use ymm16-ymm31 to cache the AES round keys, and being able to use the vpternlogd instruction to do XORs more efficiently. For example, on Ice Lake, the throughput of decrypting 4096-byte messages with AES-256-XTS is 6.6% higher with xts-aes-vaes-avx10_256 than with xts-aes-vaes-avx2. While this is a small improvement, it is straightforward to provide this implementation (xts-aes-vaes-avx10_256) as long as we are providing xts-aes-vaes-avx2 and xts-aes-vaes-avx10_512 anyway, due to the way the _aes_xts_crypt macro is structured. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05	crypto: x86/aes-xts - wire up VAES + AVX2 implementation	Eric Biggers
	Add an AES-XTS implementation "xts-aes-vaes-avx2" for x86_64 CPUs with the VAES, VPCLMULQDQ, and AVX2 extensions, but not AVX512 or AVX10. This implementation uses ymm registers to operate on two AES blocks at a time. The assembly code is instantiated using a macro so that most of the source code is shared with other implementations. This is the optimal implementation on AMD Zen 3. It should also be the optimal implementation on Intel Alder Lake, which similarly supports VAES but not AVX512. Comparing to xts-aes-aesni-avx on Zen 3, xts-aes-vaes-avx2 provides 70% higher AES-256-XTS decryption throughput with 4096-byte messages, or 23% higher with 512-byte messages. A large improvement is also seen with CPUs that do support AVX512 (e.g., 98% higher AES-256-XTS decryption throughput on Ice Lake with 4096-byte messages), though the following patches add AVX512 optimized implementations to get a bit more performance on those CPUs. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05	crypto: x86/aes-xts - wire up AESNI + AVX implementation	Eric Biggers
	Add an AES-XTS implementation "xts-aes-aesni-avx" for x86_64 CPUs that have the AES-NI and AVX extensions but not VAES. It's similar to the existing xts-aes-aesni in that uses xmm registers to operate on one AES block at a time. It differs from xts-aes-aesni in the following ways: - It uses the VEX-coded (non-destructive) instructions from AVX. This improves performance slightly. - It incorporates some additional optimizations such as interleaving the tweak computation with AES en/decryption, handling single-page messages more efficiently, and caching the first round key. - It supports only 64-bit (x86_64). - It's generated by an assembly macro that will also be used to generate VAES-based implementations. The performance improvement over xts-aes-aesni varies from small to large, depending on the CPU and other factors such as the size of the messages en/decrypted. For example, the following increases in AES-256-XTS decryption throughput are seen on the following CPUs: \| 4096-byte messages \| 512-byte messages \| ----------------------+--------------------+-------------------+ Intel Skylake \| 6% \| 31% \| Intel Cascade Lake \| 4% \| 26% \| AMD Zen 1 \| 61% \| 73% \| AMD Zen 2 \| 36% \| 59% \| (The above CPUs don't support VAES, so they can't use VAES instead.) While this isn't as large an improvement as what VAES provides, this still seems worthwhile. This implementation is fairly easy to provide based on the assembly macro that's needed for VAES anyway, and it will be the best implementation on a large number of CPUs (very roughly, the CPUs launched by Intel and AMD from 2011 to 2018). This makes the existing xts-aes-aesni mostly obsolete. For now, leave it in place to support 32-bit kernels and also CPUs like Intel Westmere that support AES-NI but not AVX. (We could potentially remove it anyway and just rely on the indirect acceleration via ecb-aes-aesni in those cases, but that change will need to be considered separately.) Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-05	crypto: x86/aes-xts - add AES-XTS assembly macro for modern CPUs	Eric Biggers
	Add an assembly file aes-xts-avx-x86_64.S which contains a macro that expands into AES-XTS implementations for x86_64 CPUs that support at least AES-NI and AVX, optionally also taking advantage of VAES, VPCLMULQDQ, and AVX512 or AVX10. This patch doesn't expand the macro at all. Later patches will do so, adding each implementation individually so that the motivation and use case for each individual implementation can be fully presented. The file also provides a function aes_xts_encrypt_iv() which handles the encryption of the IV (tweak), using AES-NI and AVX. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02	crypto: x86/aesni - Update aesni_set_key() to return void	Chang S. Bae
	The aesni_set_key() implementation has no error case, yet its prototype specifies to return an error code. Modify the function prototype to return void and adjust the related code. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: linux-crypto@vger.kernel.org Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-04-02	crypto: x86/aesni - Rearrange AES key size check	Chang S. Bae
	aes_expandkey() already includes an AES key size check. If AES-NI is unusable, invoke the function without the size check. Also, use aes_check_keylen() instead of open code. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: linux-crypto@vger.kernel.org Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2024-01-10	Merge tag 'v6.8-p1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Add incremental lskcipher/skcipher processing Algorithms: - Remove SHA1 from drbg - Remove CFB and OFB Drivers: - Add comp high perf mode configuration in hisilicon/zip - Add support for 420xx devices in qat - Add IAA Compression Accelerator driver" * tag 'v6.8-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (172 commits) crypto: iaa - Account for cpu-less numa nodes crypto: scomp - fix req->dst buffer overflow crypto: sahara - add support for crypto_engine crypto: sahara - remove error message for bad aes request size crypto: sahara - remove unnecessary NULL assignments crypto: sahara - remove 'active' flag from sahara_aes_reqctx struct crypto: sahara - use dev_err_probe() crypto: sahara - use devm_clk_get_enabled() crypto: sahara - use BIT() macro crypto: sahara - clean up macro indentation crypto: sahara - do not resize req->src when doing hash operations crypto: sahara - fix processing hash requests with req->nbytes < sg->length crypto: sahara - improve error handling in sahara_sha_process() crypto: sahara - fix wait_for_completion_timeout() error handling crypto: sahara - fix ahash reqsize crypto: sahara - handle zero-length aes requests crypto: skcipher - remove excess kerneldoc members crypto: shash - remove excess kerneldoc members crypto: qat - generate dynamically arbiter mappings crypto: qat - add support for ring pair level telemetry ...
2024-01-03	arch/x86: Fix typos	Bjorn Helgaas
	Fix typos, most reported by "codespell arch/x86". Only touches comments, no code changes. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20240103004011.1758650-1-helgaas@kernel.org
2023-12-08	crypto: x86/sm4 - Remove cfb(sm4)	Herbert Xu
	Remove the unused CFB implementation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-11-17	crypto: x86/sha256 - autoload if SHA-NI detected	Eric Biggers
	The x86 SHA-256 module contains four implementations: SSSE3, AVX, AVX2, and SHA-NI. Commit 1c43c0f1f84a ("crypto: x86/sha - load modules based on CPU features") made the module be autoloaded when SSSE3, AVX, or AVX2 is detected. The omission of SHA-NI appears to be an oversight, perhaps because of the outdated file-level comment. This patch fixes this, though in practice this makes no difference because SSSE3 is a subset of the other three features anyway. Indeed, sha256_ni_transform() executes SSSE3 instructions such as pshufb. Reviewed-by: Roxana Nicolescu <roxana.nicolescu@canonical.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-11-17	crypto: x86/sha1 - autoload if SHA-NI detected	Eric Biggers
	The x86 SHA-1 module contains four implementations: SSSE3, AVX, AVX2, and SHA-NI. Commit 1c43c0f1f84a ("crypto: x86/sha - load modules based on CPU features") made the module be autoloaded when SSSE3, AVX, or AVX2 is detected. The omission of SHA-NI appears to be an oversight, perhaps because of the outdated file-level comment. This patch fixes this, though in practice this makes no difference because SSSE3 is a subset of the other three features anyway. Indeed, sha1_ni_transform() executes SSSE3 instructions such as pshufb. Reviewed-by: Roxana Nicolescu <roxana.nicolescu@canonical.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20	crypto: x86/nhpoly1305 - implement ->digest	Eric Biggers
	Implement the ->digest method to improve performance on single-page messages by reducing the number of indirect calls. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-20	crypto: x86/sha256 - implement ->digest for sha256	Eric Biggers
	Implement a ->digest function for sha256-ssse3, sha256-avx, sha256-avx2, and sha256-ni. This improves the performance of crypto_shash_digest() with these algorithms by reducing the number of indirect calls that are made. For now, don't bother with this for sha224, since sha224 is rarely used. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-05	crypto: x86/aesni - Perform address alignment early for XTS mode	Chang S. Bae
	Currently, the alignment of each field in struct aesni_xts_ctx occurs right before every access. However, it's possible to perform this alignment ahead of time. Introduce a helper function that converts struct crypto_skcipher tfm to struct aesni_xts_ctx ctx and returns an aligned address. Utilize this helper function at the beginning of each XTS function and then eliminate redundant alignment code. Suggested-by: Eric Biggers <ebiggers@kernel.org> Link: https://lore.kernel.org/all/ZFWQ4sZEVu%2FLHq+Q@gmail.com/ Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Cc: linux-crypto@vger.kernel.org Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-05	crypto: x86/aesni - Correct the data type in struct aesni_xts_ctx	Chang S. Bae
	Currently, every field in struct aesni_xts_ctx is defined as a byte array of the same size as struct crypto_aes_ctx. This data type is obscure and the choice lacks justification. To rectify this, update the field type in struct aesni_xts_ctx to match its actual structure. Suggested-by: Eric Biggers <ebiggers@kernel.org> Link: https://lore.kernel.org/all/ZFWQ4sZEVu%2FLHq+Q@gmail.com/ Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Cc: linux-crypto@vger.kernel.org Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-10-05	crypto: x86/aesni - Refactor the common address alignment code	Chang S. Bae
	The address alignment code has been duplicated for each mode. Instead of duplicating the same code, refactor the alignment code and simplify the alignment helpers. Suggested-by: Eric Biggers <ebiggers@kernel.org> Link: https://lore.kernel.org/all/20230526065414.GB875@sol.localdomain/ Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Cc: linux-crypto@vger.kernel.org Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-09-20	crypto: x86/sha - load modules based on CPU features	Roxana Nicolescu
	x86 optimized crypto modules are built as modules rather than build-in and they are not loaded when the crypto API is initialized, resulting in the generic builtin module (sha1-generic) being used instead. It was discovered when creating a sha1/sha256 checksum of a 2Gb file by using kcapi-tools because it would take significantly longer than creating a sha512 checksum of the same file. trace-cmd showed that for sha1/256 the generic module was used, whereas for sha512 the optimized module was used instead. Add module aliases() for these x86 optimized crypto modules based on CPU feature bits so udev gets a chance to load them later in the boot process. This resulted in ~3x decrease in the real-time execution of kcapi-dsg. Fix is inspired from commit aa031b8f702e ("crypto: x86/sha512 - load based on CPU features") where a similar fix was done for sha512. Cc: stable@vger.kernel.org # 5.15+ Suggested-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Suggested-by: Julian Andres Klode <julian.klode@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-09-20	crypto: aesni - Fix double word in comments	Bo Liu
	Remove the repeated word "if" in comments. Signed-off-by: Bo Liu <liubo03@inspur.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-07-22	crypto: x86/aesni - remove unused parameter to aes_set_key_common()	Eric Biggers
	The 'tfm' parameter to aes_set_key_common() is never used, so remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-07-14	crypto: x86/aesni - Align the address before aes_set_key_common()	Chang S. Bae
	aes_set_key_common() performs runtime alignment to the void *raw_ctx pointer. This facilitates consistent access to the 16byte-aligned address during key extension. However, the alignment is already handlded in the GCM-related setkey functions before invoking the common function. Consequently, the alignment in the common function is unnecessary for those functions. To establish a consistent approach throughout the glue code, remove the aes_ctx() call from its current location. Instead, place it at each call site where the runtime alignment is currently absent. Link: https://lore.kernel.org/lkml/20230605024623.GA4653@quark.localdomain/ Suggested-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Cc: linux-crypto@vger.kernel.org Cc: x86@kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-05-29	Merge tag 'v6.4-p3' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "Fix an alignment crash in x86/aria" * tag 'v6.4-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: x86/aria - Use 16 byte alignment for GFNI constant vectors
2023-05-24	crypto: x86/aria - Use 16 byte alignment for GFNI constant vectors	Ard Biesheuvel
	The GFNI routines in the AVX version of the ARIA implementation now use explicit VMOVDQA instructions to load the constant input vectors, which means they must be 16 byte aligned. So ensure that this is the case, by dropping the section split and the incorrect .align 8 directive, and emitting the constants into the 16-byte aligned section instead. Note that the AVX2 version of this code deviates from this pattern, and does not require a similar fix, given that it loads these contants as 8-byte memory operands, for which AVX2 permits any alignment. Cc: Taehee Yoo <ap420073@gmail.com> Fixes: 8b84475318641c2b ("crypto: x86/aria-avx - Do not use avx2 instructions") Reported-by: syzbot+a6abcf08bad8b18fd198@syzkaller.appspotmail.com Tested-by: syzbot+a6abcf08bad8b18fd198@syzkaller.appspotmail.com Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-27	Merge tag 'modules-6.4-rc1' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull module updates from Luis Chamberlain: "The summary of the changes for this pull requests is: - Song Liu's new struct module_memory replacement - Nick Alcock's MODULE_LICENSE() removal for non-modules - My cleanups and enhancements to reduce the areas where we vmalloc module memory for duplicates, and the respective debug code which proves the remaining vmalloc pressure comes from userspace. Most of the changes have been in linux-next for quite some time except the minor fixes I made to check if a module was already loaded prior to allocating the final module memory with vmalloc and the respective debug code it introduces to help clarify the issue. Although the functional change is small it is rather safe as it can only help reduce vmalloc space for duplicates and is confirmed to fix a bootup issue with over 400 CPUs with KASAN enabled. I don't expect stable kernels to pick up that fix as the cleanups would have also had to have been picked up. Folks on larger CPU systems with modules will want to just upgrade if vmalloc space has been an issue on bootup. Given the size of this request, here's some more elaborate details: The functional change change in this pull request is the very first patch from Song Liu which replaces the 'struct module_layout' with a new 'struct module_memory'. The old data structure tried to put together all types of supported module memory types in one data structure, the new one abstracts the differences in memory types in a module to allow each one to provide their own set of details. This paves the way in the future so we can deal with them in a cleaner way. If you look at changes they also provide a nice cleanup of how we handle these different memory areas in a module. This change has been in linux-next since before the merge window opened for v6.3 so to provide more than a full kernel cycle of testing. It's a good thing as quite a bit of fixes have been found for it. Jason Baron then made dynamic debug a first class citizen module user by using module notifier callbacks to allocate / remove module specific dynamic debug information. Nick Alcock has done quite a bit of work cross-tree to remove module license tags from things which cannot possibly be module at my request so to: a) help him with his longer term tooling goals which require a deterministic evaluation if a piece a symbol code could ever be part of a module or not. But quite recently it is has been made clear that tooling is not the only one that would benefit. Disambiguating symbols also helps efforts such as live patching, kprobes and BPF, but for other reasons and R&D on this area is active with no clear solution in sight. b) help us inch closer to the now generally accepted long term goal of automating all the MODULE_LICENSE() tags from SPDX license tags In so far as a) is concerned, although module license tags are a no-op for non-modules, tools which would want create a mapping of possible modules can only rely on the module license tag after the commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"). Nick has been working on this for years and AFAICT I was the only one to suggest two alternatives to this approach for tooling. The complexity in one of my suggested approaches lies in that we'd need a possible-obj-m and a could-be-module which would check if the object being built is part of any kconfig build which could ever lead to it being part of a module, and if so define a new define -DPOSSIBLE_MODULE [0]. A more obvious yet theoretical approach I've suggested would be to have a tristate in kconfig imply the same new -DPOSSIBLE_MODULE as well but that means getting kconfig symbol names mapping to modules always, and I don't think that's the case today. I am not aware of Nick or anyone exploring either of these options. Quite recently Josh Poimboeuf has pointed out that live patching, kprobes and BPF would benefit from resolving some part of the disambiguation as well but for other reasons. The function granularity KASLR (fgkaslr) patches were mentioned but Joe Lawrence has clarified this effort has been dropped with no clear solution in sight [1]. In the meantime removing module license tags from code which could never be modules is welcomed for both objectives mentioned above. Some developers have also welcomed these changes as it has helped clarify when a module was never possible and they forgot to clean this up, and so you'll see quite a bit of Nick's patches in other pull requests for this merge window. I just picked up the stragglers after rc3. LWN has good coverage on the motivation behind this work [2] and the typical cross-tree issues he ran into along the way. The only concrete blocker issue he ran into was that we should not remove the MODULE_LICENSE() tags from files which have no SPDX tags yet, even if they can never be modules. Nick ended up giving up on his efforts due to having to do this vetting and backlash he ran into from folks who really did not understand the core of the issue nor were providing any alternative / guidance. I've gone through his changes and dropped the patches which dropped the module license tags where an SPDX license tag was missing, it only consisted of 11 drivers. To see if a pull request deals with a file which lacks SPDX tags you can just use: ./scripts/spdxcheck.py -f \ $(git diff --name-only commid-id \| xargs echo) You'll see a core module file in this pull request for the above, but that's not related to his changes. WE just need to add the SPDX license tag for the kernel/module/kmod.c file in the future but it demonstrates the effectiveness of the script. Most of Nick's changes were spread out through different trees, and I just picked up the slack after rc3 for the last kernel was out. Those changes have been in linux-next for over two weeks. The cleanups, debug code I added and final fix I added for modules were motivated by David Hildenbrand's report of boot failing on a systems with over 400 CPUs when KASAN was enabled due to running out of virtual memory space. Although the functional change only consists of 3 lines in the patch "module: avoid allocation if module is already present and ready", proving that this was the best we can do on the modules side took quite a bit of effort and new debug code. The initial cleanups I did on the modules side of things has been in linux-next since around rc3 of the last kernel, the actual final fix for and debug code however have only been in linux-next for about a week or so but I think it is worth getting that code in for this merge window as it does help fix / prove / evaluate the issues reported with larger number of CPUs. Userspace is not yet fixed as it is taking a bit of time for folks to understand the crux of the issue and find a proper resolution. Worst come to worst, I have a kludge-of-concept [3] of how to make kernel_read() calls for modules unique / converge them, but I'm currently inclined to just see if userspace can fix this instead" Link: https://lore.kernel.org/all/Y/kXDqW+7d71C4wz@bombadil.infradead.org/ [0] Link: https://lkml.kernel.org/r/025f2151-ce7c-5630-9b90-98742c97ac65@redhat.com [1] Link: https://lwn.net/Articles/927569/ [2] Link: https://lkml.kernel.org/r/20230414052840.1994456-3-mcgrof@kernel.org [3] tag 'modules-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (121 commits) module: add debugging auto-load duplicate module support module: stats: fix invalid_mod_bytes typo module: remove use of uninitialized variable len module: fix building stats for 32-bit targets module: stats: include uapi/linux/module.h module: avoid allocation if module is already present and ready module: add debug stats to help identify memory pressure module: extract patient module check into helper modules/kmod: replace implementation with a semaphore Change DEFINE_SEMAPHORE() to take a number argument module: fix kmemleak annotations for non init ELF sections module: Ignore L0 and rename is_arm_mapping_symbol() module: Move is_arm_mapping_symbol() to module_symbol.h module: Sync code of is_arm_mapping_symbol() scripts/gdb: use mem instead of core_layout to get the module address interconnect: remove module-related code interconnect: remove MODULE_LICENSE in non-modules zswap: remove MODULE_LICENSE in non-modules zpool: remove MODULE_LICENSE in non-modules x86/mm/dump_pagetables: remove MODULE_LICENSE in non-modules ...
2023-04-20	crypto: x86/sha - Use local .L symbols for code	Ard Biesheuvel
	Avoid cluttering up the kallsyms symbol table with entries that should not end up in things like backtraces, as they have undescriptive and generated identifiers. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20	crypto: x86/crc32 - Use local .L symbols for code	Ard Biesheuvel
	Avoid cluttering up the kallsyms symbol table with entries that should not end up in things like backtraces, as they have undescriptive and generated identifiers. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20	crypto: x86/aesni - Use local .L symbols for code	Ard Biesheuvel
	Avoid cluttering up the kallsyms symbol table with entries that should not end up in things like backtraces, as they have undescriptive and generated identifiers. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20	crypto: x86/sha256 - Use RIP-relative addressing	Ard Biesheuvel
	Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20	crypto: x86/ghash - Use RIP-relative addressing	Ard Biesheuvel
	Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20	crypto: x86/des3 - Use RIP-relative addressing	Ard Biesheuvel
	Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Co-developed-by: Thomas Garnier <thgarnie@chromium.org> Signed-off-by: Thomas Garnier <thgarnie@chromium.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20	crypto: x86/crc32c - Use RIP-relative addressing	Ard Biesheuvel
	Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20	crypto: x86/cast6 - Use RIP-relative addressing	Ard Biesheuvel
	Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Co-developed-by: Thomas Garnier <thgarnie@chromium.org> Signed-off-by: Thomas Garnier <thgarnie@chromium.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20	crypto: x86/cast5 - Use RIP-relative addressing	Ard Biesheuvel
	Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Co-developed-by: Thomas Garnier <thgarnie@chromium.org> Signed-off-by: Thomas Garnier <thgarnie@chromium.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20	crypto: x86/camellia - Use RIP-relative addressing	Ard Biesheuvel
	Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Co-developed-by: Thomas Garnier <thgarnie@chromium.org> Signed-off-by: Thomas Garnier <thgarnie@chromium.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20	crypto: x86/aria - Use RIP-relative addressing	Ard Biesheuvel
	Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20	crypto: x86/aesni - Use RIP-relative addressing	Ard Biesheuvel
	Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. In the GCM case, we can get rid of the oversized permutation array entirely while at it. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-20	crypto: x86/aegis128 - Use RIP-relative addressing	Ard Biesheuvel
	Prefer RIP-relative addressing where possible, which removes the need for boot time relocation fixups. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-04-13	crypto: blake2s: remove module_init and module.h inclusion	Nick Alcock
	Now this can no longer be built as a module, drop all remaining module-related code as well. Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Suggested-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: linux-modules@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: linux-crypto@vger.kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-04-13	crypto: remove MODULE_LICENSE in non-modules	Nick Alcock
	Since commit 8b41fc4454e ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations are used to identify modules. As a consequence, uses of the macro in non-modules will cause modprobe to misidentify their containing object file as a module when it is not (false positives), and modprobe might succeed rather than failing with a suitable error message. So remove it in the files in this commit, none of which can be built as modules. Signed-off-by: Nick Alcock <nick.alcock@oracle.com> Suggested-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: linux-modules@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Hitomi Hasegawa <hasegawa-hitomi@fujitsu.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: linux-crypto@vger.kernel.org Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2023-02-14	crypto: x86/aria-avx - Do not use avx2 instructions	Taehee Yoo
	vpbroadcastb and vpbroadcastd are not AVX instructions. But the aria-avx assembly code contains these instructions. So, kernel panic will occur if the aria-avx works on AVX2 unsupported CPU. vbroadcastss, and vpshufb are used to avoid using vpbroadcastb in it. Unfortunately, this change reduces performance by about 5%. Also, vpbroadcastd is simply replaced by vmovdqa in it. Fixes: ba3579e6e45c ("crypto: aria-avx - add AES-NI/AVX/x86_64/GFNI assembler implementation of aria cipher") Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Reported-by: Erhard F. <erhard_f@mailbox.org> Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-02-10	crypto: x86/blowfish - Eliminate use of SYM_TYPED_FUNC_START in asm	Peter Lafreniere
	Now that we use the ECB/CBC macros, none of the asm functions in blowfish-x86_64 are called indirectly. So we can safely use SYM_FUNC_START instead of SYM_TYPED_FUNC_START with no effect, allowing us to remove an include. Signed-off-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-02-10	crypto: x86/blowfish - Convert to use ECB/CBC helpers	Peter Lafreniere
	We can simplify the blowfish-x86_64 glue code by using the preexisting ECB/CBC helper macros. Additionally, this allows for easier reuse of asm functions in later x86 implementations of blowfish. This involves: 1 - Modifying blowfish_dec_blk_4way() to xor outputs when a flag is passed. 2 - Renaming blowfish_dec_blk_4way() to __blowfish_dec_blk_4way(). 3 - Creating two wrapper functions around __blowfish_dec_blk_4way() for use in the ECB/CBC macros. 4 - Removing the custom ecb_encrypt() and cbc_encrypt() routines in favor of macro-based routines. Signed-off-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-02-10	crypto: x86/blowfish - Remove unused encode parameter	Peter Lafreniere
	The blowfish-x86_64 encryption functions have an unused argument. Remove it. This involves: 1 - Removing xor_block() macros. 2 - Removing handling of fourth argument from __blowfish_enc_blk{,_4way}() functions. 3 - Renaming __blowfish_enc_blk{,_4way}() to blowfish_enc_blk{,_4way}(). 4 - Removing the blowfish_enc_blk{,_4way}() wrappers from blowfish_glue.c 5 - Temporarily using SYM_TYPED_FUNC_START for now indirectly-callable encode functions. Signed-off-by: Peter Lafreniere <peter@n8pjl.ca> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2023-02-03	crypto: x86 - exit fpu context earlier in ECB/CBC macros	Peter Lafreniere
	Currently the ecb/cbc macros hold fpu context unnecessarily when using scalar cipher routines (e.g. when handling odd sizes of blocks per walk). Change the macros to drop fpu context as soon as the fpu is out of use. No performance impact found (on Intel Haswell). Signed-off-by: Peter Lafreniere <peter@n8pjl.ca> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>