diff options
author | Frank van der Linden <fvdl@google.com> | 2025-02-28 18:29:20 +0000 |
---|---|---|
committer | Andrew Morton <akpm@linux-foundation.org> | 2025-03-16 22:06:29 -0700 |
commit | b1222550fbf73840e31a103305d03ad53b8f5a59 (patch) | |
tree | a64c20e1756976af7754978b88fdcca89f63ae0d /mm/sparse-vmemmap.c | |
parent | eefd3d024a53c5d45e7e70a8677257b035e4de52 (diff) |
mm/hugetlb: do pre-HVO for bootmem allocated pages
For large systems, the overhead of vmemmap pages for hugetlb is
substantial. It's about 1.5% of memory, which is about 45G for a 3T
system. If you want to configure most of that system for hugetlb (e.g.
to use as backing memory for VMs), there is a chance of running out of
memory on boot, even though you know that the 45G will become available
later.
To avoid this scenario, and since it's a waste to first allocate and then
free that 45G during boot, do pre-HVO for hugetlb bootmem allocated pages
('gigantic' pages).
pre-HVO is done by adding functions that are called from
sparse_init_nid_early and sparse_init_nid_late. The first is called
before memmap allocation, so it takes care of allocating memmap HVO-style.
The second verifies that all bootmem pages look good, specifically it
checks that they do not intersect with multiple zones. This can only be
done from sparse_init_nid_late path, when zones have been initialized.
The hugetlb page size must be aligned to the section size, and aligned to
the size of memory described by the number of page structures contained in
one PMD (since pre-HVO is not prepared to split PMDs). This should be
true for most 'gigantic' pages, it is for 1G pages on x86, where both of
these alignment requirements are 128M.
This will only have an effect if hugetlb_bootmem_alloc was called early in
boot. If not, it won't do anything, and HVO for bootmem hugetlb pages
works as before.
Link: https://lkml.kernel.org/r/20250228182928.2645936-20-fvdl@google.com
Signed-off-by: Frank van der Linden <fvdl@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'mm/sparse-vmemmap.c')
-rw-r--r-- | mm/sparse-vmemmap.c | 4 |
1 files changed, 4 insertions, 0 deletions
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c index 8cc848c4b17c..fd2ab5118e13 100644 --- a/mm/sparse-vmemmap.c +++ b/mm/sparse-vmemmap.c @@ -32,6 +32,8 @@ #include <asm/pgalloc.h> #include <asm/tlbflush.h> +#include "hugetlb_vmemmap.h" + /* * Flags for vmemmap_populate_range and friends. */ @@ -594,6 +596,7 @@ struct page * __meminit __populate_section_memmap(unsigned long pfn, */ void __init sparse_vmemmap_init_nid_early(int nid) { + hugetlb_vmemmap_init_early(nid); } /* @@ -604,5 +607,6 @@ void __init sparse_vmemmap_init_nid_early(int nid) */ void __init sparse_vmemmap_init_nid_late(int nid) { + hugetlb_vmemmap_init_late(nid); } #endif |