author    Frank van der Linden <fvdl@google.com>    2025-02-28 18:29:20 +0000
committer Andrew Morton <akpm@linux-foundation.org>    2025-03-16 22:06:29 -0700
commit    b1222550fbf73840e31a103305d03ad53b8f5a59 (patch)
tree      a64c20e1756976af7754978b88fdcca89f63ae0d /mm/hugetlb.c
parent    eefd3d024a53c5d45e7e70a8677257b035e4de52 (diff)
mm/hugetlb: do pre-HVO for bootmem allocated pages
For large systems, the overhead of vmemmap pages for hugetlb is substantial: about 1.5% of memory, which is about 45G for a 3T system. If you want to configure most of that system for hugetlb (e.g. to use as backing memory for VMs), there is a chance of running out of memory on boot, even though you know that the 45G will become available later.

To avoid this scenario, and since it is a waste to first allocate and then free that 45G during boot, do pre-HVO for hugetlb bootmem allocated pages ('gigantic' pages).

Pre-HVO is done by adding functions that are called from sparse_init_nid_early and sparse_init_nid_late. The first is called before memmap allocation, so it takes care of allocating the memmap HVO-style. The second verifies that all bootmem pages look good; specifically, it checks that they do not intersect with multiple zones. This can only be done from the sparse_init_nid_late path, once zones have been initialized.

The hugetlb page size must be aligned to the section size, and also to the size of memory described by the number of page structures contained in one PMD (since pre-HVO is not prepared to split PMDs). This should be true for most 'gigantic' pages; it is for 1G pages on x86, where both of these alignment requirements are 128M.

This will only have an effect if hugetlb_bootmem_alloc was called early in boot. If not, it does nothing, and HVO for bootmem hugetlb pages works as before.

Link: https://lkml.kernel.org/r/20250228182928.2645936-20-fvdl@google.com
Signed-off-by: Frank van der Linden <fvdl@google.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Roman Gushchin (Cruise) <roman.gushchin@linux.dev>
Cc: Usama Arif <usamaarif642@gmail.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Yu Zhao <yuzhao@google.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
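As a rough illustration of the alignment rule above, here is a minimal sketch, not the patch's actual code: the helper name hugetlb_bootmem_hvo_aligned is hypothetical, but the two checks follow the commit message. The page size must be aligned to the sparse section size and to the span of memory described by one PMD worth of struct pages.

#include <linux/align.h>
#include <linux/hugetlb.h>
#include <linux/mmzone.h>
#include <linux/pgtable.h>

/*
 * Hypothetical helper sketching the pre-HVO eligibility rule: the
 * hugetlb page size must be aligned to the section size and to the
 * amount of memory covered by one PMD's worth of page structures,
 * since pre-HVO is not prepared to split PMDs.
 */
static bool __init hugetlb_bootmem_hvo_aligned(struct hstate *h)
{
	unsigned long section_size = 1UL << SECTION_SIZE_BITS;
	/* Memory described by one PMD of memmap entries. */
	unsigned long pmd_span = (PMD_SIZE / sizeof(struct page)) * PAGE_SIZE;

	return IS_ALIGNED(huge_page_size(h), section_size) &&
	       IS_ALIGNED(huge_page_size(h), pmd_span);
}

On x86, with 4K pages, a 64-byte struct page, and a 2M PMD, pmd_span works out to (2M / 64) * 4K = 128M, and the section size is also 128M, so 1G pages satisfy both requirements, matching the numbers in the message.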
Diffstat (limited to 'mm/hugetlb.c')
-rw-r--r--    mm/hugetlb.c    17
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index db0d35bc9b9b..d1134e915927 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3223,7 +3223,18 @@ found:
*/
memblock_reserved_mark_noinit(virt_to_phys((void *)m + PAGE_SIZE),
huge_page_size(h) - PAGE_SIZE);
- /* Put them into a private list first because mem_map is not up yet */
+
+ /*
+ * Put them into a private list first because mem_map is not up yet.
+ *
+ * For pre-HVO to work correctly, pages need to be on the list for
+ * the node they were actually allocated from. That node may be
+ * different in the case of fallback by memblock_alloc_try_nid_raw.
+ * So, extract the actual node first.
+ */
+ if (nid == NUMA_NO_NODE)
+ node = early_pfn_to_nid(PHYS_PFN(virt_to_phys(m)));
+
INIT_LIST_HEAD(&m->list);
list_add(&m->list, &huge_boot_pages[node]);
m->hstate = h;
@@ -3318,8 +3329,8 @@ static void __init prep_and_add_bootmem_folios(struct hstate *h,
}
}
-static bool __init hugetlb_bootmem_page_zones_valid(int nid,
- struct huge_bootmem_page *m)
+bool __init hugetlb_bootmem_page_zones_valid(int nid,
+ struct huge_bootmem_page *m)
{
unsigned long start_pfn;
bool valid;