path: root/mm
Age    Commit message    Author
2025-01-13readahead: properly shorten readahead when falling back to do_page_cache_ra()Jan Kara
When we succeed in creating some folios in page_cache_ra_order() but then need to fall back to single-page folios, we don't shorten the amount to read passed to do_page_cache_ra() by the amount we've already read. This then results in reading more and also in placing another readahead mark in the middle of the readahead window, which confuses the readahead code. Fix the problem by properly reducing the number of pages to read. Unlike the previous attempt at this fix (commit 7c877586da31), which had to be reverted, we are now careful to check there is indeed something to read so that we don't submit negative-sized readahead. Link: https://lkml.kernel.org/r/20241204181016.15273-3-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
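A minimal standalone sketch of the window arithmetic described above (all names are illustrative, not the actual mm/readahead.c code): only the part of the request that the large-folio path did not already cover should be handed to do_page_cache_ra(), and a zero remainder means the fallback call can be skipped entirely.

    #include <stdio.h>

    /* 'already_read' folios were created by the large-folio path; only the
     * remainder, if any, is left for the single-page fallback. Clamping at
     * zero avoids submitting a negative-sized readahead. */
    static unsigned long remaining_to_read(unsigned long requested,
                                           unsigned long already_read)
    {
        return already_read < requested ? requested - already_read : 0;
    }

    int main(void)
    {
        printf("%lu\n", remaining_to_read(32, 12)); /* 20 pages left for fallback */
        printf("%lu\n", remaining_to_read(32, 32)); /* 0: skip do_page_cache_ra() */
        return 0;
    }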
2025-01-13readahead: don't shorten readahead window in read_pages()Jan Kara
Patch series "readahead: Reintroduce fix for improper RA window sizing". This small patch series reintroduces a fix of readahead window confusion (and thus read throughput reduction) when page_cache_ra_order() ends up failing due to folios already present in the page cache. After thinking about this for a while I have ended up with a dumb fix that just rechecks if we have something to read before calling do_page_cache_ra(). This fixes the problem reported in [1]. I still think it doesn't make much sense to update readahead window size in read_pages() so patch 1 removes that but the real fix in patch 2 does not depend on it. [1] https://lore.kernel.org/all/49648605-d800-4859-be49-624bbe60519d@gmail.com This patch (of 2): When ->readahead callback doesn't read all requested pages, read_pages() shortens the readahead window (ra->size). However we don't know why pages were not read and what appropriate window size is. So don't try to secondguess the filesystem. If it needs different readahead window, it should set it manually similarly as during expansion the filesystem can use readahead_expand(). Link: https://lkml.kernel.org/r/20241204181016.15273-1-jack@suse.cz Link: https://lkml.kernel.org/r/20241204181016.15273-2-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_alloc: forward the gfp flags from alloc_contig_range() to post_alloc_hook()David Hildenbrand
In the __GFP_COMP case, we already pass the gfp_flags to prep_new_page()->post_alloc_hook(). However, in the !__GFP_COMP case, we essentially pass only hardcoded __GFP_MOVABLE to post_alloc_hook(), preventing some action modifiers from being effective. Let's pass our now properly adjusted gfp flags there as well. This way, we can now support __GFP_ZERO for alloc_contig_*(). As a side effect, we now also support __GFP_SKIP_ZERO and __GFP_ZEROTAGS; but we'll keep the more special stuff (KASAN, NOLOCKDEP) disabled for now. It's worth noting that with __GFP_ZERO, we might unnecessarily zero pages when we have to release part of our range using free_contig_range() again. This can be optimized in the future, if ever required; the caller we'll be converting (powernv/memtrace) next won't trigger this. Link: https://lkml.kernel.org/r/20241203094732.200195-6-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
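A standalone illustration of why the forwarding matters (the flag values and functions below are illustrative, not the kernel's definitions): when the hook only ever receives a hardcoded mask, action modifiers such as __GFP_ZERO requested by the caller can never take effect.

    #include <stdio.h>

    #define GFP_MOVABLE (1u << 0) /* illustrative bit values, not the kernel's */
    #define GFP_ZERO    (1u << 1)

    /* Stand-in for the post-allocation hook: it acts on the flags it is given. */
    static void post_alloc_hook_sketch(unsigned int gfp)
    {
        puts(gfp & GFP_ZERO ? "zeroing pages" : "leaving pages as-is");
    }

    int main(void)
    {
        unsigned int caller_gfp = GFP_MOVABLE | GFP_ZERO;

        post_alloc_hook_sketch(GFP_MOVABLE);  /* old: hardcoded mask, __GFP_ZERO lost */
        post_alloc_hook_sketch(caller_gfp);   /* new: adjusted caller flags forwarded */
        return 0;
    }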
2025-01-13mm/page_alloc: sort out the alloc_contig_range() gfp flags messDavid Hildenbrand
It's all a bit complicated for alloc_contig_range(). For example, we don't support many flags, so let's start bailing out on unsupported ones -- ignoring the placement hints, as we are already given the range to allocate. While we currently set cc.gfp_mask, in __alloc_contig_migrate_range() we simply create yet another GFP mask whereby we ignore the reclaim flags specified by the caller. That looks very inconsistent. Let's clean it up, constructing the gfp flags used for compaction/migration exactly once. Update the documentation of the gfp_mask parameter for alloc_contig_range() and alloc_contig_pages(). Link: https://lkml.kernel.org/r/20241203094732.200195-5-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
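A standalone sketch of the scheme described above, under the assumption that it reduces to "reject unsupported flags, then derive the compaction/migration mask in one place" (the flag names, values, and supported set below are illustrative, not the kernel's):

    #include <stdio.h>

    #define GFP_RECLAIM (1u << 0) /* illustrative values only */
    #define GFP_NOWARN  (1u << 1)
    #define GFP_ZERO    (1u << 2)
    #define GFP_HIGHMEM (1u << 3) /* placement hint: accepted but ignored */
    #define SUPPORTED   (GFP_RECLAIM | GFP_NOWARN | GFP_ZERO | GFP_HIGHMEM)

    /* Bail out on unsupported flags, then build the migration mask exactly once. */
    static int contig_gfp_sketch(unsigned int caller_gfp, unsigned int *cc_gfp)
    {
        if (caller_gfp & ~SUPPORTED)
            return -1;
        *cc_gfp = caller_gfp & (GFP_RECLAIM | GFP_NOWARN);
        return 0;
    }

    int main(void)
    {
        unsigned int cc_gfp;

        if (!contig_gfp_sketch(GFP_RECLAIM | GFP_HIGHMEM, &cc_gfp))
            printf("cc.gfp_mask = %#x\n", cc_gfp);
        return 0;
    }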
2025-01-13mm/page_alloc: make __alloc_contig_migrate_range() staticDavid Hildenbrand
The single user is in page_alloc.c. Link: https://lkml.kernel.org/r/20241203094732.200195-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_isolation: don't pass gfp flags to start_isolate_page_range()David Hildenbrand
The parameter is unused, so let's stop passing it. Link: https://lkml.kernel.org/r/20241203094732.200195-3-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_isolation: don't pass gfp flags to isolate_single_pageblock()David Hildenbrand
Patch series "mm/page_alloc: gfp flags cleanups for alloc_contig_*()", v2. Let's clean up the gfp flags handling, and support __GFP_ZERO, such that we can finally remove the TODO in memtrace code. This patch (of 6): The flags are no longer used, we can stop passing them to isolate_single_pageblock(). Link: https://lkml.kernel.org/r/20241203094732.200195-1-david@redhat.com Link: https://lkml.kernel.org/r/20241203094732.200195-2-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Naveen N Rao <naveen@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/memory_hotplug: move debug_pagealloc_map_pages() into online_pages_range()David Hildenbrand
In the near future, we want to have a single way to hand over PageOffline pages to the buddy, whereby they could have: (a) Never been exposed to the buddy before: kept PageOffline when onlining the memory block. (b) Been allocated from the buddy, for example using alloc_contig_range(), to then be set PageOffline. Let's start by making generic_online_page()->__free_pages_core() less special compared to ordinary page freeing (e.g., free_contig_range()), and perform the debug_pagealloc_map_pages() call unconditionally, even when the online callback might decide to keep the pages offline. All pages are already initialized with PageOffline, so nobody touches them either way. Link: https://lkml.kernel.org/r/20241203102050.223318-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Acked-by: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/vma: move __vm_munmap() to mm/vma.cLorenzo Stoakes
This was arbitrarily left in mmap.c; it makes no sense being there, so move it to vma.c to render it testable. Link: https://lkml.kernel.org/r/5e5e81807c54dfbe363edb2d431eb3d7a37fcdba.1733248985.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/vma: move stack expansion logic to mm/vma.cLorenzo Stoakes
We build on previous work making expand_downwards() an entirely internal function. This logic is subtle, so it is highly useful to get it into vma.c so we can then unit test it in userland. We must additionally move acct_stack_growth() to vma.c as it is a helper function used by both expand_downwards() and expand_upwards(). We are also then able to mark anon_vma_interval_tree_pre_update_vma() and anon_vma_interval_tree_post_update_vma() static as these are no longer used by anything else. Link: https://lkml.kernel.org/r/0feb104eff85922019d4fb29280f3afb130c5204.1733248985.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm: abstract get_arg_page() stack expansion and mmap read lockLorenzo Stoakes
Right now fs/exec.c invokes expand_downwards(), an otherwise internal implementation detail of the VMA logic, in order to ensure that an arg page can be obtained by get_user_pages_remote(). In order to be able to move the stack expansion logic into mm/vma.c to make it available to userland testing, we need to find an alternative approach here. We do so by providing the mmap_read_lock_maybe_expand() function, which also helpfully documents what get_arg_page() is doing here and adds an additional check against VM_GROWSDOWN to make explicit that the stack expansion logic is only invoked when the VMA is indeed a downward-growing stack. This allows expand_downwards() to become a static function. Importantly, the VMA referenced by mmap_read_lock_maybe_expand() must NOT be currently user-visible in any way, that is, placed within an rmap or VMA tree. It must be a newly allocated VMA. This is the case when exec invokes this function. Link: https://lkml.kernel.org/r/5295d1c70c58e6aa63d14be68d4e1de9fa1c8e6d.1733248985.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/vma: move unmapped_area() internals to mm/vma.cLorenzo Stoakes
We want to be able to unit test the unmapped area logic, so move it to mm/vma.c. The wrappers which invoke this remain in place in mm/mmap.c. In addition, naturally, update the existing test code to enable this to be compiled in userland. Link: https://lkml.kernel.org/r/53a57a52a64ea54e9d129d2e2abca3a538022379.1733248985.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/vma: move brk() internals to mm/vma.cLorenzo Stoakes
Patch series "mm/vma: make more mmap logic userland testable". This series carries on the work started in previous series and continued in commit 52956b0d7fb9 ("mm: isolate mmap internal logic to mm/vma.c"), moving the remainder of memory mapping implementation details logic into mm/vma.c allowing the bulk of the mapping logic to be unit tested. It is highly useful to do so, as this means we can both fundamentally test this core logic, and introduce regression tests to ensure any issues previously resolved do not recur. Vitally, this includes the do_brk_flags() function, meaning we have both core means of userland mapping memory now testable. Performance testing was performed after this change given the brk() system call's sensitivity to change, and no performance regression was observed. The stack expansion logic is also moved into mm/vma.c, which necessitates a change in the API exposed to the exec code, removing the invocation of the expand_downwards() function used in get_arg_page() and instead adding mmap_read_lock_maybe_expand() to wrap this. This patch (of 5): Now we have moved mmap_region() internals to mm/vma.c, making it available to userland testing, it makes sense to do the same with brk(). This continues the pattern of VMA heavy lifting being done in mm/vma.c in an environment where it can be subject to straightforward unit and regression testing, with other VMA-adjacent files becoming wrappers around this functionality. [lorenzo.stoakes@oracle.com: add missing personality header import] Link: https://lkml.kernel.org/r/2a717265-985f-45eb-9257-8b2857088ed4@lucifer.local Link: https://lkml.kernel.org/r/cover.1733248985.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/3d24b9e67bb0261539ca921d1188a10a1b4d4357.1733248985.git.lorenzo.stoakes@oracle.com Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_alloc: add some detailed comments in can_steal_fallbackgaoxiang17
[akpm@linux-foundation.org: tweak grammar, fit to 80 cols] Link: https://lkml.kernel.org/r/20240920122030.159751-1-gxxa03070307@gmail.com Signed-off-by: gaoxiang17 <gaoxiang17@xiaomi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm:kasan: fix sparse warnings: Should it be static?Nihar Chaithanya
When the global variables kasan_ptr_result and kasan_int_result are made static volatile, the warnings are removed and the variables and assignments are retained; with plain static, they might be optimized away. Make the global variables static volatile, removing the warnings: mm/kasan/kasan_test.c:36:6: warning: symbol 'kasan_ptr_result' was not declared. Should it be static? mm/kasan/kasan_test.c:37:5: warning: symbol 'kasan_int_result' was not declared. Should it be static? Link: https://lkml.kernel.org/r/20241011114537.35664-1-niharchaithanya@gmail.com Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312261010.o0lRiI9b-lkp@intel.com/ Signed-off-by: Nihar Chaithanya <niharchaithanya@gmail.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
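A minimal sketch of the resulting declarations in mm/kasan/kasan_test.c (the exact qualifier placement upstream may differ; this only illustrates the change the message describes):

    /* static limits the symbols to this translation unit, silencing sparse;
     * volatile keeps the compiler from optimising away the test assignments
     * made through these variables. */
    static volatile void *kasan_ptr_result;
    static volatile int kasan_int_result;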
2025-01-13mm: swap_cgroup: get rid of __lookup_swap_cgroup()Roman Gushchin
Because the swap_cgroup map is now virtually contiguous, swap_cgroup_record() can be simplified, which eliminates the need to use __lookup_swap_cgroup(). Now, as __lookup_swap_cgroup() is really trivial and is used only once, it can be inlined. Link: https://lkml.kernel.org/r/20241115190229.676440-2-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm: swap_cgroup: allocate swap_cgroup map using vcalloc()Roman Gushchin
Currently swap_cgroup's map is constructed as a vmalloc()'s-based array of pointers to individual struct pages. This brings an unnecessary complexity into the code. This patch turns the swap_cgroup's map into a single space allocated by vcalloc(). [akpm@linux-foundation.org: s/vfree/kvfree/, per Shakeel] Link: https://lkml.kernel.org/r/20241115190229.676440-1-roman.gushchin@linux.dev Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
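A hedged kernel-style sketch of the new allocation shape (the struct layout and field names are illustrative; the real code lives in mm/swap_cgroup.c): one virtually contiguous, zeroed map replaces the old vmalloc()'d array of per-chunk struct page pointers.

    #include <linux/vmalloc.h>
    #include <linux/slab.h>
    #include <linux/errno.h>

    struct swap_cgroup_ctrl_sketch {
        unsigned short *map;   /* one memcg id per swap slot (illustrative) */
        unsigned long length;
    };

    static int swap_cgroup_alloc_sketch(struct swap_cgroup_ctrl_sketch *ctrl,
                                        unsigned long max_pages)
    {
        ctrl->map = vcalloc(max_pages, sizeof(*ctrl->map));
        if (!ctrl->map)
            return -ENOMEM;
        ctrl->length = max_pages;
        return 0;
    }

    static void swap_cgroup_free_sketch(struct swap_cgroup_ctrl_sketch *ctrl)
    {
        kvfree(ctrl->map);     /* kvfree, per the s/vfree/kvfree/ note above */
        ctrl->map = NULL;
    }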
2025-01-13mm: remove the non-useful else after a break in a if statementKeren Sun
Remove the else block, since there is already a break in the if (iter->oom_lock) branch; just set iter->oom_lock to true after the if block ends. Link: https://lkml.kernel.org/r/20241115235744.1419580-4-kerensun@google.com Signed-off-by: Keren Sun <kerensun@google.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
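A standalone illustration of the pattern (the struct and loop below are illustrative, not the actual memcg code): once the taken branch ends in break, the else is redundant and the assignment can simply follow the if.

    #include <stdbool.h>
    #include <stddef.h>

    struct node { bool oom_lock; };

    /* Lock every node, stopping at the first one that is already locked. */
    static struct node *lock_all(struct node *nodes, int n)
    {
        struct node *failed = NULL;
        int i;

        for (i = 0; i < n; i++) {
            struct node *iter = &nodes[i];

            if (iter->oom_lock) {
                failed = iter;
                break;
            }
            iter->oom_lock = true; /* no 'else' needed: break already exited */
        }
        return failed;
    }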
2025-01-13mm: remove unnecessary whitespace before a quoted newlineKeren Sun
Remove whitespace before newlines in strings passed to pr_warn_once(). Link: https://lkml.kernel.org/r/20241115235744.1419580-3-kerensun@google.com Signed-off-by: Keren Sun <kerensun@google.com> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm: prefer 'unsigned int' to bare use of 'unsigned'Keren Sun
Patch series "mm: fix format issues and param types" Change the param 'mode' from type 'unsigned' to 'unsigned int' in memcg_event_wake() and memcg_oom_wake_function(), and for the param 'nid' in VM_BUG_ON(). Link: https://lkml.kernel.org/r/20241115235744.1419580-2-kerensun@google.com Signed-off-by: Keren Sun <kerensun@google.com> Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13filemap: remove unused folio_add_wait_queueDr. David Alan Gilbert
folio_add_wait_queue() has been unused since 2021's commit 850cba069c26 ("cachefiles: Delete the cachefiles driver pending rewrite") Remove it. Link: https://lkml.kernel.org/r/20241116151446.95555-1-linux@treblig.org Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/rodata_test: verify test data is unchanged, rather than non-zeroPetr Tesarik
Verify that the test variable holds the initialization value, rather than any non-zero value. Link: https://lkml.kernel.org/r/386ffda192eb4a26f68c526c496afd48a5cd87ce.1732016064.git.ptesarik@suse.com Signed-off-by: Petr Tesarik <ptesarik@suse.com> Reviewed-by: Kees Cook <kees@kernel.org> Cc: Jinbum Park <jinb.park7@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/rodata_test: use READ_ONCE() to read const variablePetr Tesarik
Patch series "Fix mm/rodata_test", v2. Make sure that the test actually reads the read-only memory location. Verify that the variable contains the expected value rather than any non-zero value. This patch (of 2): The C compiler may optimize away the memory read of a const variable if its value is known at compile time. In particular, GCC14 with -O2 generates no code at all for test 1, and it generates the following x86_64 instructions for test 3: cmpl $195, 4(%rsp) je .L14 That is, it replaces the read of rodata_test_data with an immediate value and compares it to the value of the local variable "zero". Use READ_ONCE() to undo any such compiler optimizations and enforce a memory read. Link: https://lkml.kernel.org/r/cover.1732016064.git.ptesarik@suse.com Link: https://lkml.kernel.org/r/2a66dee010151b25cb143efb39091ef7530aa00a.1732016064.git.ptesarik@suse.com Fixes: 2959a5f726f6 ("mm: add arch-independent testcases for RODATA") Signed-off-by: Petr Tesarik <ptesarik@suse.com> Reviewed-by: Kees Cook <kees@kernel.org> Cc: Jinbum Park <jinb.park7@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm: shmem: add a kernel command line to change the default huge policy for tmpfsBaolin Wang
Now tmpfs can allocate large folios of any size, but the default huge policy is still 'never'. This is because tmpfs does not behave like other file systems in some cases, as previously explained by David[1]: : I think I raised this in the past, but tmpfs/shmem is just like any : other file system .. except it sometimes really isn't and behaves much : more like (swappable) anonymous memory. (or mlocked files) : : There are many systems out there that run without swap enabled, or with : extremely minimal swap (IIRC until recently kubernetes was completely : incompatible with swapping). Swap can even be disabled today for shmem : using a mount option. : : That's a big difference to all other file systems where you are : guaranteed to have backend storage where you can simply evict under : memory pressure (might temporarily fail, of course). : : I *think* that's the reason why we have the "huge=" parameter that also : controls the THP allocations during page faults (IOW possible memory : over-allocation). Maybe also because it was a new feature, and we only : had a single THP size. Thus, adding a new command line parameter to change the default huge policy will be helpful for using large folios with tmpfs; this is similar to the 'transparent_hugepage_shmem' cmdline for shmem. [1] https://lore.kernel.org/all/cbadd5fe-69d5-4c21-8eb8-3344ed36c721@redhat.com/ Link: https://lkml.kernel.org/r/ff390b2656f0d39649547f8f2cbb30fcb7e7be2d.1732779148.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm: shmem: add large folio support for tmpfsBaolin Wang
Add large folio support for the tmpfs write and fallocate paths, matching the same high-order preference mechanism used in the iomap buffered IO path and in __filemap_get_folio(). Add shmem_mapping_size_orders() to get a hint for the orders of the folio based on the file size, which takes care of the mapping requirements. Traditionally, tmpfs only supported PMD-sized large folios. However, nowadays, with other file systems supporting any-sized large folios and anonymous memory extended to support mTHP, we should not restrict tmpfs to allocating only PMD-sized large folios, which makes it more special. Instead, we should allow tmpfs to allocate large folios of any size. Considering that tmpfs already has the 'huge=' option to control PMD-sized large folio allocation, we can extend the 'huge=' option to allow any-sized large folios. The semantics of the 'huge=' mount option are: huge=never: no large folios of any size; huge=always: large folios of any size; huge=within_size: like 'always' but respecting i_size; huge=advise: like 'always' if requested with madvise(). Note: for tmpfs mmap() faults, due to the lack of a write size hint, we still allocate PMD-sized huge folios if huge=always/within_size/advise is set. Moreover, the 'deny' and 'force' testing options controlled by '/sys/kernel/mm/transparent_hugepage/shmem_enabled' still retain the same semantics: 'deny' can disable large folios of any size for tmpfs, while 'force' can enable PMD-sized large folios for tmpfs. Link: https://lkml.kernel.org/r/035bf55fbdebeff65f5cb2cdb9907b7d632c3228.1732779148.git.baolin.wang@linux.alibaba.com Co-developed-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Daniel Gomez <da.gomez@samsung.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
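A standalone sketch of the order-hint idea behind shmem_mapping_size_orders() (the cap, rounding, and function below are illustrative simplifications, not the actual shmem code): pick the largest folio order whose folio the write size can completely fill.

    #include <stdio.h>

    #define PAGE_SHIFT 12
    #define MAX_ORDER_CAP 9    /* illustrative cap: PMD order on x86-64 */

    /* Return the highest order whose folio size still fits within 'write_size'. */
    static unsigned int size_to_order_hint(unsigned long write_size)
    {
        unsigned int order = 0;

        while (order < MAX_ORDER_CAP &&
               (1UL << (PAGE_SHIFT + order + 1)) <= write_size)
            order++;
        return order;
    }

    int main(void)
    {
        printf("%u\n", size_to_order_hint(64 * 1024));       /* 64KiB write -> order 4 */
        printf("%u\n", size_to_order_hint(2 * 1024 * 1024)); /* 2MiB write  -> order 9 */
        return 0;
    }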
2025-01-13mm: shmem: change shmem_huge_global_enabled() to return huge order bitmapBaolin Wang
Change the shmem_huge_global_enabled() to return the suitable huge order bitmap, and return 0 if huge pages are not allowed. This is a preparation for supporting various huge orders allocation of tmpfs in the following patches. No functional changes. Link: https://lkml.kernel.org/r/9dce1cfad3e9c1587cf1a0ea782ddbebd0e92984.1732779148.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13kasan: make kasan_record_aux_stack_noalloc() the default behaviourPeter Zijlstra
kasan_record_aux_stack_noalloc() was introduced to record a stack trace without allocating memory in the process. It has been added to callers which were invoked while a raw_spinlock_t was held. More and more callers were identified and changed over time. Is it a good thing to have this while functions try their best to do a lockless setup? The only downside of having kasan_record_aux_stack() not allocate any memory is that we end up without a stacktrace if stackdepot runs out of memory and the same stacktrace was not recorded before. To quote Marco Elver from https://lore.kernel.org/all/CANpmjNPmQYJ7pv1N3cuU8cP18u7PP_uoZD8YxwZd4jtbof9nVQ@mail.gmail.com/ | I'd be in favor, it simplifies things. And stack depot should be | able to replenish its pool sufficiently in the "non-aux" cases | i.e. regular allocations. Worst case we fail to record some | aux stacks, but I think that's only really bad if there's a bug | around one of these allocations. In general the probabilities | of this being a regression are extremely small [...] Make the kasan_record_aux_stack_noalloc() behaviour the default for kasan_record_aux_stack(). [bigeasy@linutronix.de: dressed the diff as patch] Link: https://lkml.kernel.org/r/20241122155451.Mb2pmeyJ@linutronix.de Fixes: 7cb3007ce2da ("kasan: generic: introduce kasan_record_aux_stack_noalloc()") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Reported-by: syzbot+39f85d612b7c20d8db48@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67275485.050a0220.3c8d68.0a37.GAE@google.com Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Waiman Long <longman@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Ben Segall <bsegall@google.com> Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <frederic@kernel.org> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: <kasan-dev@googlegroups.com> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Liam R. Howlett <Liam.Howlett@Oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: syzkaller-bugs@googlegroups.com Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Zqiang <qiang.zhang1211@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/memory: fix a comment typo in lock_mm_and_find_vma()Chin Yik Ming
s/equivalend/equivalent/ Link: https://lkml.kernel.org/r/20241120105041.2394283-1-yikming2222@gmail.com Signed-off-by: Chin Yik Ming <yikming2222@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm: change type of cma_area_count to unsigned intJiale Yang
Prefer 'unsigned int' over plain 'unsigned'. Also make it consistent with mm/cma.c Link: https://lkml.kernel.org/r/tencent_1E5E3AA25C261196D8C1F7097F130E382008@qq.com Signed-off-by: Jiale Yang <295107659@qq.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/hugetlb_cgroup: avoid useless return in void functionPintu Kumar
The return statement at the end of void function is unnecessary. Just remove it as part of cleanup. Link: https://lkml.kernel.org/r/20241122173558.20670-1-quic_pintu@quicinc.com Signed-off-by: Pintu Kumar <quic_pintu@quicinc.com> Cc: Pintu Agarwal <pintu.ping@gmail.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm: mmap_lock: optimize mmap_lock tracepointsShakeel Butt
We are starting to deploy mmap_lock tracepoint monitoring across our fleet, and the early results showed that these tracepoints consume a significant amount of CPU in kernfs_path_from_node when enabled. It seems the kernel is trying to resolve the cgroup path in the fast path of the locking code when the tracepoints are enabled. In addition, for some applications, metrics regress when monitoring is enabled. The cgroup path resolution can be slow and should not be done in the fast path. Most userspace tools, like bpftrace, provide functionality to get the cgroup path from the cgroup id, so let's just trace the cgroup id and let users use better tools to get the path in the slow path. Link: https://lkml.kernel.org/r/20241125171617.113892-1-shakeel.butt@linux.dev Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev> Reviewed-by: Yosry Ahmed <yosryahmed@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Reviewed-by: Axel Rasmussen <axelrasmussen@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/damon/core: remove duplicate list_empty quota->goals checkHonggyu Kim
damos_set_effective_quota() checks quota conditions, but there are some duplicate checks for quota->goals inside. This patch removes one of the if statements to simplify the esz calculation logic by setting esz to ULONG_MAX by default. Link: https://lkml.kernel.org/r/20241125184307.41746-1-sj@kernel.org Signed-off-by: Honggyu Kim <honggyu.kim@sk.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
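A standalone sketch of the simplification, under the assumption that the calculation boils down to taking the minimum of the applicable limits (names and parameters below are illustrative, not the actual DAMON code): starting from ULONG_MAX means the presence of goals only needs to be tested once.

    #include <limits.h>
    #include <stdio.h>

    /* Start from "unlimited" and lower esz for each constraint that applies. */
    static unsigned long effective_quota_sketch(int has_goals,
                                                unsigned long goal_esz,
                                                int has_limit,
                                                unsigned long limit_esz)
    {
        unsigned long esz = ULONG_MAX;  /* default: no effective limit */

        if (has_goals)
            esz = goal_esz;
        if (has_limit && limit_esz < esz)
            esz = limit_esz;
        return esz;
    }

    int main(void)
    {
        printf("%lu\n", effective_quota_sketch(0, 0, 1, 4096)); /* only the limit applies */
        return 0;
    }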
2025-01-13slab: allocate frozen pagesMatthew Wilcox (Oracle)
Since slab does not use the page refcount, it can allocate and free frozen pages, saving one atomic operation per free. Link: https://lkml.kernel.org/r/20241125210149.2976098-16-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/mempolicy: add alloc_frozen_pages()Matthew Wilcox (Oracle)
Provide an interface to allocate pages from the page allocator without incrementing their refcount. This saves an atomic operation on free, which may be beneficial to some users (eg slab). Link: https://lkml.kernel.org/r/20241125210149.2976098-15-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_alloc: add __alloc_frozen_pages()Matthew Wilcox (Oracle)
Defer the initialisation of the page refcount to the new __alloc_pages() wrapper and turn the old __alloc_pages() into __alloc_frozen_pages(). Link: https://lkml.kernel.org/r/20241125210149.2976098-14-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_alloc: move set_page_refcounted() to end of __alloc_pages()Matthew Wilcox (Oracle)
Remove some code duplication by calling set_page_refcounted() at the end of __alloc_pages() instead of after each call that can allocate a page. That means that we free a frozen page if we've exceeded the allowed memcg memory. Link: https://lkml.kernel.org/r/20241125210149.2976098-13-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_alloc: move set_page_refcounted() to callers of __alloc_pages_slowpath()Matthew Wilcox (Oracle)
In preparation for allocating frozen pages, stop initialising the page refcount in __alloc_pages_slowpath(). Link: https://lkml.kernel.org/r/20241125210149.2976098-12-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_alloc: move set_page_refcounted() to callers of __alloc_pages_direct_reclaim()Matthew Wilcox (Oracle)
In preparation for allocating frozen pages, stop initialising the page refcount in __alloc_pages_direct_reclaim(). Link: https://lkml.kernel.org/r/20241125210149.2976098-11-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_alloc: move set_page_refcounted() to callers of __alloc_pages_direct_compact()Matthew Wilcox (Oracle)
In preparation for allocating frozen pages, stop initialising the page refcount in __alloc_pages_direct_compact(). Link: https://lkml.kernel.org/r/20241125210149.2976098-10-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_alloc: move set_page_refcounted() to callers of __alloc_pages_may_oom()Matthew Wilcox (Oracle)
In preparation for allocating frozen pages, stop initialising the page refcount in __alloc_pages_may_oom(). Link: https://lkml.kernel.org/r/20241125210149.2976098-9-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_alloc: move set_page_refcounted() to callers of __alloc_pages_cpuset_fallback()Matthew Wilcox (Oracle)
In preparation for allocating frozen pages, stop initialising the page refcount in __alloc_pages_cpuset_fallback(). Link: https://lkml.kernel.org/r/20241125210149.2976098-8-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_alloc: move set_page_refcounted() to callers of get_page_from_freelist()Matthew Wilcox (Oracle)
In preparation for allocating frozen pages, stop initialising the page refcount in get_page_from_freelist(). Link: https://lkml.kernel.org/r/20241125210149.2976098-7-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: David Hildenbrand <david@redhat.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_alloc: move set_page_refcounted() to callers of prep_new_page()Matthew Wilcox (Oracle)
In preparation for allocating frozen pages, stop initialising the page refcount in prep_new_page(). Link: https://lkml.kernel.org/r/20241125210149.2976098-6-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_alloc: move set_page_refcounted() to callers of post_alloc_hook()Matthew Wilcox (Oracle)
In preparation for allocating frozen pages, stop initialising the page refcount in post_alloc_hook(). Link: https://lkml.kernel.org/r/20241125210149.2976098-5-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Muchun Song <songmuchun@bytedance.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_alloc: export free_frozen_pages() instead of free_unref_page()Matthew Wilcox (Oracle)
We already have the concept of "frozen pages" (eg page_ref_freeze()), so let's not complicate things by also having the concept of "unref pages". Link: https://lkml.kernel.org/r/20241125210149.2976098-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm: make alloc_pages_mpol() staticMatthew Wilcox (Oracle)
All callers outside mempolicy.c now use folio_alloc_mpol() thanks to Kefeng's cleanups, so we can remove this as a visible symbol. And also remove the alloc_hooks for alloc_pages_mpol(), since all users in mempolicy.c are using the nonprof version. Link: https://lkml.kernel.org/r/20241125210149.2976098-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/page_alloc: cache page_zone() result in free_unref_page()Matthew Wilcox (Oracle)
Patch series "Allocate and free frozen pages", v3. Slab does not need to use the page refcount at all, and it can avoid an atomic operation on page free. Hugetlb wants to delay setting the refcount until it has assembled a complete gigantic page. We already have the ability to freeze a page (safely reduce its reference count to 0), so this patchset adds APIs to allocate and free pages which are in a frozen state. This patchset is also a step towards the Glorious Future in which struct page doesn't have a refcount; the users which need a refcount will have one in their per-allocation memdesc. This patch (of 15): Save 17 bytes of text by calculating page_zone() once instead of twice. Link: https://lkml.kernel.org/r/20241125210149.2976098-1-willy@infradead.org Link: https://lkml.kernel.org/r/20241125210149.2976098-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm: migrate: remove unused argument vma from migrate_misplaced_folio()Donet Tom
Commit ee86814b0562 ("mm/migrate: move NUMA hinting fault folio isolation + checks under PTL") removed the code that had used the vma argument in migrate_misplaced_folio. Since the vma argument was no longer used in migrate_misplaced_folio, this patch removes it. Link: https://lkml.kernel.org/r/20241126155655.466186-1-donettom@linux.ibm.com Signed-off-by: Donet Tom <donettom@linux.ibm.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Zi Yan <ziy@nvidia.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/zswap: add LRU_STOP to comment about dropping the lru lockAlice Ryhl
This function has been able to return LRU_STOP since commit b49547ade38a ("mm/zswap: stop lru list shrinking when encounter warm region"). To reduce confusion, update the comment to also list LRU_STOP as an option. Link: https://lkml.kernel.org/r/20241127-lru-stop-comment-v1-1-f54a7cba9429@google.com Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Chengming Zhou <chengming.zhou@linux.dev> Cc: Alice Ryhl <aliceryhl@google.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13mm/slab: fix kernel-doc func param namesRandy Dunlap
Use corrected function parameter names to eliminate kernel-doc warnings: slab.h:142: warning: Function parameter or struct member 's' not described in 'slab_folio' slab.h:142: warning: Excess function parameter 'slab' description in 'slab_folio' slab.h:168: warning: Function parameter or struct member 's' not described in 'slab_page' slab.h:168: warning: Excess function parameter 'slab' description in 'slab_page' Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>