author: Linus Torvalds <torvalds@linux-foundation.org> 2024-11-23 09:58:07 -0800
committer: Linus Torvalds <torvalds@linux-foundation.org> 2024-11-23 09:58:07 -0800
commit: 5c00ff742bf5caf85f60e1c73999f99376fb865d
tree: fa484e83c27af79f1c0511e7e0673507461c9379 /mm/page-writeback.c
parent: 228a1157fb9fec47eb135b51c0202b574e079ebf
parent: 2532e6c74a67e65b95f310946e0c0e0a41b3a34b
Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:

 - The series "zram: optimal post-processing target selection" from
   Sergey Senozhatsky improves zram's post-processing selection
   algorithm. This leads to improved memory savings.

 - Wei Yang has gone to town on the mapletree code, contributing several
   series which clean up the implementation:
     - "refine mas_mab_cp()"
     - "Reduce the space to be cleared for maple_big_node"
     - "maple_tree: simplify mas_push_node()"
     - "Following cleanup after introduce mas_wr_store_type()"
     - "refine storing null"

 - The series "selftests/mm: hugetlb_fault_after_madv improvements" from
   David Hildenbrand fixes this selftest for s390.

 - The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
   implements some rationalizations and cleanups in the page mapping
   code.

 - The series "mm: optimize shadow entries removal" from Shakeel Butt
   optimizes the file truncation code by speeding up the handling of
   shadow entries.

 - The series "Remove PageKsm()" from Matthew Wilcox completes the
   migration of this flag over to being a folio-based flag.

 - The series "Unify hugetlb into arch_get_unmapped_area functions" from
   Oscar Salvador implements a bunch of consolidations and cleanups in
   the hugetlb code.

 - The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
   takes away the wp-fault time practice of turning a huge zero page
   into small pages. Instead we replace the whole thing with a THP. This
   is more consistent, cleaner, and potentially saves a large number of
   page faults.

 - The series "percpu: Add a test case and fix for clang" from Andy
   Shevchenko enhances and fixes the kernel's built-in percpu test code.

 - The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
   optimizes mremap() by avoiding doing things which we didn't need to
   do.

 - The series "Improve the tmpfs large folio read performance" from
   Baolin Wang teaches tmpfs to copy data into userspace at the folio
   size rather than as individual pages. A 20% speedup was observed.

 - The series "mm/damon/vaddr: Fix issue in
   damon_va_evenly_split_region()" from Zheng Yejian fixes DAMON
   splitting.

 - The series "memcg-v1: fully deprecate charge moving" from Shakeel
   Butt removes the long-deprecated memcg v1 charge moving feature.

 - The series "fix error handling in mmap_region() and refactor" from
   Lorenzo Stoakes cleans up some of the mmap() error handling and
   addresses some potential performance issues.

 - The series "x86/module: use large ROX pages for text allocations"
   from Mike Rapoport teaches x86 to use large pages for
   read-only-execute module text.

 - The series "page allocation tag compression" from Suren Baghdasaryan
   is follow-on maintenance work for the new page allocation profiling
   feature.

 - The series "page->index removals in mm" from Matthew Wilcox removes
   most references to page->index in mm/. A slow march towards shrinking
   struct page.

 - The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
   interface tests" from Andrew Paniakin performs maintenance work for
   DAMON's self testing code.

 - The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
   improves zswap's batching of compression and decompression. It is a
   step along the way towards using Intel IAA hardware acceleration for
   this zswap operation.

 - The series "kasan: migrate the last module test to kunit" from
   Sabyrzhan Tasbolatov completes the migration of the KASAN built-in
   tests over to the KUnit framework.

 - The series "implement lightweight guard pages" from Lorenzo Stoakes
   permits userspace to place fault-generating guard pages within a
   single VMA, rather than requiring that multiple VMAs be created for
   this. Improved efficiencies for userspace memory allocators are
   expected.

 - The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
   tracepoints to provide increased visibility into memcg stats flushing
   activity.

 - The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
   fixes a zram buglet which potentially affected performance.

 - The series "mm: add more kernel parameters to control mTHP" from
   Maíra Canal enhances our ability to control/configure multisize THP
   from the kernel boot command line.

 - The series "kasan: few improvements on kunit tests" from Sabyrzhan
   Tasbolatov has a couple of fixups for the KASAN KUnit tests.

 - The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
   from Kairui Song optimizes list_lru memory utilization when lockdep
   is enabled.

* tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits)
  cma: enforce non-zero pageblock_order during cma_init_reserved_mem()
  mm/kfence: add a new kunit test test_use_after_free_read_nofault()
  zram: fix NULL pointer in comp_algorithm_show()
  memcg/hugetlb: add hugeTLB counters to memcg
  vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event
  mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount
  zram: ZRAM_DEF_COMP should depend on ZRAM
  MAINTAINERS/MEMORY MANAGEMENT: add document files for mm
  Docs/mm/damon: recommend academic papers to read and/or cite
  mm: define general function pXd_init()
  kmemleak: iommu/iova: fix transient kmemleak false positive
  mm/list_lru: simplify the list_lru walk callback function
  mm/list_lru: split the lock to per-cgroup scope
  mm/list_lru: simplify reparenting and initial allocation
  mm/list_lru: code clean up for reparenting
  mm/list_lru: don't export list_lru_add
  mm/list_lru: don't pass unnecessary key parameters
  kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller
  kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW
  kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols
  ...
Diffstat (limited to 'mm/page-writeback.c')
-rw-r--r-- mm/page-writeback.c | 45
1 files changed, 25 insertions, 20 deletions
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 72a5d8836425..fdb89ce85fff 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -917,7 +917,9 @@ static unsigned long __wb_calc_thresh(struct dirty_throttle_control *dtc,
 				      unsigned long thresh)
 {
 	struct wb_domain *dom = dtc_dom(dtc);
+	struct bdi_writeback *wb = dtc->wb;
 	u64 wb_thresh;
+	u64 wb_max_thresh;
 	unsigned long numerator, denominator;
 	unsigned long wb_min_ratio, wb_max_ratio;
 
@@ -931,11 +933,28 @@ static unsigned long __wb_calc_thresh(struct dirty_throttle_control *dtc,
 	wb_thresh *= numerator;
 	wb_thresh = div64_ul(wb_thresh, denominator);
 
-	wb_min_max_ratio(dtc->wb, &wb_min_ratio, &wb_max_ratio);
+	wb_min_max_ratio(wb, &wb_min_ratio, &wb_max_ratio);
 
 	wb_thresh += (thresh * wb_min_ratio) / (100 * BDI_RATIO_SCALE);
-	if (wb_thresh > (thresh * wb_max_ratio) / (100 * BDI_RATIO_SCALE))
-		wb_thresh = thresh * wb_max_ratio / (100 * BDI_RATIO_SCALE);
+	wb_max_thresh = thresh * wb_max_ratio / (100 * BDI_RATIO_SCALE);
+	if (wb_thresh > wb_max_thresh)
+		wb_thresh = wb_max_thresh;
+
+	/*
+	 * With strictlimit flag, the wb_thresh is treated as
+	 * a hard limit in balance_dirty_pages() and wb_position_ratio().
+	 * It's possible that wb_thresh is close to zero, not because
+	 * the device is slow, but because it has been inactive.
+	 * To prevent occasional writes from being blocked, we raise wb_thresh.
+	 */
+	if (unlikely(wb->bdi->capabilities & BDI_CAP_STRICTLIMIT)) {
+		unsigned long limit = hard_dirty_limit(dom, dtc->thresh);
+		u64 wb_scale_thresh = 0;
+
+		if (limit > dtc->dirty)
+			wb_scale_thresh = (limit - dtc->dirty) / 100;
+		wb_thresh = max(wb_thresh, min(wb_scale_thresh, wb_max_thresh / 4));
+	}
 
 	return wb_thresh;
 }
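The strictlimit adjustment in the hunk above can be modeled outside the kernel. The following is a minimal userspace sketch, not kernel code: strictlimit_floor(), min_u64() and max_u64() are hypothetical names introduced here for illustration, the BDI_CAP_STRICTLIMIT check is assumed to have already passed, and every argument is a plain page count supplied by the caller rather than a value read from a dirty_throttle_control.

```c
#include <stdint.h>

/* Hypothetical helpers standing in for the kernel's min()/max() macros. */
static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }
static uint64_t max_u64(uint64_t a, uint64_t b) { return a > b ? a : b; }

/*
 * Hypothetical model of the strictlimit branch added to __wb_calc_thresh().
 * wb_thresh and wb_max_thresh are the values computed just before the
 * branch; limit and dirty stand in for hard_dirty_limit(dom, dtc->thresh)
 * and dtc->dirty. All values are page counts.
 */
static uint64_t strictlimit_floor(uint64_t wb_thresh, uint64_t wb_max_thresh,
				  uint64_t limit, uint64_t dirty)
{
	uint64_t wb_scale_thresh = 0;

	/* 1% of the remaining headroom below the hard dirty limit. */
	if (limit > dirty)
		wb_scale_thresh = (limit - dirty) / 100;

	/* Raise wb_thresh to that floor, capped at a quarter of the max. */
	return max_u64(wb_thresh, min_u64(wb_scale_thresh, wb_max_thresh / 4));
}
```

With an inactive device (wb_thresh = 0), wb_max_thresh = 100000, limit = 200000 and dirty = 50000, the floor is min(1500, 25000) = 1500 pages, so an occasional writer is not held to a threshold of zero. Once dirty reaches limit the headroom term is 0 and wb_thresh is returned unchanged.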
@@ -2724,8 +2743,6 @@ EXPORT_SYMBOL(noop_dirty_folio);
 /*
  * Helper function for set_page_dirty family.
  *
- * Caller must hold folio_memcg_lock().
- *
  * NOTE: This relies on being atomic wrt interrupts.
  */
 static void folio_account_dirtied(struct folio *folio,
@@ -2758,7 +2775,6 @@ static void folio_account_dirtied(struct folio *folio,
 /*
  * Helper function for deaccounting dirty page without writeback.
  *
- * Caller must hold folio_memcg_lock().
  */
 void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
 {
@@ -2776,9 +2792,8 @@ void folio_account_cleaned(struct folio *folio, struct bdi_writeback *wb)
  * If warn is true, then emit a warning if the folio is not uptodate and has
  * not been truncated.
  *
- * The caller must hold folio_memcg_lock().  It is the caller's
- * responsibility to prevent the folio from being truncated while
- * this function is in progress, although it may have been truncated
+ * It is the caller's responsibility to prevent the folio from being truncated
+ * while this function is in progress, although it may have been truncated
  * before this function is called.  Most callers have the folio locked.
  * A few have the folio blocked from truncation through other means (e.g.
  * zap_vma_pages() has it mapped and is holding the page table lock).
@@ -2822,14 +2837,10 @@ void __folio_mark_dirty(struct folio *folio, struct address_space *mapping,
  */
 bool filemap_dirty_folio(struct address_space *mapping, struct folio *folio)
 {
-	folio_memcg_lock(folio);
-	if (folio_test_set_dirty(folio)) {
-		folio_memcg_unlock(folio);
+	if (folio_test_set_dirty(folio))
 		return false;
-	}
 
 	__folio_mark_dirty(folio, mapping, !folio_test_private(folio));
-	folio_memcg_unlock(folio);
 
 	if (mapping->host) {
 		/* !PageAnon && !swapper_space */
@@ -2956,14 +2967,12 @@ void __folio_cancel_dirty(struct folio *folio)
 		struct bdi_writeback *wb;
 		struct wb_lock_cookie cookie = {};
 
-		folio_memcg_lock(folio);
 		wb = unlocked_inode_to_wb_begin(inode, &cookie);
 
 		if (folio_test_clear_dirty(folio))
 			folio_account_cleaned(folio, wb);
 
 		unlocked_inode_to_wb_end(inode, &cookie);
-		folio_memcg_unlock(folio);
 	} else {
 		folio_clear_dirty(folio);
 	}
@@ -3074,7 +3083,6 @@ bool __folio_end_writeback(struct folio *folio)
 	struct address_space *mapping = folio_mapping(folio);
 	bool ret;
 
-	folio_memcg_lock(folio);
 	if (mapping && mapping_use_writeback_tags(mapping)) {
 		struct inode *inode = mapping->host;
 		struct backing_dev_info *bdi = inode_to_bdi(inode);
@@ -3105,7 +3113,6 @@ bool __folio_end_writeback(struct folio *folio)
 	lruvec_stat_mod_folio(folio, NR_WRITEBACK, -nr);
 	zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, -nr);
 	node_stat_mod_folio(folio, NR_WRITTEN, nr);
-	folio_memcg_unlock(folio);
 
 	return ret;
 }
@@ -3118,7 +3125,6 @@ void __folio_start_writeback(struct folio *folio, bool keep_write)
 
 	VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio);
 
-	folio_memcg_lock(folio);
 	if (mapping && mapping_use_writeback_tags(mapping)) {
 		XA_STATE(xas, &mapping->i_pages, folio_index(folio));
 		struct inode *inode = mapping->host;
@@ -3159,7 +3165,6 @@ void __folio_start_writeback(struct folio *folio, bool keep_write)
 
 	lruvec_stat_mod_folio(folio, NR_WRITEBACK, nr);
 	zone_stat_mod_folio(folio, NR_ZONE_WRITE_PENDING, nr);
-	folio_memcg_unlock(folio);
 
 	access_ret = arch_make_folio_accessible(folio);
 	/*