path: root/mm/damon
Age | Commit message | Author
2025-04-01mm/damon/core: simplify control flow in damon_register_ops()Taotao Chen
The function logic is not complex, so using goto is unnecessary. Replace it with a straightforward if-else to simplify control flow and improve readability. Link: https://lkml.kernel.org/r/Z9vxcPCw8tDsjKw1@OneApple Signed-off-by: Taotao Chen <chentaotao@didiglobal.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-21mm/damon: implement a new DAMOS filter type for active pagesNhat Pham
Patch series "mm/damon: introduce DAMOS filter type for active pages". The memory reclaim algorithm categorizes pages into active and inactive lists, separately for file and anon pages. The system's performance relies heavily on the (relative and absolute) accuracy of this categorization. This patch series adds a new DAMOS filter for pages' activeness, giving us visibility into the access frequency of the pages on each list. This insight can help us diagnose issues with the active-inactive balancing dynamics, and make decisions to optimize reclaim efficiency and memory utilization. For instance, we might decide to enable DAMON_LRU_SORT if we find that there are pages on the active list that are infrequently accessed, or less frequently accessed than pages on the inactive list. This patch (of 2): Implement a DAMOS filter type for active pages in the DAMON kernel API, and add support for it to the physical address space DAMON operations set (paddr). Link: https://lkml.kernel.org/r/20250318183029.2062917-1-nphamcs@gmail.com Link: https://lkml.kernel.org/r/20250318183029.2062917-2-nphamcs@gmail.com Signed-off-by: Nhat Pham <nphamcs@gmail.com> Suggested-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/sysfs-schemes: avoid Wformat-security warning on damon_sysfs_access_pattern_add_range_dir()SeongJae Park
When -Wformat-security is given, the compiler warns about a potential security issue in damon_sysfs_access_pattern_add_range_dir(), as below:

    mm/damon/sysfs-schemes.c: In function `damon_sysfs_access_pattern_add_range_dir':
    mm/damon/sysfs-schemes.c:1503:25: warning: format not a string literal and no format arguments [-Wformat-security]
     1503 |                         &access_pattern->kobj, name);
          |                         ^

Fix it by using "%s" as the format and the name as the argument. Link: https://lkml.kernel.org/r/20250310165009.652491-1-sj@kernel.org Fixes: 7e84b1f8212a ("mm/damon/sysfs: support DAMON-based Operation Schemes") Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
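The "%s" fix described above can be illustrated in user space. The following is a minimal sketch, not the kernel change itself: `format_dir_name()` is a hypothetical helper, and `snprintf()` stands in for the printf-style kernel function that takes the directory name.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: copy a caller-supplied name into buf.
 *
 * Passing `name` directly as the format argument would make the compiler
 * emit the -Wformat-security warning, because any '%' in the name would be
 * interpreted as a conversion specification. Using a literal "%s" format
 * keeps the name as plain data.
 */
int format_dir_name(char *buf, size_t len, const char *name)
{
	return snprintf(buf, len, "%s", name);
}
```

With the literal format, a name that happens to contain a '%' character is copied verbatim instead of being parsed as a format directive.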
2025-03-17mm/damon: remove damon_operations->reset_aggregatedSeongJae Park
The operations layer hook was introduced to let an operations set reset any aggregation data if needed. But it is not really used now. Remove it. Link: https://lkml.kernel.org/r/20250306175908.66300-14-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon: remove damon_callback->before_damos_applySeongJae Park
The hook was introduced to let DAMON kernel API users access DAMOS scheme-eligible regions in a safe way. It is no longer used by anyone, and the functionality is provided in a better way by damos_walk(). Remove it. Link: https://lkml.kernel.org/r/20250306175908.66300-13-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon: remove damon_callback->after_samplingSeongJae Park
The callback was used by the DAMON sysfs interface for reading DAMON internal data. But it is no longer used, and damon_call() can do similar work in a better way. Remove it. Link: https://lkml.kernel.org/r/20250306175908.66300-12-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon: remove ->before_start of damon_callbackSeongJae Park
The function pointer field was added as a place to do some initialization work just before DAMON starts working. However, nobody is using it now. Remove it. Link: https://lkml.kernel.org/r/20250306175908.66300-11-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/sysfs-schemes: remove obsolete comment for damon_sysfs_schemes_clear_regions()SeongJae Park
The comment on the damon_sysfs_schemes_clear_regions() function is obsolete, since the function has been updated to be called directly from DAMON sysfs interface code. Remove the outdated comment. Link: https://lkml.kernel.org/r/20250306175908.66300-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/sysfs: remove damon_sysfs_cmd_request and its readersSeongJae Park
damon_sysfs_cmd_request is the DAMON sysfs interface's own synchronization mechanism for accessing DAMON internal data via damon_callback hooks. All the users have now migrated to damon_call() and damos_walk(), so nobody really uses it. No one writes to the data structure, but the code that reads it still remains. Remove the reading code and the entire data structure. Link: https://lkml.kernel.org/r/20250306175908.66300-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/sysfs: remove damon_sysfs_cmd_request_callback() and its callersSeongJae Park
damon_sysfs_cmd_request_callback() is the damon_callback hook function that was used to handle user requests that need to read and/or write DAMON internal data. All the usages are now updated to use damon_call() or damos_walk(), though. Remove it and its callers. Link: https://lkml.kernel.org/r/20250306175908.66300-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/sysfs: remove damon_sysfs_cmd_request code from damon_sysfs_handle_cmd()SeongJae Park
damon_sysfs_handle_cmd() handles user requests that it can directly handle on its own. For requests that need to be handled from damon_callback hooks, it uses the DAMON sysfs interface's own synchronous damon_callback hooks management mechanism, namely damon_sysfs_cmd_request. Now all user requests are handled without damon_callback hooks, so the damon_sysfs_cmd_request client code in damon_sysfs_handle_cmd() does nothing in practice. Remove the unnecessary code. Link: https://lkml.kernel.org/r/20250306175908.66300-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/sysfs: handle commit command using damon_call()SeongJae Park
The DAMON sysfs interface is using the damon_callback->after_aggregation hook with its self-implemented synchronization mechanism for the hook. It is inefficient, complicated, and takes up to one aggregation interval to complete, which can be long on some configs. Use damon_call() instead. It provides a synchronization mechanism that is built inside DAMON's core layer, so it is more efficient than the DAMON sysfs interface's own one. It also isolates the implementation inside the core layer, and hence makes the code easier to maintain. Finally, it takes up to one sampling interval, which is much shorter than the aggregation interval in common setups. Link: https://lkml.kernel.org/r/20250306175908.66300-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/core: make damon_set_attrs() be safe to be called from damon_call()SeongJae Park
Currently all DAMON kernel API callers do online DAMON parameters commit from damon_callback->after_aggregation because that is the only safe place to call the DAMON monitoring attributes update function, namely damon_set_attrs(). Because damon_callback hooks provide no synchronization, the callers work in asynchronous ways or implement their own inefficient and complicated synchronization mechanisms. It also means online DAMON parameters commit can take up to one aggregation interval. On large systems having long aggregation intervals, that can be too slow. The synchronization can be done in a more efficient and simple way, while removing the latency constraint, if it can be done using damon_call(). The fact that damon_call() can be executed in the middle of the aggregation makes damon_set_attrs() unsafe to be called from it, though. Two real problems can occur in the case. First, converting the not yet completely aggregated nr_accesses for new user-set intervals can arguably degrade the accuracy or at least make the logic complicated. Second, kdamond_reset_aggregated() will not be called after the monitoring results update, so the next aggregation starts from an unclean state. This can result in inconsistent and unexpected nr_accesses_bp. Make it safe as follows. Catch the middle-of-the-aggregation case from damon_set_attrs() by checking the passed_sample_intervals and next_aggregation_sis of the context, and pass the information to the nr_accesses conversion logic. The logic works as before if it is not the case (called after the current aggregation is completed). If it is the case (committing parameters in the middle of the aggregation), it drops the nr_accesses information aggregated so far, and makes the status the same as at the beginning of this aggregation, but as if the last aggregation was started with the updated sampling/aggregation intervals. The middle-of-aggregation check introduces yet another edge case, though.
This happens because kdamond_tune_intervals() can also call damon_set_attrs() with the middle-of-aggregation check. Consider the case where damon_call() for parameters commit and kdamond_tune_intervals() are called in the same iteration of the kdamond main loop. Because kdamond_tune_intervals() is called for aggregation intervals, it should be the end of the aggregation. The first damon_set_attrs() call from kdamond_call() understands it is the end of the aggregation and correctly handles it. But that damon_set_attrs() call updates next_aggregation_sis of the context. Hence, the second damon_set_attrs() invocation from kdamond_tune_intervals() believes it is called in the middle of the aggregation. It therefore resets the information aggregated so far. After that, kdamond_reset_interval() is called and double-resets the aggregated information. Avoid this case, too, by setting the next_aggregation_sis before kdamond_tune_intervals() is invoked. Link: https://lkml.kernel.org/r/20250306175908.66300-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
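The middle-of-the-aggregation check above could look roughly like the following user-space sketch. The commit message only says that passed_sample_intervals and next_aggregation_sis are checked; the exact comparison used here, the `ctx_sketch` type and both function names are assumptions made for illustration, and the real conversion logic is not reproduced.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two context fields the message names. */
struct ctx_sketch {
	unsigned long passed_sample_intervals;
	unsigned long next_aggregation_sis;
};

/*
 * Assumption: an aggregation is still in progress while the number of
 * passed sample intervals has not reached the next aggregation point.
 */
bool in_middle_of_aggregation(const struct ctx_sketch *c)
{
	return c->passed_sample_intervals < c->next_aggregation_sis;
}

/*
 * `converted` is whatever the pre-existing end-of-aggregation conversion
 * would produce. In the middle of an aggregation the partial count is
 * dropped instead, as the commit message describes.
 */
unsigned int nr_accesses_after_commit(const struct ctx_sketch *c,
				      unsigned int converted)
{
	if (in_middle_of_aggregation(c))
		return 0;
	return converted;
}
```

Keeping the decision in one predicate means every caller of the attribute update applies the same rule, which is also why the ordering of the next_aggregation_sis update matters for the edge case described above.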
2025-03-17mm/damon/core: invoke kdamond_call() after merging is done if possibleSeongJae Park
kdamond_call() callers may iterate the regions, so it is better to call it when the number of regions is as small as possible, which is right after kdamond_merge_regions() finishes. Invoke it at that point. This change also aims to make future changes for carrying out online parameters commit with damon_call() easier. The commit operation should be sequenced with other aggregation interval based operations, including regions merging and aggregation reset. Placing the damon_call() invocation after the regions merging makes the sequence handling simpler. Link: https://lkml.kernel.org/r/20250306175908.66300-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/sysfs: validate user inputs from damon_sysfs_commit_input()SeongJae Park
Patch series "mm/damon/sysfs: commit parameters online via damon_call()". Due to the lack of ways to synchronously access DAMON internal data, the DAMON sysfs interface is using damon_callback hooks with its own synchronization mechanism. The mechanism is built on top of damon_callback hooks in an inefficient and complicated way. Patch series "mm/damon: replace most damon_callback usages in sysfs with new core functions", which starts with commit e035320fd38e ("mm/damon/sysfs-schemes: remove unnecessary schemes existence check in damon_sysfs_schemes_clear_regions()") introduced two new DAMON kernel API functions that provide the synchronous access, replaced most damon_callback hooks usage in the DAMON sysfs interface, and cleaned up unnecessary code. Continue the replacement and cleanup work. Update the last usage of the DAMON sysfs interface's own synchronization mechanism, namely online DAMON parameters commit, to use damon_call() instead of the damon_callback hooks and the hard-to-maintain core-external synchronization mechanism. Then remove the code that is no longer used due to the change, and more unused code that was just not yet cleaned up. The first four patches (patches 1-4) of this series make the DAMON sysfs interface's online parameters commit use damon_call(). Then, the following three patches (patches 5-7) remove the DAMON sysfs interface's own synchronization mechanism and its usages, which are no longer used by anyone due to the first four patches. Finally, six patches (8-13) do more cleanup of outdated comments and unused code. This patch (of 13): Online DAMON parameters commit via the DAMON sysfs interface can make kdamond stop. This behavior was chosen because it makes the implementation simpler. The implementation tries committing the parameters without validation. If it finds something wrong in the middle of the parameters update, it returns an error without reverting the partially committed parameters.
It is safe though, since it immediately breaks the kdamond main loop in the case of the error return. Users can set wrong parameters by mistake, though. Stopping kdamond in that case is not very useful behavior. It also makes it difficult to utilize damon_call() instead of the damon_callback hook for online parameters update, since damon_call() cannot immediately break the kdamond main loop in the middle. Validate the input parameters and return an error when the validation fails, before starting the parameters update. In case of mistakenly wrong parameters, kdamond can continue running with the old and valid parameters. Link: https://lkml.kernel.org/r/20250306175908.66300-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250306175908.66300-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
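The validate-then-commit idea can be sketched in user space as follows. This is only an illustration of the pattern: `params_sketch`, `validate_params()` and `commit_params()` are hypothetical names, and the single validation rule shown (a non-zero sampling interval that does not exceed the aggregation interval) is an assumed example rather than the checks the sysfs code performs.

```c
#include <errno.h>

/* Hypothetical, reduced set of monitoring parameters. */
struct params_sketch {
	unsigned long sample_us;
	unsigned long aggr_us;
};

/* Example rule only: reject inputs that can never be applied. */
int validate_params(const struct params_sketch *p)
{
	if (p->sample_us == 0 || p->sample_us > p->aggr_us)
		return -EINVAL;
	return 0;
}

/*
 * Validate the whole input first. On failure nothing has been written,
 * so the running parameters stay valid and monitoring can continue.
 */
int commit_params(struct params_sketch *running,
		  const struct params_sketch *input)
{
	int err = validate_params(input);

	if (err)
		return err;
	*running = *input;
	return 0;
}
```

Because the failure path returns before the first write, there is no partially committed state to revert, and the caller has no reason to stop the worker on a bad input.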
2025-03-17mm/damon/sysfs-schemes: return error when for attempts to install filters on wrong sysfs directorySeongJae Park
Return an error if the user tries to install a DAMOS filter in a DAMOS filters sysfs directory that is meant for filters handled by a different DAMON layer than the one that handles the filter being installed. Link: https://lkml.kernel.org/r/20250305222733.59089-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/sysfs-schemes: record filters of which layer should be added to the given filters directorySeongJae Park
Contrary to their names and assumed purposes, the {core,ops}_filters DAMOS sysfs directories allow installing any type of filter. As a first step toward preventing such wrong installations, record in the DAMOS sysfs internal data structures which layer's filters should be installed in each filters directory. Link: https://lkml.kernel.org/r/20250305222733.59089-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/core: expose damos_filter_for_ops() to DAMON kernel API callersSeongJae Park
damos_filter_for_ops() can be useful to avoid putting the wrong type of filters in the wrong place. Expose it to DAMON kernel API callers. Link: https://lkml.kernel.org/r/20250305222733.59089-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/sysfs-schemes: commit filters in {core,ops}_filters directoriesSeongJae Park
Connect user inputs for files under core_filters and ops_filters with DAMON, so that the files can really function. Because the filters evaluation order is easier to predict for {core,ops}_filters, add the filters in {core,ops}_filters before those in the 'filters' directory. Link: https://lkml.kernel.org/r/20250305222733.59089-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/sysfs-schemes: implement core_filters and ops_filters directoriesSeongJae Park
Implement two DAMOS sysfs directories for managing core and operations layer handled filters separately. Those are named 'core_filters' and 'ops_filters', and have the same file hierarchy as 'filters'. This commit only populates and cleans up the directories, without really connecting the files with DAMON. Following changes will make the connections. Link: https://lkml.kernel.org/r/20250305222733.59089-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/sysfs-schemes: let damon_sysfs_scheme_set_filters() be used for different named directoriesSeongJae Park
Patch series "mm/damon: add sysfs dirs for managing DAMOS filters based on handling layers". DAMOS filters are categorized into two groups based on their handling layers, namely core and operations layers. The categorization affects when each filter is evaluated. Core layer handled filters are evaluated first. The order meant nothing before, but the introduction of allow filters changed that. The DAMOS sysfs interface provides a single directory for filters, namely 'filters'. Users can install any filters in any order there. DAMON will internally categorize those into core and operations layer handled ones, and apply the evaluation order rule. The ordering rule is clearly documented. But the interface could still confuse users, since it allows installing filters of both layers in the directory in mixed order. Add two sysfs directories for managing filters by handling layers, namely 'core_filters' and 'ops_filters', for filters that are handled by the core and operations layers, respectively. Those directories refuse filters that are not handled by the assumed layer. For backward compatibility, keep the 'filters' directory with its current behavior. Filters installed in that directory will be added to DAMON after those of the 'core_filters' and 'ops_filters' directories, with the automatic categorization. Also recommend in the usage documents that users use the new directories, while noting that the 'filters' directory could be deprecated in future. Note that the new directories provide all features that were provided with 'filters', just in a clearer way. Deprecating 'filters' in future will hence not cause an irreversible feature loss. This patch (of 8): damon_sysfs_scheme_set_filters() is using a hard-coded directory name, "filters". Refactor it to work for generally named directories having the same file hierarchy, so that upcoming changes can use it for adding sibling directories that have the same files as "filters" and are named "core_filters" and "ops_filters".
[arnd@arndb.de: avoid Wformat-security warning] Link: https://lkml.kernel.org/r/20250310135142.4176976-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20250305222733.59089-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250305222733.59089-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/paddr: respect ops_filters_default_rejectSeongJae Park
Use damos->ops_filters_default_reject, which is set based on the installed filters' behaviors, from physical address space DAMON operations set. Link: https://lkml.kernel.org/r/20250304211913.53574-9-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/core: set damos_filter default allowance behavior based on installed filtersSeongJae Park
Decide whether to allow or reject by default at the core and operations layer handled filters evaluation stages. It is decided as the opposite of the last installed filter's behavior. If there is no filter at all, allow by default. If there are any operations layer handled filters, the core layer's filtering stage sets allowing as the default behavior regardless of the last of the core layer handled filters, since that filter is in this case not really the last filter of the entire filtering stage. Also, make the core layer's DAMOS filters handling stage use the newly set behavior field. [sj@kernel.org: setup damos->{core,ops}_filters_default_reject for initial start] Link: https://lkml.kernel.org/r/20250315222610.35245-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250304211913.53574-8-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
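The default-behavior rule stated above can be written down as a small user-space sketch. The types and function names are hypothetical; only the rule itself (opposite of the last filter, allow when there is no filter, core stage allows by default when any ops filter exists) comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal filter: only whether it allows or rejects. */
struct filter_sketch {
	bool allow;
};

/* Default for one list taken alone: the opposite of its last filter. */
bool default_reject_of(const struct filter_sketch *filters, size_t nr)
{
	if (nr == 0)
		return false;		/* no filter at all: allow by default */
	return filters[nr - 1].allow;	/* last one allows: reject by default */
}

/*
 * If any ops-layer filter exists, the last core filter is not the last
 * filter of the whole chain, so the core stage allows by default.
 */
void set_default_reject(const struct filter_sketch *core, size_t nr_core,
			const struct filter_sketch *ops, size_t nr_ops,
			bool *core_reject, bool *ops_reject)
{
	*core_reject = nr_ops ? false : default_reject_of(core, nr_core);
	*ops_reject = default_reject_of(ops, nr_ops);
}
```

For example, a single core-layer allow filter with no ops-layer filters makes the core stage reject by default, while adding an ops-layer allow filter moves the default-reject decision to the ops stage.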
2025-03-17mm/damon/paddr: support only damos->ops_filtersSeongJae Park
DAMON physical address space operation set implementation (paddr) started handling both damos->filters and damos->ops_filters to avoid breakage during the change for the ->ops_filters setup. Now the change is done, so paddr's support of ->filters is only a waste that can safely be dropped. Remove it. Link: https://lkml.kernel.org/r/20250304211913.53574-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/core: put ops-handled filters to damos->ops_filtersSeongJae Park
damos->ops_filters was introduced to be used for all operations layer handled filters. But DAMON kernel API callers can put any type of DAMOS filters to any of damos->filters and damos->ops_filters. DAMON user-space ABI users have no way to use ->ops_filters at all. Update damos_add_filter(), which should be used by API callers to install DAMOS filters, to add filters to ->filters or ->ops_filters depending on their handling layer. The change forces both API callers and ABI users to use the proper lists, since ABI users use the API internally. Link: https://lkml.kernel.org/r/20250304211913.53574-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
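Routing a filter to a list by its handling layer could look like the sketch below. Everything here is a simplified user-space stand-in: the enum values, the fixed-size arrays and the function names are hypothetical, and the assignment of filter types to layers is an assumption made for the example.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FILTERS_SKETCH 8

/* Hypothetical filter types; the layer assignment below is assumed. */
enum filter_type_sketch {
	FILTER_SKETCH_ADDR,
	FILTER_SKETCH_TARGET,
	FILTER_SKETCH_ANON,
	FILTER_SKETCH_ACTIVE,
};

struct scheme_sketch {
	enum filter_type_sketch filters[MAX_FILTERS_SKETCH];
	size_t nr_filters;
	enum filter_type_sketch ops_filters[MAX_FILTERS_SKETCH];
	size_t nr_ops_filters;
};

/* True if the (assumed) handling layer of the type is the ops layer. */
bool filter_for_ops_sketch(enum filter_type_sketch type)
{
	switch (type) {
	case FILTER_SKETCH_ANON:
	case FILTER_SKETCH_ACTIVE:
		return true;
	default:
		return false;
	}
}

/* Single entry point: callers cannot pick the wrong list themselves. */
int add_filter_sketch(struct scheme_sketch *s, enum filter_type_sketch type)
{
	if (filter_for_ops_sketch(type)) {
		if (s->nr_ops_filters == MAX_FILTERS_SKETCH)
			return -1;
		s->ops_filters[s->nr_ops_filters++] = type;
		return 0;
	}
	if (s->nr_filters == MAX_FILTERS_SKETCH)
		return -1;
	s->filters[s->nr_filters++] = type;
	return 0;
}
```

Putting the layer decision inside the add function is what lets one change cover both kinds of users: anything built on top of the function inherits the correct list selection.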
2025-03-17mm/damon/core: support committing ops_filtersSeongJae Park
DAMON kernel API callers should use damon_commit_ctx() to install DAMON parameters including DAMOS filters. But damos_commit_ops_filters(), which is called by damon_commit_ctx() for filters installing, is not handling damos->ops_filters. Hence, no DAMON kernel API caller can use damos->ops_filters. Do the committing of the ops_filters to make it usable. Link: https://lkml.kernel.org/r/20250304211913.53574-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/paddr: support ops_filtersSeongJae Park
DAMON keeps all DAMOS filters in damos->filters. Upcoming changes will make it use damos->ops_filters for all operations layer handled DAMOS filters, though. The DAMON physical address space operations set implementation (paddr) is not ready for the changes, since it handles only damos->filters. To avoid any breakage during the upcoming changes, make paddr handle both lists. After the change is made, ->filters support on paddr can be safely removed. Link: https://lkml.kernel.org/r/20250304211913.53574-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/core: introduce damos->ops_filtersSeongJae Park
Patch series "mm/damon: make allow filters after reject filters useful and intuitive". DAMOS filters allow or reject elements of memory for a given DAMOS scheme only if those match the filter criteria. For elements that don't match any DAMOS filter, 'allowing' is the default behavior. This makes allow-filters that don't have any reject-filter after them meaningless sources of overhead. The decision was made to keep the behavior consistent with that before the introduction of allow-filters. This, however, makes usage of DAMOS filters confusing and inefficient. It is more intuitive and still consistent behavior to reject by default unless there is no filter at all or the last filter is a reject filter. Update the filtering logic in that way and update documents to clarify the behavior. Note that this is changing the old behavior. But the old behavior for the problematic filter combination was definitely confusing, inefficient and anyway useless. Also, the behavior was introduced relatively recently. It is difficult to anticipate any user that depends on the behavior. Hence this is not a user-breaking behavior change but an obvious improvement. This patch (of 9): DAMOS filters can be categorized into two groups depending on the layer in which they are handled, namely the core layer and the ops layer. The groups are important because the filtering behavior depends on the evaluation sequence of filters, and core layer-handled filters are evaluated before operations layer-handled ones. The behavior is clearly documented, but the implementation is a bit inefficient and complicated. All filters are maintained mixed together in a single list (damos->filters). The filter evaluation logic in the core layer and in the operations layer each iterates all the filters on the list, while skipping filters that should not be handled by that layer. It is inefficient. Making future extensions that differentiate filters of different handling layers will also be complicated. 
Add a new list that will be used for having all operations layer-handled DAMOS filters to DAMOS scheme data structure. Also add the support of its initialization and basic traversal functions. Link: https://lkml.kernel.org/r/20250304211913.53574-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250304211913.53574-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/sysfs: implement a command to update auto-tuned monitoring intervalsSeongJae Park
DAMON kernel API callers can read the auto-tuned sampling and aggregation intervals from the monitoring attributes data structure. That can be useful for debugging or tuning of the feature. DAMON user-space ABI users have no way to see them, though. Implement a new DAMON sysfs interface command, namely 'update_tuned_intervals', for the purpose. If the command is written to the kdamond state file, the tuned sampling and aggregation intervals will be updated to the corresponding sysfs interface files. Link: https://lkml.kernel.org/r/20250303221726.484227-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/sysfs: commit intervals tuning goalSeongJae Park
Connect DAMON sysfs interface for sampling and aggregation intervals auto-tuning with DAMON core API, so that users can really use the feature using the sysfs files. Link: https://lkml.kernel.org/r/20250303221726.484227-5-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/sysfs: implement intervals tuning goal directorySeongJae Park
Implement DAMON sysfs interface directory and its files for setting DAMON sampling and aggregation intervals auto-tuning goal. Link: https://lkml.kernel.org/r/20250303221726.484227-4-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-17mm/damon/core: implement intervals auto-tuningSeongJae Park
Implement the DAMON sampling and aggregation intervals auto-tuning mechanism as briefly described on 'struct damon_intervals_goal'. The core part for deciding the direction and amount of the changes is implemented by reusing the feedback loop function which is being used for DAMOS quotas auto-tuning. Unlike the DAMOS quotas auto-tuning use case, limit the maximum decreasing amount after the adjustment to 50% of the current value, though. This is because rapid reductions of the intervals have little merit and could unnecessarily increase the monitoring overhead. Link: https://lkml.kernel.org/r/20250303221726.484227-3-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
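The 50% limit on decreases can be expressed as a one-line clamp. The sketch below is a user-space illustration with a hypothetical function name; it takes the value proposed by some feedback function as an input and does not reproduce that function.

```c
/*
 * Hypothetical helper: never let a single adjustment shrink the interval
 * below half of its current value. Increases pass through unchanged.
 */
unsigned long cap_decrease(unsigned long current_val, unsigned long proposed)
{
	unsigned long floor_val = current_val / 2;

	return proposed < floor_val ? floor_val : proposed;
}
```

With a current value of 100, a proposed value of 10 becomes 50, while proposed values of 80 or 300 are kept as they are.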
2025-03-17mm/damon: add data structure for monitoring intervals auto-tuningSeongJae Park
Patch series "mm/damon: auto-tune aggregation interval".

DAMON requires time-consuming and repetitive aggregation interval tuning. Introduce a feature for automating it using a feedback loop that aims at a target amount of observed access events, like auto-exposing cameras.

Background: Access Frequency Monitoring and Aggregation Interval
================================================================

DAMON checks if each memory element (damon_region) is accessed or not for every user-specified time interval called 'sampling interval'. It aggregates the check results into a per-element counter called 'nr_accesses'. DAMON users can read the counters to get the access temperature of a given element. The counters are reset for every other user-specified time interval called 'aggregation interval'.

This can be illustrated as DAMON continuously capturing a snapshot of the access events that happened and were captured within the last aggregation interval. This implies the aggregation interval plays a key role for the quality of the snapshots, like the camera exposure time. If it is too short, the amount of access events that happened and were captured for each snapshot is small, so each snapshot will show not many interesting things but just a cold and dark world with hopefully one pale blue dot or two. If it is too long, too many events are aggregated in a single shot, so each snapshot will look like a world of flames, or Muspellheim. It will be difficult to find practical insights in both cases.

Problem: Time Consuming and Repetitive Tuning
=============================================

The appropriate length of the aggregation interval depends on how frequently the system and workloads are making access events that DAMON can observe. Hence, users have to tune the interval with an excessive amount of tests with the target system and workloads. If the system and workloads are changed, the tuning should be done again. If the characteristic of the workloads is dynamic, it becomes more challenging.
It is therefore time-consuming and repetitive.

The tuning challenge mainly stems from the wrong question. It is not asking users what quality of monitoring results they want, but how DAMON should operate for their hidden goal. To give the right answer, users need to fully understand DAMON's mechanisms and the characteristics of their workloads. Users shouldn't be asked to understand the underlying mechanism. Understanding the characteristics of the workloads shouldn't be the role of users but of DAMON.

Aim-oriented Feedback-driven Auto-Tuning
=========================================

Fortunately, the appropriate length of the aggregation interval can be inferred using a feedback loop. If the current snapshots are showing not much interesting information, in other words, if they show only rare access events, increasing the aggregation interval helps, and vice versa. We tested this theory on a few real-world workloads, and documented one of the experiences as an official DAMON monitoring intervals tuning guideline. Since it is a simple theory that requires repeatable tries, it can be a good job for machines.

Based on the guideline's theory, we design an automation of aggregation interval tuning, in a way similar to that of the camera auto-exposure feature. It defines the amount of interesting information as the ratio of the access events that DAMON actually observed to the theoretical maximum amount of them within each snapshot. Events are accounted in byte and sampling attempts granularity. For example, let's say there is a region of 'X' bytes size. DAMON tried access check sampling for the region 'Y' times in total for a given aggregation. Among the 'Y' attempts, 'Z' times it showed positive results. Then, the theoretical maximum number of access events for the region is 'X * Y'. And the number of access events that DAMON has observed for the region is 'X * Z'. The amount of the interesting information is '(X * Z) / (X * Y)'.
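The ratio defined above can be computed as in the following user-space sketch. The struct and function names are hypothetical, and summing the observed and maximum events over all regions before dividing is an assumption about how several regions would be combined.

```c
#include <stddef.h>

/* Hypothetical per-region sample: X bytes, Y attempts, Z positive checks. */
struct region_sketch {
	unsigned long bytes;
	unsigned long attempts;
	unsigned long positives;
};

/*
 * Observed events (sum of X * Z) over the theoretical maximum (sum of
 * X * Y). Returns 0.0 when nothing was sampled.
 */
double observed_access_ratio(const struct region_sketch *regions, size_t nr)
{
	double observed = 0.0, maximum = 0.0;
	size_t i;

	for (i = 0; i < nr; i++) {
		observed += (double)regions[i].bytes * regions[i].positives;
		maximum += (double)regions[i].bytes * regions[i].attempts;
	}
	return maximum > 0.0 ? observed / maximum : 0.0;
}
```

A single 4096-byte region checked 20 times with 5 positive results gives (4096 * 5) / (4096 * 20) = 0.25, that is, a 25% observed access ratio.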
Note that each snapshot would have multiple regions.

Users can set an arbitrary value of the ratio as their target. Once the target is set, the automation periodically measures the current value of the ratio and increases or decreases the aggregation interval if the ratio value is lower or higher than the target, respectively. The amount of the change is proportional to the distance between the current and the target values.

To prevent the auto-tuning from going too far, let users set the minimum and the maximum aggregation interval times. Changing only the aggregation interval while keeping the sampling interval makes the maximum level of access frequency in each snapshot, or the discernment of regions, inconsistent. Also, an unnecessarily short sampling interval causes meaningless monitoring overhead. The automation therefore adjusts the sampling interval together with the aggregation interval, while keeping the ratio between the two intervals. Users can set the ratio, or the discernment.

Discussion
==========

The modified question (aimed amount of access events, or lights, in each snapshot) is easy to answer for both the users and the kernel. If users are interested in finding more cold regions, the value should be lower, and vice versa. If users have no idea, the kernel can suggest a fair default value based on some theories and experiments. For example, based on the Pareto principle (80/20 rule), we could expect a 20% target ratio will capture 80% of real access events. Since 80% might be too high, applying the rule once again, 4% (20% * 20%) may capture about 64% (80% * 80%) of real access events.

The sampling to aggregation intervals ratio and the min/max aggregation intervals are also arguably easy to answer. What users want is discernment of regions for efficient system operation, for example, X amount of colder regions or Y amount of warmer regions, not exactly how many times each cache line is accessed at nanosecond granularity.
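The feedback step described earlier in this message (change the aggregation interval in proportion to the distance from the target, clamp it to the user-given range, and keep the sampling to aggregation ratio) can be sketched as below. This is a minimal sketch under stated assumptions, not the kernel's update rule: `next_intervals` is a hypothetical name, and using the relative distance `(target - current) / target` as the proportional factor is an assumption made for illustration.

```python
def next_intervals(aggr_us: int, current_ratio: float, target_ratio: float,
                   min_aggr_us: int, max_aggr_us: int,
                   sample_to_aggr: float = 0.05) -> tuple[int, int]:
    """Return (sampling interval, aggregation interval) in microseconds."""
    # Positive when too few events were observed, negative when too many.
    # Relative distance is an assumed form of "proportional to the distance".
    distance = (target_ratio - current_ratio) / target_ratio
    new_aggr = aggr_us * (1.0 + distance)
    # Keep the result within the user-given min/max aggregation interval.
    new_aggr = min(max(new_aggr, min_aggr_us), max_aggr_us)
    # Adjust the sampling interval together, keeping the intervals ratio.
    new_sample = new_aggr * sample_to_aggr
    return round(new_sample), round(new_aggr)
```

With the suggested intervals ratio of 0.05, a sampling interval range of 5 milliseconds to 20 seconds corresponds to an aggregation interval range of 100 milliseconds to 400 seconds, which is why the clamp here is expressed on the aggregation interval only.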
The appropriate min/max aggregation interval can be set relatively naively, and may be better set for the aimed monitoring overhead. Since the sampling interval directly decides the overhead, setting it based on the sampling interval can be easy. From my experience, I'd argue the intervals ratio 0.05, and 5 milliseconds to 20 seconds sampling interval range (100 milliseconds to 400 seconds aggregation interval) can be a good default suggestion.

Evaluation
==========

On a machine running a real world server workload, I ran DAMON to monitor its physical address space for about 23 hours, with this feature turned on. We set it to tune the sampling interval in a range from 5 milliseconds to 10 seconds, aiming at a 4 % DAMON-observed access ratio per three aggregation intervals. The exact command I used is as below.

    damo start --monitoring_intervals_goal 4% 3 5ms 10s --damos_action stat

During the test run, DAMON continuously updated the sampling and aggregation intervals as designed, within the given range. For all the time, DAMON was able to find intervals that meet the target access events ratio in the given intervals range (sampling interval between 5 milliseconds and 10 seconds).

For most of the time, the tuned sampling interval converged to 300-400 milliseconds. It made only a small amount of changes within the range. The average of the tuned sampling interval during the test was about 380 milliseconds.

The workload periodically gets less load and decreases its CPU usage. Presumably this also caused it to make fewer memory access events. In reaction to such events, DAMON also increased the intervals as expected. It was still able to find the optimum interval that satisfies the target access ratio within the given intervals range. Usually it converged to about 5 seconds. Once the workload got a normal amount of load again, DAMON reactively reduced the intervals to the normal range.

I collected and visualized DAMON's monitoring results on the server a few times.
Every time, the visualized access pattern looked not biased to only cold or hot pages but diverse and balanced. Let me show some of the snapshots that I collected near the end of the test (after about 23 hours had passed since starting DAMON on the server).

The recency histogram looks as below. Please note that this visualization shows only very coarse-grained information. For more details about the visualization format, please refer to the DAMON user-space tool documentation[1].

    # ./damo report access --style recency-sz-hist --tried_regions_of 0 0 0 --access_rate 0 0
    <last accessed time (us)> <total size>
    [-19 h 7 m 45.514 s, -17 h 12 m 58.963 s)    6.198 GiB  |****                |
    [-17 h 12 m 58.963 s, -15 h 18 m 12.412 s)         0 B  |                    |
    [-15 h 18 m 12.412 s, -13 h 23 m 25.860 s)         0 B  |                    |
    [-13 h 23 m 25.860 s, -11 h 28 m 39.309 s)         0 B  |                    |
    [-11 h 28 m 39.309 s, -9 h 33 m 52.757 s)          0 B  |                    |
    [-9 h 33 m 52.757 s, -7 h 39 m 6.206 s)            0 B  |                    |
    [-7 h 39 m 6.206 s, -5 h 44 m 19.654 s)            0 B  |                    |
    [-5 h 44 m 19.654 s, -3 h 49 m 33.103 s)           0 B  |                    |
    [-3 h 49 m 33.103 s, -1 h 54 m 46.551 s)           0 B  |                    |
    [-1 h 54 m 46.551 s, -0 ns)                 16.967 GiB  |*********           |
    [-0 ns, --6886551440000 ns)                 38.835 GiB  |********************|
    memory bw estimate: 9.425 GiB per second
    total size: 62.000 GiB

It shows about 38 GiB of memory was accessed at least once within the last aggregation interval (given the ~300 milliseconds tuned sampling interval, this is about six seconds). This is about 61 % of the total memory. In other words, DAMON found the warmest 61 % of the memory of the system. The number is particularly interesting given our Pareto principle based theory for the tuning goal value. We set it as 20 % of 20 % (4 %), thinking it would capture 80 % of 80 % (64 %) of real access events. And it found 61 % hot memory, or working set.

Nevertheless, to make the theory clearer, much more discussion and tests would be needed. At the moment, nonetheless, we can say making the target value higher helps finding more hot memory regions.
The histogram also shows an amount of cold memory. About 17 GiB of the system's memory has not been accessed for at least the last aggregation interval (about six seconds), and for at most about the last two hours. The real longest unaccessed time of the 17 GiB memory was about 19 minutes, though. This is a limitation of this visualization format.

It further found very cold 6 GiB memory. It has not been accessed for at least the last 17 hours and at most 19 hours.

What about the hot memory distribution? To see this, I captured and visualized the snapshot as an access temperature histogram. Again, please refer to the DAMON user-space tool documentation[1] for the format and what access temperature means. Both the visualization and the metric show only very coarse-grained and limited information. The resulting histogram looks like below.

    # ./damo report access --style temperature-sz-hist --tried_regions_of 0 0 0
    <temperature> <total size>
    [-6,840,763,776,000, -5,501,580,939,800)    6.198 GiB  |***                 |
    [-5,501,580,939,800, -4,162,398,103,600)          0 B  |                    |
    [-4,162,398,103,600, -2,823,215,267,400)          0 B  |                    |
    [-2,823,215,267,400, -1,484,032,431,200)          0 B  |                    |
    [-1,484,032,431,200, -144,849,595,000)            0 B  |                    |
    [-144,849,595,000, 1,194,333,241,200)      55.802 GiB  |********************|
    [1,194,333,241,200, 2,533,516,077,400)      4.000 KiB  |*                   |
    [2,533,516,077,400, 3,872,698,913,600)      4.000 KiB  |*                   |
    [3,872,698,913,600, 5,211,881,749,800)      8.000 KiB  |*                   |
    [5,211,881,749,800, 6,551,064,586,000)     12.000 KiB  |*                   |
    [6,551,064,586,000, 7,890,247,422,200)      4.000 KiB  |*                   |
    memory bw estimate: 5.178 GiB per second
    total size: 62.000 GiB

We can see most of the memory is in a similar access temperature range, and definitely some pages are extremely hot.

To see the picture in more detail, let's capture and visualize the snapshot per DAMON region, sorted by access temperature. The total number of regions was about 300. Due to the limited space, I'm showing only a few parts of the output here.
    # ./damo report access --style hot --tried_regions_of 0 0 0
    heatmap: 00000000888888889999999888888888888888888888888888888888888888888888888888888888
    # min/max temperatures: -6,827,258,184,000, 17,589,052,500, column size: 793.600 MiB
    |999999999999999999999999999999999999999| 4.000 KiB access 100 % 18 h 9 m 43.918 s
    |999999999999999999999999999999999999999| 8.000 KiB access 100 % 17 h 56 m 5.351 s
    |999999999999999999999999999999999999999| 4.000 KiB access 100 % 15 h 24 m 19.634 s
    |999999999999999999999999999999999999999| 4.000 KiB access 100 % 14 h 10 m 55.606 s
    |999999999999999999999999999999999999999| 4.000 KiB access 100 % 11 h 34 m 18.993 s
    [...]
    |99999999999999999999999999999| 8.000 KiB access 100 % 1 m 27.945 s
    |11111111111111111111111111111| 80.000 KiB access 15 % 1 m 21.180 s
    |00000000000000000000000000000| 24.000 KiB access 5 % 1 m 21.180 s
    |00000000000000000000000000000| 5.919 GiB access 10 % 1 m 14.415 s
    |99999999999999999999999999999| 12.000 KiB access 100 % 1 m 7.650 s
    [...]
    |0| 4.000 KiB access 5 % 0 ns
    |0| 12.000 KiB access 5 % 0 ns
    |0| 188.000 KiB access 0 % 0 ns
    |0| 24.000 KiB access 0 % 0 ns
    |0| 48.000 KiB access 0 % 0 ns
    [...]
    |0000000000000000000000000000000| 8.000 KiB access 0 % 6 m 45.901 s
    |00000000000000000000000000000000| 36.000 KiB access 0 % 7 m 26.491 s
    |00000000000000000000000000000000| 4.000 KiB access 0 % 12 m 37.682 s
    |000000000000000000000000000000000| 8.000 KiB access 0 % 18 m 9.168 s
    |000000000000000000000000000000000| 16.000 KiB access 0 % 19 m 3.288 s
    |0000000000000000000000000000000000000000| 6.198 GiB access 0 % 18 h 57 m 52.582 s
    memory bw estimate: 8.798 GiB per second
    total size: 62.000 GiB

We can see DAMON found small and extremely hot regions that were accessed in every access check sample (once per about 300 milliseconds) for more than 10 hours. The access temperature rapidly decreases. DAMON was also able to find small and big regions that were not accessed for up to about 19 minutes.
It even found an outlier cold region of 6 GiB that was not accessed for about 19 hours. It is unclear what the outlier region is, as of this writing.

During the test, DAMON was consuming about 0.1% of a single CPU's time. This is again an expected result, since DAMON was using about 370 milliseconds sampling interval in most cases.

    # ps -p $kdamond_pid -o %cpu
    %CPU
     0.1

I also ran similar tests against a kernel build workload and an in-memory cache workload benchmark[2]. Detailed results including the tuned intervals and the captured access patterns were of course different, since those depend on the workloads. But the auto-tuning feature was always working as expected, like the above results for the real world workload.

To wrap up, with the intervals auto-tuning feature, DAMON was able to capture access pattern snapshots of a quality on a real world server workload. The auto-tuning feature was able to adaptively react to the dynamic access patterns of the workload and reliably provide consistent monitoring results without manual human intervention. Also, the auto-tuning made DAMON consume only the necessary amount of resources for the required quality.

References
==========

[1] https://github.com/damonitor/damo/blob/next/USAGE.md#access-report-styles
[2] https://github.com/facebookresearch/DCPerf/blob/main/packages/tao_bench/README.md

This patch (of 8):

Add data structures for DAMON sampling and aggregation intervals automatic tuning that aims at a specific amount of DAMON-observed access events per snapshot. In more detail, define the data structure for the tuning goal, link it to the monitoring attributes data structure so that DAMON kernel API callers can make the request, and update the DAMON parameters setup function to respect the new parameter.
Link: https://lkml.kernel.org/r/20250303221726.484227-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250303221726.484227-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16mm/damon: implement a new DAMOS filter type for unmapped pagesSeongJae Park
Patch series "mm/damon: introduce DAMOS filter type for unmapped pages". Users decide whether their memory will be mapped or unmapped. It implies that the two types of memory can have different characteristics and management requirements. Provide the DAMON-observability and DAMOS-operation capability for the different types by introducing a new DAMOS filter type for unmapped pages. This patch (of 2): Implement yet another DAMOS filter type for unmapped pages on DAMON kernel API, and add support of it from the physical address space DAMON operations set (paddr). Since it is for only unmapped pages, support from the virtual address spaces DAMON operations set (vaddr) is not required. Link: https://lkml.kernel.org/r/20250219220146.133650-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250219220146.133650-2-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16mm/damon/sysfs-schemes: add files for setting damos_filter->sz_rangeUsama Arif
Add min and max files for damon filters to let the userspace decide the min/max folio size to operate on. This will be needed to decide what folio sizes to give pa_stat for. Link: https://lkml.kernel.org/r/20250211124437.278873-3-usamaarif642@gmail.com Signed-off-by: Usama Arif <usamaarif642@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16mm/damon: introduce DAMOS filter type hugepage_sizeUsama Arif
Patch series "mm/damon: add support for hugepage_size DAMOS filter", v5. hugepage_size DAMOS filter can be used to gather statistics to check if memory regions of specific access temperatures are backed by hugepages of a size in a specific range. This filter can help to observe and prove the effectiveness of different schemes for shrinking/collapsing hugepages. This patch (of 4): This is to gather statistics to check if memory regions of specific access temperatures are backed by pages of a size in a specific range. This filter can help to observe and prove the effectiveness of different schemes for shrinking/collapsing hugepages. [sj@kernel.org: add kernel-doc comment for damos_filter->sz_range] Link: https://lkml.kernel.org/r/20250218223058.52459-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250211124437.278873-1-usamaarif642@gmail.com Link: https://lkml.kernel.org/r/20250211124437.278873-2-usamaarif642@gmail.com Signed-off-by: Usama Arif <usamaarif642@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Usama Arif <usamaarif642@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16mm/damon/core: do damos walking in entire regions granularitySeongJae Park
damos_walk_control can be installed while DAMOS is walking the regions. This means the walk callback function invocations can be started from a region at the middle of the regions list. This makes it hard to be used reliably. Particularly, DAMOS tried regions update for collecting monitoring results gets problematic results. Increase the walk_control_lock critical section to do walking in entire regions granularity. Link: https://lkml.kernel.org/r/20250210182737.134994-4-sj@kernel.org Fixes: bf0eaba0ff9c ("mm/damon/core: implement damos_walk()") Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16mm/damon/core: do not call damos_walk_control->walk() if walk is completedSeongJae Park
damos_walk() invokes callback functions of schemes until all schemes finishes at least one round of walks. If there are multiple DAMOS schemes having different apply_interval, the callback functions for longer apply interval scheme will be called for more than a round of the walk. The behavior is different from the document (see damos_walk() kernel-doc comment), and not useful. Make the behavior be same to the documented one, by stopping invoking the callback if the walk for the given scheme is completed. Link: https://lkml.kernel.org/r/20250210182737.134994-3-sj@kernel.org Fixes: bf0eaba0ff9c ("mm/damon/core: implement damos_walk()") Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16mm/damon/core: unset damos->walk_completed after confimed setSeongJae Park
Patch series "mm/damon/core: fix wrong and/or useless damos_walk() behaviors". damos_walk() can finish working earlier or later than expected, and start earlier than practical. First two behaviors are clearly wrong behavior (doesn't follow the documentation) and all three behaviors are only making the feature useless. Fix those. This patch (of 3): damos->walk_completed is only set, not unset. This can cause next damos_walk() finish earlier than expected. Unset it after all walk_completed is confirmed. Link: https://lkml.kernel.org/r/20250210182737.134994-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250210182737.134994-2-sj@kernel.org Fixes: bf0eaba0ff9c ("mm/damon/core: implement damos_walk()") Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16mm/damon: avoid applying DAMOS action to same entity multiple timesSeongJae Park
'paddr' DAMON operations set can apply a DAMOS scheme's action to a large folio multiple times in single DAMOS-regions-walk if the folio is laid on multiple DAMON regions. Add a field for DAMOS scheme object that can be used by the underlying ops to know what was the last entity that the scheme's action has applied. The core layer unsets the field when each DAMOS-regions-walk is done for the given scheme. And update 'paddr' ops to use the infrastructure to avoid the problem. Link: https://lkml.kernel.org/r/20250207212033.45269-3-sj@kernel.org Fixes: 57223ac29584 ("mm/damon/paddr: support the pageout scheme") Signed-off-by: SeongJae Park <sj@kernel.org> Reported-by: Usama Arif <usamaarif642@gmail.com> Closes: https://lore.kernel.org/20250203225604.44742-3-usamaarif642@gmail.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
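The idea of remembering the last entity an action was applied to can be sketched as below. This is a simplified user-space model, not the paddr code: `walk_and_apply`, `folio_of` and the tuple-based regions are hypothetical, and resetting the remembered entity at the start of each call stands in for the core layer unsetting the field when a regions walk is done.

```python
from typing import Callable, List, Tuple


def walk_and_apply(regions: List[Tuple[int, int]],
                   folio_of: Callable[[int], Tuple[int, int]],
                   action: Callable[[int], None]) -> None:
    """Apply `action` once per folio over one walk of `regions`.

    regions:  sorted [start, end) address ranges, like DAMON regions.
    folio_of: maps an address to (folio start address, folio size).
    """
    last_applied = None  # per-walk state; unset again for the next walk
    for start, end in regions:
        addr = start
        while addr < end:
            folio_start, folio_size = folio_of(addr)
            # A large folio laid across two regions is seen twice;
            # skip it the second time.
            if folio_start != last_applied:
                action(folio_start)
                last_applied = folio_start
            # Step by the folio size instead of the page size.
            addr = folio_start + folio_size
```

Tracking only the most recent entity is enough in this model because regions are walked in address order, so a folio that spans a region boundary is always the last one handled in the earlier region.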
2025-03-16mm/damon/ops: have damon_get_folio return folio even for tail pagesUsama Arif
Patch series "mm/damon/paddr: fix large folios access and schemes handling". DAMON operations set for physical address space, namely 'paddr', treats tail pages as unaccessed always. It can also apply DAMOS action to a large folio multiple times within single DAMOS' regions walking. As a result, the monitoring output has poor quality and DAMOS works in unexpected ways when large folios are being used. Fix those. The patches were parts of Usama's hugepage_size DAMOS filter patch series[1]. The first fix was collected from there with a slight commit message change for the subject prefix. The second fix was re-written by SJ and posted as an RFC[2] before this series. The second one also got a slight commit message change for the subject prefix. [1] https://lore.kernel.org/20250203225604.44742-1-usamaarif642@gmail.com [2] https://lore.kernel.org/20250206231103.38298-1-sj@kernel.org This patch (of 2): This effectively adds support for large folios in damon for paddr, as damon_pa_mkold/young won't get a null folio from this function and won't ignore it, hence access will be checked and reported. This also means that larger folios will be considered for different DAMOS actions like pageout, prioritization and migration. As these DAMOS actions will consider larger folios, iterate through the region at folio_size and not PAGE_SIZE intervals. This should not have an effect on vaddr, as damon_young_pmd_entry considers pmd entries. Link: https://lkml.kernel.org/r/20250207212033.45269-1-sj@kernel.org Link: https://lkml.kernel.org/r/20250207212033.45269-2-sj@kernel.org Fixes: a28397beb55b ("mm/damon: implement primitives for physical address space monitoring") Signed-off-by: Usama Arif <usamaarif642@gmail.com> Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: SeongJae Park <sj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16mm/damon: handle device-exclusive entries correctly in damon_folio_mkold_one()David Hildenbrand
Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). damon_folio_mkold_one() is not prepared for that and calls damon_ptep_mkold() with PFN swap PTEs. Teach damon_ptep_mkold() to deal with these PFN swap PTEs. Note that device-private entries are so far not applicable on that path, as damon_get_folio() filters out non-lru folios. Should we just skip PFN swap PTEs completely? Possible, but it seems straight forward to just handle it correctly. Note that we could currently only run into this case with device-exclusive entries on THPs. We still adjust the mapcount on conversion to device-exclusive; this makes the rmap walk abort early for small folios, because we'll always have !folio_mapped() with a single device-exclusive entry. We'll adjust the mapcount logic once all page_vma_mapped_walk() users can properly handle device-exclusive entries. Link: https://lkml.kernel.org/r/20250210193801.781278-16-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: SeongJae Park <sj@kernel.org> Tested-by: Alistair Popple <apopple@nvidia.com> Cc: Alex Shi <alexs@kernel.org> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Dave Airlie <airlied@gmail.com> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Karol Herbst <kherbst@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Lyude <lyude@redhat.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Simona Vetter <simona.vetter@ffwll.ch> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yanteng Si <si.yanteng@linux.dev> Cc: Barry Song <v-songbaohua@oppo.com> 
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16mm/damon: handle device-exclusive entries correctly in damon_folio_young_one()David Hildenbrand
Ever since commit b756a3b5e7ea ("mm: device exclusive memory access") we can return with a device-exclusive entry from page_vma_mapped_walk(). damon_folio_young_one() is not prepared for that, so teach it about these PFN swap PTEs. Note that device-private entries are so far not applicable on that path, as we expect ZONE_DEVICE pages so far only in migration code when it comes to the RMAP. The impact is rather small: we'd be calling pte_young() on a non-present PTE, which is not really defined to have semantic. Note that we could currently only run into this case with device-exclusive entries on THPs. We still adjust the mapcount on conversion to device-exclusive; this makes the rmap walk abort early for small folios, because we'll always have !folio_mapped() with a single device-exclusive entry. We'll adjust the mapcount logic once all page_vma_mapped_walk() users can properly handle device-exclusive entries. Link: https://lkml.kernel.org/r/20250210193801.781278-15-david@redhat.com Fixes: b756a3b5e7ea ("mm: device exclusive memory access") Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: SeongJae Park <sj@kernel.org> Tested-by: Alistair Popple <apopple@nvidia.com> Cc: Alex Shi <alexs@kernel.org> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Dave Airlie <airlied@gmail.com> Cc: Jann Horn <jannh@google.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Karol Herbst <kherbst@redhat.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Lyude <lyude@redhat.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Cc: Peter Xu <peterx@redhat.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Simona Vetter <simona.vetter@ffwll.ch> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Yanteng Si <si.yanteng@linux.dev> Cc: Barry 
Song <v-songbaohua@oppo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16mm/damon/core: initialize damos->walk_completed in damon_new_scheme()SeongJae Park
The function for allocating and initializing a 'struct damos' object, damon_new_scheme(), is not initializing the damos->walk_completed field. Only damos_walk_complete() is setting the field. Hence the field will be eventually set and used correctly from the second damos_walk() call for the scheme. But the first damos_walk() could mistakenly not walk on the regions. Actually, a common usage of DAMOS for taking an access pattern snapshot is installing a monitoring-purpose DAMOS scheme, doing damos_walk() to retrieve the snapshot, and then removing the scheme. The DAMON user-space tool (damo) also gets runtime snapshots in this way. Hence the problem can continuously happen in such use cases. Initialize it properly in the allocation function. Link: https://lkml.kernel.org/r/20250228174450.41472-1-sj@kernel.org Fixes: bf0eaba0ff9c ("mm/damon/core: implement damos_walk()") Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16mm/damon: respect core layer filters' allowance decision on ops layerSeongJae Park
Filtering decisions are made in filters evaluation order. Once a decision is made by a filter, filters that are scheduled to be evaluated after the decision-making filter should just respect it. This is the intended and documented behavior. Since core layer-handled filters are evaluated before operations layer-handled filters, decisions made on the core layer should be respected by the ops layer. In case of reject filters, the decision is respected, since core layer-rejected regions are not passed to the ops layer. But in case of allow filters, ops layer filters don't know if the region has been passed to them because it was allowed by core filters or just because it didn't match any core layer filter. The current wrong implementation assumes it was due to not being matched by any core filters. As a result, the decision is not respected. Pass the missing information to the ops layer using a new field in 'struct damos', and make the ops layer filters respect it. Link: https://lkml.kernel.org/r/20250228175336.42781-1-sj@kernel.org Fixes: 491fee286e56 ("mm/damon/core: support damos_filter->allow") Signed-off-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
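The intended evaluation order can be modeled as below. This is a rough sketch, not the DAMOS implementation: the `(matches, allow)` filter tuples, `first_decision` and `should_apply` are hypothetical, and treating an item that no filter matches as allowed is an assumption made only to keep the example complete.

```python
from typing import Callable, List, Optional, Tuple

# A filter is (matches, allow): the first matching filter decides.
Filter = Tuple[Callable[[int], bool], bool]


def first_decision(filters: List[Filter], item: int) -> Optional[bool]:
    """Return the allow/reject decision of the first matching filter."""
    for matches, allow in filters:
        if matches(item):
            return allow
    return None  # no filter matched, so no decision was made


def should_apply(core_filters: List[Filter], ops_filters: List[Filter],
                 item: int) -> bool:
    core = first_decision(core_filters, item)
    if core is not None:
        # A core layer decision, allow or reject, is final; the ops layer
        # filters must not override it.
        return core
    ops = first_decision(ops_filters, item)
    # Default for unmatched items is an assumption of this sketch.
    return True if ops is None else ops
```

The bug described above corresponds to dropping the `core is not None` information for the allow case, so that an ops layer reject filter could still reject an item that a core layer allow filter had already allowed.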
2025-01-25mm/damon/core: use str_high_low() helper in damos_wmark_wait_us()Thorsten Blum
Remove hard-coded strings by using the str_high_low() helper function. Link: https://lkml.kernel.org/r/20250116204216.106999-2-thorsten.blum@linux.dev Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25mm/damon/paddr: improve readability of damon_pa_statJoshua Hahn
damon_pa_stat contains an unnecessary goto statement, and the if/else can be re-written to be more readable. This patch is written on top of SJ's patch series [1], which in turn is written on top of another one of his series [2]. [1] https://lore.kernel.org/all/20241219040327.61902-1-sj@kernel.org/ [2] https://lore.kernel.org/all/20241213215306.54778-1-sj@kernel.org/ Link: https://lkml.kernel.org/r/20250113210201.446051-1-joshua.hahnjy@gmail.com Signed-off-by: Joshua Hahn <joshua.hahnjy@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25mm/damon/paddr: increment pa_stat damon address range by folio sizeUsama Arif
This is to avoid going through all the pages in a folio. For folio_size > PAGE_SIZE, damon_get_folio will return NULL for tail pages, so the for loop in those instances will be a nop. Have a more efficient loop by just incrementing the address by folio_size. Link: https://lkml.kernel.org/r/20250113190738.1156381-1-usamaarif642@gmail.com Signed-off-by: Usama Arif <usamaarif642@gmail.com> Reviewed-by: SeongJae Park <sj@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25mm/damon/sysfs-schemes: add a file for setting damos_filter->allowSeongJae Park
Only kernel-space DAMON API users can use inclusive DAMOS filters. Add a sysfs file named 'allow' under DAMOS filter directory of DAMON sysfs interface, to let the user-space users use inclusive DAMOS filters. Link: https://lkml.kernel.org/r/20250109175126.57878-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25mm/damon: add 'allow' argument to damos_new_filter()SeongJae Park
DAMON API users should set damos_filter->allow manually to use a DAMOS allow-filter, since damos_new_filter() unsets the field always. It is cumbersome and error-prone. Add an argument for setting the field to damos_new_filter(). Link: https://lkml.kernel.org/r/20250109175126.57878-6-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>