author		Lorenzo Stoakes <lorenzo.stoakes@oracle.com>	2025-01-31 12:31:49 +0000
committer	Andrew Morton <akpm@linux-foundation.org>	2025-03-16 22:06:01 -0700
commit		3a75ccba047b11a4e8c437e8347b2d1746f1d808 (patch)
tree		564450ca65d9d00d364a46d8ad1b8535a0f365a1 /mm/vma.c
parent		6df8bae8e851eacf2acf2237860213e002aba74f (diff)
mm: simplify vma merge structure and expand comments
Patch series "mm: further simplify VMA merge operation", v3.
While significant efforts have been made to improve the VMA merge
operation, there remain remnants of the bad (or rather confusing) old
days, which make the code difficult to understand, more bug-prone and thus
harder to modify.
This series attempts to significantly improve matters in a number of
respects - with a focus on simplifying the commit_merge() function which
actually actions the merge operation - and importantly, adjusting the two
most confusing merge cases - those in which we 'adjust' the VMA
immediately adjacent to the one being merged.
One source of confusion is the set of VMAs threaded through the operation
itself - vmg->prev, vmg->vma and vmg->next.
At the start of the operation, vmg->vma is either NULL, if a new VMA is
proposed to be added, or a pointer to the existing VMA being modified;
prev and next are the (possibly absent) VMAs sitting immediately before
and after the range specified by vmg->start and vmg->end, respectively.
However, during the VMA merge operation, we change vmg->start, end and
pgoff to span the newly merged range, and change vmg->vma to be either:

a. The ultimately returned VMA (in most cases), or
b. A VMA which we will manipulate, but ultimately return vmg->next instead.
Case b. in particular is confusing for somebody reading this code, and the
fact that we update this state, along with vmg->start, end and pgoff, only
makes matters worse.
We simplify things by replacing vmg->vma with vmg->middle and never
changing it - this is always either NULL (for a new VMA) or the VMA being
modified between vmg->prev and vmg->next.
We further simplify by placing the merged VMA in a new vmg->target field -
whether or not case b. above applies. The reader of the code can now
simply rely on vmg->middle being the middle VMA and vmg->target being the
ultimately merged VMA.
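To make the new layout concrete, here is a simplified sketch of the
relevant vma_merge_struct fields as the series leaves them - illustrative
only, the real structure in mm/vma.h carries additional state:

	#include <linux/mm_types.h>	/* struct vm_area_struct, pgoff_t */

	/* Illustrative sketch only - see struct vma_merge_struct in mm/vma.h. */
	struct vma_merge_sketch {
		struct vm_area_struct *prev;	/* VMA immediately before the range, if any. */
		struct vm_area_struct *middle;	/* VMA being modified, or NULL for a new VMA.
						 * Never changed by the merge operation. */
		struct vm_area_struct *next;	/* VMA immediately after the range, if any. */
		struct vm_area_struct *target;	/* The merged VMA; set only during the
						 * merge operation itself. */
		unsigned long start, end;	/* Span of the proposed (then merged) range. */
		pgoff_t pgoff;			/* Page offset of the range. */
	};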
We additionally tackle the confusing cases where we 'adjust' VMAs other
than the one we ultimately return as the merged VMA (this includes case b.
above). These are:
(1)

    merge
<------------>
|------||--------|      |------------|---|
| prev || middle |  ->  |   target   | m |
|------||--------|      |------------|---|
In this case, in addition to performing the merge, middle must be adjusted
so that middle->vm_start is increased.
(2) (equivalent to case b. above)

     <------------>
|---------||------|      |---|------------|
| middle  || next |  ->  | m |   target   |
|---------||------|      |---|------------|
In this case, in addition to performing the merge, next must be adjusted
so that next->vm_start is decreased.
These cases have previously been handled by calculating and passing around
a dubious and confusing 'adj_start' parameter alongside a pointer to an
'adjust' VMA indicating which VMA requires the additional adjustment
(middle in case 1, next in case 2).
With the VMG structure in place we are able to avoid this by simply
setting a merge flag to describe each case:
(1) Sets the vmg->__adjust_middle_start flag
(2) Sets the vmg->__adjust_next_start flag
By doing so, it turns out we can vastly simplify the logic and directly
calculate what is required to perform the operation.
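As a rough illustration of what the flags boil down to (a hypothetical
helper, not code from this series - the flag and field names follow the
description above, while the real logic lives in commit_merge() alongside
maple tree updates and locking work), each flag selects which VMA's
vm_start moves to the new boundary:

	/*
	 * Hypothetical sketch of the two adjustment cases; real code must
	 * also update the maple tree, anon_vma state and rmap locking.
	 */
	static void apply_start_adjustment(struct vma_merge_struct *vmg,
					   unsigned long new_boundary)
	{
		if (vmg->__adjust_middle_start)		/* case (1) */
			vmg->middle->vm_start = new_boundary;	/* increased */
		else if (vmg->__adjust_next_start)	/* case (2) */
			vmg->next->vm_start = new_boundary;	/* decreased */
	}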
Taken together, these refactorings make it far easier to understand what
is being done, even in the more confusing cases. They make the code far
more maintainable, debuggable and testable, and provide more internal
state indicating what is happening in the merge operation.
The changes have no net functional impact on the merge operation;
everything should still behave as it did before.
This patch (of 5):
The merge code, while much improved, still has a number of points of
confusion. As part of a broader series cleaning this up to make this more
maintainable, we start by addressing some confusion around
vma_merge_struct fields.
So far, the caller either provides no vmg->vma (a new VMA) or supplies the
existing VMA which is being altered, setting vmg->start,end,pgoff to the
proposed VMA dimensions.
vmg->vma is then updated, as are vmg->start,end,pgoff as the merge process
proceeds and the appropriate merge strategy is determined.
This is rather confusing, as vmg->vma starts off as the 'middle' VMA
between vmg->prev,next, but becomes the 'target' VMA, except in one
specific edge case (merge next, shrink middle).
In this patch we introduce vmg->middle to describe the VMA that lies
between vmg->prev and vmg->next, and which does NOT change during the
merge operation.
We replace vmg->vma with vmg->target, and use this only during the merge
operation itself.
Aside from the merge right, shrink middle case, this becomes the VMA that
forms the basis of the VMA that is returned. This edge case can be
addressed in a future commit.
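As an illustration of the caller-side contract after this change (a hedged
sketch - the field values are placeholders, some fields are omitted, and
in-tree callers go through helpers such as vma_modify()):

	struct vma_merge_struct vmg = {
		.vmi	= &vmi,		/* positioned within the VMA being modified */
		.prev	= prev,
		.middle	= vma,		/* the VMA being modified; previously '.vma' */
		.next	= NULL,		/* the merge code determines this itself */
		.start	= start,
		.end	= end,
		.pgoff	= pgoff,
		/* .target is set internally once a merge strategy is chosen. */
	};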
We also add a number of comments to explain what is going on.
Finally, we adjust the ASCII diagrams showing each merge case in
vma_merge_existing_range() to be clearer - the arrow range previously
showed the area spanned by vmg->start, end, but it is clearer to change
this to show the final merged VMA.
This patch has no change in functional behaviour.
Link: https://lkml.kernel.org/r/cover.1738326519.git.lorenzo.stoakes@oracle.com
Link: https://lkml.kernel.org/r/4dfe60f1419d55e5d0516f56349695d73a57184c.1738326519.git.lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Diffstat (limited to 'mm/vma.c')
-rw-r--r--	mm/vma.c	166
1 file changed, 86 insertions(+), 80 deletions(-)
@@ -52,7 +52,7 @@ struct mmap_state {
 		.pgoff = (map_)->pgoff,					\
 		.file = (map_)->file,					\
 		.prev = (map_)->prev,					\
-		.vma = vma_,						\
+		.middle = vma_,						\
 		.next = (vma_) ? NULL : (map_)->next,			\
 		.state = VMA_MERGE_START,				\
 		.merge_flags = VMG_FLAG_DEFAULT,			\
@@ -639,7 +639,7 @@ static int commit_merge(struct vma_merge_struct *vmg,
 {
 	struct vma_prepare vp;
 
-	init_multi_vma_prep(&vp, vmg->vma, adjust, remove, remove2);
+	init_multi_vma_prep(&vp, vmg->target, adjust, remove, remove2);
 
 	VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
 		   vp.anon_vma != adjust->anon_vma);
@@ -652,15 +652,15 @@ static int commit_merge(struct vma_merge_struct *vmg,
 			       adjust->vm_end);
 	}
 
-	if (vma_iter_prealloc(vmg->vmi, vmg->vma))
+	if (vma_iter_prealloc(vmg->vmi, vmg->target))
 		return -ENOMEM;
 
 	vma_prepare(&vp);
-	vma_adjust_trans_huge(vmg->vma, vmg->start, vmg->end, adj_start);
-	vma_set_range(vmg->vma, vmg->start, vmg->end, vmg->pgoff);
+	vma_adjust_trans_huge(vmg->target, vmg->start, vmg->end, adj_start);
+	vma_set_range(vmg->target, vmg->start, vmg->end, vmg->pgoff);
 
 	if (expanded)
-		vma_iter_store(vmg->vmi, vmg->vma);
+		vma_iter_store(vmg->vmi, vmg->target);
 
 	if (adj_start) {
 		adjust->vm_start += adj_start;
@@ -671,7 +671,7 @@ static int commit_merge(struct vma_merge_struct *vmg,
 		}
 	}
 
-	vma_complete(&vp, vmg->vmi, vmg->vma->vm_mm);
+	vma_complete(&vp, vmg->vmi, vmg->target->vm_mm);
 
 	return 0;
 }
@@ -694,8 +694,9 @@ static bool can_merge_remove_vma(struct vm_area_struct *vma)
  * identical properties.
  *
  * This function checks for the existence of any such mergeable VMAs and updates
- * the maple tree describing the @vmg->vma->vm_mm address space to account for
- * this, as well as any VMAs shrunk/expanded/deleted as a result of this merge.
+ * the maple tree describing the @vmg->middle->vm_mm address space to account
+ * for this, as well as any VMAs shrunk/expanded/deleted as a result of this
+ * merge.
  *
  * As part of this operation, if a merge occurs, the @vmg object will have its
  * vma, start, end, and pgoff fields modified to execute the merge. Subsequent
@@ -704,45 +705,47 @@ static bool can_merge_remove_vma(struct vm_area_struct *vma)
  * Returns: The merged VMA if merge succeeds, or NULL otherwise.
  *
  * ASSUMPTIONS:
- * - The caller must assign the VMA to be modifed to @vmg->vma.
+ * - The caller must assign the VMA to be modifed to @vmg->middle.
  * - The caller must have set @vmg->prev to the previous VMA, if there is one.
  * - The caller must not set @vmg->next, as we determine this.
 * - The caller must hold a WRITE lock on the mm_struct->mmap_lock.
- * - vmi must be positioned within [@vmg->vma->vm_start, @vmg->vma->vm_end).
+ * - vmi must be positioned within [@vmg->middle->vm_start, @vmg->middle->vm_end).
  */
 static __must_check struct vm_area_struct *vma_merge_existing_range(
 		struct vma_merge_struct *vmg)
 {
-	struct vm_area_struct *vma = vmg->vma;
+	struct vm_area_struct *middle = vmg->middle;
 	struct vm_area_struct *prev = vmg->prev;
 	struct vm_area_struct *next, *res;
 	struct vm_area_struct *anon_dup = NULL;
 	struct vm_area_struct *adjust = NULL;
 	unsigned long start = vmg->start;
 	unsigned long end = vmg->end;
-	bool left_side = vma && start == vma->vm_start;
-	bool right_side = vma && end == vma->vm_end;
+	bool left_side = middle && start == middle->vm_start;
+	bool right_side = middle && end == middle->vm_end;
 	int err = 0;
 	long adj_start = 0;
-	bool merge_will_delete_vma, merge_will_delete_next;
+	bool merge_will_delete_middle, merge_will_delete_next;
 	bool merge_left, merge_right, merge_both;
 	bool expanded;
 
 	mmap_assert_write_locked(vmg->mm);
-	VM_WARN_ON_VMG(!vma, vmg); /* We are modifying a VMA, so caller must specify. */
+	VM_WARN_ON_VMG(!middle, vmg); /* We are modifying a VMA, so caller must specify. */
 	VM_WARN_ON_VMG(vmg->next, vmg); /* We set this. */
 	VM_WARN_ON_VMG(prev && start <= prev->vm_start, vmg);
 	VM_WARN_ON_VMG(start >= end, vmg);
 
 	/*
-	 * If vma == prev, then we are offset into a VMA. Otherwise, if we are
+	 * If middle == prev, then we are offset into a VMA. Otherwise, if we are
 	 * not, we must span a portion of the VMA.
 	 */
-	VM_WARN_ON_VMG(vma && ((vma != prev && vmg->start != vma->vm_start) ||
-			vmg->end > vma->vm_end), vmg);
-	/* The vmi must be positioned within vmg->vma. */
-	VM_WARN_ON_VMG(vma && !(vma_iter_addr(vmg->vmi) >= vma->vm_start &&
-			vma_iter_addr(vmg->vmi) < vma->vm_end), vmg);
+	VM_WARN_ON_VMG(middle &&
+		       ((middle != prev && vmg->start != middle->vm_start) ||
+			vmg->end > middle->vm_end), vmg);
+	/* The vmi must be positioned within vmg->middle. */
+	VM_WARN_ON_VMG(middle &&
+		       !(vma_iter_addr(vmg->vmi) >= middle->vm_start &&
			 vma_iter_addr(vmg->vmi) < middle->vm_end), vmg);
 
 	vmg->state = VMA_MERGE_NOMERGE;
@@ -776,13 +779,13 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
 	merge_both = merge_left && merge_right;
 
 	/* If we span the entire VMA, a merge implies it will be deleted. */
-	merge_will_delete_vma = left_side && right_side;
+	merge_will_delete_middle = left_side && right_side;
 
 	/*
-	 * If we need to remove vma in its entirety but are unable to do so,
+	 * If we need to remove middle in its entirety but are unable to do so,
 	 * we have no sensible recourse but to abort the merge.
 	 */
-	if (merge_will_delete_vma && !can_merge_remove_vma(vma))
+	if (merge_will_delete_middle && !can_merge_remove_vma(middle))
 		return NULL;
 
 	/*
@@ -793,7 +796,7 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
 
 	/*
 	 * If we cannot delete next, then we can reduce the operation to merging
-	 * prev and vma (thereby deleting vma).
+	 * prev and middle (thereby deleting middle).
 	 */
 	if (merge_will_delete_next && !can_merge_remove_vma(next)) {
 		merge_will_delete_next = false;
@@ -801,8 +804,8 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
 		merge_both = false;
 	}
 
-	/* No matter what happens, we will be adjusting vma. */
-	vma_start_write(vma);
+	/* No matter what happens, we will be adjusting middle. */
+	vma_start_write(middle);
 
 	if (merge_left)
 		vma_start_write(prev);
@@ -812,13 +815,13 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
 
 	if (merge_both) {
 		/*
-		 *   |<----->|
-		 * |-------*********-------|
-		 *   prev     vma     next
-		 *  extend   delete  delete
+		 * |<-------------------->|
+		 * |-------********-------|
+		 *   prev   middle   next
+		 *  extend  delete  delete
 		 */
-		vmg->vma = prev;
+		vmg->target = prev;
 		vmg->start = prev->vm_start;
 		vmg->end = next->vm_end;
 		vmg->pgoff = prev->vm_pgoff;
@@ -826,78 +829,79 @@ static __must_check struct vm_area_struct *vma_merge_existing_range(
 		/*
 		 * We already ensured anon_vma compatibility above, so now it's
 		 * simply a case of, if prev has no anon_vma object, which of
-		 * next or vma contains the anon_vma we must duplicate.
+		 * next or middle contains the anon_vma we must duplicate.
 		 */
-		err = dup_anon_vma(prev, next->anon_vma ? next : vma, &anon_dup);
+		err = dup_anon_vma(prev, next->anon_vma ? next : middle,
+				   &anon_dup);
 	} else if (merge_left) {
 		/*
-		 *   |<----->| OR
-		 *   |<--------->|
+		 * |<------------>| OR
+		 * |<----------------->|
 		 * |-------*************
-		 *   prev     vma
+		 *   prev    middle
 		 *  extend shrink/delete
 		 */
-		vmg->vma = prev;
+		vmg->target = prev;
 		vmg->start = prev->vm_start;
 		vmg->pgoff = prev->vm_pgoff;
 
-		if (!merge_will_delete_vma) {
-			adjust = vma;
-			adj_start = vmg->end - vma->vm_start;
+		if (!merge_will_delete_middle) {
+			adjust = middle;
+			adj_start = vmg->end - middle->vm_start;
 		}
 
-		err = dup_anon_vma(prev, vma, &anon_dup);
+		err = dup_anon_vma(prev, middle, &anon_dup);
 	} else { /* merge_right */
 		/*
-		 *     |<----->| OR
-		 * |<--------->|
+		 *     |<------------->| OR
+		 * |<----------------->|
 		 * *************-------|
-		 *      vma       next
+		 *    middle     next
 		 * shrink/delete extend
 		 */
 		pgoff_t pglen = PHYS_PFN(vmg->end - vmg->start);
 
 		VM_WARN_ON_VMG(!merge_right, vmg);
-		/* If we are offset into a VMA, then prev must be vma. */
-		VM_WARN_ON_VMG(vmg->start > vma->vm_start && prev && vma != prev, vmg);
+		/* If we are offset into a VMA, then prev must be middle. */
+		VM_WARN_ON_VMG(vmg->start > middle->vm_start && prev && middle != prev, vmg);
 
-		if (merge_will_delete_vma) {
-			vmg->vma = next;
+		if (merge_will_delete_middle) {
+			vmg->target = next;
 			vmg->end = next->vm_end;
 			vmg->pgoff = next->vm_pgoff - pglen;
 		} else {
 			/*
-			 * We shrink vma and expand next.
+			 * We shrink middle and expand next.
 			 *
 			 * IMPORTANT: This is the ONLY case where the final
-			 * merged VMA is NOT vmg->vma, but rather vmg->next.
+			 * merged VMA is NOT vmg->target, but rather vmg->next.
 			 */
-
-			vmg->start = vma->vm_start;
+			vmg->target = middle;
+			vmg->start = middle->vm_start;
 			vmg->end = start;
-			vmg->pgoff = vma->vm_pgoff;
+			vmg->pgoff = middle->vm_pgoff;
 
 			adjust = next;
-			adj_start = -(vma->vm_end - start);
+			adj_start = -(middle->vm_end - start);
 		}
 
-		err = dup_anon_vma(next, vma, &anon_dup);
+		err = dup_anon_vma(next, middle, &anon_dup);
 	}
 
 	if (err)
 		goto abort;
 
 	/*
-	 * In nearly all cases, we expand vmg->vma. There is one exception -
+	 * In nearly all cases, we expand vmg->middle. There is one exception -
 	 * merge_right where we partially span the VMA. In this case we shrink
-	 * the end of vmg->vma and adjust the start of vmg->next accordingly.
+	 * the end of vmg->middle and adjust the start of vmg->next accordingly.
 	 */
-	expanded = !merge_right || merge_will_delete_vma;
+	expanded = !merge_right || merge_will_delete_middle;
 
 	if (commit_merge(vmg, adjust,
-			 merge_will_delete_vma ? vma : NULL,
+			 merge_will_delete_middle ? middle : NULL,
 			 merge_will_delete_next ? next : NULL,
 			 adj_start, expanded)) {
 		if (anon_dup)
@@ -973,7 +977,7 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
 	bool just_expand = vmg->merge_flags & VMG_FLAG_JUST_EXPAND;
 
 	mmap_assert_write_locked(vmg->mm);
-	VM_WARN_ON_VMG(vmg->vma, vmg);
+	VM_WARN_ON_VMG(vmg->middle, vmg);
 	/* vmi must point at or before the gap. */
 	VM_WARN_ON_VMG(vma_iter_addr(vmg->vmi) > end, vmg);
 
@@ -989,13 +993,13 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
 	/* If we can merge with the next VMA, adjust vmg accordingly. */
 	if (can_merge_right) {
 		vmg->end = next->vm_end;
-		vmg->vma = next;
+		vmg->middle = next;
 	}
 
 	/* If we can merge with the previous VMA, adjust vmg accordingly. */
 	if (can_merge_left) {
 		vmg->start = prev->vm_start;
-		vmg->vma = prev;
+		vmg->middle = prev;
 		vmg->pgoff = prev->vm_pgoff;
 
 		/*
@@ -1017,10 +1021,10 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
 	 * Now try to expand adjacent VMA(s). This takes care of removing the
 	 * following VMA if we have VMAs on both sides.
 	 */
-	if (vmg->vma && !vma_expand(vmg)) {
-		khugepaged_enter_vma(vmg->vma, vmg->flags);
+	if (vmg->middle && !vma_expand(vmg)) {
+		khugepaged_enter_vma(vmg->middle, vmg->flags);
 		vmg->state = VMA_MERGE_SUCCESS;
-		return vmg->vma;
+		return vmg->middle;
 	}
 
 	return NULL;
@@ -1032,44 +1036,46 @@ struct vm_area_struct *vma_merge_new_range(struct vma_merge_struct *vmg)
  * @vmg: Describes a VMA expansion operation.
 *
 * Expand @vma to vmg->start and vmg->end. Can expand off the start and end.
- * Will expand over vmg->next if it's different from vmg->vma and vmg->end ==
- * vmg->next->vm_end. Checking if the vmg->vma can expand and merge with
+ * Will expand over vmg->next if it's different from vmg->middle and vmg->end ==
+ * vmg->next->vm_end. Checking if the vmg->middle can expand and merge with
 * vmg->next needs to be handled by the caller.
 *
 * Returns: 0 on success.
 *
 * ASSUMPTIONS:
- * - The caller must hold a WRITE lock on vmg->vma->mm->mmap_lock.
- * - The caller must have set @vmg->vma and @vmg->next.
+ * - The caller must hold a WRITE lock on vmg->middle->mm->mmap_lock.
+ * - The caller must have set @vmg->middle and @vmg->next.
 */
 int vma_expand(struct vma_merge_struct *vmg)
 {
 	struct vm_area_struct *anon_dup = NULL;
 	bool remove_next = false;
-	struct vm_area_struct *vma = vmg->vma;
+	struct vm_area_struct *middle = vmg->middle;
 	struct vm_area_struct *next = vmg->next;
 
 	mmap_assert_write_locked(vmg->mm);
 
-	vma_start_write(vma);
-	if (next && (vma != next) && (vmg->end == next->vm_end)) {
+	vma_start_write(middle);
+	if (next && (middle != next) && (vmg->end == next->vm_end)) {
 		int ret;
 
 		remove_next = true;
 		/* This should already have been checked by this point. */
 		VM_WARN_ON_VMG(!can_merge_remove_vma(next), vmg);
 		vma_start_write(next);
-		ret = dup_anon_vma(vma, next, &anon_dup);
+		ret = dup_anon_vma(middle, next, &anon_dup);
 		if (ret)
 			return ret;
 	}
 
 	/* Not merging but overwriting any part of next is not handled. */
 	VM_WARN_ON_VMG(next && !remove_next &&
-		       next != vma && vmg->end > next->vm_start, vmg);
+		       next != middle && vmg->end > next->vm_start, vmg);
 	/* Only handles expanding */
-	VM_WARN_ON_VMG(vma->vm_start < vmg->start || vma->vm_end > vmg->end, vmg);
+	VM_WARN_ON_VMG(middle->vm_start < vmg->start ||
+		       middle->vm_end > vmg->end, vmg);
 
+	vmg->target = middle;
 	if (commit_merge(vmg, NULL, remove_next ? next : NULL, NULL, 0, true))
 		goto nomem;
@@ -1508,7 +1514,7 @@ int do_vmi_munmap(struct vma_iterator *vmi, struct mm_struct *mm,
  */
 static struct vm_area_struct *vma_modify(struct vma_merge_struct *vmg)
 {
-	struct vm_area_struct *vma = vmg->vma;
+	struct vm_area_struct *vma = vmg->middle;
 	unsigned long start = vmg->start;
 	unsigned long end = vmg->end;
 	struct vm_area_struct *merged;
 
@@ -1609,7 +1615,7 @@ struct vm_area_struct *vma_merge_extend(struct vma_iterator *vmi,
 	VMG_VMA_STATE(vmg, vmi, vma, vma, vma->vm_end, vma->vm_end + delta);
 
 	vmg.next = vma_iter_next_rewind(vmi, NULL);
-	vmg.vma = NULL; /* We use the VMA to populate VMG fields only. */
+	vmg.middle = NULL; /* We use the VMA to populate VMG fields only. */
 
 	return vma_merge_new_range(&vmg);
 }
@@ -1730,7 +1736,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap,
 		if (new_vma && new_vma->vm_start < addr + len)
 			return NULL;	/* should never get here */
 
-		vmg.vma = NULL; /* New VMA range. */
+		vmg.middle = NULL; /* New VMA range. */
 		vmg.pgoff = pgoff;
 		vmg.next = vma_iter_next_rewind(&vmi, NULL);
 		new_vma = vma_merge_new_range(&vmg);