Lee Schermerhorn | 11 Nov 20:44 2010

[PATCH/RFC 0/8] numa - Migrate-on-Fault

[RFC] Migrate-on-fault a.k.a Lazy Page Migration

At the Linux Plumber's conference, Andi Kleen encouraged me again
to resubmit my automatic page migration patches because he thinks
they will be useful for virtualization.  Later, in the Virtualization
mini-conf, the subject came up during a presentation about adding
NUMA awareness to qemu/kvm.  After the presentation, I discussed
these series with Andrea Arcangeli and he also encouraged me to
post them.  My position within HP has changed such that I'm not
sure how much time I'll have to spend on this area nor whether I'll
have access to the larger NUMA platforms on which to test the
patches thoroughly.  However, here is the second of 4 series that
comprise my shared policy enhancements and lazy/auto-migration work.

I have rebased the patches against a recent mmotm tree.  This
rebase built cleanly, booted and passed a few ad hoc tests on
x86_64.  I've made a pass over the patch descriptions to update
them.  If there is sufficient interest in merging this, I'll
do what I can to assist in the completion and testing of the series.

Based atop the previously posted:

1) Shared policy cleanup, fixes, mapped file policy

To follow:

3)  Auto [as in "self"] migration facility
4)  a Migration Cache -- originally written by Marcelo Tosatti

I'll announce this series and the automatic/lazy migration series
to follow on lkml, linux-mm, ...  However, I'll limit the actual
posting to linux-numa to avoid spamming the other lists.


This series of patches implements page migration in the fault path.

!!! N.B., Need to consider interaction with KSM and Transparent Huge
!!! Pages.

The basic idea is that when a fault handler such as do_swap_page()
finds a cached page with zero mappings that is otherwise "stable"--
e.g., no I/O in progress--this is a good opportunity to check whether the
page resides on the node indicated by the mempolicy in the current context.

We only attempt to migrate when there are zero mappings because 1) we can
easily migrate the page--don't have to go through the effort of removing
all mappings and 2) default policy--a common case--can give different
answers from different tasks running on different nodes.  Checking the
policy when there are zero mappings effectively implements a "first touch"
placement policy.
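The decision logic above can be sketched as a small predicate.  This is
a simplified userspace model, not kernel code: the struct fields and the
function names (should_migrate_on_fault, policy_node) are illustrative
assumptions, and "default policy" is modeled as allocate-local.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a page for the fault-path check.
 * Field names are illustrative, not the kernel's struct page. */
struct fake_page {
	int node;	/* node the page currently resides on */
	int mapcount;	/* number of ptes mapping the page */
	bool writeback;	/* I/O in progress? */
};

/* Node the current context's mempolicy would choose.  With default
 * (local-allocation) policy this is simply the faulting task's node,
 * so different tasks on different nodes give different answers. */
static int policy_node(int task_node)
{
	return task_node;
}

/* Migrate only when the page is "stable" -- zero mappings, no I/O in
 * flight -- and does not reside on the node the policy indicates.
 * With zero mappings this behaves as a "first touch" policy. */
static bool should_migrate_on_fault(const struct fake_page *page,
				    int task_node)
{
	if (page->mapcount != 0)	/* still mapped: leave it alone */
		return false;
	if (page->writeback)		/* I/O in progress: not stable */
		return false;
	return page->node != policy_node(task_node);
}
```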

Note that this mechanism could be used to migrate page cache pages that
were read in earlier, are no longer referenced, but are about to be
used by a new task on a node other than the one where the page
resides.  The
same mechanism can be used to pull anon pages along with a task when
the load balancer decides to move it to another node.  However, that
will require a bit more mechanism, and is the subject of another
patch series.

The kernel's direct migration facility supports most of the
mechanism that is required to implement this "migration on fault".
Some changes were needed to the migratepage op functions to behave
appropriately when called from the fault path.  Then we need to add
the function[s] to test the current page in the fault path for zero
mapping, no writebacks, misplacement, ...; and the
function[s] to actually migrate the page contents to a newly
allocated page using the [modified] migratepage address space
operations of the direct migration mechanism.
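The migration step itself can be sketched in the same simplified
userspace model: allocate a replacement page on the policy's target
node, copy the contents (the analogue of invoking the [modified]
migratepage op from the fault path), and retire the old page.  All
names here (sim_page, alloc_page_on_node, migrate_misplaced) are
illustrative assumptions, not the kernel's API.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a page with a known home node. */
struct sim_page {
	int node;
	char data[64];
};

/* Analogue of allocating the new page on the target node. */
static struct sim_page *alloc_page_on_node(int node)
{
	struct sim_page *p = calloc(1, sizeof(*p));

	if (p)
		p->node = node;
	return p;
}

/* Analogue of the fault-path migration: copy the old page's contents
 * into a page on the target node and free the old one.  On allocation
 * failure, keep using the misplaced page rather than fail the fault. */
static struct sim_page *migrate_misplaced(struct sim_page *old, int target)
{
	struct sim_page *new = alloc_page_on_node(target);

	if (!new)
		return old;
	memcpy(new->data, old->data, sizeof(new->data));
	free(old);
	return new;
}
```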

This series used to include patches to migrate cached file pages and
shmem pages.  Testing with, e.g., kernel builds, showed a great deal
of thrashing of page cache pages, so those patches have been removed.
I think page replication would be a better approach for shared,
read-only pages.  Nick Piggin created such a patch quite a while back
and I had integrated it with the automigration series.  Those patches have
since gone stale.

Lee Schermerhorn