[PATCH -mm] mm: more likely reclaim MADV_SEQUENTIAL mappings

Linux Kernel Mailing List, post #232,059
Author:  Johannes Weiner
Date:    2008-07-19 19:31:49
Subject: [PATCH -mm] mm: more likely reclaim MADV_SEQUENTIAL mappings

File pages accessed only once through sequential-read mappings between
fault and scan time are perfect candidates for reclaim.

This patch makes page_referenced() ignore these singular references and
the pages stay on the inactive list where they likely fall victim to the
next reclaim phase.

Already activated pages are still treated normally. If they were
accessed multiple times and therefore promoted to the active list, we
probably want to keep them.

Benchmarks show that big (relative to the system's memory)
MADV_SEQUENTIAL mappings read sequentially cause much less kernel
activity, especially less LRU moving-around, because we never activate
read-once pages in the first place just to demote them again.

And leaving these perfect reclaim candidates on the inactive list makes
it more likely for the real working set to survive the next reclaim
scan.

Signed-off-by: Johannes Weiner <[email protected]>
Signed-off-by: Rik van Riel <[email protected]>
---
mm/rmap.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)

Benchmark graphs and the test-application can be found here:

http://hannes.saeurebad.de/madvseq/

Patch is against -mm, although only tested on good ol' linus-tree as
-mmotm wouldn't compile at the moment.
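
For readers who want the rule in isolation, here is a small user-space
model of the decision the hunk in page_referenced_one() makes. It is
only a sketch, not kernel code: struct model_pte_state and
model_count_reference() are hypothetical names invented for
illustration, and the model leaves out the ClearPageReferenced() side
effect and the swap-token special case.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the state the patched code inspects. */
struct model_pte_state {
	bool young;		/* pte accessed bit was set when scanned */
	bool seq_hint;		/* mapping carries the MADV_SEQUENTIAL hint */
	bool page_active;	/* page already sits on the active list */
};

/* Returns how much this mapping adds to the reference count: 0 or 1. */
int model_count_reference(const struct model_pte_state *s)
{
	if (!s->young)
		return 0;
	/* A lone access through a sequential mapping is ignored. */
	if (s->seq_hint && !s->page_active)
		return 0;
	return 1;
}
```

A young pte in a sequential mapping counts only when the page is
already active; every other young pte counts as before.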

--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -333,8 +333,18 @@ static int page_referenced_one(struct pa
 		goto out_unmap;
 	}

-	if (ptep_clear_flush_young_notify(vma, address, pte))
-		referenced++;
+	if (ptep_clear_flush_young_notify(vma, address, pte)) {
+		/*
+		 * If there was just one sequential access to the
+		 * page, ignore it. Otherwise, mark_page_accessed()
+		 * will have promoted the page to the active list and
+		 * it should be kept.
+		 */
+		if (VM_SequentialReadHint(vma) && !PageActive(page))
+			ClearPageReferenced(page);
+		else
+			referenced++;
+	}

 	/* Pretend the page is referenced if the task has the
 	   swap token and is in the middle of a page fault. */
@@ -455,9 +465,6 @@ int page_referenced(struct page *page, i
 {
 	int referenced = 0;

-	if (TestClearPageReferenced(page))
-		referenced++;
-
 	if (page_mapped(page) && page->mapping) {
 		if (PageAnon(page))
 			referenced += page_referenced_anon(page, mem_cont);
@@ -473,6 +480,9 @@ int page_referenced(struct page *page, i
 		}
 	}

+	if (TestClearPageReferenced(page))
+		referenced++;
+
 	if (page_test_and_clear_young(page))
 		referenced++;

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Author:  Rik van Riel
Date:    2008-07-19 13:59:30
Subject: Re: [PATCH -mm] mm: more likely reclaim MADV_SEQUENTIAL mappings

On Sat, 19 Jul 2008 19:31:49 +0200
Johannes Weiner <[email protected]> wrote:

> File pages accessed only once through sequential-read mappings between
> fault and scan time are perfect candidates for reclaim.
>
> This patch makes page_referenced() ignore these singular references and
> the pages stay on the inactive list where they likely fall victim to the
> next reclaim phase.

Which is exactly what the madvise man page says about pages in
MADV_SEQUENTIAL ranges. Yay.

MADV_SEQUENTIAL
Expect page references in sequential order. (Hence, pages in
the given range can be aggressively read ahead, and may be freed
soon after they are accessed.)
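
From user space, the access pattern under discussion looks roughly like
the sketch below: map a file, give the MADV_SEQUENTIAL hint, and touch
each page once. sum_file_sequential() is a hypothetical helper written
for illustration, not the test application from the patch mail.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a file, hint sequential access, and read
 * every byte exactly once. Returns 0 on success, -1 on error.
 */
int sum_file_sequential(const char *path, unsigned long *sum)
{
	struct stat st;
	unsigned char *map;
	size_t i, len;
	int fd;

	*sum = 0;
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return -1;
	}
	len = (size_t)st.st_size;
	if (len == 0) {		/* mmap() rejects zero-length mappings */
		close(fd);
		return 0;
	}
	map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd);		/* the mapping stays valid after close() */
	if (map == MAP_FAILED)
		return -1;

	/* Advisory only: a failure here does not affect correctness. */
	(void)madvise(map, len, MADV_SEQUENTIAL);

	for (i = 0; i < len; i++)
		*sum += map[i];

	munmap(map, len);
	return 0;
}
```

The hint changes nothing about what the program computes; it only tells
the kernel that pages behind the read position are unlikely to be
needed again.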

--
All rights reversed.
Author:  KOSAKI Motohiro
Date:    2008-07-21 09:09:26
Subject: Re: [PATCH -mm] mm: more likely reclaim MADV_SEQUENTIAL mappings

Hi Johannes,

> File pages accessed only once through sequential-read mappings between
> fault and scan time are perfect candidates for reclaim.
>
> This patch makes page_referenced() ignore these singular references and
> the pages stay on the inactive list where they likely fall victim to the
> next reclaim phase.
>
> Already activated pages are still treated normally. If they were
> accessed multiple times and therefore promoted to the active list, we
> probably want to keep them.
>
> Benchmarks show that big (relative to the system's memory)
> MADV_SEQUENTIAL mappings read sequentially cause much less kernel
> activity, especially less LRU moving-around, because we never activate
> read-once pages in the first place just to demote them again.
>
> And leaving these perfect reclaim candidates on the inactive list makes
> it more likely for the real working set to survive the next reclaim
> scan.

Looks good to me.
Actually, I made a similar patch half a year ago.

In my experience,
- page_referenced_one() is a performance-critical point,
  so you should run some benchmarks.
- That patch improved mmapped-copy performance by about 5%.
  (Of course, you should test on current -mm; the MM code has changed widely.)

So, I'm looking forward to your test results :)
Author:  Andrew Morton
Date:    2008-07-20 18:48:43
Subject: Re: [PATCH -mm] mm: more likely reclaim MADV_SEQUENTIAL mappings

On Mon, 21 Jul 2008 09:09:26 +0900 "KOSAKI Motohiro" <[email protected]> wrote:

> Hi Johannes,
>
> > File pages accessed only once through sequential-read mappings between
> > fault and scan time are perfect candidates for reclaim.
> >
> > This patch makes page_referenced() ignore these singular references and
> > the pages stay on the inactive list where they likely fall victim to the
> > next reclaim phase.
> >
> > Already activated pages are still treated normally. If they were
> > accessed multiple times and therefore promoted to the active list, we
> > probably want to keep them.
> >
> > Benchmarks show that big (relative to the system's memory)
> > MADV_SEQUENTIAL mappings read sequentially cause much less kernel
> > activity, especially less LRU moving-around, because we never activate
> > read-once pages in the first place just to demote them again.
> >
> > And leaving these perfect reclaim candidates on the inactive list makes
> > it more likely for the real working set to survive the next reclaim
> > scan.
>
> Looks good to me.
> Actually, I made a similar patch half a year ago.
>
> In my experience,
> - page_referenced_one() is a performance-critical point,
>   so you should run some benchmarks.
> - That patch improved mmapped-copy performance by about 5%.
>   (Of course, you should test on current -mm; the MM code has changed widely.)
>
> So, I'm looking forward to your test results :)

The change seems logical and I queued it for 2.6.28.

But yes, testing for what-does-this-improve is good and useful, but so
is testing for what-does-this-worsen. How do we do that in this case?

Author:  KOSAKI Motohiro
Date:    2008-07-21 12:53:23
Subject: Re: [PATCH -mm] mm: more likely reclaim MADV_SEQUENTIAL mappings

Hi Andrew,

>> In my experience,
>> - page_referenced_one() is a performance-critical point,
>>   so you should run some benchmarks.
>> - That patch improved mmapped-copy performance by about 5%.
>>   (Of course, you should test on current -mm; the MM code has changed widely.)
>>
>> So, I'm looking forward to your test results :)
>
> The change seems logical and I queued it for 2.6.28.

Great.

> But yes, testing for what-does-this-improve is good and useful, but so
> is testing for what-does-this-worsen. How do we do that in this case?

In general, page_referenced_one() is important for reclaim throughput.
If a bad change to page_referenced_one() goes in,
system reclaim throughput slows down.

Of course, I don't think this patch causes a performance regression :-)

So, any benchmark under a memcgroup memory restriction is a good choice.

btw:
Maybe I will be able to post mmapped-copy improvement measurements for
Johannes's patch after OLS.
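
For reference, a mapped copy of the kind mentioned above could be
sketched as follows. This is an assumption about what such a benchmark
does, not the actual test program: mmap_copy() is a hypothetical helper,
and any timing or memcgroup limit would be set up outside of it.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy src to dst through two file mappings.
 * src and dst must be different files. Returns 0 on success, -1 on error.
 */
int mmap_copy(const char *src, const char *dst)
{
	struct stat st;
	void *in = MAP_FAILED, *out = MAP_FAILED;
	size_t len = 0;
	int ret = -1;
	int sfd, dfd;

	sfd = open(src, O_RDONLY);
	if (sfd < 0)
		return -1;
	dfd = open(dst, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (dfd < 0)
		goto out_close_src;
	if (fstat(sfd, &st) < 0)
		goto out_close;
	len = (size_t)st.st_size;
	if (len == 0) {		/* nothing to map or copy */
		ret = 0;
		goto out_close;
	}
	/* A shared mapping cannot grow the file, so size dst first. */
	if (ftruncate(dfd, st.st_size) < 0)
		goto out_close;

	in = mmap(NULL, len, PROT_READ, MAP_PRIVATE, sfd, 0);
	if (in == MAP_FAILED)
		goto out_close;
	out = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, dfd, 0);
	if (out == MAP_FAILED)
		goto out_unmap_in;

	/* Both mappings are walked front to back exactly once. */
	(void)madvise(in, len, MADV_SEQUENTIAL);
	(void)madvise(out, len, MADV_SEQUENTIAL);

	memcpy(out, in, len);
	ret = 0;

	munmap(out, len);
out_unmap_in:
	munmap(in, len);
out_close:
	close(dfd);
out_close_src:
	close(sfd);
	return ret;
}
```

Running this on a file that is large relative to available memory, once
with and once without the patch, would be one way to compare reclaim
behaviour.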