James Robinson | 1 Oct 2009 01:56

Re: [Memory] in TCMalloc, more careful handling of VirtualAlloc commit via SystemAlloc

On Wed, Sep 30, 2009 at 2:28 PM, James Robinson <jamesr <at> google.com> wrote:
On Wed, Sep 30, 2009 at 11:29 AM, Anton Muhin <antonm <at> chromium.org> wrote:
On Wed, Sep 30, 2009 at 10:27 PM, Mike Belshe <mbelshe <at> google.com> wrote:
>
>
> On Wed, Sep 30, 2009 at 11:24 AM, Anton Muhin <antonm <at> chromium.org> wrote:
>>
>> On Wed, Sep 30, 2009 at 10:17 PM, Mike Belshe <mbelshe <at> google.com> wrote:
>> > On Wed, Sep 30, 2009 at 11:05 AM, Anton Muhin <antonm <at> chromium.org>
>> > wrote:
>> >>
>> >> On Wed, Sep 30, 2009 at 9:58 PM, Mike Belshe <mbelshe <at> google.com>
>> >> wrote:
>> >> > On Wed, Sep 30, 2009 at 10:48 AM, Anton Muhin <antonm <at> google.com>
>> >> > wrote:
>> >> >>
>> >> >> On Wed, Sep 30, 2009 at 9:39 PM, Jim Roskind <jar <at> google.com> wrote:
>> >> >> > If you're not interested in TCMalloc customization for Chromium,
>> >> >> > you should stop reading now.
>> >> >> > This post is meant to gather some discussion on a topic before I
>> >> >> > code and land a change.
>> >> >> > MOTIVATION
>> >> >> > We believe poor memory utilization is at the heart of a lot of
>> >> >> > jank problems.  Such problems may be difficult to repro in short
>> >> >> > controlled benchmarks, but our users are telling us we have
>> >> >> > problems, so we know we have problems.  As a result, we need to be
>> >> >> > more conservative in memory utilization and handling.
>> >> >> > SUMMARY OF CHANGE
>> >> >> > I'm thinking of changing our TCMalloc so that when a span is freed
>> >> >> > into TCMalloc's free list and gets coalesced with an adjacent span
>> >> >> > that is already decommitted, the coalesced span is entirely
>> >> >> > decommitted (as opposed to our current customized behavior of
>> >> >> > committing the entire span).
>> >> >> > This proposed policy was put in place previously by Mike, but
>> >> >> > (reportedly) caused a 3-5% perf regression in V8.  I believe AntonM
>> >> >> > changed that policy to what we have currently, where we always
>> >> >> > ensure full commitment of a coalesced span (regaining V8
>> >> >> > performance on a benchmark).
>> >> >>
>> >> >> An immediate question and a plea.  Question: how can we estimate
>> >> >> the performance implications of the change?  Yes, we have some
>> >> >> internal benchmarks which could be used for that (they release
>> >> >> memory heavily).  Anything else?
>> >> >>
>> >> >> Plea: please, do not regress DOM performance unless there are really
>> >> >> compelling reasons.  And even in this case :)
>> >> >
>> >> > Anton -
>> >> > All evidence from user complaints and bug reports is that Chrome uses
>> >> > too much memory.  If you load Chrome on a 1GB system, you can feel it
>> >> > yourself.  Unfortunately, we have yet to build a reliable swapping
>> >> > benchmark.  By allowing tcmalloc to accumulate large chunks of unused
>> >> > pages, we increase the chance that paging will occur on the system.
>> >> > But because paging is a system-wide activity, it can hit our various
>> >> > processes in unpredictable ways - and this leads to jank.  I think
>> >> > the jank is worse than the benchmark win.
>> >> > I wish we had a better way to quantify the damage caused by paging.
>> >> > Jim and others are working on that.
>> >> > But it's clear to me that we're just being a memory pig for what is
>> >> > really a modest gain on a semi-obscure benchmark right now.  Using
>> >> > the current algorithms, we have literally multi-hundred megabyte
>> >> > memory usage swings in exchange for 3% on a benchmark.  Don't you
>> >> > agree this is the wrong tradeoff?  (The DOM benchmark grows to 500+MB
>> >> > right now; when you switch tabs it drops to <100MB.)  Other pages
>> >> > have been seen to behave similarly (loading the histograms page).
>> >> > We may be able to put in some algorithms which are more aware of the
>> >> > current available memory going forward, but I agree with Jim that
>> >> > there will be a lot of negative effects as long as we continue to
>> >> > have such large memory swings.
>> >>
>> >> Mike, I completely agree that we should reduce memory usage.  On
>> >> the other hand, speed has always been one of Chrome's trademarks.  My
>> >> feeling is that more committed pages in the free list make us faster
>> >> (but yes, there is paging etc.).  That's exactly the reason I asked
>> >> for some way to quantify the quality of different approaches, esp.
>> >> given the classic memory vs. speed dilemma; ideally (imho) both speed
>> >> and memory usage should be considered.
>> >
>> > The team is working on benchmarks.
>> > I think the evidence of paging is pretty overwhelming.
>> > Paging and jank are far worse than the small perf boost on DOM node
>> > creation.
>> > I don't believe the benchmark in question is a significant driver of
>> > primary performance.  Do you?
>>
>> To some extent.  Just to make it clear: I am not insisting, if
>> consensus is we should trade performance in DOM for reduced memory
>> usage in this case, that's fine.  I only want to have real numbers
>> before we make any decision.
>>
>> @pkasting: it wasn't 3%, it was closer to 8% (if memory serves).
>
> When I checked it in, my records show a 217 -> 210 benchmark drop, which
> is 3%.

My numbers were substantially bigger, but anyway we need to remeasure
it---there are too many factors.

I did some measurements on my Windows machine comparing the current behavior (always commit spans when merging them together) with a very conservative alternative (always decommit spans on ::Delete, including the just-released one).  The interesting bits are the benchmark scores and memory use at the end of the run.
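
To make the two policies concrete, here is a toy Python model of what happens when a just-freed span is merged with an adjacent free span.  It is only a sketch, not TCMalloc's code: the Span class, merge_free_spans, the policy names, and the 4KB page size are all assumptions made for illustration.  The sizes come from Jim's scenario quoted further down (a 32k chunk freed next to a 200Meg decommitted span).

```python
from dataclasses import dataclass

PAGE_KB = 4  # assumed page size, for illustration only


@dataclass
class Span:
    start: int       # first page number of the span
    length: int      # number of pages in the span
    committed: bool  # a span is entirely committed or entirely decommitted


def merge_free_spans(freed: Span, neighbor: Span, policy: str):
    """Coalesce a just-freed span with an adjacent free span.

    policy is "commit" (commit the whole merged span) or "decommit"
    (decommit the whole merged span).  Returns the merged span plus the
    number of pages that had to be newly committed and newly decommitted.
    """
    if policy not in ("commit", "decommit"):
        raise ValueError(f"unknown policy: {policy}")
    adjacent = (freed.start + freed.length == neighbor.start
                or neighbor.start + neighbor.length == freed.start)
    if not adjacent:
        raise ValueError("spans are not adjacent")

    want_committed = policy == "commit"
    newly_committed = 0
    newly_decommitted = 0
    for span in (freed, neighbor):
        if want_committed and not span.committed:
            newly_committed += span.length
        elif not want_committed and span.committed:
            newly_decommitted += span.length

    merged = Span(min(freed.start, neighbor.start),
                  freed.length + neighbor.length,
                  want_committed)
    return merged, newly_committed, newly_decommitted


# A 32KB chunk is freed next to a 200MB decommitted span.
chunk = Span(start=0, length=32 // PAGE_KB, committed=True)
big = Span(start=chunk.length, length=200 * 1024 // PAGE_KB, committed=False)

for policy in ("commit", "decommit"):
    merged, up, down = merge_free_spans(chunk, big, policy)
    print(f"{policy}: commits {up * PAGE_KB}KB, decommits {down * PAGE_KB}KB")
```

In this model the commit-on-merge policy has to commit 51,200 pages (the full 200MB at the assumed page size) just because a 32KB chunk was freed, while the decommit policy only gives up the 8 pages that were just released.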

For the DOM benchmark, the score regressed from an average over 4 runs of 188.25 to 185, which is <2%.  The peak memory is about the same, but the memory committed by the tab at the end of the run decreased from an average of 642MB to 57MB, which is a 91% reduction.  4 runs probably isn't enough to make a definitive statement about the perf impact, but I think the memory impact is pretty clear.  The memory characteristics of the V8 benchmark were unchanged, but the performance dropped from an average of 3009 to 2944, which is about 2%.  Sunspider did not change at all in either memory or performance.

Sorry, disregard those DOM numbers (I wasn't running the right test).

I re-ran Dromaeo's DOM Core test suite twice each, with and without the aggressive decommitting, and the numbers are:

r23768 unmodified:
scores: 299.36 run/s  302.47 run/s
memory footprint of renderer at end of run: 333,648KB 334,156KB

r23768 with decommitting:
scores: 296.06 run/s  293.88 run/s
memory footprint of renderer at end of run: 91,856KB 68,208KB

I think if the tradeoff is a <2% perf loss against 3-5x the memory use, it's better to get more conservative with our memory use first and then figure out how to earn back the perf without blowing the memory use sky-high again.  I think it's pretty clear we don't need all 200MB of extra committed memory in order to do 3 more runs per second.
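
For anyone who wants to check the arithmetic, here is a small Python sketch that averages the two runs of each configuration from the numbers above.  The variable and function names are made up for illustration, and two runs per configuration is of course a very small sample.

```python
from statistics import mean

# Dromaeo DOM Core scores (runs/s) and end-of-run renderer footprint (KB),
# copied from the r23768 measurements above.
baseline_scores = [299.36, 302.47]
decommit_scores = [296.06, 293.88]
baseline_mem_kb = [333_648, 334_156]
decommit_mem_kb = [91_856, 68_208]


def percent_drop(before: float, after: float) -> float:
    """Relative decrease from before to after, in percent."""
    return 100.0 * (before - after) / before


score_drop = percent_drop(mean(baseline_scores), mean(decommit_scores))
mem_drop = percent_drop(mean(baseline_mem_kb), mean(decommit_mem_kb))
mem_ratio = mean(baseline_mem_kb) / mean(decommit_mem_kb)

print(f"score drop:      {score_drop:.2f}%")
print(f"memory drop:     {mem_drop:.2f}%")
print(f"memory ratio:    {mem_ratio:.2f}x")
```

On these averages the score drops by just under 2%, while the end-of-run footprint shrinks by roughly three quarters (a bit over 4x).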

- James

yours,
anton.

>>
>> One thing I forgot, regarding the policy to decommit spans in ::Delete.
>> Please correct me if I'm wrong, but wouldn't that actually make all
>> the free spans decommitted---the span would only be committed when it
>> gets allocated, no?  Decommitting only if any of the adjacent spans is
>> decommitted may keep some spans committed, but it's difficult for me
>> to say how often.
>
> Oh - more work is still needed, yes :-)
>
> Mike
>
>>
>> yours,
>> anton.
>>
>> > Mike
>> >
>> >>
>> >> yours,
>> >> anton.
>> >>
>> >> > Mike
>> >> >
>> >> >
>> >> >
>> >> >>
>> >> >> > WHY CHANGE?
>> >> >> > The problematic scenario I'm anticipating (and may currently be
>> >> >> > burning us) is:
>> >> >> > a) A (renderer) process allocates a lot of memory, and achieves a
>> >> >> > significant high water mark of memory used.
>> >> >> > b) The process deallocates a lot of memory, and it flows into the
>> >> >> > TCMalloc free list. [We still have a lot of memory attributed to
>> >> >> > that process, and the app as a whole shows as using that memory.]
>> >> >> > c) We eventually decide to decommit a lot of our free memory.
>> >> >> > Currently this happens when we switch away from a tab. [This saves
>> >> >> > us from further swapping out the unused memory].
>> >> >> > Now comes the evil problem.
>> >> >> > d) We return to the tab which has a giant free list of spans, most
>> >> >> > of which are decommitted.  [The good news is that the memory is
>> >> >> > still decommitted]
>> >> >> > e) We allocate a block of memory, such as a 32k chunk.  This memory
>> >> >> > is pulled from a decommitted span, and ONLY the allocated chunk is
>> >> >> > committed. [That sounds good]
>> >> >> > f) We free the block of memory from (e).  Whatever span is
>> >> >> > adjacent to that block is committed <potential oops>.  Hence, if
>> >> >> > we took (e) from a 200Meg span, the act of freeing (e) will cause
>> >> >> > a 200Meg commitment!?!  This in turn would not only require
>> >> >> > touching (and having VirtualAlloc clear to zero) all allocated
>> >> >> > memory in the large span, it will also immediately put memory
>> >> >> > pressure on the OS, and force as much as 200Megs of other apps to
>> >> >> > be swapped out to disk :-(.
>> >> >>
>> >> >> I'm not sure about swapping unless you touch those now committed
>> >> >> pages, but only experiment will tell.
>> >> >>
>> >> >> > I'm wary that our recent fix, which allows spans to be (correctly)
>> >> >> > coalesced independent of their size, should make it easier to
>> >> >> > coalesce spans.  Worse yet, as we proceed to further optimize
>> >> >> > TCMalloc, one measure of success will be that the list of spans
>> >> >> > will be fragmented less and less, and we'll have larger and larger
>> >> >> > coalesced singular spans.  Any large "reserved" but not
>> >> >> > "committed" span will be a jank time-bomb waiting to blow up if
>> >> >> > the process ever allocates/frees from such a large span :-(.
>> >> >> >
>> >> >> > WHAT IS THE PLAN GOING FORWARD (or how can we do better, and
>> >> >> > regain performance, etc.)
>> >> >> > We have at least the following plausible alternative ways to move
>> >> >> > forward with TCMalloc.  The overall goal is to avoid wasteful
>> >> >> > decommits, and at the same time avoid heap-wide flailing between
>> >> >> > minimal and maximal span commitment states.
>> >> >> > Each free-span is currently the maximal contiguous region of
>> >> >> > memory that TCMalloc is controlling, but has been deallocated.
>> >> >> > Currently spans have to be totally committed, or totally
>> >> >> > decommitted.  There is no mixture supported.
>> >> >> > a) We could re-architect the span handling to allow spans to be
>> >> >> > combinations of committed and decommitted regions.
>> >> >> > b) We could vary our policy on what to do with a coalesced span,
>> >> >> > based on span size and memory pressure.  For example: We can
>> >> >> > consistently monitor the in-use vs free (but committed) ratio.  We
>> >> >> > can try to stay in some "acceptable" region by varying our policy.
>> >> >> > c) We could actually return to the OS some portions of spans that
>> >> >> > we have decommitted.  We could then let the OS give us back these
>> >> >> > regions if we need memory.  Until we get them back, we would not
>> >> >> > be at risk of doing unnecessary commits.  Decisions about when to
>> >> >> > return to the OS can be made based on span size and memory
>> >> >> > pressure.
>> >> >> > d) We can change the interval and forcing function for
>> >> >> > decommitting spans that are in our free list.
>> >> >> > In each of the above cases, we need benchmark data on user-class
>> >> >> > machines to show costs of these changes.  Until we understand the
>> >> >> > memory impact, we need to move forward conservatively in our
>> >> >> > action, and be vigilant for thrashing scenarios.
>> >> >> >
>> >> >> > Comments??
>> >> >>
>> >> >> For a closely related attempt, you may have a look at
>> >> >> http://codereview.chromium.org/256013/show
>> >> >>
>> >> >> That allows spans with a mix of committed/decommitted pages (but
>> >> >> only in the returned list), as committing seems to work fine if
>> >> >> some pages are already committed.
>> >> >>
>> >> >> That has some minor performance benefit, but I haven't investigated
>> >> >> it in detail yet.
>> >> >>
>> >> >> just my 2 cents,
>> >> >> anton.
>> >> >
>> >> >
>> >
>> >
>
>



--~--~---------~--~----~------------~-------~--~----~
Chromium Developers mailing list: chromium-dev <at> googlegroups.com
View archives, change email options, or unsubscribe:
    http://groups.google.com/group/chromium-dev

-~----------~----~----~----~------~----~------~--~---

