14 Feb 09:57 2011
Re: heap exhaustion ?
Nikodemus Siivola <nikodemus <at> random-state.net>
2011-02-14 08:57:34 GMT
On 14 February 2011 07:04, Jianshi Huang <jianshi.huang <at> gmail.com> wrote:

> Heap exhausted during garbage collection: 448 bytes available, 512 requested.
> Gen StaPg  UbSta LaSta LUbSt  Boxed Unboxed  LB LUB !move      Alloc     Waste       Trig    WP GCs Mem-age
>  0:     0      0     0     0      0       0   0   0     0          0         0    2000000     0   0  0.0000
>  1:     0      0     0     0      0       0   0   0     0          0         0    2000000     0   0  0.0000
>  2:     0      0     0     0      0       0   0   0     0          0         0    2000000     0   0  0.0000
>  3:     0      0     0     0      0       0   0   0     0          0         0    2000000     0   0  0.0000
>  4:     0      0     0     0      0       0   0   0     0          0         0    2000000     0   0  0.0000
>  5: 47086 198400     0     0 130093  907945 417 757     0 4113377392 143234960 3988343744     0  19  0.9691
>  6:     0      0     0     0  18880    3778   0   0     0   92807168         0    2000000 18630   0  0.0000
> Total bytes allocated = 8305371856
> Dynamic-space-size bytes = 4294901760
> GC control variables:
> *GC-INHIBIT* = true
> *GC-PENDING* = in progress
> *STOP-FOR-GC-PENDING* = false
> fatal error encountered in SBCL pid 29148(tid 140737209779968):
> Heap exhausted, game over.
>
> It seems SBCL try to allocated more than 8GB?! of memory and died
> since max heap size is 8GB. My program usually consumes 700MB~1GB for
> data, so others memory used should be consumed by garbages.
>
> My questions are:
>
> 1) What does " Dynamic-space-size bytes" mean here? It's about half
> the size of maximum heap

It is actually the size of the heap, but the fprintf statement in the
runtime appears to have %u where it should have %lu, so the value has
been truncated by accident. Argh. It does stand for the heap size,
though, so you can disregard the discrepancy.

> 2) My program processes data from streams so objects get created and
> collected constantly, is it the case that the youngest generations of
> heap kept expanding and finally exceeded the limit? If so, is there a
> way to compact/resize the generations?(a full GC?)

Youngest generations cannot expand without bounds: objects instead get
migrated to older generations when necessary.
From the heap map printed we can see that all data has been migrated to
the final untenured generation by the time the heap is exhausted. (The
final generation gets collected too, so that's not a problem.)

Without knowing more about your allocation patterns it's hard to hazard
a guess, but:

  (define-alien-routine print-generation-stats void)

and you can call (print-generation-stats) to print the heap map to the
stderr of the process at any time, so you can see what is happening --
are things slowly accumulating in older generations over the lifetime
of your application, or is this a sudden collapse, etc.

I'll start from the assumption that you're not accidentally accumulating
things in your application. :)

There are three major reasons for running out of memory through no
fault of your own:

1. Getting bitten by SBCL's conservatism (stack and registers.)

However, in a long-running application the effect of conservatism
typically isn't the issue: if it were the problem, it would not cause
slow accumulation of uncollected garbage, but would rather keep a
static amount of garbage uncollected.

So while the following should not apply to you, I'll put it here for
lurkers to read:

If your application produces large multipart structures (long lists,
trees of objects -- especially if those trees contain backpointers), it
can help to break links in those structures when they are no longer
needed.

The problem with "large multipart structures" is that there is a large
number of objects in the heap where conservatively retaining any of
them could cause many others to be retained as well.

Of course large flat structures like big SIMPLE-VECTORs aren't immune
to this either, but since they aren't preserved by interior pointers
they are no more likely to be retained by accident than a single CONS
cell ... and if you have a million elements in a list, you have a
million potential false positives, each on average responsible for
retaining 0.5 million other things.
So in practice only large structures composed of a great many smaller
parts are ever a problem. (Which is not to say they are automatically a
problem -- again, it really depends on the application.)

2. Doing *something* which causes SBCL to keep accumulating something
in its internals due to a known or unknown bug. For example:

  (defun foo ()
    (let ((name (gensym)))
      (setf (fdefinition name) #'foo)
      (fmakunbound name)))

  (loop (foo))

will eventually exhaust the heap: even though FMAKUNBOUND removes the
function binding, it leaves the *name* in SBCL's globaldb.

EQL-specializers and EQL-specialized methods in CLOS are another known
leak.

I can't think of other known issues off the top of my head, but maybe
there is an unknown one that is biting you?

3. Getting bitten by the generational trap.

Let's say you have a "cyclic" application -- a workload comes in, you
process it, then repeat from the start with another workload. Let's say
that processing a single workload involves on average 1 minor GC in
which large amounts of the data can be live.

These minor GCs initially only collect the nursery, promoting live
objects to generation 1 -- where uncollected garbage from earlier
cycles keeps accumulating till a collection is triggered for it as
well.

When this happens, *first* the nursery is collected into gen 1. Then
gen 1 is collected into gen 2. So now live objects from this cycle have
ended up in gen 2 -- where uncollected garbage slowly accumulates in
the same way till a collection is triggered for gen 2.

When this happens, first the nursery is collected into gen 1. Then gen
1 into gen 2. Then gen 2 into gen 3 ... so now live objects from this
cycle have ended up in gen 3, where they accumulate till an even deeper
collection is triggered.

This keeps going on till the final generation is reached. When that
happens, it is collected but not promoted, breaking the chain of
promotion.
Now, given a "bad" allocation pattern, it may be that you exhaust the
heap due to uncollected garbage in older generations before a
collection deep enough to collect that garbage is triggered.

Based on the description of your application, I suspect this may be
happening to you. In this case forcing a full collection every cycle
(or every few cycles) should help -- watching the
PRINT-GENERATION-STATS output should tell you if this is the case, and
how often you should force a full GC.

...

As for bad stuff your application (or a library you depend on) could be
doing:

A. a cache or memoization that keeps growing without bounds?

B. a HASH-TABLE that should be weak, but has misspecified its weakness
-- :VALUE when it should be :KEY, etc.

C. a HASH-TABLE created using a large :REHASH-SIZE -- this is virtually
always a bad idea, especially if the number is a float...

> 3) What's the best practice of memory management for long-run programs
> in sbcl? Do full GC periodically?

See above.

Cheers,

 -- Nikodemus
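
The advice above about forcing a full collection every cycle (or every
few cycles) could be sketched roughly as follows, assuming SBCL. The
names RUN-CYCLES, WORKLOADS and PROCESS-WORKLOAD, and the default
interval of 10, are hypothetical and not from the original message;
only SB-EXT:GC with :FULL T is an actual SBCL call. Treat it as a
starting point, not a definitive recipe.

```lisp
;; Sketch only: RUN-CYCLES and its arguments are hypothetical names.
;; PROCESS-WORKLOAD stands for whatever function handles one workload.
(defun run-cycles (workloads process-workload &key (full-gc-every 10))
  "Process each workload in turn, forcing a full GC every FULL-GC-EVERY
cycles. Returns the number of cycles processed."
  (let ((cycle 0))
    (dolist (workload workloads cycle)
      (funcall process-workload workload)
      (incf cycle)
      (when (zerop (mod cycle full-gc-every))
        ;; Collect all generations, so that garbage already promoted
        ;; into older generations is reclaimed before it fills the heap.
        (sb-ext:gc :full t)))))
```

Something like (run-cycles my-workloads #'my-handler :full-gc-every 5)
would then force a full GC after every fifth workload. Which interval is
appropriate depends on the application; comparing the
PRINT-GENERATION-STATS output for a few different intervals is probably
the easiest way to choose one.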