David Schleef | 8 Apr 02:54 2004

memory usage script

There's been some discussion on IRC lately about tracking the memory
usage of a process.  In particular, looking at any usage statistics
in /proc or ps or top is essentially useless, because those numbers
often don't correlate to anything we care about.

For the most part, we only care about "dirty pages"[0].  What is a
dirty page?  A dirty page is a page that is not backed by a file, or
a file-backed page whose contents have been changed from those of
the file.  The stack
is on dirty pages.  Memory allocated by malloc is typically on dirty
pages.  Mmap()ed files, such as buffers from filesrc, are not dirty.
Executable code from a library is not dirty.

There are, of course, exceptions to these rules.  *Untouched* memory
is never dirty, so if you malloc a gigabyte of memory, you'll only
cause a few dirty pages.  If you write to all that memory, you'll
have about 262,000 dirty pages (with the 4 kB pages used on i386).
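
The page count is simple arithmetic.  A minimal sketch, assuming the
4 kB page size that i386 normally uses (the helper name pages_for is
made up for illustration):

```perl
#!/usr/bin/perl -w
use strict;

# Assumption: 4 kB pages, the usual size on i386.
my $page_size = 4096;

# Hypothetical helper: number of pages needed to hold $bytes bytes,
# rounding up to a whole page.
sub pages_for {
	my ($bytes) = @_;
	return int(($bytes + $page_size - 1) / $page_size);
}

my $gigabyte = 1024 * 1024 * 1024;
print pages_for($gigabyte), "\n";   # prints 262144
```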

The problem is, Linux doesn't directly tell you which pages are
dirty.  However, you can often guess -- in well-written programs,
malloced areas are almost always dirty, for example.  Executable
code is almost never dirty.  But you still need some way to gather
statistics for pages you *can* tell apart.  Enter mem_usage, the
attached script.

Here's the output of './mem_usage 20536', which happens to be
rhythmbox (numbers are in kB):

Backed by file:
  Executable               16720
  RO data                  2748
  Data                     2160      <-- dirty
  Unknown                  0
Anonymous:
  Writable code (stack)    16932
  Data (malloc, mmap)      3312      <-- dirty
  RO data                  0
  Unreadable               1672
  Unknown                  0

This is with about 1500 songs loaded, after playing for about 20
minutes.  About 5 MB (the two rows marked above) is definitely dirty.
The process also has about 16.5 MB allocated for the stack -- a
significant percentage of this is untouched zero pages, and thus is
not dirty.  Only the parts of the stack that have been used are
dirty, which is typically very small.  Everything else is read-only.
(The Unreadable section contains guard pages, which turn stack
overflows into segfaults -- ignore it.)

So as it is now, after Benjamin's recent memleak fixes, there is about
5-6 MB of real memory usage.
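
The "definitely dirty" figure is just the sum of the two Data rows in
the rhythmbox output.  A small sketch of that arithmetic, with the
values copied from the output above (the variable names are only
illustrative):

```perl
#!/usr/bin/perl -w
use strict;

# Values in kB, copied from the rhythmbox output above.
my $file_backed_data = 2160;   # "Data" under "Backed by file"
my $anonymous_data   = 3312;   # "Data (malloc, mmap)"

# Both rows are writable data, so both are counted as dirty.
my $dirty_kb = $file_backed_data + $anonymous_data;
printf "%d kB (%.1f MB) definitely dirty\n", $dirty_kb, $dirty_kb / 1024;
```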

For comparison, this is gst-launch playing an mp3:

Backed by file:
  Executable               4712
  RO data                  6376
  Data                     192
  Unknown                  0
Anonymous:
  Writable code (stack)    2212
  Data (malloc, mmap)      328
  RO data                  0
  Unreadable               0
  Unknown                  0

Note that the RO data backed by file is much larger, since filesrc
mmaps its files.  There's also only one stack allocated (stacks are
about 2 MB each).

(btw, thanks Benjamin for tracking down so many memory leaks.)


[0]  Technically, we care about the "working set" of pages.  This
     includes dirty pages plus any read-only pages backed by a file
     that have been accessed recently (or the kernel thinks will be
     accessed in the near future).  But we can't really do much about
     making either media files or code size smaller.

#!/usr/bin/perl -w

$pid = $ARGV[0];

if(!$pid) {
	print "usage: mem_usage <pid>\n";
	exit 1;
}

# read the whole memory map of the process
open(MAP, "/proc/$pid/maps") or die "cannot open /proc/$pid/maps: $!\n";
@lines = <MAP>;
close MAP;

$writable_code = 0;
$data = 0;
$rodata = 0;
$unreadable = 0;
$unbacked_unknown = 0;
$mapped_executable = 0;
$mapped_rodata = 0;
$mapped_rwdata = 0;
$mapped_unknown = 0;
while ($line = shift @lines) {
	# skip any line that does not look like a maps entry
	$line =~ m/^(\w+)-(\w+) (....) (\w+) (\S+) (\d+) *(.*)$/ or next;
	$start = hex($1);
	$end = hex($2);
	$rwxp = $3;
	#$offset = hex($4);
	$device = $5;
	#$inode = $6;
	#$filename = $7;

	$seg_size = ($end - $start)/1024;
	if ($device eq "00:00") {
		# anonymous mapping
		if ($rwxp =~ m/rwx./) {
			$writable_code += $seg_size;
		} elsif ($rwxp =~ m/rw-./) {
			$data += $seg_size;
		} elsif ($rwxp =~ m/r--./) {
			$rodata += $seg_size;
		} elsif ($rwxp =~ m/---./) {
			$unreadable += $seg_size;
		} else {
			$unbacked_unknown += $seg_size;
		}
	} else {
		# mapping backed by a file
		if ($rwxp =~ m/r-x./) {
			$mapped_executable += $seg_size;
		} elsif ($rwxp =~ m/r--./) {
			$mapped_rodata += $seg_size;
		} elsif ($rwxp =~ m/rw-./) {
			$mapped_rwdata += $seg_size;
		} else {
			$mapped_unknown += $seg_size;
		}
	}
}

print "Backed by file:\n";
print "  Executable               $mapped_executable\n";
print "  RO data                  $mapped_rodata\n";
print "  Data                     $mapped_rwdata\n";
print "  Unknown                  $mapped_unknown\n";
print "Anonymous:\n";
print "  Writable code (stack)    $writable_code\n";
print "  Data (malloc, mmap)      $data\n";
print "  RO data                  $rodata\n";
print "  Unreadable               $unreadable\n";
print "  Unknown                  $unbacked_unknown\n";
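
To see what the regular expression in the script captures, here is a
minimal sketch that applies the same pattern to a single line.  The
sample line is made up for illustration (the addresses, inode and path
are not from a real process), and parse_maps_line is a hypothetical
helper name, not part of the script.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: split one /proc/<pid>/maps line into the fields
# the script uses, with the same pattern as the script.
sub parse_maps_line {
	my ($line) = @_;
	chomp $line;
	return unless $line =~ m/^(\w+)-(\w+) (....) (\w+) (\S+) (\d+) *(.*)$/;
	return {
		size_kb  => (hex($2) - hex($1)) / 1024,   # segment size in kB
		perms    => $3,                           # e.g. "r-xp"
		device   => $5,                           # "00:00" means anonymous
		filename => $7,                           # empty for anonymous maps
	};
}

# Made-up sample line, not taken from a real process.
my $sample = "08048000-0804c000 r-xp 00000000 03:01 12345      /bin/cat\n";
my $m = parse_maps_line($sample);
print "$m->{size_kb} kB, $m->{perms}, device $m->{device}, $m->{filename}\n";
```

A device other than "00:00" together with "r-x" permissions is what the
script counts under "Executable".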