Gregory Haskins | 2 Jun 15:20

Re: [ANNOUNCE] sched: schedtop utility

Hi Ankita,
  For some reason, I didn't get your original email.  I had to go find it on the lkml.org archives.

Anyway, see inline.

>>> On Mon, Jun 2, 2008 at  9:07 AM, in message
<1212412051.6269.5.camel <at> lappy.programming.kicks-ass.net>, Peter Zijlstra
<peterz <at> infradead.org> wrote: 
> On Mon, 2008-06-02 at 18:18 +0530, Ankita Garg wrote:
>> Hi Gregory,
>> 
>> On Thu, May 22, 2008 at 08:06:44AM -0600, Gregory Haskins wrote:
>> > Hi all scheduler developers,
>> >   I had an itch to scratch w.r.t. watching the stats in /proc/schedstats,
>> > and it appears that the perl scripts referenced in
>> > Documentation/scheduler/sched-stats.txt do not support v14 from HEAD so I
>> > whipped up a little utility I call "schedtop".
>> >
>> 
>> Nice tool! Helps in better visualization of the data in schedstats. 
>> 
>> Using the tool, realized that most of the timing related stats therein
>> might not be completely usable in many scenarios, as might already be
>> known.
>> 
>> Without any additional load on the system, all the stats are nice and
>> sane. But, as soon as I ran my particular testcase, the data
>> pertaining to the delta of run_delay/cpu_time went haywire! I understand
>> that all the values are based on top of rq->clock, which relies on tsc that 
>> is not synced across cpus and would result in skews/incorrect values.
>> But, it turns out to be not so reliable data for debugging. This is
>> of course nothing related to the tool, but for schedstat in
>> general...rather just adding on to the already existing woes with non-synced
>> tscs :-)
> 
> Thing is, things like runtime should be calculated by using per cpu deltas.
> You take a stamp when you get scheduled on the cpu and another one when
> you stop running, then the delta is added to runtime.
> 
> This is always on the same cpu - when you get migrated you're stopped
> and re-scheduled so that should work out nicely.
> 
> So in that sense it shouldn't matter that the rq->clock values can get
> skewed between cpus.
> 
> So I'm still a little puzzled by your observations; though it could be
> that the schedstat stuff got broken - I've never really looked too
> closely at it.
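To make the per-cpu delta argument above concrete, here is a minimal sketch in Python.  It is a toy model, not kernel code: the names Task, sched_in, sched_out and advance, and the clock values, are made up for illustration.  It only shows why a constant skew between the cpu clocks should cancel out when both stamps are taken on the same cpu.

```python
class Task:
    """Toy task: accumulates runtime from per-cpu clock deltas."""

    def __init__(self):
        self.runtime = 0   # total time spent running
        self.cpu = None    # cpu the task is currently running on
        self.start = None  # that cpu's clock value at schedule-in


def sched_in(task, cpu, clocks):
    # Stamp taken from the clock of the cpu we start running on.
    task.cpu = cpu
    task.start = clocks[cpu]


def sched_out(task, clocks):
    # Second stamp from the *same* cpu, so its offset cancels in the delta.
    task.runtime += clocks[task.cpu] - task.start
    task.cpu = None
    task.start = None


def advance(clocks, ticks):
    # Real time passes equally on every cpu; only the offsets differ.
    for cpu in clocks:
        clocks[cpu] += ticks


def demo():
    # cpu 1's clock is skewed far ahead of cpu 0's (hypothetical values).
    clocks = {0: 1_000, 1: 500_000}
    task = Task()

    sched_in(task, 0, clocks)
    advance(clocks, 30)
    sched_out(task, clocks)    # migration: stopped on cpu 0 ...

    sched_in(task, 1, clocks)  # ... and re-scheduled on cpu 1
    advance(clocks, 20)
    sched_out(task, clocks)
    return task.runtime
```

Here demo() returns 50 (30 on cpu 0 plus 20 on cpu 1) even though the two clocks disagree by almost half a million ticks.  A skewed result would only show up in this model if a delta mixed stamps from two different cpus.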

I suspect we are talking about the stats that are always moving pretty fast.  I see that too, and I use the (possibly little-known) filtering feature of schedtop: "-i REGEX" sets the include filter, and "-x REGEX" sets the exclude filter.  By default, include allows everything and exclude filters nothing.  Changing it to "-x sched_info" will exclude all those pesky stats that move fast and do not convey useful (to me, anyway) data.  I hope that helps!
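
For anyone who wants the include/exclude semantics spelled out, here is a rough sketch in Python.  This is not how schedtop itself is written; make_filter and the stat names in the usage note are hypothetical, and I am assuming the regex is matched anywhere in the stat name.

```python
import re


def make_filter(include=".*", exclude=None):
    """Return a predicate that says whether a stat name should be shown.

    Defaults mirror the behaviour described above: the include pattern
    allows everything and the exclude pattern filters nothing.
    """
    include_re = re.compile(include)
    exclude_re = re.compile(exclude) if exclude else None

    def keep(name):
        # A stat must match the include pattern ...
        if not include_re.search(name):
            return False
        # ... and must not match the exclude pattern, if one was given.
        return exclude_re is None or not exclude_re.search(name)

    return keep
```

With keep = make_filter(exclude="sched_info"), a made-up name like "cpu0/sched_info/run_delay" is dropped while "cpu0/domain0/lb_count" is kept.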

Also, about your idea for the /proc/<pid>/schedstats, I was thinking the same thing while on my trip on Friday.  I will add this feature.  Thanks!

-Greg

