Re: [AT91RM9200] further run-time problems (jffs2, Oops in __update_rq_clock, IPSec)

On Mon, Nov 12, 2007 at 03:45:04PM +0100, Guennadi Liakhovetski wrote:
> 2. There has been an Oops once in vi...
> 
> Unable to handle kernel paging request at virtual address e5dcc3ec
> pgd = c135c000
> [e5dcc3ec] *pgd=00000000
> Internal error: Oops: 5 [#1]
> Modules linked in:
> CPU: 0    Not tainted  (2.6.23.1-ga63c3b88-dirty #52)
> PC is at __update_rq_clock+0x4c/0x140
> LR is at __update_rq_clock+0x28/0x140
> pc : [<c0033e38>]    lr : [<c0033e14>]    psr: 60000093
> sp : c1517b08  ip : e5dcc010  fp : c0117b3c
> r10: c025125c  r9 : 00000001  r8 : 00000000
> r7 : 00000000  r6 : c1cc9720  r5 : e5dcc3ec  r4 : e2be3800
> r3 : 00989680  r2 : ffffd430  r1 : 00989665  r0 : e2be3800
> Flags: nZCv  IRQs off nt user
> Control: c000717f  Table: 2135c000  DAC: 00000015
> Process vi (pid: 2017, stack limit = 0xc1516258)
> Stack: (0xc1517b08 to 0xc1518000)
> Backtrace: frame pointer underflow
^^^^^^^^^^^^^^ hint.

> Backtrace aborted due to bad frame pointer <c0117b3c>
> Code: e0c88005 e51bc034 e3580000 e28c5ff7 (e8950060)
> 
> __update_rq_clock:
>         @ args = 0, pretend = 0, frame = 12
>         @ frame_needed = 1, uses_anonymous_args = 0
>         mov     ip, sp  @,
>         stmfd   sp!, {r4, r5, r6, r7, r8, r9, sl, fp, ip, lr, pc}       @,
>         sub     fp, ip, #4      @,,
>         sub     sp, sp, #12     @,,
>         ldr     r3, .L33+8      @ tmp108,
>         add     r4, r0, #996    @ prev_raw, rq,
>         ldmia   r4, {r4-r5}     @ prev_raw
>         str     r0, [fp, #-52]  @ tmp12,
>         mov     lr, pc
>         bx      r3      @ tmp108
>         str     r0, [fp, #-48]  @, now
>         str     r1, [fp, #-44]  @, now
>         mov     r8, r1  @ delta,
>         mov     r7, r0  @ delta,
>         subs    r7, r7, r4      @ delta, delta, prev_raw
>         sbc     r8, r8, r5      @ delta, delta, prev_raw
>         ldr     ip, [fp, #-52]  @,
>         cmp     r8, #0  @ delta,
>         add     r5, ip, #988    @ clock.432, rq,
>         ldmia   r5, {r5-r6}     @ clock.432
> 	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Notice where 'r5' comes from - 'ip', which comes from '[fp, #-52]'.
Now read through from the start of the function and see what value 'fp'
is supposed to have.

sp at the ldmia instruction is currently at 0xc1517b08.  Add 12 for the
'sub sp, sp, #12'.  There are 11 registers pushed onto the stack, so add
44 bytes.  This gives a value of 0xc1517b40, which is the value of 'ip'
and 'sp' after the first instruction.
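
The sum can be checked with a short sketch (Python, purely for
illustration; the variable names are mine, the constants come from the
register dump and the disassembly quoted above):

```python
# Reconstruct the stack pointer at function entry from the oops values.
SP_AT_OOPS = 0xc1517b08   # 'sp' from the register dump
LOCALS = 12               # undo "sub sp, sp, #12"
PUSHED_REGS = 11          # r4-r9, sl, fp, ip, lr, pc in the stmfd
WORD = 4                  # bytes per register on 32-bit ARM

entry_sp = SP_AT_OOPS + LOCALS + PUSHED_REGS * WORD
print(hex(entry_sp))      # 0xc1517b40
```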

To confirm this, here's the state which the stmfd instruction saved
onto the stack:

7b00:                   c0314974 c1517b18 c0046378 ffffd43a c1cc9860 c1cc9720
                           A        B        C       r4       r5       r6
7b20: c02f4554 c1516000 00000001 c025125c c1517b64 c1517b40 c0250a74 c0033dfc
        r7       r8       r9       sl       fp       ip       lr       pc
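
To make the layout explicit, here is a rough sketch that assigns an
address to each word of the dump.  It assumes the first two columns of
the "7b00:" row are blank, so that the first word shown lives at
0xc1517b08; the names BASE, WORDS, LABELS and slots are mine.

```python
# Map each dumped word to its stack address and to the label used above.
BASE = 0xc1517b08  # assumed address of the first word shown in the dump
WORDS = [0xc0314974, 0xc1517b18, 0xc0046378, 0xffffd43a, 0xc1cc9860,
         0xc1cc9720, 0xc02f4554, 0xc1516000, 0x00000001, 0xc025125c,
         0xc1517b64, 0xc1517b40, 0xc0250a74, 0xc0033dfc]
LABELS = ["A", "B", "C", "r4", "r5", "r6", "r7", "r8", "r9", "sl",
          "fp", "ip", "lr", "pc"]

# stmfd stores the lowest-numbered register at the lowest address.
slots = {label: (BASE + 4 * i, word)
         for i, (label, word) in enumerate(zip(LABELS, WORDS))}

for label, (addr, word) in slots.items():
    print(f"{label:>2} @ {addr:#010x} = {word:#010x}")
```

The saved 'ip' slot holds 0xc1517b40, and the saved 'pc' slot sits at
0xc1517b3c, four bytes below that.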

That means 'fp' after the first 'sub' instruction should be 0xc1517b3c.
However, it is actually 0xc0117b3c.  Note that these two values look
very similar.  The difference is only 0x01400000.  Two bit errors in
SDRAM?
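
A quick way to see which bits differ (Python sketch for illustration
only; the variable names are mine):

```python
# Compare the frame pointer the prologue should have produced with the
# one reported in the oops.
expected_fp = 0xc1517b40 - 4   # "sub fp, ip, #4"
actual_fp = 0xc0117b3c         # 'fp' from the register dump

diff = expected_fp ^ actual_fp
flipped = [bit for bit in range(32) if (diff >> bit) & 1]
print(hex(diff), flipped)      # 0x1400000 [22, 24]
```

Both bits are set in the expected value and clear in the actual one.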

The other thing to consider is sched_clock() - the 'bx' instruction
you've marked above is calling that function.  Is it messing up the
frame pointer?

Also note the r8, r1 and r5 values.  The code immediately before which
touches these registers does the following:

>         mov     r8, r1  @ delta,
>         sbc     r8, r8, r5      @ delta, delta, prev_raw
>         cmp     r8, #0  @ delta,

and the state here after this cmp instruction is: r8 = 0, r5 = e5dcc3ec,
r1 = 00989665.  In my mind, 00989665 - e5dcc3ec is certainly not zero.
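
The subtraction can be spelled out as follows.  This is only a sketch:
it assumes r1 and r5 in the dump still hold the values the sbc
consumed, and since the carry flag left by the preceding subs is not
known, it tries both.  The helper sbc() is my own naming.

```python
MASK = 0xffffffff
r1 = 0x00989665   # from the register dump
r5 = 0xe5dcc3ec   # from the register dump

def sbc(rn, op2, carry):
    """ARM SBC: rn - op2 - NOT(carry), truncated to 32 bits."""
    return (rn - op2 - (1 - carry)) & MASK

# Carry in from the earlier 'subs' is unknown, so evaluate both cases.
results = [sbc(r1, r5, carry) for carry in (0, 1)]
print([hex(r) for r in results])
```

Neither case yields zero, which is the point made above about r8.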

Finally, go back to the stack dump above and look at the values marked
A, B and C.  A = [fp, #-52], B = [fp, #-48] (aka r0) and C = [fp, #-44]
(aka r1).  We can't really tell about 'A', but we can talk about 'B'
and 'C' since nothing's changed the values in those registers between
when they were stored and the point we oopsed.  The conclusion I come
to is that r0 and r1 were not stored in these locations, so in all
probability, 'fp' had already been corrupted by the time the
"str r0, [fp, #-48]" instruction executed.
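
The comparison, as a small sketch (Python for illustration; the names
are mine, and the slot addresses assume the first word shown in the
dump is at 0xc1517b08):

```python
# Words found at the B and C slots in the stack dump.
dump = {0xc1517b0c: 0xc1517b18,   # B
        0xc1517b10: 0xc0046378}   # C

expected_fp = 0xc1517b3c          # what the prologue should have set
r0, r1 = 0xe2be3800, 0x00989665   # register values at the oops

slot_b = dump[expected_fp - 48]   # target of "str r0, [fp, #-48]"
slot_c = dump[expected_fp - 44]   # target of "str r1, [fp, #-44]"
print(slot_b == r0, slot_c == r1) # False False

# With the corrupted fp, the same stores would have landed here instead:
corrupt_fp = 0xc0117b3c
print(hex(corrupt_fp - 48), hex(corrupt_fp - 44))
```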

-------------------------------------------------------------------
List admin: http://lists.arm.linux.org.uk/mailman/listinfo/linux-arm-kernel
FAQ:        http://www.arm.linux.org.uk/mailinglists/faq.php
Etiquette:  http://www.arm.linux.org.uk/mailinglists/etiquette.php

