6 May 2012 18:44
Re: sparc-softmmu uninitialized memory read?
Blue Swirl <blauwirbel <at> gmail.com>
2012-05-06 16:44:35 GMT
On Sun, May 6, 2012 at 2:02 PM, Andreas Färber <afaerber <at> suse.de> wrote:
> On 06.05.2012 13:32, Blue Swirl wrote:
>> On Sat, May 5, 2012 at 3:37 PM, Andreas Färber <afaerber <at> suse.de> wrote:
>>> Hello Blue,
>>>
>>> Testing a potential AREG0 fix for ppc host by malc I got an error
>>> running `./sparc-softmmu/qemu-system-sparc` (same with CD/kernel):
>>>
>>> qemu: fatal: Trap 0x07 while interrupts disabled, Error state
>>> pc: 00005e0c npc: 00005e10
>>> General Registers:
>>> %g0-7: 00000000 00000001 babababa 00000000 00000020 07ffff08 07ffe000
>>> babababa
>>>
>>> Current Register Window:
>>> %o0-7: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>> 00000000
>>> %l0-7: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>> 00000000
>>> %i0-7: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>>> 00000000
>>>
>>> Floating Point Registers:
>>> %f00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> %f08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> %f16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> %f24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> psr: 048000c0 (icc: N--- SPE: SP-) wim: 00000001
>>> fsr: 00000000 y: 00000020
>>> Abgebrochen
>>>
>>> The 0xbabababa in %g2 and %g7 is a signature I've seen in uninitialized
>>> memory on openSUSE 12.1 Betas. So I ran valgrind, and the following
>>> caught my eye on both ppc and x86_64:
>>>
>>> ==18801== Command: ./sparc-softmmu/qemu-system-sparc
>>> ==18801==
>>> ==18801== Thread 2:
>>> ==18801== Conditional jump or move depends on uninitialised value(s)
>>> ==18801==    at 0x25C5AF: compute_all_logic (cc_helper.c:37)
>>> ==18801==    by 0x25C648: helper_compute_psr (cc_helper.c:470)
>>> ==18801==    by 0x8CD0981: ???
>>> ==18801== Uninitialised value was created by a heap allocation
>>> ==18801==    at 0x4C27CE8: memalign (in
>>> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>> ==18801==    by 0x4C27D97: posix_memalign (in
>>> /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>> ==18801==    by 0x1F2101: qemu_memalign (oslib-posix.c:93)
>>> ==18801==    by 0x1F21A9: qemu_vmalloc (oslib-posix.c:126)
>>> ==18801==    by 0x2665F6: qemu_ram_alloc_from_ptr (exec.c:2647)
>>> ==18801==    by 0x286D76: memory_region_init_ram (memory.c:954)
>>> ==18801==    by 0x297FFD: ram_init1 (sun4m.c:757)
>>> ==18801==    by 0x204DAE: qdev_init (qdev.c:151)
>>> ==18801==    by 0x204EEC: qdev_init_nofail (qdev.c:258)
>>> ==18801==    by 0x298845: ram_init.constprop.7 (sun4m.c:783)
>>> ==18801==    by 0x298980: sun4m_hw_init (sun4m.c:862)
>>> ==18801==    by 0x2994A2: ss5_init (sun4m.c:1289)
>>>
>>> This is at 8f473dd104f0937ce98523fa6f9de0bd845aebbe, and cc_helper.c:37
>>> is int32_t dst argument of get_NZ_icc(), which is always called with
>>> CC_DST, i.e. env->cc_dst.
>>>
>>> This seems to indicate that a read from uninitialized memory occurred,
>>> from which cc_dst is being initialized?
>>
>> This should happen in target-sparc/cpu.c:45
>> memset(env, 0, offsetof(CPUSPARCState, breakpoints));
>>
>> cc_dst is between structure start and CPU_COMMON.
>>
>> 89aaf60dedbe0e6415acfe816e02b538e5c54e68 fixed a bug relating to reset recently.
>
> The still-current master commit above includes that fix though, and
> that's no explanation for the uninitialized memory stemming from sun4m
> RAM as opposed to QOM object_new(). Somewhere a read is happening,
> possibly in OpenBIOS, from uninitialized memory that is then stored into
> the CPUSPARCState after that has been zero-initialized, IIUC.

OK, I see it now. OpenBIOS assumes that the Sparc32 SMP table is valid
whenever its valid field is nonzero. It takes that as a sign of
secondary processor setup and jumps to the location given in the SMP
table.
With 0xbabababa in memory, this fragile logic fails, which causes the
early crash.

https://tracker.coreboot.org/trac/openbios/browser/trunk/openbios-devel/arch/sparc32/entry.S#L132
https://tracker.coreboot.org/trac/openbios/browser/trunk/openbios-devel/arch/sparc32/entry.S#L157

I think the current logic would also not survive a reset that happens
just as a secondary processor is being brought online.

The fix is to make the SMP table logic robust, for example with a
checksum. We could also read the CPU ID from MXCC and skip the check
for the boot CPU, though MXCC probably does not exist on all models.

>
> My issue here is that sparc64 boots HelenOS fine up until it's trying to
> load the kernel (identical to x86_64 host) but sparc32 exits really
> early on ppc. It might well be that there's a bug hidden in malc's TCG
> patch that's causing the fatal error state, but the uninitialized memory
> report is on both TCG hosts, so unlikely TCG-related.
>
> /-F
>
>>> Any idea where that could originate from or how to further debug?
>>> It doesn't seem to be caused by the
>>> 7d21dcc84b8c07918124a9c0708694d2fb013f65 OpenBIOS r1056 update.
>>>
>>> Regards,
>>>
>>> Andreas
>
> --
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg