6 Mar 2012 17:18
Re: ntpd wedged by libc?
Christos Zoulas <christos <at> zoulas.com>
2012-03-06 16:18:40 GMT
On Mar 6, 3:01pm, hart <at> ntp.org (Dave Hart) wrote:
-- Subject: Re: ntpd wedged by libc?

| On Tue, Mar 6, 2012 at 14:22, Christos Zoulas <christos <at> zoulas.com> wrote:
| > On Mar 5, 8:03pm, agcarver+netbsd <at> acarver.net (AGC) wrote:
| > | I had to bump it up to 30,000 but it finally did finish:
| >
| > Then I don't think it is leaking... You can try with the old libc,
| > and you'll see it will run out of memory.
|
| The problem has evolved. At first, ntpd stopped responding due to out
| of memory due to a leak triggered by lots of snprintf with floating
| point. With the leak so identified now fixed, ntpd is now
| reported to be "wedging" (I assume meaning spinning using lots of CPU
| and not responding to network traffic) and it's still apparently
| related to snprintf of floating points. The opening message of this
| thread has a stack trace which I assume came from attaching a debugger
| to the spinning ntpd:
|
| ======
| Seems I'm still having issues with libc on 5.1/sparc specifically with
| ntpd wedging when doing math:
|
| #0 0x103d38c8 in __pow5mult_D2A () from /usr/lib/libc.so.12
| #1 0x103d3ac4 in __muldi3 () from /usr/lib/libc.so.12
| #2 0x103d34dc in __mult_D2A () from /usr/lib/libc.so.12
| #3 0x103d3728 in __pow5mult_D2A () from /usr/lib/libc.so.12
| #4 0x103c61d4 in __dtoa () from /usr/lib/libc.so.12
| #5 0x103c315c in __vfprintf_unlocked () from /usr/lib/libc.so.12
| #6 0x103330c4 in snprintf () from /usr/lib/libc.so.12
| #7 0x000256f4 in ctl_putdblf (tag=0x87d79 "", fmt=0x88458 "%.3f",
|     d=4.5623779296875)
|     at ntp_control.c:1431
| ======
|
| There have been over 50 messages in the thread, so I think we can all
| be forgiven forgetting a detail or two along the way, but I don't
| think anyone has suggested the original leak bug hasn't been fixed.
| Rather, it seems there is still some sort of problem on "5.1" (not
| -current, clearly) on sparc with ntpd being polled every few seconds
| by ntpq triggering a hang snprintf'ing with floating point.
|
| The stack trace looks very similar to the first go-around. If
| accurate, it suggests the same code still has issues that ntpd's abuse
| tickles but t_printf.c doesn't.

Sure, let's change the test to be closer to the ntp one; let's make the
format %.3f, for example. The way I tracked it down initially was by
instrumenting all the malloc/free calls in the dtoa code...

christos
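
For reference, a test closer to the ntpd case could look roughly like the
sketch below. It is only an illustration, not the actual t_printf.c code:
the helper names fmt_one() and fmt_loop() are made up here, and the way the
input value is varied is an assumption. It repeatedly formats doubles with
"%.3f", starting from the value seen in the stack trace, and counts calls
that fail or truncate.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from t_printf.c): format one double with
 * "%.3f", the format ctl_putdblf() uses in the stack trace.
 * Returns whatever snprintf() returns.
 */
int
fmt_one(double d, char *buf, size_t len)
{
	return snprintf(buf, len, "%.3f", d);
}

/*
 * Hypothetical helper: call fmt_one() `iterations` times on slightly
 * different values and return how many calls failed or were truncated.
 * A wedge in dtoa would show up as this loop never returning; a leak
 * would only be visible from outside, e.g. by watching process size.
 */
int
fmt_loop(int iterations)
{
	char buf[64];
	int failures = 0;

	for (int i = 0; i < iterations; i++) {
		/* Assumption: start at the traced value, step by 1/1024. */
		double d = 4.5623779296875 + (double)i / 1024.0;
		int n = fmt_one(d, buf, sizeof(buf));

		if (n < 0 || (size_t)n >= sizeof(buf))
			failures++;
	}
	return failures;
}
```

Calling fmt_loop(30000) from main() would match the iteration count
mentioned earlier in the thread. Whether this input pattern reaches the
same __pow5mult_D2A path as ntpd is not something the sketch guarantees.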