Re: thread leakage

From: Daniel Black <daniel.subs_at_internode.on.net>
Date: Sat, 19 Mar 2011 13:09:27 +1100

On Tuesday 15 March 2011 08:15:01 Murray S. Kucherawy wrote:

Same process running as Tuesday:

ls -la /proc/`pidof opendkim`/task | wc -l
171

> > #7 0xb7f4b04c in dkim_policy (dkim=0xa230cd0, pcode=0xb4f1f00c,
> > pstate=0x0) at dkim.c:4860 #8 0x08050b21 in mlfi_eom (ctx=0xa2312c8) at
> > opendkim.c:11355
> > #9 0xb7f313ef in st_bodyend (g=0xb46fe2e4) at engine.c:1614
> > #10 0xb7f30f65 in mi_engine (ctx=0xa2312c8) at engine.c:405
> > #11 0xb7f354bd in mi_worker (arg=0xa20f9b0) at worker.c:652
> > #12 0xb7b48955 in start_thread () from /lib/i686/cmov/libpthread.so.0
> > #13 0xb7ac8e7e in clone () from /lib/i686/cmov/libc.so.6
>
> This and the seven others like it are sitting in opendkim functions
> awaiting a reply to a DNS query that has been submitted to unbound.

looking closer at the what now exceeds 7 thread seems all without exception
are quering .in domains:
mtnl.net.in
airtelbroadband.in
vsnl.net.in
aol.in x very many
salesa1booksindia.in x many
goldleafhotel.in

> They
> are in pthread_cond_timedwait() which means they are not deadlocked, but
> rather are waiting for a condition to be signaled that hasn't happened
> yet. There is some timeout in effect (can't tell what it is from here;
> try "up 2" and "print timeout" to find out) that hasn't been reached.

[Switching to thread 86 (Thread 0x90eddb70 (LWP 25326))]#0 0xb7f5d424 in
__kernel_vsyscall ()
(gdb) bt
#0 0xb7f5d424 in __kernel_vsyscall ()
#1 0xb7b4d482 in pthread_cond_timedwait_at__at_GLIBC_2.3.2 () from
/lib/i686/cmov/libpthread.so.0
#2 0x080622a0 in dkimf_unbound_wait (srv=0x9f81768, qh=0x9fc6298,
to=0x90ed78f0, bytes=0x90ed790c, error=0x0, dnssec=0x0)
    at opendkim-dns.c:196
#3 dkimf_ub_waitreply (srv=0x9f81768, qh=0x9fc6298, to=0x90ed78f0,
bytes=0x90ed790c, error=0x0, dnssec=0x0) at opendkim-dns.c:417
#4 0xb7f4234f in dkim_get_policy_dns_excheck (dkim=0xa25f2d0, query=0xa383218
"aol.in", qstatus=0x90eda7ec) at dkim-policy.c:276
#5 0xb7f42b99 in dkim_get_policy_dns (dkim=0xa25f2d0, query=0xa383218
"aol.in", excheck=true, buf=0x90eda3eb "", buflen=1025,
    qstatus=0x90eda7ec) at dkim-policy.c:425
#6 0xb7f4adee in dkim_get_policy (dkim=0xa25f2d0, query=0x1da318e <Address
0x1da318e out of bounds>, excheck=252, qstatus=0x90eda94c,
    policy=0x90eda944, pflags=0x90eda948) at dkim.c:2504
#7 0xb7f4b04c in dkim_policy (dkim=0xa25f2d0, pcode=0xa380cec, pstate=0x0) at
dkim.c:4860
#8 0x08050b21 in mlfi_eom (ctx=0xa294d68) at opendkim.c:11355
#9 0xb7f313ef in st_bodyend (g=0x90edd2e4) at engine.c:1614
#10 0xb7f30f65 in mi_engine (ctx=0xa294d68) at engine.c:405
#11 0xb7f354bd in mi_worker (arg=0xa22c0e8) at worker.c:652
#12 0xb7b48955 in start_thread () from /lib/i686/cmov/libpthread.so.0
#13 0xb7ac8e7e in clone () from /lib/i686/cmov/libc.so.6
(gdb) up 2
#2 0x080622a0 in dkimf_unbound_wait (srv=0x9f81768, qh=0x9fc6298,
to=0x90ed78f0, bytes=0x90ed790c, error=0x0, dnssec=0x0)
    at opendkim-dns.c:196
196 (void) pthread_cond_timedwait(&ub-
>ub_ready,
(gdb) print timeout
$1 = {tv_sec = 1300497228, tv_nsec = 127624000}

gdb) thread 24
[Switching to thread 24 (Thread 0x6b2e2b70 (LWP 32517))]#0 0xb7f5d424 in
__kernel_vsyscall ()
(gdb) bt
#0 0xb7f5d424 in __kernel_vsyscall ()
#1 0xb7b4d482 in pthread_cond_timedwait_at__at_GLIBC_2.3.2 () from
/lib/i686/cmov/libpthread.so.0
#2 0x080622a0 in dkimf_unbound_wait (srv=0x9f81768, qh=0xa398c10,
to=0x6b2dc8f0, bytes=0x6b2dc90c, error=0x0, dnssec=0x0)
    at opendkim-dns.c:196
#3 dkimf_ub_waitreply (srv=0x9f81768, qh=0xa398c10, to=0x6b2dc8f0,
bytes=0x6b2dc90c, error=0x0, dnssec=0x0) at opendkim-dns.c:417
#4 0xb7f4234f in dkim_get_policy_dns_excheck (dkim=0xa2bf620, query=0xa416be0
"salesa1booksindia.in", qstatus=0x6b2df7ec)
    at dkim-policy.c:276
#5 0xb7f42b99 in dkim_get_policy_dns (dkim=0xa2bf620, query=0xa416be0
"salesa1booksindia.in", excheck=true, buf=0x6b2df3eb "",
    buflen=1025, qstatus=0x6b2df7ec) at dkim-policy.c:425
#6 0xb7f4adee in dkim_get_policy (dkim=0xa2bf620, query=0x1da321e <Address
0x1da321e out of bounds>, excheck=252, qstatus=0x6b2df94c,
    policy=0x6b2df944, pflags=0x6b2df948) at dkim.c:2504
#7 0xb7f4b04c in dkim_policy (dkim=0xa2bf620, pcode=0x79b05ab4, pstate=0x0)
at dkim.c:4860
#8 0x08050b21 in mlfi_eom (ctx=0xa40cea0) at opendkim.c:11355
#9 0xb7f313ef in st_bodyend (g=0x6b2e22e4) at engine.c:1614
#10 0xb7f30f65 in mi_engine (ctx=0xa40cea0) at engine.c:405
#11 0xb7f354bd in mi_worker (arg=0xa3cba78) at worker.c:652
#12 0xb7b48955 in start_thread () from /lib/i686/cmov/libpthread.so.0
#13 0xb7ac8e7e in clone () from /lib/i686/cmov/libc.so.6
(gdb) up 2
#2 0x080622a0 in dkimf_unbound_wait (srv=0x9f81768, qh=0xa398c10,
to=0x6b2dc8f0, bytes=0x6b2dc90c, error=0x0, dnssec=0x0)
    at opendkim-dns.c:196
196 (void) pthread_cond_timedwait(&ub-
>ub_ready,
(gdb) print timeout
$2 = {tv_sec = 1300497228, tv_nsec = 132916000}

> Thread 8 is slightly different. It has volunteered to be the one that
> actually waits on a descriptor to see if unbound has an answer to return,
> and will then notify the eight threads above if/when their replies arrive.
> It's the one sitting dkimf_wait_fd(). There should always be exactly one
> of those except when the system is idle, and that's the case here. It too
> has a timeout that has not yet been reached, though we can't see what that
> timeout is (try "up 2" and "print *until").

(gdb) thread 149
[Switching to thread 149 (Thread 0xadaf9b70 (LWP 31651))]#0 0xb7f5d424 in
__kernel_vsyscall ()
(gdb) bt
#0 0xb7f5d424 in __kernel_vsyscall ()
#1 0xb7ac2011 in select () from /lib/i686/cmov/libc.so.6
#2 0x0806856a in dkimf_wait_fd (fd=6, until=0xadaed888) at util.c:1709
#3 0x08062307 in dkimf_unbound_wait (srv=0x9f81768, qh=0xa1f7af0,
to=0xadaf38f0, bytes=0xadaf390c, error=0x0, dnssec=0x0)
    at opendkim-dns.c:231
#4 dkimf_ub_waitreply (srv=0x9f81768, qh=0xa1f7af0, to=0xadaf38f0,
bytes=0xadaf390c, error=0x0, dnssec=0x0) at opendkim-dns.c:417
#5 0xb7f4234f in dkim_get_policy_dns_excheck (dkim=0xa2ec2b0, query=0xa259d78
"nicholaspiramal.co.in", qstatus=0xadaf67ec)
    at dkim-policy.c:276
#6 0xb7f42b99 in dkim_get_policy_dns (dkim=0xa2ec2b0, query=0xa259d78
"nicholaspiramal.co.in", excheck=true, buf=0xadaf63eb "",
    buflen=1025, qstatus=0xadaf67ec) at dkim-policy.c:425
#7 0xb7f4adee in dkim_get_policy (dkim=0xa2ec2b0, query=0x0, excheck=254,
qstatus=0xadaf694c, policy=0xadaf6944, pflags=0xadaf6948)
    at dkim.c:2504
#8 0xb7f4b04c in dkim_policy (dkim=0xa2ec2b0, pcode=0xa2f4874, pstate=0x0) at
dkim.c:4860
#9 0x08050b21 in mlfi_eom (ctx=0xa2bbf48) at opendkim.c:11355
#10 0xb7f313ef in st_bodyend (g=0xadaf92e4) at engine.c:1614
#11 0xb7f30f65 in mi_engine (ctx=0xa2bbf48) at engine.c:405
#12 0xb7f354bd in mi_worker (arg=0xa20d4d0) at worker.c:652
#13 0xb7b48955 in start_thread () from /lib/i686/cmov/libpthread.so.0
#14 0xb7ac8e7e in clone () from /lib/i686/cmov/libc.so.6
(gdb) up 2
#2 0x0806856a in dkimf_wait_fd (fd=6, until=0xadaed888) at util.c:1709
1709 return select(fd + 1, &fds, NULL, NULL, &left);
(gdb) print *until
$1 = {tv_sec = 1300499236, tv_nsec = 547200000}

The thread with the dkimf_wait_fd changes around a lot. More than once a
second.

> > Thread 3 (Thread 0xb02feb70 (LWP 13413)):
> > #0 0xb7f5d424 in __kernel_vsyscall ()
> > #1 0xb7b4d482 in pthread_cond_timedwait_at__at_GLIBC_2.3.2 () from
> > /lib/i686/cmov/libpthread.so.0
> > #2 0xb7f355c7 in mi_worker (arg=0xa2049b0) at worker.c:724
> > #3 0xb7b48955 in start_thread () from /lib/i686/cmov/libpthread.so.0
> > #4 0xb7ac8e7e in clone () from /lib/i686/cmov/libc.so.6
>
> This and the next one are milter workers that are currently idle.
>
> > Thread 1 (Thread 0xb79786c0 (LWP 21328)):
> > #0 0xb7f5d424 in __kernel_vsyscall ()
> > #1 0xb7abb696 in poll () from /lib/i686/cmov/libc.so.6
> > #2 0xb7f324cd in mi_listener (conn=0x9f7e9e8 "inet:8891", dbg=0,
> > smfi=0x9f81708, timeout=7210, backlog=128) at listener.c:766
> > #3 0xb7f32d57 in smfi_main () at main.c:242
> > #4 0x0805d617 in main (argc=7, argv=0xbfe79e93) at opendkim.c:14485
>
> This is libmilter's master thread, listening for new connections from MTAs.
>
> So this looks pretty normal, depending on whether or not you would expect
> the DNS to be answering fast enough that there shouldn't be this many
> threads waiting for answers all at once.

Its quite a light load so the number of threads, particulary 140+, is too
many.

These backtraces have been done after disabling the milter on the MTA and
reloading its settings. The opendkim process was left running.

> It might be interesting to see
> what's in the timeout variables for each of the waiting threads just in
> case they're somehow absurdly high values.

looks absurdly high to me.
Received on Sat Mar 19 2011 - 02:08:24 PST

This archive was generated by hypermail 2.3.0 : Mon Oct 29 2012 - 23:33:09 PST