Re: Excessive threads in opendkim-2.2.2 on Solaris 10

From: Murray S. Kucherawy <msk_at_blackops.org>
Date: Wed, 26 Jan 2011 00:55:48 -0800 (PST)

On Tue, 25 Jan 2011, Gary Mills wrote:
> Jan 25 15:54:53 electra opendkim[4975]: [ID 909925 mail.error] OpenDKIM Filter: thread_create() failed: 11, try again
> Jan 25 15:55:08 electra opendkim[4975]: [ID 909925 mail.error] OpenDKIM Filter: thread_create() failed: 11, abort
> Jan 25 15:55:16 electra opendkim[4975]: [ID 109917 mail.error] OpenDKIM Filter, mi_rd_cmd: read returned -1: Connection reset by peer
> Jan 25 15:55:16 electra opendkim[4975]: [ID 699540 mail.debug] p0PLOMoG008746: no signature data
> Jan 25 15:55:21 electra opendkim[4975]: [ID 109917 mail.error] OpenDKIM Filter, mi_rd_cmd: read returned -1: Connection reset by peer
> Jan 25 15:55:24 electra opendkim[4975]: [ID 555540 mail.info] OpenDKIM Filter v2.2.2 terminating with status -1, errno = 12
> Jan 25 15:55:24 electra opendkim[464]: [ID 482974 mail.info] OpenDKIM Filter v2.2.2 starting (args: -x /etc/mail/opendkim.conf)

This again shows thread exhaustion. It appears to be caused by the
asynchronous resolver becoming wedged: jobs that need DNS answers time
out waiting for them, and enough of a backlog of those forms that no
more threads can be allocated (libmilter creates a thread per SMTP
client). After thread creation fails enough times in a row, libmilter
exits the entire process.
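
To make the failure mode concrete, here is a minimal sketch of the
thread-per-connection pattern and the retry-then-abort behavior visible
in the log above. This is hypothetical illustration code, not
libmilter's actual implementation; EAGAIN is errno 11 on Solaris,
matching the log:

    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    static void *handle_smtp_client(void *conn)
    {
        /* verify/sign the message for this connection, then exit */
        (void) conn;
        return NULL;
    }

    /* Returns 0 on success, -1 if the caller should abort the filter. */
    int spawn_client_thread(void *conn, int max_tries)
    {
        pthread_t tid;
        int tries;

        for (tries = 1; tries <= max_tries; tries++)
        {
            int err = pthread_create(&tid, NULL, handle_smtp_client,
                                     conn);
            if (err == 0)
                return 0;
            fprintf(stderr, "thread_create() failed: %d, %s\n", err,
                    tries < max_tries ? "try again" : "abort");
            if (err != EAGAIN)
                break;          /* not a resource shortage; give up */
            sleep(1);           /* brief backoff; the backlog may drain */
        }
        return -1;
    }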

> I also have a partial stack trace of the existing process:
>
> # pstack 464 | head -44
> 464: opendkim -x /etc/mail/opendkim.conf
> ----------------- lwp# 1 / thread# 1 --------------------
> feafa795 pollsys (8036ffc, 1, 8036fb0, 0)
> fea9f8be poll (8036ffc, 1, 1388, 8075f11) + 52
> 0807598b mi_listener (80ade00, 0, 80acef8, 1c2a, 80, 1) + cf
> 08074d4b smfi_main (8047d74, feffb800, 0, 0, 0, 0) + 67
> 0806a885 main (3, 8047db8, 8047dc8, 8047dac) + 2b95
> 0805a00d _start (3, 8047e60, 8047e69, 8047e6c, 0, 8047e84) + 7d

This is the milter listener, awaiting new connections. Normal.

> ----------------- lwp# 2 / thread# 2 --------------------
> feafb1d5 writev (5, fe94ed04, 2)
> fef52be9 ar_sendquery (80ad880, 8c93b00) + 279
> fef54252 ar_dispatcher (80ad880) + 1162
> feaf73b7 _thr_setup (fe840200) + 4e
> feaf76a0 _lwp_start (fe840200, 0, 0, fe94eff8, feaf76a0, fe840200)

Now this is interesting. writev() is called when the asynchronous
resolver is in TCP mode and is trying to send data to the nameserver.
Let's assume you weren't simply lucky enough to catch it in the middle
of one of these, which normally completes very fast, and that it's
actually stuck there. That would easily explain how libar is jammed,
but it doesn't quite explain why, or what the right way to compensate
would be.

According to the Solaris man pages, writev() when used on a socket
actually behaves like send(). The send() man page says this:

"If the socket does not have enough buffer space available to hold a
message, the send() function blocks the message, unless the socket has
been placed in non-blocking I/O mode (see fcntl(2)). The select(3C) or
poll(2) call can be used to determine when it is possible to send more
data."

I suspect what's going on is this: Your system is busy with a number of
DKIM verification jobs in progress when one of them issues a DNS query
whose reply is too big to fit in a UDP packet. The asynchronous
resolver notices this and decides it's time to switch to TCP, so it
closes the UDP socket, opens a TCP socket, connects to the nameserver,
and then resends all pending queries. This would be fine except that it
doesn't check periodically using select() (or poll(), as you mentioned)
to see whether that new socket is still writable. Eventually it
overfills the socket buffer, and writev() blocks (because send()
blocks). Meanwhile the nameserver has jammed a bunch of replies down
the socket to be consumed, but since the resolver is blocked in
writev(), it will never read them, and the two end up wedging each
other as the socket buffer fills up on the nameserver side as well.
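
The fix is mechanical once stated this way: put the socket in
non-blocking mode (see fcntl(2)) and check writability with poll()
before each write, so a full buffer hands control back to the
dispatcher (which can then drain pending replies) instead of blocking
it forever. A minimal sketch under those assumptions, not the actual
libar code:

    #include <fcntl.h>
    #include <poll.h>
    #include <sys/uio.h>

    int send_when_writable(int fd, struct iovec *iov, int iovcnt,
                           int timeout_ms)
    {
        int flags;
        struct pollfd pfd;

        /* ensure non-blocking I/O so writev() can never wedge us */
        flags = fcntl(fd, F_GETFL, 0);
        if (flags != -1)
            (void) fcntl(fd, F_SETFL, flags | O_NONBLOCK);

        pfd.fd = fd;
        pfd.events = POLLOUT;
        pfd.revents = 0;
        if (poll(&pfd, 1, timeout_ms) <= 0)
            return -1;          /* not writable yet; retry later */

        return (int) writev(fd, iov, iovcnt);
    }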

writev() can also report that it wrote less data than was requested
when the buffer is nearly full, rather than blocking. In that case the
caller should resume the write later, when the socket becomes writable
again. I'm not sure we're dealing with that case properly either; it's
hard to tell whether that's happening for you because this report
doesn't show return values. If a partial write did occur, libar would
simply start over with the next query; what the nameserver then sees is
gibberish, and it may start ignoring the connection, which would also
cause clients to time out waiting for answers that will never come.
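
A caller that handles the short-write case has to treat the returned
count as "resume here", advancing through the iovec array rather than
restarting with the next query. A sketch of that bookkeeping, using a
hypothetical helper rather than libar's API:

    #include <errno.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    /* Write the entire iovec array, resuming after partial writes. */
    int writev_all(int fd, struct iovec *iov, int iovcnt)
    {
        while (iovcnt > 0)
        {
            ssize_t n = writev(fd, iov, iovcnt);
            if (n < 0)
            {
                if (errno == EINTR)
                    continue;
                return -1;      /* caller retries when writable */
            }
            /* consume n bytes from the front of the array */
            while (n > 0 && iovcnt > 0)
            {
                if ((size_t) n >= iov->iov_len)
                {
                    n -= (ssize_t) iov->iov_len;
                    iov++;
                    iovcnt--;
                }
                else
                {
                    iov->iov_base = (char *) iov->iov_base + n;
                    iov->iov_len -= (size_t) n;
                    n = 0;
                }
            }
        }
        return 0;
    }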

So this is definitely grounds for a patch to libar. I'll get one together
this week.

> ----------------- lwp# 3 / thread# 3 --------------------
> feafa795 pollsys (fe83ec70, 2, fe83ed18, 0)
> feaa3e36 pselect (a, fe83ef3c, fe83eebc, feb70300, fe83ed18, 0) + 18e
> feaa412c select (a, fe83ef3c, fe83eebc, 0, fe83eeb0, 0) + 82
> fef53543 ar_dispatcher (80be9c8) + 453
> feaf73b7 _thr_setup (fe840a00) + 4e
> feaf76a0 _lwp_start (fe840a00, 0, 0, fe83eff8, feaf76a0, fe840a00)

This is a little bothersome. How can ar_dispatcher() be in two places at
the same time? (In the previous thread, it's calling ar_sendquery() which
is stuck in writev()...) opendkim only ever instantiates it once.

> ----------------- lwp# 4 / thread# 4 --------------------
> feaf9b75 sigtimedwait (fe73fef0, fe73ff00, 0)
> feaec71d sigwait (fe73ffc0) + 1c
> feae4b82 __posix_sigwait (fe73ffc0, fe73ffd0) + 2e
> 0805c43b dkimf_reloader (0) + 4b
> feaf73b7 _thr_setup (fe841200) + 4e
> feaf76a0 _lwp_start (fe841200, 0, 0, fe73fff8, feaf76a0, fe841200)

opendkim thread blocked waiting for a signal of interest. Normal.

> ----------------- lwp# 5 / thread# 5 --------------------
> feaf9b75 sigtimedwait (fe640f00, fe640f10, 0)
> feaec71d sigwait (fe640fc0, 2, fe640fdc, feaf6ba7) + 1c
> 0807705b mi_signal_thread (80abeb0) + 4b
> feaf73b7 _thr_setup (fe841a00) + 4e
> feaf76a0 _lwp_start (fe841a00, 0, 0, fe640ff8, feaf76a0, fe841a00)

Same thing, except inside libmilter. (They watch for different signals.)
Also normal.

> ----------------- lwp# 96452 / thread# 96452 --------------------
> feaf7719 lwp_park (0, e3abec80, 0)
> feaf1efe cond_wait_queue (8ac7868, 8ac7878, e3abec80) + 5e
> feaf227b cond_wait_common (8ac7868, 8ac7878, e3abec80) + 1db
> feaf24ad _cond_timedwait (8ac7868, 8ac7878, e3abecf8) + 51
> feaf2518 cond_timedwait (8ac7868, 8ac7878, e3abecf8) + 24
> feaf2554 pthread_cond_timedwait (8ac7868, 8ac7878, e3abecf8) + 1e
> fef5590b ar_waitreply (80ad880, 8ac7840, e3abed3c, e3abee90, 0, 0) + 2db
> 0806fcf8 dkimf_ar_waitreply (80ad880, 8ac7840, e3abee90, e3ac13b8, 0, 8b4a110) + a8
> fef7b21b dkim_get_policy_dns (8b4a0e0, e3ac1c7f, 0, e3ac1833, 401, e3ac1c40) + 2bb

A thread waiting for a reply from the resolver, presumably one that
will never come. Normal, and I bet there were lots of these.
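
The wait in that stack is the classic condition-variable pattern; a
hypothetical sketch of it follows, not ar_waitreply() itself. If the
dispatcher is wedged, "answered" is never set and every such thread
sits here until its deadline passes:

    #include <pthread.h>
    #include <time.h>

    struct query
    {
        pthread_mutex_t lock;
        pthread_cond_t  cv;
        int             answered;   /* set by the dispatcher thread */
    };

    /* Block until the dispatcher signals a reply, or the absolute
     * deadline passes; returns 0 on reply, ETIMEDOUT otherwise. */
    int wait_reply(struct query *q, const struct timespec *deadline)
    {
        int ret = 0;

        pthread_mutex_lock(&q->lock);
        while (!q->answered && ret == 0)
            ret = pthread_cond_timedwait(&q->cv, &q->lock, deadline);
        pthread_mutex_unlock(&q->lock);
        return ret;
    }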

> `dccm' does use poll(), as does sendmail. I believe that FD_SETSIZE
> is defined to be 1024 for a 32-bit process. This could be the cause
> of the problem, although perhaps select() only fails when something
> else causes the thread count to grow.

I think select() and poll() quite possibly both end up calling
pollsys() underneath; the stack in thread 3 above (select() ->
pselect() -> pollsys()) suggests exactly that. I'll look for any
reports online about poll() being the better choice on Solaris.
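
For illustration, the difference that makes poll() attractive here (a
generic sketch, not dccm or sendmail code): select() operates on
fixed-size fd_set bitmaps, so a descriptor numbered at or above
FD_SETSIZE (1024 in a 32-bit process) can't be watched at all, while
poll() takes an explicit array of descriptors of any value:

    #include <poll.h>

    int wait_readable(int fd, int timeout_ms)
    {
        struct pollfd pfd;

        pfd.fd = fd;            /* valid even if fd >= FD_SETSIZE */
        pfd.events = POLLIN;
        pfd.revents = 0;
        return poll(&pfd, 1, timeout_ms);   /* >0 means readable */
    }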

-MSK