RE: Opendkim number of threads growing abnormaly under Solaris 10

From: Murray S. Kucherawy <msk_at_cloudmark.com>
Date: Wed, 13 Oct 2010 09:51:45 -0700

> -----Original Message-----
> From: Gary Mills [mailto:mills_at_cc.umanitoba.ca]
> Sent: Wednesday, October 13, 2010 5:24 AM
> To: Murray S. Kucherawy
> Cc: Christian.Pelissier_at_onera.fr; opendkim-dev_at_lists.opendkim.org
> Subject: Re: Opendkim number of threads growing abnormaly under Solaris 10
>
> You probably missed all the stack traces that Christian sent earlier.
> Here's one for one thread:
>
> ----------------- lwp# 1108 / thread# 1108 --------------------
> febc9ba7 recvfrom (16, fdb42080, 2000, 0, fdb3f8a0, fdb3f9c4)
> fef5d457 send_dg (fef77aa8, fdb40010, 19, fdb42080, 2000, fdb3ffc4) + 1ed
> fef5ca49 res_nsend (fef77aa8, fdb40010, 19, fdb42080, 2000) + 468
> fef57470 res_send (fdb40010, 19, fdb42080, 2000, 1f597, 100) + 46
> fef85398 dkim_res_query (0, f, 80be400, fdb42080, 2000, fdb4206c) + 58
> fef8699e dkim_get_policy_dns_excheck (80ce9c0, 80be400, fdb4a79c, fef86d10, 0, 0) + 1eb
> fef86e1e dkim_get_policy_dns (80ce9c0, 80be400, 1, fdb4abb0, 401, fdb4a79c) + 11f
> fef8c491 dkim_get_policy (80ce9c0, 80be400, 1, fdb4b004, fdb4b008, fdb4b00c) + 1b5
> fef8f47e dkim_policy (80ce9c0, 80cf0c4, 0, 2, 0, 0) + c4
> 0805cf4b mlfi_eom (80aea00) + ae1
> fef14df7 st_bodyend (fdb4df70) + 97
> fef139e4 mi_engine (80aea00) + 214
> fef1634b mi_handle_session (80aea00) + 33
> fef15a06 mi_thread_handle_wrapper (80aea00) + 19
> febc73a7 _thr_setup (fea87200) + 4e
> febc7690 _lwp_start (fea87200, 0, 0, fdb4dff8, febc7690, fea87200)

This confirms that it's stuck waiting for a DNS reply. There are many possible causes:

- the resolver against which the filter was linked is not thread-safe. For example, two threads T1 and T2 send queries Q1 and Q2 and then go to sleep waiting for the replies. T2 receives the answer to Q1, notes that it's not the one it wants, and discards it; T2 then gets the answer it does want and returns. Meanwhile T1 will wait for a long time, possibly forever, for an answer that will now never come

- the resolver can't hear answers from the nameserver because of a misconfigured firewall or packet filter

- the nameserver isn't getting answers fast enough for some reason
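
To illustrate the first cause: a resolver pairs each reply with its query using the 16-bit ID in the first two bytes of the DNS header. The following is only a minimal sketch of that check, not code from the Solaris resolver or from OpenDKIM; the name reply_matches_query() is hypothetical. If two threads share one socket, T2 can read Q1's reply, fail this check, and drop the packet, leaving T1 with nothing to receive.

```c
#include <assert.h>
#include <stddef.h>

#define DNS_HEADER_LEN 12   /* fixed size of a DNS message header, in bytes */

/*
 * Hypothetical helper: return 1 if `reply` looks like the response to
 * `query`, 0 otherwise.  Only the header is examined: the QR bit must
 * mark the packet as a response and the 16-bit message IDs must match.
 */
int reply_matches_query(const unsigned char *query, size_t query_len,
                        const unsigned char *reply, size_t reply_len)
{
    assert(query != NULL && reply != NULL);

    /* Too short to contain a full DNS header. */
    if (query_len < DNS_HEADER_LEN || reply_len < DNS_HEADER_LEN)
        return 0;

    /* QR is the top bit of the third header byte; 0 means "query". */
    if ((reply[2] & 0x80) == 0)
        return 0;

    /* The message ID occupies the first two header bytes. */
    return query[0] == reply[0] && query[1] == reply[1];
}
```

A thread-safe resolver has to ensure that a reply failing this check for one thread is still delivered to the thread that is waiting for it, for instance by giving each query its own socket.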

I mentioned in a previous reply that although the configuration file shows "DNSTimeout 30", that timeout cannot actually be enforced with the stock system resolver, since the res_*() functions don't take timeout parameters.
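
For comparison, here is roughly what a caller-enforced timeout looks like when the code owns the socket itself. This is a generic sketch using poll(), not how any particular resolver library implements it; recv_with_timeout() is a hypothetical name. The thread in the trace above sits in recvfrom() inside res_send(), where the filter has no way to apply a limit like this.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: wait up to timeout_ms milliseconds for data on fd,
 * then read it.  Returns the number of bytes received, or -1 with errno
 * set (ETIMEDOUT if nothing arrived in time).
 */
ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(timeout_ms >= 0);    /* a negative value makes poll() block forever */

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Simplification: a retry after EINTR restarts the full timeout. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc == -1 && errno == EINTR);

    if (rc == 0) {              /* no data arrived before the deadline */
        errno = ETIMEDOUT;
        return -1;
    }
    if (rc == -1)               /* poll() itself failed; errno is already set */
        return -1;

    return recv(fd, buf, len, 0);
}
```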

I suggested trying "--enable-arlib" or "--with-unbound", both of which do have timeout facilities and are definitely thread-safe.
Received on Wed Oct 13 2010 - 16:51:55 PST
