CPU spin in opendkim-2.2.0.Beta6

From: Gary Mills <mills_at_cc.umanitoba.ca>
Date: Wed, 6 Oct 2010 10:27:06 -0500

I had opendkim-2.2.0.Beta6 running for about ten days on my test
system with very little traffic except for a few spammers. Then I got
a report that it was using 100% of a CPU, and had been for several
days. I don't know exactly when this behavior started. Maybe this
one has been fixed in a later version.

`truss -d' showed it constanly repeating pollsys():

    /2: 0.7097 pollsys(0xFE94EC90, 2, 0xFE94ED18, 0x00000000) = 0
    /2: 0.7098 pollsys(0xFE94EC90, 2, 0xFE94ED18, 0x00000000) = 0
    /2: 0.7098 pollsys(0xFE94EC90, 2, 0xFE94ED18, 0x00000000) = 0
    /2: 0.7099 pollsys(0xFE94EC90, 2, 0xFE94ED18, 0x00000000) = 0
    /2: 0.7099 pollsys(0xFE94EC90, 2, 0xFE94ED18, 0x00000000) = 0

The numbers are in fractions of a second. Here's a stack trace
showing all the threads:

    7335: opendkim -x /etc/mail/opendkim.conf
    ----------------- lwp# 1 / thread# 1 --------------------
     feafa795 pollsys (8037008, 1, 8036fc0, 0)
     fea9f8be poll (8037008, 1, 1388, 8075eb1) + 52
     0807592b mi_listener (80adbd0, 0, 80ac890, 1c2a, 80, 1) + cf
     08074ceb smfi_main (8047d74, feffb800, 0, 0, 0, 0) + 67
     0806a861 main (3, 8047db8, 8047dc8, 8047dac) + 2b81
     0805a02d _start (3, 8047e60, 8047e69, 8047e6c, 0, 8047e84) + 7d
    ----------------- lwp# 2 / thread# 2 --------------------
     feafa795 __pollsys (fe94ec90, 2, fe94ed18, 0) + 15
     feaa3e36 pselect (7, fe94ef40, fe94eec0, feb70300, fe94ed18, 0) + 18e
     feaa412c select (7, fe94ef40, fe94eec0, 0, fe94eeb8, 0) + 82
     fef5335f ar_dispatcher (80b07b8) + 39f
     feaf73b7 _thr_setup (fe840200) + 4e
     feaf76a0 _lwp_start (fe840200, 0, 0, fe94eff8, feaf76a0, fe840200)
    ----------------- lwp# 3 / thread# 3 --------------------
     feafa795 pollsys (fe83ec70, 2, 0, 0)
     feaa3e36 pselect (a, fe83ef40, fe83eec0, feb70300, 0, 0) + 18e
     feaa412c select (a, fe83ef40, fe83eec0, 0, 0, 0) + 82
     fef5335f ar_dispatcher (80c1b90) + 39f
     feaf73b7 _thr_setup (fe840a00) + 4e
     feaf76a0 _lwp_start (fe840a00, 0, 0, fe83eff8, feaf76a0, fe840a00)
    ----------------- lwp# 4 / thread# 4 --------------------
     feaf9b75 sigtimedwait (fe73fef0, fe73ff00, 0)
     feaec71d sigwait (fe73ffc0) + 1c
     feae4b82 __posix_sigwait (fe73ffc0, fe73ffd0) + 2e
     0805c3eb dkimf_reloader (0) + 4b
     feaf73b7 _thr_setup (fe841200) + 4e
     feaf76a0 _lwp_start (fe841200, 0, 0, fe73fff8, feaf76a0, fe841200)
    ----------------- lwp# 5 / thread# 5 --------------------
     feaf9b75 sigtimedwait (fe640f00, fe640f10, 0)
     feaec71d sigwait (fe640fc0, 2, fe640fdc, feaf6ba7) + 1c
     08076ffb mi_signal_thread (80abae0) + 4b
     feaf73b7 _thr_setup (fe841a00) + 4e
     feaf76a0 _lwp_start (fe841a00, 0, 0, fe640ff8, feaf76a0, fe841a00)
    ----------------- lwp# 211 / thread# 211 --------------------
     feaf7719 lwp_park (0, fe536c00, 0)
     feaf1efe cond_wait_queue (80c9c58, 80c9c68, fe536c00) + 5e
     feaf227b cond_wait_common (80c9c58, 80c9c68, fe536c00) + 1db
     feaf24ad _cond_timedwait (80c9c58, 80c9c68, fe536c7c) + 51
     feaf2518 cond_timedwait (80c9c58, 80c9c68, fe536c7c) + 24
     feaf2554 pthread_cond_timedwait (80c9c58, 80c9c68, fe536c7c) + 1e
     fef5568b ar_waitreply (80b07b8, 80c9c30, fe536cc0, fe53cd10, 0, 0) + 2db
     0806fcc8 dkimf_ar_waitreply (80b07b8, 80c9c30, fe53cd10, fe53cd48, 0, 0) + a8
     fef7abc1 dkim_get_policy_dns_excheck (80c8f10, 80c0088, fe53fc40) + 4f1
     fef7af83 dkim_get_policy_dns (80c8f10, 80c0088, 1, fe53f833, 401, fe53fc40) + 1a3
     fef82a12 dkim_get_policy (80c8f10, 80c0088, 1, fe53fd8c, fe53fd80, fe53fd88) + 282
     fef8712d dkim_policy (80c8f10, 80c6c80, 0) + 11d
     0806686d mlfi_eom (80b47a8, a, fe541f68, 8077805) + 109d
     08077aa1 mi_engine (80b47a8, 80b47a8, fe541fb8, 80760df) + 2f9
     080760fa mi_handle_session (80b47a8, 0, fe842200, feb6e000, feaf5d23, feb6e000) + 2e
     080756f2 mi_thread_handle_wrapper (80b47a8) + e
     feaf73b7 _thr_setup (fe842200) + 4e
     feaf76a0 _lwp_start (fe842200, 0, 0, fe541ff8, feaf76a0, fe842200)

It seems to be thread #2 that's spinning. I don't know what triggered
this behavior. When I restarted the milter, it went back to normal.
I'll check the logs. If anything's related, I'll let you know.

-- 
-Gary Mills-        -Unix Group-        -Computer and Network Services-
Received on Wed Oct 06 2010 - 15:27:20 PST

This archive was generated by hypermail 2.3.0 : Mon Oct 29 2012 - 23:32:54 PST