I had opendkim-2.2.0.Beta6 running for about ten days on my test
system with very little traffic except for a few spammers. Then I got
a report that it was using 100% of a CPU, and had been for several
days. I don't know exactly when this behavior started. Maybe this
one has been fixed in a later version.
`truss -d' showed it constanly repeating pollsys():
/2: 0.7097 pollsys(0xFE94EC90, 2, 0xFE94ED18, 0x00000000) = 0
/2: 0.7098 pollsys(0xFE94EC90, 2, 0xFE94ED18, 0x00000000) = 0
/2: 0.7098 pollsys(0xFE94EC90, 2, 0xFE94ED18, 0x00000000) = 0
/2: 0.7099 pollsys(0xFE94EC90, 2, 0xFE94ED18, 0x00000000) = 0
/2: 0.7099 pollsys(0xFE94EC90, 2, 0xFE94ED18, 0x00000000) = 0
The numbers are in fractions of a second. Here's a stack trace
showing all the threads:
7335: opendkim -x /etc/mail/opendkim.conf
----------------- lwp# 1 / thread# 1 --------------------
feafa795 pollsys (8037008, 1, 8036fc0, 0)
fea9f8be poll (8037008, 1, 1388, 8075eb1) + 52
0807592b mi_listener (80adbd0, 0, 80ac890, 1c2a, 80, 1) + cf
08074ceb smfi_main (8047d74, feffb800, 0, 0, 0, 0) + 67
0806a861 main (3, 8047db8, 8047dc8, 8047dac) + 2b81
0805a02d _start (3, 8047e60, 8047e69, 8047e6c, 0, 8047e84) + 7d
----------------- lwp# 2 / thread# 2 --------------------
feafa795 __pollsys (fe94ec90, 2, fe94ed18, 0) + 15
feaa3e36 pselect (7, fe94ef40, fe94eec0, feb70300, fe94ed18, 0) + 18e
feaa412c select (7, fe94ef40, fe94eec0, 0, fe94eeb8, 0) + 82
fef5335f ar_dispatcher (80b07b8) + 39f
feaf73b7 _thr_setup (fe840200) + 4e
feaf76a0 _lwp_start (fe840200, 0, 0, fe94eff8, feaf76a0, fe840200)
----------------- lwp# 3 / thread# 3 --------------------
feafa795 pollsys (fe83ec70, 2, 0, 0)
feaa3e36 pselect (a, fe83ef40, fe83eec0, feb70300, 0, 0) + 18e
feaa412c select (a, fe83ef40, fe83eec0, 0, 0, 0) + 82
fef5335f ar_dispatcher (80c1b90) + 39f
feaf73b7 _thr_setup (fe840a00) + 4e
feaf76a0 _lwp_start (fe840a00, 0, 0, fe83eff8, feaf76a0, fe840a00)
----------------- lwp# 4 / thread# 4 --------------------
feaf9b75 sigtimedwait (fe73fef0, fe73ff00, 0)
feaec71d sigwait (fe73ffc0) + 1c
feae4b82 __posix_sigwait (fe73ffc0, fe73ffd0) + 2e
0805c3eb dkimf_reloader (0) + 4b
feaf73b7 _thr_setup (fe841200) + 4e
feaf76a0 _lwp_start (fe841200, 0, 0, fe73fff8, feaf76a0, fe841200)
----------------- lwp# 5 / thread# 5 --------------------
feaf9b75 sigtimedwait (fe640f00, fe640f10, 0)
feaec71d sigwait (fe640fc0, 2, fe640fdc, feaf6ba7) + 1c
08076ffb mi_signal_thread (80abae0) + 4b
feaf73b7 _thr_setup (fe841a00) + 4e
feaf76a0 _lwp_start (fe841a00, 0, 0, fe640ff8, feaf76a0, fe841a00)
----------------- lwp# 211 / thread# 211 --------------------
feaf7719 lwp_park (0, fe536c00, 0)
feaf1efe cond_wait_queue (80c9c58, 80c9c68, fe536c00) + 5e
feaf227b cond_wait_common (80c9c58, 80c9c68, fe536c00) + 1db
feaf24ad _cond_timedwait (80c9c58, 80c9c68, fe536c7c) + 51
feaf2518 cond_timedwait (80c9c58, 80c9c68, fe536c7c) + 24
feaf2554 pthread_cond_timedwait (80c9c58, 80c9c68, fe536c7c) + 1e
fef5568b ar_waitreply (80b07b8, 80c9c30, fe536cc0, fe53cd10, 0, 0) + 2db
0806fcc8 dkimf_ar_waitreply (80b07b8, 80c9c30, fe53cd10, fe53cd48, 0, 0) + a8
fef7abc1 dkim_get_policy_dns_excheck (80c8f10, 80c0088, fe53fc40) + 4f1
fef7af83 dkim_get_policy_dns (80c8f10, 80c0088, 1, fe53f833, 401, fe53fc40) + 1a3
fef82a12 dkim_get_policy (80c8f10, 80c0088, 1, fe53fd8c, fe53fd80, fe53fd88) + 282
fef8712d dkim_policy (80c8f10, 80c6c80, 0) + 11d
0806686d mlfi_eom (80b47a8, a, fe541f68, 8077805) + 109d
08077aa1 mi_engine (80b47a8, 80b47a8, fe541fb8, 80760df) + 2f9
080760fa mi_handle_session (80b47a8, 0, fe842200, feb6e000, feaf5d23, feb6e000) + 2e
080756f2 mi_thread_handle_wrapper (80b47a8) + e
feaf73b7 _thr_setup (fe842200) + 4e
feaf76a0 _lwp_start (fe842200, 0, 0, fe541ff8, feaf76a0, fe842200)
It seems to be thread #2 that's spinning. I don't know what triggered
this behavior. When I restarted the milter, it went back to normal.
I'll check the logs. If anything's related, I'll let you know.
--
-Gary Mills- -Unix Group- -Computer and Network Services-
Received on Wed Oct 06 2010 - 15:27:20 PST