RE: Excessive threads in opendkim-2.2.2 on Solaris 10

From: Murray S. Kucherawy <msk_at_cloudmark.com>
Date: Wed, 26 Jan 2011 16:56:16 -0800

> -----Original Message-----
> From: Gary Mills [mailto:mills_at_cc.umanitoba.ca]
> Sent: Wednesday, January 26, 2011 4:50 PM
> To: Murray S. Kucherawy
> Cc: opendkim-dev_at_lists.opendkim.org
> Subject: Re: Excessive threads in opendkim-2.2.2 on Solaris 10
>
> > I think my first plan of attack will be to add compile-time support
> > for poll() instead of select(), and you should be able to see if
> > that helps or not. Second will be more robust handling of a partial
> > writev() and doing more frequent checking to see if the descriptor
> > can't accept any more data.
>
> That sounds good to me. I don't know why the nameserver would stop
> accepting queries, but I suppose it could be overloaded. This happens
> occasionally when our Internet connection is down, so that the
> nameserver can't resolve recursive queries. They eventually time out,
> as do the clients. Any query will need a timeout of some sort.

I have the writev() restart part done and ready to test. Adding poll() support is turning out to be trickier than I had anticipated, but it's in progress now.
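
For readers following along, restarting a partial writev() generally looks something like the sketch below. This is only an illustration, not the code in the tree: the helper name writev_all() is made up for this example, and it assumes a blocking descriptor (on a non-blocking one the caller would still have to deal with EAGAIN).

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of opendkim): write every byte described
 * by iov[0..iovcnt-1], restarting after short writes and EINTR.
 * The iov array is modified in place.
 * Returns 0 on success, -1 on error with errno set by writev().
 */
static int
writev_all(int fd, struct iovec *iov, int iovcnt)
{
	while (iovcnt > 0)
	{
		ssize_t n = writev(fd, iov, iovcnt);

		if (n < 0)
		{
			if (errno == EINTR)
				continue;	/* interrupted; try again */
			return -1;
		}

		/* skip over vectors that were written completely */
		while (iovcnt > 0 && (size_t) n >= iov->iov_len)
		{
			n -= iov->iov_len;
			iov++;
			iovcnt--;
		}

		/* step into a vector that was only partly written */
		if (iovcnt > 0 && n > 0)
		{
			iov->iov_base = (char *) iov->iov_base + n;
			iov->iov_len -= n;
		}
	}

	return 0;
}
```

The point is that a short return from writev() is not an error; the caller has to work out which vectors were consumed and resubmit the remainder.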

> In the one from this morning, there were 99 instances of mi_rd_cmd().
> The highest file descriptor was 0x23b or 571 decimal. That's close
> enough to the limit to be worrisome.

Yep, I agree. Use of select() on Solaris might be a problem, since select() can't watch a descriptor numbered FD_SETSIZE or higher. Historically FD_SETSIZE was 1024 or maybe less. On modern BSD systems it is actually configurable at compile time.
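
To illustrate why poll() sidesteps this: select() uses a fixed-size bitmap indexed by descriptor number, while poll() takes an array of descriptors, so the numeric value of the descriptor doesn't matter. A rough sketch of a poll()-based wait follows. The helper name wait_readable() and the millisecond timeout parameter are assumptions for this example only, not what the library does.

```c
#include <poll.h>

/*
 * Hypothetical helper (not part of opendkim): wait up to timeout_ms
 * milliseconds for fd to become readable.
 * Returns 1 if a read() would not block (data, EOF or error pending),
 * 0 on timeout, -1 if poll() failed or fd is not an open descriptor.
 */
static int
wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd;
	int status;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	status = poll(&pfd, 1, timeout_ms);
	if (status <= 0)
		return status;		/* 0 = timed out, -1 = poll() error */

	if (pfd.revents & POLLNVAL)
		return -1;		/* descriptor is not open */

	return 1;
}
```

A caller would typically retry when this returns -1 with errno set to EINTR, and treat 0 as the query timing out.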

> There are a couple of TCP connections to the nameserver:
>
> 5: S_IFSOCK mode:0666 dev:287,0 ino:2144 uid:0 gid:0 size:0
> O_RDWR
> SOCK_STREAM
> SO_SNDBUF(49152),SO_RCVBUF(49152),IP_NEXTHOP(0.192.0.0)
> sockname: AF_INET 127.0.0.1 port: 60980
> peername: AF_INET 127.0.0.1 port: 53
> 8: S_IFSOCK mode:0666 dev:287,0 ino:49858 uid:0 gid:0 size:0
> O_RDWR
> SOCK_STREAM
> SO_SNDBUF(49152),SO_RCVBUF(49152),IP_NEXTHOP(0.192.0.0)
> sockname: AF_INET 127.0.0.1 port: 37772
> peername: AF_INET 127.0.0.1 port: 53
>
> I didn't see any UDP connections, but they would be transient.
> Are the TCP connections persistent, or did they just happen to get
> captured in the output?

In UDP mode you'd see one UDP socket open constantly, which is used to send queries to the nameserver. In TCP mode one connection is established and stays up for the lifetime of the process (i.e., it never downgrades).

I'm at a loss to explain why you see two of them. It's too bad lsof can't tell you where in the source code the second one was allocated. It could, I suppose, be one that wasn't closed when the library decided to give up and try again after too many I/O errors.

Received on Thu Jan 27 2011 - 00:56:25 PST