Re: /etc/init.d/opendkim fails to start opendkim when run over ssh with a pseudo-tty

From: Sam Umbach <sumbach_at_gmail.com>
Date: Sun, 17 Apr 2011 17:32:32 -0400

On Sun, Apr 17, 2011 at 4:51 PM, Murray S. Kucherawy <msk_at_blackops.org> wrote:
> On Sun, 17 Apr 2011, Sam Umbach wrote:
>>>
>>> Does anything get logged during the failure to start?
>>
>> When the daemon fails to start, nothing is logged or written to STDERR or
>> STDOUT.
>
> What about the syslog?

I checked everything under /var/log/*; nothing was logged there either.

>> Based on my tests today, I strongly suspect there is a race condition when
>> the daemon is started from a pty.  This affects both opendkim and dk-filter,
>> and I have seen the same results on Ubuntu 10.04 (lucid), 10.10 (maverick),
>> and 11.04 beta 2 (natty).  I'm not sure whether the issue lies in the
>> opendkim and dk-filter executables, start-stop-daemon, or elsewhere.
>
> But what's the race condition?  My guess is the shell script allocates a pty
> and (presumably) assigns it to descriptors 0, 1 and 2.  Then it forks and
> execs opendkim, waiting for a return status.  The child process, opendkim,
> operating in background and/or autorestart mode, almost immediately forks
> again, and in the child it closes 0, 1 and 2 and replaces them with newly
> opened files that are read-write to /dev/null, and calls setsid() to create
> a new session, detaching it from a controlling terminal.  The pty is thus
> closed, never used.  In autorestart *and* background mode, the fork/reopen
> process is repeated.  Since all of the processes I'm talking about either
> wait on each other or exit immediately, there's no race condition involved
> that I can think of.  The only thing I can imagine is that the shell script
> has some expectation having to do with the pty that opendkim isn't
> satisfying.  If that's the case, I'd like to know what that is, because it's
> something I've never heard of before.
>
> The only time the pty might get used is if during startup there's an attempt
> to print some error condition, but that should be the exception and not the
> rule.

I've removed sudo and the /etc/init.d/opendkim script from the
equation. Starting the daemon directly over a pty fails about 10% of
the time in my tests on Ubuntu 10.10 maverick using the following
commands:

    ssh -tt root@maverick '/etc/init.d/opendkim stop'
    ssh -tt root@maverick 'pidof opendkim'
    ssh -tt root@maverick '/usr/sbin/opendkim -x /etc/opendkim.conf -u opendkim -P /var/run/opendkim/opendkim.pid -p inet:8891@localhost'
    ssh -tt root@maverick 'pidof opendkim'
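
For anyone who wants to measure the failure rate rather than eyeball it, the stop/start/check cycle above could be looped along these lines. This is only a rough sketch: the helper names (ssh_tty, count_failures) are made up for illustration, and it assumes key-based root login to the test host so that ssh never prompts.

```python
import subprocess

HOST = "root@maverick"  # test host used in the commands above
START_CMD = ("/usr/sbin/opendkim -x /etc/opendkim.conf -u opendkim "
             "-P /var/run/opendkim/opendkim.pid -p inet:8891@localhost")


def ssh_tty(command, host=HOST):
    """Run command on host over ssh with a forced pty; return its exit status."""
    return subprocess.call(["ssh", "-tt", host, command],
                           stdout=subprocess.DEVNULL)


def count_failures(trials, run=ssh_tty):
    """Stop, start and check the daemon `trials` times; count failed starts."""
    failures = 0
    for _ in range(trials):
        run("/etc/init.d/opendkim stop")
        run(START_CMD)
        # pidof exits non-zero when no matching process is running.
        if run("pidof opendkim") != 0:
            failures += 1
    return failures
```

Calling count_failures(100) returns the number of starts after which pidof found no opendkim process. The run argument is there only so the loop can be tried against a stub instead of a live host.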

I would not conclude that the problem lies in opendkim yet. I see the
same behavior in dk-filter, and I wouldn't be surprised if other
daemons behaved similarly. At this point, I'm going to build a really
simple daemon program and try to reproduce the behavior. With that
source code in hand I should be able to approach the Ubuntu team for
assistance.
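
A test daemon of that kind might look roughly like the following. It is a minimal sketch, not opendkim's actual code: it does a single fork and then follows the sequence Murray describes above (repoint descriptors 0, 1 and 2 at /dev/null, then call setsid()). The function name launch, the pidfile argument and the sleep standing in for real work are all assumptions made for illustration.

```python
import os
import time


def launch(pidfile, lifetime=60.0):
    """Fork a detached child and return its pid to the caller.

    The child never returns from this function.
    """
    pid = os.fork()
    if pid > 0:
        return pid  # parent: a real daemon launcher would simply exit here

    try:
        # Child: point descriptors 0, 1 and 2 at /dev/null, dropping the pty.
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)
        if devnull > 2:
            os.close(devnull)

        # New session, so the child has no controlling terminal.
        os.setsid()

        with open(pidfile, "w") as f:
            f.write("%d\n" % os.getpid())

        time.sleep(lifetime)  # stand-in for the daemon's real work
    finally:
        os._exit(0)  # never fall back into the caller's code
```

Run over ssh -tt in place of opendkim, a program built on this would show whether the pid file and process survive the pty being torn down, without any opendkim-specific code involved.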

-Sam
Received on Sun Apr 17 2011 - 21:32:45 PST
