Re: /etc/init.d/opendkim fails to start opendkim when run over ssh with a pseudo-tty

From: Sam Umbach <sumbach_at_gmail.com>
Date: Sun, 17 Apr 2011 20:22:58 -0400

I think I see what's happening now: the parent process exits and the
pty is torn down before the child process ever starts running (and has
a chance to close its file handles and call setsid()). Since the pty
is the child process' controlling terminal, the child is terminated.

I'm going to continue looking for a good reference implementation for
daemonize. Off the top of my head, I see a few possible solutions:
* don't terminate the parent process until it receives a signal from
the child indicating that setsid has succeeded
* call setsid in the parent process, before calling fork()
** This may not work: "The fork() has to come before the setsid() to
ensure that you aren't a process leader (the setsid() will fail if you
are)."
** This would dissociate the parent process from the tty. Since the
parent exits immediately following the fork, I don't know if this is a
problem.
** Why not do the redirection of stdin/out/err to /dev/null before fork(), too?
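
Here is a rough sketch of the first option, to make it concrete. It
assumes a pipe serves as the "signal" from child to parent. The name
fork_detached is hypothetical and nothing here is taken from the
opendkim source. The parent blocks in read() until the child has called
setsid() and redirected its descriptors, so the parent cannot exit (and
the pty cannot be torn down) while the child is still attached to it.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork, then hold the parent until the child reports
 * that setsid() succeeded. Returns 0 in the child, the child's pid in the
 * parent, and -1 on error. Assumes descriptors 0-2 are open on entry, so
 * both pipe ends are numbered 3 or higher.
 */
pid_t fork_detached(void)
{
    int fds[2];
    pid_t pid;

    if (pipe(fds) == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        char ok = 1;
        int devnull;

        close(fds[0]);

        /* The child is not a process group leader, so setsid() can succeed. */
        if (setsid() == -1)
            _exit(1);           /* parent sees EOF on the pipe */

        /* Replace stdin/stdout/stderr with /dev/null. */
        devnull = open("/dev/null", O_RDWR);
        if (devnull == -1)
            _exit(1);
        if (dup2(devnull, 0) == -1 || dup2(devnull, 1) == -1 ||
            dup2(devnull, 2) == -1)
            _exit(1);
        if (devnull > 2)
            close(devnull);

        /* Only now tell the parent it is safe to exit. */
        if (write(fds[1], &ok, 1) != 1)
            _exit(1);
        close(fds[1]);
        return 0;
    }

    /* Parent: wait for the one-byte "detached" message, or EOF on failure. */
    {
        char ok = 0;
        ssize_t n;

        close(fds[1]);
        do {
            n = read(fds[0], &ok, 1);
        } while (n == -1 && errno == EINTR);
        close(fds[0]);

        if (n != 1) {
            waitpid(pid, NULL, 0);  /* child exited without reporting; reap it */
            return -1;
        }
    }
    return pid;
}
```

A caller would then do something like "pid = fork_detached(); if (pid > 0)
_exit(0);" in place of the bare fork()/exit pair, and carry on with the
daemon's work when it returns 0. I have not yet tested whether this makes
the ssh -tt failure go away.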

What do you think? I would think others must've stumbled onto this
before, but I'm having a difficult time finding similar reports.

-Sam


On Sun, Apr 17, 2011 at 5:32 PM, Sam Umbach <sumbach_at_gmail.com> wrote:
> On Sun, Apr 17, 2011 at 4:51 PM, Murray S. Kucherawy <msk_at_blackops.org> wrote:
>> On Sun, 17 Apr 2011, Sam Umbach wrote:
>>>>
>>>> Does anything get logged during the failure to start?
>>>
>>> When the daemon fails to start, nothing is logged or written to STDERR or
>>> STDOUT.
>>
>> What about the syslog?
>
> I checked /var/log/*, nothing there.
>
>>> Based on my tests today, I strongly suspect there is a race condition when
>>> the daemon is started from a pty.  This affects both opendkim and dk-filter,
>>> and I have seen the same results on Ubuntu 10.04 (lucid), 10.10 (maverick),
>>> and 11.04 beta 2 (natty).  I'm not sure whether the issue lies in the
>>> opendkim and dk-filter executables, start-stop-daemon, or elsewhere.
>>
>> But what's the race condition?  My guess is the shell script allocates a pty
>> and (presumably) assigns it to descriptors 0, 1 and 2.  Then it forks and
>> execs opendkim, waiting for a return status.  The child process, opendkim,
>> operating in background and/or autorestart mode, almost immediately forks
>> again, and in the child it closes 0, 1 and 2 and replaces them with newly
>> opened files that are read-write to /dev/null, and calls setsid() to create
>> a new session, detaching it from a controlling terminal.  The pty is thus
>> closed, never used.  In autorestart *and* background mode, the fork/reopen
>> process is repeated.  Since all of the processes I'm talking about either
>> wait on each other or exit immediately, there's no race condition involved
>> that I can think of.  The only thing I can imagine is that the shell script
>> has some expectation having to do with the pty that opendkim isn't
>> satisfying.  If that's the case, I'd like to know what that is, because it's
>> something I've never heard of before.
>>
>> The only time the pty might get used is if during startup there's an attempt
>> to print some error condition, but that should be the exception and not the
>> rule.
>
> I've removed sudo and the /etc/init.d/opendkim script from the
> equation.  Starting the daemon directly over a pty fails about 10% of
> the time in my tests on Ubuntu 10.10 maverick using the following
> commands:
>
>    ssh -tt root_at_maverick '/etc/init.d/opendkim stop'
>    ssh -tt root_at_maverick 'pidof opendkim'
>    ssh -tt root_at_maverick '/usr/sbin/opendkim -x /etc/opendkim.conf -u opendkim -P /var/run/opendkim/opendkim.pid -p inet:8891_at_localhost'
>    ssh -tt root_at_maverick 'pidof opendkim'
>
> I would not conclude that the problem lies in opendkim yet.  I see the
> same behavior in dk-filter, and I wouldn't be surprised if other
> daemons behaved similarly.  At this point, I'm going to build a really
> simple daemon program and try to reproduce the behavior.  With that
> source code in hand I should be able to approach the Ubuntu team for
> assistance.
>
> -Sam
>
Received on Mon Apr 18 2011 - 00:23:12 PST
