Re: /etc/init.d/opendkim fails to start opendkim when run over ssh with a pseudo-tty

From: Sam Umbach <sumbach_at_gmail.com>
Date: Sun, 17 Apr 2011 22:43:03 -0400

The problem is not related to the file descriptors at all. Here's the
sequence of events:

* connection with ssh established
* pty allocated
* shell started on pty
* /etc/init.d/opendkim start executed, starting the parent process on the pty
* parent calls fork(), creating the child process
* parent process exits
* shell exits
* ssh connection is torn down, pty deallocated
* child process is still part of the session associated with the pty,
receives SIGHUP, and terminates

This is a race condition. If the child process gets far enough to run
setsid() before the pty is deallocated, the child process lives on.
Unfortunately, the child process doesn't get this far in some
percentage of attempts, which is why launching the daemon over ssh
with a pty sometimes works and sometimes fails.
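
To make the window explicit, here is a minimal sketch of the conventional
fork-then-setsid() pattern. It is not opendkim's actual code: the function
name spawn_racy() is hypothetical, and the parent returns instead of
exiting so the snippet can be called from a test program.

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper illustrating the race; not taken from opendkim.
 * Returns the child's PID in the parent, or -1 if fork() fails.
 * Never returns in the child. */
pid_t spawn_racy(void)
{
    pid_t pid = fork();
    if (pid != 0) {
        /* Parent (pid > 0) or fork failure (pid == -1). A real daemon's
         * parent would exit right here, without waiting for the child. */
        return pid;
    }

    /* Child. RACE WINDOW: until setsid() returns, this process still
     * belongs to the session attached to the pty, so a SIGHUP sent
     * while the pty is torn down can kill it. */
    if (setsid() < 0)
        _exit(1);

    /* From here on the child is in its own session; the daemon's real
     * work would start here. */
    _exit(0);
}
```

Nothing in this pattern orders the parent's exit relative to the child's
setsid() call, which is why the outcome varies from run to run.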

I see a few ways out:
* parent process does not exit until the child has run setsid(). This
could be indicated by the child sending the parent process a signal.
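* parent process calls setsid() before fork() -- will not work if the
parent process is a process group leader (setsid() fails with EPERM in
that case, and a session leader is always a process group leader),
probably not a good solution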
* parent process ignores SIGHUP before fork(), so the child inherits
the ignored disposition. There is a small possibility that the parent
process would miss a SIGHUP signal, but since the parent terminates
shortly after the fork() call, this shouldn't be a problem.
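
A minimal sketch of the first option follows. It swaps the signal for a
pipe: the child writes one byte after setsid(), and the parent blocks on
read() until that byte arrives. A pipe has the side benefit that read()
returns end-of-file if the child dies before reporting. The name
fork_new_session() is hypothetical and this is not opendkim's code; the
parent returns the child's PID rather than exiting so the function can be
exercised from a test.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not taken from opendkim.
 * Returns 0 in the child (already in its own session), the child's PID
 * in the parent once the child has confirmed setsid(), or -1 on failure. */
pid_t fork_new_session(void)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: leave the pty's session, then report the result. */
        close(fds[0]);
        char ok = (setsid() >= 0) ? 1 : 0;
        ssize_t n;
        do {
            n = write(fds[1], &ok, 1);
        } while (n < 0 && errno == EINTR);
        close(fds[1]);
        if (!ok || n != 1)
            _exit(1);
        return 0;
    }

    /* Parent: block until the child reports, or until end-of-file if
     * the child died before it could write. */
    close(fds[1]);
    char ok = 0;
    ssize_t n;
    do {
        n = read(fds[0], &ok, 1);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);

    if (n != 1 || !ok) {
        waitpid(pid, NULL, 0);  /* child exited or was killed; reap it */
        return -1;
    }
    return pid;
}
```

In a daemon, the parent would call _exit(0) as soon as this returns a
positive PID, and the child would carry on with the usual descriptor
cleanup. The parent can then no longer exit, and the pty can no longer be
torn down on its account, before the child has left the session.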

The daemonize code in opendkim is completely in line with all the
recommendations I've found, which makes me think this race condition
may be an issue for many daemons. I can't be the first person to run
into this issue, but I have not yet found another report of it.

-Sam


On Sun, Apr 17, 2011 at 9:51 PM, Murray S. Kucherawy <msk_at_blackops.org> wrote:
> On Sun, 17 Apr 2011, Sam Umbach wrote:
>>
>> I think I see what's happening now: the parent process exits and the pty
>> is torn down before the child process ever starts running (and has a chance
>> to close its file handles and call setsid()).  Since the pty is the child
>> process' controlling terminal, the child is terminated.
>
> Since fork() clones all descriptors, why would a close() of them in the
> parent (implicit or explicit) cause the tear-down of the pty?  There's still
> a descriptor open to it in the child.
>
Received on Mon Apr 18 2011 - 02:43:16 PST
