On Sun, Apr 17, 2011 at 4:51 PM, Murray S. Kucherawy <msk_at_blackops.org> wrote:
> On Sun, 17 Apr 2011, Sam Umbach wrote:
>>>
>>> Does anything get logged during the failure to start?
>>
>> When the daemon fails to start, nothing is logged or written to STDERR or
>> STDOUT.
>
> What about the syslog?
I checked /var/log/*, nothing there.
>> Based on my tests today, I strongly suspect there is a race condition when
>> the daemon is started from a pty. This affects both opendkim and dk-filter,
>> and I have seen the same results on Ubuntu 10.04 (lucid), 10.10 (maverick),
>> and 11.04 beta 2 (natty). I'm not sure whether the issue lies in the
>> opendkim and dk-filter executables, start-stop-daemon, or elsewhere.
>
> But what's the race condition? My guess is the shell script allocates a pty
> and (presumably) assigns it to descriptors 0, 1 and 2. Then it forks and
> execs opendkim, waiting for a return status. The child process, opendkim,
> operating in background and/or autorestart mode, almost immediately forks
> again, and in the child it closes 0, 1 and 2 and replaces them with newly
> opened files that are read-write to /dev/null, and calls setsid() to create
> a new session, detaching it from a controlling terminal. The pty is thus
> closed, never used. In autorestart *and* background mode, the fork/reopen
> process is repeated. Since all of the processes I'm talking about either
> wait on each other or exit immediately, there's no race condition involved
> that I can think of. The only thing I can imagine is that the shell script
> has some expectation having to do with the pty that opendkim isn't
> satisfying. If that's the case, I'd like to know what that is, because it's
> something I've never heard of before.
>
> The only time the pty might get used is if during startup there's an attempt
> to print some error condition, but that should be the exception and not the
> rule.
I've removed sudo and the /etc/init.d/opendkim script from the
equation. Starting the daemon directly over a pty fails about 10% of
the time in my tests on Ubuntu 10.10 maverick using the following
commands:
ssh -tt root@maverick '/etc/init.d/opendkim stop'
ssh -tt root@maverick 'pidof opendkim'
ssh -tt root@maverick '/usr/sbin/opendkim -x /etc/opendkim.conf -u opendkim -P /var/run/opendkim/opendkim.pid -p inet:8891@localhost'
ssh -tt root@maverick 'pidof opendkim'
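
To put a number on the "about 10%" figure, the stop/start/check sequence above can be wrapped in a loop. What follows is only a rough sketch, not the exact procedure used in these tests: the function name count_failures and its argument order are made up for illustration. It assumes the check command exits non-zero when the daemon is not running, which holds for pidof run through ssh, because ssh passes back the remote command's exit status.

```shell
#!/bin/sh
# Hypothetical helper (not part of opendkim or Ubuntu): run a start command
# repeatedly and count how often the daemon is NOT running afterwards.
#
# Usage: count_failures RUNS STOP_CMD START_CMD CHECK_CMD
# Example (substitute the ssh commands from this message as the strings):
#   count_failures 50 "<stop command>" "<start command>" "<check command>"
count_failures() {
    runs=$1
    stop_cmd=$2
    start_cmd=$3
    check_cmd=$4
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Make sure no instance is left over from the previous iteration.
        sh -c "$stop_cmd" >/dev/null 2>&1
        # Attempt to start the daemon; its own exit status is ignored here.
        sh -c "$start_cmd" >/dev/null 2>&1
        # A non-zero status from the check means the daemon did not come up.
        if ! sh -c "$check_cmd" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    # Print the number of failed starts out of RUNS attempts.
    echo "$failures"
}
```

Dividing the printed count by RUNS gives the failure rate. Running the same loop with and without -tt would show whether the pty is really the deciding factor.
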
I would not conclude that the problem lies in opendkim yet. I see the
same behavior in dk-filter, and I wouldn't be surprised if other
daemons behaved similarly. At this point, I'm going to build a really
simple daemon program and try to reproduce the behavior. With that
source code in hand I should be able to approach the Ubuntu team for
assistance.
-Sam
Received on Sun Apr 17 2011 - 21:32:45 PST