Saturday, August 29, 2009

Know What You Inherit

I ran into a couple of bugs recently that took longer than they should have to fix. Like all bugs, the cause seems obvious now that I know what was wrong, but I think I spent a total of 5-6 hours investigating and fixing them across multiple platforms (hey, this is what makes developing software fun!). Here's hoping that those fancy search engines will help someone else find this post and save them time in case they have a similar problem.

I was working on some code that launched another process. This is a fairly mundane task. Interestingly though, the process launched went on to kill its parent and re-launch another copy of the parent. So, process A is running and launches process B, which then kills A and launches A'. Sounds kinda simple, right?

Well, everything seemed to work until I noticed that some resources needed by A' weren't available (ports/files). That seemed odd, given that A had been using those same resources successfully, and those should now be available since A was gone. The title of the post gives it away, but the problem is that a fork causes child processes to inherit the file descriptors/handles of the parent, so the files/ports opened by A were inherited by B and subsequently by A'. This meant that they weren't really cleaned up by the OS when A exited, because B (and later A') still held open copies of them. Hence A' could not re-open those resources (some were locked for exclusive use).
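To make the effect concrete, here is a minimal sketch in Python, assuming a POSIX system where os.fork is available. It is an illustration only, not the code from the project: the function names are made up for this example, and the "child" does nothing except sit on the descriptors it inherited. The parent closes its listening socket, yet the port stays busy until the child exits.

```python
import errno
import os
import socket


def port_is_free(port):
    """Return True if a fresh socket can bind to 127.0.0.1:port."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        probe.bind(("127.0.0.1", port))
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise
    finally:
        probe.close()


def demo_inherited_port():
    """Return (free_while_child_alive, free_after_child_exits)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    # Pipe used only to tell the child when to exit.
    read_end, write_end = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: never touches the socket, yet still holds a copy of it.
        os.close(write_end)
        os.read(read_end, 1)           # blocks until the parent closes the pipe
        os._exit(0)

    os.close(read_end)
    listener.close()                   # the parent gives up its own copy
    free_while_child_alive = port_is_free(port)

    os.close(write_end)                # EOF on the pipe lets the child exit
    os.waitpid(pid, 0)
    free_after_child_exits = port_is_free(port)
    return free_while_child_alive, free_after_child_exits
```

On a typical Linux or Mac box, the first value should come back False and the second True: the port is only released once the last process holding a copy of the descriptor is gone.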

The really interesting aspect of this bug was how it manifested itself on different OSes, and how different the fixes were.

On Windows, TCPView showed the ports in use by a "<non-existent>" process. The PID of this process was that of A, which was already gone by now and didn't show up in the list of running processes.

After some analytical debugging, I guessed that the handles were being inherited by the children, resulting in these strange 'orphaned' ports. The solution on Windows was to call the CreateProcess API with the 'bInheritHandles' parameter set to FALSE. Problem solved. By the way, the non-existent process IDs attached to those ports seem to imply that the OS tracks ownership by process ID, and hence cannot show that the new owner is the child process now that the parent is dead (the child obviously has a different process ID). This might not be how it's really done, but it seems logical.

The Mac was much more interesting. The manifestation of the problem was simply unavailability of the resources. There isn't an equivalent of the bInheritHandles flag for the fork call on UNIX. To cut to the chase, the solution was to fork the process, close all the open file descriptors that the child inherits, and then call exec.
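The fork / close / exec sequence can be sketched roughly as follows, again in Python on a POSIX system. The function name and the max_fd parameter are hypothetical, invented for this illustration. Using the soft RLIMIT_NOFILE value as the upper bound is an assumption too: it can be very large (or unlimited) on some systems, in which case passing an explicit bound is the safer choice.

```python
import os
import resource


def spawn_with_clean_fds(argv, max_fd=None):
    """Fork, close every descriptor above stderr in the child, then exec argv.

    Returns the child's PID in the parent. `max_fd` is a hypothetical knob for
    this sketch; by default the soft RLIMIT_NOFILE limit is used.
    """
    if max_fd is None:
        max_fd = resource.getrlimit(resource.RLIMIT_NOFILE)[0]

    pid = os.fork()
    if pid != 0:
        return pid                      # parent: nothing else to do

    # Child: still a copy of the parent at this point, descriptors included.
    try:
        os.closerange(3, max_fd)        # keeps 0, 1, 2; errors on unused numbers are ignored
        os.execvp(argv[0], argv)        # only comes back (by raising) on failure
    finally:
        os._exit(127)                   # exec failed; never fall back into parent code
```

Something like spawn_with_clean_fds(["/path/to/B", "--some-arg"]) would then start B with only stdin, stdout and stderr open (the path and argument here are placeholders).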

Now, the Mac guys have added a nifty 'open' command to the system which launches processes as children of the launchd process, so they don't inherit the caller's file descriptors (their parent is launchd). However, this open command has a couple of pretty severe limitations. It can only open app bundles, not native binaries. And you can't pass in command line arguments to the application you open. FAIL on both counts! Mac purists will claim that command line arguments aren't the 'right thing' on the Mac, and that Apple Events should be used instead. That's BS, really. For anyone writing platform independent C++ code, it's just not practical. People don't realize that not every app on the Mac is a Cocoa/Carbon app. Anyways, on OS X I needed to be a little creative and have A launch B, but before doing anything useful in B, enumerate all open file descriptors and close them (calling readdir() on /dev/fd on FreeBSD and Darwin enumerates all open file descriptors). Oh and by the way, you might not want to close FDs 0, 1 and 2: they're stdin, stdout and stderr respectively. This post on StackOverflow has some good related information in case you're curious.
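The /dev/fd enumeration step can be sketched like this, assuming a system that exposes /dev/fd (Darwin, FreeBSD, and Linux do). The helper names are made up for the example. One wrinkle worth knowing: scanning the directory itself uses a descriptor, which shows up in the listing but is already closed by the time the scan returns, so the sketch double-checks each entry with fstat.

```python
import os


def open_fds():
    """Return the descriptors currently open in this process, sorted."""
    fds = []
    for name in os.listdir("/dev/fd"):
        if not name.isdigit():
            continue
        fd = int(name)
        try:
            os.fstat(fd)      # drops the descriptor the directory scan itself used
        except OSError:
            continue
        fds.append(fd)
    return sorted(fds)


def close_inherited_fds(keep=(0, 1, 2)):
    """Close every open descriptor not listed in `keep`; return those closed."""
    closed = []
    for fd in open_fds():
        if fd in keep:
            continue
        try:
            os.close(fd)
            closed.append(fd)
        except OSError:
            pass              # already gone; nothing to clean up
    return closed
```

Calling close_inherited_fds() as the first thing in the freshly launched process leaves only stdin, stdout and stderr open. Compared with blindly closing every number up to some limit, this only touches descriptors that actually exist.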
