hossg
Debugging Upstart
Upstart is an event-driven daemon designed to replace the old-stylee System-V init process for managing and starting processes upon (re)boot on Linux. It’s used on Ubuntu, but designed in a way to be compatible with other distros.
I recently needed to add an Upstart script to automatically launch my node.js app every time I rebooted my server. In principle this is very easy – upstart simply looks in /etc/initand runs all of the “jobs” it finds there (defined as .conf files).
Although scripts can be complex, and can include pre- and post-execute steps, my script was pretty straightforward:
node_example.conf:
description "nodejs"
author "zeresht"
start on startup
stop on shutdown
exec sudo sh -c "/ebs/bin/node /ebs/app/app.js/"
This simply tells upstarts what to do (start, stop) on what events (startup, shutdown), along with an exec stanza that specifies the command to actually run.
There’s excellent documentation in the Upstart Cookbook here: http://upstart.ubuntu.com/cookbook/
Obviously you want to make sure you configure each job with appropriate logging, in case of issues. For applications that log to STDOUT or STDERR this can be done with simple redirection in the job .conf file. In my case node wasn’t starting correctly for some reason, so I simply changed the exec stanza to:
exec sudo sh -c "/usr/bin/node /ebs/puddleduck/app.js >> /ebs/log/app.js.log 2>&1"
Sadly I found that node still wasn’t starting up . Most confusing, since when launching the Upstart script manually all appeared fine:
root@devserver:/etc/init# start node_example
node_example start/running, process 7566
And yet if I ran ps -ef | grep node I would find nothing running.
The first place to look is the Upstart log itself – which can be found at /var/log/syslog. This showed clearly there was an error:
node_example main process (7562) terminated with status 2
Now I have no idea what a status 2 is, and a quick google didn’t reveal an obvious answer.
My major concern at this point was that there was a dependency of node or my app that was not yet ready when the script was run. In order to investigate this I simply turned up the logging level of Upstart with the initctl log-priority command. This takes a single parameter to specify the log-level (you can see a list of the options by typing initctl log-priority --help).
With logging tuned up to debug I could see the exact event-sequence within Upstart, and managed to reassure myself that things were happening in the correct order, that the right filesystems were mounted, etc. With that done it was clear that the issue lay with the exec command itself, and so I started paring it back to basics outside of Upstart, eventually revealing the issue (note the typo in the exec stanza – the final ‘/’ was the cause of all my pain).
The real issue here is that not only do the start/stop commands not show an error, they specifically give the impression that the command was successful. Very misleading…