3 January 2015

A PC Watchdog


Project Features

  • Automatically resets a crashed, frozen Linux PC
  • Simple
  • Uses a cheap USB-to-serial adapter

Introduction
For several years I have been using a MythTV Linux server to record television shows. It works very well and I like it a lot. It has one major problem, though: sometimes it locks up. I suspect that this happens when it changes channels while the signal is weak (e.g. due to bad weather). It seems the drivers for my DVB-S receiver cards don't like that.

This is annoying, because it remains in that state until the PC is reset manually. No recordings are made in that state. It gets particularly annoying when I am on holiday, because then the recordings of a week or more can be lost. The box needs to be reset via the reset button. Not even ctrl-alt-del will work.

The aim of this project was to automate the process with an external watchdog. The idea is that the PC regularly sends a "heartbeat signal", i.e. a fixed character, via a USB-to-serial adapter. If no heartbeat arrives for five minutes, the watchdog resets the PC.

Since this is a Linux box, a simple cron job can be used to create such a heartbeat signal. If the signal is there, this means that the operating system is alive, the cron daemon is working and the USB driver is good.

Hardware
There is a little short-cut in order to keep this project simple. I used an off-the-shelf USB-to-serial adapter. I ordered it from a Chinese dealer and it is really amazing how they can sell them so cheaply. The adapter's output voltage levels are TTL, which means that it can be connected directly to a microcontroller. And there is even a 3.3 Volt power supply output. What more could you ask for?


What this means is that the rest of the circuit can be very simple. The microcontroller is a humble ATTiny13. It has one input: the serial line from the adapter. There is one output: the reset signal, buffered with an NPN transistor. And finally there is an optional second output to display a status signal. This can be useful for debugging.

For my prototype I added an LED on the serial signal. This is not really necessary because the USB adapter already has a TX LED. I also added an LED on the reset line so that I see the reset pulse, if one is generated.

For the function as such, none of these LEDs are necessary, of course. But my advice would be to add at least the status LED. If your PC starts to behave strangely, it is always good to have all available information.

Here is a schematic for the circuit:



Software
There isn't much to the software. The main point is that the ATTiny13 has no hardware UART, so the serial data is received in software. To make this as reliable as possible, I set the baud rate to 300. After all, we only need to receive one character per minute, so this is perfectly acceptable.
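To put the 300 baud figure in perspective, here is a quick calculation of the timing a software receiver has to hit. The 10-bit frame (1 start bit, 8 data bits, 1 stop bit) is my assumption; the frame format is not stated above.

```shell
#!/bin/bash
# Bit and frame duration at the chosen baud rate (integer microseconds).
baud=300
bits_per_frame=10   # assumption: 1 start + 8 data + 1 stop bit (8N1)

bit_us=$(( 1000000 / baud ))
frame_us=$(( bits_per_frame * 1000000 / baud ))

echo "bit: ${bit_us} us, frame: ${frame_us} us"
# prints "bit: 3333 us, frame: 33333 us"
```

With more than 3 ms per bit, even a slow, imprecisely clocked microcontroller has plenty of margin to sample each bit.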

After power-on, the watchdog is immediately active. This means that the PC must boot and produce the first heartbeat signal within 5 minutes (default, can be changed in PCWatchdog.h). 

The status LED has the following functions:
  1. When the watchdog is activated, the status LED flashes three times quickly to indicate this.
  2. A successfully received heartbeat is acknowledged with a single flash, one second long. You should see this once per minute.
  3. If a RESET pulse was issued at any earlier point in the current power cycle, the heartbeat acknowledgement is followed by a second, shorter flash.
  4. After a RESET pulse, the watchdog is reactivated only after an extra 5-minute pause (configurable via the header file). This allows additional time after the restart, e.g. to run a disk check. During that time, the status LED is permanently on.
  5. To avoid a continuous reset loop, the watchdog stops all activity after 10 resets, because continuous resets might damage the system (e.g. hardware or file system). In that deactivated state the LED blinks slowly (1 second on, 1 second off, continuously). My server shuts down at least once a day, and the watchdog then "forgets" the number of resets. For a server that is permanently powered up, this behaviour might not be ideal, because the watchdog could eventually reach this state.
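The timeout, pause and reset-limit rules above can be sketched as a small host-side simulation. This is not the firmware, just an illustration in bash: the function and variable names are made up, time advances in whole minutes, and I assume that heartbeats arriving during the pause are ignored.

```shell
#!/bin/bash
# Simulation of the watchdog rules; one loop iteration = one minute.
# Input: a string of 1 (heartbeat received) and 0 (no heartbeat).
TIMEOUT_MIN=5    # minutes without heartbeat before a reset
PAUSE_MIN=5      # extra pause after a reset
MAX_RESETS=10    # watchdog deactivates after this many resets

simulate() {
    local events=$1 idle=0 pause=0 resets=0 i
    for (( i = 0; i < ${#events}; i++ )); do
        if (( resets >= MAX_RESETS )); then
            echo "minute $(( i + 1 )): deactivated"
            break
        fi
        if (( pause > 0 )); then
            # Pause after a reset: input is ignored (assumption)
            pause=$(( pause - 1 ))
            continue
        fi
        if [[ ${events:i:1} == 1 ]]; then
            idle=0
        else
            idle=$(( idle + 1 ))
        fi
        if (( idle >= TIMEOUT_MIN )); then
            resets=$(( resets + 1 ))
            idle=0
            pause=$PAUSE_MIN
            echo "minute $(( i + 1 )): reset #$resets"
        fi
    done
    return 0
}
```

For example, `simulate 1111100000` (five minutes of heartbeats, then five minutes of silence) reports a reset in minute 10.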

Host Configuration
So far I only have a Linux server. Under Linux, it is relatively easy to create a heartbeat signal. My script looks like this:

#!/bin/bash

# set port to 300 baud
stty -F /dev/ttyUSB0 300

# send the heartbeat character
echo -n 'A' > /dev/ttyUSB0

Since there is only one USB-to-serial adapter on my server, the device name is well-defined. A more sophisticated script might be necessary if multiple adapters are used.
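If several adapters are present, one option is to address the adapter through the stable symlinks that udev creates under /dev/serial/by-id/ on most Linux distributions. The following is only a sketch: the function names and the ADAPTER_GLOB variable are hypothetical, and the default pattern must be narrowed to match your own adapter (list the directory to see what is there).

```shell
#!/bin/bash
# Resolve the adapter via a by-id symlink instead of relying on /dev/ttyUSB0.
# ADAPTER_GLOB is a hypothetical setting; narrow it to your adapter's name.
ADAPTER_GLOB=${ADAPTER_GLOB:-'/dev/serial/by-id/*'}

# Print the first existing path matching the glob pattern in $1.
find_adapter() {
    local candidate
    for candidate in $1; do        # unquoted on purpose: expand the glob
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Set the port to 300 baud and send the heartbeat character.
send_heartbeat() {
    local port
    port=$(find_adapter "$ADAPTER_GLOB") || {
        echo "no adapter found" >&2
        return 1
    }
    stty -F "$port" 300 && printf 'A' > "$port"
}
```

To use it as the heartbeat script, add a call to `send_heartbeat` as the last line.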

It makes sense to test this script manually to see if it works. If it does, the script can be invoked once per minute by adding an entry in the cron table. This is my entry:

* * * * *       /root/heartbeat.sh > /dev/null

The five asterisks are meaningful: they stand for the minute, hour, day of month, month and day of week fields. An asterisk in every field means that the script is invoked every minute of every hour of every day. To add this entry, use "crontab -e". For more information on its usage, consult the manpage.

Installing the Hardware
The USB connector goes into a USB port, obviously. The RESET line is connected to the RESET pin on the PC's mainboard. I made a Y-cable so that the reset button on the PC case keeps working.


There are actually two RESET pins on the mainboard header, which the RESET button shorts in order to restart the PC. I found that on my board, one of these pins is connected to ground and the other one is the actual RESET pin. My circuit pulls that pin to ground via the transistor. To find out which pin is which, I used an ohmmeter.

Source code, hex file and all the rest can be downloaded here.

5 comments:

  1. This comment has been removed by the author.

  2. Can you build some for sale? How much?

  3. https://googledrive.com/host/0B_acb-AcSkXYc2lfbVl1M09LMDQ/PCWatchdog.zip - the link is not available

  4. It seems that you are right. Please try this:
    https://drive.google.com/open?id=0B_acb-AcSkXYcGp2bzJKRUdGcVU
