Apcupsd is a daemon that monitors APC UPS units. When used with a "smart" UPS, apcupsd is capable of sending the killpower signal to these units and, with the correct "return from powerfail" settings on the computer being protected, can automatically bring the system back up when power returns. The problem is that FreeBSD has no reliable way of telling the userland daemon when it is ready to have its power removed. By default, apcupsd sends the killpower signal immediately and relies on the internal grace timer of the UPS being long enough for the shutdown to succeed. Often the timer is nowhere near long enough, or the shutdown takes longer than expected.
If you've ever been annoyed by your gmirror rebuilding after each and every power failure, read on.
APC SmartUPS units have been around since the mid-Triassic period and, such is their longevity and reliability, some of the older models have been re-celled and are still in service today. Why does this have any bearing on UPS killpower, I hear you ask?
APC SmartUPS models have certain variables that can be programmed into their EEPROMs altering certain aspects of the UPS's operation. Newer models have grace timer delays that are programmable from apctest (part of the apcupsd port) to allow adequate time for a shutdown. Older models, specifically those with option DIP switches, appear not to and apctest will tell you in no uncertain terms if you have one of these. The delay is 20 seconds by default and cannot be changed unless you know how to set one of these older models up. A server with many services running cannot possibly shut down in 20 seconds if killpower is called before shutdown. It becomes a race to see which finishes first, the machine syncing disks or the UPS grace timer expiring. Nine times out of ten, the machine loses.
I said "appear" because these models do, in fact, have programmable options, but only if all four option DIP switches are set to 1. This then enables you to set the shutdown delay using apctest, along with things like battery date, transfer thresholds and UPS name.
Of course, having apcupsd tell the UPS to shut down before shutdown is even called is most definitely sub-optimal.
The "solution," for want of a better word, is a quick and dirty hack of the shutdown sequence, although certain elements of the race remain and this does require careful testing: we simply call apcupsd --killpower much later. This is achieved by adding a conditional at the bottom of /etc/rc.shutdown and modifying the apccontrol script. The modifications are simple:
In ${LOCALBASE}/etc/apcupsd/apccontrol, make the doshutdown section look like this:
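A minimal sketch of what such a doshutdown branch might look like, assuming the stock apccontrol layout (a case statement on the event name, with the UPS name as the second argument, and SHUTDOWN and WALL defined near the top of the script). The handle_event wrapper exists only so the fragment can be read and run on its own, and the flag file name /tmp/apcupsd_shutdown is a hypothetical choice, not necessarily the one originally used:

```shell
# Assumed to mirror variables the stock apccontrol already defines.
: "${SHUTDOWN:=/sbin/shutdown}"
: "${WALL:=wall}"
# Hypothetical flag file name; rc.shutdown must look for the same file.
: "${APCUPSD_FLAG:=/tmp/apcupsd_shutdown}"

# Wrapper for illustration only. In apccontrol itself, the doshutdown)
# branch sits directly in the script's own case statement.
handle_event() {
    case "$1" in
        doshutdown)
            echo "UPS ${2} initiated Shutdown Sequence" | ${WALL}
            # Leave a marker so rc.shutdown knows apcupsd started this shutdown.
            touch "${APCUPSD_FLAG}"
            # Halt only: keep -h, do not change it to -p.
            ${SHUTDOWN} -h now "apcupsd UPS ${2} initiated shutdown"
        ;;
    esac
}
```

Only the touch line is new relative to what a stock branch would be expected to contain; everything else should be left as your installed apccontrol has it.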
What we do here is create an empty file to indicate that the shutdown was called by apcupsd. It's in /tmp because most security-conscious admins will have clear_tmp_enable set in rc.conf, so it'll be removed after a reboot and will never carry over to the next shutdown.
Do not be tempted to change the shutdown flag to -p. Whilst this may seem a good idea and would certainly take the load off the UPS in the shutdown delay period, altering the load dramatically on a running inverter can have nasty consequences. "-h" is used for a very good reason here and should be left alone.
Now add the following to the "Insert other shutdown procedures here" section of /etc/rc.shutdown:
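A minimal sketch of such a conditional, under a few assumptions: the flag file name /tmp/apcupsd_shutdown is hypothetical and must match whatever apccontrol creates, the apcupsd binary is assumed to live in /usr/local/sbin, and the function name is made up for illustration. The two messages are the ones the test run later in this article expects to see:

```shell
# Hypothetical flag file name; must match the file created in apccontrol.
: "${APCUPSD_FLAG:=/tmp/apcupsd_shutdown}"
# Assumed install location of the apcupsd binary.
: "${APCUPSD_BIN:=/usr/local/sbin/apcupsd}"

apcupsd_late_killpower() {
    # Only act when the shutdown was started by apcupsd.
    if [ -f "${APCUPSD_FLAG}" ]; then
        echo "Shutdown from apcupsd."
        rm -f "${APCUPSD_FLAG}"
        echo "Calling killpower on UPS."
        "${APCUPSD_BIN}" --killpower
    fi
}

# Does nothing on an ordinary, operator-initiated shutdown.
apcupsd_late_killpower
```

Because the check is on the flag file, a manual shutdown or reboot never reaches the killpower call.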
This simply detects the empty file we created and executes the commands to delete it and call killpower on the UPS, telling you what it is doing in the process. Now you need to make sure that the apcupsd daemon is not started with the --kill-on-powerfail flag, which it is by default. Simply add
apcupsd_flags=""
to /etc/rc.conf. Now restart apcupsd.
The above gives any system a fighting chance at shutting down before the grace period expires. Once rc.shutdown has finished, all that remains is to send TERM to any remaining processes (then KILL if they are still around after 120 seconds), stop syslogd and sync the disks. Most systems can manage this in 180 seconds, so we first of all want to set the shutdown delay to 180 (of the allowable 020, 180, 300 and 600 second delays).
Using apctest, which will read your apcupsd.conf for settings, choose option 5 (Program EEPROM) then option 8 (Change shutdown delay) and set the value to 180. Don't forget that if your UPS is one of the older models with DIP switches, they must all be set to "1" (up) for this to work.
This being FreeBSD, we don't rely on blind faith to do our testing. So, set TIMEOUT in your apcupsd.conf to 30 seconds (TIMEOUT being the length of time on battery before doshutdown is called), restart apcupsd, WAIT until apcaccess gives you a full report instead of the "cannot connect" message and then kill the power. You should see, just before syslog stops, this:
Terminated.
Shutdown from apcupsd.
Calling killpower on UPS.
.
Older models take a while to reply to apcupsd so that period on its own after "Calling killpower on UPS." may take a few seconds to appear. Be patient.
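The "WAIT until apcaccess gives you a full report" step of the test can be scripted rather than eyeballed. The following is only a sketch: the function name is made up, and it assumes apcaccess exits non-zero while it cannot connect to the daemon, which is worth confirming on your own system before trusting it.

```shell
# Overridable for testing; the default assumes apcaccess is on PATH.
: "${APCACCESS:=apcaccess}"

# Poll once a second until "apcaccess status" succeeds, or give up after
# the given number of attempts (default 60).
wait_for_apcupsd() {
    tries="${1:-60}"
    while [ "${tries}" -gt 0 ]; do
        if ${APCACCESS} status >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

After restarting apcupsd, calling wait_for_apcupsd 60 and checking its exit status tells you whether it is safe to pull the plug. Running apcaccess by hand once more to confirm the report is complete does no harm.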
If you get a watchdog timeout (people running squid with a large disk cache will get these every time) and rc.shutdown terminates, bump the rcshutdown_timeout variable in rc.conf to 120, otherwise the killpower command will never be reached. Until the rc scripts have been run with stop, the UPS still hasn't had the order to shut down, so any processes that hang are given time to exit gracefully without becoming part of the "race to the finish."
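As an /etc/rc.conf fragment, the watchdog change looks like this. The value is the one suggested above, so adjust it if your slowest service needs longer:

```shell
# /etc/rc.conf: seconds rc.shutdown may run before the watchdog kills it.
rcshutdown_timeout="120"
```

On systems that ship sysrc, running sysrc rcshutdown_timeout=120 should make the same change without opening an editor.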
Now watch. Ensure that the "System halted, press any key to reboot" message appears before the UPS switches off. If it does, you're good to go.
If not, increase the shutdown delay using apctest to 300 seconds, then 600 seconds if this is not adequate. If the UPS still wins with the 600 second setting, it shuts down too early and will eventually cost you data. In this rare case, remove all our changes except the apcupsd_flags="" in /etc/rc.conf. I'm afraid there seems to be no fix, but at least ensuring apcupsd does not call killpower at all will save you from a nasty surprise one day.
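If you would rather time the tail end of a test shutdown than step through the settings by trial and error, the choice of delay reduces to picking the smallest allowable value that covers it. A small sketch of that arithmetic follows; the function name and the optional safety margin are my own additions, and the list of delays is the one quoted earlier.

```shell
# Allowable shutdown delays in seconds, as listed earlier in this article.
ALLOWED_DELAYS="20 180 300 600"

# Print the smallest allowable delay that covers the measured time ($1)
# plus an optional safety margin ($2, default 0). Fails if none is enough.
pick_shutdown_delay() {
    needed=$(( $1 + ${2:-0} ))
    for d in ${ALLOWED_DELAYS}; do
        if [ "${d}" -ge "${needed}" ]; then
            echo "${d}"
            return 0
        fi
    done
    # Nothing fits: the UPS will always win the race.
    return 1
}
```

For example, pick_shutdown_delay 200 30 selects 300, while a measured time above 600 seconds makes the function fail, which corresponds to the "no fix" case described above.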
Also worth noting here is that the use of "dumb" UPS units such as the BackUPS (not BackUPS Pro, which are smart signalling devices) with its original cable will cause the UPS to power down as soon as apcupsd calls shutdown. This is because there is NO grace timer on these things. As soon as the killpower line goes high, they shut off. apcupsd does killpower right before doshutdown, so set apcupsd_flags="" in /etc/rc.conf every time you use a dumb UPS, otherwise you may as well not have a UPS at all.