Hardware Watch Dog for the Nexcom NIFE103

The Hardware Watch Dog of a NIFE103

The NIFE103 comes with a chip, the NCT7904D, that monitors critical system sensor parameters and also provides a hardware-based watch dog.

Very rough documentation of the NIFE103's watch dog functionality can be found in [1], and documentation of the hardware monitoring chip itself in [2].

In short, the watch dog needs to be configured and kicked via SMBus ioctls. An attempt to do so directly from a Python script was unsuccessful, so the actual calls had to be implemented in C and compiled.

Requirements

  • The binaries provided by this package must run on Nexcom NIFE103 hardware with a built-in NCT7904D chip that answers on the i2c-7 bus at address 0x2d.

  • The binaries require i2c support, either compiled into the kernel or loaded as a module.
  • At the time of writing, the nct7904 kernel module needs to be blacklisted in the modprobe configuration.
  • The kernel module i2c_i801 must NOT be blacklisted in modprobe configuration.
  • The binaries have to run as root user (either natively, via sudo or by running suid root).
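
The device and root requirements above could be verified before any of the binaries is run. The following is a minimal sketch and not part of the package: the function name check_requirements is made up for illustration, and the script assumes that the i2c-7 bus is exposed as the device node /dev/i2c-7, which normally requires the kernel's i2c-dev interface.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not shipped with the package.
# Assumption: the i2c-7 bus appears as the device node /dev/i2c-7.
check_requirements() {
    dev=${1:-/dev/i2c-7}      # device node to look for
    uid=${2:-$(id -u)}        # effective user id (overridable for testing)
    rc=0
    if [ ! -e "$dev" ]; then
        echo "missing $dev: is i2c support loaded and i2c_i801 not blacklisted?"
        rc=1
    fi
    if [ "$uid" -ne 0 ]; then
        echo "not running as root (uid $uid)"
        rc=1
    fi
    return "$rc"
}

check_requirements || echo "requirements not met"
```

Both arguments are optional; they exist only so the check can be exercised on a machine that is not a NIFE103.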

Installing NIFE103 Watch Dog Support

  1. If not already provided by your system's installed OS image, extract the files from the archive into your ${SITECONTROLLER_HOME}/scripts folder. This is usually located at /opt/azeti/SiteController/scripts:

  2. Make sure the binaries have the correct ownership and permissions:

  3. Double-check that nct7904 is blacklisted:

    If it is not blacklisted, add a blacklist nct7904 entry to /etc/modprobe.d/blacklist.conf or, if that entry is merely commented out, remove the # at the beginning of the line.

  4. Double-check that i2c_i801 is NOT blacklisted (note the #):

    If it is blacklisted, comment the entry out by adding a # at the beginning of that line.

  5. If you modified a file in /etc/modprobe.d/, reboot the machine for the changes to take effect.
  6. Modify the SiteController.cfg file (a.k.a. the Site configuration) so that the following entries are present in the [remote_exec_calls] section:

  7. Upload the watchdog-NIFE103.template.xml file as a new component template to the cloud, unless it is already available there.

  8. Add this component template to your Site template in the usual way.
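
The blacklist checks from steps 3 and 4 can be scripted. This is a sketch under assumptions, not a command taken from the package: the helper name is_blacklisted is hypothetical, and it only looks at *.conf files directly inside /etc/modprobe.d (or a directory passed as second argument).

```shell
#!/bin/sh
# Hypothetical helper; not shipped with the package.
# Succeeds if an uncommented "blacklist <module>" line exists in a *.conf file.
# modprobe treats '-' and '_' in module names as equivalent, so match both.
is_blacklisted() {
    module_pattern=$(printf '%s' "$1" | sed 's/[-_]/[-_]/g')
    conf_dir=${2:-/etc/modprobe.d}
    grep -Eqs "^[[:space:]]*blacklist[[:space:]]+${module_pattern}[[:space:]]*\$" \
        "$conf_dir"/*.conf
}

if is_blacklisted nct7904; then
    echo "nct7904: blacklisted (ok)"
else
    echo "nct7904: NOT blacklisted, add a 'blacklist nct7904' entry"
fi

if is_blacklisted i2c_i801; then
    echo "i2c_i801: blacklisted, comment that entry out"
else
    echo "i2c_i801: not blacklisted (ok)"
fi
```

A line such as #blacklist i2c_i801 is treated as not blacklisted, because the pattern only matches entries that are not commented out.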

The Files included in the Tarball



  • NIFE103-wdt-init: initializes the watch dog timer (needs to be run once before using the watch dog). It accepts a single parameter that sets the timeout value for the watch dog in minutes. If omitted, the parameter defaults to 10 minutes.
  • NIFE103-wdt-start: starts the timer.
  • NIFE103-wdt-stop: stops the timer. After this command the watch dog is no longer guarding the system.
  • NIFE103-wdt-reset_timer: resets the timer and thus 'kicks' the watch dog. This binary also takes the timeout value as a command line parameter. As with NIFE103-wdt-init, the value is specified in minutes and defaults to 10 if the parameter is not provided.
  • README.md: the file you are reading right now.
  • watchdog.sh: interface shell script to be used with the SiteController. You may want to edit the TIMEOUT_MINUTES variable in this script.
  • watchdog-NIFE103.template.xml: a component template that can be used to run the watch dog timer with the SiteController. If you changed TIMEOUT_MINUTES in watchdog.sh, you may also want to adapt the timer parameter at xpath('/component_template/ac_rules/rule[1]/timers/timer[1]/@delay') in watchdog-NIFE103.template.xml.
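
To illustrate how the four binaries fit together, here is a sketch of how an interface script might map actions to them. It is NOT the watchdog.sh from the tarball: the action names, the BIN_DIR variable and the wdt_command function are assumptions made for this example, and the script only prints the command lines instead of running them.

```shell
#!/bin/sh
# Illustrative only: NOT the watchdog.sh from the tarball.
# BIN_DIR and the action names are assumptions made for this sketch.
TIMEOUT_MINUTES=${TIMEOUT_MINUTES:-10}
BIN_DIR=${BIN_DIR:-/opt/azeti/SiteController/scripts}

# Print the command line for an action instead of running it (dry run).
wdt_command() {
    case "$1" in
        init)  echo "$BIN_DIR/NIFE103-wdt-init $TIMEOUT_MINUTES" ;;
        start) echo "$BIN_DIR/NIFE103-wdt-start" ;;
        stop)  echo "$BIN_DIR/NIFE103-wdt-stop" ;;
        kick)  echo "$BIN_DIR/NIFE103-wdt-reset_timer $TIMEOUT_MINUTES" ;;
        *)     echo "usage: wdt_command init|start|stop|kick" >&2; return 2 ;;
    esac
}

# Typical order: initialize once, start, then kick periodically.
wdt_command init
wdt_command start
wdt_command kick
```

The same TIMEOUT_MINUTES value is passed to both the init and the reset_timer binary, since both expect the timeout in minutes.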

Using NIFE103 Watch Dog Support

The SiteController is now set up to monitor the system state. All that remains is to restart the SiteController. Once it has restarted, the hardware-based watch dog is initialized and started, and the timer in the automation rule set triggers the watchdog_kick action every 120 seconds, which in turn issues the ioctls needed to reset the timer.

If the watch dog times out, i.e. 10 minutes (the default) pass without any such ioctl call, the system is rebooted.
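
The relation between the 120-second kick interval and the 10-minute timeout can be checked with a little arithmetic, which is useful when either value is changed. The helper names below are hypothetical; only the two default values come from this README.

```shell
#!/bin/sh
# Hypothetical helpers; the values used below are the defaults named in this README.

# Number of kick intervals that fit into one timeout window.
# $1 = timeout in minutes, $2 = kick interval in seconds
kicks_per_timeout() {
    echo $(( $1 * 60 / $2 ))
}

# Succeed only if the kick interval is shorter than the timeout.
delay_is_safe() {
    [ "$2" -lt $(( $1 * 60 )) ]
}

kicks_per_timeout 10 120    # prints 5
delay_is_safe 10 120 && echo "kick interval is shorter than the timeout"
```

With the defaults, five kick intervals fit into one timeout window, so a single delayed kick does not trigger a reboot. A timeout of 2 minutes with the same 120-second interval would fail the check.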

Restrictions

  • While using these binaries you can no longer use the lm-sensors package to monitor the NIFE103 hardware sensors, because the kernel module it needs would otherwise lock access to the SMBus.
  • The binaries do not yet cooperate with watchdogd.
  • As the watch dog uses the same ioctl as the indicator LED on the front panel of the NIFE103, the two features may interfere with each other (this still needs to be tested).

References

Next Steps