Definition of a Watch Dog

A watchdog timer (sometimes called a computer operating properly or COP timer, or simply a watchdog) is an electronic timer that is used to detect and recover from computer malfunctions. During normal operation, the computer regularly resets the watchdog timer to prevent it from elapsing, or "timing out". If, due to a hardware fault or program error, the computer fails to reset the watchdog, the timer will elapse and generate a timeout signal. The timeout signal is used to initiate corrective action or actions. The corrective actions typically include placing the computer system in a safe state and restoring normal system operation. [Wikipedia]

Distinction between this Watch Dog and the other Watch Dog

To distinguish the watch dog documented here from the Python watch dog that has been part of the SiteController for years, the functional scope of the two watch dogs is briefly recapped here.

The Watch Dog of the SiteController

The Python watch dog monitors the SiteController modules that are supposed to run according to its sensor configuration (a.k.a. site template). At regular intervals it sends a special request telegram to the modules and expects each module to answer with its own internal health state. The watch dog waits for those answers. Should an answer time out, the watch dog checks whether that module has a process in the operating system's process table.

If the process is missing, the watch dog declares the module crashed and immediately tries to start that process anew. If, by contrast, the watch dog finds a process, it assumes the process is hanging and reports this to the cloud server. It does not try to heal the situation automatically, however, because a process sometimes just needs a bit of extra time to finish processing a larger set of data.
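The decision rule described above can be summarized in a few lines of Python. This is only an illustrative sketch and not the SiteController's actual code: the names `Verdict`, `classify_module`, and `action_for` are hypothetical, and the two boolean inputs stand in for "the module answered the request telegram in time" and "a process for the module exists in the process table".

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical health verdicts for a monitored module."""
    HEALTHY = "healthy"
    CRASHED = "crashed"   # no answer and no process found
    HANGING = "hanging"   # no answer, but the process still exists


def classify_module(answered: bool, process_exists: bool) -> Verdict:
    """Apply the rule from the text to one module."""
    if answered:
        return Verdict.HEALTHY
    if not process_exists:
        return Verdict.CRASHED
    return Verdict.HANGING


def action_for(verdict: Verdict) -> str:
    """Map a verdict to the reaction described in the text."""
    return {
        Verdict.HEALTHY: "none",
        Verdict.CRASHED: "restart process",
        Verdict.HANGING: "report to cloud server",
    }[verdict]
```

Note that a hanging module is only reported, never restarted, which matches the reasoning that a busy process may simply need more time.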

A Hardware Watch Dog

The watch dog this document is about, however, checks the health of the entire system. It uses a dedicated hardware watch dog timer that must be reset (kicked) at regular intervals. If that timer times out, the system is rebooted immediately.

This can bring a system back up after it has crashed entirely, without requiring any intervention by an operator.
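To make the kick-or-reboot behaviour concrete, the following is a purely software simulation of such a timer. It does not touch any hardware; the class name `SimulatedWatchdogTimer` and its methods are hypothetical, and time is passed in explicitly (in seconds) so the behaviour is easy to follow.

```python
class SimulatedWatchdogTimer:
    """Software model of a hardware watch dog timer (illustration only)."""

    def __init__(self, timeout_s: float, now: float = 0.0) -> None:
        if timeout_s <= 0:
            raise ValueError("timeout_s must be positive")
        self.timeout_s = timeout_s
        # The timer starts counting when it is created.
        self.deadline = now + timeout_s

    def kick(self, now: float) -> None:
        """Reset the timer: the deadline moves one full timeout ahead."""
        self.deadline = now + self.timeout_s

    def expired(self, now: float) -> bool:
        """True once the deadline has passed without a kick.

        On real hardware this is the point at which the system reboots.
        """
        return now >= self.deadline
```

With a 600 second timeout, a kick at 300 seconds moves the deadline to 900 seconds. If the system then stops kicking, the simulated timer reports expiry from 900 seconds onward.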


The Hardware Watch Dog of a NIFE103

...

The Files included in the Tarball



  • NIFE103-wdt-init: initializes the watch dog timer (needs to be run once before using the watch dog). It accepts a single parameter that sets the timeout value for the watch dog in minutes. If omitted, the parameter defaults to 10 minutes.
  • NIFE103-wdt-start: starts the timer.
  • NIFE103-wdt-stop: stops the timer. After this command the watch dog no longer guards the system.
  • NIFE103-wdt-reset_timer: resets the timer and thus 'kicks' the watch dog. This binary also takes the timeout value as a command line parameter. As with the wdt-init executable, the value is expected in minutes and defaults to 10 if the parameter is not provided.
  • README.md: the file you are reading right now.
  • watchdog.sh: interface shell script to be used with the SiteController. You may want to edit the TIMEOUT_MINUTES variable in this script.
  • watchdog-NIFE103.template.xml: a component template that can be used to run the watch dog timer with the SiteController. If you changed TIMEOUT_MINUTES in watchdog.sh, you may also want to adapt the timer parameter at xpath('/component_template/ac_rules/rule[1]/timers/timer[1]/@delay') in watchdog-NIFE103.template.xml.
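The order in which the binaries are meant to be called (initialize once, start, then kick repeatedly) could be wrapped as in the sketch below. Several things here are assumptions rather than documented facts: that the binaries are on the PATH, that the timeout is passed as a plain positional argument, and the helper names `init_and_start`, `kick`, and `stop`. The `run` parameter exists only so the command lines can be inspected without executing anything.

```python
import subprocess
from typing import Callable, List

DEFAULT_TIMEOUT_MINUTES = 10  # default named in the file descriptions

Runner = Callable[[List[str]], object]


def _check_timeout(timeout_minutes: int) -> str:
    """Validate the timeout and return it as a command line argument."""
    if not isinstance(timeout_minutes, int) or timeout_minutes <= 0:
        raise ValueError("timeout_minutes must be a positive integer")
    return str(timeout_minutes)


def init_and_start(timeout_minutes: int = DEFAULT_TIMEOUT_MINUTES,
                   run: Runner = subprocess.check_call) -> None:
    """Initialize the timer once, then start it."""
    arg = _check_timeout(timeout_minutes)
    # Assumption: the timeout is the first positional argument.
    run(["NIFE103-wdt-init", arg])
    run(["NIFE103-wdt-start"])


def kick(timeout_minutes: int = DEFAULT_TIMEOUT_MINUTES,
         run: Runner = subprocess.check_call) -> None:
    """Reset ('kick') the timer; call this well within the timeout."""
    run(["NIFE103-wdt-reset_timer", _check_timeout(timeout_minutes)])


def stop(run: Runner = subprocess.check_call) -> None:
    """Stop the timer; the system is unguarded afterwards."""
    run(["NIFE103-wdt-stop"])
```

Passing the same timeout value to both `init_and_start` and `kick` mirrors the advice to keep TIMEOUT_MINUTES in watchdog.sh and the timer delay in the component template consistent with each other.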

Using NIFE103 Watch Dog Support

...

  • While using these binaries you can no longer use the lm-sensors package to monitor the NIFE103 hardware sensors, because the kernel module would otherwise lock access to the SMBus.
  • The watch dog does not yet cooperate with watchdogd.
  • The watch dog uses the same ioctl as the indicator LED on the front panel of the NIFE103, so the two features may interfere with each other (still needs to be tested).

References

...