check_monitor_stats - Service Check for Performance Metrics of the Monitor Process

Extension Version: 1.0.0
Supported SONARPLEX Version(s): 5.1.1a or higher
Requires: ---

This plugin monitors detailed performance metrics of the Monitor Process. It is especially useful for analyzing the performance of your monitoring configuration and for troubleshooting. Interpreting the metrics requires a solid understanding of the internal processing, so this is considered an advanced topic.

Installation

  1. Open the Administration Web Interface > Configuration > My Plugins
  2. Choose Browse, select the new extension file, and click to start the upload
  3. The installation was successful if no errors are shown in red and the last status line is OK

See the Managing Extensions with the Administration Web Interface article for further information.

Configuration

The check uses SONARPLEX internal metrics, so it should be bound to an internal host. To do so, create a new host that points to localhost: name it -azeti-Benchmark and set Host Address to 127.0.0.1.

Variables

The check queries external variables that hold the performance metrics. They are explained in detail below.

Variable                      Description
PROGRUNTIMETT                 string with the time the Nagios process has been running (minutes).
STATUSFILEAGETT               string with the age of the status data file (minutes).
TOTCMDBUF                     total number of external command buffer slots available.
USEDCMDBUF                    number of external command buffer slots currently in use.
HIGHCMDBUF                    highest number of external command buffer slots ever in use.
NUMSERVICES                   total number of services.
NUMHOSTS                      total number of hosts.
NUMSVCOK                      number of services OK.
NUMSVCWARN                    number of services WARNING.
NUMSVCUNKN                    number of services UNKNOWN.
NUMSVCCRIT                    number of services CRITICAL.
NUMSVCPROB                    number of service problems (WARNING, UNKNOWN or CRITICAL).
NUMSVCCHECKED                 number of services that have been checked since start.
NUMSVCSCHEDULED               number of services that are currently scheduled to be checked.
NUMSVCFLAPPING                number of services that are currently flapping.
NUMSVCDOWNTIME                number of services that are currently in downtime.
NUMHSTUP                      number of hosts UP.
NUMHSTDOWN                    number of hosts DOWN.
NUMHSTUNR                     number of hosts UNREACHABLE.
NUMHSTPROB                    number of host problems (DOWN or UNREACHABLE).
NUMHSTCHECKED                 number of hosts that have been checked since start.
NUMHSTSCHEDULED               number of hosts that are currently scheduled to be checked.
NUMHSTFLAPPING                number of hosts that are currently flapping.
NUMHSTDOWNTIME                number of hosts that are currently in downtime.
NUMHSTACTCHK[1/5/15/60]M      number of hosts actively checked in last 1/5/15/60 minutes.
NUMHSTPSVCHK[1/5/15/60]M      number of hosts passively checked in last 1/5/15/60 minutes.
NUMSVCACTCHK[1/5/15/60]M      number of services actively checked in last 1/5/15/60 minutes.
NUMSVCPSVCHK[1/5/15/60]M      number of services passively checked in last 1/5/15/60 minutes.
[AVG/MIN/MAX]ACTSVCLAT        average/minimum/maximum active service check latency (ms).
[AVG/MIN/MAX]ACTSVCEXT        average/minimum/maximum active service check execution time (ms).
[AVG/MIN/MAX]ACTSVCPSC        average/minimum/maximum active service check % state change.
[AVG/MIN/MAX]PSVSVCLAT        average/minimum/maximum passive service check latency (ms).
[AVG/MIN/MAX]PSVSVCPSC        average/minimum/maximum passive service check % state change.
[AVG/MIN/MAX]SVCPSC           average/minimum/maximum service check % state change.
[AVG/MIN/MAX]ACTHSTLAT        average/minimum/maximum active host check latency (ms).
[AVG/MIN/MAX]ACTHSTEXT        average/minimum/maximum active host check execution time (ms).
[AVG/MIN/MAX]ACTHSTPSC        average/minimum/maximum active host check % state change.
[AVG/MIN/MAX]PSVHSTLAT        average/minimum/maximum passive host check latency (ms).
[AVG/MIN/MAX]PSVHSTPSC        average/minimum/maximum passive host check % state change.
[AVG/MIN/MAX]HSTPSC           average/minimum/maximum host check % state change.
NUMACTHSTCHECKS[1/5/15]M      number of total active host checks occurring in last 1/5/15 minutes.
NUMOACTHSTCHECKS[1/5/15]M     number of on-demand active host checks occurring in last 1/5/15 minutes.
NUMCACHEDHSTCHECKS[1/5/15]M   number of cached host checks occurring in last 1/5/15 minutes.
NUMSACTHSTCHECKS[1/5/15]M     number of scheduled active host checks occurring in last 1/5/15 minutes.
NUMPARHSTCHECKS[1/5/15]M      number of parallel host checks occurring in last 1/5/15 minutes.
NUMSERHSTCHECKS[1/5/15]M      number of serial host checks occurring in last 1/5/15 minutes.
NUMPSVHSTCHECKS[1/5/15]M      number of passive host checks occurring in last 1/5/15 minutes.
NUMACTSVCCHECKS[1/5/15]M      number of total active service checks occurring in last 1/5/15 minutes.
NUMOACTSVCCHECKS[1/5/15]M     number of on-demand active service checks occurring in last 1/5/15 minutes.
NUMCACHEDSVCCHECKS[1/5/15]M   number of cached service checks occurring in last 1/5/15 minutes.
NUMSACTSVCCHECKS[1/5/15]M     number of scheduled active service checks occurring in last 1/5/15 minutes.
NUMPSVSVCCHECKS[1/5/15]M      number of passive service checks occurring in last 1/5/15 minutes.
NUMEXTCMDS[1/5/15]M           number of external commands processed in last 1/5/15 minutes.
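
The bracketed entries in the table are shorthand for several variable names. As a minimal sketch, assuming each option inside the brackets yields one concrete name (for example NUMEXTCMDS1M, NUMEXTCMDS5M and NUMEXTCMDS15M), the expansion can be illustrated as follows. The helper expand_variable is hypothetical and not part of the plugin.

```python
import re


def expand_variable(pattern):
    """Expand one bracketed group such as [1/5/15] into concrete names.

    Assumption: each slash-separated option stands for one variable.
    Names without brackets are returned unchanged.
    """
    match = re.search(r"\[([^\]]+)\]", pattern)
    if match is None:
        return [pattern]
    head = pattern[:match.start()]
    tail = pattern[match.end():]
    return [head + option + tail for option in match.group(1).split("/")]


print(expand_variable("NUMEXTCMDS[1/5/15]M"))
print(expand_variable("[AVG/MIN/MAX]ACTSVCLAT"))
```

Under this reading, the first call lists the 1, 5 and 15 minute variants of NUMEXTCMDS, and the second lists the average, minimum and maximum variants of ACTSVCLAT.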

Settings

Setting               Description
Variable Measured     The variable that should be checked
Warning Threshold     Numeric threshold for the WARNING state
Critical Threshold    Numeric threshold for the CRITICAL state
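
How the two thresholds map to a check state can be sketched as follows. This is an illustration only: it assumes that higher values are worse, that a value equal to a threshold already triggers the state, and that states use the conventional Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL). The function evaluate is hypothetical; the plugin's actual comparison logic is not documented here.

```python
# Conventional Nagios plugin return codes (assumed to apply to this check).
OK, WARNING, CRITICAL = 0, 1, 2


def evaluate(value, warning, critical):
    """Map a measured value to a state code.

    Assumption: higher is worse and the comparison is inclusive.
    The critical threshold is tested first so it takes precedence.
    """
    if value >= critical:
        return CRITICAL
    if value >= warning:
        return WARNING
    return OK


# Example with hypothetical thresholds for a latency-style variable.
print(evaluate(120.0, warning=500.0, critical=1000.0))
print(evaluate(750.0, warning=500.0, critical=1000.0))
print(evaluate(1500.0, warning=500.0, critical=1000.0))
```

With these example thresholds, a value below 500 stays OK, a value from 500 up to just under 1000 is WARNING, and anything from 1000 upward is CRITICAL.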

SONARMANAGER Host Template

Use the Host Template (extension .atf) included in the plugin distribution ZIP archive as a starting point for configuration.

Importing host template(s):

  1. Open the SONARMANAGER Top Menu > Setup > Import Templates
  2. A file browse dialog appears; browse to the location of the .atf template file and commit
  3. If the monitoring configuration is locked, unlock it via Status Tree > azeti device, right click > Unlock Configuration
  4. Choose or create the host to which the template should apply via Status Tree > azeti device, right click > Properties
  5. Expand the drop-down menu for Template and choose the new template
  6. Choose OK and upload the configuration with Ctrl + U or Status Tree > azeti device, right click > Configuration Upload
