Setting up a high availability (HA) monitoring environment between two SONARPLEX devices.
High availability is a function that synchronizes two SONARPLEX devices by copying the configuration, all RRD graphs and all log files from a master to a slave SONARPLEX device. If the master fails (e.g. due to a hardware malfunction or power outage), the slave loads the synchronized data and takes over the role of the master.
High availability does not include automatic IP address failover! All IP addresses remain unchanged, so make sure the slave has access to all hosts that are being monitored!
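Before you rely on a failover, you could test from the slave whether the monitored hosts accept connections. The following is a minimal Python sketch, not a SONARPLEX tool: it assumes the relevant services listen on TCP, and `MONITORED_TARGETS`, `is_reachable` and `unreachable_targets` are hypothetical names. The addresses are placeholder documentation addresses; replace them with the hosts and ports your checks actually use.

```python
import socket
from typing import Iterable, List, Tuple

# Hypothetical (host, port) pairs the slave must reach after a failover.
# Placeholders only: replace with the hosts and ports of your own checks.
MONITORED_TARGETS: List[Tuple[str, int]] = [
    ("192.0.2.10", 22),
    ("192.0.2.20", 80),
]


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable network, timeout, DNS failure
        return False


def unreachable_targets(
    targets: Iterable[Tuple[str, int]], timeout: float = 3.0
) -> List[Tuple[str, int]]:
    """Return the (host, port) pairs that did not accept a TCP connection."""
    return [(host, port) for host, port in targets
            if not is_reachable(host, port, timeout)]
```

Calling `unreachable_targets(MONITORED_TARGETS)` on the slave returns every target it cannot connect to. A plain TCP connect does not cover ICMP-only checks such as ping; those would need a separate test.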
Once the original master is back online, the slave recognizes it and automatically falls back to the slave state. At that point, all RRD graphs and log files collected in the meantime are synchronized back to the original master.
Only RRD graphs and log files are synchronized back. Configuration changes are not! Any configuration changes made while the slave acted as master will be lost, so consider creating a configuration backup that you can restore on the original master after switching back.
The instructions below are based on the following example setup.
The SONARPLEX HA master is the device that runs the production host and service checks.
Setting | Value |
---|---|
IP address | 172.16.0.100 |
azeti Agent port | 4192 |
azeti Agent password | SamplePW |
The SONARPLEX HA slave is a completely empty device that contains no production data.
Setting | Value |
---|---|
IP address | 172.16.0.200 |
The slave runs only a single HA service check to maintain its HA capability. Run this check at a short interval (3-5 minutes) to minimize outage times and gaps in logs and graphs.
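A rough calculation shows why the interval matters. This Python sketch assumes that the master fails right after a successful check and that each escalation step takes exactly one check interval. The number of failed checks until failover is an assumption you have to derive from your own check settings; the value 3 used here corresponds to the WARNING, 1st hard CRITICAL and 2nd hard CRITICAL steps of the HA mode table further down, if each of them takes a single check.

```python
def worst_case_gap_minutes(interval_min: float, failed_checks_to_failover: int) -> float:
    """Rough upper bound on the time between master failure and failover trigger.

    Assumption: the master fails just after a successful check. The first
    failed check then happens one interval later, and each further failed
    check adds one more interval. The slave's reboot time is not included.
    """
    return interval_min * failed_checks_to_failover


# Assumed: three failed checks until failover (see the lead-in).
for interval in (3, 5, 15):
    print(f"{interval:>2} min interval -> up to {worst_case_gap_minutes(interval, 3):.0f} min")
```

Under these assumptions, a 3-5 minute interval keeps the gap at roughly 9-15 minutes, whereas a 15-minute interval would allow up to 45 minutes.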
The general procedure is as follows:
Create a new host with the IP address of the SONARPLEX HA master, in this case 172.16.0.100.
To verify the functionality of the high availability setup on your SONARPLEX devices, follow these steps:
Failover behavior: automatic, HA mode 0: Monitor process is up, last complete syncronization: 2014-08-20 08:13:28 (302 files)
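To evaluate this status line automatically, for instance to alert when the last complete synchronization is getting old, it can be parsed with a regular expression. This Python sketch is based solely on the single sample line above; the exact output format of check_azeti_ha may differ between versions, and `parse_ha_status` is a hypothetical helper. The pattern accepts both spellings of "synchronization".

```python
import re
from datetime import datetime
from typing import Optional

# Pattern derived from one sample line; "sync\w*" accepts both
# "syncronization" and "synchronization".
STATUS_RE = re.compile(
    r"Failover behavior: (?P<behavior>\w+), "
    r"HA mode (?P<mode>\d+): (?P<mode_text>[^,]+), "
    r"last complete sync\w*: (?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\((?P<files>\d+) files\)"
)


def parse_ha_status(line: str) -> Optional[dict]:
    """Extract HA mode, last sync time and file count, or None if no match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    return {
        "behavior": m["behavior"],
        "mode": int(m["mode"]),
        "mode_text": m["mode_text"],
        "last_sync": datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
        "files": int(m["files"]),
    }


sample = ("Failover behavior: automatic, HA mode 0: Monitor process is up, "
          "last complete syncronization: 2014-08-20 08:13:28 (302 files)")
print(parse_ha_status(sample))
```

For the sample line, the result reports HA mode 0, 302 files and a last synchronization time of 2014-08-20 08:13:28; lines that do not match return `None`.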
Let the SONARPLEX HA slave's service check "check_azeti_ha" run through the following HA modes:
Check result | HA mode |
---|---|
OK | HA mode 0: Monitor process is up |
WARNING | HA mode 1: Machine seems to be down |
1st hard CRITICAL | HA mode 2: Machine seems to be down |
2nd hard CRITICAL | Machine reboots with HA master configuration |
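The escalation in the table can be read as a small state machine. The following Python sketch only mirrors the table for illustration; it is not the implementation of check_azeti_ha, and all names are hypothetical. It assumes that consecutive hard CRITICAL results are counted and that any non-CRITICAL result resets the count.

```python
from enum import Enum
from typing import Iterable, List


class CheckResult(Enum):
    OK = "OK"
    WARNING = "WARNING"
    CRITICAL = "CRITICAL"  # treated as a hard CRITICAL state


def slave_action(result: CheckResult, consecutive_criticals: int) -> str:
    """Return the HA mode or action from the table for one check result."""
    if result is CheckResult.OK:
        return "HA mode 0: Monitor process is up"
    if result is CheckResult.WARNING:
        return "HA mode 1: Machine seems to be down"
    if consecutive_criticals <= 1:
        return "HA mode 2: Machine seems to be down"
    return "Machine reboots with HA master configuration"


def escalate(results: Iterable[CheckResult]) -> List[str]:
    """Walk a sequence of check results and return the action for each one."""
    actions = []
    criticals = 0
    for result in results:
        # Assumption: only consecutive hard CRITICAL results count towards failover.
        criticals = criticals + 1 if result is CheckResult.CRITICAL else 0
        actions.append(slave_action(result, criticals))
    return actions


for step in escalate([CheckResult.OK, CheckResult.WARNING,
                      CheckResult.CRITICAL, CheckResult.CRITICAL]):
    print(step)
```

Modeling the count explicitly makes the key point of the table visible: a single hard CRITICAL only changes the HA mode, and the reboot into the master configuration happens on the second one.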
The SONARPLEX HA slave then reboots with the configuration, log files, and graphs from the last complete synchronization.
There are three ways to trigger a manual failover, for example for a functionality test or scheduled maintenance. Each of them results in the SONARPLEX HA slave becoming the master.
The first method is to shut down the SONARPLEX HA master completely. To do this, follow these steps:
The second way to trigger a manual failover is to stop the monitoring process on the SONARPLEX HA master:
The last way is to disconnect the SONARPLEX HA master from the network so that the SONARPLEX HA slave no longer detects it. This is not recommended: the monitoring process on the former SONARPLEX HA master keeps running, so you can end up with inconsistent graphs and logs once the connection has been re-established.
If a failover has taken place and all problems have been solved, you may want to return to the original setup (the former slave acting as slave and the former master acting as master again). To do so, simply start both SONARPLEX appliances and connect them to your network so that they can communicate with each other. The former SONARPLEX HA slave detects that the SONARPLEX HA master is back online and automatically synchronizes the collected graphs and log files. Once this is complete, it restores its former configuration, in which the only check is the HA master check.
Only log files and graphs are synchronized back to the SONARPLEX HA master! Configuration changes made on the SONARPLEX HA slave while it was running as HA master are not transferred back and will be lost! If you changed the configuration, create and save a backup before the switchback.
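The asymmetry between the two synchronization directions can be summarized as a small Python model. It only restates the rules given in this article (configuration, RRD graphs and log files go from master to slave; only RRD graphs and log files go back on switchback); the names are hypothetical and do not correspond to any SONARPLEX API.

```python
from enum import Enum
from typing import FrozenSet, Iterable


class Data(Enum):
    CONFIGURATION = "configuration"
    RRD_GRAPHS = "RRD graphs"
    LOG_FILES = "log files"


# Regular operation: the master copies everything to the slave.
MASTER_TO_SLAVE: FrozenSet[Data] = frozenset(
    {Data.CONFIGURATION, Data.RRD_GRAPHS, Data.LOG_FILES})

# Switchback: only data collected while the slave acted as master is copied back.
SLAVE_TO_MASTER: FrozenSet[Data] = frozenset({Data.RRD_GRAPHS, Data.LOG_FILES})


def lost_on_switchback(changed: Iterable[Data]) -> FrozenSet[Data]:
    """Return the changed data that will NOT be transferred back to the master."""
    return frozenset(changed) - SLAVE_TO_MASTER


print(sorted(d.value for d in lost_on_switchback([Data.CONFIGURATION, Data.LOG_FILES])))
```

Passing the kinds of data you changed during the failover to `lost_on_switchback` lists what you have to preserve manually; for configuration changes, that is the backup recommended above.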
If one of your HA members suffers a hardware failure and needs to be replaced, contact support@azeti.net for assistance in creating an RMA. The following article describes the procedure after receiving your RMA replacement device.
If the replacement takes too long, you can also swap the roles of the two HA devices. The running device then becomes the full SONARPLEX HA master, and the unconfigured replacement device becomes the new SONARPLEX HA slave.