Performance Optimization
- Set timeouts for service checks where available, and adjust them in small steps to find appropriate values (these depend on the type and “cost” of the check, the network latency and the environment). Ideally a check does not run longer than 3-5 seconds; try a timeout of 5 seconds.
- Set check intervals wisely. Don’t check everything every minute if there is no need for it (most performance issues come from unnecessarily short check intervals).
- Use check_icmp instead of check_ping; this has a big impact on performance.
- Check only what you really need to check, and adjust parameters to get the needed information with the least effort. For example, don’t run check_icmp with 10 packets if you just want to know whether a connection is up; 3 packets should be enough.
- Slowly changing metrics can be checked at longer intervals, e.g. HDD usage could be checked every 15-20 minutes instead of every 3 minutes.
- Check http://<SONARPLEX-IP>/cgi-bin/ps.cgi for detailed process and performance metrics. The listing shows every check with its individual execution time and latency, which helps to identify “evil” and “costly” service checks.
- Enable performance data only if needed; processing and storing the data is costly.
- Enable SLA Processing (Configuration > System > SLA) only if needed; the processing can produce a heavy load.
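To illustrate why check intervals matter so much, the following is a minimal sketch that estimates the average number of check executions per minute for two schedules. The check names, service counts and intervals are hypothetical illustration values, not taken from a real SONARPLEX configuration, and the calculation ignores retries and execution time.

```python
# Hypothetical schedules: (check name, interval in minutes, number of services).
# All values are illustration-only assumptions.
AGGRESSIVE = [
    ("ping", 1, 200),
    ("disk_usage", 3, 200),
    ("http", 1, 50),
]
RELAXED = [
    ("ping", 5, 200),
    ("disk_usage", 15, 200),
    ("http", 5, 50),
]


def executions_per_minute(schedule):
    """Average number of check executions per minute for a schedule."""
    return sum(count / interval for _name, interval, count in schedule)


if __name__ == "__main__":
    for label, schedule in (("aggressive", AGGRESSIVE), ("relaxed", RELAXED)):
        rate = executions_per_minute(schedule)
        print(f"{label}: {rate:.1f} check executions per minute")
```

In this made-up example, lengthening every interval by a factor of five reduces the average scheduling load by the same factor, which is why reviewing intervals is usually the first tuning step.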
Troubleshooting Distributed Monitoring
The logging capabilities help you to identify issues in any Distributed Monitoring setup. In addition, you can use the checklist below to rule out possible errors.
- Check the logs for information about the delivery and receipt of the status:
  - NOC SONARPLEX Log: "Distributed Monitoring (NOC Processor)"
  - Satellite SONARPLEX Log: "Distributed Monitoring (Sattelite Processor)"
Tip
The return codes of the send and receive commands are logged and give a hint as to whether the processing was successful. A return code other than 0 indicates a problem.
- Check the network connectivity
  - Make sure both machines can reach each other at least on the azeti agent port (default 4192).
  - Check your firewall configuration.
  - Check your routing if problems persist.
  - Use the "Troubleshoot" function in azeti SONARMANAGER to ping and traceroute between the devices (right click on a SONARPLEX node in the tree view).
- Monitor the azeti agent availability from the satellites to the NOC SONARPLEX
  - Configure the NOC appliance as a new host on your satellite and the other way around.
  - Add a service check to verify that they can reach each other through the agent connection (default port 4192), for example with the check command check_azeti_uptime or check_azeti_agentversion.
...
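The connectivity item of the checklist above can also be tested by hand from any machine with Python installed. The following is a minimal sketch, assuming the azeti agent port accepts plain TCP connections; the function name `can_reach` and the 5-second default timeout are illustrative choices, and the script is not part of SONARPLEX.

```python
import socket

# Default azeti agent port named in the checklist above.
AGENT_PORT = 4192


def can_reach(host, port=AGENT_PORT, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: the agent port speaks TCP. This only proves that the port
    is reachable, not that the agent itself is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unroutable hosts.
        return False
```

Run it in both directions, for example `can_reach("<NOC-IP>")` on the satellite and `can_reach("<SATELLITE-IP>")` on the NOC side. A `False` result in only one direction usually points to a firewall or routing rule rather than to the agent.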
SONARPLEX Performance Metrics
...
Latency is the amount of time between the scheduled execution time and the actual execution time of a service check. Ideally every service check has a latency of 0 seconds. Keep the latency below 10 seconds, preferably below 5. High latencies mean that there are too many concurrent service checks. This can be adjusted slightly by decreasing the number of concurrent service checks (Configuration > System > Load Configuration), but the ideal solution is to scale up with an additional SONARPLEX. Recommendation: < 5 seconds
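The latency definition and the thresholds above can be expressed as a short calculation. This is a sketch only: the function names and the rating labels are hypothetical, and the timestamps would have to be read from the check listing by hand.

```python
from datetime import datetime


def check_latency(scheduled, started):
    """Seconds between the scheduled and the actual execution time."""
    return (started - scheduled).total_seconds()


def rate_latency(seconds):
    """Rate a latency against the thresholds named in the text above."""
    if seconds < 5:
        return "ok"          # recommended: below 5 seconds
    if seconds < 10:
        return "acceptable"  # still below the 10-second limit
    return "too high"        # reduce load or add another SONARPLEX


if __name__ == "__main__":
    scheduled = datetime(2024, 1, 1, 12, 0, 0)
    started = datetime(2024, 1, 1, 12, 0, 7)
    latency = check_latency(scheduled, started)
    print(latency, rate_latency(latency))
```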
...