|
Logging: Disable the logging facilities once your system runs stably. Logging consumes I/O resources, so enable it only when necessary.
|
The logging capabilities help you identify issues in any Distributed Monitoring setup. In addition, you can use the checklist below to rule out possible errors.
Satellite SONARPLEX Log: "Distributed Monitoring (Satellite Processor)"
The return codes for the send and receive commands are logged, which shows whether processing was successful. A return code other than 0 indicates a problem. |
Beginning with SONARPLEX OS 3.7.0a, service checks for the performance of the SONARPLEX itself are added by default; find them at the default host –azeti-A-. These checks help you identify bottlenecks and scale up in time.
The average CPU and memory usage over time should stay below 75%.
A high number of concurrent processes implies a configuration issue. Often this is caused by a service check command with a high execution time (5 seconds or more), which forces other processes to wait in the queue. This effect accumulates and causes a large number of processes and a high load. The load is the number of processes waiting for system resources (I/O). The load should stay below 7 to 10.
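The resource limits above can be expressed as a small check. This is a minimal sketch, not part of SONARPLEX: the function name and the way the values are obtained are hypothetical, and reading "7 to 10" as a warning level of 7 and an upper limit of 10 is an assumption.

```python
# Thresholds taken from the text; names and structure are hypothetical.
MAX_USAGE_PERCENT = 75.0   # average CPU and memory usage should stay below this
LOAD_WARN = 7.0            # assumption: lower end of the "7 to 10" range
LOAD_LIMIT = 10.0          # assumption: upper end of the "7 to 10" range


def check_resources(cpu_percent, mem_percent, load):
    """Return a list of warnings; an empty list means all values are in range."""
    warnings = []
    if cpu_percent >= MAX_USAGE_PERCENT:
        warnings.append("CPU usage is not below 75%")
    if mem_percent >= MAX_USAGE_PERCENT:
        warnings.append("memory usage is not below 75%")
    if load >= LOAD_LIMIT:
        warnings.append("load is at or above 10")
    elif load >= LOAD_WARN:
        warnings.append("load is between 7 and 10")
    return warnings


if __name__ == "__main__":
    # Example values only; real values would come from the performance checks.
    print(check_resources(cpu_percent=80.0, mem_percent=50.0, load=8.0))
```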
Service check latency is the amount of time between the scheduled execution time and the actual execution time. Ideally every service check has a latency of 0 seconds. Keep the latency below 10 seconds, preferably below 5. High latencies mean that too many service checks are running concurrently. This can be adjusted slightly by decreasing the concurrent service check number (Configuration > System > Load Configuration), but the ideal solution is to scale up with an additional SONARPLEX. Recommendation: < 5 seconds
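As an illustration of the definition above, latency can be computed from the two timestamps and compared with the recommended limits. This is a sketch under assumptions: the function names and the three labels are hypothetical, and the timestamps are example values rather than SONARPLEX output.

```python
from datetime import datetime


def latency_seconds(scheduled, actual):
    """Latency = actual execution time minus scheduled execution time."""
    return (actual - scheduled).total_seconds()


def classify_latency(seconds):
    """Map a latency to a label using the limits from the text (5 s and 10 s)."""
    if seconds < 5:
        return "recommended"   # below 5 seconds
    if seconds < 10:
        return "acceptable"    # below 10 seconds, but above the recommendation
    return "high"              # consider fewer concurrent checks or scaling up


if __name__ == "__main__":
    scheduled = datetime(2024, 1, 1, 12, 0, 0)
    actual = datetime(2024, 1, 1, 12, 0, 7)
    delay = latency_seconds(scheduled, actual)
    print(delay, classify_latency(delay))
```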
The average service check execution time is an important performance metric because it shows the average cost of your checks. The shorter the execution time, the more service checks can be executed per minute. A high execution time implies slow service check commands; look into each service to identify the slow and costly service checks. Either optimize the service check plug-in or increase the service check interval to lower the overall execution time. The service check execution time should ideally stay below 3-5 seconds. Recommendation: < 3-5 seconds
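The relationship between execution time and throughput can be estimated with simple arithmetic. The sketch below is an idealized upper bound, not a description of how SONARPLEX schedules checks: it assumes every concurrent slot is always busy, and the function and parameter names are hypothetical.

```python
def max_checks_per_minute(concurrent_checks, avg_execution_seconds):
    """Idealized upper bound: each concurrent slot finishes 60 / t checks per minute."""
    if avg_execution_seconds <= 0:
        raise ValueError("average execution time must be positive")
    return concurrent_checks * 60.0 / avg_execution_seconds


if __name__ == "__main__":
    # Example values only: 10 concurrent checks at 3 s versus 5 s average execution time.
    print(max_checks_per_minute(10, 3))
    print(max_checks_per_minute(10, 5))
```

With the same concurrency, moving from a 3 second to a 5 second average execution time lowers this bound from 200 to 120 checks per minute, which is why slow plug-ins are worth optimizing first.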
The appropriate sizing of a VAA depends heavily on the service check commands used, the service check interval, and the "cost" (execution time) of the service checks. Start with a small machine setup and scale it up as the load increases. Keep an eye on the most important performance metrics (service check latency and service check execution time). The table below lists sizing recommendations depending on the number of service checks.
Service Checks | CPU | RAM |
---|---|---|
up to 100 | Single Core, 500 MHz | 512 MB |
300 | Dual Core, 1 GHz | 1 GB |
1000 | Dual Core, 2 GHz | 2 GB |
3000 | Multi CPU, Multi Core, 2.5 GHz or better | 4 GB |
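The sizing table can also be used as a simple lookup. This is a minimal sketch with hypothetical names; treating each row as an upper bound on the number of service checks is an assumption, since only the first row says "up to".

```python
# Rows copied from the sizing table: (service checks, CPU, RAM).
SIZING_TABLE = [
    (100, "Single Core, 500 MHz", "512 MB"),
    (300, "Dual Core, 1 GHz", "1 GB"),
    (1000, "Dual Core, 2 GHz", "2 GB"),
    (3000, "Multi CPU, Multi Core, 2.5 GHz or better", "4 GB"),
]


def recommend_sizing(service_checks):
    """Return (cpu, ram) for the smallest row that covers the given number of checks.

    Returns None when the number exceeds the table, in which case the text
    suggests scaling up with an additional SONARPLEX.
    """
    if service_checks < 0:
        raise ValueError("number of service checks must not be negative")
    for limit, cpu, ram in SIZING_TABLE:
        if service_checks <= limit:  # assumption: each row is an upper bound
            return cpu, ram
    return None


if __name__ == "__main__":
    print(recommend_sizing(250))
```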