Server monitoring: managing false positives

We recently noticed that a feature we had built for website monitoring was sorely missing from server monitoring.

Today's update fills that gap, and this post explains how it works.

Website monitoring

When our robots monitor your websites, they run tests from probes located around the world, as we explained earlier.

Our probes are not infallible: any one of them may occasionally run into temporary difficulties, such as a DNS resolution failure or a network slowdown.

To avoid alerting you for nothing when the problem lies with the probe itself, our robots run additional tests before concluding that your website is actually down.

Internally, we call this "false positive" control. It also explains why you may sometimes see several tests with the same timestamp in your availability reports.
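To make the idea concrete, here is a minimal sketch of false-positive control for a website check. It assumes a rule we have not documented here: when the first probe reports a failure, the check is repeated from a set of backup probes, and the site is declared down only if every backup probe also fails. The names `is_site_down`, `ok` and `fail` are hypothetical and only serve the illustration; this is not Hitflow's actual implementation.

```python
from typing import Callable, Sequence

# Hypothetical probe type: takes a URL, returns True if the site responded correctly.
ProbeCheck = Callable[[str], bool]


def is_site_down(url: str, primary: ProbeCheck, backups: Sequence[ProbeCheck]) -> bool:
    """Declare the site down only if the primary failure is confirmed.

    Assumed rule (illustrative only): every backup probe must also fail.
    With no backup probes, the primary result is trusted as-is.
    """
    if primary(url):
        return False  # The first probe reached the site: nothing to confirm.
    # The first probe failed, possibly because of a probe-side DNS or network
    # issue, so re-test from the other probes before raising an alert.
    return all(not check(url) for check in backups)


# Simulated probes for the example.
def ok(url: str) -> bool:
    return True


def fail(url: str) -> bool:
    return False


print(is_site_down("https://example.com", fail, [ok, fail]))    # False: one backup reached the site
print(is_site_down("https://example.com", fail, [fail, fail]))  # True: failure confirmed everywhere
```

Re-testing only after a failure keeps the normal case cheap (one test) while filtering out errors that come from a single probe.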

Server monitoring

The clearest illustration of false positives in server monitoring is a CPU alert threshold that is set too low.

In that situation, if CPU usage briefly exceeds the threshold at the moment the data is collected, an alert is sent. As soon as the next report shows CPU usage back below the threshold, the anomaly is cleared.
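The sketch below shows this behavior when each report is evaluated on its own. The function name `single_check_states`, the sample values and the 70% threshold are hypothetical, and "exceeds" is assumed to mean strictly greater than the threshold.

```python
def single_check_states(cpu_reports: list[float], threshold: float) -> list[str]:
    """Evaluate every report independently (no double-check).

    A single report above the threshold raises an alert, and the next
    report below it clears the anomaly.
    """
    states = []
    for cpu in cpu_reports:
        states.append("ALERT" if cpu > threshold else "OK")
    return states


# A brief CPU spike with an assumed threshold of 70%.
print(single_check_states([35.0, 92.0, 40.0, 38.0], threshold=70.0))
# ['OK', 'ALERT', 'OK', 'OK']
```

A single momentary spike is enough to send an alert that is cleared one report later.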

To limit unwanted alerts, we have added a double-check option, which you can enable as follows:

  1. Log in to your Hitflow manager.
  2. Go to the list of servers via "Monitoring > Servers".
  3. Edit a server by clicking its "edit" button.
  4. In the "Alert settings" section, check the "Double-check" box to enable the option.
  5. Save the changes.

Once the option is enabled, two consecutive reports must exceed the threshold before the server is considered to be malfunctioning.
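Here is a minimal sketch of the double-check rule, assuming an alert is raised only when the current report and the previous one both exceed the threshold, and that a single report back below the threshold clears it. The clearing behavior is our assumption, and `double_check_states` and the sample values are hypothetical.

```python
def double_check_states(cpu_reports: list[float], threshold: float) -> list[str]:
    """Raise an alert only after two consecutive reports exceed the threshold."""
    states = []
    previous_exceeded = False
    for cpu in cpu_reports:
        exceeded = cpu > threshold
        # One report over the limit is not enough: the previous one must agree.
        states.append("ALERT" if exceeded and previous_exceeded else "OK")
        previous_exceeded = exceeded
    return states


# A brief spike is ignored...
print(double_check_states([35.0, 92.0, 40.0, 38.0], threshold=70.0))
# ['OK', 'OK', 'OK', 'OK']

# ...while sustained high usage still triggers an alert on the second report.
print(double_check_states([35.0, 92.0, 95.0, 40.0], threshold=70.0))
# ['OK', 'OK', 'ALERT', 'OK']
```

The trade-off is latency: a genuine problem is reported one collection interval later than without the option.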

For websites, false-positive control is always active and cannot be turned off. For servers, however, we prefer to let you judge the situation.

The reason is that website tests are initiated by our robots, whereas when monitoring your servers' resources, it is your server that has to send us the data.

It is therefore up to you to decide whether to wait for a second report before an alert is sent.