A Wazuh cluster is a group of Wazuh managers that work together to enhance the availability and scalability of the service. With a Wazuh cluster setup, we have the potential to greatly increase the number of agents as long as we add worker nodes whenever necessary. So the question is, why should we use a load balancer?

The term load balancing refers to the distribution of workloads across several back-end servers. The use of a load balancer (LB) in a Wazuh cluster provides several benefits:

  • The workload distribution optimizes the usage of processing resources (Wazuh cluster nodes).
  • It helps avoid node overload.
  • It increases service reliability. Agents keep communicating with the cluster even if a node goes down.

Setting up a Wazuh cluster environment with an NGINX LB

There are several types of available load balancers, including Application Load Balancers and Network Load Balancers. Application LBs’ main purpose is to route Layer 7 (HTTP/HTTPS) traffic, while Network LBs are in charge of handling Layer 4 (TCP) connections. Hence, since we want to configure the LB to distribute agent reports, we should deploy a TCP Load Balancer.

Load balancer algorithms

We are going to set up NGINX to load balance a Wazuh cluster. However, regardless of the chosen LB, there are several methods a LB can use to distribute the workload. The most common ones are:

  • Round robin: Directs traffic to each server in turn; usually the default. Round robin works best when all the servers are equally powerful. Otherwise, the under-powered servers have to handle the same load as the others.
  • Least connections: Dynamically selects the server with the lowest number of active connections. In an environment with servers of unequal processing capacity, a weighted variant estimates the relative load of each server by factoring in both the number of connections and the server’s capacity. This method usually gives the most even balance but is more complex.
  • Least time: Expands upon least connections by factoring in lowest latency as well.
  • Hash: Selects the server based on user-defined keys such as a source IP address. Hash method uses these keys to allocate the clients (agents) to particular servers (manager nodes). Since the key can be regenerated if the session is broken, this method ensures that the client reconnects to the same server after the session is restored. This is especially useful if it is important that the client connects back to an active session. It also reduces data sharing needs on the cluster side.
  • Random: Directs the traffic to a randomly selected server. The advantages are clear: it is simple and lightweight. However, it ignores the current load of each server, so a node can become overloaded, resulting in lag or lost data.
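The selection logic behind some of these methods can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how NGINX or any other LB implements them: the node names are borrowed from the NGINX example in this post, while the function names, connection counts and client IP are made up. Least time is left out because it needs latency measurements.

```python
import hashlib
import itertools
import random

# Node names taken from the NGINX example in this post.
NODES = ["wazuh-master", "wazuh-worker1", "wazuh-worker2"]


def round_robin(nodes):
    """Return an iterator that hands out nodes in a fixed rotation."""
    return itertools.cycle(nodes)


def least_connections(active):
    """Pick the node with the fewest active connections.

    `active` maps node name -> current connection count (hypothetical data).
    """
    return min(active, key=active.get)


def ip_hash(nodes, client_ip):
    """Map a client IP to a node; the same IP always yields the same node."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def pick_random(nodes, rng=random):
    """Pick any node, ignoring load and client identity."""
    return rng.choice(nodes)


rr = round_robin(NODES)
rr_picks = [next(rr) for _ in range(4)]
lc_pick = least_connections(
    {"wazuh-master": 12, "wazuh-worker1": 3, "wazuh-worker2": 7}
)
hash_pick = ip_hash(NODES, "10.0.0.15")

print(rr_picks)   # rotation wraps around after the third node
print(lc_pick)    # the node with 3 connections
print(hash_pick == ip_hash(NODES, "10.0.0.15"))  # same IP, same node
```

Note that only the hash method looks at who the client is, which is why it is the one that gives an agent a stable node between reconnections.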

It is up to the user to select which LB and distribution method to use. Each algorithm has pros and cons, and the specifics of the environment should be studied beforehand. Nonetheless, in a default setup, the IP hash method is highly recommended for Wazuh, as it adds persistence to the communication between Wazuh agents and nodes.

Wazuh configuration

When setting up a load balancer within a Wazuh cluster, it is important to remember to:

  • Disable use_source_ip for the authd registration process
  • Point the agents to the load balancer’s IP
  • Use the TCP protocol if we want to ensure data consistency

If you need help configuring your own Wazuh cluster, please see “Configuring a Wazuh cluster” in our documentation.

Registering agents through a LB

In this setup, the cluster nodes and the agents are unaware of each other’s existence, with the LB acting as the middleman.

[Figure: Wazuh cluster with load balancer]

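For this reason, we need to ensure the proper registration of the agents in the master node. Behind a LB, registration requests reach the master from the LB’s address rather than from the agent’s own, so registering agents by source IP would tie every agent to the LB’s IP. To avoid this, we set use_source_ip to no in the master’s ossec.conf, allowing each agent to register using any IP.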

<auth>
  <disabled>no</disabled>
  <port>1515</port>
  <use_source_ip>no</use_source_ip>
  <!-- remaining authd options unchanged -->
</auth>

Agents pointing to the LB

Usually, Wazuh agents are configured to report to a specific manager. However, we now need the agents to report to the LB, and the LB will take care of the distribution. For this, we simply add the LB IP to every agent’s ossec.conf. In an environment with a large number of agents, agent groups and centralized configuration can help us with the configuration process.

<ossec_config>
  <client>
    <server>
      <address>LB_IP</address>
    </server>
  </client>
</ossec_config>

Using TCP protocol

If we want consistent data delivery, we need persistent connections, so we recommend using TCP instead of UDP. TCP is a reliable, connection-oriented transport protocol with error checking, while UDP does not make sure that data reaches the target. To use TCP, we must configure both our agents and our Wazuh manager nodes: in each agent’s ossec.conf, we set protocol to tcp in the server block, and on each manager node the protocol option in the remote section must match.

<ossec_config>
  <client>
    <server>
      <address>LB_IP</address>
      <port>1514</port>
      <protocol>tcp</protocol>
    </server>
  </client>
</ossec_config>
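With many agents, it can help to verify these two settings automatically. The following is a minimal sketch, assuming the configuration fragment is available as a well-formed XML string; the helper name check_agent_server and the sample values are hypothetical. A real ossec.conf may contain several ossec_config blocks, in which case each block would have to be parsed separately.

```python
import xml.etree.ElementTree as ET

# Sample agent configuration fragment, mirroring the one shown above.
AGENT_CONF = """
<ossec_config>
  <client>
    <server>
      <address>LB_IP</address>
      <port>1514</port>
      <protocol>tcp</protocol>
    </server>
  </client>
</ossec_config>
"""


def check_agent_server(conf_xml, expected_address):
    """Return a list of problems found in the first <client><server> block."""
    root = ET.fromstring(conf_xml)
    server = root.find("./client/server")
    if server is None:
        return ["no <client><server> block found"]

    problems = []
    address = (server.findtext("address") or "").strip()
    protocol = (server.findtext("protocol") or "").strip().lower()
    if address != expected_address:
        problems.append(f"address is {address!r}, expected {expected_address!r}")
    if protocol != "tcp":
        problems.append(f"protocol is {protocol!r}, expected 'tcp'")
    return problems


ok_result = check_agent_server(AGENT_CONF, "LB_IP")
udp_result = check_agent_server(AGENT_CONF.replace("tcp", "udp"), "LB_IP")
print(ok_result)   # no problems for the fragment above
print(udp_result)  # one problem: the protocol is not tcp
```

An empty list means the agent points to the expected LB address over TCP.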

NGINX LB configuration

An example configuration of NGINX as a TCP load balancer for a Wazuh cluster would be:

stream {
    upstream master {
        server wazuh-master:1515;
    }
    upstream mycluster {
        hash $remote_addr consistent;
        server wazuh-master:1514;
        server wazuh-worker1:1514;
        server wazuh-worker2:1514;
    }
    server {
        listen 1515;
        proxy_pass master;
    }
    server {
        listen 1514;
        proxy_pass mycluster;
    }
}

We have set up two upstream groups, master and mycluster. The first one, served on port 1515, handles agent registration through the authd process. For that reason, it contains only wazuh-master. The second one is served on port 1514 and distributes the agents’ reporting traffic among the cluster nodes, in this case wazuh-master, wazuh-worker1 and wazuh-worker2. The distribution algorithm is hash $remote_addr consistent, which keeps the communication between an agent and a node persistent for as long as that node is available. The consistent parameter ensures that only a few agents are remapped when a node is added to or removed from the group.
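To see why a consistent hash limits remapping, consider the following sketch of a generic hash ring. It is not NGINX’s actual implementation: the HashRing class, the number of ring points per node and the agent IP range are assumptions chosen for illustration. It compares agent placement with all three nodes against placement after wazuh-worker1 disappears.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring (illustrative, not NGINX's code)."""

    def __init__(self, nodes, replicas=100):
        # Each node owns several points on the ring to even out the spread.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self._keys = [point for point, _ in self._points]

    def node_for(self, client_ip):
        # The first ring point after the client's hash owns the client.
        idx = bisect.bisect(self._keys, _hash(client_ip)) % len(self._keys)
        return self._points[idx][1]


# Hypothetical agent addresses.
agents = [f"192.168.96.{i}" for i in range(10, 60)]

full = HashRing(["wazuh-master", "wazuh-worker1", "wazuh-worker2"])
degraded = HashRing(["wazuh-master", "wazuh-worker2"])  # worker1 is down

before = {ip: full.node_for(ip) for ip in agents}
after = {ip: degraded.node_for(ip) for ip in agents}
moved = [ip for ip in agents if before[ip] != after[ip]]

# Only agents that were on the failed node change their assignment.
print(all(before[ip] == "wazuh-worker1" for ip in moved))
```

Agents assigned to the surviving nodes keep their node, so only the agents of the failed node need to reconnect elsewhere.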

Similarly to this NGINX configuration example, we could set up a LB in a Wazuh cluster using products from several other vendors, such as F5, Imperva, AWS NLB, etc.

NGINX LB Use case

We have a setup with the previous NGINX configuration and 5 agents registered and reporting to Wazuh. We use the Wazuh API to check which node each agent is currently reporting to (the response also lists ID 000, which is the manager itself):

curl -u foo:bar "http://localhost:55000/agents?pretty&select=node_name"
{
   "error": 0,
   "data": {
      "items": [
         {
            "node_name": "master-node",
            "id": "000"
         },
         {
            "node_name": "master-node",
            "id": "001"
         },
         {
            "node_name": "worker2",
            "id": "002"
         },
         {
            "node_name": "master-node",
            "id": "003"
         },
         {
            "node_name": "worker1",
            "id": "004"
         },
         {
            "node_name": "worker2",
            "id": "005"
         }
      ],
      "totalItems": 6
   }
}

We see that agent 004 is reporting to worker1. As we initially stated, using a LB increases service reliability. If node worker1 goes down, agent 004 reconnects to the cluster through the LB after just a few seconds.

2019/06/27 10:20:32 ossec-agentd: ERROR: (1137): Lost connection with manager. Setting lock.
2019/06/27 10:20:32 ossec-logcollector: WARNING: Process locked due to agent is offline. Waiting for connection...
2019/06/27 10:20:41 ossec-agentd: INFO: Trying to connect to server (nginx-lb/192.168.96.2:1514/tcp).
2019/06/27 10:20:41 ossec-agentd: INFO: (4102): Connected to the server (nginx-lb/192.168.96.2:1514/tcp).
2019/06/27 10:20:41 ossec-agentd: INFO: Server responded. Releasing lock.
2019/06/27 10:20:42 ossec-logcollector: INFO: Agent is now online. Process unlocked, continuing...
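To put a number on “just a few seconds”, the gap between the lost-connection and connected messages can be computed from the log timestamps. The sketch below assumes the timestamp format and the message IDs (1137 and 4102) seen in the excerpt above; the function names are hypothetical.

```python
from datetime import datetime

# Two lines copied from the agent log excerpt above.
LOG = """\
2019/06/27 10:20:32 ossec-agentd: ERROR: (1137): Lost connection with manager. Setting lock.
2019/06/27 10:20:41 ossec-agentd: INFO: (4102): Connected to the server (nginx-lb/192.168.96.2:1514/tcp).
"""


def timestamp(line):
    """Parse the leading 'YYYY/MM/DD HH:MM:SS' timestamp of a log line."""
    return datetime.strptime(line[:19], "%Y/%m/%d %H:%M:%S")


def reconnection_gap(log_text):
    """Seconds between the first lost-connection line and the next connect."""
    lost = connected = None
    for line in log_text.splitlines():
        if "(1137)" in line and lost is None:
            lost = timestamp(line)
        elif "(4102)" in line and lost is not None:
            connected = timestamp(line)
            break
    if lost is None or connected is None:
        return None
    return (connected - lost).total_seconds()


gap = reconnection_gap(LOG)
print(gap)  # 9.0
```

In this example the agent was disconnected for 9 seconds.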

Finally, we use the API once more to confirm agent 004 is now reporting to a different node.

curl -u foo:bar "http://localhost:55000/agents?pretty&select=node_name"
{
   "error": 0,
   "data": {
      "items": [
         {
            "node_name": "master-node",
            "id": "000"
         },
         {
            "node_name": "master-node",
            "id": "001"
         },
         {
            "node_name": "worker2",
            "id": "002"
         },
         {
            "node_name": "master-node",
            "id": "003"
         },
         {
            "node_name": "master-node",
            "id": "004"
         },
         {
            "node_name": "worker2",
            "id": "005"
         }
      ],
      "totalItems": 6
   }
}
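Comparing the two responses by eye works for six entries, but not for hundreds of agents. The following sketch diffs two responses of the shape shown above and counts agents per node. The helper names are hypothetical, and the sample data is rebuilt from the two responses in this section rather than fetched from the API.

```python
import json
from collections import Counter


def node_by_agent(response_text):
    """Map agent ID -> node name from a response shaped like the ones above."""
    items = json.loads(response_text)["data"]["items"]
    return {item["id"]: item["node_name"] for item in items}


def moved_agents(before, after):
    """Return {agent_id: (old_node, new_node)} for agents that changed node."""
    return {
        agent_id: (before[agent_id], after[agent_id])
        for agent_id in before
        if agent_id in after and before[agent_id] != after[agent_id]
    }


def sample(assignments):
    """Build a response like the ones above from (id, node) pairs."""
    items = [{"node_name": node, "id": agent_id} for agent_id, node in assignments]
    return json.dumps({"error": 0, "data": {"items": items, "totalItems": len(items)}})


BEFORE = sample([("000", "master-node"), ("001", "master-node"), ("002", "worker2"),
                 ("003", "master-node"), ("004", "worker1"), ("005", "worker2")])
AFTER = sample([("000", "master-node"), ("001", "master-node"), ("002", "worker2"),
                ("003", "master-node"), ("004", "master-node"), ("005", "worker2")])

before = node_by_agent(BEFORE)
after = node_by_agent(AFTER)
moved = moved_agents(before, after)
per_node = Counter(after.values())

print(moved)     # {'004': ('worker1', 'master-node')}
print(per_node)  # master-node: 4 entries, worker2: 2 entries
```

Only agent 004 changed node, while the agents on the surviving nodes stayed where they were.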

Conclusion

We have discussed the necessary steps to add a LB to a Wazuh cluster and the benefits of doing so. We have also presented a guideline for LB algorithm selection together with an NGINX use case. In this way, we have determined the steps to increase the reliability of our Wazuh cluster and to optimize the usage of our resources. Additionally, we can further improve the availability of our setup by deploying the load balancer itself in high availability, so that it performs automatic failovers and does not become a single point of failure.
