NGINX Load balancer for a Wazuh cluster

A Wazuh cluster is a group of Wazuh managers that work together to enhance the availability and scalability of the service. With a Wazuh cluster, we can greatly increase the number of supported agents simply by adding worker nodes as needed. So the question is: why should we use an NGINX load balancer?
The term load balancing refers to the distribution of workloads across several back-end servers. Using a load balancer (LB) in a Wazuh cluster provides several benefits, including higher service availability, automatic failover when a node goes down, and a more even use of cluster resources.
There are several types of load balancers, including Application Load Balancers and Network Load Balancers. Application LBs route Layer 7 (HTTP/HTTPS) traffic, while Network LBs handle Layer 4 (TCP) connections. Since we want the LB to distribute agent traffic, which runs over TCP, we should deploy a Network (TCP) Load Balancer.
We are going to set up NGINX to load balance a Wazuh cluster. However, regardless of the chosen LB, there are several methods an LB can use to distribute the workload. The most common ones are round robin, which cycles new connections through the back-end servers in order; least connections, which sends each new connection to the server with the fewest active ones; and IP hash, which maps each client IP to a fixed server.
It is up to the user to select which LB and distribution method to use. Each algorithm has pros and cons, and the specifics of the environment should be studied beforehand. Nonetheless, in a default setup and due to Wazuh's nature, IP hash is highly recommended, as it adds persistence to the communication between Wazuh agents and nodes.
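The persistence property of IP hash can be illustrated with a short sketch. This is a simplified model, not NGINX's actual hashing algorithm: the point is that the node choice depends only on the client IP, so a given agent always lands on the same Wazuh node while the node pool is unchanged.

```python
import hashlib

# Simplified model of IP-hash balancing (not NGINX's real algorithm):
# the chosen node is a pure function of the client IP.
NODES = ["wazuh-master", "wazuh-worker1", "wazuh-worker2"]

def pick_node(agent_ip, nodes=NODES):
    # Hash the source IP and map it onto the list of back-end nodes.
    digest = hashlib.sha256(agent_ip.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Repeated connections from the same agent are routed consistently:
assert pick_node("10.0.0.7") == pick_node("10.0.0.7")
```

The downside of this scheme is visible too: if the node list changes, most IPs get remapped, which is why NGINX's `consistent` parameter (consistent hashing) is used in the configuration later in this post to minimize remapping when nodes are added or removed.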
When setting up a load balancer within a Wazuh cluster, there are three things to remember: allow agents to register through the LB, point the agents at the LB address, and use the TCP protocol. We cover each of these below.
If you need help configuring your own Wazuh cluster, please visit configuring a Wazuh cluster in our documentation.
In this setup, the cluster nodes and the agents are unaware of each other, with the LB acting as the middleman. For this reason, we need to ensure the proper registration of the agents in the master node. To achieve this, we set use_source_ip to no in the master's ossec.conf, allowing each agent to register using any IP.
<auth>
  <disabled>no</disabled>
  <port>1515</port>
  <use_source_ip>no</use_source_ip>
</auth>
Usually, Wazuh agents are configured to report to a specific manager. However, we now need the agents to report to the LB, which will take care of the distribution. For this, we simply add the LB IP to every agent's ossec.conf. In an environment with a large number of agents, agent groups and centralized configuration can help with the configuration process.
<ossec_config>
  <client>
    <server>
      <address>LB_IP</address>
    </server>
  </client>
</ossec_config>
If we want to receive data consistently, we need persistent connections. As a result, we recommend using the TCP protocol instead of UDP: TCP is a reliable, connection-oriented transport protocol with error checking, while UDP does not guarantee that data reaches the target. To use TCP, we configure both our agents and our Wazuh manager nodes by setting protocol to tcp in ossec.conf.
<ossec_config>
  <client>
    <server>
      <address>LB_IP</address>
      <port>1514</port>
      <protocol>tcp</protocol>
    </server>
  </client>
</ossec_config>
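As a minimal illustration of why a connection-oriented protocol suits agent reporting (a toy sketch, not Wazuh code): with TCP, the sender only transmits after the connection is established, so a dead peer is detected immediately instead of events being silently dropped as they would be with UDP.

```python
import socket
import threading

# Toy sketch (not Wazuh code): a TCP "manager" echoes an event back,
# and the "agent" only sends after the connection is established.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))          # let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

def echo_once():
    conn, _ = server.accept()
    conn.sendall(conn.recv(1024))      # acknowledge by echoing the event
    conn.close()

t = threading.Thread(target=echo_once)
t.start()

# create_connection() fails fast if the peer is down; a UDP sendto()
# to a dead host would simply lose the datagram.
agent = socket.create_connection(("127.0.0.1", port))
agent.sendall(b"agent event")
reply = agent.recv(1024)
t.join()
agent.close()
server.close()
```

Here the agent knows its event reached the other side; over UDP it would have no such guarantee.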
An example configuration of NGINX as a TCP load balancer for a Wazuh cluster would be:
stream {
    upstream master {
        server wazuh-master:1515;
    }

    upstream mycluster {
        hash $remote_addr consistent;
        server wazuh-master:1514;
        server wazuh-worker1:1514;
        server wazuh-worker2:1514;
    }

    server {
        listen 1515;
        proxy_pass master;
    }

    server {
        listen 1514;
        proxy_pass mycluster;
    }
}
We have set up two different upstream groups, master and mycluster. The first one, served on port 1515, handles agent registration through the authd process; for that reason, only wazuh-master is used. The second one uses port 1514 to distribute agent reporting traffic across the cluster nodes, in this case wazuh-master, wazuh-worker1, and wazuh-worker2. The distribution algorithm used is hash $remote_addr, ensuring that the communication between an agent and a node is persistent during the entire session.
Similarly to this NGINX configuration example, we could set up an LB in a Wazuh cluster using several other providers, such as F5, Imperva, or AWS NLB.
We have a setup with the previous NGINX configuration and five agents registered and reporting to Wazuh. We use the Wazuh API to check which nodes our agents are currently reporting to:
curl -u foo:bar "http://localhost:55000/agents?pretty&select=node_name"
{
  "error": 0,
  "data": {
    "items": [
      { "node_name": "master-node", "id": "000" },
      { "node_name": "master-node", "id": "001" },
      { "node_name": "worker2", "id": "002" },
      { "node_name": "master-node", "id": "003" },
      { "node_name": "worker1", "id": "004" },
      { "node_name": "worker2", "id": "005" }
    ],
    "totalItems": 6
  }
}
We see that agent 004 is currently reporting to worker1. As we initially stated, using an LB increases service reliability: if node worker1 goes down, agent 004 reconnects to the cluster through the LB after just a few seconds.
2019/06/27 10:20:32 ossec-agentd: ERROR: (1137): Lost connection with manager. Setting lock.
2019/06/27 10:20:32 ossec-logcollector: WARNING: Process locked due to agent is offline. Waiting for connection
2019/06/27 10:20:41 ossec-agentd: INFO: Trying to connect to server (nginx-lb/192.168.96.2:1514/tcp).
2019/06/27 10:20:41 ossec-agentd: INFO: (4102): Connected to the server (nginx-lb/192.168.96.2:1514/tcp).
2019/06/27 10:20:41 ossec-agentd: INFO: Server responded. Releasing lock.
2019/06/27 10:20:42 ossec-logcollector: INFO: Agent is now online. Process unlocked, continuing
Finally, we use the API once more to confirm agent 004 is now reporting to a different node.
curl -u foo:bar "http://localhost:55000/agents?pretty&select=node_name"
{
  "error": 0,
  "data": {
    "items": [
      { "node_name": "master-node", "id": "000" },
      { "node_name": "master-node", "id": "001" },
      { "node_name": "worker2", "id": "002" },
      { "node_name": "master-node", "id": "003" },
      { "node_name": "master-node", "id": "004" },
      { "node_name": "worker2", "id": "005" }
    ],
    "totalItems": 6
  }
}
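When watching the distribution across many agents, counting by hand gets tedious. The sketch below parses a response like the one above (using the sample payload from this post; field names follow the API output shown here) and counts how many agents report to each node:

```python
import json
from collections import Counter

# Sample payload matching the API response shown above.
response = """
{ "error": 0, "data": { "items": [
  { "node_name": "master-node", "id": "000" },
  { "node_name": "master-node", "id": "001" },
  { "node_name": "worker2", "id": "002" },
  { "node_name": "master-node", "id": "003" },
  { "node_name": "master-node", "id": "004" },
  { "node_name": "worker2", "id": "005" } ],
  "totalItems": 6 } }
"""

def agents_per_node(raw):
    """Count how many agents report to each cluster node."""
    items = json.loads(raw)["data"]["items"]
    return Counter(item["node_name"] for item in items)

print(agents_per_node(response))
# Counter({'master-node': 4, 'worker2': 2})
```

In a live setup, `response` would be the body returned by the same curl query used above.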
We have discussed the necessary steps to add an LB to a Wazuh cluster and the benefits of doing so. We have also presented guidelines for LB algorithm selection, together with an NGINX use case. With this, we have increased the reliability of our Wazuh cluster and optimized the usage of our resources. Additionally, we can further improve the resilience of our setup by adding a High Availability Load Balancer to perform automatic failovers.
If you have any questions about this, don’t hesitate to check out our documentation to learn more about Wazuh or join our community where our team and contributors will help you.