ProxLB - (Re)Balance VM Workloads in Proxmox Clusters
Introduction
ProxLB (PLB) is an advanced tool designed to enhance the efficiency and performance of Proxmox clusters by optimizing the distribution of virtual machines (VMs) or Containers (CTs) across the cluster nodes by using the Proxmox API. ProxLB meticulously gathers and analyzes a comprehensive set of resource metrics from both the cluster nodes and the running VMs. These metrics include CPU usage, memory consumption, and disk utilization, specifically focusing on local disk resources.
PLB collects resource usage data from each node in the Proxmox cluster, including CPU, (local) disk and memory utilization. Additionally, it gathers resource usage statistics from all running VMs, ensuring a granular understanding of the cluster's workload distribution.
Intelligent rebalancing is a key feature of ProxLB where it re-balances VMs based on their memory, disk or CPU usage, ensuring that no node is overburdened while others remain underutilized. The rebalancing capabilities of PLB significantly enhance cluster performance and reliability. By ensuring that resources are evenly distributed, PLB helps prevent any single node from becoming a performance bottleneck, improving the reliability and stability of the cluster. Efficient rebalancing leads to better utilization of available resources, potentially reducing the need for additional hardware investments and lowering operational costs.
Automated rebalancing reduces the need for manual actions, allowing operators to focus on other critical tasks, thereby increasing operational efficiency.
Features
- Rebalance the cluster by:
  - Memory
  - Disk (only local storage)
  - CPU
- Rolling Updates
  - Auto Node Patching
  - Moving workloads to other nodes
- Performing
  - Periodically
  - One-shot solution
- Types
  - Rebalance only VMs
  - Rebalance only CTs
  - Rebalance all (VMs and CTs)
- Filter
  - Exclude nodes
  - Exclude virtual machines
- Grouping
  - Include groups (VMs that are rebalanced to nodes together)
  - Exclude groups (VMs that must run on different nodes)
  - Ignore groups (VMs that should be untouched)
- Dry-run support
  - Human readable output in CLI
  - JSON output for further parsing
- Migrate VM workloads away (e.g. maintenance preparation)
- Fully based on Proxmox API
- ProxLB API (own API)
- Usage
  - One-Shot (one-shot)
  - Periodically (daemon)
  - Proxmox Web GUI Integration (optional)
How does it work?
ProxLB is a load-balancing system designed to optimize the distribution of virtual machines (VMs) and containers (CTs) across a cluster. It works by first gathering resource usage metrics from all nodes in the cluster through the Proxmox API. This includes detailed resource metrics for each VM and CT on every node. ProxLB then evaluates the difference between the maximum and minimum resource usage of the nodes, referred to as "Balanciness." If this difference exceeds a predefined threshold (which is configurable), the system initiates the rebalancing process.
Before starting any migrations, ProxLB validates that rebalancing actions are necessary and beneficial. Depending on the selected balancing mode — such as CPU, memory, or disk — it creates a balancing matrix. This matrix sorts the VMs by their maximum used or assigned resources, identifying the VM with the highest usage. ProxLB then places this VM on the node with the most free resources in the selected balancing type. This process runs recursively until the operator-defined Balanciness is achieved. Balancing can be defined for the used or max. assigned resources of VMs/CTs.
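The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not ProxLB's actual code: the function names (`balanciness_exceeded`, `plan_moves`), the input shapes, and the rule that each VM is considered at most once are all hypothetical choices made for the example.

```python
from typing import Dict, List, Tuple


def balanciness_exceeded(node_pct: Dict[str, float], balanciness: float) -> bool:
    """True when the busiest and idlest node differ by more than the threshold."""
    return max(node_pct.values()) - min(node_pct.values()) > balanciness


def plan_moves(
    capacity: Dict[str, float],
    vms: Dict[str, Tuple[str, float]],
    balanciness: float,
) -> List[Tuple[str, str, str]]:
    """Return (vm, source, target) moves; `vms` maps name -> (node, usage)."""
    placement = {vm: node for vm, (node, _) in vms.items()}
    usage = {vm: used for vm, (_, used) in vms.items()}
    used = {node: 0.0 for node in capacity}
    for vm, node in placement.items():
        used[node] += usage[vm]

    moves = []
    # Visit VMs from highest to lowest usage, each at most once (assumption).
    for vm in sorted(usage, key=usage.get, reverse=True):
        pct = {n: 100.0 * used[n] / capacity[n] for n in capacity}
        if not balanciness_exceeded(pct, balanciness):
            break
        # The target is the node with the most free resources right now.
        target = max(capacity, key=lambda n: capacity[n] - used[n])
        source = placement[vm]
        if target != source:
            used[source] -= usage[vm]
            used[target] += usage[vm]
            placement[vm] = target
            moves.append((vm, source, target))
    return moves
```

With two nodes of capacity 100, where `node01` hosts VMs using 40 and 30 and `node02` hosts a VM using 10, this sketch first moves the largest VM to `node02` and then the smallest one back to `node01`, ending with both nodes at 40%.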
Usage
Running PLB is easy and it runs almost everywhere since it just depends on Python3 and the proxmoxer library. Therefore, it can directly run on a Proxmox node, dedicated systems like Debian, RedHat, or even FreeBSD, as long as the API is reachable by the client running PLB.
Dependencies
- Python3
- proxmoxer (Python module)
Options
The following options can be set in the proxlb.conf file:
| Option | Example | Description | Default |
|---|---|---|---|
| api_host | hypervisor01.gyptazy.ch | Host or IP address of the remote Proxmox API. | hypervisor01.gyptazy.ch |
| api_user | root@pam | Username for the API. | root@pam |
| api_pass | FooBar | Password for the API. | FooBar |
| verify_ssl | 1 | Validate SSL certificates (1) or ignore (0). | 1 |
| method | memory | Defines the balancing method: memory, disk or cpu. | memory |
| mode | used | Rebalance by used resources (efficiency) or assigned resources (avoid overprovisioning). | used |
| mode_option | bytes | Rebalance by node's resources in bytes or percent. | bytes |
| type | vm | Rebalance only vm (virtual machines), ct (containers) or all (virtual machines & containers). | vm |
| balanciness | 10 | Maximum difference, in percent, that the lowest and highest resource consumption of nodes may have before rebalancing starts. | 10 |
| parallel_migrations | 1 | Defines whether migrations run in parallel (1) or sequentially (0). | 1 |
| ignore_nodes | virt01,dev-virt* | Defines a comma separated list of nodes to exclude. | None |
| ignore_vms | mysql01 | Defines a comma separated list of VMs to exclude. (* as suffix wildcard or tags are also supported) | testvm01,testvm02 |
| master_only | 0 | Defines whether balancing should only be performed on the cluster master node (1) or not (0). | 0 |
| daemon | 1 | Run as a daemon (1) or one-shot (0). | 1 |
| schedule | 24 | Interval between rebalancing runs, in hours. | 24 |
| log_verbosity | INFO | Defines the log level: DEBUG, INFO, WARNING or CRITICAL. | CRITICAL |
| proxlb_api_enable | 0 | Enables (1) the ProxLB own API. | 0 |
| proxlb_api_listener | 0.0.0.0 | Defines the listener address for the ProxLB API. | 0.0.0.0 |
| proxlb_api_port | 8008 | Defines the tcp port for the ProxLB API to listen. | 8008 |
| rolling_updates | 0 | Defines if rolling updates (auto node patching) should be activated. | 0 |
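To illustrate how comma separated ignore lists with a `*` suffix wildcard (as in `ignore_nodes` and `ignore_vms`) could be evaluated, here is a small sketch. It assumes shell-style matching via Python's `fnmatch` module, which accepts more patterns than a plain suffix wildcard; the helper names are hypothetical and ProxLB's own matching may differ.

```python
from fnmatch import fnmatchcase
from typing import List


def parse_ignore_list(raw: str) -> List[str]:
    """Split a comma separated option value into clean patterns."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def is_ignored(name: str, patterns: List[str]) -> bool:
    """True if the node or VM name matches any pattern (case-sensitive)."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


patterns = parse_ignore_list("virt01,dev-virt*")
print(is_ignored("dev-virt07", patterns))  # matches the wildcard entry
print(is_ignored("virt02", patterns))      # matches nothing
```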
An example of the configuration file looks like:
[proxmox]
api_host: hypervisor01.gyptazy.ch
api_user: root@pam
api_pass: FooBar
verify_ssl: 1
[balancing]
method: memory
mode: used
type: vm
# Balanciness defines how much difference may be
# between the lowest & highest resource consumption
# of nodes before rebalancing will be done.
# Examples:
# Rebalancing: node01: 41% memory consumption :: node02: 52% consumption
# No rebalancing: node01: 43% memory consumption :: node02: 50% consumption
balanciness: 10
# Enable parallel migrations. If set to 0 it will wait for completed migrations
# before starting next migration.
parallel_migrations: 1
ignore_nodes: dummynode01,dummynode02
ignore_vms: testvm01,testvm02
[service]
# The master_only option might be useful if running ProxLB on all nodes in a cluster
# but only a single one should do the balancing. The master node is obtained from the Proxmox
# HA status.
master_only: 0
daemon: 1
[api]
enable: 0
[misc]
rolling_updates: 0
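A file in this format can be read with Python's standard `configparser`, which accepts `key: value` pairs and `#` comments. The following sketch only shows how such a file could be parsed and typed; whether ProxLB parses it this way is an assumption, and `load_settings` as well as the shortened `SAMPLE` text are made up for the example.

```python
import configparser

# Shortened sample in the same format as the configuration file above.
SAMPLE = """\
[proxmox]
api_host: hypervisor01.gyptazy.ch
verify_ssl: 1

[balancing]
method: memory
mode: used
# Allowed difference between lowest and highest node consumption.
balanciness: 10
ignore_vms: testvm01,testvm02

[service]
daemon: 1
"""


def load_settings(text: str) -> dict:
    """Parse the config text and convert values to useful Python types."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    balancing = parser["balancing"]
    ignore_vms = balancing.get("ignore_vms", fallback="")
    return {
        "api_host": parser.get("proxmox", "api_host"),
        "verify_ssl": parser.getboolean("proxmox", "verify_ssl"),
        "method": balancing.get("method", fallback="memory"),
        "mode": balancing.get("mode", fallback="used"),
        "balanciness": balancing.getint("balanciness", fallback=10),
        "ignore_vms": [vm.strip() for vm in ignore_vms.split(",") if vm.strip()],
        "daemon": parser.getboolean("service", "daemon", fallback=True),
    }


settings = load_settings(SAMPLE)
print(settings["method"], settings["balanciness"], settings["ignore_vms"])
```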
Parameters
The following options and parameters are currently supported:
| Option | Long Option | Description | Default |
|---|---|---|---|
| -c | --config | Path to a config file. | /etc/proxlb/proxlb.conf |
| -d | --dry-run | Perform a dry-run without doing any actions. | Unset |
| -j | --json | Return a JSON of the VM movement. | Unset |
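For readers who want to wrap or mirror these options in their own tooling, the table maps naturally onto Python's `argparse`. This is an illustrative sketch only; `build_parser` is a hypothetical name and the help texts are copied from the table, not from ProxLB's source.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the options listed in the table above."""
    parser = argparse.ArgumentParser(prog="proxlb")
    parser.add_argument(
        "-c", "--config",
        default="/etc/proxlb/proxlb.conf",
        help="Path to a config file.",
    )
    parser.add_argument(
        "-d", "--dry-run",
        action="store_true",
        help="Perform a dry-run without doing any actions.",
    )
    parser.add_argument(
        "-j", "--json",
        action="store_true",
        help="Return a JSON of the VM movement.",
    )
    return parser


args = build_parser().parse_args(["--dry-run", "--json"])
print(args.config, args.dry_run, args.json)
```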
Balancing
General
In general, virtual machines and containers can be rebalanced and moved between nodes in the cluster. For virtual machines, this often works without any downtime. However, this does not apply to containers: LXC based containers will be shut down, copied and started on the new node. Also note that live migrations can run smoothly, but there are still several things to consider. These are out of scope for ProxLB and apply in general to Proxmox and your cluster setup. You can find more details about this here: https://pve.proxmox.com/wiki/Migrate_to_Proxmox_VE.
By Used Memory of VMs/CTs
By continuously monitoring the current resource usage of VMs, ProxLB intelligently reallocates workloads to prevent any single node from becoming overloaded. This approach ensures that resources are balanced efficiently, providing consistent and optimal performance across the entire cluster at all times. To activate this balancing mode, simply activate the following option in your ProxLB configuration:
mode: used
Afterwards, restart the service (if running in daemon mode) to activate this rebalancing mode.
By Assigned Memory of VMs/CTs
By ensuring that resources are always available for each VM, ProxLB prevents over-provisioning and maintains a balanced load across all nodes. This guarantees that users have consistent access to the resources they need. However, if the total assigned resources exceed the combined capacity of the cluster, ProxLB will issue a warning, indicating potential over-provisioning despite its best efforts to balance the load. To activate this balancing mode, simply activate the following option in your ProxLB configuration:
mode: assigned
Afterwards, restart the service (if running in daemon mode) to activate this rebalancing mode.
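The difference between the two modes comes down to which value is fed into the balancing. The sketch below is a hedged illustration: the dictionary keys (`memory_used`, `memory_assigned`) and both function names are hypothetical and not taken from ProxLB.

```python
from typing import Dict, List


def balancing_value(vm: Dict[str, float], method: str = "memory", mode: str = "used") -> float:
    """Pick the value a VM contributes to balancing for the given mode."""
    if mode not in ("used", "assigned"):
        raise ValueError(f"unsupported mode: {mode}")
    return vm[f"{method}_{mode}"]


def is_overprovisioned(
    vms: List[Dict[str, float]],
    node_capacity: Dict[str, float],
    method: str = "memory",
) -> bool:
    """True if the assigned resources exceed the combined cluster capacity."""
    assigned = sum(vm[f"{method}_assigned"] for vm in vms)
    return assigned > sum(node_capacity.values())


vm = {"memory_used": 2.5, "memory_assigned": 8.0}  # values in GiB
print(balancing_value(vm, mode="used"), balancing_value(vm, mode="assigned"))
```

In `assigned` mode a mostly idle VM still counts with its full allocation, which is why this mode is the one that surfaces over-provisioning.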
Grouping
Include (Stay Together)
Access the Proxmox Web UI by opening your web browser and navigating to your Proxmox VE web interface, then log in with your credentials. Navigate to the VM you want to tag by selecting it from the left-hand navigation panel. Click on the "Options" tab to view the VM's options, then select "Edit" or "Add" (depending on whether you are editing an existing tag or adding a new one). In the tag field, enter plb_include_ followed by your unique identifier, for example, plb_include_group1. Save the changes to apply the tag to the VM. Repeat these steps for each VM that should be included in the group.
Exclude (Stay Separate)
Access the Proxmox Web UI by opening your web browser and navigating to your Proxmox VE web interface, then log in with your credentials. Navigate to the VM you want to tag by selecting it from the left-hand navigation panel. Click on the "Options" tab to view the VM's options, then select "Edit" or "Add" (depending on whether you are editing an existing tag or adding a new one). In the tag field, enter plb_exclude_ followed by your unique identifier, for example, plb_exclude_critical. Save the changes to apply the tag to the VM. Repeat these steps for each VM that should be excluded from being on the same node.
Ignore VMs (Tag Style)
In Proxmox, you can ensure that certain VMs are ignored during the rebalancing process by setting a specific tag within the Proxmox Web UI, rather than solely relying on configurations in the ProxLB config file. This can be achieved by adding the tag 'plb_ignore_vm' to the VM. Once this tag is applied, the VM will be excluded from any further rebalancing operations, simplifying the management process.
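The three tag conventions (`plb_include_`, `plb_exclude_` and `plb_ignore_vm`) can be told apart with simple prefix checks. The following is a minimal sketch that assumes the tags of a VM are already available as a list of strings; `classify_tags` is a hypothetical helper, not part of ProxLB.

```python
from typing import Dict, Iterable


def classify_tags(tags: Iterable[str]) -> Dict[str, object]:
    """Sort a VM's tags into include groups, exclude groups and the ignore flag."""
    result = {"include": set(), "exclude": set(), "ignore": False}
    for tag in tags:
        if tag == "plb_ignore_vm":
            result["ignore"] = True
        elif tag.startswith("plb_include_"):
            result["include"].add(tag)
        elif tag.startswith("plb_exclude_"):
            result["exclude"].add(tag)
        # Any other tag is unrelated to ProxLB and is skipped.
    return result


info = classify_tags(["plb_include_group1", "webserver", "plb_exclude_critical"])
print(sorted(info["include"]), sorted(info["exclude"]), info["ignore"])
```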
Rolling Updates
Warning: This feature is still in beta! Do NOT use this on production systems!
Rolling updates ensure that the cluster and its nodes are always up to date by installing the pending updates from the defined system repository. With every rebalancing run, the executing node also checks whether the ProxLB API (proxlb_api_enable) and the rolling update feature (rolling_updates) are enabled. If both are enabled, it performs the following logic:
- Check if updates are present
- Install updates
- Validate if updates require a reboot:
  - -> No Reboot:
    - -> Done
  - -> Reboot required:
    - -> Set self to maintenance mode in ProxLB API
    - -> Query all other nodes on the cluster on the ProxLB API
      - -> Any Node in maintenance:
        - -> Stop
      - -> No other Node in maintenance:
        - -> Move all VMs/CTs to other nodes
        - -> Reboot Node
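The decision flow above can be expressed as a small pure function. This sketch only models the branching; the names (`Action`, `next_action`) are hypothetical, and treating "no updates present" the same as "done" is an assumption, since the flow does not spell out that case.

```python
from enum import Enum
from typing import Iterable


class Action(Enum):
    DONE = "done"
    STOP = "stop"
    EVACUATE_AND_REBOOT = "move all VMs/CTs away, then reboot"


def next_action(
    updates_present: bool,
    reboot_required: bool,
    others_in_maintenance: Iterable[bool],
) -> Action:
    """Decide what a node does after the update step of a rolling update."""
    if not updates_present or not reboot_required:
        return Action.DONE
    # A reboot is needed: only proceed if no other node is in maintenance.
    if any(others_in_maintenance):
        return Action.STOP
    return Action.EVACUATE_AND_REBOOT


print(next_action(True, True, [False, False]))
```

Keeping the decision free of side effects makes the "only one node in maintenance at a time" rule easy to test in isolation.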
Please note that this feature requires a patched Proxmox API file. All actions should only be performed by the Proxmox or ProxLB API. Currently, the Proxmox API does not have any method to perform and install updates. Therefore, a patched API node file is required. ProxLB will validate whether the needed API endpoint is present and, if it is missing, stop the rolling update functionality. The patched API functionality can be integrated by installing the package proxlb-addition-api.deb, which is required on all nodes in a cluster. This package is not listed in the regular repository because it overwrites the present file(s). This is highly WIP and should not be used on production systems right now!
Note: This feature requires you to activate the ProxLB API and also the package proxlb-addition-api.deb.
ProxLB API
ProxLB comes with its own API. The API is based on Python's `FastAPI` and provides additional features. The API is required when using the rolling updates feature.
Configuration
The API has some configuration parameters. By default, it listens on 0.0.0.0 and TCP port 8008, is accessible from any host, and does not require any authentication yet. You may restrict access with a firewall or add authentication with a reverse proxy. Currently, you can configure whether the API is enabled, the listener address and the port.
| Option | Example | Description | Default |
|---|---|---|---|
| proxlb_api_enable | 0 | Enables (1) the ProxLB own API. | 0 |
| proxlb_api_listener | 0.0.0.0 | Defines the listener address for the ProxLB API. | 0.0.0.0 |
| proxlb_api_port | 8008 | Defines the tcp port for the ProxLB API to listen. | 8008 |
Features
This section covers just a few examples of what the API provides, to give a rough overview.
| Path | Method | Return Example | Description |
|---|---|---|---|
| /status | get | {'status': 'running', 'code': 0, 'monitoring': 'OK'} | Returns a JSON health monitoring output. |
| /updates/self/run | get | 0/1 | Triggers a node to be actively performing updates. |
| /updates/self/status | get | 0/1 | Returns the node's update status. |
You can find all API functions in its Swagger interface. When running ProxLB with the API enabled, the docs can be accessed at the path /docs. For example, simply open https://hypervisor01.gyptazy.ch:8008/docs.
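A monitoring script could poll the /status endpoint with nothing but the Python standard library. The sketch below assumes the endpoint returns a JSON object with the fields shown in the return example; the helper names and the default `https` scheme (taken from the example URL above) are assumptions, so adjust them to your setup.

```python
import json
from urllib.request import urlopen


def status_url(host: str, port: int = 8008, scheme: str = "https") -> str:
    """Build the URL of the health endpoint."""
    return f"{scheme}://{host}:{port}/status"


def is_healthy(payload: dict) -> bool:
    """Interpret a status payload shaped like the return example in the table."""
    return payload.get("status") == "running" and payload.get("code") == 0


def fetch_status(host: str, port: int = 8008, timeout: float = 5.0) -> dict:
    """Query the endpoint; raises urllib.error.URLError if it is unreachable."""
    with urlopen(status_url(host, port), timeout=timeout) as response:
        return json.load(response)


# Offline example using the payload from the table above.
sample = {"status": "running", "code": 0, "monitoring": "OK"}
print(status_url("hypervisor01.gyptazy.ch"), is_healthy(sample))
```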
Systemd
When installing a Linux distribution package (such as a .deb or .rpm file), a systemd unit file is shipped with it. The default configuration file will be sourced from /etc/proxlb/proxlb.conf.
| Unit Name | Options |
|---|---|
| proxlb | start, stop, status, restart |
Manual
A manual installation is possible and also supports BSD based systems. ProxLB mainly relies on two important files:
- proxlb (Python Executable)
- proxlb.conf (Config file)
The executable must be able to read the config file. If no dedicated config file is given by the -c argument, PLB tries to read it from /etc/proxlb/proxlb.conf.
Proxmox GUI Integration
PLB can also be used directly from the Proxmox Web UI by installing the optional package proxlb-addition-ui.deb, which depends on the proxlb package. For the Web UI integration, it must be installed (in addition) on the nodes of the cluster. Afterwards, a new menu item called Rebalancing is present in the HA chapter. This chapter provides two possibilities:
- Rebalancing VM workloads
- Migrate VM workloads away from a defined node (e.g. maintenance preparation)
Quick Start
The easiest way to get started is by using the ready-to-use packages that I provide on my CDN and to run it on a Linux Debian based system. This can also be one of the Proxmox nodes itself.
wget https://cdn.gyptazy.ch/files/amd64/debian/proxlb/proxlb_1.0.0_amd64.deb
dpkg -i proxlb_1.0.0_amd64.deb
# Adjust your config
vi /etc/proxlb/proxlb.conf
systemctl restart proxlb
systemctl status proxlb
Container Quick Start (Docker/Podman)
Creating a container image of ProxLB is straightforward using the provided Dockerfile. The Dockerfile simplifies the process by automating the setup and configuration required to get ProxLB running in a container. Simply follow the steps in the Dockerfile to build the image, ensuring all dependencies and configurations are correctly applied. For those looking for an even quicker setup, a ready-to-use ProxLB container image is also available, eliminating the need for manual building and allowing for immediate deployment.
git clone https://github.com/gyptazy/ProxLB.git
cd ProxLB
docker build -t proxlb .
Afterwards simply adjust the config file to your needs:
vi /etc/proxlb/proxlb.conf
Finally, start the created container.
docker run -it --rm -v $(pwd)/proxlb.conf:/etc/proxlb/proxlb.conf proxlb
Logging
ProxLB uses the SystemdHandler for logging. You can find all your logs in your systemd unit log or via journalctl. By default, ProxLB only logs critical events. However, to better understand the balancing, it might be useful to change this to INFO or DEBUG, which can be done in the proxlb.conf file by changing the log_verbosity parameter.
Available logging values:
| Verbosity | Description |
|---|---|
| DEBUG | This option logs everything and is needed for debugging the code. |
| INFO | This option provides insight behind the scenes: what has been done, why, and with which values. |
| WARNING | This option provides only warning messages, which might be a problem in general but not for the application itself. |
| CRITICAL | This option logs all critical events that prevent ProxLB from running. |
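The verbosity names above correspond to levels of Python's standard `logging` module. As a hedged sketch, a configured value could be resolved like this; `resolve_log_level` is a hypothetical helper, and accepting `WARN` as an alias as well as falling back to CRITICAL for unknown values are assumptions.

```python
import logging

LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "WARN": logging.WARNING,  # alias, accepted here as an assumption
    "CRITICAL": logging.CRITICAL,
}


def resolve_log_level(verbosity: str, default: int = logging.CRITICAL) -> int:
    """Translate a log_verbosity value into a logging level."""
    return LEVELS.get(verbosity.strip().upper(), default)


logger = logging.getLogger("proxlb-example")
logger.setLevel(resolve_log_level("INFO"))
print(logging.getLevelName(logger.level))
```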
Motivation
As a developer managing a cluster of virtual machines for my projects, I often encountered the challenge of resource imbalance. Nodes within the cluster would become unevenly loaded, with some nodes being overburdened while others remained underutilized. This imbalance led to inefficiencies, performance bottlenecks, and increased operational costs. Frustrated by the lack of an adequate solution to address this issue, I decided to develop the ProxLB (PLB) to ensure better resource distribution across my clusters.
My primary motivation for creating PLB stemmed from my work on my BoxyBSD project and on my personal clusters, where I consistently faced the difficulty of keeping nodes balanced while running various VM workloads. The absence of an efficient rebalancing mechanism made it challenging to achieve optimal performance and stability. Recognizing the necessity for a tool that could gather and analyze resource metrics from both the cluster nodes and the running VMs, I embarked on developing ProxLB.
PLB meticulously collects detailed resource usage data from each node in a Proxmox cluster, including CPU load, memory usage, and local disk space utilization. It also gathers comprehensive statistics from all running VMs, providing a granular understanding of the workload distribution. With this data, PLB intelligently redistributes VMs based on memory usage, local disk usage, and CPU usage. This ensures that no single node is overburdened, storage resources are evenly distributed, and the computational load is balanced, enhancing overall cluster performance.
As an advocate of the open-source philosophy, I believe in the power of community and collaboration. By sharing solutions like PLB, I aim to contribute to the collective knowledge and tools available to developers facing similar challenges. Open source fosters innovation, transparency, and mutual support, enabling developers to build on each other's work and create better solutions together.
Developing PLB was driven by a desire to solve a real problem I faced in my projects. However, the spirit behind this effort was to provide a valuable resource to the community. By open-sourcing PLB, I hope to help other developers manage their clusters more efficiently, optimize their resource usage, and reduce operational costs. Sharing this solution aligns with the core principles of open source, where the goal is not only to solve individual problems but also to contribute to the broader ecosystem.
References
Here you can find some references for and about ProxLB (PLB):
| Description | Link |
|---|---|
| General introduction into ProxLB | https://gyptazy.ch/blog/proxlb-rebalancing-vm-workloads-across-nodes-in-proxmox-clusters/ |
| Howto install and use ProxLB on Debian to rebalance vm workloads in a Proxmox cluster | https://gyptazy.ch/howtos/howto-install-and-use-proxlb-to-rebalance-vm-workloads-across-nodes-in-proxmox-clusters/ |
Downloads
ProxLB can be obtained in many different ways, depending on which use case you prefer. You can simply copy the code from GitHub, use the prebuilt packages for Debian or RedHat based systems, use a repository to keep ProxLB always up to date, or use a container image for Docker/Podman.
Packages
Ready to use packages can be found at:
- https://cdn.gyptazy.ch/files/amd64/debian/proxlb/
- https://cdn.gyptazy.ch/files/amd64/ubuntu/proxlb/
- https://cdn.gyptazy.ch/files/amd64/redhat/proxlb/
- https://cdn.gyptazy.ch/files/amd64/freebsd/proxlb/
Repository
Debian based systems can also use the repository by adding the following line to their apt sources:
deb https://repo.gyptazy.ch/ /
The Repository's GPG key can be found at: https://repo.gyptazy.ch/repo/KEY.gpg
You can also simply import it by running:
# KeyID: DEB76ADF7A0BAADB51792782FD6A7A70C11226AA
# SHA256: 5e44fffa09c747886ee37cc6e9e7eaf37c6734443cc648eaf0a9241a89084383 KEY.gpg
wget -O /etc/apt/trusted.gpg.d/proxlb.asc https://repo.gyptazy.ch/repo/KEY.gpg
Note: The defined repositories repo.gyptazy.ch and repo.proxlb.de are the same!
Container Images (Docker/Podman)
Container Images for Podman, Docker etc., can be found at:
| Version | Image |
|---|---|
| latest | cr.gyptazy.ch/proxlb/proxlb:latest |
| v1.0.0 | cr.gyptazy.ch/proxlb/proxlb:v1.0.0 |
| v0.9.9 | cr.gyptazy.ch/proxlb/proxlb:v0.9.9 |
Misc
Bugs
Bugs can be reported via the GitHub issue tracker. You may also report bugs via email or submit PRs to fix them yourself. For that, see also the Contributing chapter.
Contributing
Feel free to add further documentation, to adjust existing documentation or to contribute code. Please follow the style guide and naming conventions. You can find more in our CONTRIBUTING.md file.
Support
If you need assistance or have any questions, we offer support through our dedicated chat room in Matrix and on Reddit. Join the Matrix chat room for real-time help and live interaction with other users and developers, or visit our Reddit community to post your queries, share your experiences, and get support from fellow community members and moderators. You may also open an issue directly on GitHub. We are here to help and ensure you have the best experience possible.
| Support Channel | Link |
|---|---|
| Matrix | #proxlb:gyptazy.ch |
| Reddit community | |
| GitHub | ProxLB GitHub |
Author(s)
- Florian Paul Azim Hoberg @gyptazy (https://gyptazy.ch)