Compare commits

31 Commits

Author SHA1 Message Date
Florian
143135f1d8 Merge pull request #50 from gyptazy/release/v1.0.2
release: Prepare release v1.0.2
2024-08-13 17:10:37 +02:00
Florian Paul Azim Hoberg
c865829a2e release: Prepare release v1.0.2 2024-08-13 16:37:30 +02:00
Florian
101855b404 Merge pull request #46 from gyptazy/fix/45-adjust-daemon-time-mix-min-hrs
fix: Fix daemon timer to use hours instead of minutes.
2024-08-06 21:29:34 +02:00
Florian Paul Azim Hoberg
37e7a601be fix: Fix daemon timer to use hours instead of minutes.
Reported by: @mater-345
Fixes: #45
2024-08-06 18:06:05 +02:00
Florian
8791007e77 Merge pull request #43 from gyptazy/feature/40-option-run-only-on-master-node
feature: Add option to run ProxLB only on the Proxmox's master node in the cluster.
2024-08-06 18:00:26 +02:00
Florian Paul Azim Hoberg
3a2c16b137 feature: Add option to run ProxLB only on the Proxmox's master node in the cluster.
Fixes: #40
2024-08-06 17:58:34 +02:00
Florian
adc476e848 Merge pull request #42 from gyptazy/feature/41-add-option-run-migration-parallel-or-serial
feature: Add option to run migrations in parallel or sequentially
2024-08-04 08:27:04 +02:00
Florian Paul Azim Hoberg
28be8b8146 feature: Add option to run migrations in parallel or sequentially
Fixes: #41
2024-08-04 08:25:03 +02:00
Florian
cbaeba2046 Merge pull request #38 from gyptazy/release/pre-1.0.0
release: Prepare release 1.0.0
2024-08-02 12:57:20 +02:00
Florian Paul Azim Hoberg
61de9cb01d release: Prepare release 1.0.0 2024-08-01 10:34:13 +02:00
Florian
2e36d59f84 Merge pull request #37 from gyptazy/feature/36-docs-add-repository
docs: Add section for downloads (pkgs, repo, container image)
2024-08-01 10:10:27 +02:00
Florian Paul Azim Hoberg
3f1444a19f docs: Add section for downloads (pkgs, repo, container image)
Fixes: #36
2024-08-01 09:52:31 +02:00
Florian
86fe2487b5 Merge pull request #34 from gyptazy/feature/29-rebalance-by-free-node-memory-in-percent
feature: Add new mode_option to rebalance by node's bytes or percent.
2024-07-30 22:13:56 +02:00
Florian Paul Azim Hoberg (@gyptazy)
46832ba6b2 feature: Add new mode_option to rebalance by node's bytes or percent.
Fixes: #29
2024-07-30 07:41:17 +02:00
Florian
4671b414b8 Merge pull request #33 from gyptazy/fix/27-container-migration
fix: Rebalance CT function including reboot
2024-07-28 19:48:42 +02:00
Florian Paul Azim Hoberg
4efa9df965 fix: Rebalance CT function including reboot
Fixes: #27
Fixes: #29

fix
2024-07-28 19:46:58 +02:00
Florian
5c6cf04ed2 Merge pull request #31 from gyptazy/docs/30-improve-documentation
docs: Update the docs
2024-07-23 13:59:32 +02:00
Florian Paul Azim Hoberg
62f82e389a docs: Update the docs
Fixes: #30
2024-07-23 13:57:35 +02:00
Florian
1c7a630e39 Merge pull request #28 from gyptazy/feature/27-add-container-support
feature(core): Add support for LXC/Container to be rebalanced.
2024-07-21 11:42:13 +02:00
Florian Paul Azim Hoberg
f2209ce1b0 feature(core): Add support for LXC/Container to be rebalanced.
Fixes: #27
2024-07-21 11:41:13 +02:00
Florian
1908c2e8d8 Merge pull request #25 from formorer/patch-1
docs(README): Fix typo
2024-07-18 08:02:02 +02:00
Alexander Wirt
0e1774ee84 Fix typo in README.md 2024-07-18 07:58:55 +02:00
Florian
8bf4b1f107 Merge pull request #23 from gyptazy/feature/16-rebalance-on-assigned-resources
feature: Add option to rebalance VMs by their assigned resources. [#16]
2024-07-16 12:42:34 +02:00
Florian Paul Azim Hoberg
86009ff3c2 feature: Add option to rebalance VMs by their assigned resources. [#16]
Fixes: #16
2024-07-16 12:39:59 +02:00
Florian
3d634ef824 Merge pull request #22 from gyptazy/feature/17-add-configurable-log-verbosity
feature(logging): Add configurable log verbosity. [#17]
2024-07-13 08:33:38 +02:00
Florian Paul Azim Hoberg
e204bba54f feature(logging): Add configurable log verbosity. [#17]
Fixes #17
2024-07-13 08:31:33 +02:00
Florian
0cd5bb0b3f Merge pull request #20 from DerbyKurby/patch-1
docs(README): Update ToC and fix typos.
2024-07-12 15:30:02 +02:00
DerbyKurby
2278a91cb9 fix-readme 2024-07-12 15:18:06 +02:00
Florian
d26efea831 Merge pull request #13 from gyptazy/feature/6-add-dry-run-support
feature: Add dry-run support to see what kind of rebalancing would be done
2024-07-12 08:17:46 +02:00
Florian Paul Azim Hoberg (@gyptazy)
bf5ba5f8a6 feature: Add dry-run support to see what kind of rebalancing would be done. [#6]
* Add new cli param: -d for dry-run: Prints movements to cli
  * Add new cli param: -j for json: Prints movement as a json to cli

Fixes: #6
2024-07-11 12:57:56 +02:00
Florian
5b40b2ffdb Merge pull request #14 from gyptazy/feature/github-actions-build
feature(github-workflows): Add initial github workflows
2024-07-11 12:13:42 +02:00
23 changed files with 929 additions and 202 deletions

View File

@@ -1,2 +1,2 @@
added:
- Add container (e.g., Docker, Podman) support. [#10 by @daanbosch]
- Add Docker/Podman support. [#10 by @daanbosch]

View File

@@ -0,0 +1,2 @@
added:
- Add option to rebalance by assigned VM resources to avoid overprovisioning. [#16]

View File

@@ -0,0 +1,4 @@
added:
- Add feature to make log verbosity configurable [#17].
changed:
- Adjusted general logging and log more details.

View File

@@ -0,0 +1,2 @@
added:
- Add LXC/Container integration. [#27]

View File

@@ -0,0 +1,2 @@
added:
- Add option_mode to rebalance by node's free resources in percent (instead of bytes). [#29]

View File

@@ -0,0 +1,2 @@
added:
- Add dry-run support to see what kind of rebalancing would be done. [#6]

View File

@@ -1 +1 @@
date: TBD
date: 2024-08-01

View File

@@ -0,0 +1,2 @@
added:
- Add option to run ProxLB only on the Proxmox's master node in the cluster (reg. HA feature). [#40]

View File

@@ -0,0 +1,2 @@
added:
- Add option to run migrations in parallel or sequentially. [#41]

View File

@@ -0,0 +1,2 @@
changed:
- Fix daemon timer to use hours instead of minutes. [#45]

View File

@@ -0,0 +1,2 @@
fixed:
- Fix CMake packaging for Debian package to avoid overwriting the config file. [#49]

View File

@@ -0,0 +1 @@
date: 2024-08-13

View File

@@ -6,6 +6,40 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [1.0.2] - 2024-08-13
### Added
- Add option to run migration in parallel or sequentially. [#41]
- Add option to run ProxLB only on the Proxmox's master node in the cluster (reg. HA feature). [#40]
### Changed
- Fix daemon timer to use hours instead of minutes. [#45]
- Fix CMake packaging for Debian package to avoid overwriting the config file. [#49]
- Fix wonky code style.
## [1.0.0] - 2024-08-01
### Added
- Add feature to prevent VMs from being relocated by defining a wildcard pattern. [#7]
- Add feature to make log verbosity configurable [#17].
- Add option_mode to rebalance by node's free resources in percent (instead of bytes). [#29]
- Add option to rebalance by assigned VM resources to avoid over provisioning. [#16]
- Add Docker/Podman support. [#10 by @daanbosch]
- Add exclude grouping feature to rebalance VMs from being located together to new nodes. [#4]
- Add feature to prevent VMs from being relocated by defining the 'plb_ignore_vm' tag. [#7]
- Add dry-run support to see what kind of rebalancing would be done. [#6]
- Add LXC/Container integration. [#27]
- Add include grouping feature to rebalance VMs bundled to new nodes. [#3]
### Changed
- Adjusted general logging and log more details.
## [0.9.9] - 2024-07-06
### Added

121
CONTRIBUTING.md Normal file
View File

@@ -0,0 +1,121 @@
# Contributing to ProxLB (PLB)
Thank you for considering contributing to ProxLB! We appreciate your help in improving the efficiency and performance of Proxmox clusters. Below are guidelines for contributing to the project.
## Table of Contents
- [Contributing to ProxLB (PLB)](#contributing-to-proxlb-plb)
- [Table of Contents](#table-of-contents)
- [Creating an Issue](#creating-an-issue)
- [Running Linting](#running-linting)
- [Running Tests](#running-tests)
- [Add Changelogs](#add-changelogs)
- [Submitting a Pull Request](#submitting-a-pull-request)
- [Code of Conduct](#code-of-conduct)
- [Getting Help](#getting-help)
## Creating an Issue
If you encounter a bug, have a feature request, or have any suggestions, please create an issue in our GitHub repository. To create an issue:
1. **Go to the [Issues](https://github.com/gyptazy/proxlb/issues) section of the repository.**
2. **Click on the "New issue" button.**
3. **Select the appropriate issue template (Bug Report, Feature Request, or Custom Issue).**
4. **Provide a clear and descriptive title.**
5. **Fill out the necessary details in the issue template.** Provide as much detail as possible to help us understand and reproduce the issue or evaluate the feature request.
## Running Linting
Before submitting a pull request, ensure that your changes pass linting. ProxLB uses [flake8](https://flake8.pycqa.org/) for linting. Follow these steps to run the linter locally:
1. **Install flake8 if you haven't already:**
```sh
pip install flake8
```
2. **Run the linter:**
```sh
python3 -m flake8 proxlb
```
Linting is also performed for each PR, so it makes sense to run it locally before pushing.
## Running Tests
Before submitting a pull request, ensure that your changes do not break existing functionality. ProxLB uses [pytest](https://docs.pytest.org/en/stable/) for running tests. Follow these steps to run tests locally:
1. **Install pytest if you haven't already:**
```sh
pip install pytest
```
2. **Run the tests:**
```sh
pytest
```
Ensure all tests pass before submitting your changes.
## Add Changelogs
ProxLB uses the [Changelog Fragments Creator](https://github.com/gyptazy/changelog-fragments-creator) (CFC) to generate the overall `CHANGELOG.md` file from the fragment files placed in the https://github.com/gyptazy/ProxLB/tree/main/.changelogs/ directory. Each release is represented by its version number, under which additional YAML files are placed and parsed by the CFC tool. Such a file looks like:
```
added:
- Add option to rebalance by assigned VM resources to avoid overprovisioning. [#16]
```
Every PR should contain such a file describing the change to ensure this is also stated in the changelog file.
## Submitting a Pull Request
We welcome your contributions! Follow these steps to submit a pull request:
1. **Fork the repository to your GitHub account.**
2. **Clone your forked repository to your local machine:**
```sh
git clone https://github.com/<your-username>/proxlb.git
cd proxlb
```
Please prefix your PR with its type. It can be one of:
* docs
* feature
* fix
It should also reference the ID of the issue it relates to.
1. **Create a new branch for your changes:**
```sh
git checkout -b feature/10-add-new-cool-stuff
```
2. **Make your changes and commit them with a descriptive commit message:**
```sh
git add .
git commit -m "feature: Adding new cool stuff"
```
3. **Push your changes to your forked repository:**
```sh
git push origin feature/10-add-new-cool-stuff
```
4. **Create a pull request from your forked repository:**
- Go to the original repository on GitHub.
- Click on the "New pull request" button.
- Select the branch you pushed your changes to and create the pull request.
Please ensure that your pull request:
- Follows the project's coding style and guidelines.
- Includes tests for any new functionality.
- Updates the documentation as necessary.
## Code of Conduct
By participating in this project, you agree to abide by our [Code of Conduct](CODE_OF_CONDUCT.md). Please read it to understand the expected behavior and responsibilities when interacting with the community.
## Getting Help
If you need help or have any questions, feel free to reach out by creating an issue or by joining our [discussion forum](https://github.com/gyptazy/proxlb/discussions). You can also refer to our [documentation](https://github.com/gyptazy/ProxLB/tree/main/docs) for more information about the project or join our [chat room](https://matrix.to/#/#proxlb:gyptazy.ch) in Matrix.
Thank you for contributing to ProxLB! Together, we can enhance the efficiency and performance of Proxmox clusters.

180
README.md
View File

@@ -5,33 +5,49 @@
<p float="center"><img src="https://img.shields.io/github/license/gyptazy/ProxLB"/><img src="https://img.shields.io/github/contributors/gyptazy/ProxLB"/><img src="https://img.shields.io/github/last-commit/gyptazy/ProxLB/main"/><img src="https://img.shields.io/github/issues-raw/gyptazy/ProxLB"/><img src="https://img.shields.io/github/issues-pr/gyptazy/ProxLB"/></p>
## Table of Content
* Introduction
* Video of Migration
* Features
* Usage
* Dependencies
* Options
* Parameters
* Systemd
* Manuel
* Proxmox GUI Integration
* Quick Start
* Container Quick Start (Docker/Podman)
* Motivation
* References
* Packages / Container Images
* Misc
* Bugs
* Contributing
* Author(s)
## Table of Contents
- [ProxLB - (Re)Balance VM Workloads in Proxmox Clusters](#proxlb---rebalance-vm-workloads-in-proxmox-clusters)
- [Table of Contents](#table-of-contents)
- [Introduction](#introduction)
- [Video of Migration](#video-of-migration)
- [Features](#features)
- [How does it work?](#how-does-it-work)
- [Usage](#usage)
- [Dependencies](#dependencies)
- [Options](#options)
- [Parameters](#parameters)
- [Balancing](#balancing)
- [General](#general)
- [By Used Memory of VMs/CTs](#by-used-memory-of-vmscts)
- [By Assigned Memory of VMs/CTs](#by-assigned-memory-of-vmscts)
- [Grouping](#grouping)
- [Include (Stay Together)](#include-stay-together)
- [Exclude (Stay Separate)](#exclude-stay-separate)
- [Ignore VMs (Tag Style)](#ignore-vms-tag-style)
- [Systemd](#systemd)
- [Manual](#manual)
- [Proxmox GUI Integration](#proxmox-gui-integration)
- [Quick Start](#quick-start)
- [Container Quick Start (Docker/Podman)](#container-quick-start-dockerpodman)
- [Logging](#logging)
- [Motivation](#motivation)
- [References](#references)
- [Downloads](#downloads)
- [Packages](#packages)
- [Repository](#repository)
- [Container Images (Docker/Podman)](#container-images-dockerpodman)
- [Misc](#misc)
- [Bugs](#bugs)
- [Contributing](#contributing)
- [Support](#support)
- [Author(s)](#authors)
## Introduction
`ProxLB` (PLB) is an advanced tool designed to enhance the efficiency and performance of Proxmox clusters by optimizing the distribution of virtual machines (VMs) across the cluster nodes by using the Proxmox API. ProxLB meticulously gathers and analyzes a comprehensive set of resource metrics from both the cluster nodes and the running VMs. These metrics include CPU usage, memory consumption, and disk utilization, specifically focusing on local disk resources.
`ProxLB` (PLB) is an advanced tool designed to enhance the efficiency and performance of Proxmox clusters by optimizing the distribution of virtual machines (VMs) or Containers (CTs) across the cluster nodes by using the Proxmox API. ProxLB meticulously gathers and analyzes a comprehensive set of resource metrics from both the cluster nodes and the running VMs. These metrics include CPU usage, memory consumption, and disk utilization, specifically focusing on local disk resources.
PLB collects resource usage data from each node in the Proxmox cluster, including CPU, (local) disk and memory utilization. Additionally, it gathers resource usage statistics from all running VMs, ensuring a granular understanding of the cluster's workload distribution.
Intelligent rebalancing is a key feature of ProxLB where it re-balances VMs based on their memory, disk or cpu usage, ensuring that no node is overburdened while others remain underutilized. The rebalancing capabilities of PLB significantly enhance cluster performance and reliability. By ensuring that resources are evenly distributed, PLB helps prevent any single node from becoming a performance bottleneck, improving the reliability and stability of the cluster. Efficient rebalancing leads to better utilization of available resources, potentially reducing the need for additional hardware investments and lowering operational costs.
Intelligent rebalancing is a key feature of ProxLB where it re-balances VMs based on their memory, disk or CPU usage, ensuring that no node is overburdened while others remain underutilized. The rebalancing capabilities of PLB significantly enhance cluster performance and reliability. By ensuring that resources are evenly distributed, PLB helps prevent any single node from becoming a performance bottleneck, improving the reliability and stability of the cluster. Efficient rebalancing leads to better utilization of available resources, potentially reducing the need for additional hardware investments and lowering operational costs.
Automated rebalancing reduces the need for manual actions, allowing operators to focus on other critical tasks, thereby increasing operational efficiency.
@@ -46,9 +62,20 @@ Automated rebalancing reduces the need for manual actions, allowing operators to
* Performing
* Periodically
* One-shot solution
* Types
* Rebalance only VMs
* Rebalance only CTs
* Rebalance all (VMs and CTs)
* Filter
* Exclude nodes
* Exclude virtual machines
* Grouping
* Include groups (VMs that are rebalanced to nodes together)
* Exclude groups (VMs that must run on different nodes)
* Ignore groups (VMs that should be untouched)
* Dry-run support
* Human readable output in CLI
* JSON output for further parsing
* Migrate VM workloads away (e.g. maintenance preparation)
* Fully based on Proxmox API
* Usage
@@ -56,6 +83,11 @@ Automated rebalancing reduces the need for manual actions, allowing operators to
* Periodically (daemon)
* Proxmox Web GUI Integration (optional)
## How does it work?
ProxLB is a load-balancing system designed to optimize the distribution of virtual machines (VMs) and containers (CTs) across a cluster. It works by first gathering resource usage metrics from all nodes in the cluster through the Proxmox API. This includes detailed resource metrics for each VM and CT on every node. ProxLB then evaluates the difference between the maximum and minimum resource usage of the nodes, referred to as "Balanciness." If this difference exceeds a predefined threshold (which is configurable), the system initiates the rebalancing process.
Before starting any migrations, ProxLB validates that rebalancing actions are necessary and beneficial. Depending on the selected balancing mode — such as CPU, memory, or disk — it creates a balancing matrix. This matrix sorts the VMs by their maximum used or assigned resources, identifying the VM with the highest usage. ProxLB then places this VM on the node with the most free resources in the selected balancing type. This process runs recursively until the operator-defined Balanciness is achieved. Balancing can be defined for the used or max. assigned resources of VMs/CTs.
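The placement step described above can be sketched as follows. This is a simplified, iterative illustration and not ProxLB's actual code: the function names (`usage_percent`, `spread`, `plan_migrations`) and the dictionary layout are assumptions made for this example, and each VM is considered at most once so that the loop always terminates.

```python
def usage_percent(node):
    """Resource usage of a node in percent of its total capacity."""
    return 100.0 * node["used"] / node["total"]


def spread(nodes):
    """Difference between the most and the least loaded node, in percent."""
    values = [usage_percent(node) for node in nodes.values()]
    return max(values) - min(values)


def plan_migrations(nodes, vms, balanciness=10):
    """Return a list of (vm, source_node, target_node) moves.

    nodes: {"node01": {"total": 100, "used": 80}, ...}   (hypothetical layout)
    vms:   {"vm1": {"node": "node01", "used": 50}, ...}  (hypothetical layout)
    """
    # Work on copies so the caller's data is not modified.
    nodes = {name: dict(stats) for name, stats in nodes.items()}
    vms = {name: dict(stats) for name, stats in vms.items()}
    plan = []

    # Largest consumer first, as in the balancing matrix described above.
    for vm_name in sorted(vms, key=lambda v: vms[v]["used"], reverse=True):
        if spread(nodes) <= balanciness:
            break  # Desired balanciness reached.
        vm = vms[vm_name]
        source = vm["node"]
        # Node with the most free resources for the selected method.
        target = max(nodes, key=lambda n: nodes[n]["total"] - nodes[n]["used"])
        if target == source:
            continue  # VM already sits on the emptiest node.
        nodes[source]["used"] -= vm["used"]
        nodes[target]["used"] += vm["used"]
        vm["node"] = target
        plan.append((vm_name, source, target))
    return plan
```

A greedy pass like this does not guarantee the minimum number of migrations; it only illustrates the "largest VM to emptiest node" idea.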
## Usage
Running PLB is easy and it runs almost everywhere since it just depends on `Python3` and the `proxmoxer` library. Therefore, it can directly run on a Proxmox node, dedicated systems like Debian, RedHat, or even FreeBSD, as long as the API is reachable by the client running PLB.
@@ -73,10 +105,17 @@ The following options can be set in the `proxlb.conf` file:
| api_pass | FooBar | Password for the API. |
| verify_ssl | 1 | Validate SSL certificates (1) or ignore (0). (default: 1) |
| method | memory | Defines the balancing method (default: memory) where you can use `memory`, `disk` or `cpu`. |
| mode | used | Rebalance by `used` resources (efficiency) or `assigned` (avoid overprovisioning) resources. (default: used)|
| mode_option | bytes | Rebalance by node's resources in `bytes` or `percent`. (default: bytes) |
| type | vm | Rebalance only `vm` (virtual machines), `ct` (containers) or `all` (virtual machines & containers). (default: vm)|
| balanciness | 10 | Maximum difference in percent allowed between the lowest and highest resource consumption of nodes before rebalancing starts. (default: 10) |
| parallel_migrations | 1 | Defines whether migrations run in parallel (1) or sequentially (0). (default: 1) |
| ignore_nodes | dummynode01,dummynode02,test* | Defines a comma separated list of nodes to exclude. |
| ignore_vms | testvm01,testvm02 | Defines a comma separated list of VMs to exclude. (`*` as suffix wildcard or tags are also supported) |
| master_only | 0 | Defines whether balancing should only be performed on the cluster master node (1) or not (0). (default: 0) |
| daemon | 1 | Run as a daemon (1) or one-shot (0). (default: 1) |
| schedule | 24 | Interval in hours between rebalancing runs. (default: 24) |
| log_verbosity | INFO | Defines the log level (default: CRITICAL) where you can use `DEBUG`, `INFO`, `WARNING` or `CRITICAL`. |
An example of the configuration file looks like:
```
@@ -87,9 +126,25 @@ api_pass: FooBar
verify_ssl: 1
[balancing]
method: memory
mode: used
type: vm
# Balanciness defines how much difference may be
# between the lowest & highest resource consumption
# of nodes before rebalancing will be done.
# Examples:
# Rebalancing: node01: 41% memory consumption :: node02: 52% consumption
# No rebalancing: node01: 43% memory consumption :: node02: 50% consumption
balanciness: 10
# Enable parallel migrations. If set to 0 it will wait for completed migrations
# before starting next migration.
parallel_migrations: 1
ignore_nodes: dummynode01,dummynode02
ignore_vms: testvm01,testvm02
[service]
# The master_only option might be useful if running ProxLB on all nodes in a cluster
# but only a single one should do the balancing. The master node is obtained from the Proxmox
# HA status.
master_only: 0
daemon: 1
```
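Because the file uses an INI-style `key: value` layout, it can be read with Python's standard `configparser`. The following is a minimal sketch, assuming only the `[balancing]` and `[service]` options shown above; the helper name `load_settings` and the returned dictionary are hypothetical and do not reflect how ProxLB itself loads its configuration.

```python
import configparser

SAMPLE = """
[balancing]
method: memory
mode: used
type: vm
balanciness: 10
parallel_migrations: 1
ignore_nodes: dummynode01,dummynode02
ignore_vms: testvm01,testvm02

[service]
master_only: 0
daemon: 1
"""


def split_list(value):
    """Turn a comma separated string into a list, dropping empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def load_settings(text):
    """Parse the config text and apply the defaults listed in the options table."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    balancing = parser["balancing"]
    service = parser["service"]
    return {
        "method": balancing.get("method", fallback="memory"),
        "mode": balancing.get("mode", fallback="used"),
        "type": balancing.get("type", fallback="vm"),
        "balanciness": balancing.getint("balanciness", fallback=10),
        "parallel_migrations": balancing.getboolean("parallel_migrations", fallback=True),
        "ignore_nodes": split_list(balancing.get("ignore_nodes", fallback="")),
        "ignore_vms": split_list(balancing.get("ignore_vms", fallback="")),
        "master_only": service.getboolean("master_only", fallback=False),
        "daemon": service.getboolean("daemon", fallback=True),
    }


if __name__ == "__main__":
    print(load_settings(SAMPLE))
```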
@@ -99,7 +154,28 @@ The following options and parameters are currently supported:
| Option | Long Option | Description | Default |
|------|:------:|------:|------:|
| -c | --config | Path to a config file. | /etc/proxlb/proxlb.conf (default) |
| -d | --dry-run | Perform a dry-run without doing any actions. | Unset |
| -j | --json | Return a JSON of the VM movement. | Unset |
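For illustration, the three options from the table could be declared with Python's `argparse` roughly like this. It is a sketch under the assumption that `-d` and `-j` are simple on/off flags; the function name `build_parser` is hypothetical and ProxLB's own argument handling may differ.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options listed in the table above."""
    parser = argparse.ArgumentParser(prog="proxlb")
    parser.add_argument(
        "-c", "--config",
        default="/etc/proxlb/proxlb.conf",
        help="Path to a config file.",
    )
    parser.add_argument(
        "-d", "--dry-run",
        action="store_true",
        help="Perform a dry-run without doing any actions.",
    )
    parser.add_argument(
        "-j", "--json",
        action="store_true",
        help="Return a JSON of the VM movement.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.config, args.dry_run, args.json)
```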
### Balancing
#### General
In general, virtual machines and containers can be rebalanced and moved between nodes in the cluster. For virtual machines, this often works as a live migration without downtime. However, this does **not** apply to containers: LXC-based containers are shut down, copied, and started on the new node. Note that although live migrations can run smoothly, there are still several things to consider. These are out of scope for ProxLB and apply in general to Proxmox and your cluster setup. You can find more details here: https://pve.proxmox.com/wiki/Migrate_to_Proxmox_VE.
#### By Used Memory of VMs/CTs
By continuously monitoring the current resource usage of VMs, ProxLB intelligently reallocates workloads to prevent any single node from becoming overloaded. This approach ensures that resources are balanced efficiently, providing consistent and optimal performance across the entire cluster at all times. To activate this balancing mode, simply activate the following option in your ProxLB configuration:
```
mode: used
```
Afterwards, restart the service (if running in daemon mode) to activate this rebalancing mode.
#### By Assigned Memory of VMs/CTs
By ensuring that resources are always available for each VM, ProxLB prevents over-provisioning and maintains a balanced load across all nodes. This guarantees that users have consistent access to the resources they need. However, if the total assigned resources exceed the combined capacity of the cluster, ProxLB will issue a warning, indicating potential over-provisioning despite its best efforts to balance the load. To activate this balancing mode, simply activate the following option in your ProxLB configuration:
```
mode: assigned
```
Afterwards, restart the service (if running in daemon mode) to activate this rebalancing mode.
### Grouping
#### Include (Stay Together)
@@ -108,7 +184,7 @@ The following options and parameters are currently supported:
#### Exclude (Stay Separate)
<img align="left" src="https://cdn.gyptazy.ch/images/plb-rebalancing-exclude-balance-group.jpg"/> Access the Proxmox Web UI by opening your web browser and navigating to your Proxmox VE web interface, then log in with your credentials. Navigate to the VM you want to tag by selecting it from the left-hand navigation panel. Click on the "Options" tab to view the VM's options, then select "Edit" or "Add" (depending on whether you are editing an existing tag or adding a new one). In the tag field, enter plb_exclude_ followed by your unique identifier, for example, plb_exclude_critical. Save the changes to apply the tag to the VM. Repeat these steps for each VM that should be excluded from being on the same node.
#### Ignore VMs (tag style)
#### Ignore VMs (Tag Style)
<img align="left" src="https://cdn.gyptazy.ch/images/plb-rebalancing-ignore-vm.jpg"/> In Proxmox, you can ensure that certain VMs are ignored during the rebalancing process by setting a specific tag within the Proxmox Web UI, rather than solely relying on configurations in the ProxLB config file. This can be achieved by adding the tag 'plb_ignore_vm' to the VM. Once this tag is applied, the VM will be excluded from any further rebalancing operations, simplifying the management process.
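The two ways of ignoring a VM (the `ignore_vms` config option with a `*` wildcard, and the `plb_ignore_vm` tag) can be combined in a small check like the one below. This is only a sketch: the function name `is_ignored` and its arguments are assumptions for this example, and `fnmatchcase` accepts more pattern forms than the suffix wildcard that ProxLB documents.

```python
from fnmatch import fnmatchcase

IGNORE_TAG = "plb_ignore_vm"


def is_ignored(vm_name, vm_tags, ignore_patterns):
    """Return True if a VM should be skipped during rebalancing.

    vm_tags:         tags set on the VM in the Proxmox Web UI
    ignore_patterns: entries of the ignore_vms option, e.g. ["testvm01", "test*"]
    """
    if IGNORE_TAG in vm_tags:
        return True
    # Exact names and patterns such as "test*" are both handled by fnmatchcase.
    return any(fnmatchcase(vm_name, pattern) for pattern in ignore_patterns)
```

For example, `is_ignored("testvm01", [], ["test*"])` and `is_ignored("db01", ["plb_ignore_vm"], [])` are both true, while a VM named `web01` without the tag is not ignored by the pattern `test*`.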
### Systemd
@@ -134,8 +210,8 @@ The executable must be able to read the config file, if no dedicated config file
The easiest way to get started is by using the ready-to-use packages that I provide on my CDN and to run it on a Linux Debian based system. This can also be one of the Proxmox nodes itself.
```
wget https://cdn.gyptazy.ch/files/amd64/debian/proxlb/proxlb_0.9.9_amd64.deb
dpkg -i proxlb_0.9.9_amd64.deb
wget https://cdn.gyptazy.ch/files/amd64/debian/proxlb/proxlb_1.0.2_amd64.deb
dpkg -i proxlb_1.0.2_amd64.deb
# Adjust your config
vi /etc/proxlb/proxlb.conf
systemctl restart proxlb
@@ -148,7 +224,7 @@ Creating a container image of ProxLB is straightforward using the provided Docke
```bash
git clone https://github.com/gyptazy/ProxLB.git
cd ProxLB
build -t proxlb .
docker build -t proxlb .
```
Afterwards simply adjust the config file to your needs:
@@ -162,7 +238,15 @@ docker run -it --rm -v $(pwd)/proxlb.conf:/etc/proxlb/proxlb.conf proxlb
```
### Logging
ProxLB uses the `SystemdHandler` for logging. You can find all your logs in your systemd unit log or in the journalctl.
ProxLB uses the `SystemdHandler` for logging. You can find all your logs in your systemd unit log or via `journalctl`. By default, ProxLB only logs critical events. However, to better understand the balancing, it might be useful to change this to `INFO` or `DEBUG`, which can be done in the [proxlb.conf](https://github.com/gyptazy/ProxLB/blob/main/proxlb.conf#L14) file by changing the `log_verbosity` parameter.
Available logging values:
| Verbosity | Description |
|------|:------:|
| DEBUG | This option logs everything and is needed for debugging the code. |
| INFO | This option provides insights behind the scenes: what has been done, why, and with which values. |
| WARNING | This option provides only warning messages, which might be a problem in general but not for the application itself. |
| CRITICAL | This option logs all critical events that prevent ProxLB from running. |
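As a rough illustration of how a `log_verbosity` value maps to a log level, the following sketch uses Python's standard `logging` module. The function name `configure_logging` and the logger name are hypothetical, and the `SystemdHandler` used by ProxLB is not reproduced here.

```python
import logging

# The four verbosity values from the table above.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "CRITICAL": logging.CRITICAL,
}


def configure_logging(log_verbosity="CRITICAL"):
    """Return a logger set to the requested verbosity (default: CRITICAL)."""
    try:
        level = LEVELS[log_verbosity.upper()]
    except KeyError:
        raise ValueError(f"Unsupported log_verbosity: {log_verbosity!r}") from None
    logger = logging.getLogger("proxlb-example")
    logger.setLevel(level)
    return logger
```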
## Motivation
As a developer managing a cluster of virtual machines for my projects, I often encountered the challenge of resource imbalance. Nodes within the cluster would become unevenly loaded, with some nodes being overburdened while others remained underutilized. This imbalance led to inefficiencies, performance bottlenecks, and increased operational costs. Frustrated by the lack of an adequate solution to address this issue, I decided to develop the ProxLB (PLB) to ensure better resource distribution across my clusters.
@@ -183,25 +267,61 @@ Here you can find some overviews of references for and about the ProxLB (PLB):
| General introduction into ProxLB | https://gyptazy.ch/blog/proxlb-rebalancing-vm-workloads-across-nodes-in-proxmox-clusters/ |
| Howto install and use ProxLB on Debian to rebalance vm workloads in a Proxmox cluster | https://gyptazy.ch/howtos/howto-install-and-use-proxlb-to-rebalance-vm-workloads-across-nodes-in-proxmox-clusters/ |
## Packages / Container Images
## Downloads
ProxLB can be obtained in many different ways, depending on which use case you prefer. You can simply copy the code from GitHub, use prebuilt packages for Debian or RedHat based systems, use a repository to keep ProxLB always up to date, or use a container image for Docker/Podman.
### Packages
Ready to use packages can be found at:
* https://cdn.gyptazy.ch/files/amd64/debian/proxlb/
* https://cdn.gyptazy.ch/files/amd64/ubuntu/proxlb/
* https://cdn.gyptazy.ch/files/amd64/redhat/proxlb/
* https://cdn.gyptazy.ch/files/amd64/freebsd/proxlb/
### Repository
Debian based systems can also use the repository by adding the following line to their apt sources:
```
deb https://repo.gyptazy.ch/ /
```
The Repository's GPG key can be found at: `https://repo.gyptazy.ch/repo/KEY.gpg`
You can also simply import it by running:
```
# KeyID: DEB76ADF7A0BAADB51792782FD6A7A70C11226AA
# SHA256: 5e44fffa09c747886ee37cc6e9e7eaf37c6734443cc648eaf0a9241a89084383 KEY.gpg
wget -O /etc/apt/trusted.gpg.d/proxlb.asc https://repo.gyptazy.ch/repo/KEY.gpg
```
*Note: The defined repositories `repo.gyptazy.ch` and `repo.proxlb.de` are the same!*
### Container Images (Docker/Podman)
Container Images for Podman, Docker etc., can be found at:
| Version | Image |
|------|:------:|
| latest | cr.gyptazy.ch/proxlb/proxlb:latest |
| v0.0.9 | cr.gyptazy.ch/proxlb/proxlb:v0.0.9 |
| v1.0.2 | cr.gyptazy.ch/proxlb/proxlb:v1.0.2 |
| v1.0.0 | cr.gyptazy.ch/proxlb/proxlb:v1.0.0 |
| v0.9.9 | cr.gyptazy.ch/proxlb/proxlb:v0.9.9 |
## Misc
### Bugs
Bugs can be reported via the GitHub issue tracker [here](https://github.com/gyptazy/ProxLB/issues). You may also report bugs via email or submit PRs to fix them yourself; in that case, see the Contributing section.
### Contributing
Feel free to add further documentation, to adjust already existing one or to contribute with code. Please take care about the style guide and naming conventions.
Feel free to add further documentation, to improve the existing documentation, or to contribute code. Please follow the style guide and naming conventions. You can find more in our [CONTRIBUTING.md](https://github.com/gyptazy/ProxLB/blob/main/CONTRIBUTING.md) file.
### Support
If you need assistance or have any questions, we offer support through our dedicated [chat room](https://matrix.to/#/#proxlb:gyptazy.ch) in Matrix and on Reddit. Join the chat room for real-time help, advice, and discussions with other users and developers. You can also visit our [Reddit community](https://www.reddit.com/r/Proxmox/comments/1e78ap3/introducing_proxlb_rebalance_your_vm_workloads/) to post your queries, share your experiences, and get support from fellow community members and moderators. You may also open an issue directly [here](https://github.com/gyptazy/ProxLB/issues) on GitHub. We are here to help and ensure you have the best experience possible.
| Support Channel | Link |
|------|:------:|
| Matrix | [#proxlb:gyptazy.ch](https://matrix.to/#/#proxlb:gyptazy.ch) |
| Reddit | [Reddit community](https://www.reddit.com/r/Proxmox/comments/1e78ap3/introducing_proxlb_rebalance_your_vm_workloads/) |
| GitHub | [ProxLB GitHub](https://github.com/gyptazy/ProxLB/issues) |
### Author(s)
* Florian Paul Azim Hoberg @gyptazy (https://gyptazy.ch)

View File

@@ -1,4 +1,20 @@
# Configuration
## Balancing
### By Used Memory of VMs
By continuously monitoring the current resource usage of VMs, ProxLB intelligently reallocates workloads to prevent any single node from becoming overloaded. This approach ensures that resources are balanced efficiently, providing consistent and optimal performance across the entire cluster at all times. To activate this balancing mode, simply activate the following option in your ProxLB configuration:
```
mode: used
```
Afterwards, restart the service (if running in daemon mode) to activate this rebalancing mode.
### By Assigned Memory of VMs
By ensuring that resources are always available for each VM, ProxLB prevents over-provisioning and maintains a balanced load across all nodes. This guarantees that users have consistent access to the resources they need. However, if the total assigned resources exceed the combined capacity of the cluster, ProxLB will issue a warning, indicating potential over-provisioning despite its best efforts to balance the load. To activate this balancing mode, set the following option in your ProxLB configuration:
```
mode: assigned
```
Afterwards, restart the service (if running in daemon mode) to activate this rebalancing mode.
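The over-provisioning warning described above can be illustrated with a minimal sketch. It assumes that the assigned memory of all VMs on a node is summed and compared with the node's total memory; the function name `assigned_memory_percent` and the sample values are hypothetical and only serve the illustration.

```python
def assigned_memory_percent(node_total_bytes, vm_assigned_bytes):
    """Return the share of a node's memory assigned to its VMs, in percent."""
    return sum(vm_assigned_bytes) / node_total_bytes * 100


GIB = 1024 ** 3

# Hypothetical node with 64 GiB of memory hosting three VMs.
percent = assigned_memory_percent(64 * GIB, [32 * GIB, 24 * GIB, 16 * GIB])

# More memory is assigned than the node provides, so a warning is printed.
if percent > 99:
    print(f"Warning: node is overprovisioned for memory ({int(percent)}% assigned).")
```

A value above 100 means that the VMs could claim more memory than the node physically has, even if their current usage is lower.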
## Grouping
### Include (Stay Together)
<img align="left" src="https://cdn.gyptazy.ch/images/plb-rebalancing-include-balance-group.jpg"/> Access the Proxmox Web UI by opening your web browser and navigating to your Proxmox VE web interface, then log in with your credentials. Navigate to the VM you want to tag by selecting it from the left-hand navigation panel. Click on the "Options" tab to view the VM's options, then select "Edit" or "Add" (depending on whether you are editing an existing tag or adding a new one). In the tag field, enter `plb_include_` followed by your unique identifier, for example, `plb_include_group1`. Save the changes to apply the tag to the VM. Repeat these steps for each VM that should be included in the group.
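To illustrate how such tags can be turned into groups, here is a minimal sketch. It assumes that the tags of a VM arrive as one semicolon-separated string; the helper names `get_include_group` and `group_vms` are hypothetical and are not taken from the ProxLB code.

```python
def get_include_group(vm_tags, prefix="plb_include_"):
    """Return the first tag that starts with the include prefix, or None."""
    if not vm_tags:
        return None
    for tag in vm_tags.split(";"):
        tag = tag.strip()
        if tag.startswith(prefix):
            return tag
    return None


def group_vms(vm_tags_by_name):
    """Map each include group to the list of VM names carrying that tag."""
    groups = {}
    for name, tags in vm_tags_by_name.items():
        group = get_include_group(tags)
        if group is not None:
            groups.setdefault(group, []).append(name)
    return groups


# Hypothetical VMs: two share the same include tag and should stay together.
print(group_vms({
    "web01": "prod;plb_include_group1",
    "web02": "plb_include_group1",
    "db01": "prod",
}))
```

VMs that end up in the same group are the ones that should be placed on the same node.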


@@ -21,3 +21,57 @@ Jul 06 10:25:16 build01 proxlb[7285]: proxlb: Error: [python-imports]: Could no
Debian/Ubuntu: apt-get install python3-proxmoxer
If the package is not provided by your system's repository, you can also install it by running `pip3 install proxmoxer`.
### How does it work?
ProxLB is a load-balancing system designed to optimize the distribution of virtual machines (VMs) and containers (CTs) across a cluster. It works by first gathering resource usage metrics from all nodes in the cluster through the Proxmox API. This includes detailed resource metrics for each VM and CT on every node. ProxLB then evaluates the difference between the maximum and minimum resource usage of the nodes, referred to as "Balanciness." If this difference exceeds a predefined threshold (which is configurable), the system initiates the rebalancing process.
Before starting any migrations, ProxLB validates that rebalancing actions are necessary and beneficial. Depending on the selected balancing mode — such as CPU, memory, or disk — it creates a balancing matrix. This matrix sorts the VMs by their maximum used or assigned resources, identifying the VM with the highest usage. ProxLB then places this VM on the node with the most free resources in the selected balancing type. This process runs recursively until the operator-defined Balanciness is achieved. Balancing can be defined for the used or max. assigned resources of VMs/CTs.
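The process described above can be sketched in a few lines. This is a simplified illustration, not the ProxLB implementation: it balances a single resource, uses a loop instead of recursion, and all names (`balanciness_exceeded`, `plan_rebalance`, the dictionary layout) are assumptions made for this example.

```python
def balanciness_exceeded(free_percent, threshold=10):
    """True if the gap between the most and least free node exceeds the threshold."""
    values = free_percent.values()
    return max(values) - min(values) > threshold


def plan_rebalance(nodes, vms, threshold=10):
    """Plan moves of the most-used VMs to the node with the most free resources.

    nodes: name -> {"total": int, "used": int}
    vms:   name -> {"node": str, "used": int}
    Returns a list of (vm, source_node, target_node) tuples.
    """
    moves = []
    processed = set()

    def free_percent():
        return {n: (v["total"] - v["used"]) / v["total"] * 100 for n, v in nodes.items()}

    while balanciness_exceeded(free_percent(), threshold):
        candidates = [name for name in vms if name not in processed]
        if not candidates:
            break  # Every VM was considered once; stop to avoid looping forever.
        vm = max(candidates, key=lambda name: vms[name]["used"])
        processed.add(vm)
        current = free_percent()
        target = max(current, key=current.get)
        source = vms[vm]["node"]
        if target != source:
            # Update the bookkeeping as if the VM had already been migrated.
            nodes[source]["used"] -= vms[vm]["used"]
            nodes[target]["used"] += vms[vm]["used"]
            vms[vm]["node"] = target
            moves.append((vm, source, target))
    return moves


# Hypothetical two-node cluster with an uneven load.
nodes = {"node1": {"total": 100, "used": 80}, "node2": {"total": 100, "used": 20}}
vms = {
    "vm1": {"node": "node1", "used": 30},
    "vm2": {"node": "node1", "used": 30},
    "vm3": {"node": "node1", "used": 20},
    "vm4": {"node": "node2", "used": 20},
}
moves = plan_rebalance(nodes, vms)
print(moves)
```

Tracking the already processed VMs keeps the sketch from moving the same VM back and forth, and the loop ends once the gap between the nodes drops to the threshold or below.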
### Logging
ProxLB uses the `SystemdHandler` for logging. You can find all logs in the systemd unit log or via `journalctl`. By default, ProxLB only logs critical events. However, to better understand the balancing decisions, it can be useful to change this to `INFO` or `DEBUG`, which can be done in the [proxlb.conf](https://github.com/gyptazy/ProxLB/blob/main/proxlb.conf#L14) file by changing the `log_verbosity` parameter.
Available logging values:
| Verbosity | Description |
|------|:------:|
| DEBUG | This option logs everything and is needed for debugging the code. |
| INFO | This option provides insights into what happens behind the scenes: what was done, why, and with which values. |
| WARNING | This option provides only warning messages, which might be a problem in general but not for the application itself. |
| CRITICAL | This option logs all critical events that prevent ProxLB from running. |
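As a rough illustration of how the `log_verbosity` value could be mapped to a Python logging level, consider the following sketch. It assumes the option lives in a `[service]` section and falls back to `CRITICAL`; the helper `read_log_level` and the sample configuration are hypothetical.

```python
import configparser
import logging

# Hypothetical configuration snippet; only the relevant option is shown.
SAMPLE_CONFIG = """
[service]
log_verbosity: INFO
"""


def read_log_level(config_text):
    """Read log_verbosity from the config text and return the logging level."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    name = config["service"].get("log_verbosity", "CRITICAL")
    level = getattr(logging, name.upper(), None)
    if not isinstance(level, int):
        raise ValueError(f"Unsupported log verbosity: {name}")
    return level


level = read_log_level(SAMPLE_CONFIG)
logging.getLogger().setLevel(level)
```

Unknown values raise an error in this sketch instead of silently falling back, which makes typos in the configuration visible early.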
### Motivation
As a developer managing a cluster of virtual machines for my projects, I often encountered the challenge of resource imbalance. Nodes within the cluster would become unevenly loaded, with some nodes being overburdened while others remained underutilized. This imbalance led to inefficiencies, performance bottlenecks, and increased operational costs. Frustrated by the lack of an adequate solution to address this issue, I decided to develop ProxLB (PLB) to ensure better resource distribution across my clusters.
My primary motivation for creating PLB stemmed from my work on my BoxyBSD project and on my personal clusters, where I consistently faced the difficulty of keeping nodes balanced while running various VM workloads. The absence of an efficient rebalancing mechanism made it challenging to achieve optimal performance and stability. Recognizing the necessity for a tool that could gather and analyze resource metrics from both the cluster nodes and the running VMs, I embarked on developing ProxLB.
PLB meticulously collects detailed resource usage data from each node in a Proxmox cluster, including CPU load, memory usage, and local disk space utilization. It also gathers comprehensive statistics from all running VMs, providing a granular understanding of the workload distribution. With this data, PLB intelligently redistributes VMs based on memory usage, local disk usage, and CPU usage. This ensures that no single node is overburdened, storage resources are evenly distributed, and the computational load is balanced, enhancing overall cluster performance.
As an advocate of the open-source philosophy, I believe in the power of community and collaboration. By sharing solutions like PLB, I aim to contribute to the collective knowledge and tools available to developers facing similar challenges. Open source fosters innovation, transparency, and mutual support, enabling developers to build on each other's work and create better solutions together.
Developing PLB was driven by a desire to solve a real problem I faced in my projects. However, the spirit behind this effort was to provide a valuable resource to the community. By open-sourcing PLB, I hope to help other developers manage their clusters more efficiently, optimize their resource usage, and reduce operational costs. Sharing this solution aligns with the core principles of open source, where the goal is not only to solve individual problems but also to contribute to the broader ecosystem.
### Packages / Container Images
Ready to use packages can be found at:
* https://cdn.gyptazy.ch/files/amd64/debian/proxlb/
* https://cdn.gyptazy.ch/files/amd64/ubuntu/proxlb/
* https://cdn.gyptazy.ch/files/amd64/redhat/proxlb/
* https://cdn.gyptazy.ch/files/amd64/freebsd/proxlb/
Container Images for Podman, Docker etc., can be found at:
| Version | Image |
|------|:------:|
| latest | cr.gyptazy.ch/proxlb/proxlb:latest |
### Bugs
Bugs can be reported via the GitHub issue tracker [here](https://github.com/gyptazy/ProxLB/issues). You may also report bugs via email or submit PRs to fix them yourself. For the latter, please also see the Contributing section.
### Contributing
Feel free to add further documentation, to adjust existing documentation, or to contribute code. Please follow the style guide and naming conventions. You can find more in our [CONTRIBUTING.md](https://github.com/gyptazy/ProxLB/blob/main/CONTRIBUTING.md) file.
### Support
If you need assistance or have any questions, we offer support through our dedicated [chat room](https://matrix.to/#/#proxlb:gyptazy.ch) in Matrix and on Reddit. Join the chat room for real-time help, advice, and discussions with other users and developers. You can also visit our [Reddit community](https://www.reddit.com/r/Proxmox/comments/1e78ap3/introducing_proxlb_rebalance_your_vm_workloads/) to post your queries, share your experiences, and get support from fellow community members and moderators. You may also open an issue directly [here](https://github.com/gyptazy/ProxLB/issues) on GitHub. We are here to help and ensure you have the best experience possible.
| Support Channel | Link |
|------|:------:|
| Matrix | [#proxlb:gyptazy.ch](https://matrix.to/#/#proxlb:gyptazy.ch) |
| Reddit | [Reddit community](https://www.reddit.com/r/Proxmox/comments/1e78ap3/introducing_proxlb_rebalance_your_vm_workloads/) |
| GitHub | [ProxLB GitHub](https://github.com/gyptazy/ProxLB/issues) |


@@ -1,5 +1,5 @@
cmake_minimum_required(VERSION 3.16)
project(proxmox-rebalancing-service VERSION 0.9.9)
project(proxmox-rebalancing-service VERSION 1.0.2)
install(PROGRAMS ../proxlb DESTINATION /bin)
install(FILES ../proxlb.conf DESTINATION /etc/proxlb)
@@ -17,8 +17,8 @@ set(CPACK_PACKAGE_VENDOR "gyptazy")
set(CPACK_PACKAGE_VERSION ${CMAKE_PROJECT_VERSION})
set(CPACK_GENERATOR "RPM")
set(CPACK_RPM_PACKAGE_ARCHITECTURE "amd64")
set(CPACK_RPM_PACKAGE_SUMMARY "ProxLB Rebalancing VM workloads within Proxmox clusters.")
set(CPACK_RPM_PACKAGE_DESCRIPTION "ProxLB Rebalancing VM workloads within Proxmox clusters.")
set(CPACK_RPM_PACKAGE_SUMMARY "ProxLB - Rebalance VM workloads across nodes in Proxmox clusters.")
set(CPACK_RPM_PACKAGE_DESCRIPTION "ProxLB - Rebalance VM workloads across nodes in Proxmox clusters.")
set(CPACK_RPM_CHANGELOG_FILE "${CMAKE_CURRENT_SOURCE_DIR}/changelog_redhat")
set(CPACK_PACKAGE_RELEASE 1)
set(CPACK_RPM_PACKAGE_LICENSE "GPL 3.0")
@@ -27,15 +27,14 @@ set(CPACK_RPM_PACKAGE_REQUIRES "python >= 3.2.0")
# DEB packaging
set(CPACK_DEBIAN_FILE_NAME DEB-DEFAULT)
set(CPACK_DEBIAN_PACKAGE_ARCHITECTURE "amd64")
set(CPACK_DEBIAN_PACKAGE_SUMMARY "ProxLB Rebalancing VM workloads within Proxmox clusters.")
set(CPACK_DEBIAN_PACKAGE_DESCRIPTION "ProxLB Rebalancing VM workloads within Proxmox clusters.")
set(CPACK_DEBIAN_PACKAGE_SUMMARY "ProxLB - Rebalance VM workloads across nodes in Proxmox clusters.")
set(CPACK_DEBIAN_PACKAGE_DESCRIPTION "ProxLB - Rebalance VM workloads across nodes in Proxmox clusters.")
set(CPACK_DEBIAN_PACKAGE_CONTROL_EXTRA "${CMAKE_CURRENT_SOURCE_DIR}/changelog_debian")
set(CPACK_DEBIAN_PACKAGE_DEPENDS "python3")
set(CPACK_DEBIAN_PACKAGE_DEPENDS "python3, python3-proxmoxer")
set(CPACK_DEBIAN_PACKAGE_LICENSE "GPL 3.0")
# Install
set(CPACK_PACKAGING_INSTALL_PREFIX ${CMAKE_INSTALL_PREFIX})
set(CPACK_DEBIAN_PACKAGE_CONTROL_EXTRA "${CMAKE_CURRENT_SOURCE_DIR}/postinst")
set(CPACK_DEBIAN_PACKAGE_CONTROL_EXTRA "${CMAKE_CURRENT_SOURCE_DIR}/postinst;${CMAKE_CURRENT_SOURCE_DIR}/conffiles")
set(CPACK_RPM_POST_INSTALL_SCRIPT_FILE "${CMAKE_CURRENT_SOURCE_DIR}/postinst")
include(CPack)


@@ -1,5 +1,21 @@
proxlb (0.9.0) unstable; urgency=low
proxlb (1.0.2) unstable; urgency=low
* Add option to run migration in parallel or sequentially.
* Add option to run ProxLB only on a Proxmox cluster master (req. HA feature).
* Fix daemon timer to use hours instead of minutes.
* Fix CMake packaging for Debian package to avoid overwriting the config file.
* Fix some wonky code styles.
-- Florian Paul Azim Hoberg <gyptazy@gyptazy.ch> Tue, 13 Aug 2024 17:28:14 +0200
proxlb (1.0.0) unstable; urgency=low
* Initial release of ProxLB.
-- Florian Paul Azim Hoberg <gyptazy@gyptazy.ch> Sun, 07 Jul 2024 05:38:41 -0200
-- Florian Paul Azim Hoberg <gyptazy@gyptazy.ch> Thu, 01 Aug 2024 17:04:12 +0200
proxlb (0.9.0) unstable; urgency=low
* Initial development release of ProxLB as a tech preview.
-- Florian Paul Azim Hoberg <gyptazy@gyptazy.ch> Sun, 07 Jul 2024 05:38:41 +0200


@@ -1,2 +1,11 @@
* Sun Jul 07 2024 Florian Paul Azim Hoberg <gyptazy@gyptazy.ch>
* Tue Aug 13 2024 Florian Paul Azim Hoberg <gyptazy@gyptazy.ch>
- Add option to run migration in parallel or sequentially.
- Add option to run ProxLB only on a Proxmox cluster master (req. HA feature).
- Fixed daemon timer to use hours instead of minutes.
- Fixed some wonky code styles.
* Thu Aug 01 2024 Florian Paul Azim Hoberg <gyptazy@gyptazy.ch>
- Initial release of ProxLB.
* Sun Jul 07 2024 Florian Paul Azim Hoberg <gyptazy@gyptazy.ch>
- Initial development release of ProxLB as a tech preview.

packaging/conffiles Normal file

@@ -0,0 +1 @@
/etc/proxlb/proxlb.conf

proxlb

@@ -22,16 +22,18 @@
import argparse
import configparser
import json
import logging
import os
try:
import proxmoxer
_imports = True
except ImportError as error:
except ImportError:
_imports = False
import random
import re
import requests
import socket
import sys
import time
import urllib3
@@ -39,7 +41,7 @@ import urllib3
# Constants
__appname__ = "ProxLB"
__version__ = "0.9.9"
__version__ = "1.0.2"
__author__ = "Florian Paul Azim Hoberg <gyptazy@gyptazy.ch> @gyptazy"
__errors__ = False
@@ -71,14 +73,18 @@ class SystemdHandler(logging.Handler):
# Functions
def initialize_logger(log_level, log_handler):
def initialize_logger(log_level, update_log_verbosity=False):
""" Initialize ProxLB logging handler. """
info_prefix = 'Info: [logger]:'
root_logger = logging.getLogger()
root_logger.setLevel(log_level)
root_logger.addHandler(SystemdHandler())
logging.info(f'{info_prefix} Logger got initialized.')
if not update_log_verbosity:
root_logger.addHandler(SystemdHandler())
logging.info(f'{info_prefix} Logger got initialized.')
else:
logging.info(f'{info_prefix} Logger verbosity got updated to: {log_level}.')
def pre_validations(config_path):
@@ -107,7 +113,7 @@ def validate_daemon(daemon, schedule):
if bool(int(daemon)):
logging.info(f'{info_prefix} Running in daemon mode. Next run in {schedule} hours.')
time.sleep(int(schedule) * 60)
time.sleep(int(schedule) * 60 * 60)
else:
logging.info(f'{info_prefix} Not running in daemon mode. Quitting.')
sys.exit(0)
@@ -140,7 +146,9 @@ def __validate_config_file(config_path):
def initialize_args():
""" Initialize given arguments for ProxLB. """
argparser = argparse.ArgumentParser(description='ProxLB')
argparser.add_argument('-c', '--config', type=str, help='Path to config file.')
argparser.add_argument('-c', '--config', type=str, help='Path to config file.', required=False)
argparser.add_argument('-d', '--dry-run', help='Perform a dry-run without doing any actions.', action='store_true', required=False)
argparser.add_argument('-j', '--json', help='Return a JSON of the VM movement.', action='store_true', required=False)
return argparser.parse_args()
@@ -166,17 +174,24 @@ def initialize_config_options(config_path):
config = configparser.ConfigParser()
config.read(config_path)
# Proxmox config
proxmox_api_host = config['proxmox']['api_host']
proxmox_api_user = config['proxmox']['api_user']
proxmox_api_pass = config['proxmox']['api_pass']
proxmox_api_ssl_v = config['proxmox']['verify_ssl']
proxmox_api_host = config['proxmox']['api_host']
proxmox_api_user = config['proxmox']['api_user']
proxmox_api_pass = config['proxmox']['api_pass']
proxmox_api_ssl_v = config['proxmox']['verify_ssl']
# Balancing
balancing_method = config['balancing'].get('method', 'memory')
ignore_nodes = config['balancing'].get('ignore_nodes', None)
ignore_vms = config['balancing'].get('ignore_vms', None)
balancing_method = config['balancing'].get('method', 'memory')
balancing_mode = config['balancing'].get('mode', 'used')
balancing_mode_option = config['balancing'].get('mode_option', 'bytes')
balancing_type = config['balancing'].get('type', 'vm')
balanciness = config['balancing'].get('balanciness', 10)
parallel_migrations = config['balancing'].get('parallel_migrations', 1)
ignore_nodes = config['balancing'].get('ignore_nodes', None)
ignore_vms = config['balancing'].get('ignore_vms', None)
# Service
daemon = config['service'].get('daemon', 1)
schedule = config['service'].get('schedule', 24)
master_only = config['service'].get('master_only', 0)
daemon = config['service'].get('daemon', 1)
schedule = config['service'].get('schedule', 24)
log_verbosity = config['service'].get('log_verbosity', 'CRITICAL')
except configparser.NoSectionError:
logging.critical(f'{error_prefix} Could not find the required section.')
sys.exit(2)
@@ -188,8 +203,8 @@ def initialize_config_options(config_path):
sys.exit(2)
logging.info(f'{info_prefix} Configuration file loaded.')
return proxmox_api_host, proxmox_api_user, proxmox_api_pass, proxmox_api_ssl_v, balancing_method, \
ignore_nodes, ignore_vms, daemon, schedule
return proxmox_api_host, proxmox_api_user, proxmox_api_pass, proxmox_api_ssl_v, balancing_method, balancing_mode, balancing_mode_option, \
balancing_type, balanciness, parallel_migrations, ignore_nodes, ignore_vms, master_only, daemon, schedule, log_verbosity
def api_connect(proxmox_api_host, proxmox_api_user, proxmox_api_pass, proxmox_api_ssl_v):
@@ -219,36 +234,105 @@ def api_connect(proxmox_api_host, proxmox_api_user, proxmox_api_pass, proxmox_ap
return api_object
def execute_rebalancing_only_by_master(api_object, master_only):
""" Validate if balancing should only be done by the cluster master. Afterwards, validate if this node is the cluster master. """
info_prefix = 'Info: [only-on-master-executor]:'
master_only = bool(int(master_only))
if bool(int(master_only)):
logging.info(f'{info_prefix} Master only rebalancing is defined. Starting validation.')
cluster_master_node = get_cluster_master(api_object)
cluster_master = validate_cluster_master(cluster_master_node)
return cluster_master, master_only
else:
logging.info(f'{info_prefix} No master only rebalancing is defined. Skipping validation.')
return False, master_only
def get_cluster_master(api_object):
""" Get the current master of the Proxmox cluster. """
error_prefix = 'Error: [cluster-master-getter]:'
info_prefix = 'Info: [cluster-master-getter]:'
try:
ha_status_object = api_object.cluster().ha().status().manager_status().get()
logging.info(f'{info_prefix} Master node: {ha_status_object.get("manager_status", None).get("master_node", None)}')
except urllib3.exceptions.NameResolutionError:
logging.critical(f'{error_prefix} Could not resolve the API.')
sys.exit(2)
except requests.exceptions.ConnectTimeout:
logging.critical(f'{error_prefix} Connection time out to API.')
sys.exit(2)
except requests.exceptions.SSLError:
logging.critical(f'{error_prefix} SSL certificate verification failed for API.')
sys.exit(2)
cluster_master = ha_status_object.get("manager_status", None).get("master_node", None)
if cluster_master:
return cluster_master
else:
logging.critical(f'{error_prefix} Could not obtain cluster master. Please check your configuration - stopping.')
sys.exit(2)
def validate_cluster_master(cluster_master):
""" Validate if the current execution node is the cluster master. """
info_prefix = 'Info: [cluster-master-validator]:'
node_executor_hostname = socket.gethostname()
logging.info(f'{info_prefix} Node executor hostname is: {node_executor_hostname}')
if node_executor_hostname != cluster_master:
logging.info(f'{info_prefix} {node_executor_hostname} is not the cluster master ({cluster_master}).')
return False
else:
return True
def get_node_statistics(api_object, ignore_nodes):
""" Get statistics of cpu, memory and disk for each node in the cluster. """
info_prefix = 'Info: [node-statistics]:'
node_statistics = {}
info_prefix = 'Info: [node-statistics]:'
node_statistics = {}
ignore_nodes_list = ignore_nodes.split(',')
for node in api_object.nodes.get():
if node['status'] == 'online' and node['node'] not in ignore_nodes_list:
node_statistics[node['node']] = {}
node_statistics[node['node']]['cpu_total'] = node['maxcpu']
node_statistics[node['node']]['cpu_used'] = node['cpu']
node_statistics[node['node']]['cpu_free'] = int(node['maxcpu']) - int(node['cpu'])
node_statistics[node['node']]['cpu_free_percent'] = int((node_statistics[node['node']]['cpu_free']) / int(node['maxcpu']) * 100)
node_statistics[node['node']]['memory_total'] = node['maxmem']
node_statistics[node['node']]['memory_used'] = node['mem']
node_statistics[node['node']]['memory_free'] = int(node['maxmem']) - int(node['mem'])
node_statistics[node['node']]['memory_free_percent'] = int((node_statistics[node['node']]['memory_free']) / int(node['maxmem']) * 100)
node_statistics[node['node']]['disk_total'] = node['maxdisk']
node_statistics[node['node']]['disk_used'] = node['disk']
node_statistics[node['node']]['disk_free'] = int(node['maxdisk']) - int(node['disk'])
node_statistics[node['node']]['disk_free_percent'] = int((node_statistics[node['node']]['disk_free']) / int(node['maxdisk']) * 100)
node_statistics[node['node']]['cpu_total'] = node['maxcpu']
node_statistics[node['node']]['cpu_assigned'] = node['cpu']
node_statistics[node['node']]['cpu_assigned_percent'] = int((node_statistics[node['node']]['cpu_assigned']) / int(node_statistics[node['node']]['cpu_total']) * 100)
node_statistics[node['node']]['cpu_assigned_percent_last_run'] = 0
node_statistics[node['node']]['cpu_used'] = 0
node_statistics[node['node']]['cpu_free'] = int(node['maxcpu']) - int(node['cpu'])
node_statistics[node['node']]['cpu_free_percent'] = int((node_statistics[node['node']]['cpu_free']) / int(node['maxcpu']) * 100)
node_statistics[node['node']]['cpu_free_percent_last_run'] = 0
node_statistics[node['node']]['memory_total'] = node['maxmem']
node_statistics[node['node']]['memory_assigned'] = 0
node_statistics[node['node']]['memory_assigned_percent'] = int((node_statistics[node['node']]['memory_assigned']) / int(node_statistics[node['node']]['memory_total']) * 100)
node_statistics[node['node']]['memory_assigned_percent_last_run'] = 0
node_statistics[node['node']]['memory_used'] = node['mem']
node_statistics[node['node']]['memory_free'] = int(node['maxmem']) - int(node['mem'])
node_statistics[node['node']]['memory_free_percent'] = int((node_statistics[node['node']]['memory_free']) / int(node['maxmem']) * 100)
node_statistics[node['node']]['memory_free_percent_last_run'] = 0
node_statistics[node['node']]['disk_total'] = node['maxdisk']
node_statistics[node['node']]['disk_assigned'] = 0
node_statistics[node['node']]['disk_assigned_percent'] = int((node_statistics[node['node']]['disk_assigned']) / int(node_statistics[node['node']]['disk_total']) * 100)
node_statistics[node['node']]['disk_assigned_percent_last_run'] = 0
node_statistics[node['node']]['disk_used'] = node['disk']
node_statistics[node['node']]['disk_free'] = int(node['maxdisk']) - int(node['disk'])
node_statistics[node['node']]['disk_free_percent'] = int((node_statistics[node['node']]['disk_free']) / int(node['maxdisk']) * 100)
node_statistics[node['node']]['disk_free_percent_last_run'] = 0
logging.info(f'{info_prefix} Added node {node["node"]}.')
logging.info(f'{info_prefix} Created node statistics.')
return node_statistics
def get_vm_statistics(api_object, ignore_vms):
def get_vm_statistics(api_object, ignore_vms, balancing_type):
""" Get statistics of cpu, memory and disk for each vm in the cluster. """
info_prefix = 'Info: [vm-statistics]:'
warn_prefix = 'Warn: [vm-statistics]:'
vm_statistics = {}
ignore_vms_list = ignore_vms.split(',')
group_include = None
@@ -261,43 +345,112 @@ def get_vm_statistics(api_object, ignore_vms):
vm_ignore_wildcard = __validate_ignore_vm_wildcard(ignore_vms)
for node in api_object.nodes.get():
for vm in api_object.nodes(node['node']).qemu.get():
# Get the VM tags from API.
vm_tags = __get_vm_tags(api_object, node, vm['vmid'])
if vm_tags is not None:
group_include, group_exclude, vm_ignore = __get_proxlb_groups(vm_tags)
# Add all virtual machines if type is vm or all.
if balancing_type == 'vm' or balancing_type == 'all':
for vm in api_object.nodes(node['node']).qemu.get():
# Get wildcard match for VMs to ignore if a wildcard pattern was
# previously found. Wildcards may slow down the task when using
# many patterns in the ignore list. Therefore, run this only if
# a wildcard pattern was found. We also do not need to validate
# this if the VM is already being ignored by a defined tag.
if vm_ignore_wildcard and not vm_ignore:
vm_ignore = __check_vm_name_wildcard_pattern(vm['name'], ignore_vms_list)
# Get the VM tags from API.
vm_tags = __get_vm_tags(api_object, node, vm['vmid'], 'vm')
if vm_tags is not None:
group_include, group_exclude, vm_ignore = __get_proxlb_groups(vm_tags)
if vm['status'] == 'running' and vm['name'] not in ignore_vms_list and not vm_ignore:
vm_statistics[vm['name']] = {}
vm_statistics[vm['name']]['group_include'] = group_include
vm_statistics[vm['name']]['group_exclude'] = group_exclude
vm_statistics[vm['name']]['cpu_total'] = vm['cpus']
vm_statistics[vm['name']]['cpu_used'] = vm['cpu']
vm_statistics[vm['name']]['memory_total'] = vm['maxmem']
vm_statistics[vm['name']]['memory_used'] = vm['mem']
vm_statistics[vm['name']]['disk_total'] = vm['maxdisk']
vm_statistics[vm['name']]['disk_used'] = vm['disk']
vm_statistics[vm['name']]['vmid'] = vm['vmid']
vm_statistics[vm['name']]['node_parent'] = node['node']
# Rebalancing node will be overwritten after calculations.
# If the vm stays on the node, it will be removed at a
# later time.
vm_statistics[vm['name']]['node_rebalance'] = node['node']
logging.info(f'{info_prefix} Added vm {vm["name"]}.')
# Get wildcard match for VMs to ignore if a wildcard pattern was
# previously found. Wildcards may slow down the task when using
# many patterns in the ignore list. Therefore, run this only if
# a wildcard pattern was found. We also do not need to validate
# this if the VM is already being ignored by a defined tag.
if vm_ignore_wildcard and not vm_ignore:
vm_ignore = __check_vm_name_wildcard_pattern(vm['name'], ignore_vms_list)
if vm['status'] == 'running' and vm['name'] not in ignore_vms_list and not vm_ignore:
vm_statistics[vm['name']] = {}
vm_statistics[vm['name']]['group_include'] = group_include
vm_statistics[vm['name']]['group_exclude'] = group_exclude
vm_statistics[vm['name']]['cpu_total'] = vm['cpus']
vm_statistics[vm['name']]['cpu_used'] = vm['cpu']
vm_statistics[vm['name']]['memory_total'] = vm['maxmem']
vm_statistics[vm['name']]['memory_used'] = vm['mem']
vm_statistics[vm['name']]['disk_total'] = vm['maxdisk']
vm_statistics[vm['name']]['disk_used'] = vm['disk']
vm_statistics[vm['name']]['vmid'] = vm['vmid']
vm_statistics[vm['name']]['node_parent'] = node['node']
vm_statistics[vm['name']]['type'] = 'vm'
# Rebalancing node will be overwritten after calculations.
# If the vm stays on the node, it will be removed at a
# later time.
vm_statistics[vm['name']]['node_rebalance'] = node['node']
logging.info(f'{info_prefix} Added vm {vm["name"]}.')
# Add all containers if type is ct or all.
if balancing_type == 'ct' or balancing_type == 'all':
for vm in api_object.nodes(node['node']).lxc.get():
logging.warning(f'{warn_prefix} Rebalancing on LXC containers (CT) always requires them to shut down.')
logging.warning(f'{warn_prefix} {vm["name"]} is from type CT and cannot be live migrated!')
# Get the VM tags from API.
vm_tags = __get_vm_tags(api_object, node, vm['vmid'], 'ct')
if vm_tags is not None:
group_include, group_exclude, vm_ignore = __get_proxlb_groups(vm_tags)
# Get wildcard match for VMs to ignore if a wildcard pattern was
# previously found. Wildcards may slow down the task when using
# many patterns in the ignore list. Therefore, run this only if
# a wildcard pattern was found. We also do not need to validate
# this if the VM is already being ignored by a defined tag.
if vm_ignore_wildcard and not vm_ignore:
vm_ignore = __check_vm_name_wildcard_pattern(vm['name'], ignore_vms_list)
if vm['status'] == 'running' and vm['name'] not in ignore_vms_list and not vm_ignore:
vm_statistics[vm['name']] = {}
vm_statistics[vm['name']]['group_include'] = group_include
vm_statistics[vm['name']]['group_exclude'] = group_exclude
vm_statistics[vm['name']]['cpu_total'] = vm['cpus']
vm_statistics[vm['name']]['cpu_used'] = vm['cpu']
vm_statistics[vm['name']]['memory_total'] = vm['maxmem']
vm_statistics[vm['name']]['memory_used'] = vm['mem']
vm_statistics[vm['name']]['disk_total'] = vm['maxdisk']
vm_statistics[vm['name']]['disk_used'] = vm['disk']
vm_statistics[vm['name']]['vmid'] = vm['vmid']
vm_statistics[vm['name']]['node_parent'] = node['node']
vm_statistics[vm['name']]['type'] = 'ct'
# Rebalancing node will be overwritten after calculations.
# If the vm stays on the node, it will be removed at a
# later time.
vm_statistics[vm['name']]['node_rebalance'] = node['node']
logging.info(f'{info_prefix} Added vm {vm["name"]}.')
logging.info(f'{info_prefix} Created VM statistics.')
return vm_statistics
def update_node_statistics(node_statistics, vm_statistics):
""" Update node statistics by VMs statistics. """
info_prefix = 'Info: [node-update-statistics]:'
warn_prefix = 'Warning: [node-update-statistics]:'
for vm, vm_value in vm_statistics.items():
node_statistics[vm_value['node_parent']]['cpu_assigned'] = node_statistics[vm_value['node_parent']]['cpu_assigned'] + int(vm_value['cpu_total'])
node_statistics[vm_value['node_parent']]['cpu_assigned_percent'] = (node_statistics[vm_value['node_parent']]['cpu_assigned'] / node_statistics[vm_value['node_parent']]['cpu_total']) * 100
node_statistics[vm_value['node_parent']]['memory_assigned'] = node_statistics[vm_value['node_parent']]['memory_assigned'] + int(vm_value['memory_total'])
node_statistics[vm_value['node_parent']]['memory_assigned_percent'] = (node_statistics[vm_value['node_parent']]['memory_assigned'] / node_statistics[vm_value['node_parent']]['memory_total']) * 100
node_statistics[vm_value['node_parent']]['disk_assigned'] = node_statistics[vm_value['node_parent']]['disk_assigned'] + int(vm_value['disk_total'])
node_statistics[vm_value['node_parent']]['disk_assigned_percent'] = (node_statistics[vm_value['node_parent']]['disk_assigned'] / node_statistics[vm_value['node_parent']]['disk_total']) * 100
if node_statistics[vm_value['node_parent']]['cpu_assigned_percent'] > 99:
logging.warning(f'{warn_prefix} Node {vm_value["node_parent"]} is overprovisioned for CPU by {int(node_statistics[vm_value["node_parent"]]["cpu_assigned_percent"])}%.')
if node_statistics[vm_value['node_parent']]['memory_assigned_percent'] > 99:
logging.warning(f'{warn_prefix} Node {vm_value["node_parent"]} is overprovisioned for memory by {int(node_statistics[vm_value["node_parent"]]["memory_assigned_percent"])}%.')
if node_statistics[vm_value['node_parent']]['disk_assigned_percent'] > 99:
logging.warning(f'{warn_prefix} Node {vm_value["node_parent"]} is overprovisioned for disk by {int(node_statistics[vm_value["node_parent"]]["disk_assigned_percent"])}%.')
logging.info(f'{info_prefix} Updated node resource assignments by all VMs.')
logging.debug('node_statistics')
return node_statistics
def __validate_ignore_vm_wildcard(ignore_vms):
""" Validate if a wildcard is used for ignored VMs. """
if '*' in ignore_vms:
@@ -312,18 +465,23 @@ def __check_vm_name_wildcard_pattern(vm_name, ignore_vms_list):
return True
def __get_vm_tags(api_object, node, vmid):
""" Get a comment for a VM from a given VMID. """
info_prefix = 'Info: [api-get-vm-tags]:'
def __get_vm_tags(api_object, node, vmid, balancing_type):
""" Get tags for a VM/CT for a given VMID. """
info_prefix = 'Info: [api-get-vm-tags]:'
vm_config = api_object.nodes(node['node']).qemu(vmid).config.get()
logging.info(f'{info_prefix} Got VM comment from API.')
if balancing_type == 'vm':
vm_config = api_object.nodes(node['node']).qemu(vmid).config.get()
if balancing_type == 'ct':
vm_config = api_object.nodes(node['node']).lxc(vmid).config.get()
logging.info(f'{info_prefix} Got VM/CT tag from API.')
return vm_config.get('tags', None)
def __get_proxlb_groups(vm_tags):
""" Get ProxLB related include and exclude groups. """
info_prefix = 'Info: [api-get-vm-include-exclude-tags]:'
info_prefix = 'Info: [api-get-vm-include-exclude-tags]:'
group_include = None
group_exclude = None
vm_ignore = None
@@ -346,33 +504,31 @@ def __get_proxlb_groups(vm_tags):
return group_include, group_exclude, vm_ignore
def balancing_calculations(balancing_method, node_statistics, vm_statistics):
def balancing_calculations(balancing_method, balancing_mode, balancing_mode_option, node_statistics, vm_statistics, balanciness, rebalance, processed_vms):
""" Calculate re-balancing of VMs on present nodes across the cluster. """
info_prefix = 'Info: [rebalancing-calculator]:'
balanciness = 10
rebalance = False
processed_vms = []
rebalance = True
emergency_counter = 0
info_prefix = 'Info: [rebalancing-calculator]:'
# Validate for a supported balancing method.
# Validate for a supported balancing method, mode and if rebalancing is required.
__validate_balancing_method(balancing_method)
__validate_balancing_mode(balancing_mode)
__validate_vm_statistics(vm_statistics)
rebalance = __validate_balanciness(balanciness, balancing_method, balancing_mode, node_statistics)
# Rebalance VMs with the highest resource usage to a new
# node until reaching the desired balanciness.
while rebalance and emergency_counter < 10000:
emergency_counter = emergency_counter + 1
rebalance = __validate_balanciness(balanciness, balancing_method, node_statistics)
if rebalance:
# Get most used/assigned resources of the VM and the most free or less allocated node.
resources_vm_most_used, processed_vms = __get_most_used_resources_vm(balancing_method, balancing_mode, vm_statistics, processed_vms)
resources_node_most_free = __get_most_free_resources_node(balancing_method, balancing_mode, balancing_mode_option, node_statistics)
if rebalance:
resource_highest_used_resources_vm, processed_vms = __get_most_used_resources_vm(balancing_method, vm_statistics, processed_vms)
resource_highest_free_resources_node = __get_most_free_resources_node(balancing_method, node_statistics)
node_statistics, vm_statistics = __update_resource_statistics(resource_highest_used_resources_vm, resource_highest_free_resources_node,
vm_statistics, node_statistics, balancing_method)
# Update resource statistics for VMs and nodes.
node_statistics, vm_statistics = __update_resource_statistics(resources_vm_most_used, resources_node_most_free,
vm_statistics, node_statistics, balancing_method, balancing_mode)
# Recurse until no further rebalancing is needed.
balancing_calculations(balancing_method, balancing_mode, balancing_mode_option, node_statistics, vm_statistics, balanciness, rebalance, processed_vms)
# Honour groupings for include and exclude groups for rebalancing VMs.
node_statistics, vm_statistics = __get_vm_tags_include_groups(vm_statistics, node_statistics, balancing_method)
node_statistics, vm_statistics = __get_vm_tags_exclude_groups(vm_statistics, node_statistics, balancing_method)
node_statistics, vm_statistics = __get_vm_tags_include_groups(vm_statistics, node_statistics, balancing_method, balancing_mode)
node_statistics, vm_statistics = __get_vm_tags_exclude_groups(vm_statistics, node_statistics, balancing_method, balancing_mode)
# Remove VMs that are not being relocated.
vms_to_remove = [vm_name for vm_name, vm_info in vm_statistics.items() if 'node_rebalance' in vm_info and vm_info['node_rebalance'] == vm_info.get('node_parent')]
@@ -386,7 +542,7 @@ def balancing_calculations(balancing_method, node_statistics, vm_statistics):
def __validate_balancing_method(balancing_method):
""" Validate for valid and supported balancing method. """
error_prefix = 'Error: [balancing-method-validation]:'
info_prefix = 'Info: [balancing-method-validation]]:'
info_prefix = 'Info: [balancing-method-validation]:'
if balancing_method not in ['memory', 'disk', 'cpu']:
logging.error(f'{error_prefix} Invalid balancing method: {balancing_method}')
@@ -395,84 +551,146 @@ def __validate_balancing_method(balancing_method):
logging.info(f'{info_prefix} Valid balancing method: {balancing_method}')
def __validate_balanciness(balanciness, balancing_method, node_statistics):
def __validate_balancing_mode(balancing_mode):
""" Validate for valid and supported balancing mode. """
error_prefix = 'Error: [balancing-mode-validation]:'
info_prefix = 'Info: [balancing-mode-validation]:'
if balancing_mode not in ['used', 'assigned']:
logging.error(f'{error_prefix} Invalid balancing mode: {balancing_mode}')
sys.exit(2)
else:
logging.info(f'{info_prefix} Valid balancing mode: {balancing_mode}')
def __validate_vm_statistics(vm_statistics):
""" Validate for at least a single object of type CT/VM to rebalance. """
error_prefix = 'Error: [balancing-vm-stats-validation]:'
if len(vm_statistics) == 0:
logging.error(f'{error_prefix} Not a single CT/VM found in cluster.')
sys.exit(1)
def __validate_balanciness(balanciness, balancing_method, balancing_mode, node_statistics):
""" Validate for balanciness to ensure further rebalancing is needed. """
info_prefix = 'Info: [balanciness-validation]]:'
node_memory_free_percent_list = []
info_prefix = 'Info: [balanciness-validation]:'
node_resource_percent_list = []
node_assigned_percent_match = []
# Remap balancing mode to get the related values from nodes dict.
if balancing_mode == 'used':
node_resource_selector = 'free'
if balancing_mode == 'assigned':
node_resource_selector = 'assigned'
for node_name, node_info in node_statistics.items():
node_memory_free_percent_list.append(node_info[f'{balancing_method}_free_percent'])
node_memory_free_percent_list_sorted = sorted(node_memory_free_percent_list)
node_lowest_percent = node_memory_free_percent_list_sorted[0]
node_highest_percent = node_memory_free_percent_list_sorted[-1]
# Save information of nodes from current run to compare them in the next recursion.
if node_statistics[node_name][f'{balancing_method}_{node_resource_selector}_percent_last_run'] == node_statistics[node_name][f'{balancing_method}_{node_resource_selector}_percent']:
node_statistics[node_name][f'{balancing_method}_{node_resource_selector}_percent_match'] = True
else:
node_statistics[node_name][f'{balancing_method}_{node_resource_selector}_percent_match'] = False
# Update value to the current value of the recursion run.
node_statistics[node_name][f'{balancing_method}_{node_resource_selector}_percent_last_run'] = node_statistics[node_name][f'{balancing_method}_{node_resource_selector}_percent']
if (node_lowest_percent + balanciness) < node_highest_percent:
logging.info(f'{info_prefix} Rebalancing is for {balancing_method} is needed.')
# If all node resources are unchanged, the recursion can be left.
for key, value in node_statistics.items():
node_assigned_percent_match.append(value.get(f'{balancing_method}_{node_resource_selector}_percent_match', False))
if False not in node_assigned_percent_match:
return False
# Add node information to resource list.
node_resource_percent_list.append(int(node_info[f'{balancing_method}_{node_resource_selector}_percent']))
logging.debug(f'{info_prefix} Node: {node_name} with values: {node_info}')
# Create a sorted list of the delta + balanciness between the node resources.
node_resource_percent_list_sorted = sorted(node_resource_percent_list)
node_lowest_percent = node_resource_percent_list_sorted[0]
node_highest_percent = node_resource_percent_list_sorted[-1]
# Validate whether the recursion should continue for further rebalancing.
if (int(node_lowest_percent) + int(balanciness)) < int(node_highest_percent):
logging.info(f'{info_prefix} Rebalancing for {balancing_method} is needed. Highest usage: {int(node_highest_percent)}% | Lowest usage: {int(node_lowest_percent)}%.')
return True
else:
logging.info(f'{info_prefix} Rebalancing is for {balancing_method} is not needed.')
logging.info(f'{info_prefix} Rebalancing for {balancing_method} is not needed. Highest usage: {int(node_highest_percent)}% | Lowest usage: {int(node_lowest_percent)}%.')
return False
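
The threshold test at the end of `__validate_balanciness` can be looked at in isolation. The following is a minimal sketch under stated assumptions, not the ProxLB implementation: `needs_rebalancing` is a hypothetical helper name, and the percentages passed to it are made-up sample values.

```python
def needs_rebalancing(node_percents, balanciness):
    """Return True when the gap between the lowest and the highest
    node percentage is larger than the allowed balanciness."""
    # Sort the per-node percentages so the extremes sit at both ends.
    ordered = sorted(int(percent) for percent in node_percents)
    lowest, highest = ordered[0], ordered[-1]
    # Same comparison as in the diff: lowest + balanciness < highest.
    return (lowest + int(balanciness)) < highest
```

With a balanciness of 10, nodes at 20%, 31% and 45% would trigger another rebalancing round, while nodes at 20%, 25% and 30% would not.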
def __get_most_used_resources_vm(balancing_method, vm_statistics, processed_vms):
def __get_most_used_resources_vm(balancing_method, balancing_mode, vm_statistics, processed_vms):
""" Get and return the most used resources of a VM by the defined balancing method. """
if balancing_method == 'memory':
vm = max(vm_statistics.items(), key=lambda item: item[1]['memory_used'] if item[0] not in processed_vms else -float('inf'))
processed_vms.append(vm[0])
return vm, processed_vms
if balancing_method == 'disk':
vm = max(vm_statistics.items(), key=lambda item: item[1]['disk_used'] if item[0] not in processed_vms else -float('inf'))
processed_vms.append(vm[0])
return vm, processed_vms
if balancing_method == 'cpu':
vm = max(vm_statistics.items(), key=lambda item: item[1]['cpu_used'] if item[0] not in processed_vms else -float('inf'))
processed_vms.append(vm[0])
return vm, processed_vms
info_prefix = 'Info: [get-most-used-resources-vm]:'
# Remap balancing mode to get the related values from the VM dict.
if balancing_mode == 'used':
vm_resource_selector = 'used'
if balancing_mode == 'assigned':
vm_resource_selector = 'total'
vm = max(vm_statistics.items(), key=lambda item: item[1][f'{balancing_method}_{vm_resource_selector}'] if item[0] not in processed_vms else -float('inf'))
processed_vms.append(vm[0])
logging.info(f'{info_prefix} {vm}')
return vm, processed_vms
def __get_most_free_resources_node(balancing_method, node_statistics):
def __get_most_free_resources_node(balancing_method, balancing_mode, balancing_mode_option, node_statistics):
""" Get and return the most free resources of a node by the defined balancing method. """
if balancing_method == 'memory':
return max(node_statistics.items(), key=lambda item: item[1]['memory_free'])
if balancing_method == 'disk':
return max(node_statistics.items(), key=lambda item: item[1]['disk_free'])
if balancing_method == 'cpu':
return max(node_statistics.items(), key=lambda item: item[1]['cpu_free'])
info_prefix = 'Info: [get-most-free-resources-nodes]:'
# Return the node information based on the balancing mode.
if balancing_mode == 'used' and balancing_mode_option == 'bytes':
node = max(node_statistics.items(), key=lambda item: item[1][f'{balancing_method}_free'])
if balancing_mode == 'used' and balancing_mode_option == 'percent':
node = max(node_statistics.items(), key=lambda item: item[1][f'{balancing_method}_free_percent'])
if balancing_mode == 'assigned':
node = min(node_statistics.items(), key=lambda item: item[1][f'{balancing_method}_assigned'] if item[1][f'{balancing_method}_assigned_percent'] > 0 or item[1][f'{balancing_method}_assigned_percent'] < 100 else -float('inf'))
logging.info(f'{info_prefix} {node}')
return node
def __update_resource_statistics(resource_highest_used_resources_vm, resource_highest_free_resources_node, vm_statistics, node_statistics, balancing_method):
def __update_resource_statistics(resource_highest_used_resources_vm, resource_highest_free_resources_node, vm_statistics, node_statistics, balancing_method, balancing_mode):
""" Update VM and node resource statistics. """
info_prefix = 'Info: [rebalancing-resource-statistics-update]:'
info_prefix = 'Info: [rebalancing-resource-statistics-update]:'
if resource_highest_used_resources_vm[1]['node_parent'] != resource_highest_free_resources_node[0]:
vm_name = resource_highest_used_resources_vm[0]
vm_node_parent = resource_highest_used_resources_vm[1]['node_parent']
vm_node_rebalance = resource_highest_free_resources_node[0]
vm_resource_used = vm_statistics[resource_highest_used_resources_vm[0]][f'{balancing_method}_used']
vm_name = resource_highest_used_resources_vm[0]
vm_node_parent = resource_highest_used_resources_vm[1]['node_parent']
vm_node_rebalance = resource_highest_free_resources_node[0]
vm_resource_used = vm_statistics[resource_highest_used_resources_vm[0]][f'{balancing_method}_used']
vm_resource_total = vm_statistics[resource_highest_used_resources_vm[0]][f'{balancing_method}_total']
# Update dictionaries for new values
# Assign new rebalance node to vm
vm_statistics[vm_name]['node_rebalance'] = vm_node_rebalance
logging.info(f'Moving {vm_name} from {vm_node_parent} to {vm_node_rebalance}')
# Recalculate values for nodes
## Add freed resources to old parent node
node_statistics[vm_node_parent][f'{balancing_method}_used'] = int(node_statistics[vm_node_parent][f'{balancing_method}_used']) - int(vm_resource_used)
node_statistics[vm_node_parent][f'{balancing_method}_free'] = int(node_statistics[vm_node_parent][f'{balancing_method}_free']) + int(vm_resource_used)
node_statistics[vm_node_parent][f'{balancing_method}_free_percent'] = int(int(node_statistics[vm_node_parent][f'{balancing_method}_free']) / int(node_statistics[vm_node_parent][f'{balancing_method}_total']) * 100)
node_statistics[vm_node_parent][f'{balancing_method}_used'] = int(node_statistics[vm_node_parent][f'{balancing_method}_used']) - int(vm_resource_used)
node_statistics[vm_node_parent][f'{balancing_method}_free'] = int(node_statistics[vm_node_parent][f'{balancing_method}_free']) + int(vm_resource_used)
node_statistics[vm_node_parent][f'{balancing_method}_free_percent'] = int(int(node_statistics[vm_node_parent][f'{balancing_method}_free']) / int(node_statistics[vm_node_parent][f'{balancing_method}_total']) * 100)
node_statistics[vm_node_parent][f'{balancing_method}_assigned'] = int(node_statistics[vm_node_parent][f'{balancing_method}_assigned']) - int(vm_resource_total)
node_statistics[vm_node_parent][f'{balancing_method}_assigned_percent'] = int(int(node_statistics[vm_node_parent][f'{balancing_method}_assigned']) / int(node_statistics[vm_node_parent][f'{balancing_method}_total']) * 100)
## Add newly allocated resources to new rebalanced node
node_statistics[vm_node_rebalance][f'{balancing_method}_used'] = int(node_statistics[vm_node_rebalance][f'{balancing_method}_used']) + int(vm_resource_used)
node_statistics[vm_node_rebalance][f'{balancing_method}_free'] = int(node_statistics[vm_node_rebalance][f'{balancing_method}_free']) - int(vm_resource_used)
node_statistics[vm_node_rebalance][f'{balancing_method}_free_percent'] = int(int(node_statistics[vm_node_rebalance][f'{balancing_method}_free']) / int(node_statistics[vm_node_rebalance][f'{balancing_method}_total']) * 100)
node_statistics[vm_node_rebalance][f'{balancing_method}_used'] = int(node_statistics[vm_node_rebalance][f'{balancing_method}_used']) + int(vm_resource_used)
node_statistics[vm_node_rebalance][f'{balancing_method}_free'] = int(node_statistics[vm_node_rebalance][f'{balancing_method}_free']) - int(vm_resource_used)
node_statistics[vm_node_rebalance][f'{balancing_method}_free_percent'] = int(int(node_statistics[vm_node_rebalance][f'{balancing_method}_free']) / int(node_statistics[vm_node_rebalance][f'{balancing_method}_total']) * 100)
node_statistics[vm_node_rebalance][f'{balancing_method}_assigned'] = int(node_statistics[vm_node_rebalance][f'{balancing_method}_assigned']) + int(vm_resource_total)
node_statistics[vm_node_rebalance][f'{balancing_method}_assigned_percent'] = int(int(node_statistics[vm_node_rebalance][f'{balancing_method}_assigned']) / int(node_statistics[vm_node_rebalance][f'{balancing_method}_total']) * 100)
logging.info(f'{info_prefix} Updated VM and node statistics.')
return node_statistics, vm_statistics
def __get_vm_tags_include_groups(vm_statistics, node_statistics, balancing_method):
def __get_vm_tags_include_groups(vm_statistics, node_statistics, balancing_method, balancing_mode):
""" Get VMs tags for include groups. """
info_prefix = 'Info: [rebalancing-tags-group-include]:'
info_prefix = 'Info: [rebalancing-tags-group-include]:'
tags_include_vms = {}
processed_vm = []
@@ -499,16 +717,15 @@ def __get_vm_tags_include_groups(vm_statistics, node_statistics, balancing_metho
vm_node_rebalance = vm_statistics[vm_name]['node_rebalance']
else:
_mocked_vm_object = (vm_name, vm_statistics[vm_name])
node_statistics, vm_statistics = __update_resource_statistics(_mocked_vm_object, [vm_node_rebalance],
vm_statistics, node_statistics, balancing_method)
node_statistics, vm_statistics = __update_resource_statistics(_mocked_vm_object, [vm_node_rebalance], vm_statistics, node_statistics, balancing_method, balancing_mode)
processed_vm.append(vm_name)
return node_statistics, vm_statistics
def __get_vm_tags_exclude_groups(vm_statistics, node_statistics, balancing_method):
def __get_vm_tags_exclude_groups(vm_statistics, node_statistics, balancing_method, balancing_mode):
""" Get VMs tags for exclude groups. """
info_prefix = 'Info: [rebalancing-tags-group-exclude]:'
info_prefix = 'Info: [rebalancing-tags-group-exclude]:'
tags_exclude_vms = {}
processed_vm = []
@@ -539,61 +756,178 @@ def __get_vm_tags_exclude_groups(vm_statistics, node_statistics, balancing_metho
random_node = random.choice(list(node_statistics.keys()))
else:
_mocked_vm_object = (vm_name, vm_statistics[vm_name])
node_statistics, vm_statistics = __update_resource_statistics(_mocked_vm_object, [random_node],
vm_statistics, node_statistics, balancing_method)
node_statistics, vm_statistics = __update_resource_statistics(_mocked_vm_object, [random_node], vm_statistics, node_statistics, balancing_method, balancing_mode)
processed_vm.append(vm_name)
return node_statistics, vm_statistics
def run_vm_rebalancing(api_object, vm_statistics_rebalanced):
""" Run rebalancing of vms to new nodes in cluster. """
def __wait_job_finalized(api_object, node_name, job_id, counter):
""" Wait for a job to be finalized. """
error_prefix = 'Error: [job-status-getter]:'
info_prefix = 'Info: [job-status-getter]:'
logging.info(f'{info_prefix} Getting job status for job {job_id}.')
task = api_object.nodes(node_name).tasks(job_id).status().get()
logging.info(f'{info_prefix} {task}')
if task['status'] == 'running':
logging.info(f'{info_prefix} Validating job {job_id} for the {counter} run.')
# Do not let this recursion run forever; fail when reaching the limit.
if counter == 300:
logging.critical(f'{error_prefix} The job {job_id} on node {node_name} did not finish in time for migration.')
time.sleep(5)
counter = counter + 1
logging.info(f'{info_prefix} Revalidating job {job_id} in a next run.')
__wait_job_finalized(api_object, node_name, job_id, counter)
logging.info(f'{info_prefix} Job {job_id} for migration from {node_name} terminated successfully.')
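
The wait logic above polls a task until it is no longer running and gives up after a fixed number of checks. A loop-based variant of that idea is sketched below. It is only an illustration: `wait_for_task` and its `get_status` callable are hypothetical names standing in for the task status API call in the diff, and the defaults (300 checks, 5 seconds) simply mirror the values used there.

```python
import time


def wait_for_task(get_status, max_checks=300, interval=5.0, sleep=time.sleep):
    """Poll get_status() until the task stops running or the limit is hit.

    get_status is a hypothetical callable returning a dict with a
    'status' key. Returns True if the task left the 'running' state
    within max_checks polls, otherwise False.
    """
    for _ in range(max_checks):
        task = get_status()
        if task.get('status') != 'running':
            return True
        # Task is still running: wait before asking again.
        sleep(interval)
    return False
```

Returning a boolean lets the caller decide how to react to a timeout, and the injectable `sleep` keeps the helper testable without real delays.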
def __run_vm_rebalancing(api_object, vm_statistics_rebalanced, app_args, parallel_migrations):
""" Run & execute the VM rebalancing via API. """
error_prefix = 'Error: [rebalancing-executor]:'
info_prefix = 'Info: [rebalancing-executor]:'
logging.info(f'{info_prefix} Starting to rebalance vms to their new nodes.')
for vm, value in vm_statistics_rebalanced.items():
if len(vm_statistics_rebalanced) > 0 and not app_args.dry_run:
for vm, value in vm_statistics_rebalanced.items():
try:
logging.info(f'{info_prefix} Rebalancing vm {vm} from node {value["node_parent"]} to node {value["node_rebalance"]}.')
api_object.nodes(value['node_parent']).qemu(value['vmid']).migrate().post(target=value['node_rebalance'],online=1)
except proxmoxer.core.ResourceException as error_resource:
__errors__ = True
logging.critical(f'{error_prefix} {error_resource}')
try:
# Migrate type VM (live migration).
if value['type'] == 'vm':
logging.info(f'{info_prefix} Rebalancing VM {vm} from node {value["node_parent"]} to node {value["node_rebalance"]}.')
job_id = api_object.nodes(value['node_parent']).qemu(value['vmid']).migrate().post(target=value['node_rebalance'],online=1)
# Migrate type CT (requires restart of container).
if value['type'] == 'ct':
logging.info(f'{info_prefix} Rebalancing CT {vm} from node {value["node_parent"]} to node {value["node_rebalance"]}.')
job_id = api_object.nodes(value['node_parent']).lxc(value['vmid']).migrate().post(target=value['node_rebalance'],restart=1)
except proxmoxer.core.ResourceException as error_resource:
logging.critical(f'{error_prefix} {error_resource}')
# Wait for migration to be finished unless running parallel migrations.
if not bool(int(parallel_migrations)):
logging.info(f'{info_prefix} Rebalancing will be performed sequentially.')
__wait_job_finalized(api_object, value['node_parent'], job_id, counter=1)
else:
logging.info(f'{info_prefix} Rebalancing will be performed in parallel.')
else:
logging.info(f'{info_prefix} No rebalancing needed.')
def __create_json_output(vm_statistics_rebalanced, app_args):
""" Create a machine parsable json output of VM rebalance statistics. """
info_prefix = 'Info: [json-output-generator]:'
if app_args.json:
logging.info(f'{info_prefix} Printing json output of VM statistics.')
print(json.dumps(vm_statistics_rebalanced))
def __create_cli_output(vm_statistics_rebalanced, app_args):
""" Create output for CLI when running in dry-run mode. """
info_prefix_dry_run = 'Info: [cli-output-generator-dry-run]:'
info_prefix_run = 'Info: [cli-output-generator]:'
vm_to_node_list = []
if app_args.dry_run:
info_prefix = info_prefix_dry_run
logging.info(f'{info_prefix} Starting dry-run to rebalance vms to their new nodes.')
else:
info_prefix = info_prefix_run
logging.info(f'{info_prefix} Start rebalancing vms to their new nodes.')
vm_to_node_list.append(['VM', 'Current Node', 'Rebalanced Node', 'VM Type'])
for vm_name, vm_values in vm_statistics_rebalanced.items():
vm_to_node_list.append([vm_name, vm_values['node_parent'], vm_values['node_rebalance'], vm_values['type']])
if len(vm_statistics_rebalanced) > 0:
logging.info(f'{info_prefix} Printing cli output of VM rebalancing.')
__print_table_cli(vm_to_node_list, app_args.dry_run)
else:
logging.info(f'{info_prefix} No rebalancing needed.')
def __print_table_cli(table, dry_run=False):
""" Pretty print a given table to the cli. """
info_prefix_dry_run = 'Info: [cli-output-generator-table-dry-run]:'
info_prefix_run = 'Info: [cli-output-generator-table]:'
info_prefix = info_prefix_run
longest_cols = [
(max([len(str(row[i])) for row in table]) + 3)
for i in range(len(table[0]))
]
row_format = "".join(["{:>" + str(longest_col) + "}" for longest_col in longest_cols])
for row in table:
# Print CLI output when running in dry-run mode to make the user's life easier.
if dry_run:
info_prefix = info_prefix_dry_run
print(row_format.format(*row))
# Log all items in info mode.
logging.info(f'{info_prefix} {row_format.format(*row)}')
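
The column sizing used by `__print_table_cli` (widest cell per column plus three characters of padding, right-aligned) can be tried out on its own. This is a small sketch with a hypothetical helper name `format_table` and made-up rows; it returns the formatted lines instead of printing or logging them.

```python
def format_table(table, padding=3):
    """Return the rows of a table as right-aligned, padded strings."""
    # Width of each column: its longest cell plus some padding.
    widths = [
        max(len(str(row[i])) for row in table) + padding
        for i in range(len(table[0]))
    ]
    # Build one format string such as "{:>8}{:>7}" for all rows.
    row_format = "".join("{:>" + str(width) + "}" for width in widths)
    return [row_format.format(*row) for row in table]


if __name__ == '__main__':
    for line in format_table([['VM', 'Current Node'], ['web01', 'pve1']]):
        print(line)
```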
def run_vm_rebalancing(api_object, vm_statistics_rebalanced, app_args, parallel_migrations):
""" Run rebalancing of vms to new nodes in cluster. """
__run_vm_rebalancing(api_object, vm_statistics_rebalanced, app_args, parallel_migrations)
__create_json_output(vm_statistics_rebalanced, app_args)
__create_cli_output(vm_statistics_rebalanced, app_args)
def main():
""" Run ProxLB for balancing VM workloads across a Proxmox cluster. """
# Initialize PAS.
initialize_logger('CRITICAL', 'SystemdHandler()')
initialize_logger('CRITICAL')
app_args = initialize_args()
config_path = initialize_config_path(app_args)
pre_validations(config_path)
# Parse global config
proxmox_api_host, proxmox_api_user, proxmox_api_pass, proxmox_api_ssl_v, balancing_method, \
ignore_nodes, ignore_vms, daemon, schedule = initialize_config_options(config_path)
# Parse global config.
proxmox_api_host, proxmox_api_user, proxmox_api_pass, proxmox_api_ssl_v, balancing_method, balancing_mode, balancing_mode_option, balancing_type, \
balanciness, parallel_migrations, ignore_nodes, ignore_vms, master_only, daemon, schedule, log_verbosity = initialize_config_options(config_path)
# Overwrite logging handler with user defined log verbosity.
initialize_logger(log_verbosity, update_log_verbosity=True)
while True:
# API Authentication.
api_object = api_connect(proxmox_api_host, proxmox_api_user, proxmox_api_pass, proxmox_api_ssl_v)
# Get master node of cluster and ensure that ProxLB is only performed on the
# cluster master node to avoid ongoing rebalancing.
cluster_master, master_only = execute_rebalancing_only_by_master(api_object, master_only)
# Validate daemon service and skip following tasks when not being the cluster master.
if not cluster_master and master_only:
validate_daemon(daemon, schedule)
continue
# Get metric & statistics for vms and nodes.
node_statistics = get_node_statistics(api_object, ignore_nodes)
vm_statistics = get_vm_statistics(api_object, ignore_vms)
vm_statistics = get_vm_statistics(api_object, ignore_vms, balancing_type)
node_statistics = update_node_statistics(node_statistics, vm_statistics)
# Calculate rebalancing of vms.
node_statistics_rebalanced, vm_statistics_rebalanced = balancing_calculations(balancing_method, node_statistics, vm_statistics)
node_statistics_rebalanced, vm_statistics_rebalanced = balancing_calculations(balancing_method, balancing_mode, balancing_mode_option,
node_statistics, vm_statistics, balanciness, rebalance=False, processed_vms=[])
# Rebalance vms to new nodes within the cluster.
run_vm_rebalancing(api_object, vm_statistics_rebalanced)
run_vm_rebalancing(api_object, vm_statistics_rebalanced, app_args, parallel_migrations)
# Validate for any errors
# Validate for any errors.
post_validations()
# Validate daemon service
# Validate daemon service.
validate_daemon(daemon, schedule)
if __name__ == '__main__':
main()
main()


@@ -5,8 +5,10 @@ api_pass: FooBar
verify_ssl: 1
[balancing]
method: memory
mode: used
ignore_nodes: dummynode01,dummynode02
ignore_vms: testvm01,testvm02
[service]
daemon: 1
schedule: 24
log_verbosity: CRITICAL
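
A configuration in this `key: value` layout can be read with Python's standard `configparser`. The sketch below is an assumption-laden illustration rather than ProxLB's own `initialize_config_options`: `load_options` is a hypothetical helper, only the `[balancing]` and `[service]` sections visible in this hunk are read, and the fallback values are guesses chosen for the example.

```python
import configparser

SAMPLE = """
[balancing]
method: memory
mode: used
ignore_nodes: dummynode01,dummynode02
ignore_vms: testvm01,testvm02

[service]
daemon: 1
schedule: 24
log_verbosity: CRITICAL
"""


def load_options(text):
    """Parse the balancing and service options from a config string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    balancing = parser['balancing']
    service = parser['service']
    return {
        # Fallbacks below are assumptions made for this sketch.
        'method': balancing.get('method', fallback='memory'),
        'mode': balancing.get('mode', fallback='used'),
        'ignore_nodes': [n for n in balancing.get('ignore_nodes', fallback='').split(',') if n],
        'ignore_vms': [v for v in balancing.get('ignore_vms', fallback='').split(',') if v],
        'daemon': service.getint('daemon', fallback=1),
        'schedule': service.getint('schedule', fallback=24),
        'log_verbosity': service.get('log_verbosity', fallback='CRITICAL'),
    }
```

Using `fallback=` keeps older configuration files working when a newer option such as `mode` or `log_verbosity` is missing.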