Compare commits

...

13 Commits

Author SHA1 Message Date
Florian Paul Azim Hoberg
fe715f203e docs: Adjust the readme.md 2024-08-24 08:10:16 +02:00
Florian
959c3b5f8d Merge pull request #55 from gyptazy/feature/51-storage-balancing-feature
feature: Add storage balancing function.
2024-08-24 08:06:35 +02:00
Florian Paul Azim Hoberg
ef8b97efc2 feature: Add storage balancing function. [#51].
feature: Add feature to allow the API hosts being provided as a comma separated list. [#60]

Fixes: #51
Fixes: #60
2024-08-23 18:48:57 +02:00
Florian
e4d40b460b Merge pull request #54 from gyptazy/feature/code-cleanup-future
feature: Add cli arg (-b) to return the best next node for VM placement.
2024-08-19 21:11:38 +02:00
Florian Paul Azim Hoberg
39142780d5 feature: Add cli arg (-b) to return the best next node for VM placement.
Fixes: #8
Fixes: #53
2024-08-19 21:09:20 +02:00
Florian
143135f1d8 Merge pull request #50 from gyptazy/release/v1.0.2
release: Prepare release v1.0.2
2024-08-13 17:10:37 +02:00
Florian Paul Azim Hoberg
c865829a2e release: Prepare release v1.0.2 2024-08-13 16:37:30 +02:00
Florian
101855b404 Merge pull request #46 from gyptazy/fix/45-adjust-daemon-time-mix-min-hrs
fix: Fix daemon timer to use hours instead of minutes.
2024-08-06 21:29:34 +02:00
Florian Paul Azim Hoberg
37e7a601be fix: Fix daemon timer to use hours instead of minutes.
Reported by: @mater-345
Fixes: #45
2024-08-06 18:06:05 +02:00
Florian
8791007e77 Merge pull request #43 from gyptazy/feature/40-option-run-only-on-master-node
feature: Add option to run ProxLB only on the Proxmox's master node in the cluster.
2024-08-06 18:00:26 +02:00
Florian Paul Azim Hoberg
3a2c16b137 feature: Add option to run ProxLB only on the Proxmox's master node in the cluster.
Fixes: #40
2024-08-06 17:58:34 +02:00
Florian
adc476e848 Merge pull request #42 from gyptazy/feature/41-add-option-run-migration-parallel-or-serial
feature: Add option to run migrations in parallel or sequentially
2024-08-04 08:27:04 +02:00
Florian Paul Azim Hoberg
28be8b8146 feature: Add option to run migrations in parallel or sequentially
Fixes: #41
2024-08-04 08:25:03 +02:00
20 changed files with 804 additions and 127 deletions


@@ -0,0 +1,2 @@
added:
- Add option to run ProxLB only on the Proxmox's master node in the cluster (reg. HA feature). [#40]


@@ -0,0 +1,2 @@
added:
- Add option to run migrations in parallel or sequentially. [#41]


@@ -0,0 +1,2 @@
changed:
- Fix daemon timer to use hours instead of minutes. [#45]


@@ -0,0 +1,2 @@
fixed:
- Fix CMake packaging for Debian package to avoid overwriting the config file. [#49]


@@ -0,0 +1 @@
date: 2024-08-13


@@ -0,0 +1,2 @@
added:
- Add storage balancing function. [#51]


@@ -0,0 +1,6 @@
added:
- Add a convert function to cast all bool-like options from configparser to bools. [#53]
- Add config parser options for future features. [#53]
- Add a config version schema that must be supported by ProxLB. [#53]
changed:
- Improve the underlying code base for future implementations. [#53]


@@ -0,0 +1,2 @@
added:
- Add feature to allow the API hosts being provided as a comma separated list. [#60]


@@ -0,0 +1,2 @@
added:
- Add cli arg `-b` to return the best node for the next VM/CT placement. [#8]


@@ -0,0 +1,2 @@
fixed:
- Fixed `master_only` function by inverting the condition.


@@ -0,0 +1 @@
date: TBD


@@ -6,6 +6,20 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [1.0.2] - 2024-08-13
### Added
- Add option to run migrations in parallel or sequentially. [#41]
- Add option to run ProxLB only on the Proxmox master node in the cluster (reg. HA feature). [#40]
### Changed
- Fix daemon timer to use hours instead of minutes. [#45]
- Fix CMake packaging for Debian package to avoid overwriting the config file. [#49]
- Fix wonky code style.
## [1.0.0] - 2024-08-01
### Added
@@ -37,4 +51,4 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- Development release of ProxLB.
- Development release of ProxLB.

README.md

@@ -20,9 +20,10 @@
- [General](#general)
- [By Used Memory of VMs/CTs](#by-used-memory-of-vmscts)
- [By Assigned Memory of VMs/CTs](#by-assigned-memory-of-vmscts)
- [Grouping](#grouping)
- [Include (Stay Together)](#include-stay-together)
- [Exclude (Stay Separate)](#exclude-stay-separate)
- [Storage Balancing](#storage-balancing)
- [Affinity Rules / Grouping Relationships](#affinity-rules--grouping-relationships)
- [Affinity (Stay Together)](#affinity-stay-together)
- [Anti-Affinity (Keep Apart)](#anti-affinity-keep-apart)
- [Ignore VMs (Tag Style)](#ignore-vms-tag-style)
- [Systemd](#systemd)
- [Manual](#manual)
@@ -55,10 +56,14 @@ Automated rebalancing reduces the need for manual actions, allowing operators to
<img src="https://cdn.gyptazy.ch/images/proxlb-rebalancing-demo.gif"/>
## Features
* Rebalance the cluster by:
* Rebalance VMs/CTs in the cluster by:
* Memory
* Disk (only local storage)
* CPU
* Rebalance storage in the cluster
* Rebalance VM/CT disks to other storage pools
* Rebalance by used storage
* Get the best node for new VM/CT placement in the cluster
* Performing
* Periodically
* One-shot solution
@@ -66,6 +71,7 @@ Automated rebalancing reduces the need for manual actions, allowing operators to
* Rebalance only VMs
* Rebalance only CTs
* Rebalance all (VMs and CTs)
* Rebalance VM/CT disks (Storage)
* Filter
* Exclude nodes
* Exclude virtual machines
@@ -98,22 +104,31 @@ Running PLB is easy and it runs almost everywhere since it just depends on `Pyth
### Options
The following options can be set in the `proxlb.conf` file:
| Option | Example | Description |
|------|:------:|:------:|
| api_host | hypervisor01.gyptazy.ch | Host or IP address of the remote Proxmox API. |
| api_user | root@pam | Username for the API. |
| api_pass | FooBar | Password for the API. |
| verify_ssl | 1 | Validate SSL certificates (1) or ignore (0). (default: 1) |
| method | memory | Defines the balancing method (default: memory) where you can use `memory`, `disk` or `cpu`. |
| mode | used | Rebalance by `used` resources (efficiency) or `assigned` (avoid overprovisioning) resources. (default: used)|
| mode_option | byte | Rebalance by node's resources in `bytes` or `percent`. (default: bytes) |
| type | vm | Rebalance only `vm` (virtual machines), `ct` (containers) or `all` (virtual machines & containers). (default: vm)|
| balanciness | 10 | Value of the percentage of lowest and highest resource consumption on nodes may differ before rebalancing. (default: 10) |
| ignore_nodes | dummynode01,dummynode02,test* | Defines a comma separated list of nodes to exclude. |
| ignore_vms | testvm01,testvm02 | Defines a comma separated list of VMs to exclude. (`*` as suffix wildcard or tags are also supported) |
| daemon | 1 | Run as a daemon (1) or one-shot (0). (default: 1) |
| schedule | 24 | Hours to rebalance in hours. (default: 24) |
| log_verbosity | INFO | Defines the log level (default: CRITICAL) where you can use `INFO`, `WARN` or `CRITICAL` |
| Section | Option | Example | Description |
|------|:------:|:------:|:------:|
| `proxmox` | api_host | hypervisor01.gyptazy.ch | Host or IP address (or comma separated list) of the remote Proxmox API. |
| | api_user | root@pam | Username for the API. |
| | api_pass | FooBar | Password for the API. |
| | verify_ssl | 1 | Validate SSL certificates (1) or ignore (0). (default: 1) |
| `vm_balancing` | enable | 1 | Enables VM/CT balancing. |
| | method | memory | Defines the balancing method (default: memory) where you can use `memory`, `disk` or `cpu`. |
| | mode | used | Rebalance by `used` resources (efficiency) or `assigned` (avoid overprovisioning) resources. (default: used)|
| | mode_option | byte | Rebalance by node's resources in `bytes` or `percent`. (default: bytes) |
| | type | vm | Rebalance only `vm` (virtual machines), `ct` (containers) or `all` (virtual machines & containers). (default: vm)|
| | balanciness | 10 | Maximum percentage by which the lowest and highest resource consumption across nodes may differ before rebalancing is triggered. (default: 10) |
| | parallel_migrations | 1 | Defines whether migrations run in parallel (1) or sequentially (0). (default: 1) |
| | ignore_nodes | dummynode01,dummynode02,test* | Defines a comma separated list of nodes to exclude. |
| | ignore_vms | testvm01,testvm02 | Defines a comma separated list of VMs to exclude. (`*` as suffix wildcard or tags are also supported) |
| | master_only | 0 | Defines whether balancing should only be performed on the cluster master node (1) or not (0). (default: 0) |
| `storage_balancing` | enable | 0 | Enables storage balancing. |
| | balanciness | 10 | Maximum percentage by which the lowest and highest storage consumption may differ before rebalancing is triggered. (default: 10) |
| | parallel_migrations | 1 | Defines whether migrations run in parallel (1) or sequentially (0). (default: 1) |
| `update_service` | enable | 0 | Enables the automated update service (rolling updates). |
| `api` | enable | 0 | Enables the ProxLB API. |
| `service`| daemon | 1 | Run as a daemon (1) or one-shot (0). (default: 1) |
| | schedule | 24 | Interval between rebalancing runs in hours. (default: 24) |
| | log_verbosity | INFO | Defines the log level (default: CRITICAL) where you can use `INFO`, `WARN` or `CRITICAL`. |
| | config_version | 3 | Defines the current config version schema for ProxLB. |
An example configuration file looks like this:
```
@@ -122,7 +137,8 @@ api_host: hypervisor01.gyptazy.ch
api_user: root@pam
api_pass: FooBar
verify_ssl: 1
[balancing]
[vm_balancing]
enable: 1
method: memory
mode: used
type: vm
@@ -133,10 +149,24 @@ type: vm
# Rebalancing: node01: 41% memory consumption :: node02: 52% consumption
# No rebalancing: node01: 43% memory consumption :: node02: 50% consumption
balanciness: 10
# Enable parallel migrations. If set to 0, ProxLB waits for each migration
# to complete before starting the next one.
parallel_migrations: 1
ignore_nodes: dummynode01,dummynode02
ignore_vms: testvm01,testvm02
[storage_balancing]
enable: 0
[update_service]
enable: 0
[api]
enable: 0
[service]
# The master_only option might be useful if running ProxLB on all nodes in a cluster
# but only a single one should do the balancing. The master node is obtained from the Proxmox
# HA status.
master_only: 0
daemon: 1
config_version: 3
```
### Parameters
@@ -145,12 +175,13 @@ The following options and parameters are currently supported:
| Option | Long Option | Description | Default |
|------|:------:|------:|------:|
| -c | --config | Path to a config file. | /etc/proxlb/proxlb.conf (default) |
| -d | --dry-run | Perform a dry-run without doing any actions. | Unset |
| -j | --json | Return a JSON of the VM movement. | Unset |
| -d | --dry-run | Performs a dry-run without doing any actions. | Unset |
| -j | --json | Returns a JSON of the VM movement. | Unset |
| -b | --best-node | Returns the best next node for a VM/CT placement (useful for further usage with Terraform/Ansible). | Unset |
### Balancing
#### General
In general, virtual machines and containers can be rebalanced and moved around nodes in the cluster. Often, this also works without downtime without any further downtimes. However, this does **not** work with containers. LXC based containers will be shutdown, copied and started on the new node. Also to note, live migrations can work fluently without any issues but there are still several things to be considered. This is out of scope for ProxLB and applies in general to Proxmox and your cluster setup. You can find more details about this here: https://pve.proxmox.com/wiki/Migrate_to_Proxmox_VE.
In general, virtual machines (VMs) and containers (CTs) can be rebalanced across nodes, and their disks can be moved between shared storages (storage balancing) in the cluster. For VMs this often works without any downtime. However, this does **not** apply to containers: LXC based containers will be shut down, copied and started on the new node. Also note that while live migrations can run smoothly, there are still several things to consider. This is out of scope for ProxLB and applies in general to Proxmox and your cluster setup. You can find more details here: https://pve.proxmox.com/wiki/Migrate_to_Proxmox_VE.
#### By Used Memory of VMs/CTs
By continuously monitoring the current resource usage of VMs, ProxLB intelligently reallocates workloads to prevent any single node from becoming overloaded. This approach ensures that resources are balanced efficiently, providing consistent and optimal performance across the entire cluster at all times. To activate this balancing mode, simply enable the following option in your ProxLB configuration:
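Assembled from the options table above, a minimal `[vm_balancing]` fragment for this mode looks like the following sketch (both `method: memory` and `mode: used` are also the defaults):
```
[vm_balancing]
enable: 1
method: memory
mode: used
```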
@@ -168,11 +199,26 @@ mode: assigned
Afterwards, restart the service (if running in daemon mode) to activate this rebalancing mode.
### Grouping
#### Include (Stay Together)
#### Storage Balancing
Starting with version 1.0.3, ProxLB also supports balancing of the underlying shared storage. In this case, all attached disks of a VM or CT (`rootfs` in the case of a CT) are fetched and evaluated. If a VM has multiple disks attached, those disks can also be distributed across different storages. As a result, only shared storage is supported; non-shared storage would require moving the whole VM, including all attached disks, to the node's local storage.
Limitations:
* Only shared storage
* Only supported for the following VM disk types:
* ide (only disks, not CD)
* nvme
* scsi
* virtio
* sata
* rootfs (Container)
*Note: Storage balancing is currently in beta and should be used carefully.*
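Assembled from the options table above, a minimal configuration sketch for trying storage balancing could look like this (`balanciness` and `parallel_migrations` are shown with their defaults):
```
[storage_balancing]
enable: 1
balanciness: 10
parallel_migrations: 1
```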
### Affinity Rules / Grouping Relationships
#### Affinity (Stay Together)
<img align="left" src="https://cdn.gyptazy.ch/images/plb-rebalancing-include-balance-group.jpg"/> Access the Proxmox Web UI by opening your web browser and navigating to your Proxmox VE web interface, then log in with your credentials. Navigate to the VM you want to tag by selecting it from the left-hand navigation panel. Click on the "Options" tab to view the VM's options, then select "Edit" or "Add" (depending on whether you are editing an existing tag or adding a new one). In the tag field, enter `plb_include_` followed by your unique identifier, for example, `plb_include_group1`. Save the changes to apply the tag to the VM. Repeat these steps for each VM that should be included in the group.
#### Exclude (Stay Separate)
#### Anti-Affinity (Keep Apart)
<img align="left" src="https://cdn.gyptazy.ch/images/plb-rebalancing-exclude-balance-group.jpg"/> Access the Proxmox Web UI by opening your web browser and navigating to your Proxmox VE web interface, then log in with your credentials. Navigate to the VM you want to tag by selecting it from the left-hand navigation panel. Click on the "Options" tab to view the VM's options, then select "Edit" or "Add" (depending on whether you are editing an existing tag or adding a new one). In the tag field, enter `plb_exclude_` followed by your unique identifier, for example, `plb_exclude_critical`. Save the changes to apply the tag to the VM. Repeat these steps for each VM that should be excluded from being on the same node.
#### Ignore VMs (Tag Style)
@@ -201,8 +247,8 @@ The executable must be able to read the config file, if no dedicated config file
The easiest way to get started is by using the ready-to-use packages provided on my CDN and running them on a Debian based Linux system. This can also be one of the Proxmox nodes itself.
```
wget https://cdn.gyptazy.ch/files/amd64/debian/proxlb/proxlb_1.0.0_amd64.deb
dpkg -i proxlb_1.0.0_amd64.deb
wget https://cdn.gyptazy.ch/files/amd64/debian/proxlb/proxlb_1.0.2_amd64.deb
dpkg -i proxlb_1.0.2_amd64.deb
# Adjust your config
vi /etc/proxlb/proxlb.conf
systemctl restart proxlb
@@ -294,6 +340,7 @@ Container Images for Podman, Docker etc., can be found at:
| Version | Image |
|------|:------:|
| latest | cr.gyptazy.ch/proxlb/proxlb:latest |
| v1.0.2 | cr.gyptazy.ch/proxlb/proxlb:v1.0.2 |
| v1.0.0 | cr.gyptazy.ch/proxlb/proxlb:v1.0.0 |
| v0.9.9 | cr.gyptazy.ch/proxlb/proxlb:v0.9.9 |
@@ -305,12 +352,12 @@ Bugs can be reported via the GitHub issue tracker [here](https://github.com/gypt
Feel free to add further documentation, to adjust already existing one or to contribute with code. Please take care about the style guide and naming conventions. You can find more in our [CONTRIBUTING.md](https://github.com/gyptazy/ProxLB/blob/main/CONTRIBUTING.md) file.
### Support
If you need assistance or have any questions, we offer support through our dedicated [chat room](https://matrix.to/#/#proxlb:gyptazy.ch) in Matrix and on Reddit. Join our community for real-time help, advice, and discussions. Connect with us in our dedicated chat room for immediate support and live interaction with other users and developers. You can also visit our [Reddit community](https://www.reddit.com/r/Proxmox/comments/1e78ap3/introducing_proxlb_rebalance_your_vm_workloads/) to post your queries, share your experiences, and get support from fellow community members and moderators. You may also just open directly an issue [here](https://github.com/gyptazy/ProxLB/issues) on GitHub. We are here to help and ensure you have the best experience possible.
If you need assistance or have any questions, we offer support through our dedicated [chat room](https://matrix.to/#/#proxlb:gyptazy.ch) in Matrix and on Reddit. Join our community for real-time help, advice, and discussions. Connect with us in our dedicated chat room for immediate support and live interaction with other users and developers. You can also visit our [GitHub Community](https://github.com/gyptazy/ProxLB/discussions/) to post your queries, share your experiences, and get support from fellow community members and moderators. You may also just open directly an issue [here](https://github.com/gyptazy/ProxLB/issues) on GitHub. We are here to help and ensure you have the best experience possible.
| Support Channel | Link |
|------|:------:|
| Matrix | [#proxlb:gyptazy.ch](https://matrix.to/#/#proxlb:gyptazy.ch) |
| Reddit | [Reddit community](https://www.reddit.com/r/Proxmox/comments/1e78ap3/introducing_proxlb_rebalance_your_vm_workloads/) |
| GitHub Community | [GitHub Community](https://github.com/gyptazy/ProxLB/discussions/) |
| GitHub | [ProxLB GitHub](https://github.com/gyptazy/ProxLB/issues) |
### Author(s)


@@ -27,6 +27,16 @@ ProxLB is a load-balancing system designed to optimize the distribution of virtu
Before starting any migrations, ProxLB validates that rebalancing actions are necessary and beneficial. Depending on the selected balancing mode — such as CPU, memory, or disk — it creates a balancing matrix. This matrix sorts the VMs by their maximum used or assigned resources, identifying the VM with the highest usage. ProxLB then places this VM on the node with the most free resources in the selected balancing type. This process runs recursively until the operator-defined Balanciness is achieved. Balancing can be defined for the used or max. assigned resources of VMs/CTs.
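The balancing-matrix pass described above can be sketched roughly as follows. This is a simplified, hypothetical illustration rather than the actual ProxLB code; the node and VM dictionaries are made up for the example:

```python
def used_percent(node):
    """Percentage of the node's resource (e.g. memory) currently in use."""
    return 100.0 * node["used"] / node["total"]

def rebalance(nodes, vms, balanciness=10):
    """Return a list of (vm_name, target_node) migrations that brings the
    spread between the most and least loaded node within `balanciness`."""
    migrations = []
    moved = set()  # avoid ping-ponging the same VM back and forth
    for _ in range(len(vms)):  # bounded loop instead of true recursion
        usage = {name: used_percent(n) for name, n in nodes.items()}
        busiest = max(usage, key=usage.get)
        freest = min(usage, key=usage.get)
        if usage[busiest] - usage[freest] <= balanciness:
            break  # operator-defined balanciness reached
        # Sort VMs on the busiest node by used resources and pick the largest.
        candidates = [v for v in vms if v["node"] == busiest and v["name"] not in moved]
        if not candidates:
            break
        vm = max(candidates, key=lambda v: v["used"])
        # Place the VM on the node with the most free resources.
        nodes[busiest]["used"] -= vm["used"]
        nodes[freest]["used"] += vm["used"]
        vm["node"] = freest
        moved.add(vm["name"])
        migrations.append((vm["name"], freest))
    return migrations

# Example: node01 at 60 % and node02 at 20 %; moving the largest VM
# brings the spread within the default balanciness of 10 %.
nodes = {"node01": {"used": 60, "total": 100}, "node02": {"used": 20, "total": 100}}
vms = [
    {"name": "vm1", "node": "node01", "used": 30},
    {"name": "vm2", "node": "node01", "used": 10},
]
print(rebalance(nodes, vms))  # → [('vm1', 'node02')]
```

The loop stops once the spread between the most and least loaded node is within the configured balanciness, mirroring the recursive process described above.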
### ProxLB config version is too low
ProxLB may run into an error when the used config schema version is too low. This can happen after major changes that require new config options. Please make sure to use a supported config version for your running ProxLB installation.
Example Error:
```
Error: [config-version-validator]: ProxLB config version 2 is too low. Required: 3.
```
The easiest way to solve this is to take the minimum required config schema version from the git tag that represents your ProxLB version.
### Logging
ProxLB uses the `SystemdHandler` for logging. You can find all your logs in your systemd unit log or in `journalctl`. By default, ProxLB only logs critical events. However, to better understand the balancing decisions it might be useful to change this to `INFO` or `DEBUG`, which can simply be done in the [proxlb.conf](https://github.com/gyptazy/ProxLB/blob/main/proxlb.conf#L14) file by changing the `log_verbosity` parameter.
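For example, raising the log level to `INFO` only requires the following line in the `[service]` section of `proxlb.conf`:
```
[service]
log_verbosity: INFO
```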
@@ -74,4 +84,4 @@ If you need assistance or have any questions, we offer support through our dedic
|------|:------:|
| Matrix | [#proxlb:gyptazy.ch](https://matrix.to/#/#proxlb:gyptazy.ch) |
| Reddit | [Reddit community](https://www.reddit.com/r/Proxmox/comments/1e78ap3/introducing_proxlb_rebalance_your_vm_workloads/) |
| GitHub | [ProxLB GitHub](https://github.com/gyptazy/ProxLB/issues) |
| GitHub | [ProxLB GitHub](https://github.com/gyptazy/ProxLB/issues) |


@@ -1,5 +1,5 @@
cmake_minimum_required(VERSION 3.16)
project(proxmox-rebalancing-service VERSION 1.0.0)
project(proxmox-rebalancing-service VERSION 1.0.2)
install(PROGRAMS ../proxlb DESTINATION /bin)
install(FILES ../proxlb.conf DESTINATION /etc/proxlb)
@@ -30,12 +30,11 @@ set(CPACK_DEBIAN_PACKAGE_ARCHITECTURE "amd64")
set(CPACK_DEBIAN_PACKAGE_SUMMARY "ProxLB - Rebalance VM workloads across nodes in Proxmox clusters.")
set(CPACK_DEBIAN_PACKAGE_DESCRIPTION "ProxLB - Rebalance VM workloads across nodes in Proxmox clusters.")
set(CPACK_DEBIAN_PACKAGE_CONTROL_EXTRA "${CMAKE_CURRENT_SOURCE_DIR}/changelog_debian")
set(CPACK_DEBIAN_PACKAGE_DEPENDS "python3")
set(CPACK_DEBIAN_PACKAGE_DEPENDS "python3, python3-proxmoxer")
set(CPACK_DEBIAN_PACKAGE_LICENSE "GPL 3.0")
# Install
set(CPACK_PACKAGING_INSTALL_PREFIX ${CMAKE_INSTALL_PREFIX})
set(CPACK_DEBIAN_PACKAGE_CONTROL_EXTRA "${CMAKE_CURRENT_SOURCE_DIR}/postinst")
set(CPACK_DEBIAN_PACKAGE_CONTROL_EXTRA "${CMAKE_CURRENT_SOURCE_DIR}/postinst;${CMAKE_CURRENT_SOURCE_DIR}/conffiles")
set(CPACK_RPM_POST_INSTALL_SCRIPT_FILE "${CMAKE_CURRENT_SOURCE_DIR}/postinst")
include(CPack)


@@ -1,3 +1,13 @@
proxlb (1.0.2) unstable; urgency=low
* Add option to run migrations in parallel or sequentially.
* Add option to run ProxLB only on a Proxmox cluster master (req. HA feature).
* Fix daemon timer to use hours instead of minutes.
* Fix CMake packaging for Debian package to avoid overwriting the config file.
* Fix some wonky code styles.
-- Florian Paul Azim Hoberg <gyptazy@gyptazy.ch> Tue, 13 Aug 2024 17:28:14 +0200
proxlb (1.0.0) unstable; urgency=low
* Initial release of ProxLB.


@@ -1,3 +1,9 @@
* Tue Aug 13 2024 Florian Paul Azim Hoberg <gyptazy@gyptazy.ch>
- Add option to run migrations in parallel or sequentially.
- Add option to run ProxLB only on a Proxmox cluster master (req. HA feature).
- Fixed daemon timer to use hours instead of minutes.
- Fixed some wonky code styles.
* Thu Aug 01 2024 Florian Paul Azim Hoberg <gyptazy@gyptazy.ch>
- Initial release of ProxLB.

packaging/conffiles

@@ -0,0 +1 @@
/etc/proxlb/proxlb.conf

proxlb

@@ -22,6 +22,7 @@
import argparse
import configparser
import copy
import json
import logging
import os
@@ -33,16 +34,18 @@ except ImportError:
import random
import re
import requests
import socket
import sys
import time
import urllib3
# Constants
__appname__ = "ProxLB"
__version__ = "1.0.0"
__author__ = "Florian Paul Azim Hoberg <gyptazy@gyptazy.ch> @gyptazy"
__errors__ = False
__appname__ = "ProxLB"
__version__ = "1.0.3b"
__config_version__ = 3
__author__ = "Florian Paul Azim Hoberg <gyptazy@gyptazy.ch> @gyptazy"
__errors__ = False
# Classes
@@ -112,7 +115,7 @@ def validate_daemon(daemon, schedule):
if bool(int(daemon)):
logging.info(f'{info_prefix} Running in daemon mode. Next run in {schedule} hours.')
time.sleep(int(schedule) * 60)
time.sleep(int(schedule) * 60 * 60)
else:
logging.info(f'{info_prefix} Not running in daemon mode. Quitting.')
sys.exit(0)
@@ -145,9 +148,10 @@ def __validate_config_file(config_path):
def initialize_args():
""" Initialize given arguments for ProxLB. """
argparser = argparse.ArgumentParser(description='ProxLB')
argparser.add_argument('-c', '--config', type=str, help='Path to config file.', required=True)
argparser.add_argument('-d', '--dry-run', help='Perform a dry-run without doing any actions.', action='store_true', required=False)
argparser.add_argument('-j', '--json', help='Return a JSON of the VM movement.', action='store_true', required=False)
argparser.add_argument('-c', '--config', type=str, help='Path to config file.', required=False)
argparser.add_argument('-d', '--dry-run', help='Perform a dry-run without doing any actions.', action='store_true', required=False)
argparser.add_argument('-j', '--json', help='Return a JSON of the VM movement.', action='store_true', required=False)
argparser.add_argument('-b', '--best-node', help='Returns the best next node.', action='store_true', required=False)
return argparser.parse_args()
@@ -166,29 +170,43 @@ def initialize_config_path(app_args):
def initialize_config_options(config_path):
""" Read configuration from given config file for ProxLB. """
error_prefix = 'Error: [config]:'
info_prefix = 'Info: [config]:'
error_prefix = 'Error: [config]:'
info_prefix = 'Info: [config]:'
proxlb_config = {}
try:
config = configparser.ConfigParser()
config.read(config_path)
# Proxmox config
proxmox_api_host = config['proxmox']['api_host']
proxmox_api_user = config['proxmox']['api_user']
proxmox_api_pass = config['proxmox']['api_pass']
proxmox_api_ssl_v = config['proxmox']['verify_ssl']
# Balancing
balancing_method = config['balancing'].get('method', 'memory')
balancing_mode = config['balancing'].get('mode', 'used')
balancing_mode_option = config['balancing'].get('mode_option', 'bytes')
balancing_type = config['balancing'].get('type', 'vm')
balanciness = config['balancing'].get('balanciness', 10)
ignore_nodes = config['balancing'].get('ignore_nodes', None)
ignore_vms = config['balancing'].get('ignore_vms', None)
proxlb_config['proxmox_api_host'] = config['proxmox']['api_host']
proxlb_config['proxmox_api_user'] = config['proxmox']['api_user']
proxlb_config['proxmox_api_pass'] = config['proxmox']['api_pass']
proxlb_config['proxmox_api_ssl_v'] = config['proxmox']['verify_ssl']
# VM Balancing
proxlb_config['vm_balancing_enable'] = config['vm_balancing'].get('enable', 1)
proxlb_config['vm_balancing_method'] = config['vm_balancing'].get('method', 'memory')
proxlb_config['vm_balancing_mode'] = config['vm_balancing'].get('mode', 'used')
proxlb_config['vm_balancing_mode_option'] = config['vm_balancing'].get('mode_option', 'bytes')
proxlb_config['vm_balancing_type'] = config['vm_balancing'].get('type', 'vm')
proxlb_config['vm_balanciness'] = config['vm_balancing'].get('balanciness', 10)
proxlb_config['vm_parallel_migrations'] = config['vm_balancing'].get('parallel_migrations', 1)
proxlb_config['vm_ignore_nodes'] = config['vm_balancing'].get('ignore_nodes', None)
proxlb_config['vm_ignore_vms'] = config['vm_balancing'].get('ignore_vms', None)
# Storage Balancing
proxlb_config['storage_balancing_enable'] = config['storage_balancing'].get('enable', 0)
proxlb_config['storage_balancing_method'] = config['storage_balancing'].get('method', 'disk_space')
proxlb_config['storage_balanciness'] = config['storage_balancing'].get('balanciness', 10)
proxlb_config['storage_parallel_migrations'] = config['storage_balancing'].get('parallel_migrations', 1)
# Update Support
proxlb_config['update_service'] = config['update_service'].get('enable', 0)
# API
proxlb_config['api'] = config['api'].get('enable', 0)
# Service
daemon = config['service'].get('daemon', 1)
schedule = config['service'].get('schedule', 24)
log_verbosity = config['service'].get('log_verbosity', 'CRITICAL')
proxlb_config['master_only'] = config['service'].get('master_only', 0)
proxlb_config['daemon'] = config['service'].get('daemon', 1)
proxlb_config['schedule'] = config['service'].get('schedule', 24)
proxlb_config['log_verbosity'] = config['service'].get('log_verbosity', 'CRITICAL')
proxlb_config['config_version'] = config['service'].get('config_version', 2)
except configparser.NoSectionError:
logging.critical(f'{error_prefix} Could not find the required section.')
sys.exit(2)
@@ -199,9 +217,43 @@ def initialize_config_options(config_path):
logging.critical(f'{error_prefix} Could not find the required options in config file.')
sys.exit(2)
# Normalize and update bools. Afterwards, validate minimum required config version.
proxlb_config = __update_config_parser_bools(proxlb_config)
validate_config_minimum_version(proxlb_config)
logging.info(f'{info_prefix} Configuration file loaded.')
return proxmox_api_host, proxmox_api_user, proxmox_api_pass, proxmox_api_ssl_v, balancing_method, balancing_mode, \
balancing_mode_option, balancing_type, balanciness, ignore_nodes, ignore_vms, daemon, schedule, log_verbosity
return proxlb_config
def __update_config_parser_bools(proxlb_config):
""" Update bools in config from configparser to real bools """
info_prefix = 'Info: [config-bool-converter]:'
# Normalize and update config parser values to bools.
for section, option_value in proxlb_config.items():
if option_value in [1, '1', 'yes', 'Yes', 'true', 'True', 'enable']:
logging.info(f'{info_prefix} Converting {section} to bool: True.')
proxlb_config[section] = True
if option_value in [0, '0', 'no', 'No', 'false', 'False', 'disable']:
logging.info(f'{info_prefix} Converting {section} to bool: False.')
proxlb_config[section] = False
return proxlb_config
def validate_config_minimum_version(proxlb_config):
""" Validate the minimum required config file for ProxLB """
info_prefix = 'Info: [config-version-validator]:'
error_prefix = 'Error: [config-version-validator]:'
if int(proxlb_config['config_version']) < __config_version__:
logging.error(f'{error_prefix} ProxLB config version {proxlb_config["config_version"]} is too low. Required: {__config_version__}.')
print(f'{error_prefix} ProxLB config version {proxlb_config["config_version"]} is too low. Required: {__config_version__}.')
sys.exit(1)
else:
logging.info(f'{info_prefix} ProxLB config version {proxlb_config["config_version"]} is fine. Required: {__config_version__}.')
def api_connect(proxmox_api_host, proxmox_api_user, proxmox_api_pass, proxmox_api_ssl_v):
@@ -215,6 +267,8 @@ def api_connect(proxmox_api_host, proxmox_api_user, proxmox_api_pass, proxmox_ap
requests.packages.urllib3.disable_warnings()
logging.warning(f'{warn_prefix} API connection does not verify SSL certificate.')
proxmox_api_host = __api_connect_get_host(proxmox_api_host)
try:
api_object = proxmoxer.ProxmoxAPI(proxmox_api_host, user=proxmox_api_user, password=proxmox_api_pass, verify_ssl=proxmox_api_ssl_v)
except urllib3.exceptions.NameResolutionError:
@@ -231,6 +285,122 @@ def api_connect(proxmox_api_host, proxmox_api_user, proxmox_api_pass, proxmox_ap
return api_object
def __api_connect_get_host(proxmox_api_host):
""" Validate if a list of API hosts got provided and pre-validate the hosts. """
info_prefix = 'Info: [api-connect-get-host]:'
proxmox_port = 8006
if ',' in proxmox_api_host:
logging.info(f'{info_prefix} Multiple hosts for API connection are given. Testing hosts for further usage.')
proxmox_api_host = proxmox_api_host.split(',')
# Validate all given hosts and check for responsive on Proxmox web port.
for host in proxmox_api_host:
logging.info(f'{info_prefix} Testing host {host} on port tcp/{proxmox_port}.')
reachable = __api_connect_test_ipv4_host(host, proxmox_port)
if reachable:
return host
else:
logging.info(f'{info_prefix} Using host {proxmox_api_host} on port tcp/{proxmox_port}.')
return proxmox_api_host
def __api_connect_test_ipv4_host(proxmox_api_host, port):
error_prefix = 'Error: [api-connect-test-host]:'
info_prefix = 'Info: [api-connect-test-host]:'
proxmox_connection_timeout = 2
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(proxmox_connection_timeout)
logging.info(f'{info_prefix} Timeout for host {proxmox_api_host} is set to {proxmox_connection_timeout} seconds.')
result = sock.connect_ex((proxmox_api_host,port))
if result == 0:
sock.close()
logging.info(f'{info_prefix} Host {proxmox_api_host} is reachable on port tcp/{port}.')
return True
else:
sock.close()
logging.critical(f'{error_prefix} Host {proxmox_api_host} is unreachable on port tcp/{port}.')
return False
def __api_connect_test_ipv6_host(proxmox_api_host, port):
error_prefix = 'Error: [api-connect-test-host]:'
info_prefix = 'Info: [api-connect-test-host]:'
proxmox_connection_timeout = 2
sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
sock.settimeout(proxmox_connection_timeout)
logging.info(f'{info_prefix} Timeout for host {proxmox_api_host} is set to {proxmox_connection_timeout} seconds.')
result = sock.connect_ex((proxmox_api_host,port))
if result == 0:
sock.close()
logging.info(f'{info_prefix} Host {proxmox_api_host} is reachable on port tcp/{port}.')
return True
else:
sock.close()
logging.critical(f'{error_prefix} Host {proxmox_api_host} is unreachable on port tcp/{port}.')
return False
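The two probe helpers above differ only in the address family they hard-code. A minimal sketch of the same `connect_ex`-based probe, made address-family-agnostic via `getaddrinfo` (the helper name `probe_host` is illustrative, not part of ProxLB):

```python
import socket

def probe_host(host, port, timeout=2):
    """Return True when a TCP connection to host:port succeeds within the timeout."""
    # getaddrinfo resolves the host and yields the matching family (IPv4 or IPv6).
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            # connect_ex returns 0 on success instead of raising on failure.
            if sock.connect_ex(sockaddr) == 0:
                return True
        finally:
            sock.close()
    return False
```

Using `connect_ex` instead of `connect` avoids exception handling for the common "port closed" case, which is why the original helpers branch on the integer result.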
def execute_rebalancing_only_by_master(api_object, master_only):
""" Validate if balancing should only be done by the cluster master. Afterwards, validate if this node is the cluster master. """
info_prefix = 'Info: [only-on-master-executor]:'
master_only = bool(int(master_only))
if master_only:
logging.info(f'{info_prefix} Master only rebalancing is defined. Starting validation.')
cluster_master_node = get_cluster_master(api_object)
cluster_master = validate_cluster_master(cluster_master_node)
return cluster_master, master_only
else:
logging.info(f'{info_prefix} No master only rebalancing is defined. Skipping validation.')
return False, master_only
def get_cluster_master(api_object):
""" Get the current master of the Proxmox cluster. """
error_prefix = 'Error: [cluster-master-getter]:'
info_prefix = 'Info: [cluster-master-getter]:'
try:
ha_status_object = api_object.cluster().ha().status().manager_status().get()
logging.info(f'{info_prefix} Master node: {ha_status_object.get("manager_status", None).get("master_node", None)}')
except urllib3.exceptions.NameResolutionError:
logging.critical(f'{error_prefix} Could not resolve the API.')
sys.exit(2)
except requests.exceptions.ConnectTimeout:
logging.critical(f'{error_prefix} Connection time out to API.')
sys.exit(2)
except requests.exceptions.SSLError:
logging.critical(f'{error_prefix} SSL certificate verification failed for API.')
sys.exit(2)
cluster_master = ha_status_object.get("manager_status", None).get("master_node", None)
if cluster_master:
return cluster_master
else:
logging.critical(f'{error_prefix} Could not obtain cluster master. Please check your configuration - stopping.')
sys.exit(2)
def validate_cluster_master(cluster_master):
""" Validate if the current execution node is the cluster master. """
info_prefix = 'Info: [cluster-master-validator]:'
node_executor_hostname = socket.gethostname()
logging.info(f'{info_prefix} Node executor hostname is: {node_executor_hostname}')
if node_executor_hostname != cluster_master:
logging.info(f'{info_prefix} {node_executor_hostname} is not the cluster master ({cluster_master}).')
return False
else:
return True
def get_node_statistics(api_object, ignore_nodes):
""" Get statistics of cpu, memory and disk for each node in the cluster. """
info_prefix = 'Info: [node-statistics]:'
@@ -272,14 +442,15 @@ def get_node_statistics(api_object, ignore_nodes):
def get_vm_statistics(api_object, ignore_vms, balancing_type):
""" Get statistics of cpu, memory and disk for each vm in the cluster. """
info_prefix = 'Info: [vm-statistics]:'
warn_prefix = 'Warn: [vm-statistics]:'
vm_statistics = {}
ignore_vms_list = ignore_vms.split(',')
group_include = None
group_exclude = None
vm_ignore = None
vm_ignore_wildcard = False
_vm_details_storage_allowed = ['ide', 'nvme', 'scsi', 'virtio', 'sata', 'rootfs']
# Wildcard support: Initially validate if we need to honour
# any wildcards within the vm_ignore list.
@@ -316,11 +487,38 @@ def get_vm_statistics(api_object, ignore_vms, balancing_type):
vm_statistics[vm['name']]['disk_used'] = vm['disk']
vm_statistics[vm['name']]['vmid'] = vm['vmid']
vm_statistics[vm['name']]['node_parent'] = node['node']
# Rebalancing node will be overwritten after calculations.
# If the vm stays on the node, it will be removed at a
# later time.
vm_statistics[vm['name']]['node_rebalance'] = node['node']
vm_statistics[vm['name']]['storage'] = {}
vm_statistics[vm['name']]['type'] = 'vm'
# Get disk details of the related object.
_vm_details = api_object.nodes(node['node']).qemu(vm['vmid']).config.get()
logging.info(f'{info_prefix} Getting disk information for vm {vm["name"]}.')
for vm_detail_key, vm_detail_value in _vm_details.items():
vm_detail_key_validator = re.sub(r'\d+$', '', vm_detail_key)
if vm_detail_key_validator in _vm_details_storage_allowed:
vm_statistics[vm['name']]['storage'][vm_detail_key] = {}
match = re.match(r'([^:]+):[^/]+/(.+),iothread=\d+,size=(\d+G)', _vm_details[vm_detail_key])
# Create an efficient match group and split the strings to assign them to the storage information.
if match:
_volume = match.group(1)
_disk_name = match.group(2)
_disk_size = match.group(3)
vm_statistics[vm['name']]['storage'][vm_detail_key]['name'] = _disk_name
vm_statistics[vm['name']]['storage'][vm_detail_key]['device_name'] = vm_detail_key
vm_statistics[vm['name']]['storage'][vm_detail_key]['volume'] = _volume
vm_statistics[vm['name']]['storage'][vm_detail_key]['storage_parent'] = _volume
vm_statistics[vm['name']]['storage'][vm_detail_key]['storage_rebalance'] = _volume
vm_statistics[vm['name']]['storage'][vm_detail_key]['size'] = _disk_size[:-1]
logging.info(f'{info_prefix} Added disk for {vm["name"]}: Name {_disk_name} on volume {_volume} with size {_disk_size}.')
else:
logging.info(f'{info_prefix} No (or unsupported) disk(s) for {vm["name"]} found.')
logging.info(f'{info_prefix} Added vm {vm["name"]}.')
# Add all containers if type is ct or all.
@@ -354,11 +552,38 @@ def get_vm_statistics(api_object, ignore_vms, balancing_type):
vm_statistics[vm['name']]['disk_used'] = vm['disk']
vm_statistics[vm['name']]['vmid'] = vm['vmid']
vm_statistics[vm['name']]['node_parent'] = node['node']
# Rebalancing node will be overwritten after calculations.
# If the vm stays on the node, it will be removed at a
# later time.
vm_statistics[vm['name']]['node_rebalance'] = node['node']
vm_statistics[vm['name']]['storage'] = {}
vm_statistics[vm['name']]['type'] = 'ct'
# Get disk details of the related object.
_vm_details = api_object.nodes(node['node']).lxc(vm['vmid']).config.get()
logging.info(f'{info_prefix} Getting disk information for vm {vm["name"]}.')
for vm_detail_key, vm_detail_value in _vm_details.items():
vm_detail_key_validator = re.sub(r'\d+$', '', vm_detail_key)
if vm_detail_key_validator in _vm_details_storage_allowed:
vm_statistics[vm['name']]['storage'][vm_detail_key] = {}
match = re.match(r'(?P<volume>[^:]+):(?P<disk_name>[^,]+),size=(?P<disk_size>\S+)', _vm_details[vm_detail_key])
# Create an efficient match group and split the strings to assign them to the storage information.
if match:
_volume = match.group(1)
_disk_name = match.group(2)
_disk_size = match.group(3)
vm_statistics[vm['name']]['storage'][vm_detail_key]['name'] = _disk_name
vm_statistics[vm['name']]['storage'][vm_detail_key]['device_name'] = vm_detail_key
vm_statistics[vm['name']]['storage'][vm_detail_key]['volume'] = _volume
vm_statistics[vm['name']]['storage'][vm_detail_key]['storage_parent'] = _volume
vm_statistics[vm['name']]['storage'][vm_detail_key]['storage_rebalance'] = _volume
vm_statistics[vm['name']]['storage'][vm_detail_key]['size'] = _disk_size[:-1]
logging.info(f'{info_prefix} Added disk for {vm["name"]}: Name {_disk_name} on volume {_volume} with size {_disk_size}.')
else:
logging.info(f'{info_prefix} No disks for {vm["name"]} found.')
logging.info(f'{info_prefix} Added vm {vm["name"]}.')
logging.info(f'{info_prefix} Created VM statistics.')
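The two regular expressions above parse Proxmox disk definition strings of the form `volume:path,options`. With illustrative sample strings (the exact option layout is an assumption; the VM pattern requires both a path component and an `iothread` option, so real configs without them will not match):

```python
import re

# Sample QEMU disk line: the VM pattern needs a '/' in the path and an iothread option.
vm_disk = 'local-lvm:101/vm-101-disk-0.qcow2,iothread=1,size=32G'
vm_match = re.match(r'([^:]+):[^/]+/(.+),iothread=\d+,size=(\d+G)', vm_disk)
print(vm_match.groups())  # ('local-lvm', 'vm-101-disk-0.qcow2', '32G')

# Sample LXC rootfs line: the CT pattern uses named groups and tolerates any size suffix.
ct_disk = 'local-lvm:vm-200-disk-0,size=8G'
ct_match = re.match(r'(?P<volume>[^:]+):(?P<disk_name>[^,]+),size=(?P<disk_size>\S+)', ct_disk)
print(ct_match.group('volume'), ct_match.group('disk_size'))  # local-lvm 8G
```

This also shows why the code logs "No (or unsupported) disk(s)" on a failed match: any disk line deviating from these shapes falls through to the `else` branch.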
@@ -392,6 +617,57 @@ def update_node_statistics(node_statistics, vm_statistics):
return node_statistics
def get_storage_statistics(api_object):
""" Get statistics of all storage in the cluster. """
info_prefix = 'Info: [storage-statistics]:'
storage_statistics = {}
for node in api_object.nodes.get():
for storage in api_object.nodes(node['node']).storage.get():
# Only add enabled and active storage repositories that might be suitable for further
# storage balancing.
if storage['enabled'] and storage['active'] and storage['shared']:
storage_statistics[storage['storage']] = {}
storage_statistics[storage['storage']]['name'] = storage['storage']
storage_statistics[storage['storage']]['total'] = storage['total']
storage_statistics[storage['storage']]['used'] = storage['used']
storage_statistics[storage['storage']]['used_percent'] = storage['used'] / storage['total'] * 100
storage_statistics[storage['storage']]['used_percent_last_run'] = 0
storage_statistics[storage['storage']]['free'] = storage['total'] - storage['used']
storage_statistics[storage['storage']]['free_percent'] = storage_statistics[storage['storage']]['free'] / storage['total'] * 100
storage_statistics[storage['storage']]['used_fraction'] = storage['used_fraction']
storage_statistics[storage['storage']]['type'] = storage['type']
storage_statistics[storage['storage']]['content'] = storage['content']
storage_statistics[storage['storage']]['usage_type'] = ''
# Split the Proxmox returned values to a list and validate the supported
# types of the underlying storage for further migrations.
storage_content_list = storage['content'].split(',')
usage_ct = False
usage_vm = False
if 'rootdir' in storage_content_list:
usage_ct = True
storage_statistics[storage['storage']]['usage_type'] = 'ct'
logging.info(f'{info_prefix} Storage {storage["storage"]} supports CTs.')
if 'images' in storage_content_list:
usage_vm = True
storage_statistics[storage['storage']]['usage_type'] = 'vm'
logging.info(f'{info_prefix} Storage {storage["storage"]} supports VMs.')
if usage_ct and usage_vm:
storage_statistics[storage['storage']]['usage_type'] = 'all'
logging.info(f'{info_prefix} Updating storage {storage["storage"]} to support CTs and VMs.')
logging.info(f'{info_prefix} Added storage {storage["storage"]}.')
logging.info(f'{info_prefix} Created storage statistics.')
return storage_statistics
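The content-type handling above reduces to a small classifier: `rootdir` marks CT support, `images` marks VM support, and both together yield `all`. As a standalone sketch (the helper name is illustrative):

```python
def classify_storage_usage(content):
    """Map a Proxmox storage 'content' string to the usage type used for balancing."""
    types = content.split(',')
    supports_ct = 'rootdir' in types  # container root filesystems
    supports_vm = 'images' in types   # VM disk images
    if supports_ct and supports_vm:
        return 'all'
    if supports_ct:
        return 'ct'
    if supports_vm:
        return 'vm'
    return ''

print(classify_storage_usage('images,rootdir,iso'))  # all
```

Storage repositories whose content list holds neither keyword (e.g. ISO- or backup-only stores) end up with an empty usage type and are never considered as migration targets.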
def __validate_ignore_vm_wildcard(ignore_vms):
""" Validate if a wildcard is used for ignored VMs. """
if '*' in ignore_vms:
@@ -445,9 +721,9 @@ def __get_proxlb_groups(vm_tags):
return group_include, group_exclude, vm_ignore
def balancing_vm_calculations(balancing_method, balancing_mode, balancing_mode_option, node_statistics, vm_statistics, balanciness, app_args, rebalance, processed_vms):
""" Calculate re-balancing of VMs on present nodes across the cluster. """
info_prefix = 'Info: [rebalancing-vm-calculator]:'
# Validate for a supported balancing method, mode and if rebalancing is required.
__validate_balancing_method(balancing_method)
@@ -461,21 +737,25 @@ def balancing_calculations(balancing_method, balancing_mode, balancing_mode_opti
resources_node_most_free = __get_most_free_resources_node(balancing_method, balancing_mode, balancing_mode_option, node_statistics)
# Update resource statistics for VMs and nodes.
node_statistics, vm_statistics = __update_vm_resource_statistics(resources_vm_most_used, resources_node_most_free,
vm_statistics, node_statistics, balancing_method, balancing_mode)
# Start recursion until we do not have any needs to rebalance anymore.
balancing_vm_calculations(balancing_method, balancing_mode, balancing_mode_option, node_statistics, vm_statistics, balanciness, app_args, rebalance, processed_vms)
# If only best node argument set we simply return the next best node for VM
# and CT placement on the CLI and stop ProxLB.
if app_args.best_node:
logging.info(f'{info_prefix} Only best next node for new VM & CT placement requested.')
best_next_node = __get_most_free_resources_node(balancing_method, balancing_mode, balancing_mode_option, node_statistics)
print(best_next_node[0])
logging.info(f'{info_prefix} Best next node for VM & CT placement: {best_next_node[0]}')
sys.exit(0)
# Honour groupings for include and exclude groups for rebalancing VMs.
node_statistics, vm_statistics = __get_vm_tags_include_groups(vm_statistics, node_statistics, balancing_method, balancing_mode)
node_statistics, vm_statistics = __get_vm_tags_exclude_groups(vm_statistics, node_statistics, balancing_method, balancing_mode)
# Remove VMs that are not being relocated.
vms_to_remove = [vm_name for vm_name, vm_info in vm_statistics.items() if 'node_rebalance' in vm_info and vm_info['node_rebalance'] == vm_info.get('node_parent')]
for vm_name in vms_to_remove:
del vm_statistics[vm_name]
logging.info(f'{info_prefix} Balancing calculations done.')
return node_statistics, vm_statistics
@@ -483,7 +763,7 @@ def balancing_calculations(balancing_method, balancing_mode, balancing_mode_opti
def __validate_balancing_method(balancing_method):
""" Validate for valid and supported balancing method. """
error_prefix = 'Error: [balancing-method-validation]:'
info_prefix = 'Info: [balancing-method-validation]:'
if balancing_method not in ['memory', 'disk', 'cpu']:
logging.error(f'{error_prefix} Invalid balancing method: {balancing_method}')
@@ -495,7 +775,7 @@ def __validate_balancing_method(balancing_method):
def __validate_balancing_mode(balancing_mode):
""" Validate for valid and supported balancing mode. """
error_prefix = 'Error: [balancing-mode-validation]:'
info_prefix = 'Info: [balancing-mode-validation]:'
if balancing_mode not in ['used', 'assigned']:
logging.error(f'{error_prefix} Invalid balancing mode: {balancing_mode}')
@@ -593,7 +873,7 @@ def __get_most_free_resources_node(balancing_method, balancing_mode, balancing_m
return node
def __update_vm_resource_statistics(resource_highest_used_resources_vm, resource_highest_free_resources_node, vm_statistics, node_statistics, balancing_method, balancing_mode):
""" Update VM and node resource statistics. """
info_prefix = 'Info: [rebalancing-resource-statistics-update]:'
@@ -658,7 +938,7 @@ def __get_vm_tags_include_groups(vm_statistics, node_statistics, balancing_metho
vm_node_rebalance = vm_statistics[vm_name]['node_rebalance']
else:
_mocked_vm_object = (vm_name, vm_statistics[vm_name])
node_statistics, vm_statistics = __update_vm_resource_statistics(_mocked_vm_object, [vm_node_rebalance], vm_statistics, node_statistics, balancing_method, balancing_mode)
processed_vm.append(vm_name)
return node_statistics, vm_statistics
@@ -697,47 +977,122 @@ def __get_vm_tags_exclude_groups(vm_statistics, node_statistics, balancing_metho
random_node = random.choice(list(node_statistics.keys()))
else:
_mocked_vm_object = (vm_name, vm_statistics[vm_name])
node_statistics, vm_statistics = __update_vm_resource_statistics(_mocked_vm_object, [random_node], vm_statistics, node_statistics, balancing_method, balancing_mode)
processed_vm.append(vm_name)
return node_statistics, vm_statistics
def __wait_job_finalized(api_object, node_name, job_id, counter):
""" Wait for a job to be finalized. """
error_prefix = 'Error: [job-status-getter]:'
info_prefix = 'Info: [job-status-getter]:'
logging.info(f'{info_prefix} Getting job status for job {job_id}.')
task = api_object.nodes(node_name).tasks(job_id).status().get()
logging.info(f'{info_prefix} {task}')
if task['status'] == 'running':
logging.info(f'{info_prefix} Validating job {job_id} for the {counter} run.')
# Do not run for infinity this recursion and fail when reaching the limit.
if counter == 300:
logging.critical(f'{error_prefix} The job {job_id} on node {node_name} did not finish in time for migration.')
sys.exit(2)
time.sleep(5)
counter = counter + 1
logging.info(f'{info_prefix} Revalidating job {job_id} in the next run.')
__wait_job_finalized(api_object, node_name, job_id, counter)
logging.info(f'{info_prefix} Job {job_id} for migration from {node_name} terminated successfully.')
def __run_vm_rebalancing(api_object, _vm_vm_statistics, app_args, parallel_migrations):
""" Run & execute the VM rebalancing via API. """
error_prefix = 'Error: [vm-rebalancing-executor]:'
info_prefix = 'Info: [vm-rebalancing-executor]:'
# Remove VMs/CTs that do not have a new node location.
vms_to_remove = [vm_name for vm_name, vm_info in _vm_vm_statistics.items() if 'node_rebalance' in vm_info and vm_info['node_rebalance'] == vm_info.get('node_parent')]
for vm_name in vms_to_remove:
del _vm_vm_statistics[vm_name]
if len(_vm_vm_statistics) > 0 and not app_args.dry_run:
for vm, value in _vm_vm_statistics.items():
try:
# Migrate type VM (live migration).
if value['type'] == 'vm':
logging.info(f'{info_prefix} Rebalancing VM {vm} from node {value["node_parent"]} to node {value["node_rebalance"]}.')
job_id = api_object.nodes(value['node_parent']).qemu(value['vmid']).migrate().post(target=value['node_rebalance'],online=1)
# Migrate type CT (requires restart of container).
if value['type'] == 'ct':
logging.info(f'{info_prefix} Rebalancing CT {vm} from node {value["node_parent"]} to node {value["node_rebalance"]}.')
job_id = api_object.nodes(value['node_parent']).lxc(value['vmid']).migrate().post(target=value['node_rebalance'],restart=1)
except proxmoxer.core.ResourceException as error_resource:
logging.critical(f'{error_prefix} {error_resource}')
# Wait for migration to be finished unless running parallel migrations.
if not bool(int(parallel_migrations)):
logging.info(f'{info_prefix} Rebalancing will be performed sequentially.')
__wait_job_finalized(api_object, value['node_parent'], job_id, counter=1)
else:
logging.info(f'{info_prefix} Rebalancing will be performed in parallel.')
else:
logging.info(f'{info_prefix} No rebalancing needed.')
return _vm_vm_statistics
def __run_storage_rebalancing(api_object, _storage_vm_statistics, app_args, parallel_migrations):
""" Run & execute the storage rebalancing via API. """
error_prefix = 'Error: [storage-rebalancing-executor]:'
info_prefix = 'Info: [storage-rebalancing-executor]:'
# Remove VMs/CTs that do not have a new storage location.
vms_to_remove = [vm_name for vm_name, vm_info in _storage_vm_statistics.items() if all(storage.get('storage_rebalance') == storage.get('storage_parent') for storage in vm_info.get('storage', {}).values())]
for vm_name in vms_to_remove:
del _storage_vm_statistics[vm_name]
if len(_storage_vm_statistics) > 0 and not app_args.dry_run:
for vm, value in _storage_vm_statistics.items():
for disk, disk_info in value['storage'].items():
if disk_info.get('storage_rebalance', None) is not None:
try:
# Migrate type VM (live migration).
logging.info(f'{info_prefix} Rebalancing disk {disk} of VM {vm} from storage {disk_info.get("storage_parent", None)} to {disk_info.get("storage_rebalance", None)}.')
job_id = api_object.nodes(value['node_parent']).qemu(value['vmid']).move_disk().post(disk=disk,storage=disk_info.get('storage_rebalance', None), delete=1)
except proxmoxer.core.ResourceException as error_resource:
logging.critical(f'{error_prefix} {error_resource}')
# Wait for migration to be finished unless running parallel migrations.
if not bool(int(parallel_migrations)):
logging.info(f'{info_prefix} Rebalancing will be performed sequentially.')
__wait_job_finalized(api_object, value['node_parent'], job_id, counter=1)
else:
logging.info(f'{info_prefix} Rebalancing will be performed in parallel.')
else:
logging.info(f'{info_prefix} No rebalancing needed.')
return _storage_vm_statistics
def __create_json_output(vm_statistics, app_args):
""" Create a machine parsable json output of VM rebalance statitics. """
info_prefix = 'Info: [json-output-generator]:'
if app_args.json:
logging.info(f'{info_prefix} Printing json output of VM statistics.')
print(json.dumps(vm_statistics))
def __create_cli_output(vm_statistics, app_args):
""" Create output for CLI when running in dry-run mode. """
info_prefix_dry_run = 'Info: [cli-output-generator-dry-run]:'
info_prefix_run = 'Info: [cli-output-generator]:'
@@ -750,11 +1105,12 @@ def __create_cli_output(vm_statistics_rebalanced, app_args):
info_prefix = info_prefix_run
logging.info(f'{info_prefix} Start rebalancing vms to their new nodes.')
vm_to_node_list.append(['VM', 'Current Node', 'Rebalanced Node', 'Current Storage', 'Rebalanced Storage', 'VM Type'])
for vm_name, vm_values in vm_statistics.items():
for disk, disk_values in vm_values['storage'].items():
vm_to_node_list.append([vm_name, vm_values['node_parent'], vm_values['node_rebalance'], f'{disk_values.get("storage_parent", "N/A")} ({disk_values.get("device_name", "N/A")})', f'{disk_values.get("storage_rebalance", "N/A")} ({disk_values.get("device_name", "N/A")})', vm_values['type']])
if len(vm_statistics) > 0:
logging.info(f'{info_prefix} Printing cli output of VM rebalancing.')
__print_table_cli(vm_to_node_list, app_args.dry_run)
else:
@@ -784,15 +1140,201 @@ def __print_table_cli(table, dry_run=False):
logging.info(f'{info_prefix} {row_format.format(*row)}')
def run_rebalancing(api_object, vm_statistics, app_args, parallel_migrations, balancing_type):
""" Run rebalancing of vms to new nodes in cluster. """
_vm_vm_statistics = {}
_storage_vm_statistics = {}
if balancing_type == 'vm':
_vm_vm_statistics = copy.deepcopy(vm_statistics)
_vm_vm_statistics = __run_vm_rebalancing(api_object, _vm_vm_statistics, app_args, parallel_migrations)
return _vm_vm_statistics
if balancing_type == 'storage':
_storage_vm_statistics = copy.deepcopy(vm_statistics)
_storage_vm_statistics = __run_storage_rebalancing(api_object, _storage_vm_statistics, app_args, parallel_migrations)
return _storage_vm_statistics
def run_output_rebalancing(app_args, vm_output_statistics, storage_output_statistics):
""" Generate output of rebalanced resources. """
output_statistics = {**vm_output_statistics, **storage_output_statistics}
__create_json_output(output_statistics, app_args)
__create_cli_output(output_statistics, app_args)
def balancing_storage_calculations(storage_balancing_method, storage_statistics, vm_statistics, balanciness, rebalance, processed_vms):
""" Calculate re-balancing of storage on present datastores across the cluster. """
info_prefix = 'Info: [storage-rebalancing-calculator]:'
# Validate the VM statistics and whether further storage rebalancing is required.
__validate_vm_statistics(vm_statistics)
rebalance = __validate_storage_balanciness(balanciness, storage_balancing_method, storage_statistics)
if rebalance:
vm_name, vm_disk_device = __get_most_used_resources_vm_storage(vm_statistics)
if vm_name not in processed_vms:
processed_vms.append(vm_name)
resources_storage_most_free = __get_most_free_storage(storage_balancing_method, storage_statistics)
# Update resource statistics for VMs and storage.
storage_statistics, vm_statistics = __update_resource_storage_statistics(storage_statistics, resources_storage_most_free, vm_statistics, vm_name, vm_disk_device)
# Start recursion until we do not have any needs to rebalance anymore.
balancing_storage_calculations(storage_balancing_method, storage_statistics, vm_statistics, balanciness, rebalance, processed_vms)
logging.info(f'{info_prefix} Balancing calculations done.')
return storage_statistics, vm_statistics
def __validate_storage_balanciness(balanciness, storage_balancing_method, storage_statistics):
""" Validate for balanciness of storage to ensure further rebalancing is needed. """
info_prefix = 'Info: [storage-balanciness-validation]:'
error_prefix = 'Error: [storage-balanciness-validation]:'
storage_resource_percent_list = []
storage_assigned_percent_match = []
# Validate for an allowed balancing method and define the storage resource selector.
if storage_balancing_method == 'disk_space':
logging.info(f'{info_prefix} Getting most free storage volume by disk size.')
storage_resource_selector = 'used'
elif storage_balancing_method == 'disk_io':
logging.error(f'{error_prefix} Getting most free storage volume by disk IO is not yet supported.')
sys.exit(2)
else:
logging.error(f'{error_prefix} Unsupported storage balancing method: {storage_balancing_method}.')
sys.exit(2)
# Obtain the metrics
for storage_name, storage_info in storage_statistics.items():
logging.info(f'{info_prefix} Validating balanciness of storage {storage_name} using method: {storage_balancing_method}.')
# Save information of storage volumes from the current run to compare them in the next recursion.
if storage_statistics[storage_name][f'{storage_resource_selector}_percent_last_run'] == storage_statistics[storage_name][f'{storage_resource_selector}_percent']:
storage_statistics[storage_name][f'{storage_resource_selector}_percent_match'] = True
else:
storage_statistics[storage_name][f'{storage_resource_selector}_percent_match'] = False
# Update value to the current value of the recursion run.
storage_statistics[storage_name][f'{storage_resource_selector}_percent_last_run'] = storage_statistics[storage_name][f'{storage_resource_selector}_percent']
# If all storage resources are unchanged, the recursion can be left.
for key, value in storage_statistics.items():
storage_assigned_percent_match.append(value.get(f'{storage_resource_selector}_percent_match', False))
if False not in storage_assigned_percent_match:
return False
# Add storage information to the resource list.
storage_resource_percent_list.append(int(storage_info[f'{storage_resource_selector}_percent']))
logging.info(f'{info_prefix} Storage: {storage_name} with values: {storage_info}')
# Create a sorted list of the delta + balanciness between the storage resources.
storage_resource_percent_list_sorted = sorted(storage_resource_percent_list)
storage_lowest_percent = storage_resource_percent_list_sorted[0]
storage_highest_percent = storage_resource_percent_list_sorted[-1]
# Validate if the recursion should be proceeded for further rebalancing.
if (int(storage_lowest_percent) + int(balanciness)) < int(storage_highest_percent):
logging.info(f'{info_prefix} Rebalancing for type "{storage_resource_selector}" of storage is needed. Highest usage: {int(storage_highest_percent)}% | Lowest usage: {int(storage_lowest_percent)}%.')
return True
else:
logging.info(f'{info_prefix} Rebalancing for type "{storage_resource_selector}" of storage is not needed. Highest usage: {int(storage_highest_percent)}% | Lowest usage: {int(storage_lowest_percent)}%.')
return False
def __get_most_used_resources_vm_storage(vm_statistics):
""" Get and return the most used disk of a VM by storage. """
info_prefix = 'Info: [get-most-used-disks-resources-vm]:'
# Get the biggest storage of a VM/CT. A VM/CT can hold multiple disks. Therefore, we need to iterate
# over all assigned disks to get the biggest one.
vm_object = sorted(
vm_statistics.items(),
key=lambda x: max(
(size_in_bytes(storage['size']) for storage in x[1].get('storage', {}).values() if 'size' in storage),
default=0
),
reverse=True
)
vm_object = vm_object[0]
vm_name = vm_object[0]
vm_disk_device = max(vm_object[1]['storage'], key=lambda x: int(vm_object[1]['storage'][x]['size']))
logging.info(f'{info_prefix} Got most used VM: {vm_name} with storage device: {vm_disk_device}.')
return vm_name, vm_disk_device
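Selecting the candidate works by ranking VMs on their single largest disk and then picking that disk's device key. A reduced sketch with a mock `vm_statistics` shape (sizes as unit-stripped GB strings, matching how the collector stores them; the simplification to `int()` instead of a byte conversion is an assumption that holds for that stored form):

```python
def pick_vm_and_disk(vm_statistics):
    """Return the VM owning the biggest single disk and that disk's device key."""
    vm_name, vm_info = max(
        vm_statistics.items(),
        # Rank each VM by its largest disk; VMs without disks rank as 0.
        key=lambda item: max(
            (int(disk['size']) for disk in item[1].get('storage', {}).values()),
            default=0,
        ),
    )
    disk_device = max(vm_info['storage'], key=lambda dev: int(vm_info['storage'][dev]['size']))
    return vm_name, disk_device

mock = {
    'web01': {'storage': {'scsi0': {'size': '32'}}},
    'db01': {'storage': {'scsi0': {'size': '16'}, 'scsi1': {'size': '120'}}},
}
print(pick_vm_and_disk(mock))  # ('db01', 'scsi1')
```

Moving the single largest disk first gives the greatest reduction in spread per migration, which is why the sort runs in descending order.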
def __get_most_free_storage(storage_balancing_method, storage_statistics):
""" Get the storage with the most free space or IO, depending on the balancing mode. """
info_prefix = 'Info: [get-most-free-storage]:'
error_prefix = 'Error: [get-most-free-storage]:'
storage_volume = None
logging.info(f'{info_prefix} Starting to evaluate the most free storage volume.')
if storage_balancing_method == 'disk_space':
logging.info(f'{info_prefix} Getting most free storage volume by disk space.')
storage_volume = max(storage_statistics, key=lambda x: storage_statistics[x]['free_percent'])
if storage_balancing_method == 'disk_io':
logging.info(f'{info_prefix} Getting most free storage volume by disk IO.')
logging.error(f'{error_prefix} Getting most free storage volume by disk IO is not yet supported.')
sys.exit(2)
return storage_volume
def __update_resource_storage_statistics(storage_statistics, resources_storage_most_free, vm_statistics, vm_name, vm_disk_device):
""" Update VM and storage resource statistics. """
info_prefix = 'Info: [rebalancing-storage-resource-statistics-update]:'
current_storage = vm_statistics[vm_name]['storage'][vm_disk_device]['storage_parent']
current_storage_size = storage_statistics[current_storage]['free'] / (1024 ** 3)
rebalance_storage = resources_storage_most_free
rebalance_storage_size = storage_statistics[rebalance_storage]['free'] / (1024 ** 3)
vm_storage_size = vm_statistics[vm_name]['storage'][vm_disk_device]['size']
vm_storage_size_bytes = int(vm_storage_size) * 1024**3
# Assign new storage device to vm
logging.info(f'{info_prefix} Validating VM {vm_name} for potential storage rebalancing.')
if vm_statistics[vm_name]['storage'][vm_disk_device]['storage_rebalance'] == vm_statistics[vm_name]['storage'][vm_disk_device]['storage_parent']:
logging.info(f'{info_prefix} Setting VM {vm_name} from {current_storage} to {rebalance_storage} storage.')
vm_statistics[vm_name]['storage'][vm_disk_device]['storage_rebalance'] = resources_storage_most_free
else:
logging.info(f'{info_prefix} VM {vm_name} disk {vm_disk_device} already has a rebalance target set. Skipping reassignment.')
# Recalculate values for storage
## Add freed resources to old parent storage device
storage_statistics[current_storage]['used'] = storage_statistics[current_storage]['used'] - vm_storage_size_bytes
storage_statistics[current_storage]['free'] = storage_statistics[current_storage]['free'] + vm_storage_size_bytes
storage_statistics[current_storage]['free_percent'] = (storage_statistics[current_storage]['free'] / storage_statistics[current_storage]['total']) * 100
storage_statistics[current_storage]['used_percent'] = (storage_statistics[current_storage]['used'] / storage_statistics[current_storage]['total']) * 100
logging.info(f'{info_prefix} Adding free space of {vm_storage_size}G to old storage with {current_storage_size}G. [free: {int(current_storage_size) + int(vm_storage_size)}G | {storage_statistics[current_storage]["free_percent"]}%]')
## Deduct the newly allocated space from the new rebalance storage device
storage_statistics[rebalance_storage]['used'] = storage_statistics[rebalance_storage]['used'] + vm_storage_size_bytes
storage_statistics[rebalance_storage]['free'] = storage_statistics[rebalance_storage]['free'] - vm_storage_size_bytes
storage_statistics[rebalance_storage]['free_percent'] = (storage_statistics[rebalance_storage]['free'] / storage_statistics[rebalance_storage]['total']) * 100
storage_statistics[rebalance_storage]['used_percent'] = (storage_statistics[rebalance_storage]['used'] / storage_statistics[rebalance_storage]['total']) * 100
logging.info(f'{info_prefix} Adding used space of {vm_storage_size}G to new storage with {rebalance_storage_size}G. [free: {int(rebalance_storage_size) - int(vm_storage_size)}G | {storage_statistics[rebalance_storage]["free_percent"]}%]')
logging.info(f'{info_prefix} Updated VM and storage statistics.')
return storage_statistics, vm_statistics
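The bookkeeping above is symmetric: the moved disk's size is returned to the old parent storage and charged against the rebalance target, then both percentage fields are recomputed. A self-contained sketch of that arithmetic (the `move_disk` helper is illustrative, not part of ProxLB):

```python
GiB = 1024 ** 3

def move_disk(storage_statistics, source, target, size_gib):
    """Hypothetical helper: replay the statistics update above by moving
    size_gib of allocation from source to target and recomputing the
    derived percentages."""
    size_bytes = size_gib * GiB
    for name, sign in ((source, -1), (target, 1)):
        stats = storage_statistics[name]
        stats['used'] += sign * size_bytes
        stats['free'] -= sign * size_bytes
        stats['free_percent'] = stats['free'] / stats['total'] * 100
        stats['used_percent'] = stats['used'] / stats['total'] * 100
    return storage_statistics

# Illustrative values: moving a 10G disk off a nearly full volume.
stats = {
    'old-storage': {'total': 100 * GiB, 'used': 80 * GiB, 'free': 20 * GiB},
    'new-storage': {'total': 100 * GiB, 'used': 20 * GiB, 'free': 80 * GiB},
}
stats = move_disk(stats, 'old-storage', 'new-storage', 10)
print(stats['old-storage']['free_percent'])  # 30.0
print(stats['new-storage']['used_percent'])  # 30.0
```

Updating the in-memory statistics immediately matters because later placement decisions in the same run must see the pending migration, not the stale API numbers.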
def size_in_bytes(size_str):
""" Convert a size string with an optional unit suffix (e.g. '32G') to bytes. """
size_multipliers = {'K': 1024, 'M': 1024**2, 'G': 1024**3, 'T': 1024**4}
size_unit = size_str[-1].upper()
# Strip the unit suffix before converting; float('32G') would raise a ValueError.
if size_unit in size_multipliers:
return float(size_str[:-1]) * size_multipliers[size_unit]
return float(size_str)
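The conversion can be exercised in isolation. Restated here as a self-contained variant (the unit suffix must be stripped before the `float` conversion, otherwise inputs like `'32G'` fail to parse):

```python
def size_in_bytes(size_str):
    # Self-contained restatement of the helper above: strip the unit
    # suffix before converting, so inputs like '32G' parse correctly.
    size_multipliers = {'K': 1024, 'M': 1024**2, 'G': 1024**3, 'T': 1024**4}
    size_unit = size_str[-1].upper()
    if size_unit in size_multipliers:
        return float(size_str[:-1]) * size_multipliers[size_unit]
    return float(size_str)  # no recognized suffix: treat as plain bytes

print(size_in_bytes('512M'))  # 536870912.0
print(size_in_bytes('2T'))    # 2199023255552.0
print(size_in_bytes('100'))   # 100.0
```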
def main():
""" Run ProxLB for balancing VM workloads across a Proxmox cluster. """
vm_output_statistics = {}
storage_output_statistics = {}
# Initialize ProxLB.
initialize_logger('CRITICAL')
app_args = initialize_args()
@@ -800,33 +1342,50 @@ def main():
pre_validations(config_path)
# Parse global config.
proxmox_api_host, proxmox_api_user, proxmox_api_pass, proxmox_api_ssl_v, balancing_method, balancing_mode, balancing_mode_option, balancing_type, \
balanciness, ignore_nodes, ignore_vms, daemon, schedule, log_verbosity = initialize_config_options(config_path)
proxlb_config = initialize_config_options(config_path)
# Overwrite logging handler with user defined log verbosity.
initialize_logger(log_verbosity, update_log_verbosity=True)
initialize_logger(proxlb_config['log_verbosity'], update_log_verbosity=True)
while True:
# API Authentication.
api_object = api_connect(proxmox_api_host, proxmox_api_user, proxmox_api_pass, proxmox_api_ssl_v)
api_object = api_connect(proxlb_config['proxmox_api_host'], proxlb_config['proxmox_api_user'], proxlb_config['proxmox_api_pass'], proxlb_config['proxmox_api_ssl_v'])
# Get master node of cluster and ensure that ProxLB is only performed on the
# cluster master node to avoid ongoing rebalancing.
cluster_master, master_only = execute_rebalancing_only_by_master(api_object, proxlb_config['master_only'])
# Validate daemon service and skip following tasks when not being the cluster master.
if not cluster_master and master_only:
validate_daemon(proxlb_config['daemon'], proxlb_config['schedule'])
continue
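The master-only gate above boils down to one condition: balancing is skipped only when master-only mode is enabled and this node is not the cluster master. A condensed sketch (the helper name is hypothetical; ProxLB inlines this in the loop):

```python
def should_run_balancing(cluster_master, master_only):
    # Hypothetical condensation of the gate above: proceed unless master-only
    # mode is on and this node is not the cluster master.
    return cluster_master or not master_only

print(should_run_balancing(False, True))   # False -> sleep until next schedule
print(should_run_balancing(True, True))    # True  -> master runs balancing
print(should_run_balancing(False, False))  # True  -> every node may balance
```

Gating on the master avoids several cluster nodes computing and executing competing migration plans at the same time.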
# Get metric & statistics for vms and nodes.
node_statistics = get_node_statistics(api_object, ignore_nodes)
vm_statistics = get_vm_statistics(api_object, ignore_vms, balancing_type)
node_statistics = update_node_statistics(node_statistics, vm_statistics)
if proxlb_config['vm_balancing_enable'] or proxlb_config['storage_balancing_enable'] or app_args.best_node:
node_statistics = get_node_statistics(api_object, proxlb_config['vm_ignore_nodes'])
vm_statistics = get_vm_statistics(api_object, proxlb_config['vm_ignore_vms'], proxlb_config['vm_balancing_type'])
node_statistics = update_node_statistics(node_statistics, vm_statistics)
storage_statistics = get_storage_statistics(api_object)
# Calculate rebalancing of vms.
node_statistics_rebalanced, vm_statistics_rebalanced = balancing_calculations(balancing_method, balancing_mode, balancing_mode_option,
node_statistics, vm_statistics, balanciness, rebalance=False, processed_vms=[])
# Execute VM/CT balancing sub-routines.
if proxlb_config['vm_balancing_enable'] or app_args.best_node:
node_statistics, vm_statistics = balancing_vm_calculations(proxlb_config['vm_balancing_method'], proxlb_config['vm_balancing_mode'], proxlb_config['vm_balancing_mode_option'], node_statistics, vm_statistics, proxlb_config['vm_balanciness'], app_args, rebalance=False, processed_vms=[])
vm_output_statistics = run_rebalancing(api_object, vm_statistics, app_args, proxlb_config['vm_parallel_migrations'], 'vm')
# Rebalance vms to new nodes within the cluster.
run_vm_rebalancing(api_object, vm_statistics_rebalanced, app_args)
# Execute storage balancing sub-routines.
if proxlb_config['storage_balancing_enable']:
storage_statistics, vm_statistics = balancing_storage_calculations(proxlb_config['storage_balancing_method'], storage_statistics, vm_statistics, proxlb_config['storage_balanciness'], rebalance=False, processed_vms=[])
storage_output_statistics = run_rebalancing(api_object, vm_statistics, app_args, proxlb_config['storage_parallel_migrations'], 'storage')
# Generate balancing output
if proxlb_config['vm_balancing_enable'] or proxlb_config['storage_balancing_enable']:
run_output_rebalancing(app_args, vm_output_statistics, storage_output_statistics)
# Validate for any errors.
post_validations()
# Validate daemon service.
validate_daemon(daemon, schedule)
validate_daemon(proxlb_config['daemon'], proxlb_config['schedule'])
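Per the fix in this changeset, the daemon timer counts `schedule` in hours. A sketch of the role `validate_daemon` plays at the bottom of the loop, assuming that behavior (the `_sleep` parameter is an assumption added here to make the sketch testable; the real function signature takes only `daemon` and `schedule`):

```python
import time

def validate_daemon(daemon, schedule_hours, _sleep=time.sleep):
    # Hypothetical sketch: in daemon mode, sleep for the configured number
    # of hours before the next balancing run; otherwise end after one run.
    if int(daemon):
        _sleep(schedule_hours * 3600)
        return True   # keep looping
    return False      # single-shot mode: leave the while loop

slept = []
validate_daemon(1, 24, _sleep=slept.append)
print(slept)  # [86400] -> 24 hours in seconds
```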
if __name__ == '__main__':
@@ -4,11 +4,18 @@ api_user: root@pam
api_pass: FooBar
verify_ssl: 1
[balancing]
enable: 1
method: memory
mode: used
ignore_nodes: dummynode01,dummynode02
ignore_vms: testvm01,testvm02
[storage_balancing]
enable: 0
[update_service]
enable: 0
[api]
enable: 0
[service]
daemon: 1
schedule: 24
log_verbosity: CRITICAL
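The config file above is standard INI and reads cleanly with Python's `configparser`, which accepts the `key: value` delimiter used here. A sketch, assuming the Proxmox connection keys live in a section whose header sits above the excerpt shown (the `[proxmox]` section name and `api_host` value are assumptions for illustration):

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[proxmox]
api_host: proxmox01.example.com
api_user: root@pam
verify_ssl: 1
[balancing]
enable: 1
method: memory
mode: used
[storage_balancing]
enable: 0
[service]
daemon: 1
schedule: 24
log_verbosity: CRITICAL
""")

# getboolean/getint handle the 0/1 flags and numeric values used above.
print(config.getboolean('balancing', 'enable'))          # True
print(config.getboolean('storage_balancing', 'enable'))  # False
print(config.getint('service', 'schedule'))              # 24
print(config.get('service', 'log_verbosity'))            # CRITICAL
```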