Compare commits

...

14 Commits

Author SHA1 Message Date
Florian Paul Azim Hoberg
c56d465f90 Add DPM (Dynamic Power Management) feature for Proxmox cluster nodes
Fixes: #141
2025-05-09 07:32:46 +02:00
Florian
1e096e1aae Merge pull request #221 from gyptazy/fix/137-systemd-unit-file
fix: Adjust the systemd unit file to run after the network target on non PVE nodes
2025-04-26 08:43:33 +02:00
gyptazy
420d669236 fix: Adjust the systemd unit file to run after the network target on non PVE nodes
Fixes: #137
2025-04-26 08:42:24 +02:00
Florian
24aa6aabc6 Merge pull request #220 from gyptazy/feature/157-add-retry-proxmox-api
feature: Add a retry mechanism when connecting to the Proxmox API
2025-04-24 13:49:55 +02:00
Florian Paul Azim Hoberg
5a9a4af532 feature: Add a retry mechanism when connecting to the Proxmox API
Fixes: #157
2025-04-24 13:29:41 +02:00
Florian
50f93e5f59 Merge pull request #219 from gyptazy/feature/218-add-1-to-1-relations-guest-hypervisor
feature: Add possibility to pin guests to a specific hypervisor node.
2025-04-24 13:01:44 +02:00
Florian Paul Azim Hoberg
33784f60b4 feature: Add possibility to pin guests to a specific hypervisor node.
Fixes: #218
2025-04-24 08:54:58 +02:00
Florian
9a261aa781 Merge pull request #213 from gyptazy/prepare/release-v1.1.2
release: Prepare release v1.1.2
2025-04-19 20:14:12 +02:00
gyptazy
366d5bc264 release: Prepare release v1.1.2
2025-04-19 20:10:49 +02:00
Florian
96ffa086b1 Merge pull request #212 from gyptazy/release/1.1.1
release: Create release 1.1.1
2025-04-19 19:45:33 +02:00
gyptazy
db005c138e release: Create release 1.1.1
Fixes: #211
2025-04-19 19:43:07 +02:00
Florian
1168f545e5 Merge pull request #210 from gyptazy/docs/209-adjust-options-in-readme
docs: * Fix the rendering of the possible values of the ProxLB option…
2025-04-19 06:50:48 +02:00
gyptazy
cc663c0518 docs: * Fix the rendering of the possible values of the ProxLB options in the README file
* Mention the privilege separation part on the token generation chapter

Fixes: #209
2025-04-19 06:49:04 +02:00
Florian
40de31bc3b Merge pull request #208 from gyptazy/techdebt/fix-code-style
tecdebt: Adjust code style.
2025-04-18 17:07:01 +02:00
23 changed files with 638 additions and 73 deletions


@@ -1 +1 @@
date: TBD
date: 2025-04-20


@@ -0,0 +1,2 @@
fixed:
- Fix systemd unit file to run after network on non PVE nodes (by @robertdahlem) [#137]


@@ -0,0 +1,2 @@
added:
- Add a configurable retry mechanism when connecting to the Proxmox API (by @gyptazy) [#157]


@@ -0,0 +1,2 @@
added:
- Add 1-to-1 relationships between guest and hypervisor node to pin a guest on a node (by @gyptazy) [#218]


@@ -0,0 +1 @@
date: TBD


@@ -0,0 +1,2 @@
added:
- Add power management feature for cluster nodes (by @gyptazy) [#141]


@@ -0,0 +1 @@
date: TBD


@@ -5,6 +5,29 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [1.1.1] - 2025-04-20
### Added
- Providing the API upstream error message when migration fails in debug mode (by @gyptazy) [#205]
### Changed
- Change the default behaviour of the daemon mode to active [#176]
- Change the default balancing mode to used instead of assigned [#180]
### Fixed
- Set cpu_used to the cpu usage, which is a percent, times the total number of cores to get a number where guest cpu_used can be added to nodes cpu_used and be meaningful (by @glitchvern) [#195]
- Fix tag evaluation for VMs being ignored for further balancing [#163]
- Honor the value when balancing should not be performed and stop balancing [#174]
- allow the use of minutes instead of hours and only accept hours or minutes in the format (by @glitchvern) [#187]
- Remove hard coded memory usage from lowest usage node and use method and mode specified in configuration instead (by @glitchvern) [#197]
- Fix the guest type relationship in the logs when a migration job failed (by @gyptazy) [#204]
- Requery a guest if that running guest reports 0 cpu usage (by @glitchvern) [#200]
- Fix Python path for Docker entrypoint (by @crandler) [#170]
- Improve logging verbosity of messages that had a wrong severity [#165]
## [1.1.0] - 2025-04-01

README.md (116 changed lines)

@@ -21,6 +21,7 @@
1. [Affinity Rules](#affinity-rules)
2. [Anti-Affinity Rules](#anti-affinity-rules)
3. [Ignore VMs](#ignore-vms)
4. [Pin VMs to Hypervisor Nodes](#pin-vms-to-hypervisor-nodes)
7. [Maintenance](#maintenance)
8. [Misc](#misc)
1. [Bugs](#bugs)
@@ -45,28 +46,29 @@ Overall, ProxLB significantly enhances resource management by intelligently dist
<img src="https://cdn.gyptazy.com/images/proxlb-rebalancing-demo.gif"/>
## Features
ProxLB's key features are by enabling automatic rebalancing of VMs and CTs across a Proxmox cluster based on memory, CPU, and local disk usage while identifying optimal nodes for automation. It supports maintenance mode, affinity rules, and seamless Proxmox API integration with ACL support, offering flexible usage as a one-time operation, a daemon, or through the Proxmox Web GUI.
ProxLB's key features include automatic rebalancing of VMs and CTs across a Proxmox cluster based on memory, CPU, and local disk usage, while identifying optimal nodes for automation. It supports maintenance mode, affinity rules, and seamless Proxmox API integration with ACL support, offering flexible usage as a one-time operation, a daemon, or through the Proxmox Web GUI. In addition, ProxLB supports enterprise-grade features such as node power management (often known as DPM), where nodes can be powered on and off on demand when workloads are higher or lower than usual. The automated security patching of nodes within the cluster (known as ASPM) can further reduce manual work for cluster admins: nodes install patches, move their guests across the cluster, reboot, and the cluster is then rebalanced.
**Features**
* Rebalance VMs/CTs in the cluster by:
* Memory
* Disk (only local storage)
* CPU
* Get best nodes for further automation
* Supported Guest Types
* VMs
* CTs
* Re-Balancing (DRS)
* Supporting VMs & CTs
* Balancing by:
* CPU
* Memory
* Disk
* Affinity / Anti-Affinity Rules
* Affinity: Groups guests together
* Anti-Affinity: Ensuring guests run on different nodes
* Best node evaluation
* Get the best node for guest placement (e.g., CI/CD)
* Maintenance Mode
* Set node(s) into maintenance
* Move all workloads to different nodes
* Affinity / Anti-Affinity Rules
* Evacuating a single or multiple nodes
* Node Power Management (DPM)
* Auto Node Security-Patch-Management (ASPM)
* Fully based on Proxmox API
* Fully integrated into the Proxmox ACL
* No SSH required
* Usage
* One-Time
* Daemon
* Proxmox Web GUI Integration
* Utilizing the Proxmox User Authentications
* Supporting API tokens
* No SSH or Agents required
* Can run everywhere
## How does it work?
ProxLB is a load-balancing system designed to optimize the distribution of virtual machines (VMs) and containers (CTs) across a cluster. It works by first gathering resource usage metrics from all nodes in the cluster through the Proxmox API. This includes detailed resource metrics for each VM and CT on every node. ProxLB then evaluates the difference between the maximum and minimum resource usage of the nodes, referred to as "Balanciness." If this difference exceeds a predefined threshold (which is configurable), the system initiates the rebalancing process.
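The "Balanciness" check described above can be sketched in a few lines of Python (the function and node names here are illustrative, not ProxLB's actual API):

```python
def needs_rebalancing(node_usage, balanciness=10):
    """Return True when the spread between the busiest and the least
    busy node exceeds the configured balanciness threshold.

    node_usage: mapping of node name -> resource usage in percent.
    balanciness: maximum allowed delta in percentage points.
    """
    if len(node_usage) < 2:
        return False
    delta = max(node_usage.values()) - min(node_usage.values())
    return delta > balanciness

# Example: a usage spread of 25 points exceeds the default threshold of 10.
cluster = {"virt01": 70, "virt02": 45, "virt03": 55}
print(needs_rebalancing(cluster))  # True
```

Only when this check passes does the rebalancing process start, which keeps migration churn low on clusters that are already evenly loaded.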
@@ -160,6 +162,7 @@ docker run -it --rm -v $(pwd)/proxlb.yaml:/etc/proxlb/proxlb.yaml proxlb
| Version | Image |
|------|:------:|
| latest | cr.gyptazy.com/proxlb/proxlb:latest |
| v1.1.1 | cr.gyptazy.com/proxlb/proxlb:v1.1.1 |
| v1.1.0 | cr.gyptazy.com/proxlb/proxlb:v1.1.0 |
| v1.0.6 | cr.gyptazy.com/proxlb/proxlb:v1.0.6 |
| v1.0.5 | cr.gyptazy.com/proxlb/proxlb:v1.0.5 |
@@ -240,29 +243,37 @@ The following options can be set in the configuration file `proxlb.yaml`:
| | pass | | FooBar | `Str` | Password for the API. (Recommended: Use API token authorization!) |
| | token_id | | proxlb | `Str` | Token ID of the user for the API. |
| | token_secret | | 430e308f-1337-1337-beef-1337beefcafe | `Str` | Secret of the token ID for the API. |
| | ssl_verification | | True | `Bool` | Validate SSL certificates (1) or ignore (0). (default: 1, type: bool) |
| | timeout | | 10 | `Int` | Timeout for the Proxmox API in sec. (default: 10) |
| | ssl_verification | | True | `Bool` | Validate SSL certificates (1) or ignore (0). [values: `1` (default), `0`] |
| | timeout | | 10 | `Int` | Timeout for the Proxmox API in sec. |
| | retries | | 1 | `Int` | How often a connection attempt to the defined API host should be performed. |
| | wait_time | | 1 | `Int` | How many seconds to wait before performing another connection attempt to the API host. |
| `proxmox_cluster` | | | | | |
| | maintenance_nodes | | ['virt66.example.com'] | `List` | A list of Proxmox nodes that are defined to be in a maintenance. (default: []) |
| | ignore_nodes | | [] | `List` | A list of Proxmox nodes that are defined to be ignored. (default: []) |
| | maintenance_nodes | | ['virt66.example.com'] | `List` | A list of Proxmox nodes that are defined to be in maintenance. |
| | ignore_nodes | | [] | `List` | A list of Proxmox nodes that are defined to be ignored. |
| | overprovisioning | | False | `Bool` | Avoids balancing when nodes would become overprovisioned. |
| `balancing` | | | | | |
| | enable | | True | `Bool` | Enables the guest balancing. (default: True)|
| | enforce_affinity | | True | `Bool` | Enforcing affinity/anti-affinity rules but balancing might become worse. (default: False) |
| | parallel | | False | `Bool` | If guests should be moved in parallel or sequentially. (default: False)|
| | live | | True | `Bool` | If guests should be moved live or shutdown. (default: True)|
| | with_local_disks | | True | `Bool` | If balancing of guests should include local disks (default: True)|
| | balance_types | | ['vm', 'ct'] | `List` | Defined the types of guests that should be honored. (default: ['vm', 'ct']) |
| | enable | | True | `Bool` | Enables the guest balancing.|
| | enforce_affinity | | True | `Bool` | Enforcing affinity/anti-affinity rules but balancing might become worse. |
| | parallel | | False | `Bool` | If guests should be moved in parallel or sequentially.|
| | live | | True | `Bool` | If guests should be moved live or shutdown.|
| | with_local_disks | | True | `Bool` | If balancing of guests should include local disks.|
| | balance_types | | ['vm', 'ct'] | `List` | Defines the types of guests that should be honored. [values: `vm`, `ct`]|
| | max_job_validation | | 1800 | `Int` | How long a job validation may take in seconds. (default: 1800) |
| | balanciness | | 10 | `Int` | The maximum delta of resource usage between node with highest and lowest usage. (default: 10) |
| | method | | memory | `Str` | The balancing method that should be used. (default: memory | choices: memory, cpu, disk)|
| | mode | | used | `Str` | The balancing mode that should be used. (default: used | choices: used, assigned)|
| | balanciness | | 10 | `Int` | The maximum delta of resource usage between the nodes with the highest and lowest usage. |
| | method | | memory | `Str` | The balancing method that should be used. [values: `memory` (default), `cpu`, `disk`]|
| | mode | | used | `Str` | The balancing mode that should be used. [values: `used` (default), `assigned`] |
| `dpm` | | | | | |
| | enable | | True | `Bool` | Enables the Dynamic Power Management functions.|
| | method | | memory | `Str` | The balancing method that should be used. [values: `memory` (default), `cpu`, `disk`]|
| | mode | | static | `Str` | The balancing mode that should be used. [values: `static` (default), `auto`] |
| | cluster_min_free_resources | | 60 | `Int` | The minimum required free resources in percent within the cluster. [values: `60`% (default)] |
| | cluster_min_nodes | | 3 | `Int` | The minimum number of nodes that should remain powered on in the cluster. [values: `3` (default)] |
| `service` | | | | | |
| | daemon | | True | `Bool` | If daemon mode should be activated (default: True)|
| | daemon | | True | `Bool` | If daemon mode should be activated. |
| | `schedule` | | | `Dict` | Schedule config block for rebalancing. |
| | | interval | 12 | `Int` | How often rebalancing should occur in daemon mode (default: 12)|
| | | format | hours | `Str` | Sets the time format. (Allowed: `minutes`, `hours` | default: `hours`)|
| | log_level | | INFO | `Str` | Defines the default log level that should be logged. (default: INFO) |
| | | interval | 12 | `Int` | How often rebalancing should occur in daemon mode.|
| | | format | hours | `Str` | Sets the time format. [values: `hours` (default), `minutes`]|
| | log_level | | INFO | `Str` | Defines the default log level that should be logged. [values: `INFO` (default), `WARNING`, `CRITICAL`, `DEBUG`] |
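The `retries` and `wait_time` options above describe a simple retry loop around the initial API connection. A minimal sketch of that behaviour, assuming a generic `connect` callable rather than ProxLB's real API client:

```python
import time

def connect_with_retries(connect, retries=1, wait_time=1):
    """Attempt a connection up to `retries` times (at least 1), sleeping
    `wait_time` seconds between attempts; re-raise on final failure."""
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return connect()
        except ConnectionError as error:
            last_error = error
            if attempt < retries:
                time.sleep(wait_time)
    raise last_error
```

With `retries: 1` (the default) a single attempt is made and failures surface immediately; raising `retries` trades startup latency for resilience against briefly unreachable hosts.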
An example of the configuration file looks like:
@@ -270,11 +281,15 @@ An example of the configuration file looks like:
proxmox_api:
hosts: ['virt01.example.com', '10.10.10.10', 'fe01::bad:code::cafe']
user: root@pam
#pass: crazyPassw0rd!
token_id: proxlb
token_secret: 430e308f-1337-1337-beef-1337beefcafe
pass: crazyPassw0rd!
# API Token method
# token_id: proxlb
# token_secret: 430e308f-1337-1337-beef-1337beefcafe
ssl_verification: True
timeout: 10
# API Connection retries
# retries: 1
# wait_time: 1
proxmox_cluster:
maintenance_nodes: ['virt66.example.com']
@@ -293,6 +308,15 @@ balancing:
method: memory
mode: used
dpm:
# DPM requires you to define the WOL (Wake-on-Lan)
# MAC address for each node in Proxmox.
enable: True
method: memory
mode: static
cluster_min_free_resources: 60
cluster_min_nodes: 1
service:
daemon: True
schedule:
@@ -343,7 +367,7 @@ As a result, ProxLB will try to place the VMs with the `plb_anti_affinity_ntp` t
**Note:** While ProxLB tries to distribute these VMs across different physical hosts within the Proxmox cluster, this may not always work. If more guests are attached to the group than there are nodes in the cluster, the remaining guests still need to run somewhere. If this case occurs, the node with the most free resources will be selected.
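The fallback described in the note can be sketched as a simple placement loop (a simplified model, not ProxLB's actual algorithm; the node metrics are illustrative):

```python
def spread_anti_affinity(guests, nodes):
    """Place each guest of an anti-affinity group on its own node;
    once every node holds one group member, overflow guests go to
    the node with the most free memory."""
    placement, used = {}, set()
    for guest in guests:
        free_nodes = [n for n in nodes if n not in used]
        if free_nodes:
            # Prefer an unused node to keep group members apart.
            target = max(free_nodes, key=lambda n: nodes[n]["memory_free"])
            used.add(target)
        else:
            # More guests than nodes: fall back to the freest node overall.
            target = max(nodes, key=lambda n: nodes[n]["memory_free"])
        placement[guest] = target
    return placement
```

Each group member first claims its own node; only overflow guests share a node, landing on whichever node currently reports the most free memory.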
### Ignore VMs / CTs
### Ignore VMs
<img align="left" src="https://cdn.gyptazy.com/images/proxlb-ignore-vm-movement.jpg"/> Guests, such as VMs or CTs, can also be completely ignored. This means they won't be affected by any migration (even when (anti-)affinity rules are enforced). To ensure a proper resource evaluation, these guests are still collected and evaluated but simply skipped for balancing actions. Another aspect is the implementation: while ProxLB's configuration file may be tightly restricted, including its file permissions, it is only read- and writeable by the Proxmox administrators. However, users and groups may want to decide on their own that their systems shouldn't be moved. Therefore, these users can simply set a specific tag on the guest object, just like with the (anti-)affinity rules.
To define a guest to be ignored from the balancing, users assign a tag with the prefix `plb_ignore_$TAG`:
@@ -357,6 +381,20 @@ As a result, ProxLB will not migrate this guest with the `plb_ignore_dev` tag to
**Note:** Ignored guests are really ignored. Even by enforcing affinity rules this guest will be ignored.
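The described skip behaviour boils down to a prefix match on the guest tags; a sketch under the assumption that guests are dicts carrying a `tags` list (illustrative structure, not ProxLB's internal model):

```python
def balancing_candidates(guests):
    """Filter out guests carrying a `plb_ignore_` tag; ignored guests
    remain in the input (so their resources stay accounted for) but
    are excluded from migration."""
    return {
        name: guest
        for name, guest in guests.items()
        if not any(tag.startswith("plb_ignore_") for tag in guest.get("tags", []))
    }

guests = {
    "dev-vm01": {"tags": ["plb_ignore_dev"]},
    "web-vm02": {"tags": []},
}
print(sorted(balancing_candidates(guests)))  # ['web-vm02']
```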
### Pin VMs to Specific Hypervisor Nodes
<img align="left" src="https://cdn.gyptazy.com/images/proxlb-tag-node-pinning.jpg"/> Guests, such as VMs or CTs, can also be pinned to specific nodes in the cluster. This can be useful when running applications with special licensing requirements that are only fulfilled on certain nodes. It can also be interesting when physical hardware is attached to a single node and not generally available within the cluster.
To pin a guest to a specific cluster node, users assign a tag with the prefix `plb_pin_$nodename` to the desired guest:
#### Example for Screenshot
```
plb_pin_node03
```
As a result, ProxLB will pin the guest `dev-vm01` to the node `virt03`.
**Note:** The node names given in the tag are validated. This means ProxLB checks whether the given node name is really part of the cluster. In case of a wrongly defined or unavailable node name, ProxLB continues with its regular processes to make sure the guest keeps running.
## Maintenance
<img src="https://cdn.gyptazy.com/images/proxlb-rebalancing-demo.gif"/>


@@ -7,6 +7,9 @@ proxmox_api:
# token_secret: 430e308f-1337-1337-beef-1337beefcafe
ssl_verification: True
timeout: 10
# API Connection retries
# retries: 1
# wait_time: 1
proxmox_cluster:
maintenance_nodes: ['virt66.example.com']
@@ -25,6 +28,13 @@ balancing:
method: memory
mode: used
dpm:
enable: True
method: memory
mode: static
cluster_min_free_resources: 60
cluster_min_nodes: 1
service:
daemon: True
schedule:

debian/changelog (vendored, 17 changed lines)

@@ -1,9 +1,24 @@
proxlb (1.1.2~b1) stable; urgency=medium
* Auto-created 1.1.2 beta 1 release.
-- Florian Paul Azim Hoberg <gyptazy@gyptazy.com> Mon, 17 Mar 2025 18:55:02 +0000
proxlb (1.1.1) stable; urgency=medium
* Fix tag evaluation for VMs being ignored for further balancing. (Closes: #163)
* Improve logging verbosity of messages that had a wrong severity. (Closes: #165)
* Providing the API upstream error message when migration fails in debug mode (Closes: #205)
* Change the default behaviour of the daemon mode to active. (Closes: #176)
* Change the default balancing mode to used instead of assigned. (Closes: #180)
* Set cpu_used to the cpu usage, which is a percent, times the total number of cores to get a number where guest cpu_used can be added to nodes cpu_used and be meaningful. (Closes: #195)
* Honor the value when balancing should not be performed and stop balancing. (Closes: #174)
* Allow the use of minutes instead of hours and only accept hours or minutes in the format. (Closes: #187)
* Remove hard coded memory usage from lowest usage node and use method and mode specified in configuration instead. (Closes: #197)
* Fix the guest type relationship in the logs when a migration job failed. (Closes: #204)
* Requery a guest if that running guest reports 0 cpu usage. (Closes: #200)
-- Florian Paul Azim Hoberg <gyptazy@gyptazy.com> Tue, 1 Apr 2025 18:55:02 +0000
-- Florian Paul Azim Hoberg <gyptazy@gyptazy.com> Sat, 20 Apr 2025 20:55:02 +0000
proxlb (1.1.0) stable; urgency=medium


@@ -11,6 +11,7 @@
2. [Anti-Affinity Rules](#anti-affinity-rules)
3. [Affinity / Anti-Affinity Enforcing](#affinity--anti-affinity-enforcing)
4. [Ignore VMs](#ignore-vms)
5. [Pin VMs to Hypervisor Nodes](#pin-vms-to-hypervisor-nodes)
2. [API Loadbalancing](#api-loadbalancing)
3. [Ignore Host-Nodes or Guests](#ignore-host-nodes-or-guests)
4. [IPv6 Support](#ipv6-support)
@@ -18,6 +19,7 @@
6. [Parallel Migrations](#parallel-migrations)
7. [Run as a Systemd-Service](#run-as-a-systemd-service)
8. [SSL Self-Signed Certificates](#ssl-self-signed-certificates)
9. [Dynamic Power Management (DPM)](#dynamic-power-management)
## Authentication / User Accounts / Permissions
### Authentication
@@ -39,10 +41,9 @@ pveum acl modify / --roles proxlb --users proxlb@pve
*Note: The user management can also be done on the WebUI without invoking the CLI.*
### Creating an API Token for a User
Create an API token for user proxlb@pve with token ID proxlb and deactivated privilege separation:
```
# Create an API token for user proxlb@pve with token ID proxlb
pveum user token add proxlb@pve proxlb
pveum user token add proxlb@pve proxlb --privsep 0
```
Afterwards, the token secret is returned. You can now add those entries to your ProxLB config. Make sure that you also keep the `user` parameter alongside the new token parameters.
@@ -125,6 +126,20 @@ As a result, ProxLB will not migrate this guest with the `plb_ignore_dev` tag to
**Note:** Ignored guests are really ignored. Even by enforcing affinity rules this guest will be ignored.
### Pin VMs to Specific Hypervisor Nodes
<img align="left" src="https://cdn.gyptazy.com/images/proxlb-tag-node-pinning.jpg"/> Guests, such as VMs or CTs, can also be pinned to specific nodes in the cluster. This can be useful when running applications with special licensing requirements that are only fulfilled on certain nodes. It can also be interesting when physical hardware is attached to a single node and not generally available within the cluster.
To pin a guest to a specific cluster node, users assign a tag with the prefix `plb_pin_$nodename` to the desired guest:
#### Example for Screenshot
```
plb_pin_node03
```
As a result, ProxLB will pin the guest `dev-vm01` to the node `virt03`.
**Note:** The node names given in the tag are validated. This means ProxLB checks whether the given node name is really part of the cluster. In case of a wrongly defined or unavailable node name, ProxLB continues with its regular processes to make sure the guest keeps running.
### API Loadbalancing
ProxLB supports API loadbalancing, where one or more host objects can be defined as a list. This ensures that you can operate ProxLB without further changes even when one or more nodes are offline or in maintenance. When defining multiple hosts, the first reachable one will be picked.
@@ -193,4 +208,34 @@ proxmox_api:
ssl_verification: False
```
*Note: Disabling SSL certificate validation is not recommended.*
*Note: Disabling SSL certificate validation is not recommended.*
### Dynamic Power Management (DPM)
<img align="left" src="https://cdn.gyptazy.com/images/proxlb-proxmox-node-wakeonlan-wol-mac-dpm.jpg"/> Configuring Dynamic Power Management (DPM) in ProxLB within a Proxmox cluster involves a few critical steps to ensure proper operation. The first consideration is that any node intended for automatic shutdown and startup must support Wake-on-LAN (WOL). This is essential because DPM relies on the ability to power nodes back on remotely. For this to work, the ProxLB instance must be able to reach the target node's MAC address directly over the network.
To make this possible, you must configure the correct MAC address for WOL within the Proxmox web interface. This is done by selecting the node, going to the “System” section, then “Options,” and finally setting the “MAC address for Wake-on-LAN.” Alternatively, this value can also be submitted using the Proxmox API. Without this MAC address in place, ProxLB will not allow the node to be shut down. This restriction is in place to prevent nodes from being turned off without a way to bring them back online, which could lead to service disruption. By ensuring that each node has a valid WOL MAC address configured, DPM can operate safely and effectively, allowing ProxLB to manage the cluster's power consumption dynamically.
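Wake-on-LAN itself is a simple UDP broadcast of a "magic packet": six 0xFF bytes followed by the target MAC address repeated sixteen times. A self-contained sketch of sending one (this is not ProxLB's own implementation):

```python
import socket

def magic_packet(mac: str) -> bytes:
    """Build a WOL magic packet: 6x 0xFF followed by the MAC 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"invalid MAC address: {mac}")
    return b"\xff" * 6 + mac_bytes * 16

def send_wol(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet on the local network (UDP port 9)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(magic_packet(mac), (broadcast, port))
```

For example, `send_wol("aa:bb:cc:dd:ee:ff")` broadcasts the packet on the local subnet; the target NIC must have WOL armed in BIOS/UEFI for it to have any effect.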
#### Requirements
Using the power management feature within clusters comes with several requirements:
* ProxLB needs to reach the node's WOL MAC address (plain network)
* WOL must be enabled on the node in general (BIOS/UEFI)
* The related WOL network interface must be defined
* The related WOL network interface MAC address must be defined in Proxmox for the node
#### Options
| Section | Option | Sub Option | Example | Type | Description |
|---------|:------:|:----------:|:-------:|:----:|:-----------:|
| `dpm` | | | | | |
| | enable | | True | `Bool` | Enables the Dynamic Power Management functions.|
| | method | | memory | `Str` | The balancing method that should be used. [values: `memory` (default), `cpu`, `disk`]|
| | mode | | static | `Str` | The balancing mode that should be used. [values: `static` (default), `auto`] |
| | cluster_min_free_resources | | 60 | `Int` | The minimum required free resources in percent within the cluster. [values: `60`% (default)] |
| | cluster_min_nodes | | 3 | `Int` | The minimum number of nodes that should remain powered on in the cluster. [values: `3` (default)] |
#### DPM Modes
##### Static
Static mode in DPM lets you set a fixed number of nodes that should always stay powered on in a Proxmox cluster. This is important to keep the cluster working properly, since you need at least three nodes to maintain quorum. The system won't let you go below that limit to avoid breaking cluster functionality.
Besides the minimum number of active nodes, you can also define a baseline for how many free resources—like CPU or RAM—should always be available when the virtual machines are running. If the available resources drop below that level, ProxLB will try to power on more nodes, as long as they're available and can be started. On the other hand, if the cluster has more than enough resources, ProxLB will begin to shut down nodes again, but only until the free resource threshold is reached.
This mode gives you a more stable setup by always keeping a minimum number of nodes ready while still adjusting the rest of the cluster based on resource usage, but in a controlled and predictable way.
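The two guards of static mode can be condensed into a small decision function (a simplified sketch; the names and return values are illustrative, not ProxLB's API):

```python
def dpm_static_action(free_percent: float, online_nodes: int,
                      min_free: int = 60, min_nodes: int = 3) -> str:
    """Decide whether static DPM should power nodes on, off, or do nothing.

    free_percent: current free resources in the cluster, in percent.
    online_nodes: nodes currently powered on.
    min_free: free-resource baseline (cluster_min_free_resources).
    min_nodes: nodes that must always stay online (cluster_min_nodes).
    """
    if free_percent < min_free:
        return "power_on"   # below the baseline: wake another node
    if online_nodes > min_nodes and free_percent > min_free:
        return "power_off"  # surplus resources: shut one node down
    return "noop"           # within bounds: leave the cluster as-is

print(dpm_static_action(40, 5))  # power_on
print(dpm_static_action(80, 5))  # power_off
print(dpm_static_action(80, 3))  # noop
```

The last case shows the node floor in action: even with surplus resources, no node is shut down once only `min_nodes` remain online.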


@@ -1,5 +1,5 @@
#!/usr/bin/env bash
VERSION="1.1.1"
VERSION="1.1.2b"
sed -i "s/^__version__ = .*/__version__ = \"$VERSION\"/" "proxlb/utils/version.py"
sed -i "s/version=\"[0-9]*\.[0-9]*\.[0-9]*\"/version=\"$VERSION\"/" setup.py


@@ -17,6 +17,7 @@ from utils.logger import SystemdLogger
from utils.cli_parser import CliParser
from utils.config_parser import ConfigParser
from utils.proxmox_api import ProxmoxApi
from models.dpm import DPM
from models.nodes import Nodes
from models.guests import Guests
from models.groups import Groups
@@ -53,14 +54,17 @@ def main():
while True:
# Get all required objects from the Proxmox cluster
meta = {"meta": proxlb_config}
nodes = Nodes.get_nodes(proxmox_api, proxlb_config)
nodes, cluster = Nodes.get_nodes(proxmox_api, proxlb_config)
guests = Guests.get_guests(proxmox_api, nodes, meta)
groups = Groups.get_groups(guests, nodes)
# Merge obtained objects from the Proxmox cluster for further usage
proxlb_data = {**meta, **nodes, **guests, **groups}
proxlb_data = {**meta, **cluster, **nodes, **guests, **groups}
Helper.log_node_metrics(proxlb_data)
# Evaluate the dynamic power management for nodes in the cluster
DPM(proxlb_data)
# Update the initial node resource assignments
# by the previously created groups.
Calculations.set_node_assignments(proxlb_data)
@@ -70,10 +74,14 @@ def main():
Calculations.relocate_guests(proxlb_data)
Helper.log_node_metrics(proxlb_data, init=False)
# Perform balancing actions via Proxmox API
# Perform balancing
if not cli_args.dry_run and proxlb_data["meta"]["balancing"].get("enable", False):
Balancing(proxmox_api, proxlb_data)
# Perform DPM
if not cli_args.dry_run:
DPM.dpm_shutdown_nodes(proxmox_api, proxlb_data)
# Validate if the JSON output should be
# printed to stdout
Helper.print_json(proxlb_data, cli_args.json)


@@ -162,7 +162,7 @@ class Calculations:
logger.debug("Finished: get_most_free_node.")
@staticmethod
def relocate_guests_on_maintenance_nodes(proxlb_data: Dict[str, Any]):
def relocate_guests_on_maintenance_nodes(proxlb_data: Dict[str, Any]) -> None:
"""
Relocates guests that are currently on nodes marked for maintenance to
nodes with the most available resources.
@@ -192,7 +192,7 @@ class Calculations:
logger.debug("Finished: get_most_free_node.")
@staticmethod
def relocate_guests(proxlb_data: Dict[str, Any]):
def relocate_guests(proxlb_data: Dict[str, Any]) -> None:
"""
Relocates guests within the provided data structure to ensure affinity groups are
placed on nodes with the most free resources.
@@ -225,12 +225,13 @@ class Calculations:
for guest_name in proxlb_data["groups"]["affinity"][group_name]["guests"]:
proxlb_data["meta"]["balancing"]["balance_next_guest"] = guest_name
Calculations.val_anti_affinity(proxlb_data, guest_name)
Calculations.val_node_relationship(proxlb_data, guest_name)
Calculations.update_node_resources(proxlb_data)
logger.debug("Finished: relocate_guests.")
@staticmethod
def val_anti_affinity(proxlb_data: Dict[str, Any], guest_name: str):
def val_anti_affinity(proxlb_data: Dict[str, Any], guest_name: str) -> None:
"""
Validates and assigns nodes to guests based on anti-affinity rules.
@@ -279,7 +280,38 @@ class Calculations:
logger.debug("Finished: val_anti_affinity.")
@staticmethod
def update_node_resources(proxlb_data):
def val_node_relationship(proxlb_data: Dict[str, Any], guest_name: str) -> None:
"""
Validates and assigns guests to nodes based on defined relationships based on tags.
Parameters:
proxlb_data (Dict[str, Any]): The data holding all content of all objects.
guest_name (str): The name of the guest to be validated and assigned a node.
Returns:
None
"""
logger.debug("Starting: val_node_relationship.")
proxlb_data["guests"][guest_name]["processed"] = True
if proxlb_data["guests"][guest_name]["node_relationship"]:
logger.info(f"Guest '{guest_name}' has a specific relationship defined to node: {proxlb_data['guests'][guest_name]['node_relationship']}. Pinning to node.")
# Validate if the specified node name is really part of the cluster
if proxlb_data['guests'][guest_name]['node_relationship'] in proxlb_data["nodes"].keys():
logger.info(f"Guest '{guest_name}' has a specific relationship defined to node: {proxlb_data['guests'][guest_name]['node_relationship']} is a known hypervisor node in the cluster.")
# Pin the guest to the specified hypervisor node.
proxlb_data["meta"]["balancing"]["balance_next_node"] = proxlb_data['guests'][guest_name]['node_relationship']
else:
logger.warning(f"Guest '{guest_name}' has a specific relationship defined to node: {proxlb_data['guests'][guest_name]['node_relationship']} but this node name is not known in the cluster!")
else:
logger.info(f"Guest '{guest_name}' does not have any specific node relationships.")
logger.debug("Finished: val_node_relationship.")
@staticmethod
def update_node_resources(proxlb_data: Dict[str, Any]) -> None:
"""
Updates the resource allocation and usage statistics for nodes when a guest
is moved from one node to another.
@@ -343,3 +375,68 @@ class Calculations:
logger.debug(f"Set guest {guest_name} from node {node_current} to node {node_target}.")
logger.debug("Finished: update_node_resources.")
@staticmethod
def update_cluster_resources(proxlb_data: Dict[str, Any], node: str, action: str) -> None:
"""
Updates the cluster resource statistics based on the specified action and node.
This method modifies the cluster-level resource data (such as CPU, memory, disk usage,
and node counts) based on the action performed ('add' or 'remove') for the specified node.
It calculates the updated statistics after adding or removing a node and logs the results.
Parameters:
proxlb_data (Dict[str, Any]): The data representing the current state of the cluster,
including node-level statistics for CPU, memory, and disk.
node (str): The identifier of the node whose resources are being added or removed from the cluster.
action (str): The action to perform, either 'add' or 'remove'. 'add' will include the node's
resources in the cluster, while 'remove' will exclude the node's resources.
Returns:
None: The function modifies the `proxlb_data` dictionary in place to update the cluster resources.
"""
logger.debug("Starting: update_cluster_resources.")
logger.debug(f"DPM: Updating cluster statistics by online node {node}. Action: {action}")
logger.debug(f"DPM: update_cluster_resources - Before {action}: {proxlb_data['cluster']['memory_free_percent']}")
if action == "add":
proxlb_data["cluster"]["node_count"] = proxlb_data["cluster"].get("node_count", 0) + 1
proxlb_data["cluster"]["cpu_total"] = proxlb_data["cluster"].get("cpu_total", 0) + proxlb_data["nodes"][node]["cpu_total"]
proxlb_data["cluster"]["cpu_used"] = proxlb_data["cluster"].get("cpu_used", 0) + proxlb_data["nodes"][node]["cpu_used"]
proxlb_data["cluster"]["cpu_free"] = proxlb_data["cluster"].get("cpu_free", 0) + proxlb_data["nodes"][node]["cpu_free"]
proxlb_data["cluster"]["cpu_free_percent"] = proxlb_data["cluster"].get("cpu_free", 0) / proxlb_data["cluster"].get("cpu_total", 0) * 100
proxlb_data["cluster"]["cpu_used_percent"] = proxlb_data["cluster"].get("cpu_used", 0) / proxlb_data["cluster"].get("cpu_total", 0) * 100
proxlb_data["cluster"]["memory_total"] = proxlb_data["cluster"].get("memory_total", 0) + proxlb_data["nodes"][node]["memory_total"]
proxlb_data["cluster"]["memory_used"] = proxlb_data["cluster"].get("memory_used", 0) + proxlb_data["nodes"][node]["memory_used"]
proxlb_data["cluster"]["memory_free"] = proxlb_data["cluster"].get("memory_free", 0) + proxlb_data["nodes"][node]["memory_free"]
proxlb_data["cluster"]["memory_free_percent"] = proxlb_data["cluster"].get("memory_free", 0) / proxlb_data["cluster"].get("memory_total", 0) * 100
proxlb_data["cluster"]["memory_used_percent"] = proxlb_data["cluster"].get("memory_used", 0) / proxlb_data["cluster"].get("memory_total", 0) * 100
proxlb_data["cluster"]["disk_total"] = proxlb_data["cluster"].get("disk_total", 0) + proxlb_data["nodes"][node]["disk_total"]
proxlb_data["cluster"]["disk_used"] = proxlb_data["cluster"].get("disk_used", 0) + proxlb_data["nodes"][node]["disk_used"]
proxlb_data["cluster"]["disk_free"] = proxlb_data["cluster"].get("disk_free", 0) + proxlb_data["nodes"][node]["disk_free"]
proxlb_data["cluster"]["disk_free_percent"] = proxlb_data["cluster"].get("disk_free", 0) / proxlb_data["cluster"].get("disk_total", 0) * 100
proxlb_data["cluster"]["disk_used_percent"] = proxlb_data["cluster"].get("disk_used", 0) / proxlb_data["cluster"].get("disk_total", 0) * 100
proxlb_data["cluster"]["node_count_available"] = proxlb_data["cluster"].get("node_count_available", 0) + 1
proxlb_data["cluster"]["node_count_overall"] = proxlb_data["cluster"].get("node_count_overall", 0) + 1
if action == "remove":
proxlb_data["cluster"]["node_count"] = proxlb_data["cluster"].get("node_count", 0) - 1
proxlb_data["cluster"]["cpu_total"] = proxlb_data["cluster"].get("cpu_total", 0) - proxlb_data["nodes"][node]["cpu_total"]
proxlb_data["cluster"]["cpu_used"] = proxlb_data["cluster"].get("cpu_used", 0) - proxlb_data["nodes"][node]["cpu_used"]
proxlb_data["cluster"]["cpu_free"] = proxlb_data["cluster"].get("cpu_free", 0) - proxlb_data["nodes"][node]["cpu_free"]
proxlb_data["cluster"]["cpu_free_percent"] = proxlb_data["cluster"].get("cpu_free", 0) / proxlb_data["cluster"].get("cpu_total", 0) * 100
proxlb_data["cluster"]["cpu_used_percent"] = proxlb_data["cluster"].get("cpu_used", 0) / proxlb_data["cluster"].get("cpu_total", 0) * 100
proxlb_data["cluster"]["memory_total"] = proxlb_data["cluster"].get("memory_total", 0) - proxlb_data["nodes"][node]["memory_total"]
proxlb_data["cluster"]["memory_used"] = proxlb_data["cluster"].get("memory_used", 0) - proxlb_data["nodes"][node]["memory_used"]
proxlb_data["cluster"]["memory_free"] = proxlb_data["cluster"].get("memory_free", 0) - proxlb_data["nodes"][node]["memory_free"]
proxlb_data["cluster"]["memory_free_percent"] = proxlb_data["cluster"].get("memory_free", 0) / proxlb_data["cluster"].get("memory_total", 0) * 100
proxlb_data["cluster"]["memory_used_percent"] = proxlb_data["cluster"].get("memory_used", 0) / proxlb_data["cluster"].get("memory_total", 0) * 100
proxlb_data["cluster"]["disk_total"] = proxlb_data["cluster"].get("disk_total", 0) - proxlb_data["nodes"][node]["disk_total"]
proxlb_data["cluster"]["disk_used"] = proxlb_data["cluster"].get("disk_used", 0) - proxlb_data["nodes"][node]["disk_used"]
proxlb_data["cluster"]["disk_free"] = proxlb_data["cluster"].get("disk_free", 0) - proxlb_data["nodes"][node]["disk_free"]
proxlb_data["cluster"]["disk_free_percent"] = proxlb_data["cluster"].get("disk_free", 0) / proxlb_data["cluster"].get("disk_total", 0) * 100
proxlb_data["cluster"]["disk_used_percent"] = proxlb_data["cluster"].get("disk_used", 0) / proxlb_data["cluster"].get("disk_total", 0) * 100
proxlb_data["cluster"]["node_count_available"] = proxlb_data["cluster"].get("node_count_available", 0) - 1
logger.debug(f"DPM: update_cluster_resources - After {action}: {proxlb_data['cluster']['memory_free_percent']}")
logger.debug("Finished: update_cluster_resources.")
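The bookkeeping above can be sketched for a single resource. This is a minimal, standalone illustration (the dictionary shape and function name are assumptions, not the project's code); it also guards the percentage math against a zero denominator, which the method above would hit if the last node were removed:

```python
from typing import Any, Dict

def update_cluster_memory(data: Dict[str, Any], node: str, action: str) -> None:
    """Add or subtract a node's absolute memory figures in the cluster totals, in place."""
    sign = 1 if action == "add" else -1
    cluster, node_stats = data["cluster"], data["nodes"][node]
    for key in ("memory_total", "memory_used", "memory_free"):
        cluster[key] = cluster.get(key, 0) + sign * node_stats[key]
    # Guard against a zero denominator (e.g. when every node has been removed).
    total = cluster["memory_total"]
    cluster["memory_free_percent"] = cluster["memory_free"] / total * 100 if total else 0
    cluster["memory_used_percent"] = cluster["memory_used"] / total * 100 if total else 0

data = {
    "cluster": {"memory_total": 64, "memory_used": 16, "memory_free": 48},
    "nodes": {"node03": {"memory_total": 32, "memory_used": 8, "memory_free": 24}},
}
update_cluster_memory(data, "node03", "remove")
print(data["cluster"]["memory_free_percent"])  # 24 / 32 * 100 = 75.0
```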

proxlb/models/dpm.py (new file)

@@ -0,0 +1,255 @@
"""
The DPM (Dynamic Power Management) class is responsible for the dynamic management
of nodes within a Proxmox cluster, optimizing resource utilization by controlling
node power states based on specified schedules and conditions.
This class provides functionality for:
- Tracking and validating schedules for dynamic power management.
- Shutting down nodes that are underutilized or not needed.
- Starting up nodes using Wake-on-LAN (WOL) based on certain conditions.
- Ensuring that nodes are properly flagged for maintenance and startup/shutdown actions.
The DPM class can operate in different modes, such as static and automatic,
to either perform predefined actions or dynamically adjust based on real-time resource usage.
"""
__author__ = "Florian Paul Azim Hoberg <gyptazy>"
__copyright__ = "Copyright (C) 2025 Florian Paul Azim Hoberg (@gyptazy)"
__license__ = "GPL-3.0"
import proxmoxer
from typing import Dict, Any
from models.calculations import Calculations
from utils.logger import SystemdLogger
logger = SystemdLogger()
class DPM:
"""
The DPM (Dynamic Power Management) class is responsible for the dynamic management
of nodes within a Proxmox cluster, optimizing resource utilization by controlling
node power states based on specified schedules and conditions.
This class provides functionality for:
- Tracking and validating schedules for dynamic power management.
- Shutting down nodes that are underutilized or not needed.
- Starting up nodes using Wake-on-LAN (WOL) based on certain conditions.
- Ensuring that nodes are properly flagged for maintenance and startup/shutdown actions.
The DPM class can operate in different modes, such as static and automatic,
to either perform predefined actions or dynamically adjust based on real-time resource usage.
Attributes:
None directly defined for the class; instead, all actions are based on input data
and interactions with the Proxmox API and other helper functions.
Methods:
__init__(proxlb_data: Dict[str, Any]):
Initializes the DPM class, checking whether DPM is enabled and operating in the
appropriate mode (static or auto).
dpm_static(proxlb_data: Dict[str, Any]) -> None:
Evaluates the cluster's resource availability and performs static power management
actions by removing nodes that are not required.
dpm_shutdown_nodes(proxmox_api, proxlb_data) -> None:
Shuts down nodes flagged for DPM shutdown by using the Proxmox API, ensuring
that Wake-on-LAN (WOL) is available for proper node recovery.
dpm_startup_nodes(proxmox_api, proxlb_data) -> None:
Powers on nodes that are flagged for startup and are not in maintenance mode,
leveraging Wake-on-LAN (WOL) functionality.
dpm_validate_wol_mac(proxmox_api, node) -> None:
Validates and retrieves the Wake-on-LAN (WOL) MAC address for a given node,
ensuring that a valid address is set for powering on the node remotely.
"""
def __init__(self, proxlb_data: Dict[str, Any]):
"""
Initializes the DPM class with the provided ProxLB data.
Args:
proxlb_data (dict): The data required for balancing VMs and CTs.
"""
logger.debug("Starting: dpm class.")
if proxlb_data["meta"].get("dpm", {}).get("enable", False):
logger.debug("DPM function is enabled.")
mode = proxlb_data["meta"].get("dpm", {}).get("mode", None)
if mode == "static":
self.dpm_static(proxlb_data)
elif mode == "auto":
self.dpm_auto(proxlb_data)
else:
logger.debug("DPM function is not enabled.")
logger.debug("Finished: dpm class.")
def dpm_static(self, proxlb_data: Dict[str, Any]) -> None:
"""
Evaluates and performs static Distributed Power Management (DPM) actions based on current cluster state.
This method monitors cluster resource availability and attempts to reduce the number of active nodes
when sufficient free resources are available. It ensures a minimum number of nodes remains active
and prioritizes shutting down nodes with the least utilized resources to minimize impact. Nodes selected
for shutdown are marked for maintenance and flagged for DPM shutdown.
Parameters:
proxlb_data (Dict[str, Any]): A dictionary containing metadata, cluster status, and node-level information
including resource utilization, configuration settings, and DPM thresholds.
Returns:
Dict[str, Any]: The `proxlb_data` dictionary, which is also modified in place.
"""
logger.debug("Starting: dpm_static.")
method = proxlb_data["meta"].get("dpm", {}).get("method", "memory")
cluster_nodes_overall = proxlb_data["cluster"]["node_count_overall"]
cluster_nodes_available = proxlb_data["cluster"]["node_count_available"]
cluster_free_resources_percent = int(proxlb_data["cluster"][f"{method}_free_percent"])
cluster_free_resources_req_min = proxlb_data["meta"].get("dpm", {}).get("cluster_min_free_resources", 0)
cluster_min_nodes = proxlb_data["meta"].get("dpm", {}).get("cluster_min_nodes", 3)
logger.debug(f"DPM: Cluster Nodes: {cluster_nodes_overall} | Nodes available: {cluster_nodes_available} | Nodes offline: {cluster_nodes_overall - cluster_nodes_available}")
# Only proceed with removing nodes if the cluster has enough free resources
while cluster_free_resources_percent > cluster_free_resources_req_min:
logger.debug(f"DPM: More free resources ({cluster_free_resources_percent}%) available than required ({cluster_free_resources_req_min}%). DPM evaluation starting...")
# Ensure that at least the defined minimum number of nodes remains active
if cluster_nodes_available > cluster_min_nodes:
logger.debug(f"DPM: A minimum of {cluster_min_nodes} nodes is required. {cluster_nodes_available} are available. Proceeding...")
# Get the node with the most free resources to keep migrations low
Calculations.get_most_free_node(proxlb_data, False)
dpm_node = proxlb_data["meta"]["balancing"]["balance_next_node"]
# Perform the cluster calculation to evaluate how many nodes can safely leave
# the cluster. Further object calculations are processed afterwards by
# the calculation class.
logger.debug(f"DPM: Removing node {dpm_node} from cluster. Node will be turned off later.")
Calculations.update_cluster_resources(proxlb_data, dpm_node, "remove")
cluster_free_resources_percent = int(proxlb_data["cluster"][f"{method}_free_percent"])
cluster_nodes_available = proxlb_data["cluster"]["node_count_available"]
logger.debug(f"DPM: Free cluster resources changed to: {cluster_free_resources_percent}%.")
# Set node to maintenance and flag it for DPM shutdown
proxlb_data["nodes"][dpm_node]["maintenance"] = True
proxlb_data["nodes"][dpm_node]["dpm_shutdown"] = True
else:
logger.warning(f"DPM: A minimum of {cluster_min_nodes} nodes is required. Only {cluster_nodes_available} are available. Cannot proceed!")
break
logger.debug(f"DPM: DPM evaluation stopped. Free resources: {cluster_free_resources_percent}% (required minimum: {cluster_free_resources_req_min}%).")
logger.debug("Finished: dpm_static.")
return proxlb_data
@staticmethod
def dpm_shutdown_nodes(proxmox_api, proxlb_data: Dict[str, Any]) -> None:
"""
Shuts down cluster nodes that are marked for maintenance and flagged for DPM shutdown.
This method iterates through the cluster nodes in the provided data and attempts to
power off any node that has both the 'maintenance' and 'dpm_shutdown' flags set.
It communicates with the Proxmox API to issue shutdown commands and logs any failures.
Parameters:
proxmox_api: An instance of the Proxmox API client used to issue node shutdown commands.
proxlb_data: A dictionary containing node status information, including flags for
maintenance and DPM shutdown readiness.
Returns:
None: Performs shutdown operations and logs outcomes; modifies no data directly.
"""
logger.debug("Starting: dpm_shutdown_nodes.")
for node, node_info in proxlb_data["nodes"].items():
if node_info["maintenance"] and node_info["dpm_shutdown"]:
logger.debug(f"DPM: Node: {node} is flagged as maintenance mode and to be powered off.")
# Ensure that the node has a valid WOL MAC defined. If not
# we would be unable to power on that system again
valid_wol_mac = DPM.dpm_validate_wol_mac(proxmox_api, node)
if valid_wol_mac:
job_id = None
try:
logger.debug(f"DPM: Shutting down node: {node}.")
job_id = proxmox_api.nodes(node).status.post(command="shutdown")
except proxmoxer.core.ResourceException as proxmox_api_error:
logger.critical(f"DPM: Error while powering off node {node}: {proxmox_api_error}. Please check job-id: {job_id}")
else:
logger.critical(f"DPM: Node {node} cannot be powered off due to missing WOL MAC. Please define a valid WOL MAC for this node.")
logger.debug("Finished: dpm_shutdown_nodes.")
@staticmethod
def dpm_startup_nodes(proxmox_api, proxlb_data: Dict[str, Any]) -> None:
"""
Starts up cluster nodes that are flagged for DPM startup.
This method iterates through the cluster nodes in the provided data and attempts to
power on any node that is not flagged as 'maintenance' but is flagged as 'dpm_startup'.
It communicates with the Proxmox API to issue power-on commands and logs any failures.
Parameters:
proxmox_api: An instance of the Proxmox API client used to issue node startup commands.
proxlb_data: A dictionary containing node status information, including flags for
maintenance and DPM startup readiness.
Returns:
None: Performs poweron operations and logs outcomes; modifies no data directly.
"""
logger.debug("Starting: dpm_startup_nodes.")
for node, node_info in proxlb_data["nodes"].items():
if not node_info["maintenance"]:
logger.debug(f"DPM: Node: {node} is not in maintenance mode.")
if node_info["dpm_startup"]:
logger.debug(f"DPM: Node: {node} is flagged as to be started.")
job_id = None
try:
logger.debug(f"DPM: Powering on node: {node}.")
# Important: This requires Proxmox operators to define the
# WOL MAC address for each node within the Proxmox web interface.
job_id = proxmox_api.nodes(node).wakeonlan.post()
except proxmoxer.core.ResourceException as proxmox_api_error:
logger.critical(f"DPM: Error while powering on node {node}: {proxmox_api_error}. Please check job-id: {job_id}")
logger.debug("Finished: dpm_startup_nodes.")
@staticmethod
def dpm_validate_wol_mac(proxmox_api, node: str) -> str:
"""
Retrieves and validates the Wake-on-LAN (WOL) MAC address for a specified node.
This method fetches the MAC address configured for Wake-on-LAN (WOL) from the Proxmox API.
If the MAC address is found, it is logged. In case of failure to retrieve the address,
a critical log is generated indicating the absence of a WOL MAC address for the node.
Parameters:
proxmox_api: An instance of the Proxmox API client used to query node configurations.
node: The identifier (name or ID) of the node for which the WOL MAC address is to be validated.
Returns:
node_wol_mac_address: The WOL MAC address for the specified node if found, otherwise `None`.
"""
logger.debug("Starting: dpm_validate_wol_mac.")
try:
logger.debug(f"DPM: Getting WOL MAC address for node {node} from API.")
node_wol_mac_address = proxmox_api.nodes(node).config.get(property="wakeonlan")
node_wol_mac_address = node_wol_mac_address.get("wakeonlan")
logger.debug(f"DPM: Node {node} has MAC address: {node_wol_mac_address} for WOL.")
except proxmoxer.core.ResourceException as proxmox_api_error:
logger.debug(f"DPM: Failed to get WOL MAC address for node {node} from API: {proxmox_api_error}")
node_wol_mac_address = None
logger.critical(f"DPM: Node {node} has no MAC address defined for WOL.")
logger.debug("Finished: dpm_validate_wol_mac.")
return node_wol_mac_address
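The static DPM policy implemented by `dpm_static` can be sketched independently of the Proxmox API: while free resources stay above the configured minimum and more than the minimum node count remains, flag the most-free node for shutdown and re-evaluate. This is a hedged sketch with an assumed data shape and function name, not the project's code:

```python
def plan_static_shutdowns(nodes, min_free_percent, min_nodes):
    """Return node names to power off, most-free node first (assumed dict shape)."""
    active = dict(nodes)  # work on a copy; the input stays untouched
    plan = []

    def free_percent():
        total = sum(n["memory_total"] for n in active.values())
        free = sum(n["memory_free"] for n in active.values())
        return free / total * 100 if total else 0.0

    while free_percent() > min_free_percent and len(active) > min_nodes:
        # Pick the node with the most free memory to keep migrations low.
        name = max(active, key=lambda n: active[n]["memory_free"])
        del active[name]
        plan.append(name)
    return plan

nodes = {
    "virt01": {"memory_total": 32, "memory_free": 30},
    "virt02": {"memory_total": 32, "memory_free": 28},
    "virt03": {"memory_total": 32, "memory_free": 8},
    "virt04": {"memory_total": 32, "memory_free": 6},
}
# 72/128 = 56.25% free > 40%, so virt01 goes; then only 3 nodes remain.
print(plan_static_shutdowns(nodes, min_free_percent=40, min_nodes=3))  # ['virt01']
```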


@@ -90,6 +90,7 @@ class Guests:
guests['guests'][guest['name']]['affinity_groups'] = Tags.get_affinity_groups(guests['guests'][guest['name']]['tags'])
guests['guests'][guest['name']]['anti_affinity_groups'] = Tags.get_anti_affinity_groups(guests['guests'][guest['name']]['tags'])
guests['guests'][guest['name']]['ignore'] = Tags.get_ignore(guests['guests'][guest['name']]['tags'])
guests['guests'][guest['name']]['node_relationship'] = Tags.get_node_relationship(guests['guests'][guest['name']]['tags'])
guests['guests'][guest['name']]['type'] = 'vm'
else:
logger.debug(f'Metric for VM {guest["name"]} ignored because VM is not running.')
@@ -115,6 +116,7 @@ class Guests:
guests['guests'][guest['name']]['affinity_groups'] = Tags.get_affinity_groups(guests['guests'][guest['name']]['tags'])
guests['guests'][guest['name']]['anti_affinity_groups'] = Tags.get_anti_affinity_groups(guests['guests'][guest['name']]['tags'])
guests['guests'][guest['name']]['ignore'] = Tags.get_ignore(guests['guests'][guest['name']]['tags'])
guests['guests'][guest['name']]['node_relationship'] = Tags.get_node_relationship(guests['guests'][guest['name']]['tags'])
guests['guests'][guest['name']]['type'] = 'ct'
else:
logger.debug(f'Metric for CT {guest["name"]} ignored because CT is not running.')


@@ -54,6 +54,7 @@ class Nodes:
"""
logger.debug("Starting: get_nodes.")
nodes = {"nodes": {}}
cluster = {"cluster": {}}
for node in proxmox_api.nodes.get():
# Ignoring a node results into ignoring all placed guests on the ignored node!
@@ -61,6 +62,8 @@ class Nodes:
nodes["nodes"][node["node"]] = {}
nodes["nodes"][node["node"]]["name"] = node["node"]
nodes["nodes"][node["node"]]["maintenance"] = False
nodes["nodes"][node["node"]]["dpm_shutdown"] = False
nodes["nodes"][node["node"]]["dpm_startup"] = False
nodes["nodes"][node["node"]]["cpu_total"] = node["maxcpu"]
nodes["nodes"][node["node"]]["cpu_assigned"] = 0
nodes["nodes"][node["node"]]["cpu_used"] = node["cpu"] * node["maxcpu"]
@@ -87,8 +90,35 @@ class Nodes:
if Nodes.set_node_maintenance(proxlb_config, node["node"]):
nodes["nodes"][node["node"]]["maintenance"] = True
# Generate the initial cluster statistics within the same loop to avoid a further one.
logger.debug(f"Updating cluster statistics by online node {node['node']}.")
cluster["cluster"]["node_count"] = cluster["cluster"].get("node_count", 0) + 1
cluster["cluster"]["cpu_total"] = cluster["cluster"].get("cpu_total", 0) + nodes["nodes"][node["node"]]["cpu_total"]
cluster["cluster"]["cpu_used"] = cluster["cluster"].get("cpu_used", 0) + nodes["nodes"][node["node"]]["cpu_used"]
cluster["cluster"]["cpu_free"] = cluster["cluster"].get("cpu_free", 0) + nodes["nodes"][node["node"]]["cpu_free"]
cluster["cluster"]["cpu_free_percent"] = cluster["cluster"].get("cpu_free", 0) / cluster["cluster"].get("cpu_total", 0) * 100
cluster["cluster"]["cpu_used_percent"] = cluster["cluster"].get("cpu_used", 0) / cluster["cluster"].get("cpu_total", 0) * 100
cluster["cluster"]["memory_total"] = cluster["cluster"].get("memory_total", 0) + nodes["nodes"][node["node"]]["memory_total"]
cluster["cluster"]["memory_used"] = cluster["cluster"].get("memory_used", 0) + nodes["nodes"][node["node"]]["memory_used"]
cluster["cluster"]["memory_free"] = cluster["cluster"].get("memory_free", 0) + nodes["nodes"][node["node"]]["memory_free"]
cluster["cluster"]["memory_free_percent"] = cluster["cluster"].get("memory_free", 0) / cluster["cluster"].get("memory_total", 0) * 100
cluster["cluster"]["memory_used_percent"] = cluster["cluster"].get("memory_used", 0) / cluster["cluster"].get("memory_total", 0) * 100
cluster["cluster"]["disk_total"] = cluster["cluster"].get("disk_total", 0) + nodes["nodes"][node["node"]]["disk_total"]
cluster["cluster"]["disk_used"] = cluster["cluster"].get("disk_used", 0) + nodes["nodes"][node["node"]]["disk_used"]
cluster["cluster"]["disk_free"] = cluster["cluster"].get("disk_free", 0) + nodes["nodes"][node["node"]]["disk_free"]
cluster["cluster"]["disk_free_percent"] = cluster["cluster"].get("disk_free", 0) / cluster["cluster"].get("disk_total", 0) * 100
cluster["cluster"]["disk_used_percent"] = cluster["cluster"].get("disk_used", 0) / cluster["cluster"].get("disk_total", 0) * 100
cluster["cluster"]["node_count_available"] = cluster["cluster"].get("node_count_available", 0) + 1
cluster["cluster"]["node_count_overall"] = cluster["cluster"].get("node_count_overall", 0) + 1
# Also count offline nodes so that the overall number of nodes in the cluster is correct.
else:
logger.debug(f"Updating cluster statistics by offline node {node['node']}.")
cluster["cluster"]["node_count_overall"] = cluster["cluster"].get("node_count_overall", 0) + 1
logger.debug("Finished: get_nodes.")
return nodes
return nodes, cluster
@staticmethod
def set_node_maintenance(proxlb_config: Dict[str, Any], node_name: str) -> Dict[str, Any]:


@@ -151,3 +151,29 @@ class Tags:
logger.debug("Finished: get_ignore.")
return ignore_tag
@staticmethod
def get_node_relationship(tags: List[str]) -> str:
"""
Get a node relationship tag for a guest from the Proxmox cluster by the API to pin
a guest to a node.
This method retrieves a relationship tag between a guest and a specific
hypervisor node to pin the guest to a specific node (e.g., for licensing reason).
Args:
tags (List): A list holding all defined tags for a given guest.
Returns:
Str: The related hypervisor node name.
"""
logger.debug("Starting: get_node_relationship.")
node_relationship_tag = None
if len(tags) > 0:
for tag in tags:
if tag.startswith("plb_pin_"):
node_relationship_tag = tag.replace("plb_pin_", "", 1)
logger.debug("Finished: get_node_relationship.")
return node_relationship_tag
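The tag convention above can be shown in isolation. A standalone sketch (the function name and example tags are illustrative; the real method lives in the `Tags` class) of parsing a `plb_pin_<node>` tag, where the last matching tag wins:

```python
def parse_node_pin(tags):
    """Return the node name from a plb_pin_<node> tag, or None if no pin is set."""
    node = None
    for tag in tags:
        if tag.startswith("plb_pin_"):
            node = tag[len("plb_pin_"):]  # the last matching tag wins
    return node

print(parse_node_pin(["ssd", "plb_pin_virt01"]))  # virt01
print(parse_node_pin(["ssd"]))                    # None
```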


@@ -20,12 +20,13 @@ except ImportError:
PROXMOXER_PRESENT = False
import random
import socket
import sys
try:
import requests
REQUESTS_PRESENT = True
except ImportError:
REQUESTS_PRESENT = False
import sys
import time
try:
import urllib3
URLLIB3_PRESENT = True
@@ -141,7 +142,7 @@ class ProxmoxApi:
logger.debug("Finished: validate_config.")
def api_connect_get_hosts(self, proxmox_api_endpoints: list) -> str:
def api_connect_get_hosts(self, proxlb_config, proxmox_api_endpoints: list) -> str:
"""
Perform a connectivity test to determine a working host for the Proxmox API.
@@ -152,6 +153,7 @@ class ProxmoxApi:
are found, one is chosen at random to distribute the load across the cluster.
Args:
proxlb_config (Dict[str, Any]): A dictionary containing the ProxLB configuration.
proxmox_api_endpoints (list): A list of Proxmox API endpoints to test.
Returns:
@@ -175,21 +177,25 @@ class ProxmoxApi:
logger.critical(f"No proxmox_api hosts are defined.")
sys.exit(1)
# Get a suitable Proxmox API endpoint. Therefore, we check if we only have
# a single Proxmox API endpoint or multiple ones. If only one, we can return
# this one immediately. If this one does not work, the urllib will raise an
# exception during the connection attempt.
if len(proxmox_api_endpoints) == 1:
return proxmox_api_endpoints[0]
# If we have multiple Proxmox API endpoints, we need to check each one by
# doing a connection attempt for IPv4 and IPv6. If we find a working one,
# we return that one. This allows us to define multiple endpoints in a cluster.
validated_api_hosts = []
for host in proxmox_api_endpoints:
validated = self.test_api_proxmox_host(host)
if validated:
validated_api_hosts.append(validated)
# Get or set a default value for a maximum of retries when connecting to
# the Proxmox API
api_connection_retries = proxlb_config["proxmox_api"].get("retries", 1)
api_connection_wait_time = proxlb_config["proxmox_api"].get("wait_time", 1)
for api_connection_attempt in range(api_connection_retries):
validated = self.test_api_proxmox_host(host)
if validated:
validated_api_hosts.append(validated)
break
else:
logger.warning(f"Attempt {api_connection_attempt + 1}/{api_connection_retries} failed for host {host}. Retrying in {api_connection_wait_time} seconds...")
time.sleep(api_connection_wait_time)
if len(validated_api_hosts) > 0:
# Choose a random host to distribute the load across the cluster
@@ -307,7 +313,7 @@ class ProxmoxApi:
sock.close()
logger.warning(f"Host {host} is unreachable on IPv6 for tcp/{port}.")
logger.debug("Finished: test_api_proxmox_host_ipv4.")
logger.debug("Finished: test_api_proxmox_host_ipv6.")
return False
def test_api_user_permissions(self, proxmox_api: any):
@@ -372,7 +378,7 @@ class ProxmoxApi:
self.validate_config(proxlb_config)
# Get a valid Proxmox API endpoint
proxmox_api_endpoint = self.api_connect_get_hosts(proxlb_config.get("proxmox_api", {}).get("hosts", []))
proxmox_api_endpoint = self.api_connect_get_hosts(proxlb_config, proxlb_config.get("proxmox_api", {}).get("hosts", []))
# Disable warnings for SSL certificate validation
if not proxlb_config.get("proxmox_api").get("ssl_verification", True):
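The retry mechanism added above follows a common pattern: each endpoint gets up to `retries` connection attempts with `wait_time` seconds between them. A hedged, self-contained sketch (the function names `connect_with_retries` and `probe` are illustrative; `probe` stands in for `test_api_proxmox_host`):

```python
import time

def connect_with_retries(hosts, probe, retries=1, wait_time=1):
    """Return the hosts that answered within the allowed number of attempts."""
    validated = []
    for host in hosts:
        for attempt in range(retries):
            if probe(host):
                validated.append(host)
                break
            # Wait before the next attempt, mirroring the wait_time option.
            time.sleep(wait_time)
    return validated

calls = {"count": 0}
def flaky_probe(host):
    calls["count"] += 1
    return calls["count"] >= 2  # fails once, then succeeds

print(connect_with_retries(["pve01"], flaky_probe, retries=3, wait_time=0))  # ['pve01']
```

A load balancer would then pick one of the validated hosts at random, as the surrounding code does, to spread API load across the cluster.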


@@ -3,5 +3,5 @@ __app_desc__ = "A DRS alike loadbalancer for Proxmox clusters."
__author__ = "Florian Paul Azim Hoberg <gyptazy>"
__copyright__ = "Copyright (C) 2025 Florian Paul Azim Hoberg (@gyptazy)"
__license__ = "GPL-3.0"
__version__ = "1.1.1"
__version__ = "1.1.2b"
__url__ = "https://github.com/gyptazy/ProxLB"


@@ -1,11 +1,11 @@
[Unit]
Description=ProxLB - A loadbalancer for Proxmox clusters
After=pveproxy.service
Wants=pveproxy.service
After=network-online.target pveproxy.service
Wants=network-online.target pveproxy.service
[Service]
ExecStart=python3 /usr/lib/python3/dist-packages/proxlb/main.py -c /etc/proxlb/proxlb.yaml
User=plb
[Install]
WantedBy=multi-user.target
WantedBy=multi-user.target


@@ -2,7 +2,7 @@ from setuptools import setup
setup(
name="proxlb",
version="1.1.1",
version="1.1.2b",
description="A DRS alike loadbalancer for Proxmox clusters.",
long_description="An advanced DRS alike loadbalancer for Proxmox clusters that also supports maintenance modes and affinity/anti-affinity rules.",
author="Florian Paul Azim Hoberg",