mirror of https://github.com/gyptazy/ProxLB.git
synced 2026-04-06 04:41:58 +02:00

Compare commits: 14 Commits
techdebt/f ... feature/au
| Author | SHA1 | Date |
|---|---|---|
| | c56d465f90 | |
| | 1e096e1aae | |
| | 420d669236 | |
| | 24aa6aabc6 | |
| | 5a9a4af532 | |
| | 50f93e5f59 | |
| | 33784f60b4 | |
| | 9a261aa781 | |
| | 366d5bc264 | |
| | 96ffa086b1 | |
| | db005c138e | |
| | 1168f545e5 | |
| | cc663c0518 | |
| | 40de31bc3b | |
@@ -1 +1 @@
date: 2025-04-20
2  .changelogs/1.1.2/137_fix_systemd_unit_file.yml  Normal file
@@ -0,0 +1,2 @@
fixed:
- Fix systemd unit file to run after network on non-PVE nodes (by @robertdahlem) [#137]
@@ -0,0 +1,2 @@
added:
- Add a configurable retry mechanism when connecting to the Proxmox API (by @gyptazy) [#157]
@@ -0,0 +1,2 @@
added:
- Add 1-to-1 relationships between guest and hypervisor node to pin a guest on a node (by @gyptazy) [#218]
1  .changelogs/1.1.2/release_meta.yml  Normal file
@@ -0,0 +1 @@
date: TBD
@@ -0,0 +1,2 @@
added:
- Add power management feature for cluster nodes (by @gyptazy) [#141]
1  .changelogs/1.2.0/release_meta.yml  Normal file
@@ -0,0 +1 @@
date: TBD
23  CHANGELOG.md
@@ -5,6 +5,29 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [1.1.1] - 2025-04-20

### Added

- Providing the API upstream error message when migration fails in debug mode (by @gyptazy) [#205]

### Changed

- Change the default behaviour of the daemon mode to active [#176]
- Change the default balancing mode to used instead of assigned [#180]

### Fixed

- Set cpu_used to the CPU usage (a percentage) multiplied by the total number of cores, so that guest cpu_used can be added to node cpu_used and remain meaningful (by @glitchvern) [#195]
- Fix tag evaluation for VMs for being ignored for further balancing [#163]
- Honor the value when balancing should not be performed and stop balancing [#174]
- Allow the use of minutes instead of hours and only accept hours or minutes in the format (by @glitchvern) [#187]
- Remove hard-coded memory usage from the lowest-usage node and use the method and mode specified in the configuration instead (by @glitchvern) [#197]
- Fix the guest type relationship in the logs when a migration job failed (by @gyptazy) [#204]
- Requery a guest if that running guest reports 0 CPU usage (by @glitchvern) [#200]
- Fix Python path for Docker entrypoint (by @crandler) [#170]
- Improve logging verbosity of messages that had a wrong severity [#165]

## [1.1.0] - 2025-04-01
116  README.md
@@ -21,6 +21,7 @@
1. [Affinity Rules](#affinity-rules)
2. [Anti-Affinity Rules](#anti-affinity-rules)
3. [Ignore VMs](#ignore-vms)
4. [Pin VMs to Hypervisor Nodes](#pin-vms-to-hypervisor-nodes)
7. [Maintenance](#maintenance)
8. [Misc](#misc)
1. [Bugs](#bugs)
@@ -45,28 +46,29 @@ Overall, ProxLB significantly enhances resource management by intelligently dist
<img src="https://cdn.gyptazy.com/images/proxlb-rebalancing-demo.gif"/>

## Features
ProxLB's key features include automatic rebalancing of VMs and CTs across a Proxmox cluster based on memory, CPU, and local disk usage while identifying optimal nodes for automation. It supports maintenance mode, affinity rules, and seamless Proxmox API integration with ACL support, offering flexible usage as a one-time operation, a daemon, or through the Proxmox Web GUI. In addition, ProxLB supports enterprise-grade features such as node power management (often known as DPM), where nodes can be powered on and off on demand when workloads are higher or lower than usual. Automated security patching of nodes within the cluster (known as ASPM) can also reduce manual work for cluster admins: nodes install patches, move their guests across the cluster, reboot, and the cluster is then rebalanced again.

**Features**
* Re-Balancing (DRS)
  * Supporting VMs & CTs
  * Balancing by:
    * CPU
    * Memory
    * Disk (only local storage)
* Affinity / Anti-Affinity Rules
  * Affinity: Groups guests together
  * Anti-Affinity: Ensures guests run on different nodes
* Best node evaluation
  * Get the best node for guest placement (e.g., CI/CD)
* Maintenance Mode
  * Set node(s) into maintenance
  * Move all workloads to different nodes
  * Honors affinity / anti-affinity rules
  * Evacuating a single or multiple nodes
* Node Power Management (DPM)
* Auto Node Security-Patch-Management (ASPM)
* Fully based on Proxmox API
* Fully integrated into the Proxmox ACL
* No SSH required
* Usage
  * One-Time
  * Daemon
  * Proxmox Web GUI Integration
* Utilizing the Proxmox User Authentication
* Supporting API tokens
* No SSH or Agents required
* Can run everywhere
## How does it work?
ProxLB is a load-balancing system designed to optimize the distribution of virtual machines (VMs) and containers (CTs) across a cluster. It works by first gathering resource usage metrics from all nodes in the cluster through the Proxmox API. This includes detailed resource metrics for each VM and CT on every node. ProxLB then evaluates the difference between the maximum and minimum resource usage of the nodes, referred to as "Balanciness". If this difference exceeds a predefined threshold (which is configurable), the system initiates the rebalancing process.
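The "Balanciness" check can be sketched in a few lines of Python. The function name and the flat usage dictionary are illustrative stand-ins rather than ProxLB's actual internals; the default threshold of 10 matches the documented `balanciness` option:

```python
def needs_rebalancing(node_usage_percent: dict, balanciness: int = 10) -> bool:
    """Return True when the spread between the busiest and the most idle
    node exceeds the configured 'balanciness' threshold."""
    spread = max(node_usage_percent.values()) - min(node_usage_percent.values())
    return spread > balanciness

# Example: memory usage in percent for three nodes
usage = {"virt01": 78.0, "virt02": 41.0, "virt03": 55.0}
print(needs_rebalancing(usage))  # spread of 37 exceeds 10 -> True
```

Only when this check fires does the rebalancing process start; otherwise the cluster is left untouched.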
@@ -160,6 +162,7 @@ docker run -it --rm -v $(pwd)/proxlb.yaml:/etc/proxlb/proxlb.yaml proxlb
| Version | Image |
|------|:------:|
| latest | cr.gyptazy.com/proxlb/proxlb:latest |
| v1.1.1 | cr.gyptazy.com/proxlb/proxlb:v1.1.1 |
| v1.1.0 | cr.gyptazy.com/proxlb/proxlb:v1.1.0 |
| v1.0.6 | cr.gyptazy.com/proxlb/proxlb:v1.0.6 |
| v1.0.5 | cr.gyptazy.com/proxlb/proxlb:v1.0.5 |
@@ -240,29 +243,37 @@ The following options can be set in the configuration file `proxlb.yaml`:
| | pass | | FooBar | `Str` | Password for the API. (Recommended: Use API token authorization!) |
| | token_id | | proxlb | `Str` | Token ID of the user for the API. |
| | token_secret | | 430e308f-1337-1337-beef-1337beefcafe | `Str` | Secret of the token ID for the API. |
| | ssl_verification | | True | `Bool` | Validate SSL certificates (1) or ignore (0). [values: `1` (default), `0`] |
| | timeout | | 10 | `Int` | Timeout for the Proxmox API in sec. |
| | retries | | 1 | `Int` | How often a connection attempt to the defined API host should be performed. |
| | wait_time | | 1 | `Int` | How many seconds to wait before performing another connection attempt to the API host. |
| `proxmox_cluster` | | | | | |
| | maintenance_nodes | | ['virt66.example.com'] | `List` | A list of Proxmox nodes that are defined to be in maintenance. |
| | ignore_nodes | | [] | `List` | A list of Proxmox nodes that are defined to be ignored. |
| | overprovisioning | | False | `Bool` | Avoids balancing when nodes would become overprovisioned. |
| `balancing` | | | | | |
| | enable | | True | `Bool` | Enables the guest balancing. |
| | enforce_affinity | | True | `Bool` | Enforces affinity/anti-affinity rules, but balancing might become worse. |
| | parallel | | False | `Bool` | Whether guests should be moved in parallel or sequentially. |
| | live | | True | `Bool` | Whether guests should be moved live or shut down. |
| | with_local_disks | | True | `Bool` | Whether balancing of guests should include local disks. |
| | balance_types | | ['vm', 'ct'] | `List` | Defines the types of guests that should be honored. [values: `vm`, `ct`] |
| | max_job_validation | | 1800 | `Int` | How long a job validation may take in seconds. (default: 1800) |
| | balanciness | | 10 | `Int` | The maximum delta of resource usage between the nodes with the highest and lowest usage. |
| | method | | memory | `Str` | The balancing method that should be used. [values: `memory` (default), `cpu`, `disk`] |
| | mode | | used | `Str` | The balancing mode that should be used. [values: `used` (default), `assigned`] |
| `dpm` | | | | | |
| | enable | | True | `Bool` | Enables the Dynamic Power Management functions. |
| | method | | memory | `Str` | The balancing method that should be used. [values: `memory` (default), `cpu`, `disk`] |
| | mode | | static | `Str` | The balancing mode that should be used. [values: `static` (default), `auto`] |
| | cluster_min_free_resources | | 60 | `Int` | The minimum required free resources in percent within the cluster. [values: `60`% (default)] |
| | cluster_min_nodes | | 3 | `Int` | The minimum number of nodes that should remain in a cluster. [values: `3` (default)] |
| `service` | | | | | |
| | daemon | | True | `Bool` | Whether daemon mode should be activated. |
| | `schedule` | | | `Dict` | Schedule config block for rebalancing. |
| | | interval | 12 | `Int` | How often rebalancing should occur in daemon mode. |
| | | format | hours | `Str` | Sets the time format. [values: `hours` (default), `minutes`] |
| | log_level | | INFO | `Str` | Defines the default log level that should be logged. [values: `INFO` (default), `WARNING`, `CRITICAL`, `DEBUG`] |

An example of the configuration file looks like:
@@ -270,11 +281,15 @@ An example of the configuration file looks like:
proxmox_api:
  hosts: ['virt01.example.com', '10.10.10.10', 'fe01::bad:code::cafe']
  user: root@pam
  pass: crazyPassw0rd!
  # API Token method
  # token_id: proxlb
  # token_secret: 430e308f-1337-1337-beef-1337beefcafe
  ssl_verification: True
  timeout: 10
  # API Connection retries
  # retries: 1
  # wait_time: 1

proxmox_cluster:
  maintenance_nodes: ['virt66.example.com']
@@ -293,6 +308,15 @@ balancing:
  method: memory
  mode: used

dpm:
  # DPM requires you to define the WOL (Wake-on-LAN)
  # MAC address for each node in Proxmox.
  enable: True
  method: memory
  mode: static
  cluster_min_free_resources: 60
  cluster_min_nodes: 1

service:
  daemon: True
  schedule:
@@ -343,7 +367,7 @@ As a result, ProxLB will try to place the VMs with the `plb_anti_affinity_ntp` t

**Note:** While this ensures that ProxLB tries to distribute these VMs across different physical hosts within the Proxmox cluster, this may not always work. If more guests are attached to the group than there are nodes in the cluster, they still need to run somewhere; in that case, the node with the most free resources is selected next.

### Ignore VMs
<img align="left" src="https://cdn.gyptazy.com/images/proxlb-ignore-vm-movement.jpg"/> Guests, such as VMs or CTs, can also be completely ignored. This means they won't be affected by any migration (even when (anti-)affinity rules are enforced). To ensure a proper resource evaluation, these guests are still collected and evaluated but simply skipped for balancing actions. There is also a permission aspect to this: while ProxLB's configuration file may be locked down by restrictive file permissions and only be read- and writable by the Proxmox administrators, users and groups may still want to declare on their own that their systems shouldn't be moved. Therefore, these users can simply set a specific tag on the guest object, just like with the (anti-)affinity rules.

To define a guest to be ignored for balancing, users assign a tag with the prefix `plb_ignore_$TAG`:
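For example, assigning the following tag marks a development guest as ignored (this matches the `plb_ignore_dev` tag referenced just below):

```
plb_ignore_dev
```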
@@ -357,6 +381,20 @@ As a result, ProxLB will not migrate this guest with the `plb_ignore_dev` tag to

**Note:** Ignored guests are really ignored. Even when affinity rules are enforced, such guests will be ignored.

### Pin VMs to Specific Hypervisor Nodes
<img align="left" src="https://cdn.gyptazy.com/images/proxlb-tag-node-pinning.jpg"/> Guests, such as VMs or CTs, can also be pinned to specific nodes in the cluster. This can be useful when running applications with special licensing requirements that are only fulfilled on certain nodes. It may also be interesting when some physical hardware is attached to a node that is not generally available within the cluster.

To pin a guest to a specific cluster node, users assign a tag with the prefix `plb_pin_$nodename` to the desired guest:

#### Example for Screenshot
```
plb_pin_node03
```

As a result, ProxLB will pin the guest `dev-vm01` to the node `virt03`.

**Note:** The node names given in the tag are validated. This means ProxLB checks whether the given node name is really part of the cluster. In case of a wrongly defined or unavailable node name, it continues to use the regular processes to make sure the guest keeps running.
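The pinning flow described above (tag lookup plus node-name validation) can be sketched as follows. The helper name and data shapes are hypothetical; only the `plb_pin_` prefix comes from the documentation:

```python
PIN_PREFIX = "plb_pin_"

def resolve_pinned_node(guest_tags, cluster_nodes):
    """Return the node a guest is pinned to, or None when no valid pin tag
    exists. Unknown node names are skipped so the guest falls back to the
    regular balancing process and keeps running."""
    for tag in guest_tags:
        if tag.startswith(PIN_PREFIX):
            node = tag[len(PIN_PREFIX):]
            if node in cluster_nodes:
                return node
    return None

nodes = {"virt01", "virt02", "virt03"}
print(resolve_pinned_node(["plb_pin_virt03"], nodes))   # virt03
print(resolve_pinned_node(["plb_pin_unknown"], nodes))  # None
```

A tag naming an unknown node simply resolves to `None`, which mirrors the documented fallback to regular balancing.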

## Maintenance
<img src="https://cdn.gyptazy.com/images/proxlb-rebalancing-demo.gif"/>
@@ -7,6 +7,9 @@ proxmox_api:
  # token_secret: 430e308f-1337-1337-beef-1337beefcafe
  ssl_verification: True
  timeout: 10
  # API Connection retries
  # retries: 1
  # wait_time: 1

proxmox_cluster:
  maintenance_nodes: ['virt66.example.com']
@@ -25,6 +28,13 @@ balancing:
  method: memory
  mode: used

dpm:
  enable: True
  method: memory
  mode: static
  cluster_min_free_resources: 60
  cluster_min_nodes: 1

service:
  daemon: True
  schedule:
17  debian/changelog  (vendored)
@@ -1,9 +1,24 @@
proxlb (1.1.2~b1) stable; urgency=medium

  * Auto-created 1.1.2 beta 1 release.

 -- Florian Paul Azim Hoberg <gyptazy@gyptazy.com>  Mon, 17 Mar 2025 18:55:02 +0000

proxlb (1.1.1) stable; urgency=medium

  * Fix tag evaluation for VMs for being ignored for further balancing. (Closes: #163)
  * Improve logging verbosity of messages that had a wrong severity. (Closes: #165)
  * Providing the API upstream error message when migration fails in debug mode. (Closes: #205)
  * Change the default behaviour of the daemon mode to active. (Closes: #176)
  * Change the default balancing mode to used instead of assigned. (Closes: #180)
  * Set cpu_used to the CPU usage (a percentage) multiplied by the total number of cores, so that guest cpu_used can be added to node cpu_used and remain meaningful. (Closes: #195)
  * Honor the value when balancing should not be performed and stop balancing. (Closes: #174)
  * Allow the use of minutes instead of hours and only accept hours or minutes in the format. (Closes: #187)
  * Remove hard-coded memory usage from the lowest-usage node and use the method and mode specified in the configuration instead. (Closes: #197)
  * Fix the guest type relationship in the logs when a migration job failed. (Closes: #204)
  * Requery a guest if that running guest reports 0 CPU usage. (Closes: #200)

 -- Florian Paul Azim Hoberg <gyptazy@gyptazy.com>  Sat, 20 Apr 2025 20:55:02 +0000

proxlb (1.1.0) stable; urgency=medium
@@ -11,6 +11,7 @@
2. [Anti-Affinity Rules](#anti-affinity-rules)
3. [Affinity / Anti-Affinity Enforcing](#affinity--anti-affinity-enforcing)
4. [Ignore VMs](#ignore-vms)
5. [Pin VMs to Hypervisor Nodes](#pin-vms-to-hypervisor-nodes)
2. [API Loadbalancing](#api-loadbalancing)
3. [Ignore Host-Nodes or Guests](#ignore-host-nodes-or-guests)
4. [IPv6 Support](#ipv6-support)
@@ -18,6 +19,7 @@
6. [Parallel Migrations](#parallel-migrations)
7. [Run as a Systemd-Service](#run-as-a-systemd-service)
8. [SSL Self-Signed Certificates](#ssl-self-signed-certificates)
9. [Dynamic Power Management (DPM)](#dynamic-power-management)

## Authentication / User Accounts / Permissions
### Authentication
@@ -39,10 +41,9 @@ pveum acl modify / --roles proxlb --users proxlb@pve
*Note: The user management can also be done on the WebUI without invoking the CLI.*

### Creating an API Token for a User
Create an API token for user proxlb@pve with token ID proxlb and deactivated privilege separation:
```
pveum user token add proxlb@pve proxlb --privsep 0
```

Afterwards, you get the token secret returned. You can now add those entries to your ProxLB config. Make sure that you also keep the `user` parameter next to the new token parameters.
@@ -125,6 +126,20 @@ As a result, ProxLB will not migrate this guest with the `plb_ignore_dev` tag to

**Note:** Ignored guests are really ignored. Even when affinity rules are enforced, such guests will be ignored.

### Pin VMs to Specific Hypervisor Nodes
<img align="left" src="https://cdn.gyptazy.com/images/proxlb-tag-node-pinning.jpg"/> Guests, such as VMs or CTs, can also be pinned to specific nodes in the cluster. This can be useful when running applications with special licensing requirements that are only fulfilled on certain nodes. It may also be interesting when some physical hardware is attached to a node that is not generally available within the cluster.

To pin a guest to a specific cluster node, users assign a tag with the prefix `plb_pin_$nodename` to the desired guest:

#### Example for Screenshot
```
plb_pin_node03
```

As a result, ProxLB will pin the guest `dev-vm01` to the node `virt03`.

**Note:** The node names given in the tag are validated. This means ProxLB checks whether the given node name is really part of the cluster. In case of a wrongly defined or unavailable node name, it continues to use the regular processes to make sure the guest keeps running.

### API Loadbalancing
ProxLB supports API loadbalancing, where one or more host objects can be defined as a list. This ensures that you can operate ProxLB without further changes even when one or more nodes are offline or in maintenance. When defining multiple hosts, the first reachable one will be picked.
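The "first reachable host wins" behaviour can be sketched with a plain TCP probe. The function is illustrative rather than ProxLB's implementation, though 8006 is the standard Proxmox API port:

```python
import socket

def pick_api_host(hosts, port=8006, timeout=2.0):
    """Return the first host that accepts TCP connections on the given
    port, or None when no host in the list is reachable."""
    for host in hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return host
        except OSError:
            # DNS failures, refused connections, and timeouts all
            # fall under OSError; try the next host in the list.
            continue
    return None

# Mirrors the `hosts` list from the configuration example
pick_api_host(['virt01.example.com', '10.10.10.10'])
```

Because unreachable hosts are simply skipped, a node in maintenance does not require any configuration change.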

@@ -193,4 +208,34 @@ proxmox_api:
  ssl_verification: False
```

*Note: Disabling SSL certificate validation is not recommended.*

### Dynamic Power Management (DPM)
<img align="left" src="https://cdn.gyptazy.com/images/proxlb-proxmox-node-wakeonlan-wol-mac-dpm.jpg"/> Configuring Dynamic Power Management (DPM) in ProxLB within a Proxmox cluster involves a few critical steps to ensure proper operation. The first consideration is that any node intended for automatic shutdown and startup must support Wake-on-LAN (WOL). This is essential because DPM relies on the ability to power nodes back on remotely. For this to work, the ProxLB instance must be able to reach the target node's MAC address directly over the network.

To make this possible, you must configure the correct MAC address for WOL within the Proxmox web interface. This is done by selecting the node, going to the "System" section, then "Options", and finally setting the "MAC address for Wake-on-LAN". Alternatively, this value can also be submitted using the Proxmox API. Without this MAC address in place, ProxLB will not allow the node to be shut down. This restriction is in place to prevent nodes from being turned off without a way to bring them back online, which could lead to service disruption. By ensuring that each node has a valid WOL MAC address configured, DPM can operate safely and effectively, allowing ProxLB to manage the cluster's power consumption dynamically.

#### Requirements
Using the power management feature within clusters comes with several requirements:
* ProxLB needs to reach the WOL MAC address of the node (plain network)
* WOL must be enabled on the node in general (BIOS/UEFI)
* The related WOL network interface must be defined
* The related WOL network interface MAC address must be defined in Proxmox for the node
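For reference, a Wake-on-LAN magic packet is simply 6 bytes of `0xFF` followed by the target MAC address repeated 16 times, sent via UDP broadcast. The sketch below is a generic WOL sender, not ProxLB code:

```python
import socket

def send_wol(mac, broadcast="255.255.255.255", port=9):
    """Send a Wake-on-LAN magic packet to the given MAC address."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"invalid MAC address: {mac}")
    # Magic packet: 6 bytes of 0xFF, then the MAC repeated 16 times (102 bytes)
    packet = b"\xff" * 6 + mac_bytes * 16
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))

# send_wol("aa:bb:cc:dd:ee:ff")  # wakes the node owning this MAC
```

The packet must be able to reach the target node's network segment, which is why ProxLB needs plain network reachability to the WOL MAC address.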

#### Options
| Section | Option | Sub Option | Example | Type | Description |
|---------|:------:|:----------:|:-------:|:----:|:-----------:|
| `dpm` | | | | | |
| | enable | | True | `Bool` | Enables the Dynamic Power Management functions. |
| | method | | memory | `Str` | The balancing method that should be used. [values: `memory` (default), `cpu`, `disk`] |
| | mode | | static | `Str` | The balancing mode that should be used. [values: `static` (default), `auto`] |
| | cluster_min_free_resources | | 60 | `Int` | The minimum required free resources in percent within the cluster. [values: `60`% (default)] |
| | cluster_min_nodes | | 3 | `Int` | The minimum number of nodes that should remain in a cluster. [values: `3` (default)] |

#### DPM Modes
##### Static
Static mode in DPM lets you set a fixed number of nodes that should always stay powered on in a Proxmox cluster. This is important to keep the cluster working properly, since you need at least three nodes to maintain quorum. The system won't let you go below that limit to avoid breaking cluster functionality.

Besides the minimum number of active nodes, you can also define a baseline for how many free resources (like CPU or RAM) should always be available while the virtual machines are running. If the available resources drop below that level, ProxLB will try to power on more nodes, as long as they're available and can be started. On the other hand, if the cluster has more than enough resources, ProxLB will begin to shut down nodes again, but only until the free resource threshold is reached.

This mode gives you a more stable setup by always keeping a minimum number of nodes ready while still adjusting the rest of the cluster based on resource usage, in a controlled and predictable way.
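Under stated assumptions (a flat list of node names and a single aggregate free-resource percentage), the static-mode policy described above can be sketched as follows; the names and data shapes are invented for illustration:

```python
def plan_power_action(active, standby, free_percent,
                      min_free_percent=60, min_nodes=3):
    """Pick the next DPM action under the static policy: power a standby
    node on when free resources fall below the baseline, power an active
    node off when there is headroom, and never drop below min_nodes."""
    if free_percent < min_free_percent and standby:
        return ("power_on", standby[0])
    if free_percent > min_free_percent and len(active) > min_nodes:
        return ("power_off", active[-1])
    return ("noop", None)

print(plan_power_action(["virt01", "virt02", "virt03"], ["virt04"], 40))
# ('power_on', 'virt04')
```

With only `min_nodes` left active, the policy returns a no-op even when resources are plentiful, which is the quorum safeguard described above.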

@@ -1,5 +1,5 @@
#!/usr/bin/env bash
VERSION="1.1.2b"

sed -i "s/^__version__ = .*/__version__ = \"$VERSION\"/" "proxlb/utils/version.py"
sed -i "s/version=\"[0-9]*\.[0-9]*\.[0-9]*\"/version=\"$VERSION\"/" setup.py
@@ -17,6 +17,7 @@ from utils.logger import SystemdLogger
from utils.cli_parser import CliParser
from utils.config_parser import ConfigParser
from utils.proxmox_api import ProxmoxApi
from models.dpm import DPM
from models.nodes import Nodes
from models.guests import Guests
from models.groups import Groups
@@ -53,14 +54,17 @@ def main():
    while True:
        # Get all required objects from the Proxmox cluster
        meta = {"meta": proxlb_config}
        nodes, cluster = Nodes.get_nodes(proxmox_api, proxlb_config)
        guests = Guests.get_guests(proxmox_api, nodes, meta)
        groups = Groups.get_groups(guests, nodes)

        # Merge obtained objects from the Proxmox cluster for further usage
        proxlb_data = {**meta, **cluster, **nodes, **guests, **groups}
        Helper.log_node_metrics(proxlb_data)

        # Evaluate the dynamic power management for nodes in the cluster
        DPM(proxlb_data)

        # Update the initial node resource assignments
        # by the previously created groups.
        Calculations.set_node_assignments(proxlb_data)
@@ -70,10 +74,14 @@
        Calculations.relocate_guests(proxlb_data)
        Helper.log_node_metrics(proxlb_data, init=False)

        # Perform balancing
        if not cli_args.dry_run or not proxlb_data["meta"]["balancing"].get("enable", False):
            Balancing(proxmox_api, proxlb_data)

        # Perform DPM
        if not cli_args.dry_run:
            DPM.dpm_shutdown_nodes(proxmox_api, proxlb_data)

        # Validate if the JSON output should be
        # printed to stdout
        Helper.print_json(proxlb_data, cli_args.json)
@@ -162,7 +162,7 @@ class Calculations:
        logger.debug("Finished: get_most_free_node.")

    @staticmethod
    def relocate_guests_on_maintenance_nodes(proxlb_data: Dict[str, Any]) -> None:
        """
        Relocates guests that are currently on nodes marked for maintenance to
        nodes with the most available resources.
@@ -192,7 +192,7 @@ class Calculations:
        logger.debug("Finished: get_most_free_node.")

    @staticmethod
    def relocate_guests(proxlb_data: Dict[str, Any]) -> None:
        """
        Relocates guests within the provided data structure to ensure affinity groups are
        placed on nodes with the most free resources.
@@ -225,12 +225,13 @@ class Calculations:
            for guest_name in proxlb_data["groups"]["affinity"][group_name]["guests"]:
                proxlb_data["meta"]["balancing"]["balance_next_guest"] = guest_name
                Calculations.val_anti_affinity(proxlb_data, guest_name)
                Calculations.val_node_relationship(proxlb_data, guest_name)
                Calculations.update_node_resources(proxlb_data)

        logger.debug("Finished: relocate_guests.")

    @staticmethod
    def val_anti_affinity(proxlb_data: Dict[str, Any], guest_name: str) -> None:
        """
        Validates and assigns nodes to guests based on anti-affinity rules.
@@ -279,7 +280,38 @@ class Calculations:
        logger.debug("Finished: val_anti_affinity.")

    @staticmethod
    def val_node_relationship(proxlb_data: Dict[str, Any], guest_name: str) -> None:
        """
        Validates and assigns guests to nodes based on relationships defined by tags.

        Parameters:
            proxlb_data (Dict[str, Any]): The data holding all content of all objects.
            guest_name (str): The name of the guest to be validated and assigned a node.

        Returns:
            None
        """
        logger.debug("Starting: val_node_relationship.")
        proxlb_data["guests"][guest_name]["processed"] = True

        if proxlb_data["guests"][guest_name]["node_relationship"]:
            logger.info(f"Guest '{guest_name}' has a specific relationship defined to node: {proxlb_data['guests'][guest_name]['node_relationship']}. Pinning to node.")

            # Validate that the specified node name is really part of the cluster
            if proxlb_data['guests'][guest_name]['node_relationship'] in proxlb_data["nodes"].keys():
                logger.info(f"Guest '{guest_name}': node {proxlb_data['guests'][guest_name]['node_relationship']} is a known hypervisor node in the cluster.")
                # Pin the guest to the specified hypervisor node.
                proxlb_data["meta"]["balancing"]["balance_next_node"] = proxlb_data['guests'][guest_name]['node_relationship']
            else:
                logger.warning(f"Guest '{guest_name}' has a specific relationship defined to node: {proxlb_data['guests'][guest_name]['node_relationship']} but this node name is not known in the cluster!")

        else:
            logger.info(f"Guest '{guest_name}' does not have any specific node relationships.")

        logger.debug("Finished: val_node_relationship.")

    @staticmethod
    def update_node_resources(proxlb_data: Dict[str, Any]) -> None:
        """
        Updates the resource allocation and usage statistics for nodes when a guest
        is moved from one node to another.
@@ -343,3 +375,68 @@ class Calculations:
        logger.debug(f"Set guest {guest_name} from node {node_current} to node {node_target}.")

    logger.debug("Finished: update_node_resources.")

@staticmethod
def update_cluster_resources(proxlb_data: Dict[str, Any], node: str, action: str) -> None:
    """
    Updates the cluster resource statistics based on the specified action and node.

    This method modifies the cluster-level resource data (such as CPU, memory, disk usage,
    and node counts) based on the action performed ('add' or 'remove') for the specified node.
    It calculates the updated statistics after adding or removing a node and logs the results.

    Parameters:
        proxlb_data (Dict[str, Any]): The data representing the current state of the cluster,
            including node-level statistics for CPU, memory, and disk.
        node (str): The identifier of the node whose resources are being added or removed from the cluster.
        action (str): The action to perform, either 'add' or 'remove'. 'add' will include the node's
            resources in the cluster, while 'remove' will exclude the node's resources.

    Returns:
        None: The function modifies the `proxlb_data` dictionary in place to update the cluster resources.
    """
    logger.debug("Starting: update_cluster_resources.")
    logger.debug(f"DPM: Updating cluster statistics for node {node}. Action: {action}")
    logger.debug(f"DPM: update_cluster_resources - Before {action}: {proxlb_data['cluster']['memory_free_percent']}")

    if action == "add":
        proxlb_data["cluster"]["node_count"] = proxlb_data["cluster"].get("node_count", 0) + 1
        proxlb_data["cluster"]["cpu_total"] = proxlb_data["cluster"].get("cpu_total", 0) + proxlb_data["nodes"][node]["cpu_total"]
        proxlb_data["cluster"]["cpu_used"] = proxlb_data["cluster"].get("cpu_used", 0) + proxlb_data["nodes"][node]["cpu_used"]
        proxlb_data["cluster"]["cpu_free"] = proxlb_data["cluster"].get("cpu_free", 0) + proxlb_data["nodes"][node]["cpu_free"]
        proxlb_data["cluster"]["cpu_free_percent"] = proxlb_data["cluster"].get("cpu_free", 0) / proxlb_data["cluster"].get("cpu_total", 0) * 100
        proxlb_data["cluster"]["cpu_used_percent"] = proxlb_data["cluster"].get("cpu_used", 0) / proxlb_data["cluster"].get("cpu_total", 0) * 100
        proxlb_data["cluster"]["memory_total"] = proxlb_data["cluster"].get("memory_total", 0) + proxlb_data["nodes"][node]["memory_total"]
        proxlb_data["cluster"]["memory_used"] = proxlb_data["cluster"].get("memory_used", 0) + proxlb_data["nodes"][node]["memory_used"]
        proxlb_data["cluster"]["memory_free"] = proxlb_data["cluster"].get("memory_free", 0) + proxlb_data["nodes"][node]["memory_free"]
        proxlb_data["cluster"]["memory_free_percent"] = proxlb_data["cluster"].get("memory_free", 0) / proxlb_data["cluster"].get("memory_total", 0) * 100
        proxlb_data["cluster"]["memory_used_percent"] = proxlb_data["cluster"].get("memory_used", 0) / proxlb_data["cluster"].get("memory_total", 0) * 100
        proxlb_data["cluster"]["disk_total"] = proxlb_data["cluster"].get("disk_total", 0) + proxlb_data["nodes"][node]["disk_total"]
        proxlb_data["cluster"]["disk_used"] = proxlb_data["cluster"].get("disk_used", 0) + proxlb_data["nodes"][node]["disk_used"]
        proxlb_data["cluster"]["disk_free"] = proxlb_data["cluster"].get("disk_free", 0) + proxlb_data["nodes"][node]["disk_free"]
        proxlb_data["cluster"]["disk_free_percent"] = proxlb_data["cluster"].get("disk_free", 0) / proxlb_data["cluster"].get("disk_total", 0) * 100
        proxlb_data["cluster"]["disk_used_percent"] = proxlb_data["cluster"].get("disk_used", 0) / proxlb_data["cluster"].get("disk_total", 0) * 100
        proxlb_data["cluster"]["node_count_available"] = proxlb_data["cluster"].get("node_count_available", 0) + 1
        proxlb_data["cluster"]["node_count_overall"] = proxlb_data["cluster"].get("node_count_overall", 0) + 1

    if action == "remove":
        proxlb_data["cluster"]["node_count"] = proxlb_data["cluster"].get("node_count", 0) - 1
        proxlb_data["cluster"]["cpu_total"] = proxlb_data["cluster"].get("cpu_total", 0) - proxlb_data["nodes"][node]["cpu_total"]
        proxlb_data["cluster"]["cpu_used"] = proxlb_data["cluster"].get("cpu_used", 0) - proxlb_data["nodes"][node]["cpu_used"]
        proxlb_data["cluster"]["cpu_free"] = proxlb_data["cluster"].get("cpu_free", 0) - proxlb_data["nodes"][node]["cpu_free"]
        proxlb_data["cluster"]["cpu_free_percent"] = proxlb_data["cluster"].get("cpu_free", 0) / proxlb_data["cluster"].get("cpu_total", 0) * 100
        proxlb_data["cluster"]["cpu_used_percent"] = proxlb_data["cluster"].get("cpu_used", 0) / proxlb_data["cluster"].get("cpu_total", 0) * 100
        proxlb_data["cluster"]["memory_total"] = proxlb_data["cluster"].get("memory_total", 0) - proxlb_data["nodes"][node]["memory_total"]
        proxlb_data["cluster"]["memory_used"] = proxlb_data["cluster"].get("memory_used", 0) - proxlb_data["nodes"][node]["memory_used"]
        proxlb_data["cluster"]["memory_free"] = proxlb_data["cluster"].get("memory_free", 0) - proxlb_data["nodes"][node]["memory_free"]
        proxlb_data["cluster"]["memory_free_percent"] = proxlb_data["cluster"].get("memory_free", 0) / proxlb_data["cluster"].get("memory_total", 0) * 100
        proxlb_data["cluster"]["memory_used_percent"] = proxlb_data["cluster"].get("memory_used", 0) / proxlb_data["cluster"].get("memory_total", 0) * 100
        proxlb_data["cluster"]["disk_total"] = proxlb_data["cluster"].get("disk_total", 0) - proxlb_data["nodes"][node]["disk_total"]
        proxlb_data["cluster"]["disk_used"] = proxlb_data["cluster"].get("disk_used", 0) - proxlb_data["nodes"][node]["disk_used"]
        proxlb_data["cluster"]["disk_free"] = proxlb_data["cluster"].get("disk_free", 0) - proxlb_data["nodes"][node]["disk_free"]
        proxlb_data["cluster"]["disk_free_percent"] = proxlb_data["cluster"].get("disk_free", 0) / proxlb_data["cluster"].get("disk_total", 0) * 100
        proxlb_data["cluster"]["disk_used_percent"] = proxlb_data["cluster"].get("disk_used", 0) / proxlb_data["cluster"].get("disk_total", 0) * 100
        proxlb_data["cluster"]["node_count_available"] = proxlb_data["cluster"].get("node_count_available", 0) - 1

    logger.debug(f"DPM: update_cluster_resources - After {action}: {proxlb_data['cluster']['memory_free_percent']}")
    logger.debug("Finished: update_cluster_resources.")
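The add/remove branches above repeat the same total/used/free/percent arithmetic for each metric. A compact sketch of that bookkeeping (the `apply_node_to_cluster` helper and flat key names are illustrative, not ProxLB's actual structure; a zero-total guard is added for the case where the last node is removed):

```python
from typing import Any, Dict

METRICS = ("cpu", "memory", "disk")

def apply_node_to_cluster(cluster: Dict[str, Any], node: Dict[str, Any], action: str) -> None:
    """Add or remove one node's resources from the cluster totals, in place."""
    sign = 1 if action == "add" else -1
    for metric in METRICS:
        for field in ("total", "used", "free"):
            key = f"{metric}_{field}"
            cluster[key] = cluster.get(key, 0) + sign * node[key]
        total = cluster[f"{metric}_total"]
        # Guard against division by zero when the last node is removed.
        cluster[f"{metric}_free_percent"] = cluster[f"{metric}_free"] / total * 100 if total else 0
        cluster[f"{metric}_used_percent"] = cluster[f"{metric}_used"] / total * 100 if total else 0
    cluster["node_count_available"] = cluster.get("node_count_available", 0) + sign

# A node with 100 units total, 40 used, 60 free per metric:
node = {f"{m}_{f}": v for m in METRICS for f, v in (("total", 100), ("used", 40), ("free", 60))}
cluster: Dict[str, Any] = {}
apply_node_to_cluster(cluster, node, "add")
print(cluster["memory_free_percent"])  # → 60.0
```

This keeps the percentages consistent after every add or remove, which is what `dpm_static` relies on when it re-reads `{method}_free_percent` after each removal.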
255 proxlb/models/dpm.py Normal file
@@ -0,0 +1,255 @@
"""
|
||||
The DPM (Dynamic Power Management) class is responsible for the dynamic management
|
||||
of nodes within a Proxmox cluster, optimizing resource utilization by controlling
|
||||
node power states based on specified schedules and conditions.
|
||||
|
||||
This class provides functionality for:
|
||||
- Tracking and validating schedules for dynamic power management.
|
||||
- Shutting down nodes that are underutilized or not needed.
|
||||
- Starting up nodes using Wake-on-LAN (WOL) based on certain conditions.
|
||||
- Ensuring that nodes are properly flagged for maintenance and startup/shutdown actions.
|
||||
|
||||
The DPM class can operate in different modes, such as static and automatic,
|
||||
to either perform predefined actions or dynamically adjust based on real-time resource usage.
|
||||
"""
|
||||
|
||||
__author__ = "Florian Paul Azim Hoberg <gyptazy>"
|
||||
__copyright__ = "Copyright (C) 2025 Florian Paul Azim Hoberg (@gyptazy)"
|
||||
__license__ = "GPL-3.0"
|
||||
|
||||
|
||||
import proxmoxer
|
||||
from typing import Dict, Any
|
||||
from models.calculations import Calculations
|
||||
from utils.logger import SystemdLogger
|
||||
|
||||
logger = SystemdLogger()
|
||||
|
||||
|
||||
class DPM:
    """
    The DPM (Dynamic Power Management) class is responsible for the dynamic management
    of nodes within a Proxmox cluster, optimizing resource utilization by controlling
    node power states based on specified schedules and conditions.

    This class provides functionality for:
    - Tracking and validating schedules for dynamic power management.
    - Shutting down nodes that are underutilized or not needed.
    - Starting up nodes using Wake-on-LAN (WOL) based on certain conditions.
    - Ensuring that nodes are properly flagged for maintenance and startup/shutdown actions.

    The DPM class can operate in different modes, such as static and automatic,
    to either perform predefined actions or dynamically adjust based on real-time resource usage.

    Attributes:
        None directly defined for the class; instead, all actions are based on input data
        and interactions with the Proxmox API and other helper functions.

    Methods:
        __init__(proxlb_data: Dict[str, Any]):
            Initializes the DPM class, checking whether DPM is enabled and operating in the
            appropriate mode (static or auto).

        dpm_static(proxlb_data: Dict[str, Any]) -> None:
            Evaluates the cluster's resource availability and performs static power management
            actions by removing nodes that are not required.

        dpm_shutdown_nodes(proxmox_api, proxlb_data) -> None:
            Shuts down nodes flagged for DPM shutdown by using the Proxmox API, ensuring
            that Wake-on-LAN (WOL) is available for proper node recovery.

        dpm_startup_nodes(proxmox_api, proxlb_data) -> None:
            Powers on nodes that are flagged for startup and are not in maintenance mode,
            leveraging Wake-on-LAN (WOL) functionality.

        dpm_validate_wol_mac(proxmox_api, node) -> None:
            Validates and retrieves the Wake-on-LAN (WOL) MAC address for a given node,
            ensuring that a valid address is set for powering on the node remotely.
    """

    def __init__(self, proxlb_data: Dict[str, Any]):
        """
        Initializes the DPM class with the provided ProxLB data.

        Args:
            proxlb_data (dict): The data required for balancing VMs and CTs.
        """
        logger.debug("Starting: dpm class.")

        if proxlb_data["meta"].get("dpm", {}).get("enable", False):
            logger.debug("DPM function is enabled.")
            mode = proxlb_data["meta"].get("dpm", {}).get("mode", None)

            if mode == "static":
                self.dpm_static(proxlb_data)
            elif mode == "auto":
                self.dpm_auto(proxlb_data)
        else:
            logger.debug("DPM function is not enabled.")

        logger.debug("Finished: dpm class.")
    def dpm_static(self, proxlb_data: Dict[str, Any]) -> None:
        """
        Evaluates and performs static Dynamic Power Management (DPM) actions based on the current cluster state.

        This method monitors cluster resource availability and attempts to reduce the number of active nodes
        when sufficient free resources are available. It ensures a minimum number of nodes remains active
        and prioritizes shutting down nodes with the least utilized resources to minimize impact. Nodes selected
        for shutdown are marked for maintenance and flagged for DPM shutdown.

        Parameters:
            proxlb_data (Dict[str, Any]): A dictionary containing metadata, cluster status, and node-level information
                including resource utilization, configuration settings, and DPM thresholds.

        Returns:
            None: Modifies the input dictionary in-place to reflect updated cluster state and node flags.
        """
        logger.debug("Starting: dpm_static.")

        method = proxlb_data["meta"].get("dpm", {}).get("method", "memory")
        cluster_nodes_overall = proxlb_data["cluster"]["node_count_overall"]
        cluster_nodes_available = proxlb_data["cluster"]["node_count_available"]
        cluster_free_resources_percent = int(proxlb_data["cluster"][f"{method}_free_percent"])
        cluster_free_resources_req_min = proxlb_data["meta"].get("dpm", {}).get("cluster_min_free_resources", 0)
        cluster_min_nodes = proxlb_data["meta"].get("dpm", {}).get("cluster_min_nodes", 3)
        logger.debug(f"DPM: Cluster Nodes: {cluster_nodes_overall} | Nodes available: {cluster_nodes_available} | Nodes offline: {cluster_nodes_overall - cluster_nodes_available}")

        # Only proceed removing nodes if the cluster has enough free resources
        while cluster_free_resources_percent > cluster_free_resources_req_min:
            logger.debug(f"DPM: More free resources ({cluster_free_resources_percent}%) available than required ({cluster_free_resources_req_min}%). DPM evaluation starting...")

            # Ensure that at least the defined minimum of nodes is left
            if cluster_nodes_available > cluster_min_nodes:
                logger.debug(f"DPM: A minimum of {cluster_min_nodes} nodes is required. {cluster_nodes_available} are available. Proceeding...")

                # Get the node with the fewest used resources to keep migrations low
                Calculations.get_most_free_node(proxlb_data, False)
                dpm_node = proxlb_data["meta"]["balancing"]["balance_next_node"]

                # Perform cluster calculation for evaluating how many nodes can safely leave
                # the cluster. Further object calculations are processed afterwards by
                # the calculation class.
                logger.debug(f"DPM: Removing node {dpm_node} from cluster. Node will be turned off later.")
                Calculations.update_cluster_resources(proxlb_data, dpm_node, "remove")
                cluster_free_resources_percent = int(proxlb_data["cluster"][f"{method}_free_percent"])
                cluster_nodes_available = proxlb_data["cluster"]["node_count_available"]
                logger.debug(f"DPM: Free cluster resources changed to: {cluster_free_resources_percent}%.")

                # Set node to maintenance and flag it for DPM shutdown
                proxlb_data["nodes"][dpm_node]["maintenance"] = True
                proxlb_data["nodes"][dpm_node]["dpm_shutdown"] = True
            else:
                logger.warning(f"DPM: A minimum of {cluster_min_nodes} nodes is required. {cluster_nodes_available} are available. Cannot proceed!")
                break

        logger.debug(f"DPM: Not enough free resources ({cluster_free_resources_percent}%) available; required: {cluster_free_resources_req_min}%. DPM evaluation stopped.")
        logger.debug("Finished: dpm_static.")
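The core decision in `dpm_static` is: keep removing the emptiest node while free resources stay above a configured floor and at least `cluster_min_nodes` remain. A self-contained sketch of that loop (the `plan_shutdowns` function and its per-node memory dicts are illustrative; unlike the method above, it also verifies the post-removal state before committing a shutdown):

```python
from typing import Dict, List

def plan_shutdowns(node_free_mem: Dict[str, int], node_total_mem: Dict[str, int],
                   min_free_percent: int, min_nodes: int) -> List[str]:
    """Return nodes that can be powered off while keeping the cluster above its thresholds."""
    active = dict(node_total_mem)
    to_shutdown: List[str] = []
    while len(active) > min_nodes:
        free = sum(node_free_mem[n] for n in active)
        total = sum(active.values())
        if free / total * 100 <= min_free_percent:
            break  # not enough headroom to drop another node
        # Prefer the node with the fewest used resources to keep migrations cheap.
        candidate = max(active, key=lambda n: node_free_mem[n] / node_total_mem[n])
        rest_free = free - node_free_mem[candidate]
        rest_total = total - node_total_mem[candidate]
        # Only commit if the cluster *after* removal still satisfies the floor.
        if rest_total == 0 or rest_free / rest_total * 100 <= min_free_percent:
            break
        active.pop(candidate)
        to_shutdown.append(candidate)
    return to_shutdown

free = {"node01": 90, "node02": 80, "node03": 70, "node04": 20}
total = {n: 100 for n in free}
print(plan_shutdowns(free, total, 50, 2))  # → ['node01']
```

With a 50% free-memory floor and a 2-node minimum, only `node01` can leave: removing `node02` as well would drop the remaining cluster to 45% free.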
    @staticmethod
    def dpm_shutdown_nodes(proxmox_api, proxlb_data: Dict[str, Any]) -> None:
        """
        Shuts down cluster nodes that are marked for maintenance and flagged for DPM shutdown.

        This method iterates through the cluster nodes in the provided data and attempts to
        power off any node that has both the 'maintenance' and 'dpm_shutdown' flags set.
        It communicates with the Proxmox API to issue shutdown commands and logs any failures.

        Parameters:
            proxmox_api: An instance of the Proxmox API client used to issue node shutdown commands.
            proxlb_data: A dictionary containing node status information, including flags for
                maintenance and DPM shutdown readiness.

        Returns:
            None: Performs shutdown operations and logs outcomes; modifies no data directly.
        """
        logger.debug("Starting: dpm_shutdown_nodes.")
        for node, node_info in proxlb_data["nodes"].items():

            if node_info["maintenance"] and node_info["dpm_shutdown"]:
                logger.debug(f"DPM: Node: {node} is flagged for maintenance mode and to be powered off.")

                # Ensure that the node has a valid WOL MAC defined. Otherwise,
                # we would be unable to power on that system again.
                valid_wol_mac = DPM.dpm_validate_wol_mac(proxmox_api, node)

                if valid_wol_mac:
                    try:
                        logger.debug(f"DPM: Shutting down node: {node}.")
                        proxmox_api.nodes(node).status.post(command="shutdown")
                    except proxmoxer.core.ResourceException as proxmox_api_error:
                        logger.critical(f"DPM: Error while powering off node {node}.")
                        logger.debug(f"DPM: Error while powering off node {node}: {proxmox_api_error}")
                else:
                    logger.critical(f"DPM: Node {node} cannot be powered off due to missing WOL MAC. Please define a valid WOL MAC for this node.")

        logger.debug("Finished: dpm_shutdown_nodes.")
    @staticmethod
    def dpm_startup_nodes(proxmox_api, proxlb_data: Dict[str, Any]) -> None:
        """
        Starts up cluster nodes that are marked for DPM start up.

        This method iterates through the cluster nodes in the provided data and attempts to
        power on any node that is not flagged as 'maintenance' but flagged as 'dpm_startup'.
        It communicates with the Proxmox API to issue power-on commands and logs any failures.

        Parameters:
            proxmox_api: An instance of the Proxmox API client used to issue node startup commands.
            proxlb_data: A dictionary containing node status information, including flags for
                maintenance and DPM startup readiness.

        Returns:
            None: Performs power-on operations and logs outcomes; modifies no data directly.
        """
        logger.debug("Starting: dpm_startup_nodes.")
        for node, node_info in proxlb_data["nodes"].items():

            if not node_info["maintenance"]:
                logger.debug(f"DPM: Node: {node} is not in maintenance mode.")

                if node_info["dpm_startup"]:
                    logger.debug(f"DPM: Node: {node} is flagged to be started.")

                    try:
                        logger.debug(f"DPM: Powering on node: {node}.")
                        # Important: This requires Proxmox operators to define the
                        # WOL address for each node within the Proxmox web interface.
                        proxmox_api.nodes().wakeonlan.post(node=node)
                    except proxmoxer.core.ResourceException as proxmox_api_error:
                        logger.critical(f"DPM: Error while powering on node {node}.")
                        logger.debug(f"DPM: Error while powering on node {node}: {proxmox_api_error}")

        logger.debug("Finished: dpm_startup_nodes.")
    @staticmethod
    def dpm_validate_wol_mac(proxmox_api, node: str) -> str:
        """
        Retrieves and validates the Wake-on-LAN (WOL) MAC address for a specified node.

        This method fetches the MAC address configured for Wake-on-LAN (WOL) from the Proxmox API.
        If the MAC address is found, it is logged. In case of failure to retrieve the address,
        a critical log is generated indicating the absence of a WOL MAC address for the node.

        Parameters:
            proxmox_api: An instance of the Proxmox API client used to query node configurations.
            node: The identifier (name or ID) of the node for which the WOL MAC address is to be validated.

        Returns:
            node_wol_mac_address: The WOL MAC address for the specified node if found, otherwise `None`.
        """
        logger.debug("Starting: dpm_validate_wol_mac.")

        try:
            logger.debug(f"DPM: Getting WOL MAC address for node {node} from API.")
            node_wol_mac_address = proxmox_api.nodes(node).config.get(property="wakeonlan")
            node_wol_mac_address = node_wol_mac_address.get("wakeonlan")
            logger.debug(f"DPM: Node {node} has MAC address: {node_wol_mac_address} for WOL.")
        except proxmoxer.core.ResourceException:
            logger.debug(f"DPM: Failed to get WOL MAC address for node {node} from API.")
            node_wol_mac_address = None
            logger.critical(f"DPM: Node {node} has no MAC address defined for WOL.")

        logger.debug("Finished: dpm_validate_wol_mac.")
        return node_wol_mac_address
@@ -90,6 +90,7 @@ class Guests:
                guests['guests'][guest['name']]['affinity_groups'] = Tags.get_affinity_groups(guests['guests'][guest['name']]['tags'])
                guests['guests'][guest['name']]['anti_affinity_groups'] = Tags.get_anti_affinity_groups(guests['guests'][guest['name']]['tags'])
                guests['guests'][guest['name']]['ignore'] = Tags.get_ignore(guests['guests'][guest['name']]['tags'])
                guests['guests'][guest['name']]['node_relationship'] = Tags.get_node_relationship(guests['guests'][guest['name']]['tags'])
                guests['guests'][guest['name']]['type'] = 'vm'
            else:
                logger.debug(f'Metric for VM {guest["name"]} ignored because VM is not running.')
@@ -115,6 +116,7 @@ class Guests:
                guests['guests'][guest['name']]['affinity_groups'] = Tags.get_affinity_groups(guests['guests'][guest['name']]['tags'])
                guests['guests'][guest['name']]['anti_affinity_groups'] = Tags.get_anti_affinity_groups(guests['guests'][guest['name']]['tags'])
                guests['guests'][guest['name']]['ignore'] = Tags.get_ignore(guests['guests'][guest['name']]['tags'])
                guests['guests'][guest['name']]['node_relationship'] = Tags.get_node_relationship(guests['guests'][guest['name']]['tags'])
                guests['guests'][guest['name']]['type'] = 'ct'
            else:
                logger.debug(f'Metric for CT {guest["name"]} ignored because CT is not running.')
@@ -54,6 +54,7 @@ class Nodes:
        """
        logger.debug("Starting: get_nodes.")
        nodes = {"nodes": {}}
        cluster = {"cluster": {}}

        for node in proxmox_api.nodes.get():
            # Ignoring a node results in ignoring all guests placed on the ignored node!
@@ -61,6 +62,8 @@ class Nodes:
            nodes["nodes"][node["node"]] = {}
            nodes["nodes"][node["node"]]["name"] = node["node"]
            nodes["nodes"][node["node"]]["maintenance"] = False
            nodes["nodes"][node["node"]]["dpm_shutdown"] = False
            nodes["nodes"][node["node"]]["dpm_startup"] = False
            nodes["nodes"][node["node"]]["cpu_total"] = node["maxcpu"]
            nodes["nodes"][node["node"]]["cpu_assigned"] = 0
            nodes["nodes"][node["node"]]["cpu_used"] = node["cpu"] * node["maxcpu"]
@@ -87,8 +90,35 @@ class Nodes:
            if Nodes.set_node_maintenance(proxlb_config, node["node"]):
                nodes["nodes"][node["node"]]["maintenance"] = True

            # Generate the initial cluster statistics within the same loop to avoid a further one.
            logger.debug(f"Updating cluster statistics by online node {node['node']}.")
            cluster["cluster"]["node_count"] = cluster["cluster"].get("node_count", 0) + 1
            cluster["cluster"]["cpu_total"] = cluster["cluster"].get("cpu_total", 0) + nodes["nodes"][node["node"]]["cpu_total"]
            cluster["cluster"]["cpu_used"] = cluster["cluster"].get("cpu_used", 0) + nodes["nodes"][node["node"]]["cpu_used"]
            cluster["cluster"]["cpu_free"] = cluster["cluster"].get("cpu_free", 0) + nodes["nodes"][node["node"]]["cpu_free"]
            cluster["cluster"]["cpu_free_percent"] = cluster["cluster"].get("cpu_free", 0) / cluster["cluster"].get("cpu_total", 0) * 100
            cluster["cluster"]["cpu_used_percent"] = cluster["cluster"].get("cpu_used", 0) / cluster["cluster"].get("cpu_total", 0) * 100
            cluster["cluster"]["memory_total"] = cluster["cluster"].get("memory_total", 0) + nodes["nodes"][node["node"]]["memory_total"]
            cluster["cluster"]["memory_used"] = cluster["cluster"].get("memory_used", 0) + nodes["nodes"][node["node"]]["memory_used"]
            cluster["cluster"]["memory_free"] = cluster["cluster"].get("memory_free", 0) + nodes["nodes"][node["node"]]["memory_free"]
            cluster["cluster"]["memory_free_percent"] = cluster["cluster"].get("memory_free", 0) / cluster["cluster"].get("memory_total", 0) * 100
            cluster["cluster"]["memory_used_percent"] = cluster["cluster"].get("memory_used", 0) / cluster["cluster"].get("memory_total", 0) * 100
            cluster["cluster"]["disk_total"] = cluster["cluster"].get("disk_total", 0) + nodes["nodes"][node["node"]]["disk_total"]
            cluster["cluster"]["disk_used"] = cluster["cluster"].get("disk_used", 0) + nodes["nodes"][node["node"]]["disk_used"]
            cluster["cluster"]["disk_free"] = cluster["cluster"].get("disk_free", 0) + nodes["nodes"][node["node"]]["disk_free"]
            cluster["cluster"]["disk_free_percent"] = cluster["cluster"].get("disk_free", 0) / cluster["cluster"].get("disk_total", 0) * 100
            cluster["cluster"]["disk_used_percent"] = cluster["cluster"].get("disk_used", 0) / cluster["cluster"].get("disk_total", 0) * 100

            cluster["cluster"]["node_count_available"] = cluster["cluster"].get("node_count_available", 0) + 1
            cluster["cluster"]["node_count_overall"] = cluster["cluster"].get("node_count_overall", 0) + 1

        # Update the cluster statistics by offline nodes to have the overall count of nodes in the cluster
        else:
            logger.debug(f"Updating cluster statistics by offline node {node['node']}.")
            cluster["cluster"]["node_count_overall"] = cluster["cluster"].get("node_count_overall", 0) + 1

        logger.debug("Finished: get_nodes.")
        return nodes
        return nodes, cluster

    @staticmethod
    def set_node_maintenance(proxlb_config: Dict[str, Any], node_name: str) -> Dict[str, Any]:
@@ -151,3 +151,29 @@ class Tags:

        logger.debug("Finished: get_ignore.")
        return ignore_tag

    @staticmethod
    def get_node_relationship(tags: List[str]) -> str:
        """
        Get a node relationship tag for a guest from the Proxmox cluster by the API to pin
        a guest to a node.

        This method retrieves a relationship tag between a guest and a specific
        hypervisor node to pin the guest to that node (e.g., for licensing reasons).

        Args:
            tags (List): A list holding all defined tags for a given guest.

        Returns:
            str: The related hypervisor node name, or False if no pin tag is set.
        """
        logger.debug("Starting: get_node_relationship.")
        node_relationship_tag = False

        if len(tags) > 0:
            for tag in tags:
                if tag.startswith("plb_pin_"):
                    node_relationship_tag = tag.replace("plb_pin_", "", 1)

        logger.debug("Finished: get_node_relationship.")
        return node_relationship_tag
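The tag convention is `plb_pin_<node>` on the guest, e.g. `plb_pin_node03`. A standalone parser sketch (the `get_pinned_node` name and `None` sentinel are illustrative — the method above returns `False` when no pin tag is set; the explicit length check also ignores a bare `plb_pin` tag with no node name):

```python
from typing import List, Optional

PIN_PREFIX = "plb_pin_"

def get_pinned_node(tags: List[str]) -> Optional[str]:
    """Return the node name from the last `plb_pin_<node>` tag, or None."""
    pinned = None
    for tag in tags:
        # Require a non-empty node name after the prefix.
        if tag.startswith(PIN_PREFIX) and len(tag) > len(PIN_PREFIX):
            pinned = tag[len(PIN_PREFIX):]  # last matching tag wins
    return pinned

print(get_pinned_node(["ssd", "plb_pin_node03"]))  # → node03
print(get_pinned_node(["ssd"]))                    # → None
```

Slicing off the prefix length (rather than `str.replace`) guarantees only the leading marker is stripped, even if the node name itself happens to contain the substring.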
@@ -20,12 +20,13 @@ except ImportError:
    PROXMOXER_PRESENT = False
import random
import socket
import sys
try:
    import requests
    REQUESTS_PRESENT = True
except ImportError:
    REQUESTS_PRESENT = False
    import sys
import time
try:
    import urllib3
    URLLIB3_PRESENT = True
@@ -141,7 +142,7 @@ class ProxmoxApi:

        logger.debug("Finished: validate_config.")

    def api_connect_get_hosts(self, proxmox_api_endpoints: list) -> str:
    def api_connect_get_hosts(self, proxlb_config, proxmox_api_endpoints: list) -> str:
        """
        Perform a connectivity test to determine a working host for the Proxmox API.

@@ -152,6 +153,7 @@ class ProxmoxApi:
        are found, one is chosen at random to distribute the load across the cluster.

        Args:
            proxlb_config (Dict[str, Any]): A dictionary containing the ProxLB configuration.
            proxmox_api_endpoints (list): A list of Proxmox API endpoints to test.

        Returns:
@@ -175,21 +177,25 @@ class ProxmoxApi:
            logger.critical("No proxmox_api hosts are defined.")
            sys.exit(1)

        # Get a suitable Proxmox API endpoint. Therefore, we check whether we have
        # a single Proxmox API endpoint or multiple ones. If only one, we can return
        # it immediately. If it does not work, urllib will raise an
        # exception during the connection attempt.
        if len(proxmox_api_endpoints) == 1:
            return proxmox_api_endpoints[0]

        # If we have multiple Proxmox API endpoints, we need to check each one by
        # doing a connection attempt for IPv4 and IPv6. If we find a working one,
        # we return it. This allows us to define multiple endpoints in a cluster.
        validated_api_hosts = []
        for host in proxmox_api_endpoints:
            validated = self.test_api_proxmox_host(host)
            if validated:
                validated_api_hosts.append(validated)

        # Get or set a default value for the maximum number of retries when connecting to
        # the Proxmox API
        api_connection_retries = proxlb_config["proxmox_api"].get("retries", 1)
        api_connection_wait_time = proxlb_config["proxmox_api"].get("wait_time", 1)

            for api_connection_attempt in range(api_connection_retries):
                validated = self.test_api_proxmox_host(host)
                if validated:
                    validated_api_hosts.append(validated)
                    break
                else:
                    logger.warning(f"Attempt {api_connection_attempt + 1}/{api_connection_retries} failed for host {host}. Retrying in {api_connection_wait_time} seconds...")
                    time.sleep(api_connection_wait_time)

        if len(validated_api_hosts) > 0:
            # Choose a random host to distribute the load across the cluster
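The retry mechanism introduced here (#157) is a bounded probe loop: try a host up to `retries` times, sleeping `wait_time` seconds between failed attempts. A minimal standalone sketch (the `connect_with_retries` helper and the `flaky` probe are illustrative; ProxLB's own probe is `test_api_proxmox_host`):

```python
import time
from typing import Callable

def connect_with_retries(probe: Callable[[str], bool], host: str,
                         retries: int = 1, wait_time: float = 1.0) -> bool:
    """Probe a host up to `retries` times, sleeping `wait_time` seconds between attempts."""
    for attempt in range(1, retries + 1):
        if probe(host):
            return True
        if attempt < retries:
            # Mirror the warning message logged by api_connect_get_hosts.
            print(f"attempt {attempt}/{retries} failed for {host}; retrying in {wait_time}s")
            time.sleep(wait_time)
    return False

calls = []
def flaky(host: str) -> bool:
    calls.append(host)
    return len(calls) >= 3  # succeeds on the third probe

print(connect_with_retries(flaky, "pve01.local", retries=5, wait_time=0))  # → True
print(len(calls))  # → 3
```

The defaults (`retries=1`, `wait_time=1`) match the fallbacks read from `proxlb_config["proxmox_api"]` above, so existing configurations without the new keys keep the old single-attempt behaviour.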
@@ -307,7 +313,7 @@ class ProxmoxApi:
            sock.close()
            logger.warning(f"Host {host} is unreachable on IPv6 for tcp/{port}.")

        logger.debug("Finished: test_api_proxmox_host_ipv4.")
        logger.debug("Finished: test_api_proxmox_host_ipv6.")
        return False

    def test_api_user_permissions(self, proxmox_api: any):
@@ -372,7 +378,7 @@ class ProxmoxApi:
        self.validate_config(proxlb_config)

        # Get a valid Proxmox API endpoint
        proxmox_api_endpoint = self.api_connect_get_hosts(proxlb_config.get("proxmox_api", {}).get("hosts", []))
        proxmox_api_endpoint = self.api_connect_get_hosts(proxlb_config, proxlb_config.get("proxmox_api", {}).get("hosts", []))

        # Disable warnings for SSL certificate validation
        if not proxlb_config.get("proxmox_api").get("ssl_verification", True):
@@ -3,5 +3,5 @@ __app_desc__ = "A DRS alike loadbalancer for Proxmox clusters."
__author__ = "Florian Paul Azim Hoberg <gyptazy>"
__copyright__ = "Copyright (C) 2025 Florian Paul Azim Hoberg (@gyptazy)"
__license__ = "GPL-3.0"
__version__ = "1.1.1"
__version__ = "1.1.2b"
__url__ = "https://github.com/gyptazy/ProxLB"
@@ -1,11 +1,11 @@
[Unit]
Description=ProxLB - A loadbalancer for Proxmox clusters
After=pveproxy.service
Wants=pveproxy.service
After=network-online.target pveproxy.service
Wants=network-online.target pveproxy.service

[Service]
ExecStart=python3 /usr/lib/python3/dist-packages/proxlb/main.py -c /etc/proxlb/proxlb.yaml
User=plb

[Install]
WantedBy=multi-user.target
WantedBy=multi-user.target
2 setup.py
@@ -2,7 +2,7 @@ from setuptools import setup

setup(
    name="proxlb",
    version="1.1.1",
    version="1.1.2b",
    description="A DRS alike loadbalancer for Proxmox clusters.",
    long_description="An advanced DRS alike loadbalancer for Proxmox clusters that also supports maintenance modes and affinity/anti-affinity rules.",
    author="Florian Paul Azim Hoberg",