feat: Add Prometheus metrics for backup monitoring #329

Closed
opened 2026-04-05 16:16:13 +02:00 by MrUnknownDE (Owner) · 0 comments

Originally created by @diegosarina on 12/26/2025

🎯 Summary

This PR adds Prometheus metrics support to Databasus for monitoring backup operations. It exposes metrics for backup status, duration, size, and failures.

What was added

Metrics Package (internal/features/metrics/)

  • Gauges: Track current backup status and in-progress backups
  • Counters: Count total backup attempts by status
  • Histograms: Track backup duration and size distributions
  • Labels: database type, database ID, database name, workspace ID, and status
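
A minimal standard-library sketch of what labeled gauges and counters mean in practice: every distinct combination of label values is its own series, so backups of test and test2 accumulate separately. This is not the PR's code; the real package most likely builds on the official Prometheus Go client, and the names labelSet, vec, Inc, Add and Get are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// labelSet holds the label values of one series (hypothetical type), e.g.
// database_type, database_id, database_name, workspace_id and status.
type labelSet map[string]string

// key renders labels in sorted order so equal label sets hit the same series.
func (l labelSet) key() string {
	names := make([]string, 0, len(l))
	for n := range l {
		names = append(names, n)
	}
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, n := range names {
		parts = append(parts, fmt.Sprintf("%s=%q", n, l[n]))
	}
	return "{" + strings.Join(parts, ",") + "}"
}

// vec stores one float64 value per distinct label set.
type vec struct {
	mu     sync.Mutex
	series map[string]float64
}

func newVec() *vec { return &vec{series: map[string]float64{}} }

// Add is the gauge operation: delta may be negative.
func (v *vec) Add(l labelSet, delta float64) {
	v.mu.Lock()
	defer v.mu.Unlock()
	v.series[l.key()] += delta
}

// Inc is the only operation a counter should use, since counters never go down.
func (v *vec) Inc(l labelSet) { v.Add(l, 1) }

// Get returns the current value of one series (0 if it was never touched).
func (v *vec) Get(l labelSet) float64 {
	v.mu.Lock()
	defer v.mu.Unlock()
	return v.series[l.key()]
}

func main() {
	total, inProgress := newVec(), newVec()
	db := labelSet{"database_name": "test", "database_type": "POSTGRES"}

	inProgress.Add(db, 1)
	total.Inc(labelSet{"database_name": "test", "database_type": "POSTGRES", "status": "started"})
	total.Inc(labelSet{"database_name": "test", "database_type": "POSTGRES", "status": "completed"})
	inProgress.Add(db, -1)

	// prints 1, then 0
	fmt.Println(total.Get(labelSet{"database_name": "test", "database_type": "POSTGRES", "status": "completed"}))
	fmt.Println(inProgress.Get(db))
}
```

Sorting label names before building the key is what makes two maps with the same contents land on the same series regardless of Go's random map iteration order.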

Metrics Exposed

  1. databasus_backups_status - Current number of backups by status
  2. databasus_backups_total - Total backup attempts counter
  3. databasus_backup_duration_seconds - Backup duration histogram
  4. databasus_backup_size_mb - Backup size histogram
  5. databasus_backups_in_progress - Currently running backups gauge
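
Prometheus histograms are cumulative: each _bucket series, labeled le, counts the observations less than or equal to its upper bound, and _sum and _count are exported alongside it. The standard-library sketch below shows that model for databasus_backup_duration_seconds; databasus_backup_size_mb would work the same way with values in MB. The bucket bounds are assumptions, since the PR does not list them, and the histogram type is a hypothetical stand-in for the Prometheus Go client's histogram.

```go
package main

import (
	"fmt"
	"sort"
)

// histogram mimics the Prometheus model: cumulative bucket counts plus _sum and _count.
type histogram struct {
	bounds []float64 // sorted upper bounds; the +Inf bucket is implicit
	counts []uint64  // counts[i] = observations <= bounds[i]; last entry is +Inf
	sum    float64
	count  uint64
}

func newHistogram(bounds []float64) *histogram {
	b := append([]float64(nil), bounds...) // copy so the caller's slice is not reordered
	sort.Float64s(b)
	return &histogram{bounds: b, counts: make([]uint64, len(b)+1)}
}

// Observe increments every bucket whose upper bound is >= v (cumulative buckets).
func (h *histogram) Observe(v float64) {
	for i, ub := range h.bounds {
		if v <= ub {
			h.counts[i]++
		}
	}
	h.counts[len(h.bounds)]++ // the +Inf bucket always counts
	h.sum += v
	h.count++
}

// Print writes the series in the text exposition layout.
func (h *histogram) Print(name string) {
	for i, ub := range h.bounds {
		fmt.Printf("%s_bucket{le=\"%g\"} %d\n", name, ub, h.counts[i])
	}
	fmt.Printf("%s_bucket{le=\"+Inf\"} %d\n", name, h.counts[len(h.bounds)])
	fmt.Printf("%s_sum %g\n%s_count %d\n", name, h.sum, name, h.count)
}

func main() {
	// Hypothetical bounds in seconds; the PR's actual buckets are not stated.
	d := newHistogram([]float64{10, 60, 300, 1800})
	for _, s := range []float64{4.2, 45, 95} {
		d.Observe(s)
	}
	// Bucket counts: le=10 -> 1, le=60 -> 2, le=300 -> 3, le=1800 -> 3, +Inf -> 3
	d.Print("databasus_backup_duration_seconds")
}
```

Because buckets are cumulative, the +Inf bucket always equals _count, which is what PromQL's histogram_quantile relies on.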

Endpoint

  • /metrics - Public endpoint (no auth required) for Prometheus scraping

🔧 Implementation Details

  • Instrumentation: The backup service now records metrics at these lifecycle events:

    • Backup start
    • Backup completion (with duration and size)
    • Backup failure (with duration)
    • Backup cancellation (with duration)
  • Label Sanitization: Database names are sanitized to comply with Prometheus label requirements

  • Workspace Support: Metrics include workspace ID for multi-tenant monitoring
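
The lifecycle hooks and label sanitization above could fit together roughly as follows. This is a hypothetical sketch, not the PR's service code: the recorder type and its methods are invented, only the database name label is kept (type, ID and workspace are omitted for brevity), and the sanitization rule (keep ASCII letters, digits, _ and -, replace everything else with _) is an assumption, since the PR does not describe it. Prometheus itself accepts any Unicode in label values (its naming restrictions apply to label names), so the exact rule is a project choice.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// sanitizeLabelValue applies a hypothetical rule: keep ASCII letters, digits,
// '_' and '-', and replace every other rune with '_'.
func sanitizeLabelValue(s string) string {
	var b strings.Builder
	for _, r := range s {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9', r == '_', r == '-':
			b.WriteRune(r)
		default:
			b.WriteRune('_')
		}
	}
	return b.String()
}

type seriesKey struct{ db, status string }

// recorder is a hypothetical stand-in for the instrumented backup service.
// It is not safe for concurrent use; a real service would need locking.
type recorder struct {
	total      map[seriesKey]int             // like databasus_backups_total
	inProgress map[string]int                // like databasus_backups_in_progress
	durations  map[seriesKey][]time.Duration // like databasus_backup_duration_seconds
	sizesMB    map[string][]float64          // like databasus_backup_size_mb
}

func newRecorder() *recorder {
	return &recorder{
		total:      map[seriesKey]int{},
		inProgress: map[string]int{},
		durations:  map[seriesKey][]time.Duration{},
		sizesMB:    map[string][]float64{},
	}
}

// Start counts a "started" attempt, bumps the in-progress gauge, and returns the start time.
func (r *recorder) Start(dbName string) time.Time {
	db := sanitizeLabelValue(dbName)
	r.total[seriesKey{db, "started"}]++
	r.inProgress[db]++
	return time.Now()
}

// finish handles every terminal state: counts it, lowers the gauge, records duration.
func (r *recorder) finish(dbName, status string, started time.Time) {
	db := sanitizeLabelValue(dbName)
	k := seriesKey{db, status}
	r.total[k]++
	r.inProgress[db]--
	r.durations[k] = append(r.durations[k], time.Since(started))
}

// Complete also records the backup size, which only exists on success.
func (r *recorder) Complete(dbName string, started time.Time, sizeMB float64) {
	r.finish(dbName, "completed", started)
	db := sanitizeLabelValue(dbName)
	r.sizesMB[db] = append(r.sizesMB[db], sizeMB)
}

func (r *recorder) Fail(dbName string, started time.Time)   { r.finish(dbName, "failed", started) }
func (r *recorder) Cancel(dbName string, started time.Time) { r.finish(dbName, "canceled", started) }

func main() {
	rec := newRecorder()
	start := rec.Start("sales db#1")
	rec.Complete("sales db#1", start, 12.5)
	// prints "1 0": one completed backup, none still running
	fmt.Println(rec.total[seriesKey{"sales_db_1", "completed"}], rec.inProgress["sales_db_1"])
}
```

Routing failure, cancellation and completion through one finish helper keeps the in-progress gauge balanced: every Start is matched by exactly one decrement, whichever way the backup ends.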

📝 Documentation

  • Added METRICS.md with:
    • Complete metrics reference
    • PromQL query examples
    • Prometheus configuration guide
    • Grafana dashboard suggestions

Testing

  • A test suite of 8 test cases covering:
    • Metric recording for all backup states
    • Label sanitization
    • Accumulation across multiple backups
    • Separation of metrics per database
    • Workspace handling

All tests pass. ✅

🔍 Verification

  • [x] All tests passing (make test)
  • [x] Linting passes (make lint)
  • [x] Code follows project conventions
  • [x] Documentation included

📊 Example Metrics Output

```
❯ curl -s http://localhost:4005/metrics | grep databasus_backups
# HELP databasus_backups_in_progress Number of backups currently in progress by database type
# TYPE databasus_backups_in_progress gauge
databasus_backups_in_progress{database_id="3150fa67-23a6-4d8d-8495-05da4c0b4d13",database_name="test2",database_type="POSTGRES",workspace_id="974c6b4b-2341-4b77-bb2c-959fd13ab74c"} 0
databasus_backups_in_progress{database_id="ed77aaf1-c133-4174-911b-fd5c7709f4dc",database_name="test",database_type="POSTGRES",workspace_id="974c6b4b-2341-4b77-bb2c-959fd13ab74c"} 0
# HELP databasus_backups_status Current number of backups by status (in_progress, completed, failed, canceled)
# TYPE databasus_backups_status gauge
databasus_backups_status{status="completed"} 12
databasus_backups_status{status="in_progress"} 0
# HELP databasus_backups_total Total number of backup attempts
# TYPE databasus_backups_total counter
databasus_backups_total{database_id="3150fa67-23a6-4d8d-8495-05da4c0b4d13",database_name="test2",database_type="POSTGRES",status="completed",workspace_id="974c6b4b-2341-4b77-bb2c-959fd13ab74c"} 3
databasus_backups_total{database_id="3150fa67-23a6-4d8d-8495-05da4c0b4d13",database_name="test2",database_type="POSTGRES",status="started",workspace_id="974c6b4b-2341-4b77-bb2c-959fd13ab74c"} 3
databasus_backups_total{database_id="ed77aaf1-c133-4174-911b-fd5c7709f4dc",database_name="test",database_type="POSTGRES",status="completed",workspace_id="974c6b4b-2341-4b77-bb2c-959fd13ab74c"} 9
databasus_backups_total{database_id="ed77aaf1-c133-4174-911b-fd5c7709f4dc",database_name="test",database_type="POSTGRES",status="started",workspace_id="974c6b4b-2341-4b77-bb2c-959fd13ab74c"} 9
```
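
To read the counters in this output: for each database, compare the started and completed series of databasus_backups_total. Equal values (3/3 for test2, 9/9 for test) mean every started backup completed, and databasus_backups_status{status="completed"} 12 matches their sum (3 + 9). Inside Prometheus this would be a PromQL ratio such as sum by (database_name) (databasus_backups_total{status="completed"}) / sum by (database_name) (databasus_backups_total{status="started"}). The Go sketch below does the same arithmetic offline on scraped text with a deliberately naive regexp parser (it ignores escaped quotes and timestamps); the function name totalsByDBAndStatus is hypothetical, and the sample drops the ID labels for brevity.

```go
package main

import (
	"bufio"
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// sample is a trimmed copy of the databasus_backups_total lines above (ID labels removed).
const sample = `databasus_backups_total{database_name="test2",database_type="POSTGRES",status="completed"} 3
databasus_backups_total{database_name="test2",database_type="POSTGRES",status="started"} 3
databasus_backups_total{database_name="test",database_type="POSTGRES",status="completed"} 9
databasus_backups_total{database_name="test",database_type="POSTGRES",status="started"} 9`

var (
	lineRe  = regexp.MustCompile(`^databasus_backups_total\{(.*)\} (\S+)$`)
	labelRe = regexp.MustCompile(`(\w+)="([^"]*)"`)
)

// totalsByDBAndStatus sums databasus_backups_total per database_name and status.
// Naive: it skips HELP/TYPE lines and other metrics, and ignores escaped quotes.
func totalsByDBAndStatus(text string) (map[string]map[string]float64, error) {
	out := map[string]map[string]float64{}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		m := lineRe.FindStringSubmatch(strings.TrimSpace(sc.Text()))
		if m == nil {
			continue
		}
		v, err := strconv.ParseFloat(m[2], 64)
		if err != nil {
			return nil, err
		}
		labels := map[string]string{}
		for _, kv := range labelRe.FindAllStringSubmatch(m[1], -1) {
			labels[kv[1]] = kv[2]
		}
		db := labels["database_name"]
		if out[db] == nil {
			out[db] = map[string]float64{}
		}
		out[db][labels["status"]] += v
	}
	return out, sc.Err()
}

func main() {
	totals, err := totalsByDBAndStatus(sample)
	if err != nil {
		panic(err)
	}
	for _, db := range []string{"test", "test2"} {
		s := totals[db]
		// prints "test: 9/9 completed" and "test2: 3/3 completed"
		fmt.Printf("%s: %.0f/%.0f completed\n", db, s["completed"], s["started"])
	}
}
```

In a real deployment you would query Prometheus rather than parse the endpoint yourself; the parser only makes the arithmetic behind the example output explicit.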
Reference: github/databasus#329