Performance Tuning & Optimization - Tor Guard Relay

Complete guide to optimizing CPU, memory, bandwidth, and network performance for your Tor relay.


Table of Contents

  • Performance Baseline
  • CPU Optimization
  • Memory Management
  • Bandwidth Optimization
  • Network Tuning
  • Monitoring & Metrics
  • Benchmarking
  • Troubleshooting
  • Best Practices
  • Reference
  • Support

Performance Baseline

System Requirements by Relay Tier

Tier           CPU       RAM     Bandwidth    Use Case
Entry          1 core    512 MB  10–50 Mbps   Home lab, testing
Standard       2 cores   1–2 GB  50–500 Mbps  Production guard relay
High-Capacity  4+ cores  4+ GB   500+ Mbps    High-traffic relay
Enterprise     8+ cores  8+ GB   1 Gbps+      Multiple relays

Expected Resource Usage (Steady State)

Resource   Entry      Standard     High-Cap   Notes
CPU        5–15%      10–25%       20–40%     Varies by traffic
Memory     80–150 MB  200–400 MB   500+ MB    Increases with connections
Bandwidth  5–50 Mbps  50–500 Mbps  500+ Mbps  Depends on limits
Disk I/O   Light      Moderate     Heavy      Monitor during bootstrap

CPU Optimization

1. Allocate CPU Cores

By default, Tor uses all available cores. Restrict or optimize as needed.

Check Current Allocation

# View Tor config
docker exec guard-relay grep -i numcpus /etc/tor/torrc

# View system CPUs
docker exec guard-relay nproc

Configure CPU Cores in relay.conf

# Use specific number of cores (example: 4 cores)
NumCPUs 4

# Or auto-detect (default, recommended)
NumCPUs 0

For Docker Compose

services:
  tor-guard-relay:
    # ... other config
    deploy:
      resources:
        limits:
          cpus: '4.0'  # Limit to 4 cores
        reservations:
          cpus: '2.0'  # Reserve 2 cores minimum

2. CPU Prioritization

Ensure Tor gets fair CPU scheduling.

# View current CPU usage
docker stats guard-relay --no-stream

# Show detailed CPU metrics
docker exec guard-relay ps aux | grep tor

3. Disable Unnecessary Features

# Disable directory service (if not needed)
# DirPort 0

# Keep SOCKS disabled (we're a relay, not a client)
SocksPort 0

# Disable bridge operation (if running guard relay)
BridgeRelay 0

4. Optimize Connection Handling

# Maximum pending client circuits (Tor's default is 32)
# Default usually fine, but can tune:
# MaxClientCircuitsPending 32

# Close circuits idle longer than this (900 s = 15 minutes; Tor's default is 1 hour)
# CircuitIdleTimeout 900

Memory Management

1. Monitor Memory Usage

# Real-time memory monitoring
docker stats guard-relay

# Sample the tor process once a minute (leave running to watch the trend)
watch -n 60 'docker exec guard-relay ps aux | grep tor | grep -v grep'

# System-wide memory snapshot inside the container
docker exec guard-relay cat /proc/meminfo

2. Set Memory Limits in Docker Compose

services:
  tor-guard-relay:
    deploy:
      resources:
        limits:
          memory: 2G        # Hard limit
        reservations:
          memory: 1G        # Guaranteed allocation

3. Configure Tor Memory Settings

# MaxMemInQueues - Maximum total memory for circuit queues
# (Tor picks a value automatically by default; 512 MB is a sane manual cap)
MaxMemInQueues 512 MB

# When memory hits threshold, new circuits rejected
# Prevents OOM (out of memory) crashes
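As a rough sanity check, the process RSS can be compared against that budget. The `check_mem` helper and its thresholds below are illustrative, not part of the image:

```shell
#!/bin/sh
# Hypothetical helper: warn when the tor process RSS (in KB, as reported
# by `ps`) exceeds the configured MaxMemInQueues budget (in MB).
check_mem() {  # usage: check_mem <rss_kb> <maxmem_mb>
  rss_mb=$(( $1 / 1024 ))
  if [ "$rss_mb" -gt "$2" ]; then
    echo "WARN: tor RSS ${rss_mb}MB exceeds MaxMemInQueues ${2}MB"
  else
    echo "OK: tor RSS ${rss_mb}MB within ${2}MB budget"
  fi
}

# In production, feed it live numbers, e.g.:
#   check_mem "$(docker exec guard-relay ps aux | awk '/[t]or /{print $6; exit}')" 512
check_mem 262144 512   # 256 MB RSS against a 512 MB budget
```

Note that RSS covers all of Tor's memory, not just circuit queues, so treat a warning as a prompt to investigate rather than proof of queue pressure.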

4. Handle Memory Leaks

Monitor for gradual increase:

#!/bin/bash
# Save as: /usr/local/bin/monitor-memory-growth.sh

CONTAINER="guard-relay"
INTERVAL=300  # 5 minutes

while true; do
  MEMORY=$(docker exec "$CONTAINER" ps aux | \
    grep '[t]or ' | awk '{print $6}' | head -1)
  
  echo "$(date): Memory = ${MEMORY}KB"
  sleep $INTERVAL
done

Run and observe for 24 hours:

/usr/local/bin/monitor-memory-growth.sh | tee /tmp/memory-log.txt

# Analyze growth rate
tail -20 /tmp/memory-log.txt
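To turn that log into a growth estimate, a small awk filter works — a sketch assuming the exact `Memory = NNNNKB` line format produced by the monitor script above:

```shell
# Average per-sample memory growth; pipe the monitor log through `growth`,
# e.g.  growth < /tmp/memory-log.txt
growth() {
  awk -F'= ' '/Memory/ {
    gsub(/KB/, "", $2)
    if (prev != "") sum += $2 - prev
    prev = $2; n++
  } END {
    if (n > 1) printf "avg growth: %.1f KB/sample over %d samples\n", sum/(n-1), n
  }'
}

# Example with three synthetic samples:
printf 'a: Memory = 1000KB\nb: Memory = 1100KB\nc: Memory = 1300KB\n' | growth
```

A steadily positive average over a 24-hour run is the leak signal to look for; short-lived spikes that settle back down are normal.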

Bandwidth Optimization

1. Understand Bandwidth Limits

# Average bandwidth (sustained rate)
RelayBandwidthRate 100 MBytes

# Burst bandwidth (temporary spikes)
RelayBandwidthBurst 200 MBytes

2. Set Realistic Limits

Calculate your limits based on ISP:

Available Bandwidth: 1000 Mbps (ISP plan)
Usable for Tor: 50% (leave headroom for other services)
= 500 Mbps

Convert to MBytes/s: 500 Mbps ÷ 8 = 62.5 MBytes/s

Recommended:
- RelayBandwidthRate 50 MBytes
- RelayBandwidthBurst 100 MBytes
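The arithmetic above generalizes to any plan. A sketch, using the same assumptions (50% headroom, burst at 2× rate; integer math floors the result, and rounding down further, as the recommendation above does, is fine):

```shell
#!/bin/sh
# Derive torrc bandwidth lines from an ISP plan given in Mbps.
plan_to_torrc() {  # usage: plan_to_torrc <isp_mbps>
  rate=$(( $1 / 2 / 8 ))   # 50% headroom, then Mbps -> MBytes/s
  echo "RelayBandwidthRate ${rate} MBytes"
  echo "RelayBandwidthBurst $(( rate * 2 )) MBytes"
}

plan_to_torrc 1000
```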

3. Bandwidth Accounting

Limit total monthly traffic:

# Monthly accounting window
# Starts on the 1st at UTC midnight
AccountingStart month 1 00:00

# Maximum data per period; by default Tor applies this to the larger of
# sent or received bytes (AccountingRule max). Set "AccountingRule sum"
# to count upload + download combined.
AccountingMax 1000 GB
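It's worth sanity-checking the cap against your rate, since a relay at full speed can burn through a monthly cap surprisingly fast. Illustrative arithmetic (the 50 MBytes/s and 1000 GB figures are example values, not recommendations):

```shell
# How long until AccountingMax is hit at a sustained full rate?
RATE_MBYTES=50   # RelayBandwidthRate, in MBytes/s
MAX_GB=1000      # AccountingMax

SECS=$(( MAX_GB * 1024 / RATE_MBYTES ))
echo "Cap reached after ~$(( SECS / 3600 )) hours at full rate"
```

Once the cap is hit, Tor hibernates until the next accounting period; with a tight cap it is usually better to lower RelayBandwidthRate so the relay stays up continuously.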

4. Monitor Actual Bandwidth Usage

# Real-time bandwidth stats
docker exec guard-relay tail -f /var/log/tor/notices.log | grep "bandwidth"

# Historical bandwidth usage
docker exec guard-relay grep "bandwidth" /var/log/tor/notices.log | tail -20

5. Optimize for Your Network

For Home Networks

# Conservative settings for residential connections
RelayBandwidthRate 10 MBytes
RelayBandwidthBurst 20 MBytes

For VPS with Unmetered Bandwidth

# Maximize contribution
RelayBandwidthRate 500 MBytes
RelayBandwidthBurst 1000 MBytes

For Datacenters with Traffic Shaping

# Match provider limits
RelayBandwidthRate 100 MBytes  # ISP limit
RelayBandwidthBurst 150 MBytes

Network Tuning

1. Enable IPv6 (if available)

In relay.conf:

# Dual-stack support
ORPort 9001
ORPort [::]:9001

# Directory port for IPv6
DirPort 9030

Verify IPv6 is working:

docker exec guard-relay curl -6 -s https://icanhazip.com
# Should return IPv6 address

docker exec guard-relay curl -4 -s https://icanhazip.com
# Should return IPv4 address

2. Optimize TCP Settings

On the host system (for Docker host):

# Increase TCP connection backlog
sudo sysctl -w net.core.somaxconn=65535

# Increase listen queue length
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=65535

# Tune TCP keepalive probe interval
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=60

# Make permanent
echo "net.core.somaxconn=65535" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_max_syn_backlog=65535" | sudo tee -a /etc/sysctl.conf

3. Firewall Optimization

Ensure firewall rules don't throttle traffic:

# UFW example
sudo ufw status

# High performance rules
sudo iptables -I INPUT -p tcp --dport 9001 -j ACCEPT

# Save rules
sudo iptables-save > /etc/iptables/rules.v4

4. DNS Performance

Point Tor's server-side DNS at fast resolvers:

# Tor reads its resolvers from this file (default shown); point it at a
# custom file (e.g. one containing "nameserver 8.8.8.8") to use specific DNS
ServerDNSResolvConfFile /etc/resolv.conf

Verify DNS resolution is fast:

# Test DNS response time (assumes nslookup is available in the image)
time docker exec guard-relay nslookup example.com

Monitoring & Metrics

v1.1.1 and later use external monitoring with the health JSON API for minimal image size and maximum security.

1. JSON Health API

Get relay metrics via the health tool:

# Get full health status (raw JSON)
docker exec guard-relay health

# Parse with jq (requires jq on host)
docker exec guard-relay health | jq .

# Check specific metrics
docker exec guard-relay health | jq .bootstrap      # Bootstrap percentage
docker exec guard-relay health | jq .reachable      # ORPort reachability
docker exec guard-relay health | jq .uptime         # Uptime

Example JSON output:

{
  "status": "up",
  "pid": 1,
  "uptime": "1-00:00:00",
  "bootstrap": 100,
  "reachable": "true",
  "errors": 0,
  "fingerprint": "1234567890ABCDEF...",
  "nickname": "MyRelay"
}
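When jq isn't installed on the host, the fields can be pulled with sed instead. A sketch against the sample payload above — in practice, capture HEALTH from `docker exec guard-relay health`:

```shell
#!/bin/sh
# Gate on bootstrap progress without jq; HEALTH holds the sample JSON above.
HEALTH='{"status": "up", "bootstrap": 100, "reachable": "true"}'

BOOT=$(printf '%s' "$HEALTH" | sed -n 's/.*"bootstrap": *\([0-9]*\).*/\1/p')
if [ "${BOOT:-0}" -eq 100 ]; then
  echo "relay healthy"
else
  echo "bootstrap at ${BOOT:-?}%"
fi
```

This is fragile compared to jq (it assumes the field name appears exactly once), so prefer jq where available.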

2. Prometheus Integration (External)

Use the health tool with Prometheus node_exporter textfile collector:

Create metrics exporter script:

#!/bin/bash
# /usr/local/bin/tor-metrics-exporter.sh
# Requires: jq on host (apt install jq / brew install jq)

HEALTH=$(docker exec guard-relay health)

echo "$HEALTH" | jq -r '
  "tor_bootstrap_percent \(.bootstrap)",
  "tor_reachable \(if .reachable == "true" then 1 else 0 end)"
' > /var/lib/node_exporter/textfile_collector/tor.prom

Run via cron every 5 minutes:

chmod +x /usr/local/bin/tor-metrics-exporter.sh
crontab -e
*/5 * * * * /usr/local/bin/tor-metrics-exporter.sh

3. Set Up Prometheus Scraping

prometheus.yml:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'node_exporter'  # Scrapes textfile collector
    static_configs:
      - targets: ['localhost:9100']  # node_exporter default port
    metrics_path: '/metrics'

4. Create Grafana Dashboard

Key metrics to track (the tor_relay_* queries assume an additional Tor metrics exporter beyond the textfile script above):

# Bandwidth rates
rate(tor_relay_bytes_read_total[5m])
rate(tor_relay_bytes_written_total[5m])

# Connection counts
tor_relay_connections

# CPU usage
rate(process_cpu_seconds_total[5m])

# Memory usage
process_resident_memory_bytes / 1024 / 1024

Benchmarking

Baseline Test (New Relay)

Run after initial bootstrap to establish baseline.

#!/bin/bash
# Save as: /usr/local/bin/benchmark-relay.sh

CONTAINER="guard-relay"
DURATION=300  # 5 minutes

echo "=== Tor Relay Benchmark ==="
echo "Duration: $DURATION seconds"
echo ""

# Capture initial state
MEM_START=$(docker exec $CONTAINER ps aux | grep '[t]or ' | awk '{print $6}')
CPU_START=$(docker exec $CONTAINER ps aux | grep '[t]or ' | awk '{print $3}')

echo "Starting metrics..."
echo "Initial Memory: ${MEM_START}KB"
echo "Initial CPU: ${CPU_START}%"
echo ""

# Run for duration
sleep $DURATION

# Capture final state
MEM_END=$(docker exec $CONTAINER ps aux | grep '[t]or ' | awk '{print $6}')
CPU_END=$(docker exec $CONTAINER ps aux | grep '[t]or ' | awk '{print $3}')

# Last reported bandwidth (Tor logs a periodic heartbeat line)
BW_LAST=$(docker exec $CONTAINER grep -i "bandwidth" /var/log/tor/notices.log | tail -1)

echo "=== Results ==="
echo "Memory Delta: $(( MEM_END - MEM_START ))KB"
echo "CPU Usage: ${CPU_END}%"
echo "Last Bandwidth Report:"
echo "  $BW_LAST"
echo ""
echo "Timestamp: $(date)"

Run benchmark:

chmod +x /usr/local/bin/benchmark-relay.sh
/usr/local/bin/benchmark-relay.sh

Compare Against Benchmarks

Metric Entry Standard High-Cap
5-min avg CPU <15% 1025% 2040%
5-min avg MEM <200 MB 200500 MB 500+ MB
Active Connections <100 100500 5002000
Bootstrap Time 1030 min 1030 min 1030 min

Troubleshooting

High CPU Usage

Symptoms: CPU consistently >50%

Diagnosis:

# Check if relay is under heavy load
docker stats guard-relay --no-stream

# View top processes inside container
docker exec guard-relay ps aux --sort=-%cpu

# Check Tor config for tuning issues
docker exec guard-relay grep -E "NumCPUs|MaxClientCircuitsPending" /etc/tor/torrc

Solutions:

# Limit CPU cores
NumCPUs 2  # Instead of auto

# Reduce allowed pending client circuits (Tor's default is 32)
MaxClientCircuitsPending 16

High Memory Usage

Symptoms: Memory >75% of limit, or constantly increasing

Diagnosis:

# Check memory trend
docker exec guard-relay free -h

# Look for memory leak signs in logs
docker logs guard-relay 2>&1 | grep -i "memory\|oom"

# Check MaxMemInQueues setting
docker exec guard-relay grep MaxMemInQueues /etc/tor/torrc

Solutions:

# Reduce max in-flight data
MaxMemInQueues 256 MB  # More conservative

# Or increase if system has capacity
MaxMemInQueues 1024 MB  # If you have 8+ GB RAM

Low Bandwidth Usage

Symptoms: Bandwidth well below configured limits

Diagnosis:

# Check configured limits
docker exec guard-relay grep "RelayBandwidth" /etc/tor/torrc

# Check actual usage
docker logs guard-relay 2>&1 | grep "Average"

# Verify ORPort is reachable
docker exec guard-relay status | grep "reachable"
# Or use JSON health check
docker exec guard-relay health | jq .reachable

Solutions:

  • Give relay time to build reputation (2–8 weeks for full capacity)
  • Increase bandwidth limits if you have capacity
  • Check firewall isn't limiting traffic
  • Verify network connectivity is stable

Connection Pool Exhaustion

Symptoms: "Too many open files" errors

Diagnosis:

# Check file descriptor usage
docker exec guard-relay cat /proc/sys/fs/file-max
docker exec guard-relay sh -c 'ulimit -n'

Solutions:

# Increase container file descriptor limit (combine with your other run options)
docker run -d \
  --ulimit nofile=65535:65535 \
  r3bo0tbx1/onion-relay:latest

Best Practices

DO

  • Monitor metrics continuously - Use Prometheus + Grafana
  • Start conservative, scale gradually - Begin with lower bandwidth limits
  • Test configuration changes - Benchmark before/after
  • Keep logs rotating - Prevent disk fill
  • Plan for peak load - Size hardware for bursts, not average
  • Document your settings - Know why you tuned each parameter

DON'T

  • Don't max out bandwidth day 1 - New relays need reputation first
  • Don't ignore resource limits - OOM kills are hard to debug
  • Don't tune blindly - Always measure, then adjust
  • Don't forget IPv6 - Half the network could be IPv6

Reference

Key Configuration Parameters:

# CPU
NumCPUs 4

# Memory
MaxMemInQueues 512 MB

# Bandwidth
RelayBandwidthRate 100 MBytes
RelayBandwidthBurst 200 MBytes

# Connections
MaxClientCircuitsPending 100

# Network
ORPort 9001
ORPort [::]:9001
DirPort 9030

Quick Performance Checklist:

  • CPU allocation set appropriately
  • Memory limits configured
  • Bandwidth limits realistic
  • IPv6 enabled (if available)
  • Metrics enabled for monitoring
  • Prometheus scraping configured
  • Alerts set for resource thresholds
  • Baseline benchmarks recorded

Support