⚡ Performance Tuning & Optimization - Tor Guard Relay
Complete guide to optimizing CPU, memory, bandwidth, and network performance for your Tor relay.
Table of Contents
- Performance Baseline
- CPU Optimization
- Memory Management
- Bandwidth Optimization
- Network Tuning
- Monitoring & Metrics
- Benchmarking
- Troubleshooting
- Best Practices
- Reference
Performance Baseline
System Requirements by Relay Tier
| Tier | CPU | RAM | Bandwidth | Use Case |
|---|---|---|---|---|
| Entry | 1 core | 512 MB | 10–50 Mbps | Home lab, testing |
| Standard | 2 cores | 1–2 GB | 50–500 Mbps | Production guard relay |
| High-Capacity | 4+ cores | 4+ GB | 500+ Mbps | High-traffic relay |
| Enterprise | 8+ cores | 8+ GB | 1 Gbps+ | Multiple relays |
Expected Resource Usage (Steady State)
| Resource | Entry | Standard | High-Cap | Notes |
|---|---|---|---|---|
| CPU | 5–15% | 10–25% | 20–40% | Varies by traffic |
| Memory | 80–150 MB | 200–400 MB | 500+ MB | Increases with connections |
| Bandwidth | 5–50 Mbps | 50–500 Mbps | 500+ Mbps | Depends on limits |
| Disk I/O | Light | Moderate | Heavy | Monitor during bootstrap |
CPU Optimization
1. Allocate CPU Cores
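By default (NumCPUs 0), Tor auto-detects the number of CPUs and runs that many worker threads for onionskin decryption and other parallelizable work. Its main event loop still runs on a single thread. Restrict or tune as needed.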
Check Current Allocation
# View Tor config
docker exec guard-relay grep -i numcpus /etc/tor/torrc
# View system CPUs
docker exec guard-relay nproc
Configure CPU Cores in relay.conf
# Option A: fix the number of worker threads (example: 4)
# NumCPUs 4
# Option B: auto-detect (default, recommended)
NumCPUs 0
For Docker Compose
services:
  tor-guard-relay:
    # ... other config
    deploy:
      resources:
        limits:
          cpus: '4.0'  # Limit to 4 cores
        reservations:
          cpus: '2.0'  # Reserve 2 cores minimum
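NumCPUs 0 counts the CPUs the kernel reports, which may not reflect a Compose cpus: limit, because Docker enforces that limit as a CFS quota rather than by hiding cores. As a sketch under that assumption, the hypothetical helper below reads a cgroup v2 cpu.max value and suggests a NumCPUs setting. It assumes a cgroup v2 host where the container can read /sys/fs/cgroup/cpu.max.

```shell
#!/bin/sh
# Hypothetical helper: suggest a NumCPUs value from a cgroup v2 CPU quota.
#   $1 = contents of /sys/fs/cgroup/cpu.max, e.g. "max 100000" or "400000 100000"
#   $2 = number of CPUs visible to the container (e.g. output of nproc)
numcpus_from_quota() {
  quota=${1%% *}   # first field: quota in microseconds, or "max"
  period=${1##* }  # last field: scheduling period in microseconds
  if [ "$quota" = "max" ]; then
    # No quota configured: use every visible CPU
    echo "$2"
    return 0
  fi
  # Round up, so a 1.5-CPU quota still gets 2 worker threads
  n=$(( (quota + period - 1) / period ))
  # Never suggest more threads than there are visible CPUs
  if [ "$n" -gt "$2" ]; then
    n=$2
  fi
  echo "$n"
}
# Example:
#   numcpus_from_quota "$(docker exec guard-relay cat /sys/fs/cgroup/cpu.max)" \
#                      "$(docker exec guard-relay nproc)"
```

With cpus: '4.0', Docker's default 100000 µs period gives a cpu.max of 400000 100000, and the helper prints 4. Rounding up is a design choice; round down instead if you prefer never to exceed the quota.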
2. CPU Prioritization
Before adjusting scheduling, check how much CPU Tor actually uses and whether other workloads compete with it:
# View current CPU usage
docker stats guard-relay --no-stream
# Show detailed CPU metrics for the tor process
docker exec guard-relay ps aux | grep '[t]or'
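For scripted checks, docker stats can print only the CPU column via its --format option ({{.CPUPerc}}). The hypothetical helper below is a sketch that compares such a value against a threshold, for example the 50% used under High CPU Usage in Troubleshooting. Note that docker stats can report more than 100% when a container uses several cores.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when a docker stats CPU value exceeds a threshold.
#   $1 = CPU value as printed by docker stats, e.g. "57.31%"
#   $2 = threshold in percent, e.g. 50
cpu_over() {
  awk -v v="$1" -v t="$2" 'BEGIN {
    sub(/%$/, "", v)        # drop the trailing percent sign
    exit !(v + 0 > t + 0)   # exit status 0 means "over the threshold"
  }'
}
# Example:
#   cpu_over "$(docker stats guard-relay --no-stream --format '{{.CPUPerc}}')" 50 && echo "CPU above 50%"
```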
3. Disable Unnecessary Features
# Disable directory service (if not needed)
# DirPort 0
# Keep SOCKS disabled (we're a relay, not a client)
SocksPort 0
# Disable bridge operation (if running guard relay)
BridgeRelay 0
4. Optimize Connection Handling
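# MaxClientCircuitsPending caps circuits still being built for this
# instance's own client traffic (default: 32). It does not limit
# connections from other relays or users, so a relay with SocksPort 0
# rarely needs to change it:
# MaxClientCircuitsPending 32
# Idle-circuit timeout in seconds (client-side; confirm the option is
# still listed in your Tor version's manual before enabling it)
# CircuitIdleTimeout 900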
Memory Management
1. Monitor Memory Usage
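# Real-time memory monitoring
docker stats guard-relay
# Refresh Tor's process line every 60 seconds (Ctrl+C to stop)
watch -n 60 'docker exec guard-relay ps aux | grep "[t]or "'
# System-wide memory snapshot (inside a container, /proc/meminfo
# reports the host's memory, not the container limit)
docker exec guard-relay cat /proc/meminfo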
2. Set Memory Limits in Docker Compose
services:
  tor-guard-relay:
    deploy:
      resources:
        limits:
          memory: 2G  # Hard limit
        reservations:
          memory: 1G  # Guaranteed allocation
3. Configure Tor Memory Settings
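# MaxMemInQueues - memory threshold for circuit queues and buffers
# Default: 0, meaning Tor picks a value from the physical memory it detects
# (in a container that can be the host's RAM), so set it explicitly and
# well below the container's memory limit
MaxMemInQueues 512 MB
# Above this threshold Tor closes circuits until it has freed at least 10%
# of it, which helps prevent OOM (out-of-memory) kills. It covers only some
# queues, so the process's total memory use will be higher than this value.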
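To choose an explicit value from the container's own memory limit, the hypothetical helper below reads a cgroup v2 memory.max value and suggests MaxMemInQueues. The 1/4 ratio is an assumption taken from this guide's own pairing of a 2G Compose limit with MaxMemInQueues 512 MB; it is a starting point, not a Tor recommendation.

```shell
#!/bin/sh
# Hypothetical helper: suggest MaxMemInQueues as 1/4 of a cgroup v2 memory limit.
#   $1 = contents of /sys/fs/cgroup/memory.max (bytes, or "max" if unlimited)
# The 1/4 ratio is an assumption matching this guide's 2G / 512 MB example.
suggest_maxmem() {
  if [ "$1" = "max" ]; then
    echo "# No container memory limit found; Tor's automatic default applies if unset"
    return 0
  fi
  mb=$(( $1 / 4 / 1048576 ))   # one quarter of the limit, in MiB
  echo "MaxMemInQueues $mb MB"
}
# Example:
#   suggest_maxmem "$(docker exec guard-relay cat /sys/fs/cgroup/memory.max)"
```

For a 2G limit (2147483648 bytes) this prints MaxMemInQueues 512 MB.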
4. Handle Memory Leaks
Monitor for gradual increase:
#!/bin/bash
# Save as: /usr/local/bin/monitor-memory-growth.sh
CONTAINER="guard-relay"
INTERVAL=300 # 5 minutes
while true; do
MEMORY=$(docker exec "$CONTAINER" ps aux | \
grep '[t]or ' | awk '{print $6}' | head -1)
echo "$(date): Memory = ${MEMORY}KB"
sleep $INTERVAL
done
Run and observe for 24 hours:
chmod +x /usr/local/bin/monitor-memory-growth.sh
/usr/local/bin/monitor-memory-growth.sh | tee /tmp/memory-log.txt
# In another terminal, review the most recent samples
tail -20 /tmp/memory-log.txt
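tail shows only the latest samples. To turn the whole log into a growth rate, the hypothetical awk-based helper below reads every "Memory = <N>KB" line written by monitor-memory-growth.sh and reports the average growth per hour. It assumes samples are evenly spaced by the script's INTERVAL (300 seconds) and skips lines where the memory value is missing.

```shell
#!/bin/sh
# Hypothetical helper: summarize a log written by monitor-memory-growth.sh.
#   $1 = log file, $2 = seconds between samples (INTERVAL in the script)
memory_growth() {
  awk -v interval="$2" '
    /Memory = [0-9]+KB/ {
      kb = $NF                # last field, e.g. "123456KB"
      sub(/KB$/, "", kb)
      if (n == 0) first = kb
      last = kb
      n++
    }
    END {
      if (n < 2) { print "need at least 2 samples"; exit 1 }
      hours = (n - 1) * interval / 3600
      printf "samples=%d first=%dKB last=%dKB growth=%.1fKB/h\n", n, first, last, (last - first) / hours
    }
  ' "$1"
}
# Example:
#   memory_growth /tmp/memory-log.txt 300
```

A steady positive rate across the full 24 hours is the gradual increase worth investigating.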
Bandwidth Optimization
1. Understand Bandwidth Limits
# Average bandwidth for relayed traffic (sustained rate, enforced separately for incoming and outgoing)
RelayBandwidthRate 100 MBytes
# Burst bandwidth (temporary spikes)
RelayBandwidthBurst 200 MBytes
2. Set Realistic Limits
Calculate your limits based on your ISP plan:
Available bandwidth: 1000 Mbps (ISP plan)
Usable for Tor: 50% (leave headroom for other services) = 500 Mbps
Convert to MBytes/s: 500 Mbps ÷ 8 = 62.5 MBytes/s
Recommended:
- RelayBandwidthRate 50 MBytes (400 Mbps, 80% of the Tor budget)
- RelayBandwidthBurst 62 MBytes (about 500 Mbps, so bursts stay within the budget)
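The same arithmetic can be wrapped in a small helper for re-planning with a different plan or share. This is a sketch: the function name is hypothetical, and the split (Rate at 80% of the Tor budget, Burst at the full budget so bursts stay inside the share reserved for Tor) is an assumption, not a Tor default. Values are rounded down.

```shell
#!/bin/sh
# Hypothetical helper: turn an ISP line rate and a Tor share into torrc lines.
#   $1 = line rate in Mbps, $2 = percent of it reserved for Tor
# Assumed split: Rate = 80% of the Tor budget, Burst = the full budget.
tor_bw_limits() {
  awk -v mbps="$1" -v share="$2" 'BEGIN {
    budget = mbps * share / 100 / 8      # Mbps -> MBytes/s available to Tor
    printf "# Budget: %.1f MBytes/s\n", budget
    printf "RelayBandwidthRate %d MBytes\n", budget * 8 / 10
    printf "RelayBandwidthBurst %d MBytes\n", budget
  }'
}
tor_bw_limits 1000 50
```

With 1000 Mbps and a 50% share it prints a 62.5 MBytes/s budget, RelayBandwidthRate 50 MBytes and RelayBandwidthBurst 62 MBytes.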
3. Bandwidth Accounting
Limit total monthly traffic:
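# Monthly accounting period, starting on the 1st at 00:00 local time
# (UTC in most containers unless TZ is set)
AccountingStart month 1 00:00
# Maximum data per period. With the default AccountingRule max, the limit
# applies to whichever of sent or received is larger; add
# "AccountingRule sum" to count upload + download combined
AccountingMax 1000 GB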
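Once AccountingMax is reached, Tor hibernates until the next period starts, so it helps to compare the quota with your rate limits. The hypothetical helper below converts a quota into the average rate that would spread it evenly across the period. It assumes 1 GB = 1024^3 bytes, 1 KByte = 1024 bytes, and a 30-day month in the example; the result is rounded down.

```shell
#!/bin/sh
# Hypothetical helper: average rate (KBytes/s) that spreads a quota over a period.
#   $1 = AccountingMax in GB, $2 = period length in days
# Assumes 1 GB = 1024^3 bytes and 1 KByte = 1024 bytes; result is rounded down.
accounting_avg_kbytes() {
  awk -v gb="$1" -v days="$2" 'BEGIN {
    bytes_per_sec = gb * 1024 * 1024 * 1024 / (days * 86400)
    printf "%d\n", bytes_per_sec / 1024
  }'
}
echo "RelayBandwidthRate $(accounting_avg_kbytes 1000 30) KBytes"
```

For 1000 GB over 30 days this is about 404 KBytes/s (roughly 3.3 Mbps). At a RelayBandwidthRate of 100 MBytes the same quota would be used up within a few hours, after which the relay hibernates for the rest of the month.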
4. Monitor Actual Bandwidth Usage
# Follow bandwidth-related notices and Tor's periodic Heartbeat summaries
docker exec guard-relay tail -f /var/log/tor/notices.log | grep -Ei "bandwidth|heartbeat"
# Recent bandwidth-related history
docker exec guard-relay grep -Ei "bandwidth|heartbeat" /var/log/tor/notices.log | tail -20
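Tor also writes a periodic Heartbeat notice (every 6 hours by default, set by HeartbeatPeriod) that includes the total data sent and received, which is easier to trend than individual bandwidth messages. The sketch below extracts those totals with sed. The function name is hypothetical, and it assumes the "I've sent X and received Y." wording used by recent Tor releases; adjust the pattern if your version words it differently.

```shell
#!/bin/sh
# Hypothetical helper: pull sent/received totals out of Tor Heartbeat lines on stdin.
# Assumes the wording "... I've sent <N> <unit> and received <N> <unit>. ..."
parse_heartbeat() {
  sed -n "s/.*I've sent \([0-9.]* [A-Za-z]*\) and received \([0-9.]* [A-Za-z]*\)\..*/sent=\1 received=\2/p"
}
# Example:
#   docker exec guard-relay grep Heartbeat /var/log/tor/notices.log | parse_heartbeat | tail -5
```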
5. Optimize for Your Network
For Home Networks
# Conservative settings for residential connections
RelayBandwidthRate 10 MBytes
RelayBandwidthBurst 20 MBytes
For VPS with Unmetered Bandwidth
# Maximize contribution
RelayBandwidthRate 500 MBytes
RelayBandwidthBurst 1000 MBytes
For Datacenters with Traffic Shaping
# Match provider limits
RelayBandwidthRate 100 MBytes # ISP limit
RelayBandwidthBurst 150 MBytes
Network Tuning
1. Enable IPv6 (if available)
In relay.conf:
# Dual-stack support
ORPort 9001
ORPort [::]:9001
# Optional directory port (IPv6 support comes from the ORPort lines above)
DirPort 9030
Verify IPv6 is working:
docker exec guard-relay curl -6 -s https://icanhazip.com
# Should return IPv6 address
docker exec guard-relay curl -4 -s https://icanhazip.com
# Should return IPv4 address
2. Optimize TCP Settings
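On the Docker host. These keys are per network namespace: sysctl -w on the host reaches the relay directly only with network_mode: host, while a container on a bridge network has its own namespace and may not inherit them. In that case, set the same keys with docker run --sysctl or the Compose sysctls: option.
# Increase the maximum listen() backlog
sudo sysctl -w net.core.somaxconn=65535
# Allow more half-open (SYN_RECV) connections to queue
sudo sysctl -w net.ipv4.tcp_max_syn_backlog=65535
# Shorten the interval between TCP keepalive probes (kernel default: 75s)
sudo sysctl -w net.ipv4.tcp_keepalive_intvl=60
# Make permanent (applied again at boot)
echo "net.core.somaxconn=65535" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_max_syn_backlog=65535" | sudo tee -a /etc/sysctl.conf
echo "net.ipv4.tcp_keepalive_intvl=60" | sudo tee -a /etc/sysctl.conf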
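To confirm the values actually took effect (for example after a reboot), the hypothetical helper below reads them back from /proc/sys, where each dotted sysctl key maps to a path. Run it on the host, or inside the container if you set the keys there with docker run --sysctl.

```shell
#!/bin/sh
# Hypothetical helper: compare current kernel values against minimum targets.
sysctl_path() {
  # net.core.somaxconn -> /proc/sys/net/core/somaxconn
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}
check_min() {
  # $1 = sysctl key, $2 = minimum desired value
  path=$(sysctl_path "$1")
  if [ ! -r "$path" ]; then
    echo "$1: not readable"
    return 1
  fi
  have=$(cat "$path")
  if [ "$have" -ge "$2" ]; then
    echo "$1: ok ($have)"
  else
    echo "$1: low ($have < $2)"
    return 1
  fi
}
# Example:
#   check_min net.core.somaxconn 65535
#   check_min net.ipv4.tcp_max_syn_backlog 65535
```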
3. Firewall Optimization
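Ensure firewall rules don't throttle or drop ORPort traffic:
# UFW example: check status and allow the ORPort
sudo ufw status
sudo ufw allow 9001/tcp
# iptables: accept ORPort traffic at the top of the INPUT chain
# (INPUT applies with network_mode: host; ports published with -p pass
# through Docker's forwarding chains instead, where custom rules go in DOCKER-USER)
sudo iptables -I INPUT -p tcp --dport 9001 -j ACCEPT
# Save rules (the redirect must also run as root)
sudo sh -c 'iptables-save > /etc/iptables/rules.v4'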
4. DNS Performance
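DNS speed matters mainly for exit relays, which resolve hostnames on behalf of clients; a non-exit guard relay does not. If you run an exit, point Tor at a fast resolver through the resolv.conf it reads:
# Resolver configuration Tor uses (default: the system's)
ServerDNSResolvConfFile /etc/resolv.conf
Verify DNS resolution is fast:
# Rough DNS response time from inside the container (includes docker exec overhead)
time docker exec guard-relay nslookup example.com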
Monitoring & Metrics
v1.1.1 and later use external monitoring via the health JSON API, for minimal image size and maximum security.
1. JSON Health API
Get relay metrics via the health tool:
# Get full health status (raw JSON)
docker exec guard-relay health
# Parse with jq (requires jq on host)
docker exec guard-relay health | jq .
# Check specific metrics
docker exec guard-relay health | jq .bootstrap # Bootstrap percentage
docker exec guard-relay health | jq .reachable # ORPort reachability
docker exec guard-relay health | jq .uptime # Uptime
Example JSON output:
{
"status": "up",
"pid": 1,
"uptime": "1-00:00:00",
"bootstrap": 100,
"reachable": "true",
"errors": 0,
"fingerprint": "1234567890ABCDEF...",
"nickname": "MyRelay"
}
2. Prometheus Integration (External)
Use the health tool with Prometheus node_exporter textfile collector:
Create metrics exporter script:
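#!/bin/bash
# /usr/local/bin/tor-metrics-exporter.sh
# Requires: jq on host (apt install jq / brew install jq)
set -euo pipefail
OUT=/var/lib/node_exporter/textfile_collector/tor.prom
# Exit before touching the .prom file if the health check fails
HEALTH=$(docker exec guard-relay health)
# Write to a temp file, then rename it, so node_exporter never reads a
# half-written file (it only collects files ending in .prom)
echo "$HEALTH" | jq -r '
"tor_bootstrap_percent \(.bootstrap)",
"tor_reachable \(if .reachable == "true" then 1 else 0 end)"
' > "$OUT.$$"
mv "$OUT.$$" "$OUT"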
Run via cron every 5 minutes:
chmod +x /usr/local/bin/tor-metrics-exporter.sh
crontab -e
# Add this line in the editor:
*/5 * * * * /usr/local/bin/tor-metrics-exporter.sh
3. Set Up Prometheus Scraping
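prometheus.yml:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'node_exporter'  # Scrapes textfile collector output
    # node_exporter must run with
    # --collector.textfile.directory=/var/lib/node_exporter/textfile_collector
    static_configs:
      - targets: ['localhost:9100']  # node_exporter default port
    metrics_path: '/metrics'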
4. Create Grafana Dashboard
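Key metrics to track (from the exporter above and node_exporter's host metrics):
# Tor status
tor_bootstrap_percent
tor_reachable
# Host bandwidth in bits/s (adjust the device label to your interface)
rate(node_network_receive_bytes_total{device="eth0"}[5m]) * 8
rate(node_network_transmit_bytes_total{device="eth0"}[5m]) * 8
# Host CPU usage (%)
100 - avg(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100
# Available host memory (MB)
node_memory_MemAvailable_bytes / 1024 / 1024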
Benchmarking
Baseline Test (New Relay)
Run after initial bootstrap to establish baseline.
#!/bin/bash
# Save as: /usr/local/bin/benchmark-relay.sh
CONTAINER="guard-relay"
DURATION=300  # 5 minutes

# Resident memory (KB) of the tor process (assumes procps-style ps aux columns)
tor_rss_kb() {
  docker exec "$CONTAINER" ps aux | grep '[t]or ' | awk '{print $6}' | head -1
}
# Container CPU usage at this moment, as reported by docker stats (e.g. "12.34%")
cpu_now() {
  docker stats --no-stream --format '{{.CPUPerc}}' "$CONTAINER"
}

echo "=== Tor Relay Benchmark ==="
echo "Duration: $DURATION seconds"
echo ""
# Capture initial state
MEM_START=$(tor_rss_kb)
CPU_START=$(cpu_now)
echo "Starting metrics..."
echo "Initial Memory: ${MEM_START}KB"
echo "Initial CPU: ${CPU_START}"
echo ""
# Run for duration
sleep "$DURATION"
# Capture final state
MEM_END=$(tor_rss_kb)
CPU_END=$(cpu_now)
# Most recent bandwidth-related line from each log source
BW_NOTICES=$(docker exec "$CONTAINER" grep -i "bandwidth" /var/log/tor/notices.log | tail -1)
BW_DOCKER=$(docker logs "$CONTAINER" 2>&1 | grep -i "bandwidth" | tail -1)
echo "=== Results ==="
echo "Memory Delta: $(( ${MEM_END:-0} - ${MEM_START:-0} ))KB"
echo "CPU Usage (snapshot): ${CPU_END}"
echo "Last Bandwidth Report:"
echo "  notices.log: $BW_NOTICES"
echo "  docker logs: $BW_DOCKER"
echo ""
echo "Timestamp: $(date)"
Run benchmark:
chmod +x /usr/local/bin/benchmark-relay.sh
/usr/local/bin/benchmark-relay.sh
Compare Against Benchmarks
| Metric | Entry | Standard | High-Cap |
|---|---|---|---|
| 5-min avg CPU | <15% | 10–25% | 20–40% |
| 5-min avg MEM | <200 MB | 200–500 MB | 500+ MB |
| Active Connections | <100 | 100–500 | 500–2000 |
| Bootstrap Time | 10–30 min | 10–30 min | 10–30 min |
Troubleshooting
High CPU Usage
Symptoms: CPU consistently >50%
Diagnosis:
# Check if relay is under heavy load
docker stats guard-relay --no-stream
# Processes inside the container, busiest first (works with BusyBox or procps top)
docker exec guard-relay top -b -n 1
# Check Tor config for tuning issues
docker exec guard-relay grep -E "NumCPUs|MaxClientCircuitsPending" /etc/tor/torrc
Solutions:
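# Relay CPU load scales with relayed traffic, so lower the rate first
RelayBandwidthRate 50 MBytes  # Example: below your current setting
# Limit onionskin worker threads
NumCPUs 2  # Instead of 0 (auto)
# MaxClientCircuitsPending (default 32) only limits this instance's own
# client circuits, so lowering it rarely reduces relay CPU load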
High Memory Usage
Symptoms: Memory >75% of limit, or constantly increasing
Diagnosis:
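# Check the container's memory use against its limit
# (free inside a container reads /proc/meminfo and shows host totals)
docker stats guard-relay --no-stream
# Look for memory leak signs in logs
docker logs guard-relay 2>&1 | grep -Ei "memory|oom"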
# Check MaxMemInQueues setting
docker exec guard-relay grep MaxMemInQueues /etc/tor/torrc
Solutions:
# Reduce max in-flight data
MaxMemInQueues 256 MB # More conservative
# Or increase if system has capacity
MaxMemInQueues 1024 MB # If you have 8+ GB RAM
Low Bandwidth Usage
Symptoms: Bandwidth well below configured limits
Diagnosis:
# Check configured limits
docker exec guard-relay grep "RelayBandwidth" /etc/tor/torrc
# Check actual usage (Tor's periodic Heartbeat lines report data sent and received)
docker logs guard-relay 2>&1 | grep "Heartbeat"
# Verify ORPort is reachable
docker exec guard-relay status | grep "reachable"
# Or use JSON health check
docker exec guard-relay health | jq .reachable
Solutions:
- Give relay time to build reputation (2–8 weeks for full capacity)
- Increase bandwidth limits if you have capacity
- Check firewall isn't limiting traffic
- Verify network connectivity is stable
Connection Pool Exhaustion
Symptoms: "Too many open files" errors
Diagnosis:
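# Check file descriptor limits (system-wide, then per process)
docker exec guard-relay cat /proc/sys/fs/file-max
# ulimit is a shell builtin, so run it through sh
docker exec guard-relay sh -c 'ulimit -n'
Solutions:
# Increase container file descriptor limit (add your other options before the image name)
docker run -d \
  --ulimit nofile=65535:65535 \
  r3bo0tbx1/onion-relay:latest
# Docker Compose equivalent:
# ulimits:
#   nofile:
#     soft: 65535
#     hard: 65535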
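file-max and ulimit -n show limits, not how many descriptors Tor actually has open. The hypothetical helper below compares the two for one process by counting entries in /proc/PID/fd and reading the soft limit from /proc/PID/limits. It is Linux-specific and needs permission to read the target process's /proc entries, so run it as the same user as tor or as root.

```shell
#!/bin/sh
# Hypothetical helper: print "<open>/<soft limit>" file descriptors for a PID.
fd_usage() {
  pid=$1
  # Each entry in /proc/PID/fd is one open descriptor
  open=$(ls "/proc/$pid/fd" | wc -l | tr -d ' ')
  # The soft limit is the 4th field of the "Max open files" line
  soft=$(awk '/^Max open files/ {print $4}' "/proc/$pid/limits")
  echo "$open/$soft"
}
# Example, in a shell inside the container (docker exec -it guard-relay sh):
#   fd_usage "$(pidof tor)"
```

If the open count approaches the soft limit, raise the limit with --ulimit.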
Best Practices
✅ DO
- ✅ Monitor metrics continuously - Use Prometheus + Grafana
- ✅ Start conservative, scale gradually - Begin with lower bandwidth limits
- ✅ Test configuration changes - Benchmark before/after
- ✅ Keep logs rotating - Prevent disk fill
- ✅ Plan for peak load - Size hardware for bursts, not average
- ✅ Document your settings - Know why you tuned each parameter
❌ DON'T
- ❌ Don't max out bandwidth day 1 - New relays need reputation first
- ❌ Don't ignore resource limits - OOM kills are hard to debug
- ❌ Don't tune blindly - Always measure, then adjust
- ❌ Don't forget IPv6 - A dual-stack relay can also accept connections from clients and relays over IPv6
Reference
Key Configuration Parameters:
# CPU
NumCPUs 4
# Memory
MaxMemInQueues 512 MB
# Bandwidth
RelayBandwidthRate 100 MBytes
RelayBandwidthBurst 200 MBytes
# Connections (client-side option; default 32)
MaxClientCircuitsPending 32
# Network
ORPort 9001
ORPort [::]:9001
DirPort 9030
Quick Performance Checklist:
- CPU allocation set appropriately
- Memory limits configured
- Bandwidth limits realistic
- IPv6 enabled (if available)
- Metrics enabled for monitoring
- Prometheus scraping configured
- Alerts set for resource thresholds
- Baseline benchmarks recorded