mirror of
https://github.com/databasus/databasus.git
synced 2026-04-06 00:32:03 +02:00
Critical Issue: signal: killed during 60GB Database Restore on Databasus #162
Originally created by @WendelSTi on 2/4/2026
I’m reaching out because we are facing a persistent signal: killed error when attempting to restore a large database (~60GB) using the Databasus container.
The Problem: During the restore, the logs report: mysql failed: signal: killed – stderr: (stderr is empty). After the failure, the restore for UUID 549956b1-40f3-44c6-bfae-a2b4d4b31ab6 enters a loop, repeatedly returning HTTP 400.
Procedures already performed:
Memory Allocation: I have already allocated over 20GB of RAM on the host server (S2SVHL046) and raised the database container's memory limit accordingly.
Resource Monitoring: docker inspect shows OOMKilled: false, so the Docker daemon itself isn't reporting a container-level kill for exceeding its hard limit.
System logs (dmesg, syslog) do not show a global Out-of-Memory (OOM) event.
valkey-server and postgres run stably within the same container, but the restore sub-process is being terminated.
Process Verification: I’ve checked the resources using docker top. The main Go process (./main) remains active, but the child process responsible for the restore is the one receiving the SIGKILL.
Observation: Since the dump file is 60GB and the RAM is ~20GB, it appears the restore process is allocating a large buffer or in-memory temporary table that exceeds a per-process limit (ulimit), or spikes memory faster than the monitoring tools can sample.
Request: Could you check how the application handles large dump files? Specifically:
Does it try to load large chunks into memory?
Are there any work_mem or max_allowed_packet configurations inside the start.sh or the Go binary that we should tune?
Is there a way to prevent the 400 error loop that saturates the logs after the first failure?
Looking forward to your feedback.
Best regards,