MySQL Service Daily Crashes #349

Closed
opened 2026-04-05 20:27:09 +02:00 by MrUnknownDE · 0 comments
Owner

Originally created by @JulianPrieber on 8/25/2023

CloudPanel version(s) affected

v2.3.1-latest
maybe v2.3.0

Description

Description:
For the past two weeks, our server has been plagued by a recurring issue where the MySQL service crashes on a daily basis, necessitating manual restarts and causing significant periods of downtime. Multiple users on the Discord server have reported experiencing the same problem, indicating that this is not an isolated incident. Despite exhaustive efforts to mitigate the issue, such as increasing RAM, adjusting swap settings, and modifying the vm.overcommit_memory parameter, the problem persists. Interestingly, the only temporary workaround has been to set vm.overcommit_memory to 2, which prevents MySQL crashes but introduces complications as other systems contend with RAM limitations.
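For reference, the `vm.overcommit_memory` workaround mentioned above is usually made persistent through a sysctl drop-in file (the file name below is illustrative). Note that this only papers over the symptom; strict overcommit accounting is exactly why the other services then start hitting RAM limits:

```ini
# /etc/sysctl.d/99-overcommit.conf  (illustrative path/name)
# Mode 2 = strict accounting: the kernel refuses allocations beyond
# swap + overcommit_ratio% of RAM instead of overcommitting. This is
# why MariaDB stops being OOM-killed, but also why other processes
# begin failing to allocate memory.
vm.overcommit_memory = 2

# Apply without rebooting: sysctl --system
```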

Our server setup consists of two nearly identical systems. However, the problematic behavior is isolated to one system, while the other continues to operate without disruption. Both servers were set up within the last three weeks, initially on CloudPanel version 2.3.1 and subsequently upgraded to 2.3.2. The affected server uses the ARM64 architecture, has 8GB of RAM and 4 cores, and runs Ubuntu 22 with MariaDB 10.11.5. We have been long-time users of CloudPanel since its version 1 release without encountering such issues. The server hosts various PHP applications, including WordPress, and the root cause appears to be intertwined with Redis failures during database dumping, ultimately leading to memory cache overflow.

Attempts to reproduce the issue on a cloned system have proved unsuccessful, pointing to the possibility that traffic or specific database activity could be triggering the problem. Notably, during these crashes, no unusual spikes in incoming or outgoing traffic are detected. In fact, the servers tend to remain in an idle state with only sporadic requests.


Additional Notes:
Given the severe impact of this issue on system reliability and uptime, immediate and focused attention is imperative to identify the underlying cause and implement a sustainable solution. The sporadic nature of the crashes and their exclusive occurrence on one system, despite its similarity to another stable system, suggest that external factors or specific configurations could be influencing this behavior. Notably, the fact that another user has reported a similar issue on a system with 32GB of RAM indicates that the problem likely transcends a simple memory limitation.

How to reproduce

The method to reproduce this issue is currently unknown. Assistance with troubleshooting and investigation is greatly appreciated.

Possible Solution

No response

Additional Context

Log snippet:

06:16:18 kernel: [58632.865465] redis-server invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
06:16:18 kernel: [58632.865493]  oom_kill_process+0x25c/0x260
06:16:18 kernel: [58632.865499]  __alloc_pages_may_oom+0x118/0x19c
06:16:18 kernel: [58632.865625] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
06:16:18 kernel: [58632.866121] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=redis-server.service,mems_allowed=0,global_oom,task_memcg=/system.slice/mariadb.service,task=mariadbd,pid=1235,uid=115
06:16:18 kernel: [58632.866347] Out of memory: Killed process 1235 (mariadbd) total-vm:3061516kB, anon-rss:15524kB, file-rss:0kB, shmem-rss:0kB, UID:115 pgtables:1376kB oom_score_adj:0
06:16:20 systemd[1]: mariadb.service: A process of this unit has been killed by the OOM killer.
06:16:20 systemd[1]: mariadb.service: Failed with result 'oom-kill'.
06:16:36 kernel: [58642.140590] containerd invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=-999
06:16:36 kernel: [58642.140619]  oom_kill_process+0x25c/0x260
06:16:36 kernel: [58642.140625]  __alloc_pages_may_oom+0x118/0x19c
06:16:36 kernel: [58642.140720] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
06:16:36 kernel: [58642.141225] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=containerd.service,mems_allowed=0,global_oom,task_memcg=/system.slice/varnish.service,task=cache-main,pid=1470,uid=121
06:16:36 kernel: [58642.141425] Out of memory: Killed process 1470 (cache-main) total-vm:322680kB, anon-rss:0kB, file-rss:80812kB, shmem-rss:0kB, UID:121 pgtables:464kB oom_score_adj:0
06:16:36 kernel: [58644.155535] oom_reaper: reaped process 1470 (cache-main), now anon-rss:0kB, file-rss:82108kB, shmem-rss:0kB
06:16:36 kernel: [58644.225988] php-fpm8.2 invoked oom-killer: gfp_mask=0x400dc0(GFP_KERNEL_ACCOUNT|__GFP_ZERO), order=0, oom_score_adj=0
06:16:36 kernel: [58644.226016]  oom_kill_process+0x25c/0x260
06:16:36 kernel: [58644.226022]  __alloc_pages_may_oom+0x118/0x19c
06:16:36 kernel: [58644.226148] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
06:16:37 kernel: [58644.226649] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=php8.2-fpm.service,mems_allowed=0,global_oom,task_memcg=/system.slice/php8.2-fpm.service,task=php-fpm8.2,pid=68068,uid=1001
06:16:37 kernel: [58644.226663] Out of memory: Killed process 68068 (php-fpm8.2) total-vm:967848kB, anon-rss:70032kB, file-rss:2596kB, shmem-rss:7172kB, UID:1001 pgtables:408kB oom_score_adj:0
06:16:38 kernel: [58652.174588] systemd invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
06:16:38 kernel: [58652.174615]  oom_kill_process+0x25c/0x260
06:16:38 kernel: [58652.174621]  __alloc_pages_may_oom+0x118/0x19c
06:16:38 kernel: [58652.174715] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
06:16:38 kernel: [58652.175216] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=init.scope,mems_allowed=0,global_oom,task_memcg=/system.slice/php8.2-fpm.service,task=php-fpm8.2,pid=67779,uid=1001
06:16:38 kernel: [58652.175231] Out of memory: Killed process 67779 (php-fpm8.2) total-vm:967840kB, anon-rss:20324kB, file-rss:2700kB, shmem-rss:1324kB, UID:1001 pgtables:408kB oom_score_adj:0
06:16:38 systemd[1]: php8.2-fpm.service: A process of this unit has been killed by the OOM killer.
06:16:38 systemd[1]: varnish.service: A process of this unit has been killed by the OOM killer.
06:16:40 systemd[1]: php8.2-fpm.service: Failed with result 'oom-kill'.
06:16:40 systemd[1]: varnish.service: Failed with result 'oom-kill'.
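To see at a glance which processes the OOM killer has been taking out, the kernel log can be filtered; a minimal sketch, run here against a two-line sample copied from the log above (on the live system the input would come from `journalctl -k` or `/var/log/kern.log` instead):

```shell
# Save a two-line sample of the kernel log (copied from the snippet above).
cat > /tmp/kern-sample.log <<'EOF'
06:16:18 kernel: [58632.866347] Out of memory: Killed process 1235 (mariadbd) total-vm:3061516kB, anon-rss:15524kB, file-rss:0kB, shmem-rss:0kB, UID:115 pgtables:1376kB oom_score_adj:0
06:16:37 kernel: [58644.226663] Out of memory: Killed process 68068 (php-fpm8.2) total-vm:967848kB, anon-rss:70032kB, file-rss:2596kB, shmem-rss:7172kB, UID:1001 pgtables:408kB oom_score_adj:0
EOF

# Print the PID and process name of each OOM-kill victim.
grep 'Out of memory' /tmp/kern-sample.log \
  | sed -E 's/.*Killed process ([0-9]+) \(([^)]+)\).*/pid=\1 name=\2/'
```

Running this over a longer window shows whether mariadbd is consistently the first victim or merely one of several, which matters for the mitigation below.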

Failure around 4:26 PM:
![chrome_elqPHKv5Mg](https://github.com/cloudpanel-io/cloudpanel-ce/assets/60265788/eec0d35e-d325-4eb1-ab0d-11fe02eb9d02)
Note: The system experiences high CPU utilization and continuous disk writing until both memory and swap resources are fully exhausted.



Updates:
Our troubleshooting efforts are ongoing. Based on the error logs, we disabled two of the suspected processes: Redis and the Varnish cache.

However, the problem persists even with both processes deactivated, so we conclude that neither Redis nor Varnish is responsible. Further investigation is required to identify the root cause.
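Since the OOM killer keeps choosing mariadbd as its victim, one stop-gap (not a root-cause fix) is to make MariaDB a less attractive target via a systemd drop-in; a sketch, assuming a standard `mariadb.service` unit:

```ini
# sudo systemctl edit mariadb.service
# (writes /etc/systemd/system/mariadb.service.d/override.conf)
[Service]
# Range is -1000..1000; lower values make the kernel prefer other
# processes as OOM victims. -1000 would exempt mariadbd entirely,
# which risks hanging the whole machine instead.
OOMScoreAdjust=-600
```

Followed by `systemctl daemon-reload` and a service restart. This does not reduce memory pressure at all; it only shifts which process gets killed when memory runs out.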

We logged failures over the last few days and found no apparent pattern; the failures seem to be getting more frequent and more sporadic. Earlier failures occurred only at night, between 3 and 5 AM (the daily backup is scheduled at 2 AM).

Failure at: 2023-08-22 23:09:03
Failure at: 2023-08-23 02:01:55
Failure at: 2023-08-23 18:55:04
Failure at: 2023-08-25 17:02:31
Failure at: 2023-08-25 17:03:02
Failure at: 2023-08-26 03:35:02
Failure at: 2023-08-27 03:06:09
Failure at: 2023-08-27 17:41:02
Failure at: 2023-08-28 14:12:55
Failure at: 2023-08-30 10:09:54
Failure at: 2023-08-30 10:12:02
Failure at: 2023-08-31 07:35:02
Failure at: 2023-09-01 03:18:34
Failure at: 2023-09-01 03:18:35
Failure at: 2023-09-01 14:50:17
Failure at: 2023-09-01 18:48:19
Failure at: 2023-09-01 19:00:08
Failure at: 2023-09-02 00:06:29
Failure at: 2023-09-02 20:15:02
Failure at: 2023-09-03 14:06:57
Failure at: 2023-09-03 14:08:02
Failure at: 2023-09-03 15:40:02
Failure at: 2023-09-03 23:12:15
Failure at: 2023-09-05 04:09:02
Failure at: 2023-09-05 15:26:01
Failure at: 2023-09-06 00:14:02
Failure at: 2023-09-06 15:53:13
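The night-time hypothesis can be sanity-checked by bucketing the timestamps above by hour of day; a quick shell sketch, with the timestamps copied verbatim from the list:

```shell
# Bucket the logged failure times by hour of day to look for clustering.
cat > /tmp/failures.txt <<'EOF'
2023-08-22 23:09:03
2023-08-23 02:01:55
2023-08-23 18:55:04
2023-08-25 17:02:31
2023-08-25 17:03:02
2023-08-26 03:35:02
2023-08-27 03:06:09
2023-08-27 17:41:02
2023-08-28 14:12:55
2023-08-30 10:09:54
2023-08-30 10:12:02
2023-08-31 07:35:02
2023-09-01 03:18:34
2023-09-01 03:18:35
2023-09-01 14:50:17
2023-09-01 18:48:19
2023-09-01 19:00:08
2023-09-02 00:06:29
2023-09-02 20:15:02
2023-09-03 14:06:57
2023-09-03 14:08:02
2023-09-03 15:40:02
2023-09-03 23:12:15
2023-09-05 04:09:02
2023-09-05 15:26:01
2023-09-06 00:14:02
2023-09-06 15:53:13
EOF

# Count failures per hour, most frequent first.
awk '{print substr($2,1,2)}' /tmp/failures.txt | sort | uniq -c | sort -rn
```

In this sample, 03:00 and 14:00 tie for the most failures (four each), which already weakens a pure backup-window explanation.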