MySQL 崩溃导致 Discuz! Database Error (2002) notconnect 错误

发布时间: 2015-12-12 20:45:34 作者: 大象笔记

中午发现周末搭建的 Disuz 论坛以及周五搭建的测试用 WordPress 均报错无法访问。

以 Discuz 为例,报错信息为:

Discuz! Database Error (2002) notconnect

很明显,问题出在了 MySQL。重新刷新,错误依旧,说明不是连接池问题。

重启 MySQL

首要工作是把 MySQL 重新启动起来

$ sudo mysqld
150608 11:49:31 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.
150608 11:49:31 [Note] mysqld (mysqld 5.5.43-0ubuntu0.12.04.1) starting as process 23352 ...

但是,

ps axuw | grep mysql 

查不到 mysqld 的进程,说明 mysql 没有启动成功。同时检查 mysql 日志,没有发现任何信息

less /var/log/mysql.log
less /var/log/mysql.err

这两个文件居然是空文件 :(

首先怀疑 mysql 的日志配置是否有问题, 查看 mysql 配置文件 /etc/mysql/my.cnf, 可以查到 mysql 默认的错误日志地址为

error_log=/var/log/mysql/error.log

修正日志目录的权限

chown zhongwei:zhongwei /var/log/mysql

再重新启动 mysqld

$ sudo mysqld
150608 11:57:21 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.
150608 11:57:21 [Note] mysqld (mysqld 5.5.43-0ubuntu0.12.04.1) starting as process 23859 ...
150608 11:57:21 [Warning] Using unique option prefix myisam-recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
150608 11:57:21 [Note] Plugin 'FEDERATED' is disabled.
150608 11:57:21 InnoDB: The InnoDB memory heap is disabled
150608 11:57:21 InnoDB: Mutexes and rw_locks use GCC atomic builtins
150608 11:57:21 InnoDB: Compressed tables use zlib 1.2.3.4
150608 11:57:21 InnoDB: Initializing buffer pool, size = 128.0M
InnoDB: mmap(137363456 bytes) failed; errno 12
150608 11:57:21 InnoDB: Completed initialization of buffer pool
150608 11:57:21 InnoDB: Fatal error: cannot allocate memory for the buffer pool
150608 11:57:21 [ERROR] Plugin 'InnoDB' init function returned error.
150608 11:57:21 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
150608 11:57:21 [ERROR] Unknown/unsupported storage engine: InnoDB
150608 11:57:21 [ERROR] Aborting

150608 11:57:21 [Note] mysqld: Shutdown complete

终于看到具体的错误信息了

150608 11:57:21 InnoDB: Initializing buffer pool, size = 128.0M
InnoDB: mmap(137363456 bytes) failed; errno 12
150608 11:57:21 InnoDB: Completed initialization of buffer pool
150608 11:57:21 InnoDB: Fatal error: cannot allocate memory for the buffer pool

网上查询了一下,其中一种解决方案是在 my.conf 中将这个 buffer 调小

innodb_buffer_pool_size=32M

然后再尝试启动,mysql 恢复正常了。

$ sudo mysqld
150608 13:18:02 [Warning] Using unique option prefix key_buffer instead of key_buffer_size is deprecated and will be removed in a future release. Please use the full name instead.
150608 13:18:02 [Note] mysqld (mysqld 5.5.43-0ubuntu0.12.04.1) starting as process 24222 ...
150608 13:18:02 [Warning] Using unique option prefix myisam-recover instead of myisam-recover-options is deprecated and will be removed in a future release. Please use the full name instead.
150608 13:18:02 [Note] Plugin 'FEDERATED' is disabled.
150608 13:18:02 InnoDB: The InnoDB memory heap is disabled
150608 13:18:02 InnoDB: Mutexes and rw_locks use GCC atomic builtins
150608 13:18:02 InnoDB: Compressed tables use zlib 1.2.3.4
150608 13:18:02 InnoDB: Initializing buffer pool, size = 32.0M
150608 13:18:02 InnoDB: Completed initialization of buffer pool
150608 13:18:02 InnoDB: highest supported file format is Barracuda.
InnoDB: The log sequence number in ibdata files does not match
InnoDB: the log sequence number in the ib_logfiles!
150608 13:18:02  InnoDB: Database was not shut down normally!
InnoDB: Starting crash recovery.
InnoDB: Reading tablespace information from the .ibd files...
InnoDB: Restoring possible half-written data pages from the doublewrite
InnoDB: buffer...
150608 13:18:03  InnoDB: Waiting for the background threads to start
150608 13:18:04 InnoDB: 5.5.43 started; log sequence number 107010173
150608 13:18:04 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
150608 13:18:04 [Note]   - '0.0.0.0' resolves to '0.0.0.0';
150608 13:18:04 [Note] Server socket created on IP: '0.0.0.0'.
150608 13:18:04 [ERROR] mysqld: Table './mysql/user' is marked as crashed and should be repaired
150608 13:18:04 [Warning] Checking table:   './mysql/user'
150608 13:18:04 [ERROR] 1 client is using or hasn't closed the table properly
150608 13:18:04 [ERROR] mysqld: Table './mysql/db' is marked as crashed and should be repaired
150608 13:18:04 [Warning] Checking table:   './mysql/db'
150608 13:18:04 [ERROR] 1 client is using or hasn't closed the table properly
150608 13:18:04 [Note] Event Scheduler: Loaded 0 events
150608 13:18:04 [Note] mysqld: ready for connections.
Version: '5.5.43-0ubuntu0.12.04.1'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  (Ubuntu)

此时 Ctrl-Z,回到命令行模式,MySQL 已经可以正常使用了,此时访问 Discuz 论坛,一起正常。 只是,错误日志里会看到 Discuz 每个相关的表,都在报这个错误

150608 13:18:04 [ERROR] mysqld: Table 'XXX' is marked as crashed and should be repaired
150608 13:18:04 [Warning] Checking table:   'XXX'

貌似忽略了也没有影响,进入 mysql, 使用 check table XXX 显示正常。

innodb_buffer_pool_size 有什么用

参考 InnoDB Startup Options and System Variables

The size in bytes of the memory buffer InnoDB uses to cache data and indexes of its tables. The default value is 8MB. The larger you set this value, the less disk I/O is needed to access data in tables. On a dedicated database server, you may set this to up to 80% of the machine physical memory size. However, do not set it too large because competition for physical memory might cause paging in the operating system.

原来是用来缓存数据和索引,以减少磁盘 I/O.

奇葩的是,为何我的 mysql 默认配置不是 8M,而是 128M.

原来有少侠有同样的疑问,innodb_buffer_pool_size with 128M default is too large innodb_buffer_pool_size 的默认值,在 5.5 之后就被调成了 128M, 参考最新的 mysql 5.5, 5.6 文档就会看到。。。

原来我看的是 5.0 的文档,吐血。。。

5.5 的文档在这里 innodb_buffer_pool_size

监控

为了防止出现 Discuz 因 mysql 崩溃而无法访问,增加了 web 宕机监控,至少服务挂掉的时候能收到报警通知

阿里云监控 监控上对应的首页即可。

然而,这并没有什么卵用

mysql 又挂了,而且没有错误日志。。。

使用

sudo service mysql start

There are stopped jobs

解决 innodb_buffer_pool_size 内存不够的方案有哪些

理论上有三种方式

上面使用的第三种方案,而第一种方案需要 money。那么来探讨一下第二种方案吧。

我所使用的阿里云 VPS 默认没有交换分区,也是在 top 时无意中发现的。。。

$ free
total       used       free     shared    buffers     cached
Mem:        501892     449024      52868          0       7468      32968
-/+ buffers/cache:     408588      93304
Swap:            0          0          0

常识是,SWAP 设置为内存的 2 倍。为何云主机默认没有设置交换分区,原因是,阿里云担心 swap 分区会导致硬盘IO性能下降。

首先,创建用于交换分区的文件, 命令格式 dd if=/dev/zero of=/mnt/swap bs=block_size count=number_of_block

sudo dd if=/dev/zero of=/mnt/swap bs=1M count=1024

注:block_size、number_of_block 大小可以自定义,比如bs=1M count=1024 代表设置1G大小swap分区

设置交换分区文件

sudo mkswap /mnt/swap

立即启用交换分区文件

sudo swapon /mnt/swap

如果在 /etc/rc.local 中有 swapoff -a 需要修改为 swapon -a

设置开机时自启用swap分区, 需要修改文件 /etc/fstab 中的 swap 行。 添加

/mnt/swap swap swap defaults 0 0

注:/mnt/swap 路径可以修改,可以根据创建的swap文件具体路径来配置。

设置后可以执行 free -m 命令查看效果

$ free -m
total       used       free     shared    buffers     cached
Mem:           490        481          8          0          2         21
-/+ buffers/cache:        457         32
Swap:         1023          0       1023

如果你使用的是 digitalocean 的主机,这个配置更专业 How To Add Swap on Ubuntu 14.04

交换分区设置多大为好

参考 Recommended Partitioning Scheme

| 系统内存大小 | Recommended swap space     | Recommended swap space if allowing for hibernation |
⩽ 2GB                2 times the amount of RAM        3 times the amount of RAM                 
> 2GB – 8GB      Equal to the amount of RAM      2 times the amount of RAM                          
> 8GB – 64GB    At least 4 GB                             1.5 times the amount of RAM                        
> 64GB              At least 4 GB                             Hibernation not recommended   

更详细的关于交换分区的讨论,参考 SwapFaq-Ubuntu

为何考虑休眠时,需要更大的交换分区?

因为休眠时,会将内存中的数据都转移到交换分区中,所以交换分区的大小至少要等于内存的大小。

交换分区使用的多,一定会导致系统卡么?

不一定,因为有的程序在启动时可能需要大量的内存,但是执行中期可能就没有这么大的内存要求,系统可以调度内存使用, 将其占用的内存转移到交换分区中,为其他程序腾出内存空间。

mysql 被系统干掉的日志找到了

系统干掉内存占用过大进程的日志在哪里? /var/log/kern.log

Jun  7 19:40:22 kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
Jun  7 19:40:22 kernel: [  538]     0   538     3795       49   0       0             0 upstart-socket-
Jun  7 19:40:22 kernel: [ 6643]  1000  6643    11771     2101   0       0             0 gunicorn
Jun  7 19:40:22 kernel: [27299]   104 27299     9445      160   0       0             0 ntpd
Jun  7 19:40:22 kernel: [  342]   105   342   222975    15389   0       0             0 mysqld
Jun  7 19:40:22 kernel: [19509]     0 19509    37137      749   0       0             0 php5-fpm
Jun  7 19:40:22 kernel: [19513]    33 19513    41245     4449   0       0             0 php5-fpm
Jun  7 19:40:22 kernel: [19514]    33 19514    44302     7499   0       0             0 php5-fpm
Jun  7 19:40:22 kernel: Out of memory: Kill process 342 (mysqld) score 123 or sacrifice child
Jun  7 19:40:22 kernel: Killed process 342 (mysqld) total-vm:891900kB, anon-rss:61556kB, file-rss:0kB
Jun  7 19:40:22 kernel: init: mysql main process (342) killed by KILL signal
Jun  7 19:40:22 kernel: init: mysql main process ended, respawning

奇怪的是 mysql 被干掉时,实际驻留内存 rss 只有 15M, 并不大。

按照 Linux -- 内存控制之oom killer机制及代码分析 的说明, 在内存不够用时,系统会判断哪个进程该被干掉,而不是某个进程内存达到某个阈值时被立即干掉的。

通过 /var/log/kern.log 发现,mysql 被干掉的那一瞬间,php5-fpm 占用的内存才是大头。这个如何设置内存限制呢?

仍有疑问的地方

参考

我是一名山东烟台的开发者,联系作者