MySQL replication lag of tens of thousands of seconds: Queueing master event to the relay log

Date: 2019-08-23 17:27  Source: linux.it.net.cn  Author: IT

Database version
Server version:    5.6.24-log Source distribution

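Problem description

After a bulk data load, the business database of the data collection platform fell more than ten thousand seconds behind its master.

The replication thread stayed in the Queueing master event to the relay log state for a long time.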

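Monitoring data
1. Seconds_Behind_Master held at around 60,000 seconds and was trending upward.
2. The master had a large backlog of binlogs that could not be shipped to the slave, yet NIC traffic on both master and slave was low and nowhere near a bottleneck.
3. QPS and TPS on the slave were low, staying at a few hundred.
4. CPU load was not high, but iowait held at about 12%.
5. iostat -xmd 2 showed util holding at 99%.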
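
One way to confirm where the backlog sits is to compare binlog coordinates on both sides. The statements below are a minimal sketch using standard MySQL 5.6 commands. The interpretation in the comments is an assumption about this case, not output captured from the author's servers.

```sql
-- On the master: current binlog file and position being written.
SHOW MASTER STATUS;

-- On the slave: compare these fields against the master's coordinates.
--   Master_Log_File / Read_Master_Log_Pos       = how far the I/O thread has read
--   Relay_Master_Log_File / Exec_Master_Log_Pos = how far the SQL thread has applied
--   Slave_IO_State                              = current state of the I/O thread
SHOW SLAVE STATUS;
```

If Read_Master_Log_Pos trails the master's position by a wide margin while Exec_Master_Log_Pos stays close to Read_Master_Log_Pos, the I/O thread rather than the SQL thread is the slow side. That would match the Queueing master event to the relay log state, which is an I/O thread state.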

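Analysis
1. The monitoring data pointed to a disk I/O bottleneck. Tests with tools such as fileio and dd confirmed that disk I/O really was poor.
Disabling the "double 1" settings (sync_binlog, innodb_flush_log_at_trx_commit) on the slave brought no noticeable drop in replication lag.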
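
The author does not say which values were used when relaxing the "double 1" settings. The sketch below assumes the commonly used relaxed values (sync_binlog = 0, innodb_flush_log_at_trx_commit = 2). Both variables are dynamic, so no restart is needed.

```sql
-- Record the current values first so they can be restored afterwards.
SHOW GLOBAL VARIABLES
  WHERE Variable_name IN ('sync_binlog', 'innodb_flush_log_at_trx_commit');

-- Assumed relaxed values, applied on the slave only.
-- sync_binlog = 0: leave binlog flushing to the operating system.
SET GLOBAL sync_binlog = 0;
-- innodb_flush_log_at_trx_commit = 2: write the redo log at commit,
-- but flush it to disk only about once per second.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
```

These settings trade crash safety for fewer disk syncs, so they are usually reverted once the slave has caught up.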

2. Copying a large number of small files from other hosts revealed no network transfer problem.

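3. Disable parallel replication
This MySQL 5.6 instance had database-level parallel replication enabled, but show processlist showed that most of the parallel replication worker threads on the slave were idle.
Disabling parallel replication did not reduce the replication lag either.
stop slave sql_thread;set global slave_parallel_workers=0;start slave sql_thread;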
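
The idle workers can also be listed with a query instead of scanning the full show processlist output. This is a sketch that assumes the replication threads appear under the user name 'system user', as they normally do in MySQL 5.6.

```sql
-- List only the replication threads (I/O thread, coordinator, workers).
-- In 5.6, idle workers typically report the state
-- 'Waiting for an event from Coordinator'.
SELECT id, command, time, state
FROM information_schema.processlist
WHERE user = 'system user'
ORDER BY id;
```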

4. Parsing the binlog revealed no inefficient SQL, no tables without a primary key, and no similar problems.

5. Checking the MySQL parameter configuration brought the problem to light.

    mysql> show variables where variable_name in('slave_parallel_workers','read_only', 'master_info_repository','relay_log_info_repository','slave_net_timeout','log_slave_updates', 'slave_compressed_protocol','sync_master_info','sync_relay_log','sync_relay_log_info','relay_log_purge');
    +---------------------------+-------+
    | Variable_name             | Value |
    +---------------------------+-------+
    | log_slave_updates         | ON    |
    | master_info_repository    | FILE  |
    | read_only                 | OFF   |
    | relay_log_info_repository | FILE  |
    | relay_log_purge           | OFF   |
    | slave_compressed_protocol | ON    |
    | slave_net_timeout         | 10    |
    | slave_parallel_workers    | 6     |
    | sync_master_info          | 1     |
    | sync_relay_log            | 10000 |
    | sync_relay_log_info       | 10000 |
    +---------------------------+-------+

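The check showed that master_info_repository was set to FILE while sync_master_info was set to 1.
Together, these two settings mean that the slave syncs its master.info file to disk after every sync_master_info events, that is, after every single event. Because the disk performed so poorly, each fdatasync() call was slow, and that caused the replication lag.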
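
The same semantics suggest a second knob, shown here only as an illustration and not as what the author did: keep the FILE repository but sync less often. The value 10000 is the MySQL 5.6 default for sync_master_info. Whether it would have been enough on this hardware is an assumption.

```sql
-- Confirm the combination that forces one fdatasync() per event.
SHOW GLOBAL VARIABLES
  WHERE Variable_name IN ('master_info_repository', 'sync_master_info');

-- Hypothetical alternative: sync master.info once every 10000 events
-- instead of after every event. sync_master_info is a dynamic variable.
SET GLOBAL sync_master_info = 10000;
```

A larger value means more of the I/O thread's position can be lost if the slave host crashes, so it is a durability trade-off.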

Solution

After master_info_repository was set to TABLE, the replication lag dropped sharply.
stop slave;set global relay_log_info_repository='TABLE';set global master_info_repository='TABLE';start slave;
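
To check that the switch took effect, the repository settings and the backing tables can be inspected. This is a minimal sketch using the standard MySQL 5.6 tables mysql.slave_master_info and mysql.slave_relay_log_info.

```sql
-- Both repositories should now report TABLE.
SHOW GLOBAL VARIABLES LIKE '%info_repository';

-- I/O thread position, now stored in a table instead of master.info.
SELECT Master_log_name, Master_log_pos
FROM mysql.slave_master_info;

-- SQL thread position, now stored in a table instead of relay-log.info.
SELECT Relay_log_name, Relay_log_pos, Master_log_name, Master_log_pos
FROM mysql.slave_relay_log_info;
```

set global does not survive a server restart in 5.6, so to make the change permanent the same two settings would also need to go into the [mysqld] section of my.cnf.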

How the official documentation describes sync_master_info under each master_info_repository setting

https://dev.mysql.com/doc/refman/5.6/en/replication-options-slave.html#sysvar_sync_master_info
master_info_repository = FILE. If the value of sync_master_info is greater than 0, the slave synchronizes its master.info file to disk (using fdatasync()) after every sync_master_info events. If it is 0, the MySQL server performs no synchronization of the master.info file to disk; instead, the server relies on the operating system to flush its contents periodically as with any other file.
master_info_repository = TABLE. If the value of sync_master_info is greater than 0, the slave updates its master info repository table after every sync_master_info events. If it is 0, the table is never updated.

(Editor: IT)