一、主从复制

1.主从复制介绍

两台MySQL实例,通过binlog 自动 “实时” 同步数据
通常可以用作:辅助备份+ 架构演变

2.主从复制搭建
  • 两台以上MySQL实例(server_id不同,port,socket等),确认主从身份
mysql> select @@server_id;
  • 主库开启二进制(binlog)日志
mysql> select @@log_bin;
  • 主库开启复制专用用户:replication slave
mysql> grant replication slave on *.* to repl@'10.0.0.5%' identified by '123';
  • 主库备份恢复到从库
mysqldump -uroot -p456 -S /data/3307/mysql.sock -A --master-data=2 --single-transaction >/tmp/full.sql
mysql -S /data/3308/mysql.sock
mysql> set sql_log_bin=0;
mysql> source /tmp/full.sql
  • 告诉从库复制相关信息:change master to ,IP,port user password file_name position
    • 起点:
vim /tmp/full.sql
-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=444;
mysql> change master to;
CHANGE MASTER TO
  MASTER_HOST='10.0.0.62',
  MASTER_USER='repl',
  MASTER_PASSWORD='123456',
  MASTER_PORT=3307,
  MASTER_LOG_FILE='mysql-bin.000002',
  MASTER_LOG_POS=444,
  MASTER_CONNECT_RETRY=10;
  • 从库开启复制线程:start slave;
mysql> show slave status \G;
--- 两个yes 就是成功
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
3.主从复制过程原理
3.1 涉及到的文件
  • 主库:binlog
  • 从库:
    • relaylog 中继日志:存储请求的binlog
    • master.info :存储change master to 信息
    • relaylog.info:上次已经回放relay的位置点
3.2 涉及到的线程
  • 主库
    • binlog dump thread
      mysql> show processlist\G;
      *************************** 1. row ***************************
           Id: 6
         User: repl
         Host: db01:55196
           db: NULL
      Command: Binlog Dump
         Time: 10015
        State: Master has sent all binlog to slave; waiting for more updates
         Info: NULL
  • 从库 :
    • slave_io_running
    • slave_sql_running
[root@db01 ~]# mysql -S /data/3308/mysql.sock -e "show slave status \G"|grep Running;
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
3.3工作过程原理:
  • change master to 时,ip port user password binlog position写入到master.info进行记录
  • start slave 时,从库会启动IO线程和SQL线程
  • IO_T,读取master.info信息,获取主库信息连接主库
  • 主库会生成一个准备binlog DUMP线程,来响应从库
  • IO_T根据master.info记录的binlog文件名和position号,请求主库DUMP最新日志
  • DUMP线程检查主库的binlog日志,如果有新的,TP(传送)给从从库的IO_T
  • IO_T将收到的日志存储到了TCP/IP 缓存,立即返回ACK(网络层面)给主库 ,主库工作完成
  • IO_T将缓存中的数据,存储到relay-log日志文件,更新master.info文件binlog 文件名和postion,IO_T工作完成
  • SQL_T读取relay-log.info文件,获取到上次执行到的relay-log的位置,作为起点,回放relay-log
  • SQL_T回放完成之后,会更新relay-log.info文件。
  • relay-log会有自动清理的功能。
    在这里插入图片描述

细节:

  • 主库一旦有新的日志生成,会发送“信号”给binlog dump ,IO线程再请求
[root@db01 world]# cd /data/3308/data/
[root@db01 data]# ll
total 122940
-rw-r----- 1 mysql mysql       56 Sep  3 15:18 auto.cnf
-rw-r----- 1 mysql mysql        6 Sep 16 12:19 db01.pid
-rw-r----- 1 mysql mysql      206 Sep 16 12:28 db01-relay-bin.000001
-rw-r----- 1 mysql mysql      320 Sep 16 12:28 db01-relay-bin.000002
-rw-r----- 1 mysql mysql       48 Sep 16 12:28 db01-relay-bin.index
drwxr-x--- 2 mysql mysql       20 Sep 16 12:23 ha
-rw-r----- 1 mysql mysql      359 Sep 16 10:13 ib_buffer_pool
-rw-r----- 1 mysql mysql 12582912 Sep 16 12:24 ibdata1
-rw-r----- 1 mysql mysql 50331648 Sep 16 12:24 ib_logfile0
-rw-r----- 1 mysql mysql 50331648 Sep 16 12:24 ib_logfile1
-rw-r----- 1 mysql mysql 12582912 Sep 16 12:19 ibtmp1
-rw-r----- 1 mysql mysql      122 Sep 16 15:52 master.info
drwxr-x--- 2 mysql mysql       48 Sep 16 12:23 mdp
drwxr-x--- 2 mysql mysql     4096 Sep 16 12:23 mysql
drwxr-x--- 2 mysql mysql       54 Sep 16 12:23 oldboy
drwxr-x--- 2 mysql mysql     8192 Sep  3 15:18 performance_schema
-rw-r----- 1 mysql mysql       58 Sep 16 12:28 relay-log.info
drwxr-x--- 2 mysql mysql       20 Sep 16 12:23 sch
drwxr-x--- 2 mysql mysql      176 Sep 16 12:23 school
drwxr-x--- 2 mysql mysql     8192 Sep  3 15:18 sys
drwxr-x--- 2 mysql mysql       52 Sep 16 12:23 test
drwxr-x--- 2 mysql mysql      246 Sep 16 12:23 world
drwxr-x--- 2 mysql mysql       46 Sep 16 12:23 x

从库可以查看日志:master.info

[root@db01 data]# cat master.info 
25
mysql-bin.000002
444
10.0.0.51
repl
123456
3307
10
0





0
30.000

0
031e1ee9-ce1b-11e9-8e51-000c29303796
86400


0

4.主从复制监控
mysql> show slave status \G;
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.0.0.51
                  Master_User: repl
                  Master_Port: 3307
                Connect_Retry: 10
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 444
               Relay_Log_File: db01-relay-bin.000002
                Relay_Log_Pos: 320
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 444
              Relay_Log_Space: 526
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 7
                  Master_UUID: 031e1ee9-ce1b-11e9-8e51-000c29303796
             Master_Info_File: /data/3308/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
         Replicate_Rewrite_DB: 
                 Channel_Name: 
           Master_TLS_Version: 
1 row in set (0.00 sec)
4.1主库相关信息
Master_Host: 10.0.0.51
Master_User: repl
Master_Port: 3307
Connect_Retry: 10
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 444
4.2 从库relaylog 的信息(relay-log.info)从库的320对应主库的444
Relay_Log_File: db01-relay-bin.000002
Relay_Log_Pos: 320
Relay_Master_Log_File: mysql-bin.000002
Exec_Master_Log_Pos: 444
4.3 线程状态:
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
--- 如果上面线程启动失败,则看下面的参数
Last_IO_Errno: 0
Last_IO_Error: 
Last_SQL_Errno: 0
Last_SQL_Error: 

或者:
从库日志(log_error)线程启动no,但是没有报错信息可以查看

4.4 过滤复制有关信息
Replicate_Do_DB: 
Replicate_Ignore_DB: 
Replicate_Do_Table: 
Replicate_Ignore_Table: 
Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
4.5 从库延时主库的时间
Seconds_Behind_Master: 0
4.6 延时从库(可以手动定制)
SQL_Delay: 0
SQL_Remaining_Delay: NULL
4.7 GTID 复制状态
Retrieved_Gtid_Set: 
Executed_Gtid_Set: 
Auto_Position: 0
5.主从复制故障
5.1 IO线程
Slave_IO_Running: Yes
Last_IO_Errno: 0
Last_IO_Error: 

1)连接主库 connecting

  • IP
  • port
  • password
  • 网络
  • 防火墙
  • 主库没起
    排查:
    1. 查看从库的master.info
[root@db01 data]# pwd
/data/3308/data
[root@db01 data]# cat master.info 
25
mysql-bin.000002
444
10.0.0.51
repl
123456
3307
10
0





0
30.000

0
031e1ee9-ce1b-11e9-8e51-000c29303796
86400


0
  1. 使用以上信息手工连接,查看结果
[root@db01 ~]# mysql -urepl -p123 -h 10.0.0.52 -P3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on '10.0.0.52' (113)
[root@db01 ~]# mysql -urepl -p123 -h 10.0.0.51 -P3301
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on '10.0.0.51' (111)
[root@db01 ~]# mysql -urepl -p123 -h 10.0.0.51 -P3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 2003 (HY000): Can't connect to MySQL server on '10.0.0.51' (111)
[root@db01 ~]# mysql -urepl -p1234 -h 10.0.0.51 -P3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'repl'@'db01' (using password: YES)
[root@db01 ~]# mysql -urepla -p123 -h 10.0.0.51 -P3307
mysql: [Warning] Using a password on the command line interface can be insecure.
ERROR 1045 (28000): Access denied for user 'repla'@'db01' (using password: YES)

在从库处理:

stop slave all;
reset slave all
change master to..
start salve;
show salve status\G;

2)请求binlog
起点错误
主库二进制日志破坏

Got fatal error 1236 from master when reading data from binary log: 'log event entry exceeded max_allowed_packet; Increase max_allowed_packet on master; the first event 'mysql-bin.000005' at 653, the last event read from '/data/3307/mysql-bin.000005' at 123, the last byte read from '/data/3307/mysql-bin.000005' at 672.'

3)记录日志

5.2 SQL线程

回放Relaylog(binlog):SQL回放不了

SQL语句为什么会执行失败?
语法,sql_mode,
权限
创建的对象已经存在
从库创建一个库,然后在主库在创建一个库,就会报sql线程错误,需要把从库的库删除之后重新start slave;
修改或删除的对象不存在
约束冲突
数据类型
死锁
锁等待超时
命令失效
总结:
版本不一致
从库写入数据了:只读,读写分离中间件
手工补偿
pt-checksume
pt-sync

方法一:
stop slave; 
set global sql_slave_skip_counter = 1;
#将同步指针向下移动一个,如果多次不同步,可以重复操作。
start slave;
方法二:
/etc/my.cnf
slave-skip-errors = 1032,1062,1007
常见错误代码:
1007:对象已存在
1032:无法执行DML
1062:主键冲突,或约束冲突

但是,以上操作有时是有风险的,最安全的做法就是重新构建主从。把握一个原则,一切以主库为主.
6.主从延时
6.1 基础监控
Seconds_Behind_Master:0
6.2 日志量监控

1)主库

mysql> show master status\G;
*************************** 1. row ***************************
             File: mysql-bin.000008
         Position: 750
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: 1e91fb5a-cd38-11e9-b843-000c29303796:1-6,
b3ad910d-d832-11e9-89dc-000c29303796:1-2,
f66f1a5b-d827-11e9-bdf6-000c29303796:1-6
1 row in set (0.00 sec)

2)从库

mysql> show salve status \G;
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 444
[root@db01 ~]# cat /data/3308/data/relay-log.info 
7
./db01-relay-bin.000002
320
mysql-bin.000002
444
6.3 导致主从延时的原因

主库:

  • sync_binlog=1
  • binlog存储性能低,建议换成SSD
  • binlog_dump 5.6以前的传统复制模型下,是串行工作的,如果在主库发生高并发事务,或者大事务的时候传输的时候,传输日志会受到阻塞。
  • 早期解决办法,切割大事务。
  • 5.6以后有了GTID模式,可以自动进行并行传输日志到从库
  • 5.7以后即使不开启GTID模式也会有匿名的GTID,建议开启

从库:

  • 从库SQL线程串行回放,在主库发生高并发事务,大事务的时候,会出现很高延时
  • 在5.6中,开启GTID之后,使用多SQL线程并行回放,限制:只能针对database级别(不能多个线程同时回放一个库)实现并行
  • 在5.7当中,开启GTID 之后,多SQL 并行回放,基于Logic_clock 实现了真正的事务级别的并行回放,这个技术称为MTS

其他:

  • 网络
  • 硬件差异大
  • 版本
  • 优化参数不一样
  • 锁等待
  • 死锁
  • 从库太多
  • 从库有大量的慢查询
7. 延时从库
7.1 为什么要延时从库
数据库故障?
物理损坏
主从复制非常擅长解决物理损坏.
逻辑损坏
普通主从复制没办法解决逻辑损坏
7.2 配置
SQL线程延时:数据已经写入relaylog中了,SQL线程"慢点"运行
一般企业建议3-6小时,具体看公司运维人员对于故障的反应时间
mysql>stop slave;
mysql>CHANGE MASTER TO MASTER_DELAY = 300;
mysql>start slave;
mysql> show slave status \G
SQL_Delay: 300
SQL_Remaining_Delay: NULL
7.3 如果使用延时从库恢复数据

测试演练:
1)背景:

  • 核心业务库
  • 2T 数据
  • 主从复制 1主3从,主库负责写,两个从库负责读1,1 从库延时3小时
    2)故障:
  • 主库误删除了其中一个核心业务库,大约1T数据,(DROP DATABASE)
    3)思路:
  • 1.快速监控到故障
  • 2.停业务,挂维护页
  • 3.停延时从库
  • 4.手工模拟SQL线程(截取日志)
    • 起点:relay.info
    • 终点:drop之前
      4)模拟过程
      主库模拟数据
--- 主库添加数据,模拟删除
create database delay charset utf8mb4;
use delay;
create table t1 (id int);
insert into t1 values(1),(2),(3);
commit;
insert into t1 values(11),(22),(33);
commit;
insert into t1 values(111),(222),(333);
commit;
drop database delay;
--- 从库停止sql线程
mysql> stop slave sql_thread;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status \G;
起点:
Relay_Log_File: db01-relay-bin.000002
Relay_Log_Pos: 320
终点:
mysql> show relaylog events in 'db01-relay-bin.000002';
| db01-relay-bin.000002 | 1523 | Query          |         7 |        1452 | drop database delay                 

截取relaylog

[root@db01 ~]# mysqlbinlog --start-position=320 --stop-position=1523 /data/3308/data/db01-relay-bin.000002 >/tmp/relay.sql

恢复relaylog

mysql> set sql_log_bin=0;
mysql> source /tmp/relay.sql
mysql> use delay;
mysql> show tables;
+-----------------+
| Tables_in_delay |
+-----------------+
| t1              |
+-----------------+
1 row in set (0.00 sec)

mysql> select *from t1;
+------+
| id   |
+------+
|    1 |
|    2 |
|    3 |
|   11 |
|   22 |
|   33 |
|  111 |
|  222 |
|  333 |
+------+
9 rows in set (0.00 sec)
8.过滤复制
8.1 配置

主库:

mysql> show master status;
--- 白名单
Binlog_Do_DB
--- 黑名单
Binlog_Ignore_DB

从库:

--- 白名单,只复制什么库
Replicate_Do_DB: 
--- 黑名单 ,不复制什么库
Replicate_Ignore_DB: 

--- 只复制什么表
Replicate_Do_Table: 
--- 不复制什么表
Replicate_Ignore_Table: 

---模糊
Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
my.cnf
replicate_do_db=world
replicate_do_table=oldboy.t1
replicate_wild_do_table=oldboy.t*

清理环境恢复主从

从库:

stop  salve;
reset slave all;
drop database delay;
reset master;

主库:

reset master;

从库:

CHANGE MASTER TO
  MASTER_HOST='10.0.0.51',
  MASTER_USER='repl',
  MASTER_PASSWORD='123456',
  MASTER_PORT=3307,
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=154,
  MASTER_CONNECT_RETRY=10;
start slave;

配置:
从库:
添加白名单,设置白名单库,重启从库

change master to master_host='10.0.0.62',master_user='repl',master_password='123456',master_log_file='mysql-bin.000002',master_log_pos=154;
[root@db01 ~]# vim /data/3308/my.cnf 
[mysqld]
basedir=/usr/local/mysql
datadir=/data/3308/data
socket=/data/3308/mysql.sock
log_error=/data/3308/mysql.log
port=3308
server_id=8
log_bin=/data/3308/mysql-bin
replicate_do_db=oldboy
replicate_do_db=school
[root@db01 ~]# systemctl restart mysqld3308

查看从库状态,白名单已经存在

mysql> show slave status\G;
Replicate_Do_DB: oldboy,school

主库:
创建一个新库,去从库查看库是否能创建

mysql> create database aaaa;

从库:
可以看到从库并没有创建aaaa库

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| ha                 |
| mdp                |
| mysql              |
| oldboy             |
| performance_schema |
| sch                |
| school             |
| sys                |
| test               |
| world              |
| x                  |
+--------------------+
12 rows in set (0.00 sec)

主库:
在设置的白名单的库中添加一个表

mysql> use oldboy;
mysql> show tables;
+------------------+
| Tables_in_oldboy |
+------------------+
| t100w            |
+------------------+
1 row in set (0.00 sec)
mysql> create table t1(id int);
Query OK, 0 rows affected (0.01 sec)

mysql> show tables;
+------------------+
| Tables_in_oldboy |
+------------------+
| t1               |
| t100w            |
+------------------+
2 rows in set (0.00 sec)

从库:
查看从库中已经创建成功

mysql> show tables;
+------------------+
| Tables_in_oldboy |
+------------------+
| t1               |
| t100w            |
+------------------+
2 rows in set (0.00 sec)
9.GTID复制
9.1 3台虚拟机测试环境清理重构
pkill mysqld
 \rm -rf /data/mysql/data/*
 \rm -rf /data/binlog/*
9.2 准备配置文件
主库db01:
cat > /etc/my.cnf <<EOF
[mysqld]
basedir=/usr/local/mysql/
datadir=/data/mysql/data
socket=/tmp/mysql.sock
server_id=51
port=3306
secure-file-priv=/tmp
autocommit=0
log_bin=/data/binlog/mysql-bin
binlog_format=row
gtid-mode=on
enforce-gtid-consistency=true
log-slave-updates=1
[mysql]
prompt=db01 [\\d]>
EOF

slave1(db02):
cat > /etc/my.cnf <<EOF
[mysqld]
basedir=/usr/local/mysql
datadir=/data/mysql/data
socket=/tmp/mysql.sock
server_id=52
port=3306
secure-file-priv=/tmp
autocommit=0
log_bin=/data/binlog/mysql-bin
binlog_format=row
gtid-mode=on
enforce-gtid-consistency=true
log-slave-updates=1
[mysql]
prompt=db02 [\\d]>
EOF

slave2(db03):
cat > /etc/my.cnf <<EOF
[mysqld]
basedir=/usr/local/mysql
datadir=/data/mysql/data
socket=/tmp/mysql.sock
server_id=53
port=3306
secure-file-priv=/tmp
autocommit=0
log_bin=/data/binlog/mysql-bin
binlog_format=row
gtid-mode=on
enforce-gtid-consistency=true
log-slave-updates=1
[mysql]
prompt=db03 [\\d]>
EOF
9.3 初始化数据
mysqld --initialize-insecure --user=mysql --basedir=/usr/local/mysql --datadir=/data/mysql/data 
9.4 主库导入测试数据
source /root/world.sql
9.5 启动数据库
/etc/init.d/mysqld start
9.6 构建主从
master:51
slave:52,53

主:

grant replication slave  on *.* to repl@'10.0.0.%' identified by '123';
mysqldump  -A --master-data=2 --single-transaction >/tmp/full.sql

从:

source /tmp/full.sql

change master to 
master_host='10.0.0.51',
master_user='repl',
master_password='123' ,
MASTER_AUTO_POSITION=1;
start slave;

主从复制不能添加 –set-gtid-purged=OFF 参数
mysqldump -A –master-data=2 –single-transaction –set-gtid-purged=OFF >/tmp/full2.sql