MySQL基于MHA高可用测试篇（Binlog模式）

一、环境准备

下面来测试MHA提供的各种功能，包括自动切换、手动切换、在线切换三种常用模式。下面是MHA manager的启动配置。

[root@mha ~]# cat /etc/masterha/app1.cnf
[server default]
manager_log=/var/log/masterha/app1/manager.log
manager_workdir=/var/log/masterha/app1
master_binlog_dir=/data/mysql/log/binlog/
password=123456
ping_interval=3
remote_workdir=/data/mysql/mha
repl_password=123456
repl_user=mysql_slave
secondary_check_script=/usr/local/bin/masterha_secondary_check -s mysql02 -s mysql03 --user=root --port=22 --master_host=mysql01 --master_ip=10.99.73.9 --master_port=3306 --ping_type=SELECT
report_script=/usr/local/bin/send_report
ssh_user=root
ssh_port=22
user=mha

[server1]
candidate_master=1
hostname=10.99.73.9
port=3306

[server2]
candidate_master=1
hostname=10.99.73.10
port=3306

[server3]
hostname=10.99.73.11
no_master=1
port=3306

[root@mha ~]# cat /etc/masterha/app1.cnf

[server default]

manager_log=/var/log/masterha/app1/manager.log

manager_workdir=/var/log/masterha/app1

master_binlog_dir=/data/mysql/log/binlog/

password=123456

ping_interval=3

remote_workdir=/data/mysql/mha

repl_password=123456

repl_user=mysql_slave

secondary_check_script=/usr/local/bin/masterha_secondary_check -s mysql02 -s mysql03 --user=root --port=22 --master_host=mysql01 --master_ip=10.99.73.9 --master_port=3306 --ping_type=SELECT

report_script=/usr/local/bin/send_report

ssh_user=root

ssh_port=22

user=mha

[server1]

candidate_master=1

hostname=10.99.73.9

port=3306

[server2]

candidate_master=1

hostname=10.99.73.10

port=3306

[server3]

hostname=10.99.73.11

no_master=1

port=3306

注意这里没有设置master_ip_failover_script与master_ip_online_change_script脚本，因为我们这里还没有涉及到VIP，所以可以没有这两个脚本，但仅限于测试。

通过前面简单实验，我们知道mha可以正常切换主从，但是当MySQL正常切换之后，应用中是无法正常自动切换的，这时就需要VIP了。VIP配置可以采用两种方式，一种通过keepalived的方式管理虚拟ip的浮动；另外一种通过脚本方式启动虚拟ip的方式（即不需要keepalived或者heartbeat类似的软件）。

一旦有了VIP，所以也就需要给master_ip_failover_script与master_ip_online_change_script提供脚本了。为了防止脑裂发生，推荐生产环境采用脚本的方式来管理虚拟ip，而不是使用keepalived来完成。但是当我们有了VIP之后，在切换时还有一个问题，我们知道网络模型OSI中二层数据链路是靠MAC地址通信的，所以上层无论三层或二层交换机是缓存有VIP的ARP（MAC和IP对应关系）表，交换机靠这个表来进行数据包的转换转发。所以当我们VIP切换后，上层交换机并不会立马刷新自己的ARP缓存表，这就需要我们人工干预了。

其实就是在我们进行切换时，可以通过使用一个arping命令实现。arping是在局域网中使用ARP请求判断目标主机是否在线的工具。你可以使用IP地址或MAC地址作为它的测试目标（因为APRING程序工作于OSI模型中的第二层，ARP协议的数据包无法通过路由器和网关，所以它只能用来检测局域网中的主机）。其命令语法如下：

Usage: arping [-fqbDUAV] [-c count] [-w timeout] [-I device] [-s source] destination

1	Usage: arping [-fqbDUAV] [-c count] [-w timeout] [-I device] [-s source] destination

常用参数介绍：

-b：用于发送以太网广播帧（FFFFFFFFFFFF）。arping一开始使用广播地址，在收到响应后就使用unicast地址。

-q：quiet output不显示任何信息。

-f：表示在收到第一个响应报文后就退出。

-w timeout：设定一个超时时间，单位是秒。如果到了指定时间，arping还没到完全收到响应则退出。

-c count：表示发送指定数量的ARP请求数据包后就停止。如果指定了deadline选项，则arping会等待相同数量的arp响应包，直到超时为止。

-s source：设定arping发送的arp数据包中的SPA字段的值。如果为空，则按下面处理，如果是DAD模式（冲突地址探测），则设置为0.0.0.0，如果是Unsolicited ARP模式（Gratutious ARP）则设置为目标地址，否则从路由表得出。

-I interface：设置ping使用的网络接口。

整体脚本代码如下，根据环境不同可能需要稍微修改，比如ssh连接的端口号和用户名，能提取的都已经变量化了。

#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';
use Getopt::Long;

my (
    $orig_master_port, $orig_master_host, $orig_master_ip, $orig_master_password,
    $orig_master_user, $orig_master_is_new_slave, 
    $orig_master_ssh_user, $orig_master_ssh_port,
    $new_master_user, $new_master_host, $new_master_ip, $new_master_port, 
    $new_master_ssh_user, $new_master_ssh_port, $new_master_password,
    $ssh_user, $ssh_port, $command, $vip, $interface ,$gateway
);

GetOptions(
    'command=s'                => \$command,
    'new_master_host=s'        => \$new_master_host,
    'new_master_ip=s'          => \$new_master_ip,
    'new_master_port=i'        => \$new_master_port,
    'new_master_user=s'        => \$new_master_user,
    'new_master_password=s'    => \$new_master_password,
    'new_master_ssh_user=s'    => \$new_master_ssh_user,
    'new_master_ssh_port=s'    => \$new_master_ssh_port,
    'orig_master_host=s'       => \$orig_master_host,
    'orig_master_ip=s'         => \$orig_master_ip,
    'orig_master_port=i'       => \$orig_master_port,
    'orig_master_user=s'       => \$orig_master_user,
    'orig_master_password=s'   => \$orig_master_password,
    'orig_master_is_new_slave' => \$orig_master_is_new_slave,
    'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
    'orig_master_ssh_port=s'   => \$orig_master_ssh_port,
    'ssh_user=s'               => \$ssh_user,
    'ssh_port=i'               => \$ssh_port,
    'vip=s'                    => \$vip,
    'gateway=s'                => \$gateway,
    'interface=s'              => \$interface,
);

$new_master_ssh_user = "root";
$new_master_ssh_port = "22";
$orig_master_ssh_user = "root";
$orig_master_ssh_port = "22";

my $key = '1';
my @vipnomask=(split(/\//,$vip))[0];
my $ssh_start_vip = "sudo /sbin/ifconfig $interface:$key $vip && sudo /sbin/arping -I $interface -c 1 -s @vipnomask $gateway";
my $ssh_stop_vip = "sudo /sbin/ifconfig $interface:$key down";


exit &main();

sub main {

    print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

    if ( $command eq "stop" || $command eq "stopssh" ) {

        my $exit_code = 1;
        eval {
            print "Disabling the VIP on old master: $orig_master_host \n";
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn "Got Error: $@\n";
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "start" ) {

        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh -p $new_master_ssh_port $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
sub stop_vip() {
    `ssh -p $orig_master_ssh_port $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
    print
    "Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}

100

101

102

103

104

105

#!/usr/bin/env perl

use strict;

use warnings FATAL => 'all';

use Getopt::Long;

my (

$orig_master_port, $orig_master_host, $orig_master_ip, $orig_master_password,

$orig_master_user, $orig_master_is_new_slave,

$orig_master_ssh_user, $orig_master_ssh_port,

$new_master_user, $new_master_host, $new_master_ip, $new_master_port,

$new_master_ssh_user, $new_master_ssh_port, $new_master_password,

$ssh_user, $ssh_port, $command, $vip, $interface ,$gateway

);

GetOptions(

'command=s' => \$command,

'new_master_host=s' => \$new_master_host,

'new_master_ip=s' => \$new_master_ip,

'new_master_port=i' => \$new_master_port,

'new_master_user=s' => \$new_master_user,

'new_master_password=s' => \$new_master_password,

'new_master_ssh_user=s' => \$new_master_ssh_user,

'new_master_ssh_port=s' => \$new_master_ssh_port,

'orig_master_host=s' => \$orig_master_host,

'orig_master_ip=s' => \$orig_master_ip,

'orig_master_port=i' => \$orig_master_port,

'orig_master_user=s' => \$orig_master_user,

'orig_master_password=s' => \$orig_master_password,

'orig_master_is_new_slave' => \$orig_master_is_new_slave,

'orig_master_ssh_user=s' => \$orig_master_ssh_user,

'orig_master_ssh_port=s' => \$orig_master_ssh_port,

'ssh_user=s' => \$ssh_user,

'ssh_port=i' => \$ssh_port,

'vip=s' => \$vip,

'gateway=s' => \$gateway,

'interface=s' => \$interface,

);

$new_master_ssh_user = "root";

$new_master_ssh_port = "22";

$orig_master_ssh_user = "root";

$orig_master_ssh_port = "22";

my $key = '1';

my @vipnomask=(split(/\//,$vip))[0];

my $ssh_start_vip = "sudo /sbin/ifconfig $interface:$key $vip && sudo /sbin/arping -I $interface -c 1 -s @vipnomask $gateway";

my $ssh_stop_vip = "sudo /sbin/ifconfig $interface:$key down";

exit &main();

sub main {

print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";

if ( $command eq "stop" || $command eq "stopssh" ) {

my $exit_code = 1;

eval {

print "Disabling the VIP on old master: $orig_master_host \n";

&stop_vip();

$exit_code = 0;

};

if ($@) {

warn "Got Error: $@\n";

exit $exit_code;

}

exit $exit_code;

}

elsif ( $command eq "start" ) {

my $exit_code = 10;

eval {

print "Enabling the VIP - $vip on the new master - $new_master_host \n";

&start_vip();

$exit_code = 0;

};

if ($@) {

warn $@;

exit $exit_code;

}

exit $exit_code;

}

elsif ( $command eq "status" ) {

print "Checking the Status of the script.. OK \n";

exit 0;

}

else {

&usage();

exit 1;

}

sub start_vip() {

`ssh -p $new_master_ssh_port $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \"`;

}

sub stop_vip() {

`ssh -p $orig_master_ssh_port $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;

}

sub usage {

"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";

}

使用方式如下，自动故障切换和在线切换都是使用这个脚本即可，配置如下：

master_ip_failover_script=/usr/local/bin/master_ip_online_change --interface=eth0 --vip=10.99.73.100/32 --gateway=10.99.73.1
master_ip_online_change_script=/usr/local/bin/master_ip_online_change --interface=eth0 --vip=10.99.73.100/32 --gateway=10.99.73.1

1 2	master_ip_failover_script=/usr/local/bin/master_ip_online_change --interface=eth0 --vip=10.99.73.100/32 --gateway=10.99.73.1 master_ip_online_change_script=/usr/local/bin/master_ip_online_change --interface=eth0 --vip=10.99.73.100/32 --gateway=10.99.73.1

二、测试自动Failover（主要测试new master是补谁的日志）

必须先启动MHA Manager，否则无法自动切换，当然手动切换不需要开启MHA Manager监控。

[root@mha ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover &

1	[root@mha ~]# nohup masterha_manager --conf=/etc/masterha/app1.cnf --remove_dead_master_conf --ignore_last_failover &

检查主从复制状态

[root@mha ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf
...........
10.99.73.9(10.99.73.9:3306) (current master)
 +--10.99.73.10(10.99.73.10:3306)
 +--10.99.73.11(10.99.73.11:3306)
...........
MySQL Replication Health is OK.

[root@mha ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf

...........

10.99.73.9(10.99.73.9:3306) (current master)

+--10.99.73.10(10.99.73.10:3306)

+--10.99.73.11(10.99.73.11:3306)

...........

MySQL Replication Health is OK.

自动failover模拟测试的操作步骤如下。

1. 使用sysbench生成测试数据

自行安装 sysbench 工具。

在主库（10.99.73.9）上进行sysbench数据生成，在sbtest库下生成sbtest表，共100W记录。

$ sysbench /usr/share/sysbench/oltp_read_only.lua \
--mysql-host=127.0.0.1 \
--mysql-port=3306 \
--mysql-user=mha \
--mysql-password=123456 \
--mysql-socket=/data/mysql/3306/mysql.sock \
--mysql-db=sbtest \
--db-driver=mysql \
--tables=1 \
--table-size=1000000 \
--report-interval=10 \
--threads=128 \
--time=120 \
prepare

$ sysbench /usr/share/sysbench/oltp_read_only.lua \

--mysql-host=127.0.0.1 \

--mysql-port=3306 \

--mysql-user=mha \

--mysql-password=123456 \

--mysql-socket=/data/mysql/3306/mysql.sock \

--mysql-db=sbtest \

--db-driver=mysql \

--tables=1 \

--table-size=1000000 \

--report-interval=10 \

--threads=128 \

--time=120 \

prepare

如果是需要产生大量的binlog，使用sysbench模拟压测，持续时间为3分钟，产生大量的binlog。使用如下语句（这里我们不使用）：

$ sysbench /usr/share/sysbench/oltp_read_only.lua \
--mysql-host=127.0.0.1 \
--mysql-port=3306 \
--mysql-user=mha \
--mysql-password=123456 \
--mysql-socket=/data/mysql/3306/mysql.sock \
--mysql-db=sbtest \
--db-driver=mysql \
--tables=1 \
--table-size=1000000 \
--report-interval=10 \
--threads=128 \
--time=120 \
run

$ sysbench /usr/share/sysbench/oltp_read_only.lua \

--mysql-host=127.0.0.1 \

--mysql-port=3306 \

--mysql-user=mha \

--mysql-password=123456 \

--mysql-socket=/data/mysql/3306/mysql.sock \

--mysql-db=sbtest \

--db-driver=mysql \

--tables=1 \

--table-size=1000000 \

--report-interval=10 \

--threads=128 \

--time=120 \

run

2. 停掉slave io线程（10.99.73.10）

一定要在master执行sysbench结束之前停掉slave io线程，这样我们才可以模拟查看当10.99.73.10成为new master之后是否会同步old master的binlog。

mysql> stop slave io_thread;
Query OK, 0 rows affected (0.10 sec)

mysql> select count(*) from sbtest.sbtest;
+----------+
| count(*) |
+----------+
|   730000 |
+----------+
1 row in set (0.24 sec)

mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 | 45473439 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

mysql> stop slave io_thread;

Query OK, 0 rows affected (0.10 sec)

mysql> select count(*) from sbtest.sbtest;

+----------+

| count(*) |

+----------+

| 730000 |

+----------+

1 row in set (0.24 sec)

mysql> show master status;

+------------------+----------+--------------+------------------+-------------------+

+------------------+----------+--------------+------------------+-------------------+

| mysql-bin.000002 | 45473439 | | | |

+------------------+----------+--------------+------------------+-------------------+

1 row in set (0.00 sec)

另外一台slave我们没有停止io线程，所以还在继续接收执行日志。

3. 停掉slave io线程（10.99.73.11）

你可以一直使用show slave status命令查看当前此slave的状态，只要比10.99.73.10同步的数据多就可以停掉io线程了，但是记住不要把master的数据完全同步完了。

mysql> stop slave io_thread;
Query OK, 0 rows affected (0.03 sec)

mysql>  select count(*) from sbtest.sbtest;
+----------+
| count(*) |
+----------+
|   780000 |
+----------+
1 row in set (0.21 sec)

mysql>  show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 | 48587999 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

mysql> stop slave io_thread;

Query OK, 0 rows affected (0.03 sec)

mysql> select count(*) from sbtest.sbtest;

+----------+

| count(*) |

+----------+

| 780000 |

+----------+

1 row in set (0.21 sec)

mysql> show master status;

+------------------+----------+--------------+------------------+-------------------+

+------------------+----------+--------------+------------------+-------------------+

| mysql-bin.000002 | 48587999 | | | |

+------------------+----------+--------------+------------------+-------------------+

1 row in set (0.00 sec)

4. 杀掉主库mysql进程

模拟主库发生故障，进行自动failover操作。

mysql> select count(*) from sbtest.sbtest;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.20 sec)

mysql> show master status;
+------------------+-----------+--------------+------------------+-------------------+
| File             | Position  | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+-----------+--------------+------------------+-------------------+
| mysql-bin.000002 | 123299163 |              |                  |                   |
+------------------+-----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

mysql> select count(*) from sbtest.sbtest;

+----------+

| count(*) |

+----------+

| 1000000 |

+----------+

1 row in set (0.20 sec)

mysql> show master status;

+------------------+-----------+--------------+------------------+-------------------+

+------------------+-----------+--------------+------------------+-------------------+

| mysql-bin.000002 | 123299163 | | | |

+------------------+-----------+--------------+------------------+-------------------+

1 row in set (0.00 sec)

$ mysqladmin -S /data/mysql/3306/mysql.sock shutdown

1	$ mysqladmin -S /data/mysql/3306/mysql.sock shutdown

PS：可以看出master是100万数据，slave（10.99.73.10）只有90万数据，而slave（10.99.73.11）有97万数据，下面看整个MHA的切换过程。

5. 查看MHA切换日志

了解整个切换过程，在10.99.73.7上查看日志。

[root@mha ~]# cat /var/log/masterha/app1/manager.log
Mon Feb 20 15:22:21 2017 - [warning] Got error on MySQL select ping: 2013 (Lost connection to MySQL server during query)
Mon Feb 20 15:22:21 2017 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s mysql02 -s mysql03 --user=root --master_host=mysql01 --master_ip=10.99.73.9 --master_port=3306  --user=root  --master_host=10.99.73.9  --master_ip=10.99.73.9  --master_port=3306 --master_user=mha --master_password=123456 --ping_type=SELECT
Mon Feb 20 15:22:21 2017 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/log/binlog --output_file=/data/mysql/mha/save_binary_logs_test --manager_version=0.57 --binlog_prefix=mysql-bin
Mon Feb 20 15:22:21 2017 - [info] HealthCheck: SSH to 10.99.73.9 is reachable.
Monitoring server mysql02 is reachable, Master is not reachable from mysql02. OK.
Monitoring server mysql03 is reachable, Master is not reachable from mysql03. OK.
Mon Feb 20 15:22:21 2017 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Mon Feb 20 15:22:24 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.99.73.9' (111))
Mon Feb 20 15:22:24 2017 - [warning] Connection failed 2 time(s)..
Mon Feb 20 15:22:27 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.99.73.9' (111))
Mon Feb 20 15:22:27 2017 - [warning] Connection failed 3 time(s)..
Mon Feb 20 15:22:30 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.99.73.9' (111))
Mon Feb 20 15:22:30 2017 - [warning] Connection failed 4 time(s)..
Mon Feb 20 15:22:30 2017 - [warning] Master is not reachable from health checker!
Mon Feb 20 15:22:30 2017 - [warning] Master 10.99.73.9(10.99.73.9:3306) is not reachable!
Mon Feb 20 15:22:30 2017 - [warning] SSH is reachable.
Mon Feb 20 15:22:30 2017 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..
Mon Feb 20 15:22:30 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Feb 20 15:22:30 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Feb 20 15:22:30 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Feb 20 15:22:30 2017 - [info] GTID failover mode = 0
Mon Feb 20 15:22:30 2017 - [info] Dead Servers:
Mon Feb 20 15:22:30 2017 - [info]   10.99.73.9(10.99.73.9:3306)
Mon Feb 20 15:22:30 2017 - [info] Alive Servers:
Mon Feb 20 15:22:30 2017 - [info]   10.99.73.10(10.99.73.10:3306)
Mon Feb 20 15:22:30 2017 - [info]   10.99.73.11(10.99.73.11:3306)
Mon Feb 20 15:22:30 2017 - [info] Alive Slaves:
Mon Feb 20 15:22:30 2017 - [info]   10.99.73.10(10.99.73.10:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Mon Feb 20 15:22:30 2017 - [info]     Replicating from 10.99.73.9(10.99.73.9:3306)
Mon Feb 20 15:22:30 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Feb 20 15:22:30 2017 - [info]   10.99.73.11(10.99.73.11:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Mon Feb 20 15:22:30 2017 - [info]     Replicating from 10.99.73.9(10.99.73.9:3306)
Mon Feb 20 15:22:30 2017 - [info]     Not candidate for the new Master (no_master is set)
Mon Feb 20 15:22:30 2017 - [info] Checking slave configurations..
Mon Feb 20 15:22:30 2017 - [info] Checking replication filtering settings..
Mon Feb 20 15:22:30 2017 - [info]  Replication filtering check ok.
Mon Feb 20 15:22:30 2017 - [info] Master is down!
Mon Feb 20 15:22:30 2017 - [info] Terminating monitoring script.
Mon Feb 20 15:22:30 2017 - [info] Got exit code 20 (Master dead).
Mon Feb 20 15:22:30 2017 - [info] MHA::MasterFailover version 0.57.
Mon Feb 20 15:22:30 2017 - [info] Starting master failover.
Mon Feb 20 15:22:30 2017 - [info] 
Mon Feb 20 15:22:30 2017 - [info] * Phase 1: Configuration Check Phase..
Mon Feb 20 15:22:30 2017 - [info] 
Mon Feb 20 15:22:30 2017 - [info] GTID failover mode = 0
Mon Feb 20 15:22:30 2017 - [info] Dead Servers:
Mon Feb 20 15:22:30 2017 - [info]   10.99.73.9(10.99.73.9:3306)
Mon Feb 20 15:22:30 2017 - [info] Checking master reachability via MySQL(double check)...
Mon Feb 20 15:22:30 2017 - [info]  ok.
Mon Feb 20 15:22:30 2017 - [info] Alive Servers:
Mon Feb 20 15:22:30 2017 - [info]   10.99.73.10(10.99.73.10:3306)
Mon Feb 20 15:22:30 2017 - [info]   10.99.73.11(10.99.73.11:3306)
Mon Feb 20 15:22:30 2017 - [info] Alive Slaves:
Mon Feb 20 15:22:30 2017 - [info]   10.99.73.10(10.99.73.10:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Mon Feb 20 15:22:30 2017 - [info]     Replicating from 10.99.73.9(10.99.73.9:3306)
Mon Feb 20 15:22:30 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Feb 20 15:22:30 2017 - [info]   10.99.73.11(10.99.73.11:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Mon Feb 20 15:22:30 2017 - [info]     Replicating from 10.99.73.9(10.99.73.9:3306)
Mon Feb 20 15:22:30 2017 - [info]     Not candidate for the new Master (no_master is set)
Mon Feb 20 15:22:30 2017 - [info] Starting Non-GTID based failover.
Mon Feb 20 15:22:30 2017 - [info] 
Mon Feb 20 15:22:30 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Feb 20 15:22:30 2017 - [info] 
Mon Feb 20 15:22:30 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Mon Feb 20 15:22:30 2017 - [info] 
Mon Feb 20 15:22:30 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Mon Feb 20 15:22:30 2017 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Mon Feb 20 15:22:30 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon Feb 20 15:22:30 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon Feb 20 15:22:30 2017 - [info] 
Mon Feb 20 15:22:30 2017 - [info] * Phase 3: Master Recovery Phase..
Mon Feb 20 15:22:30 2017 - [info] 
Mon Feb 20 15:22:30 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon Feb 20 15:22:30 2017 - [info] 
Mon Feb 20 15:22:30 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000002:97142702
Mon Feb 20 15:22:30 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon Feb 20 15:22:30 2017 - [info]   10.99.73.11(10.99.73.11:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Mon Feb 20 15:22:30 2017 - [info]     Replicating from 10.99.73.9(10.99.73.9:3306)
Mon Feb 20 15:22:30 2017 - [info]     Not candidate for the new Master (no_master is set)
Mon Feb 20 15:22:30 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000002:90008622
Mon Feb 20 15:22:30 2017 - [info] Oldest slaves:
Mon Feb 20 15:22:30 2017 - [info]   10.99.73.10(10.99.73.10:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Mon Feb 20 15:22:30 2017 - [info]     Replicating from 10.99.73.9(10.99.73.9:3306)
Mon Feb 20 15:22:30 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Feb 20 15:22:30 2017 - [info] 
Mon Feb 20 15:22:30 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Mon Feb 20 15:22:30 2017 - [info] 
Mon Feb 20 15:22:30 2017 - [info] Fetching dead master's binary logs..
Mon Feb 20 15:22:30 2017 - [info] Executing command on the dead master 10.99.73.9(10.99.73.9:3306): save_binary_logs --command=save --start_file=mysql-bin.000002  
--start_pos=97142702 --binlog_dir=/data/mysql/log/binlog --output_file=/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog 
--handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57
  Creating /data/mysql/mha if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000002 pos 97142702 to mysql-bin.000002 EOF into /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog ..
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 154.. ok.
  Dumping effective binlog data from /data/mysql/log/binlog/mysql-bin.000002 position 97142702 to tail(123299186).. ok.
 Binlog Checksum enabled
 Concat succeeded.
Mon Feb 20 15:22:31 2017 - [info] scp from root@10.99.73.9:/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog succeeded.
Mon Feb 20 15:22:31 2017 - [info] HealthCheck: SSH to 10.99.73.10 is reachable.
Mon Feb 20 15:22:32 2017 - [info] HealthCheck: SSH to 10.99.73.11 is reachable.
Mon Feb 20 15:22:32 2017 - [info] 
Mon Feb 20 15:22:32 2017 - [info] * Phase 3.3: Determining New Master Phase..
Mon Feb 20 15:22:32 2017 - [info] 
Mon Feb 20 15:22:32 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Mon Feb 20 15:22:32 2017 - [info] Checking whether 10.99.73.11 has relay logs from the oldest position..
Mon Feb 20 15:22:32 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000002 --latest_rmlp=97142702 --target_mlf=mysql-bin.000002 --target_rmlp=90008622 --server_id=103309 --workdir=/data/mysql/mha --timestamp=20170220152230 --manager_version=0.57 --relay_dir=/data/mysql/3306/log/relaylog --current_relay_log=relay-log.000003  :
    Relay log found at /data/mysql/3306/log/relaylog, up to relay-log.000003
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:relay-log.000003, start_pos:90008835.
Target relay log FOUND!
Mon Feb 20 15:22:32 2017 - [info] OK. 10.99.73.11 has all relay logs.
Mon Feb 20 15:22:32 2017 - [info] Searching new master from slaves..
Mon Feb 20 15:22:32 2017 - [info]  Candidate masters from the configuration file:
Mon Feb 20 15:22:32 2017 - [info]   10.99.73.10(10.99.73.10:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Mon Feb 20 15:22:32 2017 - [info]     Replicating from 10.99.73.9(10.99.73.9:3306)
Mon Feb 20 15:22:32 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Feb 20 15:22:32 2017 - [info]  Non-candidate masters:
Mon Feb 20 15:22:32 2017 - [info]   10.99.73.11(10.99.73.11:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Mon Feb 20 15:22:32 2017 - [info]     Replicating from 10.99.73.9(10.99.73.9:3306)
Mon Feb 20 15:22:32 2017 - [info]     Not candidate for the new Master (no_master is set)
Mon Feb 20 15:22:32 2017 - [info]  Searching from candidate_master slaves which have received the latest relay log events..
Mon Feb 20 15:22:32 2017 - [info]   Not found.
Mon Feb 20 15:22:32 2017 - [info]  Searching from all candidate_master slaves..
Mon Feb 20 15:22:32 2017 - [info] New master is 10.99.73.10(10.99.73.10:3306)
Mon Feb 20 15:22:32 2017 - [info] Starting master failover..
Mon Feb 20 15:22:32 2017 - [info] 
From:
10.99.73.9(10.99.73.9:3306) (current master)
 +--10.99.73.10(10.99.73.10:3306)
 +--10.99.73.11(10.99.73.11:3306)

To:
10.99.73.10(10.99.73.10:3306) (new master)
 +--10.99.73.11(10.99.73.11:3306)
Mon Feb 20 15:22:32 2017 - [info] 
Mon Feb 20 15:22:32 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Mon Feb 20 15:22:32 2017 - [info] 
Mon Feb 20 15:22:32 2017 - [info] Server 10.99.73.10 received relay logs up to: mysql-bin.000002:90008622
Mon Feb 20 15:22:32 2017 - [info] Need to get diffs from the latest slave(10.99.73.11) up to: mysql-bin.000002:97142702 (using the latest slave's relay logs)
Mon Feb 20 15:22:32 2017 - [info] Connecting to the latest slave host 10.99.73.11, generating diff relay log files..
Mon Feb 20 15:22:32 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=10.99.73.10 --latest_mlf=mysql-bin.000002 --latest_rmlp=97142702 --target_mlf=mysql-bin.000002 --target_rmlp=90008622 --server_id=103309 --diff_file_readtolatest=/data/mysql/mha/relay_from_read_to_latest_10.99.73.10_3306_20170220152230.binlog 
--workdir=/data/mysql/mha --timestamp=20170220152230 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --relay_dir=/data/mysql/3306/log/relaylog --current_relay_log=relay-log.000003 
Mon Feb 20 15:22:33 2017 - [info] 
    Relay log found at /data/mysql/3306/log/relaylog, up to relay-log.000003
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:relay-log.000003, start_pos:90008835.
 Concat binary/relay logs from relay-log.000003 pos 90008835 to relay-log.000003 EOF into /data/mysql/mha/relay_from_read_to_latest_10.99.73.10_3306_20170220152230.binlog ..
 Binlog Checksum enabled
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 367.. ok.
  Dumping effective binlog data from /data/mysql/3306/log/relaylog/relay-log.000003 position 90008835 to tail(97142915).. ok.
 Binlog Checksum enabled
 Binlog Checksum enabled
 Concat succeeded.
 Generating diff relay log succeeded. Saved at /data/mysql/mha/relay_from_read_to_latest_10.99.73.10_3306_20170220152230.binlog .
 scp mysql03:/data/mysql/mha/relay_from_read_to_latest_10.99.73.10_3306_20170220152230.binlog to root@10.99.73.10(22) succeeded.
Mon Feb 20 15:22:33 2017 - [info]  Generating diff files succeeded.
Mon Feb 20 15:22:33 2017 - [info] Sending binlog..
Mon Feb 20 15:22:34 2017 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog to root@10.99.73.10:/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog succeeded.
Mon Feb 20 15:22:34 2017 - [info] 
Mon Feb 20 15:22:34 2017 - [info] * Phase 3.4: Master Log Apply Phase..
Mon Feb 20 15:22:34 2017 - [info] 
Mon Feb 20 15:22:34 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Mon Feb 20 15:22:34 2017 - [info] Starting recovery on 10.99.73.10(10.99.73.10:3306)..
Mon Feb 20 15:22:34 2017 - [info]  Generating diffs succeeded.
Mon Feb 20 15:22:34 2017 - [info] Waiting until all relay logs are applied.
Mon Feb 20 15:22:34 2017 - [info]  done.
Mon Feb 20 15:22:34 2017 - [info] Getting slave status..
Mon Feb 20 15:22:34 2017 - [info] This slave(10.99.73.10)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000002:90008622). No need to recover from Exec_Master_Log_Pos.
Mon Feb 20 15:22:34 2017 - [info] Connecting to the target slave host 10.99.73.10, running recover script..
Mon Feb 20 15:22:34 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha' --slave_host=10.99.73.10 --slave_ip=10.99.73.10  --slave_port=3306 --apply_files=/data/mysql/mha/relay_from_read_to_latest_10.99.73.10_3306_20170220152230.binlog,/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog 
--workdir=/data/mysql/mha --target_version=5.7.17-log --timestamp=20170220152230 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --slave_pass=xxx
Mon Feb 20 15:22:37 2017 - [info] 
 Concat all apply files to /data/mysql/mha/total_binlog_for_10.99.73.10_3306.20170220152230.binlog ..
 Copying the first binlog file /data/mysql/mha/relay_from_read_to_latest_10.99.73.10_3306_20170220152230.binlog to /data/mysql/mha/total_binlog_for_10.99.73.10_3306.20170220152230.binlog.. ok.
  Dumping binlog head events (rotate events), skipping format description events from /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog..  Binlog Checksum enabled
dumped up to pos 154. ok.
 /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog has effective binlog events from pos 154.
  Dumping effective binlog data from /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog position 154 to tail(26156638).. ok.
 Concat succeeded.
All apply target binary logs are concatinated at /data/mysql/mha/total_binlog_for_10.99.73.10_3306.20170220152230.binlog .
MySQL client version is 5.7.17. Using --binary-mode.
Applying differential binary/relay log files /data/mysql/mha/relay_from_read_to_latest_10.99.73.10_3306_20170220152230.binlog,/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog on 10.99.73.10:3306. This may take long time...
Applying log files succeeded.
Mon Feb 20 15:22:37 2017 - [info]  All relay logs were successfully applied.
Mon Feb 20 15:22:37 2017 - [info] Getting new master's binlog name and position..
Mon Feb 20 15:22:37 2017 - [info]  mysql-bin.000002:62292198
Mon Feb 20 15:22:37 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.99.73.10', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=62292198, MASTER_USER='mysql_slave', MASTER_PASSWORD='xxx';
Mon Feb 20 15:22:37 2017 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Mon Feb 20 15:22:37 2017 - [info] Setting read_only=0 on 10.99.73.10(10.99.73.10:3306)..
Mon Feb 20 15:22:37 2017 - [info]  ok.
Mon Feb 20 15:22:37 2017 - [info] ** Finished master recovery successfully.
Mon Feb 20 15:22:37 2017 - [info] * Phase 3: Master Recovery Phase completed.
Mon Feb 20 15:22:37 2017 - [info] 
Mon Feb 20 15:22:37 2017 - [info] * Phase 4: Slaves Recovery Phase..
Mon Feb 20 15:22:37 2017 - [info] 
Mon Feb 20 15:22:37 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Mon Feb 20 15:22:37 2017 - [info] 
Mon Feb 20 15:22:37 2017 - [info] -- Slave diff file generation on host 10.99.73.11(10.99.73.11:3306) started, pid: 9926. Check tmp log /var/log/masterha/app1/10.99.73.11_3306_20170220152230.log if it takes time..
Mon Feb 20 15:22:37 2017 - [info] 
Mon Feb 20 15:22:37 2017 - [info] Log messages from 10.99.73.11 ...
Mon Feb 20 15:22:37 2017 - [info] 
Mon Feb 20 15:22:37 2017 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Mon Feb 20 15:22:37 2017 - [info] End of log messages from 10.99.73.11.
Mon Feb 20 15:22:37 2017 - [info] -- 10.99.73.11(10.99.73.11:3306) has the latest relay log events.
Mon Feb 20 15:22:37 2017 - [info] Generating relay diff files from the latest slave succeeded.
Mon Feb 20 15:22:37 2017 - [info] 
Mon Feb 20 15:22:37 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Mon Feb 20 15:22:37 2017 - [info] 
Mon Feb 20 15:22:37 2017 - [info] -- Slave recovery on host 10.99.73.11(10.99.73.11:3306) started, pid: 9928. Check tmp log /var/log/masterha/app1/10.99.73.11_3306_20170220152230.log if it takes time..
Mon Feb 20 15:22:40 2017 - [info] 
Mon Feb 20 15:22:40 2017 - [info] Log messages from 10.99.73.11 ...
Mon Feb 20 15:22:40 2017 - [info] 
Mon Feb 20 15:22:37 2017 - [info] Sending binlog..
Mon Feb 20 15:22:37 2017 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog to root@10.99.73.11:/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog succeeded.
Mon Feb 20 15:22:37 2017 - [info] Starting recovery on 10.99.73.11(10.99.73.11:3306)..
Mon Feb 20 15:22:37 2017 - [info]  Generating diffs succeeded.
Mon Feb 20 15:22:37 2017 - [info] Waiting until all relay logs are applied.
Mon Feb 20 15:22:37 2017 - [info]  done.
Mon Feb 20 15:22:37 2017 - [info] Getting slave status..
Mon Feb 20 15:22:37 2017 - [info] This slave(10.99.73.11)'s Exec_Master_Log_Pos(mysql-bin.000002:96173537) does not equal to Read_Master_Log_Pos(mysql-bin.000002:97142702). It is likely that relay log was cut during transaction. Need to recover from Exec_Master_Log_Pos.
Mon Feb 20 15:22:37 2017 - [info] Saving local relay logs from exec pos to read pos on 10.99.73.11: from relay-log.000003:96173750 to the end of the relay log..
Mon Feb 20 15:22:37 2017 - [info] Executing command : save_binary_logs --command=save --start_file=relay-log.000003  --start_pos=96173750 --output_file=/data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220152230.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --binlog_dir=/data/mysql/3306/log/relaylog 
Mon Feb 20 15:22:38 2017 - [info] 
  Creating /data/mysql/mha if not exists..    ok.
 Concat binary/relay logs from relay-log.000003 pos 96173750 to relay-log.000003 EOF into /data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220152230.binlog ..
 Binlog Checksum enabled
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 367.. ok.
  Dumping effective binlog data from /data/mysql/3306/log/relaylog/relay-log.000003 position 96173750 to tail(97142915).. ok.
 Binlog Checksum enabled
 Binlog Checksum enabled
 Concat succeeded.
Mon Feb 20 15:22:38 2017 - [info] Connecting to the target slave host 10.99.73.11, running recover script..
Mon Feb 20 15:22:38 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha' --slave_host=10.99.73.11 --slave_ip=10.99.73.11  --slave_port=3306 --apply_files=/data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220152230.binlog,/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog 
--workdir=/data/mysql/mha --target_version=5.7.17-log --timestamp=20170220152230 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --slave_pass=xxx
Mon Feb 20 15:22:40 2017 - [info] 
 Concat all apply files to /data/mysql/mha/total_binlog_for_10.99.73.11_3306.20170220152230.binlog ..
 Copying the first binlog file /data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220152230.binlog to /data/mysql/mha/total_binlog_for_10.99.73.11_3306.20170220152230.binlog.. ok.
  Dumping binlog head events (rotate events), skipping format description events from /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog..  Binlog Checksum enabled
dumped up to pos 154. ok.
 /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog has effective binlog events from pos 154.
  Dumping effective binlog data from /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog position 154 to tail(26156638).. ok.
 Concat succeeded.
All apply target binary logs are concatinated at /data/mysql/mha/total_binlog_for_10.99.73.11_3306.20170220152230.binlog .
MySQL client version is 5.7.17. Using --binary-mode.
Applying differential binary/relay log files /data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220152230.binlog,/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog on 10.99.73.11:3306. This may take long time...
Applying log files succeeded.
Mon Feb 20 15:22:40 2017 - [info]  All relay logs were successfully applied.
Mon Feb 20 15:22:40 2017 - [info]  Resetting slave 10.99.73.11(10.99.73.11:3306) and starting replication from the new master 10.99.73.10(10.99.73.10:3306)..
Mon Feb 20 15:22:40 2017 - [info]  Executed CHANGE MASTER.
Mon Feb 20 15:22:40 2017 - [info]  Slave started.
Mon Feb 20 15:22:40 2017 - [info] End of log messages from 10.99.73.11.
Mon Feb 20 15:22:40 2017 - [info] -- Slave recovery on host 10.99.73.11(10.99.73.11:3306) succeeded.
Mon Feb 20 15:22:40 2017 - [info] All new slave servers recovered successfully.
Mon Feb 20 15:22:40 2017 - [info] 
Mon Feb 20 15:22:40 2017 - [info] * Phase 5: New master cleanup phase..
Mon Feb 20 15:22:40 2017 - [info] 
Mon Feb 20 15:22:40 2017 - [info] Resetting slave info on the new master..
Mon Feb 20 15:22:40 2017 - [info]  10.99.73.10: Resetting slave info succeeded.
Mon Feb 20 15:22:40 2017 - [info] Master failover to 10.99.73.10(10.99.73.10:3306) completed successfully.
Mon Feb 20 15:22:40 2017 - [info] Deleted server1 entry from /etc/masterha/app1.cnf .
Mon Feb 20 15:22:40 2017 - [info] 

----- Failover Report -----

app1: MySQL Master failover 10.99.73.9(10.99.73.9:3306) to 10.99.73.10(10.99.73.10:3306) succeeded

Master 10.99.73.9(10.99.73.9:3306) is down!

Check MHA Manager logs at mha:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.
The latest slave 10.99.73.11(10.99.73.11:3306) has all relay logs for recovery.
Selected 10.99.73.10(10.99.73.10:3306) as a new master.
10.99.73.10(10.99.73.10:3306): OK: Applying all logs succeeded.
10.99.73.11(10.99.73.11:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded.
10.99.73.11(10.99.73.11:3306): OK: Applying all logs succeeded. Slave started, replicating from 10.99.73.10(10.99.73.10:3306)
10.99.73.10(10.99.73.10:3306): Resetting slave info succeeded.
Master failover to 10.99.73.10(10.99.73.10:3306) completed successfully.
Mon Feb 20 15:22:40 2017 - [info] Sending mail..

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

231

232

233

234

235

236

237

238

239

240

241

242

243

244

245

246

247

248

249

250

251

252

253

254

255

256

257

258

259

260

261

262

263

264

265

266

267

268

269

270

271

272

273

274

275

276

277

278

279

280

281

282

283

284

[root@mha ~]# cat /var/log/masterha/app1/manager.log

Mon Feb 20 15:22:21 2017 - [warning] Got error on MySQL select ping: 2013 (Lost connection to MySQL server during query)

Mon Feb 20 15:22:21 2017 - [info] Executing secondary network check script: /usr/local/bin/masterha_secondary_check -s mysql02 -s mysql03 --user=root --master_host=mysql01 --master_ip=10.99.73.9 --master_port=3306 --user=root --master_host=10.99.73.9 --master_ip=10.99.73.9 --master_port=3306 --master_user=mha --master_password=123456 --ping_type=SELECT

Mon Feb 20 15:22:21 2017 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql/log/binlog --output_file=/data/mysql/mha/save_binary_logs_test --manager_version=0.57 --binlog_prefix=mysql-bin

Mon Feb 20 15:22:21 2017 - [info] HealthCheck: SSH to 10.99.73.9 is reachable.

Monitoring server mysql02 is reachable, Master is not reachable from mysql02. OK.

Monitoring server mysql03 is reachable, Master is not reachable from mysql03. OK.

Mon Feb 20 15:22:21 2017 - [info] Master is not reachable from all other monitoring servers. Failover should start.

Mon Feb 20 15:22:24 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.99.73.9' (111))

Mon Feb 20 15:22:24 2017 - [warning] Connection failed 2 time(s)..

Mon Feb 20 15:22:27 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.99.73.9' (111))

Mon Feb 20 15:22:27 2017 - [warning] Connection failed 3 time(s)..

Mon Feb 20 15:22:30 2017 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.99.73.9' (111))

Mon Feb 20 15:22:30 2017 - [warning] Connection failed 4 time(s)..

Mon Feb 20 15:22:30 2017 - [warning] Master is not reachable from health checker!

Mon Feb 20 15:22:30 2017 - [warning] Master 10.99.73.9(10.99.73.9:3306) is not reachable!

Mon Feb 20 15:22:30 2017 - [warning] SSH is reachable.

Mon Feb 20 15:22:30 2017 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/masterha/app1.cnf again, and trying to connect to all servers to check server status..

Mon Feb 20 15:22:30 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Mon Feb 20 15:22:30 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..

Mon Feb 20 15:22:30 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..

Mon Feb 20 15:22:30 2017 - [info] GTID failover mode = 0

Mon Feb 20 15:22:30 2017 - [info] Dead Servers:

Mon Feb 20 15:22:30 2017 - [info] 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 15:22:30 2017 - [info] Alive Servers:

Mon Feb 20 15:22:30 2017 - [info] 10.99.73.10(10.99.73.10:3306)

Mon Feb 20 15:22:30 2017 - [info] 10.99.73.11(10.99.73.11:3306)

Mon Feb 20 15:22:30 2017 - [info] Alive Slaves:

Mon Feb 20 15:22:30 2017 - [info] 10.99.73.10(10.99.73.10:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled

Mon Feb 20 15:22:30 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 15:22:30 2017 - [info] Primary candidate for the new Master (candidate_master is set)

Mon Feb 20 15:22:30 2017 - [info] 10.99.73.11(10.99.73.11:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled

Mon Feb 20 15:22:30 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 15:22:30 2017 - [info] Not candidate for the new Master (no_master is set)

Mon Feb 20 15:22:30 2017 - [info] Checking slave configurations..

Mon Feb 20 15:22:30 2017 - [info] Checking replication filtering settings..

Mon Feb 20 15:22:30 2017 - [info] Replication filtering check ok.

Mon Feb 20 15:22:30 2017 - [info] Master is down!

Mon Feb 20 15:22:30 2017 - [info] Terminating monitoring script.

Mon Feb 20 15:22:30 2017 - [info] Got exit code 20 (Master dead).

Mon Feb 20 15:22:30 2017 - [info] MHA::MasterFailover version 0.57.

Mon Feb 20 15:22:30 2017 - [info] Starting master failover.

Mon Feb 20 15:22:30 2017 - [info]

Mon Feb 20 15:22:30 2017 - [info] * Phase 1: Configuration Check Phase..

Mon Feb 20 15:22:30 2017 - [info]

Mon Feb 20 15:22:30 2017 - [info] GTID failover mode = 0

Mon Feb 20 15:22:30 2017 - [info] Dead Servers:

Mon Feb 20 15:22:30 2017 - [info] 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 15:22:30 2017 - [info] Checking master reachability via MySQL(double check)...

Mon Feb 20 15:22:30 2017 - [info] ok.

Mon Feb 20 15:22:30 2017 - [info] Alive Servers:

Mon Feb 20 15:22:30 2017 - [info] 10.99.73.10(10.99.73.10:3306)

Mon Feb 20 15:22:30 2017 - [info] 10.99.73.11(10.99.73.11:3306)

Mon Feb 20 15:22:30 2017 - [info] Alive Slaves:

Mon Feb 20 15:22:30 2017 - [info] 10.99.73.10(10.99.73.10:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled

Mon Feb 20 15:22:30 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 15:22:30 2017 - [info] Primary candidate for the new Master (candidate_master is set)

Mon Feb 20 15:22:30 2017 - [info] 10.99.73.11(10.99.73.11:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled

Mon Feb 20 15:22:30 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 15:22:30 2017 - [info] Not candidate for the new Master (no_master is set)

Mon Feb 20 15:22:30 2017 - [info] Starting Non-GTID based failover.

Mon Feb 20 15:22:30 2017 - [info]

Mon Feb 20 15:22:30 2017 - [info] ** Phase 1: Configuration Check Phase completed.

Mon Feb 20 15:22:30 2017 - [info]

Mon Feb 20 15:22:30 2017 - [info] * Phase 2: Dead Master Shutdown Phase..

Mon Feb 20 15:22:30 2017 - [info]

Mon Feb 20 15:22:30 2017 - [info] Forcing shutdown so that applications never connect to the current master..

Mon Feb 20 15:22:30 2017 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.

Mon Feb 20 15:22:30 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.

Mon Feb 20 15:22:30 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.

Mon Feb 20 15:22:30 2017 - [info]

Mon Feb 20 15:22:30 2017 - [info] * Phase 3: Master Recovery Phase..

Mon Feb 20 15:22:30 2017 - [info]

Mon Feb 20 15:22:30 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..

Mon Feb 20 15:22:30 2017 - [info]

Mon Feb 20 15:22:30 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000002:97142702

Mon Feb 20 15:22:30 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):

Mon Feb 20 15:22:30 2017 - [info] 10.99.73.11(10.99.73.11:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled

Mon Feb 20 15:22:30 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 15:22:30 2017 - [info] Not candidate for the new Master (no_master is set)

Mon Feb 20 15:22:30 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000002:90008622

Mon Feb 20 15:22:30 2017 - [info] Oldest slaves:

Mon Feb 20 15:22:30 2017 - [info] 10.99.73.10(10.99.73.10:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled

Mon Feb 20 15:22:30 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 15:22:30 2017 - [info] Primary candidate for the new Master (candidate_master is set)

Mon Feb 20 15:22:30 2017 - [info]

Mon Feb 20 15:22:30 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..

Mon Feb 20 15:22:30 2017 - [info]

Mon Feb 20 15:22:30 2017 - [info] Fetching dead master's binary logs..

Mon Feb 20 15:22:30 2017 - [info] Executing command on the dead master 10.99.73.9(10.99.73.9:3306): save_binary_logs --command=save --start_file=mysql-bin.000002

--start_pos=97142702 --binlog_dir=/data/mysql/log/binlog --output_file=/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog

--handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57

Creating /data/mysql/mha if not exists.. ok.

Concat binary/relay logs from mysql-bin.000002 pos 97142702 to mysql-bin.000002 EOF into /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog ..

Binlog Checksum enabled

Dumping binlog format description event, from position 0 to 154.. ok.

Dumping effective binlog data from /data/mysql/log/binlog/mysql-bin.000002 position 97142702 to tail(123299186).. ok.

Binlog Checksum enabled

Concat succeeded.

Mon Feb 20 15:22:31 2017 - [info] scp from root@10.99.73.9:/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog succeeded.

Mon Feb 20 15:22:31 2017 - [info] HealthCheck: SSH to 10.99.73.10 is reachable.

Mon Feb 20 15:22:32 2017 - [info] HealthCheck: SSH to 10.99.73.11 is reachable.

Mon Feb 20 15:22:32 2017 - [info]

Mon Feb 20 15:22:32 2017 - [info] * Phase 3.3: Determining New Master Phase..

Mon Feb 20 15:22:32 2017 - [info]

Mon Feb 20 15:22:32 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves..

Mon Feb 20 15:22:32 2017 - [info] Checking whether 10.99.73.11 has relay logs from the oldest position..

Mon Feb 20 15:22:32 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000002 --latest_rmlp=97142702 --target_mlf=mysql-bin.000002 --target_rmlp=90008622 --server_id=103309 --workdir=/data/mysql/mha --timestamp=20170220152230 --manager_version=0.57 --relay_dir=/data/mysql/3306/log/relaylog --current_relay_log=relay-log.000003 :

Relay log found at /data/mysql/3306/log/relaylog, up to relay-log.000003

Fast relay log position search succeeded.

Target relay log file/position found. start_file:relay-log.000003, start_pos:90008835.

Target relay log FOUND!

Mon Feb 20 15:22:32 2017 - [info] OK. 10.99.73.11 has all relay logs.

Mon Feb 20 15:22:32 2017 - [info] Searching new master from slaves..

Mon Feb 20 15:22:32 2017 - [info] Candidate masters from the configuration file:

Mon Feb 20 15:22:32 2017 - [info] 10.99.73.10(10.99.73.10:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled

Mon Feb 20 15:22:32 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 15:22:32 2017 - [info] Primary candidate for the new Master (candidate_master is set)

Mon Feb 20 15:22:32 2017 - [info] Non-candidate masters:

Mon Feb 20 15:22:32 2017 - [info] 10.99.73.11(10.99.73.11:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled

Mon Feb 20 15:22:32 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 15:22:32 2017 - [info] Not candidate for the new Master (no_master is set)

Mon Feb 20 15:22:32 2017 - [info] Searching from candidate_master slaves which have received the latest relay log events..

Mon Feb 20 15:22:32 2017 - [info] Not found.

Mon Feb 20 15:22:32 2017 - [info] Searching from all candidate_master slaves..

Mon Feb 20 15:22:32 2017 - [info] New master is 10.99.73.10(10.99.73.10:3306)

Mon Feb 20 15:22:32 2017 - [info] Starting master failover..

Mon Feb 20 15:22:32 2017 - [info]

From:

10.99.73.9(10.99.73.9:3306) (current master)

+--10.99.73.10(10.99.73.10:3306)

+--10.99.73.11(10.99.73.11:3306)

To:

10.99.73.10(10.99.73.10:3306) (new master)

+--10.99.73.11(10.99.73.11:3306)

Mon Feb 20 15:22:32 2017 - [info]

Mon Feb 20 15:22:32 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase..

Mon Feb 20 15:22:32 2017 - [info]

Mon Feb 20 15:22:32 2017 - [info] Server 10.99.73.10 received relay logs up to: mysql-bin.000002:90008622

Mon Feb 20 15:22:32 2017 - [info] Need to get diffs from the latest slave(10.99.73.11) up to: mysql-bin.000002:97142702 (using the latest slave's relay logs)

Mon Feb 20 15:22:32 2017 - [info] Connecting to the latest slave host 10.99.73.11, generating diff relay log files..

Mon Feb 20 15:22:32 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=10.99.73.10 --latest_mlf=mysql-bin.000002 --latest_rmlp=97142702 --target_mlf=mysql-bin.000002 --target_rmlp=90008622 --server_id=103309 --diff_file_readtolatest=/data/mysql/mha/relay_from_read_to_latest_10.99.73.10_3306_20170220152230.binlog

--workdir=/data/mysql/mha --timestamp=20170220152230 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --relay_dir=/data/mysql/3306/log/relaylog --current_relay_log=relay-log.000003

Mon Feb 20 15:22:33 2017 - [info]

Relay log found at /data/mysql/3306/log/relaylog, up to relay-log.000003

Fast relay log position search succeeded.

Target relay log file/position found. start_file:relay-log.000003, start_pos:90008835.

Concat binary/relay logs from relay-log.000003 pos 90008835 to relay-log.000003 EOF into /data/mysql/mha/relay_from_read_to_latest_10.99.73.10_3306_20170220152230.binlog ..

Binlog Checksum enabled

Dumping binlog format description event, from position 0 to 367.. ok.

Dumping effective binlog data from /data/mysql/3306/log/relaylog/relay-log.000003 position 90008835 to tail(97142915).. ok.

Binlog Checksum enabled

Concat succeeded.

Generating diff relay log succeeded. Saved at /data/mysql/mha/relay_from_read_to_latest_10.99.73.10_3306_20170220152230.binlog .

scp mysql03:/data/mysql/mha/relay_from_read_to_latest_10.99.73.10_3306_20170220152230.binlog to root@10.99.73.10(22) succeeded.

Mon Feb 20 15:22:33 2017 - [info] Generating diff files succeeded.

Mon Feb 20 15:22:33 2017 - [info] Sending binlog..

Mon Feb 20 15:22:34 2017 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog to root@10.99.73.10:/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog succeeded.

Mon Feb 20 15:22:34 2017 - [info]

Mon Feb 20 15:22:34 2017 - [info] * Phase 3.4: Master Log Apply Phase..

Mon Feb 20 15:22:34 2017 - [info]

Mon Feb 20 15:22:34 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.

Mon Feb 20 15:22:34 2017 - [info] Starting recovery on 10.99.73.10(10.99.73.10:3306)..

Mon Feb 20 15:22:34 2017 - [info] Generating diffs succeeded.

Mon Feb 20 15:22:34 2017 - [info] Waiting until all relay logs are applied.

Mon Feb 20 15:22:34 2017 - [info] done.

Mon Feb 20 15:22:34 2017 - [info] Getting slave status..

Mon Feb 20 15:22:34 2017 - [info] This slave(10.99.73.10)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000002:90008622). No need to recover from Exec_Master_Log_Pos.

Mon Feb 20 15:22:34 2017 - [info] Connecting to the target slave host 10.99.73.10, running recover script..

Mon Feb 20 15:22:34 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha' --slave_host=10.99.73.10 --slave_ip=10.99.73.10 --slave_port=3306 --apply_files=/data/mysql/mha/relay_from_read_to_latest_10.99.73.10_3306_20170220152230.binlog,/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog

--workdir=/data/mysql/mha --target_version=5.7.17-log --timestamp=20170220152230 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --slave_pass=xxx

Mon Feb 20 15:22:37 2017 - [info]

Concat all apply files to /data/mysql/mha/total_binlog_for_10.99.73.10_3306.20170220152230.binlog ..

Copying the first binlog file /data/mysql/mha/relay_from_read_to_latest_10.99.73.10_3306_20170220152230.binlog to /data/mysql/mha/total_binlog_for_10.99.73.10_3306.20170220152230.binlog.. ok.

Dumping binlog head events (rotate events), skipping format description events from /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog.. Binlog Checksum enabled

dumped up to pos 154. ok.

/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog has effective binlog events from pos 154.

Dumping effective binlog data from /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog position 154 to tail(26156638).. ok.

Concat succeeded.

All apply target binary logs are concatinated at /data/mysql/mha/total_binlog_for_10.99.73.10_3306.20170220152230.binlog .

MySQL client version is 5.7.17. Using --binary-mode.

Applying differential binary/relay log files /data/mysql/mha/relay_from_read_to_latest_10.99.73.10_3306_20170220152230.binlog,/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog on 10.99.73.10:3306. This may take long time...

Applying log files succeeded.

Mon Feb 20 15:22:37 2017 - [info] All relay logs were successfully applied.

Mon Feb 20 15:22:37 2017 - [info] Getting new master's binlog name and position..

Mon Feb 20 15:22:37 2017 - [info] mysql-bin.000002:62292198

Mon Feb 20 15:22:37 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.99.73.10', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=62292198, MASTER_USER='mysql_slave', MASTER_PASSWORD='xxx';

Mon Feb 20 15:22:37 2017 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.

Mon Feb 20 15:22:37 2017 - [info] Setting read_only=0 on 10.99.73.10(10.99.73.10:3306)..

Mon Feb 20 15:22:37 2017 - [info] ok.

Mon Feb 20 15:22:37 2017 - [info] ** Finished master recovery successfully.

Mon Feb 20 15:22:37 2017 - [info] * Phase 3: Master Recovery Phase completed.

Mon Feb 20 15:22:37 2017 - [info]

Mon Feb 20 15:22:37 2017 - [info] * Phase 4: Slaves Recovery Phase..

Mon Feb 20 15:22:37 2017 - [info]

Mon Feb 20 15:22:37 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..

Mon Feb 20 15:22:37 2017 - [info]

Mon Feb 20 15:22:37 2017 - [info] -- Slave diff file generation on host 10.99.73.11(10.99.73.11:3306) started, pid: 9926. Check tmp log /var/log/masterha/app1/10.99.73.11_3306_20170220152230.log if it takes time..

Mon Feb 20 15:22:37 2017 - [info]

Mon Feb 20 15:22:37 2017 - [info] Log messages from 10.99.73.11 ...

Mon Feb 20 15:22:37 2017 - [info]

Mon Feb 20 15:22:37 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave.

Mon Feb 20 15:22:37 2017 - [info] End of log messages from 10.99.73.11.

Mon Feb 20 15:22:37 2017 - [info] -- 10.99.73.11(10.99.73.11:3306) has the latest relay log events.

Mon Feb 20 15:22:37 2017 - [info] Generating relay diff files from the latest slave succeeded.

Mon Feb 20 15:22:37 2017 - [info]

Mon Feb 20 15:22:37 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..

Mon Feb 20 15:22:37 2017 - [info]

Mon Feb 20 15:22:37 2017 - [info] -- Slave recovery on host 10.99.73.11(10.99.73.11:3306) started, pid: 9928. Check tmp log /var/log/masterha/app1/10.99.73.11_3306_20170220152230.log if it takes time..

Mon Feb 20 15:22:40 2017 - [info]

Mon Feb 20 15:22:40 2017 - [info] Log messages from 10.99.73.11 ...

Mon Feb 20 15:22:40 2017 - [info]

Mon Feb 20 15:22:37 2017 - [info] Sending binlog..

Mon Feb 20 15:22:37 2017 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog to root@10.99.73.11:/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog succeeded.

Mon Feb 20 15:22:37 2017 - [info] Starting recovery on 10.99.73.11(10.99.73.11:3306)..

Mon Feb 20 15:22:37 2017 - [info] Generating diffs succeeded.

Mon Feb 20 15:22:37 2017 - [info] Waiting until all relay logs are applied.

Mon Feb 20 15:22:37 2017 - [info] done.

Mon Feb 20 15:22:37 2017 - [info] Getting slave status..

Mon Feb 20 15:22:37 2017 - [info] This slave(10.99.73.11)'s Exec_Master_Log_Pos(mysql-bin.000002:96173537) does not equal to Read_Master_Log_Pos(mysql-bin.000002:97142702). It is likely that relay log was cut during transaction. Need to recover from Exec_Master_Log_Pos.

Mon Feb 20 15:22:37 2017 - [info] Saving local relay logs from exec pos to read pos on 10.99.73.11: from relay-log.000003:96173750 to the end of the relay log..

Mon Feb 20 15:22:37 2017 - [info] Executing command : save_binary_logs --command=save --start_file=relay-log.000003 --start_pos=96173750 --output_file=/data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220152230.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --binlog_dir=/data/mysql/3306/log/relaylog

Mon Feb 20 15:22:38 2017 - [info]

Creating /data/mysql/mha if not exists.. ok.

Concat binary/relay logs from relay-log.000003 pos 96173750 to relay-log.000003 EOF into /data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220152230.binlog ..

Binlog Checksum enabled

Dumping binlog format description event, from position 0 to 367.. ok.

Dumping effective binlog data from /data/mysql/3306/log/relaylog/relay-log.000003 position 96173750 to tail(97142915).. ok.

Binlog Checksum enabled

Concat succeeded.

Mon Feb 20 15:22:38 2017 - [info] Connecting to the target slave host 10.99.73.11, running recover script..

Mon Feb 20 15:22:38 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha' --slave_host=10.99.73.11 --slave_ip=10.99.73.11 --slave_port=3306 --apply_files=/data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220152230.binlog,/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog

--workdir=/data/mysql/mha --target_version=5.7.17-log --timestamp=20170220152230 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --slave_pass=xxx

Mon Feb 20 15:22:40 2017 - [info]

Concat all apply files to /data/mysql/mha/total_binlog_for_10.99.73.11_3306.20170220152230.binlog ..

Copying the first binlog file /data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220152230.binlog to /data/mysql/mha/total_binlog_for_10.99.73.11_3306.20170220152230.binlog.. ok.

Dumping binlog head events (rotate events), skipping format description events from /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog.. Binlog Checksum enabled

dumped up to pos 154. ok.

/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog has effective binlog events from pos 154.

Dumping effective binlog data from /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog position 154 to tail(26156638).. ok.

Concat succeeded.

All apply target binary logs are concatinated at /data/mysql/mha/total_binlog_for_10.99.73.11_3306.20170220152230.binlog .

MySQL client version is 5.7.17. Using --binary-mode.

Applying differential binary/relay log files /data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220152230.binlog,/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220152230.binlog on 10.99.73.11:3306. This may take long time...

Applying log files succeeded.

Mon Feb 20 15:22:40 2017 - [info] All relay logs were successfully applied.

Mon Feb 20 15:22:40 2017 - [info] Resetting slave 10.99.73.11(10.99.73.11:3306) and starting replication from the new master 10.99.73.10(10.99.73.10:3306)..

Mon Feb 20 15:22:40 2017 - [info] Executed CHANGE MASTER.

Mon Feb 20 15:22:40 2017 - [info] Slave started.

Mon Feb 20 15:22:40 2017 - [info] End of log messages from 10.99.73.11.

Mon Feb 20 15:22:40 2017 - [info] -- Slave recovery on host 10.99.73.11(10.99.73.11:3306) succeeded.

Mon Feb 20 15:22:40 2017 - [info] All new slave servers recovered successfully.

Mon Feb 20 15:22:40 2017 - [info]

Mon Feb 20 15:22:40 2017 - [info] * Phase 5: New master cleanup phase..

Mon Feb 20 15:22:40 2017 - [info]

Mon Feb 20 15:22:40 2017 - [info] Resetting slave info on the new master..

Mon Feb 20 15:22:40 2017 - [info] 10.99.73.10: Resetting slave info succeeded.

Mon Feb 20 15:22:40 2017 - [info] Master failover to 10.99.73.10(10.99.73.10:3306) completed successfully.

Mon Feb 20 15:22:40 2017 - [info] Deleted server1 entry from /etc/masterha/app1.cnf .

Mon Feb 20 15:22:40 2017 - [info]

----- Failover Report -----

app1: MySQL Master failover 10.99.73.9(10.99.73.9:3306) to 10.99.73.10(10.99.73.10:3306) succeeded

Master 10.99.73.9(10.99.73.9:3306) is down!

Check MHA Manager logs at mha:/var/log/masterha/app1/manager.log for details.

Started automated(non-interactive) failover.

The latest slave 10.99.73.11(10.99.73.11:3306) has all relay logs for recovery.

Selected 10.99.73.10(10.99.73.10:3306) as a new master.

10.99.73.10(10.99.73.10:3306): OK: Applying all logs succeeded.

10.99.73.11(10.99.73.11:3306): This host has the latest relay log events.

Generating relay diff files from the latest slave succeeded.

10.99.73.11(10.99.73.11:3306): OK: Applying all logs succeeded. Slave started, replicating from 10.99.73.10(10.99.73.10:3306)

10.99.73.10(10.99.73.10:3306): Resetting slave info succeeded.

Master failover to 10.99.73.10(10.99.73.10:3306) completed successfully.

Mon Feb 20 15:22:40 2017 - [info] Sending mail..

看到最后如果出现”Master failover to 10.99.73.10(10.99.73.10:3306) completed successfully.”，说明备选master现在已经上位了。并且可以看到两个slave中10.99.73.10成为了new master，而10.99.73.11成为了10.99.73.10的slave。这主要是因为我们刻意把10.99.73.11设置为了no_master，就是不参与选举。如果没有设置no_master的话，那么MHA在进行选择时会根据数据最接近于master的slave。

从上面的输出可以看出整个MHA的切换过程，共包括以下的步骤：

1. 配置文件检查阶段，这个阶段会检查整个集群配置文件配置，找出master和slave。

2. 宕机的master处理，这个阶段包括虚拟ip摘除操作，主机关机操作（可选）。

3. 找出含有最新数据的slave。

4. 复制dead master和最新slave相差的relay log，并保存到MHA Manger具体的目录下。

5. 找出并提升一个slave为新的master。

6. 应用从master保存的二进制日志事件（binlog events）到新的master。

7. 使其他的slave连接新的master进行复制。

6. 验证new master（10.99.73.10）

mysql> select count(*) from sbtest.sbtest;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.24 sec)

mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 | 62292198 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

mysql> show slave hosts;
+-----------+-------------+------+-----------+--------------------------------------+
| Server_id | Host        | Port | Master_id | Slave_UUID                           |
+-----------+-------------+------+-----------+--------------------------------------+
|    103309 | 10.99.73.11 | 3306 |      2000 | c9c57719-edbf-11e6-8f8c-fa163ece854a |
+-----------+-------------+------+-----------+--------------------------------------+
1 row in set (0.00 sec)

mysql> select count(*) from sbtest.sbtest;

+----------+

| count(*) |

+----------+

| 1000000 |

+----------+

1 row in set (0.24 sec)

mysql> show master status;

+------------------+----------+--------------+------------------+-------------------+

+------------------+----------+--------------+------------------+-------------------+

| mysql-bin.000002 | 62292198 | | | |

+------------------+----------+--------------+------------------+-------------------+

1 row in set (0.00 sec)

mysql> show slave hosts;

+-----------+-------------+------+-----------+--------------------------------------+

+-----------+-------------+------+-----------+--------------------------------------+

| 103309 | 10.99.73.11 | 3306 | 2000 | c9c57719-edbf-11e6-8f8c-fa163ece854a |

+-----------+-------------+------+-----------+--------------------------------------+

1 row in set (0.00 sec)

由此结果可以看出new master跟old master的数据一致，说明mha是根据old master来补齐new master的差异数据的。

7. 切换发送邮件

最后补充一下邮件发送脚本send_report ，如下：

#!/usr/bin/perl

use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='smtp.163.com';
my $mail_from='xxxx';
my $mail_user='xxxxx';
my $mail_pass='xxxxx';
my $mail_to=['xxxx','xxxx'];
GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);

mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);

sub mailToContacts {
    my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;
    open my $DEBUG, "> /tmp/monitormail.log"
        or die "Can't open the debug      file:$!\n";
    my $sender = new Mail::Sender {
        ctype       => 'text/plain; charset=utf-8',
        encoding    => 'utf-8',
        smtp        => $smtp,
        from        => $mail_from,
        auth        => 'LOGIN',
        TLS_allowed => '0',
        authid      => $user,
        authpwd     => $passwd,
        to          => $mail_to,
        subject     => $subject,
        debug       => $DEBUG
    };

    $sender->MailMsg(
        {   msg   => $msg,
            debug => $DEBUG
        }
    ) or print $Mail::Sender::Error;
    return 1;
}

# Do whatever you want here
exit 0;

#!/usr/bin/perl

use strict;

use warnings FATAL => 'all';

use Mail::Sender;

use Getopt::Long;

#new_master_host and new_slave_hosts are set only when recovering master succeeded

my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );

my $smtp='smtp.163.com';

my $mail_from='xxxx';

my $mail_user='xxxxx';

my $mail_pass='xxxxx';

my $mail_to=['xxxx','xxxx'];

GetOptions(

'orig_master_host=s' => \$dead_master_host,

'new_master_host=s' => \$new_master_host,

'new_slave_hosts=s' => \$new_slave_hosts,

'subject=s' => \$subject,

'body=s' => \$body,

);

mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);

sub mailToContacts {

my ( $smtp, $mail_from, $user, $passwd, $mail_to, $subject, $msg ) = @_;

open my $DEBUG, "> /tmp/monitormail.log"

or die "Can't open the debug file:$!\n";

my $sender = new Mail::Sender {

ctype => 'text/plain; charset=utf-8',

encoding => 'utf-8',

smtp => $smtp,

from => $mail_from,

auth => 'LOGIN',

TLS_allowed => '0',

authid => $user,

authpwd => $passwd,

to => $mail_to,

subject => $subject,

debug => $DEBUG

};

$sender->MailMsg(

{ msg => $msg,

debug => $DEBUG

}

) or print $Mail::Sender::Error;

return 1;

}

# Do whatever you want here

exit 0;

切换后报警信息如下：

三、手动Failover（MHA Manager必须没有运行）

首先搞一个干净的MySQL复制集群加mha监控（在mha监控端不需要开启mha manager）。手动failover，这种场景意味着在业务上没有启用MHA自动切换功能，当主服务器故障时，人工手动调用MHA来进行故障切换操作。注意：如果MHA manager检测到没有dead的server，将报错，并结束failover，如下报错信息：

Thu Feb 16 18:18:07 2017 - [info] * Phase 1: Configuration Check Phase..
Thu Feb 16 18:18:07 2017 - [info] 
Thu Feb 16 18:18:07 2017 - [info] GTID failover mode = 1
Thu Feb 16 18:18:07 2017 - [info] Dead Servers:
Thu Feb 16 18:18:07 2017 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln187] None of server is dead. Stop failover.
Thu Feb 16 18:18:07 2017 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/local/bin/masterha_master_switch line 53.

Thu Feb 16 18:18:07 2017 - [info] * Phase 1: Configuration Check Phase..

Thu Feb 16 18:18:07 2017 - [info]

Thu Feb 16 18:18:07 2017 - [info] GTID failover mode = 1

Thu Feb 16 18:18:07 2017 - [info] Dead Servers:

Thu Feb 16 18:18:07 2017 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln187] None of server is dead. Stop failover.

Thu Feb 16 18:18:07 2017 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/local/bin/masterha_master_switch line 53.

进行手动切换命令如下，但是在切换之前我们需要模拟一下slave延迟，然后让切换脚本自动补全relay log。

在主库（10.99.73.9）上进行sysbench数据生成，在sbtest库下生成sbtest表，共100W记录。

$ sysbench /usr/share/sysbench/oltp_read_only.lua \
--mysql-host=127.0.0.1 \
--mysql-port=3306 \
--mysql-user=mha \
--mysql-password=123456 \
--mysql-socket=/data/mysql/3306/mysql.sock \
--mysql-db=sbtest \
--db-driver=mysql \
--tables=1 \
--table-size=1000000 \
--report-interval=10 \
--threads=128 \
--time=120 \
prepare

$ sysbench /usr/share/sysbench/oltp_read_only.lua \

--mysql-host=127.0.0.1 \

--mysql-port=3306 \

--mysql-user=mha \

--mysql-password=123456 \

--mysql-socket=/data/mysql/3306/mysql.sock \

--mysql-db=sbtest \

--db-driver=mysql \

--tables=1 \

--table-size=1000000 \

--report-interval=10 \

--threads=128 \

--time=120 \

prepare

主库写入一些数据后，就可以关闭10.99.73.10的io_thread，保持比主库数据量小。

mysql> stop slave io_thread;
Query OK, 0 rows affected (0.09 sec)

mysql> select count(*) from sbtest.sbtest;
+----------+
| count(*) |
+----------+
|   640000 |
+----------+
1 row in set (0.21 sec)

mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 | 39867231 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

mysql> stop slave io_thread;

Query OK, 0 rows affected (0.09 sec)

mysql> select count(*) from sbtest.sbtest;

+----------+

| count(*) |

+----------+

| 640000 |

+----------+

1 row in set (0.21 sec)

mysql> show master status;

+------------------+----------+--------------+------------------+-------------------+

+------------------+----------+--------------+------------------+-------------------+

| mysql-bin.000001 | 39867231 | | | |

+------------------+----------+--------------+------------------+-------------------+

1 row in set (0.00 sec)

然后就可以关闭10.99.73.9的io_thread，但是要保证10.99.73.9主机的数据比10.99.73.10数据多，但是比10.99.73.9的数据小。

mysql> stop slave io_thread;
Query OK, 0 rows affected (0.00 sec)

mysql> select count(*) from sbtest.sbtest;
+----------+
| count(*) |
+----------+
|   700000 |
+----------+
1 row in set (0.19 sec)

mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000001 | 43604703 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

mysql> stop slave io_thread;

Query OK, 0 rows affected (0.00 sec)

mysql> select count(*) from sbtest.sbtest;

+----------+

| count(*) |

+----------+

| 700000 |

+----------+

1 row in set (0.19 sec)

mysql> show master status;

+------------------+----------+--------------+------------------+-------------------+

+------------------+----------+--------------+------------------+-------------------+

| mysql-bin.000001 | 43604703 | | | |

+------------------+----------+--------------+------------------+-------------------+

1 row in set (0.00 sec)

然后等主库压测完事之后查看一下主库状态就关闭mysql进程。

$ mysql -S /data/mysql/3306/mysql.sock
mysql> select count(*) from sbtest.sbtest;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.23 sec)

mysql>  show master status;
+------------------+-----------+--------------+------------------+-------------------+
| File             | Position  | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+-----------+--------------+------------------+-------------------+
| mysql-bin.000001 | 123299163 |              |                  |                   |
+------------------+-----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

$ mysql -S /data/mysql/3306/mysql.sock

mysql> select count(*) from sbtest.sbtest;

+----------+

| count(*) |

+----------+

| 1000000 |

+----------+

1 row in set (0.23 sec)

mysql> show master status;

+------------------+-----------+--------------+------------------+-------------------+

+------------------+-----------+--------------+------------------+-------------------+

| mysql-bin.000001 | 123299163 | | | |

+------------------+-----------+--------------+------------------+-------------------+

1 row in set (0.00 sec)

$ mysqladmin -S /data/mysql/3306/mysql.sock shutdown

1	$ mysqladmin -S /data/mysql/3306/mysql.sock shutdown

然后就可以在mha主机上进行手动切换了，由于要指定特定的slave为候选master，而此slave还落后非常多，可以在每组服务器上都加上check_repl_delay=0表示忽略复制延迟，不然整个恢复过程会加长。如下配置：

.......
[server1]
candidate_master=1
hostname=10.99.73.9
port=3306
check_repl_delay=0

[server2]
candidate_master=1
hostname=10.99.73.10
port=3306
check_repl_delay=0

[server3]
hostname=10.99.73.11
no_master=1
port=3306
check_repl_delay=0

.......

[server1]

candidate_master=1

hostname=10.99.73.9

port=3306

check_repl_delay=0

[server2]

candidate_master=1

hostname=10.99.73.10

port=3306

check_repl_delay=0

[server3]

hostname=10.99.73.11

no_master=1

port=3306

check_repl_delay=0

手动切换：

$ masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=10.99.73.9 --dead_master_port=3306 --interactive=1 --new_master_host=10.99.73.10

1	$ masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=10.99.73.9 --dead_master_port=3306 --interactive=1 --new_master_host=10.99.73.10

会输出整个切换过程以及会询问你是否进行切换：

--dead_master_ip=<dead_master_ip> is not set. Using 10.99.73.9.
Mon Feb 20 15:52:49 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Feb 20 15:52:49 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Feb 20 15:52:49 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Feb 20 15:52:49 2017 - [info] MHA::MasterFailover version 0.57.
Mon Feb 20 15:52:49 2017 - [info] Starting master failover.
Mon Feb 20 15:52:49 2017 - [info] 
Mon Feb 20 15:52:49 2017 - [info] * Phase 1: Configuration Check Phase..
Mon Feb 20 15:52:49 2017 - [info] 
Mon Feb 20 15:52:49 2017 - [info] GTID failover mode = 0
Mon Feb 20 15:52:49 2017 - [info] Dead Servers:
Mon Feb 20 15:52:49 2017 - [info]   10.99.73.9(10.99.73.9:3306)
Mon Feb 20 15:52:49 2017 - [info] Checking master reachability via MySQL(double check)...
Mon Feb 20 15:52:49 2017 - [info]  ok.
Mon Feb 20 15:52:49 2017 - [info] Alive Servers:
Mon Feb 20 15:52:49 2017 - [info]   10.99.73.10(10.99.73.10:3306)
Mon Feb 20 15:52:49 2017 - [info]   10.99.73.11(10.99.73.11:3306)
Mon Feb 20 15:52:49 2017 - [info] Alive Slaves:
Mon Feb 20 15:52:49 2017 - [info]   10.99.73.10(10.99.73.10:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Mon Feb 20 15:52:49 2017 - [info]     Replicating from 10.99.73.9(10.99.73.9:3306)
Mon Feb 20 15:52:49 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Feb 20 15:52:49 2017 - [info]   10.99.73.11(10.99.73.11:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Mon Feb 20 15:52:49 2017 - [info]     Replicating from 10.99.73.9(10.99.73.9:3306)
Mon Feb 20 15:52:49 2017 - [info]     Not candidate for the new Master (no_master is set)
Master 10.99.73.9(10.99.73.9:3306) is dead. Proceed? (yes/NO): yes
Mon Feb 20 15:52:50 2017 - [info] Starting Non-GTID based failover.
Mon Feb 20 15:52:50 2017 - [info] 
Mon Feb 20 15:52:50 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Feb 20 15:52:50 2017 - [info] 
Mon Feb 20 15:52:50 2017 - [info] * Phase 2: Dead Master Shutdown Phase..
Mon Feb 20 15:52:50 2017 - [info] 
Mon Feb 20 15:52:50 2017 - [info] HealthCheck: SSH to 10.99.73.9 is reachable.
Mon Feb 20 15:52:51 2017 - [info] Forcing shutdown so that applications never connect to the current master..
Mon Feb 20 15:52:51 2017 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.
Mon Feb 20 15:52:51 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Mon Feb 20 15:52:51 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Mon Feb 20 15:52:51 2017 - [info] 
Mon Feb 20 15:52:51 2017 - [info] * Phase 3: Master Recovery Phase..
Mon Feb 20 15:52:51 2017 - [info] 
Mon Feb 20 15:52:51 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Mon Feb 20 15:52:51 2017 - [info] 
Mon Feb 20 15:52:51 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000001:123299163
Mon Feb 20 15:52:51 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):
Mon Feb 20 15:52:51 2017 - [info]   10.99.73.10(10.99.73.10:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Mon Feb 20 15:52:51 2017 - [info]     Replicating from 10.99.73.9(10.99.73.9:3306)
Mon Feb 20 15:52:51 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Feb 20 15:52:51 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:86309812
Mon Feb 20 15:52:51 2017 - [info] Oldest slaves:
Mon Feb 20 15:52:51 2017 - [info]   10.99.73.11(10.99.73.11:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Mon Feb 20 15:52:51 2017 - [info]     Replicating from 10.99.73.9(10.99.73.9:3306)
Mon Feb 20 15:52:51 2017 - [info]     Not candidate for the new Master (no_master is set)
Mon Feb 20 15:52:51 2017 - [info] 
Mon Feb 20 15:52:51 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Mon Feb 20 15:52:51 2017 - [info] 
Mon Feb 20 15:52:51 2017 - [info] Fetching dead master's binary logs..
Mon Feb 20 15:52:51 2017 - [info] Executing command on the dead master 10.99.73.9(10.99.73.9:3306): save_binary_logs --command=save --start_file=mysql-bin.000001  --start_pos=123299163 --binlog_dir=/data/mysql/log/binlog --output_file=/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57
  Creating /data/mysql/mha if not exists..    ok.
 Concat binary/relay logs from mysql-bin.000001 pos 123299163 to mysql-bin.000001 EOF into /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog ..
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 154.. ok.
  Dumping effective binlog data from /data/mysql/log/binlog/mysql-bin.000001 position 123299163 to tail(123299186).. ok.
 Binlog Checksum enabled
 Concat succeeded.
saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog                                     100%  177     0.2KB/s   00:00    
Mon Feb 20 15:52:51 2017 - [info] scp from root@10.99.73.9:/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog succeeded.
Mon Feb 20 15:52:51 2017 - [info] HealthCheck: SSH to 10.99.73.10 is reachable.
Mon Feb 20 15:52:52 2017 - [info] HealthCheck: SSH to 10.99.73.11 is reachable.
Mon Feb 20 15:52:52 2017 - [info] 
Mon Feb 20 15:52:52 2017 - [info] * Phase 3.3: Determining New Master Phase..
Mon Feb 20 15:52:52 2017 - [info] 
Mon Feb 20 15:52:52 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Mon Feb 20 15:52:52 2017 - [info] Checking whether 10.99.73.10 has relay logs from the oldest position..
Mon Feb 20 15:52:52 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000001 --latest_rmlp=123299163 --target_mlf=mysql-bin.000001 --target_rmlp=86309812 --server_id=2000 --workdir=/data/mysql/mha --timestamp=20170220155249 --manager_version=0.57 --relay_dir=/data/mysql/3306/log/relaylog --current_relay_log=relay-log.000004  :
    Relay log found at /data/mysql/3306/log/relaylog, up to relay-log.000004
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:relay-log.000004, start_pos:7398357.
Target relay log FOUND!
Mon Feb 20 15:52:52 2017 - [info] OK. 10.99.73.10 has all relay logs.
Mon Feb 20 15:52:52 2017 - [info] 10.99.73.10 can be new master.
Mon Feb 20 15:52:52 2017 - [info] New master is 10.99.73.10(10.99.73.10:3306)
Mon Feb 20 15:52:52 2017 - [info] Starting master failover..
Mon Feb 20 15:52:52 2017 - [info] 
From:
10.99.73.9(10.99.73.9:3306) (current master)
 +--10.99.73.10(10.99.73.10:3306)
 +--10.99.73.11(10.99.73.11:3306)

To:
10.99.73.10(10.99.73.10:3306) (new master)
 +--10.99.73.11(10.99.73.11:3306)

Starting master switch from 10.99.73.9(10.99.73.9:3306) to 10.99.73.10(10.99.73.10:3306)? (yes/NO): yes
Mon Feb 20 15:52:55 2017 - [info] New master decided manually is 10.99.73.10(10.99.73.10:3306)
Mon Feb 20 15:52:55 2017 - [info] 
Mon Feb 20 15:52:55 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase..
Mon Feb 20 15:52:55 2017 - [info] 
Mon Feb 20 15:52:55 2017 - [info]  This server has all relay logs. No need to generate diff files from the latest slave.
Mon Feb 20 15:52:55 2017 - [info] Sending binlog..
saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog                                     100%  177     0.2KB/s   00:00    
Mon Feb 20 15:52:55 2017 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog to root@10.99.73.10:/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog succeeded.
Mon Feb 20 15:52:55 2017 - [info] 
Mon Feb 20 15:52:55 2017 - [info] * Phase 3.4: Master Log Apply Phase..
Mon Feb 20 15:52:55 2017 - [info] 
Mon Feb 20 15:52:55 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Mon Feb 20 15:52:55 2017 - [info] Starting recovery on 10.99.73.10(10.99.73.10:3306)..
Mon Feb 20 15:52:55 2017 - [info]  Generating diffs succeeded.
Mon Feb 20 15:52:55 2017 - [info] Waiting until all relay logs are applied.
Mon Feb 20 15:52:55 2017 - [info]  done.
Mon Feb 20 15:52:55 2017 - [info] Getting slave status..
Mon Feb 20 15:52:55 2017 - [info] This slave(10.99.73.10)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000001:123299163). No need to recover from Exec_Master_Log_Pos.
Mon Feb 20 15:52:55 2017 - [info] Connecting to the target slave host 10.99.73.10, running recover script..
Mon Feb 20 15:52:55 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha' --slave_host=10.99.73.10 --slave_ip=10.99.73.10  --slave_port=3306 --apply_files=/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog 
--workdir=/data/mysql/mha --target_version=5.7.17-log --timestamp=20170220155249 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --slave_pass=xxx
Mon Feb 20 15:52:55 2017 - [info] 
MySQL client version is 5.7.17. Using --binary-mode.
Applying differential binary/relay log files /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog on 10.99.73.10:3306. This may take long time...
Applying log files succeeded.
Mon Feb 20 15:52:55 2017 - [info]  All relay logs were successfully applied.
Mon Feb 20 15:52:55 2017 - [info] Getting new master's binlog name and position..
Mon Feb 20 15:52:55 2017 - [info]  mysql-bin.000002:22424986
Mon Feb 20 15:52:55 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.99.73.10', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=22424986, MASTER_USER='mysql_slave', MASTER_PASSWORD='xxx';
Mon Feb 20 15:52:55 2017 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.
Mon Feb 20 15:52:55 2017 - [info] ** Finished master recovery successfully.
Mon Feb 20 15:52:55 2017 - [info] * Phase 3: Master Recovery Phase completed.
Mon Feb 20 15:52:55 2017 - [info] 
Mon Feb 20 15:52:55 2017 - [info] * Phase 4: Slaves Recovery Phase..
Mon Feb 20 15:52:55 2017 - [info] 
Mon Feb 20 15:52:55 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Mon Feb 20 15:52:55 2017 - [info] 
Mon Feb 20 15:52:55 2017 - [info] -- Slave diff file generation on host 10.99.73.11(10.99.73.11:3306) started, pid: 10463. Check tmp log /var/log/masterha/app1/10.99.73.11_3306_20170220155249.log if it takes time..
Mon Feb 20 15:52:57 2017 - [info] 
Mon Feb 20 15:52:57 2017 - [info] Log messages from 10.99.73.11 ...
Mon Feb 20 15:52:57 2017 - [info] 
Mon Feb 20 15:52:55 2017 - [info] Server 10.99.73.11 received relay logs up to: mysql-bin.000001:86309812
Mon Feb 20 15:52:55 2017 - [info] Need to get diffs from the latest slave(10.99.73.10) up to: mysql-bin.000001:123299163 (using the latest slave's relay logs)
Mon Feb 20 15:52:56 2017 - [info] Connecting to the latest slave host 10.99.73.10, generating diff relay log files..
Mon Feb 20 15:52:56 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=10.99.73.11 --latest_mlf=mysql-bin.000001 
--latest_rmlp=123299163 --target_mlf=mysql-bin.000001 --target_rmlp=86309812 --server_id=2000 --diff_file_readtolatest=/data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog 
--workdir=/data/mysql/mha --timestamp=20170220155249 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --relay_dir=/data/mysql/3306/log/relaylog --current_relay_log=relay-log.000004 
Mon Feb 20 15:52:57 2017 - [info] 
    Relay log found at /data/mysql/3306/log/relaylog, up to relay-log.000004
 Fast relay log position search succeeded.
 Target relay log file/position found. start_file:relay-log.000004, start_pos:7398357.
 Concat binary/relay logs from relay-log.000004 pos 7398357 to relay-log.000004 EOF into /data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog ..
 Binlog Checksum enabled
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 320.. ok.
  Dumping effective binlog data from /data/mysql/3306/log/relaylog/relay-log.000004 position 7398357 to tail(44387708).. ok.
 Binlog Checksum enabled
 Binlog Checksum enabled
 Concat succeeded.
 Generating diff relay log succeeded. Saved at /data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog .
 scp mysql02:/data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog to root@10.99.73.11(22) succeeded.
Mon Feb 20 15:52:57 2017 - [info]  Generating diff files succeeded.
Mon Feb 20 15:52:57 2017 - [info] End of log messages from 10.99.73.11.
Mon Feb 20 15:52:57 2017 - [info] -- Slave diff log generation on host 10.99.73.11(10.99.73.11:3306) succeeded.
Mon Feb 20 15:52:57 2017 - [info] Generating relay diff files from the latest slave succeeded.
Mon Feb 20 15:52:57 2017 - [info] 
Mon Feb 20 15:52:57 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Mon Feb 20 15:52:57 2017 - [info] 
Mon Feb 20 15:52:57 2017 - [info] -- Slave recovery on host 10.99.73.11(10.99.73.11:3306) started, pid: 10469. Check tmp log /var/log/masterha/app1/10.99.73.11_3306_20170220155249.log if it takes time..
saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog                                     100%  177     0.2KB/s   00:00    
Mon Feb 20 15:53:01 2017 - [info] 
Mon Feb 20 15:53:01 2017 - [info] Log messages from 10.99.73.11 ...
Mon Feb 20 15:53:01 2017 - [info] 
Mon Feb 20 15:52:57 2017 - [info] Sending binlog..
Mon Feb 20 15:52:57 2017 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog to root@10.99.73.11:/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog succeeded.
Mon Feb 20 15:52:57 2017 - [info] Starting recovery on 10.99.73.11(10.99.73.11:3306)..
Mon Feb 20 15:52:57 2017 - [info]  Generating diffs succeeded.
Mon Feb 20 15:52:57 2017 - [info] Waiting until all relay logs are applied.
Mon Feb 20 15:52:57 2017 - [info]  done.
Mon Feb 20 15:52:57 2017 - [info] Getting slave status..
Mon Feb 20 15:52:57 2017 - [info] This slave(10.99.73.11)'s Exec_Master_Log_Pos(mysql-bin.000001:86309673) does not equal to Read_Master_Log_Pos(mysql-bin.000001:86309812). It is likely that relay log was cut during transaction. Need to recover from Exec_Master_Log_Pos.
Mon Feb 20 15:52:57 2017 - [info] Saving local relay logs from exec pos to read pos on 10.99.73.11: from relay-log.000002:86309886 to the end of the relay log..
Mon Feb 20 15:52:57 2017 - [info] Executing command : save_binary_logs --command=save --start_file=relay-log.000002  --start_pos=86309886 --output_file=/data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220155249.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --binlog_dir=/data/mysql/3306/log/relaylog 
Mon Feb 20 15:52:57 2017 - [info] 
  Creating /data/mysql/mha if not exists..    ok.
 Concat binary/relay logs from relay-log.000002 pos 86309886 to relay-log.000002 EOF into /data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220155249.binlog ..
 Binlog Checksum enabled
 Binlog Checksum enabled
  Dumping binlog format description event, from position 0 to 367.. ok.
  Dumping effective binlog data from /data/mysql/3306/log/relaylog/relay-log.000002 position 86309886 to tail(86310025).. ok.
 Binlog Checksum enabled
 Binlog Checksum enabled
 Concat succeeded.
Mon Feb 20 15:52:57 2017 - [info] Connecting to the target slave host 10.99.73.11, running recover script..
Mon Feb 20 15:52:57 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha' --slave_host=10.99.73.11 --slave_ip=10.99.73.11  --slave_port=3306 
--apply_files=/data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220155249.binlog,/data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog,
/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog --workdir=/data/mysql/mha --target_version=5.7.17-log --timestamp=20170220155249 --handle_raw_binlog=1 
--disable_log_bin=0 --manager_version=0.57 --slave_pass=xxx
Mon Feb 20 15:53:01 2017 - [info] 
 Concat all apply files to /data/mysql/mha/total_binlog_for_10.99.73.11_3306.20170220155249.binlog ..
 Copying the first binlog file /data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220155249.binlog to /data/mysql/mha/total_binlog_for_10.99.73.11_3306.20170220155249.binlog.. ok.
  Dumping binlog head events (rotate events), skipping format description events from /data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog..  Binlog Checksum enabled
 Binlog Checksum enabled
dumped up to pos 320. ok.
 /data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog has effective binlog events from pos 320.
  Dumping effective binlog data from /data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog position 320 to tail(36989671).. ok.
  Dumping binlog head events (rotate events), skipping format description events from /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog..  Binlog Checksum enabled
dumped up to pos 154. ok.
 /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog has effective binlog events from pos 154.
  Dumping effective binlog data from /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog position 154 to tail(177).. ok.
 Concat succeeded.
All apply target binary logs are concatinated at /data/mysql/mha/total_binlog_for_10.99.73.11_3306.20170220155249.binlog .
MySQL client version is 5.7.17. Using --binary-mode.
Applying differential binary/relay log files /data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220155249.binlog,/data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog,/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog on 10.99.73.11:3306. This may take long time...
Applying log files succeeded.
Mon Feb 20 15:53:01 2017 - [info]  All relay logs were successfully applied.
Mon Feb 20 15:53:01 2017 - [info]  Resetting slave 10.99.73.11(10.99.73.11:3306) and starting replication from the new master 10.99.73.10(10.99.73.10:3306)..
Mon Feb 20 15:53:01 2017 - [info]  Executed CHANGE MASTER.
Mon Feb 20 15:53:01 2017 - [info]  Slave started.
Mon Feb 20 15:53:01 2017 - [info] End of log messages from 10.99.73.11.
Mon Feb 20 15:53:01 2017 - [info] -- Slave recovery on host 10.99.73.11(10.99.73.11:3306) succeeded.
Mon Feb 20 15:53:01 2017 - [info] All new slave servers recovered successfully.
Mon Feb 20 15:53:01 2017 - [info] 
Mon Feb 20 15:53:01 2017 - [info] * Phase 5: New master cleanup phase..
Mon Feb 20 15:53:01 2017 - [info] 
Mon Feb 20 15:53:01 2017 - [info] Resetting slave info on the new master..
Mon Feb 20 15:53:01 2017 - [info]  10.99.73.10: Resetting slave info succeeded.
Mon Feb 20 15:53:01 2017 - [info] Master failover to 10.99.73.10(10.99.73.10:3306) completed successfully.
Mon Feb 20 15:53:01 2017 - [info] 

----- Failover Report -----

app1: MySQL Master failover 10.99.73.9(10.99.73.9:3306) to 10.99.73.10(10.99.73.10:3306) succeeded

Master 10.99.73.9(10.99.73.9:3306) is down!

Check MHA Manager logs at mha for details.

Started manual(interactive) failover.
The latest slave 10.99.73.10(10.99.73.10:3306) has all relay logs for recovery.
Selected 10.99.73.10(10.99.73.10:3306) as a new master.
10.99.73.10(10.99.73.10:3306): OK: Applying all logs succeeded.
10.99.73.11(10.99.73.11:3306): Generating differential relay logs up to 10.99.73.10(10.99.73.10:3306)succeeded.
Generating relay diff files from the latest slave succeeded.
10.99.73.11(10.99.73.11:3306): OK: Applying all logs succeeded. Slave started, replicating from 10.99.73.10(10.99.73.10:3306)
10.99.73.10(10.99.73.10:3306): Resetting slave info succeeded.
Master failover to 10.99.73.10(10.99.73.10:3306) completed successfully.
Mon Feb 20 15:53:01 2017 - [info] Sending mail..

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

126

127

128

129

130

131

132

133

134

135

136

137

138

139

140

141

142

143

144

145

146

147

148

149

150

151

152

153

154

155

156

157

158

159

160

161

162

163

164

165

166

167

168

169

170

171

172

173

174

175

176

177

178

179

180

181

182

183

184

185

186

187

188

189

190

191

192

193

194

195

196

197

198

199

200

201

202

203

204

205

206

207

208

209

210

211

212

213

214

215

216

217

218

219

220

221

222

223

224

225

226

227

228

229

230

231

232

233

234

235

236

237

238

239

240

--dead_master_ip=<dead_master_ip> is not set. Using 10.99.73.9.

Mon Feb 20 15:52:49 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Mon Feb 20 15:52:49 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..

Mon Feb 20 15:52:49 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..

Mon Feb 20 15:52:49 2017 - [info] MHA::MasterFailover version 0.57.

Mon Feb 20 15:52:49 2017 - [info] Starting master failover.

Mon Feb 20 15:52:49 2017 - [info]

Mon Feb 20 15:52:49 2017 - [info] * Phase 1: Configuration Check Phase..

Mon Feb 20 15:52:49 2017 - [info]

Mon Feb 20 15:52:49 2017 - [info] GTID failover mode = 0

Mon Feb 20 15:52:49 2017 - [info] Dead Servers:

Mon Feb 20 15:52:49 2017 - [info] 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 15:52:49 2017 - [info] Checking master reachability via MySQL(double check)...

Mon Feb 20 15:52:49 2017 - [info] ok.

Mon Feb 20 15:52:49 2017 - [info] Alive Servers:

Mon Feb 20 15:52:49 2017 - [info] 10.99.73.10(10.99.73.10:3306)

Mon Feb 20 15:52:49 2017 - [info] 10.99.73.11(10.99.73.11:3306)

Mon Feb 20 15:52:49 2017 - [info] Alive Slaves:

Mon Feb 20 15:52:49 2017 - [info] 10.99.73.10(10.99.73.10:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled

Mon Feb 20 15:52:49 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 15:52:49 2017 - [info] Primary candidate for the new Master (candidate_master is set)

Mon Feb 20 15:52:49 2017 - [info] 10.99.73.11(10.99.73.11:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled

Mon Feb 20 15:52:49 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 15:52:49 2017 - [info] Not candidate for the new Master (no_master is set)

Master 10.99.73.9(10.99.73.9:3306) is dead. Proceed? (yes/NO): yes

Mon Feb 20 15:52:50 2017 - [info] Starting Non-GTID based failover.

Mon Feb 20 15:52:50 2017 - [info]

Mon Feb 20 15:52:50 2017 - [info] ** Phase 1: Configuration Check Phase completed.

Mon Feb 20 15:52:50 2017 - [info]

Mon Feb 20 15:52:50 2017 - [info] * Phase 2: Dead Master Shutdown Phase..

Mon Feb 20 15:52:50 2017 - [info]

Mon Feb 20 15:52:50 2017 - [info] HealthCheck: SSH to 10.99.73.9 is reachable.

Mon Feb 20 15:52:51 2017 - [info] Forcing shutdown so that applications never connect to the current master..

Mon Feb 20 15:52:51 2017 - [warning] master_ip_failover_script is not set. Skipping invalidating dead master IP address.

Mon Feb 20 15:52:51 2017 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.

Mon Feb 20 15:52:51 2017 - [info] * Phase 2: Dead Master Shutdown Phase completed.

Mon Feb 20 15:52:51 2017 - [info]

Mon Feb 20 15:52:51 2017 - [info] * Phase 3: Master Recovery Phase..

Mon Feb 20 15:52:51 2017 - [info]

Mon Feb 20 15:52:51 2017 - [info] * Phase 3.1: Getting Latest Slaves Phase..

Mon Feb 20 15:52:51 2017 - [info]

Mon Feb 20 15:52:51 2017 - [info] The latest binary log file/position on all slaves is mysql-bin.000001:123299163

Mon Feb 20 15:52:51 2017 - [info] Latest slaves (Slaves that received relay log files to the latest):

Mon Feb 20 15:52:51 2017 - [info] 10.99.73.10(10.99.73.10:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled

Mon Feb 20 15:52:51 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 15:52:51 2017 - [info] Primary candidate for the new Master (candidate_master is set)

Mon Feb 20 15:52:51 2017 - [info] The oldest binary log file/position on all slaves is mysql-bin.000001:86309812

Mon Feb 20 15:52:51 2017 - [info] Oldest slaves:

Mon Feb 20 15:52:51 2017 - [info] 10.99.73.11(10.99.73.11:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled

Mon Feb 20 15:52:51 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 15:52:51 2017 - [info] Not candidate for the new Master (no_master is set)

Mon Feb 20 15:52:51 2017 - [info]

Mon Feb 20 15:52:51 2017 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..

Mon Feb 20 15:52:51 2017 - [info]

Mon Feb 20 15:52:51 2017 - [info] Fetching dead master's binary logs..

Mon Feb 20 15:52:51 2017 - [info] Executing command on the dead master 10.99.73.9(10.99.73.9:3306): save_binary_logs --command=save --start_file=mysql-bin.000001 --start_pos=123299163 --binlog_dir=/data/mysql/log/binlog --output_file=/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57

Creating /data/mysql/mha if not exists.. ok.

Concat binary/relay logs from mysql-bin.000001 pos 123299163 to mysql-bin.000001 EOF into /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog ..

Binlog Checksum enabled

Dumping binlog format description event, from position 0 to 154.. ok.

Dumping effective binlog data from /data/mysql/log/binlog/mysql-bin.000001 position 123299163 to tail(123299186).. ok.

Binlog Checksum enabled

Concat succeeded.

saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog 100% 177 0.2KB/s 00:00

Mon Feb 20 15:52:51 2017 - [info] scp from root@10.99.73.9:/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog to local:/var/log/masterha/app1/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog succeeded.

Mon Feb 20 15:52:51 2017 - [info] HealthCheck: SSH to 10.99.73.10 is reachable.

Mon Feb 20 15:52:52 2017 - [info] HealthCheck: SSH to 10.99.73.11 is reachable.

Mon Feb 20 15:52:52 2017 - [info]

Mon Feb 20 15:52:52 2017 - [info] * Phase 3.3: Determining New Master Phase..

Mon Feb 20 15:52:52 2017 - [info]

Mon Feb 20 15:52:52 2017 - [info] Finding the latest slave that has all relay logs for recovering other slaves..

Mon Feb 20 15:52:52 2017 - [info] Checking whether 10.99.73.10 has relay logs from the oldest position..

Mon Feb 20 15:52:52 2017 - [info] Executing command: apply_diff_relay_logs --command=find --latest_mlf=mysql-bin.000001 --latest_rmlp=123299163 --target_mlf=mysql-bin.000001 --target_rmlp=86309812 --server_id=2000 --workdir=/data/mysql/mha --timestamp=20170220155249 --manager_version=0.57 --relay_dir=/data/mysql/3306/log/relaylog --current_relay_log=relay-log.000004 :

Relay log found at /data/mysql/3306/log/relaylog, up to relay-log.000004

Fast relay log position search succeeded.

Target relay log file/position found. start_file:relay-log.000004, start_pos:7398357.

Target relay log FOUND!

Mon Feb 20 15:52:52 2017 - [info] OK. 10.99.73.10 has all relay logs.

Mon Feb 20 15:52:52 2017 - [info] 10.99.73.10 can be new master.

Mon Feb 20 15:52:52 2017 - [info] New master is 10.99.73.10(10.99.73.10:3306)

Mon Feb 20 15:52:52 2017 - [info] Starting master failover..

Mon Feb 20 15:52:52 2017 - [info]

From:

10.99.73.9(10.99.73.9:3306) (current master)

+--10.99.73.10(10.99.73.10:3306)

+--10.99.73.11(10.99.73.11:3306)

To:

10.99.73.10(10.99.73.10:3306) (new master)

+--10.99.73.11(10.99.73.11:3306)

Starting master switch from 10.99.73.9(10.99.73.9:3306) to 10.99.73.10(10.99.73.10:3306)? (yes/NO): yes

Mon Feb 20 15:52:55 2017 - [info] New master decided manually is 10.99.73.10(10.99.73.10:3306)

Mon Feb 20 15:52:55 2017 - [info]

Mon Feb 20 15:52:55 2017 - [info] * Phase 3.3: New Master Diff Log Generation Phase..

Mon Feb 20 15:52:55 2017 - [info]

Mon Feb 20 15:52:55 2017 - [info] This server has all relay logs. No need to generate diff files from the latest slave.

Mon Feb 20 15:52:55 2017 - [info] Sending binlog..

saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog 100% 177 0.2KB/s 00:00

Mon Feb 20 15:52:55 2017 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog to root@10.99.73.10:/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog succeeded.

Mon Feb 20 15:52:55 2017 - [info]

Mon Feb 20 15:52:55 2017 - [info] * Phase 3.4: Master Log Apply Phase..

Mon Feb 20 15:52:55 2017 - [info]

Mon Feb 20 15:52:55 2017 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.

Mon Feb 20 15:52:55 2017 - [info] Starting recovery on 10.99.73.10(10.99.73.10:3306)..

Mon Feb 20 15:52:55 2017 - [info] Generating diffs succeeded.

Mon Feb 20 15:52:55 2017 - [info] Waiting until all relay logs are applied.

Mon Feb 20 15:52:55 2017 - [info] done.

Mon Feb 20 15:52:55 2017 - [info] Getting slave status..

Mon Feb 20 15:52:55 2017 - [info] This slave(10.99.73.10)'s Exec_Master_Log_Pos equals to Read_Master_Log_Pos(mysql-bin.000001:123299163). No need to recover from Exec_Master_Log_Pos.

Mon Feb 20 15:52:55 2017 - [info] Connecting to the target slave host 10.99.73.10, running recover script..

Mon Feb 20 15:52:55 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha' --slave_host=10.99.73.10 --slave_ip=10.99.73.10 --slave_port=3306 --apply_files=/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog

--workdir=/data/mysql/mha --target_version=5.7.17-log --timestamp=20170220155249 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --slave_pass=xxx

Mon Feb 20 15:52:55 2017 - [info]

MySQL client version is 5.7.17. Using --binary-mode.

Applying differential binary/relay log files /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog on 10.99.73.10:3306. This may take long time...

Applying log files succeeded.

Mon Feb 20 15:52:55 2017 - [info] All relay logs were successfully applied.

Mon Feb 20 15:52:55 2017 - [info] Getting new master's binlog name and position..

Mon Feb 20 15:52:55 2017 - [info] mysql-bin.000002:22424986

Mon Feb 20 15:52:55 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.99.73.10', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000002', MASTER_LOG_POS=22424986, MASTER_USER='mysql_slave', MASTER_PASSWORD='xxx';

Mon Feb 20 15:52:55 2017 - [warning] master_ip_failover_script is not set. Skipping taking over new master IP address.

Mon Feb 20 15:52:55 2017 - [info] ** Finished master recovery successfully.

Mon Feb 20 15:52:55 2017 - [info] * Phase 3: Master Recovery Phase completed.

Mon Feb 20 15:52:55 2017 - [info]

Mon Feb 20 15:52:55 2017 - [info] * Phase 4: Slaves Recovery Phase..

Mon Feb 20 15:52:55 2017 - [info]

Mon Feb 20 15:52:55 2017 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..

Mon Feb 20 15:52:55 2017 - [info]

Mon Feb 20 15:52:55 2017 - [info] -- Slave diff file generation on host 10.99.73.11(10.99.73.11:3306) started, pid: 10463. Check tmp log /var/log/masterha/app1/10.99.73.11_3306_20170220155249.log if it takes time..

Mon Feb 20 15:52:57 2017 - [info]

Mon Feb 20 15:52:57 2017 - [info] Log messages from 10.99.73.11 ...

Mon Feb 20 15:52:57 2017 - [info]

Mon Feb 20 15:52:55 2017 - [info] Server 10.99.73.11 received relay logs up to: mysql-bin.000001:86309812

Mon Feb 20 15:52:55 2017 - [info] Need to get diffs from the latest slave(10.99.73.10) up to: mysql-bin.000001:123299163 (using the latest slave's relay logs)

Mon Feb 20 15:52:56 2017 - [info] Connecting to the latest slave host 10.99.73.10, generating diff relay log files..

Mon Feb 20 15:52:56 2017 - [info] Executing command: apply_diff_relay_logs --command=generate_and_send --scp_user=root --scp_host=10.99.73.11 --latest_mlf=mysql-bin.000001

--latest_rmlp=123299163 --target_mlf=mysql-bin.000001 --target_rmlp=86309812 --server_id=2000 --diff_file_readtolatest=/data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog

--workdir=/data/mysql/mha --timestamp=20170220155249 --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --relay_dir=/data/mysql/3306/log/relaylog --current_relay_log=relay-log.000004

Mon Feb 20 15:52:57 2017 - [info]

Relay log found at /data/mysql/3306/log/relaylog, up to relay-log.000004

Fast relay log position search succeeded.

Target relay log file/position found. start_file:relay-log.000004, start_pos:7398357.

Concat binary/relay logs from relay-log.000004 pos 7398357 to relay-log.000004 EOF into /data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog ..

Binlog Checksum enabled

Dumping binlog format description event, from position 0 to 320.. ok.

Dumping effective binlog data from /data/mysql/3306/log/relaylog/relay-log.000004 position 7398357 to tail(44387708).. ok.

Binlog Checksum enabled

Concat succeeded.

Generating diff relay log succeeded. Saved at /data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog .

scp mysql02:/data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog to root@10.99.73.11(22) succeeded.

Mon Feb 20 15:52:57 2017 - [info] Generating diff files succeeded.

Mon Feb 20 15:52:57 2017 - [info] End of log messages from 10.99.73.11.

Mon Feb 20 15:52:57 2017 - [info] -- Slave diff log generation on host 10.99.73.11(10.99.73.11:3306) succeeded.

Mon Feb 20 15:52:57 2017 - [info] Generating relay diff files from the latest slave succeeded.

Mon Feb 20 15:52:57 2017 - [info]

Mon Feb 20 15:52:57 2017 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..

Mon Feb 20 15:52:57 2017 - [info]

Mon Feb 20 15:52:57 2017 - [info] -- Slave recovery on host 10.99.73.11(10.99.73.11:3306) started, pid: 10469. Check tmp log /var/log/masterha/app1/10.99.73.11_3306_20170220155249.log if it takes time..

saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog 100% 177 0.2KB/s 00:00

Mon Feb 20 15:53:01 2017 - [info]

Mon Feb 20 15:53:01 2017 - [info] Log messages from 10.99.73.11 ...

Mon Feb 20 15:53:01 2017 - [info]

Mon Feb 20 15:52:57 2017 - [info] Sending binlog..

Mon Feb 20 15:52:57 2017 - [info] scp from local:/var/log/masterha/app1/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog to root@10.99.73.11:/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog succeeded.

Mon Feb 20 15:52:57 2017 - [info] Starting recovery on 10.99.73.11(10.99.73.11:3306)..

Mon Feb 20 15:52:57 2017 - [info] Generating diffs succeeded.

Mon Feb 20 15:52:57 2017 - [info] Waiting until all relay logs are applied.

Mon Feb 20 15:52:57 2017 - [info] done.

Mon Feb 20 15:52:57 2017 - [info] Getting slave status..

Mon Feb 20 15:52:57 2017 - [info] This slave(10.99.73.11)'s Exec_Master_Log_Pos(mysql-bin.000001:86309673) does not equal to Read_Master_Log_Pos(mysql-bin.000001:86309812). It is likely that relay log was cut during transaction. Need to recover from Exec_Master_Log_Pos.

Mon Feb 20 15:52:57 2017 - [info] Saving local relay logs from exec pos to read pos on 10.99.73.11: from relay-log.000002:86309886 to the end of the relay log..

Mon Feb 20 15:52:57 2017 - [info] Executing command : save_binary_logs --command=save --start_file=relay-log.000002 --start_pos=86309886 --output_file=/data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220155249.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.57 --binlog_dir=/data/mysql/3306/log/relaylog

Mon Feb 20 15:52:57 2017 - [info]

Creating /data/mysql/mha if not exists.. ok.

Concat binary/relay logs from relay-log.000002 pos 86309886 to relay-log.000002 EOF into /data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220155249.binlog ..

Binlog Checksum enabled

Dumping binlog format description event, from position 0 to 367.. ok.

Dumping effective binlog data from /data/mysql/3306/log/relaylog/relay-log.000002 position 86309886 to tail(86310025).. ok.

Binlog Checksum enabled

Concat succeeded.

Mon Feb 20 15:52:57 2017 - [info] Connecting to the target slave host 10.99.73.11, running recover script..

Mon Feb 20 15:52:57 2017 - [info] Executing command: apply_diff_relay_logs --command=apply --slave_user='mha' --slave_host=10.99.73.11 --slave_ip=10.99.73.11 --slave_port=3306

--apply_files=/data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220155249.binlog,/data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog,

/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog --workdir=/data/mysql/mha --target_version=5.7.17-log --timestamp=20170220155249 --handle_raw_binlog=1

--disable_log_bin=0 --manager_version=0.57 --slave_pass=xxx

Mon Feb 20 15:53:01 2017 - [info]

Concat all apply files to /data/mysql/mha/total_binlog_for_10.99.73.11_3306.20170220155249.binlog ..

Copying the first binlog file /data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220155249.binlog to /data/mysql/mha/total_binlog_for_10.99.73.11_3306.20170220155249.binlog.. ok.

Dumping binlog head events (rotate events), skipping format description events from /data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog.. Binlog Checksum enabled

Binlog Checksum enabled

dumped up to pos 320. ok.

/data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog has effective binlog events from pos 320.

Dumping effective binlog data from /data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog position 320 to tail(36989671).. ok.

Dumping binlog head events (rotate events), skipping format description events from /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog.. Binlog Checksum enabled

dumped up to pos 154. ok.

/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog has effective binlog events from pos 154.

Dumping effective binlog data from /data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog position 154 to tail(177).. ok.

Concat succeeded.

All apply target binary logs are concatinated at /data/mysql/mha/total_binlog_for_10.99.73.11_3306.20170220155249.binlog .

MySQL client version is 5.7.17. Using --binary-mode.

Applying differential binary/relay log files /data/mysql/mha/relay_from_exec_to_read_10.99.73.11_3306_20170220155249.binlog,/data/mysql/mha/relay_from_read_to_latest_10.99.73.11_3306_20170220155249.binlog,/data/mysql/mha/saved_master_binlog_from_10.99.73.9_3306_20170220155249.binlog on 10.99.73.11:3306. This may take long time...

Applying log files succeeded.

Mon Feb 20 15:53:01 2017 - [info] All relay logs were successfully applied.

Mon Feb 20 15:53:01 2017 - [info] Resetting slave 10.99.73.11(10.99.73.11:3306) and starting replication from the new master 10.99.73.10(10.99.73.10:3306)..

Mon Feb 20 15:53:01 2017 - [info] Executed CHANGE MASTER.

Mon Feb 20 15:53:01 2017 - [info] Slave started.

Mon Feb 20 15:53:01 2017 - [info] End of log messages from 10.99.73.11.

Mon Feb 20 15:53:01 2017 - [info] -- Slave recovery on host 10.99.73.11(10.99.73.11:3306) succeeded.

Mon Feb 20 15:53:01 2017 - [info] All new slave servers recovered successfully.

Mon Feb 20 15:53:01 2017 - [info]

Mon Feb 20 15:53:01 2017 - [info] * Phase 5: New master cleanup phase..

Mon Feb 20 15:53:01 2017 - [info]

Mon Feb 20 15:53:01 2017 - [info] Resetting slave info on the new master..

Mon Feb 20 15:53:01 2017 - [info] 10.99.73.10: Resetting slave info succeeded.

Mon Feb 20 15:53:01 2017 - [info] Master failover to 10.99.73.10(10.99.73.10:3306) completed successfully.

Mon Feb 20 15:53:01 2017 - [info]

----- Failover Report -----

app1: MySQL Master failover 10.99.73.9(10.99.73.9:3306) to 10.99.73.10(10.99.73.10:3306) succeeded

Master 10.99.73.9(10.99.73.9:3306) is down!

Check MHA Manager logs at mha for details.

Started manual(interactive) failover.

The latest slave 10.99.73.10(10.99.73.10:3306) has all relay logs for recovery.

Selected 10.99.73.10(10.99.73.10:3306) as a new master.

10.99.73.10(10.99.73.10:3306): OK: Applying all logs succeeded.

10.99.73.11(10.99.73.11:3306): Generating differential relay logs up to 10.99.73.10(10.99.73.10:3306)succeeded.

Generating relay diff files from the latest slave succeeded.

10.99.73.11(10.99.73.11:3306): OK: Applying all logs succeeded. Slave started, replicating from 10.99.73.10(10.99.73.10:3306)

10.99.73.10(10.99.73.10:3306): Resetting slave info succeeded.

Master failover to 10.99.73.10(10.99.73.10:3306) completed successfully.

Mon Feb 20 15:53:01 2017 - [info] Sending mail..

上述模拟了master宕机的情况下手动把10.99.73.10提升为主库的操作过程。

查看一下10.99.73.10主机状态。

mysql> select count(*) from sbtest.sbtest;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.28 sec)

mysql> show master status;
+------------------+----------+--------------+------------------+-------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+------------------+----------+--------------+------------------+-------------------+
| mysql-bin.000002 | 22424986 |              |                  |                   |
+------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)

mysql> show slave hosts;
+-----------+-------------+------+-----------+--------------------------------------+
| Server_id | Host        | Port | Master_id | Slave_UUID                           |
+-----------+-------------+------+-----------+--------------------------------------+
|    103309 | 10.99.73.11 | 3306 |      2000 | c9c57719-edbf-11e6-8f8c-fa163ece854a |
+-----------+-------------+------+-----------+--------------------------------------+
1 row in set (0.00 sec)

mysql> select count(*) from sbtest.sbtest;

+----------+

| count(*) |

+----------+

| 1000000 |

+----------+

1 row in set (0.28 sec)

mysql> show master status;

+------------------+----------+--------------+------------------+-------------------+

+------------------+----------+--------------+------------------+-------------------+

| mysql-bin.000002 | 22424986 | | | |

+------------------+----------+--------------+------------------+-------------------+

1 row in set (0.00 sec)

mysql> show slave hosts;

+-----------+-------------+------+-----------+--------------------------------------+

+-----------+-------------+------+-----------+--------------------------------------+

| 103309 | 10.99.73.11 | 3306 | 2000 | c9c57719-edbf-11e6-8f8c-fa163ece854a |

+-----------+-------------+------+-----------+--------------------------------------+

1 row in set (0.00 sec)

可以看到10.99.73.10切换为主后，自动补全了old master所有的差异日志。

masterha_master_switch命令参数介绍

--master_state=dead

参数取值：dead or alive。如果设置为alive，就是在线master切换了，这样的场景，master必须是活着的。

--dead_master_host=(hostname)

Dead master的主机信息，包括--dead_master_ip，--dead_master_port。

--new_master_host=(hostname)

New master的主机信息，这个参数是可选项，如果你想特意指定某台机器作为new master，就设置这个参数。如果--new_master_host没有设置，那么选举master的规则参考automated master failover（ candidate_master parameter ）。

--interactive=(0|1)

交互式failover，设置为1（默认）。非交互式failover，设置为0。

--skip_change_master(0.56)

只会完成日志补偿，不会进行change master和start slave。如果你想double check slave是否成功的恢复完成，那么设置该参数对你特别有用。

--skip_disable_read_only

如果设置了这个参数，那么新master将还会是只读状态。如果你想手动开启新master的写权限，那么这个参数特别有用。

--last_failover_minute=(minutes)

同masterha_manager里面的参数一样。

--ignore_last_failover

同masterha_manager里面的参数一样。

--wait_on_failover_error=(seconds)

同masterha_manager里面的参数一样。这个参数，只对自动或者非交互式的failover有用，如果--interactive=0没有设置，那么这个参数将不起作用。

--remove_dead_master_conf

同masterha_manager里面的参数一样。

--wait_until_gtid_in_sync=(0|1)

这是基于GTID的参数，如果设置为1（默认）： MHA会等待，直到所有slave追上新master的gtid。如果设置为0：那么MHA不会等待slave追上新master。

--ignore_binlog_server_error

MHA忽略任何binlog server的错误。

四、在线进行切换

在许多情况下，需要将现有的主服务器迁移到另外一台服务器上。比如主服务器硬件故障，RAID控制卡需要重建，将主服务器移到性能更好的服务器上等等。维护主服务器引起性能下降，导致停机时间至少无法写入数据。另外，阻塞或杀掉当前运行的会话会导致主主之间数据不一致的问题发生。 MHA 提供快速切换和优雅的阻塞写入，这个切换过程只需要0.5-2s的时间，这段时间内数据是无法写入的。在很多情况下，0.5-2的阻塞写入是可以接受的。因此切换主服务器不需要计划分配维护时间窗口。

MHA在线切换的大概过程：

1. 检查配置文件，主要检测复制设置和确定当前主服务器。

2. 选出新的主服务器。

3. 阻塞当前master的写入操作，主要包括执行 FLUSH TABLES，设置 read_only，kill 所有业务线程，关闭 VIP，最后执行 FLUSH TABLES WITH READ LOCK；一切都是为了保证新的写入无法进入，正在操作的事务回滚。

4. 等待所有从服务器复制完成，然后获取从服务器 POSITION 点（SHOW MASTER STATUS）。

5. 允许写入操作到新的主服务器，主要包括设置 read_only=OFF，RESET SLAVE ALL，设置 VIP。

6. 重新设置老的 master 为 slave，主要包括设置 relay_log_purge=0，change master，unlock tables，start slave 这几个部分。

注意，在线切换的时候应用架构需要考虑以下两个问题：

1. 自动识别master和slave的问题（master的机器可能会切换），如果采用了vip的方式，基本可以解决这个问题。

2. 负载均衡的问题，可以定义大概的读写比例，每台机器可承担的负载比例，当有机器离开集群时，需要考虑这个问题。

为了保证数据完全一致性，在最快的时间内完成切换，MHA的在线切换必须满足以下条件才会切换成功，否则会切换失败。

1. 所有slave的IO线程都在运行。

2. 所有slave的SQL线程都在运行。

3. 所有的show slave status的输出中Seconds_Behind_Master参数小于或者等于running_updates_limit秒，如果在切换过程中不指定running_updates_limit，那么默认情况下running_updates_limit为1秒。

4. 在master端，通过show processlist输出，没有一个更新花费的时间大于running_updates_limit秒。

在线切换步骤如下：

首先，停掉MHA监控：

[root@mha ~]# masterha_stop --conf=/etc/masterha/app1.cnf

1	[root@mha ~]# masterha_stop --conf=/etc/masterha/app1.cnf

然后可以在主库（10.99.73.9）上进行sysbench数据生成，在sbtest库下生成sbtest表，共100W记录。

$ sysbench /usr/share/sysbench/oltp_read_only.lua \
--mysql-host=127.0.0.1 \
--mysql-port=3306 \
--mysql-user=mha \
--mysql-password=123456 \
--mysql-socket=/data/mysql/3306/mysql.sock \
--mysql-db=sbtest \
--db-driver=mysql \
--tables=1 \
--table-size=1000000 \
--report-interval=10 \
--threads=128 \
--time=120 \
prepare

$ sysbench /usr/share/sysbench/oltp_read_only.lua \

--mysql-host=127.0.0.1 \

--mysql-port=3306 \

--mysql-user=mha \

--mysql-password=123456 \

--mysql-socket=/data/mysql/3306/mysql.sock \

--mysql-db=sbtest \

--db-driver=mysql \

--tables=1 \

--table-size=1000000 \

--report-interval=10 \

--threads=128 \

--time=120 \

prepare

有条件的可以在压测时连接数据指定VIP地址，把socket去掉，添加--mysql-host=10.99.73.100，这就样可以模拟应用在线切换（一边压测一边切换）。

等主库一压测完后。就可以进行在线切换操作（模拟在线切换主库操作，原主库10.99.73.9变为slave，10.99.73.10提升为新的主库）。

[root@mha ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=10.99.73.10 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=1

1	[root@mha ~]# masterha_master_switch --conf=/etc/masterha/app1.cnf --master_state=alive --new_master_host=10.99.73.10 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=1

其中参数的意思：

--orig_master_is_new_slave

切换时加上此参数是将原master变为slave节点，如果不加此参数，原来的master将不启动。

--running_updates_limit

故障切换时，候选master如果有延迟的话，mha切换不能成功，加上此参数表示延迟在此时间范围内都可切换（单位为s），但是切换的时间长短是由recover时relay日志的大小决定。

--new_master_host & --new_master_port

选择把哪个Slave提升为New Master，如果只有一个Slave节点，此参数可不加，会自动把Slave节点提升New Master节点。

最后查看日志，了解切换过程，最后输出信息如下：

Mon Feb 20 16:15:41 2017 - [info] MHA::MasterRotate version 0.57.
Mon Feb 20 16:15:41 2017 - [info] Starting online master switch..
Mon Feb 20 16:15:41 2017 - [info] 
Mon Feb 20 16:15:41 2017 - [info] * Phase 1: Configuration Check Phase..
Mon Feb 20 16:15:41 2017 - [info] 
Mon Feb 20 16:15:41 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Mon Feb 20 16:15:41 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Mon Feb 20 16:15:41 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Mon Feb 20 16:15:41 2017 - [info] GTID failover mode = 0
Mon Feb 20 16:15:41 2017 - [info] Current Alive Master: 10.99.73.9(10.99.73.9:3306)
Mon Feb 20 16:15:41 2017 - [info] Alive Slaves:
Mon Feb 20 16:15:41 2017 - [info]   10.99.73.10(10.99.73.10:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Mon Feb 20 16:15:41 2017 - [info]     Replicating from 10.99.73.9(10.99.73.9:3306)
Mon Feb 20 16:15:41 2017 - [info]     Primary candidate for the new Master (candidate_master is set)
Mon Feb 20 16:15:41 2017 - [info]   10.99.73.11(10.99.73.11:3306)  Version=5.7.17-log (oldest major version between slaves) log-bin:enabled
Mon Feb 20 16:15:41 2017 - [info]     Replicating from 10.99.73.9(10.99.73.9:3306)
Mon Feb 20 16:15:41 2017 - [info]     Not candidate for the new Master (no_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 10.99.73.9(10.99.73.9:3306)? (YES/no): yes
Mon Feb 20 16:15:59 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Mon Feb 20 16:15:59 2017 - [info]  ok.
Mon Feb 20 16:15:59 2017 - [info] Checking MHA is not monitoring or doing failover..
Mon Feb 20 16:15:59 2017 - [info] Checking replication health on 10.99.73.10..
Mon Feb 20 16:15:59 2017 - [info]  ok.
Mon Feb 20 16:15:59 2017 - [info] Checking replication health on 10.99.73.11..
Mon Feb 20 16:15:59 2017 - [info]  ok.
Mon Feb 20 16:15:59 2017 - [info] 10.99.73.10 can be new master.
Mon Feb 20 16:15:59 2017 - [info] 
From:
10.99.73.9(10.99.73.9:3306) (current master)
 +--10.99.73.10(10.99.73.10:3306)
 +--10.99.73.11(10.99.73.11:3306)

To:
10.99.73.10(10.99.73.10:3306) (new master)
 +--10.99.73.11(10.99.73.11:3306)
 +--10.99.73.9(10.99.73.9:3306)

Starting master switch from 10.99.73.9(10.99.73.9:3306) to 10.99.73.10(10.99.73.10:3306)? (yes/NO): yes
Mon Feb 20 16:16:03 2017 - [info] Checking whether 10.99.73.10(10.99.73.10:3306) is ok for the new master..
Mon Feb 20 16:16:03 2017 - [info]  ok.
Mon Feb 20 16:16:03 2017 - [info] 10.99.73.9(10.99.73.9:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Mon Feb 20 16:16:04 2017 - [info] 10.99.73.9(10.99.73.9:3306): Resetting slave pointing to the dummy host.
Mon Feb 20 16:16:04 2017 - [info] ** Phase 1: Configuration Check Phase completed.
Mon Feb 20 16:16:04 2017 - [info] 
Mon Feb 20 16:16:04 2017 - [info] * Phase 2: Rejecting updates Phase..
Mon Feb 20 16:16:04 2017 - [info] 
Mon Feb 20 16:16:04 2017 - [info] Executing master ip online change script to disable write on the current master:
Mon Feb 20 16:16:04 2017 - [info]   /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=10.99.73.9 --orig_master_ip=10.99.73.9 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='123456' --new_master_host=10.99.73.10 --new_master_ip=10.99.73.10 --new_master_port=3306 --new_master_user='mha' --new_master_password='123456' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave
Unknown option: orig_master_password
Unknown option: new_master_password
Unknown option: orig_master_ssh_user
Unknown option: new_master_ssh_user
Unknown option: orig_master_is_new_slave
Mon Feb 20 16:16:04 2017 113836 Set read_only on the new master.. ok.
Mon Feb 20 16:16:04 2017 116266 Set read_only=1 on the orig master.. ok.
Disabling the VIP on old master: 10.99.73.9 
SIOCSIFFLAGS: Cannot assign requested address
Mon Feb 20 16:16:04 2017 240935 Killing all application threads..
Mon Feb 20 16:16:04 2017 240993 done.
Mon Feb 20 16:16:04 2017 - [info]  ok.
Mon Feb 20 16:16:04 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Mon Feb 20 16:16:04 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..
Mon Feb 20 16:16:04 2017 - [info]  ok.
Mon Feb 20 16:16:04 2017 - [info] Orig master binlog:pos is mysql-bin.000001:154.
Mon Feb 20 16:16:04 2017 - [info]  Waiting to execute all relay logs on 10.99.73.10(10.99.73.10:3306)..
Mon Feb 20 16:16:04 2017 - [info]  master_pos_wait(mysql-bin.000001:154) completed on 10.99.73.10(10.99.73.10:3306). Executed 0 events.
Mon Feb 20 16:16:04 2017 - [info]   done.
Mon Feb 20 16:16:04 2017 - [info] Getting new master's binlog name and position..
Mon Feb 20 16:16:04 2017 - [info]  mysql-bin.000001:154
Mon Feb 20 16:16:04 2017 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.99.73.10', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=154, MASTER_USER='mysql_slave', MASTER_PASSWORD='xxx';
Mon Feb 20 16:16:04 2017 - [info] Executing master ip online change script to allow write on the new master:
Mon Feb 20 16:16:04 2017 - [info]   /usr/local/bin/master_ip_online_change --command=start --orig_master_host=10.99.73.9 --orig_master_ip=10.99.73.9 --orig_master_port=3306 
--orig_master_user='mha' --orig_master_password='123456' --new_master_host=10.99.73.10 --new_master_ip=10.99.73.10 --new_master_port=3306 --new_master_user='mha' --new_master_password='123456' 
--orig_master_ssh_user=root --new_master_ssh_user=root   --orig_master_is_new_slave
Unknown option: orig_master_password
Unknown option: new_master_password
Unknown option: orig_master_ssh_user
Unknown option: new_master_ssh_user
Unknown option: orig_master_is_new_slave
Mon Feb 20 16:16:04 2017 358995 Set read_only=0 on the new master.
Enabling the VIP - 10.99.73.100/24 on the new master - 10.99.73.10 
Mon Feb 20 16:16:04 2017 - [info]  ok.
Mon Feb 20 16:16:04 2017 - [info] 
Mon Feb 20 16:16:04 2017 - [info] * Switching slaves in parallel..
Mon Feb 20 16:16:04 2017 - [info] 
Mon Feb 20 16:16:04 2017 - [info] -- Slave switch on host 10.99.73.11(10.99.73.11:3306) started, pid: 10883
Mon Feb 20 16:16:04 2017 - [info] 
Mon Feb 20 16:16:04 2017 - [info] Log messages from 10.99.73.11 ...
Mon Feb 20 16:16:04 2017 - [info] 
Mon Feb 20 16:16:04 2017 - [info]  Waiting to execute all relay logs on 10.99.73.11(10.99.73.11:3306)..
Mon Feb 20 16:16:04 2017 - [info]  master_pos_wait(mysql-bin.000001:154) completed on 10.99.73.11(10.99.73.11:3306). Executed 0 events.
Mon Feb 20 16:16:04 2017 - [info]   done.
Mon Feb 20 16:16:04 2017 - [info]  Resetting slave 10.99.73.11(10.99.73.11:3306) and starting replication from the new master 10.99.73.10(10.99.73.10:3306)..
Mon Feb 20 16:16:04 2017 - [info]  Executed CHANGE MASTER.
Mon Feb 20 16:16:04 2017 - [info]  Slave started.
Mon Feb 20 16:16:04 2017 - [info] End of log messages from 10.99.73.11 ...
Mon Feb 20 16:16:04 2017 - [info] 
Mon Feb 20 16:16:04 2017 - [info] -- Slave switch on host 10.99.73.11(10.99.73.11:3306) succeeded.
Mon Feb 20 16:16:04 2017 - [info] Unlocking all tables on the orig master:
Mon Feb 20 16:16:04 2017 - [info] Executing UNLOCK TABLES..
Mon Feb 20 16:16:04 2017 - [info]  ok.
Mon Feb 20 16:16:04 2017 - [info] Starting orig master as a new slave..
Mon Feb 20 16:16:04 2017 - [info]  Resetting slave 10.99.73.9(10.99.73.9:3306) and starting replication from the new master 10.99.73.10(10.99.73.10:3306)..
Mon Feb 20 16:16:04 2017 - [info]  Executed CHANGE MASTER.
Mon Feb 20 16:16:04 2017 - [info]  Slave started.
Mon Feb 20 16:16:04 2017 - [info] All new slave servers switched successfully.
Mon Feb 20 16:16:04 2017 - [info] 
Mon Feb 20 16:16:04 2017 - [info] * Phase 5: New master cleanup phase..
Mon Feb 20 16:16:04 2017 - [info] 
Mon Feb 20 16:16:04 2017 - [info]  10.99.73.10: Resetting slave info succeeded.
Mon Feb 20 16:16:04 2017 - [info] Switching master to 10.99.73.10(10.99.73.10:3306) completed successfully.

100

101

102

103

104

105

106

107

108

109

110

111

112

Mon Feb 20 16:15:41 2017 - [info] MHA::MasterRotate version 0.57.

Mon Feb 20 16:15:41 2017 - [info] Starting online master switch..

Mon Feb 20 16:15:41 2017 - [info]

Mon Feb 20 16:15:41 2017 - [info] * Phase 1: Configuration Check Phase..

Mon Feb 20 16:15:41 2017 - [info]

Mon Feb 20 16:15:41 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Mon Feb 20 16:15:41 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf..

Mon Feb 20 16:15:41 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf..

Mon Feb 20 16:15:41 2017 - [info] GTID failover mode = 0

Mon Feb 20 16:15:41 2017 - [info] Current Alive Master: 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 16:15:41 2017 - [info] Alive Slaves:

Mon Feb 20 16:15:41 2017 - [info] 10.99.73.10(10.99.73.10:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled

Mon Feb 20 16:15:41 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 16:15:41 2017 - [info] Primary candidate for the new Master (candidate_master is set)

Mon Feb 20 16:15:41 2017 - [info] 10.99.73.11(10.99.73.11:3306) Version=5.7.17-log (oldest major version between slaves) log-bin:enabled

Mon Feb 20 16:15:41 2017 - [info] Replicating from 10.99.73.9(10.99.73.9:3306)

Mon Feb 20 16:15:41 2017 - [info] Not candidate for the new Master (no_master is set)

It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 10.99.73.9(10.99.73.9:3306)? (YES/no): yes

Mon Feb 20 16:15:59 2017 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..

Mon Feb 20 16:15:59 2017 - [info] ok.

Mon Feb 20 16:15:59 2017 - [info] Checking MHA is not monitoring or doing failover..

Mon Feb 20 16:15:59 2017 - [info] Checking replication health on 10.99.73.10..

Mon Feb 20 16:15:59 2017 - [info] ok.

Mon Feb 20 16:15:59 2017 - [info] Checking replication health on 10.99.73.11..

Mon Feb 20 16:15:59 2017 - [info] ok.

Mon Feb 20 16:15:59 2017 - [info] 10.99.73.10 can be new master.

Mon Feb 20 16:15:59 2017 - [info]

From:

10.99.73.9(10.99.73.9:3306) (current master)

+--10.99.73.10(10.99.73.10:3306)

+--10.99.73.11(10.99.73.11:3306)

To:

10.99.73.10(10.99.73.10:3306) (new master)

+--10.99.73.11(10.99.73.11:3306)

+--10.99.73.9(10.99.73.9:3306)

Starting master switch from 10.99.73.9(10.99.73.9:3306) to 10.99.73.10(10.99.73.10:3306)? (yes/NO): yes

Mon Feb 20 16:16:03 2017 - [info] Checking whether 10.99.73.10(10.99.73.10:3306) is ok for the new master..

Mon Feb 20 16:16:03 2017 - [info] ok.

Mon Feb 20 16:16:03 2017 - [info] 10.99.73.9(10.99.73.9:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.

Mon Feb 20 16:16:04 2017 - [info] 10.99.73.9(10.99.73.9:3306): Resetting slave pointing to the dummy host.

Mon Feb 20 16:16:04 2017 - [info] ** Phase 1: Configuration Check Phase completed.

Mon Feb 20 16:16:04 2017 - [info]

Mon Feb 20 16:16:04 2017 - [info] * Phase 2: Rejecting updates Phase..

Mon Feb 20 16:16:04 2017 - [info]

Mon Feb 20 16:16:04 2017 - [info] Executing master ip online change script to disable write on the current master:

Mon Feb 20 16:16:04 2017 - [info] /usr/local/bin/master_ip_online_change --command=stop --orig_master_host=10.99.73.9 --orig_master_ip=10.99.73.9 --orig_master_port=3306 --orig_master_user='mha' --orig_master_password='123456' --new_master_host=10.99.73.10 --new_master_ip=10.99.73.10 --new_master_port=3306 --new_master_user='mha' --new_master_password='123456' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave

Unknown option: orig_master_password

Unknown option: new_master_password

Unknown option: orig_master_ssh_user

Unknown option: new_master_ssh_user

Unknown option: orig_master_is_new_slave

Mon Feb 20 16:16:04 2017 113836 Set read_only on the new master.. ok.

Mon Feb 20 16:16:04 2017 116266 Set read_only=1 on the orig master.. ok.

Disabling the VIP on old master: 10.99.73.9

SIOCSIFFLAGS: Cannot assign requested address

Mon Feb 20 16:16:04 2017 240935 Killing all application threads..

Mon Feb 20 16:16:04 2017 240993 done.

Mon Feb 20 16:16:04 2017 - [info] ok.

Mon Feb 20 16:16:04 2017 - [info] Locking all tables on the orig master to reject updates from everybody (including root):

Mon Feb 20 16:16:04 2017 - [info] Executing FLUSH TABLES WITH READ LOCK..

Mon Feb 20 16:16:04 2017 - [info] ok.

Mon Feb 20 16:16:04 2017 - [info] Orig master binlog:pos is mysql-bin.000001:154.

Mon Feb 20 16:16:04 2017 - [info] Waiting to execute all relay logs on 10.99.73.10(10.99.73.10:3306)..

Mon Feb 20 16:16:04 2017 - [info] master_pos_wait(mysql-bin.000001:154) completed on 10.99.73.10(10.99.73.10:3306). Executed 0 events.

Mon Feb 20 16:16:04 2017 - [info] done.

Mon Feb 20 16:16:04 2017 - [info] Getting new master's binlog name and position..

Mon Feb 20 16:16:04 2017 - [info] mysql-bin.000001:154

Mon Feb 20 16:16:04 2017 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.99.73.10', MASTER_PORT=3306, MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=154, MASTER_USER='mysql_slave', MASTER_PASSWORD='xxx';

Mon Feb 20 16:16:04 2017 - [info] Executing master ip online change script to allow write on the new master:

Mon Feb 20 16:16:04 2017 - [info] /usr/local/bin/master_ip_online_change --command=start --orig_master_host=10.99.73.9 --orig_master_ip=10.99.73.9 --orig_master_port=3306

--orig_master_user='mha' --orig_master_password='123456' --new_master_host=10.99.73.10 --new_master_ip=10.99.73.10 --new_master_port=3306 --new_master_user='mha' --new_master_password='123456'

--orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave

Unknown option: orig_master_password

Unknown option: new_master_password

Unknown option: orig_master_ssh_user

Unknown option: new_master_ssh_user

Unknown option: orig_master_is_new_slave

Mon Feb 20 16:16:04 2017 358995 Set read_only=0 on the new master.

Enabling the VIP - 10.99.73.100/24 on the new master - 10.99.73.10

Mon Feb 20 16:16:04 2017 - [info] ok.

Mon Feb 20 16:16:04 2017 - [info]

Mon Feb 20 16:16:04 2017 - [info] * Switching slaves in parallel..

Mon Feb 20 16:16:04 2017 - [info]

Mon Feb 20 16:16:04 2017 - [info] -- Slave switch on host 10.99.73.11(10.99.73.11:3306) started, pid: 10883

Mon Feb 20 16:16:04 2017 - [info]

Mon Feb 20 16:16:04 2017 - [info] Log messages from 10.99.73.11 ...

Mon Feb 20 16:16:04 2017 - [info]

Mon Feb 20 16:16:04 2017 - [info] Waiting to execute all relay logs on 10.99.73.11(10.99.73.11:3306)..

Mon Feb 20 16:16:04 2017 - [info] master_pos_wait(mysql-bin.000001:154) completed on 10.99.73.11(10.99.73.11:3306). Executed 0 events.

Mon Feb 20 16:16:04 2017 - [info] done.

Mon Feb 20 16:16:04 2017 - [info] Resetting slave 10.99.73.11(10.99.73.11:3306) and starting replication from the new master 10.99.73.10(10.99.73.10:3306)..

Mon Feb 20 16:16:04 2017 - [info] Executed CHANGE MASTER.

Mon Feb 20 16:16:04 2017 - [info] Slave started.

Mon Feb 20 16:16:04 2017 - [info] End of log messages from 10.99.73.11 ...

Mon Feb 20 16:16:04 2017 - [info]

Mon Feb 20 16:16:04 2017 - [info] -- Slave switch on host 10.99.73.11(10.99.73.11:3306) succeeded.

Mon Feb 20 16:16:04 2017 - [info] Unlocking all tables on the orig master:

Mon Feb 20 16:16:04 2017 - [info] Executing UNLOCK TABLES..

Mon Feb 20 16:16:04 2017 - [info] ok.

Mon Feb 20 16:16:04 2017 - [info] Starting orig master as a new slave..

Mon Feb 20 16:16:04 2017 - [info] Resetting slave 10.99.73.9(10.99.73.9:3306) and starting replication from the new master 10.99.73.10(10.99.73.10:3306)..

Mon Feb 20 16:16:04 2017 - [info] Executed CHANGE MASTER.

Mon Feb 20 16:16:04 2017 - [info] Slave started.

Mon Feb 20 16:16:04 2017 - [info] All new slave servers switched successfully.

Mon Feb 20 16:16:04 2017 - [info]

Mon Feb 20 16:16:04 2017 - [info] * Phase 5: New master cleanup phase..

Mon Feb 20 16:16:04 2017 - [info]

Mon Feb 20 16:16:04 2017 - [info] 10.99.73.10: Resetting slave info succeeded.

Mon Feb 20 16:16:04 2017 - [info] Switching master to 10.99.73.10(10.99.73.10:3306) completed successfully.

整个切换过程基本与我们上面说的步骤一致，这里面用到了一个master_pos_wait函数，语法：select master_pos_wait(file, pos[, timeout])。一般在主库执行，作为主从切换的判断依据，判断一下返回值>=0，则认为主从同步完成。这里的file和pos对应主库show master status得到的值，代表执行位置。函数逻辑是等待当前从库达到这个位置后返回, 返回期间执行的事务个数。参数timeout可选，若缺省则无限等待，timeout<=0时与缺省的逻辑相同。若为正数，则等待这么多秒，超时函数返回-1。其他返回值，若当前slave为启动或在等待期间被终止，返回NULL；若指定的值已经在之前达到，返回0。

查看10.99.73.10主机切换为master了。

mysql> show slave hosts;
+-----------+-------------+------+-----------+--------------------------------------+
| Server_id | Host        | Port | Master_id | Slave_UUID                           |
+-----------+-------------+------+-----------+--------------------------------------+
|    103309 | 10.99.73.11 | 3306 |      2000 | c9c57719-edbf-11e6-8f8c-fa163ece854a |
|        10 |             | 3306 |      2000 | 0f7c2151-edc0-11e6-ab9e-fa163e2ebbf0 |
+-----------+-------------+------+-----------+--------------------------------------+
2 rows in set (0.00 sec)

mysql> select count(*) from sbtest.sbtest;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.20 sec)

mysql> show slave hosts;

+-----------+-------------+------+-----------+--------------------------------------+

+-----------+-------------+------+-----------+--------------------------------------+

| 103309 | 10.99.73.11 | 3306 | 2000 | c9c57719-edbf-11e6-8f8c-fa163ece854a |

| 10 | | 3306 | 2000 | 0f7c2151-edc0-11e6-ab9e-fa163e2ebbf0 |

+-----------+-------------+------+-----------+--------------------------------------+

2 rows in set (0.00 sec)

mysql> select count(*) from sbtest.sbtest;

+----------+

| count(*) |

+----------+

| 1000000 |

+----------+

1 row in set (0.20 sec)

MHA online切换主从状态必须正常，切换后old maser会变成slave，且为read_only模式。

五、二次检测测试

MHA官方提供了一个二次检测脚本，防止误切换。也就是说当MHA主机探测不到Master时，它会调用secondary_check_script脚本进行二次检测Master，如果都探测不到，才进行故障切换。

我的测试方法如下，利用iptables。先拒绝掉MHA访问Master的3306端口，然后看会不会发生切换。

$ iptables -A INPUT -s 10.99.73.7/32 -p tcp --dport 3306 -j REJECT
$ iptables -A INPUT -s 10.99.73.7/32 -p tcp --dport 22 -j REJECT
$ iptables -A INPUT -s 10.99.73.10/32 -p tcp --dport 22 -j REJECT
$ iptables -A INPUT -s 10.99.73.11/32 -p tcp --dport 22 -j REJECT

$ iptables -A INPUT -s 10.99.73.7/32 -p tcp --dport 3306 -j REJECT

$ iptables -A INPUT -s 10.99.73.7/32 -p tcp --dport 22 -j REJECT

$ iptables -A INPUT -s 10.99.73.10/32 -p tcp --dport 22 -j REJECT

$ iptables -A INPUT -s 10.99.73.11/32 -p tcp --dport 22 -j REJECT

验证结果并没有发送切换，因为此时MHA会调secondary_check_script脚本进行二次检测Master，发现Master处于存活状态。

接着屏蔽二次检测指定的主机，如下：

$ iptables -A INPUT -s 10.99.73.10/32 -p tcp --dport 3306 -j REJECT
$ iptables -A INPUT -s 10.99.73.11/32 -p tcp --dport 3306 -j REJECT

1 2	$ iptables -A INPUT -s 10.99.73.10/32 -p tcp --dport 3306 -j REJECT $ iptables -A INPUT -s 10.99.73.11/32 -p tcp --dport 3306 -j REJECT

验证结果，此时真正触发了MHA故障切换，因为满足条件，MHA确定Master已经故障。

六、修复宕机的Master

通常情况下自动切换以后，原master可能已经废弃掉，待原master主机修复后，如果数据完整的情况下，可能想把原来master重新作为新主库的slave，这时我们可以借助当时自动切换时刻的MHA日志来完成对原master的修复。下面是提取相关日志的命令：

[root@mysql02 app1]# grep -i "All other slaves should start" manager.log
Mon Apr 21 22:28:33 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.0.60', MASTER_PORT=3306,
MASTER_LOG_FILE='mysql-bin.000022', MASTER_LOG_POS=506716, MASTER_USER='repl', MASTER_PASSWORD='xxx';

[root@mysql02 app1]# grep -i "All other slaves should start" manager.log

Mon Apr 21 22:28:33 2016 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='192.168.0.60', MASTER_PORT=3306,