I'm trying to write a bash script to monitor simple master-slave replication with MySQL 5.6.33 and send email and log messages when anything goes wrong.
I want to test scenarios where replication fails and see if the script catches them all.
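For context, the kind of check such a script performs can be sketched roughly as below. This is only a minimal sketch under assumptions, not the actual script: the function name `replication_problem`, the 60-second lag threshold, and the address in the usage comment are made up for illustration. It only inspects the text of SHOW SLAVE STATUS \G passed on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `SHOW SLAVE STATUS \G` text on stdin.
# Prints a description of the first problem found and returns 1,
# or prints nothing and returns 0 when replication looks healthy.
replication_problem() {
    local max_lag="${1:-60}"   # assumed lag threshold in seconds
    local status io sql lag
    status="$(cat)"

    # Print the value of one "Field: value" line (leading spaces ignored).
    field() {
        printf '%s\n' "$status" | awk -F': ' -v k="$1" \
            '{ name = $1; gsub(/^[ \t]+/, "", name) } name == k { print $2; exit }'
    }

    io="$(field Slave_IO_Running)"
    sql="$(field Slave_SQL_Running)"
    lag="$(field Seconds_Behind_Master)"

    if [ -z "$io" ]; then echo "no slave status returned"; return 1; fi
    if [ "$io" != "Yes" ]; then echo "Slave_IO_Running is $io"; return 1; fi
    if [ "$sql" != "Yes" ]; then echo "Slave_SQL_Running is $sql"; return 1; fi
    case "$lag" in
        ''|*[!0-9]*) echo "Seconds_Behind_Master is ${lag:-missing}"; return 1 ;;
    esac
    if [ "$lag" -gt "$max_lag" ]; then
        echo "lag of ${lag}s exceeds ${max_lag}s"; return 1
    fi
    return 0
}

# Possible usage (placeholder address; assumes password-less mysql client login):
#   problem="$(mysql -e 'SHOW SLAVE STATUS\G' | replication_problem)" || {
#       logger -t repl-check "$problem"
#       echo "$problem" | mail -s "Replication problem" admin@example.com
#   }
```

Keeping the parsing separate from the mysql call makes it possible to feed the function canned status text when testing failure scenarios.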
I tried to simulate network issues by blocking the slave (10.0.0.3) from accessing port 3306 on the master (10.0.0.2), while still allowing the web server (10.0.0.1) to keep running the site.
On the master (10.0.0.2), I removed the ufw rule that allowed the slave to connect and reloaded the firewall:
ubuntu@db:~$ sudo ufw delete 6
ubuntu@db:~$ sudo ufw reload
ubuntu@db:~$ sudo ufw status numbered
Status: active
     To                         Action      From
     --                         ------      ----
[ 1] 22                         ALLOW IN    Anywhere
[ 2] 22/tcp                     ALLOW OUT   Anywhere (out)
[ 3] 22/udp                     ALLOW OUT   Anywhere (out)
[ 4] 22                         ALLOW IN    10.0.0.0/28
[ 5] 3306                       ALLOW IN    10.0.0.1
[ 6] 22/tcp (v6)                ALLOW OUT   Anywhere (v6) (out)
[ 7] 22/udp (v6)                ALLOW OUT   Anywhere (v6) (out)
On the slave (10.0.0.3), telnet can no longer connect to the master, whereas previously it could:
ubuntu@ip-10-0-0-3:~$ telnet 10.0.0.2 3306
Trying 10.0.0.2...
^C
ubuntu@ip-10-0-0-3:~$
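The same check can be scripted so that it does not hang until interrupted by hand. The following is a minimal sketch, assuming bash with /dev/tcp support and the coreutils `timeout` command are available; `port_open` is a hypothetical helper name, not part of any tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a new TCP connection to host:port can be
# opened within the given number of seconds (default 3), fail otherwise.
port_open() {
    local host="$1" port="$2" secs="${3:-3}"
    # bash's /dev/tcp pseudo-device opens a TCP connection; `timeout` ends
    # the attempt if the firewall silently drops the packets.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Possible usage on the slave:
#   if port_open 10.0.0.2 3306; then echo "master reachable"; else echo "master blocked"; fi
```

Note that this, like telnet, only tests whether a new connection can be opened.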
But replication still works.
Here's a sample of the grepped output of SHOW SLAVE STATUS, run in a loop with a 1-second delay:
20170309_154122
Slave_IO_State: Waiting for master to send event
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
20170309_154123
Slave_IO_State: Waiting for master to send event
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
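A loop producing output in that shape can be sketched as follows. This is a minimal sketch rather than the exact commands used: the function names are hypothetical, and it assumes the mysql client can log in without prompting (for example via ~/.my.cnf).

```shell
#!/usr/bin/env bash
# Keep only the fields of interest from `SHOW SLAVE STATUS \G` text on stdin
# and strip the leading padding that the \G format adds.
filter_slave_status() {
    grep -E '^ *(Slave_IO_State|Slave_IO_Running|Slave_SQL_Running|Seconds_Behind_Master|Slave_SQL_Running_State):' \
        | sed 's/^ *//'
}

# Print a timestamp followed by the filtered status, once per second.
# Runs until interrupted with Ctrl-C.
watch_slave_status() {
    while true; do
        date +%Y%m%d_%H%M%S
        mysql -e 'SHOW SLAVE STATUS\G' | filter_slave_status
        sleep 1
    done
}

# Possible usage:
#   watch_slave_status | tee -a replication_watch.log
```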
I also checked with
master> show master status;
and
slave> show slave status \G
and the log positions match continuously.
EDIT: Ufw's default is to deny incoming requests:
ubuntu@db:~$ sudo ufw status verbose
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), disabled (routed)
What mistake am I making?
I first tried blocking master's IP in the slave's firewall. That did not stop replication.
I figured that was the wrong approach, because the slave is the one that connects to and reads from the master, so I would have to block the slave's connection attempts in the master's firewall instead.
But replication keeps working with that rule removed, too.
Given that I cannot telnet from the slave to the master, how is this working? I verified the telnet result by enabling and disabling the firewall and retrying telnet each time. Is telnet an unreliable tool for testing replication connectivity?
Does MySQL employ some "push" functionality from the master's side, because it knows the slave's details?
Any help is greatly appreciated.
EDIT2: I think I found the answer to the mystery but it exposes a security issue.
I ran netstat -plantu on both master and slave, and here are the relevant outputs:
Master:
ubuntu@db:~$ sudo netstat -plantu
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
...
tcp6       0      0 :::3306                 :::*                    LISTEN      1319/mysqld
...
tcp6       0      0 10.0.0.2:3306           10.0.0.3:57128          ESTABLISHED 1319/mysqld
Slave:
ubuntu@ip-10-0-0-3:~$ sudo netstat -plantu
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
...
tcp        0      0 10.0.0.3:57128          10.0.0.2:3306           ESTABLISHED 3576/mysqld
...
tcp6       0      0 :::3306                 :::*                    LISTEN      3576/mysqld
This would mean that replication happens over "tcp6" even though plain telnet does not work.
So now, how do I block IPv6 connections, if not through ufw?