I'm trying to figure out how to troubleshoot my redis master / slave replication. It has "just stopped" working.
Setup Information
Let's say my master's IP address is 10.1.2.3
Here's what I've checked so far:
I've restarted Redis on both the master and the slave, but whenever I run INFO REPLICATION on the slave it shows the link as "down".
Ran netstat -lnp on both the master and slave. Here's the output from the master:
masterdb:~# netstat -lnp | grep 6379
tcp 0 0 127.0.0.1:6379 0.0.0.0:* LISTEN 21611/redis-server
tcp 0 0 10.1.2.3:6379 0.0.0.0:* LISTEN 21611/redis-server
And from the slave machine:
slavedb:~# netstat -lnp | grep 6379
tcp 0 0 0.0.0.0:6379 0.0.0.0:* LISTEN 5577/redis-server
tcp 0 0 :::6379 :::* LISTEN 5577/redis-server
slavedb:~#
I've checked the logs on both the master and the slave and don't see any error messages, but I do see timeout messages on the slave. I think I've seen those before, even when replication was working. The slave's log looks like this:
5577:S 26 Oct 13:17:19.510 * MASTER <-> SLAVE sync started
5577:S 26 Oct 13:18:20.597 # Timeout connecting to the MASTER...
5577:S 26 Oct 13:18:20.597 * Connecting to MASTER 10.1.2.3:6379
5577:S 26 Oct 13:18:20.597 * MASTER <-> SLAVE sync started
5577:S 26 Oct 13:19:21.685 # Timeout connecting to the MASTER...
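Since the log reports a timeout while connecting, one thing worth ruling out is plain TCP reachability from the slave to the master's Redis port. Below is a minimal sketch of such a check, assuming Python 3 is available on the slave; the helper name `can_connect` and the 5-second default timeout are my own choices for illustration, not anything from Redis.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and unreachable networks.
        return False
```

Running `can_connect("10.1.2.3", 6379)` on the slave would only show whether the port accepts connections from that machine. A `False` result points toward a firewall or routing problem rather than Redis itself, but it does not prove it.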
When I start redis-cli on the slave and re-issue the slaveof command, I get this message:
127.0.0.1:6379> slaveof 10.1.2.3 6379
OK Already connected to specified master
127.0.0.1:6379>
I also tried the following commands on the master:
127.0.0.1:6379> save
OK
127.0.0.1:6379> bgsave
Background saving started
127.0.0.1:6379>
But that didn't resolve anything on the slave. It still says the link is down when I run INFO REPLICATION:
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.1.2.3
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:1
master_link_down_since_seconds:1477488462
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
127.0.0.1:6379>
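To keep an eye on the link status without reading the whole block each time, the INFO reply can be split into key/value pairs. This is only a sketch, assuming the input is the raw reply text (section headers starting with `#`, then `key:value` lines) with no redis-cli prompt lines mixed in; `parse_info` is a hypothetical helper name.

```python
def parse_info(text: str) -> dict:
    """Parse the text of a Redis INFO reply into a dict of string fields."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, "# Section" headers, and anything without a separator.
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons themselves.
        key, value = line.split(":", 1)
        fields[key] = value
    return fields
```

With the output above, `parse_info(reply)["master_link_status"]` would be `"down"`, which makes it easy to poll the slave from a script while trying different fixes.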
I'm not sure what else to check.