Quantcast
Channel: StackExchange Replication Questions
Viewing all articles
Browse latest Browse all 17268

redis master slave replication - missing data on the slave

$
0
0

Problem

I have a situation where data that I created on the master doesn't seem to have been replicated to my slaves properly.

Master Redis DB Setup Info

I have a master running on 10.1.1.1. The configuration is set to "SAVE" to disk. Here's a snippet from the configuration file:

save 900 1
save 300 10
save 60 10000

When I run a scan command against the hash in question, here are the results (which are correct):

127.0.0.1:6379> scan 0 match dep:*
1) "13"
2) 1) "dep:+19999999999_00:00_00:00"
   2) "dep:+19999999999_19:00_25:00"
   3) "dep:+19999999999_08:00_12:00"
127.0.0.1:6379> 

Slave 1 Setup

Slave 1 has been set up to run in memory only. So in the configuration file, all the "save" options have been commented out.

Here's the data I have in slave 1: (missing a record)

127.0.0.1:6379> scan 0 match dep:*
1) "15"
2) 1) "dep:+19999999999_00:00_00:00"
   2) "dep:+19999999999_19:00_25:00"
127.0.0.1:6379> 

When I run the "info" command on this slave, this is what I get back: (only picked specific items that I thought might pertain to this problem)

# Replication
role:slave
master_host:10.1.1.1
master_port:6379
master_link_status:up
master_last_io_seconds_ago:5
master_sync_in_progress:0
slave_repl_offset:346292
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

#Stats
expired_keys:0

#Persistence
aof_enabled:0

Slave 2 Setup

Slave 2 is also supposed to be an in memory data store only. So all the save options in the config file have also been commented out like so:

#save 900 1
#save 300 10
#save 60 10000

This is the data I have on slave 2 (notice that it's missing data, but different records from slave 1)

127.0.0.1:6379> scan 0 match dep:*
1) "3"
2) 1) "dep:+19999999999_00:00_00:00"
   2) "dep:+19999999999_08:00_12:00"
127.0.0.1:6379> 

Some of the results from the info command:

# Replication
role:slave
master_host:10.1.1.1
master_port:6379
master_link_status:up
master_last_io_seconds_ago:3
master_sync_in_progress:0
slave_repl_offset:346754
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

#Stats
expired_keys:0

#Persistence
aof_enabled:0

This is my first crack at using REDIS, so I'm sure it's something simple that I've missed. I haven't tried to restart REDIS on the slaves just yet, because I don't want to lose any artifacts that might help me troubleshoot / understand how I got myself here in the first place.

Any suggestions would be appreciated.

EDIT 1

In checking the logs on slave 2, this is what I found:

4651:S 27 Sep 18:39:27.197 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
4651:S 27 Sep 18:39:27.197 # Server started, Redis version 3.0.5
4651:S 27 Sep 18:39:27.197 * The server is now ready to accept connections on port 6379
4651:S 27 Sep 18:39:27.198 * Connecting to MASTER 10.1.1.1:6379
4651:S 27 Sep 18:39:27.198 * MASTER <-> SLAVE sync started
4651:S 27 Sep 18:40:28.284 # Timeout connecting to the MASTER...
4651:S 27 Sep 18:40:28.284 * Connecting to MASTER 10.1.1.1:6379
4651:S 27 Sep 18:40:28.284 * MASTER <-> SLAVE sync started
4651:S 27 Sep 18:41:29.369 # Timeout connecting to the MASTER...
4651:S 27 Sep 18:41:29.369 * Connecting to MASTER 10.1.1.1:6379
4651:S 27 Sep 18:41:29.369 * MASTER <-> SLAVE sync started
4651:S 27 Sep 18:42:00.452 * Non blocking connect for SYNC fired the event.
4651:S 27 Sep 18:42:00.453 * Master replied to PING, replication can continue...
4651:S 27 Sep 18:42:00.453 * Partial resynchronization not possible (no cached master)
4651:S 27 Sep 18:42:00.463 * Full resync from master: b46c3622e4ef4c5586ebd2ec23eabcb04c3fcf32:1
4651:S 27 Sep 18:42:00.592 * MASTER <-> SLAVE sync: receiving 173 bytes from master
4651:S 27 Sep 18:42:00.592 * MASTER <-> SLAVE sync: Flushing old data
4651:S 27 Sep 18:42:00.592 * MASTER <-> SLAVE sync: Loading DB in memory
4651:S 27 Sep 18:42:00.592 * MASTER <-> SLAVE sync: Finished with success

How do redis slaves recover when there is a time out connecting to the master? I'm also wondering what this error means "Partial resynchronization not possible (no cached master)".

Currently googling... But if you have any comments, please feel free

EDIT 2

Here's another really interesting find (at least for me). I just added a new item the master, like so:

127.0.0.1:6379> HMSET dep:+19999999999_15:00_18:45:00 ext 2222 dd me.net days "fri"
OK
127.0.0.1:6379> scan 0 match dep:*
1) "13"
2) 1) "dep:+19999999999_00:00_00:00"
   2) "dep:+19999999999_19:00_25:00"
   3) "dep:+19999999999_15:00_18:45:00"
   4) "dep:+19999999999_08:00_12:00"
127.0.0.1:6379> 

And now, when i check slave one again, it still only has 2 records, but its dropped a record that used to be there, and replaced it with the new i just added:

127.0.0.1:6379> scan 0 match dep:*
1) "7"
2) 1) "dep:+19999999999_00:00_00:00"
   2) "dep:+19999999999_15:00_18:45:00"
127.0.0.1:6379> 

EDIT 3

From the answer below, it looks like the first number returned by the SCAN command is a position in the cursor... And in reading the documentation I can specify a count indicating the number of records to return. But this still raises some questions for me. For example, in line with the answer below, I tried the following SCAN command on a slave:

127.0.0.1:6379> scan 0 match dep:*
1) "7"
2) 1) "dep:+19999999999_00:00_00:00"
   2) "dep:+19999999999_15:00_18:45:00"
127.0.0.1:6379> scan 7 match dep:*
1) "0"
2) 1) "dep:+19999999999_19:00_25:00"
   2) "dep:+19999999999_08:00_12:00"
127.0.0.1:6379> 

This makes sense to me... it seems to be returning 2 records at a time ( still need to figure out how I can change this default)

According to this post - Redis scan count: How to force SCAN to return all keys matching a pattern? - , I can use the "count" keyword to indicate how many records to return.

But in order to get all 4 of the records I have, I had to run several queries before the cursor value came back as a zero... and i dont' know why. For example:

127.0.0.1:6379> scan 0 match dep:* count 3
1) "10"
2) 1) "dep:+19999999999_00:00_00:00"
127.0.0.1:6379> scan 10 match dep:* count 3
1) "3"
2) (empty list or set)
127.0.0.1:6379> scan 3 match dep:* count 3
1) "7"
2) 1) "dep:+19999999999_15:00_18:45:00"
127.0.0.1:6379> scan 7 match dep:* count 3
1) "0"
2) 1) "dep:+19999999999_19:00_25:00"
   2) "dep:+19999999999_08:00_12:00"
127.0.0.1:6379> 

Why didn't the first request return 3 records? In my mind, at most, I should have had to run this scan command 2 times. Can you explain what's happening here?

Also, maybe I shouldn't be using the scan command in my node js REST API? Imagine that a user will make a request for widget information... and I need to query this hash to find the key. It feels like this type of iteration would be very inefficient. The KEYS command will work too, but as per the docs,I shouldn't be using that in production because it will affect performance. Any comments / insights would be appreciated.


Viewing all articles
Browse latest Browse all 17268

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>