We're moving to a different cloud provider so need to migrate our database with as little downtime as possible.
Here's our current setup:
master
└───slave1
└───slave2
└───slaveBackup
A dump of slaveBackup
was imported into our new master database, newMaster
. slaveBackup
logs changes to its bin logs using log-slave-updates
and binlog_do_db
to only log the correct database which is identically named throughout.
To reduce downtime we now want to add newMaster
as a slave to slaveBackup
, thus creating the following chain while we migrate:
master
└───slave1
└───slave2
└───slaveBackup
└───newMaster
└───newSlave1
└───newSlave2
newMaster
is replicated as intended to newSlave1
and newSlave2
, all are identical.
All servers have unique server IDs.
We recorded the bin log file and position on slaveBackup
made immediately after the dump and used this to add newMaster
as a slave of slaveBackup
:
CHANGE MASTER TO
MASTER_HOST = 'slaveBackup',
MASTER_USER = 'slave',
MASTER_PASSWORD = '',
MASTER_LOG_FILE = 'mariadb-bin.00000X',
MASTER_LOG_POS = X;
Starting the slave worked perfectly and in time using SHOW SLAVE STATUS\G
we could see it catch up to its master.
Here's where it gets weird. Despite the slave newMaster
crunching through the bin logs, no rows are being inserted into the new database. What's weirder during this time Seconds_Behind_Master
was always 0.
The logs were definitely being processed as the slave user is different on newMaster
('slave'@'X.X.X.X') vs slaveBackup
('slave'@'Y.Y.Y.Y') and a moderation made after the dump of this user caused newMaster
to error and required SET GLOBAL SQL_SLAVE_SKIP_COUNTER = 1;
to recover.
Now the slave is fully caught up with master in terms of bin log file and position yet not data has been inserted into its database.
Why is this?
The bin logs on slaveBackup
definitely have the correct USE DATABASENAME;
statements before each insert/update, hence why I'm completely stumped.
To clarify the replication chain:
master -> slaveBackup -> newMaster -> newSlave1/2