This may look familiar, since I've been learning about this stuff for a few days, but this is a new question. I have a setup like this:
Every node is a master, and I have them all replicating to each other through ssh tunnels, and everything is grand - until I test one of the scenarios we will have to deal with: loss of connectivity on a node.
Steps to reproduce...
- Sever connectivity between hub (cloud) and one of the local servers.
- Update a row on the hub server.
- Update the same row on the local server (non-PK column, same PK record).
- Restore connectivity.
Result: The hub, and all of the other local servers, end up with the updated value from the local server that was disconnected. The local server where the update was made ends up with the value set on the hub server.
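To make the outcome concrete, here is a small Python simulation of what seems to be happening. It is not MariaDB code: the `Node` class, the server names, and the values are made up for illustration, and it assumes that each server simply applies replicated updates in the order they arrive, with no conflict check, and that the hub forwards updates to the other local servers.

```python
class Node:
    """Toy stand-in for one server: holds a single row's column value."""

    def __init__(self, name, value):
        self.name = name
        self.value = value
        self.outbox = []  # updates made locally, waiting to be replicated

    def local_update(self, value):
        # A client writes directly to this server.
        self.value = value
        self.outbox.append(value)

    def apply_replicated(self, value):
        # A replicated update overwrites whatever is there (assumption:
        # no conflict detection, last applied write wins on each server).
        self.value = value


hub = Node("hub", "original")
local_a = Node("local_a", "original")  # the disconnected server
local_b = Node("local_b", "original")  # stays connected to the hub

# The link between hub and local_a is down.
hub.local_update("from_hub")
local_b.apply_replicated("from_hub")   # hub -> local_b still works
local_a.local_update("from_local_a")

# Link restored: each side ships its queued update to the other.
for v in local_a.outbox:
    hub.apply_replicated(v)
    local_b.apply_replicated(v)        # forwarded through the hub
for v in hub.outbox:
    local_a.apply_replicated(v)

print(hub.value, local_b.value, local_a.value)
# from_local_a from_local_a from_hub
```

Under those assumptions every server applies its own write first and the other side's write last, so the two sides swap values instead of converging, which matches the result described above.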
From the docs on Global Transaction ID:
However, if user updates are done independently on multiple servers at the same time, then in general it is not possible for binlog order to be identical across all servers. This can happen when using multi-source replication, with multi-master ring topologies, or just if manual updates are done on a slave that is replicating from active master. If the binlog order is different on the new master from the order on the old master, then it is not sufficient for the slave to keep track of a single GTID to completely record the current state.
The domain ID, the first component of the GTID, is used to handle this.
I do have unique domain ids set on each of my servers. Then I found this paragraph on the same page which contains the following:
Using master_use_gtid=current_pos is probably easiest, as there is then no need to consider whether a server was a master or a slave prior to using CHANGE MASTER. But care must be taken not to inject extra transactions into the binlog on the slave server that are not intended to be replicated to other servers. If such an extra transaction is the most recent when the slave starts, it will be used as the starting point of replication. This will probably fail because that transaction is not present on the master. To avoid local changes on a slave server to go into the binlog, set @@sql_log_bin to 0.
If it is undesirable that changes to the binlog on the slave affects the GTID replication position, then master_use_gtid=slave_pos should be used. Then the slave will always connect to the master at the position of the last replicated GTID. This may avoid some surprises for users that expect behaviour consistent with traditional replication, where the replication position is never changed by local changes done on a server.
So, I tried setting the master_use_gtid to slave_pos, but I am still getting the same behavior. Previously it had been set to current_pos. I'm kind of at a dead end now.
I am not quite sure what I expected as far as which update would win, but I do know I expected the servers to end up with the same data after reconnecting and syncing.
The question is: How can I get all nodes to always end up with the same data, and is there a way to decide which value wins?