Quantcast
Channel: StackExchange Replication Questions
Viewing all articles
Browse latest Browse all 17268

Master and slave tables with different character sets - will replication break?

$
0
0

We have some huge tables in our MySQL 5.6 database that have latin-1 as their character set, and want to convert them all to UTF-8 (along with converting the database's global character set). The necessary ALTER TABLE statements take, between them, about an hour to run. We'd like to minimise our downtime.

Our plan is to set up replication, make the encoding changes on the slave database, let the databases sync up, and then briefly take down our webserver and point it at the slave database.

Will this work, or is there a risk of it causing data corruption in the slave?

My concern about this stems from there being two obvious possible approaches for how string values (that are being INSERTed or UPDATEed) might be stored in MySQL's binlog and sent across to the slave. Either...

  1. The values are stored as unicode character sequences (i.e. the binlog knows what encoding they are encoded in sends that information to the slave so that it can convert the values to match the character set of the target column on the slave.)

    or...

  2. The encoded values are just stored as byte sequences, with no record of what the character set was.

The documentation doesn't shed any light on which of the above models MySQL uses.

If #1 is true, then our plan will work fine.

However, if #2 is true, then trying to replicate queries from the master to the slave is going to go wrong in some way if one of the columns involved has a different character set. For instance, once we've converted our tables from latin-1 to UTF-8 on the slave, the slave might try to store latin-1-encoded strings passed across from the master in a UTF-8-encoded column, resulting in either the stored text not being validly encoded in UTF-8 or representing the wrong characters.

The MySQL docs on Replication and Character Sets shed little light on the problem; they discuss potential issues resulting from differences in the global character set of each server, but don't touch on potential problems caused by differences in table or column character sets at all.

Is our plan sound, or are we at risk of ending up with corrupt data on the slave?


Viewing all articles
Browse latest Browse all 17268

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>