I'm having a really hard time diagnosing and working around a race condition that is causing problems on my slave and knocking it offline due to a foreign key relationship. We're using MIXED
format replication in a basic master -> slave
relationship.
The situation is thus, we have two tables in question, a topics
table and a stats
table. The topics table is generated from a maintenance task that adds new topics as they're found if they don't already exist. The second table stats
keeps stats on when the topic trended, and how often it trended, basically keeping hourly statistics on each topic.
The race condition is taking place when a new topic is added. The stats
table has a foreign key relating to the topic
. When a new topic is added, once the INSERT
has been confirmed as successful via a boolean response, the stats class is invoked which registers the first 'trended' count.
On the master this is working as expected, the topic is added, then the first stat is registered. However on the slave, this is not the case, the replication is logging the error...
Error 'Cannot add or update a child row: a foreign key constraint
fails (`db`.`stats`, CONSTRAINT `stats_primary` FOREIGN KEY (`Topic_ID`)
REFERENCES `topics` (`Topic_ID`) ON DELETE CASCADE ON UPDATE CASCADE)'
on query.
... which basically says that the topics
table doesn't have the topic that the stats
table is trying to refer to.
I don't understand why there is a race-condition at all, the INSERT
waits for a confirmation before attempting to create/increment a stat record, why is the binlog
not keeping the statements in order? I've tried adding sanity-checks all over the place, and there are no problems on the master, any suggestions?
EDIT 1
Here is the schema for the two tables...
-- ----------------------------
-- Table structure for `history`
-- ----------------------------
DROP TABLE IF EXISTS `history`;
CREATE TABLE `history` (
`ai_col` int(10) unsigned NOT NULL AUTO_INCREMENT,
`Topic_ID` char(32) NOT NULL,
`timestamp` int(10) unsigned NOT NULL,
`trended` int(10) unsigned NOT NULL DEFAULT '0',
`viewed` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`ai_col`),
UNIQUE KEY `history_primary` (`Topic_ID`,`timestamp`) USING BTREE,
KEY `history_timestamp` (`timestamp`) USING BTREE,
KEY `history_trended` (`trended`) USING BTREE,
KEY `history_viewed` (`viewed`) USING BTREE,
CONSTRAINT `history_primary` FOREIGN KEY (`Topic_ID`) REFERENCES `topics` (`Topic_ID`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPACT;
-- ----------------------------
-- Table structure for `topics`
-- ----------------------------
DROP TABLE IF EXISTS `topics`;
CREATE TABLE `topics` (
`ai_col` int(10) unsigned NOT NULL AUTO_INCREMENT,
`Topic_ID` char(32) NOT NULL,
`terms` varchar(255) NOT NULL,
`title` varchar(100) DEFAULT NULL,
`link` varchar(255) DEFAULT NULL,
`type` enum('event','profile','timeline','general') DEFAULT NULL,
`event_date` date DEFAULT NULL,
`description` text,
`active` tinyint(1) unsigned NOT NULL DEFAULT '1',
`created` datetime NOT NULL,
`lastedit` datetime NOT NULL,
PRIMARY KEY (`ai_col`),
UNIQUE KEY `topics_topic_id` (`Topic_ID`) USING BTREE,
KEY `topics_link` (`link`) USING BTREE,
KEY `topics_title` (`title`) USING BTREE,
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8;
Here are a few sample rows, including the one in particular that caused this specific error, however keep in mind, this error is happening as soon as the log is completely caught up (0s behind) and is taking place with NEW data. Also keep in mind, these are from the dump file so they're all inserts rather than updates, note the higher 'trended' values)...
INSERT INTO `topics` VALUES ('479410', '827658485a914809b6759d28e6fd7abe', 'd40bad723b2ccc7706bd612760ccd290', '07b2f3037d844db587ebbc5df4685820', 'PC Organics', 'Trending on Twitter,February 2017', 'PC Organics', 'Pc-Organics', 'general', null, 'Stories and events relating to PC Organics', '1', '2017-02-04 11:34:47', '2017-02-04 11:34:47');
INSERT INTO `topics` VALUES ('479411', 'd1fdb23fdf4949ae86441c8ae21573e0', 'd40bad723b2ccc7706bd612760ccd290', '07b2f3037d844db587ebbc5df4685820', 'Tom Lawrence', 'Trending on Twitter,February 2017', 'Tom Lawrence', 'Tom-Lawrence', 'general', null, 'Stories and events relating to Tom Lawrence', '1', '2017-02-04 11:34:47', '2017-02-04 11:34:47');
INSERT INTO `topics` VALUES ('479412', '74180ea91a2542eebcbe596b90311360', 'd40bad723b2ccc7706bd612760ccd290', '07b2f3037d844db587ebbc5df4685820', 'Moore', 'Trending on Twitter,February 2017', 'Moore', 'Moore', 'general', null, 'Stories and events relating to Moore', '1', '2017-02-04 11:34:47', '2017-02-04 11:34:47');
INSERT INTO `history` VALUES ('3647775', '827658485a914809b6759d28e6fd7abe', '1486224000', '4', '0');
INSERT INTO `history` VALUES ('3647776', 'd1fdb23fdf4949ae86441c8ae21573e0', '1486224000', '3', '0');
INSERT INTO `history` VALUES ('3647778', '74180ea91a2542eebcbe596b90311360', '1486224000', '3', '0');
And this is the my.cnf
ansible template file with common, master, and slave components indicated ...
[client]
socket=/mnt/data/mysql.sock
[mysqld]
datadir=/mnt/data/
socket=/mnt/data/mysql.sock
user=mysql
symbolic-links=0
port=3306
bind-address=0.0.0.0
server-id = {{ mysql_db_id }}
skip-name-resolve
character-set-server=utf8
collation-server=utf8_general_ci
explicit_defaults_for_timestamp = 1
ignore-db-dir=lost+found
# Logs
slow_query_log = OFF
#slow_query_log_file=/mnt/data/mysql-slow.log
#long_query_time= 3
log-error=/mnt/data/mysqld.log
# Limits
max_allowed_packet = 256M
tmp-table-size = 32M
max-heap-table-size = 32M
query-cache-type = 0
query-cache-size = 0
max-connections = 1000
thread-cache-size = 50
open-files-limit = 65535
table-definition-cache = 4096
table-open-cache = 4096
{% if mysql_repl_role == 'master' %}
# Master
log_bin = mysql-bin
log-bin-index = mysql-bin.index
expire_logs_days = 10
max_binlog_size = 100M
max_binlog_files = 50
binlog_do_db = {{ database.name }}
binlog_format = MIXED
{% endif %}
{% if mysql_repl_role == 'slave' %}
# Slave
read_only
relay-log = relay-bin
relay-log-index = relay-bin.index
replicate-ignore-table = "db.sessions"
replicate-ignore-table = "db.cache"
{% endif %}
# INNODB
innodb-flush-method = O_DIRECT
innodb-log-files-in-group = 2
innodb-log-file-size = 256M
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table = 1
innodb-buffer-pool-size = {{ innodb_buffer_pool_size }}
# HANDLERSOCKET PLUGIN
loose_handlersocket_port = 9998
loose_handlersocket_port_wr = 9999
loose_handlersocket_threads = 8
loose_handlersocket_threads_wr = 1
[mysqldump]
quick
quote-names
max_allowed_packet = 128M
[mysqld_safe]
log-error=/mnt/data/mysqld.log
pid-file=/mnt/data/mysqld.pid
Everything starts fine, and the rest of the system is able to exist happily with the replication in place. The ONLY issue is with the two tables outlined above, and we're talking about 200+ tables with most tables having multiple FK relationships. I've gone so far as ensuring I stopped using any non-ANSI compatible / MySQL specific query functions such as ... ON DUPLICATE KEY UPDATE ...
which I know has replication issues, but haven't hit on anything yet.
EDIT 2: binlog
Here is the statement in question causing the Foreign Key error above...
# at 1403605
#170204 12:05:01 server id 1 end_log_pos 1404039 CRC32 0x8dd8f69e Query thread_id=1120 exec_time=0 error_code=0
SET TIMESTAMP=1486227901/*!*/;
BEGIN
/*!*/;
# at 1403692
# at 1403724
#170204 12:05:01 server id 1 end_log_pos 1404071 CRC32 0x116b9b3a Intvar
SET INSERT_ID=3647784/*!*/;
#170204 12:05:01 server id 1 end_log_pos 1404270 CRC32 0x45ec23e8 Query thread_id=1120 exec_time=0 error_code=0
SET TIMESTAMP=1486227901/*!*/;
INSERT INTO history SET Topic_ID = '827658485a914809b6759d28e6fd7abe', `timestamp` = 1486227600, trended = 1
/*!*/;
# at 1403923
#170204 12:05:01 server id 1 end_log_pos 1404301 CRC32 0xe396df1e Xid = 157330
COMMIT/*!*/;
The SHOW SLAVE STATUS \G
is showing the following (which points to the above)...
Relay_Log_File: relay-bin.000003
Relay_Log_Pos: 1403605
And just to recap, here is the full error listed under Last_SQL_Error
...
Last_SQL_Error: Error 'Cannot add or update a child row: a foreign key constraint fails (`db`.`history`, CONSTRAINT `history_primary` FOREIGN KEY (`Topic_ID`) REFERENCES `topics` (`Topic_ID`) ON DELETE CASCADE ON UPDATE CASCADE)' on query. Default database: 'db'. Query: 'INSERT INTO history SET Topic_ID = '827658485a914809b6759d28e6fd7abe', `timestamp` = 1486227600, trended = 1'