We have recently upgraded to mysql5.6.25 from mysql5.5.x/mysql5.1.x on our mysql-cluster. Below is a brief snapshot of our architecture.
Since we have upgraded and enabled gtid-mode we have intermittently getting slave errors similar to :
Last_SQL_Error: Error 'When @@SESSION.GTID_NEXT is set to a GTID, you must explicitly set it to a different value after a COMMIT or ROLLBACK. Please check GTID_NEXT variable manual page for detailed explanation. Current @@SESSION.GTID_NEXT is 'd7e8990d-3a9e-11e5-8bc7-22000aa63d47:1466'.' on query. Default database: 'adplatform'. Query: 'create table X_new like X'
Our observations are as below..
- These slave errors are resolved simply by restarting the slave.
- Such errors are always with Create/Drop of tables which have Memory Storage Engine.
- Errors on Complete-Slave(B) show up continuously at a fixed minute of the hour and have been repeating since we have upgraded, almost a week.
- Errors on Complete-Slave as well as Partial slave are observed whenever its master is restarted.
- Cluster-1 and Cluster-2 have centos machines and Cluster-3 have ubuntu-machines. Slaves on centos machines also fail with same error whenever its master(C/D) is restarted, but ubuntu machines do not fail!!.
We have temporarily been able to live with issue by setting up an action-script on our monitoring system which fires on slave error alert on any machine.
A look into gtid_next section in replication-options doc of mysql tells following
Prior to MySQL 5.6.20, when GTIDs were enabled but gtid_next was not AUTOMATIC, DROP TABLE did not work correctly when used on a combination of nontemporary tables with temporary tables, or of temporary tables using transactional storage engines with temporary tables using nontransactional storage engines. In MySQL 5.6.20 and later, DROP TABLE or DROP TEMPORARY TABLE fails with an explicit error when used with either of these combinations of tables. (Bug #17620053)
This seems related to my issue but still doesn't not explain my scenario. Any hints/direction to solve the issue would be greatly appreciated...
EDIT : I managed to find a similar recently reported bug in mysql(#77729), description of which is as follows :
https://bugs.mysql.com/bug.php?id=77729
When you have table with Engine MEMORY working on replication master, mysqld injects "DELETE" statement in binary logs on first access query to this table. This insures consistency of data on replicating slaves.
If replication is GTID ROW based, this inserted "DELETE" breaks replication. Logged event is in STATEMENT format and do not generate correct SET GTID_NEXT statements in binary log.
Unfortunately, the status of this bug is marked as Can't Repeat
...