I've set up a MySQL "cluster" (four CentOS 7 servers running MySQL 5.7.19) using Group Replication in multi-primary mode, but I can't get it to accept writes from more than one thread at a time. This mode is recommended only for "advanced users", according to Oracle, so I guess that's the source of my troubles.
The group that I've set up works: I can write to and read from it, it stays in sync, all good. However, we have a load test (in Java, running on Tomcat) that consistently fails on the cluster when launched with more than one thread. The load test runs a variety of transactions, in as many threads as requested and as fast as it can, against a single node. What happens is that the transactions fail with java.sql.SQLException: Plugin instructed the server to rollback the current transaction.
As far as I can gather, this is what gets reported whenever the Group Replication plugin has determined, for whatever reason, that a transaction must be rolled back. This eventually kills all but one thread, which continues happily until completion. The odd thing is that the load test is designed never to create contention on any row; each thread gets its own set of rows to manipulate. Stopping the Group Replication plugin or running in single-primary mode fixes the issue, allowing me to run concurrent threads with write transactions.
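As I understand it, Group Replication decides these rollbacks through write-set certification: a transaction's write set (hashes of the primary and unique keys of the rows it changed, computed with the transaction_write_set_extraction algorithm) is checked against transactions certified after the snapshot it executed on, and any overlap means rollback. The Java below is a deliberately simplified model of that idea, not the plugin's code; CertifierModel is a made-up class, and plain integer versions stand in for GTID sets. It illustrates why threads with disjoint rows should not be rolled back while two concurrent writes to the same row should.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/**
 * Simplified model of write-set certification; NOT the Group Replication plugin's code.
 * Integer versions stand in for GTID sets, strings like "table:pk" for key hashes.
 */
final class CertifierModel {
    // key -> version of the last certified transaction that wrote it
    private final Map<String, Long> lastWriter = new HashMap<>();
    private long version = 0;

    /** The snapshot a transaction starting now would carry. */
    long snapshot() {
        return version;
    }

    /**
     * Certifies a transaction that started at snapshotVersion and wrote writeSet.
     * Returns true (commit) unless some key was written by a transaction certified
     * after that snapshot, in which case it returns false (the model's rollback).
     */
    boolean certify(long snapshotVersion, Set<String> writeSet) {
        for (String key : writeSet) {
            Long writer = lastWriter.get(key);
            if (writer != null && writer > snapshotVersion) {
                return false; // conflicting concurrent write
            }
        }
        version++;
        for (String key : writeSet) {
            lastWriter.put(key, version);
        }
        return true;
    }

    public static void main(String[] args) {
        CertifierModel c = new CertifierModel();
        long a = c.snapshot(), b = c.snapshot();          // two concurrent transactions
        System.out.println(c.certify(a, Set.of("t:1")));  // true
        System.out.println(c.certify(b, Set.of("t:2")));  // true: different row
        long x = c.snapshot(), y = c.snapshot();          // two more concurrent transactions
        System.out.println(c.certify(x, Set.of("t:3")));  // true
        System.out.println(c.certify(y, Set.of("t:3")));  // false: same row, y's snapshot predates x
    }
}
```

Running main prints true, true, true, false: the first two transactions change different rows and both commit, while the last one writes the same row as the third from an older snapshot and is rejected.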
Having only one writer at a time would be unacceptable in production, so this is a showstopper.
I've tried all the isolation levels (including READ UNCOMMITTED). I've tried running the appliers in parallel. I've read Oracle's Group Replication documentation, the requirements and limitations in particular. I've even tried reading bad translations of Chinese open-source forums... No luck.
Has anyone gotten this to work, or knows how to?
EDIT: It is possible to run more than one thread against the same server if the transactions are timed so that they take turns instead of overlapping. That is, more than one connection can execute transactions, but only one can be executing a transaction at any given point in time; otherwise one of the transactions will fail.
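A common mitigation for certification-based rollbacks is to retry the whole transaction. That would not explain why non-overlapping transactions conflict, but it would keep the load test's threads alive while investigating. RetryOnRollback below is a hypothetical helper, not part of any library, and it recognises the error by the message text quoted above; matching on message text is an assumption and more brittle than matching on an error code.

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

/** Hypothetical helper: re-runs a whole transaction when Group Replication rolls it back. */
final class RetryOnRollback {
    // Message text of the exception reported in the question (assumed stable).
    static final String GR_ROLLBACK_MESSAGE =
            "Plugin instructed the server to rollback the current transaction";

    static boolean isGroupReplicationRollback(SQLException e) {
        String msg = e.getMessage();
        return msg != null && msg.contains(GR_ROLLBACK_MESSAGE);
    }

    /**
     * Calls work up to maxAttempts times. Only the Group Replication rollback is retried;
     * any other exception, or the last failed attempt, is rethrown.
     * work must begin and commit a complete transaction on every call.
     */
    static <T> T runWithRetry(Callable<T> work, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return work.call();
            } catch (SQLException e) {
                if (!isGroupReplicationRollback(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(10L * attempt); // short linear backoff; values are arbitrary
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        String result = runWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new SQLException(GR_ROLLBACK_MESSAGE + ".");
            }
            return "committed";
        }, 5);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Running main prints "committed after 3 attempts". Each call to work has to open and commit a complete transaction, since the rolled-back attempt leaves nothing to resume.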
EDIT: Clarifying based on kind input from Matt Lord:
"Perhaps the writes being executed by your benchmark/load test are against a table with cascading FKs?" No, the output from grep --perl-regexp "ON DELETE CASCADE|ON UPDATE CASCADE|ON DELETE SET NULL|ON UPDATE SET NULL|ON DELETE SET DEFAULT|ON UPDATE SET DEFAULT" mysqldump_gr.sql -ni
(where mysqldump_gr.sql is the result of mysqldump -u root -pvisa --triggers --routines --events --all-databases > mysqldump_gr.sql
) results in one huge text insert into mysql.help_topic.
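For anyone without grep at hand, the same check can be done in Java. This sketch (CascadingFkScan is a made-up name) applies the same referential actions case-insensitively and reports 1-based line numbers like grep -n. It tolerates any whitespace between keywords and, assuming mysqldump's usual layout of one INSERT statement per line, skips lines starting with INSERT INTO so that matches inside row data, such as the mysql.help_topic text, are not reported.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Java take on the grep: find CASCADE / SET NULL / SET DEFAULT FK actions in a dump. */
final class CascadingFkScan {
    // Same alternatives as the grep pattern, case-insensitive like grep -i,
    // but allowing any run of whitespace between the keywords.
    private static final Pattern REFERENTIAL_ACTION = Pattern.compile(
            "ON\\s+(DELETE|UPDATE)\\s+(CASCADE|SET\\s+NULL|SET\\s+DEFAULT)",
            Pattern.CASE_INSENSITIVE);

    /** Returns 1-based numbers of matching lines (like grep -n), skipping INSERT data lines. */
    static List<Integer> matchingLines(Reader input) throws IOException {
        List<Integer> hits = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            int lineNo = 0;
            while ((line = reader.readLine()) != null) {
                lineNo++;
                if (line.startsWith("INSERT INTO")) {
                    continue; // row data, not a table definition
                }
                if (REFERENTIAL_ACTION.matcher(line).find()) {
                    hits.add(lineNo);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "mysqldump_gr.sql";
        // ISO-8859-1 maps every byte, so odd bytes in the data cannot break decoding;
        // the keywords searched for are plain ASCII either way.
        Reader in = Files.newBufferedReader(Paths.get(path), StandardCharsets.ISO_8859_1);
        for (int lineNo : matchingLines(in)) { // matchingLines closes the reader
            System.out.println(lineNo);
        }
    }
}
```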
"[Can you give me a] MySQL error log snippet covering the relevant time period from the node(s) you're executing writes against[?]" As weird as it sounds, this varies. Either there is no output to the error log during the test or there are lines like this one: [Note] Aborted connection 1077 to db: 'mydb' user: 'user' host: 'whereISendTransactionsFrom' (Got an error reading communication packets)
. I didn't write about this error message because I thought it was just a one-off the first time we tested and none of the google results had anything to do with GR, but now I did another test and here it is again...
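To line these notes up with the failing transactions, the relevant fields can be pulled out of the log. The sketch below assumes the note always has the shape quoted above (any timestamp or thread-id prefix in front of it is ignored); AbortedConnectionParser and its Entry class are hypothetical names. Counting entries per host over the test window is one way to see whether they coincide with the threads that die.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the "Aborted connection" notes seen in the error log. */
final class AbortedConnectionParser {
    // find() is used, so any timestamp/thread-id prefix before the note is ignored.
    private static final Pattern ABORTED = Pattern.compile(
            "Aborted connection (\\d+) to db: '([^']*)' user: '([^']*)' host: '([^']*)' \\((.*)\\)");

    static final class Entry {
        final long connectionId;
        final String db;
        final String user;
        final String host;
        final String reason;

        Entry(long connectionId, String db, String user, String host, String reason) {
            this.connectionId = connectionId;
            this.db = db;
            this.user = user;
            this.host = host;
            this.reason = reason;
        }

        @Override
        public String toString() {
            return "connection " + connectionId + " from " + host + " (" + reason + ")";
        }
    }

    /** Returns the parsed note, or empty if the line is not an "Aborted connection" note. */
    static Optional<Entry> parse(String line) {
        Matcher m = ABORTED.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Entry(Long.parseLong(m.group(1)),
                m.group(2), m.group(3), m.group(4), m.group(5)));
    }

    public static void main(String[] args) {
        String line = "[Note] Aborted connection 1077 to db: 'mydb' user: 'user' "
                + "host: 'whereISendTransactionsFrom' (Got an error reading communication packets)";
        parse(line).ifPresent(System.out::println);
    }
}
```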
"[Can you give me] A basic definition of the load test: schema, queries, write pattern[?] (e.g. is each benchmark/client thread being executed against a different mysqld server?)"
Unfortunately that's proprietary, but I can reiterate some info from above: The test is executed against a single node (i.e. a single server). Each thread gets its own rows to manipulate.
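Since the real test can't be shared, here is a hypothetical skeleton of the write pattern as described: N worker threads against one server, each owning a disjoint range of row ids. RowTransaction stands in for the proprietary JDBC work, and both the range layout (thread i owns ids from i * rowsPerThread up to, but not including, (i + 1) * rowsPerThread) and counting failures instead of letting them end the thread are assumptions.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical skeleton of the described write pattern: N threads, disjoint rows, one server. */
final class DisjointLoadTest {
    /** Stand-in for one write transaction on one row (the real JDBC work is proprietary). */
    interface RowTransaction {
        void run(long rowId) throws Exception;
    }

    /**
     * Worker i only touches ids [i * rowsPerThread, (i + 1) * rowsPerThread), so no two
     * workers ever share a row. Returns how many transactions threw.
     */
    static int run(int threads, long rowsPerThread, RowTransaction tx) throws InterruptedException {
        AtomicInteger failures = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            final long firstId = i * rowsPerThread;
            pool.execute(() -> {
                for (long id = firstId; id < firstId + rowsPerThread; id++) {
                    try {
                        tx.run(id);
                    } catch (Exception e) {
                        failures.incrementAndGet(); // count and keep going instead of ending the thread
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        return failures.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int failed = run(4, 1000, rowId -> { /* e.g. UPDATE ... WHERE id = rowId, then COMMIT */ });
        System.out.println("failed transactions: " + failed);
    }
}
```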
"[Can you give me] The my.cnf being used on the mysql instances[?]" I've tried with two different ones, though with many similarities due to requirements. This is the latest one, anonymized a bit:
[mysql]
port = 3306
socket = /var/lib/mysql/mysql.sock
[mysqld]
port = 3306
socket = /var/lib/mysql/mysql.sock
transaction_isolation = READ-UNCOMMITTED
explicit_defaults_for_timestamp= ON
user = mysql
default-storage-engine = InnoDB
socket = /var/lib/mysql/mysql.sock
pid-file = /var/lib/mysql/mysql.pid
bind-address = 0.0.0.0
skip-host-cache
secure-file-priv = ""
report_host = "realIpAddressHere"
datadir = /var/lib/mysql/
log-bin = /var/lib/mysql/mysql-bin
relay-log = /var/lib/mysql/relay-bin
server-id = 59331200
server_id = 59331200
auto_increment_increment = 10
auto_increment_offset = 1
replicate-ignore-db = mysql
slave-skip-errors = 1032,1062
master-info-repository = TABLE
relay-log-info-repository = TABLE
binlog_checksum = NONE
gtid_mode = ON
enforce_gtid_consistency = ON
log_slave_updates = ON
log_bin = binlog
binlog_format = ROW
transaction_write_set_extraction = XXHASH64
loose-group_replication_group_name = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
loose-group_replication_start_on_boot = off
loose-group_replication_local_address = "localAddressHere"
loose-group_replication_group_seeds = "groupSeedsHere"
loose-group_replication_bootstrap_group = off
loose-group_replication_single_primary_mode = OFF
loose-group_replication_enforce_update_everywhere_checks = ON
disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"
loose-group_replication_ip_whitelist="ipRangeHere"
slave_parallel_workers = 1024
slave_transaction_retries = 18446744073709551615
slave_skip_errors = ddl_exist_errors
loose-group_replication_gtid_assignment_block_size = 1024
log-error = /var/lib/mysql/mysql-error.log
log-queries-not-using-indexes = 0
slow-query-log = 1
slow-query-log-file = /var/lib/mysql/mysql-slow.log
event_scheduler=ON
loose-group_replication_single_primary_mode = OFF
loose-group_replication_enforce_update_everywhere_checks = ON
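Reading the file back, several options are set more than once in [mysqld], some of them (log-bin/log_bin, slave-skip-errors/slave_skip_errors) with different values. MySQL treats dashes and underscores in option names as equivalent, and the loose- prefix only changes how unknown options are reported, so spellings like server-id and server_id name the same option. A small, hypothetical checker can list these; OptionFileDuplicates is a made-up name, and it handles only plain name = value lines, [section] headers and #/; comments (it does not follow !include directives).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical checker: lists options set more than once within the same [section]. */
final class OptionFileDuplicates {
    /** Dashes and underscores are interchangeable in option names; "loose-" is dropped. */
    static String canonicalName(String rawName) {
        String name = rawName.trim().replace('-', '_');
        return name.startsWith("loose_") ? name.substring("loose_".length()) : name;
    }

    /** Returns "section.option" keys that occur more than once, in first-seen order. */
    static List<String> duplicates(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String section = "";
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
                continue; // blank line or comment
            }
            if (line.startsWith("[") && line.endsWith("]")) {
                section = line.substring(1, line.length() - 1).trim();
                continue;
            }
            int eq = line.indexOf('=');
            String name = canonicalName(eq >= 0 ? line.substring(0, eq) : line);
            counts.merge(section + "." + name, 1, Integer::sum);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > 1) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "/etc/my.cnf"; // assumed location
        List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.ISO_8859_1);
        for (String key : duplicates(lines)) {
            System.out.println(key);
        }
    }
}
```

Run against the my.cnf above, it would print mysqld.socket, mysqld.log_bin, mysqld.server_id, mysqld.slave_skip_errors, mysqld.group_replication_single_primary_mode and mysqld.group_replication_enforce_update_everywhere_checks. Which occurrence actually takes effect is worth confirming on the running server, e.g. with SHOW GLOBAL VARIABLES.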
We do not have a MySQL Enterprise subscription.