we are currently testing if moving from mariaDB master/slave replication to a 3-node-galera-cluster is an option for us.
General replication in the cluster works fine like expected. Load test shows that write-performance does not meet our requirements though. To migrate data from the master/slave-system to the cluster one cluster node is set up as slave to that m/s-system (backup from master is imported, slave replication on the cluster node starts from the point where the backup is taken). The cluster node never catches up to the master (seconds-behind-master is increasing). So we looked for tuning-options for our cluster. Setting repl.commit_order=2 improved performance, replication from the master to the cluster node went fine. But we then experienced that the other cluster nodes weren't replicating any more (though their state was synced and the wsrep%-variables did not show anything unusual, no throttling), replication within the cluster seemed to have stalled entirely. When resetting repl.commit_order replication worked again. Our cluster was not replicating neither when we had just the cluster operational (without replicating from the master node) with repl.commit_order=2.
Here excerpts from the server.cnf:
[galera]
# Mandatory settings
wsrep_provider = /usr/lib64/galera/libgalera_smm.so
wsrep_cluster_address = "gcomm://10.0.0.10,10.0.0.8,10.0.0.11"
wsrep_cluster_name = foobar
wsrep_node_name = foobar-db1
wsrep_node_address = 10.0.0.10
binlog_format = row
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
wsrep_slave_threads = 16
wsrep_provider_options = 'gcs.fc_limit=80;gcs.fc_factor=0.8;repl.commit_order=2'
# SST settings
wsrep_sst_method = xtrabackup-v2
wsrep_sst_receive_address= 10.0.0.10
wsrep_sst_auth = galera:somepw
wsrep_retry_autocommit = 10
# required since 10.1
wsrep_on=ON
Any ideas?