Yesterday the Galera cluster (3 nodes, RHEL 7) went down several times, for about 5-7 minutes each time. Each time it came back online on its own, without human intervention.
Logs from the affected time windows:
Node1 https://gist.github.com/bf2498582631cdb2ff49ad2da6b235a2
Node2 https://gist.github.com/d0b8378195933cc50ef5bd40e3fd4ca6
Node3 https://gist.github.com/d93a7b3ea9664514e3751251f79ea037
I wonder what happened, and why it caused downtime.
What do these log entries mean, and what is the operator supposed to do about them?
Slave SQL: Error 'Operation CREATE USER failed for '$USER'@'localhost'' on query.
Slave SQL: Error executing row event:
WSREP: Failed to apply trx 174957086 4 times
WSREP: Node consistency compromized, aborting...
WSREP: Failed to apply trx: source: 72173db5-2b40-11e7-8b70-2206a55850e2 version: 3 local: 0 state: APPLYING
How can I make sure these downtimes don't happen again?