I have a 4 machine mysql cluster running mariadb 10.0.21. I have two masters setup, that are slaving off of each other. And then two read-only slaves that are replicating from the 1st and 2nd db’s respectively.
In short, the way its setup is like this:
db1 replicates -> db2
db1 replicates -> db3
db2 replicates -> db1
db2 replicates -> db4
It's a pretty basic 4 db setup.
The problem I'm having is that I setup a nagios user to monitor the cluster on the 1st db.
I checked and I could log into the first 3 databases but not the 4th one with the nagios user. I could do that without creating the nagios user on each db because I'm replicating my mysql database on the first two database machines.
But for some reason the nagios user never was created on db4 the way it was on db2 even tho replication seems fine on all nodes.
As you probably recall, I have db 2 replicating to the 4th db.
And if I do a show master status on db 2, I can see that I'm replicating the mysql db:
MariaDB [mysql]> show master status;
+--------------------+----------+------------------------------+------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+--------------------+----------+------------------------------+------------------+
| mariadb-bin.000078 | 376 | jfwiki,jokefire,bacula,mysql | |
+--------------------+----------+------------------------------+------------------+
1 row in set (0.00 sec)
If I check that the nagios user is there on the 2nd db, it is.
MariaDB [mysql]> select User,Host,Password from user where user like 'nagios';
+--------+-------------+-------------------------------------------+
| User | Host | Password |
+--------+-------------+-------------------------------------------+
| nagios | 52.4.204.96 | *somepasswordhash |
+--------+-------------+-------------------------------------------+
1 row in set (0.00 sec)
This user was not created on db2, but is there because of the replication.
And if I check the slave status on db4, replication seems completely fine:
[root@db4:~] #mysql -e "show slave status\G" | egrep "Slave_IO_State|Master_Host|Slave_IO_Running|Slave_SQL_Running|Last_Errno|Seconds_Behind_Master"
Slave_IO_State: Waiting for master to send event
Master_Host: db2.example.com
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Seconds_Behind_Master: 0
And if I check for the presence of the nagios user on db4 the way I did on db2, the user just isn't there:
MariaDB [mysql]> select User,Host from user where user like 'nagios';
Empty set (0.00 sec)
So my question is, why did the nagios user not get replicated to db4 the way it did from db1 -> db2 and from db1 -> db3? Even tho slave replication on db4 seems to be okay? I could log into all those hosts using the nagios user from the monitoring host.
Here's the full output of the slave replication command on db4 in case I missed anything from my grep earlier:
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: db2.example.com
Master_User: jf_slave
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mariadb-bin.000078
Read_Master_Log_Pos: 376
Relay_Log_File: db4-relay-bin.000044
Relay_Log_Pos: 537
Relay_Master_Log_File: mariadb-bin.000078
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 376
Relay_Log_Space: 1121
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: Yes
Master_SSL_CA_File: /opt/mysql/ca.crt
Master_SSL_CA_Path:
Master_SSL_Cert: /opt/mysql/db4.example.com.crt
Master_SSL_Cipher:
Master_SSL_Key: /opt/mysql/db4.example.com.key
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
Master_SSL_Crl: /opt/mysql/ca.crt
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
1 row in set (0.00 sec)
I am seeing some errors in the mariadb logs on db4, however they're no different than the errors I'm seeing on the 1st 3 databases where the nagios user data replicated successfully.
151004 15:34:36 [Note] Error reading relay log event: slave SQL thread was killed
151004 15:34:36 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
151004 15:34:36 [Note] Slave I/O thread killed while reading event
151004 15:34:36 [Note] Slave I/O thread exiting, read up to log 'mariadb-bin.000078', position 376
151004 15:36:47 [Note] Slave SQL thread initialized, starting replication in log 'mariadb-bin.000078' at position 376, relay log './db4-relay-bin.000042' position: 537
151004 15:36:47 [Note] Slave I/O thread: connected to master 'jf_slave@db2.example.com:3306',replication started in log 'mariadb-bin.000078' at position 376
151007 4:24:12 [Note] Error reading relay log event: slave SQL thread was killed
151007 4:24:12 [ERROR] Error reading packet from server: Lost connection to MySQL server during query ( server_errno=2013)
151007 4:24:12 [Note] Slave I/O thread killed while reading event
151007 4:24:12 [Note] Slave I/O thread exiting, read up to log 'mariadb-bin.000078', position 376
151007 4:28:20 [Note] Slave SQL thread initialized, starting replication in log 'mariadb-bin.000078' at position 376, relay log './db4-relay-bin.000043' position: 537
151007 4:28:20 [Note] Slave I/O thread: connected to master 'jf_slave@db2.example.com:3306',replication started in log 'mariadb-bin.000078' at position 376
So why would there be this database inconsistency, when all the indicators of replication seem okay? Why can't the nagios user log into db4 the way it can on the 1st 3 db's?
Thanks