MongoDB 3.0.11 replica set failover does not happen

I have a MongoDB 3.0.11 replica set with two data-bearing members and one arbiter. Recently the primary experienced networking issues, but the secondary did not become primary, reporting "member is more than 10 seconds behind the most up-to-date member (mask 0xA)". However, MMS monitoring shows replication lag no higher than 3 seconds. The database handles about 100 updates per second and around 400 connections, with no page faults. What is wrong here?
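To cross-check the lag figure independently of MMS, the lag can be estimated from the member optimes that rs.status() reports. The following is a minimal sketch, not what MMS itself computes: it assumes each data-bearing member entry carries "name", "stateStr" and "optimeDate" fields, and the sample document and its optimes are invented for illustration.

```javascript
// Sketch: per-member replication lag in seconds, measured against the most
// up-to-date data-bearing member in an rs.status()-shaped document.
function replicationLagSeconds(status) {
    // Skip arbiters and any member that does not report an optime.
    var dataMembers = status.members.filter(function (m) {
        return m.stateStr !== "ARBITER" &&
            m.optimeDate && typeof m.optimeDate.getTime === "function";
    });
    var newest = 0;
    dataMembers.forEach(function (m) {
        newest = Math.max(newest, m.optimeDate.getTime());
    });
    var lag = {};
    dataMembers.forEach(function (m) {
        lag[m.name] = (newest - m.optimeDate.getTime()) / 1000;
    });
    return lag;
}

// Hypothetical sample shaped like rs.status() output; the optimes are made up.
var sampleStatus = {
    members: [
        { name: "db-prod-1:27017", stateStr: "PRIMARY", optimeDate: new Date("2016-07-22T14:35:50Z") },
        { name: "db-prod-2:27017", stateStr: "SECONDARY", optimeDate: new Date("2016-07-22T14:35:47Z") },
        { name: "db-prod-arbiter-1:27017", stateStr: "ARBITER" }
    ]
};
var lag = replicationLagSeconds(sampleStatus);
```

In the mongo shell the same function could be called as replicationLagSeconds(rs.status()) on any member, which gives that member's own view of the set.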

The replica set configuration:

mssp:PRIMARY> rs.config()
{
    "_id" : "mssp",
    "version" : 44459,
    "members" : [
        {
            "_id" : 0,
            "host" : "db-prod-1:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 10,
            "tags" : {

            },
            "slaveDelay" : 0,
            "votes" : 1
        },
        {
            "_id" : 1,
            "host" : "db-prod-2:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : 0,
            "votes" : 1
        },
        {
            "_id" : 2,
            "host" : "db-prod-arbiter-1:27017",
            "arbiterOnly" : true,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : 0,
            "votes" : 1
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatTimeoutSecs" : 10,
        "getLastErrorModes" : {

        },
        "getLastErrorDefaults" : {
            "w" : 1,
            "wtimeout" : 0
        }
    }
}
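For reference, the election-relevant parts of this configuration can be summarized with a few lines of JavaScript. This is only a sketch: the member list is copied by hand from the output above, the helper name summarize is made up, and "candidate" here simply means a non-arbiter member with a priority greater than 0.

```javascript
// Members copied from the rs.config() output, reduced to the relevant fields.
var members = [
    { host: "db-prod-1:27017", arbiterOnly: false, priority: 10, votes: 1 },
    { host: "db-prod-2:27017", arbiterOnly: false, priority: 1, votes: 1 },
    { host: "db-prod-arbiter-1:27017", arbiterOnly: true, priority: 1, votes: 1 }
];

// Hypothetical helper: total votes, votes needed for a majority, and the
// members that could become primary at all.
function summarize(members) {
    var totalVotes = 0;
    var candidates = [];
    members.forEach(function (m) {
        totalVotes += m.votes;
        if (!m.arbiterOnly && m.priority > 0) {
            candidates.push(m.host);
        }
    });
    return {
        totalVotes: totalVotes,
        majority: Math.floor(totalVotes / 2) + 1,
        candidates: candidates
    };
}

var summary = summarize(members);
```

With three votes in total, a majority is two, so db-prod-2 and the arbiter together should be enough to elect db-prod-2 when db-prod-1 is unreachable.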

The logs of each member:

db-prod-1

2016-07-22T14:35:53.841+0000 I NETWORK  [ReplExecNetThread-10] getaddrinfo("db-prod-arbiter-1") failed: Temporary failure in name resolution
2016-07-22T14:35:53.841+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to db-prod-arbiter-1:27017; Location18915 Failed attempt to connect to db-prod-arbiter-1:27017; couldn't initialize connection to host db-prod-arbiter-1, address is invalid
2016-07-22T14:35:53.942+0000 I NETWORK  [ReplExecNetThread-11] getaddrinfo("db-prod-2") failed: Temporary failure in name resolution
2016-07-22T14:35:53.942+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to db-prod-2:27017; Location18915 Failed attempt to connect to db-prod-2:27017; couldn't initialize connection to host db-prod-2, address is invalid
2016-07-22T14:35:53.942+0000 I REPL     [ReplicationExecutor] can't see a majority of the set, relinquishing primary
2016-07-22T14:35:53.942+0000 I REPL     [ReplicationExecutor] Stepping down from primary in response to heartbeat
2016-07-22T14:35:53.943+0000 I REPL     [replCallbackWithGlobalLock-0] transition to SECONDARY

db-prod-2

2016-07-22T14:35:53.952+0000 E REPL     [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: db-prod-1:27017
2016-07-22T14:35:53.952+0000 I -        [rsBackgroundSync] caught exception (socket exception [FAILED_STATE] for db-prod-1:27017 (10.240.0.2) failed) in destructor (kill)
2016-07-22T14:35:53.952+0000 I REPL     [ReplicationExecutor] could not find member to sync from
2016-07-22T14:35:54.047+0000 I REPL     [ReplicationExecutor] Member db-prod-1:27017 is now in state SECONDARY
2016-07-22T14:35:54.047+0000 I REPL     [ReplicationExecutor] Standing for election
2016-07-22T14:35:54.048+0000 I REPL     [ReplicationExecutor] not electing self, db-prod-1:27017 would veto with 'I don't think db-prod-2:27017 is electable because the member is not currently a secondary; member is more than 10 seconds behind the most up-to-date member (mask 0xA)'
2016-07-22T14:35:54.048+0000 I REPL     [ReplicationExecutor] not electing self, we are not freshest

db-prod-arbiter-1

2016-07-22T14:35:55.639+0000 I REPL     [ReplicationExecutor] Member db-prod-1:27017 is now in state SECONDARY
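To make the ordering of these events easier to see, the timestamps can be lined up against the moment db-prod-1 stepped down. The sketch below copies four timestamps from the excerpts above; the short event descriptions and the helper name offsetsFromFirst are my own, and it assumes the clocks on the three hosts are reasonably well synchronized.

```javascript
// Timestamps copied from the log excerpts; descriptions are paraphrased.
var events = [
    { member: "db-prod-1", at: "2016-07-22T14:35:53.943+0000", what: "transition to SECONDARY" },
    { member: "db-prod-2", at: "2016-07-22T14:35:54.047+0000", what: "sees db-prod-1 as SECONDARY" },
    { member: "db-prod-2", at: "2016-07-22T14:35:54.048+0000", what: "not electing self" },
    { member: "db-prod-arbiter-1", at: "2016-07-22T14:35:55.639+0000", what: "sees db-prod-1 as SECONDARY" }
];

// Parse a mongod log timestamp; "+0000" is rewritten to "Z" so that the
// string is a standard ISO 8601 UTC timestamp.
function parseLogTime(text) {
    return Date.parse(text.replace(/\+0000$/, "Z"));
}

// Hypothetical helper: milliseconds elapsed since the first event.
function offsetsFromFirst(events) {
    var start = parseLogTime(events[0].at);
    return events.map(function (e) {
        return {
            member: e.member,
            what: e.what,
            msAfterFirst: parseLogTime(e.at) - start
        };
    });
}

var offsets = offsetsFromFirst(events);
```

By this calculation db-prod-2 gave up on the election about 105 ms after db-prod-1 stepped down, while the arbiter noticed the state change roughly 1.7 seconds after the step-down.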
