I am on MongoDB v3.0.x and have a replica set with 3 members, but the second member (and its entire server) was unexpectedly taken down and destroyed. I have stood up a new server as a replacement for the second member and need help bringing it back to a working state. The new server has the same address (mongochat02), but I am not sure of the proper way to bring the member back in.
Do I need to run rs.reconfig(), or rs.remove() and then rs.add() for mongochat02 again? Or am I supposed to follow a different procedure to get this member working?
When I issue rs.status() on the primary (mongochat03):
001-rs:PRIMARY> rs.status()
{
    "set" : "001-rs",
    "date" : ISODate("2017-04-07T19:39:23.860Z"),
    "myState" : 1,
    "term" : NumberLong(-1),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "members" : [
        {
            "_id" : 0,
            "name" : "mongochat01:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 174004,
            "optime" : Timestamp(1491593961, 4),
            "optimeDate" : ISODate("2017-04-07T19:39:21Z"),
            "lastHeartbeat" : ISODate("2017-04-07T19:39:22.386Z"),
            "lastHeartbeatRecv" : ISODate("2017-04-07T19:39:21.977Z"),
            "pingMs" : NumberLong(0),
            "syncingTo" : "mongochat03:27017",
            "configVersion" : 3
        },
        {
            "_id" : 1,
            "name" : "mongochat02:27017",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : Timestamp(0, 0),
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2017-04-07T19:39:23.672Z"),
            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
            "pingMs" : NumberLong(0),
            "lastHeartbeatMessage" : "Operation timed out",
            "configVersion" : -1
        },
        {
            "_id" : 2,
            "name" : "mongochat03:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 15127641,
            "optime" : Timestamp(1491593963, 2),
            "optimeDate" : ISODate("2017-04-07T19:39:23Z"),
            "electionTime" : Timestamp(1491419961, 1),
            "electionDate" : ISODate("2017-04-05T19:19:21Z"),
            "configVersion" : 3,
            "self" : true
        }
    ],
    "ok" : 1
}
On mongochat02 itself, rs.status() returns:
> rs.status()
{
    "info" : "run rs.initiate(...) if not yet done for the set",
    "ok" : 0,
    "errmsg" : "no replset config has been received",
    "code" : 94
}
Connections seem to work up until the heartbeat. The logs on mongochat02 show the following errors:
2017-04-06T10:42:11.831-0600 I REPL [ReplicationExecutor] Error in heartbeat request mongochat01:27017; ExceededTimeLimit: Operation timed out
2017-04-06T10:42:11.911-0600 I NETWORK [initandlisten] connection accepted from 10.1.240.185:36358 #6671 (151 connections now open)
2017-04-06T10:42:11.947-0600 I REPL [ReplicationExecutor] Error in heartbeat request to mongochat03:27017; ExceededTimeLimit: Operation timed out
This made me question whether the members are able to communicate with one another:
- All the members are able to ping one another
- But mongochat02 is unable to connect to mongochat03/01 through the mongo shell
[root@mongochat02]$ mongo --host mongochat03:27017
MongoDB shell version: 3.2.9
connecting to: mongochat03:27017/test
2017-04-07T14:02:03.411-0600 I NETWORK [thread1] Socket recv() errno:110 Connection timed out mongochat03:27017
2017-04-07T14:02:03.411-0600 I NETWORK [thread1] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_ERROR] server [mongochat03:27017]
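Since ping (ICMP) succeeds but the shell times out, a plain TCP check against port 27017 might help separate a firewall or routing problem from a MongoDB one. Below is a minimal sketch, assuming Python 3 is available on mongochat02. The helper name `can_connect` and the 5-second default timeout are my own choices for illustration and are not part of MongoDB. It only tests whether the TCP port accepts a connection, nothing more.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration only. It opens and immediately closes
    a socket, so it says nothing about MongoDB itself, only whether the port
    is reachable from this machine.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name-resolution failures.
        return False
```

Running `can_connect("mongochat03", 27017)` and `can_connect("mongochat01", 27017)` from mongochat02 (and the reverse from the other two members) would show whether port 27017 is reachable in each direction. If it returns False while ping works, that would suggest something between the hosts is blocking TCP traffic on that port.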