Quantcast
Channel: StackExchange Replication Questions
Viewing all articles
Browse latest Browse all 17268

Mongodb Replica Member Server Changed: Status Stuck In STARTUP. Missing config

$
0
0

I am on mongo v3.0.x and have a replication set with 3 members, but the second member (and entire server) was unexpectedly taken down and destroyed. I have stood up a new server to use as a replacement for the the second member and need help bringing it back to a working state. I am not sure if this is the proper way to bring a member back in, even though they have the same address (mongochat02).

Do I need to do a rs.reconfig() or remove(mongochat02) and add(mongochat02) again? Or am I supposed to follow a different procedure to get this member working?

When I issue rs.status():

001-rs:PRIMARY> rs.status()
{
    "set" : "001-rs",
    "date" : ISODate("2017-04-07T19:39:23.860Z"),
    "myState" : 1,
    "term" : NumberLong(-1),
    "heartbeatIntervalMillis" : NumberLong(2000),
    "members" : [
        {
            "_id" : 0,
            "name" : "mongochat01:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 174004,
            "optime" : Timestamp(1491593961, 4),
            "optimeDate" : ISODate("2017-04-07T19:39:21Z"),
            "lastHeartbeat" : ISODate("2017-04-07T19:39:22.386Z"),
            "lastHeartbeatRecv" : ISODate("2017-04-07T19:39:21.977Z"),
            "pingMs" : NumberLong(0),
            "syncingTo" : "mongochat03:27017",
            "configVersion" : 3
        },
        {
            "_id" : 1,
            "name" : "mongochat02:27017",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : Timestamp(0, 0),
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2017-04-07T19:39:23.672Z"),
            "lastHeartbeatRecv" : ISODate("1970-01-01T00:00:00Z"),
            "pingMs" : NumberLong(0),
            "lastHeartbeatMessage" : "Operation timed out",
            "configVersion" : -1
        },
        {
            "_id" : 2,
            "name" : "mongochat03:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 15127641,
            "optime" : Timestamp(1491593963, 2),
            "optimeDate" : ISODate("2017-04-07T19:39:23Z"),
            "electionTime" : Timestamp(1491419961, 1),
            "electionDate" : ISODate("2017-04-05T19:19:21Z"),
            "configVersion" : 3,
            "self" : true
        }
    ],
    "ok" : 1
}

While on mongochat02, when I issue rs.status()

> rs.status()
{
    "info" : "run rs.initiate(...) if not yet done for the set",
    "ok" : 0,
    "errmsg" : "no replset config has been received",
    "code" : 94
}

Connections seem to work up until the heartbeat. In the logs, there is the following error while on mongochat02:

2017-04-06T10:42:11.831-0600 I REPL     [ReplicationExecutor] Error in heartbeat request mongochat01:27017; ExceededTimeLimit: Operation timed out
2017-04-06T10:42:11.911-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:36358 #6671 (151 connections now open)
2017-04-06T10:42:11.947-0600 I REPL     [ReplicationExecutor] Error in heartbeat request to mongochat03:27017; ExceededTimeLimit: Operation timed out

This made me question if the members are unable to communicate with one another:

  • All the members are able to ping one another
  • But mongochat02 is unable to connect to mongochat03/01 through shell

    [root@mongochat02]$ mongo --host mongochat03:27017
    MongoDB shell version: 3.2.9
    connecting to: mongochat03:27017/test
    

    2017-04-07T14:02:03.411-0600 I NETWORK [thread1] Socket recv() errno:110 Connection timed out mongochat03:27017

    2017-04-07T14:02:03.411-0600 I NETWORK [thread1] SocketException: remote: (NONE):0 error: 9001 socket exception [RECV_ERROR] server [mongochat03:27017]

Mongo log from mongochat01 (currently secondary)

2017-04-11T21:03:36.264+0000 I REPL     [ReplicationExecutor] Member mongochat02:27017 is now in state STARTUP
2017-04-11T21:03:41.779+0000 I NETWORK  [initandlisten] connection accepted from 10.8.129.114:33648 #43227 (104 connections now open)
2017-04-11T21:03:47.896+0000 I NETWORK  [conn43147] end connection 10.8.129.114:33200 (103 connections now open)
2017-04-11T21:03:48.265+0000 I NETWORK  [ReplExecNetThread-2] Socket recv() timeout  10.8.129.114:27017
2017-04-11T21:03:48.265+0000 I NETWORK  [ReplExecNetThread-2] SocketException: remote: 10.8.129.114:27017 error: 9001 socket exception [RECV_TIMEOUT] server [10.8.129.114:27017]
2017-04-11T21:03:48.265+0000 I NETWORK  [ReplExecNetThread-2] DBClientCursor::init call() failed
2017-04-11T21:03:48.265+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to mongochat02:27017; Location10276 DBClientBase::findN: transport error: mongochat02:27017 ns: admin.$cmd query: { replSetHeartbeat: "001-rs", pv: 1, v: 3, from: "mongochat01:27017", fromId: 0, checkEmpty: false }
2017-04-11T21:03:50.380+0000 I REPL     [ReplicationExecutor] Member mongochat02:27017 is now in state STARTUP
2017-04-11T21:03:53.778+0000 I NETWORK  [initandlisten] connection accepted from 10.8.129.114:33654 #43228 (104 connections now open)

mongochat03 log (curent Primary)

 2017-04-11T21:15:45.955+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to mongochat02:27017; ExceededTimeLimit: Operation timed out
    2017-04-11T21:15:46.231+0000 I NETWORK  [initandlisten] connection accepted from 10.8.129.114:55282 #1148708 (108 connections now open)
    2017-04-11T21:15:48.041+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to mongochat02:27017
    2017-04-11T21:15:52.764+0000 I NETWORK  [conn1148596] end connection 10.8.129.114:54834 (107 connections now open)
    2017-04-11T21:15:57.956+0000 I REPL     [ReplicationExecutor] Error in heartbeat request to mongochat02:27017; ExceededTimeLimit: Operation timed out
    2017-04-11T21:15:58.231+0000 I NETWORK  [initandlisten] connection accepted from 10.8.129.114:55288 #1148709 (108 connections now open)
    2017-04-11T21:15:59.902+0000 I NETWORK  [conn1148706] end connection 10.1.240.185:58407 (107 connections now open)
    2017-04-11T21:15:59.933+0000 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:58560 #1148710 (108 connections now open)
    2017-04-11T21:16:00.041+0000 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to mongochat02:27017
    2017-04-11T21:16:04.540+0000 I NETWORK  [conn1148597] end connection 10.8.129.114:54842 (107 connections now open)

mongo.conf from monogchat01. They are all pretty much the same except the "bindIp" is the server ip.

# mongod.conf

# for documentation of all options, see:
#   http://docs.mongodb.org/manual/reference/configuration-options/

# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log

# Where and how to store data.
storage:
  dbPath: /opt/mongo
  journal:
    enabled: true
#  engine:
#  mmapv1:
#  wiredTiger:

# how the process runs
processManagement:
  fork: true  # fork and run in background
  pidFilePath: /var/run/mongodb/mongod.pid  # location of pidfile

# network interfaces
net:
  port: 27017
  bindIp: 127.0.0.1,10.1.240.185  # Listen to local interface only, comment to listen on all interfaces.


#security:

#operationProfiling:

#replication:
replication:
  replSetName: "001-rs"
#sharding:

## Enterprise-Only Options

#auditLog:

#snmp:

Logs on mongochat02

2017-04-12T09:00:18.782-0600 I CONTROL  [main] ***** SERVER RESTARTED *****
2017-04-12T09:00:18.789-0600 I CONTROL  [initandlisten] MongoDB starting : pid=27788 port=27017 dbpath=/opt/mongo 64-bit host=den02vmchat02
2017-04-12T09:00:18.789-0600 I CONTROL  [initandlisten] db version v3.2.9
2017-04-12T09:00:18.789-0600 I CONTROL  [initandlisten] git version: 22ec9e93b40c85fc7cae7d56e7d6a02fd811088c
2017-04-12T09:00:18.789-0600 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
2017-04-12T09:00:18.789-0600 I CONTROL  [initandlisten] allocator: tcmalloc
2017-04-12T09:00:18.789-0600 I CONTROL  [initandlisten] modules: none
2017-04-12T09:00:18.789-0600 I CONTROL  [initandlisten] build environment:
2017-04-12T09:00:18.789-0600 I CONTROL  [initandlisten]     distmod: rhel70
2017-04-12T09:00:18.789-0600 I CONTROL  [initandlisten]     distarch: x86_64
2017-04-12T09:00:18.789-0600 I CONTROL  [initandlisten]     target_arch: x86_64
2017-04-12T09:00:18.789-0600 I CONTROL  [initandlisten] options: { config: "/etc/mongod.conf", net: { port: 27017 }, processManagement: { fork: true, pidFilePath: "/var/run/mongodb/mongod.pid" }, replication: { replSetName: "001-rs" }, storage: { dbPath: "/opt/mongo", journal: { enabled: true } }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log" } }
2017-04-12T09:00:18.809-0600 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=3G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-04-12T09:00:19.631-0600 I CONTROL  [initandlisten]
2017-04-12T09:00:19.631-0600 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2017-04-12T09:00:19.631-0600 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2017-04-12T09:00:19.631-0600 I CONTROL  [initandlisten]
2017-04-12T09:00:19.631-0600 I CONTROL  [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2017-04-12T09:00:19.631-0600 I CONTROL  [initandlisten] **        We suggest setting it to 'never'
2017-04-12T09:00:19.631-0600 I CONTROL  [initandlisten]
2017-04-12T09:00:19.631-0600 I CONTROL  [initandlisten] ** WARNING: soft rlimits too low. rlimits set to 4096 processes, 64000 files. Number of processes should be at least 32000 : 0.5 times number of files.
2017-04-12T09:00:19.631-0600 I CONTROL  [initandlisten]
2017-04-12T09:00:19.814-0600 I REPL     [initandlisten] Did not find local voted for document at startup;  NoMatchingDocument: Did not find replica set lastVote document in local.replset.election
2017-04-12T09:00:19.814-0600 I REPL     [initandlisten] Did not find local replica set configuration document at startup;  NoMatchingDocument: Did not find replica set configuration document in local.system.replset
2017-04-12T09:00:19.814-0600 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/opt/mongo/diagnostic.data'
2017-04-12T09:00:19.814-0600 I NETWORK  [HostnameCanonicalizationWorker] Starting hostname canonicalization worker
2017-04-12T09:00:20.185-0600 I NETWORK  [initandlisten] waiting for connections on port 27017
2017-04-12T09:00:20.561-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.207:57560 #1 (1 connection now open)
2017-04-12T09:00:20.603-0600 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to mongochat03:27017
2017-04-12T09:00:21.247-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51255 #2 (2 connections now open)
2017-04-12T09:00:21.291-0600 I NETWORK  [conn2] end connection 10.1.240.185:51255 (1 connection now open)
2017-04-12T09:00:21.424-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51256 #3 (2 connections now open)
2017-04-12T09:00:21.528-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51257 #4 (3 connections now open)
2017-04-12T09:00:21.632-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51258 #5 (4 connections now open)
2017-04-12T09:00:21.736-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51259 #6 (5 connections now open)
2017-04-12T09:00:21.834-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51260 #7 (6 connections now open)
2017-04-12T09:00:21.839-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51261 #8 (7 connections now open)
2017-04-12T09:00:21.877-0600 I NETWORK  [conn7] end connection 10.1.240.185:51260 (6 connections now open)
2017-04-12T09:00:21.882-0600 I NETWORK  [conn3] end connection 10.1.240.185:51256 (5 connections now open)
2017-04-12T09:00:21.882-0600 I NETWORK  [conn4] end connection 10.1.240.185:51257 (4 connections now open)
2017-04-12T09:00:21.882-0600 I NETWORK  [conn5] end connection 10.1.240.185:51258 (3 connections now open)
2017-04-12T09:00:21.882-0600 I NETWORK  [conn6] end connection 10.1.240.185:51259 (2 connections now open)
2017-04-12T09:00:21.882-0600 I NETWORK  [conn8] end connection 10.1.240.185:51261 (1 connection now open)
2017-04-12T09:00:21.929-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51262 #9 (2 connections now open)
2017-04-12T09:00:21.929-0600 I ASIO     [NetworkInterfaceASIO-Replication-0] Connecting to chat.zayo.us:27017
2017-04-12T09:00:21.947-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51263 #10 (3 connections now open)
2017-04-12T09:00:22.051-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51264 #11 (4 connections now open)
2017-04-12T09:00:22.154-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51265 #12 (5 connections now open)
2017-04-12T09:00:22.258-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51266 #13 (6 connections now open)
2017-04-12T09:00:22.362-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51267 #14 (7 connections now open)
2017-04-12T09:00:22.406-0600 I NETWORK  [conn10] end connection 10.1.240.185:51263 (6 connections now open)
2017-04-12T09:00:22.406-0600 I NETWORK  [conn12] end connection 10.1.240.185:51265 (6 connections now open)
2017-04-12T09:00:22.406-0600 I NETWORK  [conn13] end connection 10.1.240.185:51266 (4 connections now open)
2017-04-12T09:00:22.406-0600 I NETWORK  [conn14] end connection 10.1.240.185:51267 (3 connections now open)
2017-04-12T09:00:22.406-0600 I NETWORK  [conn11] end connection 10.1.240.185:51264 (6 connections now open)
2017-04-12T09:00:22.990-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51268 #15 (3 connections now open)
2017-04-12T09:00:23.022-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51270 #16 (4 connections now open)
2017-04-12T09:00:23.051-0600 I NETWORK  [conn15] end connection 10.1.240.185:51268 (3 connections now open)
2017-04-12T09:00:23.092-0600 I NETWORK  [conn16] end connection 10.1.240.185:51270 (2 connections now open)
2017-04-12T09:00:25.397-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51271 #17 (3 connections now open)
2017-04-12T09:00:25.441-0600 I NETWORK  [conn17] end connection 10.1.240.185:51271 (2 connections now open)
2017-04-12T09:00:25.776-0600 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to mongochat03:27017
2017-04-12T09:00:25.861-0600 I ASIO     [NetworkInterfaceASIO-Replication-0] Successfully connected to chat.zayo.us:27017
2017-04-12T09:00:25.994-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51272 #18 (3 connections now open)
2017-04-12T09:00:26.003-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51273 #19 (4 connections now open)
2017-04-12T09:00:26.037-0600 I NETWORK  [conn18] end connection 10.1.240.185:51272 (3 connections now open)
2017-04-12T09:00:26.107-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51274 #20 (4 connections now open)
2017-04-12T09:00:26.211-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51275 #21 (5 connections now open)
2017-04-12T09:00:26.314-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51276 #22 (6 connections now open)
2017-04-12T09:00:26.418-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51277 #23 (7 connections now open)
2017-04-12T09:00:26.461-0600 I NETWORK  [conn19] end connection 10.1.240.185:51273 (6 connections now open)
2017-04-12T09:00:26.461-0600 I NETWORK  [conn20] end connection 10.1.240.185:51274 (5 connections now open)
2017-04-12T09:00:26.461-0600 I NETWORK  [conn21] end connection 10.1.240.185:51275 (4 connections now open)
2017-04-12T09:00:26.462-0600 I NETWORK  [conn22] end connection 10.1.240.185:51276 (3 connections now open)
2017-04-12T09:00:26.462-0600 I NETWORK  [conn23] end connection 10.1.240.185:51277 (2 connections now open)
2017-04-12T09:00:26.514-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51278 #24 (3 connections now open)
2017-04-12T09:00:26.618-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51279 #25 (4 connections now open)
2017-04-12T09:00:26.722-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51280 #26 (5 connections now open)
2017-04-12T09:00:26.826-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51281 #27 (6 connections now open)
2017-04-12T09:00:26.931-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51282 #28 (7 connections now open)
2017-04-12T09:00:26.974-0600 I NETWORK  [conn24] end connection 10.1.240.185:51278 (6 connections now open)
2017-04-12T09:00:26.974-0600 I NETWORK  [conn25] end connection 10.1.240.185:51279 (5 connections now open)
2017-04-12T09:00:26.974-0600 I NETWORK  [conn26] end connection 10.1.240.185:51280 (4 connections now open)
2017-04-12T09:00:26.975-0600 I NETWORK  [conn27] end connection 10.1.240.185:51281 (3 connections now open)
2017-04-12T09:00:26.975-0600 I NETWORK  [conn28] end connection 10.1.240.185:51282 (3 connections now open)
2017-04-12T09:00:27.161-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51283 #29 (3 connections now open)
2017-04-12T09:00:27.200-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51284 #30 (4 connections now open)
2017-04-12T09:00:27.203-0600 I NETWORK  [conn29] end connection 10.1.240.185:51283 (3 connections now open)
2017-04-12T09:00:27.244-0600 I NETWORK  [conn30] end connection 10.1.240.185:51284 (2 connections now open)
2017-04-12T09:00:29.553-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51285 #31 (3 connections now open)
2017-04-12T09:00:29.596-0600 I NETWORK  [conn31] end connection 10.1.240.185:51285 (2 connections now open)
2017-04-12T09:00:30.145-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51286 #32 (3 connections now open)
2017-04-12T09:00:30.188-0600 I NETWORK  [conn32] end connection 10.1.240.185:51286 (2 connections now open)
2017-04-12T09:00:30.569-0600 I NETWORK  [initandlisten] connection accepted from 10.1.240.185:51288 #33 (3 connections now open)
2017-04-12T09:00:30.603-0600 I REPL     [ReplicationExecutor] Error in heartbeat request to mongochat03:27017; ExceededTimeLimit: Operation timed out

Viewing all articles
Browse latest Browse all 17268

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>