Quantcast
Channel: StackExchange Replication Questions
Viewing all articles
Browse latest Browse all 17268

PostgreSQL 9.1 Streaming Replication - Replication does not recover until slave restart

$
0
0

I am running a streaming replication environment with PostgreSQL 9.1 (1 master, 3 slaves). Everything worked fine for aprox. 2 months. Yesterday, the replication to one of the slaves failed with the log on the slave having:

LOG:  incorrect resource manager data checksum in record at 61/DA2710A7
FATAL:  terminating walreceiver process due to administrator command
LOG:  incorrect resource manager data checksum in record at 61/DA2710A7
LOG:  incorrect resource manager data checksum in record at 61/DA2710A7
LOG:  incorrect resource manager data checksum in record at 61/DA2710A7
LOG:  incorrect resource manager data checksum in record at 61/DA2710A7
LOG:  incorrect resource manager data checksum in record at 61/DA2710A7
LOG:  incorrect resource manager data checksum in record at 61/DA2710A7
LOG:  incorrect resource manager data checksum in record at 61/DA2710A7

The slave was no longer in sync with the master. Two hours later, in which the log gets a new line like above every 5 seconds, I restarted the slave database server:

LOG:  incorrect resource manager data checksum in record at 61/DA2710A7
LOG:  received fast shutdown request
LOG:  aborting any active transactions
LOG:  incorrect resource manager data checksum in record at 61/DA2710A7
FATAL:  terminating connection due to administrator command
FATAL:  terminating connection due to administrator command
LOG:  shutting down
LOG:  database system is shut down

The new log file on the slave contains:

LOG:  database system was shut down in recovery at 2016-02-29 05:12:11 CET
LOG:  entering standby mode
LOG:  redo starts at 61/D92C10C9
LOG:  consistent recovery state reached at  61/DA2710A7
LOG:  database system is ready to accept read only connections
LOG:  incorrect resource manager data checksum in record at 61/DA2710A7
LOG:  streaming replication successfully connected to primary

Now the slave is in sync with the master but the checksum entry is still there. One more thing I checked were the network logs -> the network was available.

My questions are:

  1. Does anyone know why the walreceiver was terminated?
  2. Why didn't PostgreSQL retry the replication?
  3. What can I do to prevent this in the future?

Thank you.


Viewing all articles
Browse latest Browse all 17268

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>