Quantcast
Channel: StackExchange Replication Questions
Viewing all articles
Browse latest Browse all 17268

PostgreSQL 9.6 log shipping replication stuck on startup

$
0
0

Overview

We have been trying to restore replication between a Master and Replication server for a while, but the individual who had set up has left and we've been working at it for a while. In brief, the issue seems to be that the Replication server starts up and within a few minutes reports postgres: startup process recovering [FILENAME] and never advances beyond that same FILENAME.

Configuration

Master

pg_hba.conf

local   replication             replicator                      trust
host    replication             replicator    10.1.20.181/32    trust

postgresql.conf

(only active settings related to replication)

wal_level = hot_standby         # minimal, archive, hot_standby, or logical
fsync = on                      # turns forced synchronization on or off
wal_sync_method = fdatasync     # the default is the first option
archive_mode = on               # allows archiving to be done
archive_command = '/bin/true'   # Temporary until replication is working
archive_timeout = 300           # force a logfile segment switch after this
max_wal_senders = 5             # max number of walsender processes
wal_keep_segments = 600         # in logfile segments, 16MB each; 0 disables
max_replication_slots = 3       # max number of replication slots

Replication Server

postgresql.conf

hot_standby = on 

recovery.conf

standby_mode = 'on' # enables stand-by (readonly) mode

# Connect to the master postgres server using the replicator user we created.
primary_conninfo = 'host=10.1.20.180 port=5432 user=replicator'
primary_slot_name = 'oh_slot'

# Specifies a trigger file whose presence should cause streaming replication to
# end (i.e., failover).
trigger_file = '/tmp/pg_failover_trigger'  

Replication Set up Process

On the Master, we clear out all old WAL files by using pg_controldata to find the current WAL file and delete anything older

On the Replication Server:

  1. Stop the database
  2. Unmount pg_xlog directory
  3. Delete contents of data directory
  4. Run pg_basebackup -Xs -U replicator -h #{SOURCE_DB} -D /var/lib/postgresql/9.6/main -v
  5. Copy recovery.conf into new data directory
  6. Move the pg_xlog directory, create a new one and mount it, then copy the archive files over to the new directory and fix permissions
  7. Start up the database

At this point, I've seen the startup process process a number of WAL files, but it always (and rather quickly -- like within a few minutes) stops and continually reports that it is recovering the same file indefinitely. Checking tables between the two databases confirms that the replication database hasn't added new records that are in the master database.

I get the feeling that I am missing something fundamental and would greatly appreciate questions and guidance.


Viewing all articles
Browse latest Browse all 17268

Trending Articles