We have a master-slave replication configuration as follows.
On the master:
postgresql.conf
has replication configured as follows (commented line taken out for brevity):
max_wal_senders = 1
wal_keep_segments = 8
On the slave:
Same postgresql.conf
as on the master. recovery.conf
looks like this:
standby_mode = 'on'
primary_conninfo = 'host=master1 port=5432 user=replication password=replication'
trigger_file = '/tmp/postgresql.trigger.5432'
When this was initially setup, we performed some simple tests and confirmed the replication was working. However, when we did the initial data load, only some of the data made it to the slave.
Slave's log is now filled with messages that look like this:
< 2015-01-23 23:59:47.241 EST >LOG: started streaming WAL from primary at F/52000000 on timeline 1
< 2015-01-23 23:59:47.241 EST >FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000F00000052 has already been removed
< 2015-01-23 23:59:52.259 EST >LOG: started streaming WAL from primary at F/52000000 on timeline 1
< 2015-01-23 23:59:52.260 EST >FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000F00000052 has already been removed
< 2015-01-23 23:59:57.270 EST >LOG: started streaming WAL from primary at F/52000000 on timeline 1
< 2015-01-23 23:59:57.270 EST >FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 000000010000000F00000052 has already been removed
After some analysis and help on the #postgresql IRC channel, I've come to the conclusion that the slave cannot keep up with the master. My proposed solution is as follows.
On the master:
- Set
max_wal_senders=5
- Set
wal_keep_segments=4000
. Yes I know it is very high, but I'd like to monitor the situation and see what happens. I have room on the master.
On the slave:
- Save configuration files in the data directory (i.e.
pg_hba.conf pg_ident.conf postgresql.conf recovery.conf
) - Clear out the data directory (
rm -rf /var/lib/pgsql/9.3/data/*
) . This seems to be required bypg_basebackup
. - Run the following command:
pg_basebackup -h master -D /var/lib/pgsql/9.3/data --username=replication --password
Am I missing anything ? Is there a better way to bring the slave up-to-date w/o having to reload all the data ?
Any help is greatly appreciated.