Last week we had an incident with our master PostgreSQL server, which got totally thrashed, so we had to switch to our slave as our one and only DB instance. Once the initial chaos was under control, we tried to create a new slave replica from scratch, following the instructions left by the DB provider who set up the original master-slave pair. Just yesterday I realized that the new slave isn't replicating at all and only has the contents of the initial base backup, so right now we have no redundancy in case of a new disaster!
In recovery.conf we have:
standby_mode = on
primary_conninfo = 'host=10.1.1.65 port=5432 user=replicador password=XXXXXXXX'
In /var/lib/postgresql.log (master) there are several messages like this:
2017-03-17 12:21:57 UTC [1969-1] replicador@[unknown] ERROR: requested WAL segment 0000000100000B280000003A has already been removed
And in the replica's log file, many messages like:
2017-03-22 13:22:32 UTC [2827-1] LOG: started streaming WAL from primary at B28/3A000000 on timeline 1
2017-03-22 13:22:32 UTC [2827-2] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 0000000100000B280000003A has already been removed
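As far as I can tell, the segment named in both errors and the position in the replica's "started streaming" line are the same place in the WAL, so the replica is asking for exactly the file the master no longer has. Here is a small sketch of how I checked that, assuming the default 16 MB WAL segment size (the function name is just something I made up for illustration, it is not a PostgreSQL API):

```python
# Assumption: default WAL segment size of 16 MB (not verified on our servers).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def wal_segment_start(name):
    """Decode a 24-hex-digit WAL file name into (timeline, start position).

    Hypothetical helper for illustration only.
    The name is three 8-digit hex fields: timeline, log id, segment number.
    """
    if len(name) != 24:
        raise ValueError("expected a 24-character WAL segment name")
    timeline = int(name[0:8], 16)
    log_id = int(name[8:16], 16)
    seg_no = int(name[16:24], 16)
    # Byte offset of the segment's first byte within its log id.
    offset = seg_no * WAL_SEGMENT_SIZE
    return timeline, "%X/%X" % (log_id, offset)


print(wal_segment_start("0000000100000B280000003A"))
# → (1, 'B28/3A000000')
```

That matches "B28/3A000000 on timeline 1" from the replica's log, if my reading of the file naming is right.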
It seems the replica wasn't fast enough to keep up with the master. Which parameters should I check or raise? Can I actually re-sync, or is this replica lost and we'll have to do it all over again?
What could be missing? What additional info should I provide?
Environment:
select version()
PostgreSQL 9.4.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit
lsb_release -a
Distributor ID: Debian
Description: Debian GNU/Linux 8.6 (jessie)
Release: 8.6
Codename: jessie
uname -a
Linux ip-10-1-0-139 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u2 (2016-10-19) x86_64 GNU/Linux