I'm running PostgreSQL 9.3.9, Docker 1.8.2 and Ubuntu 14.04. My hot standby keeps failing with the following error message:
incorrect resource manager data checksum in record at 46F/6A7B6D28
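PostgreSQL raises this error when the standby recomputes the CRC of a WAL record and the result does not match the CRC stored with that record. In other words, the bytes it read at that location are not the bytes that were checksummed when the record was written. The Python below is a simplified sketch of that check only. The record layout is hypothetical, and `zlib.crc32` stands in for PostgreSQL's own CRC routine, so it illustrates the idea rather than PostgreSQL's implementation.

```python
import zlib

# Hypothetical, simplified WAL-like record: a payload plus the CRC the writer
# stored alongside it. zlib.crc32 is a stand-in; PostgreSQL's real record
# header layout and CRC variant are different.


def make_record(payload: bytes) -> tuple:
    """Return (payload, stored_crc) as a writer would emit them."""
    return payload, zlib.crc32(payload)


def record_is_valid(payload: bytes, stored_crc: int) -> bool:
    """Recompute the CRC when reading and compare it with the stored value."""
    return zlib.crc32(payload) == stored_crc


payload, crc = make_record(b"INSERT INTO t VALUES (1)")
print(record_is_valid(payload, crc))  # True

# Flip every bit of the first byte, as corruption in transit or on disk might.
corrupted = bytes([payload[0] ^ 0xFF]) + payload[1:]
print(record_is_valid(corrupted, crc))  # False
```

A CRC only detects that something changed. It cannot tell whether the damage happened on the master's disk, on the network, or on the standby.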
The first time I got this message, I read somewhere that I should regenerate the hot standby from the master, so I rsync'd everything back over and restarted it. The hot standby connects to the master over a TINC VPN connection; it connected successfully and began streaming changes.
Everything seemed to work fine until around 8 am, which is when our system starts to get busier. From then on, the only thing in the logs is the message above, and the last WAL file on the hot standby is 000000010000046F0000006A. If I restart the hot standby, it starts catching up with the master, but it also spikes the CPU and brings everything to a grinding halt, and it never seems to catch up anyway.
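The location in the error (46F/6A7B6D28) and the last WAL file name are two views of the same position. The sketch below shows the arithmetic, assuming the default 16 MB segment size and timeline 1. It also includes a byte-difference helper for measuring how far the standby is behind. PostgreSQL 9.3 already provides `pg_xlogfile_name()` and `pg_xlog_location_diff()` for this, so the Python (with hypothetical helper names) only illustrates what those functions compute.

```python
# Assumption: default 16 MB WAL segments (fixed at compile time) and timeline 1.
SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Turn an 'XXX/YYYYYYYY' WAL location into a 64-bit byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(lsn: str, timeline: int = 1) -> str:
    """Name of the WAL segment file that contains the given location."""
    segment_no = parse_lsn(lsn) // SEGMENT_SIZE
    segments_per_id = 0x100000000 // SEGMENT_SIZE  # 256 for 16 MB segments
    return "%08X%08X%08X" % (
        timeline,
        segment_no // segments_per_id,
        segment_no % segments_per_id,
    )


def lag_bytes(primary_lsn: str, standby_lsn: str) -> int:
    """Bytes of WAL by which the standby trails the primary."""
    return parse_lsn(primary_lsn) - parse_lsn(standby_lsn)


print(wal_file_name("46F/6A7B6D28"))  # 000000010000046F0000006A
```

On the master, `SELECT pg_xlog_location_diff(pg_current_xlog_location(), '46F/6A7B6D28');` returns the same kind of byte count as `lag_bytes`. Sampling it periodically would show whether a restarted standby is actually gaining ground.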
I have just migrated the database and hot standby to two new servers, using the same simple configuration as on the previous servers. On the previous servers the databases were not dockerized; on the new ones they are. I have other dockerized PostgreSQL installations on which replication runs fine, but those servers are not under as heavy a load as this one. I also tried replicating from the new master to the old, non-dockerized PostgreSQL server that used to be the main server, and the same thing happened. Before the migration, replication between the old master and hot standby had been working fine over the TINC VPN, so I don't think Docker or TINC is at fault.
Here is the PostgreSQL configuration for both master and hot standby:
listen_addresses = '*'
port = 5432
max_connections = 600
shared_buffers = 8GB
effective_cache_size = 16GB
work_mem = 10MB
maintenance_work_mem = 1638GB
max_locks_per_transaction = 256
hot_standby = on
archive_mode = on
wal_level = 'hot_standby'
max_wal_senders = 10
wal_keep_segments = 1000 # about 16 GB required in pg_xlog (1000 x 16 MB segments)
archive_command = 'true'
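`wal_keep_segments` counts WAL segment files, not bytes. With the default 16 MB segment size, the minimum amount of WAL the master keeps in pg_xlog for standbys is simple multiplication. The sketch below assumes that default size, and `kept_wal_gb` and `segments_for_gb` are hypothetical helper names.

```python
# Assumption: default 16 MB WAL segment size.
SEGMENT_SIZE_MB = 16


def kept_wal_gb(wal_keep_segments: int) -> float:
    """Disk space (GB) in pg_xlog held back by wal_keep_segments alone."""
    return wal_keep_segments * SEGMENT_SIZE_MB / 1024


def segments_for_gb(gb: float) -> int:
    """wal_keep_segments value needed to retain roughly `gb` GB of WAL."""
    return int(gb * 1024 // SEGMENT_SIZE_MB)


print(kept_wal_gb(1000))    # 15.625
print(segments_for_gb(80))  # 5120
```

Under that assumption, 1000 segments is about 15.6 GB. Retaining 80 GB would take a setting of about 5120.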
And the additional recovery.conf on the hot standby:
standby_mode = 'on'
primary_conninfo = 'host=MY_IP port=MY_PORT user=MY_USER password=MY_PASSWORD'
Any help would be greatly appreciated! I need my hot standby!