Our Postgres BDR database system stopped replicating data between the nodes.
When I did a check using the pg_xlog_location_diff
I noticed that there is a growing buffer in the replication slot.
SELECT slot_name, database, active, pg_xlog_location_diff(pg_current_xlog_insert_location(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
WHERE plugin = 'bdr';
slot_name | database | active | retained_bytes
-----------------------------------------+--------------+--------+----------------
bdr_26702_6275336279642079463_1_20305__ | ourdatabase | f | 32253352
I also noticed that the slot is marked as active=false.
SELECT * FROM pg_replication_slots;
-[ RECORD 1 ]+----------------------------------------
slot_name | bdr_26702_6275336279642079463_1_20305__
plugin | bdr
slot_type | logical
datoid | 26702
database | ourdatabase
active | f
xmin |
catalog_xmin | 8041
restart_lsn | 0/5F0C6C8
I increased the Postgres logging level, but then only messages I see in the log are:
LOCATION: LogicalIncreaseRestartDecodingForSlot, logical.c:886
DEBUG: 00000: updated xmin: 1 restart: 0
LOCATION: LogicalConfirmReceivedLocation, logical.c:958
DEBUG: 00000: failed to increase restart lsn: proposed 0/7DCE6F8, after 0/7DCE6F8, current candidate 0/7DCE6F8, current after 0/7DCE6F8, flushed up to 0/7DCE6F8
Please let me know if you have an idea how I can re-activate the replication slot and allow the replication to resume.