Channel: StackExchange Replication Questions
What is a reasonable value for PostgreSQL replication lag?

I am monitoring our PostgreSQL slave server's replication lag with the Python script below, which queries pg_current_xlog_location() on the master and compares it to pg_last_xlog_replay_location() on the slave. Lately, I have been receiving warning emails indicating that the replication lag is between 2k and 70k bytes.

What would be a reasonable expectation here? I assume it depends on the WAL buffer size and the checkpoint interval, but I am not sure exactly how to calculate it. Also, would I be better off comparing to pg_last_xlog_receive_location() on the slave?
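To make the comparison concrete: an xlog location is printed as two hexadecimal numbers separated by a slash, the upper and lower 32 bits of a 64-bit byte position, so the difference between two locations is a plain subtraction. The sketch below reproduces that arithmetic by hand and splits the total lag into the part not yet received by the slave and the part received but not yet replayed. The function names and the sample locations are hypothetical and only there for illustration.

```python
def lsn_to_bytes(lsn):
    """Convert an xlog location such as '0/3000148' to an absolute byte position."""
    high, low = lsn.strip().split('/')
    # The part before the slash holds the upper 32 bits, the part after it the lower 32 bits.
    return (int(high, 16) << 32) + int(low, 16)


def split_lag(master_lsn, receive_lsn, replay_lsn):
    """Return (transfer_lag, replay_lag) in bytes; positive values mean the slave is behind.

    transfer_lag: WAL written on the master but not yet received by the slave.
    replay_lag:   WAL received by the slave but not yet replayed.
    """
    master = lsn_to_bytes(master_lsn)
    received = lsn_to_bytes(receive_lsn)
    replayed = lsn_to_bytes(replay_lsn)
    return master - received, received - replayed


if __name__ == '__main__':
    # Made-up sample locations, not taken from a real server.
    print(split_lag('0/3000148', '0/3000100', '0/3000060'))  # (72, 160)
```

Comparing against the receive location would only measure the first number; comparing against the replay location, as the script does, measures the sum of both.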

P.S. I am also monitoring replication on the master server by comparing sent_location to replay_location in the pg_stat_replication view. Additionally, I check that the master server is in streaming mode. That monitor has never fired an alert...
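A master-side check along those lines could be sketched as follows. This is a simplified illustration, not the monitor actually in use: it assumes the query is run with psql -t -A -F '|' so that each standby comes back as one pipe-separated line, and the names MASTER_LAG_QUERY, parse_master_rows and find_problems are hypothetical.

```python
# Query intended for the master; sent_location and replay_location are the
# column names of pg_stat_replication referred to in the question.
MASTER_LAG_QUERY = (
    "SELECT client_addr, state, "
    "pg_xlog_location_diff(sent_location, replay_location) "
    "FROM pg_stat_replication"
)


def parse_master_rows(psql_output):
    """Parse unaligned, pipe-separated psql output into a list of dicts."""
    rows = []
    for line in psql_output.splitlines():
        line = line.strip()
        if not line:
            continue
        client_addr, state, lag = line.split('|')
        rows.append({
            'client_addr': client_addr,
            'state': state,
            # An empty field means the server returned NULL for the difference.
            'lag_bytes': float(lag) if lag else None,
        })
    return rows


def find_problems(rows, limit_bytes):
    """Return one message per standby that is not streaming or is over the limit."""
    problems = []
    if not rows:
        problems.append('no standby is connected')
    for row in rows:
        if row['state'] != 'streaming':
            problems.append('%s is in state %s' % (row['client_addr'], row['state']))
        elif row['lag_bytes'] is not None and row['lag_bytes'] > limit_bytes:
            problems.append('%s is behind by %d bytes' % (row['client_addr'], row['lag_bytes']))
    return problems
```

Because both locations come from a single query on the master, they are read at the same moment, which is one difference from the slave-side script that reads the master location first and the slave location afterwards.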

#!/usr/bin/python
# Replication lag check, run on the slave.
# The {{...}} placeholders are template variables substituted before deployment.
import subprocess
import sys

# Maximum tolerated lag, in bytes of WAL.
slaveXlogDiffLimitBytes = 128

try:
    # 't' when this server is a standby in recovery mode.
    # universal_newlines=True makes check_output return text rather than bytes.
    repModeRes = subprocess.check_output('psql -t -p {{postgresql_port}} -c "SELECT pg_is_in_recovery()"', shell=True, universal_newlines=True)
    isInRepMode = repModeRes.strip() == 't'

    # Current write position on the master.
    masterXlogLocationRes = subprocess.check_output('psql -t -p {{postgresql_port}} -h {{postgres_basebackup_host}} -U {{postgres_basebackup_user}} {{postgres_db_name}} -c "select pg_current_xlog_location();"', shell=True, universal_newlines=True)
    masterXlogLocationStr = masterXlogLocationRes.strip()

    # Master position minus slave replay position: positive when the slave is behind.
    slaveXlogDiffRes = subprocess.check_output('psql -t -p {{postgresql_port}} {{postgres_db_name}} -c "select pg_xlog_location_diff(\'' + masterXlogLocationStr + '\'::pg_lsn, pg_last_xlog_replay_location());"', shell=True, universal_newlines=True)
    slaveXlogDiffBytes = float(slaveXlogDiffRes.strip())
except subprocess.CalledProcessError as e:
    print("Error retrieving stats: {0}".format(e))
    sys.exit(1)

if not isInRepMode:
    print('Slave server is not in recovery mode')
    sys.exit(1)

if slaveXlogDiffBytes > slaveXlogDiffLimitBytes:
    print("Slave server replication is behind master by %f bytes" % slaveXlogDiffBytes)
    sys.exit(1)

print('All clear!')
sys.exit(0)
