I have set up streaming replication between my PostgreSQL servers, running version 9.2.6. Replication seems to be working fine. I am monitoring it with Nagios using two checks:
- Log_delay and
- Byte_lag
I very frequently get a critical alert for log_delay while, at the same time, byte_lag does not raise any alert. log_delay returns to OK after 1 or 2 minutes. Can anyone suggest whether I am missing something in my setup?
The queries are given below.
Log_delay
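-- Runs on the standby. Returns 0 when everything received has been replayed,
-- otherwise the seconds elapsed since the last replayed transaction.
SELECT
    CASE WHEN pg_last_xlog_receive_location() = pg_last_xlog_replay_location()
         THEN 0
         ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
    END AS log_delay;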
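To see which branch of the CASE is taken at the moment an alert fires, the individual inputs can be inspected on the standby. This is only a diagnostic sketch; the column aliases are my own naming and are not part of the Nagios check.

```sql
-- Diagnostic sketch, run on the standby: shows the raw values behind log_delay.
SELECT pg_last_xlog_receive_location()  AS receive_location,
       pg_last_xlog_replay_location()   AS replay_location,
       pg_last_xact_replay_timestamp()  AS last_replay_timestamp,
       now()                            AS checked_at;
```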
Byte_lag
-- Runs on the primary. In 9.2 each logical xlog ID covers 255 segments of
-- 16 MB, hence the factor 255 * 16 ^ 6 (0xFF000000 bytes).
SELECT sent_offset - ( replay_offset - (sent_xlog - replay_xlog) * 255 * 16 ^ 6 )
       AS byte_lag
FROM (
    SELECT client_addr,
           ('x' || lpad(split_part(sent_location,   '/', 1), 8, '0'))::bit(32)::bigint
               AS sent_xlog,
           ('x' || lpad(split_part(replay_location, '/', 1), 8, '0'))::bit(32)::bigint
               AS replay_xlog,
           ('x' || lpad(split_part(sent_location,   '/', 2), 8, '0'))::bit(32)::bigint
               AS sent_offset,
           ('x' || lpad(split_part(replay_location, '/', 2), 8, '0'))::bit(32)::bigint
               AS replay_offset
    FROM pg_stat_replication ) AS s;
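For comparison, I believe the same figure can be obtained more directly with pg_xlog_location_diff(), which as far as I know was added in 9.2. A minimal sketch, assuming it is run on the primary like the query above. I have not verified that it matches the hand-written arithmetic in every case.

```sql
-- Sketch only: byte lag per standby, run on the primary.
-- pg_xlog_location_diff(a, b) returns the difference a - b in bytes.
SELECT client_addr,
       pg_xlog_location_diff(sent_location, replay_location) AS byte_lag
FROM pg_stat_replication;
```

Unlike the query above, this also returns client_addr, so the lag can be attributed to a specific standby.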