Monitoring PostgreSQL Replication

After I have built my postgres hosts using Vagrant, I’m doing the following tests in order to check my replication:

pg_stat_replication view

Run the following statement on the master / primary host:

postgres=# SELECT * FROM pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag
| flush_lag | replay_lag | sync_priority | sync_state
-------+----------+------------+------------------+-------------+------------------------------+-------------+-------------------------------+--------------+-----------+-----------+-----------+-----------+------------+-----------
+-----------+------------+---------------+------------
19382 | 16384 | replicator | walreceiver | 192.168.0.3 | postgres-standby.localdomain | 48955 | 2019-01-07 12:04:26.588984+00 | | streaming | 0/4000000 | 0/4000000 | 0/4000000 | 0/4000000 |
| | | 0 | async
(1 row)
postgres=#

This view offers a lot of interesting information:


pid: Process id of walsender process
usesysid: OID of user which is used for Streaming replication.
usename: Name of user which is used for Streaming replication
application_name: Application name connected to master
client_addr: Address of standby/streaming replication
client_hostname: Hostname of standby.
client_port: TCP port number on which standby communicating with WAL sender
backend_start: Start time when SR connected to Master.
state: Current WAL sender state i.e streaming
sent_lsn: Last transaction location sent to standby.
write_lsn: Last transaction written on disk at standby
flush_lsn: Last transaction flush on disk at standby.
replay_lsn: Last transaction flush on disk at standby.
sync_priority: Priority of standby server being chosen as synchronous standby
sync_state: Sync State of standby (is it async or synchronous).

WAL sender/receiver processes running on the servers.

On Master / Primary:

[Mon Jan 07 12:34:28 postgres@postgres-primary:~ ] $ ps aux |grep postgres
postgres 19382 0.0 0.5 397088 10516 ? Ss 12:04 0:00 postgres: wal sender process replicator postgres-standby.localdomain(48955) streaming 0/4000000
root 19430 0.0 0.4 241076 7784 pts/0 S 12:21 0:00 sudo su - postgres
root 19432 0.0 0.2 191936 4648 pts/0 S 12:21 0:00 su - postgres
postgres 19433 0.0 0.1 115448 3468 pts/0 S 12:21 0:00 -bash
postgres 19475 0.0 0.2 155372 3996 pts/0 R+ 12:34 0:00 ps aux
postgres 19476 0.0 0.1 112724 2404 pts/0 S+ 12:34 0:00 grep --color=auto postgres
postgres 31163 0.0 1.2 396244 22984 ? S 08:10 0:00 /usr/pgsql-10/bin/postgres -D /test03/pgdata/data
postgres 31164 0.0 0.2 249048 5108 ? Ss 08:10 0:00 postgres: logger process
postgres 31166 0.0 0.6 396244 10908 ? Ss 08:10 0:00 postgres: checkpointer process
postgres 31167 0.0 0.2 396244 5000 ? Ss 08:10 0:00 postgres: writer process
postgres 31168 0.0 0.5 396244 10324 ? Ss 08:10 0:00 postgres: wal writer process
postgres 31169 0.0 0.4 396660 8368 ? Ss 08:10 0:00 postgres: autovacuum launcher process
postgres 31170 0.0 0.2 251168 4168 ? Ss 08:10 0:00 postgres: archiver process last was 000000010000000000000003
postgres 31171 0.0 0.3 251168 5520 ? Ss 08:10 0:00 postgres: stats collector process
postgres 31172 0.0 0.4 396536 7316 ? Ss 08:10 0:00 postgres: bgworker: logical replication launcher
[Mon Jan 07 12:34:32 postgres@postgres-primary:~ ] $

On Standby / Slave:

[Mon Jan 07 12:23:33 vagrant@postgres-standby:~ ] $ ps aux |grep postgres
vagrant 19073 0.0 0.1 112720 2408 pts/0 S+ 12:34 0:00 grep --color=auto postgres
postgres 30580 0.0 1.2 396244 22608 ? S 12:04 0:00 /usr/pgsql-10/bin/postgres -D /test03/pgdata/data
postgres 30581 0.0 0.2 249048 5108 ? Ss 12:04 0:00 postgres: logger process
postgres 30582 0.0 0.3 396340 6956 ? Ss 12:04 0:00 postgres: startup process recovering 000000010000000000000003
postgres 30584 0.0 0.2 396244 5000 ? Ss 12:04 0:00 postgres: checkpointer process
postgres 30585 0.0 0.2 396244 5064 ? Ss 12:04 0:00 postgres: writer process
postgres 30586 0.0 0.2 251168 4232 ? Ss 12:04 0:00 postgres: stats collector process
postgres 30594 0.0 0.5 403380 9576 ? Ss 12:04 0:01 postgres: wal receiver process streaming 0/4000000
[Mon Jan 07 12:34:39 vagrant@postgres-standby:~ ] $

One way of checking how up to date is our replication is by checking the amount of WAL records generated in the primary, but not yet applied in the standby.

On Primary / Master:

[Mon Jan 07 12:34:48 postgres@postgres-primary:~ ] $ psql
psql (10.6)
Type "help" for help.
postgres=# SELECT pg_current_wal_lsn();
pg_current_wal_lsn
0/4000000
(1 row)
postgres=#

On Standby / Slave:

 [Mon Jan 07 12:37:19 postgres@postgres-standby:~ ] $ psql
psql (10.6)
Type "help" for help.
postgres=# SELECT pg_last_wal_receive_lsn();
pg_last_wal_receive_lsn
0/4000000
(1 row)
postgres=#

postgres=# SELECT pg_last_wal_replay_lsn();
pg_last_wal_replay_lsn
0/4000000
(1 row)
postgres=#

We can use the following query to get the lag in seconds.

postgres=# SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
postgres-# THEN 0
postgres-# ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
postgres-# END AS log_delay;
log_delay
0
(1 row)
postgres=#

Leave Comment

Your email address will not be published. Required fields are marked *