I have Skytools set up for PostgreSQL replication. It keeps failing intermittently. When I check londiste status I get the following error:
$ londiste /etc/skytools/mydb_data_0.ini status
Queue: mydb_data Local node: mydb_data_0
mydb_data (root)
| Tables: 18/0/0
| Lag: 9s, Tick: 2620740
+--: mydb_data_0 (leaf)
| Tables: 18/0/0
| Lag: 15m8s, Tick: 2620670
| ERR: mydb_data_0: Lost position: batch 2620669..2620669, dst has 2620670
+--: mydb_data_1 (leaf)
Tables: 18/0/0
Lag: 9s, Tick: 2620740
I really don't understand what's going wrong. I get the same error message in the postgres logs as well:
Exception: Lost position: batch 2620669..2620669, dst has 2620670
I found this article on solving the error I'm getting. It says you have to use the --reset
option on the worker to reset the queue position on the remote site and then issue wait-sync
to get the table queue moving again.
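As I understand the article, the sequence would look roughly like the sketch below. The function name reset_and_wait_sync and the LONDISTE variable are my own, made up for illustration; only worker --reset and wait-sync come from the article. I have not confirmed whether the worker has to be started again between the two steps.

```shell
# Sketch only: reset_and_wait_sync and LONDISTE are hypothetical names,
# not part of Skytools. LONDISTE defaults to the "londiste" command.
LONDISTE="${LONDISTE:-londiste}"

reset_and_wait_sync() {
    conf="$1"    # path to the node's .ini file

    # Step 1: reset the queue position tracked on the destination side.
    "$LONDISTE" "$conf" worker --reset || return 1

    # Assumption: the article does not say whether the worker must be
    # restarted here before waiting; add that step if it turns out to be needed.

    # Step 2: wait until the tables are back in sync.
    "$LONDISTE" "$conf" wait-sync
}

# Example call (not executed here):
#   reset_and_wait_sync /etc/skytools/mydb_data_0.ini
```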
So I did this,
$ londiste /etc/skytools/mytestdb_data_0.ini worker --reset
Ignoring stale pidfile
2016-12-23 17:00:34,278 15245 INFO Resetting queue tracking on dst side
The queue resets successfully, but when I check londiste status again I now get a different error:
$ londiste /etc/skytools/mydb_data_0.ini status
Queue: mydb_data Local node: mydb_data_0
mydb_data (root)
| Tables: 18/0/0
| Lag: 9s, Tick: 2620740
+--: mydb_data_0 (leaf)
| Tables: 18/0/0
| Lag: 15m8s, Tick: 2620670
| ERR: mydb_data_0: [ev_id=84594950,ev_txid=702851528] duplicate key value violates unique constraint "dmn_pkey"
+--: mydb_data_1 (leaf)
Tables: 18/0/0
Lag: 9s, Tick: 2620740
I don't know what is causing this to fail. Can you please guide me on this?
PostgreSQL version: 9.5, Skytools version: 3.2