I use test_slony_state and see some informations like : Check of event info --------------------------------------------------- Problem : Events not propagating to node 2 Problem : Events not propagating to node 4 Problem : Events not propagating to node 3 Check of sl_confirm aging --------------------------------------------------- Confirmations not propagating from 2 to 1 Confirmations not propagating from 2 to 3 Confirmations not propagating from 2 to 4 Confirmations not propagating from 3 to 1 Confirmations not propagating from 3 to 2 Confirmations not propagating from 3 to 4 Confirmations not propagating from 4 to 1 Confirmations not propagating from 4 to 2 Confirmations not propagating from 4 to 3 You can see the results on one of my databases : - for Confirmations select con_origin, con_received, min(con_seqno) as minseq, max(con_seqno) as maxseq, date_trunc('minutes', min(now()-con_timestamp)) as age1, date_trunc('minutes', max(now()-con_timestamp)) as age2, min(now() - con_timestamp) > '00:30:00' as tooold from _pns_slony_voila_archi_0.sl_confirm group by con_origin, con_received order by con_origin, con_received; con_origin | con_received | minseq | maxseq | age1 | age2 | tooold ------------+--------------+---------+---------+----------+----------+-------- 101 | 102 | 4464029 | 4464792 | 00:00:00 | 00:16:00 | f 101 | 103 | 4464027 | 4464792 | 00:00:00 | 00:16:00 | f 101 | 104 | 4464024 | 4464792 | 00:00:00 | 00:16:00 | f 102 | 101 | 29 | 29 | 03:39:00 | 03:39:00 | t 102 | 103 | 29 | 29 | 03:39:00 | 03:39:00 | t 102 | 104 | 29 | 29 | 03:39:00 | 03:39:00 | t 103 | 101 | 62 | 62 | 03:39:00 | 03:39:00 | t 103 | 102 | 62 | 62 | 03:39:00 | 03:39:00 | t 103 | 104 | 62 | 62 | 03:39:00 | 03:39:00 | t 104 | 101 | 57 | 57 | 03:39:00 | 03:39:00 | t 104 | 102 | 57 | 57 | 03:39:00 | 03:39:00 | t 104 | 103 | 57 | 57 | 03:39:00 | 03:39:00 | t - for Events select ev_origin, min(ev_seqno), max(ev_seqno), date_trunc('minutes', min(now() - ev_timestamp)), date_trunc('minutes', max(now() - ev_timestamp)), min(now() - ev_timestamp) > '00:30:00' as agehi from _pns_slony_voila_archi_0.sl_event group by ev_origin; ev_origin | min | max | date_trunc | date_trunc | agehi -----------+---------+---------+------------+------------+------- 103 | 62 | 62 | 03:49:00 | 03:49:00 | t 104 | 57 | 57 | 03:49:00 | 03:49:00 | t 102 | 29 | 29 | 03:49:00 | 03:49:00 | t 101 | 4464493 | 4465346 | 00:00:00 | 00:14:00 | f If I restart every slons processes new events are generated on these nodes but still not correctly propagated. However the last event on each node seems to be propagated thanks to the restart stage (see below) After the restart I have something like : select ev_origin, min(ev_seqno), max(ev_seqno), date_trunc('minutes', min(now() - ev_timestamp)), date_trunc('minutes', max(now() - ev_timestamp)), min(now() - ev_timestamp) > '00:30:00' as agehi from _pns_slony_voila_archi_0.sl_event group by ev_origin; ev_origin | min | max | date_trunc | date_trunc | agehi -----------+---------+---------+------------+------------+------- 103 | 65 | 66 | 00:01:00 | 00:47:00 | f 104 | 60 | 61 | 00:01:00 | 00:47:00 | f 102 | 33 | 34 | 00:01:00 | 00:47:00 | f 101 | 4541581 | 4542350 | 00:00:00 | 00:13:00 | f (4 rows) So, another event is generated (for example, 66 from node 103) log extract : 2009-04-28 15:51:53 CEST DEBUG2 remoteWorkerThread_102: SYNC 33 processing 2009-04-28 15:51:53 CEST DEBUG1 remoteWorkerThread_102: no sets need syncing for this event 2009-04-28 15:51:53 CEST DEBUG2 remoteListenThread_104: queue event 102,33 SYNC 2009-04-28 15:51:53 CEST DEBUG2 remoteWorker_event: event 102,33 ignored - duplicate 2009-04-28 15:51:53 CEST DEBUG2 remoteListenThread_104: queue event 104,60 SYNC 2009-04-28 15:51:53 CEST DEBUG2 remoteWorker_event: event 104,60 ignored - duplicate 2009-04-28 15:51:53 CEST DEBUG2 remoteWorkerThread_104: SYNC 60 processing 2009-04-28 15:51:53 CEST DEBUG1 remoteWorkerThread_104: no sets need syncing for this event 2009-04-28 15:51:53 CEST DEBUG2 remoteWorkerThread_103: SYNC 65 processing 2009-04-28 15:51:53 CEST DEBUG1 remoteWorkerThread_103: no sets need syncing for this event after a few minutes I have : select ev_origin, min(ev_seqno), max(ev_seqno), date_trunc('minutes', min(now() - ev_timestamp)), date_trunc('minutes', max(now() - ev_timestamp)), min(now() - ev_timestamp) > '00:30:00' as agehi from _pns_slony_voila_archi_0.sl_event group by ev_origin; ev_origin | min | max | date_trunc | date_trunc | agehi -----------+---------+---------+------------+------------+------- 103 | 66 | 66 | 00:12:00 | 00:12:00 | f 104 | 61 | 61 | 00:12:00 | 00:12:00 | f 102 | 34 | 34 | 00:12:00 | 00:12:00 | f 101 | 4542310 | 4543008 | 00:00:00 | 00:11:00 | f log extract : 2009-04-28 16:37:49 CEST DEBUG2 remoteWorkerThread_103: SYNC 66 processing 2009-04-28 16:37:49 CEST DEBUG1 remoteWorkerThread_103: no sets need syncing for this event 2009-04-28 16:37:49 CEST DEBUG2 remoteWorkerThread_104: SYNC 61 processing 2009-04-28 16:37:49 CEST DEBUG1 remoteWorkerThread_104: no sets need syncing for this event 2009-04-28 16:37:53 CEST DEBUG2 remoteWorkerThread_102: SYNC 34 processing 2009-04-28 16:37:53 CEST DEBUG1 remoteWorkerThread_102: no sets need syncing for this event Precedent events (65,60 and 33) have been cleared, and a new one per node has been generated. But, the error continues and after 30 minutes I have : select ev_origin, min(ev_seqno), max(ev_seqno), date_trunc('minutes', min(now() - ev_timestamp)), date_trunc('minutes', max(now() - ev_timestamp)), min(now() - ev_timestamp) > '00:30:00' as agehi from _pns_slony_voila_archi_0.sl_event group by ev_origin; ev_origin | min | max | date_trunc | date_trunc | agehi -----------+---------+---------+------------+------------+------- 103 | 66 | 66 | 00:41:00 | 00:41:00 | t 104 | 61 | 61 | 00:41:00 | 00:41:00 | t 102 | 34 | 34 | 00:41:00 | 00:41:00 | t 101 | 4543634 | 4544715 | 00:00:00 | 00:18:00 | f
I have the same error messages in my servers. Does anybody know how I could correct this problem ?
I think this is the same issue as discussed in bug 127
I am marking this as fixed since bug 127 is now fixed.