Bug 92 - Events and Confirmations are not propagating
Summary: Events and Confirmations are not propagating
Status: RESOLVED FIXED
Alias: None
Product: Slony-I
Classification: Unclassified
Component: slon (show other bugs)
Version: devel
Hardware: PC Linux
: high critical
Assignee: Slony Bugs List
URL:
Depends on: 127
Blocks:
  Show dependency tree
 
Reported: 2009-05-18 02:35 UTC by Cyril Scetbon
Modified: 2010-08-11 14:07 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Cyril Scetbon 2009-05-18 02:35:04 UTC
I use test_slony_state and see some informations like :

Check of event info
---------------------------------------------------
Problem : Events not propagating to node 2
Problem : Events not propagating to node 4
Problem : Events not propagating to node 3

Check of sl_confirm aging
---------------------------------------------------
Confirmations not propagating from 2 to 1
Confirmations not propagating from 2 to 3
Confirmations not propagating from 2 to 4
Confirmations not propagating from 3 to 1
Confirmations not propagating from 3 to 2
Confirmations not propagating from 3 to 4
Confirmations not propagating from 4 to 1
Confirmations not propagating from 4 to 2
Confirmations not propagating from 4 to 3

You can see the results on one of my databases :

- for Confirmations

select con_origin, con_received, min(con_seqno) as minseq, max(con_seqno) as maxseq, date_trunc('minutes', min(now()-con_timestamp)) as age1, date_trunc('minutes', max(now()-con_timestamp)) as age2, min(now() - con_timestamp) > '00:30:00' as tooold from _pns_slony_voila_archi_0.sl_confirm group by con_origin, con_received order by con_origin, con_received;
con_origin | con_received | minseq  | maxseq  |   age1   |   age2   | tooold ------------+--------------+---------+---------+----------+----------+--------
       101 |          102 | 4464029 | 4464792 | 00:00:00 | 00:16:00 | f
       101 |          103 | 4464027 | 4464792 | 00:00:00 | 00:16:00 | f
       101 |          104 | 4464024 | 4464792 | 00:00:00 | 00:16:00 | f
       102 |          101 |      29 |      29 | 03:39:00 | 03:39:00 | t
       102 |          103 |      29 |      29 | 03:39:00 | 03:39:00 | t
       102 |          104 |      29 |      29 | 03:39:00 | 03:39:00 | t
       103 |          101 |      62 |      62 | 03:39:00 | 03:39:00 | t
       103 |          102 |      62 |      62 | 03:39:00 | 03:39:00 | t
       103 |          104 |      62 |      62 | 03:39:00 | 03:39:00 | t
       104 |          101 |      57 |      57 | 03:39:00 | 03:39:00 | t
       104 |          102 |      57 |      57 | 03:39:00 | 03:39:00 | t
       104 |          103 |      57 |      57 | 03:39:00 | 03:39:00 | t

- for Events

select ev_origin, min(ev_seqno), max(ev_seqno),
        date_trunc('minutes', min(now() - ev_timestamp)),
        date_trunc('minutes', max(now() - ev_timestamp)),
        min(now() - ev_timestamp) > '00:30:00' as agehi
    from _pns_slony_voila_archi_0.sl_event group by ev_origin;
ev_origin |   min   |   max   | date_trunc | date_trunc | agehi
-----------+---------+---------+------------+------------+-------
      103 |      62 |      62 | 03:49:00   | 03:49:00   | t
      104 |      57 |      57 | 03:49:00   | 03:49:00   | t
      102 |      29 |      29 | 03:49:00   | 03:49:00   | t
      101 | 4464493 | 4465346 | 00:00:00   | 00:14:00   | f

If I restart every slons processes new events are generated on these nodes but still not correctly propagated. However the last event on each node seems to be propagated thanks to the restart stage (see below)

After the restart I have something like :

select ev_origin, min(ev_seqno), max(ev_seqno),
       date_trunc('minutes', min(now() - ev_timestamp)),
       date_trunc('minutes', max(now() - ev_timestamp)),
       min(now() - ev_timestamp) > '00:30:00' as agehi
   from _pns_slony_voila_archi_0.sl_event group by ev_origin;
ev_origin |   min   |   max   | date_trunc | date_trunc | agehi
-----------+---------+---------+------------+------------+-------
      103 |      65 |      66 | 00:01:00   | 00:47:00   | f
      104 |      60 |      61 | 00:01:00   | 00:47:00   | f
      102 |      33 |      34 | 00:01:00   | 00:47:00   | f
      101 | 4541581 | 4542350 | 00:00:00   | 00:13:00   | f
(4 rows)

So, another event is generated (for example, 66 from node 103)

log extract :
2009-04-28 15:51:53 CEST DEBUG2 remoteWorkerThread_102: SYNC 33 processing
2009-04-28 15:51:53 CEST DEBUG1 remoteWorkerThread_102: no sets need syncing for this event
2009-04-28 15:51:53 CEST DEBUG2 remoteListenThread_104: queue event 102,33 SYNC
2009-04-28 15:51:53 CEST DEBUG2 remoteWorker_event: event 102,33 ignored - duplicate
2009-04-28 15:51:53 CEST DEBUG2 remoteListenThread_104: queue event 104,60 SYNC
2009-04-28 15:51:53 CEST DEBUG2 remoteWorker_event: event 104,60 ignored - duplicate
2009-04-28 15:51:53 CEST DEBUG2 remoteWorkerThread_104: SYNC 60 processing
2009-04-28 15:51:53 CEST DEBUG1 remoteWorkerThread_104: no sets need syncing for this event
2009-04-28 15:51:53 CEST DEBUG2 remoteWorkerThread_103: SYNC 65 processing
2009-04-28 15:51:53 CEST DEBUG1 remoteWorkerThread_103: no sets need syncing for this event

after a few minutes I have :

select ev_origin, min(ev_seqno), max(ev_seqno),
       date_trunc('minutes', min(now() - ev_timestamp)),
       date_trunc('minutes', max(now() - ev_timestamp)),
       min(now() - ev_timestamp) > '00:30:00' as agehi
   from _pns_slony_voila_archi_0.sl_event group by ev_origin;
ev_origin |   min   |   max   | date_trunc | date_trunc | agehi
-----------+---------+---------+------------+------------+-------
      103 |      66 |      66 | 00:12:00   | 00:12:00   | f
      104 |      61 |      61 | 00:12:00   | 00:12:00   | f
      102 |      34 |      34 | 00:12:00   | 00:12:00   | f
      101 | 4542310 | 4543008 | 00:00:00   | 00:11:00   | f

log extract :
2009-04-28 16:37:49 CEST DEBUG2 remoteWorkerThread_103: SYNC 66 processing
2009-04-28 16:37:49 CEST DEBUG1 remoteWorkerThread_103: no sets need syncing for this event
2009-04-28 16:37:49 CEST DEBUG2 remoteWorkerThread_104: SYNC 61 processing
2009-04-28 16:37:49 CEST DEBUG1 remoteWorkerThread_104: no sets need syncing for this event
2009-04-28 16:37:53 CEST DEBUG2 remoteWorkerThread_102: SYNC 34 processing
2009-04-28 16:37:53 CEST DEBUG1 remoteWorkerThread_102: no sets need syncing for this event

Precedent events (65,60 and 33) have been cleared, and a new one per node has been generated. But, the error continues and after 30 minutes I have :

select ev_origin, min(ev_seqno), max(ev_seqno),
       date_trunc('minutes', min(now() - ev_timestamp)),
       date_trunc('minutes', max(now() - ev_timestamp)),
       min(now() - ev_timestamp) > '00:30:00' as agehi
   from _pns_slony_voila_archi_0.sl_event group by ev_origin;
ev_origin |   min   |   max   | date_trunc | date_trunc | agehi
-----------+---------+---------+------------+------------+-------
      103 |      66 |      66 | 00:41:00   | 00:41:00   | t
      104 |      61 |      61 | 00:41:00   | 00:41:00   | t
      102 |      34 |      34 | 00:41:00   | 00:41:00   | t
      101 | 4543634 | 4544715 | 00:00:00   | 00:18:00   | f
Comment 1 Jean-Armel Luce 2009-05-29 04:36:36 UTC
I have the same error messages in my servers.

Does anybody know how I could correct this problem ?
Comment 2 Steve Singer 2010-06-10 10:26:10 UTC
I think this is the same issue as discussed in bug 127
Comment 3 Steve Singer 2010-08-11 14:07:31 UTC
I am marking this as fixed since bug 127 is now fixed.