Bug 136 - listen paths not being rebuilt correctly
Summary: listen paths not being rebuilt correctly
Status: RESOLVED FIXED
Alias: None
Product: Slony-I
Classification: Unclassified
Component: slon
Version: 2.0
Hardware: PC Linux
Importance: high normal
Assignee: Steve Singer
URL:
Duplicates: 129
Depends on:
Blocks:
 
Reported: 2010-06-22 17:03 UTC by Steve Singer
Modified: 2010-08-27 12:24 UTC

See Also:


Description Steve Singer 2010-06-22 17:03:27 UTC
I have a database (node 4) in my cluster that contains the following in
sl_listen, sl_set and sl_subscribe:

test4=# select * FROM _disorder_replica.sl_set;
 set_id | set_origin | set_locked |                         set_comment                          
--------+------------+------------+--------------------------------------------------------------
      1 |          1 |            | A replication set so boring no one thought to give it a name
(1 row)

test4=# select * FROM _disorder_replica.sl_subscribe ;
 sub_set | sub_provider | sub_receiver | sub_forward | sub_active 
---------+--------------+--------------+-------------+------------
       1 |            1 |            2 | t           | t
       1 |            2 |            5 | t           | t
       1 |            2 |            4 | t           | t
(3 rows)

 select * FROM _disorder_replica.sl_listen where li_origin=1;
 li_origin | li_provider | li_receiver 
-----------+-------------+-------------
         1 |           1 |           2
         1 |           2 |           5
         1 |           2 |           4
(3 rows)

The query issued by remoteListenThread_1 on node 4 is:
 remoteListenThread_1: select ev_origin, ev_seqno, ev_timestamp,        ev_snapshot,        "pg_catalog".txid_snapshot_xmin(ev_snapshot),        "pg_catalog".txid_snapshot_xmax(ev_snapshot),        ev_type,        ev_data1, ev_data2,        ev_data3, ev_data4,        ev_data5, ev_data6,        ev_data7, ev_data8 from "_disorder_replica".sl_event e where (e.ev_origin = '5' and e.ev_seqno > '5000000012') or (e.ev_origin = '2' and e.ev_seqno > '5000000020') order by e.ev_origin, e.ev_seqno limit 40


The slon for node 4 is not receiving events that originated at node 1.  To get into this situation my cluster is in the following state:

1===>2====>4
     \\
      5
I then issue a subscribe set so nodes 4 and 5 will receive data directly from node 1.

The slon for node 4 receives the 'subscribe' event via node 2 and then stops listening for events from node 1.
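
A minimal diagnostic sketch against the catalog shown above confirms that node 4's only listen path for node 1 events runs through node 2:

-- Sketch: which providers does node 4 listen to for node 1 origin events?
-- In the state above this returns a single row (1, 2, 4), i.e. node 4 only
-- hears about node 1 events via node 2.
select li_origin, li_provider, li_receiver
  from _disorder_replica.sl_listen
 where li_origin = 1 and li_receiver = 4;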
Comment 1 Steve Singer 2010-06-23 06:34:50 UTC
-- 2nd choice:
-- If we are subscribed to any set originating on this
-- event origin, we want to listen on all data providers
-- we use for this origin. We are a cascaded subscriber
-- for sets from this node.

Seems to indicate that the intention was to not listen on events from node 1 but always get them from node 2.  The issue is that if node 2 goes away (ie dies)

If node 2 is no longer accessible then the events originating from node 1 need to come from somewhere.

We could have node 4 add the sl_listen entry for node 1.  The problem then becomes that node 4 might receive a SYNC event directly from node 1 before it receives that sync event via node 2 (in normal circumstances), processing those SYNC events mean that we'd be receiving the data direct from the origin and not through node 2, this defeats the purpose of it being cascaded.

If we continue to ignore those syncs with the message "data provider 2 only confirmed up to ev_seqno 5000000150 for ev_origin 1" then node 4 will never get caught up enough to process the SUBSCRIBE message.
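
How far the data provider has confirmed the origin's events can be read from the confirmation catalog; a sketch, assuming the standard sl_confirm table (not shown in the output above):

-- Sketch: the highest node 1 event that node 2 (the data provider) has
-- confirmed.  Node 4 will not apply node 1 SYNCs beyond this point while
-- node 2 is its data provider.
select con_origin, con_received, max(con_seqno) as last_confirmed
  from _disorder_replica.sl_confirm
 where con_origin = 1 and con_received = 2
 group by con_origin, con_received;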

What I think we need to do is continue to NOT listen on the origin.  If node 2 goes away we need to issue a FAILOVER type of command that will tell node 4 that its provider is no longer node 2 but is now node 1.  The normal SUBSCRIBE command doesn't work here because it travels from the origin (node 1) to node 4 via node 2 in queue order, so it will never get processed.

At first glance we could:

1) Have a command where slonik tells node 4 directly to reconfigure its provider for set 1.
2) Have an event originate on the new provider that is faked to look like the next event so it will get processed.  I think this is a bad idea: even though we are 'failing' node 2, it is possible that node 4 is still processing events from node 2, so I don't see how you are going to safely find the next event id and also ensure that the 'real' event with that id doesn't get processed in the meantime.

I am thinking that slonik should have logic where, if this is a SUBSCRIBE SET and the receiver is already subscribed to the set via a different provider, it contacts the receiver directly and reconfigures it.
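
Purely as an illustration of the intended net effect (not the actual implementation, which would have to go through the Slony-I stored procedures and also rebuild the listen paths), the direct reconfiguration on node 4 would amount to something like:

-- Illustrative sketch only: switch node 4's provider for set 1 from the
-- failed node 2 over to node 1.  The real mechanism would be driven by
-- slonik, not a manual UPDATE, and must also regenerate sl_listen.
update _disorder_replica.sl_subscribe
   set sub_provider = 1
 where sub_set = 1
   and sub_receiver = 4;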
Comment 2 Steve Singer 2010-06-23 06:51:57 UTC
*** Bug 129 has been marked as a duplicate of this bug. ***
Comment 3 Jan Wieck 2010-07-07 09:54:39 UTC
It is actually intended to listen for events on the same node that is the data provider for a subscription.

In this case, nodes 4 and 5 are using node 2 as the data provider for a set originating on node 1.  If they were listening for events from node 1 anywhere other than on node 2, they could get those events BEFORE node 2 has finished replicating them.  They cannot process them at that time anyway.


Jan
Comment 4 Steve Singer 2010-07-30 08:57:00 UTC
The issues described in this bug can also show up in a case like this:

1==>3===>4
||  \\   
2     5   



FAILOVER 1===>4

Slonik executes failedNode2() on node 3.  This means that node 3 has a FAILOVER event in its sl_event with an origin of node 1.  That failover event needs to propagate to node 4.

The issue is that the FAILOVER command 'looks' like it is coming from node 1 but it is sitting on the queue at node 3.
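
A hypothetical way to see this on node 3 is to look at the events queued there with node 1 as their origin (the FAILOVER event among them); node 4 will only fetch these if it has a matching listen path:

-- Sketch, run on node 3: events recorded with node 1 as origin that still
-- need to propagate onward.  Node 4 will only pull these from node 3 if an
-- sl_listen row with li_origin = 1, li_provider = 3, li_receiver = 4 exists.
select ev_origin, ev_seqno, ev_type
  from _disorder_replica.sl_event
 where ev_origin = 1
 order by ev_seqno;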

The sl_listen configuration at node 4 looks as follows (after failedNode() has been executed on node 4, the backup node):
select * FROM _disorder_replica.sl_listen where li_receiver=4;
 li_origin | li_provider | li_receiver 
-----------+-------------+-------------
         2 |           2 |           4
         3 |           3 |           4
         5 |           5 |           4
         2 |           5 |           4
         3 |           5 |           4
         3 |           2 |           4
         5 |           2 |           4
         2 |           3 |           4
         5 |           3 |           4
         1 |           2 |           4

 select * FROM _disorder_replica.sl_subscribe ;
 sub_set | sub_provider | sub_receiver | sub_forward | sub_active 
---------+--------------+--------------+-------------+------------
       1 |            2 |            4 | t           | t
       1 |            4 |            2 | t           | t
       1 |            4 |            3 | t           | t
       1 |            4 |            5 | t           | t
(4 rows)


This is because failedNode() has already modified sl_subscribe so that nodes 2 and 3 are receiving data from node 4.

Because node 4 has an sl_subscribe row saying it gets the set from node 2 (and set 1 still has node 1 as its origin), node 4 stops listening for node 1 origin events from node 3.
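
A quick diagnostic sketch against the sl_listen contents above makes the gap explicit:

-- Sketch: is there a listen path letting node 4 pick up node 1 origin
-- events from node 3, where the FAILOVER event is queued?  Against the
-- sl_listen rows above this returns zero rows; node 4's only entry for
-- origin 1 is (1, 2, 4), i.e. via the failed provider node 2.
select * from _disorder_replica.sl_listen
 where li_origin = 1 and li_provider = 3 and li_receiver = 4;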
Comment 5 Steve Singer 2010-08-09 07:56:49 UTC
Patches posted at 
http://lists.slony.info/pipermail/slony1-patches/2010-August/000110.html
Comment 6 Christopher Browne 2010-08-18 09:29:26 UTC
I have posted a suggested revision on top of Steve's patch:

http://github.com/cbbrowne/slony1-engine/commit/18d9a819a031269e336697e1fff291e1cbcd3667

It looks OK to me.
Comment 7 Steve Singer 2010-08-25 11:38:34 UTC
Go!  (Commit)
Comment 8 Steve Singer 2010-08-27 12:24:42 UTC
This has been committed to REL_2_0_STABLE and master.