Bug 213 - failover while slon is not running
Summary: failover while slon is not running
Status: NEW
Alias: None
Product: Slony-I
Classification: Unclassified
Component: slon
Version: devel
Hardware: PC Linux
Importance: low enhancement
Assignee: Slony Bugs List
URL:
Depends on:
Blocks:
 
Reported: 2011-05-13 11:24 UTC by Steve Singer
Modified: 2011-05-13 11:24 UTC
Description Steve Singer 2011-05-13 11:24:12 UTC
This bug was introduced in commits bfa8e601fe7ba1bd91a053901426d4f7195c53a0 (2.1.0) and 60566590d683b85733404ef290e6c1823c4c014c (2.0.5).

If a failover command is executed while the slon for the backup node (say node 2) is not running, the most-ahead node (say node 3) will have a FAILOVER_SET event generated with ev_origin=1 (the failing node).

For the failover to finish, that event needs to be processed on node 2. When the slon for node 2 is later started, it sees that no_active=false for node 1 in sl_node (this change was made in the commits referenced above). Since node 1 is marked inactive, no remoteWorkerThread_1 is started, so the slon for node 2 will never process the FAILOVER_SET event, because that event has ev_origin=1.
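
To check whether a cluster is in this state, queries along the following lines may help. This is only a sketch: the schema name _mycluster is a placeholder (Slony-I keeps its tables in a schema named "_" plus the cluster name), and the node numbers follow the example above.

```sql
-- Run on node 2: see how the failed node (node 1) is marked.
-- "_mycluster" is a placeholder; substitute "_" + your cluster name.
SELECT no_id, no_active
  FROM _mycluster.sl_node
 ORDER BY no_id;

-- Run on node 3 (the most-ahead node): look for the FAILOVER_SET
-- event that still has to be processed on node 2.
SELECT ev_origin, ev_seqno, ev_type
  FROM _mycluster.sl_event
 WHERE ev_origin = 1
   AND ev_type = 'FAILOVER_SET';
```

If node 2 shows no_active=false for node 1 while node 3 still holds the event, the situation matches the one described here.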


As a workaround, if you get into this situation, you can:

manually (with psql) set no_active=true for the failed node on node 2, then start the slon for node 2. It will now have a remoteWorkerThread_1 and will process the FAILOVER_SET event.
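
A minimal sketch of that manual step, assuming the cluster schema is _mycluster (a placeholder for "_" plus your cluster name) and the failed node is node 1 as in the example above. Run it with psql against node 2's database while the slon for node 2 is still stopped:

```sql
BEGIN;

-- Mark the failed node (node 1) as active again in node 2's copy of sl_node,
-- so that the slon for node 2 starts a remoteWorkerThread_1 on startup.
UPDATE _mycluster.sl_node
   SET no_active = true
 WHERE no_id = 1;

-- Confirm the change before committing.
SELECT no_id, no_active
  FROM _mycluster.sl_node
 WHERE no_id = 1;

COMMIT;
```

After committing, start the slon for node 2 as usual.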

Longer term, we probably need to split a node's inactive status in two: one status used for rebuilding listen paths and for waiting, and a separate one used for deciding whether slon starts worker threads.