Bug 111 - node -1 not found in runtime configuration
Summary: node -1 not found in runtime configuration
Status: NEW
Alias: None
Product: Slony-I
Classification: Unclassified
Component: slon
Version: devel
Hardware: PC Linux
Importance: medium normal
Deadline: 2010-02-04
Assignee: Slony Bugs List
URL:
Depends on:
Blocks: 10
Reported: 2010-02-01 00:14 UTC by rezuser
Modified: 2010-08-16 07:51 UTC

Description rezuser 2010-02-01 00:14:07 UTC
We get this error very often. One common scenario is as follows.

We configured master and slave database clusters on two separate servers. In the master database we created two replication sets, one for tables and another for sequences, then merged the sequence set into the table set. After that we started the slon daemon and got this error. There is no way to recover short of restoring the whole cluster. Although we subscribe, no replication occurs.

remoteWorkerThread_1: node -1 not found in runtime configuration.
2010-01-25 15:45:18 IST WARN   remoteWorkerThread_1: data copy for set 1 failed - sleep 60 seconds

Please explain the cause of this problem, or at least how to overcome it without restoring the whole cluster.
Comment 1 Steve Singer 2010-04-23 10:02:28 UTC
See
http://lists.slony.info/pipermail/slony1-general/2010-April/010574.html

What we think is happening is that the subscription information for the set on the subscriber is deleted (i.e. by an UNSUBSCRIBE SET, though a MERGE SET might be similar?) before the ENABLE SUBSCRIPTION is processed by the slon. By the time the event is finally processed, the row in sl_subscription has already been deleted.
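
A toy Python model of the suspected race, as a sketch only. It is not slon's actual code: the function names, the dictionary standing in for sl_subscription, and the -1 sentinel lookup are all assumptions made for illustration. It shows how a queued event that is processed after its subscription row has been deleted would end up looking for node -1.

```python
# Toy model only: all names are hypothetical and do not mirror slon's C internals.
NO_PROVIDER = -1  # assumed sentinel for "no subscription row found"

def find_provider(sl_subscription, set_id, receiver):
    """Return the provider node for (set, receiver), or -1 if the row is gone."""
    return sl_subscription.get((set_id, receiver), NO_PROVIDER)

def process_enable_subscription(sl_subscription, known_nodes, set_id, receiver):
    """Process a queued enable-subscription event against the current state."""
    provider = find_provider(sl_subscription, set_id, receiver)
    if provider not in known_nodes:
        return f"node {provider} not found in runtime configuration"
    return f"copying set {set_id} from node {provider}"

# Set 1 is subscribed by receiver node 2, with node 1 as provider.
sl_subscription = {(1, 2): 1}
known_nodes = {1, 2}

# The subscription row is removed before the queued event is processed.
del sl_subscription[(1, 2)]

result = process_enable_subscription(sl_subscription, known_nodes, 1, 2)
print(result)
```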
Comment 2 Jan Wieck 2010-06-09 13:39:39 UTC
Changed the version to devel because actually fixing this requires new features.

UNSUBSCRIBE SET should continue to be issued against the subscriber. If the event originated from the set origin instead, the subscriber would have to crawl through the entire backlog before it could finally unsubscribe. That is a waste.

On a node -1 error, the processing of ENABLE_EVENT should simply confirm the event, on the assumption that the subscription was canceled via UNSUBSCRIBE.
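
The proposed handling could look roughly like the following Python sketch. Everything here is hypothetical (the handler name, the dictionary standing in for sl_subscription, the list of confirmed events); it only illustrates the idea of confirming the event rather than retrying forever when the subscription row is missing.

```python
# Sketch of the proposed behaviour; names are illustrative, not slon's API.
def handle_enable_event(sl_subscription, confirmed, event_id, set_id, receiver):
    """Confirm the event even if the subscription no longer exists."""
    provider = sl_subscription.get((set_id, receiver), -1)
    if provider == -1:
        # Assume the subscription was canceled via UNSUBSCRIBE:
        # skip the data copy and just confirm the event.
        confirmed.append(event_id)
        return "confirmed without copy"
    # Normal path: copy the set from the provider, then confirm.
    confirmed.append(event_id)
    return f"copied set {set_id} from node {provider}"

confirmed = []
outcome_missing = handle_enable_event({}, confirmed, 100, 1, 2)
outcome_present = handle_enable_event({(1, 2): 1}, confirmed, 101, 1, 2)
print(outcome_missing, "|", outcome_present, "|", confirmed)
```

In both paths the event ends up confirmed, so the worker would not loop on "data copy for set 1 failed - sleep 60 seconds".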

Upon receiving an UNSUBSCRIBE_SET event, the origin of that set will issue yet another UNSUBSCRIBE_SET in order to guard against a possible race condition where a third node, that is a forwarder for the set, receives the initial UNSUBSCRIBE_SET before processing the initial SUBSCRIBE_SET. This would cause it to wrongfully think that the node is actually subscribed to the set.


Jan