A user reported that they made the mistake of using over-broad wildcards to specify the sequences to replicate.
As a result, the sequence that holds the node ID was replicated, which in turn led every subscriber node to decide that it was the origin node. This is the "I am Spartacus!" bug: http://en.wikipedia.org/wiki/Spartacus_%28film%29#.22I.27m_Spartacus.21.22
We should add error checking to the functions that add tables and sequences to replication, so that they refuse to add dangerous objects, specifically:
a) Slony sequences, which may be recognized by having namespace ~ '^_.*' and name like 'sl_%'
b) Slony tables, which may be recognized by having namespace ~ '^_.*' and name like 'sl_%'
c) Objects in schemas pg_catalog and information_schema
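To make rules a) to c) concrete, here is a minimal Python sketch of the refusal predicate as stated above. The function name `is_dangerous` and the schema name `_mycluster` are hypothetical and are not part of Slony-I. The namespace test mirrors the regex `~ '^_.*'`, where `_` is a literal character. The name test uses a literal `sl_` prefix. Note that in a SQL `LIKE` pattern `_` matches any single character, so a literal prefix match in SQL would be written `'sl\_%'`.

```python
import re

# Rule c): schemas whose contents must never be replicated.
SYSTEM_SCHEMAS = frozenset({"pg_catalog", "information_schema"})

# Rules a) and b): namespace ~ '^_.*' (in a regex, '_' is literal).
_SLONY_NAMESPACE = re.compile(r"^_.*")


def is_dangerous(nspname: str, relname: str) -> bool:
    """Return True if a table or sequence should be refused for replication.

    Sketch of rules a) to c). Tables and sequences share one test,
    because rules a) and b) use the same pattern.
    """
    if nspname in SYSTEM_SCHEMAS:
        return True
    # Relation name must start with the literal prefix "sl_".
    return bool(_SLONY_NAMESPACE.match(nspname)) and relname.startswith("sl_")


if __name__ == "__main__":
    print(is_dangerous("_mycluster", "sl_log_1"))  # True
    print(is_dangerous("public", "orders_id_seq"))  # False
    print(is_dangerous("pg_catalog", "pg_class"))  # True
```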
Here is a preliminary proposal for implementation.
I propose a somewhat fuzzy match to decide which tables to refuse: if the namespace starts with "_" and the table name starts with "sl_", the table is treated as a replication configuration table.
if v_tab_nspname ~ '^_.*' and v_tab_relname ~ '^sl_.*' then
    raise exception 'Slony-I: setAddTable_int(): % appears to be a replication configuration table and cannot be replicated',
        v_tab_nspname || '.' || v_tab_relname;
end if;
That may be a bit too open-ended. On the other hand, I don't want it to be insufficiently open-ended either: it would be broken for one Slony installation to be able to break another one.
A more precise test would check that the namespace in question contains one or more well-known Slony tables (e.g., I could additionally check that the table sl_log_1 exists in the same namespace).
Does that approach seem agreeable?
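The difference between the fuzzy test and the more precise one can be sketched in Python. The catalog is modeled as a hypothetical mapping from schema name to the set of relation names it contains. In PostgreSQL, that information lives in `pg_namespace` and `pg_class`. The function names and the `_archive` schema are illustrative only. The precise variant also requires the namespace to hold `sl_log_1`, so a user schema that merely happens to follow the naming pattern is not refused.

```python
import re
from typing import Mapping, Set

# Hypothetical catalog model: schema name -> relation names in that schema.
Catalog = Mapping[str, Set[str]]


def fuzzy_is_slony(nspname: str, relname: str) -> bool:
    # Fuzzy rule: namespace starts with "_" and relation name starts with "sl_".
    return re.match(r"^_", nspname) is not None and relname.startswith("sl_")


def precise_is_slony(catalog: Catalog, nspname: str, relname: str) -> bool:
    # Same fuzzy rule, but the namespace must also contain the well-known
    # Slony table sl_log_1 before it is treated as a cluster schema.
    return fuzzy_is_slony(nspname, relname) and "sl_log_1" in catalog.get(nspname, set())


if __name__ == "__main__":
    catalog = {
        "_mycluster": {"sl_log_1", "sl_node"},
        "_archive": {"sl_customers"},  # user schema that happens to match the pattern
    }
    print(fuzzy_is_slony("_archive", "sl_customers"))  # True (false positive)
    print(precise_is_slony(catalog, "_archive", "sl_customers"))  # False
    print(precise_is_slony(catalog, "_mycluster", "sl_node"))  # True
```

The trade-off is one extra catalog lookup per check in exchange for fewer false positives.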
We do know the actual name of the current Slony cluster schema. No object in that should be replicated by Slony. Matching the namespace exactly on that should be sufficient.
In addition to that, nothing in pg_catalog or information_schema is supposed to be replicated either, I think.
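The exact-match check is simpler because it needs no naming heuristics. Anything whose namespace equals the current cluster schema, pg_catalog, or information_schema is refused. This is a minimal sketch, assuming the cluster schema name is available as a string. In Slony-I, that name is the cluster name prefixed with an underscore. The name `refuse_exact` and the schema `_mycluster` are hypothetical.

```python
REFUSED_SYSTEM_SCHEMAS = frozenset({"pg_catalog", "information_schema"})


def refuse_exact(cluster_schema: str, nspname: str) -> bool:
    """True if an object in schema `nspname` must not be replicated.

    Every object in the current cluster's own schema is refused,
    whatever its name, as are system-catalog objects.
    """
    return nspname == cluster_schema or nspname in REFUSED_SYSTEM_SCHEMAS


if __name__ == "__main__":
    print(refuse_exact("_mycluster", "_mycluster"))  # True
    print(refuse_exact("_mycluster", "public"))  # False
    print(refuse_exact("_mycluster", "_othercluster"))  # False
```

The last call shows that this check alone says nothing about a schema belonging to some other cluster in the same database.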
(In reply to comment #2)
> We do know the actual name of the current Slony cluster schema. No object in
> that should be replicated by Slony. Matching the namespace exactly on that
> should be sufficient.
That's (at least!) half of the problem here; yes, that's a good test.
But I'm feeling a wee bit more paranoid. What if there are two clusters in the database (because someone has gotten over-exuberant about replication)? Should the check cover the second cluster as well?
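One way to cover a second cluster is to refuse anything in the current cluster schema or in any other schema that contains `sl_log_1`. The sketch below uses a hypothetical catalog model, mapping each schema name to its relation names, and hypothetical function names. Treating "contains sl_log_1" as the mark of a cluster schema is the heuristic floated earlier in this thread, not a documented Slony-I rule.

```python
from typing import FrozenSet, Mapping, Set

SYSTEM_SCHEMAS = frozenset({"pg_catalog", "information_schema"})


def cluster_schemas(catalog: Mapping[str, Set[str]]) -> FrozenSet[str]:
    # Assume any schema holding sl_log_1 belongs to some Slony cluster.
    return frozenset(nsp for nsp, rels in catalog.items() if "sl_log_1" in rels)


def refuse_any_cluster(catalog: Mapping[str, Set[str]],
                       current_cluster_schema: str,
                       nspname: str) -> bool:
    # Refuse the current cluster's schema, any other detected cluster
    # schema, and the system schemas.
    return (nspname == current_cluster_schema
            or nspname in cluster_schemas(catalog)
            or nspname in SYSTEM_SCHEMAS)


if __name__ == "__main__":
    catalog = {
        "_one": {"sl_log_1", "sl_node"},
        "_two": {"sl_log_1", "sl_node"},
        "public": {"orders"},
    }
    print(refuse_any_cluster(catalog, "_one", "_two"))  # True
    print(refuse_any_cluster(catalog, "_one", "public"))  # False
```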
> In addition to that, nothing in pg_catalog or information_schema is supposed to
> be replicated either, I think.
Yep, that was my item c).