Bug 327 - Sync failed right after initial copy
Summary: Sync failed right after initial copy
Status: RESOLVED FIXED
Alias: None
Product: Slony-I
Classification: Unclassified
Component: slon (show other bugs)
Version: 2.0
Hardware: PC Linux
: urgent critical
Assignee: Jan Wieck
URL:
: 330 (view as bug list)
Depends on:
Blocks:
 
Reported: 2013-12-09 02:25 UTC by Timur
Modified: 2014-02-13 05:17 UTC (History)
2 users (show)

See Also:


Attachments
Proposed fix (964 bytes, patch)
2014-01-29 09:14 UTC, Jan Wieck
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Timur 2013-12-09 02:25:23 UTC
HW/SW info:
Slony: v2.2.1
OS: CentOS release 6.4 (2.6.32-358.6.2.el6.x86_64)
DB: PostgreSQL 9.2.4

Details:
Sync failed right after initial copy. 
At first time, with "omit copy = yes" it failed with:

>>>>
2013-12-03 22:13:02 HKT ERROR  remoteWorkerThread_1: SYNC aborted
2013-12-03 22:13:12 HKT INFO   remoteWorkerThread_1: syncing set 18 with 219 table(s) from provider 1
2013-12-03 22:13:12 HKT ERROR  remoteWorkerThread_1_1: error at end of COPY IN: ERROR:  invalid memory alloc request size 1869439327
CONTEXT:  COPY sl_log_2, line 949: "1   1810515783      18203   27151   casino  jackpot_contrib_summary U       1       {contrib_amount_bc,4.317294024,summary_d..."
2013-12-03 22:13:12 HKT ERROR  remoteWorkerThread_1_1: failed SYNC's log selection query was  '
COPY (
select   log_origin, log_txid, NULL::integer, log_actionseq, NULL::text, NULL::text, log_cmdtype, NULL::integer, log_cmdargs
from "_cluster_casino".sl_log_script
where log_origin = 1 and log_txid >= "pg_catalog".txid_snapshot_xmax('1810515020:1810515020:')
and log_txid < '1810515860'
and "pg_catalog".txid_visible_in_snapshot(log_txid, '1810515860:1810515860:')
union all
select log_origin, log_txid, NULL::integer, log_actionseq, NULL::text, NULL::text, log_cmdtype, NULL::integer, log_cmdargs
from "_cluster_casino".sl_log_script
where log_origin = 1 and log_txid in (select *
                                      from "pg_catalog".txid_snapshot_xip('1810515020:1810515020:')
                                      except select * from "pg_catalog".txid_snapshot_xip('1810515860:1810515860:') )
                                      union
                                      all select log_origin, log_txid, log_tableid, log_actionseq, log_tablenspname, log_tablerelname, log_cmdtype, log_cmdupdncols, log_cmdargs
                                      from "_cluster_casino".sl_log_1
                                      where log_origin = 1
                                      and log_tableid in (18229,18046,18058,18047,18048,18080,18104,18049,18050,18001,18002,18003,18005,18006,18007,18008,18009,18011,18012,18013,18010,18014,18015,
                                                         18016,18017,18018,18019,18020,18021,18022,18023,18024,18025,18026,18030,18027,18028,18031,18032,18033,18034,18035,18036,18037,18038,18039,18040,
                                                         18041,18042,18043,18044,18045,18051,18052,18053,18054,18055,18056,18057,18059,18060,18061,18062,18063,18064,18065,18066,18067,18068,18069,18070,
                                                         18071,18072,18073,18074,18075,18076,18077,18078,18079,18081,18082,18083,18084,18085,18086,18087,18088,18089,18090,18091,18092,18093,18094,18095,
                                                         18096,18097,18098,18099,18100,18101,18102,18103,18105,18106,18107,18110,18111,18112,18113,18114,18115,18116,18117,18118,18119,18120,18121,18122,
                                                         18123,18124,18125,18126,18127,18128,18129,18130,18132,18135,18136,18137,18138,18139,18140,18141,18142,18143,18145,18146,18147,18148,18149,18150,
                                                         18151,18152,18153,18154,18155,18156,18157,18158,18159,18160,18161,18162,18163,18164,18165,18166,18167,18168,18169,18170,18171,18172,18173,18174,
                                                         18175,18176,18177,18178,18179,18180,18181,18182,18183,18184,18185,18186,18187,18189,18190,18191,18192,18193,18194,18195,18196,18197,18198,18199,
                                                         18201,18202,18203,18204,18205,18206,18207,18208,18209,18210,18211,18212,18213,18214,18215,18216,18217,18218,18219,18220,18221,18222,18223,18224,
                                                         18225,18226,18227,18228)
                                      and log_txid >= '1810515020'
                                      and log_txid < '1810515860'
                                      and "pg_catalog".txid_visible_in_snapshot(log_txid, '1810515860:1810515860:')
                                      union all select log_origin, log_txid, log_tableid, log_actionseq, log_tablenspname, log_tablerelname, log_cmdtype, log_cmdupdncols, log_cmdargs
                                      from "_cluster_casino".sl_log_1
                                      where log_origin = 1 and log_tableid in (18229,18046,18058,18047,18048,18080,18104,18049,18050,18001,18002,18003,18005,18006,18007,18008,18009,18011,18012,18013,
                                                                  18010,18014,18015,18016,18017,18018,18019,18020,18021,18022,18023,18024,18025,18026,18030,18027,18028,18031,18032,18033,18034,18035,18036,
                                                                  18037,18038,18039,18040,18041,18042,18043,18044,18045,18051,18052,18053,18054,18055,18056,18057,18059,18060,18061,18062,18063,18064,
                                                                  18065,18066,18067,18068,18069,18070,18071,18072,18073,18074,18075,18076,18077,18078,18079,18081,18082,18083,18084,18085,18086,18087,
                                                                  18088,18089,18090,18091,18092,18093,18094,18095,18096,18097,18098,18099,18100,18101,18102,18103,18105,18106,18107,18110,18111,18112,
                                                                  18113,18114,18115,18116,18117,18118,18119,18120,18121,18122,18123,18124,18125,18126,18127,18128,18129,18130,18132,18135,18136,18137,
                                                                  18138,18139,18140,18141,18142,18143,18145,18146,18147,18148,18149,18150,18151,18152,18153,18154,18155,18156,18157,18158,18159,18160,
                                                                  18161,18162,18163,18164,18165,18166,18167,18168,18169,18170,18171,18172,18173,18174,18175,18176,18177,18178,18179,18180,18181,18182,
                                                                  18183,18184,18185,18186,18187,18189,18190,18191,18192,18193,18194,18195,18196,18197,18198,18199,18201,18202,18203,18204,18205,18206,
                                                                  18207,18208,18209,18210,18211,18212,18213,18214,18215,18216,18217,18218,18219,18220,18221,18222,18223,18224,18225,18226,18227,18228)
                                       and log_txid in (select *
                                                        from "pg_catalog".txid_snapshot_xip('1810515020:1810515020:')
                                                        except select *
                                                        from "pg_catalog".txid_snapshot_xip('1810515860:1810515860:') )
                                                        union
                                                        all select log_origin, log_txid, log_tableid, log_actionseq, log_tablenspname, log_tablerelname, log_cmdtype, log_cmdupdncols, log_cmdargs
                                                        from "_cluster_casino".sl_log_2 where log_origin = 1 and log_tableid in (18229,18046,18058,18047,18048,18080,18104,18049,18050,18001,18002,18003,18005,                                                        18006,18007,18008,18009,18011,18012,18013,18010,18014,18015,18016,18017,18018,18019,18020,18021,18022,18023,18024,18025,18026,18030,18027,18028,18031,                                                        18032,18033,18034,18035,18036,18037,18038,18039,18040,18041,18042,18043,18044,18045,18051,18052,18053,18054,18055,18056,18057,18059,18060,18061,18062,18063,18064,                                                        18065,18066,18067,18068,18069,18070,18071,18072,18073,18074,18075,18076,18077,18078,18079,18081,18082,18083,18084,18085,18086,18087,18088,18089,18090,18091,18092,18093,                                                        18094,18095,18096,18097,18098,18099,18100,18101,18102,18103,18105,18106,18107,18110,18111,18112,18113,18114,18115,18116,18117,18118,18119,18120,18121,18122,18123,18124,                                                        18125,18126,18127,18128,18129,18130,18132,18135,18136,18137,18138,18139,18140,18141,18142,18143,18145,18146,18147,18148,18149,18150,18151,18152,18153,18154,18155,18156,                                                        18157,18158,18159,18160,18161,18162,18163,18164,18165,18166,18167,18168,18169,18170,18171,18172,18173,18174,18175,18176,18177,18178,18179,18180,18181,18182,18183,18184,                                                        18185,18186,18187,18189,18190,18191,18192,18193,18194,18195,18196,18197,18198,18199,18201,18202,18203,18204,18205,18206,18207,18208,18209,18210,18211,18212,18213,18214,                                                        18215,18216,18217,18218,18219,18220,18221,18222,18223,18224,18225,18226,18227,18228)
                                                        and log_txid >= '1810515020' and log_txid < '1810515860' and "pg_catalog".txid_visible_in_snapshot(log_txid, '1810515860:1810515860:')
                                                        union all select log_origin, log_txid, log_tableid, log_actionseq, log_tablenspname, log_tablerelname, log_cmdtype, log_cmdupdncols, log_cmdargs
                                                        from "_cluster_casino".sl_log_2 where log_origin = 1 and log_tableid in (18229,18046,18058,18047,18048,18080,18104,18049,18050,18001,18002,18003,18005,18006,18007,18008,18009,18011,18012,18013,18010,18014,18015,18016,18017,18018,18019,18020,18021,18022,18023,18024,18025,18026,18030,18027,18028,18031,18032,18033,18034,18035,18036,18037,18038,18039,18040,18041,18042,18043,18044,18045,18051,18052,18053,18054,18055,18056,18057,18059,18060,18061,18062,18063,18064,18065,18066,18067,18068,18069,18070,18071,18072,18073,18074,18075,18076,18077,18078,18079,18081,18082,18083,18084,18085,18086,18087,18088,18089,18090,18091,18092,18093,18094,18095,18096,18097,18098,18099,18100,18101,18102,18103,18105,18106,18107,18110,18111,18112,18113,18114,18115,18116,18117,18118,18119,18120,18121,18122,18123,18124,18125,18126,18127,18128,18129,18130,18132,18135,18136,18137,18138,18139,18140,18141,                                                        18142,18143,18145,18146,18147,18148,18149,18150,18151,18152,18153,18154,18155,18156,18157,18158,18159,18160,18161,18162,18163,18164,18165,18166,18167,18168,18169,18170,18171,18172,18173,18174,18175,18176,18177,18178,18179,18180,18181,18182,18183,18184,18185,18186,18187,18189,18190,18191,18192,18193,18194,18195,18196,18197,18198,18199,18201,18202,18203,18204,18205,18206,18207,18208,18209,18210,18211,18212,18213,18214,18215,18216,18217,18218,18219,18220,18221,18222,18223,18224,18225,18226,18227,18228)
                                                        and log_txid in (select * from "pg_catalog".txid_snapshot_xip('1810515020:1810515020:')
                                                        except select * from "pg_catalog".txid_snapshot_xip('1810515860:1810515860:') ) order by log_actionseq) TO STDOUT
<<<<


Later after cluster uninstall, after initial copy with "omit copy = no":
>>>>
2013-12-04 06:59:42 HKT INFO   remoteWorkerThread_1: 0.280 seconds to copy sequences
2013-12-04 06:59:42 HKT INFO   remoteWorkerThread_1: copy_set SYNC found, use event seqno 5000000641.
2013-12-04 06:59:43 HKT INFO   remoteWorkerThread_1: 0.477 seconds to build initial setsync status
2013-12-04 06:59:43 HKT INFO   copy_set 18 done in 10132.370 seconds
2013-12-04 06:59:43 HKT CONFIG enableSubscription: sub_set=18
2013-12-04 07:00:13 HKT CONFIG storeListen: li_origin=1 li_receiver=2 li_provider=1
2013-12-04 07:00:13 HKT CONFIG remoteWorkerThread_1: update provider configuration
2013-12-04 07:00:13 HKT CONFIG remoteWorkerThread_1: added active set 18 to provider 1
2013-12-04 07:00:13 HKT CONFIG version for "dbname=db01 host=db01-slony port=5443 user=slon_casino" is 90204
2013-12-04 07:00:13 HKT INFO   remoteWorkerThread_1: syncing set 18 with 219 table(s) from provider 1
2013-12-04 07:00:13 HKT INFO   cleanupThread: 9616.145 seconds for cleanupEvent()
2013-12-04 07:00:17 HKT INFO   remoteWorkerThread_1: SYNC 5000000642 done in 3.942 seconds
2013-12-04 07:00:17 HKT INFO   remoteWorkerThread_1: syncing set 18 with 219 table(s) from provider 1
2013-12-04 07:01:26 HKT ERROR  remoteWorkerThread_1_1: error at end of COPY IN: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
2013-12-04 07:01:26 HKT ERROR  remoteWorkerThread_1_1: failed SYNC's log selection query was 'COPY ( select log_origin, log_txid, NULL::integer, log_actionseq, NULL::text, NULL::text, log_cmdtype, NULL::integer, log_cmdargs from "_clust
er_casino".sl_log_script where log_origin = 1 and log_txid >= "pg_catalog".txid_snapshot_xmax('1814446848:1814446855:1814446848') and log_txid < '1814450484' and "pg_catalog".txid_visible_in_snapshot(log_txid, '1814450484:1814450484:') 
union all select log_origin, log_txid, NULL::integer, log_actionseq, NULL::text, NULL::text, log_cmdtype, NULL::integer, log_cmdargs from "_cluster_casino".sl_log_script where log_origin = 1 and log_txid in (select * from "pg_catalog".t
xid_snapshot_xip('1814446848:1814446855:1814446848') except select * from "pg_catalog".txid_snapshot_xip('1814450484:1814450484:') ) union all select log_origin, log_txid, log_tableid, log_actionseq, log_tablenspname, log_tablerelname, 
log_cmdtype, log_cmdupdncols, log_cmdargs from "_cluster_casino".sl_log_1 where log_origin = 1 and log_tableid in (18001,18002,18003,18005,18006,18007,18008,18009,18010,18011,18012,18013,18014,18015,18016,18017,18018,18019,18020,18021,1
8022,18023,18024,18025,18026,18027,18028,18030,18031,18032,18033,18034,18035,18036,18037,18038,18039,18040,18041,18042,18043,18044,18045,18046,18047,18048,18049,18050,18051,18052,18053,18054,18055,18056,18057,18058,18059,18060,18061,180
62,18063,18064,18065,18066,18067,18068,18069,18070,18071,18072,18073,18074,18075,18076,18077,18078,18079,18080,18081,18082,18083,18084,18085,18086,18087,18088,18089,18090,18091,18092,18093,18094,18095,18096,18097,18098,18099,18100,18101
,18102,18103,18104,18105,18106,18107,18110,18111,18112,18113,18114,18115,18116,18117,18118,18119,18120,18121,18122,18123,18124,18125,18126,18127,18128,18129,18130,18132,18135,18136,18137,18138,18139,18140,18141,18142,18143,18145,18146,1
8147,18148,18149,18150,18151,18152,18153,18154,18155,18156,18157,18158,18159,18160,18161,18162,18163,18164,18165,18166,18167,18168,18169,18170,18171,18172,18173,18174,18175,18176,18177,18178,18179,18180,18181,18182,18183,18184,18185,181
86,18187,18189,18190,18191,18192,18193,18194,18195,18196,18197,18198,18199,18201,18202,18203,18204,18205,18206,18207,18208,18209,18210,18211,18212,18213,18214,18215,18216,18217,18218,18219,18220,18221,18222,18223,18224,18225,18226,18227
,18228,18229) and log_txid >= '1814446855' and log_txid < '1814450484' and "pg_catalog".txid_visible_in_snapshot(log_txid, '1814450484:1814450484:') union all select log_origin, log_txid, log_tableid, log_actionseq, log_tablenspname, lo
g_tablerelname, log_cmdtype, log_cmdupdncols, log_cmdargs from "_cluster_casino".sl_log_1 where log_origin = 1 and log_tableid in (18001,18002,18003,18005,18006,18007,18008,18009,18010,18011,18012,18013,18014,18015,18016,18017,18018,180
19,18020,18021,18022,18023,18024,18025,18026,18027,18028,18030,18031,18032,18033,18034,18035,18036,18037,18038,18039,18040,18041,18042,18043,18044,18045,18046,18047,18048,18049,18050,18051,18052,18053,18054,18055,18056,18057,18058,18059
,18060,18061,18062,18063,18064,18065,18066,18067,18068,18069,18070,18071,18072,18073,18074,18075,18076,18077,18078,18079,18080,18081,18082,18083,18084,18085,18086,18087,18088,18089,18090,18091,18092,18093,18094,18095,18096,18097,18098,1
8099,18100,18101,18102,18103,18104,18105,18106,18107,18110,18111,18112,18113,18114,18115,18116,18117,18118,18119,18120,18121,18122,18123,18124,18125,18126,18127,18128,18129,18130,18132,18135,18136,18137,18138,18139,18140,18141,18142,181
43,18145,18146,18147,18148,18149,18150,18151,18152,18153,18154,18155,18156,18157,18158,18159,18160,18161,18162,18163,18164,18165,18166,18167,18168,18169,18170,18171,18172,18173,18174,18175,18176,18177,18178,18179,18180,18181,18182,18183
,18184,18185,18186,18187,18189,18190,18191,18192,18193,18194,18195,18196,18197,18198,18199,18201,18202,18203,18204,18205,18206,18207,18208,18209,18210,18211,18212,18213,18214,18215,18216,18217,18218,18219,18220,18221,18222,18223,18224,1
8225,18226,18227,18228,18229) and log_txid in (select * from "pg_catalog".txid_snapshot_xip('1814446848:1814446855:1814446848') except select * from "pg_catalog".txid_snapshot_xip('1814450484:1814450484:') ) union all select log_origin,
 log_txid, log_tableid, log_actionseq, log_tablenspname, log_tablerelname, log_cmdtype, log_cmdupdncols, log_cmdargs from "_cluster_casino".sl_log_2 where log_origin = 1 and log_tableid in (18001,18002,18003,18005,18006,18007,18008,1800
9,18010,18011,18012,18013,18014,18015,18016,18017,18018,18019,18020,18021,18022,18023,18024,18025,18026,18027,18028,18030,18031,18032,18033,18034,18035,18036,18037,18038,18039,18040,18041,18042,18043,18044,18045,18046,18047,18048,18049,
18050,18051,18052,18053,18054,18055,18056,18057,18058,18059,18060,18061,18062,18063,18064,18065,18066,18067,18068,18069,18070,18071,18072,18073,18074,18075,18076,18077,18078,18079,18080,18081,18082,18083,18084,18085,18086,18087,18088,18
089,18090,18091,18092,18093,18094,18095,18096,18097,18098,18099,18100,18101,18102,18103,18104,18105,18106,18107,18110,18111,18112,18113,18114,18115,18116,18117,18118,18119,18120,18121,18122,18123,18124,18125,18126,18127,18128,18129,1813
0,18132,18135,18136,18137,18138,18139,18140,18141,18142,18143,18145,18146,18147,18148,18149,18150,18151,18152,18153,18154,18155,18156,18157,18158,18159,18160,18161,18162,18163,18164,18165,18166,18167,18168,18169,18170,18171,18172,18173,
18174,18175,18176,18177,18178,18179,18180,18181,18182,18183,18184,18185,18186,18187,18189,18190,18191,18192,18193,18194,18195,18196,18197,18198,18199,18201,18202,18203,18204,18205,18206,18207,18208,18209,18210,18211,18212,18213,18214,18
215,18216,18217,18218,18219,18220,18221,18222,18223,18224,18225,18226,18227,18228,18229) and log_txid >= '1814446855' and log_txid < '1814450484' and "pg_catalog".txid_visible_in_snapshot(log_txid, '1814450484:1814450484:') union all se
lect log_origin, log_txid, log_tableid, log_actionseq, log_tablenspname, log_tablerelname, log_cmdtype, log_cmdupdncols, log_cmdargs from "_cluster_casino".sl_log_2 where log_origin = 1 and log_tableid in (18001,18002,18003,18005,18006,
18007,18008,18009,18010,18011,18012,18013,18014,18015,18016,18017,18018,18019,18020,18021,18022,18023,18024,18025,18026,18027,18028,18030,18031,18032,18033,18034,18035,18036,18037,18038,18039,18040,18041,18042,18043,18044,18045,18046,18
047,18048,18049,18050,18051,18052,18053,18054,18055,18056,18057,18058,18059,18060,18061,18062,18063,18064,18065,18066,18067,18068,18069,18070,18071,18072,18073,18074,18075,18076,18077,18078,18079,18080,18081,18082,18083,18084,18085,1808
6,18087,18088,18089,18090,18091,18092,18093,18094,18095,18096,18097,18098,18099,18100,18101,18102,18103,18104,18105,18106,18107,18110,18111,18112,18113,18114,18115,18116,18117,18118,18119,18120,18121,18122,18123,18124,18125,18126,18127,
18128,18129,18130,18132,18135,18136,18137,18138,18139,18140,18141,18142,18143,18145,18146,18147,18148,18149,18150,18151,18152,18153,18154,18155,18156,18157,18158,18159,18160,18161,18162,18163,18164,18165,18166,18167,18168,18169,18170,18
171,18172,18173,18174,18175,18176,18177,18178,18179,18180,18181,18182,18183,18184,18185,18186,18187,18189,18190,18191,18192,18193,18194,18195,18196,18197,18198,18199,18201,18202,18203,18204,18205,18206,18207,18208,18209,18210,18211,1821
2,18213,18214,18215,18216,18217,18218,18219,18220,18221,18222,18223,18224,18225,18226,18227,18228,18229) and log_txid in (select * from "pg_catalog".txid_snapshot_xip('1814446848:1814446855:1814446848') except select * from "pg_catalog"
.txid_snapshot_xip('1814450484:1814450484:') ) order by log_actionseq) TO STDOUT'
2013-12-04 07:01:26 HKT ERROR  remoteWorkerThread_1: SYNC aborted
2013-12-04 07:01:26 HKT ERROR  remoteWorkerThread_1: "rollback transaction" PGRES_FATAL_ERROR 2013-12-04 07:01:26 HKT CONFIG slon: child terminated signal: 9; pid: 12018, current worker pid: 12018
2013-12-04 07:01:26 HKT CONFIG slon: restart of worker in 10 seconds
<<<<

Together with second attempt's failure I got "Segmentation fault" in Postgres:
>>>>
<2013-12-04 07:00:13 HKT-slon_casino-XXX.XXX.XXX.85(57578)-db01 [app:slon.local_cleanup,pid:12031,00000]>LOG:  process 12031 acquired AccessExclusiveLock on relation 355503 of database 31761 after 9616098.245 ms
<2013-12-04 07:00:13 HKT-slon_casino-XXX.XXX.XXX.85(57578)-db01 [app:slon.local_cleanup,pid:12031,00000]>STATEMENT:  begin;lock table "_cluster_casino".sl_config_lock;select "_cluster_casino".cleanupEvent('10 minutes'::interval);commi
t;
<2013-12-04 07:00:13 HKT-slon_casino-XXX.XXX.XXX.85(57578)-db01 [app:slon.local_cleanup,pid:12031,00000]>LOG:  duration: 9616145.184 ms  statement: begin;lock table "_cluster_casino".sl_config_lock;select "_cluster_casino".cleanupEven
t('10 minutes'::interval);commit;
<2013-12-04 07:01:26 HKT--- [app:,pid:75714,00000]>LOG:  server process (PID 12028) was terminated by signal 11: Segmentation fault
<2013-12-04 07:01:26 HKT--- [app:,pid:75714,00000]>DETAIL:  Failed process was running: COPY "_cluster_casino"."sl_log_2" ( log_origin, log_txid,log_tableid,log_actionseq,log_tablenspname, log_tablerelname, log_cmdtype, log_cmdupdncols,
log_cmdargs) FROM STDIN
<2013-12-04 07:01:26 HKT--- [app:,pid:75714,00000]>LOG:  terminating any other active server processes
<2013-12-04 07:01:27 HKT--- [app:,pid:75714,00000]>LOG:  all server processes terminated; reinitializing
<2013-12-04 07:01:30 HKT--- [app:,pid:21980,00000]>LOG:  database system was interrupted; last known up at 2013-12-04 06:58:01 HKT
<2013-12-04 07:01:30 HKT--- [app:,pid:21980,00000]>LOG:  database system was not properly shut down; automatic recovery in progress
<2013-12-04 07:01:30 HKT--- [app:,pid:21980,00000]>LOG:  redo starts at 759/FA2ED570
<2013-12-04 07:01:36 HKT-slon_xg-XXX.XXX.XXX.84(58626)-db01 [app:[unknown],pid:21983,57P03]>FATAL:  the database system is in recovery mode
<<<<


Restart neither postgresql nor the slony has not helped. So, I did a rollback to Slony 2.1.4 and it works perfect now.

P.s.
I have 3 independent slony clusters (for 3 different db schemas) within one database. I got such problems only with one of them. Other 2 worked good with v2.2.1.
Unfortunately, I can't do more investigations for you, because this is production env and I already did a rollback to 2.1.4.
Comment 1 Jan Wieck 2013-12-12 10:22:09 UTC
Interestingly the number 1869439327 (the requested memory size) represents the string "_amo", which is part of the array data of that very log row in the column/value list right in the first column name "contrib_amount_bc". This would indicate something went wrong in the apply trigger and apparently wrote that column name over some other variables.

I'll take a detailed look at the apply trigger code.
Comment 2 Matt 2014-01-24 08:24:22 UTC
I think I may have hit this same bug. My symptoms:

Set up a cluster with Postgres 9.2.6 and Slony 2.2.1. Cascading replication set up with Master -> Intermediate Slave -> 4 Slaves

After 2 days of uptime and no DDL changes, the intermediate slave's slon daemon stated saying messages like:

2014-01-23 14:55:58 EST ERROR  remoteWorkerThread_1_1: error at end of COPY IN: ERROR:  invalid input syntax for integer: "33.7726910000000018"

The column in question is a “double precision” on both master & slave.

I tried restarting the slon daemon on both master and intermediate slave. The error message changed slightly:

COPY IN: ERROR:  invalid input syntax for integer: "Los Angeles"

Unfortunately I have to tear down this cluster since we need the machinery, but I'll keep my eye out for it again
Comment 3 Jan Wieck 2014-01-29 09:14:38 UTC
Created attachment 199 [details]
Proposed fix
Comment 4 Jan Wieck 2014-01-29 09:21:04 UTC
I was able to reproduce the memory corruption and found what I believe is the place, where it happens. The code, that evicts an apply query form the cache when more than apply_cache_size different query plans are needed inside of a single SYNC processing, free'd data from the "new" cache entry, not the one to evict.

The commit is here: https://github.com/wieck/slony1-engine/commit/6b56d480e211344eb952c9c23924ba555ac21d28
Comment 5 Steve Singer 2014-02-03 13:50:28 UTC
This fix looks safe to commit.
I haven't verified if the fix actually fixes the problem reported.

How difficult is it to write a unit test that would reproduce this issue?
Comment 6 Jan Wieck 2014-02-06 09:38:58 UTC
Patch applied to REL_2_2_STABLE and master.

A unit test does not necessarily make sense for this particular bug. Reproducing it was rather simple by creating an artificial transaction profile that caused individual LRU based apply cache eviction. After fixing it I ran the corresponding test application over night, pushing millions of transactions with a random number of updates while replicating thousands of SYNC events without any problems or apparent memory leaks. Any sort of unit test would not even scratch that test.
Comment 7 Steve Singer 2014-02-13 05:17:01 UTC
*** Bug 330 has been marked as a duplicate of this bug. ***