Summary: | ERROR: duplicate key value violates unique constraint "sl_nodelock-pkey" | ||
---|---|---|---|
Product: | Slony-I | Reporter: | Steve Singer <ssinger> |
Component: | slon | Assignee: | Steve Singer <ssinger> |
Status: | RESOLVED FIXED | ||
Severity: | normal | CC: | slony1-bugs |
Priority: | medium | ||
Version: | 2.0 | ||
Hardware: | PC | ||
OS: | Linux | ||
See Also: | http://bugs.slony.info/bugzilla/show_bug.cgi?id=81 |
Description
Steve Singer
2010-06-01 08:03:52 UTC
I've seen this happen more often. I think the following is happening:

- slon_retry() is called following the move set.
- slon_retry signals the parent slon process, which in turn sends a kill to the child.
- The child exits. I can find no 'cleanup' process that the child runs on a kill to ensure the PostgreSQL connections are closed or to remove entries from sl_nodelock.
- The child is restarted by the parent.
- The child calls cleanupNodelock(), which checks whether the pid for the backend registered with sl_nodelock is still around. I think sometimes the old backend process is still around (hasn't yet exited), maybe because it is in the middle of a query and hasn't yet noticed that the slon it is talking to has gone away.
- Since the backend process is still around, the row isn't deleted from sl_nodelock, causing the insert into sl_nodelock to fail.

Since I've seen this happen in more than an isolated incident, and it causes the watchdog to exit as well, I am bumping the priority.

Options to fix this include:

1) Having the slon worker properly exit and remove itself from the sl_nodelock table before exiting.
2) Increasing the 'sleep' time before restarting the child. This doesn't really fix the problem; it just makes it less likely.

This has been committed to REL_2_0_STABLE and master.
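
The race described in this report can be reproduced in miniature. The following is only an illustrative sketch, not Slony-I code: it uses an in-memory SQLite table as a stand-in for sl_nodelock, a plain Python set as a stand-in for "backend pids that are still alive", and hypothetical helper names (`cleanup_nodelock`, `start_worker`, `simulate`). The column names and the composite primary key are assumptions made for the example.

```python
import sqlite3


def make_db():
    # Stand-in for sl_nodelock; column names and key are assumptions.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sl_nodelock ("
        " nl_nodeid INTEGER, nl_conncnt INTEGER, nl_backendpid INTEGER,"
        " PRIMARY KEY (nl_nodeid, nl_conncnt))"
    )
    return conn


def cleanup_nodelock(conn, live_backends):
    # Mimics the check described above: only rows whose backend pid
    # is no longer alive are removed.
    rows = conn.execute(
        "SELECT nl_nodeid, nl_conncnt, nl_backendpid FROM sl_nodelock"
    ).fetchall()
    for nodeid, conncnt, pid in rows:
        if pid not in live_backends:
            conn.execute(
                "DELETE FROM sl_nodelock WHERE nl_nodeid = ? AND nl_conncnt = ?",
                (nodeid, conncnt),
            )


def start_worker(conn, node_id, backend_pid, live_backends):
    # A (re)started worker cleans up stale locks, then registers itself.
    cleanup_nodelock(conn, live_backends)
    try:
        conn.execute(
            "INSERT INTO sl_nodelock VALUES (?, 0, ?)", (node_id, backend_pid)
        )
        return True
    except sqlite3.IntegrityError:
        # Analogue of the "duplicate key value" error in the summary.
        return False


def simulate(old_backend_lingers, clean_exit=False):
    conn = make_db()
    live = {100}
    assert start_worker(conn, 1, 100, live)

    # The child is killed after slon_retry.
    if clean_exit:
        # Option 1: the worker removes its own row before exiting.
        conn.execute("DELETE FROM sl_nodelock WHERE nl_backendpid = ?", (100,))
    if not old_backend_lingers:
        live.discard(100)

    # The parent restarts the child, which gets a new backend (pid 200).
    live.add(200)
    return start_worker(conn, 1, 200, live)
```

In this model, `simulate(True)` fails to register the restarted worker because the old backend still counts as alive, while `simulate(False)` and `simulate(True, clean_exit=True)` both succeed. Lengthening the sleep before restart (option 2) only makes the `old_backend_lingers` case rarer; it does not remove it.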