Postgres 8.2 on Feisty beta dies regularly

Bug #93042 reported by Mark Shuttleworth
Affects                   Status        Importance  Assigned to  Milestone
postgresql-8.2 (Ubuntu)   Fix Released  Undecided   Unassigned

Bug Description

Binary package hint: postgresql-8.2

On Feisty (beta) I find that PG 8.2 regularly gets jammed to the point where the process is unresponsive to cancel commands, and a complete db reset is required. This happens when I'm running a large test harness that hammers the database with many queries over the course of about an hour. It does not always happen at the same place. I will attach snippets of information to this bug as I learn more.

Mark Shuttleworth (sabdfl) wrote :

Here is the last item in the query log at the time of the latest hang:

[2007-03-17 09:23:42 GMT] mark@launchpad_ftest_template LOG: statement: COMMENT ON FUNCTION valid_regexp(text)
            IS 'Returns true if the input can be compiled as a regular expression.';
[2007-03-17 09:23:42 GMT] mark@launchpad_ftest_template LOG: statement: CREATE OR REPLACE FUNCTION sha1(text) RETURNS char(40)
        LANGUAGE plpythonu IMMUTABLE RETURNS NULL ON NULL INPUT AS
        $$
            import sha
            return sha.new(args[0]).hexdigest()
        $$;

Martin Pitt (pitti) wrote :

From IRC debugging:
- it's just the client connection that gets stuck, server continues to work on other TCP client connections
- no lingering connections (TIME_WAIT/CLOSE_WAIT)
- this time it hung on 'CREATE DATABASE launchpad_ftest TEMPLATE=launchpad_ftest_template ENCODING='UNICODE''

Martin Pitt (pitti) wrote :

From looking at the trace, it appears that for some reason the autovacuum daemon and the CREATE DATABASE execution end up in a deadlock. Mark will try setting autovacuum=off to see whether the hangs still happen.

Changed in postgresql-8.2:
status: Unconfirmed → Confirmed
Martin Pitt (pitti) wrote :

http://librarian.launchpad.net/7042614/postmaster-bt.txt was the backtrace (bt) of the process that hung on CREATE DATABASE.

There was another forked backend:
16273 ? Ss 0:00 postgres: mark launchpad_ftest_template [local] startup waiting

which was also blocked waiting on a semaphore:
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xb7c60dcb in semop () from /lib/tls/i686/cmov/libc.so.6
#2 0x081c98cc in PGSemaphoreLock (sema=0xb78dca50, interruptOK=6 '\006')
    at pg_sema.c:411
#3 0x081f363f in ProcSleep (locallock=0x83b5b80, lockMethodTable=0x833f76c)
    at proc.c:829
#4 0x081f2665 in LockAcquire (locktag=0xbfc5b414, lockmode=3,
    sessionLock=0 '\0', dontWait=0 '\0') at lock.c:1140
#5 0x081f0146 in LockSharedObject (classid=1262, objid=16385, objsubid=0,
    lockmode=3) at lmgr.c:578
#6 0x08293f28 in InitPostgres (dbname=0x83b3058 "launchpad_ftest_template",
    username=0x83983a0 "mark") at postinit.c:406
#7 0x081fd522 in PostgresMain (argc=4, argv=0x83983dc,
    username=0x83983a0 "mark") at postgres.c:3143
#8 0x081d211a in ServerLoop () at postmaster.c:2931
#9 0x081d3034 in PostmasterMain (argc=5, argv=0x83950e0) at postmaster.c:963
#10 0x0818c745 in main (argc=5, argv=0xbfc5ca64) at main.c:188

and the autovacuum daemon as well:
#1 0xb7c60dcb in semop () from /lib/tls/i686/cmov/libc.so.6
#2 0x081c98cc in PGSemaphoreLock (sema=0xb78ddc20, interruptOK=118 'v')
    at pg_sema.c:411
#3 0x081f553e in LWLockAcquire (lockid=BtreeVacuumLock, mode=LW_EXCLUSIVE)
    at lwlock.c:455
#4 0x080abe38 in _bt_end_vacuum (rel=0xb5f0b298) at nbtutils.c:1028
#5 0x080a9c68 in btbulkdelete (fcinfo=0xbfc58cd8) at nbtree.c:552
#6 0x0828b13d in FunctionCall4 (flinfo=0x8637188, arg1=3217395608, arg2=0,
    arg3=135662128, arg4=140545744) at fmgr.c:1206
#7 0x080a3b96 in index_bulk_delete (info=0xbfc58f98, stats=0x0,
    callback=0x8160a30 <lazy_tid_reaped>, callback_state=0x8608ed0)
    at indexam.c:573
#8 0x0816099f in lazy_vacuum_index (indrel=0xb5f0b298, stats=0x86092dc,
    vacrelstats=0x8608ed0) at vacuumlazy.c:651
#9 0x081611dd in lazy_vacuum_rel (onerel=0xb5f09ca8, vacstmt=0x8638ef8)
    at vacuumlazy.c:478
#10 0x0815f08b in vacuum_rel (relid=<value optimized out>, vacstmt=0x8638ef8,
    expected_relkind=114 'r') at vacuum.c:1088
#11 0x08160015 in vacuum (vacstmt=0x8638ef8, relids=0x8638f58) at vacuum.c:397
#12 0x081cc10a in AutoVacMain (argc=<value optimized out>,
    argv=<value optimized out>) at autovacuum.c:912
#13 0x081cc5c3 in autovac_start () at autovacuum.c:178
#14 0x081d26d0 in ServerLoop () at postmaster.c:1249
#15 0x081d3034 in PostmasterMain (argc=5, argv=0x83950e0) at postmaster.c:963
#16 0x0818c745 in main (argc=5, argv=0xbfc5ca64) at main.c:188
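Frame #5 of the startup backend, LockSharedObject(classid=1262, objid=16385, objsubid=0, lockmode=3), can be decoded by hand. The sketch below is a small Python lookup, assuming the lock-mode numbering of PostgreSQL 8.2's lock.h and the table-level lock conflict matrix from the PostgreSQL documentation; 1262 is the OID of the pg_database catalog, and the helper names are hypothetical. Read this way, the new connection was requesting a RowExclusiveLock on the entry of the database it was connecting to (launchpad_ftest_template, per frame #6). A RowExclusiveLock does not conflict with another RowExclusiveLock but does conflict with a ShareLock, so if CREATE DATABASE takes a ShareLock on its template database (an assumption, not confirmed in this thread), connections to the template would have to wait for it.

```python
# Lock-mode numbers as assumed from PostgreSQL 8.2's lock.h (0 is NoLock).
LOCK_MODES = {
    1: "AccessShareLock",
    2: "RowShareLock",
    3: "RowExclusiveLock",
    4: "ShareUpdateExclusiveLock",
    5: "ShareLock",
    6: "ShareRowExclusiveLock",
    7: "ExclusiveLock",
    8: "AccessExclusiveLock",
}

# For each mode, the set of modes it conflicts with (documented conflict table).
CONFLICTS = {
    1: {8},
    2: {7, 8},
    3: {5, 6, 7, 8},
    4: {4, 5, 6, 7, 8},
    5: {3, 4, 6, 7, 8},
    6: {3, 4, 5, 6, 7, 8},
    7: {2, 3, 4, 5, 6, 7, 8},
    8: {1, 2, 3, 4, 5, 6, 7, 8},
}

PG_DATABASE_CLASSID = 1262  # OID of the pg_database system catalog


def conflicts(requested: int, held: int) -> bool:
    """True if a request in mode `requested` must wait for a lock held in mode `held`."""
    return held in CONFLICTS[requested]


def describe(classid: int, lockmode: int) -> str:
    """Render the classid/lockmode arguments of a LockSharedObject() frame."""
    if classid == PG_DATABASE_CLASSID:
        target = "pg_database entry"
    else:
        target = f"object of catalog {classid}"
    return f"{LOCK_MODES[lockmode]} on {target}"
```

For the frame above, describe(1262, 3) gives "RowExclusiveLock on pg_database entry"; conflicts(3, 5) is True while conflicts(3, 3) is False.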

Martin Pitt (pitti) wrote :

This version may already fix the deadlock. To everyone who has experienced it in the past: if the hangs no longer occur, the logs should now reveal the underlying error condition. Please watch for 'multiple active vacuums for index' and 'out of btvacinfo slots' error messages in the logs.
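A minimal Python sketch for checking a server log for those two messages; the function name is hypothetical and only the two message strings come from this comment. It accepts any iterable of lines, so it works on an open log file as well as on a list of strings.

```python
from typing import Iterable, Iterator, Tuple

# The two error messages to watch for, as named in the comment above.
VACUUM_ERRORS = (
    "multiple active vacuums for index",
    "out of btvacinfo slots",
)


def find_vacuum_errors(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, text) for each log line containing one of VACUUM_ERRORS."""
    for number, line in enumerate(lines, start=1):
        if any(message in line for message in VACUUM_ERRORS):
            yield number, line.rstrip("\n")
```

To scan a real log, pass it an open file object for the cluster's server log (on Debian and Ubuntu usually somewhere under /var/log/postgresql/) and print each (number, text) pair it yields.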

 postgresql-8.2 (8.2.3-2) experimental; urgency=low
 .
   * debian/control: Add Perl dependency to p-8.2-plperl, to ensure that
     creating plperl functions works (as opposed to plperlu, which only needs
     libperl). (see bug #412135)
   * debian/control: Do not mention nor suggest 'pgdocs' any more in p-doc's
     description since pgdocs is only available for 7.4. (see bug #405097)
   * debian/patches/04-timezone-symlinks.patch:
     - Use the timezone database from the system tzdata instead of shipping our
       own. Towards a single authoritative time zone database in Debian and
       Ubuntu... :) (LP: #41159)
     - Drop previous hardlink-to-symlink patch to zic, since that is irrelevant
       now.
     - debian/control: Add tzdata dependency.
   * Add debian/patches/12-vacuum-cycle-hang.patch: Properly release our
     semaphore lock before erroring out with elog() to prevent deadlocks on
     vacuum errors. Thanks to Heikki Linnakangas!
   * debian/rules: Have a test suite failure fail the build again. Let's ignore
     the old kernels on the Debian mips[el] buildds for now.
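The changelog describes the fix as releasing the lock before erroring out with elog(). One reading of the autovacuum backtrace that is consistent with that description: the vacuum code raised one of the two errors listed above ('multiple active vacuums for index' or 'out of btvacinfo slots') while still holding BtreeVacuumLock, and the error cleanup in btbulkdelete() then called _bt_end_vacuum(), which tried to take the same lock again and so waited on itself forever. The following is a toy Python model of that pattern, not PostgreSQL code: every name is hypothetical, threading.Lock stands in for BtreeVacuumLock because neither is reentrant, and a timeout replaces the server's unbounded wait so the hang becomes observable.

```python
import threading


class VacuumError(Exception):
    """Stands in for an elog(ERROR) raised by the vacuum code."""


class VacuumCycleTable:
    """Toy shared table of active index vacuums guarded by one non-reentrant lock."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.active: set = set()

    def start_buggy(self, index: str) -> None:
        self.lock.acquire()
        if index in self.active:
            # Bug: the error escapes while the lock is still held.
            raise VacuumError(f"multiple active vacuums for index {index}")
        self.active.add(index)
        self.lock.release()

    def start_fixed(self, index: str) -> None:
        self.lock.acquire()
        if index in self.active:
            # Fix: release the lock before reporting the error.
            self.lock.release()
            raise VacuumError(f"multiple active vacuums for index {index}")
        self.active.add(index)
        self.lock.release()

    def end(self, index: str, timeout: float) -> bool:
        """Remove `index`; return False if the lock could not be taken in time."""
        # The real server waits without a timeout, which is where it would hang.
        if not self.lock.acquire(timeout=timeout):
            return False
        self.active.discard(index)
        self.lock.release()
        return True


def bulk_delete(table: VacuumCycleTable, index: str, fixed: bool,
                timeout: float = 0.2) -> str:
    start = table.start_fixed if fixed else table.start_buggy
    try:
        start(index)
    except VacuumError:
        # Error cleanup takes the same lock again before the error propagates.
        return "error reported" if table.end(index, timeout) else "hung"
    table.end(index, timeout)
    return "ok"
```

With an index already registered as active, bulk_delete(table, "idx", fixed=False) returns "hung" once the timeout expires, because the cleanup waits on a lock its own thread still holds; with fixed=True the cleanup can take the lock and the error is reported normally.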

Martin Pitt (pitti) wrote :

I got some IRC feedback that this patch fixed the hang, so I am closing this for now. Please cry out loudly if it happens again, then I will reopen and re-examine. Thank you!

Changed in postgresql-8.2:
status: Confirmed → Fix Released