interrupted pg connections can hang appservers in PQgetResult()

Bug #931161 reported by Robert Collins
This bug affects 1 person
Affects: Launchpad itself
Status: Triaged
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

We had a cross-DC firewall incident over the weekend. In the aftermath, some appservers were left hung and did not recover on their own.

The backtraces were very similar: two active threads, both blocked in poll() underneath PQgetResult():
Thread 3
#0 0x00002b8e9e7a3f93 in poll () from None
#1 0x00002b8ea4aa184f in ?? () from None
#2 0x00002b8ea4aa18d0 in ?? () from None
#3 0x00002b8ea4aa0a89 in PQgetResult () from None
#4 0x00002b8ea4aa0d28 in ?? () from None
#5 0x00002b8ea4870553 in pq_execute (curs=0xd9b94d8, query=0x14488714 "\
", ' ' <repeats 12 times>, "UPDATE SessionData SET last_accessed = CURRENT_TIMESTAMP\
", ' ' <repeats 12 times>, "WHERE client_id = E'6BPI4Wcg59P77Pi1ILzViTD.3WwbM-OpGPhVDtMn2iGzeUXnkH13Lw'\
", ' ' <repeats 16 times>, "AND last_accessed < CURREN"..., async=0) from psycopg/pqpath.c
#6 0x00002b8ea4876c8c in _psyco_curs_execute (self=0xd9b94d8, operation=<value optimised out>, vars=<value optimised out>, async=0) from psycopg/cursor_type.c
#7 0x00002b8ea48774bd in psyco_curs_execute (self=0xd9b94d8, args=<value optimised out>, kwargs=<value optimised out>) from psycopg/cursor_type.c
#8 0x00000000004a7a07 in ext_do_call () from ../Python/ceval.c

Thread 2
#0 0x00002b8e9e7a3f93 in poll () from None
#1 0x00002b8ea4aa184f in ?? () from None
#2 0x00002b8ea4aa18d0 in ?? () from None
#3 0x00002b8ea4aa0a89 in PQgetResult () from None
#4 0x00002b8ea4aa0d28 in ?? () from None
#5 0x00002b8ea4870553 in pq_execute (curs=0xedff308, query=0x118950d4 "SELECT BugMessage.bug, BugMessage.bugwatch, BugMessage.id, BugMessage.index, BugMessage.message, BugMessage.owner, BugMessage.remote_comment_id, Message.datecreated, Message.id, Message.owner, Message"..., async=0) from psycopg/pqpath.c
#6 0x00002b8ea4876c8c in _psyco_curs_execute (self=0xedff308, operation=<value optimised out>, vars=<value optimised out>, async=0) from psycopg/cursor_type.c
#7 0x00002b8ea48774bd in psyco_curs_execute (self=0xedff308, args=<value optimised out>, kwargs=<value optimised out>) from psycopg/cursor_type.c
#8 0x00000000004a7a07 in ext_do_call () from ../Python/ceval.c

The backtrace was captured 24 hours after haproxy took the servers out of rotation, so these are not 'active' requests but rather stuck threads.

Robert Collins (lifeless) wrote :

From James Troup, the keepalive settings for the host:
net.ipv4.tcp_keepalive_time = 7200
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_intvl = 75
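
For reference, a rough sketch of what these numbers imply. It assumes the usual Linux behaviour: an idle connection is probed after tcp_keepalive_time seconds, then declared dead after tcp_keepalive_probes unanswered probes sent tcp_keepalive_intvl seconds apart. These sysctls only apply to sockets that have SO_KEEPALIVE enabled, and whether the appservers' libpq connections do is not established in this report. The helper names and the tighter values in keepalive_dsn() are hypothetical. The keepalives* connection parameters are libpq's (PostgreSQL 9.0 and later) and would need a libpq new enough to support them.

```python
def keepalive_detection_seconds(idle, probes, interval):
    """Worst-case seconds before TCP keepalive gives up on a silent peer.

    idle     -> net.ipv4.tcp_keepalive_time
    probes   -> net.ipv4.tcp_keepalive_probes
    interval -> net.ipv4.tcp_keepalive_intvl
    """
    return idle + probes * interval


def keepalive_dsn(base_dsn, idle=60, interval=10, count=3):
    """Hypothetical helper: append libpq keepalive settings to a key=value DSN.

    The default values here are illustrative assumptions, not recommendations
    taken from this bug report.
    """
    return (
        "%s keepalives=1 keepalives_idle=%d "
        "keepalives_interval=%d keepalives_count=%d"
        % (base_dsn, idle, interval, count)
    )


# Host defaults quoted above: 7200 + 9 * 75 seconds.
host_default = keepalive_detection_seconds(7200, 9, 75)
print(host_default)  # 7875, i.e. about 2 hours 11 minutes

# Per-connection override (assumed values): 60 + 3 * 10 seconds.
print(keepalive_detection_seconds(60, 3, 10))  # 90
print(keepalive_dsn("dbname=example"))
```

Even with the host defaults, a dead connection should be noticed in a little over two hours if keepalive is active on the socket. Threads still blocked 24 hours later would be consistent with keepalive not being enabled on those connections, though that is a guess rather than something the backtraces show.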
