
After switching to read-only some connections to the read-write database might still be open

Bug #513196 reported by Guilherme Salgado on 2010-01-27

This bug report is a duplicate of: Bug #531834: When switching to read-only mode, we're left with lots of "idle/select waiting" connections to the DBs which may be blocking the schema upgrade process.


This bug affects 2 people

Affects		Status	Importance	Assigned to	Milestone
	Launchpad itself	Triaged	Low	Stuart Bishop

Bug Description

Tom reported that during the roll out, after switching app servers to read-only, not all connections to the read-write database were closed.



Tom Haddon (mthaddon) wrote on 2010-01-27:

#1

We were still seeing connections (idle, admittedly) from app servers to the read-write database a good number of minutes after switching to read-only. Because we need to make sure there are no connections to this DB before we run the upgrade, we ended up restarting the main DB, which had other knock-on issues.

Jan 27 09:10:33 <mthaddon> stub: I've run "~/scripts/losa-db-scripts/pgkillactive.py -s 0 -u '^(?!sso_|slony$|postgres$)'" on wildcherry, but "ps fuwwxx | grep -v session_prod | grep -v slony" still shows a bunch of connections that shouldn't be there I think
Jan 27 09:10:50 <mthaddon> stub: should I just kill them manually?
Jan 27 09:11:38 <stub> mthaddon: I don't see any active connections. I see lots of IDLE ones...
Jan 27 09:11:51 <mthaddon> stub: so how do I get rid of those?
Jan 27 09:12:03 <stub> mthaddon: Shut down the services
Jan 27 09:12:16 <mthaddon> stub: we can't - they're the app servers (in readonly mode)
Jan 27 09:12:22 <stub> mthaddon: Lots of shipit still talking to launchpad_prod_3
Jan 27 09:12:24 <mthaddon> stub: how about ~/scripts/losa-db-scripts/pgkillidle.py -s 0 -u '^(?!sso_|slony$|postgres$)'
Jan 27 09:12:34 <stub> Hmm.... so switching to read only mode doesn't drop connections to the masters?
Jan 27 09:12:42 <mthaddon> it seems like it doesn't, no
Jan 27 09:12:45 <stub> mthaddon: Not sure if that will work either
Jan 27 09:13:02 <stub> mthaddon: Simplest way is to bounce the database
Jan 27 09:13:12 <mthaddon> ok
Jan 27 09:13:14 <mthaddon> ugh
Jan 27 09:13:23 <mthaddon> stub: using -force?
Jan 27 09:13:29 <stub> Yes
Jan 27 09:13:33 <stub> Otherwise kill things manually
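For readers unsure which database users the `-u` pattern in the log selects, here is a minimal sketch. It assumes the scripts apply the pattern as a Python regular expression matched against the user name; the internals of pgkillactive.py and pgkillidle.py are not shown in this report, so that is an assumption. The user names below are hypothetical.

```python
import re

# Pattern copied verbatim from the -u option in the IRC log above.
USER_PATTERN = re.compile(r'^(?!sso_|slony$|postgres$)')


def is_targeted(username):
    """Return True if the pattern selects this database user.

    Assumption: the pattern is matched from the start of the user name.
    """
    return USER_PATTERN.match(username) is not None


# Hypothetical user names, for illustration only.
for user in ["launchpad_main", "shipit", "sso_main", "slony", "postgres", "slony_backup"]:
    print(f"{user:15} targeted={is_targeted(user)}")
```

Under that assumption the negative lookahead spares any user starting with `sso_` and the users named exactly `slony` or `postgres`; everything else, including a name such as `slony_backup`, is selected.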


Guilherme Salgado (salgado) wrote on 2010-01-27:

#2

This is possible because we only close the connections when we start processing a new request, so threads that have been sitting idle since the switch will keep their connections open.
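A toy model of that behaviour, not Launchpad's actual code: the `Worker` class and the connection labels are made up for illustration. The only point it makes is that a mode check performed at the start of a request never runs in a thread that receives no request.

```python
class Worker:
    """Toy stand-in for an app server thread holding one DB connection."""

    def __init__(self, name):
        self.name = name
        self.connection = "read-write"  # opened before the switch

    def handle_request(self, read_only):
        # The mode check happens only here, at the start of a request.
        if read_only and self.connection == "read-write":
            self.connection = "read-only"  # stand-in for close + reconnect
        return self.connection


workers = [Worker(f"thread-{i}") for i in range(4)]
read_only = True  # the switch to read-only mode has happened

# Only two of the four threads happen to receive a request afterwards.
for worker in workers[:2]:
    worker.handle_request(read_only)

lingering = [w.name for w in workers if w.connection == "read-write"]
print(lingering)  # ['thread-2', 'thread-3']
```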


Guilherme Salgado (salgado) wrote on 2010-01-27:

#3

I think we need to somehow wake the threads up every once in a while and have them close their connections if they're using the wrong database.

Another option would be to hit every app server with a bunch of simultaneous connections after the read-only.txt file is created, hoping that it will be enough to have each thread process at least one request, thus closing its connections to the read-write database. I don't think this would be acceptable, though.

How big a deal would it be if we had to manually close these idle connections that may be left behind?
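The first option above (waking idle threads periodically) could look roughly like the sketch below. It is not Launchpad code: the `Worker` class, the `wake_interval` parameter, and the use of a shared `threading.Event` to signal read-only mode are all assumptions made for illustration. Each thread waits for work with a timeout and uses every timeout as a chance to drop a connection to the wrong database.

```python
import queue
import threading
import time

read_only = threading.Event()  # hypothetical flag, set when the mode switches


class Worker(threading.Thread):
    def __init__(self, requests, wake_interval=0.05):
        super().__init__(daemon=True)
        self.requests = requests
        self.wake_interval = wake_interval
        self.connection = "read-write"  # opened before the switch
        self.stopping = threading.Event()

    def drop_wrong_connection(self):
        if read_only.is_set() and self.connection == "read-write":
            self.connection = None  # stand-in for closing the real connection

    def run(self):
        while not self.stopping.is_set():
            try:
                self.requests.get(timeout=self.wake_interval)
            except queue.Empty:
                # No request arrived: use the wake-up to check the mode.
                self.drop_wrong_connection()
                continue
            self.drop_wrong_connection()
            # ... the request would be handled here ...


requests = queue.Queue()
workers = [Worker(requests) for _ in range(3)]
for worker in workers:
    worker.start()

read_only.set()   # switch to read-only; no requests are ever sent
time.sleep(0.5)   # several wake intervals pass

for worker in workers:
    worker.stopping.set()
for worker in workers:
    worker.join(timeout=2)

print([worker.connection for worker in workers])
```

With no traffic at all, every thread still closes its read-write connection within roughly one wake interval of the switch. The trade-off is a small amount of periodic work in otherwise idle threads.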


Francis J. Lacoste (flacoste) wrote on 2010-01-27:

#4

Do these idle connections actually cause problems for the upgrade, or is the concern just to make sure that they don't suddenly become active? If they don't cause problems while idle, we could proceed, since these app server connections will be dropped automatically by the app server if a request comes in (because it would switch to the read-only store).


Gary Poster (gary) wrote on 2010-01-27:

#5

Stuart, could you please give us your evaluation of whether this symptom is fine or a problem?

Changed in launchpad-foundations:
assignee:	nobody → Stuart Bishop (stub)
status:	New → Triaged
importance:	Undecided → Low


Stuart Bishop (stub) wrote on 2010-01-28:

#6

The IDLE connections do not cause a problem for the upgrade. They do cause confusion.

Are we sure we are closing the connections as requests are handled in read-only mode? At production loads of around 20 requests per second, this means all the connections should have been dropped in well under a minute.

We should test this on staging.
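A back-of-the-envelope check of the "well under a minute" estimate, under assumptions that are not in this report: requests are dispatched to threads uniformly at random, and the total thread count (100 below) is a made-up figure. Under uniform dispatch, the expected number of requests before every one of N threads has handled at least one is N times the N-th harmonic number (the coupon collector result).

```python
def expected_seconds(threads, requests_per_second):
    """Expected time until every thread has handled at least one request.

    Assumes each request goes to a thread chosen uniformly at random,
    which may not match how a real app server dispatches requests.
    """
    harmonic = sum(1.0 / k for k in range(1, threads + 1))
    return threads * harmonic / requests_per_second


# 100 threads is a hypothetical figure; 20 requests per second is from the comment above.
print(round(expected_seconds(100, 20), 1))
```

For those inputs the expectation is about 26 seconds, which agrees with the estimate above. Connections lingering for many minutes would therefore suggest either non-uniform dispatch or that connections are not being closed as requests are handled.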
