After switching to read-only some connections to the read-write database might still be open

Bug #513196 reported by Guilherme Salgado on 2010-01-27
This bug affects 2 people
Affects: Launchpad itself
Importance: Low
Assigned to: Stuart Bishop

Bug Description

Tom reported that during the roll out, after switching app servers to read-only, not all connections to the read-write database were closed.

Tom Haddon (mthaddon) wrote :

We were still seeing connections (idle, admittedly) from app servers to the read-write database a good number of minutes after switching to read-only. We ended up restarting the main DB, which had other knock-on issues, because we need to make sure there are no connections to this DB before we run the upgrade.

Jan 27 09:10:33 <mthaddon> stub: I've run "~/scripts/losa-db-scripts/pgkillactive.py -s 0 -u '^(?!sso_|slony$|postgres$)'" on wildcherry, but "ps fuwwxx | grep -v session_prod | grep -v slony" still shows a bunch of connections that shouldn't be there I think
Jan 27 09:10:50 <mthaddon> stub: should I just kill them manually?
Jan 27 09:11:38 <stub> mthaddon: I don't see any active connections. I see lots of IDLE ones...
Jan 27 09:11:51 <mthaddon> stub: so how do I get rid of those?
Jan 27 09:12:03 <stub> mthaddon: Shut down the services
Jan 27 09:12:16 <mthaddon> stub: we can't - they're the app servers (in readonly mode)
Jan 27 09:12:22 <stub> mthaddon: Lots of shipit still talking to launchpad_prod_3
Jan 27 09:12:24 <mthaddon> stub: how about ~/scripts/losa-db-scripts/pgkillidle.py -s 0 -u '^(?!sso_|slony$|postgres$)'
Jan 27 09:12:34 <stub> Hmm.... so switching to read only mode doesn't drop connections to the masters?
Jan 27 09:12:42 <mthaddon> it seems like it doesn't, no
Jan 27 09:12:45 <stub> mthaddon: Not sure if that will work either
Jan 27 09:13:02 <stub> mthaddon: Simplest way is to bounce the database
Jan 27 09:13:12 <mthaddon> ok
Jan 27 09:13:14 <mthaddon> ugh
Jan 27 09:13:23 <mthaddon> stub: using -force?
Jan 27 09:13:29 <stub> Yes
Jan 27 09:13:33 <stub> Otherwise kill things manually

Guilherme Salgado (salgado) wrote :

This is possible because we only close the connections when we start processing a new request, so threads that have been sitting idle since the switch will keep their connections open.
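
A minimal sketch of the behaviour described in this comment, not Launchpad's actual code. `FakeConnection`, `AppServer` and `demo` are hypothetical names, and the connection is a plain in-memory object rather than a real database connection. It assumes each worker thread keeps its own connection and only checks it against the current mode at the start of a request.

```python
import threading


class FakeConnection:
    """Hypothetical stand-in for a database connection."""

    def __init__(self, database):
        self.database = database
        self.closed = False

    def close(self):
        self.closed = True


class AppServer:
    """Hypothetical app server with one connection per worker thread."""

    def __init__(self):
        self.read_only = False
        self._local = threading.local()
        self._lock = threading.Lock()
        self.all_connections = []  # kept only so the demo can inspect them

    def handle_request(self):
        # The mode check happens only here, at the start of a request.
        wanted = "read-only" if self.read_only else "read-write"
        conn = getattr(self._local, "conn", None)
        if conn is not None and conn.database != wanted:
            conn.close()
            conn = None
        if conn is None:
            conn = FakeConnection(wanted)
            self._local.conn = conn
            with self._lock:
                self.all_connections.append(conn)
        return conn.database


def demo():
    server = AppServer()
    first_done = threading.Barrier(3)  # two workers plus the main thread
    go_again = threading.Event()

    def busy_worker():
        server.handle_request()
        first_done.wait()
        go_again.wait()
        server.handle_request()  # sees the switch and reconnects

    def idle_worker():
        server.handle_request()
        first_done.wait()
        # No further request arrives, so the mode is never re-checked.

    threads = [threading.Thread(target=busy_worker),
               threading.Thread(target=idle_worker)]
    for t in threads:
        t.start()
    first_done.wait()
    server.read_only = True  # the switch to read-only
    go_again.set()
    for t in threads:
        t.join()
    return [(c.database, c.closed) for c in server.all_connections]
```

In this sketch the worker that receives a request after the switch closes its read-write connection, while the worker that stays idle leaves its read-write connection open, which matches the symptom reported above.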

Guilherme Salgado (salgado) wrote :

I think we need to somehow wake the threads up every once in a while and have them close their connections if they're using the wrong database.

Another option would be to hit every app server with a bunch of simultaneous connections after the read-only.txt file is created, hoping that it will be enough to have each thread process at least one request, thus closing its connections to the read-write database. I don't think this would be acceptable, though.

How big a deal would it be if we had to manually close these idle connections that may be left behind?
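
A rough sketch of the first idea in this comment, under stated assumptions. Rather than waking the worker threads themselves, this variation uses a single housekeeping thread that periodically sweeps a shared registry and closes idle connections pointing at the wrong database. `TrackedConnection`, `ConnectionRegistry`, `start_sweeper` and the `in_use` flag are all hypothetical; a real implementation would need to confirm that closing a connection from another thread is safe for the database driver in use.

```python
import threading


class TrackedConnection:
    """Hypothetical connection that records its target and whether it is busy."""

    def __init__(self, database):
        self.database = database
        self.closed = False
        self.in_use = False  # assumed to be set by the worker during a request

    def close(self):
        self.closed = True


class ConnectionRegistry:
    """Hypothetical shared registry of all open connections."""

    def __init__(self):
        self._lock = threading.Lock()
        self._connections = []
        self.wanted_database = "read-write"

    def register(self, conn):
        with self._lock:
            self._connections.append(conn)

    def sweep(self):
        """Close idle connections to the wrong database; return how many."""
        closed = 0
        with self._lock:
            keep = []
            for conn in self._connections:
                if conn.database != self.wanted_database and not conn.in_use:
                    conn.close()
                    closed += 1
                else:
                    keep.append(conn)
            self._connections = keep
        return closed


def start_sweeper(registry, interval_seconds, stop_event):
    """Run registry.sweep() every interval_seconds until stop_event is set."""

    def run():
        # Event.wait returns False on timeout and True once the event is set.
        while not stop_event.wait(interval_seconds):
            registry.sweep()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

With something like this in place, switching modes would amount to changing `wanted_database`; idle connections to the old database would then be closed within one sweep interval, without waiting for a request to arrive.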

Francis J. Lacoste (flacoste) wrote :

Do these idle connections actually cause problems for the upgrade, or is the concern just to make sure that they don't suddenly become active? If they don't cause problems while idle, we could proceed, since these app server connections will be dropped automatically by the app server if a request comes in (because it would switch to the read-only store).

Gary Poster (gary) wrote :

Stuart, could you please give us your evaluation of whether this symptom is fine or a problem?

Changed in launchpad-foundations:
assignee: nobody → Stuart Bishop (stub)
status: New → Triaged
importance: Undecided → Low
Stuart Bishop (stub) wrote :

The IDLE connections do not cause a problem for the upgrade. They do cause confusion.

Are we sure we are closing the connections as requests are handled in read-only mode? At production loads of around 20 requests per second, this means all the connections should have been dropped in well under a minute.

We should test this on staging.
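
A back-of-the-envelope check of the "well under a minute" estimate above, under assumptions that are not in this report. It assumes each request is handled by a worker thread picked uniformly at random, so the expected number of requests needed to reach every thread at least once is the coupon-collector value n * (1 + 1/2 + ... + 1/n). The thread count is a hypothetical parameter, and the function names are illustrative.

```python
def expected_requests_to_touch_all(threads):
    """Expected requests until every thread has served at least one.

    Coupon-collector expectation, assuming uniform random dispatch.
    """
    return threads * sum(1.0 / k for k in range(1, threads + 1))


def expected_seconds(threads, requests_per_second):
    """Expected time until every thread has dropped its old connection."""
    return expected_requests_to_touch_all(threads) / requests_per_second
```

For example, `expected_seconds(64, 20)` comes out at roughly 15 seconds, which would be consistent with the estimate. If dispatch is not uniform (for instance, if the thread pool favours recently used threads), some threads could stay idle far longer, which is one possible explanation for the connections observed.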
