Need an easy way to force app servers to revert to non slave DB connections

Bug #317697 reported by Tom Haddon
Affects: Launchpad itself
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

We've had a few issues recently where we've had to revert connections away from the slave DB to the main DB. Because diagnosing such an issue can take a while, it would be very useful to be able to quickly stop the app servers from using the slave DB, for instance by checking for the presence of a text file called "no-slave-db.txt" in the root of the code tree.

Extra points if this could be done without a restart of the application servers.

At the moment, we have to manually edit config/{lpnet,edge}-lazr.conf, and then restart the app servers.

Tom Haddon (mthaddon) wrote :

And then to revert to using the slave DB again once this file is removed...
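
A minimal sketch of the sentinel-file idea described above, assuming the check is made each time a connection string is chosen so that no restart is needed. The function name, the constant, and the connection strings are hypothetical and do not come from the Launchpad code base.

```python
import os
import tempfile

# Hypothetical names: neither the function nor the connection strings
# come from Launchpad's code base.
SENTINEL_NAME = "no-slave-db.txt"


def pick_connection_string(tree_root, master_dsn, slave_dsn):
    """Return the DSN to use for read-only work.

    The sentinel file is checked on every call, so creating or removing
    it takes effect without restarting the process.
    """
    if os.path.exists(os.path.join(tree_root, SENTINEL_NAME)):
        return master_dsn
    return slave_dsn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        master = "dbname=main host=master"
        slave = "dbname=main host=slave"
        # No sentinel file yet: the slave DSN is returned.
        print(pick_connection_string(root, master, slave))
        # Creating the file switches read-only work to the master.
        open(os.path.join(root, SENTINEL_NAME), "w").close()
        print(pick_connection_string(root, master, slave))
```

Checking the file on every call costs one stat per lookup. A real implementation might cache the result for a few seconds, at the price of a short delay before the switch takes effect.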

Stuart Bishop (stub) wrote : Re: [Bug 317697] [NEW] Need an easy way to force app servers to revert to non slave DB connections

On Fri, Jan 16, 2009 at 7:50 AM, Launchpad Bug Tracker wrote:

> At the moment, we have to manually edit config/{lpnet,edge}-lazr.conf,
> and then restart the app servers.

A more reliable option is to have configs prepared under a different
name, inheriting from the real configs and just overriding the slave
connection strings.

We should look at a nicer way of doing this as requested - doubling
the number of production configs, even if the extras are only three
lines long, isn't nice.
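
As a rough illustration of the "prepared override config" idea above, the sketch below layers a short override on top of a base config so that only the slave connection string changes. It uses Python's standard configparser as a stand-in and is not lazr.config's own inheritance mechanism. The section name, keys, and connection strings are all hypothetical.

```python
import configparser

# All section names, keys and connection strings below are hypothetical.
BASE_CONF = """
[database]
main_master = dbname=prod host=master-db
main_slave = dbname=prod host=slave-db
"""

# The "no slave" variant only overrides the slave connection string,
# pointing it at the master.
NO_SLAVE_OVERRIDE = """
[database]
main_slave = dbname=prod host=master-db
"""


def load_layered(*layers):
    """Parse config texts in order; later layers override earlier ones."""
    parser = configparser.ConfigParser()
    for text in layers:
        parser.read_string(text)
    return parser


if __name__ == "__main__":
    fallback = load_layered(BASE_CONF, NO_SLAVE_OVERRIDE)
    print(fallback["database"]["main_slave"])
```

Switching to the fallback then means pointing the app server at the override config rather than hand-editing the real one, although, as noted above, a restart is still needed and the number of config files doubles.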

--
Stuart Bishop <email address hidden>
http://www.stuartbishop.net/

Stuart Bishop (stub)
Changed in launchpad-foundations:
importance: Undecided → Medium
status: New → Triaged
Christian Reis (kiko) wrote :

Is doing this live really that hard? I would think it's just a simple change to a global -- no?

Tom Haddon (mthaddon) wrote :

It's not so much hard as time consuming, which by definition is something we want to avoid - if we're reverting it's because we're experiencing performance issues...

Currently to revert:

- Log on to each physical server
- Manually edit a configuration file
- Restart any application servers that are using that code tree

This takes at least a few minutes to do and is error-prone because it involves manual editing of config files. We need something that can be done much more quickly than this, and ideally without needing to restart the application servers.

Christian Reis (kiko) wrote :

Tom, I was suggesting doing a live switch from slaved to non-slaved, programmatically, and not that it should be done manually ;-)

Tom Haddon (mthaddon) wrote :

Oh, I see... you were suggesting that fixing it shouldn't really be hard, not that the current way we're doing it isn't. Don't mind me...

Tom Haddon (mthaddon)
tags: added: canonical-losa-lp
Stuart Bishop (stub) wrote :

Shutting down cache-database-replication-lag.py now does this.

Changed in launchpad-foundations:
status: Triaged → Fix Released
Tom Haddon (mthaddon) wrote :

This wouldn't do it immediately, right? It'd just mean we're actually querying for this directly on the DB which may work okay for a while but then slow things down again whenever the query took too long I think.

Stuart Bishop (stub) wrote : Re: [Bug 317697] Re: Need an easy way to force app servers to revert to non slave DB connections

On Mon, May 31, 2010 at 3:18 PM, Tom Haddon <email address hidden> wrote:

> This wouldn't do it immediately, right? It'd just mean we're actually
> querying for this directly on the DB which may work okay for a while but
> then slow things down again whenever the query took too long I think.

It phases in. The appservers see lag start to increase, and once it
reaches max_usable_lag from the config (default 2 minutes), the slave
is never used.

If by 'querying for this directly on the DB' you mean calling the
replication_lag() stored procedure, the appservers never do this. They
only ever query the DatabaseReplicationLag cache.
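
The behaviour described above can be sketched roughly as follows. The thread does not say how an appserver sees lag grow once the cache updater is stopped, so this sketch assumes the effective lag is the cached value plus the age of the cache entry. That model and the function names are assumptions. Only max_usable_lag and its 2 minute default come from the discussion.

```python
import time

MAX_USABLE_LAG = 120.0  # seconds; the thread gives 2 minutes as the default


def effective_lag(cached_lag, cached_at, now=None):
    """Cached lag plus the age of the cache entry (an assumed model)."""
    if now is None:
        now = time.time()
    return cached_lag + max(0.0, now - cached_at)


def slave_is_usable(cached_lag, cached_at, now=None,
                    max_usable_lag=MAX_USABLE_LAG):
    """True while the effective lag is still below max_usable_lag."""
    return effective_lag(cached_lag, cached_at, now) < max_usable_lag


if __name__ == "__main__":
    # Cache written at t=1000 with 5 seconds of measured lag.
    print(slave_is_usable(5.0, 1000.0, now=1010.0))
    # Updater stopped; 115 seconds later the threshold is reached.
    print(slave_is_usable(5.0, 1000.0, now=1115.0))
```

Under this assumption, stopping the updater makes every appserver fall back to the master within max_usable_lag seconds, without any of them calling replication_lag() directly.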

--
Stuart Bishop <email address hidden>
http://www.stuartbishop.net/
