Need more visibility into the progress of schema updates across master and slave DBs

Bug #531833 reported by Tom Haddon on 2010-03-04
This bug affects 1 person
Affects: Launchpad itself
Status: Won't Fix
Importance: High
Assigned to: Stuart Bishop
Milestone: none

Bug Description

Each LP rollout at the moment is something of a lottery in terms of timing. Figuring out whether any of the upgrade.py/fti.py/security.py steps are blocked is non-trivial, so it's hard for us to know when things are going wrong. If we've estimated that the upgrade should take 30 mins, we only really start to worry that something has gone wrong after about 35 minutes. If we're still only 20% of the way through at that stage, our outage estimates are going to be completely wrong.

Ideally we'd have some easy way of determining the progress of updates against the master and each slave DB, and whether they're being blocked in any way.

More generally, it would also be useful to see what is happening on the slave DBs: the queue of items each one is waiting to process, and whether it is blocked in any way.

Tom Haddon (mthaddon) on 2010-03-04
Changed in launchpad:
importance: Undecided → High
Brad Crittenden (bac) on 2010-03-04
affects: launchpad → launchpad-foundations
Changed in launchpad-foundations:
status: New → Triaged
Changed in launchpad-foundations:
status: Triaged → New
status: New → Triaged
Gary Poster (gary) on 2010-03-04
Changed in launchpad-foundations:
milestone: none → 10.03
Tom Haddon (mthaddon) on 2010-03-04
summary: - Need more visibility into the progress of upgrade.py/fti.py/security.py
+ Need more visibility into the progress of schema updates across master
+ and slave DBs
description: updated
Stuart Bishop (stub) wrote :

We can't get this level of detail out of the Slony tools themselves.

We might be able to get meaningful information out of the slony log files.

We could get the slony tools customized to provide less noise and more meaningful feedback.

Have there been blockages detected that were not caused by connections that should have been disconnected? pg_stat_activity can provide information on open connections, and we could write a report to aggregate all the servers if we need to.
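As a rough sketch of the kind of aggregated report mentioned above, the snippet below loops over a list of servers and asks pg_stat_activity for sessions waiting on a lock. The host names, user, and database name are assumptions for illustration, not the real Launchpad topology; the `waiting` boolean column is the pre-PostgreSQL-9.2 form that was current at the time.

```shell
#!/bin/sh
# Sketch: print lock-waiting sessions for each server.
# Host names, user and database below are hypothetical placeholders.
report_blocked() {
    for host in "$@"; do
        echo "== $host =="
        # -At gives unaligned, tuples-only output, one session per line.
        psql -h "$host" -U postgres -d launchpad_prod -At -c \
            "SELECT procpid, usename, current_query
               FROM pg_stat_activity
              WHERE waiting"
    done
}
```

Usage would be something like `report_blocked master.internal slave1.internal slave2.internal`, with empty sections meaning nothing is blocked on that server.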

Or is this more about getting 'I'm not blocked, I'm busy doing stuff' information? This information should all be in the slony log files. Currently it is buried in a lot of noise. We should consider switching to the Slony-I 2.x series which has apparently cleaned up the logging a lot.

Gary Poster (gary) on 2010-03-23
Changed in launchpad-foundations:
milestone: 10.03 → none
Stuart Bishop (stub) wrote :

I'll look at getting this information from the logs during the rollout. I think we can just run grep on the slon logs with a bit of filtering to get everything we need.

Changed in launchpad-foundations:
assignee: nobody → Stuart Bishop (stub)
Stuart Bishop (stub) wrote :

The slon log for one of the slaves, when correctly filtered, provides the relevant information.

When making changes, slonik first applies all the db patches to the master. If they succeed, the patches are then applied in sequence to each of the slaves. So patch1 on slave1, patch1 on slave2, patch2 on slave1, patch2 on slave2 etc.

The following seems good to follow what is happening:

tail -f /var/log/slon/slon-launchpad_prod_1.log | grep -v '] DEBUG2'

I'm leaving the DEBUG1 messages in, as they are not that noisy: they appear regularly enough to let you know things haven't crashed, but not so often that they obscure the more useful information, such as the DDL statements being applied.
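To make the effect of the filter concrete, here it is applied to a few fabricated sample lines (the real slon log format may differ in detail): the DEBUG2 chatter is dropped, while the DEBUG1 heartbeat and the DDL line survive.

```shell
#!/bin/sh
# Fabricated sample log lines for illustration only; only the
# "] DEBUG2" pattern matters to the filter.
filtered=$(grep -v '] DEBUG2' <<'EOF'
2010-03-04 12:00:01 UTC [12345] DEBUG2 remoteWorkerThread_1: syncing
2010-03-04 12:00:02 UTC [12345] DEBUG1 remoteWorkerThread_1: SYNC 100 done
2010-03-04 12:00:03 UTC [12345] CONFIG DDL request: ALTER TABLE person ADD COLUMN x text
EOF
)
echo "$filtered"
```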

Stuart Bishop (stub) wrote :

I'll flag this as won't fix, as I believe this is good enough for our needs.

Changed in launchpad-foundations:
status: Triaged → Won't Fix