debug mode doesn't scale
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
akanda | Fix Released | Medium | Adam Gandelman |
Bug Description
Currently, debug mode may be enabled by the admin for specific routers or specific tenants. The method for enabling persistent debug mode is to create files in a configured directory named the UUID of the resource to ignore. For every incoming event, the RUG workers check these directories and determine if the resource corresponding to the event is in debug mode. If so, the event is ignored.
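The file-based check described above can be sketched roughly as follows. This is a minimal illustration, not the RUG's actual code: the function name `is_ignored`, its parameters, and the assumption of a single ignore directory are all hypothetical.

```python
import os


def is_ignored(ignore_dir, router_id, tenant_id=None):
    """Return True if a file named after the router or tenant UUID exists.

    Hypothetical helper: the directory layout (one empty file per ignored
    UUID) follows the bug description, the names are made up.
    """
    try:
        ignored = set(os.listdir(ignore_dir))
    except OSError:
        # Missing or unreadable directory: treat it as "nothing is ignored".
        return False
    return router_id in ignored or (
        tenant_id is not None and tenant_id in ignored
    )
```

Because the lookup only consults the local filesystem, each RUG node sees only the ignore files created on that node.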
This works fine in environments with a single RUG instance, but it becomes a problem if we scale the RUG out to multiple nodes. Without a shared filesystem mounted at the ignore directory, operators need to find the RUG process that is managing the resource they wish to ignore and create the ignore files on that node. We need to update this so operators can scale out to multiple RUGs without removing or complicating the process used to put routers or tenants in debug mode.
While we're at it, we might as well add a new global debug mode. This mode should be toggled in a manner similar to routers/tenants, but should instruct all RUG processes that the entire system is in debug mode and that all events are to be ignored. (This is a feature request.)
Some possible solutions:
* replace os.listdir(
* create an extensible driver interface for debug checking: the default driver would maintain existing behavior, an alternative driver could be added to use a set of configured hooks, and additional drivers could be developed (by us or operators) to check external systems (database, webserver, etc.) for debug lists
* drop the file/externally managed debug modes entirely. rely entirely on the non-persistent debug mode (initiated via rug-ctl router debug $id), or introduce a RUG database to back debug state and make the debug mode initiated via rug-ctl persistent.
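As an illustration of the driver-interface option in the list above, here is a minimal sketch. All class and function names (`DebugDriver`, `FileDebugDriver`, `StaticDebugDriver`, `should_ignore`) are hypothetical and do not come from the akanda-rug code base. The static driver is only a stand-in for an external system such as a database or web service.

```python
import abc
import os


class DebugDriver(abc.ABC):
    """Hypothetical interface: answers whether a resource is in debug mode."""

    @abc.abstractmethod
    def is_debug(self, resource_id):
        """Return True if events for resource_id should be ignored."""


class FileDebugDriver(DebugDriver):
    """Default behavior: a file named after the UUID marks it as ignored."""

    def __init__(self, ignore_dir):
        self.ignore_dir = ignore_dir

    def is_debug(self, resource_id):
        return os.path.exists(os.path.join(self.ignore_dir, resource_id))


class StaticDebugDriver(DebugDriver):
    """Stand-in for an external source; holds the debug list in memory."""

    def __init__(self, resource_ids):
        self._ids = set(resource_ids)

    def is_debug(self, resource_id):
        return resource_id in self._ids


def should_ignore(resource_ids, driver):
    """Ignore an event if any related resource (router, tenant) is in debug."""
    return any(driver.is_debug(rid) for rid in resource_ids)
```

With this shape, a worker would call `should_ignore([router_id, tenant_id], driver)` for each event, and swapping the backend would only mean constructing a different driver.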
This currently blocks the RUG HA blueprint.
description: | updated |
Changed in akanda: | |
status: | New → In Progress |
tags: | added: akanda-rug |
Changed in akanda: | |
status: | Fix Committed → Fix Released |
assignee: | nobody → Adam Gandelman (gandelman-a) |
importance: | Undecided → Medium |
Reviewed: https://review.openstack.org/198176
Committed: https://git.openstack.org/cgit/stackforge/akanda-rug/commit/?id=6a3261958b8eccfe4f1fd1229036a1206597dd55
Submitter: Jenkins
Branch: master
commit 6a3261958b8eccfe4f1fd1229036a1206597dd55
Author: Adam Gandelman <email address hidden>
Date: Tue Jul 28 21:58:00 2015 -0700
Adds a DB layer, use it for debug modes
In preparation for scale out RUG, this adds a database layer
(built on oslo.db) that will be used for managing debug modes.
Instead of tracking debug'd/ignored routers and tenants in-memory
or on disk as files, this uses a database. This means that putting
things into debug mode via rug-ctl is now persistent, and the file-based
approach is no longer available. A sqlite database (the default) can be
used for single node installs, or the RUG can be pointed at mysql/pg to
handle this in larger environments.
This also adds a global debug mode that can be used to ignore all events
during maintenance periods.
A new optional 'reason' argument has been added to the debug modes, allowing
operators to add a note when entering a tenant/router/cluster into debug
mode.
Change-Id: I3f5129e11b11cf5aaed8889da3b204104e5ad203
Closes-bug: #1470619
Partially implements: blueprint rug-scaling
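The database-backed approach described in the commit message can be sketched as follows. This is an assumption-laden illustration only: the real change is built on oslo.db, whereas this sketch uses Python's standard `sqlite3` module as a stand-in, and the table schema, the `DebugStore` class, and its method names are hypothetical.

```python
import sqlite3


class DebugStore:
    """Hypothetical debug-mode store; sqlite3 stands in for oslo.db."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        # One row per debug entry; kind is 'router', 'tenant' or 'global'.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS debug ("
            " kind TEXT NOT NULL,"
            " uuid TEXT NOT NULL,"
            " reason TEXT,"
            " PRIMARY KEY (kind, uuid))"
        )

    def enable(self, kind, uuid="", reason=None):
        """Put a resource (or the whole system) into debug mode."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO debug (kind, uuid, reason)"
                " VALUES (?, ?, ?)",
                (kind, uuid, reason),
            )

    def disable(self, kind, uuid=""):
        """Take a resource (or the whole system) out of debug mode."""
        with self.conn:
            self.conn.execute(
                "DELETE FROM debug WHERE kind = ? AND uuid = ?", (kind, uuid)
            )

    def reason(self, kind, uuid=""):
        """Return the operator's note for an entry, or None."""
        row = self.conn.execute(
            "SELECT reason FROM debug WHERE kind = ? AND uuid = ?",
            (kind, uuid),
        ).fetchone()
        return row[0] if row else None

    def should_ignore(self, router_id, tenant_id):
        """True if global debug is on or the router/tenant is in debug."""
        row = self.conn.execute(
            "SELECT 1 FROM debug WHERE kind = 'global'"
            " OR (kind = 'router' AND uuid = ?)"
            " OR (kind = 'tenant' AND uuid = ?) LIMIT 1",
            (router_id, tenant_id),
        ).fetchone()
        return row is not None
```

If every RUG node points at the same database, they all consult the same debug state, so an operator no longer needs to know which node manages a given router.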