debug mode doesn't scale

Bug #1470619 reported by Adam Gandelman
This bug affects 1 person
Affects: akanda
Status: Fix Released
Importance: Medium
Assigned to: Adam Gandelman
Milestone: (none)

Bug Description

Currently, debug mode may be enabled by the admin for specific routers or specific tenants. Persistent debug mode is enabled by creating a file in a configured directory, named after the UUID of the resource to ignore. For every incoming event, the RUG workers check these directories to determine whether the resource corresponding to the event is in debug mode. If so, the event is ignored.
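For illustration, the per-event check described above could look roughly like the following. This is a minimal sketch, not the actual RUG code: the function names, the dict-shaped event, and the `router_id`/`tenant_id` keys are assumptions made for the example.

```python
import os


def ignored_ids(ignore_dir):
    """Return the names of the files in ignore_dir (one file per ignored UUID)."""
    try:
        return set(os.listdir(ignore_dir))
    except FileNotFoundError:
        # A missing directory means nothing is in debug mode.
        return set()


def should_ignore(event, router_dir, tenant_dir):
    """True if the event's router or tenant has an ignore file on this node."""
    # Hypothetical event shape: a dict carrying router_id and tenant_id.
    return (
        event.get("router_id") in ignored_ids(router_dir)
        or event.get("tenant_id") in ignored_ids(tenant_dir)
    )
```

Note that the check only sees the local filesystem, so its answer depends on which node the ignore file was created on.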

This works fine in environments with a single RUG instance, but if we plan to scale the RUG out to multiple nodes this becomes a problem. Without a shared filesystem mounted at the ignore directory, operators need to find the RUG process that is managing the resource they wish to ignore, and be sure to create the ignore files on that node. We need to update this to allow operators to scale out to multiple RUGs without removing or complicating the process used to put routers or tenants in debug mode.

While we're at it, we might as well add a new global debug mode. This mode should be toggled in a manner similar to routers/tenants, but should tell all RUG processes that the entire system is in debug mode and that all events are to be ignored. (This is a feature request.)

Some possible solutions:

* replace os.listdir(ignored_router_directory) with a call to utils.execute('/etc/akanda-rug/hooks/router_debug'), a script that returns a list of UUIDs to ignore. The default could be a simple `ls $ignored_router_directory`, but operators could replace it to call into external systems (curl, db) that they use to centrally manage debug mode for things. This would work, but forking (multiple) processes to execute hooks on every incoming event introduces some serious overhead.

* create an extensible driver interface for debug checking--the default driver would maintain existing behavior. An alternative driver could be added to use a set of configured hooks. Additional drivers could be developed (by us or operators) to check external systems (database, webserver, etc.) for debug lists.

* drop the file/externally managed debug modes entirely. Rely entirely on the non-persistent debug mode (initiated via rug-ctl router debug $id), or introduce a RUG database to back debug state and make the debug mode initiated via rug-ctl persistent.
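As a rough sketch of the second option, a driver interface might be shaped like this. All class and function names here are hypothetical and chosen only for illustration; nothing below is taken from the akanda-rug code base.

```python
import abc
import os


class DebugDriver(abc.ABC):
    """Hypothetical interface: report which router UUIDs to ignore."""

    @abc.abstractmethod
    def ignored_routers(self):
        """Return a set of router UUIDs currently in debug mode."""


class DirectoryDriver(DebugDriver):
    """Default driver: keeps the existing one-file-per-UUID behavior."""

    def __init__(self, ignore_dir):
        self.ignore_dir = ignore_dir

    def ignored_routers(self):
        return set(os.listdir(self.ignore_dir))


class CallableDriver(DebugDriver):
    """Wraps any zero-argument callable, e.g. one that queries an external system."""

    def __init__(self, fetch):
        self.fetch = fetch

    def ignored_routers(self):
        return set(self.fetch())


def router_in_debug(driver, router_id):
    """Ask whichever driver is configured whether router_id is in debug mode."""
    return router_id in driver.ignored_routers()
```

The worker code would depend only on `DebugDriver`, so an operator-supplied driver could replace the directory lookup without touching event handling.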

This currently blocks the RUG HA blueprint.

Tags: akanda-rug
description: updated
Changed in akanda:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to akanda-rug (master)

Reviewed: https://review.openstack.org/198176
Committed: https://git.openstack.org/cgit/stackforge/akanda-rug/commit/?id=6a3261958b8eccfe4f1fd1229036a1206597dd55
Submitter: Jenkins
Branch: master

commit 6a3261958b8eccfe4f1fd1229036a1206597dd55
Author: Adam Gandelman <email address hidden>
Date: Tue Jul 28 21:58:00 2015 -0700

    Adds a DB layer, use it for debug modes

    In preparation for scale out RUG, this adds a database layer
    (built on oslo.db) that will be used for managing debug modes.
    Instead of tracking debug'd/ignored routers and tenants in-memory
    or on disk as files, this uses a database. This means that putting
    things into debug mode via rug-ctl is now persistent, and the file-based
    approach is no longer available. A sqlite database (the default) can be
    used for single node installs, or the RUG can be pointed at mysql/pg to
    handle this in larger environments.

    This also adds a global debug mode that can be used to ignore all events
    during maintenance periods.

    A new optional 'reason' argument has been added to the debug modes, allowing
    operators to add a note when entering a tenant/router/cluster into debug
    mode.

    Change-Id: I3f5129e11b11cf5aaed8889da3b204104e5ad203
    Closes-bug: #1470619
    Partially implements: blueprint rug-scaling
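To make the idea in the commit message concrete, here is a minimal sketch of database-backed debug state with a global mode and an optional reason. It uses Python's standard `sqlite3` module as a stand-in rather than oslo.db, and the table layout, class name, and method names are all assumptions made for the example, not the schema or API added by the commit.

```python
import sqlite3


class DebugStore:
    """Hypothetical store for router, tenant, and global debug state."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS debug ("
            "kind TEXT NOT NULL, uuid TEXT NOT NULL, reason TEXT, "
            "PRIMARY KEY (kind, uuid))"
        )
        self.conn.commit()

    def enable(self, kind, uuid="", reason=None):
        # kind is 'router', 'tenant', or 'global'; global uses an empty uuid.
        self.conn.execute(
            "INSERT OR REPLACE INTO debug (kind, uuid, reason) VALUES (?, ?, ?)",
            (kind, uuid, reason),
        )
        self.conn.commit()

    def disable(self, kind, uuid=""):
        self.conn.execute(
            "DELETE FROM debug WHERE kind = ? AND uuid = ?", (kind, uuid)
        )
        self.conn.commit()

    def lookup(self, kind, uuid=""):
        """Return (in_debug, reason) for the given resource."""
        row = self.conn.execute(
            "SELECT reason FROM debug WHERE kind = ? AND uuid = ?", (kind, uuid)
        ).fetchone()
        return (True, row[0]) if row else (False, None)

    def should_ignore(self, router_id, tenant_id):
        """True if global debug is on, or the router or tenant is in debug."""
        checks = [("global", ""), ("router", router_id), ("tenant", tenant_id)]
        return any(self.lookup(kind, uuid)[0] for kind, uuid in checks)
```

Because every RUG process would query the same database, an operator would not need to know which node manages a given router before putting it into debug mode.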

Changed in akanda:
status: In Progress → Fix Committed
Sean Roberts (sarob)
tags: added: akanda-rug
Changed in akanda:
status: Fix Committed → Fix Released
assignee: nobody → Adam Gandelman (gandelman-a)
importance: Undecided → Medium