Bind backend rndc commands aren't limited

Bug #1896783 reported by Michael Chapman
This bug affects 1 person
Affects    Status         Importance  Assigned to      Milestone
Designate  Fix Committed  High        Michael Chapman

Bug Description

In a local test environment I misconfigured the pools.yaml to point at bind servers that don't exist, then ran tempest. The result was about 1500 rndc processes that don't appear to ever disappear, causing significant load on the system.
The environment is deployed using tripleo, so the processes are all in the designate-worker container:

[root@controller-1 ~]# podman stats designate_worker

ID NAME CPU % MEM USAGE / LIMIT MEM % NET IO BLOCK IO PIDS
18103f368b4e designate_worker 1.73% 19.13GB / 33.56GB 57.02% -- / -- 4.882GB / 8.536MB 2076

Steps to reproduce:

1. Deploy devstack with the bind backend.
2. Edit /etc/designate/pools.yaml so rndc_host doesn't point at a bind server
3. Update the pools: designate-manage pool update --file /etc/designate/pools.yaml
4. Run tempest: tox -e all-plugin -- designate

The rndc process count climbs this high even though designate-worker is configured with only 2 workers:
/etc/designate.conf:
[service:worker]
workers = 2

It might make sense to maintain a task queue for the rndc commands so that only a limited number can be active at any given time.
rndc doesn't have a timeout option that I can see in the man page, so it might make sense to add one via oslo processutils. I think the built-in timeout is about 30 seconds.
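
A minimal sketch of both ideas together (a cap on concurrent commands plus a hard timeout), using only the Python standard library rather than oslo processutils. This is not the fix that was merged; the names run_limited and MAX_CONCURRENT, the cap of 4 and the 10 second default are all assumptions made for illustration.

```python
import subprocess
import sys
import threading

# Hypothetical cap on concurrent rndc invocations; not a Designate option.
MAX_CONCURRENT = 4
_slots = threading.BoundedSemaphore(MAX_CONCURRENT)


def run_limited(cmd, timeout=10.0, slots=_slots):
    """Run cmd with a hard timeout while holding one concurrency slot.

    Returns (returncode, stdout, stderr); returncode is None on timeout.
    """
    # Callers beyond the cap block here instead of spawning more processes.
    with slots:
        try:
            done = subprocess.run(
                cmd, capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills and reaps the child before raising,
            # so no stray process is left behind.
            return None, "", "timed out after %.1fs" % timeout
        return done.returncode, done.stdout, done.stderr


# Example call (assumes rndc is installed; 192.0.2.10 is a placeholder address):
# rc, out, err = run_limited(["rndc", "-s", "192.0.2.10", "-p", "953", "status"])
```

With this shape, an unreachable server costs at most MAX_CONCURRENT processes at a time, each for at most the timeout, instead of an unbounded pile of long-lived rndc processes.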

Changed in designate:
assignee: nobody → Michael Chapman (michaeltchapman)
summary: - Bind backend rndc commands have no timeout
+ Bind backend rndc commands aren't limited
description: updated
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to designate (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/761274

OpenStack Infra (hudson-openstack) wrote : Related fix merged to designate (master)

Reviewed: https://review.opendev.org/761274
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=10f19870c4bc503f414d5beef92c3939d91764d9
Submitter: Zuul
Branch: master

commit 10f19870c4bc503f414d5beef92c3939d91764d9
Author: Michael Chapman <email address hidden>
Date: Wed Nov 4 15:24:43 2020 +1100

    Add timeout to rndc commands

    In the event of a backend BIND server being unreachable for any reason,
    rndc commands will persist for a very long time and can consume
    significant resources. This can be seen when running devstack with
    a pool configured to point at a bind server that doesn't exist - the
    rndc process count can climb into the thousands.

    An optional timeout has been added to rndc to alleviate this.

    Change-Id: Idd61e79715b21fdd3249136cf68a7b9d3069c3f9
    Related-Bug: 1896783

Changed in designate:
status: New → Fix Committed
importance: Undecided → High