fix stop_grace_period for octavia worker container

Bug #1855684 reported by Gregory Thiemonge on 2019-12-09
This bug affects 1 person
Affects: tripleo
Importance: Undecided
Assigned to: Gregory Thiemonge

Bug Description

The default stop timeout for TripleO containers is 10 seconds. This may break octavia-worker services, which can run long "taskflow" flows when building a load balancer.
If an admin restarts the octavia-worker container while a load balancer is being created, octavia-worker is shut down non-gracefully, and Octavia leaks resources (VM instances and ports) that cannot be removed without manually editing the database.

To fix this issue, the stop timeout should be set to the same value as octavia-worker's graceful_shutdown_timeout (used by cotyledon to manage the service), which was set to 300 seconds in https://review.opendev.org/#/c/684201/.
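
The failure mode can be reproduced outside of TripleO with a small sketch. It assumes the usual container stop behaviour (send SIGTERM, wait for the stop timeout, then send SIGKILL) and uses a hypothetical child process as a stand-in for octavia-worker. The names `CHILD_CODE` and `stop_with_grace` are illustrative only and are not part of Octavia, TripleO or paunch, and the durations are scaled down from the real 10 and 300 seconds.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a worker: after SIGTERM it still needs
# `flow_seconds` to finish its in-flight flow before exiting cleanly.
CHILD_CODE = """
import signal, sys, time
flow_seconds = float(sys.argv[1])
def on_term(signum, frame):
    time.sleep(flow_seconds)  # finish the in-flight flow
    sys.exit(0)               # clean exit, nothing leaked
signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)    # tell the parent the handler is installed
while True:
    time.sleep(0.05)
"""


def stop_with_grace(flow_seconds, grace_seconds):
    """Mimic a container stop: SIGTERM, wait up to grace_seconds, then SIGKILL."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE, str(flow_seconds)],
        stdout=subprocess.PIPE,
        text=True,
    )
    proc.stdout.readline()  # block until the child has printed "ready"
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=grace_seconds)
        outcome = "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()  # grace period exceeded: the flow is cut off mid-way
        proc.wait()
        outcome = "killed"
    proc.stdout.close()
    return outcome


if __name__ == "__main__":
    # Grace period shorter than the remaining flow (the 10-second default case).
    print(stop_with_grace(flow_seconds=1.0, grace_seconds=0.2))
    # Grace period covering the remaining flow (the 300-second case).
    print(stop_with_grace(flow_seconds=0.2, grace_seconds=5.0))
```

When the grace period is shorter than the time the worker still needs, the process is killed before its flow completes, which corresponds to the leaked VM instances and ports described above. Aligning the stop timeout with graceful_shutdown_timeout gives the worker the full window it already expects.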

Changed in tripleo:
assignee: nobody → Gregory Thiemonge (gthiemonge)

Fix proposed to branch: master
Review: https://review.opendev.org/698014

Changed in tripleo:
status: New → In Progress

Reviewed: https://review.opendev.org/698014
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=c595835776eccbc2f59b69574f0aa5c3e87c9bd5
Submitter: Zuul
Branch: master

commit c595835776eccbc2f59b69574f0aa5c3e87c9bd5
Author: Gregory Thiemonge <email address hidden>
Date: Mon Dec 9 14:43:02 2019 +0100

    Set octavia services' stop grace period to 300sec

    Octavia worker, house-keeping and health-monitor services may use some
    long taskflow flows to handle load balancers and amphorae (launch VMs,
    etc.). Those flows should not be interrupted when restarting those
    services (e.g. when updating an overcloud, or restarting services
    because of certificate rotation), as this might cause resource leaks
    that cannot be fixed by an admin.

    As the default container stop timeout is 10 seconds, this timeout
    value needs to be increased for octavia services (except octavia api)
    to ensure a graceful shutdown.
    This new value has been set to 300 seconds according to the octavia
    worker default configuration introduced in
    https://review.opendev.org/#/c/684201/

    Closes-Bug: #1855684
    Change-Id: I8911a79328769c910d03168cfa5a421d0dd0f9b6

Changed in tripleo:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/703938
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=011935828040360940130d1402704f3bb68485e9
Submitter: Zuul
Branch: stable/stein

commit 011935828040360940130d1402704f3bb68485e9
Author: Gregory Thiemonge <email address hidden>
Date: Mon Dec 9 14:43:02 2019 +0100

    Set octavia services' stop grace period to 300sec

    Octavia worker, house-keeping and health-monitor services may use some
    long taskflow flows to handle load balancers and amphorae (launch VMs,
    etc.). Those flows should not be interrupted when restarting those
    services (e.g. when updating an overcloud, or restarting services
    because of certificate rotation), as this might cause resource leaks
    that cannot be fixed by an admin.

    As the default container stop timeout is 10 seconds, this timeout
    value needs to be increased for octavia services (except octavia api)
    to ensure a graceful shutdown.
    This new value has been set to 300 seconds according to the octavia
    worker default configuration introduced in
    https://review.opendev.org/#/c/684201/

    Closes-Bug: #1855684
    Change-Id: I8911a79328769c910d03168cfa5a421d0dd0f9b6
    (cherry picked from commit c595835776eccbc2f59b69574f0aa5c3e87c9bd5)

tags: added: in-stable-stein

Reviewed: https://review.opendev.org/703937
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=36f9cc78c88b377092cebca5b11f451af35f4f10
Submitter: Zuul
Branch: stable/train

commit 36f9cc78c88b377092cebca5b11f451af35f4f10
Author: Gregory Thiemonge <email address hidden>
Date: Mon Dec 9 14:43:02 2019 +0100

    Set octavia services' stop grace period to 300sec

    Octavia worker, house-keeping and health-monitor services may use some
    long taskflow flows to handle load balancers and amphorae (launch VMs,
    etc.). Those flows should not be interrupted when restarting those
    services (e.g. when updating an overcloud, or restarting services
    because of certificate rotation), as this might cause resource leaks
    that cannot be fixed by an admin.

    As the default container stop timeout is 10 seconds, this timeout
    value needs to be increased for octavia services (except octavia api)
    to ensure a graceful shutdown.
    This new value has been set to 300 seconds according to the octavia
    worker default configuration introduced in
    https://review.opendev.org/#/c/684201/

    Closes-Bug: #1855684
    Change-Id: I8911a79328769c910d03168cfa5a421d0dd0f9b6
    (cherry picked from commit c595835776eccbc2f59b69574f0aa5c3e87c9bd5)

tags: added: in-stable-train