docker stop in update tasks breaks neutron and dataplane

Bug #1777146 reported by Brent Eagles
This bug affects 2 people
Affects: tripleo
Importance: Low
Assigned to: Brent Eagles

Bug Description

We stop all running containers if we detect that docker needs to be restarted (e.g. the rpm has been updated or the config has been changed). This includes containers that neutron launched, and stopping them ends up breaking the dataplane.

Brent Eagles (beagles)
Changed in tripleo:
milestone: none → rocky-3
status: New → Triaged
importance: Undecided → Critical
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-role-container-registry (master)

Fix proposed to branch: master
Review: https://review.openstack.org/575756

Changed in tripleo:
assignee: nobody → Brent Eagles (beagles)
status: Triaged → In Progress
Brent Eagles (beagles)
tags: added: queens-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/575758

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/queens)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/575758
Reason: we have gate problems again, please do not restore or recheck, I'll take care of this one when gate is back stable.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on ansible-role-container-registry (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.openstack.org/575756
Reason: we have gate problems again, please do not restore or recheck, I'll take care of this one when gate is back stable.

Brent Eagles (beagles)
Changed in tripleo:
importance: Critical → High
Brent Eagles (beagles) wrote :

I'm dropping the severity to "high". My initial concern was that the keepalived sidecars were not cleaning up their VIPs properly on shutdown, since we were seeing "that kind of thing" in a bug report. Looking more closely at the logs, etc., this doesn't seem likely to be what was happening, and even if it were, the bug would lie in keepalived or in how we run it.

The bug is still worthy of a high severity though, IMO, as we still want to avoid trashing the sidecars unnecessarily, because it *is* a performance/behavior regression from how updates would've behaved on baremetal.

Changed in tripleo:
milestone: rocky-3 → rocky-rc1
Changed in tripleo:
milestone: rocky-rc1 → stein-1
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/601590

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/queens)

Reviewed: https://review.openstack.org/601590
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=78358da44f37cd455421677d5b123668bc2efa33
Submitter: Zuul
Branch: stable/queens

commit 78358da44f37cd455421677d5b123668bc2efa33
Author: Jiri Stransky <email address hidden>
Date: Tue Sep 11 14:41:22 2018 +0200

    [Queens] Don't stop docker on config changes

    This will allow us to take advantage of the Docker live restore
    option, and not restart all containers during minor update on small
    docker config changes. The containers will be restarted only in case
    Docker RPM is updated during the minor update.

    If we ship a config change which wouldn't work well with live restore,
    we'll need to re-enable the restarts on config change. Docker
    documentation does not specify which config options are safe to change
    and which aren't.

    Change-Id: I97cd28a7e1dfc1bec943754fbf42519b6f563da6
    Closes-Bug: #1777146
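
The change in behaviour described by this commit can be sketched roughly as follows. This is an illustrative sketch only, not the actual tripleo-heat-templates update task: the function names `stops_containers_before_fix` and `stops_containers_after_fix` are hypothetical, and the only real Docker setting used is the `live-restore` daemon option, which is what lets containers keep running while the daemon is down.

```python
import json


def stops_containers_before_fix(rpm_updated: bool, config_changed: bool) -> bool:
    # Behaviour described in the bug: either trigger stops every container,
    # including the sidecars that neutron launched.
    return rpm_updated or config_changed


def stops_containers_after_fix(rpm_updated: bool, config_changed: bool) -> bool:
    # Behaviour described in the commit: only a Docker RPM update stops
    # containers; config_changed is deliberately ignored.
    return rpm_updated


# Docker's live restore is enabled with the "live-restore" daemon option,
# typically set in /etc/docker/daemon.json.
daemon_json = json.dumps({"live-restore": True}, indent=2)

if __name__ == "__main__":
    print(daemon_json)
    for rpm in (False, True):
        for cfg in (False, True):
            print(rpm, cfg,
                  stops_containers_before_fix(rpm, cfg),
                  stops_containers_after_fix(rpm, cfg))
```

The only case that differs between the two functions is a config change without an RPM update, which is the case the commit message targets.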

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 8.0.6

This issue was fixed in the openstack/tripleo-heat-templates 8.0.6 release.

Changed in tripleo:
milestone: stein-1 → stein-2
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ansible-role-container-registry (master)

Change abandoned by Brent Eagles (<email address hidden>) on branch: master
Review: https://review.openstack.org/575756
Reason: There is some debate about how to detect when it *would* be necessary to stop containers (i.e. an incompatible upgrade). Abandoning for now, can be revived later.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/queens)

Change abandoned by Brent Eagles (<email address hidden>) on branch: stable/queens
Review: https://review.openstack.org/575758
Reason: There was some discussion about how to handle situations where a restart would be required on a docker upgrade. Abandoning to be revived later on if necessary/prioritized.

Changed in tripleo:
milestone: stein-2 → stein-3
Juan Antonio Osorio Robles (juan-osorio-robles) wrote :

Is this still an issue?

Brent Eagles (beagles) wrote :

It's still an issue, but we couldn't figure out how we wanted to deal with docker upgrades needing a restart. However, as this was filed a while ago, we have not merged a fix, and nobody has been pressing for a solution, I would rate this as low priority.

Changed in tripleo:
importance: High → Low
Brent Eagles (beagles) wrote :

Actually, since this isn't relevant to the current release, maybe the proper course of action is to mark it as invalid?

Changed in tripleo:
milestone: stein-3 → none
Mark Goddard (mgoddard) wrote :

Removed kolla-ansible from affected projects and created https://bugs.launchpad.net/kolla-ansible/+bug/1843397 to cover the issue on our side.

no longer affects: kolla-ansible