Podman error when stopping heat_engine container with systemd

Bug #1821241 reported by Emilien Macchi on 2019-03-21
This bug affects 1 person
Affects    Status         Importance   Assigned to      Milestone
tripleo    Fix Released   High         Emilien Macchi   stein-rc1

Bug Description

Originally reported here:
https://bugzilla.redhat.com/show_bug.cgi?id=1691408

Problem: Redeploying an Undercloud fails when paunch tries to stop containers.

How to reproduce:
1) Deploy an Undercloud on RHEL8 with Stein.
2) Run: $ sudo systemctl stop tripleo_heat_engine
3) Look at the logs: $ sudo journalctl -u tripleo_heat_engine

You'll see the error:

Mar 21 18:08:03 undercloud.localdomain systemd[1]: Stopping heat_engine container...
Mar 21 18:08:38 undercloud.localdomain systemd[1]: heat_engine container is not active.
Mar 21 18:09:01 undercloud.localdomain podman[253956]: d6c494d9657c73e5fa9ac946136bf085ad84be8d17db26725ca54a8e8cec759f
Mar 21 18:09:01 undercloud.localdomain podman[26123]: time="2019-03-21T18:09:01Z" level=error msg="Error forwarding signal 15 to container d6c494d9657c73e5fa9ac946136bf085ad84be8d17db26725ca54a8e8cec759f: can only kill running containers: container state improper"

It always happens with the same container (heat-engine).
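Because the failure only shows up in the journal, it can help to check for it mechanically when testing a redeploy. The Python sketch below scans journal lines for podman's "Error forwarding signal" message and extracts the signal number, container ID and reason. The regular expression and the `find_signal_errors` helper are assumptions based solely on the log excerpt in this report; other podman versions may word the message differently.

```python
import re

# Assumed pattern, derived only from the journal excerpt in this report:
# "Error forwarding signal <N> to container <64-hex ID>: <reason>"
SIGNAL_ERROR = re.compile(
    r'Error forwarding signal (?P<signal>\d+) to container '
    r'(?P<container>[0-9a-f]{64}): (?P<reason>[^"]+)'
)


def find_signal_errors(journal_lines):
    """Return (signal, container_id, reason) for every matching journal line."""
    hits = []
    for line in journal_lines:
        match = SIGNAL_ERROR.search(line)
        if match:
            hits.append(
                (
                    int(match.group("signal")),
                    match.group("container"),
                    # The reason ends at the closing quote of podman's msg="..."
                    match.group("reason"),
                )
            )
    return hits
```

Feed it the lines printed by `journalctl -u tripleo_heat_engine` (step 3 of the reproducer); a non-empty result means the stop hit this race.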

Changed in tripleo:
milestone: none → stein-rc1
importance: Undecided → High
status: New → Triaged
assignee: nobody → Emilien Macchi (emilienm)
Emilien Macchi (emilienm) wrote:
tags: added: idempotency

Fix proposed to branch: master
Review: https://review.openstack.org/645550

Changed in tripleo:
status: Triaged → In Progress

Reviewed: https://review.openstack.org/645550
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=110a9496b2c87a4d41d863aa918c2c4535b32e22
Submitter: Zuul
Branch: master

commit 110a9496b2c87a4d41d863aa918c2c4535b32e22
Author: Emilien Macchi <email address hidden>
Date: Fri Mar 22 08:14:48 2019 -0400

    systemd: switch KillMode to 'none'

    When running the services with KillMode=process, there is a race
    condition between ExecStop and the command specified in ExecStart.
    The ExecStop seems faster and the container is killed then cleaned up.
    However podman started by ExecStart is still running and systemd kills
    it as soon as the ExecStop finished.

    Since we rely on Podman to manage the containers & processes, let's
    switch to KillMode=none.

    Credits to Giuseppe Scrivano for explaining the root cause.

    Change-Id: Icbf2b81477902e3d7ff9e064bf2408c2fc7e510e
    Closes-Bug: #1821241
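To make the fix concrete, here is a minimal sketch of a systemd unit for a podman-managed container, rendered from Python. The template is hypothetical: it is not copied from paunch, and the `render_unit` helper, `Restart=always`, the `podman start -a` / `podman stop -t` commands and the 10-second stop timeout are assumptions for illustration. The relevant line is `KillMode=none`, which tells systemd not to signal the podman process started by ExecStart and leaves stopping the container to ExecStop alone.

```python
# Hypothetical template for illustration only; NOT paunch's actual unit file.
# Restart policy, podman paths/flags and the stop timeout are assumptions.
UNIT_TEMPLATE = """\
[Unit]
Description={name} container

[Service]
Restart=always
ExecStart=/usr/bin/podman start -a {name}
ExecStop=/usr/bin/podman stop -t {stop_timeout} {name}
# With KillMode=none, systemd kills no processes on stop; only the
# ExecStop command runs, so podman alone manages the container.
KillMode=none

[Install]
WantedBy=multi-user.target
"""


def render_unit(name: str, stop_timeout: int = 10) -> str:
    """Return the text of a systemd unit that runs the container `name`."""
    return UNIT_TEMPLATE.format(name=name, stop_timeout=stop_timeout)
```

For example, `render_unit("heat_engine")` produces a unit whose Description matches the journal message "Stopping heat_engine container..."; the previous behaviour corresponds to the same unit with `KillMode=process`.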

Changed in tripleo:
status: In Progress → Fix Released

This issue was fixed in the openstack/paunch 4.4.0 release.
