Paunch improperly names PID files for services running containers

Bug #1839929 reported by Bogdan Dobrelya
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
tripleo | Fix Released | High | Bogdan Dobrelya |

Bug Description

Sometimes (when redeploying the overcloud on top of an existing deployment, so that containers and services get the prefixed names), the PID file does not match the one referenced by the systemd service file.

$ systemctl status tripleo_memcached-4btmx9ju.service
● tripleo_memcached-4btmx9ju.service - memcached-4btmx9ju container
   Loaded: loaded (/etc/systemd/system/tripleo_memcached-4btmx9ju.service; enabled; vendor preset: disabled)
   Active: failed (Result: protocol) since Mon 2019-08-12 15:09:49 UTC; 16h ago
  Process: *667419* ExecStart=/usr/bin/podman start memcached-4btmx9ju (code=exited, status=0/SUCCESS)

$ grep PIDFile /etc/systemd/system/tripleo_memcached-4btmx9ju.service
PIDFile=/var/run/memcached-4btmx9ju.pid
$ ls -1 /var/run/memcached-4btmx9ju.pid
ls: cannot access '/var/run/memcached-4btmx9ju.pid': No such file or directory
$ ls -1 /var/run/memcached.pid
/var/run/memcached.pid
$ cat /var/run/memcached.pid
666389

$ ps -fp 666389
UID PID PPID C STIME TTY TIME CMD
root 666389 1 0 Aug12 ? 00:00:00 /usr/libexec/podman/conmon -s -c *78b965370645faa7936eff76b5ca11b5fb04a16d2ed5198d202362513dc7a589* -u 78b965370645faa7936eff76b

$ sudo podman inspect 78b965370645faa7936eff76b5ca11b5fb04a16d2ed5198d202362513dc7a589 | jq '.[] .Name'
"memcached-4btmx9ju"

This is caused by a mismatch between the container name used for PIDFile in the systemd unit and the name passed in the --conmon-pidfile argument.
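
The mismatch shown in the transcript above can be reproduced in a few lines. This is only an illustrative sketch: `pidfile_from_unit` and `conmon_pidfile` are hypothetical helpers (not paunch functions), and the `/var/run/<name>.pid` layout is taken from the paths in this report.

```python
import re
from typing import Optional


def pidfile_from_unit(unit_text: str) -> Optional[str]:
    """Return the PIDFile= value from systemd unit text, or None if absent."""
    match = re.search(r"^PIDFile=(.+)$", unit_text, flags=re.MULTILINE)
    return match.group(1).strip() if match else None


def conmon_pidfile(name: str) -> str:
    """Hypothetical helper: PID file path derived from a container name."""
    return f"/var/run/{name}.pid"


# The unit was generated with the prefixed (unique) container name...
unit_text = "[Service]\nPIDFile=/var/run/memcached-4btmx9ju.pid\n"
expected_by_systemd = pidfile_from_unit(unit_text)

# ...while conmon was told to write the file under the fixed service name.
written_by_conmon = conmon_pidfile("memcached")

mismatch = expected_by_systemd != written_by_conmon
print(expected_by_systemd, written_by_conmon, mismatch)
```

When the two paths differ, systemd never finds the PID file it waits for, which is consistent with the `failed (Result: protocol)` state in the status output.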

Changed in tripleo:
importance: Undecided → Critical
tags: added: idempotency
Changed in tripleo:
importance: Critical → High
milestone: none → train-3
assignee: nobody → Bogdan Dobrelya (bogdando)
tags: added: containers
description: updated
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to paunch (master)

Fix proposed to branch: master
Review: https://review.opendev.org/676156

Changed in tripleo:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to paunch (master)

Reviewed: https://review.opendev.org/676156
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=5d174c1bea474a34da6f4f68173323a8243c4fc5
Submitter: Zuul
Branch: master

commit 5d174c1bea474a34da6f4f68173323a8243c4fc5
Author: Bogdan Dobrelya <email address hidden>
Date: Tue Aug 13 11:42:26 2019 +0200

    Fix mismatching fixed vs unique container names

    Sometimes, like when redeploying in-place, containers for the "fixed"
    service name might already exist, and thus get the prefixed names. That
    might create mismatches. For example, the pidfile names may diverge by
    the "fixed" container service name vs its predictable prefixed unique
    name.

    Fix that by using the predictable unique names instead of the service
    container names for the builder and paunch actions run,
    debug/print-cmd that rely on it. This is achieved via a new parameter
    for the real container name (a delegate) used for the "fixed" service
    container name.

    For podman builder, we use that delegate for conmon pidfile and logging
    setup. So that now it always matches the PIDFile specified in the
    systemd unit generated for that container.

    For docker builder, we have no special uses for delegates, but we
    support that parameter to simplify the code around (so that there will
    be no need to wrap things with 'if cli == podman else...').

    Closes-bug: #1839929

    Change-Id: I5617e11f5d315f408d818e1ce47aa68f4a0d777a
    Signed-off-by: Bogdan Dobrelya <email address hidden>
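
The idea described in the commit message above, deriving both the conmon PID file and the systemd PIDFile from the same real container name, might be sketched as follows. This is not the paunch implementation: the function names and the image name are hypothetical, and the `/var/run/<name>.pid` layout is assumed from the paths shown in the bug description. Only the `podman run` flags (`--name`, `--detach`, `--conmon-pidfile`) and the systemd `PIDFile=` directive are real.

```python
from typing import List


def pidfile_path(container_name: str) -> str:
    """Single source of truth for the PID file path (assumed layout)."""
    return f"/var/run/{container_name}.pid"


def podman_run_args(container_name: str, image: str) -> List[str]:
    """Build a podman command line that writes conmon's PID to pidfile_path()."""
    return [
        "podman", "run", "--detach",
        "--name", container_name,
        "--conmon-pidfile", pidfile_path(container_name),
        image,
    ]


def unit_pidfile_line(container_name: str) -> str:
    """Build the PIDFile= line for the systemd unit of the same container."""
    return f"PIDFile={pidfile_path(container_name)}"


# Pass the real (unique) name to both sides, never the fixed service name.
real_name = "memcached-4btmx9ju"
args = podman_run_args(real_name, "example.org/memcached:latest")  # hypothetical image
print(" ".join(args))
print(unit_pidfile_line(real_name))
```

Because both values come from `pidfile_path()` with the same argument, they cannot diverge even when the container ends up with a prefixed name.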

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to paunch (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/676984

OpenStack Infra (hudson-openstack) wrote : Fix merged to paunch (stable/stein)

Reviewed: https://review.opendev.org/676984
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=4afb7b0e56371fb6750cf26d0c7f62f348e87616
Submitter: Zuul
Branch: stable/stein

commit 4afb7b0e56371fb6750cf26d0c7f62f348e87616
Author: Bogdan Dobrelya <email address hidden>
Date: Tue Aug 13 11:42:26 2019 +0200

    Fix mismatching fixed vs unique container names

    Sometimes, like when redeploying in-place, containers for the "fixed"
    service name might already exist, and thus get the prefixed names. That
    might create mismatches. For example, the pidfile names may diverge by
    the "fixed" container service name vs its predictable prefixed unique
    name.

    Fix that by using the predictable unique names instead of the service
    container names for the builder and paunch actions run,
    debug/print-cmd that rely on it. This is achieved via a new parameter
    for the real container name (a delegate) used for the "fixed" service
    container name.

    For podman builder, we use that delegate for conmon pidfile and logging
    setup. So that now it always matches the PIDFile specified in the
    systemd unit generated for that container.

    For docker builder, we have no special uses for delegates, but we
    support that parameter to simplify the code around (so that there will
    be no need to wrap things with 'if cli == podman else...').

    Conflicts:
        paunch/builder/podman.py

    Closes-bug: #1839929

    Change-Id: I5617e11f5d315f408d818e1ce47aa68f4a0d777a
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 5d174c1bea474a34da6f4f68173323a8243c4fc5)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to paunch (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/679040

Bogdan Dobrelya (bogdando) wrote :

Related fix for the backport https://review.opendev.org/#/c/679074/

OpenStack Infra (hudson-openstack) wrote : Change abandoned on paunch (master)

Change abandoned by Bogdan Dobrelya (bogdando) (<email address hidden>) on branch: master
Review: https://review.opendev.org/679040

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/paunch 4.5.1

This issue was fixed in the openstack/paunch 4.5.1 release.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on paunch (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.opendev.org/678577
Reason: We are facing gate issue: https://bugs.launchpad.net/tripleo/+bug/1844446

To clear the gate we need to abandon this patch and I will restore once the gate is ready again to land patches in TripleO. Please don't touch this patch, and ask on #tripleo Wes or Emilien for any question. Thanks for your patience.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/paunch 5.2.0

This issue was fixed in the openstack/paunch 5.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to paunch (master)

Reviewed: https://review.opendev.org/678577
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=3dcbe5e68c50a09a503ab52a03e082949bfed594
Submitter: Zuul
Branch: master

commit 3dcbe5e68c50a09a503ab52a03e082949bfed594
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Aug 26 16:23:00 2019 +0200

    Fix discovering container names

    Make discover_container_name return None if the ps command failed
    or returned nothing useful.

    * For 'run', if no container name has been discovered, use its
      predictable (fixed) container service name.

    * For 'exec', also raise an error, if no name has been discovered for
      the fixed/service container. Do not use additional checks as the
      None returned by discover_container_name() already tells us all we
      need to know about the subject container.

    Related-Bug: #1839929

    Co-Authored-By: Cédric Jeanneret <email address hidden>
    Change-Id: I8a495d2c98617bb5edbe13ccf737d6c630eea7ad
    Signed-off-by: Bogdan Dobrelya <email address hidden>
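
The behaviour described in the commit message above can be sketched roughly as below. This is a simplified illustration, not the paunch code: `run_ps`, `name_for_run`, and `name_for_exec` are hypothetical names, and filtering on a `container_name` label is an assumption about how the container is tagged.

```python
import subprocess
from typing import Callable, List, Optional


def run_ps(cmd: List[str]) -> Optional[str]:
    """Run a ps-style command; return its stdout, or None if it failed."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout


def discover_container_name(
    service_name: str,
    ps: Callable[[List[str]], Optional[str]] = run_ps,
    cli: str = "podman",
) -> Optional[str]:
    """Return the real container name, or None if ps failed or found nothing."""
    cmd = [
        cli, "ps", "-a",
        "--filter", f"label=container_name={service_name}",  # assumed label
        "--format", "{{.Names}}",
    ]
    output = ps(cmd)
    names = output.split() if output else []
    return names[0] if names else None


def name_for_run(service_name: str, ps=run_ps) -> str:
    """'run': fall back to the predictable (fixed) service name."""
    return discover_container_name(service_name, ps) or service_name


def name_for_exec(service_name: str, ps=run_ps) -> str:
    """'exec': there must be an existing container, so None is an error."""
    name = discover_container_name(service_name, ps)
    if name is None:
        raise RuntimeError(f"no container found for service {service_name}")
    return name
```

Passing the `ps` callable as a parameter keeps the sketch testable without a container runtime: for instance, `name_for_run("memcached", ps=lambda cmd: None)` falls back to the fixed name, while `name_for_exec` with the same stub raises.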

tags: added: queens-backport-potential
tags: removed: queens-backport-potential
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to paunch (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.opendev.org/702457

OpenStack Infra (hudson-openstack) wrote : Related fix merged to paunch (stable/queens)

Reviewed: https://review.opendev.org/702457
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=f8ede6b3627aa7c52e0969ee251b01bf901922c8
Submitter: Zuul
Branch: stable/queens

commit f8ede6b3627aa7c52e0969ee251b01bf901922c8
Author: Bogdan Dobrelya <email address hidden>
Date: Mon Aug 26 16:23:00 2019 +0200

    Fix discovering container names

    (a partial backport limited to action 'run')
    Make discover_container_name return None if the ps command failed
    or returned nothing useful.

    * For 'run', if no container name has been discovered, use its
      predictable (fixed) container service name.

    Related-Bug: #1839929

    Co-Authored-By: Cédric Jeanneret <email address hidden>
    Change-Id: I8a495d2c98617bb5edbe13ccf737d6c630eea7ad
    Signed-off-by: Bogdan Dobrelya <email address hidden>
    (cherry picked from commit 3dcbe5e68c50a09a503ab52a03e082949bfed594)

tags: added: in-stable-queens