tripleo_neutron_* podman healthchecks broken on the undercloud

Bug #1821856 reported by Luca Miccini
This bug affects 1 person
Affects   Status        Importance   Assigned to        Milestone
tripleo   Fix Released  High         Cédric Jeanneret   stein-rc1

Bug Description

 [root@undercloud-0 system]# systemctl status tripleo_neutron_dhcp_healthcheck.service
● tripleo_neutron_dhcp_healthcheck.service - neutron_dhcp healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_neutron_dhcp_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2019-03-27 07:10:44 UTC; 58s ago
  Process: 263666 ExecStart=/usr/bin/podman exec neutron_dhcp /openstack/healthcheck (code=exited, status=1/FAILURE)
 Main PID: 263666 (code=exited, status=1/FAILURE)

Mar 27 07:10:44 undercloud-0.redhat.local systemd[1]: Starting neutron_dhcp healthcheck...
Mar 27 07:10:44 undercloud-0.redhat.local podman[263666]: There is no neutron-dhcp-ag process with opened RabbitMQ ports (5671,5672) running in the container
Mar 27 07:10:44 undercloud-0.redhat.local podman[263666]: exit status 1
Mar 27 07:10:44 undercloud-0.redhat.local systemd[1]: tripleo_neutron_dhcp_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Mar 27 07:10:44 undercloud-0.redhat.local systemd[1]: tripleo_neutron_dhcp_healthcheck.service: Failed with result 'exit-code'.
Mar 27 07:10:44 undercloud-0.redhat.local systemd[1]: Failed to start neutron_dhcp healthcheck.
 [root@undercloud-0 system]# systemctl status tripleo_neutron_l3_agent_healthcheck.service

● tripleo_neutron_l3_agent_healthcheck.service - neutron_l3_agent healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_neutron_l3_agent_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2019-03-27 07:11:11 UTC; 34s ago
  Process: 264050 ExecStart=/usr/bin/podman exec neutron_l3_agent /openstack/healthcheck (code=exited, status=1/FAILURE)
 Main PID: 264050 (code=exited, status=1/FAILURE)

Mar 27 07:11:11 undercloud-0.redhat.local systemd[1]: Starting neutron_l3_agent healthcheck...
Mar 27 07:11:11 undercloud-0.redhat.local podman[264050]: There is no neutron-l3-agen process with opened RabbitMQ ports (5671,5672) running in the container
Mar 27 07:11:11 undercloud-0.redhat.local podman[264050]: exit status 1
Mar 27 07:11:11 undercloud-0.redhat.local systemd[1]: tripleo_neutron_l3_agent_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Mar 27 07:11:11 undercloud-0.redhat.local systemd[1]: tripleo_neutron_l3_agent_healthcheck.service: Failed with result 'exit-code'.
Mar 27 07:11:11 undercloud-0.redhat.local systemd[1]: Failed to start neutron_l3_agent healthcheck.

 [root@undercloud-0 system]# systemctl status tripleo_neutron_ovs_agent_healthcheck.service
● tripleo_neutron_ovs_agent_healthcheck.service - neutron_ovs_agent healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_neutron_ovs_agent_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2019-03-27 07:10:44 UTC; 1min 5s ago
  Process: 263667 ExecStart=/usr/bin/podman exec neutron_ovs_agent /openstack/healthcheck (code=exited, status=1/FAILURE)
 Main PID: 263667 (code=exited, status=1/FAILURE)

Mar 27 07:10:44 undercloud-0.redhat.local systemd[1]: Starting neutron_ovs_agent healthcheck...
Mar 27 07:10:44 undercloud-0.redhat.local podman[263667]: There is no neutron-openvsw process with opened RabbitMQ ports (5671,5672) running in the container
Mar 27 07:10:44 undercloud-0.redhat.local podman[263667]: exit status 1
Mar 27 07:10:44 undercloud-0.redhat.local systemd[1]: tripleo_neutron_ovs_agent_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Mar 27 07:10:44 undercloud-0.redhat.local systemd[1]: tripleo_neutron_ovs_agent_healthcheck.service: Failed with result 'exit-code'.
Mar 27 07:10:44 undercloud-0.redhat.local systemd[1]: Failed to start neutron_ovs_agent healthcheck.

They seem to rely on the same logic, which is why I opened a single bug; please let me know if you would like dedicated ones.
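The process names in these messages (neutron-dhcp-ag, neutron-l3-agen, neutron-openvsw) are each exactly 15 characters long. That is the Linux limit on a process's command name (comm; TASK_COMM_LEN is 16 bytes including the terminating NUL), which is the name shown by tools such as ps and ss -p, so a lookup of that name by the full agent name cannot match. The snippet below is only an illustration: it mimics the cut with plain string slicing rather than reading /proc, and the comm_of helper is hypothetical, not part of the TripleO healthcheck scripts.

```shell
#!/usr/bin/env bash
# Illustration only: mimic the kernel's 15-character cap on a process's
# comm name (TASK_COMM_LEN is 16 bytes, including the trailing NUL).
set -euo pipefail

# Hypothetical helper: print the first 15 characters of a name.
comm_of() {
  printf '%s\n' "${1:0:15}"
}

# Full names of the three agents whose healthchecks fail above.
for agent in neutron-dhcp-agent neutron-l3-agent neutron-openvswitch-agent; do
  short=$(comm_of "$agent")
  if [[ "$short" == "$agent" ]]; then
    verdict="full name matches"
  else
    verdict="full name cannot match"
  fi
  printf '%-26s -> %-15s (%s)\n' "$agent" "$short" "$verdict"
done
```

Each of the three truncated names printed by the loop corresponds to one of the names in the failing healthcheck messages.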

Changed in tripleo:
assignee: nobody → Cédric Jeanneret (cjeanner)
importance: Undecided → High
status: New → Triaged
tags: added: containers
Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
milestone: none → stein-rc1
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/648027
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=5312bf19c8f820ac65514885aebdc2dc4776d72d
Submitter: Zuul
Branch: master

commit 5312bf19c8f820ac65514885aebdc2dc4776d72d
Author: Cédric Jeanneret <email address hidden>
Date: Wed Mar 27 08:58:24 2019 +0100

    Silent file descriptor checks

    In order to avoid spam in journald, we just get the exit code and let
    the checker output the error message.

    Also, correct how we retrieve the process in the healthcheck_port and
    _listen functions.
    "ss" cannot match some processes, like "neutron-l3-agent". We
    therefore use the PID instead, provided by "pgrep".
    The "-d" option of pgrep formats its output for "grep -E",
    removing the need for a loop.

    Change-Id: I1555a9b79c954e646fe9ae35272231c581cea03e
    Closes-Bug: #1821782
    Closes-Bug: #1821856
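
The PID-based lookup described in this commit message can be sketched in shell as follows. This is not the tripleo-common code: the function names match_socket and has_port_by_pid are hypothetical, and the use of pgrep -f (match against the full command line, which is not subject to the 15-character process-name limit) is an assumption. pgrep -d '|' joins the PIDs into an alternation such as 1234|5678 that grep -E can use directly, so no loop is needed, and grep -q keeps the check silent so that only the exit code is used.

```shell
#!/usr/bin/env bash
# Sketch only, not the tripleo-common implementation. Function names are
# hypothetical; pgrep -f is an assumed choice.
set -u

# Succeed (exit 0) if any line of "ss -ntp" output read on stdin shows a
# socket on one of the given ports owned by one of the given PIDs.
# $1: PIDs joined by "|" (e.g. "1234|5678"); remaining args: ports.
match_socket() {
  local pids=$1
  shift
  local ports
  # Join the ports with "|" so they form a grep -E alternation.
  ports=$(IFS='|'; printf '%s' "$*")
  # -q: print nothing, only report the result through the exit code.
  # The comma after the PID stops 424 from matching pid=4242.
  grep -qE ":(${ports})[[:space:]].*pid=(${pids}),"
}

# Succeed if a process whose command line matches $1 has a TCP socket on
# one of the ports given as the remaining arguments.
has_port_by_pid() {
  local pattern=$1
  shift
  local pids
  # -f: match the full command line, not the 15-character comm name.
  # -d '|': join multiple PIDs with "|" instead of newlines.
  pids=$(pgrep -d '|' -f -- "$pattern") || return 1
  # -n: numeric ports, -t: TCP only, -p: show owning processes.
  # Process substitution avoids a pipe, so an early grep -q exit is harmless.
  match_socket "$pids" "$@" < <(ss -ntp)
}
```

A caller would run has_port_by_pid neutron-l3-agent 5671 5672 and, on a non-zero exit, print its own message such as the "There is no ... process with opened RabbitMQ ports" line from the bug description, which keeps the low-level check quiet in journald. Keeping the text matching in match_socket also makes it testable against captured ss output without a running agent.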

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 10.6.1

This issue was fixed in the openstack/tripleo-common 10.6.1 release.

tags: added: queens-backport-potential rocky-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/713375

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/rocky)

Reviewed: https://review.opendev.org/713375
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=89d2393ea9fa977946ba34b6b9b3b279f830b64c
Submitter: Zuul
Branch: stable/rocky

commit 89d2393ea9fa977946ba34b6b9b3b279f830b64c
Author: Cédric Jeanneret <email address hidden>
Date: Wed Mar 27 08:58:24 2019 +0100

    Silent file descriptor checks

    In order to avoid spam in journald, we just get the exit code and let
    the checker output the error message.

    Also, correct how we retrieve the process in the healthcheck_port and
    _listen functions.
    "ss" cannot match some processes, like "neutron-l3-agent". We
    therefore use the PID instead, provided by "pgrep".
    The "-d" option of pgrep formats its output for "grep -E",
    removing the need for a loop.

    Change-Id: I1555a9b79c954e646fe9ae35272231c581cea03e
    Closes-Bug: #1821782
    Closes-Bug: #1821856
    (cherry picked from commit 5312bf19c8f820ac65514885aebdc2dc4776d72d)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/713579

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/queens)

Reviewed: https://review.opendev.org/713579
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=16776e3979b8206009844e0d63ab3bb38a632d74
Submitter: Zuul
Branch: stable/queens

commit 16776e3979b8206009844e0d63ab3bb38a632d74
Author: Cédric Jeanneret <email address hidden>
Date: Wed Mar 27 08:58:24 2019 +0100

    Silent file descriptor checks

    In order to avoid spam in journald, we just get the exit code and let
    the checker output the error message.

    Also, correct how we retrieve the process in the healthcheck_port and
    _listen functions.
    "ss" cannot match some processes, like "neutron-l3-agent". We
    therefore use the PID instead, provided by "pgrep".
    The "-d" option of pgrep formats its output for "grep -E",
    removing the need for a loop.

    Change-Id: I1555a9b79c954e646fe9ae35272231c581cea03e
    Closes-Bug: #1821782
    Closes-Bug: #1821856
    (cherry picked from commit 5312bf19c8f820ac65514885aebdc2dc4776d72d)
    (cherry picked from commit 89d2393ea9fa977946ba34b6b9b3b279f830b64c)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common rocky-eol

This issue was fixed in the openstack/tripleo-common rocky-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common queens-eol

This issue was fixed in the openstack/tripleo-common queens-eol release.
