cinder_scheduler healthcheck doesn't test the right port

Bug #1825342 reported by Artem Hrechanychenko
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Alan Bishop

Bug Description

[heat-admin@controller-0 ~]$ sudo systemctl status tripleo_cinder_scheduler_healthcheck.service
● tripleo_cinder_scheduler_healthcheck.service - cinder_scheduler healthcheck
   Loaded: loaded (/etc/systemd/system/tripleo_cinder_scheduler_healthcheck.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2019-04-18 10:41:00 UTC; 48s ago
  Process: 599072 ExecStart=/usr/bin/podman exec cinder_scheduler /openstack/healthcheck null (code=exited, status=1/FAILURE)
 Main PID: 599072 (code=exited, status=1/FAILURE)

Apr 18 10:41:00 controller-0 systemd[1]: Starting cinder_scheduler healthcheck...
Apr 18 10:41:00 controller-0 podman[599072]: There is no cinder-scheduler process with opened RabbitMQ ports (null) running in the container
Apr 18 10:41:00 controller-0 podman[599072]: exit status 1
Apr 18 10:41:00 controller-0 systemd[1]: tripleo_cinder_scheduler_healthcheck.service: Main process exited, code=exited, status=1/FAILURE
Apr 18 10:41:00 controller-0 systemd[1]: tripleo_cinder_scheduler_healthcheck.service: Failed with result 'exit-code'.
Apr 18 10:41:00 controller-0 systemd[1]: Failed to start cinder_scheduler healthcheck.

[heat-admin@controller-0 ~]$ sudo podman inspect cinder_scheduler |grep healthcheck
                "config_data": "{\"environment\": [\"KOLLA_CONFIG_STRATEGY=COPY_ALWAYS\", \"TRIPLEO_CONFIG_HASH=37c5752bb7a8713cb7bf28d9c72c5e39\"], \"healthcheck\": {\"test\": \"/openstack/healthcheck null\"}, \"image\": \"192.168.24.1:8787/rhosp15/openstack-cinder-scheduler:20190411.1\", \"net\": \"host\", \"privileged\": false, \"restart\": \"always\", \"volumes\": [\"/etc/hosts:/etc/hosts:ro\", \"/etc/localtime:/etc/localtime:ro\", \"/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro\", \"/etc/pki/ca-trust/source/anchors:/etc/pki/ca-trust/source/anchors:ro\", \"/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro\", \"/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro\", \"/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro\", \"/dev/log:/dev/log\", \"/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro\", \"/etc/puppet:/etc/puppet:ro\", \"/var/lib/kolla/config_files/cinder_scheduler.json:/var/lib/kolla/config_files/config.json:ro\", \"/var/lib/config-data/puppet-generated/cinder/:/var/lib/kolla/config_files/src:ro\", \"/var/log/containers/cinder:/var/log/cinder:z\"]}",

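The `config_data` value above is a JSON-encoded string, and the malformed healthcheck is easy to miss inside it. As a minimal sketch (the helper names `healthcheck_argument` and `is_malformed` are hypothetical, not part of TripleO or podman), the following pulls out the argument passed to `/openstack/healthcheck` and flags it when it is not a numeric port. The sample string is a trimmed-down stand-in for the real `config_data`, and 5672 is used only as an illustrative port value.

```python
import json
import shlex


def healthcheck_argument(config_data: str):
    """Return the first argument of the healthcheck command, or None if there is none."""
    config = json.loads(config_data)
    test = config.get("healthcheck", {}).get("test", "")
    parts = shlex.split(test)
    return parts[1] if len(parts) > 1 else None


def is_malformed(config_data: str) -> bool:
    """True when the healthcheck has an argument that is not a numeric port."""
    arg = healthcheck_argument(config_data)
    return arg is not None and not arg.isdigit()


# Trimmed-down stand-in for the config_data string reported in this bug.
sample = json.dumps({"healthcheck": {"test": "/openstack/healthcheck null"}, "net": "host"})

print(healthcheck_argument(sample))  # null
print(is_malformed(sample))          # True
```

A check like this only detects the symptom reported here (a literal `null` where a port should be); it says nothing about whether a numeric port is the right one for the deployment.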
Alan Bishop (alan-bishop) wrote :

This is another manifestation (see [1]) of the side effect of the rabbit messaging parameters in [2]. The nova service has already been fixed by [3].

[1] https://bugs.launchpad.net/tripleo/+bug/1824805
[2] https://review.opendev.org/565086
[3] https://review.opendev.org/652964

I'm discussing this with ansmith on IRC to see whether there's a general solution, or whether the nova solution needs to be replicated in every affected service.

Changed in tripleo:
importance: Undecided → High
status: New → Triaged
milestone: none → train-1
tags: added: stein-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/658108

Changed in tripleo:
assignee: nobody → Alan Bishop (alan-bishop)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/658108
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=c5fe51147b53e032d2e87cf7c792d8459098b42e
Submitter: Zuul
Branch: master

commit c5fe51147b53e032d2e87cf7c792d8459098b42e
Author: Alan Bishop <email address hidden>
Date: Thu May 9 09:25:58 2019 -0400

    Use RpcPort for container healthchecks

    Update healthcheck commands that probe oslo's messaging port to use the
    RpcPort parameter. Previously, some templates referenced the service's
    own 'rabbit_port' config setting, which led to malformed healthcheck
    commands when the 'rabbit_port' settings were deprecated.

    Update the templates that looked up the port in the RabbitMQService's
    global_config_settings. Not only did this break the oslo abstraction
    by referring to a specific messaging backend (rabbit), it broke
    split-stack deployments in which the RabbitMQService is not actually
    deployed on the secondary stack's nodes.

    This patch creates a common healthcheck command using the RpcPort
    parameter in containers-common.yaml. This allows other templates to
    reference a common healthcheck command. Other templates that should
    also use this can be cleaned up in a separate patch.

    Closes-Bug: #1825342
    Change-Id: I0d3974089ae6e6879adab4852715c7a1c1188f7c
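The idea in the commit message above, one shared healthcheck command built from a single port parameter rather than each template looking up its own setting, can be illustrated outside of Heat. The real change lives in Heat YAML templates (`containers-common.yaml`); the Python below is only a sketch of the principle, and the function name `build_rpc_healthcheck` is hypothetical. It also rejects a missing port up front, which is an assumption added for illustration and not something the patch is stated to do.

```python
def build_rpc_healthcheck(rpc_port, script="/openstack/healthcheck"):
    """Build the healthcheck command from one shared RPC port value.

    Raises ValueError instead of silently rendering a missing port
    as a literal such as 'null'.
    """
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rpc_port, bool) or not isinstance(rpc_port, int):
        raise ValueError(f"RPC port must be an integer, got {rpc_port!r}")
    if not 1 <= rpc_port <= 65535:
        raise ValueError(f"RPC port out of range: {rpc_port}")
    return f"{script} {rpc_port}"


# Every service that probes the messaging port reuses the same builder.
print(build_rpc_healthcheck(5672))  # /openstack/healthcheck 5672
```

Centralizing the command this way means a deprecated or unset per-service setting cannot produce a command like `/openstack/healthcheck null`; the port value 5672 here is illustrative only.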

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/658360

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/stein)

Reviewed: https://review.opendev.org/658360
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=9118472dc63c8f0a462de9715a9d2beba4322c5c
Submitter: Zuul
Branch: stable/stein

commit 9118472dc63c8f0a462de9715a9d2beba4322c5c
Author: Alan Bishop <email address hidden>
Date: Thu May 9 09:25:58 2019 -0400

    Use RpcPort for container healthchecks

    Update healthcheck commands that probe oslo's messaging port to use the
    RpcPort parameter. Previously, some templates referenced the service's
    own 'rabbit_port' config setting, which led to malformed healthcheck
    commands when the 'rabbit_port' settings were deprecated.

    Update the templates that looked up the port in the RabbitMQService's
    global_config_settings. Not only did this break the oslo abstraction
    by referring to a specific messaging backend (rabbit), it broke
    split-stack deployments in which the RabbitMQService is not actually
    deployed on the secondary stack's nodes.

    This patch creates a common healthcheck command using the RpcPort
    parameter in containers-common.yaml. This allows other templates to
    reference a common healthcheck command. Other templates that should
    also use this can be cleaned up in a separate patch.

    Closes-Bug: #1825342
    Change-Id: I0d3974089ae6e6879adab4852715c7a1c1188f7c
    (cherry picked from commit c5fe51147b53e032d2e87cf7c792d8459098b42e)
    Conflicts:
     deployment/heat/heat-engine-container-puppet.yaml
     deployment/nova/nova-compute-container-puppet.yaml
     deployment/nova/nova-ironic-container-puppet.yaml
     deployment/nova/nova-scheduler-container-puppet.yaml

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.0.0

This issue was fixed in the openstack/tripleo-heat-templates 11.0.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 10.6.0

This issue was fixed in the openstack/tripleo-heat-templates 10.6.0 release.
