[FFU] Queens services still running before running the transfer data step

Bug #1894238 reported by Jose Luis Franco
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Undecided
Assigned to: Jose Luis Franco

Bug Description

In some environments, the system_upgrade_transfer_data step (which should stop services on the other two controllers and transfer the data backup to the first controller) still leaves some services running on the controller nodes:

[heat-admin@controller-1 ~]$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
03328d67d096 192.168.24.1:8787/rh-osbs/rhosp13-openstack-collectd:20200730.1 "dumb-init --singl..." 27 hours ago Up About an hour (healthy) collectd
c4c3e9794c22 192.168.24.1:8787/rh-osbs/rhosp13-openstack-qdrouterd:20200730.1 "dumb-init --singl..." 28 hours ago Up 28 hours metrics_qdr
ce87abef7b63 192.168.24.1:8787/rh-osbs/rhosp13-openstack-gnocchi-metricd:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours (healthy) gnocchi_metricd
bceb8f1a3f76 192.168.24.1:8787/rh-osbs/rhosp13-openstack-gnocchi-statsd:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours (healthy) gnocchi_statsd
b4ce70e2a3b4 192.168.24.1:8787/rh-osbs/rhosp13-openstack-gnocchi-api:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours (healthy) gnocchi_api
e829f240326c 192.168.24.1:8787/rh-osbs/rhosp13-openstack-neutron-openvswitch-agent:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours (unhealthy) neutron_ovs_agent
6987d5521d74 192.168.24.1:8787/rh-osbs/rhosp13-openstack-neutron-l3-agent:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours (unhealthy) neutron_l3_agent
d4c4295bbd4d 192.168.24.1:8787/rh-osbs/rhosp13-openstack-neutron-metadata-agent:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours (unhealthy) neutron_metadata_agent
18ea61b231c0 192.168.24.1:8787/rh-osbs/rhosp13-openstack-neutron-dhcp-agent:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours (unhealthy) neutron_dhcp
165624d16a8c 192.168.24.1:8787/rh-osbs/rhosp13-openstack-panko-api:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours (healthy) panko_api
9278ac9647a8 192.168.24.1:8787/rh-osbs/rhosp13-openstack-swift-proxy-server:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours (healthy) swift_proxy
a8d22a031d39 192.168.24.1:8787/rh-osbs/rhosp13-openstack-panko-api:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours panko_api_cron
c07aa0d32851 192.168.24.1:8787/rh-osbs/rhosp13-openstack-aodh-listener:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours (unhealthy) aodh_listener
567b9e7c5aa8 192.168.24.1:8787/rh-osbs/rhosp13-openstack-swift-container:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours swift_container_auditor
1d2658ea7753 192.168.24.1:8787/rh-osbs/rhosp13-openstack-swift-proxy-server:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours swift_object_expirer
9a2a25ef2bfa 192.168.24.1:8787/rh-osbs/rhosp13-openstack-swift-object:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours swift_object_updater
049eb5185b3f 192.168.24.1:8787/rh-osbs/rhosp13-openstack-swift-container:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours swift_container_replicator
1127a58ccf1b 192.168.24.1:8787/rh-osbs/rhosp13-openstack-swift-container:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours (healthy) swift_container_server
45215ceb5421 192.168.24.1:8787/rh-osbs/rhosp13-openstack-aodh-api:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours (healthy) aodh_api
79c98d9c984c 192.168.24.1:8787/rh-osbs/rhosp13-openstack-swift-object:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours swift_rsync
f2b631cef1fa 192.168.24.1:8787/rh-osbs/rhosp13-openstack-swift-account:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours swift_account_reaper
0b16d5e0c496 192.168.24.1:8787/rh-osbs/rhosp13-openstack-nova-consoleauth:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours (unhealthy) nova_consoleauth
84f082ed61bf 192.168.24.1:8787/rh-osbs/rhosp13-openstack-aodh-notifier:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours (unhealthy) aodh_notifier
7f64404cc5f1 192.168.24.1:8787/rh-osbs/rhosp13-openstack-swift-account:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours swift_account_replicator
72a15b508d5b 192.168.24.1:8787/rh-osbs/rhosp13-openstack-swift-object:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours swift_object_auditor
c26f49d7af90 192.168.24.1:8787/rh-osbs/rhosp13-openstack-swift-object:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours (healthy) swift_object_server
54d8375166ef 192.168.24.1:8787/rh-osbs/rhosp13-openstack-swift-container:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours swift_container_updater
981716f053e5 192.168.24.1:8787/rh-osbs/rhosp13-openstack-iscsid:20200730.1 "dumb-init --singl..." 29 hours ago Up 29 hours (healthy) iscsid
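
A listing like the one above can be reduced to container names and states for easier comparison. The following is a minimal sketch, assuming the output was captured with `sudo docker ps --format '{{.Names}}\t{{.Status}}'` (one tab-separated name and status per line); the helper name `parse_running` is hypothetical and not part of TripleO.

```python
from typing import Dict


def parse_running(ps_output: str) -> Dict[str, str]:
    """Map container name -> status for containers whose status starts with 'Up'.

    Assumes tab-separated "name<TAB>status" lines (an assumption about how the
    listing was captured, not something the bug report specifies).
    """
    running = {}
    for line in ps_output.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, status = line.partition("\t")
        if status.startswith("Up"):
            running[name.strip()] = status.strip()
    return running


# Sample lines taken from the listing in this report.
sample = (
    "iscsid\tUp 29 hours (healthy)\n"
    "neutron_dhcp\tUp 29 hours (unhealthy)\n"
    "swift_rsync\tUp 29 hours\n"
)
still_running = parse_running(sample)
unhealthy = sorted(n for n, s in still_running.items() if "(unhealthy)" in s)
```

Here `still_running` holds every container that is still up, and `unhealthy` narrows that to the ones reporting a failed health check.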

Checking /var/lib/mistral/config_download_latest/Controller/external_upgrade_tasks.yaml shows no external_upgrade_tasks that stop some of these services, so they are left running on the two RHEL7 controllers.

We need to identify all the services common to OSP13 and OSP16.1 and, among those, find the ones that are missing external_upgrade_tasks. A good example is iscsid: https://github.com/openstack/tripleo-heat-templates/blob/2f607aafeed75aea93ceeb1e42c7a86ddde77d6b/deployment/iscsid/iscsid-container-puppet.yaml
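
The audit described above amounts to a set intersection followed by a filter. The following is a rough sketch under stated assumptions: the service lists and template texts are supplied by the caller, a template counts as covered if its text contains the key `external_upgrade_tasks:`, and the function name and sample data are hypothetical. It is not how the actual fix was produced.

```python
from typing import Dict, Iterable, List


def services_missing_stop_tasks(
    old_services: Iterable[str],
    new_services: Iterable[str],
    templates: Dict[str, str],
) -> List[str]:
    """Return services present in both releases whose template text
    has no 'external_upgrade_tasks:' key (a plain substring check)."""
    common = set(old_services) & set(new_services)
    missing = [
        name
        for name in common
        if "external_upgrade_tasks:" not in templates.get(name, "")
    ]
    return sorted(missing)


# Made-up sample data for illustration only.
old_release = {"iscsid", "nova_consoleauth", "swift_proxy"}
new_release = {"iscsid", "swift_proxy"}
templates = {
    "iscsid": "outputs:\n  role_data:\n    value:\n      upgrade_tasks: []\n",
    "swift_proxy": "      external_upgrade_tasks:\n        - name: stop service\n",
}
result = services_missing_stop_tasks(old_release, new_release, templates)
```

With this sample data only `iscsid` is reported: `nova_consoleauth` is excluded because it is not in the second set, and `swift_proxy` already carries the key. A substring check keeps the sketch dependency-free; a real audit would more likely parse the YAML.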

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/749979

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/749979
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=c8b424ea0dc52168389bee8a473cec2cf23952de
Submitter: Zuul
Branch: stable/train

commit c8b424ea0dc52168389bee8a473cec2cf23952de
Author: Jose Luis Franco Arza <email address hidden>
Date: Fri Sep 4 16:54:54 2020 +0200

    [Train only] Add missing stop service steps for FFU.

    We observed, when running the system_upgrade_transfer_data that few OSP
    services which are still present in Train keep running before the database
    backup is restored on the newly upgraded controller. This could cause some
    discrepancies between the restored state and the cluster one.

    Adding all the identified services present in both versions (deprecated
    services will be stopped when upgrading the old nodes operating system).

    Change-Id: I43e0d04866f6ce37be5de8b68aac6d43e165dc76
    Closes-Bug: #1894238

tags: added: in-stable-train
Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 11.4.0

This issue was fixed in the openstack/tripleo-heat-templates 11.4.0 release.

Marios Andreou (marios-b) wrote :

Bug status has been set to 'Fix-Released' based on the discussion and/or patches above. If you disagree, please reset it to 'Triaged' and reach out to us on freenode #tripleo. Thank you!

Changed in tripleo:
status: Triaged → Fix Released