many containers are needlessly restarted on redeploy with no config changes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Fix Released | Critical | Michele Baldessari |
Bug Description
Rerunning the overcloud deploy command with no changes restarts a truckload of containers (first seen via https:/
So we really have three separate issues here. Below is the list of all the containers
that may restart needlessly (at least what I have observed in my tests):
A) cron category:
ceilometer_
cinder_api
cinder_api_cron
cinder_scheduler
heat_api
heat_api_cfn
heat_api_cron
heat_engine
keystone
keystone_cron
logrotate_crond
nova_api
nova_api_cron
nova_conductor
nova_consoleauth
nova_metadata
nova_scheduler
nova_vnc_proxy
openstack-
panko_api
These end up being restarted because the config volume for the container contains a cron file, and cron files are generated with a timestamp inside:
$ cat /var/lib/
# HEADER: This file was autogenerated at 2018-08-07 11:44:57 +0000 by puppet.
# HEADER: While it can still be managed manually, it is definitely not recommended.
# HEADER: Note particularly that the comments starting with 'Puppet Name' should
# HEADER: not be deleted, as doing so could cause duplicate cron jobs.
# Puppet Name: keystone-manage token_flush
PATH=/bin:
1 * * * * keystone-manage token_flush >>/var/
The timestamp is unfortunately hard coded into puppet in both the cron provider and the parsedfile
provider:
https:/
https:/
One possible fix would be to change the tar command in docker-puppet.py to be something like
this:
tar -c -f - /var/lib/
Note that the 'tar xO' is needed in order to force text output rather than the native tar format (if you don't, the sed won't work).
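For illustration, a minimal sketch of such a pipeline, assuming the config volume lives under /var/lib/config-data/<service> (the paths above are truncated; keystone is used here as an example):
# Sketch only: the config-data path is an assumption.
# 'tar -xO -f -' extracts the archived file contents to stdout as plain
# text so that sed can strip puppet's timestamped HEADER lines before
# the checksum is computed.
tar -c -f - /var/lib/config-data/keystone \
  | tar -xO -f - \
  | sed '/^# HEADER:/d' \
  | md5sum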
B) swift category:
swift_account_
swift_account_
swift_account_
swift_account_
swift_container
swift_container
swift_container
swift_container
swift_object_
swift_object_
swift_object_
swift_object_server
swift_object_
swift_proxy
swift_rsync
So the swift containers restart because when recalculating the md5 over the
/var/lib/
B.1) /etc/swift/
B.2) /etc/swift/*.gz: the *.gz files seem to change over time
I see two potential fixes here:
B.3) We simply exclude rings and their backups from the tar commands:
EXCL="-
tar -c -f - /var/lib/
B.4) We stop storing the rings under /etc/swift and store them somewhere else?
I lack the necessary swift knowledge to add anything too useful here.
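For B.3, a sketch of what the exclusions might look like, assuming the rings are the /etc/swift/*.gz files inside the config volume and their backups live under /etc/swift/backups (the EXCL line and tar command above are truncated, so these patterns are assumptions):
# Sketch only: exclude patterns and config-data path are assumptions.
# Ring files and their backups are skipped so that regenerated rings
# no longer change the checksum.
tar -c -f - \
    --exclude='*/etc/swift/backups/*' \
    --exclude='*/etc/swift/*.gz' \
    /var/lib/config-data/swift \
  | md5sum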
C) libvirt category:
nova_compute
nova_libvirt
nova_migration_
nova_virtlogd
This one seems to be due to the fact that the /etc/libvirt/
is just different after a redeploy:
[root@compute-1 nova_libvirt]# git diff cb2441bb1caf757
diff --git a/puppet-
index 66e2024..6ffa492 100644
Binary files a/puppet-
(I have not fully understood why yet, because in my quick test without TLS a simple 'saslpasswd2 -d -a libvirt -u overcloud migration' did not change the md5 for me, but maybe I am missing something)
Excluding this file from the tar command is probably a bad idea: if you do change the password of the migration user, you probably do want the libvirtd container to restart (actually you likely want only that one to restart, as opposed to all nova* containers on the compute, which is what happens now).
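To reproduce the check described above, something like the following sketch could be used (the sasldb path is the stock libvirt location and is an assumption here; the saslpasswd2 invocation is the one quoted in the report, and it deletes the 'migration' user, so only run this against a throwaway node):
# Sketch only: /etc/libvirt/passwd.db is assumed to be the sasldb in use.
md5sum /etc/libvirt/passwd.db
# Destructive test from the report: removes the migration user.
saslpasswd2 -d -a libvirt -u overcloud migration
md5sum /etc/libvirt/passwd.db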
Just to make sure that the above theories are all correct, I used the following tar commands:
EXCL="-
tar -c -f - /var/lib/
tar -c -f - /var/lib/
And I observed no spurious restarts.
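Spelled out, that verification might have looked roughly like this (all paths are assumptions, since the commands above are truncated):
# swift config volume: skip rings and ring backups, strip HEADER timestamps.
tar -c -f - \
    --exclude='*/etc/swift/backups/*' \
    --exclude='*/etc/swift/*.gz' \
    /var/lib/config-data/swift \
  | tar -xO -f - | sed '/^# HEADER:/d' | md5sum
# any other config volume: strip HEADER timestamps only.
tar -c -f - /var/lib/config-data/keystone \
  | tar -xO -f - | sed '/^# HEADER:/d' | md5sum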
tags: added: idempotency
Changed in tripleo: importance: High → Critical
Changed in tripleo: milestone: rocky-rc1 → rocky-rc2
Changed in tripleo: assignee: nobody → Michele Baldessari (michele)
Changed in tripleo: status: Triaged → In Progress
Changed in tripleo: milestone: rocky-rc2 → stein-1
Changed in tripleo: milestone: stein-1 → stein-2
Changed in tripleo: milestone: stein-2 → stein-3
Changed in tripleo: status: In Progress → Fix Released
Related fix proposed to branch: master
Review: https://review.openstack.org/590008