container stuck restarting on initial deploy is not reconfigured by a subsequent deploy

Bug #1543150 reported by Chris Ricker on 2016-02-08
This bug affects 2 people
Affects: kolla
Importance: Critical
Assigned to: Steven Dake

Bug Description

I was playing with kolla inside VirtualBox (which lacks nested VT support).

The initial kolla-ansible deploy left me with a nova-libvirt container that had tried to start using the kvm default and failed, due to the lack of nested VT. The log message was something like:

INFO:__main__:Loading config file at /var/lib/kolla/config_files/config.json
INFO:__main__:Validating config file
INFO:__main__:Copying service configuration files
INFO:__main__:Removing existing destination: /etc/libvirt/libvirtd.conf
INFO:__main__:Copying /var/lib/kolla/config_files/libvirtd.conf to /etc/libvirt/libvirtd.conf
INFO:__main__:Setting permissions for /etc/libvirt/libvirtd.conf
INFO:__main__:Writing out command to execute
Running command: '/usr/sbin/libvirtd --listen'

This was repeated 10 times on that container.

I then switched to qemu by editing /etc/kolla/config/nova/nova-compute.conf and setting virt_type there, and reran kolla-ansible deploy.
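For reference, the override described here might look like the output of the sketch below. It is only an illustration, assuming virt_type belongs in nova's [libvirt] section; the helper name render_virt_type_override is hypothetical, and the code only renders the text rather than writing to /etc/kolla/config/nova/nova-compute.conf.

```python
import configparser
import io


def render_virt_type_override(virt_type: str = "qemu") -> str:
    """Render an INI fragment that sets nova's libvirt virt_type.

    Assumption: the option lives in the [libvirt] section of the nova config.
    """
    parser = configparser.ConfigParser()
    parser["libvirt"] = {"virt_type": virt_type}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the fragment; placing it in the override file is left to the operator.
    print(render_virt_type_override())
```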

The config change appeared to get picked up, in that 2-3 containers were reported as changed, but the nova-libvirt container was still not started.

Seems like the Ansible playbooks don't detect the corner case of an initially failed container start and deal with it on reconfig, so the operator has to manually docker kill the container and then run kolla-ansible again?
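As a rough illustration of what detecting that corner case could involve, the sketch below flags containers that look stuck in a restart loop from the JSON printed by docker inspect. It assumes the State.Restarting flag and the top-level RestartCount field in that output; the function names and the threshold of 5 are hypothetical, and this is not how the kolla playbooks work.

```python
import json


def is_stuck_restarting(inspect_entry: dict, max_restarts: int = 5) -> bool:
    """Return True if one `docker inspect` entry looks like a restart loop.

    Assumption: the entry carries State.Restarting and a top-level
    RestartCount; max_restarts is an arbitrary illustrative threshold.
    """
    state = inspect_entry.get("State", {})
    restart_count = inspect_entry.get("RestartCount", 0)
    return bool(state.get("Restarting")) or restart_count >= max_restarts


def stuck_containers(inspect_json: str, max_restarts: int = 5) -> list:
    """Return the names of containers in `docker inspect` output that look stuck."""
    entries = json.loads(inspect_json)
    return [
        entry.get("Name", "").lstrip("/")  # docker prefixes names with "/"
        for entry in entries
        if is_stuck_restarting(entry, max_restarts)
    ]
```

A deploy step could feed the output of docker inspect for the service containers into stuck_containers and treat any name it returns as needing a kill and a fresh start.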

(Note: I'll try to reproduce cleanly and validate this. I had enough in flight on this that it may be more complicated than that)

Steven Dake (sdake) wrote :

Thanks Chris. I have seen this same behavior many times and am marking this as confirmed. It has been reported in the past as well. I'm not sure it can be fixed in an automated way for Liberty because it may require significant changes to the playbooks. After discussion with inc0, we think it will be solvable in Mitaka via a "kolla-ansible reconfigure" feature which could docker exec rm the config once lock file and restart containers that have configuration changes associated with them.

I think a backport may be difficult given the rate of change in Mitaka playbooks, but anything is possible :)

Regards,
-steve

Changed in kolla:
status: New → Confirmed
importance: Undecided → Critical
milestone: none → mitaka-3
Steven Dake (sdake) wrote :

From the original ansible-multi spec in June 2015:
The CONFIG_OUTSIDE_COPY_ONCE model of configuration maintains the immutable and declarative nature of the Kolla containers, as defined by our current Kolla best practices while introducing completely customizable configuration.

The current implementation does not deliver completely customizable configuration because it does not load new configuration changes as originally specified.

After a long discussion with inc0, I understand the pushback on CONFIG_ONCE - it centers around the fact that the above line was not clear that the completely customizable configuration was meant to be altered if the operator made a configuration option change and redeployed the existing containers.

We need to validate if this reconfigures a service that is not in a restart loop. I seem to recall it does, but inc0 seems to recall it doesn't, so we need validation one way or another on this point.

inc0 and sdake also agreed during our discussion that COPY_ONCE was a misnomer - it should be called "COPY_IMMUTABLE" or something similar to more clearly state the intent. The intent in my mind when I wrote this specification wasn't to copy the config one time and forever forget about changes, but instead, if there was a reconfig done via the main deployment node, to COPY it one time into the container and forget about it until the next deployment operation. Unfortunately, implementation of this requirement is dicey, but I'm sure we can find a solution.

We talked about using timestamping to determine if a config change was made, but this boils down to the CONFIG_ALWAYS case. One correct solution would be to docker exec into the container and remove the config lock file for any configuration option changes and restart the container.
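The lock-file approach could be sketched roughly as follows. This is a minimal illustration, not the actual reconfigure implementation: the helper name reconfigure_commands is hypothetical, the lock file path is a placeholder because its real location is not given here, and the code only builds the docker command lines without running them.

```python
def reconfigure_commands(container: str, lock_file: str) -> list:
    """Build the docker commands for the lock-file approach; nothing is executed.

    Assumptions: `container` is the name of the service container and
    `lock_file` is the path of the config-once lock file inside it.
    """
    return [
        # Remove the lock so the config is copied again on the next start.
        ["docker", "exec", container, "rm", "-f", lock_file],
        # Restart the container so it picks up the new configuration.
        ["docker", "restart", container],
    ]


if __name__ == "__main__":
    # Placeholder container name and lock path, for illustration only.
    for command in reconfigure_commands("nova_libvirt", "/path/to/config-lock"):
        print(" ".join(command))
```

Note that docker exec needs a running container, so a container that is still stuck in a restart loop would likely need to be handled differently.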

Regards
-steve

Steven Dake (sdake) wrote :

This is resolved by the reconfigure action in the playbooks.

Changed in kolla:
status: Confirmed → Won't Fix
assignee: nobody → Steven Dake (sdake)
status: Won't Fix → Invalid