Containerized stack update fails on libvirtd: rm: cannot remove '/var/lib/config-data/nova_libvirt/etc/libvirt/qemu': Device or resource busy

Bug #1696622 reported by Oliver Walsh
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Steve Baker

Bug Description

Just hit this error when trying a stack update on an existing containerized deployment:

rm -Rf /var/lib/config-data/nova_libvirt
rm: cannot remove '/var/lib/config-data/nova_libvirt/etc/libvirt/qemu': Device or resource busy

Tripping up in docker-puppet.py on the nova_libvirt container here: https://github.com/openstack/tripleo-heat-templates/blob/master/docker/docker-puppet.py#L173

No VMs were deployed at this point.

Tags: containers
Oliver Walsh (owalsh) wrote :

I wonder if an mv would get around this...

Changed in tripleo:
milestone: none → pike-3
importance: Undecided → High
status: New → Triaged
tags: added: containers
Bogdan Dobrelya (bogdando) wrote :

According to the code at https://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/docker/services/nova-libvirt.yaml#n109 and https://git.openstack.org/cgit/openstack/tripleo-heat-templates/tree/docker/services/nova-libvirt.yaml#n118, this seems to be expected (though highly unwanted) behavior: the two logically conflicting bind mounts resolve to the same path, /etc/libvirt/qemu, inside the container. As a result, the host's /var/lib/config-data/nova_libvirt/etc/libvirt/qemu cannot be removed, because items created through the second (rw) bind mount hold the directory busy.

Oliver Walsh (owalsh) wrote :

mv works:

[root@overcloud-novacompute-0 heat-admin]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b8f2050c7374 192.168.24.1:8787/tripleoupstream/centos-binary-ceilometer-compute:latest "kolla_start" 8 hours ago Up 8 hours ceilometer_agent-compute
16008db60401 192.168.24.1:8787/tripleoupstream/centos-binary-neutron-openvswitch-agent:latest "kolla_start" 8 hours ago Up 8 hours neutronovsagent
2451c1afa3ef 192.168.24.1:8787/tripleoupstream/centos-binary-nova-compute:latest "kolla_start" 8 hours ago Up 8 hours novacompute
1a073725ac5e 192.168.24.1:8787/tripleoupstream/centos-binary-nova-compute:latest "kolla_start" 8 hours ago Up 8 hours nova_migration_target
8f60d9ffe14a 192.168.24.1:8787/tripleoupstream/centos-binary-nova-libvirt:latest "kolla_start" 8 hours ago Up 8 hours nova_libvirt
[root@overcloud-novacompute-0 heat-admin]# mv /var/lib/config-data/nova_libvirt /var/lib/config-data/_nova_libvirt
[root@overcloud-novacompute-0 heat-admin]# rm -Rf /var/lib/config-data/_nova_libvirt
rm: cannot remove ‘/var/lib/config-data/_nova_libvirt/etc/libvirt/qemu’: Device or resource busy

But this causes errors in the running containers when they try to open files in /etc:
Stderr: u'/usr/bin/nova-rootwrap: Incorrect configuration file: /etc/nova/rootwrap.conf\n'
...
Warning: Identity file /etc/nova/migration/identity not accessible: No such file or directory.

Steven Hardy (shardy) wrote :

Yeah I noticed the same and I'm hoping this patch from stevebaker will fix it:

https://review.openstack.org/#/c/465802/

The workflow then will be to launch the config container via docker-puppet, rsync the config, then restart the application container if it changed.
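To illustrate why syncing in place avoids the error, here is a minimal pure-Python stand-in for the rsync step. It is a sketch under stated assumptions, not the code from the patch: the function name sync_config and its return value are hypothetical, and unlike rsync with a delete option it never removes stale files. The point is that the destination tree is updated file by file, so no directory that is busy as a mount point has to be removed.

```python
import filecmp
import os
import shutil


def sync_config(src, dst):
    """Copy new or changed files from src into dst; return True if dst changed.

    Hypothetical helper for illustration only. The destination directories
    are never removed, so a directory held busy by a mount is left alone.
    Stale files in dst are not deleted.
    """
    changed = False
    for root, _dirs, files in os.walk(src):
        rel = os.path.relpath(root, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            target = os.path.join(target_dir, name)
            # Compare file contents, not just size and mtime.
            if not os.path.exists(target) or not filecmp.cmp(
                source, target, shallow=False
            ):
                shutil.copy2(source, target)
                changed = True
    return changed
```

A caller could use the boolean result as the "restart only if it changed" signal, although the thread proposes a config hash for that purpose instead.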

The "if it changed" part will require this patch I've been working on, that calculates a hash of the config data and adds it to the container environment, so we have a salt to trigger the application container restart:

https://review.openstack.org/#/c/467581/
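As a rough sketch of the hashing idea, the following computes one digest over a config directory. The function name config_hash, the choice of SHA-256, and the way paths and contents are combined are all assumptions made for illustration; the linked review may do this differently.

```python
import hashlib
import os


def config_hash(config_dir):
    """Return a hex digest covering relative paths and contents under config_dir.

    Hypothetical helper for illustration only. Files are read fully into
    memory, and symlinks are followed when opened.
    """
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(config_dir):
        dirs.sort()  # in-place sort makes the traversal order deterministic
        for name in sorted(files):
            path = os.path.join(root, name)
            rel = os.path.relpath(path, config_dir)
            digest.update(rel.encode("utf-8"))
            digest.update(b"\0")  # separator between path and content
            with open(path, "rb") as handle:
                digest.update(handle.read())
            digest.update(b"\0")
    return digest.hexdigest()
```

If the digest is placed in the application container's environment, any change to the generated config changes the container definition, which gives the tooling a reason to restart it.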

Steven Hardy (shardy) wrote :

Another workaround is to modify docker-puppet to run || echo "not removing" when the rm fails. The rsync solution is probably cleaner, though, since we could add options to avoid leaving stale config behind when the "old" directory contains something that isn't in the config generated by the config container.
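The "not removing" workaround could look roughly like this if written in Python rather than shell. This is a sketch only: the function name remove_tree_tolerating_busy is hypothetical, and docker-puppet.py runs rm from a shell script rather than calling shutil.

```python
import errno
import shutil


def remove_tree_tolerating_busy(path):
    """Remove path; return False instead of raising when something is busy.

    Hypothetical helper for illustration only. Errors other than EBUSY are
    re-raised so that real failures are not hidden.
    """
    try:
        shutil.rmtree(path)
    except FileNotFoundError:
        return True  # nothing to remove
    except OSError as exc:
        if exc.errno != errno.EBUSY:
            raise
        print("not removing %s: %s" % (path, exc.strerror))
        return False
    return True
```

Note that shutil.rmtree stops at the first error by default, whereas rm -Rf keeps going, so after a False result other entries under the path may also still exist. Either way the busy directory and its stale contents stay behind, which is the drawback mentioned above.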

Steve Baker (steve-stevebaker) wrote :

Oliver, regarding the error in comment #3: you may be hitting an issue with the host XFS filesystem being created with the wrong ftype. Can you confirm?

https://bugzilla.redhat.com/show_bug.cgi?id=1455713#c3

Steve Baker (steve-stevebaker) wrote :

Here is the Launchpad bug for the error in comment #3, which is a different issue from the one in this bug's description:

https://bugs.launchpad.net/tripleo/+bug/1693398

Steve Baker (steve-stevebaker) wrote :

I think another viable short-term fix for this bug is to extend this change to do more fine-grained mounts in /etc/libvirt:

  https://review.openstack.org/#/c/467846

But until we do something like the unique versioned/hashed config directories that shardy is working on, we're not using docker for immutable infrastructure, so we should still be aiming for that.

Changed in tripleo:
assignee: nobody → Steve Baker (steve-stevebaker)
status: Triaged → In Progress
Changed in tripleo:
assignee: Steve Baker (steve-stevebaker) → Martin André (mandre)
Changed in tripleo:
assignee: Martin André (mandre) → Steve Baker (steve-stevebaker)
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.openstack.org/465802
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=f600d459f051288042ce531bab029953563a11b3
Submitter: Jenkins
Branch: master

commit f600d459f051288042ce531bab029953563a11b3
Author: Steve Baker <email address hidden>
Date: Thu May 18 04:03:29 2017 +0000

    Replace NO_ARCHIVE block with single call to rsync

    Also attempts to move the workaround for bug #1696283 to before the
    puppet apply call.

    Closes-Bug: #1696622
    Change-Id: I3a195466a5039e7641e843c11e5436440bfc5a01

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 7.0.0.0b3

This issue was fixed in the openstack/tripleo-heat-templates 7.0.0.0b3 development milestone.
