lxc_host cache issues

Bug #1539236 reported by Bjoern
This bug affects 2 people
Affects: OpenStack-Ansible
Status: Fix Released
Importance: High
Assigned to: Jesse Pretorius
Milestone: 13.0.0

Bug Description

The role lxc_host is not fully idempotent.
When the playbooks are stopped between downloading the LXC cache and extracting the trusty container rootfs into /var/cache/lxc/trusty/rootfs-amd64, a rerun of the lxc_host role will not extract the cached LXC image, since the "Move lxc cached image into place" task depends on a completed download of the cache file.
Ideally, the tasks would check whether the destination is empty and whether the download was successful (the sha256sum matches that of rpc-trusty-container.tgz).
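The check suggested above could look roughly like the following. This is a minimal sketch in Python rather than Ansible tasks, and it is not the role's actual implementation; the function names sha256_of and needs_extraction are hypothetical, and the paths and checksum would come from the role's own variables.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_extraction(archive: Path, rootfs: Path, expected_sha256: str) -> bool:
    """Return True when the archive is intact and the rootfs is missing or empty.

    Hypothetical helper: the decision is based on the state on disk, not on
    whether a download happened during the current run.
    """
    if not archive.is_file():
        return False  # nothing to extract yet; the download has to run first
    if sha256_of(archive) != expected_sha256.lower():
        return False  # partial or corrupt download; it should be fetched again
    rootfs_populated = rootfs.is_dir() and any(rootfs.iterdir())
    return not rootfs_populated
```

Because the decision depends only on what is on disk, an interrupted run followed by a rerun would still extract the image without removing the cache file by hand.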

When I tried to rerun the lxc_host role, the post-download tasks all bailed out:

TASK: [lxc_hosts | Create apt repos in the cached container] ******************
failed: [storage02] => (item={'url': u'http://rpc-repo.rackspace.com/container_images/rpc-trusty-container.tgz', 'name': 'trusty.tgz', 'chroot_path': 'trusty/rootfs-amd64', 'sha256sum': '56c6a6e132ea7d10be2f3e8104f47136ccf408b30e362133f0dc4a0a9adb4d0c'}) => {"failed": true, "item": {"chroot_path": "trusty/rootfs-amd64", "name": "trusty.tgz", "sha256sum": "56c6a6e132ea7d10be2f3e8104f47136ccf408b30e362133f0dc4a0a9adb4d0c", "url": "http://rpc-repo.rackspace.com/container_images/rpc-trusty-container.tgz"}}
msg: Destination directory /var/cache/lxc/trusty/rootfs-amd64/etc/apt does not exist

.. output truncated

TASK: [lxc_hosts | Update container resolvers] ********************************
failed: [storage01] => (item={'url': u'http://rpc-repo.rackspace.com/container_images/rpc-trusty-container.tgz', 'name': 'trusty.tgz', 'chroot_path': 'trusty/rootfs-amd64', 'sha256sum': '56c6a6e132ea7d10be2f3e8104f47136ccf408b30e362133f0dc4a0a9adb4d0c'}) => {"failed": true, "item": {"chroot_path": "trusty/rootfs-amd64", "name": "trusty.tgz", "sha256sum": "56c6a6e132ea7d10be2f3e8104f47136ccf408b30e362133f0dc4a0a9adb4d0c", "url": "http://rpc-repo.rackspace.com/container_images/rpc-trusty-container.tgz"}}
msg: Destination directory /var/cache/lxc/trusty/rootfs-amd64/run/resolvconf does not exist

.. output truncated

TASK: [lxc_hosts | Update container resolvconf base] **************************
failed: [storage04] => (item={'url': u'http://rpc-repo.rackspace.com/container_images/rpc-trusty-container.tgz', 'name': 'trusty.tgz', 'chroot_path': 'trusty/rootfs-amd64', 'sha256sum': '56c6a6e132ea7d10be2f3e8104f47136ccf408b30e362133f0dc4a0a9adb4d0c'}) => {"failed": true, "item": {"chroot_path": "trusty/rootfs-amd64", "name": "trusty.tgz", "sha256sum": "56c6a6e132ea7d10be2f3e8104f47136ccf408b30e362133f0dc4a0a9adb4d0c", "url": "http://rpc-repo.rackspace.com/container_images/rpc-trusty-container.tgz"}}
msg: Destination directory /var/cache/lxc/trusty/rootfs-amd64/etc/resolvconf/resolv.conf.d does not exist
.. output truncated

The problem was fixed with ansible hosts -m shell -a 'rm -f /var/cache/lxc_trusty.tgz', but I would much prefer to make the role more mature.

Shane Cunningham (appprod0) wrote :

I'd like to add that I've seen issues with this container image on Kilo, specifically with compute nodes. Since they don't use containers, it would make sense to add logic to the playbooks that checks whether the host requires the container image at all.

Bjoern (bjoern-t)
summary: - lxc_host role not idempotent
+ lxc_host cache issues
Jesse Pretorius (jesse-pretorius) wrote :

Both these issues are confirmed. I'm doing some work to revise the cache preparation process and will work to improve the robustness of this process while doing that.

Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Jesse Pretorius (jesse-pretorius)
milestone: none → mitaka-3
Changed in openstack-ansible:
importance: Medium → High
Changed in openstack-ansible:
milestone: mitaka-3 → 13.0.0
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-lxc_hosts (master)

Fix proposed to branch: master
Review: https://review.openstack.org/289339

Changed in openstack-ansible:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-lxc_hosts (master)

Reviewed: https://review.openstack.org/289339
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-lxc_hosts/commit/?id=45beccf508b9fcd31d85e0895efc1efaac0e2032
Submitter: Jenkins
Branch: master

commit 45beccf508b9fcd31d85e0895efc1efaac0e2032
Author: Jesse Pretorius <email address hidden>
Date: Mon Mar 7 13:35:08 2016 +0000

    Always ensure that the local lxc cache file matches the upstream image

    Ansible 1.9x only actually checks whether there's a local file - it never
    checks whether the local file matches the given sha256sum.

    We therefore need to set 'force:yes' to ensure that Ansible does the
    following:
     - download the file to a temporary location, checking its sha256sum
       against the given value
     - check the sha256sum of the existing file and the downloaded file
     - if the sha256sums match, then throw away the temp file
     - if the sha256sums do not match, replace the existing file

    In order to also provide the ability to forcibly delete any existing lxc
    cache which was previously prepared (successfully or unsuccessfully), the
    boolean variable 'lxc_container_base_delete' has been added.

    Change-Id: I988940892c89679edea887716851314fc1cf13b5
    Closes-Bug: #1539236
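The sequence described in the commit message (download to a temporary location, verify the checksum, compare with the existing file, then keep or replace it) can be sketched as follows. This is an illustrative Python sketch of that logic, not Ansible's get_url code; refresh_cache, file_sha256 and the fetch callback are hypothetical names introduced for the example.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Callable


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh_cache(fetch: Callable[[Path], None], cache_file: Path,
                  expected_sha256: str) -> bool:
    """Re-download the image and replace cache_file only if it differs.

    fetch is a hypothetical callback that writes the upstream image to the
    path it is given. Returns True if cache_file was replaced.
    """
    fd, tmp_name = tempfile.mkstemp(dir=cache_file.parent)
    os.close(fd)
    tmp = Path(tmp_name)
    try:
        fetch(tmp)  # download to a temporary location
        new_sum = file_sha256(tmp)
        if new_sum != expected_sha256.lower():
            raise ValueError("downloaded file does not match the expected sha256sum")
        if cache_file.is_file() and file_sha256(cache_file) == new_sum:
            return False  # sums match: the temp file is discarded below
        os.replace(tmp, cache_file)  # sums differ: replace the existing file
        return True
    finally:
        if tmp.exists():
            tmp.unlink()
```

With this shape, a truncated cache file left behind by an interrupted run is replaced on the next run instead of being trusted because it exists.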

Changed in openstack-ansible:
status: In Progress → Fix Released
Sam Stoelinga (sammiestoel) wrote :

I am still hitting this issue on commit 298a2caafd4ab7891eb74b5ff408343ffab74cb1. It seems the fix didn't make it into stable/mitaka?

Jesse Pretorius (jesse-pretorius) wrote :

@Sam It definitely was included in stable/mitaka, and that SHA is the Mitaka/13.1.3 SHA bump, so it includes the fix. In fact, 13.1.3 includes https://github.com/openstack/openstack-ansible/blob/298a2caafd4ab7891eb74b5ff408343ffab74cb1/ansible-role-requirements.yml#L32, which in turn includes the change made: https://github.com/openstack/openstack-ansible-lxc_hosts/blob/47ad392dfc266b2fbdcf0c049b900848929b28b7/tasks/lxc_cache.yml#L34

If you're still hitting this issue then please register a new bug with the details of what you're seeing and we can re-triage it with your specific case in mind.
