The script to generate image bundles fails silently if docker runs out of space

Bug #1962419 reported by Tee Ngo
This bug affects 1 person
Affects     Status         Importance   Assigned to   Milestone
StarlingX   Fix Released   Medium       Tee Ngo

Bug Description

Brief Description
-----------------
The prestage_images.yml playbook does not fail when docker runs out of space, because the docker save command in the gen-image-bundles.sh script fails silently.
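
The failure mode can be reproduced without docker. The following is a minimal bash sketch, not the actual contents of gen-image-bundles.sh: failing_save is a hypothetical stand-in for a docker save that runs out of space, and piping it into cat is an assumption that the script runs the save as part of a pipeline. It illustrates that a pipeline returns the exit status of its last command by default, so an earlier failure is masked unless pipefail is set.

```shell
#!/bin/bash
# Hypothetical stand-in for a `docker save` that hits "no space left on device".
failing_save() {
    echo "no space left on device" >&2
    return 1
}

# Default behavior: the pipeline status is that of the last command (cat),
# so the failure of the first stage is lost.
status_without=0
failing_save 2>/dev/null | cat > /dev/null || status_without=$?

# With pipefail: the pipeline fails if any stage fails.
set -o pipefail
status_with=0
failing_save 2>/dev/null | cat > /dev/null || status_with=$?
set +o pipefail

echo "without pipefail: $status_without"
echo "with pipefail:    $status_with"
```

A caller that only checks the script's exit status, as an Ansible task would, sees success in the first case and failure in the second.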

Severity
--------
Major

Steps to Reproduce
------------------
1. Set up a distributed cloud with virtual subclouds
2. Deploy WRA app on both the system controller and the subclouds
3. Execute dcmanager subcloud prestage --sysadmin-password <sysadmin-passwd> <subcloud-name>

Expected Behavior
------------------
Subcloud prestage succeeds

Actual Behavior
----------------
Subcloud prestage was reported as successful, but a closer look at the logs revealed that docker did not have enough space on the virtual subcloud to process an 8G image bundle. The prestage_images.yml playbook nevertheless did not fail, so the reported success was a false positive.

Reproducibility
---------------
100%

System Configuration
--------------------
Distributed Cloud

Branch/Pull Time/Commit
-----------------------
Feb. 17, 2022 master load

Last Pass
---------
This was the first time testing prestage on virtual subclouds.

Timestamp/Logs
--------------
TASK [prestage/prestage-images : debug] ****************************************
Friday 25 February 2022 21:23:32 +0000 (0:03:31.492) 0:09:16.972 *******
ok: [subcloud20] =>
  gen_image_bundles_output.stdout_lines:
  - ''
  - Building archive...
  - 'Error response from daemon: write /var/lib/docker/tmp/docker-export-490299559/79999170187cd2242daeedb9b49d7b197549f35a4f77339e46cd314686e42a5f/layer.tar: no space left on device'
  - Image bundles are stored under /home/sysadmin/22.02.
  - Cleaning docker cache...
  - Completed

Test Activity
-------------
Developer Testing

Workaround
----------
None

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)
Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/831112
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/d6318db7e66c2b33eeac61fc82c2db27a32d92e0
Submitter: "Zuul (22348)"
Branch: master

commit d6318db7e66c2b33eeac61fc82c2db27a32d92e0
Author: Tee Ngo <email address hidden>
Date: Sun Feb 27 18:57:42 2022 -0500

    Fix the silent failure in prestage images playbook

    In this commit:
      - Set the pipefail option in gen-image-bundles.sh script to
        prevent silent failure.
      - Add check for docker space upfront and adjust max image
        bundle size accordingly.
      - Update the way the playbook retrieves registry image list so
        that images in the registries that do not end with ".io" are
        also accounted for.

    Test Plan:
      - Test subcloud prestage with virtual subclouds where
        docker available space is less than 16G. Verify that the
        max image bundle size is 4G.
      - Test subcloud prestage with hardware subclouds where
        docker available space is more than 16G. Verify that the
        max image bundle size is 8G.
      - Create a large file under /var/lib/docker to induce out of
        space error. Verify that the prestage images playbook
        fails as expected.
      - Deploy stx-monitor app on the subcloud. Run subcloud prestage
        and verify that docker.elastic.co/beats/filebeat image is
        included in one of the image bundles.

    Closes-Bug: 1962419
    Change-Id: I50c8a6ff2855d58dd62d6959a57fb12f3c36033e
    Signed-off-by: Tee Ngo <email address hidden>
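
The upfront space check described in the commit message might look roughly like the bash sketch below. It is not the merged code: the function names max_bundle_size_gb and avail_gb_for are hypothetical, the 16G threshold and the 4G and 8G sizes are taken from the test plan, and treating exactly 16G as the larger case is an assumption, since the test plan does not cover it.

```shell
#!/bin/bash
# Hypothetical helper: pick the max image bundle size (in GiB) from the
# space available to docker (in whole GiB).
# Assumption: exactly 16G available is treated like "more than 16G".
max_bundle_size_gb() {
    local avail_gb=$1
    if [ "$avail_gb" -lt 16 ]; then
        echo 4
    else
        echo 8
    fi
}

# Hypothetical helper: available space, in whole GiB (rounded down), on the
# filesystem holding the given directory. Uses the POSIX output format of df,
# where column 4 of the second line is the available space in 1024-byte blocks.
avail_gb_for() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Demonstration with fixed values, so the sketch runs on any machine.
echo "10G available -> $(max_bundle_size_gb 10)G max bundle"
echo "32G available -> $(max_bundle_size_gb 32)G max bundle"
```

On a subcloud, the input would presumably come from something like avail_gb_for /var/lib/docker, the directory that appears in the error message in the logs.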

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Tee Ngo (teewrs)
importance: Undecided → Medium
tags: added: stx.7.0 stx.config