Image conversion fails with large qcow2 guest image due to insufficient filesystem size

Bug #1819688 reported by Yang Liu on 2019-03-12
This bug affects 1 person
Affects: StarlingX
Importance: Medium
Assigned to: Elena Taivan

Bug Description

Brief Description
-----------------
An attempt to create a Windows glance image failed after ~15 minutes. After the failure no pods were running, and every kubectl command returned the following error:
The connection to the server 192.168.205.2:6443 was refused - did you specify the right host or port?

Severity
--------
Major

Steps to Reproduce
------------------
1. Copy a Windows qcow2 image to the active controller.
2. Attempt to create a glance image from that Windows qcow2 image file:
glance --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne image-create --property os_type=windows --container-format bare --file /home/wrsroot/images/win2016.qcow2 --name win_2016 --visibility public --disk-format qcow2

Expected Behavior
------------------
- glance image created successfully
- system is functional after that

Actual Behavior
----------------
- glance image creation failed after 17 minutes with the following error:
[2019-03-11 20:42:37,816] 387 DEBUG MainThread ssh.expect :: Output:
+------------------+--------------------------------------+
| Property | Value |
+------------------+--------------------------------------+
| checksum | None |
| container_format | bare |
| created_at | 2019-03-11T20:25:18Z |
| disk_format | qcow2 |
| id | 93c65e41-08a7-48c5-b95d-024a3bc0448e |
| min_disk | 0 |
| min_ram | 0 |
| name | win_2016 |
| os_hash_algo | None |
| os_hash_value | None |
| os_hidden | False |
| os_type | windows |
| owner | 9c3337f920d44cb5af0c157af3eb5909 |
| protected | False |
| size | None |
| status | queued |
| tags | [] |
| updated_at | 2019-03-11T20:25:18Z |
| virtual_size | Not available |
| visibility | public |
+------------------+--------------------------------------+
Error finding address for http://glance-api.openstack.svc.cluster.local:9292/v2/images/93c65e41-08a7-48c5-b95d-024a3bc0448e/file: Unable to establish connection to http://glance-api.openstack.svc.cluster.local:9292/v2/images/93c65e41-08a7-48c5-b95d-024a3bc0448e/file: [Errno 110] Connection timed out

- kubectl commands have been rejected ever since the failure
controller-1:~$ kubectl get pods
The connection to the server 192.168.205.2:6443 was refused - did you specify the right host or port?

- sudo docker ps lists no running containers
controller-1:~$ sudo docker ps
Password:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

Reproducibility
---------------
Reproducible (Saw this issue 2/2 times on regular systems)

System Configuration
--------------------
Two-node system, multi-node system

Branch/Pull Time/Commit
-----------------------
master as of 2019-03-05

Timestamp/Logs
--------------
[2019-03-11 20:25:17,194] 262 DEBUG MainThread ssh.send :: Send 'glance --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne image-create --property os_type=windows --container-format bare --file /home/wrsroot//images/win2016.qcow2 --name win_2016 --visibility public --disk-format qcow2'

Ghada Khalil (gkhalil) on 2019-03-14
tags: added: stx.containers
Ghada Khalil (gkhalil) wrote :

Marking as release gating; related to containers.
Assigning to Brent as this needs an architectural decision. This is likely an issue for any large images.

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Brent Rowsell (brent-rowsell)
tags: added: stx.2019.05
Ken Young (kenyis) on 2019-04-05
tags: added: stx.2.0
removed: stx.2019.05
Ghada Khalil (gkhalil) on 2019-04-09
tags: added: stx.retestneeded
Frank Miller (sensfan22) wrote :

This issue occurred because the qcow2 image must be converted to raw. The conversion is done by the glance/cinder containers, which use the docker_lv filesystem for it. In the past couple of months the docker_lv filesystem was increased to 30G, so this specific test case will now pass. However, larger glance images will still fail, and because the conversion uses docker_lv, the I/O performance of all containers is impacted while the conversion runs.
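
To illustrate why larger images still fail, here is a minimal sketch of a pre-check that compares the virtual size recorded in a qcow2 header with the free space on the filesystem used for conversion. The function names and the idea of running such a pre-check are assumptions for illustration and are not part of StarlingX; the header layout (magic `QFI\xfb`, virtual size as a big-endian 64-bit integer at byte offset 24) follows the qcow2 format.

```python
import shutil
import struct

QCOW2_MAGIC = b"QFI\xfb"


def qcow2_virtual_size(path):
    """Return the virtual disk size in bytes recorded in a qcow2 header."""
    with open(path, "rb") as f:
        header = f.read(32)
    if len(header) < 32 or header[:4] != QCOW2_MAGIC:
        raise ValueError(f"{path} is not a qcow2 image")
    # Bytes 24..31 hold the virtual size as a big-endian unsigned 64-bit integer.
    (size,) = struct.unpack_from(">Q", header, 24)
    return size


def fits_for_raw_conversion(virtual_size, free_bytes):
    """A fully allocated raw image can need up to virtual_size bytes."""
    return virtual_size <= free_bytes


def check_image(image_path, conversion_dir):
    """Hypothetical helper: True if the raw image should fit in conversion_dir."""
    free_bytes = shutil.disk_usage(conversion_dir).free
    return fits_for_raw_conversion(qcow2_virtual_size(image_path), free_bytes)
```

With illustrative numbers, a guest image with a 40 GiB virtual disk would not pass this check against a 30 GiB filesystem, even if the qcow2 file itself is much smaller. Raw output may be sparse in practice, so this check is conservative.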

The proper solution for this issue is to support creation of a new (optional) filesystem that would be used for image conversion. As this is more of an enhancement, Brent (TL containers) and Frank (PL for containers) agree this LP should be re-gated to stx.3.0.

tags: added: stx.3.0
removed: stx.2.0
Changed in starlingx:
assignee: Brent Rowsell (brent-rowsell) → Kristine Bujold (kbujold)
Frank Miller (sensfan22) wrote :

Assigning to Kristine to implement in stx.3.0 as Kristine has experience with filesystems for StarlingX/containers.

Yang Liu (yliu12) wrote :

Updated the title to reflect the more generic issue.

summary: - No pods are running after attempt to create windows glance image
+ Image conversion fails with large qcow2 guest image due to insufficient
+ filesystem size
tags: added: stx.regression
Frank Miller (sensfan22) wrote :

This is more of an enhancement request than a bug. As such, it will not gate stx.3.0. Re-tagging to stx.4.0.

tags: added: stx.4.0
removed: stx.3.0
Frank Miller (sensfan22) wrote :

Re-assigning to Elena to implement a solution

Changed in starlingx:
assignee: Kristine Bujold (kbujold) → Elena Taivan (etaivan)
Changed in starlingx:
status: Triaged → In Progress

Reviewed: https://review.opendev.org/714936
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=4107faed7e3466cba6fe7b6867152c91c869105b
Submitter: Zuul
Branch: master

commit 4107faed7e3466cba6fe7b6867152c91c869105b
Author: Elena Taivan <email address hidden>
Date: Wed Mar 25 11:48:49 2020 +0000

    Add a new filesystem for image conversion

    Adding runtime manifest for conversion logical volume.
    Adding new 'ensure' parameter for 'platform::filesystem' class.

    Change-Id: I622837959a5a7aabc462640b588713396354ce73
    Partial-bug: 1819688
    Signed-off-by: Elena Taivan <email address hidden>

Reviewed: https://review.opendev.org/714939
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=8099bbbbcf6e67190dc2ede949c47da081317e2d
Submitter: Zuul
Branch: master

commit 8099bbbbcf6e67190dc2ede949c47da081317e2d
Author: Elena Taivan <email address hidden>
Date: Wed Mar 25 12:33:42 2020 +0000

    Add a new filesystem for image conversion

    Create the new host_fs CLI commands and the APIs
        system host-fs-add
        system host-fs-delete

    These commands will be used only for adding/removing 'image-conversion'
    filesystem dedicated only for qcow2 image conversion.
    'image-conversion' filesystem is optional.
    It is not allowed to add/remove any other filesystem.

    Change-Id: I87c876371e123ec1ba946170258401d220260e31
    Partial-bug: 1819688
    Depends-On: https://review.opendev.org/#/c/714936/
    Signed-off-by: Elena Taivan <email address hidden>
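
The restriction described in the commit message above (only 'image-conversion' may be added or removed) can be sketched as a simple validation step. This is an illustrative sketch only: `OPTIONAL_FILESYSTEMS` and `validate_host_fs_change` are hypothetical names, not the actual sysinv implementation.

```python
# Hypothetical: the set of filesystems that host-fs-add/host-fs-delete accept.
OPTIONAL_FILESYSTEMS = {"image-conversion"}


def validate_host_fs_change(fs_name):
    """Raise ValueError unless fs_name is an optional filesystem."""
    if fs_name not in OPTIONAL_FILESYSTEMS:
        allowed = ", ".join(sorted(OPTIONAL_FILESYSTEMS))
        raise ValueError(
            f"Adding or removing '{fs_name}' is not allowed; "
            f"only these filesystems are optional: {allowed}"
        )
```

Keeping the allowed names in one set means a future optional filesystem would only need a new entry there, under the assumption that the same rule applies to it.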

Reviewed: https://review.opendev.org/724270
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=04a3cb8cbad9b1700286c5de67aa5d974cf54400
Submitter: Zuul
Branch: master

commit 04a3cb8cbad9b1700286c5de67aa5d974cf54400
Author: Elena Taivan <email address hidden>
Date: Wed Apr 29 08:44:13 2020 +0000

    Changing permissions for conversion folder

    Adding writing permissions to '/opt/conversion' mountpoint
    so openstack image conversion can happen there.

    Change-Id: Id1a91db6570dcbed3b8068e79e72f5bb800f24ad
    Partial-bug: 1819688
    Signed-off-by: Elena Taivan <email address hidden>

Reviewed: https://review.opendev.org/724288
Committed: https://git.openstack.org/cgit/starlingx/fault/commit/?id=bef81593f094f51d6f6ca3a577b7472a938414b8
Submitter: Zuul
Branch: master

commit bef81593f094f51d6f6ca3a577b7472a938414b8
Author: Elena Taivan <email address hidden>
Date: Wed Apr 29 11:30:25 2020 +0000

    Raise alarms for image-conversion

    1. Raise alarm if image-conversion is not added on both controllers
    2. Raise alarm if the size of the filesystem is not the same on both controllers

    Change-Id: I803b313cfee372fd5d025efbba74c1ae34b9e248
    Partial-bug: 1819688
    Signed-off-by: Elena Taivan <email address hidden>
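
The two alarm conditions in the commit message above can be sketched as a pure function. All names here are hypothetical, and the sketch assumes that no alarm is needed when the optional filesystem is absent from every controller; the commit message does not state this explicitly.

```python
def image_conversion_alarms(sizes_by_controller):
    """Return alarm reasons for the image-conversion filesystem.

    sizes_by_controller maps a controller name to the filesystem size in GiB,
    or to None when the filesystem has not been added on that controller.
    """
    present = {host: size for host, size in sizes_by_controller.items()
               if size is not None}
    alarms = []
    if present and len(present) < len(sizes_by_controller):
        # Condition 1: added on some controllers but not on all of them.
        alarms.append("image-conversion is not added on both controllers")
    elif len(set(present.values())) > 1:
        # Condition 2: added everywhere, but with different sizes.
        alarms.append("image-conversion size differs between controllers")
    return alarms
```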

Changed in starlingx:
status: In Progress → Fix Released

Reviewed: https://review.opendev.org/724291
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=491cca42ed854d2cb3ee3646b93c56a4f45f563c
Submitter: Zuul
Branch: master

commit 491cca42ed854d2cb3ee3646b93c56a4f45f563c
Author: Elena Taivan <email address hidden>
Date: Wed Apr 29 11:25:26 2020 +0000

    Qcow2 conversion to raw can be done using 'image-conversion' filesystem

    1. Conversion filesystem can be added before/after
       stx-openstack is applied
    2. If conversion filesystem is added after stx-openstack
       is applied, changes to stx-openstack will only take effect
       once the application is re-applied

    3. It is not allowed to delete image-conversion filesystem
       when stx-openstack is in applying/applied/removing state
    4. Raise alarms for image-conversion

    Change-Id: Ie205329b694525509b0820497186fcd9ec2e45c9
    Closes-bug: 1819688
    Depends-On: https://review.opendev.org/#/c/724270/
    Depends-On: https://review.opendev.org/724288/
    Signed-off-by: Elena Taivan <email address hidden>
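
Rule 3 in the commit message above amounts to a guard on the stx-openstack application state. A minimal sketch, assuming the state is available as a plain string; `BLOCKING_APP_STATES` and `can_delete_image_conversion` are hypothetical names, not the actual sysinv code.

```python
# States named in the commit message in which deletion is refused.
BLOCKING_APP_STATES = {"applying", "applied", "removing"}


def can_delete_image_conversion(app_state):
    """Return True if image-conversion may be deleted in this app state."""
    return app_state not in BLOCKING_APP_STATES
```

For example, `can_delete_image_conversion("applied")` returns False, while any state outside the three listed ones (such as an application that is only uploaded) would allow deletion under this sketch.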


Reviewed: https://review.opendev.org/729817
Committed: https://git.openstack.org/cgit/starlingx/fault/commit/?id=f21b937b2565fa3f8651d4d72d47a18677959b20
Submitter: Zuul
Branch: f/centos8

commit bef81593f094f51d6f6ca3a577b7472a938414b8
Author: Elena Taivan <email address hidden>
Date: Wed Apr 29 11:30:25 2020 +0000

    Raise alarms for image-conversion

    1. Raise alarm if image-conversion is not added on both controllers
    2. Raise alarm if the size of the filesystem is not the same on both controllers

    Change-Id: I803b313cfee372fd5d025efbba74c1ae34b9e248
    Partial-bug: 1819688
    Signed-off-by: Elena Taivan <email address hidden>

commit 2f364daa08c64b8976edf7140d3ca85b556ba974
Author: Sharath Kumar K <email address hidden>
Date: Mon Apr 20 18:50:47 2020 +0200

    Tox and Zuul job for the python code scan in starlingx/fault

    Setting up the bandit tool for the scanning of HIGH severity issues
    in the python codes under Starlingx/fault folder.
    Expecting this merge will enable zuul job for CI/CD of bandit scan.

    Configuration files:
    1. tox.ini for adding bandit environment and command.
    2. test-requirements.txt for adding bandit version.
    3. .zuul.yaml file for adding bandit job and configuring under
       check job to run code scan every time before code commit.

    Test:
    Run tox -e bandit command inside the fault folder to validate the
    bandit scan and result.

    Please note:
    Changes will be implemented in batches and this is Batch2 change.

    Story: 2007541
    Task: 39490
    Depends-On: https://review.opendev.org/#/c/721294/

    Change-Id: I84449691281d9769e9219e6f9f1338c20f518f40
    Signed-off-by: Sharath Kumar K <email address hidden>

commit 1c4cd003678f631f02b298dec70c403b976583bb
Author: Bart Wensley <email address hidden>
Date: Thu Apr 23 12:56:24 2020 -0500

    Mark distributed cloud alarms as not management affecting

    Changing both distributed cloud alarms (which are only raised
    on the system controller) to be not management affecting. There
    are two alarms:
     280.001: <subcloud> is offline
     280.002: <subcloud> <resource> sync_status is out-of-sync

    These alarms indicate issues with a subcloud and should not be
    considered management affecting, as that prevents upgrades
    from being done on the system controller.

    Change-Id: Ic3ac3feb3fa0fdd6c95b81e4c0e1327642fc29e4
    Closes-Bug: 1874123
    Signed-off-by: Bart Wensley <email address hidden>

commit df03d2927848fb029961ae9da323ba6701ba7c82
Author: Eric MacDonald <email address hidden>
Date: Tue Apr 14 16:47:55 2020 -0400

    Change fw 'upgrade' to 'update' for vim orchestrated logs/alarms

    Change-Id: Iebda426ca4813f41e82a330dd9bee36ca88160af
    Story: 2006740
    Task: 39143
    Signed-off-by: Eric MacDonald <email address hidden>

commit c63ad80c232918fb58fe351b59cdb309163f685b
Author: Sharath Kumar K <email address hidden>
Date: Fri Apr 3 08:19:21 2020 +0200

    De-branding in starlingx/fault: Titanium Cloud -> StarlingX

    1. Rename Titanium Cloud to StarlingX...


tags: added: in-f-centos8

Reviewed: https://review.opendev.org/729825
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=d4617fbad74a05f2af81ee85a47565083991e6f8
Submitter: Zuul
Branch: f/centos8

commit 4134023ab84d8a635b118d5e3ff26ade3bbe535b
Author: Sharath Kumar K <email address hidden>
Date: Thu May 7 10:08:11 2020 +0200

    Tox and Zuul job for the bandit code scan in stx/stx-puppet

    Setting up the bandit tool for the scanning of HIGH severity issues
    in the python codes under Starlingx/stx-puppet folder.
    Expecting this merge will enable zuul job for CI/CD of bandit scan.

    Configuration files:
    1. tox.ini for adding bandit environment and command.
    2. test-requirements.txt for adding bandit version.
    3. .zuul.yaml file for adding bandit job and configuring under
       check job to run code scan every time before code commit.

    Test:
    Run tox -e bandit command inside the fault folder to validate the
    bandit scan and result.

    Story: 2007541
    Task: 39687
    Depends-On: https://review.opendev.org/#/c/721294/

    Change-Id: I2982268db2b5e75feeb287bc95420fedc9b0d816
    Signed-off-by: Sharath Kumar K <email address hidden>

commit 65daac29e4635f32a57e80cd18f96fd59dc8ebe0
Author: Bin Qian <email address hidden>
Date: Tue May 12 22:39:21 2020 -0400

    DC cert manifest should only apply to controller nodes

    DC cert manifest should only apply to controller nodes on system
    controller.
    This fix is for DC with worker nodes in central cloud.

    Change-Id: I4233509a6f0afb3013c01e81dea6f655d9e15371
    Closes-Bug: 1878260
    Signed-off-by: Bin Qian <email address hidden>

commit 04a3cb8cbad9b1700286c5de67aa5d974cf54400
Author: Elena Taivan <email address hidden>
Date: Wed Apr 29 08:44:13 2020 +0000

    Changing permissions for conversion folder

    Adding writing permissions to '/opt/conversion' mountpoint
    so openstack image conversion can happen there.

    Change-Id: Id1a91db6570dcbed3b8068e79e72f5bb800f24ad
    Partial-bug: 1819688
    Signed-off-by: Elena Taivan <email address hidden>

commit 4e9153cf234e714e4bbc9a9eb3d9b55b2828145a
Author: Tao Liu <email address hidden>
Date: Mon May 4 14:30:30 2020 -0500

    Move subcloud audit to separate process

    Subcloud audit is being removed from the dcmanager-manager
    process and it is running in dcmanager-audit process.

    This update adds associated puppet config.

    Story: 2007267
    Task: 39640
    Depends-On: https://review.opendev.org/#/c/725627/

    Change-Id: Idd2e675126a01d6113597646ddd9eb4a0bc5be44
    Signed-off-by: Tao Liu <email address hidden>

commit b793518f65ae932f3974ff85b797f505b5ef1c2a
Author: Robert Church <email address hidden>
Date: Wed Apr 29 12:49:04 2020 -0400

    Ensure containerd binds to the loopback interface

    Set the stream_server_address to bind to the loopback interface with a
    value of "127.0.0.1" for IPv4 and "::1" for IPv6.

    Without setting the stream_server_address in config.toml, containerd was
    binding to the OAM interface. Under most situations this resulted in
    containe...


Reviewed: https://review.opendev.org/729812
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=539d476456277c22d0dcbc3cbbc832e623242264
Submitter: Zuul
Branch: f/centos8

commit 320cc40de8518787c2be234d7fdf88ec0a462df2
Author: Don Penney <email address hidden>
Date: Wed May 13 13:06:11 2020 -0400

    Add auto-versioning to starlingx/config packages

    This update makes use of the PKG_GITREVCOUNT variable to auto-version
    the packages in this repo.

    Change-Id: I3a2c8caeb4b4647608978b1f2ccfcf0661508803
    Depends-On: https://review.opendev.org/727837
    Story: 2006166
    Task: 39766
    Signed-off-by: Don Penney <email address hidden>

commit d9f2aea0fb228ed69eb9c9262e29041eedabc15d
Author: Sharath Kumar K <email address hidden>
Date: Wed Apr 22 16:22:22 2020 +0200

    De-branding in starlingx/config: CGCS -> StarlingX

    1. Rename CGCS to StarlingX for .spec files

    Test:
    After the de-brand change, bootimage.iso has been built in the flock
    Layer and installed on the dev machine to validate the changes.

    Please note, doing de-brand changes in batches, this is batch9 changes.

    Story: 2006387
    Task: 39524

    Change-Id: Ia1fe0f2baafb78c974551100f16e6a7d99882f15
    Signed-off-by: Sharath Kumar K <email address hidden>

    De-branding in starlingx/config: CGCS -> StarlingX

    1. Rename CGCS to StarlingX for .spec file
    2. Rename TIS to StarlingX for .service files

    Test:
    After the de-brand change, bootimage.iso has been built in the flock
    Layer and installed on the dev machine to validate the changes.

    Please note, doing de-brand changes in batches, this is batch10 changes.

    Story: 2006387
    Task: 36202

    Change-Id: I404ce0da2621495175ad31489e9ad6f7b0211e26
    Signed-off-by: Sharath Kumar K <email address hidden>

commit d141e954fa6bbf688929ec90d1b6604a97792c43
Author: Teresa Ho <email address hidden>
Date: Tue Mar 31 10:08:57 2020 -0400

    Sysinv extensions for FPGA support

    This update adds cli and restapi to support FPGA device
    programming.

    CLI commands:
    system device-image-apply
    system device-image-create
    system device-image-delete
    system device-image-list
    system device-image-remove
    system device-image-show
    system device-image-state-list
    system device-label-list
    system host-device-image-update
    system host-device-image-update-abort
    system host-device-label-assign
    system host-device-label-list
    system host-device-label-remove

    Story: 2006740
    Task: 39498

    Change-Id: I556c2e7a51b3931b5a66ab27b67f51e3a8aebd9f
    Signed-off-by: Teresa Ho <email address hidden>

commit 491cca42ed854d2cb3ee3646b93c56a4f45f563c
Author: Elena Taivan <email address hidden>
Date: Wed Apr 29 11:25:26 2020 +0000

    Qcow2 conversion to raw can be done using 'image-conversion' filesystem

    1. Conversion filesystem can be added before/after
       stx-openstack is applied
    2. If conversion filesystem is added after stx-openstack
       is applied, changes to stx-openstack will only take effec...

Ghada Khalil (gkhalil) on 2020-06-03
tags: added: stx.distro.openstack
Yang Liu (yliu12) wrote :

Verified with the new procedure on wcp69-70 using the 2020-06-27_18-35-20 load.

tags: removed: stx.retestneeded