Image conversion fails with large qcow2 guest image due to insufficient filesystem size

Bug #1819688 reported by Yang Liu on 2019-03-12
This bug affects 1 person
Affects: StarlingX
Importance: Medium
Assigned to: Kristine Bujold

Bug Description

Brief Description
-----------------
Attempted to create a Windows glance image. The attempt failed after ~15 minutes, after which no pods were running and every kubectl command returned the following error:
The connection to the server 192.168.205.2:6443 was refused - did you specify the right host or port?

Severity
--------
Major

Steps to Reproduce
------------------
1. Copy a Windows qcow2 image to the active controller
2. Attempt to create a glance image from that qcow2 image file:
glance --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne image-create --property os_type=windows --container-format bare --file /home/wrsroot/images/win2016.qcow2 --name win_2016 --visibility public --disk-format qcow2

Expected Behavior
------------------
- glance image created successfully
- system is functional after that

Actual Behavior
----------------
- glance image creation failed after 17 minutes with the following error
[2019-03-11 20:42:37,816] 387 DEBUG MainThread ssh.expect :: Output:
+------------------+--------------------------------------+
| Property | Value |
+------------------+--------------------------------------+
| checksum | None |
| container_format | bare |
| created_at | 2019-03-11T20:25:18Z |
| disk_format | qcow2 |
| id | 93c65e41-08a7-48c5-b95d-024a3bc0448e |
| min_disk | 0 |
| min_ram | 0 |
| name | win_2016 |
| os_hash_algo | None |
| os_hash_value | None |
| os_hidden | False |
| os_type | windows |
| owner | 9c3337f920d44cb5af0c157af3eb5909 |
| protected | False |
| size | None |
| status | queued |
| tags | [] |
| updated_at | 2019-03-11T20:25:18Z |
| virtual_size | Not available |
| visibility | public |
+------------------+--------------------------------------+
Error finding address for http://glance-api.openstack.svc.cluster.local:9292/v2/images/93c65e41-08a7-48c5-b95d-024a3bc0448e/file: Unable to establish connection to http://glance-api.openstack.svc.cluster.local:9292/v2/images/93c65e41-08a7-48c5-b95d-024a3bc0448e/file: [Errno 110] Connection timed out

- kubectl commands have been rejected since the failure
controller-1:~$ kubectl get pods
The connection to the server 192.168.205.2:6443 was refused - did you specify the right host or port?

- sudo docker ps returns nothing
controller-1:~$ sudo docker ps
Password:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

Reproducibility
---------------
Reproducible (Saw this issue 2/2 times on regular systems)

System Configuration
--------------------
Two node system, Multi-node system

Branch/Pull Time/Commit
-----------------------
master as of 2019-03-05

Timestamp/Logs
--------------
[2019-03-11 20:25:17,194] 262 DEBUG MainThread ssh.send :: Send 'glance --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne image-create --property os_type=windows --container-format bare --file /home/wrsroot//images/win2016.qcow2 --name win_2016 --visibility public --disk-format qcow2'

Ghada Khalil (gkhalil) on 2019-03-14
tags: added: stx.containers
Ghada Khalil (gkhalil) wrote :

Marking as release gating; related to containers.
Assigning to Brent as this needs an architectural decision. This is likely an issue for any large images.

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Brent Rowsell (brent-rowsell)
tags: added: stx.2019.05
Ken Young (kenyis) on 2019-04-05
tags: added: stx.2.0
removed: stx.2019.05
Ghada Khalil (gkhalil) on 2019-04-09
tags: added: stx.retestneeded
Frank Miller (sensfan22) wrote :

This issue occurred because the qcow2 image needs to be converted to raw. The conversion is done by the glance/cinder containers, which use the docker_lv filesystem for it. In the past couple of months the docker_lv filesystem was increased to 30G, so this specific TC will now pass. However, larger glance images still won't work, and because the conversion uses the docker_lv filesystem, the I/O performance of all containers is impacted while it runs.

The proper solution for this issue is to support creation of a new (optional) filesystem that would be used for image conversion. As this is more of an enhancement, Brent (TL containers) and Frank (PL for containers) agree this LP should be re-gated to stx.3.0.
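
For illustration only, the space constraint described in this comment could be checked before an upload is attempted. The sketch below is not part of StarlingX. It assumes `qemu-img` is installed on the host, and the helper names (`parse_virtual_size`, `conversion_fits`, `check_image`) and the 10% headroom are hypothetical. It treats the image's virtual size as the worst-case size of the raw copy and compares that with the free space in whatever directory the conversion would use.

```python
import json
import shutil
import subprocess


def parse_virtual_size(qemu_img_json: str) -> int:
    """Return the virtual disk size in bytes from `qemu-img info --output=json` text."""
    return int(json.loads(qemu_img_json)["virtual-size"])


def conversion_fits(virtual_size: int, free_bytes: int, headroom: float = 0.10) -> bool:
    """True if a raw copy of `virtual_size` bytes fits in `free_bytes`
    while leaving `headroom` (a fraction of the free space) unused.

    A raw file can be sparse and occupy less than the virtual size,
    so this is a conservative worst-case check.
    """
    return virtual_size <= free_bytes * (1.0 - headroom)


def check_image(image_path: str, scratch_dir: str) -> bool:
    """Query the image with qemu-img and compare it with the free space in scratch_dir."""
    out = subprocess.run(
        ["qemu-img", "info", "--output=json", image_path],
        check=True, capture_output=True, text=True,
    ).stdout
    free = shutil.disk_usage(scratch_dir).free
    return conversion_fits(parse_virtual_size(out), free)
```

With the 30G docker_lv filesystem mentioned above, a hypothetical image with a 40 GiB virtual size would fail this check even if the filesystem were completely empty, which matches the concern that larger images still will not convert.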

tags: added: stx.3.0
removed: stx.2.0
Changed in starlingx:
assignee: Brent Rowsell (brent-rowsell) → Kristine Bujold (kbujold)
Frank Miller (sensfan22) wrote :

Assigning to Kristine to implement in stx.3.0 as Kristine has experience with filesystems for StarlingX/containers.

Yang Liu (yliu12) wrote :

Updated the title to reflect the more generic issue.

summary: - No pods are running after attempt to create windows glance image
+ Image conversion fails with large qcow2 guest image due to insufficient
+ filesystem size
tags: added: stx.regression
Frank Miller (sensfan22) wrote :

This is more of an enhancement request than a bug. As such, it will not gate stx.3.0. Re-tagging to stx.4.0.

tags: added: stx.4.0
removed: stx.3.0