OpenStack image upload failed with evicted pods

Bug #1943674 reported by OpenInfra
This bug affects 1 person
Affects     Status         Importance   Assigned to   Milestone
StarlingX   Fix Released   Medium       Unassigned

Bug Description

On a freshly installed StarlingX release 5 system (standard configuration with dedicated storage), I tried to upload a 41 GB qcow2 image, but the upload failed [1]. I noticed that some of the pods had been evicted [2].

The Ceph cluster has enough free space and its health is OK [3].

I can upload smaller files without any issue.

[1] https://paste.opendev.org/show/809313/
[2] https://paste.opendev.org/show/809312/
[3] https://paste.opendev.org/show/809314/

OpenInfra (openinfra) wrote :

While working on this issue, I increased the docker (150 GB) and kubelet (25 GB) filesystem sizes on both controllers. I also increased the Ceph monitor size [0].
Kubelet disk usage on controller-0 [1] and controller-1 [2].
Disk usage on controller-0 [3] and controller-1 [4].

A couple of pods failed with the following error [5][6], and a few failed with DiskPressure [7][8][9]:
The node was low on resource: ephemeral-storage. Container horizon was using 19361853, which exceeds its request of 0.
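
To make it easier to spot which container triggered an eviction, the message can be split into its parts. The following is a minimal Python sketch, assuming the message wording is exactly as quoted in this comment; the pattern and the function name parse_eviction are illustrative and not part of StarlingX or Kubernetes.

```python
import re

# Pattern for the eviction message quoted in this report. The wording is
# copied from the report and may differ between Kubernetes versions.
EVICTION_RE = re.compile(
    r"The node was low on resource: (?P<resource>[\w-]+)\. "
    r"Container (?P<container>[\w.-]+) was using (?P<usage>\d+)(?P<unit>[A-Za-z]*), "
    r"which exceeds its request of (?P<request>\d+)[A-Za-z]*\."
)


def parse_eviction(message):
    """Return the parsed fields as a dict, or None if the message does not match."""
    match = EVICTION_RE.search(message)
    if match is None:
        return None
    return {
        "resource": match.group("resource"),
        "container": match.group("container"),
        "usage": int(match.group("usage")),
        "unit": match.group("unit"),  # empty when the message carries no unit
        "request": int(match.group("request")),
    }


message = (
    "The node was low on resource: ephemeral-storage. "
    "Container horizon was using 19361853, which exceeds its request of 0."
)
info = parse_eviction(message)
print(info)
```

A request of 0 means the container declared no ephemeral-storage request, so any usage exceeds it once the node comes under disk pressure.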

I was able to upload the image after increasing the kubelet filesystem size to 50 GB.
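
The sizes above suggest a simple headroom check before a large upload. The following is a rough Python sketch under two assumptions that this report does not confirm: that the whole image is buffered on the kubelet filesystem during the upload, and that pods are evicted when free space drops below 10% of the filesystem (the upstream kubelet default for nodefs.available; a StarlingX system may be configured differently). The function names and the /var/lib/kubelet path in the usage note are illustrative.

```python
import shutil

GIB = 1024 ** 3


def upload_fits(image_bytes, fs_total_bytes, fs_free_bytes, eviction_fraction=0.10):
    """Return True if buffering image_bytes leaves free space above the threshold.

    eviction_fraction is an assumed eviction threshold (fraction of the
    filesystem that must stay free), not a value taken from this report.
    """
    free_after = fs_free_bytes - image_bytes
    return free_after > fs_total_bytes * eviction_fraction


def check_path(path, image_bytes, eviction_fraction=0.10):
    """Run the same check against the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return upload_fits(image_bytes, usage.total, usage.free, eviction_fraction)


# Sizes from this report, with the simplifying assumption of an empty filesystem.
print(upload_fits(41 * GIB, 25 * GIB, 25 * GIB))  # 25 GB kubelet filesystem
print(upload_fits(41 * GIB, 50 * GIB, 50 * GIB))  # 50 GB kubelet filesystem
```

On a controller, a call such as check_path("/var/lib/kubelet", 41 * GIB) would apply the check to the live filesystem, assuming that is where the kubelet filesystem is mounted.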

[0] https://paste.opendev.org/show/809328/
[1] https://paste.opendev.org/show/809325/
[2] https://paste.opendev.org/show/809324/
[3] https://paste.opendev.org/show/809326/
[4] https://paste.opendev.org/show/809323/

[5] https://paste.opendev.org/show/809320/
[6] https://paste.opendev.org/show/809319/
[7] https://paste.opendev.org/show/809318/
[8] https://paste.opendev.org/show/809321/
[9] https://paste.opendev.org/show/809322/

OpenInfra (openinfra) wrote :

###
### StarlingX
### Release 21.05
###

OS="centos"
SW_VERSION="21.05"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="r/stx.5.0"

JOB="STX_5.0_build_layer_flock"
<email address hidden>"
BUILD_NUMBER="37"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2021-05-21 23:03:55 +0000"

FLOCK_OS="centos"
FLOCK_JOB="STX_5.0_build_layer_flock"
<email address hidden>"
FLOCK_BUILD_NUMBER="37"
FLOCK_BUILD_HOST="starlingx_mirror"
FLOCK_BUILD_DATE="2021-05-21 23:03:55 +0000"

DISTRO_OS="centos"
DISTRO_JOB="STX_5.0_build_layer_distro"
<email address hidden>"
DISTRO_BUILD_NUMBER="35"
DISTRO_BUILD_HOST="starlingx_mirror"
DISTRO_BUILD_DATE="2021-05-18 23:02:22 +0000"

COMPILER_OS="centos"
COMPILER_JOB="STX_5.0_build_layer_compiler"
<email address hidden>"
COMPILER_BUILD_NUMBER="35"
COMPILER_BUILD_HOST="starlingx_mirror"
COMPILER_BUILD_DATE="2021-05-14 19:53:00 +0000"

OpenInfra (openinfra) wrote :

It would be good if a proper warning or error were displayed on standard error or in the GUI when the upload fails.

This should at least be reflected in the StarlingX documentation by advising users to increase the size of the kubelet filesystem.

For example:
"Modify the size of the docker_lv filesystem. By default, the size of the docker_lv filesystem is 30G, which is not enough for stx-openstack installation. Use the host-fs-modify CLI to increase the filesystem size."

https://docs.starlingx.io/deploy_install_guides/r5_release/openstack/install.html
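
As an illustration of what such a documentation note could include, here is a small Python sketch that only builds the host-fs-modify command lines for both controllers without running them. The argument form (host name followed by name=size in GiB) and the filesystem name kubelet are assumptions based on the docker_lv example quoted above; check them against `system help host-fs-modify` on the target release. The helper name host_fs_modify_cmd is hypothetical.

```python
import shlex


def host_fs_modify_cmd(host, fs_name, size_gib):
    """Build (but do not run) a host-fs-modify command line.

    The argument layout is an assumption; verify it on the target system.
    """
    if size_gib <= 0:
        raise ValueError("size_gib must be positive")
    return ["system", "host-fs-modify", host, f"{fs_name}={size_gib}"]


# 50 GB is the kubelet size that allowed the upload in this report.
commands = [
    host_fs_modify_cmd(host, "kubelet", 50)
    for host in ("controller-0", "controller-1")
]
for cmd in commands:
    print(shlex.join(cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; an operator can review the output and run it on the active controller.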

Ghada Khalil (gkhalil)
tags: added: stx.5.0
Changed in starlingx:
status: New → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium