Download images fails when there are many images
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | Medium | Boovan Rajendran |
Bug Description
Brief Description:
This can happen any time the download step is used, but it is most likely during restore, when a large number of images are downloaded.
During an optimized restore, all images are re-downloaded.
The containerd cache is not cleared during this download phase, so the cache can fill up before all the images are downloaded, which causes the download task to fail.
Severity:
Critical
Steps to Reproduce:
Deploy a system
Increase size of docker-distribution lv so it's larger than docker lv
e.g. system controllerfs-modify docker-
Push images to registry.local until their total size is much larger than the docker lv
Backup
Optimized restore
Expected Behavior:
Restore works
Actual Behavior:
Restore fails because there is not enough space to download the images
Reproducibility:
100%
System Configuration
AIO-SX and system controllers
Last Pass
N/A
Ansible:
TASK [common/
Tuesday 12 December 2023 21:05:49 +0000 (0:00:00.064) 0:12:38.552 ******
FAILED - RETRYING: Download images and push to local registry (10 retries left).
FAILED - RETRYING: Download images and push to local registry (9 retries left).
FAILED - RETRYING: Download images and push to local registry (8 retries left).
FAILED - RETRYING: Download images and push to local registry (7 retries left).
Containerd.log:
2023-12-
2023-12-
2023-12-
2023-12-
2023-12-
2023-12-
2023-12-
df -h
sysadmin@
Filesystem Size Used Avail Use% Mounted on
none 7.6G 0 7.6G 0% /dev
tmpfs 7.7G 3.9M 7.7G 1% /run
/dev/mapper/
/dev/sda4 2.0G 205M 1.6G 12% /boot
tmpfs 7.7G 312K 7.7G 1% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 4.0M 0 4.0M 0% /sys/fs/cgroup
tmpfs 1.0G 196K 1.0G 1% /tmp
/dev/mapper/
/dev/sda3 300M 14M 287M 5% /boot/efi
/dev/mapper/
/dev/sda2 29G 26G 2.2G 93% /var/rootdirs/opt/platform-backup
/dev/mapper/
/dev/mapper/
cratch
/dev/mapper/
pt/backups
/dev/mapper/
t
/dev/drbd0 20G 126M 19G 1% /var/lib/postgresql
/dev/drbd1 2.0G 384M 1.5G 21% /var/lib/rabbitmq
/dev/drbd2 9.8G 2.0M 9.3G 1% /var/rootdirs/opt/platform
/dev/drbd5 990M 24K 923M 1% /var/rootdirs/opt/extension
/dev/drbd7 4.9G 28K 4.6G 1% /var/rootdirs/opt/etcd
/dev/drbd8 40G 17G 21G 45% /var/lib/docker-distribution
sudo du -hd1 /var/lib/docker/
sysadmin@
24K /var/lib/
0 /var/lib/
0 /var/lib/
0 /var/lib/
4.0K /var/lib/
24K /var/lib/
0 /var/lib/
28K /var/lib/
0 /var/lib/
72K /var/lib/
0 /var/lib/docker/tmp
0 /var/lib/
0 /var/lib/
7.7G /var/lib/
0 /var/lib/
0 /var/lib/
22G /var/lib/
7.3M /var/lib/
0 /var/lib/
0 /var/lib/
30G /var/lib/docker/
Alarms
N/A
Test Activity:
Developer Testing
Workaround:
While ansible is running the step "Download images and push to local registry", execute the following. Do not let the docker-lv become full:
sudo bash -c -- 'while true; do sleep 60 && crictl rmi --prune; done'
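A variation on this workaround is to prune only when the filesystem is close to full, rather than unconditionally every minute. The following is a minimal sketch under stated assumptions: the function names, the 80% default threshold, and the choice of `/var/lib/docker` as the path to watch are illustrative and not taken from the fix. Only `crictl rmi --prune` comes from the workaround above.

```shell
# Sketch: prune unused containerd images only when a filesystem is nearly full.
# Function names and the default threshold are hypothetical.

# Print the Use% value (digits only) of the filesystem that holds path $1.
usage_pct() {
    df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# Succeed when the usage in $1 is at or above the threshold in $2.
should_prune() {
    [ "$1" -ge "$2" ]
}

# Run the prune from the workaround if path $1 is at or above $2 percent full.
prune_if_needed() {
    local path="$1"
    local threshold="${2:-80}"
    if should_prune "$(usage_pct "$path")" "$threshold"; then
        crictl rmi --prune
    fi
}

# Example loop (run as root while the download step is in progress):
#   while true; do prune_if_needed /var/lib/docker 80; sleep 60; done
```

Checking usage first avoids repeated prune calls while there is still plenty of room, at the cost of choosing a threshold low enough that a single large image cannot fill the remaining space between checks.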
Changed in starlingx:
status: New → In Progress
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.10.0 stx.config
Changed in starlingx:
assignee: nobody → Boovan Rajendran (brajendr)
Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/906304
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/d81436d34eb867a16788b66393bb3783b478e581
Submitter: "Zuul (22348)"
Branch: master
commit d81436d34eb867a16788b66393bb3783b478e581
Author: Boovan Rajendran <email address hidden>
Date: Mon Jan 22 11:43:03 2024 -0500
Exclude unwanted cached images to download during optimized restore
During optimized restore operation download images step is
failing since there is not enough space in the crictl image cache
to store all of the images.
While performing optimized B&R, we are taking a list of crictl cache
images as a backup during backup operation. While restore download
the images that are present in the backup cached image list and
exclude the other images.
we need list of k8s control plane images to satisfy the below scenarios.
- crictl_image_cache_list is not present in backup file during restore.
- crictl image cache was cleared before backup.
* Push an image to registry.local before backup in a way that does not
add it to cache.
```
docker login registry.local:9001 -u admin
docker image pull busybox
docker tag busybox:latest registry.local:9001/docker.io/busybox:latest
docker push registry.local:9001/docker.io/busybox:latest
```
* Check the pushed image is not present in crictl image cache
after optimized restore.
```
crictl images
```
* Check the pushed image is present in registry.local
after optimized restore.
```
source /etc/platform/openrc
system registry-image-list
```
Test plan:
PASS: Perform optimized B&R on AIO-SX, verify unwanted cached images
deleted successfully after restore.
PASS: Perform optimized B&R on AIO-SX, verify that custom images are
in registry.local after restore.
PASS: Tested by creating and installing an iso as AIO-SX.
PASS: Tested by performing multiple k8s upgrade from 1.24 to 1.27.
PASS: Tested by performing unoptimized B&R on AIO-SX.
PASS: Tested by performing platform upgrade.
PASS: Tested by installing DC system.
PASS: Tested by performing Subcloud prestage.
Closes-Bug: 2051005
Change-Id: Iece7229a6c0089c99be6905d7d3b9e053c45d385
Signed-off-by: Boovan Rajendran <email address hidden>
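The image selection described in the commit message can be sketched roughly as follows. This is one reading of the message, not the code from the change itself: the function name `select_images` and the idea of passing both lists as plain text files (one image per line) are assumptions made for illustration. The sketch restores only the images recorded in the backed-up cache list, and falls back to the Kubernetes control plane images when that list is missing or empty.

```shell
# Sketch of the selection logic; names and file layout are hypothetical.
# Usage: select_images CACHE_LIST_FILE FALLBACK_FILE
# Prints the images to download, one per line, de-duplicated and sorted.
select_images() {
    cache_list="$1"
    fallback_list="$2"
    if [ -s "$cache_list" ]; then
        # The backup holds a non-empty cached image list:
        # download only those images and skip everything else.
        sort -u "$cache_list"
    else
        # The list is absent from the backup, or the cache was cleared
        # before backup: use the k8s control plane images instead.
        sort -u "$fallback_list"
    fi
}
```

With this shape, images that were pushed to registry.local but never cached are not pulled into the crictl cache during restore, which matches the busybox check in the test steps above.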