registry rate-limiting: `cluster bootstrap` fails - timeout reached

Bug #2033116 reported by Peter Matulis
This bug affects 3 people
Affects: OpenStack Snap
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

I just hit this, and I've seen several reports in the wild about the same problem. I'm deploying a single-node scenario and my machine has plenty of resources (Icarus metal: 'node-mees') yet I'm encountering a hardcoded timeout.

$ sudo snap install --channel 3.2/stable juju
$ sudo snap install --channel 2023.1/edge openstack
$ sunbeam prepare-node-script | bash -x && newgrp snap_daemon
$ sunbeam cluster bootstrap --accept-defaults

⠇ Deploying OpenStack Control Plane to Kubernetes (this may take a while) ... waiting for services to come online (0/29)Timed out while waiting for model 'openstack' to be ready
Error: Timed out while waiting for model 'openstack' to be ready
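When the bootstrap times out like this, the following commands should show whether the control-plane pods or Juju units are the ones that are stuck. The model name 'openstack' comes from the error above; the Kubernetes namespace of the same name is an assumption and may vary by release.

$ juju models                      # list the models sunbeam created
$ juju status -m openstack         # unit/application status in the timed-out model
$ sudo microk8s.kubectl get pods -n openstack                                 # pod health on the MicroK8s side
$ sudo microk8s.kubectl get events -n openstack --sort-by=.lastTimestamp      # recent scheduling/pull events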

Tags: sts
Revision history for this message
Peter Matulis (petermatulis) wrote :
Revision history for this message
James Page (james-page) wrote :

As we discussed, I think you're probably hitting the Docker Hub rate limits (I use the same lab and hit this sometimes as well).

This is a general issue with where MicroK8s and Juju source container images from - we're discussing with those teams to see if we can improve the situation.
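To check whether you are actually being rate-limited, Docker Hub reports the current anonymous-pull limit in response headers; roughly (assumes curl and jq are installed):

$ TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
$ curl -sI -H "Authorization: Bearer $TOKEN" "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" | grep -i ratelimit
# ratelimit-limit / ratelimit-remaining show the current quota; a HEAD request does not count against it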

Revision history for this message
Pedro Victor Lourenço Fragola (pedrovlf) wrote :

I encountered the same issue in my tests, and what I observed is that all the pods are healthy and properly created on the MicroK8s side.

Upon further investigation, it appears that when the VM meets the basic requirements outlined in the documentation[0] but is backed by an HDD, timeouts tend to occur frequently. When the same test is run with the VM hosted on an SSD and with more CPU resources, the timeouts no longer occur.

During the testing with the HDD, I noticed high iowait, ranging from 30-50%, and a significant number of CPU spikes. Ideally, we should explore options to extend the timeout for the bootstrap process.

[0] https://microstack.run/docs/single-node-guided
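The iowait figures above can be reproduced with standard tools while the bootstrap is running, e.g.:

$ iostat -xz 5    # %iowait and per-device %util (from the sysstat package)
$ vmstat 5        # the 'wa' column is CPU time spent waiting on I/O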

tags: added: sts
Revision history for this message
James Page (james-page) wrote :

Ultimately this would be resolved by a) components not using registries with low thresholds for rate limiting or b) using caching OCI registries to reduce hits on public registries.

We don't currently have a way to do b) but it's planned for development over the coming release cycle.
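Until b) lands, a manual approximation is to point MicroK8s' containerd at a local pull-through cache via a registry mirror. A rough sketch, assuming a caching registry is already running at 10.0.0.2:5000 (a hypothetical address):

$ sudo mkdir -p /var/snap/microk8s/current/args/certs.d/docker.io
$ sudo tee /var/snap/microk8s/current/args/certs.d/docker.io/hosts.toml <<'EOF'
# Route docker.io pulls through a local caching registry first.
server = "https://registry-1.docker.io"

[host."http://10.0.0.2:5000"]
capabilities = ["pull", "resolve"]
EOF
$ sudo snap restart microk8s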

summary: - `cluster bootstrap` fails - timeout reached
+ registry rate-limiting: `cluster bootstrap` fails - timeout reached
Changed in snap-openstack:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Dan Emmons (dan-emmons) wrote (last edit):

I'm consistently running into this issue on a VM, and what I'm seeing matches pedrovlf's comment: the underlying storage I have available is not SSD, and iowait is consistently high during the process.

I tested increasing the timeout, with the idea that the parameter could be made configurable, but the results did not make that look viable. I stopped the snap providing sunbeam, extracted the snap contents, edited openstack.py so that the timeout was high enough to be functionally removed (and added a confirmation message so I could verify that this version was running), re-squashed the contents into a new snap file, replaced the original with it, and restarted the services. The result was that it continued to run overnight, but even after more than 12 hours it was only at:

waiting for services to come online (15/29)

This was where it had stalled for most of the run time, and I don't think it would have completed. Since the guide does say an SSD is required, I suppose this isn't a bug, but I can confirm that the most obvious workaround (simply waiting longer) is probably not a solution. This may of course be a separate issue from the Docker Hub rate limits.
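For anyone wanting to repeat that experiment, the generic repack steps look roughly like this (not the exact commands I ran; the path of openstack.py inside the snap has to be located first, e.g. with grep):

$ snap download openstack --channel 2023.1/edge
$ unsquashfs openstack_*.snap
$ grep -rl "Timed out while waiting for model" squashfs-root    # find the file with the timeout, then edit it
$ snap pack squashfs-root --filename openstack-patched.snap
$ sudo snap install --dangerous ./openstack-patched.snap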

UPDATE:
This issue was resolved for me by switching the VM that sunbeam runs inside from GNOME Boxes with user-level QEMU/KVM to virt-manager with system-level QEMU/KVM.
