registry rate-limiting: `cluster bootstrap` fails - timeout reached

Bug #2033116 reported by Peter Matulis
This bug affects 3 people
Affects: OpenStack Snap
Status: Triaged
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

I just hit this, and I've seen several reports in the wild about the same problem. I'm deploying a single-node scenario and my machine has plenty of resources (Icarus metal: 'node-mees') yet I'm encountering a hardcoded timeout.

$ sudo snap install --channel 3.2/stable juju
$ sudo snap install --channel 2023.1/edge openstack
$ sunbeam prepare-node-script | bash -x && newgrp snap_daemon
$ sunbeam cluster bootstrap --accept-defaults

⠇ Deploying OpenStack Control Plane to Kubernetes (this may take a while) ... waiting for services to come online (0/29)Timed out while waiting for model 'openstack' to be ready
Error: Timed out while waiting for model 'openstack' to be ready
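When the bootstrap times out like this, the following commands should show whether the control-plane pods or Juju units are the ones that are stuck. The model name 'openstack' comes from the error above; the Kubernetes namespace of the same name is an assumption and may vary by release.

$ juju models                      # list the models sunbeam created
$ juju status -m openstack         # unit/application status in the timed-out model
$ sudo microk8s.kubectl get pods -n openstack                                 # pod health on the MicroK8s side
$ sudo microk8s.kubectl get events -n openstack --sort-by=.lastTimestamp      # recent scheduling/pull events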

Tags: sts
Revision history for this message
Peter Matulis (petermatulis) wrote :
Revision history for this message
James Page (james-page) wrote :

As we discussed, I think you're probably hitting the Docker Hub rate limits (I use the same lab and hit this sometimes as well).

This is a general issue with where MicroK8s and Juju source container images from - we're discussing with those teams to see if we can improve the situation.
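To check whether you are actually being rate-limited, Docker Hub reports the current anonymous-pull limit in response headers; roughly (assumes curl and jq are installed):

$ TOKEN=$(curl -s "https://auth.docker.io/token?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull" | jq -r .token)
$ curl -sI -H "Authorization: Bearer $TOKEN" "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest" | grep -i ratelimit
# ratelimit-limit / ratelimit-remaining show the current quota; a HEAD request does not count against it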

Revision history for this message
Pedro Victor Lourenço Fragola (pedrovlf) wrote :

I encountered the same issue in my tests, and what I observed is that all the pods are healthy and properly created on the MicroK8s side.

Upon further investigation, it appears that when the VM meets the basic requirements outlined in the documentation[0] but is backed by an HDD, timeouts tend to occur frequently. When the same test is run with the VM hosted on an SSD and with more CPU resources, the timeouts no longer occur.

During the testing with the HDD, I noticed high iowait, ranging from 30-50%, and a significant number of CPU spikes. Ideally, we should explore options to extend the timeout for the bootstrap process.

[0] https://microstack.run/docs/single-node-guided
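The iowait figures above can be reproduced with standard tools while the bootstrap is running, e.g.:

$ iostat -xz 5    # %iowait and per-device %util (from the sysstat package)
$ vmstat 5        # the 'wa' column is CPU time spent waiting on I/O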

tags: added: sts
Revision history for this message
James Page (james-page) wrote :

Ultimately this would be resolved by a) components not using registries with low thresholds for rate limiting or b) using caching OCI registries to reduce hits on public registries.

We don't currently have a way to do b) but it's planned for development over the coming release cycle.
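Until b) lands, a manual approximation is to point MicroK8s' containerd at a local pull-through cache via a registry mirror. A rough sketch, assuming a caching registry is already running at 10.0.0.2:5000 (a hypothetical address):

$ sudo mkdir -p /var/snap/microk8s/current/args/certs.d/docker.io
$ sudo tee /var/snap/microk8s/current/args/certs.d/docker.io/hosts.toml <<'EOF'
# Route docker.io pulls through a local caching registry first.
server = "https://registry-1.docker.io"

[host."http://10.0.0.2:5000"]
capabilities = ["pull", "resolve"]
EOF
$ sudo snap restart microk8s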

summary: - `cluster bootstrap` fails - timeout reached
+ registry rate-limiting: `cluster bootstrap` fails - timeout reached
Changed in snap-openstack:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Dan Emmons (dan-emmons) wrote (last edit):

I'm consistently running into this issue on a VM, and what I'm seeing matches pedrovlf's comment: the underlying storage I have available is not SSD, and iowait is consistently high during the process.

I tested increasing the timeout, with the idea that the parameter could be made configurable, but the results did not make that look viable. I stopped the snap providing sunbeam, extracted the snap contents, edited openstack.py so that the timeout was high enough to be functionally removed (and added a confirmation message so I could verify that this version was running), re-squashed the contents into a new snap file, replaced the original with it, and restarted the services. The result was that it continued to run overnight, but even after more than 12 hours it was only at:

waiting for services to come online (15/29)

This was where it had stalled for most of the run time, and I don't think it would have completed. Since the guide does say an SSD is required, I suppose this isn't a bug, but I can confirm that the most obvious workaround (simply waiting longer) is probably not a solution. This may of course be a separate issue from the Docker Hub rate limits.
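For anyone wanting to repeat that experiment, the generic repack steps look roughly like this (not the exact commands I ran; the path of openstack.py inside the snap has to be located first, e.g. with grep):

$ snap download openstack --channel 2023.1/edge
$ unsquashfs openstack_*.snap
$ grep -rl "Timed out while waiting for model" squashfs-root    # find the file with the timeout, then edit it
$ snap pack squashfs-root --filename openstack-patched.snap
$ sudo snap install --dangerous ./openstack-patched.snap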

UPDATE:
This issue was resolved for me by switching the VM that sunbeam runs inside from GNOME Boxes with user-level QEMU/KVM to virt-manager with system-level QEMU/KVM.
