cluster bootstrap fails after 10 minutes: juju.errors.JujuAPIError: watcher was stopped

Bug #2039126 reported by Andrea Ieri
This bug affects 3 people
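| Affects        | Status      | Importance | Assigned to | Milestone |
|----------------|-------------|------------|-------------|-----------|
| Canonical Juju | Won't Fix   | Undecided  | Unassigned  |           |
| OpenStack Snap | In Progress | High       | Unassigned  |           |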

Bug Description

Today I tried going through the sunbeam quickstart again on a VM on my local machine, and it failed after 10 minutes with `juju.errors.JujuAPIError: watcher was stopped`.

I am attaching the Sunbeam debug log and the output of `juju status` in JSON format.

The instance has 8 cores and 24 GB of RAM, and I can confirm that no processes were killed by the OOM killer.

I am keeping the instance, so if further debug data would help, let me know.

Tags: cdo-qa
Andrea Ieri (aieri) wrote :
Andrea Ieri (aieri) wrote :
Marian Gasparovic (marosg) wrote :
tags: added: cdo-qa
Changed in snap-openstack:
assignee: nobody → Guillaume Boutry (gboutry)
importance: Undecided → High
status: New → In Progress
James Page (james-page) wrote :

Adding a bug task for Juju: we switched the openstack snap to Juju 3.1 in the edge channel. On machines with many configured network interfaces, the controller bootstraps, but the controller application then stalls in the waiting state.

Charms deploy OK, but their units then sit in the waiting state as well.

The same machine works fine with Juju 3.2.

Changed in snap-openstack:
assignee: Guillaume Boutry (gboutry) → nobody
Joseph Phillips (manadart) wrote :

As described, Juju cannot come up on manually provisioned machines that have multiple local-cloud scoped IP addresses.
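To check whether a machine falls into this category before bootstrapping, the Python sketch below counts the cloud-local addresses reported by iproute2's JSON output (`ip -j addr show`). It assumes that "local-cloud scope" corresponds to the RFC 1918 private IPv4 ranges plus IPv6 unique local addresses (fc00::/7). That mapping is an approximation for illustration, not taken from Juju's source, and all helper names are hypothetical.

```python
import ipaddress
import json
import subprocess

# Assumption: treat RFC 1918 private IPv4 ranges and IPv6 unique local
# addresses as "cloud-local". This approximates Juju's local-cloud scope
# for illustration only; it is not copied from Juju.
CLOUD_LOCAL_NETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("fc00::/7"),
]


def is_cloud_local(addr: str) -> bool:
    """True if `addr` falls in one of the assumed cloud-local ranges."""
    ip = ipaddress.ip_address(addr)
    # An IPv4 address tested against an IPv6 network (or vice versa) is simply False.
    return any(ip in net for net in CLOUD_LOCAL_NETS)


def cloud_local_interfaces(ip_json: list) -> dict:
    """Map interface name -> cloud-local addresses from `ip -j addr show` data."""
    found = {}
    for iface in ip_json:
        addrs = [
            info["local"]
            for info in iface.get("addr_info", [])
            if info.get("family") in ("inet", "inet6")
            and "local" in info
            and is_cloud_local(info["local"])
        ]
        if addrs:
            found[iface.get("ifname", "?")] = addrs
    return found


def read_ip_json() -> list:
    """Run iproute2 and parse its JSON output (Linux only)."""
    out = subprocess.run(
        ["ip", "-j", "addr", "show"], capture_output=True, text=True, check=True
    ).stdout
    return json.loads(out)


def may_be_affected(found: dict) -> bool:
    """More than one cloud-local address in total matches the reported trigger."""
    return sum(len(addrs) for addrs in found.values()) > 1
```

Calling `may_be_affected(cloud_local_interfaces(read_ip_json()))` on the intended controller host gives a quick yes/no. For example, a host with a primary NIC on a 192.168.x.x address and a bridge on a 10.x.x.x address would count as two cloud-local addresses.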

This is due to the lease subsystem waiting for the peer-grouper worker to broadcast API addresses.

In turn, this means that the singular controller and model workers never start, and applications cannot elect leader units.

3.2 uses Dqlite for leases and doesn't have the same issue. The peer-grouper worker will still keep logging errors, but the lease system doesn't rely on it and everything comes up as desired.
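The dependency chain described above can be illustrated with a toy model. Everything here is hypothetical: `PeerGrouper`, `lease_manager_ready`, `singular_workers_start`, and the rule that the grouper only publishes when it sees exactly one cloud-local address are simplifications for illustration, not Juju's actual worker code. In the 3.1-style path the lease manager blocks on the grouper's broadcast and times out; in the 3.2-style path it does not wait, so the grouper's errors no longer prevent leases from coming up.

```python
import threading


class PeerGrouper:
    """Toy stand-in for the peer-grouper worker (hypothetical, not Juju code)."""

    def __init__(self, cloud_local_addrs):
        self.addrs = list(cloud_local_addrs)
        self.api_addresses_published = threading.Event()
        self.errors = []

    def run_once(self):
        # Simplifying assumption: the grouper can only publish API addresses
        # when exactly one cloud-local address is available to choose from.
        if len(self.addrs) == 1:
            self.api_addresses_published.set()
        else:
            self.errors.append(f"cannot choose among {len(self.addrs)} addresses")


def lease_manager_ready(grouper, waits_for_peergrouper, timeout=0.1):
    """True if the toy lease manager comes up within `timeout` seconds."""
    if not waits_for_peergrouper:
        # 3.2-style: leases live in Dqlite, independent of the grouper.
        return True
    # 3.1-style: block until the grouper broadcasts API addresses.
    return grouper.api_addresses_published.wait(timeout)


def singular_workers_start(addrs, juju_version):
    """Singular workers need leases, so they start only if leases are ready."""
    grouper = PeerGrouper(addrs)
    grouper.run_once()
    waits = juju_version < (3, 2)
    return lease_manager_ready(grouper, waits), grouper.errors
```

With two cloud-local addresses, `singular_workers_start(["10.0.0.5", "192.168.1.5"], (3, 1))` returns `(False, [...])`, whereas the same addresses with `(3, 2)` return `(True, [...])`. In both cases the grouper error is still recorded, mirroring the observation that the peer-grouper keeps logging errors on 3.2.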

We do not intend to fix this in the earlier tracks (those before 3.2).

A workaround is to use single-NIC machines for controllers, or to disable all NICs but one before bootstrap and re-enable them afterwards.
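On Linux, one way to apply the second workaround is iproute2's `ip link set dev <name> down` before bootstrap and `ip link set dev <name> up` afterwards. The sketch below only builds the command lists so they can be reviewed before anything is run. The function name and the interface names in the example are hypothetical; links taken down this way are not persistently disabled and may be brought back up by the host's network manager, and taking down the interface carrying your SSH session will cut you off, so treat this as a starting point rather than a tested procedure.

```python
import shlex


def workaround_commands(interfaces, keep):
    """Return (disable, restore) command lists that leave only `keep` up.

    `interfaces` is a list of interface names (for example taken from
    `ip -j link show`); the loopback interface "lo" is never touched.
    """
    if keep not in interfaces:
        raise ValueError(f"{keep!r} is not among the listed interfaces")
    others = [name for name in interfaces if name not in (keep, "lo")]
    # Quote names so unusual interface names cannot break the shell command.
    disable = [f"sudo ip link set dev {shlex.quote(n)} down" for n in others]
    restore = [f"sudo ip link set dev {shlex.quote(n)} up" for n in others]
    return disable, restore
```

For example, `workaround_commands(["lo", "enp1s0", "enp2s0"], keep="enp1s0")` yields `["sudo ip link set dev enp2s0 down"]` to run before bootstrap and `["sudo ip link set dev enp2s0 up"]` to run once the controller is up.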

Changed in juju:
status: New → Won't Fix