Failed to redeploy MAAS-based cluster after missing role is added

Bug #2060803 reported by Hemanth Nakkina
Affects: OpenStack Snap (Status: New, Importance: Undecided, Assigned to: Unassigned)

Bug Description

Deployed sunbeam with the MAAS provider.

1. sunbeam cluster bootstrap --> successful
2. sunbeam cluster deploy --> failed because the node had no storage role.

$ sunbeam cluster deploy
Deployments needs at least one of each role to work correctly:
        control: 1
        compute: 1
        storage: 0

3. Added the role storage to the machine.
4. sunbeam cluster deploy --> hung forever

Sunbeam tried to acquire a new node from MAAS.

juju models
Controller: sb2-controller

Model               Cloud/Region  Type  Status     Machines  Cores  Units  Access  Last connection
controller          sb2/default   maas  available  1         2      2      admin   just now
openstack-machines  sb2/default   maas  available  2         2      -      admin   4 minutes ago
ubuntu@sunbeam:~$ juju status -m openstack-machines
Model               Controller      Cloud/Region  Version  SLA          Timestamp
openstack-machines  sb2-controller  sb2/default   3.4.2    unsupported  11:38:10Z

Machine  State    Address      Inst id      Base          AZ       Message
0        started  10.40.0.206  frank-sloth  ubuntu@22.04  default  Deployed
1        down                  pending      ubuntu@22.04           failed to acquire node: No machine with system ID hdabk4 available.

Expected: sunbeam should pick the existing `machine-0`, or skip deploying a new machine entirely.
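A minimal sketch of the expected behavior: before adding machines to the Juju model, the deploy step could filter out hosts that already map to a deployed machine, so `frank-sloth` (already machine 0) is reused instead of triggering a fresh MAAS allocation. The function name and dict shapes below are hypothetical, not sunbeam's actual API:

```python
def filter_machines_to_add(maas_machines, model_machines):
    """Return only the MAAS machines whose hostname is not already
    present in the Juju model, so already-deployed hosts are reused
    instead of acquiring a new node from MAAS.

    Hypothetical sketch: the list-of-dict shapes are illustrative.
    """
    deployed_hostnames = {m["hostname"] for m in model_machines}
    return [m for m in maas_machines if m["hostname"] not in deployed_hostnames]
```

With `frank-sloth` already present as machine 0, the "Deploy machines" step would then have nothing to add rather than requesting a node MAAS cannot allocate.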

Relevant logs:
11:33:02,411 sunbeam.jobs.common DEBUG Skipping step Add infrastructure model
11:33:02,413 sunbeam.jobs.common DEBUG Starting step 'Add machines'
11:33:03,301 sunbeam.provider.maas.steps DEBUG Machines fetched: [{'system_id': 'hdabk4', 'hostname': 'frank-sloth', 'roles': ['compute', 'storage', 'control'], 'zone': 'default', 'status': 'Deployed', 'root_disk': {'name': 'vda', 'tags': ['rotary', '1rpm'], 'root_partition': {'size': 42941284352}}, 'storage': {'ceph': []}, 'spaces': ['admin-space'], 'nics': [{'id': 39, 'name': 'ens3', 'mac_address': 'fa:16:3e:4a:d9:2c', 'tags': []}], 'cores': 2, 'memory': 4096}, {'system_id': '4wy4dd', 'hostname': 'good-dane', 'roles': ['juju-controller'], 'zone': 'default', 'status': 'Deployed', 'root_disk': {'name': 'vda', 'tags': ['rotary', '1rpm'], 'root_partition': {'size': 42941284352}}, 'storage': {'ceph': []}, 'spaces': ['admin-space'], 'nics': [{'id': 40, 'name': 'ens3', 'mac_address': 'fa:16:3e:11:37:47', 'tags': []}], 'cores': 2, 'memory': 4096}]
11:33:03,301 sunbeam.provider.maas.steps DEBUG Machines containing worker roles: [{'system_id': 'hdabk4', 'hostname': 'frank-sloth', 'roles': ['compute', 'storage', 'control'], 'zone': 'default', 'status': 'Deployed', 'root_disk': {'name': 'vda', 'tags': ['rotary', '1rpm'], 'root_partition': {'size': 42941284352}}, 'storage': {'ceph': []}, 'spaces': ['admin-space'], 'nics': [{'id': 39, 'name': 'ens3', 'mac_address': 'fa:16:3e:4a:d9:2c', 'tags': []}], 'cores': 2, 'memory': 4096}]
11:33:03,301 sunbeam.clusterd.service DEBUG [get] https://10.40.0.205:7000/1.0/nodes, args={'allow_redirects': True}
11:33:03,306 urllib3.connectionpool DEBUG https://10.40.0.205:7000 "GET /1.0/nodes HTTP/1.1" 200 193
11:33:03,307 sunbeam.clusterd.service DEBUG Response(<Response [200]>) = {"type":"sync","status":"Success","status_code":200,"operation":"","error_code":0,"error":"","metadata":[{"name":"frank-sloth","role":["compute","control"],"machineid":0,"systemid":"hdabk4"}]}

11:33:03,307 sunbeam.jobs.common DEBUG Running step Add machines
11:33:03,307 sunbeam.clusterd.service DEBUG [put] https://10.40.0.205:7000/1.0/nodes/frank-sloth, args={'data': '{"role": ["compute", "storage", "control"], "machineid": -1, "systemid": ""}'}
11:33:03,315 urllib3.connectionpool DEBUG https://10.40.0.205:7000 "PUT /1.0/nodes/frank-sloth HTTP/1.1" 200 108
11:33:03,316 sunbeam.clusterd.service DEBUG Response(<Response [200]>) = {"type":"sync","status":"Success","status_code":200,"operation":"","error_code":0,"error":"","metadata":{}}

11:33:03,317 sunbeam.jobs.common DEBUG Finished running step 'Add machines'. Result: ResultType.COMPLETED
11:33:03,320 sunbeam.jobs.common DEBUG Starting step 'Deploy machines'
11:33:03,321 sunbeam.clusterd.service DEBUG [get] https://10.40.0.205:7000/1.0/nodes, args={'allow_redirects': True}
11:33:03,326 urllib3.connectionpool DEBUG https://10.40.0.205:7000 "GET /1.0/nodes HTTP/1.1" 200 203
11:33:03,327 sunbeam.clusterd.service DEBUG Response(<Response [200]>) = {"type":"sync","status":"Success","status_code":200,"operation":"","error_code":0,"error":"","metadata":[{"name":"frank-sloth","role":["compute","control","storage"],"machineid":0,"systemid":"hdabk4"}]}

11:33:03,566 connector DEBUG Connector: closing controller connection
11:33:03,570 sunbeam.jobs.common DEBUG Running step Deploy machines
11:33:03,571 sunbeam.provider.maas.steps DEBUG Adding machine frank-sloth to model openstack-machines
11:33:03,792 connector DEBUG Connector: closing controller connection
11:33:03,846 sunbeam.clusterd.service DEBUG [put] https://10.40.0.205:7000/1.0/nodes/frank-sloth, args={'data': '{"role": null, "machineid": 1, "systemid": ""}'}
11:33:03,853 urllib3.connectionpool DEBUG https://10.40.0.205:7000 "PUT /1.0/nodes/frank-sloth HTTP/1.1" 200 108
11:33:03,853 sunbeam.clusterd.service DEBUG Response(<Response [200]>) = {"type":"sync","status":"Success","status_code":200,"operation":"","error_code":0,"error":"","metadata":{}}

Additional Notes:
A timeout should be passed to wait_all_machines_deployed: https://github.com/canonical/snap-openstack/blob/e753afbc73efefd6746f96373f5cc1008e334798/sunbeam-python/sunbeam/provider/maas/steps.py#L1294
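A minimal sketch of what a bounded wait could look like, assuming the wait can be expressed as a polling loop with a deadline; the function signature and the status-dict shape here are hypothetical, not sunbeam's actual implementation:

```python
import time


def wait_all_machines_deployed(get_machine_statuses, timeout=1800, poll=10):
    """Poll machine statuses until every machine reports 'started',
    raising TimeoutError once `timeout` seconds elapse, so a machine
    stuck in 'down'/'pending' cannot hang the deploy forever.

    Hypothetical sketch: `get_machine_statuses` is assumed to return
    a dict of machine id -> state (e.g. {'0': 'started', '1': 'down'}).
    """
    deadline = time.monotonic() + timeout
    while True:
        statuses = get_machine_statuses()
        if statuses and all(s == "started" for s in statuses.values()):
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"machines not deployed within {timeout}s: {statuses}"
            )
        time.sleep(poll)
```

In the failure above, machine 1 never leaves the `down` state, so an unbounded wait blocks indefinitely; a deadline turns the hang into an actionable error.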
