Cluster gets disconnected after error: provisioningserver.service_monitor.UnknownServiceError: 'maas-dhcpd' is unknown to upstart.

Bug #1457708 reported by Ashley Lai on 2015-05-22
This bug affects 3 people
Affects: MAAS
Status: Fix Released
Importance: Critical
Assigned to: Raphaël Badin
Milestone: 1.8.0

Bug Description

For the past few hours, all deployments in prodstack have been failing with a 400 error. Please see the attached logs.

ERROR failed to bootstrap environment: cannot start bootstrap instance: gomaasapi: got error back from server: 400 BAD REQUEST ({"power_type": ["The cluster controller for this node is not responding; power type validation is not available.Unable to get RPC connection for cluster 'OIL Cluster' (037c960b-5b9f-4701-8366-eeda2c09d14e)"], "distro_series": ["'trusty' is not a valid distro_series. It should be one of: ''."]})
2015-05-22 01:01:22,838 [ERROR] oil_ci.juju.client: Calling "juju bootstrap" failed!
2015-05-22 01:01:22,838 [ERROR] oil_ci.cli: Deployment failed:
+ rc=1
+ echo 'Deployment returned: 1'

Tags: oil

Ashley Lai (alai) wrote :
Larry Michel (lmic) on 2015-05-22
summary: - Prostack: all pipelines hit with 400 BAD REQUEST error
+ 1.8b7: all pipelines hit with 400 BAD REQUEST error

At the time the problem happened, the regiond logs show the following:
2015-05-22 01:01:21 [maasserver] ERROR: Unable to get RPC connection for cluster 'OIL Cluster' (037c960b-5b9f-4701-8366-eeda2c09d14e)
2015-05-22 01:01:21 [maasserver] ERROR: Unable to get RPC connection for cluster 'OIL Cluster' (037c960b-5b9f-4701-8366-eeda2c09d14e)
2015-05-22 01:01:21 [maasserver] ERROR: Unable to get RPC connection for cluster 'OIL Cluster' (037c960b-5b9f-4701-8366-eeda2c09d14e)
2015-05-22 01:01:21 [-] 127.0.0.1 - - [22/May/2015:01:01:20 +0000] "POST /MAAS/api/1.0/nodes/node-95167f62-12b6-11e4-9a15-00163eca07b6/?op=start HTTP/1.1" 400 230 "-" "Go 1.1 package http"
2015-05-22 01:01:22 [maasserver] ERROR: Unable to get RPC connection for cluster 'OIL Cluster' (037c960b-5b9f-4701-8366-eeda2c09d14e)
2015-05-22 01:01:22 [maasserver] ERROR: Unable to get RPC connection for cluster 'OIL Cluster' (037c960b-5b9f-4701-8366-eeda2c09d14e)
2015-05-22 01:01:22 [maasserver] ERROR: Unable to get RPC connection for cluster 'OIL Cluster' (037c960b-5b9f-4701-8366-eeda2c09d14e)
2015-05-22 01:01:22 [maasserver] ERROR: Unable to get RPC connection for cluster 'OIL Cluster' (037c960b-5b9f-4701-8366-eeda2c09d14e)

Raphaël Badin (rvb) wrote :

Actually, clusterd.log contains the stacktrace that explains why the cluster got disconnected: http://paste.ubuntu.com/11279881/

summary: - 1.8b7: all pipelines hit with 400 BAD REQUEST error
+ Cluster gets disconnected after error:
+ provisioningserver.service_monitor.UnknownServiceError: 'maas-dhcpd' is
+ unknown to upstart.
Raphaël Badin (rvb) wrote :

This error causes the cluster to disconnect.

Changed in maas:
importance: Undecided → Critical
status: New → Triaged
assignee: nobody → Raphaël Badin (rvb)
Raphaël Badin (rvb) on 2015-05-22
Changed in maas:
status: Triaged → In Progress
Raphaël Badin (rvb) on 2015-05-22
Changed in maas:
milestone: none → 1.8.0
Raphaël Badin (rvb) on 2015-05-22
Changed in maas:
status: In Progress → Fix Committed
Blake Rouse (blake-rouse) wrote :

I think the code should also handle this error and not crash. The packaging fix is needed but handling the exception needs to be better as well.

Raphaël Badin (rvb) wrote :

> I think the code should also handle this error and not crash. The packaging fix is needed but handling the exception needs to be better as well.

Agreed, and that's why I filed bug 1457799.
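
For illustration only, here is a minimal sketch of the kind of handling discussed above: catching the unknown-service error, logging it, and carrying on instead of letting it propagate. This is not the actual MAAS code or the fix for bug 1457799. The `UnknownServiceError` class below is a local stand-in for `provisioningserver.service_monitor.UnknownServiceError`, and `query_init_system`, `check_services`, and the `known_services` dictionary are hypothetical names invented for this example.

```python
import logging

logger = logging.getLogger("service_monitor_sketch")


class UnknownServiceError(Exception):
    """Local stand-in for the error named in this bug's title."""


def query_init_system(service_name, known_services):
    """Hypothetical lookup: return the service state, or raise if unknown.

    `known_services` is a plain dict standing in for the init system.
    """
    if service_name not in known_services:
        raise UnknownServiceError(
            "'%s' is unknown to upstart." % service_name)
    return known_services[service_name]


def check_services(service_names, known_services):
    """Check every service; record unknown ones instead of crashing."""
    states = {}
    for name in service_names:
        try:
            states[name] = query_init_system(name, known_services)
        except UnknownServiceError as error:
            # Log and continue so one missing job does not abort the
            # whole check (and, in this bug, disconnect the cluster).
            logger.error("Skipping service check: %s", error)
            states[name] = "unknown"
    return states


if __name__ == "__main__":
    installed = {"maas-clusterd": "running"}
    print(check_services(["maas-clusterd", "maas-dhcpd"], installed))
```

With this shape, a missing `maas-dhcpd` job is reported as "unknown" and logged, while the remaining services are still checked. The packaging fix is still needed so that the job exists in the first place.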

Changed in maas:
status: Fix Committed → Fix Released