OpenStack Compute (nova)

Failure to set root password leaves instance in ERROR

Bug #1061045 reported by Johannes Erdfelt on 2012-10-03

This bug affects 2 people

Affects	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Fix Released	High	Amir Sadoughi	OpenStack Compute (nova) 2013.1 "grizzly"

Bug Description

If the agent isn't running on an instance, then setting the root password will time out.

The API server will return a 500 error because of an RPC timeout. This should return something other than 500.

Eventually the compute server will time out as well and leave the instance in ERROR. The instance is still running fine, so ERROR seems like an incorrect state to leave it in.

Johannes Erdfelt (johannes.erdfelt) wrote on 2012-10-03:

I think both problems are complicated by the retries that happen in the compute layer. There are 10 retries combined with a 30-second timeout in the xenapi driver, so the operation could take 300 seconds in total. This is longer than the RPC timeout.
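As a rough check of the arithmetic in the comment above, the sketch below multiplies the retry count by the per-attempt timeout and compares the result with the RPC timeout. The 10 retries and 30-second timeout come from the report; the 60-second RPC timeout is an assumption (a commonly used `rpc_response_timeout` value), so substitute whatever your deployment configures.

```python
# Worst-case duration of set_admin_password before the fix.
AGENT_RETRIES = 10    # attempts made by the compute layer (from the report)
AGENT_TIMEOUT_S = 30  # per-attempt timeout in the xenapi driver (from the report)
RPC_TIMEOUT_S = 60    # ASSUMPTION: typical rpc_response_timeout; check nova.conf


def worst_case_seconds(retries: int, per_try_timeout: int) -> int:
    """Return the longest time the compute layer may spend before giving up."""
    return retries * per_try_timeout


total = worst_case_seconds(AGENT_RETRIES, AGENT_TIMEOUT_S)
print(f"compute worst case: {total}s, RPC timeout: {RPC_TIMEOUT_S}s")
# The API-side call() gives up long before compute does, hence the 500.
print("API call times out first" if total > RPC_TIMEOUT_S else "compute finishes first")
```

With these numbers the compute side can keep retrying for 300 seconds, five times longer than the assumed RPC timeout, which is why the API reports a failure while compute is still working.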

The retry logic seems unnecessary and appears to be a result of legacy code.

If the total timeout were something reasonable, then an error could be returned synchronously to the client, instead of the instance being moved to ERROR so that the error could be reported asynchronously.

Russell Bryant (russellb) on 2012-11-01

Changed in nova:
status:	New → Confirmed
importance:	Undecided → High
tags:	added: xenserver

Chris Behrens (cbehrens) wrote on 2012-12-12:

This must refer to setting the admin password later, after a successful build? set_admin_password in compute_api does a call(), but building a new instance is a cast, so it wouldn't return a 500 from the API if setting the root password failed.
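The call()/cast() distinction in the comment above can be illustrated with a toy, in-process model. `ToyRpc` and `set_password_without_agent` are hypothetical names, not nova's RPC API: a real cast runs asynchronously on the compute host, while here the target runs inline and its error is merely recorded instead of being raised to the caller.

```python
from typing import Callable, List


class ToyRpc:
    """In-process stand-in for an RPC layer; NOT nova's rpc module."""

    def __init__(self) -> None:
        # Errors raised by cast() targets; the caller never sees them
        self.dropped_errors: List[str] = []

    def call(self, method: Callable[[], str]) -> str:
        # call(): the caller waits for the reply, so a failure reaches it
        return method()

    def cast(self, method: Callable[[], str]) -> None:
        # cast(): fire-and-forget; the target runs inline here, but any
        # error is recorded rather than raised to the caller
        try:
            method()
        except Exception as exc:
            self.dropped_errors.append(str(exc))


def set_password_without_agent() -> str:
    """Hypothetical compute-side handler for a guest with no running agent."""
    raise TimeoutError("agent did not respond")


rpc = ToyRpc()
rpc.cast(set_password_without_agent)  # returns None; the API layer sees no error
print("cast() dropped:", rpc.dropped_errors)
try:
    rpc.call(set_password_without_agent)
except TimeoutError as exc:
    print("call() surfaced:", exc)  # analogous to the timeout the API reported as a 500
```

This is why only the post-build set_admin_password path, which uses call(), could produce a 500 from the API.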

Johannes Erdfelt (johannes.erdfelt) wrote on 2012-12-12:

Yes, the 500 error is only when setting the root password after an instance is already built.

OpenStack Infra (hudson-openstack) wrote on 2013-01-16: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/19854

Changed in nova:
assignee:	nobody → Amir Sadoughi (amir-sadoughi)
status:	Confirmed → In Progress

Amir Sadoughi (amir-sadoughi) wrote on 2013-01-24:

I wanted to document the test procedure I used to reproduce the bug and test the bugfix:

1. Start an instance with `nova boot test-instance` in a Xen/XCP environment from the compute node.
2. Run `nova root-password test-instance`.
3. Before hitting [Enter] on the second password prompt, kill the nova-agent on 'test-instance'.
4. Observe the timeout.
5. a. If the bugfix is missing: observe 'test-instance' in ERROR state and a 500 error.
   b. If the bugfix is in place: observe 'test-instance' not in ERROR state and a 501 error.

Vish Ishaya (vishvananda) on 2013-01-24

tags:	added: folsom-backport-potential

OpenStack Infra (hudson-openstack) wrote on 2013-02-01: Fix merged to nova (master)

Reviewed: https://review.openstack.org/19854
Committed: http://github.com/openstack/nova/commit/4dc160bf91d21b42363e5187adb96e59f95da717
Submitter: Jenkins
Branch: master

commit 4dc160bf91d21b42363e5187adb96e59f95da717
Author: Amir Sadoughi <email address hidden>
Date: Wed Jan 16 13:15:14 2013 -0600

Removes retry of set_admin_password

    * An RPC timeout may occur if an agent is missing and set_admin_password is
    invoked. This causes 500 errors in the OpenStack API.
    * Implemented a 501 error in API if the password set fails.
    * Modified xenapi agent to use NotImplementedError instead of Exception in
      set_admin_password.
    * Updated test code around set_admin_password to accept different exceptions.
    * Fixes bug 1061045

Change-Id: If7fab56c20f12e0490f4774e00004ed1d94242b9
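The behavior described in the commit message can be sketched as follows. This is a hypothetical simplification, not nova's code: every name here (`Instance`, `UnresponsiveAgent`, `compute_set_admin_password`, `api_change_password`) is illustrative, the API layer calls compute directly instead of over RPC, and the 202 success code is an assumption. What it mirrors from the commit is a single attempt with no retry loop, the agent signalling failure with NotImplementedError, and the API mapping that to a 501 while leaving the instance out of ERROR.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instance:
    """Minimal hypothetical instance record."""
    name: str
    vm_state: str = "active"
    task_state: Optional[str] = None


class UnresponsiveAgent:
    """Hypothetical stand-in for a guest whose agent is not running."""

    def set_admin_password(self, instance: Instance, password: str) -> None:
        # Per the commit, failure is signalled with NotImplementedError
        # instead of a bare Exception.
        raise NotImplementedError("agent did not respond")


def compute_set_admin_password(agent, instance: Instance, password: str) -> None:
    """One attempt, no retry loop, and no transition to ERROR."""
    instance.task_state = "updating_password"
    try:
        agent.set_admin_password(instance, password)
    finally:
        # Clear the task state whether or not the agent succeeded;
        # vm_state is deliberately left untouched.
        instance.task_state = None


def api_change_password(agent, instance: Instance, password: str) -> Tuple[int, str]:
    """Translate the compute outcome into an HTTP status code."""
    try:
        compute_set_admin_password(agent, instance, password)
    except NotImplementedError as exc:
        return 501, str(exc)  # 501 Not Implemented, as in the commit
    return 202, "accepted"  # ASSUMPTION: success code shown for illustration


vm = Instance("test-instance")
print(api_change_password(UnresponsiveAgent(), vm, "s3cret"))  # (501, 'agent did not respond')
print(vm.vm_state)  # active
```

Dropping the retry loop keeps the worst case to a single agent timeout, so the error can be returned to the client directly instead of being recorded by moving the instance to ERROR.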

Changed in nova:
status:	In Progress → Fix Committed

Thierry Carrez (ttx) on 2013-02-21

Changed in nova:
milestone:	none → grizzly-3
status:	Fix Committed → Fix Released

Thierry Carrez (ttx) on 2013-04-04

Changed in nova:
milestone:	grizzly-3 → 2013.1
