Need raise exception when performing action on instance if compute service is down

Bug #1228804 reported by ChangBo Guo(gcb)
This bug affects 1 person
Affects                  | Status  | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Expired | Undecided  | Unassigned  |

Bug Description

The current compute API doesn't check the compute service before performing an action on an instance. If the compute service isn't available (for example, because the compute host is down), stopping or starting an instance leaves its task_state stuck at 'powering-off' or 'powering-on' without raising an exception.
These actions include stop/start, pause/unpause and suspend/resume.
We need to raise an exception reporting that the compute service is not available, as below:

[root@guochbo-10-7-0-151 ~]# nova start test1
ERROR: Compute service of guochbo-10-7-0-151.sce.cn.ibm.com is unavailable at this time. (HTTP 400) (Request-ID: req-282b4ee7-8920-43e7-8ce7-40bc2330fc49)
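A minimal sketch of such a precheck, assuming liveness is judged by comparing the service's last heartbeat timestamp against a timeout. The 60-second `SERVICE_DOWN_TIME` value and the names `Service`, `ComputeServiceUnavailable`, `service_is_up` and `check_compute_service` are hypothetical illustrations, not nova's actual API:

```python
import datetime
from dataclasses import dataclass

# Hypothetical timeout: a service whose last heartbeat is older than this
# many seconds is treated as down.
SERVICE_DOWN_TIME = 60


@dataclass
class Service:
    """Hypothetical record of a compute service's last heartbeat."""
    host: str
    updated_at: datetime.datetime  # timezone-aware UTC timestamp


class ComputeServiceUnavailable(Exception):
    """Hypothetical error that an API layer could map to HTTP 400."""

    def __init__(self, host):
        super().__init__(
            "Compute service of %s is unavailable at this time." % host)
        self.host = host


def service_is_up(service, now=None):
    """Return True if the service's heartbeat is recent enough."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    elapsed = (now - service.updated_at).total_seconds()
    return elapsed <= SERVICE_DOWN_TIME


def check_compute_service(service, now=None):
    """Raise before the action is sent if the target host looks down."""
    if not service_is_up(service, now):
        raise ComputeServiceUnavailable(service.host)
```

With a check like this in front of stop/start, a request against a host whose heartbeat is stale would fail fast with an error like the one above, instead of silently setting task_state.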

Tags: api
ChangBo Guo(gcb) (glongwave) wrote :

This can be reproduced with the following steps:

1) nova show gcb_test_222
+--------------------------------------+----------------------------------------------------------+
| Property | Value |
+--------------------------------------+----------------------------------------------------------+
| status | ACTIVE |
| updated | 2013-09-21T03:45:52Z |
| OS-EXT-STS:task_state | None |
| OS-EXT-SRV-ATTR:host | kvm-installer-222 |
| key_name | None |
| image | cirros-Image (c2fdf54f-4978-4ee9-af0e-d1e0d5db2631) |
| hostId | b605c007b2a5de389a65a645d88d708b82745b4706361d9ef18bf410 |
| OS-EXT-STS:vm_state | active |
| OS-EXT-SRV-ATTR:instance_name | instance-00000038 |
| OS-SRV-USG:launched_at | 2013-09-20T22:45:53.555626 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | kvm-installer-222 |
| flavor | m1.tiny (1) |
| id | 7677f07e-f5f4-42ee-a5d1-f1dbcddab769 |
| security_groups | [{u'name': u'default'}] |
| OS-SRV-USG:terminated_at | None |
| user_id | d02e0752613142da90387adcc7cf2664 |
| name | gcb_test_222 |
| created | 2013-09-21T03:45:49Z |
| tenant_id | 6db20e01ef6f42f18cbebb56f70ebed3 |
| OS-DCF:diskConfig | MANUAL |
| metadata | {} |
| os-extended-volumes:volumes_attached | [] |
| accessIPv4 | |
| accessIPv6 | |
| net1 network | 10.0.2.7 |
| progress | 0 |
| OS-EXT-STS:power_state | 1 |
| OS-EXT-AZ:availability_zone | nova |
| config_drive ...


tags: added: api
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/47733

Changed in nova:
assignee: nobody → ChangBo Guo (guochbo)
status: New → In Progress
ChangBo Guo(gcb) (glongwave) wrote : Re: Need add precheck compute service before stop/start instance

I think this bug confuses users. If the compute service is unavailable, the user has no way of knowing it, because nova list always returns the old instance state, and starting or stopping an instance raises no exception; it just changes the task_state.

@ Russel, Chris, GuoHui
I would like to get your suggestions on how to move on :) and to find the best-fit solution for this bug.
I just posted a patch that moves the check logic into nova/compute/api.py, so we can easily apply the check to other actions such as pause.
This time the check is closer to the compute driver RPC call than in the last patch (where it was in api). I tested it as follows:

[root@guochbo-10-7-0-151 /]# nova list
+--------------------------------------+-------+--------+------------+-------------+-------------------+
| ID | Name | Status | Task State | Power State | Networks |
+--------------------------------------+-------+--------+------------+-------------+-------------------+
| 6340fef6-5dfa-45c1-8aad-eed80d7c0217 | test1 | ACTIVE | None | Running | network1=10.0.1.3 |
| 0ff75f64-01d6-4b1f-bec8-0edfc3417b48 | test2 | ACTIVE | None | Running | network1=10.0.1.4 |
+--------------------------------------+-------+--------+------------+-------------+-------------------+
[root@guochbo-10-7-0-151 /]# service openstack-nova-compute stop
Stopping openstack-nova-compute: [ OK ]
[root@guochbo-10-7-0-151 /]# nova stop test1
ERROR: Compute service of guochbo-10-7-0-151.sce.cn.ibm.com is unavailable at this time. (HTTP 400) (Request-ID: req-a192800e-eaf6-470a-bd4a-5aacd8c5227f)

1) Is this approach OK for you? I still need to modify other unit tests to fit the change, but I will hold off until I get the right direction.
2) Any suggestions about the bug?
Thanks in advance.
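Keeping the check in one place in the compute API layer is what makes it easy to reuse for pause, suspend and the other actions. A minimal sketch of that idea as a decorator, assuming a hypothetical `is_up(host)` callable and a toy `ComputeAPI` class that records casts in a list; none of these names are nova's actual code:

```python
import functools


class ComputeServiceUnavailable(Exception):
    """Hypothetical error raised when the instance's host is down."""


def check_host_up(func):
    """Decorator: refuse the action if the instance's compute host is down."""
    @functools.wraps(func)
    def wrapper(self, instance, *args, **kwargs):
        host = instance["host"]
        if not self.is_up(host):
            raise ComputeServiceUnavailable(
                "Compute service of %s is unavailable at this time." % host)
        return func(self, instance, *args, **kwargs)
    return wrapper


class ComputeAPI:
    """Toy stand-in for an API class that casts actions to compute hosts."""

    def __init__(self, is_up):
        self.is_up = is_up   # callable: host name -> bool (assumed)
        self.sent = []       # records (task_state, uuid) instead of a real cast

    def _cast(self, task_state, instance):
        instance["task_state"] = task_state
        self.sent.append((task_state, instance["uuid"]))

    @check_host_up
    def stop(self, instance):
        self._cast("powering-off", instance)

    @check_host_up
    def start(self, instance):
        self._cast("powering-on", instance)

    @check_host_up
    def pause(self, instance):
        self._cast("pausing", instance)
```

Because the decorator runs before task_state is touched, a rejected request leaves the instance's task_state unchanged, which is the behaviour the test output above is after.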

summary: - Need add precheck compute service before stop/start instance
+ Need add precheck compute service before performing action on instance
summary: - Need add precheck compute service before performing action on instance
+ Need raise exception when performing action on instance if compute
+ service is down
description: updated
description: updated
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/54237

Robert Collins (lifeless) wrote :

Pre-flight checking isn't an effective way to solve this, as it has race conditions. We need to detect the situation and cater for it there.
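The race is a time-of-check/time-of-use gap: the host can pass the liveness check and then go down before the action is delivered, so the instance still ends up stuck. A small deterministic simulation of that interleaving, using a hypothetical `FakeHost` class and `stop_with_preflight` function rather than any nova code:

```python
class FakeHost:
    """Hypothetical compute host whose liveness can change at any moment."""

    def __init__(self):
        self.alive = True
        self.received = []

    def deliver(self, action):
        # A cast is fire-and-forget: if the host is down, the message is
        # simply not processed and the caller is never told.
        if self.alive:
            self.received.append(action)


def stop_with_preflight(host, instance, between_check_and_cast=None):
    """Pre-flight check, then set task_state and cast the stop action."""
    if not host.alive:
        raise RuntimeError("compute service unavailable")
    if between_check_and_cast is not None:
        between_check_and_cast()  # e.g. the host crashes right here
    instance["task_state"] = "powering-off"
    host.deliver("stop")


def crash(host):
    host.alive = False


host = FakeHost()
vm = {"task_state": None}
stop_with_preflight(host, vm, between_check_and_cast=lambda: crash(host))
print(vm["task_state"], host.received)  # powering-off []
```

The check passed, yet task_state is left at powering-off and the host never saw the request, so a pre-flight check can only narrow the window, not close it.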

ChangBo Guo(gcb) (glongwave) wrote :

Robert,
I agree with you. We need to figure out where to put the check. Any ideas about the place?

ChangBo Guo(gcb) (glongwave) wrote :

We need a mechanism to handle our instances' status, and the operations on them, when the compute service is down.
This is not something a bug fix can handle.
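One possible shape for such a mechanism is a periodic task that looks for instances whose task_state has been pending too long on a host that is down, and clears it so the user can retry. A minimal sketch, assuming a hypothetical 10-minute `STUCK_TIMEOUT` and plain dicts for instances; it does not reflect any actual nova implementation:

```python
import datetime

# Hypothetical threshold after which a pending task is considered stuck.
STUCK_TIMEOUT = datetime.timedelta(minutes=10)


def recover_stuck_instances(instances, host_is_up, now):
    """Reset task_state on instances stuck because their host is down.

    instances: list of dicts with 'uuid', 'host', 'task_state', 'updated_at'.
    host_is_up: callable mapping a host name to True/False.
    Returns the uuids that were reset.
    """
    recovered = []
    for inst in instances:
        if inst["task_state"] is None:
            continue                  # nothing pending
        if host_is_up(inst["host"]):
            continue                  # host may still finish the task
        if now - inst["updated_at"] < STUCK_TIMEOUT:
            continue                  # give the task a grace period
        inst["task_state"] = None     # let the user retry later
        inst["updated_at"] = now
        recovered.append(inst["uuid"])
    return recovered
```

Clearing task_state is only one policy; putting the instance into an error state, or re-sending the action once the host comes back, are alternatives with different trade-offs.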

Changed in nova:
assignee: ChangBo Guo (guochbo) → nobody
status: In Progress → New
Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Matt Riedemann (mriedem) wrote :

I think you want to look at what this blueprint is doing:

https://blueprints.launchpad.net/nova/+spec/recover-stuck-state

Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
importance: Low → Undecided
status: Confirmed → Expired