Hard Rebooting of nova compute guests is unreliable

Bug #1224518 reported by Peter Portante
This bug report is a duplicate of: Bug #1014647: Tempest has no test for soft reboot.
This bug affects 1 person
Affects                    Status       Importance   Assigned to   Milestone
OpenStack Compute (nova)   Incomplete   Undecided    Unassigned
tempest                    Invalid      Low          Unassigned

Bug Description

See: http://logs.openstack.org/46/46146/2/check/gate-tempest-devstack-vm-postgres-full/b2712f1/console.html

2013-09-12 04:43:17.625 | ======================================================================
2013-09-12 04:43:17.649 | FAIL: tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_reboot_server_hard[gate,smoke]
2013-09-12 04:43:17.651 | tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_reboot_server_hard[gate,smoke]
2013-09-12 04:43:17.652 | ----------------------------------------------------------------------
2013-09-12 04:43:17.652 | _StringException: Empty attachments:
2013-09-12 04:43:17.652 | stderr
2013-09-12 04:43:17.652 | stdout
2013-09-12 04:43:17.653 |
2013-09-12 04:43:17.653 | pythonlogging:'': {{{
2013-09-12 04:43:17.653 | 2013-09-12 04:16:55,739 Request: GET http://127.0.0.1:8774/v2/83ed6f49279b4292a00b32397d2f52fb/servers/8ad0ad9a-3975-486f-94b4-af1c89b51aaf
2013-09-12 04:43:17.654 | 2013-09-12 04:16:55,806 Response Status: 200
2013-09-12 04:43:17.654 | 2013-09-12 04:16:55,806 Nova request id: req-cdc6b1fc-bcf2-4e9c-bea1-8bf935993cbd
2013-09-12 04:43:17.654 | 2013-09-12 04:16:55,807 Request: POST http://127.0.0.1:8774/v2/83ed6f49279b4292a00b32397d2f52fb/servers/8ad0ad9a-3975-486f-94b4-af1c89b51aaf/action
2013-09-12 04:43:17.655 | 2013-09-12 04:16:55,917 Response Status: 202
2013-09-12 04:43:17.655 | 2013-09-12 04:16:55,917 Nova request id: req-3af37dd3-0ddc-4daa-aa6f-6958a5073cc4
2013-09-12 04:43:17.655 | 2013-09-12 04:16:55,918 Request: GET http://127.0.0.1:8774/v2/83ed6f49279b4292a00b32397d2f52fb/servers/8ad0ad9a-3975-486f-94b4-af1c89b51aaf
2013-09-12 04:43:17.655 | 2013-09-12 04:16:55,986 Response Status: 200
2013-09-12 04:43:17.656 | 2013-09-12 04:16:55,986 Nova request id: req-a7298d3e-167c-4c8f-9506-6064ba811e5b

.
.
.

2013-09-12 04:43:17.976 | 2013-09-12 04:23:35,773 Request: GET http://127.0.0.1:8774/v2/83ed6f49279b4292a00b32397d2f52fb/servers/8ad0ad9a-3975-486f-94b4-af1c89b51aaf
2013-09-12 04:43:17.976 | 2013-09-12 04:23:35,822 Response Status: 200
2013-09-12 04:43:17.976 | 2013-09-12 04:23:35,823 Nova request id: req-a122aded-b49b-4847-9920-b2b8b09bc0ca
2013-09-12 04:43:17.976 | }}}
2013-09-12 04:43:17.977 |
2013-09-12 04:43:17.977 | Traceback (most recent call last):
2013-09-12 04:43:17.978 | File "tempest/api/compute/servers/test_server_actions.py", line 81, in test_reboot_server_hard
2013-09-12 04:43:17.978 | self.client.wait_for_server_status(self.server_id, 'ACTIVE')
2013-09-12 04:43:17.979 | File "tempest/services/compute/json/servers_client.py", line 176, in wait_for_server_status
2013-09-12 04:43:17.979 | raise exceptions.TimeoutException(message)
2013-09-12 04:43:17.979 | TimeoutException: Request timed out
2013-09-12 04:43:17.980 | Details: Server 8ad0ad9a-3975-486f-94b4-af1c89b51aaf failed to reach ACTIVE status within the required time (400 s). Current status: HARD_REBOOT.
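
For context, the call at the bottom of the traceback is a polling wait: the test keeps issuing GET requests for the server until it reports ACTIVE or the time limit (400 s here) runs out. A minimal sketch of that kind of waiter follows. It assumes a caller-supplied get_status function; the names, signature and exception classes are illustrative only and are not tempest's actual implementation.

```python
import time


class TimeoutException(Exception):
    """Raised when the server does not reach the wanted status in time."""


class BuildErrorException(Exception):
    """Raised when the server reports ERROR while we are waiting."""


def wait_for_server_status(get_status, server_id, wanted, timeout=400,
                           interval=1, clock=time.time, sleep=time.sleep):
    """Poll get_status(server_id) until it returns `wanted`.

    get_status is a hypothetical callable standing in for the GET
    /servers/<id> request seen in the log. clock and sleep are
    injectable so the loop can be exercised without real waiting.
    """
    start = clock()
    while True:
        status = get_status(server_id)
        if status == wanted:
            return status
        if status == 'ERROR':
            raise BuildErrorException(
                'Server %s failed to build and is in ERROR status'
                % server_id)
        if clock() - start >= timeout:
            raise TimeoutException(
                'Server %s failed to reach %s status within the required '
                'time (%s s). Current status: %s.'
                % (server_id, wanted, timeout, status))
        sleep(interval)
```

With a waiter of this shape, a guest that never leaves HARD_REBOOT surfaces only as a timeout on the test side, which is why the log above shows repeated 200 responses followed by a TimeoutException rather than an API error.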

Tags: testing
Attila Fazekas (afazekas) wrote :

10 results in logstash for "Current status: HARD_REBOOT".

Sean Dague (sdague) wrote :

I don't think this is a tempest bug; it is a state transition bug in Nova.

Changed in tempest:
importance: Undecided → Low
status: New → Invalid
Sean Dague (sdague)
summary: - test_reboot_server_hard fails sporadically in swift check jobs
+ Hard Rebooting of nova compute guests is unreliable
Matt Riedemann (mriedem) wrote :

The attached log contains 59 failures, so something else must have been failing badly and causing the failures to snowball. The associated patch was for swift, so it shouldn't be related to this failure, especially since it failed in the gate.

Not getting any hits in the last 7 days on this query:

message:"Server " AND message:"failed to reach ACTIVE status within the required time" AND message:"Current status: HARD_REBOOT" AND filename:"console.html"

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiU2VydmVyIFwiIEFORCBtZXNzYWdlOlwiZmFpbGVkIHRvIHJlYWNoIEFDVElWRSBzdGF0dXMgd2l0aGluIHRoZSByZXF1aXJlZCB0aW1lXCIgQU5EIG1lc3NhZ2U6XCJDdXJyZW50IHN0YXR1czogSEFSRF9SRUJPT1RcIiBBTkQgZmlsZW5hbWU6XCJjb25zb2xlLmh0bWxcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiNjA0ODAwIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM5MDc3MTA4OTAwN30=

Given the age of this bug, and that I can't get any hits on it in logstash, I'm going to close it.
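
For anyone wanting to check what the logstash link above actually searches for: the part after the # appears to be base64-encoded JSON holding the query string and the time window. A small sketch for decoding it is below. The function name is made up for illustration, and the assumption that the fragment is plain base64 JSON comes from inspecting this one URL, not from any logstash documentation.

```python
import base64
import json
from urllib.parse import urlsplit


def decode_logstash_fragment(url):
    """Return the JSON state stored in the URL fragment as a dict.

    Assumes the fragment is standard base64 over UTF-8 JSON; padding
    is restored in case the trailing '=' characters were dropped.
    """
    fragment = urlsplit(url).fragment
    padded = fragment + '=' * (-len(fragment) % 4)
    return json.loads(base64.b64decode(padded).decode('utf-8'))
```

Applied to the link above, the decoded "search" field should match the query quoted in this comment, and the "timeframe" field should read 604800 (seconds, i.e. 7 days).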

tags: added: testing
Matt Riedemann (mriedem)
Changed in nova:
status: New → Incomplete
Doug Hellmann (doug-hellmann) wrote :

I think I'm hitting a similar problem:

http://logs.openstack.org/04/104304/1/gate/gate-tempest-dsvm-full/5d3c265/console.html

2014-07-02 23:11:51.469 | tempest.api.compute.servers.test_server_actions.ServerActionsTestXML.test_reboot_server_hard[gate,smoke]
2014-07-02 23:11:51.469 | --------------------------------------------------------------------------------------------------------
2014-07-02 23:11:51.469 |
2014-07-02 23:11:51.469 | Captured traceback-1:
2014-07-02 23:11:51.469 | ~~~~~~~~~~~~~~~~~~~~~
2014-07-02 23:11:51.469 | Traceback (most recent call last):
2014-07-02 23:11:51.469 | File "tempest/api/compute/servers/test_server_actions.py", line 51, in tearDown
2014-07-02 23:11:51.469 | self.server_check_teardown()
2014-07-02 23:11:51.469 | File "tempest/api/compute/base.py", line 165, in server_check_teardown
2014-07-02 23:11:51.470 | cls.servers_client.wait_for_server_termination(cls.server_id)
2014-07-02 23:11:51.470 | File "tempest/services/compute/xml/servers_client.py", line 403, in wait_for_server_termination
2014-07-02 23:11:51.470 | raise exceptions.BuildErrorException(server_id=server_id)
2014-07-02 23:11:51.470 | BuildErrorException: Server e812dd65-6acc-4113-b463-df403fc948c6 failed to build and is in ERROR status
2014-07-02 23:11:51.470 |
2014-07-02 23:11:51.470 |
2014-07-02 23:11:51.470 | Captured traceback:
2014-07-02 23:11:51.470 | ~~~~~~~~~~~~~~~~~~~
2014-07-02 23:11:51.470 | Traceback (most recent call last):
2014-07-02 23:11:51.470 | File "tempest/api/compute/servers/test_server_actions.py", line 90, in test_reboot_server_hard
2014-07-02 23:11:51.470 | self.client.wait_for_server_status(self.server_id, 'ACTIVE')
2014-07-02 23:11:51.470 | File "tempest/services/compute/xml/servers_client.py", line 390, in wait_for_server_status
2014-07-02 23:11:51.471 | raise_on_error=raise_on_error)
2014-07-02 23:11:51.471 | File "tempest/common/waiters.py", line 87, in wait_for_server_status
2014-07-02 23:11:51.471 | raise exceptions.BuildErrorException(server_id=server_id)
2014-07-02 23:11:51.471 | BuildErrorException: Server e812dd65-6acc-4113-b463-df403fc948c6 failed to build and is in ERROR status
2014-07-02 23:11:51.471 |

Tom Cammann (tom-cammann) wrote :

Another similar issue with HARD_REBOOT coming up:

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "tempest/test.py", line 128, in wrapper
        return f(self, *func_args, **func_kwargs)
      File "tempest/api/compute/security_groups/test_security_groups.py", line 116, in test_server_security_groups
        self.servers_client.wait_for_server_status(server_id, 'ACTIVE')
      File "tempest/services/compute/xml/servers_client.py", line 390, in wait_for_server_status
        raise_on_error=raise_on_error)
      File "tempest/common/waiters.py", line 97, in wait_for_server_status
        raise exceptions.TimeoutException(message)
    TimeoutException: Request timed out
    Details: (SecurityGroupsTestXML:test_server_security_groups) Server e174b7d6-9ce1-4aa6-a0d9-4bfe38940a7c failed to reach ACTIVE status and task state "None" within the required time (196 s). Current status: HARD_REBOOT. Current task state: rebooting_hard.
