resize test fails with "Returning 400 to user: Instance has not been resized" after VERIFY_RESIZE state

Bug #1326778 reported by Matt Riedemann
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Matt Riedemann

Bug Description

Fails here:

http://logs.openstack.org/93/96293/2/gate/gate-tempest-dsvm-postgres-full/a644174/console.html

The error in the n-api log is here:

http://logs.openstack.org/93/96293/2/gate/gate-tempest-dsvm-postgres-full/a644174/logs/screen-n-api.txt.gz#_2014-06-05_06_50_26_679

From the console log, it looks like it goes into an error state after going to verify_resize state:

2014-06-05 06:50:26.714 | 2014-06-05 06:50:26,573 Request (MigrationsAdminTest:test_list_migrations_in_flavor_resize_situation): 200 GET http://127.0.0.1:8774/v2/7d6640f8f8e34866b5bd00e109fe90b7/servers/88591e95-69a2-4e34-a294-90b79d8f0d55 0.103s
2014-06-05 06:50:26.714 | 2014-06-05 06:50:26,573 State transition "RESIZE/resize_finish" ==> "VERIFY_RESIZE/None" after 16 second wait
2014-06-05 06:50:26.714 | 2014-06-05 06:50:26,681 Request (MigrationsAdminTest:test_list_migrations_in_flavor_resize_situation): 400 POST http://127.0.0.1:8774/v2/7d6640f8f8e34866b5bd00e109fe90b7/servers/88591e95-69a2-4e34-a294-90b79d8f0d55/action 0.106s

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiTWlncmF0aW9uc0FkbWluVGVzdFwiIEFORCBtZXNzYWdlOlwiSFRUUCBleGNlcHRpb24gdGhyb3duXFw6IEluc3RhbmNlIGhhcyBub3QgYmVlbiByZXNpemVkXCIgQU5EIHRhZ3M6XCJzY3JlZW4tbi1hcGkudHh0XCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0MDE5NzIzNDQyMjh9

8 hits in 7 days, looks like this started on 6/5. Fails in check and gate queues.

Revision history for this message
Matt Riedemann (mriedem) wrote :

Wondering if this is the root cause, this merged yesterday when this started showing up and is migration/resize related:

https://review.openstack.org/94474

Revision history for this message
Matt Riedemann (mriedem) wrote :
Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Matt Riedemann (mriedem) wrote :

https://review.openstack.org/94474 has to be the issue. It changed the order of when the instance and migration objects were updated, so now the instance is updated first and the migration status is updated second. Tempest is waiting on the instance state change and once it hits that, it does an immediate confirm-resize and fails because the migration status isn't 'finished' yet.

Revision history for this message
Matt Riedemann (mriedem) wrote :
Changed in nova:
status: Confirmed → In Progress
assignee: nobody → Matt Riedemann (mriedem)
Revision history for this message
Matt Riedemann (mriedem) wrote :

(8:25:33 AM) mriedem: lbragstad: so confirm_resize is synchronized on the instance.uuid
(8:25:35 AM) mriedem: but finish_resize isn't
(8:25:39 AM) mriedem: sdague: ^
(8:26:23 AM) mriedem: revert_resize isn't synchronized either
(8:26:25 AM) lbragstad: mriedem: where do you see that?
(8:26:52 AM) mriedem: lbragstad: http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py#n3098
(8:27:14 AM) mriedem: finish_resize is here: http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py#n3613
(8:27:22 AM) mtreinish: mriedem: oh, that could be it
(8:27:24 AM) mriedem: revert_resize is here: http://git.openstack.org/cgit/openstack/nova/tree/nova/compute/manager.py#n3203
(8:27:32 AM) mriedem: so sync those up and should be good to go

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/98108
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=77dd352933ca4272bfc376bd2f9ea3ff781a9bf8
Submitter: Jeremy Stanley (<email address hidden>)
Branch: master

commit 77dd352933ca4272bfc376bd2f9ea3ff781a9bf8
Author: Matt Riedemann <email address hidden>
Date: Thu Jun 5 13:15:45 2014 +0000

    Revert "Fix migration and instance resize update order"

    This reverts commit 6dd5cc503cc05c00c5f9d831480539c67f6e2a48.

    Change-Id: Id438276c3d71155ad6a9dfd31280a49e39905ae5
    Closes-Bug: #1326778

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → juno-1
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/98110
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d27987fa56156392c06b1185b44e36dc9720435c
Submitter: Jenkins
Branch: master

commit d27987fa56156392c06b1185b44e36dc9720435c
Author: Lance Bragstad <email address hidden>
Date: Thu Jun 5 13:30:00 2014 +0000

    Sync revert and finish resize on instance.uuid

    We need to make sure we sync the state of the instance the same way we
    do for do_confirm_resize. Previously, we were not syncing the instance
    on revert or finish resize. We do this so that we can update the
    migration and instance resize update orders. Previously if you set
    resize_confirm_window to some small value, your resized instance could
    go into error state because the periodic task was run after the
    migration was updated but before the instance object was updated. This
    change makes it so the instance is always updated before the migration.

    Change-Id: Iaad1a05864b2b46180cadf16198b0fd8583c6703
    Closes-Bug: 1326778

Revision history for this message
Matt Riedemann (mriedem) wrote : Re: test_list_migrations_in_flavor_resize_situation fails with 400 after VERIFY_RESIZE state
Revision history for this message
Matt Riedemann (mriedem) wrote :

2014-06-26 23:35:44.120 | tempest.api.compute.servers.test_server_actions.ServerActionsTestXML.test_resize_server_revert[gate]
2014-06-26 23:35:44.121 | ----------------------------------------------------------------------------------------------------
2014-06-26 23:35:44.121 |
2014-06-26 23:35:44.121 | Captured traceback-1:
2014-06-26 23:35:44.122 | ~~~~~~~~~~~~~~~~~~~~~
2014-06-26 23:35:44.122 | Traceback (most recent call last):
2014-06-26 23:35:44.122 | File "tempest/api/compute/servers/test_server_actions.py", line 51, in tearDown
2014-06-26 23:35:44.123 | self.server_check_teardown()
2014-06-26 23:35:44.123 | File "tempest/api/compute/base.py", line 161, in server_check_teardown
2014-06-26 23:35:44.123 | 'ACTIVE')
2014-06-26 23:35:44.123 | File "tempest/services/compute/xml/servers_client.py", line 390, in wait_for_server_status
2014-06-26 23:35:44.124 | raise_on_error=raise_on_error)
2014-06-26 23:35:44.124 | File "tempest/common/waiters.py", line 107, in wait_for_server_status
2014-06-26 23:35:44.124 | raise exceptions.TimeoutException(message)
2014-06-26 23:35:44.125 | TimeoutException: Request timed out
2014-06-26 23:35:44.125 | Details: (ServerActionsTestXML:tearDown) Server 12af14bb-4efd-4946-a007-66a371ee5601 failed to reach ACTIVE status and task state "None" within the required time (196 s). Current status: VERIFY_RESIZE. Current task state: None.
2014-06-26 23:35:44.125 |
2014-06-26 23:35:44.126 |
2014-06-26 23:35:44.126 | Captured traceback:
2014-06-26 23:35:44.126 | ~~~~~~~~~~~~~~~~~~~
2014-06-26 23:35:44.126 | Traceback (most recent call last):
2014-06-26 23:35:44.127 | File "tempest/api/compute/servers/test_server_actions.py", line 253, in test_resize_server_revert
2014-06-26 23:35:44.127 | self.client.revert_resize(self.server_id)
2014-06-26 23:35:44.127 | File "tempest/services/compute/xml/servers_client.py", line 516, in revert_resize
2014-06-26 23:35:44.128 | return self.action(server_id, 'revertResize', None, **kwargs)
2014-06-26 23:35:44.128 | File "tempest/services/compute/xml/servers_client.py", line 441, in action
2014-06-26 23:35:44.128 | resp, body = self.post("servers/%s/action" % server_id, str(doc))
2014-06-26 23:35:44.129 | File "tempest/common/rest_client.py", line 218, in post
2014-06-26 23:35:44.129 | return self.request('POST', url, extra_headers, headers, body)
2014-06-26 23:35:44.129 | File "tempest/common/rest_client.py", line 430, in request
2014-06-26 23:35:44.130 | resp, resp_body)
2014-06-26 23:35:44.130 | File "tempest/common/rest_client.py", line 479, in _error_checker
2014-06-26 23:35:44.130 | raise exceptions.BadRequest(resp_body)
2014-06-26 23:35:44.130 | BadRequest: Bad request
2014-06-26 23:35:44.131 | Details: {'message': 'Instance has not been resized.', 'code': '400'}

Revision history for this message
Matt Riedemann (mriedem) wrote :
summary: - test_list_migrations_in_flavor_resize_situation fails with 400 after
- VERIFY_RESIZE state
+ resize test fails with "Returning 400 to user: Instance has not been
+ resized" after VERIFY_RESIZE state
Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/103090
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7e9d00287a24b7b157c87637a2c2f7d00ba72c9a
Submitter: Jenkins
Branch: master

commit 7e9d00287a24b7b157c87637a2c2f7d00ba72c9a
Author: Sean Dague <email address hidden>
Date: Fri Jun 27 11:09:49 2014 +0000

    Revert "Sync revert and finish resize on instance.uuid"

    This reverts commit d27987fa56156392c06b1185b44e36dc9720435c which
    actually seems to have retriggered the issue it was designed to fix.

    Related-Bug: #1326778

    Change-Id: Ia1905177de8494f9c3abb3bcfa0866c7bcb33f93

Thierry Carrez (ttx)
Changed in nova:
milestone: juno-1 → 2014.2
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.