test_rebuild_server_in_error_state randomly times out waiting for rebuilding instance to be active (cells v1)

Bug #1709985 reported by Matt Riedemann
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Won't Fix
Importance: Low
Assigned to: Unassigned
Milestone: (not set)

Bug Description

http://logs.openstack.org/12/491012/12/check/gate-tempest-dsvm-cells-ubuntu-xenial/4aa3da8/console.html#_2017-08-10_18_58_35_158151

2017-08-10 18:58:35.158151 | tempest.api.compute.admin.test_servers.ServersAdminTestJSON.test_rebuild_server_in_error_state[id-682cb127-e5bb-4f53-87ce-cb9003604442]
2017-08-10 18:58:35.158207 | ---------------------------------------------------------------------------------------------------------------------------------------
2017-08-10 18:58:35.158221 |
2017-08-10 18:58:35.158239 | Captured traceback:
2017-08-10 18:58:35.158258 | ~~~~~~~~~~~~~~~~~~~
2017-08-10 18:58:35.158281 | Traceback (most recent call last):
2017-08-10 18:58:35.158323 | File "tempest/api/compute/admin/test_servers.py", line 188, in test_rebuild_server_in_error_state
2017-08-10 18:58:35.158346 | raise_on_error=False)
2017-08-10 18:58:35.158381 | File "tempest/common/waiters.py", line 96, in wait_for_server_status
2017-08-10 18:58:35.158407 | raise lib_exc.TimeoutException(message)
2017-08-10 18:58:35.158436 | tempest.lib.exceptions.TimeoutException: Request timed out
2017-08-10 18:58:35.158525 | Details: (ServersAdminTestJSON:test_rebuild_server_in_error_state) Server e57c5e75-9a8b-436d-aa53-a545e32c308a failed to reach ACTIVE status and task state "None" within the required time (196 s). Current status: REBUILD. Current task state: rebuild_spawning.

Looks like this mostly shows up in cells v1 jobs, which wouldn't be surprising if we missed some state change due to the instance sync to the top level cell, but it's also happening sometimes in non-cells jobs. Could be a duplicate of a bug where we miss or never get a network change / vif plug notification from neutron, so we just wait forever.
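For readers unfamiliar with how this kind of timeout arises, here is a minimal sketch of a status-polling waiter. It is not tempest's actual wait_for_server_status; the get_server callable, the injected clock and sleep parameters, and the exception class are hypothetical names for illustration. Only the 196 s timeout and the state names come from the failure above.

```python
import time


class TimeoutException(Exception):
    """Raised when the server does not reach the wanted state in time."""


def wait_for_server_status(get_server, wanted_status, timeout=196.0,
                           interval=1.0, clock=time.monotonic,
                           sleep=time.sleep):
    """Poll until status == wanted_status and task_state is None.

    get_server is a hypothetical callable that returns a dict with
    'status' and 'task_state' keys; it stands in for the API request.
    """
    start = clock()
    while True:
        server = get_server()
        status = server["status"]
        task_state = server["task_state"]
        if status == wanted_status and task_state is None:
            return server
        if clock() - start >= timeout:
            # If a state change is never reported back (for example a
            # missed sync to the API cell), the loop ends up here.
            raise TimeoutException(
                "failed to reach %s status and task state \"None\" within "
                "the required time (%d s). Current status: %s. "
                "Current task state: %s."
                % (wanted_status, timeout, status, task_state))
        sleep(interval)
```

If the view being polled never leaves REBUILD / rebuild_spawning, the waiter cannot tell a slow rebuild from a lost update, so both surface as the same timeout.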

Revision history for this message
Matt Riedemann (mriedem) wrote :

Actually the cells v1 job shouldn't be waiting for vif plug events, but it looks like nova-compute is configured for that:

Aug 10 17:49:38.949627 ubuntu-xenial-rax-ord-10376969 nova-compute[20201]: DEBUG oslo_service.service [None req-e2b1deea-3fbd-4806-8dc4-fbfde2dbf770 None None] vif_plugging_is_fatal = True {{(pid=20201) log_opt_values /usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py:2875}}
Aug 10 17:49:38.949950 ubuntu-xenial-rax-ord-10376969 nova-compute[20201]: DEBUG oslo_service.service [None req-e2b1deea-3fbd-4806-8dc4-fbfde2dbf770 None None] vif_plugging_timeout = 300 {{(pid=20201) log_opt_values /usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py:2875}}

http://logs.openstack.org/12/491012/12/check/gate-tempest-dsvm-cells-ubuntu-xenial/4aa3da8/logs/screen-n-cpu.txt.gz

Cells v1 doesn't support those events. But cells v1 jobs run with nova-network, so neutron shouldn't be a problem there.
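To check the effective values in other job logs, something like the following sketch could pull option names and values out of the log_opt_values DEBUG lines. The regular expression is an assumption based only on the two lines quoted above, not on any documented oslo.config log format, and parse_logged_options is a hypothetical helper.

```python
import re

# Assumed shape, taken from the quoted lines:
#   ... [<request context>] <name> = <value> {{(pid=...) log_opt_values ...}}
OPT_RE = re.compile(
    r"\]\s+(?P<name>[A-Za-z0-9_.]+)\s+=\s+(?P<value>.*?)\s+\{\{")


def parse_logged_options(lines):
    """Return {option_name: value_string} for log_opt_values lines."""
    opts = {}
    for line in lines:
        if "log_opt_values" not in line:
            continue
        match = OPT_RE.search(line)
        if match:
            opts[match.group("name")] = match.group("value")
    return opts


SAMPLE = [
    "Aug 10 17:49:38.949627 ubuntu-xenial-rax-ord-10376969 nova-compute[20201]: "
    "DEBUG oslo_service.service [None req-e2b1deea-3fbd-4806-8dc4-fbfde2dbf770 "
    "None None] vif_plugging_is_fatal = True {{(pid=20201) log_opt_values "
    "/usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py:2875}}",
    "Aug 10 17:49:38.949950 ubuntu-xenial-rax-ord-10376969 nova-compute[20201]: "
    "DEBUG oslo_service.service [None req-e2b1deea-3fbd-4806-8dc4-fbfde2dbf770 "
    "None None] vif_plugging_timeout = 300 {{(pid=20201) log_opt_values "
    "/usr/local/lib/python2.7/dist-packages/oslo_config/cfg.py:2875}}",
]

if __name__ == "__main__":
    for name, value in sorted(parse_logged_options(SAMPLE).items()):
        print(name, "=", value)
```

Values are kept as strings, since the log does not carry type information.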

Revision history for this message
Matt Riedemann (mriedem) wrote :

Cells v1 is deprecated so this is marked as low severity.

Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/493076

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/493076
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9e2a0163d36fd0c2152b39a714e028752d70677b
Submitter: Jenkins
Branch: master

commit 9e2a0163d36fd0c2152b39a714e028752d70677b
Author: Matt Riedemann <email address hidden>
Date: Fri Aug 11 12:42:29 2017 -0400

    Skip test_rebuild_server_in_error_state for cells v1

    This test randomly fails due to a timeout in cells v1
    jobs and is a latent issue. Since cells v1 is deprecated
    and we aren't fixing latent bugs, let's just skip this.

    Change-Id: I386df03f406dd0f1847a0d091e070df7786f616e
    Related-Bug: #1709985
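The contents of the skip list are not shown in this report. As a rough sketch of how a regex-based skip can work, the snippet below builds a negative-lookahead pattern that selects every test ID except the listed ones. The helper names and the exact pattern shape are assumptions for illustration and may not match what the cells v1 job actually uses.

```python
import re

# Test IDs to skip, written as regex fragments (dots escaped).
SKIPPED = [
    r"tempest\.api\.compute\.admin\.test_servers\.ServersAdminTestJSON"
    r"\.test_rebuild_server_in_error_state",
]


def build_skip_regex(patterns):
    """Compile a regex matching any ID that contains none of the patterns."""
    return re.compile(r"^(?!.*(?:%s)).*$" % "|".join(patterns))


def select_tests(test_ids, regex):
    """Keep only the test IDs that the skip regex lets through."""
    return [test_id for test_id in test_ids if regex.match(test_id)]


if __name__ == "__main__":
    ids = [
        "tempest.api.compute.admin.test_servers.ServersAdminTestJSON."
        "test_rebuild_server_in_error_state"
        "[id-682cb127-e5bb-4f53-87ce-cb9003604442]",
        "tempest.api.compute.admin.test_servers.ServersAdminTestJSON."
        "test_list_servers_by_admin",
    ]
    for test_id in select_tests(ids, build_skip_regex(SKIPPED)):
        print(test_id)
```

Adding another racy test to the skip list then amounts to appending one more fragment to the alternation.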

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/496358

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/496359

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/newton)

Related fix proposed to branch: stable/newton
Review: https://review.openstack.org/496360

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/499001

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/pike)

Reviewed: https://review.openstack.org/496358
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=99ab4384887e1ae5b3e0a05caebf946e72952558
Submitter: Jenkins
Branch: stable/pike

commit 99ab4384887e1ae5b3e0a05caebf946e72952558
Author: Matt Riedemann <email address hidden>
Date: Fri Aug 11 12:42:29 2017 -0400

    Skip test_rebuild_server_in_error_state for cells v1

    This test randomly fails due to a timeout in cells v1
    jobs and is a latent issue. Since cells v1 is deprecated
    and we aren't fixing latent bugs, let's just skip this.

    Change-Id: I386df03f406dd0f1847a0d091e070df7786f616e
    Related-Bug: #1709985
    (cherry picked from commit 9e2a0163d36fd0c2152b39a714e028752d70677b)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/499001
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=16e874180eb6bdeba3bff22e06dfa2bc33bf6ca8
Submitter: Jenkins
Branch: master

commit 16e874180eb6bdeba3bff22e06dfa2bc33bf6ca8
Author: Matt Riedemann <email address hidden>
Date: Tue Aug 29 21:26:49 2017 -0400

    Skip more racy rebuild failing tests with cells v1

    These are other rebuild tests that randomly fail the
    cells v1 job waiting for state changes due to VIF
    races.

    Since cells v1 is deprecated, let's just skip this.

    Change-Id: Ia00015a8cbb930efd274830b69f296a257578700
    Related-Bug: #1709985

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/newton)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: stable/newton
Review: https://review.openstack.org/496360
Reason: newton-eol is tomorrow and the ocata change isn't merged yet, so might as well just abandon this.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/ocata)

Reviewed: https://review.openstack.org/496359
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1bfc26bd9c79c55d000e5ce59285c1d6e66dc31b
Submitter: Zuul
Branch: stable/ocata

commit 1bfc26bd9c79c55d000e5ce59285c1d6e66dc31b
Author: Matt Riedemann <email address hidden>
Date: Fri Aug 11 12:42:29 2017 -0400

    Skip test_rebuild_server_in_error_state for cells v1

    This test randomly fails due to a timeout in cells v1
    jobs and is a latent issue. Since cells v1 is deprecated
    and we aren't fixing latent bugs, let's just skip this.

    Change-Id: I386df03f406dd0f1847a0d091e070df7786f616e
    Related-Bug: #1709985
    (cherry picked from commit 9e2a0163d36fd0c2152b39a714e028752d70677b)

tags: added: in-stable-ocata
Matt Riedemann (mriedem)
Changed in nova:
status: Confirmed → Won't Fix
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/567256

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/567256
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=927b6ccced40a189ce9ee6b1486b54599b74c444
Submitter: Zuul
Branch: master

commit 927b6ccced40a189ce9ee6b1486b54599b74c444
Author: Matt Riedemann <email address hidden>
Date: Wed May 9 11:17:28 2018 -0400

    Skip ServerActionsTestJSON.test_rebuild_server for cells v1 job

    This is another occurrence of a rebuild test randomly timing out
    waiting for status changes in the cells v1 job, so blacklist it.
    Cells v1 is deprecated and should be gone soon anyway, so we don't
    need to waste time hitting stuff like this.

    Change-Id: Icb1d33c6e602467e21efe4838cb6edbadab14834
    Related-Bug: #1709985

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/569454

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/569454
Reason: This is actually caused by bug 1772088.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/569454
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=771a736818b5c64e3a5796e809f17cd759542ad4
Submitter: Zuul
Branch: master

commit 771a736818b5c64e3a5796e809f17cd759542ad4
Author: Matt Riedemann <email address hidden>
Date: Fri May 18 12:05:03 2018 -0400

    Skip ServerShowV254Test.test_rebuild_server in cells v1 job

    This adds yet another rebuild test to the blacklist for the
    cells v1 job. Rebuild status updates are racy in the cells v1
    job because of the async status updates.

    Change-Id: I42f08ee2a7282c9cad761bbe0daa111e79678791
    Related-Bug: #1709985

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/576194

Revision history for this message
Matt Riedemann (mriedem) wrote :
summary: test_rebuild_server_in_error_state randomly times out waiting for
- rebuilding instance to be active
+ rebuilding instance to be active (cells v1)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/578125

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/578125
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=3780335b5b63cc7ee5700c9eda2d168eb54470f8
Submitter: Zuul
Branch: master

commit 3780335b5b63cc7ee5700c9eda2d168eb54470f8
Author: Matt Riedemann <email address hidden>
Date: Tue Jun 26 11:02:07 2018 -0400

    Skip ServerShowV247Test.test_update_rebuild_list_server in nova-cells-v1 job

    Another rebuild test that intermittently fails the cells v1
    job because of latent races with status changes in cells v1.

    Change-Id: Ic422a5d7ac795e6e6882f1f0ad82022a7bd42229
    Related-Bug: #1709985

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/581717

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/581717
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=4a1b08365c1f4a0c69ba68beb46f237c9032d837
Submitter: Zuul
Branch: master

commit 4a1b08365c1f4a0c69ba68beb46f237c9032d837
Author: Matt Riedemann <email address hidden>
Date: Wed Jul 11 08:48:12 2018 -0400

    Skip more rebuild tests for cells v1 job

    This skips a couple more tempest rebuild tests for
    latent intermittent rebuild race failures
    due to status sync delays with cells v1.

    Change-Id: Ib2dcbba7f447f54c36877a4e7c29d1b6839a0a80
    Related-Bug: #1709985

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/605115

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/605270

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/605271

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/605416

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/queens)

Reviewed: https://review.openstack.org/605115
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c223869a22f8e3290e31ed059ceac006e8b03ed8
Submitter: Zuul
Branch: stable/queens

commit c223869a22f8e3290e31ed059ceac006e8b03ed8
Author: Matt Riedemann <email address hidden>
Date: Wed May 9 11:17:28 2018 -0400

    Skip ServerActionsTestJSON.test_rebuild_server for cells v1 job

    This is another occurrence of a rebuild test randomly timing out
    waiting for status changes in the cells v1 job, so blacklist it.
    Cells v1 is deprecated and should be gone soon anyway, so we don't
    need to waste time hitting stuff like this.

    Change-Id: Icb1d33c6e602467e21efe4838cb6edbadab14834
    Related-Bug: #1709985
    (cherry picked from commit 927b6ccced40a189ce9ee6b1486b54599b74c444)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/605270
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=9a72b720859ccea305db17cf9b8047300aba604e
Submitter: Zuul
Branch: stable/queens

commit 9a72b720859ccea305db17cf9b8047300aba604e
Author: Matt Riedemann <email address hidden>
Date: Fri May 18 12:05:03 2018 -0400

    Skip ServerShowV254Test.test_rebuild_server in cells v1 job

    This adds yet another rebuild test to the blacklist for the
    cells v1 job. Rebuild status updates are racy in the cells v1
    job because of the async status updates.

    Change-Id: I42f08ee2a7282c9cad761bbe0daa111e79678791
    Related-Bug: #1709985
    (cherry picked from commit 771a736818b5c64e3a5796e809f17cd759542ad4)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/605271
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=7bd6220f0b2c3a9d65dd701ba2a153fe55896066
Submitter: Zuul
Branch: stable/queens

commit 7bd6220f0b2c3a9d65dd701ba2a153fe55896066
Author: Matt Riedemann <email address hidden>
Date: Tue Jun 26 11:02:07 2018 -0400

    Skip ServerShowV247Test.test_update_rebuild_list_server in nova-cells-v1 job

    Another rebuild test that intermittently fails the cells v1
    job because of latent races with status changes in cells v1.

    Change-Id: Ic422a5d7ac795e6e6882f1f0ad82022a7bd42229
    Related-Bug: #1709985
    (cherry picked from commit 3780335b5b63cc7ee5700c9eda2d168eb54470f8)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/605416
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2e3912b14a14ab858c7aad06865d1d3b452922c0
Submitter: Zuul
Branch: stable/queens

commit 2e3912b14a14ab858c7aad06865d1d3b452922c0
Author: Matt Riedemann <email address hidden>
Date: Wed Jul 11 08:48:12 2018 -0400

    Skip more rebuild tests for cells v1 job

    This skips a couple more tempest rebuild tests for
    latent intermittent rebuild race failures
    due to status sync delays with cells v1.

    Conflicts:
          devstack/tempest-dsvm-cells-rc

    NOTE(mriedem): The conflict is due to not having change
    Iff89b9714e2413716bf87db6f0d773787c06eda3 in Queens
    and is not needed because the 2.63 microversion was not
    available nor tested in Queens.

    Change-Id: Ib2dcbba7f447f54c36877a4e7c29d1b6839a0a80
    Related-Bug: #1709985
    (cherry picked from commit 4a1b08365c1f4a0c69ba68beb46f237c9032d837)
