test_tagged_attachment failing close to 100% in nova-next

Bug #1959899 reported by Artom Lifshitz
Affects: OpenStack Compute (nova)
Status: In Progress
Importance: High
Assigned to: Jorge San Emeterio

Bug Description

The test times out verifying the tags in the metadata API after attaching a NIC and volume.

This is happening almost 100% of the time, but I've seen at least a couple of passing runs as well. We have yet to determine the exact failure mode: is the metadata API timing out (unlikely, since every other test passes), or are the expected tags just not present (more likely)?

Balazs Gibizer (balazs-gibizer) wrote :

We have a heavy failure rate in nova-next, see https://zuul.opendev.org/t/openstack/builds?job_name=nova-next&project=openstack/nova so I'm setting this to Critical.

tags: added: gate-failure
Changed in nova:
importance: Undecided → High
status: New → Confirmed
importance: High → Critical
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/827661

Balazs Gibizer (balazs-gibizer) wrote :

We are testing in two directions:

1) adding waiters between the attach and the metadata query
https://review.opendev.org/c/openstack/nova/+/827549
https://review.opendev.org/c/openstack/tempest/+/827548

2) simply logging what the metadata query returned
https://review.opendev.org/c/openstack/nova/+/827661
https://review.opendev.org/c/openstack/tempest/+/827659
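
Approach 1 amounts to polling until the expected tags appear instead of querying the metadata once. A minimal sketch of such a waiter follows; it is not the code from the linked reviews. The names `wait_for_tags` and `fetch_tags` and the timeout values are hypothetical, chosen only for illustration.

```python
import time


def wait_for_tags(fetch_tags, expected, timeout=60.0, interval=2.0,
                  sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_tags() until it returns the expected tags or time runs out.

    fetch_tags is a hypothetical callable returning the device tags that are
    currently visible in the metadata. It is injected so the loop can be
    exercised without a real metadata service.
    """
    wanted = sorted(expected)
    deadline = clock() + timeout
    while True:
        last_seen = sorted(fetch_tags())
        if last_seen == wanted:
            return last_seen
        if clock() >= deadline:
            # Report what was actually returned (approach 2) instead of
            # failing with a bare timeout.
            raise TimeoutError(f"tags {last_seen} != expected {wanted}")
        sleep(interval)
```

Including the last observed tags in the error makes it possible to tell a slow metadata update apart from tags that never show up.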

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/827851

Balazs Gibizer (balazs-gibizer) wrote :

Approach 2) reproduced the issue. It seems that the nic-tag is missing from the metadata [1]:

[tempest.api.compute.servers.test_device_tagging] Failed to parse metadata: ['volume-tag'] != ['nic-tag', 'volume-tag']

[1] https://009a9b48abdcdef7d56d-ee17e3910737ace57c03e87f1c399184.ssl.cf5.rackcdn.com/827661/1/check/nova-next/180635f/testr_results.html
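
The log line above is a comparison between the tags found in the metadata and the tags the test expects. A minimal sketch of that check follows, assuming the metadata document exposes a `devices` list whose entries carry a `tags` list. The function name `missing_tags` is hypothetical and the sample document is invented for illustration; it is not taken from the failing run.

```python
import json


def missing_tags(metadata_json, expected):
    """Return the expected tags that are absent from the metadata document."""
    metadata = json.loads(metadata_json)
    found = set()
    # Assumed layout: each entry under "devices" may carry a "tags" list.
    for device in metadata.get("devices", []):
        found.update(device.get("tags", []))
    return sorted(set(expected) - found)


# Invented sample resembling the failure: only the volume carries a tag.
sample = json.dumps({
    "devices": [
        {"type": "disk", "tags": ["volume-tag"]},
    ]
})
print(missing_tags(sample, ["nic-tag", "volume-tag"]))  # ['nic-tag']
```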

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/827851
Committed: https://opendev.org/openstack/nova/commit/b00ce99dd456ab701cef0a9d4429920834d3d840
Submitter: "Zuul (22348)"
Branch: master

commit b00ce99dd456ab701cef0a9d4429920834d3d840
Author: Sean Mooney <email address hidden>
Date: Fri Feb 4 12:28:10 2022 +0000

    skip test_tagged_attachment in nova-next

    This change adds
    tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment
    to the tempest exclude regex

    over the past few weeks we have noticed this test failing intermitently
    and it has not started to become a gate blocker. This test is executed in other
    jobs that use the PC machine type and is only failing in the nova-next
    job which uses q35. As such while we work out how to address this properly
    we skip it in the nova-next.

    Change-Id: I845ca5989a8ad84d7c04971316fd892cd29cfe1f
    Related-Bug: #1959899
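
For readers unfamiliar with exclude patterns, here is a minimal sketch of how a regex can filter a test ID out of a run. It only illustrates the idea and does not reproduce how Zuul or tempest apply `tempest_exclude_regex`. The helper `select_tests` and the second test ID are hypothetical.

```python
import re

# The test ID named in the commit message; dots are escaped so that they
# match literally.
EXCLUDE_REGEX = re.compile(
    r"tempest\.api\.compute\.servers\.test_device_tagging"
    r"\.TaggedAttachmentsTest\.test_tagged_attachment"
)


def select_tests(test_ids, exclude=EXCLUDE_REGEX):
    """Keep only the test IDs that the exclude pattern does not match."""
    return [t for t in test_ids if not exclude.search(t)]


candidates = [
    "tempest.api.compute.servers.test_device_tagging"
    ".TaggedAttachmentsTest.test_tagged_attachment",
    # Hypothetical second ID, used only to show a test that is kept.
    "tempest.api.compute.servers.test_example.ExampleTest.test_kept",
]
print(select_tests(candidates))
```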

Sylvain Bauza (sylvain-bauza) wrote :

Setting the importance to High: the CI is no longer blocked now that we skip the related test, but we still need to investigate this further.

Changed in nova:
importance: Critical → High
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/xena)

Related fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/828542

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/nova/+/828542
Committed: https://opendev.org/openstack/nova/commit/70f75ac98146a3f7804074c188d125d4c5338f46
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 70f75ac98146a3f7804074c188d125d4c5338f46
Author: Sean Mooney <email address hidden>
Date: Fri Feb 4 12:28:10 2022 +0000

    skip test_tagged_attachment in nova-next

    This change adds
    tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment
    to the tempest exclude regex

    over the past few weeks we have noticed this test failing intermitently
    and it has not started to become a gate blocker. This test is executed in other
    jobs that use the PC machine type and is only failing in the nova-next
    job which uses q35. As such while we work out how to address this properly
    we skip it in the nova-next.

    Conflicts in .zuul.yaml because 5d2f2da0afa changed the same
    tempest_exclude_regex and is not in xena.

    Change-Id: I845ca5989a8ad84d7c04971316fd892cd29cfe1f
    Related-Bug: #1959899
    (cherry picked from commit b00ce99dd456ab701cef0a9d4429920834d3d840)

tags: added: in-stable-xena
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to nova (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/830656

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/830656
Committed: https://opendev.org/openstack/nova/commit/5b7cb876ba044bc1e3f5d70ac16b67f03d13a498
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 5b7cb876ba044bc1e3f5d70ac16b67f03d13a498
Author: Sean Mooney <email address hidden>
Date: Fri Feb 4 12:28:10 2022 +0000

    skip test_tagged_attachment in nova-next

    This change adds
    tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment
    to the tempest exclude regex

    over the past few weeks we have noticed this test failing intermitently
    and it has not started to become a gate blocker. This test is executed in other
    jobs that use the PC machine type and is only failing in the nova-next
    job which uses q35. As such while we work out how to address this properly
    we skip it in the nova-next.

    Conflicts in .zuul.yaml because 5d2f2da0afa changed the same
    tempest_exclude_regex and is not in xena.

    Conflicts:
      .zuul.yaml

    NOTE(elod.illes): conflict is due to not having the following patches
    in stable/wallaby: I4f2d01a4cf443f9c539158e77032cd3d8ce24ad7 and
    Ib56a034fb08e309981d0b4553b8cee8d16b10152.

    Change-Id: I845ca5989a8ad84d7c04971316fd892cd29cfe1f
    Related-Bug: #1959899
    (cherry picked from commit b00ce99dd456ab701cef0a9d4429920834d3d840)
    (cherry picked from commit 70f75ac98146a3f7804074c188d125d4c5338f46)

tags: added: in-stable-wallaby
Changed in nova:
assignee: nobody → Jorge San Emeterio (jsanemet)
Jorge San Emeterio (jsanemet) wrote :

I have tried to reproduce the bug in https://review.opendev.org/c/openstack/nova/+/876699 but have not managed to get the test to fail after three tries. Only the last build fails, and it's due to other reasons.

Artom Lifshitz (notartom) wrote :

There's a good chance this got fixed as a side effect of the changes under https://review.opendev.org/q/topic:wait_until_sshable_pingable so I'll let Jorge investigate, confirm, and propose the un-skip patch.

Changed in nova:
status: Confirmed → In Progress