ussuri | master fs020 failing on TestNetworkAdvancedServerOps

Bug #1881642 reported by Rafael Folco
This bug affects 3 people
Affects: tripleo
Status: Expired
Importance: Critical
Assigned to: Unassigned

Bug Description

https://review.rdoproject.org/zuul/builds?pipeline=openstack-periodic-latest-released&job_name=periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-ussuri

last 4 results failed on
tempest.scenario.test_network_advanced_server_ops.TestNetworkAdvancedServerOps

 * test_server_connectivity_cold_migration[compute,id-a4858f6c-401e-4155-9a49-d5cd053d1a2f,network,slow]
 * test_server_connectivity_resize[compute,id-719eb59d-2f42-4b66-b8b1-bb1254473967,network,slow]

one of these also failed on
tempest.scenario.test_volume_boot_pattern.TestVolumeBootPattern
 * test_volume_boot_pattern[compute,id-557cd2c2-4eb8-4dce-98be-f86765ff311b,image,slow,volume]

https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-ussuri/10d4b2a/logs/undercloud/var/log/tempest/stestr_results.html.gz

https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-ussuri/af70009/logs/undercloud/var/log/tempest/stestr_results.html.gz

https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-ussuri/debf777/logs/undercloud/var/log/tempest/stestr_results.html.gz

https://logserver.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-1ctlr_2comp-featureset020-ussuri/03da72f/logs/undercloud/var/log/tempest/stestr_results.html.gz

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python3.6/site-packages/tempest/scenario/test_network_advanced_server_ops.py", line 227, in test_server_connectivity_cold_migration
    'VERIFY_RESIZE')
  File "/usr/lib/python3.6/site-packages/tempest/common/waiters.py", line 96, in wait_for_server_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (TestNetworkAdvancedServerOps:test_server_connectivity_cold_migration) Server cbf9dd10-ceef-422c-926d-77beb95857c8 failed to reach VERIFY_RESIZE status and task state "None" within the required time (300 s). Current status: ACTIVE. Current task state: None.

Revision history for this message
Rafael Folco (rafaelfolco) wrote :
summary: - ussuri tempest failures on fs020
+ ussuri | master fs020 failing on TestNetworkAdvancedServerOps
tags: added: promotion-blocker
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/735659

Revision history for this message
sean mooney (sean-k-mooney) wrote :

So the nova error we are seeing is
http://paste.openstack.org/show/795051/

but that is not a nova bug; it's a deployment bug.
For some reason the nova migration ssh container seems to think that
the system is not fully booted.

I have not looked into why that is, because I don't really understand how PAM is configured in that container, but that seems to be the root cause of why the instance can't resize.

As part of the migration we ssh to the other node to perform a number of commands.

In this case we were creating the instance dir:
ssh -o BatchMode=yes 172.17.0.241 mkdir -p /var/lib/nova/instances/8ae0f4f2-48ee-46c7-807c-16c8fc2f707e

If you resolve that, the failure should go away.
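One way to confirm this diagnosis is to rerun the same pre-check step by hand from the source compute node. A sketch, assuming the target IP from this job (it will differ in other deployments) and that the nova user's migration key is already configured:

```shell
# Reproduce nova's migration ssh hop manually from the source compute.
# If this hop is refused, the resize/cold-migration will time out
# exactly as in the traceback above.
ssh -o BatchMode=yes 172.17.0.241 /bin/true \
  && echo "migration ssh path OK" \
  || echo "migration ssh path refused -- check PAM on the target"
```

BatchMode=yes disables interactive prompts, so the command fails fast instead of hanging on a password prompt, which is what nova itself relies on.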

Revision history for this message
sean mooney (sean-k-mooney) wrote :
Revision history for this message
Piotr Kopec (pkopec) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by Rafael Folco (<email address hidden>) on branch: master
Review: https://review.opendev.org/735659
Reason: new ruck/rovers will decide what to do

Revision history for this message
Harry Kominos (hkominos) wrote :

I am nearly convinced that I am seeing the same behaviour in my ussuri deployment, breaking live migration.

It seems to me that the connection attempt below

virsh -c qemu+ssh://<email address hidden>:2022/system?keyfile=/etc/nova/migration/identity

fails when it should not.

On the target machine, though, the secure log shows:

compute-0 sshd[28]: fatal: Access denied for user nova_migration by PAM account configuration [preauth]

Revision history for this message
Harry Kominos (hkominos) wrote :

There is a /run/nologin file in the nova_migration_target container.
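That file would explain the PAM denial above: pam_nologin(8) refuses every non-root login for as long as the nologin marker exists, which matches "Access denied for user nova_migration by PAM account configuration". A minimal local demonstration of the trigger condition (the temp file here is a stand-in; the real marker is /run/nologin inside the nova_migration_target container):

```shell
# pam_nologin denies non-root users whenever the nologin marker exists.
# Simulate the condition with a temporary stand-in file:
marker=$(mktemp)            # stand-in for /run/nologin
if [ -e "$marker" ]; then
  echo "marker present: pam_nologin would refuse user nova_migration"
fi
rm -f "$marker"
```

Removing /run/nologin inside the container (or fixing whatever leaves it behind at boot, e.g. systemd never marking the boot complete) should let the nova_migration ssh session through.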

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Rafael Folco (<email address hidden>) on branch: master
Review: https://review.opendev.org/735659
Reason: let the ruck/rovers decide, this became out of date

Changed in tripleo:
milestone: victoria-1 → victoria-3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Rafael Folco (<email address hidden>) on branch: master
Review: https://review.opendev.org/735659

Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Changed in tripleo:
milestone: wallaby-3 → wallaby-rc1
Changed in tripleo:
milestone: wallaby-rc1 → xena-1
Changed in tripleo:
milestone: xena-1 → xena-2
Revision history for this message
Marios Andreou (marios-b) wrote :

This is an automated action. Bug status has been set to 'Incomplete' and target milestone has been removed due to inactivity. If you disagree please re-set these values and reach out to us on freenode #tripleo

Changed in tripleo:
milestone: xena-2 → none
status: Triaged → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for tripleo because there has been no activity for 60 days.]

Changed in tripleo:
status: Incomplete → Expired