Multiple volume-related tests failed with "iscsiadm: No active sessions" and VolumeDeviceNotFound in nova-compute

Bug #1861393 reported by chandan kumar
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
tripleo | Fix Released | Critical | Alex Schultz |

Bug Description

Multiple volume-related tempest tests failed on fs020. They were skipped there in review https://review.opendev.org/#/c/701403/ and are now triggered in fs021 to find and debug the issue.

For example:
{0} tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment [440.572185s] ... FAILED

{1} tempest.api.compute.volumes.test_attach_volume.AttachVolumeShelveTestJSON.test_attach_volume_shelved_or_offload_server [673.744471s] ... FAILED

Captured traceback-2:
~~~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/tempest/common/waiters.py", line 215, in wait_for_volume_resource_status
        raise lib_exc.TimeoutException(message)
    tempest.lib.exceptions.TimeoutException: Request timed out
    Details: volume 574c8e12-e5b6-4221-9a31-db6097f7a476 failed to reach available status (current reserved) within the required time (300 s).

Captured traceback:
~~~~~~~~~~~~~~~~~~~
    Traceback (most recent call last):
      File "/usr/lib/python2.7/site-packages/tempest/api/compute/volumes/test_attach_volume.py", line 255, in test_attach_volume_shelved_or_offload_server
        server, validation_resources, num_vol + 1)
      File "/usr/lib/python2.7/site-packages/tempest/api/compute/volumes/test_attach_volume.py", line 234, in _unshelve_server_and_check_volumes
        'ACTIVE')
      File "/usr/lib/python2.7/site-packages/tempest/common/waiters.py", line 96, in wait_for_server_status
        raise lib_exc.TimeoutException(message)
    tempest.lib.exceptions.TimeoutException: Request timed out
    Details: (AttachVolumeShelveTestJSON:test_attach_volume_shelved_or_offload_server) Server ccf85cec-e8c3-4b1a-8e15-6270e2e64385 failed to reach ACTIVE status and task state "None" within the required time (300 s). Current status: SHELVED_OFFLOADED. Current task state: spawning.

Other tests that failed:
* tempest.api.compute.servers.test_device_tagging.TaggedAttachmentsTest.test_tagged_attachment
* tempest.api.compute.volumes.test_attach_volume.AttachVolumeShelveTestJSON.test_attach_volume_shelved_or_offload_server
* tempest.api.compute.volumes.test_attach_volume_negative.AttachVolumeNegativeTest.test_attach_attached_volume_to_different_server
* tempest.api.compute.admin.test_volumes_negative.VolumesAdminNegativeTest.test_update_attached_volume_with_nonexistent_volume_in_body
* tempest.api.compute.volumes.test_attach_volume_negative.AttachVolumeNegativeTest.test_attach_attached_volume_to_same_server
* tempest.api.compute.volumes.test_attach_volume_negative.AttachVolumeNegativeTest.test_delete_attached_volume
* tempest.api.compute.servers.test_delete_server.DeleteServersTestJSON.test_delete_server_while_in_attached_volume
* tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario
* tempest.scenario.test_stamp_pattern.TestStampPattern.test_stamp_pattern
* tempest.api.compute.servers.test_server_actions.ServerActionsTestJSON.test_rebuild_server_with_volume_attached

And many others

The nova-compute log at http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset021-master/9cad320/logs/overcloud-novacompute-0/var/log/containers/nova/nova-compute.log shows:

2020-01-27 11:47:07.505 8 DEBUG os_brick.initiator.connectors.iscsi [req-576d1a32-ba56-4e5b-ade7-b7d8b733c1d2 89cf2d21771a4690ae41516837ff07eb 9943b7e169624779a7077676dc58e1ad - default default] iscsi session list stdout= stderr=iscsiadm: No active sessions.
 _run_iscsi_session /usr/lib/python2.7/site-packages/os_brick/initiator/connectors/iscsi.py:1113
2020-01-27 11:47:07.505 8 WARNING os_brick.initiator.connectors.iscsi [req-576d1a32-ba56-4e5b-ade7-b7d8b733c1d2 89cf2d21771a4690ae41516837ff07eb 9943b7e169624779a7077676dc58e1ad - default default] iscsiadm stderr output when getting sessions: iscsiadm: No active sessions.

2020-01-27 11:47:07.889 8 DEBUG nova.compute.provider_tree [req-00a1af11-5fbc-4deb-99c8-27b44afab732 785ce0b818f24cf8b2d43f3b40abb9d3 1ceecda10db945e48e69aedb2f0417c2 - default default] Inventory has not changed in ProviderTree for provider: 47fe23d0-af6a-4528-a394-4ef122e47656 update_inventory /usr/lib/python2.7/site-packages/nova/compute/provider_tree.py:181
2020-01-27 11:47:07.893 8 DEBUG nova.virt.libvirt.driver [req-00a1af11-5fbc-4deb-99c8-27b44afab732 785ce0b818f24cf8b2d43f3b40abb9d3 1ceecda10db945e48e69aedb2f0417c2 - default default] Libvirt baseline CPU <cpu>
  <arch>x86_64</arch>
  <model>qemu64</model>
  <vendor>Intel</vendor>
  <topology sockets="4" cores="1" threads="1"/>
</cpu>
 _get_guest_baseline_cpu_features /usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py:10532
2020-01-27 11:47:07.919 8 DEBUG nova.scheduler.client.report [req-00a1af11-5fbc-4deb-99c8-27b44afab732 785ce0b818f24cf8b2d43f3b40abb9d3 1ceecda10db945e48e69aedb2f0417c2 - default default] Inventory has not changed for provider 47fe23d0-af6a-4528-a394-4ef122e47656 based on inventory data: {u'VCPU': {u'allocation_ratio': 16.0, u'total': 4, u'reserved': 0, u'step_size': 1, u'min_unit': 1, u'max_unit': 4}, u'MEMORY_MB': {u'allocation_ratio': 1.0, u'total': 8191, u'reserved': 512, u'step_size': 1, u'min_unit': 1, u'max_unit': 8191}, u'DISK_GB': {u'allocation_ratio': 1.0, u'total': 79, u'reserved': 0, u'step_size': 1, u'min_unit': 1, u'max_unit': 79}} set_inventory_for_provider /usr/lib/python2.7/site-packages/nova/scheduler/client/report.py:897
2020-01-27 11:47:07.920 8 DEBUG oslo_concurrency.lockutils [req-00a1af11-5fbc-4deb-99c8-27b44afab732 785ce0b818f24cf8b2d43f3b40abb9d3 1ceecda10db945e48e69aedb2f0417c2 - default default] Lock "compute_resources" released by "nova.compute.resource_tracker.abort_instance_claim" :: held 0.471s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:370
2020-01-27 11:47:07.921 8 ERROR nova.compute.manager [req-00a1af11-5fbc-4deb-99c8-27b44afab732 785ce0b818f24cf8b2d43f3b40abb9d3 1ceecda10db945e48e69aedb2f0417c2 - default default] [instance: ccf85cec-e8c3-4b1a-8e15-6270e2e64385] Instance failed to spawn: VolumeDeviceNotFound: Volume device not found at .
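
To check whether other job logs show the same two signatures, a small scan along these lines may help. This is only a sketch: the function name `scan_log` and the shortened sample lines are made up for illustration, and the patterns are simply the two strings visible in the excerpt above.

```python
import re

# Failure signatures copied from the nova-compute log excerpt above.
PATTERNS = {
    "no_iscsi_sessions": re.compile(r"iscsiadm: No active sessions"),
    "volume_device_not_found": re.compile(r"VolumeDeviceNotFound"),
}


def scan_log(lines):
    """Return {signature_name: [1-based line numbers]} for matching lines."""
    hits = {name: [] for name in PATTERNS}
    for number, line in enumerate(lines, start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits[name].append(number)
    return hits


# Shortened, illustrative lines; a real run would read nova-compute.log.
sample = [
    "WARNING os_brick.initiator.connectors.iscsi iscsiadm stderr output "
    "when getting sessions: iscsiadm: No active sessions.",
    "DEBUG nova.compute.provider_tree Inventory has not changed",
    "ERROR nova.compute.manager Instance failed to spawn: "
    "VolumeDeviceNotFound: Volume device not found at .",
]

print(scan_log(sample))
```

For a real log, the same function can be fed an open file object, since it only iterates over lines.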

This may be related to tripleo_iscsi, which listens on port 3260.

The firewall rules on the controller (http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset021-master/9cad320/logs/overcloud-controller-0/etc/sysconfig/iptables) contain no rule for port 3260.
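
The same check can be scripted against a saved copy of /etc/sysconfig/iptables. The sketch below assumes iptables-save style lines with numeric `--dport` or `--dports` values; the helper names and the sample rules are hypothetical and are not taken from the job logs.

```python
def _port_ranges(spec):
    """Yield (low, high) pairs from a spec such as '22' or '3306,4567:4568'."""
    for part in spec.split(","):
        low, _, high = part.partition(":")
        high = high or low
        # Non-numeric specs (service names) are skipped in this sketch.
        if low.isdigit() and high.isdigit():
            yield int(low), int(high)


def has_accept_rule_for_port(rules_text, port):
    """True if any appended ACCEPT rule covers the given destination port."""
    for line in rules_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "-A" or "ACCEPT" not in tokens:
            continue
        for flag in ("--dport", "--dports"):
            # Only look at flags that are followed by a value.
            if flag in tokens[:-1]:
                spec = tokens[tokens.index(flag) + 1]
                if any(low <= port <= high for low, high in _port_ranges(spec)):
                    return True
    return False


# Hypothetical rules for illustration only.
SAMPLE_RULES = """\
*filter
:INPUT ACCEPT [0:0]
-A INPUT -p tcp -m multiport --dports 22 -m state --state NEW -j ACCEPT
-A INPUT -p tcp -m multiport --dports 3306,4567:4568 -m state --state NEW -j ACCEPT
-A INPUT -j DROP
COMMIT
"""

print(has_accept_rule_for_port(SAMPLE_RULES, 3260))
```

The sketch ignores protocol, source and interface matches, so a positive result only means that some ACCEPT rule names the port.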

Changed in tripleo:
assignee: nobody → yatin (yatinkarel)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart-extras (master)

Fix proposed to branch: master
Review: https://review.opendev.org/704962

Changed in tripleo:
assignee: yatin (yatinkarel) → chandan kumar (chkumar246)
Changed in tripleo:
assignee: chandan kumar (chkumar246) → yatin (yatinkarel)
Changed in tripleo:
assignee: yatin (yatinkarel) → chandan kumar (chkumar246)
Changed in tripleo:
assignee: chandan kumar (chkumar246) → Alex Schultz (alex-schultz)
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/704805
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=45337bdb0f971c2bf33bfc542e1e478f396e791b
Submitter: Zuul
Branch: master

commit 45337bdb0f971c2bf33bfc542e1e478f396e791b
Author: yatinkarel <email address hidden>
Date: Wed Jan 29 19:05:50 2020 +0530

    Add missing firewall rule for iscsid in HA deployments

    After https://review.opendev.org/#/c/677237/ iscsid firewall
    rules are not applied as now rules are applied with ansible
    and these rules are collected like below:-
    ($.data.role_data, []).where($ != null).select($.get('firewall_rules').

    To get the rules applied for cinder-volume-pacemaker these
    rules need to be defined in role_data for it, reusing
    already defined rules from CinderBase resource.

    Depends-On: https://review.opendev.org/#/c/705051/
    Closes-Bug: #1861393
    Change-Id: I8cb54b94fa2c011df67696497eeaec4d022080cc
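
The effect described in the commit message above can be illustrated with a rough Python analogue of the rule collection. This is a sketch under assumptions, not the tripleo-heat-templates code: the function name, the merge into a single dict, the `service_name` key and the rule name are all hypothetical; only the `firewall_rules` key, the null filter and port 3260 come from this report.

```python
def collect_firewall_rules(role_data_list):
    """Merge 'firewall_rules' from every non-null role_data entry."""
    collected = {}
    for role_data in role_data_list or []:
        if role_data is None:
            continue
        # Entries that do not define firewall_rules contribute nothing.
        collected.update(role_data.get("firewall_rules") or {})
    return collected


# Hypothetical rule; the name is illustrative only.
ISCSI_RULE = {"iscsi (illustrative)": {"dport": 3260}}

# A service whose role_data lacks firewall_rules yields no rule.
before_fix = [{"service_name": "cinder_volume"}, None]
# Once the rules are defined in role_data, they are collected.
after_fix = [{"service_name": "cinder_volume", "firewall_rules": ISCSI_RULE}, None]

print(collect_firewall_rules(before_fix))
print(collect_firewall_rules(after_fix))
```

Under these assumptions, a service that relies on rules defined elsewhere ends up with no firewall rule at all, which matches the missing port 3260 rule seen on the controller.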

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/704962
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=a49edd82c483fa01777803f5a1f4c1d9a5e7bec3
Submitter: Zuul
Branch: master

commit a49edd82c483fa01777803f5a1f4c1d9a5e7bec3
Author: Chandan Kumar (raukadah) <email address hidden>
Date: Thu Jan 30 11:15:47 2020 +0000

    Revert "update master skip list, nova issues and timeout"

    # cinder-fix on tht
    Depends-On: https://review.opendev.org/#/c/704805/

    # ssh known host fix on tripleo-ansible
    Depends-On: https://review.opendev.org/#/c/704880/

    # Adding tripleo_role_networks on tripleo-common
    Depends-On: https://review.opendev.org/#/c/704919/

    Closes-Bug: #1861393
    Closes-Bug: #1861296

    This reverts commit 4a1526f8d48d5da7f812a70e7d2d407f7650380b.

    Change-Id: I888431e5d60591acb725a3e18ac133fa0cba496d

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 12.1.0

This issue was fixed in the openstack/tripleo-heat-templates 12.1.0 release.
