periodic-tripleo-ci-centos-8-standalone-full-tempest-api-compute-master tempest test_shelve_unshelve_server failing in component-pipeline

Bug #1893618 reported by Arx Cruz
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
High
Unassigned
Train
Fix Released
Undecided
Unassigned
Ussuri
Fix Released
Undecided
Unassigned
tripleo
Fix Released
Critical
Unassigned

Bug Description

https://logserver.rdoproject.org/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-full-tempest-api-compute-master/b346467/logs/undercloud/var/log/tempest/stestr_results.html.gz

traceback-1: {{{
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/api/compute/servers/test_server_actions.py", line 66, in tearDown
    self.server_check_teardown()
  File "/usr/lib/python3.6/site-packages/tempest/api/compute/base.py", line 220, in server_check_teardown
    cls.server_id, 'ACTIVE')
  File "/usr/lib/python3.6/site-packages/tempest/common/waiters.py", line 96, in wait_for_server_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (ServerActionsTestJSON:tearDown) Server 41f15309-34bb-430d-8dad-7b9c8362a851 failed to reach ACTIVE status and task state "None" within the required time (300 s). Current status: SHELVED_OFFLOADED. Current task state: None.
}}}

traceback-2: {{{
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/api/compute/servers/test_server_actions.py", line 649, in _unshelve_server
    server_info = self.client.show_server(self.server_id)['server']
  File "/usr/lib/python3.6/site-packages/tempest/lib/services/compute/servers_client.py", line 145, in show_server
    resp, body = self.get("servers/%s" % server_id)
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 313, in get
    return self.request('GET', url, extra_headers, headers)
  File "/usr/lib/python3.6/site-packages/tempest/lib/services/compute/base_compute_client.py", line 48, in request
    method, url, extra_headers, headers, body, chunked)
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 702, in request
    self._error_checker(resp, resp_body)
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 808, in _error_checker
    raise exceptions.NotFound(resp_body, resp=resp)
tempest.lib.exceptions.NotFound: Object not found
Details: {'code': 404, 'message': 'Instance None could not be found.'}
}}}

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/common/utils/__init__.py", line 89, in wrapper
    return f(*func_args, **func_kwargs)
  File "/usr/lib/python3.6/site-packages/tempest/api/compute/servers/test_server_actions.py", line 666, in test_shelve_unshelve_server
    waiters.wait_for_server_status(self.client, self.server_id, 'ACTIVE')
  File "/usr/lib/python3.6/site-packages/tempest/common/waiters.py", line 96, in wait_for_server_status
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (ServerActionsTestJSON:test_shelve_unshelve_server) Server 41f15309-34bb-430d-8dad-7b9c8362a851 failed to reach ACTIVE status and task state "None" within the required time (300 s). Current status: SHELVED_OFFLOADED. Current task state: None.
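The timeout above comes from tempest's generic status waiter: the test unshelves the server and then polls until it goes ACTIVE, which never happens because unshelve fails on the compute node. A minimal sketch of the polling loop (simplified; the parameter names and the plain `TimeoutError` are illustrative, not tempest's exact signature):

```python
import time

def wait_for_server_status(show_server, server_id, status,
                           timeout=300, interval=1):
    # Simplified model of tempest.common.waiters.wait_for_server_status:
    # poll the server until it reaches the requested status, or raise
    # (tempest raises lib_exc.TimeoutException) once the timeout elapses.
    start = time.time()
    while True:
        server = show_server(server_id)
        if server['status'] == status:
            return server
        if time.time() - start > timeout:
            raise TimeoutError(
                'Server %s failed to reach %s status within the '
                'required time (%s s). Current status: %s.'
                % (server_id, status, timeout, server['status']))
        time.sleep(interval)
```

In this failure the server is stuck in SHELVED_OFFLOADED, so every poll misses ACTIVE and the waiter raises after 300 s, which is exactly the message in the traceback.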

Traceback in the nova-compute logs, https://logserver.rdoproject.org/openstack-component-compute/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-standalone-full-tempest-api-compute-master/b346467/logs/undercloud/var/log/containers/nova/nova-compute.log.1.gz:

2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server [req-9280bac1-da23-4f45-b01c-b6012198d97e 10fe2caa6924408485c181adfc7377e8 df52aad2e4da4f07b4b7b4ff6644e121 - default default] Exception during message handling: AttributeError: 'NoneType' object has no attribute 'encode'
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 273, in dispatch
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/dispatcher.py", line 193, in _do_dispatch
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 78, in wrapped
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server function_name, call_dict, binary)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server raise value
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/exception_wrapper.py", line 69, in wrapped
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server return f(self, context, *args, **kw)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 189, in decorated_function
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server "Error: %s", e, instance=instance)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server raise value
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 159, in decorated_function
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/utils.py", line 1440, in decorated_function
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 217, in decorated_function
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server kwargs['instance'], e, sys.exc_info())
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server raise value
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 205, in decorated_function
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server return function(self, context, *args, **kwargs)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 6539, in unshelve_instance
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server do_unshelve_instance()
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py", line 360, in inner
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 6538, in do_unshelve_instance
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server request_spec)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 6625, in _unshelve_instance
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server self._nil_out_instance_obj_host_and_node(instance)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server self.force_reraise()
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/six.py", line 703, in reraise
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server raise value
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/compute/manager.py", line 6609, in _unshelve_instance
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server block_device_info=block_device_info)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 3672, in spawn
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server block_device_info=block_device_info)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 4067, in _create_image
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server injection_info, fallback_from_host)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 4198, in _create_and_inject_local_root
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server self._finalize_unshelve_qcow2_image(context, instance, backend)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 4218, in _finalize_unshelve_qcow2_image
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server root_fname = imagecache.get_cache_fname(base_image_ref)
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/imagecache.py", line 52, in get_cache_fname
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server return hashlib.sha1(image_id.encode('utf-8')).hexdigest()
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server AttributeError: 'NoneType' object has no attribute 'encode'
2020-08-30 08:35:10.183 7 ERROR oslo_messaging.rpc.server
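The crash at the bottom of the traceback is easy to reproduce in isolation: `get_cache_fname()` hashes the image id without guarding against `None`. A minimal sketch (the one-line body mirrors nova/virt/libvirt/imagecache.py as quoted in the traceback; everything else here is just a standalone harness):

```python
import hashlib

def get_cache_fname(image_id):
    # Mirrors nova/virt/libvirt/imagecache.py get_cache_fname(): derive
    # the image-cache file name by hashing the Glance image id.
    return hashlib.sha1(image_id.encode('utf-8')).hexdigest()

# With image_base_image_ref missing from system_metadata, nova ends up
# passing None, and the call fails exactly as in the log above:
try:
    get_cache_fname(None)
except AttributeError as exc:
    assert "'NoneType' object has no attribute 'encode'" in str(exc)
```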

Arx Cruz (arxcruz)
tags: added: alert promotion-blocker tempest
Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
milestone: none → victoria-3
Revision history for this message
yatin (yatinkarel) wrote :

The test started failing after nova patch https://review.opendev.org/#/c/696084/ merged. I will propose a revert and trigger a test job with https://review.rdoproject.org/r/#/c/29112/. The nova folks can suggest/confirm why the issue wasn't caught in nova's own CI jobs; maybe a similar scenario doesn't run there, or the TripleO jobs are missing something.

yatin (yatinkarel)
description: updated
Revision history for this message
yatin (yatinkarel) wrote :

Proposed revert: https://review.opendev.org/#/c/749035/, but the devstack jobs are hitting the setuptools 50 issue.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

Looks like there have only been two failures since the patch merged:

http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22_finalize_unshelve_qcow2_image%5C%22&from=30d

Both on the same patch:

https://review.opendev.org/#/c/739246/

From IRC, yatin provided a copy of their reproducer's system_metadata table:

instance.system_metadata = {'image_hw_rng_model': 'virtio', 'image_min_ram': '0', 'image_min_disk': '1', 'image_disk_format': 'qcow2', 'image_container_format': 'bare', 'owner_user_name': 'tempest-ServerActionsTestJSON-739482129', 'owner_project_name': 'tempest-ServerActionsTestJSON-739482129', 'boot_roles': 'member,reader', 'clean_attempts': '7', 'shelved_at': '2020-08-31T13:19:16.906588', 'shelved_image_id': '7cda75b5-6e51-46b9ba34-3444d68191b3', 'shelved_host': 'standalone.localdomain'}

Here we do indeed see that image_base_image_ref is not set.

But it should be set unconditionally here:

system_meta.setdefault('image_base_image_ref', instance.image_ref)

if it was not already set by:
system_meta = utils.get_system_metadata_from_image(
            image, instance_type)

https://opendev.org/openstack/nova/src/commit/f5f7c2540150c7ee7640c834d5caec31b3f5a7ab/nova/utils.py#L729-L754

get_system_metadata_from_image creates an empty dict and does not set image_base_image_ref,

as that key is not stored in the image properties and is not in SM_INHERITABLE_KEYS:

https://opendev.org/openstack/nova/src/commit/f5f7c2540150c7ee7640c834d5caec31b3f5a7ab/nova/utils.py#L69-L71

So the only way this can fail is if instance.image_ref is None when we
initially populate the system metadata,

unless something selectively deletes it after the fact.
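The create-time population path described above can be sketched as follows (a simplified model of nova/utils.py; the key list is trimmed and the helper signatures are illustrative):

```python
# Simplified model of how nova populates instance system_metadata at
# create time. Names mirror nova/utils.py, but this is not nova's code.
SM_INHERITABLE_KEYS = ('min_ram', 'min_disk', 'disk_format',
                       'container_format')

def get_system_metadata_from_image(image_meta):
    # Copies image properties and inheritable keys into 'image_'-prefixed
    # entries. Note: image_base_image_ref is never produced here.
    system_meta = {}
    for key, value in image_meta.get('properties', {}).items():
        system_meta['image_' + key] = value
    for key in SM_INHERITABLE_KEYS:
        if key in image_meta:
            system_meta['image_' + key] = image_meta[key]
    return system_meta

def populate_system_metadata(image_meta, instance_image_ref):
    system_meta = get_system_metadata_from_image(image_meta)
    # The only place image_base_image_ref is set, at create time:
    system_meta.setdefault('image_base_image_ref', instance_image_ref)
    return system_meta
```

So after a normal boot the key is present, and the question becomes what removes it afterwards.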

Arx Cruz (arxcruz)
summary: periodic-tripleo-ci-centos-8-scenario000-multinode-oooq-container-
- updates-ussuri tempest test_shelve_unshelve_server failing in component-
+ updates tempest test_shelve_unshelve_server failing in component-
pipeline
Revision history for this message
yatin (yatinkarel) wrote : Re: periodic-tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates tempest test_shelve_unshelve_server failing in component-pipeline

<< unless something selectively deletes it after the fact

I also noticed the issue happens when the test_rebuild_server* tests also run against the instance created in setUpClass (ServerActionsTestJSON): https://opendev.org/openstack/tempest/src/branch/master/tempest/api/compute/servers/test_server_actions.py#L234-L302

From the nova code (https://opendev.org/openstack/nova/src/branch/master/nova/compute/api.py#L3558-L3561) I see the following happening during rebuild, which appears to clean up the originally set "image_base_image_ref":

        # On a rebuild, since we're potentially changing images, we need to
        # wipe out the old image properties that we're storing as instance
        # system metadata... and copy in the properties for the new image.
        orig_sys_metadata = _reset_image_metadata()
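That interaction can be sketched as a simplified model of `_reset_image_metadata()` (not nova's exact code; the restore step shown at the end is what the merged fix, Change-Id Ia3031ea1f7db8b398f02d2080ca603ded8970200, adds):

```python
def reset_image_metadata(system_metadata, new_image_meta, image_ref):
    # Simplified model of _reset_image_metadata() in nova/compute/api.py:
    # on rebuild, drop every old 'image_*' key and copy in the new
    # image's properties. Before the fix, image_base_image_ref was wiped
    # here and never restored, so a later unshelve found it as None.
    for key in [k for k in system_metadata if k.startswith('image_')]:
        del system_metadata[key]
    for key, value in new_image_meta.get('properties', {}).items():
        system_metadata['image_' + key] = value
    # The fix: re-set the base image ref during rebuild so the qcow2
    # unshelve path can still compute the image-cache file name.
    system_metadata['image_base_image_ref'] = image_ref
    return system_metadata
```

This matches the observed sequence: create sets the key via setdefault, rebuild wipes it, and the subsequent shelve/unshelve crashes in get_cache_fname.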

yatin (yatinkarel)
summary: - periodic-tripleo-ci-centos-8-scenario000-multinode-oooq-container-
- updates tempest test_shelve_unshelve_server failing in component-
- pipeline
+ periodic-tripleo-ci-centos-8-standalone-full-tempest-api-compute-master
+ tempest test_shelve_unshelve_server failing in component-pipeline
Revision history for this message
Arx Cruz (arxcruz) wrote :
Revision history for this message
Arx Cruz (arxcruz) wrote :

Disregard my previous comment, the failure isn't related

Changed in tripleo:
milestone: victoria-3 → wallaby-1
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/787943

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/787945

Revision history for this message
Lee Yarwood (lyarwood) wrote :
Changed in nova:
status: New → Fix Released
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/nova/+/787943
Committed: https://opendev.org/openstack/nova/commit/560414036d9f2ebcfeb8626029d0bd849d6cad44
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 560414036d9f2ebcfeb8626029d0bd849d6cad44
Author: Alexandre Arents <email address hidden>
Date: Tue Sep 1 08:26:25 2020 +0000

    Update image_base_image_ref during rebuild.

    In different location we assume system_metadata.image_base_image_ref
    exists, because it is set during instance creation in method
    _populate_instance_for_create

    But once instance is rebuild, all system_metadata image property a dropped
    and replace by new image property and without setting back
    image_base_image_ref.

    This change propose to set image_base_image_ref during rebuild.

    In specific case of shelve/unshelve in Qcow2 backend, image_base_image_ref is
    used to rebase disk image, so we ensure this property is set as instance may
    have been rebuild before the fix was apply.

    NOTE(lyarwood): An additional change is required within the
    ShelvePolicyTestV21 test on stable/ussuri to avoid the system_metadata
    of the fake instance being lazy loaded during the test.

    Related-Bug: #1732428
    Closes-Bug: #1893618
    Change-Id: Ia3031ea1f7db8b398f02d2080ca603ded8970200
    (cherry picked from commit fe52b6c25bebdd1b459c7a59fbb8d9f6de200c9d)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 21.2.1

This issue was fixed in the openstack/nova 21.2.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/train)

Reviewed: https://review.opendev.org/c/openstack/nova/+/787945
Committed: https://opendev.org/openstack/nova/commit/8a01a58a60e7f4f2c189f8ca065d8c21ccbde04f
Submitter: "Zuul (22348)"
Branch: stable/train

commit 8a01a58a60e7f4f2c189f8ca065d8c21ccbde04f
Author: Alexandre Arents <email address hidden>
Date: Tue Sep 1 08:26:25 2020 +0000

    Update image_base_image_ref during rebuild.

    In different location we assume system_metadata.image_base_image_ref
    exists, because it is set during instance creation in method
    _populate_instance_for_create

    But once instance is rebuild, all system_metadata image property a dropped
    and replace by new image property and without setting back
    image_base_image_ref.

    This change propose to set image_base_image_ref during rebuild.

    In specific case of shelve/unshelve in Qcow2 backend, image_base_image_ref is
    used to rebase disk image, so we ensure this property is set as instance may
    have been rebuild before the fix was apply.

    Related-Bug: #1732428
    Closes-Bug: #1893618
    Change-Id: Ia3031ea1f7db8b398f02d2080ca603ded8970200
    (cherry picked from commit fe52b6c25bebdd1b459c7a59fbb8d9f6de200c9d)
    (cherry picked from commit 560414036d9f2ebcfeb8626029d0bd849d6cad44)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova train-eol

This issue was fixed in the openstack/nova train-eol release.
