Activity log for bug #2045921

Date Who What changed Old value New value Message
2023-12-07 18:11:14 melanie witt bug added bug
2023-12-07 18:13:38 melanie witt description Maybe "race" isn't the right word, but the ordering of the Instance.save() of ERROR state and the creation of the instance.fault record can result in the instance.fault not having been created yet after the instance is visibly showing ERROR state. Seen in the gate: test_qos_min_bw_allocation_basic fails because the expected 'fault' field is missing in the server Symptom: test_qos_min_bw_allocation_basic fails because the expected 'fault' field is not found in the server from GET /servers/{server_id} testr_results.html: testtools.matchers._impl.MismatchError: 'fault' not in {'id': '19b6c95f-b91b-4949-b72e-3f7fea0d1a49', 'name': 'tempest-MinBwAllocationPlacementTest-server-1096644195', 'status': 'ERROR', 'tenant_id': '1f52110a4e8649d78861c38daca6f179', 'user_id': '8ef9e2bc05034b03af2d7323155cb71f', 'metadata': {}, 'hostId': '', 'image': {'id': '3cb38f9c-a86e-47c8-984f-74efc924120c', 'links': [{'rel': 'bookmark', 'href': 'https://199.19.213.27/compute/images/3cb38f9c-a86e-47c8-984f-74efc924120c'}]}, 'flavor': {'vcpus': 1, 'ram': 128, 'disk': 1, 'ephemeral': 0, 'swap': 0, 'original_name': 'm1.nano', 'extra_specs': {'hw_rng:allowed': 'True'}}, 'created': '2023-12-07T15:00:24Z', 'updated': '2023-12-07T15:00:30Z', 'addresses': {}, 'accessIPv4': '', 'accessIPv6': '', 'links': [{'rel': 'self', 'href': 'https://199.19.213.27/compute/v2.1/servers/19b6c95f-b91b-4949-b72e-3f7fea0d1a49'}, {'rel': 'bookmark', 'href': 'https://199.19.213.27/compute/servers/19b6c95f-b91b-4949-b72e-3f7fea0d1a49'}], 'OS-DCF:diskConfig': 'MANUAL', 'OS-EXT-AZ:availability_zone': '', 'config_drive': '', 'key_name': None, 'OS-SRV-USG:launched_at': None, 'OS-SRV-USG:terminated_at': None, 'OS-EXT-STS:task_state': None, 'OS-EXT-STS:vm_state': 'error', 'OS-EXT-STS:power_state': 0, 'os-extended-volumes:volumes_attached': [], 'locked': False, 'description': None, 'tags': [], 'trusted_image_certificates': None, 'server_groups': []} screen-placement-api.txt: found no providers with 2147483647 NET_BW_IGR_KILOBIT_PER_SEC this ^ is expected for this part of the test OpenSearch query: message:"testtools.matchers._impl.MismatchError: 'fault' not in" Comments: This may be a race because ERROR state is set on the instance and save()'ed before the 'fault' record is created https://github.com/openstack/nova/blob/302e286408cce2c8df43d6742ca490405a20011d/nova/scheduler/utils.py#L902-L910 and the test waits for ERROR state before checking for the 'fault' field, so maybe sometimes it GETs the instance before the fault was able to be added https://github.com/openstack/tempest/blob/a0b161bbde6d7734833a26ced76ca44b888fe152/tempest/scenario/test_network_qos_placement.py#L269-L276 Maybe "race" isn't the right word, but the ordering of the Instance.save() of ERROR state and the creation of the instance.fault record can result in the instance.fault not having been created yet after the instance is visibly showing ERROR state. Seen in the gate: test_qos_min_bw_allocation_basic fails because the expected 'fault' field is missing in the server Symptom: test_qos_min_bw_allocation_basic fails because the expected 'fault' field is not found in the server from GET /servers/{server_id} testr_results.html: testtools.matchers._impl.MismatchError: 'fault' not in {'id': '19b6c95f-b91b-4949-b72e-3f7fea0d1a49', 'name': 'tempest-MinBwAllocationPlacementTest-server-1096644195', 'status': 'ERROR', 'tenant_id': '1f52110a4e8649d78861c38daca6f179', 'user_id': '8ef9e2bc05034b03af2d7323155cb71f', 'metadata': {}, 'hostId': '', 'image': {'id': '3cb38f9c-a86e-47c8-984f-74efc924120c', 'links': [{'rel': 'bookmark', 'href': 'https://199.19.213.27/compute/images/3cb38f9c-a86e-47c8-984f-74efc924120c'}]}, 'flavor': {'vcpus': 1, 'ram': 128, 'disk': 1, 'ephemeral': 0, 'swap': 0, 'original_name': 'm1.nano', 'extra_specs': {'hw_rng:allowed': 'True'}}, 'created': '2023-12-07T15:00:24Z', 'updated': '2023-12-07T15:00:30Z', 'addresses': {}, 'accessIPv4': '', 'accessIPv6': '', 'links': [{'rel': 'self', 'href': 'https://199.19.213.27/compute/v2.1/servers/19b6c95f-b91b-4949-b72e-3f7fea0d1a49'}, {'rel': 'bookmark', 'href': 'https://199.19.213.27/compute/servers/19b6c95f-b91b-4949-b72e-3f7fea0d1a49'}], 'OS-DCF:diskConfig': 'MANUAL', 'OS-EXT-AZ:availability_zone': '', 'config_drive': '', 'key_name': None, 'OS-SRV-USG:launched_at': None, 'OS-SRV-USG:terminated_at': None, 'OS-EXT-STS:task_state': None, 'OS-EXT-STS:vm_state': 'error', 'OS-EXT-STS:power_state': 0, 'os-extended-volumes:volumes_attached': [], 'locked': False, 'description': None, 'tags': [], 'trusted_image_certificates': None, 'server_groups': []} screen-placement-api.txt: found no providers with 2147483647 NET_BW_IGR_KILOBIT_PER_SEC this ^ is expected for this part of the test OpenSearch query: message:"testtools.matchers._impl.MismatchError: 'fault' not in" Comments: This may be a race because ERROR state is set on the instance and save()'ed before the 'fault' record is created https://github.com/openstack/nova/blob/302e286408cce2c8df43d6742ca490405a20011d/nova/scheduler/utils.py#L902-L910 and the test waits for ERROR state before checking for the 'fault' field, so maybe sometimes it GETs the instance before the fault was able to be added https://github.com/openstack/tempest/blob/a0b161bbde6d7734833a26ced76ca44b888fe152/tempest/scenario/test_network_qos_placement.py#L269-L276 Code: vm_state = updates['vm_state'] properties = request_spec.get('instance_properties', {}) notifier = rpc.get_notifier(service) state = vm_state.upper() LOG.warning('Setting instance to %s state.', state, instance_uuid=instance_uuid) instance = objects.Instance(context=context, uuid=instance_uuid, **updates) instance.obj_reset_changes(['uuid']) instance.save() compute_utils.add_instance_fault_from_exc( context, instance, ex, sys.exc_info()) I wonder if it would be legit to swap the ordering to do compute_utils.add_instance_fault_from_exc() before instance.save() of ERROR state?