OpenStack Shared File Systems Service (Manila)

[tempest] Waiting delete resource in error state does not stop immediately

Bug #2006792 reported by Felipe Rodrigues on 2023-02-10

Affects	Status	Importance	Assigned to	Milestone
OpenStack Shared File Systems Service (Manila)	In Progress	Undecided	Felipe Rodrigues

Bug Description

During the cleanup phase, the Manila tempest plugin should stop waiting for a deletion as soon as the resource is in "error_deleting" or "error" status. Otherwise, it keeps checking until the timeout expires, which delays the results of the failed test.

Example from the log [1]: the cleanup issues the snapshot delete at 18:06:28.950. It then starts getting the snapshot, waiting for a "not found" error. From the second attempt, at 18:06:32.218, the resource is in "error_deleting" status. It should stop here, but it keeps getting the resource and waiting for the "not found" error until it times out at 18:11:31.063. The cleanup therefore took 5 minutes to finish instead of just 5 seconds.

This has two negative impacts:

1. The failed job takes much longer to finish and consumes more resources.
2. The resulting error log can lead to wrong assumptions about the failure.

[1] https://paste.openstack.org/show/818675/
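
A minimal sketch of the expected wait behavior described above, not the plugin's actual code. All names here (wait_for_deletion, NotFound, DeleteFailed, get_resource, and the timeout and interval values) are hypothetical stand-ins chosen for illustration. The loop returns when the resource is gone and fails fast when the resource reports an error status, instead of polling until the timeout:

```python
import time


class NotFound(Exception):
    """Hypothetical stand-in for a client's "resource not found" error."""


class DeleteFailed(Exception):
    """Hypothetical error raised when a delete ends in an error status."""


ERROR_STATUSES = ("error", "error_deleting")


def wait_for_deletion(get_resource, timeout=300, interval=3, sleep=time.sleep):
    """Poll until the resource is gone, failing fast on an error status.

    get_resource is an assumed callable that returns the resource dict or
    raises NotFound once the resource no longer exists.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            resource = get_resource()
        except NotFound:
            return  # The resource was deleted successfully.
        status = resource.get("status")
        if status in ERROR_STATUSES:
            # Stop immediately: the delete cannot succeed anymore.
            raise DeleteFailed(f"delete ended in status {status!r}")
        sleep(interval)
    raise TimeoutError(f"resource not deleted within {timeout} seconds")
```

With this shape, a snapshot that reaches "error_deleting" on the second poll fails the cleanup after two requests rather than after the full timeout.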

Felipe Rodrigues (felipefutty) wrote on 2023-02-10:

Possible solution:

I think this patch [1] introduced the wrong behavior. Before it, "_is_resource_deleted" [2] expected "func" to return the resource dict, and "_parse_body" returned that dict by extracting the value under the first key. However, [1] replaced that return value with "rest_client.ResponseBody", which does not take the first key into account. As a result, the "res.get('status')" call in "_is_resource_deleted" returns None instead of the "status" of the resource. The correct call should be:

res['resource'].get("status")

[1] https://review.opendev.org/c/openstack/manila-tempest-plugin/+/788248
[2] https://github.com/openstack/manila-tempest-plugin/blob/master/manila_tempest_tests/services/share/json/shares_client.py#L330
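
To illustrate the difference between the two lookups, here is a small sketch under stated assumptions: a plain dict stands in for the response body, "snapshot" is an assumed example of the resource key, and both helper functions are hypothetical names, not code from the plugin. It shows why a top-level "status" lookup on a wrapped body never matches an error status:

```python
# Hypothetical response body: the resource dict is nested under a
# resource-type key ("snapshot" is an assumed example).
wrapped_body = {"snapshot": {"id": "fake-id", "status": "error_deleting"}}

ERROR_STATUSES = ("error", "error_deleting")


def status_top_level(body):
    """Read "status" from the top level of the body."""
    return body.get("status")


def status_nested(body, resource_key):
    """Read "status" from the resource dict nested under resource_key."""
    return body[resource_key].get("status")


# The top-level lookup finds no "status" key, so the error check never fires.
print(status_top_level(wrapped_body) in ERROR_STATUSES)           # False

# The nested lookup sees the real status and detects the failed delete.
print(status_nested(wrapped_body, "snapshot") in ERROR_STATUSES)  # True
```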