GroupSnapshotsV319Test.test_reset_group_snapshot_status intermittently fails waiting to reset status to 'creating'

Bug #1770179 reported by Matt Riedemann
14
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Cinder
Invalid
High
Unassigned

Bug Description

http://logs.openstack.org/31/566931/1/check/nova-cells-v1/8faceb3/job-output.txt.gz#_2018-05-08_17_19_56_621704

The reset status request is here:

http://logs.openstack.org/31/566931/1/check/nova-cells-v1/8faceb3/job-output.txt.gz#_2018-05-08_17_19_56_665430

2018-05-08 17:19:56.665430 | primary | 2018-05-08 17:16:37,767 3606 INFO [tempest.lib.common.rest_client] Request (GroupSnapshotsV319Test:test_reset_group_snapshot_status): 202 POST http://198.72.124.74/volume/v3/aafd6b059a2041fb87e4975b06c96d6d/group_snapshots/e2d2b2bf-dde6-4357-9af1-79b57017690a/action 0.071s
2018-05-08 17:19:56.665803 | primary | 2018-05-08 17:16:37,767 3606 DEBUG [tempest.lib.common.rest_client] Request - Headers: {'Openstack-Api-Version': 'volume 3.19', 'Content-Type': 'application/json', 'Accept': 'application/json', 'X-Auth-Token': '<omitted>'}
2018-05-08 17:19:56.665933 | primary | Body: {"reset_status": {"status": "creating"}}
2018-05-08 17:19:56.666751 | primary | Response - Headers: {u'openstack-api-version': 'volume 3.19', u'connection': 'close', u'date': 'Tue, 08 May 2018 17:16:37 GMT', u'content-type': 'text/html; charset=UTF-8', u'x-openstack-request-id': 'req-e73546a4-7365-4132-b289-0b1aee94e2ac', 'status': '202', u'content-length': '0', 'content-location': 'http://198.72.124.74/volume/v3/aafd6b059a2041fb87e4975b06c96d6d/group_snapshots/e2d2b2bf-dde6-4357-9af1-79b57017690a/action', u'vary': 'OpenStack-API-Version', u'server': 'Apache/2.4.18 (Ubuntu)'}

I'm not sure why this is a 202 because it's a synchronous DB update in cinder-api, and there are no errors in the c-api logs:

http://logs.openstack.org/31/566931/1/check/nova-cells-v1/8faceb3/logs/screen-c-api.txt.gz#_May_08_17_16_37_704529

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22group_snapshot%5C%22%20AND%20message%3A%5C%22failed%20to%20reach%20creating%20status%20(current%20available)%20within%20the%20required%20time%5C%22%20AND%20tags%3A%5C%22console%5C%22&from=7d

Matt Riedemann (mriedem)
Changed in cinder:
status: New → Confirmed
importance: Undecided → High
description: updated
Revision history for this message
TommyLike (hu-husheng) wrote :

Most of cinder's reset status actions are returning 202 instead of 200, I think we can fix them all.

Revision history for this message
Matt Riedemann (mriedem) wrote :

@Tommy, to be clear, the bug isn't the 202 response, it's that the group snapshot status wasn't actually updated, or at least there is a race when updating it.

Revision history for this message
TommyLike (hu-husheng) wrote :

@Matt, oh, thanks :)

Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Based on the bug that was just duped here that had a traceback, that looked like a test issue in that it was looking for the transient "creating" state, but it had already transitioned to the "available" state. The test should probably be updated to allow for either if it doesn't catch the transition in time.

Revision history for this message
Sean McGinnis (sean-mcginnis) wrote :

Doesn't appear to actually be that state transitions too fast. The test explicitly tries setting each status:

https://github.com/openstack/tempest/blob/92f97e2d2f75eadc6d78707c011fa94dab1ba280/tempest/api/volume/admin/test_group_snapshots.py#L249

Revision history for this message
Matt Riedemann (mriedem) wrote :

Is it possible there are multiple tests changing state on the same group at the same time somehow?

Revision history for this message
Matt Riedemann (mriedem) wrote :

Or is it possible that there is some kind of periodic task which is changing the status? I just don't understand how this could be a race when the API call is synchronous.

tags: added: test-coverage
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/819722

Revision history for this message
Brian Rosmaita (brian-rosmaita) wrote :

Investigating whether this is still an issue. Reason for wondering is that an almost identical test (GroupsV320Test.test_reset_group_status) was reported to intermittently fail (Bug #1783566), but that one was never skipped and seems to be OK.

This patch reverts the skip-test in tempest: https://review.opendev.org/c/openstack/tempest/+/819671

This patch depends on the tempest revert and runs periodic tempest jobs:
https://review.opendev.org/c/openstack/cinder/+/819722

We can come back in a few weeks and see what the situation is.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by "Brian Rosmaita <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/819722
Reason: Tempest change this was testing has merged; this patch is no longer needed.

Revision history for this message
Brian Rosmaita (brian-rosmaita) wrote :

Looks like this is no longer an issue.

Changed in cinder:
status: Confirmed → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.